Feature Request: content-aware or "smart" sync

  • timg11
    Expert
    • Apr 2010
    • 82

    Feature Request: content-aware or "smart" sync

    I have several archive mailbox files created by Outlook. They never change, but Outlook needs to open them to make the content searchable. For unknown reasons, whenever Outlook opens a PST file, the file's timestamp is modified. Outlook may change some bytes in the file as well, but no actual data (messages) is changed.

    I use BC4 to back up these files, and because the PST files are many GB in size, syncing them every time is slow. A binary compare could tell whether there are differences, but it requires fully reading both files and takes as long as just copying them.

    Is there any way BC could be "smarter" about comparing and syncing large files when only a small part of the file might be different? Especially when the remote drive is on another computer, would it be possible to have an "agent" run on the remote machine that computes a CRC or MD5 for every 100 MB segment of the file (for example), compares the results with a similar scan by BC on the local machine, and then transfers only the segments that differ? This capability is similar to the Unix utility rsync.
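To make the segment idea concrete, here is a minimal Python sketch of per-segment checksums and the comparison between two scans. It is only an illustration of the request, not anything Beyond Compare provides: the function names `segment_digests` and `changed_segments` are hypothetical, MD5 is used because the post mentions it, and the 100 MB segment size is the post's example value.

```python
import hashlib

SEGMENT_SIZE = 100 * 1024 * 1024  # 100 MB, the example size from the post


def segment_digests(path, segment_size=SEGMENT_SIZE):
    """Return one MD5 hex digest per fixed-size segment of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(segment_size)  # holds one whole segment in memory
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).hexdigest())
    return digests


def changed_segments(local, remote):
    """Indices of segments that differ, including ones present on only one side."""
    n = max(len(local), len(remote))
    return [
        i for i in range(n)
        if i >= len(local) or i >= len(remote) or local[i] != remote[i]
    ]
```

Each side would run `segment_digests` on its own copy, so only the short digest lists need to cross the network before deciding which segments to transfer. Unlike rsync, this fixed-offset scheme does not detect data that has shifted position within the file.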

    Actually, as I think about it, it wouldn't even require an "agent" to execute on the remote system, if timestamps are available and reliable.
    BC could compute the segment CRCs for the local file when it runs. The first time, it copies the whole file to the remote and also copies a file containing the list of segment CRCs. The next time BC runs, it compares the remote file's timestamp with the timestamp of the copied CRC list; if they are the same, it concludes the remote file has not changed since the CRCs were made. It then runs the CRC scan on the local file and updates only the segments of the remote file that are different. Finally, it updates the remote CRC file to match the remote file's current state.
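The agentless workflow described above could be sketched roughly as follows. This is a hedged illustration under several assumptions, not a Beyond Compare feature: the `sync` and `write_sidecar` helpers and the JSON sidecar layout are hypothetical, MD5 stands in for the CRC, and the sidecar records the remote file's modification time and size instead of relying on the sidecar file's own timestamp, which is one possible reading of the post.

```python
import hashlib
import json
import os
import shutil


def segment_digests(path, segment_size):
    """Return one MD5 hex digest per fixed-size segment of the file."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(segment_size):
            digests.append(hashlib.md5(chunk).hexdigest())
    return digests


def write_sidecar(remote, sidecar, digests, segment_size):
    """Record the remote file's current timestamp, size, and segment digests."""
    st = os.stat(remote)
    with open(sidecar, "w") as f:
        json.dump({"mtime_ns": st.st_mtime_ns, "size": st.st_size,
                   "segment_size": segment_size, "digests": digests}, f)


def sync(local, remote, sidecar, segment_size=100 * 1024 * 1024):
    """Return the segment indices rewritten, or None after a full copy."""
    local_digests = segment_digests(local, segment_size)

    # Trust the stored digests only if the remote file looks untouched.
    trusted = None
    if os.path.exists(remote) and os.path.exists(sidecar):
        with open(sidecar) as f:
            meta = json.load(f)
        st = os.stat(remote)
        if (meta["mtime_ns"] == st.st_mtime_ns
                and meta["size"] == st.st_size
                and meta["segment_size"] == segment_size):
            trusted = meta["digests"]

    if trusted is None:
        # First run, or the remote changed behind our back: copy everything.
        shutil.copyfile(local, remote)
        write_sidecar(remote, sidecar, local_digests, segment_size)
        return None

    changed = [i for i, d in enumerate(local_digests)
               if i >= len(trusted) or trusted[i] != d]
    with open(local, "rb") as src, open(remote, "r+b") as dst:
        for i in changed:
            src.seek(i * segment_size)
            dst.seek(i * segment_size)
            dst.write(src.read(segment_size))
        dst.truncate(os.path.getsize(local))  # handle a local file that shrank
    write_sidecar(remote, sidecar, local_digests, segment_size)
    return changed
```

Note that the local file is still read in full on every run; the saving is in not reading or rewriting the unchanged parts of the remote file. The scheme also depends entirely on the remote timestamp being trustworthy, which is the caveat raised in the post itself.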
  • Aaron
    Team Scooter
    • Oct 2007
    • 15997

    #2
    Partial updates would be a nice feature to support, but it is also a very large project that we wouldn't be able to tackle anytime soon. We do currently have Snapshots, which can save the CRC and timestamp/size information of a directory. Given that Outlook is updating the PST timestamp, however, the timestamp doesn't seem like a reliable indicator of change. That means still scanning and calculating the CRC each time you want to see whether the local file has actually changed (which it might, if Outlook updates any metadata, such as an internal accessed timestamp, similar to Excel). We don't have segmented CRCs, so the snapshot's CRC is only of the full file.
    Aaron P Scooter Software
