  1. #1
    Join Date
    Feb 2011
    Posts
    144

    Largest file size supported by binary comparison

    What is the largest file size that can be compared using binary comparison?

    I tried comparing 2 identical zip files (each one containing 2 files: a small text file and a 30 GB data file).

    When opening the comparison and trying to force a CRC comparison or a Rules-based comparison, BC would simply indicate instantly that all the files are a binary match. It cannot possibly know this, because the comparison takes no time at all (and one of the zip files is on an internal drive and the other one on an external USB drive). Force refreshing gives the same result: I'm instantly told that they're binary matches.

    But then if I try to force a Binary Comparison (instead of CRC), then it actually starts doing some work. But after a few minutes, I just get an error and the comparison process fails. So I guess there is some kind of limit.

    I'm running v4.1.9 64-bit on a Windows 10 system with 8GB RAM. In settings, I have configured the buffer size for binary compare to 33554432.

  2. #2
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    There's no known upper limit, and we scan in smaller blocks, so the amount of RAM used doesn't increase with the file size. A Rules-based scan might fail depending on the file extension, as certain session types have upper limits on the file size. CRC and Binary scans should complete successfully after a scan period. Note: a binary scan can return unequal very quickly if the sizes are different or if it finds a difference early in the file. An equal result, however, takes longer because the entire file has to be scanned. On my own machine, I quickly generated and tested 30 GB and 40 GB files without seeing the same problem.
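    To illustrate the block-based approach described above, here is a minimal Python sketch. It is not Beyond Compare's actual implementation; the function name `files_equal` and the 1 MiB default block size are assumptions made for the example. It shows why memory use stays flat regardless of file size, and why an unequal result can come back quickly while an equal result requires reading both files to the end.

```python
import os


def files_equal(path_a, path_b, block_size=1024 * 1024):
    """Compare two files byte for byte, one fixed-size block at a time."""
    # Different sizes can never match, so no data needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if block_a != block_b:
                # Stop at the first differing block.
                return False
            if not block_a:
                # Both files reached end-of-file without a difference.
                return True
```

    Only two blocks are held in memory at any time, so a 30 GB pair needs no more RAM than a 30 MB pair.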

    If you reboot your machine and power cycle the external hdd, does this impact the issue? If you compare a pair of test folders that are only on the external drive, do they show the same issue? Or a sample pair of files that exist only on the local drive?
    Aaron P Scooter Software

  3. #3
    Join Date
    Feb 2011
    Posts
    144

    Hi Aaron

    Internal vs external drive makes no difference, and rebooting makes no difference.

    It is still the case that CRC Comparison instantly says the files inside the ZIP "folder" are binary matches. And Binary Comparison still fails after a few minutes.

    30/11/2016 18:36:32 Load comparison: F:\DellE7470_Factory_USB_Recovery_Media_20161114.zip <-> D:\DellE7470_Factory_USB_Recovery_Media_20161114.zip
    30/11/2016 18:40:36 Unable to retrieve D:\DellE7470_Factory_USB_Recovery_Media_20161114.zip\DellE7470_Factory_USB_Recovery_Media_20161114.bin: Invalid size or check sum of file
    30/11/2016 18:40:36 Unable to retrieve F:\DellE7470_Factory_USB_Recovery_Media_20161114.zip\DellE7470_Factory_USB_Recovery_Media_20161114.bin: Invalid size or check sum of file
    30/11/2016 18:40:36 Background content comparison completed in 4 minutes, 5 seconds


    I believe it is the ZIP file itself that is causing problems for Beyond Compare. Let me tell you more about the file.

    I created a system restore USB flash drive for my laptop, so that it can be restored to factory settings by booting from the USB flash drive. That process copied about 10 GB of incompressible data to the 32 GB USB flash drive. I then used a popular utility program called ImageUSB to create an image file of that USB flash drive. The file it created is 32 GB in size, which I guess is mostly empty blocks. I then used WinRAR to create a zip file of that 32 GB mostly-empty file, together with a small text file describing the image, which resulted in a 10 GB zip file.

    So maybe you can replicate this scenario on your side? Perhaps partially fill up a USB flash drive, then use ImageUSB to create an image of it (which will include empty blocks), and then use WinRAR to create a zip file containing that image file. Then make 2 copies of that ZIP file and try to compare them in Beyond Compare.

  4. #4
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Ah hah! I missed "zip" in the first post. Sorry about that. I've created the large archive with very large content and I'm seeing the same behavior you are. I'll make a tracker entry to investigate. If you use the right-click Compare Contents command to perform a foreground scan, how does this work for you?
    Aaron P Scooter Software

  5. #5
    Join Date
    Feb 2011
    Posts
    144

    Quote Originally Posted by Aaron
    If you use the right-click Compare Contents command to perform a foreground scan, how does this work for you?
    It works like I described:
    CRC Comparison instantly says the files are binary matches. And Binary Comparison fails after a few minutes (see the log file contents I posted in the previous message).

    When you say you are seeing the same behavior, do you mean that you see both of these incorrect things happen?

  6. #6
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    Chatting with a dev, I've learned something new: when working with a .zip the CRC values are stored as part of the zip and we use those for both CRC and Rules-based scan results (if CRC equal). This will return nearly instantly. A binary scan will ignore the stored CRC and will take a long time to scan the files. This is all intended behavior; does your scenario differ in any way?
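    For readers unfamiliar with how a zip stores checksums: each member's CRC-32 is recorded in the archive's directory, so it can be read without decompressing any data. The Python sketch below uses the standard `zipfile` module to compare two archives by stored CRC only. It illustrates why such a comparison returns almost instantly; it is not Beyond Compare's code, and the function names `stored_crcs` and `crc_metadata_match` are made up for the example.

```python
import zipfile


def stored_crcs(zip_path):
    """Return {member name: stored CRC-32} read from the archive directory."""
    with zipfile.ZipFile(zip_path) as zf:
        # ZipInfo.CRC is the checksum recorded when the member was added;
        # reading it does not touch the member's compressed data.
        return {info.filename: info.CRC for info in zf.infolist()}


def crc_metadata_match(zip_a, zip_b):
    """True if both archives list the same members with the same stored CRCs."""
    return stored_crcs(zip_a) == stored_crcs(zip_b)
```

    Because only the directory is read, the run time does not depend on the size of the members, which would explain a near-instant result even for a 30 GB file.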
    Aaron P Scooter Software

  7. #7
    Join Date
    Feb 2011
    Posts
    144

    OK, but are you seeing the problem where the Binary Comparison fails after a few minutes, like I do where it fails after 4 minutes and 5 seconds in my example?

    Concerning the other problem, i.e. the CRC Comparison, I think it's a very bad idea to rely on the CRC in the ZIP when trying to compare the actual contents. Sometimes the contents of a zip file are corrupt, and then the content won't match the CRC saved in the metadata of the zip. Just the other day I was extracting some RAR files, and WinRAR warned me when extracting that some of the files did not match their stored CRC values. Somehow, the contents of the archive had become corrupt. That is why archive tools usually also have a Test function where they compare the contents to the stored checksums.

    When using Beyond Compare to compare the actual contents of an archive, the expectation is that it is comparing the actual contents by calculating CRC values. Surely it should be doing pretty much the same thing that WinRAR/Winzip do when extracting/testing, i.e. check the actual contents to find out whether there's any corruption/difference in the actual data.

    Currently, if using Beyond Compare's CRC comparison to compare a good ZIP with a corrupt ZIP, it will report that they're identical because of course the metadata will still match.
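    The "Test" function mentioned above can be approximated in Python with `ZipFile.testzip()`, which reads every member and checks it against the stored CRC. This is only a sketch of the general technique, not how WinRAR, WinZip, or Beyond Compare implement it; the function name `first_corrupt_member` is hypothetical.

```python
import zipfile


def first_corrupt_member(zip_path):
    """Return the name of the first member whose data fails its stored
    CRC-32 check, or None if every member reads back cleanly."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads all member data, so it takes time proportional
        # to the archive size, unlike a look at the stored CRC values.
        return zf.testzip()
```

    An archive whose data was damaged after creation still carries its original stored CRC values, so a metadata-only comparison against a good copy reports a match, while a check like this one flags the damaged member.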

  8. #8
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    No, the binary scan is finishing without error and without needing to alter the binary buffer option. If you extract these items out of the zip, do you encounter any errors during a binary scan of those contents?

    Trusting CRC values is something BC4 will do if CRC is selected, while Binary does not compute CRC and would bypass these values. If CRC corruption is a concern, then you would want to use Binary (and we'll troubleshoot getting this working).
    Aaron P Scooter Software

  9. #9
    Join Date
    Feb 2011
    Posts
    144

    Quote Originally Posted by Aaron
    No, the binary scan is finishing without error and without needing to alter the binary buffer option.
    It took a lot of back-and-forth but I'm glad you now finally see that I was reporting 2 separate problems.

    Quote Originally Posted by Aaron
    If you extract these items out of the zip, do you encounter any errors during a binary scan of those contents?
    If I extract the items out of the zip and do a binary scan on them, there are no errors.

    Quote Originally Posted by Aaron
    Trusting CRC values is something BC4 will do if CRC is selected
    Can you please ask the developers to read this thread in its entirety? I still strongly object to the fact that, even when explicitly choosing to Compare Contents via the Rules dialog, or when right-clicking on a specific file within an expanded ZIP and choosing Compare Contents, the contents are by default not actually compared at all. At best, it could be called 'Compare metadata'.

    In the current implementation, the little icon in the middle that shows a binary match is highly misleading, because the contents have not been compared at all by Beyond Compare. Any corrupt content within zip files will be shown by Beyond Compare with a little binary match icon. I need to be able to trust that there is no possibility that my file comparison tool is showing me false information.

    For other file types, Beyond Compare calculates the CRC values itself. I bet that not many people know or expect that when they ask Beyond Compare to compare their archive files' contents, by default it is not comparing the contents at all, and this can surely get people into trouble. Relying on an archive's built-in CRC values in its metadata is fine for a quick comparison, but is of no use when doing a content comparison by any method. It should not be necessary to choose Binary Comparison. CRC Comparison should be reliable too.

  10. #10
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    Sorry, I missed mentioning that I expected foreground Binary scan to work when I suggested you try it. Binary scan was always functioning for me, and I was attempting to troubleshoot your "instant results".

    A dev has reviewed this, hence my post saying that it's expected behavior, and I've updated my explanation with what they told me. The Binary scan is designed for full data verification, while CRC and Rules-based scans are by design made to be quicker or to ignore differences. If CRC always ignored the stored CRC code in the zip, then it would simply be a slower, less reliable Binary scan and would afford no advantage. CRC scan is provided specifically for scenarios where you can trust CRC codes, so scans against zip files or FTP servers that provide xCRC codes can be done quickly. Similarly, a corruption that occurs outside of visible data would not be caught by a Rules-based scan. If you need to verify an exact match and expect to deal with corrupt sources, then you need to use the Binary scan.

    If you are dealing with corrupt archives, this could be part of why the binary scan is failing. If you run the WinZip "Test" function on both archives (left and right), does either of them report a failure? If you work with new test archives that you create by manually zipping up some large files, do all archives crash similarly?
    Aaron P Scooter Software
