compare zip content from command line

  • cyberchicken
    Enthusiast
    • Mar 2015
    • 26

    Hello! I trust BC and I think I simply need a hint to do what I want.

    I have some hundreds of zip files (backups) which all differ at the binary level, but I know for sure that most are the same content-wise.
    I know how to compare zip files with the GUI: in hex they have some different bytes, and by content they are the same. That is plain.

    Now, how can I compare the contents of two zips on the command line? How should I interpret the response?
    Does "Binary differences" (11) mean that the content is the same?
    What does "Similar" (12) mean?
    Should I aim for "Rules-based same" (2)?

    My idea is to write a batch file that compares each pair of zips and deletes one of them if they have the same content. I will have no problem doing that (if BC has a way of automating EVEN THAT, that would be incredible).
    Anyway, comparing content while ignoring zip wrapper differences would be great in general.

    Of course I could unzip the files, compare, take a decision and so on, but I thought that if I can do that with the GUI it should also be possible on the command line.
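
For comparison, here is a minimal sketch of what "same content, different wrapper" can mean outside Beyond Compare. It uses only Python's standard `zipfile` module and compares member names, uncompressed sizes and stored CRC-32 values from the central directory, so nothing is extracted. The function names `zip_fingerprint` and `same_content` are made up for this illustration, and a CRC match is a strong hint rather than proof of identical bytes.

```python
import zipfile


def zip_fingerprint(source):
    """Map each file member to (uncompressed size, CRC-32).

    Timestamps, compression method and member order are deliberately
    ignored, since those are the "wrapper" differences.
    """
    with zipfile.ZipFile(source) as zf:
        return {
            info.filename: (info.file_size, info.CRC)
            for info in zf.infolist()
            if not info.is_dir()
        }


def same_content(zip_a, zip_b):
    """True when both archives hold the same names, sizes and CRCs."""
    return zip_fingerprint(zip_a) == zip_fingerprint(zip_b)
```

Both arguments can be file paths or open binary file objects. Two backups taken on different days would normally differ byte for byte because of the stored timestamps alone, yet still compare as equal here.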

    Thank you for any suggestion!
    Last edited by cyberchicken; 28-May-2015, 08:20 PM. Reason: improved question
  • Aaron
    Team Scooter
    • Oct 2007
    • 16007

    #2
    Hello,

    If the archives differ at the binary level but the file content is equal, I would suggest using BC scripting: load the archives as base folders, expand all, and generate a folder report. You can try this first in the graphical interface by starting a Folder Compare session, loading your two archives, and then using the Session menu -> Folder Compare Report. Try out different layouts to find the best one to use for scripting.

    The script would be similar to:
    criteria rules-based
    load "c:\file.zip" "c:\file2.zip"
    expand all
    folder-report layout:side-by-side options:display-mismatches output-to:"c:\bcreport.html" output-options:html-color
    Aaron P Scooter Software
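
A wrapper around a script like the one above might look roughly like this in Python. This is a sketch under several assumptions that should be checked against the Beyond Compare command line reference: that the script is saved to a file and started as `BCompare.exe @script /silent`, and that the hard-coded paths in the `load` line are replaced by the script arguments `%1` and `%2`. The function names and the example file names are hypothetical.

```python
import subprocess


def build_bc_command(bcompare_exe, script_path, left_zip, right_zip):
    """Build the argument list for one silent script run.

    Assumes the script file contains: load "%1" "%2"
    so that the two zip paths arrive as script arguments.
    """
    return [bcompare_exe, "@" + script_path, "/silent", left_zip, right_zip]


def run_bc_script(bcompare_exe, script_path, left_zip, right_zip):
    """Run Beyond Compare once and return the process exit code."""
    command = build_bc_command(bcompare_exe, script_path, left_zip, right_zip)
    return subprocess.run(command).returncode
```

Keeping the command construction in its own function makes it possible to print and inspect the command before anything is actually run.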

    • cyberchicken
      Enthusiast
      • Mar 2015
      • 26

      #3
      Thank you Aaron!

      Scripting will do. Errorlevel would be easier, but I can grep for "Differences Files (0)" and then delete the zip (using layout:summary, output-to:clipboard and options:column-none).
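
The grep step could be sketched in Python as below. The marker text "Differences Files (N)" is taken from the post above and generalised to any count; the exact wording of a real summary report may differ by version and layout, so treat the pattern as an assumption. The function name `difference_count` is made up for this example.

```python
import re


def difference_count(report_text):
    """Return the count from a 'Differences Files (N)' line, or None.

    None means the marker was not found at all, which a caller should
    treat as "unknown" and never as "no differences".
    """
    match = re.search(r"Differences Files \((\d+)\)", report_text)
    return int(match.group(1)) if match else None
```

Returning `None` for a missing marker keeps a malformed or empty report from being mistaken for a clean comparison, which matters when the next step is a deletion.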

      BTW, at this point, I could execute the deletion right into BC, leaving me with an empty zip. Is there a smart way of deleting the container zip, when empty, within the same script?

      Another consideration: in general, this process would be useful for adding "differential" capability to backup software that lacks it, as is the case for most web applications: just keep the files that differ.

      • Aaron
        Team Scooter
        • Oct 2007
        • 16007

        #4
        Hello,

        You could load the folder above the zips instead of the zips directly, and set the file name filter to then only show that one zip file. That way, an Expand All would show the files, but you can still Select the zip and Delete it with script commands. If it's the current base folder, you couldn't delete it from bcscripting, and would need to use a wrapper .bat file to first call the BCScript, then delete the zip using the .bat scripting.

        How would you like to "leave" different files? The select command can select differences, then either Copy To Folder to a new .zip file target, or you can select the equal files and delete them from the zip, leaving any different files in the original zip.
        Aaron P Scooter Software

        • cyberchicken
          Enthusiast
          • Mar 2015
          • 26

          #5
          Originally posted by Aaron
          You could load the folder above the zips instead of the zips directly, and set the file name filter to then only show that one zip file. That way, an Expand All would show the files, but you can still Select the zip and Delete it with script commands.
          Good idea, but can I script BC to align two zips with different names? All zips are in the same folder and have a timestamp in the name.

          If it's the current base folder, you couldn't delete it from bcscripting, and would need to use a wrapper .bat file to first call the BCScript, then delete the zip using the .bat scripting.
          Yes.

          How would you like to "leave" different files? The select command can select differences, then either Copy To Folder to a new .zip file target, or you can select the equal files and delete them from the zip, leaving any different files in the original zip.
          I would delete the equal files; the remaining ones would be the "increment" or the "difference" in the backup.

          Thank you for the support!

          • Aaron
            Team Scooter
            • Oct 2007
            • 16007

            #6
            Alignment Overrides can be done in the graphical interface if you have a Pro license, but we do not support them in scripting yet. If the file names are static, you could save a Folder Compare session that points to and aligns your files, but if they are in flux I'd recommend the second (.bat wrapper) method.

            I can't think of an easy way to know if, when deleting the files within the zip it empties the zip and results in a 0 size file. If you set the zips as base folders, and compare the files within, you can select equal files and then delete them in scripting. Since script has no preview, and we do not support Undo, I would recommend testing with test files first while learning scripting to prevent losing any data unintentionally. You could then run reports on the resulting zips, and parse those reports to see if the zip is empty to then determine if you want to delete it? You may want to consider using the criteria command to enable a content comparison scan (Binary, CRC, or rules-based) to determine if the files are different.
            Aaron P Scooter Software

            • cyberchicken
              Enthusiast
              • Mar 2015
              • 26

              #7
              The tide has changed and my time has evaporated. I'll report back when I get to it!

              Originally posted by Aaron
              If the file names are static, you could save a Folder Compare session that points to and aligns your files, but if they are in flux I'd recommend the second (.bat wrapper) method.
              Yes, the wrapper seems the way to go.

              I can't think of an easy way to know if, when deleting the files within the zip it empties the zip and results in a 0 size file.
              Actually, empty zips are a few bytes long. I expect every zipper to produce different empty zip files, so I wouldn't rely much on size detection, BUT in this specific application the zipper program will always be the same, so it would be viable.
              As a fallback, I think I can find a tool for inspecting zip files.
              Now that I think about it: who cares? In the end my aim is to save space, so I could just leave the empty zip files alone...
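
One way to avoid relying on file size is to ask the archive itself whether it still lists any files. The sketch below does that with Python's standard `zipfile` module; it is an illustration outside Beyond Compare, and the name `is_empty_zip` is made up here.

```python
import zipfile


def is_empty_zip(source):
    """True when the archive lists no file members.

    Directory-only entries are not counted as content, so an archive
    holding nothing but empty folders is also reported as empty.
    """
    with zipfile.ZipFile(source) as zf:
        return all(info.is_dir() for info in zf.infolist())
```

Because the check reads the archive's own directory, it does not depend on how a particular zipper writes an empty archive.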

              If you set the zips as base folders, and compare the files within, you can select equal files and then delete them in scripting.
              I will do exactly that.

              Since script has no preview, and we do not support Undo, I would recommend testing with test files first while learning scripting to prevent losing any data unintentionally.
              Sure as hell I will test

              You could then run reports on the resulting zips, and parse those reports to see if the zip is empty to then determine if you want to delete it? You may want to consider using the criteria command to enable a content comparison scan (Binary, CRC, or rules-based) to determine if the files are different.
              Yes, also that.

              Thank you for the support!
              Last edited by cyberchicken; 17-Jun-2015, 04:50 AM.

              • cyberchicken
                Enthusiast
                • Mar 2015
                • 26

                #8
                Btw, I found this:
                https://www.mkssoftware.com/docs/man1/zipinfo.1.asp

                • Aaron
                  Team Scooter
                  • Oct 2007
                  • 16007

                  #9
                  Useful. You could also generate a Folder-Report Summary report on the .zip to get a file listing, but the command line is probably quicker since you just need a quick count, not a saved file listing.
                  Aaron P Scooter Software
