filter out binary files for folder compare report

  • peakjason
    Visitor
    • Jul 2015
    • 4

    I want to produce a folder compare report using the Statistics option with CSV output, where each file is listed in a row along with the number of lines added, deleted, changed, etc. (IAdded, IChanged, etc). The folders I'm comparing are a mixture of text files and binary files. When the selection includes a binary file, BC does not allow you to select the Statistics report in the report dialog. It seems like all those useful statistics are only calculated for text files.

    If I was just comparing a couple folders individually, then I would sort the comparison results in the folder and only select the text files for the report, then I could choose the Folder Compare>Statistics option. But I have a lot of code repositories with many subdirectories and I can't do this individually for it all. Is there some filter I can make that will let me hide the binary files, so I can run the Statistics report easily? Maybe a regular expression with a hex range??

    Alternatively, is there a way that I can get the statistics report option to include the binary files, but just leave the IAdded/IChanged data for those files blank?

    Thanks for any help. This has bugged me for a couple projects now so I'd really like to solve it.
  • Aaron
    Team Scooter
    • Oct 2007
    • 15941

    #2
    Hello,

    As you've found, the Actions menu -> File Compare Report dialog works on the current selection, and determines the report options available to the selected items. We'll show all relevant text-report options if only text files are included, but once other file types are (binary, mp3, picture, etc), then the dialog represents the options available to a mixed report.

    We don't have a method to automatically remove binary files or include them in a text report, but you could use file Name Filters. These are defined on the toolbar in the Filters box or in the Session menu -> Session Settings, Name Filters tab. Here you can either exclude all of the binary file extensions you are working with or include only the text extensions you are working with. That leaves the view with only the appropriate files. Then use Edit menu -> Expand All, Select All Files, and generate the Statistics report on a selection of only text files.
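
    For a large set of repositories, one way to find out which extensions need excluding is to inventory them first. The script below is only a rough sketch and is not part of Beyond Compare: `KNOWN_BINARY_EXTS`, `count_extensions`, and `build_exclude_masks` are hypothetical names, the starting extension list is a guess you would need to adjust, and joining the masks with semicolons is an assumption about what the Filters box accepts, so check the result against the Name Filters help before relying on it.

```python
import os
import sys
from collections import Counter

# Hypothetical starting list; extend it with the binary extensions in your own repositories.
KNOWN_BINARY_EXTS = {
    ".exe", ".dll", ".so", ".o", ".obj", ".lib",
    ".png", ".jpg", ".gif", ".ico", ".zip", ".pdf", ".mp3",
}


def count_extensions(root):
    """Walk root and count how many files use each lowercased extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts


def build_exclude_masks(extensions, binary_exts=KNOWN_BINARY_EXTS):
    """Return sorted wildcard masks such as '*.png' for extensions treated as binary."""
    return sorted("*" + ext for ext in set(extensions) if ext in binary_exts)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = count_extensions(root)
    # Print every extension found so unknown ones can be reviewed by hand.
    for ext, n in counts.most_common():
        print(f"{ext or '(no extension)'}\t{n}")
    # Assumption: masks are separated by semicolons when pasted into an exclude filter.
    print(";".join(build_exclude_masks(counts)))
```

    Run it against the root of a repository, review the extension counts for anything the starting list misses, and paste the final mask list into the exclude side of the filter.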
    Aaron P Scooter Software

    • peakjason
      Visitor
      • Jul 2015
      • 4

      #3
      Thanks for the suggestion. Is there a way I could write a regular expression to find only the text files, or alternatively reject all the binary files? If I just look for all the ASCII codes up to 128, will that knock out any Unicode text files I'd actually want to see?

      • Aaron
        Team Scooter
        • Oct 2007
        • 15941

        #4
        Hello,

        Using the Other Filters tab, Containing/Not Containing filter? Perhaps, although the Containing filter is a demanding scan and will significantly increase the scan time. Of note: the Name Filters dropdown can quickly swap between presets, and presets can be defined and added to the list in the Options dialog, Tweaks section.
        Aaron P Scooter Software

        • peakjason
          Visitor
          • Jul 2015
          • 4

          #5
          Sure, I understand how to access the feature. I guess what I'm asking is what criteria does BC use to decide if a file is binary, and what regex should I use to mimic that same decision?

          • Aaron
            Team Scooter
            • Oct 2007
            • 15941

            #6
            BC uses a heuristic to try to determine if a file is binary, which requires opening and scanning the file. We don't have a RegEx to emulate this behavior, since the heuristic is capable of sampling sections or stopping once it makes a determination, whereas the Containing filter must scan the entire file (for control characters: 0x00-0x1F) to determine whether it contains specific text. The Containing filter is not designed to help determine if a file is text or binary.

            But before BC4 uses the heuristic, we have File Formats which catch files of specific extensions. These formats then control if the Text Compare or Hex Compare (binary) is used, and the heuristic is used if the file extension falls through all defined formats in the Tools menu -> File Formats list.

            We don't currently support a filter to scan and include or exclude files that BC determines are binary. In the current version of BC4, the way to flip between these types of files is the Name Filters dropdown, which can be populated with presets (defaults include C++/C# Code, Delphi, HTML Web development, etc.), and additional presets can be added.
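
            For anyone who wants to pre-classify files outside of BC, a sampling check along these lines is a common approach. To be clear, this is not BC's heuristic, which is not documented in this thread: the function names, the 8192-byte sample size, and the 10% threshold are all assumptions for illustration. It also shows why matching "ASCII codes up to 128" is too strict: UTF-8 text contains bytes above 0x7F, and UTF-16 text contains NUL bytes, so the sketch looks for a byte order mark first and only counts control bytes, not high bytes.

```python
import codecs

# Byte order marks that signal Unicode text; UTF-16/32 text legitimately contains NUL bytes.
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)

# Control bytes in 0x00-0x1F other than tab, line feed, form feed, and carriage return.
_SUSPICIOUS = bytes(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0C, 0x0D))


def looks_binary(sample: bytes, threshold: float = 0.10) -> bool:
    """Guess whether a byte sample is binary. An illustrative rule, not BC's own."""
    if not sample:
        return False  # treat empty files as text
    if sample.startswith(_BOMS):
        return False  # a BOM marks Unicode text
    if b"\x00" in sample:
        return True  # NUL without a BOM is a strong binary signal
    suspicious = sum(1 for b in sample if b in _SUSPICIOUS)
    return suspicious / len(sample) > threshold


def file_looks_binary(path, sample_size=8192):
    """Read only the first sample_size bytes, so large files are not scanned in full."""
    with open(path, "rb") as f:
        return looks_binary(f.read(sample_size))
```

            Files flagged this way could then be collected by extension and added to a Name Filters preset, since BC itself has no filter that uses its binary determination.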
            Aaron P Scooter Software
