FolderCompare with File Diff only in Encoding

  • mwatts
    Visitor
    • Mar 2008
    • 4

    I run folder diffs on some open source repos, where each repository contains a different version of the same source tree. The Folder Compare kept reporting differences for files that, when I opened them in the file compare, appeared to have exactly the same content. Note: I am using binary compare.

    I finally figured out what is happening. They do have the same 'content' but with different encodings. One is ASCII and the other is UTF-8. It would be helpful to have a couple of ways to handle that.

    1) Ignore encoding differences only (upconvert ASCII to UTF-8 and compare, for example)
    2) Something that clearly indicates in the file compare that they are different encodings. The encoding at the top of the file IS different, but it's not highlighted as different, so my eye tends to ignore it.
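For illustration, option 1 could be sketched roughly as follows. This is only a minimal sketch, not how Cirrus works internally: it assumes the only encodings involved are ASCII and UTF-8 (with or without a BOM), and the helper names `decode_text` and `same_content_ignoring_encoding` are made up for the example.

```python
import codecs


def decode_text(data: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading UTF-8 BOM if one is present."""
    # 'utf-8-sig' strips the BOM when present and acts like plain 'utf-8' otherwise.
    # Pure ASCII bytes are also valid UTF-8, so one decoder covers both cases.
    return data.decode("utf-8-sig")


def same_content_ignoring_encoding(a: bytes, b: bytes) -> bool:
    """Return True when both byte strings decode to the same text."""
    if a == b:
        return True  # byte-identical, nothing more to check
    try:
        return decode_text(a) == decode_text(b)
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as different.
        return False


# Hypothetical file contents: same text, one saved as ASCII, one as UTF-8 with BOM.
ascii_file = "int main() { return 0; }\n".encode("ascii")
utf8_file = codecs.BOM_UTF8 + "int main() { return 0; }\n".encode("utf-8")

print(ascii_file == utf8_file)                                # binary compare
print(same_content_ignoring_encoding(ascii_file, utf8_file))  # encoding ignored
```

A binary compare sees the two buffers as different, while the decoded text is equal, which matches the behaviour described above.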

    Thanks
    -mark
  • Michael Bulgrien
    Carpal Tunnel
    • Oct 2007
    • 1772

    #2
    Sounds like a candidate for a new content compare rule under folder session settings (comparison tab):

    [ ] Ignore encoding differences

    Of course, this would be as slow as a binary compare since Cirrus would need to verify the content of the files before eliminating the encoding difference.

    Regarding the text compare (or binary compare in Mark's case), this is an interesting phenomenon. From the folder compare, two identical files (except for encoding differences) show as different for:


    CRC Content Compares
    Binary Content Compares


    And show as identical for:

    Quick Compares
    When Opened in a file compare window


    If a Content Compare has set the "not equal" icon in the margin between the two files, a quick compare will not reset the icon to an "equal" icon... but double-clicking on the file pair (or choosing "Open View" from the Quick Compare dialog) does cause the "not equal" icon to be replaced with the "equal" icon. It seems a little odd for a content compare to report something different than an open file compare.

    While I'm not sure it makes much sense to change how encoding is reported when a file compare is open (the encoding dropdown in the file info panel seems sufficient to me) I think it would make sense to report the encoding difference in the Quick Compare dialog since a Quick compare seems to know that the content is the same. Current dialog says:


    Path1\File1.ext
    Path2\File2.ext

    Quick Compare:
    = Same

    Perhaps this same dialog could list the encodings as follows:

    [ANSI] Path1\File1.ext
    [UTF8] Path2\File2.ext

    Quick Compare:
    = Same

    Or, the dialog could differentiate between encoding and content as follows:

    Quick Compare:
    ≠ Encoding Different
    = Content Same
    BC v4.0.7 build 19761
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    • Chris
      Team Scooter
      • Oct 2007
      • 5538

      #3
      Mark,

      Instead of using binary, why not use rules-based comparison in the Folder Compare? A rules-based comparison will ignore the character encoding differences.

      Also, the text compare is intended to only show text differences using the default settings. This includes ignoring differences in character encoding, and ignoring differences in line terminator style (Unix line endings vs Windows line endings).
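As a rough illustration of what "ignoring character encoding and line terminator style" means in practice, here is a minimal sketch. It is not Cirrus's actual implementation; it assumes UTF-8 or ASCII input, and `normalize_text` and `text_equal` are hypothetical names used only for this example.

```python
def normalize_text(data: bytes) -> str:
    """Decode to text and convert all line endings to '\n'."""
    text = data.decode("utf-8-sig")  # drops a UTF-8 BOM if present
    # Handle Windows (CRLF) first, then any remaining bare CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def text_equal(a: bytes, b: bytes) -> bool:
    """Compare two files as text, ignoring BOM and line terminator style."""
    return normalize_text(a) == normalize_text(b)


# Hypothetical file contents: Unix endings without BOM vs Windows endings with BOM.
unix_file = b"alpha\nbeta\n"
windows_file = b"\xef\xbb\xbfalpha\r\nbeta\r\n"

print(unix_file == windows_file)            # binary criteria: different
print(text_equal(unix_file, windows_file))  # text criteria: same
```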

      If you want to see non-text differences show as a difference, then binary comparison criteria is appropriate instead of a rules-based or text comparison.
      Chris K Scooter Software

      • Chris
        Team Scooter
        • Oct 2007
        • 5538

        #4
        Michael,

        What would "ignore character encoding differences" offer that isn't already available with rules-based comparison criteria?

        Also, the "Quick Compare" results can be either a binary comparison or rules-based comparison. From your description, it sounds like Cirrus is set to do a rules-based comparison for quick compares. To configure this, go to the "Startup" section of Cirrus' Tools|Options dialog.

        Again, this goes back to using binary criteria if you want to see character encoding differences as a difference, and rules-based comparison if you don't want to see character encoding differences as a difference.
        Chris K Scooter Software

        • Michael Bulgrien
          Carpal Tunnel
          • Oct 2007
          • 1772

          #5
          Originally posted by Chris
          What would "ignore character encoding differences" offer that isn't already available with rules-based comparison criteria?
          I've never used a "rules-based comparison". Correct me if I am wrong, but if I understand the purpose of a rules-based comparison, it is to compare the contents of files based on the rules of that file-type (including such things as text replacements, importance, etc.) in the evaluation of the files.

          Personally, I think that Mark's concern is valid. When the contents of two files are exactly the same (regardless of rules) and only the encoding is different, it would be convenient for Cirrus to be able to identify the encoding as the sole difference.

          Conversely, it would be extremely inconvenient to have to change one's default comparison rules simply to determine if encoding is the only difference between two files. The fact is that users that have their default content compare rules set will likely have them set in a certain way for a very good reason, and will not want to modify them (nor would it make sense for them to) simply because they come across a file pair with encoding differences. By the time they've figured out that encoding is the only difference, they won't need to reconfigure Cirrus to set up a rules-based comparison to confirm their findings. But they will already have wasted time that could have been avoided if Cirrus detected and reported the situation sooner.

          Originally posted by Chris
          Also, the "Quick Compare" results can be either a binary comparison or rules-based comparison. From your description, it sounds like Cirrus is set to do a rules-based comparison for quick compares. To configure this, go to the "Startup" section of Cirrus' Tools|Options dialog.
          Actually, no. My Startup options are set to perform a Binary Quick Compare. Since UTF-8 encoded files require no byte order mark (BOM), it is possible for the contents of an ANSI encoded file and a UTF-8 encoded file to be exactly the same, as was the case with the files I tested with.

          Originally posted by Chris
          Again, this goes back to using binary criteria if you want to see character encoding differences as a difference, and rules-based comparison if you don't want to see character encoding differences as a difference.
          As I said above, this would be useful information to know without having to change how all other files are compared. One should not have to reconfigure a product to evaluate all files differently just in case one might happen to come across folders with encoding differences. As Mark indicated in his original post, it would be useful to see this information regardless of how the product is configured for normal use. Seeing the encoding difference in the quick compare summary (even if it only appears when there is an encoding difference) would be a very useful enhancement.
          Last edited by Michael Bulgrien; 26-Mar-2008, 08:16 PM.
          BC v4.0.7 build 19761
          ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

          • Michael Bulgrien
            Carpal Tunnel
            • Oct 2007
            • 1772

            #6
            Originally posted by Michael Bulgrien
            Since UTF-8 encoded files require no byte order mark (BOM), it is possible for the contents of an ANSI encoded file and a UTF-8 encoded file to be exactly the same, as was the case with the files I tested with.
            Correction: My UTF-8 file does have a 3-byte BOM. That is what makes it binarily different from the ANSI file. My Quick Compare shows the files as the same even though my Startup options have Quick Compare type set to Binary.
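For anyone who wants to check this on their own files, here is a small sketch of the byte-level difference. The sample text is made up, and "ANSI" is assumed here to mean Windows code page 1252; for ASCII-only text the choice of code page does not change the bytes.

```python
import codecs

# Hypothetical file contents containing only ASCII characters.
ansi_bytes = "hello\n".encode("cp1252")
utf8_bom_bytes = "hello\n".encode("utf-8-sig")  # 'utf-8-sig' prepends the BOM

bom_length = len(codecs.BOM_UTF8)  # the UTF-8 BOM is the bytes EF BB BF

# True when the second buffer is exactly the BOM followed by the first buffer.
differs_only_by_bom = (
    utf8_bom_bytes.startswith(codecs.BOM_UTF8)
    and utf8_bom_bytes[bom_length:] == ansi_bytes
)

print(codecs.BOM_UTF8.hex())         # BOM bytes in hex
print(ansi_bytes == utf8_bom_bytes)  # binary compare of the two buffers
print(differs_only_by_bom)
```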
            BC v4.0.7 build 19761
            ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

            • Zoë
              Team Scooter
              • Oct 2007
              • 2666

              #7
              Apparently I wasn't watching this topic closely enough. The "Quick Compare" command always does a rules-based comparison. The dialog when you start a new comparison from the command line uses the setting in the "Startup" options. The defaults for that initial dialog are designed to get it out of the way as quickly as possible unless it's an exact match.
              Zoë P Scooter Software
