HTML v Text Format

  • chrisjj
    Carpal Tunnel
    • Apr 2008
    • 2537

    If I compare a .htm file with a .txt file of identical content, I get differences like this:



    (which of course disappear if I change either side's format to match the other's.)

    Why the differences, please?
    Last edited by chrisjj; 20-Aug-2009, 04:50 PM.
  • Michael Bulgrien
    Carpal Tunnel
    • Oct 2007
    • 1772

    #2
    Because the comparison is rules-based, and the "Text Format" format uses different rules than the "HTML" format.

    You need to have both sides evaluated using the same file format for differences to be evaluated correctly. Using the dropdown on the File Info pane to change the text file's format from "Text Format" to "HTML" will ensure that you are performing an "apples to apples" comparison.
    BC v4.0.7 build 19761

    • Aaron
      Team Scooter
      • Oct 2007
      • 16002

      #3
      Michael's suggestion will help.

      It is marked as different because text in different Grammar elements is considered different.
      So if, in one file, a String is defined as running from " to ", and the other file has no quotes, then:
      This Text is Unequal
      vs
      "This Text is Unequal"

      will be marked as completely red, since the String is not equal to 'Everything Else' text.

      This happens if you mix and match file formats that do not define the same grammars. However, it is perfectly valid to compare two different formats if they both define the same Grammar element names. For example, if in Java a String runs from ' to ', and in C# a String runs from " to ", then:
      'This text is equal'
      to
      "This text is equal"

      will be equal.
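To make the rule concrete, here is a minimal Python sketch of a grammar-aware line comparison. It is a toy model, not Beyond Compare's implementation: the `Format` class, the format names, and the assumption that only the text between the delimiters is compared are all hypothetical. Each line is split into (grammar element, content) segments, and two lines match only if both the element names and the contents match.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Tuple[str, str]  # (grammar element name, content)


@dataclass(frozen=True)
class Format:
    """Hypothetical file format: a name plus an optional String delimiter."""
    name: str
    string_quote: Optional[str]  # None: the format defines no String element


def tokenize(text: str, fmt: Format) -> List[Segment]:
    """Split a line into (grammar element, content) segments."""
    if fmt.string_quote is None:
        # No grammar rules: the whole line is "Everything Else".
        return [("Everything Else", text)] if text else []
    q = re.escape(fmt.string_quote)
    segments: List[Segment] = []
    pos = 0
    for m in re.finditer(f"{q}(.*?){q}", text):
        if m.start() > pos:
            segments.append(("Everything Else", text[pos:m.start()]))
        # Assumption: the delimiters themselves are not compared.
        segments.append(("String", m.group(1)))
        pos = m.end()
    if pos < len(text):
        segments.append(("Everything Else", text[pos:]))
    return segments


def lines_equal(left: str, left_fmt: Format, right: str, right_fmt: Format) -> bool:
    """Equal only if every segment has the same element name and content."""
    return tokenize(left, left_fmt) == tokenize(right, right_fmt)


# Hypothetical formats mirroring the examples above.
TEXT = Format("Text", None)
HTML_LIKE = Format("HTML-like", '"')
JAVA = Format("Java", "'")
CSHARP = Format("C#", '"')

print(lines_equal("This Text is Unequal", TEXT, '"This Text is Unequal"', HTML_LIKE))  # False
print(lines_equal("'This text is equal'", JAVA, '"This text is equal"', CSHARP))  # True
```

In this model the unquoted line is unequal to the quoted one because one side yields an "Everything Else" segment and the other a "String" segment, while the Java and C# lines compare equal because both sides yield a String element with the same content.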

      I hope that helps clear up what is happening. Let us know if you have any questions.
      Aaron P Scooter Software


      • Aaron
        Team Scooter
        • Oct 2007
        • 16002

        #4
        Also note, you can click into the text with the blinking selection cursor, and the status bar will show the name of the grammar element you are currently viewing.

        • chrisjj
          Carpal Tunnel
          • Apr 2008
          • 2537

          #5
          > You need to have both sides evaluated using the same file format
          > for differences to be evaluated correctly.

          OK... so why does BC default to formats that give incorrect results?


          • Michael Bulgrien
            Carpal Tunnel
            • Oct 2007
            • 1772

            #6
            BC3 did not choose to evaluate two different kinds of files. You are the one who chose files with different extensions.

            • chrisjj
              Carpal Tunnel
              • Apr 2008
              • 2537

              #7
              > BC3 did not choose to evaluate two different kinds of files.

              I didn't say it did.

              > You are the one who chose files with different extensions.

              So what??

              It was BC that defaulted to formats that give incorrect results.


              • Aaron
                Team Scooter
                • Oct 2007
                • 16002

                #8
                chrisjj, your screenshots are incomplete. Michael may be assuming you are comparing two html files, and used the dropdown to select different formats manually.

                Beyond Compare uses the topmost file format whose file mask matches the file type. If both of your files are *.html, then you must have used the dropdown or session settings to change a file format.
                If one file is *.html and the other has a different extension, then that other file is using the topmost format defined for its type. From your screenshot we can guess you are comparing a .txt file containing html code to an .html file containing html code, assuming you are using the default setting. A full-screen screenshot would make it easier to tell which case applies. Emailing us example files and your BCSupport package would let us verify for certain.

                Our Text format, by default, is not going to contain the same grammar definitions as an html file. A normal text file does not have Keywords or the concept of Comments.

                All of this is customizable, however. If you want to change the default behavior to fit your workflow, or create another .txt rule to use in these scenarios, you can do so easily: clone your HTML rule to create a custom rule for .txt files, or add .txt to your HTML file format.
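The "topmost matching format wins" rule, and the effect of adding .txt to the HTML format, can be illustrated with a small Python sketch. This is a hypothetical model of the lookup order described in this thread, not Beyond Compare's code; the format names, mask lists, and the `pick_format` helper are illustrative assumptions, and `None` stands in for the <default> rules.

```python
from fnmatch import fnmatch
from typing import List, Optional, Tuple

# Hypothetical file-format list in priority order (topmost first).
# Names and masks are illustrative, not Beyond Compare's shipped defaults.
FormatList = List[Tuple[str, List[str]]]

DEFAULTS: FormatList = [
    ("HTML", ["*.htm", "*.html"]),
]


def pick_format(filename: str, formats: FormatList) -> Optional[str]:
    """Return the topmost format whose mask matches, or None for <default>."""
    name_lc = filename.lower()
    for name, masks in formats:
        if any(fnmatch(name_lc, mask) for mask in masks):
            return name
    return None


# A user-created "Text Format" that claims *.txt, listed below HTML.
WITH_TEXT_FORMAT: FormatList = DEFAULTS + [("Text Format", ["*.txt"])]

# The workaround described above: add *.txt to the HTML format's masks.
HTML_CLAIMS_TXT: FormatList = [("HTML", ["*.htm", "*.html", "*.txt"])] + WITH_TEXT_FORMAT[1:]

print(pick_format("page.htm", DEFAULTS))          # HTML
print(pick_format("page.txt", DEFAULTS))          # None (falls back to <default>)
print(pick_format("page.txt", WITH_TEXT_FORMAT))  # Text Format
print(pick_format("page.txt", HTML_CLAIMS_TXT))   # HTML (topmost match wins)
```

Because the search stops at the first match, order matters: once *.txt is added to the HTML entry, that entry shadows any later format that also claims .txt.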

                If you are still having trouble setting this up, please let us know.

                • chrisjj
                  Carpal Tunnel
                  • Apr 2008
                  • 2537

                  #9
                  > Michael may be assuming you are comparing two html files, and
                  > used the dropdown to select different formats manually.

                  I doubt that, given I said:

                  >>> It was BC that defaulted to formats that give incorrect results.

                  > you are comparing a .txt file containing html code to an .html file
                  > containing html code, assuming you are using the default setting.

                  Correct.

                  > If you are still having trouble setting this up, please let us know.

                  The only trouble I'm having is understanding why in such cases BC defaults to formats that give incorrect results.


                  • Aaron
                    Team Scooter
                    • Oct 2007
                    • 16002

                    #10
                    Chrisjj,

                    Our default behavior for .txt files is actually to use the <default> file format rules. These then defer to the other side's format if that side has a defined rule: in this case, HTML.
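The fallback described here can be modeled in a few lines of Python. This is a hypothetical sketch of the behavior as explained in this post, not Beyond Compare's implementation; the `resolve_pair` helper is an assumption, and `None` stands for "no file format matched this side" (the <default> rules).

```python
from typing import Optional, Tuple


def resolve_pair(left: Optional[str], right: Optional[str]) -> Tuple[str, str]:
    """Hypothetical model of <default> fallback.

    None means no file format matched that side (the <default> rules).
    A <default> side borrows the other side's format when it has one.
    """
    if left is None:
        return (right, right) if right is not None else ("<default>", "<default>")
    if right is None:
        return (left, left)
    return (left, right)


# .txt matched nothing, .html matched HTML: both sides use HTML.
print(resolve_pair(None, "HTML"))  # ('HTML', 'HTML')
# A custom "Text Format" claims .txt: the formats differ, so the grammars differ.
print(resolve_pair("Text Format", "HTML"))  # ('Text Format', 'HTML')
```

Under this model, the mismatch only appears when some format explicitly claims .txt, which is why disabling such a format restores the HTML-on-both-sides behavior.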

                    If you email your support package to [email protected] we could verify this, but it looks like you may have created a Text Format file format associated with text files.

                    If you have a text file, BC cannot predict if the text inside is html code, c# code, php code, or pseudo-code. We use the file extension to determine how to treat the file; in this case it is .txt or .html. Auto-detecting code structure is something that is on our wishlist, but is not supported in the current version.

                    A .txt file probably contains no specific grammar elements, because .txt files do not have a syntax associated with them. If you open your file and recognize that it can use a different file format, you can switch to it with the dropdown, or use Session Settings to save this for future cases.


                    The reason you are seeing it marked as a difference, as mentioned above, is because the grammar elements have to match to be considered the same. Since a .txt file probably has no grammar elements, it can't match to anything that is a grammar element.
                    If you were comparing a C# file to an HTML file, and the text was both considered a "String", then it could potentially be equal text.

                    Disable your custom Text Format file format (uncheck it under the Tools menu -> File Formats) and you should see the correct default behavior.

                    If you are still having any trouble, please email your support package (Help menu -> Support; Export), and a pair of example files to [email protected]

                    Please include a link back to this forum post. Thanks.