View - Hex details - does NOT show BOM

  • h2o2
    Enthusiast
    • Nov 2020
    • 22

    View - Hex details - does NOT show BOM

    1. Create a new text file with a BOM

    ***

    2. Start "New Session" - "Text Compare"
    3. Load this file
    4. Press "View" - "Hex details"

    The Hex view at the bottom is incomplete: the BOM is missing.

    ***

    2. Start "New Session" - "Hex Compare"
    3. Load this file

    The BOM is shown, OK.

    ***

    Can you fix it?
    Hex is Hex. All bytes need to be shown.
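
For readers who want to confirm outside Beyond Compare that a file really starts with a BOM, here is a minimal Python sketch. The helper names `detect_bom` and `hex_bytes` are made up for illustration and are not part of any Beyond Compare API; the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Known byte-order marks. The UTF-32 marks come first because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return (name, bom_bytes) for a leading BOM, or (None, b"") if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, bom
    return None, b""


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return data.hex(" ").upper()


sample = codecs.BOM_UTF8 + b"abc\n"   # hypothetical file content
name, bom = detect_bom(sample)
print(name, hex_bytes(bom))           # the UTF-8 BOM is the three bytes EF BB BF
print(hex_bytes(sample))
```

To check a real file, read it with `open(path, "rb").read()` and pass the bytes to `detect_bom`.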
    Last edited by h2o2; 06-Nov-2020, 01:10 PM.
  • Aaron
    Team Scooter
    • Oct 2007
    • 16000

    #2
    Hello,

    The Hex Details line in the Text Compare only shows the Hex for the visually selected line. The BOM details are included in the Header area and underlined red. In your screenshot, your header is very narrow, but the "U" is the beginning of this phrasing. I think if it were a little wider it would be easier to spot. You can also hover over this status bar for a pop-up.

    The Text Compare is focused on comparing visible Text data. A difference in BOM (or other encoding differences) does not mark the files as Rules-based different, so the header area is just informational, much like the timestamp. A Binary scan from the Folder Compare would mark the files as different, as would the Hex Compare, which compares every byte of the file and not just the Text data.

    You can launch a Hex Compare from a Text Compare if you want to dig deeper into the entire file from the Session menu -> Compare Files Using -> Hex Compare.
    Aaron P Scooter Software


    • h2o2
      Enthusiast
      • Nov 2020
      • 22

      #3
      Hi!

      Sorry, but you do not understand what you are writing about. Let me show you.

      I do not have a Mac or Mac files, so I am working with UNIX and Windows files only.
      They may be with or without a BOM.
      As a result we have four types of files:
      1) With BOM + UNIX line endings
      2) With BOM + Windows line endings
      3) Without BOM + UNIX line endings
      4) Without BOM + Windows line endings

      Let's make these files, all in UTF-8.
      I made files with five lines each.
      They are visually identical but all differ in size.
      The BOM adds 3 bytes.
      Every Windows line ending adds 1 byte per line (5 bytes in my example).
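
The size arithmetic above can be reproduced with a short Python sketch. The five-line content in `LINES` is a stand-in, since the attached files are not available here; only the size differences matter (3 bytes for the BOM, 1 byte per line for Windows line endings).

```python
import codecs

# Hypothetical five-line content; the poster's real files are not available.
LINES = ["line 1", "line 2", "line 3", "line 4", "line 5"]


def build(with_bom: bool, eol: str) -> bytes:
    """Encode the lines as UTF-8, ending every line with eol, optionally with a BOM."""
    body = "".join(line + eol for line in LINES).encode("utf-8")
    return (codecs.BOM_UTF8 if with_bom else b"") + body


variants = {
    "BOM + UNIX": build(True, "\n"),
    "BOM + Windows": build(True, "\r\n"),
    "No BOM + UNIX": build(False, "\n"),
    "No BOM + Windows": build(False, "\r\n"),
}

for name, data in variants.items():
    print(f"{name:18} {len(data)} bytes")
```

With this content the four sizes are 38, 43, 35 and 40 bytes: the text is the same in every variant, but no two files have the same length.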

      • h2o2
        Enthusiast
        • Nov 2020
        • 22

        #4
        Originally posted by Aaron
        The Hex Details line in the Text Compare only shows the Hex for the visually selected line.
        Mistake 1

        Hex Details shows the difference in line endings. They are not "visually selected" as you wrote, but we still see them.
        Let's compare any files with UNIX and Windows line endings. With or without a BOM does not matter.
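
As an illustration of what a per-line hex view contains when line endings are included, here is a small Python sketch. The function name `hex_line` is hypothetical, and the output format only approximates a hex pane; it is not how Beyond Compare renders Hex Details.

```python
def hex_line(text: str, eol: str) -> str:
    """Return the UTF-8 bytes of one line, including its line ending, as hex pairs."""
    return (text + eol).encode("utf-8").hex(" ").upper()


print(hex_line("abc", "\n"))    # 61 62 63 0A
print(hex_line("abc", "\r\n"))  # 61 62 63 0D 0A
```

The UNIX line ends in the single byte 0A, the Windows line in the pair 0D 0A, which is the one-byte-per-line difference visible in the hex.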

        Last edited by h2o2; 10-Nov-2020, 04:20 AM.


        • h2o2
          Enthusiast
          • Nov 2020
          • 22

          #5
          Originally posted by Aaron
          Hello,
          The BOM details are included in the Header area and underlined red. In your screenshot, your header is very narrow, but the "U" is the beginning of this phrasing. I think if it were a little wider it would be easier to spot.
          Mistake 2

          This is the word UNIX.
          And this is not the BOM. It shows the type of line ending (UNIX or Windows, which your program calls PC).

          Let's compare files with and without a BOM.
          In the first pair, both files have UNIX line endings.
          In the second pair, both files have Windows line endings.
          As you can see, the word UNIX shows the type of line ending. It is not connected with the presence of a BOM.

          BUG detected!
          On files with a BOM we see the number 4294967292 instead of the encoding.
          (This number looks like an internal program error: it is just below 4294967295, the maximum value of an unsigned 32-bit integer.)
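
A quick Python check of that number, offered only as arithmetic: what the value means inside Beyond Compare is not known from this thread. It shows that 4294967292 is 0xFFFFFFFC, three below the unsigned 32-bit maximum, and that the same bit pattern read as a signed 32-bit integer is -4.

```python
import struct

value = 4294967292

print(hex(value))            # 0xfffffffc
print(2**32 - 1 - value)     # 3, the distance from the unsigned 32-bit maximum

# Reinterpret the same four bytes as a signed 32-bit integer.
signed = struct.unpack("<i", struct.pack("<I", value))[0]
print(signed)                # -4
```

A small negative number displayed as unsigned would produce exactly this kind of value, which is one plausible (but unconfirmed) explanation for the display.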




          Last edited by h2o2; 10-Nov-2020, 04:23 AM.


          • h2o2
            Enthusiast
            • Nov 2020
            • 22

            #6
            To summarize this thread, for the "Text Compare" tab:

            Two bugs detected:
            1) Hex Details shows invisible characters (such as line endings) but omits the BOM
            2) On files with a BOM we see the number 4294967292 instead of the encoding

            There is also a wish.
            Files can differ by:
            - BOM presence
            - Type of line endings
            - Encoding
            Currently the program shows only 2 of these 3 file properties.
            The program shows the type of line endings as the word "UNIX" or "PC".
            The program shows the encoding as "Current locale (UTF-8)".
            But the program does not show the BOM anywhere.

            I need to see the BOM in Hex Details and in the status bar ("Header area and underlined red", as you wrote).






            • Aaron
              Team Scooter
              • Oct 2007
              • 16000

              #7
              Hello,

              There's a lot to go through. I'll try to tackle each point, and let me know if I miss anything:

              1) View menu -> Visible Whitespace. This will allow the Line Ending characters to be visually represented in the file (besides breaking the text data). Whitespace and Linebreak characters are hidden from view by default to make the file more readable, but can be enabled with this toggle option. BC4 always considers the line break important and breaks the line, but you can configure if BC4 considers a difference in type (Win vs Unix vs Old Mac) as Important or Unimportant. Each is represented by a different Visible Whitespace character, and the Importance is controlled in the Session Settings.

              2) That status bar Max Int does look like a bug in the display. I'll open a tracker to investigate why it isn't showing the correct title.

              3) The program does show the BOM in the status bar, but not in the Text Compare's comparison pane (or in the Hex Details, which shows the details only for the currently selected line). For example, if you were to compare Word or PDF files, the Text Compare also presents the data as text, but removes all font information, pictures, formatting, etc. The Text Compare is designed to allow files with different encodings to be compared against each other and to evaluate their text differences, but it does not have options for marking an encoding difference as a rules-based difference. Only a Binary scan (or CRC scan) would determine if non-text information in the file is different.
              Aaron P Scooter Software


              • h2o2
                Enthusiast
                • Nov 2020
                • 22

                #8
                Originally posted by Aaron
                1) View menu -> Visible Whitespace.
                We are talking about the Hex Preview.
                But now you are trying to change the topic to the text editor. The text editor is OK.
                Line endings are shown in the Hex Preview regardless of the "Visible Whitespace" setting.

                Originally posted by Aaron
                3) The program does show the BOM in the status bar, but not in the text compare's comparison pane (or in the Hex Details, which is the details only for the currently selected line).
                Which status bar?
                Can you post a screenshot?

                Originally posted by Aaron
                The Text Compare is designed to allow different encoding files to be compared against each other and evaluate their text differences, but does not have options for marking an encoding differences as a rules-based difference. Only a Binary scan (or CRC scan) would determine if non-text information in the file is different.
                The BOM takes priority over the text.
                This means that information about the BOM must be reflected in the Text Compare tab.

                https://www.w3.org/International/que...yte-order-mark
                "Changes introduced with HTML5 mean that the byte-order mark overrides any encoding declaration in the HTTP header when detecting the encoding of an HTML page. This can be very useful when the author of the page cannot control the character encoding setting of the server, or is unaware of its effect, and the server is declaring pages to be in an encoding other than UTF-8. If the BOM has a higher precedence than the HTTP headers, the page should be correctly identified as UTF-8."


                • Aaron
                  Team Scooter
                  • Oct 2007
                  • 16000

                  #9
                  BC4 respects and uses the BOM to load the files if it is present, otherwise it attempts detection. This is displayed on the status bar, which you have screenshots of where max int is displaying. That text should be "UTF-8 with BOM". If you open other combinations, I suspect other titles will display correctly (and UTF-8 with BOM does display on other platforms). For example, you can see "ANSI" in the Comparing Text Files screenshot here:
                  http://www.scootersoftware.com/features.php

                  Here's a quick screenshot from Windows that shows a difference between UTF-8 with BOM and UTF-8, underlined red.

                  I opened a bug tracker entry to look into why the status bar is showing MaxInt instead of the encoding name. It should be "UTF-8 with BOM".

                  Whitespace and Linebreaks aren't visible unless View Whitespace is enabled, but they are still part of the comparison. If they were Important (configuration) and a difference, the line would be red but the character not shown on screen unless you toggle on the option to view it. This applies to spaces, tabs, or line breaks. The Encoding status is only at the top of the file in the status bar, and not part of the rules-based comparison. Since it is pulled out to allow files of different encodings to report as equal if the text is equal, then it isn't in the Hex Details of Line 1 either. The only way to see that as Hex is to use the Hex Compare, which does compare the entire file as Hex.
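
The behaviour described above (encoding information is pulled out before the text is compared) can be sketched in Python. This is an illustration of the idea only, not Beyond Compare's implementation; `load_text` and its labels are hypothetical.

```python
import codecs


def load_text(data: bytes):
    """Split off a UTF-8 BOM, if present, and return (encoding_label, text)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM", data[len(codecs.BOM_UTF8):].decode("utf-8")
    return "UTF-8", data.decode("utf-8")


left = codecs.BOM_UTF8 + "hello\nworld\n".encode("utf-8")   # file with BOM
right = "hello\nworld\n".encode("utf-8")                     # file without BOM

left_label, left_text = load_text(left)
right_label, right_text = load_text(right)

print(left_label, "vs", right_label)          # labels differ (status-bar style information)
print("text equal: ", left_text == right_text)  # True: a text comparison sees no difference
print("bytes equal:", left == right)            # False: a binary or hex comparison does
```

Because the BOM is consumed while loading, it never becomes part of line 1 of the text, which matches the explanation of why it does not appear in the per-line hex and only a byte-level comparison reveals it.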
                  Aaron P Scooter Software


                  • h2o2
                    Enthusiast
                    • Nov 2020
                    • 22

                    #10
                    Ok, thanks
