Bad UTF-8 char not handled well.

  • dmurdoch
    Visitor
    • Feb 2019
    • 7

    I was comparing two files that contained illegal UTF-8 chars (they were really some other encoding). Since my locale is set to UTF-8, these couldn't be displayed properly. Instead, they were displayed as a question mark "?". This caused two problems:

    - I couldn't see the bad char; it looks like a perfectly legal question mark. Could it be displayed in hex in a different colour? That's what "less" does.

    - The bad chars were the same in both files, so no difference was noted there. But they happened to be near a real difference, and I copied lines from one file to the other. Instead of the bad char, the question mark got copied, introducing a difference that wasn't there before. However, unless I set my encoding to something where those chars are legal (e.g. "Western European (ISO)"), BC doesn't display a difference between the files.
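For readers who want to see the effect outside Beyond Compare, here is a minimal Python sketch of the two problems described above. It is not how BC works internally; the sample bytes and variable names are made up for illustration. It assumes a line that is really ISO-8859-1 but gets read as UTF-8.

```python
# Hypothetical sample: 0xE9 is "é" in ISO-8859-1 but an invalid sequence in UTF-8.
raw = b"caf\xe9 au lait\n"

# Lossy display: the bad byte becomes U+FFFD (many UIs draw it as "?").
lossy = raw.decode("utf-8", errors="replace")

# Hex-style display, similar in spirit to "less": the bad byte stays visible.
visible = raw.decode("utf-8", errors="backslashreplace")

# Copying the lossy text and saving it does not give back the original bytes.
round_trip = lossy.encode("utf-8")

print(repr(lossy))
print(visible)
print(round_trip == raw)
```

The last line prints False: once the bad byte has been replaced for display, the original byte cannot be recovered from the displayed text.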
  • Aaron
    Team Scooter
    • Oct 2007
    • 16000

    #2
    Hello,

    Thanks for the bug report. When BC4 detects bad encoding, there should be an error line with a line number so you can navigate to it and correct it. A file with an encoding error should also be set to read-only until the encoding error is corrected. This doesn't seem to be happening in the current Mac build.

    If you encounter an encoding error, please refrain from editing or copying from the file until you can fix the error.
    Aaron P Scooter Software
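As a rough illustration of reporting encoding errors by line number, the following Python sketch lists the lines of a byte string that are not valid UTF-8. It is an assumption-laden stand-in, not BC4's detection logic; the function name `find_invalid_utf8_lines` is hypothetical.

```python
from typing import List


def find_invalid_utf8_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that fail to decode as UTF-8."""
    bad_lines = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


# Hypothetical sample: line 2 holds a lone 0xE9 byte, line 3 holds a valid "é".
sample = b"ok\nbad \xe9\nfine \xc3\xa9\n"
print(find_invalid_utf8_lines(sample))
```

A check like this can be run on a file before editing or copying from it, in line with the advice above.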

    • dmurdoch
      Visitor
      • Feb 2019
      • 7

      #3
      Thanks for the response. In this case, the error is really in the assumed encoding; I'd suggest that if bad chars aren't handled in UTF-8, it should be made easy to change the assumed encoding to allow the comparison to proceed normally.

      • Aaron
        Team Scooter
        • Oct 2007
        • 16000

        #4
        Hello,

        Either BC4's encoding detection missed or the character is malformed; either way, BC4 should be marking the file as read-only to keep these errors from propagating. We'll be investigating this fix.

        On a related note, you can manually override the encoding quickly by clicking the encoding name in the upper status bar and then picking an encoding from the dropdown menu. You can also perform the same override from the Text Compare's Beyond Compare menu -> Session Settings dialog, Format tab.
        Aaron P Scooter Software
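To show why the encoding override changes what the comparison reports, here is a small Python sketch. It assumes undecodable bytes are displayed as "?", as described earlier in the thread; the helper `as_displayed` and the sample lines are hypothetical and do not reflect BC4's implementation.

```python
# Hypothetical sample lines: 0xEF is "ï" in ISO-8859-1 but invalid UTF-8 here.
left = b"na\xefve\n"   # original line containing the bad byte
right = b"na?ve\n"     # the same line after "?" was copied in place of the byte


def as_displayed(data: bytes, encoding: str) -> str:
    # Assumption: undecodable bytes are shown as "?".
    return data.decode(encoding, errors="replace").replace("\ufffd", "?")


# Under UTF-8 both lines look identical, so no difference is shown.
print(as_displayed(left, "utf-8") == as_displayed(right, "utf-8"))

# Under ISO-8859-1 every byte is a legal character, so the difference appears.
print(as_displayed(left, "iso-8859-1") == as_displayed(right, "iso-8859-1"))
```

The first comparison prints True and the second prints False, which matches the behaviour reported in the first post.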

        • Chris
          Team Scooter
          • Oct 2007
          • 5538

          #5
          The encoding detection error on macOS is fixed in Beyond Compare 4.2.10, now available on the download page.
          Chris K Scooter Software
