Wish-list: manual BOM control

  • Gerd
    Visitor
    • Jun 2014
    • 4

    It’s only a minor annoyance with your great tool, but I'd appreciate more control over whether a file starts with a byte-order mark (BOM).

    Currently, when I insert Unicode graphic characters (mainly formulae in comments) into a file encoded in Ansi, I usually switch the encoding to UTF-8. Unfortunately, no BOM is inserted automatically, so I have to convert the file with another, less favored text editor.

    Given that BC already detects the presence of a BOM, it should be fairly simple to let the user change this attribute as well, in addition to the code page and line-ending style.
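
As a stopgap for the conversion step described above, here is a minimal Python sketch that prepends a UTF-8 BOM to file content that is already valid UTF-8. The function name `add_utf8_bom` is hypothetical and this is not how Beyond Compare does it; it only illustrates that the BOM is three fixed bytes (EF BB BF) at the start of the file.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM prepended, unless one is already there."""
    # Refuse to tag content that is not valid UTF-8 (raises UnicodeDecodeError).
    data.decode("utf-8")
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

To convert a file, read it in binary mode, pass the bytes through the function, and write the result back in binary mode.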
  • Aaron
    Team Scooter
    • Oct 2007
    • 16009

    #2
    Hello,

    Thanks for the feedback. This is something we can investigate and add to our wishlist.

    When you perform this edit, is it BC4 that then does not detect the files as UTF-8 without a BOM? If so, we would appreciate example files that we can review and add to our test cases. You can email us at [email protected] with a link back to this forum thread for our reference. If not BC4, which program has trouble with these files?
    Aaron P Scooter Software


    • Gerd
      Visitor
      • Jun 2014
      • 4

      #3
      Originally posted by Aaron
      is it BC4 that then does not detect the files as UTF-8 without a BOM
      Yes and no. This leads to quite another problem.

      BC 3 or BC 4 would detect UTF-8 without a BOM if I were not forced to override the detection.

      This is because most of my source files are Windows-1252, but BC stubbornly insists that too many of them are Japanese (Shift-JIS) with encoding errors.

      I do not know which characters trigger this. I just got one of these German files: 4533 characters, mostly ASCII, except for some NBSP, SHY and ¯Üßäöü. The byte counts are A0 (1), AD (1), AF (177), DC (1), DF (4), E4 (14), F6 (8), FC (18).
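
A byte tally like the one above can be produced with a few lines of Python. This is only a sketch for preparing a report; the helper name `high_byte_histogram` is made up for illustration.

```python
from collections import Counter


def high_byte_histogram(data: bytes) -> dict[str, int]:
    """Count every non-ASCII byte (0x80 and above), keyed by its hex value."""
    counts = Counter(b for b in data if b >= 0x80)
    return {f"{b:02X}": n for b, n in sorted(counts.items())}
```

Feed it the raw bytes of the file (opened in binary mode), so that no decoding step can distort the counts.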

      For me, it is less annoying to switch from Ansi to undetected UTF-8 (which happens rarely) than to switch from misdetected Japanese to Ansi (about 4 out of 10 files). I'd prefer a detection setting like “Ansi, except for UTF-8 with BOM” or “UTF-8 if correctly encoded, otherwise Ansi” or even “auto-detect, but on Japanese encoding errors always Ansi”.

      All this is rather disconnected from the wish for inserting a BOM. I do not even remember whether there ever was any software (other than my own) that tried to read UTF-8 as Ansi. The BOM is for the human reader, so that nobody is forced to scan the text and guess the encoding, rightly or wrongly, the way BC does. It’s just another rule to simplify life: if you need to know the encoding in advance (or without reading the file), check the BOM. Without a BOM no guesswork is needed either, because there is a single default code page.
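
The detection rules suggested in this post can be sketched in Python as follows. This illustrates the proposed policy and is not Beyond Compare's actual detection logic. The function name `guess_encoding` is hypothetical, Windows-1252 (`cp1252`) is assumed as the single default code page, and only the UTF-8 BOM is checked.

```python
import codecs


def guess_encoding(data: bytes, default: str = "cp1252") -> str:
    """BOM first, then strict UTF-8 validation, else the default code page."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec name for UTF-8 with a BOM
    try:
        data.decode("utf-8")  # strict: any invalid sequence raises
    except UnicodeDecodeError:
        return default
    return "utf-8"
```

Pure ASCII content is reported as `utf-8` here, which is harmless because ASCII bytes decode identically under both encodings. Dropping the `try` block gives the stricter “Ansi, except for UTF-8 with BOM” variant.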


      • Chris
        Team Scooter
        • Oct 2007
        • 5538

        #4
        Beyond Compare 4.2 beta now allows you to save with or without a Byte Order Mark in the Save As dialog.

        Beta page: http://www.scootersoftware.com/beta
        Chris K Scooter Software
