
Wish-list: manual BOM control

  • Wish-list: manual BOM control

    It's only a minor annoyance with your great tool, but I'd appreciate more control over whether or not a file starts with a byte-order mark (BOM).

    Currently, when I insert Unicode symbols (mainly formulae in comments) into a file encoded in Ansi, I usually switch the encoding to UTF-8. Unfortunately, no BOM is inserted automatically, so I have to convert the file with another, less favored text editor.

    Given that BC already detects the presence of a BOM, it should be fairly simple to let the user change this attribute in addition to the code page and line-ending style.
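The manual conversion described above (Ansi to UTF-8 with a BOM) can be sketched in a few lines of Python. This assumes the "Ansi" code page is Windows-1252 (`cp1252`); the function names are hypothetical, and this is only an illustration of the transformation, not how BC implements it.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"

def ansi_to_utf8_bom(data: bytes, ansi: str = "cp1252") -> bytes:
    """Re-encode Ansi (assumed Windows-1252) bytes as UTF-8 prefixed with a BOM."""
    # Strict decoding: bytes undefined in cp1252 (e.g. 0x81) raise UnicodeDecodeError.
    text = data.decode(ansi)
    # The "utf-8-sig" codec emits the EF BB BF byte-order mark before the text.
    return text.encode("utf-8-sig")

def convert_file(path: Path, ansi: str = "cp1252") -> None:
    """Convert a file in place; working on raw bytes keeps CRLF/LF endings unchanged."""
    data = path.read_bytes()
    if data.startswith(UTF8_BOM):
        return  # already UTF-8 with a BOM, nothing to do
    path.write_bytes(ansi_to_utf8_bom(data, ansi))
```

Because the file is read and written as bytes, the line-ending style is left exactly as it was; only the encoding and the leading BOM change.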

  • #2

    Thanks for the feedback. This is something we can investigate and add to our wishlist.

    When you perform this edit, is it BC4 that then does not detect the files as UTF-8 without a BOM? If so, we would appreciate example files that we can review and add to our test cases. You can email us at with a link back to this forum thread for our reference. If not BC4, which program has trouble with these files?
    Aaron P Scooter Software


    • #3
      Originally posted by Aaron
      is it BC4 that then does not detect the files as UTF-8 without a BOM
      Yes and no. This leads to a different problem altogether.

      Both BC 3 and BC 4 would detect UTF-8 without a BOM if I were not forced to override the automatic detection.

      This is because most of my source files are Windows-1252, but BC stubbornly insists that too many of them are Japanese (Shift-JIS) with encoding errors.

      I don't know which characters trigger this. I just got another of these German files: 4533 characters, mostly ASCII except for some NBSP, SHY, ¯, Ü, ß, ä, ö and ü (Windows-1252 byte values, with counts): A0 (1), AD (1), AF (177), DC (1), DF (4), E4 (14), F6 (8), FC (18).
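One plausible explanation, offered as an assumption since BC's actual heuristic isn't described in this thread: in Shift-JIS, single bytes 0xA1 to 0xDF are half-width katakana, and bytes 0x81 to 0x9F and 0xE0 to 0xEF start a double-byte character whose trail byte may be 0x40 to 0x7E or 0x80 to 0xFC, which includes ASCII letters. So AD, AF, DC and DF from the list above are also valid Shift-JIS on their own, and E4 (ä) followed by a letter can read as a kanji. The sketch below (hypothetical function names) reproduces the byte count from the post and labels each byte's Shift-JIS role, using only the JIS X 0208 ranges.

```python
from __future__ import annotations

from collections import Counter

def non_ascii_histogram(data: bytes) -> dict[int, int]:
    """Count each byte value >= 0x80, sorted by byte value."""
    return dict(sorted(Counter(b for b in data if b >= 0x80).items()))

def sjis_role(byte: int) -> str:
    """What a single byte would mean to a Shift-JIS decoder (JIS X 0208 ranges only)."""
    if 0xA1 <= byte <= 0xDF:
        return "half-width katakana"
    if 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xEF:
        return "lead byte of a double-byte character"
    return "other"

def report(data: bytes) -> list[str]:
    """One line per non-ASCII byte: hex value, count, cp1252 character, Shift-JIS role."""
    lines = []
    for byte, count in non_ascii_histogram(data).items():
        char = bytes([byte]).decode("cp1252", errors="replace")
        lines.append(f"{byte:02X} ({count}) {char!r}: {sjis_role(byte)}")
    return lines

if __name__ == "__main__":
    for line in report("Größe, Übergänge".encode("cp1252")):
        print(line)
```

With the counts from the post, the single most frequent non-ASCII byte (AF, 177 times) is a half-width katakana in Shift-JIS, which would make a Japanese reading look plausible to a statistical detector.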

      For me, it is less annoying to switch manually from Ansi to undetected UTF-8 (which happens rarely) than to switch from misdetected Japanese back to Ansi (about 4 out of 10 files). I'd prefer a detection setting such as: UTF-8 if the file has a BOM or is correctly encoded UTF-8, otherwise Ansi. Alternatively, auto-detect, but always fall back to Ansi when the Japanese interpretation has encoding errors.
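The first detection setting proposed above can be written as a short policy function. This is a sketch of the poster's suggestion, not BC's actual detection logic; `detect_encoding` is a hypothetical name and Windows-1252 stands in for "Ansi".

```python
UTF8_BOM = b"\xef\xbb\xbf"

def detect_encoding(data: bytes, ansi: str = "cp1252") -> str:
    """Proposed policy: UTF-8 with BOM, else valid UTF-8, else the Ansi code page."""
    if data.startswith(UTF8_BOM):
        return "utf-8-sig"  # this codec strips the BOM when decoding
    if data.isascii():
        # Pure ASCII decodes identically either way; keep the Ansi default.
        return ansi
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Never guess Japanese: anything that is not valid UTF-8 is treated as Ansi.
        return ansi
    return "utf-8"
```

German Windows-1252 text rarely passes as valid UTF-8, because a byte such as E4 (ä) would have to be followed by two continuation bytes in the range 0x80 to 0xBF rather than by an ASCII letter, so the strict UTF-8 check is a cheap and fairly reliable test for this policy.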

      All this is rather unrelated to the wish for inserting a BOM. I don't even remember whether there ever was any software (except my own) that tried to read UTF-8 as Ansi. The BOM is for the human reader, who then isn't forced to scan the text to detect (or misdetect) the encoding, as BC does. It's just another rule to simplify life: if you need to know the encoding in advance (or without reading the text), check the BOM. Without a BOM, no guesswork is needed either, because there is a single default code page.
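The BOM-only rule in the last paragraph needs no heuristics at all. A minimal sketch, assuming Windows-1252 as the single default code page; `encoding_from_bom` is a hypothetical helper that reads at most three bytes and never scans the text.

```python
UTF8_BOM = b"\xef\xbb\xbf"

def encoding_from_bom(path: str, default: str = "cp1252") -> str:
    """Decide the encoding from the first three bytes only."""
    with open(path, "rb") as f:
        head = f.read(len(UTF8_BOM))
    # BOM present: UTF-8 ("utf-8-sig" also strips the BOM when decoding).
    # No BOM: the single default code page, without any guessing.
    return "utf-8-sig" if head == UTF8_BOM else default
```

Typical use would be `open(path, encoding=encoding_from_bom(path))`, so the decision is made before any of the text is read.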


      • #4
        Beyond Compare 4.2 beta now allows you to save with or without a Byte Order Mark in the Save As dialog.

        Beta page:
        Chris K Scooter Software