  1. #1
    Join Date
    Jun 2014
    Location
    Freiburg, Germany
    Posts
    4

    Wish-list: manual BOM control

    It’s only a minor annoyance with your great tool, but
    I'd appreciate more control over whether or not a file starts with a byte-order mark (BOM).

    Currently, when inserting Unicode graphics (mainly formulae as comments) into a file encoded in Ansi, I usually switch the encoding to UTF-8. Unfortunately, no BOM is inserted automatically, so I have to convert the file with some less favoured text editor.

    Given that BC already detects the presence of a BOM, it should be fairly simple to let the user change this attribute as well, in addition to the code page and line-ending style.
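
    As an illustration of the conversion I currently do outside BC, here is a minimal Python sketch. The function name set_bom is hypothetical and this is not how BC works internally; it only assumes the data is already UTF-8 and adds or strips the three BOM bytes EF BB BF.

```python
import codecs


def set_bom(data: bytes, want_bom: bool) -> bytes:
    """Return UTF-8 data with or without a leading BOM (EF BB BF).

    Hypothetical helper: it assumes `data` is already UTF-8 encoded
    and does not re-encode anything.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    if want_bom and not has_bom:
        return codecs.BOM_UTF8 + data
    if not want_bom and has_bom:
        return data[len(codecs.BOM_UTF8):]
    return data  # already in the requested state
```

    Calling set_bom twice with the same flag leaves the data unchanged, so the BOM is never doubled.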

  2. #2
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    12,006

    Hello,

    Thanks for the feedback. This is something we can investigate and add to our wishlist.

    When you perform this edit, is it BC4 that then does not detect the files as UTF-8 without a BOM? If so, we would appreciate example files that we can review and add to our test cases. You can email us at support@scootersoftware.com with a link back to this forum thread for our reference. If not BC4, which program has trouble with these files?
    Aaron P Scooter Software

  3. #3
    Join Date
    Jun 2014
    Location
    Freiburg, Germany
    Posts
    4

    Originally posted by Aaron:
    is it BC4 that then does not detect the files as UTF-8 without a BOM
    Yes and no. This leads to quite another problem.

    BC 3 or BC 4 would detect UTF-8 without a BOM, if I were not forced to override the detection.

    This is because most of my source files are Windows-1252, but BC stubbornly insists that too many of them are Japanese (Shift-JIS) with encoding errors.

    I do not know which characters trigger this. I just got one of these German files: 4533 characters, mostly ASCII, except for some NBSP, SHY and ¯Üßäöü. The non-ASCII byte values (hex) with their counts are: A0 (1), AD (1), AF (177), DC (1), DF (4), E4 (14), F6 (8), FC (18).
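
    A byte count of this kind can be produced with a few lines of Python. This is only a sketch; the function name high_byte_histogram is made up for illustration.

```python
from collections import Counter


def high_byte_histogram(data: bytes) -> dict:
    """Count every byte >= 0x80, keyed by its two-digit hex value."""
    counts = Counter(b for b in data if b >= 0x80)
    return {f"{b:02X}": n for b, n in sorted(counts.items())}
```

    For a file on disk, pass the raw bytes, e.g. the result of reading the file opened in "rb" mode.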

    For me, it is less annoying to switch from Ansi to undetected UTF-8 (rarely) than to switch from misdetected Japanese to Ansi (about 4 out of 10 files). I'd prefer a detection setting like “Ansi, except for UTF-8 with BOM” or “UTF-8 if correctly encoded, otherwise Ansi” or even “auto-detect, but on Japanese encoding errors always Ansi”.
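
    To make the second of these settings concrete, here is a rough Python sketch of “UTF-8 if correctly encoded, otherwise Ansi”. It is not BC's detection logic; the function name guess_encoding and the choice of cp1252 as the Ansi fallback are my assumptions.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Sketch of the rule 'UTF-8 if correctly encoded, otherwise Ansi'.

    `fallback` stands for the single default Ansi code page
    (assumed here to be Windows-1252).
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"      # a BOM settles it without scanning
    try:
        data.decode("utf-8")    # strict decode: raises on invalid bytes
    except UnicodeDecodeError:
        return fallback
    return "utf-8"
```

    Pure ASCII text comes back as "utf-8", which is harmless because ASCII bytes mean the same in both encodings. Windows-1252 text with umlauts is almost never valid UTF-8, so it falls through to the Ansi fallback instead of being tested against Shift-JIS.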

    All this is rather disconnected from the wish for inserting a BOM. I do not even remember whether there ever was any software (except my own) that tried to read UTF-8 as Ansi. The BOM is for the human reader, who is then not forced to scan the text to detect (or misdetect) the encoding, as BC does. It’s just another rule to simplify life: if you need to know the encoding in advance (or without reading the text), check the BOM. Without a BOM no guesswork is needed either, because there is a single default code page.

  4. #4
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    4,790

    Beyond Compare 4.2 beta now allows you to save with or without a Byte Order Mark in the Save As dialog.

    Beta page: http://www.scootersoftware.com/beta
    Chris K Scooter Software
