
BOM (Byte Order Mark) ignored when comparing


Pesche
28-Aug-2008, 12:20 PM
I was comparing two XML files in UTF-8 encoding, one starting with a BOM and the other not. Beyond Compare doesn't show this difference.

See http://en.wikipedia.org/wiki/Byte_order_mark for more about what a BOM is.

Is there a way (similar to the whitespace and line ending toggles) to display the presence of a BOM?

Michael Bulgrien
28-Aug-2008, 03:24 PM
If the BOM is present, BC3 reports the file as UTF-8.
If the BOM is not present, BC3 reports the file as ANSI.

http://content.screencast.com/users/Bulgrien/folders/Jing/media/ff5a3d85-e67b-4b2a-bfc0-dee1cb0cf4b9/2008-08-28_1713.png

The only way that I know of to see the BOM is to open the files in a hex compare:

http://content.screencast.com/users/Bulgrien/folders/Jing/media/cfed1f4f-3477-40e0-83a6-78d6ead57fe7/2008-08-28_1720.png
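
Outside of Beyond Compare, a quick way to check for a UTF-8 BOM is to read the first three bytes of the file and see whether they are EF BB BF. A minimal Python sketch of that check follows; the function name `has_utf8_bom` is made up for illustration and has nothing to do with BC3 itself:

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf"
        return f.read(3) == codecs.BOM_UTF8
```

Running it on both XML files should tell you which one carries the BOM without opening a hex view.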

Michael Bulgrien
28-Aug-2008, 05:48 PM
Also, if you perform a Compare Contents (=?) from a folder compare, be aware that you can ignore the encoding from there as well. Choosing a "Rules-based comparison" will compare the textual content of the files without regard to their encoding (the BOM will be ignored). Choosing a "Binary comparison" will treat it as a difference if there is a BOM in one file and not in the other.
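
To illustrate the distinction, here is a rough Python sketch of the idea. It is not how BC3 implements its comparisons, and the function name and result strings are hypothetical. It relies on Python's `utf-8-sig` codec, which drops a leading BOM when decoding if one is present:

```python
def compare_contents(a: bytes, b: bytes) -> str:
    """Classify two UTF-8 byte strings by byte-level and text-level equality."""
    if a == b:
        # Byte-for-byte identical, as a binary comparison would see it
        return "binary same"
    # "utf-8-sig" strips a leading BOM (if any) before decoding,
    # so only the textual content is compared here
    if a.decode("utf-8-sig") == b.decode("utf-8-sig"):
        return "text same, bytes differ"
    return "different"
```

For two files that differ only by the BOM, the byte-level check fails while the text-level check succeeds, which matches the behavior described above.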

Pesche
31-Aug-2008, 03:08 PM
In my case, the file without the BOM is also shown as UTF-8. In Text View, the file size is the only hint that the two files are not identical:
http://www.scootersoftware.com/vbulletin/attachment.php?attachmentid=228&stc=1&d=1220216307

In Hex view, of course the difference is clearly visible:
http://www.scootersoftware.com/vbulletin/attachment.php?attachmentid=229&stc=1&d=1220216307

I would like some easy-to-spot indicator that the files are not 100% identical, without having to check in Hex view first. Preferably a statement in the status line summary, something like "same (filesize differs)" instead of just "same".

Michael Bulgrien
31-Aug-2008, 10:19 PM
I have suggested something similar in the past. Perhaps:

= Same (with encoding difference)

Or something like that...