XML Tidied with attr. sorted: Problem with Encoding

  • Iso
    New User
    • Jul 2014
    • 1

    Hello
    Recently, I discovered a problem with encoding in XML files.

    The setup is:
    • Beyond Compare 3.3.12
    • File format: the built-in alternative "XML Tidied with attributes sorted"


    This works brilliantly, except when it comes to special characters such as a non-breaking space (U+00A0).

    When you switch from the built-in file format "XML" to "XML Tidied with attributes sorted", the external program HTML Tidy returns an invalid XML file because the encoding is broken. Here is a simple example XML file to reproduce this behaviour:

    Code:
    <?xml version="1.0" encoding="utf-8"?>
    <foo>
    	<bar>bar&#xA0;baz</bar>
    </foo>
    This invalid output from HTML Tidy seems to be caused by the character encoding set in the first line of the config file: "char-encoding: raw". The man page says:
    raw: output values above 127 without conversion to entities
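    So if I understand HTML Tidy's behaviour correctly, it parses the input file, decodes the entity &#xA0;, and writes the character out as the single byte 0xA0 instead of translating it back to an entity. On its own, that byte is not valid UTF-8.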
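
    To illustrate the suspected failure, here is a minimal Python sketch. It does not call HTML Tidy; it only assumes that the tidied output contains a bare byte 0xA0 under a UTF-8 declaration and checks whether an XML parser accepts that. The names `good`, `bad` and `is_well_formed` are made up for this example.

```python
import xml.etree.ElementTree as ET

HEADER = b'<?xml version="1.0" encoding="utf-8"?>\n'

# U+00A0 encoded properly as UTF-8 (the two bytes 0xC2 0xA0).
good = HEADER + "<foo><bar>bar\u00a0baz</bar></foo>".encode("utf-8")

# Assumed shape of the broken output: a lone byte 0xA0 in a file
# that still declares itself as UTF-8.
bad = HEADER + b"<foo><bar>bar\xa0baz</bar></foo>"


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("proper UTF-8:", is_well_formed(good))
    print("raw 0xA0:    ", is_well_formed(bad))
```

    If the assumption holds, this would explain why the comparison fails only for files that contain such characters.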

    Fortunately, all our source XML files are encoded in UTF-8, so I can avoid this problem by changing the encoding in the config file to utf8. That might not work for other encodings, though.

    Would it be complicated to read the encoding from the XML file and set the corresponding parameter in the HTML Tidy call? Or something similar? Are there perhaps other ideas?
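
    As a rough sketch of what I have in mind (not something Beyond Compare does today), a small wrapper script could read the encoding from the XML declaration and pick a matching char-encoding value. The function names and the mapping table below are my own assumptions, the table is incomplete, and UTF-16 input is not handled:

```python
import re

# Assumed mapping from XML encoding names to HTML Tidy "char-encoding"
# values; extend as needed for other file sets.
TIDY_ENCODINGS = {
    "utf-8": "utf8",
    "us-ascii": "ascii",
    "iso-8859-1": "latin1",
    "iso-8859-15": "latin0",
    "windows-1252": "win1252",
    "shift_jis": "shiftjis",
    "big5": "big5",
}

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of the file.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding in lower case, or utf-8 if none is declared."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def tidy_char_encoding(data: bytes) -> str:
    """Pick a char-encoding value; fall back to 'raw' for unmapped encodings."""
    return TIDY_ENCODINGS.get(declared_encoding(data), "raw")


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="utf-8"?>\n<foo/>'
    print(tidy_char_encoding(sample))
```

    The returned value could then be written into the config file in place of "raw" before the conversion runs.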

    Kind regards,
    Iso
  • Aaron
    Team Scooter
    • Oct 2007
    • 15997

    #2
    Given the current implementation, we can't parse for encoding and then alter the config file before performing the external conversion. We rely on the configuration or ability of the external program to handle the file we pass, which in this case might require some customization for specific file sets.

    Enhancing our comparisons for XML files is on our wishlist. I believe I understand the type of files you have, but if you would like to email any samples to [email protected] (with a link back to this forum post for our reference), I can add them to our sample test cases.
    Aaron P Scooter Software


    • Aaron
      Team Scooter
      • Oct 2007
      • 15997

      #3
      Also, how does our BC4 beta handle this? You can create File Formats for XML files which, in the Conversion tab, can be set to use an internal XML Tidy or Sort.
      Aaron P Scooter Software
