Generated html report has wrong encoding UCS-2 LE BOM


  • Generated html report has wrong encoding UCS-2 LE BOM

    Generating an HTML report from two XML files encoded as UTF-8 produces a report that Notepad++ identifies as UCS-2 LE BOM.

    Firefox displays the page correctly. However, I first save the file to the database as a CLOB via Python.

    Something like:
    Code:
    print(open("test3.html").read())
    where test3.html is the report generated by BC.
    Here is some output from the above code:

    *■< ! D O C T Y P E H T M L P U B L I C " - / / W 3 C / / D T D H T M L
    4 . 0 1 T r a n s i t i o n a l / / E N " " h t t p : / / w w w . w 3 . o
    g / T R / h t m l 4 / l o o s e . d t d " >
    < h t m l >

    < / h t m l >
    So there is definitely something in the front mucking things up.
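
The spaced-out characters are consistent with UTF-16 bytes being read as a single-byte encoding: each ASCII character is followed by a NUL byte, and the two bytes at the front are the byte order mark. A minimal sketch for confirming this by inspecting the first bytes, assuming Python's standard `codecs` constants (the helper name `sniff_bom` is hypothetical, not something from BC or the original post):

```python
import codecs

def sniff_bom(raw):
    """Return a codec name implied by a leading BOM, or None if no BOM is found."""
    # UTF-32 must be checked first: its LE BOM begins with the UTF-16 LE BOM.
    if raw.startswith(codecs.BOM_UTF32_LE) or raw.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    return None

# In-memory stand-in for the first bytes of a report written as UTF-16 LE with BOM.
sample = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")

print(sniff_bom(sample))               # utf-16
print(repr(sample.decode("latin-1")))  # BOM bytes, then a NUL after every character
print(sample.decode("utf-16"))         # <html>
```

For a real file, the same check would be run on bytes read in binary mode, for example `sniff_bom(open("test3.html", "rb").read(4))`.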

    I read that BC should write the report in the encoding of the left input file, and that works as expected for any files that are not .XML.

    I also double checked and Notepad++ opens the left and right XML files as UTF-8, so the encoding of the input files appears to be ok.

    How can I get my report to be generated in UTF-8?

    Version 3.3.12 b 18981
    Thanks,
    Dennis

  • #2
    Hello Dennis,

    Which HTML report layout are you generating, and which encoding does the interface detect for the files in the upper status bar?

    Testing with two files in the Text Compare that are detected as UTF-8, the generated HTML report (Side-by-side layout) comes out as UTF-8.
    I tested this with BC3.3.13 and BC4.1.8. All minor updates are free, so you can update to BC3.3.13 here:
    http://www.scootersoftware.com/download.php?zz=dl3_en

    If you can, please email your current BCSupport.zip (Help menu -> Support; Export) and a pair of sample files to support@scootersoftware.com, and let us know which report to generate to test. Please include a link back to this forum thread for our reference.
    Aaron P Scooter Software



    • #3
      Aaron,
      I am not sure I would be allowed to send in the files under our company policy. I appreciate the assistance, but for now I just hacked around the issue. For others in the same boat, here is the hack I ended up creating.

      Code:
      import codecs

      try:
          # Try UTF-16 first; the codec uses the BOM to pick the byte order.
          report = codecs.open(bc_path, 'r', encoding='utf16').read()
      except UnicodeError:
          # Not a BOM-prefixed UTF-16 file: fall back to a plain read.
          report = open(bc_path, 'r').read()
      concomitant_pk = qa_query().add_concomitant(beyond_compare=report)
      So just try utf16 first. UCS-2 is not quite the same thing, but it is effectively UTF-16 without surrogate pairs, so the utf16 codec reads it.
      http://stackoverflow.com/questions/1...-ucs-2-be-file
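
If the goal is a UTF-8 copy of the report rather than just the decoded text, the same idea can be taken one step further by re-encoding the file. This is a rough sketch only, assuming the report starts with a BOM as Notepad++ indicated; the function name `convert_utf16_to_utf8` and the output file name are hypothetical, and it does not change how BC itself writes the report:

```python
import io

def convert_utf16_to_utf8(src_path, dst_path):
    """Rewrite a BOM-prefixed UTF-16 file as UTF-8 (no BOM) and return its text."""
    # newline="" on both sides keeps the original line endings untouched.
    with io.open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text

# Example call (paths are placeholders):
# html = convert_utf16_to_utf8("test3.html", "test3_utf8.html")
```

Note that any `charset` declaration inside the generated HTML is left as BC wrote it, so it may need a separate check if the page is later served to a browser.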
