[Issue, and HowTo] compare PDFs when one is corrupt

  • bcUser-7
    Visitor
    • Feb 2014
    • 4


    hi,
    I tried to compare two PDFs, one of which was corrupt (when I re-downloaded it from Hotmail, it came back corrupt).


    Anyway, the issues I want to report are:
    1) When I compared them, BC notified me that the corrupt PDF couldn't be rendered, and then displayed that side of the comparison differently from the other side (which shows the PDF's text).

    2) When I tried a 'text view' compare on the raw PDF file [yes, opening the binary PDF file in a text view rather than in a hex view],
    it displayed, but many characters appear as white space where some kind of symbol should be shown to represent them (as in Notepad++ etc.).

    My how-to question is:
    How do I compare the PDF files in text mode so that the 'binary characters' are each represented by a distinct symbol, rather than all of them appearing as white space?

    - Regarding the compare algorithm, I want the comparison itself to be a binary/hex comparison, but with the results shown in 'text mode'.
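Outside of BC, one rough workaround for this how-to is to render each file's bytes with visible placeholders before diffing them as text. The following is a minimal Python sketch, not a BC feature: the helper names `visible`, `render` and `visible_diff` are hypothetical, and the mapping (C0 control bytes to the Unicode Control Pictures block, bytes 0x80-0xFF to `<XX>` hex tokens) is an assumed scheme that only loosely imitates how Notepad++ shows control characters.

```python
import difflib
import sys


def visible(byte: int) -> str:
    """Map one byte to a visible token (hypothetical scheme, not BC's)."""
    if byte < 0x20:
        # C0 controls -> Unicode Control Pictures, U+2400 (NUL) .. U+241F (US)
        return chr(0x2400 + byte)
    if byte == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    if byte < 0x7F:
        return chr(byte)  # printable ASCII is shown unchanged
    return f"<{byte:02X}>"  # bytes 0x80-0xFF become hex tokens


def render(data: bytes) -> list[str]:
    """Split on LF so the diff stays line-based; every other byte is mapped."""
    return ["".join(visible(b) for b in line) for line in data.split(b"\n")]


def visible_diff(a: bytes, b: bytes,
                 name_a: str = "left.pdf", name_b: str = "right.pdf") -> list[str]:
    """Unified diff of the two rendered files; empty list if identical."""
    return list(difflib.unified_diff(render(a), render(b),
                                     name_a, name_b, lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python visible_diff.py good.pdf bad.pdf
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        for row in visible_diff(fa.read(), fb.read(), sys.argv[1], sys.argv[2]):
            print(row)
```

The comparison is still line-based (split on LF), so a PDF with very long binary stream lines will produce long diff rows; it is meant only to make the "invisible" bytes distinguishable, not to replace a real hex compare.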


    Can someone please help?



    Thanks.


    (I haven't sent this to tech support by email; I think this question suits the forum, in case someone from Scooter Software would like to reply before a user does.)
  • Aaron
    Team Scooter
    • Oct 2007
    • 16000

    #2
    Hello,

    Normally, the PDF does open in the Text Compare and uses the PDF Format (which performs an external conversion to plain .txt, then displays that text).

    When you say you open the raw PDF in the Text viewer, how were you bypassing the default behavior? Did you manually pick a different format, or disable the conversion process?

    Otherwise, binary garbage characters usually appear as random bits and symbols, and not uniformly whitespace. You could also open the files in the Hex Compare to view the hex content without a conversion.
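For readers without BC at hand, a rough stand-in for a hex comparison is to walk both files in fixed-width blocks and print the rows that differ. This is a minimal Python sketch under stated assumptions: `hex_row` and `hex_diff` are hypothetical helpers, and the comparison is naive offset-by-offset, so a single inserted or deleted byte makes every later row differ. It does not attempt the alignment a dedicated hex compare tool may perform.

```python
import sys


def hex_row(data: bytes, offset: int, width: int) -> str:
    """Format one row as an 8-digit hex offset followed by hex byte values."""
    chunk = data[offset:offset + width]
    return f"{offset:08X}  " + " ".join(f"{b:02X}" for b in chunk)


def hex_diff(a: bytes, b: bytes, width: int = 16) -> list[str]:
    """Compare fixed-width blocks at equal offsets; report only differing rows.

    Naive: no alignment, so one inserted byte makes every later row differ.
    """
    rows = []
    for offset in range(0, max(len(a), len(b)), width):
        if a[offset:offset + width] != b[offset:offset + width]:
            rows.append("< " + hex_row(a, offset, width))
            rows.append("> " + hex_row(b, offset, width))
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        print("\n".join(hex_diff(fa.read(), fb.read())))
```

For example, comparing `b"%PDF-1.4"` with `b"%PDF-1.5"` reports one pair of rows at offset `00000000` ending in `34` and `35` respectively.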

    Another tactic would be to try to open the file in Adobe, then use Adobe's Save As Text command to get the plain text. Does Adobe handle the corruption any better?
    Aaron P Scooter Software


    • bcUser-7
      Visitor
      • Feb 2014
      • 4

      #3


      What I mean is:
      1. The external text conversion fails, so the corrupt PDF doesn't get converted, yet BC still attempts the comparison.

      2. If you open a binary file in Notepad++, it shows a symbol to represent each binary character, but BC seems to display binary characters as white space (an invisible ' ').

      3. That being said, how do I view the PDFs as unfiltered text, the way Notepad++ shows them?

      In other words, can the compare view represent a binary file the way Notepad++ does?

      Let me know if you aren't sure what I mean, and I will take some screenshots for you.

      thanks.


      • Aaron
        Team Scooter
        • Oct 2007
        • 16000

        #4
        Some screenshots would help. The expected output, if the conversion fails, is a blank pane with "Conversion Error" in the status bar. If the conversion was not performed (a different format selected, like Default), then the binary characters would appear as gibberish (like notepad++) and not as whitespace.
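Since the "Conversion Error" case depends on whether the downloaded PDF is actually damaged, a quick structural check can help confirm corruption before any compare. This is a heuristic Python sketch, assuming the common convention that a PDF starts with a `%PDF-` header and ends with a `%%EOF` marker, and that readers look for these within roughly 1024 bytes of the start and end. `pdf_sanity` is a hypothetical helper; passing both checks does not prove the file is valid, but failing the `%%EOF` check is typical of a truncated download.

```python
import sys


def pdf_sanity(data: bytes) -> list[str]:
    """Return problems found by two cheap structural checks (heuristic only)."""
    problems = []
    # A PDF should begin with a header like b"%PDF-1.4"; we tolerate it
    # anywhere in the first 1024 bytes, as many readers reportedly do.
    if b"%PDF-" not in data[:1024]:
        problems.append("no %PDF- header in the first 1024 bytes")
    # A complete file normally ends with %%EOF (maybe plus a line ending);
    # a truncated download usually loses it.
    if b"%%EOF" not in data[-1024:]:
        problems.append("no %%EOF marker in the last 1024 bytes")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        found = pdf_sanity(f.read())
    print("\n".join(found) if found else "header and %%EOF present")
```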

        If you would like, you could also email us at [email protected] with
        1) A copy of your BCSupport.zip from the Help menu -> Support; Export
        2) some screenshots
        3) a sample pdf file
        We should then be able to reproduce what you are seeing.
        Aaron P Scooter Software


        • bcUser-7
          Visitor
          • Feb 2014
          • 4

          #5
          hi,
          I can provide the screenshots soon.

          Funnily enough, this is a copy of one of my resumes. (Unfortunately it isn't my software development resume; it's my general resume for other kinds of work, which includes information relevant to that job area.)

          Since it is a resume, I will send the file by email; however, I will also update this thread with all non-private info, including screenshots I am happy to share (with any personal info redacted in the non-corrupted view).

          (P.S. As I said, I downloaded this copy of my general resume back from my Hotmail account, so if it has any corruption, accidental or intentional... I need say no more, except that it seems suspicious to me.)

          thanks.
          William.

          (I will update ASAP.)


          • bcUser-7
            Visitor
            • Feb 2014
            • 4

            #6
            hi,

            It seems BC4 will display 'invisible chars'.

            I just started on the screenshots and realized... I can't really be bothered, as this would take at least an hour of work.

            Feel free to delete this thread, since I have changed my mind about trying to explain it.


            thanks for your time though.
