How to compare text files with lines broken at different points?

  • hugh
    Enthusiast
    • Jan 2008
    • 31

    How to compare text files with lines broken at different points?

    I have two text files whose lines are wrapped at different points (the newlines fall in different places; this is not a line-ending encoding issue). When I compare them, BC treats virtually the entire document as different.

    I tried removing all line endings from the files and then comparing, but that gives one huge line, and BC won't wrap the view.

    Any ideas how to compare two such text files?
  • Aaron
    Team Scooter
    • Oct 2007
    • 16000

    #2
    Hello,

    Line Endings are important for the Text Compare, but you can run your files through an external conversion which normalizes the whitespace and line breaks of the two files.

    We have some additional downloads for specific formats such as XML Tidied and HTML Tidied:
    http://www.scootersoftware.com/downl..._moreformatsv4

    And you can also use a custom conversion, using an RESX format as the example:
    http://www.scootersoftware.com/suppo...rnalconversion
    Aaron P Scooter Software

    • hugh
      Enthusiast
      • Jan 2008
      • 31

      #3
      Thanks for your answer.

      The sort of thing I'm comparing is markdown (.md) where line breaks in paragraphs are ignored and re-wrapped when rendered.

      I'd like to be able to compare those paragraphs.

      Following your suggestion, I wrote a utility to normalise two .md files for comparison. After that, BC compared them fine.

      Is there any documentation on how to make a package for new file formats (e.g. for .md files)? I could make my utility available.

      thanks.

      • patch
        Journeyman
        • Sep 2008
        • 14

        #4
        I also often want to compare documents which have been hard word-wrapped. Having this functionality readily available within Beyond Compare would be a much appreciated feature.

        As it has been requested in the past and not implemented, I assume the restriction is deeply embedded within Beyond Compare. Maybe a standard filter that converts text to one sentence per line would be more achievable. See the Word Wrap thread.
        Last edited by patch; 24-Sep-2016, 08:02 PM.
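
The "convert text to sentences" filter suggested in this post could be sketched roughly as follows. This is only an illustration in Python, not a Beyond Compare feature: the function name `one_sentence_per_line` is hypothetical, and the sentence detection is deliberately naive (it splits after `.`, `!` or `?` followed by whitespace, so abbreviations such as "e.g." get split as well).

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Put each sentence on its own line, keeping blank lines between paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Undo the hard wrapping: collapse all whitespace to single spaces.
        flat = " ".join(block.split())
        if not flat:
            continue
        # Naive split: break after ., ! or ? when followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", flat)
        paragraphs.append("\n".join(sentences))
    return "\n\n".join(paragraphs) + "\n"
```

With one sentence per line, a change inside a sentence shows up as a single differing line, wherever the original hard wraps fell.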

        • hugh
          Enthusiast
          • Jan 2008
          • 31

          #5
          thanks.

          I posted my paragraph fold util here, https://gitlab.com/jkj/foldp
          I don't know how to make BC run it automatically.

          It only works on plain text files made up of paragraphs separated by blank lines. The sort of things I need to compare are software licences and contracts. What I do is re-wrap them to, say, 100 characters per line, then use BC to compare.
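
The re-wrapping approach described above can be sketched in a few lines of Python. This is not the foldp source, only an illustration of the idea under the same assumption (plain text, paragraphs separated by blank lines); the function name `refold` and the default width of 100 are chosen here for the example.

```python
import textwrap


def refold(text: str, width: int = 100) -> str:
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = []
    words = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: collect its words into the current paragraph.
            words.extend(line.split())
        elif words:
            # Blank line: the current paragraph is complete.
            paragraphs.append(" ".join(words))
            words = []
    if words:
        paragraphs.append(" ".join(words))
    wrapped = [
        # Keep long tokens (e.g. URLs) and hyphenated words intact.
        textwrap.fill(p, width=width, break_long_words=False, break_on_hyphens=False)
        for p in paragraphs
    ]
    return "\n\n".join(wrapped) + "\n"
```

Two files that differ only in where their lines were broken produce identical output, so any remaining differences in the comparison are real changes to the text.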

          • Aaron
            Team Scooter
            • Oct 2007
            • 16000

            #6
            Hello,

            Thanks!

            For automation, use the Tools menu -> File Formats dialog, click +/New -> Text Format. Add *.md as the extension, and in the Conversion tab add the path to your utility. I would recommend storing it in our %AppData%\Scooter Software\Beyond Compare 4\Helpers\foldp. From your GitLab documentation, the Conversion path could be:
            Helpers\foldp "%s" "%t"

            Where %s is the source file and %t is the temp target that will be generated and displayed. This format, when used, should auto-convert your *.md files. Does this work for your files?
            Aaron P Scooter Software
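
For anyone writing their own helper, the `"%s" "%t"` contract described above (read the source file, write the converted text to the temp target) might look like this in Python. It is a hedged sketch, not foldp itself: the script name `refold_for_bc.py` is hypothetical, UTF-8 input is assumed, and a Conversion line along the lines of `python Helpers\refold_for_bc.py "%s" "%t"` is untested.

```python
import re
import sys
import textwrap
from pathlib import Path


def convert(source: str, target: str, width: int = 100) -> None:
    """Read `source`, re-wrap its paragraphs, and write the result to `target`."""
    # Assumption: the files are UTF-8; change the encoding if yours are not.
    text = Path(source).read_text(encoding="utf-8")
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        flat = " ".join(block.split())  # undo the existing hard wraps
        if flat:
            paragraphs.append(
                textwrap.fill(flat, width=width,
                              break_long_words=False, break_on_hyphens=False)
            )
    Path(target).write_text("\n\n".join(paragraphs) + "\n", encoding="utf-8")


# Expected invocation: refold_for_bc.py <source> <target>
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

The script never modifies the source file; it only writes the target, which is the file that gets displayed in the comparison.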

            • hugh
              Enthusiast
              • Jan 2008
              • 31

              #7
              It worked!

              foldp is not very clever. It doesn't really understand markdown (.md). It works by refolding paragraphs of text, where a paragraph is any plain text separated by a blank line.

              However, it works well enough for some of my simple markdown files, and also for some of the plain-text software licence agreements I was comparing.

              One thing to note: if you edit the compared files and then save, you'll be saving the re-wrapped version, not the original.

              If I run into problems, I'll update it. If anyone else has a problem, contact me at [email protected]

              • jackting
                Journeyman
                • Dec 2007
                • 16

                #8
                Originally posted by hugh
                thanks.

                I posted my paragraph fold util here, https://gitlab.com/jkj/foldp
                I don't know how to make BC run it automatically.

                It only works on plain text files made up of paragraphs separated by blank lines. The sort of things I need to compare are software licences and contracts. What I do is re-wrap them to, say, 100 characters per line, then use BC to compare.
                Hi, I've tried your util, but it does not handle UTF-8 encoding correctly.

                So I tried pandoc from http://pandoc.org/installing.html
                With the following steps the conversion succeeded, and it works perfectly for me.

                1. Make a copy of the 'HTML' entry in the 'File Formats' settings and name it 'MD'.
                2. On the 'General' tab, change the 'Mask' to '*.md'.
                3. On the 'Conversion' tab:
                3a. Choose 'External Program (Unicode filenames)'.
                3b. Set 'Loading' to "C:\Utils\Console\pandoc" "%s" -f markdown_github -t html -s -o "%t"
                (I put pandoc.exe in C:\Utils\Console.)
                3c. Tick 'Disable editing'.
                3d. Change 'Encoding' to 'UTF-8'.

                Note: there are many variants of Markdown. I'm using GitBook, so I chose '-f markdown_github' for the conversion.
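
The 'Loading' command from step 3b can also be wrapped in a small script, which makes it easy to swap the Markdown variant. The sketch below only assembles and runs the same pandoc command line; the function name `build_pandoc_command` and its parameters are hypothetical, and it assumes pandoc is on the PATH. Newer pandoc releases deprecate `markdown_github` in favour of `gfm`, which is why the reader name is a parameter here.

```python
import subprocess
import sys


def build_pandoc_command(source: str, target: str,
                         pandoc: str = "pandoc",
                         reader: str = "markdown_github") -> list:
    """Return the argument list equivalent to the 'Loading' line in step 3b."""
    return [
        pandoc, source,
        "-f", reader,   # input format (Markdown variant)
        "-t", "html",   # output format
        "-s",           # standalone document
        "-o", target,   # write the result to the temp target
    ]


# Expected invocation: script.py <source> <target>
if __name__ == "__main__" and len(sys.argv) == 3:
    subprocess.run(build_pandoc_command(sys.argv[1], sys.argv[2]), check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.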
