Ignoring Line Differences

  • Gltokensp06

    Ignoring Line Differences

    I'm sure this has been asked and answered before, but I just can't find it in the forums. My question is: how do you ignore line differences while using BC3? I'm comparing PDF files; for example, this is file 1 compared to file 2:




    My problem is that in the files I'm comparing, the line breaks are going to be different but the text is going to be the same. So is there a way to tell the session to ignore those differences so there isn't red everywhere on the screen?
    Last edited by Guest; 31-Mar-2009, 10:25 AM.
  • Aaron
    Team Scooter
    • Oct 2007
    • 15938

    #2
    Hello,

    Beyond Compare uses the individual lines to perform the alignment, and then compares those lines. It is not possible to ignore where the line breaks fall (the option for ignoring line endings ignores only the 'type' of line ending, not where the endings occur).

    Often, users will run a script or conversion that will sort their data in a comparable way.

    Would you be able to email your example files, your support package (Help menu -> Support; Export), and a link to this forum post? We could look at your PDF files and try to determine exactly why there are line breaks there.
    Aaron P Scooter Software

    • ojcitt
      Visitor
      • Aug 2009
      • 6

      #3
      Paragraph detection would be a nice feature, and not all that hard to add (certainly much less trouble than implementing PDF comparison in the first place). This problem also comes up in text documents that are hard-filled (e.g. README files designed for terminal reading), but have skipped lines between paragraphs. All BC would need to be able to do would be to strip the single new lines, and leave breaks between paragraphs.

      In other words, I want an option to treat single line separators as whitespace differences (ignorable), and double line separators as single line breaks. In source code, it makes sense to use single line breaks as the unit of decomposition, but in prose, (especially when different layout engines may have gotten involved), it really isn't.

      My current solution is to copy the text from the PDF, run a regex replacement on the pairs of newlines in each version (say with <<TEMP_PARAGRAPH_BREAK>>), delete all the single newlines, then replace the temporary break pattern with an actual line ending.
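
The three-step workaround described above could be scripted roughly as follows. This is a minimal sketch, not the poster's actual procedure: the function name `unwrap_paragraphs` is hypothetical, and it replaces each single newline with a space (rather than deleting it outright) on the assumption that words on adjacent lines should not be fused together.

```python
import re

# Temporary marker taken from the post; assumed not to occur in the real text.
PLACEHOLDER = "<<TEMP_PARAGRAPH_BREAK>>"


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping one line break per paragraph."""
    # Normalize Windows line endings so the patterns below stay simple.
    text = text.replace("\r\n", "\n")
    # Step 1: protect paragraph breaks (one or more blank lines).
    text = re.sub(r"\n(?:[ \t]*\n)+", PLACEHOLDER, text)
    # Step 2: turn the remaining single newlines into spaces.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # Step 3: restore each paragraph break as a single line ending.
    return text.replace(PLACEHOLDER, "\n")
```

Running both versions of a document through the same function before comparing should leave one line per paragraph, so differences in where the lines were wrapped no longer show up.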

      • Aaron
        Team Scooter
        • Oct 2007
        • 15938

        #4
        Thanks for the suggestion. This is on our Customer Wishlist, and I've added your notes to the entry.

        If you are able to perform that conversion from the command line, bat file, or script, you could create a custom Conversion for BC3 to use and call each time you open your specific file format. The bat file could call BC3's PDF conversion line, then format that file the way you want for comparison purposes and return that temp file. Make sure the file format is read-only to prevent accidental saving.
        Aaron P Scooter Software

        • JohnWSaundersIII
          Visitor
          • Jun 2007
          • 3

          #5
          Source Code Too

          My scenario is to make differences due to source code formatting ignorable. For instance, in C# source, I'd like

          MethodCall(1, 2, 3, 4);

          to be the same as

          MethodCall(
              1,
              2,
              3,
              4);

          I'm intrigued by an earlier reply about conversions: I hadn't looked into them before. Unfortunately, it would take a week or so for me to come up with the code to recognize statements and write them out with the newlines replaced by space. I'd also want to be able to recognize other multi-line structures, for instance:

          public class SomeClass<T1, T2>
              where T1 : constraint
              where T2 : constraint
          {
          }

          should be the same as

          public class SomeClass<T1, T2> where T1 : constraint where T2 : constraint
          {
          }
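
For the two examples above, a deliberately naive normalization is enough to make both layouts identical. The sketch below is an assumption-laden illustration, not a C# parser: `normalize_statements` is a hypothetical name, and the function ignores string literals, comments, and constructs such as `for (;;)`, all of which a real conversion would have to handle.

```python
import re


def normalize_statements(source: str) -> str:
    """Collapse layout whitespace, then put one statement or brace per line."""
    # Treat every run of whitespace, including newlines, as a single space.
    flat = re.sub(r"\s+", " ", source).strip()
    # Drop the padding a line break leaves just inside parentheses.
    flat = re.sub(r"\(\s+", "(", flat)
    flat = re.sub(r"\s+\)", ")", flat)
    # Start a new line after each statement terminator or brace.
    return re.sub(r"\s*([;{}])\s*", r"\1\n", flat)
```

Both files would be passed through the same function before comparison, so only differences that survive the normalization are shown.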

          • Maike Geng
            New User
            • Nov 2011
            • 1

            #6
            Any progress on this?

            In many languages and file formats newlines are treated as ordinary whitespace. There are some odd exceptions and that always makes it harder to get completely right.

            It'd be helpful to have an option that would make one pass through the changes and reclassify blocks that are purely newline and whitespace as unimportant changes.
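
The reclassification pass suggested above can be illustrated outside Beyond Compare with Python's standard `difflib`. This is only a sketch of the idea, not how BC works internally; `classify_changes` is a hypothetical name, and stripping all whitespace also hides whitespace changes inside string literals, which is one of the odd exceptions mentioned above.

```python
import difflib


def classify_changes(left_lines, right_lines):
    """Label each changed block as 'unimportant' if it differs only in whitespace."""
    matcher = difflib.SequenceMatcher(a=left_lines, b=right_lines, autojunk=False)
    results = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Compare the two blocks with every whitespace character removed,
        # so that moved line breaks and re-indentation do not count.
        left = "".join("".join(left_lines[i1:i2]).split())
        right = "".join("".join(right_lines[j1:j2]).split())
        kind = "unimportant" if left == right else "important"
        results.append((kind, (i1, i2), (j1, j2)))
    return results
```

Each result names the kind of change and the line ranges on both sides, which a viewer could use to color the block differently.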

            • Aaron
              Team Scooter
              • Oct 2007
              • 15938

              #7
              Hello,

              Such a process is dependent on your current files (such as programming language or structure). We support an external conversion to perform this pass, and have several for download in our additional and alternative file formats. This is functionality we would like to build into BC3, but is not currently supported.
              Aaron P Scooter Software

              • Arnaud
                Visitor
                • Sep 2015
                • 3

                #8
                any progress on this ?

                Hello,

                Was this feature actually implemented ?

                Indeed, we have to compare large .ptu files (IBM Rational Test RealTime scripts), and this file format allows a line to be split into several by escaping the continuation lines with a "&". So we have plenty of files which differ only in the positions of their line breaks, and we would really need such a feature. Unfortunately, I cannot find it in BC4.

                Even though the .ptu file format is quite uncommon, I think such a feature would also be a great help for treating as unimportant the differences in, for example, verbatim string literals in C# and C++11, or more generally, because most programming languages treat line breaks as whitespace.

                • Aaron
                  Team Scooter
                  • Oct 2007
                  • 15938

                  #9
                  Hello,

                  BC4 does not currently support comparing across line breaks. We use External Conversions to help reorganize and normalize whitespace for comparisons. We have a few available for download on our downloads page, but do not have one available for PTU files. We do have a KB article that helps and goes into detail on defining any custom command line tool, using RESX files as the example:
                  http://www.scootersoftware.com/suppo...rnalconversion
                  Aaron P Scooter Software

                  • Arnaud
                    Visitor
                    • Sep 2015
                    • 3

                    #10
                    Hello,

                    Thank you for the workaround you suggested.

                    So, I wrote a very basic Windows PowerShell script ("Normalize-PTU.ps1") to normalize the lines by concatenating the instructions split across several lines :

                    Code:
                    if ( $args.Length -ne 2 )
                    {
                        echo "args.Length /= 2 !"
                        exit;
                    }
                    
                    $source = $args[0]
                    $target = $args[1]
                    
                    if ( ! ( Test-Path $source -pathType leaf ) )
                    {
                        echo "source file does not exist !"
                        exit;
                    }
                    
                    ( Get-Content $source -Raw ) -replace ("`r`n\s*&\s*"," ") | Set-Content ( $target + ".normalized" )

                    And, since Beyond Compare seems to handle only Batch scripts, I also wrote a Batch wrapper script ("Normalize-PTU.bat") to call the Windows PowerShell script :

                    Code:
                    @echo off
                    
                    powershell -File Normalize-PTU.ps1 %1 %2

                    When I run the Batch file manually (for example with the command "Normalize-PTU.bat my_source_file.ptu my_target_file.ptu"), everything goes well, with the target file being created with the expected contents.

                    However, when trying to use it via Beyond Compare, I get a "conversion error" message in both the left pane and the right pane of the window :

                    [Screenshot: Beyond_Compare.png]

                    Here is the configuration I set up :

                    [Screenshots: Beyond_Compare_1.png, Beyond_Compare_2.png]

                    Is there anything I missed ?

                    • Aaron
                      Team Scooter
                      • Oct 2007
                      • 15938

                      #11
                      Hello,

                      My system defaults to preventing this script from executing due to Windows security. Have you made a customization that allows this script to execute? If so, does it extend to programs that call the script, or only to your own command line? And to verify: if you enter the following at the Windows Command Prompt:
                      Normalize-PTU.bat a.txt out.txt
                      does it take your example file a.txt and write the output to out.txt, and is out.txt not an empty file? Also, does your conversion return any errorlevel to the command line?
                      Aaron P Scooter Software
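
The checks asked for above (exit code, and whether the output file exists and is non-empty) can be scripted so that they are repeated the same way each time. The helper below is a hypothetical sketch using only the Python standard library; `check_converter` and its return shape are illustrative names, not part of Beyond Compare.

```python
import subprocess
from pathlib import Path


def check_converter(command, source, target):
    """Run `command source target` and report how the conversion went.

    Returns (exit_code, produced_output, stderr_text), where produced_output
    is True only if the target file exists and is not empty.
    """
    completed = subprocess.run(
        [*command, str(source), str(target)],
        capture_output=True,
        text=True,
    )
    target_path = Path(target)
    produced = target_path.is_file() and target_path.stat().st_size > 0
    return completed.returncode, produced, completed.stderr.strip()
```

On Windows, the batch wrapper from this thread could be tested with something like `check_converter(["cmd", "/c", "Normalize-PTU.bat"], "a.txt", "out.txt")`; a non-zero exit code or a missing `out.txt` would point at the script rather than at Beyond Compare.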

                      • Arnaud
                        Visitor
                        • Sep 2015
                        • 3

                        #12
                        Hello,

                        I finally found my mistake !
                        In the Windows PowerShell script, I was renaming the target file (%t in Beyond Compare, %2 in the Batch script and $target in the PowerShell script) by appending ".normalized" to its name.
                        I had not understood that this target file name is a temporary name chosen by Beyond Compare and that it must not be modified anywhere in the conversion tool chain. A look in my "~/AppData/Local/Temp" folder gave me the idea: I found that a "BCXXXXX.tmp.normalized" file was created there each time I tried to compare PTU files...

                        Thank you for your help,

                        Arnaud
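
The lesson from this post, that a converter must write to exactly the target path it is given, can be shown with a small sketch. It mirrors the regex from the PowerShell script above but is written in Python purely for illustration; `join_continuations` and `convert` are hypothetical names, and the UTF-8 encoding is an assumption about the PTU files.

```python
import re
from pathlib import Path

# A line break followed by optional whitespace, "&", and optional whitespace
# marks a continuation line, as in the PowerShell regex from the thread.
CONTINUATION = re.compile(r"\r?\n\s*&\s*")


def join_continuations(text: str) -> str:
    """Merge each '&' continuation line into the line before it."""
    return CONTINUATION.sub(" ", text)


def convert(source, target):
    """Normalize `source` and write the result to `target`, unchanged.

    The target name is chosen by the calling tool, so no suffix is added.
    """
    text = Path(source).read_text(encoding="utf-8", errors="replace")
    Path(target).write_text(join_continuations(text), encoding="utf-8")
```

A wrapper script would call `convert(sys.argv[1], sys.argv[2])` with the two paths the comparison tool passes in, and nothing else would need to know the temporary file's name.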
