Unchanged lines marked as having differences

  • blackpuma
    Enthusiast
    • Aug 2006
    • 21

    I've been a user of BC since version one and was thrilled to have it available on Mac. I've run into a couple of oddities; this is the more serious one.

    I have a Perl script under revision control. I have SourceTree configured to use Beyond Compare as its external diff tool.

    If I modify the Perl script using vim from the command line, BC correctly discerns that it's Perl. However, various lines that have not been altered are marked as having differences.

    If I switch from "Perl Scripts" to "Everything Else", the diff works correctly. Experimenting with the filters, I found that some of the other filters exhibit the same behaviour.

    OS X version: 10.9.4
    BC version: Version 4.0 (build 18847)
  • Garret
    Team Scooter
    • May 2014
    • 11

    #2
    Hello,

    There are a couple of reasons that could cause this sort of behavior in BC.

    The first thing to check when you run the comparison is the file format on each side of the comparison.
    Your screenshots include only one side of the comparison, not the other. With revision control, one of the
    files may have a temporary file extension, which can lead to the two sides being assigned different file formats.

    If that is not the case, you can check which grammar elements BC assigns to the differing text: place the cursor
    on the difference and read the status bar.

    EX: [Screenshot: 2014-09-17 12_45_49-HswzDb_Testing.py _--_ Testing.py - Text Compare - Beyond Compare.png]

    If the elements do not match (e.g., String vs. Keyword), BC shows the text as a difference, even if the characters are identical.
    Garret H

    Scooter Software

    • blackpuma
      Enthusiast
      • Aug 2006
      • 21

      #3
      Garret,

      Thanks for the information.

      If I'm understanding correctly, it appears that the grammar elements are the cause. As I move the cursor around, I see "String" on one side and a variety of other grammar elements such as "Operator", "Identifier", "Default Text", "Number", etc. on the other. Sometimes "String" is on the left, sometimes on the right. Either way, one side is "String" and the other side is a different grammar element.

      I'm attaching two screenshots of the entire BC window to (hopefully) capture all the relevant information. The diff is from a Git commit, opened via SourceTree's external diff feature.

      This check-in deleted two lines and inserted a sizable chunk of code.

      The first screenshot is using file type "Perl Scripts" (what BC detected) and the second screenshot is when I changed both file types to "Everything Else".

      The "changed" lines in the lower part of the left-hand gutter have not been altered at all, as can be seen in the second screenshot.

      I'm not sure why otherwise identical lines would be recognized as different grammar. I don't see this in, for example, Python code.

      • Garret
        Team Scooter
        • May 2014
        • 11

        #4
        > If I'm understanding correctly, it appears that the grammar elements are the cause.

        Yes, that is the case.

        If possible, could you send us some sample scripts so that we can find where the issue lies?
        If you can isolate the section of your file where the incorrect grammar detection begins, that would
        be very helpful. It works best for us if you email them to [email protected]
        Garret H

        Scooter Software

        • blackpuma
          Enthusiast
          • Aug 2006
          • 21

          #5
          I'll see about setting some time aside to do this. I'm keen to have it resolved.

          • blackpuma
            Enthusiast
            • Aug 2006
            • 21

            #6
            I've successfully isolated the problem. The parser is attempting to parse the contents of a heredoc and getting confused.

            I'll send an email as you requested, but for the benefit of others the problem is as follows.

            The following happens to work:

            print <<EOT;
            foo "hello "
            EOT

            print "There";

            But the following does not work:

            print <<EOT;
            foo "hello \"
            EOT

            print "There";

            In a heredoc (the << multi-line string literal syntax), the parser should treat everything until the final EOT as a string. However, it's parsing the contents of the heredoc.

            In the first example, it sees foo as an identifier, the initial " as the beginning of a string, and the second " as the end of a string.

            In the second example, it sees foo as an identifier, the initial " as the beginning of a string, and the \" as a literal character and continues looking for the end of a string (the first " of print "There").

            But it should not be parsing the contents of heredocs at all: the entire content of a heredoc should be treated as a string. It is true that heredocs with <<"EOT" are treated as "" strings in Perl and do interpolation. However, <<'EOT' should be treated as a '' string in Perl, without interpolation.
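            To illustrate the mechanism described above, here is a minimal Python sketch of a naive string scanner. It is not BC's actual parser; it only assumes, as the description implies, that a string runs from one " to the next, that a backslash escapes the following character, and that heredocs are not recognized at all. The function name is hypothetical.

```python
def naive_string_spans(src: str) -> list:
    """Return (start, end) offsets of "..." strings (end exclusive).

    Hypothetical scanner: a backslash escapes the next character, strings
    may span lines, and heredocs are not recognized. An unterminated
    string runs to the end of the input.
    """
    spans = []
    i = 0
    while i < len(src):
        if src[i] != '"':
            i += 1
            continue
        start = i
        i += 1
        while i < len(src) and src[i] != '"':
            # A backslash skips the next character, even if it is a quote.
            i += 2 if src[i] == "\\" else 1
        i = min(i + 1, len(src))  # step past the closing quote, if any
        spans.append((start, i))
    return spans


works = 'print <<EOT;\nfoo "hello "\nEOT\n\nprint "There";\n'
broken = works.replace('"hello "', '"hello \\"')

for src in (works, broken):
    print([src[a:b] for a, b in naive_string_spans(src)])
# In the broken case the first "string" swallows EOT and `print `, so
# There falls outside any string and its closing quote opens a new one.
```

            A heredoc-aware scanner would skip the body lines entirely, so the escaped quote inside them could never leak into the following code.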

            • Aaron
              Team Scooter
              • Oct 2007
              • 15997

              #7
              Hello,

              Thanks for the verification. Our Perl scripts format has String definitions for " to " and a few other scenarios, but not << to EOT. Our grammar elements do not support scanning ahead, so it would stop at the first EOT if one were defined; would that work, or does it need to skip ahead to a 'final' EOT?

              Otherwise, our general behavior is: if two different grammar elements are aligned, they are a difference even if the text values are equal. So "String" text aligned with non-"String" text is marked as a difference, and that difference is Important or Unimportant depending on whether the grammar elements are Important or Unimportant.

              The other basic String grammar that is defined in this Perl format is " to " with \ as an escape character.
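              The comparison rule described above can be modeled by comparing aligned (text, grammar element) pairs instead of bare text. The following is a minimal Python sketch, not BC's implementation; the Token type, the `important` set, and the assumption that a mismatch is Important if either side's element is Important are all hypothetical.

```python
from typing import NamedTuple, Set


class Token(NamedTuple):  # hypothetical token type
    text: str
    element: str  # grammar element name, e.g. "String" or "Identifier"


def classify(left: Token, right: Token, important: Set[str]) -> str:
    """Classify one aligned token pair as "same", "important" or "unimportant".

    Equal text is not enough: the grammar elements must match too.
    Assumption: a difference is Important if either element is Important.
    """
    if left == right:
        return "same"
    if {left.element, right.element} & important:
        return "important"
    return "unimportant"


important = {"String", "Identifier", "Operator"}
left = [Token("print", "Keyword"), Token("There", "String"), Token("# x", "Comment")]
right = [Token("print", "Keyword"), Token("There", "Identifier"), Token("# x", "Default Text")]
print([classify(l, r, important) for l, r in zip(left, right)])
# ['same', 'important', 'unimportant']: equal text, different elements
```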
              Aaron P Scooter Software

              • blackpuma
                Enthusiast
                • Aug 2006
                • 21

                #8
                Your grammar parser handles string constants that span multiple lines, so I hope this isn't a complication.

                The simple way, I think, would be to take the delimiter string immediately after << and search for the next occurrence of ^delimiter$.
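                A minimal Python sketch of that simple approach, assuming a bare (unquoted) delimiter written directly after <<: take the word after <<, then search from the start of the next line for the first line that is exactly that word (^delimiter$). The names are hypothetical, and the sketch does not try to tell heredocs apart from the << shift operator.

```python
import re
from typing import Optional

# Hypothetical opener pattern: bare delimiter only, no space after <<.
OPENER = re.compile(r"<<([A-Za-z_]\w*)")


def heredoc_body(src: str, pos: int = 0) -> Optional[str]:
    """Return the body of the first bare-word heredoc at or after pos.

    The body starts on the line after the opener and stops just before the
    first line consisting solely of the delimiter. Returns None when there
    is no opener or no terminating line.
    """
    m = OPENER.search(src, pos)
    if m is None:
        return None
    body_start = src.find("\n", m.end()) + 1
    if body_start == 0:  # opener is on the last line: nothing follows
        return None
    # With re.MULTILINE, ^ and $ match at line boundaries.
    terminator = re.compile("^" + re.escape(m.group(1)) + "$", re.MULTILINE)
    t = terminator.search(src, body_start)
    return src[body_start:t.start()] if t else None


print(repr(heredoc_body('print <<EOT;\nfoo "hello \\"\nEOT\n\nprint "There";\n')))
# An indented "MyList" is body text; only the bare line ends the heredoc.
print(repr(heredoc_body("sort <<MyList\norange\napple\nmud\n   MyList\nMyList\n")))
```

                Because the search begins on the line after the opener, anything else on the opener line (such as the trailing semicolon) is left for the normal grammar.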

                Unix shell here documents are simple:

                Code:
                <<Alpha_Numeric
                while Perl allows quoted delimiters, which can include any characters, such as

                Code:
                <<' foo?' 
                <<"bar!" 
                <<`baz -_-`
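                To handle the Perl forms, an opener needs to recognize four shapes. A minimal Python sketch, assuming the rules as stated in this thread and in the perlop documentation: a bare identifier or a double-quoted delimiter interpolates, a single-quoted one does not, and a backquoted one is interpolated and run as a command. The names are hypothetical, and the pattern does not accept whitespace between << and a quoted delimiter.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, one alternative per opener form; a quoted delimiter
# may contain any characters except its own closing quote.
OPENER = re.compile(
    r"<<(?:(?P<bare>[A-Za-z_]\w*)"
    r"|'(?P<single>[^']*)'"
    r'|"(?P<double>[^"]*)"'
    r"|`(?P<command>[^`]*)`)"
)


class Opener(NamedTuple):
    delimiter: str
    kind: str           # "bare", "single", "double" or "command"
    interpolates: bool  # False only for the single-quoted form


def parse_opener(line: str) -> Optional[Opener]:
    m = OPENER.search(line)
    if m is None:
        return None
    kind = m.lastgroup  # name of the alternative that matched
    return Opener(m.group(kind), kind, kind != "single")


for text in ["<<Alpha_Numeric", "<<' foo?'", '<<"bar!"', "<<`baz -_-`"]:
    print(parse_opener(text))
```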
                Last edited by blackpuma; 18-Sep-2014, 12:21 PM.

                • blackpuma
                  Enthusiast
                  • Aug 2006
                  • 21

                  #9
                  A fuller implementation would be mindful of one concept: the here doc style of multi-line string constant is line based. The string constant begins on the *next* line and continues until the string delimiter is encountered alone on its own line without whitespace (^delimiter$).

                  With regular quotes one can say

                  Code:
                  print "Hello
                  
                  
                  There\n";
                  Where the string starts with the first quote and terminates with the second quote three lines down. In other words, the string literal is "Hello\n\n\nThere\n".
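                  By contrast, an ordinary quoted string is delimited character by character, so a scanner can simply follow it across line breaks. A minimal Python sketch that decodes the first "..." literal, assuming only the \n, \" and \\ escapes (Perl itself supports many more, and also interpolates variables, which this ignores). The names are hypothetical.

```python
from typing import Optional

# Hypothetical, deliberately tiny subset of Perl's escapes.
ESCAPES = {"n": "\n", '"': '"', "\\": "\\"}


def first_double_quoted(src: str) -> Optional[str]:
    """Decode the first "..." literal in src; it may span several lines."""
    start = src.find('"')
    if start < 0:
        return None
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return "".join(out)
        if ch == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            # Unknown escapes keep their backslash in this sketch.
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
            continue
        out.append(ch)  # raw line breaks become part of the string
        i += 1
    return None  # unterminated literal


perl = 'print "Hello\n\n\nThere\\n";\n'
print(repr(first_double_quoted(perl)))  # 'Hello\n\n\nThere\n'
```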

                  Unix shells (e.g. Bash) define here documents as << followed by a string. The string literal starts *on the next line* and continues until the string is encountered *alone* on a line.

                  For example,

                  Code:
                  sort << MyList
                  orange
                  apple
                  mud
                  MyList
                  The string literal is "orange\napple\nmud\n".

                  However, the terminating delimiter has to be on its own line, alone. For example,

                  Code:
                  sort << MyList
                  orange
                  apple
                  mud
                             MyList
                  MyList
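                  The string literal is "orange\napple\nmud\n MyList\n": the indented MyList line, leading whitespace included, is part of the string, and only the MyList alone on the last line ends the here document.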


                  Perl extends this: it defines a here document as << followed by a string, optionally enclosed in '', "", or ``. The string starts *on the next line* and continues until it encounters the delimiter alone on a line.

                  So in Perl one can write it just as in shell scripts:

                  Code:
                  print <<MyList; 
                  orange
                  apple
                  mud
                  MyList
                  Or have some syntax around the << line such as:

                  Code:
                  open(FH, ">/tmp/foo.txt") or die "Can't open file.\n";
                  print FH <<MyList or die "Can't write to file.\n";
                  orange
                  apple
                  mud
                  MyList
                  close(FH);
                  Here the string literal is "orange\napple\nmud\n" and the print line has extra grammar elements. <<MyList simply states that the string literal starts on the next line.

                  In all cases, the string literal starts on the *next* line. (There is a weird oddity with multiple here docs in Perl that I wouldn't worry about personally.)

                  Perl supports quoted delimiters, so one can do something like:

                  Code:
                      print <<"    MyList";
                  orange 
                  apple
                  mud
                      MyList
                  Where the delimiter is four spaces followed by MyList.
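                  Putting these rules together, a line-based classifier only needs to remember which delimiters are waiting for a body. A minimal Python sketch under the assumptions stated in this thread: the opener line is still parsed as code, the body starts on the next line, and it ends at the first line exactly equal to the delimiter. The names are hypothetical; it does not handle <<~, whitespace between << and a quoted delimiter, or << appearing inside an ordinary string.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical opener pattern: bare, '...', "..." and `...` delimiters.
OPENER = re.compile(
    r"<<(?:(?P<bare>[A-Za-z_]\w*)"
    r"|'(?P<single>[^']*)'"
    r'|"(?P<double>[^"]*)"'
    r"|`(?P<command>[^`]*)`)"
)


def classify_lines(src: str) -> List[Tuple[str, str]]:
    """Label each line "code", "heredoc" (body text) or "terminator"."""
    labels: List[Tuple[str, str]] = []
    pending: List[str] = []        # further delimiters opened on the same line
    current: Optional[str] = None  # delimiter whose body we are reading
    for line in src.splitlines():
        if current is not None:
            if line == current:  # delimiter alone on its line, nothing else
                labels.append((line, "terminator"))
                # Several heredocs opened on one line are read back to back.
                current = pending.pop(0) if pending else None
            else:
                labels.append((line, "heredoc"))
            continue
        # The opener line itself is ordinary code (print FH ... or die ...).
        labels.append((line, "code"))
        pending = [m.group(m.lastgroup) for m in OPENER.finditer(line)]
        current = pending.pop(0) if pending else None
    return labels


perl = (
    "open(FH, \">/tmp/foo.txt\") or die \"Can't open file.\\n\";\n"
    "print FH <<MyList or die \"Can't write to file.\\n\";\n"
    "orange\napple\nmud\nMyList\nclose(FH);\n"
)
quoted = '    print <<"    MyList";\norange \napple\nmud\n    MyList\n'
for src in (perl, quoted):
    print([kind for _, kind in classify_lines(src)])
```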
                  Last edited by blackpuma; 18-Sep-2014, 12:20 PM.

                  • Garret
                    Team Scooter
                    • May 2014
                    • 11

                    #10
                    Unfortunately at this time our parser cannot handle all of the specifications you listed for heredocs. The best case we can come up with that would suit the heredoc specs would be to define a new String grammar element in the Perl Scripts file format using the specs as seen below:

                    [Screenshot: 2014-09-18 15_10_23-Grammar Item.png]

                    With the way our delimited parse function works, you would need to define an individual grammar element for each different delimiter you use for heredocs.

                    For future versions of BC we are looking into better options for specific document parsing so that BC will better detect syntax like heredocs.
                    Garret H

                    Scooter Software

                    • blackpuma
                      Enthusiast
                      • Aug 2006
                      • 21

                      #11
                      Garret,

                      Thanks for the information, and thanks for taking the time to look into the issue.

                      My style is consistent when it comes to heredocs, so the workaround that you've presented will do just fine.

                      In a pinch the generic "Everything Else" file type works.

                      • Aaron
                        Team Scooter
                        • Oct 2007
                        • 15997

                        #12
                        Thanks for the clarification. BC4's grammars do not currently support this type of back reference (knowing which keyword to end on), but we'll add it to our wishlist.
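                        For readers wondering what "back reference" means here: the end of the match must repeat text captured at the start. Regular expression engines that support back references can express this directly. A minimal Python sketch of the concept, assuming a bare delimiter; it is an illustration only, not a BC grammar feature or setting.

```python
import re

# Illustration only. (?P=delim) is a back reference: it matches only the
# exact text the delim group captured after <<, and ^...$ require it to
# stand alone on its line. \b stops the delimiter from being shortened.
HEREDOC = re.compile(
    r"<<(?P<delim>[A-Za-z_]\w*)\b[^\n]*\n"  # opener and the rest of its line
    r"(?P<body>.*?)"                        # shortest body that works
    r"^(?P=delim)$",                        # first line equal to the delimiter
    re.DOTALL | re.MULTILINE,
)

src = 'print <<EOT;\nfoo "hello \\"\nEOT\n\nprint "There";\n'
m = HEREDOC.search(src)
print(m.group("delim"), repr(m.group("body")))
```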
                        Aaron P Scooter Software
