Unchanged lines marked as having differences

**Garret** · 17-Sep-2014, 12:49 PM

Hello,

There are a couple of reasons that could cause this sort of behavior in BC.

The first thing to check when you run the comparison is the file format on each side of the comparison.
Your screenshots show only one side of the comparison, not the other. With revision control, it is possible that
one of the files has a temporary file extension, which can lead to the two sides being detected as different file formats.

If that is not the case, you can test to see what grammar elements BC recognizes the diffs as. You can check this by placing the cursor
on the difference area and seeing what the status bar reads.

EX:

If the elements do not match (ex: String & Keyword) BC will show these as diffs.

**blackpuma** · 17-Sep-2014, 01:38 PM

Garret,

Thanks for the information.

If I'm understanding correctly, it appears that the grammar elements are the cause. As I move the cursor around, I see "String" on one side and a variety of other grammar elements such as "Operator", "Identifier", "Default Text", "Number", etc. on the other. Sometimes "String" is on the left, sometimes on the right. Regardless, one side is "String" and the other side is a different grammar element.

I'm attaching two screenshots of the entire BC window to (hopefully) capture all relevant information. The diff is from a git commit, using SourceTree's external diff feature.

This checkin deleted two lines and inserted a sizable chunk of code.

The first screenshot is using file type "Perl Scripts" (what BC detected) and the second screenshot is when I changed both file types to "Everything Else".

The "changed" lines in the lower part of the left-hand gutter have not been altered at all, as can be seen in the second screenshot.

I'm not sure why otherwise identical lines would be recognized as different grammar. I don't see this in, for example, Python code.


**Garret** · 17-Sep-2014, 05:22 PM

> If I'm understanding correctly, it appears that the grammar elements are the cause.

Yes, this is the case then.

If it is possible, could you send us some sample scripts so that we might find where the issue lies?
If you can isolate the section of your file that begins the incorrect grammar detection, that would
be very helpful. It works best for us if you can email them to [email protected]

**blackpuma** · 17-Sep-2014, 06:17 PM

I'll see about setting some time aside to do this. I'm keen to have it resolved.

**blackpuma** · 17-Sep-2014, 06:58 PM

I've successfully isolated the problem. The parser is attempting to parse the contents of a heredoc and getting confused.

I'll send an email as you requested, but for the benefit of others the problem is as follows.

The following happens to work:

print <<EOT;
foo "hello "
EOT

print "There";

But the following does not work:

print <<EOT;
foo "hello \"
EOT

print "There";

In a heredoc (the << multi-line string literal syntax), the parser should treat everything until the final EOT as a string. However, it's parsing the contents of the heredoc.

In the first example, it sees foo as an identifier, the initial " as the beginning of a string, and the second " as the end of a string.

In the second example, it sees foo as an identifier, the initial " as the beginning of a string, and the \" as a literal character and continues looking for the end of a string (the first " of print "There").

But it should not be parsing the contents of heredocs at all. The entire content of the heredoc should be treated as a string. It is true that heredocs with <<"EOT" are treated as "" strings in Perl and do interpolation. However, <<'EOT' should be treated as '' strings in Perl without interpolation.
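For anyone following along, here is a minimal Python sketch of the failure described above. It assumes a scanner that only knows "..." strings with backslash as the escape character and has no notion of heredocs. The names `naive_string_spans` and `strings` are made up for illustration; this is not how BC's parser is actually implemented.

```python
def naive_string_spans(text):
    """Return (start, end) index pairs of "..." strings.

    Hypothetical model: quote-to-quote matching with backslash as the
    escape character, and no knowledge of heredocs.
    """
    spans = []
    start = None  # index of the opening quote while inside a string
    i = 0
    while i < len(text):
        ch = text[i]
        if start is None:
            if ch == '"':
                start = i
        elif ch == "\\":
            i += 1  # skip the escaped character, e.g. the quote in \"
        elif ch == '"':
            spans.append((start, i + 1))
            start = None
        i += 1
    return spans


def strings(text):
    """Return the text of every string the naive scanner finds."""
    return [text[a:b] for a, b in naive_string_spans(text)]


works = 'print <<EOT;\nfoo "hello "\nEOT\n\nprint "There";\n'
breaks = 'print <<EOT;\nfoo "hello \\"\nEOT\n\nprint "There";\n'

print(strings(works))
print(strings(breaks))
```

With the first sample the scanner finds "hello " and "There" as two separate strings. With the second, the escaped quote keeps the string open until the opening quote of "There", so the text after the heredoc ends up classified the wrong way round.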

**Aaron** · 18-Sep-2014, 10:13 AM

Hello,

Thanks for the verification. Our Perl scripts format has String definitions for " to " and a few other scenarios, but not << to EOT. Our grammar elements do not support a scan ahead, so it would stop at the first EOT if it was defined; would this work, or does it need to skip ahead to a 'final' EOT?

Otherwise, our general behavior is: if two different grammar elements are aligned, they are a difference even if the text values are equal. So a "String" aligned with something that is not a String would be marked as a difference, and that difference is Important or Unimportant depending on whether the grammar elements are Important or Unimportant.

The other basic String grammar that is defined in this Perl Format is " to " with \ as an escape character.
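The alignment rule above can be sketched in Python as follows. The `Token` type and `is_difference` function are hypothetical names used only for illustration, not BC's API, and the Important/Unimportant classification is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str     # the characters on the line
    element: str  # grammar element name, e.g. "String" or "Identifier"


def is_difference(left: Token, right: Token) -> bool:
    """Aligned tokens differ if either the text or the grammar element differs."""
    return left.text != right.text or left.element != right.element


# Same text, different grammar element: still reported as a difference.
print(is_difference(Token("There", "String"), Token("There", "Identifier")))
# Same text, same grammar element: no difference.
print(is_difference(Token("There", "String"), Token("There", "String")))
```

This is why identical lines show up as changed once the string state gets out of step between the two sides.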

**blackpuma** · 18-Sep-2014, 12:15 PM

Your grammar parser handles string constants that span multiple lines, so I hope this isn't a complication.

The simple way, I think, would be to take the delimiter string immediately after << and search for the next occurrence of ^delimiter$

Unix shell here documents are simple

Code:

<<Alpha_Numeric

while Perl can have quoted delimiters which can include any characters like

Code:

<<' foo?' 
<<"bar!" 
<<`baz -_-`
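A minimal Python sketch of that approach: read the delimiter right after <<, then search for the next line that consists of the delimiter alone. The regular expression and the `heredoc_body` function are assumptions for illustration. The sketch only handles the first heredoc in the text, expects no space between << and the delimiter, and makes no attempt to tell << heredocs apart from the shift operator.

```python
import re

# A bare identifier, or a delimiter quoted with '', "" or `` that may
# contain any characters (group 1 is the quote character).
HEREDOC_OPEN = re.compile(
    r"""<<(?:(['"`])(?P<quoted>.*?)\1|(?P<bare>[A-Za-z_]\w*))"""
)


def heredoc_body(source):
    """Return (delimiter, body) for the first heredoc in source, or None."""
    m = HEREDOC_OPEN.search(source)
    if m is None:
        return None
    quoted = m.group("quoted")
    delim = quoted if quoted is not None else m.group("bare")
    # The string literal starts on the line after the << line.
    line_end = source.find("\n", m.end())
    if line_end == -1:
        return None
    body_start = line_end + 1
    # The terminator must be alone on its line: ^delimiter$
    terminator = re.compile("^" + re.escape(delim) + "$", re.MULTILINE)
    end = terminator.search(source, body_start)
    if end is None:
        return None
    return delim, source[body_start:end.start()]


sample = 'print <<EOT;\nfoo "hello \\"\nEOT\n\nprint "There";\n'
print(heredoc_body(sample))
```

Because the body is taken as a whole, the quotes and backslash inside it never reach the ordinary string rules.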

**blackpuma** · 18-Sep-2014, 12:16 PM

A fuller implementation would be mindful of one concept: the here doc style of multi-line string constant is line based. The string constant begins on the *next* line and continues until the string delimiter is encountered alone on its own line without whitespace (^delimiter$).

With regular quotes one can say

Code:

print "Hello


There\n";

Where the string starts with the first quote and terminates with the second quote three lines down. In other words, the string literal is "Hello\n\n\nThere\n".

Unix shells (e.g. Bash) define here documents as << followed by a string. The string literal starts *on the next line* and continues until the string is encountered *alone* on a line.

For example,

Code:

sort << MyList
orange
apple
mud
MyList

The string literal is "orange\napple\nmud\n".

However the terminating delimiter has to be on its own line, alone. For example,

Code:

sort << MyList
orange
apple
mud
           MyList
MyList

The string literal is "orange\napple\nmud\n           MyList\n" (the indented MyList line, leading whitespace included, is part of the string).

Perl expands on this: it defines a here document as << followed by a string, optionally enclosed in '' or "" or ``. The string literal starts *on the next line* and continues until it encounters the delimiter string alone on a line.

So in Perl one can do the same as in shell scripts:

Code:

print <<MyList; 
orange
apple
mud
MyList

Or have some syntax around the << line such as:

Code:

open(FH, ">/tmp/foo.txt") or die "Can't open file.\n";
print FH <<MyList or die "Can't write to file.\n";
orange
apple
mud
MyList
close(FH);

Here the string literal is "orange\napple\nmud\n" and the print line has extra grammar elements. <<MyList simply states that the string literal starts on the next line.

In all cases, the string literal starts on the *next* line. (There is a weird oddity with multiple here docs in Perl that I wouldn't worry about personally.)

Perl supports quotes so one can do something like:

Code:

    print <<"    MyList";
orange 
apple
mud
    MyList

Where the delimiter is four spaces followed by MyList.

**Garret** · 18-Sep-2014, 03:16 PM

Unfortunately at this time our parser cannot handle all of the specifications you listed for heredocs. The best case we can come up with that would suit the heredoc specs would be to define a new String grammar element in the Perl Scripts file format using the specs as seen below:

With the way our delimited parse function works, you would need to specify individual grammar elements for each case where you would be using a different string literal for the heredoc.

For future versions of BC we are looking into better options for specific document parsing so that BC will better detect syntax like heredocs.
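To illustrate the per-delimiter workaround, here is a rough Python sketch of a delimited rule that uses fixed start and stop text with no back reference, so every heredoc delimiter in use needs its own rule. The function name `fixed_delimited_spans` and the rule list are hypothetical; this is only a model of the behavior described in this thread, not BC's actual parse function.

```python
def fixed_delimited_spans(text, rules):
    """Return the spans matched by fixed (start, stop) text pairs.

    Hypothetical model: each rule is literal text, and a span ends at the
    first occurrence of its stop text after its start text.
    """
    spans = []
    pos = 0
    while pos < len(text):
        # Find the rule whose start text appears earliest from pos onward.
        hits = [(text.find(start, pos), start, stop) for start, stop in rules]
        hits = [h for h in hits if h[0] != -1]
        if not hits:
            break
        begin, start, stop = min(hits)
        end = text.find(stop, begin + len(start))
        if end == -1:
            spans.append(text[begin:])  # unterminated: runs to end of text
            break
        end += len(stop)
        spans.append(text[begin:end])
        pos = end
    return spans


# One rule per delimiter that actually appears in your scripts.
rules = [("<<EOT", "EOT"), ("<<MyList", "MyList")]
sample = 'print <<EOT;\nfoo "hello \\"\nEOT\n\nprint "There";\n'
print(fixed_delimited_spans(sample, rules))
```

The span stops at the first EOT it sees, so a heredoc whose body contains the delimiter text would end early. With a consistent heredoc style that limitation rarely matters.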

**blackpuma** · 18-Sep-2014, 03:27 PM

Garret,

Thanks for the information, and thanks for taking the time to look into the issue.

My style is consistent when it comes to heredocs, so the workaround that you've presented will do just fine.

In a pinch the generic "Everything Else" file type works.

**Aaron** · 18-Sep-2014, 03:47 PM

Thanks for the clarification. BC4's grammars do not currently support this type of back reference (to know what the keyword to end on is), but it's something we'll add to our wishlist.
