Optimistic matching--preserve existing tokens when possible

  • Optimistic matching--preserve existing tokens when possible

    I'm sure this has come up before, but I'm having a hard time finding it because I don't know what the proper terminology is.

    Quite often I compare text where both sides are an exact match, except one has additional text that the other does not have. For example, I may have just added something to a file, but didn't change/delete anything that already existed. It is not uncommon for the beginning and/or ending characters of the new text to match a neighboring character in the original file. When this happens, BC only starts highlighting where the differences start, which can sometimes be in the middle of a word. While this is technically not incorrect, it is not how a human would highlight differences, and as such I am sometimes thrown off for a moment by it.

    A quick example would be to open a new text compare and type the following on the left:
    This is test two.
    and the following on the right:
    This is test thirty-two.
    Before you do that, though, imagine you had a highlighter in your hand and you were asked to highlight only the difference(s) between the two "files". Obviously, the only highlight will be on the right side, but what do you mark? Most humans would highlight "thirty-". BC highlights "hirty-t". Again, this is technically correct, but the "human way" of doing it is also correct and I think it makes a lot more sense.
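The two highlights are both valid because a single insertion can often be placed in more than one position and still produce the same result. Here is a minimal Python sketch of that idea. It is not BC's actual algorithm; the function names `inserted_span` and `slide_to_word_boundary` are made up for illustration, and it assumes the new line is the old line plus exactly one contiguous insertion.

```python
def inserted_span(old: str, new: str) -> tuple[int, int]:
    """Locate the inserted text in `new` by matching the longest common prefix.

    Assumes `new` is `old` plus one contiguous insertion (illustrative only).
    """
    prefix = 0
    while prefix < len(old) and old[prefix] == new[prefix]:
        prefix += 1
    start, end = prefix, prefix + (len(new) - len(old))
    if new[:start] + new[end:] != old:
        raise ValueError("new is not old plus a single insertion")
    return start, end


def slide_to_word_boundary(new: str, start: int, end: int) -> tuple[int, int]:
    """Among equivalent placements of the insertion, prefer one that starts a word."""
    candidates = [(start, end)]
    # Sliding left is valid while the character entering the span on the left
    # equals the character leaving it on the right.
    s, e = start, end
    while s > 0 and new[s - 1] == new[e - 1]:
        s, e = s - 1, e - 1
        candidates.append((s, e))
    # Sliding right is valid under the mirrored condition.
    s, e = start, end
    while e < len(new) and new[s] == new[e]:
        s, e = s + 1, e + 1
        candidates.append((s, e))
    for s, e in candidates:
        if s == 0 or not new[s - 1].isalnum():
            return s, e
    return start, end  # no word-aligned placement exists


old = "This is test two."
new = "This is test thirty-two."
s, e = inserted_span(old, new)
print(new[s:e])    # hirty-t
s, e = slide_to_word_boundary(new, s, e)
print(new[s:e])    # thirty-
```

The greedy prefix match consumes the "t" of "thirty" because it also matches the "t" of "two", which is where the mid-word highlight comes from. Sliding the span one character to the left gives an equally correct diff that starts on a word boundary.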

    Here's what it looks like:

    I have messed around with my alignment settings, but nothing seems to fix this. (If I'm not mistaken, alignment talks more about matching full lines than text on a single line... yes?)

    Is there a setting somewhere that would tell BC to be less eager to break up words? If not, is that something that could be added in a future version? (I acknowledge that this could be a very difficult problem, but it's worth asking.)

  • #2

    Thanks for posting. The main terminology issue is that "Alignment" generally refers to line-by-line alignment; character alignment is not affected by those options. Defined grammars can influence the character alignment, but because the character alignment is kept fast and efficient, it can lead to scenarios like the one you've found. Are your files generally plain text (words, sentences, paragraphs) or a code language that grammars might help with?
    Aaron P Scooter Software


    • #3
      I use BC all the time for many different things, but the majority of the time, I compare C# code files. As a side note--I was thrilled to discover that I could change the alignment settings from "Standard" to "Myers O(ND)" and that seems to have fixed the problem where code blocks are split up in unnatural ways, similar to what I posted above. I guess I should have read the manual a long time ago!


      • #4
        For C#, using that format should help with the above example when the grammars are in play, but less so when the text falls within the same grammar type. If you have sample files you would like to submit for our test cases (to help with future testing), you can email us with a link back to this forum thread for our reference.
        Aaron P Scooter Software


        • #5
          I'm not sure I understood that first sentence... Using the same example as above, I changed the format to "C, C++, C#, ObjC Source", but I don't see a difference. Did I misunderstand?

          This contrived example isn't important. I'll keep my eye open for real-life scenarios, as you suggest...


          • #6
            No, sorry. It wouldn't affect the above example, since those aren't actually C# files. The C# format has grammars defined, and those grammars may match specific text within the line. This doesn't dictate the character alignment, but it can influence it, which can help with similar examples.
            Aaron P Scooter Software
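As a rough illustration of how matching larger units can influence where character differences land, here is a small Python sketch that diffs word and punctuation tokens instead of single characters. This is only an assumption-laden stand-in: it uses Python's standard `difflib` and a simple regex tokenizer, not BC's grammars, and the helper names `tokenize` and `inserted_tokens` are hypothetical.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Split into runs of word characters and single non-word characters."""
    return re.findall(r"\w+|\W", text)


def inserted_tokens(old: str, new: str) -> list[str]:
    """Return the pieces of `new` that a token-level diff marks as added or changed."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        "".join(b[j1:j2])
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    ]


print(inserted_tokens("This is test two.", "This is test thirty-two."))
# ['thirty-']
```

Because "two" and "thirty" are compared as whole tokens, the diff cannot borrow the leading "t" from one word to match the other, so the highlight stays on word boundaries.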