Mark entire words instead of characters

  • Mark entire words instead of characters

    Hello all,

    I am currently testing the Beyond Compare trial which seems very promising.

    However, I have an issue that I have not been able to solve. I've searched the forums and documentation and could not find a solution. I would like the software to mark entire words as different, instead of single characters, especially when those words are very dissimilar.

    So for example, if the two compared words are Apple and Apples, the "s" being marked is OK for my needs.

    But for Apples and Oranges, "appl" and "orng" are marked as different. Is there any way, using built-in functions or regular expressions, to get the whole word marked as different?

    Thank you,


  • #2

    Performing the comparison based on Words instead of Characters is something on our wishlist. We can define grammars, and any aligned characters that belong to different grammar elements are treated as different, even if they are the same character. This is most commonly seen in "This is a String" vs. This is a String: it counts as a difference even though the characters are the same, because only one side is actually a String grammar element. If you had a grammar element for Apples and another for Oranges, it would apply similarly. This would require a grammar element for each suspected case, however, which may be outside your scope if the differences aren't predictable.
    Aaron P Scooter Software
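
For anyone who wants to prototype the word-based behaviour outside Beyond Compare, here is a minimal Python sketch. It is not how Beyond Compare works internally and uses only the standard `difflib` module. The function name `word_diff` and the 0.6 similarity threshold are assumptions chosen for illustration: words that are similar (Apple vs. Apples) keep character-level marking, while dissimilar words (Apples vs. Oranges) are flagged as a whole.

```python
import difflib
import re


def word_diff(left, right, threshold=0.6):
    """Compare two lines word by word.

    Returns (left_word, right_word, whole_word) tuples for replaced words.
    whole_word is True when the two words are dissimilar enough that the
    entire word should be flagged rather than single characters.
    Pure insertions and deletions are ignored in this sketch, and replaced
    runs of unequal length are paired only up to the shorter run.
    """
    a = re.findall(r"\S+", left)
    b = re.findall(r"\S+", right)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    results = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        for lw, rw in zip(a[i1:i2], b[j1:j2]):
            # Character-level similarity of the two aligned words (0.0 to 1.0).
            ratio = difflib.SequenceMatcher(a=lw, b=rw, autojunk=False).ratio()
            results.append((lw, rw, ratio < threshold))
    return results


print(word_diff("I like Apple pie", "I like Apples pie"))
print(word_diff("I like Apples", "I like Oranges"))
```

The threshold is the tunable part: raising it flags more word pairs as whole-word differences, lowering it keeps more of them at character level.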


    • #3
      Thank you for the quick response Aaron!

      If anyone has found a workaround I would love to hear about it. For the moment we have used the delimiter in grammar for predictable variables, but we cannot do so for unpredictable ones as Aaron mentioned.


      • #4
        Hello again,

        We have used the Delimiter rule to get what we wanted.
        Text From: {{
        To: }}

        Whilst this works well 99% of the time, sometimes it acts a bit weird. For example comparing the following:

        Please note that {{NAME}}, from the {{COUNTRY OF ORIGIN}} and with passport number {{PASSPORT NUMBER}} is...
        Please note that John Smith, from the the UK and with passport number EC1234567 is...

        The "r" in "number" is marked as being different. If I change the passport number from "EC1234567" to "EC123 4567" or several other changes, the "r" is no longer marked as being different. Does anyone know what is wrong and why the "r" is marked as being different?
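
As a cross-check outside Beyond Compare, the template line can be turned into a regular expression so that each {{ }} placeholder is matched as one unit rather than character by character. This is a rough Python sketch under assumptions: the helper names `template_to_regex` and `extract_values` are hypothetical, and it assumes placeholders never nest and that the literal text between them is enough to separate the values.

```python
import re

# Matches one {{...}} placeholder and captures its name.
PLACEHOLDER = re.compile(r"\{\{(.*?)\}\}")


def template_to_regex(template):
    """Build a regex in which every {{...}} placeholder becomes a capture group."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        parts.append("(.+?)")                             # placeholder value
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def extract_values(template, filled):
    """Return {placeholder name: value}, or None if the literal text differs."""
    names = PLACEHOLDER.findall(template)
    match = template_to_regex(template).fullmatch(filled)
    if match is None:
        return None
    return dict(zip(names, match.groups()))


template = ("Please note that {{NAME}}, from the {{COUNTRY OF ORIGIN}} "
            "and with passport number {{PASSPORT NUMBER}} is...")
filled = ("Please note that John Smith, from the the UK "
          "and with passport number EC1234567 is...")
print(extract_values(template, filled))
```

A return value of None means the fixed text around the placeholders does not match, which is the kind of difference that should actually be reviewed.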



        • #5

          In the View menu, if you enable Alignment Details, I suspect you will see the "r" aligning to a different character or to another "r", with the two belonging to different grammar elements. An aligned character that belongs to a different grammar element is marked as a difference (for example, "String" vs. String is a difference: the characters are the same, but only one of them is within a String grammar).
          Aaron P Scooter Software


          • #6
            Dear Aaron,

            Thank you once again for the help. I've enabled Alignment Details and it seems that the "r" is matched with an "r" character within the {{ }} delimiter. Please excuse the image, company policies.


            Is there a way to prevent this from happening?

            Thank you


            • #7

              Defining grammars can help push in-line alignment, but there isn't a way to prevent it. The in-line character alignment is an algorithm that works in multiple directions at once to be faster, so it can potentially match in either direction, although it tries its best. In your example I defined two grammars which helped:
              Grammar 1 = PassVar = type delimited from {{ to }}
              Grammar 2 = PassLiteral = Basic RegEx that matches the literal passport number. In my example I used the literal string "berEC843y793", but if the surrounding text or pattern is known, you can create a RegEx that matches any passport ID.
              Aaron P Scooter Software
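
To illustrate the "surrounding text is known" case, here is a small Python sketch of a pattern anchored on the words around the passport number. It uses Python's `re` syntax, which may not match Beyond Compare's regex flavor exactly, and the anchor phrases "passport number " and " is" are assumptions taken from the sample sentence in this thread.

```python
import re

# Anything between "passport number " and the following " is".
# Both anchors are assumptions based on the sample sentence.
PASSPORT_LITERAL = re.compile(r"(?<=passport number ).+?(?= is\b)")


def find_passport_literal(line):
    """Return the text sitting in the passport-number slot, or None."""
    m = PASSPORT_LITERAL.search(line)
    return m.group(0) if m else None


print(find_passport_literal(
    "from the the UK and with passport number EC1234567 is..."))
print(find_passport_literal(
    "from the the UK and with passport number EC123 4567 is..."))
```

Because the pattern keys on the neighbouring words rather than on the ID format, it does not care whether the number contains spaces, but it only works when those neighbouring words are stable.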


              • #8
                My issue is that passport numbers can come in all kinds of formats, so I am not sure a regex would solve my issue. Appreciate the help though!


                • #9
                  Is there any kind of common pattern we could match on for the Passport Literal? If it is at the end of the line, we would match on X characters to End of Line.
                  Aaron P Scooter Software


                  • #10
                    Unfortunately no, it is in the middle of the sentence, plus there are more variables around it that would make it impossible to match. I have marked "passport number" as important, and that more or less solves it, as it now groups the text together.