Replacements & Whitespace

  • Michael Bulgrien
    Carpal Tunnel
    • Oct 2007
    • 1772

    Replacements & Whitespace

    Usually additional whitespace does not result in a significant change.

    I am having problems with Replacements. For example, I have a replacement defined with the "replace with" text set to:

    AND c_timestamp = @c_timestamp

    On the right-hand side of the compare, some files have additional whitespace around the equal sign. If the "text to find" on the left was actually replaced with the "replace with" text, then the only difference between the left and right sides would be whitespace, and the lines would be considered similar. In replacements, however, a difference in whitespace makes the change significant.

    I understand that replacements expects a 1-to-1 match with the text in the right pane, but this discrepancy still feels wrong.

    Suggestion:

    Why not programmatically create a regular expression from the "replace with" text as follows...
    • Substitute [ \t]* for whitespace next to a delimiting character
    • Substitute [ \t]+ for whitespace not next to a delimiting character
    • Convert any special characters to their regular expression equivalents (such as [[] for [ and [\]] for ] and [(] for ( and [)] for ) etc.)


    ...then, when the "Text to find" is found on the left-hand side, evaluate the right-hand side to see whether it matches the programmatically created "replace with" regular expression. If it does, color the resulting text blue instead of letting whitespace differences make the replacement appear as a significant change.
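
The three substitution rules above can be sketched in a few lines of Python. This is only an illustration of the suggestion, not anything Beyond Compare does: the function name and the `DELIMITERS` set are hypothetical, since the post does not say which characters count as "delimiting".

```python
import re

# Hypothetical delimiter set; the post does not define "delimiting character".
DELIMITERS = set("=,()[]<>+-*/;")


def whitespace_tolerant_pattern(replace_with: str) -> str:
    """Build a regex source string from literal "replace with" text."""
    # Split the text into runs of whitespace and runs of non-whitespace.
    tokens = re.findall(r"[ \t]+|[^ \t]+", replace_with)
    parts = []
    for i, tok in enumerate(tokens):
        if tok[0] in " \t":
            # Look at the characters on either side of this whitespace run.
            prev_ch = tokens[i - 1][-1] if i > 0 else ""
            next_ch = tokens[i + 1][0] if i + 1 < len(tokens) else ""
            beside_delim = prev_ch in DELIMITERS or next_ch in DELIMITERS
            # Optional next to a delimiter, required everywhere else.
            parts.append(r"[ \t]*" if beside_delim else r"[ \t]+")
        else:
            # Escape regex metacharacters such as ( ) [ ].
            parts.append(re.escape(tok))
    return "".join(parts)


pattern = re.compile(whitespace_tolerant_pattern("AND c_timestamp = @c_timestamp"))
```

With this sketch, `pattern` accepts `AND c_timestamp=@c_timestamp` as well as the original spacing, but still requires at least one space or tab between `AND` and `c_timestamp`.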
    BC v4.0.7 build 19761
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
  • Erik
    Team Scooter
    • Oct 2007
    • 437

    #2
    Just create a Replacement with "Text to find" set to "AND\s+c_timestamp", "Replace with" set to "@c_timestamp", and "Regular expression" checked.

    Sorry, I don't agree with your suggestion. Enforcing the assumptions you're making is not a good idea because they aren't always appropriate. As long as you're precise, the current implementation should work fine. As always, just contact us if you need help developing an appropriate regular expression.
    Erik Scooter Software

    • Michael Bulgrien
      Carpal Tunnel
      • Oct 2007
      • 1772

      #3
      Originally posted by Erik
      Enforcing the assumptions you're making is not a good idea because they aren't always appropriate.
      Few things are always appropriate. That is why products provide configurable options to enable and disable certain functionalities.

      I provided a simple example of a replacement that shows up as an important change when the only difference is whitespace...and whitespace is defined as being unimportant. I suggested a way for Scooter to enhance the product to prevent this discrepancy. If there isn't a better argument against it than what you've given, then at least you could be good enough to add improved replacement analysis (based on whitespace importance rules) to the wish list. Thank you.
      BC v4.0.7 build 19761
      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

      • Michael Bulgrien
        Carpal Tunnel
        • Oct 2007
        • 1772

        #4
        Originally posted by Erik
        As always, just contact us if you need help developing an appropriate regular expression.
        Since you think my idea a poor one and you've offered your services to assist me in setting up replacements properly...with this "Text to find" regular expression:
        [(]?\s*c_timestamp\s*=\s*@c_timestamp[ \t]*[)]?

        Perhaps you can tell me how to set up the replacement to make it successfully display the right hand side as unimportant for all of the following values:
        TSEqual(c_timestamp,@c_timestamp)
        TSEqual( c_timestamp,@c_timestamp)
        TSEqual(c_timestamp ,@c_timestamp)
        TSEqual(c_timestamp, @c_timestamp)
        TSEqual( c_timestamp ,@c_timestamp)
        TSEqual( c_timestamp, @c_timestamp)
        TSEqual( c_timestamp,@c_timestamp )
        TSEqual(c_timestamp , @c_timestamp)
        TSEqual(c_timestamp ,@c_timestamp )
        TSEqual(c_timestamp, @c_timestamp )
        TSEqual( c_timestamp , @c_timestamp)
        TSEqual( c_timestamp ,@c_timestamp )
        TSEqual(c_timestamp , @c_timestamp )
        TSEqual( c_timestamp , @c_timestamp )

        Or if I use this "Text to find" regular expression instead:
        TSEqual\s*[(]\s*c_timestamp\s*,\s*@c_timestamp\s*[)]

        Perhaps you can tell me how to make it successfully display the right hand side as unimportant for all of the following values:
        c_timestamp=@c_timestamp
        c_timestamp= @c_timestamp
        c_timestamp =@c_timestamp
        c_timestamp = @c_timestamp

        If you can't create a replacement to do that, then there is a problem, because all the compare-to values are equivalent and whitespace is defined as unimportant for SQL scripts (not to mention that SQL is also case insensitive, so we could add a few hundred more examples to the lists above).
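
For reference, each family of variants is easy to describe with a pattern on its own; the difficulty raised here is only that the "Replace with" side is literal text. The Python check below is a sketch outside Beyond Compare: the helper `spacing_variants` is hypothetical, and `re.IGNORECASE` is added to reflect the case-insensitivity remark.

```python
import itertools
import re

# Patterns taken from the post; IGNORECASE is an added assumption.
TSEQUAL = re.compile(
    r"TSEqual\s*[(]\s*c_timestamp\s*,\s*@c_timestamp\s*[)]", re.IGNORECASE
)
EQUALS = re.compile(r"c_timestamp\s*=\s*@c_timestamp", re.IGNORECASE)


def spacing_variants(template: str, slots: int):
    """Yield the template with each {} slot filled by "" or a single space."""
    for fill in itertools.product(["", " "], repeat=slots):
        yield template.format(*fill)


# Every combination of optional spaces, including those listed in the post.
tsequal_lines = list(spacing_variants("TSEqual({}c_timestamp{},{}@c_timestamp{})", 4))
equals_lines = list(spacing_variants("c_timestamp{}={}@c_timestamp", 2))

all_tsequal_match = all(TSEQUAL.fullmatch(line) for line in tsequal_lines)
all_equals_match = all(EQUALS.fullmatch(line) for line in equals_lines)
```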
        BC v4.0.7 build 19761
        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

        • Chris
          Team Scooter
          • Oct 2007
          • 5538

          #5
          Hi Michael,

          Thanks for the explanation. As I understand it, this is the text in your files:

          Left Side:
          AND c_timestamp = @c_timestamp

          Right Side:
          TSEqual(c_timestamp,@c_timestamp)

          where the right side can have variable numbers of spaces around the comma or inside the parentheses.

          It isn't possible to have variable amounts of whitespace in the "Replace with:" expression.

          However, if the expression "AND c_timestamp = @c_timestamp" is fixed, you can swap the sides of your files, so it is the replacement.

          Once you've swapped sides, you can use the following as the replacement:
          Text to find:
          TSEqual\(\s*c_timestamp\s*,\s*@c_timestamp\s*\)

          Replace with:
          AND c_timestamp = @c_timestamp

          Does this work for the files you're comparing?
          Chris K Scooter Software

          • Michael Kujawa
            Enthusiast
            • Oct 2007
            • 46

            #6
            I understand that isn't how replacements work; it's even clear from the name: it's not "fuzzy matches", it's "replacements". But I have several times wanted the kind of power Michael is talking about: patterns on the right.

            You suggest swapping sides as a fix; perhaps we could have right-side replacements as well? Then you'd compare the left-replaced side with the right-replaced side. So for Michael's problem, you'd left replace
            AND\s+c_timestamp\s+=\s+@c_timestamp
            with
            TSEqual(c_timestamp,@c_timestamp)

            and you'd right replace
            TSEqual\(\s*c_timestamp\s*,\s*@c_timestamp\s*\)
            with
            TSEqual(c_timestamp,@c_timestamp)

            Then they'd match when you perform the comparison. As for the UI, I'm thinking there could be one replacements UI with three fields: left pattern, right pattern, and replacement. That would cover the common case of "I want lines with this pattern on the left to match lines with this pattern on the right" as well as providing the full suite of powers (by leaving a field blank.)
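
A rough Python model of this proposal, offered only as a sketch of the idea and not as a description of how Beyond Compare works: each side is rewritten to the same canonical text and the results are compared. The names `LEFT_PATTERN`, `RIGHT_PATTERN`, `CANONICAL` and `lines_match` are hypothetical, and the parentheses are escaped here so they match literally.

```python
import re

# Left pattern, right pattern, and shared replacement (the three proposed fields).
LEFT_PATTERN = re.compile(r"AND\s+c_timestamp\s+=\s+@c_timestamp")
RIGHT_PATTERN = re.compile(r"TSEqual\(\s*c_timestamp\s*,\s*@c_timestamp\s*\)")
CANONICAL = "TSEqual(c_timestamp,@c_timestamp)"


def lines_match(left: str, right: str) -> bool:
    """Rewrite each side to the canonical text, then compare the results."""
    left_replaced = LEFT_PATTERN.sub(CANONICAL, left)
    right_replaced = RIGHT_PATTERN.sub(CANONICAL, right)
    return left_replaced == right_replaced
```

For example, `lines_match("AND c_timestamp = @c_timestamp", "TSEqual( c_timestamp , @c_timestamp )")` is true under this model, because both lines reduce to the same canonical string.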

            • Erik
              Team Scooter
              • Oct 2007
              • 437

              #7
              Thanks, Michael Kujawa. I've been meaning to add a "Left is source" option for replacements that fixes a swap issue and will also address this issue. It should make it into 3.0.5 which will probably be out later today.
              Erik Scooter Software

              • Michael Bulgrien
                Carpal Tunnel
                • Oct 2007
                • 1772

                #8
                Originally posted by Chris
                As I understand it, this is the text in your files:
                Left Side:
                AND c_timestamp = @c_timestamp
                Right Side:
                TSEqual(c_timestamp,@c_timestamp)
                There is no "AND" in the last example I posted, but other than that, you have the right idea. However, we are talking about hundreds of files that have been written by multiple developers over a span of 10 years. So, not all statements are coded alike (some have extra whitespace, some have no whitespace).

                Originally posted by Chris
                It isn't possible to have variable amounts of whitespace in the "Replace with:" expression.
                Yes, I realize that. That is why I suggested a way that the Scooter team could enhance Beyond Compare by adding the functionality.

                Originally posted by Chris
                However, if the expression "AND c_timestamp = @c_timestamp" is fixed, you can swap the sides of your files, so it is the replacement.
                The whole purpose of the compare tool is to identify files that need to be "fixed". Having to "fix" one set of files in order to make replacements work with another set of files defeats the purpose of using the tool.

                BC3 replacements would be a whole lot more powerful if an enhancement such as I described were implemented. As a developer, I pictured how it could be accomplished, and it seemed quite doable. I already know how to do the work-arounds...I am quite proficient with writing one-timers and doctoring files. I also know how much effort would be eliminated if there was no need for a work-around. Nevertheless, it is your product...and I am just a user...so if you don't think my suggestion has value...you're entitled to that opinion. Thanks.
                BC v4.0.7 build 19761
                ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

                • Michael Bulgrien
                  Carpal Tunnel
                  • Oct 2007
                  • 1772

                  #9
                  Originally posted by Michael Kujawa
                  You suggest swapping sides as a fix; perhaps we could have right-side replacements as well? Then you'd compare the left-replaced side with the right-replaced side...

                  ...As for the UI, I'm thinking there could be one replacements UI with three fields: left pattern, right pattern, and replacement. That would cover the common case of "I want lines with this pattern on the left to match lines with this pattern on the right" as well as providing the full suite of powers (by leaving a field blank.)
                  Thanks Michael. I thought of a way to programmatically derive the right-hand regular expression without changing the UI, but your suggestion for bi-directional replacements eliminates the "hokiness" that set Erik off. Well thought out and well stated.
                  BC v4.0.7 build 19761
                  ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

                  • Michael Kujawa
                    Enthusiast
                    • Oct 2007
                    • 46

                    #10
                    I pulled up a diff to play with this new Left Is Source feature (I'm surprised this isn't the default, since it's the old behavior.) I do like being able to apply the replacements in both directions.

                    It appears that left replacements are compared against unaltered right lines and vice versa? If I have "AAA" on the left and "BBB" on the right, and I add AAA=CCC (left) and BBB=CCC (right) replacements, the line still shows as different. I was expecting the final comparison to be against CCC on both sides.

                    ( Michael: Thanks )
                    Last edited by Michael Kujawa; 25-Sep-2008, 06:45 PM.

                    • Erik
                      Team Scooter
                      • Oct 2007
                      • 437

                      #11
                      Hi Michael K,

                      "Left is source" will be the default in 3.0.8. Replacements are designed to handle variable renames. They behave as if you searched and replaced the text on the source side before comparing to the other side. If you have replacements on both sides, they are compared to the original text.
                      Erik Scooter Software
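
The difference between the behavior described in #11 and the behavior expected in #10 can be illustrated with a small Python model. This is a simplified sketch based only on the descriptions in this thread, not Beyond Compare's actual implementation; all function names are hypothetical and replacements are treated as plain text pairs.

```python
def apply(text: str, replacement: tuple[str, str]) -> str:
    """Apply one (find, replace_with) pair as a plain text substitution."""
    find, replace_with = replacement
    return text.replace(find, replace_with)


def match_against_original(left, right, left_repl, right_repl) -> bool:
    # Model of #11: each side's replaced text is compared with the
    # other side's ORIGINAL text.
    return apply(left, left_repl) == right or apply(right, right_repl) == left


def match_both_replaced(left, right, left_repl, right_repl) -> bool:
    # Model of the expectation in #10: both sides are replaced first,
    # then the results are compared with each other.
    return apply(left, left_repl) == apply(right, right_repl)


left_rule = ("AAA", "CCC")
right_rule = ("BBB", "CCC")
described = match_against_original("AAA", "BBB", left_rule, right_rule)
expected = match_both_replaced("AAA", "BBB", left_rule, right_rule)
```

Under this model `described` is false and `expected` is true, which matches the AAA/BBB/CCC observation reported in #10.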
