subset a comparison based on substring in each file

  • publiclee
    New User
    • Feb 2016
    • 2

    A standard comparison of two sets of files (i.e. two versions of an open source distribution) yields a bunch of files with edits. So far so good. But now I find that the versioning information hasn't been updated in some of those files.

    So, for example, here are the top few lines of the older version:
    Code:
    <?php
    /**
     * @package admin
     * @copyright Copyright 2003-2013 Zen Cart Development Team
     * @copyright Portions Copyright 2003 osCommerce
     * @license http://www.zen-cart.com/license/2_0.txt GNU Public License V2.0
     * @version GIT: $Id: Author: DrByte  Mon Mar 18 12:55:13 2013 -0400 Modified in v1.5.2 $
     */
    
      class tableBlock {
    The newer version starts like this
    Code:
    <?php
    /**
     * @package admin
     * @copyright Copyright 2003-2013 Zen Cart Development Team
     * @copyright Portions Copyright 2003 osCommerce
     * @license http://www.zen-cart.com/license/2_0.txt GNU Public License V2.0
     * @version GIT: $Id: Author: DrByte  Mon Mar 18 12:55:13 2013 -0400 Modified in v1.5.2 $
     */
    
      class boxTableBlock {
    So the version strings are the same and yet the files are different from the first line of code onwards. A normal text comparison for SVN Keywords won't flag them because they are the same strings. How do I flag them when the version strings are the same? And of course I want to build a report listing those files.
  • Aaron
    Team Scooter
    • Oct 2007
    • 15941

    #2
    Hello,

    You would like to flag the pair of files if text is equal on both sides? Unfortunately, we do not support that type of comparison scan in BC4. By default, comments are considered Unimportant, and we can define that grammar to be Important instead, but Important text would still be equal if the text itself is equal. We do not have a means of tracking a specific grammar and marking it as 'review' if equal. Or am I misunderstanding the request?

    The best workflow I can think of is that a rules-based scan would find the difference on the class line, at which point you can manually review the git code line to see if it is equal or different.
    Aaron P Scooter Software


    • publiclee
      New User
      • Feb 2016
      • 2

      #3
      Originally posted by Aaron
      You would like to flag the pair of files if text is equal on both sides?
      Not just text but a specific selection within the file: the version string. I thought it would be matched by a variation on the SVN Keywords text comparison with a setting for finding nothing, i.e. !(\$(Id|Author|Date):.+\$) or (\$(Id|Author|Date):.+\$){0}
      Unfortunately the former doesn't work and the latter is rejected as illegal.
      The best workflow I can think of is that a rules-based scan would find the difference on the class line, at which point you can manually review the git code line to see if it is equal or different.
      Damn! There are 374 files to review that way!
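
To make the condition concrete outside of BC4: the thing to flag is a pair where the keyword string is identical on both sides while the rest of the file differs. Below is a minimal Python sketch of that test, using the same keyword pattern quoted above. The function names `keyword_of` and `stale_version` are hypothetical, and the sample text is trimmed from the two headers in the first post.

```python
import re

# SVN-style keyword pattern from the post, e.g. "$Id: ... $" on a single line.
KEYWORD = re.compile(r"\$(Id|Author|Date):.+\$")


def keyword_of(text):
    """Return the first expanded keyword string found in text, or None."""
    match = KEYWORD.search(text)
    return match.group(0) if match else None


def stale_version(old_text, new_text):
    """True when both sides carry the same keyword string but differ elsewhere."""
    old_kw = keyword_of(old_text)
    new_kw = keyword_of(new_text)
    return old_kw is not None and old_kw == new_kw and old_text != new_text


# Trimmed copy of the header shown in the first post (same on both sides).
HEADER = (
    "<?php\n/**\n"
    " * @version GIT: $Id: Author: DrByte  Mon Mar 18 12:55:13 2013 -0400 "
    "Modified in v1.5.2 $\n */\n\n"
)
old = HEADER + "  class tableBlock {\n"
new = HEADER + "  class boxTableBlock {\n"

print(stale_version(old, new))  # True: same version string, different code
print(stale_version(old, old))  # False: nothing changed at all
```

The point of the sketch is that "flag when equal" is a comparison of two extracted strings plus a check on the rest of the file, which is more than a single regular expression can express on its own.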


      • Aaron
        Team Scooter
        • Oct 2007
        • 15941

        #4
        Any RegEx grammar definition can match on the .git flag text you want to find. However, even if we correctly define the grammar, the extent of our comparison is to treat it as "Important" or "Unimportant", where Unimportant differences can be ignored.

        The issue is that if the text is equal, we give it a status of equal. If it is different, it can be shown as different or ignored (equal). We could mark all other text as Unimportant and compare only this single line in your files. Then files are Equal if this line is equal and Different if it is different, ignoring all other text. You can then set the display filter to show only Equal to find which files have a matching line. In combination, you could also enable the Timestamp and Size comparisons and (crucially) disable Override Quick Test results. Files would then show as different if the line is different or if the timestamp or size is different, so you could review files that differ in timestamp or size but have an equal content compare (center column is an =).

        Would this help?

        To define a grammar, use the File Formats dialog to create a new File Format with a grammar element that matches on the .git @version line, and no other definitions. Use a Delimited type, from:
        * @version
        to
        End of Line

        Then, open a single pair of files and in the Session Settings, Importance tab, check your grammar element and uncheck everything else. This should result in all other differences appearing as blue, which can be ignored with the Ignore Unimportant Differences toggle.
        If this type of comparison is useful, we can then set the dropdown in the Session Settings to "use for all files in parent folder compare", and head up to the Folder Compare and alter its Session Settings, Comparison tab to "Rules-based" compare, along with Timestamp, Size, and disable Override.
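
For building the report itself, the same logic (compare only the @version line, then pick out pairs where that line is equal but the files otherwise differ) can be approximated with a short script. This is only a sketch and not a BC4 feature: the function names `version_line` and `stale_version_report`, the `*.php` filter, and the assumption that the two versions sit in parallel folder trees are all illustrative choices.

```python
import filecmp
import re
from pathlib import Path

# The docblock line that starts with "* @version", up to the end of the line.
VERSION_LINE = re.compile(r"^[ \t]*\*[ \t]*@version\b.*$", re.MULTILINE)


def version_line(path):
    """Return the stripped @version line of a file, or None if it has none."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    match = VERSION_LINE.search(text)
    return match.group(0).strip() if match else None


def stale_version_report(old_root, new_root, pattern="*.php"):
    """Yield relative paths whose @version line is equal but whose bytes differ."""
    old_root, new_root = Path(old_root), Path(new_root)
    for old_file in sorted(old_root.rglob(pattern)):
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            continue  # present on one side only, nothing to compare
        if filecmp.cmp(old_file, new_file, shallow=False):
            continue  # byte-for-byte identical, so the version line is fine
        old_version = version_line(old_file)
        if old_version is not None and old_version == version_line(new_file):
            yield rel.as_posix()
```

Calling `list(stale_version_report("old_tree", "new_tree"))` with the two distribution folders (placeholder names here) would give the list of files that changed without a version bump.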
        Aaron P Scooter Software
