Compare folders disregarding file names

  • Zoë
    Team Scooter
    • Oct 2007
    • 2666

    #16
    Originally posted by boarders paradise
    .... plus you can find "rule-based duplicates", whereas most other programs can only find binary ones.
    I wouldn't assume that if we ever add "Find Duplicates" functionality that it will initially support "rules-based" duplicates too. The code to do so would be quite a bit different, and while I can certainly see it as an enhancement on top of the current feature request, I definitely wouldn't consider it a show stopper before we would release the more general feature. We can't just do rules-based comparisons of each and every file in order to find the duplicates; it would be prohibitively slow.
    Zoë P Scooter Software


    • Michael Bulgrien
      Carpal Tunnel
      • Oct 2007
      • 1772

      #17
      Originally posted by boarders paradise
      ... which is consistent with and does not contradict what I said:
      I was not implying that your statement was a contradiction. I was simply putting the prior quotes in context.

      Don't get me wrong... I wholeheartedly agree that it would be nice if an "alignment mask" could be applied to both sides instead of knowing the exact difference between the file/folder names on each side. Unfortunately, that is not how it was implemented.

      I would like to see such an "alignment mask" option on the official customer wish list. The way I envision it, BC3 Session Settings would let the user choose between an "alignment mask" or the current "alignment overrides" method, but not necessarily both at the same time.
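To make the "alignment mask" idea concrete, here is a minimal sketch, not anything BC3 actually does. It assumes the mask is a regular expression whose matches are stripped from the names on both sides, and that names which become identical after stripping are aligned. The function name `align_by_mask` and the date-stamp mask are hypothetical.

```python
import re


def align_by_mask(left, right, mask):
    """Pair names from `left` with names from `right` after masking both sides.

    `mask` is a regular expression; every match is removed from a name
    before comparison (an assumption about how a mask could work).
    Returns a list of (left_name, right_name_or_None) tuples.
    """
    pattern = re.compile(mask)

    def key(name):
        # The part of the name that must agree on both sides.
        return pattern.sub("", name)

    # First right-hand name wins if several share the same masked key.
    right_by_key = {}
    for name in right:
        right_by_key.setdefault(key(name), name)

    return [(name, right_by_key.get(key(name))) for name in left]


if __name__ == "__main__":
    left = ["report_2009-08-27.txt", "notes.txt"]
    right = ["report_2009-09-01.txt", "todo.txt"]
    # Hypothetical mask: ignore a trailing _YYYY-MM-DD date stamp.
    for pair in align_by_mask(left, right, r"_\d{4}-\d{2}-\d{2}"):
        print(pair)
```

The point of the sketch is that the user only describes which part of a name is unimportant, rather than spelling out the exact difference between the two sides.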
      Last edited by Michael Bulgrien; 27-Aug-2009, 07:27 PM. Reason: typo
      BC v4.0.7 build 19761
      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


      • Michael Bulgrien
        Carpal Tunnel
        • Oct 2007
        • 1772

        #18
        Originally posted by Craig
        We can't just do rules-based comparisons of each and every file in order to find the duplicates; it would be prohibitively slow.
        While you're right that doing rules-based comparisons of each and every file would be prohibitively slow, I would suggest only doing a rules-based compare when certain criteria are met:

        For example:
        File names are the same
        - or -
        File sizes are the same

        On files that have the same file size, you'll have to do a binary comparison anyway to figure out whether they are true duplicates, so doing a rules-based compare instead would not be a big deal.

        Additional rules-based compares on files with the same name would significantly limit the scope of the effort (would not be prohibitively slow) and would make it possible to identify duplicate files with different line endings, etc.
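The pre-filter suggested above can be sketched roughly as follows. This is only an illustration under assumptions: files are given as `(path, size)` tuples, "same name" means the same base name, and the function name `candidate_pairs` is hypothetical. Only the pairs it returns would then go through the expensive rules-based compare.

```python
from collections import defaultdict
from itertools import combinations
from pathlib import PurePosixPath


def candidate_pairs(files):
    """Return sorted pairs of paths that share a base name OR a file size.

    `files` is an iterable of (path, size_in_bytes) tuples.
    """
    by_name = defaultdict(list)
    by_size = defaultdict(list)
    for path, size in files:
        by_name[PurePosixPath(path).name].append(path)
        by_size[size].append(path)

    pairs = set()
    for bucket in list(by_name.values()) + list(by_size.values()):
        # Every pair inside a bucket is a candidate; the set removes
        # pairs that qualify on both name and size.
        for a, b in combinations(sorted(bucket), 2):
            pairs.add((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    files = [
        ("left/a.txt", 10),
        ("right/a.txt", 12),   # same name, different size
        ("right/b.txt", 10),   # different name, same size
        ("right/c.txt", 99),   # matches nothing
    ]
    for pair in candidate_pairs(files):
        print(pair)
```

Bucketing keeps the number of detailed compares proportional to the size of each bucket rather than to every possible pair of files.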
        BC v4.0.7 build 19761
        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


        • Zoë
          Team Scooter
          • Oct 2007
          • 2666

          #19
          Originally posted by Michael Bulgrien
          File names are the same
          - or -
          File sizes are the same
          Given the thread is titled "Compare folders disregarding file names", I'm pretty sure doing the rules-based compare only if the file names are the same would be missing the point.

          As for only doing it when the file sizes are the same, that would be such a small number of cases that it would be completely useless. The times where a rules-based comparison is better than a binary comparison are those where the file sizes are different, due to case changes, differences in encoding or line endings, etc.

          The only way I could actually see doing a "rules-based find duplicates" would be to create three CRCs for each file, one with the original content, one with the line endings and character encoding standardized, and one with the unimportant text stripped and the remaining text converted to upper case. It would give similar results to our current rules-based comparison, and wouldn't involve lengthy text compares, though it obviously couldn't support replacements.

          That would all be new code. Our current structures are all optimized for the interactive or quick compare, so we would have to add support for doing that kind of standardization and computing hashes for it, and we'd need to add similar finger-printing support to the data compare, picture compare, and MP3 compare.
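The three-CRC idea described above could look roughly like the sketch below. It is an illustration under stated assumptions, not Scooter Software's code: content is assumed to decode as UTF-8, "unimportant text" is assumed to mean whitespace only, and the names `fingerprints` and `group_duplicates` are hypothetical. It uses `zlib.crc32` from the Python standard library.

```python
import zlib
from collections import defaultdict


def fingerprints(data: bytes, encoding: str = "utf-8"):
    """Return (raw, normalized, loose) CRC32 values for one file's bytes."""
    # 1. CRC of the original content: equal only for binary duplicates.
    raw = zlib.crc32(data)

    # 2. CRC with character encoding and line endings standardized.
    text = data.decode(encoding, errors="replace")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    normalized = zlib.crc32(text.encode("utf-8"))

    # 3. CRC with "unimportant" text stripped (assumed here: all
    #    whitespace) and the remainder converted to upper case.
    loose_text = "".join(text.split()).upper()
    loose = zlib.crc32(loose_text.encode("utf-8"))

    return raw, normalized, loose


def group_duplicates(files, level=2):
    """Group file names whose fingerprint at `level` (0, 1 or 2) matches.

    `files` maps a name to its content as bytes.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[fingerprints(data)[level]].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    files = {
        "a.txt": b"Hello\r\nWorld\r\n",
        "b.txt": b"Hello\nWorld\n",
        "c.txt": b"hello  \n world\n",
        "d.txt": b"other",
    }
    print(group_duplicates(files, level=1))
    print(group_duplicates(files, level=2))
```

Because each file is read and hashed once, finding duplicates becomes a matter of grouping by hash value instead of comparing file pairs. A matching CRC is only a strong hint, so a real tool would likely confirm candidates with a full compare. As noted in the post, replacements cannot be expressed this way.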
          Zoë P Scooter Software


          • Michael Bulgrien
            Carpal Tunnel
            • Oct 2007
            • 1772

            #20
            My post was in response to your suggestion that a rules-based algorithm would not be prioritized for an initial implementation of a dup finder. I would hope that the initial implementation would include a simple rules-based compare on like file names, even if it did not apply to files with different file names.

            When it comes to a complete implementation of a rules-based dup finder, I agree that a rules-based CRC would be a great solution. In the past I've written my own routines to strip comments and other "unimportant" code from files in order to programmatically determine whether there are any significant changes between two sets of source code. I agree that it would be a significant effort and would not expect it to be implemented from day one.
            Last edited by Michael Bulgrien; 27-Aug-2009, 07:30 PM.
            BC v4.0.7 build 19761
            ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


            • Michael Bulgrien
              Carpal Tunnel
              • Oct 2007
              • 1772

              #21
              Originally posted by Craig
              Given the thread is titled "Compare folders disregarding file names", I'm pretty sure doing the rules-based compare only if the file names are the same would be missing the point.
              The initial post in this thread was about the inability to align files with different file names. I would consider my comments about the dup finder to be a somewhat off-topic response to your post. What I am really interested in is your thoughts on my post #17.

              Has the concept of an "alignment mask" as I've described above been added to the customer wish list? This is certainly a topic I've seen come up multiple times since BC3 was released.
              BC v4.0.7 build 19761
              ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


              • Zoë
                Team Scooter
                • Oct 2007
                • 2666

                #22
                Yes, it looks like it's actually on the wishlist twice.
                Zoë P Scooter Software
