Option to disregard duplicate files?

  • joema
    Visitor
    • Jun 2007
    • 7

    Option to disregard duplicate files?

    Beyond Compare is extremely useful when comparing two folder trees using "ignore folder structure". This allows validating whether any file in one tree -- regardless of location -- is orphaned from the other tree, even though the files may be in different locations in each tree. This one feature saves an immense amount of work.

    Unfortunately, if any files within a tree are duplicates, the comparison is thrown off. The first instance fulfills the match and so is not shown as an orphan. The next instance has nothing left to match, so it appears as an orphan. Nothing indicates why it appears, so each one must be checked manually.

    E.g., suppose you want to verify that all the files on a portable drive are present on another drive. You don't care where -- you just want to make sure you have them, so the portable drive can be reformatted for other use.

    There are de-duplication utilities you could run on the portable drive before running BC, but this has its own issues. If BC could optionally re-compare the set of orphans against the existing "equals" set and then hide any that match, that would be great.

    Is there any way to do this now, or is this a possible future feature?
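
    To make the request concrete, here is a minimal sketch of the check done outside BC. It assumes files are matched by content (a SHA-256 digest) rather than by file name, which is not how BC aligns files; the function names are hypothetical and this is not a BC feature or script.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_in_tree(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to every file under root that has it."""
    found: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            found.setdefault(file_digest(path), []).append(path)
    return found


def true_left_orphans(left_root: Path, right_root: Path) -> list[Path]:
    """Files under left_root whose content appears nowhere under right_root.

    Duplicates on either side do not matter, because membership is tested
    per digest instead of pairing files one-to-one.
    """
    right = digests_in_tree(right_root)
    orphans = []
    for digest, paths in digests_in_tree(left_root).items():
        if digest not in right:
            orphans.extend(paths)
    return sorted(orphans)
```

    Calling true_left_orphans(Path("E:/"), Path("R:/archive")) (placeholder paths) would list only the files that really are missing from the second drive, however many copies exist on the first.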
  • Aaron
    Team Scooter
    • Oct 2007
    • 15997

    #2
    There is currently no method to detect duplicates. You could hide Orphans, but that would hide every file that did not match on file name (i.e., exists on only one side), not just duplicates.

    Duplicate handling is on our wishlist.
    Aaron P Scooter Software

    • joema
      Visitor
      • Jun 2007
      • 7

      #3
      I just wanted to reiterate that I am running into this situation frequently. My current workaround is to run a standalone de-duplicating tool (there are many, each with pros and cons), delete the duplicates, then run BC between the de-duplicated folder and the other folder. This exact issue was raised in this thread: http://www.scootersoftware.com/vbull...ight=duplicate

      In my work as an archivist for a documentary film group, I often need to compare two folders with "ignore folder structure" to verify we have all files from one hard drive located somewhere within another hard drive having a different folder structure. Duplicates are OK as long as we have the files. I don't need a generalized de-duplicating tool or a duplicate file finder -- I only need BC to disregard duplicates during a compare using "ignore folder structure".

      • Aaron
        Team Scooter
        • Oct 2007
        • 15997

        #4
        Hello,

        BC would treat duplicates as Orphans. Only the first instance would match, at which point any additional duplicates on one side would be Orphans (unless both sides have multiple duplicates, in which case they would also align). It would then be a matter of reviewing the Orphans to tell duplicates from additions/deletions.
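
        The one-to-one pairing described here can be modeled with a multiset of file names. This is only a simplified illustration of the described behavior, assuming files align purely by name; it is not BC's actual algorithm, and align_by_name is a hypothetical helper.

```python
from collections import Counter


def align_by_name(left_names, right_names):
    """Pair equal names one-to-one.

    Returns (matched, left_orphans, right_orphans) as Counters.
    Assumption: a simplified model of the described behavior, not
    Beyond Compare's implementation.
    """
    left, right = Counter(left_names), Counter(right_names)
    matched = left & right  # per name: min(left count, right count)
    # Counter subtraction drops names whose count falls to zero.
    return matched, left - matched, right - matched


matched, left_orphans, right_orphans = align_by_name(
    ["clip.mov", "clip.mov", "notes.txt"],  # left has a duplicate
    ["clip.mov", "notes.txt"],
)
print(left_orphans)  # the extra left-side "clip.mov" is reported as an orphan
```

        With two copies of the name on both sides, all four files pair up and neither side reports an orphan, which is the "both sides have multiple duplicates" case.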

        Adding duplicate detection is on our Customer Wishlist, but is not a small project and one we haven't been able to tackle yet.
        Aaron P Scooter Software

        • ingrambarclay
          New User
          • May 2018
          • 1

          #5
          Aaron,

          Have y'all gotten any further on this request? It's been a while. This is one of the main reasons I need software like this. I frequently wind up with Windows duplicating files in common user profile folders such as Documents and Pictures. This happens because people seed their folders from old computers and roaming profiles drop in another copy, with an index in parentheses at the end of the base filename.

          EXAMPLE:
          original file: filename.txt
          duplicate one: filename (1).txt
          duplicate two: filename (2).txt


          Obviously I don't care about these duplicates and would even love having a way to easily strip them out.

          THERE IS A MAJOR CAVEAT HOWEVER to the obvious solution of using the filename filter "* (?).*": sometimes there is no instance of the base filename or the file was intentionally saved by a user with that filename format, and we need the first copy of that file with the parens in the name if so.
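
          One way around that caveat, sketched outside BC: treat a "name (N).ext" file as disposable only when the base file exists in the same folder and has identical contents. The regular expression and the function name are my own assumptions for illustration; this is not a BC filter.

```python
import filecmp
import re
from pathlib import Path

# Assumed naming scheme: "<stem> (<digits>)<optional extension>".
NUMBERED_COPY = re.compile(r"^(?P<stem>.+) \((?P<n>\d+)\)(?P<ext>\..*)?$")


def redundant_numbered_copies(folder: Path) -> list[Path]:
    """Numbered copies in folder whose base file exists with the same bytes.

    A numbered file with no base file, or with different contents, is
    left out so it is never treated as a disposable duplicate.
    """
    redundant = []
    for path in sorted(folder.iterdir()):
        m = NUMBERED_COPY.match(path.name)
        if m is None or not path.is_file():
            continue
        base = folder / (m.group("stem") + (m.group("ext") or ""))
        # shallow=False forces a byte-by-byte comparison of the contents.
        if base.is_file() and filecmp.cmp(base, path, shallow=False):
            redundant.append(path)
    return redundant
```

          A file such as "holiday (1).jpg" with no "holiday.jpg" beside it is never returned, so the first copy of a file that only exists with parentheses in its name is preserved.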

          Thanks!

          Ingram Barclay

          • Aaron
            Team Scooter
            • Oct 2007
            • 15997

            #6
            Hello,

            It's still something on our wishlist. However, it's a very large project (some other entire programs only perform this single task), and not one we've been able to tackle yet.
            Aaron P Scooter Software

            • joema
              Visitor
              • Jun 2007
              • 7

              #7
              Just wanted to state that this feature is still needed, but a few cases can be handled by other utilities. Also, in a "left orphan" check, duplicates on the right are OK. However, duplicates on the left will cause spurious "left orphans", which is currently expected behavior. One solution is to manually de-duplicate the left before comparison. There are various utilities for this, but they all entail some risk.

              For field video offloading there are dedicated utilities like Hedge which now have duplicate detection built in. This solves the problem of creating duplicates if the camera card was not reformatted. https://www.newsshooter.com/2018/04/...ittle-smarter/

              For other cases BC duplicate detection would still be useful. We frequently encounter situations where we need to reformat drive x, but it has data (including some duplicates) and we aren't sure whether it has all been copied to a backup RAID. In this case drive x is left and the RAID is right. The left-side duplicates prevent a clean "left orphan" check using "ignore folder structure". The workaround is to manually de-duplicate the left side before running the comparison.

              I'm not sure what % of the BC user base understands how powerful and useful the ability to check for orphans using "ignore folder structure" is, hence I'm not sure how many people have requested duplicate handling. But I would rather not have it in BC than have it be unreliable or have performance issues. IMO a fully-released 64-bit Mac version with the current feature set is much more important.
              Last edited by joema; 21-Oct-2019, 10:30 AM.

              • Aaron
                Team Scooter
                • Oct 2007
                • 15997

                #8
                Thanks. Duplicate finding is something we still have on our wishlist. With macOS's Catalina release requiring a 64-bit client, it was pretty important to make that release available for those users. It's not easy to balance the schedule with stability and features. While we generally trend towards stability, we want to make sure to get both as often as we can.
                Aaron P Scooter Software
