Find duplicate files with different names

  • frankcollins
    started a topic

    Hello all,

    We have BC3 here at my job. We recently received a large number (90,000) of contracts as PDFs. In viewing the contracts, we realized that many are duplicated over and over under different file names. So what I need to do is compare the folder's contents against itself, based on each document's contents, NOT the file name.

    Is this something BC3 can handle?

    Thanks!
    Frank

  • Aaron
    replied
    Hello,

    It's that easy for an initial scan, but it gets trickier with outlier cases and internal or external changes to the file content. We don't want to ship a tool that gives the impression of doing the job but produces unreliable results. If it were easy, we'd have put it in the application already.


  • peterr
    replied
    Originally posted by dariusan
    Could you allow us to find duplicates by giving us a feature that lets us align files by CRC and SIZE in "flatten" mode?
    I would like to find files with the same content but different names...
    Yes, that is all that is required. Flat view, no folders, find duplicates by CRC and SIZE.


  • dariusan
    replied
    Could you allow us to find duplicates by giving us a feature that lets us align files by CRC and SIZE in "flatten" mode?
    I would like to find files with the same content but different names...


  • peterr
    replied
    Thanks Aaron. I did have a look at that after you mentioned it. But as the 2 folders had no filenames that matched on name only, there would basically be a heap of left orphans and right orphans. For now, I have been running fdupes to display the duplicates. I'm expecting that it finds all duplicates, based on CRC/size.


  • Aaron
    replied
    Hello,

    Were you in a scenario where showing only orphans or excluding them would be useful? If you right-click the toolbar you can switch to Toggles mode, which lets you enable or disable each criterion (Left Orphans, Right Orphans, Right Newer, Same, etc.) individually. You could show a very specific set of criteria using this mode, if it's helpful.


  • peterr
    replied
    Thanks Aaron. I did try the view 'Structure | Ignore Folder Structure', which showed me 'duplicate' files side by side. Not on the same line of course, but very similar to what is shown at https://www.scootersoftware.com/vbul...4&d=1422253539

    'Duplicates' could be easily seen, although there were over 64,000 files, so I wasn't going to check each one. It would be nice to have left orphans and right orphans show up, disregarding the filename of course.

    I did try fdupes, and it did show all the duplicates. Would even be nice if that was 'tweaked' to show 'non duplicates' (orphans).

    Thanks for your help.
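
The "orphans disregarding the filename" idea above could be sketched outside Beyond Compare roughly as follows. This is a minimal sketch, not something BC3 or fdupes provides: the function names `content_index` and `content_orphans` are made up for illustration, and it swaps in SHA-256 from Python's standard library in place of the CRC mentioned in the thread, on the assumption that a stronger hash makes an accidental match unlikely enough to skip a byte-by-byte check.

```python
import hashlib
import os


def content_index(root):
    """Map (size, SHA-256 hex digest) -> relative paths of files under root."""
    index = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            key = (os.path.getsize(path), digest.hexdigest())
            index.setdefault(key, []).append(os.path.relpath(path, root))
    return index


def content_orphans(left_root, right_root):
    """Return (left_only, right_only): files whose content has no match
    on the other side, regardless of file name or subfolder."""
    left = content_index(left_root)
    right = content_index(right_root)
    left_only = sorted(p for key, paths in left.items()
                       if key not in right for p in paths)
    right_only = sorted(p for key, paths in right.items()
                        if key not in left for p in paths)
    return left_only, right_only
```

Calling `content_orphans("recovered", "mailbox")` (both folder names hypothetical) would return two sorted lists of relative paths: content found only on the left and content found only on the right.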


  • Aaron
    replied
    Appreciated, peterr. It is something we'd like to tackle, but we have been very busy with other large projects and core functionality like 64-bit support. It's one of our top wishlist items, too, so it is never out of sight.


  • peterr
    replied
    Originally posted by Aaron
    Thanks. Duplicate scanning is still on our Wishlist. Our Customer Wishlist holds items that are not currently scheduled projects; it is where our developers go for ideas for future features and enhancements.
    This thread has had 19,940 views. I think it is about time it had priority.


  • peterr
    replied
    Seriously need this feature added. I'm in the middle of an email recovery, and the recovered files have different file names. Is it a feature yet?


  • peterr
    replied
    Thanks Aaron. I'm sure many users would find this feature useful. Removing the filename from any folder comparison will no doubt force the need to compare on CRC + file size only. That said, once those 2 conditions are met for 2 files, if the contents are equal, we have a match. (Just scratching out some rough pseudo specs for your developers.)
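
The rough spec above (same size, same CRC, then confirm the contents are equal) might look something like this as a standalone script. It is only a sketch under stated assumptions: `crc32_of` and `find_duplicates` are illustrative names, it uses `zlib.crc32` and `filecmp.cmp` from Python's standard library, and it says nothing about how Beyond Compare or fdupes work internally.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Condition 1: group by size, which needs no file reads.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        # Condition 2: within a size group, group by CRC-32.
        by_crc = defaultdict(list)
        for path in paths:
            by_crc[crc32_of(path)].append(path)
        for candidates in by_crc.values():
            # Final check: CRC-32 can collide, so compare the actual bytes.
            remaining = sorted(candidates)
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                same = [p for p in rest
                        if filecmp.cmp(first, p, shallow=False)]
                if same:
                    groups.append([first] + same)
                remaining = [p for p in rest if p not in same]
    return groups
```

Grouping by size first means most files are never read at all; only files that share a size get a CRC, and only files that share both get the full comparison.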


  • Aaron
    replied
    Thanks. Duplicate scanning is still on our Wishlist. Our Customer Wishlist holds items that are not currently scheduled projects; it is where our developers go for ideas for future features and enhancements.


  • peterr
    replied
    Would this "feature" be scheduled for development soon? The closest I can get to matching the file content is to display the CRC and then sort the list by CRC. See attached.
    Attached Files
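
The same "display the CRC and sort by it" workaround can be approximated in a few lines of Python, for anyone who wants a flat listing without the folder structure. This is an illustrative sketch only; `flat_listing_by_crc` is a made-up name, and it reads each file fully into memory, which is an assumption that the files are reasonably small.

```python
import os
import zlib


def flat_listing_by_crc(root):
    """Return (crc32, size, relative_path) rows for every file under root,
    sorted so that files with the same CRC end up on adjacent rows."""
    rows = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                crc = zlib.crc32(handle.read())  # whole file in memory
            rows.append((crc, os.path.getsize(path),
                         os.path.relpath(path, root)))
    return sorted(rows)
```

Printing each row with something like `print(f"{crc:08X}  {size:>12}  {rel_path}")` gives a list where likely duplicates sit next to each other; as with the sorted CRC column, a matching CRC and size is a strong hint rather than proof.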


  • Aaron
    replied
    Originally posted by peterr
    Okay, so does that mean if I post a request here each day, it will get on the list quicker?
    I think we'll notice if it's just you. But we have had several users request this. It is something we would like to do, but we have several other large projects already scheduled and being worked on, so it is still on the Wishlist for now.


  • peterr
    replied
    Originally posted by Chris
    Finding duplicate files is still on our wish list; we do keep track of how often it is requested.
    Okay, so does that mean if I post a request here each day, it will get on the list quicker?
