Find duplicate files with different names


  • peterr
    replied
    Could this be added to the BC wishlist? That is, find duplicate files in a folder comparison regardless of filename, matching only on file size and CRC.
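For anyone who needs this outside Beyond Compare in the meantime, the idea can be sketched in a few lines of Python. This is only an illustration of "match on file size and CRC", not anything BC does internally; the function names `file_crc32` and `find_duplicates` are made up for this example, and CRC-32 from the standard `zlib` module is assumed to be an acceptable checksum.

```python
import os
import zlib
from collections import defaultdict


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(*folders):
    """Group files under the given folders by (size, CRC-32), ignoring names."""
    groups = defaultdict(list)
    for folder in folders:
        for root, _dirs, names in os.walk(folder):
            for name in names:
                path = os.path.join(root, name)
                key = (os.path.getsize(path), file_crc32(path))
                groups[key].append(path)
    # Keep only the keys shared by two or more files.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(left_folder, right_folder)` returns a list of path groups, one group per set of files that share both size and CRC. Files flagged this way are very likely identical, but a byte-by-byte comparison is the only way to be certain.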

    By the way, when I searched the forum for 'crc' I kept getting "Sorry - no matches. Please try some different terms." I had to use Google, which reported 122 hits. Possibly strings of length 3 are ignored?

    Peter



  • Aaron
    replied
    Thanks for the creative Regular Expression solution and clear explanation, Michael.



  • aussieboykie
    replied
    Excellent Michael. Thanks for the code and the clear explanation. Much appreciated.

    Regards, AB



  • Michael Bulgrien
    replied
    Sorry, you won't be able to capture and use more than one backreference in Beyond Compare for the purpose of aligning different kinds of files.

    Simply create a separate alignment override definition for each file type and you're done.

    Or you could do something like this:

    Align left file: [0-9,_]+([A-Z]{3}_*\d+\.JPG)
    with right file: $1

    [A-Z]{3} Three alphabetic characters will match both IMG and DSC.
    _* Using an * instead of a + means zero or more instances instead of one or more instances of the previous character. This, then, will recognize the _ in the IMG format but not require one for the DSC format.
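The combined pattern can be tried out in Python before entering it in the alignment override dialog. This is a sketch under two assumptions: that Python's `re` module treats these particular constructs the same way Beyond Compare's regex engine does, and that the whole left-side name has to match. The DSC file name and the helper `right_name` are invented for the example.

```python
import re

# Pattern copied from the post above.
PATTERN = re.compile(r"[0-9,_]+([A-Z]{3}_*\d+\.JPG)")


def right_name(left_name):
    """Return the name that $1 would produce, or None if there is no match."""
    match = PATTERN.fullmatch(left_name)
    return match.group(1) if match else None


for name in (
    "2010_01_10_19_38_58_IMG_4634.JPG",  # IMG format, with underscore
    "2010_01_10_19_41_01_DSC01234.JPG",  # hypothetical DSC format, no underscore
    "IMG_4634.JPG",                      # no date prefix, so no match
):
    print(name, "->", right_name(name))
```

The first two names yield `IMG_4634.JPG` and `DSC01234.JPG`; the third yields `None` because `[0-9,_]+` requires at least one prefix character.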
    Last edited by Michael Bulgrien; 21-Jan-2010, 07:26 AM.



  • aussieboykie
    replied
    Originally posted by Michael Bulgrien View Post
    (IMG_\d\d\d\d.JPG) The ( ) indicate that whatever matches the expression inside should be assigned to a variable to be used later.

    $1 is the variable being used later. It contains what was matched in the ( ) on the other side. If you had more than one set of ( ), you would have more than one variable assigned: $2, $3, etc.

    At the risk of pushing my luck beyond reasonable bounds...

    I actually have images from more than one camera, so some are IMG_xxxx, some are DSCxxxxx, and so on. I would therefore like to have more than one set of ( ) and matching $1, $2, $3, etc. I understand the thrust of what you are saying but am unsure of precisely what to code for left and right to replace your earlier simple case.

    Originally posted by Michael Bulgrien
    Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
    with right file: $1

    Regards, AB



  • Michael Bulgrien
    replied
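    [0-9,_] defines a set of valid characters. The square brackets simply enclose the set. The actual characters are 0 through 9, the comma, and the underscore; inside square brackets a comma is a literal character rather than a separator, so [0-9_] would do if no comma is expected. Since \d also represents a numeric digit, this could also have been written [\d,_].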

    + means that the prior item occurs one or more times. So we are including any combination of numeric digits and underscores at the beginning of the regular expression.

    (IMG_\d\d\d\d.JPG) The four \d explicitly define four numeric digits. This could also have been written (IMG_\d+.JPG) to indicate one or more digits without limiting it to four, or (IMG_\d{4}.JPG) with {4} meaning repeat the prior item exactly 4 times. Strictly speaking, the unescaped . matches any single character; \.JPG would match only a literal period.

    (IMG_\d\d\d\d.JPG) The ( ) indicate that whatever matches the expression inside should be assigned to a variable to be used later.

    $1 is the variable being used later. It contains what was matched in the ( ) on the other side. If you had more than one set of ( ), you would have more than one variable assigned: $2, $3, etc.
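The pieces described above can be checked with a short Python sketch. Python's `re` module is assumed to behave like Beyond Compare's engine for these constructs, and `group(1)`, `group(2)` play the role of `$1`, `$2`. The two-group pattern at the end is only there to show how numbered groups work in general.

```python
import re

name = "2010_01_10_19_38_58_IMG_4634.JPG"

# Equivalent ways of writing the digit part; each captures the same text.
variants = [
    r"[0-9,_]+(IMG_\d\d\d\d.JPG)",
    r"[0-9,_]+(IMG_\d{4}.JPG)",
    r"[0-9,_]+(IMG_\d+.JPG)",
    r"[\d,_]+(IMG_\d+\.JPG)",
]
captured = [re.fullmatch(pattern, name).group(1) for pattern in variants]
print(captured)

# An unescaped dot accepts any character; an escaped dot only a period.
loose = re.fullmatch(r"[0-9,_]+(IMG_\d{4}.JPG)", "1_IMG_4634xJPG")
strict = re.fullmatch(r"[0-9,_]+(IMG_\d{4}\.JPG)", "1_IMG_4634xJPG")
print(loose is not None, strict is not None)

# Two sets of ( ) give two variables, numbered left to right.
two = re.fullmatch(r"([0-9_]+)_(IMG_\d{4}\.JPG)", name)
print(two.group(1), two.group(2))
```

All four variants capture `IMG_4634.JPG`, which is what ends up in `$1`.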



  • aussieboykie
    replied
    Originally posted by Michael Bulgrien View Post
    Try this:

    Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
    with right file: $1

    It should work if your date-qualified files are on the left.

    Woohoo! Many thanks, Michael. I may even take the time to try to understand the regular expression. This is very useful.

    Regards, AB



  • Michael Bulgrien
    replied
    Try this:

    Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
    with right file: $1

    It should work if your date-qualified files are on the left.



  • Aaron
    replied
    A couple of notes: the Alignment Overrides feature is only in BC3 Pro.

    It also will not help with your current issue. The Regular Expression can be useful in defining the specific, matching text, but cannot match on the different text. In this case, your prefix (2010_01_10_19... must be explicit while IMG_4634 can be a regular expression.

    In this case you would align:
    (IMG_\d*\.jpg)
    with 2010_01_10_19_38_58_$1

    We do not currently allow masking on the matchTo side. This is on our Customer Wishlist.



  • aussieboykie
    replied
    Originally posted by Michael Bulgrien View Post
    If your prefix can differ, then you would need to code a smarter alignment override using a regular expression

    What I actually have is a bunch of digital images in one folder and some of the same images in another folder, renamed with a shooting date/time prefix, e.g.

    IMG_4634.JPG == 2010_01_10_19_38_58_IMG_4634.JPG
    IMG_4635.JPG == 2010_01_10_19_38_58_IMG_4635.JPG
    IMG_4636.JPG == 2010_01_10_19_38_58_IMG_4636.JPG
    IMG_4637.JPG == (missing)
    IMG_4638.JPG == 2010_01_10_19_41_01_IMG_4638.JPG
    etc.

    The prefix is a constant length, but variable text. My level of competence with regular expressions is, at best, embarrassing. If some kind soul could point me in the right direction I'd be grateful.

    Regards, AB



  • Michael Bulgrien
    replied
    Originally posted by aussieboykie View Post
    Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix?

    So long as the prefix is constant throughout the folder, you can use:
    Session \ Session Settings... \ Misc tab \ Alignment overrides

    Click New...
    Align left file: *
    with right file: January-*

    If your prefix can differ, then you would need to code a smarter alignment override using a regular expression
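What the constant-prefix override achieves (pairing `*` on the left with `January-*` on the right and showing which files have no partner) can be mimicked in Python for a quick check. This is only a sketch of the pairing logic, not of how Beyond Compare implements it; `missing_on_right` is a made-up helper name.

```python
def missing_on_right(left_names, right_names, prefix="January-"):
    """List the left-side names that have no prefixed counterpart on the right."""
    right = set(right_names)
    return [name for name in left_names if prefix + name not in right]


left = ["File1.txt", "File2.txt", "File3.txt", "File4.txt", "File5.txt"]
right = ["January-File1.txt", "January-File2.txt", "January-File5.txt"]
print(missing_on_right(left, right))
```

With the file names from the original question, this reports `File3.txt` and `File4.txt` as the orphans.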



  • peterr
    replied
    Originally posted by aussieboykie View Post
    Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix? In this case, it would be a simpler approach than using CRC.

    Easy with Linux: just parse through a folder/path and add a prefix/suffix to the filename.

    Not sure if windooze can do it though?

    Either the second or fourth post at http://www.linuxforums.org/forum/lin...ript-help.html will do it.
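As a platform-neutral alternative to a shell script, the same renaming can be sketched in Python, which runs on both Linux and Windows. This is not the script from the linked thread; `add_prefix` is a hypothetical helper, and it defaults to a dry run so that nothing is renamed until the plan has been reviewed.

```python
from pathlib import Path


def add_prefix(folder, prefix, dry_run=True):
    """Plan, and optionally perform, renaming each file in folder to prefix + name."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files that already carry the prefix.
        if path.is_file() and not path.name.startswith(prefix):
            plan.append((path, path.with_name(prefix + path.name)))
    if not dry_run:
        # The plan is complete before any rename, so the listing is never
        # changed while it is still being read.
        for source, target in plan:
            source.rename(target)
    return plan
```

Calling `add_prefix(folder, "January-")` returns the list of (old, new) paths without touching anything; passing `dry_run=False` carries out the renames.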
    Last edited by peterr; 20-Jan-2010, 03:13 AM. Reason: found a nice little script



  • aussieboykie
    replied
    Having looked at what's possible with Session --> Folder Compare Report, the one column label that is not listed/selectable is Name, presumably because it is assumed that a name comparison is always required. If Name was added as an option, and could therefore be unchecked, it would become trivial to compare on CRC/Size/Modified.

    Consider this duly suggested.

    Regards, AB



  • aussieboykie
    replied
    A related question. Several times recently I've wanted to check for missing files in a scenario where the left folder contains a bunch of files with original names and the right folder contains a subset of those files with the original names modified by a prefix or suffix. For example, using a prefix...

    File1.txt == January-File1.txt
    File2.txt == January-File2.txt
    File3.txt
    File4.txt
    File5.txt == January-File5.txt

    Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix? In this case, it would be a simpler approach than using CRC.

    Regards, AB



  • peterr
    replied
    Originally posted by chrroe View Post
    Using CRC alone gives roughly 99.99999999% certainty that the files are the same. But when you also consider the file size and date+time, as in your screenshot, you can be pretty sure.

    Originally posted by Michael Bulgrien View Post
    I agree. File size plus CRC is sufficient to establish that files are duplicates.

    That's good, thanks Christoph and Michael.

    Peter
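The "roughly 99.99999999%" figure quoted above can be reproduced with a back-of-the-envelope calculation. The sketch assumes the checksum is a 32-bit CRC and that two unrelated files behave like random data, so each of the 2^32 possible CRC values is equally likely; neither assumption is confirmed in the thread.

```python
CRC_BITS = 32  # assumption: the CRC column holds a CRC-32 value

# Chance that two unrelated files happen to share the same CRC.
collision_probability = 2.0 ** -CRC_BITS

# Confidence that a CRC match means identical content, under that model.
match_confidence = 1.0 - collision_probability

print(f"collision probability: {collision_probability:.3e}")
print(f"match confidence:      {match_confidence:.10%}")
```

Requiring the file sizes to match as well only lowers the chance of a false match further, which is why size plus CRC is treated as good enough here.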

