Find duplicate files with different names

  • peterr
    Fanatic
    • Nov 2004
    • 142

    #16
    Originally posted by aussieboykie
    Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix? In this case, it would be a simpler approach than using CRC.
    Easy with Linux: just parse through a folder/path and add a prefix/suffix to each filename.

    Not sure if windooze can do it though ??

    Either the second or fourth post at http://www.linuxforums.org/forum/lin...ript-help.html will do it.
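
    The linked script is not reproduced in this thread, so the following is only a minimal Python sketch of the same idea (walk a folder and add a prefix to each filename). The function name add_prefix and the choice to skip files that already carry the prefix are assumptions made for illustration.

```python
from pathlib import Path


def add_prefix(folder: str, prefix: str) -> list[str]:
    """Rename every regular file in `folder` so its name starts with `prefix`.

    Returns the new names, in sorted order of the original names.
    """
    renamed = []
    # sorted() builds the full listing first, so renamed files are not revisited.
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files that already carry the prefix.
        if not path.is_file() or path.name.startswith(prefix):
            continue
        target = path.with_name(prefix + path.name)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

    Calling add_prefix("photos", "January-") on a hypothetical folder named photos would turn IMG_4634.JPG into January-IMG_4634.JPG. Try it on a copy of the folder first.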
    Last edited by peterr; 20-Jan-2010, 03:13 AM. Reason: found a nice little script


    • Michael Bulgrien
      Carpal Tunnel
      • Oct 2007
      • 1772

      #17
      Originally posted by aussieboykie
      Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix?
      So long as the prefix is constant throughout the folder, you can use:
      Session \ Session Settings... \ Misc tab \ Alignment overrides

      Click New...
      Align left file: *
      with right file: January-*

      If your prefix can differ, then you would need to code a smarter alignment override using a regular expression
      BC v4.0.7 build 19761
      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


      • aussieboykie
        Expert
        • Oct 2009
        • 55

        #18
        Originally posted by Michael Bulgrien
        If your prefix can differ, then you would need to code a smarter alignment override using a regular expression
        What I actually have is a bunch of digital images in one folder and some of the same images in another folder, renamed with a shooting date/time prefix, e.g.

        IMG_4634.JPG == 2010_01_10_19_38_58_IMG_4634.JPG
        IMG_4635.JPG == 2010_01_10_19_38_58_IMG_4635.JPG
        IMG_4636.JPG == 2010_01_10_19_38_58_IMG_4636.JPG
        IMG_4637.JPG == (missing)
        IMG_4638.JPG == 2010_01_10_19_41_01_IMG_4638.JPG
        etc.

        The prefix is a constant length, but variable text. My level of competence with regular expressions is, at best, embarrassing. If some kind soul could point me in the right direction I'd be grateful.
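
        Because the prefix has a constant length, the pairing can also be sketched outside Beyond Compare without any regular expression. The Python below is only an illustration: the function name pair_by_fixed_prefix is made up, and the length of 20 is taken from the sample names above.

```python
# Length of a prefix such as "2010_01_10_19_38_58_" (20 characters),
# taken from the sample filenames in the post above.
PREFIX_LEN = len("2010_01_10_19_38_58_")


def pair_by_fixed_prefix(originals, renamed, prefix_len=PREFIX_LEN):
    """Map each original name to its prefixed counterpart, or None if missing."""
    # Index the renamed files by what is left after dropping the prefix.
    by_suffix = {name[prefix_len:]: name for name in renamed}
    return {name: by_suffix.get(name) for name in originals}
```

        With the names listed above, IMG_4634.JPG pairs with 2010_01_10_19_38_58_IMG_4634.JPG and IMG_4637.JPG maps to None.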

        Regards, AB


        • Aaron
          Team Scooter
          • Oct 2007
          • 15996

          #19
          A couple of notes: the Alignment Overrides feature is only in BC3 Pro.

          It also will not help with your current issue. A regular expression can be useful for defining the specific, matching text, but it cannot match on the differing text. In this case, your prefix (2010_01_10_19...) must be explicit, while IMG_4634 can be a regular expression.

          In this case you would align:
          (IMG_\d*\.jpg)
          with 2010_01_10_19_38_58_$1

          We do not currently allow masking on the matchTo side. This is on our Customer Wishlist.
          Aaron P Scooter Software


          • Michael Bulgrien
            Carpal Tunnel
            • Oct 2007
            • 1772

            #20
            Try this:

            Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
            with right file: $1

            It should work if your date qualified files are on the left.
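
            To see what this override captures, the same pattern can be tried in Python's re module. This is only a sketch: it assumes Beyond Compare matches the whole filename case-sensitively, which the thread does not confirm, and the helper name align_target is made up for illustration.

```python
import re

# The "Align left file" pattern from the post, unchanged.
LEFT_PATTERN = re.compile(r"[0-9,_]+(IMG_\d\d\d\d.JPG)")


def align_target(left_name):
    """Return the right-side name ($1) for a left-side filename, or None."""
    # fullmatch is an assumption: it requires the whole name to match.
    m = LEFT_PATTERN.fullmatch(left_name)
    return m.group(1) if m else None


print(align_target("2010_01_10_19_38_58_IMG_4634.JPG"))  # IMG_4634.JPG
print(align_target("IMG_4634.JPG"))                      # None (no prefix to strip)
```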


            • aussieboykie
              Expert
              • Oct 2009
              • 55

              #21
              Originally posted by Michael Bulgrien
              Try this:

              Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
              with right file: $1

              It should work if your date qualified files are on the left.
              Woohoo! Many thanks Michael. I may even take the time to try to understand the regular expression. This is very useful.

              Regards, AB


              • Michael Bulgrien
                Carpal Tunnel
                • Oct 2007
                • 1772

                #22
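                [0-9,_] defines a set of valid characters. The square brackets simply enclose the set. The actual characters are 0 through 9, the comma, and the underscore character. Inside the brackets a comma is a literal character rather than a separator, so [0-9_] would be enough here. Since \d also represents a numeric digit, this could also have been written [\d,_]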

                + means that the prior item occurs one or more times. So we are including any combination of numeric digits and underscores at the beginning of the regular expression.

                (IMG_\d\d\d\d.JPG) The four \d explicitly define four numeric digits. This could also have been written (IMG_\d+.JPG) to indicate one or more digits without limiting it to four, or (IMG_\d{4}.JPG) with {4} meaning repeat the prior item exactly 4 times. Note that the unescaped . before JPG matches any single character; write \. to match only a literal period.

                (IMG_\d\d\d\d.JPG) The ( ) indicate that whatever matches the expression inside should be assigned to a variable to be used later.

                $1 is the variable being used later. It contains what was matched in the ( ) on the other side. If you had more than one set of ( ), you would have more than one variable assigned: $2, $3, etc.
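
                The variants described above can be compared side by side in Python's re module. This is a sketch, not Beyond Compare itself: whole-name, case-sensitive matching is assumed, and the helper name captures is hypothetical.

```python
import re

# The three spellings discussed in the post.
VARIANTS = [
    r"[0-9,_]+(IMG_\d\d\d\d.JPG)",  # four explicit digits
    r"[\d,_]+(IMG_\d+.JPG)",        # one or more digits
    r"[\d,_]+(IMG_\d{4}.JPG)",      # exactly four digits via {4}
]


def captures(name):
    """Return what each variant captures as group 1 (the $1 of the post)."""
    results = []
    for pattern in VARIANTS:
        m = re.fullmatch(pattern, name)
        results.append(m.group(1) if m else None)
    return results


# All three agree on a four-digit image number.
print(captures("2010_01_10_19_38_58_IMG_4634.JPG"))
# Only the \d+ variant accepts five digits.
print(captures("2010_01_10_19_38_58_IMG_12345.JPG"))
```

                The first call prints ['IMG_4634.JPG', 'IMG_4634.JPG', 'IMG_4634.JPG'] and the second prints [None, 'IMG_12345.JPG', None].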


                • aussieboykie
                  Expert
                  • Oct 2009
                  • 55

                  #23
                  Originally posted by Michael Bulgrien
                  (IMG_\d\d\d\d.JPG) The ( ) indicate that whatever matches the expression inside should be assigned to a variable to be used later.

                  $1 is the variable being used later. It contains what was matched in the ( ) on the other side. If you had more than one set of ( ), you would have more than one variable assigned: $2, $3, etc.
                  At the risk of pushing my luck beyond reasonable bounds...

                  I actually have images from more than one camera, so some are IMG_xxxx, some are DSCxxxxx, and so on. I would therefore like to have more than one set of ( ) and matching $1, $2, $3, etc. I understand the thrust of what you are saying but am unsure of precisely what to code for left and right to replace your earlier simple case.

                  Originally posted by Michael Bulgrien
                  Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
                  with right file: $1
                  Regards, AB


                  • Michael Bulgrien
                    Carpal Tunnel
                    • Oct 2007
                    • 1772

                    #24
                    Sorry, you won't be able to capture and use more than one backreference in Beyond Compare for the purpose of aligning different kinds of files.

                    Simply create a separate alignment override definition for each file type and you're done.

                    Or you could do something like this:

                    Align left file: [0-9,_]+([A-Z]{3}_*\d+\.JPG)
                    with right file: $1

                    [A-Z]{3} Three alphabetic characters will match both IMG and DSC.
                    _* Using an * instead of a + means zero or more instances instead of one or more instances of the previous character. This, then, will recognize the _ in the IMG format but not require one for the DSC format.
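
                    As a rough check of the combined pattern, here is a Python sketch that pairs left-side names with right-side names. It assumes whole-name, case-sensitive matching, which may differ from Beyond Compare's behaviour. The function name align and the sample name DSC01234.JPG are hypothetical.

```python
import re

# The combined "Align left file" pattern from the post, unchanged.
PATTERN = re.compile(r"[0-9,_]+([A-Z]{3}_*\d+\.JPG)")


def align(left_names, right_names):
    """Pair each left name with the right name equal to its captured $1."""
    right = set(right_names)
    pairs = {}
    for name in left_names:
        m = PATTERN.fullmatch(name)
        target = m.group(1) if m else None
        # Only report a pairing when the target really exists on the right.
        pairs[name] = target if target in right else None
    return pairs


left = ["2010_01_10_19_38_58_IMG_4634.JPG", "2010_01_10_19_41_01_DSC01234.JPG"]
right = ["IMG_4634.JPG", "DSC01234.JPG"]
print(align(left, right))
```

                    Both the IMG_ name (with underscore) and the DSC name (without) are paired with their unprefixed counterparts.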
                    Last edited by Michael Bulgrien; 21-Jan-2010, 07:26 AM.


                    • aussieboykie
                      Expert
                      • Oct 2009
                      • 55

                      #25
                      Excellent Michael. Thanks for the code and the clear explanation. Much appreciated.

                      Regards, AB


                      • Aaron
                        Team Scooter
                        • Oct 2007
                        • 15996

                        #26
                        Thanks for the creative Regular Expression solution and clear explanation, Michael.
                        Aaron P Scooter Software


                        • peterr
                          Fanatic
                          • Nov 2004
                          • 142

                          #27
                          Could this be added to the BC wishlist ? That is, find duplicate files in a folder comparison, regardless of filename, just match on file size and CRC.
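
                          Until something like that exists, the idea can be sketched outside Beyond Compare. The Python below groups files by size and CRC-32 regardless of filename. It is only an illustration: the names crc32_of and find_duplicates are made up, and because CRC-32 can collide, a byte-by-byte comparison would be a sensible final check before deleting anything.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def crc32_of(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(*folders):
    """Return groups of files that share both size and CRC-32."""
    # Pass 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for folder in folders:
        for path in Path(folder).rglob("*"):
            if path.is_file():
                by_size[path.stat().st_size].append(path)
    # Pass 2: checksum only the files whose size is shared.
    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            groups[(size, crc32_of(path))].append(path)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

                          For example, find_duplicates("folder_a", "folder_b") on two hypothetical folders would report IMG_4634.JPG and 2010_01_10_19_38_58_IMG_4634.JPG together if their contents are identical.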

                          btw, when I searched for 'crc' I kept getting "Sorry - no matches. Please try some different terms.", had to use Google which told me 122 hits ? Possibly strings of length 3 are ignored ?

                          Peter


                          • Chris
                            Team Scooter
                            • Oct 2007
                            • 5538

                            #28
                            Finding duplicate files is still on our wish list, and we do keep track of how often it is requested.

                            Yes, the forum software that we're using has a minimum of 4 characters for search terms. I sometimes use Google myself to search our forums. In Google I enter the term I'm searching for plus "site:http://www.scootersoftware.com/vbulletin/" to limit the search to our forums.
                            Chris K Scooter Software


                            • peterr
                              Fanatic
                              • Nov 2004
                              • 142

                              #29
                              Originally posted by Chris
                              Finding duplicate files is still on our wish list, and we do keep track of how often it is requested.
                              Okay, so does that mean if I post a request here each day, it will get on the list quicker ?


                              • Aaron
                                Team Scooter
                                • Oct 2007
                                • 15996

                                #30
                                Originally posted by peterr
                                Okay, so does that mean if I post a request here each day, it will get on the list quicker ?
                                I think we'll notice if it's just you. But we have had several users request this. It is something we would like to do, but we have several other large projects already scheduled and being worked on, so it is still on the Wishlist for now.
                                Aaron P Scooter Software
