Find duplicate files with different names


  • #16
    Originally posted by aussieboykie:
    Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix? In this case, it would be a simpler approach than using CRC.
    Easy with Linux: just loop through a folder/path and add a prefix/suffix to each filename.

    Not sure if windooze can do it though?

    Either the second or fourth post at http://www.linuxforums.org/forum/lin...ript-help.html will do it.
    Last edited by peterr; 20-Jan-2010, 03:13 AM. Reason: found a nice little script


    • #17
      Originally posted by aussieboykie:
      Is there a way of transforming names on one side or the other - e.g. to add or subtract a prefix/suffix?
      So long as the prefix is constant throughout the folder, you can use:
      Session \ Session Settings... \ Misc tab \ Alignment overrides

      Click New...
      Align left file: *
      with right file: January-*

      If your prefix can differ, then you would need to code a smarter alignment override using a regular expression.
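
For anyone who wants to see what a constant-prefix alignment amounts to, here is a rough Python sketch. It is not Beyond Compare code; the function name `align_constant_prefix` and the sample filenames are made up for illustration. It pairs each left name with the right name that carries the fixed prefix, mirroring the `*` with `January-*` rule above:

```python
def align_constant_prefix(left_names, right_names, prefix):
    """Pair each left name with prefix + name on the right, if it exists."""
    right = set(right_names)
    pairs = {}
    for name in left_names:
        candidate = prefix + name
        # None marks a left file with no prefixed counterpart on the right.
        pairs[name] = candidate if candidate in right else None
    return pairs


if __name__ == "__main__":
    # Hypothetical sample names, only to show the shape of the result.
    print(align_constant_prefix(
        ["report.txt", "notes.txt"],
        ["January-report.txt"],
        "January-",
    ))
```

This only works because the prefix is the same for every file; a prefix that varies needs a pattern rather than a fixed string.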
      BC v4.0.7 build 19761

      • #18
        Originally posted by Michael Bulgrien:
        If your prefix can differ, then you would need to code a smarter alignment override using a regular expression
        What I actually have is a bunch of digital images in one folder and some of the same images in another folder, renamed with a shooting date/time prefix, e.g.

        IMG_4634.JPG == 2010_01_10_19_38_58_IMG_4634.JPG
        IMG_4635.JPG == 2010_01_10_19_38_58_IMG_4635.JPG
        IMG_4636.JPG == 2010_01_10_19_38_58_IMG_4636.JPG
        IMG_4637.JPG == (missing)
        IMG_4638.JPG == 2010_01_10_19_41_01_IMG_4638.JPG
        etc.

        The prefix is a constant length, but variable text. My level of competence with regular expressions is, at best, embarrassing. If some kind soul could point me in the right direction I'd be grateful.

        Regards, AB


        • #19
          A couple of notes: the Alignment Overrides feature is only in BC3 Pro.

          It also will not help with your current issue. A regular expression can define the text that matches on both sides, but it cannot match the text that differs. In this case, your prefix (2010_01_10_19...) must be explicit, while IMG_4634 can be a regular expression.

          In this case you would align:
          (IMG_\d*\.jpg)
          with 2010_01_10_19_38_58_$1

          We do not currently allow masking on the matchTo side. This is on our Customer Wishlist.
          Aaron P Scooter Software


          • #20
            Try this:

            Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
            with right file: $1

            It should work if your date-qualified files are on the left.
            BC v4.0.7 build 19761

            • #21
              Originally posted by Michael Bulgrien:
              Try this:

              Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
              with right file: $1

              It should work if your date qualified files are on the left.
              Woohoo! Many thanks Michael. I may even take the time to try to understand the regular expression. This is very useful.

              Regards, AB


              • #22
                [0-9,_] defines a set of valid characters. The square brackets simply enclose the set. The actual characters are 0 through 9, the comma, and the underscore character. Inside square brackets a comma is a literal character, not a separator, so [0-9_] would be enough here. Since \d also represents a numeric digit, this could also have been written [\d_]

                + means that the prior item occurs one or more times. So we are including any combination of numeric digits and underscores at the beginning of the regular expression.

                (IMG_\d\d\d\d.JPG) The four \d explicitly define four numeric digits. This could also have been written (IMG_\d+.JPG) to indicate one or more digits without limiting it to four, or (IMG_\d{4}.JPG) with {4} meaning repeat the prior item exactly 4 times. Strictly speaking, the unescaped . matches any single character; writing \. would match only a literal dot.

                (IMG_\d\d\d\d.JPG) The ( ) indicate that whatever matches the expression inside should be assigned to a variable to be used later.

                $1 is the variable being used later. It contains what was matched in the ( ) on the other side. If you had more than one set of ( ), you would have more than one variable assigned: $2, $3, etc.
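
To make the capture-group idea concrete, here is a small sketch using Python's `re` module. Beyond Compare's regex engine may differ in details, so treat this as an illustration only. The helper names `alignment_key` and `align` are hypothetical, and the pattern is the one from the thread with two assumed adjustments: the comma is dropped from the character class and the dot is escaped.

```python
import re

# Thread pattern, adjusted (assumption): [0-9_] instead of [0-9,_], and \. for the dot.
PREFIXED = re.compile(r"[0-9_]+(IMG_\d{4}\.JPG)")


def alignment_key(name):
    """Return the captured camera filename (the $1 part), or the name unchanged."""
    match = PREFIXED.fullmatch(name)
    return match.group(1) if match else name


def align(left_names, right_names):
    """Pair each left name with the right name that shares its alignment key."""
    right_by_key = {alignment_key(name): name for name in right_names}
    return [(name, right_by_key.get(alignment_key(name))) for name in left_names]


if __name__ == "__main__":
    originals = ["IMG_4634.JPG", "IMG_4637.JPG", "IMG_4638.JPG"]
    renamed = [
        "2010_01_10_19_38_58_IMG_4634.JPG",
        "2010_01_10_19_41_01_IMG_4638.JPG",
    ]
    for left, right in align(originals, renamed):
        print(left, "==", right or "(missing)")
```

The group in parentheses plays the role of `$1`: whatever it captures on the prefixed side is what gets compared against the plain name on the other side.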
                BC v4.0.7 build 19761

                • #23
                  Originally posted by Michael Bulgrien:
                  (IMG_\d\d\d\d.JPG) The ( ) indicate that whatever matches the expression inside should be assigned to a variable to be used later.

                  $1 is the variable being used later. It contains what was matched in the ( ) on the other side. If you had more than one set of ( ), you would have more than one variable assigned: $2, $3, etc.
                  At the risk of pushing my luck beyond reasonable bounds...

                  I actually have images from more than one camera, so some are IMG_xxxx, some are DSCxxxxx, and so on. I would therefore like to have more than one set of ( ) and matching $1, $2, $3, etc. I understand the thrust of what you are saying but am unsure of precisely what to code for left and right to replace your earlier simple case.

                  Originally posted by Michael Bulgrien
                  Align left file: [0-9,_]+(IMG_\d\d\d\d.JPG)
                  with right file: $1
                  Regards, AB


                  • #24
                    Sorry, you won't be able to capture and use more than one backreference in Beyond Compare for the purpose of aligning different kinds of files.

                    Simply create a separate alignment override definition for each file type and you're done.

                    Or you could do something like this:

                    Align left file: [0-9,_]+([A-Z]{3}_*\d+\.JPG)
                    with right file: $1

                    [A-Z]{3} Three alphabetic characters will match both IMG and DSC.
                    _* Using an * instead of a + means zero or more instances instead of one or more instances of the previous character. This, then, will recognize the _ in the IMG format but not require one for the DSC format.
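
A quick way to check which names the combined pattern accepts is to try it in Python's `re` module. This is only a sketch: Beyond Compare's engine may behave differently, `camera_name` is a hypothetical helper, and the DSC filenames below are invented samples.

```python
import re

# The combined pattern from this post, copied as written.
COMBINED = re.compile(r"[0-9,_]+([A-Z]{3}_*\d+\.JPG)")


def camera_name(prefixed_name):
    """Return the captured camera filename, or None if the pattern does not match."""
    match = COMBINED.fullmatch(prefixed_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "2010_01_10_19_38_58_IMG_4634.JPG",   # three letters, underscore, digits
        "2010_01_10_19_41_01_DSC01234.JPG",   # three letters, no underscore
        "2010_01_10_19_41_01_DSCN1234.JPG",   # four letters: not covered by {3}
    ]
    for name in samples:
        print(name, "->", camera_name(name))
```

The third sample shows the limit of [A-Z]{3}: a camera that uses a four-letter stem would need its own override or a wider quantifier such as {3,4}.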
                    Last edited by Michael Bulgrien; 21-Jan-2010, 07:26 AM.
                    BC v4.0.7 build 19761

                    • #25
                      Excellent Michael. Thanks for the code and the clear explanation. Much appreciated.

                      Regards, AB


                      • #26
                        Thanks for the creative Regular Expression solution and clear explanation, Michael.
                        Aaron P Scooter Software


                        • #27
                          Could this be added to the BC wishlist? That is, find duplicate files in a folder comparison regardless of filename, matching only on file size and CRC.

                          btw, when I searched for 'crc' I kept getting "Sorry - no matches. Please try some different terms." I had to use Google, which reported 122 hits. Possibly strings of length 3 are ignored?

                          Peter
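
Until something like that exists in BC, the size-plus-CRC idea can be approximated outside the tool. Below is a minimal Python sketch, not a Beyond Compare feature: `crc32_of` and `find_duplicates` are hypothetical names, and it uses only the standard library (`os.walk`, `zlib.crc32`). Files are grouped by size first so that a CRC is computed only where two or more files share a size.

```python
import os
import zlib
from collections import defaultdict


def crc32_of(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(*folders):
    """Return groups of paths that share both file size and CRC-32."""
    by_size = defaultdict(list)
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            groups[(size, crc32_of(path))].append(path)

    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("folder_a", "folder_b")` with two placeholder folder paths returns a list of groups, whatever the files are called. Equal size and CRC-32 is a strong hint rather than proof, so a byte-by-byte comparison of each group is a sensible final check.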


                          • #28
                            Finding duplicate files is still on our wish list; we do keep track of how often it is requested.

                            Yes, the forum software that we're using has a minimum of 4 characters for search terms. I sometimes use Google myself to search our forums. In Google I enter the term I'm searching for plus "site:http://www.scootersoftware.com/vbulletin/" to limit the search to our forums.
                            Chris K Scooter Software


                            • #29
                              Originally posted by Chris:
                              Finding duplicate files is still on our wish list; we do keep track of how often it is requested.
                              Okay, so does that mean if I post a request here each day, it will get on the list quicker?


                              • #30
                                Originally posted by peterr:
                                Okay, so does that mean if I post a request here each day, it will get on the list quicker?
                                I think we'll notice if it's just you. But we have had several users request this. It is something we would like to do, but we have several other large projects already scheduled and being worked on, so it is still on the Wishlist for now.
                                Aaron P Scooter Software
