  1. #1
    Join Date
    Jun 2009
    Posts
    11

    An Apples to Oranges Comparison?

    How can BC, alone or in conjunction with a third-party utility or some Windows OS tweaking, conduct a Text to Folder session?

    My current understanding is that BC does Text file to Text file or Folder to Folder comparisons, but NOT Text file to Folder. I have a text file with a large list of phrases, each on its own line (e.g. titles of famous paintings and songs I would like to obtain images and tracks of). I also have a huge folder in Windows with many such media files I have already obtained, sorted into deeply nested subfolders. I'd like to know which items in my list I already have (the filenames usually contain some part of the phrase), so I can remove them from my list and not go after them again. Maybe there's some batch search utility I don't know about; I'm trying to do this with BC since BC goes "beyond" comparisons.

    The search would have to be approximate (like Google or Windows Search), not exact, since the filenames in the folders are close to but not the same as the items in my list file, and I've noticed the text comparison logic BC3 uses is good enough for that. Any ideas? I may sound like a power user, but I'm not, so I'd appreciate details. Thanks...
    Last edited by a_bc_user; 17-Jan-2010 at 04:48 AM.

  2. #2
    Join Date
    Oct 2007
    Location
    pittsburgh, PA
    Posts
    64

    A quick and dirty way would be to open up a command prompt, go to the folder you are interested in, and enter this command:

    dir > filenames.txt

    This writes the folder's directory listing to "filenames.txt", one file per line, which you can then compare to your existing file of phrases. This ought to be at least close to what you are looking for.

  3. #3
    Join Date
    Jun 2009
    Posts
    11

    Thanks, tlsscales. I think we're making progress with this approach, though it's looking like it's going to be more of a slow and dirty way for me! I redirected the output into a text file with the syntax you provided, adding /s after the dir command to include subdirectories. I then loaded the two text files in a BC Text Compare session. It seems I have to tweak some BC settings for this to be useful: right now I'm getting all red text on both sides, with 1-3 coincidental letters matched in black, rather than words or phrases. How can I do this? Also, I don't know whether it matters for the comparison, but there's a lot of useless information in the dir output that has nothing to do with filenames (directory names, sizes, spaces, etc.).

    Left side example (my small list of items I am looking for):

    Carl Orff - O Fortuna
    Leonardo Davinci - Mona Lisa
    [ETC...]

    Right side example (directory output):

    Volume in drive E is LACIE-NTFS
    Volume Serial Number is XXXX-XXXX

    Directory of E:\MYDOCU~1\MYMEDI~1

    17-Jan-10 10:00 PM <DIR> .
    17-Jan-10 10:00 PM <DIR> ..
    23-Jun-09 03:30 PM <DIR> Paintings
    17-Jan-10 10:00 PM 0 output.txt
    14-Jan-10 02:48 PM <DIR> Operas
    1 File(s) 0 bytes

    Directory of E:\MYDOCU~1\MYMEDI~1\Paintings

    23-Jun-09 03:30 PM <DIR> .
    23-Jun-09 03:30 PM <DIR> ..
    05-Sep-08 12:24 AM 1,000,000 01-Henri Matisse Odalisque Watercolor.jpg
    05-Sep-08 12:24 AM 1,000,000 02-Mona Lisa and other Louvre Works.jpg
    2 File(s) 2,000,000 bytes

    Directory of E:\MYDOCU~1\MYMEDI~1\Operas

    23-Jun-09 03:30 PM <DIR> .
    23-Jun-09 03:30 PM <DIR> ..
    05-Sep-08 12:24 AM 1,000,000 8-Fortuna Imperatrix Mundi.wav
    05-Sep-08 12:24 AM 2,000,000 Fiddler on the Roof.wav
    [ETC...]

    So in this example I'd need BC to align the lines containing "Mona Lisa" and "Fortuna" on both sides on the same line and mark them accordingly as a match.

    Alternatively, is there a completely different approach (some kind of search, maybe?) to finding which items on the left are already contained on the right?

  4. #4
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    4,730

    If the file you're comparing the directory listing against only has filenames in it, using "DIR /B > filenames.txt" might help. /B stands for "bare": it gives a directory listing with only filenames.

    You can list all command line switches supported by DIR by entering DIR /? on the command line.

    If the names in your list aren't in alphabetical order, you might also want to use the "Sorted" file format to sort the filenames before they are compared.
    Chris K Scooter Software

  5. #5
    Join Date
    Jun 2009
    Posts
    11

    Ahh, you're bringing back good ol' DOS to me after 15 years. Good of MS to keep "cmd" in Windows 7, along with the nifty Windows PowerShell, which understands many Unix- and DOS-style commands.

    So, the best I could do with the DIR switches is:
    dir /A:-D /B /O:N /S > list.txt
    The resulting file drills into subdirectories and lists files only (no directories), sorted alphabetically.

    Still, I'm not getting just the filenames, which is what would make this useful in BC: the full path of every file is listed on each line, which is quite verbose since these thousands of files go about 7 directory levels deep.

    Example: E:\Dir\Dir1\Dir1A\Dir1Aa\file.wav

    Also, the files are not sorted alphabetically across the whole list, only relative to the other files in the same subdirectory; the order then resets for the next subdirectory.

    If there isn't some syntax to make the DOS command omit the path, maybe there's a rule in BC to cut out everything up to the last "\" on each line?
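    If a small script is an option, here is a minimal Python 3 sketch (assuming Python is installed; the function names `bare_filenames` and `write_list` are hypothetical) that collects only the filenames under a folder, with no paths, and sorts them once across the whole tree instead of per subdirectory:

```python
import os


def bare_filenames(root):
    """Collect the names (no paths) of all files under root, recursively."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    # Sort once over the whole tree, case-insensitively, so the order
    # does not reset at each subdirectory the way DIR /O:N does.
    return sorted(names, key=str.lower)


def write_list(root, out_path):
    """Write the sorted bare filenames under root to out_path, one per line."""
    # Collect before creating the output file, so a new list.txt inside
    # root is not picked up by this run.
    names = bare_filenames(root)
    with open(out_path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")
```

    For example, `write_list(r"E:\MYDOCU~1\MYMEDI~1", "list.txt")` would write one bare filename per line to list.txt, ready to load into BC.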

  6. #6
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    4,730

    It is possible to mark part of the path as unimportant in BC3. See the following link for instructions: http://www.scootersoftware.com/suppo...mportantv3.php
    Chris K Scooter Software

  7. #7
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    2,525

    If you want to stick with a purely command-line approach, I'd suggest passing your file through sed to strip off the paths, then using the Windows "sort" command to sort the resulting file. A plain text editor that supports search and replace with regular expressions would work too.

    The regular expressions to use would be:

    Find: .*\\([^\\]+)$
    Replace with: $1

    I tested it with EditPad Pro; it handled the replacement, and it has a built-in sort command.
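    For anyone who prefers a script to an editor, the same find and replace can be sketched with Python's `re` module (a substitute for sed or EditPad, not what was tested above). Python writes the backreference as `\1` rather than `$1`; `strip_paths` is a hypothetical helper name:

```python
import re

# Greedy .* runs up to the last backslash; the group keeps the final
# path component, i.e. the bare filename.
STRIP_PATH = re.compile(r".*\\([^\\]+)$")


def strip_paths(lines):
    """Return the filename part of each full Windows path, sorted case-insensitively.

    Lines without a backslash are left unchanged.
    """
    names = [STRIP_PATH.sub(r"\1", line.rstrip("\r\n")) for line in lines]
    return sorted(names, key=str.lower)
```

    Passing an open file works too, e.g. `strip_paths(open("list.txt", encoding="utf-8"))`, since iterating a file yields its lines.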
    Zoë P Scooter Software

  8. #8
    Join Date
    Jun 2009
    Posts
    11

    OK, I've tried both approaches and decided to strip the paths first, then load the result into BC. The regular expression provided worked beautifully. Of course there are always some unexpected leftovers that need stripping too; for example, some filenames are preceded by track numbers, and I had some trouble forming the right [1-9][0-9] syntax for those.
    I'll keep learning about regex.
    Now that the paths have been stripped, it may be good enough to look at in BC even without sorting... at least all the filenames fit on the right side.
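    On the track numbers: `[1-9][0-9]` matches exactly two digits with a non-zero first digit, so it misses prefixes like "01-" and "8-". A minimal Python sketch, assuming the prefix is always digits followed by a dash (as in the examples in this thread), uses `\d+` instead; `strip_track_number` is a hypothetical name:

```python
import re

# One or more digits at the very start, then a dash, optionally with
# spaces around it. Unlike [1-9][0-9], this also matches "01-" and "8-".
TRACK_PREFIX = re.compile(r"^\d+\s*-\s*")


def strip_track_number(name):
    """Remove a leading track number such as '01-' or '8-' from a filename."""
    return TRACK_PREFIX.sub("", name, count=1)
```

    The same pattern should work in an editor's regex search and replace: find `^\d+\s*-\s*` and replace with nothing.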

    In BC, the compare still turns up a lot of red, as it is trying to match letters (shown in black) rather than whole words. Is there some way to do this with a Grammar rule that matches whole words only, or with another regular expression within BC?

  9. #9
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,908

    Could you give some specific examples? You may just need to tweak your Alignment settings to avoid mismatches, such as disabling Align Similar or choosing Never Align Mismatches.
    Aaron P Scooter Software

  10. #10
    Join Date
    Jun 2009
    Posts
    11

    Sure,

    Left side (list of items I am looking for):

    Blah
    Carl Orff - O Fortuna
    Blah Blah
    Leonardo Davinci - Mona Lisa
    Blah Blah Blah
    [ETC...]

    Right side (actual directory output I am looking in):

    22-Mona Lisa and other Louvre Works.jpg
    Mumbo
    Mumbo Jumbo
    8-Fortuna Imperatrix Mundi.wav
    Coco Mumbo Jumbo
    [ETC...]

    What is currently happening is:
    - The text on both sides is all mainly red.
    - The lines on both sides are aligned when just 1 or 2 letters match.
    - Those corresponding letters are in black.
    For example, I think the "l" in the first "Blah" on the left would be black, as would the first "L" in "22-Mona Lisa and other Louvre Works.jpg", but it may be that letters are only marked as a match when they occupy the same position in the phrase.

    What should happen is:
    - The text on both sides is still mainly red.
    - The lines on both sides are aligned when one complete word matches (e.g., "Mona" or "Fortuna"), regardless of where that word falls on the line, and that word is marked black. Something similar to:

    Left:
    Carl Orff - O Fortuna
    Leonardo Davinci - Mona Lisa
    Blah
    Blah Blah
    Blah Blah Blah

    Right:
    8-Fortuna Imperatrix Mundi.wav
    22-Mona Lisa and other Louvre Works.jpg
    Mumbo
    Mumbo Jumbo
    Coco Mumbo Jumbo

    Basically I just want the simplest way to catch which titles I already have files corresponding for.
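    Because BC's text compare is built around aligning lines in order, a short script may be the simplest way to answer this particular question. A minimal Python sketch, assuming both lists are plain text with one entry per line; the 4-letter minimum word length (to skip words like "O", "and", "the") is an arbitrary assumption, and `find_matches`/`still_missing` are hypothetical names:

```python
import os
import re

MIN_WORD_LEN = 4  # assumption: skip short words such as "O", "the", "and"


def words(text):
    """Return the set of lower-cased words in text with at least MIN_WORD_LEN letters."""
    # [^\W\d_] matches a single letter (word character minus digits and underscore).
    return {w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) >= MIN_WORD_LEN}


def find_matches(titles, filenames):
    """Map each title to the filenames sharing at least one whole word with it."""
    # Drop the extension (".wav", ".jpg") before splitting a filename into words.
    file_words = [(name, words(os.path.splitext(name)[0])) for name in filenames]
    result = {}
    for title in titles:
        wanted = words(title)
        result[title] = [name for name, fw in file_words if wanted & fw]
    return result


def still_missing(titles, filenames):
    """Titles for which no filename shares a whole word."""
    matches = find_matches(titles, filenames)
    return [title for title in titles if not matches[title]]
```

    Reading the two files with `open(path, encoding="utf-8").read().splitlines()` and printing `still_missing(titles, filenames)` would list the titles with no likely match. Common words such as "other" or "works" can cause false matches, so the result is a starting point for review rather than a definitive answer.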
