  1. #1
    Join Date
    Dec 2010
    Posts
    17

    Delete one text list from another.

    Hi

    You will have to forgive me as I am a bit of an idiot as I can’t seem to do the simplest things sometimes !

    I am trying to use your software to compare two text lists and subtract one from the other.

    I worked out that if I use the “Show Same” view and then delete the displayed lines, that does work... a bit.

    However what I am trying to do is simply remove the same words or lines that are on list A from list B.

    Your program seems to only check for lines that are in a similar position and not search throughout the entire list. I guess what I am trying to do is a bit like an automated “Find and Replace” where the entire text of list A is compared to the entire text of list B.

    Sample.

    List A

    line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8

    List B

    Other Text 1
    Other Text 2
    Other Text 3
    Other Text 4
    line 1
    Other Text 5
    line 2
    Other Text 6
    line 3
    Other Text 7
    Other Text 8
    line 4
    Other Text 9

    What I am trying to achieve would be an output of …

    List C

    Other Text 1
    Other Text 2
    Other Text 3
    Other Text 4
    Other Text 5
    Other Text 6
    Other Text 7
    Other Text 8
    Other Text 9

    So List A is subtracted from List B to give me List C.

    Can anyone help me please ?
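Editor's note: outside the compare tool, the subtraction described in this post can be sketched in a few lines of Python. This is only a minimal illustration using the sample data above; the function name `subtract_lines` is made up for the example, and it assumes that "the same" means an exact whole-line match.

```python
def subtract_lines(list_a, list_b):
    """Return the lines of list_b that do not appear anywhere in list_a."""
    # A set makes the lookup independent of where a line sits in list A.
    unwanted = set(list_a)
    # Keep list B's original order; drop only exact whole-line matches.
    return [line for line in list_b if line not in unwanted]


list_a = [f"line {n}" for n in range(1, 9)]
list_b = [
    "Other Text 1", "Other Text 2", "Other Text 3", "Other Text 4",
    "line 1", "Other Text 5", "line 2", "Other Text 6", "line 3",
    "Other Text 7", "Other Text 8", "line 4", "Other Text 9",
]

list_c = subtract_lines(list_a, list_b)
for line in list_c:
    print(line)
```

Because the lookup uses a set rather than line positions, the order of List A makes no difference to the result.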

  2. #2
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Given your example, you should get the result you expect by using the Show Same filter and deleting lines 1-4 on the right side.

    I believe the issue you are running into arises when lines 1-4 are not in the same order on both sides, for example where List A is:

    line 2
    line 3
    line 4
    line 1
    line 5
    line 6
    line 7
    line 8


    To help with this scenario, your files can be sorted. If your data is only line by line, and it is ok to sort every line independently and alphabetically, we have a Sort file format that can do that. Go to the Session menu -> Session Settings, Format tab, and switch from Detected to Sorted for both sides.

    How does that work for you? If you are still having any trouble, would you be able to post or email us a pair of example files? Our email is support@scootersoftware.com and if you email us please include a link back to this forum post.
    Aaron P Scooter Software

  3. #3
    Join Date
    Dec 2010
    Posts
    17

    Hi Aaron

    Thank you for your reply and help.

    I tried what you said and I can see what you mean, however it didn’t really work. It did remove some lines, but where the same line appears more than once in List B it doesn’t seem to work.

    I did mess about with this for some hours and discovered that it seemed to work better if I changed this setting:

    Rules / Alignment, set to Alternate Method.

    I have made a sample list for you below with very few lines that seem to throw the program. They are deliberately messed about to show you the problem.

    I think this is obviously something the program isn’t meant to do; I guess it is an unusual request. So rather than me wasting your time trying to make it work, could this perhaps be a feature request for a later version?

    Could you please make an option to allow 2 lists of different sizes to be compared and then a button to remove all the “same” matching lines / words from list B ? Also make it so they don’t have to be aligned. As mentioned before almost like an automatic “find and replace with blank”.

    Thanks

    Here’s a sample for you to play with.

    List A

    line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8
    line 9
    line 10



    List B

    line 1 This line should be left after filtering. 1/4
    line 6
    line 4B This line should be left after filtering. 4/4
    line 1
    line 2
    line 3
    line 1
    line 7
    line 8
    line 6
    line 7
    line 8
    line 2
    line 5
    line 6
    line 2B This line should be left after filtering. 2/2
    line 1
    line 7
    line 8
    line 8
    line 2
    line 6
    line 7
    line 3B This line should be left after filtering. 3/4
    line 4
    line 5
    line 6
    line 1
    line 3
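Editor's note: the behaviour requested in this post (duplicates in List B, no alignment, whole lines only) can be illustrated with a short Python sketch. It uses an abbreviated copy of the sample above; `subtract_whole_lines` is a hypothetical name, and stripping surrounding whitespace before comparing is an assumption, not something the poster specified.

```python
LIST_A = [f"line {n}" for n in range(1, 11)]

# Abbreviated copy of the poster's List B, keeping the duplicates.
LIST_B = """\
line 1 This line should be left after filtering. 1/4
line 6
line 4B This line should be left after filtering. 4/4
line 1
line 2
line 3
line 1
line 8
line 8
line 2B This line should be left after filtering. 2/2
line 6
line 3B This line should be left after filtering. 3/4
line 4
line 1
""".splitlines()


def subtract_whole_lines(list_a, list_b):
    """Drop every line of list_b whose full text matches a line in list_a."""
    unwanted = {line.strip() for line in list_a}
    # Every occurrence of a matching line is dropped, however often it repeats.
    return [line for line in list_b if line.strip() not in unwanted]


kept = subtract_whole_lines(LIST_A, LIST_B)
for line in kept:
    print(line)
```

Matching on whole lines matters here: a plain "find and replace with blank" for the text `line 1` would also damage the first marked line, which merely begins with that text.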

  4. #4
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    If you enable Never Align mismatches, and use the Sorted file format, I believe this should align as expected with the initial example.

    With the new example, however, you have introduced the concept of duplicate lines that you wish to remove. Unfortunately, we do not have a method of automatically removing these at this time. As a workaround, you can use our Replace dialog to manually replace all instances of specific text (line 8) with a blank line to remove them.

    You can also create your own custom external conversion. If this could remove all duplicate entries from a file, and sort the remaining unique lines, you could then continue to use the above method:
    http://www.scootersoftware.com/suppo...rnalconversion
    Aaron P Scooter Software
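Editor's note: a duplicate-removing, sorting converter of the kind suggested here might look like the Python sketch below. It is not Scooter Software's code. The calling convention (an input path followed by an output path on the command line) and the UTF-8 encoding are assumptions; check the KB article for the arguments the application actually passes.

```python
import sys


def sorted_unique_lines(lines):
    """Drop duplicate lines and return the remaining ones in sorted order."""
    return sorted({line.rstrip("\r\n") for line in lines})


def convert(source_path, target_path):
    """Write the sorted, de-duplicated lines of source_path to target_path."""
    with open(source_path, encoding="utf-8", errors="replace") as src:
        unique = sorted_unique_lines(src)
    with open(target_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in unique)


if __name__ == "__main__":
    # Assumed invocation: python dedupe_sort.py <source> <target>
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
    else:
        print("usage: dedupe_sort.py <source> <target>", file=sys.stderr)
```

The script name `dedupe_sort.py` is hypothetical. Note that the sort is a plain code-point sort, so upper-case lines come before lower-case ones.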

  5. #5
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Also, what kind of files are you comparing? We are always interested in hearing how our customers are using our application in real world scenarios.
    Aaron P Scooter Software

  6. #6
    Join Date
    Dec 2010
    Posts
    17

    Hi Aaron

    Well... thank you very much, you worked it out!

    Yes it made all the difference changing the “Never Align Mismatches” although in my version it is called “Never Align Differences”.

    I did change the last example to contain duplicates as I thought BC would do that automatically but it is no problem as there is a workaround and I can also remove duplicates with another program beforehand.

    The workaround, if you are interested, is to keep saving and reloading the lists, eventually all the duplicates get picked up in the method you described.

    Unfortunately, even though the method you kindly provided does work, it is so incredibly slow that it is almost unusable, even with very small text files of only 5MB or so. I have some text files that are 200MB!

    You ask what I am doing “in the real world”, ha ha! Well, I admit this probably isn’t a common thing, so don’t feel bad about it. I use your program as everyone else does, I suppose: to compare 2 documents that have been written or changed in 2 different places. I sometimes have a copy on my laptop and one on my computer, so things can get messed up. I use your program to compare the 2 versions and pretty much just use the merge function.

    The new problem is that I am making text lists of words for specific subjects. These are growing in size and I realise that many of the text files contain the same words. I thought I could reduce the size of all these files by removing the contents of the English dictionary from each one, which should leave me with only the words specific to a given subject. So once you remove the dictionary you are left with slang terms and terms unique to that subject.

    I must admit I didn’t think comparing text files on a computer would be a problem but I simply cannot find any software anywhere that can do it. I know yours can but as I say it cannot handle text files of more than a few hundred KB, well not on my computer anyway.

    Basically what I am looking for is the “Find and Replace” found in all text editors, but able to load a text file of search terms rather than a single word, and then simply replace each match with a “blank”.

    Out of interest, do you know what the maximum text file size BC3 can handle is, please?

    Thank you for your help.
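Editor's note: for word lists in the hundreds of megabytes, the "load a text file as the search terms" idea can be sketched as a streaming filter in Python. The names `remaining_lines` and `subtract_file` are hypothetical, and the sketch assumes one word per line, UTF-8 files, and exact (case-sensitive) matches. The dictionary is held in memory as a set; only the subject file is streamed.

```python
def remaining_lines(dictionary_lines, subject_lines):
    """Yield each subject line whose text is not in the dictionary."""
    # The whole dictionary is loaded once so each lookup is a hash check.
    dictionary = {line.strip() for line in dictionary_lines}
    for line in subject_lines:
        if line.strip() not in dictionary:
            yield line


def subtract_file(dictionary_path, subject_path, output_path):
    """Stream subject_path to output_path, skipping dictionary entries."""
    with open(dictionary_path, encoding="utf-8") as dictionary_file, \
         open(subject_path, encoding="utf-8") as subject_file, \
         open(output_path, "w", encoding="utf-8") as output_file:
        # Lines are written as they are read; the subject file is never
        # held in memory in full.
        output_file.writelines(remaining_lines(dictionary_file, subject_file))
```

A call such as `subtract_file("english.txt", "surfing.txt", "surfing_only.txt")` (file names invented for illustration) makes a single pass over each file, so the running time grows with the file sizes rather than with any line alignment work.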

  7. #7
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    Our max file size tests are available here:
    http://www.scootersoftware.com/suppo...z=kb_maxfilev3

    Currently, we have tested files upwards of 500MB in the Text Compare. The results still depend on the hardware you are using. How old is your computer, and what kind of processor and RAM does it have?

    Also, the "above method" in the KB article is specifically for RESX files. You would need to provide your own duplicate-removing command-line program, which can then be plugged in and used automatically.
    Aaron P Scooter Software

  8. #8
    Join Date
    Dec 2010
    Posts
    17

    Hi Aaron

    I am sorry for taking your time up with this.

    Can I just make it clear that I have no trouble loading larger files for the normal functions of BC3. It is only with the approach discussed here that I am having trouble.

    Laptop, about 3 years old.
    Intel Core2
    T5500 @ 1.66GHz
    2GB RAM
    XP Pro SP3

    They do load in OK but BC3 seems to lock up. I have left it for up to 15 minutes with a 3.4MB (3 point 4, not 34) List A file together with a List B file of anything up to 2MB. I have to force it to close.

    As I say, duplicates are OK as I have another program to sort that out, but thank you anyway.

  9. #9
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,384

    Hello,

    What do you mean by the normal functions of BC3? Is it only the "Sorted" file format that is giving you trouble, while <Default> works fine for the same files? The Sorted format actually relies on the sort command-line tool built into Windows, so that could give us a hint as to what is causing the trouble.
    Aaron P Scooter Software

  10. #10
    Join Date
    Dec 2010
    Posts
    17

    Hi

    Yes, sorry, I should have explained that better. Normal for me, ha ha!

    I usually use BC for the text merge function. If I load the same files into that, they open pretty much instantly.

    If I load the same files in the text compare function, the first file (it doesn’t matter which) loads straight away; it is only when I load the second one that I get the hourglass for ages. It goes on so long that I have to shut it down.

    You had me worried that my hardware was at fault, so I have tried it on the other 2 computers here and I get exactly the same results. I have even used different text files, so it is reproducible with any text in either or both lists.

    Have you tried this yourself? All you need is two different text files of about 3MB. Use the “Text Compare” function, set it to “never align differences” with the “sorted file format”, and it should lock up or at least take a disproportionate amount of time.

    Something that might give you a clue is if I select “Unaligned” instead of “Alternate Method” the lists load instantly.

    Thanks.

    More info in case it is useful to you.

    I have

    XP Pro SP3 32bit.
    AVG 2011
    Comodo Firewall

    Nothing else is really running all the time. I have Office and suchlike, but nothing that I can think of that would cause trouble. I also don’t think there is anything wrong with the install, as I regularly restore my C drive from a fresh Ghost image. The other computers here are fine too, one of which has Win 7 on it.
