Delete one text list from another.

  • Indent
    Journeyman
    • Dec 2010
    • 17

    Hi

    You will have to forgive me as I am a bit of an idiot as I can’t seem to do the simplest things sometimes !

    I am trying to use your software to compare two text lists and subtract one from the other.

    I managed to work out that if I use the “Show Same” view and then delete the displayed lines, that does work... a bit.

    However what I am trying to do is simply remove the same words or lines that are on list A from list B.

    Your program seems to only check for lines that are in a similar position and not search throughout the entire list. I guess what I am trying to do is a bit like an automated “Find and Replace” where the entire text of list A is compared to the entire text of list B.

    Sample.

    List A

    line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8

    List B

    Other Text 1
    Other Text 2
    Other Text 3
    Other Text 4
    line 1
    Other Text 5
    line 2
    Other Text 6
    line 3
    Other Text 7
    Other Text 8
    line 4
    Other Text 9

    What I am trying to achieve would be an output of …

    List C

    Other Text 1
    Other Text 2
    Other Text 3
    Other Text 4
    Other Text 5
    Other Text 6
    Other Text 7
    Other Text 8
    Other Text 9

    So List A is subtracted from List B to give me List C.

    Can anyone help me please ?
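For reference, the subtraction described in this post can be sketched in a few lines of Python. This is only an illustration of the idea, not a Beyond Compare feature; the function name `subtract_lines` is made up for the example, and matching is assumed to be exact and case-sensitive.

```python
def subtract_lines(list_a, list_b):
    """Return the lines of list_b that do not appear anywhere in list_a."""
    # A set gives fast membership checks, so the position of a line
    # in List A does not matter.
    exclude = set(list_a)
    # Walk List B in its original order and keep only unmatched lines.
    return [line for line in list_b if line not in exclude]


# Sample data taken from the post above.
list_a = [f"line {n}" for n in range(1, 9)]
list_b = [
    "Other Text 1", "Other Text 2", "Other Text 3", "Other Text 4",
    "line 1", "Other Text 5", "line 2", "Other Text 6", "line 3",
    "Other Text 7", "Other Text 8", "line 4", "Other Text 9",
]
list_c = subtract_lines(list_a, list_b)
print("\n".join(list_c))
```

Because every line of List B is checked against the whole of List A, repeated lines in List B are removed as well, and the surviving lines keep their original order.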
  • Aaron
    Team Scooter
    • Oct 2007
    • 16000

    #2
    Given your example, you should get the result you expect by using the Show Same filter and deleting lines 1-4 on the right side.

    The issue I believe you are running into is when lines 1-4 are not in the same order on both sides, for example where List A is:

    line 2
    line 3
    line 4
    line 1
    line 5
    line 6
    line 7
    line 8


    To help with this scenario, your files can be sorted. If your data is purely line by line, and it is OK to sort the lines alphabetically regardless of their original order, we have a Sorted file format that can do that. Go to the Session menu -> Session Settings, Format tab, and switch from Detected to Sorted for both sides.
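To illustrate why sorting helps, here is a minimal Python sketch of comparing two sorted lists with a single pass. It is not a description of how Beyond Compare aligns files internally; `sorted_difference` is a hypothetical helper that only shows that, once both sides are sorted, matching lines end up next to each other and can be found without searching the whole list.

```python
def sorted_difference(sorted_a, sorted_b):
    """Lines of sorted_b that are absent from sorted_a; both must be sorted."""
    result = []
    i = 0
    for line in sorted_b:
        # Advance through A until its current line is not smaller than the B line.
        while i < len(sorted_a) and sorted_a[i] < line:
            i += 1
        # If A has run out, or its current line differs, the line is unique to B.
        if i == len(sorted_a) or sorted_a[i] != line:
            result.append(line)
    return result


list_a = sorted(["line 2", "line 3", "line 4", "line 1", "line 5"])
list_b = sorted(["Other Text 1", "line 1", "Other Text 2", "line 4", "line 4"])
print(sorted_difference(list_a, list_b))
```

The trade-off is that the output comes back in sorted order rather than in the original order of List B.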

    How does that work for you? If you are still having any trouble, would you be able to post or email us a pair of example files? Our email is [email protected] and if you email us please include a link back to this forum post.
    Aaron P Scooter Software

    • Indent
      Journeyman
      • Dec 2010
      • 17

      #3
      Hi Aaron

      Thank you for your reply and help.

      I tried what you said and I can see what you mean, however it didn’t really work. It did remove some lines, but in cases where List B contains more than one copy of the same line it doesn’t seem to work.

      I did mess about with this…for some hours and discovered that if I changed this setting ..

      Rules / Alignment / and set it to Alternate Method.

      It seemed to work better.

      I have made a sample list for you below with very few lines that seem to throw the program. They are deliberately messed about to show you the problem.

      I think this is obviously a function this program isn’t meant to do, I guess it is an unusual request. So rather than me wasting your time trying to make it work could this perhaps be a feature request for a later version ?

      Could you please make an option to allow 2 lists of different sizes to be compared and then a button to remove all the “same” matching lines / words from list B ? Also make it so they don’t have to be aligned. As mentioned before almost like an automatic “find and replace with blank”.

      Thanks

      Here’s a sample for you to play with.

      List A

      line 1
      line 2
      line 3
      line 4
      line 5
      line 6
      line 7
      line 8
      line 9
      line 10



      List B

      line 1 This line should be left after filtering. 1/4
      line 6
      line 4B This line should be left after filtering. 4/4
      line 1
      line 2
      line 3
      line 1
      line 7
      line 8
      line 6
      line 7
      line 8
      line 2
      line 5
      line 6
      line 2B This line should be left after filtering. 2/4
      line 1
      line 7
      line 8
      line 8
      line 2
      line 6
      line 7
      line 3B This line should be left after filtering. 3/4
      line 4
      line 5
      line 6
      line 1
      line 3

      • Aaron
        Team Scooter
        • Oct 2007
        • 16000

        #4
        Hello,

        If you enable Never Align mismatches, and use the Sorted file format, I believe this should align as expected with the initial example.

        With the new example, however, you have introduced the concept of duplicate lines that you wish to remove. Unfortunately, we do not have a method of automatically removing these at this time. As a workaround, you can use our Replace dialog to manually replace all instances of specific text (line 8) with a blank line to remove them.

        You can also create your own custom external conversion. If this could remove all duplicate entries from a file, and sort the remaining unique lines, you could then continue to use the above method:
        http://www.scootersoftware.com/suppo...rnalconversion
        Aaron P Scooter Software
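As a rough idea of what such a duplicate-removing converter could look like, here is a small Python sketch. How Beyond Compare passes file paths to an external conversion is not covered in this thread, so the two-argument command line (source path, then target path) and the script name `sort_unique.py` are assumptions made for the example.

```python
import sys


def write_sorted_unique(source_path, target_path):
    """Write the unique lines of source_path to target_path in sorted order."""
    with open(source_path, encoding="utf-8", errors="replace") as src:
        # A set drops duplicate lines. Line endings are stripped first so a
        # final line without a newline still matches its duplicates.
        unique = {line.rstrip("\r\n") for line in src}
    with open(target_path, "w", encoding="utf-8") as dst:
        for line in sorted(unique):
            dst.write(line + "\n")


# Assumed invocation: python sort_unique.py <source> <target>
if __name__ == "__main__" and len(sys.argv) == 3:
    write_sorted_unique(sys.argv[1], sys.argv[2])
```

The whole file is held in memory as a set, which should be fine for lists of a few hundred MB on a machine with enough RAM but is not a streaming solution.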

        • Aaron
          Team Scooter
          • Oct 2007
          • 16000

          #5
          Also, what kind of files are you comparing? We are always interested in hearing how our customers are using our application in real world scenarios.
          Aaron P Scooter Software

          • Indent
            Journeyman
            • Dec 2010
            • 17

            #6
            Hi Aaron

            Well,…. thank you very much you worked it out !!!

            Yes it made all the difference changing the “Never Align Mismatches” although in my version it is called “Never Align Differences”.

            I did change the last example to contain duplicates as I thought BC would do that automatically but it is no problem as there is a workaround and I can also remove duplicates with another program beforehand.

            The workaround, if you are interested, is to keep saving and reloading the lists, eventually all the duplicates get picked up in the method you described.

            Unfortunately even though the method you kindly provided does work it is so incredibly slow it is almost unusable even with very small text files of only 5MB or so. I have some text files that are 200MB !

            You ask what I am doing “In the real world”, ha ha! Well, I admit this probably isn’t a common thing, so don’t feel bad about it. I use your program as everyone else does, I suppose, to compare 2 documents that have been written or changed in 2 different places. I sometimes have a copy on my laptop and one on my computer, so things can get messed up. I use your program to compare the 2 versions and pretty much just use the merge function.

            This new idea or problem is I am making text lists of words for specific subjects. These are growing in size and I realise that many text files contain the same words. I thought I could reduce the size of all these files by removing the contents of the English dictionary from each one and that should leave me with only words specific to a given subject. So once you remove the dictionary you are left with slang terms and unique terms to that subject.

            I must admit I didn’t think comparing text files on a computer would be a problem but I simply cannot find any software anywhere that can do it. I know yours can but as I say it cannot handle text files of more than a few hundred KB, well not on my computer anyway.

            Basically what I am looking for is a “Find and Replace” found in all text editors but being able to load a text file in rather than a single word. Then simply replace with a “blank”.
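The "find and replace with a whole file" idea described above can be sketched in Python as follows. This is an illustrative script rather than anything Beyond Compare provides; the function name and the file names in the usage comment are hypothetical, and lines are assumed to match exactly (case-sensitive, one word or phrase per line).

```python
def remove_listed_lines(exclude_path, input_path, output_path):
    """Copy input_path to output_path, skipping lines found in exclude_path."""
    # Load the exclusion list (for example a dictionary word list) into a set.
    with open(exclude_path, encoding="utf-8", errors="replace") as f:
        exclude = {line.rstrip("\r\n") for line in f}
    kept = 0
    # Stream the input one line at a time so it never has to fit in memory.
    with open(input_path, encoding="utf-8", errors="replace") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            text = line.rstrip("\r\n")
            if text not in exclude:
                dst.write(text + "\n")
                kept += 1
    return kept


# Hypothetical usage:
# remove_listed_lines("dictionary.txt", "subject_words.txt", "subject_only.txt")
```

Only the exclusion list is held in memory, so the size of the file being filtered matters much less than the size of the dictionary.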

            Out of interest, do you know what the maximum text file size BC3 can handle is, please?

            Thank you for your help.

            • Aaron
              Team Scooter
              • Oct 2007
              • 16000

              #7
              Hello,

              Our max file size tests are available here:
              http://www.scootersoftware.com/suppo...z=kb_maxfilev3

              Currently, we have tested files upwards of 500 MB in the Text Compare. These tests are still limited by the hardware you are using. How old is your computer and what kind of processor/RAM are you using?

              Also, the "above method" in the KB article is specifically for RESX files. You would need to provide your own duplicate-removing command-line program, which can then be plugged in and used automatically.
              Aaron P Scooter Software

              • Indent
                Journeyman
                • Dec 2010
                • 17

                #8
                Hi Aaron

                I am sorry for taking your time up with this.

                Can I just make it clear here that I have no trouble loading larger files in for the normal functions of BC3. It’s just with the topic discussed here that I am having trouble.

                Laptop, about 3 years old.
                Intel Core2
                T5500 @ 1.66GHz
                2GB RAM
                XP Pro SP3

                They do load in OK but BC3 seems to lock up. I have left it for up to 15 minutes with a 3.4MB (3 point 4, not 34) List A file together with a List B file of anything up to 2MB. I have to force it to close.

                As I say duplicates are ok as I have another program to sort that, but thank you anyway.

                • Aaron
                  Team Scooter
                  • Oct 2007
                  • 16000

                  #9
                  Hello,

                  What do you mean by the Normal functions of BC3? Is it only the "Sorted" file format that is giving you trouble, and <Default> works fine for the same files? Sort is actually a built-in command line that is used by Windows, so that could give us a hint as to what is causing the trouble.
                  Aaron P Scooter Software

                  • Indent
                    Journeyman
                    • Dec 2010
                    • 17

                    #10
                    Hi

                    Yes, sorry I should have explained that better, normal for me ! Ha ha !

                    I usually use BC for the text merge function. If I load the same files into that they open up pretty much instantly.

                    If I load the same files in the text compare function, the first file (it doesn’t matter which one) loads straight away. It is only when I load the second one that I get the hourglass for ages. It goes on so long that I have to shut it down.

                    You had me worried that my hardware was at fault so I have tried it on the other 2 computers here and I get exactly the same results. I have even used different text files so it is reproducible with any text in either or both lists.

                    Have you tried this yourself? All you need is two different text files of about 3 MB. Use the “Text Compare” function, set it to “never align differences” and the “sorted file format”, and it should lock up or at least take a disproportionate amount of time.

                    Something that might give you a clue is if I select “Unaligned” instead of “Alternate Method” the lists load instantly.

                    Thanks.

                    More info in case it is useful to you.

                    I have

                    XP Pro SP3 32bit.
                    AVG 2011
                    Comodo Firewall

                    Nothing else really running all the time. I have office and such like but nothing that I can think of as causing trouble. I also don’t think there is anything wrong as I regularly load fresh installs from a ghost image for my C drive. The other computers here are fine also one of which has Win 7 on.

                    • Aaron
                      Team Scooter
                      • Oct 2007
                      • 16000

                      #11
                      Hello,

                      I just tested with a couple of pairs of 32 MB test files, and the sorted rule works OK for them. It may be the alignment that is having trouble with your specific sorted data. Is it possible to email us a pair of sample files you are having trouble with and your Support package (Help menu -> Support; Export)?

                      Please also include a link back to this forum post in the email for reference.

                      Update: to clarify, sorting only took a few seconds. Initially loading also took several seconds (under 10). My machine is currently running a Duo core processor and has 3 GB of RAM on WinXP 32-bit.
                      Last edited by Aaron; 10-Dec-2010, 12:25 PM. Reason: Update
                      Aaron P Scooter Software

                      • Indent
                        Journeyman
                        • Dec 2010
                        • 17

                        #12
                        Hi Aaron

                        I’ve found out what’s causing it to lock up !

                        It works ok for me here when I use two lists of equal size, or very nearly equal size. The way to get it to crash is to have a large file as List A and a small file as List B.

                        Another thing I noticed when I discovered this: when I compared two lists of nearly equal size, with just my 8 test lines added to List B, it locked up again as soon as I tried to delete the similar lines. I guess this is because BC tried to update the contents of both lists, which is the same situation as loading two different-sized lists in the first place.

                        I am willing to send you what you asked for above but I am not at home now, also we are on different time zones and I wanted you to know what was wrong before you went home for the weekend.

                        So… here’s a way for you to reproduce the problem now and if this isn’t enough for you I will send what you asked for above when I get home.

                        List A
                        Make a list of any words etc or get a copy of a dictionary and use that. Make sure it is a couple of MB at least.

                        List B
                        Make a new text file and copy and paste these lines in it.

                        This line is not in the dictionary.1/8
                        This line is not in the dictionary.2/8
                        This line is not in the dictionary.3/8
                        This line is not in the dictionary.4/8
                        This line is not in the dictionary.5/8
                        This line is not in the dictionary.6/8
                        This line is not in the dictionary.7/8
                        This line is not in the dictionary.8/8

                        Set BC up as you described to me earlier.

                        Load List A in the left-hand window.

                        Let it load in fully.

                        Then load List B in the right-hand window.

                        BC should now lock up completely.

                        Another test is to have two identical lists and then add only a few test lines to list B. Load them up, view the differences and then try to delete them. BC should lock up again.
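For anyone who wants to recreate the test files described above without a real dictionary, a short Python sketch along these lines could generate them. The made-up `word0000000` entries stand in for dictionary words, and the function name, file names and default size are assumptions for the example only.

```python
def make_repro_files(list_a_path, list_b_path, target_bytes=2 * 1024 * 1024):
    """Create a large List A of made-up words and a small 8-line List B."""
    count = 0
    written = 0
    with open(list_a_path, "w", encoding="utf-8") as list_a:
        # Keep adding placeholder words until List A reaches the target size.
        while written < target_bytes:
            line = f"word{count:07d}\n"
            list_a.write(line)
            written += len(line)
            count += 1
    with open(list_b_path, "w", encoding="utf-8") as list_b:
        # The 8 test lines from the post, none of which occur in List A.
        for i in range(1, 9):
            list_b.write(f"This line is not in the dictionary.{i}/8\n")
    return count


# Hypothetical usage:
# make_repro_files("list_a.txt", "list_b.txt")
```

The generated List A is already in sorted order, which may or may not matter for reproducing the lock-up; a real dictionary file is the closer match to the original report.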

                        Thanks for your help with this and I hope this helps you work out what’s going wrong.

                        PS.

                        I haven’t experimented with a smaller list A than list B or any other combination yet, but I expect you will probably know what’s wrong by now anyway.

                        • Indent
                          Journeyman
                          • Dec 2010
                          • 17

                          #13
                          Hi

                          I have e-mailed support with the files you asked for.

                          • Indent
                            Journeyman
                            • Dec 2010
                            • 17

                            #14
                            Weekend Update

                            I have discovered this program called WinMerge.

                            http://winmerge.org/

                            Using this I am able to load and sort my lists so I am almost certain my computer is ok.

                            Unfortunately WinMerge doesn’t have the capability to select “All, Differences or Same” in the way BC3 can. So I am stuck with WinMerge being able to sort the lists but not allowing me to select the type I want and BC3 not being able to sort the lists but having the option to select what I want !!

                            Ha ha ! ....

                            • Aaron
                              Team Scooter
                              • Oct 2007
                              • 16000

                              #15
                              Hello,

                              We got your email. When you say "hang", is the program still responsive (can you switch between multiple tabs, and does the progress bar still animate)?

                              If so, I think I've reproduced the behavior you are seeing. The Alternate alignment method requires the whole file to be loaded and computed before it can display results. The Std method can load incrementally and begins showing information right away (and is a bit faster/more responsive). Does switching the alignment method help?
                              Aaron P Scooter Software
