Delete one text list from another.

  • Indent
    started a topic Delete one text list from another.

    Delete one text list from another.

    Hi

    You will have to forgive me as I am a bit of an idiot as I can’t seem to do the simplest things sometimes !

    I am trying to use your software to compare two text lists and subtract one from the other.

    I managed to work out that if I use the “Show Same” view and then delete the displayed lines, that does work... a bit.

    However what I am trying to do is simply remove the same words or lines that are on list A from list B.

    Your program seems to only check for lines that are in a similar position and not search throughout the entire list. I guess what I am trying to do is a bit like an automated “Find and Replace” where the entire text of list A is compared to the entire text of list B.

    Sample.

    List A

    line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8

    List B

    Other Text 1
    Other Text 2
    Other Text 3
    Other Text 4
    line 1
    Other Text 5
    line 2
    Other Text 6
    line 3
    Other Text 7
    Other Text 8
    line 4
    Other Text 9

    What I am trying to achieve would be an output of …

    List C

    Other Text 1
    Other Text 2
    Other Text 3
    Other Text 4
    Other Text 5
    Other Text 6
    Other Text 7
    Other Text 8
    Other Text 9

    So List A is subtracted from List B to give me List C.

    Can anyone help me please ?
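
For readers who only need the end result, the subtraction described in this post can be sketched in a few lines of Python. This is a minimal sketch, not a description of how Beyond Compare works internally: the function name `subtract_lines` is hypothetical, and it assumes exact, whole-line matches regardless of position.

```python
from typing import Iterable, List


def subtract_lines(list_a: Iterable[str], list_b: Iterable[str]) -> List[str]:
    """Return the lines of list_b that do not appear anywhere in list_a.

    Matching is exact and position-independent; the order of list_b is kept.
    """
    # A set gives average constant-time membership checks.
    to_remove = {line.rstrip("\r\n") for line in list_a}
    return [
        line.rstrip("\r\n")
        for line in list_b
        if line.rstrip("\r\n") not in to_remove
    ]


# The sample lists from the post.
list_a = [f"line {i}" for i in range(1, 9)]
list_b = [
    "Other Text 1", "Other Text 2", "Other Text 3", "Other Text 4",
    "line 1", "Other Text 5", "line 2", "Other Text 6", "line 3",
    "Other Text 7", "Other Text 8", "line 4", "Other Text 9",
]
list_c = subtract_lines(list_a, list_b)
print("\n".join(list_c))
```

Because List A is held in a set, List B does not need to be sorted or aligned with List A for this to work.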

  • Indent
    replied
    Hi Aaron

    Thank you for your time. Yes this does work better for me now !

    However it does seem a long-winded way for a user to have to set up BC simply to compare two lists. Could this perhaps be a feature request to make it simpler for a user to do this task in future versions ? Perhaps also some better optimisation for this sort of thing ?

    Thanks again and have a happy new year !


  • Aaron
    replied
    Our Data Compare session type also handles your files well once you set the columns to be fixed and leave the field blank, but it runs a sorting algorithm that can take some time, which makes it slower than it needs to be since your files are already sorted. We'll look into that to see if it can be improved.
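
As an illustration of why already-sorted input can skip the sorting step, here is a minimal Python sketch of a single-pass difference over two sorted lists. It is not how Data Compare is implemented; the function name `sorted_difference` is hypothetical, and it assumes both inputs are sorted ascending under the same ordering (here, Python's default string comparison).

```python
from typing import Iterable, Iterator


def sorted_difference(sorted_a: Iterable[str], sorted_b: Iterable[str]) -> Iterator[str]:
    """Yield the items of sorted_b that are absent from sorted_a.

    Both inputs must already be sorted ascending under the same ordering.
    Each input is walked once, so no sorting or hashing is needed.
    """
    iter_a = iter(sorted_a)
    current_a = next(iter_a, None)
    for item in sorted_b:
        # Advance A until it reaches or passes the current B item.
        while current_a is not None and current_a < item:
            current_a = next(iter_a, None)
        # Keep the B item only if A has no equal entry.
        if current_a is None or current_a != item:
            yield item


result = list(sorted_difference(
    ["apple", "banana", "cherry"],
    ["apple", "apricot", "cherry", "zebra"],
))
print(result)
```

The trade-off is that this approach breaks silently if either list is not actually sorted, whereas a set-based lookup does not care about order.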


  • Aaron
    replied
    Thanks for the sample files. I played around with them and showed them to a developer. A good configuration we came up with is:

    In the Tools menu -> File Formats, make a format for these files, and in the Grammar tab, add a new Line Weight of
    .*
    marked as a Regular Expression, with a Priority of 5.

    Then in your Session Settings, Importance tab, set all three whitespace options to be important, and in the Alignment tab set Skew Tolerance to something very large (30000) and check Never align differences.

    I was then able to load your files for comparison in about 15 seconds. The Line Weight is not necessary, but it is a neat trick our developer recommended that will speed up the comparison. Since your files are dictionaries, once it finds a match it will accept it immediately without looking any further.


  • Indent
    replied
    Hi Aaron

    I have sent you another attachment via e-mail including my settings. I have sent an English dictionary and a list of junk words / text from the internet.

    I have tried (using the settings you kindly told me to use) to remove all the English words in List A from List B.

    I have used many different combinations of settings with no luck. I have used other software to accomplish this task for myself personally but I thought you might like to work out why BC3 doesn’t work for me in this situation.

    Please could you try the two lists using any method or configuration and see if you can remove the words in List A from List B ? If you manage to get BC3 to do it I would be grateful if you would post how you did it on here, so I can try to reproduce your method.

    Thanks for your time and Happy Christmas !
    Last edited by Indent; 24-Dec-2010, 10:13 AM. Reason: An appalling spelling mistake.


  • Aaron
    replied
    Sorry for the delay.

    When using the Standard method, were your files entirely loaded? One of the big differences is that the Std method can load the longer file partially, but if the comparison is not finished then clicking on various display filters may not show what you expect.

    Clicking various buttons that interact with the still-loading Alternate comparison seems to cause issues for you. I have not been able to reproduce that behavior yet. If you open a new tab and work in there for a bit, does it also "freeze up" if you leave the original comparison running?


  • Indent
    replied
    Hi

    I just wondered if there were any progress reports on this ?


  • snidely.too
    replied
    Originally posted by Indent
    As far as I know WinMerge is open source so could you perhaps see how they do it ?
    Careful with this advice ... it could create a situation where some or all of Scooter Software's code has to be released as open source. Details of this vary with the type of open source license, and how much the proprietary software is "contaminated" by the open source.

    GPL3 is, AIUI, very aggressive about requiring open release of anything that uses GPL'd code, while some of the other licenses (BSD and SSH are examples, I believe) are more flexible. At the most lenient end is traditional public domain code.

    One way around this (sometimes) is to have one person do the looking, and then provide a summary (not a copy) to another person who interprets that to write the proprietary code.

    Open source property rights and their interaction with proprietary rights can be a challenging subject.

    /dps


  • Indent
    replied
    Hi Aaron

    It varies from one “hang” to the next.

    The most common is that the progress bar continuously scrolls left to right but never fills any green blocks permanently. Sometimes I can press other buttons and they work, but more often than not I get the “This program is not responding” message and I have to force it to close. Usually the two screens where the text should be are blank and remain so.

    The two files I sent will lock it up in the less obvious way. If you want to really lock it up add more text to List B. It will be even better if it isn’t alphabetically sorted as you just don’t get anywhere with it then.

    I have even left the program running for a few hours to see if it was just a matter of waiting but nothing changes.

    I have tried the standard method with the two files I sent you. List A loads quickly and so does List B or so it seems. However I cannot find any text in List B even when selecting to “Show Differences”.

    I also tried altering the slider bar with no difference.

    Thanks.

    Have you tried WinMerge ? If not, would you like to try it on the lists I sent you ? It should load and mark everything nearly instantly.

    As far as I know WinMerge is open source so could you perhaps see how they do it ? BC3 will have the advantage over WinMerge as the user will be able to select the similarities or the differences which WinMerge cannot do.
    Last edited by Indent; 13-Dec-2010, 05:56 PM. Reason: Added WinMerge Suggestion


  • Aaron
    replied
    Hello,

    We got your email. When you say "hang" is the program still responsive (able to switch between multiple tabs, the progress bar still animates)?

    If so, I think I've reproduced the behavior you are seeing. The Alternate alignment method requires the whole file to be loaded and computed before it can display results. The Std method can load incrementally and begins showing information right away (and is a bit faster/more responsive). Does switching the alignment method help?


  • Indent
    replied
    Weekend Update

    I have discovered this program called WinMerge.

    http://winmerge.org/

    Using this I am able to load and sort my lists so I am almost certain my computer is ok.

    Unfortunately WinMerge doesn’t have the capability to select “All, Differences or Same” in the way BC3 can. So I am stuck with WinMerge being able to sort the lists but not allowing me to select the type I want and BC3 not being able to sort the lists but having the option to select what I want !!

    Ha ha ! ....


  • Indent
    replied
    Hi

    I have e-mailed support with the files you asked for.


  • Indent
    replied
    Hi Aaron

    I’ve found out what’s causing it to lock up !

    It works ok for me here when I use two lists of equal size, or very nearly equal size. The way to get it to crash is to have a large file as List A and a small file as List B.

    Another thing I noticed: when I compared two lists of nearly equal size, with just my 8 test lines added to List B, it locked up again as soon as I tried to delete the similar lines. I guess this is because BC tried to update the contents of both lists, which is the same situation as loading two different sized lists in the first place.

    I am willing to send you what you asked for above but I am not at home now. We are also in different time zones, and I wanted you to know what was wrong before you went home for the weekend.

    So… here’s a way for you to reproduce the problem now and if this isn’t enough for you I will send what you asked for above when I get home.

    List A
    Make a list of any words etc or get a copy of a dictionary and use that. Make sure it is a couple of MB at least.

    List B
    Make a new text file and copy and paste these lines in it.

    This line is not in the dictionary.1/8
    This line is not in the dictionary.2/8
    This line is not in the dictionary.3/8
    This line is not in the dictionary.4/8
    This line is not in the dictionary.5/8
    This line is not in the dictionary.6/8
    This line is not in the dictionary.7/8
    This line is not in the dictionary.8/8

    Set BC up as you described to me earlier.

    Load List A in the left-hand window.

    Let it load in fully.

    Then load List B in the right-hand window.

    BC should now lock up completely.

    Another test is to have two identical lists and then add only a few test lines to list B. Load them up, view the differences and then try to delete them. BC should lock up again.
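
For anyone who wants to try the reproduction steps above without hunting for a dictionary file, a pair of test files can be generated with a short Python script. This is a minimal sketch under stated assumptions: the function name `write_repro_files` and the file names `list_a.txt` and `list_b.txt` are hypothetical, and List A is filled with synthetic sorted "words" rather than a real dictionary.

```python
from pathlib import Path
from typing import Tuple


def write_repro_files(directory: Path, line_count: int = 200_000) -> Tuple[Path, Path]:
    """Write a large List A and a small List B and return their paths."""
    directory.mkdir(parents=True, exist_ok=True)
    list_a = directory / "list_a.txt"  # hypothetical file name
    list_b = directory / "list_b.txt"  # hypothetical file name

    # List A: a large, sorted stand-in for a dictionary (synthetic words).
    # Each line is 12 bytes, so the default 200,000 lines is about 2.4 MB.
    with list_a.open("w", encoding="utf-8", newline="\n") as handle:
        for index in range(line_count):
            handle.write(f"word{index:07d}\n")

    # List B: the eight test lines from the post.
    with list_b.open("w", encoding="utf-8", newline="\n") as handle:
        for index in range(1, 9):
            handle.write(f"This line is not in the dictionary.{index}/8\n")

    return list_a, list_b


if __name__ == "__main__":
    path_a, path_b = write_repro_files(Path("bc_repro"))
    print(path_a, path_a.stat().st_size, "bytes")
    print(path_b, path_b.stat().st_size, "bytes")
```

Load `list_a.txt` on the left and `list_b.txt` on the right, as in the steps above. Synthetic data may not trigger the lock-up exactly as a real dictionary does.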

    Thanks for your help with this and I hope this helps you work out what’s going wrong.

    PS.

    I haven’t experimented with a smaller list A than list B or any other combination yet, but I expect you will probably know what’s wrong by now anyway.


  • Aaron
    replied
    Hello,

    I just tested with a couple of pairs of 32 MB test files, and the sorted rule works OK for them. It may be the alignment that is having trouble with your particular sorted data. Is it possible to email us a pair of sample files you are having trouble with and your Support package (Help menu -> Support; Export)?

    Please also include a link back to this forum post in the email for reference.

    Update: to clarify, sorting only took a few seconds. Initially loading also took several seconds (under 10). My machine is currently running a Duo core processor and has 3 gigs of ram on WinXP 32bit.
    Last edited by Aaron; 10-Dec-2010, 12:25 PM. Reason: Update


  • Indent
    replied
    Hi

    Yes, sorry I should have explained that better, normal for me ! Ha ha !

    I usually use BC for the text merge function. If I load the same files into that they open up pretty much instantly.

    If I load the same files in the text compare function, the first file (it doesn’t matter which one) loads straight away. It is only when I load the second one that I get the hourglass for ages. It goes on so long that I have to shut it down.

    You had me worried that my hardware was at fault so I have tried it on the other 2 computers here and I get exactly the same results. I have even used different text files so it is reproducible with any text in either or both lists.

    Have you tried this yourself ? All you need is two different text files of about 3 MB. Use the “Text Compare” function, set it to “never align differences” with the “sorted file format”, and it should lock up or at least take a disproportionate amount of time.

    Something that might give you a clue is if I select “Unaligned” instead of “Alternate Method” the lists load instantly.

    Thanks.

    More info in case it is useful to you.

    I have

    XP Pro SP3 32bit.
    AVG 2011
    Comodo Firewall

    Nothing else is really running all the time. I have Office and suchlike but nothing that I can think of as causing trouble. I also don’t think there is anything wrong, as I regularly load fresh installs from a ghost image for my C drive. The other computers here are fine too, one of which has Win 7 on it.
