No Comparing of Split Lines ?

  • purchase
    Enthusiast
    • Oct 2011
    • 35

    First, I purchased BC3 some months ago for text compare tasks, and I'm very satisfied with it. I also discovered BC's ability to compare folders, i.e. thousands of files, at a deeper level than just the attributes (name, size, last changed, etc.) by which any file commander can "compare" files. This is very handy after backing up a whole hard drive to another one, in order to be sure everything has been replicated without fault.

    ( Please allow for my touting of "Replicator", a fine freeware by the late Karen Kenworthy, perfect for any regular sync task, to be found on www.karenware.com )

    Then again, the help file is sometimes not comprehensive enough for non-programmers. One problem I encounter is BC's inability (or my inability to discover how it works) to treat as identical those lines that have been split in one of the two files being compared, i.e. NOT raising a false alarm for them. Let me give an example:

    I have downloaded some laws, from a governmental website. The articles there are plain-text, in the form

    Article 314 blabla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
    bla bla.

    In my downloads, I do some formatting, in the form

    Article 314
    bla bla bla bla........
    bla bla..... etc.

    with "Article 314" bold, and any parts of the text in bold, or in bold-underlined that are of special interest to me.

    Every 3 months or so, I download the original (government-consolidated) text anew, in order to see if there have been any amendments to it.

    Now my problem becomes evident : Since I cannot find the way BC could treat the lines

    Article 314 bla bla bla

    and

    Article 314
    bla bla bla

    as identical, i.e. cannot find how to treat line feeds or "new paragraph formfeed" or whatever as NOT making any difference, I have only TWO alternatives for doing the comparison :

    a) I could process my formatted text in a text editor before each comparison : replace DOUBLE new line characters with a special character, then replace all (single) line feed characters with a space, then replace the newly introduced special character with two line feeds again ; then I'd run the comparison against the new download. This could be automated, but unfortunately it is not feasible for me, since within the articles (i.e. within the downloads, and within my stored text) new sub-articles begin after a single line feed (i.e. without a blank line in between, as for a new article), and thus such a macro would cause chaos within those sub-articles.

    b) Thus, in practice, I've got TWO versions of my downloads : One with the article number within the first line of the text, as in the original text, and one, the formatted one, i.e. my "working copy", with the article number as sole first line before the article's text body.
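The three replacement steps of option a) can be sketched in Python. This is only an illustration of the idea: the function name `join_wrapped_lines` and the choice of placeholder character are assumptions, not anything BC3 or the poster's editor provides.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Turn single line breaks into spaces, keeping blank lines as separators."""
    placeholder = "\x00"  # assumed never to occur in the law texts
    text = text.replace("\r\n", "\n")            # normalise Windows line endings
    text = re.sub(r"\n{2,}", placeholder, text)  # step 1: protect blank lines
    text = text.replace("\n", " ")               # step 2: join wrapped lines
    return text.replace(placeholder, "\n\n")     # step 3: restore blank lines
```

As the post points out, this also merges sub-articles that start after a single line feed, so it only suits texts where every unit is separated by a blank line.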

    Thus, I compare the new download not with my working copy but with the intermediate version, no problem. Problems arise whenever there is an amendment (which occurs rather often, given the frenzy of lawmaking instead of problem solving) : Then I have to sync the new official version (rather manually, because of the formatting) not only with my working file but also with the intermediate file (since if I don't, it cannot be used again for the next comparison in three months or so).

    Hence, I could have posted within the 3-file compare section of this forum, but good heavens ! And of course, I could use the intermediate file for comparison only, reserving any synching to the original and the target files, then deleting the intermediate file and using the freshly downloaded version as the intermediate file in three months.

    But needless to say, whichever way I try, I always run into trouble, doing a lot of manual scribbling on paper and invariably ending up asking myself : did I process this amendment, and if yes, in both files, etc., etc. I spend three or four times as much time on this comparison task as I should, without being sure I've got a faultless new working version in the end (since it cannot be compared directly with the web version, because of those line breaks).

    I'm even considering doing away with my line breaks, in order to be able to do without any intermediate version outstandingly complicating matters, or to use another (which one?) comparison software, not instead of BC, but on top of it, for this precise sort of task.

    Help, please !
  • Aaron
    Team Scooter
    • Oct 2007
    • 16002

    #2
    Hello,

    BC3 always considers line breaks as important in the current version. For files with different line breaks or whitespace, we often use external conversions to re-format the text, similar to the method you describe in section a. A few of these are available for download on our website as "Tidied" variants for existing rules in the Alternative File Formats download page:
    http://www.scootersoftware.com/downl...kb_moreformats

    We also have directions for defining a custom format in a KB article, here:
    http://www.scootersoftware.com/suppo...rnalconversion

    Comparing across line breaks is on our Customer Wishlist, but is not currently scheduled for development. I'm not immediately aware of another diff tool that ignores line breaks, but if you find one you could launch it from a BC3 Folder Compare session by defining an "Open With" command line in the Tools menu -> Options dialog. This way, you can use BC3 to find the pairs that differ, select a pair, and then trigger either a BC3 Text Compare running an external conversion, or any other command line with an Open With.
    Aaron P Scooter Software

    • purchase
      Enthusiast
      • Oct 2011
      • 35

      #3
      Hello Aaron,

      Thank you very much for your quick answer. Of course, I had hoped that BC was able to compare split lines (i.e. just in pairs, not bunches of succeeding ones, which would be more difficult to program, but just considering single line feeds as unimportant), and that I just hadn't figured it out on my own (I had experimented with various "^x", etc. codes, to no avail of course).

      Since it does not, I suppose it should be put on the agenda for the immediate future. Why ? Because I suppose a lot of people have data in a FORM like the following (this is just one example ; others would be product data, spare parts data, and lots more) :

      Name
      Department
      Address Line 1
      Address Line 2
      Post Code
      Town
      Tel with Prefix
      and so on

      And this in various line varieties, i.e. tel with prefix in one or two lines, name / christian name in one or two lines, post code and town in one or two lines, and so on ; the same with product data.

      If they want to compare with BC, they must then not only preserve the ORDER (as with every other such program) of their data, but also the original line settings of the original data, even if that is rather unhandy for them - not being able to do so harms the possibilities of use for BC in corporate environments.

      But even for program listings / code, I see the utility of such an ability : One programmer might put comments after a ";" within the line, whilst another writes them on a dedicated line, hence the utility of a function

      "if there is a difference, by missing text in one of the reference files,
      then look into the next line of the file where text is missing from the line,
      and if the missing text is there indeed,
      then mark the previous line as identical
      (if there is no other difference, of course, and if the user has checked that "look out for split lines" option, of course)
      and "subtract" that "overpouring" part of this line from its "core content", so that this line is not marked as "different" if that's the only thing differentiating it from the corresponding line in the other reference file (where the line in question is NOT split)".
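The rule quoted above can be sketched in Python. This is not how BC3 works, and the function name `compare_allowing_splits` is hypothetical. The sketch walks both files in parallel and accepts one line on either side as equal to two consecutive lines on the other side when joining them with a single space reproduces it. It does no real diff alignment, so inserted or deleted lines will throw it off.

```python
def compare_allowing_splits(left, right):
    """Pair up lines, returning (left_lines, right_lines, equal) tuples.

    A line counts as equal to two consecutive lines on the other side
    when those two, joined with one space, reproduce it exactly.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i].strip(), right[j].strip()
        if a == b:
            result.append(([left[i]], [right[j]], True))
            i += 1
            j += 1
        elif j + 1 < len(right) and a == f"{b} {right[j + 1].strip()}":
            # The right side split the line in two.
            result.append(([left[i]], right[j:j + 2], True))
            i += 1
            j += 2
        elif i + 1 < len(left) and b == f"{a} {left[i + 1].strip()}":
            # The left side split the line in two.
            result.append((left[i:i + 2], [right[j]], True))
            i += 2
            j += 1
        else:
            result.append(([left[i]], [right[j]], False))
            i += 1
            j += 1
    # Whatever remains on either side has no counterpart.
    result.extend(([line], [], False) for line in left[i:])
    result.extend(([], [line], False) for line in right[j:])
    return result
```

With the "Article 314" example from the first post, the one-line and two-line forms come out as equal, while a genuinely changed line is still reported as different.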

      Such a function should be programmable, I hope, and should certainly be of great interest to many marketing and other corporate departments if advertised well. (You do a lot of Google advertising, which I take care NOT to click on, of course.) So functions giving BC a wider scope should be of interest to implement ; thus please consider this.

      I followed your links but didn't find any hint there as to how to make BC do what I need. Please bear in mind : The target file could be processed in some way, but the problem is that I need to implement the changes in the UN-processed version of the target file. Thus, any processing of the target file would create that intermediate file I must avoid at all cost, since I mix everything up with three files instead of just two. Remember also that the comparison is done within the BC window (which shows the unformatted version of the target file), whereas the updating is done in the original, formatted target file, NOT within the BC window, so I'm shuffling bits of text around a lot anyway.

      Or am I dumb here ? I have always proceeded like this with formatted files, back and forth, back and forth... Or could I import a formatted text (.rtf) into a BC pane, do all the editing there, and then simply do a control-a, control-c, and then, within the target program, a control-v ? Up to now, it never occurred to me that this could be possible, preserving all (?) the formatting ?

      If yes, what formatting would be preserved, then ? Normal text formatting, I suppose (= no paragraph styles, of course?) - and indentation ? Even more ? I digress from the original problem here, but it'd be of high interest if BC preserved a maximum of formatting in rtf files, thus allowing them to be edited within BC.

      (In my case, I must search for an alternative text comparing program for those law jobs, then.)

      • purchase
        Enthusiast
        • Oct 2011
        • 35

        #4
        For clarification and since other users might consider these problems :

        We're not speaking of "Moved Blocks" here, a problem that was treated elsewhere ; I'm just referring to "overpouring lines" vs. "split lines", which programming-wise should be a much lesser problem.

        There is a comparison chart in Wikipedia that states that BC (as we know), diff, TKdiff, Pretty Diff and others do NOT have a "moved lines" function, and Araxis Merge (about 260 dollars) is among the have-nots, so you don't always get what you pay for (and what about DiffDoc, 400 dollars ?).

        According to that comparison chart, Code Compare, Meld, WinDiff, WinMerge, UCC and ExamDiff Pro DO have a "moved lines function" (problem with ExamDiff Pro being that they have got an "annual licence" of 35 dollars that could be renewable for 20 dollars or for 35 dollars, the site being ambiguous to that ; problem with some others being they don't compare MS Word or otherwise formatted / marked-up files) - but again, "moved lines" / "moved blocks" (of lines) is much more (and something else!) than "split lines" / "word wrap", so there's a lot of trialling to do if you really need it...

        But who doesn't, in the end ? Hence the importance of both for BC's roadmap.
        Last edited by purchase; 12-Mar-2012, 04:40 PM.

        • purchase
          Enthusiast
          • Oct 2011
          • 35

          #5
          EDIT of the above :

          ExamDiff Pro costs 35 dollars for continuous use ; they clarify it on the "Purchase" page of their site, whereas they seem to suggest otherwise on their "Prices" page, so the price is ok. Yes, they do have "moved blocks" functionality, but many a thing on their wish list is included in BC (and probably from 2.0 or even 1.0 on). Their "word wrap" is for display only, and number 73 of their 77-item wish list is :

          73. Option to not treat CR charaacter [sic] as a linebreak

          Code Compare is highly interesting for programmers since it offers something like AI for programming languages, trying to "see" what kinds of code snippets might be equivalent. But all this - like the rules in ExamDiff Pro and elsewhere - is highly prone to treating as equivalent strings that should indeed be marked as different.

          Whereas the rules in BC are of highest precision (if not really handy to implement beforehand) and can be freely combined, one by one, up to the result you need - I thought this was standard : it isn't.

          It's just that some details are left out in BC up to now, and I long for them to be implemented.

          P.S. Item one on the above-mentioned competitor's wishlist is 3-pane comparison...

          Thus, I'm not referring to contenders in order to denigrate BC : It does a lot many contenders do not.

          (But my very special problem isn't resolved...)

          • purchase
            Enthusiast
            • Oct 2011
            • 35

            #6
            Since it doesn't seem I'll get any more help with this here, please allow me this conclusion :

            Instead of trying to process the target file (or some intermediate file on the "target side"), I decided to process the download file :

            Instead of loading the download file into BC directly, I process it with a programmable editor in which I search the original text for blank lines (= after the header, that is), i.e. the separator lines between articles. Then, for every such blank line, the script goes down a line, moves the cursor right two words (= "Article 1234"), then replaces the space there with a line feed. I then feed the processed "input" file into BC, in order to compare it with the target file.
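The editor macro described above could be written as a small Python function along these lines. The name `split_headings` is hypothetical, and the sketch assumes, as the post does, that every article starts on the line after a blank line and that its heading is exactly two words.

```python
def split_headings(text: str) -> str:
    """Put the first two words of each line that follows a blank line
    (e.g. "Article 1234") on a line of their own."""
    out = []
    previous_blank = False  # the header before the first blank line is left alone
    for line in text.splitlines():
        parts = line.split(None, 2)  # at most: word, word, rest of line
        if previous_blank and len(parts) == 3:
            out.append(f"{parts[0]} {parts[1]}")  # the heading
            out.append(parts[2])                  # the start of the body text
        else:
            out.append(line)
        previous_blank = line.strip() == ""
    return "\n".join(out) + "\n"
```

Lines that follow a blank line but hold only one or two words are left untouched, since there is nothing to split off.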

            This is a viable solution to my problem, but considering I have to check for about 100 of such input files every 3 months or so, I'm not happy with it ; the above-mentioned script amending BC's capabilities (= option to check for split lines within BC's rules, then process previous / current / next line within a sub-routine there) would be much more elegant... and would allow for marketing BC for many uses in corporate environments that are currently out of the question.

            Remember here : Text comparers ain't exclusively used by programmers for their versioning needs. Hence the high interest of creating (= developing an existing one into such) a text comparer being useful for many other tasks (and market it accordingly), when most contenders do NOT serve these (broader) markets. I very much hope my voice is heard here ; BC is a very good program ; the steps to make it an excellent one ain't that big in my understanding.

            EDIT : The above-mentioned 73rd position in a 77-position contender's wish list clearly indicates that such a function seems irrelevant to CURRENT users of such programs (whilst "moved blocks" is, not astonishingly, in the very first position) - but then, those many possible users from not-programming departments do not contribute to wishlists but simply dismiss these programs and don't buy them since they don't meet their needs ; this is to say, 73rd position within 77 does NOT inform us upon the OVERALL importance of such a feature.
            Last edited by purchase; 14-Mar-2012, 04:55 AM.

            • Aaron
              Team Scooter
              • Oct 2007
              • 16002

              #7
              Hello,

              Such a solution, made into an automated script that takes an input from the command line and produces a text output, is an example of an External Conversion. From your description, it sounds like you may be using a Macro instead, but if you can do it from the command line, you can plug it into BC3.
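As a rough illustration of what such a command-line script might look like, here is a minimal Python sketch. It assumes the calling tool passes an input path and an output path as the two arguments; how BC3 actually hands over the file names is described in the KB article linked earlier and is not reproduced here. The names `split_headings` and `convert_file` and the UTF-8 encoding are assumptions.

```python
import re
import sys


def split_headings(text: str) -> str:
    """After each blank line, move the two-word heading onto its own line."""
    # Two words directly after a blank line, followed by more text on that line.
    return re.sub(r"(?<=\n\n)(\S+[ \t]+\S+)[ \t]+(?=\S)", r"\1\n", text)


def convert_file(source_path: str, target_path: str) -> None:
    """Read the source file, convert it, and write the result to the target."""
    with open(source_path, encoding="utf-8") as src:  # encoding is an assumption
        converted = split_headings(src.read())
    with open(target_path, "w", encoding="utf-8") as dst:
        dst.write(converted)


# Run only when exactly two paths are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```

It would be invoked as something like `python split_headings.py download.txt converted.txt`, where the script and file names are placeholders.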

              We are aware of the demand for comparison over line breaks, but it simply isn't something we currently support in BC3. It is on our wishlist and something we would like to tackle, but it is a large project and will not be coming soon. Hence the ideas for workarounds and a Tidied solution using an external conversion.
              Aaron P Scooter Software

              • purchase
                Enthusiast
                • Oct 2011
                • 35

                #8
                Hello Aaron,

                My first step was to look into the descriptions, help files, etc. of competing software, not for switching, but for buying on top. I didn't find any contender that can do this (which doesn't mean I couldn't have overlooked such an offering)... AND I became aware, if that was needed (it wasn't really), of how good BC is ; in fact, most contenders offer far fewer options for text comparing (some do offer a switched-lines compare, but that is not really important for me).

                My second step was to give up splitting the title from the text of paragraphs in my texts, so these titles are just bold, but now on the same line as the rest of the text, as they are in the original legal texts with which I have to compare mine.

                In the end, it's a question of quick and frequent comparing, instead of having it "your way" but which would be a complication preventing you from frequent comparing; it's all a question of priorities, and mine are "the least fuss possible" instead of "I like it that way, graphically" - at least from that moment on I got aware that even by buying another program on top of BC, I couldn't have it "my way" without complications.

                Here also, my answer and my "thank you" comes a little late, but then, in such instances, I prefer to get some experience with "real work done" before I (can) share my definitive "solution".

                In this case, I wasn't aware at all that "nobody does it", i.e. that it doesn't seem so easy, programming-wise.

                But please allow me to kindly voice my general opinion (! I'm not speaking of this particular and perhaps too exotic feature!) : It seems that development of BC has become a little bit slow ? Ok, many, many very good details ARE there already, but why slow down development then, even if, I admit, at a high level ? If sync tools are "going cloud" more and more, because data storage is going cloud more and more, this is NOT the case with compare tools, so here at least, a livelier development WOULD be honored by happy users / customers!

                • peewee
                  New User
                  • Sep 2016
                  • 2

                  #9
                  Need this for comparing patents / claims
