  1. #1
    Join Date
    Mar 2015
    Posts
    7

    Ignoring word wrapping

    I would like the following two paragraphs to be seen as matching. Is there any way to achieve this?

    one two three four
    five six seven
    eight nine

    one two three
    four five six
    seven eight
    nine

  2. #2
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,826

    Hello,

    BC4 always treats line breaks as significant. The current workaround is to perform an external conversion that normalizes where these breaks occur. We have a few conversions available for download on our website for specific formats such as Java and HTML. We also have a general guide for setting up any command line tidy app and plugging it into Beyond Compare:
    http://www.scootersoftware.com/suppo...rnalconversion

    Comparing across line breaks is on our wishlist. What file format are you comparing?
    Aaron P Scooter Software

  3. #3
    Join Date
    Mar 2015
    Posts
    7

    Thanks!

    We are comparing HTML that has already been processed (by a plugin) to strip away everything except the underlying text. It seems like merging the paragraph text into a single line would be best done within the plugin, since I hope that it will know what is a paragraph and what is not.

    Another approach would be to make the entire page a single line of text, but that would make it difficult to identify exactly how the observed differences corresponded to the original page.
    Last edited by jon bondy; 30-Mar-2015 at 09:16 AM.
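
A minimal sketch of the paragraph-merging idea described above, assuming the plugin output is plain text in which paragraphs are separated by blank lines. The function name `unwrap_paragraphs` is hypothetical and not part of Beyond Compare or the HTML to Text plugin. Each paragraph becomes one line, so the two example paragraphs from the first post compare as equal while the paragraph boundaries stay visible.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the lines of each blank-line-separated paragraph into one line."""
    # Split on one or more blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace, including line breaks, to one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


# The two wrappings from the first post.
first = "one two three four\nfive six seven\neight nine\n"
second = "one two three\nfour five six\nseven eight\nnine\n"

print(unwrap_paragraphs(first) == unwrap_paragraphs(second))
```

Keeping one line per paragraph, rather than one line per page, means a reported difference still points at a specific paragraph of the original page.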

  4. #4
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,826

    Hello,

    We have an HTML Tidy format available in the "Alternatives" section of our file formats, here:
    http://www.scootersoftware.com/downl...kb_moreformats

    Once installed, go to the Tools menu -> File Formats, and move it above or below the default HTML format in the list. Whichever is top-most is the one that is used automatically when scanning or opening HTML files. The other can be selected manually using the Formats dropdown menu on the toolbar or in the Session Settings.

    Please note that if you save the file, it will be saved with the new line breaks, as is.

    If you do find yourself with a single, long line that contains multiple differences, Ctrl+Shift+N will go to the Next Difference within a line.
    Aaron P Scooter Software

  5. #5
    Join Date
    Mar 2015
    Posts
    7

    Thanks. I have no idea how to install the HTML Tidy software. When I go to your link, and then the next link, and then the next link, I end up at a page that does not contain anything useful to download. Can you provide more explicit instructions please?

    My original problem seems to originate in the HTML To Text plugin, which takes the original HTML with text on a single line and re-formats it so that the text wraps after 80 characters or so. While this does make things pretty, it also creates a huge number of artificial differences in the text, given BC's notion that a new line is significant. Is there any way to "fix" the HTML To Text plugin, or to make the wrapping optional?

    Thanks, again

  6. #6
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,826

    Hello,

    That would depend on the exact version of the format you are using, and whether the title is exactly "HTML to Text". We have several "HTML" file format variants, and each one formats differently. If you go to the Alternatives link above, click Windows, then search for "HTML", you should see several results. These include HTML Tables, HTML Tidied, and HTML to Text.

    Note that HTML to Text is not included in the default installation. It must have been downloaded and installed from this website, or deployed by your IT department. The default format is "HTML", which is also in the list of file formats. You can switch away from "HTML to Text", or disable it by unchecking it in the Tools menu -> File Formats dialog. Either option restores the default "HTML" behavior, which will not wrap at 80.

    HTML Tidied would reformat your HTML with uniform line breaks. Its output also includes the HTML code (which HTML to Text removes). Did you need the code removed from the view?
    If you are still having trouble, it may be best to email in your current settings with the Help menu -> Support; Export, to support@scootersoftware.com
    Please include a link back to this original forum thread, and an example file if you could.
    Aaron P Scooter Software

  7. #7
    Join Date
    Mar 2015
    Posts
    7

    The plugin I am using is exactly "HTML to Text" and I did download and install it.

    The default HTML behavior is not useful because I do not care whether the font changed; I only care if the visible text changed. With full HTML comparison, there are even MORE false differences shown.

    So HTML Tidied tidies everything up (nice), but includes the HTML (not helpful); while HTML to Text removes the HTML (nice) but inserts line breaks (not helpful)?

    Seems like nothing that you offer will help me. It is hard to believe that I am the first person in a decade to want this.

    I noticed that it is possible to attach an external text processor. Are there any tech notes about how to write one of these? Is it just a command line program with two parameters (input file and output file)?

    Thanks!

  8. #8
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,826

    Hello,

    If you could email your BCsupport.zip and sample files, I could verify it is set up correctly. For example, if the characters per line limit on the File Format's Conversion tab has been set to 80, this would explain the behavior you are hitting. The default value is 4096.

    And correct: the External program option on the File Format's Conversion tab takes an input file and an output file (*.txt). We then open the output.txt file.
    Aaron P Scooter Software
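
For anyone writing such a converter, here is a rough sketch of a command line program with two parameters (input file and output file), as confirmed above. The script name `unwrap.py`, the function names, and the choice of UTF-8 are assumptions for illustration. How the two paths are passed on the command line is covered by the external conversion guide linked earlier in the thread.

```python
import re
import sys


def normalize(text: str) -> str:
    """Put each blank-line-separated paragraph on a single line."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs) + "\n"


def convert(src_path: str, dst_path: str) -> None:
    """Read the input file, normalize it, and write the output file."""
    # Assumption: input is UTF-8; undecodable bytes are replaced, not fatal.
    with open(src_path, encoding="utf-8", errors="replace") as src:
        text = src.read()
    # newline="\n" keeps the output identical on every platform.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(normalize(text))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
    else:
        print("usage: unwrap.py INPUT OUTPUT", file=sys.stderr)
```

Because the conversion only affects what is loaded for comparison, this kind of converter is best treated as read-only: saving from the comparison would write the normalized text, not the original.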

  9. #9
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,826

    Thanks for the sample files. With these text blocks, it does look like the downloadable HTML to Text format will introduce line breaks, presumably for readability. There don't appear to be any command line parameters to control this behavior either. If you can find another HTML2Text utility that produces content closer to what you would like to compare, I would recommend plugging it in as the External Conversion.

    The External Conversion command line can run a program as if it were from the Windows Command Line, given an input.html file and an output.txt file as our parameters. We have documentation on this setup (using .resx as the example file type) here:
    http://www.scootersoftware.com/suppo...rnalconversion
    Aaron P Scooter Software
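
As an illustration of what an alternative HTML-to-text step might look like, here is a sketch built on Python's standard `html.parser` module. It is not the downloadable HTML to Text format and not any particular HTML2Text utility. The class name, the function name, and the set of tags treated as paragraph boundaries are all assumptions. It drops the markup, skips script and style content, and emits each block of visible text as one unwrapped line.

```python
from html.parser import HTMLParser

# Assumption: these tags start or end a line of visible text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # content is never visible text


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []       # finished blocks, one per output line
        self.current = []     # text chunks of the block in progress
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def flush(self):
        # Collapse all whitespace, including source line breaks.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block tag
    return "\n".join(parser.lines) + "\n"


print(html_to_text("<p>one two\nthree <b>four</b></p><p>five</p>"), end="")
```

Inline tags such as `<b>` or `<font>` are ignored entirely, so a font change alone produces no difference, and only changes to the visible text show up in the comparison.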

  10. #10
    Join Date
    Mar 2015
    Posts
    7

    I wrote my own plugin. Thanks
