Comparing text files with more than 65535 characters per line

  • RolandGr
    New User
    • Aug 2011
    • 1

    Comparing text files with more than 65535 characters per line

    Hello,

    As the title says, I have a problem comparing text files with more than 65535 characters per line.

    If, for example, the first line of file B is missing 5 characters compared to the first line of file A, then in each of the lines that Beyond Compare wraps automatically (after every 65535 characters) the first 5 characters are marked as different.

    Is it therefore possible to increase the maximum number of characters per line?
    Perhaps in a configuration file, since it is not possible in the GUI?
    Or perhaps via a command-line option or a script command?
    I am currently using BC version 3.1.9.

    Here is my command line:
    (I shortened the directory names so that it fits on one line here)
    C:\BCompare.exe "@C:\FileSkript.txt" C:\300.txt C:\300.txt C:\bericht.html

    Script:
    log verbose "c:\temp\BCLog.txt"
    text-report layout:interleaved &
    options:patch-unified,display-mismatches &
    output-to:%3 output-options:html-color %1 %2

    Maybe one of you can help me...
    Regards,
    Roland
  • Aaron
    Team Scooter
    • Oct 2007
    • 16002

    #2
    Max. characters per line

    BC3's current limit is 65536 characters per line. We cannot compare lines longer than that without wrapping. If you can, set a lower limit so that lines wrap automatically at specific points, or use a pre-processing conversion utility to cut your lines at specific intervals shorter than the maximum line length.

    We have a KB article that goes into more detail on pre-processing:
    http://www.scootersoftware.com/suppo...rnalconversion
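
A minimal sketch of such a pre-processing step, in Python. It is not taken from the KB article: the function name `break_long_lines` and the default limit of 60000 are assumptions chosen for illustration, and the sketch only covers the splitting itself, not how a converter is hooked into Beyond Compare. It breaks between adjacent tags first, so that both files are cut at the same places in their content, and hard-wraps only what is still too long.

```python
import re


def break_long_lines(text, limit=60000):
    """Return text with extra newlines so that no line exceeds `limit` characters.

    `limit` is an assumed safety margin below the 65536-character limit.
    """
    out = []
    for line in text.split("\n"):
        # Prefer breaking between adjacent tags ("><"), so that both files
        # break at the same places in their content.
        for piece in re.sub(r"(?<=>)(?=<)", "\n", line).split("\n"):
            # Fallback: hard-wrap any piece that is still too long.
            for start in range(0, max(len(piece), 1), limit):
                out.append(piece[start:start + limit])
    return "\n".join(out)


print(break_long_lines("<a><b>x</b></a>"))
```

The output should be written to copies of the files, so the originals stay untouched. Note that the hard-wrap fallback cuts at fixed positions, so a single text run longer than the limit can still show shifted differences.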

    Last edited by Gunnar; 06-Sep-2011, 10:01 AM. Reason: ENG->GER translation
    Aaron P Scooter Software


    • MiroJ
      New User
      • May 2013
      • 2

      #3
      Hi Aaron,

      The suggested preprocessing sounds interesting, but it is only a workaround.

      Problem:
      Let’s consider two XML files A and B, each with approx. 1 MB of content and no line breaks, which differ only in 3 extra consecutive blanks in file “A”. Blanks are configured as an unimportant difference, but BC presents the 2 files as having plenty of important differences!

      I suppose, and suggest, that this could be solved programmatically in BC.

      Proposal:
      If you look at the differences, you first see the 3 unimportant spaces in the middle of the first line of difference in A. But this line is also marked as having an important difference! The “important” difference is exactly 3 “extra” characters (letters) in the line from file B. Exactly these 3 characters, at the beginning of the fictive “next line” of A, are then claimed to be an important difference to B.

      This scenario, with 3 characters on each fictive BC line, then repeats until the end of the real line or EOF.

      Generally speaking, if the lines of A and B differ in length by X characters before the point where BC breaks the line, and A has the X extra characters, then B shows exactly X extra characters of important difference at the end of the first fictive BC line. At the beginning of the next line, A is shown to differ in exactly the same string of length X as at the end of the previous fictive line. And so it continues until the end of the line. Just give it a try in BC to get a better picture of this.
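
The effect described above can be reproduced with a small Python sketch. It is only a simplified model, not Beyond Compare's actual algorithm: the helper names `wrap` and `mismatched_chunks` are made up for illustration, and the wrap width is 20 instead of 65535 to keep the example readable. Both lines are cut at the same fixed positions and the pieces are compared pairwise.

```python
from itertools import zip_longest


def wrap(line, width):
    """Cut a line into consecutive pieces of at most `width` characters."""
    return [line[i:i + width] for i in range(0, len(line), width)]


def mismatched_chunks(a, b, width):
    """Return the indexes of the wrapped pieces that differ between a and b."""
    pairs = zip_longest(wrap(a, width), wrap(b, width))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


b = "abcdefghij" * 10        # 100 characters
a = b[:5] + "   " + b[5:]    # same content plus 3 extra blanks
print(mismatched_chunks(a, b, width=20))  # [0, 1, 2, 3, 4, 5]
```

Although A and B differ only by 3 blanks, every piece after the insertion is shifted by 3 characters, so every pair of pieces is reported as different.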

      Would it be possible to:
      a) compare the whole lines in one piece? If not, then to
      b) break the “shorter” B line X characters sooner than the line of A? Consider that the lines are known to be the same up to the fictive line end minus X!
      c) distinguish between fictive and real line breaks? I hope you are not limited by a 16-bit int…
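
To illustrate what option a) would report, here is a hedged sketch that compares two whole lines in one piece with `difflib.SequenceMatcher` from the Python standard library and drops differences that consist only of whitespace. It says nothing about how Beyond Compare works internally; the function name `important_differences` and the rule "whitespace-only means unimportant" are assumptions for this example. `SequenceMatcher` can also be very slow on lines of several megabytes, so this shows the idea rather than a practical tool.

```python
from difflib import SequenceMatcher


def important_differences(a, b):
    """Return (tag, text_a, text_b) for every difference that is not whitespace-only."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        text_a, text_b = a[i1:i2], b[j1:j2]
        # Assumption: a difference is unimportant when both sides
        # contain nothing but whitespace.
        if text_a.strip() == "" and text_b.strip() == "":
            continue
        result.append((tag, text_a, text_b))
    return result


print(important_differences("<p>Hello   world</p>", "<p>Hello world</p>"))  # []
print(important_differences("abc def", "abc xef"))  # [('replace', 'd', 'x')]
```

Because the lines are aligned as a whole, the extra blanks are found once and nothing after them is shifted.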

      Background:
      I am a colleague of Roland, and I am now dealing with the differences between two sets of half a million XHTML documents. They are not “pretty formatted”, e.g. because the extra whitespace is unwanted and could also cause different spacing in the HTML pages. Also, because these are legal texts, recognizing differences is crucial and any manipulation of the data should be prohibited. Preprocessing is also considered a source of errors.

      Unformatted XML is almost always one or a couple of huge lines. All of our documents are smaller than 5 MB.

      Bye
      Miro


      • Aaron
        Team Scooter
        • Oct 2007
        • 16002

        #4
        Hello Miro,

        Thanks for the suggestion. Comparing across line breaks and XML comparisons are on our wishlist. I'll add your ideas to our entry on the subject.
        Aaron P Scooter Software
