
unimportant text regexp help with "or" (|)


  • unimportant text regexp help with "or" (|)


    In comparing XML files, I'm trying to ignore when one file uses
    <element title="something"/>
    and another file uses
    <element title="something">
    </element>
    since XML treats the two forms as the same (or close enough).

    Using regexp, I've tried several patterns with "or" (|), and the comparison always treats the two forms as different:
    • <element title="something"(/>|>\s*</element>)
    • />|>\s*</element>
    Is something wrong with the regexp, or with the way I'm applying it? I've been adding these patterns to the Rules "Unimportant text" field and running the comparison with "Minor" selected. I've successfully ignored other text with regexps, including one with optional whitespace that can include a line break (\s*), but this is the only one that uses "or".

    I haven't tried making a grammar rule. Would that make a difference? Or am I wrong to use "or" here?

    Thanks for any help

  • #2

    There isn't an easy way to ignore the restructured node across the line break.

    Your regex can match specific sections of text, but the line break is what throws off this comparison. The comparison runs line by line, so you can use either Unimportant text or Text Replacements to define importance for a given pair of lines, but a definition cannot logically reach </element> from the line above it. The line that holds just </element> would have to be handled on its own, against whatever is on the left, if that concept can be defined separately.
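To illustrate why the line break matters, here is a minimal sketch using Python's `re` module rather than Beyond Compare's own regex engine, so it only demonstrates the general idea. The helper names `matches_whole_text` and `lines_matching` are hypothetical. The pattern from the question matches both forms when it sees the whole text at once, but it matches neither line of the split form when each line is tested on its own.

```python
import re

# The pattern from the question: either a self-closing tag, or an
# opening tag followed by optional whitespace and the closing tag.
PATTERN = re.compile(r'<element title="something"(/>|>\s*</element>)')

self_closing = '<element title="something"/>'
split_form = '<element title="something">\n</element>'


def matches_whole_text(text):
    """Search the text as one string, so \\s* can cross the line break."""
    return PATTERN.search(text) is not None


def lines_matching(text):
    """Search each line separately, as a line-by-line comparison would."""
    return [PATTERN.search(line) is not None for line in text.splitlines()]


print(matches_whole_text(self_closing))  # pattern sees the self-closing tag
print(matches_whole_text(split_form))    # pattern sees both lines together
print(lines_matching(split_form))        # each line is tested in isolation
```

The "or" in the pattern is fine in itself. The second alternative simply never gets to see the opening tag and the closing tag on the same line.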

    Because your pattern includes the element title and "something", that text becomes part of the matched element and would be ignored if the element is unimportant. The definition doesn't control the structure; it only controls whether the text it matches is Important or Unimportant.
    If you want a guide to defining a grammar element and marking it as unimportant, we have one here:
    Aaron P Scooter Software


    • #3
      Another approach is to use a Tidy program to restructure the nodes, normalize the whitespace, and so on. XML Tidy is an additional file format that removes the trailing </node> and applies the same formatting to both sides. If you edit the file, however, it is saved in the new format as literal text.
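As a rough sketch of what this kind of normalization does, the Python below uses the standard `xml.etree.ElementTree` module to bring both forms of the element to the same text before comparing. This is not how XML Tidy itself works; `normalize_xml` is a hypothetical helper written only for illustration.

```python
import xml.etree.ElementTree as ET


def normalize_xml(text):
    """Serialize an XML fragment with whitespace-only text removed."""
    root = ET.fromstring(text)
    for node in root.iter():
        # Drop text that is only whitespace, so an element holding just a
        # line break is serialized the same way as an empty element.
        if node.text is not None and not node.text.strip():
            node.text = None
        # Do the same for whitespace between sibling elements. This is a
        # simplification that would alter mixed-content documents.
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return ET.tostring(root, encoding="unicode")


left = '<element title="something"/>'
right = '<element title="something">\n</element>'

print(normalize_xml(left) == normalize_xml(right))
```

Running both files through the same normalization step before the comparison means the two spellings of an empty element no longer show up as a difference.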

      You can download the XML Tidy (or XML Sorted) file formats here. Once they are installed, the highest format in the Tools menu -> File Formats list controls which format is used automatically.
      Aaron P Scooter Software