Comparing HTML with all elements unimportant

  • chrisjj
    Carpal Tunnel
    • Apr 2008
    • 2537

    Comparing HTML with all elements unimportant

    Is this possible? I've looked under Importance but can find no rule that covers whole HTML tags. "Keyword" covers only the tag names, not the attributes.

    Thanks.
  • Aaron
    Team Scooter
    • Oct 2007
    • 16002

    #2
    We have an HTML rule that displays just the text of the HTML file, HTML to Text:
    http://www.scootersoftware.com/downl...oreformats_alt
    This would remove the tags from view entirely.

    If you need them present, but unimportant, you may need to define new Grammar items that encapsulate the text you wish to define as unimportant. This could be a delimited grammar from "<" to ">", or something more complex.
    http://www.scootersoftware.com/suppo..._unimportantv3

    Let us know if you have any questions. Please include any sample files and your current settings. You can email us at [email protected], and please include the link back to this forum post.
    Aaron P Scooter Software


    • chrisjj
      Carpal Tunnel
      • Apr 2008
      • 2537

      #3
      > If you need them present, but unimportant

      I do.

      > you may need to define new Grammar items that encapsulate the text
      > you wish to define as unimportant. This could be a delimited grammar from
      > "<" to ">", or something more complex.
      > http://www.scootersoftware.com/suppo..._unimportantv3

      Thanks. I've followed that but it doesn't work, even if I uncheckmark the importance of all the preexisting elements:


      Any ideas?


      • Aaron
        Team Scooter
        • Oct 2007
        • 16002

        #4
        The Keyword definition is probably swallowing the Tag definition. You may need to delete your definition for Keywords. I would suggest making a copy of your current HTML rule and making your edits there, then placing the default file format lower in the list. This way, you can revert to the default behavior if needed.
        Aaron P Scooter Software


        • chrisjj
          Carpal Tunnel
          • Apr 2008
          • 2537

          #5
          Originally posted by Aaron
          The Keyword definition is probably swallowing the Tag definition. You may need to delete your definition for Keywords.
          Thanks - deleting the first Keyword definition solved it, though of course leaving me unable to use the Keyword definition in this file format.

          Can you please explain this swallowing problem? Since I tried my Tag element both above and below the Keyword element, I am surprised that interference occurred.


          • Michael Bulgrien
            Carpal Tunnel
            • Oct 2007
            • 1772

            #6
            Originally posted by chrisjj
            I've followed that but it doesn't work, even if I uncheckmark the importance of all the preexisting elements
            In my experience, unchecking the importance of all preexisting elements is not enough. Move your Tag grammar definition to the top of the list so that it is evaluated before the Keyword grammar definition.

            Edit: I see that a new post appeared while I was posting this one. If you already tried putting the Tag grammar definition first then I, too, am surprised that "interference occurred". There must be some "undocumented" override for some of the built-in grammar types (e.g. comments processed before keywords, keywords processed before other grammar types, etc.)
            Last edited by Michael Bulgrien; 12-Apr-2011, 06:05 PM.
            BC v4.0.7 build 19761


            • chrisjj
              Carpal Tunnel
              • Apr 2008
              • 2537

              #7
              > If you already tried putting the Tag grammar definition first

              Here it is:



              > then I, too, am surprised that "interference occurred". There must be some
              > "undocumented" override for some of the built-in grammar types

              I wait to hear. Thanks.


              • Erik
                Team Scooter
                • Oct 2007
                • 437

                #8
                Originally posted by chrisjj
                Thanks - deleting the first Keyword definition solved it, though of course leaving me unable to use the Keyword definition in this file format.
                Each character in the file can only be classified as a single element type. Therefore, if you define "Tag" to match all characters between "<" and ">", the "Keyword" definition that matches parts of tags is completely useless.
                Erik Scooter Software


                • chrisjj
                  Carpal Tunnel
                  • Apr 2008
                  • 2537

                  #9
                  Originally posted by Erik
                  ...if you define "Tag" to match all characters between "<" and ">", the "Keyword" definition that matches parts of tags is completely useless.
                  It remains useful for enabling when required in the Important list. What mystifies me is even when disabled, it somehow overrides the Tag element - including when the tag element has priority in the list.


                  • Erik
                    Team Scooter
                    • Oct 2007
                    • 437

                    #10
                    You can't "disable" a grammar item. The only way to prevent it from classifying text is to delete it. You can change whether or not it is important. You cannot meaningfully use the built-in keyword definition and your new tag definition at the same time.
                    Erik Scooter Software


                    • chrisjj
                      Carpal Tunnel
                      • Apr 2008
                      • 2537

                      #11
                      Originally posted by Erik
                      You can't "disable" a grammar item. The only way to prevent it from classifying text is to delete it.
                      Shouldn't the Keyword item be disabled by a match on the higher Tag item? As per the documentation:

                      Text Format Grammar Settings
                      ...
                      Items higher on the list take precedence over lower items.


                      • Aaron
                        Team Scooter
                        • Oct 2007
                        • 16002

                        #12
                        Hello Chris,

                        The order is significant in helping to break ties between multiple possible matches. However, other factors can make a section of text match one grammar item over another before the list precedence is used. In this example, the length of the match matters more than the position in the priority list. The 'Keyword' match is longer than the left side of the Tag delimiter ("<"), and the longest match is the one that gets used. Only if the matches are equal in length does the list's priority break the tie.
                        Aaron P Scooter Software
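
The selection rule described in this post can be sketched in a few lines of Python. This is only an illustrative model, not Beyond Compare's actual implementation: the rule list, the `pick_rule` helper, and in particular the Keyword pattern `</?\w+` are assumptions made up for the example (the thread does not show the real Keyword definition).

```python
import re
from typing import Optional

# Hypothetical grammar list in priority order (first entry = highest on the list).
# Each entry is (element name, regex for the text that opens the match).
RULES = [
    ("Tag", re.compile(r"<")),           # delimited item: only its left side "<" competes
    ("Keyword", re.compile(r"</?\w+")),  # assumed pattern: "<" plus the tag name
]

def pick_rule(text: str, pos: int, rules=RULES) -> Optional[str]:
    """Return the name of the rule that wins at `pos`, or None if nothing matches."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m is None:
            continue
        length = m.end() - m.start()
        # A strictly longer match replaces the current best. On equal length
        # the earlier (higher-priority) rule is kept, so list order only
        # breaks ties.
        if length > best_len:
            best_name, best_len = name, length
    return best_name

print(pick_rule('<div class="x">', 0))  # Keyword: "<div" (4 chars) beats "<" (1 char)
print(pick_rule("< 3", 0))              # Tag: the assumed Keyword pattern does not match here
```

In this model, moving Tag above Keyword changes nothing for `<div`, because the two candidate matches are never the same length.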


                        • chrisjj
                          Carpal Tunnel
                          • Apr 2008
                          • 2537

                          #13
                          Originally posted by Aaron
                          The order is significant in helping to break ties between multiple possible matches. However, other factors can make a section of text match one grammar over another before the list precedence is used.
                          Thanks. That's news to me, despite me having read the Help. Did I miss it somewhere?

                          Originally posted by Aaron
                          In this example, the length of the match matters more than the position in the priority list. The 'Keyword' is a longer match than the left side of the delimiter ("<"). In this case, the longest match will be used.
                          Note that in this case the longest match is Tag, and it is not being used.


                          • Aaron
                            Team Scooter
                            • Oct 2007
                            • 16002

                            #14
                            Originally posted by chrisjj
                            Thanks. That's news to me, despite me having read the Help. Did I miss it somewhere?

                            Note that in this case the longest match is Tag, and it is not being used.
                            A delimited grammar item is matched on its left side first, and the left side of Tag is only "<". That one-character match is what competes with Keyword, not the full "<" to ">" span. Making this behavior clearer and improving on it in general is on our wishlist for a future version of Beyond Compare.

                            I recommend creating the copy of the File Format, and deleting the Keyword definition from the copy. You can then toggle between the two methods of comparison using the dropdown on the toolbar.
                            Aaron P Scooter Software
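
To see why deleting the Keyword definition makes the Tag item work, here is a rough Python model of the behavior described in this thread: each character receives exactly one element type, and a delimited item competes only with its left delimiter. Everything in it is an assumption for illustration (the `classify` and `runs` helpers, the rule tuples, and the Keyword pattern `</?\w+`); it is not Beyond Compare's code.

```python
import re

def classify(text, rules):
    """Label every character with one element name, or "Text" if no rule matches.

    rules: list of (name, open_regex, close_string_or_None), highest priority first.
    """
    labels = []
    pos = 0
    while pos < len(text):
        best, best_len = None, 0
        for name, open_pat, close in rules:
            m = re.compile(open_pat).match(text, pos)
            # Only the opening match competes; longest wins, list order breaks ties.
            if m and m.end() - pos > best_len:
                best, best_len = (name, m.end(), close), m.end() - pos
        if best is None:
            labels.append((text[pos], "Text"))
            pos += 1
            continue
        name, end, close = best
        if close is not None:
            # Delimited item: extend the match through the closing delimiter.
            idx = text.find(close, end)
            end = len(text) if idx == -1 else idx + len(close)
        labels.extend((ch, name) for ch in text[pos:end])
        pos = end
    return labels

def runs(text, rules):
    """Merge adjacent characters that share a label into (substring, name) runs."""
    out = []
    for ch, name in classify(text, rules):
        if out and out[-1][1] == name:
            out[-1] = (out[-1][0] + ch, name)
        else:
            out.append((ch, name))
    return out

TAG = ("Tag", r"<", ">")                # delimited from "<" to ">"
KEYWORD = ("Keyword", r"</?\w+", None)  # assumed keyword pattern

sample = '<b id="x">hi</b>'
print(runs(sample, [TAG, KEYWORD]))  # Keyword takes "<b" and "</b"; Tag never matches
print(runs(sample, [TAG]))           # Tag covers '<b id="x">' and '</b>'
```

With both rules present, the attributes fall through as plain text even though Tag is first in the list; with Keyword removed, the whole tag is one Tag element that can be marked unimportant.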


                            • chrisjj
                              Carpal Tunnel
                              • Apr 2008
                              • 2537

                              #15
                              Originally posted by Aaron
                              The delimited type matches on the left side first, and the left side of tag is only "<". Making this behavior clearer and improving on it in general is on our wishlist for a future version of Beyond Compare.
                              Thanks. I suggest precedence should go by what matches the element, not just part of the element "first".
