Processing and efficiency?


    As I'm continually working on File Formats (and have a long list I still want to do), I'd like a little more insight into how the Grammars are processed, and whether or how the techniques I'm using now may affect efficiency.

    My current approach is as follows:
    1. I put Grammar items that need to be processed first earlier in the list (this mostly works)
    2. I separate Grammar items for the same "element" type into logical groups rather than batching them all together. For instance, I'm working on a Grammar for Apache 1.3 where I have a large number of "directive" Grammar items: one for the core and one for each module, even though several modules have only one, two, or three directives. While I can't add a comment to each item (yet?), keeping them in alphabetical order by module at least helps with further maintenance, and will help when deriving a second File Format for Apache 2.x.
    3. When one logical "element" type needs mostly plain-text items and a (small) number of regular expressions, I tend to put the regular expressions in a separate Grammar item.
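    The grouping approach above can be sketched in Python. The module names and directives here are just illustrative stand-ins (the real Grammar items live in the File Format, not in code); the point is that several small per-module lists for one element type match exactly the same words as a single merged list would:

```python
import re

# Hypothetical per-module "directive" lists, kept separate for maintainability
core_directives = ["ServerRoot", "ServerName", "Listen"]
mod_rewrite_directives = ["RewriteEngine", "RewriteRule"]

# Merging the small lists gives the same set of matched words
# as one big combined list would
merged = sorted(core_directives + mod_rewrite_directives)
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, merged)) + r")\b")

print(bool(pattern.search("RewriteEngine on")))   # → True
print(bool(pattern.search("SomeUnknownWord x")))  # → False
```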

    Specific questions:
    • How would a grammar set up as described above be processed?
    • Would separating an element type into a large number of Grammar items have any negative impact - on startup, or when applying it to a file?
    • Does a mix of "basic" and "list" items for a single element type have any impact?
    • Does it help to separate out the regular expressions from a list with mostly non-REs?
    • Does it make any difference whether a "list" item is sorted alphabetically?
    • What happens when several lists for the same "element" contain identical strings?

    It would be great to have some insight into these issues. My File Formats tend to become rather large, and while maintainability is an issue (and being able to add notes would help a great deal!), for the larger Grammars performance becomes an issue as well. If I knew more about how these are processed, I might be able to better balance the techniques I'm using.

    Last edited by Marjolein Katsma; 20-Oct-2007, 03:24 PM. Reason: New question added

  • #2
    Erik might be able to give more complete help, but I have a pretty good idea of how things work:

    Processing-wise, delimited, basic, and list items are all grouped together into one big state table (a DFA). Some consequences:
    • Plain-text matches are simply converted to regular expressions with all the special characters escaped, so splitting them out from REs won't affect anything.
    • There shouldn't be a significant difference in speed between having items split out or grouped together.
    • The table implicitly sorts elements, so sorting a list alphabetically shouldn't affect file-processing speed, though it might affect the table-building time.
    • Element types aren't checked until after a regular expression has matched, so six lists mapped to 'comment' and one list six times as long should take about the same amount of time to process.
    • The "basic" type is essentially a one-item-long list.
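    A toy illustration of the plain-text point (this is not Cirrus's actual code, just the same idea expressed with Python's `re` module, with made-up item names): literals get their metacharacters escaped and everything lands in one combined matcher, so it makes no difference which list a literal started in:

```python
import re

plain_items = ["Options +FollowSymLinks", "AddIcon (DIR,/icons/dir.gif)"]  # literals with RE metacharacters
regex_items = [r"Rewrite\w+"]                                              # an actual regular expression

# Escape the literals, then combine everything into a single alternation
escaped = [re.escape(s) for s in plain_items]
combined = re.compile("|".join(escaped + regex_items))

print(combined.search("RewriteCond %{HTTP_HOST}").group())  # → RewriteCond
```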

    The state tables are cached after a file format is used once, so to see the effect on startup time:
    1) Uncheck your format in the File Format Manager dialog, so it's not used by default.
    2) Start a fresh copy of Cirrus with the files you're using. It should use the <Default> file format. The status bar will show the comparison time without a grammar.
    3) Select your format explicitly from the Umpire button dropdown or Session->Rules submenu. The status bar will now show the comparison time including building the state table.
    4) Switch back to <Default> and then again to your format. The status bar will now show the comparison time with the state table already built.
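    The caching behaviour behind steps 3 and 4 can be mimicked with a small memoization sketch (hypothetical names; the real state tables are DFAs, not Python regexes):

```python
import re

_table_cache = {}  # format name -> compiled matcher, built once

def get_matcher(format_name, items):
    # First use builds the "state table"; later uses are cache hits,
    # which is why step 4 shows a faster time than step 3.
    if format_name not in _table_cache:
        _table_cache[format_name] = re.compile("|".join(map(re.escape, items)))
    return _table_cache[format_name]

items = ["ServerName", "Listen"]       # stand-ins for grammar items
m1 = get_matcher("Apache 1.3", items)  # builds and caches the matcher
m2 = get_matcher("Apache 1.3", items)  # reuses the cached matcher
print(m1 is m2)  # → True
```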
    Zoë P
    Scooter Software


    • #3
      Thanks, Craig, that's quite helpful!

      Of course if Erik has any further insights to share, I'd be grateful.

      (And isn't the Scooter office closed during the weekend?)