Processing and efficiency?

**Zoë** · 20-Oct-2007, 12:51 PM

Erik might be able to give more complete help, but I have a pretty good idea how things work:

Processing-wise, delimited, basic, and list items are all grouped together into one big state table (DFA). Plain text matches are just converted to regular expressions with all the special characters escaped, so splitting them out from REs won't affect anything. There shouldn't be a significant difference in speed between having items split out or grouped together. The table implicitly sorts elements, so sorting them alphabetically shouldn't affect file processing speed, but it might affect the table building time. The element types aren't checked until after a regular expression has been matched, so having six lists mapped to 'comment' or one list six times as long should take about the same amount of time to process. The "basic" type is essentially a one-item list.
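To make that concrete, here's a rough Python sketch of the idea, not the actual implementation. The grammar, the element names, and the `build_matcher`/`classify` helpers are all made up for illustration. Python's `re` module is a backtracking engine rather than a DFA, so this only shows the grouping and the "look up the element type after the match" part, not the performance characteristics.

```python
import re

# Hypothetical grammar: (element type, kind, items). Purely illustrative.
GRAMMAR = [
    ("comment", "list", ["REM", "//"]),
    ("keyword", "list", ["if", "else", "end."]),
    ("number", "regex", [r"\d+"]),
    ("keyword", "basic", ["begin"]),  # a "basic" item behaves like a one-item list
]

def build_matcher(grammar):
    """Fold every item into one alternation; remember each item's element type."""
    alternatives = []
    types = []
    for element_type, kind, items in grammar:
        for item in items:
            # Plain text is escaped so it matches literally; regexes are used as-is.
            # Assumes the regex items contain no capturing groups of their own.
            pattern = item if kind == "regex" else re.escape(item)
            alternatives.append(f"({pattern})")
            types.append(element_type)
    return re.compile("|".join(alternatives)), types

def classify(text, matcher, types):
    """Return (matched text, element type); the type is looked up only after a match."""
    return [(m.group(0), types[m.lastindex - 1]) for m in matcher.finditer(text)]

matcher, types = build_matcher(GRAMMAR)
print(classify("if 42 // end.", matcher, types))
```

Whether "keyword" comes from one list or several, the combined pattern ends up with the same alternatives, which is why splitting or merging lists shouldn't change matching time much.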

The state tables are cached after a file format is used once, so to see the effect on startup time:
1) Uncheck your format in the File Format Manager dialog, so it's not used by default.
2) Start a fresh copy of Cirrus with the files you're using. It should use the <Default> file format. The status bar will show the comparison time without a grammar.
3) Select your format explicitly from the Umpire button dropdown or Session->Rules submenu. The status bar will now show the comparison time including building the state table.
4) Switch back to <Default> and then again to your format. The status bar will now show the comparison time with the state table already built.
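The difference between steps 3 and 4 is the usual build-once-then-reuse pattern. A minimal Python sketch of that pattern follows, assuming a made-up `get_matcher` function and format name; it isn't how the program itself stores its tables.

```python
import re
from functools import lru_cache

build_count = 0  # counts how often the (expensive) build step really runs

@lru_cache(maxsize=None)
def get_matcher(format_name, patterns):
    """Build the matcher for a format once; later calls reuse the cached object."""
    global build_count
    build_count += 1
    # patterns must be a tuple so the arguments are hashable for the cache.
    return re.compile("|".join(re.escape(p) for p in patterns))

first = get_matcher("MyFormat", ("REM", "//"))   # like step 3: includes the build
second = get_matcher("MyFormat", ("REM", "//"))  # like step 4: served from the cache
```

After both calls `build_count` is still 1, and `first` and `second` are the same compiled object.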

**Marjolein Katsma** · 20-Oct-2007, 02:46 PM

Thanks, Craig, that's quite helpful!

Of course if Erik has any further insights to share, I'd be grateful.

(And isn't the Scooter office closed during the weekend?)
