Friday 19 December 2008

Parsing top-level constituents and treebank caching

Since parsing with non-top constituents had a bad effect on efficiency, the LOAD command has gone back to loading the normal grammar, as it did before. If you want to be able to parse with non-top constituents, use the new command I have introduced, LOAD_DEBUG, which loads an extended version of the grammar suitable for debugging. You are advised not to use LOAD_DEBUG when creating specialised grammars.

The initial version of this functionality turned out to have a rather nasty bug, which I think is what caught Beth Ann yesterday: there was an incorrect interaction between LOAD_DEBUG, grammar caching and treebank caching. As a result, treebank creation sometimes wrongly concluded that the grammar rules had changed when in fact they hadn't, and unnecessarily reparsed the training corpus. Beth Ann, please update from Regulus and see if this is now fixed!
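
For anyone curious about the general shape of this kind of check, here is a minimal Python sketch of one way a treebank cache can decide whether reparsing is needed. It is purely illustrative and is not the Regulus implementation: the function names, the rule strings and the choice of a SHA-256 fingerprint are all assumptions made for the example. The idea is to fingerprint only the grammar rules themselves, so that nothing else about how the grammar was loaded can make the cache look stale.

```python
import hashlib


def rules_fingerprint(rules):
    """Return a hash of the rule texts (hypothetical helper, not part of Regulus).

    The rules are sorted first, so the result does not depend on rule order.
    """
    digest = hashlib.sha256()
    for rule in sorted(rules):
        digest.update(rule.encode("utf-8"))
        digest.update(b"\n")  # separator, so adjacent rules cannot run together
    return digest.hexdigest()


def needs_reparse(cached_fingerprint, rules):
    """True only if the rules differ from those the cached treebank was built with."""
    return cached_fingerprint != rules_fingerprint(rules)


# Made-up rules, for illustration only.
rules = ["s --> np, vp", "np --> det, n"]
cached = rules_fingerprint(rules)

same_rules_reordered = list(reversed(rules))
changed_rules = rules + ["vp --> v"]

reparse_when_unchanged = needs_reparse(cached, same_rules_reordered)
reparse_when_changed = needs_reparse(cached, changed_rules)
```

With a check along these lines, the training corpus is reparsed only when a rule has actually been added, removed or edited.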