Sunday, 26 July 2009

Treebank caching and preferences

Pierrette reminded me last week that there was a known problem in treebank caching: when parse preferences change, cached analyses may no longer be valid.

I've just checked in new code which stores the preferences used to create the treebank, and compares them with the current preferences. If the two are different, the treebank is regenerated. It would be nice if we could regenerate only the part that might be affected by the changed preferences, but that's unfortunately very difficult to do.
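For readers who want the idea in concrete form, here is a minimal sketch of the check in Python. It is not the Regulus code (which is written in Prolog); the names `load_or_rebuild_treebank`, `cache` and `build_treebank` are made up for illustration, and the sketch assumes the preferences can be compared for equality.

```python
import copy


def load_or_rebuild_treebank(cache, current_preferences, build_treebank):
    """Return the cached treebank, rebuilding it if the preferences changed.

    `cache` is a plain dict standing in for whatever storage is used
    (hypothetical); `build_treebank` is a hypothetical function that
    creates a treebank from a set of preferences.
    """
    stale = (
        "treebank" not in cache
        or cache.get("preferences") != current_preferences
    )
    if stale:
        # Regenerate the whole treebank: we cannot easily tell which
        # analyses a changed preference might affect.
        cache["treebank"] = build_treebank(current_preferences)
        # Store a copy, so that later in-place edits to the caller's
        # preferences are still detected as a change.
        cache["preferences"] = copy.deepcopy(current_preferences)
    return cache["treebank"]
```

Storing the preferences themselves, rather than just a "dirty" flag, means the check still works when the preferences are edited outside the tool between sessions.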

Monday, 20 July 2009

Regulus for DORIS, continued

I did some more tweaking of the DORIS grammar, and it now covers about 90% of Patrick's corpus. The Nuance grammar it generates is nice and compact (fewer than 1000 rules), and many of the remaining coverage holes still look easy to fix. As we hoped, this seems to be a good domain for Regulus.

Regulus for DORIS

When I was visiting Melbourne Uni last week, I talked with Patrick Ye about the possibility of using Regulus to provide speech recognition for the DORIS project. As he pointed out, DORIS is in fact quite similar to SHRD2: in both domains, the basic idea is to find things, pick them up, and move them around. Patrick sent me an example corpus and vocabulary, and I did indeed find it very easy to use them to adapt the existing SHRD2 resources. In just a few hours, I put together an initial Regulus grammar that could be compiled into a recogniser. The current version of the grammar covers a bit more than 80% of Patrick's corpus, and the recogniser can turn spoken sentences in Australian-accented English into either strings of words or scoped logical forms. For example, here's the representation it produces of "the red book is on the desk":

[[dcl,
  quant(def_sing, A,
        [[book,A],[color,A,red]],
        quant(def_sing, B,
              [[desk,B]],
              quant(exist, C,
                    [[be_on_loc,C,A,B],[tense,C,present]],
                    true)))]]
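To make the scoping easier to see, here is a small Python sketch that encodes the same structure as nested tuples and reads off the quantifiers from outermost to innermost. It is only an illustration: the reading of `quant` as (quantifier, variable, restriction, body) is my gloss on the term above, and `quantifier_scope` is a made-up helper, not part of Regulus.

```python
def quantifier_scope(form):
    """Yield (quantifier, variable) pairs from outermost to innermost scope.

    Assumes each quantified form is a tuple
    ("quant", quantifier, variable, restriction, body).
    """
    while isinstance(form, tuple) and form[0] == "quant":
        _, quantifier, variable, _restriction, body = form
        yield quantifier, variable
        form = body


# "the red book is on the desk", transcribed from the term above
# (the outer [[dcl, ...]] wrapper is left out).
logical_form = (
    "quant", "def_sing", "A",
    [("book", "A"), ("color", "A", "red")],
    ("quant", "def_sing", "B",
     [("desk", "B")],
     ("quant", "exist", "C",
      [("be_on_loc", "C", "A", "B"), ("tense", "C", "present")],
      True)),
)

print(list(quantifier_scope(logical_form)))
# prints [('def_sing', 'A'), ('def_sing', 'B'), ('exist', 'C')]
```

So the two definite descriptions (the book and the desk) take scope over the existentially quantified "being on" state.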

If people want to look at the details, the files are checked in at http://regulus.cvs.sourceforge.net/viewvc/regulus/Regulus/Examples/Doris/. The interesting ones are the lexicon, at http://regulus.cvs.sourceforge.net/viewvc/regulus/Regulus/Examples/Doris/Regulus/doris_lex.regulus, and the corpus, at http://regulus.cvs.sourceforge.net/viewvc/regulus/Regulus/Examples/Doris/corpora/doris_corpus.pl

Friday, 10 July 2009

First steps in Bridge system

With help from Cathy Chua, our Bridge expert, I've been putting together a first version of a grammar for the Bridge domain this week. Cathy has been supplying vocabulary and a corpus of examples showing how to use it, and I've used that to build an initial lexicon. The specialized grammar derived from these resources now has a vocabulary of about 220 surface words. Here are some typical utterances it can already handle:

who has the queen of clubs
who bid one no trump
is two clubs a transfer
cover the ten with the jack
can you finesse in diamonds
can you make if spades are four one

We tried compiling a recognizer using the Australian English package, and, with Cathy's Australian voice, recognition is anecdotally quite good. (I have discovered that the difference between British English and Australian English is substantial.) We will be adding more coverage over the weekend. Next week, I hope we'll be able to start thinking concretely about how to hook up the Regulus components with BASSINET, Leon Stirling's Bridge program, to produce a first cut at an end-to-end system that can respond to spoken questions and commands.

As we expected, the Bridge domain is quite a lot more complex than anything we have tried so far in Regulus. It will definitely stretch us!