Wednesday, 23 September 2009

New treatment of Japanese verbs

I've been discussing Japanese verbs with Yukie - we need new inflectional forms for CALL-SLT, in particular the desiderative (-tai) form, and the old system was getting out of hand. Japanese has extraordinarily regular morphology, with only two irregular verbs and some very straightforward sound changes, so it seemed to me that we really ought to be able to get by without explicitly listing all the inflections of every verb we need.

Yukie wrote down a table of inflections, and we discussed ways of splitting up inflected verbs into stems and affixes. Based on our discussion, I've implemented a first version of a new treatment: you now only need to specify a single root form of each verb, and everything else is done by morphotax rules, with the affixes treated by Nuance as separate words. I've tested it by converting the Japanese Calendar lexicon to the new form and compiling it into a recognizer. Coverage is unchanged, and recognition is anecdotally fine with my voice. I will dig out some Japanese Calendar data soon and run proper tests.
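To give a rough feel for the stem/affix split, here is a small Python sketch. It is not the Regulus morphotax rules themselves, just an illustration under my own assumptions: verbs are written in romaji, the verb class ("godan" or "ichidan") is supplied by hand, and the function name `segment_verb` and the table `GODAN_I_ROW` are made up for this example. The two irregular verbs are not handled.

```python
# Hypothetical illustration only: splits a romanized dictionary form into
# space-separated pieces (stem, optional stem affix, affix), roughly the way
# the pieces show up as separate words for the recognizer.

# Final kana of a consonant-stem (godan) verb -> its i-row counterpart.
# "tsu" is listed before the shorter endings so it is matched first.
GODAN_I_ROW = [
    ("tsu", "chi"),
    ("ku", "ki"),
    ("gu", "gi"),
    ("su", "shi"),
    ("nu", "ni"),
    ("bu", "bi"),
    ("mu", "mi"),
    ("ru", "ri"),
    ("u", "i"),
]


def segment_verb(dictionary_form, verb_class, affix="mashita"):
    """Return the list of words for the given verb plus affix."""
    if verb_class == "ichidan":
        # Vowel-stem verbs: drop the final "ru"; no stem affix is needed.
        if not dictionary_form.endswith("ru"):
            raise ValueError("ichidan verbs must end in 'ru'")
        return [dictionary_form[:-2], affix]
    if verb_class == "godan":
        for ending, i_row in GODAN_I_ROW:
            if dictionary_form.endswith(ending):
                return [dictionary_form[: -len(ending)], i_row, affix]
        raise ValueError("unrecognized godan ending: " + dictionary_form)
    raise ValueError("unknown verb class: " + verb_class)


if __name__ == "__main__":
    print(" ".join(segment_verb("owaru", "godan")))         # polite past
    print(" ".join(segment_verb("owaru", "godan", "tai")))  # -tai form
    print(" ".join(segment_verb("taberu", "ichidan")))
```

The point of the sketch is only that one root form per verb plus a small ending table is enough to produce forms like "owa ri mashita"; how the real rules divide the work between stem and affix may well differ.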

If anyone wants to look at the details, the morphotax rules are in $REGULUS/Grammar/Japanese/japanese_verb_morphology.regulus. The new version of the Japanese Calendar lexicon is at $REGULUS/Examples/Calendar/Regulus/japanese_calendar_lex_new.regulus.

Here's an example of a parse:

$ nanji ni owa ri mashita ka

(Parsing with left-corner parser)

Analysis time: 0.09 seconds

Return value: [[question,form(past,[[owaru],[ni,term(null,nanji,[])]])]]

Global value: []

Syn features: []

Parse tree:

utterance [JAPANESE_CORE_RULES:120-123]
/ main_clause [JAPANESE_CORE_RULES:147-151]
| / comps [JAPANESE_CORE_RULES:190-195]
| | / pp [JAPANESE_CORE_RULES:414-423]
| | | / np [JAPANESE_CORE_RULES:267-273]
| | | | n lex(nanji) [JAPANESE_CALENDAR_LEX_NEW:84-84]
| | | \ p lex(ni) [JAPANESE_CALENDAR_LEX_NEW:274-284]
| | \ comps null [JAPANESE_CORE_RULES:163-166]
| | vbar [JAPANESE_CORE_RULES:249-253]
| | / v_stem [JAPANESE_VERB_MORPHOLOGY:27-38]
| | | / v_stem lex(owa) [JAPANESE_CALENDAR_LEX_NEW:227-237]
| | | \ stem_affix lex(ri) [JAPANESE_VERB_MORPHOLOGY:138-138]
| \ \ affix lex(mashita) [JAPANESE_VERB_MORPHOLOGY:80-83]
\ lex(ka)
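
As a quick sanity check on the tree above, the following Python sketch pulls out the lex(...) leaves and confirms that they are exactly the space-separated words of the input, with the stem and affixes as separate words. The helper name `lexical_leaves` is mine, and the tree text is simply pasted in as a string; this is not part of Regulus.

```python
import re

# Parse tree copied from the output above (raw string, since it contains backslashes).
PARSE_TREE = r"""
utterance [JAPANESE_CORE_RULES:120-123]
/ main_clause [JAPANESE_CORE_RULES:147-151]
| / comps [JAPANESE_CORE_RULES:190-195]
| | / pp [JAPANESE_CORE_RULES:414-423]
| | | / np [JAPANESE_CORE_RULES:267-273]
| | | | n lex(nanji) [JAPANESE_CALENDAR_LEX_NEW:84-84]
| | | \ p lex(ni) [JAPANESE_CALENDAR_LEX_NEW:274-284]
| | \ comps null [JAPANESE_CORE_RULES:163-166]
| | vbar [JAPANESE_CORE_RULES:249-253]
| | / v_stem [JAPANESE_VERB_MORPHOLOGY:27-38]
| | | / v_stem lex(owa) [JAPANESE_CALENDAR_LEX_NEW:227-237]
| | | \ stem_affix lex(ri) [JAPANESE_VERB_MORPHOLOGY:138-138]
| \ \ affix lex(mashita) [JAPANESE_VERB_MORPHOLOGY:80-83]
\ lex(ka)
"""


def lexical_leaves(tree_text):
    """Return the words inside lex(...) in the order they appear in the tree."""
    return re.findall(r"lex\(([^)]+)\)", tree_text)


if __name__ == "__main__":
    leaves = lexical_leaves(PARSE_TREE)
    print(leaves)
    print(" ".join(leaves))
```

Joining the leaves with spaces should give back the input string, which is an easy way to see that "owa", "ri" and "mashita" really are three separate words in the analysis.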

