Saturday 25 October 2008

Default parse preferences for specialised grammars

Following a discussion with Pierrette, I have added default parse preferences for specialised grammars, based on the geometric mean of the rule frequencies observed in the training corpus. This is the same approach we have been using for some time in generation. To get the new functionality, you need to update Regulus and remake the specialised grammar you are using. Most of the time you shouldn't notice anything new, except that rule frequencies are now displayed in the parse trees, as in the following example:

>> is it a sharp pain
(Parsing with left-corner parser)

Analysis time: 0.12 seconds

Return value: [(object=[adj,sharp]), (agent=[pronoun,it]), (object=[secondary_symptom,pain]),
(null=[tense,present]), (null=[utterance_type,ynq]), (null=[verb,be]),
(null=[voice,active])]

Global value: []

Syn features: []

Parse tree:

.MAIN (freq 836) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:2629-3470]
top (freq 830) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:3471-4306]
utterance (freq 622) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:4307-4934]
s (freq 31) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:11277-11313]
/ vbar (freq 461) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:5947-6413]
| / v lex(is) (freq 39) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:10819-10863]
| | np (freq 314) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:7213-7532]
| \ pronoun lex(it) (freq 53) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:10336-10394]
| tmp_cat_12 (freq 31) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:11314-11317]
| / np (freq 1153) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:1622-2628]
| | / np (freq 63) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:9998-10066]
| | | / d lex(a) (freq 86) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:9110-9201]
| | | | tmp_cat_6 (freq 63) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:10067-10070]
| | | | / adj lex(sharp) (freq 6) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:14728-14739]
| | | \ \ n lex(pain) (freq 389) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:6818-7212]
| | \ post_mods null (freq 1399) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:615-1621]
\ \ post_mods null (freq 1399) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:615-1621]

------------------------------- FILES -------------------------------

MED_ROLE_MARKED_SPECIALISED_DEFAULT: c:/cygwin/home/speech/speechtranslation/medslt2/eng/generatedfiles/med_role_marked_specialised_default.regulus

Preference information:

1.80 Rule frequency score
Total preference score: 1.80
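
To make the idea of a geometric mean of rule frequencies concrete, here is a minimal Python sketch. It is not the Regulus implementation (Regulus is not written in Python), and the names geometric_mean and tree_freqs are made up for illustration. The numbers are simply the 17 "freq" values copied from the parse tree above, in the order they appear. Whether Regulus counts every node this way, and how it scales the result into the displayed score, is an assumption I have not checked.

```python
import math

def geometric_mean(freqs):
    """Geometric mean of positive numbers, computed in log space.

    Averaging logarithms avoids the overflow that multiplying many
    large frequencies together could cause.
    """
    if not freqs:
        raise ValueError("need at least one frequency")
    if any(f <= 0 for f in freqs):
        raise ValueError("frequencies must be positive")
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))

# Hypothetical input: the "freq" values from the parse tree above,
# one per node, top to bottom.
tree_freqs = [836, 830, 622, 31, 461, 39, 314, 53, 31,
              1153, 63, 86, 63, 6, 389, 1399, 1399]

if __name__ == "__main__":
    print(f"geometric mean: {geometric_mean(tree_freqs):.2f}")
```

For these values the result comes out at roughly 180. A geometric mean is a natural fit here because one very rare rule (such as the "sharp" lexical entry, with frequency 6) pulls the score down more than it would with an arithmetic mean, so analyses built from commonly seen rules are preferred.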

The bad news: I was hoping this would solve an annoying problem in Eng/Spa bidirectional. Unfortunately, it doesn't seem to do that. No idea why this used to work, in fact!
