Thursday, 9 October 2008

Parsing non-top constituents

Following a conversation with Pierrette last week, I realised that there is an easy way to parse non-top constituents in the left-corner (LC, i.e. normal) parser as well as in the DCG one. I have just checked in a first version of the new functionality. Now, when you load a grammar using the LOAD command, an extra file of dummy rules is generated and added to the rules you specified explicitly. There is one dummy rule for each category Cat in the grammar, of the form

dummy_top:[sem=Sem] --> Cat, Cat:[sem=Sem]

For example, the dummy rule for 'np' is

dummy_top:[sem=Sem] --> np, np:[sem=Sem]
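
To make the schema concrete, here is a minimal Python sketch that expands it into one dummy rule per category. It is only an illustration of the rule format above, not the code that was checked in. The function names dummy_top_rule and dummy_top_rules are hypothetical, and the list of categories is assumed to be available already.

```python
def dummy_top_rule(cat):
    """Build the dummy top-level rule for one category.

    The first occurrence of the category name on the right-hand side is the
    literal word typed by the user; the second is the constituent itself,
    whose semantics are passed up unchanged.
    """
    return f"dummy_top:[sem=Sem] --> {cat}, {cat}:[sem=Sem]"


def dummy_top_rules(categories):
    """Return one dummy rule per distinct category, in sorted order."""
    # sorted(set(...)) removes duplicates and keeps the output stable
    return [dummy_top_rule(cat) for cat in sorted(set(categories))]


if __name__ == "__main__":
    # Hypothetical category list; a real grammar would supply its own.
    for rule in dummy_top_rules(["np", "pp", "nbar"]):
        print(rule)
```

Each category gets exactly one rule, so a category that appears several times in the input still yields a single dummy rule.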

What this means is that you can now parse NPs at top-level by simply prefacing them with the word 'np', which the dummy rule consumes as an ordinary lexical item. For instance, in Calendar we can do things like the following:

>> np the last meeting in geneva
(Parsing with left-corner parser)

Analysis time: 0.97 seconds

Return value: [[at_loc,[[spec,name],[head,geneva]]],[head,meeting],[spec,the_last]]

Global value: []

Syn features: []

Parse tree:

/ lex(np)
| np [GENERAL_ENG:2026-2044]
| / np [GENERAL_ENG:1849-1863]
| | / d lex(the) lex(last) [GEN_ENG_LEX:355-355]
| | | nbar [GENERAL_ENG:2071-2083]
| | \ n lex(meeting) [CALENDAR_LEX:88-89]
| | post_mods [GENERAL_ENG:1591-1680]
| | / pp [GENERAL_ENG:1747-1765]
| | | / p lex(in) [CALENDAR_LEX:151-151]
| | | | np [GENERAL_ENG:1955-1963]
| | | \ name lex(geneva) [GENERATED_NAMES:41-41]
\ \ \ post_mods null [GENERAL_ENG:1410-1416]

------------------------------- FILES -------------------------------

CALENDAR_DUMMY_TOP_LEVEL_RULES: c:/cygwin/home/speech/regulus/examples/calendar/generated/calendar_dummy_top_level_rules.regulus
CALENDAR_LEX: c:/cygwin/home/speech/regulus/examples/calendar/regulus/calendar_lex.regulus
GENERAL_ENG: c:/cygwin/home/speech/regulus/grammar/general_eng.regulus
GENERATED_NAMES: c:/cygwin/home/speech/regulus/examples/calendar.regulus
GEN_ENG_LEX: c:/cygwin/home/speech/regulus/grammar/gen_eng_lex.regulus

Semantic triples: []

No preferences apply

I should be able to improve this a little, in particular by adding some functionality to display the features on the non-top constituent as well as the semantics, but hopefully the existing version will already be quite useful.

