Friday 19 December 2008

Parsing top-level constituents and treebank caching

Since parsing with non-top constituents had a bad effect on efficiency, the LOAD command has gone back to loading the normal grammar, as it did before. If you want to be able to parse with non-top constituents, I have introduced a new command, LOAD_DEBUG, which loads an extended version of the grammar suitable for debugging. You are advised not to use this for creating specialised grammars.

The initial version of this functionality turned out to have a rather nasty bug, which I think is the thing that got Beth Ann yesterday... there was an incorrect interaction between LOAD_DEBUG, grammar caching and treebank caching. This meant that treebank creation sometimes incorrectly concluded that the grammar rules had changed when in fact they hadn't, and unnecessarily reparsed the training corpus. Beth Ann, please update from Regulus and see if this is now fixed!

Tuesday 28 October 2008

Improvement to dynamic lexicon functionality

I have just checked in some improvements to dynamic lexicons, which should considerably reduce the number of external files created at runtime. Hopefully this will improve recognition response times, but so far I don't have a non-trivial dynamic lexicon application to test on - so I would appreciate feedback from the Ford project. In particular, please let me know at once if anything appears to be broken. If necessary, you can reverse the change by reverting the file $REGULUS/Prolog/dynamic_lexicon.pl to the previous version.

Saturday 25 October 2008

Default parse preferences for specialised grammars

Following a discussion with Pierrette, I have added default parse preferences for specialised grammars, based on the geometric mean of the rule frequencies as observed in the training corpus. This is what we have been doing for some time in generation. To get the new functionality, you need to update Regulus and remake the specialised grammar you are using. Most of the time, you shouldn't notice anything new, except that the rule frequencies will be displayed in the parse trees, as in the following example:

>> is it a sharp pain
(Parsing with left-corner parser)

Analysis time: 0.12 seconds

Return value: [(object=[adj,sharp]), (agent=[pronoun,it]), (object=[secondary_symptom,pain]),
(null=[tense,present]), (null=[utterance_type,ynq]), (null=[verb,be]),
(null=[voice,active])]

Global value: []

Syn features: []

Parse tree:

.MAIN (freq 836) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:2629-3470]
top (freq 830) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:3471-4306]
utterance (freq 622) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:4307-4934]
s (freq 31) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:11277-11313]
/ vbar (freq 461) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:5947-6413]
| / v lex(is) (freq 39) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:10819-10863]
| | np (freq 314) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:7213-7532]
| \ pronoun lex(it) (freq 53) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:10336-10394]
| tmp_cat_12 (freq 31) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:11314-11317]
| / np (freq 1153) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:1622-2628]
| | / np (freq 63) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:9998-10066]
| | | / d lex(a) (freq 86) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:9110-9201]
| | | | tmp_cat_6 (freq 63) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:10067-10070]
| | | | / adj lex(sharp) (freq 6) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:14728-14739]
| | | \ \ n lex(pain) (freq 389) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:6818-7212]
| | \ post_mods null (freq 1399) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:615-1621]
\ \ post_mods null (freq 1399) [MED_ROLE_MARKED_SPECIALISED_DEFAULT:615-1621]

------------------------------- FILES -------------------------------

MED_ROLE_MARKED_SPECIALISED_DEFAULT: c:/cygwin/home/speech/speechtranslation/medslt2/eng/generatedfiles/med_role_marked_specialised_default.regulus

Preference information:

1.80 Rule frequency score
Total preference score: 1.80
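
As an aside, the rule-frequency score can be sketched as follows. This is a hedged illustration in Python, not the actual Regulus implementation: the exact normalisation (log base, treatment of zero counts) is an assumption here, so don't expect it to reproduce the 1.80 above exactly.

```python
import math

def rule_frequency_score(frequencies):
    """Geometric-mean score over the per-rule frequencies shown in a
    parse tree. Taking the mean of the log10 counts is equivalent to
    taking log10 of the geometric mean; zero counts are skipped.
    (Illustrative only -- the normalisation Regulus actually uses
    may differ.)"""
    logs = [math.log10(f) for f in frequencies if f > 0]
    return sum(logs) / len(logs) if logs else 0.0
```

For instance, `rule_frequency_score([836, 830, 622, 31])` gives the mean log10 frequency of those four rules.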

The bad news: I was hoping this would solve an annoying problem in Eng/Spa bidirectional. Unfortunately, it doesn't seem to do that. No idea why this used to work, in fact!

Sunday 12 October 2008

Parsing non-top constituents (continued)

I have now checked in an improved version of the functionality for parsing non-top constituents, which hides the dummy rules and shows the features for the constituent. Here are a couple of examples from Toy1Specialised:

>> np the light in the kitchen
(Parsing with left-corner parser)

Analysis time: 0.55 seconds

Return value: [[device,light],[location,kitchen],[prep,in_loc],[spec,the_sing]]

Global value: []

Syn features: [agr=3/\sing,case=A,conj=n,def=y,gapsin=B,gapsout=B,n_appositive_mod_type=none,
n_of_mod_type=none,nform=normal,pronoun=n,sem_n_type=dimmable\/switchable,
syn_type=np_with_noun,takes_about_pp=n,takes_attrib_pp=n,takes_cost_pp=n,
takes_date_pp=n,takes_duration_pp=n,takes_frequency_pp=n,takes_from_pp=n,
takes_loc_pp=n,takes_partitive=n,takes_passive_by_pp=none,takes_post_mods=n,
takes_side_pp=n,takes_time_pp=n,takes_to_pp=n,takes_with_pp=n,wh=n]

Parse tree:

np [GENERAL_ENG:2026-2044]
/ np [GENERAL_ENG:1864-1874]
| / d lex(the) [GEN_ENG_LEX:341-344]
| | nbar [GENERAL_ENG:2071-2083]
| \ n lex(light) [TOY1_LEX:44-47]
| post_mods [GENERAL_ENG:1591-1680]
| / pp [GENERAL_ENG:1747-1765]
| | / p lex(in) [TOY1_LEX:51-58]
| | | np [GENERAL_ENG:2026-2044]
| | | / np [GENERAL_ENG:1864-1874]
| | | | / d lex(the) [GEN_ENG_LEX:341-344]
| | | | | nbar [GENERAL_ENG:2071-2083]
| | | | \ n lex(kitchen) [TOY1_LEX:38-39]
| | \ \ post_mods null [GENERAL_ENG:1410-1416]
\ \ post_mods null [GENERAL_ENG:1410-1416]

------------------------------- FILES -------------------------------
GENERAL_ENG: c:/cygwin/home/speech/regulus/grammar/general_eng.regulus
GEN_ENG_LEX: c:/cygwin/home/speech/regulus/grammar/gen_eng_lex.regulus
TOY1_LEX: c:/cygwin/home/speech/regulus/examples/toy1specialised/regulus/toy1_lex.regulus

>> n light
(Parsing with left-corner parser)

Analysis time: 0.02 seconds

Return value: [[device,light]]

Global value: []

Syn features: [agr=3/\sing,conj=n,n_appositive_mod_type=none,n_of_mod_type=none,
n_post_mod_type=none,n_pre_mod_type=loc,sem_n_type=dimmable\/switchable,
takes_about_pp=n,takes_attrib_pp=n,takes_cost_pp=n,takes_date_pp=n,takes_det_type=def,
takes_duration_pp=n,takes_frequency_pp=n,takes_from_pp=n,takes_loc_pp=y,
takes_partitive=n,takes_passive_by_pp=none,takes_side_pp=n,takes_time_pp=n,
takes_to_pp=n,takes_with_pp=n]

Parse tree:

n lex(light) [TOY1_LEX:44-47]

------------------------------- FILES -------------------------------

TOY1_LEX: c:/cygwin/home/speech/regulus/examples/toy1specialised/regulus/toy1_lex.regulus

Thursday 9 October 2008

Parsing non-top constituents

Following a conversation with Pierrette last week, I realised that there was an easy way to fix things so that we can parse non-top constituents in the LC (normal) parser, as well as the DCG one. I have just checked in a first version of the new functionality. Now, when you load a grammar using the LOAD command, an extra file of dummy rules is created and added to the ones explicitly specified. There is one dummy rule for each category Cat in the grammar, of the form

dummy_top:[sem=Sem] --> Cat, Cat:[sem=Sem]

For example, the dummy rule for 'np' is

dummy_top:[sem=Sem] --> np, np:[sem=Sem]

What this means is that you can now parse NPs at top-level by simply prefacing them with the word 'np'. Thus for instance in Calendar we can do things like the following:

>> np the last meeting in geneva
(Parsing with left-corner parser)

Analysis time: 0.97 seconds

Return value: [[at_loc,[[spec,name],[head,geneva]]],[head,meeting],[spec,the_last]]

Global value: []

Syn features: []

Parse tree:

.MAIN [CALENDAR_DUMMY_TOP_LEVEL_RULES:1-1]
dummy_top [CALENDAR_DUMMY_TOP_LEVEL_RULES:22-22]
/ lex(np)
| np [GENERAL_ENG:2026-2044]
| / np [GENERAL_ENG:1849-1863]
| | / d lex(the) lex(last) [GEN_ENG_LEX:355-355]
| | | nbar [GENERAL_ENG:2071-2083]
| | \ n lex(meeting) [CALENDAR_LEX:88-89]
| | post_mods [GENERAL_ENG:1591-1680]
| | / pp [GENERAL_ENG:1747-1765]
| | | / p lex(in) [CALENDAR_LEX:151-151]
| | | | np [GENERAL_ENG:1955-1963]
| | | \ name lex(geneva) [GENERATED_NAMES:41-41]
\ \ \ post_mods null [GENERAL_ENG:1410-1416]

------------------------------- FILES -------------------------------

CALENDAR_DUMMY_TOP_LEVEL_RULES: c:/cygwin/home/speech/regulus/examples/calendar/generated/calendar_dummy_top_level_rules.regulus
CALENDAR_LEX: c:/cygwin/home/speech/regulus/examples/calendar/regulus/calendar_lex.regulus
GENERAL_ENG: c:/cygwin/home/speech/regulus/grammar/general_eng.regulus
GENERATED_NAMES: c:/cygwin/home/speech/regulus/examples/calendar.regulus
GEN_ENG_LEX: c:/cygwin/home/speech/regulus/grammar/gen_eng_lex.regulus

Semantic triples: []

No preferences apply

I should be able to improve this a little, in particular by adding some functionality to display the features on the non-top constituent as well as the semantics, but hopefully the existing version will already be quite useful.
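
The underlying dummy-rule construction (one rule per grammar category, as described in the entry below) is simple enough to sketch. The rule syntax is taken from the examples in that entry; the function itself is purely illustrative:

```python
def dummy_rules(categories):
    """Generate one dummy top-level rule per grammar category.
    Prefacing an utterance with the category name then lets the LC
    parser treat that category as top-level."""
    template = "dummy_top:[sem=Sem] --> {cat}, {cat}:[sem=Sem]"
    return [template.format(cat=c) for c in categories]
```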

Thursday 18 September 2008

Dynamic Regulus lexicon entries

Regulus now includes an interface to Nuance dynamic grammar capabilities, making it possible in effect to add new lexicon entries at runtime. Dynamic lexicon entries need to be defined using macros which have been declared dynamic in the Regulus source file.

I have checked in a sample application in $REGULUS/Examples/Toy1SpecialisedDynamic; there is basic documentation in doc/README.txt. The application uses a version of the Toy1Specialised grammar in which commands need to be prefaced by a name. The user can dynamically add new names to the recognition vocabulary while the application is running. The following extract from the lexicon file shows the macro and declaration for the dynamic name entries:

macro(person_name(Surface, Sem),
      @name(Surface, [Sem], [agent], sing, [])).

dynamic_lexicon( @person_name(Surface, Sem) ).

At runtime, new name entries can be added using calls to the predicate assert_dynamic_lex_entry/1. A typical call might look like this:

assert_dynamic_lex_entry( @person_name((howard, the, duck), howard_the_duck))

Note that the infrastructure needed to run dynamic applications is somewhat different from the standard one. In particular, it is necessary to use a Resource Manager and a Compilation Server, and compile a dummy "just-in-time" recognition package. The sample application gives examples of the scripts required. I will be checking in proper documentation soon, and will post again when I have done that.

Thursday 11 September 2008

Incremental treebanking for grammar specialisation

I have just checked in some new code, which should make the process of creating a specialised grammar much more efficient. The most time-consuming part of the process is parsing the treebank, using the EBL_TREEBANK command, or commands like EBL_ANALYSIS which call it indirectly. Until now, the whole set of training sentences had to be parsed every time. This was wasteful, since the greater part of the parses in the existing treebank were often still valid.

The new functionality improves the picture by trying to determine which parses can be kept, and only reparsing the remaining ones. The current rules for determining which new parses are required are as follows:
  • After each invocation of EBL_TREEBANK, Regulus saves both the treebank and a copy of the grammar used to create it. The next time EBL_TREEBANK is called, the system compares the saved grammar and treebank with the current grammar and training corpus.
  • The grammar comparison determines two things: 1) Have any non-lexical rules changed? 2) If only lexical rules have changed, which lexical items are affected?
  • If non-lexical rules have changed, the whole treebank needs to be reparsed. Most often, however, this is not the case. If no rules, or only lexical rules, have changed, the treebank is incrementally updated as follows.
  • Any items in the treebank which correspond to sentences no longer in the current training corpus are removed.
  • Any items in the current training corpus which do not occur in the old treebank are parsed and added to the new treebank.
  • Any items in the treebank which include changed lexical items are reparsed and added to the new treebank.
  • All remaining items in the old treebank are kept.
You need to update Regulus to get the new functionality. Note that nothing will happen the first time you do EBL_TREEBANK after the update, since the old copy of the grammar is saved after EBL_TREEBANK is invoked, and you will not originally have an old saved grammar. So you will only notice a difference the second time you do EBL_TREEBANK.
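
For anyone who wants the decision logic at a glance, here is a minimal Python sketch of the update rules listed above. The grammar comparison itself is abstracted into two inputs (a flag for non-lexical changes and a set of changed lexical items); all the names are illustrative, not the actual Regulus code.

```python
def update_treebank(old_treebank, corpus, parse,
                    non_lexical_changed, changed_words):
    """Incremental treebank update.

    old_treebank:        dict mapping sentence -> parse from the last run
    corpus:              current training sentences
    parse:               function reparsing one sentence
    non_lexical_changed: True if any non-lexical rule changed
    changed_words:       set of lexical items whose entries changed
    """
    if non_lexical_changed:
        # Any non-lexical rule change forces a full reparse.
        return {s: parse(s) for s in corpus}
    new_treebank = {}
    for s in corpus:
        if s not in old_treebank or any(w in changed_words for w in s.split()):
            new_treebank[s] = parse(s)          # new or lexically affected item
        else:
            new_treebank[s] = old_treebank[s]   # keep the old parse
    return new_treebank
```

Sentences dropped from the training corpus simply never make it into the new treebank, which implements the removal rule.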

I have done some testing, and things appear OK, but I know from experience that this kind of non-monotonic code often contains subtle bugs which aren't immediately apparent. Please let me know if things don't work as expected, and I will give priority to sorting out problems. If necessary, you can toggle the incremental treebanking functionality using the new commands INCREMENTAL_TREEBANKING_OFF and INCREMENTAL_TREEBANKING_ON. By default, incremental treebanking is on.

Thursday 24 July 2008

Documentation for NUANCE_PARSER command

I've added some basic documentation for the new NUANCE_PARSER command. Here's what you get when you access it using DOC:

>> DOC NUANCE_PARSER
(Print documentation for command or config file entry)

NUANCE_PARSER
[Brief doc: Start new Nuance nl-tool process and use it as parser]

Start an nl-tool process, and use it to do parsing. Any old nl-tool processes
are first killed. The current config file needs to include either a
dialogue_rec_params declaration (for dialogue apps) or a translation_rec_params
declaration (for speech translation apps); the declaration must
contain definitions for 'package' and 'grammar'. The following is a
typical example of a suitable declaration:

regulus_config(dialogue_rec_params,
[package=calendar_runtime(recogniser), grammar='.MAIN',
'rec.Pruning=1600', 'rec.DoNBest=TRUE', 'rec.NumNBest=6']).

Notes:

- After NUANCE_PARSER is successfully invoked, nl-tool is used for
ALL parsing, including batch processing with commands like TRANSLATE_CORPUS
and Prolog calls to parse_with_current_parser/6.
- The Nuance parser only returns logical forms, not parse trees.

Thursday 17 July 2008

More improvements to Nuance documentation

You can now access documentation about config file entries from the Regulus top level. The new command HELP_CONFIG looks for information about config file entries, and DOC shows documentation for both commands and config entries. Here's an example:

>> HELP_CONFIG nuance
(Print help for config file entries whose name or description match the string)

7 config file entries matching "nuance":

ebl_nuance_grammar
nuance_compile_params
nuance_grammar
nuance_grammar_for_compilation
nuance_grammar_for_pcfg_training
nuance_language_pack
nuance_recognition_package

>> DOC nuance_grammar
(Print documentation for command or config file entry)

nuance_grammar
Points to the Nuance GSL grammar produced by the NUANCE command.

Wednesday 16 July 2008

Improvements to Regulus documentation

I have been doing some work on and off over the last few weeks to try and improve the Regulus documentation. It's one of those "important non-urgent" tasks that is very hard to schedule, because you always feel you have something that should take higher priority, but I do finally seem to have made some concrete progress.

There are three parts to the work, which are meant to be closely interlinked. First, I have created a directory under Regulus/doc called CommandDoc, which is supposed to contain one short file for each command and type of config file entry. I've so far populated it with the information in RegulusDoc.html, which certainly confirmed that RegulusDoc is badly out of date... I'm afraid half the files are currently empty.

Second, I have added a new top-level Regulus command called DOC. If you type DOC followed by the name of a command, you get the CommandDoc file printed out in a reasonably readable way. For example:

>> DOC LOAD_DIALOGUE
(Print documentation for command)

LOAD_DIALOGUE
[Brief doc: Load dialogue-related files]
Compile the files defined by the dialogue_files config file entry.

>> DOC EBL_NUANCE
(Print documentation for command)

EBL_NUANCE
[Brief doc: Compile current specialised Regulus grammar into Nuance GSL form]
Compile current specialised Regulus grammar into Nuance GSL form. Same
as the NUANCE command, but for the specialised grammar. The input is
the file created by the EBL_POSTPROCESS command; the output Nuance GSL
grammar is placed in the file defined by the ebl_nuance_grammar config
file entry.

>> DOC TRANSLATE_CORPUS
(Print documentation for command)

TRANSLATE_CORPUS
[Brief doc: Process text translation corpus]

Process the default text mode translation corpus, defined by the
translation_corpus config file entry. The output file, defined by
the translation_corpus_results config file entry, contains
question marks for translations that have not yet been judged. If
these are replaced by valid judgements, currently 'good', 'ok' or
'bad', the new judgements can be incorporated into the translation
judgements file (defined by the translation_corpus_judgements
config file entry) using the command
UPDATE_TRANSLATION_JUDGEMENTS.

TRANSLATE_CORPUS <Arg>
[Brief doc: Process text translation corpus with specified ID]

Parameterised version of TRANSLATE_CORPUS. Process the text mode
translation corpus with ID <Arg>, defined by the
parameterised config file entry
translation_corpus(<Arg>). The output file, defined
by the parameterised config file entry
translation_corpus_results(<Arg>), contains
question marks for translations that have not yet been judged. If
these are replaced by valid judgements, currently 'good', 'ok' or
'bad', the new judgements can be incorporated into the translation
judgements file (defined by the translation_corpus_judgements
config file entry) using the parameterised command
UPDATE_TRANSLATION_JUDGEMENTS <Arg>.


Third, and last, I have also arranged things so that the doc files are automatically included in the new Cookbook. This is still just a skeleton, but my plan is to start by completing the command and config-file section, so that the book will immediately be useful for something, and then work outwards from there. The PDF version is checked in as Regulus/doc/Cookbook/draft_cookbook.pdf.

Sunday 13 July 2008

Substitutable help classes

Following a discussion with Nikos last month, I've added a new feature to the help system, so that help examples can be modified to be closer to the recognition result. Recall that the help system assumes that the designer will have declared a set of help classes; each class C defines a set of phrases P(C). When choosing a help match, both the recognition result and the help examples are backed off so that, for each class C, phrases in P(C) are replaced by C.

The new functionality I've just added makes it possible to declare some help classes as "substitutable". Suppose that class C is defined as substitutable, that the phrase P1 in the recognition result is backed off to C, and that the phrase P2 in a matched help example H is also backed off to C. In this case, H will not be presented in its original form, but with P2 substituted by P1. Evidently, not all help classes can be defined as substitutable, since it's essential that all the words in a substitutable class have exactly the same syntactic properties.
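
To make the mechanism concrete, here is a small Python sketch of back-off and substitution. It simplifies the real system in several ways (classes match single tokens only, and one binding per class is kept), so treat it as an illustration of the idea rather than the Regulus implementation.

```python
def back_off(tokens, classes):
    """Replace any phrase in P(C) with the class name C, recording the
    phrase used for each class. classes: {name: set of phrases}; phrases
    are single tokens here for simplicity (multi-word phrases would need
    a longest-match scan)."""
    backed, bindings = [], {}
    for t in tokens:
        for name, phrases in classes.items():
            if t in phrases:
                backed.append(name)
                bindings[name] = t
                break
        else:
            backed.append(t)
    return backed, bindings

def present(example_tokens, classes, substitutable, rec_bindings):
    """Render a matched help example, substituting the recognition
    result's phrase for the example's phrase wherever a substitutable
    class was matched."""
    backed, _ = back_off(example_tokens, classes)
    out = []
    for orig, cls in zip(example_tokens, backed):
        if cls in substitutable and cls in rec_bindings:
            out.append(rec_bindings[cls])
        else:
            out.append(orig)
    return out
```

With person_name declared substitutable, backing off the recognition result "what meetings has nikos been to" binds person_name to "nikos", and a matched example containing "elisabeth" is then presented with "nikos" substituted.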

There are however some important classes which can in general be made substitutable, in particular (at least in English) names for specific types of individual, plural numbers, days of the week and months of the year. I've tested the new functionality on the Calendar app, and it does indeed seem to give considerably more useful responses. Here's an example. Without substitutable classes, the sentence "what meetings has nikos been to" gets the help responses

#1 : "what meetings has pierrette attended in geneva"
"what meeting_noun has person_name attend_verb preposition loc_name" (backed off)
#2 : "which meetings has elisabeth attended"
"which meeting_noun has person_name attend_verb" (backed off)
#3 : "what meetings is pierrette going to attend in geneva"
"what meeting_noun is person_name going to attend_verb preposition loc_name" (backed off)
#4 : "what meetings is pierrette going to attend"
"what meeting_noun is person_name going to attend_verb" (backed off)
#5 : "what meetings have there been in geneva"
"what meeting_noun have there been preposition loc_name" (backed off)

With substitutable classes (person_name is one of them), the response is

#1 : "what meetings has nikos attended in geneva"
"what meeting_noun has person_name attend_verb preposition loc_name" (backed off)
#2 : "which meetings has nikos attended"
"which meeting_noun has person_name attend_verb" (backed off)
#3 : "what meetings is nikos going to attend in geneva"
"what meeting_noun is person_name going to attend_verb preposition loc_name" (backed off)
#4 : "what meetings is nikos going to attend"
"what meeting_noun is person_name going to attend_verb" (backed off)
#5 : "what meetings will nikos attend in geneva"
"what meeting_noun will person_name attend_verb preposition loc_name" (backed off)

I haven't done any systematic testing, but anecdotally this does seem to make help noticeably more responsive.

Friday 11 July 2008

MedSLT almost fully converted to AFF

I've made a lot of progress this week on converting the bidirectional English/Spanish version of MedSLT to AFF format. I've parameterised the Spanish system to support AFF representations, and added AFF versions of most of the necessary config files and scripts. In particular,
  • You can build a full AFF version of Spa by doing 'make role_marked' in Spa/scripts.
  • You can run interactive bidirectional AFF Eng/Spa text systems using the files load_bidirectional_role_marked.pl and load_bidirectional_restricted_role_marked.pl in EngSpa/scripts.
  • There are targets in EngSpa/scripts/Makefile for running AFF versions of the QA corpus, both plain and restricted, with the obvious naming conventions.
I've done preliminary testing, and everything should be checked in. There are still things missing, e.g. nothing so far in Spa/Spa for checking back-translation, but I figured it would be best at this point to hand over to Pierrette, so that she can refine the rules. I've only made some absolutely minimal changes, enough to check that a few sentences go through.

When this piece of work is finished, all of MedSLT should be available in AFF format, which means that we'll be able to retire the old linear version and only support one system. I'm hopeful that build times will then be low enough for us to go back to building and testing the system every night, as we used to do.

Tuesday 8 July 2008

N-best rescoring again

[Updated July 9]

I've just checked in new code that makes it possible to create training material for doing N-best rescoring on speech translation applications - the functionality is basically the same as what we already had for dialogue applications, but there were a number of details that had to be fixed. It seems that the potential for improving performance using N-best rescoring varies considerably between apps. So far, we've looked at the following cases:
  • Calendar: can already almost halve error rate using rescoring, more should be possible.
  • Ford app: almost no potential for improvement.
  • Paideia app: considerable potential for improvement (don't currently have figures).
  • English MedSLT: maximum possible improvement looks like about 10% relative.
  • French MedSLT: maximum possible improvement about 15-20% relative.
  • Japanese MedSLT: almost no potential for improvement.
The variation in behavior between the different apps is quite surprising. In particular, I don't yet have a good explanation for why the MedSLT languages should be so different.

Tuesday 24 June 2008

McNemar at the word level?

I was thinking about our paper for GoTAL, and one thing that's bothered me a little is that we did all the significance testing using SER - the reason was that it's easy to run a McNemar test. However, we got rather bigger improvements in WER, which is really what you would expect from SLMs.

It seems to me though that you should also be able to do McNemar at the word level. You look at each word in the transcription, and then check each of the two hypotheses you're comparing to see whether they include it. This is a little coarse-grained (you treat each sentence as a bag of words), but I'd guess it would still give interesting results. Shouldn't be at all hard to implement either. If we do an expanded version of the GoTAL paper, I'd definitely like to try this.

In fact, this idea is so obvious that either it's wrong, or someone must already have thought of it. Any idea which?

PS Jun 26. Beth Ann pointed out that the proposal as originally formulated only covered deletions, but it's trivial to extend it to do insertions too. More seriously, she wondered if the significance results would always be reliable, given that there may be subtle dependencies. I am really not sure about this, but one way to investigate the idea empirically would be to generate large sets of simulated recognition results using a stochastic process, and look at the distributions. For example, if you generate 10000 simulated recognition runs, then take one run and find all the other runs that come out as different from it at P < 0.01 according to the new statistic, you'd be reassured to find there were not more than 100 of them. A lot more, and something is presumably wrong. A lot less presumably just shows the test isn't very sensitive.
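
For concreteness, a minimal sketch of the word-level test. As in the original proposal it only counts deletions (each reference word is checked against the two hypotheses, bag-of-words per sentence); the extension to insertions is left out, and the continuity-corrected chi-square is one standard choice among several.

```python
from collections import Counter

def word_level_mcnemar(refs, hyps_a, hyps_b):
    """Word-level McNemar sketch. b = reference words system A got and
    B missed; c = the reverse; returns (b, c, chi-square statistic with
    continuity correction). Duplicate words are handled via counts."""
    b = c = 0
    for ref, ha, hb in zip(refs, hyps_a, hyps_b):
        ca, cb = Counter(ha.split()), Counter(hb.split())
        for w, n in Counter(ref.split()).items():
            for i in range(n):
                in_a, in_b = ca[w] > i, cb[w] > i
                if in_a and not in_b:
                    b += 1
                elif in_b and not in_a:
                    c += 1
    if b + c == 0:
        return b, c, 0.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    return b, c, chi2
```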

Faster parsing in Regulus using Nuance

Here's something I've been meaning to do for a while, that really should be moved up the priority stack. It should be quite easy to arrange things so that, in cases where we have compiled a grammar down to Nuance form, we use Nuance to do parsing - this ought to be much faster than the Regulus parser, and could really let us speed up corpus runs. There are at least two straightforward ways to implement it. One is to start an nl-tool process and pipe sentences into it, reading the analyses that come back. It may be even simpler to use the Regserver, now that we can connect to it from the Regulus top-level, and send an "interpret" message. More about this soon, I hope.

PS Jun 26. It was indeed very easy - I took the route of creating an nl-tool process and connecting to it with pipes. The new NUANCE_PARSER command now lets you use nl-tool as the parser. Parsing times are at least 30 times faster. Things should be checked in. More about this soon.
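
The pipe pattern itself is generic and easy to sketch. The snippet below deliberately leaves the actual nl-tool command line and output format as a caller-supplied placeholder, since those details depend on the Nuance installation; it just shows the start-once, one-line-in/one-line-out structure.

```python
import subprocess

def make_pipe_parser(cmd):
    """Start an external parser process once, then feed it one sentence
    per line and read one analysis line back per sentence. `cmd` is
    whatever command line starts the parser (placeholder here)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)
    def parse(sentence):
        proc.stdin.write(sentence + "\n")
        proc.stdin.flush()
        return proc.stdout.readline().rstrip("\n")
    return parse
```

For testing the plumbing, passing `["cat"]` as the command makes the wrapper simply echo its input back.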

Tuesday 17 June 2008

"Paraphrase corpora" for estimating semantic error rates

I've implemented a first cut at the "paraphrase corpus" idea that I suggested in yesterday's post. So far, it only works for speech translation, but it's rather nice to see that we can now measure the effect that N-best rescoring has on semantic error rate in a way that's both much quicker and much more objective than what we were doing previously. On the whole of the Eng corpus (the only one I've tried so far), semantic error rate on this metric is reduced by N-best rescoring by about 4% absolute, or 8% relative.

My next task here is to extend the method to dialogue processing - this should be easy, I think. We will then be able to do dialogue N-best rescoring experiments using out-of-coverage as well as in-coverage data, which should open up several new possibilities.

Monday 16 June 2008

Better ways to estimate semantic error rate

I've just added some code to automatically estimate semantic error rate for translation applications. It does more or less the same thing as the code we've had for a while in dialogue apps, and counts an example from a speech corpus as semantically correct if it produces the same interlingua as the transcription would have done.

Unfortunately, the problem with this definition is that it doesn't work for utterances that are in domain, but out of grammar coverage. For example, I was just looking through the results for the English MedSLT corpus. In one example, the transcription is "does the pain ache", which is out of grammar coverage. The first hypothesis which produces well-formed interlingua is "does the pain feel aching", which is a good paraphrase and is selected. So this should really be counted as semantically correct, but isn't.

I think we can address the problem by allowing the developer to declare a file of paraphrases, and counting the example as semantically correct if it gives the same result as either the actual transcription or one of its paraphrases. Then, provided the developer adds in-coverage paraphrases where they exist, things will work correctly. This should be easy to implement. We probably also want a warning if a declared paraphrase itself turns out to be out of coverage.
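
The proposed check is straightforward. Here is a hedged sketch, with `interlingua_of` standing in for whatever produces interlingua from a transcription (returning None when the input is out of coverage); all names are illustrative.

```python
def semantically_correct(rec_interlingua, transcription, paraphrases,
                         interlingua_of):
    """An example counts as semantically correct if recognition yields
    the same interlingua as the transcription or any declared paraphrase
    of it. A paraphrase that is itself out of coverage (interlingua_of
    returns None) is silently skipped; the warning suggested above
    could be raised at that point."""
    candidates = [transcription] + paraphrases.get(transcription, [])
    for sent in candidates:
        target = interlingua_of(sent)
        if target is not None and target == rec_interlingua:
            return True
    return False
```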

This paraphrase functionality should also be useful for the N-best rescoring work that Maria and I have been doing for dialogue apps. We have the same problem there - we want to be able to experiment with out of coverage examples, but currently get no figures.

Wednesday 11 June 2008

Interlingua corpora for multiple domains

Following a discussion with Pierrette last week, I have added two more MedSLT Interlingua corpora, for the chest pain and abdominal pain domains. I've also added all the associated config files, scripts etc. for the currently relevant language pairs (EngInt, JapInt, IntEng, IntFre and IntJap), so it should now be possible to do systematic interlingua-centered development for all three domains. I have only built AFF versions, since we're planning to retire the linear formalism soon.

The naming conventions are the usual ones. Hopefully I've managed to check everything in, but let me know if files you expected to find are missing. Pierrette should at some point tidy up IntFre and FreInt, and Yukie should do the same for IntJap and JapInt. Further down the line, we should really add coverage for these domains in the missing languages.

Tuesday 10 June 2008

AFF version of Catalan

I've added initial versions of all the files needed for the AFF version of Catalan in MedSLT. Naming conventions are the usual ones, and I was able to build all the AFF Cat resources by doing

make role_marked

in the Cat/scripts directory. There should now be config files for all 5 x 5 = 25 pairs of languages in {Ara, Cat, Eng, Fre, Jap} - this involved adding a few new pairs. I only tested Interlingua to Catalan and Catalan to Interlingua. We currently get translations for about 75% of the sentences in IntCat, and about 20% in CatInt. Hopefully it will be easy to improve these figures.

Over to Pierrette and Bruna to debug the rules. Note that I have macrotised the Cat lexicon to make the AFF version work. It should be mostly OK, but there were a few cases (in particular, WH+ PPs) where I wasn't quite sure how to do the macrotisation - people who actually know Catalan should review the entries.

Wednesday 4 June 2008

Regulus 2.9.0 released

Nikos has just created and uploaded the new 2.9.0 release of Regulus. I tried downloading and running a couple of simple tests in text and speech mode (under SICStus 4.0.3), and Toy1 at least appears to work fine. Please mail me if you notice problems.

Here are the release notes:

MAIN CHANGES TO REGULUS BETWEEN 2.8.0 AND 2.9.0

A large number of new features have been added to Regulus since
version 2.8.0. Most importantly, Regulus now runs under Sicstus 4;
it is possible to use speech input directly from the top-level;
N-best processing is supported in both dialogue and translation mode;
and a new semantics for translation applications has been added.

The new features are listed below in more detail. Not all of them are
fully documented yet, but we are giving priority to adding the
necessary documentation.

- Support for Sicstus 4
- Regulus runs under Sicstus 4.
- It has been thoroughly tested under 4.0.2.
- Some testing has been done under 4.0.3, but this has not yet been carefully
verified. NOTE: under 4.0.3, it is necessary to load the patch files in
Prolog/SicstusPatches/4.0.3
- Regulus still runs under Sicstus 3, and has been thoroughly
tested under 3.12.5.

- Top-level
- Errors are now written to stderr
- There is a version of regulus_batch with an extra argument, which returns
the list of error outputs created when running the commands.
- It is possible to compile Nuance grammars from the Regulus top-level
using the NUANCE_COMPILE command.
- It is possible to perform speech recognition directly from the top-level
- The LOAD_RECOGNITION command starts defined speech resources, including
a license manager, recserver and Regserver
- After loading resources using LOAD_RECOGNITION, the RECOGNISE command
takes live speech input and passes it to the current application.
- Wavfiles are automatically logged by RECOGNISE. The WAVFILES command
lists the most recent recorded wavfiles.
- When speech resources are loaded, text input of the form

WAVFILE: <wavfile>

performs recognition on <wavfile>, and passes the result to the
current application

- Java GUI
- The Java GUI has been greatly improved, and many bugs have been fixed.
- The GUI supports direct speech input, similar to the Prolog top-level
described above
- It is possible to run multiple copies of the GUI at the same time.

- Stepper
- The commands LOAD, LOAD_GENERATION, EBL_LOAD and EBL_LOAD_GENERATION
can be invoked from within the stepper.

- Support for spoken dialogue applications
- When speech resources have been loaded from the command line,
dialogue corpora can contain items of the form wavfile(<wavfile>).
This makes it possible to test corpora containing a mixture of speech
and non-speech inputs.
- Batch processing of speech input in dialogue mode produces figures
for semantic error rate. An utterance is deemed semantically correct if
it produces the same dialogue move as the transcription would have done.
- A timeout has been added in batch dialogue processing, so that processing
gives up after 10 seconds.
- If N-best preferences are defined, preference info is printed in
dialogue mode.
- Allow dialogue server to take XML-formatted requests
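The semantic error rate described above can be sketched as follows. This is a minimal illustration, not the Regulus code: dialogue moves are assumed to be representable as values that can be compared with equality.

```python
def semantic_error_rate(results):
    """results: list of (move_from_recognition, move_from_transcription).

    An utterance counts as semantically correct if recognition produced
    the same dialogue move as the transcription would have done."""
    errors = sum(1 for rec_move, ref_move in results if rec_move != ref_move)
    return errors / len(results)

# Toy example: 1 of 4 utterances yields the wrong dialogue move -> SER 0.25
data = [("whq_meetings", "whq_meetings"),
        ("ynq_meeting", "ynq_meeting"),
        ("show_meetings", "whq_meetings"),   # recognition changed the move
        ("whq_start_time", "whq_start_time")]
print(semantic_error_rate(data))  # 0.25
```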

- Generation
- When the declaration

regulus_config(prolog_semantics, yes).

is included, generation grammars can contain arbitrary Prolog structures.

- Translation
- There is extensive support for translation using both the original
"linear" semantics, and also the new "Almost Flat Functional" (AFF)
semantics. AFF is described in our COLING 2008 paper, which will soon be
posted on the Regulus website. Some initial documentation will be added
to RegulusDoc.htm.
- It is possible in a translation config file to define an interlingua
as either a source or a target language. There are many examples
in the MedSLT project directory.
- Batch translation produces output files for judging both in Prolog
and in CSV form. There are new commands for updating judgements from the
CSV files.
- When speech resources have been loaded from the command line,
translation corpora can contain items of the form wavfile(<wavfile>).
- A simple version of N-best processing has been added for applications
that use interlingual translation with an interlingua grammar. In N-best mode,
the first utterance producing well-formed interlingua is selected.
- Interlingua expressions ambiguous according to the interlingua grammar
are flagged in translation mode.
- If performing batch translation from Source to Target through Interlingua,
combine available Source -> Interlingua and Interlingua -> Target
judgements into Source -> Target judgements if possible.
- Show average number of generated target language surface forms when
doing batch translation.
- Translation conditions can include elements of the form

context_below(<Element>)

This matches an <Element> in a lower clause.
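The simple N-best strategy described above (select the first hypothesis that produces well-formed interlingua) can be sketched like this; `select_nbest`, `translations` and `well_formed` are hypothetical names standing in for the real recogniser output and interlingua grammar check:

```python
def select_nbest(hypotheses, to_interlingua, is_well_formed):
    """Walk the N-best list in rank order and return the first hypothesis
    whose interlingua translation is well-formed; if none is, fall back
    to the top hypothesis."""
    for hyp in hypotheses:
        if is_well_formed(to_interlingua(hyp)):
            return hyp
    return hypotheses[0]

# Toy illustration: hyp1 translates to an ill-formed fragment, hyp2 to
# well-formed interlingua, so hyp2 is chosen despite its lower rank.
translations = {"hyp1": "pain be",   # ill-formed fragment
                "hyp2": "WH-QUESTION pain be where PRESENT ACTIVE"}
well_formed = {"WH-QUESTION pain be where PRESENT ACTIVE"}
chosen = select_nbest(["hyp1", "hyp2"],
                      translations.get,
                      lambda i: i in well_formed)
print(chosen)  # hyp2
```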

- Grammar specialisation
- Fix bug in processing of include_lex declarations.

- Help
- When defining intelligent help for translation applications, help resources
can be built from an interlingua corpus.

- Extension to Regulus grammar formalism
- Allow =@ as synonym for = @
- Add runtime support for GSL functions strcat/2, add/2, sub/2, neg/1, mul/2, div/2
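Those GSL functions are simple arithmetic and string primitives; a hypothetical evaluator (not the Regulus implementation) makes their intended semantics concrete:

```python
# Hypothetical evaluator for the GSL functions listed above, with
# expressions represented as nested tuples like ("add", 2, 3).
GSL_FUNCS = {
    "strcat": lambda a, b: str(a) + str(b),
    "add":    lambda a, b: a + b,
    "sub":    lambda a, b: a - b,
    "neg":    lambda a: -a,
    "mul":    lambda a, b: a * b,
    "div":    lambda a, b: a / b,
}

def eval_gsl(expr):
    """Recursively evaluate a GSL function expression."""
    if not isinstance(expr, tuple):
        return expr                      # atom: a number or string
    op, *args = expr
    return GSL_FUNCS[op](*(eval_gsl(a) for a in args))

print(eval_gsl(("add", ("mul", 2, 3), ("neg", 1))))  # 5
print(eval_gsl(("strcat", "twenty ", "one")))        # twenty one
```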

- English grammar
- Rules for dates including years have been added.

- Other
- Tool added to perform random generation from PCFG-trained GSL grammars

Monday 2 June 2008

Problems with SICStus 4.0.3 resolved

The SICStus people were as usual very responsive, and we now seem to be OK for running under 4.0.3. However, (this is IMPORTANT), you need to install a couple of patch files if you are using that version of SICStus. So, if you're using 4.0.3, do the following:
  • Update Regulus from CVS, using the -d option to get new directories.
  • Copy the files from Prolog/SicstusPatches/4.0.3 to C:/Program Files/SICStus Prolog 4.0.3/library, or wherever you have your copy of SICStus.
I will set my default version of SICStus to 4.0.3, which means I'll no doubt test it a fair amount over the next few days. I would not recommend people to switch over to 4.0.3 until I've run with it a while and reported on how it's working.

Problems with SICStus 4.0.3

We are unfortunately still having problems with SICStus 4. Things have been more or less stable with 4.0.2, but there were a few rather ugly patches - the SICStus people said things would be better in the next version. Sad to tell, I have just downloaded 4.0.3 and tried it out, and in fact, at least as far as Regulus is concerned, it's gone backwards. Due to new incompatibilities in the operating system interface libraries, it's not currently possible to run Regulus in speech mode with 4.0.3 - there may also be other problems. I can presumably implement a workaround, but the idea of having to patch the code after every new SICStus release makes me very nervous.

For Prolog people who want the low-level details, here is part of the mail I just sent to the SICStus team:


Unless I am misunderstanding something important, SP4.0.3's version of the
system3 library is still not downward-compatible with SP3's system library, and is in fact rather
less downward-compatible than SP4.0.2's system3. The problem is now in system/1.
In SP4.0.2, system/1 is defined as follows:

system(Cmd) :-
    system_binary(Binary, DashC),
    proc_call(Binary, DashC, Cmd, exit(0)).

so it's possible to make calls like the following, running under Cygwin:

| ?- system('dir > tmp_dir.txt').
1 1 Call: system('dir > tmp_dir.txt') ?
2 2 Call: system3:environ('COMSPEC',_790) ?
2 2 Exit: system3:environ('COMSPEC','C:\\WINDOWS\\system32\\cmd.exe') ?
3 2 Call: system3:process_create('C:\\WINDOWS\\system32\\cmd.exe',['/C','dir > tmp_dir.txt'],system3:[process(_1437)]) ?
3 2 Exit: system3:process_create('C:\\WINDOWS\\system32\\cmd.exe',['/C','dir > tmp_dir.txt'],system3:[process('$process'('$ptr IEDNJP'))]) ?
4 2 Call: system3:process_wait('$process'('$ptr IEDNJP'),exit(0)) ? s
4 2 Exit: system3:process_wait('$process'('$ptr IEDNJP'),exit(0)) ?
1 1 Exit: system('dir > tmp_dir.txt') ?

Under SP4.0.3, system/1 is defined thus:

system(Cmd, Status) :-
    shell_exec(Cmd, [], exit(Status)).

and the corresponding call looks like this:

| ?- system('dir > tmp_dir.txt').
1 1 Call: system('dir > tmp_dir.txt') ?
2 2 Call: system3:system('dir > tmp_dir.txt',0) ?
3 3 Call: system3:process_create('dir > tmp_dir.txt',[],system3:[commandline(true),process(_1119)]) ?
3 3 Exit: system3:process_create('dir > tmp_dir.txt',[],system3:[commandline(true),process('$process'('$ptr ALJLOO'))]) ?
4 3 Call: system3:process_wait('$process'('$ptr ALJLOO'),exit(0)) ?
4 3 Fail: system3:process_wait('$process'('$ptr ALJLOO'),exit(0)) ?
2 2 Fail: system3:system('dir > tmp_dir.txt',0) ?
1 1 Fail: system('dir > tmp_dir.txt') ?

The problem, as far as I can see, is that process_create requires its first
argument to be a program, which it isn't here.

Unfortunately, we have people running Regulus under at least 3.12.5, 4.0.2 and 4.0.3.
Maintaining the code so that it runs under all these different versions is
becoming quite difficult - the operating system interface primitives are
absolutely essential. Advice appreciated.
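For readers more familiar with other languages, the difference between the two system/1 behaviours is analogous to shell=True versus shell=False in Python's subprocess module: a command string containing redirection only works when a shell interprets it.

```python
import os
import subprocess
import tempfile

os.chdir(tempfile.mkdtemp())

# Like SP4.0.2's system/1: the command string is handed to a shell
# (COMSPEC /C on Windows, /bin/sh -c on Unix), so redirection works.
subprocess.run("echo hi > out.txt", shell=True, check=True)

# Like SP4.0.3's system/1: the whole string is treated as the name of
# a single program to execute; no such executable exists, so it fails.
try:
    subprocess.run(["echo hi > out.txt"])
    shell_free_call_worked = True
except FileNotFoundError:
    shell_free_call_worked = False

print(open("out.txt").read().strip())  # hi
print(shell_free_call_worked)          # False
```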

Interlingua corpora

Over the last few months, we have been moving MedSLT development towards a new way of doing things, which is based on the idea of an "Interlingua corpus". We present the basic picture in our LREC 2008 paper, but that's already somewhat out of date, and doesn't give any low-level details.

We now have four interlingua corpora, representing the cross-product of {linear, AFF} x {plain, combined}. The linear/AFF distinction is concerned with the type of semantics used. "Linear" is the old MedSLT semantics; AFF semantics is explained in the paper by Pierrette, Beth Ann, Yukie and myself which has just been accepted for COLING 2008, and which will soon be appearing on the Geneva website.

The plain/combined distinction says what information has been incorporated in the corpus. The "plain" corpus is created by merging results of translating FROM each source language into interlingua, so each interlingua form lists the source language results that translate into it. The "combined" corpus contains all the information in the "plain" corpus, plus also the results of translating TO each target language.

At the moment, we use the plain corpus for developing translation rules that go from Interlingua to target languages. The combined corpus is used for creating help resources.
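The plain/combined distinction can be pictured with a small sketch. The data structures and the English/French sentences are hypothetical (the Japanese one and the interlingua form appear in the 1 June trace below); the real corpus format is of course richer.

```python
# "Plain" corpus: each interlingua form is keyed to the source-language
# sentences that translate INTO it.
plain = {
    "WH-QUESTION pain be where PRESENT ACTIVE": {
        "Eng": ["where is the pain"],        # hypothetical sentence
        "Jap": ["doko ga itami masu ka"],
    },
}

# "Combined" corpus: everything in the plain corpus, plus the results of
# translating each interlingua form TO each target language.
combined = {
    form: {"sources": sources,
           "targets": {"Fre": ["<French output here>"]}}  # placeholder
    for form, sources in plain.items()
}

form = "WH-QUESTION pain be where PRESENT ACTIVE"
print(sorted(combined[form]))  # ['sources', 'targets']
```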

All the scripts used to build interlingua corpora are referenced in $MED_SLT2/Interlingua/scripts/Makefile.

Sunday 1 June 2008

Running multiple copies of the GUI

Elisabeth did a little work over the weekend, and it's now possible to run multiple copies of the GUI simultaneously - this is an important feature that people have been requesting for some time. The solution turns out to be embarrassingly simple. All we needed to do, in the end, was fix things so that it's possible for both the Java and the Prolog processes to specify from the command line which port they use to communicate with each other. As long as different {Java, Prolog} pairs use different ports, they don't interfere with each other.

I've added an example script to Regulus/Java called run_prolog_and_java2.bat - this is just like run_prolog_and_java.bat, but starts a second pair of processes, communicating over a new port.

Saturday 31 May 2008

Progress on N-best rescoring

Maria Georgescul and I have been doing some work over the last few days on N-best rescoring, using the Calendar application as a test-bed. The basic division of labor was for me to define features and transform N-best hypothesis lists into lists of feature vectors, while Maria fed these into an SVM-based learner to perform the actual rescoring. We did the experiments using a set of 459 recorded utterances. Rescoring now reduces semantic error rate from 19% to 11%, and WER from 11% to 10%.

I defined the features by looking at examples of N-best lists, and finding common examples of things which I felt intuitively should be penalized. The current set of features is as follows:


rank: Place in the N-best list

no_dialogue_move: Hypothesis produces no dialogue move

underconstrained_query: Query with no contentful constraints

non_indefinite_existential: Existentials with non-indefinite arg, e.g. "is there the meeting next week"

non_show_imperative: Imperatives where the main verb isn't "show" or something similar

indefinite_meeting_and_meeting_referent: combination of indefinite mention of meeting + available meeting referent
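A minimal sketch of how such feature vectors turn into scores. The weight values here are made up, and Maria's learner was SVM-based, so this weighted-sum view is only the simplest linear picture of what the rescorer computes:

```python
# Illustrative weights (negative = penalty) for the features listed above.
WEIGHTS = {
    "rank": -1.0,
    "no_dialogue_move": -50.0,
    "underconstrained_query": -10.0,
    "non_indefinite_existential": -10.0,
    "non_show_imperative": -50.0,
    "indefinite_meeting_and_meeting_referent": -3.0,
}

def score(features):
    """Weighted sum of feature values for one N-best hypothesis."""
    return sum(WEIGHTS[f] * v for f, v in features.items())

def rescore(nbest):
    """Return the hypothesis (as a feature dict) with the highest score."""
    return max(nbest, key=score)

hyp1 = {"rank": 0, "no_dialogue_move": 1}  # top-ranked, but yields no move
hyp2 = {"rank": 1}                         # rank penalty only
print(score(hyp1), score(hyp2))  # -50.0 -1.0
print(rescore([hyp1, hyp2]) is hyp2)  # True
```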

Wednesday 28 May 2008

Printing N-best feature info at top level

If you're in dialogue mode, and have N-best preferences defined, you now get them printed out at top level. This is useful for debugging feature definitions. Here's an example from the Calendar application:


>> what was the last meeting

Old state: [lf=[[whq,form(past,[[be,term(the_last,meeting,[]),[loc,where]]])]],
referents=[record(meeting,meeting_10),attribute(meeting,meeting_10,where)]]
LF: [[whq,form(past,[[be,term(the_last,meeting,[]),term(what,null,[])]])]]
Resolved LF: [[whq,form(past,[[be,term(the_last,meeting,[]),term(what,null,[])]])]]
Resolution: [trivial]
Dialogue move: [tense_information=referent(past), utterance_type=whq,
aggregate(last_n_meetings(1),[])]
Resolved move: [tense_information=interval(datime(1980,0,0,0,0,0),datime(2008,5,28,18,27,24)),
utterance_type=whq, aggregate(last_n_meetings(1),[])]
Paraphrase: list meetings in past the last meeting
Abstract action: say(referent_list([record(meeting,meeting_10)]))
Concrete action: tts(meeting at pierrette 's room on november 25)
New state: [lf=[[whq,form(past,[[be,term(the_last,meeting,[]),term(what,null,[])]])]],
referents=[attribute(meeting,meeting_10,where),record(meeting,meeting_10)]]

N-BEST FEATURES AND SCORES:

rank -1.00 * 0.00 = 0.00
no_dialogue_move -50.00 * 0.00 = 0.00
underconstrained_query -10.00 * 0.00 = 0.00
inconsistent_tense -10.00 * 0.00 = 0.00
non_indefinite_existential -10.00 * 0.00 = 0.00
non_show_imperative -50.00 * 0.00 = 0.00
definite_meeting_and_meeting_referent 3.00 * 0.00 = 0.00

Total score: 0.00

Dialogue processing time: 0.00 seconds

>> when did that meeting start

Old state: [lf=[[whq,form(past,[[be,term(the_last,meeting,[]),term(what,null,[])]])]],
referents=[attribute(meeting,meeting_10,where),record(meeting,meeting_10)]]
LF: [[whq,form(past,[[start,term(that,meeting,[])],[time,when]])]]
Resolved LF: [[whq,form(past,[[start,term(that,meeting,[])],[time,when]])]]
Resolution: [trivial]
Dialogue move: [query_object=start_time, referent_from_context=meeting,
tense_information=referent(past), utterance_type=whq]
Resolved move: [meeting=meeting_10, query_object=start_time, referent_from_context=meeting,
tense_information=interval(datime(1980,0,0,0,0,0),datime(2008,5,28,18,27,35)),
utterance_type=whq]
Paraphrase: start time for that meeting in past
Abstract action: say(referent_list([attribute(meeting,meeting_10,start_time)]))
Concrete action: tts(10 00 on november 25)
New state: [lf=[[whq,form(past,[[start,term(that,meeting,[])],[time,when]])]],
referents=[attribute(meeting,meeting_10,where), record(meeting,meeting_10),
attribute(meeting,meeting_10,start_time)]]

N-BEST FEATURES AND SCORES:

rank -1.00 * 0.00 = 0.00
no_dialogue_move -50.00 * 0.00 = 0.00
underconstrained_query -10.00 * 0.00 = 0.00
inconsistent_tense -10.00 * 0.00 = 0.00
non_indefinite_existential -10.00 * 0.00 = 0.00
non_show_imperative -50.00 * 0.00 = 0.00
definite_meeting_and_meeting_referent 3.00 * 1.00 = 3.00

Total score: 3.00

Dialogue processing time: 0.01 seconds

Tuesday 27 May 2008

Catching Regulus errors

Peter Ljunglöf wondered whether error reporting in Regulus could be improved, and had a couple of suggestions. I've implemented and checked in the following improvements:
  1. All error messages should now be printed to stderr.

  2. When processing fails during execution of the Regulus command <Command>, a line of the form

    Error processing command: <Command>

    should be printed. This was not previously the case.

  3. There is a new top-level predicate

    regulus_batch_storing_errors(+ConfigFile, +Commands, -ErrorString)

    which is like regulus_batch/2, except that it instantiates ErrorString with a string containing all the errors printed out during execution of Commands.
I expect there will be some glitches (I had to change a lot of lines of code), so please let me know if things don't work as intended.

Building help resources from the combined interlingua corpus (2)

Considerable progress on this task today:
  1. I've added French to the AFF interlingua corpora, including the New York material as requested by Pierrette. The corpora are remade and checked in.
  2. The help resources for Eng and Ara (the languages where we have help class definitions) are now made from the combined interlingua corpus. A separate help file is made for each of the six pairs EngAra, EngFre, EngJap, AraEng, AraFre, AraJap, reflecting the different levels of coverage. You can make the help resources for all of these pairs by doing 'make help_resources' in $MED_SLT2 (i.e. at the top level in the MedSLT directory), and it only takes a few minutes.

Building help resources from the combined interlingua corpus

I've just checked in code that allows us to build Prolog help resources from the combined interlingua corpus in multi-lingual translation applications. This will make it much easier to integrate construction of help resources into the MedSLT build - it should now be almost trivial.

I'm currently remaking the interlingua corpus (I have had to change the format a little), and should be able to check in all the relevant MedSLT stuff later this evening.

Flagging ambiguity in interlingua checking

I've just checked in code to catch cases where interlingua is
ambiguous, in the sense of generating multiple different surface
strings in the interlingua grammar. This is most likely to occur in
AFF, when the to-interlingua rules are underconstrained and the
interlingua is only partially instantiated. The following Japanese
-> Interlingua example in MedSLT illustrates:

>> doko ga itami masu ka

Source: doko ga itami masu ka
Target: WH-QUESTION pain be where PRESENT ACTIVE
Other info:
n_parses = 1
parse_time = 0.297
source_representation = [null=[path_proc,itamu], null=[tense,present],
null=[utterance_type,question], subject=[body_part,doko]]
source_discourse = [null=[utterance_type,question], subject=[body_part,doko],
null=[tense,present], null=[path_proc,itamu]]
resolved_source_discourse = [null=[utterance_type,question], subject=[body_part,doko],
null=[tense,present], null=[path_proc,itamu]]
resolution_processing = trivial
interlingua = [loc=[loc,where], arg1=[secondary_symptom,pain], null=[tense,present],
null=[utterance_type,whq], null=[verb,be], null=[voice,active]]
interlingua_surface = WH-QUESTION pain be where PRESENT ACTIVE
other_interlingua_surface = [WH-QUESTION pain be above-loc where PRESENT ACTIVE,
WH-QUESTION pain be around-loc where PRESENT ACTIVE,
WH-QUESTION pain be between-loc where PRESENT ACTIVE,
WH-QUESTION pain be in-loc where PRESENT ACTIVE,
WH-QUESTION pain be under-loc where PRESENT ACTIVE]


Background

If you've reached this blog and don't have any idea what it's about, Regulus is an Open Source platform for constructing speech-enabled systems, which we've been developing since 2001. We've now built several high-profile applications, including Clarissa, so far the only speech-enabled system to have flown in space, and MedSLT, a medical speech translator. You can read more about Regulus here.

First entry

Rather than mail people about new Regulus features, fixes, etc, I am starting a blog. Don't know why I didn't do this earlier!