Wednesday 30 December 2009

Swedish MedSLT, continued

A few more days of messing around, and I'm now translating about 80% of the 1200-item combined interlingua corpus into Swedish. Elisabeth (a native speaker) looked at about half of the material and made a number of suggestions for improving quality. After implementing them, she thinks that over 90% of the translations are clearly good. To do this work, I have had to make a few more improvements to the combined Scandinavian/English grammar: the most important of these is an initial treatment of lexically reflexive verbs.

I'll see if I can improve the numbers a bit more, and will then start on the Swedish-to-Interlingua direction.

Friday 25 December 2009

Swedish MedSLT, continued

Encouraging progress on Swedish MedSLT: I can now translate a third of the interlingua corpus into Swedish. The translations aren't very good yet, but they are nearly all grammatical. I think it will be fairly easy to improve things.

Tuesday 22 December 2009

Swedish MedSLT

I have just started on a Swedish version of MedSLT; this will give the new Scandinavian/English grammar a much more thorough workout. After an hour or two of messing around, I can parse one sentence, var har du ont ("where is your pain", literally "where have you pain").

The first thing I notice is that we need rules for impersonal and lexically reflexive verbs, which are very important in this domain. In particular, an expression that occurs all the time is det gör ont ("it hurts", literally "it makes pain").

More soon.

Saturday 19 December 2009

A Scandinavian/English grammar, continued

I've now changed the config files for English CALL-SLT so that they use the new Scandinavian grammar instead of the English-only one that it's based on. This will give us more of a chance to test how it works.

Next, I will make similar changes in the English part of MedSLT.

Friday 18 December 2009

A Scandinavian/English grammar, continued

The Scandinavian grammar is working quite well, but it's certainly not complete yet. Here are some important things that still need to be added, all of which occur in Scandinavian but not in English:
  • Negation. This is just an adverb in Scandinavian. The slightly non-trivial thing is that the position of this adverb (and of some others) differs depending on whether the clause is main or subordinate.
  • Lexically reflexive pronouns. As in Romance, some Scandinavian verbs subcategorize for lexically reflexive pronouns, which have no semantic value.
  • Lexical passive. Scandinavian verbs have a lexical passive form. I propose to do this in the morphotax.
  • Definiteness. I don't yet have all the definiteness constraints. In particular, a definite singular NP has an implicit definite article, but a premodifying adjective requires an explicit definite article.
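To make the last point concrete, here is a toy DCG sketch (plain Prolog, nothing like the actual Regulus rules, which use feature constraints) of the pattern: bordet "the table", but det stora bordet "the big table".

    % Toy illustration only: a bare definite noun carries the suffixed article,
    % while a premodifying adjective also requires the free-standing article.
    definite_np --> definite_noun.
    definite_np --> definite_article, definite_adjective, definite_noun.

    definite_article   --> [det].
    definite_adjective --> [stora].
    definite_noun      --> [bordet].

    % ?- phrase(definite_np, Words).
    %    Words = [bordet] ;
    %    Words = [det, stora, bordet].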
I don't think any of these things are particularly difficult to implement, or should require major changes to the grammar.

Wednesday 16 December 2009

Swedish CALL-SLT

I must stop messing around with the Scandinavian grammar... it's really too much fun! Anyway, I now have a reasonable first cut at a Swedish version of CALL-SLT, working as usual in the restaurant domain. Elisabeth helped me add more material to the Swedish corpus last night; currently, it contains about 160 entries, of which about 90% work. I just tried out the live system, using Maria's GUI, and it runs fine. As I'd hoped, recognition picked up a good deal once I was able to switch on N-best rescoring. I'm getting performance in Swedish only slightly inferior to what I get in English, which seems reasonable given my relative abilities in the two languages.

If people want to try it out, you need to update both Regulus and CALL-SLT using the -d flag, and do a make in CALLSLT/Swe/scripts. Then run in the usual way. So far there is no spoken help, but Elisabeth has promised to record files over Christmas.

Tuesday 15 December 2009

A Scandinavian/English grammar, continued

A bit more fiddling around, and I have two-thirds of the initial Swedish CALL-SLT corpus parsing. You can see it here. I've also compiled an initial recogniser. So far, it doesn't work very well, but if it's like the other languages it will improve considerably once I add N-best rescoring.

This is all progressing rather nicely! I need to do some work now on processing the results of our recent CALL-SLT experiments, but once I've done that I'll return to the Swedish. Elisabeth says she will act as our native speaker.


Monday 14 December 2009

A Scandinavian/English grammar

This weekend, I finally started on a task that I've been meaning to do for ages, and put together a first version of a shared grammar that is intended to cover both English and the Scandinavian languages Swedish, Norwegian and Danish. I've checked it into Regulus/Grammar/Scandinavian. Initially, I'm developing using the CALL-SLT restaurant domain, in English and Swedish. The Swedish version can already handle a reasonable range of language. Here's an example: skulle jag kunna få en pizza = would I be-able-to get a pizza. This is a common way to ask for something in Swedish.
>> skulle jag kunna få en pizza
(Parsing with left-corner parser)

Analysis time: 0.34 seconds

3 possibilities:

----------------------------------------------------------------
Possibility 1
Return value: [(null=[action,få]), (object=[food,pizza]), (null=[modal,kan]),
(null=[modal,skulle]), (agent=[pronoun,jag]), (null=[utterance_type,ynq]),
(null=[voice,active])]

Global value: []

Syn features: []

Parse tree:

.MAIN [GENERAL_SCA:541-546]
top [GENERAL_SCA:552-558]
/ utterance_intro null [GENERAL_SCA:566-568]
| utterance [GENERAL_SCA:615-620]
| s [GENERAL_SCA:713-718]
| s [GENERAL_SCA:817-826]
| vp [GENERAL_SCA:1124-1137]
| / vbar [GENERAL_SCA:876-898]
| | / v lex(skulle) [GEN_SWE_LEX:51-51]
| | | np [GENERAL_SCA:1952-1960]
| | \ pronoun lex(jag) [GEN_SWE_LEX:200-200]
| | vp [GENERAL_SCA:1124-1137]
| | / vbar [GENERAL_SCA:853-875]
| | | v lex(kunna) [GEN_SWE_LEX:61-62]
| | | vp [GENERAL_SCA:1317-1337]
| | | / vp [GENERAL_SCA:1042-1051]
| | | | / vbar [GENERAL_SCA:853-875]
| | | | | v lex(få) [CALLSLT_LEX:33-33]
| | | | | np [GENERAL_SCA:2073-2091]
| | | | | / np [GENERAL_SCA:1907-1917]
| | | | | | / d lex(en) [GEN_SWE_LEX:277-278]
| | | | | | | nbar [GENERAL_SCA:2118-2130]
| | | | | | \ n lex(pizza) [CALLSLT_LEX:179-181]
| | | | \ \ post_mods null [GENERAL_SCA:1451-1457]
| \ \ \ post_mods null [GENERAL_SCA:1451-1457]
\ utterance_coda null [GENERAL_SCA:597-599]

------------------------------- FILES -------------------------------

CALLSLT_LEX: d:/cygwin/home/speech/call-slt/swe/regulus/callslt_lex.regulus
GENERAL_SCA: d:/cygwin/home/speech/regulus/grammar/scandinavian/general_sca.regulus
GEN_SWE_LEX: d:/cygwin/home/speech/regulus/grammar/scandinavian/swedish/gen_swe_lex.regulus

Here are some of the issues I've encountered so far:
  • Agreement is different in English and Swedish, so the range of possible values for the 'agr' feature has to be language-dependent. In English, agreement is by person and number. In Swedish, it's primarily by number and gender ("common" or "neuter"). However, you also need person, since reflexive pronouns agree with the subject in person.
  • All verbs invert in Swedish, and there is no auxiliary "do".
  • The range of verb inflections is different in the two languages. English verbs have five forms: base, third person singular present, imperfect, past participle, present participle. So, for example, go, goes, went, gone, going. Modern Swedish doesn't inflect by person or number in the present tense, but on the other hand distinguishes the imperative from the infinitive, distinguishes the "supine" (the form used for the perfect tense) from the past participle, and inflects the past participle by number and gender. So, for example, bryta (break) has the forms bryt (imperative), bryta (infinitive), bryter (present), bröt (imperfect), brutit (supine), brytande (present participle), bruten (past participle singular common), brutet (past participle singular neuter), brutna (past participle plural).
  • Swedish nouns inflect for definiteness. So for example bord is "table", but bordet is "the table". Adjectives also inflect for definiteness, thus ett stort bord ("a big table") but det stora bordet ("the big table").
  • Swedish possessives inflect for gender and number. So min bil ("my car", common/singular), mitt hus ("my house", neuter/singular), mina barn ("my children", plural).
  • Swedish partitives are slightly different. In English, "a bottle of beer"; in Swedish en flaska öl (nothing corresponding to the "of").
  • Swedish date and time grammar is slightly different. In English, "december fourteenth"; in Swedish, fjortonde december. In English, "nine thirty"; in Swedish, nio och trettio.
  • Swedish negation is basically an adverb, e.g. jag beställde inte någon pizza = I ordered not any pizza. The negation adverb comes after the finite verb in main clauses but before it in subordinate clauses.
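To make the word-order point concrete, here is a toy DCG sketch (illustrative Prolog only, not the Regulus treatment) of the two positions of inte:

    % Toy illustration only.
    % Main clause:        jag beställde inte någon pizza
    % Subordinate clause: att jag inte beställde någon pizza
    main_clause        --> subject, finite_verb, negation, object.
    subordinate_clause --> complementizer, subject, negation, finite_verb, object.

    subject        --> [jag].
    finite_verb    --> ['beställde'].
    negation       --> [inte].
    object         --> ['någon', pizza].
    complementizer --> [att].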
Most of this stuff seems easy, and I can adapt the treatments we implemented in the SLT grammar during the 90s. I am guessing that 85-90% of the rules in the final shared grammar will be common to English and Swedish. Judging from our experiences with SLT, Danish and Swedish overlap to 95% or better, and Norwegian should be similar.

Wednesday 23 September 2009

New treatment of Japanese verbs

I've been discussing Japanese verbs with Yukie - we need new inflectional forms for CALL-SLT, in particular the volitional (-tai) form, and the old system was getting out of hand. Japanese has extraordinarily regular morphology, with only two irregular verbs and some very straightforward sound changes, so it seemed to me that we really ought to be able to get by without explicitly listing all the inflections of every verb we needed.

Yukie wrote down a table of inflections, and we discussed ways of splitting inflected verbs into stems and affixes. Based on our discussion, I've implemented a first version of a new treatment: you now only need to specify a single root form of the verb, and everything else is done by morphotax rules, with the affixes treated by Nuance as separate words. I've tested it by converting the Japanese Calendar lexicon to the new form and compiling it into a recognizer. Coverage is unchanged, and recognition is anecdotally fine with my voice. I will dig out some Japanese Calendar data soon and run proper tests.
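To give a flavour of how this works, here is a toy DCG sketch (invented category names, far simpler than the real rules in japanese_verb_morphology.regulus) for the verb in the example parsed below: an inflected verb is stem + stem affix + inflectional affix, each treated as a separate surface word.

    % Toy illustration only.
    verb(Root, past_polite) --> v_stem(Root), stem_affix, affix(past_polite).

    v_stem(owaru)      --> [owa].       % root form owaru, "to end"
    stem_affix         --> [ri].
    affix(past_polite) --> [mashita].

    % ?- phrase(verb(Root, Form), [owa, ri, mashita]).
    %    Root = owaru, Form = past_polite.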

If anyone wants to look at the details, the morphotax rules are in $REGULUS/Grammar/Japanese/japanese_verb_morphology.regulus. The new version of the Japanese Calendar lexicon is at $REGULUS/Examples/Calendar/Regulus/japanese_calendar_lex_new.regulus.

Here's an example of a parse:

$ nanji ni owa ri mashita ka

(Parsing with left-corner parser)

Analysis time: 0.09 seconds

Return value: [[question,form(past,[[owaru],[ni,term(null,nanji,[])]])]]

Global value: []

Syn features: []

Parse tree:

.MAIN [JAPANESE_CORE_RULES:112-116]
top [JAPANESE_CORE_RULES:117-119]
utterance [JAPANESE_CORE_RULES:120-123]
/ main_clause [JAPANESE_CORE_RULES:147-151]
| s [JAPANESE_CORE_RULES:155-162]
| / comps [JAPANESE_CORE_RULES:190-195]
| | / pp [JAPANESE_CORE_RULES:414-423]
| | | / np [JAPANESE_CORE_RULES:267-273]
| | | | n lex(nanji) [JAPANESE_CALENDAR_LEX_NEW:84-84]
| | | \ p lex(ni) [JAPANESE_CALENDAR_LEX_NEW:274-284]
| | \ comps null [JAPANESE_CORE_RULES:163-166]
| | vbar [JAPANESE_CORE_RULES:249-253]
| | v [JAPANESE_VERB_MORPHOLOGY:13-26]
| | / v_stem [JAPANESE_VERB_MORPHOLOGY:27-38]
| | | / v_stem lex(owa) [JAPANESE_CALENDAR_LEX_NEW:227-237]
| | | \ stem_affix lex(ri) [JAPANESE_VERB_MORPHOLOGY:138-138]
| \ \ affix lex(mashita) [JAPANESE_VERB_MORPHOLOGY:80-83]
\ lex(ka)

------------------------------- FILES -------------------------------

JAPANESE_CALENDAR_LEX_NEW:
d:/cygwin/home/speech/regulus/examples/calendar/regulus/japanese_calendar_lex_new.regulus
JAPANESE_CORE_RULES:
d:/cygwin/home/speech/regulus/grammar/japanese/japanese_core_rules.regulus
JAPANESE_VERB_MORPHOLOGY:
d:/cygwin/home/speech/regulus/grammar/japanese/japanese_verb_morphology.regulus

Thursday 17 September 2009

"Abstract actions" and the dialogue server

I've had some productive discussions with Maria over the last few days, which have resulted in a couple of significant improvements to the dialogue server. Maria is going to build a Java GUI for the CALL-SLT system. She needs to be able to send requests to the dialogue server, and get back information that she will pass on to the user. Most often this will be in the form of screen-based output. The new functionality is motivated by this scenario, but is quite generic.

The first point Maria made was that she would prefer to use XML-formatted messages. Java finds it easy to manipulate XML; parsing Prolog messages, on the other hand, is a complete pain. So I added switches that allow the client to put the server into a mode where all messages are XML strings inside a minimal Prolog wrapper.

Yesterday, Maria made another very sensible request. In the first version of the application, the Prolog "output manager" module received abstract actions, and transformed them into concrete actions. Typically, concrete actions would involve printing strings. So, for example, suppose that the system has just given you the prompt

POLITE REQUEST TABLE outside

and you have correctly replied

i would like a table outside please

The abstract action produced is

display_matching_info('i would like a table outside please', correct, [2,1,2])

which means that the recognized words were 'i would like a table outside please'; they correctly matched the prompt; and the score is now 2 correct, 1 incorrect, with a positive streak of 2. This is converted by the output manager into the concrete action

print('I heard: "i would like a table outside please"

Correct!

Score: 2 right, 1 wrong (66.7%) Streak: 2')
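As a rough sketch of what that rewriting amounts to (invented predicate name and formatting details; the real output manager is more elaborate), something along these lines produces the message above:

    :- use_module(library(codesio)).    % SICStus: provides format_to_codes/3

    % Sketch only: turn the abstract action into a concrete print action.
    abstract_to_concrete(display_matching_info(Words, correct, [Right, Wrong, Streak]),
                         print(Message)) :-
        Percent is (100 * Right) // (Right + Wrong),
        format_to_codes('I heard: "~w"~n~nCorrect!~n~nScore: ~d right, ~d wrong (~d%)  Streak: ~d',
                        [Words, Right, Wrong, Percent, Streak], Codes),
        atom_codes(Message, Codes).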

Printing like this works fine for a text-based command-line interface; but, as Maria pointed out, it isn't necessarily appropriate when you have a Java Swing GUI, where you would probably prefer to do the formatting yourself. For instance, you might want to print the recognized words in one pane, render the "correct" as a green tick-mark in another, and present the score graphically as three columns of different heights.

In general, the abstract action is going to be more useful to you than the concrete one. So I've just added a little more functionality to the dialogue server to handle that too. Here's a summary of the new messages and what they do (a minimal client-side sketch follows the list); they are also documented in the file itself, $REGULUS/Prolog/dialogue_server.pl.
  • action(xml_messages). Format future messages in both directions in XML form. Each message will be of the form

    xml_message(XMLString).

    where XMLString is an XML encoding of the corresponding Prolog message produced using the predicate prolog_xml/2 in $REGULUS/PrologLib/prolog_xml.pl. The XML can be converted back into Prolog if necessary using the same predicate.
  • action(prolog_messages). Format future messages in both directions in Prolog form (default).
  • action(abstract_actions). Pass abstract actions to the client, so that the client can do its own output management.
  • action(concrete_actions). Pass concrete actions to the client (default).
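For example, a GUI client like Maria's would typically start by sending the two mode-switching messages. Here is a minimal sketch (hypothetical helper predicates; the connection set-up is not shown):

    % Sketch only: put the server into XML mode and abstract-action mode.
    % Stream is assumed to be an already open connection to the dialogue server.
    configure_for_gui_client(Stream) :-
        send_server_message(Stream, action(xml_messages)),
        send_server_message(Stream, action(abstract_actions)).

    % Hypothetical helper: write a message term followed by a full stop.
    send_server_message(Stream, Message) :-
        format(Stream, '~q.~n', [Message]),
        flush_output(Stream).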

Tuesday 15 September 2009

Warning: don't use SICStus 4.0.2

Maria and I have just spent two very frustrating days trying to figure out why CALL-SLT wasn't running correctly on her machine. In the end, it turned out that a few bits of Regulus functionality don't work correctly under SICStus 4.0.2, which is the version she was using... there appears to be something wrong with the SICStus/operating system interface.

So avoid this release! 4.0.4 is fine.

Sunday 13 September 2009

Dialogue server now accepts XML format messages

Another of those things I should no doubt have done years ago: after a discussion with Maria, I've now modified the dialogue server so that it can also run in a mode where all messages are XML-formatted.
Details (this is documented in $REGULUS/Prolog/dialogue_server.pl):

- Initially, the server is in Prolog mode.

- To put the server into XML mode, send the message

xml_messages.

- Subsequent messages are of the form

xml_message(XMLMessage).

where XMLMessage is an XML encoding of the corresponding Prolog message produced using the predicate prolog_xml/2 in $REGULUS/PrologLib/prolog_xml.pl.
The XML can be converted back into Prolog if necessary using the same predicate.
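In other words, the encoding is reversible. Assuming the argument order prolog_xml(PrologTerm, XMLString) (check the file itself for the actual details), a round trip looks like this:

    % Sketch only: encode a message term as XML and decode it again.
    roundtrip_ok(Term) :-
        prolog_xml(Term, XMLString),         % Prolog -> XML
        prolog_xml(Decoded, XMLString),      % XML -> Prolog
        Decoded == Term.

    % e.g. roundtrip_ok(xml_messages).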

I have tested by converting the CALL-SLT Prolog client to use XML-flavor messages, and it all works fine.

The routines in $REGULUS/PrologLib/prolog_xml.pl should in general be useful for translating Prolog into XML form in a reversible way. Look at the file for documentation and an example.

Monday 7 September 2009

CALL-SLT and Japanese

I have just added a little more coverage to the Japanese version... it now has about a dozen sentences for the student to practice on. I tried it, and so far it still recognizes everything I say. This is probably more because the vocabulary is so small than because I have a wonderful Japanese accent :)

Yukie and I should talk about how to proceed here. The first step will be to add material to the Japanese corpus.

Saturday 5 September 2009

Using recorded wavfiles as help information in CALL-SLT (part 2)

I did a little more fiddling around with the translation game strategy code, and it's now possible to define a strategy where the system only chooses entries which don't have an associated wavfile. The idea is to make it easy for the teacher to add missing wavfiles.

I tested it on English, and we now have a complete set of wavfiles for that language. As soon as we have a bit more coverage for French and Japanese, I'll add similar scripts for them too.

Friday 4 September 2009

Using recorded wavfiles as help information in CALL-SLT

I've got a new feature working on CALL-SLT, which allows speech input to be logged and reused as help for students. When you start the system, it asks whether or not you wish to be considered a native speaker. If you answer yes, it keeps the wavfile for each successful match, and stores it in such a way that the wavfile is associated with the current prompt. Subsequently, if a student is given the same prompt and hits the HELP button, the native speaker's wavfile is replayed. By construction, we know that the native speaker was correctly recognized, so if the student can just imitate them well enough they should be recognized too.

The idea is simple, but there were some messy technical problems... a bad interaction between Nuance and SICStus concerning relative pathnames, and the question of what happens if two different users try to check in new wavfiles simultaneously. I think I have decent solutions, though. For more details, look at the online documentation which I have just added. This also tells you how to download and run the system.

Sunday 23 August 2009

CALL-SLT and Japanese

And we now have a skeleton Japanese system too. So far, it can only do one sentence,

hitori no teeburu wa arimasu ka

which is rendered in the Interlingua as

POLITE REQUEST TABLE 1 PERSON

Still, it is nice that this goes all the way through: I can get the Interlingua as a prompt, speak the sentence, and be informed that I got it right. Not surprisingly, since the grammar doesn't cover anything else, it's very reliable when you say the one thing it knows!

Yukie and I need to get together and add more content. The first step will be for Yukie to flesh out the corpus, which currently only has a dozen or so examples.

Saturday 22 August 2009

CALL-SLT and French

We should now have a complete set of scripts, config files etc for the French version of CALL-SLT. I added a couple of placeholder files, with enough translation rules to do the sentence "Je voudrais deux bières". The make appears to work, and I was able to run the initial version of the translation game in the server. Over to Pierrette to add some actual content!

Next, Japanese...

Progress on CALL-SLT

I have the generic CALL-SLT functionality packaged up so that it can be run inside the dialogue server - this involved extending the dialogue server a little bit, so that you can now call recognition from inside it. The interface between the client and the CALL-SLT-loaded dialogue server is consequently very simple. There are so far just three commands:
  • "Next prompt". The server generates a new prompt, using its current strategy, and returns it.
  • "Recognize and match". The server performs recognition, translates it to Interlingua, matches against the current prompt, and returns a string explaining what happened.
  • "Help". The server returns the current prompt, plus a text example illustrating one possible way to realize the prompt.
I have been testing with a minimal Prolog-based client. It should be easy to write a Java client which offers a nice GUI-style interface.

Thursday 20 August 2009

Progress on CALL-SLT

I have a first version of the translation game working for English! It's still very clunky indeed (I'm running it from the command-line in the development environment), but it's already kind of fun. Next step will be to package it up in a better way, using the dialogue server; this should be quite easy. Once I've done that, it'll be possible to add a Java GUI, so that we have a complete first version of the system.

Of course, what I really want to do is try it in another language... I already know how to order in English restaurants! I'll start sorting out the config files and scripts for Japanese. My restaurant Japanese is shaky, and I'm very curious to see if I can use CALL-SLT to improve it.

Wednesday 19 August 2009

Progress on CALL-SLT

I've spent some time messing around with the English to Interlingua translation rules and the Interlingua grammar. We're now able to translate 95% of the development corpus into sensible-looking interlingua - though the current surface form will probably be revised a bit at some point by Pierrette and Johanna.

At any rate, you can now run the speech-input English system from the Regulus command-line, setting it so that spoken inputs are parsed and translated into Interlingua. I don't think it's that much extra work to add code so that we have a complete first version of the system. There will initially be two commands. You can either ask for a new Interlingua prompt, or ask to speak. If you speak, it will translate what you say into Interlingua, compare with the current prompt, and score you. With any luck, I'll have this working before next week.

Here's the current English development corpus, with the Interlingua translations it produces:

i will take a beer REQUEST beer
give me a beer REQUEST beer
hello NEUTRAL-GREETING
good evening EVENING-GREETING
i would like a table for one POLITE REQUEST TABLE 1 PERSON
could i have a table for one POLITE REQUEST TABLE 1 PERSON
do you have a table for one POLITE REQUEST TABLE 1 PERSON
i would like a table for two POLITE REQUEST TABLE 2 PERSON
could i have a table for three POLITE REQUEST TABLE 3 PERSON
do you have a table for four POLITE REQUEST TABLE 4 PERSON
do you have a table outside POLITE REQUEST TABLE outside
is there a table outside POLITE REQUEST TABLE outside
could i have a table outside POLITE REQUEST TABLE outside
could we have a table by the window POLITE REQUEST TABLE by-loc window
do you have a table near the window POLITE REQUEST TABLE in-loc window
i 'd like a table in the smoking area POLITE REQUEST TABLE smoking
i 'd like a table in the non-smoking area POLITE REQUEST TABLE non-smoking
smoking please POLITE REQUEST TABLE smoking
non-smoking please POLITE REQUEST TABLE non-smoking
could i have a table in the corner POLITE REQUEST TABLE in-loc corner
can i have a table in the corner POLITE REQUEST TABLE in-loc corner
i would like a non-smoking table for one POLITE REQUEST TABLE 1 PERSON non-smoking
do you have a non-smoking table for one POLITE REQUEST TABLE 1 PERSON non-smoking
a non-smoking table for two please POLITE REQUEST TABLE 2 PERSON non-smoking
could i have a non-smoking table for three people POLITE REQUEST TABLE 3 PERSON non-smoking
i would like a table for three people in the smoking area POLITE REQUEST TABLE 3 PERSON smoking
i would like a table for four POLITE REQUEST TABLE 4 PERSON
do you have a table for four POLITE REQUEST TABLE 4 PERSON
i would like to reserve a table for seven o'clock POLITE REQUEST TABLE 19 00
could i reserve a table for seven thirty POLITE REQUEST TABLE 19 30
i 'd like to reserve a table for six forty five POLITE REQUEST TABLE 18 45
i would like to reserve a table for two for seven fifteen POLITE REQUEST TABLE 2 PERSON 19 15
could i reserve a table for two people for eight o'clock POLITE REQUEST TABLE 2 PERSON 20 00
do you have a table for two at seven thirty POLITE REQUEST TABLE 2 PERSON 19 30
could i reserve a table for seven o'clock tomorrow evening POLITE REQUEST TABLE 19 00 time-evening date-tomorrow
i would like to reserve a table for six forty five tomorrow please parsing_failed
could i reserve a table for two for tomorrow evening POLITE REQUEST TABLE 2 PERSON time-evening date-tomorrow
could i reserve a table for this evening POLITE REQUEST TABLE time-evening date-today
do you have a table for three tomorrow evening around seven o'clock POLITE REQUEST TABLE 3 PERSON 19 00 time-evening date-tomorrow
i have a reservation in the name of smith parsing_failed
i should have a reservation in the name of smith parsing_failed
could we see the menu POLITE REQUEST menu
i would like to see the menu POLITE REQUEST menu
could we get the menu POLITE REQUEST menu
could we have the bill POLITE REQUEST check
could we have the bill please POLITE REQUEST check
may i have the bill POLITE REQUEST check
may i have the check POLITE REQUEST check
could you give us the check please POLITE REQUEST check
could you give me a receipt POLITE REQUEST receipt
i would like a receipt please POLITE REQUEST receipt
could i get a receipt please POLITE REQUEST receipt
i would like a beer POLITE REQUEST beer
could we have two beers POLITE REQUEST 2 beer
could you give us two beers POLITE REQUEST 2 beer
could i have a latte POLITE REQUEST latte
i 'd like a medium latte POLITE REQUEST medium latte
a cup of tea please POLITE REQUEST cup tea
two cups of tea please POLITE REQUEST 2 cup tea
could we get two glasses of water POLITE REQUEST 2 glass water
i 'd like a glass of the house red POLITE REQUEST glass house-red-wine
could we have two glasses of the house red POLITE REQUEST 2 glass house-red-wine
i would like a large glass of white wine POLITE REQUEST large glass white-wine
i would like two small glasses of white wine POLITE REQUEST 2 small glass white-wine
i would like a pizza POLITE REQUEST pizza
i would like a hamburger POLITE REQUEST hamburger
i would like the soup POLITE REQUEST soup
i would like two pizzas POLITE REQUEST 2 pizza

Tuesday 18 August 2009

Progress on CALL-SLT

I've now got enough stuff working in CALL-SLT that it's possible to translate a few simple sentences from spoken English to Interlingua. Here's an example:

>> i would like two beers

Source: i would like two beers
Target: POLITE REQUEST two beer
Other info:
n_parses = 1
parse_time = 0.047
source_representation = [(agent=[pronoun,i]), (null=[action,like]), (null=[modal,would]),
(null=[utterance_type,dcl]), (null=[voice,active]),
(object=[drink,beer]), (object=[spec,2])]
transfer_to_source_discourse_time = 0.0
source_discourse = [(null=[utterance_type,dcl]), (agent=[pronoun,i]), (null=[voice,active]),
(null=[modal,would]), (null=[action,like]), (object=[spec,2]),
(object=[drink,beer])]
resolved_source_discourse = [(null=[utterance_type,dcl]), (agent=[pronoun,i]),
(null=[voice,active]), (null=[modal,would]), (null=[action,like]),
(object=[spec,2]), (object=[drink,beer])]
resolution_processing = trivial
resolution_time = 0.0
transfer_to_interlingua_time = 0.0
interlingua = [(arg2=[drink,beer]), (arg2=[number,2]), (null=[politeness,polite]),
(null=[utterance_type,request])]
interlingua_surface = POLITE REQUEST two beer
interlingua_checking_time = 0.0

--- Performed command i would like two beers, time = 0.05 seconds

We are not far from having everything we need to be able to build and run an initial version of the CALL-SLT server. Initially, it will prompt in the surface interlingua, but I'll leave in a hook so that we can use the picture interlingua as soon as we have something available.

Sunday 16 August 2009

CALL-SLT corpus

For people who don't already know, we are collecting our initial corpus for CALL-SLT on a Google Docs document. If you don't already have access to this document and would like to contribute, please get a Google account and mail me your Google address.

All contributions very gratefully received - and don't worry that you might break something accidentally, Google Docs has excellent facilities for tracking revisions and if necessary reverting to earlier versions.

Here's a sample extract showing what the corpus looks like:

Fre: auriez-vous une table sur la terrasse (SVP) ?
Fre: auriez-vous une table en terrasse ?
Eng: do you have a table outside
Eng: is there a table outside
Eng: could i have a table outside
Int: POLITE REQUEST LOCATION-TABLE LOCATION-OUTSIDE [note: is "en terrasse" different from "outside"?]

Fre: auriez-vous une table près de la fenêtre (SVP) ?
Fre: auriez-vous une table à côté de la fenêtre (SVP) ?
Jap: mado gawa no seki wa ari masu ka
Eng: could we have a table by the window
Eng: do you have a table near the window
Int: POLITE REQUEST LOCATION-TABLE NEAR WINDOW

Fre: auriez-vous une table non-fumeur/fumeur (SVP)?
Fre: auriez-vous une table dans la zone non-fumeur /fumeur (SVP)?
Fre: auriez-vous une table dans la section non-fumeur/fumeur (SVP)?
Jap: kinenseki wa ari masu ka
Eng: i 'd like a table in the smoking area
Eng: i 'd like a table in the non-smoking area
Eng: smoking please
Eng: non-smoking please
Int: POLITE REQUEST LOCATION-TABLE IN SMOKING-AREA
Int: POLITE REQUEST LOCATION-TABLE IN NON-SMOKING-AREA

Fre: auriez-vous une table dans le coin (SVP) ?
Jap: oku no seki wa ari masu ka
Jap: oku no teeburu wa ari masu ka
Eng: could i have a table in the corner
Int: POLITE REQUEST LOCATION-TABLE IN CORNER

First CALL-SLT meeting

We've had our first CALL-SLT meeting, which has done a lot to clarify our immediate goals for the project. We're going to start by building a simple version of the system, constructed in such a way that it will be easy to upgrade it by successively replacing simple modules by more complex ones. Initially, we will be working in the tourism/restaurant domain, using the languages English, French and Japanese. When we have those working well enough, we'll also start on Chinese; this is a language none of us know, so it will give us intuitions about what it's like to be an elementary-level student trying to use the system to get some fluency in a language.

The initial prototype will work as follows. At each turn, the system prompts the student with a description of what they are supposed to say, formulated in a version of the Interlingua. The student will attempt to speak it in the L2 (the language they are trying to learn). The system applies speech recognition to the student's utterance, then tries to translate the result into the interlingua. Finally, it compares the translated interlingua with the one used to prompt the student, and gives them feedback on how they did. Here are more details:
  • Prompting in Interlingua. In the first version, the interlingua will be shown to the student in a text-based form, using the methods we've developed under MedSLT. So for example, the system might show the student

    POLITE REQUEST TABLE 3 PERSON TIME 19:30

    expecting the student to say something like

    I would like to reserve a table for three people at seven thirty

    or whatever the equivalent is in the L2 they are using.

  • As soon as we've figured out a good way to do it, we would like to be able to present the interlingua prompt in graphical form. So here, we might have a picture that could be described as

    Scene:
    Client is talking to waiter.
    Speech bubble from client.
    Inside speech bubble:
    three chairs around a restaurant table;
    large clock in background shows 19:30
  • All speech input to the system will be logged in the usual way. We will have a registration process that associates each recorded utterance with meta-data specifying, in particular, whether or not the utterance was recorded by a native speaker, and whether or not speech recognition got it right.

  • When the system has compared the student's interlingua with the prompt interlingua, there are two simple ways for it to give helpful feedback. The first is to present both versions of the interlingua, highlighting the elements that are different (a toy sketch of this comparison follows the list). For instance, in the example above, if the system recognized

    Could I have a table for two people at seven thirty

    then the system would present the prompt and recognized interlinguas roughly as follows:

    POLITE REQUEST TABLE *3* PERSON TIME 19:30
    POLITE REQUEST TABLE *2* PERSON TIME 19:30

    The second way to give help will be to play an example of a native speaker saying some version of the sentence in the L2, if such an example already exists.

  • The prompt selection module will have hooks allowing specification of a strategy. A simple strategy we will implement soon is to choose the prompt from a list of examples where there is a recorded example of a native speaker saying the prompt in the L2, possibly with some other constraints. This will make it easy for a teacher to create a lesson. They will first interact with the system in the L2, to create a set of recorded examples which work correctly. When the student logs on, the system will then be set to select prompts matching the teacher's examples.

  • The functionality will be bundled up as a Prolog-based server, which does most of the processing, and will connect to a lightweight Java-based GUI which presents a client view. The server will initially handle two messages: (1) NEXT_EXAMPLE, returning a new interlingua prompt with associated information, and (2) RECOGNISE, prompting the student to speak, carrying out recognition, and returning the pieces of information produced by carrying out the interlingua comparison process.
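To make the first feedback mechanism above concrete, here is a toy sketch (an invented predicate that treats the interlingua as a flat list of tokens; the real comparison will be more involved) of how the starred display could be computed:

    % Sketch only: star the positions where the prompt and the recognized
    % interlingua differ; assumes two token lists of the same length.
    highlight_differences([], [], [], []).
    highlight_differences([P|Ps], [R|Rs], [P|MarkedPs], [R|MarkedRs]) :-
        P == R, !,
        highlight_differences(Ps, Rs, MarkedPs, MarkedRs).
    highlight_differences([P|Ps], [R|Rs], [starred(P)|MarkedPs], [starred(R)|MarkedRs]) :-
        highlight_differences(Ps, Rs, MarkedPs, MarkedRs).

    % ?- highlight_differences([table, 3, person], [table, 2, person], H1, H2).
    %    H1 = [table, starred(3), person],
    %    H2 = [table, starred(2), person].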

Monday 10 August 2009

Initial English recognizer for CALL-SLT (Part 2)

I have done some more work on the English CALL-SLT recognizer, and we now have about 170 surface words. You can order food and drink in various ways, e.g.

i would like two beers
could i have a pizza

There is language for reserving tables, e.g.

do you have a table for two at seven thirty
could i have a non-smoking table for three people

I've also added a few more things like asking for the menu and the check. If I could say all this stuff in a language I didn't already know, say Chinese, I'd really feel I'd learned something useful!

Initial English recognizer for CALL-SLT

We can now build an initial English recognizer for CALL-SLT too. The domain is the same, ordering in restaurants. The training corpus currently contains about 60 examples, about 95% of which parse. Vocabulary is about 130 surface words. Recognition is anecdotally quite good with my voice, though it will of course be more interesting to see how foreign voices do.

If you want more details, here are the corpus and the domain lexicon.

Friday 7 August 2009

Initial French recognizer for CALL-SLT

Pierrette's now checked in some real (as opposed to placeholder) material, and I was able to compile an initial French recognizer for CALL-SLT. It covers some very basic tourist French, so far all about ordering in restaurants. Simple as it is, I was already able to use it to improve my pronunciation of "Je voudrais un verre d'eau". As everyone who's heard me speak French knows, my version of the "r" sound is terrible. Well, at least I can roll my eyes.

If people want to look at the CALL-SLT files, they're at http://callslt.cvs.sourceforge.net/viewvc/callslt/CALL-SLT/

Tuesday 4 August 2009

Adding Romance to Regulus (part 2)

I've just checked in initial French files under CALL-SLT/Fre. Update CALLSLT with the -d flag to get them.

So far, the CALL-SLT files are pretty much the same as the MedSLT files they were adapted from. I'm assuming Pierrette will make the necessary changes! I checked that you can get as far as building a Nuance grammar... it all worked fine for me, but let me know if there are problems.

Monday 3 August 2009

Adding Romance to Regulus

Following discussions with Pierrette, I've now moved the shared Romance grammar from MedSLT/Rom to the new directory Regulus/Grammar/Romance. I've also moved the domain-independent French grammar and lexicon files to Regulus/Grammar/Romance/French. This mirrors the directory structure in Jen's Germanic grammar.

I've adjusted the MedSLT config files for Fre, Spa and Cat to point to the new files. All three languages still appear to build correctly in the AFF (role-marked semantics) versions. I have not done anything with the old (linear semantics) versions, which I am now assuming are obsolete.

All three languages appeared to build fine when I tested, but Pierrette will probably want to check things more carefully. If there are problems, please let me know - they should be easy to fix.

The payoff is that I will now be able to construct a config file for the French version of CALL-SLT. Coming up next.

Sunday 26 July 2009

Treebank caching and preferences

Pierrette reminded me last week that there was a known problem in treebank caching: when parse preferences change, cached analyses may no longer be valid.

I've just checked in new code which stores the preferences used to create the treebank and compares them with the current preferences. If the two are different, the treebank is regenerated. It would be nice to regenerate only the part that might be affected by the changed preferences, but unfortunately that's very difficult to do.
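The logic is just a guard around regeneration; here is a minimal sketch (invented predicate names, not the checked-in code) of the idea:

    :- dynamic stored_treebank_prefs/1.

    % Sketch only: regenerate the treebank when the stored preferences no
    % longer match the current ones, and remember the preferences we used.
    maybe_regenerate_treebank(CurrentPrefs) :-
        (   stored_treebank_prefs(StoredPrefs),
            StoredPrefs == CurrentPrefs
        ->  true                                   % cached treebank still valid
        ;   regenerate_treebank(CurrentPrefs),     % stand-in for the real work
            retractall(stored_treebank_prefs(_)),
            assertz(stored_treebank_prefs(CurrentPrefs))
        ).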

Monday 20 July 2009

Regulus for DORIS, continued

I did some more tweaking of the DORIS grammar, and it now covers about 90% of Patrick's corpus. The Nuance grammar it generates is nice and compact (less than 1000 rules), and many of the remaining coverage holes still look easy to solve. As we hoped, this seems to be a good domain for Regulus.

Regulus for DORIS

When I was visiting Melbourne Uni last week, I talked with Patrick Ye about the possibility of using Regulus to provide speech recognition for the DORIS project, which, as he pointed out, is in fact quite similar to SHRD2; in both domains, the basic idea is to find things, pick them up, and move them around. I did indeed find it very easy to use the examples corpus and vocabulary that Patrick sent me to adapt the existing SHRD2 resources, and in just a few hours put together an initial Regulus grammar that could be compiled into a recogniser. The current version of the grammar covers a bit more than 80% of Patrick's corpus, and the recogniser can turn spoken sentences in Australian-accented English into either strings of words or scoped logical forms. For example, here's the representation it produces of "the red book is on the desk":

[[dcl,
quant(def_sing, A, [[book,A],[color,A,red]],
quant(def_sing, B,
[[desk,B]],
quant(exist,C,[[be_on_loc,C,A,B],[tense,C,present]],true)))]]

If people want to look at the details, the files are checked in at http://regulus.cvs.sourceforge.net/viewvc/regulus/Regulus/Examples/Doris/. The interesting ones are the lexicon, at http://regulus.cvs.sourceforge.net/viewvc/regulus/Regulus/Examples/Doris/Regulus/doris_lex.regulus, and the corpus, at http://regulus.cvs.sourceforge.net/viewvc/regulus/Regulus/Examples/Doris/corpora/doris_corpus.pl

Friday 10 July 2009

First steps in Bridge system

With help from Cathy Chua, our Bridge expert, I've been putting together a first version of a grammar for the Bridge domain this week. Cathy has been supplying vocabulary and a corpus of examples showing how to use it, and I've used that to build an initial lexicon. The specialized grammar derived from these resources now has a vocabulary of about 220 surface words. Here are some typical utterances it can already handle:

who has the queen of clubs
who bid one no trump
is two clubs a transfer
cover the ten with the jack
can you finesse in diamonds
can you make if spades are four one

We tried compiling a recognizer using the Australian English package, and, with Cathy's Australian voice, recognition is anecdotally quite good. (I have discovered that the difference between British English and Australian English is substantial.) We will be adding more coverage over the weekend. Next week, I hope we'll be able to start thinking concretely about how to hook up the Regulus components with BASSINET, Leon Sterling's Bridge program, to produce a first cut at an end-to-end system that can respond to spoken questions and commands.

As we expected, the Bridge domain is quite a lot more complex than anything we have tried so far in Regulus. It will definitely stretch us!

Tuesday 9 June 2009

Update on SHRD2 paraphrasing

I extended the paraphrasing capabilities for SHRD2, and we now get paraphrases for about 45% of the 180-ish examples in the current corpus. Beth Ann and I did a little experiment for the paper we presented at the SETQA-NLP workshop in Colorado last week. We each tried judging the corpus examples for correctness, using both the paraphrases and the underlying representations. In order to minimize learning effects, we permuted the order in which we did things.

Even though the "logic English" paraphrases seem very similar to the scoped logical representations, it in fact turns out that judging paraphrases is a lot faster. Even for me, knowing all the representations from having worked on them, judging paraphrases took 22 minutes, against 29 minutes for judging structures. For Beth Ann, who didn't know the datastructures previously, paraphrases were more than twice as quick. Part of the payoff comes from the fact that the paraphrase grammar acts as a filter: most ill-formed structures produce no paraphrase, and hence don't need to be judged at all when paraphrases are used.

One person in the workshop audience said he was pleased to see a paper about software engineering which actually contained an experiment! Beth Ann was clearly right to insist that we do this, and work out the methodology.

Sunday 24 May 2009

First version of paraphrase grammar for SHRD2

I've just added a first version of a paraphrase grammar for SHRD2, where it tries to realize the scoped LFs in "logic English". Text example below. It doesn't do much more than this yet, but now that I have the basic structure working I'm hoping that it will be easy to add coverage. Planning to continue with this later today!

>> pick up a big red block

Old state: []
LF: [[imp,
form(imperative,
[[pick_up,term(pro,you,[]),term(a,block,[[size,big],[color,red]]),up]])]]
Intermediate 1: [[imp,
scoping_unit([modal, imperative,
term(event_exists, A,
[[pick_up, A, term(pro,B,[[you,B]]),
term(a,C,[[block,C],[size,C,big],[color,C,red]]), up]])])]]
Intermediate 2: [[imp,
quant(pro, A, [[you,A]],
quant(a, B,
[[block,B],[size,B,big],[color,B,red]],
imperative(quant(event_exists,C,[[pick_up,C,A,B,up]],true))))]]
Intermediate 3: [[imp,
quant(pro, A, [[you,A]],
quant(a, B,
[[block,B],[size,B,big],[color,B,red]],
imperative(quant(event_exists,C,[[pick_up,C,A,B]],true))))]]
Dialogue move: [[imp,
quant(exist, A, [[you,A]],
quant(exist, B,
[[block,B],[size,B,big],[color,B,red]],
imperative(quant(exist,C,[[pick_up,C,A,B]],true))))]]
Paraphrase: COMMAND there is an A SUCH THAT you are A AND there is a B SUCH THAT B is a block AND B is big AND B is red AND MAKE IT TRUE THAT there is a C SUCH THAT C is that A picks up B
Abstract action: say(i_dont_understand,present)

Thursday 21 May 2009

Multiple processing stages in dialogue top-level

When building dialogue applications, it's quite often the case that there are multiple stages in the process of converting the LF (the thing that comes out of the recognizer) into a dialogue move. I've just added some hooks so that you can pass back the intermediate levels of representation as an optional extra argument to lf_to_dialogue_move, and display them. This is useful for debugging.

Here's a text example from SHRD2:

>> is the large red block in the box

Old state: []
LF: [[ynq,
form(present,
[[be, term(the_sing,block,[[size,large],[color,red]]),
[in_loc,term(the_sing,box,[])]]])]]
Intermediate 1: [[ynq,
scoping_unit(term(event_exists, A,
[[be, A,
term(the_sing, B,
[[block,B],[size,B,large],[color,B,red]]),
[in_loc,term(the_sing,C,[[box,C]])]],
[tense,A,present]]))]]
Intermediate 2: [[ynq,
quant(the_sing, A, [[block,A],[size,A,large],[color,A,red]],
quant(the_sing, B, [[box,B]],
quant(event_exists, C,
[[be,C,A,[in_loc,B]],[tense,C,present]], true)))]]
Dialogue move: [[ynq,
quant(the_sing, A, [[block,A],[size,A,big],[color,A,red]],
quant(the_sing, B, [[box,B]],
quant(event_exists,C,[[be_in_loc,C,A,B],[tense,C,present]],true)))]]
Abstract action: say(i_dont_understand,present)
Concrete action: tts(sorry, I don't understand)
New state: []

Intermediate 1 is after addition of variables; Intermediate 2 is after scoping; and Dialogue move is after rewriting of lexical predicates. This last step is still very primitive.

Wednesday 20 May 2009

"Pick up a big red block", revisited

OK, I did some more work on SHRD2, and, as of a few minutes ago, I managed to speak a sentence and get it turned into a logic-like representation. Here's a slightly edited trace (the less interesting parts of the output have been removed). I did indeed say "pick up a big red block"!

>> RECOGNISE
(Take next loop input from live speech)

Recognised: recognition_succeeded([rec_result(62,pick up a big red block,...)

Old state: []
LF: [[imp,
form(imperative,
[[pick_up,term(pro,you,[]),term(a,block,[[size,big],[color,red]]),up]])]]
Dialogue move: [[imp,
quant(pro, A, [[you,A]],
quant(a, B,
[[block,B],[size,B,big],[color,B,red]],
imperative(quant(event_exists,C,[[pick_up,C,A,B,up]],true))))]]
Abstract action: say(i_dont_understand,present)
Concrete action: tts(sorry, I don't understand)
New state: []

Dialogue processing time: 0.01 seconds

Obviously, there's plenty more left to do before we have anything resembling a complete system. In particular, there is as yet no reference resolution or dialogue management, so it can't react to the commands and questions in any way. But, all things considered, I think we're making reasonable progress.

Sunday 17 May 2009

CAT and FEAT commands

One of the good things about doing some grammar development on SHRD2 is that it suggests ideas for new development environment functionality. I got tired of consulting the grammar every time I needed to find out what features a category had, or what possible values a feature could take. So I added a couple of new commands called CAT and FEAT, which give you those pieces of information. Here's an example:

>> CAT p
(Display information for specified category)

Features for category "p": [def,obj_sem_n_type,postposition,sem,sem_p_type,sem_pp_type]

--- Performed command CAT p, time = 0.00 seconds

>> FEAT sem_p_type
(Display information for specified feature)

Feature values for feature "sem_p_type": [[back,down,none,normal,off,onoff,over,up,updown]]

--- Performed command FEAT sem_p_type, time = 0.00 seconds

SHRD2

I was talking with some people about Terry Winograd's book Understanding Natural Language, and how inspiring I found the SHRDLU system when I first read about it as an undergraduate. They asked me what the current equivalent would be. It bothered me that I couldn't come up with anything, and that no one really seemed to be building this kind of system any more.

Well... so why not do something about it? Stage 1 is to build a speech-enabled, and hopefully rather more robust, reconstruction of SHRDLU, using Regulus. Some initial stuff is already checked in under $REGULUS/Examples/SHRD2. When we've satisfied ourselves that we can handle the Blocks World, Stage 2 will be to define a new and more ambitious domain - something which will hopefully demonstrate that there has in fact been significant progress since the 70s.

I'll be posting more about this soon. So far, SHRD2 has already turned up some important holes in the general English grammar, which I'm working on fixing.

Showing missing vocabulary in EBL_TREEBANK

A trivial but rather useful little feature I just added: when you run EBL_TREEBANK, you now get missing vocabulary displayed for relevant sentences. I don't know why I didn't do this years ago. Here's an example. To get the new functionality, you just need to update Regulus from CVS, nothing needs to be remade.

>> EBL_TREEBANK
(Parse all sentences in current EBL training set into treebank form)

--- Read parsing history file (114 records) d:/cygwin/home/speech/regulus/examples/shrd2/generated/shrd2_parsing_history.pl
--- Incremental treebanking switched off, not trying to convert treebank

Parsing corpus data in d:/cygwin/home/speech/regulus/examples/shrd2/corpora/shrdlu_corpus.pl:
..
*** Parsing failed for: "find a block which is taller than the one you are holding and put it into the box", line 2
...
*** Parsing failed for: "is at least one of them narrower than the one which i told you to pick up", line 6
Words not in current vocabulary: [told]
.....
*** Parsing failed for: "will you please stack up both of the red blocks and either a green cube or a pyramid", line 12
Words not in current vocabulary: [either]
...
