Wednesday 30 December 2009

Swedish MedSLT, continued

A few more days of messing around, and I'm now translating about 80% of the 1200-item combined interlingua corpus into Swedish. Elisabeth (a native speaker) looked at about half of the material and suggested some ways to improve the quality. Now that I have implemented her suggestions, she thinks that over 90% of the translations are clearly good. To do this work, I have had to make a few more improvements to the combined Scandinavian/English grammar; the most important of these is an initial treatment of lexically reflexive verbs.

I'll see if I can improve the numbers a bit more, and will then start on the Swedish-to-Interlingua direction.

Friday 25 December 2009

Swedish MedSLT, continued

Encouraging progress on Swedish MedSLT: I can now translate a third of the interlingua corpus into Swedish. The translations aren't very good yet, but they are nearly all grammatical. I think it will be fairly easy to improve things.

Tuesday 22 December 2009

Swedish MedSLT

I have just started on a Swedish version of MedSLT; this will give the new Scandinavian/English grammar a much more thorough workout. After an hour or two of messing around, I can parse one sentence, var har du ont ("where is your pain", literally "where have you pain").

The first thing I notice is that we need rules for impersonal and lexically reflexive verbs, which are very important in this domain. In particular, an expression that occurs all the time is det gör ont; "it hurts", literally "it makes pain".

More soon.

Saturday 19 December 2009

A Scandinavian/English grammar, continued

I've now changed the config files for English CALL-SLT so that they use the new Scandinavian grammar instead of the English-only grammar that the new one is based on. This will give us more of a chance to test how it works.

Next, I will make similar changes in the English part of MedSLT.

Friday 18 December 2009

A Scandinavian/English grammar, continued

The Scandinavian grammar is working quite well, but it's certainly not complete yet. Here are some important things that still need to be added, all of which occur in Scandinavian but not in English:
  • Negation. This is just an adverb in Scandinavian. The slightly non-trivial thing is that the position of this adverb (also some others) is different, depending on whether it's a main or a subordinate clause.
  • Lexically reflexive pronouns. As in Romance, some Scandinavian verbs subcategorize for lexically reflexive pronouns, which have no semantic value.
  • Lexical passive. Scandinavian verbs have a lexical passive form. I propose to do this in the morphotax.
  • Definiteness. I don't yet have all the definiteness constraints. In particular, a definite singular NP has an implicit definite article, but a premodifying adjective requires an explicit definite article.
I don't think any of these things are particularly difficult to implement, or should require major changes to the grammar.
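
As a rough illustration of the definiteness constraint in the last bullet, here is a minimal Python sketch. It is not Regulus code, and the function name realize_definite_np and its parameters are hypothetical, invented for this example. It uses the bord examples from the 14 December entry and assumes the caller already supplies the correctly inflected definite forms.

```python
from typing import Optional


def realize_definite_np(noun_definite: str,
                        article: str,
                        adjective_definite: Optional[str] = None) -> str:
    """Return the surface string for a definite singular NP.

    Hypothetical helper for illustration only; not part of Regulus.
    """
    if adjective_definite is None:
        # No premodifier: the definite suffix on the noun is enough,
        # so the article stays implicit.
        return noun_definite
    # A premodifying adjective requires an explicit definite article.
    return f"{article} {adjective_definite} {noun_definite}"


print(realize_definite_np("bordet", "det"))           # bare definite noun
print(realize_definite_np("bordet", "det", "stora"))  # with premodifying adjective
```

In the grammar itself this would presumably be expressed as feature constraints on the NP rules rather than as a function; the sketch only shows the condition under which the article has to appear.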

Wednesday 16 December 2009

Swedish CALL-SLT

I must stop messing around with the Scandinavian grammar... it's really too much fun! Anyway, I now have a reasonable first cut at a Swedish version of CALL-SLT, working as usual in the restaurant domain. Elisabeth helped me add more material to the Swedish corpus last night; currently, it contains about 160 entries, of which about 90% work. I just tried out the live system, using Maria's GUI, and it runs fine. As I'd hoped, recognition picked up a good deal once I was able to switch on N-best rescoring. I'm getting performance in Swedish only slightly inferior to what I get in English, which seems reasonable given my relative abilities in the two languages.

If you want to try it out, you need to update both Regulus and CALL-SLT using the -d flag, and do a make in CALLSLT/Swe/scripts. Then run in the usual way. So far there is no spoken help, but Elisabeth has promised to record files over Christmas.

Tuesday 15 December 2009

A Scandinavian/English grammar, continued

A bit more fiddling around, and I have two-thirds of the initial Swedish CALL-SLT corpus parsing. You can see it here. I've also compiled an initial recogniser. So far, it doesn't work very well, but if it's like the other languages it will improve considerably once I add N-best rescoring.

This is all progressing rather nicely! I need to do some work now on processing the results of our recent CALL-SLT experiments, but once I've done that I'll return to the Swedish. Elisabeth says she will act as our native speaker.


Monday 14 December 2009

A Scandinavian/English grammar

This weekend, I finally started on a task that I've been meaning to do for ages, and put together a first version of a shared grammar that is intended to cover both English and the Scandinavian languages Swedish, Norwegian and Danish. I've checked it into Regulus/Grammar/Scandinavian. Initially, I'm developing using the CALL-SLT restaurant domain, in English and Swedish. The Swedish version can already handle a reasonable range of language. Here's an example: skulle jag kunna få en pizza = would I be-able-to get a pizza. This is a common way to ask for something in Swedish.
>> skulle jag kunna få en pizza
(Parsing with left-corner parser)

Analysis time: 0.34 seconds

3 possibilities:

----------------------------------------------------------------
Possibility 1
Return value: [(null=[action,få]), (object=[food,pizza]), (null=[modal,kan]),
(null=[modal,skulle]), (agent=[pronoun,jag]), (null=[utterance_type,ynq]),
(null=[voice,active])]

Global value: []

Syn features: []

Parse tree:

.MAIN [GENERAL_SCA:541-546]
top [GENERAL_SCA:552-558]
/ utterance_intro null [GENERAL_SCA:566-568]
| utterance [GENERAL_SCA:615-620]
| s [GENERAL_SCA:713-718]
| s [GENERAL_SCA:817-826]
| vp [GENERAL_SCA:1124-1137]
| / vbar [GENERAL_SCA:876-898]
| | / v lex(skulle) [GEN_SWE_LEX:51-51]
| | | np [GENERAL_SCA:1952-1960]
| | \ pronoun lex(jag) [GEN_SWE_LEX:200-200]
| | vp [GENERAL_SCA:1124-1137]
| | / vbar [GENERAL_SCA:853-875]
| | | v lex(kunna) [GEN_SWE_LEX:61-62]
| | | vp [GENERAL_SCA:1317-1337]
| | | / vp [GENERAL_SCA:1042-1051]
| | | | / vbar [GENERAL_SCA:853-875]
| | | | | v lex(få) [CALLSLT_LEX:33-33]
| | | | | np [GENERAL_SCA:2073-2091]
| | | | | / np [GENERAL_SCA:1907-1917]
| | | | | | / d lex(en) [GEN_SWE_LEX:277-278]
| | | | | | | nbar [GENERAL_SCA:2118-2130]
| | | | | | \ n lex(pizza) [CALLSLT_LEX:179-181]
| | | | \ \ post_mods null [GENERAL_SCA:1451-1457]
| \ \ \ post_mods null [GENERAL_SCA:1451-1457]
\ utterance_coda null [GENERAL_SCA:597-599]

------------------------------- FILES -------------------------------

CALLSLT_LEX: d:/cygwin/home/speech/call-slt/swe/regulus/callslt_lex.regulus
GENERAL_SCA: d:/cygwin/home/speech/regulus/grammar/scandinavian/general_sca.regulus
GEN_SWE_LEX: d:/cygwin/home/speech/regulus/grammar/scandinavian/swedish/gen_swe_lex.regulus

Here are some of the issues I've encountered so far:
  • Agreement is different in English and Swedish, so the range of possible values for the 'agr' feature has to be language-dependent. In English, agreement is by person and number. In Swedish, it's primarily by number and gender ("common" or "neuter"). However, you also need person, since reflexive pronouns agree with the subject in person.
  • All verbs invert in Swedish, not just auxiliaries as in English, and there is no auxiliary "do".
  • The range of verb inflections is different in the two languages. English verbs have five forms: base, third person singular present, imperfect, past participle, present participle. So, for example, go, goes, went, gone, going. Modern Swedish doesn't inflect by person or number in the present tense, but on the other hand distinguishes the imperative from the infinitive, distinguishes the "supine" (the form used for the perfect tense) from the past participle, and inflects the past participle by number and gender. So, for example, bryta (break) has the forms bryt (imperative), bryta (infinitive), bryter (present), bröt (imperfect), brutit (supine), brytande (present participle), bruten (past participle singular common), brutet (past participle singular neuter), brutna (past participle plural).
  • Swedish nouns inflect for definiteness. So for example bord is "table", but bordet is "the table". Adjectives also inflect for definiteness, thus ett stort bord ("a big table") but det stora bordet ("the big table").
  • Swedish possessives inflect for gender and number. So min bil ("my car", common/singular), mitt hus ("my house", neuter/singular), mina barn ("my children", plural).
  • Swedish partitives are slightly different. In English, "a bottle of beer"; in Swedish en flaska öl (nothing corresponding to the "of").
  • Swedish date and time grammar is slightly different. In English, "december fourteenth"; in Swedish, fjortonde december. In English, "nine thirty"; in Swedish, nio och trettio.
  • Swedish negation is basically an adverb, e.g. jag beställde inte någon pizza = I ordered not any pizza. The negation adverb comes after the finite verb in main clauses and before it in subordinate clauses.
Most of this stuff seems easy, and I can adapt the treatments we implemented in the SLT grammar during the 90s. I am guessing that 85-90% of the rules in the final shared grammar will be common to English and Swedish. Judging from our experience with SLT, Danish and Swedish overlap by 95% or more, and Norwegian should be similar.
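
To make the negation point concrete, here is a minimal Python sketch of the word-order difference. It is only an illustration, not how the Regulus grammar handles it: the function place_negation and its arguments are hypothetical, and it assumes a subject-initial clause with a single finite verb, so it ignores inversion and other adverbs. The subordinate-clause output is my own example (the word order that follows a complementizer such as att), not something taken from the corpus.

```python
from typing import List


def place_negation(subject: str,
                   finite_verb: str,
                   rest: List[str],
                   clause_type: str,
                   negation: str = "inte") -> str:
    """Linearize a simple negated clause (hypothetical helper, not Regulus)."""
    if clause_type == "main":
        # Main clause: the negation adverb follows the finite verb.
        words = [subject, finite_verb, negation, *rest]
    elif clause_type == "subordinate":
        # Subordinate clause: the negation adverb precedes the finite verb.
        words = [subject, negation, finite_verb, *rest]
    else:
        raise ValueError(f"unknown clause type: {clause_type!r}")
    return " ".join(words)


print(place_negation("jag", "beställde", ["någon", "pizza"], "main"))
print(place_negation("jag", "beställde", ["någon", "pizza"], "subordinate"))
```

The first call reproduces the main-clause example from the list above; the second gives the order you would find after att.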