Monday, 14 December 2009

A Scandinavian/English grammar

This weekend, I finally started on a task I've been meaning to do for ages: I put together a first version of a shared grammar intended to cover both English and the Scandinavian languages Swedish, Norwegian and Danish. I've checked it into Regulus/Grammar/Scandinavian. Initially, I'm developing with the CALL-SLT restaurant domain, in English and Swedish. The Swedish version can already handle a reasonable range of language. Here's an example, which is a common way to ask for something in Swedish: skulle jag kunna få en pizza, literally "would I be-able-to get a pizza".
>> skulle jag kunna få en pizza
(Parsing with left-corner parser)

Analysis time: 0.34 seconds

3 possibilities:

Possibility 1
Return value: [(null=[action,få]), (object=[food,pizza]), (null=[modal,kan]),
(null=[modal,skulle]), (agent=[pronoun,jag]), (null=[utterance_type,ynq]),

Global value: []

Syn features: []

Parse tree:

top [GENERAL_SCA:552-558]
/ utterance_intro null [GENERAL_SCA:566-568]
| utterance [GENERAL_SCA:615-620]
| s [GENERAL_SCA:713-718]
| s [GENERAL_SCA:817-826]
| vp [GENERAL_SCA:1124-1137]
| / vbar [GENERAL_SCA:876-898]
| | / v lex(skulle) [GEN_SWE_LEX:51-51]
| | | np [GENERAL_SCA:1952-1960]
| | \ pronoun lex(jag) [GEN_SWE_LEX:200-200]
| | vp [GENERAL_SCA:1124-1137]
| | / vbar [GENERAL_SCA:853-875]
| | | v lex(kunna) [GEN_SWE_LEX:61-62]
| | | vp [GENERAL_SCA:1317-1337]
| | | / vp [GENERAL_SCA:1042-1051]
| | | | / vbar [GENERAL_SCA:853-875]
| | | | | v lex(få) [CALLSLT_LEX:33-33]
| | | | | np [GENERAL_SCA:2073-2091]
| | | | | / np [GENERAL_SCA:1907-1917]
| | | | | | / d lex(en) [GEN_SWE_LEX:277-278]
| | | | | | | nbar [GENERAL_SCA:2118-2130]
| | | | | | \ n lex(pizza) [CALLSLT_LEX:179-181]
| | | | \ \ post_mods null [GENERAL_SCA:1451-1457]
| \ \ \ post_mods null [GENERAL_SCA:1451-1457]
\ utterance_coda null [GENERAL_SCA:597-599]

------------------------------- FILES -------------------------------

CALLSLT_LEX: d:/cygwin/home/speech/call-slt/swe/regulus/callslt_lex.regulus
GENERAL_SCA: d:/cygwin/home/speech/regulus/grammar/scandinavian/general_sca.regulus
GEN_SWE_LEX: d:/cygwin/home/speech/regulus/grammar/scandinavian/swedish/gen_swe_lex.regulus

Here are some of the issues I've encountered so far:
  • Agreement is different in English and Swedish, so the range of possible values for the 'agr' feature has to be language-dependent. In English, agreement is by person and number. In Swedish, it's primarily by number and gender ("common" or "neuter"). However, you also need person, since reflexive pronouns agree with the subject in person.
  • In Swedish, all verbs can invert with the subject, not just auxiliaries as in English, and there is no auxiliary "do".
  • The range of verb inflections is different in the two languages. English verbs have five forms: base, third person singular present, imperfect, past participle, present participle. So, for example, go, goes, went, gone, going. Modern Swedish doesn't inflect by person or number in the present tense, but on the other hand distinguishes the imperative from the infinitive, distinguishes the "supine" (the form used for the perfect tense) from the past participle, and inflects the past participle by number and gender. So, for example, bryta (break) has the forms bryt (imperative), bryta (infinitive), bryter (present), bröt (imperfect), brutit (supine), brytande (present participle), bruten (past participle singular common), brutet (past participle singular neuter), brutna (past participle plural).
  • Swedish nouns inflect for definiteness. So for example bord is "table", but bordet is "the table". Adjectives also inflect for definiteness, thus ett stort bord ("a big table") but det stora bordet ("the big table").
  • Swedish possessives inflect for gender and number. So min bil ("my car", common/singular), mitt hus ("my house", neuter/singular), mina barn ("my children", plural).
  • Swedish partitives are slightly different. In English, "a bottle of beer"; in Swedish en flaska öl (nothing corresponding to the "of").
  • Swedish date and time grammar is slightly different. In English, "December fourteenth"; in Swedish, fjortonde december. In English, "nine thirty"; in Swedish, nio och trettio.
  • Swedish negation is basically an adverb, e.g. jag beställde inte någon pizza, literally "I ordered not any pizza". The negation adverb comes after the finite verb in main clauses and before it in subordinate clauses.
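
To make a few of these points concrete, here is a rough sketch in Python rather than Regulus. It is not how the shared grammar encodes anything: the function and feature names are hypothetical, and treating Swedish 'agr' as a full cross product of person, number and gender is a simplifying assumption. It only illustrates a language-dependent value space for 'agr', possessive agreement, and negation placement, using the examples from the list above.

```python
from itertools import product

# Hypothetical feature inventories; the names are illustrative and are
# not taken from the Regulus grammar files.
PERSONS = (1, 2, 3)
NUMBERS = ("sing", "plur")
GENDERS = ("common", "neuter")


def agr_values(language):
    """Enumerate the possible values of the 'agr' feature for a language."""
    if language == "english":
        # English: agreement by person and number.
        return list(product(PERSONS, NUMBERS))
    if language == "swedish":
        # Swedish: number and gender, plus person for reflexive pronouns.
        # Assumption: a plain cross product, which is a simplification.
        return list(product(PERSONS, NUMBERS, GENDERS))
    raise ValueError(f"unknown language: {language}")


def possessive_my(number, gender):
    """Pick the Swedish form of 'my' that agrees with the possessed noun."""
    if number == "plur":
        return "mina"  # one plural form for both genders
    return {"common": "min", "neuter": "mitt"}[gender]


def place_negation(subject, verb, rest, subordinate=False):
    """Put 'inte' after the finite verb in main clauses, before it otherwise."""
    if subordinate:
        words = [subject, "inte", verb] + list(rest)
    else:
        words = [subject, verb, "inte"] + list(rest)
    return " ".join(words)


if __name__ == "__main__":
    print(len(agr_values("english")), len(agr_values("swedish")))
    print(possessive_my("sing", "common"), "bil")
    print(possessive_my("sing", "neuter"), "hus")
    print(possessive_my("plur", "neuter"), "barn")
    print(place_negation("jag", "beställde", ["någon", "pizza"]))
    print(place_negation("jag", "beställde", ["någon", "pizza"], subordinate=True))
```

The subordinate-clause call gives the word order you would see after a complementizer such as att ("that").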
Most of this stuff seems easy, and I can adapt the treatments we implemented in the SLT grammar during the 90s. I am guessing that 85-90% of the rules in the final shared grammar will be common to English and Swedish. Judging from our experiences with SLT, Danish and Swedish overlap by 95% or more, and Norwegian should be similar.
