Saturday, 2 January 2010

Swedish MedSLT, continued

I temporarily broke off work on Interlingua to Swedish and spent a day concentrating on the opposite direction. I built the recognition grammar by training on the union of the original recognition corpus and the generation corpus, which ensures that everything you can generate will also get recognized. I then did PCFG tuning using the set of translations produced from the combined Interlingua corpus, and used the same set of translations as the initial Swedish corpus for translation testing. All the corpora concerned are created on the fly as part of the make process, so the correspondences will stay up to date.
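
To make the corpus-union step concrete, here is a minimal sketch in Python. It is not the actual MedSLT build code: it assumes each corpus is simply a list of sentences, and the function names (union_corpus, uncovered) and the Swedish example sentences are made up for illustration.

```python
def normalize(sentence):
    """Lower-case and collapse whitespace so trivially different copies compare equal."""
    return " ".join(sentence.lower().split())


def union_corpus(recognition, generation):
    """Merge two corpora, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for sentence in list(recognition) + list(generation):
        norm = normalize(sentence)
        if norm and norm not in seen:
            seen.add(norm)
            merged.append(norm)
    return merged


def uncovered(generation, training):
    """Return generation sentences that are missing from the training corpus."""
    training_set = {normalize(s) for s in training}
    return [s for s in generation if normalize(s) not in training_set]


# Hypothetical example data, not taken from the real corpora.
recognition_corpus = ["har du ont i huvudet", "är smärtan svår"]
generation_corpus = ["är smärtan svår", "har du feber"]

training_corpus = union_corpus(recognition_corpus, generation_corpus)
missing = uncovered(generation_corpus, training_corpus)
```

With the union as training data, missing is empty by construction; a check like this is only a sanity test that the generated corpora have not drifted apart.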

It was easy to get things working in the Swedish -> Interlingua direction, and 98% of the translation corpus now produces well-formed Interlingua. I compiled a Swedish recognizer and hooked everything together to get a speech-to-speech system for Swedish -> English. Anecdotally, it's not bad.

The most urgent thing now is probably to add more Swedish coverage. There are several very common constructions that currently aren't in the specialized Swedish grammar.

