Sunday, 16 August 2009

First CALL-SLT meeting

We've had our first CALL-SLT meeting, which has done a lot to clarify our immediate goals for the project. We're going to start by building a simple version of the system, constructed in such a way that it will be easy to upgrade by successively replacing simple modules with more complex ones. Initially, we will be working in the tourism/restaurant domain, using the languages English, French and Japanese. When we have those working well enough, we'll also start on Chinese; this is a language none of us knows, so it will give us intuitions about what it's like to be an elementary-level student trying to use the system to get some fluency in a language.

The initial prototype will work as follows. At each turn, the system prompts the student with a description of what they are supposed to say, formulated in a version of the interlingua. The student will attempt to say it in the L2 (the language they are trying to learn). The system applies speech recognition to the student's utterance, then tries to translate the result into the interlingua. Finally, it compares the translated interlingua with the one used to prompt the student, and gives them feedback on how they did. Here are more details:
  • Prompting in Interlingua. In the first version, the interlingua will be shown to the student in a text-based form, using the methods we've developed under MedSLT. So for example, the system might show the student

    POLITE REQUEST TABLE 3 PERSON TIME 19:30

    expecting the student to say something like

    I would like to reserve a table for three people at seven thirty

    or whatever the equivalent is in the L2 they are using.

  • As soon as we've figured out a good way to do it, we would like to be able to present the interlingua prompt in graphical form. So here, we might have a picture that could be described as

    Scene:
    Client is talking to waiter.
    Speech bubble from client.
    Inside speech bubble:
    three chairs around a restaurant table;
    large clock in background shows 19:30

  • All speech input to the system will be logged in the usual way. A registration process will let us associate each recorded utterance with metadata specifying, in particular, whether the utterance was recorded by a native speaker and whether speech recognition got it right.

  • When the system has compared the student's interlingua with the prompt interlingua, there are two simple ways for it to give helpful feedback. The first is to present both versions of the interlingua, highlighting the elements that are different. For instance, in the example above, if the system recognized

    Could I have a table for two people at seven thirty

    then the system would present the prompt and recognized interlinguas roughly as follows:

    POLITE REQUEST TABLE *3* PERSON TIME 19:30
    POLITE REQUEST TABLE *2* PERSON TIME 19:30

    The second way to give help will be to play an example of a native speaker saying some version of the sentence in the L2, if such an example already exists.

  • The prompt selection module will have hooks allowing specification of a strategy. A simple strategy we will implement soon is to choose the prompt from a list of examples where there is a recorded example of a native speaker saying the prompt in the L2, possibly with some other constraints. This will make it easy for a teacher to create a lesson. They will first interact with the system in the L2, to create a set of recorded examples which work correctly. When the student logs on, the system will then be set to select prompts matching the teacher's examples.

  • The functionality will be bundled up as a Prolog-based server, which does most of the processing, and will connect to a lightweight Java-based GUI which presents a client view. The server will initially handle two messages: (1) NEXT_EXAMPLE, returning a new interlingua prompt with associated information, and (2) RECOGNISE, prompting the student to speak, carrying out recognition, and returning the results of the interlingua comparison.
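To make the turn structure described above concrete, here is a minimal sketch in Python. It is not the planned implementation (the real server will be Prolog-based); it only illustrates the interlingua comparison with highlighted differences and the two messages. All names (highlight_differences, TurnServerSketch, the select hook) are hypothetical, and speech recognition and translation are stubbed out: the RECOGNISE handler simply receives the interlingua that those steps would have produced.

```python
import difflib


def highlight_differences(prompt, recognised):
    """Return both interlinguas with differing tokens wrapped in asterisks."""
    p, r = prompt.split(), recognised.split()
    p_out, r_out = [], []
    # Align the two token sequences so that an inserted or missing
    # element does not throw off everything that follows it.
    matcher = difflib.SequenceMatcher(a=p, b=r, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            p_out.extend(p[i1:i2])
            r_out.extend(r[j1:j2])
        else:
            p_out.extend(f"*{tok}*" for tok in p[i1:i2])
            r_out.extend(f"*{tok}*" for tok in r[j1:j2])
    return " ".join(p_out), " ".join(r_out)


class TurnServerSketch:
    """Hypothetical stand-in for the server's two initial messages."""

    def __init__(self, prompts, select=None):
        self.prompts = list(prompts)
        # Strategy hook: by default, cycle through the prompts in order.
        self.select = select or (lambda ps, turn: ps[turn % len(ps)])
        self.turn = 0
        self.current = None

    def handle(self, message, recognised_interlingua=None):
        if message == "NEXT_EXAMPLE":
            self.current = self.select(self.prompts, self.turn)
            self.turn += 1
            return {"prompt": self.current}
        if message == "RECOGNISE":
            if self.current is None:
                raise ValueError("RECOGNISE sent before NEXT_EXAMPLE")
            # Assumption: recognition and translation have already run,
            # and their result is passed in as recognised_interlingua.
            shown_prompt, shown_recognised = highlight_differences(
                self.current, recognised_interlingua
            )
            return {
                "match": self.current.split() == recognised_interlingua.split(),
                "prompt": shown_prompt,
                "recognised": shown_recognised,
            }
        raise ValueError(f"unknown message: {message}")


if __name__ == "__main__":
    server = TurnServerSketch(["POLITE REQUEST TABLE 3 PERSON TIME 19:30"])
    print(server.handle("NEXT_EXAMPLE")["prompt"])
    result = server.handle("RECOGNISE", "POLITE REQUEST TABLE 2 PERSON TIME 19:30")
    print(result["prompt"])
    print(result["recognised"])
```

A teacher-driven lesson would correspond to passing in only those prompts for which a native-speaker recording exists, or to supplying a different select function; how the real prompt selection module will expose its hooks is still open.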
