Tuesday, 8 July 2008

N-best rescoring again

[Updated July 9]

I've just checked in new code that makes it possible to create training material for N-best rescoring in speech translation applications. The functionality is basically the same as what we already had for dialogue applications, but a number of details had to be fixed. It seems that the potential for improving performance using N-best rescoring varies considerably between apps. So far, we've looked at the following cases:
  • Calendar: rescoring can already almost halve the error rate, and more should be possible.
  • Ford app: almost no potential for improvement.
  • Paideia app: considerable potential for improvement (we don't currently have figures).
  • English MedSLT: maximum possible improvement looks like about 10% relative.
  • French MedSLT: maximum possible improvement about 15-20% relative.
  • Japanese MedSLT: almost no potential for improvement.
The variation in behavior between the different apps is quite surprising. In particular, I don't yet have a good explanation for why the MedSLT languages should be so different.
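
For concreteness, here is a minimal sketch of one way to estimate the "maximum possible improvement" for an app. It assumes (this is not spelled out above) that the figure is an oracle bound: the error rate you would get if rescoring always picked a correct hypothesis whenever one is present in the N-best list, compared with always taking the top hypothesis. The function name, the data layout and the toy utterances are all hypothetical and are not taken from the checked-in code.

```python
from typing import List, Tuple

# Hypothetical layout: one N-best list per utterance, ordered by recogniser
# rank, where each entry is (hypothesis_text, is_correct).
NBest = List[Tuple[str, bool]]


def rescoring_headroom(utterances: List[NBest]) -> Tuple[float, float, float]:
    """Return (baseline_error, oracle_error, max_relative_improvement).

    baseline_error: error rate when the top hypothesis is always chosen.
    oracle_error:   error rate when a correct hypothesis is chosen whenever
                    the N-best list contains one (assumed upper bound).
    """
    if not utterances:
        raise ValueError("need at least one utterance")

    total = len(utterances)
    # An empty N-best list counts as an error in both cases.
    top_errors = sum(1 for nbest in utterances if not (nbest and nbest[0][1]))
    oracle_errors = sum(
        1 for nbest in utterances if not any(ok for _, ok in nbest)
    )

    baseline = top_errors / total
    oracle = oracle_errors / total
    relative = (baseline - oracle) / baseline if baseline > 0 else 0.0
    return baseline, oracle, relative


# Toy data, invented purely for illustration.
toy = [
    [("when is the next meeting", True), ("when is the last meeting", False)],
    [("where is the meeting", False), ("when is the meeting", True)],
    [("who attends", False), ("who attend", False)],
    [("is there a meeting", False)],
]
baseline, oracle, relative = rescoring_headroom(toy)
print(f"baseline={baseline:.2f} oracle={oracle:.2f} relative={relative:.2f}")
```

On this reading, an app like the Ford or Japanese MedSLT one would be a case where the oracle error is close to the baseline error, so that no reranker could gain much.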
