Tuesday, 17 June 2008

"Paraphrase corpora" for estimating semantic error rates

I've implemented a first cut at the "paraphrase corpus" idea that I suggested in yesterday's post. So far, it only works for speech translation, but it's rather nice to see that we can now measure the effect of N-best rescoring on semantic error rate in a way that's both much quicker and much more objective than what we were doing previously. On the whole of the Eng corpus (the only one I've tried so far), N-best rescoring reduces semantic error rate on this metric by about 4% absolute, or 8% relative.
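
As a quick sanity check on how the two figures relate: assuming "relative" here means the absolute drop divided by the baseline error rate, they jointly imply a baseline semantic error rate of roughly 50%. The sketch below just does that arithmetic. The function names are made up for illustration, and the inputs are the rounded "about" figures quoted above, so the outputs are only approximate.

```python
def implied_baseline(absolute_drop: float, relative_drop: float) -> float:
    """Baseline error rate implied by an absolute and a relative reduction.

    Assumes relative_drop = absolute_drop / baseline.
    """
    return absolute_drop / relative_drop


def relative_reduction(baseline: float, rescored: float) -> float:
    """Relative reduction in error rate when going from baseline to rescored."""
    return (baseline - rescored) / baseline


# Rounded figures from the post: about 4% absolute, about 8% relative.
absolute_drop = 0.04
relative_drop = 0.08

baseline = implied_baseline(absolute_drop, relative_drop)
rescored = baseline - absolute_drop

print(f"implied baseline error rate: {baseline:.2f}")
print(f"implied rescored error rate: {rescored:.2f}")
print(f"relative reduction check:    {relative_reduction(baseline, rescored):.2f}")
```

Since both inputs are rounded, the true baseline could sit a few points either side of the implied value.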

My next task here is to extend the method to dialogue processing - this should be easy, I think. We will then be able to do dialogue N-best rescoring experiments using out-of-coverage as well as in-coverage data, which should open up several new possibilities.

