Tuesday, 24 June 2008

McNemar at the word level?

I was thinking about our paper for GoTAL, and one thing that's bothered me a little is that we did all the significance testing using SER - the reason was that it's easy to run a McNemar test. However, we got rather bigger improvements in WER, which is really what you would expect from SLMs.

It seems to me though that you should also be able to do McNemar at the word level. You look at each word in the transcription, and then check each of the two hypotheses you're comparing to see whether they include it. This is a little coarse-grained (you treat each sentence as a bag of words), but I'd guess it would still give interesting results. Shouldn't be at all hard to implement either. If we do an expanded version of the GoTAL paper, I'd definitely like to try this.
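To make the idea concrete, here is a minimal sketch in Python of what the bag-of-words version might look like. Everything in it is an assumption for illustration: the function names (`word_hits`, `discordant_counts`, `mcnemar_exact`) are made up, hypotheses are plain whitespace-separated strings, and the p-value comes from the exact two-sided binomial form of McNemar's test on the discordant word counts. It only asks whether each reference word turns up in each hypothesis, so extra words in a hypothesis are ignored.

```python
from collections import Counter
from math import comb


def word_hits(reference, hypothesis):
    """For each word token in the reference, record whether the hypothesis
    (treated as a bag of words) contains it. Each hypothesis token can be
    matched at most once."""
    available = Counter(hypothesis.split())
    hits = []
    for word in reference.split():
        if available[word] > 0:
            available[word] -= 1
            hits.append(True)
        else:
            hits.append(False)
    return hits


def discordant_counts(references, hyps_a, hyps_b):
    """Count reference words found by system A only and by system B only."""
    only_a = only_b = 0
    for ref, hyp_a, hyp_b in zip(references, hyps_a, hyps_b):
        for hit_a, hit_b in zip(word_hits(ref, hyp_a), word_hits(ref, hyp_b)):
            if hit_a and not hit_b:
                only_a += 1
            elif hit_b and not hit_a:
                only_b += 1
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value: under the null hypothesis each
    discordant word is equally likely to favour either system."""
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    refs = ["switch on the light", "dim the kitchen light"]
    sys_a = ["switch on light", "dim kitchen light"]
    sys_b = ["switch on the light", "dim the kitchen light"]
    a_only, b_only = discordant_counts(refs, sys_a, sys_b)
    print(a_only, b_only, mcnemar_exact(a_only, b_only))
```

Words that both systems get right, or both get wrong, drop out of the test, just as concordant sentences do in the sentence-level version.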

In fact, this idea is so obvious that either it's wrong, or someone must already have thought of it. Any idea which?

PS Jun 26. Beth Ann pointed out that the proposal as originally formulated only covered deletions, but it's trivial to extend it to do insertions too. More seriously, she wondered if the significance results would always be reliable, given that there may be subtle dependencies. I am really not sure about this, but one way to investigate the idea empirically would be to generate large sets of simulated recognition results using a stochastic process, and look at the distributions. For example, if you generate 10000 simulated recognition runs from the same process, then take one run and find all the other runs that come out as different from it at P < 0.01 according to the new statistic, you'd be reassured to find there were not more than about 100 of them, i.e. 1% of the total. A lot more, and something is presumably wrong. A lot fewer presumably just shows the test isn't very sensitive.
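A rough sketch of that simulation, again in Python and again with made-up names (`simulate_run`, `fraction_flagged`). It assumes a deliberately crude stochastic process: each run is just a sequence of per-word correct/incorrect flags with the same underlying accuracy, and a hypothetical `stickiness` parameter makes a word copy the outcome of the previous word with that probability, as one simple way of introducing dependencies. Real recognition errors will certainly be messier than this, so treat it as a starting point, not a validation of the test.

```python
import random
from math import comb


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value on the discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def simulate_run(rng, n_words, accuracy, stickiness):
    """One simulated run: a list of booleans, True meaning the word was
    recognised. With probability `stickiness` a word repeats the outcome
    of the previous word; otherwise it is a fresh draw at `accuracy`."""
    run = []
    for _ in range(n_words):
        if run and rng.random() < stickiness:
            run.append(run[-1])
        else:
            run.append(rng.random() < accuracy)
    return run


def fraction_flagged(n_runs=1000, n_words=500, accuracy=0.85,
                     stickiness=0.0, alpha=0.01, seed=0):
    """Compare the first run against all the others and return the fraction
    that the word-level test calls different at the given alpha."""
    rng = random.Random(seed)
    runs = [simulate_run(rng, n_words, accuracy, stickiness)
            for _ in range(n_runs)]
    baseline, others = runs[0], runs[1:]
    flagged = 0
    for other in others:
        only_base = sum(x and not y for x, y in zip(baseline, other))
        only_other = sum(y and not x for x, y in zip(baseline, other))
        if mcnemar_exact(only_base, only_other) < alpha:
            flagged += 1
    return flagged / len(others)


if __name__ == "__main__":
    # All runs come from the same process, so every flagged run is a false alarm.
    print("independent errors:", fraction_flagged(stickiness=0.0))
    print("bursty errors:     ", fraction_flagged(stickiness=0.5))
```

Comparing the flagged fraction with and without `stickiness` is one way to see how much the dependencies matter. Since a single baseline run can be unlucky, it would be worth repeating this with several seeds before reading much into the numbers.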
