How I Calculate the HMM Performance

All the charts showing the performance of my HMMs are created using the holdout method. The entire corpus of tagged text is divided into two subsets: I train my HMMs on one subset (the “training” set), and then I use the HMMs to auto-tag the text in the second subset (the “test” set), comparing the HMMs’ predictions with the tags in the test set. The percentage of the corpus assigned to the training set (with the rest going to the test set) varies, and for each percentage value I repeat the run a number of times, each time with a fresh random split. In pseudo-code:

for(trainPercentage from 90 down to 10):
{
	for(i from 1 to 100):
	{
		// a fresh random split for every run
		DivideCorpusRandomly(trainPercentage) -> TrainingSet, TestSet;
		HMM hmm = new HMM();
		hmm.Train(TrainingSet);
		hmm.Predict(TestSet) -> predictions;
		// collect the metrics of each run
		Compare(predictions, TestSet.actualValues) -> add to PerformanceMetrics;
	}
	Show(PerformanceMetrics);
}
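
For readers who want something runnable, here is a minimal Python sketch of the same loop. It is not my actual code. The `MostFrequentTagger` class is a hypothetical stand-in for the HMM (it just remembers the most frequent tag per token), and `split_corpus` and `holdout_accuracy` are names made up for this example. It also assumes a corpus that is a list of sentences, each a list of (token, tag) pairs, and it averages token-level accuracy over the runs, which is only one possible choice of metric.

```python
import random
from collections import Counter, defaultdict


class MostFrequentTagger:
    """Hypothetical stand-in for the HMM: picks the tag seen most often per token."""

    def train(self, sentences):
        per_token = defaultdict(Counter)
        all_tags = Counter()
        for sentence in sentences:
            for token, tag in sentence:
                per_token[token][tag] += 1
                all_tags[tag] += 1
        self.best = {tok: c.most_common(1)[0][0] for tok, c in per_token.items()}
        # Fallback tag for tokens never seen in training.
        self.default = all_tags.most_common(1)[0][0]

    def predict(self, tokens):
        return [self.best.get(tok, self.default) for tok in tokens]


def split_corpus(corpus, train_percentage, rng):
    """Shuffle a copy of the corpus and cut it into (training set, test set)."""
    shuffled = corpus[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_percentage // 100
    return shuffled[:cut], shuffled[cut:]


def holdout_accuracy(corpus, train_percentage, runs, seed=0):
    """Average token-level accuracy over `runs` random holdout splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        training_set, test_set = split_corpus(corpus, train_percentage, rng)
        if not training_set or not test_set:
            raise ValueError("corpus too small for this split percentage")
        tagger = MostFrequentTagger()
        tagger.train(training_set)
        correct = total = 0
        for sentence in test_set:
            tokens = [tok for tok, _ in sentence]
            gold = [tag for _, tag in sentence]
            predicted = tagger.predict(tokens)
            correct += sum(p == g for p, g in zip(predicted, gold))
            total += len(gold)
        scores.append(correct / total)
    return sum(scores) / len(scores)
```

Calling `holdout_accuracy(corpus, 90, 100)` would correspond to one pass of the outer loop above, with 90% of the sentences used for training.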

I can also vary the values of a confidenceThreshold parameter, which cuts off predictions having a low confidence, but I’m currently not using this parameter. I’ll leave its investigation for later, when my F score gets better :-)
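
To give an idea of what such a threshold would do, here is a small Python sketch. It assumes each prediction carries a confidence value and a flag saying whether it matched the test set. The function name, the dictionary keys and the use of the balanced F score (F1) are all assumptions made for this example, not a description of my code.

```python
def f_score_with_threshold(predictions, gold_count, confidence_threshold):
    """F1 after dropping predictions below the confidence threshold.

    predictions: list of dicts with hypothetical keys "confidence" and "correct".
    gold_count: number of tags actually present in the test set.
    """
    kept = [p for p in predictions if p["confidence"] >= confidence_threshold]
    true_positives = sum(1 for p in kept if p["correct"])
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / gold_count if gold_count else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Raising the threshold tends to trade recall for precision: fewer predictions survive, but the survivors are more likely to be right.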

Regarding counting “right” and “wrong” tags, I am being very strict with myself. Some papers I’ve read count the number of tokens correctly tagged by the HMM (so tagging “Jada Pinkett” in “Jada Pinkett Smith” would score 2 out of 3), while I count a tag only when it matches in full (in the example above I’d have scored 0). Should I count tokens as well, and be nicer to myself along the way? :-)
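
The difference between the two ways of counting can be sketched in a few lines of Python. This assumes tags are stored as (start, end, label) token spans with an exclusive end. That representation, the "PERSON" label and both function names are assumptions for illustration only.

```python
def strict_hits(gold_spans, predicted_spans):
    """Whole-tag counting: a prediction scores only if the full span matches."""
    return len(set(gold_spans) & set(predicted_spans))


def token_hits(gold_spans, predicted_spans):
    """Token counting: every correctly labelled token scores on its own."""
    def expand(spans):
        return {(i, label) for start, end, label in spans for i in range(start, end)}
    return len(expand(gold_spans) & expand(predicted_spans))


# "Jada Pinkett Smith" occupies tokens 0..2; the tagger found only "Jada Pinkett".
gold = [(0, 3, "PERSON")]
predicted = [(0, 2, "PERSON")]
strict = strict_hits(gold, predicted)   # 0 of 1 tags
tokens = token_hits(gold, predicted)    # 2 of 3 tokens
```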
