Semantic Similarity Measures Evaluation

This page contains materials related to the evaluation of semantic similarity measures described in the following papers:

sim-eval: a tool for evaluation of semantic similarity measures

The evaluation of the results in the papers mentioned above essentially relies on sim-eval. This tool contains data and scripts for benchmarking and comparison of semantic similarity measures. The tool performs two kinds of evaluation: correlation with human judgments and semantic relation ranking. Accordingly, it relies on two types of ground truth: human judgments about semantic similarity and manually crafted semantic relations, drawn from five common openly available datasets -- MC (Miller and Charles, 1991), RG (Rubenstein and Goodenough, 1965), WordSim353 (Finkelstein et al., 2001), BLESS (Baroni and Lenci, 2011), and SN (Panchenko and Morozova, 2012).
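The first evaluation mode, correlation with human judgments, is typically computed as a Spearman rank correlation between a measure's similarity scores and the human ratings. The following is an illustrative sketch, not sim-eval code; the word-pair scores are invented for the example.

```python
# Sketch (not part of sim-eval): Spearman rank correlation between a
# similarity measure's scores and human judgments for the same word pairs.

def ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: human ratings (e.g. the 0-4 MC scale) vs. measure output
human   = [3.92, 3.84, 0.42, 1.18, 2.97]
measure = [0.81, 0.75, 0.05, 0.20, 0.60]
print(round(spearman(human, measure), 3))  # prints 1.0 (same ranking)
```

A measure that orders the pairs exactly as the human judges do gets a correlation of 1, regardless of the absolute scale of its scores.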

The evaluation tool takes as input:

It outputs the evaluation scores to the standard output and to the file "scores.txt". Optionally, it also produces several plots; the source files of the plots (.fig files) can be opened in MATLAB. Below you can find references to the input and output files for the different semantic similarity measures described in the papers mentioned above.
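The second evaluation mode, semantic relation ranking, can be sketched as follows. For a target word, a measure scores candidate words, and one checks how many of the top-k candidates stand in a manually crafted semantic relation with the target (precision at k, in the spirit of BLESS). This is an illustrative sketch, not sim-eval code, and all data in it is invented.

```python
# Sketch (not part of sim-eval): precision at k for semantic relation ranking.

def precision_at_k(scored_candidates, related, k):
    """scored_candidates: list of (word, similarity score) pairs;
    related: set of words in a true semantic relation with the target."""
    top = sorted(scored_candidates, key=lambda ws: ws[1], reverse=True)[:k]
    hits = sum(1 for word, _ in top if word in related)
    return hits / k

# Hypothetical candidates for the target word "car"
candidates = [("vehicle", 0.9), ("wheel", 0.7), ("banana", 0.6),
              ("engine", 0.5), ("cloud", 0.1)]
related = {"vehicle", "wheel", "engine"}
print(precision_at_k(candidates, related, 3))  # 2 of the top 3 are related
```

A measure that ranks the genuinely related words above the random ones achieves a higher precision at k.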

KONVENS 2012 Paper

Here you can download similarity matrices extracted with various versions of the PatternSim measure described in the paper. You can also download evaluation scores of these measures as well as the scores of the baselines. The measures below are denoted in the same way as in the paper, e.g. "Efreq" or "Random".

EACL 2012 Workshop Paper

Here you can download evaluation scores of the baseline and the combined measures presented in the paper. Evaluation scores were generated by sim-eval. The measures below are denoted in the same way as in the paper.

JEP-TALN-RECITAL 2012 Paper

Here you can download evaluation scores of the baseline and the combined measures presented in the paper. Evaluation scores were generated by sim-eval. The measures below are denoted in the same way as in the paper.

Contact

For any questions concerning this evaluation of semantic similarity measures, please write to Alexander Panchenko.

Last Modification: 2 September 2012