SimVerb-3500

SimVerb-3500 is a gold standard evaluation resource for semantic similarity of verbs.

We provide 3500 verb pairs, each rated for similarity on a scale from 0 to 10. Here are some examples:

Pair                      Rating
to reply / to respond     9.79
to participate / to join  5.64
to stay / to leave        0.17

SimVerb-3500 covers all normed verb types from the USF free-association database, and provides at least three examples for every VerbNet class.

Please contact Daniela Gerz with any questions.

Download

Download SimVerb-3500 by clicking here.

The .zip file includes the full dataset as well as a development and test split.
In addition to the averaged scores (as shown above), we also provide the raw individual ratings per annotator. Please see the accompanying readme file for file formats and further details.

Please cite the following paper if you use SimVerb in your work:

SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart and Anna Korhonen. EMNLP 2016.

State-of-the-Art

The table below benchmarks current models on SimVerb-3500. All numbers are Spearman's rank correlation (ρ) between the models' similarity scores and the human ratings.
Please consult the supplementary material for a description of the models.

Model                                       SimVerb-3500 full  Development-500  Test-3000
Word2Vec SGNS-BOW-8B (dim=500) [1]          0.348              0.378            0.350
Word2Vec SGNS-DEPS-8B (dim=500) [2][3]      0.356              0.389            0.351
Symmetric Pattern Vectors 8B (dim=500) [4]  0.328              0.276            0.347
Non-Distributional [5]                      0.596              0.632            0.600
Paragram (dim=300) [6]                      0.540              0.525            0.537
Paragram + counter-fitting (dim=300) [7]    0.628              0.611            0.624
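
As a rough illustration of how a Spearman score like those above can be obtained, here is a minimal Python sketch. It assumes that a model's similarity for a pair is the cosine of the two verb vectors and that pairs with a missing vector are skipped; both are assumptions made for this example, not necessarily the exact protocol behind the reported numbers. The function names (cosine, average_ranks, spearman, evaluate) and the two-dimensional toy_vectors are hypothetical; only the three rated pairs are taken from the examples listed earlier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs, vectors):
    """Correlate gold ratings with cosine similarities.

    Assumption for this sketch: pairs with a missing vector are skipped.
    """
    gold, predicted = [], []
    for verb1, verb2, rating in pairs:
        if verb1 in vectors and verb2 in vectors:
            gold.append(rating)
            predicted.append(cosine(vectors[verb1], vectors[verb2]))
    return spearman(gold, predicted)


# The three example pairs and ratings listed earlier on this page.
pairs = [
    ("reply", "respond", 9.79),
    ("participate", "join", 5.64),
    ("stay", "leave", 0.17),
]

# Hypothetical toy vectors, for illustration only (not from any real model).
toy_vectors = {
    "reply": [1.0, 0.1],
    "respond": [0.9, 0.2],
    "participate": [0.5, 0.5],
    "join": [0.1, 0.9],
    "stay": [1.0, 0.0],
    "leave": [-1.0, 0.2],
}

rho = evaluate(pairs, toy_vectors)
print(f"Spearman rho on the toy example: {rho:.3f}")
```

With the toy vectors, the cosine ordering of the three pairs matches the ordering of the human ratings, so the score comes out at (approximately) 1. In practice, the pairs would be read from the dataset files and the vectors from a trained model.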

References

[1] Tomas Mikolov, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In ICLR: Workshop Papers.

[2] Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In ACL, pages 302‐308.

[3] Roy Schwartz, Roi Reichart, and Ari Rappoport. 2016. Symmetric patterns and coordinations: Fast and enhanced representations of verbs and adjectives. In NAACL.

[4] Roy Schwartz, Roi Reichart, and Ari Rappoport. 2015. Symmetric pattern based word embeddings for improved word similarity prediction. In CoNLL, pages 258‐267.

[5] Manaal Faruqui and Chris Dyer. 2015. Non-distributional word vector representations. In ACL, pages 464‐469.

[6] John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2015. From paraphrase database to compositional paraphrase model and back. Transactions of the ACL, 3:345‐358.

[7] Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Maria Rojas‐Barahona, Pei‐Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve J. Young. 2016. Counter-fitting word vectors to linguistic constraints. In NAACL‐HLT.