This paper describes a system for identifying Verbal Multi-Word Expressions (VMWEs) in running text. The system mainly exploits universal syntactic dependency features through a Conditional Random Fields (CRF) sequence model. It competed in the Closed Track of the PARSEME VMWE Shared Task 2017, ranking 2nd in most languages on full VMWE-based evaluation and 1st in three languages on token-based evaluation. In addition, the paper presents an option to re-rank the 10 best CRF-predicted sequences via semantic vectors, which boosts the system's scores above those of the other systems in the competition. We also show that all systems in the competition would struggle to beat a simple lookup baseline system, and we argue for a more purpose-specific evaluation scheme.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Uncontrolled Keywords: Multiword Expression; MWE Identification; Conditional Random Fields; Semantic Reranking
Markantonatou, Stella, Ramisch, Carlos, Savary, Agata and Vincze, Veronika, (eds.)
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017).
Association for Computational Linguistics (ACL).