DORAS
DCU Online Research Access Service
Semantic reranking of CRF label sequences for verbal multiword expression identification

Moreau, Erwan, Alsulaimani, Ashjan, Maldonado, Alfredo ORCID: 0000-0001-8426-5249, Han, Lifeng, Vogel, Carl ORCID: 0000-0001-8928-8546 and Dutta Chowdhury, Koel (2018) Semantic reranking of CRF label sequences for verbal multiword expression identification. In: Markantonatou, Stella, Ramisch, Carlos ORCID: 0000-0001-7466-9039, Savary, Agata and Vincze, Veronika ORCID: 0000-0002-9844-2194, (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop. Language Science Press, Berlin, pp. 177-207. ISBN 978-3-96110-124-5

This is the latest version of this item.


Abstract

Verbal multiword expression (VMWE) identification can be addressed successfully as a sequence labeling problem via conditional random fields (CRFs) by returning the single label sequence with maximal probability. This work describes a system that reranks the 10 most likely CRF candidate VMWE sequences using a decision tree regression model. The reranker aims to operationalize the intuition that a non-compositional MWE can have a different distributional behavior from that of its constituent words. To this end, it uses semantic features based on comparing the context vector of a candidate expression against those of its constituent words. However, not all VMWEs are non-compositional, and analysis shows that non-semantic features also play an important role in the behavior of the reranker. In fact, the analysis shows that the combination of the sequential approach of the CRF component with the context-based approach of the reranker is the main factor of improvement: our reranker achieves a 12% macro-average F1-score improvement over the basic CRF method, as measured using data from the PARSEME shared task on VMWE identification.
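
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`context_vector`, `semantic_features`, `rerank`), the window size, the restriction to contiguous expressions, and the caller-supplied `score` function (a stand-in for the trained decision tree regressor) are all assumptions made for illustration. The sketch builds bag-of-words context vectors for a candidate expression and for each of its constituent words, compares them by cosine similarity, and then picks the best of the top candidate sequences according to the scoring function.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`.

    `target` is a tuple of tokens (length 1 for a single word). Only contiguous
    occurrences are matched, which is a simplification made for this sketch.
    """
    n = len(target)
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == target:
                left = sent[max(0, i - window):i]
                right = sent[i + n:i + n + window]
                counts.update(left + right)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_features(expr, sentences, window=2):
    """Compare the expression's context vector with those of its constituent words."""
    expr_vec = context_vector(expr, sentences, window)
    sims = [cosine(expr_vec, context_vector((w,), sentences, window)) for w in expr]
    return {"min_sim": min(sims), "mean_sim": sum(sims) / len(sims)}


def rerank(candidates, score):
    """Return the highest-scoring candidate.

    `candidates` is a list of (label_sequence, crf_probability, features) tuples,
    e.g. the top 10 sequences from a CRF. `score` is a hypothetical stand-in for
    a trained regression model that predicts the quality of each candidate.
    """
    return max(candidates, key=score)
```

A low similarity between an expression and its constituent words is the kind of signal the abstract associates with non-compositional expressions. In a fuller system the `score` function would be replaced by a regressor trained on such features together with non-semantic ones, such as the CRF probability of the sequence.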

Item Type: Book Section
Refereed: Yes
Uncontrolled Keywords: Multiword Expression, MWE identification, semantic reranking, conditional random fields
Subjects: Computer Science > Algorithms; Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Computer engineering; Computer Science > Machine learning; Humanities > Language; Humanities > Linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Publisher: Language Science Press
Official URL: http://dx.doi.org/10.5281/zenodo.1469559
Copyright Information: © 2017 The Authors (CC BY 4.0)
Funders: Science Foundation Ireland (SFI) Research Centres Programme (Grant 13/RC/2106), European Regional Development Fund
ID Code: 24500
Deposited On: 27 May 2020 16:41 by Lifeng Han. Last Modified 27 May 2020 16:41
