Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French

Seddah, Djamé and Chrupała, Grzegorz and Cetinoglu, Ozlem and van Genabith, Josef and Candito, Marie (2010) Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.

Abstract

This paper shows that training a lexicalized parser on a lemmatized, morphologically rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French, and (ii) it also makes the parsing process sensitive to the correct assignment of POS tags to unknown words.

Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL); Research Initiatives and Centres > National Centre for Language Technology (NCLT)
Published in: Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages. Association for Computational Linguistics.
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology/W/W10/
Copyright Information: © 2010 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 15987
Deposited On: 08 Dec 2010 13:56 by Shane Harper. Last Modified: 15 Aug 2011 10:34
