Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French

Seddah, Djamé, Chrupała, Grzegorz, Cetinoglu, Ozlem, van Genabith, Josef orcid logoORCID: 0000-0003-1322-7944 and Candito, Marie (2010) Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.

Abstract
This paper shows that training a lexicalized parser on a lemmatized morphologically-rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similar in size subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop of performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French, (ii) it also makes the parsing process sensitive to correct assignment of POS tags to unknown words.
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Workshop
Refereed:Yes
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
Research Institutes and Centres > National Centre for Language Technology (NCLT)
Published in: Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages. . Association for Computational Linguistics.
Publisher:Association for Computational Linguistics
Official URL:http://www.aclweb.org/anthology/W/W10/
Copyright Information:© 2010 Association for Computational Linguistics
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland
ID Code:15987
Deposited On:08 Dec 2010 13:56 by Shane Harper . Last Modified 21 Jan 2022 16:28
Documents

Full text available as:

[thumbnail of Lemmatization_and_Lexicalized_Statistical_Parsing.pdf]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
159kB
Downloads

Downloads

Downloads per month over past year

Archive Staff Only: edit this record