Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French
Seddah, Djamé; Chrupała, Grzegorz; Cetinoglu, Ozlem; van Genabith, Josef (ORCID: 0000-0003-1322-7944); and Candito, Marie
(2010)
Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French.
In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.
This paper shows that training a lexicalized parser on a lemmatized morphologically rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a subset of the English Penn Treebank of similar size has almost no effect on parsing performance with gold lemmas, and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French, and (ii) it also makes the parsing process sensitive to the correct assignment of POS tags to unknown words.
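The data-sparseness argument can be illustrated with a toy example. The sketch below is not the paper's experimental setup: the lemma table, the sentences and the helper names (`lemmatize`, `oov_rate`) are hypothetical and chosen only for illustration. It shows how mapping inflected French word forms to lemmas shrinks the vocabulary and lowers the rate of unseen test tokens, while a word missing from the lemma table (here "chatte") stays unknown, which is where correct POS assignment for unknown words matters.

```python
# Toy illustration (hypothetical data): lemmatization reduces lexical sparseness.

# Hypothetical hand-written lemma table (word form -> lemma); a real pipeline
# would use gold treebank lemmas or an automatic lemmatizer instead.
LEMMAS = {
    "mange": "manger", "mangent": "manger",
    "petit": "petit", "petite": "petit", "petits": "petit",
    "chat": "chat", "chats": "chat",
    "le": "le", "la": "le", "les": "le",
}


def lemmatize(tokens):
    """Map each token to its lemma, falling back to the surface form if unlisted."""
    return [LEMMAS.get(t, t) for t in tokens]


def oov_rate(train, test):
    """Fraction of test tokens that never occur in the training tokens."""
    seen = set(train)
    return sum(t not in seen for t in test) / len(test)


train = "le petit chat mange les petits chats mangent".split()
test = "la petite chatte mange".split()  # "chatte" is deliberately not in LEMMAS

if __name__ == "__main__":
    # Vocabulary size: word forms vs lemmas.
    print(len(set(train)), len(set(lemmatize(train))))
    # Unknown-token rate on the test sentence: word forms vs lemmas.
    print(oov_rate(train, test), oov_rate(lemmatize(train), lemmatize(test)))
```

On these toy inputs the training vocabulary drops from 8 word forms to 4 lemmas, and the unknown-token rate on the test sentence falls from 0.75 to 0.25; the remaining unknown token is the one the lemma table does not cover.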
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages.
Association for Computational Linguistics.