Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French
Seddah, Djamé and Chrupała, Grzegorz and Cetinoglu, Ozlem and van Genabith, Josef and Candito, Marie (2010) Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.
This paper shows that training a lexicalized parser on a lemmatized morphologically rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French, and (ii) it also makes the parsing process sensitive to the correct assignment of POS tags to unknown words.