Handling unknown words in statistical latent-variable parsing models for Arabic, English and French
Attia, Mohammed and Foster, Jennifer and Hogan, Deirdre and Le Roux, Joseph and Tounsi, Lamia and van Genabith, Josef (2010) Handling unknown words in statistical latent-variable parsing models for Arabic, English and French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.
This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers, which is based solely on word frequencies. The study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.
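As a rough illustration of the two strategies contrasted in the abstract, the sketch below replaces rare training words either with a single frequency-based `UNK` token or with a signature built from surface clues. It is not the paper's implementation: the threshold, the English suffix list, and the names `signature` and `replace_rare` are all assumptions chosen for the example.

```python
from collections import Counter

# Assumed cutoff: words seen fewer times than this are treated as rare.
RARE_THRESHOLD = 2

# Hypothetical English suffix clues, for illustration only.
SUFFIX_CLUES = ("ing", "ed", "ly", "s")


def signature(word):
    """Map a word to a coarse signature built from simple surface clues."""
    parts = ["UNK"]
    if word[:1].isupper():
        parts.append("CAP")
    if any(ch.isdigit() for ch in word):
        parts.append("NUM")
    for suffix in SUFFIX_CLUES:
        # Require a stem of at least two characters before the suffix.
        if word.lower().endswith(suffix) and len(word) > len(suffix) + 1:
            parts.append(suffix)
            break
    return "-".join(parts)


def replace_rare(sentences, threshold=RARE_THRESHOLD, use_clues=True):
    """Replace rare words by a signature (use_clues=True) or plain UNK."""
    counts = Counter(word for sent in sentences for word in sent)
    return [
        [
            word if counts[word] >= threshold
            else (signature(word) if use_clues else "UNK")
            for word in sent
        ]
        for sent in sentences
    ]
```

With `use_clues=False` every rare word collapses into one class, which corresponds to the purely frequency-based baseline. With `use_clues=True` rare words are split into finer classes that a parser could associate with different part-of-speech tags.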