Handling unknown words in statistical latent-variable parsing models for Arabic, English and French
Attia, Mohammed, Foster, Jennifer (ORCID: 0000-0002-7789-4853), Hogan, Deirdre, Le Roux, Joseph, Tounsi, Lamia and van Genabith, Josef (ORCID: 0000-0003-1322-7944)
(2010)
Handling unknown words in statistical latent-variable parsing models for Arabic, English and French.
In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.
This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique,
often used in parsers, that is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information
about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically
from an annotated corpus.