DCU@FIRE-2012: rule-based stemmers for Bengali and Hindi
Ganguly, Debasis and Leveling, Johannes and Jones, Gareth J.F. (2012) DCU@FIRE-2012: rule-based stemmers for Bengali and Hindi. In: FIRE 2012 Workshop, 17-19 Dec 2012, Kolkata, India.
For the participation of Dublin City University (DCU) in the FIRE-2012 Morpheme Extraction Task (MET), we investigated rule-based stemming approaches for Bengali and Hindi IR. The MET is an attempt to obtain a fair and direct comparison between stemming approaches by measuring the retrieval effectiveness that each achieves on the same dataset. Linguistic knowledge was used to manually craft rules for removing the commonly occurring plural suffixes in Hindi and Bengali. Additionally, rules for removing classifiers and case markers in Bengali were formulated. Our rule-based stemming approaches produced the best and the second-best retrieval effectiveness on the Hindi and Bengali datasets, respectively.
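The general idea of rule-based suffix stripping can be illustrated with a minimal sketch. The suffix list below is a small set of common Hindi plural endings chosen for illustration only; it is an assumption and not the rule set used in the paper. The names `HINDI_PLURAL_SUFFIXES`, `strip_suffix`, and the `min_stem_len` parameter are likewise hypothetical.

```python
# Illustrative suffix list (assumption): a few common Hindi plural endings.
# These are NOT the rules from the paper, which are not given in this record.
HINDI_PLURAL_SUFFIXES = ["ियाँ", "ियों", "ों", "ें"]


def strip_suffix(word, suffixes, min_stem_len=2):
    """Remove the longest matching suffix from `word`.

    The word is returned unchanged if no suffix matches, or if stripping
    would leave fewer than `min_stem_len` code points (a hypothetical
    guard against over-stemming very short words).
    """
    # Try longer suffixes first so that "ियों" is preferred over "ों".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for token in ["किताबें", "लड़कियाँ", "घर"]:
        print(token, "->", strip_suffix(token, HINDI_PLURAL_SUFFIXES))
```

A stemmer of this kind would typically be applied to both documents and queries before indexing, so that inflected forms map to the same index term. The same function could be given a separate suffix list for Bengali classifiers and case markers, again as an assumption about how such rules might be organised.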