Browse DORAS
Browse Theses
Search
Latest Additions
Creative Commons License
Except where otherwise noted, content on this site is licensed for use under a:

Simple data-driven context-sensitive lemmatization

Chrupała, Grzegorz (2006) Simple data-driven context-sensitive lemmatization. In: SEPLN 2006, 13-15 September 2006, Zaragoza, Spain.

Full text available as:

[img]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
153Kb

Abstract

Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizating word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class labels. We achieve this by computing a Shortest Edit Script (SES) between reversed input and output strings. A SES describes the transformations that have to be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.

Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Uncontrolled Keywords:lemmatization;
Subjects:Computer Science > Machine learning
DCU Faculties and Centres:Research Initiatives and Centres > National Centre for Language Technology (NCLT)
Official URL:http://www.unizar.es/departamentos/filologia_inglesa/sepln2006/
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland, SFI 04/IN/I527
ID Code:15272
Deposited On:10 Mar 2010 14:30 by DORAS Administrator. Last Modified 27 Apr 2010 16:51

Download statistics

Archive Staff Only: edit this record