DORAS | DCU Research Repository

Simple data-driven context-sensitive lemmatization

Chrupała, Grzegorz (2006) Simple data-driven context-sensitive lemmatization. In: SEPLN 2006, 13-15 September 2006, Zaragoza, Spain.

Abstract
Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizing word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class labels. We achieve this by computing a Shortest Edit Script (SES) between reversed input and output strings. An SES describes the transformations that have to be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.
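
The idea of using an edit script over reversed strings as a class label can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `shortest_edit_script` and `apply_script`, the `("ins" | "del", position, character)` encoding, and the use of an LCS table to obtain an insert/delete-only script are all assumptions made for illustration. Because the strings are reversed, positions count from the end of the word, so a suffix change gets the same label regardless of word length.

```python
def shortest_edit_script(form, lemma):
    """Return a hashable insert/delete script turning reversed `form` into reversed `lemma`.

    Illustrative encoding (an assumption, not the paper's format):
    ("del", i, ch) deletes character ch at index i of the reversed form;
    ("ins", i, ch) inserts ch just before index i of the reversed form.
    """
    a, b = form[::-1], lemma[::-1]
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table, emitting only the non-matching steps.
    script = []
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        elif j < m and (i == n or lcs[i][j + 1] >= lcs[i + 1][j]):
            script.append(("ins", i, b[j]))
            j += 1
        else:
            script.append(("del", i, a[i]))
            i += 1
    return tuple(script)


def apply_script(form, script):
    """Apply a script from `shortest_edit_script` to a (possibly different) word form."""
    a = form[::-1]
    deletions = {pos: ch for op, pos, ch in script if op == "del"}
    insertions = {}
    for op, pos, ch in script:
        if op == "ins":
            insertions.setdefault(pos, []).append(ch)
    out = []
    for i in range(len(a) + 1):
        out.extend(insertions.get(i, []))  # inserted characters precede a[i]
        if i < len(a):
            if i in deletions:
                if a[i] != deletions[i]:
                    raise ValueError("script does not fit this word form")
            else:
                out.append(a[i])
    return "".join(out)[::-1]


label = shortest_edit_script("walked", "walk")
print(label)
print(apply_script("talked", label))
```

Under these assumptions, the script induced from the pair "walked" / "walk" deletes the two characters at the end of the word, and applying the same script to "talked" yields "talk". A classifier would then predict such a script for each word form in context, and the lemma would be recovered by applying the predicted script.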
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: lemmatization
Subjects: Computer Science > Machine learning
DCU Faculties and Centres: Research Institutes and Centres > National Centre for Language Technology (NCLT)
Official URL: http://www.unizar.es/departamentos/filologia_ingle...
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland, SFI 04/IN/I527
ID Code: 15272
Deposited On: 10 Mar 2010 14:30 by DORAS Administrator. Last Modified: 19 Jul 2018 14:50
Documents

Full text available as:

paper5.pdf (PDF, 157kB)