DCU-UVT: Word-level language classification with code-mixed data
Barman, Utsab, Wagner, Joachim (ORCID: 0000-0002-8290-3849), Chrupała, Grzegorz (ORCID: 0000-0001-9498-6912) and Foster, Jennifer (ORCID: 0000-0002-7789-4853)
(2014)
DCU-UVT: Word-level language classification with code-mixed data.
In: First Workshop on Computational Approaches to Code Switching, 25 Oct 2014, Doha, Qatar.
This paper describes the DCU-UVT team’s participation in the Language Identification in Code-Switched Data shared task in the Workshop on Computational Approaches to Code Switching. Word-level classification experiments were carried out using a simple dictionary-based method, linear kernel support vector machines (SVMs) with and without contextual clues, and a k-nearest neighbour approach. Based on these experiments, we select our SVM-based system with contextual clues as our final system and present results for the Nepali-English and Spanish-English datasets.
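To make the simplest of these approaches concrete, the following is a minimal sketch of a dictionary-based word-level classifier. It is an illustration only and not the authors' system: the abstract does not give implementation details, so the function names (`train_dictionary`, `classify`), the label set (`en`, `es`), the fallback rule, and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_dictionary(tagged_tokens):
    """Count how often each lowercased word appears with each language label."""
    counts = defaultdict(Counter)
    for word, label in tagged_tokens:
        counts[word.lower()][label] += 1
    return counts


def classify(word, counts, default_label):
    """Return the word's most frequent training label, or a fallback if unseen."""
    seen = counts.get(word.lower())  # .get() avoids creating an empty entry
    if not seen:
        return default_label
    return seen.most_common(1)[0][0]


# Toy training data, invented for illustration (not from the shared task).
train = [
    ("I", "en"), ("love", "en"), ("the", "en"),
    ("la", "es"), ("playa", "es"), ("la", "es"), ("la", "en"),
]
counts = train_dictionary(train)

# Label each token of a code-mixed sentence independently of its neighbours.
labels = [classify(w, counts, "en") for w in ["La", "playa", "is", "nice"]]
print(labels)
```

A lookup of this kind ignores the neighbouring words entirely, which is the gap that contextual clues in a classifier such as an SVM are meant to address.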
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Uncontrolled Keywords: code-switching; language identification; user-generated content; Nepali-English; Spanish-English