Part-of-speech tagging of code-mixed social media content: pipeline,
stacking and joint modelling
Barman, Utsab, Wagner, Joachim (ORCID: 0000-0002-8290-3849) and Foster, Jennifer (ORCID: 0000-0002-7789-4853)
(2016)
In: Second Workshop on Computational Approaches to Code Switching, 2 Nov 2016, Austin, Texas, USA.
Multilingual users of social media sometimes use more than one language within a conversation. This mixing of languages in content
is known as code-mixing. We annotate a subset of a trilingual code-mixed corpus (Barman
et al., 2014) with part-of-speech (POS) tags.
We investigate two state-of-the-art POS tagging techniques for code-mixed content and
combine the features of the two systems to
build a better POS tagger. Furthermore, we
investigate the use of a joint model which performs language identification (LID) and POS tagging simultaneously.
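
As a rough illustration of the joint-modelling idea, one simple way to let a single sequence labeller predict both outputs at once is to merge each token's language label and POS tag into one combined label, then split the predictions apart afterwards. The sketch below is only an assumption about how such a setup could look, not the authors' implementation; the separator, the function names, and the example labels are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical separator; it must not occur inside a language label.
SEP = "+"


def to_joint_labels(lang_tags: List[str], pos_tags: List[str]) -> List[str]:
    """Merge per-token language labels and POS tags into joint labels."""
    if len(lang_tags) != len(pos_tags):
        raise ValueError("language and POS sequences must have equal length")
    return [f"{lang}{SEP}{pos}" for lang, pos in zip(lang_tags, pos_tags)]


def from_joint_labels(joint: List[str]) -> Tuple[List[str], List[str]]:
    """Split joint labels back into separate language and POS sequences."""
    lang_tags, pos_tags = [], []
    for label in joint:
        # Split on the first separator only, so POS tags may contain SEP.
        lang, pos = label.split(SEP, 1)
        lang_tags.append(lang)
        pos_tags.append(pos)
    return lang_tags, pos_tags


# Illustrative labels only; they are not taken from the annotated corpus.
example_langs = ["en", "bn", "bn"]
example_pos = ["PRON", "NOUN", "VERB"]
example_joint = to_joint_labels(example_langs, example_pos)
```

A tagger trained on such joint labels makes one decision per token, whereas a pipeline would first predict the language labels and then feed them as features to a separate POS tagger. The trade-off is a larger label set in exchange for avoiding error propagation between the two steps.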