This paper discusses the main linguistic phenomena of user-generated texts found on the web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given the increasing number of treebanks featuring user-generated content on the one hand, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short but comprehensive overview of such treebanks, based on the available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines that promote consistent treatment of the particular phenomena found in these types of texts. The overall goal is to provide a common framework for teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, a principle that has always been in the spirit of UD.
Progetto di Ateneo/CSP 2016 (S1618_L2_BOSC_01), CENF_CT_RIC_19_01, DFG via project CE 326/11 (SAGT), ANR projects ParSiTi (ANR-16-CE33-0021) and SoSweet (ANR-15-CE38-0011-01).
ID Code: 24477
Deposited On: 25 May 2020 15:04 by Teresa Lynn. Last Modified 25 May 2020 15:04
Available Versions of this Item: Treebanking user-generated content: a proposal for a unified representation in universal dependencies (deposited 25 May 2020 15:04)