DORAS
DCU Online Research Access Service
Treebanking user-generated content: a proposal for a unified representation in universal dependencies

Sanguinetti, Manuela ORCID: 0000-0002-0147-2208, Bosco, Cristina, Cassidy, Lauren, Çetinoglu, Özlem, Cignarella, Alessandra Teresa ORCID: 0000-0002-4409-6679, Lynn, Teresa, Rehbein, Ines, Ruppenhofer, Josef, Seddah, Djamé and Zeldes, Amir ORCID: 0000-0001-8016-6753 (2020) Treebanking user-generated content: a proposal for a unified representation in universal dependencies. In: 12th Language Resources and Evaluation Conference. (LREC 2020), 11-16 May 2020, Marseille, France. (Virtual).

This is the latest version of this item.

Abstract

The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Artificial intelligence; Computer Science > Machine learning; Humanities > Irish language; Humanities > Linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Published in: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). European Language Resources Association (ELRA).
Publisher: European Language Resources Association (ELRA)
Official URL: https://www.aclweb.org/anthology/2020.lrec-1.645
Copyright Information: © 2020 The authors.
Funders: Progetto di Ateneo/CSP 2016 (S1618_L2_BOSC_01), CENF_CT_RIC_19_01, DFG via project CE 326/11 (SAGT), ANR projects ParSiTi (ANR-16-CE33-0021) and SoSweet (ANR-15-CE38-0011-01).
ID Code: 24587
Deposited On: 09 Jun 2020 16:53 by Teresa Lynn. Last Modified 09 Jun 2020 16:53

Available Versions of this Item

  • Treebanking user-generated content: a proposal for a unified representation in universal dependencies. (deposited 25 May 2020 15:04)
    • Treebanking user-generated content: a proposal for a unified representation in universal dependencies. (deposited 09 Jun 2020 16:53) [Currently Displayed]
