DORAS | DCU Research Repository

A diachronic Italian corpus based on “L’Unità”

Basile, Pierpaolo, Caputo, Annalina (ORCID: 0000-0002-7144-8545), Caselli, Tommaso (ORCID: 0000-0003-2936-0256), Cassotti, Pierluigi and Varvara, Rossella (ORCID: 0000-0001-9957-2807) (2020) A diachronic Italian corpus based on “L’Unità”. In: Seventh Italian Conference on Computational Linguistics, 1-3 Mar 2021, Bologna (Online).

Abstract
In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We show some interesting corpus statistics taking into account the temporal dimension and describe some examples of usage of time series.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Proceedings of the Seventh Italian Conference on Computational Linguistics. CEUR Workshop Proceedings 2769. CEUR-WS.
Publisher: CEUR-WS
Official URL: http://ceur-ws.org/Vol-2769/paper_44.pdf
Copyright Information: © 2020 the Authors. (CC-BY-4.0)
ID Code: 25946
Deposited On: 02 Jun 2021 10:51 by Annalina Caputo. Last Modified 02 Jun 2021 13:35
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
412kB