A diachronic Italian corpus based on “L’Unità”

Basile, Pierpaolo; Caputo, Annalina; Caselli, Tommaso; Cassotti, Pierluigi; Varvara, Rossella

Basile, Pierpaolo, Caputo, Annalina ORCID: 0000-0002-7144-8545, Caselli, Tommaso ORCID: 0000-0003-2936-0256, Cassotti, Pierluigi and Varvara, Rossella ORCID: 0000-0001-9957-2807 (2020) A diachronic Italian corpus based on “L’Unità”. In: Seventh Italian Conference on Computational Linguistics, 1-3 Mar 2021, Bologna (Online).


Abstract

In this paper, we describe the creation of a diachronic corpus for Italian built from the digital archive of the newspaper “L’Unità”. We automatically clean the corpus and annotate it with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We report corpus statistics that take the temporal dimension into account and describe some examples of how the time series can be used.
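The abstract does not say exactly how the frequency-based time series are built. The following is a minimal sketch, not the authors' method, assuming each annotated article has been reduced to a `(year, items)` pair (items being tokens, lemmas or entity strings) and that frequencies are normalized by the total count for each year. The function name `frequency_time_series` and the input format are hypothetical, and the sample data are toy values.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def frequency_time_series(
    docs: Iterable[Tuple[int, List[str]]],
    relative: bool = True,
) -> Dict[str, Dict[int, float]]:
    """Return {term: {year: frequency}} for every term and every observed year.

    `docs` is a hypothetical input format: (year, items) pairs, where items are
    tokens, lemmas or entity strings from an earlier annotation step.
    """
    # Aggregate raw counts of each item per year.
    counts_per_year: Dict[int, Counter] = defaultdict(Counter)
    for year, items in docs:
        counts_per_year[year].update(items)

    years = sorted(counts_per_year)
    totals = {y: sum(counts_per_year[y].values()) for y in years}
    # Every item seen in any year becomes one time series.
    vocabulary = set().union(*counts_per_year.values())

    def value(term: str, year: int) -> float:
        count = counts_per_year[year][term]  # Counter returns 0 if the term is absent that year
        if not relative:
            return float(count)
        # Normalize by the year's total count; guard against years with no items.
        return count / totals[year] if totals[year] else 0.0

    return {term: {y: value(term, y) for y in years} for term in vocabulary}


# Toy example: three annotated "articles" spread over two years.
docs = [
    (1950, ["pace", "guerra", "pace"]),
    (1951, ["guerra"]),
    (1951, ["pace", "lavoro"]),
]
series = frequency_time_series(docs)
print(series["pace"])
```

Here "pace" has a relative frequency of 2/3 in 1950 and 1/3 in 1951. Normalizing per year is one plausible choice that keeps years with very different amounts of text comparable, and filling absent years with zero keeps all series aligned on the same time axis.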

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Computational linguistics
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Institutes and Centres > ADAPT
Published in:	Proceedings of the Seventh Italian Conference on Computational Linguistics. CEUR Workshop Proceedings 2769. CEUR-WS.
Publisher:	CEUR-WS
Official URL:	http://ceur-ws.org/Vol-2769/paper_44.pdf
Copyright Information:	© 2020 the Authors. (CC-BY-4.0)
ID Code:	25946
Deposited On:	02 Jun 2021 10:51 by Annalina Caputo. Last Modified 02 Jun 2021 13:35

Documents

Full text available as:


PDF (412kB)


DORAS | DCU Research Repository
