Basile, Pierpaolo, Caputo, Annalina (ORCID: 0000-0002-7144-8545), Caselli, Tommaso (ORCID: 0000-0003-2936-0256), Cassotti, Pierluigi and Varvara, Rossella (ORCID: 0000-0001-9957-2807) (2020)
A diachronic Italian corpus based on “L’Unità”.
In: Seventh Italian Conference on Computational Linguistics, 1-3 Mar 2021, Bologna (Online).
Abstract
In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We show some interesting corpus statistics taking into account the temporal dimension and describe some examples of usage of time series.
Metadata
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Event Type: | Conference |
| Refereed: | Yes |
| Subjects: | Computer Science > Computational linguistics |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
| Published in: | Proceedings of the Seventh Italian Conference on Computational Linguistics. CEUR Workshop Proceedings 2769. CEUR-WS. |
| Publisher: | CEUR-WS |
| Official URL: | http://ceur-ws.org/Vol-2769/paper_44.pdf |
| Copyright Information: | © 2020 the Authors. (CC-BY-4.0) |
| ID Code: | 25946 |
| Deposited On: | 02 Jun 2021 10:51 by Annalina Caputo. Last Modified 02 Jun 2021 13:35 |
Documents
Full text available as: PDF (412kB)