Basile, Pierpaolo, Caputo, Annalina (ORCID: 0000-0002-7144-8545), Caselli, Tommaso (ORCID: 0000-0003-2936-0256), Cassotti, Pierluigi and Varvara, Rossella (ORCID: 0000-0001-9957-2807)
(2020)
A diachronic Italian corpus based on “L’Unità”.
In: Seventh Italian Conference on Computational Linguistics, 1-3 Mar 2021, Bologna (Online).
In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean the corpus and annotate it with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We report corpus statistics that take the temporal dimension into account and describe some examples of how the time series can be used.
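
To make the notion of a frequency-based time series concrete, the following is a minimal sketch, not the authors' implementation. It assumes the corpus is available as (year, tokens) pairs and defines the value for a token in a given year as its count divided by the total number of tokens in that year. The function name `frequency_time_series` and the toy data are hypothetical; the record does not specify the normalisation or the time granularity used.

```python
from collections import Counter, defaultdict


def frequency_time_series(docs):
    """Build relative-frequency time series from (year, tokens) pairs.

    Returns a dict mapping each token to {year: relative frequency}.
    Assumption: frequencies are normalised by the yearly token total.
    """
    counts = defaultdict(Counter)  # year -> token counts for that year
    for year, tokens in docs:
        counts[year].update(tokens)

    series = defaultdict(dict)
    for year in sorted(counts):
        total = sum(counts[year].values())
        for token, n in counts[year].items():
            series[token][year] = n / total
    return dict(series)


# Toy input (invented for illustration, not taken from the corpus)
docs = [
    (1948, ["lavoro", "partito", "lavoro"]),
    (1989, ["partito", "europa"]),
    (1989, ["europa", "lavoro"]),
]
series = frequency_time_series(docs)
```

The same function would apply unchanged to lemmas or named entities, provided each document is supplied as a list of those units instead of surface tokens.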