Automatic prediction of text aesthetics and interestingness

Ganguly, Debasis; Leveling, Johannes; Jones, Gareth J.F.

Ganguly, Debasis ORCID: 0000-0003-0050-7138, Leveling, Johannes ORCID: 0000-0003-0603-4191 and Jones, Gareth J.F. ORCID: 0000-0002-4033-9135 (2014) Automatic prediction of text aesthetics and interestingness. In: 25th International Conference on Computational Linguistics (COLING 2014), 23-29 Aug 2014, Dublin, Ireland.

Abstract

This paper investigates the problem of automated text aesthetics prediction. The availability of user-generated content and ratings, e.g. on Flickr, has stimulated research in aesthetics prediction for non-text domains, particularly for photographic images. This problem, however, has not yet been explored for the text domain. Because text aesthetics is highly subjective, it is difficult to compile human-annotated data with a fair degree of inter-annotator agreement by methods such as crowdsourcing. The availability of the Kindle "popular highlights" data has motivated us to compile a dataset of human-annotated, aesthetically pleasing and interesting text passages. We then take a supervised classification approach to predicting text aesthetics, constructing a real-valued feature vector from each text passage. In particular, the features that we use for this classification task are word length, repetitions, polarity, part-of-speech, semantic distances, and topic generality and diversity. A traditional binary classification approach is not effective in this case because non-highlighted passages surrounding the highlighted ones do not necessarily represent the other extreme of unpleasant-quality text. Due to the absence of real negative class samples, we employ the mapping convergence (MC) algorithm, in which training can be initiated with instances from the positive class only. On each successive iteration, the algorithm selects new strong negative samples from the unlabeled class and retrains itself. The results show that the MC algorithm, with a Gaussian kernel for the mapping phase and a linear kernel for the convergence phase, yields the best results, achieving satisfactory accuracy, precision and recall values of about 74%, 42% and 54%, respectively.
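
The positive-and-unlabeled training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses kernel classifiers (a Gaussian kernel for the mapping phase and a linear kernel for the convergence phase), whereas the sketch below swaps in a simple nearest-centroid rule so that it runs with the Python standard library alone. The function names, the `strong_fraction` parameter and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def mapping_convergence(positives, unlabeled, strong_fraction=0.2, max_iter=50):
    """Toy mapping-convergence loop with a nearest-centroid stand-in classifier.

    Returns (predict, negatives): a function that labels a point as positive
    (True) or negative (False), and the negatives collected from `unlabeled`.
    """
    pos_c = centroid(positives)

    # Mapping phase (assumed heuristic): the unlabeled points farthest from
    # the positive centroid are taken as the initial "strong negatives".
    ranked = sorted(unlabeled, key=lambda u: dist(u, pos_c), reverse=True)
    k = max(1, int(strong_fraction * len(ranked)))
    negatives, remaining = ranked[:k], ranked[k:]

    # Convergence phase: retrain on positives vs. current negatives, move any
    # remaining unlabeled point classified as negative into the negative set,
    # and stop once an iteration adds no new negatives.
    for _ in range(max_iter):
        neg_c = centroid(negatives)
        new_neg = [u for u in remaining if dist(u, neg_c) < dist(u, pos_c)]
        if not new_neg:
            break
        negatives.extend(new_neg)
        remaining = [u for u in remaining if dist(u, neg_c) >= dist(u, pos_c)]

    neg_c = centroid(negatives)

    def predict(x):
        return dist(x, pos_c) <= dist(x, neg_c)

    return predict, negatives


# Hypothetical feature vectors: highlighted passages vs. unlabeled passages.
positives = [(0, 0), (1, 0), (0, 1), (1, 1)]
unlabeled = [(0.4, 0.6), (0.9, 0.2), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
predict, negatives = mapping_convergence(positives, unlabeled)
```

In this toy run, only the single farthest point seeds the negative set, and the convergence loop then pulls in the other distant unlabeled points while leaving those near the positives unlabeled. With real passage features, the two phases would use the trained classifiers named in the abstract rather than centroid distances.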

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Computational linguistics; Computer Science > Information retrieval
DCU Faculties and Centres:	Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:	Proceedings of COLING 2014.
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland
ID Code:	20379
Deposited On:	15 Jan 2015 14:53 by Gareth Jones. Last Modified: 25 Oct 2018 09:30

Documents

Full text available as: PDF (178kB)

DORAS | DCU Research Repository
