Greevy, Edel (2004) Automatic text categorisation of racist webpages. Master of Science thesis, Dublin City University.
Abstract
Automatic Text Categorisation (TC) involves the assignment of one or more predefined categories to text documents in order that they can be effectively managed. In this thesis we examine the possibility of applying automatic text categorisation to the problem of categorising texts (web pages) based on whether or not they are racist.
TC has proven successful for topic-based problems such as news story categorisation. However, the problem of detecting racism differs from topic-based problems in that lexical items present in racist documents can also appear in anti-racist documents, or indeed in potentially any document. The mere presence of a potentially racist term does not necessarily mean the document is racist. The difficulty lies in finding what distinguishes racist documents from non-racist ones.
We use a machine learning method called Support Vector Machines (SVM) to automatically learn features of racism so that a decision can be made about the target class of unseen documents. We examine various document representations within an SVM in order to identify the most effective method for handling this problem. Our work shows that it is possible to develop automatic categorisation of web pages based on these approaches.
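As a rough illustration of the general idea described in the abstract, the sketch below trains a linear SVM on a bag-of-words representation using sub-gradient descent on the L2-regularised hinge loss. It is not the thesis's implementation, representation, or data: the function names, the hyperparameters, and the toy documents are all assumptions made for this example. The placeholder token `term_x` stands in for a term that occurs in both classes, so it cannot separate them on its own.

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to term counts (one simple representation; an assumption)."""
    return Counter(text.lower().split())

def score(weights, bias, features):
    """Linear decision function w . x + b over sparse term counts."""
    return sum(weights.get(t, 0.0) * v for t, v in features.items()) + bias

def train_linear_svm(examples, epochs=2000, lr=0.01, lam=0.01):
    """Sub-gradient descent on lam/2 * ||w||^2 + hinge loss (hypothetical settings)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            margin = label * score(weights, bias, features)
            # Regularisation step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                for term, value in features.items():
                    weights[term] = weights.get(term, 0.0) + lr * label * value
                bias += lr * label
    return weights, bias

def predict(weights, bias, text):
    """Return +1 (target class) or -1 (other) for an unseen document."""
    return 1 if score(weights, bias, bag_of_words(text)) >= 0 else -1

# Toy, invented training set: +1 = target class, -1 = other.
# "term_x" appears in both classes, mirroring the difficulty noted above.
TRAIN = [
    ("term_x endorse", 1),
    ("term_x promote rally", 1),
    ("term_x condemn", -1),
    ("term_x report condemn policy", -1),
    ("weather report", -1),
]

examples = [(bag_of_words(text), label) for text, label in TRAIN]
weights, bias = train_linear_svm(examples)

for unseen in ("term_x endorse rally", "term_x condemn policy"):
    print(unseen, "->", predict(weights, bias, unseen))
```

In this toy setting the classifier has to rely on the terms surrounding `term_x` rather than on `term_x` itself, which is the kind of distinction the abstract describes. A real system would need a far larger labelled corpus and a carefully chosen representation.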
Metadata
Item Type: | Thesis (Master of Science) |
---|---|
Date of Award: | 2004 |
Refereed: | No |
Supervisor(s): | Smeaton, Alan F. |
Uncontrolled Keywords: | Automatic Text Categorisation; TC; machine learning; Support Vector Machines; SVM |
Subjects: | Computer Science > Machine learning |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
ID Code: | 17275 |
Deposited On: | 23 Aug 2012 13:49 by Fran Callaghan. Last Modified 19 Jul 2018 14:56 |
Documents
Full text available as:
PDF (3MB) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 3.0