Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

Automatic text categorisation of racist webpages

Greevy, Edel (2004) Automatic text categorisation of racist webpages. Master of Science thesis, Dublin City University.

Abstract
Automatic Text Categorisation (TC) involves the assignment of one or more predefined categories to text documents in order that they can be effectively managed. In this thesis we examine the possibility of applying automatic text categorisation to the problem of categorising texts (web pages) based on whether or not they are racist. TC has proven successful for topic-based problems such as news story categorisation. However, the problem of detecting racism is dissimilar to topic-based problems in that lexical items present in racist documents can also appear in anti-racist documents or indeed potentially any document. The mere presence of a potentially racist term does not necessarily mean the document is racist. The difficulty is finding what discerns racist documents from non-racist. We use a machine learning method called Support Vector Machines (SVM) to automatically learn features of racism in order to be capable of making a decision about the target class of unseen documents. We examine various representations within an SVM so as to identify the most effective method for handling this problem. Our work shows that it is possible to develop automatic categorisation of web pages, based on these approaches
Metadata
Item Type:Thesis (Master of Science)
Date of Award:2004
Refereed:No
Supervisor(s):Smeaton, Alan F.
Uncontrolled Keywords:Automatic Text Categorisation; TC; machine learning; Support vector Machines; SVM
Subjects:Computer Science > Machine learning
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
ID Code:17275
Deposited On:23 Aug 2012 13:49 by Fran Callaghan . Last Modified 19 Jul 2018 14:56
Documents

Full text available as:

[thumbnail of edel_greevy_20120702122736.pdf]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
3MB
Downloads

Downloads

Downloads per month over past year

Archive Staff Only: edit this record