Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion

Malobabić, Jovanka (2007) Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion. Master of Engineering thesis, Dublin City University.


Semantic concepts, such as faces, buildings, and other real-world objects, are the instrument humans most prefer for navigating and retrieving visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain semantic labels without full knowledge of all objects in the scene. Inferring the presence of high-level semantic concepts from low-level visual features is a research topic that has attracted significant interest recently. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene. In this thesis we focus on detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector in a genuine personal photo collection. We implemented two approaches to detection of buildings that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata.
An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, has been tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features, both at the feature level and at the decision level. The fusion approaches have been evaluated using an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual features-only approach by 2-3% on average regardless of the operating point chosen, while all the performance measures are approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures.
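The difference between fusion at the feature level (early fusion) and at the decision level (late fusion) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy feature values, the weights, and the weighted-average combination rule are hypothetical, and a plain linear scorer stands in for the trained Support Vector Machine. The abstract does not specify how the thesis combines decisions, so this should not be read as the author's implementation.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def early_fusion(visual: Vector, metadata: Vector) -> List[float]:
    """Feature-level fusion: concatenate both modalities into one vector."""
    return list(visual) + list(metadata)


def late_fusion(visual_score: float, metadata_score: float,
                weight: float = 0.5) -> float:
    """Decision-level fusion: weighted average of per-modality scores.

    The weighted average is one common choice, assumed here for illustration.
    """
    return weight * visual_score + (1.0 - weight) * metadata_score


def linear_scorer(weights: Vector, bias: float) -> Scorer:
    """Hypothetical stand-in for a trained SVM decision function."""
    def score(x: Vector) -> float:
        if len(x) != len(weights):
            raise ValueError("feature length mismatch")
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


# Toy inputs, made up for illustration (not data from the thesis).
visual = [0.7, 0.1, 0.2]   # e.g. bins of an edge-orientation histogram
metadata = [0.0, 0.8]      # e.g. normalised flash flag and exposure value

# Early fusion: a single scorer sees the concatenated vector.
fused_vector = early_fusion(visual, metadata)
early_score = linear_scorer([1.0, -0.5, -0.5, -1.0, 0.5], 0.0)(fused_vector)

# Late fusion: one scorer per modality, then the scores are combined.
visual_score = linear_scorer([1.0, -0.5, -0.5], 0.0)(visual)
metadata_score = linear_scorer([-1.0, 0.5], 0.0)(metadata)
late_score = late_fusion(visual_score, metadata_score)

# A positive score is read as "building present" in this sketch.
is_building_early = early_score > 0
is_building_late = late_score > 0
```

In early fusion a single classifier can learn interactions between visual and metadata features, whereas late fusion keeps one classifier per modality and only merges their outputs, which makes it easier to add or drop a modality.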

Item Type: Thesis (Master of Engineering)
Date of Award: November 2007
Additional Information: The Centre for Digital Video Processing
Supervisor(s): Murphy, Noel
Uncontrolled Keywords: semantic annotation of visual content in large image or photograph collections; EXIF image information; GPS location information; fusion of low-level visual features with camera metadata; building detector; context-based information; indoor/outdoor classification; detecting text in images
Subjects: Engineering > Imaging systems; Engineering > Electronics
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Digital Video Processing (CDVP); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code: 92
Deposited On: 13 Dec 2007 by DORAS Administrator. Last Modified 30 Jan 2009 14:27
