Multimodal segmentation of lifelog data
Doherty, Aiden R. and Smeaton, Alan F. and Lee, Keansub and Ellis, Daniel P.W. (2007) Multimodal segmentation of lifelog data. In: RIAO 2007 - Large-Scale Semantic Access to Content (Text, Image, Video and Sound), 30 May - 1 June 2007, Pittsburgh, PA, USA.
A personal lifelog of visual and audio information can be very helpful as a human memory augmentation tool. The SenseCam, a passive wearable camera, used in conjunction with an iRiver MP3 audio recorder, captures over 20,000 images and 100 hours of audio per week. With constant use, this quickly builds up to a substantial collection of personal data. To gain real value from this collection, it is important to automatically segment the data into meaningful units or activities. This paper investigates the optimal combination of data sources for segmenting personal data into such activities. Five data sources were logged and processed to segment a collection of personal data: image processing on captured SenseCam images; audio processing on captured iRiver audio data; and processing of the temperature, white light level, and accelerometer sensors onboard the SenseCam device. The results indicate that a combination of the image, light and accelerometer sensor data segments our collection of personal data better than a combination of all five data sources. The accelerometer sensor is good at detecting when the user moves to a new location, while the image and light sensors are good at detecting changes in wearer activity within the same location, as well as detecting when the wearer interacts socially with others.
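The abstract does not say how the data sources are combined, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes each source has already been reduced to a per-time-step "change" score, rescales each source to a common range, averages the selected sources, and marks a boundary wherever the fused score is a local peak above a threshold. The function names, the threshold rule (mean plus k standard deviations) and the score values are all hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def min_max_normalise(scores):
    """Rescale one source's change scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant signal carries no boundary evidence.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_sources(source_scores, selected):
    """Average the normalised scores of the selected sources per time step."""
    normalised = [min_max_normalise(source_scores[name]) for name in selected]
    return [mean(step) for step in zip(*normalised)]


def detect_boundaries(fused, k=1.0):
    """Return indices that are local peaks above mean + k * std (assumed rule)."""
    threshold = mean(fused) + k * pstdev(fused)
    boundaries = []
    for i, score in enumerate(fused):
        left = fused[i - 1] if i > 0 else float("-inf")
        right = fused[i + 1] if i < len(fused) - 1 else float("-inf")
        if score > threshold and score >= left and score > right:
            boundaries.append(i)
    return boundaries


# Made-up change scores for three of the five sources named in the abstract.
scores = {
    "image": [0.1, 0.2, 0.9, 0.1, 0.2, 0.1, 0.8, 0.2],
    "light": [0.0, 0.1, 0.7, 0.1, 0.0, 0.1, 0.9, 0.1],
    "accelerometer": [0.2, 0.1, 0.8, 0.2, 0.1, 0.2, 0.3, 0.1],
}

fused = fuse_sources(scores, ["image", "light", "accelerometer"])
boundaries = detect_boundaries(fused, k=1.0)
print(boundaries)  # [2, 6]
```

Comparing combinations of sources, as the paper does, would amount to calling `fuse_sources` with different `selected` lists and scoring each set of detected boundaries against a manually annotated ground truth.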