This work adapts a deep neural model for image saliency
prediction to the temporal domain of egocentric video. We compute a
saliency map for each video frame, first with an off-the-shelf model
trained on static images, and second by adding convolutional or
conv-LSTM layers trained on a dataset for video saliency prediction.
We study each configuration on EgoMon, a new dataset of seven
egocentric videos recorded by three subjects in both free-viewing and
task-driven setups. Our results indicate that the temporal adaptation is
beneficial when the viewer is not moving and observes the scene from
a narrow field of view. Encouraged by this observation, we compute and
publish the saliency maps for the EPIC Kitchens dataset, in which the
viewers are cooking.
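The temporal adaptation described above hinges on a conv-LSTM layer, which carries a spatial hidden state across frames so that the saliency prediction for one frame can depend on the preceding ones. The sketch below is a minimal, illustrative conv-LSTM cell in NumPy; the class name, shapes, and initialization are our own assumptions for illustration, not the paper's actual architecture or weights.

```python
import numpy as np

def conv2d(x, w, b):
    # Naive same-padding, stride-1 convolution.
    # x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)
    c_out, c_in, k, _ = w.shape
    p = k // 2
    H, W = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o]) + b[o]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Hypothetical minimal conv-LSTM cell: the four gates are computed
    by one convolution over the concatenated input and hidden state."""

    def __init__(self, c_in, c_hid, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.c_hid = c_hid
        # One weight tensor producing all four gates (i, f, o, g) at once.
        self.w = rng.standard_normal((4 * c_hid, c_in + c_hid, k, k)) * 0.1
        self.b = np.zeros(4 * c_hid)

    def step(self, x, h, c):
        # x: (C_in, H, W) frame features; h, c: (C_hid, H, W) recurrent state.
        z = conv2d(np.concatenate([x, h], axis=0), self.w, self.b)
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new
```

Running the cell over a sequence of per-frame saliency feature maps (zero-initialized state, one `step` per frame) yields a temporally smoothed hidden state from which a saliency map can be read out; in practice one would use a deep-learning framework's optimized convolutions rather than this loop-based version.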