3DSAL: an efficient 3D-CNN architecture for video saliency prediction
Djilali, Yasser Abdelaziz Dahou, Sayah, Mohamed, McGuinness, Kevin (ORCID: 0000-0003-1336-6477) and O'Connor, Noel E. (ORCID: 0000-0002-4033-9135)
(2020)
3DSAL: an efficient 3D-CNN architecture for video saliency prediction.
In: VISAPP: 15th International Conference on Computer Vision Theory and Applications, 27-29 Feb 2020, Valletta, Malta.
ISBN 978-989-758-402-2
In this paper, we propose a novel 3D CNN architecture that enables us to train an effective video saliency prediction model. The model is designed to capture important motion information using multiple adjacent frames. Our model performs a cubic convolution on a set of consecutive frames to extract spatio-temporal features. This enables us to predict the saliency map for any given frame using past frames. We comprehensively investigate the performance of our model with respect to state-of-the-art video saliency models. Experimental results on three large-scale datasets, DHF1K, UCF-SPORTS and DAVIS, demonstrate the competitiveness of our approach.
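As a rough illustration of the idea of convolving over a stack of consecutive frames, the following minimal sketch applies a single 3D kernel to a toy grayscale clip in pure Python. It is not the 3DSAL architecture: the function name `conv3d_valid`, the 3x3x3 averaging kernel, the clip size, and the reading of "cubic convolution" as a 3D convolution with a cube-shaped kernel are all assumptions made for this example.

```python
def conv3d_valid(clip, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    clip:   nested list [T][H][W] of floats (T consecutive grayscale frames)
    kernel: nested list [kt][kh][kw] of floats
    returns a nested list of shape [T-kt+1][H-kh+1][W-kw+1]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over time (dt) as well as space (dy, dx), so each
                # output value mixes information from adjacent frames.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (assumption): 3 frames of 4x4 pixels, with one bright pixel
# moving one step to the right in each frame.
clip = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
for t in range(3):
    clip[t][1][t] = 1.0

# Hypothetical 3x3x3 averaging kernel; a trained model would learn its weights.
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]

features = conv3d_valid(clip, kernel)
```

With three input frames and a temporal kernel extent of three, the output has a single temporal slice, which loosely mirrors producing one map from a short window of frames. A practical model would use many learned kernels, multiple channels, and a deep learning framework rather than nested loops.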
Metadata
Item Type:
Conference or Workshop Item (Paper)
Event Type:
Conference
Refereed:
Yes
Uncontrolled Keywords:
Visual attention; Video saliency; Deep learning; 3D CNN
Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
4.
ScitePress. ISBN 978-989-758-402-2