Ask VR: Vision Language Model Driven Scene Descriptor for Blind and Low Vision Users in VR Environment

Fernandez, Jaime B.; Syed, Ali Akbar Shah; Ali, Muhammad Intizar


Fernandez, Jaime B. ORCID: 0000-0001-9774-3879, Syed, Ali Akbar Shah ORCID: 0000-0002-8045-3514 and Ali, Muhammad Intizar ORCID: 0000-0002-0674-2131 (2026) Ask VR: Vision Language Model Driven Scene Descriptor for Blind and Low Vision Users in VR Environment. In: 32nd International Conference on Multimedia Modeling, 29-31 January 2026, Prague, Czech Republic. ISBN 978-981-95-6963-2

Abstract

Virtual and Mixed Reality platforms such as Meta Quest and Apple Vision Pro pose accessibility challenges for Blind and Low Vision (BLV) users because they depend heavily on visual cues. Existing accessibility features such as color filters and text resizing offer only limited support, leaving users with severe vision loss unable to fully engage. This research presents a novel solution that integrates 3D scene descriptor generation into the Unreal Engine user interface through a modular client-server architecture. The system uses a locally hosted Vision Language Model (VLM) to generate scene descriptions. In comparative testing of VLMs, LLaVA 7B was identified as the most effective at balancing semantic accuracy and perceptual quality. A key innovation is a multi-prompt strategy that acts as a guardrail, breaking complex scenes into structured, comprehensive audio segments that cover objects, spatial layout, and mood. Because scene description is activated with a single key press, the system provides tailored feedback that enables BLV users to engage with the VR environment independently.
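
The multi-prompt strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the names `ASPECT_PROMPTS`, `describe_scene`, `to_narration`, and `fake_vlm`, and the shape of the VLM client are all assumptions made for illustration. The sketch only shows the general idea of asking one focused question per aspect (objects, spatial layout, mood) and collecting the answers as ordered segments that a text-to-speech step could read out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical prompt wording; the abstract does not give the actual prompts.
ASPECT_PROMPTS: Dict[str, str] = {
    "objects": "List the main objects visible in this scene.",
    "spatial layout": "Describe where things are relative to the viewer.",
    "mood": "Describe the overall mood and lighting of the scene.",
}

# Assumed client shape: any callable that takes (image bytes, prompt) and
# returns text. In a real system this would wrap a request to the VLM server.
VlmClient = Callable[[bytes, str], str]


@dataclass
class Segment:
    aspect: str  # which aspect of the scene this segment covers
    text: str    # the VLM's answer for that aspect


def describe_scene(image: bytes, vlm: VlmClient,
                   prompts: Dict[str, str] = ASPECT_PROMPTS) -> List[Segment]:
    """Query the VLM once per aspect and keep the non-empty answers in order."""
    segments: List[Segment] = []
    for aspect, prompt in prompts.items():
        answer = vlm(image, prompt).strip()
        if answer:
            segments.append(Segment(aspect, answer))
    return segments


def to_narration(segments: List[Segment]) -> str:
    """Join the segments into one labelled string for a text-to-speech step."""
    return " ".join(f"{s.aspect.capitalize()}: {s.text}" for s in segments)


def fake_vlm(image: bytes, prompt: str) -> str:
    """Stand-in for a real VLM so the sketch runs without a model server."""
    if "relative" in prompt:
        return "The table is directly ahead; the chairs are to its left."
    if "mood" in prompt:
        return "The room feels calm and warmly lit."
    return "A wooden table and two chairs."


if __name__ == "__main__":
    print(to_narration(describe_scene(b"captured-frame", fake_vlm)))
```

Keeping the VLM behind a plain callable mirrors the client-server separation mentioned in the abstract: the stub can be swapped for a function that sends the captured frame to a locally hosted model without changing the rest of the sketch.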

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	Virtual and Mixed Reality; Blind and Low Vision; Vision Language Model; Gaming Technology; Assistive Technology
Subjects:	Computer Science > Artificial intelligence; Computer Science > Image processing; Computer Science > Multimedia systems; Computer Science > Visualization; Engineering > Virtual reality
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in:	Lokoč, J. et al. (eds.) MultiMedia Modeling. MMM 2026. Lecture Notes in Computer Science 16415. Springer. ISBN 978-981-95-6963-2
Publisher:	Springer
Official URL:	https://link.springer.com/chapter/10.1007/978-981-...
Copyright Information:	Authors
Funders:	Taighde Eireann – Research Ireland
ID Code:	32501
Deposited On:	07 Apr 2026 14:37 by Jaime Boanerjes Fernandez Roblero. Last Modified 07 Apr 2026 14:37

Documents

Full text available as:

01AskVR_Demopaper_finalsubmissionv2.pdf

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution 4.0
882kB

DORAS | DCU Research Repository