Fernandez, Jaime B. (ORCID: 0000-0001-9774-3879), Syed, Ali Akbar Shah (ORCID: 0000-0002-8045-3514) and Ali, Muhammad Intizar (ORCID: 0000-0002-0674-2131)
(2026)
Ask VR: Vision Language Model Driven Scene Descriptor for Blind and Low Vision Users in VR Environment.
In: 32nd International Conference on Multimedia Modeling, 29-31 January 2026, Prague, Czech Republic.
ISBN 978-981-95-6963-2
Abstract
Virtual and Mixed Reality platforms such as Meta Quest and Apple Vision Pro pose accessibility challenges for Blind and Low Vision (BLV) users because they depend on visual cues. Existing accessibility features such as color filters and text resizing offer limited support, leaving users with severe vision loss unable to engage fully. In this research, a novel solution has been developed that integrates 3D scene description generation into the Unreal Engine user interface using a modular client-server architecture. The system uses a locally hosted Vision Language Model (VLM) to generate scene descriptions. In comparative testing of VLMs, LLaVA 7B was identified as the most effective at balancing semantic accuracy and perceptual quality. A key innovation is a multi-prompt strategy that guardrails the description of complex scenes into structured, comprehensive audio segments covering objects, spatial layout, and mood. Because scene description is activated with a single key press, the system provides tailored feedback that enables BLV users to interact with VR environments independently.
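The multi-prompt idea described in the abstract can be illustrated with a minimal Python sketch, assuming the VLM is reachable through a single function that takes an image and a prompt and returns text. The prompt wording, the `describe_scene` and `to_narration` helpers, and the stand-in `fake_vlm` are all hypothetical and are not the authors' implementation; the sketch only shows how splitting one description request into per-aspect prompts (objects, spatial layout, mood) yields ordered segments that could then be handed to a text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical prompt set: one focused prompt per aspect named in the abstract
# (objects, spatial layout, mood). The wording is illustrative, not the paper's.
ASPECT_PROMPTS: Dict[str, str] = {
    "objects": "List the main objects visible in this scene in one or two sentences.",
    "layout": "Describe where the main objects are relative to the viewer (left, right, ahead, behind).",
    "mood": "Describe the overall mood and lighting of this scene in one sentence.",
}

# Assumption: a VLM backend is any function (image_bytes, prompt) -> text.
# A real deployment might wrap a locally hosted LLaVA 7B server behind this.
VLMFn = Callable[[bytes, str], str]


@dataclass
class SceneSegment:
    aspect: str
    text: str


def describe_scene(image: bytes, vlm: VLMFn,
                   prompts: Dict[str, str] = ASPECT_PROMPTS) -> List[SceneSegment]:
    """Query the VLM once per aspect and return the answers in prompt order."""
    segments = []
    for aspect, prompt in prompts.items():
        answer = vlm(image, prompt).strip()
        if answer:  # skip empty answers so the narration has no blank segments
            segments.append(SceneSegment(aspect, answer))
    return segments


def to_narration(segments: List[SceneSegment]) -> str:
    """Join segments into one labelled script suitable for a text-to-speech engine."""
    labels = {"objects": "Objects", "layout": "Layout", "mood": "Mood"}
    return " ".join(f"{labels.get(s.aspect, s.aspect.title())}: {s.text}" for s in segments)


def fake_vlm(image: bytes, prompt: str) -> str:
    """Hypothetical stand-in for a real VLM call, keyed on the prompt wording above."""
    if "relative" in prompt:
        return "The table is ahead; the chairs are to its left."
    if "mood" in prompt:
        return "Calm, with warm lighting."
    return "A wooden table and two chairs."


if __name__ == "__main__":
    # In the described system this would run on a key press, with a captured frame as `image`.
    print(to_narration(describe_scene(b"", fake_vlm)))
```

Querying once per aspect costs three model calls per key press instead of one; the trade-off is that each answer stays short and on-topic, which is plausibly the kind of structure the abstract attributes to its multi-prompt strategy.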
Metadata
| Field | Value |
|---|---|
| Item Type: | Conference or Workshop Item (Paper) |
| Event Type: | Conference |
| Refereed: | Yes |
| Uncontrolled Keywords: | Virtual and Mixed Reality; Blind and Low Vision; Vision Language Model; Gaming Technology; Assistive Technology |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Image processing; Computer Science > Multimedia systems; Computer Science > Visualization; Engineering > Virtual reality |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics |
| Published in: | Lokoč, J. et al., (ed.) MultiMedia Modeling. MMM 2026. Lecture Notes in Computer Science 16415. Springer. ISBN 978-981-95-6963-2 |
| Publisher: | Springer |
| Official URL: | https://link.springer.com/chapter/10.1007/978-981-... |
| Copyright Information: | Authors |
| Funders: | Taighde Eireann – Research Ireland |
| ID Code: | 32501 |
| Deposited On: | 07 Apr 2026 14:37 by Jaime Boanerjes Fernandez Roblero. Last modified: 07 Apr 2026 14:37 |
Documents
Full text available as: PDF (Creative Commons: Attribution 4.0, 882 kB)