DORAS | DCU Research Repository

Ask VR: Vision Language Model Driven Scene Descriptor for Blind and Low Vision Users in VR Environment

Fernandez, Jaime B. (ORCID: 0000-0001-9774-3879), Syed, Ali Akbar Shah (ORCID: 0000-0002-8045-3514) and Ali, Muhammad Intizar (ORCID: 0000-0002-0674-2131) (2026) Ask VR: Vision Language Model Driven Scene Descriptor for Blind and Low Vision Users in VR Environment. In: 32nd International Conference on Multimedia Modeling, 29-31 January 2026, Prague, Czech Republic. ISBN 978-981-95-6963-2

Abstract
Virtual and Mixed Reality platforms such as Meta Quest and Apple Vision Pro pose accessibility challenges for Blind and Low Vision (BLV) users because they depend heavily on visual cues. Existing accessibility features, such as color filters and text resizing, offer limited support, leaving users with severe vision loss unable to engage fully. In this research, a novel solution has been developed that integrates 3D scene description generation into the Unreal Engine user interface using a modular client-server architecture. The system uses a locally hosted Vision Language Model (VLM) to generate scene descriptions. In comparative testing of VLMs, LLaVA 7B was identified as offering the best balance between semantic accuracy and perceptual quality. A key innovation is a multi-prompt strategy that guardrails the model's output by breaking complex scenes into structured, comprehensive audio segments covering objects, spatial layout, and mood. Because scene description is activated with a single key press, the system provides tailored feedback that enables BLV users to interact with VR environments independently.
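The abstract describes the multi-prompt strategy only at a high level. As a rough illustration of the general idea (not the authors' implementation), the sketch below queries a VLM once per aspect (objects, spatial layout, mood) and collects the answers as ordered, labelled segments ready for text-to-speech. The prompt wording, `ASPECT_PROMPTS`, `describe_scene`, `AudioSegment`, and the `vlm` callable are all hypothetical; the code does not call LLaVA, Unreal Engine, or any real client-server API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical prompts, one per aspect named in the abstract.
# The actual prompts used by Ask VR are not given in this record.
ASPECT_PROMPTS: Dict[str, str] = {
    "objects": "List the main objects visible in this scene.",
    "layout": "Describe where the main items are relative to the viewer (left, right, ahead).",
    "mood": "Describe the overall mood or atmosphere of this scene in one sentence.",
}

# Assumption: a VLM is modelled as any function taking (image_bytes, prompt)
# and returning text, e.g. a wrapper around a locally hosted model server.
VLMFn = Callable[[bytes, str], str]


@dataclass
class AudioSegment:
    aspect: str  # which part of the description this segment covers
    text: str    # text to hand to a text-to-speech engine


def describe_scene(image: bytes, vlm: VLMFn,
                   prompts: Dict[str, str] = ASPECT_PROMPTS) -> List[AudioSegment]:
    """Query the VLM once per aspect and return segments in prompt order."""
    segments: List[AudioSegment] = []
    for aspect, prompt in prompts.items():
        answer = vlm(image, prompt).strip()
        if answer:  # skip empty answers instead of producing a silent segment
            segments.append(AudioSegment(aspect, answer))
    return segments
```

Splitting one open-ended request into several narrow prompts keeps each answer short and on-topic, which is one plausible reading of how such a strategy can "guardrail" the output; a real system would replace the `vlm` callable with a call to its own model server.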
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Virtual and Mixed Reality; Blind and Low Vision; Vision Language Model; Gaming Technology; Assistive Technology
Subjects: Computer Science > Artificial intelligence
Computer Science > Image processing
Computer Science > Multimedia systems
Computer Science > Visualization
Engineering > Virtual reality
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering
Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in: Lokoč, J. et al., (ed.) MultiMedia Modeling. MMM 2026. Lecture Notes in Computer Science 16415. Springer. ISBN 978-981-95-6963-2
Publisher: Springer
Official URL: https://link.springer.com/chapter/10.1007/978-981-...
Copyright Information: Authors
Funders: Taighde Éireann – Research Ireland
ID Code: 32501
Deposited On: 07 Apr 2026 14:37 by Jaime Boanerjes Fernandez Roblero. Last Modified: 07 Apr 2026 14:37
Documents

Full text available as:

01AskVR_Demopaper_finalsubmissionv2.pdf
PDF
Creative Commons: Attribution 4.0
882kB