DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Quizzard@INOVA Challenge 2025 -- Track A: Plug-and-Play Technique in Interleaved Multi-Image Model

Le, Hoang-Bao ORCID: 0009-0000-2496-4347, Cuong, Dinh Viet, Nguyen, An Pham Ngoc ORCID: 0000-0002-0041-9747, Liting, Zhou ORCID: 0000-0002-7778-8743 and Gurrin, Cathal ORCID: 0000-0003-4395-7702 (2025) Quizzard@INOVA Challenge 2025 -- Track A: Plug-and-Play Technique in Interleaved Multi-Image Model. ICME Workshop.

Abstract
This paper addresses two main objectives. Firstly, we demonstrate the impressive performance of LLaVA-NeXT-Interleave on 22 datasets across three different tasks: Multi-Image Reasoning, Documents and Knowledge-Based Understanding, and Interactive Multi-Modal Communication. Secondly, we add the Dense Channel Integration (DCI) connector to LLaVA-NeXT-Interleave and compare its performance against the standard model. We find that the standard model achieves the highest overall accuracy, excelling in vision-heavy tasks like VISION, NLVR2, and Fashion200K. Meanwhile, the DCI-enhanced version shows particular strength on datasets requiring deeper semantic coherence or structured change understanding, such as MIT-States PropertyCoherence and SlideVQA. Our results highlight the potential of combining powerful foundation models with plug-and-play techniques for Interleave tasks. The code is available at https://github.com/dinhvietcuong1996/icme25-inova.
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Interleave, llava, comprehension, dense connector
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: UNSPECIFIED
Official URL: https://arxiv.org/abs/2506.11737
ID Code: 31152
Deposited On: 30 Jun 2025 11:40 by Hoang Bao Le. Last Modified 30 Jun 2025 11:40
Documents

Full text available as:

2506.11737v1.pdf (PDF)
Creative Commons: Attribution 4.0
698kB