Quizzard@INOVA Challenge 2025 -- Track A: Plug-and-Play Technique in Interleaved Multi-Image Model

Le, Hoang-Bao; Cuong, Dinh Viet; Nguyen, An Pham Ngoc; Liting, Zhou; Gurrin, Cathal

Le, Hoang-Bao ORCID: 0009-0000-2496-4347, Cuong, Dinh Viet, Nguyen, An Pham Ngoc ORCID: 0000-0002-0041-9747, Liting, Zhou ORCID: 0000-0002-7778-8743 and Gurrin, Cathal ORCID: 0000-0003-4395-7702 (2025) Quizzard@INOVA Challenge 2025 -- Track A: Plug-and-Play Technique in Interleaved Multi-Image Model. ICME Workshop.

Abstract

This paper addresses two main objectives. First, we demonstrate the strong performance of LLaVA-NeXT-Interleave on 22 datasets across three different tasks: Multi-Image Reasoning, Documents and Knowledge-Based Understanding, and Interactive Multi-Modal Communication. Second, we add the Dense Channel Integration (DCI) connector to LLaVA-NeXT-Interleave and compare its performance against the standard model. We find that the standard model achieves the highest overall accuracy, excelling in vision-heavy tasks such as VISION, NLVR2, and Fashion200K. Meanwhile, the DCI-enhanced version shows particular strength on datasets requiring deeper semantic coherence or structured change understanding, such as MIT-States PropertyCoherence and SlideVQA. Our results highlight the potential of combining powerful foundation models with plug-and-play techniques for interleaved tasks. The code is available at https://github.com/dinhvietcuong1996/icme25-inova.

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Uncontrolled Keywords:	Interleave, llava, comprehension, dense connector
Subjects:	Computer Science > Information retrieval
DCU Faculties and Centres:	UNSPECIFIED
Official URL:	https://arxiv.org/abs/2506.11737
ID Code:	31152
Deposited On:	30 Jun 2025 11:40 by Hoang Bao Le. Last Modified 30 Jun 2025 11:40

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution 4.0
698kB

DORAS | DCU Research Repository