TrustVLM

Toward Trustworthy Vision-language Models in the Wild:
Theory, Algorithm and Application

In conjunction with ICMR 2026
16th ACM International Conference on Multimedia Retrieval

June 16th, 2026, Amsterdam, The Netherlands


Call for papers

Vision-Language Models (VLMs), such as CLIP, LLaVA, and GPT-4V, have revolutionized multimedia applications, including captioning, visual question answering, cross-modal retrieval, and multi-modal agents, by enabling open-vocabulary search and generation. While significant strides have been made in developing powerful VLMs, the field is still in the early stages of rigorously evaluating and understanding their real-world vulnerabilities, both empirically and theoretically. To address this important but under-explored gap, the workshop aims to provide a forum for researchers to share advances in the theory, algorithms, and applications of trustworthy multi-modal learning, fostering diverse viewpoints on the core principles and emerging techniques for developing trustworthy VLMs in the wild.

Topics of Interest

We welcome papers on topics including, but not limited to, the following:

A. Theory:

Uncertainty quantification for VLMs.

Theoretical bounds on VLMs under distribution shifts.

Causal perspectives on spurious correlations and multi-modal shortcut learning.

Robust optimization and generalization bounds for multi-modal learning objectives.

B. Algorithms:

Out-of-distribution (OOD) generalization, OOD detection, and test-time adaptation.

Detecting and mitigating hallucinations in VLMs.

Robust training and inference against adversarial attacks, prompt injection, and imperfect inputs.

Post-hoc and intrinsic methods for explaining VLM decisions.

Techniques for auditing and debiasing vision-language datasets.

Methods for ensuring fairness in VLMs.

Privacy-preserving machine learning for VLMs.

C. Benchmarks and Applications:

Trustworthy use of VLMs in scientific domains, including programming languages, climate science, healthcare, life sciences, physics, and cognitive science.

Detecting AI-generated content and deepfakes.

New benchmarks and evaluation protocols for real-world trustworthiness.

Program

The workshop will be held on June 16th during ICMR 2026 in the Maxima Zaal at the KIT Royal Tropical Institute, Amsterdam.


Local Time	Event	Presenter(s)
9:00-9:05	Opening Remarks	Yingjun Du
9:05-9:40	Keynote 1: Safe and Robust Vision-Language Models in Hyperbolic Space	Pascal Mettes
9:40-10:15	Keynote 2: Beyond Task Success: Towards Trustworthy Foundation Models for Robotics	Bin Zhu
10:15-10:35	Encore Paper 1: Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models	Zijie Zhou
10:35-10:45	Research Paper 1: Bridging Semantic and Structural Manifolds: Zero-Shot Single-Temporal Change Detection in Remote Sensing	Shih-Chih Lin, Jia-Xian Jian, YunTung Chu, Wei-Chieh Sun, Fang-Yi Lin
10:45-10:55	Break
10:55-11:30	Keynote 3: Grounded Large Language Modeling Reasoning	Mengyue Yang
11:30-11:50	Encore Paper 2: Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints	Guanyu Wang
11:50-12:10	Encore Paper 3: Respecting Modality Gap in Post-hoc Out-of-distribution Detection with Pre-trained Vision-Language Models	Yuanwei Hu
12:10-12:20	Research Paper 2: Leveraging Self-Attention Mechanism for Visual Prompting in Large Vision-Language Models	Tianxing Guo, Junbao Li, Jiazheng Wen, Huanyu Liu, Tianyu Lin, Boxu Pei
12:20-12:30	Research Paper 3: Toward Trustworthy Vision-Language Reporting for Tremor Assessment under Distribution Shift	Xinjun Li
12:30-12:40	Research Paper 4: Hearsay: Vision-Language Medical Diagnoses Without an Image	Siddharth Vohra
12:40	Closing Remarks	Yingjun Du

Important dates

All submission deadlines are 23:59 AoE (Anywhere on Earth).

Submission deadline: 19 April 2026

Paper notification: 20 April 2026

Camera-ready deadline: 25 April 2026

Submission

Submissions to the TrustVLM workshop are expected to be short papers (up to 4 pages, plus additional pages for references) and must comply with a double-blind review process. All papers must be formatted according to the ACM proceedings style; ACM provides LaTeX and Microsoft Word templates for this format.

If you use LaTeX, please use sample-sigconf.tex as the template (an Overleaf version of the template is also available). Submissions should be in two-column format and use the following header: \documentclass[sigconf, review, anonymous]{acmart}

Accepted papers are non-archival and are expected to be presented in an oral session during the TrustVLM workshop.

Papers should be submitted via OpenReview.

Keynote Speakers

Mengyue Yang

Mengyue Yang, PhD, Assistant Professor, University of Bristol, UK

Bin Zhu

Bin Zhu, PhD, Assistant Professor, Singapore Management University, Singapore

Pascal Mettes

Pascal Mettes, PhD, Assistant Professor, University of Amsterdam, Netherlands

Accepted Papers

PatchTrust: Black-Box Hallucination Detection via Patch-Level Retrieval Scoring
Hearsay: Vision-Language Medical Diagnoses Without an Image
Toward Trustworthy Vision-Language Reporting for Tremor Assessment under Distribution Shift
Leveraging Self-Attention Mechanism for Visual Prompting in Large Vision-Language Models
Bridging Semantic and Structural Manifolds: Zero-Shot Single-Temporal Change Detection in Remote Sensing

Program Co-Chairs

Bo Peng (University of Technology Sydney)
Yingjun Du (University of Amsterdam)
Sean Du (Nanyang Technological University)
Zhen Fang (University of Technology Sydney)

Contact Information

If you have any questions regarding the workshop, feel free to reach out to us:

E-mail: bo.peng-7@student.uts.edu.au