Call for papers
Vision-Language Models (VLMs), such as CLIP, LLaVA, and GPT-4V, have revolutionized multimedia applications, such as captioning, visual question answering, cross-modal retrieval, and multi-modal agents, by enabling open-vocabulary search and generation. While significant strides have been made in developing powerful VLMs, the field remains in the early stages of rigorously evaluating and understanding their real-world vulnerabilities, both empirically and theoretically. In response to this important but under-explored gap, the workshop aims to provide a forum for researchers to share advances in theories, algorithms, and applications of trustworthy multi-modal learning, fostering diverse viewpoints on core principles and emerging techniques for developing trustworthy VLMs in the wild.
Topics of Interest
We welcome papers on topics including, but not limited to, the following:
A. Theory:
- Uncertainty quantification for VLMs.
- Theoretical bounds on VLMs under distribution shifts.
- Causal perspectives on spurious correlations and multi-modal shortcut learning.
- Robust optimization and generalization bounds for multi-modal learning objectives.
B. Algorithms:
- Out-of-distribution (OOD) generalization, OOD detection, and test-time adaptation.
- Detecting and mitigating hallucinations in VLMs.
- Robust training and inference against adversarial attacks, prompt injection, and imperfect inputs.
- Post-hoc and intrinsic methods for explaining VLM decisions.
- Techniques for auditing and debiasing vision-language datasets.
- Methods to ensure fairness in the context of VLMs.
- Privacy-preserving machine learning for VLMs.
C. Benchmark and Application:
- Trustworthy usage of VLMs in scientific domains, including programming languages, climate science, healthcare, life sciences, physics, and cognitive science.
- Detecting AI-generated content and deepfakes.
- New benchmarks and evaluation protocols for real-world trustworthiness.
Program
The workshop will be held on June 16th during ICMR 2026 in the Maxima Zaal at the KIT Royal Tropical Institute, Amsterdam.
| Local Time | Event | Presenter(s) |
|---|---|---|
| 9:00-9:05 | Opening Remarks | Yingjun Du |
| 9:05-9:40 | Keynote 1: Safe and Robust Vision-Language Models in Hyperbolic Space | Pascal Mettes |
| 9:40-10:15 | Keynote 2: | Bin Zhu |
| 10:15-10:35 | Encore Paper 1: Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models | Zijie Zhou |
| 10:35-10:45 | Break | |
| 10:45-11:20 | Keynote 3: | Mengyue Yang |
| 11:20-11:40 | Encore Paper 2: Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints | Guanyu Wang |
| 11:40-12:00 | Encore Paper 3: Respecting Modality Gap in Post-hoc Out-of-distribution Detection with Pre-trained Vision-Language Models | Yuanwei Hu |
| 12:00-12:10 | Research Paper 1: Leveraging Self-Attention Mechanism for Visual Prompting in Large Vision-Language Models | Tianxing Guo, Junbao Li, Huanyu Liu, Tianyu Lin, and Boxu Pei |
| 12:10-12:20 | Research Paper 2: Toward Trustworthy Vision-Language Reporting for Tremor Assessment under Distribution Shift | Xinjun Li |
| 12:20-12:30 | Research Paper 3: PatchTrust: Black-Box Hallucination Detection via Patch-Level Retrieval Scoring | Vaibhav Varshney and Manjunatha Naik MC |
| 12:30-12:40 | Research Paper 4: Hearsay: Vision-Language Medical Diagnoses Without an Image | Siddharth Vohra |
| 12:40 | Closing Remarks | Yingjun Du |
Important dates
All submission deadlines are 23:59 AoE (Anywhere on Earth).
Submission deadline: 19 April 2026
Paper notification: 20 April 2026
Camera-ready deadline: 25 April 2026
Submission
Submissions to the TrustVLM workshop are expected to be short papers (4-page limit, plus additional pages for references) and must comply with a double-blind review process. All papers must be formatted according to the ACM proceedings style. Click here to access LaTeX and Microsoft Word templates for this format.
If you use LaTeX, please use sample-sigconf.tex as the template (or see the Overleaf template here). Submissions should be in two-column format. Please use the following header: \documentclass[sigconf,review,anonymous]{acmart}
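For orientation, here is a minimal sketch of a submission skeleton built around the header above. It is not the official template (sample-sigconf.tex remains the reference). The title, author details, section name, and the bibliography file name `references` are placeholders assumed for illustration only.

```latex
% Minimal sketch of an anonymized short-paper skeleton (assumed layout, not the official template).
\documentclass[sigconf,review,anonymous]{acmart}

\begin{document}

\title{Placeholder Title}

% With the `anonymous` option, acmart hides author details in the compiled PDF.
\author{Placeholder Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}
\email{author@example.org}

% In acmart, the abstract must appear before \maketitle.
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Placeholder body text.

% `references` is a hypothetical .bib file name.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}
```

The `sigconf` option selects the two-column conference layout, `review` adds line numbers for reviewers, and `anonymous` suppresses author information for double-blind review.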
Accepted papers are non-archival and should be presented in an oral session during the TrustVLM workshop.
Keynote Speakers
- Mengyue Yang, PhD, Assistant Professor, University of Bristol, UK
- Bin Zhu, PhD, Assistant Professor, Singapore Management University, Singapore
- Pascal Mettes, PhD, Assistant Professor, University of Amsterdam, Netherlands
Accepted Papers
- PatchTrust: Black-Box Hallucination Detection via Patch-Level Retrieval Scoring
- Hearsay: Vision-Language Medical Diagnoses Without an Image
- Toward Trustworthy Vision-Language Reporting for Tremor Assessment under Distribution Shift
- Leveraging Self-Attention Mechanism for Visual Prompting in Large Vision-Language Models
Program Co-Chairs
- Bo Peng (University of Technology Sydney)
- Yingjun Du (University of Amsterdam)
- Sean Du (Nanyang Technological University)
- Zhen Fang (University of Technology Sydney)