VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
This paper was accepted at the Learning from Evaluating the Evolving LLM Lifecycle workshop at NeurIPS 2025.
Safety evaluation of multimodal foundation models often treats vision and language inputs separately, missing risks from joint interpretation where individually benign inputs become harmful in combination. Existing approaches also fail to distinguish clearly unsafe content from borderline cases, leading to problematic over-blocking or under-refusal of genuinely harmful content. We present Vision Language Safety Understanding (VLSU), a comprehensive framework to systematically evaluate multimodal safety understanding.
