Article: B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions

Title: B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions
Authors: Zhang, Hao; Shao, Wenqi; Liu, Hong; Ma, Yongqiang; Luo, Ping; Qiao, Yu; Zheng, Nanning; Zhang, Kaipeng
Keywords: adversarial visual-instructions; bias evaluation; black-box; large vision-language model
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Information Forensics and Security, 2025, v. 20, p. 1434-1446
Abstract

Large Vision-Language Models (LVLMs) have shown significant progress in responding well to visual-instructions from users. However, these instructions, encompassing images and text, are susceptible to both intentional and inadvertent attacks. Despite the critical importance of LVLMs' robustness against such threats, current research in this area remains limited. To bridge this gap, we introduce B-AVIBench, a framework designed to analyze the robustness of LVLMs when facing various Black-box Adversarial Visual-Instructions (B-AVIs), including four types of image-based B-AVIs, ten types of text-based B-AVIs, and nine types of content bias B-AVIs (such as gender, violence, cultural, and racial biases, among others). We generate 316K B-AVIs encompassing five categories of multimodal capabilities (ten tasks) and content bias. We then conduct a comprehensive evaluation involving 14 open-source LVLMs to assess their performance. B-AVIBench also serves as a convenient tool for practitioners to evaluate the robustness of LVLMs against B-AVIs. Our findings and extensive experimental results shed light on the vulnerabilities of LVLMs, and highlight that inherent biases exist even in advanced closed-source LVLMs like GeminiProVision and GPT-4V. This underscores the importance of enhancing the robustness, security, and fairness of LVLMs. The source code and benchmark are available at https://github.com/zhanghao5201/B-AVIBench.


Persistent Identifier: http://hdl.handle.net/10722/362582
ISSN: 1556-6013
2023 Impact Factor: 6.3
2023 SCImago Journal Rankings: 2.890


DC Field: Value
dc.contributor.author: Zhang, Hao
dc.contributor.author: Shao, Wenqi
dc.contributor.author: Liu, Hong
dc.contributor.author: Ma, Yongqiang
dc.contributor.author: Luo, Ping
dc.contributor.author: Qiao, Yu
dc.contributor.author: Zheng, Nanning
dc.contributor.author: Zhang, Kaipeng
dc.date.accessioned: 2025-09-26T00:36:16Z
dc.date.available: 2025-09-26T00:36:16Z
dc.date.issued: 2025-01-01
dc.identifier.citation: IEEE Transactions on Information Forensics and Security, 2025, v. 20, p. 1434-1446
dc.identifier.issn: 1556-6013
dc.identifier.uri: http://hdl.handle.net/10722/362582
dc.description.abstract: Large Vision-Language Models (LVLMs) have shown significant progress in responding well to visual-instructions from users. However, these instructions, encompassing images and text, are susceptible to both intentional and inadvertent attacks. Despite the critical importance of LVLMs' robustness against such threats, current research in this area remains limited. To bridge this gap, we introduce B-AVIBench, a framework designed to analyze the robustness of LVLMs when facing various Black-box Adversarial Visual-Instructions (B-AVIs), including four types of image-based B-AVIs, ten types of text-based B-AVIs, and nine types of content bias B-AVIs (such as gender, violence, cultural, and racial biases, among others). We generate 316K B-AVIs encompassing five categories of multimodal capabilities (ten tasks) and content bias. We then conduct a comprehensive evaluation involving 14 open-source LVLMs to assess their performance. B-AVIBench also serves as a convenient tool for practitioners to evaluate the robustness of LVLMs against B-AVIs. Our findings and extensive experimental results shed light on the vulnerabilities of LVLMs, and highlight that inherent biases exist even in advanced closed-source LVLMs like GeminiProVision and GPT-4V. This underscores the importance of enhancing the robustness, security, and fairness of LVLMs. The source code and benchmark are available at https://github.com/zhanghao5201/B-AVIBench.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Information Forensics and Security
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: adversarial visual-instructions
dc.subject: bias evaluation
dc.subject: black-box
dc.subject: Large vision-language model
dc.title: B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions
dc.type: Article
dc.identifier.doi: 10.1109/TIFS.2024.3520306
dc.identifier.scopus: eid_2-s2.0-85213463692
dc.identifier.volume: 20
dc.identifier.spage: 1434
dc.identifier.epage: 1446
dc.identifier.eissn: 1556-6021
dc.identifier.issnl: 1556-6013
