Conference Paper: Can Adversarial Weight Perturbations Inject Neural Backdoors

Title: Can Adversarial Weight Perturbations Inject Neural Backdoors
Authors: Garg, Siddhant; Kumar, Adarsh; Goel, Vibhor; Liang, Yingyu
Keywords: adversarial deep learning; backdoor attacks
Issue Date: 2020
Citation: International Conference on Information and Knowledge Management, Proceedings, 2020, p. 2029-2032
Abstract: Adversarial machine learning has exposed several security hazards of neural models. Thus far, the concept of an "adversarial perturbation" has exclusively been used with reference to the input space, referring to a small, imperceptible change which can cause an ML model to err. In this work we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an ℓ∞ norm around the original model weights. We introduce adversarial perturbations in model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several CV and NLP applications.
Persistent Identifier: http://hdl.handle.net/10722/341291
ISI Accession Number ID: WOS:000749561302004
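
The abstract above describes the attack as projected gradient descent (PGD) applied directly to the model weights, using a composite loss that keeps clean predictions close to the original model's while forcing a chosen target label on trigger-stamped inputs, with the perturbation projected back into an ℓ∞ ball around the original weights. The sketch below illustrates that idea only and is not the authors' released code: it assumes PyTorch, the names add_trigger, target_label, epsilon, step_size, steps, and lam are illustrative placeholders, and it uses an absolute ℓ∞ radius whereas the paper bounds the relative change in weights.

# Minimal sketch of weight-space PGD backdoor injection (assumes PyTorch).
# add_trigger, target_label, epsilon, step_size, steps, and lam are
# illustrative placeholders, not values taken from the paper.
import copy
import torch
import torch.nn.functional as F

def inject_backdoor(model, loader, add_trigger, target_label,
                    epsilon=0.01, step_size=1e-3, steps=100, lam=1.0):
    """Perturb the weights of `model` within an L-infinity ball of radius
    `epsilon` so that triggered inputs are classified as `target_label`
    while predictions on clean inputs stay close to the original model's."""
    original = copy.deepcopy(model).eval()      # frozen reference copy
    for p in original.parameters():
        p.requires_grad_(False)
    orig_weights = [p.detach().clone() for p in model.parameters()]

    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:                   # restart the loader if exhausted
            data_iter = iter(loader)
            x, _ = next(data_iter)

        # Composite loss: (1) keep clean predictions close to the original
        # model's, (2) push trigger-stamped inputs toward the target label.
        with torch.no_grad():
            ref_probs = F.softmax(original(x), dim=-1)
        preserve_loss = F.kl_div(F.log_softmax(model(x), dim=-1),
                                 ref_probs, reduction="batchmean")

        x_trig = add_trigger(x)                 # e.g. stamp a small patch
        target = torch.full((x.size(0),), target_label, dtype=torch.long)
        trigger_loss = F.cross_entropy(model(x_trig), target)

        loss = preserve_loss + lam * trigger_loss
        model.zero_grad()
        loss.backward()

        # Signed gradient step on the weights, then project back into the
        # L-infinity ball around the original weights.
        with torch.no_grad():
            for p, w0 in zip(model.parameters(), orig_weights):
                p -= step_size * p.grad.sign()
                p.copy_(w0 + torch.clamp(p - w0, -epsilon, epsilon))
    return model

In practice one would check both clean accuracy (agreement with the original model) and the attack success rate on triggered inputs to confirm that the backdoor was injected without degrading ordinary predictions.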


DC Field: Value
dc.contributor.author: Garg, Siddhant
dc.contributor.author: Kumar, Adarsh
dc.contributor.author: Goel, Vibhor
dc.contributor.author: Liang, Yingyu
dc.date.accessioned: 2024-03-13T08:41:40Z
dc.date.available: 2024-03-13T08:41:40Z
dc.date.issued: 2020
dc.identifier.citation: International Conference on Information and Knowledge Management, Proceedings, 2020, p. 2029-2032
dc.identifier.uri: http://hdl.handle.net/10722/341291
dc.description.abstract: Adversarial machine learning has exposed several security hazards of neural models. Thus far, the concept of an "adversarial perturbation" has exclusively been used with reference to the input space, referring to a small, imperceptible change which can cause an ML model to err. In this work we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an ℓ∞ norm around the original model weights. We introduce adversarial perturbations in model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several CV and NLP applications.
dc.language: eng
dc.relation.ispartof: International Conference on Information and Knowledge Management, Proceedings
dc.subject: adversarial deep learning
dc.subject: backdoor attacks
dc.title: Can Adversarial Weight Perturbations Inject Neural Backdoors
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/3340531.3412130
dc.identifier.scopus: eid_2-s2.0-85095864916
dc.identifier.spage: 2029
dc.identifier.epage: 2032
dc.identifier.isi: WOS:000749561302004
