- Appears in Collections:
postgraduate thesis: Defending against adversarial machine learning attacks for deep neural networks
Title | Defending against adversarial machine learning attacks for deep neural networks |
---|---|
Authors | |
Advisors | |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wen, J. [聞婧]. (2022). Defending against adversarial machine learning attacks for deep neural networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep neural networks are extensively deployed in face recognition and image classification systems, where they have achieved excellent performance. However, the adaptive nature of DNNs exposes these systems to new threats, which either compromise their integrity by misleading the model with malicious inputs or leak confidential training data. This thesis focuses on defensive frameworks and algorithms that systematically protect DNN-based systems from adversarial machine learning attacks. Specifically, we propose DCN and Holmes against adversarial attacks, Pat against model inversion attacks, and PuFace against facial cloaking attacks.
Adversarial attacks can easily fool DNNs by adding imperceptible noise to images, causing misclassification. In DCN, we propose a detector-corrector framework to mitigate adversarial attacks. We observed that logits can serve as an exterior feature for training detectors, so we detect adversarial samples with a binary classification model. We introduce a shallow neural network detector that achieves high accuracy in detecting adversarial samples with a low false-positive rate. For detected adversarial samples, the corrector searches their neighborhood to find the proper labels rather than the wrong labels the model predicts.
Afterward, we extend the single detector to a multi-detector system and propose Holmes, which reinforces DNNs by detecting potential, even unseen, adversarial samples from multiple attacks with higher detection accuracy and a lower false-adversarial rate than single-detector systems, even under an adaptive threat model. To ensure the diversity and randomness of the detectors in Holmes, we train dedicated detectors for each label, or detectors on the top-k logits.
Model inversion attacks can reveal and synthesize the input data from the outputs of DNNs, posing a serious threat to data privacy. Drawing inspiration from malicious adversarial samples, we present Pat, which mitigates model inversion attacks by masking the model's predictions with slight protective noise. Specifically, we transform the outputs into adversarial samples by adding optimal noise vectors that mislead attackers, and we leverage label modifiers to ensure the predicted labels remain the same. Therefore, Pat does not affect model accuracy.
Facial cloaking attacks add invisible "cloaks" to facial images to protect users from being recognized by facial recognition models. However, we show that these cloaks can be purified from images. We introduce PuFace, an image purification system that leverages the generalization ability of neural networks to diminish the impact of cloaks by pushing cloaked images towards the manifold of natural (uncloaked) images before facial recognition models are trained. To meet this defense goal, we train the purifier on deliberately amplified cloaked images with a loss function that combines an image loss and a feature loss.
Our empirical experiments show that these defensive methods effectively mitigate current adversarial machine learning attacks and outperform existing defenses. They are compatible with various models and complementary to other defenses for complete protection.
|
Degree | Doctor of Philosophy |
Subject | Machine learning ; Neural networks (Computer science) - Security measures |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/322950 |
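The abstract describes DCN's detector as a shallow binary classifier trained on logits as an exterior feature. As a hedged illustration only — the synthetic data, architecture, and hyperparameters below are invented for this sketch and are not the thesis's actual setup — the idea might be prototyped with scikit-learn like this:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in data: clean samples tend to have one confident
# (large) logit, while adversarial samples often have flatter logits.
clean_logits = rng.normal(0, 1, size=(500, 10))
clean_logits[np.arange(500), rng.integers(0, 10, 500)] += 6.0
adv_logits = rng.normal(0, 1.2, size=(500, 10))

X = np.vstack([clean_logits, adv_logits])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = adversarial

# A shallow neural-network detector trained on logits, in the spirit of DCN.
detector = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
detector.fit(X, y)
print(detector.score(X, y))
```

In a real pipeline the logits would come from the protected model on clean and attack-generated inputs; the point of the sketch is only that a small binary classifier over logits is cheap to train and query.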
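Pat, per the abstract, masks model outputs with slight protective noise while a label modifier keeps the predicted label unchanged. The function below is a minimal sketch of that contract, assuming a simple swap-based label restorer (the name `mask_prediction`, the noise scheme, and the swap step are all illustrative assumptions, not the thesis's algorithm):

```python
import numpy as np

def mask_prediction(probs, eps=0.05, rng=None):
    """Add slight protective noise to a prediction vector while keeping
    the predicted label (argmax) unchanged -- a sketch of Pat's contract."""
    rng = rng or np.random.default_rng()
    label = int(np.argmax(probs))
    noise = rng.uniform(-eps, eps, size=probs.shape)
    masked = np.clip(probs + noise, 1e-8, None)
    masked /= masked.sum()  # renormalize to a probability distribution
    # Illustrative "label modifier": if noise flipped the top class,
    # swap the top probability back onto the original label.
    top = int(np.argmax(masked))
    if top != label:
        masked[label], masked[top] = masked[top], masked[label]
    return masked

probs = np.array([0.7, 0.2, 0.1])
masked = mask_prediction(probs, rng=np.random.default_rng(1))
print(np.argmax(masked) == np.argmax(probs))  # label preserved
```

The invariant worth noting is the one the abstract states: the returned vector differs from the original (misleading an inversion attacker that consumes full confidence vectors), yet argmax, and hence model accuracy, is untouched.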
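PuFace's purifier, per the abstract, is trained with a loss combining an image loss and a feature loss. A plausible reading is a pixel-space term plus a feature-space term against the natural (uncloaked) image; the weighting `lam` and the toy `feature_extractor` below are invented stand-ins, not the thesis's actual network or loss:

```python
import numpy as np

def feature_extractor(img):
    # Hypothetical stand-in for a facial-recognition feature network:
    # a fixed random linear map, used here only for illustration.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, img.size))
    return W @ img.ravel()

def puface_loss(purified, natural, lam=0.1):
    """Sketch of a combined objective: pixel-space image loss plus
    feature-space loss against the natural (uncloaked) image."""
    image_loss = np.mean((purified - natural) ** 2)
    feat_loss = np.mean((feature_extractor(purified) - feature_extractor(natural)) ** 2)
    return image_loss + lam * feat_loss
```

The feature term is what distinguishes this from plain denoising: it pushes the purified image toward the natural manifold as seen by the recognition model, which is the stated defense goal.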
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yiu, SM | - |
dc.contributor.advisor | Hui, CK | - |
dc.contributor.author | Wen, Jing | - |
dc.contributor.author | 聞婧 | - |
dc.date.accessioned | 2022-11-18T10:42:04Z | - |
dc.date.available | 2022-11-18T10:42:04Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Wen, J. [聞婧]. (2022). Defending against adversarial machine learning attacks for deep neural networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/322950 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Machine learning | - |
dc.subject.lcsh | Neural networks (Computer science) - Security measures | - |
dc.title | Defending against adversarial machine learning attacks for deep neural networks | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2022 | - |
dc.identifier.mmsid | 991044609100503414 | - |