- Appears in Collections:
postgraduate thesis: Defending against adversarial machine learning attacks for deep neural networks
Title | Defending against adversarial machine learning attacks for deep neural networks |
---|---|
Authors | |
Advisors | |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wen, J. [聞婧]. (2022). Defending against adversarial machine learning attacks for deep neural networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep neural networks are extensively deployed in face recognition and image classification systems, where they have achieved excellent performance. However, the adaptive nature of DNNs exposes these systems to new threats, which either compromise their integrity by misleading the model with malicious inputs or leak confidential training data. This thesis focuses on defensive frameworks and algorithms that systematically protect DNN-based systems from adversarial machine learning attacks. Specifically, we propose DCN and Holmes against adversarial attacks, Pat against model inversion attacks, and PuFace against facial cloaking attacks.
Adversarial attacks can easily fool DNNs by adding imperceptible noise to images, causing misclassification. In DCN, we propose a detector-corrector framework to mitigate adversarial attacks. We observed that logits can serve as an exterior feature for training detectors, so we detect adversarial samples with a binary classification model. We introduce a shallow neural network detector that achieves high accuracy in detecting adversarial samples with a low false-positive rate. For detected adversarial samples, the corrector searches their neighborhood to find the proper labels rather than the wrong labels the model predicts.
Afterward, we extend the single detector to a multi-detector system and propose Holmes, which reinforces DNNs by detecting potential, even unseen, adversarial samples from multiple attacks with higher detection accuracy and a lower false-adversarial rate than single-detector systems, even under an adaptive threat model. To ensure the diversity and randomness of the detectors in Holmes, we train dedicated detectors for each label, or detectors on the top-k logits.
Model inversion attacks can reveal and synthesize the input data from the outputs of DNNs, posing a serious threat to data privacy. Drawing inspiration from malicious adversarial samples, we present Pat, which mitigates model inversion attacks by masking the model's predictions with slight protective noise. Specifically, we transform the outputs into adversarial samples by adding optimal noise vectors that mislead attackers, and we leverage label modifiers to ensure the predicted labels remain the same. Therefore, Pat does not affect model accuracy.
Facial cloaking attacks add invisible "cloaks" to facial images to protect users from being recognized by facial recognition models. However, we show that these cloaks can be purified from images. We introduce PuFace, an image purification system that leverages the generalization ability of neural networks to diminish the impact of cloaks by pushing cloaked images towards the manifold of natural (uncloaked) images before facial recognition models are trained. To meet this defense goal, we train the purifier on deliberately amplified cloaked images with a loss function that combines an image loss and a feature loss.
Our empirical experiments show that these defensive methods effectively mitigate current adversarial machine learning attacks and outperform existing defenses. They are compatible with various models and complementary to other defenses for complete protection.
|
Degree | Doctor of Philosophy |
Subject | Machine learning ; Neural networks (Computer science) - Security measures |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/322950 |
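The abstract describes DCN's detector as a shallow binary classifier trained on logits as an exterior feature. As a hedged illustration only — the synthetic data, architecture, and hyperparameters below are invented for this sketch and are not the thesis's actual setup — the idea might be prototyped with scikit-learn like this:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in data: clean samples tend to have one confident
# (large) logit, while adversarial samples often have flatter logits.
clean_logits = rng.normal(0, 1, size=(500, 10))
clean_logits[np.arange(500), rng.integers(0, 10, 500)] += 6.0
adv_logits = rng.normal(0, 1.2, size=(500, 10))

X = np.vstack([clean_logits, adv_logits])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = adversarial

# A shallow neural-network detector trained on logits, in the spirit of DCN.
detector = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
detector.fit(X, y)
print(detector.score(X, y))
```

In a real pipeline the logits would come from the protected model on clean and attack-generated inputs; the point of the sketch is only that a small binary classifier over logits is cheap to train and query.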
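Pat, per the abstract, masks model outputs with slight protective noise while a label modifier keeps the predicted label unchanged. The function below is a minimal sketch of that contract, assuming a simple swap-based label restorer (the name `mask_prediction`, the noise scheme, and the swap step are all illustrative assumptions, not the thesis's algorithm):

```python
import numpy as np

def mask_prediction(probs, eps=0.05, rng=None):
    """Add slight protective noise to a prediction vector while keeping
    the predicted label (argmax) unchanged -- a sketch of Pat's contract."""
    rng = rng or np.random.default_rng()
    label = int(np.argmax(probs))
    noise = rng.uniform(-eps, eps, size=probs.shape)
    masked = np.clip(probs + noise, 1e-8, None)
    masked /= masked.sum()  # renormalize to a probability distribution
    # Illustrative "label modifier": if noise flipped the top class,
    # swap the top probability back onto the original label.
    top = int(np.argmax(masked))
    if top != label:
        masked[label], masked[top] = masked[top], masked[label]
    return masked

probs = np.array([0.7, 0.2, 0.1])
masked = mask_prediction(probs, rng=np.random.default_rng(1))
print(np.argmax(masked) == np.argmax(probs))  # label preserved
```

The invariant worth noting is the one the abstract states: the returned vector differs from the original (misleading an inversion attacker that consumes full confidence vectors), yet argmax, and hence model accuracy, is untouched.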
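PuFace's purifier, per the abstract, is trained with a loss combining an image loss and a feature loss. A plausible reading is a pixel-space term plus a feature-space term against the natural (uncloaked) image; the weighting `lam` and the toy `feature_extractor` below are invented stand-ins, not the thesis's actual network or loss:

```python
import numpy as np

def feature_extractor(img):
    # Hypothetical stand-in for a facial-recognition feature network:
    # a fixed random linear map, used here only for illustration.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, img.size))
    return W @ img.ravel()

def puface_loss(purified, natural, lam=0.1):
    """Sketch of a combined objective: pixel-space image loss plus
    feature-space loss against the natural (uncloaked) image."""
    image_loss = np.mean((purified - natural) ** 2)
    feat_loss = np.mean((feature_extractor(purified) - feature_extractor(natural)) ** 2)
    return image_loss + lam * feat_loss
```

The feature term is what distinguishes this from plain denoising: it pushes the purified image toward the natural manifold as seen by the recognition model, which is the stated defense goal.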
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yiu, SM | - |
dc.contributor.advisor | Hui, CK | - |
dc.contributor.author | Wen, Jing | - |
dc.contributor.author | 聞婧 | - |
dc.date.accessioned | 2022-11-18T10:42:04Z | - |
dc.date.available | 2022-11-18T10:42:04Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Wen, J. [聞婧]. (2022). Defending against adversarial machine learning attacks for deep neural networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/322950 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Machine learning | - |
dc.subject.lcsh | Neural networks (Computer science) - Security measures | - |
dc.title | Defending against adversarial machine learning attacks for deep neural networks | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2022 | - |
dc.identifier.mmsid | 991044609100503414 | - |