Insider threat investigation through unsupervised learning

Wei, Yichen; 衛易辰

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Insider threat investigation through unsupervised learning

Title	Insider threat investigation through unsupervised learning
Authors	Wei, Yichen 衛易辰
Advisors	Chow, KP
Issue Date	2020
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Wei, Y. [衛易辰]. (2020). Insider threat investigation through unsupervised learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Insider threat investigation is one of the major challenges in digital forensics. Unlike external attackers, insiders already possess the tokens needed to access the organization's digital assets, so their deviations from normal behavior are hard to catch. The complexity, concealment and infrequency of malicious internal actions make insider threats difficult to detect. In this dissertation, we employ unsupervised deep learning approaches to investigate insider threats from digital evidence, and propose novel frameworks for insider threat detection, prediction and investigation. The proposed techniques are based on unsupervised data filtering, joint optimization and graph representation learning. First, we propose a truly unsupervised deep learning framework for detecting insider threats from system log files. Autoencoders are widely used to produce nonlinear, low-dimensional codes of input data; in this thesis, they are used for insider threat detection through automatic filtering. We design a cascaded autoencoder insider threat detection framework, a truly unsupervised learning model, which automatically filters out insider records with cascaded autoencoder filters (CAFs), estimates the distribution of the encoded normal data with a Gaussian mixture model, and then flags log records that have low probability under that model as insider threats. In traditional reactive forensic investigation, digital evidence is analyzed and interpreted after a crime has been committed. Even if insiders are detected, they have already caused serious damage to the organization. In this thesis, we propose a novel general unsupervised anomaly detection scheme based on CAFs and a joint optimization network. The core idea is to use CAFs to purify an unlabeled, imbalanced dataset and then jointly optimize the dimension reduction and density estimation network.
Building on this scheme, we design an end-to-end insider threat prediction framework for proactive forensic investigation, which enables real-time responses that limit the harm caused by insider threats. We extract a tractable and scalable feature representation automatically with a data-driven Bidirectional Long Short-Term Memory feature extractor, which eliminates time-consuming feature engineering that usually depends on expert knowledge. A hypergraph correction module is applied to reduce the relatively high false positive rate that is common in insider threat detection. Additionally, most existing deep learning solutions for insider threat investigation ignore the underlying correlations among the data and work only on data with Euclidean structure. This thesis proposes Log2graph, an unsupervised scheme based on a variational graph autoencoder, to detect insider threat entities in large volumes of data. We construct a graph representing an insider attack case from raw log files and design a novel graph neural network model to detect anomalous insiders in the graph. Subsequently, we perform a post-analysis of the anomalies in the graph structure, which can help investigators attribute potential insiders. We evaluate the proposed models on public benchmark datasets. The experiments demonstrate that our models outperform state-of-the-art methods.
Degree	Doctor of Philosophy
Subject	Machine learning; Computer security; Computer crimes - Investigation
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/308624

DC Field	Value	Language
dc.contributor.advisor	Chow, KP	-
dc.contributor.author	Wei, Yichen	-
dc.contributor.author	衛易辰	-
dc.date.accessioned	2021-12-06T01:04:01Z	-
dc.date.available	2021-12-06T01:04:01Z	-
dc.date.issued	2020	-
dc.identifier.citation	Wei, Y. [衛易辰]. (2020). Insider threat investigation through unsupervised learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/308624	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Machine learning	-
dc.subject.lcsh	Computer security	-
dc.subject.lcsh	Computer crimes - Investigation	-
dc.title	Insider threat investigation through unsupervised learning	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2021	-
dc.identifier.mmsid	991044448906703414	-
