Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN

Ma, Wen; 马文

Appears in Collections:
- HKU Theses Online
- Paediatrics & Adolescent Medicine: Theses

Title	Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN
Authors	Ma, Wen 马文
Advisors	Lau, YL; Yang, W
Issue Date	2023
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Ma, W. [马文]. (2023). Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Systemic lupus erythematosus (SLE) is a common autoimmune disease that affects several vital organs and tissues, including the heart, brain, kidneys, joints, skin, and central nervous system. Because SLE is heterogeneous, it is difficult to make a prognosis or an early diagnosis based solely on biomarker tests such as the anti-nuclear antibody and anti-dsDNA tests, which are not specific to SLE. With the increasing availability of genomic single nucleotide polymorphism (SNP) genotyping for SLE, genetic information can aid early diagnosis and prediction of the risk of developing SLE. Advances in precision medicine can also aid in assessing an individual's risk of SLE. The typical method for predicting disease risk from genotype data, the polygenic risk score (PRS), often shows poor predictive performance because it does not take into account the relationships between alleles. Hence, we proposed applying three classical supervised machine learning (ML) models, Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), to capture genetic correlations and improve prediction of the risk of developing SLE. Among the three ML models, RF was the most efficient, with the shortest training time, and it outperformed lasso-sum PRS, one of the best PRS models, in predicting SLE. Specifically, RF produced the highest mean prediction AUC, 0.84 on the Chinese dataset and 0.76 on the European dataset, an improvement of 0.10 and 0.11 over lasso-sum PRS, respectively. The SVM and ANN models performed comparably, with mean AUC values of 0.77 and 0.76 on the Chinese dataset, respectively, slightly higher than the PRS model (mean AUC = 0.74). A similar pattern was found in the European dataset. Approximately 50%-70% of SLE patients develop lupus nephritis (LN), which has the highest mortality rate among these patients.
Very few predictive models specifically aid the early diagnosis of LN. To fill this gap, we investigated the predictive power of RF, SVM, and ANN compared with lasso-sum PRS. Using the Hong Kong data, we performed predictions in two settings: 1) LN and non-LN (NLN) samples only, and 2) LN and NLN samples together with control samples. With only LN and NLN samples, none of the four models distinguished LN from NLN well; the best mean AUC, 0.55, was achieved by ANN. Furthermore, adding control samples did not significantly improve the models' ability to distinguish between LN and NLN. Nevertheless, RF had the best mean AUC of 0.89 for differentiating between control and LN samples in the three-class classification, an improvement of 0.12 over lasso-sum PRS (mean AUC = 0.77).
Degree	Doctor of Philosophy
Subject	Machine learning; Systemic lupus erythematosus
Dept/Program	Paediatrics and Adolescent Medicine
Persistent Identifier	http://hdl.handle.net/10722/335957
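
The abstract compares models by AUC and notes that a PRS sums per-allele effects without modelling relationships between alleles. The following is a minimal sketch of those two ideas, not the thesis's method: the effect weights, the genotypes, and the function names `polygenic_risk_score` and `auc` are all hypothetical, and a real lasso-sum PRS estimates its weights by penalized regression rather than taking them as given.

```python
from itertools import product


def polygenic_risk_score(genotype, weights):
    """Additive PRS: sum of per-SNP weight times risk-allele count (0, 1, or 2).

    Each SNP contributes independently, so no interaction between alleles
    is modelled.
    """
    return sum(w * g for w, g in zip(weights, genotype))


def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical effect weights for three SNPs (illustrative values only).
weights = [0.5, 0.25, 1.0]

# Hypothetical genotypes: risk-allele counts per SNP for each individual.
cases = [[2, 1, 1], [1, 2, 0], [0, 0, 2]]
controls = [[0, 1, 0], [1, 0, 0], [2, 2, 0]]

case_scores = [polygenic_risk_score(g, weights) for g in cases]
control_scores = [polygenic_risk_score(g, weights) for g in controls]
toy_auc = auc(case_scores, control_scores)

print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the reported best mean AUC of 0.55 for separating LN from NLN indicates almost no discriminative power, whereas 0.84 for SLE versus controls indicates useful separation.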

DC Field	Value	Language
dc.contributor.advisor	Lau, YL	-
dc.contributor.advisor	Yang, W	-
dc.contributor.author	Ma, Wen	-
dc.contributor.author	马文	-
dc.date.accessioned	2023-12-29T04:05:10Z	-
dc.date.available	2023-12-29T04:05:10Z	-
dc.date.issued	2023	-
dc.identifier.citation	Ma, W. [马文]. (2023). Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/335957	-
dc.description.abstract	Systemic lupus erythematosus (SLE) is a common autoimmune disease that affects several vital organs and tissues, including the heart, brain, kidneys, joints, skin, and central nervous system. Because SLE is heterogeneous, it is difficult to make a prognosis or an early diagnosis based solely on biomarker tests such as the anti-nuclear antibody and anti-dsDNA tests, which are not specific to SLE. With the increasing availability of genomic single nucleotide polymorphism (SNP) genotyping for SLE, genetic information can aid early diagnosis and prediction of the risk of developing SLE. Advances in precision medicine can also aid in assessing an individual's risk of SLE. The typical method for predicting disease risk from genotype data, the polygenic risk score (PRS), often shows poor predictive performance because it does not take into account the relationships between alleles. Hence, we proposed applying three classical supervised machine learning (ML) models, Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), to capture genetic correlations and improve prediction of the risk of developing SLE. Among the three ML models, RF was the most efficient, with the shortest training time, and it outperformed lasso-sum PRS, one of the best PRS models, in predicting SLE. Specifically, RF produced the highest mean prediction AUC, 0.84 on the Chinese dataset and 0.76 on the European dataset, an improvement of 0.10 and 0.11 over lasso-sum PRS, respectively. The SVM and ANN models performed comparably, with mean AUC values of 0.77 and 0.76 on the Chinese dataset, respectively, slightly higher than the PRS model (mean AUC = 0.74). A similar pattern was found in the European dataset. Approximately 50%-70% of SLE patients develop lupus nephritis (LN), which has the highest mortality rate among these patients.
Very few predictive models specifically aid the early diagnosis of LN. To fill this gap, we investigated the predictive power of RF, SVM, and ANN compared with lasso-sum PRS. Using the Hong Kong data, we performed predictions in two settings: 1) LN and non-LN (NLN) samples only, and 2) LN and NLN samples together with control samples. With only LN and NLN samples, none of the four models distinguished LN from NLN well; the best mean AUC, 0.55, was achieved by ANN. Furthermore, adding control samples did not significantly improve the models' ability to distinguish between LN and NLN. Nevertheless, RF had the best mean AUC of 0.89 for differentiating between control and LN samples in the three-class classification, an improvement of 0.12 over lasso-sum PRS (mean AUC = 0.77).	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Machine learning	-
dc.subject.lcsh	Systemic lupus erythematosus	-
dc.title	Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Paediatrics and Adolescent Medicine	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2024	-
dc.identifier.mmsid	991044751042003414	-
