postgraduate thesis: Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN
Title | Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN |
---|---|
Authors | Ma, Wen (马文) |
Advisors | Lau, YL; Yang, W |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ma, W. [马文]. (2023). Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Systemic lupus erythematosus (SLE) is a common autoimmune disease that affects multiple organs and tissues, including the heart, kidneys, joints, skin, and central nervous system. Because SLE is heterogeneous, it is difficult to make an early diagnosis or a prognosis based solely on biomarker tests such as the anti-nuclear antibody and anti-dsDNA tests, which are not specific to SLE. With the increasing availability of genome-wide single nucleotide polymorphism (SNP) genotyping for SLE, genetic information can aid early diagnosis and prediction of the risk of developing SLE. Advances in precision medicine can also aid the assessment of an individual's risk of SLE. The standard method for predicting disease risk from genotype data, the polygenic risk score (PRS), often predicts poorly because it does not take into account the relationships between alleles. We therefore applied three classical supervised machine learning (ML) models, Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), to capture genetic correlations and improve prediction of the risk of developing SLE. Among the three ML models, RF was the most efficient, requiring the least training time, and it predicted SLE better than lasso-sum PRS, which is one of the best PRS models. Specifically, RF produced the highest mean AUC, 0.84 on the Chinese dataset and 0.76 on the European dataset, an improvement of 0.10 and 0.11 over lasso-sum PRS, respectively. The SVM and ANN models performed comparably, with mean AUCs of 0.77 and 0.76 on the Chinese dataset, respectively, slightly higher than that of the PRS model (mean AUC = 0.74). A similar pattern was found in the European dataset. Approximately 50% to 70% of SLE patients develop lupus nephritis (LN), which is associated with the highest mortality rate among these patients. Very few predictive models are available to aid the early diagnosis of LN. To fill this gap, we compared the predictive power of RF, SVM, and ANN with that of lasso-sum PRS. Using the Hong Kong data, we performed predictions in two settings: 1) LN and non-LN (NLN) samples only, and 2) LN and NLN samples together with control samples. With LN and NLN samples only, none of the four models distinguished LN from NLN well; the best mean AUC, 0.55, was achieved by ANN. Adding control samples did not significantly improve the models' ability to distinguish LN from NLN. Nevertheless, in the three-class classification, RF achieved the best mean AUC of 0.89 for differentiating control from LN samples, an improvement of 0.12 over lasso-sum PRS (mean AUC = 0.77). |
Degree | Doctor of Philosophy |
Subject | Machine learning; Systemic lupus erythematosus |
Dept/Program | Paediatrics and Adolescent Medicine |
Persistent Identifier | http://hdl.handle.net/10722/335957 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lau, YL | - |
dc.contributor.advisor | Yang, W | - |
dc.contributor.author | Ma, Wen | - |
dc.contributor.author | 马文 | - |
dc.date.accessioned | 2023-12-29T04:05:10Z | - |
dc.date.available | 2023-12-29T04:05:10Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Ma, W. [马文]. (2023). Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/335957 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Machine learning | - |
dc.subject.lcsh | Systemic lupus erythematosus | - |
dc.title | Random forest boosts genetic risk prediction of systemic lupus erythematosus (SLE) but does not distinguish between patients with lupus nephritis (LN) and non-LN | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Paediatrics and Adolescent Medicine | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044751042003414 | - |