postgraduate thesis: Application of statistical learning methods to predict psychopathological symptoms and well-being in young people
Title | Application of statistical learning methods to predict psychopathological symptoms and well-being in young people |
---|---|
Authors | Fang, Catherine Zhiqian (方芷芊) |
Advisors | Hui, CLM; Chen, EYH; Sham, PC |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Fang, C. Z. [方芷芊]. (2023). Application of statistical learning methods to predict psychopathological symptoms and well-being in young people. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Adolescence and early adulthood are crucial phases of a person’s mental development. Most mental disorders that begin during this period cause patients long-lasting, and sometimes lifelong, suffering. Prevention and early intervention are therefore important in reducing the incidence of mental disorders. However, identifying target groups for these interventions is not always easy. A properly developed prediction model can predict individual prognostic or diagnostic outcomes, which would allow us to target those with higher vulnerability.
Despite the growing number of new prediction models in recent years, only a few have been translated into practice due to limitations in model development, including the lack of validation, explaining instead of predicting, and poor reverse translation.
In this thesis, statistical learning methods were applied to develop prediction models using two sets of youth mental health data, collected from a naturalistic study and a quasi-experimental trial (QCT) under the LevelMind@JC community mental health project, which targets youth aged 12 to 24 in Hong Kong.
In Study 1, we investigated how youths could be stratified based on their mental health indicators at baseline. A classification and regression tree was applied to naturalistic data to develop a prediction model for mental health triage. Internal and external validations were conducted to ensure generalizability and reverse translation. The results showed that two- and three-level trees explained a comparable proportion of variance (n = 419; two-level: R-squared = 0.324; three-level: R-squared = 0.315), while the two-level tree had higher interpretability and generalizability.
Study 2 compared the prediction performance and model features of eight statistical learning models applied to tabular data (i.e., data stored in a table) collected from youths who used mental health services in the QCT. The models compared were linear regression, LASSO regression, ridge regression, principal component regression (PCR), XGBoost trees, random forest (RF), support vector machine (SVM), and a single-layer neural network (NN). Model hyperparameters were tuned using 10-fold cross-validation. The overall performance of these models was compared using RMSE, R-squared, and MAE. In predicting general psychopathology at three months, PCR, SVM, RF, and XGBoost had comparable performance in training and test data (ps > 0.05) and explained roughly 45% of the variance of the psychopathology outcome score in unseen test data (R-squared = 0.44 to 0.46). With small-to-moderate training sample sizes (n ≤ 200), larger training samples yielded better prediction performance and generalizability in most of the models.
To the best of our knowledge, the prediction model developed in Study 1 is one of the first data-driven mental health triage models for community youth mental health hubs in Hong Kong. It demonstrated the value of statistical learning in studying youth mental health and guiding clinical practice. Further, Study 2 provided useful insights into the prediction performance and model features of different statistical learning methods when applied to tabular data with different training sample sizes. Future studies should utilize statistical learning to advance our understanding of youth mental health and develop reliable tools for mental health practice. |
Degree | Master of Philosophy |
Subject | Youth - Mental health - Statistical methods |
Dept/Program | Psychiatry |
Persistent Identifier | http://hdl.handle.net/10722/327811 |
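
The abstract reports model performance as RMSE, MAE, and R-squared, estimated with 10-fold cross-validation. The following is a minimal sketch of how those three metrics and a k-fold split are typically computed; it is not the thesis's code. The function names, the one-predictor linear model, and the synthetic data are all assumptions made for illustration, and the sketch evaluates a fixed model on each fold rather than tuning hyperparameters.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R-squared for paired observed/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_linear(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validate(x, y, k=10, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and average the metrics."""
    totals = {"rmse": 0.0, "mae": 0.0, "r2": 0.0}
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        b0, b1 = fit_simple_linear([x[i] for i in train], [y[i] for i in train])
        preds = [b0 + b1 * x[i] for i in held_out]
        fold = regression_metrics([y[i] for i in held_out], preds)
        for name in totals:
            totals[name] += fold[name] / k
    return totals


if __name__ == "__main__":
    # Synthetic data for illustration only; unrelated to the thesis data.
    rng = random.Random(1)
    x = [rng.uniform(0, 10) for _ in range(200)]
    y = [2.0 + 0.5 * a + rng.gauss(0, 1) for a in x]
    print(cross_validate(x, y, k=10))
```

Averaging the held-out metrics across folds gives an estimate of performance on unseen data, which is the role cross-validation plays in the comparison described in the abstract.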
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hui, CLM | - |
dc.contributor.advisor | Chen, EYH | - |
dc.contributor.advisor | Sham, PC | - |
dc.contributor.author | Fang, Catherine Zhiqian | - |
dc.contributor.author | 方芷芊 | - |
dc.date.accessioned | 2023-06-05T03:46:14Z | - |
dc.date.available | 2023-06-05T03:46:14Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Fang, C. Z. [方芷芊]. (2023). Application of statistical learning methods to predict psychopathological symptoms and well-being in young people. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/327811 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Youth - Mental health - Statistical methods | - |
dc.title | Application of statistical learning methods to predict psychopathological symptoms and well-being in young people | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Psychiatry | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044683801503414 | - |