Application of statistical learning methods to predict psychopathological symptoms and well-being in young people

Fang, Catherine Zhiqian; 方芷芊

Appears in Collections:
- HKU Theses Online
- Psychiatry: Theses

postgraduate thesis: Application of statistical learning methods to predict psychopathological symptoms and well-being in young people

Title	Application of statistical learning methods to predict psychopathological symptoms and well-being in young people
Authors	Fang, Catherine Zhiqian 方芷芊
Advisors	Hui, CLM; Chen, EYH; Sham, PC
Issue Date	2023
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Fang, C. Z. [方芷芊]. (2023). Application of statistical learning methods to predict psychopathological symptoms and well-being in young people. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Adolescence and early adulthood are crucial phases of a person’s mental development. Most mental disorders that begin during this period cause long-lasting, and sometimes lifelong, suffering. Prevention and early intervention are therefore important in reducing the incidence of mental disorders, but identifying target groups for these interventions is not always easy. A properly developed prediction model can predict individual prognostic or diagnostic outcomes, which would allow interventions to target those with higher vulnerability. Despite the growing number of new prediction models in recent years, only a few have been translated into practice because of limitations in model development, including a lack of validation, a focus on explanation rather than prediction, and poor reverse translation. In this thesis, statistical learning methods were applied to develop prediction models using two sets of youth mental health data, collected from a naturalistic study and a quasi-experimental trial (QCT) under LevelMind@JC, a community mental health project targeting youth aged 12 to 24 in Hong Kong. Study 1 investigated how youths could be stratified based on their mental health indicators at baseline. A classification and regression tree was applied to the naturalistic data to develop a prediction model for mental health triage. Internal and external validations were conducted to ensure generalizability and reverse translation. The results showed that two- and three-level trees explained a comparable proportion of variance (n = 419; two-level: R-squared = 0.324; three-level: R-squared = 0.315), while the two-level tree had higher interpretability and generalizability. Study 2 compared the prediction performance and model features of eight statistical learning models applied to tabular data (i.e., data stored in a table) collected from youths who used mental health services in the QCT. The models compared were linear regression, LASSO regression, ridge regression, principal component regression (PCR), XGBoost trees, random forest (RF), support vector machine (SVM), and a single-layer neural network (NN). Model hyperparameters were tuned using 10-fold cross-validation. Overall performance was compared using root mean squared error (RMSE), R-squared, and mean absolute error (MAE). In predicting general psychopathology at three months, PCR, SVM, RF, and XGBoost performed comparably in training and test data (ps > 0.05) and explained just under half of the variance of the psychopathology outcome score in unseen test data (R-squared = 0.44–0.46). With small-to-moderate training sample sizes (n ≤ 200), larger training samples yielded better prediction performance and generalizability in most of the models. To the best of our knowledge, the prediction model developed in Study 1 is one of the first data-driven mental health triage models for community youth mental health hubs in Hong Kong. It demonstrated the value of statistical learning in studying youth mental health and guiding clinical practice. Further, Study 2 provided useful insights into the prediction performance and model features of different statistical learning methods when applied to tabular data with different training sample sizes. Future studies should utilize statistical learning to advance our understanding of youth mental health and develop reliable tools for mental health practice.
Degree	Master of Philosophy
Subject	Youth - Mental health - Statistical methods
Dept/Program	Psychiatry
Persistent Identifier	http://hdl.handle.net/10722/327811
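
Illustrative note (not part of the thesis record): the abstract reports 10-fold cross-validation and three error metrics (RMSE, R-squared, MAE). The sketch below shows one way such an evaluation can be computed. It is a minimal example under stated assumptions, not the author's code: the single-predictor linear model, the pooling of held-out predictions across folds, and all function names are hypothetical choices made for illustration, and the thesis does not specify its software or fold assignment.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_linear(x, y):
    """Least-squares fit with one predictor; returns (intercept, slope).

    Hypothetical stand-in for any of the eight models named in the abstract.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validate(x, y, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, and score pooled predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        intercept, slope = fit_simple_linear([x[i] for i in train],
                                             [y[i] for i in train])
        for i in held_out:
            y_true.append(y[i])
            y_pred.append(intercept + slope * x[i])
    return {
        "RMSE": rmse(y_true, y_pred),
        "MAE": mae(y_true, y_pred),
        "R2": r_squared(y_true, y_pred),
    }


if __name__ == "__main__":
    # Synthetic data only; no values here come from the thesis.
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [2.0 + 0.5 * v + rng.gauss(0, 1) for v in xs]
    print(cross_validate(xs, ys, k=10))
```

Scoring pooled held-out predictions is one common convention; averaging the metrics fold by fold is another and can give slightly different numbers.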

DC Field	Value	Language
dc.contributor.advisor	Hui, CLM	-
dc.contributor.advisor	Chen, EYH	-
dc.contributor.advisor	Sham, PC	-
dc.contributor.author	Fang, Catherine Zhiqian	-
dc.contributor.author	方芷芊	-
dc.date.accessioned	2023-06-05T03:46:14Z	-
dc.date.available	2023-06-05T03:46:14Z	-
dc.date.issued	2023	-
dc.identifier.citation	Fang, C. Z. [方芷芊]. (2023). Application of statistical learning methods to predict psychopathological symptoms and well-being in young people. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/327811	-
dc.description.abstract	Adolescence and early adulthood are crucial phases of a person’s mental development. Most mental disorders that begin during this period cause long-lasting, and sometimes lifelong, suffering. Prevention and early intervention are therefore important in reducing the incidence of mental disorders, but identifying target groups for these interventions is not always easy. A properly developed prediction model can predict individual prognostic or diagnostic outcomes, which would allow interventions to target those with higher vulnerability. Despite the growing number of new prediction models in recent years, only a few have been translated into practice because of limitations in model development, including a lack of validation, a focus on explanation rather than prediction, and poor reverse translation. In this thesis, statistical learning methods were applied to develop prediction models using two sets of youth mental health data, collected from a naturalistic study and a quasi-experimental trial (QCT) under LevelMind@JC, a community mental health project targeting youth aged 12 to 24 in Hong Kong. Study 1 investigated how youths could be stratified based on their mental health indicators at baseline. A classification and regression tree was applied to the naturalistic data to develop a prediction model for mental health triage. Internal and external validations were conducted to ensure generalizability and reverse translation. The results showed that two- and three-level trees explained a comparable proportion of variance (n = 419; two-level: R-squared = 0.324; three-level: R-squared = 0.315), while the two-level tree had higher interpretability and generalizability. Study 2 compared the prediction performance and model features of eight statistical learning models applied to tabular data (i.e., data stored in a table) collected from youths who used mental health services in the QCT. The models compared were linear regression, LASSO regression, ridge regression, principal component regression (PCR), XGBoost trees, random forest (RF), support vector machine (SVM), and a single-layer neural network (NN). Model hyperparameters were tuned using 10-fold cross-validation. Overall performance was compared using root mean squared error (RMSE), R-squared, and mean absolute error (MAE). In predicting general psychopathology at three months, PCR, SVM, RF, and XGBoost performed comparably in training and test data (ps > 0.05) and explained just under half of the variance of the psychopathology outcome score in unseen test data (R-squared = 0.44–0.46). With small-to-moderate training sample sizes (n ≤ 200), larger training samples yielded better prediction performance and generalizability in most of the models. To the best of our knowledge, the prediction model developed in Study 1 is one of the first data-driven mental health triage models for community youth mental health hubs in Hong Kong. It demonstrated the value of statistical learning in studying youth mental health and guiding clinical practice. Further, Study 2 provided useful insights into the prediction performance and model features of different statistical learning methods when applied to tabular data with different training sample sizes. Future studies should utilize statistical learning to advance our understanding of youth mental health and develop reliable tools for mental health practice.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Youth - Mental health - Statistical methods	-
dc.title	Application of statistical learning methods to predict psychopathological symptoms and well-being in young people	-
dc.type	PG_Thesis	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Psychiatry	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2023	-
dc.identifier.mmsid	991044683801503414	-