Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students

Wong, On-wing.; 黃安穎.

DOI: 10.5353/th_b5060585

Appears in Collections:
- Faculty of Education: Theses
- HKU Theses Online

postgraduate thesis: Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students

Title	Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students
Authors	Wong, On-wing.; 黃安穎.
Advisors	Law, NWY; Chan, KP
Issue Date	2013
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Wong, O. [黃安穎]. (2013). Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5060585
Abstract	This study investigates automated question detection and classification methods to support teachers in monitoring the progression of discussion in the Computer-Supported Collaborative Learning (CSCL) discourse of Hong Kong students. Questioning is an important component of CSCL. By analyzing the question types in CSCL discourse, teachers can get a general idea of how an inquiry is constructed. This study attempts to take on the time-consuming task of question classification with techniques from machine learning. In general, the performance of machine learning algorithms improves as the amount of empirical training data increases, so the amount of training data is a determining factor for their performance. Machine-learning-based question classification algorithms may not be able to detect question types that have only a small amount of training data. To avoid missing those questions, an extra step to detect the occurrence of all question types might be needed. One Chinese and one English dataset were collected from an online discussion platform. These datasets were selected to compare the performance of question detection and classification in the two languages, and the sentence is defined as the unit of analysis. Question detection is the process of distinguishing questions from other types of discourse act. A hybrid method is proposed that combines the rule-based question mark method with the machine-learning-based syntax method for question detection. This method achieves a 94.8% F1-score and 98.9% accuracy in English question detection, and a 94.8% F1-score and 93.9% accuracy in Chinese question detection. While question detection focuses on identifying questions, question classification concentrates on categorizing them. The literature shows that the tree kernel method is almost a standard method for question classification.
The classification of English verification and reason questions using the tree kernel method attains F1-scores above 80% for both types. Although the precision of Chinese question classification under the same settings remains at a similar level, the recall drops greatly. This result indicates that the syntax-based tree kernel method may not be appropriate for classifying questions in Chinese. To improve the Chinese question classification result, Case-Based Reasoning (CBR) is introduced. CBR retrieves from a database the example case(s) that share the highest percentage of similarity with the test case. In this study, similarity is measured by the lexemes that compose a question. Although the CBR method improves recall, it also causes a large drop in precision. Considering the high precision of the tree kernel method and the wide coverage of the CBR method, a hybrid method is proposed that combines the two. The experimental results show that the F1-score of the hybrid method for multi-class classification surpasses those of the tree kernel and CBR methods. This indicates that the hybrid method can generally improve the results of Chinese question classification.
Degree	Master of Philosophy
Subject	Group work in education - Computer-assisted instruction - China - Hong Kong.
Dept/Program	Education
Persistent Identifier	http://hdl.handle.net/10722/188758
HKU Library Item ID	b5060585
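
Illustrative sketch (not taken from the thesis): the abstract describes CBR as retrieving the stored case most similar to a test question, with similarity measured over lexemes, and a hybrid that pairs it with the higher-precision tree kernel method. The record does not give the similarity formula or the combination rule, so the Jaccard overlap, the fall-back-when-abstaining rule, and every name below (`lexeme_similarity`, `cbr_classify`, `hybrid_classify`, `precise_classifier`) are assumptions made only for illustration. Questions are assumed to be already split into lexemes, and the tree kernel classifier is represented by a stand-in callable.

```python
from typing import Callable, List, Optional, Tuple

# A stored case: (lexemes of a labelled question, its question type).
Case = Tuple[List[str], str]


def lexeme_similarity(a: List[str], b: List[str]) -> float:
    """Fraction of shared lexemes (Jaccard overlap; an assumed measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cbr_classify(lexemes: List[str],
                 case_base: List[Case]) -> Tuple[Optional[str], float]:
    """Return the label and similarity of the most similar stored case."""
    best_label: Optional[str] = None
    best_score = 0.0
    for case_lexemes, label in case_base:
        score = lexeme_similarity(lexemes, case_lexemes)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


def hybrid_classify(lexemes: List[str],
                    case_base: List[Case],
                    precise_classifier: Callable[[List[str]], Optional[str]]
                    ) -> Optional[str]:
    """Use the high-precision classifier first; fall back to CBR if it abstains.

    `precise_classifier` is a hypothetical stand-in for a tree kernel model
    that returns None when it assigns no question type.
    """
    label = precise_classifier(lexemes)
    if label is not None:
        return label
    cbr_label, _ = cbr_classify(lexemes, case_base)
    return cbr_label


# Toy case base using two question types named in the abstract.
case_base: List[Case] = [
    (["why", "does", "ice", "float"], "reason"),
    (["is", "water", "a", "liquid"], "verification"),
]


def always_abstain(lexemes: List[str]) -> Optional[str]:
    """Stand-in classifier that never assigns a type."""
    return None


query = ["why", "does", "wood", "float"]
print(cbr_classify(query, case_base))
print(hybrid_classify(query, case_base, always_abstain))
```

In this toy run the stand-in classifier abstains, so the CBR label is used; a real system would plug in a trained classifier and a lexeme segmenter suited to the language.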

DC Field	Value	Language
dc.contributor.advisor	Law, NWY	-
dc.contributor.advisor	Chan, KP	-
dc.contributor.author	Wong, On-wing.	-
dc.contributor.author	黃安穎.	-
dc.date.accessioned	2013-09-08T15:08:02Z	-
dc.date.available	2013-09-08T15:08:02Z	-
dc.date.issued	2013	-
dc.identifier.citation	Wong, O. [黃安穎]. (2013). Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5060585	-
dc.identifier.uri	http://hdl.handle.net/10722/188758	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.source.uri	http://hub.hku.hk/bib/B50605859	-
dc.subject.lcsh	Group work in education - Computer-assisted instruction - China - Hong Kong.	-
dc.title	Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5060585	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Education	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b5060585	-
dc.date.hkucongregation	2013	-
dc.identifier.mmsid	991035575019703414	-
