
postgraduate thesis: Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students

Title: Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students
Authors: Wong, On-wing (黃安穎)
Advisor(s): Law, NWY; Chan, KP
Issue Date: 2013
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wong, O. [黃安穎]. (2013). Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5060585
Abstract: This study investigates automated question detection and classification methods to support teachers in monitoring the progression of discussion in the Computer-Supported Collaborative Learning (CSCL) discourse of Hong Kong students. Questioning is an important component of CSCL. By analysing the question types in CSCL discourse, teachers may get a general idea of how an inquiry is constructed. This study attempts to take on the time-consuming task of question classification using machine learning techniques. In general, the performance of machine learning algorithms improves as the amount of training data increases, so the amount of training data is a determining factor in their performance. Machine-learning-based question classification algorithms may therefore fail to detect question types that have only a small amount of training data. To avoid missing such questions, an extra step that detects questions of all types may be needed. One Chinese dataset and one English dataset were collected from an online discussion platform. These datasets were selected to compare the performance of question detection and classification in the two languages, and the sentence is defined as the unit of analysis. Question detection is the process of distinguishing questions from other types of discourse act. A hybrid method is proposed that combines the rule-based question mark method with a machine-learning-based syntax method for question detection. This method achieves a 94.8% F1-score and 98.9% accuracy in English question detection, and a 94.8% F1-score and 93.9% accuracy in Chinese question detection. While question detection focuses on identifying questions, question classification concentrates on categorizing them. The literature shows that the tree kernel method is close to a standard method for question classification.
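The hybrid detection step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the rule-based step checks for an ASCII or full-width question mark, while `syntax_says_question` is a hypothetical keyword heuristic standing in for the trained machine-learning syntax model, and its cue lists are assumptions.

```python
import re

# Hypothetical cue lists; they stand in for the trained syntax-based model.
EN_QUESTION_START = re.compile(
    r"^(what|why|how|when|where|who|which|is|are|do|does|did|can|could|should|would|will)\b",
    re.IGNORECASE,
)
ZH_QUESTION_CUES = ("嗎", "為什麼", "什麼", "怎樣", "如何")


def has_question_mark(sentence: str) -> bool:
    """Rule-based step: an ASCII or full-width question mark signals a question."""
    return "?" in sentence or "？" in sentence


def syntax_says_question(sentence: str) -> bool:
    """Placeholder for the machine-learning syntax step (assumed keyword heuristic)."""
    s = sentence.strip()
    return bool(EN_QUESTION_START.match(s)) or any(cue in s for cue in ZH_QUESTION_CUES)


def is_question(sentence: str) -> bool:
    """Hybrid detector: trust the question-mark rule, else defer to the syntax step."""
    return has_question_mark(sentence) or syntax_says_question(sentence)


for s in ["Why does the ice melt", "Really?", "你明白嗎", "The ice melts.", "我同意。"]:
    print(s, is_question(s))
```

The question-mark rule alone would miss a question typed without punctuation, such as "Why does the ice melt"; presumably the syntax step is there to catch cases like this.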
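The detection results above are reported as F1-score and accuracy. The sketch below shows how both are computed from binary confusion counts; the counts are hypothetical and are not taken from the thesis. Because accuracy also rewards correctly rejected non-questions, it can exceed F1 when non-questions dominate the data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for a binary question detector."""
    tp: int  # questions correctly detected
    fp: int  # non-questions wrongly flagged as questions
    fn: int  # questions that were missed
    tn: int  # non-questions correctly left unflagged


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def accuracy(c: Counts) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical counts (not from the thesis): 100 questions among 1,000 sentences.
example = Counts(tp=90, fp=10, fn=10, tn=890)
print(precision(example), recall(example), f1(example), accuracy(example))
```

With these counts, precision, recall and F1 are all 0.9, while accuracy is 0.98.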
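The abstract does not say which tree kernel was used. As one common choice, the sketch below implements the Collins-Duffy subset tree kernel, which scores two parse trees by a weighted count of the tree fragments they share; in practice such a kernel is plugged into a kernel classifier such as an SVM. The tuple tree format, the toy trees and the decay value `lam=0.4` are illustrative assumptions.

```python
from functools import lru_cache

# A parse tree is a tuple (label, child, ...); a child is another tuple or a word (str).
# Assumes well-formed trees: pre-terminal nodes dominate only words, other nodes only subtrees.


def production(node: tuple) -> tuple:
    """The grammar rule at a node: its label plus the labels (or words) of its children."""
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def is_preterminal(node: tuple) -> bool:
    return all(isinstance(c, str) for c in node[1:])


def nodes(tree: tuple):
    """Yield every non-word node of the tree in pre-order."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)


def subset_tree_kernel(t1: tuple, t2: tuple, lam: float = 0.4) -> float:
    """Collins-Duffy subset tree kernel: a decay-weighted count of shared fragments."""

    @lru_cache(maxsize=None)
    def delta(n1: tuple, n2: tuple) -> float:
        if production(n1) != production(n2):
            return 0.0  # different rules share no fragment rooted here
        if is_preterminal(n1):
            return lam  # identical pre-terminal rule, e.g. (N ice)
        result = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            result *= 1.0 + delta(c1, c2)
        return result

    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))


t = ("S", ("NP", ("N", "ice")), ("VP", ("V", "melts")))
u = ("S", ("NP", ("N", "water")), ("VP", ("V", "boils")))
print(subset_tree_kernel(t, t, lam=1.0), subset_tree_kernel(t, u, lam=1.0))
```

With `lam=1.0` the kernel is a plain fragment count: `t` shares 15 fragments with itself and 6 with `u`, since the two trees have the same structure but different words.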
Using the tree kernel method, the classification of English verification questions and reason questions both attained F1-scores above 80%. When the same settings are applied to Chinese question classification, precision remains at a similar level but recall drops greatly. This result indicates that the syntax-based tree kernel method may not be appropriate for classifying Chinese questions. To improve the Chinese question classification results, Case-Based Reasoning (CBR) is introduced. CBR retrieves from a database the example case(s) sharing the highest percentage of similarity with the test case. In this study, similarity is measured by the lexemes that compose a question. Although the CBR method improves recall, it also causes a large drop in precision. Given the high precision of the tree kernel method and the wide coverage of the CBR method, a hybrid method is proposed to combine the two. The experimental results show that the F1-score of the hybrid method for multi-class classification surpasses those of the tree kernel and CBR methods. This indicates that the hybrid method can generally improve the results of Chinese question classification.
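The CBR retrieval step described above can be sketched as a nearest-case lookup. The abstract states only that similarity is measured by the lexemes composing a question; the Jaccard overlap used here, the pre-segmented lexeme lists and the example cases are assumptions for illustration.

```python
from __future__ import annotations


def lexeme_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard overlap: shared lexemes / all distinct lexemes (an assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(case_base: list[tuple[list[str], str]], query: list[str]) -> tuple[str | None, float]:
    """Return the label of the stored case most similar to the query, with its similarity."""
    best_label, best_sim = None, -1.0
    for lexemes, label in case_base:
        sim = lexeme_similarity(lexemes, query)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Hypothetical case base; word segmentation is assumed to have been done already.
CASES = [
    (["為什麼", "冰", "會", "融化"], "reason"),
    (["冰", "是", "不是", "水"], "verification"),
]
QUERY = ["為什麼", "水", "會", "結冰"]
print(retrieve(CASES, QUERY))
```

The query shares two of six distinct lexemes with the first case (similarity 1/3) and one of seven with the second, so the "reason" label is retrieved. Because retrieval always returns some nearest case, even a weak match yields a label, which is one plausible reason why CBR widens coverage at the cost of precision.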
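The abstract does not specify how the tree kernel and CBR outputs are combined. One simple scheme consistent with "high precision plus wide coverage" is a fallback: keep the tree kernel label when it makes a prediction and use CBR otherwise. The sketch assumes the tree kernel classifier can abstain (for example, when none of its per-class decisions is positive); both predictors below are hypothetical stubs.

```python
from typing import Callable, Optional


def hybrid_classify(
    question: str,
    tree_kernel_predict: Callable[[str], Optional[str]],
    cbr_predict: Callable[[str], str],
) -> str:
    """Prefer the high-precision tree kernel label; fall back to CBR when it abstains."""
    label = tree_kernel_predict(question)
    return label if label is not None else cbr_predict(question)


# Hypothetical stand-ins for the two trained components.
TREE_KERNEL_LABELS = {"冰是不是水？": "verification"}  # questions the stub is confident about


def stub_tree_kernel(question: str) -> Optional[str]:
    return TREE_KERNEL_LABELS.get(question)  # None means "no confident prediction"


def stub_cbr(question: str) -> str:
    return "reason"  # CBR always returns the label of some nearest case


print(hybrid_classify("冰是不是水？", stub_tree_kernel, stub_cbr))
print(hybrid_classify("為什麼水會結冰？", stub_tree_kernel, stub_cbr))
```

Under this scheme, precision on the questions the tree kernel accepts is preserved, and recall on the remaining questions comes from CBR.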
Degree: Master of Philosophy
Subject: Group work in education - Computer-assisted instruction - China - Hong Kong.
Dept/Program: Education
Persistent Identifier: http://hdl.handle.net/10722/188758
HKU Library Item ID: b5060585

 

DC Field | Value | Language
dc.contributor.advisor | Law, NWY | -
dc.contributor.advisor | Chan, KP | -
dc.contributor.author | Wong, On-wing. | -
dc.contributor.author | 黃安穎. | -
dc.date.accessioned | 2013-09-08T15:08:02Z | -
dc.date.available | 2013-09-08T15:08:02Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Wong, O. [黃安穎]. (2013). Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5060585 | -
dc.identifier.uri | http://hdl.handle.net/10722/188758 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.source.uri | http://hub.hku.hk/bib/B50605859 | -
dc.subject.lcsh | Group work in education - Computer-assisted instruction - China - Hong Kong. | -
dc.title | Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5060585 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Education | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5060585 | -
dc.date.hkucongregation | 2013 | -
dc.identifier.mmsid | 991035575019703414 | -
