Conference Paper: Combining quantitative and qualitative measures to validate a group speaking assessment test

Title: Combining quantitative and qualitative measures to validate a group speaking assessment test
Authors: Crosthwaite, PR; Boynton, SD; Cole III, SF
Issue Date: 2016
Citation: The 2nd International Conference on Linguistics and Language Studies (ICLLS 2016), Hong Kong, 23-24 June 2016.
Abstract: All too often, the shift from norm-referenced to criterion-referenced assessment results in tests that reflect holistic (and teacher-biased) expectations of student performance without actually determining whether the criteria meet sufficient standards of validity and reliability, resulting in considerable inter-rater variance. This is particularly felt in L2 contexts, where students struggle to achieve grades under criteria that are not produced in the same language (or spirit) as their L1, and where teachers from different backgrounds may interpret individual criteria according to their own ideologies and beliefs. However, most validation studies focus only on quantitative matters (losing the personal, holistic focus) or only on qualitative concerns (missing the general picture). The present study validates a criterion-referenced group tutorial discussion speaking assessment for undergraduate EAP at a leading university in Hong Kong in terms of inter-rater variance and criterion validity. I present three complementary quantitative measures (the Intraclass Correlation Coefficient, Cronbach's Alpha and Exploratory Factor Analysis) which suggest that a number of criteria could safely be removed from the rubric, and show that the grading of some criteria frequently overlaps with that of others. However, qualitative interviews with test raters, conducted to explain the reasons behind the statistical results, suggest that 1) raters bring with them their own interpretations of the rubric criteria and 2) there are specific linguistic considerations regarding individual test takers' performance and rater variance on that performance. The results have implications for improving the validity and reliability of in-house criterion-referenced assessment rubrics, and the paper is intended to serve as a 'how-to' for language assessment practitioners.
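The three quantitative measures named in the abstract are standard reliability and dimensionality statistics. The sketch below is illustrative only: the score matrices are invented, the formulas are the standard textbook (Shrout & Fleiss) versions, and none of it is the authors' actual analysis code. It computes Cronbach's alpha and a two-way random-effects ICC(2,1) from a subjects-by-raters matrix, then runs a small exploratory factor analysis with scikit-learn.

```python
# Illustrative sketch only: invented data, textbook formulas;
# not the paper's own analysis code.
import numpy as np
from sklearn.decomposition import FactorAnalysis


def cronbach_alpha(x: np.ndarray) -> float:
    """Cronbach's alpha for an (n_subjects, k_raters) score matrix."""
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()   # sum of per-rater variances
    total_var = x.sum(axis=1).var(ddof=1)    # variance of subject totals
    return (k / (k - 1)) * (1 - item_var / total_var)


def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters
    sst = ((x - grand) ** 2).sum()
    mse = (sst - msr * (n - 1) - msc * (k - 1)) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: 6 test takers rated by 3 raters on one criterion.
scores = np.array([
    [7, 6, 7],
    [5, 5, 4],
    [8, 7, 8],
    [4, 5, 4],
    [6, 6, 5],
    [9, 8, 9],
], dtype=float)

print(f"Cronbach's alpha: {cronbach_alpha(scores):.3f}")
print(f"ICC(2,1):         {icc_2_1(scores):.3f}")

# Hypothetical grades (6 test takers x 4 rubric criteria). Criteria whose
# loadings pile onto a single factor would suggest redundant rubric items.
criteria = np.array([
    [7, 6, 5, 7],
    [5, 5, 4, 5],
    [8, 7, 7, 8],
    [4, 4, 5, 4],
    [6, 6, 6, 5],
    [9, 8, 8, 9],
], dtype=float)
fa = FactorAnalysis(n_components=2, random_state=0).fit(criteria)
print("Factor loadings (criteria x factors):\n", fa.components_.T.round(2))
```

Packages such as pingouin and factor_analyzer offer off-the-shelf versions of these statistics; the manual formulas are written out here only to keep the sketch self-contained.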
Persistent Identifier: http://hdl.handle.net/10722/227761

DC Field: Value

dc.contributor.author: Crosthwaite, PR
dc.contributor.author: Boynton, SD
dc.contributor.author: Cole III, SF
dc.date.accessioned: 2016-07-18T09:12:40Z
dc.date.available: 2016-07-18T09:12:40Z
dc.date.issued: 2016
dc.identifier.citation: The 2nd International Conference on Linguistics and Language Studies (ICLLS 2016), Hong Kong, 23-24 June 2016.
dc.identifier.uri: http://hdl.handle.net/10722/227761
dc.description.abstract: All too often, the shift from norm-referenced to criterion-referenced assessment results in tests that reflect holistic (and teacher-biased) expectations of student performance without actually determining whether the criteria meet sufficient standards of validity and reliability, resulting in considerable inter-rater variance. This is particularly felt in L2 contexts, where students struggle to achieve grades under criteria that are not produced in the same language (or spirit) as their L1, and where teachers from different backgrounds may interpret individual criteria according to their own ideologies and beliefs. However, most validation studies focus only on quantitative matters (losing the personal, holistic focus) or only on qualitative concerns (missing the general picture). The present study validates a criterion-referenced group tutorial discussion speaking assessment for undergraduate EAP at a leading university in Hong Kong in terms of inter-rater variance and criterion validity. I present three complementary quantitative measures (the Intraclass Correlation Coefficient, Cronbach's Alpha and Exploratory Factor Analysis) which suggest that a number of criteria could safely be removed from the rubric, and show that the grading of some criteria frequently overlaps with that of others. However, qualitative interviews with test raters, conducted to explain the reasons behind the statistical results, suggest that 1) raters bring with them their own interpretations of the rubric criteria and 2) there are specific linguistic considerations regarding individual test takers' performance and rater variance on that performance. The results have implications for improving the validity and reliability of in-house criterion-referenced assessment rubrics, and the paper is intended to serve as a 'how-to' for language assessment practitioners.
dc.language: eng
dc.relation.ispartof: International Conference on Linguistics and Language Studies, ICLLS 2016
dc.title: Combining quantitative and qualitative measures to validate a group speaking assessment test
dc.type: Conference_Paper
dc.identifier.email: Crosthwaite, PR: drprc80@hku.hk
dc.identifier.email: Boynton, SD: sboynton@hku.hk
dc.identifier.email: Cole III, SF: samcole@hku.hk
dc.identifier.authority: Crosthwaite, PR=rp01961
dc.identifier.hkuros: 258879
