Links for fulltext (May Require Subscription):
- Publisher Website: https://doi.org/10.1145/3706468.3706530
- Scopus: eid_2-s2.0-105000348823
Citations:
- Scopus: 0
Appears in Collections: Conference Paper

Conference Paper: Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT

| Title | Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT |
|---|---|
| Authors | Danielle R. Thomas; Conrad Borchers; Sanjit Kakarla; Jionghao Lin; Shambhavi Bhushan; Boyuan Guo; Erin Gatz; Kenneth R. Koedinger |
| Keywords | AI-assisted tutoring; Assessment; Generative AI; Human-AI tutoring; Tutoring |
| Issue Date | 3-Mar-2025 |
| Abstract | The role of multiple-choice questions (MCQs) as effective learning tools has been debated in past research. While MCQs are widely used because they are easy to grade, open-response questions are increasingly used for instruction, given advances in large language models (LLMs) for automated grading. This study evaluates the effectiveness of MCQs relative to open-response questions, both individually and in combination, on learning. These activities are embedded within six tutor lessons on advocacy. Using a posttest-only randomized controlled design, we compare the performance of 234 tutors (790 lesson completions) across three conditions: MCQ only, open response only, and a combination of both. We find no significant learning differences across conditions at posttest, but tutors in the MCQ condition took significantly less time to complete instruction. These findings suggest that MCQs are as effective as, and more efficient than, open-response tasks for learning when practice time is limited. To further enhance efficiency, we autograded open responses using GPT-4o and GPT-4-turbo. GPT models demonstrate proficiency for purposes of low-stakes assessment, though further research is needed for broader use. This study contributes a dataset of lesson log data, human annotation rubrics, and LLM prompts to promote transparency and reproducibility. |
| Persistent Identifier | http://hdl.handle.net/10722/358757 |
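
The abstract reports autograding open responses with GPT-4o and GPT-4-turbo against human annotation rubrics. Below is a minimal sketch of what such rubric-based autograding can look like with the OpenAI chat API; the rubric text, the 0/1 scale, and the `autograde` helper are illustrative assumptions, not the prompts or rubrics released with the paper.

```python
# Sketch of rubric-based autograding of one open response via the OpenAI
# chat API. RUBRIC is a placeholder criterion, not the paper's released prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You grade a tutor's open response to a lesson scenario about advocating "
    "for a student. Reply with a single digit: 1 if the response meets the "
    "rubric criterion (recommends a concrete advocacy action), else 0."
)

def autograde(open_response: str, model: str = "gpt-4o") -> str:
    """Return the model's rubric score for one open response."""
    completion = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variation in grading
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": open_response},
        ],
    )
    return completion.choices[0].message.content.strip()

# Grade the same response with both models named in the abstract.
for m in ("gpt-4o", "gpt-4-turbo"):
    print(m, autograde("I would meet with the teacher and share specific "
                       "examples of the student's progress.", m))
```

Scores from a setup like this would still need validation against human raters, which is what the paper's released rubrics and lesson log data support.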

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Thomas, Danielle R. | - |
| dc.contributor.author | Borchers, Conrad | - |
| dc.contributor.author | Kakarla, Sanjit | - |
| dc.contributor.author | Lin, Jionghao | - |
| dc.contributor.author | Bhushan, Shambhavi | - |
| dc.contributor.author | Guo, Boyuan | - |
| dc.contributor.author | Gatz, Erin | - |
| dc.contributor.author | Koedinger, Kenneth R. | - |
| dc.date.accessioned | 2025-08-13T07:47:50Z | - |
| dc.date.available | 2025-08-13T07:47:50Z | - |
| dc.date.issued | 2025-03-03 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/358757 | - |
| dc.description.abstract | The role of multiple-choice questions (MCQs) as effective learning tools has been debated in past research. While MCQs are widely used because they are easy to grade, open-response questions are increasingly used for instruction, given advances in large language models (LLMs) for automated grading. This study evaluates the effectiveness of MCQs relative to open-response questions, both individually and in combination, on learning. These activities are embedded within six tutor lessons on advocacy. Using a posttest-only randomized controlled design, we compare the performance of 234 tutors (790 lesson completions) across three conditions: MCQ only, open response only, and a combination of both. We find no significant learning differences across conditions at posttest, but tutors in the MCQ condition took significantly less time to complete instruction. These findings suggest that MCQs are as effective as, and more efficient than, open-response tasks for learning when practice time is limited. To further enhance efficiency, we autograded open responses using GPT-4o and GPT-4-turbo. GPT models demonstrate proficiency for purposes of low-stakes assessment, though further research is needed for broader use. This study contributes a dataset of lesson log data, human annotation rubrics, and LLM prompts to promote transparency and reproducibility. | - |
| dc.language | eng | - |
| dc.relation.ispartof | 15th International Learning Analytics and Knowledge Conference (LAK 2025), 3-7 March 2025, Dublin | - |
| dc.subject | AI-assisted tutoring | - |
| dc.subject | Assessment | - |
| dc.subject | Generative AI | - |
| dc.subject | Human-AI tutoring | - |
| dc.subject | Tutoring | - |
| dc.title | Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | published_or_final_version | - |
| dc.identifier.doi | 10.1145/3706468.3706530 | - |
| dc.identifier.scopus | eid_2-s2.0-105000348823 | - |
| dc.identifier.spage | 494 | - |
| dc.identifier.epage | 504 | - |
