Article: Comparing Traditional and IRT Scoring of Forced-Choice Tests

Title: Comparing Traditional and IRT Scoring of Forced-Choice Tests
Authors: Hontangas, Pedro M.; de la Torre, Jimmy; Ponsoda, Vicente; Leenen, Iwin; Morillo, Daniel; Abad, Francisco J.
Keywords: EAP
Issue Date: 2015
Citation: Applied Psychological Measurement, 2015, v. 39, n. 8, p. 598-612
Abstract: © The Author(s) 2015. This article explores how traditional scores obtained from different forced-choice (FC) formats relate to their true scores and item response theory (IRT) estimates. Three FC formats are considered: given a block of items, respondents are asked to (a) pick the item that describes them most (PICK), (b) choose the two items that describe them the most and the least (MOLE), or (c) rank all the items in order of how well they describe them (RANK). The multi-unidimensional pairwise-preference (MUPP) model, extended to more than two items per block and to the different FC formats, is applied to generate the responses to each item block. Traditional and IRT (i.e., expected a posteriori, EAP) scores are computed from each data set and compared. The aim is to clarify the conditions under which simpler traditional scoring procedures for FC formats may be used in place of the more appropriate IRT estimates for the purpose of inter-individual comparisons. Six independent variables are considered: response format, number of items per block, correlation between the dimensions, item discrimination level, and the sign-heterogeneity and variability of the item difficulty parameters. Results show that the RANK response format outperforms the other formats for both the IRT estimates and the traditional scores, although it is only slightly better than the MOLE format. The highest correlations between true and traditional scores are found when the test has a large number of blocks, the dimensions assessed are independent, items have high discrimination and highly dispersed location parameters, and the test contains blocks formed by both positive and negative items.
Persistent Identifier: http://hdl.handle.net/10722/228232
ISSN: 0146-6216
2015 Impact Factor: 1.0
2015 SCImago Journal Rankings: 1.721
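The abstract contrasts three response formats (PICK, MOLE, RANK) and their traditional sum scores. The following sketch illustrates one plausible way a block response could map to per-dimension traditional scores under each format; the point schemes, item keying, and dimension names are illustrative assumptions, not taken from the article itself.

```python
# A minimal sketch, assuming conventional point schemes for the three
# forced-choice formats named in the abstract (not the authors' code).

def score_block(items, response, fmt):
    """Traditional (sum-score) scoring of one forced-choice block.

    items    : list of (dimension, keyed_positive) pairs, one per item
    response : PICK -> index of the single chosen item
               MOLE -> (most_index, least_index) pair
               RANK -> list of item indices, most descriptive first
    """
    n = len(items)
    points = [0] * n
    if fmt == "PICK":                        # pick the most descriptive item
        points[response] = 1
    elif fmt == "MOLE":                      # mark most and least descriptive
        most, least = response
        points = [1] * n                     # unmarked items get a middle score
        points[most], points[least] = 2, 0
    elif fmt == "RANK":                      # full ranking of the block
        for rank, idx in enumerate(response):
            points[idx] = n - 1 - rank       # top-ranked item earns most points
    else:
        raise ValueError(f"unknown format: {fmt}")

    max_pt = {"PICK": 1, "MOLE": 2, "RANK": n - 1}[fmt]
    scores = {}
    for (dim, positive), p in zip(items, points):
        # negatively keyed items contribute reversed points
        scores[dim] = scores.get(dim, 0) + (p if positive else max_pt - p)
    return scores

# Example: a three-item block mixing positively and negatively keyed items,
# ranked from most to least descriptive
block = [("Extraversion", True), ("Conscientiousness", True), ("Extraversion", False)]
print(score_block(block, [0, 1, 2], "RANK"))  # {'Extraversion': 4, 'Conscientiousness': 1}
```

Summing such block scores across a test yields the traditional scores that the article compares against the MUPP-based EAP estimates.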


DC Field | Value | Language
dc.contributor.author | Hontangas, Pedro M. | -
dc.contributor.author | de la Torre, Jimmy | -
dc.contributor.author | Ponsoda, Vicente | -
dc.contributor.author | Leenen, Iwin | -
dc.contributor.author | Morillo, Daniel | -
dc.contributor.author | Abad, Francisco J. | -
dc.date.accessioned | 2016-08-01T06:45:31Z | -
dc.date.available | 2016-08-01T06:45:31Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Applied Psychological Measurement, 2015, v. 39, n. 8, p. 598-612 | -
dc.identifier.issn | 0146-6216 | -
dc.identifier.uri | http://hdl.handle.net/10722/228232 | -
dc.language | eng | -
dc.relation.ispartof | Applied Psychological Measurement | -
dc.subject | EAP | -
dc.title | Comparing Traditional and IRT Scoring of Forced-Choice Tests | -
dc.type | Article | -
dc.description.nature | Link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1177/0146621615585851 | -
dc.identifier.scopus | eid_2-s2.0-84943572085 | -
dc.identifier.volume | 39 | -
dc.identifier.issue | 8 | -
dc.identifier.spage | 598 | -
dc.identifier.epage | 612 | -
dc.identifier.eissn | 1552-3497 | -
