IRT-based automated test assembly: a sampling and stratification perspective

Date

2005

Abstract

Most large-scale assessments require several linear forms of the assessment to be constructed each year. Using automated test assembly procedures to construct many parallel test forms greatly reduces the workload for test developers and ensures the quality of the tests. Existing automated test assembly methods include the heuristic approach, linear programming, network flow, and optimal design, all of which fall under the category of constrained combinatorial optimization (van der Linden, 1998). The purpose of this study was to establish a new IRT-based automated test assembly method that samples test items from a specially stratified item bank so that the distribution of the items’ parameter values mimics that of the target test. Three such methods were introduced and developed in this study: the Cell Only Method, the Cell and Linear Programming Method (Cell & LP Method), and the Cell & Cube Method. Each of these methods was then compared to the baseline, the Minimax Model of the Linear Programming Method (LP Method). Six test forms of 40 items were assembled using each test assembly method. For the simulated item pool, a constraint of no more than a 20% test overlap rate was added to both the linear programming component of the Cell & LP Method and the LP Method. Performance evaluation criteria included mean square deviation (MSD), form-to-form overlap rate, and the test information function. For tests assembled from the real item pool, the Cell Only Method proved superior to the other methods in hitting target test information curves and yielding lower MSDs. The Cell & Cube Method yielded the smallest test overlap rates. For tests constructed from the simulated 3000-item pool, the Cell & LP Method yielded the smallest MSDs. All three new methods yielded smaller mean square deviations than the Minimax Model of the Linear Programming Method. Even when a test overlap rate constraint was added to the Minimax Model of the Linear Programming Method, its average test overlap rate was still higher than those of the three new methods. Overall, the Cell & Cube Method was recommended for its simplicity and item pool use.
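
The general idea of sampling from a stratified item bank, together with the three evaluation criteria named above, can be illustrated with a minimal Python sketch. It is not the study's implementation. The following are assumptions made only for illustration: a three-parameter logistic (3PL) item model with scaling constant D = 1.7, cells defined by binning the discrimination (a) and difficulty (b) parameters at a hypothetical width of 0.5, MSD taken as the mean squared difference between the information curves of a form and the target over a grid of ability points, and overlap taken as the share of items two forms have in common. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict

D = 1.7  # assumed scaling constant for the logistic 3PL model


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item (a, b, c) at ability theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def info_curve(items, thetas):
    """Test information function: item information summed at each theta."""
    return [sum(item_information(t, *it) for it in items) for t in thetas]


def mean_square_deviation(form, target, thetas):
    """Assumed MSD: mean squared gap between form and target information."""
    f, g = info_curve(form, thetas), info_curve(target, thetas)
    return sum((x - y) ** 2 for x, y in zip(f, g)) / len(thetas)


def overlap_rate(ids_a, ids_b):
    """Assumed overlap: share of items that two equal-length forms share."""
    return len(set(ids_a) & set(ids_b)) / len(ids_a)


def cell_of(item, a_width=0.5, b_width=0.5):
    """Hypothetical stratification: bin an item by its a and b values."""
    a, b, _c = item
    return (math.floor(a / a_width), math.floor(b / b_width))


def sample_form(pool, target, rng):
    """Per cell, draw as many pool items as the target test has there.

    Returns indices into pool. If a cell holds too few items, the form
    comes up short; this sketch does not repair that case.
    """
    by_cell = defaultdict(list)
    for idx, item in enumerate(pool):
        by_cell[cell_of(item)].append(idx)
    needed = defaultdict(int)
    for item in target:
        needed[cell_of(item)] += 1
    chosen = []
    for cell, k in needed.items():
        candidates = by_cell.get(cell, [])
        chosen.extend(rng.sample(candidates, min(k, len(candidates))))
    return chosen
```

Calling `sample_form(pool, target, random.Random(seed))` with different seeds would produce several candidate forms, which could then be compared using `mean_square_deviation` and `overlap_rate`. How the study actually defined its cells and cubes and handled sparse cells is not stated in the abstract.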