Construction, Analysis and Calibration of Multiple-Choice Questions: IRT versus CTT


  • Iqra Batool, Department of Education, University of Sargodha, Punjab, Pakistan.
  • Ashfaque Ahmad SHAH, Department of Educational Development, University of Baltistan, Gilgit Baltistan, Skardu, Pakistan.
  • Sehrish Naseer, Department of Education, University of Sargodha, Punjab, Pakistan.


Construction, Calibration, Classical Test Theory (CTT), Item Response Theory (IRT), Item difficulty, Item discrimination


The current study examined the construction, analysis and calibration of multiple-choice questions. This quantitative study employed developmental and descriptive research methods. A convenience sample of 200 students was selected from the University of Sargodha. The researchers developed a test of multiple-choice items at the master’s level for the “Methods of Teaching” course and used it as the instrument for collecting data from the respondents. The data were analyzed with Iteman and X-Calibre, assessment-management applications for item analysis that were considered suitable for this purpose. Results showed that the test was fairly difficult and had a modest item discrimination index. Students’ raw scores ranged from 7 to 49 marks. On the basis of the item difficulty index, CTT proposed rejecting seven items, whereas IRT removed six. CTT also proposed rejecting 18 items because of their low ability to differentiate between high and low achievers. Six items were flagged with K. Under the S-pbis in CTT, 18 items were rejected, whereas according to the IRT parameter ‘b’, six items were rejected. The results of the current study indicate that using IRT for item analysis may be useful in determining course grades and the number of students passing the cut-score. It is recommended that, before applying IRT, researchers verify that the test items are locally independent and unidimensional and that the item characteristic curves (ICCs) fit the model.
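The abstract compares CTT statistics (item difficulty, high/low-group discrimination, point-biserial) with an IRT item characteristic curve. The sketch below illustrates these standard quantities on 0/1-scored responses. It is not the authors’ procedure and does not reproduce Iteman or X-Calibre output. The function names, the 27% upper/lower grouping, the three-parameter logistic (3PL) form with scaling constant 1.7, and the flag cut-offs (p outside 0.30 to 0.90, D below 0.20) are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev

def difficulty_index(item_scores):
    """CTT difficulty (p-value): proportion of examinees answering the item correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower index D = p(upper group) - p(lower group).
    Groups are the top and bottom `fraction` of examinees ranked by total score
    (27% is a common convention, assumed here)."""
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    return sum(upper) / k - sum(lower) / k

def point_biserial(item_scores, total_scores):
    """Uncorrected point-biserial between a 0/1 item and the total score
    (the item is included in the total; analysis software may report a corrected version)."""
    p = mean(item_scores)
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        return 0.0  # undefined when everyone answers alike or totals do not vary
    m1 = mean(t for x, t in zip(item_scores, total_scores) if x == 1)
    m0 = mean(t for x, t in zip(item_scores, total_scores) if x == 0)
    return (m1 - m0) / sd * math.sqrt(p * (1 - p))

def icc_3pl(theta, a, b, c, scale=1.7):
    """3PL item characteristic curve: probability of a correct response at ability theta,
    with discrimination a, difficulty b and pseudo-guessing c (assumed model form)."""
    return c + (1 - c) / (1 + math.exp(-scale * a * (theta - b)))

def flag_item(p, d, p_range=(0.30, 0.90), d_min=0.20):
    """Hypothetical screening rule; the cut-offs are illustrative, not the study's."""
    flags = []
    if p < p_range[0]:
        flags.append("too difficult")
    elif p > p_range[1]:
        flags.append("too easy")
    if d < d_min:
        flags.append("low discrimination")
    return flags
```

For example, with totals `[10, 20, 30, 40, 50, 60]` and item scores `[0, 0, 1, 0, 1, 1]`, the item has p = 0.5, D = 1.0 and a point-biserial of about 0.68, so `flag_item` returns no flags. At `theta == b`, the 3PL curve equals c + (1 − c)/2, which is the property that makes b the difficulty location on the ability scale.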


