Construction, Analysis and Calibration of Multiple-Choice Questions: IRT versus CTT


  • Iqra Batool, Department of Education, University of Sargodha, Punjab, Pakistan
  • Ashfaque Ahmad Shah, Department of Educational Development, University of Baltistan, Skardu, Gilgit-Baltistan, Pakistan
  • Sehrish Naseer, Department of Education, University of Sargodha, Punjab, Pakistan


Keywords: Construction, Calibration, Classical Test Theory (CTT), Item Response Theory (IRT), Item difficulty, Item discrimination


The current study examined the construction, analysis, and calibration of multiple-choice questions. This quantitative study employed developmental and descriptive research methods. A convenience sampling technique was used to select a sample of 200 students from the University of Sargodha. The researchers developed a master's-level multiple-choice test for the “Methods of Teaching” course, which served as the instrument for data collection. Iteman and Xcalibre, assessment-management applications suited to item analysis, were used to analyze the data. Results showed that the test was fairly difficult and had a modest item discrimination index. Students' raw scores ranged from 7 to 49 marks. On the basis of the item difficulty index, CTT proposed rejecting seven items, whereas IRT removed six. CTT proposed rejecting 18 items for their low ability to differentiate between high and low achievers, and six items were flagged with ‘K’. Under the S-pbis statistic in CTT, 18 items were rejected, and according to IRT's ‘b’ parameter, six items were rejected. The results of the current study established that using IRT for item analysis may be useful in determining course grades and the number of students passing the cut-score. It was recommended that, before applying IRT, one verify that the test items are locally independent and unidimensional and that the item characteristic curves (ICCs) fit the model.
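The statistics the abstract refers to (CTT item difficulty as the proportion correct, the corrected point-biserial used to judge discrimination, and a logit analogue of IRT's ‘b’ difficulty parameter) can be sketched in a few lines. The response matrix below is invented purely for illustration; it is not the study's data, and the study itself used Iteman and Xcalibre rather than hand-rolled code.

```python
# Minimal sketch of classical item analysis on an invented 0/1 response matrix.
# This is NOT the Iteman/Xcalibre procedure from the study; it only illustrates
# the statistics the abstract mentions.
import math

# rows = examinees, columns = items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
n_people, n_items = len(responses), len(responses[0])
totals = [sum(row) for row in responses]  # raw score per examinee

def mean(xs):
    return sum(xs) / len(xs)

def pstdev(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

for j in range(n_items):
    item = [row[j] for row in responses]
    p = mean(item)  # CTT difficulty: proportion of examinees answering correctly
    rest = [totals[i] - item[i] for i in range(n_people)]  # total minus the item itself
    dx, dy = pstdev(item), pstdev(rest)
    cov = mean([(item[i] - p) * (rest[i] - mean(rest)) for i in range(n_people)])
    r_pbis = cov / (dx * dy) if dx and dy else float("nan")  # corrected point-biserial
    # Crude logit difficulty: grows as fewer examinees answer correctly, a rough
    # analogue of the IRT 'b' parameter (not an actual IRT model estimate).
    b = math.log((1 - p) / p) if 0 < p < 1 else float("nan")
    print(f"item {j + 1}: p = {p:.2f}, corrected r_pbis = {r_pbis:+.2f}, logit b = {b:+.2f}")
```

Commonly cited rules of thumb in the CTT tradition would flag items with p outside roughly 0.2–0.8 or with a corrected point-biserial below about 0.2, which mirrors the kind of item-rejection decisions reported in the abstract.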





How to Cite

Iqra Batool, Ashfaque Ahmad SHAH, & Sehrish Naseer. (2023). Construction, Analysis and Calibration of Multiple-Choice Questions: IRT versus CTT. Archives of Educational Studies (ARES), 3(2), 242–257.