An investigation into the effect of different missing data imputation methods on IRT-based differential item functioning


Ünal F., Koğar H.

INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, vol.11, no.3, pp.445-462, 2024 (ESCI)

  • Publication Type: Article
  • Volume: 11 Issue: 3
  • Publication Date: 2024
  • Doi Number: 10.21449/ijate.1417166
  • Journal Name: INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Central & Eastern European Academic Source (CEEAS), ERIC (Education Resources Information Center), TR DİZİN (ULAKBİM)
  • Page Numbers: pp.445-462
  • Akdeniz University Affiliated: Yes

Abstract

The purpose of this study is to examine the effect of three missing data imputation methods, namely regression imputation (RI), multiple imputation (MI) and k-nearest neighbor (kNN), on differential item functioning (DIF). The datasets used in the research were created by deleting some of the data, via the missing completely at random (MCAR) mechanism, from the complete datasets obtained from 600 students in Türkiye, the United Kingdom, the USA, New Zealand and Australia who answered booklets 14 and 15 of the PISA 2018 science literacy test. The missing data were then imputed using the RI, MI and kNN methods, and DIF analysis was performed on all datasets in terms of the language and gender variables via Lord’s χ² method, Raju’s area measurement method and the item response theory likelihood ratio method. DIF results from the complete datasets were taken as the reference and compared with the results from the other datasets. In the RI method, accurate imputation was achieved at values close to 10% for the language and gender variables. In the MI and kNN methods, the results closest to those of the complete datasets were obtained at a rate of 5% for the language variable. In the MI method, inaccurate results were obtained at all proportions for the gender variable. For the gender variable, the kNN method gave accurate results at rates of 5% and 10%.
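
The first two steps described in the abstract, deleting responses under MCAR and filling them back in with kNN, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the software used, and the dichotomous (0/1) scoring, the simulated 100 x 10 response matrix, the Hamming distance, `k=5` and the function names `delete_mcar` and `knn_impute` are all assumptions made here. The DIF analysis itself is not covered.

```python
import random


def delete_mcar(responses, rate, seed=0):
    """Return a copy of `responses` with a fraction `rate` of cells set to None.

    Every cell has the same chance of being deleted, regardless of its value
    or of any other variable, which is what MCAR means.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(responses)) for j in range(len(responses[0]))]
    masked = [row[:] for row in responses]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def knn_impute(masked, k=5):
    """Fill each missing 0/1 cell by majority vote of the k nearest respondents.

    Distance is the share of disagreeing answers on items both respondents
    answered (an assumed choice). Ties in the vote go to 1; a cell with no
    usable neighbour is filled with 0.
    """
    imputed = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(masked):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = sum(a != b for a, b in shared) / len(shared)
                    candidates.append((dist, m))
            candidates.sort()
            votes = [masked[m][j] for _, m in candidates[:k]]
            imputed[i][j] = 1 if votes and 2 * sum(votes) >= len(votes) else 0
    return imputed


# Simulated complete dataset (hypothetical; stands in for the PISA responses).
rng = random.Random(42)
complete = [[rng.randint(0, 1) for _ in range(10)] for _ in range(100)]

for rate in (0.05, 0.10):
    masked = delete_mcar(complete, rate, seed=1)
    imputed = knn_impute(masked, k=5)
    deleted = [(i, j) for i in range(100) for j in range(10) if masked[i][j] is None]
    recovered = sum(imputed[i][j] == complete[i][j] for i, j in deleted)
    print(f"rate={rate:.0%}: {recovered}/{len(deleted)} deleted cells recovered")
```

Because the simulated responses are independent coin flips, the recovery counts printed here say nothing about the study's findings; the sketch only shows the mechanics of comparing an imputed dataset against its complete reference.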
