QUICK REVIEW

[Paper Review] Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory

Chun-Kit Yeung|arXiv (Cornell University)|Apr 26, 2019
Intelligent Tutoring Systems and Adaptive Learning | 19 references | 75 citations
TL;DR

Deep-IRT combines DKVMN with item response theory to retain predictive power while providing interpretable student ability and item difficulty estimates over time.

ABSTRACT

Deep learning based knowledge tracing model has been shown to outperform traditional knowledge tracing model without the need for human-engineered features, yet its parameters and representations have long been criticized for not being explainable. In this paper, we propose Deep-IRT which is a synthesis of the item response theory (IRT) model and a knowledge tracing model that is based on the deep neural network architecture called dynamic key-value memory network (DKVMN) to make deep learning based knowledge tracing explainable. Specifically, we use the DKVMN model to process the student's learning trajectory and estimate the student ability level and the item difficulty level over time. Then, we use the IRT model to estimate the probability that a student will answer an item correctly using the estimated student ability and the item difficulty. Experiments show that the Deep-IRT model retains the performance of the DKVMN model, while it provides a direct psychological interpretation of both students and items.

Motivation & Objective

  • Motivate the need for interpretable deep learning-based knowledge tracing models.
  • Propose a synthesis of DKVMN with an IRT-based probability model to yield explainable parameters.
  • Demonstrate that Deep-IRT preserves DKVMN performance while offering psychological interpretability.
  • Analyze the learned student ability and item difficulty against traditional measures.
  • Show potential broader applicability of combining deep learning with psychometric models.

Proposed method

  • Use DKVMN to process learning trajectories and extract latent knowledge component (KC) and knowledge state representations.
  • Augment DKVMN with a student ability network and a KC difficulty network that output theta_tj (the student's ability on KC j at time t) and beta_j (the difficulty of KC j).
  • Apply a one-parameter IRT probability function p_t = sigma(3.0 * theta_tj - beta_j) to predict correctness.
  • Train with cross-entropy loss using Adam optimization and standard deep learning practices (embedding matrices, memory matrices).
  • Compare Deep-IRT against DKVMN, DKT, and PFA across multiple public and proprietary datasets.
  • Provide interpretability by mapping deep features to psychometric parameters (ability and difficulty) over time.
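
The prediction and training steps above can be illustrated with a minimal sketch. It assumes theta and beta are already available as plain floats (in the paper they come from the ability and difficulty networks, which are not reproduced here). The function names `irt_probability` and `cross_entropy` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def irt_probability(theta, beta, scale=3.0):
    """Probability of a correct answer: sigma(scale * theta - beta).

    theta: estimated student ability on the KC (assumed given).
    beta: estimated KC difficulty (assumed given).
    scale: the 3.0 factor from the formula quoted in the review.
    """
    z = scale * theta - beta
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predictions and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical (theta, beta) pairs for three interactions and their outcomes.
pairs = [(0.4, 0.2), (0.1, 0.9), (-0.3, 0.5)]
labels = [1, 0, 0]
probs = [irt_probability(t, b) for t, b in pairs]
loss = cross_entropy(probs, labels)
```

With ability held fixed, a larger beta lowers the predicted probability, which is what makes the two quantities readable as ability and difficulty.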

Experimental results

Research questions

  • RQ1: Does the Deep-IRT model retain the predictive performance of the DKVMN model while providing interpretable theta (ability) and beta (difficulty) at the item level?
  • RQ2: How do the estimated item difficulties and student abilities from Deep-IRT compare with traditional IRT/item analysis measures and other KT baselines?
  • RQ3: Can the combination of deep learning and IRT offer explainability without compromising key KT metrics (AUC, accuracy, loss) across diverse datasets?
  • RQ4: What are the implications of the learned difficulty trajectories for knowledge components across learning trajectories?

Key findings

Dataset       Model      AUC     Acc     Loss
ASSIST2009    PFA        59.68   69.24    7.08
ASSIST2009    DKT        81.56   77.17    5.26
ASSIST2009    DKVMN      81.61   77.01    5.29
ASSIST2009    Deep-IRT   81.65   77.00    5.30
ASSIST2015    PFA        52.85   73.37    6.13
ASSIST2015    DKT        72.85   75.29    5.69
ASSIST2015    DKVMN      72.94   75.18    5.71
ASSIST2015    Deep-IRT   72.88   75.14    5.72
Statics2011   PFA        64.99   79.85    4.64
Statics2011   DKT        82.71   81.37    4.29
Statics2011   DKVMN      83.17   81.57    4.24
Statics2011   Deep-IRT   83.09   81.56    4.24
Synthetic     PFA        61.68   65.20    8.01
Synthetic     DKT        81.65   74.84    5.79
Synthetic     DKVMN      82.97   75.58    5.62
Synthetic     Deep-IRT   82.98   75.61    5.61
FSAI-F1toF3   PFA        54.52   54.57   10.46
FSAI-F1toF3   DKT        69.42   64.11    8.26
FSAI-F1toF3   DKVMN      68.40   63.40    8.42
FSAI-F1toF3   Deep-IRT   68.69   63.43    8.42

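  • Deep-IRT achieves predictive performance comparable to DKVMN on every dataset: its AUC stays within 0.3 points and its accuracy within 0.05 points of DKVMN, slightly higher on some datasets (e.g. Synthetic, FSAI-F1toF3) and slightly lower on others (e.g. ASSIST2015, Statics2011).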
  • Deep-IRT provides interpretable estimates of student ability and KC difficulty that align with traditional approaches like IRT and item analysis.
  • Across datasets, the Deep-IRT difficulty estimates correlate with external difficulty measures and differ in expected ways from raw model outputs.
  • The model retains DKVMN’s strengths while offering a direct psychological interpretation of both students and items.
  • Analysis shows the reconstruction issue observed in DKT also persists in Deep-IRT, consistent with prior KT findings.
  • Experiments indicate Deep-IRT can serve as an alternative trajectory-based estimator of KC difficulty using entire learning histories.
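
The comparability finding can be checked against the reported DKVMN and Deep-IRT rows. The sketch below copies the AUC and accuracy values from the results table and computes the per-dataset gap (Deep-IRT minus DKVMN). The dictionary layout and the `gaps` helper are illustrative assumptions, not part of the paper.

```python
# (AUC, Acc) pairs copied from the results table in this review.
results = {
    "ASSIST2009": {"DKVMN": (81.61, 77.01), "Deep-IRT": (81.65, 77.00)},
    "ASSIST2015": {"DKVMN": (72.94, 75.18), "Deep-IRT": (72.88, 75.14)},
    "Statics2011": {"DKVMN": (83.17, 81.57), "Deep-IRT": (83.09, 81.56)},
    "Synthetic": {"DKVMN": (82.97, 75.58), "Deep-IRT": (82.98, 75.61)},
    "FSAI-F1toF3": {"DKVMN": (68.40, 63.40), "Deep-IRT": (68.69, 63.43)},
}


def gaps(table):
    """Return {dataset: (auc_gap, acc_gap)} as Deep-IRT minus DKVMN."""
    out = {}
    for dataset, rows in table.items():
        deep, base = rows["Deep-IRT"], rows["DKVMN"]
        out[dataset] = tuple(round(d - b, 2) for d, b in zip(deep, base))
    return out


gap = gaps(results)
max_auc_gap = max(abs(g[0]) for g in gap.values())
max_acc_gap = max(abs(g[1]) for g in gap.values())

for dataset, (auc_gap, acc_gap) in gap.items():
    print(f"{dataset:12s} AUC {auc_gap:+.2f}  Acc {acc_gap:+.2f}")
```

The largest absolute differences come from FSAI-F1toF3 for AUC and ASSIST2015 for accuracy, and both stay well under one point.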

This review was created by AI and reviewed by human editors.