QUICK REVIEW

[Paper Review] Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory

Chun-Kit Yeung|arXiv (Cornell University)|Apr 26, 2019

Intelligent Tutoring Systems and Adaptive Learning|19 references|75 citations

TL;DR

Deep-IRT combines DKVMN with item response theory to retain predictive power while providing interpretable student ability and item difficulty estimates over time.

ABSTRACT

Deep learning-based knowledge tracing models have been shown to outperform traditional knowledge tracing models without the need for human-engineered features, yet their parameters and representations have long been criticized for not being explainable. In this paper, we propose Deep-IRT, a synthesis of the item response theory (IRT) model and a knowledge tracing model based on the deep neural network architecture called dynamic key-value memory network (DKVMN), to make deep learning-based knowledge tracing explainable. Specifically, we use the DKVMN model to process the student's learning trajectory and estimate the student ability level and the item difficulty level over time. Then, we use the IRT model to estimate the probability that a student will answer an item correctly using the estimated student ability and item difficulty. Experiments show that the Deep-IRT model retains the performance of the DKVMN model while providing a direct psychological interpretation of both students and items.
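The IRT step described in the abstract can be illustrated with a minimal Python sketch. It assumes the scaled one-parameter form p = sigma(3.0 * theta - beta) listed under Proposed method; the function names are hypothetical, and the theta/beta inputs are arbitrary illustrative values, not estimates from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def deep_irt_probability(theta: float, beta: float, scale: float = 3.0) -> float:
    """Item response function p = sigmoid(scale * theta - beta).

    theta: student ability on the KC; beta: KC difficulty.
    The default scale of 3.0 mirrors the formula quoted in this review.
    """
    return sigmoid(scale * theta - beta)


if __name__ == "__main__":
    # Higher ability raises the success probability; higher difficulty lowers it.
    for theta, beta in [(-0.5, 0.0), (0.0, 0.0), (0.5, 0.0), (0.5, 1.5)]:
        p = deep_irt_probability(theta, beta)
        print(f"theta={theta:+.1f} beta={beta:+.1f} p={p:.3f}")
```

Ability and difficulty pull in opposite directions: the predicted probability is exactly 0.5 whenever 3.0 * theta equals beta, as in the last example pair.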

Motivation & Objective

Motivate the need for interpretable deep learning-based knowledge tracing models.
Propose a synthesis of DKVMN with an IRT-based probability model to yield explainable parameters.
Demonstrate that Deep-IRT preserves DKVMN performance while offering psychological interpretability.
Analyze the learned student ability and item difficulty against traditional measures.
Show potential broader applicability of combining deep learning with psychometric models.

Proposed method

Use DKVMN to process learning trajectories and extract latent knowledge component (KC) and knowledge state representations.
Augment DKVMN with a student ability network and a KC difficulty network that produce theta_tj (the student's ability on KC j at time t) and beta_j (the difficulty of KC j).
Apply a one-parameter IRT item response function, p_t = sigma(3.0 * theta_tj - beta_j), where sigma is the logistic sigmoid, to predict the probability of a correct response at time t.
Train end to end by minimizing cross-entropy loss with the Adam optimizer; the learned parameters include the embedding and memory matrices.
Compare Deep-IRT against DKVMN, DKT, and PFA across multiple public and proprietary datasets.
Provide interpretability by mapping deep features to psychometric parameters (ability and difficulty) over time.
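The steps above can be sketched for a single student sequence in NumPy. This is a minimal, untrained illustration under several assumptions: the sizes, parameter names (A, B, M_k, W_theta, W_beta and so on) and random initialisation are hypothetical; the read and write steps follow the DKVMN design as commonly described (softmax correlation weights, erase-then-add memory update); and ability is computed from the DKVMN summary vector while difficulty is computed from the KC embedding alone. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not taken from the paper).
NUM_KC, MEM_SLOTS, D_K, D_V, D_F = 5, 4, 8, 8, 16

# Randomly initialised parameters stand in for trained ones.
params = {
    "A": rng.normal(0, 0.1, (NUM_KC, D_K)),        # KC (key) embedding matrix
    "B": rng.normal(0, 0.1, (2 * NUM_KC, D_V)),    # (KC, response) embedding matrix
    "M_k": rng.normal(0, 0.1, (MEM_SLOTS, D_K)),   # static key memory
    "M_v0": rng.normal(0, 0.1, (MEM_SLOTS, D_V)),  # initial value memory
    "W_f": rng.normal(0, 0.1, (D_F, D_V + D_K)), "b_f": np.zeros(D_F),
    "W_theta": rng.normal(0, 0.1, D_F), "b_theta": 0.0,  # student ability network
    "W_beta": rng.normal(0, 0.1, D_K), "b_beta": 0.0,    # KC difficulty network
    "E": rng.normal(0, 0.1, (D_V, D_V)), "b_e": np.zeros(D_V),  # erase layer
    "D": rng.normal(0, 0.1, (D_V, D_V)), "b_a": np.zeros(D_V),  # add layer
}


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def deep_irt_sequence(kcs, responses, p, scale=3.0):
    """Return per-step (theta, beta, prob) and mean cross-entropy for one student."""
    M_v = p["M_v0"].copy()
    outputs, losses = [], []
    for q, r in zip(kcs, responses):
        k = p["A"][q]                        # KC embedding
        w = softmax(p["M_k"] @ k)            # correlation weight over memory slots
        read = w @ M_v                       # weighted read of the knowledge state
        f = np.tanh(p["W_f"] @ np.concatenate([read, k]) + p["b_f"])  # summary vector
        theta = np.tanh(p["W_theta"] @ f + p["b_theta"])  # ability on this KC at step t
        beta = np.tanh(p["W_beta"] @ k + p["b_beta"])     # KC difficulty (KC only)
        prob = sigmoid(scale * theta - beta)              # IRT item response function
        losses.append(-(r * np.log(prob) + (1 - r) * np.log(1 - prob)))
        outputs.append((theta, beta, prob))
        # Write step: erase then add, weighted by w, using the (KC, response) embedding.
        v = p["B"][q + r * NUM_KC]
        erase = sigmoid(p["E"] @ v + p["b_e"])
        add = np.tanh(p["D"] @ v + p["b_a"])
        M_v = M_v * (1 - np.outer(w, erase)) + np.outer(w, add)
    return outputs, float(np.mean(losses))


if __name__ == "__main__":
    outs, loss = deep_irt_sequence([0, 2, 2, 4], [1, 0, 1, 1], params)
    for theta, beta, prob in outs:
        print(f"theta={theta:+.3f} beta={beta:+.3f} p={prob:.3f}")
    print("mean cross-entropy:", round(loss, 4))
```

Because beta depends only on the KC embedding in this sketch, the two steps on KC 2 report the same difficulty, while theta can change as the value memory is updated. Training would minimise the returned cross-entropy with Adam through an autodiff framework, which is left out here.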

Experimental results

Research questions

RQ1: Does the Deep-IRT model retain the predictive performance of the DKVMN model while providing interpretable theta (ability) and beta (difficulty) at the item level?
RQ2: How do the estimated item difficulties and student abilities from Deep-IRT compare with traditional IRT/item analysis measures and other KT baselines?
RQ3: Can the combination of deep learning and IRT offer explainability without compromising key KT metrics (AUC, accuracy, loss) across diverse datasets?
RQ4: What are the implications of the KC difficulty estimates that Deep-IRT learns from students' learning trajectories?

Key findings

Dataset	Model	AUC (%)	Acc (%)	Loss
ASSIST2009	PFA	59.68	69.24	7.08
ASSIST2009	DKT	81.56	77.17	5.26
ASSIST2009	DKVMN	81.61	77.01	5.29
ASSIST2009	Deep-IRT	81.65	77.00	5.30
ASSIST2015	PFA	52.85	73.37	6.13
ASSIST2015	DKT	72.85	75.29	5.69
ASSIST2015	DKVMN	72.94	75.18	5.71
ASSIST2015	Deep-IRT	72.88	75.14	5.72
Statics2011	PFA	64.99	79.85	4.64
Statics2011	DKT	82.71	81.37	4.29
Statics2011	DKVMN	83.17	81.57	4.24
Statics2011	Deep-IRT	83.09	81.56	4.24
Synthetic	PFA	61.68	65.20	8.01
Synthetic	DKT	81.65	74.84	5.79
Synthetic	DKVMN	82.97	75.58	5.62
Synthetic	Deep-IRT	82.98	75.61	5.61
FSAI-F1toF3	PFA	54.52	54.57	10.46
FSAI-F1toF3	DKT	69.42	64.11	8.26
FSAI-F1toF3	DKVMN	68.40	63.40	8.42
FSAI-F1toF3	Deep-IRT	68.69	63.43	8.42
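The DKVMN versus Deep-IRT comparison in the table can be checked directly. The short script below uses AUC and accuracy values copied from the table; the dictionary layout and the function name are illustrative. It computes Deep-IRT minus DKVMN in percentage points.

```python
# AUC and accuracy (%) copied from the results table: (DKVMN, Deep-IRT).
RESULTS = {
    "ASSIST2009": {"auc": (81.61, 81.65), "acc": (77.01, 77.00)},
    "ASSIST2015": {"auc": (72.94, 72.88), "acc": (75.18, 75.14)},
    "Statics2011": {"auc": (83.17, 83.09), "acc": (81.57, 81.56)},
    "Synthetic": {"auc": (82.97, 82.98), "acc": (75.58, 75.61)},
    "FSAI-F1toF3": {"auc": (68.40, 68.69), "acc": (63.40, 63.43)},
}


def deltas(results):
    """Deep-IRT minus DKVMN per dataset and metric, in percentage points (2 d.p.)."""
    return {
        name: {metric: round(pair[1] - pair[0], 2) for metric, pair in metrics.items()}
        for name, metrics in results.items()
    }


if __name__ == "__main__":
    for name, d in deltas(RESULTS).items():
        print(f"{name:12s} dAUC={d['auc']:+.2f}  dAcc={d['acc']:+.2f}")
```

Run as is, it shows AUC differences of +0.04, -0.06, -0.08, +0.01 and +0.29 points, and accuracy differences of at most 0.04 points in magnitude.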

Deep-IRT matches DKVMN closely: its AUC differs from DKVMN's by less than 0.1 percentage points on four of the five datasets (and is 0.29 points higher on FSAI-F1toF3), and its accuracy differs by at most 0.04 points.
Deep-IRT provides interpretable estimates of student ability and KC difficulty that align with traditional approaches like IRT and item analysis.
Across datasets, the Deep-IRT difficulty estimates correlate with external difficulty measures and differ in expected ways from raw model outputs.
The model retains DKVMN’s strengths while offering a direct psychological interpretation of both students and items.
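Analysis shows that the reconstruction problem previously reported for DKT, where the model can fail to reflect the observed input (for example, lowering the predicted probability of success on a KC right after the student answers it correctly), also persists in Deep-IRT.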
Experiments indicate Deep-IRT can serve as an alternative trajectory-based estimator of KC difficulty using entire learning histories.
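One way to test the kind of agreement described in these findings is a rank correlation between learned KC difficulties and a classical item-analysis measure such as the proportion of incorrect responses. The sketch below is a hypothetical illustration: the response log, KC names and beta values are made up, and Spearman correlation is our choice of statistic, not necessarily the one used in the paper.

```python
from collections import defaultdict

# Hypothetical response log of (kc, correct) pairs and hypothetical learned betas.
LOG = [("add", 1), ("add", 1), ("add", 0), ("frac", 0), ("frac", 0),
       ("frac", 1), ("alg", 1), ("alg", 1), ("alg", 1), ("alg", 0)]
BETA = {"add": -0.2, "frac": 0.6, "alg": -0.5}


def classical_difficulty(log):
    """Proportion of incorrect responses per KC (item-analysis style difficulty)."""
    counts = defaultdict(lambda: [0, 0])  # kc -> [incorrect, total]
    for kc, correct in log:
        counts[kc][0] += 0 if correct else 1
        counts[kc][1] += 1
    return {kc: wrong / total for kc, (wrong, total) in counts.items()}


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    diff = classical_difficulty(LOG)
    kcs = sorted(BETA)
    rho = spearman([diff[k] for k in kcs], [BETA[k] for k in kcs])
    print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic fits here because the two measures live on different scales (a logit-like beta versus a proportion in [0, 1]), so only their ordering is directly comparable.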


This review was created by AI and reviewed by human editors.