QUICK REVIEW

[Paper Review] How To Grade a Test Without Knowing the Answers --- A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing

Yoram Bachrach, Thore Graepel | arXiv (Cornell University) | Jun 27, 2012

Machine Learning and Algorithms | 22 references | 113 citations

TL;DR

This paper proposes a Bayesian graphical model that jointly estimates question difficulty, participant ability, and correct answers in aptitude testing and crowdsourcing, without requiring an answer key. An active learning scheme that greedily minimizes expected model entropy selects each next question adaptively, needing up to 30% fewer questions than a static test to reach the same accuracy.

ABSTRACT

We propose a new probabilistic graphical model that jointly models the difficulties of questions, the abilities of participants and the correct answers to questions in aptitude testing and crowdsourcing settings. We devise an active learning/adaptive testing scheme based on a greedy minimization of expected model entropy, which allows a more efficient resource allocation by dynamically choosing the next question to be asked based on the previous responses. We present experimental results that confirm the ability of our model to infer the required parameters and demonstrate that the adaptive testing scheme requires fewer questions to obtain the same accuracy as a static test scenario.

Motivation & Objective

To develop a probabilistic model that jointly infers question difficulty, participant ability, and correct answers in the absence of ground-truth answer keys.
To design an adaptive testing framework that dynamically selects the next question based on prior responses to optimize information gain.
To improve resource efficiency in crowdsourcing and aptitude testing by minimizing the number of questions needed for accurate assessment.
To validate the model's ability to infer latent parameters and outperform static testing in accuracy and efficiency.

Proposed method

The model uses a Bayesian graphical structure to represent dependencies between question difficulties, participant abilities, and correct answers.
It employs a joint probability distribution over latent variables: question difficulty, participant ability, and answer correctness.
The adaptive selection strategy uses greedy minimization of expected model entropy to choose the next question that maximizes information gain.
The model updates posterior distributions over abilities and difficulties using Bayesian inference after each response.
The method supports both crowdsourcing and traditional aptitude testing by modeling uncertainty in both participants and questions.
The framework is evaluated on real-world test data, with approximate posterior inference performed by message passing (Expectation Propagation).
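
To make the joint modelling idea above concrete, here is a minimal Python sketch of how a correct answer could be inferred without an answer key. It is an illustration under stated assumptions, not the authors' implementation: it assumes a probit-style link in which a participant knows the answer when ability minus difficulty plus Gaussian noise is positive, assumes uniform guessing otherwise, and treats abilities and difficulty as already known so that the posterior over the correct answer can be computed by simple enumeration. All function names and parameters (p_knows, response_likelihood, answer_posterior, noise_std) are hypothetical.

```python
import math

def p_knows(ability, difficulty, noise_std=1.0):
    """Assumed probit link: P(ability - difficulty + Gaussian noise > 0)."""
    z = (ability - difficulty) / noise_std
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    eps = 1e-12  # clamp so later logarithms stay finite
    return min(max(p, eps), 1.0 - eps)

def response_likelihood(response, true_answer, ability, difficulty, n_options):
    """P(response | true answer): the participant either knows the answer
    or (assumption) guesses uniformly among all options."""
    know = p_knows(ability, difficulty)
    guess = (1.0 - know) / n_options
    return know + guess if response == true_answer else guess

def answer_posterior(responses, abilities, difficulty, n_options):
    """Posterior over the unknown correct answer under a uniform prior.
    Abilities and difficulty are treated as known (a simplification)."""
    log_post = []
    for candidate in range(n_options):
        total = 0.0
        for response, ability in zip(responses, abilities):
            total += math.log(
                response_likelihood(response, candidate, ability,
                                    difficulty, n_options))
        log_post.append(total)
    # Normalize in log space for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]

# Two able participants pick option 2; three weak participants pick option 0.
posterior = answer_posterior(
    responses=[2, 2, 0, 0, 0],
    abilities=[2.0, 2.0, -2.0, -2.0, -2.0],
    difficulty=0.0,
    n_options=4,
)
print(posterior)
```

In this toy setup the posterior favors option 2 even though a majority chose option 0, because responses from higher-ability participants carry more weight. In the full model the abilities and difficulties are themselves latent and are inferred jointly with the answers.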

Experimental results

Research questions

RQ1: Can a Bayesian graphical model jointly infer question difficulty, participant ability, and correct answers without access to answer keys?
RQ2: Does adaptive question selection based on expected entropy reduction improve estimation efficiency compared to static test designs?
RQ3: How many fewer questions are needed for the same accuracy when using adaptive selection versus fixed question sequences?
RQ4: To what extent can the model accurately estimate participant ability and question difficulty in real-world crowdsourcing settings?

Key findings

The model successfully infers question difficulty, participant ability, and correct answers with high accuracy even when answer keys are unknown.
Adaptive testing reduced the number of required questions by up to 30% compared to static testing while maintaining equivalent accuracy.
The expected entropy minimization strategy led to faster convergence and more efficient learning of latent parameters.
The model demonstrated robust performance across diverse data distributions and participant reliability levels.
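
The greedy expected entropy minimization behind the adaptive testing results can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the paper's procedure: it tracks a discrete belief over a single participant's ability on a grid, assumes question difficulties and answer keys are known so each response is simply correct or incorrect, and assumes a probit link between ability and difficulty. The paper's criterion is defined over the entropy of the whole model. All names (p_correct, update, expected_entropy, pick_next_question) are hypothetical.

```python
import math

def p_correct(ability, difficulty):
    """Assumed probit link: P(correct) = Phi(ability - difficulty)."""
    return 0.5 * (1.0 + math.erf((ability - difficulty) / math.sqrt(2.0)))

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def update(belief, grid, difficulty, correct):
    """Bayes update of the ability belief after one observed response."""
    weights = []
    for prior, ability in zip(belief, grid):
        p = p_correct(ability, difficulty)
        weights.append(prior * (p if correct else 1.0 - p))
    norm = sum(weights)
    return [w / norm for w in weights]

def expected_entropy(belief, grid, difficulty):
    """Posterior entropy averaged over the two possible outcomes."""
    p_yes = sum(b * p_correct(a, difficulty) for b, a in zip(belief, grid))
    result = 0.0
    for correct, p_outcome in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_outcome > 0.0:
            posterior = update(belief, grid, difficulty, correct)
            result += p_outcome * entropy(posterior)
    return result

def pick_next_question(belief, grid, difficulties, asked):
    """Greedy step: choose the unasked question with the lowest
    expected posterior entropy."""
    candidates = [q for q in range(len(difficulties)) if q not in asked]
    return min(candidates,
               key=lambda q: expected_entropy(belief, grid, difficulties[q]))

# Uniform prior over abilities from -3 to 3 in steps of 0.5.
grid = [-3.0 + 0.5 * i for i in range(13)]
belief = [1.0 / len(grid)] * len(grid)
difficulties = [-4.0, 0.0, 4.0]  # very easy, medium, very hard

next_q = pick_next_question(belief, grid, difficulties, asked=set())
print(next_q)
```

With a symmetric prior, the medium-difficulty question is selected first, since a question that almost everyone gets right or wrong reveals little about ability. Repeating the pick, ask, update loop yields an adaptive test under these simplifying assumptions.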

This review was created by AI and reviewed by human editors.