QUICK REVIEW

[Paper Review] Learning diagnostic policies from examples by systematic search

Valentina Bayer‐Zubek | arXiv (Cornell University) | Jul 7, 2004

AI-based Problem Solving and Planning | 16 references | 18 citations

TL;DR

This paper uses systematic search based on AO* to learn cost-sensitive diagnostic policies that minimize expected total cost (test costs plus misdiagnosis costs), with regularizers built into the search to control overfitting. Experiments on benchmark data sets show that systematic search produces better diagnostic policies than greedy methods such as Value of Information, without learning or assuming a Bayesian network.

ABSTRACT

A diagnostic policy specifies what test to perform next, based on the results of previous tests, and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the costs of tests and (b) the costs of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost. We formalize this diagnosis process as a Markov Decision Process (MDP). We investigate two types of algorithms for solving this MDP: systematic search based on the AO* algorithm and greedy search (particularly the Value of Information method). We investigate the issue of learning the MDP probabilities from examples, but only as they are relevant to the search for good policies. We do not learn nor assume a Bayesian network for the diagnosis process. Regularizers are developed that control overfitting and speed up the search. This research is the first that integrates overfitting prevention into systematic search. The paper has two contributions: it discusses the factors that make systematic search feasible for diagnosis, and it shows experimentally, on benchmark data sets, that systematic search methods produce better diagnostic policies than greedy methods.
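The MDP formulation in the abstract can be made concrete with a Bellman-style recursion. The following is a sketch only; the notation is an assumption and is not taken from the paper. Here $s$ is the set of test results observed so far, $f_k$ is the action "stop and diagnose class $k$", $MC(f_k, y)$ is the cost of diagnosing $k$ when the true class is $y$, and $C(x_n)$ is the cost of performing test $x_n$.

```latex
% Expected cost of stopping in state s and diagnosing class k
C(s, f_k) = \sum_{y} P(y \mid s)\, MC(f_k, y)

% Expected cost of performing test x_n in state s, then acting optimally
Q^*(s, x_n) = C(x_n) + \sum_{v} P(x_n = v \mid s)\, V^*\bigl(s \cup \{x_n = v\}\bigr)

% Optimal expected total cost: best of stopping now or testing further
V^*(s) = \min\Bigl( \min_{k} C(s, f_k),\; \min_{x_n \notin s} Q^*(s, x_n) \Bigr)
```

Under this reading, an optimal diagnostic policy picks the minimizing action in every state, and the expected total cost of the policy is $V^*$ evaluated at the empty state.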

Motivation & Objective

To develop a method for learning optimal diagnostic policies that balance test costs and misdiagnosis penalties.
To address the challenge of overfitting when estimating MDP probabilities from limited examples during policy search.
To compare systematic search (AO*) with greedy search (e.g., Value of Information) in learning diagnostic policies.
To integrate overfitting prevention directly into the policy search process, rather than as a post-hoc step.
To evaluate the performance of systematic search on benchmark diagnostic data sets without learning or assuming a Bayesian network structure.

Proposed method

Formalizes the diagnostic policy learning problem as a Markov Decision Process (MDP), where each state is the set of test results observed so far and each action either performs another test or stops and makes a diagnosis.
Employs the AO* algorithm for systematic search over policy trees; with an admissible heuristic, AO* returns a policy that is optimal with respect to the MDP estimated from the training examples.
Introduces custom regularizers to constrain probability estimates from training examples, reducing overfitting during MDP parameter learning.
Estimates the MDP transition probabilities and expected misdiagnosis costs directly from training examples, without learning or assuming a Bayesian network structure.
Applies greedy search (specifically the Value of Information method) as a baseline for comparison.
Combines systematic search with regularized probability estimation to improve generalization and search efficiency.
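The contrast between systematic and greedy search can be sketched on a toy problem. Everything below is hypothetical and not from the paper: the XOR-like data set, the costs, and all function names are illustrative. Laplace-style smoothing (`ALPHA`) stands in for the regularizers, which this review does not specify. The systematic search is a plain memoized recursion over all policies rather than AO* itself (AO* would reach the same optimum while using an admissible heuristic to avoid expanding most states), and the greedy policy is a one-step lookahead in the spirit of Value of Information.

```python
from functools import lru_cache

# Hypothetical toy problem: the diagnosis is "sick" exactly when t1 != t2,
# so neither test is informative on its own.
VALUES = (0, 1)
CLASSES = ("healthy", "sick")
EXAMPLES = [({"t1": a, "t2": b}, "sick" if a != b else "healthy")
            for a in VALUES for b in VALUES] * 2
TEST_COST = {"t1": 1.0, "t2": 1.0}
MISDIAGNOSIS_COST = 20.0
ALPHA = 1.0  # Laplace-style smoothing, an assumed stand-in for a regularizer


def matching(state):
    """Training examples consistent with the (test, value) pairs in state."""
    return [(x, y) for x, y in EXAMPLES if all(x[t] == v for t, v in state)]


def class_probs(state):
    """Smoothed estimate of P(class | state)."""
    rows = matching(state)
    total = len(rows) + ALPHA * len(CLASSES)
    return {c: (sum(y == c for _, y in rows) + ALPHA) / total for c in CLASSES}


def value_probs(state, test):
    """Smoothed estimate of P(test = value | state)."""
    rows = matching(state)
    total = len(rows) + ALPHA * len(VALUES)
    return {v: (sum(x[test] == v for x, _ in rows) + ALPHA) / total
            for v in VALUES}


def diagnose_cost(state):
    """Expected misdiagnosis cost of stopping now with the best diagnosis."""
    p = class_probs(state)
    return min(sum(p[y] * MISDIAGNOSIS_COST for y in CLASSES if y != k)
               for k in CLASSES)


def untested(state):
    done = {t for t, _ in state}
    return [t for t in TEST_COST if t not in done]


def cost_of_testing(state, test, continuation):
    """Test cost plus expected cost of following `continuation` afterwards."""
    p = value_probs(state, test)
    return TEST_COST[test] + sum(
        p[v] * continuation(state | {(test, v)}) for v in VALUES)


@lru_cache(maxsize=None)
def optimal_cost(state=frozenset()):
    """Systematic search: minimum expected total cost over all policies."""
    options = [diagnose_cost(state)]
    options += [cost_of_testing(state, t, optimal_cost) for t in untested(state)]
    return min(options)


def greedy_cost(state=frozenset()):
    """Expected total cost of a one-step-lookahead (VOI-style) policy."""
    stop = diagnose_cost(state)
    lookahead = {t: cost_of_testing(state, t, diagnose_cost)
                 for t in untested(state)}
    if not lookahead or min(lookahead.values()) >= stop:
        return stop  # no single test pays for itself, so diagnose now
    best = min(lookahead, key=lookahead.get)
    return cost_of_testing(state, best, greedy_cost)


if __name__ == "__main__":
    print("systematic search:", optimal_cost())
    print("greedy lookahead: ", greedy_cost())
```

On this toy data the greedy policy stops immediately, because no single test reduces the expected misdiagnosis cost by more than its price, while the exhaustive search finds that paying for both tests is cheaper overall. Both costs are evaluated under the same smoothed estimates, so the systematic result can never exceed the greedy one.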

Experimental results

Research questions

RQ1: Can systematic search with AO* produce better diagnostic policies than greedy methods like Value of Information in cost-sensitive diagnosis?
RQ2: How effective are regularizers in preventing overfitting when learning MDP probabilities from limited diagnostic examples?
RQ3: What factors make systematic search computationally feasible for large-scale diagnostic policy learning?
RQ4: Does integrating overfitting control directly into the search process improve policy quality compared to separate regularization?
RQ5: How do systematic and greedy search methods compare in terms of expected total cost and robustness on benchmark diagnostic datasets?

Key findings

Systematic search using AO* with regularized probability estimation produced diagnostic policies with lower expected total costs than greedy methods.
Building overfitting control into the search via regularizers improved policy generalization and also sped up the search.
Systematic search was found to be computationally feasible for diagnostic MDPs when combined with efficient pruning and regularization.
The proposed method outperformed greedy approaches on benchmark datasets, demonstrating superior policy quality without assuming a Bayesian network.
Regularization reduced overfitting in MDP probability estimation, especially when training data was limited.
The study established that systematic search is a viable and superior alternative to greedy search for learning diagnostic policies in cost-sensitive settings.

This review was created by AI and reviewed by human editors.