QUICK REVIEW

[Paper Review] The Conversational Exam: A Scalable Assessment Design for the AI Era

Barba Lorena A., Laura Stegner|arXiv (Cornell University)|Jan 15, 2026

Intelligent Tutoring Systems and Adaptive Learning | Citations: 0

One-Line Summary

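Introduces the conversational exam: a scalable live-coding oral assessment that combines authentic practice with supervision to preserve assessment validity in the AI era, demonstrated with 58 students over two days.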

ABSTRACT

Traditional assessment methods collapse when students use generative AI to complete work without genuine engagement, creating an illusion of competence where they believe they're learning but aren't. This paper presents the conversational exam -- a scalable oral examination format that restores assessment validity by having students code live while explaining their reasoning. Drawing on human-computer interaction principles, we examined 58 students in small groups across just two days, demonstrating that oral exams can scale to typical class sizes. The format combines authentic practice (students work with documentation and supervised AI access) with inherent validity (real-time performance cannot be faked). We provide detailed implementation guidance to help instructors adapt this approach, offering a practical path forward when many educators feel paralyzed between banning AI entirely and accepting that valid assessment is impossible.

Motivation and Objectives

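Make the case for assessment reform in the AI era: deter superficial, AI-produced work and ensure that genuine learning takes place.
Propose a scalable conversational exam format that preserves both authenticity and validity.
Provide a concrete implementation blueprint, covering logistics, question design, and grading, to support adoption by other instructors.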

Proposed Method

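Developed a three-principle framework: authenticity through real working conditions; validity secured by live performance rather than by monitoring behavior; and scalability through group-based oral exams.
Built a three-tier question bank (30 questions per tier), three levels of scaffolding (Level 1, Level 2, and an optional Level 3), and a decision tree for when to give hints and when to pose advanced checks.
Used structured observation with detailed score sheets and a fixed rubric (Technical Skills 1-4, Conceptual Understanding 1-4, Problem-Solving & Communication 1-4) to keep grading consistent across groups.
Staffed each exam with a three-person team (lead instructor, co-instructor, teaching assistant) and used a Zoom-based setup in which screens are monitored in real time to deter AI misuse.
Ran 30-minute rotations, one group of about 5-6 students at a time, using Level 1 and Level 2 questions; an hourglass kept time, and questioning adapted to each student's performance.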
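The review specifies the rubric only as three criteria, each scored 1-4. As a minimal sketch, assuming equal weighting and a simple sum (neither is stated in the review), a score sheet could enforce the fixed 1-4 band and total the three criteria as follows. `ScoreSheet`, `total`, and `percent` are hypothetical names, not part of the authors' materials, and the percentage conversion is an illustrative assumption rather than how the paper computes grades.

```python
from dataclasses import dataclass

# The three criteria and the 1-4 scale come from the review; the equal
# weighting and the percentage conversion below are assumptions.
CRITERIA = ("technical_skills", "conceptual_understanding", "problem_solving_communication")
MIN_SCORE, MAX_SCORE = 1, 4


@dataclass(frozen=True)
class ScoreSheet:
    """One observer's rubric scores for one student (hypothetical structure)."""

    student: str
    technical_skills: int
    conceptual_understanding: int
    problem_solving_communication: int

    def __post_init__(self) -> None:
        # Reject scores outside the fixed 1-4 band so sheets stay comparable across groups.
        for name in CRITERIA:
            value = getattr(self, name)
            if not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"{name}={value} is outside {MIN_SCORE}-{MAX_SCORE}")

    def total(self) -> int:
        # Unweighted sum of the three criteria, so the range is 3 to 12.
        return sum(getattr(self, name) for name in CRITERIA)

    def percent(self) -> float:
        # Assumed conversion: share of the maximum possible total (4 * 3 = 12).
        return 100.0 * self.total() / (MAX_SCORE * len(CRITERIA))


sheet = ScoreSheet("student-01", technical_skills=3, conceptual_understanding=4,
                   problem_solving_communication=3)
print(sheet.total(), round(sheet.percent(), 1))  # 10 83.3
```

Validating in `__post_init__` means an out-of-range entry fails when the sheet is recorded rather than later, when scores are aggregated across groups.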

Results

Research Questions

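RQ1: In the presence of generative AI, can a conversational, group-based oral exam scale to typical class sizes while preserving validity?
RQ2: What design principles and logistical structures are needed to run such exams reliably and efficiently?
RQ3: Can this approach distinguish genuine competence from AI assistance or superficial performance in computer science courses?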

Key Findings

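The class average across the two exam administrations was about 80%; 58 students were examined over two days.
Demonstrated a scalable oral exam format that rotated groups of 5-6 students through ten 30-minute sessions.
A structured question bank, scaffolding, and standardized score sheets enabled fast, consistent grading and reduced observer fatigue.
Clear guidelines on permitted and prohibited AI use allowed students to engage meaningfully with AI under supervision.
The setup, role assignments, and pre-exam preparation (calibration, scheduling, practice sessions) were essential for reliability and pacing.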
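The reported numbers (58 students, ten 30-minute sessions, groups of 5-6, two days) fit together arithmetically: splitting 58 students as evenly as possible into 10 groups yields eight groups of 6 and two groups of 5. The sketch below checks that split and lays the sessions out over two days. It assumes back-to-back sessions starting at 09:00 with no breaks and at most ceil(10 / 2) = 5 sessions per day; the start time, the day split, and the helper names `split_into_groups` and `schedule` are hypothetical, not taken from the paper.

```python
import math
from datetime import timedelta


def split_into_groups(students: list[str], n_groups: int) -> list[list[str]]:
    """Split students into n_groups whose sizes differ by at most one."""
    base, extra = divmod(len(students), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups each take one additional student.
        size = base + (1 if i < extra else 0)
        groups.append(students[start:start + size])
        start += size
    return groups


def schedule(groups, days, day_start=timedelta(hours=9), session=timedelta(minutes=30)):
    """Assign groups to back-to-back sessions, filling each day before the next.

    Hypothetical assumptions: a fixed daily start time, no breaks between
    groups, and at most ceil(len(groups) / days) sessions per day.
    """
    per_day = math.ceil(len(groups) / days)
    plan = []
    for i, group in enumerate(groups):
        day, slot = divmod(i, per_day)
        start = day_start + slot * session
        plan.append((day + 1, start, start + session, group))
    return plan


students = [f"student-{i:02d}" for i in range(1, 59)]  # 58 placeholder IDs
groups = split_into_groups(students, n_groups=10)
print(sorted({len(g) for g in groups}))  # [5, 6]
for day, start, end, group in schedule(groups, days=2):
    print(f"Day {day}  {start}-{end}  {len(group)} students")
```

Under these assumptions each day's exam block runs 2.5 hours; adding changeover time between groups would extend it.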

This review was generated by AI and checked by a human editor.