QUICK REVIEW

[Paper Review] How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence

Luke Zaphir, Jason M. Lodge | arXiv (Cornell University) | 2024-06-20

Explainable Artificial Intelligence (XAI) | Citations: 5

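One-Line Summary

This paper proposes the MAGE framework for critiquing how vulnerable an assessment is to generative AI, and it guides discipline-specific assessment design for evaluating and improving critical thinking tasks.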

ABSTRACT

Generative AI such as those with large language models have created opportunities for innovative assessment design practices. Due to recent technological developments, there is a need to know the limits and capabilities of generative AI in terms of simulating cognitive skills. Assessing student critical thinking skills has been a feature of assessment for time immemorial, but the demands of digital assessment create unique challenges for equity, academic integrity and assessment authorship. Educators need a framework for determining their assessments vulnerability to generative AI to inform assessment design practices. This paper presents a framework that explores the capabilities of the LLM ChatGPT4 application, which is the current industry benchmark. This paper presents the Mapping of questions, AI vulnerability testing, Grading, Evaluation (MAGE) framework to methodically critique their assessments within their own disciplinary contexts. This critique will provide specific and targeted indications of their questions vulnerabilities in terms of the critical thinking skills. This can go on to form the basis of assessment design for their tasks.

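Motivation and Objectives

Argues that the limits and capabilities of generative AI in simulating cognitive skills need to be assessed.
Provides a framework for systematically critiquing assessments within a disciplinary context.
Offers actionable guidance for designing assessments that are robust to generative AI.
Addresses equity, academic integrity, and authorship issues in digital assessment.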

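Proposed Method

Proposes the MAGE framework, which comprises Mapping of questions, AI vulnerability testing, Grading, and Evaluation.
Tests AI capability on critical thinking tasks using ChatGPT-4, which the paper treats as the current industry benchmark.
Outlines steps for mapping questions to potential AI vulnerabilities, then grading and evaluating the responses.
Provides discipline-specific guidance for interpreting AI vulnerability findings.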
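The four MAGE stages can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Question` class, the `run_mage` function, the skill labels, the 0-to-1 grading scale, and the 0.5 threshold are all hypothetical and do not come from the paper. The `ask_model` and `grade` callables stand in for collecting ChatGPT-4 answers and for an educator applying their usual rubric.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Question:
    text: str
    skill: str  # hypothetical skill label; the paper's own taxonomy is not reproduced here


def run_mage(
    questions: List[Question],
    ask_model: Callable[[str], str],          # placeholder for querying the AI under test
    grade: Callable[[Question, str], float],  # placeholder rubric, assumed to return 0.0 to 1.0
    threshold: float = 0.5,                   # assumed cut-off, not taken from the paper
) -> Dict[str, dict]:
    """Return, per skill, the mean AI grade and whether it reaches the threshold."""
    scores = defaultdict(list)
    for q in questions:                        # Mapping: each question carries its skill label
        answer = ask_model(q.text)             # AI vulnerability testing: collect the AI's answer
        scores[q.skill].append(grade(q, answer))  # Grading: score the answer with the rubric

    # Evaluation: a high mean grade suggests questions on that skill are vulnerable.
    report = {}
    for skill, values in scores.items():
        avg = mean(values)
        report[skill] = {"mean_grade": avg, "vulnerable": avg >= threshold}
    return report


if __name__ == "__main__":
    demo_questions = [
        Question("Define opportunity cost.", "recall"),
        Question("Critique the argument in the attached essay.", "evaluation"),
    ]
    # Stub model and stub grader, used only to make the sketch runnable.
    demo_report = run_mage(
        demo_questions,
        ask_model=lambda prompt: "stub answer",
        grade=lambda q, answer: 0.9 if q.skill == "recall" else 0.3,
    )
    for skill, result in demo_report.items():
        print(skill, result)
```

Grouping grades by skill rather than by question mirrors the review's point that the critique should indicate vulnerabilities in terms of critical thinking skills. A real application would replace both stubs with human judgment.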

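Results

Research Questions

RQ1: How can assessments be systematically critiqued for their vulnerability to generative AI such as ChatGPT-4?
RQ2: Which indicators reveal the quality of an AI's critical thinking on disciplinary tasks?
RQ3: How can the MAGE framework inform assessment design that mitigates AI vulnerability?
RQ4: What considerations do equity, academic integrity, and authorship require in digital assessment?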

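Key Findings

The paper presents the MAGE framework as a method for critiquing assessments for AI vulnerability.
The framework yields specific, targeted indications of where questions are vulnerable in terms of critical thinking skills.
The approach supports improved assessment design within disciplinary contexts.
The framework highlights concerns about equity, academic integrity, and authorship in digital assessment.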

This review was generated by AI and reviewed by a human editor.