QUICK REVIEW

[Paper Review] Causal Parrots: Large Language Models May Talk Causality But Are Not Causal

Matej Zečević, Moritz Willig|arXiv (Cornell University)|Aug 24, 2023

Topic Modeling | Cited by 25

One-Sentence Summary

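This paper argues that current LLMs can expose correlations between causal facts but do not actually perform causal reasoning. It introduces meta SCMs and the Correlation of Causal Facts (CCF) conjecture, and reports empirical tests showing mixed causal ability across GPT-3, Luminous, OPT, and GPT-4 with Chain-of-Thought prompting.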

ABSTRACT

Some argue scale is all what is needed to achieve AI, covering even causal models. We make it clear that large language models (LLMs) cannot be causal and give reason onto why sometimes we might feel otherwise. To this end, we define and exemplify a new subgroup of Structural Causal Model (SCM) that we call meta SCM which encode causal facts about other SCM within their variables. We conjecture that in the cases where LLM succeed in doing causal inference, underlying was a respective meta SCM that exposed correlations between causal facts in natural language on whose data the LLM was ultimately trained. If our hypothesis holds true, then this would imply that LLMs are like parrots in that they simply recite the causal knowledge embedded in the data. Our empirical analysis provides favoring evidence that current LLMs are even weak 'causal parrots.'

Motivation and Objectives

Formalize the idea that causal knowledge can be embedded as correlations of causal facts within meta-structural causal models (meta SCMs).
Propose the Correlation of Causal Facts (CCF) conjecture: LLMs reproduce causal facts because those facts appear in the training data and reproducing them minimizes training error.
Study whether current LLMs exhibit true causal inference or merely parrot causal information seen during training.
Provide an empirical analysis of how state-of-the-art LLMs perform on causal reasoning tasks and common-sense causal queries.

Proposed Method

Define and instantiate simple SCMs and meta-SCMs to model causal facts and their correlations.
Introduce Pearl's causal hierarchy (L1 association, L2 intervention, L3 counterfactual) to frame the level of information each kind of causal reasoning requires.
Formulate the Correlation of Causal Facts (CCF) conjecture linking LLM outputs to training-data-based causal facts and training loss.
Experimentally test LLMs on causal-chain prompts and intuitive-physics tasks to assess LLMs’ ability to infer or recall causal relations.
Discuss the role of fine-tuning and meta-SCM alignment in downstream tasks.
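
To make the distinction between the first two levels of the hierarchy concrete, here is a minimal sketch of a two-variable SCM. The variables (altitude and temperature), the structural equations, the coefficients, and the function name `sample` are all illustrative assumptions, not taken from the paper's experiments. The sketch only contrasts conditioning (L1) with intervening (L2); counterfactuals (L3) are not covered.

```python
import random
import statistics


def sample(n, do_temperature=None, seed=0):
    """Draw n samples from a toy SCM with a single edge: altitude -> temperature.

    Assumed structural equations (hypothetical, for illustration only):
        altitude    := U_A,                        U_A ~ Uniform(0, 3)  [km]
        temperature := 20 - 6.5 * altitude + U_T,  U_T ~ Normal(0, 1)
    Passing do_temperature replaces the second equation with a constant.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        altitude = rng.uniform(0.0, 3.0)
        if do_temperature is None:
            temperature = 20.0 - 6.5 * altitude + rng.gauss(0.0, 1.0)
        else:
            # do(temperature = t): the altitude -> temperature mechanism is cut
            temperature = do_temperature
        rows.append((altitude, temperature))
    return rows


# L1 (association): observing a cold temperature is informative about altitude.
observational = sample(20000)
mean_alt_given_cold = statistics.mean(a for a, t in observational if t < 5.0)

# L2 (intervention): forcing a cold temperature leaves altitude unchanged.
interventional = sample(20000, do_temperature=0.0)
mean_alt_do_cold = statistics.mean(a for a, _ in interventional)

print(f"E[altitude | temperature < 5]  = {mean_alt_given_cold:.2f}")
print(f"E[altitude | do(temperature=0)] = {mean_alt_do_cold:.2f}")
```

Under these assumptions the conditional mean sits well above the interventional one, which stays near the prior mean of 1.5 km. A model that only sees observational samples cannot tell the two apart without knowing the direction of the edge.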

Experimental Results

Research Questions

RQ1: Can LLMs reliably answer interventional (L2) and counterfactual (L3) causal queries, or do they mostly reflect correlations present in training data?
RQ2: Are meta-SCMs sufficient to explain instances where LLMs appear to reason causally, and can these meta-structures be identified in training data?
RQ3: Do current foundation models show genuine causal inference capabilities, or are their correct answers primarily memorized correlations?
RQ4: How do fine-tuning and chain-of-thought prompting affect LLMs’ performance on causal and intuitive-physics tasks?

Key Findings

LLMs show mixed performance on causal reasoning tasks; some correct causal answers arise, but often reflect correlations learned from data rather than true causal inference.
The authors formalize meta-SCMs and demonstrate they can encode causal facts about another SCM, enabling a model to reflect interventional knowledge.
Chain-of-Thought prompting improves performance on several causal and intuitive-physics prompts, especially for GPT-4, indicating process compliance rather than hidden understanding.
GPT-3, Luminous, and OPT display variable success across tasks, with GPT-4-CoT achieving the strongest results in prompting experiments.
The Correlation of Causal Facts (CCF) conjecture posits that when LLMs produce correct causal answers, those answers are tied to observed causal facts in training data and the training objective minimizes error.
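
The intuition behind the CCF conjecture can be caricatured with a lookup-table "parrot". The sketch below is a deliberately crude illustration, not the authors' experimental setup and not a model of how an LLM works internally. The corpus, the query sets, and the helper names `build_parrot` and `accuracy` are hypothetical.

```python
import re

# Hypothetical "training corpus": causal facts stated in natural language.
CORPUS = [
    "Altitude causes temperature.",
    "Smoking causes cancer.",
]


def build_parrot(corpus):
    """Memorize (cause, effect) pairs that are stated verbatim in the corpus."""
    facts = set()
    for sentence in corpus:
        match = re.fullmatch(r"(\w+) causes (\w+)\.", sentence.strip())
        if match:
            facts.add((match.group(1).lower(), match.group(2).lower()))

    def answer(cause, effect):
        # Pure recall: no SCM, no intervention, no data about the variables.
        return "yes" if (cause.lower(), effect.lower()) in facts else "no"

    return answer


def accuracy(answer, queries):
    """Fraction of (cause, effect, gold_label) queries answered correctly."""
    correct = sum(answer(cause, effect) == gold for cause, effect, gold in queries)
    return correct / len(queries)


parrot = build_parrot(CORPUS)

# Queries whose causal fact appears in the corpus vs. queries whose fact does not.
seen = [("altitude", "temperature", "yes"), ("smoking", "cancer", "yes")]
unseen = [("rain", "flooding", "yes"), ("exercise", "fitness", "yes")]

seen_acc = accuracy(parrot, seen)      # 1.0: every fact was stated in the corpus
unseen_acc = accuracy(parrot, unseen)  # 0.0: nothing to recite
```

The parrot scores perfectly on causal questions whose answers were written down in its corpus and fails on the rest, even though it never represents or manipulates a causal model. That is the sense in which correct causal answers can come from correlations between stated causal facts rather than from causal inference.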

This review was generated by AI and reviewed by a human editor.