QUICK REVIEW

[Paper Review] Human-LLM Collaborative Feature Engineering for Tabular Data

Zhuoyan Li, Aditya Bansal|arXiv (Cornell University)|Jan 28, 2026

Machine Learning and Data Classification | Cited by 0

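One-Sentence Summary

The paper proposes a human-LLM collaborative framework for tabular feature engineering that separates operation proposal (done by a large language model) from operation selection (guided by a Bayesian surrogate and optional human preferences), improving predictive performance on multiple datasets while reducing cognitive load.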

ABSTRACT

Large language models (LLMs) are increasingly used to automate feature engineering in tabular learning. Given task-specific information, LLMs can propose diverse feature transformation operations to enhance downstream model performance. However, current approaches typically assign the LLM as a black-box optimizer, responsible for both proposing and selecting operations based solely on its internal heuristics, which often lack calibrated estimations of operation utility and consequently lead to repeated exploration of low-yield operations without a principled strategy for prioritizing promising directions. In this paper, we propose a human-LLM collaborative feature engineering framework for tabular learning. We begin by decoupling the transformation operation proposal and selection processes, where LLMs are used solely to generate operation candidates, while the selection is guided by explicitly modeling the utility and uncertainty of each proposed operation. Since accurate utility estimation can be difficult especially in the early rounds of feature engineering, we design a mechanism within the framework that selectively elicits and incorporates human expert preference feedback, comparing which operations are more promising, into the selection process to help identify more effective operations. Our evaluations on both the synthetic study and the real user study demonstrate that the proposed framework improves feature engineering performance across a variety of tabular datasets and reduces users' cognitive load during the feature engineering process.

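Research Motivation and Objectives

Improve the efficiency of tabular feature engineering by separating the proposal of feature operations from their selection.
Introduce a Bayesian surrogate model to estimate the utility and uncertainty of each proposed operation.
Selectively incorporate preference feedback from human experts to further improve operation selection.
Balance exploration and exploitation in operation selection with an upper confidence bound (UCB) strategy.
Demonstrate improved performance and reduced cognitive load through a synthetic study and a user study.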

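Proposed Method

The LLM generates a large pool of candidate feature transformations from the history and dataset metadata (H_t, C, Meta).
A Bayesian neural network surrogate models the utility g(e) of each operation e, using an embedding-based encoding phi(e) that combines semantic features with column-usage features.
The utility estimate mu_t(e) and the uncertainty sigma_t(e) feed a UCB score: UCB_t(e) = mu_t(e) + sqrt(beta_t) * sigma_t(e).
When it is likely to help, human preference feedback is elicited as pairwise comparisons to refine the selection; the feedback is modeled with a probit likelihood and used to update the posterior to q'_t(theta).
Two conditions decide whether to query the human: (C1) an overlap between UCB and LCB, which indicates potential gain, and (C2) an uncertainty threshold, which justifies the cognitive cost. The feedback then adjusts the final choice between e_t^a and e_t^b.
The algorithm iterates for a budget of T rounds, updating the history H_t and the surrogate model with or without human input.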
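
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` class, the function names, and `sigma_threshold` are hypothetical, and the exact forms of C1 and C2 are assumed here (C1 as an interval overlap between the top two candidates by UCB, C2 as the larger of their two uncertainties exceeding a fixed threshold). The probit function uses the standard pairwise-comparison form with an assumed noise scale; the posterior update to q'_t(theta) is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str     # hypothetical label for an operation e
    mu: float     # surrogate's utility estimate mu_t(e)
    sigma: float  # surrogate's uncertainty sigma_t(e)


def ucb(c: Candidate, beta: float) -> float:
    # UCB_t(e) = mu_t(e) + sqrt(beta_t) * sigma_t(e), as in the summary
    return c.mu + math.sqrt(beta) * c.sigma


def lcb(c: Candidate, beta: float) -> float:
    # Lower confidence bound, assumed symmetric to the UCB
    return c.mu - math.sqrt(beta) * c.sigma


def top_two(cands: list[Candidate], beta: float) -> tuple[Candidate, Candidate]:
    # Rank by UCB; the two leaders play the role of e_t^a and e_t^b
    ranked = sorted(cands, key=lambda c: ucb(c, beta), reverse=True)
    return ranked[0], ranked[1]


def should_query_human(a: Candidate, b: Candidate,
                       beta: float, sigma_threshold: float) -> bool:
    # C1 (assumed form): the runner-up's UCB exceeds the leader's LCB,
    # so the runner-up could plausibly be the better operation.
    overlap = ucb(b, beta) > lcb(a, beta)
    # C2 (assumed form): uncertainty is large enough to justify the
    # cognitive cost of asking a person.
    uncertain = max(a.sigma, b.sigma) > sigma_threshold
    return overlap and uncertain


def probit_preference(g_a: float, g_b: float, noise: float = 1.0) -> float:
    # Standard probit model: P(a preferred to b) = Phi((g_a - g_b) / (sqrt(2) * noise))
    z = (g_a - g_b) / (math.sqrt(2.0) * noise)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Made-up numbers, purely for illustration
pool = [
    Candidate("A", mu=0.80, sigma=0.05),
    Candidate("B", mu=0.75, sigma=0.06),
    Candidate("C", mu=0.60, sigma=0.02),
]
e_a, e_b = top_two(pool, beta=4.0)
ask = should_query_human(e_a, e_b, beta=4.0, sigma_threshold=0.03)
```

With these illustrative values, A and B lead the UCB ranking, their confidence intervals overlap, and both uncertainties exceed the threshold, so `ask` is true and a pairwise comparison would be requested. Raising `sigma_threshold` above 0.06 turns the query off and the UCB leader is selected directly.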

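Experimental Results

Research Questions

RQ1: Does decoupling operation proposal from selection improve the efficiency of LLM-based feature engineering for tabular data?
RQ2: How does the Bayesian surrogate model estimate the utility and uncertainty of proposed feature operations in this setting?
RQ3: Does selective human preference feedback further improve feature engineering performance and reduce cognitive load?
RQ4: How does the framework trade off exploration and exploitation when selecting among LLM-proposed operations?
RQ5: How does the framework compare with AutoML and existing LLM-based methods across multiple datasets and downstream models?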

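Key Findings

With both MLP and XGBoost as evaluators, the proposed framework significantly outperforms AutoML and other baselines on 13 classification datasets.
Without human input, the method achieves a significant reduction in error rate; adding human feedback reduces it further across tasks.
LLM-based feature engineering methods generally outperform traditional non-LLM AutoML methods.
Explicit utility- and uncertainty-aware selection is more efficient than black-box LLM optimization.
Selective human preference feedback yields consistent performance gains in the feature engineering workflow and reduces human cognitive load.
On a set of proprietary conversion datasets, the method achieves a higher AUROC than the OCTree baseline under the same iteration budget.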


This review was generated by AI and reviewed by a human editor.