QUICK REVIEW

[Paper Review] Human-LLM Collaborative Feature Engineering for Tabular Data

Zhuoyan Li, Aditya Bansal | arXiv (Cornell University) | 2026-01-28

Machine Learning and Data Classification | Citations: 0

One-Line Summary

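This paper proposes a human-LLM collaborative framework for tabular feature engineering that decouples operation proposal (done by the LLM) from operation selection (guided by a Bayesian surrogate model and selective human preference feedback). Across multiple datasets, the framework improves predictive performance and reduces users' cognitive load.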

ABSTRACT

Large language models (LLMs) are increasingly used to automate feature engineering in tabular learning. Given task-specific information, LLMs can propose diverse feature transformation operations to enhance downstream model performance. However, current approaches typically assign the LLM as a black-box optimizer, responsible for both proposing and selecting operations based solely on its internal heuristics, which often lack calibrated estimations of operation utility and consequently lead to repeated exploration of low-yield operations without a principled strategy for prioritizing promising directions. In this paper, we propose a human-LLM collaborative feature engineering framework for tabular learning. We begin by decoupling the transformation operation proposal and selection processes, where LLMs are used solely to generate operation candidates, while the selection is guided by explicitly modeling the utility and uncertainty of each proposed operation. Since accurate utility estimation can be difficult especially in the early rounds of feature engineering, we design a mechanism within the framework that selectively elicits and incorporates human expert preference feedback, comparing which operations are more promising, into the selection process to help identify more effective operations. Our evaluations on both the synthetic study and the real user study demonstrate that the proposed framework improves feature engineering performance across a variety of tabular datasets and reduces users' cognitive load during the feature engineering process.

Motivation and Goals

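Improve the efficiency of tabular feature engineering by decoupling the proposal of feature operations from their selection.
Introduce a Bayesian surrogate model to estimate the utility and uncertainty of each proposed operation.
Incorporate selective human expert preference feedback to further refine operation selection.
Balance exploration and exploitation with an Upper Confidence Bound (UCB) strategy for operation selection.
Demonstrate performance gains and reduced cognitive load through a synthetic study and a real user study.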

Proposed Method

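The LLM generates a diverse set of candidate feature transformations from the history and dataset metadata (H_t, C, Meta).
A Bayesian neural network surrogate models the utility g(e) of each operation e, using an embedding-based encoding phi(e) that combines the operation's semantics with its column-usage characteristics.
The estimated utility mu_t(e) and uncertainty sigma_t(e) are combined in a UCB score: UCB_t(e) = mu_t(e) + sqrt(beta_t) * sigma_t(e).
When doing so is likely to help, human preference feedback is requested as pairwise comparisons to refine the selection; it is modeled with a probit likelihood, yielding an updated posterior q'_t(theta).
Human elicitation is gated by two decision conditions: (C1) the UCB and LCB bounds overlap, so feedback can potentially change the outcome; (C2) the uncertainty exceeds a threshold that justifies the cognitive cost. The feedback arbitrates the final choice between e_t^a and e_t^b.
The algorithm repeats these rounds up to the budget T, updating the history H_t and the surrogate whether or not human input was obtained.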
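The surrogate-plus-UCB selection step can be sketched as follows. The review does not give the network architecture, so this sketch assumes a linear output layer g(e) = theta . phi(e) with a mean-field Gaussian posterior over theta, a stand-in for the paper's Bayesian neural network under which mu_t(e) and sigma_t(e) have closed forms. The operation names, encodings phi(e), and posterior values are hypothetical toy numbers, not from the paper.

```python
import math

def posterior_predict(phi, mean, var):
    """Posterior mean mu_t(e) and std sigma_t(e) of g(e) = theta . phi(e),
    assuming a mean-field Gaussian posterior theta_i ~ N(mean[i], var[i])."""
    mu = sum(m * x for m, x in zip(mean, phi))
    sigma = math.sqrt(sum(v * x * x for v, x in zip(var, phi)))
    return mu, sigma

def ucb_score(phi, mean, var, beta):
    """UCB_t(e) = mu_t(e) + sqrt(beta_t) * sigma_t(e)."""
    mu, sigma = posterior_predict(phi, mean, var)
    return mu + math.sqrt(beta) * sigma

def select_operation(candidates, mean, var, beta):
    """Pick the candidate operation with the highest UCB score."""
    return max(candidates, key=lambda op: ucb_score(candidates[op], mean, var, beta))

# Hypothetical encodings phi(e) for two LLM-proposed operations.
candidates = {"log(income)": [1.0, 0.0, 0.5], "age*income": [0.2, 1.0, 0.0]}
mean = [0.3, 0.2, 0.1]    # posterior means of the weights (toy)
var = [0.01, 0.25, 0.04]  # posterior variances of the weights (toy)
```

With beta = 0 the rule is purely greedy and picks "log(income)" (mu 0.35 vs 0.26). With beta = 1 the much larger uncertainty of "age*income" (sigma about 0.50 vs 0.14) lets it win, which is the exploration behavior the UCB term is meant to provide.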
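The probit preference update can be illustrated with a sample-based approximation. The review only states that pairwise feedback enters through a probit likelihood and produces an updated posterior q'_t(theta); here, as an assumption, q_t(theta) is represented by samples and q'_t(theta) by importance weights proportional to P(a preferred over b | theta) = Phi((g(a) - g(b)) / (sqrt(2) * noise)). The noise level, the standard-normal prior samples, and the encodings are hypothetical; the paper may use a different inference scheme.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def utility(theta, phi):
    """Linear utility g(e) = theta . phi(e) for one posterior sample."""
    return sum(t * x for t, x in zip(theta, phi))

def preference_reweight(samples, phi_a, phi_b, noise=0.1):
    """Normalized importance weights approximating q'_t(theta) after the expert
    says 'operation a is more promising than operation b'."""
    lik = [norm_cdf((utility(th, phi_a) - utility(th, phi_b)) / (math.sqrt(2.0) * noise))
           for th in samples]
    total = sum(lik)
    return [l / total for l in lik]

def weighted_utility(samples, weights, phi):
    """Posterior mean of g(e) under the reweighted samples."""
    return sum(w * utility(th, phi) for th, w in zip(samples, weights))

# Samples from a hypothetical current posterior q_t(theta): standard normal.
rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
phi_a, phi_b = [1.0, 0.0], [0.0, 1.0]
weights = preference_reweight(samples, phi_a, phi_b)
```

Before feedback the two operations have roughly equal expected utility; after the comparison, samples in which a outscores b dominate the weights, so the posterior mean utility of a rises above that of b.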
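The round structure, including the gate on human queries, can be sketched as a small driver loop. The callables propose, posterior, evaluate, and ask_human are hypothetical placeholders for the LLM, the surrogate, the downstream evaluation, and the expert. The conditions are one plausible reading of the review: (C1) as "the runner-up's UCB exceeds the leader's LCB" and (C2) as "the larger of the two sigmas exceeds a threshold tau"; beta and tau are illustrative values, and the paper's exact definitions may differ.

```python
import math

def run_rounds(T, propose, posterior, evaluate, ask_human, beta=1.0, tau=0.1):
    """Hypothetical driver: propose -> UCB rank -> optional human query -> evaluate.
    propose(history)       -> list of candidate ops (the LLM's only role)
    posterior(op, history) -> (mu, sigma) from a surrogate conditioned on history
    evaluate(op)           -> observed utility of applying op
    ask_human(a, b)        -> whichever of a, b the expert prefers
    """
    k = math.sqrt(beta)
    history = []  # H_t: (op, observed utility, whether a human was asked)
    for _ in range(T):
        stats = {op: posterior(op, history) for op in propose(history)}
        ranked = sorted(stats, key=lambda op: stats[op][0] + k * stats[op][1], reverse=True)
        chosen, asked = ranked[0], False
        if len(ranked) > 1:
            (mu_a, s_a), (mu_b, s_b) = stats[ranked[0]], stats[ranked[1]]
            c1 = mu_b + k * s_b > mu_a - k * s_a  # (C1) runner-up UCB overlaps leader LCB
            c2 = max(s_a, s_b) > tau              # (C2) uncertainty worth the cognitive cost
            if c1 and c2:
                chosen, asked = ask_human(ranked[0], ranked[1]), True
        # H_t grows every round, so the surrogate is updated whether or not a human was asked.
        history.append((chosen, evaluate(chosen), asked))
    return history

TRUE = {"a": 0.3, "b": 0.5, "c": 0.1}    # hidden utilities (toy)
PRIOR = {"a": 0.4, "b": 0.35, "c": 0.1}  # surrogate's prior guesses (toy)

def toy_posterior(op, history):
    """Toy surrogate: mean of observed utilities, sigma shrinking with visits."""
    seen = [u for o, u, _ in history if o == op]
    mu = sum(seen) / len(seen) if seen else PRIOR[op]
    return mu, 0.5 / (1 + len(seen))

hist = run_rounds(3, lambda h: ["a", "b", "c"], toy_posterior,
                  TRUE.get, lambda a, b: max((a, b), key=TRUE.get))
```

In this toy run the surrogate ranks "a" first by UCB in every round, but both conditions hold, so the expert is consulted and steers each choice to "b", the operation with the higher hidden utility.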

Experimental Results

Research Questions

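RQ1: Can decoupling operation proposal from operation selection improve the efficiency of LLM-based feature engineering for tabular data?
RQ2: How can a Bayesian surrogate estimate the utility and uncertainty of proposed feature operations in this setting?
RQ3: Does selective human preference feedback further improve feature engineering performance and reduce cognitive load?
RQ4: What is the exploration-exploitation trade-off when selecting LLM-proposed operations in this framework?
RQ5: How does the framework perform across multiple datasets and downstream models compared with AutoML and existing LLM-based methods?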

Key Findings

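Across 13 classification datasets, with both MLP and XGBoost evaluators, the proposed framework consistently outperforms AutoML and other LLM-based baselines.
Even without human input, the method achieves notable error-rate reductions; with human feedback, the reductions become larger across tasks.
LLM-based feature engineering methods generally outperform traditional non-LLM AutoML approaches.
Selection based on explicit utility and uncertainty estimates is more efficient than black-box LLM optimization.
Selective human preference feedback yields consistent performance gains and reduces human cognitive load in the feature engineering workflow.
On a proprietary transformation dataset, under the same iteration budget, the method achieves a higher AUROC than the OCTree baseline.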

This review was generated by AI and reviewed by a human editor.