QUICK REVIEW

[Paper Review] Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation

Xinyuan Wang, Kunpeng Liu | arXiv (Cornell University) | Feb 13, 2026

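Domain Adaptation and Few-Shot Learning | Citations: 0

One-Sentence Summary

This paper proposes a data-centric closed-loop framework that evolves LLM-driven demonstrations to optimize feature transformation for tabular data, improving downstream performance over baselines and making it more stable.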

ABSTRACT

Feature Transformation (FT) is a core data-centric AI task that improves feature space quality to advance downstream predictive performance. However, discovering effective transformations remains challenging due to the large space of feature-operator combinations. Existing solutions rely on discrete search or latent generation, but they are frequently limited by sample inefficiency, invalid candidates, and redundant generations with limited coverage. Large Language Models (LLMs) offer strong priors for producing valid transformations, but current LLM-based FT methods typically rely on static demonstrations, resulting in limited diversity, redundant outputs, and weak alignment with downstream objectives. We propose a framework that optimizes context data for LLM-driven FT by evolving trajectory-level experiences in a closed loop. Starting from high-performing feature transformation sequences explored by reinforcement learning, we construct and continuously update an experience library of downstream task-verified transformation trajectories, and use a diversity-aware selector to form contexts along with a chain-of-thought and guide transformed feature generation toward higher performance. Experiments on diverse tabular benchmarks show that our method outperforms classical and LLM-based baselines and is more stable than one-shot generation. The framework generalizes across API-based and open-source LLMs and remains robust across downstream evaluators.

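Motivation and Objectives

Optimize feature transformation on tabular data to improve downstream predictive performance.
Reduce invalid and redundant transformations by learning task-aligned demonstrations.
Guide the LLM toward better transformations using evolving CoT-style few-shot contexts.
Demonstrate the method's stability and transferability across LLMs and evaluators.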

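Proposed Method

Represent feature transformations as postfix token sequences, which shrinks the search space and ensures executability.
Use RL to explore high-performing transformation sequences and build the initial experience library.
Produce reusable, diverse demonstrations through a three-step refinement: validity checking, CoT trajectory construction, and entropy-based diversity selection.
In Stage III, contexts built from the evolving experience library guide the LLM to generate improved transformation sequences; the results are verified and written back to the library.
Evaluate on diverse tabular datasets with a fixed downstream model and consistent metrics, comparing against classical FT baselines and other LLM-based methods.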
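
To make the postfix representation and the validity check concrete, here is a minimal sketch of how a postfix token sequence could be executed against feature columns. The operator vocabulary, the function name `eval_postfix`, and the column names `f1` and `f2` are illustrative assumptions; this review does not list the paper's actual operator set or implementation.

```python
import math

# Hypothetical operator vocabulary (an assumption, not taken from the paper).
UNARY = {
    "log": lambda x: math.log(abs(x) + 1.0),
    "sqrt": lambda x: math.sqrt(abs(x)),
    "square": lambda x: x * x,
}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # guarded division
}


def eval_postfix(tokens, columns):
    """Evaluate one postfix token sequence column-wise.

    Returns the new feature column, or None when the sequence is invalid
    (unknown token, too few operands, or leftover operands).
    """
    stack = []
    for tok in tokens:
        if tok in columns:
            stack.append(list(columns[tok]))
        elif tok in UNARY:
            if len(stack) < 1:
                return None
            col = stack.pop()
            stack.append([UNARY[tok](v) for v in col])
        elif tok in BINARY:
            if len(stack) < 2:
                return None
            right = stack.pop()
            left = stack.pop()
            stack.append([BINARY[tok](a, b) for a, b in zip(left, right)])
        else:
            return None  # unknown token
    return stack[0] if len(stack) == 1 else None


columns = {"f1": [1.0, 2.0, 3.0], "f2": [4.0, 5.0, 6.0]}
new_feature = eval_postfix(["f1", "f2", "*", "sqrt"], columns)  # sqrt(|f1 * f2|)
invalid = eval_postfix(["f1", "+"], columns)  # binary operator with one operand
```

Because a postfix sequence is valid exactly when the stack never underflows and ends with a single entry, a check like this can reject malformed candidates before any downstream model is trained on them.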

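Experimental Results

Research Questions

RQ1: Does the data-centric experience-evolution framework improve downstream performance over baselines on tabular datasets?
RQ2: Does closed-loop write-back yield gains over one-shot generation?
RQ3: How much does each stage (RL exploration, refinement, context use) contribute to performance, and is the CoT organization essential?
RQ4: Does the method transfer across policy LLMs (API-based and open-source) and stay robust to the choice of downstream evaluator?
RQ5: What cost-performance trade-off and what LLM behavior during feature transformation are observed?

Key Findings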

Dataset	Source	Task	Samples	Features	Original	RDG	PCA	LDA	ERG	AFAT	AutoFeat	NFS	TTG	GRFG	MOAT	OpenFE	CAAFE	FeatLLM	ELLM-FT	Ours
Amazon Employee	Kaggle	C	32769	9	93.37%	92.31%	92.29%	91.64%	92.43%	92.97%	93.29%	93.21%	92.79%	93.02%	93.13%	93.44%	91.41%	93.62%	93.17%	94.41%
German Credit	UCIrvine	C	1000	24	74.20%	68.01%	67.92%	63.91%	74.43%	68.32%	74.86%	68.67%	64.51%	68.29%	72.44%	74.50%	59.92%	76.35%	76.39%	85.32%
Higgs Boson	UCIrvine	C	50000	28	69.66%	67.51%	53.45%	51.32%	69.02%	69.70%	67.35%	69.17%	68.99%	69.77%	69.66%	61.26%	70.35%	69.66%	72.29%
Ionosphere	UCIrvine	C	351	34	93.37%	91.17%	92.87%	65.53%	92.02%	92.87%	93.37%	91.17%	90.31%	93.16%	95.69%	93.37%	96.01%	97.14%	%
Lymphography	UCIrvine	C	148	18	83.19%	79.36%	70.38%	70.38%	83.73%	82.38%	79.26%	85.25%	82.38%	85.51%	88.38%	83.73%	75.00%	85.24%	90.54%	95.07%
Messidor Feature	UCIrvine	C	1151	19	69.09%	62.38%	67.21%	47.52%	66.90%	66.55%	69.08%	63.77%	66.46%	69.24%	73.02%	69.09%	66.10%	72.62%	74.80%	76.98%
PimaIndian	Kaggle	C	768	8	80.68%	76.04%	63.80%	63.80%	76.17%	76.56%	80.86%	74.87%	74.48%	75.39%	80.73%	80.86%	79.86%	89.66%	89.66%	93.29%
Spam Base	UCIrvine	C	4601	57	94.53%	90.61%	81.66%	88.89%	91.70%	91.20%	94.54%	92.50%	91.91%	92.20%	92.90%	94.53%	88.51%	95.03%	96.68%	96.19%
SpectF	UCIrvine	C	267	44	76.06%	76.03%	70.92%	66.29%	75.66%	76.03%	76.06%	79.40%	76.03%	81.65%	86.95%	76.06%	70.60%	80.07%	86.14%	87.16%
SVMGuide3	LibSVM	C	1243	21	81.85%	78.68%	67.60%	65.24%	82.62%	79.49%	83.05%	79.16%	79.81%	81.17%	81.74%	81.85%	75.30%	82.54%	82.70%	87.68%
UCI Credit	UCIrvine	C	30000	23	79.29%	80.32%	73.27%	74.37%	80.16%	80.32%	79.72%	80.13%	79.81%	80.67%	80.87%	80.11%	76.80%	76.39%	79.29%	80.88%
Wine Quality Red	UCIrvine	C	999	11	60.95%	46.65%	42.21%	43.31%	46.10%	48.05%	62.52%	46.21%	46.71%	47.01%	62.10%	53.71%	51.74%	62.65%	61.11%	68.59%
Wine Quality White	UCIrvine	C	4898	11	54.75%	52.41%	43.01%	44.94%	51.04%	51.67%	54.26%	52.51%	53.12%	53.41%	54.52%	54.75%	42.82%	56.87%	55.03%	66.95%
Airfoil	UCIrvine	R	1503	5	0.5749	0.5193	0.2730	0.2201	0.5193	0.5210	0.5746	0.5193	0.5003	0.5587	0.5967	0.5746	N/A	0.5877	0.6174	0.7594
Housing Boston	Kaggle	R	506	13	0.4148	0.4043	0.1048	0.0201	0.4090	0.4161	0.4149	0.4251	0.3967	0.4043	0.4463	0.4148	N/A	0.4442	0.4564	0.7295
Openml 586	OpenML	R	1000	25	0.6311	0.5681	0.1109	0.1109	0.6147	0.5435	0.6329	0.5443	0.5443	0.5768	0.6251	0.6311	N/A	0.6477	0.6328	0.7406
Openml 589	OpenML	R	1000	25	0.5388	0.5091	0.0112	0.0112	0.5103	0.5087	0.5423	0.5053	0.5032	0.5047	0.5139	0.5388	N/A	0.5545	0.5836	0.6602
Openml 607	OpenML	R	1000	50	0.6207	0.5208	0.1071	0.1071	0.5553	0.5158	0.6191	0.5194	0.5222	0.6021	0.6051	0.6207	N/A	0.5608	0.6089	0.7408
Openml 616	OpenML	R	500	50	0.3736	0.0701	0.0242	0.0241	0.1937	0.1489	0.3924	0.1667	0.1567	0.3722	0.4063	0.3736	N/A	0.3836	0.4082	0.5789
Openml 618	OpenML	R	1000	50	0.4402	0.3720	0.1016	0.0521	0.3561	0.2472	0.4407	0.3473	0.3467	0.4562	0.4734	0.4402	N/A	0.4597	0.4734	0.6546
Openml 620	OpenML	R	1000	25	0.6434	0.5111	0.1138	0.0293	0.5466	0.5267	0.6576	0.5130	0.5123	0.5591	0.5722	0.6434	N/A	0.5725	0.6203	0.6925
Openml 637	OpenML	R	500	50	0.3162	0.1364	0.0352	0.0433	0.1521	0.1758	0.3251	0.1521	0.1439	0.2071	0.2125	0.3162	N/A	0.2945	0.2946	0.5471
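
As a worked illustration of the average-rank comparison, the sketch below ranks four of the sixteen method columns on three rows copied from the table above. It assumes that higher scores are better for both the classification and regression rows (the review does not name the metrics) and it does not handle ties. The subset is only illustrative and does not reproduce the paper's full ranking.

```python
# Values copied from three rows of the table above (subset of method columns).
results = {
    "German Credit": {"Original": 74.20, "MOAT": 72.44, "ELLM-FT": 76.39, "Ours": 85.32},
    "Spam Base": {"Original": 94.53, "MOAT": 92.90, "ELLM-FT": 96.68, "Ours": 96.19},
    "Airfoil": {"Original": 0.5749, "MOAT": 0.5967, "ELLM-FT": 0.6174, "Ours": 0.7594},
}


def average_ranks(results):
    """Rank methods per dataset (1 = best, higher score assumed better), then average."""
    totals = {}
    for scores in results.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, method in enumerate(ordered, start=1):
            totals[method] = totals.get(method, 0) + rank
    return {method: total / len(results) for method, total in totals.items()}


avg = average_ranks(results)
best = min(avg, key=avg.get)
```

On this subset, "Ours" ranks first on two datasets and second on Spam Base (behind ELLM-FT), for an average rank of 4/3, which is still the lowest of the four.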

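The method achieves the best average rank across the classification and regression benchmarks compared with classical FT and other LLM-based baselines.
Closed-loop write-back delivers more stable and higher final performance than one-shot generation under the same budget.
The three-step refinement substantially improves reliability and coverage, and CoT organization and diversity control both contribute important gains.
The method transfers across multiple policy LLMs (API-based and open-source) and performs robustly across evaluators.
Ablations show that moderately increasing the initial RL experience improves early coverage, after which refinement and write-back produce the gains. Entropy-based selection increases diversity and reduces redundancy.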
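
One way entropy-based selection could reduce redundancy is sketched below: demonstrations are picked greedily so that the pooled token distribution of the selected set has maximal Shannon entropy, with ties broken by downstream score. The greedy criterion, the function names, and the `tokens`/`score` record layout are assumptions for illustration; the review does not specify the paper's exact selection rule.

```python
import math
from collections import Counter


def token_entropy(trajectories):
    """Shannon entropy (in bits) of the pooled token distribution."""
    counts = Counter(tok for traj in trajectories for tok in traj)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_diverse(library, k):
    """Greedily pick up to k entries, each maximizing pooled entropy (ties: higher score)."""
    selected = []
    remaining = list(library)
    while remaining and len(selected) < k:
        pool = [s["tokens"] for s in selected]
        best = max(
            remaining,
            key=lambda e: (token_entropy(pool + [e["tokens"]]), e["score"]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical experience library: postfix token sequences with downstream scores.
library = [
    {"tokens": ["f1", "f2", "*"], "score": 0.80},
    {"tokens": ["f1", "f2", "*"], "score": 0.79},  # redundant with the first entry
    {"tokens": ["f3", "log"], "score": 0.75},
]
picked = select_diverse(library, 2)
```

Here the second pick is the lower-scoring `f3 log` trajectory rather than the near-duplicate, because adding it raises the pooled entropy from log2(3) to log2(5) bits while the duplicate leaves it unchanged.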

This review was generated by AI and checked by a human editor.