QUICK REVIEW

[Paper Review] Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models

Xing Wang, Wei Song|arXiv (Cornell University)|Jan 1, 2026

Advanced Graph Neural Networks|Citations: 0

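One-line summary

The paper introduces a distillation framework that uses large language models as teachers to transfer temporal reasoning over temporal knowledge graphs to lightweight student models, improving the performance of small models on benchmark datasets.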

ABSTRACT

Reasoning over temporal knowledge graphs (TKGs) is fundamental to improving the efficiency and reliability of intelligent decision-making systems and has become a key technological foundation for future artificial intelligence applications. Despite recent progress, existing TKG reasoning models typically rely on large parameter sizes and intensive computation, leading to high hardware costs and energy consumption. These constraints hinder their deployment on resource-constrained, low-power, and distributed platforms that require real-time inference. Moreover, most existing model compression and distillation techniques are designed for static knowledge graphs and fail to adequately capture the temporal dependencies inherent in TKGs, often resulting in degraded reasoning performance. To address these challenges, we propose a distillation framework specifically tailored for temporal knowledge graph reasoning. Our approach leverages large language models as teacher models to guide the distillation process, enabling effective transfer of both structural and temporal reasoning capabilities to lightweight student models. By integrating large-scale public knowledge with task-specific temporal information, the proposed framework enhances the student model's ability to model temporal dynamics while maintaining a compact and efficient architecture. Extensive experiments on multiple publicly available benchmark datasets demonstrate that our method consistently outperforms strong baselines, achieving a favorable trade-off between reasoning accuracy, computational efficiency, and practical deployability.

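Motivation and objectives

Motivated by the need for efficient temporal knowledge graph (TKG) reasoning on resource-constrained devices.
Proposes a two-stage distillation framework that uses both a conventional teacher and large language models (LLMs) to guide a lightweight student model.
Incorporates temporal dynamics and public knowledge through LLM-aware distillation, improving accuracy and deployability.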

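Proposed method

Uses a teacher-student distillation setup: the teacher is a high-capacity TKG model, and an LLM acts as a secondary teacher.
Training proceeds in two distillation stages: the student is first aligned with the conventional teacher, then with the LLM's predictions.
The loss has three terms: L1, a teacher-student distillation loss (encoder-decoder alignment); L2, a robust LLM-student distillation loss based on the Huber loss; and L3, a supervised loss against ground-truth labels. The total is Ltotal = L1 + α*L2 plus the LLM-based L3 term, which includes a β factor.
The LLM encodes entity-relation semantics. Prediction scores integrate LLM embeddings into softmax-based scoring, and the supervised loss uses MSE against soft targets derived from the LLM-driven scores.
Experiments use TTransE and TADistMult as backbone models on YAGO11k and WIKIdata12k and compare against the BKD, FitNet, and RKD baselines. Evaluation metrics are MRR, MR, and Hits@k.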
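
The three-term objective described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the review does not give the exact form of L1, so mean squared error between score vectors is assumed; β is assumed to multiply L3; and the function names, the Huber threshold `delta`, and the default weights are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def huber(residual, delta=1.0):
    """Elementwise Huber loss: quadratic near zero, linear in the tails."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * residual ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)

def teacher_student_loss(student_scores, teacher_scores):
    # L1 (assumed form): MSE between student and conventional-teacher scores.
    return float(np.mean((student_scores - teacher_scores) ** 2))

def llm_student_loss(student_scores, llm_scores, delta=1.0):
    # L2: Huber loss between student scores and LLM-derived scores.
    return float(np.mean(huber(student_scores - llm_scores, delta)))

def supervised_loss(student_scores, soft_targets):
    # L3: MSE between softmax-normalized student scores and soft targets.
    return float(np.mean((softmax(student_scores) - soft_targets) ** 2))

def total_loss(student, teacher, llm, soft_targets, alpha=0.5, beta=0.5):
    # Assumed combination: L_total = L1 + alpha * L2 + beta * L3.
    l1 = teacher_student_loss(student, teacher)
    l2 = llm_student_loss(student, llm)
    l3 = supervised_loss(student, soft_targets)
    return l1 + alpha * l2 + beta * l3

# Toy scores for one query over three candidate entities (made-up values).
student = np.array([[2.0, 1.0, 0.1]])
teacher = np.array([[2.5, 0.8, 0.0]])
llm = np.array([[4.0, 1.0, 0.2]])
targets = softmax(llm)
print(total_loss(student, teacher, llm, targets))
```

The Huber term grows only linearly for large residuals, which is the usual reason to prefer it when the secondary teacher's scores may be noisy.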

Figure (a): Traditional knowledge distillation method

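Experimental results

Research questions

RQ1: Can large language models improve distillation-based temporal reasoning in lightweight TKG models?
RQ2: How does incorporating LLM-based guidance affect performance relative to conventional distillation baselines on standard TKG benchmarks?
RQ3: What is the trade-off between model size, computational cost, and reasoning accuracy when using LLM-guided distillation?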

Key findings

Model	Method	MRR (YAGO)	MR (YAGO)	Hits@1 (YAGO)	Hits@3 (YAGO)	Hits@10 (YAGO)	MRR (WIKI)	MR (WIKI)	Hits@1 (WIKI)	Hits@3 (WIKI)	Hits@10 (WIKI)
TTransE	BKD	7.65	1410.12	3.50	7.83	15.61	7.94	2383.67	4.75	8.22	14.04
TTransE	FitNet	7.59	1201.69	3.06	7.18	16.48	7.86	2148.86	3.93	7.78	14.67
TTransE	RKD	7.01	1186.27	3.56	6.95	13.47	7.89	2052.37	4.72	7.49	12.85
TTransE	Ours	7.69	1193.15	3.61	7.89	16.57	7.92	1985.63	4.86	8.36	14.94
TADistMult	BKD	61.90	973.89	58.51	64.13	67.59	45.89	3150.11	42.46	48.87	51.18
TADistMult	FitNet	58.44	986.92	54.71	60.29	65.34	43.92	3158.20	39.77	47.38	50.18
TADistMult	RKD	58.15	1089.57	54.48	61.72	65.17	42.72	3287.49	36.32	43.92	47.28
TADistMult	Ours	61.87	965.35	58.73	64.15	67.68	46.03	3142.85	42.50	49.16	51.14
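
For readers unfamiliar with the metrics in the table, the sketch below computes MRR, MR, and Hits@k from a list of ranks, where each rank is the position of the correct entity among all candidates for one test query. The function name and the example ranks are hypothetical; the table appears to report MRR and Hits@k as percentages, whereas this sketch returns fractions. Lower is better for MR, higher for the others.

```python
import numpy as np

def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR, MR, and Hits@k from 1-based ranks of the correct answers."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {
        "MRR": float(np.mean(1.0 / ranks)),  # mean reciprocal rank
        "MR": float(np.mean(ranks)),         # mean rank
    }
    for k in ks:
        # Fraction of queries whose correct answer is ranked in the top k.
        metrics[f"Hits@{k}"] = float(np.mean(ranks <= k))
    return metrics

# Made-up ranks for four test queries.
print(ranking_metrics([1, 2, 5, 20]))
```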

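The paper reports that the proposed distillation method consistently outperforms the conventional distillation baselines (BKD, FitNet, RKD) on both datasets. In the table above it leads on most metrics, with a few narrow exceptions such as TADistMult MRR on YAGO (61.87 vs. 61.90 for BKD).
For TTransE, relative to BKD on YAGO, MRR improves by 0.5%, MR by 15.4%, Hits@1 by 3.1%, Hits@3 by 0.8%, and Hits@10 by 6.1%. On WIKI, MR, Hits@1, Hits@3, and Hits@10 improve over BKD, while MRR is marginally lower in the table (7.92 vs. 7.94).
For TADistMult, the paper reports average metric improvements of 2.77% on YAGO and 3.28% on WIKI, with the best Hits@1 and Hits@3.
The approach achieves the best Hits@1 and Hits@3 for both backbones, indicating strong top-rank precision in temporal link prediction.
Ablation experiments show that incorporating LLM-based knowledge distillation yields robust gains over BKD, supporting the value of transferring temporal reasoning to lightweight models.
RKD fluctuates in some settings; the authors suggest that longer training may mitigate this instability.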
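
The TTransE percentages quoted above are relative changes with respect to BKD, which can be checked directly from the YAGO columns of the table. The helper below is a hypothetical illustration; the numbers are copied from the TTransE rows, and MR is treated as lower-is-better.

```python
def relative_improvement(baseline, ours, higher_is_better=True):
    """Relative change in percent, signed so that positive means better."""
    change = (ours - baseline) if higher_is_better else (baseline - ours)
    return 100.0 * change / baseline

# TTransE on YAGO, values taken from the results table.
bkd = {"MRR": 7.65, "MR": 1410.12, "Hits@1": 3.50, "Hits@3": 7.83, "Hits@10": 15.61}
ours = {"MRR": 7.69, "MR": 1193.15, "Hits@1": 3.61, "Hits@3": 7.89, "Hits@10": 16.57}

gains = {
    name: relative_improvement(bkd[name], ours[name], higher_is_better=(name != "MR"))
    for name in bkd
}
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}%")
```

Because the absolute TTransE scores are small, a relative gain of a few percent corresponds to an absolute difference of about 0.1 point or less on MRR, Hits@1, and Hits@3.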

Figure (b): Large language model based distillation method


This review was generated by AI and checked by a human editor.