QUICK REVIEW

[Paper Review] Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels

Sina Tavakolian, Nhan Thanh Nguyen | arXiv (Cornell University) | Feb 4, 2026

Millimeter-Wave Propagation and Modeling | Citations: 0

One-Line Summary

Summary: The paper uses knowledge distillation from a large teacher model to train compact student models that predict mmWave beams from sub-6 GHz channels, achieving near-teacher accuracy with about 99% fewer parameters and FLOPs.

ABSTRACT

Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for sub-6 GHz channel-mmWave beam mapping based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.

Motivation and Objectives

Motivate predicting optimal mmWave beams from sub-6 GHz channels to avoid exhaustive beam training.
Propose a KD-based framework to transfer knowledge from a large teacher to lightweight students.
Develop and compare IKD and RKD strategies plus self-distillation for compact beamformers.
Demonstrate that compact student models achieve near-teacher beam accuracy and spectral efficiency.
Quantify complexity reduction and validate on DeepMIMO-based datasets.

Proposed Method

Train a high-capacity teacher model to map sub-6 GHz channel inputs to mmWave beam classes using cross-entropy loss.
Distill knowledge into lightweight student models via two KD strategies: individual KD (IKD) and relational KD (RKD).
IKD combines teacher soft targets and ground-truth labels with a temperature parameter to guide the student.
RKD preserves relational structure by matching distance- and angle-based features between teacher and student across minibatches.
Optionally include self-distillation where teacher and student share architecture for comparison.
Assess performance via beam prediction accuracy and spectral efficiency across SNRs.
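
The two distillation losses described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the IKD loss follows the widely used soft-target formulation (temperature-scaled KL divergence plus cross-entropy), and the RKD part covers only the distance-based term with a Huber penalty, leaving out the angle-based term. The function names, the weight `alpha`, the `temperature` default, and the Huber `delta` are hypothetical choices made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def ikd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Assumed IKD loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Cross-entropy against the ground-truth beam index (temperature 1).
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


def normalized_pairwise_distances(embeddings):
    """Pairwise Euclidean distances within a minibatch, divided by their mean."""
    n = len(embeddings)
    dists = [math.dist(embeddings[i], embeddings[j])
             for i in range(n) for j in range(i + 1, n)]
    mean = sum(dists) / len(dists)
    return [d / mean for d in dists] if mean > 0 else dists


def huber(x, y, delta=1.0):
    """Huber penalty between two scalars."""
    diff = abs(x - y)
    return 0.5 * diff ** 2 if diff <= delta else delta * (diff - 0.5 * delta)


def rkd_distance_loss(student_emb, teacher_emb):
    """Assumed distance-based RKD term: match normalized pairwise distances."""
    d_s = normalized_pairwise_distances(student_emb)
    d_t = normalized_pairwise_distances(teacher_emb)
    return sum(huber(s, t) for s, t in zip(d_s, d_t)) / len(d_t)


# Toy usage with made-up logits over four beams and three 2-D embeddings.
print(ikd_loss([0.2, 1.5, 0.1, -0.3], [0.0, 2.0, 0.5, -1.0], label=1))
print(rkd_distance_loss([[0, 0], [2, 0], [0, 2]], [[0, 0], [1, 0], [0, 1]]))
```

Because the distances are normalized by their minibatch mean, the relational term compares the shape of the embedding geometry rather than its scale, so student and teacher embeddings do not need to share a dimension.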

Experimental Results

Research Questions

RQ1: Can lightweight student models replicate the teacher's beam prediction accuracy and SE using sub-6 GHz inputs?
RQ2: How do IKD and RKD compare in preserving performance while reducing model size and complexity?
RQ3: What is the trade-off between distillation temperature, loss weights, and predictive performance?
RQ4: How much complexity reduction is achievable without significant degradation in top-k beam accuracy and SE?
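
Since the questions above are framed in terms of top-k beam accuracy, a short sketch of that metric may help. It assumes the usual definition, where a prediction counts as correct if the true beam index is among the k highest-scoring beams. The function name and the toy scores are hypothetical, and the review does not state how the paper computes the metric.

```python
def top_k_accuracy(score_rows, true_beams, k=3):
    """Fraction of samples whose true beam is among the k highest-scoring beams."""
    hits = 0
    for scores, true_beam in zip(score_rows, true_beams):
        # Beam indices sorted from highest to lowest predicted score.
        ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
        if true_beam in ranked[:k]:
            hits += 1
    return hits / len(true_beams)


# Toy usage: three samples, three candidate beams each (made-up scores).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```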

Key Findings

Model	Parameters	FLOPs
Teacher	3,477,568	6,940,816
Students	24,768	49,152
Reduction (%)	≈99.29%	≈99.29%

Student models reduce trainable parameters and FLOPs by about 99% compared to the teacher.
IKD and RKD students achieve near-teacher beam prediction accuracy (Top-1 and Top-3) and SE across tested SNRs.
Self-distilled student achieves faster convergence and matches or surpasses the teacher in accuracy and SE.
RKD slightly outperforms IKD in most metrics, with both outperforming the non-distilled baseline.
All KD-based students retain over 95% of the teacher’s SE at typical SNRs (e.g., 15 dB).
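
The reduction figures in the table above can be reproduced directly from the reported parameter and FLOP counts. The helper name `reduction_percent` is introduced here only for illustration.

```python
def reduction_percent(teacher_count, student_count):
    """Relative reduction of the student versus the teacher, in percent."""
    return 100.0 * (1.0 - student_count / teacher_count)


# Counts taken from the table in this review.
param_reduction = reduction_percent(3_477_568, 24_768)
flop_reduction = reduction_percent(6_940_816, 49_152)

print(f"Parameters: {param_reduction:.2f}%")  # prints "Parameters: 99.29%"
print(f"FLOPs: {flop_reduction:.2f}%")        # prints "FLOPs: 99.29%"
```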

This review was generated by AI and checked by a human editor.