QUICK REVIEW

[Paper Review] LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models

Pengcheng Zheng, Chaoning Zhang | arXiv (Cornell University) | Jan 28, 2026

Multimodal Machine Learning Applications | Cited by 0

One-Sentence Summary

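LLaVA-FA jointly applies low-rank approximation and quantization in the Fourier (frequency) domain, and uses PolarQuant plus optional diagonal calibration to reduce calibration-data requirements, delivering strong performance with fewer trainable parameters and lower computational cost.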

ABSTRACT

Large multimodal models (LMMs) have achieved impressive performance on various vision-language tasks, but their substantial computational and memory costs hinder their practical deployment. Existing compression methods often decouple low-rank decomposition and quantization, leading to compounded reconstruction errors, especially in multimodal architectures with cross-modal redundancy. To address this issue, we propose LLaVA-FA, a novel efficient LMM that performs joint low-rank plus quantization approximation in the frequency domain. By leveraging the de-correlation and conjugate symmetry properties of Fourier transform, LLaVA-FA achieves more compact and accurate weight representations. Furthermore, we introduce PolarQuant, a polar-coordinate quantization method tailored for complex matrices, and an optional diagonal calibration (ODC) scheme that eliminates the need for large-scale calibration data. Extensive experimental results demonstrate that our proposed LLaVA-FA outperforms existing efficient multimodal models across multiple benchmarks while maintaining minimal activated parameters and low computational costs, validating its effectiveness as a powerful solution for compressing LMMs.
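The abstract relies on the conjugate symmetry of the Fourier transform. As a minimal NumPy illustration (not the authors' code), the sketch below checks that the 2D-DFT of a real matrix satisfies F[k, l] = conj(F[-k mod m, -l mod n]), and that NumPy's `rfft2`, which keeps only n // 2 + 1 columns, is enough to reconstruct the weights exactly. The random 8×16 matrix is a stand-in for a real weight; how LLaVA-FA actually stores the non-redundant half of the spectrum is not stated in this review.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))       # stand-in for a real-valued weight matrix
m, n = W.shape

F = np.fft.fft2(W)                     # full 2D-DFT: m x n complex coefficients

# Conjugate symmetry of a real input: F[k, l] == conj(F[-k mod m, -l mod n]).
k = np.arange(m)[:, None]
l = np.arange(n)[None, :]
assert np.allclose(F, np.conj(F[(-k) % m, (-l) % n]))

# rfft2 stores only the non-redundant columns: n // 2 + 1 of them.
F_half = np.fft.rfft2(W)
print(F.shape, F_half.shape)           # (8, 16) (8, 9)

# The half spectrum is enough to recover the real weights exactly.
W_rec = np.fft.irfft2(F_half, s=W.shape)
print(np.allclose(W, W_rec))           # True
```

For large n, the stored complex coefficients approach half of the full spectrum, which is the sense in which the parameter count is "roughly halved".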

Motivation and Objectives

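Establish the need for scalable, data-efficient compression of large multimodal models (LMMs)
Propose a joint low-rank + quantization framework in the Fourier domain that reduces memory and compute while preserving accuracy
Introduce PolarQuant for quantizing complex-valued weights, and optional diagonal calibration (ODC) to avoid large calibration sets
Show that Fourier-based compression beats spatial-domain methods in reconstruction error and parameter count
Demonstrate that the method scales across model sizes while remaining efficient on benchmark tasks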

Proposed Method

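Decompose each weight matrix, in the frequency domain, into a low-rank complex component plus a quantized approximation of the residual: $\tilde{W} \approx \tilde{Q} + \tilde{L}_1 \tilde{L}_2$
Apply a 2D discrete Fourier transform (2D-DFT) to map the real-valued weights to a complex frequency-domain representation $\tilde{W}$; because of conjugate symmetry, only about half of the complex coefficients need to be kept
Obtain $\tilde{L}_1$ and $\tilde{L}_2$ from the residual via FourierSVD, keeping only the top $r$ singular values
Quantize the residual with PolarQuant, which handles complex matrices by discretizing magnitude and phase in polar coordinates ($b_r$ bits for magnitude, $b_\theta$ bits for phase)
Optionally apply diagonal calibration (ODC), which approximates the Hessian structure with row/column calibration means and uses it to weight the objective, so no large calibration set is needed
Refine the decomposition by alternating optimization: $\tilde{L}_1, \tilde{L}_2 \leftarrow$ ODC-weighted low-rank fit of $\tilde{W} - \tilde{Q}$, then $\tilde{Q} \leftarrow \mathrm{PolarQuant}(\tilde{W} - \tilde{L}_1 \tilde{L}_2)$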
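A minimal NumPy sketch of the alternating scheme in the method steps, under explicit assumptions rather than the paper's exact recipe: the full orthonormal 2D-DFT (`np.fft.fft2(..., norm="ortho")`) is used instead of a half spectrum; `polar_quant` is a hypothetical uniform quantizer for magnitude ($b_r$ bits, scaled by the matrix maximum) and phase ($b_\theta$ bits); and ODC is modeled as hypothetical positive row/column weights `d_row`, `d_col` passed in directly, because this review does not say how the calibration means become weights. With two-sided diagonal weights the weighted rank-$r$ fit has a closed form: scale, truncate the SVD (Eckart-Young), then unscale.

```python
import numpy as np


def polar_quant(Z, b_r=4, b_theta=4):
    """Hypothetical PolarQuant sketch: uniform codes for magnitude and phase."""
    r, theta = np.abs(Z), np.angle(Z)          # polar coordinates; theta in [-pi, pi]
    r_max = r.max() if r.size and r.max() > 0 else 1.0
    r_steps = 2 ** b_r - 1                     # 2**b_r magnitude levels in [0, r_max]
    r_q = np.round(r / r_max * r_steps) / r_steps * r_max
    step = 2 * np.pi / 2 ** b_theta            # 2**b_theta equally spaced phases
    codes = np.round(theta / step).astype(int) % 2 ** b_theta
    return r_q * np.exp(1j * codes * step)


def weighted_low_rank(R, rank, d_row, d_col):
    """Exact rank-r minimizer of ||diag(d_row) (R - L1 @ L2) diag(d_col)||_F.

    Weights must be strictly positive (hypothetical stand-in for ODC).
    """
    S = d_row[:, None] * R * d_col[None, :]    # scale rows and columns
    U, s, Vh = np.linalg.svd(S, full_matrices=False)
    L1 = (U[:, :rank] * s[:rank]) / d_row[:, None]  # undo row scaling
    L2 = Vh[:rank, :] / d_col[None, :]               # undo column scaling
    return L1, L2


def fourier_lq(W, rank=4, b_r=4, b_theta=4, iters=5, d_row=None, d_col=None):
    """Alternate between the low-rank fit and PolarQuant on the 2D-DFT of W."""
    m, n = W.shape
    d_row = np.ones(m) if d_row is None else np.asarray(d_row, dtype=float)
    d_col = np.ones(n) if d_col is None else np.asarray(d_col, dtype=float)
    W_f = np.fft.fft2(W, norm="ortho")         # complex frequency-domain weights
    Q = np.zeros_like(W_f)
    for _ in range(iters):
        L1, L2 = weighted_low_rank(W_f - Q, rank, d_row, d_col)
        Q = polar_quant(W_f - L1 @ L2, b_r, b_theta)
    # The sum need not be exactly conjugate-symmetric, so keep the real part.
    W_hat = np.fft.ifft2(Q + L1 @ L2, norm="ortho").real
    return W_hat, L1, L2, Q


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 12))
W_hat, L1, L2, Q = fourier_lq(W, rank=4)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative Frobenius error: {rel_err:.3f}")
```

With all-ones weights the low-rank step reduces to a plain truncated SVD of the complex residual, i.e. the FourierSVD step; non-uniform `d_row`/`d_col` shift the fitting error toward rows and columns with small weights.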

Experimental Results

Research Questions

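RQ1: Does joint frequency-domain low-rank + quantization compress LMM weight matrices more effectively than spatial-domain methods?
RQ2: Can PolarQuant effectively quantize complex-valued frequency-domain weights while preserving cross-modal alignment and accuracy?
RQ3: Can optional diagonal calibration (ODC) remove the need for large calibration sets without sacrificing compression quality?
RQ4: How well does Fourier-domain LMM compression scale across model sizes and benchmark tasks?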

Key Findings

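LLaVA-FA matches or outperforms baselines on multimodal benchmarks while using significantly fewer trainable parameters and less data
PolarQuant enables effective magnitude- and phase-aware quantization of complex matrices, preserving structure and improving reconstruction
Optional diagonal calibration (ODC) provides robustness without requiring a large calibration set
At the same rank, Fourier-domain decomposition yields lower Frobenius reconstruction error than spatial-domain truncation, which the authors attribute to de-correlation and conjugate symmetry
LLaVA-FA scales to 2B-, 3B-, and 7B-scale backbones, with consistent compression gains and robust performance across tasks (including hallucination-related benchmarks)
Inference cost (FLOPs and latency) drops; in the scaling experiments, KV-cache usage is lower and time to first token is shorter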

This review was generated by AI and reviewed by human editors.