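[Paper Explained] Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning
Building on Tighnari, this work mitigates label noise and distribution shift by aggregating pseudo-labels from PO data. It combines stackable tri-modal cross-attention fusion, an asymmetric loss, and a mixture-of-experts framework to handle both in-distribution and out-of-distribution test cases in plant distribution prediction.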
Large-scale, cross-species plant distribution prediction plays a crucial role in biodiversity conservation, yet modeling efforts in this area still face significant challenges due to the sparsity and bias of observational data. Presence-Absence (PA) data provide accurate and noise-free labels, but are costly to obtain and limited in quantity; Presence-Only (PO) data, by contrast, offer broad spatial coverage and rich spatiotemporal distribution, but suffer from severe label noise in negative samples. To address these real-world constraints, this paper proposes a multimodal fusion framework that fully leverages the strengths of both PA and PO data. We introduce an innovative pseudo-label aggregation strategy for PO data based on the geographic coverage of satellite imagery, enabling geographic alignment between the label space and remote sensing feature space. In terms of model architecture, we adopt Swin Transformer Base as the backbone for satellite imagery, utilize the TabM network for tabular feature extraction, retain the Temporal Swin Transformer for time-series modeling, and employ a stackable serial tri-modal cross-attention mechanism to optimize the fusion of heterogeneous modalities. Furthermore, empirical analysis reveals significant geographic distribution shifts between PA training and test samples, and models trained by directly mixing PO and PA data tend to experience performance degradation due to label noise in PO data. To address this, we draw on the mixture-of-experts paradigm: test samples are partitioned according to their spatial proximity to PA samples, and different models trained on distinct datasets are used for inference and post-processing within each partition. Experiments on the GeoLifeCLEF 2025 dataset demonstrate that our approach achieves superior predictive performance in scenarios with limited PA coverage and pronounced distribution shifts.
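The pseudo-label aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each satellite patch covers a square footprint around a site, and that PO records are `(lat, lon, species_id)` tuples. The function name `aggregate_po_labels`, the degree-based box, and the `half_size_deg` value are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import Iterable, Set, Tuple

PORecord = Tuple[float, float, int]  # (latitude, longitude, species_id)


def aggregate_po_labels(
    center: Tuple[float, float],
    po_records: Iterable[PORecord],
    half_size_deg: float = 0.01,  # assumed half-width of the patch footprint
) -> Set[int]:
    """Return the set of species observed inside the patch footprint.

    Every PO record that falls inside the square around `center`
    contributes its species to the multi-label pseudo-label of the patch.
    """
    lat0, lon0 = center
    species: Set[int] = set()
    for lat, lon, species_id in po_records:
        inside_lat = abs(lat - lat0) <= half_size_deg
        inside_lon = abs(lon - lon0) <= half_size_deg
        if inside_lat and inside_lon:
            species.add(species_id)
    return species


records = [(45.0, 5.0, 1), (45.005, 5.003, 2), (46.0, 5.0, 3)]
pseudo_label = aggregate_po_labels((45.0, 5.0), records)
```

Here the first two records fall inside the footprint and the third does not, so the patch receives a pseudo-label containing species 1 and 2. A real pipeline would likely use projected coordinates and the actual patch extent rather than a fixed box in degrees.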
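Research Motivation and Objectives
- Address the sparsity and bias of plant distribution data by using both PA and PO data.
- Propose a weakly supervised pseudo-labeling strategy that aggregates PO labels within each satellite image patch.
- Develop a stackable tri-modal cross-attention fusion module for multimodal data (satellite imagery, tabular features, time series).
- Introduce a mixture-of-experts framework to handle the geographic distribution shift between PA training and test samples.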
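Proposed Method
- Upgrade the satellite imagery backbone to Swin Transformer Base.
- Use TabM as the tabular backbone and retain the Temporal Swin Transformer for time series.
- Add an optional neighborhood label aggregation modality, used only when the training data consist entirely of PA samples.
- Replace the hierarchical cross-attention with a stackable serial tri-modal cross-attention module.
- Adopt the asymmetric loss (ASL) in the multi-label setting to handle label noise and class imbalance.
- Partition test samples by geographic proximity in a Mixture of Experts scheme, and use a different model for each partition.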
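To make the asymmetric loss concrete, here is a minimal pure-Python sketch of the commonly published ASL formulation for multi-label classification. The hyperparameters (`gamma_pos`, `gamma_neg`, `margin`) are assumed defaults for illustration; the values used in Tighnari v2 are not stated in this summary.

```python
import math
from typing import Sequence


def asymmetric_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    gamma_pos: float = 0.0,  # assumed focusing parameter for positives
    gamma_neg: float = 4.0,  # assumed focusing parameter for negatives
    margin: float = 0.05,    # assumed probability shift for negatives
    eps: float = 1e-8,
) -> float:
    """Sum of per-label asymmetric losses for one sample.

    Positives use a focal-style term (1 - p)^gamma_pos * log(p).
    Negatives first shift the probability down by `margin` (clipped at 0),
    so very easy negatives contribute nothing, then apply
    p_m^gamma_neg * log(1 - p_m).
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


easy_negative = asymmetric_loss([0.04], [0])
hard_negative = asymmetric_loss([0.9], [0])
```

With these settings, a negative label predicted at 0.04 falls below the margin and adds no loss, while a confident false positive at 0.9 is penalized heavily. This asymmetry is what lets the loss down-weight the many easy negatives typical of species presence data.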

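Experimental Results
Research Questions
- RQ1: How can PO data be used to build a multimodal plant distribution model without introducing excessive label noise?
- RQ2: Does stackable tri-modal cross-attention fusion outperform earlier cross-attention designs in multimodal fusion?
- RQ3: Does the mixture-of-experts (MoE) approach improve prediction under geographic distribution shift and PO label noise?
- RQ4: How do the backbone upgrades (Swin Base, TabM) and two-stage training affect the integration of PA and PO data?
- RQ5: How effectively does the asymmetric loss balance learning between the many negative samples and the few positive samples in this domain?
Key Findings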
| Model | 2024 Private Score | 2024 Public Score | 2025 Private Score | 2025 Public Score |
|---|---|---|---|---|
| PA Only | 0.36908 | 0.37246 | 0.17290 | 0.20604 |
| PA + PO | 0.33335 | 0.33597 | 0.19107 | 0.21860 |
| MoE | 0.36908 | 0.37246 | 0.21689 | 0.24493 |
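- The pseudo-label aggregation strategy, based on the geographic coverage of satellite patches, reduces PO label noise and aligns the label space with the remote sensing feature space.
- Swin Transformer Base and Temporal Swin Transformer provide stronger feature extraction for satellite imagery and time-series data, respectively, while TabM improves tabular feature representation.
- Stackable serial tri-modal cross-attention fusion outperforms the other fusion methods compared, improving integration across modalities.
- MoE inference with geographic partitioning is more robust to distribution shift and outperforms both the PA-only and the naive PA+PO baselines.
- On GeoLifeCLEF 2025, MoE scores higher than both baselines (private score 0.21689 vs. 0.17290 for PA Only and 0.19107 for PA + PO). On GeoLifeCLEF 2024 it also exceeds the second-place score, indicating effectiveness in both in-distribution and out-of-distribution settings.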
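The geographic partitioning behind the MoE inference can be sketched as a simple routing rule. The sketch assumes two experts, a great-circle distance to the nearest PA training site, and a fixed threshold. The expert names, the `threshold_km` value, and the choice to send nearby samples to the PA-only model are illustrative assumptions, not details confirmed by this summary.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def route_to_expert(
    test_site: Coord,
    pa_sites: Sequence[Coord],
    threshold_km: float = 10.0,  # hypothetical partition threshold
) -> str:
    """Pick an expert based on distance to the nearest PA training site.

    Assumption: samples close to PA coverage go to the PA-only expert,
    and samples far from it go to the expert trained on PA + PO data.
    """
    nearest = min(haversine_km(test_site, site) for site in pa_sites)
    return "pa_only" if nearest <= threshold_km else "pa_plus_po"


pa_sites = [(45.0, 5.0)]
near_choice = route_to_expert((45.01, 5.0), pa_sites)
far_choice = route_to_expert((50.0, 5.0), pa_sites)
```

A test site about 1 km from a PA site is routed to the PA-only expert, while one several hundred kilometres away is routed to the PA + PO expert. With many PA sites, a spatial index would be a more practical replacement for the linear scan.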

This explainer was generated by AI and reviewed by a human editor.