QUICK REVIEW

[Paper Review] CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models

Yasiru Ranasinghe, Nithin Gopalakrishnan Nair|arXiv (Cornell University)|Mar 22, 2023

Video Surveillance and Tracking Methods|Cited by 13

One-Sentence Summary

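CrowdDiff treats crowd density map generation as a conditional denoising diffusion process to produce high-fidelity, narrow-kernel density maps, and improves counting by fusing multiple realizations.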
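To make the diffusion framing concrete in generic terms, the sketch below shows the standard DDPM forward-noising step applied to a density map, plus the inversion that recovers a clean map from a noise estimate. It is a minimal NumPy sketch, not CrowdDiff's code: the linear beta schedule and its endpoints are common DDPM defaults assumed here, the function names are hypothetical, and the image-conditioned denoiser that would supply the noise estimate is not shown.

```python
import numpy as np

def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products abar_t = prod(1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray, rng: np.random.Generator):
    """Forward (noising) step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps

def predict_x0(xt: np.ndarray, eps_hat: np.ndarray, t: int, alpha_bars: np.ndarray) -> np.ndarray:
    """Invert the forward step given a noise estimate (in CrowdDiff, from an image-conditioned denoiser)."""
    a = alpha_bars[t]
    return (xt - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)

rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars()
density = np.zeros((8, 8))
density[3, 4] = 1.0  # toy density map with a single head
xt, eps = q_sample(density, t=500, alpha_bars=alpha_bars, rng=rng)
# With a perfect noise estimate the clean map is recovered up to float error.
print(np.allclose(predict_x0(xt, eps, 500, alpha_bars), density))  # True
```

In practice the denoiser only approximates the noise, which is why repeated sampling yields slightly different density maps.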

ABSTRACT

Crowd counting is a fundamental problem in crowd analysis which is typically accomplished by estimating a crowd density map and summing over the density values. However, this approach suffers from background noise accumulation and loss of density due to the use of broad Gaussian kernels to create the ground truth density maps. This issue can be overcome by narrowing the Gaussian kernel. However, existing approaches perform poorly when trained with ground truth density maps with broad kernels. To deal with this limitation, we propose using conditional diffusion models to predict density maps, as diffusion models show high fidelity to training data during generation. With that, we present $CrowdDiff$ that generates the crowd density map as a reverse diffusion process. Furthermore, as the intermediate time steps of the diffusion process are noisy, we incorporate a regression branch for direct crowd estimation only during training to improve the feature learning. In addition, owing to the stochastic nature of the diffusion model, we introduce producing multiple density maps to improve the counting performance contrary to the existing crowd counting pipelines. We conduct extensive experiments on publicly available datasets to validate the effectiveness of our method. $CrowdDiff$ outperforms existing state-of-the-art crowd counting methods on several public crowd analysis benchmarks with significant improvements.
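The kernel trade-off can be illustrated by building density maps from point annotations: each head contributes one normalized Gaussian kernel, so the map should sum to the head count. The sketch below shows one simple, well-known way density is lost, namely a broad kernel being clipped at the image border. It is only an illustration: the narrow setting mirrors the 3x3, sigma=0.5 kernel listed under the proposed method, while the broad-kernel parameters (15x15, sigma=4.0), the annotations, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Normalized 2-D Gaussian kernel of shape (size, size) that sums to 1 (size must be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def density_map(points, shape, size: int, sigma: float) -> np.ndarray:
    """Add one normalized kernel per (row, col) head annotation.

    Kernel parts that fall outside the image are clipped, so their mass is lost.
    """
    h, w = shape
    k = gaussian_kernel(size, sigma)
    r = size // 2
    dmap = np.zeros(shape)
    for y, x in points:
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        x0, x1 = max(x - r, 0), min(x + r + 1, w)
        # Offsets into the kernel that correspond to the clipped image window.
        dmap[y0:y1, x0:x1] += k[y0 - (y - r):y1 - (y - r), x0 - (x - r):x1 - (x - r)]
    return dmap

points = [(1, 1), (10, 10), (30, 30)]  # hypothetical annotations, two near the border
narrow = density_map(points, (32, 32), size=3, sigma=0.5)
broad = density_map(points, (32, 32), size=15, sigma=4.0)
print(round(narrow.sum(), 3))  # 3.0
print(broad.sum() < 3.0)       # True: border clipping loses density
```

The narrow map also keeps each head as a sharp peak, which is what makes threshold-based counting possible later in the pipeline.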

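Motivation and Objectives

Advance density-map-based crowd counting by using narrow Gaussian kernels, reducing both background noise and the density loss caused by broad kernels.
Propose a conditional diffusion model framework that generates density maps and learns the crowd distribution more faithfully.
Count by thresholding narrow-kernel density maps, and fuse multiple diffusion realizations.
Add an auxiliary regression branch during training to improve count-relevant feature learning.
Demonstrate state-of-the-art performance on several public crowd counting datasets.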
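The training objective combines diffusion noise prediction with supervision from the auxiliary count branch. The review only states that the loss mixes a noise-prediction term and a count term; the sketch below assumes an MSE noise term, an L1 count term, and a hypothetical weight `lam=0.1`, none of which are confirmed by the source. Per the abstract, the count branch is used only during training.

```python
import numpy as np

def hybrid_loss(eps, eps_hat, count_gt, count_pred, lam: float = 0.1) -> float:
    """Noise-prediction MSE plus a weighted L1 count-regression term (assumed form).

    eps / eps_hat: true and predicted diffusion noise (same shape).
    count_gt / count_pred: ground-truth and regressed crowd counts per image.
    lam: hypothetical weight balancing the two terms.
    """
    noise_term = np.mean((np.asarray(eps, float) - np.asarray(eps_hat, float)) ** 2)
    count_term = np.mean(np.abs(np.asarray(count_gt, float) - np.asarray(count_pred, float)))
    return float(noise_term + lam * count_term)

# Toy check: unit noise error (MSE 1.0) and a 10-person count error weighted by 0.1.
print(hybrid_loss(np.zeros(4), np.ones(4), [100], [90]))  # 2.0
```

Keeping the count term small relative to the noise term is one way to let the regression signal shape shared features without dominating the diffusion objective; the actual balance used by the authors is not stated here.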

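Proposed Method

Formulate density map generation as a denoising diffusion process conditioned on the input image.
Generate ground-truth density maps with a narrow Gaussian kernel (3x3, sigma=0.5) to reduce background interference.
Train a denoising U-Net with a hybrid loss that combines noise prediction and a count loss.
During training, add a counting branch that regresses the count from encoder-decoder features.
Generate multiple density map realizations through stochastic diffusion sampling and fuse them to improve counting (crowd map fusion).
During fusion, threshold each density map to obtain a point map, then merge the realizations with an SSIM-guided mechanism that uses a rejection radius to avoid double counting.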
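The fusion step above can be sketched as follows, under stated assumptions: a thresholded density map is turned into points with a 3x3 local-maximum rule, and realizations are merged greedily so that a point from a later realization is kept only if it lies farther than the rejection radius from every point already kept. The SSIM guidance is not reproduced (realizations are merged in the order given), and the function names, threshold, and radius are hypothetical.

```python
import numpy as np

def density_to_points(dmap: np.ndarray, thresh: float) -> np.ndarray:
    """Return (row, col) coordinates of pixels above `thresh` that are 3x3 local maxima."""
    h, w = dmap.shape
    p = np.pad(dmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the 9 shifted views of the padded map (including the pixel itself).
    neigh = np.stack([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    is_peak = (dmap >= neigh.max(axis=0)) & (dmap > thresh)
    return np.argwhere(is_peak)

def fuse_point_maps(point_sets, radius: float) -> np.ndarray:
    """Greedy union: keep a point only if no already-kept point lies within `radius`."""
    kept = []
    for pts in point_sets:
        for pt in pts:
            if all(np.hypot(*(pt - q)) > radius for q in kept):
                kept.append(pt)
    return np.array(kept).reshape(-1, 2)

# Two hypothetical realizations: one head found by both, one found only by the second.
a = np.zeros((16, 16)); a[4, 4] = 0.6
b = np.zeros((16, 16)); b[4, 5] = 0.6; b[12, 12] = 0.6
pts = [density_to_points(m, thresh=0.3) for m in (a, b)]
fused = fuse_point_maps(pts, radius=2.0)
print(len(fused))  # 2: (4, 5) lies within the radius of (4, 4), so it is not double counted
```

The fused count is simply the number of kept points; a plateau of equal maxima would yield several points in this simplified rule.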

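Experimental Results

Research Questions

RQ1: Can a conditional diffusion model generate high-fidelity density maps for crowd images while reducing background noise?
RQ2: Do narrow kernels outperform broad-kernel ground truth in density map fidelity and counting accuracy?
RQ3: Does fusing multiple diffusion realizations yield better counts than a single realization?
RQ4: Does auxiliary regression supervision during training improve feature learning for diffusion-based crowd counting?
RQ5: How does CrowdDiff compare with state-of-the-art methods on standard crowd counting benchmarks?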

Key Findings

CrowdDiff results (lower is better for both metrics)
Dataset	MAE	MSE
JHU-Crowd++	47.3	198.9
ShanghaiTech A	47.4	75.0
ShanghaiTech B	5.7	8.2
UCF-CC-50	160.8	225.0
UCF-QNRF	68.9	125.6
NWPU-Crowd	57.8	221.2
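For reference, both metrics are computed over per-image counts. In the crowd-counting literature the column labeled MSE is conventionally the root of the mean squared count error; the sketch below assumes that convention, and the function name is hypothetical.

```python
import numpy as np

def mae_mse(pred_counts, gt_counts):
    """Return (MAE, MSE) over per-image counts, with MSE taken as the root mean squared error."""
    err = np.asarray(pred_counts, float) - np.asarray(gt_counts, float)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))

mae, mse = mae_mse([100, 200], [103, 196])  # per-image errors: -3 and +4
print(mae, round(mse, 3))  # 3.5 3.536
```

Because the squared term weights large misses heavily, a high MSE relative to MAE (as on JHU-Crowd++ and NWPU-Crowd) suggests a few images with large count errors.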

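CrowdDiff achieves state-of-the-art results on several public datasets (e.g., JHU-Crowd++, ShanghaiTech, UCF-QNRF, NWPU-Crowd).
Narrow kernels better preserve density information in crowded regions and lose less information than broad kernels.
Counting by thresholding the density map (rather than summing pixel values directly) reduces background noise and improves robustness.
Multi-realization fusion (crowd map fusion) exploits the stochasticity of diffusion to improve counts, outperforming the single-realization baseline.
A counting branch trained on intermediate diffusion features improves counting performance and reduces variability across realizations.
Across datasets, CrowdDiff performs well in both dense and sparse crowd scenes.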
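The thresholding finding can be illustrated on a synthetic map: faint background responses add up when the whole map is summed, but fall below a threshold. In this toy setup the noise level, the 0.5 threshold, the single-pixel heads, and the function names are all assumptions; with a 3x3, sigma=0.5 kernel only the center pixel of an isolated head exceeds 0.5, which the single-pixel heads approximate.

```python
import numpy as np

def count_by_sum(dmap: np.ndarray) -> float:
    """Conventional count: integrate the whole density map."""
    return float(dmap.sum())

def count_by_threshold(dmap: np.ndarray, thresh: float = 0.5) -> int:
    """Count pixels whose density exceeds `thresh` (assumes one such pixel per head)."""
    return int((dmap > thresh).sum())

rng = np.random.default_rng(0)
# Hypothetical prediction: 5 sharp heads plus faint background noise everywhere.
dmap = np.abs(rng.normal(0.0, 0.002, size=(64, 64)))
for y, x in [(8, 8), (20, 40), (32, 12), (45, 50), (56, 30)]:
    dmap[y, x] += 1.0

print(count_by_threshold(dmap))      # 5
print(round(count_by_sum(dmap), 1))  # well above 5: background noise accumulates
```

Summation is unbiased only if the background is exactly zero; thresholding trades that assumption for one about peak height, which narrow kernels make easier to satisfy.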

This review was generated by AI and reviewed by human editors.