QUICK REVIEW

[Paper Review] Vision Transformer for Multi-Domain Phase Retrieval in Coherent Diffraction Imaging

Jialun Liu, David Yang|arXiv (Cornell University)|Feb 12, 2026

Advanced X-ray Imaging Techniques | Cited by 0

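One-Sentence Summary

This paper proposes an unsupervised Fourier Vision Transformer (Fourier ViT) that solves phase retrieval for multi-domain Bragg coherent diffraction imaging (BCDI) directly from diffraction intensities, achieving low chi-squared error and robust reconstruction of domain walls under strong phase contrast and noise.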

ABSTRACT

Bragg coherent diffraction imaging (BCDI) phase retrieval becomes rapidly difficult in the strong-phase regime, where a crystal contains distortions beyond half a lattice spacing. An important special case is the phase domain problem, where blocks of a crystal are displaced with sharp jumps at domain walls. The strong-phase, here defined as beyond $\pm π/2$, generates split Bragg peaks and dense fringe structure for which classical iterative solvers often stagnate or return different solutions from different initialisations. Here, we introduce an unsupervised Fourier Vision Transformer (Fourier ViT) to solve this block-phase, multi-domain phase-retrieval problem directly from measured 2D Bragg diffraction intensities. Fourier ViT couples reciprocal-space information globally through multiscale Fourier token mixing, while shallow convolutional front and back-ends provide local filtering and reconstruction. We validate the approach on large-scale synthetic datasets of Voronoi multi-domain crystals with strong-phase contrast under realistic noise corruptions, and on experimental diffraction from a $\mathrm{La}_{2-x}\mathrm{Ca}_x\mathrm{MnO}_4$ nanocrystal. Across the regimes considered, Fourier ViT achieves the lowest reciprocal-space mismatch ($χ^2$) among the compared methods and preserves domain-resolved phase reconstructions for increasing numbers of domains. On experimental data, with the same real-space support, Fourier ViT matches the iterative benchmark $χ^2$ while improving robustness to random initialisations, yielding a higher success rate of low-$χ^2$ reconstructions than the complex convolutional neural network baseline.
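To make the setting in the abstract concrete, the following is a minimal NumPy sketch of a block-phase, multi-domain test object and its diffraction intensity. It is an illustration under stated assumptions, not the authors' data generator: the function names, the circular support, the 64x64 grid, the number of domains, and the uniform phase distribution are all hypothetical choices.

```python
import numpy as np

def voronoi_phase_object(n=64, n_domains=5, radius=12, seed=0):
    """Toy multi-domain crystal (assumed setup): unit amplitude inside a
    circular support, piecewise-constant phase on Voronoi cells."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:n, 0:n]
    support = (yy - n / 2) ** 2 + (xx - n / 2) ** 2 <= radius ** 2
    # Random Voronoi seeds; each pixel takes the label of its nearest seed.
    seeds = rng.uniform(n / 2 - radius, n / 2 + radius, size=(n_domains, 2))
    d2 = (yy[..., None] - seeds[:, 0]) ** 2 + (xx[..., None] - seeds[:, 1]) ** 2
    labels = np.argmin(d2, axis=-1)
    # One constant phase per domain, drawn from [-pi, pi), so jumps across
    # domain walls can exceed pi/2 (the strong-phase regime).
    domain_phase = rng.uniform(-np.pi, np.pi, size=n_domains)
    phase = domain_phase[labels] * support
    rho = support * np.exp(1j * phase)
    return rho, support, labels

def diffraction_intensity(rho):
    """Far-field intensity: squared modulus of the centred 2D FFT."""
    F = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(rho)))
    return np.abs(F) ** 2

rho, support, labels = voronoi_phase_object()
I = diffraction_intensity(rho)
print(I.shape, int(support.sum()))
```

Only `I` would be available to a phase-retrieval method; the phase of `rho` is what has to be recovered.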

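Motivation and Objectives

Address the challenge of phase retrieval for strong-phase, multi-domain crystals in Bragg coherent diffraction imaging (BCDI).
Develop an unsupervised, physics-informed model that reconstructs real-space amplitude and phase from measured diffraction magnitudes.
Achieve robust, near-real-time reconstruction across multiple domain configurations without ground-truth labels.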

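Proposed Method

Introduces Fourier ViT, which uses multiscale Fourier attention to couple reciprocal-space information globally.
Combines a shallow CNN encoder with a Vision Transformer that operates on 16x16 tokens at three spectral scales (1:4, 1:2, 1:1).
Decodes to a complex-valued real-space density, outputting amplitude and phase under a fixed support constraint.
Trains with a hybrid loss in Fourier space comprising a Pearson correlation coefficient (PCC) term, an RMS-normalised chi-squared, a weighted chi-squared term, and a small total-variation (TV) regulariser whose weight varies with epoch.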
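The loss components listed above can be sketched in NumPy as follows. This is a forward-only illustration under assumptions: the summary does not give the exact definitions, so the normalisations, the weighting `1 / (a_meas + 1)` that emphasises weak fringes, the linearly decaying TV schedule, the choice to apply TV to the real-space amplitude, and all function names are hypothetical. Actual training would need an autodiff framework rather than plain NumPy.

```python
import numpy as np

def pcc_loss(a_pred, a_meas):
    """1 - Pearson correlation between predicted and measured magnitudes."""
    p = a_pred - a_pred.mean()
    m = a_meas - a_meas.mean()
    return 1.0 - (p * m).sum() / (np.sqrt((p ** 2).sum() * (m ** 2).sum()) + 1e-12)

def chi2(a_pred, a_meas, weights=None):
    """Normalised squared mismatch of magnitudes, with optional pixel weights."""
    w = np.ones_like(a_meas) if weights is None else weights
    return (w * (a_pred - a_meas) ** 2).sum() / ((w * a_meas ** 2).sum() + 1e-12)

def total_variation(img):
    """Anisotropic TV: mean absolute difference between neighbouring pixels."""
    return np.abs(np.diff(img, axis=0)).mean() + np.abs(np.diff(img, axis=1)).mean()

def hybrid_loss(rho, a_meas, support, epoch, n_epochs, tv_max=1e-3):
    """Sum of the four terms; a_meas is in the unshifted FFT layout."""
    rho = rho * support                         # fixed support constraint
    a_pred = np.abs(np.fft.fft2(rho))           # forward model to reciprocal space
    w = 1.0 / (a_meas + 1.0)                    # assumed weighting, favours weak pixels
    tv_weight = tv_max * (1.0 - epoch / n_epochs)   # assumed linear decay with epoch
    terms = {
        "pcc": pcc_loss(a_pred, a_meas),
        "chi2": chi2(a_pred, a_meas),
        "chi2_weighted": chi2(a_pred, a_meas, w),
        "tv": tv_weight * total_variation(np.abs(rho)),
    }
    return sum(terms.values()), terms

# Toy check: a two-domain object with a 2 rad phase jump versus a random-phase guess.
rng = np.random.default_rng(0)
n = 32
yy, xx = np.mgrid[0:n, 0:n]
support = ((yy - n / 2) ** 2 + (xx - n / 2) ** 2 <= 8 ** 2).astype(float)
rho_true = support * np.exp(1j * np.where(xx < n / 2, 0.0, 2.0))
a_meas = np.abs(np.fft.fft2(rho_true))
rho_guess = support * np.exp(1j * rng.uniform(-np.pi, np.pi, (n, n)))
loss_true, terms_true = hybrid_loss(rho_true, a_meas, support, epoch=0, n_epochs=100)
loss_guess, terms_guess = hybrid_loss(rho_guess, a_meas, support, epoch=0, n_epochs=100)
print(loss_true, loss_guess)
```

Because every term is computed from the measured magnitudes alone, no ground-truth phase enters the loss, which is what makes the training unsupervised.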

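Experimental Results

Research Questions

RQ1: Can an unsupervised, Fourier-attention-based transformer reconstruct multi-domain, strong-phase BCDI patterns directly from diffraction magnitudes?
RQ2: How does Fourier ViT perform relative to iterative methods and CNN baselines under noise, partial coherence, and varying numbers of domains?
RQ3: Does the model preserve domain-resolved phase boundaries and recover high-q fringe information on both synthetic and experimental data?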

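Key Findings

On synthetic 64x64 patterns, Fourier ViT achieves the lowest reciprocal-space mismatch (chi-squared) among the compared methods, in scenarios with up to 19 domains.
When the amplitude is known, phase-only reconstruction with Fourier ViT converges to near-perfect diffraction consistency (chi-squared ≤ 1e-5) across repeated runs.
Joint amplitude-phase retrieval remains feasible: the reconstructed phase shows sharp domain walls, and the amplitude matches the true diffraction over the full q range.
On experimental La2-xCaxMnO4 data, Fourier ViT matches the iterative benchmark in chi-squared and PCC and is more robust to random initialisation than the complex-valued CNN baseline.
Under Gaussian and Poisson noise, reconstructions stay close to the clean diffraction, which points to a denoising effect rather than a mere reproduction of the noise.
Partial coherence blurs the diffraction and can shift reconstructed amplitude features away from the sharp target; Fourier ViT still fits the blurred measurement well but may deviate from the clean target as the blur increases.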
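The corruptions mentioned in the findings can be mimicked with a few NumPy helpers, sketched below. These are assumptions for illustration only: the summary does not specify the noise levels, the photon budget, or the coherence model, so the Gaussian blur kernel, the parameter values, and the function names are hypothetical. Comparing the chi-squared of a reconstruction against the clean pattern with that of the corrupted measurement against the clean pattern is one way to check for a denoising effect.

```python
import numpy as np

def add_poisson_noise(intensity, total_counts, rng):
    """Scale to a photon budget, draw Poisson counts, scale back."""
    scale = total_counts / intensity.sum()
    return rng.poisson(intensity * scale) / scale

def add_gaussian_noise(intensity, sigma_frac, rng):
    """Additive Gaussian noise with std = sigma_frac * max intensity, clipped at 0."""
    noisy = intensity + rng.normal(0.0, sigma_frac * intensity.max(), intensity.shape)
    return np.clip(noisy, 0.0, None)

def partial_coherence_blur(intensity, sigma_px):
    """Circular convolution of the intensity with an approximately Gaussian
    kernel, applied as a multiplication in the Fourier domain."""
    n0, n1 = intensity.shape
    fy = np.fft.fftfreq(n0)[:, None]
    fx = np.fft.fftfreq(n1)[None, :]
    # The Fourier transform of a Gaussian of width sigma_px is again a Gaussian;
    # its value 1 at zero frequency keeps the total intensity unchanged.
    transfer = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fy ** 2 + fx ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(intensity) * transfer))

def chi2(i_pred, i_ref):
    """Normalised squared mismatch between magnitudes sqrt(I)."""
    a_pred = np.sqrt(np.clip(i_pred, 0.0, None))
    a_ref = np.sqrt(np.clip(i_ref, 0.0, None))
    return ((a_pred - a_ref) ** 2).sum() / (a_ref ** 2).sum()

# Toy two-domain object with a 2 rad phase jump inside a circular support.
rng = np.random.default_rng(0)
n = 64
yy, xx = np.mgrid[0:n, 0:n]
support = (yy - n / 2) ** 2 + (xx - n / 2) ** 2 <= 12 ** 2
rho = support * np.exp(1j * np.where(xx < n / 2, 0.0, 2.0))
clean = np.abs(np.fft.fft2(rho)) ** 2

poisson = add_poisson_noise(clean, total_counts=1e5, rng=rng)
gaussian = add_gaussian_noise(clean, sigma_frac=1e-3, rng=rng)
blurred = partial_coherence_blur(clean, sigma_px=1.0)
print(chi2(poisson, clean), chi2(gaussian, clean), chi2(blurred, clean))
```

A model that fits `blurred` perfectly inherits its mismatch to `clean`, which is consistent with the reported deviation from the clean target as the blur grows.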

This review was generated by AI and reviewed by a human editor.