QUICK REVIEW

[Paper Review] Vision Transformer for Multi-Domain Phase Retrieval in Coherent Diffraction Imaging

Jialun Liu, David Yang|arXiv (Cornell University)|2026. 02. 12.

Advanced X-ray Imaging Techniques | Citations: 0

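One-Line Summary

This paper addresses multi-domain phase retrieval in Bragg coherent diffraction imaging (BCDI) with an unsupervised Fourier Vision Transformer (Fourier ViT), achieving low chi-squared error and robust domain-wall reconstruction even under strong phase contrast and noise.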

ABSTRACT

Bragg coherent diffraction imaging (BCDI) phase retrieval becomes rapidly difficult in the strong-phase regime, where a crystal contains distortions beyond half a lattice spacing. An important special case is the phase domain problem, where blocks of a crystal are displaced with sharp jumps at domain walls. The strong-phase, here defined as beyond $\pm π/2$, generates split Bragg peaks and dense fringe structure for which classical iterative solvers often stagnate or return different solutions from different initialisations. Here, we introduce an unsupervised Fourier Vision Transformer (Fourier ViT) to solve this block-phase, multi-domain phase-retrieval problem directly from measured 2D Bragg diffraction intensities. Fourier ViT couples reciprocal-space information globally through multiscale Fourier token mixing, while shallow convolutional front and back-ends provide local filtering and reconstruction. We validate the approach on large-scale synthetic datasets of Voronoi multi-domain crystals with strong-phase contrast under realistic noise corruptions, and on experimental diffraction from a $\mathrm{La}_{2-x}\mathrm{Ca}_x\mathrm{MnO}_4$ nanocrystal. Across the regimes considered, Fourier ViT achieves the lowest reciprocal-space mismatch ($χ^2$) among the compared methods and preserves domain-resolved phase reconstructions for increasing numbers of domains. On experimental data, with the same real-space support, Fourier ViT matches the iterative benchmark $χ^2$ while improving robustness to random initialisations, yielding a higher success rate of low-$χ^2$ reconstructions than the complex convolutional neural network baseline.

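Research Motivation and Goals

Address phase retrieval in BCDI for strong-phase, multi-domain crystals.
Develop a physics-informed, unsupervised model that reconstructs real-space amplitude and phase from measured diffraction amplitudes.
Enable robust, near-real-time reconstruction across multi-domain configurations without ground-truth labels.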

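Proposed Method

Proposes Fourier ViT, which couples reciprocal-space information globally through multiscale Fourier attention.
Combines a shallow CNN encoder with a vision transformer that operates on 16x16 tokens and uses three spectral scales (1:4, 1:2, 1:1).
Decodes the complex real-space density as amplitude and phase outputs, constrained by a fixed support.
Trains with a hybrid loss in Fourier space, composed of PCC, an RMS-normalized chi-squared, a squared chi-squared term, and a small TV regularization term, with epoch-dependent weights.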
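The support-constrained forward model and the hybrid loss listed above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the review does not give the exact normalization of the chi-squared term, the quantity the TV term acts on, or the weight schedule, so the function names, the normalization by measured energy, the choice to apply TV to the real-space amplitude, and the default weights are all assumptions.

```python
import numpy as np

def forward_amplitude(amp, phase, support):
    """Diffraction amplitude of a support-constrained complex density."""
    rho = support * amp * np.exp(1j * phase)  # zero outside the fixed support
    return np.abs(np.fft.fftshift(np.fft.fft2(rho, norm="ortho")))

def pcc(a, b):
    """Pearson correlation coefficient between two real arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a**2).sum() * (b**2).sum()) + 1e-12))

def chi_squared(pred, meas):
    """Reciprocal-space mismatch, normalized by measured energy (assumed form)."""
    return float(((pred - meas) ** 2).sum() / ((meas**2).sum() + 1e-12))

def total_variation(x):
    """Mean absolute difference between neighboring pixels along both axes."""
    return float(np.abs(np.diff(x, axis=0)).mean() + np.abs(np.diff(x, axis=1)).mean())

def hybrid_loss(amp, phase, support, meas,
                w_pcc=1.0, w_chi=1.0, w_chi2=1.0, w_tv=1e-3):
    """Weighted sum of the four terms; the weights are hypothetical defaults.

    An epoch-dependent schedule would simply pass different weights per epoch.
    """
    pred = forward_amplitude(amp, phase, support)
    chi = chi_squared(pred, meas)
    return (w_pcc * (1.0 - pcc(pred, meas))
            + w_chi * chi
            + w_chi2 * chi**2
            + w_tv * total_variation(amp))  # TV on amplitude is an assumption
```

Because every term is computed from the measured diffraction alone, no ground-truth real-space object is needed, which is what makes the training unsupervised.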

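Experimental Results

Research Questions

RQ1: Can an unsupervised, Fourier-attention-based transformer reconstruct multi-domain, strong-phase BCDI patterns directly from diffraction amplitudes?
RQ2: How does Fourier ViT perform relative to iterative and CNN-based methods under noise, partial coherence, and varying numbers of domains?
RQ3: Does the model preserve domain-resolved phase boundaries and recover high-q fringe information on both synthetic and experimental data?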

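Key Results

On synthetic 64x64 patterns with up to 19 domains, Fourier ViT achieves the lowest reciprocal-space mismatch (chi-squared) among the compared methods.
When the amplitude is known, phase-only Fourier ViT reconstruction converges to near-perfect diffraction agreement (chi-squared ≤ 1e-5) in many runs.
Joint amplitude-phase reconstruction remains feasible: the reconstructed phase reveals sharp domain boundaries, and the amplitude matches the true diffraction across the full q-range.
On experimental La2-xCaxMnO4 data, Fourier ViT matches the iterative benchmark in chi-squared and PCC and is more robust to random initialization than the complex-valued CNN baseline.
Across noise models (Gaussian, Poisson), reconstructions tend to lie closer to the clean diffraction, suggesting that the model denoises rather than simply replicating the noise.
Partial coherence blurs the diffraction and can shift reconstructed amplitude features, but Fourier ViT still fits the blurred measurements well as the blur increases.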
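To make the test conditions above concrete, the sketch below builds a Voronoi block-phase object, computes its diffraction intensity, and applies Poisson noise, Gaussian noise, and a Gaussian blur as a simple stand-in for partial coherence. All function names and parameter values (number of domains, count level, noise fraction, blur width) are assumptions for illustration; the review does not specify the paper's data-generation settings.

```python
import numpy as np

def voronoi_phase(n=64, n_domains=5, rng=None):
    """Piecewise-constant phase map: each pixel takes the phase of its nearest seed."""
    rng = np.random.default_rng(rng)
    seeds = rng.uniform(0, n, size=(n_domains, 2))
    phases = rng.uniform(-np.pi, np.pi, size=n_domains)  # strong-phase blocks
    yy, xx = np.mgrid[0:n, 0:n]
    d2 = (yy[..., None] - seeds[:, 0]) ** 2 + (xx[..., None] - seeds[:, 1]) ** 2
    labels = d2.argmin(axis=-1)
    return phases[labels], labels

def diffraction_intensity(phase, support):
    """Far-field intensity of a unit-amplitude object inside the support."""
    rho = support * np.exp(1j * phase)
    return np.abs(np.fft.fftshift(np.fft.fft2(rho))) ** 2

def add_poisson(intensity, max_counts=1e4, rng=None):
    """Photon-counting noise at an assumed peak count level."""
    rng = np.random.default_rng(rng)
    scale = max_counts / intensity.max()
    return rng.poisson(intensity * scale) / scale

def add_gaussian(intensity, sigma_frac=0.01, rng=None):
    """Additive Gaussian noise, clipped so intensities stay non-negative."""
    rng = np.random.default_rng(rng)
    noisy = intensity + rng.normal(0.0, sigma_frac * intensity.max(), intensity.shape)
    return np.clip(noisy, 0.0, None)

def partial_coherence_blur(intensity, sigma_px=1.0):
    """Circular Gaussian convolution of the intensity, applied via FFT."""
    n0, n1 = intensity.shape
    fy = np.fft.fftfreq(n0)[:, None]
    fx = np.fft.fftfreq(n1)[None, :]
    # Fourier transform of a unit-area Gaussian with std sigma_px pixels
    transfer = np.exp(-2.0 * np.pi**2 * sigma_px**2 * (fx**2 + fy**2))
    return np.real(np.fft.ifft2(np.fft.fft2(intensity) * transfer))
```

Increasing `sigma_px` smears the fringes while leaving the total intensity unchanged, which mimics the loss of fringe contrast that partial coherence produces.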


This review was generated by AI and reviewed by a human editor.