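[Paper Review] Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective
This paper explains why deep ResNets generalize better than deep FFNets by comparing their neural tangent kernels (NTKs) in the infinite-width limit. It shows that the FFNet NTK degenerates as depth grows, whereas the ResNet NTK retains its capacity to learn.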
Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim.
Research Motivation and Objectives
- Explain the generalization gap between deep ResNets and deep FFNs.
- Develop NTK-based analysis for deep networks trained end-to-end.
- Compare limiting NTKs of deep FFNs and ResNets under wide/deep regimes.
- Provide nonasymptotic bounds linking network width/depth to NTK behavior.
Proposed Method
- Model deep FFNets and ResNets with random Gaussian initializations.
- Derive GP kernels and NTKs for both architectures in the infinite-width limit.
- Normalize NTKs to study limiting behavior as depth grows.
- Prove and/or sketch proofs that FFN NTK degenerates with depth, while ResNet NTK remains learnable.
- Provide nonasymptotic bounds that connect finite-width networks to their limiting NTKs.
- Support theoretical claims with kernel regression experiments on MNIST and CIFAR-10.
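The FFN part of the steps above can be illustrated with a minimal sketch. It assumes unit-norm inputs, ReLU activations, and the standard infinite-width NTK recursion with the usual factor-2 variance normalization; the paper's exact parameterization may differ, and the ResNet kernel is not reproduced here. The names `relu_ffn_ntk` and `spread` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def relu_ffn_ntk(rho0, depth):
    """Normalized infinite-width ReLU FFN NTK for two unit-norm inputs.

    rho0  : inner product <x, x'> of the inputs (assumed unit norm).
    depth : number of hidden layers.
    Returns Theta^(depth)(x, x') / (depth + 1), so the diagonal equals 1.
    """
    rho = float(rho0)   # GP kernel Sigma^(0)(x, x')
    theta = rho         # NTK starts as Theta^(0) = Sigma^(0)
    for _ in range(depth):
        rho = np.clip(rho, -1.0, 1.0)        # guard against rounding drift
        angle = np.arccos(rho)
        sigma_dot = (np.pi - angle) / np.pi  # derivative kernel for ReLU
        rho_next = (np.sqrt(1.0 - rho**2) + rho * (np.pi - angle)) / np.pi
        theta = theta * sigma_dot + rho_next  # Theta^(h) = Theta^(h-1) * Sigma_dot^(h) + Sigma^(h)
        rho = rho_next
    return theta / (depth + 1)

def spread(depth, rhos=(-0.5, 0.0, 0.5, 0.9)):
    """Gap between the largest and smallest normalized NTK value over several input pairs."""
    vals = [relu_ffn_ntk(r, depth) for r in rhos]
    return max(vals) - min(vals)

if __name__ == "__main__":
    for depth in (1, 10, 100, 1000):
        print(depth, round(spread(depth), 4))
```

Under these assumptions the printed spread shrinks as depth grows: the normalized kernel assigns nearly the same value to every pair of distinct inputs, which is the kind of degeneracy the review attributes to deep FFNs.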
Experimental Results
Research Questions
- RQ1: Do deep FFNs and deep ResNets induce different limiting NTKs as depth goes to infinity?
- RQ2: Is the class of functions induced by the FFN limiting NTK learnable, and does the ResNet limiting NTK avoid any such degeneracy?
- RQ3: How do width and depth interact to determine the NTK and generalization properties for both architectures?
- RQ4: Can kernel regression with NTK-based kernels reproduce observed generalization differences between FFNs and ResNets?
Key Findings
- The FFN NTK converges to a non-informative limiting kernel that yields poor generalization on unseen data.
- The ResNet NTK converges to a learnable limiting kernel that maintains discriminatory power between inputs as depth grows.
- For ResNets with appropriate scaling, the limiting NTK can be depth-invariant, helping explain sustained generalization with very deep models.
- Nonasymptotic bounds show finite-width networks approximate their limiting NTKs under specified width conditions.
- Numerical experiments on MNIST and CIFAR-10 show that FFN-based kernel regressors degrade with depth, while ResNet-based regressors maintain their accuracy across depths.
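To see why a non-informative limiting kernel generalizes poorly, the sketch below runs kernel regression with a kernel that equals 1 on the diagonal and a single constant `c` for every pair of distinct inputs. The value `c = 0.25`, the toy labels, and the helper `kernel_regression_predict` are assumptions made for illustration only; they are not taken from the paper's experiments.

```python
import numpy as np

def kernel_regression_predict(K_train, K_test_train, y_train, ridge=0.0):
    """Kernel (ridge) regression: f(x) = k(x, X) (K + ridge * I)^{-1} y."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + ridge * np.eye(n), y_train)
    return K_test_train @ alpha

# Assumed degenerate kernel: 1 on the diagonal, constant c elsewhere.
n_train, n_test, c = 6, 4, 0.25
K_train = (1 - c) * np.eye(n_train) + c * np.ones((n_train, n_train))
# Every test point is "distinct" from every training point, so its kernel row is constant.
K_test_train = c * np.ones((n_test, n_train))
y_train = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0])  # toy labels

preds = kernel_regression_predict(K_train, K_test_train, y_train)
print(preds)  # the same value for every test point
```

Because each test row of the kernel is identical, the regressor outputs one constant for all unseen inputs, whatever they look like. It can interpolate the training labels but carries no information about new points.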
This review was generated by AI and reviewed by a human editor.