QUICK REVIEW

[Paper Review] One-Shot Identification with Different Neural Network Approaches

Janis Mohr, Jörg Frochte|arXiv (Cornell University)|Jan 13, 2026
Face recognition and analysis | Cited by 0
One-sentence summary

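The paper compares three one-shot/zero-shot identification approaches on industrial and image datasets and finds that siamese capsule networks achieve the best overall accuracy, while the CNN with merged images performs best on the industrial task.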

ABSTRACT

Convolutional neural networks (CNNs) have been widely used in the computer vision community, significantly improving the state-of-the-art. But learning good features often is computationally expensive in machine learning settings and is especially difficult when there is a lack of data. One-shot learning is one such area where only limited data is available. In one-shot learning, predictions have to be made after seeing only one example from one class, which requires special techniques. In this paper we explore different approaches to one-shot identification tasks in different domains including an industrial application and face recognition. We use a special technique with stacked images and use siamese capsule networks. It is encouraging to see that the approach using capsule architecture achieves strong results and exceeds other techniques on a wide range of datasets from industrial application to face recognition benchmarks while being easy to use and optimise.

Research motivation and goals

  • Motivate the problem of learning from very limited data and the need for robust one-shot identification in industrial and vision tasks.
  • Investigate three approaches: CNN with merged images, Siamese networks, and Siamese capsule networks for one-shot/zero-shot tasks.
  • Evaluate approaches on three datasets (industrial anodes, smallNORB, AT&T faces) to assess generalization and data-efficiency.
  • Quantify performance and compare accuracy, data requirements, and practicality for real-time industrial applications.

Proposed methods

  • Three architectures are evaluated: a classic CNN trained on merged image pairs to classify same/different objects; a Siamese network using contrastive loss as a baseline; and a Siamese network with CapsNet (Capsule Networks) in one or both branches.
  • For the CNN with merged images, the two images are either concatenated horizontally or vertically, or stacked as channels; stacking performs better (98.36% in one setup).
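  • The Siamese networks compare two inputs through twin networks trained with a contrastive loss $L = y \cdot \tfrac{1}{2} D^2 + (1 - y) \cdot \tfrac{1}{2} \left(\max\{0,\, m - D\}\right)^2$, where $D$ is the distance between the two embeddings, $y$ is the pair label, and $m$ is the margin. With $y = 1$ the loss reduces to $\tfrac{1}{2} D^2$, so matching pairs are pulled together; with $y = 0$ only pairs closer than $m$ are penalized.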
  • The CapsNet-based Siamese network uses a CapsNet per branch with dynamic routing, squashing activations, and a decoder; training uses a contrastive loss similar to the baseline.
  • Experiments cover three datasets (industrial anodes, smallNORB, AT&T faces) with 10-fold cross-validation (except the industrial dataset).
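
The input-merging options and the contrastive loss described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: images are assumed to be equally sized grayscale arrays represented as nested lists, and the function names (`merge_horizontal`, `merge_vertical`, `stack_channels`, `euclidean_distance`, `contrastive_loss`) and the default margin of 1.0 are hypothetical choices made for the example.

```python
import math


def merge_horizontal(a, b):
    """Place two equally sized grayscale images side by side (H x 2W)."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def merge_vertical(a, b):
    """Place image b below image a (2H x W)."""
    return [list(row) for row in a + b]


def stack_channels(a, b):
    """Stack two grayscale images as channels (H x W x 2)."""
    return [
        [[pa, pb] for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def euclidean_distance(u, v):
    """Distance D between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def contrastive_loss(y, d, margin=1.0):
    """L = y * 1/2 * D^2 + (1 - y) * 1/2 * max(0, m - D)^2.

    y = 1 marks a matching pair, y = 0 a non-matching pair
    (the convention implied by the formula in the summary).
    The margin default is an assumption for illustration.
    """
    return y * 0.5 * d ** 2 + (1 - y) * 0.5 * max(0.0, margin - d) ** 2
```

For example, `contrastive_loss(1, 0.5)` penalizes a matching pair whose embeddings are 0.5 apart, while `contrastive_loss(0, 2.0)` is zero because a non-matching pair is already farther apart than the margin.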

Experimental results

Research questions

  • RQ1: Can one-shot identification be effectively performed with merged-image CNNs, Siamese CNNs, and Siamese CapsNets across diverse domains?
  • RQ2: Does a capsule-based siamese architecture provide superior accuracy with limited data compared to traditional CNN and siamese CNN approaches?
  • RQ3: How do these methods perform on industrial data requiring rapid, data-sparse identification versus standard vision benchmarks?
  • RQ4: What is the impact of image fusion strategy (merged vs stacked) on one-shot identification performance?

Main findings

Approach          Industrial dataset   smallNORB   AT&T faces
merged images     98.4%                94.7%       88.6%
siamese           96.4%                92.5%       87.3%
siamese CapsNet   97.9%                98.4%       90.2%

  • Merged-image CNNs with stacked channel inputs achieved high accuracy (98.4%) on the industrial dataset.
  • Siamese CNNs achieved 96.4% on the industrial dataset, 92.5% on smallNORB, and 87.3% on AT&T faces.
  • Siamese CapsNet achieved 97.9% on the industrial dataset, 98.4% on smallNORB, and 90.2% on AT&T faces, outperforming the baseline siamese network on all three datasets.
  • CapsNet-based siamese networks perform best on smallNORB, indicating strong performance with limited data.
  • In the industrial task, the stacked CNN approach is slightly more accurate than Siamese CapsNet when combined with decoder-generated data (98.5%), suggesting decoder augmentation can boost performance.
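
The "best overall" versus "best on the industrial task" distinction can be checked directly from the table. The sketch below uses an unweighted mean across the three datasets, which is an illustrative aggregate chosen here and not a metric reported in the paper; the helper names `mean_accuracy` and `best_per_dataset` are likewise hypothetical.

```python
# Accuracies (%) copied from the results table in this summary.
results = {
    "merged images": {"industrial": 98.4, "smallNORB": 94.7, "AT&T faces": 88.6},
    "siamese": {"industrial": 96.4, "smallNORB": 92.5, "AT&T faces": 87.3},
    "siamese CapsNet": {"industrial": 97.9, "smallNORB": 98.4, "AT&T faces": 90.2},
}


def mean_accuracy(per_dataset):
    """Unweighted mean over datasets (an assumption, not the paper's metric)."""
    return sum(per_dataset.values()) / len(per_dataset)


def best_per_dataset(table):
    """Map each dataset name to the approach with the highest accuracy."""
    datasets = next(iter(table.values())).keys()
    return {d: max(table, key=lambda approach: table[approach][d]) for d in datasets}


means = {approach: round(mean_accuracy(acc), 1) for approach, acc in results.items()}
winners = best_per_dataset(results)
```

Under this aggregate, siamese CapsNet has the highest mean (95.5 versus 93.9 for merged images and 92.1 for the siamese baseline), while `winners` shows merged images leading only on the industrial dataset.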

This review was generated by AI and checked by human editors.