QUICK REVIEW

[Paper Review] One-Shot Identification with Different Neural Network Approaches

Janis Mohr, Jörg Frochte|arXiv (Cornell University)|Jan 13, 2026

Face recognition and analysis|Cited by 0

One-Sentence Summary

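The paper compares three one-shot/zero-shot identification approaches on industrial and image datasets and finds that siamese capsule networks give the best overall accuracy, while the CNN with merged images performs best on the industrial task.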

ABSTRACT

Convolutional neural networks (CNNs) have been widely used in the computer vision community, significantly improving the state-of-the-art. But learning good features often is computationally expensive in machine learning settings and is especially difficult when there is a lack of data. One-shot learning is one such area where only limited data is available. In one-shot learning, predictions have to be made after seeing only one example from one class, which requires special techniques. In this paper we explore different approaches to one-shot identification tasks in different domains including an industrial application and face recognition. We use a special technique with stacked images and use siamese capsule networks. It is encouraging to see that the approach using capsule architecture achieves strong results and exceeds other techniques on a wide range of datasets from industrial application to face recognition benchmarks while being easy to use and optimise.

Motivation and Objectives

Motivate the problem of learning from very limited data and the need for robust one-shot identification in industrial and vision tasks.
Investigate three approaches: CNN with merged images, Siamese networks, and Siamese capsule networks for one-shot/zero-shot tasks.
Evaluate approaches on three datasets (industrial anodes, smallNORB, AT&T faces) to assess generalization and data-efficiency.
Quantify performance and compare accuracy, data requirements, and practicality for real-time industrial applications.

Proposed Methods

Three architectures are evaluated: a classic CNN trained on merged image pairs to classify same/different objects; a Siamese network using contrastive loss as a baseline; and a Siamese network with CapsNet (Capsule Networks) in one or both branches.
For the CNN with merged images, two images are merged horizontally/vertically or stacked as channels, with stacking giving better performance (98.36% in one setup).
The Siamese networks pass the two inputs through twin networks and are trained with a contrastive loss L = (1/2) y D^2 + (1/2) (1 - y) max(0, m - D)^2, where D is the distance between the two embeddings, y is the pair label, and m is the margin.
The CapsNet-based Siamese network uses a CapsNet branch with dynamic routing, squashing activations, and a decoder; it is trained with a contrastive loss similar to the baseline.
Experiments cover three datasets (industrial anodes, smallNORB, AT&T faces) with 10-fold cross-validation (except the industrial dataset).
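The pair-merging options and the contrastive loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `merge_pair` and `contrastive_loss`, the Euclidean distance, the default margin of 1.0, and the convention that y = 1 marks a matching pair are all assumptions made for illustration.

```python
import numpy as np


def merge_pair(a, b, mode="stack"):
    """Combine two equally sized images of shape (H, W, C) into one CNN input.

    Hypothetical helper; the three modes mirror the merging options
    named in the text (channel stacking, horizontal, vertical).
    """
    if a.shape != b.shape:
        raise ValueError("both images must have the same shape")
    if mode == "stack":        # stack as channels -> (H, W, 2C)
        return np.concatenate([a, b], axis=-1)
    if mode == "horizontal":   # side by side -> (H, 2W, C)
        return np.concatenate([a, b], axis=1)
    if mode == "vertical":     # one above the other -> (2H, W, C)
        return np.concatenate([a, b], axis=0)
    raise ValueError(f"unknown mode: {mode}")


def contrastive_loss(emb_a, emb_b, y, margin=1.0):
    """Batch mean of (1/2) y D^2 + (1/2) (1 - y) max(0, m - D)^2.

    Assumptions: D is the Euclidean distance between embeddings,
    y = 1 marks a matching pair and y = 0 a non-matching pair.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.norm(emb_a - emb_b, axis=-1)  # one distance per pair
    same = 0.5 * y * d ** 2                      # pulls matching pairs together
    different = 0.5 * (1.0 - y) * np.maximum(0.0, margin - d) ** 2  # pushes others apart
    return float(np.mean(same + different))
```

With this convention, a matching pair contributes a loss that grows with the squared distance, while a non-matching pair contributes nothing once its embeddings are at least `margin` apart.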

Experimental Results

Research Questions

RQ1: Can one-shot identification be effectively performed with merged-image CNNs, Siamese CNNs, and Siamese CapsNets across diverse domains?
RQ2: Does a capsule-based siamese architecture provide superior accuracy with limited data compared to traditional CNN and siamese CNN approaches?
RQ3: How do these methods perform on industrial data requiring rapid, data-sparse identification versus standard vision benchmarks?
RQ4: What is the impact of image fusion strategy (merged vs stacked) on one-shot identification performance?

Key Findings

Approach	Industrial Dataset	smallNORB	AT&T faces
merged images	98.4%	94.7%	88.6%
siamese	96.4%	92.5%	87.3%
siamese CapsNet	97.9%	98.4%	90.2%

Merged-image CNNs with stacked channel inputs achieved high accuracy (98.4%) on the industrial dataset.
Siamese CNNs achieved 96.4% on the industrial dataset, 92.5% on smallNORB, and 87.3% on AT&T faces.
Siamese CapsNet achieved 97.9% on the industrial dataset, 98.4% on smallNORB, and 90.2% on AT&T faces, often outperforming baseline siamese setups.
CapsNet-based siamese networks perform best on smallNORB, indicating strong performance with limited data.
In the industrial task, the stacked CNN approach is slightly more accurate than Siamese CapsNet when combined with decoder-generated data (98.5%), suggesting decoder augmentation can boost performance.

This review was generated by AI and checked by human editors.