QUICK REVIEW

[Paper Review] Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision

David Picard|arXiv (Cornell University)|Sep 16, 2021
Advanced Neural Network Applications | 39 citations
TL;DR

The paper measures how the choice of random seed affects accuracy for computer vision models. Variance across seeds is modest, but outlier seeds that perform noticeably better or worse than average are easy to find, even with large datasets and pretrained models. The author calls for reporting the effect of randomness in publications.

ABSTRACT

In this paper I investigate the effect of random seed selection on the accuracy when using popular deep learning architectures for computer vision. I scan a large amount of seeds (up to $10^4$) on CIFAR 10 and I also scan fewer seeds on Imagenet using pre-trained models to investigate large scale datasets. The conclusions are that even if the variance is not very large, it is surprisingly easy to find an outlier that performs much better or much worse than the average.

Motivation & Objective

  • Assess the distribution of model accuracy across random seeds in CIFAR-10 and ImageNet experiments.
  • Identify the existence and magnitude of seed-induced outliers (black swans) in performance.
  • Evaluate whether pretraining on larger datasets mitigates seed-induced variability in CV models.
  • Provide guidance on robust experimental practices regarding randomness in reporting results.

Proposed method

  • Train a ResNet9 on CIFAR-10 across 500 seeds (long training) and 10,000 seeds (short training) to assess convergence stability and seed variability.
  • Evaluate pretrained models on ImageNet using ResNet50 (supervised and SSL) and SSL ViT, with 50 seeds per setup, measuring variation in final accuracy.
  • Use SGD with momentum and weight decay; apply cosine annealing; compare final convergence statistics (mean, std, min, max).
  • Publicly share code and results to enable reproducibility (GitHub: deepseed).
  • Constrain experiments within a ~1000 GPU-hour budget to simulate practical research settings.
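
The seed sweep described above can be sketched as a loop that reseeds every random number generator before each run. This is a minimal sketch, not the code from the deepseed repository: `seed_everything`, `run_sweep`, and the `train_and_evaluate` callback are hypothetical names, and the NumPy and PyTorch calls are only made when those libraries are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators available in this environment."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch's random number generators
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def run_sweep(train_and_evaluate, seeds):
    """Run a user-supplied training function once per seed.

    `train_and_evaluate` is a hypothetical callback that takes the seed
    and returns a final accuracy. Returns a dict mapping seed -> accuracy.
    """
    results = {}
    for seed in seeds:
        seed_everything(seed)  # reseed before every run
        results[seed] = train_and_evaluate(seed)
    return results
```

In a real experiment, `train_and_evaluate` would build the model, train it with SGD and cosine annealing, and return the final test accuracy. For a quick check of the reseeding logic, any callable that takes a seed and returns a number will do.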

Experimental results

Research questions

  • RQ1: What is the distribution of accuracy across seeds after convergence?
  • RQ2: Are there seed configurations that yield significantly better or worse performance (black swans)?
  • RQ3: Does pretraining on larger datasets reduce seed-induced variability in downstream fine-tuning on ImageNet?

Key findings

  • On CIFAR-10 with long training (500 seeds), final accuracy is concentrated around the mean with small variability (mean 90.70%, std 0.20, min 90.14%, max 91.41%).
  • With 10,000 seeds on CIFAR-10 (short training), max/min spread is 1.82 percentage points (89.01% to 90.83%), showing seeds can produce materially different outcomes.
  • On ImageNet with pretrained models, seed-induced variability is smaller (std about 0.1 percentage points) but still present (min-to-max spread about 0.5 percentage points), so larger datasets and pretraining reduce the effect of the seed but do not eliminate it.
  • The study argues that many recent CV results may be overstated due to implicit seed search and recommends reporting mean, std, min, and max across multiple seeds.
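
The recommended reporting (mean, standard deviation, minimum, and maximum across seeds) takes only a few lines. The sketch below assumes the per-seed accuracies are already collected in a list. `summarize_accuracies` is a hypothetical helper, and the input values in the usage lines are illustrative placeholders, not results from the paper.

```python
import statistics


def summarize_accuracies(accuracies):
    """Return count, mean, sample standard deviation, min, and max."""
    if len(accuracies) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return {
        "n_seeds": len(accuracies),
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # sample std (n - 1 denominator)
        "min": min(accuracies),
        "max": max(accuracies),
    }


# Placeholder accuracies in percent, one per seed (not from the paper).
example = [90.0, 90.5, 91.0]
summary = summarize_accuracies(example)
print(summary)
```

Reporting the minimum and maximum alongside the mean and standard deviation makes outlier seeds visible, which a mean alone would hide.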

This review was created by AI and reviewed by human editors.