[Paper Review] Learning to solve the credit assignment problem
This paper proposes a biologically plausible learning algorithm that trains feedback weights using reinforcement learning to approximate backpropagation gradients. By using perturbations and a global reward signal, the method learns accurate gradient approximations in feedforward and convolutional networks, matching or exceeding backpropagation performance without requiring symmetric feedback or precise learning rules.
Backpropagation is driving today's artificial neural networks (ANNs). However, despite extensive research, it remains unclear if the brain implements this algorithm. Among neuroscientists, reinforcement learning (RL) algorithms are often seen as a realistic alternative: neurons can randomly introduce change, and use unspecific feedback signals to observe their effect on the cost and thus approximate their gradient. However, the convergence rate of such learning scales poorly with the number of involved neurons. Here we propose a hybrid learning approach. Each neuron uses an RL-type strategy to learn how to approximate the gradients that backpropagation would provide. We provide proof that our approach converges to the true gradient for certain classes of networks. In both feedforward and convolutional networks, we empirically show that our approach learns to approximate the gradient, and can match the performance of exact gradient-based learning. Learning feedback weights provides a biologically plausible mechanism of achieving good performance, without the need for precise, pre-specified learning rules.
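The perturbation idea in the abstract can be made concrete with a standard first-order argument. The notation here ($ h $ for hidden activity, $ \xi $ for the injected noise, $ \sigma $ for its scale, $ g $ for the true gradient, $ N $ for the number of perturbed units) is chosen for this sketch and is not taken from the paper.

$$
\mathcal{L}(h + \sigma \xi) \approx \mathcal{L}(h) + \sigma\, \xi^\top g, \qquad g = \nabla_h \mathcal{L}(h), \quad \xi \sim \mathcal{N}(0, I_N)
$$

$$
\hat{g} = \frac{\mathcal{L}(h + \sigma \xi) - \mathcal{L}(h)}{\sigma}\, \xi, \qquad \mathbb{E}_\xi[\hat{g}] \approx \mathbb{E}_\xi[\xi \xi^\top]\, g = g
$$

$$
\operatorname{Var}(\hat{g}_j) \approx \lVert g \rVert^2 + g_j^2
$$

To first order the estimate is unbiased, but the variance of each component picks up $ \lVert g \rVert^2 $, a sum over all $ N $ perturbed units. This is one way to see why learning from perturbations alone slows down as the number of neurons grows.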
Motivation & Objective
- To address the credit assignment problem in biological neural networks, where neurons must determine their contribution to a global outcome.
- To overcome the limitations of reinforcement learning in large networks, which suffer from high variance and slow convergence.
- To develop a hybrid learning system where feedback weights are trained via RL to approximate true gradients, enabling efficient and scalable learning.
- To provide a biologically plausible alternative to backpropagation that avoids the need for symmetric feedback weights or pre-specified learning rules.
Proposed method
- Each neuron uses a reinforcement learning strategy (REINFORCE-style) to learn feedback weights that approximate the gradients that backpropagation would provide.
- Feedback weights are updated using a global reward signal and stochastic perturbations of hidden layer activations to estimate gradient direction.
- The method employs online ridge regression to solve for feedback weights that minimize the error between estimated and true gradients.
- The feedback weight matrix $ B $ is trained to predict the gradient of the loss with respect to hidden layer activations using perturbed feedback signals.
- The approach is applied to both fully connected and convolutional neural networks, with training using stochastic gradient descent and adaptive optimizers.
- A warm-up phase freezes feedforward weights while allowing feedback weights to adapt, improving training stability.
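The bullets above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single tanh hidden layer, a linear readout with squared-error loss, frozen feedforward weights (as in the warm-up phase), and a plain stochastic-gradient regression step in place of a ridge-regression solver. All names (`train_feedback_weights`, `noise_scale`, `lr_b`) are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two arrays of the same shape."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def train_feedback_weights(n_in=10, n_hid=20, n_out=5, steps=10000,
                           noise_scale=0.01, lr_b=1e-3, seed=0):
    """Learn feedback weights B from perturbation-based gradient estimates.

    Assumed toy setup: h = tanh(W1 x), y = W2 h, loss = 0.5 * ||y - t||^2.
    W1 and W2 stay frozen; only B is adapted.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_hid, n_in))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hid), (n_out, n_hid))
    # B starts random, as in feedback alignment
    B = rng.normal(0.0, 1.0 / np.sqrt(n_hid), (n_hid, n_out))
    cos_before = cosine(B, W2.T)

    for _ in range(steps):
        x = rng.normal(size=n_in)
        t = rng.normal(size=n_out)

        # Unperturbed forward pass and loss
        h = np.tanh(W1 @ x)
        e = W2 @ h - t
        loss = 0.5 * e @ e

        # Perturbed forward pass: add noise to the hidden activity
        xi = rng.normal(size=n_hid)
        e_pert = W2 @ (h + noise_scale * xi) - t
        loss_pert = 0.5 * e_pert @ e_pert

        # Node-perturbation estimate of dLoss/dh from the scalar loss change
        grad_est = (loss_pert - loss) / noise_scale * xi

        # Regression step: move B @ e toward the perturbation estimate
        B -= lr_b * np.outer(B @ e - grad_est, e)

    return cos_before, cosine(B, W2.T)


if __name__ == "__main__":
    before, after = train_feedback_weights()
    print(f"cosine(B, W2^T) before: {before:.3f}, after: {after:.3f}")
```

In this toy setup the exact gradient with respect to the hidden activity is `W2.T @ e`, so the cosine similarity between `B` and `W2.T` is a direct check on whether the feedback weights are aligning with what backpropagation would use.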
Experimental results
Research questions
- RQ1: Can a reinforcement learning-based method train feedback weights to approximate true gradients in a biologically plausible way?
- RQ2: Does this method achieve performance comparable to exact backpropagation in feedforward and convolutional networks?
- RQ3: How does the method scale with network depth and width compared to feedback alignment and synthetic gradients?
- RQ4: Can the method overcome the limitations of feedback alignment in convolutional networks and deep architectures?
- RQ5: What is the impact of perturbation noise and feedback weight adaptation on learning stability and convergence?
Key findings
- The method converges to the true gradient in specific network classes, with theoretical proof of consistency under certain conditions.
- In feedforward networks, the approach matches or exceeds the performance of exact backpropagation and outperforms feedback alignment and synthetic gradients.
- The method successfully trains convolutional neural networks on CIFAR-10 and CIFAR-100, where feedback alignment fails, and achieves competitive accuracy.
- The feedback weights learn to produce gradient approximations that show significant sign congruence with true gradients, even when the matrices differ substantially.
- The method is robust to hyperparameter variation, with optimal noise levels found via random search that improve generalization.
- Ablation studies confirm that the performance gain comes from the RL-based feedback weight training, not from noise alone, and that the method outperforms baselines like matching rules and synthetic gradients with true gradients.
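The sign-congruence finding above can be checked with a simple metric. The sketch below assumes that sign congruence means the fraction of entries whose signs agree between two arrays of the same shape, for example an estimated and a true gradient, or a feedback matrix and the transposed feedforward matrix. The function name `sign_congruence` is hypothetical.

```python
import numpy as np


def sign_congruence(a, b):
    """Fraction of entries in which a and b have the same sign."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")
    return float(np.mean(np.sign(a) == np.sign(b)))


if __name__ == "__main__":
    W = np.array([[1.0, -2.0],
                  [3.0, 4.0]])   # feedforward weights (outputs x hidden)
    B = np.array([[1.0, 3.0],
                  [2.0, 4.0]])   # feedback weights (hidden x outputs)
    # One of the four entries of B disagrees in sign with W.T
    print(sign_congruence(B, W.T))
```

Two independent random matrices agree in sign on about half of their entries, so values well above 0.5 indicate that the feedback pathway carries directional information even when the magnitudes differ.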
This review was created by AI and reviewed by human editors.