QUICK REVIEW

[Paper Review] Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

Decebal Constantin Mocanu, Maria Torres Vega | arXiv (Cornell University) | Oct 18, 2016

Advanced Bandit Algorithms Research | 2 references | 18 citations

TL;DR

This paper proposes Online Contrastive Divergence with Generative Replay (OCD_GR), an online training method for Restricted Boltzmann Machines (RBMs) that replaces experience replay (ER) with synthetic past experiences generated by the RBM itself. Because it stores no observed data, OCD_GR uses far less memory than ER while matching or exceeding its generative performance: across 9 real-world datasets it outperformed ER in 64.28% of the tested cases and performed almost equally in the rest, at a comparable time complexity.

ABSTRACT

Conceived in the early 1990s, Experience Replay (ER) has been shown to be a successful mechanism to allow online learning algorithms to reuse past experiences. Traditionally, ER can be applied to all machine learning paradigms (i.e., unsupervised, supervised, and reinforcement learning). Recently, ER has contributed to improving the performance of deep reinforcement learning. Yet, its application to many practical settings is still limited by the memory requirements of ER, necessary to explicitly store previous observations. To remedy this issue, we explore a novel approach, Online Contrastive Divergence with Generative Replay (OCD_GR), which uses the generative capability of Restricted Boltzmann Machines (RBMs) instead of recorded past experiences. The RBM is trained online, and does not require the system to store any of the observed data points. We compare OCD_GR to ER on 9 real-world datasets, considering a worst-case scenario (data points arriving in sorted order) as well as a more realistic one (sequential random-order data points). Our results show that in 64.28% of the cases OCD_GR outperforms ER and in the remaining 35.72% it has an almost equal performance, while having a considerably reduced space complexity (i.e., memory usage) at a comparable time complexity.
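The two data-arrival scenarios can be illustrated with a short Python sketch. The review does not say which key defines the "sorted order", so `key` is a hypothetical parameter, and the example below sorts by a class label purely for illustration. Sorted arrival is the harder case because the learner sees one region of the input distribution at a time, which is when replaying earlier regions matters most.

```python
import random
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def sorted_stream(data: Sequence[T], key: Callable[[T], object]) -> Iterator[T]:
    """Worst-case arrival: points come one at a time in sorted order.

    `key` is a hypothetical sort key; the paper's exact ordering is not given here.
    """
    yield from sorted(data, key=key)


def random_stream(data: Sequence[T], seed: Optional[int] = None) -> Iterator[T]:
    """More realistic arrival: a random permutation, still one point at a time."""
    order = list(data)
    random.Random(seed).shuffle(order)
    yield from order


if __name__ == "__main__":
    # (label, observation) pairs; sorting by label makes each class arrive as a block.
    points = [(1, "b"), (0, "a"), (2, "c"), (0, "d")]
    print(list(sorted_stream(points, key=lambda p: p[0])))
    print(list(random_stream(points, seed=3)))
```

Either generator can feed an online learner that consumes each point once and then discards it.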

Motivation & Objective

To address the high memory overhead of traditional Experience Replay (ER) in online learning, especially in low-resource environments.
To explore whether generative models can effectively simulate past experiences without storing raw data.
To develop an online training algorithm for RBMs that leverages generative replay instead of explicit data retention.
To evaluate the performance of the proposed method against standard ER in terms of generative capability and memory efficiency.

Proposed method

Training an RBM in an online fashion using Online Contrastive Divergence (OCD), which updates weights incrementally with each new data point.
Replacing traditional experience replay with a generative replay mechanism where the RBM itself generates synthetic past experiences for training.
Using the trained RBM to sample from its learned distribution to simulate previously observed data points during online learning.
Drawing the replay samples by Gibbs sampling (running Markov chains on the RBM as its weights are updated online), so that training does not depend on a stored data buffer.
Using contrastive divergence with a fixed number of Gibbs steps (e.g., n_CD = 3 or 10) to approximate the log-likelihood gradient during online updates.
Integrating the generated samples into the online learning process as if they were real past experiences, enabling continual learning without data storage.
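The steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a binary (Bernoulli) RBM and assumes that each online update uses the new observation together with a few samples the RBM generates by Gibbs sampling from random starting points. The names `BernoulliRBM` and `ocd_gr_step`, the parameters `n_replay` and `n_gibbs`, the learning rate, and the chain initialisation are all hypothetical choices rather than the paper's settings.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class BernoulliRBM:
    """Tiny binary RBM with plain-Python storage (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def p_h(self, v):
        # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def p_v(self, h):
        # P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        # Draw a binary vector from independent Bernoulli probabilities.
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def gibbs(self, v, steps):
        # Run `steps` full Gibbs sweeps v -> h -> v starting from v.
        for _ in range(steps):
            v = self.sample(self.p_v(self.sample(self.p_h(v))))
        return v

    def cd_update(self, batch, n_cd, lr):
        # CD-k: positive phase on the batch, negative phase after n_cd Gibbs sweeps.
        dW = [[0.0] * self.nh for _ in range(self.nv)]
        db = [0.0] * self.nv
        dc = [0.0] * self.nh
        for v0 in batch:
            ph0 = self.p_h(v0)
            vk = self.gibbs(v0, n_cd)
            phk = self.p_h(vk)
            for i in range(self.nv):
                db[i] += v0[i] - vk[i]
                for j in range(self.nh):
                    dW[i][j] += v0[i] * ph0[j] - vk[i] * phk[j]
            for j in range(self.nh):
                dc[j] += ph0[j] - phk[j]
        m = len(batch)
        for i in range(self.nv):
            self.b[i] += lr * db[i] / m
            for j in range(self.nh):
                self.W[i][j] += lr * dW[i][j] / m
        for j in range(self.nh):
            self.c[j] += lr * dc[j] / m


def ocd_gr_step(rbm, x_new, n_replay=4, n_cd=3, n_gibbs=10, lr=0.05):
    """One online step: the new point plus RBM-generated replay samples, no buffer.

    Assumption: replay chains start from uniform random binary vectors.
    """
    replay = []
    for _ in range(n_replay):
        start = rbm.sample([0.5] * rbm.nv)
        replay.append(rbm.gibbs(start, n_gibbs))
    rbm.cd_update([x_new] + replay, n_cd=n_cd, lr=lr)


if __name__ == "__main__":
    rbm = BernoulliRBM(n_visible=6, n_hidden=3, seed=1)
    # A sorted-order stream: each point is used once and never stored.
    for x in [[1, 1, 1, 0, 0, 0]] * 50 + [[0, 0, 0, 1, 1, 1]] * 50:
        ocd_gr_step(rbm, [float(v) for v in x])
    print(rbm.gibbs([0.0] * 6, 20))
```

Because the replay samples are regenerated at every step and then discarded, memory does not grow with the length of the stream; the cost is the extra Gibbs sampling per update.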

Experimental results

Research questions

RQ1: Can a generative model like an RBM effectively simulate past experiences in online learning without storing raw data?
RQ2: How does the performance of online RBM training with generative replay compare to traditional experience replay in terms of generative accuracy?
RQ3: What is the impact of the number of contrastive divergence steps on the generative performance of the online RBM?
RQ4: Does the proposed method maintain low time complexity while achieving high memory efficiency?
RQ5: How does the method scale with increasing data complexity and dataset size?

Key findings

OCD_GR outperformed traditional experience replay in 64.28% of the cases tested across 9 real-world datasets and performed nearly equivalently in the remaining 35.72%.
On the MNIST dataset, the RBM trained with OCD_GR (RBM_OCD) reached an average test-set log probability of -104.31 with 10 contrastive divergence steps, up from -108.96 with 3 steps.
The learning curve of RBM_OCD remained stable over time, while the ER baselines RBM_ER-ML and RBM_ER-IM were unstable due to poor distributional coverage as the replay memory became outdated.
OCD_GR used drastically less memory than ER because it stores no data points, while its time complexity remained comparable.
As dataset size and distributional complexity increased, OCD_GR's performance advantage over the ER-based methods became more pronounced.
The method demonstrated stable and consistent performance across both sorted-order and random-order data arrival scenarios, indicating robustness to data ordering.
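The memory finding can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes binary observations stored at one byte per value, float64 RBM parameters, and MNIST-like sizes (784 visible units, 500 hidden units, a 10,000-point ER buffer, 10 generated replay samples per step); all of these sizes and the helper names are hypothetical, not figures from the paper.

```python
def rbm_param_bytes(n_visible, n_hidden, bytes_per_value=8):
    """Weights plus visible and hidden biases; both ER and OCD_GR hold these."""
    return (n_visible * n_hidden + n_visible + n_hidden) * bytes_per_value


def er_buffer_bytes(buffer_size, n_visible, bytes_per_value=1):
    """Extra memory ER spends keeping raw (binary) observations."""
    return buffer_size * n_visible * bytes_per_value


def total_bytes(method, n_visible, n_hidden, buffer_size=0, n_replay=0):
    params = rbm_param_bytes(n_visible, n_hidden)
    if method == "ER":
        return params + er_buffer_bytes(buffer_size, n_visible)
    if method == "OCD_GR":
        # Only a transient batch of generated samples, discarded after each update.
        return params + n_replay * n_visible
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    print(total_bytes("ER", 784, 500, buffer_size=10_000))  # 10986272
    print(total_bytes("OCD_GR", 784, 500, n_replay=10))     # 3154112
```

ER's extra cost grows with buffer size times input dimension, while OCD_GR adds only a small, fixed batch of generated samples on top of the parameters both methods share.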

This review was created by AI and reviewed by human editors.