QUICK REVIEW

[Paper Review] WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation

Zahin Sufiyan, Shadan Golestan | arXiv (Cornell University) | 2026. 03. 18.

Reinforcement Learning in Robotics · Citations: 0

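One-Line Summary

WINFlowNets co-trains a flow network and a retrieval network with a shared replay buffer in a warm-up plus dual-training framework, enabling continual adaptation in dynamic, fault-prone robotic tasks and outperforming CFlowNets and standard RL baselines in average reward and training stability.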

ABSTRACT

Generative Flow Networks for continuous scenarios (CFlowNets) have shown promise in solving sequential decision-making tasks by learning stochastic policies using a flow and a retrieval network. Despite their demonstrated efficiency compared to state-of-the-art Reinforcement Learning (RL) algorithms, their practical application in robotic control tasks is constrained by the reliance on pre-training the retrieval network. This dependency poses challenges in dynamic robotic environments, where pre-training data may not be readily available or representative of the current environment. This paper introduces WINFlowNets, a novel CFlowNets framework that enables the co-training of flow and retrieval networks. WINFlowNets begins with a warm-up phase for the retrieval network to bootstrap its policy, followed by a shared training architecture and a shared replay buffer for co-training both networks. Experiments in simulated robotic environments demonstrate that WINFlowNets surpasses CFlowNets and state-of-the-art RL algorithms in terms of average reward and training stability. Furthermore, WINFlowNets exhibits strong adaptive capability in fault environments, making it suitable for tasks that demand quick adaptation with limited sample data. These findings highlight WINFlowNets' potential for deployment in dynamic and malfunction-prone robotic systems, where traditional pre-training or sample inefficient data collection may be impractical.

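Motivation and Objectives

Enable robust sequential decision-making for continuous robotic control in dynamic environments and under faults.
Remove the dependence on a pre-trained retrieval network by training the flow and retrieval components together.
Propose a two-phase training scheme (Warm-Up + Dual-Training) with a shared replay buffer to enable continual adaptation.
Demonstrate improved average reward and stability over CFlowNets and traditional RL algorithms under simulated robot faults.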

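Proposed Method

Introduces WINFlowNets, a two-network GFlowNet framework with a shared replay buffer.
The Warm-Up phase trains the retrieval network Gϕ to predict the previous state from observed transitions.
The Dual-Training phase updates the flow network Fθ and Gϕ together, using inflow/outflow estimates and the shared buffer.
Flow matching uses approximations of the inflow f+(s) and outflow f−(s), computed through Fθ and Gϕ, in log-sum-exp form.
Equation 2 expresses the continuous flow matching loss in terms of sampled actions and rewards.
Training avoids pre-training Gϕ, which allows adaptation to out-of-distribution and fault scenarios.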
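
The two-phase schedule and the log-sum-exp flow matching term described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the review does not reproduce Equation 2, so the loss below assumes the squared log-ratio form commonly used for continuous flow matching, with a non-negative reward and a small constant `eps`. The names `flow_matching_loss`, `train`, `collect`, `update_retrieval`, and `update_flow` are hypothetical, and the real networks Fθ and Gϕ are replaced by caller-supplied callbacks.

```python
import random
from collections import deque

import numpy as np


def flow_matching_loss(log_inflows, log_outflows, reward, eps=1e-6):
    """Squared gap between log(inflow) and log(reward + outflow) for one state.

    log_inflows / log_outflows: log-flow values for K sampled parent / child
    edges (hypothetical outputs of F_theta; parents would come from G_phi).
    The reward is assumed to be non-negative in this sketch.
    """
    inflow = np.exp(np.asarray(log_inflows, dtype=float)).sum()
    outflow = np.exp(np.asarray(log_outflows, dtype=float)).sum()
    return float((np.log(eps + inflow) - np.log(eps + reward + outflow)) ** 2)


def train(collect, update_retrieval, update_flow,
          warmup_steps, dual_steps, batch_size=32, capacity=10_000):
    """Warm-up followed by dual-training over one shared replay buffer."""
    buffer = deque(maxlen=capacity)  # shared by both networks
    for step in range(warmup_steps + dual_steps):
        buffer.extend(collect())  # append newly observed transitions
        batch = random.sample(list(buffer), min(batch_size, len(buffer)))
        update_retrieval(batch)  # G_phi: learn to predict the previous state
        if step >= warmup_steps:
            update_flow(batch)  # F_theta joins only in the dual-training phase
    return buffer
```

In this sketch the retrieval update runs in both phases and the flow update starts only after `warmup_steps`, which mirrors the description of bootstrapping Gϕ first and then co-training. How transitions are collected during warm-up is left to the hypothetical `collect` callback, since the review does not specify it.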

(a) Directed acyclic graph representation of the decision-making process in GFlowNet.

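Experimental Results

Research Questions

RQ1: Can co-training the flow and retrieval networks without pre-training improve adaptability to dynamic, fault-prone environments?
RQ2: Does the Warm-Up + Dual-Training WINFlowNets framework outperform standard CFlowNets and RL baselines on normal and faulty robotic tasks?
RQ3: How does a shared replay buffer affect training stability and adaptation speed compared with separate buffers?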

Key Results

Model	Final performance	Sample efficiency
SAC	-7.89 ± 0.16	0.67
PPO	-9.50 ± 0.37	3.39
DDPG	-9.55 ± 0.44	5.20
CFlowNets	-3.70 ± 0.05	0.10
WINFlowNets	-2.39 ± 0.17	0.72
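
To make the size of the differences in the table concrete, the short sketch below computes the gap in final performance between WINFlowNets and each baseline. It assumes the "final performance" column is an average reward where higher (closer to zero) is better, as the surrounding text suggests; the review does not state the units of the sample-efficiency column, so that column is left out. The variable names are illustrative only.

```python
# Final-performance means copied from the results table.
final_reward = {
    "SAC": -7.89,
    "PPO": -9.50,
    "DDPG": -9.55,
    "CFlowNets": -3.70,
    "WINFlowNets": -2.39,
}

# Model with the highest mean final performance.
best_model = max(final_reward, key=final_reward.get)

# Gap between WINFlowNets and each baseline (positive = WINFlowNets is higher).
gap_to_winflownets = {
    name: round(final_reward["WINFlowNets"] - value, 2)
    for name, value in final_reward.items()
    if name != "WINFlowNets"
}

print(best_model)
print(gap_to_winflownets)
```

Under that assumption, WINFlowNets has the highest mean, with a gap of 1.31 over CFlowNets and 5.50 over SAC, the strongest of the three RL baselines in the table.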

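WINFlowNets exceeds CFlowNets and the RL baselines (PPO, SAC, DDPG) in average reward in the normal Reacher-v2 environment.
In fault scenarios, WINFlowNets improves final performance over CFlowNets and SAC, indicating better fault adaptation.
The Warm-Up + Dual-Training architecture with a shared replay buffer is more stable and reaches better asymptotic performance than variants missing any one of these components.
WINFlowNets needs more training samples to reach asymptotic performance, but continual adaptation yields a higher final policy quality.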

(b) An overview of our proposed decision-making framework.

This review was generated by AI and reviewed by a human editor.