[Paper Review] Learning Dynamic Knowledge Graphs to Generalize on Text-Based Games.
This paper proposes GATA, a graph-aided transformer agent that learns dynamic knowledge graphs end-to-end from raw text to improve planning and generalization in text-based games. By combining reinforcement and self-supervised learning, GATA outperforms text-only baselines by an average of 24.2% across 500+ TextWorld games, demonstrating superior policy convergence and generalization.
Playing text-based games requires skills in processing natural language and sequential decision making. Achieving human-level performance on text-based games remains an open challenge, and prior research has largely relied on hand-crafted structured representations and heuristics. In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text. We propose a novel graph-aided transformer agent (GATA) that infers and updates latent belief graphs during planning to enable effective action selection by capturing the underlying game dynamics. GATA is trained using a combination of reinforcement and self-supervised learning. Our work demonstrates that the learned graph-based representations help agents converge to better policies than their text-only counterparts and facilitate effective generalization across game configurations. Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%.
Motivation & Objective
- To overcome the limitations of hand-crafted representations and heuristics in text-based game agents.
- To enable effective sequential decision-making and generalization across diverse game configurations.
- To learn structured, dynamic knowledge graphs end-to-end from raw textual game descriptions.
- To improve policy learning and planning performance through graph-structured belief representations.
Proposed method
- GATA employs a graph-aided transformer architecture that infers and updates latent belief graphs during planning.
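- The graph updater is pre-trained with self-supervised objectives on raw text (observation generation and contrastive observation classification), so it learns to infer belief graphs without ground-truth graph annotations.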
- Reinforcement learning fine-tunes the agent on game-specific rewards, updating the graph based on observed transitions.
- The belief graph captures entity relationships and game state dynamics, enabling better action selection.
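- Graph updates are differentiable, so the graph representation can be learned end-to-end from raw text; the action selector is trained with Double DQN, a value-based reinforcement learning method, rather than with policy gradients.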
- The model integrates attention mechanisms over both text tokens and graph nodes to enhance contextual reasoning.
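The belief-graph idea in the method list can be illustrated with a small sketch. Here a graph is a real-valued tensor indexed by (relation, subject, object), and an old graph is blended with a newly inferred one through a scalar gate. This assumes a continuous adjacency-tensor representation. The toy vocabularies, the gate value, and the helpers `empty_graph`, `set_edge`, `edge` and `update_graph` are all hypothetical. GATA's actual graph updater is a learned neural module, not this fixed interpolation.

```python
from typing import List

# Hypothetical toy vocabularies (illustration only, not GATA's real vocabulary).
RELATIONS = ["in", "on", "is"]
ENTITIES = ["player", "kitchen", "apple", "table", "sliced"]

# A belief graph as a real-valued tensor indexed [relation][subject][object].
Graph = List[List[List[float]]]


def empty_graph() -> Graph:
    """All-zero [R, N, N] tensor: no relation is believed yet."""
    size = len(ENTITIES)
    return [[[0.0] * size for _ in range(size)] for _ in RELATIONS]


def set_edge(g: Graph, rel: str, subj: str, obj: str, value: float) -> None:
    """Write a belief score for the triple (subj, rel, obj) in place."""
    g[RELATIONS.index(rel)][ENTITIES.index(subj)][ENTITIES.index(obj)] = value


def edge(g: Graph, rel: str, subj: str, obj: str) -> float:
    """Read the belief score for the triple (subj, rel, obj)."""
    return g[RELATIONS.index(rel)][ENTITIES.index(subj)][ENTITIES.index(obj)]


def update_graph(prev: Graph, new: Graph, gate: float) -> Graph:
    """Element-wise soft update: gate * new + (1 - gate) * prev.

    Hypothetical stand-in: GATA learns its update with a neural module,
    whereas this uses one fixed scalar gate. The result is still a smooth
    function of both inputs, unlike hard add/delete of discrete triples.
    """
    return [
        [
            [gate * n + (1.0 - gate) * p for p, n in zip(p_row, n_row)]
            for p_row, n_row in zip(p_mat, n_mat)
        ]
        for p_mat, n_mat in zip(prev, new)
    ]


if __name__ == "__main__":
    g_prev = empty_graph()
    set_edge(g_prev, "on", "apple", "table", 1.0)   # belief before the step
    g_new = empty_graph()
    set_edge(g_new, "is", "apple", "sliced", 1.0)   # estimate from the new observation
    g_next = update_graph(g_prev, g_new, gate=0.5)
    print(edge(g_next, "on", "apple", "table"))     # 0.5
    print(edge(g_next, "is", "apple", "sliced"))    # 0.5
```

With a fixed gate, an old belief decays whenever a new estimate omits it. In the example, "apple on table" drops to 0.5. A learned updater can instead decide per entry what to keep or overwrite.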
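The bullet on attention over text tokens and graph nodes can be sketched as plain scaled dot-product attention run in both directions. Token vectors query node vectors, and node vectors query token vectors. This minimal, parameter-free sketch assumes both sides already share one embedding size. The toy vectors and the `text_graph_attention` helper are hypothetical. GATA's own aggregation uses learned parameters that are not shown here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query returns a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


def text_graph_attention(
    tokens: List[Vector], nodes: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical helper: attention in both directions between text and graph.

    Assumes token and node vectors share one embedding size. Unlike a real
    model, there are no learned projection matrices here.
    """
    tokens_given_graph = attend(tokens, nodes, nodes)  # each token reads the graph
    nodes_given_text = attend(nodes, tokens, tokens)   # each node reads the text
    return tokens_given_graph, nodes_given_text


if __name__ == "__main__":
    toy_tokens = [[1.0, 0.0], [0.0, 1.0]]             # toy token embeddings
    toy_nodes = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # toy node embeddings
    t_ctx, n_ctx = text_graph_attention(toy_tokens, toy_nodes)
    print(t_ctx)
    print(n_ctx)
```

Running attention in both directions gives each token a graph-aware summary and each node a text-aware summary. A real agent would typically combine these before scoring candidate actions.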
Experimental results
Research questions
- RQ1: Can end-to-end learned dynamic knowledge graphs improve policy learning in text-based games?
- RQ2: How does graph-structured representation enhance generalization across unseen game configurations?
- RQ3: To what extent does combining self-supervised and reinforcement learning improve agent performance compared to text-only baselines?
- RQ4: Can the agent maintain effective planning under dynamic and complex game environments using latent graphs?
Key findings
- GATA outperforms text-only baselines by an average of 24.2% across 500+ games in the TextWorld suite.
- The learned graph representations enable faster convergence to high-performing policies compared to text-only models.
- Generalization across unseen game configurations is significantly improved due to structured, dynamic knowledge encoding.
- Self-supervised pre-training on raw text enhances downstream reinforcement learning performance.
- The dynamic graph updates allow the agent to adaptively model evolving game states and relationships.
- The graph-aided approach leads to more robust and interpretable decision-making in complex text-based environments.
This review was created by AI and reviewed by human editors.