QUICK REVIEW

[Paper Review] RuleSmith: Multi-Agent LLMs for Automated Game Balancing

Ziyao Zeng, Chen Liu|arXiv (Cornell University)|2026. 02. 05.

Artificial Intelligence in Games|Citations: 0

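One-Line Summary

RuleSmith combines multi-agent LLM self-play with Bayesian optimization to automatically balance an asymmetric, parameterized game (CivMini), reaching near-equal win rates and producing interpretable rule adjustments.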

ABSTRACT

Game balancing is a longstanding challenge requiring repeated playtesting, expert intuition, and extensive manual tuning. We introduce RuleSmith, the first framework that achieves automated game balancing by leveraging the reasoning capabilities of multi-agent LLMs. It couples a game engine, multi-agent LLMs self-play, and Bayesian optimization operating over a multi-dimensional rule space. As a proof of concept, we instantiate RuleSmith on CivMini, a simplified civilization-style game containing heterogeneous factions, economy systems, production rules, and combat mechanics, all governed by tunable parameters. LLM agents interpret textual rulebooks and game states to generate actions, to conduct fast evaluation of balance metrics such as win-rate disparities. To search the parameter landscape efficiently, we integrate Bayesian optimization with acquisition-based adaptive sampling and discrete projection: promising candidates receive more evaluation games for accurate assessment, while exploratory candidates receive fewer games for efficient exploration. Experiments show that RuleSmith converges to highly balanced configurations and provides interpretable rule adjustments that can be directly applied to downstream game systems. Our results illustrate that LLM simulation can serve as a powerful surrogate for automating design and balancing in complex multi-agent environments.

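Motivation and Goals

Automate the balancing of asymmetric, rule-based games using language-model agents, without hand-crafted heuristics.
Use a parameterized rule space to study how health, economy, production, and scoring affect balance.
Develop an efficient optimization loop that handles expensive, noisy evaluations through acquisition-based adaptive sampling.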

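Proposed Method

Set up two LLM agents to play the asymmetric factions (Empire and Nomads) under a rule space parameterized by theta.
Evaluate the balance loss L(theta) = |w_E - 0.5| + |w_N - 0.5| + 0.5 * w_D, estimated from N self-play games, where w_E and w_N are the Empire and Nomads win rates and w_D is the draw rate.
Optimize theta with Bayesian optimization over a continuous relaxation of the rule space, then deterministically discretize the result into a valid configuration.
Allocate more evaluation games to promising candidates through adaptive sampling based on Expected Improvement.
Introduce a retrieval-augmented generation (RAG) system that retrieves the relevant rule text and outputs structured JSON actions for every unit on each turn.
Demonstrate the approach on CivMini, a 7x7 grid game with two factions and 12 parameters spanning economy, combat, production, and scoring.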
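The balance loss and the discretization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the outcome labels ("E", "N", "D"), the function names, and the parameter name and bounds in the usage note are assumptions made for the example, and the review does not say how the paper's projection actually rounds values.

```python
from collections import Counter


def balance_loss(outcomes):
    """Estimate L(theta) = |w_E - 0.5| + |w_N - 0.5| + 0.5 * w_D.

    `outcomes` is a list of per-game results from N self-play games:
    "E" (Empire win), "N" (Nomads win) or "D" (draw). The labels are
    an assumption of this sketch.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one game to estimate the loss")
    counts = Counter(outcomes)
    w_e = counts["E"] / n
    w_n = counts["N"] / n
    w_d = counts["D"] / n
    return abs(w_e - 0.5) + abs(w_n - 0.5) + 0.5 * w_d


def project_to_valid(theta, specs):
    """Map a continuous candidate onto a discrete grid, deterministically.

    `specs` maps a parameter name to (low, high, step). Each value is
    clipped to [low, high] and snapped to the nearest grid point. This
    rule is one plausible choice, assumed here for illustration.
    """
    projected = {}
    for name, (low, high, step) in specs.items():
        clipped = min(max(theta[name], low), high)
        k = round((clipped - low) / step)
        projected[name] = low + k * step
    return projected
```

For example, `balance_loss(["E", "N"] * 5)` evaluates to 0.0 (a perfectly even split with no draws), while ten straight Empire wins give 1.0. With a hypothetical spec `{"unit_hp": (1, 10, 1)}`, a continuous proposal of 3.7 is projected to 4.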

Figure 1: Overview of RuleSmith. Multi-agent LLMs perform zero-shot self-play using solely the rule book under parameterized rule sets to automatically optimize asymmetric strategy games and other rule-driven systems. This figure is generated by Nano Banana Pro.

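Experimental Results

Research Questions

RQ1: Can multi-agent LLM self-play grounded in an executable rulebook produce useful evaluations for balancing a parameterized asymmetric game?
RQ2: Can combining LLM-driven self-play with Bayesian optimization efficiently discover balanced rule configurations in a high-dimensional space?
RQ3: How do model capacity and evaluation budget affect balance outcomes and the transferability of optimized parameters across settings?
RQ4: What interpretable adjustments to economy, combat, production, and scoring emerge when the algorithm balances asymmetric factions?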

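Key Results

RuleSmith converges to highly balanced configurations on CivMini, with the win-rate gap approaching 0%.
The optimized parameters offer interpretable insight into how health scaling, resource efficiency, and production speed affect fairness.
Adaptive sampling with an acquisition-based budget improves efficiency over fixed-sample BO and random baselines.
Balance parameters transfer across evaluation settings when model capacities are matched, while larger models show a strategic advantage in cross-play scenarios.
Adaptive sampling outperforms fixed sampling and the other baselines at reaching near-equal win rates.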
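To make the idea of an acquisition-based budget concrete, the sketch below computes Expected Improvement (for a loss being minimized) from a surrogate's posterior mean and standard deviation, then maps it to a number of evaluation games. The EI formula is the standard closed form for a Gaussian posterior. The linear budget rule, the function names, and the defaults `n_min=4` and `n_max=20` are assumptions for illustration only, since the review does not state the paper's exact allocation rule.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under a Gaussian posterior.

    mu, sigma: surrogate posterior mean and standard deviation of the
    loss at a candidate; best: lowest loss observed so far.
    """
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def games_for_candidate(ei, ei_ref, n_min=4, n_max=20):
    """Assumed rule: scale the game budget linearly with EI / ei_ref.

    ei_ref is a reference EI value (for instance the largest EI seen
    so far). Promising candidates get up to n_max games, exploratory
    ones get as few as n_min.
    """
    if ei_ref <= 0.0:
        return n_min
    frac = min(max(ei / ei_ref, 0.0), 1.0)
    return n_min + round(frac * (n_max - n_min))
```

Under these assumptions, a candidate whose EI equals the reference value receives 20 games, one with half that EI receives 12, and one with zero EI receives the minimum of 4.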

Figure 2: Overview of the RuleSmith method. We represent an asymmetric, turn-based strategy game (CivMini) as a parameterized rule space $\theta\in\Theta$ , including economy, combat, production, scoring, and game-length parameters. Given a candidate rule configuration $\theta_{t}$ , two role-specif

This review was generated by AI and checked by a human editor.