QUICK REVIEW

[Paper Digest] Realistic Synthetic Household Data Generation at Scale

Siddharth Singh, Ifrah Idrees|arXiv (Cornell University)|Feb 6, 2026
Social Robot Interaction and HRI | Cited by 0
One-sentence summary

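A bidirectionally coupled framework generates scalable, semantically consistent synthetic household environments and long-term human-robot interaction data, refines them iteratively under the guidance of user personas, and is validated against real and synthetic baselines.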

ABSTRACT

Advancements in foundation models have catalyzed research in Embodied AI to develop interactive agents capable of environmental reasoning and interaction. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments. Human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Our flexible tool allows users to define dataset characteristics via natural language prompts, enabling configuration of environment and human activity data through natural language specifications. The tool creates variations of user-defined configurations, enabling scalable data generation. We validate our framework through statistical evaluation using multi-modal embeddings and key metrics: cosine similarity, mutual information gain, intervention analysis, and iterative improvement validation. Statistical comparisons show good alignment with real-world datasets (HOMER) with cosine similarity (0.60), while synthetic datasets (Wang et al.) show moderate alignment (0.27). Intervention analysis across age, organization, and sleep pattern changes shows statistically significant effects (p < 0.001) with large effect sizes (Cohen's d = 0.51-1.12), confirming bidirectional coupling translates persona traits into measurable environmental and behavioral differences. These contributions enable development and testing of household smart devices at scale.

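Motivation and objectives

  • Establish the need for large-scale, realistic synthetic data to train embodied AI across diverse household environments.
  • Propose a bidirectional, loosely coupled framework that links environment schematic generation with human activity generation.
  • Enable persona-driven environment generation and environment-informed activity synthesis with temporal consistency.
  • Demonstrate bidirectional information exchange through iterative refinement and multi-modal validation.
  • Show sim-to-real alignment and its practical implications for household robot data pipelines.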

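Proposed method

  • The Environment Schematic Generator creates 3D household layouts with semantic object placement based on persona requirements.
  • The Human Activity and HRI Generator synthesizes temporally consistent behavior sequences grounded in the environment's affordances.
  • The Bidirectional Influence Controller coordinates the iterative exchange of information between the environment and activity modules.
  • The Universal Simulation Adapter converts the intermediate representation into a simulator-agnostic format while preserving semantics.
  • Iterative refinement proceeds until convergence, based on environment density, activity granularity, and semantic similarity criteria.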
Figure 1: Framework Pipeline Overview: Our bidirectional generation framework comprises three primary modules operating in an iterative refinement cycle. The Environment Schematic Generator produces 3D household layouts based on persona-driven requirements. The Human Activity and HRI Generator synth
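The refinement cycle described above can be sketched as a simple alternating loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `generate_environment`, `generate_activities`, and `score_consistency`, the tolerance `tol`, and the single-score stopping rule are all hypothetical placeholders. The digest only says that convergence is judged on environment density, activity granularity, and semantic similarity.

```python
from typing import Callable, Optional, Tuple


def refine(
    persona: str,
    generate_environment: Callable[[str, Optional[str]], str],
    generate_activities: Callable[[str, str], str],
    score_consistency: Callable[[str, str, str], float],
    max_iters: int = 5,
    tol: float = 0.02,
) -> Tuple[str, Optional[str], int]:
    """Alternate environment and activity generation until the score plateaus.

    All three callables are hypothetical stand-ins for the framework's modules.
    Returns the last environment, the last activities, and the iterations used.
    """
    env = ""
    activities: Optional[str] = None
    prev_score = float("-inf")
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # Persona (plus any earlier activity feedback) shapes the environment.
        env = generate_environment(persona, activities)
        # The environment then constrains which activities are plausible.
        activities = generate_activities(persona, env)
        score = score_consistency(persona, env, activities)
        # Stop once the improvement over the previous round falls below tol.
        if score - prev_score < tol:
            break
        prev_score = score
    return env, activities, iteration
```

Passing the modules in as callables keeps the loop independent of any particular LLM backend or simulator, which mirrors the loose coupling the digest describes.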

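Experimental results

Research questions

  • RQ1: Does bidirectional coupling between environment generation and human activity synthesis produce more realistic, semantically grounded synthetic household data?
  • RQ2: Do persona-driven environment generation and activity generation match real-world household patterns and support scalable variation?
  • RQ3: Does iterative refinement improve semantic consistency and mutual information among persona, environment, and behavior?
  • RQ4: How well does the generated data align with real datasets (such as HOMER) and with other synthetic baselines?
  • RQ5: What is the practical computational cost of large-scale generation, and how does it affect feasibility?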

Key findings

Iteration    MI(P,E)+MI(E,B)    Cosine Sim.
1            0.45±0.09          0.58±0.12
2            0.62±0.08          0.65±0.10
3            0.74±0.06          0.71±0.08
4            0.81±0.05          0.76±0.07
5            0.85±0.04          0.79±0.06
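The per-iteration gains implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the mean values reported above and ignores the ± spreads; the helper name `gains` is introduced here for illustration.

```python
# Mean values copied from the table above, one entry per iteration (1 to 5).
mi = [0.45, 0.62, 0.74, 0.81, 0.85]    # MI(P,E)+MI(E,B)
cos = [0.58, 0.65, 0.71, 0.76, 0.79]   # cosine similarity


def gains(values):
    """Return the improvement between consecutive iterations, rounded to 2 places."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


mi_gains = gains(mi)    # [0.17, 0.12, 0.07, 0.04]
cos_gains = gains(cos)  # [0.07, 0.06, 0.05, 0.03]
```

Both sequences shrink at every step, so each additional refinement round adds less than the previous one. That is consistent with the metrics approaching convergence by iteration 5, although the table alone does not show where they plateau.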
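  • Multi-modal semantic alignment is fairly high: cosine similarity is 0.72 for environment–behavior, 0.68 for persona–environment, and 0.61 for persona–behavior.
  • Bidirectional coupling lets the environment mediate between persona and behavior, and mutual information rises with iteration (MI(P,E)+MI(E,B) reaches 0.85).
  • Intervention analysis shows that persona changes have significant causal effects on environment and behavior (p < 0.001; Cohen's d 0.51–1.12).
  • Validation against real data shows good similarity to HOMER (cosine about 0.60), while alignment with the synthetic dataset of Wang et al. is weaker (about 0.27).
  • In a small test scenario, the computational evaluation indicates that the approach is usable at scale; the cost includes multiple LLM calls and processing times of 50.00 s for environments, 81.04 s for HRIs, and 19.00 s for the bidirectional step.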
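Two of the metrics quoted above, cosine similarity and Cohen's d, are easy to state in code. The sketch below is an illustration on toy inputs, not the paper's evaluation pipeline. The digest does not say which embedding model or which variant of Cohen's d was used, so the pooled-standard-deviation form here is an assumption.

```python
import math
from statistics import mean, variance


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation.

    Assumption: pooled-SD variant; each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)


# Toy inputs only; these are not values from the paper.
sim = cosine_similarity([1.0, 0.0], [1.0, 1.0])  # about 0.707
d = cohens_d([1, 2, 3], [2, 3, 4])               # 1.0
```

In an intervention analysis of the kind described, `group_a` and `group_b` would hold some scalar measurement taken before and after changing one persona trait, such as age or sleep pattern.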
Figure 2: Input Specification and Contextual Memory Framework: Our system accepts structured natural language descriptions of household member personas and environmental constraints. The framework maintains contextual memory across the pipeline, providing the LLM with context regarding task requirem


This digest was generated by AI and reviewed by human editors.