QUICK REVIEW

[Paper Review] Ubiquitous Talker: Spoken Language Interaction with Real World Objects

Katashi Nagao, Jun Rekimoto | arXiv.org | May 23, 1995
Speech and dialogue systems | 14 references | 44 citations
TL;DR

The Ubiquitous Talker is a portable augmented reality system that enables natural spoken language interaction with real-world objects, using color-bar ID tags for situation awareness. By combining speech recognition, synthesized voice output, and real-time object recognition via a camera, the system lets users converse with physical objects as if the objects themselves were intelligent agents. Grounding language in the recognized object is intended to make interpretation more robust and the interaction more natural.

ABSTRACT

Augmented reality is a research area that tries to embody an electronic information space within the real world, through computational devices. A crucial issue within this area is the recognition of real world objects or situations. In natural language processing, it is much easier to determine interpretations of utterances, even if they are ill-formed, when the context or situation is fixed. We therefore introduce robust, natural language processing into a system of augmented reality with situation awareness. Based on this idea, we have developed a portable system, called the Ubiquitous Talker. This consists of an LCD display that reflects the scene at which a user is looking as if it is a transparent glass, a CCD camera for recognizing real world objects with color-bar ID codes, a microphone for recognizing a human voice and a speaker which outputs a synthesized voice. The Ubiquitous Talker provides its user with some information related to a recognized object, by using the display and voice. It also accepts requests or questions as voice inputs. The user feels as if he/she is talking with the object itself through the system.

Motivation & Objective

  • To enable natural, robust spoken language interaction with physical objects in everyday environments.
  • To reduce the complexity of spoken language understanding by grounding it in real-world situational context.
  • To develop a portable, real-time system that integrates speech input/output with visual object recognition for augmented reality.
  • To improve the accuracy and efficiency of speech processing by leveraging non-linguistic context such as object identity and user gaze.
  • To create a seamless, intuitive human-computer interface where users feel they are conversing directly with physical objects.

Proposed method

  • Uses a color-bar ID code system to assign unique, machine-readable identifiers to real-world objects for reliable recognition.
  • Employs a CCD camera to detect and decode color-bar tags in real time, enabling the system to identify which object the user is interacting with.
  • Integrates a microphone and speaker for two-way voice interaction, allowing users to issue spoken requests or ask questions and hear synthesized replies.
  • Uses an LCD display to overlay digital information (e.g., descriptions, instructions) onto the real-world view, simulating a see-through interface.
  • Applies situation-aware speech processing that constrains the hypothesis space of natural language understanding using object identity and user attention cues.
  • Leverages deictic expressions (e.g., 'this', 'that', 'here') and gaze tracking to anchor language to physical context, reducing ambiguity.
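
The tag-reading step above can be sketched in a few lines. This is only an illustration under stated assumptions: the review does not describe the actual tag layout, so the two-color, one-bit-per-stripe encoding, the COLOR_BITS table, the OBJECT_REGISTRY contents, and both function names are hypothetical. The sketch assumes the camera pipeline has already produced a left-to-right list of stripe colors.

```python
from typing import Optional, Sequence

# Hypothetical encoding: each stripe is one of two colors and carries one bit.
# The real tag layout is not given in this review.
COLOR_BITS = {"red": 1, "blue": 0}

# Hypothetical registry mapping decoded tag IDs to object names.
OBJECT_REGISTRY = {5: "painting", 6: "calendar"}


def decode_color_bars(stripes: Sequence[str]) -> Optional[int]:
    """Turn a left-to-right sequence of stripe colors into an integer ID.

    Returns None if any stripe has an unknown color, so a misread frame
    is dropped instead of being guessed at.
    """
    value = 0
    for color in stripes:
        bit = COLOR_BITS.get(color)
        if bit is None:
            return None
        value = (value << 1) | bit
    return value


def identify_object(stripes: Sequence[str]) -> Optional[str]:
    """Map a stripe sequence to a registered object name, or None."""
    tag_id = decode_color_bars(stripes)
    if tag_id is None:
        return None
    return OBJECT_REGISTRY.get(tag_id)
```

Returning None for unreadable or unregistered tags lets the caller simply wait for the next camera frame, which suits a real-time loop better than raising an error.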

Experimental results

Research questions

  • RQ1: How can spoken language understanding be made more robust in real-world settings with ill-formed or ambiguous utterances?
  • RQ2: To what extent can situational context, such as object identity and user gaze, reduce the complexity of speech recognition and interpretation?
  • RQ3: Can a low-cost, reliable method for object identification (e.g., color-bar tags) enable practical, real-time augmented reality interaction?
  • RQ4: How can multimodal input (speech, vision, gaze) be integrated to support natural, situated dialogue with physical objects?
  • RQ5: What role does non-linguistic context play in improving the accuracy and efficiency of spoken language processing in human-computer interaction?

Key findings

  • The integration of visual object recognition via color-bar ID codes significantly reduces the hypothesis space in speech understanding, enabling robust interpretation of ill-formed or ambiguous utterances.
  • Users perceive the system as a conversational agent interacting with physical objects, achieving a high degree of naturalness and immersion in human-computer interaction.
  • The system successfully supports real-time, portable spoken interaction with physical objects, demonstrating the feasibility of context-aware augmented reality in everyday environments.
  • Situation awareness through object identity and gaze tracking allows the system to resolve deictic references (e.g., 'this', 'that') with high accuracy, reducing ambiguity in dialogue.
  • The use of passive, battery-free color-bar tags provides a low-cost, reliable, and scalable solution for object identification compared to active or embedded computing systems.
  • The system’s performance is enhanced by combining linguistic and non-linguistic modalities, showing that multimodal context is essential for efficient and accurate spoken language processing.
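
To make the "reduced hypothesis space" finding concrete, here is a minimal sketch, not the authors' method. It assumes the object has already been identified and scores an utterance only against the handful of intents registered for that object, using plain keyword overlap. The INTENTS_BY_OBJECT table, the intent names, and the interpret function are all hypothetical.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical per-object intents, each described by a few keywords.
INTENTS_BY_OBJECT: Dict[str, Dict[str, Set[str]]] = {
    "painting": {
        "ask_artist": {"who", "painted", "artist"},
        "ask_date": {"when", "year", "painted"},
    },
    "calendar": {
        "ask_schedule": {"schedule", "today", "meeting"},
        "add_event": {"add", "new", "meeting"},
    },
}


def tokenize(utterance: str) -> Set[str]:
    """Lowercase the utterance and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def interpret(utterance: str, object_name: str) -> Optional[str]:
    """Pick the best-matching intent among those valid for the given object.

    Only intents registered for the recognized object are considered, so
    word order and filler words in an ill-formed utterance matter little.
    Returns None when no candidate shares any keyword with the utterance.
    """
    words = tokenize(utterance)
    candidates = INTENTS_BY_OBJECT.get(object_name, {})
    best_intent, best_score = None, 0
    for intent, keywords in candidates.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```

For example, interpret("uh who this painted by", "painting") selects ask_artist even though the utterance is ungrammatical, and the deictic "this" needs no separate resolution because the referent is already fixed by the recognized tag. The same words with object_name "calendar" match nothing and return None.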

This review was created by AI and reviewed by human editors.