QUICK REVIEW

[Paper Review] Crowdsourcing Gaze Data Collection

Dmitry Rudoy, Dan B Goldman|arXiv (Cornell University)|Apr 16, 2012

Visual Attention and Saliency Detection | 5 references | 19 citations

TL;DR

This paper proposes a low-cost, crowdsourced method for collecting gaze direction data from large numbers of participants using self-reported gaze locations via a temporary character chart displayed after video playback. By validating responses against the chart's layout and aggregating results into heatmaps, the method achieves gaze data accuracy comparable to traditional hardware tracking, enabling scalable, globally distributed gaze studies despite uncontrolled viewing conditions.

ABSTRACT

Knowing where people look is a useful tool in many various image and video applications. However, traditional gaze tracking hardware is expensive and requires local study participants, so acquiring gaze location data from a large number of participants is very problematic. In this work we propose a crowdsourced method for acquisition of gaze direction data from a virtually unlimited number of participants, using a robust self-reporting mechanism (see Figure 1). Our system collects temporally sparse but spatially dense points-of-attention in any visual information. We apply our approach to an existing video data set and demonstrate that we obtain results similar to traditional gaze tracking. We also explore the parameter ranges of our method, and collect gaze tracking data for a large set of YouTube videos.

Motivation & Objective

To address the high cost and limited scalability of traditional gaze tracking hardware in collecting gaze data from large, diverse participant pools.
To enable the collection of gaze location data from a virtually unlimited number of participants using only standard web browsers and internet access.
To develop a robust self-reporting mechanism that ensures data reliability and spatial accuracy without specialized equipment.
To validate that self-reported gaze data can achieve results statistically similar to those from lab-based gaze tracking systems.
To explore the feasibility of using crowdsourced gaze data for large-scale video analysis across diverse demographics and viewing environments.

Proposed method

A video clip of duration $t_v$ seconds is shown to participants, followed immediately by a brief display ($t_c$ seconds) of a character chart containing unique symbol triplets arranged in a grid.
Participants report the symbol triplet they saw most clearly, and the system maps this to its known screen location as the estimated gaze point.
The chart's layout is used to detect and reject invalid responses (e.g., incorrect or random inputs), improving data quality.
A tutorial phase with an approval radius $R_a$ is used to screen out inattentive or careless participants, improving overall data reliability.
Gaze locations from multiple participants are aggregated into a probability density function, visualized as a heat map to show attention hotspots.
The method uses a triplet density $D_r$ to control the spatial distribution of symbols on the chart, minimizing clustering and improving spatial resolution.
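The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular grid layout, the uppercase three-letter codes, the Gaussian smoothing width `sigma`, and all function names (`build_chart`, `gaze_from_response`, `passes_tutorial`, `heat_map`) are hypothetical choices made for this example.

```python
import math
import random
import string


def build_chart(cols, rows, width, height, seed=0):
    """Place one unique three-character code at the centre of each grid cell.

    Assumption: a regular grid; the paper's actual chart layout may differ.
    """
    rng = random.Random(seed)
    chart = {}
    while len(chart) < cols * rows:
        code = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
        if code in chart:
            continue  # keep codes unique so each one maps to a single location
        i = len(chart)
        col, row = i % cols, i // cols
        chart[code] = ((col + 0.5) * width / cols, (row + 0.5) * height / rows)
    return chart


def gaze_from_response(chart, response):
    """Return the (x, y) location of a reported code, or None if it is not on the chart."""
    return chart.get(response.strip().upper())


def passes_tutorial(gaze, target, approval_radius):
    """Tutorial check: the reported point must lie within R_a of a known target."""
    return gaze is not None and math.dist(gaze, target) <= approval_radius


def heat_map(points, width, height, cell=20, sigma=40.0):
    """Sum one Gaussian per gaze point on a coarse grid, then normalise to a density."""
    nx, ny = width // cell, height // cell
    grid = [[0.0] * nx for _ in range(ny)]
    for px, py in points:
        for gy in range(ny):
            for gx in range(nx):
                cx, cy = (gx + 0.5) * cell, (gy + 0.5) * cell
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                grid[gy][gx] += math.exp(-d2 / (2 * sigma ** 2))
    total = sum(map(sum, grid))
    if total == 0:
        return grid  # no valid responses: leave the map empty
    return [[v / total for v in row] for row in grid]


# Usage: one chart, a few responses (one invalid), one aggregated map.
chart = build_chart(cols=8, rows=6, width=640, height=480)
codes = list(chart)
responses = [codes[10], codes[10], codes[11], "???"]
points = [p for p in (gaze_from_response(chart, r) for r in responses) if p is not None]
density = heat_map(points, 640, 480)
```

Because every code appears exactly once on the chart, a response that matches no code can be rejected without any further information about the participant; that is the validation property the method relies on.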

Experimental results

Research questions

RQ1: Can self-reported gaze data collected via a simple web-based interface achieve accuracy comparable to traditional hardware gaze tracking?
RQ2: How does the performance of the crowdsourced method vary under uncontrolled real-world viewing conditions compared to lab-controlled settings?
RQ3: To what extent can tutorial-based screening and response validation improve data quality in large-scale gaze data collection?
RQ4: Can the method reliably capture attention patterns in dynamic video stimuli despite temporal sparsity of gaze samples?
RQ5: How representative are the gaze distributions collected via crowdsourcing compared to those from controlled laboratory experiments?

Key findings

The crowdsourced method produced gaze heatmaps that were statistically similar to those obtained using traditional hardware gaze tracking, validating its accuracy despite lower temporal resolution.
Participants who passed only two out of ten tutorial trials still produced high-quality gaze data, suggesting that the tutorial mechanism effectively promotes attention to gaze location.
The use of a character chart with unique triplets enabled precise spatial mapping of gaze points and allowed for automatic detection of invalid responses.
The method demonstrated robustness across diverse viewing conditions, including variations in screen resolution, brightness, and viewing distance, though this introduced variability compared to lab settings.
The system successfully collected gaze data for a large set of YouTube videos, enabling global demographic correlation that would be infeasible with traditional methods.
The study found that gaze patterns varied significantly with screen contrast and ambient lighting, highlighting the trade-off between ecological validity and consistency in data collection.
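This review does not state which statistic was used to compare crowdsourced heat maps with hardware-tracked ones. As one illustrative option only, the sketch below scores two maps by histogram intersection after normalising each to sum to 1. The metric choice and the function names are assumptions made for this example, not the paper's evaluation procedure.

```python
def normalise(grid):
    """Scale a non-negative 2D map so that its entries sum to 1."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("map has no mass")
    return [[v / total for v in row] for row in grid]


def histogram_intersection(a, b):
    """Overlap of two same-shaped maps: 1.0 if identical, 0.0 if disjoint."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    pa, pb = normalise(a), normalise(b)
    return sum(min(x, y) for ra, rb in zip(pa, pb) for x, y in zip(ra, rb))


# Usage: two toy 2x2 maps that share half of their probability mass.
crowd = [[1, 1], [0, 0]]
tracker = [[1, 0], [0, 1]]
score = histogram_intersection(crowd, tracker)
```

A score near 1 would indicate that the two collection methods place attention in the same regions. Other measures, such as correlation or KL divergence, could be substituted without changing the structure of the comparison.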

This review was created by AI and reviewed by human editors.