[Paper Review] Does Object Recognition Work for Everyone?
The paper evaluates publicly available object-recognition systems on the Dollar Street dataset and finds substantial accuracy gaps across countries and income levels, driven by item appearance and context differences, suggesting the need for more globally representative and multilingual models.
The paper analyzes the accuracy of publicly available object-recognition systems on a geographically diverse dataset. This dataset contains household items and was designed to have more representative geographical coverage than the image datasets commonly used in object recognition. The authors find that the systems perform relatively poorly on household items that commonly occur in countries with a low household income. Qualitative analyses suggest the drop in performance is primarily due to appearance differences within an object class (e.g., dish soap) and to items appearing in a different context (e.g., toothbrushes appearing outside of bathrooms). The results suggest that further work is needed to make object-recognition systems work equally well for people across different countries and income levels.
Motivation & Objective
- Assess whether current object-recognition systems perform equally across countries and income levels.
- Identify the main causes of performance disparities in household-item recognition.
- Quantify accuracy gaps across income and geographic regions using diverse, real-world imagery.
- Suggest potential directions to improve cross-country fairness in object recognition.
Proposed method
- Evaluate five cloud-based vision services (Azure, Clarifai, Google Cloud Vision, Amazon Rekognition, IBM Watson) plus a ResNet-101 model trained on Tencent ML Images.
- Use Dollar Street dataset with 117 household-item classes across 54 countries and 264 homes; ground truth via human annotation of top-5 predictions (accuracy@5).
- Analyze accuracy as a function of household income (PPP-adjusted) and country; control for sample sizes across income bins.
- Investigate sources of discrepancies: geographic sampling bias in image datasets and the use of English as the base language in data collection.
- Provide supplementary analyses including country-level maps and a focused India subset to decouple income and location.
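The income analysis in the list above can be illustrated with a minimal sketch that computes accuracy@5 per income bin. The bin edges, the record layout, and the function names are assumptions made for illustration, not the paper's actual code. The sketch also uses exact label matching as a stand-in for the human annotation of top-5 predictions described above.

```python
from collections import defaultdict

# Hypothetical bin edges in US$/month (PPP-adjusted); the paper's bins may differ.
INCOME_BIN_EDGES = [50, 200, 1000, 3500]


def income_bin(income):
    """Return the index of the first bin whose upper edge exceeds `income`."""
    for i, edge in enumerate(INCOME_BIN_EDGES):
        if income < edge:
            return i
    return len(INCOME_BIN_EDGES)


def accuracy_at_5_by_bin(records):
    """Compute accuracy@5 per income bin.

    records: iterable of (monthly_income, true_label, predictions) tuples,
    where predictions is a ranked list of predicted labels.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for income, label, predictions in records:
        b = income_bin(income)
        totals[b] += 1
        # Simplification: exact string match instead of human judgment.
        if label in predictions[:5]:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in totals}
```

Comparing the value for the lowest bin with the value for the highest bin gives the kind of income gap the review reports. Controlling for sample size would additionally require subsampling each bin to the same number of images, which this sketch omits.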
Experimental results
Research questions
- RQ1: How does object-recognition accuracy vary with country of image origin and household income?
- RQ2: What are the main factors driving accuracy discrepancies (appearance within a class, context, or dataset bias)?
- RQ3: Do multiple public cloud systems exhibit similar cross-country/income gaps in recognition?
- RQ4: What strategies could mitigate geographic- and income-related performance gaps (e.g., geography-based resampling, multilingual training)?
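As one possible reading of the geography-based resampling mentioned in RQ4, the sketch below weights each training image inversely to the number of images from its country, so that every country contributes equal total sampling mass. This is an assumed illustration, not a method the paper implements, and the function names are hypothetical.

```python
import random
from collections import Counter


def geography_balanced_weights(countries):
    """Weight each image by 1 / (number of images from its country)."""
    counts = Counter(countries)
    return [1.0 / counts[c] for c in countries]


def resample(images, countries, k, seed=0):
    """Draw k images with replacement using country-balanced weights."""
    rng = random.Random(seed)  # seeded for reproducibility
    weights = geography_balanced_weights(countries)
    return rng.choices(images, weights=weights, k=k)
```

With these weights, a country with a single image is as likely to be drawn as a country with thousands. In practice one might instead weight by population share, which is a design choice this sketch leaves open.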
Key findings
- Accuracy varies with income: on average, accuracy on items from households earning less than US$50/month is ~10 percentage points lower than on items from households earning more than US$3,500/month.
- Geographic disparities are large: accuracy is ~15–20 percentage points higher in the United States than in Somalia or Burkina Faso.
- Discrepancies are driven by appearance differences within a class (e.g., dish soap) and items appearing in different contexts (e.g., toothbrushes outside bathrooms).
- Results are consistent across six systems (five cloud services plus a ResNet-101 model).
- Geography and income are both drivers of performance; an India-only subset shows income-related accuracy trends even within a single country.
This review was created by AI and reviewed by human editors.