QUICK REVIEW

[Paper Review] Understanding Place Identity with Generative AI

Kee Moon Jang, Jun‐Da Chen | arXiv (Cornell University) | Jan 1, 2023

Human Mobility and Location-Based Analysis | 3 citations

TL;DR

This study evaluates whether two generative AI models, ChatGPT and DALL·E2, can capture the collective place identity of 31 global cities by generating text and image outputs from city-specific prompts. Cross-validation against Wikipedia text and Google image search results suggests that both models can produce outputs that are semantically and perceptually similar to real-world urban representations, indicating their potential for scalable, data-driven urban perception research.

ABSTRACT

Researchers are constantly leveraging new forms of data with the goal of understanding how people perceive the built environment and build the collective place identity of cities. Latest advancements in generative artificial intelligence (AI) models have enabled the production of realistic representations learned from vast amounts of data. In this study, we aim to test the potential of generative AI as the source of textual and visual information in capturing the place identity of cities assessed by filtered descriptions and images. We asked questions on the place identity of a set of 31 global cities to two generative AI models, ChatGPT and DALL-E2. Since generative AI has raised ethical concerns regarding its trustworthiness, we performed cross-validation to examine whether the results show similar patterns to real urban settings. In particular, we compared the outputs with Wikipedia data for text and images searched from Google for image. Our results indicate that generative AI models have the potential to capture the collective image of cities that can make them distinguishable. This study is among the first attempts to explore the capabilities of generative AI in understanding human perceptions of the built environment. It contributes to urban design literature by discussing future research opportunities and potential limitations.

Motivation & Objective

To investigate whether generative AI models can effectively represent the collective place identity of cities based on textual and visual outputs.
To assess the reliability of generative AI outputs by comparing them with real-world data sources such as Wikipedia and Google Images.
To identify limitations in generative AI’s ability to capture nuanced, place-specific characteristics like cultural values and urban morphology.
To explore methodological improvements—such as prompt engineering and similarity metrics—for enhancing the trustworthiness of AI-generated urban representations.

Proposed method

Generated a text-based dataset using ChatGPT with a standardized prompt: 'What is the place identity of {city}? Give me in ten bullet points.'
Produced an image-based dataset using DALL·E2 with the prompt 'What is the place identity of streetscapes of {city}?', generating 10 images per city at 256×256 pixels.
Collected ground-truth text data from Wikipedia and image data via Google image search for cross-validation.
Measured text similarity using sentence embeddings from a fine-tuned MiniLM-BERT model and computed cosine similarity between ChatGPT and Wikipedia sentences.
Assessed image perceptual similarity using the LPIPS metric and identified the three closest-matching Google images for each DALL·E2 output.
Conducted visual analysis using word clouds to compare thematic content between ChatGPT responses and Wikipedia texts.
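
The prompt construction and text-matching steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the two prompt templates are quoted from the method description, but the function names (`build_prompts`, `cosine_similarity`, `best_match`) and the toy 3-dimensional vectors are assumptions made for this example. In the study the vectors would be sentence embeddings from the fine-tuned MiniLM-BERT model; here they are hand-written placeholders so the sketch runs with the standard library only.

```python
import math

# Prompt templates quoted from the method description; {city} is filled per city.
TEXT_PROMPT = "What is the place identity of {city}? Give me in ten bullet points."
IMAGE_PROMPT = "What is the place identity of streetscapes of {city}?"


def build_prompts(city):
    """Return the (text, image) prompt pair for one city (hypothetical helper)."""
    return TEXT_PROMPT.format(city=city), IMAGE_PROMPT.format(city=city)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def best_match(query_vec, candidate_vecs):
    """Return (index, score) of the candidate most similar to query_vec."""
    scores = [cosine_similarity(query_vec, c) for c in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy vectors standing in for sentence embeddings (made-up values, not real data).
chatgpt_sentence = [1.0, 0.0, 1.0]
wikipedia_sentences = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

# The second Wikipedia vector points in the same direction as the query,
# so it is selected as the best match regardless of its larger magnitude.
index, score = best_match(chatgpt_sentence, wikipedia_sentences)
```

Cosine similarity depends only on direction, not vector length, which is why it is a common choice for comparing sentence embeddings of different magnitudes.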

Experimental results

Research questions

RQ1: Can generative AI models like ChatGPT and DALL·E2 accurately represent the place identity of cities through text and image generation?
RQ2: How similar are the generative AI outputs to real-world urban representations in terms of semantic content and visual perception?
RQ3: What are the limitations of generative AI in capturing intangible and place-specific attributes such as cultural values and urban morphology?
RQ4: How can prompt engineering and similarity metrics be improved to enhance the reliability and specificity of AI-generated urban representations?

Key findings

ChatGPT-generated text responses showed high semantic similarity to Wikipedia introductions, including a cosine similarity score of 0.94 for Madrid’s climate description.
Word cloud analysis revealed that ChatGPT effectively captured intangible characteristics such as culture, vibrancy, and modernity for Seoul, and governance-related terms like 'government' and 'one country' for Singapore.
DALL·E2-generated images for Lisbon had LPIPS scores of 0.65–0.82, which the study interprets as perceptual similarity to real Google images (LPIPS is a distance, so lower scores mean closer matches), particularly in capturing yellow low-rise residential buildings.
Cities like Almaty, Blantyre, Lisbon, and Sydney achieved the highest perceptual similarity, that is, the lowest LPIPS scores (≈ 0.65), suggesting DALL·E2 performs well for certain urban typologies.
Despite strong performance in some cases, DALL·E2 outputs often included generic urban features (e.g., streetlights, pavements) rather than distinct place-specific attributes.
LPIPS scores were not consistently interpretable: visually similar scenes received varying scores, which adds uncertainty to metric-based evaluation of image reliability.
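
The LPIPS findings above rest on the convention that LPIPS is a distance, so the closest reference images are the ones with the smallest scores. The sketch below shows only that ranking step. It is an illustration under stated assumptions: the helper name `top_k_matches`, the file names, and the distance values are all made up for this example, and in a real pipeline the distances would come from an LPIPS implementation rather than a hand-written dictionary.

```python
import heapq


def top_k_matches(distances, k=3):
    """Return the k (name, distance) pairs with the smallest distance.

    LPIPS-style scores are distances, so smaller means more similar.
    """
    return heapq.nsmallest(k, distances.items(), key=lambda item: item[1])


# Hypothetical distances between one generated image and five reference images.
example_distances = {
    "ref_a.jpg": 0.81,
    "ref_b.jpg": 0.66,
    "ref_c.jpg": 0.74,
    "ref_d.jpg": 0.90,
    "ref_e.jpg": 0.70,
}

# Keep the three closest references, ordered from most to least similar.
matches = top_k_matches(example_distances)
```

Sorting by ascending distance is the opposite of the cosine-similarity case for text, where higher scores indicate closer matches, so the two metrics should not be ranked with the same comparison direction.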

This review was created by AI and reviewed by human editors.