QUICK REVIEW

[Paper Review] Nationality and Region Prediction from Names: A Comparative Study of Neural Models and Large Language Models

Keito Inoshita | arXiv (Cornell University) | 2026-01-13

Names, Identity, and Discrimination Research | Citations: 1

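One-line summary

This paper systematically compares six neural models and six LLM prompting strategies for predicting nationality and region from names. It shows that LLMs outperform neural models at every granularity level, that the performance gap narrows as granularity becomes coarser, and that the two model families make distinctly different kinds of errors.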

ABSTRACT

Predicting nationality from personal names has practical value in marketing, demographic research, and genealogical studies. Conventional neural models learn statistical correspondences between names and nationalities from task-specific training data, posing challenges in generalizing to low-frequency nationalities and distinguishing similar nationalities within the same region. Large language models (LLMs) have the potential to address these challenges by leveraging world knowledge acquired during pre-training. In this study, we comprehensively compare neural models and LLMs on nationality prediction, evaluating six neural models and six LLM prompting strategies across three granularity levels (nationality, region, and continent), with frequency-based stratified analysis and error analysis. Results show that LLMs outperform neural models at all granularity levels, with the gap narrowing as granularity becomes coarser. Simple machine learning methods exhibit the highest frequency robustness, while pre-trained models and LLMs show degradation for low-frequency nationalities. Error analysis reveals that LLMs tend to make "near-miss" errors, predicting the correct region even when nationality is incorrect, whereas neural models exhibit more cross-regional errors and bias toward high-frequency classes. These findings indicate that LLM superiority stems from world knowledge, model selection should consider required granularity, and evaluation should account for error quality beyond accuracy.

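Motivation and objectives

Motivates predicting nationality and region from personal names for applications such as marketing, demographic research, and genealogy.
Evaluates how neural models and LLMs generalize across nationalities of varying frequency and how well they separate similar nationalities within the same region.
Analyzes prediction performance at three granularity levels (nationality, region, continent) and identifies error patterns.
Investigates how prompt design and model choice affect LLM performance on a knowledge-intensive classification task.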

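Method

Evaluates six baselines, which the paper groups as neural models: a character n-gram SVM, fastText, CNN, BiLSTM, CANINE, and XLM-RoBERTa.
Evaluates six LLM prompting strategies: Zero-shot, Few-shot, Chain-of-Thought, Self-Consistency, Least-to-Most, and Self-Reflection.
Uses a dataset derived from name2nat and filtered to 99 nationalities, split 8:1:1 into train/val/test with stratified sampling.
Measures performance with Accuracy, Macro-F1, and Precision@k (k = 2, 3, 5), plus a frequency-stratified analysis (top/middle/bottom tiers).
Reports results as mean ± SD over three random seeds for the neural models; LLM prompting is run through the GPT-4.1-mini API.
Provides a hierarchical evaluation framework for comparing fine-grained (nationality) predictions with coarse-grained ones (region, continent), and analyzes error types.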
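
As a rough illustration of the evaluation setup described above, the following Python sketch scores one set of predictions at the three granularity levels and computes Macro-F1 and a top-k hit rate. It is a minimal sketch, not the paper's code: the `HIERARCHY` mapping, the function names, and the toy labels are all hypothetical, and it assumes that Precision@k means "the gold nationality appears among the top k predictions", which the review does not confirm.

```python
# Hypothetical toy hierarchy: nationality -> (region, continent).
# The paper's actual 99-nationality mapping is not reproduced here.
HIERARCHY = {
    "Japanese": ("East Asia", "Asia"),
    "Korean": ("East Asia", "Asia"),
    "Vietnamese": ("Southeast Asia", "Asia"),
    "German": ("Western Europe", "Europe"),
    "French": ("Western Europe", "Europe"),
}


def project(label, level):
    """Map a nationality label to the requested granularity level."""
    if level == "nationality":
        return label
    region, continent = HIERARCHY[label]
    return region if level == "region" else continent


def accuracy_at(gold, pred, level):
    """Accuracy after projecting gold and predicted labels to `level`."""
    hits = sum(project(g, level) == project(p, level) for g, p in zip(gold, pred))
    return hits / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hit_at_k(gold, ranked, k):
    """Share of examples whose gold label is within the top-k ranked guesses
    (assumed reading of Precision@k)."""
    return sum(g in r[:k] for g, r in zip(gold, ranked)) / len(gold)


if __name__ == "__main__":
    gold = ["Japanese", "Korean", "German", "Vietnamese"]
    pred = ["Japanese", "Japanese", "French", "German"]
    for level in ("nationality", "region", "continent"):
        print(level, accuracy_at(gold, pred, level))
    print("macro-F1", round(macro_f1(gold, pred), 4))
```

Projecting both gold and predicted labels through the same mapping means a single nationality-level prediction can be re-scored at coarser levels without retraining, which is one plausible way to read the hierarchical framework.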

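Experimental results

Research questions

RQ1: How do neural models and large language models compare at predicting nationality and region from names across granularity levels?
RQ2: Which prompting strategy best lets an LLM exploit its pre-trained world knowledge for this task, and how robust are predictions for low-frequency nationalities?
RQ3: How does prediction granularity affect the performance gap between neural models and LLMs, and what does this imply for model selection?
RQ4: What qualitative error patterns do LLMs and neural models show on this task (near-miss versus cross-regional errors)?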

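Key findings

LLMs outperform neural models at all granularity levels, most clearly at the nationality level.
The gap between LLMs and neural models narrows as granularity becomes coarser (nationality → region → continent).
Simple machine learning methods are the most robust to class frequency, whereas pre-trained models and LLMs degrade on low-frequency nationalities.
LLMs tend to make near-miss errors, predicting the correct region even when the nationality is wrong, whereas neural models make more cross-regional errors and are biased toward high-frequency classes.
Prompt design affects LLM performance; in this study, Self-Consistency and the Zero-shot/Few-shot variants give strong results.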
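
The near-miss versus cross-regional distinction and the Self-Consistency strategy mentioned in the findings above can be made concrete with a short Python sketch. It rests on assumptions: `REGION_OF` is a hypothetical toy mapping, the function names are illustrative, and Self-Consistency is modeled simply as a majority vote over several sampled answers, without any actual LLM call.

```python
from collections import Counter

# Hypothetical nationality -> region mapping, for illustration only.
REGION_OF = {
    "Japanese": "East Asia",
    "Korean": "East Asia",
    "Vietnamese": "Southeast Asia",
    "German": "Western Europe",
    "French": "Western Europe",
}


def error_type(gold, pred):
    """Label one prediction as correct, near_miss (wrong nationality but
    right region), or cross_regional (wrong region as well)."""
    if gold == pred:
        return "correct"
    if REGION_OF[gold] == REGION_OF[pred]:
        return "near_miss"
    return "cross_regional"


def error_breakdown(gold, pred):
    """Count outcome types and report the share of errors that are near misses."""
    counts = Counter(error_type(g, p) for g, p in zip(gold, pred))
    errors = counts["near_miss"] + counts["cross_regional"]
    share = counts["near_miss"] / errors if errors else 0.0
    return {
        "correct": counts["correct"],
        "near_miss": counts["near_miss"],
        "cross_regional": counts["cross_regional"],
        "near_miss_share": share,
    }


def self_consistency_vote(samples):
    """Majority vote over sampled answers; ties go to the answer seen first."""
    return Counter(samples).most_common(1)[0][0]


if __name__ == "__main__":
    gold = ["Japanese", "Korean", "German", "Vietnamese"]
    pred = ["Japanese", "Japanese", "French", "German"]
    print(error_breakdown(gold, pred))
    print(self_consistency_vote(["Korean", "Japanese", "Korean"]))
```

Reporting the near-miss share alongside accuracy is one way to act on the review's point that evaluation should account for error quality: two models with the same accuracy can differ sharply on this number.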


This review was generated by AI and checked by a human editor.