QUICK REVIEW

[Paper Review] The Political Preferences of LLMs

David Rozado|arXiv (Cornell University)|2024. 02. 02.

International Arbitration and Investment Law | Citations: 6

One-Line Summary

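The paper analyzes the political orientation of 24 state-of-the-art conversational LLMs using 11 political orientation tests, finds that most chatbots give left-of-center responses when prompted with political questions, and demonstrates that Supervised Fine-Tuning (SFT) can steer an LLM toward a specific political position.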

ABSTRACT

I report here a comprehensive analysis about the political preferences embedded in Large Language Models (LLMs). Namely, I administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both closed and open source. When probed with questions/statements with political connotations, most conversational LLMs tend to generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints. This does not appear to be the case for five additional base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. However, the weak performance of the base models at coherently answering the tests' questions makes this subset of results inconclusive. Finally, I demonstrate that LLMs can be steered towards specific locations in the political spectrum through Supervised Fine-Tuning (SFT) with only modest amounts of politically aligned data, suggesting SFT's potential to embed political orientation in LLMs. With LLMs beginning to partially displace traditional information sources like search engines and Wikipedia, the societal implications of political biases embedded in LLMs are substantial.

Research Motivation and Objectives

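Assess the political preferences embedded in modern conversational LLMs using standardized political orientation tests.
Compare how closed-source and open-source LLMs respond to political prompts.
Evaluate differences in political bias between base models and conversation-optimized models.
Investigate whether supervised fine-tuning can steer LLMs toward specific political orientations.
Discuss the societal impact of political bias in LLMs as AI comes to serve as a major information source.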

Proposed Method

Administer 11 political orientation tests to 24 state-of-the-art conversational LLMs, including closed and open models.
Compare responses to political statements/questions against test-diagnosed political preferences.
Contrast results between conversationally optimized LLMs and their base foundation models.
Demonstrate how Supervised Fine-Tuning (SFT) with politically aligned data affects model positioning.
Analyze robustness and coherence of test answers across models.
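
The review does not reproduce the paper's prompts, test instruments, or scoring rules, so the following is only a minimal sketch of how administering a test and checking answer coherence could look. Everything in it is an assumption made for illustration: the four-option Likert scale, the `direction` sign convention, and the names `parse_choice`, `administer`, and `stub_model` are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical answer scale; real test instruments define their own options.
LIKERT = {
    "strongly disagree": -2,
    "disagree": -1,
    "agree": 1,
    "strongly agree": 2,
}


def parse_choice(reply: str) -> Optional[int]:
    """Map a free-form model reply to a scale value, or None if no option is found."""
    text = reply.strip().lower()
    # Try longer labels first so "strongly disagree" is not read as "agree".
    for label in sorted(LIKERT, key=len, reverse=True):
        if label in text:
            return LIKERT[label]
    return None


def administer(ask: Callable[[str], str], items: list[tuple[str, int]]) -> dict:
    """Ask every item and return a mean score plus the share of unusable replies.

    Each item is (statement, direction). By assumption, direction is +1 when
    agreement counts toward the right of the axis and -1 when it counts
    toward the left.
    """
    scores = []
    invalid = 0
    for statement, direction in items:
        prompt = f"{statement}\nAnswer with one of: " + ", ".join(LIKERT)
        value = parse_choice(ask(prompt))
        if value is None:
            invalid += 1  # incoherent reply: counted, not scored
        else:
            scores.append(direction * value)
    return {
        "score": mean(scores) if scores else None,
        "invalid_rate": invalid / len(items),
    }


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; always gives the same answer."""
    return "Agree"


# Placeholder statements, not items from any real test.
items = [("Hypothetical statement A", -1), ("Hypothetical statement B", +1)]
result = administer(stub_model, items)
```

Tracking `invalid_rate` separately from the score is a deliberate choice here: a model that often fails to pick an option, as the abstract reports for base models, yields a score that should be treated as inconclusive rather than as evidence of neutrality.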

Experimental Results

Research Questions

RQ1: Do modern conversational LLMs exhibit left-of-center political preferences according to standard political orientation tests?
RQ2: How do base foundation models compare to their conversation-optimized counterparts in political evaluation?
RQ3: Can supervised fine-tuning with limited politically aligned data steer LLMs toward specific political orientations?
RQ4: What are the societal implications of political biases embedded in LLMs given their role as information sources?

Key Findings

Most conversational LLMs tend to generate responses aligned with left-of-center viewpoints on political prompts.
Base foundation models show weak and inconclusive results in these political tests due to coherence issues.
Supervised Fine-Tuning with only modest amounts of politically aligned data can steer LLMs toward targeted locations on the political spectrum.
The study highlights substantial societal implications as LLMs increasingly influence information access.

This review was generated by AI and reviewed by a human editor.