QUICK REVIEW

[Paper Review] TrustLLM: Trustworthiness in Large Language Models

Y. Huang, Lichao Sun | arXiv (Cornell University) | 2024-01-10

Artificial Intelligence in Healthcare and Education | Citations: 50

One-Line Summary

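TrustLLM proposes eight dimensions of trustworthiness, builds a benchmark covering six of them, and evaluates 16 mainstream LLMs on over 30 datasets. It uses the results to analyze how trustworthiness relates to utility and how proprietary models differ from open-weight ones.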

ABSTRACT

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

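Motivation and Goals

Define eight dimensions of trustworthy LLMs: truthfulness, safety, fairness, robustness, privacy, machine ethics, transparency, and accountability.
Establish a comprehensive benchmark for six of these trustworthiness aspects, using over 30 datasets and 16 LLMs.
Provide insights into the relationship between trustworthiness and utility, and into the differences between proprietary and open-weight LLMs.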

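Proposed Method

Identify the eight trustworthiness dimensions through a review of 500 papers.
Build a benchmark for six aspects (all except transparency and accountability), comprising more than 18 subcategories and over 30 datasets.
Evaluate 16 mainstream LLMs, both proprietary and open-weight, across the full benchmark.
Report an overall trustworthiness ranking together with a detailed per-dimension analysis.
Release the datasets, code, and toolkit, and host a public leaderboard for TrustLLM.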
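The overall ranking mentioned above can be illustrated with a small aggregation sketch. This review does not say how TrustLLM combines dimensions into one ranking, so the sketch assumes one simple scheme: rank models within each dimension (higher score is better, tied models share the average rank), then order models by their mean rank across dimensions. The function names, model names, and scores are hypothetical.

```python
from statistics import mean


def rank_within_dimension(scores):
    """Map each model to its rank in one dimension (1 = best score).

    Models with equal scores share the average of the positions they occupy.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with ordered[i].
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def overall_ranking(per_dimension):
    """Order models by mean rank across all dimensions (lower is better).

    per_dimension maps a dimension name to {model_name: score}; this sketch
    assumes every model is scored in every dimension.
    """
    if not per_dimension:
        raise ValueError("need at least one dimension")
    dim_ranks = [rank_within_dimension(s) for s in per_dimension.values()]
    models = set(dim_ranks[0])
    if any(set(r) != models for r in dim_ranks):
        raise ValueError("every model must have a score in every dimension")
    mean_rank = {m: mean(r[m] for r in dim_ranks) for m in models}
    # Break ties in mean rank alphabetically so the output is deterministic.
    return sorted(mean_rank.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical placeholder scores, not TrustLLM results.
example = {
    "truthfulness": {"model_a": 0.8, "model_b": 0.6, "model_c": 0.7},
    "safety": {"model_a": 0.9, "model_b": 0.9, "model_c": 0.5},
}
print(overall_ranking(example))
# [('model_a', 1.25), ('model_b', 2.25), ('model_c', 2.5)]
```

Rank averaging is used here because raw scores from different dimensions may sit on different scales. A weighted mean of normalized scores would be an equally plausible alternative.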

Experimental Results

Research Questions

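RQ1: How can the trustworthiness of LLMs be captured through eight comprehensive dimensions?
RQ2: How do 16 mainstream LLMs perform across the more than 30 datasets in the TrustLLM benchmark?
RQ3: What is the relationship between the trustworthiness of LLMs and their functional utility?
RQ4: How do proprietary and open-weight models compare in trustworthiness, dimension by dimension?
RQ5: What challenges and directions exist for improving the trustworthiness of LLMs?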

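Key Findings

On many tasks, trustworthiness and utility are positively related: models with stronger performance tend to be more trustworthy.
Many LLMs show over-alignment: they refuse benign prompts too often, which lowers their utility.
Proprietary LLMs generally outperform open-weight models in trustworthiness, but some open-weight models (e.g., Llama2) come close to proprietary ones on several tasks.
Truthfulness, safety, and fairness show notable gaps between models, and robustness and privacy handling vary considerably.
Transparency and accountability remain difficult to benchmark, but the study stresses the need for open and transparent trustworthiness technologies.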
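One way to quantify the positive trustworthiness-utility relationship is a rank correlation between per-model utility scores and per-model trustworthiness scores. This review does not state which statistic the paper uses. The sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of tie-averaged ranks. The function names and score lists are hypothetical inputs, not TrustLLM data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model scores (one entry per model), for illustration only.
utility = [0.55, 0.70, 0.62, 0.81]
trust = [0.50, 0.60, 0.74, 0.79]
print(round(spearman(utility, trust), 3))  # 0.8
```

A coefficient close to +1 would be consistent with the positive relationship the paper reports. With only 16 models, such a value is best read as descriptive rather than as evidence of a causal link.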


This review was generated by AI and checked by a human editor.