QUICK REVIEW

[Paper Review] Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis

Md. Arid Hasan, Shudipta Das|arXiv (Cornell University)|2023. 08. 21.

Sentiment Analysis and Opinion Mining|Citations: 25

One-line summary

The paper builds a large Bangla sentiment dataset (MUBASE) and compares zero- and few-shot LLM prompting against fine-tuned models. Fine-tuned monolingual Bangla models generally outperform the LLMs on this task.

ABSTRACT

The rapid expansion of the digital world has propelled sentiment analysis into a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While there have been significant advancements in sentiment analysis for widely spoken languages, low-resource languages, such as Bangla, remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,606 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.

Motivation and objectives

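Build MUBASE, a sizeable manually annotated Bangla sentiment dataset drawn from social media.
Compare zero- and few-shot prompting of LLMs (Flan-T5, GPT-4, Bloomz) against fine-tuned models.
Analyze how prompt variations and model type affect Bangla sentiment classification performance.
Assess whether monolingual Bangla models outperform multilingual or LLM-based approaches on low-resource Bangla sentiment analysis.
Outline plans to release the dataset and tools publicly to support future research.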

Proposed method

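Build and annotate a Bangla sentiment dataset (MUBASE) from Facebook comments and news tweets (33,606 entries after cleaning).
Fine-tune BanglaBERT, mBERT, XLM-RoBERTa, and Bloomz on Bangla data.
Extract embeddings with GPT and train a feed-forward classifier on them as an embedding-based baseline.
Evaluate zero- and few-shot prompting with LLMs (Flan-T5, Bloomz, GPT-4), using carefully designed Bangla-English prompts and native Bangla prompts.
For GPT-4 and Bloomz, use zero-shot and 3-/5-shot prompts whose examples are selected by MMR (maximal marginal relevance), and apply majority-vote ensembling to improve Bloomz outputs.
Compare against baselines (random, majority class) and report accuracy, weighted precision, recall, and F1 on a stratified 70/10/20 train/dev/test split.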
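
The MMR-based selection of few-shot examples mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words similarity is a stand-in for whatever embeddings the paper used, and the names `bow_vector`, `mmr_select`, the `lam` trade-off weight, and the English example pool are all hypothetical. The idea is to pick examples that are similar to the input being classified while penalizing examples that are redundant with those already chosen.

```python
import math
from collections import Counter


def bow_vector(text):
    """Toy bag-of-words vector; an assumed stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(query, candidates, k=3, lam=0.5):
    """Return indices of k candidates chosen greedily by maximal marginal relevance.

    score(c) = lam * sim(c, query) - (1 - lam) * max sim(c, already selected)
    """
    q = bow_vector(query)
    vecs = [bow_vector(c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical labeled pool (not taken from MUBASE).
pool = [
    ("the food was great", "positive"),
    ("the food was great", "positive"),      # exact duplicate of the first item
    ("terrible service today", "negative"),
    ("the weather is fine", "neutral"),
]
picked = mmr_select("the food was good", [text for text, _ in pool], k=2)
few_shot_block = "\n".join(
    f"Text: {pool[i][0]}\nSentiment: {pool[i][1]}" for i in picked
)
```

With `lam=1.0` the selection reduces to plain nearest-neighbour retrieval; lower values trade relevance for diversity, which is why the duplicate in the toy pool is skipped at `lam=0.5`.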

Experimental results

Research questions

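RQ1 How do zero- and few-shot LLM prompts perform on Bangla sentiment analysis compared with fine-tuned models?
RQ2 Do monolingual Bangla models (such as BanglaBERT) outperform multilingual or LLM-based approaches on Bangla sentiment tasks?
RQ3 How do prompt design and model size affect zero- and few-shot Bangla sentiment classification?
RQ4 Can ensembling predictions across models improve the performance of LLM-based approaches?
RQ5 Is native-language prompting as effective as English prompting for Bangla sentiment analysis?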

Key findings

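Fine-tuned models consistently outperform zero- and few-shot LLM prompting across settings.
Fine-tuned monolingual BanglaBERT achieved the best performance among the models tested.
In the zero-shot setting, GPT-4 is competitive with fine-tuned monolingual models but does not surpass them.
Bloomz sometimes outperforms GPT-4 in zero- and few-shot settings but struggles to predict the neutral class, whereas GPT-4 struggles with the positive class.
Majority-vote ensembling across Bloomz settings raises weighted F1 by 5.73 points.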
Augmenting the training data by combining MUBASE with SentNoB (a sentiment dataset of noisy Bangla texts) to fine-tune BanglaBERT yields a further F1 improvement of about 1.41%.
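
The majority-vote ensembling and weighted F1 referred to in the findings above can be sketched as follows. This is an illustrative sketch only: the review does not state how ties were broken or which runs were combined, so the tie-breaking rule here is an assumption, and the function names and toy labels are hypothetical rather than taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_run):
    """Combine label lists from several runs by per-example majority vote.

    Assumed tie-break: the earliest run whose label reaches the top count wins.
    """
    ensembled = []
    for votes in zip(*predictions_per_run):
        counts = Counter(votes)
        top = max(counts.values())
        ensembled.append(next(v for v in votes if counts[v] == top))
    return ensembled


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy gold labels and three hypothetical prompt settings (not real results).
gold = ["pos", "neg", "neu", "neg"]
run_a = ["pos", "neg", "neg", "neg"]
run_b = ["pos", "pos", "neu", "neg"]
run_c = ["neg", "neg", "neu", "pos"]

ensemble = majority_vote([run_a, run_b, run_c])
single_score = weighted_f1(gold, run_a)
ensemble_score = weighted_f1(gold, ensemble)
```

Because each run in the toy data errs on a different example, the vote corrects every individual mistake. Real gains depend on how uncorrelated the errors of the combined settings are.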

This review was generated by AI and reviewed by a human editor.