QUICK REVIEW

[Paper Review] K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization

Cheng Deng, Tianhang Zhang|arXiv (Cornell University)|Jun 8, 2023

Topic Modeling | Citations: 13

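One-Line Summary

K2 is a 7B foundation LLM for geoscience, released together with the GeoSignal instruction-tuning dataset and the GeoBench benchmark. LLaMA-7B is further pre-trained on 5.5B geoscience tokens, instruction-tuned on GeoSignal, and evaluated on GeoBench. All data and models are open source.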

ABSTRACT

Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBench, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pre-trained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on 5.5B tokens of geoscience text corpus, including over 1 million pieces of geoscience literature, and utilize GeoSignal's supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Meanwhile, we equip K2 with the abilities of using tools to be a naive geoscience aide. Experiments conducted on the GeoBench demonstrate the effectiveness of our approach and datasets on geoscience knowledge understanding and utilization. We open-source all the training data and K2 model checkpoints at https://github.com/davendw49/k2.

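Motivation and Objectives

Introduce K2, the first open-source foundation LLM for geoscience.
Create a geoscience instruction-tuning dataset (GeoSignal) and a geoscience benchmark (GeoBench).
Describe a domain-adaptation recipe for training a domain-specific LLM from a general-purpose pre-trained model.
Demonstrate K2's effectiveness in geoscience knowledge understanding, reasoning, and tool use.
Provide open-source training data, model checkpoints, and a pipeline for domain-specific LLMs.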

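Proposed Method

Further pre-train LLaMA-7B on a geoscience text corpus totaling 5.5B tokens.
Instruction-tune first on general instruction data, then on knowledge-intensive domain data (GeoSignal) using parameter-efficient LoRA tuning.
Enable tool learning so that K2 can use external geoscience tools.
Build GeoSignal as unified input-output supervision across multiple geoscience tasks.
Create GeoBench to evaluate objective and subjective geoscience tasks.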
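
To give a rough sense of why LoRA counts as parameter-efficient, the sketch below compares the number of trainable parameters in a full weight update with the number in a low-rank update for a single square projection matrix. The hidden size of 4096 is the value published for LLaMA-7B; the rank of 8 and the function names are hypothetical choices for illustration only and are not taken from this review or confirmed as K2's configuration.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the two LoRA factors.

    LoRA keeps the original weight frozen and learns an update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096  # hidden size published for LLaMA-7B
ASSUMED_RANK = 8    # hypothetical rank, not stated in the review

full = full_update_params(HIDDEN_SIZE, HIDDEN_SIZE)
lora = lora_update_params(HIDDEN_SIZE, HIDDEN_SIZE, ASSUMED_RANK)
ratio = lora / full

print(f"full update: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters")
print(f"LoRA / full: {ratio:.4%}")
```

Under these assumptions the low-rank factors hold well under 1% of the parameters of the matrix they adapt. The exact saving for K2 depends on the rank and on which matrices were adapted, neither of which is given here.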

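Experimental Results

Research Questions

RQ1: How well can a domain-adapted LLM understand and use geoscience knowledge compared with baselines of similar size?
RQ2: Does domain-specific instruction tuning (GeoSignal) improve geoscience task performance beyond general instruction tuning?
RQ3: Can K2 effectively use external tools to assist with geoscience tasks?
RQ4: How do the objective and subjective evaluations (GeoBench) reflect a model's knowledge, reasoning, and reliability in a geoscience context?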

Key Findings

Model	NPEE	APTest
Gal-6.7B	25.7	29.9
LLaMA-7B	21.6	27.6
MPT-7B	28.4	26.0
Vicuna-7B	26.4	16.8
Alpaca-7B	31.1	29.1
K2-7B (Ours)	39.9	29.3

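On GeoBench's objective geoscience tasks, K2 leads the similarly sized baselines on NPEE (39.9 versus 21.6 to 31.1). On APTest it scores 29.3, ahead of every baseline except Gal-6.7B (29.9).
In the subjective evaluation, K2 scores higher than the baselines on rationality and correctness while remaining competitive on consistency.
The ablation study shows that training with GeoSignal improves performance over the GeoLLaMA baseline, indicating that domain-aligned instruction data substantially benefits the outputs.
Geoscience instruction tuning with LoRA achieves efficient adaptation, including tool-use capability, with relatively few trainable parameters.
GeoBench combines objective and subjective evaluation to show K2's understanding and use of geoscience knowledge.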
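
The objective-task comparison can be checked directly from the table. The sketch below copies the scores as reported and computes each benchmark's ranking plus K2's margin over the strongest baseline. The dictionary layout and helper names are illustrative assumptions, not part of the paper's tooling.

```python
# Scores copied from the results table above.
SCORES = {
    "Gal-6.7B": {"NPEE": 25.7, "APTest": 29.9},
    "LLaMA-7B": {"NPEE": 21.6, "APTest": 27.6},
    "MPT-7B": {"NPEE": 28.4, "APTest": 26.0},
    "Vicuna-7B": {"NPEE": 26.4, "APTest": 16.8},
    "Alpaca-7B": {"NPEE": 31.1, "APTest": 29.1},
    "K2-7B": {"NPEE": 39.9, "APTest": 29.3},
}


def rank_models(scores: dict, benchmark: str) -> list:
    """Return model names sorted from highest to lowest score on a benchmark."""
    return sorted(scores, key=lambda name: scores[name][benchmark], reverse=True)


def margin_over_best_baseline(scores: dict, benchmark: str, model: str = "K2-7B") -> float:
    """Score of `model` minus the best score among all other models."""
    best_baseline = max(
        row[benchmark] for name, row in scores.items() if name != model
    )
    return round(scores[model][benchmark] - best_baseline, 1)


for benchmark in ("NPEE", "APTest"):
    print(benchmark, rank_models(SCORES, benchmark))
    print("  K2 margin:", margin_over_best_baseline(SCORES, benchmark))
```

A positive margin means K2 leads that benchmark; a negative margin means at least one baseline scores higher.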


This review was generated by AI and checked by a human editor.