QUICK REVIEW

[Paper Review] Deep Learning Based Named Entity Recognition Models for Recipes

Mansi Goel, Ayush Agarwal|arXiv (Cornell University)|Feb 27, 2024

Topic Modeling | Citations: 5

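One-line summary

This paper builds and evaluates NER models for recipe text on manually annotated, augmented, and machine-annotated datasets. It shows that the fine-tuned spaCy-transformer performs best, with a macro-F1 of roughly 96% (95.71% to 96.04% across the three datasets), while few-shot prompting of LLMs performs poorly.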

ABSTRACT

Food touches our lives through various endeavors, including flavor, nourishment, health, and sustainability. Recipes are cultural capsules transmitted across generations via unstructured text. Automated protocols for recognizing named entities, the building blocks of recipe text, are of immense value for various applications ranging from information extraction to novel recipe generation. Named entity recognition is a technique for extracting information from unstructured or semi-structured data with known labels. Starting with manually-annotated data of 6,611 ingredient phrases, we created an augmented dataset of 26,445 phrases cumulatively. Simultaneously, we systematically cleaned and analyzed ingredient phrases from RecipeDB, the gold-standard recipe data repository, and annotated them using the Stanford NER. Based on the analysis, we sampled a subset of 88,526 phrases using a clustering-based approach while preserving the diversity to create the machine-annotated dataset. A thorough investigation of NER approaches on these three datasets involving statistical, fine-tuning of deep learning-based language models and few-shot prompting on large language models (LLMs) provides deep insights. We conclude that few-shot prompting on LLMs has abysmal performance, whereas the fine-tuned spaCy-transformer emerges as the best model with macro-F1 scores of 95.9%, 96.04%, and 95.71% for the manually-annotated, augmented, and machine-annotated datasets, respectively.
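The abstract mentions sampling a subset of phrases with a clustering-based approach while preserving diversity, but it does not spell out the procedure. The sketch below illustrates the general idea only: group phrases by a cluster key, then draw from every cluster so that rare phrase patterns stay represented. The `shape_signature` key and the `diverse_sample` helper are hypothetical stand-ins, not the authors' clustering method.

```python
import math
import random
from collections import defaultdict


def shape_signature(phrase):
    """Hypothetical cluster key: map each token to NUM (contains a digit) or WORD."""
    return " ".join(
        "NUM" if any(ch.isdigit() for ch in tok) else "WORD"
        for tok in phrase.split()
    )


def diverse_sample(phrases, fraction, seed=0):
    """Sample roughly `fraction` of each cluster, keeping at least one phrase per cluster."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    clusters = defaultdict(list)
    for phrase in phrases:
        clusters[shape_signature(phrase)].append(phrase)
    sample = []
    for key in sorted(clusters):
        members = clusters[key]
        k = math.ceil(len(members) * fraction)  # ceil guarantees k >= 1
        sample.extend(rng.sample(members, k))
    return sample


phrases = [
    "2 cups flour",
    "1 cup sugar",
    "3 tablespoons butter",
    "salt",
    "pepper",
    "fresh basil leaves",
]
sample = diverse_sample(phrases, 0.5)
```

Sampling per cluster rather than uniformly over the whole corpus is what keeps small clusters from disappearing; a real pipeline would replace the toy signature with a learned or linguistically motivated clustering.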

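Motivation and objectives

Build large, diverse datasets of recipe ingredient phrases (manually annotated, augmented, and machine-annotated) to enable robust NER training.
Benchmark traditional and deep learning NER methods on recipe data to establish state-of-the-art performance.
Evaluate data augmentation and sampling strategies to maximize diversity and model generalization.
Assess the feasibility of few-shot prompting on large language models for recipe NER.
Analyze per-tag learnability in recipe text to understand which entity types are hard to learn.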

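Proposed method

Dataset construction: manual annotation of 6,611 ingredient phrases, augmentation to 26,445 phrases, and a machine-annotated corpus from RecipeDB built with SEFS clustering, totaling 349,762 phrases (the abstract reports a clustering-based sample of 88,526 phrases for the machine-annotated dataset).
Data preprocessing: lemmatization, plus correction of error patterns by culinary experts.
Models: a reimplementation of Stanford NER (CRF), and fine-tuning of encoder-based models (BERT, DistilBERT, RoBERTa, DistilRoBERTa) and NLP frameworks (spaCy, flair).
Training setup: fine-tuning with SGD at a learning rate of 0.01 on an NVIDIA A100, batch size 44, up to 12 epochs.
Evaluation: macro-F1, precision, and recall on the three datasets (manually annotated, augmented, machine-annotated).
Few-shot prompting experiments with LLMs (e.g., LLaMA, Mistral, Vicuna), compared against supervised fine-tuning.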
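Since macro-F1 is the headline metric, a minimal sketch of how it can be computed may help. Macro-F1 averages the per-tag F1 scores with equal weight, so a rare tag counts as much as a frequent one. The version below scores at the token level, which is an assumption: the review does not say whether the paper scores tokens or entity spans. Apart from Quantity, the tag names in the example are illustrative.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Return (macro-F1, per-tag F1) for two equal-length tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it does not belong
            fn[g] += 1  # missed the gold tag g
    per_tag = {}
    for tag in sorted(set(gold) | set(pred)):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        denom = precision + recall
        per_tag[tag] = 2 * precision * recall / denom if denom else 0.0
    if not per_tag:
        return 0.0, per_tag
    # Unweighted mean over tags: this is what makes the score "macro".
    return sum(per_tag.values()) / len(per_tag), per_tag


# Tokens: "2", "cups", "chopped", "onion"; the last token is mislabeled.
gold = ["QUANTITY", "UNIT", "STATE", "NAME"]
pred = ["QUANTITY", "UNIT", "STATE", "STATE"]
score, per_tag = macro_f1(gold, pred)
```

In this example a single wrong token lowers the score noticeably because it hurts two tags at once: NAME loses all recall and STATE loses precision.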

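Experimental results

Research questions

RQ1: Can large, diverse datasets of recipe ingredient phrases improve NER performance on recipe text?
RQ2: Which NER modeling approach (CRF-based baseline versus encoder-based transformers) yields the highest macro-F1 on recipe data?
RQ3: Do data augmentation and machine-annotated data help or hinder NER performance?
RQ4: Is few-shot prompting of current LLMs effective for recipe NER without fine-tuning?
RQ5: What are the per-tag learnability patterns in recipe NER (e.g., which entity types are easy or hard)?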

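Key findings

spaCy-transformer achieves the highest macro-F1 on all three datasets: 95.9% (manually annotated), 96.04% (augmented), and 95.71% (machine-annotated).
Augmented data gives modest gains for some models, whereas machine-annotated data may introduce noise and slightly lower performance.
Distilled variants often match or outperform the base BERT models, possibly because of reduced overfitting and lower sensitivity to noise.
Few-shot prompting with recent LLMs yields low macro-F1 (5.88–32.90% depending on the model), indicating that domain adaptation without fine-tuning is limited.
The Quantity tag is learned early while Temperature lags, suggesting that tag frequency in the data affects learnability and dependence on memorization.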
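To make the few-shot prompting setup concrete, here is a minimal sketch of how a prompt for token tagging could be assembled from a handful of labeled phrases. The prompt wording, the `token/TAG` output format, the tag list, and the `build_few_shot_prompt` helper are all assumptions for illustration; the review does not reproduce the prompts used in the paper.

```python
def build_few_shot_prompt(examples, query, tags):
    """Assemble a plain-text few-shot prompt for tagging an ingredient phrase.

    examples: list of (phrase, labels) pairs, one label per whitespace token.
    query: the unlabeled phrase the model should tag.
    tags: the allowed tag names (hypothetical tag set).
    """
    lines = [
        "Tag each token of the ingredient phrase with one of: "
        + ", ".join(tags) + ".",
        "",
    ]
    for phrase, labels in examples:
        tokens = phrase.split()
        if len(tokens) != len(labels):
            raise ValueError(f"label count does not match tokens in: {phrase!r}")
        lines.append(f"Phrase: {phrase}")
        lines.append("Tags: " + " ".join(f"{t}/{l}" for t, l in zip(tokens, labels)))
        lines.append("")
    # The prompt ends with an open "Tags:" line for the model to complete.
    lines.append(f"Phrase: {query}")
    lines.append("Tags:")
    return "\n".join(lines)


tags = ["QUANTITY", "UNIT", "NAME", "STATE", "TEMPERATURE"]
examples = [
    ("2 cups flour", ["QUANTITY", "UNIT", "NAME"]),
    ("1 chopped onion", ["QUANTITY", "STATE", "NAME"]),
]
prompt = build_few_shot_prompt(examples, "3 tablespoons warm water", tags)
```

The resulting string would be sent to an LLM and the completion parsed back into per-token tags, which can then be scored with macro-F1 like any other model output.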


This review was generated by AI and checked by a human editor.