QUICK REVIEW

[論文レビュー] Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health

Shaoxiong Ji, Tianlin Zhang|arXiv (Cornell University)|Apr 20, 2023

Mental Health via Writing被引用数 16

ひとこと要約

この論文は Reddit データで domain-specific long-context 言語モデル MentalXLNet と MentalLongformer を学習させ、精神健康分類と長距離テキスト理解の有効性を評価する。MentalLongformer は長い文書で概して優れる。

ABSTRACT

Pretrained language models have been used in various natural language processing applications. In the mental health domain, domain-specific language models are pretrained and released, which facilitates the early detection of mental health conditions. Social posts, e.g., on Reddit, are usually long documents. However, there are no domain-specific pretrained models for long-sequence modeling in the mental health domain. This paper conducts domain-specific continued pretraining to capture the long context for mental health. Specifically, we train and release MentalXLNet and MentalLongformer based on XLNet and Longformer. We evaluate the mental health classification performance and the long-range ability of these two domain-specific pretrained models. Our models are released in HuggingFace.

研究の動機と目的

Improve mental health classification performance through domain-specific continued pretraining.
Enable effective modeling of long documents common in mental health data (e.g., Reddit posts).
Compare long-context models (Longformer, XLNet) with domain-specific baselines on multiple datasets.
Provide guidance on model choice for short vs. long text in mental health tasks.

提案手法

Continue pretraining XLNet and Longformer in a mental-health domain using Reddit-based corpus from relevant subreddits.
Evaluate on multiple mental health classification datasets with both short and long texts.
Fine-tune with pooled last-layer representations and an MLP classifier.
Compare against vanilla pretrained models, domain-specific baselines, and zero-shot ChatGPT baselines.
Analyze long-range capabilities by varying input sequence lengths and observing performance trends.

Figure 1: Median number of tokens in the datasets

実験結果

リサーチクエスチョン

RQ1Can domain-specific continued pretraining improve mental health classification across diverse datasets?
RQ2How do long-context models compare to standard and domain-specific baselines on short vs. long texts?
RQ3What is the effect of increasing input sequence length on long-range modeling for mental health tasks?
RQ4Which model is preferable for long documents (e.g., CLPsych15) versus shorter posts (e.g., Dreaddit, SAD dataset)?

主な発見

Model	DR Rec	DR F1	CLPsych15 Rec	CLPsych15 F1	Dreaddit Rec	Dreaddit F1	T-SID Rec	T-SID F1	SAD Rec	SAD F1	CAMS Rec	CAMS F1
BERT	91.13	90.90	64.67	62.75	78.46	78.26	88.44	88.51	62.77	62.72	40.26	34.92
RoBERTa	95.07	95.11	67.67	66.07	80.56	80.56	88.75	88.76	66.86	67.53	41.18	36.54
XLNet	90.89	90.44	69.83	69.12	78.88	78.84	86.04	86.18	67.30	67.30	50.64	49.16
Longformer	95.81	95.74	75.67	75.47	81.54	81.45	89.58	89.63	69.20	69.01	49.52	49.42
MentalBERT	94.58	94.62	64.67	62.63	80.28	80.04	88.65	88.61	67.45	67.34	45.69	39.73
MentalRoBERTa	94.33	94.23	70.33	69.71	81.82	81.76	88.96	89.01	68.61	68.44	50.48	47.62
MentalXLNet	95.32	95.24	71.67	71.49	80.42	80.41	89.17	89.12	69.20	68.76	50.80	50.08
MentalLongformer	96.55	96.53	77.00	76.32	81.12	81.05	89.90	89.89	68.76	68.44	49.20	48.74

MentalXLNet and MentalLongformer outperform most baselines on classification tasks, especially for longer sequences like CLPsych15.
MentalLongformer shows strong performance on long texts and often matches or exceeds Longformer when domain-specific pretraining is applied.
MentalRoBERTa performs best on shorter, 512-token datasets (Dreaddit), while MentalLongformer is advantageous for longer inputs.
For long-range analysis, increasing sequence length generally improves recall and F1 in many cases, though fluctuations occur.
Domain-specific continued pretraining yields consistent gains over non-domain-specific Longformer/XLNet baselines in several datasets.

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。