QUICK REVIEW

[Paper Review] Exploring the Use of Text Classification in the Legal Domain

Octavia-Maria Şulea, Marcos Zampieri | arXiv (Cornell University) | Oct 25, 2017

Artificial Intelligence in Law | References: 14 | Citations: 105

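One-Line Summary

This paper studies text classification applied to rulings of the French Supreme Court. Using an ensemble of SVM classifiers on masked case descriptions, it predicts the law area, the ruling type, and the ruling date, and reports clear performance gains over prior work.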

ABSTRACT

In this paper, we investigate the application of text classification methods to support law professionals. We present several experiments applying machine learning techniques to predict with high accuracy the ruling of the French Supreme Court and the law area to which a case belongs to. We also investigate the influence of the time period in which a ruling was made on the form of the case description and the extent to which we need to mask information in a full case ruling to automatically obtain training and test data that resembles case descriptions. We developed a mean probability ensemble system combining the output of multiple SVM classifiers. We report results of 98% average F1 score in predicting a case ruling, 96% F1 score for predicting the law area of a case, and 87.07% F1 score on estimating the date of a ruling.

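Motivation and Objectives

Motivate automation that supports tasks such as legal research and case preparation.
Investigate whether the law area, the ruling, and the ruling date can be predicted from a case description.
Evaluate how masking affects the realism of the training data and the performance of the models.
Analyze which aspects of the data influence prediction, and compare the ensemble results with the earlier baseline.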

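Proposed Method

Use a mean probability ensemble of SVM classifiers with word unigrams and word bigrams as features.
Train and evaluate with stratified 10-fold cross-validation to handle class imbalance.
Mask the target labels in the training and test data so that the text resembles the realistic drafts that legal professionals work with.
Preprocess the data, for example by removing references that would leak information, and for certain tasks remove the label words from the text.
Compare the ensemble results with the Şulea et al. (2017) baseline, which uses a LIBLINEAR SVM with bag-of-words features.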
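The masking, n-gram, and ensemble steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `[MASK]` token, the example sentence, and the probability values are all hypothetical, and the per-classifier probabilities are hard-coded instead of coming from trained SVMs.

```python
import re
from collections import Counter


def mask_label_words(text, label_words, mask="[MASK]"):
    """Replace whole-word occurrences of label words with a mask token (assumed scheme)."""
    if not label_words:
        return text
    pattern = r"\b(" + "|".join(re.escape(w) for w in label_words) + r")\b"
    return re.sub(pattern, mask, text, flags=re.IGNORECASE)


def word_ngrams(text, n_max=2):
    """Count word unigrams up to n_max-grams (unigrams and bigrams by default)."""
    tokens = re.findall(r"\w+", text.lower())
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def mean_probability_ensemble(prob_rows, labels):
    """Average class probabilities across classifiers and return the top label."""
    n = len(prob_rows)
    mean = [sum(row[k] for row in prob_rows) / n for k in range(len(labels))]
    best = max(range(len(labels)), key=mean.__getitem__)
    return labels[best], mean


# Hypothetical example: mask two ruling labels, then extract features.
masked = mask_label_words(
    "La cour prononce l'annulation du jugement", ["annulation", "non-lieu"]
)
features = word_ngrams(masked)

# Hypothetical class probabilities from three classifiers for one document.
labels = ["annulation", "non-lieu", "other"]
probs = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]
label, mean = mean_probability_ensemble(probs, labels)
print(masked, label, mean)
```

In this toy input, two of the three classifiers prefer `non-lieu`, yet the averaged probabilities favor `annulation` because one classifier is very confident. That is the practical difference between averaging probabilities and taking a majority vote.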

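Experimental Results

Research Questions

RQ1: Can the law area be predicted from a case description when the target is masked?
RQ2: Under realistic masking, can an ensemble of SVM classifiers accurately predict the ruling from a case description?
RQ3: Can the decade or time period of a ruling be estimated from the description?
RQ4: How does masking affect model performance compared with the earlier approach?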

Key Findings

Task	Labels	Model	P	R	F1	Acc.
Law Area	8 classes (>=200 instances)	Ensemble	96.8%	96.8%	96.5%	96.8%
Law Area	8 classes (baseline)	Şulea et al. (2017)	90.9%	90.2%	90.3%	90.2%
Case Ruling	6-class	Ensemble	98.6%	98.6%	98.6%	98.6%
Case Ruling	6-class	Şulea et al. (2017)	97.1%	96.9%	97.0%	96.9%
Case Ruling	8-class	Ensemble	95.9%	96.2%	95.8%	96.2%
Case Ruling	8-class	Şulea et al. (2017)	93.2%	92.8%	92.7%	92.8%
Temporal	7-class	Ensemble	87.3%	87.0%	87.0%	87.0%
Temporal	7-class	Şulea et al. (2017)	75.9%	74.3%	73.2%	74.3%

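For law area prediction (8 classes), the ensemble reaches averaged precision and recall of 96.8% each, with 96.8% accuracy.
For ruling prediction, the ensemble reaches F1 and precision of 98.6% with 98.6% accuracy on the 6-class task, and F1 of 95.8% with 96.2% accuracy on the 8-class task.
For temporal prediction (7 classes), the ensemble reaches F1 of 87.0% with 87.0% accuracy.
Across all tasks, the mean probability ensemble outperforms the Şulea et al. (2017) baseline on the same dataset.
In the confusion matrix analysis, non-lieu and annulation are among the hardest ruling classes because they have few instances.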
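To make the P, R, F1, and Acc. columns concrete, the sketch below computes them from a list of gold and predicted labels. The review does not state how the per-class scores are averaged, so support-weighted averaging is an assumption here. It is at least consistent with the table, where recall equals accuracy in every row, which is a property of support-weighted recall. The toy labels and predictions are made up for illustration.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus accuracy (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # gold instances per class
    predicted = Counter(y_pred)    # predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    precision = recall = f1 = 0.0
    for c in labels:
        p_c = tp[c] / predicted[c] if predicted[c] else 0.0
        r_c = tp[c] / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[c] / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    accuracy = sum(tp.values()) / total
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy data: a frequent class and two rare classes (hypothetical predictions).
y_true = ["other"] * 6 + ["annulation"] * 3 + ["non-lieu"] * 1
y_pred = ["other"] * 6 + ["annulation", "annulation", "other"] + ["annulation"]

scores = weighted_scores(y_true, y_pred)
print(scores)
```

In the toy data the single `non-lieu` case is misclassified, so that class contributes zero to every weighted score. Rare classes can be hard to get right without moving the averaged numbers much.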


This review was generated by AI and checked by a human editor.