[Paper Review] Fine-tune Bert for DocRED with Two-step Process
The paper shows that fine-tuning BERT on DocRED with a two-step training process (relation existence first, then specific relation) improves document-level relation extraction over baselines.
Modelling relations between multiple entities has attracted increasing attention recently, and a new dataset called DocRED has been collected to accelerate research on document-level relation extraction. Current baselines for this task use a BiLSTM to encode the whole document and are trained from scratch. We argue that such simple baselines are not strong enough to model the complex interactions between entities. In this paper, we apply a pre-trained language model (BERT) to provide a stronger baseline for this task. We also find that solving the task in phases further improves performance: the first step predicts whether two entities have a relation, and the second step predicts the specific relation.
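The two-phase decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `two_step_predict`, the 0.5 threshold, the `"N/A"` label string and the relation names are hypothetical, and the logits stand in for the outputs of the step-1 (relation vs. N/A) and step-2 (specific relation) classifiers.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for the step-1 logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def two_step_predict(
    exists_logit: float,
    relation_logits: Sequence[float],
    relation_names: Sequence[str],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> str:
    """Hypothetical two-step decision for one entity pair.

    Step 1: binary classifier, "does any relation hold?"
    Step 2: consulted only if step 1 says yes; returns the
    highest-scoring specific relation.
    """
    if sigmoid(exists_logit) < threshold:
        return "N/A"  # hypothetical label for "no relation"
    best = max(range(len(relation_logits)), key=lambda i: relation_logits[i])
    return relation_names[best]


names = ["country", "member_of"]  # hypothetical relation labels
print(two_step_predict(-2.0, [0.1, 3.0], names))  # step 1 rejects -> N/A
print(two_step_predict(1.5, [0.1, 3.0], names))   # step 1 accepts -> member_of
```

The argmax in step 2 is a simplification: DocRED allows several relations per entity pair, so a real system might instead threshold each relation score independently.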
Motivation & Objective
- Push relation extraction beyond sentence-level models to the document level, where relations can span multiple sentences.
- Demonstrate the benefits of leveraging a pre-trained language model (BERT) on DocRED.
- Propose a two-step training approach to address label imbalance in DocRED.
- Evaluate the approach against established baselines on the DocRED dataset.
Proposed method
- Encode documents with BERT-base to obtain token and entity embeddings.
- Represent entity pairs via a BiLinear layer on projected BERT embeddings to predict relations.
- Train in two steps: (1) binary relation existence (relation vs. N/A) with down-sampled negatives; (2) multi-class relation prediction using only related pairs.
- Project BERT outputs to a 128-dim space before the BiLinear classifier.
- Train on the annotated DocRED data: in step 1, related pairs are labelled 1 and N/A pairs 0, with negatives sampled at a 3:1 negative-to-positive ratio; in step 2, train only on relational instances.
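The classifier head and the step-1/step-2 data construction listed above can be sketched in plain Python. This is a minimal sketch under assumptions, not the paper's implementation: the function names, the bias-free projection, the toy dimensions and the pair representation (`(head, tail, relation)` tuples with `"N/A"` for unrelated pairs) are hypothetical. In the described setup, entity embeddings would come from BERT-base (768-dim) and be projected to 128 dims before the bilinear layer, which scores relation r as s_r = h^T W_r t + b_r.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Pair = Tuple[str, str, str]  # hypothetical: (head, tail, relation or "N/A")


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def project(proj: Matrix, emb: Sequence[float]) -> Vector:
    """Linear projection, e.g. 768-dim BERT output -> 128 dims (bias omitted)."""
    return matvec(proj, emb)


def bilinear_scores(h: Vector, t: Vector, weights: List[Matrix], biases: Vector) -> Vector:
    """One score per relation r: s_r = h^T W_r t + b_r."""
    return [
        sum(hi * wt for hi, wt in zip(h, matvec(w_r, t))) + b_r
        for w_r, b_r in zip(weights, biases)
    ]


def build_step1_data(pairs: Sequence[Pair], neg_ratio: int = 3,
                     seed: int = 0) -> List[Tuple[str, str, int]]:
    """Step 1: label related pairs 1 and N/A pairs 0, keeping at most
    neg_ratio negatives per positive (3:1 by default)."""
    rng = random.Random(seed)
    pos = [(h, t, 1) for h, t, r in pairs if r != "N/A"]
    neg = [(h, t, 0) for h, t, r in pairs if r == "N/A"]
    neg = rng.sample(neg, min(len(neg), neg_ratio * len(pos)))
    data = pos + neg
    rng.shuffle(data)
    return data


def build_step2_data(pairs: Sequence[Pair]) -> List[Pair]:
    """Step 2: keep only relational pairs for multi-class training."""
    return [p for p in pairs if p[2] != "N/A"]


# Toy check with 2-dim vectors and two relation types.
print(bilinear_scores([1.0, 2.0], [3.0, 4.0],
                      [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]]],
                      [0.5, -1.0]))  # [11.5, 3.0]
```

Down-sampling negatives only in step 1 keeps the binary task from being swamped by N/A pairs, while step 2 never sees N/A pairs at all, so it can focus on separating the specific relation types.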
Experimental results
Research questions
- RQ1: Does BERT improve document-level relation extraction on DocRED compared to CNN/LSTM baselines?
- RQ2: Can a two-step training process mitigate label imbalance and boost performance in document-level RE?
- RQ3: To what extent do entity interaction modeling approaches impact DocRED performance?
Key findings
- BERT yields about a 2% F1 improvement over baselines on DocRED (dev and test).
- Two-step training (BERT-Two-Step) further improves on BERT alone; the second step on its own reaches around 90% accuracy.
- BiLSTM-based encoders and local-only interaction models underperform BERT-based models on DocRED.
- The bottleneck is the first step (predicting whether a relation exists) rather than identifying the specific relation.
- A SentModel that encodes documents sentence-by-sentence performs similarly to BiLSTM, suggesting current models struggle to capture cross-sentential interactions.
This review was created by AI and reviewed by human editors.