[Paper Review] Revise and Resubmit: An Intertextual Model of Text-Based Collaboration in Peer Review
This paper proposes an intertextual model of text-based collaboration in peer review, introducing three core tasks—pragmatic tagging, linking, and version alignment—using a novel graph-based data model. It presents F1000RD, the first multi-domain, open post-publication peer review corpus, and demonstrates the feasibility of joint modeling for collaborative text analysis in NLP.
Peer review is a key component of the publishing process in most fields of science. Increasing submission rates put a strain on reviewing quality and efficiency, motivating the development of applications to support the reviewing and editorial work. While existing NLP studies focus on the analysis of individual texts, editorial assistance often requires modeling interactions between pairs of texts—yet general frameworks and datasets to support this scenario are missing. Relationships between texts are the core object of the intertextuality theory—a family of approaches in literary studies not yet operationalized in NLP. Inspired by prior theoretical work, we propose the first intertextual model of text-based collaboration, which encompasses three major phenomena that make up a full iteration of the review–revise–and–resubmit cycle: pragmatic tagging, linking, and long-document version alignment. While peer review is used across the fields of science and publication formats, existing datasets solely focus on conference-style review in computer science. Addressing this, we instantiate our proposed model in the first annotated multidomain corpus in journal-style post-publication open peer review, and provide detailed insights into the practical aspects of intertextual annotation. Our resource is a major step toward multidomain, fine-grained applications of NLP in editorial support for peer review, and our intertextual framework paves the path for general-purpose modeling of text-based collaboration. We make our corpus, detailed annotation guidelines, and accompanying code publicly available.
Motivation & Objective
- To address the lack of general frameworks and datasets for modeling cross-document, text-based collaboration in peer review.
- To operationalize intertextuality theory in NLP by modeling key phenomena in the review-revise-resubmit cycle.
- To develop a generic, extensible data model (Intertextual Graph) that supports long documents and intertextual relations.
- To create and release the first multi-domain, open post-publication peer review corpus (F1000RD) with clear licensing.
- To enable fine-grained, multi-domain NLP applications in editorial support by providing annotated data and code.
Proposed method
- Proposes a graph-based Intertextual Graph data model to represent textual and non-textual elements, capturing document structure and cross-document relations.
- Introduces three core tasks: pragmatic tagging (classifying statements by communicative purpose), linking (discovering fine-grained connections between texts), and version alignment (aligning revisions of the same document).
- Employs an unsupervised, rule-based alignment technique over Intertextual Graphs (ITGs), formulated with integer linear programming (ILP) constraints, for paragraph-level version alignment.
- Uses a binary labeling schema for linking annotation, with plans to explore decompositional approaches in future work.
- Develops a flexible annotation interface and workflow to support high-quality, scalable annotation of intertextual relations.
- Releases the F1000RD corpus and accompanying code under an open license for reproducibility and reuse.
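To make the graph-based data model above more concrete, here is a minimal Python sketch of how documents, pragmatic tags, links, and version alignments could share one graph. It is an illustration under stated assumptions, not the authors' released implementation: the class names (`Node`, `Edge`, `IntertextualGraph`), the edge kinds (`"next"`, `"link"`, `"align"`), and the tag label `"weakness"` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One unit of a document (hypothetical schema for illustration)."""
    node_id: str
    doc_id: str
    kind: str                   # e.g. "paragraph", "sentence", "figure"
    text: str = ""
    tag: Optional[str] = None   # pragmatic tag, if the node has one


@dataclass
class Edge:
    """A typed, directed relation between two nodes."""
    src: str
    dst: str
    kind: str                   # assumed kinds: "next", "link", "align"


@dataclass
class IntertextualGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, kind: str) -> None:
        # Refuse dangling edges so every relation points at known nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(src, dst, kind))

    def cross_document_edges(self) -> List[Edge]:
        """Edges whose endpoints belong to different documents."""
        return [
            e for e in self.edges
            if self.nodes[e.src].doc_id != self.nodes[e.dst].doc_id
        ]


def build_example() -> IntertextualGraph:
    """A toy review-revise-resubmit iteration with invented texts."""
    g = IntertextualGraph()
    g.add_node(Node("v1.p1", "paper_v1", "paragraph", "We evaluate on one dataset."))
    g.add_node(Node("v1.p2", "paper_v1", "paragraph", "Results are shown in Table 1."))
    g.add_node(Node("r1.s1", "review_1", "sentence",
                    "The evaluation uses only one dataset.", tag="weakness"))
    g.add_node(Node("v2.p1", "paper_v2", "paragraph", "We evaluate on three datasets."))
    g.add_edge("v1.p1", "v1.p2", "next")    # document structure within one text
    g.add_edge("r1.s1", "v1.p1", "link")    # review statement -> paper passage
    g.add_edge("v1.p1", "v2.p1", "align")   # old paragraph -> revised paragraph
    return g
```

In this sketch, pragmatic tagging is a node attribute, while linking and version alignment are edge kinds. That keeps all three tasks in one structure, so a query such as `build_example().cross_document_edges()` returns the link and alignment relations together.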
Experimental results
Research questions
- RQ1: How can intertextuality theory be operationalized in NLP to model text-based collaboration in peer review?
- RQ2: What are the key challenges and design considerations in annotating pragmatic tagging, linking, and version alignment in a multi-domain, post-publication review setting?
- RQ3: How effective is the proposed unsupervised ITG-based alignment method in achieving high-precision paragraph-level alignment across document revisions?
- RQ4: What are the practical trade-offs and limitations of current annotation schemas and task definitions in collaborative text analysis?
- RQ5: How does joint modeling of the three tasks improve the understanding of peer review discourse compared to isolated task analysis?
Key findings
- The proposed intertextual model successfully captures core phenomena in the review-revise-resubmit cycle through pragmatic tagging, linking, and version alignment.
- The F1000RD corpus is the first publicly available, multi-domain, open post-publication peer review dataset with clear licensing, supporting diverse NLP applications.
- The unsupervised ITG alignment method achieves high precision, though only 70% of documents are perfectly aligned, indicating room for improvement.
- Joint modeling reveals that task interdependencies are non-trivial, with linking scope and granularity being critical open issues.
- Annotation quality is sensitive to interface design and suggestion mechanisms, suggesting the need for optimized annotation workflows.
- The framework is extensible and adaptable to other domains, including Wikipedia, news, and online discussion platforms, paving the way for broader adoption.
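The precision-oriented behavior of paragraph-level version alignment noted in the findings above can be illustrated with a small sketch. It deliberately swaps the ILP formulation for a simpler greedy one-to-one matcher built on the standard library's `difflib.SequenceMatcher`, so it should be read as a stand-in for the general idea, not as the paper's method. The function name `align_paragraphs` and the default `threshold` of 0.6 are assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def align_paragraphs(
    old: List[str], new: List[str], threshold: float = 0.6
) -> Dict[int, int]:
    """Greedy one-to-one alignment of old paragraphs to new paragraphs.

    Returns a mapping {old_index: new_index}. Pairs scoring below
    `threshold` are never aligned, which favors precision over coverage.
    """
    scored = []
    for i, a in enumerate(old):
        for j, b in enumerate(new):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                scored.append((score, i, j))

    # Best-scoring pairs first; ties are broken by position for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    alignment: Dict[int, int] = {}
    used_new = set()
    for _score, i, j in scored:
        if i in alignment or j in used_new:
            continue  # keep the mapping one-to-one
        alignment[i] = j
        used_new.add(j)
    return alignment


old_version = [
    "We evaluate on one dataset.",
    "Results are shown in Table 1.",
]
new_version = [
    "Results are shown in Table 1.",
    "A new limitations section was added for this revision.",
    "We evaluate on three datasets.",
]
alignment = align_paragraphs(old_version, new_version)
```

Raising `threshold` leaves more paragraphs unaligned but makes the remaining pairs more reliable. An ILP-based approach would instead optimize all assignments jointly under explicit constraints, which a greedy pass cannot guarantee.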
This review was created by AI and reviewed by human editors.