QUICK REVIEW

[Paper Review] The First Results of Distributed Peer Review at ESO Show Promising Outcomes

Tereza Jeřabková, F. Patat|arXiv (Cornell University)|Jan 1, 2023

Astronomy and Astrophysical Research|2 citations

TL;DR

This paper evaluates the first two cycles (P110 and P111) of ESO's Distributed Peer Review (DPR) system, in which Principal Investigators review each other's proposals instead of relying on centralized expert panels. The DPR system, which assigns proposals based on keyword-matching expertise scores and mandates 10 reviews per submitted proposal, achieved outcomes statistically comparable to those of traditional panels. Feedback satisfaction was significantly higher, especially among PIs whose proposals were rejected, because reviewer comments were more detailed and passed on unedited.

ABSTRACT

The European Southern Observatory (ESO) implemented a new paradigm called Distributed Peer Review (DPR) as part of its proposal evaluation process in Period 110. Under DPR, Principal Investigators who submit proposals agree to review a certain number of proposals submitted by their peers and accept that their own proposal(s) are reviewed by their peers who have also submitted proposals in the same cycle. This article presents a brief overview of the DPR process at ESO, and its outcomes based on data from periods 110 and 111.

Motivation & Objective

To address the growing logistical burden on expert panels due to rising proposal volumes in large astronomical facilities.
To evaluate the feasibility and quality of a distributed peer review (DPR) model where PIs review each other’s proposals.
To assess whether DPR produces outcomes comparable to traditional expert panel reviews in terms of grading consistency and feedback quality.
To analyze the effectiveness of expertise-matching algorithms using applicant-provided keywords and match scores.
To measure user satisfaction with reviewer feedback under DPR versus traditional panel review.

Proposed method

Implemented a DPR system in ESO’s P110 and P111 proposal cycles, where PIs must review 10 peer proposals and accept peer review of their own.
Used a keyword-based expertise matching system: each PI and proposal is assigned a knowledge vector using up to five science keywords, with match scores ranging from 0 (no overlap) to 2 (perfect match).
Employed a custom algorithm, proposal_distributor, to assign proposals to reviewers based on match scores, reviewer capacity (10 reviews per reviewer), and conflict avoidance (e.g., institutional ties).
Normalized DPR grades to match the mean and standard deviation of panel grades to ensure fair ranking.
Collected voluntary feedback from PIs on the usefulness of reviewer comments, distinguishing between accepted and rejected proposals.
Compared match score distributions under DPR, random assignment, and an idealized asymptotic maximum to evaluate algorithm performance.
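
A minimal sketch of two of the steps listed above, the match-score-driven assignment and the grade normalization. It is an illustration under stated assumptions, not ESO's proposal_distributor: the function names, the input formats, and the greedy strategy are all hypothetical, and the match scores are taken as precomputed inputs because the summary does not give the formula.

```python
from statistics import mean, pstdev


def assign_reviews(scores, conflicts, capacity, reviews_needed):
    """Greedy sketch: hand out the highest-scoring allowed pairs first.

    scores: {(reviewer, proposal): match score}  (hypothetical input format)
    conflicts: set of (reviewer, proposal) pairs that must never be assigned,
               e.g. institutional ties or a PI's own proposal
    capacity: maximum number of reviews per reviewer
    reviews_needed: maximum number of reviews per proposal
    """
    load = {}      # reviews assigned to each reviewer so far
    received = {}  # reviews assigned to each proposal so far
    assignment = []
    # Visit pairs from best to worst match; ties are broken by name.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    for (reviewer, proposal), _score in ordered:
        if (reviewer, proposal) in conflicts:
            continue
        if load.get(reviewer, 0) >= capacity:
            continue
        if received.get(proposal, 0) >= reviews_needed:
            continue
        assignment.append((reviewer, proposal))
        load[reviewer] = load.get(reviewer, 0) + 1
        received[proposal] = received.get(proposal, 0) + 1
    return assignment


def rescale_grades(dpr_grades, panel_grades):
    """Shift and scale DPR grades to the panel mean and standard deviation."""
    dpr_mean, dpr_std = mean(dpr_grades), pstdev(dpr_grades)
    panel_mean, panel_std = mean(panel_grades), pstdev(panel_grades)
    if dpr_std == 0:
        # All DPR grades identical: only the mean can be matched.
        return [panel_mean for _ in dpr_grades]
    scale = panel_std / dpr_std
    return [panel_mean + (g - dpr_mean) * scale for g in dpr_grades]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_scores = {("A", "p2"): 2.0, ("A", "p3"): 1.0, ("B", "p1"): 1.5}
    print(assign_reviews(toy_scores, {("A", "p2")}, capacity=1, reviews_needed=1))
    print(rescale_grades([1.0, 2.0, 3.0], [2.0, 2.5, 3.0]))
```

A greedy pass like this does not guarantee that every proposal receives its full quota of reviews, and it will generally not reach the best achievable total match score; a production distributor would need a balancing step or a proper optimization on top.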

Experimental results

Research questions

RQ1: Does the DPR system produce grading outcomes statistically indistinguishable from traditional expert panel reviews?
RQ2: How effective is the keyword-based expertise matching algorithm in assigning reviewers with relevant scientific background to proposals?
RQ3: Is the feedback from DPR reviewers more useful than that from expert panels, particularly for rejected proposals?
RQ4: Are there systematic biases in self-evaluated expertise levels compared to algorithmically computed match scores?
RQ5: Can DPR sustain high-quality evaluation while reducing panel workload and improving feedback quality?

Key findings

The DPR system successfully evaluated 435 proposals in P110 and 417 in P111, with 100% on-time submission and no technical issues reported.
The average match score for DPR assignments (6085.65) was over three times higher than random assignment (1875.15), indicating strong expertise alignment.
The DPR match score distribution peaked at high values, showing effective assignment, and lay within a factor of 1.05 of the theoretical asymptotic maximum.
PIs reported significantly higher satisfaction with DPR feedback than with panel feedback, especially for rejected proposals: 60% of DPR comments were rated 'mostly' or 'fully' useful, compared to only 30% for panel comments.
Self-evaluated expertise levels correlated positively with match scores, though discrepancies occurred: some reviewers rated themselves as 'non-expert' despite perfect keyword matches, suggesting potential self-rating bias.
The system reduced panel workload by a factor of two, as proposals requesting less than 16 hours were routed to DPR, leaving the panels in control of ~80% of observing time.

This review was created by AI and reviewed by human editors.