[Paper Review] Thinking Slow about Latency Evaluation for Simultaneous Machine Translation
The paper introduces Differentiable Average Lagging (DAL), a differentiable latency metric for simultaneous MT. It addresses inconsistencies in Average Lagging (AL) and provides a coherent framework for evaluating latency in intrinsic settings, where no source timing information is available.
Simultaneous machine translation attempts to translate a source sentence before it is finished being spoken, with applications to translation of spoken language for live streaming and conversation. Since simultaneous systems trade quality to reduce latency, having an effective and interpretable latency metric is crucial. We introduce a variant of the recently proposed Average Lagging (AL) metric, which we call Differentiable Average Lagging (DAL). It distinguishes itself by being differentiable and internally consistent with its underlying mathematical model.
Motivation & Objective
- Clarify latency measurement in intrinsic (timing-free) simultaneous MT evaluation.
- Identify limitations of the existing Average Lagging (AL) metric.
- Propose a differentiable latency metric that accounts for target-writing costs and preserves AL’s desirable properties.
- Provide a non-recurrent formulation of the latency model for practical implementation.
Proposed method
- Define g(t) as the number of source tokens read before writing target token t.
- Introduce an adjusted delay g_d'(t) that charges a time cost d for writing each target token, via the recurrence g_d'(t) = g(t) if t = 1, and g_d'(t) = max[g(t), g_d'(t-1) + d] otherwise; show that it is equivalent to a non-recurrent form.
- Derive DAL_d = (1/|y|) sum_{t=1}^{|y|} [ g_d'(t) - (t-1)d ], and propose d = |x|/|y| for consistency with AL.
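- Eliminate AL's problematic cutoff tau (the first target step at which the entire source has been read) by using a differentiable, time-based formulation that averages over all target tokens.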
- Provide a non-recurrent equivalent g_d'(t) = (t-1)d + max_{1≤i≤t} [ g(i) - (i-1)d ].
- Discuss properties, edge cases, and implications for deterministic vs adaptive latency strategies.
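The definitions in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `g` is assumed to be a list with one entry per target token (so `g[0]` is g(1)), and `src_len` stands for |x|.

```python
from typing import List, Sequence


def adjusted_delays_recurrent(g: Sequence[float], d: float) -> List[float]:
    """g_d'(t) via the recurrence: g_d'(1) = g(1), g_d'(t) = max(g(t), g_d'(t-1) + d)."""
    out: List[float] = []
    for t, g_t in enumerate(g):  # t is 0-based, so t == 0 is the first target token
        out.append(g_t if t == 0 else max(g_t, out[-1] + d))
    return out


def adjusted_delays_closed_form(g: Sequence[float], d: float) -> List[float]:
    """g_d'(t) = (t-1)d + max_{1<=i<=t} [g(i) - (i-1)d], using a running maximum."""
    out: List[float] = []
    running_max = float("-inf")
    for t, g_t in enumerate(g):  # 0-based t plays the role of (t-1) in the formula
        running_max = max(running_max, g_t - t * d)
        out.append(t * d + running_max)
    return out


def dal(g: Sequence[float], src_len: int) -> float:
    """DAL_d with d = |x|/|y|: the mean of g_d'(t) - (t-1)d over all target tokens."""
    d = src_len / len(g)
    delays = adjusted_delays_recurrent(g, d)
    return sum(gp - t * d for t, gp in enumerate(delays)) / len(g)


if __name__ == "__main__":
    # Hypothetical wait-3 schedule with |x| = |y| = 5: read 3 tokens, then alternate.
    wait3 = [3, 4, 5, 5, 5]
    print(adjusted_delays_recurrent(wait3, 1.0))
    print(adjusted_delays_closed_form(wait3, 1.0))
    print(dal(wait3, src_len=5))  # prints 3.0
```

For this wait-3 schedule both forms give the same adjusted delays, and every per-token lag equals 3, so DAL equals k as expected for a wait-k system with |x| = |y|.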
Experimental results
Research questions
- RQ1: How can latency in intrinsic simultaneous MT evaluation be measured without source timing information?
- RQ2: What are the limitations of AL with respect to differentiability and to how it penalizes or rewards certain timing strategies?
- RQ3: Can we design a differentiable latency metric that accounts for the cost of writing target tokens and remains consistent for wait-k systems?
- RQ4: How does DAL compare to AL for deterministic and adaptive translation systems under various length conditions?
Key findings
- DAL is differentiable and eliminates the tau-based non-differentiability of AL.
- DAL introduces a writing-cost parameter d, chosen as d = |x|/|y| for consistency with AL and to encourage catch-up when y is longer than x.
- DAL preserves AL’s interpretation for wait-k systems while avoiding AL’s potential exploitation via free writes after tau.
- DAL’s adjusted delays g_d'(t) are lower-bounded by g(1) + (t-1)d, so each per-token lag g_d'(t) - (t-1)d is at least g(1) and can never be negative.
- Empirical comparison indicates a predominantly linear relationship between AL and DAL, with DAL being more conservative and slightly higher in reported lags for adaptive MILk versus deterministic wait-k systems.
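The contrast between AL and DAL described in these findings can be checked on toy schedules. The sketch below is illustrative only: it assumes AL as it is usually defined (the average of g(t) - (t-1)|x|/|y| over steps 1..tau, where tau is the first step at which the whole source has been read), which this review does not spell out. The function names and the two schedules are hypothetical and are not taken from the paper's experiments.

```python
from typing import Sequence


def average_lagging(g: Sequence[float], src_len: int) -> float:
    """AL under the assumed definition: average lag over target steps 1..tau."""
    d = src_len / len(g)  # |x|/|y|
    # tau: first 1-based step where the full source has been read.
    # Assumption: fall back to |y| if the schedule never reaches |x|.
    tau = next((t for t, g_t in enumerate(g, start=1) if g_t >= src_len), len(g))
    return sum(g[t - 1] - (t - 1) * d for t in range(1, tau + 1)) / tau


def dal(g: Sequence[float], src_len: int) -> float:
    """DAL with d = |x|/|y|, using the recurrence for the adjusted delays."""
    d = src_len / len(g)
    total, prev = 0.0, 0.0
    for t, g_t in enumerate(g):  # 0-based t plays the role of (t-1)
        prev = g_t if t == 0 else max(g_t, prev + d)
        total += prev - t * d
    return total / len(g)


if __name__ == "__main__":
    wait3 = [3, 4, 5, 5, 5]   # wait-3 schedule, |x| = |y| = 5
    read_all = [1, 4, 4, 4]   # write one token, then read the rest; |x| = |y| = 4
    for name, g, n in [("wait-3", wait3, 5), ("read-all", read_all, 4)]:
        print(name, "AL =", average_lagging(g, n), "DAL =", dal(g, n))
```

On the wait-3 schedule both metrics equal 3. On the second schedule AL stops averaging at tau = 2 and gives 2.0, while DAL keeps charging d for each later write and gives 2.5, which illustrates why DAL tends to report somewhat higher lags.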
This review was created by AI and reviewed by human editors.