QUICK REVIEW

[Paper Review] Linearizable Replicated State Machines With Lattice Agreement

Xiong Zheng, Vijay K. Garg|arXiv (Cornell University)|Oct 13, 2018

Distributed systems and fault tolerance|17 references|5 citations

TL;DR

This paper presents AsyncLA, an O(log f) asynchronous lattice agreement protocol that exponentially improves upon prior O(f) bounds, enabling efficient linearizable replicated state machines (LaRSM). By optimizing generalized lattice agreement and integrating it with CRDTs, the authors implement a high-throughput, low-latency LaRSM that outperforms SPaxos by 1.3x in throughput and maintains availability during failures, demonstrating lattice agreement as a practical alternative to consensus for linearizable RSMs.

ABSTRACT

This paper studies the lattice agreement problem in asynchronous systems and explores its application to building linearizable replicated state machines (RSM). First, we propose an algorithm to solve the lattice agreement problem in $O(\log f)$ asynchronous rounds, where $f$ is the number of crash failures that the system can tolerate. This is an exponential improvement over the previous best upper bound. Second, Faleiro et al. have shown in [Faleiro et al. PODC, 2012] that a combination of conflict-free data types and lattice agreement protocols can be applied to implement a linearizable RSM. They give a Paxos-style lattice agreement protocol, which can be adapted to implement a linearizable RSM and guarantees that a command can be learned in at most $O(n)$ message delays, where $n$ is the number of proposers. Later on, Xiong et al. in [Xiong et al. DISC, 2018] give a lattice agreement protocol which improves the $O(n)$ guarantee to $O(f)$. However, neither protocol is practical for building a linearizable RSM. Thus, in the second part of the paper, we first give an improved protocol based on the one proposed by Xiong et al. Then, we implement a simple linearizable RSM using our improved protocol and compare our implementation with an open source Java implementation of Paxos. Results show that better performance can be obtained by using lattice agreement based protocols to implement a linearizable RSM compared to traditional consensus based protocols.
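
For context, the lattice agreement problem named in the abstract is usually stated as follows. This is the standard formulation from the literature, not a quotation from the paper, and the symbols $x_i$, $y_i$, and $\sqcup$ are notation assumed here: each process $p_i$ proposes a value $x_i$ from a join semilattice and decides a value $y_i$ such that

$$
\begin{aligned}
&\text{Downward-Validity:} && x_i \le y_i \\
&\text{Upward-Validity:} && y_i \le \bigsqcup_{j} x_j \\
&\text{Comparability:} && y_i \le y_j \ \text{ or } \ y_j \le y_i \quad \text{for all } i, j
\end{aligned}
$$

Unlike consensus, processes may decide different values, as long as the decided values form a chain in the lattice order.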

Motivation & Objective

To address the inefficiency of existing lattice agreement protocols in building practical linearizable replicated state machines (RSMs).
To reduce the asynchronous round complexity of lattice agreement from O(f) to O(log f), significantly improving performance.
To design a practical generalized lattice agreement protocol that supports efficient implementation of linearizable RSMs using conflict-free data types (CRDTs).
To empirically evaluate the performance of the proposed LaRSM against Paxos-based systems like SPaxos under normal and failure conditions.

Proposed method

Designs AsyncLA, a novel lattice agreement algorithm that achieves O(log f) asynchronous rounds using hierarchical coordination and value propagation across failure-tolerant quorums.
Introduces optimizations to the generalized lattice agreement protocol from Xiong et al. (2018), reducing message overhead and improving batching for write-heavy workloads.
Combines the improved lattice agreement protocol with a CRDT map data structure to implement a linearizable RSM (LaRSM) that supports deterministic updates and queries.
Employs a client-side retry mechanism with short timeouts to maintain availability during replica failures, enabling continued request processing.
Uses a peer-to-peer client routing model where clients can dynamically switch replicas upon timeout, enhancing fault tolerance and load balancing.
Evaluates performance via simulation with varying client loads, failure scenarios, and read/write ratios to measure throughput, latency, and scalability.
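
The pairing of lattice agreement with a CRDT map can be illustrated with a small sketch. It is not the authors' implementation: the grow-only map (each key maps to a set of values, joined by per-key union) and all function names are assumptions chosen for illustration. The sketch checks whether a set of decided values satisfies the usual lattice agreement properties, in particular that decisions are pairwise comparable.

```python
from itertools import combinations

# Hypothetical grow-only map CRDT: each key maps to a frozenset of values.
# This representation is an assumption; the review only says "CRDT map".

def join(a, b):
    """Least upper bound of two maps: per-key union of the value sets."""
    keys = a.keys() | b.keys()
    return {k: a.get(k, frozenset()) | b.get(k, frozenset()) for k in keys}

def leq(a, b):
    """Lattice order: a <= b iff every key's set in a is contained in b's."""
    return all(v <= b.get(k, frozenset()) for k, v in a.items())

def is_chain(values):
    """True if every pair of values is comparable in the lattice order."""
    return all(leq(x, y) or leq(y, x) for x, y in combinations(values, 2))

def check_lattice_agreement(proposals, decisions):
    """Check validity and comparability for one proposal/decision per process."""
    top = {}
    for p in proposals:
        top = join(top, p)
    downward = all(leq(p, d) for p, d in zip(proposals, decisions))
    upward = all(leq(d, top) for d in decisions)
    return downward and upward and is_chain(decisions)

# Three processes each propose one update to the map.
proposals = [
    {"x": frozenset({1})},
    {"y": frozenset({2})},
    {"x": frozenset({3})},
]

# Acceptable outcome: decisions grow along a chain.
good = [
    proposals[0],
    join(proposals[0], proposals[1]),
    join(join(proposals[0], proposals[1]), proposals[2]),
]

# Unacceptable outcome: each process decides only its own proposal,
# so the first two decisions are incomparable.
bad = list(proposals)

print(check_lattice_agreement(proposals, good))
print(check_lattice_agreement(proposals, bad))
```

Because acceptable decisions form a chain, replicas can be seen as observing successively larger prefixes of the same growing state, which is the intuition behind using lattice agreement for a linearizable RSM.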

Experimental results

Research questions

RQ1: Can lattice agreement be solved in O(log f) asynchronous rounds, significantly improving upon the prior O(f) bound?
RQ2: Can a lattice agreement-based RSM achieve better performance than consensus-based RSMs like SPaxos in terms of throughput and latency?
RQ3: How does the proposed LaRSM handle failures compared to leader-based protocols like SPaxos?
RQ4: Does the performance of the LaRSM improve with higher read-to-write ratios, and why?
RQ5: What are the scalability limitations of the lattice agreement-based RSM as the number of replicas increases?

Key findings

The AsyncLA protocol achieves O(log f) asynchronous rounds for lattice agreement, representing an exponential improvement over the previous O(f) bound.
The implemented LaRSM achieves approximately 1.3 times higher throughput than SPaxos under normal load conditions.
LaRSM maintains request processing during replica failures, with throughput dropping only from ~20K to ~15K requests/sec, while SPaxos stops processing entirely when the leader fails.
Latency in LaRSM increases by only ~5ms under normal conditions and remains stable during failures, whereas SPaxos latency becomes unbounded during leader failure because no requests complete.
Performance improves with higher read-to-write ratios because smaller proposal sets reduce message size and accelerate lattice agreement completion.
LaRSM exhibits poor scalability as the number of replicas increases, due to the O(log f) round complexity depending on f = (n−1)/2, which grows with n.
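
To make the relation between n, f, and the two round bounds concrete, here is a small arithmetic sketch. It uses f = (n−1)/2 as quoted above, with integer floor as an added assumption, and it tabulates only the growth terms f and log2 f. Asymptotic bounds hide constant factors, so these figures are not measured or predicted round counts, and the function names are hypothetical.

```python
import math

def max_crash_failures(n):
    """Largest f with a majority still correct: f = floor((n - 1) / 2)."""
    return (n - 1) // 2

def growth_terms(n):
    """Return (f, ceil(log2 f)) for n replicas; the log term is at least 1 when f >= 1."""
    f = max_crash_failures(n)
    if f == 0:
        return 0, 0
    return f, max(1, math.ceil(math.log2(f)))

for n in (3, 5, 9, 17, 33):
    f, log_f = growth_terms(n)
    print(f"n={n:2d}  f={f:2d}  ceil(log2 f)={log_f}")
```

Going from 9 to 33 replicas quadruples f (4 to 16) but only doubles the logarithmic term (2 to 4), which is the sense in which an O(log f) bound is an exponential improvement over O(f).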

This review was created by AI and reviewed by human editors.