Rethinking Protocol Upgrades with Gauss

Gauss enables safe, seamless, and modular upgrades for distributed systems.

Published on March 26, 2026

Summary

Gauss is a new reconfiguration mechanism for coordinating upgrades in distributed systems based on state machine replication (SMR). It decouples reconfiguration from the underlying consensus protocol and separates ordering from execution by introducing an inner log for coordination and a sanitized outer log for execution. With Gauss, these systems can seamlessly upgrade consensus parameters (validator membership, quorum rules, failure thresholds) and switch consensus implementations with minimal global downtime while preserving BFT safety guarantees.

State machine replication (SMR) underpins the reliability and safety of many distributed systems in production today. An SMR-based system typically combines several components: a networking layer for disseminating data to nodes, a consensus layer for ordering transactions, an execution layer for deterministically executing transactions to update replicated state, and a reconfiguration layer for coordinating upgrades and configuration changes.

In many SMR implementations, the reconfiguration mechanism is tightly coupled to the underlying consensus algorithm. If the consensus algorithm changes, the reconfiguration mechanism changes too. While this simplifies design, it also makes upgrades harder than they should be. Routine consensus upgrades, such as updates to validator membership, quorum rules, and failure thresholds, and larger changes, such as switching to a new consensus implementation, often require bespoke engineering, complex migration procedures, and sometimes even temporary downtime.

Gauss is a new reconfiguration mechanism that rethinks this approach. Instead of embedding reconfiguration logic within the consensus algorithm, Gauss separates reconfiguration from consensus, treating consensus as a modular component that can evolve independently. This keeps the reconfiguration mechanism stable while the consensus protocol changes over time. A system using Gauss can update validator membership, quorum rules, and failure thresholds, or replace the entire consensus protocol, with minimal downtime and with no significant impact on liveness and safety.

This article explains how Gauss works and complements the original paper published by the Rialo research team. It focuses on the key idea behind the design, the mechanics of the reconfiguration protocol, and the implications for long-lived SMR systems.

It’s not a lie if you don’t get caught

The paper introducing Gauss is titled “It’s Not a Lie if You Don’t Get Caught: Simplifying Reconfiguration in SMR Through Dirty Logs”. Although slightly tongue-in-cheek, the title points to the central idea behind the design of the reconfiguration mechanism introduced in the paper.

In a traditional replicated state machine, the consensus log and execution log are conceptually distinct but operationally identical. Validators agree on an ordered log of transactions, then feed that same log into an execution engine to produce the next application state. In practice, the execution log is simply the consensus log.

This becomes a problem because not every entry in the consensus log is a client transaction. Some entries are coordination messages used to manage consensus-related operations, including validator changes, failure-threshold updates, timeout parameters, and lifecycle events during reconfiguration. Although these messages are necessary for protocol maintenance, they introduce additional complexity.

Because the execution engine processes both client transactions and coordination entries, execution logic must account for internal consensus-layer mechanics. Upgrades also become more difficult, as changes in the consensus or reconfiguration layer may require corresponding updates in the execution layer.

Gauss avoids this by making the distinction between consensus and execution explicit. To achieve this, it introduces two logs: an inner log and an outer log.

The inner log contains all entries ordered by consensus, including both client transactions and coordination messages. The outer log contains only client transactions and is derived from the inner log by applying a deterministic log sanitization function that filters out coordination entries while preserving the order in which transactions must be executed. In this sense, the inner log is “dirty”: it contains everything consensus orders, while the outer log exposes the consistent prefix of that ordering that execution should see. Entries in the inner log may therefore be ignored when deriving the outer log, even though they were ordered by consensus.

The inner log serves as the consensus layer’s coordination timeline, enabling validators to synchronize their actions and monitor key events during reconfiguration. The outer log serves as the source of truth for executable transactions and is exposed to the execution layer. All validators agree on the ordering of entries in the inner log, but the execution engine does not process that log directly. It processes the sanitized outer log instead.
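As a minimal sketch of this split, assume a hypothetical in-memory representation in which every inner-log entry is tagged as either a client transaction or a coordination message. The `Entry` type and the `sanitize` function are illustrative names, not identifiers from the paper:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    kind: str     # "tx" for a client transaction, "coord" for a coordination message
    payload: str  # opaque content; hypothetical for this sketch


def sanitize(inner_log: List[Entry]) -> List[Entry]:
    """Derive the outer log: keep transactions only, in consensus order."""
    return [entry for entry in inner_log if entry.kind == "tx"]


inner_log = [
    Entry("tx", "T1"),
    Entry("coord", "EpochChange"),
    Entry("tx", "T2"),
    Entry("coord", "Ready"),
]
outer_log = sanitize(inner_log)
outer_payloads = [entry.payload for entry in outer_log]
```

The execution engine in this sketch would only ever receive `outer_log`, so it needs no knowledge of what a coordination entry looks like.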

This is the “lie” in the paper’s title: not every entry ordered and committed by consensus is exposed to execution. Once the system transitions to a new configuration, entries from the previous configuration that occur after the transition point are excluded from execution. Coordination entries are also filtered out before reaching the execution layer, but that is a secondary consequence of the same design: the outer log exposes only the part of the inner log that execution should see. Gauss’s ability to transition a system between configurations without halting consensus or execution depends on this design.

This breaks from traditional SMR designs, where system state is formally described as the result of deterministically executing the complete consensus log. Gauss shows that this requirement is unnecessary for safety. Validators need only the portion of the consensus log that affects the system state (ordered transactions) to derive the outer log that execution uses to materialize the replicated state.

Validators are guaranteed to derive the same outer log and agree on the system state, as long as they derive it from the same prefix of the sequence of entries ordered by the consensus protocol. Some validators may lag behind, but all validators converge on the same replicated state if they follow the deterministic log sanitization process. The log sanitizer ensures that execution maintains the consensus-defined ordering of transactions, even though the consensus log is not executed directly.
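The convergence argument can be illustrated with a small sketch. It assumes a hypothetical tuple encoding of entries and a pure filtering sanitizer. Because the sanitizer is deterministic and order-preserving, a lagging validator's outer log is always a prefix of the outer log of a validator that has seen more entries:

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (kind, payload); kind is "tx" or "coord" (assumed encoding)


def sanitize(inner_log: List[Entry]) -> List[str]:
    """Deterministic filter: same input prefix always yields the same output."""
    return [payload for kind, payload in inner_log if kind == "tx"]


ordered = [("tx", "T1"), ("coord", "Ready"), ("tx", "T2"), ("tx", "T3")]

up_to_date = sanitize(ordered)   # validator that has seen every ordered entry
lagging = sanitize(ordered[:3])  # validator that has seen only a prefix

# The lagging validator's outer log is a prefix of the up-to-date one.
is_prefix = up_to_date[: len(lagging)] == lagging
```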

*Multiple inner logs may grow in parallel, but the log sanitizer ignores post-handover entries when deriving the outer log. A single outer log is derived from multiple epoch-specific inner logs.*

Inside Gauss: An overview of the reconfiguration mechanism

Gauss treats system reconfiguration as a controlled transition between epochs, where each epoch runs under a fixed configuration. In Gauss, epoch 𝑖 is the current epoch, epoch 𝑖 + 1 is the next epoch, and each epoch is associated with a configuration 𝐸ᵢ that includes the validator set, the consensus protocol, and the failure threshold.

A reconfiguration replaces 𝐸ᵢ with a new configuration 𝐸ᵢ₊₁. This may involve changing validator membership, adjusting the failure threshold, or replacing the consensus protocol itself. Gauss performs these transitions in three phases: preparation, handover, and shutdown.
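One way to picture an epoch configuration is as an immutable record. The sketch below is an assumption about how it could be modeled. The field names, the `reconfigure` helper, and the protocol labels are hypothetical and do not come from the paper:

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class EpochConfig:
    epoch: int                   # index i of the epoch
    validators: FrozenSet[str]   # validator set for this epoch
    consensus_protocol: str      # label for the consensus implementation
    failure_threshold: int       # maximum number of faulty validators tolerated


def reconfigure(current: EpochConfig, **changes) -> EpochConfig:
    """Return the next epoch's configuration with the given fields changed."""
    return replace(current, epoch=current.epoch + 1, **changes)


e_i = EpochConfig(1, frozenset({"A", "B", "C", "D"}), "protocol-v1", 1)
e_next = reconfigure(
    e_i,
    validators=frozenset({"B", "C", "D", "E"}),
    consensus_protocol="protocol-v2",
)
```

Keeping the record frozen means the configuration of epoch 𝑖 cannot be mutated in place. A reconfiguration always produces a new value for epoch 𝑖 + 1.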

During normal operation, the consensus protocol orders transactions and produces the inner log. During reconfiguration, this log also carries coordination entries. Gauss defines three such entries: 𝗘𝗽𝗼𝗰𝗵𝗖𝗵𝗮𝗻𝗴𝗲, 𝗥𝗲𝗮𝗱𝘆, and 𝗗𝗼𝗻𝗲. These entries appear in the inner log but are not executed. They are filtered out before the outer log is exposed to the execution engine.

The preparation phase

A configuration transition begins when an 𝗘𝗽𝗼𝗰𝗵𝗖𝗵𝗮𝗻𝗴𝗲 entry is ordered and committed by the consensus protocol of the current epoch. This entry defines the transition from epoch 𝑖 to epoch 𝑖 + 1 and introduces a new configuration 𝐸ᵢ₊₁.

Validators in the next configuration start preparing to take over at this point. They initialize the new consensus protocol, synchronize their execution state with the current system state, and ensure that they have caught up with the outer execution log. Once a validator in the new configuration has completed this preparation, it submits a 𝗥𝗲𝗮𝗱𝘆 message. Meanwhile, the old configuration continues to order entries in its inner log, and the outer log advances accordingly.

The system waits for all validators in the new configuration to signal readiness before entering the handover phase, where it switches from the old configuration to the new one. While it could proceed once a quorum of 𝗥𝗲𝗮𝗱𝘆 messages is observed, it instead requires every validator to send a 𝗥𝗲𝗮𝗱𝘆 message. This avoids a scenario where a faulty validator signals readiness but does not participate in the new epoch, leaving the system unable to reach quorum and make progress.
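The readiness rule can be sketched as a set comparison. The `(kind, sender)` tuple encoding and the validator names are assumptions made for illustration:

```python
from typing import Iterable, Set, Tuple

Entry = Tuple[str, str]  # (kind, sender); assumed encoding for this sketch


def all_ready(inner_log: Iterable[Entry], new_validators: Set[str]) -> bool:
    """True only when every validator of the new configuration has sent Ready."""
    ready = {sender for kind, sender in inner_log if kind == "Ready"}
    return new_validators <= ready


new_validators = {"V1", "V2", "V3"}
log = [("Ready", "V1"), ("Tx", "client"), ("Ready", "V2")]

# Two of three is not enough under the all-validators rule.
partial = all_ready(log, new_validators)
everyone = all_ready(log + [("Ready", "V3")], new_validators)
```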

The handover phase

Once validators in the new configuration begin submitting 𝗥𝗲𝗮𝗱𝘆 messages, the system must determine exactly where the new configuration should become active. This boundary is determined by examining the consensus log. If all 𝗥𝗲𝗮𝗱𝘆 messages appear in the consensus log before another epoch change entry, the configuration change is successful.

Validators in the old configuration construct a handover certificate during this phase. The handover certificate includes the current epoch identifier, the boundary position ℎ, and information about the new configuration. Validators sign this certificate and submit it as a 𝗗𝗼𝗻𝗲 message to the consensus instance for the new configuration.
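The article does not specify how a handover certificate is encoded or signed, so the following is only a sketch under stated assumptions. It models the three listed fields and derives a digest that validators could sign, using SHA-256 over canonical JSON as a stand-in encoding. The class name, field names, and the choice of hash are hypothetical, and the signing step itself is omitted:

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class HandoverCertificate:
    epoch: int                       # identifier of the epoch being handed over
    boundary: int                    # position h in the old inner log
    new_validators: Tuple[str, ...]  # stand-in for "information about the new configuration"

    def digest(self) -> str:
        """Hash a canonical encoding so all validators sign the same bytes."""
        encoded = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


cert_a = HandoverCertificate(7, 6, ("B", "C", "D", "E"))
cert_b = HandoverCertificate(7, 6, ("B", "C", "D", "E"))
cert_c = HandoverCertificate(7, 8, ("B", "C", "D", "E"))  # different boundary
```

Two validators that agree on the epoch, the boundary, and the new configuration produce the same digest. Any disagreement on the boundary yields a different one.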

The boundary ℎ is the position of the final 𝗥𝗲𝗮𝗱𝘆 entry in the old inner log and defines the exact state that the old configuration certifies and transfers to the new configuration. The outer log used by execution starts following the new inner log at the handover boundary, even though the transition is only certified complete later, when 𝗗𝗼𝗻𝗲 messages from the old configuration appear in the new configuration’s consensus instance.
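Locating the boundary can be sketched as a single scan over the old inner log. The tuple encoding, the zero-based positions, and the decision to return `None` when a second epoch change arrives before all readiness signals are assumptions of this sketch, not details confirmed by the paper:

```python
from typing import List, Optional, Set, Tuple

Entry = Tuple[str, str]  # (kind, sender or payload); assumed encoding


def find_boundary(inner_log: List[Entry], new_validators: Set[str]) -> Optional[int]:
    """Return the position of the final Ready entry, or None if not reached."""
    pending: Optional[Set[str]] = None
    for position, (kind, value) in enumerate(inner_log):
        if kind == "EpochChange":
            if pending is not None:
                return None  # another epoch change arrived before all Ready entries
            pending = set(new_validators)
        elif kind == "Ready" and pending is not None:
            pending.discard(value)
            if not pending:
                return position  # last outstanding Ready: this is h
    return None


log = [
    ("Tx", "T1"), ("Tx", "T2"), ("EpochChange", "E2"), ("Tx", "T3"),
    ("Ready", "V1"), ("Ready", "V2"), ("Ready", "V3"),
    ("Tx", "T4"), ("Tx", "T5"),
]
h = find_boundary(log, {"V1", "V2", "V3"})
h_incomplete = find_boundary(log[:6], {"V1", "V2", "V3"})
```

Because every validator scans the same ordered log, every validator computes the same value of `h`.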

The old configuration may continue to append entries after the handover boundary ℎ, causing the two inner logs to grow in parallel. A newly syncing validator could observe post-boundary entries from the old configuration’s inner log and include them in its outer log, producing a state that differs from one derived solely from the new inner log. Gauss prevents this by ensuring validators transition to the new configuration at the same boundary point and deterministically derive the same outer log.

To illustrate the issue, suppose the inner log for epoch 𝑖 contains the following entries in order:

T₁ → T₂ → 𝗘𝗽𝗼𝗰𝗵𝗖𝗵𝗮𝗻𝗴𝗲 → T₃ → 𝗥𝗲𝗮𝗱𝘆₁ → 𝗥𝗲𝗮𝗱𝘆₂ → 𝗥𝗲𝗮𝗱𝘆₃ → T₄ → T₅

T₄ and T₅ appear after the boundary, i.e., after the final 𝗥𝗲𝗮𝗱𝘆₃ message, in the inner log produced by the old configuration. A newly joining validator, V₁, syncing from a peer, may include T₁, T₂, T₃, T₄, and T₅ in its outer log and execute them. Another validator, V₂, which participated in the transition, executes only T₁, T₂, and T₃. V₁ and V₂ now have different states because one executed post-boundary transactions from the old configuration, while the other did not.

Gauss handles this by adjusting how the log sanitizer consumes entries based on the epoch boundary. When deriving the outer log for epoch 𝑖, the sanitizer includes application transactions up to the boundary ℎ (such as T₁, T₂, and T₃) while filtering out coordination entries. For epoch 𝑖 + 1, the sanitizer consumes only entries from the new inner log and ignores all entries from the old configuration that appear after the boundary. As a result, T₄ and T₅ are excluded from the outer log that V₁ and V₂ derive and are not executed.
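The boundary-aware sanitizer can be sketched as follows, reusing the example log. The tuple encoding, zero-based boundary, and the transaction `T6` in the new inner log are hypothetical additions for illustration:

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (kind, payload); kind is "tx" or "coord" (assumed encoding)


def derive_outer_log(old_inner: List[Entry], new_inner: List[Entry], boundary: int) -> List[str]:
    """Read the old inner log up to and including the boundary, then the new inner log."""
    visible = old_inner[: boundary + 1] + new_inner
    return [payload for kind, payload in visible if kind == "tx"]


old_inner = [
    ("tx", "T1"), ("tx", "T2"), ("coord", "EpochChange"), ("tx", "T3"),
    ("coord", "Ready1"), ("coord", "Ready2"), ("coord", "Ready3"),
    ("tx", "T4"), ("tx", "T5"),  # ordered by the old configuration after the boundary
]
new_inner = [("coord", "Done"), ("tx", "T6")]  # T6 is a made-up transaction

outer = derive_outer_log(old_inner, new_inner, boundary=6)
```

Any validator that applies this function with the same boundary obtains the same outer log, regardless of how many post-boundary entries it happened to sync from the old configuration.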

This is the implementation of the “lie” we discussed earlier: T₄ and T₅ exist in the consensus-defined ordering, but the execution layer never sees them. Those transactions must be resubmitted under the new configuration before they can be included in its inner log and surfaced in the outer log for execution. V₁ and V₂ agree on the resulting state because they transition to the new epoch at the same handover boundary.

This design enables fast handover between configurations without requiring the system to pause. The old consensus instance can continue ordering transactions during the transition, but entries after the boundary are ignored by execution in the new configuration. This reduces disruption to users and applications that depend on continued ordering guarantees during a reconfiguration.

It also enables a smooth transition from the old configuration to the new one. Once the handover boundary is agreed on, the new configuration can begin extending its own inner log while the old one continues to grow. Because validators ignore post-boundary entries from the old configuration when deriving the outer log, the presence of two concurrently growing logs does not affect safety. Validators in the new configuration stop reading the old inner log at the same point and already have an inner log they can continue to extend after the old configuration shuts down.

The shutdown phase

The shutdown phase ensures that the system can safely retire the old configuration. The handover has already occurred at the boundary ℎ, which marks the point at which the system switches from the old configuration to the new one. From that point on, the new configuration can continue ordering transactions, while the old configuration remains active long enough for its validators to confirm the handover and help lagging nodes recover and synchronize the correct pre-transition state.

Validators in the old configuration begin preparing to exit during this phase. They monitor the new configuration’s consensus instance for 𝗗𝗼𝗻𝗲 messages submitted by other members of the old configuration’s validator set. Once a validator observes a quorum of 𝗗𝗼𝗻𝗲 messages, it knows it is no longer needed for recovery and can safely stop participating in consensus.
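The shutdown condition can be sketched as counting distinct senders. The tuple encoding and the explicit `quorum` parameter are assumptions. The article does not state a general quorum formula, so the caller supplies the threshold:

```python
from typing import Iterable, Set, Tuple

Entry = Tuple[str, str]  # (kind, sender); assumed encoding for this sketch


def can_shut_down(new_inner_log: Iterable[Entry], old_validators: Set[str], quorum: int) -> bool:
    """True once a quorum of distinct old-configuration validators has sent Done."""
    signers = {
        sender
        for kind, sender in new_inner_log
        if kind == "Done" and sender in old_validators
    }
    return len(signers) >= quorum


old_validators = {"A", "B", "C", "D"}
log = [("Done", "A"), ("Tx", "client"), ("Done", "A"), ("Done", "X"), ("Done", "B")]

too_early = can_shut_down(log, old_validators, quorum=3)           # only A and B count
safe_to_exit = can_shut_down(log + [("Done", "C")], old_validators, quorum=3)
```

Duplicate messages from the same validator and messages from non-members are ignored in this sketch, so neither can push the count over the threshold.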

Here, “recovery” refers to the ability of a newly syncing node to reconstruct the correct state for the previous epoch. Validators in epoch 𝑖 + 1 derive their initial state by synchronizing the inner log exposed by a validator in epoch 𝑖 up to ℎ, applying the log sanitizer, and executing the resulting transactions. The handover certificate ensures that this state is certified by the previous configuration and can be verified by newly synced validators.
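Recovery can be sketched with a toy state machine. The key-value "balance" state, the three-field entry encoding, and the `bootstrap_state` helper are all hypothetical. Certificate verification is omitted here:

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, int]  # (kind, key, amount); assumed toy encoding


def bootstrap_state(old_inner: List[Entry], boundary: int) -> Dict[str, int]:
    """Replay sanitized transactions from the old inner log up to the boundary."""
    state: Dict[str, int] = {}
    for kind, key, amount in old_inner[: boundary + 1]:
        if kind != "tx":
            continue  # coordination entries never reach execution
        state[key] = state.get(key, 0) + amount
    return state


old_inner = [
    ("tx", "alice", 5),
    ("coord", "EpochChange", 0),
    ("tx", "bob", 3),
    ("coord", "Ready", 0),       # boundary h = 3 in this toy log
    ("tx", "alice", 100),        # post-boundary entry, must not be executed
]
initial_state = bootstrap_state(old_inner, boundary=3)
```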

By observing 𝗗𝗼𝗻𝗲 messages from a quorum of validators in the old configuration, an honest node gains assurance that a quorum of that configuration agrees on the handover boundary ℎ, can serve the inner log up to that point, and has signed the handover certificate. As long as that quorum contains honest validators, a newly joining node can verify the transition and derive the correct state for epoch 𝑖 before starting epoch 𝑖 + 1.

A quorum of 𝗗𝗼𝗻𝗲 messages also guarantees that multiple conflicting epoch configurations cannot become active at the same time. This follows from the quorum intersection property used in Byzantine fault-tolerant systems. To illustrate, consider a system with four validators, A, B, C, and D, where at most one validator can be faulty (𝑓ᵢ = 1) and a quorum requires three validators. Any three of the four validators form a quorum; consider two of them:

Quorum 1 (Q1) = {A, B, C}

Quorum 2 (Q2) = {B, C, D}

These quorums overlap in two validators: B and C. Because at most one validator can be faulty, at least one validator in the intersection must be honest.
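The overlap claim can be checked mechanically for every three-member subset of the four validators, not only the two quorums listed above. This is a small verification sketch, and the variable names are illustrative:

```python
from itertools import combinations

validators = ["A", "B", "C", "D"]
quorum_size = 3
max_faulty = 1

# Every subset of three validators is a possible quorum.
quorums = [set(q) for q in combinations(validators, quorum_size)]

# Smallest intersection over all pairs of distinct quorums.
min_overlap = min(len(q1 & q2) for q1, q2 in combinations(quorums, 2))

# If the overlap exceeds the number of faulty validators,
# every intersection contains at least one honest validator.
honest_in_every_overlap = min_overlap > max_faulty
```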

Suppose B is malicious and signs conflicting configuration transitions, attempting to drive both quorums toward different outcomes. B signs one handover certificate in Q1 and another in Q2. C, however, is honest and signs only one handover certificate, for example, the one in Q1, before submitting a 𝗗𝗼𝗻𝗲 message. A does the same for Q1, while D signs only the handover certificate it sees in Q2 before submitting its own 𝗗𝗼𝗻𝗲 message.

The resulting 𝗗𝗼𝗻𝗲 messages look like this:

𝗗𝗼𝗻𝗲 {A}: Transition #1
𝗗𝗼𝗻𝗲 {B}: Transition #1
𝗗𝗼𝗻𝗲 {C}: Transition #1
𝗗𝗼𝗻𝗲 {D}: Transition #2
𝗗𝗼𝗻𝗲 {B}: Transition #2

Only the 𝗗𝗼𝗻𝗲 messages supporting the first transition reach the quorum threshold, while the second transition is rejected because it does not have enough signatures (two of four instead of the required three). Since a quorum of three 𝗗𝗼𝗻𝗲 messages can contain at most one faulty validator, at least two validators in any successful quorum must be honest. And since any two quorums of three out of four validators overlap in at least two validators, at most one of whom is faulty, at least one honest validator will always sit in the intersection and refuse to sign conflicting handover certificates.
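The tally in this example can be reproduced with a short sketch. The message encoding as `(validator, transition)` pairs is an assumption made for illustration:

```python
from collections import defaultdict

# Done messages from the example: B is faulty and supports both transitions.
done_messages = [
    ("A", "Transition #1"),
    ("B", "Transition #1"),
    ("C", "Transition #1"),
    ("D", "Transition #2"),
    ("B", "Transition #2"),
]
quorum = 3

# Group distinct signers by the transition they support.
signers = defaultdict(set)
for validator, transition in done_messages:
    signers[transition].add(validator)

# Only transitions backed by a full quorum are finalized.
finalized = sorted(t for t, who in signers.items() if len(who) >= quorum)
```

Even with B signing twice, the second transition collects only two distinct signers, so it falls short of the threshold.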

That is why only one transition can ever be finalized. In this example, Q2’s transition cannot be finalized at the same time as Q1’s, because that would require both B and C to support conflicting handover certificates, but C is honest and will never do that. Validators in the new configuration use the handover certificate to determine which configuration to activate, so they will never observe two conflicting handover certificates finalized in the same epoch.

As such, a validator in epoch 𝑖 can shut down once it sees a quorum of 𝗗𝗼𝗻𝗲 messages in the consensus protocol of epoch 𝑖 + 1. At that point, it knows that honest validators will follow the same transition path from the old epoch into the new one, and that the old configuration no longer needs to remain active to prevent safety failures.

What does Gauss change?

Gauss changes how consensus upgrades are handled in practice. It provides the following benefits to SMR protocols:

1. Reconfigurations can be performed with minimal or no downtime. The validator set for the next configuration prepares to take over and start processing transactions, while the validator set for the current configuration continues to order and commit entries. When the transition occurs, the new configuration’s consensus protocol is already capable of making progress.

2. Execution logic is decoupled from consensus logic. By separating the consensus protocol’s inner log from the outer log consumed by the execution engine, Gauss ensures that the execution layer sees the ordered transactions it needs to execute. Coordination messages, as well as consensus-specific internal structures such as lanes, subdags, or other ordering machinery, remain hidden within the consensus layer.

3. Reconfigurations can be performed safely. The quorum intersection property inherent in the design of Gauss preserves safety during configuration transitions. Quorum intersection ensures that conflicting configurations cannot both be finalized, so the system always converges on a single configuration and a single state in the next epoch.

4. Protocol components become modular. Separating execution from consensus allows both components to evolve independently. Because execution sees only the ordered transactions exposed through the outer log, consensus upgrades need not affect execution. Protocols can tune consensus parameters such as validator membership, quorum rules, and failure thresholds, or swap in different consensus algorithms, without rewriting execution logic or interrupting execution.

These properties are especially valuable for blockchain systems, where upgrades are frequent and often risky. Gauss replaces ad hoc upgrade processes with a structured, in-protocol transition mechanism in which validators operating under the new configuration prepare in advance, validators from the old and new configurations agree on a single handover boundary, and the new configuration becomes active without interrupting execution.

Conclusion

The consensus protocols underpinning state machine replication systems are expected to provide strong safety and liveness guarantees under demanding conditions. In practice, however, engineers must operate systems that evolve over time under changing technical and organizational constraints.

This is where the limits of static protocol assumptions become most visible. Production systems rarely rely on a single consensus design indefinitely. Validator sets change, safety assumptions shift, and consensus algorithms themselves are eventually replaced. However, many existing reconfiguration approaches remain tightly coupled to specific consensus implementations, making these transitions complex, risky, and often disruptive.

Gauss addresses this by treating reconfiguration as a first-class concern and by separating it from consensus and execution. The reconfiguration mechanism remains independent of the underlying consensus implementation, while the execution layer is insulated from coordination logic by separating inner and outer logs. As long as a consensus protocol satisfies standard safety and liveness guarantees, it can be introduced as part of a new configuration without interrupting execution or requiring coordinated downtime.

Gauss is being implemented in the Rialo blockchain as part of a modular SMR architecture. For a system expected to operate for a long time, this is an important design choice. It reduces the operational cost of upgrades, lowers the risk of safety and liveness failures during configuration transitions, and enables the system to adopt improved consensus implementations and adjust protocol parameters to balance security and performance. More importantly, it ensures that a reconfiguration is a normal part of the system’s operation rather than a disruptive event.