The Problem

What is quantum error correction and why does decoding need to be fast?

Quantum Errors

Here's the thing about quantum computers - they make errors constantly. Qubits are incredibly fragile; they lose their information through interaction with the environment, a process called decoherence. If you want to build a reliable quantum computer, you need to detect and correct these errors faster than they pile up. That's the whole game.

The Surface Code

The surface code is the leading candidate for quantum error correction, and once I understood how it works, I could see why. Physical qubits are arranged on a 2D grid - some are data qubits that hold the actual quantum information, and others are measure qubits (stabilizers) that check groups of neighboring data qubits without directly reading the quantum data. The pattern of measurement results (the syndrome) tells you where errors happened.
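The "check without directly reading" part comes down to parity. Here's a minimal sketch of syndrome extraction — using a made-up two-check parity matrix `H` on a three-qubit chain, not a real surface-code layout — just to show how two fired checks bracket an error:

```python
import numpy as np

# Toy illustration of syndrome extraction (a 3-data-qubit chain with two
# parity checks; H is invented for this example, not a surface-code layout).
# Row i of H lists which data qubits stabilizer i touches; the syndrome is
# the parity of the flipped qubits each check sees.
H = np.array([[1, 1, 0],     # stabilizer 0 checks data qubits 0 and 1
              [0, 1, 1]])    # stabilizer 1 checks data qubits 1 and 2

error = np.array([0, 1, 0])      # a bit-flip on data qubit 1
syndrome = (H @ error) % 2
print(syndrome)                  # -> [1 1]: both neighbouring checks fire
```

Both stabilizers touching the flipped qubit report odd parity, which is exactly the pattern the decoder later has to interpret.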

Distance-3 Surface Code

  ○ - ■ - ○ - ■ - ○
  |   |   |   |   |
  □   ○   □   ○   □
  |   |   |   |   |
  ○ - ■ - ○ - ■ - ○
  |   |   |   |   |
  □   ○   □   ○   □
  |   |   |   |   |
  ○ - ■ - ○ - ■ - ○

○ = data qubit  ·  ■ = Z-stabilizer  ·  □ = X-stabilizer

A distance-d surface code uses roughly 2d² physical qubits to protect a single logical qubit, and it can tolerate up to ⌊(d-1)/2⌋ arbitrary errors before failing. The bigger d is, the more protection you get - but the decoder has a lot more work to do.

d = 3 (~17 qubits)

○ - ■ - ○
|   |   |
■   ○   ■
|   |   |
○ - ■ - ○

d = 5 (~49 qubits)

○ - ■ - ○ - ■ - ○
|   |   |   |   |
■   ○   ■   ○   ■
|   |   |   |   |
○ - ■ - ○ - ■ - ○
|   |   |   |   |
■   ○   ■   ○   ■
|   |   |   |   |
○ - ■ - ○ - ■ - ○

The qubit count grows quadratically (d=3 → ~17, d=5 → ~49), but the space of possible error patterns the decoder must handle grows exponentially with it.
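Those counts follow from the rotated surface code's 2d² − 1 formula (assuming the rotated layout, which is what matches the ~17 and ~49 figures above) - a quick sanity check:

```python
# Sanity check of the qubit counts quoted above, assuming the rotated
# surface code layout: d^2 data qubits + (d^2 - 1) measure qubits.
def physical_qubits(d):
    return 2 * d * d - 1

def correctable_errors(d):
    return (d - 1) // 2      # up to floor((d-1)/2) arbitrary errors

for d in (3, 5, 7):
    print(d, physical_qubits(d), correctable_errors(d))
# d=3 -> 17 qubits, 1 error; d=5 -> 49 qubits, 2 errors; d=7 -> 97, 3
```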

Decoding: The Bottleneck

So here's where I come in. The decoder takes the syndrome and figures out what corrections to apply. This has to happen in real time - before the next round of errors piles on. At d=5 with a 1 MHz measurement cycle, the decoder has roughly 1-10 microseconds to produce an answer. That's not a lot of time.

MWPM: The Gold Standard

The standard approach is Minimum Weight Perfect Matching (MWPM). It formulates decoding as a graph problem on the Detector Error Model (DEM) - a graph where nodes are detectors and edges represent possible error mechanisms with associated weights (log-likelihoods).

MWPM finds the minimum-weight set of edges that explains all fired detectors. It's highly accurate, but it's O(n³) and - this is the part that really got me - it has variable execution time depending on the specific error pattern.
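To make the matching step concrete, here's a minimal pure-Python sketch — a toy hand-made DEM with uniform weights and a brute-force pairing, not a real O(n³) blossom implementation: compute shortest-path distances between fired detectors, then find the cheapest way to pair them all up.

```python
import heapq

# Toy DEM: detectors D1..D6 on a 2x3 grid; each edge is an error
# mechanism with weight -log(p) (uniform weights here for simplicity).
DEM = {
    "D1": {"D2": 1.0, "D4": 1.0},
    "D2": {"D1": 1.0, "D3": 1.0, "D5": 1.0},
    "D3": {"D2": 1.0, "D6": 1.0},
    "D4": {"D1": 1.0, "D5": 1.0},
    "D5": {"D2": 1.0, "D4": 1.0, "D6": 1.0},
    "D6": {"D3": 1.0, "D5": 1.0},
}

def dijkstra(src):
    """Shortest-path weight from src to every detector in the DEM."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in DEM[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def min_weight_pairing(fired):
    """Brute-force minimum-weight perfect matching over fired detectors.
    Assumes an even number of fired detectors (real decoders add boundary
    nodes to handle odd syndromes) and is only viable for toy sizes --
    production decoders use the O(n^3) blossom algorithm instead."""
    dist = {f: dijkstra(f) for f in fired}
    best, best_pairs = float("inf"), None

    def recurse(rest, cost, pairs):
        nonlocal best, best_pairs
        if not rest:
            if cost < best:
                best, best_pairs = cost, pairs
            return
        a, tail = rest[0], rest[1:]
        for i, b in enumerate(tail):
            recurse(tail[:i] + tail[i + 1:],
                    cost + dist[a][b], pairs + [(a, b)])

    recurse(list(fired), 0.0, [])
    return best_pairs, best

pairs, weight = min_weight_pairing(["D1", "D5", "D3", "D6"])
print(pairs, weight)   # total matching weight is 3.0 on this toy graph
```

The corrections are then the physical errors along the shortest paths between each matched pair.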

From Lattice to Decoder Input

The decoder never sees the lattice directly. It operates on the Detector Error Model - a weighted graph where nodes are detectors and edges are error mechanisms.

Syndrome on lattice

○ --- ○ --- ○ --- ○
|  ●  |  ■  |  ■  |
○ --- ○ --- ○ --- ○
|  ■  |  ●  |  ■  |
○ --- ○ --- ○ --- ○

DEM graph (decoder input)

●1 ---- D2 ---- D3
|       |       |
|       |       |
D4 ---- ●5 ---- D6

● = fired detector  ·  edges = error mechanisms with log-likelihood weights  ·  MWPM finds a minimum-weight matching between fired nodes

Why time-determinism matters

In practice, MWPM might take 1 µs for one syndrome pattern but 100 µs for another. If your measurement cycle is 1 µs, that variable latency is a showstopper - you can't just hope for easy error patterns. The decoder needs to finish in bounded time every single cycle, no matter what the syndrome looks like. That's the constraint that makes this problem interesting (and hard).

The Engineering Constraint

So what I was actually looking for was a decoder that satisfies three things simultaneously:

  • Bounded time - worst case must fit within the measurement cycle
  • Deterministic - same number of clock cycles every time, no data-dependent branching
  • Accurate - close to MWPM quality or better

That's the question this whole project explores: can we meet all three requirements at once?
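To pin down what "deterministic" means in code, here's a hypothetical sketch - not this project's actual decoder - of an O(E) pass whose loop shape never depends on the syndrome:

```python
# Hypothetical sketch of a fixed-iteration O(E) decoder pass: every call
# sweeps the full edge list exactly once, so the iteration count never
# depends on the syndrome (in hardware the branch would be predicated,
# giving a constant cycle count).
def decode_fixed_pass(syndrome, edges):
    """syndrome: dict detector -> 0/1; edges: fixed list of (u, v) pairs.
    Greedily pairs adjacent fired detectors in one bounded sweep."""
    s = dict(syndrome)
    correction = []
    for u, v in edges:                 # loop length fixed at build time
        if s[u] and s[v]:              # both endpoints fired: pair them
            correction.append((u, v))
            s[u] = s[v] = 0
    return correction, s               # leftover 1s need a fallback rule

corr, leftover = decode_fixed_pass(
    {"D1": 1, "D2": 1, "D3": 0}, [("D1", "D2"), ("D2", "D3")])
print(corr)   # [('D1', 'D2')]
```

A single bounded sweep like this buys determinism, but it gives up accuracy on anything that isn't a local pair - which is exactly the tradeoff the next section quantifies.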

The Fundamental Tradeoff

After ~50 approaches, a clear tradeoff emerged:

Speed class               Quality d=3   Quality d=5   FPGA time
O(E) fixed-graph          1.13-1.33x    1.96-2.43x    0.5-1.2 µs
O(k²) global matching     1.03-1.14x    1.12-1.31x    10-50 µs
MWPM O(k³)                1.00x         1.00x         variable

The open frontier: achieving Sinkhorn+3opt quality (1.12x at d=5) within the FPGA time budget (10 µs). Explore the full journey to see how I got here.