The Problem
What is quantum error correction and why does decoding need to be fast?
Quantum Errors
Here's the thing about quantum computers - they make errors constantly. Qubits are incredibly fragile; they lose their information through interaction with the environment, a process called decoherence. If you want to build a reliable quantum computer, you need to detect and correct these errors faster than they pile up. That's the whole game.
The Surface Code
The surface code is the leading candidate for quantum error correction, and once I understood how it works, I could see why. Physical qubits are arranged on a 2D grid - some are data qubits that hold the actual quantum information, and others are measure qubits (stabilizers) that check groups of neighboring data qubits without directly reading the quantum data. The pattern of measurement results (the syndrome) tells you where errors happened.
Distance-3 Surface Code

○ - ■ - ○
| | |
■ ○ ■
| | |
○ - ■ - ○

○ = data qubit · ■ = measure qubit (Z- or X-stabilizer)
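The stabilizer idea boils down to parity checks. Here's a toy sketch of syndrome extraction - a 1D chain rather than the real 2D lattice, with made-up indices - where each stabilizer reports the parity of the data qubits it touches:

```python
# Toy syndrome extraction (1D sketch, not the real 2D surface-code layout):
# each stabilizer reports the parity of the data qubits it touches.
stabilizers = [(0, 1), (1, 2), (2, 3), (3, 4)]  # data-qubit indices per check

def syndrome(errors, stabilizers):
    """errors: set of data-qubit indices that suffered a bit flip."""
    return [len(errors & set(group)) % 2 for group in stabilizers]

# A single flip on qubit 2 fires exactly the two checks that touch it,
# without ever reading qubit 2's quantum state directly.
print(syndrome({2}, stabilizers))     # -> [0, 1, 1, 0]

# Two adjacent flips partially cancel: the check between them sees even parity.
print(syndrome({0, 1}, stabilizers))  # -> [0, 1, 0, 0]
```

The second call hints at why decoding is nontrivial: the syndrome only constrains where errors could be, and different error sets can produce the same measurement pattern.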
A distance-d surface code uses roughly 2d² physical qubits to protect a single logical qubit, and it can tolerate up to (d-1)/2 errors before failing. The bigger d is, the more protection you get - but the decoder has a lot more work to do.
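The qubit arithmetic is easy to check. Assuming the rotated-surface-code counts (d² data qubits plus d² − 1 measure qubits, which is where the "~17" and "~49" figures come from):

```python
def surface_code_qubits(d):
    """Rotated surface code: d*d data qubits + (d*d - 1) measure qubits."""
    return d * d + (d * d - 1)   # ≈ 2d² total

for d in (3, 5, 7):
    # distance, total physical qubits, correctable errors
    print(d, surface_code_qubits(d), (d - 1) // 2)
# -> 3 17 1
# -> 5 49 2
# -> 7 97 3
```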
d = 3 (~17 qubits)
○ - ■ - ○
| | |
■ ○ ■
| | |
○ - ■ - ○

d = 5 (~49 qubits)
○ - ■ - ○ - ■ - ○
| | | | |
■ ○ ■ ○ ■
| | | | |
○ - ■ - ○ - ■ - ○
| | | | |
■ ○ ■ ○ ■
| | | | |
○ - ■ - ○ - ■ - ○

Decoding: The Bottleneck
So here's where I come in. The decoder takes the syndrome and figures out what corrections to apply. This has to happen in real time - before the next round of errors piles on. At d=5 with a 1 MHz measurement cycle (one syndrome round per microsecond), the decoder has roughly 1-10 microseconds to produce an answer. That's not a lot of time.
MWPM: The Gold Standard
The standard approach is Minimum Weight Perfect Matching (MWPM). It formulates decoding as a graph problem on the Detector Error Model (DEM) - a graph where nodes are detectors and edges represent possible error mechanisms with associated weights (log-likelihoods).
MWPM finds the minimum-weight set of edges that explains all fired detectors. It's highly accurate, but it's O(k³) in the number k of fired detectors, and - this is the part that really got me - its execution time varies with the specific error pattern.
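To make the matching objective concrete, here's a toy exact matcher that simply enumerates every pairing of fired detectors - the detector names and weights are invented for illustration. Real decoders use the blossom algorithm rather than enumeration (there are (k−1)!! pairings), but the objective is the same: pick the pairing whose total weight is smallest.

```python
def min_weight_matching(fired, w):
    """Exact minimum-weight perfect matching by brute-force enumeration.
    fired: list of fired-detector names (even count); w: pairwise weights."""
    if not fired:
        return 0.0, []
    a, rest = fired[0], fired[1:]
    best_cost, best_pairs = float("inf"), []
    for b in rest:                       # try pairing `a` with every partner
        cost, pairs = min_weight_matching([n for n in rest if n != b], w)
        cost += w[frozenset((a, b))]
        if cost < best_cost:
            best_cost, best_pairs = cost, [(a, b)] + pairs
    return best_cost, best_pairs

# Invented weights (lower = more likely error chain joining two detectors)
w = {frozenset(p): c for p, c in [
    (("D1", "D2"), 2.0), (("D1", "D5"), 1.0), (("D1", "D6"), 2.5),
    (("D2", "D5"), 2.2), (("D2", "D6"), 1.2), (("D5", "D6"), 1.5),
]}
cost, pairs = min_weight_matching(["D1", "D2", "D5", "D6"], w)
print(pairs)  # -> [('D1', 'D5'), ('D2', 'D6')]
```

Note that the recursion depth (and hence runtime) depends entirely on how many detectors fired - a small preview of the time-determinism problem discussed next.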
From Lattice to Decoder Input
The decoder never sees the lattice directly. It operates on the Detector Error Model - a weighted graph where nodes are detectors and edges are error mechanisms.
Syndrome on lattice
○ - ■ - ○ - ■ - ○
| | | | |
★ ○ ■ ○ ■
| | | | |
○ - ■ - ○ - ■ - ○
| | | | |
■ ○ ★ ○ ■
| | | | |
○ - ■ - ○ - ■ - ○

DEM graph (decoder input)
●1 ---- D2 ---- D3
| | |
| | |
D4 ---- ●5 ---- D6

★ / ● = fired detector · Edges = error mechanisms with log-likelihood weights · MWPM finds minimum-weight matching between fired nodes
Why time-determinism matters
In practice, MWPM might take 1 µs for one syndrome pattern but 100 µs for another. If your measurement cycle is 1 µs, that variable latency is a showstopper - you can't just hope for easy error patterns. The decoder needs to finish in bounded time every single cycle, no matter what the syndrome looks like. That's the constraint that makes this problem interesting (and hard).
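For contrast, here's a sketch of what data-independent decoding looks like: a single sweep over a fixed edge list, greedily pairing fired detectors. The graph and the greedy rule are invented for illustration (and far less accurate than MWPM) - the point is that every call touches every edge exactly once, so the operation count never depends on the syndrome.

```python
def fixed_sweep_decode(fired, edges):
    """One pass over a fixed edge list: if both endpoints of an edge are
    currently fired, pair them and clear both. The loop always executes
    len(edges) iterations - latency is independent of the syndrome."""
    fired = set(fired)
    pairs = []
    for a, b in edges:          # fixed order, fixed length, no early exit
        if a in fired and b in fired:
            pairs.append((a, b))
            fired -= {a, b}
    return pairs, fired         # leftover detectors would need a fallback

edges = [("D1", "D2"), ("D2", "D3"), ("D4", "D5"), ("D5", "D6")]
print(fixed_sweep_decode(["D2", "D3", "D5", "D6"], edges))
# -> ([('D2', 'D3'), ('D5', 'D6')], set())
```

A fixed-length loop with no data-dependent control flow is exactly the shape that maps onto a hardware pipeline with a constant clock-cycle count - the property MWPM lacks.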
The Engineering Constraint
So what I was actually looking for is a decoder that satisfies three things simultaneously:
- Bounded time - worst case must fit within the measurement cycle
- Deterministic - same number of clock cycles every time, no data-dependent branching
- Accurate - close to MWPM quality or better
That's the question this whole project explores: can we meet all three requirements at once?
The Fundamental Tradeoff
After ~50 approaches, a clear tradeoff emerged:
| Speed Class | Quality d=3 (1.00x = MWPM) | Quality d=5 (1.00x = MWPM) | FPGA Time |
|---|---|---|---|
| O(E) fixed-graph | 1.13-1.33x | 1.96-2.43x | 0.5-1.2 µs |
| O(k²) global matching | 1.03-1.14x | 1.12-1.31x | 10-50 µs |
| MWPM O(k³) | 1.00x | 1.00x | variable |
The open frontier: achieving Sinkhorn+3opt quality (1.12x at d=5) within the FPGA time budget (10 µs). Explore the full journey to see how I got here.