Glossary

Key terms and notation used throughout this research.

Core Concepts

Surface Code
A topological quantum error-correcting code where physical qubits are arranged on a 2D grid. Data qubits sit on edges, and syndrome measurements (stabilizers) check groups of neighboring qubits for errors. Mathematically described by H_det, the binary incidence matrix mapping error mechanisms to detectors.
Code Distance (d)
The minimum number of physical errors needed to cause a logical error. Higher d means more protection: a distance-d code can correct ⌊(d-1)/2⌋ errors. Example: d=3 tolerates 1 error (17 qubits), d=5 tolerates 2 errors (49 qubits).
Syndrome
The pattern of stabilizer measurement outcomes, represented as the binary vector S. Each entry S[i] is 0 (no error detected) or 1 (error detected). The decoder's input.
Detector
A comparison between two stabilizer measurements (same stabilizer, consecutive rounds). Fires (=1) when the measurement changes, indicating a possible error event. Each row of H_det corresponds to one detector; each column to one error mechanism.
Detector Error Model (DEM)
A graph where nodes are detectors and edges represent physical error mechanisms that could cause those detectors to fire. Each edge has weight w[j] = -log(p_j), the negative log-likelihood of that error. Encoded as the incidence matrix H_det with weight vector w.
MWPM (Minimum Weight Perfect Matching)
The gold-standard decoding algorithm. Finds the minimum-weight set of edges in the DEM graph that explains all fired detectors. O(n^3) and variable-time, but highly accurate. All decoder quality is measured relative to MWPM via the vs_mwpm ratio.
Logical Error Rate
The probability that the decoder makes a wrong correction, causing a logical (undetectable) error. Lower is better. Measured as vs_mwpm: the ratio to MWPM error rate. 1.0x = identical to MWPM, 1.18x = 18% worse.
Physical Error Rate (p)
The probability of a physical error on each qubit per cycle. Typical experimental values: p = 0.001 to p = 0.01. Determines the DEM edge weights via w = -log(p).
Measurement Rounds (r)
The number of times syndrome measurements are repeated. More rounds give more temporal data, making the decoding problem 3D (space x space x time). H_det grows with r: each round adds detectors and cross-round mechanisms.
H_det
The detector-to-mechanism incidence matrix (binary). H_det[i,j] = 1 if error mechanism j causes detector i to fire. Shape: [n_dets, n_mechs]. The keystone matrix for neural decoders - every architecture in this research operates on H_det or its products.
Example (3 detectors, 4 mechanisms):

         m0  m1  m2  m3
  d0   [  1   1   0   0 ]   ← d0 fires if m0 or m1 errs
  d1   [  0   1   1   0 ]   ← d1 fires if m1 or m2 errs
  d2   [  0   0   1   1 ]   ← d2 fires if m2 or m3 errs
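The example above can be checked directly in numpy: multiplying H_det by a binary error vector (mod 2) yields the syndrome. A minimal sketch, with a made-up error on mechanism m1:

```python
import numpy as np

# The 3-detector, 4-mechanism incidence matrix from the example above.
H_det = np.array([
    [1, 1, 0, 0],  # d0 fires if m0 or m1 errs
    [0, 1, 1, 0],  # d1 fires if m1 or m2 errs
    [0, 0, 1, 1],  # d2 fires if m2 or m3 errs
])

# Suppose mechanism m1 errs: both of its detectors (d0, d1) fire.
e = np.array([0, 1, 0, 0])
S = (H_det @ e) % 2  # mod-2: overlapping errors cancel in pairs
print(S)  # [1 1 0]
```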
Sinkhorn Algorithm
An iterative algorithm for entropic regularization of optimal transport. Computes a soft assignment from the cost matrix via exp(-C/τ), then alternates row and column normalization for n_iter iterations. The τ parameter controls sharpness (lower = more discrete). Fixed iterations = deterministic time.
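The fixed-iteration loop described above can be sketched in a few lines of numpy. This is illustrative, not the research implementation; the toy cost matrix is made up:

```python
import numpy as np

def sinkhorn(C, tau=1.0, n_iter=20):
    """Soft assignment from cost matrix C via entropic regularization.

    Fixed n_iter gives deterministic runtime (FPGA-friendly).
    """
    P = np.exp(-C / tau)  # Gibbs kernel: low cost -> high score
    for _ in range(n_iter):
        P = P / P.sum(axis=1, keepdims=True)  # row normalization
        P = P / P.sum(axis=0, keepdims=True)  # column normalization
    return P

# Toy 3x3 cost matrix: the cheapest pairing is on the diagonal.
C = np.array([[0.1, 2.0, 3.0],
              [2.0, 0.1, 3.0],
              [3.0, 2.0, 0.1]])
P = sinkhorn(C, tau=0.5, n_iter=20)
print(P.round(2))  # mass concentrates on the diagonal
```

Lower `tau` sharpens the kernel, so the soft assignment approaches a hard (discrete) matching.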
k-opt
Local search refinement that considers swapping k matched pairs simultaneously. 2-opt swaps pairs; 3-opt considers triples. Higher k catches more complex re-pairings but costs O(k^k) per move.
Deterministic Time
Execution time that is fixed and predictable regardless of input data. Critical for FPGA implementation where the decoder must complete before physical qubits decohere. Achieved via fixed-iteration algorithms (Sinkhorn with n_iter iterations) instead of variable-convergence methods like MWPM.
FPGA
Field-Programmable Gate Array. Reconfigurable hardware that executes parallel operations in fixed clock cycles. Target platform for real-time QEC decoders because matrix operations on HdetH_{\text{det}} can be fully parallelized.
Stim
A fast stabilizer circuit simulator by Google. Used to generate syndrome samples and DEM graphs for testing decoders. Outputs HdetH_{\text{det}}, the weight vector w, and the flip vector f.
PyMatching
Python implementation of MWPM decoding for quantum codes. Uses the Blossom algorithm with O(n3)O(n^3) complexity. The reference decoder - vs_mwpm = 1.0x by definition.
Belief Propagation (BP)
Message-passing algorithm on factor graphs. Works well for LDPC codes but fails on quantum DEM graphs due to short 4-cycles in H_det that trap messages.
Union-Find
An O(n α(n)) decoder that grows clusters from fired detectors and merges them. Nearly-linear time but loses accuracy at d ≥ 5 due to greedy matching within clusters.
PingPong
A neural architecture that iterates forward (H_det^T · det → mechanism demand) and backward (H_det · mech → detector isolation) passes with learned MLPs. 2 rounds optimal.
MWPM Distillation
Training technique where an auxiliary side head predicts MWPM output, forcing the backbone to learn matching-like representations. Side head removed at inference (zero overhead).
Observable
A logical operator that determines the logical qubit state. Represented by the flip vector f where f[j]=1 if mechanism j flips the observable. The decoder predicts the parity (mod 2) of f values for all selected mechanisms.
ANF (Algebraic Normal Form)
The unique polynomial representation of a Boolean function over GF(2). Every Boolean function has an exact ANF. For decoding at d=3 it's degree 8 with 84 terms - but O(N · 2^N) to compute, making it impractical.

Mathematical Notation

S (Syndrome vector)
Binary vector where S[i]=1 if detector i fired. The raw input to any decoder.
Example: S = [0, 1, 0, 1, 0, 0] - detectors 1 and 3 fired, all others quiet.
d (Code distance)
Minimum number of physical errors needed to cause a logical error. Corrects up to ⌊(d-1)/2⌋ errors.
d=3: 17 qubits, 8 detectors per round, tolerates 1 error.
d=5: 49 qubits, 24 detectors per round, tolerates 2 errors.
p (Physical error rate)
Error probability per qubit per cycle. Typical values: p = 0.001 to p = 0.01. Determines DEM edge weights via w = -log(p).
p = 0.01 → w = 4.6 (common error).
p = 0.001 → w = 6.9 (rare error, higher weight).
r (Measurement rounds)
Number of syndrome measurement rounds. r=1 gives 2D decoding; r>1 adds a time axis making it 3D.
d=3, r=3: 8 spatial detectors x 3 rounds = 24 total detectors.
K (Demand vector)
K[j] = number of fired detectors adjacent to mechanism j in the DEM. Measures how "demanded" each edge is by the current syndrome.
If S = [0, 1, 1, 0] and m1 neighbors d0,d1 while m2 neighbors d1,d2:
K[m1] = 1 (one fired neighbor), K[m2] = 2 (both neighbors fired).
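In matrix form, K is just H_det^T applied to the syndrome. A sketch reproducing the example above (the m0/m3 columns are made-up filler so the matrix is complete):

```python
import numpy as np

# Rows are detectors d0..d3, columns are mechanisms m0..m3.
# m1 neighbors d0,d1; m2 neighbors d1,d2 (as in the example above).
H_det = np.array([
    [0, 1, 0, 0],  # d0
    [0, 1, 1, 0],  # d1
    [0, 0, 1, 0],  # d2
    [0, 0, 0, 1],  # d3
])
S = np.array([0, 1, 1, 0])  # d1 and d2 fired

K = H_det.T @ S  # K[j] = number of fired detectors touching mechanism j
print(K)  # [0 1 2 0] -> K[m1] = 1, K[m2] = 2
```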
K_t (Same-round demand)
Demand K restricted to edges within time-slice t. Captures spatial error patterns within a single measurement round.
With r=3 rounds, K_t for t=2 counts only fired detectors in round 2 against spatial mechanisms in that round.
K' (Cross-round demand)
Demand for edges spanning adjacent time-slices (t to t+1). Captures temporal error correlations - when the same stabilizer gives different results in consecutive rounds.
If detector d fires in round 2 and round 3, the temporal mechanism connecting them has high K' value.
H_det (Incidence matrix)
Detector-mechanism binary incidence matrix. H_det[i,j]=1 if mechanism j triggers detector i. Each column has 1 entry (boundary mechanism) or 2 entries (bulk mechanism connecting two detectors).
d=3, r=1: shape [8, 12] (8 detectors, 12 mechanisms).
d=5, r=10: shape [240, ~700].
H_t (Per-round submatrix)
Columns of H_det restricted to mechanisms in time-slice t. Enables temporal decomposition: decode each round's spatial errors separately, then stitch across rounds.
700 total mechanisms across 10 rounds → each H_t has ~70 columns.
M_adj (Adjacency matrix)
Mechanism adjacency matrix: M_adj = H_det^T @ H_det. Entry (i,j) > 0 if mechanisms i and j share at least one detector. Used to find neighboring mechanisms for local search and k-opt refinement.
If m1 and m2 both connect to detector d1, then M_adj[m1, m2] > 0.
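Computing M_adj for the toy 3x4 H_det used earlier in this glossary:

```python
import numpy as np

H_det = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1]])

M_adj = H_det.T @ H_det  # entry (i, j) counts detectors shared by i and j
# m1 and m2 both touch d1, so they are adjacent:
print(M_adj[1, 2] > 0)  # True
# m0 and m3 share no detector:
print(M_adj[0, 3] > 0)  # False
```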
f (Flip vector)
Observable flip vector: f[j]=1 if mechanism j flips the logical observable. The decoder's final answer is the parity (mod 2) of f values for all selected mechanisms.
f = [0, 1, 0, 1, 0, 0]. If decoder selects mechanisms 1 and 3:
flip = f[1] XOR f[3] = 1 XOR 1 = 0 (no logical flip).
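The parity computation above is one line of numpy (same example values):

```python
import numpy as np

f = np.array([0, 1, 0, 1, 0, 0])  # flip vector from the DEM
selected = np.array([1, 3])       # mechanisms chosen by the decoder

flip = f[selected].sum() % 2      # XOR chain = mod-2 sum
print(flip)  # 0 -> no logical flip
```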
w (Weight vector)
DEM weight vector: w[j] = -log(p_j) for mechanism j. Higher weight = less likely error. MWPM minimizes total weight of selected edges.
p = 0.01 → w = 4.6 (common error, low weight).
p = 0.001 → w = 6.9 (rare error, high weight).
tau (Temperature)
Sinkhorn temperature parameter controlling sharpness of the soft assignment via exp(-C/τ). Lower τ = sharper (more discrete) assignments. Optimal values differ by code distance.
d=3: τ = 2.0 (softer works better, fewer detectors).
d=5: τ = 0.5 (sharper needed for larger graphs).
n_iter (Sinkhorn iterations)
Number of Sinkhorn row/column normalization iterations. More iterations = better convergence to optimal transport solution. Diminishing returns beyond 20 iterations.
n_iter = 5: fast but rough. n_iter = 20: sweet spot. n_iter = 100: marginal gain.
AUC (Area Under ROC Curve)
Edge ranking quality metric. 1.0 = perfect ranking of true error edges above non-errors. Used to evaluate pruner models that reduce the candidate edge set before matching.
recall@K
Fraction of true error edges retained when keeping top-K candidates per detector. Higher is better.
recall@3 = 0.95 means 95% of true errors appear in the top-3 candidates.
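One way recall@K can be computed for a single shot is sketched below. This simplified version ranks all edges globally rather than per detector; `scores` and `true_edges` are made-up illustrative values:

```python
import numpy as np

def recall_at_k(scores, true_edges, k):
    """Fraction of true error edges among the top-k scored candidates."""
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    hits = len(set(true_edges) & set(top_k.tolist()))
    return hits / len(true_edges)

scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05])  # pruner score per edge
true_edges = [1, 3]                            # ground-truth error edges
print(recall_at_k(scores, true_edges, 3))  # 1.0 -> both true edges in top-3
```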
vs_mwpm
Error rate ratio vs MWPM. 1.0x = same quality as MWPM. The primary quality metric for all decoders in this research.
vs_mwpm = 1.04x (d=3): 4% worse than MWPM.
vs_mwpm = 1.18x (d=5): 18% worse than MWPM.