Results

All results are reported as a ratio to the MWPM logical error rate. Lower is better; 1.0x = identical to MWPM.

Conditions: p=0.01, d=3 (80 detectors, 222 edges) and d=5 (240 detectors, 930 edges), r=10 unless noted. MWPM baseline: 11.8% (d=3), 10.9% (d=5).

Here's the bottom line -- every decoder I built over ~50 experiments, ranked against MWPM. The short version: Sinkhorn + local search gets remarkably close.

Table 1: Full Decoders Overview

| Approach | d=3 vs MWPM | d=5 vs MWPM | Time | Deterministic? |
|---|---|---|---|---|
| MWPM (PyMatching) | 1.000x | 1.000x | | No |
| Sinkhorn + 2-opt | 1.030x | 1.256x | | Yes |
| Sinkhorn + 3-opt | 1.044x | 1.183x | | Yes |
| Sinkhorn (best tau/iter) | 1.135x | 1.306x | | Yes |
| Dijkstra-greedy | 1.128x | 1.990x | | No |
| Sparse Sinkhorn (3 iter) | 1.254x | 1.993x | | Yes |
| Union-Find (odd-only) | 1.218x | 2.433x | | No |
| Chain phase1 only | 2.93x | - | | Yes |

Sinkhorn best: tau=2.0 iter=20 (d=3), tau=0.5 iter=20 (d=5).

Before tackling the full 3D problem, here's what happened in 2D (single measurement round). At d=3, learned decoders actually beat MWPM -- by learning where it makes mistakes.

Table 2: 2D (Single-Round) Results

| Approach | d=3 vs MWPM | d=5 vs MWPM | Time | Deterministic? |
|---|---|---|---|---|
| MWPM (PyMatching) | 1.000x | 1.000x | | No |
| Shift-OR 2D | 1.000x | 1.000x | | Yes |
| NN + AND/XOR features | ~0.91x | - | | Yes |
| PP + MWPM Distillation | 0.90x | 1.108x | | Yes |

All r=1 (single round), p=0.01. PP+Distillation also beats MWPM at d=3 r=3: 0.948x. All approaches degrade at d=5+ and with more rounds.

Why explore Sinkhorn relaxations?

MWPM provides excellent decoding accuracy but involves irregular, branch-heavy computation that is difficult to parallelize or implement in hardware with strict latency guarantees. Sinkhorn relaxations provide a different structure: deterministic runtime, highly parallel matrix operations, and compatibility with GPUs or specialized accelerators. This makes them interesting to explore as bounded-time decoding approximations.
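The core iteration is tiny, which is the appeal. Below is a minimal sketch, not the exact decoder from these experiments: build a Gibbs kernel from a pairwise defect-matching cost matrix, alternate row/column normalization for a fixed number of steps, then round the near-doubly-stochastic plan to a hard matching greedily. The rounding scheme here is an assumption for illustration.

```python
import numpy as np

def sinkhorn_match(cost, tau=2.0, n_iter=20):
    """Relaxed min-cost matching: exp(-cost/tau), then alternating
    row/column normalization toward a doubly-stochastic plan.
    Runtime is fixed by n_iter -- no data-dependent branching."""
    P = np.exp(-np.asarray(cost, dtype=float) / tau)
    for _ in range(n_iter):
        P = P / P.sum(axis=1, keepdims=True)  # rows sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # columns sum to 1
    # Greedy rounding: repeatedly take the largest remaining entry.
    pairs = []
    for _ in range(P.shape[0]):
        i, j = np.unravel_index(np.argmax(P), P.shape)
        pairs.append((int(i), int(j)))
        P[i, :] = -1.0  # remove matched row and column
        P[:, j] = -1.0
    return sorted(pairs)
```

Every step is a dense matrix operation, so the whole decoder maps onto a GPU kernel with a latency known at compile time. The temperature tau trades sharpness against conditioning, which is one plausible reason the parameter sweep is non-monotonic: too cold and the kernel underflows toward a brittle argmax, too hot and the plan blurs.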

Once I committed to Sinkhorn, I needed to find the right temperature and iteration count. This was a systematic grid search -- and the landscape turned out to be surprisingly non-monotonic.

Table 3: Sinkhorn Parameter Sweep

Ratio vs MWPM across tau (temperature) and n_iter (Sinkhorn iterations).

d=3, r=10, p=0.01 (MWPM err: 11.8%)

| tau \ iter | 5 | 10 | 20 | 50 | 100 |
|---|---|---|---|---|---|
| 0.05 | 2.215x | 2.290x | 2.335x | 2.356x | 2.369x |
| 0.1 | 1.326x | 1.324x | 1.340x | 1.330x | 1.343x |
| 0.2 | 1.154x | 1.165x | 1.148x | 1.138x | 1.145x |
| 0.5 | 1.162x | 1.164x | 1.161x | 1.149x | 1.145x |
| 1 | 1.146x | 1.151x | 1.146x | 1.150x | 1.151x |
| 2 | 1.139x | 1.137x | 1.135x | 1.137x | 1.137x |
| 5 | 1.148x | 1.144x | 1.147x | 1.149x | 1.147x |

Best: 1.135x (tau=2, iter=20).

d=5, r=10, p=0.01 (MWPM err: 10.9%)

| tau \ iter | 5 | 10 | 20 | 50 |
|---|---|---|---|---|
| 0.2 | 1.531x | 1.457x | 1.373x | 1.361x |
| 0.5 | 1.406x | 1.330x | 1.306x | 1.306x |
| 1 | 1.440x | 1.385x | 1.351x | 1.342x |
| 2 | 1.773x | 1.725x | 1.700x | 1.648x |
| 5 | 1.958x | 1.911x | 1.879x | 1.868x |

Best: 1.306x (tau=0.5, iter=20).

Raw Sinkhorn gets you to 1.135x. The question was: can local search close the remaining gap? Turns out 2-opt is the sweet spot -- 3-opt helps at d=5 but the compute cost is brutal.

Table 4: Post-Processing Improvement Chain

| Stage | d=3 vs MWPM | d=5 vs MWPM | Improvement |
|---|---|---|---|
| Sinkhorn only | 1.135x | 1.306x | - |
| + 2-opt local search | 1.030x | 1.256x | -9.3% (d=3), -3.8% (d=5) |
| + 3-opt local search | 1.044x | 1.183x | +1.4% (d=3), -5.8% (d=5) |

2-opt is the big win at d=3. 3-opt helps at d=5 but costs 50.8 s (impractical). Note: 3-opt at d=3 is slightly worse than 2-opt alone (1.044x vs 1.030x) -- its larger move set settles into different, slightly worse local optima.
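For concreteness, the 2-opt pass is just pair rewiring. A sketch under the assumption that `dist[i][j]` holds precomputed shortest-path weights between defects (the experiment's actual implementation may differ in details):

```python
from itertools import combinations

def two_opt(pairs, dist):
    """Improve a defect pairing by 2-opt swaps: for matched pairs
    (a,b),(c,d), try the rewirings (a,c),(b,d) and (a,d),(b,c);
    keep the cheapest. Repeat until no swap helps."""
    pairs = [tuple(p) for p in pairs]
    improved = True
    while improved:
        improved = False
        for x, y in combinations(range(len(pairs)), 2):
            a, b = pairs[x]
            c, d = pairs[y]
            cur = dist[a][b] + dist[c][d]
            alt1 = dist[a][c] + dist[b][d]
            alt2 = dist[a][d] + dist[b][c]
            if min(alt1, alt2) < cur - 1e-12:
                # Commit to the cheaper of the two rewirings.
                if alt1 <= alt2:
                    pairs[x], pairs[y] = (a, c), (b, d)
                else:
                    pairs[x], pairs[y] = (a, d), (b, c)
                improved = True
    return pairs
```

Each sweep is O(m^2) over m pairs with constant work per candidate swap, which is why 2-opt stays cheap while 3-opt's three-pair rewirings blow up the move set.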

This was the biggest chapter of the exploration -- six different NN architectures, from simple PingPong to hybrid Sinkhorn+NN. I tried everything I could think of. The punchline: NNs learn features beautifully at d=3, but none of them crack the matching problem at d=5.

Table 5: NN Decoder Architectures

| Model | d | r | Params | Test Acc | MWPM Err |
|---|---|---|---|---|---|
| **PingPong (d=3, r=3)** | | | | | |
| PP 2R baseline | 3 | 3 | 49.9K | 88.0% | 4.5% |
| Scale A (8ch 128x64) | 3 | 3 | 50.2K | 86.8% | 4.5% |
| Scale B (16ch 256x128) | 3 | 3 | 142.9K | 90.1% | 4.5% |
| Scale D (32ch 256x128) | 3 | 3 | 202.0K | 90.3% | 4.5% |
| Scale C (16ch 512x256) | 3 | 3 | 347.6K | 91.1% | 4.5% |
| **Temporal Mixing (d=3, r=10)** | | | | | |
| tconv PP_2R | 3 | 10 | 416.2K | 74.0% | 11.6% |
| tconv PP_4R | 3 | 10 | 417.3K | 76.4% | 11.6% |
| tconv Inter_k3 | 3 | 10 | 417.8K | 78.3% | 11.6% |
| tconv Inter_k5 | 3 | 10 | 418.8K | 70.0% | 11.6% |
| tconv Post_k3 | 3 | 10 | 417.0K | 64.2% | 11.6% |
| **DEM-Attention** | | | | | |
| demattn PP_2R | 3 | 10 | 416.2K | 65.1% | 11.4% |
| demattn Inter_k3 | 3 | 10 | 417.8K | 67.9% | 11.4% |
| demattn Attn_2L | 3 | 10 | 417.3K | 86.4% | 11.6% |
| demattn Attn_4L | 3 | 10 | 419.5K | 85.4% | 11.6% |
| demattn Attn_2L | 5 | 5 | 1.33M | 81.4% | 5.5% |
| demattn Attn_4L | 5 | 5 | 1.33M | 83.1% | 5.5% |
| **Hybrid (Sinkhorn + NN)** | | | | | |
| hybrid sinkhorn P1_Med | 3 | 10 | 307.8K | 87.2% | 11.6% |
| hybrid binary1hop P1_Med | 3 | 10 | 307.8K | 86.7% | 11.6% |
| hybrid b1h+txorpadcross P1_Med | 3 | 10 | 594.5K | 87.8% | 11.6% |
| hybrid binary2hop P1_Med | 3 | 10 | 717.4K | 86.0% | 11.6% |
| hybrid cost2hop P1_Med | 3 | 10 | 717.4K | 83.4% | 11.6% |
| hybrid sinkhorn P1_Med | 5 | 10 | 901.7K | 71.2% | 10.8% |
| hybrid b1h+txorpadcross P1_Med | 5 | 10 | 1.82M | 67.9% | 10.8% |
| hybrid binary1hop P1_Med | 5 | 10 | 901.7K | 61.8% | 10.8% |

Best d=3 r=10 NN: hybrid b1h+txorpadcross (87.8%). All NNs degrade sharply at d=5.

Test Accuracy vs Model Size (PingPong scaling)

Diminishing returns: 7x more params (A→C) gives only +4% accuracy at d=3, +4% at d=5.

After the NN decoder chapter, I realized NNs shine at pruning, not decoding. A tiny 346-parameter network can tell you which edges matter -- and then you hand the pruned graph to MWPM. This is where neural networks actually earn their keep.

Table 6: Edge Pruner Comparison

| Model | d | Params | AUC | Recall@3 | Recall@5 | MWPM@3 | MWPM@5 |
|---|---|---|---|---|---|---|---|
| MatOps | 3 | 0 | 0.973 | 92.5% | 97.0% | 1.07x | 1.02x |
| Linear | 3 | 11 | 0.979 | 92.3% | 96.9% | 1.08x | 1.02x |
| PP 2R 8ch | 3 | 346 | 0.988 | 95.4% | 98.5% | 1.01x | 1.00x |
| PP 2R 16ch | 3 | 1,202 | 0.989 | 95.8% | 98.5% | 1.01x | 1.00x |
| MatOps | 5 | 0 | 0.960 | 88.6% | 93.7% | 1.33x | 1.17x |
| Linear | 5 | 11 | 0.961 | 88.7% | 93.7% | 1.33x | 1.17x |
| PP 2R 8ch | 5 | 346 | 0.970 | 88.7% | 93.8% | 1.24x | 1.06x |
| PP 4R 16ch | 5 | 2,290 | 0.983 | 90.5% | 95.6% | 1.14x | 1.01x |

MWPM@K = MWPM error rate on graph pruned to top-K edges per detector, divided by unpruned MWPM. At d=3, PP 2R 16ch with K=5 recovers exact MWPM quality (1.00x).
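Mechanically, the MWPM@K setup is: score every DEM edge, keep the top-K per detector, and hand the induced subgraph to MWPM. A sketch of the pruning step only, with a hypothetical helper name and generic scores standing in for the pruner's NN logits:

```python
def prune_top_k(edges, scores, n_det, k=5):
    """Keep an edge iff it ranks in the top-k by score for at least
    one of its two endpoint detectors. Returns kept edge indices."""
    incident = [[] for _ in range(n_det)]
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    keep = set()
    for det_edges in incident:
        # Higher score = more likely to carry an error, so keep it.
        ranked = sorted(det_edges, key=lambda i: -scores[i])
        keep.update(ranked[:k])
    return sorted(keep)
```

Recall@K then measures how many true error edges survive the cut, while MWPM@K measures the downstream damage when MWPM only sees the survivors; an edge kept by either endpoint stays, so the pruned graph never isolates a detector.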

Figure: MWPM error on pruned graph vs K (candidates per detector), panels for d=3 and d=5. Lower is better. At K=3, the PP pruner achieves 1.01x at d=3 but 1.24x at d=5 - the d=5 quality wall.

I also explored what happens when you abandon MWPM entirely. Union-Find, greedy Dijkstra, sparse Sinkhorn -- they all hit the same wall at d=5, clustering around 2.0x. That consistency told me the wall is structural, not algorithmic.

Table 7: Alternative Matching Methods

| Method | d | Error Rate | vs MWPM | Time | Deterministic? |
|---|---|---|---|---|---|
| Union-Find (odd-only) | 3 | 14.4% | 1.218x | | No |
| Dijkstra-greedy | 3 | 13.4% | 1.128x | | No |
| Sparse Sinkhorn (tau=1.0, 3 iter) | 3 | 14.9% | 1.254x | | Yes |
| Chain phase1 only | 3 | 34.7% | 2.93x | | Yes |
| Union-Find (odd-only) | 5 | 26.5% | 2.433x | | No |
| Dijkstra-greedy | 5 | 21.7% | 1.990x | | No |
| Sparse Sinkhorn (tau=1.0, 3 iter) | 5 | 21.7% | 1.993x | | Yes |

All d=5 alternatives cluster around 2.0x - the structural quality wall for non-MWPM methods.
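The commit-early failure mode these methods share is easy to reproduce. A toy version of the greedy pairing step, with plain distances standing in for Dijkstra path weights on the detector graph (the real decoders differ in detail):

```python
import heapq

def greedy_pair(defects, dist):
    """Repeatedly match the closest unmatched defect pair --
    the commit-early structure shared by the greedy baselines."""
    heap = [(dist[u][v], u, v)
            for i, u in enumerate(defects)
            for v in defects[i + 1:]]
    heapq.heapify(heap)
    matched, pairs = set(), []
    while heap:
        d, u, v = heapq.heappop(heap)
        if u not in matched and v not in matched:
            pairs.append((u, v))
            matched.update((u, v))
    return pairs

# Defects on a line at positions 0, 2, 3, 5 (dist = |difference|):
# greedy grabs (2,3) first, forcing (0,5): total weight 1 + 5 = 6,
# while minimum-weight matching finds (0,2) + (3,5): total weight 4.
```

Each early commitment locally looks optimal but can force an expensive pairing later, and the number of such conflicts grows with distance, which is consistent with the clustering near 2.0x at d=5.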

Training Curves

Figure: training loss curves (d=3, r=10 edge pruners). Solid = train loss, dashed = validation loss. All models: d=3, r=10, p=0.01.

After ~50 experiments, these are the lessons that stuck. Each one was earned the hard way.

Key Insights

1. The decoder operates on the detector graph, not the lattice

Surface-code diagrams emphasize the lattice, but the decoder operates on the detector graph produced by the DEM. Every approach built on grid topology failed; every successful approach worked, at least implicitly, on DEM topology.

2. H_det is the keystone

Projecting through it (DeepPipe), iterating through it (PingPong), and masking attention with it (DEM-attention) are the three most productive architectural ideas.

3. Global matching pressure is hard to avoid

Multiple approaches converged toward structures similar to MWPM. No local feature, no NN, no iterative message passing can substitute for minimum-weight matching at d≥5. This suggests that global pairing may be an unavoidable component of effective decoders.

4. Pruning helps — selection hurts

Reducing the search space improves performance, but committing to edges early destroys global optimality. NNs can identify which edges are likely errors (AUC 0.99+), but they cannot determine which consistent subset to select. Premature commitment is catastrophic in matching problems.

5. Sinkhorn+2opt is the quality champion

1.030x d=3, 1.256x d=5. Deterministic time. The closest any fixed-time decoder gets to MWPM.

6. Binary features > cost features

NNs work better with 0/1 presence indicators than floating-point costs. The network learns implicit weighting. Cost2hop underperforms binary1hop at both d=3 and d=5.

7. 2-hop features don't help

Adding 2-hop connectivity (binary or cost) does not improve over 1-hop, and often hurts. At d=5, binary2hop and cost2hop plateau at 60.2% accuracy.

8. Distance-5 introduces a structural difficulty wall

Many heuristics that perform well at d=3 degrade sharply at d=5. All fixed-graph decoders cluster around 2.0x at d=5 regardless of method. NNs drop 15-27 percentage points. This suggests that decoding pressure becomes global rather than local as distance increases.