036 — W^EI sets the cliff; W^IE shapes the basin (trains)

Abstract

Maps the (wEI, wIE) coupling plane: which weight controls the recruitment cliff, and where do trained networks land when initialised across it? wEI moves the cliff (E→I drive is the recruitment knob); wIE shapes the basin above engagement but does not move the cliff. Training from each grid point lands in two distinct basins — a low-E PING corner and a high-E silent-I stretched-COBA corner.

Methods

Training recipe (canonical / medium tier):

Parameter	Value
Integration timestep $\Delta t$	0.1 ms
Trial duration $T$	200 ms
MNIST samples (80/20 stratified split of 2000)	1600 train / 400 test ( $\approx$ 2.9% of the 70k-sample MNIST corpus)
Epochs	10

PING and COBA baseline definitions, training recipe, and the spike-budget regulariser are in nb025. The recruitment cliff at $f^\star$ is introduced in nb025. This entry asks: which part of the E↔I coupling architecture controls the cliff’s position, and what happens when we train the network from coupling initialisations spanning both sides of it?

Results

Coupling sweep: shifting the recruitment cliff

If $f^\star$ is set by the E↔I coupling, altering $W^{EI}$ or $W^{IE}$ at inference should move it. On the trained PING baseline ( $\theta_u =$ off), we multiply either $W^{EI}$ or $W^{IE}$ by a scalar in $\{0.25, 0.5, 1.0, 2.0\}$ and re-run the $W_\text{in}$ scale sweep on top. All other weights frozen.

Four-panel figure. Left column: vary W^{EI} scale, W^{IE} held at 1. Right column: vary W^{IE} scale, W^{EI} held at 1. Top row: test accuracy vs W_in scale s. Bottom row: I rate vs s. Left column: cliff moves right as W^{EI} weakens; W^{EI} × 0.25 needs s ≈ 0.5 to engage, baseline (× 1.0) engages at s ≈ 0.25, and × 2.0 sits on top of baseline. Right column: all four curves overlap — varying W^{IE} doesn't move the cliff, only changes basin dynamics above it. — Inference-time $W_\text{in}$ scale sweep on trained PING ( $\theta_u =$ off) with either $W^{EI}$ or $W^{IE}$ scaled. Top: accuracy. Bottom: I rate. Left: $W^{EI}$ sweep, $W^{IE}$ fixed. Right: $W^{IE}$ sweep, $W^{EI}$ fixed.

Figure 1 cleanly separates the two roles. $W^{EI}$ moves the cliff (left column): halving E→I coupling shifts the recruitment edge from $s \approx 0.25$ to $s \approx 0.35$ , quartering it to $s \approx 0.50$ . Doubling does not push it further left because below $s \approx 0.2$ the feedforward drive itself is too weak to make E fire — the bottleneck has moved from I-cell recruitment to E-cell firing, and stronger $W^{EI}$ cannot rescue silent E spikes. $W^{IE}$ does not move the cliff (right column): all four curves lift off at $s \approx 0.25$ , because $W^{IE}$ only matters once I has fired, and at the cliff I has not. Above the cliff the right-column I-rate curves separate visibly — stronger inhibition per I spike suppresses the steady-state I rate. So the recruitment threshold is set by the E→I drive; inhibitory feedback shapes the basin only above engagement.

The $W^{EI}$ scaling has a clean quantitative signature: the unsaturated points obey $s^\star \cdot \sqrt{W^{EI}_\text{scale}} \approx 0.25$ to three significant figures (products 0.250, 0.247, 0.250), so empirically $s^\star \propto 1/\sqrt{W^{EI}}$ rather than the naive $1/W^{EI}$ . The exponent is a fingerprint of the gain function $\Phi_E$ near rheobase, and a concrete prediction for the mean-field analysis in nb033 to reproduce.

$W^{EI}$ x $W^{IE}$ training grid

The coupling-sweep results above scaled $W^{EI}$ and $W^{IE}$ on an already-trained network. A stronger test is to train one network per grid point with $W^{EI}$ and $W^{IE}$ initialised to specified values from scratch, under heavy spike penalty. Do the resulting solutions cluster into a PING-like region and a COBA-like region?

Method: 5×5 grid over $W^{EI}, W^{IE} \in \{0, 0.25, 0.5, 1.0, 2.0\}$ (mean initialisation, std fixed at $0.1 \times$ mean). One PING-architecture training run per cell at $\theta_u = 0.2$ from epoch 0, $W_\text{in} = 0.6$ (a compromise between COBA’s 0.3 init and PING’s 1.2 — gives the no-loop cells a fair shot at converging), 100 training epochs per cell, seed 42. All other hyperparameters from the standard PING recipe. 25 trainings on Modal A100 in parallel. Reporting per-cell best-epoch accuracy.

Three heatmap panels on a 5x5 grid with W^EI on y-axis and W^IE on x-axis, each value in {0, 0.25, 0.5, 1, 2}. Left: best-epoch accuracy ranges from 75% in the no-loop band to 86% at the PING baseline (1, 2). Middle: hidden E rate is 2-3 Hz in the no-loop and PING cells, up to 9.4 Hz in the high-E stretched-COBA cell (0.5, 2). Right: I rate is 0 in the no-loop band and most stretched-COBA cells, jumps to 11-22 Hz in the PING cluster (W^EI ≥ 1 and W^IE ≥ 1, or W^EI = 2 and W^IE ≥ 0.25). — Three heatmaps on the 5×5 ( $W^{EI}, W^{IE}$ ) training grid. One PING-architecture network per cell, trained from scratch at $\theta_u = 0.2$ for 100 epochs, $W_\text{in} = 0.6$ . **Left:** best-epoch test accuracy. **Middle:** hidden-E rate. **Right:** hidden-I rate (the cluster discriminator).

Three regions emerge:

No-loop band ( $W^{EI} = 0$ row OR $W^{IE} = 0$ column): accuracy ≈ 76.5–77%, E rate 2.2 Hz, I rate 0. Without a closed E→I→E loop the optimiser settles at the same COBA-low operating point regardless of which side of the loop is zero — the structural-equivalence prediction holds.
PING cluster ( $W^{EI} \geq 1.0$ with $W^{IE} \geq 1.0$ , plus the entire $W^{EI} = 2$ row from $W^{IE} \geq 0.25$ ): I rate 11.1–22.2 Hz, E rate 2.0–3.6 Hz, accuracy 80.25–86.0%. Gamma signature; the recipe baseline at $(W^{EI}, W^{IE}) = (1, 2)$ peaks at 86.0%, and the cluster includes the over-inhibited corner $(2, 2)$ at 82.5% — only ≈ 3.5% drop from peak.
Stretched-COBA cluster ( $W^{EI} \in \{0.25, 0.5\}$ with most $W^{IE}$ , plus $(1.0, 0.25)$ and $(1.0, 0.5)$ ): I rate 0, E rate 2.2–9.4 Hz, accuracy 75–84%. The I-loop never engages; the optimiser uses pure E-cell firing to fit the digits. The high end of this cluster is the cell at $(0.5, 2.0)$ — at 84.0% with E = 9.37 Hz and I = 0, the I-loop is silent yet accuracy is within 2 pp of PING peak.

The $(0.5, 2.0)$ cell deserves special attention. With 30 epochs of training it sat in the PING cluster (some seeds engaging I, some not). At 100 epochs the optimiser deterministically finds a no-I, high-E solution (E = 9.37 Hz, I = 0, acc = 84%). This is a new attractor that the shorter-training version masked — the network learns to classify with elevated E firing in lieu of recruiting the loop.

The clustering confirms the GH #29 hypothesis: under heavy spike penalty, the (coupling, accuracy, rate) outcome lands in qualitatively different solutions depending on coupling strength. The recruitment cliff that the inference-time coupling sweep located (Figure 1) is reproduced here as the boundary of the PING cluster at $W^{EI} \approx 1$ (with $W^{IE} \geq 1$ ). The no-loop band at 76.5–77% sits ≈ 9 pp below PING’s 86% peak — substantially better than chance, but the loop is a real accuracy advantage even at this spike budget. The diagonal slice below walks this transition at higher resolution along the canonical $W^{IE} = 2 W^{EI}$ line.

Plotting the same 25 cells in (E rate, accuracy) space — colour-coded by I rate — makes the cluster structure direct:

Scatter of 25 points showing best-epoch accuracy vs hidden E rate. Points are coloured by hidden I rate from dark purple (I=0) to bright yellow (I=22 Hz). Dark-purple points (I=0) span accuracy 75-84% with E rates 2.2-9.4 Hz. Green/yellow points (I>11 Hz) sit at 80-86% accuracy with E rates 2-3.6 Hz. The (0.5, 2.0) cell at acc=84%, E=9.4 Hz, I=0 sits as the outlier high-E no-loop solution. — Each dot is one cell of the 5×5 $(W^{EI}, W^{IE})$ training grid, plotted at its best-epoch test accuracy vs final hidden-E rate. Colour encodes the third quantity — hidden-I rate — which is what discriminates the clusters. Labels are $(W^{EI}, W^{IE})$ .

Two clouds separate in this view. Dark-purple points (I rate ≈ 0) span accuracy 75–84% with E rates 2.2–9.4 Hz — the no-loop / stretched-COBA region. The top of this band is $(0.5, 2.0)$ at 84% with E = 9.4 Hz, the high-accuracy E-only solution: silent I, elevated E rate, accuracy within ≈ 2 pp of the PING cluster. Yellow/green points (I rate > 11 Hz) sit above at 80–86% accuracy with E rates clamped to 2–3.6 Hz by the active inhibitory loop — the PING cluster, distinguished from no-loop by lower E and active I.

The vertical gap between the two clouds is the recruitment cliff measured in accuracy units: ≈ 2–6 pp depending on where you draw the comparison. The high-E no-loop attractor at $(0.5, 2.0)$ closes most of that gap with E firing alone, but at 4× the per-cycle E spike count.

Overlaying the COBA and PING $\theta_u$ -sweep frontiers (Figure 5 of nb025) directly onto the same axes pins where the grid solutions live relative to the two canonical baselines:

Same scatter as Figure 2b (25 grid cells coloured by I rate, accuracy 70-90% on y-axis, E rate on x-axis) with two extra curves overlaid: the COBA θ_u sweep (red squares connected by line) traversing high E rates (~88 Hz at θ_u=off down to ~0 Hz at θ_u=0.2) with accuracy 62-88%; and the PING θ_u sweep (black diamonds) clustered tightly at E rate 3-7 Hz with accuracy 85-89%. — Grid scatter (coloured by I rate) plus the COBA (red squares) and PING (black diamonds) $\theta_u$ -sweep frontiers from nb025 Figure 5, connecting each model’s $(\theta_u, \text{final acc}, E)$ points. All curves and points trained for 100 epochs. Annotations on the frontier curves mark $\theta_u$ in spikes (or “off”). Note the axes follow the accuracy zoom (70–90%), so COBA’s low-rate tail at $\theta_u \in \{0.5, 0.2\}$ drops off the bottom.

The PING frontier sits above the green/yellow grid cluster — the $(\theta_u$ -swept $)$ PING baseline reaches 85–89% accuracy at E rates 3–7 Hz, higher than the grid’s PING cluster (82–85.5%). This is because the canonical PING recipe uses $W_\text{in} = 1.2$ but the grid was trained with $W_\text{in} = 0.6$ (chosen to give no-loop cells a fair shot at converging), which costs a few accuracy points even at the recipe baseline coupling. The PING frontier also never leaves the active-loop regime even when the budget is squeezed all the way to 0.2 spikes — no $\theta_u$ value can push PING into the dark-purple band, because the loop sets a structural minimum on E rate. The grid scatter shows that the only way to reach that band is to start training from a $(W^{EI}, W^{IE})$ init that prevents the loop from engaging in the first place — a structural rather than penalty-driven escape.

The COBA frontier sweeps across $\geq 2$ decades of E rate (≈88 Hz down to ≈0 Hz) at accuracies 88–62%. The interesting part is the low-E tail: at $\theta_u = 1$ , COBA sits at $(E \approx 3.3 \text{ Hz}, \text{acc} \approx 68\%)$ — and the high-accuracy stretched-COBA grid cell at $(0.5, 1)$ sits above this point at $(E \approx 3.5 \text{ Hz}, \text{acc} = 78\%)$ . So the grid solution is not just rediscovering COBA — it is finding a strictly better operating point than $\theta_u$ -squeezed COBA at the same rate, ≈10 pp higher accuracy with similar firing. This is the third attractor the diagonal slice also surfaces: not PING (no I loop), not penalty-squeezed COBA (different $W_\text{in}$ scale, different cell participation pattern), but a coupling-shaped E-only basin the optimiser only finds when initialised inside it.

Training videos

Per-epoch oscilloscope videos for every cell of the grid. Each video shows one epoch per frame against a fixed test trial; gamma cadence (if any) is visible in the E/I rasters. Naming: weiX__wieY for $W^{EI} = X$ , $W^{IE} = Y$ .

	$W^{IE} = 0$	$W^{IE} = 0.25$	$W^{IE} = 0.5$	$W^{IE} = 1$	$W^{IE} = 2$
$W^{EI} = 0$	video	video	video	video	video
$W^{EI} = 0.25$	video	video	video	video	video
$W^{EI} = 0.5$	video	video	video	video	video
$W^{EI} = 1$	video	video	video	video	video
$W^{EI} = 2$	video	video	video	video	video

1D diagonal slice along the ei_ratio = 2 line

The grid above varies $W^{EI}$ and $W^{IE}$ independently, but the codebase’s standard PING recipe ties them via $W^{IE} = \text{ei\_ratio} \cdot W^{EI}$ with $\text{ei\_ratio} = 2$ . Restricting to that line gives a finer-grained view: one knob ( $W^{EI}$ ) at 10 values, $W^{IE}$ tracking at twice the value, three seeds per value, 100 training epochs. Accuracy is reported as the per-cell best epoch. Same recipe as the grid otherwise ( $\theta_u = 0.2$ , $W_\text{in} = 0.6$ ).

Three line panels with W^EI on x-axis and error bars from 3 seeds. Left: best-epoch accuracy is flat at 77-78% from W^EI 0 to 0.5, jumps to 80.5% at W^EI=0.6, rises further to 84.75% at W^EI=0.75, peaks at 86.08% at W^EI=1, then drops to 83.33% at W^EI=2. Middle: E rate is 2-3 Hz in the no-loop band, peaks at 7.8 Hz at W^EI=0.75, drops back to 2-3 Hz once the loop locks. Right: I rate is exactly 0 for W^EI ≤ 0.6, lifts to 1.6 Hz at W^EI=0.75 then climbs to 22.2 Hz at W^EI=2. — $W^{EI}$ scanned over $\{0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0, 1.5, 2.0\}$ with $W^{IE} = 2 W^{EI}$ . Error bars are mean ± std of the best-epoch test accuracy over 3 seeds (42, 43, 44), 100 training epochs each; faint dots show individual seed peaks. **Left:** test accuracy. **Middle:** hidden-E rate (last-epoch). **Right:** hidden-I rate (last-epoch). Grey dotted vertical at $W^{EI} = 1$ marks the canonical recipe baseline.

Three regimes:

$W^{EI} \in \{0, 0.1, 0.25, 0.4, 0.5\}$ — no-loop / stretched-COBA, low-accuracy variant: accuracy 77–78%, I rate exactly 0, E rate 2.2–2.6 Hz. Tight error bars (acc std ≤ 0.9%) — the optimiser deterministically lands in the same E-only basin from any seed.
$W^{EI} = 0.6$ — high-accuracy stretched-COBA, transition: 80.50 ± 0.43% accuracy, I rate still exactly 0 for all three seeds, E rate 4.18 Hz. The loop hasn’t engaged but accuracy is already 3 pp above the low-band.
$W^{EI} = 0.75$ — transitional with weak loop: 84.75 ± 0.43% accuracy, I rate 1.6 Hz (1 of 3 seeds engages a small I population), E rate 7.80 Hz. The first cell on the sweep where I lifts off zero.
$W^{EI} \in \{1.0, 1.5\}$ — full PING: accuracy 85.4–86.1%, I rate climbing 12.7 → 16.7 Hz as the loop tightens, E rate clamped to 2.2–3.2 Hz by inhibitory negative feedback. The recipe baseline at $W^{EI} = 1$ is the peak of the sweep.
$W^{EI} = 2.0$ — over-inhibited: 83.33 ± 0.63% accuracy, I rate 22.2 Hz, E rate 2.0 Hz. The drop from the plateau is ≈ 3 pp — not catastrophic, but the over-inhibited corner is genuinely less accurate.

The accuracy cliff is now smoother than the 30-epoch picture suggested. The accuracy cliff rises in two steps: $W^{EI} = 0.5 \to 0.6$ (77.5% → 80.5%) and $W^{EI} = 0.6 \to 0.75$ (80.5% → 84.75%). The I-recruitment cliff sits at the second step, between $W^{EI} = 0.6$ (I = 0) and $W^{EI} = 0.75$ (I ≈ 1.6 Hz). In the gap between the cliffs sits the high-E stretched-COBA solution at $W^{EI} = 0.6$ , which uses pure E firing without engaging the loop. With more compute the network finds an E-only solution that closes much of the gap with PING — the loop is a useful but not necessary mechanism for high accuracy at this spike budget.

Plotting the 30 cells in (E rate, accuracy) space — colour-coded by I rate, same convention as Figure 2b — makes the new attractor visible directly:

Scatter of 30 points showing best-epoch accuracy vs hidden E rate. Points are coloured by hidden I rate from dark purple (I=0) to bright yellow (I=22 Hz). Three groups: a low-E dark-purple cluster at accuracy 77-78% (W^EI ≤ 0.5, I=0, E ≈ 2.2-2.6 Hz); a high-E dark-purple knot at accuracy 80.5% (W^EI=0.6, I=0, E ≈ 4.2 Hz); a transitional W^EI=0.75 cluster (some seeds at I≈4.7 Hz, some at I=0, E ≈ 7.8 Hz, acc ≈ 84.75%); a green/yellow upper band at accuracy 83-86% (W^EI ≥ 1, I > 12 Hz, E 2-3 Hz). Labels mark W^EI values. — Each dot is one $(W^{EI}, \text{seed})$ cell of the 100-epoch diagonal sweep, plotted at its best-epoch test accuracy vs final hidden-E rate. Colour encodes hidden-I rate; labels mark the $W^{EI}$ value at one representative seed.

The diagonal scatter resolves the same physics as Figure 2b at higher resolution. The dark-purple ( $I = 0$ ) cluster now visibly splits into two sub-clouds: a low-E group ( $W^{EI} \leq 0.5$ , E ≈ 2.4–3.7 Hz, acc ≈ 77%) and a tight knot at $W^{EI} = 0.6$ (E ≈ 7.7 Hz, acc ≈ 81%) — the high-accuracy stretched-COBA solution. They share the silent-I label but separate cleanly in $(E, \text{acc})$ . The green/yellow PING cluster ( $W^{EI} \geq 0.75$ ) sits above at 82–85% accuracy with E rates clamped to 2–5 Hz by inhibitory feedback. The recruitment of inhibition (purple→green colour transition) and the accuracy cliff (vertical jump) happen at the same $W^{EI}$ step in this slice (0.6 → 0.75), but the two purple sub-clouds prove that high accuracy is achievable on either side of the I-recruitment threshold — just at very different E rates.

Discussion

The $(W^{EI}, W^{IE})$ map separates two accuracy basins — a high-E silent-I stretched-COBA solution and a low-E loop-engaged PING solution — that share the loss surface but live at very different per-spike economy. The PING corner is not uniquely optimal for accuracy at this tier, but it is the only corner where the E rate is held down to single digits; the rest of the per-spike-economy story in ar009 lives there.

Next steps

Run the same diagonal at full MNIST + more epochs to check whether the two basins converge or stay separated.
Sweep $W^{EI}$ and $W^{IE}$ independently (off-diagonal) to see whether the I-recruitment threshold tracks the product or one factor alone.