036 — W^EI sets the cliff; W^IE shapes the basin (trains)

Abstract

Maps the (wEI, wIE) coupling plane: which weight controls the recruitment cliff, and where do trained networks land when initialised across it? wEI moves the cliff (E→I drive is the recruitment knob); wIE shapes the basin above engagement but does not move the cliff. Training from each grid point lands in two distinct basins — a low-E PING corner and a high-E silent-I stretched-COBA corner.

Methods

Training recipe (canonical / medium tier):

ParameterValue
Integration timestep Δt\Delta t0.1 ms
Trial duration TT200 ms
MNIST samples (80/20 stratified split of 2000)1600 train / 400 test (\approx 2.9% of the 70k-sample MNIST corpus)
Epochs10

PING and COBA baseline definitions, training recipe, and the spike-budget regulariser are in nb025. The recruitment cliff at ff^\star is introduced in nb025. This entry asks: which part of the E↔I coupling architecture controls the cliff’s position, and what happens when we train the network from coupling initialisations spanning both sides of it?

Results

Coupling sweep: shifting the recruitment cliff

If ff^\star is set by the E↔I coupling, altering WEIW^{EI} or WIEW^{IE} at inference should move it. On the trained PING baseline (θu=\theta_u = off), we multiply either WEIW^{EI} or WIEW^{IE} by a scalar in {0.25,0.5,1.0,2.0}\{0.25, 0.5, 1.0, 2.0\} and re-run the WinW_\text{in} scale sweep on top. All other weights frozen.

Figure 1. Recruitment-cliff migration with coupling scale
Four-panel figure. Left column: vary W^{EI} scale, W^{IE} held at 1. Right column: vary W^{IE} scale, W^{EI} held at 1. Top row: test accuracy vs W_in scale s. Bottom row: I rate vs s. Left column: cliff moves right as W^{EI} weakens; W^{EI} × 0.25 needs s ≈ 0.5 to engage, baseline (× 1.0) engages at s ≈ 0.25, and × 2.0 sits on top of baseline. Right column: all four curves overlap — varying W^{IE} doesn't move the cliff, only changes basin dynamics above it.

Inference-time WinW_\text{in} scale sweep on trained PING (θu=\theta_u = off) with either WEIW^{EI} or WIEW^{IE} scaled. Top: accuracy. Bottom: I rate. Left: WEIW^{EI} sweep, WIEW^{IE} fixed. Right: WIEW^{IE} sweep, WEIW^{EI} fixed.

Figure 1 cleanly separates the two roles. WEIW^{EI} moves the cliff (left column): halving E→I coupling shifts the recruitment edge from s0.25s \approx 0.25 to s0.35s \approx 0.35, quartering it to s0.50s \approx 0.50. Doubling does not push it further left because below s0.2s \approx 0.2 the feedforward drive itself is too weak to make E fire — the bottleneck has moved from I-cell recruitment to E-cell firing, and stronger WEIW^{EI} cannot rescue silent E spikes. WIEW^{IE} does not move the cliff (right column): all four curves lift off at s0.25s \approx 0.25, because WIEW^{IE} only matters once I has fired, and at the cliff I has not. Above the cliff the right-column I-rate curves separate visibly — stronger inhibition per I spike suppresses the steady-state I rate. So the recruitment threshold is set by the E→I drive; inhibitory feedback shapes the basin only above engagement.

The WEIW^{EI} scaling has a clean quantitative signature: the unsaturated points obey sWscaleEI0.25s^\star \cdot \sqrt{W^{EI}_\text{scale}} \approx 0.25 to three significant figures (products 0.250, 0.247, 0.250), so empirically s1/WEIs^\star \propto 1/\sqrt{W^{EI}} rather than the naive 1/WEI1/W^{EI}. The exponent is a fingerprint of the gain function ΦE\Phi_E near rheobase, and a concrete prediction for the mean-field analysis in nb033 to reproduce.

WEIW^{EI} x WIEW^{IE} training grid

The coupling-sweep results above scaled WEIW^{EI} and WIEW^{IE} on an already-trained network. A stronger test is to train one network per grid point with WEIW^{EI} and WIEW^{IE} initialised to specified values from scratch, under heavy spike penalty. Do the resulting solutions cluster into a PING-like region and a COBA-like region?

Method: 5×5 grid over WEI,WIE{0,0.25,0.5,1.0,2.0}W^{EI}, W^{IE} \in \{0, 0.25, 0.5, 1.0, 2.0\} (mean initialisation, std fixed at 0.1×0.1 \times mean). One PING-architecture training run per cell at θu=0.2\theta_u = 0.2 from epoch 0, Win=0.6W_\text{in} = 0.6 (a compromise between COBA’s 0.3 init and PING’s 1.2 — gives the no-loop cells a fair shot at converging), 100 training epochs per cell, seed 42. All other hyperparameters from the standard PING recipe. 25 trainings on Modal A100 in parallel. Reporting per-cell best-epoch accuracy.

Figure 2. W^EI × W^IE training grid — accuracy, E rate, I rate (100 epochs)
Three heatmap panels on a 5x5 grid with W^EI on y-axis and W^IE on x-axis, each value in {0, 0.25, 0.5, 1, 2}. Left: best-epoch accuracy ranges from 75% in the no-loop band to 86% at the PING baseline (1, 2). Middle: hidden E rate is 2-3 Hz in the no-loop and PING cells, up to 9.4 Hz in the high-E stretched-COBA cell (0.5, 2). Right: I rate is 0 in the no-loop band and most stretched-COBA cells, jumps to 11-22 Hz in the PING cluster (W^EI ≥ 1 and W^IE ≥ 1, or W^EI = 2 and W^IE ≥ 0.25).

Three heatmaps on the 5×5 (WEI,WIEW^{EI}, W^{IE}) training grid. One PING-architecture network per cell, trained from scratch at θu=0.2\theta_u = 0.2 for 100 epochs, Win=0.6W_\text{in} = 0.6. Left: best-epoch test accuracy. Middle: hidden-E rate. Right: hidden-I rate (the cluster discriminator).

Three regions emerge:

  • No-loop band (WEI=0W^{EI} = 0 row OR WIE=0W^{IE} = 0 column): accuracy ≈ 76.5–77%, E rate 2.2 Hz, I rate 0. Without a closed E→I→E loop the optimiser settles at the same COBA-low operating point regardless of which side of the loop is zero — the structural-equivalence prediction holds.
  • PING cluster (WEI1.0W^{EI} \geq 1.0 with WIE1.0W^{IE} \geq 1.0, plus the entire WEI=2W^{EI} = 2 row from WIE0.25W^{IE} \geq 0.25): I rate 11.1–22.2 Hz, E rate 2.0–3.6 Hz, accuracy 80.25–86.0%. Gamma signature; the recipe baseline at (WEI,WIE)=(1,2)(W^{EI}, W^{IE}) = (1, 2) peaks at 86.0%, and the cluster includes the over-inhibited corner (2,2)(2, 2) at 82.5% — only ≈ 3.5% drop from peak.
  • Stretched-COBA cluster (WEI{0.25,0.5}W^{EI} \in \{0.25, 0.5\} with most WIEW^{IE}, plus (1.0,0.25)(1.0, 0.25) and (1.0,0.5)(1.0, 0.5)): I rate 0, E rate 2.2–9.4 Hz, accuracy 75–84%. The I-loop never engages; the optimiser uses pure E-cell firing to fit the digits. The high end of this cluster is the cell at (0.5,2.0)(0.5, 2.0) — at 84.0% with E = 9.37 Hz and I = 0, the I-loop is silent yet accuracy is within 2 pp of PING peak.

The (0.5,2.0)(0.5, 2.0) cell deserves special attention. With 30 epochs of training it sat in the PING cluster (some seeds engaging I, some not). At 100 epochs the optimiser deterministically finds a no-I, high-E solution (E = 9.37 Hz, I = 0, acc = 84%). This is a new attractor that the shorter-training version masked — the network learns to classify with elevated E firing in lieu of recruiting the loop.

The clustering confirms the GH #29 hypothesis: under heavy spike penalty, the (coupling, accuracy, rate) outcome lands in qualitatively different solutions depending on coupling strength. The recruitment cliff that the inference-time coupling sweep located (Figure 1) is reproduced here as the boundary of the PING cluster at WEI1W^{EI} \approx 1 (with WIE1W^{IE} \geq 1). The no-loop band at 76.5–77% sits ≈ 9 pp below PING’s 86% peak — substantially better than chance, but the loop is a real accuracy advantage even at this spike budget. The diagonal slice below walks this transition at higher resolution along the canonical WIE=2WEIW^{IE} = 2 W^{EI} line.

Plotting the same 25 cells in (E rate, accuracy) space — colour-coded by I rate — makes the cluster structure direct:

Figure 2b. Grid cells in (E rate, accuracy) space, coloured by I rate (100 epochs)
Scatter of 25 points showing best-epoch accuracy vs hidden E rate. Points are coloured by hidden I rate from dark purple (I=0) to bright yellow (I=22 Hz). Dark-purple points (I=0) span accuracy 75-84% with E rates 2.2-9.4 Hz. Green/yellow points (I>11 Hz) sit at 80-86% accuracy with E rates 2-3.6 Hz. The (0.5, 2.0) cell at acc=84%, E=9.4 Hz, I=0 sits as the outlier high-E no-loop solution.

Each dot is one cell of the 5×5 (WEI,WIE)(W^{EI}, W^{IE}) training grid, plotted at its best-epoch test accuracy vs final hidden-E rate. Colour encodes the third quantity — hidden-I rate — which is what discriminates the clusters. Labels are (WEI,WIE)(W^{EI}, W^{IE}).

Two clouds separate in this view. Dark-purple points (I rate ≈ 0) span accuracy 75–84% with E rates 2.2–9.4 Hz — the no-loop / stretched-COBA region. The top of this band is (0.5,2.0)(0.5, 2.0) at 84% with E = 9.4 Hz, the high-accuracy E-only solution: silent I, elevated E rate, accuracy within ≈ 2 pp of the PING cluster. Yellow/green points (I rate > 11 Hz) sit above at 80–86% accuracy with E rates clamped to 2–3.6 Hz by the active inhibitory loop — the PING cluster, distinguished from no-loop by lower E and active I.

The vertical gap between the two clouds is the recruitment cliff measured in accuracy units: ≈ 2–6 pp depending on where you draw the comparison. The high-E no-loop attractor at (0.5,2.0)(0.5, 2.0) closes most of that gap with E firing alone, but at 4× the per-cycle E spike count.

Overlaying the COBA and PING θu\theta_u-sweep frontiers (Figure 5 of nb025) directly onto the same axes pins where the grid solutions live relative to the two canonical baselines:

Figure 2c. Grid cells with COBA / PING θ_u frontiers superimposed
Same scatter as Figure 2b (25 grid cells coloured by I rate, accuracy 70-90% on y-axis, E rate on x-axis) with two extra curves overlaid: the COBA θ_u sweep (red squares connected by line) traversing high E rates (~88 Hz at θ_u=off down to ~0 Hz at θ_u=0.2) with accuracy 62-88%; and the PING θ_u sweep (black diamonds) clustered tightly at E rate 3-7 Hz with accuracy 85-89%.

Grid scatter (coloured by I rate) plus the COBA (red squares) and PING (black diamonds) θu\theta_u-sweep frontiers from nb025 Figure 5, connecting each model’s (θu,final acc,E)(\theta_u, \text{final acc}, E) points. All curves and points trained for 100 epochs. Annotations on the frontier curves mark θu\theta_u in spikes (or “off”). Note the axes follow the accuracy zoom (70–90%), so COBA’s low-rate tail at θu{0.5,0.2}\theta_u \in \{0.5, 0.2\} drops off the bottom.

The PING frontier sits above the green/yellow grid cluster — the (θu(\theta_u-swept)) PING baseline reaches 85–89% accuracy at E rates 3–7 Hz, higher than the grid’s PING cluster (82–85.5%). This is because the canonical PING recipe uses Win=1.2W_\text{in} = 1.2 but the grid was trained with Win=0.6W_\text{in} = 0.6 (chosen to give no-loop cells a fair shot at converging), which costs a few accuracy points even at the recipe baseline coupling. The PING frontier also never leaves the active-loop regime even when the budget is squeezed all the way to 0.2 spikes — no θu\theta_u value can push PING into the dark-purple band, because the loop sets a structural minimum on E rate. The grid scatter shows that the only way to reach that band is to start training from a (WEI,WIE)(W^{EI}, W^{IE}) init that prevents the loop from engaging in the first place — a structural rather than penalty-driven escape.

The COBA frontier sweeps across 2\geq 2 decades of E rate (≈88 Hz down to ≈0 Hz) at accuracies 88–62%. The interesting part is the low-E tail: at θu=1\theta_u = 1, COBA sits at (E3.3 Hz,acc68%)(E \approx 3.3 \text{ Hz}, \text{acc} \approx 68\%) — and the high-accuracy stretched-COBA grid cell at (0.5,1)(0.5, 1) sits above this point at (E3.5 Hz,acc=78%)(E \approx 3.5 \text{ Hz}, \text{acc} = 78\%). So the grid solution is not just rediscovering COBA — it is finding a strictly better operating point than θu\theta_u-squeezed COBA at the same rate, ≈10 pp higher accuracy with similar firing. This is the third attractor the diagonal slice also surfaces: not PING (no I loop), not penalty-squeezed COBA (different WinW_\text{in} scale, different cell participation pattern), but a coupling-shaped E-only basin the optimiser only finds when initialised inside it.

Training videos

Per-epoch oscilloscope videos for every cell of the grid. Each video shows one epoch per frame against a fixed test trial; gamma cadence (if any) is visible in the E/I rasters. Naming: weiX__wieY for WEI=XW^{EI} = X, WIE=YW^{IE} = Y.

WIE=0W^{IE} = 0WIE=0.25W^{IE} = 0.25WIE=0.5W^{IE} = 0.5WIE=1W^{IE} = 1WIE=2W^{IE} = 2
WEI=0W^{EI} = 0videovideovideovideovideo
WEI=0.25W^{EI} = 0.25videovideovideovideovideo
WEI=0.5W^{EI} = 0.5videovideovideovideovideo
WEI=1W^{EI} = 1videovideovideovideovideo
WEI=2W^{EI} = 2videovideovideovideovideo

1D diagonal slice along the ei_ratio = 2 line

The grid above varies WEIW^{EI} and WIEW^{IE} independently, but the codebase’s standard PING recipe ties them via WIE=ei_ratioWEIW^{IE} = \text{ei\_ratio} \cdot W^{EI} with ei_ratio=2\text{ei\_ratio} = 2. Restricting to that line gives a finer-grained view: one knob (WEIW^{EI}) at 10 values, WIEW^{IE} tracking at twice the value, three seeds per value, 100 training epochs. Accuracy is reported as the per-cell best epoch. Same recipe as the grid otherwise (θu=0.2\theta_u = 0.2, Win=0.6W_\text{in} = 0.6).

Figure 3. W^EI sweep along ei_ratio=2 diagonal — accuracy, E rate, I rate (3 seeds, 100 epochs)
Three line panels with W^EI on x-axis and error bars from 3 seeds. Left: best-epoch accuracy is flat at 77-78% from W^EI 0 to 0.5, jumps to 80.5% at W^EI=0.6, rises further to 84.75% at W^EI=0.75, peaks at 86.08% at W^EI=1, then drops to 83.33% at W^EI=2. Middle: E rate is 2-3 Hz in the no-loop band, peaks at 7.8 Hz at W^EI=0.75, drops back to 2-3 Hz once the loop locks. Right: I rate is exactly 0 for W^EI ≤ 0.6, lifts to 1.6 Hz at W^EI=0.75 then climbs to 22.2 Hz at W^EI=2.

WEIW^{EI} scanned over {0,0.1,0.25,0.4,0.5,0.6,0.75,1.0,1.5,2.0}\{0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0, 1.5, 2.0\} with WIE=2WEIW^{IE} = 2 W^{EI}. Error bars are mean ± std of the best-epoch test accuracy over 3 seeds (42, 43, 44), 100 training epochs each; faint dots show individual seed peaks. Left: test accuracy. Middle: hidden-E rate (last-epoch). Right: hidden-I rate (last-epoch). Grey dotted vertical at WEI=1W^{EI} = 1 marks the canonical recipe baseline.

Three regimes:

  • WEI{0,0.1,0.25,0.4,0.5}W^{EI} \in \{0, 0.1, 0.25, 0.4, 0.5\} — no-loop / stretched-COBA, low-accuracy variant: accuracy 77–78%, I rate exactly 0, E rate 2.2–2.6 Hz. Tight error bars (acc std ≤ 0.9%) — the optimiser deterministically lands in the same E-only basin from any seed.
  • WEI=0.6W^{EI} = 0.6high-accuracy stretched-COBA, transition: 80.50 ± 0.43% accuracy, I rate still exactly 0 for all three seeds, E rate 4.18 Hz. The loop hasn’t engaged but accuracy is already 3 pp above the low-band.
  • WEI=0.75W^{EI} = 0.75transitional with weak loop: 84.75 ± 0.43% accuracy, I rate 1.6 Hz (1 of 3 seeds engages a small I population), E rate 7.80 Hz. The first cell on the sweep where I lifts off zero.
  • WEI{1.0,1.5}W^{EI} \in \{1.0, 1.5\} — full PING: accuracy 85.4–86.1%, I rate climbing 12.7 → 16.7 Hz as the loop tightens, E rate clamped to 2.2–3.2 Hz by inhibitory negative feedback. The recipe baseline at WEI=1W^{EI} = 1 is the peak of the sweep.
  • WEI=2.0W^{EI} = 2.0 — over-inhibited: 83.33 ± 0.63% accuracy, I rate 22.2 Hz, E rate 2.0 Hz. The drop from the plateau is ≈ 3 pp — not catastrophic, but the over-inhibited corner is genuinely less accurate.

The accuracy cliff is now smoother than the 30-epoch picture suggested. The accuracy cliff rises in two steps: WEI=0.50.6W^{EI} = 0.5 \to 0.6 (77.5% → 80.5%) and WEI=0.60.75W^{EI} = 0.6 \to 0.75 (80.5% → 84.75%). The I-recruitment cliff sits at the second step, between WEI=0.6W^{EI} = 0.6 (I = 0) and WEI=0.75W^{EI} = 0.75 (I ≈ 1.6 Hz). In the gap between the cliffs sits the high-E stretched-COBA solution at WEI=0.6W^{EI} = 0.6, which uses pure E firing without engaging the loop. With more compute the network finds an E-only solution that closes much of the gap with PING — the loop is a useful but not necessary mechanism for high accuracy at this spike budget.

Plotting the 30 cells in (E rate, accuracy) space — colour-coded by I rate, same convention as Figure 2b — makes the new attractor visible directly:

Figure 3b. Diagonal cells in (E rate, accuracy) space, coloured by I rate (3 seeds, 100 epochs)
Scatter of 30 points showing best-epoch accuracy vs hidden E rate. Points are coloured by hidden I rate from dark purple (I=0) to bright yellow (I=22 Hz). Three groups: a low-E dark-purple cluster at accuracy 77-78% (W^EI ≤ 0.5, I=0, E ≈ 2.2-2.6 Hz); a high-E dark-purple knot at accuracy 80.5% (W^EI=0.6, I=0, E ≈ 4.2 Hz); a transitional W^EI=0.75 cluster (some seeds at I≈4.7 Hz, some at I=0, E ≈ 7.8 Hz, acc ≈ 84.75%); a green/yellow upper band at accuracy 83-86% (W^EI ≥ 1, I > 12 Hz, E 2-3 Hz). Labels mark W^EI values.

Each dot is one (WEI,seed)(W^{EI}, \text{seed}) cell of the 100-epoch diagonal sweep, plotted at its best-epoch test accuracy vs final hidden-E rate. Colour encodes hidden-I rate; labels mark the WEIW^{EI} value at one representative seed.

The diagonal scatter resolves the same physics as Figure 2b at higher resolution. The dark-purple (I=0I = 0) cluster now visibly splits into two sub-clouds: a low-E group (WEI0.5W^{EI} \leq 0.5, E ≈ 2.4–3.7 Hz, acc ≈ 77%) and a tight knot at WEI=0.6W^{EI} = 0.6 (E ≈ 7.7 Hz, acc ≈ 81%) — the high-accuracy stretched-COBA solution. They share the silent-I label but separate cleanly in (E,acc)(E, \text{acc}). The green/yellow PING cluster (WEI0.75W^{EI} \geq 0.75) sits above at 82–85% accuracy with E rates clamped to 2–5 Hz by inhibitory feedback. The recruitment of inhibition (purple→green colour transition) and the accuracy cliff (vertical jump) happen at the same WEIW^{EI} step in this slice (0.6 → 0.75), but the two purple sub-clouds prove that high accuracy is achievable on either side of the I-recruitment threshold — just at very different E rates.

Discussion

The (WEI,WIE)(W^{EI}, W^{IE}) map separates two accuracy basins — a high-E silent-I stretched-COBA solution and a low-E loop-engaged PING solution — that share the loss surface but live at very different per-spike economy. The PING corner is not uniquely optimal for accuracy at this tier, but it is the only corner where the E rate is held down to single digits; the rest of the per-spike-economy story in ar009 lives there.

Next steps

  1. Run the same diagonal at full MNIST + more epochs to check whether the two basins converge or stay separated.
  2. Sweep WEIW^{EI} and WIEW^{IE} independently (off-diagonal) to see whether the I-recruitment threshold tracks the product or one factor alone.