040 — CUBA-PING

Abstract

nb025 and nb038 characterise PING in the conductance-based regime, where every synapse has an exponential filter at $\tau_\text{AMPA}$ or $\tau_\text{GABA}$. This entry asks whether current-based (CUBA) PING with instant synapses (no synaptic filtering at all) still produces the gamma dynamics and can be trained.

Methods

LIF E and I populations with the same wiring as nb025’s COBA-PING but with no synaptic state: every presynaptic spike at $t-1$ contributes its weight to the postsynaptic current at $t$ and is gone by $t+1$.

  • $N_E = 1024$, $N_I = 256$, $N_\text{in} = 784$, $\Delta t = 1$ ms, $T = 200$ ms.
  • Membrane: $\tau_m = 20$ ms, $V_\text{rest} = 0$, $V_\text{th} = 1$, hard reset to $V_\text{rest}$ on spike, $R_m = 1$. $V$ is floored at $V_\text{rest}$ after each step (the membrane can’t fall below the dominant ion’s reversal potential). No refractory period: after reset the cell is free to fire on the next step; the only mechanisms limiting firing rate are the time the leaky membrane takes to charge back up to threshold and (in the PING arm) the I-loop suppression.
  • Weights: $W^{EI} \sim \mathcal{N}(1, 0.1)$, $W^{IE} \sim \mathcal{N}(1, 0.1)$, both clamped non-negative; $W_\text{in}$ is 95% sparse with $\mathcal{N}(0, 0.5)$ entries.

$$V^E_{t+1} = V^E_t + \tfrac{\Delta t}{\tau_m}\!\left(-(V^E_t - V_\text{rest}) + R_m(x_t W_\text{in} - s^I_{t-1} W^{IE})\right), \quad s^E_t = \mathbb{1}[V^E_t \geq V_\text{th}]$$

$$V^I_{t+1} = V^I_t + \tfrac{\Delta t}{\tau_m}\!\left(-(V^I_t - V_\text{rest}) + R_m\, s^E_{t-1} W^{EI}\right), \quad s^I_t = \mathbb{1}[V^I_t \geq V_\text{th}]$$
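The update equations above translate almost line for line into a forward simulation. What follows is a minimal NumPy sketch, not the oscilloscope implementation: the function name `simulate_cuba_ping`, the order of spike detection, reset, and flooring within a step, and the reading of $\mathcal{N}(1, 0.1)$ as mean and standard deviation are all assumptions made for illustration. The usage lines draw weights with the Methods statistics and drive the network with 80 Hz Poisson input (probability 0.08 per 1 ms step).

```python
import numpy as np


def simulate_cuba_ping(x, W_in, W_EI, W_IE, dt=1.0, tau_m=20.0,
                       v_rest=0.0, v_th=1.0, r_m=1.0):
    """Forward-simulate instant-synapse E/I LIF populations.

    x: (T, N_in) input spikes; W_in: (N_in, N_E);
    W_EI: (N_E, N_I); W_IE: (N_I, N_E).
    Returns spike arrays of shape (T, N_E) and (T, N_I).
    """
    T = x.shape[0]
    n_e, n_i = W_EI.shape
    alpha = dt / tau_m
    v_e, v_i = np.full(n_e, v_rest), np.full(n_i, v_rest)
    s_e_prev, s_i_prev = np.zeros(n_e), np.zeros(n_i)  # spikes from step t-1
    spikes_e, spikes_i = np.zeros((T, n_e)), np.zeros((T, n_i))
    for t in range(T):
        # s_t = 1[V_t >= V_th], followed by a hard reset to V_rest.
        s_e = (v_e >= v_th).astype(float)
        s_i = (v_i >= v_th).astype(float)
        v_e = np.where(s_e > 0, v_rest, v_e)
        v_i = np.where(s_i > 0, v_rest, v_i)
        # No synaptic state: the current is built from step t-1 spikes only.
        i_e = x[t] @ W_in - s_i_prev @ W_IE
        i_i = s_e_prev @ W_EI
        v_e = v_e + alpha * (-(v_e - v_rest) + r_m * i_e)
        v_i = v_i + alpha * (-(v_i - v_rest) + r_m * i_i)
        # Floor the membrane at V_rest after each step.
        v_e, v_i = np.maximum(v_e, v_rest), np.maximum(v_i, v_rest)
        spikes_e[t], spikes_i[t] = s_e, s_i
        s_e_prev, s_i_prev = s_e, s_i
    return spikes_e, spikes_i


# Usage with the sizes from Methods (seed chosen arbitrarily).
rng = np.random.default_rng(0)
n_in, n_e, n_i, T = 784, 1024, 256, 200
W_EI = np.clip(rng.normal(1.0, 0.1, (n_e, n_i)), 0.0, None)
W_IE = np.clip(rng.normal(1.0, 0.1, (n_i, n_e)), 0.0, None)
mask = rng.random((n_in, n_e)) < 0.05            # keep 5% of the input weights
W_in = mask * rng.normal(0.0, 0.5, (n_in, n_e))
x = (rng.random((T, n_in)) < 0.08).astype(float)  # 80 Hz Poisson at dt = 1 ms
spikes_e, spikes_i = simulate_cuba_ping(x, W_in, W_EI, W_IE)
print("mean E rate (Hz):", 1000.0 * spikes_e.mean())
print("mean I rate (Hz):", 1000.0 * spikes_i.mean())
```

The rates this prints depend on the assumed step ordering and weight scale, so they are not expected to reproduce the figures exactly.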

Both arms (full PING and the E-only ablation) go through oscilloscope train --model cuba-ping ... and --model cuba-noping ... respectively; recipe, optimiser, surrogate gradient, TBPTT window, batching, and seed all match. The output layer is a non-spiking LIF integrator ($\tau_\text{out} = 20$ ms) whose time-averaged membrane potential serves as the logits, so silent hidden activity gives a flat output and no class signal. The hidden layer is forced to spike for the readout to carry any information.
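The mem-mean readout can be sketched in a few lines. This is an illustration under assumptions rather than the oscilloscope code: the function name `mem_mean_logits` is hypothetical, and the output integrator is assumed to use the same Euler update as the hidden layer with unit resistance, no threshold, and no reset.

```python
import numpy as np


def mem_mean_logits(spikes_e, W_out, dt=1.0, tau_out=20.0):
    """Time-averaged membrane of a non-spiking leaky integrator.

    spikes_e: (T, N_E) hidden spikes; W_out: (N_E, n_classes).
    """
    alpha = dt / tau_out
    u = np.zeros(W_out.shape[1])
    u_sum = np.zeros_like(u)
    for s_t in spikes_e:
        # Leak plus spike-driven input; the output cell never fires or resets.
        u = u + alpha * (-u + s_t @ W_out)
        u_sum += u
    return u_sum / len(spikes_e)


# A silent hidden layer gives flat (all-zero) logits, hence no class signal.
silent = np.zeros((200, 4))
W_out = np.eye(4, 3)
print(mem_mean_logits(silent, W_out))
```

Because the only input term is `s_t @ W_out`, the logits can move away from zero only when hidden cells actually spike, which is the property the paragraph above relies on.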

Results

Untrained dynamics

Fixed weights, no training. Spatially uniform Poisson input at 80 Hz per channel for the full 200 ms window.

Figure 1. CUBA-PING dynamics under uniform Poisson input — no training
Two-panel raster of CUBA-PING at uniform 80 Hz Poisson input. E neurons fire in sparse, time-locked bursts; I neurons fire in narrow synchronous bursts immediately following each E burst.

Spike raster from one trial with fixed random weights and uniform 80 Hz Poisson input on every channel. E above (black), I below (red). I shows narrow synchronous bursts at gamma cadence; E bursts immediately precede each I burst.

Recruitment threshold, inhibitory clamp on the E rate, periodic E→I bursts: the architectural signatures of PING survive without any synaptic filter. The cycle period is set jointly by $\tau_m$ and the time it takes the input drive to push enough E cells back above threshold for the next round.

Training CUBA-PING

$W_\text{in}$ and $W_\text{out}$ trainable, $W^{EI}$ and $W^{IE}$ frozen. Surrogate-gradient BPTT on Poisson-encoded MNIST. Adam at lr $2 \times 10^{-3}$, batch 64, mem-mean output readout, TBPTT window $K = 10$. The $K = 10$ window is load-bearing: full BPTT diverges (see BPTT stability below).

Figure 2. Training curves — CUBA-PING with output-LIF readout + TBPTT (K=10)
Three panels: training cross-entropy loss decreasing, test accuracy climbing past chance, and hidden E (black) / I (red) firing rates climbing slowly.

Cross-entropy loss (left), test accuracy (middle), and mean hidden E (black) / I (red) rates (right). The I-loop suppresses E to ≈1 Hz while the readout drives accuracy upward. Headline numbers above reflect the run that produced these figures.

Figure 3. Trained CUBA-PING — test trial raster
Two-panel raster of the trained network replayed on a single test trial. E neurons fire sparsely; I neurons fire in periodic bursts.

One MNIST test sample replayed on the trained network. Untrained PING dynamics (Figure 1) survive — E spikes are sparse and digit-selective; I bursts at gamma cadence.

CUBA-no-PING ablation

The interesting question is why CUBA-PING ends up at E rates of only about 1 Hz. Two candidates:

  1. The I-loop clamps the rate. Inhibitory feedback suppresses E whenever it gets recruited; the optimiser finds a sparse-spike solution. PING is necessary.
  2. The mem-mean readout doesn’t need spikes. A continuous membrane readout can classify from $V^E$ patterns directly. PING is incidental: any CUBA LIF stack would do.

The control: a CUBA no-PING network (E-only, no $W^{EI}$, no $W^{IE}$, no I population), same recipe as above.

$$V^E_{t+1} = V^E_t + \tfrac{\Delta t}{\tau_m}\!\left(-(V^E_t - V_\text{rest}) + R_m\, x_t W_\text{in}\right), \quad s^E_t = \mathbb{1}[V^E_t \geq V_\text{th}]$$

Figure 4. Training curves — CUBA-no-PING (E-only)
Three panels: training cross-entropy loss decreasing fast, test accuracy climbing past 80%, and hidden E rate climbing to tens of Hz.

Same axes as Figure 2 minus the I-loop. The E rate climbs immediately into the tens of Hz; accuracy converges within a handful of epochs.

Figure 5. Trained CUBA-no-PING — test trial E raster
Single-panel raster of the trained CUBA-no-PING network on one MNIST test trial.

E spike raster for one MNIST test trial after training. No I population to render. Compare to Figure 3 — the trained network is dense-spiking across the trial rather than sparsely bursting.

Figure 6. CUBA-PING vs CUBA-no-PING — side-by-side
Two panels: test accuracy and hidden E firing rate for both arms over training.

Black: CUBA-PING. Red: CUBA-no-PING. The rate curves diverge by an order of magnitude; the accuracy gap is bounded.

Figure 7. Headline numbers — final accuracy and E rate
Two bar-chart panels comparing final test accuracy and final hidden-E firing rate for CUBA-PING (black) and CUBA-no-PING (red).

Final test accuracy (left) and mean hidden-E firing rate (right) at the end of training. The same axes as Figure 6 collapsed to single numbers — the rate-accuracy trade in headline form.

Both effects of the I-loop show up cleanly:

  1. The rate clamp is real and strong. Removing the I-loop lets the E rate climb into the tens of Hz, biologically high but within the snnTorch / rate-coded SNN regime. The I-loop in place holds the rate near 1 Hz: an order-of-magnitude sparsification with no $L_\text{rate}$ regulariser.
  2. The sparsification has a real but bounded accuracy cost. Roughly 4 pp at large tier. The output-LIF readout integrates spike events, so suppressing E spikes throttles the output membrane response; the network still has enough integrated signal to classify, just less than the dense-spike control.

This is the same tradeoff the COBA-PING baselines in nb025 Figure 5 walk via the $\theta_u$ spike-budget term, except that CUBA-PING walks it by the architectural inhibitory clamp rather than by an explicit penalty.

BPTT stability

The single design choice in the training section that takes some justification is the TBPTT window. With $K = 10$ the network trains; with $K = T = 200$ (full BPTT) gradients overflow to NaN within the first batch.

The recurrent Jacobian

The loss depends on $\{s^E_t\}$ via the output-LIF integrator. By the chain rule the gradient on $W_\text{in}$ accumulates $\partial L / \partial V^E_t$ at every step, and that backward signal routes through $s^E$ and $s^I$ via the I-loop even though the forward readout doesn’t.

One E→I→E round trip:

$$V^E_t \xrightarrow{\text{spike}} s^E_t \xrightarrow{V^I \text{ update}} V^I_{t+1} \xrightarrow{\text{spike}} s^I_{t+1} \xrightarrow{V^E \text{ update}} V^E_{t+2}$$

Four Jacobian links: surrogate at E (bounded by $\sigma$, the surrogate slope), E→I drive ($\alpha R_m\, W^{EI}$ with operator norm $\alpha R_m \|W^{EI}\|$), surrogate at I (bounded by $\sigma$), I→E suppression ($\alpha R_m\, W^{IE}$ with operator norm $\alpha R_m \|W^{IE}\|$). Here $\alpha = \Delta t / \tau_m = 0.05$ is the step factor from the membrane update. Multiplying:

$$\left\|\frac{\partial V^E_{t+2}}{\partial V^E_t}\right\|_{\text{loop}} \leq (\alpha R_m)^2\, \sigma^2\, \|W^{EI}\|\, \|W^{IE}\|$$

For random matrices with entries $\mathcal{N}(\mu, \sigma_w)$ and $\mu \gg \sigma_w$, the spectral norm is dominated by the rank-one mean component $\mu \mathbf{1}\mathbf{1}^\top$ and is well-approximated by $\mu \sqrt{N_\text{pre} N_\text{post}} + O(\sigma_w (\sqrt{N_\text{pre}} + \sqrt{N_\text{post}}))$:

  • $W^{EI}$, $W^{IE}$ at $\mathcal{N}(1, 0.1)$, shape $1024 \times 256$ or transpose: $\|W\| \approx 1 \cdot \sqrt{1024 \cdot 256} = 512$, with a fluctuation term of order $0.1\,(\sqrt{1024} + \sqrt{256}) \approx 5$.
  • With $\alpha R_m = 0.05$ and taking the surrogate slope $\sigma = 1$: $(\alpha R_m)^2 \sigma^2 \|W^{EI}\| \|W^{IE}\| \approx (0.05)^2 \cdot 512 \cdot 512 \approx 655$.

The worst-case bound is far above unity at this recipe, so nothing in it prevents the backward signal from growing on every round trip. Over $T = 200$ steps the compound overflows float32 well before the trial ends, so full-horizon BPTT is unreliable in practice.
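The norm estimates above can be checked numerically. The sketch below draws matrices with the stated statistics, measures their largest singular values directly with `numpy.linalg.norm(W, 2)`, and forms the loop bound. It assumes that $\mathcal{N}(1, 0.1)$ denotes mean and standard deviation, that $\alpha = \Delta t / \tau_m$, and that the surrogate slope is $\sigma = 1$; none of these is confirmed by the oscilloscope code, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_e, n_i = 1024, 256
alpha, r_m = 1.0 / 20.0, 1.0   # alpha = dt / tau_m with dt = 1 ms, tau_m = 20 ms
sigma = 1.0                    # assumed surrogate slope (not stated in the text)

# Frozen loop weights: N(1, 0.1) entries, clamped non-negative.
W_EI = np.clip(rng.normal(1.0, 0.1, (n_e, n_i)), 0.0, None)
W_IE = np.clip(rng.normal(1.0, 0.1, (n_i, n_e)), 0.0, None)

norm_EI = np.linalg.norm(W_EI, 2)   # ord=2 on a matrix: largest singular value
norm_IE = np.linalg.norm(W_IE, 2)
loop_bound = (alpha * r_m) ** 2 * sigma ** 2 * norm_EI * norm_IE

print(f"||W_EI|| = {norm_EI:.1f}, ||W_IE|| = {norm_IE:.1f}")
print(f"loop bound = {loop_bound:.1f}")
```

Because the entries have a non-zero mean, the measured norm is set almost entirely by the rank-one mean component, $\mu \sqrt{N_\text{pre} N_\text{post}}$, rather than by the random fluctuations.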

Why truncated BPTT works

TBPTT with window $K$ caps the per-cycle compound to $K/2$ round trips instead of $T/2$. The per-cycle factor raised to $K/2 = 5$ is finite and well within float32. More importantly, the consistently-signed suppression signal is now summed over $K = 10$ steps instead of $T = 200$, so its magnitude is comparable to the readout signal and the optimiser sees a balanced gradient.
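The compounding argument is plain arithmetic and can be tabulated. The sketch below asks how many E→I→E round trips it takes for a given per-cycle factor to exceed the float32 range, and compares that with the $T/2 = 100$ round trips of full BPTT and the $K/2 = 5$ of the truncated window. The helper names and the example gain values are illustrative assumptions, not measured quantities.

```python
import math

FLOAT32_MAX = 3.4028234663852886e38  # largest finite float32 value


def compound(gain, round_trips):
    """Worst-case growth after a number of E->I->E round trips."""
    return gain ** round_trips


def first_overflow(gain):
    """Smallest number of round trips whose compound exceeds float32, or None."""
    if gain <= 1.0:
        return None  # a factor at or below 1 never grows
    return math.floor(math.log(FLOAT32_MAX) / math.log(gain)) + 1


T, K = 200, 10
for gain in (0.9, 3.0, 30.0):  # illustrative per-cycle factors
    n = first_overflow(gain)
    print(f"gain={gain}: overflow after {n} round trips "
          f"(full BPTT has {T // 2}, TBPTT has {K // 2}); "
          f"gain^(K/2) = {compound(gain, K // 2):.3g}")
```

Any per-cycle factor above 1 overflows eventually, but within five round trips even a factor of 30 stays many orders of magnitude inside the float32 range.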

The cost is bias. TBPTT computes the gradient as if each window were independent. This is wrong: $V^E$ at step 10 really does depend on $V^E$ at step 1 via membrane decay. For 200 ms MNIST trials this bias is negligible; for sequence tasks with genuine long-range structure (language modelling, sequential MNIST), TBPTT would underestimate long-range gradients and systematically miss those features. Calling it “truncated” is a bit unfair: the truncation isn’t a bug, it’s the regulariser that lets you train at all. Local-BPTT or gradient horizon limit would be a more honest name.

Discussion

The instant-synapse CUBA-PING has the right dynamics without any synaptic filtering (Figure 1), is trainable when (a) the readout is an output-LIF mem-mean that requires hidden spikes to flow and (b) the BPTT horizon is limited to a window shorter than $\tau_m$ ($K = 10$ steps of 1 ms against $\tau_m = 20$ ms), and is ablation-checked: the I-loop lowers the E rate by an order of magnitude at a bounded ≈4 pp accuracy cost. Both arms go through the same oscilloscope code path, so the only difference between Figures 2/3 and Figures 4/5 is the I population’s presence.

Next steps

  1. Scale to full MNIST. Training loss is still decreasing at the end of the medium-tier run, so the operating point is not yet converged. Full MNIST + more epochs should close the gap to the COBA-PING baselines.
  2. Add an explicit $L_\text{rate}$ like nb025’s, exposing a $\theta_u$ knob that lets you slide along the rate–accuracy frontier on top of what the architecture already gives.
  3. Replace TBPTT with a principled stabiliser (spectral normalisation of $W^{EI}$ / $W^{IE}$, or a Jacobian-norm regulariser) that lets full-horizon BPTT through and closes the COBA-PING gap from below.
  4. Add a refractory window ($\approx 1.5$ ms for E, $\approx 0.5$ ms for I, matching the COBANet baselines). The current model has none, so the firing-rate ceiling is set only by the step size (one spike per step, 1000 Hz at $\Delta t = 1$ ms), which likely contributes to the no-PING control’s ≈70 Hz E rate. With refractory in place that ceiling drops to $1/\tau_\text{ref} \approx 667$ Hz for E cells; the no-PING upper rate would compress, narrowing the rate gap to the PING arm without changing the qualitative comparison.
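The ceiling arithmetic in the last item can be written out as a small helper. This is a sketch under a simplifying assumption: the refractory window is treated in continuous time, ignoring how a real implementation would round it to whole simulation steps, and the function name `rate_ceiling_hz` is hypothetical. In a discrete-time model a cell emits at most one spike per step, so the step size bounds the rate whenever it is longer than the refractory window.

```python
def rate_ceiling_hz(dt_ms, tau_ref_ms=0.0):
    """Upper bound on firing rate: one spike per max(step, refractory window)."""
    return 1000.0 / max(dt_ms, tau_ref_ms)


print(rate_ceiling_hz(1.0))        # no refractory period: limited by the 1 ms step
print(rate_ceiling_hz(1.0, 1.5))   # E cells with a 1.5 ms refractory window
print(rate_ceiling_hz(1.0, 0.5))   # I cells: the 1 ms step is still the binding limit
```

Under this simplification the proposed 0.5 ms window for I cells would not change their ceiling at $\Delta t = 1$ ms; only the E-cell window is longer than one step.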