048 — Trained PING streams sequential digits without retraining

Abstract

Can a network trained one digit at a time on $T = 200$ ms trials classify a stream of digits presented for $\tau \ll T$ each, without retraining? At $\tau = 50$ ms (≈ 2 gamma cycles) all five sequential MNIST digits in the demo stream classify correctly, with cycle-locked sparse spiking and a readout that flips within one cycle of each transition. A 2D sweep over $(\tau, \text{input rate})$ shows a sub-cycle failure floor at $\tau \lesssim 15$ ms and a broad high-accuracy plateau above it, where presentation time and input drive trade off along iso-accuracy diagonals.

Methods

Networks. Canonical nb025 PING baselines with $\theta_u$ off, three seeds (42, 43, 44), medium tier (1600 train / 400 test, 100 epochs). All runs are inference-only at the trained $\Delta t = 0.1$ ms; the 2D sweep averages over the three seeds, and the single-trial demos use seed 42.

Stream construction. Sample $n$ MNIST test digits, encode each as a Poisson spike train for $\tau$ ms at the given input rate, and concatenate along time. Digit transitions are hard switches: the Poisson rate flips instantaneously at $t = k\tau$. The network processes the full stream in one forward pass. The 2D sweep covers $\tau \in \{10, 15, 25, 40, 50, 75, 100, 200\}$ ms × input rate $\in \{5, 10, 25, 50, 100, 200\}$ Hz per channel, with 40 streams × 10 digits per cell and seed (400 segments per seed, 1200 segments per cell across the 3 seeds).
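The stream construction described above can be sketched in NumPy. This is a minimal illustration, not the notebook's encoder: the function name `poisson_stream`, the Bernoulli-per-step approximation of a Poisson process, and the choice to scale the per-channel rate by pixel intensity are all assumptions, and random arrays stand in for MNIST images.

```python
import numpy as np

def poisson_stream(images, tau_ms, rate_hz, dt_ms=0.1, rng=None):
    """Concatenate per-digit Poisson spike trains with hard switches.

    images: array (n_digits, n_pixels) with intensities in [0, 1].
    Returns a boolean array of shape (n_digits * steps_per_digit, n_pixels).
    """
    rng = np.random.default_rng() if rng is None else rng
    images = np.asarray(images, dtype=float)
    steps = int(round(tau_ms / dt_ms))        # time steps per digit
    # Assumed encoding: per-step spike probability = intensity * rate * dt.
    p = images * rate_hz * dt_ms * 1e-3       # shape (n_digits, n_pixels)
    # Each segment is drawn independently, so the rate flips at t = k * tau.
    spikes = rng.random((len(images), steps, images.shape[1])) < p[:, None, :]
    return spikes.reshape(len(images) * steps, images.shape[1])

rng = np.random.default_rng(0)
fake_digits = rng.random((5, 784))            # stand-in for 5 MNIST test images
stream = poisson_stream(fake_digits, tau_ms=50, rate_hz=25, rng=rng)
print(stream.shape)  # (2500, 784)
```

Five digits at 50 ms each and $\Delta t = 0.1$ ms give 2500 time steps, matching the 250 ms headline stream.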

Readout. Same output LIF, same $W_\text{out}$, same $\beta_\text{out} = \exp(-\Delta t / \tau_\text{out})$ with $\tau_\text{out} = 2$ ms. The only difference from training: a sliding-window mean of width $\tau$ ms replaces the full-trial integration.

$$
v_\text{out}(t) = \beta_\text{out}\, v_\text{out}(t-1) + \frac{1-\beta_\text{out}}{\Delta t}\, \mathbf{s}^E(t-1)\, W_\text{out}, \qquad \text{logits}(t) = \frac{1}{w}\sum_{u=t-w+1}^{t} v_\text{out}(u)
$$

Here $w = \tau / \Delta t$ is the window width in time steps. The per-segment prediction is read at the end of each $\tau$-window.
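The readout recursion and the sliding-window mean can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: the function names are hypothetical, $v_\text{out}(0) = 0$ is assumed, the first window of the stream is divided by $w$ even though it contains fewer than $w$ steps, and the prediction is taken as the argmax of the logits. The toy input uses two hidden units with identity weights instead of a trained network.

```python
import numpy as np

def sliding_window_logits(spikes_e, w_out, tau_ms, dt_ms=0.1, tau_out_ms=2.0):
    """Leaky output integration followed by a causal moving average.

    spikes_e: (n_steps, n_hidden) hidden E spikes (0/1).
    w_out:    (n_hidden, n_classes) readout weights.
    Returns logits of shape (n_steps, n_classes).
    """
    beta = np.exp(-dt_ms / tau_out_ms)
    drive = (1.0 - beta) / dt_ms * (np.asarray(spikes_e, dtype=float) @ w_out)
    v = np.zeros_like(drive)                  # assumed initial state v(0) = 0
    for t in range(1, len(drive)):
        # v(t) = beta * v(t-1) + (1 - beta)/dt * s_E(t-1) W_out
        v[t] = beta * v[t - 1] + drive[t - 1]
    w = int(round(tau_ms / dt_ms))            # window width in steps
    csum = np.vstack([np.zeros((1, v.shape[1])), np.cumsum(v, axis=0)])
    idx = np.arange(1, len(v) + 1)
    lo = np.maximum(idx - w, 0)
    # Sum of v over steps t-w+1 .. t, divided by w as in the formula.
    return (csum[idx] - csum[lo]) / w

def segment_predictions(logits, tau_ms, dt_ms=0.1):
    """Argmax of the logits at the last step of each tau-window."""
    w = int(round(tau_ms / dt_ms))
    ends = np.arange(w, len(logits) + 1, w) - 1
    return logits[ends].argmax(axis=1)

steps = 100                                   # 10 ms per segment at dt = 0.1 ms
spikes = np.zeros((2 * steps, 2))
spikes[:steps, 0] = 1                         # segment 1 drives the class-0 unit
spikes[steps:, 1] = 1                         # segment 2 drives the class-1 unit
logits = sliding_window_logits(spikes, np.eye(2), tau_ms=10)
preds = segment_predictions(logits, tau_ms=10)
```

In this toy case the prediction switches from class 0 to class 1 at the second read-out, because the window at the end of segment 2 no longer contains any segment-1 steps and the 2 ms leak has discharged the old class.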

Results

Headline: 5 digits at $\tau = 50$ ms

Figure: four-panel column showing streaming digit classification on trained PING. Top: five MNIST digits (5, 6, 0, 7, 3) as thumbnails, each labelled with its true and predicted class. Second panel: hidden E raster across 250 ms, with gamma cycles at ≈ 28 ms cadence; vertical dotted lines mark the segment boundaries. Third panel: hidden I raster, also at gamma cadence. Bottom panel: 10-class readout probability over time; the true-class trace is drawn thick in its colour, reaches near 1.0 within each segment, and switches rapidly at each digit boundary.

Caption: Five sequential digits (5, 6, 0, 7, 3) at $\tau = 50$ ms each (≈ 2 gamma cycles per digit, 250 ms total). Gamma cadence (≈ 28 ms) is preserved across the stream. The readout flips to the new digit within one cycle of each transition and reaches near 1.0 by the segment’s end. **5/5 correct.**

Varying $(\tau, \text{rate})$ stream

A stronger test: vary both knobs within a single stream so each digit gets its own duration and input rate.

Figure: four-panel column. Top: five MNIST digits across 450 ms, each segment labelled with its own (τ, rate) and a class-prediction badge. Digit thumbnails are drawn with opacity scaled to input rate (log-mapped; low rate → faint, high rate → bold) so the drive magnitude is visually encoded. Segments from left to right: (200 ms, 10 Hz) → digit 5; (50 ms, 100 Hz) → digit 3; (100 ms, 25 Hz) → digit 4; (25 ms, 200 Hz) → digit 1; (75 ms, 15 Hz) → digit 7. All five predictions are correct. Hidden E and I rasters show gamma cycles throughout, with sparser firing during the weak-drive segments and denser firing during the strong-drive ones. Readout probability traces show clean class identification per segment.

Caption: **5/5** with $(\tau, \text{rate})$ varying *within* a single stream (durations 25–200 ms, rates 10–200 Hz). Thumbnail opacity ∝ input rate (faint = weak drive, bold = strong). The sliding window uses each segment’s own $\tau$, so each digit’s prediction respects its presentation window.
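The bookkeeping for a stream with per-segment $(\tau, \text{rate})$ can be sketched as below: hard-switch times, read-out times, and each segment's own window width in steps. The segment list is copied from the figure description; the variable names are illustrative only.

```python
from itertools import accumulate

# (tau_ms, rate_hz) per segment, as listed in the figure description.
segments = [(200, 10), (50, 100), (100, 25), (25, 200), (75, 15)]
dt_ms = 0.1  # simulation step stated in Methods

ends_ms = list(accumulate(tau for tau, _ in segments))  # read-out times
starts_ms = [0] + ends_ms[:-1]                          # hard-switch times

for (tau, rate), t0, t1 in zip(segments, starts_ms, ends_ms):
    w = round(tau / dt_ms)  # sliding-window width in steps for this segment
    print(f"{t0:3d}-{t1:3d} ms  rate={rate:3d} Hz  window={w} steps")

total_ms = ends_ms[-1]
```

The cumulative durations end at 450 ms, matching the figure's time axis, and the shortest segment (25 ms) is read out with a 250-step window.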

Accuracy across the $(\tau, \text{rate})$ grid

To map the operating regime quantitatively, sweep $\tau$ and input rate independently and report per-segment accuracy.

Figure: heatmap of per-segment accuracy across τ ∈ {10, 15, 25, 40, 50, 75, 100, 200} ms (x-axis) and input rate ∈ {5, 10, 25, 50, 100, 200} Hz (y-axis), 48 cells total, each pooling 1200 segments over the 3 trained seeds. Magma colormap from 0 to 100%. Lower-left corner (very short τ, low rate): 14% at (10 ms, 5 Hz). Upper-right corner (long τ, high rate): 91% at (200 ms, 200 Hz). The high-accuracy region is broad and its contour is roughly hyperbolic in (τ, rate), i.e. it follows constant τ × rate. The trained baseline cell at (200 ms, 25 Hz) sits at 88%. A sub-cycle failure regime is visible at τ ≤ 15 ms, where even the strongest input fails to reach 80%.

Caption: 8 × 6 = 48 $(\tau, \text{input rate})$ cells, 1200 segments per cell (3 seeds × 400). Extended down to $\tau = 10$ ms (≈ 0.36 of a gamma cycle) to resolve the sub-cycle failure regime.
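A skeleton of the sweep makes the cell and segment counts explicit. The grid values, seeds, and stream counts come from Methods; `run_cell` is a hypothetical placeholder for building streams and running one trained seed, and pooling correct counts over seeds is an assumed way of averaging (it equals the mean of per-seed accuracies when every seed contributes the same number of segments).

```python
from itertools import product

taus_ms = [10, 15, 25, 40, 50, 75, 100, 200]
rates_hz = [5, 10, 25, 50, 100, 200]
seeds = [42, 43, 44]
streams_per_cell, digits_per_stream = 40, 10

def run_cell(tau_ms, rate_hz, seed):
    """Hypothetical placeholder: would build the streams, run the trained
    network for `seed`, and return the number of correct segments."""
    raise NotImplementedError

cells = list(product(taus_ms, rates_hz))
segments_per_seed = streams_per_cell * digits_per_stream
segments_per_cell = segments_per_seed * len(seeds)

def cell_accuracy(tau_ms, rate_hz, run=run_cell):
    # Pool correct counts over seeds, then divide by the pooled segment count.
    correct = sum(run(tau_ms, rate_hz, s) for s in seeds)
    return correct / segments_per_cell
```

Passing a real evaluation function as `run` would fill one heatmap cell per `(tau_ms, rate_hz)` pair in `cells`.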

Three observations:

Sub-cycle failure below $\tau \approx 15$ ms. At $\tau = 10$ ms (≈ 0.36 of a cycle) even 200 Hz input reaches only 59%, well above the 10% chance level for ten classes but far below the plateau: the network cannot classify reliably from a fraction of a cycle, regardless of drive. This is the cleanest evidence that the gamma cycle is the temporal quantum of the trained network’s classification ability.
Above one cycle, accuracy ≈ $f(\tau \cdot \text{rate})$. Iso-accuracy contours run diagonally in log-log space; $\tau$ and input rate substitute for each other once the cycle quantum is cleared. Boosting the input rate compensates for a short presentation time and vice versa.
The trained $(200 \text{ ms}, 25 \text{ Hz})$ cell at 88% is interior to the plateau: the best corner cell, $(200 \text{ ms}, 200 \text{ Hz})$ at 91%, gains only 3 pp. The trained operating point therefore has substantial margin along both axes.
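One way to read the $\tau \cdot \text{rate}$ dependence noted above is as the expected number of input spikes a channel delivers per segment. The sketch below groups grid cells by that product and converts $\tau$ to gamma cycles using the ≈ 28 ms cadence from the rasters. It only computes input statistics and does not predict accuracy; the helper names are illustrative, and the per-channel count assumes a channel driven at the full nominal rate.

```python
from collections import defaultdict

gamma_period_ms = 28.0   # approximate cadence reported for the rasters
taus_ms = [10, 15, 25, 40, 50, 75, 100, 200]
rates_hz = [5, 10, 25, 50, 100, 200]

def expected_spikes(tau_ms, rate_hz):
    """Mean Poisson count for one channel driven at rate_hz for tau_ms."""
    return rate_hz * tau_ms / 1000.0

def cycles(tau_ms):
    """Presentation time expressed in gamma cycles."""
    return tau_ms / gamma_period_ms

# Cells sharing a tau * rate product lie on the same log-log diagonal.
diagonals = defaultdict(list)
for tau in taus_ms:
    for rate in rates_hz:
        diagonals[round(expected_spikes(tau, rate), 6)].append((tau, rate))

print(diagonals[5.0])   # cells on the diagonal through the trained point
```

The trained cell (200 ms, 25 Hz) shares its product of 5 expected spikes per channel with (100 ms, 50 Hz), (50 ms, 100 Hz), and (25 ms, 200 Hz), whereas (10 ms, 200 Hz) delivers only 2 and spans about 0.36 of a cycle.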

Discussion

The headline figure makes four claims simultaneously visible: sparseness in the E raster, cycle locking in the I-burst clock, class-tracking readout probabilities, and re-identification within ≈ 1 cycle of each digit switch. The trained PING dynamics carry per-cycle class information, and the sliding-window readout — the only change from the training-time readout — surfaces that information at the per-segment timescale.

For the rate-floor story in ar009 / ar010: the architecture’s per-cell rate is bounded by $r_E = p \cdot f_\gamma$ (nb041, nb046). The grid says that with enough drive the network can classify within roughly one cycle — per-cycle information density is sufficient for the task. The bound is tight but not crippling.
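For a sense of scale, the bound $r_E = p \cdot f_\gamma$ can be evaluated with the ≈ 28 ms cadence seen in the rasters. In this sketch $p$ is read as the probability that an E cell fires in a given cycle; that reading is an assumption, since $p$ is defined in nb041 / nb046 and not in this section, and the example values of $p$ are purely illustrative.

```python
def gamma_frequency_hz(period_ms):
    """Cycle frequency in Hz for a given cycle period in ms."""
    return 1000.0 / period_ms

def rate_bound_hz(p, period_ms=28.0):
    """r_E = p * f_gamma, with p assumed to be a per-cycle firing probability."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p is treated as a probability in [0, 1]")
    return p * gamma_frequency_hz(period_ms)

f_gamma = gamma_frequency_hz(28.0)   # about 35.7 Hz
for p in (0.05, 0.1, 0.5, 1.0):      # illustrative values only
    print(f"p={p:4.2f}  r_E <= {rate_bound_hz(p):5.2f} Hz")
```

Under this reading, a cell that participates in one cycle out of ten is capped at roughly 3.6 Hz.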

Two caveats:

Sliding-window ≠ trained readout. Same output LIF and $W_\text{out}$, only the integration window changes. A streaming-specific retrain could do better; the point here is that the already-trained network is functional in this regime without any new gradient.
Hard switches, not cross-fades. Real continuous-input regimes (speech, video) blend transitions; we don’t test that.

Next steps

Train on streaming data. Does a network trained with $\tau = 50$ ms segments and a sliding readout outperform post-hoc streaming on the medium-tier baseline?
Cross-fade transitions. Replace hard switches with 10-ms blends; quantify the network’s transient-response time directly.
$\tau \times \tau_\text{GABA}$ interaction. If $\tau_\text{GABA}$ changes the cycle period (per nb041), does the streaming sweet-spot scale linearly with it?
COBA control. Run the same protocol on a trained COBA baseline (no gamma cycle) to ask whether anything here is PING-specific, or just a property of any trained mem-mean readout.
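The cross-fade experiment proposed above needs a time-varying rate profile in place of the hard switch. A minimal sketch follows, assuming a linear blend centred on each switch time and per-pixel rates equal to intensity times the nominal rate; the blend shape, the function name, and that rate scaling are all assumptions, and the blend width must not exceed $\tau$ for the ramps to stay separate.

```python
import numpy as np

def crossfade_rates(images, tau_ms, rate_hz, blend_ms=10.0, dt_ms=0.1):
    """Per-step Poisson rates (Hz) with linear blends centred on each switch.

    images: (n_digits, n_pixels) intensities in [0, 1].
    Returns an array of shape (n_digits * steps_per_digit, n_pixels).
    """
    images = np.asarray(images, dtype=float)
    n, steps = len(images), int(round(tau_ms / dt_ms))
    t = (np.arange(n * steps) + 0.5) * dt_ms             # step midpoints in ms
    rates = np.repeat(images, steps, axis=0) * rate_hz   # hard-switch baseline
    for k in range(1, n):
        # Mixing weight ramps 0 -> 1 over blend_ms around the switch at k*tau.
        a = np.clip((t - (k * tau_ms - blend_ms / 2)) / blend_ms, 0.0, 1.0)
        mask = (a > 0) & (a < 1)
        mix = a[mask][:, None]
        rates[mask] = rate_hz * ((1 - mix) * images[k - 1] + mix * images[k])
    return rates

# Toy case: one pixel that is off in digit 0 and fully on in digit 1.
rates = crossfade_rates([[0.0], [1.0]], tau_ms=50, rate_hz=100)
```

The resulting rates could then be turned into spikes step by step, for example by comparing `rates * dt_ms * 1e-3` against uniform random draws, so the only change from the hard-switch protocol is the ramp around each boundary.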