048 — Trained PING streams sequential digits without retraining

Abstract

Can a network trained one digit at a time on T = 200 ms trials classify a stream of digits presented for τ ≪ T each, without retraining? At τ = 50 ms (≈ 2 gamma cycles) all five digits of a sequential MNIST stream are classified correctly, with cycle-locked sparse spiking and a readout that flips within one cycle of each transition. A 2D sweep over (τ, input rate) shows a sub-cycle failure floor at τ ≲ 15 ms. Above that floor lies a broad high-accuracy plateau on which presentation time and input drive trade off cleanly along iso-accuracy diagonals: a shorter presentation can be offset by stronger drive, and vice versa.

Methods

Networks. Canonical nb025 PING baselines at θ_u = off, three seeds (42, 43, 44), medium tier (1600 train / 400 test, 100 epochs). Inference only, at the trained Δt = 0.1 ms; the 2D sweep averages over the three seeds, and single-trial demos use seed 42.

Stream construction. Sample n MNIST test digits, encode each as a Poisson spike train lasting τ ms at the given input rate, and concatenate them along time. Digit transitions are hard switches: the Poisson rate flips instantaneously at t = kτ. The network processes the full stream in one forward pass. The 2D sweep covers τ ∈ {10, 15, 25, 40, 50, 75, 100, 200} ms × input rate ∈ {5, 10, 25, 50, 100, 200} Hz per channel, with 40 streams × 10 digits per seed in each cell (1200 segments per cell across the 3 seeds).
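A minimal sketch of the stream construction, in plain Python. Two details are assumptions rather than facts taken from the notebook: each channel's rate is the nominal input rate scaled by its pixel intensity in [0, 1], and Poisson spiking is approximated by an independent Bernoulli draw per Δt step with probability rate · Δt. The name `make_stream` is hypothetical. Passing one (τ, rate) pair per digit covers both the uniform sweep and the varying stream.

```python
import random

def make_stream(images, schedule, dt_ms=0.1, seed=0):
    """Concatenate per-digit spike segments with hard switches between digits.

    images:   flattened images, pixel intensities in [0, 1] (one per digit)
    schedule: (tau_ms, rate_hz) per digit; a constant list gives the uniform sweep
    Returns (spikes, boundaries): spikes[t] is a 0/1 list over input channels,
    boundaries[k] is the step index at which segment k ends.
    """
    if len(images) != len(schedule):
        raise ValueError("need one (tau_ms, rate_hz) pair per image")
    rng = random.Random(seed)
    spikes, boundaries = [], []
    for img, (tau_ms, rate_hz) in zip(images, schedule):
        n_steps = round(tau_ms / dt_ms)
        # Assumed encoding: channel rate = nominal rate * pixel intensity,
        # Bernoulli per step with P(spike) = rate * dt (dt converted to seconds).
        probs = [rate_hz * px * dt_ms / 1000.0 for px in img]
        for _ in range(n_steps):
            spikes.append([1 if rng.random() < p else 0 for p in probs])
        # Hard switch: the next digit's rates apply from this step onward.
        boundaries.append(len(spikes))
    return spikes, boundaries
```

With Δt = 0.1 ms and rates of at most 200 Hz, the per-step spike probability stays at or below 0.02, so the Bernoulli approximation is close to a true Poisson process.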

Readout. Same output LIF, same W_out, and same β_out = exp(−Δt/τ_out) with τ_out = 2 ms as in training. The only difference from training: a sliding-window mean of width τ ms replaces the full-trial integration.

$$v_\text{out}(t) = \beta_\text{out}\, v_\text{out}(t-1) + \frac{1-\beta_\text{out}}{\Delta t}\, \mathbf{s}^E(t-1)\, W_\text{out}, \qquad \text{logits}(t) = \frac{1}{w}\sum_{u=t-w+1}^{t} v_\text{out}(u)$$

Here w = τ/Δt is the window length in time steps. The per-segment prediction is the argmax of logits(t), read at the end of each τ-window.
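The readout equation can be sketched directly. This is an illustration under assumptions, not the notebook's code: spikes are plain 0/1 lists, `W_out` is a nested list of shape (N_E, 10), v_out(0) = 0, the window length is w = round(τ/Δt), and the function name `readout_predictions` is hypothetical.

```python
import math

def readout_predictions(spikes_E, W_out, boundaries, windows,
                        dt_ms=0.1, tau_out_ms=2.0):
    """Per-segment class predictions from a sliding-window output readout.

    spikes_E:   list over time steps of 0/1 lists, one entry per E cell
    W_out:      nested list, W_out[i][j] = weight from E cell i to class j
    boundaries: end step of each segment (t in the equation)
    windows:    window length w (in steps) to average over, per segment
    """
    beta = math.exp(-dt_ms / tau_out_ms)
    gain = (1.0 - beta) / dt_ms
    n_out = len(W_out[0])
    v = [0.0] * n_out
    v_hist = [v]                       # v_hist[t] holds v_out(t); v_out(0) = 0
    for s in spikes_E:                 # s is s^E(t-1) for the step being computed
        drive = [0.0] * n_out          # s^E(t-1) W_out
        for i, spk in enumerate(s):
            if spk:
                for j in range(n_out):
                    drive[j] += W_out[i][j]
        v = [beta * vj + gain * dj for vj, dj in zip(v, drive)]
        v_hist.append(v)
    preds = []
    for t, w in zip(boundaries, windows):
        # logits(t) = (1/w) * sum of v_out(u) for u = t-w+1 .. t
        window = v_hist[t - w + 1 : t + 1]
        logits = [sum(col) / w for col in zip(*window)]
        preds.append(max(range(n_out), key=lambda j: logits[j]))
    return preds
```

With Δt = 0.1 ms and τ_out = 2 ms, β_out = exp(−0.05) ≈ 0.951, so v_out forgets a spike within a few milliseconds; the τ-wide window, not the LIF, sets the effective integration time.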

Results

Headline: 5 digits at τ = 50 ms

Figure 1. Streaming digit classification at τ = 50 ms
Four-panel column figure showing streaming digit classification on trained PING. Top: five MNIST digits (5, 6, 0, 7, 3) shown as thumbnails, each labelled with the true and predicted class; all 5/5 correct. Second panel: hidden E raster across 250 ms, showing gamma cycles at ≈ 28 ms cadence; vertical dotted lines mark the segment boundaries. Third panel: hidden I raster, also at gamma cadence. Bottom panel: 10-class readout probability over time; the true-class trace is drawn thick in its own colour and reaches near 1.0 within each segment, with rapid transitions at each digit boundary.

Five sequential digits (5, 6, 0, 7, 3) at τ = 50 ms each (≈ 2 gamma cycles per digit, 250 ms total). Gamma cadence (≈ 28 ms) is preserved across the stream. The readout flips to the new digit within one cycle of each transition and reaches near 1.0 by the end of each segment. 5/5 correct.
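The notebook does not spell out how "within one cycle" is measured. One hypothetical way, sketched below: take a per-step predicted-class trace (for example, the argmax of the sliding-window logits at every step) and, for each digit switch, record the time until the prediction first equals the new true label. The function name and the trace format are illustrative assumptions; a latency under the ≈ 28 ms gamma period would match the one-cycle claim.

```python
def flip_latencies(pred_trace, boundaries, labels, dt_ms=0.1):
    """Latency (ms) from each digit switch to the first step whose prediction
    equals the new digit's label; None if it never does within the segment.

    pred_trace: predicted class at every time step (hypothetical format)
    boundaries: end step of each segment
    labels:     true class of each segment
    """
    starts = [0] + list(boundaries[:-1])
    latencies = []
    for k in range(1, len(labels)):          # first segment has no preceding switch
        start, end = starts[k], boundaries[k]
        hit = next((t for t in range(start, end) if pred_trace[t] == labels[k]),
                   None)
        latencies.append(None if hit is None else (hit - start) * dt_ms)
    return latencies
```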

Varying (τ, rate) stream

A stronger test: vary both knobs within a single stream so each digit gets its own duration and input rate.

Figure 3. Varying (τ, input rate) within a single stream
Four-panel column figure. Top: five MNIST digits across 450 ms, each segment labelled with its own (τ, rate) and a class-prediction badge. Digit thumbnails are drawn with opacity scaled to input rate (log-mapped, low rate → faint, high rate → bold) so the drive magnitude is visually encoded. Segments from left to right: (200 ms, 10 Hz) → digit 5; (50 ms, 100 Hz) → digit 3; (100 ms, 25 Hz) → digit 4; (25 ms, 200 Hz) → digit 1; (75 ms, 15 Hz) → digit 7. All five predictions correct. Hidden E and I rasters show gamma cycles throughout, with sparser firing during the weak-drive segments and denser firing during the strong-drive ones. Readout probability traces show clean class identification per segment.

5/5 correct with (τ, rate) varying within a single stream: durations 25–200 ms, rates 10–200 Hz. Thumbnail opacity ∝ input rate (faint = weak drive, bold = strong). The sliding window uses each segment's own τ, so each digit's prediction integrates over exactly its own presentation window.
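For the varying stream, each segment's readout point and window length follow from the schedule alone. The sketch below uses the Figure 3 durations (the helper name `readout_points` is hypothetical) and shows that each window ends at its own segment boundary and starts exactly at the previous one, so no window straddles a digit switch.

```python
from itertools import accumulate

def readout_points(durations_ms, dt_ms=0.1):
    """(end_step, window_steps) for each segment; window = that segment's own tau."""
    steps = [round(tau / dt_ms) for tau in durations_ms]
    return list(zip(accumulate(steps), steps))

# Figure 3 schedule: (tau in ms, input rate in Hz) per digit.
fig3 = [(200, 10), (50, 100), (100, 25), (25, 200), (75, 15)]
points = readout_points([tau for tau, _ in fig3])
print(points)
# [(2000, 2000), (2500, 500), (3500, 1000), (3750, 250), (4500, 750)]
```

The last end step, 4500 × 0.1 ms, recovers the 450 ms stream length.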

Accuracy across the (τ, rate) grid

To map the operating regime quantitatively, we sweep τ and input rate independently and report per-segment accuracy.

Figure 4. Accuracy heatmap across (τ, input rate)
Heatmap of per-segment accuracy across τ ∈ {10, 15, 25, 40, 50, 75, 100, 200} ms (x-axis) and input rate ∈ {5, 10, 25, 50, 100, 200} Hz (y-axis), 48 cells in total, each averaged over 1200 segments pooled from the 3 trained seeds. Magma colormap from 0 to 100%. Lower-left corner (very short τ, low rate): 14% at (10 ms, 5 Hz). Upper-right corner (long τ, high rate): 91% at (200 ms, 200 Hz). The high-accuracy region is broad, and its contours are roughly hyperbolic in τ × rate. The trained baseline cell at (200 ms, 25 Hz) sits at 88%. A sub-cycle failure regime is visible at τ ≤ 15 ms, where even the strongest input fails to reach 80%.

8 × 6 = 48 (τ, input rate) cells, 1200 segments per cell (40 streams × 10 digits × 3 seeds). The sweep extends down to τ = 10 ms (≈ 0.36 of a gamma cycle) to resolve the sub-cycle failure regime.
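The sweep's bookkeeping, as a short sketch. Reading "1200 segments per cell" as 40 streams × 10 digits per seed, pooled over three seeds, is an interpretation of the Methods text, and the 28 ms gamma period is the approximate cadence read off the rasters rather than a fitted constant. The sketch counts cells and segments and expresses each τ in gamma cycles.

```python
from itertools import product

TAUS_MS = [10, 15, 25, 40, 50, 75, 100, 200]
RATES_HZ = [5, 10, 25, 50, 100, 200]
SEEDS = [42, 43, 44]
STREAMS_PER_SEED, DIGITS_PER_STREAM = 40, 10
GAMMA_PERIOD_MS = 28.0   # approximate cadence seen in the E/I rasters

cells = list(product(TAUS_MS, RATES_HZ))             # 8 x 6 grid
segments_per_cell = len(SEEDS) * STREAMS_PER_SEED * DIGITS_PER_STREAM
cycles = {tau: tau / GAMMA_PERIOD_MS for tau in TAUS_MS}

print(len(cells), segments_per_cell)                 # 48 1200
for tau in TAUS_MS:
    print(f"tau = {tau:3d} ms  ->  {cycles[tau]:.2f} gamma cycles")
```

On this scale the two failing columns (τ = 10 and 15 ms) correspond to about 0.36 and 0.54 of a cycle.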

Three observations:

  1. Sub-cycle failure below τ ≈ 15 ms. At τ = 10 ms (≈ 0.36 cycle) even 200 Hz input reaches only 59%: when a digit is shown for well under one cycle, no drive in the swept range recovers reliable classification. This is the cleanest evidence that the gamma cycle is the temporal quantum of the trained network's classification ability.
  2. Above one cycle, accuracy ≈ f(τ · rate). Iso-accuracy contours run diagonally in log-log space, so once the cycle quantum is cleared, τ and input rate substitute for each other: a higher input rate compensates for a shorter presentation time, and vice versa.
  3. The trained (200 ms, 25 Hz) cell, at 88%, is interior to the plateau: the best corner cell, (200 ms, 200 Hz) at 91%, gains only 3 pp. The trained operating point therefore has substantial headroom in both directions.
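Observation 2 says accuracy depends on τ and rate mainly through their product, which is the expected number of input spikes per channel in one segment. The worked sketch below (helper names hypothetical) computes that budget for the trained cell and the input rate that would hold it fixed at shorter τ. It only restates the proposed τ · rate scaling; it is not a fit to the heatmap, and the scaling is claimed only above the ≈ 15 ms floor.

```python
def spike_budget(tau_ms, rate_hz):
    """Expected input spikes per channel in one segment: rate * tau."""
    return rate_hz * tau_ms / 1000.0

def matched_rate(tau_ms, ref=(200, 25)):
    """Input rate (Hz) that keeps tau * rate equal to the reference cell's product."""
    ref_tau_ms, ref_rate_hz = ref
    return ref_rate_hz * ref_tau_ms / tau_ms

print(spike_budget(200, 25))          # 5.0 spikes per channel at the trained cell
for tau in (100, 50, 25):
    print(tau, matched_rate(tau))     # 50.0, 100.0, 200.0 Hz respectively
```

The (50 ms, 100 Hz) and (25 ms, 200 Hz) segments of the varying stream sit on this same 5-spike diagonal. At τ = 10 ms, holding the budget would need 500 Hz, beyond the sweep's 200 Hz ceiling.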

Discussion

The headline figure makes four properties visible at once: sparse firing in the E raster, cycle locking in the I-burst clock, class-tracking readout probabilities, and re-identification within ≈ 1 cycle of each digit switch. The trained PING dynamics carry per-cycle class information, and the sliding-window readout (the only change from the training-time readout) surfaces that information at the per-segment timescale.

For the rate-floor story in ar009 / ar010: the architecture's per-cell rate is bounded by r_E = p · f_γ (nb041, nb046). The grid shows that, with enough drive, the network can classify within roughly one cycle, so per-cycle information density is sufficient for the task. The bound is tight but not crippling.
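A worked version of the rate bound, as a sketch. The symbol p is not defined in this notebook; reading it as the probability that an E cell fires in a given gamma cycle (at most one spike per cycle) is an assumption, as is the value p = 0.2 used for illustration. Under that reading, the expected number of spikes one cell contributes to a segment is r_E · τ = p · τ/T_γ, that is, p times the number of cycles the digit is shown, which is why the per-segment budget is naturally counted in cycles.

```python
GAMMA_PERIOD_MS = 28.0                     # approximate cadence from the rasters
F_GAMMA_HZ = 1000.0 / GAMMA_PERIOD_MS      # ~35.7 Hz

def rate_ceiling_hz(p):
    """r_E = p * f_gamma, with p read as a per-cycle firing probability (assumed)."""
    return p * F_GAMMA_HZ

def spikes_per_cell(p, tau_ms):
    """Expected E spikes per cell in one segment: r_E * tau = p * (tau / T_gamma)."""
    return rate_ceiling_hz(p) * tau_ms / 1000.0

p = 0.2                                    # hypothetical illustration value
print(f"{rate_ceiling_hz(p):.2f} Hz")      # 7.14 Hz
print(f"{spikes_per_cell(p, 50):.3f}")     # 0.357 spikes per cell at tau = 50 ms
```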

Two caveats:

  • Sliding window ≠ trained readout. Same output LIF and W_out; only the integration window changes. A streaming-specific retrain could do better; the point here is that the already-trained network is functional in this regime without any new gradient steps.
  • Hard switches, not cross-fades. Real continuous-input regimes (speech, video) blend transitions; we don’t test that.

Next steps

  • Train on streaming data. Does a network trained on τ = 50 ms segments with a sliding readout outperform post-hoc streaming on the medium-tier baseline?
  • Cross-fade transitions. Replace hard switches with 10-ms blends; quantify the network’s transient-response time directly.
  • τ × τ_GABA interaction. If τ_GABA changes the cycle period (per nb041), does the streaming sweet spot scale linearly with it?
  • COBA control. Run the same protocol on a trained COBA baseline (no gamma cycle) to ask whether anything here is PING-specific, or just a property of any trained mem-mean readout.
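For the cross-fade follow-up, one possible rate schedule is a linear blend of the two digits' per-channel rates around the switch. This is a sketch under assumptions: the blend is centred on the boundary, the ramp is linear, and rates scale with pixel intensity. None of these choices come from the notebook, and `crossfade_rates` is a hypothetical name. Setting `blend_ms = 0` recovers the hard switch used throughout.

```python
def crossfade_rates(img_a, img_b, rate_hz, t_ms, boundary_ms, blend_ms=10.0):
    """Per-channel input rates at time t_ms for a linear cross-fade a -> b.

    The blend is centred on boundary_ms and lasts blend_ms; blend_ms <= 0
    gives the instantaneous hard switch.
    """
    if blend_ms <= 0:
        alpha = 1.0 if t_ms >= boundary_ms else 0.0
    else:
        alpha = (t_ms - boundary_ms) / blend_ms + 0.5
        alpha = min(max(alpha, 0.0), 1.0)      # clamp outside the blend window
    return [rate_hz * ((1.0 - alpha) * a + alpha * b)
            for a, b in zip(img_a, img_b)]
```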