Tuesday, June 02, 2026

There Are Far More Efficient Ways to Run Neural Networks

 


The entire AI industry has standardized on one primitive — the dense floating-point matrix multiply — and then built a trillion-dollar edifice of GPUs, data centers, and capital around the assumption that intelligence is that primitive at scale. It is worth saying plainly: that is an assumption, not a law of computing. And it is an expensive one.

I want to make a narrow, concrete version of this argument, because the sweeping version is easy to wave away. Here it is: a neural network can run with no multiplications at all, and the arithmetic we burn most of our energy on is optional. I’ll show it running, then say where it leads.

How we got locked in

In 2012 a big neural network plus a GPU plus a large dataset worked, and the field reorganized around that fact. Every layer of the stack since then has been optimized for dense floating-point matmul: tensor cores designed to feed it, libraries tuned for it, model architectures shaped to saturate it, frameworks that assume it. The loop is self-reinforcing — hardware rewards matmul, so researchers choose matmul-shaped models, so the next hardware doubles down on matmul. After a decade it stops looking like a choice and starts looking like physics. It isn’t.

Where the waste is

The cost of inference is dominated by two things: floating-point multiply-accumulates, and moving 32-bit weights out of memory. Multiplication is among the most expensive operations a chip does; a memory fetch can cost far more than the arithmetic it feeds (see Horowitz, Computing’s Energy Problem, ISSCC 2014). A modern accelerator is a machine built to maximize exactly these operations, fed continuously from DRAM, clocked whether or not the numbers it is grinding actually matter. The brain, by contrast, runs general intelligence on roughly twenty watts — no floating-point multiplier, event-driven, sparse, with memory and compute in the same place. That is not mysticism; it is a different set of primitives.

A network that runs on pulses, not multiplies

Here is the concrete part. I built a small pulsed neural network and ran it on MNIST. It is not a ±1 “binary” network and it does no XNOR tricks. It works the way pulse-stream signal processing works: a value is a stream of pulses (its magnitude is the pulse rate), polarity is a sign clock, and each neuron simply integrates pulses and fires when its accumulator overflows a threshold — the overflow is the activation. A value×weight product becomes “deliver the input’s pulses, each adding the weight.” Sum-then-threshold becomes “accumulate until you overflow and fire.” No multiplier appears anywhere in inference.
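A minimal Python sketch of the integrate-and-fire idea described above. It is not the code from the linked repository: the names pulse_train and pulsed_neuron, the integer weights, the signed weight standing in for the sign clock, and the subtract-on-fire reset are all assumptions made for illustration.

```python
def pulse_train(count, T):
    """Spread `count` pulses evenly over T ticks (0 <= count <= T).

    Uses only add, compare and subtract (a phase accumulator).
    """
    acc, train = 0, []
    for _ in range(T):
        acc += count
        if acc >= T:
            acc -= T
            train.append(1)
        else:
            train.append(0)
    return train


def pulsed_neuron(input_counts, weights, threshold, T):
    """Integrate weighted pulses; fire (and subtract threshold) on overflow.

    Returns the number of output pulses emitted in T ticks.
    No multiplication appears anywhere in this loop.
    """
    trains = [pulse_train(c, T) for c in input_counts]
    acc, fired = 0, 0
    for t in range(T):
        for train, w in zip(trains, weights):
            if train[t]:
                acc += w          # a pulse arrived: add its weight
        if acc >= threshold:      # overflow is the activation
            acc -= threshold
            fired += 1
    return fired


if __name__ == "__main__":
    # Two inputs at 8/8 and 4/8 pulse rate, weights +3 and -1, threshold 4.
    print(pulsed_neuron([8, 4], [3, -1], threshold=4, T=8))
```

In this toy case the ordinary dot product would be 8×3 + 4×(-1) = 20, and with a threshold of 4 the neuron emits 5 pulses. A net-negative input never overflows and emits none, which behaves like a rectifier.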

It classifies. Trained conventionally and then run purely as pulses:

reference (floating-point) test accuracy: 94.23%

pulses/input (T)   test accuracy
         1            11.40%
         2            58.50%
         4            88.20%
         8            93.00%
        16            93.70%
        64            93.60%

As you spend more pulses, accuracy climbs from chance to within about half a point of the floating-point network (93.70% versus 94.23%), and saturates around sixteen pulses per input. That curve is the whole story of rate coding: more pulses = more precision = more energy. It is a knob you control, not a fixed tax. The code is a single dependency-light file you can run yourself; the link is at the bottom.
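One way to see why precision grows with the pulse budget: with T pulses, a value in [0, 1] can only be represented as k/T, so the worst-case rounding error is 1/(2T). The sketch below checks that numerically. It assumes a nearest-count encoder (rate_code is a hypothetical helper, not taken from the repository) and illustrates representation error only; it does not reproduce the accuracy table.

```python
def rate_code(x, T):
    """Nearest pulse count in 0..T for a value x in [0, 1] (hypothetical encoder)."""
    return min(T, max(0, int(x * T + 0.5)))


def worst_case_error(T, samples=1000):
    """Largest |x - k/T| over an evenly spaced grid of x in [0, 1]."""
    worst = 0.0
    for i in range(samples + 1):
        x = i / samples
        worst = max(worst, abs(x - rate_code(x, T) / T))
    return worst


if __name__ == "__main__":
    for T in (1, 2, 4, 8, 16, 64):
        print(f"T={T:3d}  worst error {worst_case_error(T):.4f}  bound {1 / (2 * T):.4f}")
```

Doubling T halves the bound, which is the "more pulses = more precision" half of the trade-off. The energy half is simply that every extra pulse is extra work.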

I’ll be honest about what this does and does not prove. It proves the arithmetic is optional — the network’s function survives with pulses and overflow-firing and zero multiplies. It does not prove this is faster on your laptop. It isn’t: a CPU or GPU is the wrong machine for it, because those chips are built to do the very floating-point matmul we just removed. Emulating pulses on a GPU is slower, not faster. That is the point, not a footnote.

Why the GPU loses, and what wins

A GPU is the maximal embodiment of the old assumption: dense floating point, DRAM-fed, clock-synchronous. Every design decision that makes it superb at matrix multiplication becomes a liability the moment the workload stops being matrix multiplication. The alternatives don’t out-matmul the GPU — they make matmul irrelevant:

  • Pulsed / event-driven computation: energy proportional to activity, not to a clock. Work happens only when a pulse arrives.
  • No floating-point unit to power, because the model doesn’t need one.
  • Weights on-chip, killing the DRAM traffic that dominates the energy budget.
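To make "energy proportional to activity" concrete, here is a rough operation count for one fully connected layer. It is a sketch under stated assumptions: one add per arriving pulse per downstream neuron in the pulsed case, one multiply-accumulate per weight in the dense case. The function names and the toy input are hypothetical, and operation counts are not energy measurements.

```python
def dense_macs(n_in, n_out):
    """Multiply-accumulates for a dense layer: every weight is used once."""
    return n_in * n_out


def event_driven_adds(pulse_counts, n_out):
    """Adds for a pulsed layer: each input pulse adds one weight per output neuron."""
    return sum(pulse_counts) * n_out


if __name__ == "__main__":
    # Toy input: 8 inputs, most of them silent (zero pulses).
    pulses = [0, 0, 0, 4, 2, 0, 0, 0]
    print("dense MACs:       ", dense_macs(len(pulses), 10))      # 80
    print("event-driven adds:", event_driven_adds(pulses, 10))    # 60
```

Silent inputs cost nothing here, but a busy input with a large pulse budget can need more adds than the dense layer needs multiply-accumulates. Any advantage depends on how sparse the activity is and on an add being cheaper than a multiply on the target hardware.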

The clearest existing picture of this substrate is the GreenArrays GA144: 144 asynchronous cores, no floating point, picojoule-scale per operation, cores that sleep instantly when no data is flowing, and enough on-chip memory to hold a small pulsed model with no external weight fetches. An async, FP-free, on-chip, event-driven array is exactly the machine a pulsed network wants, and the opposite of a GPU in every design commitment.

To be careful: the GA144 energy advantage for this workload is, today, a projection from its architecture, not a benchmark I am handing you. The honest next step is to run a pulsed inference kernel on a GA144 (or its cycle-accurate simulator) and publish the measured instruction and energy counts. That is the experiment worth funding — and it costs a rounding error against what we are spending pouring concrete for matmul.

The claim

GPUs are not obsolete tomorrow, and training will live on dense hardware for a while. But the assumption that inference at planetary scale must be floating-point matrix multiplication is a 2010s artifact, and it is breaking. The arithmetic is optional; the substrate is a choice; and we have barely funded the alternatives. There are far more efficient ways to run these networks. Here is one of them, running.


Working example (MIT-licensed): a pulsed integrate-and-fire MNIST network in one file —  https://github.com/johnsokol/bnn-example   (python3 pulsed_nn.py).


© 2026 John L. Sokol.

Groundbreaking new paper introduces a next-gen neuron model inspired by real cortical cells



Most neural nets are still based on the model of a neuron proposed in the 1950s: u = activation(w·x + b)
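For reference, the classic point neuron in that formula takes only a few lines of Python. This is the textbook model only; the choice of tanh as the activation is an assumption, and nothing here sketches the model proposed in the paper.

```python
import math


def point_neuron(x, w, b, activation=math.tanh):
    """Classic point neuron: u = activation(w . x + b)."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and tanh(0.0) = 0.0
    print(point_neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```

All inputs are collapsed into a single weighted sum before the nonlinearity is applied.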

In a new paper, researchers propose a more accurate model of a biological brain neuron and report that it has quite a few advantages, such as needing less training data.

The paper replaces the classic point neuron (u = activation(w·x + b)) with a far more biologically realistic version, and according to the authors it delivers:

- Higher expressivity
- Faster learning
- Better robustness
- Less memorization
- Works with less data
All without adding parameters.
The brain was right all along.

The result, as reported: a neuron that is more powerful, faster to train, more robust, and less data-hungry, with zero extra parameters, and that beats the classic version across the board.

https://arxiv.org/pdf/2605.30370


see also: 


Nixie-clock using neon lamps as logic elements

copied from https://web.archive.org/web/20170824164029/https://wwwhome.ewi.utwente.nl/~ptdeboer/ham/neonclock/

see video here https://youtu.be/v3oUTgtCUb0


[photo of my neon clock]
The above shows my home-built digital clock. It uses Nixie-tubes for readout. In contrast to most other nixie-clocks being built these days, my clock does not use any transistor or IC for driving the tubes. Instead, the driving logic is built from neon lamps, together with resistors, capacitors and silicon diodes.

The project started in 2002, when our university library was selling old, outdated or otherwise superfluous books, and I very cheaply bought the book "Electronic Counting Circuits" by J.B. Dance, published in 1967, and apparently borrowed from our library only three times, all in 1973. It described how neon lamps can be used as logic elements in a ring counter, exploiting the fact that they need a higher voltage to ignite (the striking voltage) than to stay lit (the maintaining voltage):
[schematic from Dance's book]
Unfortunately, if one substitutes the neon bulbs that are available in electronics shops nowadays, the circuit doesn't work. Dance used lamps that were specifically manufactured for this type of application, with a large difference between their striking and maintaining voltages. Nowadays, such lamps are (presumably) no longer manufactured; the neon bulbs that are still available in shops are meant as indicator lamps, and have a much smaller difference between their striking and maintaining voltages. This required changing the circuit's resistor values, and makes its operation more critical; furthermore, the lamps need to be selected for matching characteristics.
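The striking/maintaining behaviour described above is a form of hysteresis, and a small Python model shows why it can store one bit. This is an idealised sketch, not a circuit simulation: the NeonLamp class is hypothetical and the default voltages are illustrative placeholders, not measurements of any real lamp.

```python
class NeonLamp:
    """Idealised lamp: ignites at v_strike, then stays lit down to v_maintain."""

    def __init__(self, v_strike=90.0, v_maintain=60.0):
        # Placeholder voltages for illustration only.
        self.v_strike = v_strike
        self.v_maintain = v_maintain
        self.lit = False

    def apply(self, volts):
        """Apply a voltage across the lamp and return whether it is lit."""
        if self.lit:
            self.lit = volts >= self.v_maintain   # stays lit until voltage drops too low
        else:
            self.lit = volts >= self.v_strike     # needs the higher voltage to ignite
        return self.lit


if __name__ == "__main__":
    lamp = NeonLamp()
    for v in (75, 95, 75, 50, 75):
        print(v, "V ->", "lit" if lamp.apply(v) else "dark")
```

At the same 75 V the lamp is dark or lit depending on its history. The usable supply window lies between the two thresholds, so lamps with a small gap between striking and maintaining voltage leave very little margin.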

This is one of the ring counters in my clock:
[photo of one ringcounter]
Four of these are used, to divide the 50 Hz from the mains power (see here for stability measurements) first by 10 (yielding 5 Hz), then by 5 (yielding 1 Hz, i.e., one pulse per second), then further by 10 and 6 to yield one pulse per minute. Note the paper labels still dangling at the cathode wires of the lamps: these are needed to look up the measured properties of each lamp.
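The divider chain in that paragraph can be checked with a small simulation. This is a logical sketch only: the RingCounter class and count_output_pulses helper are hypothetical names, and each stage is modelled as an ideal divide-by-n counter rather than as lamps and resistors.

```python
class RingCounter:
    """Ideal divide-by-n stage: emits a carry every n input pulses."""

    def __init__(self, n):
        self.n = n
        self.position = 0          # which lamp in the ring is currently lit

    def pulse(self):
        """Advance one step; return True when the ring wraps around."""
        self.position += 1
        if self.position == self.n:
            self.position = 0
            return True
        return False


def count_output_pulses(divisors, input_pulses):
    """Feed pulses into a cascade of ring counters; count pulses leaving the last stage."""
    stages = [RingCounter(n) for n in divisors]
    out = 0
    for _ in range(input_pulses):
        # all() stops at the first stage that does not carry,
        # so later stages are only pulsed by a carry from the one before.
        if all(stage.pulse() for stage in stages):
            out += 1
    return out


if __name__ == "__main__":
    mains_cycles_per_minute = 50 * 60
    print(count_output_pulses([10, 5, 10, 6], mains_cycles_per_minute))  # 1
```

Since 10 × 5 × 10 × 6 = 3000, one minute of 50 Hz mains produces exactly one output pulse.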

Four more ring counters are used dividing by 10, 6, 10 and 3, to count the minutes, tens-of-minutes, hours and tens-of-hours and drive the Nixie tubes:
[photo of one ringcounter with LDRs]
The nixie tubes are driven through Light Dependent Resistors (LDRs): under the influence of the light from the neon lamp, their resistance lowers, connecting one nixie cathode to the negative power supply. In order for the LDR not to be influenced too much by ambient light, while still allowing the neon bulb to be visible, an optical attenuator and filter is used between them, consisting of a black cardboard disk with a small hole in it, and two layers of red foil, held together by glue and shrink tube:
[photo of one ringcounter with LDRs]

The ring counters are rather sensitive to ambient light: in complete darkness, they tend not to work. Even though there are always a few bulbs active (if only in the power supply, which is not shown in the photographs), my clock still needs a bit of external ambient light. I'm experimenting with blue LEDs for providing this extra ambient light. This seems to be quite effective: illuminated by just two blue LEDs, the clock ran perfectly for a whole night in otherwise complete darkness:
[photo of clock with 2 blue leds]
Note though that the blue in this photo is more intense than it looks in reality: apparently the camera is more sensitive to this shade of blue than the human eye.

Some other things that I ran into while designing this clock:

  • In contrast to what Dance's book says, one can't cascade the ring counters just by connecting them (when using modern-day neon bulbs). I'm now using an extra neon bulb per counter as an amplifier: it is biased to just under its striking voltage, so a small pulse can strike it.
  • The striking and maintaining voltages of the lamps change considerably during their first hours of operation. Therefore, it is necessary to first "burn in" (age) the lamps before measuring their characteristics.
  • Despite selecting my lamps for matching characteristics, some still acted weird and needed to be replaced. For example, I had one which somehow didn't work reliably in the buffer stage; and another one worked reliably in a ring counter when clocked at about 1 Hz, but not when getting a pulse only once per hour. Apparently, fully characterizing the neon lamps requires more quantities than just the striking and maintaining voltages.

The clock is now electrically functional, but some work still remains to be done. The power supply needs to be built tidily, the alligator clip test leads eliminated, and the whole thing put into a (transparent) enclosure for safety.


Movie and circuit diagram

A short movie (AVI format, 10 MB) of the clock in operation is available here.

Furthermore, the circuit diagram is available in a PDF file. This schematic diagram contains some extra explanation of how specific parts work. This diagram is meant to document and explain the details of my clock, and there will probably be some minor changes made in the future. The diagram is not meant as a complete basis for building another such clock; for example, while some of the resistor values are quite uncritical and determined by what I happened to have at hand, many depend critically on the characteristics of the neon lamps used. (Hopefully needless to say, any prospective builders should take proper safety precautions for working with the high voltages involved.)


Links

  • Many people these days enjoy building Nixie clocks, though usually with modern electronics driving the tubes. See here for a gallery.
  • A mailing list on nixie clocks exists: neonixie-l at Google groups (and formerly at Yahoo).
  • Nixie clocks completely without silicon have also been built: this one and this one use vacuum tubes, and this one uses trigger tubes. Trigger tubes are actually neon lamps with a third electrode to trigger them.
  • In the November 1966 issue of Electronics Illustrated, a description was published for building an electronic calculator using neon bulbs connected as ring counters. (I've also seen reference to a 1967 issue of "Practical Electronics" as apparently containing the same or a very similar article.)

Comments are welcome at pa3fwm@amsat.org.
Copyright © 2007.



Monday, June 01, 2026

Laser-cut Snake Bot

 

This is a two-dimensional snake robot with flexible couplings between the joints so that it can rest flat on the surface. It has 8 segments, built around 40 mm x 40 mm x 20 mm metal-gear high-torque servos.


The robot is constructed from laser-cut 4.7 mm MDF (medium-density fiberboard), with press-fit 608ZZ skateboard bearings (8 mm x 22 mm x 7 mm) as passive wheels. These are the same type used in fidget spinners.
The design was done in OpenSCAD and was inspired by Japanese woodworking styles: it uses no screws, glue or fasteners, with the exception of the RC servo hub.

https://github.com/johnsokol/OpenSCAD-misc-projects/tree/master/RCservo-snake


Laser cut pattern file in OpenSCAD. 



The parts press-fit over the servo. The bottom-left piece press-fits over the servo output, and the only screws are the 3 mm screws that hold that part onto the servo. The ball bearings press-fit over the end of the T on that part. Heat shrink or tape should be placed on each ball bearing to increase friction with the surface.






When the servos are fed a sine wave, the robot can propel itself across a surface with nothing driving the wheels directly.
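A common way to produce that kind of motion is to send each joint the same sine wave with a fixed phase lag, so that a wave travels down the body. The Python sketch below shows the idea. It is not the code running on this robot: the joint_angles function, the joint count and the amplitude, frequency and phase values are all assumptions chosen for illustration.

```python
import math


def joint_angles(t, n_joints=8, amplitude_deg=30.0, freq_hz=0.5, phase_step_deg=45.0):
    """Angle in degrees for each joint at time t (seconds).

    Every joint follows the same sine wave, delayed by phase_step_deg per joint,
    which makes the bend travel from head to tail.
    """
    return [
        amplitude_deg * math.sin(2 * math.pi * freq_hz * t - math.radians(i * phase_step_deg))
        for i in range(n_joints)
    ]


if __name__ == "__main__":
    for step in range(5):
        t = step * 0.25
        print(f"t={t:.2f}s", [round(a, 1) for a in joint_angles(t)])
```

The angles are offsets from each servo's centre position. With these placeholder values, each joint repeats what the joint ahead of it did 0.25 s earlier.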







Movement


Snake Kinematics


Image from: "Locomotion Efficiency Optimization of Biologically Inspired Snake Robots" by Eleni Kelasidi, Mansoureh Jesmani, Kristin Y. Pettersen and Jan Tommy Gravdahl.