There Are Better Ways to Run AI — And We're Not Using Them
We are burning an extraordinary amount of power to run artificial intelligence — gigawatts in the data centers, and the last hours of your phone's battery at the edge — and a great deal of it is spent on one habit we never stopped to question. I want to lay out, plainly, why that habit is a choice rather than a necessity, why better ways exist, and why almost no one is pursuing them at the scale the moment deserves. And if any of this resonates, I'd like you to get in touch, because there is more to this than fits in a single piece.
The habit
Every neural network in production today rests on the same operation, repeated trillions upon trillions of times: a floating-point multiply, followed by an add. Matrix multiplication in floating point is the heartbeat of modern AI. The chips are built for it, the data centers are built to feed and cool the chips, and the whole industry has organized itself around doing this one operation faster and faster.
It is worth saying clearly that this was not a foolish choice. Floating-point multiplication won its place honestly. It is smooth and differentiable, which is what made networks trainable in the first place, and the hardware to do it quickly — graphics processors built for video games — happened to already exist when deep learning needed it. The match was lucky and powerful, and it produced the fastest technological transformation most of us will see in our lifetimes.
But a choice made for convenience has quietly hardened into an assumption treated as law: that intelligence must run on floating-point multiplication, and that the only way forward is more of it, faster. That assumption is now being poured into concrete, silicon, and power contracts at a scale of trillions of dollars, and it is wrong, or at least far less necessary than the spending implies.
Why it costs so much
The expense hides in two places, and neither is where people usually look.
The first is the operation itself. In a digital circuit, a multiplier is essentially a dense array of adders, so its area and switching energy grow roughly with the square of the operand width in bits. An adder grows only linearly with that width. A single bitwise logic operation is nearly free by comparison. Measured at the arithmetic unit, replacing a floating-point multiply-and-accumulate with bitwise logic and a simple count can cut the energy of that step by something like thirty to a hundred times.
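To make "bitwise logic and a simple count" concrete, here is a minimal sketch in Python. It assumes weights and activations restricted to +1 and -1, packed one bit per value; the function names `pack_signs` and `binary_dot` are illustrative, not taken from any particular library. Under that assumption, a dot product reduces to an XNOR followed by a bit count.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int, one bit each (bit set means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n, with no multiplies per element."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")        # the "simple count" (popcount)
    return 2 * matches - n                 # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))           # conventional multiply-and-add
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(reference, result)
```

Both paths give the same answer. In hardware, the second one needs only XNOR gates and a counter rather than a multiplier array.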
The second, and larger, cost is moving the numbers around. In a modern AI chip, more energy goes into hauling weights and activations in and out of memory than into the arithmetic that consumes them — often by a wide margin. This is the memory wall, and it has a crucial consequence: the real prize is not just a cheaper operation, but smaller data. A weight simple enough that the multiply collapses into a sign and a count is also a weight that takes far less space to store, less energy to move, and less bandwidth to transmit. Cut the floating-point multiply properly and you attack the compute bill and the memory bill at the same time.
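The storage side of this is simple arithmetic, sketched below. The one-billion-weight model is a hypothetical size chosen for illustration, and the two-bit figure assumes three-valued weights packed naively at two bits each. The numbers count weight storage only and say nothing about activations or accuracy.

```python
def weight_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights at a given width, ignoring any metadata."""
    return num_weights * bits_per_weight // 8


NUM_WEIGHTS = 1_000_000_000  # hypothetical model size, for illustration only

fp16_bytes = weight_bytes(NUM_WEIGHTS, 16)     # conventional half-precision floats
two_bit_bytes = weight_bytes(NUM_WEIGHTS, 2)   # e.g. three-valued weights, 2 bits each
one_bit_bytes = weight_bytes(NUM_WEIGHTS, 1)   # sign-only weights

for label, size in [("fp16", fp16_bytes), ("2-bit", two_bit_bytes), ("1-bit", one_bit_bytes)]:
    print(f"{label:>5}: {size / 1e9:.3f} GB, {fp16_bytes // size}x smaller than fp16")
```

Every byte that is never stored is also a byte that never has to be fetched from memory, which is where the larger share of the energy goes.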
The edge is where this stops being abstract
At the scale of a data center, inefficiency can be hidden behind money and megawatts. At the edge, it cannot.
On a phone, a watch, a drone, a hearing aid, a remote sensor, a robot, there is no substation and no cooling tower. There is a battery, and a thermal limit set by something you might be holding in your hand. Every joule the model burns is runtime gone and heat you must shed. There is no option to simply supply more power.
So at the edge, efficiency is not a cost optimization; it decides whether the AI fits on the device at all. A model that performs less arithmetic and moves less data runs longer on the same charge, runs cooler, and can stay resident on the device instead of shipping your data off to a server. That last point is a quiet bonus: computation that stays local is computation that is private by construction. For an enormous population of working engineers in mobile, embedded, robotics, and IoT, the power wall is not a future risk; it is the constraint they fight every single day. The float-free approach speaks directly to them.
We already know it can be done
This is not speculation dressed up as hope. Three independent lines of evidence — from biology, from current research, and from working systems — all say the floating-point multiply is optional.
Biology is the existence proof that cannot be argued with. The human brain runs a general intelligence on roughly twenty watts, the draw of a dim bulb. Neurons do not multiply. They accumulate incoming signals and fire when a threshold is crossed, then fall silent; a neuron receiving nothing costs almost nothing. The brain is event-driven and sparse by nature — it does work only where and when there is work to do. That does not prove a brain-like machine is better; aircraft do not flap their wings. But it proves, beyond dispute, that the highest intelligence we know of does not require dense floating-point multiplication. The operation is contingent, not fundamental.
Current research has now shown the same thing in engineering terms. Models built to run on addition instead of multiplication, using severely constrained weights, have reached quality comparable to conventional networks while reporting roughly an order-of-magnitude reduction in inference energy. This is demonstrated for inference, not yet for training at the largest scale — but the wall between "multiplication is mandatory" and "addition is sufficient" now plainly has a door in it.
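One way "addition instead of multiplication" can work is sketched below. It assumes weights limited to the three values -1, 0, and +1, which is one form of severely constrained weight and not necessarily the scheme any particular published model uses. The function name `ternary_dot` is illustrative.

```python
def ternary_dot(weights, activations):
    """Dot product where every weight is -1, 0, or +1: only adds, subtracts, and skips."""
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x          # +1 weight: add the activation
        elif w == -1:
            acc -= x          # -1 weight: subtract it
        # a 0 weight contributes nothing and costs nothing
    return acc


weights = [1, 0, -1, 1, 0, -1]
activations = [3, 7, 2, -4, 9, 5]

reference = sum(w * x for w, x in zip(weights, activations))  # conventional version
result = ternary_dot(weights, activations)
print(reference, result)
```

The two results agree, and the second path never invokes a multiplier. The zero weights also show where sparsity comes from: those terms are skipped entirely.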
And working systems have done it on hardware almost laughably weak by today's standards. Float-free, integer and fixed-point, event-driven networks — accumulate-and-fire designs in which the hidden inference path contains no multiplies at all — have run in real time on machines a thousand times less capable than the phone in your pocket. The approach is old enough to have been deployed in earnest decades ago, long before it was fashionable.
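For readers who have not seen an accumulate-and-fire design, here is a minimal sketch of one layer. It is an illustration under stated assumptions, not a reconstruction of any specific deployed system: the weights are small integers, inputs arrive as a list of indices that fired, and the function name `step`, the threshold, and the reset-to-zero rule are all hypothetical choices.

```python
def step(potentials, weights, threshold, events):
    """Advance one time step of an integer accumulate-and-fire layer.

    weights[i][j] is the integer weight from input i to neuron j.
    events lists the inputs that fired this step; silent inputs cost nothing.
    """
    for i in events:                       # work is done only where there is an event
        row = weights[i]
        for j in range(len(potentials)):
            potentials[j] += row[j]        # accumulate: integer add, no multiply
    fired = []
    for j, p in enumerate(potentials):
        if p >= threshold:                 # fire when the threshold is crossed
            fired.append(j)
            potentials[j] = 0              # then fall silent (reset)
    return fired


weights = [[2, -1], [1, 3], [0, 2]]        # 3 inputs feeding 2 neurons
potentials = [0, 0]

out1 = step(potentials, weights, threshold=4, events=[0, 1])
out2 = step(potentials, weights, threshold=4, events=[1, 2])
out3 = step(potentials, weights, threshold=4, events=[])
print(out1, out2, out3)
```

The third call receives no events, so the accumulation loop does no work at all. That is the sense in which such a network is event-driven.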
What I am, and am not, claiming
I want to be honest about the limits, because the argument is stronger inside them.
I am not claiming floating point is dead, or that these methods have already won. Training the largest models still relies on high precision. I am not claiming that efficiency alone will reduce total energy use — when computation gets cheaper, the world tends to do more of it, and the aggregate bill may not fall. And I am not offering a finished, drop-in product that solves everything tomorrow.
What I am claiming is narrower and still consequential: the floating-point multiply is a choice, not a requirement; dropping it buys power back everywhere; and that matters most exactly where power is scarcest — on a battery that has to last the day, and on a grid being asked to give up cities' worth of electricity and water. The cheapest line item available to this industry is the research into doing the same work with far less power. Against the scale of what is being built, it is a rounding error. On the edge, it is the whole game.
Why I'm writing this
I have spent a long career building efficient systems on constrained hardware, going back to neural networks that ran in real time on machines from the 1980s. I have watched the field arrive, decades later, at conclusions that were reachable far earlier — and I have watched the industry double down on the single most power-hungry way of getting there, just as the costs of that choice are coming due in grids, in water, and in batteries.
I am not trying to sell anything, and I am not asking anyone to stop building. I am trying to point out, to the people making the decisions and the engineers writing the code, that there are better ways — that they are real, that they have history behind them, and that they deserve serious attention before still more capacity is locked into the old assumption.
If any of this lands with you — if you plan power and cooling, allocate research budgets, design for the edge, or simply want to understand why your battery dies the way it does — I would welcome the conversation. There is considerably more to this than I can fit here, including working approaches and decades of hard-won detail. Reach out, and I'll share what I can.
The floating-point habit is not a law of nature. It is a decision we are still free to revisit — and the longer we wait, the more we pay for it, at both ends of the wire.