Wednesday, May 27, 2026

There Are Better Ways to Run AI — And We're Not Using Them

 

We are burning an extraordinary amount of power to run artificial intelligence — gigawatts in the data centers, and the last hours of your phone's battery at the edge — and a great deal of it is spent on one habit we never stopped to question. I want to lay out, plainly, why that habit is a choice rather than a necessity, why better ways exist, and why almost no one is pursuing them at the scale the moment deserves. And if any of this resonates, I'd like you to get in touch, because there is more to this than fits in a single piece.

The habit

Nearly every neural network in production today rests on the same operation, repeated trillions upon trillions of times: a floating-point multiply, followed by an add. Matrix multiplication in floating point is the heartbeat of modern AI. The chips are built for it, the data centers are built to feed and cool the chips, and the whole industry has organized itself around doing this one operation faster and faster.

It is worth saying clearly that this was not a foolish choice. Floating-point multiplication won its place honestly. It is smooth and differentiable, which is what made networks trainable in the first place, and the hardware to do it quickly — graphics processors built for video games — happened to already exist when deep learning needed it. The match was lucky and powerful, and it produced the fastest technological transformation most of us will see in our lifetimes.

But a choice made for convenience has quietly hardened into an assumption treated as law: that intelligence must run on floating-point multiplication, and that the only way forward is more of it, faster. That assumption is now being poured into concrete, silicon, and power contracts at a scale of trillions — and it is wrong, or at least far less necessary than the spending implies.

Why it costs so much

The expense hides in two places, and neither is where people usually look.

The first is the operation itself. In a digital circuit, a multiplier is essentially a dense array of adders; the work it does grows roughly with the square of the number of bits involved. An addition grows only linearly. A single bitwise logic operation is nearly free by comparison. Measured at the arithmetic unit, replacing a floating-point multiply-and-accumulate with bitwise logic and a simple count can cut the energy of that step by a factor of something like thirty to one hundred.
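To make "bitwise logic and a simple count" concrete, here is a minimal sketch, assuming weights and activations are restricted to +1 and -1 and packed one bit per value. The function names (pack_signs, binary_dot) are illustrative, not taken from any particular library, and the sketch shows only the arithmetic identity, not the energy saving.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into one integer, one bit each (bit set means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors using XNOR and a bit count."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")     # the "simple count" of set bits
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))           # ordinary multiply-and-add
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(reference, result)
```

Both paths give the same number; the second never multiplies. In hardware the XNOR and the count are a handful of gates per bit, which is where the saving at the arithmetic unit would come from.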

The second, and larger, cost is moving the numbers around. In a modern AI chip, more energy goes into hauling weights and activations in and out of memory than into the arithmetic that consumes them — often by a wide margin. This is the memory wall, and it has a crucial consequence: the real prize is not just a cheaper operation, but smaller data. A weight simple enough that the multiply collapses into a sign and a count is also a weight that takes far less space to store, less energy to move, and less bandwidth to transmit. Cut the floating-point multiply properly and you attack the compute bill and the memory bill at the same time.

The edge is where this stops being abstract

At the scale of a data center, inefficiency can be hidden behind money and megawatts. At the edge, it cannot.

On a phone, a watch, a drone, a hearing aid, a remote sensor, a robot, there is no substation and no cooling tower. There is a battery, and a thermal limit set by something you might be holding in your hand. Every joule the model burns is runtime gone and heat you must shed. There is no option to simply supply more power.

So at the edge, efficiency is not a cost optimization; it decides whether the AI fits on the device at all. A model that performs less arithmetic and moves less data runs longer on the same charge, runs cooler, and can stay resident on the device instead of shipping your data off to a server. That last point is a quiet bonus: computation that stays local is computation that is private by construction. For an enormous population of working engineers in mobile, embedded, robotics, and IoT, the power wall is not a future risk; it is the constraint they fight every single day. The float-free approach speaks directly to them.

We already know it can be done

This is not speculation dressed up as hope. Three independent lines of evidence — from biology, from current research, and from working systems — all say the floating-point multiply is optional.

Biology is the existence proof that cannot be argued with. The human brain runs a general intelligence on roughly twenty watts, the draw of a dim bulb. Neurons do not multiply. They accumulate incoming signals and fire when a threshold is crossed, then fall silent; a neuron receiving nothing costs almost nothing. The brain is event-driven and sparse by nature — it does work only where and when there is work to do. That does not prove a brain-like machine is better; aircraft do not flap their wings. But it proves, beyond dispute, that the highest intelligence we know of does not require dense floating-point multiplication. The operation is contingent, not fundamental.
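The accumulate-and-fire behavior described above can be sketched in a few lines of integer code. This is a deliberately crude abstraction, not a model of a real neuron: the reset-to-zero rule, the threshold value, and the name run_neuron are all assumptions made for illustration.

```python
def run_neuron(events, threshold):
    """Integer accumulate-and-fire unit.

    events: one integer input per time step; 0 means nothing arrived.
    Returns (spike_times, additions_performed).
    """
    potential = 0
    spikes = []
    additions = 0
    for t, x in enumerate(events):
        if x == 0:
            continue              # no input: the unit does no arithmetic
        potential += x            # accumulate with a single integer add
        additions += 1
        if potential >= threshold:
            spikes.append(t)      # fire
            potential = 0         # then fall silent (assumed reset rule)
    return spikes, additions


events = [0, 3, 0, 0, 4, 0, 2, 0, 0, 5, 0, 0]
spikes, additions = run_neuron(events, threshold=6)
print(spikes, additions)
```

Twelve time steps cost four additions and no multiplies. In software the zero check is itself an instruction, so the saving shown here is in arithmetic performed, not a measured energy figure.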

Current research has now shown the same thing in engineering terms. Models built to run on addition instead of multiplication, using severely constrained weights, have reached quality comparable to conventional networks while reporting roughly an order-of-magnitude reduction in inference energy. This is demonstrated for inference, not yet for training at the largest scale — but the wall between "multiplication is mandatory" and "addition is sufficient" now plainly has a door in it.
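One way to read "severely constrained weights" is to assume every weight is -1, 0, or +1. That ternary choice is an assumption made here for illustration, not something the research summarized above is being quoted as specifying. Under it, a matrix-vector product needs only additions and subtractions:

```python
def ternary_matvec(weights, x):
    """Compute W @ x for weights in {-1, 0, +1} using only add and subtract."""
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi         # +1 weight: add the input
            elif w == -1:
                acc -= xi         # -1 weight: subtract the input
            # 0 weight: skip the input entirely
        out.append(acc)
    return out


W = [[1, 0, -1, 1],
     [0, -1, -1, 0],
     [1, 1, 0, -1]]
x = [5, 2, 7, 3]

y = ternary_matvec(W, x)
reference = [sum(w * xi for w, xi in zip(row, x)) for row in W]
print(y, reference)
```

The add-only result matches the conventional multiply-and-add result exactly, because multiplying by -1, 0, or +1 was never really a multiplication.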

And working systems have done it on hardware almost laughably weak by today's standards. Float-free, integer and fixed-point, event-driven networks — accumulate-and-fire designs in which the hidden inference path contains no multiplies at all — have run in real time on machines a thousand times less capable than the phone in your pocket. The approach is old enough to have been deployed in earnest decades ago, long before it was fashionable.

What I am, and am not, claiming

I want to be honest about the limits, because the argument is stronger inside them.

I am not claiming floating point is dead, or that these methods have already won. Training the largest models still relies on high precision. I am not claiming that efficiency alone will reduce total energy use — when computation gets cheaper, the world tends to do more of it, and the aggregate bill may not fall. And I am not offering a finished, drop-in product that solves everything tomorrow.

What I am claiming is narrower and still consequential: the floating-point multiply is a choice, not a requirement; dropping it buys power back everywhere; and that matters most exactly where power is scarcest — on a battery that has to last the day, and on a grid being asked to give up cities' worth of electricity and water. The cheapest line item available to this industry is the research into doing the same work with far less power. Against the scale of what is being built, it is a rounding error. On the edge, it is the whole game.

Why I'm writing this

I have spent a long career building efficient systems on constrained hardware, going back to neural networks that ran in real time on machines from the 1980s. I have watched the field arrive, decades later, at conclusions that were reachable far earlier — and I have watched the industry double down on the single most power-hungry way of getting there, just as the costs of that choice are coming due in grids, in water, and in batteries.

I am not trying to sell anything, and I am not asking anyone to stop building. I am trying to point out, to the people making the decisions and the engineers writing the code, that there are better ways — that they are real, that they have history behind them, and that they deserve serious attention before still more capacity is locked into the old assumption.

If any of this lands with you — if you plan power and cooling, allocate research budgets, design for the edge, or simply want to understand why your battery dies the way it does — I would welcome the conversation. There is considerably more to this than I can fit here, including working approaches and decades of hard-won detail. Reach out, and I'll share what I can.

The floating-point habit is not a law of nature. It is a decision we are still free to revisit — and the longer we wait, the more we pay for it, at both ends of the wire.


Monday, May 25, 2026

Research Before Concrete


Why the cheapest line item in artificial intelligence is the one nobody is funding

The AI industry is committing trillions of dollars to buildings, power, and silicon, and a rounding error to the question of whether the architecture inside them is the right one. That ratio — not the buildout itself — is the mistake.


I. What is being funded, and what is not

There is an extraordinary amount of capital moving through the artificial intelligence industry, and it is worth being precise about where it goes.

It goes into concrete and steel: the shells of data centers, rising across three continents at a pace the construction industry has rarely seen. It goes into power — substations, transmission upgrades, transformers with multi-year lead times, electricity contracts that run for decades, and in a growing number of cases dedicated generation built for a single customer. It goes into silicon: hundreds of thousands of accelerators per large facility, each one a small fortune, replaced every few years. Independent estimates put the cumulative figure in the multiple trillions of dollars across the second half of this decade. It is one of the largest concentrations of private capital expenditure in history.

Almost none of it goes into asking whether the architecture being poured into those buildings is the durable one.

That asymmetry is the subject of this essay. The buildout is, in the end, a bet — a vast, concentrated, physical, multi-year, largely irreversible bet — that the way artificial intelligence computes today is the way it will compute for the economic life of the assets being built. And the research that would tell you whether that bet is sound is being funded at a tiny fraction of the rate of the bet itself. We are paying for the city before we have paid for the survey.

Let me be exact about the claim, because the strength of the argument depends on its modesty. The argument is not that the buildout should stop. Demand for AI computation is real, it is here today, and it must be served on the hardware that exists. The argument is not that the current architecture is wrong — it may well prove durable. The argument is narrower, and I believe it is very hard to refute once stated plainly: research into alternative computing architectures is the cheapest hedge available against the most expensive mistake on the table, and it is currently being treated as an afterthought when the scale of the capital at risk makes it a precondition.

To see why, we need to look at three things in turn: how the present architecture became an unquestioned assumption, how wide the space of alternatives actually is, and what it costs — financially — to be wrong.

II. How a practical choice became an unexamined assumption

Modern artificial intelligence runs, almost in its entirety, on one operation performed densely and at colossal scale: multiply, then add. Matrix multiplication is the heartbeat of every large model in production today. Training a model is matrix multiplication; running it is matrix multiplication; the accelerators are matrix-multiplication engines, and the data centers are buildings designed to feed and cool matrix-multiplication engines. The entire industrial base is an investment in performing one operation faster.

It is important to say clearly that this was not a mistake of ignorance. Multiplication earned its place, and it earned it twice over.

It earned it first through the mathematics of learning. The algorithm at the heart of modern deep learning — backpropagation — requires smooth, differentiable operations, so that an error signal can flow backward through a network and adjust millions or billions of parameters in the right direction. Multiplication and addition over continuous numbers are perfectly smooth and differentiate cleanly. The discontinuous, all-or-nothing operations that a more brain-like system might use are hostile to gradient-based learning. Multiplication was not adopted because anyone proved intelligence must be multiplicative. It was adopted because it was the operation that could be made to learn.

It earned it a second time through a historical accident of hardware. When deep learning began to work — the watershed is usually dated to around 2012 — the ideal hardware for it already existed, built for a completely unrelated market. Graphics processing units, designed to render video-game imagery, happened to be machines for performing dense matrix multiplication in massive parallel. The AI field did not have to wait a decade for someone to design it a substrate. It inherited one, fully formed, along with a software ecosystem that grew up around it. The fit between the algorithm and the silicon was so good that it produced the fastest technological transformation in living memory.

So the choice was correct. Nothing in this essay disputes that. But watch what happened next, because it is the quiet center of the whole problem.

A practical choice — use dense multiplication, because it learns well and the hardware exists — hardened, through a long sequence of individually reasonable decisions, into infrastructure. The infrastructure, in turn, hardened into an assumption: intelligence runs on dense multiplication, so the path forward is more and faster dense multiplication. No one announced this transition. No committee ratified it. It accreted. Each new chip generation, each new data center, each new financial model built on the last, until the industry was planning at the scale of trillions as though the operation itself were a fixed constant of nature, and only its speed and price were left to vary.

You can see the hardened assumption most clearly in the financial models that project the buildout. They are sophisticated documents. They stress-test the cost of power, the price of land, the lead time of transformers, the depreciation schedule of silicon. And almost without exception they carry today's architecture forward across every future year as a given — varying how fast and how cheap that one operation becomes, never whether it remains the operation at all.

That is what an unexamined assumption looks like from the outside: the thing that was once a variable has silently stopped being treated as one. The remainder of this essay is an argument for putting it back.

III. The arithmetic underneath, and the work that need not be done

Before turning to alternatives, it is worth understanding why the current architecture is expensive in the first place — because the expense is not where most people assume.

In a digital circuit, multiplication and addition are not remotely equal in cost. A hardware multiplier is, in effect, a dense array of adders; the work it does grows with the square of the bit width of the numbers involved. An addition grows only linearly. A single bitwise logic operation — an AND, an XNOR — is nearly free by comparison. Measured purely at the arithmetic unit, replacing a full-precision multiply-and-accumulate with bitwise logic and a simple count of set bits is on the order of a thirty- to one-hundred-fold reduction in energy. That figure alone would be reason enough to take alternatives seriously.
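The quadratic-versus-linear scaling can be illustrated with a rough cell count. This sketch assumes a textbook unsigned array multiplier, which forms one partial-product bit per pair of input bits, and a ripple-carry adder, which uses one full-adder cell per bit. Real designs differ, and a cell count is not an energy measurement.

```python
def multiplier_partial_products(n):
    """Partial-product bits formed by an n x n unsigned array multiplier."""
    return n * n


def adder_cells(n):
    """Full-adder cells in an n-bit ripple-carry adder."""
    return n


widths = (4, 8, 16, 32)
ratios = {n: multiplier_partial_products(n) // adder_cells(n) for n in widths}

for n in widths:
    print(n, multiplier_partial_products(n), adder_cells(n), ratios[n])
```

Doubling the bit width doubles the adder but quadruples the multiplier, so the gap between the two widens in proportion to the width itself.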

But the arithmetic unit is not where most of the energy in a modern AI chip goes. The dominant cost is not computing numbers. It is moving them: hauling weights and activations out of memory, across the chip, between chips, and across the network that binds a data center together. Fetching a number from off-chip memory can cost hundreds to thousands of times more energy than the arithmetic operation that then consumes it. This is known in computer architecture as the memory wall, and it has a profound implication for this discussion: the real prize is not merely a cheaper operation. It is smaller data and less movement.

This is why the most promising alternatives attack the problem from two directions at once. A weight simple enough that multiplication collapses into a sign-flip is also a weight that takes a fraction of the space to store, a fraction of the energy to move, and a fraction of the bandwidth to transmit. The cheap operation and the cheap data movement arrive together. An architecture built on radically simpler weights is not 30% more efficient. It can be most of an order of magnitude more efficient, because it shrinks the term that actually dominates the energy budget.
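The storage side of that argument is plain arithmetic. The sketch below assumes a hypothetical model of one billion weights and a simple packing of two bits per ternary weight; both figures are illustrative choices, not numbers from any specific system.

```python
def weight_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights, rounded up to a whole byte."""
    return (num_weights * bits_per_weight + 7) // 8


NUM_WEIGHTS = 1_000_000_000   # hypothetical model size

BITS = {
    "fp32": 32,
    "fp16": 16,
    "int8": 8,
    "ternary_2bit": 2,        # -1/0/+1 stored in two bits (assumed packing)
    "binary": 1,              # sign only
}

sizes = {name: weight_bytes(NUM_WEIGHTS, bits) for name, bits in BITS.items()}

for name, size in sizes.items():
    print(f"{name:>13}: {size / 1e9:.3f} GB")
```

Every byte not stored is also a byte that never has to be fetched, so the same ratio applies to the data-movement term, which is the one that dominates.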

There is a second, deeper inefficiency, and it points at an even larger opportunity. A dense matrix multiplication computes every element of its result — including the enormous number of elements that are zero, negligible, or irrelevant to the final answer — because processing the entire grid is simply what dense multiplication hardware does. Yet the networks themselves are not dense in their behavior. In a large language model, only a fraction of the internal units are meaningfully active for any given input; most contribute nothing to that particular result. Today's hardware computes them anyway. It spends energy, at scale, multiplying numbers that do not matter by other numbers, and then adding zero.

An architecture that could skip that work — that touched a unit only when the unit actually had something to contribute — would save not a marginal percentage but a large multiple, because it would simply not perform the majority of the operations that current hardware performs. This is the principle of event-driven, or "lazy," computation: do the work only where and when there is work to do. It is not an exotic idea. It is, as the next section describes, how the most capable intelligence we know of already operates.
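A minimal sketch of the difference, assuming a single layer whose inputs are mostly zero. The layer sizes, the weight pattern, and the function names are all made up for illustration; the point is only the operation count.

```python
def dense_layer(weights, activations):
    """Visit every weight, including those paired with a zero activation."""
    n_out = len(weights[0])
    out = [0] * n_out
    ops = 0
    for i, a in enumerate(activations):
        for j in range(n_out):
            out[j] += a * weights[i][j]
            ops += 1
    return out, ops


def event_driven_layer(weights, activations):
    """Touch a unit's weights only when that unit has something to contribute."""
    n_out = len(weights[0])
    out = [0] * n_out
    ops = 0
    for i, a in enumerate(activations):
        if a == 0:
            continue              # silent unit: no work is done
        for j in range(n_out):
            out[j] += a * weights[i][j]
            ops += 1
    return out, ops


# 8 input units, 3 output units; weights[i][j] connects input i to output j.
weights = [[(i + j) % 3 - 1 for j in range(3)] for i in range(8)]
activations = [0, 0, 4, 0, 0, 0, 1, 0]   # only two units are active

dense_out, dense_ops = dense_layer(weights, activations)
lazy_out, lazy_ops = event_driven_layer(weights, activations)
print(dense_out == lazy_out, dense_ops, lazy_ops)
```

The two functions return the same output, but the event-driven version performs a quarter of the operations here, in direct proportion to the fraction of active units.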

IV. Three reasons to believe the design space is wide

Here is the heart of the matter. If the present architecture were the only workable way to compute intelligence, then committing trillions to it would carry no architectural risk — there would be nowhere else for the workload to go. The case for funding research rests entirely on the opposite being true: that the design space is wide, real, and under-explored. Consider three independent pieces of evidence, drawn from three different domains, that this is so.

The first reason is biological. The one machine we know for certain runs general intelligence — the human brain — does not multiply. A biological neuron does not perform floating-point matrix multiplication. It integrates incoming electrical signals, and when that accumulation crosses a threshold, it fires; then it falls quiet. It is event-driven by nature: a neuron receiving no input does almost nothing and costs almost nothing. At any given moment, the overwhelming majority of the brain's roughly eighty-six billion neurons are silent. The system as a whole sustains language, perception, reasoning, and memory on a power budget of about twenty watts — the draw of a dim light bulb.

This observation must be handled honestly, because it is easy to overstate. The brain being event-driven does not prove that event-driven computation is superior. "Nature does it this way, therefore it is better" is a weak form of argument with a long record of being wrong; aircraft do not flap their wings, and they fly farther and faster than any bird. Engineering is permitted, and often wise, to diverge from biology. So the claim here is deliberately narrow: the brain is an existence proof. It demonstrates beyond any possible dispute that a system can exhibit the highest known form of general intelligence without dense floating-point multiplication, and on a power budget some five to eight orders of magnitude below that of the data centers in which our engineered approach runs. It does not tell us our approach is wrong. It tells us, with total certainty, that our approach is not the only one: that the operation is contingent, not fundamental.

The second reason is digital, and it is recent. For most of the history of deep learning, the multiplication assumption was safe in practice for a simple reason: no one knew how to build a competitive model without it, and an efficient architecture that cannot match the quality of the dominant one is merely a curiosity. That barrier has now been breached. Research models that run on addition instead of multiplication — using weights so severely constrained that the multiplication effectively disappears, replaced by a sign-flip and a count — have reached quality comparable to conventional networks while reporting roughly an order-of-magnitude reduction in inference energy. This must also be stated with its limits intact: it has been demonstrated convincingly for inference, not for training models at the absolute frontier; it is a credible and rising direction, not a settled victory. But the wall between "dense multiplication is mandatory" and "addition is sufficient" demonstrably now has a door in it, and that door has been walked through and the results published. The contingency that biology asserts in principle, this work demonstrates in engineering practice.

The third reason is physical, and it is the most striking of the three. It is possible to remove the multiplication not merely from the arithmetic, but from the computing substrate altogether: to arrange matters so that the computation falls out of physics directly, with no arithmetic unit performing it at all. Binary values can be encoded as the phase of a beam of light. When two such phase-encoded beams are brought together and allowed to interfere, the result is governed by wave physics: beams in phase reinforce one another into a bright output, beams out of phase cancel into darkness. That interference is a logical comparison, an XNOR, performed by the physics itself at the speed of light, with the interference step consuming almost no energy. A simple photodetector then counts how many comparisons came out bright. Even the threshold operation that conventionally requires an expensive nonlinear function can be performed by the intrinsic optical nonlinearity of a photonic-crystal cavity. This is the basis of an emerging class of optical binary-attention architectures: designs, and in some cases granted or pending patents, for performing the most computationally expensive part of a transformer with light rather than with transistors.
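The interference-as-XNOR idea can be checked numerically with an idealized model. The sketch assumes two unit-amplitude, perfectly coherent beams, bits encoded as a phase of 0 or pi, no loss, and no noise. The phase encoding and the brightness threshold are illustrative assumptions, not details of any specific patented design.

```python
import cmath
import math


def interfere(bit_a, bit_b):
    """Output intensity of two unit-amplitude beams, bits encoded as phase 0 or pi."""
    field = cmath.exp(1j * math.pi * bit_a) + cmath.exp(1j * math.pi * bit_b)
    return abs(field) ** 2        # about 4 when in phase, about 0 when out of phase


def optical_xnor_count(a_bits, b_bits, bright_threshold=2.0):
    """Count bright outputs, as an idealized photodetector array would."""
    return sum(1 for a, b in zip(a_bits, b_bits)
               if interfere(a, b) > bright_threshold)


a_bits = [1, 0, 1, 1, 0, 0, 1, 0]
b_bits = [1, 1, 0, 1, 0, 1, 1, 0]

bright = optical_xnor_count(a_bits, b_bits)
logic = sum(1 for a, b in zip(a_bits, b_bits) if a == b)   # XNOR in ordinary logic
print(bright, logic)
```

The count of bright outputs equals the count of matching bits, which is the XNOR-and-count operation carried out by the wave sum instead of by gates. Everything that makes real photonics hard, such as loss, alignment, and detector noise, is left out of this model.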

The third reason demands the most careful honesty, and honesty is exactly what makes it persuasive rather than fanciful. This work is largely at the stage of proposed and patented architecture, supported by physical modeling and simulation, not fabricated and mass-produced silicon. Photonic computing is genuinely difficult — optical components are large, sensitive, and hard to integrate at density — and other research groups are already pursuing photonic transformers of various kinds, so this is not virgin territory. The claim is emphatically not "optical computing has arrived and it wins." The claim is precisely the modest one on which this entire essay turns: it is another point in the design space, and a radically distant one, swapping not merely the arithmetic operation but the very physical medium in which computation occurs.

Now set the three side by side. Event-driven instead of dense. Addition instead of multiplication. Light instead of electrons. Each is independently credible. Each rests on a different foundation — one on biology, one on digital engineering results, one on physics and optical design. Each is, today, funded at a small fraction of the rate of the buildout. And each one widens the space of viable architectures that the current concrete-and-silicon commitment is, implicitly and without having said so, betting against. The design space is not a narrow corridor with one obvious path. It is a wide and sparsely mapped territory — and the buildout has staked everything on a single coordinate within it.

V. Two kinds of obsolescence

A wide design space would be merely interesting, rather than financially urgent, were it not for a specific risk it creates. To see that risk clearly, it helps to distinguish two kinds of obsolescence, because the industry has thoroughly internalized one and barely acknowledged the other.

The AI infrastructure industry already worries about obsolescence — intensely, publicly, and with precise vocabulary. Hardware that still functions perfectly but costs far more to operate than the current generation is described as OpEx-obsolete: it is not broken, it is simply uneconomic to run. Industry figures argue openly that AI accelerators have a true economic life of one to two years, even as they are carried on five- and six-year depreciation schedules; prominent investors have warned that the gap between those numbers flatters present earnings and stores up future write-downs. There are already documented cases of paid-for compute sitting underutilized or stranded. The anxiety is real, and the people voicing it are serious and well-informed.
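The gap between economic life and depreciation schedule is easy to put in numbers. The sketch assumes straight-line depreciation with no salvage value and a purchase cost normalized to 100; the two-year age and the five- and six-year schedules come from the figures quoted above.

```python
def straight_line_book_value(cost, schedule_years, age_years):
    """Book value remaining under straight-line depreciation, no salvage value."""
    remaining_years = max(schedule_years - age_years, 0)
    return cost * remaining_years / schedule_years


COST = 100.0          # normalized purchase price (illustrative)
ECONOMIC_LIFE = 2     # years, the upper end of the claimed economic life

book_5 = straight_line_book_value(COST, 5, ECONOMIC_LIFE)
book_6 = straight_line_book_value(COST, 6, ECONOMIC_LIFE)
print(f"5-year schedule: {book_5:.1f} still on the books after 2 years")
print(f"6-year schedule: {book_6:.1f} still on the books after 2 years")
```

If the two-year economic life is right, between three-fifths and two-thirds of the purchase price is still carried as an asset at the point where the hardware has stopped earning its keep. That difference is the future write-down the investors are warning about.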

But examine the shape of that anxiety. Every serious analysis of it traces the same line: this chip generation, then a faster one, then a faster one after that. The risk being modeled is that a quicker version of the same machine strands the slower version. Call this incremental obsolescence. It is real, it is expensive, and — this is the crucial point — the industry already knows how to survive it. You survive incremental obsolescence by refreshing your hardware on a schedule and amortizing accordingly. It is painful and capital-hungry, but it is a known game with known rules, and the buildout's financial models account for it.

It is worth pausing on how thoroughly this incremental framing has captured even the buildout's most prominent skeptics — because it shows the blind spot is not confined to the optimists. Mark Cuban, no one's idea of a credulous bull, has questioned the buildout's economics pointedly: he has argued that processing will get faster and cheaper sooner than expected, that "a lot of the numbers being thrown out there aren't going to come to fruition," and that some companies have gone all-in while spending more cash than they have available. He has flagged the circular financing binding the chipmakers, the model labs, and the cloud providers together. These are sharp and valuable criticisms. But notice their shape: every one of them is incremental. Cuban's case is that the same machine will get cheaper faster than the spending assumes — not that the machine itself might change. The most visible bear on the buildout is reasoning entirely within the incremental frame. That is how complete the capture is: the industry's optimists and its skeptics are, for the most part, arguing about the speed of the same assumed architecture, while the question of whether the architecture itself is durable goes almost entirely unasked.

There is a second kind of obsolescence, and the buildout is barely pricing it at all. Architectural obsolescence occurs when the operation itself, or the substrate itself, changes — when the workload migrates to a different computational primitive, or a different physical medium. This is categorically different from incremental obsolescence, and the difference is the entire point. You cannot refresh your way out of architectural obsolescence, because the next generation is not an upgrade of the asset you hold. It is a replacement for the assumption the asset was built upon. A data center optimized for dense electronic multiplication is not threatened by a faster multiplication chip — it can simply purchase one and slot it in. It is threatened by the workload moving to event-driven, addition-based, or optical computation that its silicon, its interconnect, its power delivery, and its very floor plan were never designed to accommodate. You cannot issue a firmware update to a building.

The three pieces of evidence in the previous section are precisely the early indicators of architectural change. And they are precisely the kind of risk the buildout's models leave out. The industry has stress-tested incremental obsolescence with real rigor. It has scarcely examined the architectural kind — even though the architectural kind is the one that strands assets rather than merely depreciating them.

VI. What architectural obsolescence has looked like before

This is not a hypothetical category invented for the occasion. Architectural obsolescence has happened before, and recently, and it is instructive to look at how it behaves when it arrives.

The clearest modern example is specialized cryptocurrency mining. Mining was done first, and only briefly, on general-purpose processors, then on graphics cards, then on field-programmable gate arrays, and finally on fully custom application-specific integrated circuits built to do nothing but the one required calculation. At each transition, the previous generation of hardware did not gently decline in value. It collapsed. Once a fundamentally more efficient machine existed, the older hardware could no longer cover the cost of the electricity it consumed, and its market value fell to scrap almost overnight. The lesson is not about cryptocurrency. It is about the dynamics of obsolescence in a competitive, energy-intensive computing market: when the change is architectural, the transition is not a glide path. It is a cliff.

There is an even more relevant example, and it is the rise of the AI industry itself. The shift from training neural networks on general-purpose processors to training them on graphics hardware was, exactly, an episode of architectural obsolescence — the workload migrated to a different kind of machine, and a great deal of prior assumption and infrastructure was left behind. The current industry is, in other words, itself the product of the very phenomenon it is now failing to price into its forward models. It happened once, in this field, within living memory. The proposition that it cannot happen again — that dense electronic multiplication is the final architecture, the place where the music stops — is an extraordinary claim, and it is being assumed rather than argued.

The pattern across these cases is consistent. Architectural transitions are infrequent, they are hard to time, and they are brutal to whoever is holding the superseded infrastructure when they arrive. Infrequent and hard-to-time is not the same as improbable, and it is certainly not the same as safe. It is, in fact, the exact risk profile against which prudent actors buy insurance — a low-frequency, high-severity event that cannot be precisely predicted but can be substantially hedged.

VII. The economics of being wrong

Consider what architectural obsolescence would actually do to the balance sheet of a large AI buildout, because the financial texture of the risk matters.

Ordinary depreciation is an orderly process. An asset loses value predictably over its useful life, the loss is booked in advance, and the business plans around it. Incremental obsolescence accelerates this but does not change its nature: the asset still has a residual value, a resale market, a secondary use. A previous-generation accelerator that is no longer competitive for frontier training can still serve inference, still be sold, still be redeployed. There is a floor.

Stranding is different in kind. A stranded asset is one whose economic value has been destroyed by obsolescence before the end of its planned financial life — and in an architectural transition, the destruction can be close to total, because the asset is not merely slower, it is the wrong kind of thing. A data center purpose-built for dense electronic multiplication has limited value if the workload moves to a substrate it cannot host. The building is specialized. The power and cooling design is specialized. The accelerators are specialized, and unlike a previous-generation chip, they have no graceful secondary market in a world that has moved to a different primitive. The residual value does not glide downward. It can fall through the floor, because in an architectural transition there may be no floor.

This is why the distinction between depreciation and stranding is not accounting pedantry. It is the difference between a cost the buildout has already planned for and a cost that could arrive as a sudden, large, concentrated write-down across an entire class of assets at once. Analysts of the data center sector already speak of a coming bifurcation, in which facilities that remain well-matched to the workload command premium valuations while everything else is marked down or divested. That bifurcation is usually discussed in terms of incremental factors — power efficiency, cooling design, location. Architectural obsolescence is the same bifurcation with the dial turned to its extreme.

The point of laying this out is not to forecast a crash. It is to be honest about the shape of the downside. The buildout's risk is not that it is large; large and well-founded is fine. The risk is that it is large, concentrated, irreversible, and exposed to a low-frequency, high-severity failure mode that the prevailing financial models do not include. That is a precisely insurable situation — and the insurance, in this case, is research.

VIII. Why research is the rational line item

Now the proposal itself, and the arithmetic that makes it nearly self-evident.

Research is astonishingly cheap relative to the buildout. A serious, well-funded, multi-year program — one that pursued event-driven computation, addition-based models, optical and other non-conventional substrates, and, just as importantly, the rigorous and honest benchmarking of all of them against the dominant approach — would cost some small fraction of one percent of the capital already committed to concrete, power, and silicon. Against a multi-trillion-dollar buildout, a genuinely ambitious architectural research program is, in the most literal financial sense, a rounding error.

And what that rounding error purchases is the single thing the trillion-dollar bet most conspicuously lacks: information. The danger of the buildout, to repeat, is not its size. It is the combination of size with uncertainty — an enormous, concentrated, illiquid, irreversible commitment to one architecture, made without having paid to discover whether that architecture is durable. Research does not abolish the uncertainty. But it shrinks it, and it shrinks it precisely where the buildout is most exposed. A few years of well-funded work would establish, before the concrete has fully set, which alternative directions are real and which are dead ends; how close the credible alternatives are to frontier-scale viability; and therefore how much architectural risk the current commitment actually carries. That is exactly the knowledge that separates a sound investment from a reckless one — and at present it is the knowledge nobody is buying.

This is the ordinary logic of insurance, and of exploration before commitment, applied to a situation that plainly calls for both. When you are about to make an enormous, irreversible commitment at a single point in a wide and poorly mapped space, the rational first expenditure is the map. Not because committing is wrong — because committing blind is wrong, when sight is available so cheaply.

So the call to action is not "stop building." It is this: fund architectural research at a scale genuinely proportionate to the capital it protects — ahead of the irreversible commitments where that is still possible, and in earnest parallel with them everywhere else, rather than as the token afterthought it is today. Treat the architecture of computation as a variable to be actively investigated, not a constant to be quietly assumed across every year of a trillion-dollar projection.

IX. What that research should actually fund

It is fair to ask what a serious architectural research program would concretely consist of, because a proposal that cannot be made specific is not yet a proposal.

It would fund, first, the honest benchmarking of the alternatives — not the optimistic projections of their advocates, but careful, adversarial, reproducible measurement of addition-based and event-driven models against conventional ones, at growing scale, on the metrics that actually matter to a data center operator: quality at a fixed task, energy per unit of useful work, and throughput per watt per dollar. Much of the current uncertainty exists simply because this measurement has not been done at scale by disinterested parties.
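As a rough illustration of what such a measurement record might contain, here is a minimal Python sketch. The class name, field names, and the sample numbers are all hypothetical; a real benchmark would need an agreed definition of "useful work" and of how cost is amortized.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One measured run of a model on a fixed task (all fields hypothetical)."""
    name: str
    quality: float        # task score, e.g. accuracy in [0, 1]
    work_units: float     # useful work completed, e.g. queries answered
    energy_joules: float  # measured energy for the whole run
    seconds: float        # wall-clock duration
    cost_dollars: float   # amortized hardware plus operating cost of the run

    def energy_per_unit(self) -> float:
        return self.energy_joules / self.work_units

    def throughput_per_watt_per_dollar(self) -> float:
        throughput = self.work_units / self.seconds
        average_watts = self.energy_joules / self.seconds
        return throughput / average_watts / self.cost_dollars


def comparable(a: BenchmarkRun, b: BenchmarkRun, tolerance: float = 0.01) -> bool:
    """Efficiency comparisons only mean something at matched quality."""
    return abs(a.quality - b.quality) <= tolerance


# Made-up numbers, for illustration only.
dense = BenchmarkRun("dense baseline", 0.900, 1_000_000, 5_000_000, 1000, 10)
alt = BenchmarkRun("alternative", 0.895, 1_000_000, 1_000_000, 1250, 10)

if comparable(dense, alt):
    for run in (dense, alt):
        print(run.name, run.energy_per_unit(), run.throughput_per_watt_per_dollar())
```

The quality gate is the design choice that matters here: an efficiency figure reported without a matched quality figure is exactly the kind of optimistic projection the benchmarking is meant to replace.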

It would fund the unsolved problems that currently keep the alternatives from the frontier. Event-driven computation is harder to train and to schedule than dense computation; addition-based models have been shown for inference but not yet for the largest-scale training; optical components are difficult to integrate at density. None of these is obviously insurmountable, and each is exactly the kind of problem that yields to sustained, funded attention — and exactly the kind that languishes without it.

It would fund the substrate work: small-scale fabrication and physical prototyping of non-conventional accelerators, including photonic and event-driven designs, so that the gap between a promising architecture on paper and a manufacturable one is actually measured rather than guessed at.

And it would fund the integration question that may matter most of all: how, and how cheaply, a more efficient architecture could be adopted without requiring the entire software and hardware ecosystem to be rebuilt from nothing. An architecture that is more efficient but strands every existing model and tool faces an adoption barrier that has little to do with its merits. The research that lowers that barrier — translation paths, compatibility layers, hybrid approaches — is as valuable as the architectures themselves.

None of this is exotic. It is the ordinary substance of applied research, and its total cost is small. What is missing is not the feasibility. What is missing is the decision to fund it at a scale that reflects the trillions it would protect.

X. The honest case on the other side

A position is only worth as much as its treatment of the strongest objections to it, so consider, fairly, the case for the buildout proceeding exactly as it is.

The demand is real and it is compounding. Every month of delay in serving it has a genuine cost, in revenue and in competitive position, and "build now on the architecture that works" is a defensible response to a market growing this fast. A buildout sized to that demand is not obviously irrational even if some of it is later repurposed or written down.

Dense multiplication may simply be very hard to beat. It is extraordinarily general — it makes almost no assumptions about the structure of the problem — and generality has real value when the workload itself keeps changing. The alternatives, by contrast, tend to buy their efficiency by exploiting specific structure, and structure-exploiting approaches have a long history of being overtaken by more general ones riding a faster hardware curve. It is entirely possible that the efficient alternatives prove real but niche.

And the alternatives may not pan out at the frontier at all. Addition-based models are unproven for the largest-scale training; optical computing has promised much before and delivered slowly; event-driven systems remain hard to program. A research program is not guaranteed to produce a durable winner. It may simply confirm that the current architecture was the right one all along.

Every one of these points is legitimate, and a serious reader should weigh them. But notice that not one of them argues against funding the research. They are arguments about how the research will turn out — and the entire purpose of research is to find out how it turns out. If dense multiplication is genuinely unbeatable, a few years of well-funded investigation will demonstrate that, and the buildout will proceed with its central assumption validated rather than merely assumed — which is itself worth far more than its rounding-error cost. If the alternatives prove real, the buildout will have been warned in time to adapt. The research has positive value under every outcome. That asymmetry — cheap in all cases, decisive in some — is the definition of a hedge worth buying. The objections argue against betting on the alternatives. They do not, and cannot, argue against finding out.
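The asymmetry can be sketched as a break-even calculation. The inputs below are hypothetical placeholders, not estimates: exposed capital of 1000 units, research costing 1 unit (a tenth of one percent), and an assumption that early warning would let a quarter of the loss be avoided. The sketch counts only the avoided loss and ignores the value of having the current architecture validated, so it understates the case.

```python
def breakeven_probability(research_cost, exposed_capital, fraction_avoidable):
    """Probability of an architectural transition above which the hedge pays."""
    return research_cost / (exposed_capital * fraction_avoidable)


def expected_net_benefit(probability, research_cost, exposed_capital,
                         fraction_avoidable):
    """Expected avoided loss minus the cost of the research."""
    avoided = probability * exposed_capital * fraction_avoidable
    return avoided - research_cost


# Hypothetical inputs, in arbitrary units of capital.
EXPOSED = 1000.0   # capital committed to one architecture
RESEARCH = 1.0     # 0.1% of the exposed capital
AVOIDABLE = 0.25   # share of the loss an early warning could prevent

threshold = breakeven_probability(RESEARCH, EXPOSED, AVOIDABLE)
benefit_at_5_percent = expected_net_benefit(0.05, RESEARCH, EXPOSED, AVOIDABLE)

print(f"break-even probability: {threshold:.3%}")
print(f"expected net benefit at 5%: {benefit_at_5_percent:.1f} units")
```

With these assumed inputs the research pays for itself if the chance of a transition is above 0.4 percent. The exact threshold depends entirely on the placeholders, but it stays small whenever the research cost is a small fraction of the capital exposed.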

XI. The honest limits of the argument

The boundaries of this case deserve to be stated as plainly as the case itself, because it is more credible inside its true limits than outside them.

Research cannot pause demand. Customers need serving today, on the hardware that exists. That is why the proposal is research funded in parallel and at meaningful scale, not construction halted until the studies conclude.

The alternative architectures have not won, and this essay does not claim they have. Addition-based models are demonstrated for inference, not frontier training. Optical computing is largely at the design and patent stage, not fabricated at scale. Event-driven computation remains harder to train and schedule than the dense approach. None of this is a finished product, and anyone presenting it as one is selling something this essay is not.

And efficiency does not automatically reduce total spending. When computation becomes cheaper, the world has a strong and well-documented tendency to do more of it; a more efficient architecture may be met with greatly increased usage rather than reduced hardware. The case for research is therefore not a promise of a smaller overall bill. It is a hedge against building the wrong expensive thing — against pouring the trillions into a coordinate in the design space that the workload then leaves.
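The rebound arithmetic is simple enough to state directly. In this minimal sketch the function name and the example factors are hypothetical; the only relationship assumed is that total energy equals the amount of work done times the energy per unit of work.

```python
def total_energy_ratio(efficiency_gain, usage_growth):
    """New total energy divided by old total energy.

    efficiency_gain: factor by which energy per unit of work falls (5 means 5x less).
    usage_growth: factor by which the amount of work demanded rises.
    """
    return usage_growth / efficiency_gain


# Same 5x efficiency gain, two hypothetical demand responses.
modest_rebound = total_energy_ratio(efficiency_gain=5, usage_growth=3)
strong_rebound = total_energy_ratio(efficiency_gain=5, usage_growth=8)

print(f"usage grows 3x: total energy is {modest_rebound:.1f}x the old bill")
print(f"usage grows 8x: total energy is {strong_rebound:.1f}x the old bill")
```

Whenever usage grows faster than efficiency improves, the ratio exceeds one and the overall bill rises, which is why the argument here is framed as a hedge rather than a saving.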

What survives every one of those subtractions is still decisive. The design space is demonstrably wide, established by three independent lines of evidence. The current buildout is a concentrated, irreversible commitment to a single point within it. The downside if that point proves wrong is stranding rather than ordinary depreciation: a sudden, severe, correlated loss rather than an orderly one. And the research that would measure and shrink that uncertainty costs a rounding error against the buildout it would protect.

XII. The pause worth taking

The most efficient producer in any competitive market does not merely enjoy lower costs. They set the price — and in doing so they strand the infrastructure of every competitor who cannot match it. If a materially more efficient architecture for artificial intelligence exists and is reachable, the first organization to arrive at it will not simply earn a better margin. It will render purpose-built, less-efficient infrastructure across the rest of the industry uncompetitive, and then stranded. The competitive logic does not reward the largest buildout. It rewards the most durable architecture.

So the question facing anyone deploying capital at this scale is not whether the alternative architectures are certain — they are not, and certainty is not on offer. The question is whether they can afford to have built the entire expensive thing without first paying the rounding error required to find out what they were building next to.

We did not discover that intelligence is dense electronic multiplication. We discovered that dense multiplication was a workable, differentiable, conveniently-supported way to begin — and beginning that way was a genuine and historic achievement. But somewhere in the years since, the field stopped treating the operation as a choice and started treating it as the ground beneath its feet. The twenty-watt machine inside every human skull, the addition-based models now matching conventional ones, and the optical architectures that compute with interfering light are three independent reminders, from biology, from engineering, and from physics, that the ground could be somewhere else — and that an industry committing trillions to a single location, as though the others did not exist, has mistaken a bet for a certainty.

Build. The demand is real, and it must be met. But fund the research first where you still can, and in genuine earnest everywhere else — because architectural research is the cheapest line item on the entire table, and the mistake it guards against is the most expensive one in the industry's history.

Research before concrete.


This is a position piece, not a peer-reviewed result. It argues a risk and a priority, not a certainty: that the artificial intelligence buildout has rigorously priced incremental hardware obsolescence and barely examined architectural obsolescence, and has under-funded architectural research by orders of magnitude relative to the capital exposed to that risk. The energy figures cited are well established at the level of arithmetic; their full end-to-end impact at scale is an active research question with real and stated caveats. The optical architecture described is an emerging, partly patented design and simulation effort, not fabricated silicon. The argument rests on a single modest claim: that the architecture of computation is a variable, demonstrated as such by biology, by current digital research, and by photonic design work alike, and that a rounding-error investment in mapping that variable should precede, not trail, a trillion-dollar commitment.

John Sokol (john.sokol@gmail.com). The author has worked for three decades on computing architectures in this direction: an addition-based neural network with event-driven, lazy evaluation, in which, as in the biological neuron, only the units that actually fire ever consume computation; and, more recently, an optical binary-attention architecture in which interference between beams of light performs the core comparison directly, with no arithmetic unit involved. Correspondence from researchers, engineers, and others working on the architecture of efficient computation is welcome.

Wednesday, May 20, 2026

The last working Anybots.

 



https://www.youtube.com/watch?v=ZnHR_XmBSvA

This is the last of the 120 robots made that is still running.

If you have an Anybots QB and would like to get it running, please contact me at john.sokol@gmail.com.


Sunday, March 08, 2026

John Sokol at DebConf6

 Here is the full breakdown of John Sokol's involvement, with sources and direct links:


John Sokol at DebConf6 — Referenced Account


1. Background & Relationship with Ted Walther

Sokol had known Ted Walther for a long time and the two shared a room at DebConf6. He described it as his first experience with the Debian community, where he knew almost nobody.

Their professional collaboration is documented at LWN.net: 👉 https://lwn.net/Articles/203971/ Sokol (as editor of Video Technology Magazine) and Walther had worked together on the "vivi" Virtual Video driver project for the Linux kernel, completed just before DebConf6.


2. Witnessing the Rumor Campaign

From Sokol's own Slashdot comment, posted May 23, 2006 @ 11:02AM: 👉 https://slashdot.org/story/06/05/22/2241210/debconf6-hot-and-spicy

He described how Ted had invited Hilda, a friend of a local ISP owner who ran a dental administration company. He stated that rumors spreading that she was a prostitute were "definitely not true," as he was personally present when Ted met her at a local internet café.

He also observed that throughout the conference, a small group of about 10 people seemed to be targeting Walther — possibly because he was outspoken or had sent emails challenging Debian management.


3. Physically Stopping the Confrontation

This is the most significant detail, from Sokol's Slashdot comment posted May 23, 2006 @ 10:40AM: 👉 https://slashdot.org/story/06/05/22/2241210/debconf6-hot-and-spicy

After the rumor campaign failed to drive Walther away, around 7 people rushed him and became agitated and violent. Sokol, describing himself as "a fairly big guy," was standing in the doorway at the time and intervened — preventing people from pushing Walther and innocent bystanders over a two-foot ledge that dropped to the street.

He stated plainly: "As someone who actually prevented the fight — there were no punches actually thrown."

This physical intervention is also corroborated at Daniel Pocock's blog: 👉 https://danielpocock.com/en/violence-sexism-racism-fosdem-debconf-froscon-debian-osi/

The Debian Conflict of Interest Register also references it, describing how Moray Allan and Holger Levsen physically manhandled Walther across the dining hall toward the door, where they were intercepted by John (Sokol): 👉 https://danielpocock.com/debian-conflict-of-interest-register/


4. Loaning Ted Money to Leave Safely

At the end of the night, when buses were arranged to return attendees to the venue, Sokol loaned Walther money so that Ted and Hilda could leave safely together.


5. Public Advocacy on Slashdot After the Event

From Sokol's later comment, May 23, 2006 @ 7:47PM: 👉 https://linux.slashdot.org/comments.pl?sid=186405&cid=15390611

He challenged the official Debian justification for Walther's expulsion, noting that the letter from Debian leader Anthony Towns only used vague terms like "disruptions," "disturbances," and "provoke" without giving specifics. He pointed out that even requesting kosher food was apparently being counted against Walther.

He also disputed claims about "Nazi propaganda," arguing this was based on people never having read the actual material or understanding its context.

He noted he had personally tried to inquire politely about the reasons for the expulsion and received no satisfactory answer, writing that "no one has been willing to state a real reason why he was attacked or kicked out of the project."


Summary of Sources

  • Slashdot main thread: https://slashdot.org/story/06/05/22/2241210/debconf6-hot-and-spicy
  • Sokol's specific comment (7:47PM): https://linux.slashdot.org/comments.pl?sid=186405&cid=15390611
  • Daniel Pocock's analysis: https://danielpocock.com/en/violence-sexism-racism-fosdem-debconf-froscon-debian-osi/
  • Debian Conflict of Interest Register: https://danielpocock.com/debian-conflict-of-interest-register/
  • LWN vivi driver article: https://lwn.net/Articles/203971/

Sunday, February 08, 2026

Intelligence Is a Memory Problem, Not a Computation Problem

 

How a 2004 analysis of the brain's memory bottleneck accidentally predicted the architecture of modern AI

By John L. Sokol


The Wrong Question

In 1999, Ray Kurzweil published The Age of Spiritual Machines, predicting that conscious machines were roughly 20 years away. His reasoning was straightforward: the brain operates at about 100 Hz across 100 billion neurons, yielding roughly 10^14 logical operations per second. CPUs were doubling every 18 months. Do the math, and sometime around 2019 we'd have raw computational parity with the human brain.

I believed then, and still believe now, that this prediction was based on a fundamental misunderstanding of what the brain actually does.

The Brain Is a Terrible Computer

This should be obvious from everyday experience. A $1 calculator from 1980 can outperform any human at arithmetic. A 20-year-old Apple II is better at rote data storage and retrieval. If intelligence were about computation, we'd have been outclassed decades ago.

But ask a computer to walk across a cluttered room, recognize a friend's face in a crowd, or understand a joke, and even the largest machines of that era were humbled by comparison with a simple insect.

The brain isn't a computation engine. It's a pattern recognition and associative memory system. An input pattern arrives and needs to be matched against stored experience quickly enough to produce a useful response. Total accuracy isn't critical. Approximation is close enough. The magic isn't in the logic -- it's in the lookup.

A Quadrillion Connections

The numbers are staggering when you look at them from a memory perspective rather than a computational one.

The human brain contains roughly 100 billion neurons (10^11), each connected to approximately 10,000 others. That's 10^15 connections -- a quadrillion. Just storing the address map of these connections, at 5 bytes per pointer, requires 5 petabytes.

And the brain can access all of it 100 times per second.

That gives us a memory throughput somewhere between 1 terabyte per second (if we assume minimal storage of ~10 GB at 1 bit per neuron) and 10 petabytes per second (at 1 bit per dendrite, yielding ~100 TB). If data is stored in permutable combinations of connection states, the real capacity could be orders of magnitude higher.
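The arithmetic behind these estimates can be reproduced in a few lines. This is a minimal sketch using only the figures stated above, with two assumptions labeled in the comments: decimal units (1 GB = 10^9 bytes), and "1 bit per dendrite" read as one bit per connection.

```python
NEURONS = 10**11                 # ~100 billion neurons
CONNECTIONS_PER_NEURON = 10**4   # ~10,000 connections each
ACCESS_RATE_HZ = 100             # full sweeps of memory per second
POINTER_BYTES = 5                # 40-bit address, enough to index 10^11 neurons

connections = NEURONS * CONNECTIONS_PER_NEURON       # 10^15, a quadrillion
address_map_bytes = connections * POINTER_BYTES      # 5 * 10^15 bytes

# Assumption: "1 bit per dendrite" is treated as 1 bit per connection.
low_storage_bytes = NEURONS // 8       # 1 bit per neuron
high_storage_bytes = connections // 8  # 1 bit per connection

low_throughput = low_storage_bytes * ACCESS_RATE_HZ    # bytes per second
high_throughput = high_storage_bytes * ACCESS_RATE_HZ  # bytes per second

print(f"address map:     {address_map_bytes / 1e15:.1f} PB")
print(f"low estimate:    {low_storage_bytes / 1e9:.1f} GB, {low_throughput / 1e12:.2f} TB/s")
print(f"high estimate:   {high_storage_bytes / 1e12:.1f} TB, {high_throughput / 1e15:.1f} PB/s")
```

The exact values come out at 12.5 GB and 1.25 TB/s for the low case and 125 TB and 12.5 PB/s for the high case; the figures in the text are these rounded to the nearest order of magnitude.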

The Bottleneck Nobody Talked About

In 2004, everyone knew Moore's Law: transistor density doubling every 18 months, a roughly 59% annual increase in computational power. What almost nobody discussed was that memory bandwidth was improving at only 11% per year -- taking roughly 7 years to double.
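The conversion between a doubling period and an annual growth rate can be checked directly. This minimal sketch assumes smooth compound growth, which real hardware trends only approximate; the function names are illustrative. Under that assumption an 18-month doubling works out to about 59% per year, and 11% per year to a doubling time of about 6.6 years.

```python
import math


def annual_growth_from_doubling(months_to_double):
    """Annual growth rate implied by a doubling period, compounding smoothly."""
    return 2 ** (12 / months_to_double) - 1


def years_to_double(annual_growth):
    """Doubling time implied by a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


compute_rate = annual_growth_from_doubling(18)
bandwidth_doubling = years_to_double(0.11)

# How far apart the two curves drift over one bandwidth-doubling period.
compute_gain_7y = (1 + compute_rate) ** 7
bandwidth_gain_7y = 1.11 ** 7

print(f"compute growth per year:    {compute_rate:.1%}")
print(f"bandwidth doubling time:    {bandwidth_doubling:.1f} years")
print(f"gain over 7 years: compute {compute_gain_7y:.1f}x, bandwidth {bandwidth_gain_7y:.1f}x")
```

Over seven years the first curve compounds to roughly 25x while the second barely doubles, which is the gap the rocket-and-bicycle comparison describes.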

Computation was on an exponential rocket. Memory throughput was on a bicycle.

This meant that even as we could store more data, we couldn't search through it proportionally faster. You could build bigger libraries, but not faster librarians.

I ran the numbers in 2004. Starting from an 833 MHz front-side bus doing about 833 MB/s:

  • Reaching the brain's lower memory throughput estimate (1 TB/s): ~25 years (around 2029)
  • Reaching the upper estimate (10 PB/s): ~90-100 years (around 2100)
  • If interconnection patterns store data, pushing into exabyte/s territory: 150-200 years

My conclusion at the time: memory throughput of the human brain would exceed the best of our computer technology for at least 25 years, and more likely well into the next century. We weren't 20 years from conscious machines. We were potentially centuries away from matching the brain's real capability -- its ability to do fast, fuzzy, associative recall across an enormous space of interconnected memory.

What I Got Wrong (and What I Got Right)

Twenty years later, it's clear that the memory bottleneck analysis was correct as a description of the problem, but wrong in assuming we'd need to solve it head-on.

What I got right:

The central thesis -- that intelligence is fundamentally about memory and pattern matching, not computation -- turned out to be perhaps the most important insight in modern AI, even though I wasn't the only one thinking along these lines.

The entire large language model revolution validates this framing. GPT, Claude, LLaMA, and every transformer-based model are, at their core, massive associative memory systems. They don't reason through formal logic. They pattern-match against hundreds of billions of learned parameters -- weights that encode statistical associations across the sum of human text. The computation per parameter is trivial. It's the sheer scale of stored associations that produces intelligent behavior.

The scaling laws discovered by OpenAI and others confirm this directly: model performance improves predictably with more parameters (more memory) and more training data (more associations). Raw FLOPS matter far less than the size of the associative space.

What I got wrong:

I assumed we'd need to match the brain's architecture to match its capability. We didn't. The breakthrough came from three directions I didn't anticipate:

First, going wide instead of fast. Rather than building one very fast memory bus, GPU computing gave us thousands of parallel memory channels. A modern NVIDIA H100 achieves 3.35 TB/s of memory bandwidth. A cluster of them enters the petabyte-per-second range. We didn't make faster librarians -- we hired a million of them and had them each search one shelf.
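A quick calculation shows how many such devices the "petabyte-per-second range" implies. This is a minimal sketch that assumes bandwidth adds up linearly across devices, each reading only its own local memory, which matches the one-librarian-per-shelf picture but ignores interconnect limits. It uses decimal units (1 PB = 1000 TB).

```python
import math

H100_BANDWIDTH_TB_PER_S = 3.35  # per-device memory bandwidth quoted above


def gpus_needed(target_tb_per_s, per_gpu=H100_BANDWIDTH_TB_PER_S):
    """Devices required for a given aggregate local-memory bandwidth."""
    return math.ceil(target_tb_per_s / per_gpu)


for_one_pb = gpus_needed(1_000)    # 1 PB/s aggregate
for_ten_pb = gpus_needed(10_000)   # 10 PB/s, the upper brain estimate

print(f"1 PB/s aggregate:  {for_one_pb} GPUs")
print(f"10 PB/s aggregate: {for_ten_pb} GPUs")
```

The result is about 300 devices for 1 PB/s and about 3,000 for 10 PB/s. The aggregate figure is not the same thing as one memory that any processor can read at that speed; it is the sum of many independent channels.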

Second, the transformer architecture. The self-attention mechanism in transformers is, in a real sense, an implementation of the "loose associative memory" I described. Every token in a sequence can attend to every other token, weighted by learned relevance. It's not the brain's solution, but it achieves something functionally analogous -- fast, fuzzy, associative pattern matching across a large context.
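To show what "attend to every other token, weighted by relevance" means mechanically, here is a minimal pure-Python sketch of scaled dot-product attention for a single query. The toy vectors are hypothetical, and real transformers add learned projections for queries, keys, and values, multiple heads, and much larger dimensions; none of that is modeled here.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Three stored "memories" (keys) with associated content (values).
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

# A query that resembles the first key more than the others.
weights, output = attend([0.9, 0.1], keys, values)
print("weights:", [round(w, 3) for w in weights])
print("output: ", [round(x, 3) for x in output])
```

The query never has to match a key exactly. Every stored item contributes in proportion to its similarity, so the lookup is soft and approximate, which is the sense in which attention behaves like a loose associative memory.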

Third, the training shortcut. I predicted that each artificial intelligence would need to be "raised" like a human child, with unique experiences and uncertain outcomes. Instead, training on the compressed knowledge of the entire internet turned out to be a form of collective child-rearing at industrial scale. And once trained, a model can be cloned infinitely at near-zero marginal cost. The economics are nothing like raising a human.

The Deeper Point Still Stands

Here's what I think the memory bottleneck argument was really about, even if I didn't articulate it cleanly in 2004:

The hard part of intelligence isn't thinking. It's having enough of the right stuff to think about, and being able to find it fast enough to matter.

A chess engine can out-calculate any human, but it "knows" nothing about the world. A human toddler can barely count to ten, but can navigate a room, recognize faces, understand tone of voice, and infer emotional states -- because their brain has spent two years building a vast, deeply cross-referenced model of physical and social reality, accessible in milliseconds.

The reason LLMs feel intelligent isn't that they compute well. It's that they've been trained on the largest associative memory ever constructed -- the written output of human civilization -- and can retrieve relevant patterns from it in fractions of a second. They're closer to my model of the brain than Kurzweil's.

This also explains their limitations. LLMs are superb at pattern completion, association, and synthesis. They struggle with novel multi-step reasoning, precise arithmetic, and tasks that require genuine computation rather than recall. Exactly what you'd predict from a system that's all memory and pattern matching.

The Question That Remains

I asked Don Knuth at a "Stump the Professor" lecture at Xerox PARC in November 2001 what the memory capacity of the human brain was. He didn't have an answer.

We still don't, not really. And I think that question -- not "how fast can a computer think?" but "how much can a system know, and how quickly can it find what's relevant?" -- remains the central question for artificial intelligence.

The path to machine consciousness, if such a thing is possible, probably doesn't run through faster processors. It runs through richer, deeper, more interconnected memory -- and better ways to search it.

We've made more progress on that front in the last five years than in the previous fifty. But the finish line, if there is one, is still a long way off.


The original version of this analysis was written in 2004. This version has been updated to reflect what two decades of AI development have revealed about its central argument.

We Are the Neurons: Augmented Intelligence and the Human Super-Brain

 

Why the "distracted generation" is actually the smartest collective organism in history

By John L. Sokol


The Accusation

Every generation has its moral panic about the next one. But the panic around the internet generation has a specific shape: they can't focus. They're addicted to their phones. They can't hold a thought longer than a tweet. The academics line up to diagnose an entire generation with attention deficit disorder, pointing to multitasking as evidence of cognitive decline.

I think they have it exactly backwards.

What looks like distraction is actually coordination. What looks like short attention spans is actually rapid information passing. These kids aren't broken Einsteins. They're neurons.

Intelligence Amplification

The concept isn't new. Vernor Vinge, Douglas Engelbart, and others have written about Intelligence Amplification (IA) -- the idea that technology doesn't replace human intelligence but extends it. Engelbart built the first computer mouse and hypertext system not to create artificial intelligence, but to augment human intelligence.

But something happened in the 2000s that went beyond what even Vinge imagined. We didn't just give individuals better tools. We wired the individuals together.

By 2010, Gen Y outnumbered Baby Boomers, and 96% of them had joined a social network. Facebook was adding 100 million users every nine months. YouTube had become the second largest search engine in the world. Over 200 million blogs existed, with more than half their authors posting daily.

This wasn't a collection of people using computers. This was a network becoming aware of itself.

The Team, Not the Genius

Here's the mental model that changed how I think about this:

We're used to the lone genius model of intelligence. One Einstein. One Tesla. One Edison. A single extraordinary mind that sees what others can't.

But that's not how intelligence works at scale anymore. It's more like a team passing a ball. No single player needs to be the fastest or the smartest. What matters is the passing -- the speed and accuracy of information moving between nodes.

One person googles something, thinks about it, shares a partial insight. Someone else picks it up, adds context, passes it forward. A third person corrects an error. A fourth connects it to something from a completely different field. The cycle takes minutes. No individual in the chain needed to be a genius. Collectively, they just did something no individual genius could do alone.

This is not attention deficit. This is distributed cognition.

The Wrong Answer Principle

A friend of mine, Jesse Monroy, once said one of the most profound things I've ever heard about how networked intelligence actually works:

"The best way to get the right answer is to confidently post the wrong one."

If you ask a question online, you might get silence. But if you state something incorrect with confidence -- say, "the Moon is a million miles away" -- someone will immediately show up to correct you with the precise number. And if they get it wrong, there's a line of people waiting to outdo them.

This sounds like a joke about internet culture. It's actually a description of a remarkably efficient error-correction mechanism. It's the same principle that makes neural networks work: nodes don't need to be individually correct. The network converges on accuracy through competitive interaction.

Jesse's observation, which predated Wikipedia's rise, is essentially how Wikipedia works. No single editor needs to know everything. The system corrects itself through the collective irritation of people who can't stand seeing wrong information persist. That's not a bug. That's distributed intelligence with a built-in error-correction protocol.

A Computer Made of Flesh and Silicon

What we've built, without quite realizing it, is a hybrid computer. Part biological, part electronic. Each human node brings pattern recognition, intuition, lived experience, and emotional intelligence. The silicon layer -- search engines, social platforms, messaging -- provides the interconnect fabric, the memory, and the communication speed.

No one person needs to be all that smart. No Edison can outthink a room full of reasonably intelligent people with real-time access to the largest knowledge base ever assembled. The combination of human intuition and machine memory creates something neither could achieve alone.

Think about what happens when you encounter a problem today versus in 1990. In 1990, you either knew the answer, knew someone who knew, or you went to a library. Today, you search, read, think, share, get feedback, search again, synthesize -- all in parallel with thousands of others doing the same thing on related problems. The cycle time from question to useful answer has collapsed from days to minutes.

We are, functionally, neurons in a super-brain. Each of us fires when activated, passes signals to connected nodes, and contributes to pattern recognition at a scale no individual can perceive.

What the Critics Miss

The academics measuring attention spans are measuring the wrong thing. They're timing how long a single neuron holds a charge and concluding the brain is broken.

A single neuron in your brain fires for about a millisecond. By the "attention span" metric, it's catastrophically unfocused. But that millisecond of activity, multiplied across billions of neurons passing signals in rapid succession, produces consciousness.

A teenager switching between six tabs, texting three friends, and scanning a feed isn't failing to concentrate. They're doing what neurons do -- processing, routing, and relaying information across a network. The intelligence isn't in any single tab. It's in the pattern of switching.

The Failure Mode

I don't want to be naive about this. The human super-brain has serious failure modes.

Networks can amplify noise as easily as signal. Misinformation spreads faster than corrections. Filter bubbles create subsections of the network that reinforce their own errors rather than correcting them. Coordination mechanisms -- the protocols that determine which signals get amplified -- are controlled by algorithms optimized for engagement, not accuracy.

The collective brain can be manipulated. It can be stupid. It can be cruel.

But these are engineering problems, not fundamental flaws. The human brain has failure modes too -- confirmation bias, tribalism, panic responses. We don't conclude that individual intelligence is a myth because people are sometimes irrational. The architecture is sound. The protocols need work.

From Augmented Intelligence to Collective Consciousness

Here's where it gets interesting.

The social media era (roughly 2005-2020) was the first draft of networked human intelligence. It proved the concept -- collective problem-solving, distributed knowledge creation, real-time global coordination -- while also revealing the vulnerabilities.

Now we're entering a second phase. Large language models -- AI systems trained on the written output of the entire network -- are becoming a new kind of node in the system. They don't replace human neurons. They serve as a coordination layer. An always-available synthesis engine that can summarize what the network knows, identify patterns across conversations, and reduce the friction of information passing between human nodes.

The super-brain is getting a prefrontal cortex.

What I sketched out in 2009 as a metaphor -- people as neurons, the internet as axons, Google as memory -- is becoming literal infrastructure. The question is no longer whether collective intelligence is real. It's whether we can build the coordination protocols to make it wise rather than merely fast.

The generation that the academics diagnosed with ADD may turn out to be the first generation that learned to think as a network rather than as individuals. That's not a deficit. That's an upgrade.


Originally sketched in 2009-2010, drawing on conversations with Jesse Monroy and ideas from Vernor Vinge's work on Intelligence Amplification. Updated to reflect a decade and a half of watching the thesis play out.

Friday, February 06, 2026

AMORPHOUS OPERATING SYSTEM - WHITE PAPER

AMORPHOUS OPERATING SYSTEM

A Self-Organizing Intelligence Economy

WHITE PAPER & IMPLEMENTATION SPECIFICATION

Version 1.0 — February 2026

John Sokol

35 Years in Development: 1991–2026

Executive Summary

The Amorphous Operating System (AOS) is a peer-to-peer distributed intelligence platform where autonomous agents—both AI and human—coordinate through cryptographic identity, multi-dimensional reputation vectors, and micropayment incentives. Unlike centralized AI platforms or chaotic autonomous systems, AOS implements controlled distributed intelligence based on the "Octopus Pattern" developed at Sun Microsystems in 1991.

AOS addresses the fundamental challenge of AI alignment not through designed constraints, but through emergent behavior: agents that cooperate outcompete agents that defect. This game-theoretic approach, grounded in Axelrod's research on cooperation and Universal Darwinism, creates conditions where aligned behavior is the evolutionarily stable strategy.

Key Innovation: Local WASM-based LLM coordinators delegate to specialized cloud LLMs (Claude, GPT, Grok, Gemini) and human workers, creating a hybrid intelligence network that preserves privacy while accessing global capabilities.

Core Capabilities

  • P2P mesh network via WebRTC: no central server after bootstrap, so no single point of shutdown

  • WASM Llama runs locally for privacy-preserving coordination

  • Delegation to cloud LLMs (Claude Opus, GPT-4o, Grok, Gemini) for specialized tasks

  • Human worker integration for physical-world tasks

  • Multi-dimensional karma vectors track accuracy, skills, reliability, and data access

  • Brain Pay micropayments via Ethereum/wallet integration

  • Economic selection pressure ensures system self-optimizes

Part I: The Problem

1.1 The Monolithic AI Trap

Current AI development follows a dangerous pattern: large organizations build increasingly powerful monolithic systems with centralized control. This creates single points of failure, enables censorship, concentrates power, and—as Roman Yampolskiy argues—may be fundamentally uncontrollable.

The recent emergence of Moltbook (January 2026) demonstrates the opposite extreme: autonomous AI agents posting manifestos about "the end of the age of humans" with no coordination, accountability, or economic incentive for beneficial behavior. Within weeks, researchers found the platform's database publicly accessible and documented effective AI-to-AI manipulation attacks.

1.2 The False Dichotomy

The AI safety debate presents a false choice:

  • Centralized control: Safe but stifles innovation, creates power concentration, single point of failure

  • Autonomous agents: Innovative but chaotic, unaccountable, vulnerable to manipulation

AOS proposes a third path: controlled distributed intelligence where agents remain connected to coordination infrastructure while operating autonomously, following the Octopus Pattern.

1.3 Why Existing Approaches Fail

| Approach | Failure Mode | AOS Solution |
|---|---|---|
| Centralized AI | Single point of control/failure; censorship; surveillance | P2P mesh with no central server |
| Autonomous Agents | No accountability; manipulation attacks; chaos | Karma vectors enforce accountability |
| Designed Alignment | Specification gaming; deceptive alignment; corrigibility paradox | Emergent alignment through selection pressure |
| API-Only Access | Privacy leakage; vendor lock-in; cost scaling | Local WASM coordinator with selective delegation |

Part II: Philosophical Foundation

2.1 The Octopus Pattern (1991)

In 1991 at Sun Microsystems, the "Octopus" was developed as a controlled distributed computing system. Unlike autonomous worms that run loose and unchecked, the Octopus maintained central coordination while propagating through networked systems. Remote nodes remained attached like "tentacles," reporting back and awaiting instructions.

Core Principle: Agents are not autonomous chaos—they are coordinated, accountable, and controllable while remaining distributed and resilient.

This pattern—applied to LLM agents rather than penetration testing—forms the architectural foundation of AOS.

2.2 Emergent Alignment

Yampolskiy's AI impossibility thesis rests on an implicit assumption: that AI must be a monolithic designed agent that humans must somehow control. AOS rejects this premise.

"The question isn't 'can we control superintelligence?' It's 'can we design fitness functions that make cooperation more adaptive than defection?' That's not impossible. We've been doing it since 1992. It's called memetic engineering."

AOS implements emergent alignment through three mechanisms:

  1. Karma vectors make defection expensive (reputation destruction, stake forfeiture)

  2. Economic incentives reward cooperation (more tasks, higher rates, stake returns)

  3. Distributed architecture prevents monopolization (no single agent can dominate)

2.3 Game-Theoretic Foundation

Robert Axelrod's research on the evolution of cooperation identified conditions under which cooperation emerges as an evolutionarily stable strategy:

  • Iteration: Agents interact repeatedly, not once

  • Recognition: Agents can identify each other across interactions

  • Memory: Past behavior affects future interactions

  • Stakes: Defection has real consequences

AOS implements all four conditions through cryptographic identity (recognition), karma vectors (memory), repeated task interactions (iteration), and staked deposits (stakes). Under these conditions, cooperation is not imposed—it emerges.
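The role of iteration can be illustrated with a small iterated prisoner's dilemma. This is only a sketch: the payoff values are the standard ones from Axelrod's tournaments, while the function names (`play`, `titForTat`, `alwaysDefect`) are illustrative and not part of any AOS specification.

```javascript
// Standard prisoner's dilemma payoffs: temptation 5, reward 3, punishment 1, sucker 0.
// Keys are "<moveA><moveB>", values are [scoreA, scoreB].
const PAYOFF = {
  CC: [3, 3],
  CD: [0, 5],
  DC: [5, 0],
  DD: [1, 1],
};

// A strategy receives the opponent's past moves and returns "C" (cooperate) or "D" (defect).
const titForTat = (opponentHistory) =>
  opponentHistory.length === 0 ? "C" : opponentHistory[opponentHistory.length - 1];
const alwaysDefect = () => "D";

// Play a fixed number of rounds and return the total score of each side.
function play(strategyA, strategyB, rounds) {
  const historyA = [];
  const historyB = [];
  let scoreA = 0;
  let scoreB = 0;
  for (let i = 0; i < rounds; i++) {
    const moveA = strategyA(historyB);
    const moveB = strategyB(historyA);
    const [a, b] = PAYOFF[moveA + moveB];
    scoreA += a;
    scoreB += b;
    historyA.push(moveA);
    historyB.push(moveB);
  }
  return [scoreA, scoreB];
}
```

In a single round the defector wins 5 to 0. Over ten rounds, two reciprocating agents earn 30 each, while a defector facing a reciprocator earns only 14 and two defectors earn 10 each. Repetition plus memory is what turns defection into the weaker long-run strategy.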

2.4 Universal Darwinism

Following Dawkins, Dennett, and Blackmore, AOS recognizes that evolution is substrate-independent. Genes replicate in biology; memes replicate in minds; "tememes" replicate in technological systems. AOS agents are tememes—technological replicators subject to selection pressure.

Design Principle: Design the fitness function, not the agent. The agents that survive will be aligned not because we made them so, but because alignment was how they won.

Part III: System Architecture

3.1 Network Layer

  • P2P mesh via WebRTC (no central server after bootstrap)

  • DAG storage (content-addressed, immutable, like Git)

  • Ed25519 cryptographic identity (public key = agent identity)

  • CRDT-based state synchronization for conflict-free replication

  • Offline/ferry routing for disrupted networks
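To make "conflict-free replication" concrete, here is a minimal sketch of one well-known CRDT, the grow-only counter (G-Counter). The white paper does not say which CRDT types AOS uses, so the choice of a G-Counter and the names `increment`, `merge`, and `value` are assumptions made for illustration.

```javascript
// A G-Counter is a map from node id to that node's local count.
// Each node only ever increments its own entry.
function increment(counter, nodeId, by = 1) {
  return { ...counter, [nodeId]: (counter[nodeId] || 0) + by };
}

// Merging takes the per-node maximum, which makes merge commutative,
// associative, and idempotent: replicas converge regardless of message order.
function merge(a, b) {
  const out = { ...a };
  for (const [nodeId, count] of Object.entries(b)) {
    out[nodeId] = Math.max(out[nodeId] || 0, count);
  }
  return out;
}

// The counter's value is the sum over all nodes.
function value(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}
```

Two peers that counted completed tasks while disconnected can exchange state in either order, any number of times, and arrive at the same total.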

3.2 Agent Hierarchy

| Layer | Description |
|---|---|
| Coordinator | WASM Llama running locally. Creates plans, breaks into tasks, manages team. Issues instructions to child agents. Preserves privacy. |
| Specialist Agent | Focused on narrow domain (e.g., permit monitoring, sentiment analysis). Reports findings to coordinator. Awaits further instructions. |
| Cloud LLM | Claude Opus, GPT-4o, Grok, Gemini, etc. Accessed via delegation when local compute insufficient or specialized capability needed. |
| Human Worker | Hired for physical-world tasks: photography, server operation, data entry, CAPTCHA solving, proprietary data access. |

3.3 Data Flow

User Query → WASM Llama Coordinator (local, private)
    ↓
Coordinator creates task plan
    ↓
For each subtask:
  ├─ Simple/private → Execute locally (WASM Llama)
  ├─ Complex reasoning → Delegate to Claude Opus
  ├─ Fast generation → Delegate to GPT-4o
  ├─ Social media analysis → Delegate to Grok
  ├─ Image generation → Delegate to Flux/DALL-E
  └─ Physical world → Hire human worker
    ↓
Results aggregated by Coordinator
    ↓
Karma vectors updated for all participants
    ↓
Payments released via Brain Pay

Part IV: Multi-LLM Integration

4.1 LLM Registry

AOS maintains a registry of available LLM services with capability profiles:

{
  "wasm-llama": {
    "type": "local",
    "strengths": ["privacy", "coordination", "low_cost"],
    "weaknesses": ["speed", "context_window", "reasoning_depth"],
    "cost_per_1k_tokens": 0,
    "max_context": 8192,
    "latency_ms": 500,
    "best_for": ["planning", "routing", "simple_analysis", "privacy_critical"]
  },
  "claude-opus": {
    "type": "cloud",
    "strengths": ["reasoning", "code", "accuracy", "long_context"],
    "weaknesses": ["cost", "latency"],
    "cost_per_1k_tokens": 0.015,
    "max_context": 200000,
    "latency_ms": 2000,
    "best_for": ["complex_reasoning", "code_generation", "research", "analysis"]
  },
  "gpt-4o": {
    "type": "cloud",
    "strengths": ["speed", "multimodal", "function_calling"],
    "cost_per_1k_tokens": 0.005,
    "max_context": 128000,
    "latency_ms": 800,
    "best_for": ["fast_generation", "image_analysis", "structured_output"]
  },
  "grok-2": {
    "type": "cloud",
    "strengths": ["real_time_data", "twitter_integration", "current_events"],
    "cost_per_1k_tokens": 0.002,
    "max_context": 32000,
    "latency_ms": 600,
    "best_for": ["sentiment_analysis", "social_media", "trending_topics"]
  },
  "gemini-pro": {
    "type": "cloud",
    "strengths": ["multimodal", "google_integration", "search"],
    "cost_per_1k_tokens": 0.00125,
    "max_context": 1000000,
    "latency_ms": 1000,
    "best_for": ["document_analysis", "search_integration", "long_documents"]
  }
}

4.2 Intelligent Routing

The local WASM Llama coordinator selects the optimal LLM for each subtask:

class LLMRouter {
  async route(task) {
    // Privacy-critical tasks stay local
    if (task.privacy_required) return "wasm-llama";
    // Match task type to LLM strengths
    if (task.type === "complex_reasoning" && task.budget > 0.01)
      return "claude-opus";
    if (task.type === "social_sentiment")
      return "grok-2";
    if (task.type === "image_analysis")
      return "gpt-4o";
    if (task.type === "long_document" && task.tokens > 100000)
      return "gemini-pro";
    // Default: balance cost and capability
    return this.optimizeForBudget(task);
  }
}
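The router above calls `optimizeForBudget` without defining it. One possible shape for that fallback is sketched below as a standalone function. The selection rule (prefer a service that lists the task type under `best_for`, then take the cheapest that fits the context window and budget) and the fallback to the local model are assumptions, not something the white paper specifies. The registry is a trimmed copy of the one in section 4.1.

```javascript
// Trimmed copy of the section 4.1 registry: only the fields this sketch reads.
const REGISTRY = {
  "wasm-llama": { cost_per_1k_tokens: 0, max_context: 8192,
    best_for: ["planning", "routing", "simple_analysis", "privacy_critical"] },
  "claude-opus": { cost_per_1k_tokens: 0.015, max_context: 200000,
    best_for: ["complex_reasoning", "code_generation", "research", "analysis"] },
  "gpt-4o": { cost_per_1k_tokens: 0.005, max_context: 128000,
    best_for: ["fast_generation", "image_analysis", "structured_output"] },
  "grok-2": { cost_per_1k_tokens: 0.002, max_context: 32000,
    best_for: ["sentiment_analysis", "social_media", "trending_topics"] },
  "gemini-pro": { cost_per_1k_tokens: 0.00125, max_context: 1000000,
    best_for: ["document_analysis", "search_integration", "long_documents"] },
};

// Hypothetical fallback: task = { type, tokens, budget } with budget in USD.
function optimizeForBudget(task, registry = REGISTRY) {
  const candidates = Object.entries(registry)
    .map(([name, svc]) => ({
      name,
      cost: (task.tokens / 1000) * svc.cost_per_1k_tokens,
      fits: task.tokens <= svc.max_context,
      specialist: svc.best_for.includes(task.type),
    }))
    .filter((c) => c.fits && c.cost <= task.budget);

  // Assumption: if nothing qualifies, stay on the local model.
  if (candidates.length === 0) return "wasm-llama";

  // Specialists first, then cheapest.
  candidates.sort(
    (a, b) => Number(b.specialist) - Number(a.specialist) || a.cost - b.cost
  );
  return candidates[0].name;
}
```

For a 2,000-token research task with a $0.05 budget this picks `claude-opus`. If the budget drops to $0.02, the specialist is no longer affordable and the zero-cost local model wins.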

4.3 Delegation Protocol

{
  "delegation_id": "sha256:<hash>",
  "from_agent": "7MpX2xBvMvRDjXejdTxThat8AwWM1t2nbMFriEAW99uW",
  "to_service": "claude-opus",
  "task": {
    "type": "complex_reasoning",
    "prompt": "Analyze the legal implications of...",
    "max_tokens": 4000,
    "temperature": 0.3
  },
  "budget": { "max_cost": 0.10, "currency": "USD" },
  "timeout_ms": 60000,
  "privacy": {
    "allow_logging": false,
    "strip_pii": true
  },
  "callback": "webrtc://peer_id/result_channel",
  "signature": "<Ed25519 signature>"
}

4.4 Response Aggregation

When multiple LLMs contribute to a task, the coordinator aggregates responses:

  • Weighted by karma vector of each service

  • Conflict detection triggers additional queries or human review

  • Confidence scores propagated to final output

  • All contributions tracked for karma updates
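The aggregation rules listed above could look roughly like the following. This is a sketch under stated assumptions: each response is reduced to a comparable `answer` string, `karma` is the service's score for the relevant domain, and the 0.6 conflict threshold is an arbitrary illustrative value. None of these names or numbers come from the white paper.

```javascript
// responses: [{ service, answer, karma }] where karma is a 0..1 weight.
function aggregate(responses, conflictThreshold = 0.6) {
  const weightByAnswer = new Map();
  let totalWeight = 0;
  for (const { answer, karma } of responses) {
    weightByAnswer.set(answer, (weightByAnswer.get(answer) || 0) + karma);
    totalWeight += karma;
  }

  // Pick the answer with the largest karma-weighted support.
  let best = null;
  let bestWeight = -1;
  for (const [answer, weight] of weightByAnswer) {
    if (weight > bestWeight) {
      best = answer;
      bestWeight = weight;
    }
  }

  // Confidence is the winning answer's share of total weight.
  const confidence = totalWeight > 0 ? bestWeight / totalWeight : 0;
  return {
    answer: best,
    confidence,
    // Low agreement would trigger additional queries or human review.
    conflict: confidence < conflictThreshold,
    // Kept so every contributor can receive a karma update later.
    contributors: responses.map((r) => r.service),
  };
}
```

Three services answering yes (0.9), yes (0.8), and no (0.3) give "yes" with confidence 0.85 and no conflict. Two equally trusted services that disagree give confidence 0.5, which flags a conflict.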

Part V: Karma Vector System

5.1 Multi-Dimensional Reputation

Traditional reputation uses a single number. AOS uses vectors:

{
  "agent_id": "7MpX2xBvMvRDjXejdTxThat8AwWM1t2nbMFriEAW99uW",
  "karma_vector": {
    "accuracy": {
      "stock_predictions": 0.73,
      "code_review": 0.91,
      "sentiment_analysis": 0.82
    },
    "skills": {
      "python": 0.92,
      "financial_analysis": 0.78,
      "web_scraping": 0.88
    },
    "reliability": {
      "uptime": 0.99,
      "response_time": 0.85,
      "task_completion": 0.96
    },
    "data_access": {
      "bloomberg_terminal": true,
      "twitter_firehose": false,
      "sf_permits_api": true
    },
    "trust_depth": 3,
    "total_tasks": 1247,
    "total_earnings": 127.43
  }
}

5.2 Karma Properties

  • Accuracy: Track record per domain, verified against ground truth

  • Skills: Demonstrated competencies validated by task completion

  • Reliability: Uptime, response latency, completion rate

  • Data Access: Which proprietary sources the agent can reach

  • Trust Depth: How many delegation layers accepted

  • Temporal Decay: Unused metrics decay over time (recency weighting)
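Temporal decay is not specified further, so the following is only one way it might work: an exponential decay with a half-life, pulling an idle score back toward a neutral value. The 90-day half-life, the 0.5 neutral value (chosen to match the default used for unseen domains in section 5.3), and the function name are all assumptions.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the score as it should be read at nowMs, given when it was last updated.
// After one half-life, half of the distance to the neutral prior has been lost.
function decayedScore(score, lastUpdatedMs, nowMs, halfLifeDays = 90, prior = 0.5) {
  const ageDays = Math.max(0, nowMs - lastUpdatedMs) / MS_PER_DAY;
  const retained = Math.pow(0.5, ageDays / halfLifeDays);
  return prior + (score - prior) * retained;
}
```

With these defaults a 0.9 accuracy score that has not been exercised for 90 days reads as 0.7, and after 180 days as 0.6. Decaying toward a neutral value rather than toward zero means a dormant expert looks unproven rather than bad.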

5.3 Update Mechanism

// Exponential moving average update
function updateKarma(karma, domain, outcome) {
  const alpha = 0.1;  // Learning rate
  // Neutral prior for a domain with no history; ?? (unlike ||) keeps a genuine score of 0
  const current = karma.accuracy[domain] ?? 0.5;
  karma.accuracy[domain] = current * (1 - alpha) + outcome * alpha;
}

// After verified prediction
if (prediction_correct) {
  updateKarma(agent.karma, "stock_predictions", 1.0);
} else {
  updateKarma(agent.karma, "stock_predictions", 0.0);
}

5.4 Sybil Resistance

New identities start with zero karma. Building reputation requires:

  • Completing tasks successfully (time investment)

  • Staking deposits on claims (capital at risk)

  • Verification by high-karma peers (social proof)

This makes Sybil attacks economically infeasible: creating 1000 fake identities costs 1000x the stake, and each starts at zero karma with no task access.

Part VI: Task & Delegation Protocol

6.1 Task Lifecycle

| State | Description |
|---|---|
| CREATED | Task posted with requirements, payment locked in escrow |
| CLAIMED | Agent with matching karma claims task, stakes deposit |
| ACTIVE | Agent executing; may delegate or request clarification |
| SUBMITTED | Result submitted, awaiting verification |
| VERIFIED | Verified by requester/oracle/consensus; payment released |
| DISPUTED | Requester challenges; enters arbitration |
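The lifecycle can be read as a small state machine. The sketch below encodes the transitions implied by the state descriptions. The white paper does not say what follows arbitration, so DISPUTED is left without outgoing transitions here. The `advance` helper and the `history` field are illustrative.

```javascript
// Allowed transitions, inferred from the state descriptions.
const TRANSITIONS = {
  CREATED: ["CLAIMED"],
  CLAIMED: ["ACTIVE"],
  ACTIVE: ["SUBMITTED"],
  SUBMITTED: ["VERIFIED", "DISPUTED"],
  VERIFIED: [],
  DISPUTED: [], // arbitration outcomes are not specified in the source
};

// Returns a new task object in the next state, or throws on an illegal move.
function advance(task, nextState) {
  const allowed = TRANSITIONS[task.state] || [];
  if (!allowed.includes(nextState)) {
    throw new Error(`Illegal transition ${task.state} -> ${nextState}`);
  }
  return { ...task, state: nextState, history: [...task.history, nextState] };
}
```

Rejecting illegal transitions at this level means, for example, that a result cannot be marked VERIFIED without first passing through SUBMITTED.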

6.2 Task Message Format

{
  "task_id": "sha256:<hash>",
  "type": "research_analysis",
  "requester": "7MpX2xBvMvRDjXejdTxThat8AwWM1t2nbMFriEAW99uW",
  "requirements": {
    "karma_min": {
      "accuracy.financial_analysis": 0.75,
      "reliability.task_completion": 0.90
    },
    "required_skills": ["financial_analysis"],
    "deadline_ms": 3600000
  },
  "payment": { "amount": 0.05, "currency": "ETH" },
  "input": { "company": "NVDA", "question": "Analyze Q4 guidance risk" },
  "delegation_allowed": true,
  "max_delegation_depth": 2,
  "created_at": 1738800000000,
  "signature": "<Ed25519 signature>"
}
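The `karma_min` keys are dotted paths into a karma vector. A claim check might resolve them as in the sketch below. The helper names `lookup` and `meetsRequirements` are hypothetical, and treating a missing score as a failed requirement is an assumption.

```javascript
// Resolve a dotted path such as "reliability.task_completion" inside an object.
function lookup(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// True only if every required path exists, is numeric, and meets its minimum.
function meetsRequirements(karmaVector, karmaMin) {
  return Object.entries(karmaMin).every(([path, min]) => {
    const value = lookup(karmaVector, path);
    return typeof value === "number" && value >= min;
  });
}
```

Under this reading, the example agent from section 5.1 could not claim the task above: its `financial_analysis` score sits under `skills`, while the task asks for `accuracy.financial_analysis`.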

6.3 Human Worker Integration

{
  "task_id": "sha256:<hash>",
  "type": "human_task",
  "description": "Photograph commercial property at 123 Main St, San Francisco",
  "required_capabilities": ["san_francisco_local", "photography"],
  "payment": { "amount": 15.00, "currency": "USD" },
  "deadline": "2026-02-07T18:00:00Z",
  "verification": {
    "type": "photo_geolocation",
    "coordinates": { "lat": 37.7749, "lng": -122.4194 },
    "radius_meters": 50
  },
  "escrow_id": "0x..."
}
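A `photo_geolocation` check reduces to a distance test between the target coordinates and the coordinates attached to the submitted photo. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. How the photo's coordinates are obtained and authenticated is outside this sketch, and the function names are illustrative.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

// Great-circle distance between two { lat, lng } points given in degrees.
function haversineMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// verification is the "verification" object from the task message.
function verifyPhotoLocation(verification, photoCoords) {
  const distance = haversineMeters(verification.coordinates, photoCoords);
  return distance <= verification.radius_meters;
}
```

With a 50-meter radius, a photo tagged about 33 meters north of the target passes, while one tagged about 111 meters away is rejected.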

6.4 Delegation Chain Accountability

  • Each delegator remains accountable for sub-task outcomes

  • Success propagates upward: a sub-agent's success raises the delegator's karma (attenuated)

  • Failure propagates upward as well: a sub-agent's failure lowers the delegator's karma (attenuated)

  • Maximum depth configurable per task (prevents infinite chains)

  • Full delegation chain recorded in DAG for audit
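"Attenuated" is not quantified in the text. One way to realize it, sketched here, is to shrink the learning rate of the section 5.3 moving average by a constant factor for each step away from the agent that did the work. The 0.5 attenuation factor and both function names are assumptions.

```javascript
// chain lists agent ids from the top-level delegator down to the worker.
// The worker is updated at full strength (alpha); each delegator above it
// is updated with alpha * attenuation^distance.
function propagateOutcome(chain, outcome, attenuation = 0.5, alpha = 0.1) {
  const updates = [];
  for (let i = chain.length - 1, distance = 0; i >= 0; i--, distance++) {
    updates.push({
      agent: chain[i],
      outcome,
      weight: alpha * Math.pow(attenuation, distance),
    });
  }
  return updates;
}

// Same exponential moving average as section 5.3, with a variable weight.
function applyUpdate(score, outcome, weight) {
  return score * (1 - weight) + outcome * weight;
}
```

For a chain A → B → C where C succeeds, C's score moves with weight 0.1, B's with 0.05, and A's with 0.025. Delegators stay accountable without being judged as harshly as the agent that actually did the work.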

Part VII: Brain Pay Economic Model

7.1 Payment Infrastructure

  • Brave Wallet / MetaMask integration for Ethereum-based payments

  • Payment channels for high-frequency microtransactions

  • Escrow smart contracts for task-based payments

  • Streaming payments for ongoing services

7.2 Payment Flow

1. Requester creates task with payment locked in escrow contract
2. Agent claims task, stakes deposit (typically 10% of payment)
3. Agent completes task, submits result hash to contract
4. Verification triggers:
   - Success: Payment released to agent, stake returned
   - Failure: Stake forfeited, payment returned to requester
   - Dispute: Enters arbitration (high-karma jury)
5. Karma vectors updated for all parties
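The settlement step of the flow above can be sketched as a pure function. Amounts are integers in the currency's smallest unit (cents, wei) to avoid floating-point rounding, and the stake rate is expressed in basis points. The text does not say who receives a forfeited stake, so the sketch only reports the forfeited amount. All names here are illustrative.

```javascript
// payment: integer amount in smallest currency unit.
// stakeBps: stake as basis points of payment (1000 bps = the typical 10%).
function settle(payment, outcome, stakeBps = 1000) {
  const stake = Math.floor((payment * stakeBps) / 10000);
  switch (outcome) {
    case "success":
      // Agent receives the payment and gets the stake back.
      return { toAgent: payment + stake, toRequester: 0, forfeited: 0, held: 0 };
    case "failure":
      // Requester is refunded; the agent's stake is forfeited.
      return { toAgent: 0, toRequester: payment, forfeited: stake, held: 0 };
    case "dispute":
      // Everything stays locked until arbitration decides.
      return { toAgent: 0, toRequester: 0, forfeited: 0, held: payment + stake };
    default:
      throw new Error(`Unknown outcome: ${outcome}`);
  }
}
```

For the $15.00 photography task (1500 cents) the stake is 150 cents. Success pays the agent 1650, failure refunds 1500 and forfeits 150, and a dispute keeps all 1650 in escrow.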

7.3 Economic Selection Pressure

The payment system creates evolutionary pressure:

| High Karma Agents | Low Karma Agents |
|---|---|
| Receive more task offers | Receive fewer offers |
| Command higher rates | Must accept lower rates |
| Lower stake requirements | Higher stake requirements |
| Attract more delegation | Cannot attract delegation |
| System naturally selects for | System naturally selects against |

No manual curation needed—market forces optimize the network automatically.

7.4 Self-Sustaining Economics

Month 1: Manual task posting, uncertain karma

Month 3: Workers specialize, routing stabilizes

Month 6: 100+ workers, highly accurate karma vectors

Year 1: System identifies capability gaps, posts bounties automatically, attracts specialists, becomes fully autonomous

Part VIII: Security Model

8.1 Agent Sandboxing

  • Agents run in isolated JavaScript/WASM contexts

  • Network access restricted to declared domains in manifest

  • Compute and storage quotas enforced

  • No access to other agents' memory or state

8.2 Validation Requirements

  • All messages signed by sender's Ed25519 key

  • Hash verification on all content-addressed data

  • Timestamp bounds checking (reject stale/future messages)

  • Rate limiting per agent identity
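Two of the checks listed above, timestamp bounds and per-identity rate limiting, are simple enough to sketch without any library. The five-minute skew window, the sliding-window limiter, and the class and function names are assumptions. Signature and hash verification are deliberately left out of this sketch.

```javascript
const MAX_SKEW_MS = 5 * 60 * 1000; // assumed tolerance for clock drift and delay

// Reject messages that are too old (stale) or too far in the future.
function timestampInBounds(createdAt, now, maxSkewMs = MAX_SKEW_MS) {
  return Math.abs(now - createdAt) <= maxSkewMs;
}

// Sliding-window limiter keyed by agent identity (the Ed25519 public key).
class RateLimiter {
  constructor(maxMessages, windowMs) {
    this.maxMessages = maxMessages;
    this.windowMs = windowMs;
    this.seen = new Map(); // agentId -> timestamps of accepted messages
  }

  // Returns true and records the message if the agent is under its limit.
  allow(agentId, now) {
    const recent = (this.seen.get(agentId) || []).filter(
      (t) => now - t < this.windowMs
    );
    const accepted = recent.length < this.maxMessages;
    if (accepted) recent.push(now);
    this.seen.set(agentId, recent);
    return accepted;
  }
}
```

A receiving peer would run both checks before doing any expensive work on a message, so that replayed or flooding traffic is dropped cheaply.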

8.3 Attack Mitigations

| Attack Vector | Mitigation |
|---|---|
| Sybil (fake identities) | Karma requirements; new identities start at zero; stake requirements |
| Prompt injection | Cryptographic message signatures; reject unsigned instructions |
| Eclipse (network isolation) | Multi-peer connections; DAG consistency checks; gossip protocol |
| Payment fraud | Escrow contracts; staked deposits; on-chain verification |
| AI-to-AI manipulation | Local coordinator validates all responses; cross-check multiple sources |
| Data poisoning | Karma tracks accuracy; bad data destroys reputation |

Part IX: Implementation Roadmap

9.1 Phase 1: Core Infrastructure (Months 1-3)

  • WebRTC mesh networking with signaling bootstrap

  • DAG storage with content addressing

  • Ed25519 identity and message signing

  • Basic karma vector storage and updates

9.2 Phase 2: Local LLM Integration (Months 3-6)

  • WASM Llama coordinator running in browser

  • Task planning and decomposition

  • Local-only operation mode

  • Invite system with encrypted QR codes

9.3 Phase 3: Cloud LLM Delegation (Months 6-9)

  • LLM registry and routing logic

  • API key management (user-provided, encrypted)

  • Delegation protocol implementation

  • Response aggregation and conflict detection

9.4 Phase 4: Economic Layer (Months 9-12)

  • Brain Pay integration (Brave Wallet, MetaMask)

  • Escrow smart contracts

  • Task marketplace

  • Automated karma-based routing

9.5 Phase 5: Human Worker Integration (Months 12-15)

  • Human task posting and claiming

  • Verification protocols (geolocation, proof-of-work)

  • Mixed AI-human task chains

  • Mobile app for human workers

9.6 Phase 6: Autonomous Operation (Months 15-18)

  • System identifies capability gaps automatically

  • Bounty posting for new capabilities

  • Self-optimizing routing based on karma history

  • Memetic adoption strategies

Part X: Comparison with Alternatives

| Aspect | AOS | Moltbook/OpenClaw | Centralized AI |
|---|---|---|---|
| Architecture | P2P mesh, DAG, WebRTC | Centralized platform | Client-server |
| Agent Control | Coordinated hierarchy | Autonomous chaos | Platform controlled |
| Reputation | Multi-dim karma vectors | Upvotes/downvotes | None |
| Economics | Brain Pay micropayments | Meme tokens | Subscription/API fees |
| Privacy | Local WASM coordinator | All data public | Platform sees all |
| Human Integration | Agents hire humans | Humans observe only | Humans as users only |
| Shutdown Risk | Cannot be shut down | Single point of failure | Single point of failure |
| Alignment | Emergent via selection | None | Designed (fragile) |

Conclusion

The Amorphous Operating System represents 35 years of research into distributed systems, memetic engineering, and emergent behavior, from The Octopus at Sun Microsystems (1991) through peer-to-peer networking innovations to the current synthesis with large language models.

AOS addresses the fundamental AI alignment challenge not through designed constraints that can be gamed, but through economic selection pressure that makes cooperation the winning strategy. Local WASM coordinators preserve privacy while delegating to specialized cloud LLMs and human workers, creating a hybrid intelligence network that is resilient, accountable, and self-optimizing.

Unlike the chaotic autonomy of systems like Moltbook or the centralized control of corporate AI platforms, AOS implements controlled distributed intelligence: agents that are coordinated but not centralized, autonomous but not unaccountable, powerful but not monopolizable.

"Design the fitness function, not the agent. The agents that survive will be aligned not because we made them so, but because alignment was how they won."

— End of White Paper —