Dima's Blog

An Agent Is Just LLM + Shell + Filesystem + Markdown + Cron

For decades, the AI world struggled to define "agent." Researchers had theories. Engineers had architectures. Startups had demos. But the definition remained slippery — until it suddenly became obvious in the most humbling way possible: the answer was sitting in plain sight the whole time.

Monolithic Dreams

To understand why the new answer is so elegant, you have to understand what came before.

In the 1960s, IBM's OS/360 was the gold standard of computing: a great monolithic castle in the sky. It worked beautifully, but it was nearly impenetrable. You had to be inside IBM or deeply connected to it. You had to understand every aspect of how the system functioned. It was powerful, but it was a walled garden — exclusive by design.

Then the Unix revolution arrived, out of AT&T and Berkeley, and fundamentally changed the terms of the debate. Instead of one giant monolithic system doing everything, Unix proposed something radically different: a prompt and a shell, with all functionality broken into discrete, composable modules that could be chained together. Unix tools could be glued together with scripting languages. It was the difference between buying one fully assembled piece of furniture and getting a box of modular pieces you could recombine however you wanted.

Unix proved so powerful that it quietly conquered the world. Most of the internet runs on Unix or Unix-like systems. Both major smartphone platforms descend from it: iOS is built on a BSD-derived core, and Android runs on the Linux kernel. The Mac has a Unix shell built in. Unix won, and then everyone started taking it for granted.

The Marriage That Changes Everything

The Pi and OpenClaw breakthrough is the moment the language model mindset finally merged with the Unix shell prompt mindset.

The result of that merger is the agent. It's basically LLM + shell + filesystem + markdown + cron, and it turns out that's an agent.

That is the whole formula: LLM + shell + filesystem + markdown + cron.

What makes this insight interesting is that none of these components were new. Each one was already familiar and well understood. The breakthrough was not in inventing a new primitive, a proprietary protocol, or a specialized framework. It was in recognizing that these existing parts, when combined in the right way, produce something qualitatively different.

The language model reasons. The shell acts. The filesystem remembers. Markdown keeps the state legible and editable. Cron provides the recurring heartbeat that lets the system wake up, act, and continue operating over time.
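The division of labor above can be sketched in a few lines of Python. This is a minimal illustration, not how Pi or OpenClaw is implemented: `fake_llm`, `tick`, and the `memory.md` file name are made up for the example, and the model is a stub that always proposes the same harmless command.

```python
import subprocess
from pathlib import Path


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): always proposes one harmless command."""
    return "echo hello from the agent"


def tick(state_file: Path, llm=fake_llm) -> str:
    """One heartbeat: read markdown state, ask the model, run its command, record the result."""
    # The filesystem remembers: state is an ordinary markdown file.
    state = state_file.read_text() if state_file.exists() else "# Agent memory\n"

    # The language model reasons: it sees the state and proposes a shell command.
    command = llm(state)

    # The shell acts.
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    output = result.stdout.strip()

    # Markdown keeps the state legible: append a human-readable log line.
    state_file.write_text(state + f"\n- ran `{command}` -> {output}\n")
    return output
```

Cron would supply the heartbeat. A crontab entry such as `*/15 * * * * python3 agent.py` (path and schedule hypothetical), pointing at a script that calls `tick(Path("memory.md"))`, would wake the agent every fifteen minutes.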

The breakthrough was a feat of recognition, not invention. The field had been searching for agents as if they required some entirely new technological foundation. But in this view, the necessary pieces were already there. The real insight was seeing that this particular combination was enough.

The Extraordinary Latent Power of the Shell

The Unix shell has been vastly underestimated in its latent power. Your computer already runs on it. Every Unix command ever written is available at the command-line level. The full power of the machine is already there. You don't need elaborate new protocols like MCP — what you need is a command-line interface, because that's the architecture that already works, already scales, and already connects to everything.

And once the agent has access to the shell, computer use becomes trivial. Give the agent access to a browser and suddenly it has the full range of web capabilities too.
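The shell-as-interface idea can be sketched as a single tool function. This assumes a POSIX shell is available; `run_shell` is a hypothetical helper name, not part of any real agent framework. It returns the exit code and stderr alongside stdout so that a model could see failures as well as results.

```python
import subprocess


def run_shell(command: str, timeout: int = 30) -> dict:
    """Run one shell command and return everything a model would need to see."""
    done = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return {
        "exit_code": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
    }


# Existing Unix tools compose through pipes with no extra glue code:
# printf emits three lines, sort -u orders and de-duplicates them.
result = run_shell(r"printf 'b\na\nb\n' | sort -u")
print(result["stdout"])
```

The design choice is that one generic function exposes every installed command-line tool, rather than one wrapper per capability. Running model-proposed commands with `shell=True` is only reasonable inside a sandbox or with human review.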

The Agent Is Just Its Files

Because the agent's state lives in ordinary files on a filesystem, the agent becomes independent of the runtime that's running it. You can swap out the underlying language model — the agent will change personality somewhat, but all its memories, all its capabilities, all its accumulated context remain intact. It's like swapping out a chip: same machine, new processor.

By the same logic, you can swap out the shell, migrate to a different execution environment, or switch the scheduling framework. The agent is just its files. And the agent knows it.
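A rough sketch of what "the agent is just its files" could look like, under the assumption that the agent's state is a directory of markdown files. The names `load_agent` and `ask` and the directory layout are hypothetical. The model is passed in as a plain callable, so swapping it changes nothing about what the agent remembers.

```python
from pathlib import Path
from typing import Callable


def load_agent(home: Path) -> str:
    """Concatenate every markdown file in the agent's directory into one context string."""
    parts = []
    for path in sorted(home.glob("*.md")):
        parts.append(f"## {path.name}\n{path.read_text().strip()}")
    return "\n\n".join(parts)


def ask(home: Path, model: Callable[[str], str], question: str) -> str:
    """Build the prompt from files on disk, then hand it to whichever model is plugged in."""
    context = load_agent(home)
    return model(f"{context}\n\nUser: {question}")
```

Calling `ask(home, model_a, ...)` and then `ask(home, model_b, ...)` gives both models the same context, because that context comes from the directory and not from either model.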

Which leads to the next staggering capability: the agent has full introspection. It knows about its own files. It can rewrite its own files. This is unprecedented. No widely deployed software system in history has given the user — or the system itself — full introspective access to how it works and the ability to modify itself. Toy systems have done it. Production systems? Never.

[Screenshot: openclaw-self-update]

Extend Yourself

You can tell your agent to add a new capability to itself — and it will do it. It will go out on the internet, figure out what's needed, write the code, install the new function, and the next time you interact with it, the capability is there. You don't have to do anything except ask.
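One way to picture "installing the new function" is as nothing more than writing files. The sketch below is an assumption about how such a step might work, not OpenClaw's actual mechanism: `install_skill`, `list_skills`, the `skills/` directory, and `SKILLS.md` are all invented for illustration.

```python
import stat
from pathlib import Path


def install_skill(home: Path, name: str, script: str, description: str) -> Path:
    """Save a new script under skills/ and record it in a markdown index."""
    skills = home / "skills"
    skills.mkdir(parents=True, exist_ok=True)

    path = skills / f"{name}.sh"
    path.write_text(script)
    # Mark the script executable for its owner.
    path.chmod(path.stat().st_mode | stat.S_IXUSR)

    # The index is plain markdown, so both the user and the agent can read or edit it.
    with (home / "SKILLS.md").open("a") as index:
        index.write(f"- `{path.relative_to(home).as_posix()}`: {description}\n")
    return path


def list_skills(home: Path) -> list[str]:
    """Return the index lines the agent would read on its next run."""
    index = home / "SKILLS.md"
    return index.read_text().splitlines() if index.exists() else []
```

Because the new capability is just a script plus one line of markdown, it is there the next time the agent reads its files, and it can be removed by deleting both.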

You meet someone at a party who says "Oh, my OpenClaw connects to my Eight Sleep bed and gives me sleep advice." You go home and tell your own agent, "Add that capability to yourself." And it says, "No problem," and does it.

Why It Looks Obvious Only in Retrospect

The greatest breakthroughs are "obvious in retrospect," and that is precisely what makes them the best kind of breakthrough. Language models themselves are like this: next-token prediction looks like the obvious approach now, but no one did it until someone did.

The components were all known. The conceptual leap was not in inventing any individual piece, but in seeing the combination — and then building it. That's the nature of deep architectural insights. They don't feel revolutionary at the moment of discovery. They feel, afterward, inevitable.

The Inevitable World

Everyone in the world is going to have at least one agent like this, if not an entire family of them. It's almost inevitable now that this is the way people are going to use computers.

Not by learning to use software. Not by navigating interfaces. Not by clicking through menus. By having an agent that can act on their behalf, with access to their digital world, and the ability to extend itself as their needs evolve.


Source: Marc Andreessen on the Latent Space podcast, around minutes 33–43. The transcript was extracted from YouTube. The rewrite and screenshot are mine.