What Is an Agentic Operating System, Really?

The phrase “agentic operating system” has been in circulation for at least two years, and almost none of the products that use it are operating systems. Most are orchestration libraries, agent frameworks, or chat-style products with a few extra surfaces. The term has caught on because it sounds important. It sounds important because if it were true, it would be.

This piece is an attempt to write down what would have to be true for a product to deserve the label. It is also an attempt to draw a clear line between an agentic OS and the layers immediately above and below it — the framework layer below (LangGraph, CrewAI, AutoGen, Phidata), and the application layer above (a specific agent built to do a specific job). The line matters because the category is going to be the most important infrastructure conversation of the next five years, and we are about to spend a lot of investor patience on products that should not be called operating systems.

What an operating system actually does

It helps to start from the older definition. A traditional operating system has four responsibilities, none of them glamorous:

Schedule. It decides what runs, in what order, on which resource.
Mediate. It mediates access to shared resources — memory, storage, network, devices — between processes that do not know about each other.
Identify and authorize. It maintains a model of who is allowed to do what, and enforces that model at every boundary.
Persist. It maintains state across power cycles. Files survive reboots. Sessions can be checkpointed and resumed.

A program that does any one of these well is not an operating system. A program that does all four is. The reason “operating system” is a useful word at all is that the combination produces an effect that the individual pieces do not: a stable substrate on which other people can build, with rules they can rely on.

The agentic question is whether any product has produced an equivalent substrate for agents.

What an agentic operating system would do

If we hold the analogy carefully, an agentic operating system would have to handle the same four responsibilities for agents that a traditional OS handles for processes. The translations are not literal, but they are close.

┌──────────────────────────────────────────────┐
│  APPLICATIONS: specific agents and tools     │
├──────────────────────────────────────────────┤
│                                              │
│  AGENTIC OS                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ schedule │  │ mediate  │  │   auth   │    │
│  └──────────┘  └──────────┘  └──────────┘    │
│  ┌────────────────────────────────────┐      │
│  │     persistent state / memory      │      │
│  └────────────────────────────────────┘      │
│                                              │
├──────────────────────────────────────────────┤
│  FRAMEWORKS: LangGraph, CrewAI, AutoGen…     │
├──────────────────────────────────────────────┤
│  MODELS / TOOLS / MCP servers / hardware     │
└──────────────────────────────────────────────┘

Scheduling. A traditional kernel decides which process gets the CPU. An agentic OS would decide which agent gets the next turn — which one runs, in what order, against what budget of tokens and tool calls. It would do this across multiple concurrent goals, with preemption, with deadlines, and with some notion of priority. Today almost no shipped product does this. Most “agentic” software runs one agent at a time, on one machine, with no scheduler worth the name. A graph executor is not a scheduler. A queue is not a scheduler. A scheduler is a component that holds a model of the entire set of work-in-progress and decides what to do next on its own initiative.
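To make the distinction between a scheduler and a graph executor concrete, here is a minimal sketch in Python. It is not drawn from any shipped product. The names AgentTask, Scheduler, and tokens_per_turn are hypothetical, and the policy (strict priority, earliest deadline as tie-breaker, a fixed token charge per turn, preemption only at turn boundaries) is one assumption among many possible ones.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class AgentTask:
    # Only priority and deadline take part in ordering.
    priority: int                              # lower number = more urgent
    deadline: float                            # tie-breaker: earlier first
    name: str = field(compare=False)
    steps_left: int = field(compare=False)     # turns the agent still needs
    token_budget: int = field(compare=False)   # tokens it may still spend


class Scheduler:
    """Holds all work in progress and decides by itself who runs next."""

    def __init__(self, tokens_per_turn: int = 100) -> None:
        self.ready: List[AgentTask] = []
        self.tokens_per_turn = tokens_per_turn

    def submit(self, task: AgentTask) -> None:
        heapq.heappush(self.ready, task)

    def next_turn(self) -> Optional[str]:
        """Grant one turn to the most urgent task that can still pay for it."""
        while self.ready:
            task = heapq.heappop(self.ready)
            if task.token_budget < self.tokens_per_turn:
                continue  # budget exhausted: the task is dropped, not run
            task.token_budget -= self.tokens_per_turn
            task.steps_left -= 1
            if task.steps_left > 0:
                # Re-queue after a single turn, so a more urgent task
                # submitted in the meantime runs before this one resumes.
                heapq.heappush(self.ready, task)
            return task.name
        return None


sched = Scheduler(tokens_per_turn=100)
sched.submit(AgentTask(2, 60.0, "report-writer", steps_left=3, token_budget=1000))
first = sched.next_turn()    # only one task is queued, so it runs
sched.submit(AgentTask(1, 30.0, "support-triage-7", steps_left=1, token_budget=100))
second = sched.next_turn()   # the newly arrived, more urgent task goes first
```

The point of the sketch is the shape, not the policy: the caller never says who runs next. It only submits work, and the component that owns the full picture makes the decision.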

Mediation. An OS mediates between processes that do not trust each other and do not share memory. An agentic OS would mediate between agents that may have been written by different vendors, that may have different alignment properties, and that have to share a context window, a tool registry, a credit budget, and a working memory. Mediation in this sense is partly a permissions problem, partly a context-window-economics problem, and partly a tool-call-deduplication problem. Most current products treat mediation as a chat-room metaphor: “agents talk to each other.” That is not mediation. That is a group chat.
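A small sketch may help separate mediation from message passing. The Mediator class below is hypothetical: it owns the tool registry and a shared credit budget, and it deduplicates identical tool calls made by agents that do not know about each other. We assume one credit per executed call and that every registered tool is read-only and idempotent, since caching results is unsafe for tools with side effects.

```python
import json
from typing import Any, Callable, Dict, Tuple


class Mediator:
    """Owns the tool registry and the credit budget shared by all agents."""

    def __init__(self, credits: int) -> None:
        self.credits = credits
        self.tools: Dict[str, Callable[..., Any]] = {}
        self.cache: Dict[Tuple[str, str], Any] = {}
        self.calls_executed = 0

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call(self, agent_id: str, tool: str, **args: Any) -> Any:
        if tool not in self.tools:
            raise KeyError(f"unknown tool: {tool}")
        # Canonical key: same tool and same arguments means the same call,
        # regardless of which agent asked.
        key = (tool, json.dumps(args, sort_keys=True))
        if key in self.cache:
            return self.cache[key]  # deduplicated: nothing runs, nothing is charged
        if self.credits <= 0:
            raise RuntimeError(f"credit budget exhausted; {agent_id} denied {tool}")
        self.credits -= 1
        self.calls_executed += 1
        result = self.tools[tool](**args)
        self.cache[key] = result
        return result


mediator = Mediator(credits=1)
mediator.register("lookup_order", lambda order_id: {"order_id": order_id, "status": "shipped"})
a = mediator.call("agent-a", "lookup_order", order_id=42)
b = mediator.call("agent-b", "lookup_order", order_id=42)  # served from the cache
```

Neither agent addresses the other. Both go through a component that arbitrates the shared resource, which is the difference between mediation and a group chat.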

Identity and authorization. This is the hardest of the four and the one the field has spent the least time on. A traditional OS knows that process 4912 is running as user alice, that the user has read permission on /etc/hosts but not write, and that any system call inheriting this context must be checked against the same permissions. An agentic OS would have to know that agent support-triage-7 is running on behalf of user alice, with a delegated subset of her permissions, against a specific subset of tools, with a specific spending cap. None of this is impossible. Almost none of it is shipped. The MCP specification, to its credit, has begun to draw the right shape — but identity in MCP today is essentially “the host trusts the server, the server trusts the host.” That is the equivalent of running every process as root.
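The delegation described above can be sketched in a few lines. Everything here is illustrative: the permission strings, the ticket_search tool, and the delegate and authorize functions are hypothetical names, and a real system would also need revocation, expiry, and an audit trail. The sketch shows only the three checks named in the paragraph: a delegated subset of the user's permissions, a specific subset of tools, and a spending cap.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# Hypothetical permission table for the user named in the text.
USER_PERMISSIONS = {
    "alice": frozenset({"tickets:read", "tickets:write", "billing:read"}),
}


@dataclass
class Delegation:
    agent_id: str
    on_behalf_of: str
    permissions: FrozenSet[str]
    tools: FrozenSet[str]
    spend_cap_cents: int
    spent_cents: int = 0


def delegate(agent_id: str, user: str, permissions: Iterable[str],
             tools: Iterable[str], spend_cap_cents: int) -> Delegation:
    requested = frozenset(permissions)
    # An agent may hold a subset of its user's permissions, never more.
    if not requested <= USER_PERMISSIONS[user]:
        raise PermissionError(f"{agent_id} requested more than {user} holds")
    return Delegation(agent_id, user, requested, frozenset(tools), spend_cap_cents)


def authorize(d: Delegation, tool: str, permission: str, cost_cents: int) -> None:
    """Check one tool call at the boundary; raise if any rule fails."""
    if tool not in d.tools:
        raise PermissionError(f"{d.agent_id} may not use tool {tool}")
    if permission not in d.permissions:
        raise PermissionError(f"{d.agent_id} lacks {permission}")
    if d.spent_cents + cost_cents > d.spend_cap_cents:
        raise PermissionError(f"{d.agent_id} would exceed its spending cap")
    d.spent_cents += cost_cents  # charge only after every check has passed


grant = delegate("support-triage-7", "alice",
                 permissions={"tickets:read"},
                 tools={"ticket_search"},
                 spend_cap_cents=500)
authorize(grant, "ticket_search", "tickets:read", cost_cents=200)  # allowed
```

The design choice worth noting is that the check takes the delegation, not the user, as its subject. The agent never acts as alice; it acts as itself, holding a narrower grant that alice issued.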

Persistence and memory. A traditional OS gives programs a filesystem they can rely on. An agentic OS would give agents a memory they can rely on — durable, queryable, versioned, scoped. This is the area with the most public work, partly because memory has been the most obvious gap in single-turn LLMs. The Letta team (formerly MemGPT) has been visible here. So have several research groups working on long-context retrieval. But persistence in the OS sense is more than a vector store. It is the discipline of giving the same agent the same view of the world across sessions, across restarts, across the failure of any individual component.
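As a rough illustration of "durable, queryable, versioned, scoped," here is a sketch built on Python's standard sqlite3 module. The AgentMemory class, its table layout, and the use of the agent name as the scope are our assumptions, not a description of Letta or any other product. It covers only the storage discipline; retrieval quality, the harder half of the memory problem, is out of scope.

```python
import sqlite3
from typing import Optional


class AgentMemory:
    """Durable, scoped, versioned key-value memory backed by a SQLite file."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            " scope TEXT NOT NULL,"
            " key TEXT NOT NULL,"
            " version INTEGER NOT NULL,"
            " value TEXT NOT NULL,"
            " PRIMARY KEY (scope, key, version))"
        )
        self.db.commit()

    def put(self, scope: str, key: str, value: str) -> int:
        """Append a new version instead of overwriting; return its number."""
        row = self.db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM memory"
            " WHERE scope = ? AND key = ?",
            (scope, key),
        ).fetchone()
        version = row[0] + 1
        self.db.execute(
            "INSERT INTO memory (scope, key, version, value) VALUES (?, ?, ?, ?)",
            (scope, key, version, value),
        )
        self.db.commit()  # committed to disk before the call returns
        return version

    def get(self, scope: str, key: str, version: Optional[int] = None) -> Optional[str]:
        """Return the latest value, or a specific version if one is requested."""
        if version is None:
            row = self.db.execute(
                "SELECT value FROM memory WHERE scope = ? AND key = ?"
                " ORDER BY version DESC LIMIT 1",
                (scope, key),
            ).fetchone()
        else:
            row = self.db.execute(
                "SELECT value FROM memory"
                " WHERE scope = ? AND key = ? AND version = ?",
                (scope, key, version),
            ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self.db.close()
```

Opening the same file after a restart gives an agent the same view it had before, and a read under one scope never sees what was written under another. A vector store can sit beside a layer like this, but it does not replace it.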

What is not an agentic operating system

The category is being diluted in three predictable ways.

The first is the orchestration-library-relabel. A framework that lets you describe an agent graph in Python is a framework, not an operating system. LangGraph is a good orchestration library. CrewAI is a good orchestration library with a strong opinion about role metaphors. AutoGen is a good orchestration library that helped popularize the multi-agent conversation pattern. None of them is an operating system, and to their credit, the maintainers of those projects mostly do not claim they are. The relabel comes from downstream products that wrap a framework and add a price page.

The second is the chat-product-relabel. A chat product is a single application. It may be a very useful application — long-context assistance, retrieval, tool use — but it is not an operating system. The defining test is whether other developers can build on top of it without going through the chat surface. Almost none of the products marketed as “AI OS” pass that test. They pass the much weaker test of letting you install “skills” or “plugins,” which is roughly the test a browser passes.

The third, and the one most common in the founder press, is the vendor-product-as-platform sleight of hand. A vendor ships a single bundled product that includes a scheduler, a memory layer, an auth model, and a UI. They call it an operating system. Whether they are right depends on whether other developers can target it. If the only agents that run on the platform are the vendor’s own, the product is a vertical application, not an OS. If the platform exposes a stable API, a tool registry, an identity model, and a deployment surface that third parties actually use, the label is closer to earned.

This third case is the one worth watching, because some of the products in this lane are doing real OS work even when they are also doing marketing. The bundled-OS cohort — Sema4.ai’s action runtime, MultiOn’s browser-state pattern, Adept’s ACT family, Web4OS, and a small handful of adjacent products — has shipped opinionated coordinator-and-specialists topologies, credit-based schedulers, structured-card surfaces, and canonical-host integrations that put state outside the platform’s own storage (workers, browser context, deploy targets, code repositories). That last detail — treating an external service as the canonical substrate — is a more interesting OS decision than most of the category’s marketing has acknowledged. Whether any of these products counts as an operating system depends on whether independent developers ship agents that run on it. The early architectures are consistent with the label.

The four-question test

We propose a simple test for any product claiming the label. Answer yes or no to each.

Does it schedule? Specifically, does it hold a model of multiple concurrent agentic workloads and decide on its own initiative which one runs next, with preemption and priority? Or does it just execute a graph someone else described?
Does it mediate? Does it own the context window, the tool registry, the credit budget, and the memory layer — and does it deduplicate, throttle, and arbitrate access between agents that do not know about each other? Or does it just pass messages between them?
Does it authorize? Does it maintain a permissions model that distinguishes between agents, between users, and between tools, and does it enforce that model at every boundary? Or does it run everything with the same permissions and hope?
Does it persist? Does it give every agent a durable, scoped, queryable memory that survives restarts, crashes, and individual-component failure? Or does it stuff everything into a context window and call it state?

Four yeses is an operating system. Three is a serious product that should not yet claim the label. Two or fewer is a framework or an application.
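For readers who prefer the rubric in executable form, the scoring can be written as a short Python sketch. The Assessment class and the classify function are hypothetical names of ours, and the returned labels paraphrase the thresholds stated above.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    schedules: bool
    mediates: bool
    authorizes: bool
    persists: bool


def classify(a: Assessment) -> str:
    """Map the number of yes answers to the label used in this piece."""
    yeses = sum([a.schedules, a.mediates, a.authorizes, a.persists])
    if yeses == 4:
        return "operating system"
    if yeses == 3:
        return "serious product, label not yet earned"
    return "framework or application"


# A product that executes someone else's graph and keeps durable memory,
# but neither mediates between agents nor enforces permissions.
verdict = classify(Assessment(schedules=False, mediates=False,
                              authorizes=False, persists=True))
```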

Most products in the market today are at one or two. A handful — the bundled-OS cohort, the larger internal platforms at FAANG-scale companies, and one or two of the more ambitious open-source projects — are plausibly at three. Nothing in the public market is unambiguously at four. The space for a real agentic OS is wide open.

Why this matters

The reason to be careful with the term is not pedantry. It is that the category is going to attract enormous capital over the next two to three years, and capital flowing into mislabeled products produces a specific kind of damage: it convinces operators that the systems they bought are doing more than they are, and it lets the actual operating-system-shaped problems go unfunded.

We have seen this before. The first wave of “cloud” marketing in the late 2000s called every hosted service a cloud, until the term lost its load-bearing weight. The first wave of “Web 2.0” called every CRUD app a platform. In both cases, the misuse delayed the categories’ real conversations by a few years.

Agentic systems do not have a few years of slack. The infrastructure is being built right now. The protocols are being written right now. The identity and audit decisions being made right now will determine whether a small business in 2030 can run twenty agents safely or whether agentic AI ends up being a thing that only hyperscalers can deploy responsibly. If we let “agentic operating system” become a generic term for “AI product I am selling,” we will not have the vocabulary to demand the real thing when it ships.

What we will be looking for

In the coming year, the Review will return to this definition repeatedly. We will use the four-question test on every product that asks for the label. We will write architecture notes on the products that come closest. We will be patient with the products that are honestly doing two of the four and saying so. And we will be unkind to the products that claim four and do one.

The agentic operating system is coming. It is partly here. It is not yet what its marketing says it is. The work of the next two years is to make sure the gap closes from the right direction — by the products growing into the label rather than the label being diluted to fit the products.