You know me by now, or at least you should, because I have been showing up in your feed with an almost suspicious regularity, and in it I am always waving around those frameworks and research papers like I’m some sort of a street preacher who traded his Bible for a spreadsheet. Ok, I do agentic transformation for a living, which is a fancy way of saying that I spend my days figuring out how to make AI do actual work inside large organizations instead of just generating haikus about quarterly reports and summarizing emails that nobody wanted to read in the first place.
I build AI native companies. Not AI first, mind you, because that particular honor belongs to this Dutch chap called Olivier Rikken, the madman behind the Zero-Human-Company concept, and I still maintain a perhaps foolish belief in the power of human beings and the beautiful mess that is augmentation. I think that humans have value beyond being expensive error-correction mechanisms for algorithms, which apparently makes me an optimist in this industry, even though everyone who knows me personally would choke on their coffee reading that sentence.
So here I am, running what I lovingly call an agentification factory at scale inside a big tech company, which means I wake up every morning with the existential question of how much of the enterprise can be handed over to autonomous systems before someone important notices that the robots are making decisions about things that matter. And because I apparently cannot leave well enough alone, I decided I needed to know exactly where the ceiling was. Not the theoretical ceiling that consultants draw on whiteboards using dotted lines and hopeful arrows, but the actual, measurable, “this is where the current technology stops being useful and starts being dangerous” ceiling.

That question led me to create a benchmark I lovingly called PASF PADE. The acronym sucks, but I hide behind the fact that I’m European, and we do not have a culture of creating beautiful acronyms like TOMAHAWK (missile…is actually an acronym), CROWS (a weapon station) or FAANG (Facebook, Apple, Amazon, Netflix, Google), and of course HELLFIRE (Helicopter Launched Fire and Forget Missile). But also maybe because this domain isn’t as sexy as military hardware or Big Tech, I don’t know. You decide.
Anyway, I started this piece of research as the first step in a research program to target specific business processes with AI. In this paper I determined that there are four process categories, which I call “automation zones”, each of which is harder to automate than the one before. The third zone is the hardest one that is still theoretically within reach, for a novel type of AI that I’ll explain in more detail later. If you want to know more about these automation zones, read the blog in which I described them in detail: “The Real Story Behind Enterprise Scale Process Agentification.”
Our entire research program at Eigenvector is focused on the ambition of building an AI that specifically targets Zone III, that stubborn 30 to 50 percent of enterprise processes that current AI cannot touch because of risk, compliance, and complexity. But Zone III is rich in humans, expensive humans, and that is the incentive. We are already piloting this model at Eigenvector and it looks very promising. I described the architecture of this model in “The Boring AI That Keeps Planes in the Sky.”

Another part of the research program is about Tokenomics – the economics of burning money on tokens to get the AI to do what it promises – because AI is not free and someone has to pay, even if you’re running the models yourself. So we built a model, which we call Token Minimization Governance, whereby you optimize your current agentic factory setup against the cost of tokens. Olivier and I are going to start a simulation of a swarm of agents performing tasks, and then see what the spend is with and without the optimization algorithm. I wrote a blog about it called “I Spent a Year Burning Money on AI and Finally Decided to Do Something About It.”

The ultimate goal of our research program is to build a neuro-symbolic system that self-optimizes, is governed, can optimize its own patterns against minimum token usage, and is specifically geared towards Zone III processes. If you want to know more about self-organizing AI and how it might break through the agentification ceiling, read this blog “Self-Evolving AI Might Actually Break the Agentification Ceiling.”
Now, there are a few things I want to share before we dive into the main topic of this blog. Let me put quotes around this thing because it is important:
AI at its current state does not displace full jobs. It displaces tasks of a job instead.
This is my personal experience from running an AI factory for a while, and it has been corroborated by recent research from the International Labour Organization and MIT, who posted their task orientation paper last week. The ILO, for one, boldly stated that only 2.3% of jobs will disappear because of AI. I want to add that this doesn’t take into account entry-level jobs, because those are already highly affected by AI. Think of junior consultants doing market analysis, junior programmers, the works. Both links in the comments.
Now, when you generalize the insight of a task-displacement of AI instead of full-job displacement, this would mean that people also have to shift from task orientation towards adding value to the purpose of the job. I stole this line from Fatih Boyla, all props to him, because he said it better than I could. In total, AI does create lots of efficiencies because if you add all the automations together, you save a lot of FTE. But the jobs still remain, apart from entry level jobs that is, but the thing is you have to worry less about humans interfering. The human augmentation vision therefore still stands to this day, and therefore the human still remains the babysitter of the AI. I wrote about this here “The Truth Is That AI Still Needs a Babysitter.”
But that will change at the end of the research program. That is exactly the goal we’re working towards: to create an AI that is specifically targeted at whole processes. We have already made strides in this domain, so do not rest on your laurels yet. We simply bought ourselves more time.

Now, over the last few weekends I was working on something else. Something relevant, but not exactly technical in nature.
Let me explain.
You know I created a tool that analyzes your processes and lets you know if they are eligible for automation, right? Well, that tool is the result of the PASF PADE benchmark AI Automation Zones paper and it runs at ai-automations dot my (link in the comments). But before you visit: I put the AI there on hold, because you people are using it like crazy now and it costs me 50 euros per day in tokens.
But the thing is that this tool operates at the process level.
So, over the last couple of weekends I dove a level deeper. I asked my Oompa Loompas to do some research into the job frameworks out there, and I created a structured method, drawing on those frameworks, to translate jobs into tasks, which I then mapped onto the AI Automation Zone model to see what percentage can be automated by AI. As a result we now have a tool with which I can predict whether a certain job is easy to automate using AI.
And then I ran this tool across a list of standardized white-collar job descriptions. This blog is the result of that, and if you are into job protection, this is the tool you really need. Someone suggested I put this into a website, but given the token spend, this might take some time. Unless you guys set up a GoFundMe or something.
So here is the paper. It is called “PASF Mapping as a Task-Based Research Method for Estimating Automation Zone Distributions in Knowledge and Office Work”, and it is based on the benchmarks I created a year ago.
Let me walk you through what it does.
Why I needed a ceiling for the agentification factory
When you run automation at enterprise scale, you quickly discover some things that the vendor demos conveniently forget to mention. Not all processes are created equal, and the ones that look identical on a process map behave completely differently when you try to hand them to an AI agent. Some processes roll over like my wiener dog does, accepting automation with minimal fuss and maximum efficiency gains, while others fight back with the ferocity of a Chihuahua being forced into a carrier, generating exceptions and edge cases and compliance nightmares that make you question every life choice that led you to this moment.
I needed a way to classify this mess, so I created something called PASF PADE, which stands for things I will explain in a moment, and which functions as a benchmark for understanding where your processes actually live on the automation spectrum. The framework divides all enterprise work into four zones, and understanding these zones is essential if you want to grasp why I spent my weekends building a job apocalypse calculator instead of doing something sensible like touching grass or something.

Zone I is the easy stuff. This is highly structured, heavily routinized work with low exception density and minimal discretionary judgment. Every agentification journey starts there, and it has plenty of low-hanging fruit. 27% plenty. Think data entry, document filing, basic transaction processing. If a process lives entirely in Zone I, well congratulations my smart friend, you can probably automate it with a script your AI wrote during lunch. Current AI handles this beautifully. No drama here whatsoever.
Zone II is where things get interesting. This is semi-structured work within bounded workflows, the kind where you have established procedures but also need some coordination, some case progression tracking, some controlled interaction with humans who have opinions about things. Take for instance customer service that follows decision trees, insurance claim processing with standard evaluation criteria, or IT ticketing within defined escalation paths. Zone II is harder than Zone I because you need workflow intelligence, but current agentic systems can handle most of it if you architect them properly.
Together, Zone I and II represent the current agentification ceiling of 35% that we bump into when trying to scale up a factory, and that ceiling exists mainly because of Zone III.
Zone III is the problem child that keeps you and me employed.
This is context-sensitive analytical and interpretive work requiring actual judgment, the kind where the right answer depends on circumstances that were not enumerated in any training manual because nobody could have anticipated them in advance. Financial analysis where the numbers tell different stories depending on market conditions. Legal reasoning where precedent matters but so does the specific constellation of facts. Software engineering where requirements are ambiguous and stakeholder needs shift mid-project. Zone III work is rich in humans because humans are good at the messy contextual reasoning stuff that these tasks demand, and our current AI systems struggle here precisely because struggle is what happens when you need genuine understanding rather than pattern matching.
To describe my Zone III nightmare, I use the following analogy:
Think of current generative AI as a revolver with one bullet. Every time you run a process, the hammer cocks. Five times out of six, it fires clean and everybody celebrates. But the sixth time, when the chamber is loaded, that single failure erases all five wins. You are playing Russian Roulette with your company.
And then finally, Zone IV sits at the top. It is reserved for strategic, governance-related, normative, and end-responsibility work. This is the place where someone has to actually own the outcome, sign their name to the decision, and face consequences when things go wrong. I am talking about board decisions, regulatory interpretations with legal exposure, ethical judgments that cannot be delegated to a probability distribution, or the fact that I don’t want my salary to be paid out by an AI Agent. Zone IV remains stubbornly human because we have collectively decided, for now, that accountability requires a pulse.
Here is the uncomfortable thing though. When I analyzed enterprise processes against this framework, I discovered that roughly 30 to 50 percent of all knowledge work sits in Zone III. This is the automation gap, the space where current AI cannot reliably operate because the risks are too high and the consequences of getting it wrong are too severe. And this is exactly where most of the expensive humans live, which means the economic incentive to crack Zone III is enormous while the technical and governance challenges remain unsolved.
I described this framework in detail in a piece called “The Real Story Behind Enterprise Scale Process Agentification,” which you can find on LinkedIn and Medium if you want the full technical treatment with charts and everything.

Zone III is where the money and the risk live together
My entire research program at Eigenvector is focused on one specific problem, which is building an AI architecture that can actually target Zone III processes without causing the kind of catastrophic failures that make compliance officers develop stress disorders. This is not a trivial engineering challenge because Zone III is Zone III precisely because it resists automation. The processes that live there require contextual understanding, exception handling that goes beyond decision trees, and a kind of judgment that current language models can approximate but not reliably execute.
The architecture we are developing for a new model, including scaffolding and governance, is what I call a Goal-Directed Governance Agent, which is exactly as boring as it sounds, and that boringness is the entire point. I wrote a piece about this called “The Boring AI That Keeps Planes in the Sky,” because the AI systems that actually matter in high-stakes environments are never the flashy ones. They are the dull, reliable, obsessively-monitored systems that do one thing correctly over and over again while generating audit trails that would make a regulatory lawyer weep with joy.
The core idea is simple enough that I can explain it here without requiring you to read the full paper, though you should anyway because I spent a lot of time on it. Instead of building AI agents that try to be clever and autonomous and impressive, you build agents that are goal-directed within explicit constraints, continuously monitored by governance layers, and designed to escalate rather than improvise when they encounter situations outside their competence boundaries. The agent knows what it is supposed to accomplish, knows the rules it must follow, and knows when to stop and ask a human instead of making up an answer and hoping nobody notices.
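For the programmers in the room, the escalate-rather-than-improvise idea can be sketched in a few lines of Python. This is purely illustrative and not the architecture from the paper: the names `Task` and `GovernedAgent`, the refund example, the 500 limit, and the 0.9 confidence threshold are all made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Task:
    kind: str          # hypothetical task type, e.g. "refund_request"
    amount: float      # monetary value involved
    confidence: float  # the agent's own estimate (0..1) that it can handle this


@dataclass
class GovernedAgent:
    goal: str                             # what the agent is supposed to accomplish
    allowed_kinds: frozenset[str]         # its competence boundary
    rules: list[Callable[[Task], bool]]   # explicit constraints it must follow
    min_confidence: float = 0.9           # assumed threshold, purely illustrative
    audit_log: list[str] = field(default_factory=list)

    def handle(self, task: Task) -> str:
        # Escalate instead of improvising whenever any check fails.
        if task.kind not in self.allowed_kinds:
            return self._escalate(task, "outside competence boundary")
        if not all(rule(task) for rule in self.rules):
            return self._escalate(task, "rule violation")
        if task.confidence < self.min_confidence:
            return self._escalate(task, "low confidence")
        self.audit_log.append(f"EXECUTE {task.kind} for goal '{self.goal}'")
        return "executed"

    def _escalate(self, task: Task, reason: str) -> str:
        # Every escalation leaves a trace for the humans and the auditors.
        self.audit_log.append(f"ESCALATE {task.kind}: {reason}")
        return "escalated"


agent = GovernedAgent(
    goal="settle refunds",
    allowed_kinds=frozenset({"refund_request"}),
    rules=[lambda t: t.amount <= 500.0],  # hypothetical business rule
)

print(agent.handle(Task("refund_request", 120.0, 0.97)))    # within all limits
print(agent.handle(Task("refund_request", 9000.0, 0.97)))   # breaks the amount rule
print(agent.handle(Task("contract_dispute", 50.0, 0.99)))   # not this agent's job
print(agent.audit_log)
```

The point of the design is that the checks run before any action, and that both outcomes end up in the audit log, so the boring part is never optional.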

We are already piloting this approach at Eigenvector, and the early results are promising in the way that early results always are, which is to say they work in controlled conditions and we are holding our breath to see what happens when reality gets involved. The key insight from the pilots is that Zone III automation is not about making smarter AI, which all frontier labs are doing, but about making AI that knows its own limitations and operates within governance structures that catch failures before they cascade. It is the boring, yet effective AI and it’s the opposite of the demo-driven AI culture that dominates most enterprise conversations nowadays. Therefore we’re more into applied research than fundamental research and we’re not interested in building the smartest model out there, because the level of intelligence as it stands now is already good enough.
The thing that AI needs however, is boring stability, predictability, planning and reasoning, tools, and a sense of money.
And that is the ultimate goal of our research, and if you feel you want to join me in that quest, drop me a DM. Crazy smart thinkers are welcome, but not too smart by the way, because we still need to talk sometimes.

AI is not free and someone has to pay
There is another piece of the research program that does not get enough attention, probably because it involves math and cost accounting, which are the two subjects guaranteed to clear a room faster than a fire alarm.
Tokenomics. Yup.
The truth that most AI enthusiasts prefer to ignore is that every time you call a language model, you are burning tokens, and tokens cost money. Duh. Not a lot of money per call, usually, but when you scale agentic systems to enterprise level, those fractions of cents accumulate into numbers that make CFOs ask uncomfortable questions. I spent approximately a year burning money on AI experimentation before I decided to actually build a systematic approach to understanding and optimizing this spend, which I documented in a piece aptly called “I Spent a Year Burning Money on AI and Finally Decided to Do Something About It.” Only difference is that it has more capitals.

The model we developed is called Token Minimization Governance, and it treats token spend as a first-class optimization target rather than an afterthought. When you design an agentic system, you make architectural choices about how agents communicate, how much context they share, how often they call external models, and how verbose their reasoning chains are. Each of these choices has cost implications, and those implications compound across thousands or millions of executions. An agent that uses 500 tokens per task instead of 2000 tokens per task does not just cost less per execution. It enables deployment scenarios that would be economically impossible with the more expensive architecture.
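A back-of-the-envelope calculation shows how quickly that difference compounds. Only the 500 and 2000 tokens per task come from the paragraph above. The price per thousand tokens and the monthly volume are assumptions chosen to keep the arithmetic readable, not real vendor quotes.

```python
# Assumed values, for illustration only.
PRICE_PER_1K_TOKENS_EUR = 0.01      # hypothetical blended token price
EXECUTIONS_PER_MONTH = 1_000_000    # hypothetical enterprise volume


def monthly_cost(tokens_per_task: int) -> float:
    """Monthly token spend in euros for one agent design."""
    return tokens_per_task / 1000 * PRICE_PER_1K_TOKENS_EUR * EXECUTIONS_PER_MONTH


lean = monthly_cost(500)      # the frugal agent from the text
verbose = monthly_cost(2000)  # the chatty agent from the text

print(f"lean agent:    {lean:,.0f} EUR per month")
print(f"verbose agent: {verbose:,.0f} EUR per month")
print(f"cost ratio:    {verbose / lean:.1f}x")
```

Whatever price you plug in, the ratio stays the same, because it depends only on the token counts. What changes is whether the absolute number fits in somebody’s budget.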
The tokenomics framework integrates with the PASF model in a specific way. Zone I processes are high volume but low complexity, so token efficiency matters enormously because you are multiplying small costs across huge numbers of executions. Zone II processes are moderate volume with moderate complexity, requiring a balance between reasoning depth and cost control. Zone III processes are lower volume but high stakes, which means you often want to spend more tokens on reasoning and verification because the cost of errors exceeds the cost of computation. The optimization strategy changes depending on where the work lives, and having a framework that makes these tradeoffs explicit is more valuable than having a gut feeling that “AI is expensive sometimes.”
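One way to make that tradeoff explicit is to compare the expected cost per execution, meaning the token cost plus the error rate times the cost of an error. The sketch below is not the Token Minimization Governance model itself. The two options, their error rates, the token price, and the error costs are all hypothetical numbers picked to show the flip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    tokens: int        # tokens spent per execution
    error_rate: float  # assumed probability of an error per execution


def expected_cost(opt: Option, price_per_token: float, error_cost: float) -> float:
    """Token spend plus the probability-weighted cost of getting it wrong."""
    return opt.tokens * price_per_token + opt.error_rate * error_cost


PRICE_PER_TOKEN = 0.00001  # assumed price in EUR

cheap = Option("single pass", tokens=500, error_rate=0.02)
careful = Option("reason + verify", tokens=4000, error_rate=0.002)


def pick(error_cost: float) -> Option:
    """Choose the option with the lowest expected cost for a given error cost."""
    return min((cheap, careful),
               key=lambda o: expected_cost(o, PRICE_PER_TOKEN, error_cost))


print(pick(error_cost=0.50).name)    # low-stakes, Zone I style work
print(pick(error_cost=5000.0).name)  # high-stakes, Zone III style work
```

With a cheap error the lean option wins, and with an expensive error the extra reasoning and verification pay for themselves, which is the Zone I versus Zone III logic in miniature.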

The self-optimizing system that might actually work
The ultimate goal of our research program is ambitious enough that I hesitate to write it down because it sounds like the kind of thing someone says right before their startup fails spectacularly or something along those lines.
Let’s go one level deeper.
We want to build a neuro-symbolic system that self-optimizes, operates under explicit governance constraints, can learn and improve its own operational patterns, and does all of this while minimizing token expenditure, specifically targeting Zone III processes that current systems cannot handle.
Let me break that down because I just threw a lot of jargon at you.
Neuro-symbolic means combining neural network approaches, which are good at pattern recognition and language understanding, with symbolic reasoning approaches, the old-style ‘deterministic’ AI of the 80s, which are good at logical inference and rule-following. The intuition is that Zone III work requires both. You need the flexibility of neural systems to handle ambiguous situations and the reliability of symbolic systems to apply rules consistently and provide explainable outputs.
Self-optimizing means the system can observe its own performance, identify patterns in its successes and failures, and adjust its behavior accordingly. This is not the same as general self-improvement, which is the AI safety nightmare scenario. It is more like continuous learning within defined boundaries, similar to how a chess engine improves by analyzing its games but does not suddenly decide to optimize for world domination.
And that is why we are using chess to test out this system! Boring, but so is a corporate’s process landscape.

Governance constraints means that the system operates within explicit guardrails that cannot be overridden by its optimization processes. The goals, the rules, the escalation triggers, the audit requirements, all of these are architecturally privileged in ways that the learning components cannot circumvent. This is what I mean by the boring safety layer that makes the interesting capabilities possible.
I wrote about the self-evolving aspects of this architecture in a piece called “Self-Evolving AI Might Actually Break the Agentification Ceiling,” which is either an exciting research direction or the title of my professional obituary, depending on how the experiments go.
AI does not replace jobs, it replaces tasks
Before I get to the main event, the weekend project that spawned the research paper you are about to learn more about than you probably wanted, I need to address something that people consistently get wrong in public conversations about AI and work.
This one goes inside a quote as well.
AI does not replace jobs. AI replaces tasks.
This distinction matters enormously, and I need to give credit to Fatih Bolya for articulating it better than I have, because when I try to explain this I tend to wander into tangents about labor economics while his version stays crisp and actionable. And the core insight is this. A job is not a single thing; it’s a bundle of tasks, and AI affects those tasks unevenly. Some tasks get automated completely, while others get augmented, meaning they become faster or easier with AI assistance but still require human involvement. Some tasks remain entirely human because the technology is not there yet or because we have collectively decided that humans should remain in the loop for reasons of accountability or judgment or simple preference.
When you add up all the automations across an organization, you often save the equivalent of many full-time employees’ worth of labor. But the jobs themselves remain, just with different content. The human shifts from doing tasks to overseeing systems that do tasks.
The human becomes, as I wrote in a piece called “The Truth Is That AI Still Needs a Babysitter,” exactly that, a babysitter. Someone who watches the AI work, catches its mistakes, handles the exceptions it cannot handle, and makes the judgment calls it is not authorized to make.
This is the current state of affairs, and I do not expect it to be the permanent state of affairs.
The program we’re running is specifically aimed at reducing the babysitting burden in Zone III, which is where most of the expensive human oversight currently lives. We have made progress. But until that progress translates into deployed systems that actually work reliably in production environments, the babysitter model persists.

Moving from processes to tasks to jobs
Now we arrive at the part where I explain what I actually did over several weekends when I should have been resting or engaging in leisure activities like a normal person.
You may recall that I built a tool that analyzes business processes and tells you whether they are eligible for automation based on the PASF framework. That tool lives at ai-automations dot my, and before you rush over to try it, I should mention that I put the AI on hold because you people started using it enthusiastically and it was costing me roughly fifty euros per day in token spend. Fifty euros per day adds up quickly when your revenue from the tool is exactly zero, so the AI is taking a nap until I figure out a sustainable funding model.
But that tool operates at the process level.
It looks at how work flows through an organization and estimates where on the PASF spectrum that work lives. But then I realized that what was missing was a job-level analysis. Not processes but roles, not workflows but occupations.
So I built this method to translate jobs into tasks.

The approach uses existing occupation frameworks, specifically O*NET from the United States and ESCO from the European Union, which are publicly available taxonomies that describe occupations in terms of their component tasks, required skills, and work activities. I take a job title, match it to the closest occupation codes in these frameworks, extract the representative tasks associated with that occupation, and then map each task onto the PASF zones using an explicit coding rubric that I developed for this purpose.
The output is a zone distribution for the role.
Not “this job is automatable” or “this job is safe” but rather “this job is X percent Zone I, Y percent Zone II, Z percent Zone III, and W percent Zone IV.” It’s a distributional view instead of a binary classification, because jobs are bundles of tasks and the tasks have different automation characteristics.
I then ran this method across a set of standardized white-collar roles to see what patterns emerged.

How you actually map a job to automation zones
Let me explain the method in enough detail that you could replicate it if you wanted to, though I suspect most of you will sensibly choose not to spend your weekends doing what I did.
The process has five stages.
Stage one is role normalization. You take a job title, which is often vague or company-specific or aspirational in ways that obscure what the person actually does, and you match it to a primary occupation in O*NET and, where possible, a secondary occupation in ESCO. This gives you a standardized anchor that connects to a rich task database.

Stage two is task decomposition. You extract the representative tasks associated with that occupation from the framework databases. O*NET provides detailed task statements for each occupation. ESCO provides similar structures with European context. You end up with a list of tasks that characterize what someone in this role actually does with their time.
Stage three is rubric-based zone coding. You take each task and assign it to a PASF zone using explicit rules. Zone I gets highly structured, routinized tasks with low exception density. Zone II gets semi-structured tasks within bounded workflows. Zone III gets analytical, interpretive, or context-sensitive tasks requiring judgment. Zone IV gets strategic, governance, or end-responsibility tasks. The coding rules are explicit enough that different coders should reach similar conclusions, though I have not yet done the inter-rater reliability testing that would be required for publication in a serious journal.

Then stage four is about weighted aggregation.
Here, you combine the task-level assignments into a role-level distribution.
The weighting logic is not just a simple count, because not all tasks contribute equally to the character of a role. Tasks with interpretive or strategic responsibility get slightly more weight, and clerical or transactional tasks get slightly less. The result is an estimated zone share for the entire role.
Stage five is quality review.
You check coverage, meaning how well the occupational anchors captured the reconstructed role content. Then you check confidence, meaning how stable the estimates are given the coding decisions. You note biases and limitations.
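Stages three and four can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the paper: the weights in `ZONE_WEIGHTS` are an assumed reading of “slightly more” and “slightly less”, and the five coded tasks are invented examples rather than real O*NET or ESCO task statements.

```python
from collections import defaultdict

# Assumed weights: interpretive and strategic zones count a bit more,
# clerical work a bit less. The actual values in the paper may differ.
ZONE_WEIGHTS = {"I": 0.9, "II": 1.0, "III": 1.1, "IV": 1.1}


def zone_distribution(coded_tasks):
    """Turn (task, zone) pairs into a weighted percentage share per zone."""
    totals = defaultdict(float)
    for _task, zone in coded_tasks:
        totals[zone] += ZONE_WEIGHTS[zone]  # raises KeyError on an unknown zone
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no coded tasks to aggregate")
    return {zone: round(100 * totals[zone] / grand_total, 2)
            for zone in ZONE_WEIGHTS}


# Invented example tasks, already coded by hand against the rubric (stage three).
coded = [
    ("File incoming documents", "I"),
    ("Route tickets along the escalation path", "II"),
    ("Track case progression", "II"),
    ("Interpret ambiguous requirements", "III"),
    ("Sign off on the final decision", "IV"),
]

dist = zone_distribution(coded)
print(dist)
```

Because each share is rounded separately, the four percentages can add up to slightly more or less than 100, which is worth remembering when reading any zone distribution.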

127 tasks, 10 roles, and a lot of uncomfortable numbers
So I ran this method across ten office and knowledge-work roles, coding 127 tasks or task clusters in total. The roles included executive assistant, customer service agent, financial analyst, HR specialist, software engineer, legal advisor, insurance claims handler, IT service desk agent, procurement officer, and sales operations specialist.
Here is what I found.
The mean distribution across all ten roles was 11.88 percent in Zone I, 32.77 percent in Zone II, 43.56 percent in Zone III, and 11.82 percent in Zone IV.

Read that again but in a different accent or something.
The average white-collar role in my sample is not dominated by the easy stuff. Zone I, the fully routinized work that current AI handles trivially, accounts for less than 12 percent of the average role.
The empirical center of gravity sits in Zone II and Zone III – as predicted – which means semi-structured process work and context-sensitive analytical work together account for over 76 percent of what knowledge workers actually do.
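If you want to check the arithmetic yourself, the snippet below uses only the four mean shares reported above. The variable names are arbitrary.

```python
# Mean zone shares across the ten roles, as reported in the text (percent).
mean_share = {"I": 11.88, "II": 32.77, "III": 43.56, "IV": 11.82}

total = sum(mean_share.values())
zone_ii_iii = mean_share["II"] + mean_share["III"]

print(f"sum of all shares: {total:.2f}")        # 100.03
print(f"Zone II + Zone III: {zone_ii_iii:.2f}") # 76.33
```

The shares add up to 100.03 rather than exactly 100, which is presumably a rounding artifact of averaging across roles, and Zone II plus Zone III comes to 76.33 percent.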
The role-by-role breakdown is even more interesting.
Executive assistants showed the highest Zone I share at 55 percent, which makes sense because that role includes a lot of scheduling, document management, and routine coordination. But even executive assistants have nearly 25 percent of their work in Zone III, the analytical and interpretive tasks that require actual judgment.
Financial analysts and software engineers showed over 83 percent Zone III. These roles are dominated by interpretation, analysis, technical reasoning, and context-dependent judgment. The easy automation targets barely exist in these occupations. That reasoning is not valid for entry level coding jobs by the way. These are mostly in zone I and II.
Legal advisors showed the highest Zone IV share at 49 percent, reflecting the governance, normative interpretation, and final-responsibility functions that characterize legal work. Legal advisors also had 51 percent in Zone III, meaning literally zero percent of their work fell into Zone I or Zone II. Every single task in the legal advisor role requires either contextual judgment or strategic accountability.
Insurance claims handlers sat in the middle with a hybrid profile, about 50 percent Zone II, 27 percent Zone III, and 14 percent Zone IV. This is a coordination-heavy role where process adherence coexists with human judgment and periodic escalation.
The pattern that emerges is quite clear.
Most contemporary knowledge work is not the routine stuff that AI vendors love to demonstrate.
Most of it lives in the zones where automation is hard, risky, or impossible with current technology. The easy wins exist, but they are not where most of the human labor actually sits.

What this means and why you should care
So what does this mean for you, assuming you are a knowledge worker with a job title and a vague anxiety about whether a language model is coming for your livelihood?
First, the good news.
Your job is probably not disappearing tomorrow. Jobs are bundles of tasks, and even in roles with high exposure to automation, the Zone III and Zone IV components require human involvement that current AI cannot reliably provide. You will likely keep your job title. You will likely keep your desk. You will likely keep receiving paychecks.
Now the complicated news.
The content of your job is going to change, possibly dramatically. The Zone I and Zone II tasks that you currently perform will increasingly be handled by AI systems, freeing you up, or forcing you, depending on your perspective, to focus more on the Zone III judgment work and the Zone IV accountability work. This is not necessarily bad, but it requires adaptation. The skills that made you successful in a world where your job included routine tasks will need to evolve as those tasks migrate to machines.
And here is the conclusion that nobody wants to discuss.
If you add up all the Zone I and Zone II automation across an organization, you end up with significant labor savings, even if no individual job disappears completely. Ten percent efficiency here, twenty percent there, it accumulates. Organizations will need fewer humans to produce the same output, or they will produce more output with the same humans.
Either way, the labor market implications are real, even if they do not take the form of dramatic mass unemployment that makes for good headlines.

The framework I have built tells you, role by role, where the exposure lives. Someone suggested I put this into a public website, which I would love to do, except the token spend problem remains unsolved. Every query burns money, and I am not yet in a position to subsidize the workforce anxiety of the entire internet. So the tool exists, the method is documented in the paper, and if someone wants to fund a public deployment, my inbox is open.
In the meantime, I will keep running the research program. The Goal-Directed Governance Agents. The Tokenomics Optimization. The Neuro-Symbolic architecture targeting Zone III. The boring, unglamorous work of making AI reliable enough to trust with work that actually matters.
And here is the final uncomfortable truth.
The agentification ceiling is not a permanent barrier. It is a temporary limitation created by current technology, current governance frameworks, and current organizational risk tolerance.
My entire research agenda is aimed at raising that ceiling. And when it rises, the labor market implications change again. Your job will be affected by AI. This paper shows you exactly how much. What you do with that information is up to you.
Signing off,
Marco
Marco van Hurne runs Eigenvector Research and spends his weekends conducting research that keeps him awake at night. The PASF mapping paper and the Neuro-Symbolic and Tokenomics papers are available for those who want the full methodology. The process automation tool at ai-automations dot my is temporarily resting because apparently free AI services have consequences. And still he remains cautiously optimistic about the future of human work, which his colleagues find adorable.
