11%

That is the share of AI agents that make it from pilot to production. The models are not the reason. They are genuinely impressive, sometimes strikingly so. The number is that low because the organization around the model was never redesigned to use it.

This is the first thing product leaders get wrong about enterprise AI. They treat the technology as the variable and the organization as the constant. In reality, the organization is the hardest problem, and the technology is the easy part.

There is a framework that forces you to confront this in the right order. It comes from Kearney's transformation methodology, and it starts with a deceptively simple discipline: pixelating work.

The Framework: Work Pixelation and Operating Model Design

Work pixelation is the practice of decomposing organizational roles into discrete, task-level execution units before deciding what AI should touch. Not job functions. Not departments. Individual tasks, mapped and classified.

Product managers know this instinct from jobs-to-be-done thinking — the idea that you cannot design a good solution until you understand precisely what job the user is trying to accomplish. Operating model design applies the same logic one level up: you cannot design a good AI deployment until you understand precisely what work is being done, and what kind of work it actually is.

This matters because the product leader's job has fundamentally shifted. The first wave of AI product thinking was obsessed with capability: could the model reason, summarize, generate, act as an agent? That was the right question when the technology was new. But capability is no longer the constraint. The question has moved from "Can we make the software smarter?" to "Can we deliver this intelligence at the right cost, latency, trust level, and scale?" That shift, from capability to capacity, is what makes operating model design non-optional.

The PM is no longer only prioritizing features. The PM is making allocation decisions inside a constrained system. And the first allocation decision is always the same: which tasks actually deserve intelligence in the first place.

Every task in a workflow belongs in one of three modes.

Mode 1: Human-led. These tasks require deep enterprise context, complex stakeholder negotiation, ethical judgment, or decisions with material consequences. AI does not execute here. AI assists by surfacing data faster so the human can decide with better information.

Mode 2: Machine-led via traditional automation. These tasks are highly structured, rules-based, and repeatable. Deterministic software and RPA have always handled them well. The mistake is replacing working automation with AI for its own sake.

Mode 3: Machine-led via generative and agentic AI. These tasks are unstructured, language-heavy, and pattern-based, requiring real-time reasoning, contextual retrieval, and dynamic decision-making. This is the category that justifies the compute cost, but only if you have done the mapping to confirm that a specific process actually belongs here.

What Pixelation Actually Looks Like in Practice

The three modes are intuitive in the abstract. They become genuinely useful — and genuinely hard — when you apply them to a real workflow, task by task.

Take customer support. Not the department. The actual sequence of work that happens when a customer submits a ticket. Most organizations treat this as a single workflow and ask: should we add AI? The right question is different: which tasks inside this workflow belong in which mode?

Here is what the pixelation reveals.

Customer query received and routed to the correct queue. Structured input, defined routing rules, no ambiguity to resolve. This is Mode 2. Deterministic routing logic has handled this for years. Deploying a frontier model here is not transformation. It is waste.

Query intent classified when the message is ambiguous. "My order is wrong" could mean damaged, missing, wrong item, or wrong address. The correct resolution path depends entirely on which one, and the customer rarely specifies. This is language-based pattern recognition under ambiguity. Mode 3.

Account history retrieved and summarized for the agent. Pulling structured data from a CRM and synthesizing it into a readable brief that surfaces what matters for this specific ticket — that is retrieval plus synthesis. Mode 3. This is where AI saves meaningful time.

Response drafted based on policy. Here is where most teams misclassify. Drafting a response feels like a creative, human task. But when the response must stay within documented policy, reference accurate account data, and follow a defined tone, it is language generation within constraints. That is Mode 3. The model handles it faster than a human writing from scratch. The human should be reviewing and sending, not composing.

Exception judgment: does this customer qualify for a goodwill refund? This requires reading unstated context — loyalty history, the emotional register of the complaint, the reputational risk of the decision, the precedent it sets for future cases. A wrong call here has consequences someone must own. Mode 1. The AI surfaces the relevant history. The human decides.

Escalation to a senior agent or manager. Same logic. The moment a decision carries accountability that cannot be delegated to a system, it belongs in Mode 1. Hard-coding this escalation path is not a weakness in the AI deployment. It is the design.

Resolution logged and case closed. Structured, repeatable, rules-based. Mode 2.
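The walkthrough above can be captured as a task-level workflow map. What follows is a minimal Python sketch, assuming a simple `Mode` enum and a hypothetical `Task` record (neither comes from Kearney's methodology). The task list and mode assignments are taken directly from the support example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HUMAN_LED = 1               # Mode 1: a person decides and is accountable
    TRADITIONAL_AUTOMATION = 2  # Mode 2: deterministic rules or RPA
    GENERATIVE_AI = 3           # Mode 3: language-heavy, pattern-based reasoning


@dataclass(frozen=True)
class Task:
    name: str
    mode: Mode


# The customer support ticket workflow, pixelated task by task.
SUPPORT_WORKFLOW = [
    Task("Route query to the correct queue", Mode.TRADITIONAL_AUTOMATION),
    Task("Classify ambiguous query intent", Mode.GENERATIVE_AI),
    Task("Retrieve and summarize account history", Mode.GENERATIVE_AI),
    Task("Draft policy-bound response for human review", Mode.GENERATIVE_AI),
    Task("Judge goodwill refund exception", Mode.HUMAN_LED),
    Task("Escalate to senior agent or manager", Mode.HUMAN_LED),
    Task("Log resolution and close case", Mode.TRADITIONAL_AUTOMATION),
]


def mode_counts(workflow: list[Task]) -> Counter:
    """Count how many tasks in the workflow fall into each mode."""
    return Counter(task.mode for task in workflow)


if __name__ == "__main__":
    counts = mode_counts(SUPPORT_WORKFLOW)
    # Iterating the enum lists modes in definition order: 1, 2, 3.
    for mode in Mode:
        print(f"Mode {mode.value} ({mode.name}): {counts[mode]} task(s)")
        for task in SUPPORT_WORKFLOW:
            if task.mode is mode:
                print(f"  - {task.name}")
```

Only three of the seven tasks land in Mode 3. That is the point of the exercise: the generative model earns its cost on part of the workflow, not all of it.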

The Three Questions That Determine the Mode

When you are in a workflow mapping session and a task is not obviously one mode or another, three questions resolve the ambiguity.

Can success be defined before the task runs, without knowing the specific input? If yes — if you can write the success criteria on a whiteboard before seeing any particular instance — you are looking at Mode 2. The rules exist. A deterministic system can follow them.

Does the task require understanding something the input does not explicitly state? Intent, tone, context, the gap between what someone wrote and what they meant. If the task requires reading between the lines, it belongs in Mode 3. Language models are built precisely for this. Rules engines are not.

Would a wrong output require a human to be personally accountable for it? Not just correctable — accountable. If the output creates legal exposure, sets a policy precedent, or damages a relationship in a way the system cannot own, it is Mode 1. The AI can prepare the ground. The human must make the call.

The cases that remain hard after these three questions are almost always Mode 1 tasks disguised as Mode 3 — judgment calls dressed up as information tasks. The tell is the accountability question. If nobody in the room is comfortable letting the AI make the final call independently, the task is Mode 1, regardless of how capable the model is. That discomfort is not a feeling to overcome. It is a classification.
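The three questions can be read as a small decision procedure. The Python sketch below is one possible encoding, not a prescribed algorithm. It checks accountability first because the text treats that question as the tell that overrides the other two. The function name `classify_task` and its boolean parameters are hypothetical, and sending an "all no" task to Mode 1 is an assumption that follows the observation that hard cases are usually Mode 1 in disguise.

```python
from enum import Enum


class Mode(Enum):
    HUMAN_LED = 1
    TRADITIONAL_AUTOMATION = 2
    GENERATIVE_AI = 3


def classify_task(
    success_definable_in_advance: bool,
    needs_unstated_context: bool,
    requires_human_accountability: bool,
) -> Mode:
    """Apply the three mode questions to a single task."""
    # Accountability overrides capability: if a person must own a wrong
    # output, the task stays human-led however good the model is.
    if requires_human_accountability:
        return Mode.HUMAN_LED
    # Success criteria can be written down before seeing any input: rules suffice.
    if success_definable_in_advance:
        return Mode.TRADITIONAL_AUTOMATION
    # The task depends on intent, tone, or context the input does not state.
    if needs_unstated_context:
        return Mode.GENERATIVE_AI
    # Assumption: a task that fits none of the above stays with a human
    # until someone can explain why it belongs elsewhere.
    return Mode.HUMAN_LED


if __name__ == "__main__":
    print(classify_task(True, False, False))   # ticket routing
    print(classify_task(False, True, False))   # ambiguous intent classification
    print(classify_task(False, True, True))    # goodwill refund decision
```

The goodwill refund needs unstated context too, yet it still comes out as Mode 1 because the accountability check runs before the context check.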

Here is what happens when you skip the pixelation step. A customer support AI is deployed. It drafts responses. Agents review them. Managers still measure average handling time. Compliance still reviews every AI-generated output manually. The AI has added a new review step without removing any existing ones.

The workflow became more complex. The economics got worse, not better. The feature is live. The business case is dead.

This is what a compute ceiling looks like from the inside. The model is not failing — it is succeeding. Users are adopting it. Volume is growing. And growth is precisely what breaks the economics, because every additional ticket triggers more model calls, more review cycles, more cost. The ceiling is not a technical limit. It is the moment AI stops being a magical interface and becomes a capacity problem. It was always coming. Deploying AI into an unchanged workflow just ensures you hit it sooner, at scale, in production.

This is the pattern behind the 11%. Not because the model failed, but because the operating model was never redesigned to absorb what the model could do.
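The arithmetic of the skipped-pixelation deployment can be sketched in a few lines of Python. Every cost figure below is hypothetical, and the model assumes that agent handling and manual compliance review are unchanged while two new lines are added per ticket: model calls and review of the AI draft. Under those assumptions the AI version costs more per ticket, so each additional ticket widens the gap.

```python
from dataclasses import dataclass


@dataclass
class TicketCosts:
    """Per-ticket cost components in an arbitrary currency unit (hypothetical)."""
    agent_handling: float          # existing human handling, assumed unchanged
    compliance_review: float       # existing manual compliance check on every ticket
    model_calls: float = 0.0       # new: inference cost per ticket
    ai_output_review: float = 0.0  # new: agent time spent reviewing the AI draft

    def per_ticket(self) -> float:
        return (self.agent_handling + self.compliance_review
                + self.model_calls + self.ai_output_review)


def monthly_cost(costs: TicketCosts, tickets: int) -> float:
    """Total cost scales linearly with ticket volume."""
    return costs.per_ticket() * tickets


baseline = TicketCosts(agent_handling=4.0, compliance_review=1.0)
# AI added on top of the old workflow: nothing removed, two cost lines added.
ai_bolted_on = TicketCosts(agent_handling=4.0, compliance_review=1.0,
                           model_calls=0.5, ai_output_review=1.0)

if __name__ == "__main__":
    for tickets in (1_000, 10_000, 100_000):
        gap = monthly_cost(ai_bolted_on, tickets) - monthly_cost(baseline, tickets)
        print(f"{tickets:>7} tickets: AI version costs {gap:,.0f} more")
```

With these illustrative numbers the extra cost is 1.5 per ticket, so adoption growth turns a 1,500 gap into a 150,000 gap. The feature is succeeding, and the success is what hurts.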

❝

Data preparation alone — the foundational work of cleaning, structuring, and maintaining the data an AI will reason over — accounts for up to eighty percent of total AI project effort.

Enterprise leadership consistently underestimates this by a factor of three at kickoff. That is not a technology problem. It is a planning and workflow problem. And it surfaces the moment you pixelate the work and ask:

Who owns the data quality task?

What mode of work is that?

Is there a human doing it manually today, and what happens to that role when AI needs clean data at five times the volume?

Run this diagnostic before any initiative is approved: map the target workflow, identify who currently owns each task, and ask whether those ownership lines still hold when AI enters the sequence.

Who decides what the AI escalates?

Who owns data quality when the volume triples?

Which KPI changes when this workflow changes — and who is accountable for measuring it?

If more than half of those questions produce a "we will figure it out later" answer, the operating model is not ready. You are not looking at an AI problem yet. You are looking at a readiness problem. And no model, however capable, fixes a readiness problem.
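The readiness threshold is simple enough to state as code. In this Python sketch an answer of `None` stands for "we will figure it out later"; that encoding, the function name `operating_model_ready`, and the example owner are assumptions for illustration. Passing the check only means the threshold above is not tripped, not that the operating model is proven ready.

```python
from typing import Optional


def operating_model_ready(answers: dict[str, Optional[str]]) -> bool:
    """Return False if more than half of the diagnostic questions are deferred.

    Each key is a diagnostic question; each value is the named owner or
    answer, or None when the room says "we will figure it out later".
    """
    if not answers:
        raise ValueError("run the diagnostic before checking readiness")
    deferred = sum(1 for answer in answers.values() if answer is None)
    # "More than half deferred" means not ready; integer math avoids rounding.
    return deferred * 2 <= len(answers)


if __name__ == "__main__":
    answers = {
        "Who decides what the AI escalates?": "Support operations lead",  # hypothetical owner
        "Who owns data quality when the volume triples?": None,
        "Which KPI changes, and who is accountable for measuring it?": None,
    }
    print("ready" if operating_model_ready(answers) else "not ready")
```

With two of three questions deferred, this example reports "not ready".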

The Jevons Trap

Every AI workforce planning model built on efficiency assumptions contains a structural error: the assumption that cheaper tasks will be performed the same number of times.

William Stanley Jevons identified the correct dynamic in 1865. Improving the efficiency of steam engines did not reduce total coal consumption. It expanded it, because efficiency unlocks latent demand. The printing press did not reduce the demand for written communication. The spreadsheet did not reduce the demand for financial analysis. ATMs did not eliminate bank teller jobs; they lowered branch operating costs enough to open more branches, which required more tellers.

AI will reproduce this pattern. When the per-unit cost of a cognitive task falls dramatically, stakeholders request more iterations, new buyers enter the market, and entirely new use cases emerge that were not viable at previous price points. The demand for human oversight and complementary expertise expands proportionally.
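One way to see the mechanism is a constant-elasticity demand curve, a textbook simplification assumed here rather than anything Jevons himself wrote down. The sketch below assumes quantity demanded is Q = scale × cost^(−elasticity), and that oversight scales with volume at a fixed, hypothetical number of review minutes per task. When elasticity is above 1, cutting the unit cost raises total spend. Oversight hours grow whenever volume grows.

```python
def tasks_demanded(unit_cost: float, scale: float, elasticity: float) -> float:
    """Assumed constant-elasticity demand: Q = scale * unit_cost ** -elasticity."""
    return scale * unit_cost ** -elasticity


def total_spend(unit_cost: float, scale: float, elasticity: float) -> float:
    """Total spend on the task: unit cost times quantity demanded."""
    return unit_cost * tasks_demanded(unit_cost, scale, elasticity)


def oversight_hours(tasks: float, review_minutes_per_task: float) -> float:
    """Human oversight grows in proportion to task volume."""
    return tasks * review_minutes_per_task / 60


if __name__ == "__main__":
    # Unit cost of a cognitive task falls tenfold, from 10.0 to 1.0 (hypothetical).
    for elasticity in (0.5, 1.5):
        q_before = tasks_demanded(10.0, 1000, elasticity)
        q_after = tasks_demanded(1.0, 1000, elasticity)
        print(f"elasticity {elasticity}: "
              f"spend {total_spend(10.0, 1000, elasticity):,.0f} -> "
              f"{total_spend(1.0, 1000, elasticity):,.0f}, "
              f"oversight {oversight_hours(q_before, 5):,.1f} h -> "
              f"{oversight_hours(q_after, 5):,.1f} h")
```

A workforce plan that holds volume constant quietly assumes the inelastic case. The elastic case is the Jevons trap.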

The Test

Work pixelation produces one concrete deliverable before any AI initiative is approved: a task-level workflow map with every task classified into its mode. Run this test against the target workflow.

The AI is not the hard part. Redesigning the organization to use it is. That work — the pixelation, the mode classification, the operating model map — is what determines the answer to the one question that exposes every AI initiative for what it actually is:

❝

If usage of your AI feature grew tenfold tomorrow, would your product become stronger — or would you discover that the feature was only affordable while nobody was using it?

The operating model redesign is the work that makes the answer "stronger." Without it, tenfold usage is not a success story. It is a compute ceiling waiting to be hit.
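The tenfold question reduces to unit economics. In this Python sketch all figures are hypothetical, and the "redesigned" case simply assumes that removing a duplicated review step lowers cost per use. A feature whose cost per use exceeds its value per use can look tolerable at low volume because fixed costs dominate. Growth then exposes it.

```python
from dataclasses import dataclass


@dataclass
class FeatureEconomics:
    """Per-period unit economics of one AI feature (all figures hypothetical)."""
    value_per_use: float  # value captured each time the feature is used
    cost_per_use: float   # marginal cost per use: model calls plus human review
    fixed_cost: float     # platform, licence, and team cost per period

    def net(self, uses: int) -> float:
        return uses * (self.value_per_use - self.cost_per_use) - self.fixed_cost


def gets_stronger_at_10x(econ: FeatureEconomics, current_uses: int) -> bool:
    """True if tenfold usage improves net value, False if growth erodes it."""
    return econ.net(10 * current_uses) > econ.net(current_uses)


if __name__ == "__main__":
    # Unchanged workflow: every use still pays for duplicated review.
    bolted_on = FeatureEconomics(value_per_use=2.0, cost_per_use=2.5, fixed_cost=1000.0)
    # Assumed redesign: the duplicate review step is removed, lowering cost per use.
    redesigned = FeatureEconomics(value_per_use=2.0, cost_per_use=1.5, fixed_cost=1000.0)
    for label, econ in (("bolted on", bolted_on), ("redesigned", redesigned)):
        print(f"{label}: net {econ.net(1_000):,.0f} at 1x, "
              f"{econ.net(10_000):,.0f} at 10x, "
              f"stronger={gets_stronger_at_10x(econ, 1_000)}")
```

Both versions lose money at current volume. Only the redesigned one improves with tenfold usage; the bolted-on one goes from a loss of 1,500 to a loss of 6,000.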

Next in the series: once the work is redesigned, how do you decide which tasks deserve which tier of intelligence? And how do you make the build-versus-buy decision before it destroys your engineering budget?

The Reusable Principle

The unit of AI strategy is not the model. It is the task.


Why AI Lands in the Wrong Work. The Pixelation Framework


AI Revolution Hub
