
Weekly roundup

Hey folks, welcome back to the money side of AI.

In Part 1, you mapped your workflow into three modes. Mode 2 work (structured, rules-based, and deterministic) belongs to traditional automation.

Mode 1 work (judgment-heavy and accountability-laden) stays with humans. Mode 3 work (language-heavy, pattern-based, and requiring real-time reasoning) is where AI earns its compute cost.

This week we follow the tokens. And the bill they leave behind.

Now the question is what Mode 3 work actually costs to run. Not in the abstract, but through real unit economics: document summarization at scale, internal copilots under growing adoption, and what Google's and Anthropic's own pricing reveals about the cost surprises most teams miss.

For each one, we break down what a token actually costs.

The economics apply whether you are calling a commercial API or running an open-source model on your own infrastructure.

The pricing mechanism differs. API providers charge per token directly, while self-hosted models translate token throughput into inference cost through GPU compute time.

But the underlying economics are the same. Every token consumed has a cost. The question is whether that cost is visible on an invoice or buried in your infrastructure bill.

What Is a Token?

A token is the basic unit of currency for every large language model interaction. It is not a word: on average, a token is roughly three to four characters of English text. The sentence you just read is approximately twenty tokens.
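As a rough planning aid, that rule of thumb can be turned into a quick estimator. This is a minimal sketch that assumes about four characters per token; real tokenizers differ by model and language, so for anything that goes into a budget, use the provider's own token counter instead.

```python
import math

# Rule-of-thumb assumption: ~4 characters of English text per token.
# Real tokenizers vary by model and language; treat this as a rough estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Roughly estimate how many tokens a piece of text will consume."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


sentence = "It is not a word: on average, a token is roughly three to four characters."
print(estimate_tokens(sentence))
```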

Every time your product sends a message to a model and receives a response, you pay for the tokens in and the tokens out. Input tokens cost less than output tokens (typically by a factor of four to six in the price lists cited in this article) because generating text is computationally heavier than reading it.

A task that requires a long system prompt, rich retrieved context, and a short answer has a very different cost profile than a task requiring a minimal prompt and a long generated response. 
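The asymmetry is easy to see with a small cost function. The sketch below assumes the $3 / $15 per million token mid-tier prices used in Use Case 1 and two hypothetical task profiles with made-up token counts; only the shape of the result matters.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost in dollars of one model call, given prices per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Assumed mid-tier prices (same as Use Case 1): $3 in, $15 out per million tokens.
IN_PRICE, OUT_PRICE = 3.0, 15.0

# Hypothetical profile A: long system prompt + retrieved context, short answer.
context_heavy = call_cost(12_000, 300, IN_PRICE, OUT_PRICE)
# Hypothetical profile B: minimal prompt, long generated response.
generation_heavy = call_cost(500, 3_000, IN_PRICE, OUT_PRICE)

print(f"context-heavy:    ${context_heavy:.4f}")
print(f"generation-heavy: ${generation_heavy:.4f}")
```

Profile B uses fewer than a third of profile A's tokens (3,500 vs 12,300) yet costs more, because most of its tokens are billed at the output rate.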

Price per Token Is Not the Product Metric

AI pricing pages usually show cost per million tokens. That is useful, but it is not the business unit.

A product manager does not ship “one million tokens.” A product manager ships completed outcomes: one resolved ticket, one summarized document, one analyzed contract, one routed claim, one answered internal question.

The real metric is: Cost per completed task.

That cost has more layers than most teams initially estimate:

  • Model inference: input and output tokens

  • Retrieval: embeddings, reranking, vector search, document processing

  • Governance: logging, permissions, evaluations, traceability, audit controls

  • Human fallback: review, correction, escalation, exception handling
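One way to make the four layers concrete is to record each as a per-task dollar amount and sum them. The figures below are hypothetical placeholders (chosen to add up to the $0.10 per task used in Use Case 2), not measurements; the point is that inference is only one line of the total.

```python
from dataclasses import dataclass


@dataclass
class TaskCost:
    """Per-task cost layers in dollars (all values used here are hypothetical)."""
    inference: float       # input + output tokens
    retrieval: float       # embeddings, reranking, vector search, document processing
    governance: float      # logging, permissions, evaluations, traceability, audit
    human_fallback: float  # review, correction, escalation, amortized per task

    def total(self) -> float:
        return self.inference + self.retrieval + self.governance + self.human_fallback

    def inference_share(self) -> float:
        return self.inference / self.total()


task = TaskCost(inference=0.02, retrieval=0.01, governance=0.005, human_fallback=0.065)
print(f"cost per completed task: ${task.total():.3f}")
print(f"share that is model inference: {task.inference_share():.0%}")
```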

A workflow that looks cheap at the model layer can become expensive once the full system is counted.

This is especially true for agentic workflows. A single chatbot response may require one model call. A real support agent may require intent detection, retrieval, tool use, response drafting, guardrail checks, quality checks, and escalation logic.

That is no longer one call. It is a cost multiplier.
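To see the multiplier, count the calls. The sketch below prices a hypothetical support-agent pipeline built from the steps listed above, with invented token counts per step and the same assumed $3 / $15 prices, and compares it with a single chatbot call. Only model-call costs are counted.

```python
# Assumed prices per million tokens (mid-tier, as in Use Case 1).
IN_PRICE, OUT_PRICE = 3.0, 15.0


def call_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000


# Hypothetical (input_tokens, output_tokens) per model call; real numbers vary widely.
# Non-model costs (vector search, logging, human review) are excluded here.
AGENT_STEPS = {
    "intent_detection": (1_500, 50),
    "retrieval": (2_000, 100),       # e.g. a query-rewriting call
    "tool_use": (4_000, 300),
    "response_drafting": (8_000, 600),
    "guardrail_check": (3_000, 100),
    "quality_check": (3_500, 150),
    "escalation_logic": (2_500, 80),
}

single_call = call_cost(3_000, 400)  # hypothetical one-shot chatbot answer
agent_total = sum(call_cost(i, o) for i, o in AGENT_STEPS.values())

print(f"single call: ${single_call:.4f}")
print(f"agent task:  ${agent_total:.4f}")
print(f"multiplier:  {agent_total / single_call:.1f}x")
```

With these invented numbers the agent task costs about six times the single call, before any of the non-model layers are added.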

Token price is the visible cost. Completed task cost is the real cost.

Three Numeric Use Cases: How the Framework Works in Practice

Because this is an AI tokenomics article, the argument should not stay abstract. The framework becomes useful when a product team can attach numbers to it.

Use Case 1: High-volume document summarization

Imagine a team processes 10,000 internal documents per month.

Each document requires:

  • 10,000 input tokens for context

  • 500 output tokens for the summary

A mid-tier model priced at $3 per million input tokens and $15 per million output tokens would cost:

Input cost per document:

10,000 × ($3 / 1,000,000) = $0.030

Output cost per document:

500 × ($15 / 1,000,000) = $0.0075

Total per document:

$0.0375

Monthly cost for 10,000 documents:

$375

Now imagine the task is low-risk and repetitive. The team routes it to a utility model priced at $0.10 per million input tokens and $0.40 per million output tokens.

Input cost per document:

10,000 × ($0.10 / 1,000,000) = $0.001

Output cost per document:

500 × ($0.40 / 1,000,000) = $0.0002

Total per document:

$0.0012

Monthly cost for 10,000 documents:

$12

Same task class. Same volume. Similar quality for the required risk level.

The allocation decision saves $363 per month, or $4,356 per year, in one workflow.
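The arithmetic above is easy to keep as a reusable function, so the comparison can be rerun whenever prices change. A minimal sketch using the Use Case 1 numbers:

```python
def monthly_cost(docs: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Monthly model cost in dollars; prices are per million tokens."""
    per_doc = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return round(docs * per_doc, 2)


DOCS, IN_TOK, OUT_TOK = 10_000, 10_000, 500

mid_tier = monthly_cost(DOCS, IN_TOK, OUT_TOK, in_price=3.00, out_price=15.00)
utility = monthly_cost(DOCS, IN_TOK, OUT_TOK, in_price=0.10, out_price=0.40)

monthly_savings = round(mid_tier - utility, 2)
print(f"mid-tier: ${mid_tier:,.2f}/month")
print(f"utility:  ${utility:,.2f}/month")
print(f"savings:  ${monthly_savings:,.2f}/month, ${monthly_savings * 12:,.2f}/year")
```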

The lesson is simple: low-risk, high-volume tasks should not consume premium intelligence by default.

Use Case 2: Internal AI assistant for power users

Now imagine an internal AI assistant used by 50 power users.

Each user runs 20 complex tasks per day.

Each completed task costs $0.10 after model calls, retrieval, logging, and review are included.

Daily cost:

50 users × 20 tasks × $0.10 = $100

Monthly cost:

$100 × 30 = $3,000

Annual cost:

$3,000 × 12 = $36,000

This is not a scary number by itself. But it becomes strategic when usage scales.

If the product grows from 50 users to 500 users, the same behavior becomes:

500 users × 20 tasks × $0.10 = $1,000 per day

Monthly cost:

$30,000

Annual cost:

$360,000
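The same projection can be expressed as a function of adoption. A minimal sketch, assuming (as the calculation above does) a flat $0.10 per task and 30-day months; the 5,000-user row simply extends the same arithmetic one step further.

```python
def projected_cost(users: int, tasks_per_user_per_day: int, cost_per_task: float,
                   days_per_month: int = 30) -> dict:
    """Daily, monthly, and annual cost for a flat per-task cost (30-day months)."""
    daily = users * tasks_per_user_per_day * cost_per_task
    monthly = daily * days_per_month
    return {
        "daily": round(daily, 2),
        "monthly": round(monthly, 2),
        "annual": round(monthly * 12, 2),
    }


for users in (50, 500, 5_000):
    print(users, projected_cost(users, tasks_per_user_per_day=20, cost_per_task=0.10))
```

Because the per-task cost is held constant, cost grows linearly with adoption; only lowering the cost per task changes the slope.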

That is why tokenomics must be designed before adoption succeeds. A product can look cheap during pilot usage and become expensive exactly when it starts working.

And if retrieval, observability, and escalation turn out heavier than the $0.10 estimate assumes, the real cost moves even higher.

The product lesson is not that agents are bad. It is that agentic workflows need a business case at the workflow level, not a pricing-page estimate at the model level. As a business manager, you need to understand the unit economics of each task based on current model price points and technical constraints. Ideally, you track them in one place:

The Unit Economics Scoreboard

The scoreboard gives you a clear map of the approximate cost per task and the trade-off behind each model choice. It has to stay dynamic, because the frontier labs keep repricing as they fight for the market. Kept up to date, it tells you whether each completed task is still economically rational as scale rises.
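The scoreboard itself can be a small table refreshed whenever prices or usage change. Below is a minimal sketch; the per-task costs and volumes reuse the figures from the use cases above, while the `value_per_task` column and the `MIN_VALUE_TO_COST` bar are hypothetical assumptions a team would have to set for itself.

```python
from dataclasses import dataclass


@dataclass
class ScoreboardRow:
    task: str
    model: str
    cost_per_task: float    # fully loaded $ per completed task
    tasks_per_month: int
    value_per_task: float   # hypothetical business value, $ per completed task

    @property
    def monthly_cost(self) -> float:
        return self.cost_per_task * self.tasks_per_month

    @property
    def value_to_cost(self) -> float:
        return self.value_per_task / self.cost_per_task


rows = [
    ScoreboardRow("doc summary", "mid-tier", 0.0375, 10_000, 0.05),
    ScoreboardRow("doc summary", "utility", 0.0012, 10_000, 0.05),
    ScoreboardRow("assistant task", "current model", 0.10, 30_000, 0.50),
]

MIN_VALUE_TO_COST = 3.0  # hypothetical bar for "economically rational"

# Worst value-to-cost ratio first, so the rows that need attention come first.
for r in sorted(rows, key=lambda row: row.value_to_cost):
    verdict = "OK" if r.value_to_cost >= MIN_VALUE_TO_COST else "REVIEW"
    print(f"{r.task:<15} {r.model:<14} ${r.cost_per_task:.4f}/task "
          f"${r.monthly_cost:>9,.2f}/mo  {r.value_to_cost:5.1f}x  {verdict}")
```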

Case Study: Why Google's Cheapest Model Ran the Most Expensive Bill

Google released Gemini 3.5 Flash on May 19, 2026. Faster, smarter, and built for the kind of multi-step AI workflows that enterprise teams actually care about. It also costs less per token than Google's own 3.1 Pro model.

On paper, this looks like a clear upgrade. In practice, it is a perfect example of why "price per token" is the wrong number to put in a business case.

Let's walk through it.

The price tag looks simple.

Google charges per million tokens (MTok). But there is a catch. The rate you pay depends on how large each individual API request is. If a single request stays under 200,000 tokens, you get the standard rate.

If it crosses that line, the price jumps. For now, we will use the standard rate, because most enterprise API calls fall well under that threshold. We will come back to the catch later.

| Model | Input price (per MTok) | Output price (per MTok) |
|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 | $12.00 |
| Gemini 3.5 Flash | $1.50 | $9.00 |

Flash is 25% cheaper on both input and output. If you showed this table to your CFO and said "we are switching to the cheaper model," they would approve it.

Now here is what actually happened when someone ran both models on the same work.

Artificial Analysis, an independent benchmarking organisation, ran both models through the exact same test suite, their Intelligence Index v4.0, covering ten evaluations including coding, reasoning, agentic tasks, and hallucination detection. Same test. Same tasks. No advantage to either model.

| Model | Intelligence score | Total cost to run | Output tokens used |
|---|---|---|---|
| Gemini 3.1 Pro Preview | 57.2 | ~$887 | 57M |
| Gemini 3.5 Flash | 55.3 | ~$1,552 | 73M |

The cheaper-per-token model scored lower on intelligence, used 28% more output tokens, and cost 75% more to complete the same work.
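The headline percentages follow directly from the table. A quick check, using the rounded figures as quoted (so results are approximate), also shows cost per intelligence point, a rough view of what each point of benchmark score cost to obtain:

```python
# Artificial Analysis Intelligence Index v4.0 figures quoted above (rounded).
results = {
    "Gemini 3.1 Pro Preview": {"score": 57.2, "cost": 887, "output_mtok": 57},
    "Gemini 3.5 Flash": {"score": 55.3, "cost": 1552, "output_mtok": 73},
}

pro, flash = results["Gemini 3.1 Pro Preview"], results["Gemini 3.5 Flash"]

extra_tokens = flash["output_mtok"] / pro["output_mtok"] - 1
extra_cost = flash["cost"] / pro["cost"] - 1

print(f"Flash used {extra_tokens:.0%} more output tokens")
print(f"Flash cost {extra_cost:.0%} more to run the suite")
for name, r in results.items():
    print(f"{name}: ${r['cost'] / r['score']:.2f} per intelligence point")
```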

How is that possible?

Image 1 - Artificial Analysis

Image 1 tells the story. Reasoning models generate two kinds of output tokens: reasoning tokens, which are the model's internal thinking steps, and answer tokens, which are the actual response the user sees. Both are billed at the output-token rate. The split is revealing:

| Model | Reasoning tokens | Answer tokens | Total output |
|---|---|---|---|
| Gemini 3.5 Flash | 62M | 11M | 73M |
| Gemini 3.1 Pro Preview | 53M | 4M | 57M |
| GPT-5.5 (high) | 39M | 6M | 45M |
| GPT-5.5 (medium) | 18M | 4M | 22M |

Gemini 3.5 Flash generated nearly three times the answer tokens of Pro Preview (11M vs 4M) and 17% more reasoning tokens (62M vs 53M). It is wordier in its answers and thinks in more steps. Both of those cost money.

But the output side is only half the story. The bigger cost driver was input tokens. Gemini 3.5 Flash, optimised for agentic workflows, took more turns to complete each evaluation. Each turn is a separate API call, and each call carries input tokens: the system prompt, the retrieved documents, the conversation history accumulated so far. More turns means more input payloads. The input cost compounded across every additional step.
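The tables do not break out input tokens, but a back-of-the-envelope estimate supports the point. Assuming every request was billed at the standard (≤200K) rates quoted above, with no long-context tier and no caching discounts, subtracting output spend from the total leaves an implied input spend. The inputs are rounded, so treat the result as an order-of-magnitude check, not a measurement.

```python
# Rounded figures from the tables above; prices are the standard (<=200K) rates.
models = {
    "Gemini 3.1 Pro Preview": {"total_cost": 887, "output_mtok": 57,
                               "in_price": 2.00, "out_price": 12.00},
    "Gemini 3.5 Flash": {"total_cost": 1552, "output_mtok": 73,
                         "in_price": 1.50, "out_price": 9.00},
}

implied = {}
for name, m in models.items():
    output_spend = m["output_mtok"] * m["out_price"]  # reasoning + answer tokens
    input_spend = m["total_cost"] - output_spend      # assumption: the rest is input
    input_mtok = input_spend / m["in_price"]
    implied[name] = (output_spend, input_spend, input_mtok)
    print(f"{name}: output ${output_spend:,.0f}, input ~${input_spend:,.0f} "
          f"(~{input_mtok:,.0f}M input tokens)")
```

On these assumptions, Flash spends slightly less than Pro Preview on output ($657 vs $684) but more than four times as much on input (about $895 vs $203), which is consistent with the extra turns described above.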

Think of it like two delivery drivers. One charges $9 per trip and delivers the package in six trips. The other charges $12 per trip but delivers it in three. The "cheaper" driver costs $54. The "expensive" driver costs $36.

Image 2 - Artificial Analysis

The scatter plot maps intelligence score (vertical axis) against output tokens consumed (horizontal axis). The green "most attractive quadrant" is top-left: high intelligence, low token usage.

GPT-5.5 (medium) sits closest to that ideal. Intelligence score of 56.7, only 22M output tokens. It is less smart than GPT-5.5 (high) at 58.9, but it achieves 96% of the intelligence at less than half the token consumption. That is the kind of efficiency trade-off a product team should be evaluating.

Gemini 3.5 Flash sits far to the right. Intelligence score of 55.3, with 73M output tokens. It is outside the attractive quadrant. It consumed more tokens than any other model in the comparison while scoring fourth out of five on intelligence.

Gemini 3.1 Pro Preview sits in a more balanced position. Higher intelligence (57.2), lower token consumption (57M). Not in the attractive quadrant either, but meaningfully more efficient than Flash.

The model that would have won a procurement decision on price-per-token is the worst performer on the metric that actually matters: intelligence per token consumed.
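"Intelligence per token" can be made explicit with a simple ratio. The metric below is a deliberately crude one assumed here for illustration (score divided by output tokens, ignoring input tokens and price), but it reproduces the ordering the scatter plot suggests:

```python
# Intelligence score and total output tokens (millions) as quoted above.
models = {
    "GPT-5.5 (high)": (58.9, 45),
    "Gemini 3.1 Pro Preview": (57.2, 57),
    "GPT-5.5 (medium)": (56.7, 22),
    "Gemini 3.5 Flash": (55.3, 73),
}

# Score points per 10M output tokens: higher means more intelligence per token.
efficiency = {name: score / (mtok / 10) for name, (score, mtok) in models.items()}

ranking = sorted(efficiency, key=efficiency.get, reverse=True)
for name in ranking:
    print(f"{name:<24} {efficiency[name]:5.1f} points per 10M output tokens")
```

GPT-5.5 (medium) comes out first and Gemini 3.5 Flash last, matching the scatter-plot reading above.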

There is a second cost trap hidden in Google's pricing.

Gemini 3.1 Pro Preview has a split pricing structure. The rate depends on the size of each individual API request, not your total monthly usage.

| Request size | Input per MTok | Output per MTok |
|---|---|---|
| Each request ≤ 200K tokens | $2.00 | $12.00 |
| Each request > 200K tokens | $4.00 | $18.00 |

This is a cliff, not a gradient. If a single request hits 210,000 tokens, the entire request flips to the higher rate. Not just the 10,000 tokens above the line. All 210,000 of them.
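The cliff is easiest to see with a pricing function that selects one tier for the whole request. A minimal sketch using the Gemini 3.1 Pro Preview rates above; the 2,000-token output size is hypothetical, and the code assumes the request's input (prompt) token count is what selects the tier, so check the provider's pricing page for the exact rule.

```python
THRESHOLD = 200_000  # tokens per request

# Gemini 3.1 Pro Preview rates from the table above, $ per million tokens.
STANDARD = {"input": 2.00, "output": 12.00}
LONG_CONTEXT = {"input": 4.00, "output": 18.00}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Whole-request pricing: crossing the threshold reprices every token.

    Assumption: the tier is chosen by the request's input (prompt) size.
    """
    rates = LONG_CONTEXT if input_tokens > THRESHOLD else STANDARD
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


below = request_cost(190_000, 2_000)
above = request_cost(210_000, 2_000)
print(f"190K-token request: ${below:.3f}")
print(f"210K-token request: ${above:.3f} ({above / below:.1f}x the cost)")
```

Roughly 10% more input produces a bill a little over twice as large for that single request.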

In agentic workflows, context windows grow across turns. The system prompt stays the same, but retrieved documents and conversation history accumulate. A request that started at 80,000 tokens in turn one can reach 210,000 tokens by turn six. At that point, the cost of that single request silently doubles. No error. No alert. Just a bigger bill at the end of the month.
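Per-request monitoring can be as simple as checking each request's input size before it is sent. The sketch below assumes context grows linearly (about 26K tokens per turn, which takes an 80,000-token first turn to 210,000 by turn six) and a hypothetical warning level at 90% of the threshold.

```python
THRESHOLD = 200_000
WARN_AT = 0.9  # hypothetical: alert when a request reaches 90% of the threshold


def context_by_turn(start: int, growth_per_turn: int, turns: int) -> list[int]:
    """Input tokens per turn when history and retrieved docs accumulate linearly."""
    return [start + growth_per_turn * t for t in range(turns)]


def check_request(turn: int, input_tokens: int) -> str:
    """Classify one request against the pricing threshold."""
    if input_tokens > THRESHOLD:
        return f"turn {turn}: {input_tokens:,} tokens -> OVER threshold, higher tier"
    if input_tokens >= WARN_AT * THRESHOLD:
        return f"turn {turn}: {input_tokens:,} tokens -> WARNING, close to threshold"
    return f"turn {turn}: {input_tokens:,} tokens -> ok"


# Assumed linear growth of 26K tokens per turn: 80K at turn 1, 210K at turn 6.
sizes = context_by_turn(start=80_000, growth_per_turn=26_000, turns=6)
for turn, size in enumerate(sizes, start=1):
    print(check_request(turn, size))
```

With these assumptions, turn five raises the warning and turn six crosses into the higher tier.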

What this means for your business case.

If you are building an AI deployment and comparing model providers, the pricing page will not tell you what you need to know. Three things will be invisible:

The consumption.

A model that is cheaper per token but takes more steps, reasons longer, and produces wordier answers can cost significantly more per completed piece of work. The only way to see this is to measure total tokens consumed per task, not price per million tokens.
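Measuring consumption per task requires tagging every API call with the task it belongs to, then aggregating. A minimal sketch over a hypothetical call log; the field names are assumptions, not any provider's schema.

```python
from collections import defaultdict

# Hypothetical call log: one row per API call, tagged with the task it served.
call_log = [
    {"task_id": "T1", "input_tokens": 8_000, "output_tokens": 1_200},
    {"task_id": "T1", "input_tokens": 9_500, "output_tokens": 900},
    {"task_id": "T2", "input_tokens": 7_000, "output_tokens": 600},
    {"task_id": "T1", "input_tokens": 11_000, "output_tokens": 700},
]


def tokens_per_task(log: list[dict]) -> dict[str, dict[str, int]]:
    """Sum calls, input tokens, and output tokens per completed task."""
    totals = defaultdict(lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0})
    for row in log:
        t = totals[row["task_id"]]
        t["calls"] += 1
        t["input_tokens"] += row["input_tokens"]
        t["output_tokens"] += row["output_tokens"]
    return dict(totals)


for task_id, t in tokens_per_task(call_log).items():
    print(task_id, t)
```

Multiplying each task's totals by a model's per-token prices gives its cost per completed task, which is the number to compare across models.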

The verbosity.

Reasoning tokens and answer tokens both cost money. A model that thinks in more steps and writes longer answers multiplies your output cost even when the per-token rate is lower. The Artificial Analysis data shows a 3.3x spread in total output tokens across models completing the same work (GPT-5.5 medium at 22M vs Gemini 3.5 Flash at 73M).

The threshold.

Split pricing tiers can double your cost mid-workflow when context windows grow beyond the provider's threshold. The only way to see this is to monitor per-request token counts in production, not just aggregate monthly usage.

The sentence to take to your CFO:

"We do not compare models on price per token. We compare them on cost per completed task, because a model that is 25% cheaper per token can still produce a 75% larger bill depending on how many steps it takes, how much it reasons, and how verbose its answers are." 

The Anthropic Lesson: Model Economics Are a Moving Target

Anthropic’s pricing is a useful product lesson because it shows how quickly AI economics can change.

Image 3 - Screenshot Anthropic Model Pricing May 28th, 2026

A viral interpretation claimed that Anthropic was charging more for older deprecated APIs to push users toward newer models.

However, the older Opus models were not punished. Newer Opus generations became much cheaper to run.

At the time of writing, newer Opus models are listed at roughly $5 per million input tokens and $25 per million output tokens, while older Opus 4-era models remain around $15 per million input tokens and $75 per million output tokens.

That is not just a pricing footnote. 

A model choice that was rational six months ago can become irrational without anything breaking. The product still works. Users still get answers. The team may not notice any quality issue. But the margin quietly leaks because the system is running on an outdated price-performance curve.

This turns model migration into an economic capability, not only a technical chore.

Serious AI products need:

  • A model abstraction layer

  • Prompt versioning

  • Regression evaluations

  • Cost monitoring per task

  • A model review cadence every three to six months

  • A safe migration process when better price-performance appears
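Several items on this list fit in a few lines: prices live in one place behind an alias, and a periodic review picks the cheapest option that still passes your regression evaluations. A minimal sketch; the aliases, token counts, and pass/fail flags are hypothetical, and the prices are the Opus figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                # hypothetical internal alias, not a real API model ID
    input_price: float       # $ per million tokens
    output_price: float      # $ per million tokens
    passes_regression: bool  # result of your own regression evaluations


def task_cost(m: ModelOption, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * m.input_price + output_tokens * m.output_price) / 1_000_000


def review_model_choice(current: ModelOption, candidates: list[ModelOption],
                        input_tokens: int, output_tokens: int) -> ModelOption:
    """Return the cheapest option per task; candidates must pass regression evals."""
    eligible = [current] + [c for c in candidates if c.passes_regression]
    return min(eligible, key=lambda m: task_cost(m, input_tokens, output_tokens))


# Prices quoted above for older vs newer Opus generations; token counts hypothetical.
older_opus = ModelOption("opus-older", 15.0, 75.0, passes_regression=True)
newer_opus = ModelOption("opus-newer", 5.0, 25.0, passes_regression=True)

best = review_model_choice(older_opus, [newer_opus], input_tokens=10_000, output_tokens=1_000)
print(best.name, f"${task_cost(best, 10_000, 1_000):.4f} per task "
      f"vs ${task_cost(older_opus, 10_000, 1_000):.4f}")
```

Because the current model is always eligible, a candidate only wins when it is both cheaper per task and passes the regression suite.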

Product Questions to Ask Before Spending

  1. What exact task are we improving?

  2. What is the value of completing this task well?

  3. What is the cost of being wrong?

  4. How many tokens, calls, tools, and review steps does completion require?

  5. Does this workflow create a proprietary learning loop?

  6. When will we review the model choice again?

The Reusable Principle

Score the task before you spend on intelligence.

Final Question

If your AI product became ten times more successful next quarter, would your unit economics improve with scale, or would every new user quietly make the product more expensive to operate?

Get involved

Do you want to work together?

I am a Project & Product Manager with 15+ years of experience in both corporate and startup environments. As you can tell, I am deeply involved in the business of AI. If you want me to work on your project, get in touch.

That's the drop for this week.

If this saved you from one bad model procurement decision, forward it to your team. They probably need it more.

See you next week.

Wilson · The AI Rev Hub Team

P.S. New here? Check the Welcome Kit to catch up on the series from Part 1.
