Your AI Dashboards Are Lying to You
Imagine this. Your COO presents a board slide showing your new AI tools deliver an 87% productivity improvement. Task completion is up. Cost per task is down. The board is impressed.
But here is the question nobody asked: how much human time went into making that AI work?
The time your people spent explaining tasks to the AI. Fixing its mid-run mistakes. Reviewing and correcting its output. None of this shows up on your dashboards.
Standard productivity metrics track what the AI does. They miss what the humans spend to supervise it. The result: organizations systematically overestimate AI productivity.
New research published two days ago by Stan Loosmore introduces the Leverage Ratio — the first formal framework for measuring true human-AI productivity.
Leverage Ratio = Human work displaced by AI ÷ (Specification time + Interrupt resolution time + Review time)
Any ratio above 1 means AI saves more time than it costs to supervise. Below 1 means the hidden human costs exceed the productivity gains.
The framework goes deeper than the simple ratio. It decomposes human time into three channels with different cost structures, distinguishes per-task leverage from windowed leverage (which compounds across recurring tasks), and reveals an uncomfortable truth: even the best AI cannot eliminate the human time required for truly novel work.
Executive Summary
The formula: L = Work_displaced / (T_spec + T_int + T_rev)
Ratio > 1 = positive ROI | Ratio < 1 = AI costs more human time than it saves
The three hidden costs:
- Specification time (T_spec) — Explaining the task, providing examples, setting constraints. The largest and most commonly overlooked cost.
- Interrupt resolution time (T_int) — Fixing mid-run errors, providing missing context, re-routing off-course agents.
- Review time (T_rev) — Verifying output correctness, completeness, policy alignment.
Key strategic insight: Per-task leverage is bounded by task novelty. Windowed leverage compounds across recurring tasks as upfront investment gets amortized. The task novelty floor always preserves a human role.
Paper at a Glance
| Field | Detail |
|---|---|
| Title | Leverage Laws: A Per-Task Framework for Human-Agent Collaboration |
| Author | Stan Loosmore |
| Published | April 27, 2026 (2 days ago) |
| Venue | arXiv (Computer Science) |
| Relevance Score | 92/100 (VERY HIGH) |
| Focus Domain | Human-AI collaboration productivity measurement |
| Headline Contribution | Leverage Ratio with three-channel decomposition |
| Paper URL | arxiv.org/abs/2604.25040 |
Why Standard Dashboards Miss the Real Story
A financial analyst uses an AI agent to generate quarterly reports. The dashboard shows the agent produces each report in 12 minutes. Manual was 90 minutes. That looks like an 87% productivity gain.
What the dashboard doesn’t capture: the analyst spends 20 minutes specifying parameters and providing sample formatting. Another 10 minutes fixing mid-run errors — pulled the wrong data source, misinterpreted a chart instruction. And 15 minutes reviewing and correcting the output.
Total human time: 45 minutes. Total displaced manual work: 90 minutes. Actual leverage ratio: 90 / 45 = 2x. Still positive ROI, but a 2x ratio means human time was cut in half: a 50% saving, far from the advertised 87%.
A worse scenario: a junior associate drafts a routine contract with AI. The dashboard shows 8 minutes vs 60 minutes manual. But the associate spends 30 minutes writing a detailed prompt, 10 minutes re-routing the agent after a missed compliance check, and 25 minutes reviewing for jurisdictional accuracy. Total human time: 65 minutes. Leverage ratio: 60 / 65 = 0.92x. Negative ROI, yet the dashboard records it as an 87% improvement.
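If you want to run these numbers yourself, here is a minimal Python sketch of the ratio as defined above. The function name and signature are ours for illustration; the formula and the scenario minutes come from the paper's definition and the two examples.

```python
# Leverage Ratio per the paper's formula:
# L = work_displaced / (T_spec + T_int + T_rev)

def leverage_ratio(work_displaced_min: float,
                   t_spec_min: float,
                   t_int_min: float,
                   t_rev_min: float) -> float:
    """Displaced work divided by total human supervision time."""
    return work_displaced_min / (t_spec_min + t_int_min + t_rev_min)

# Financial analyst: 90 displaced minutes vs 20 + 10 + 15 supervision minutes.
print(leverage_ratio(90, 20, 10, 15))            # 2.0 -> positive ROI

# Junior associate: 60 displaced minutes vs 30 + 10 + 25 supervision minutes.
print(round(leverage_ratio(60, 30, 10, 25), 2))  # 0.92 -> costs exceed savings
```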
The Three Hidden Channels
Specification Time (T_spec)
The cost of translating human intent into AI-understandable instructions. Detailed prompts, examples, boundary conditions, policy constraints, fallback instructions.
Optimization insight: Agent memory matters as much as capability. An agent that retains context across tasks reduces re-specification time. An agent needing the same instructions repeated inflates T_spec without increasing output.
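A rough sketch of what that means for cumulative cost, assuming a simple retention model. The retention factor and task counts here are our assumptions for illustration, not figures from the paper.

```python
# Hypothetical model of how agent memory affects cumulative
# specification time across recurring tasks.

def cumulative_spec_time(n_tasks: int,
                         first_spec_min: float,
                         retention: float) -> float:
    """Total specification minutes over n recurring tasks.

    retention = 0.0 means the agent forgets everything (full re-spec
    on every task); retention = 0.9 means each later task needs only
    10% of the original specification effort.
    """
    repeat_cost = first_spec_min * (1.0 - retention)
    return first_spec_min + repeat_cost * (n_tasks - 1)

print(cumulative_spec_time(50, 20, 0.0))  # 1000 min: no memory
print(cumulative_spec_time(50, 20, 0.9))  # 118 min: strong retention
```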
Interrupt Resolution Time (T_int)
The cost of handling deviations. The agent goes off course, misunderstands an instruction, or hits an edge case its training didn’t cover.
Optimization insight: Better capability reduces interrupt frequency. But the relationship is nonlinear — remaining interrupts become harder as easy problems get solved first.
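A toy model makes the nonlinearity visible, assuming interrupt frequency falls linearly with capability while each surviving interrupt takes longer to untangle. Both curves and all constants are our assumptions, not the paper's.

```python
# As capability rises, interrupts get rarer, but the survivors are
# the hard ones, so mean resolution time per interrupt grows.

def interrupt_time_per_100_tasks(capability: float) -> float:
    """Total interrupt-resolution minutes per 100 tasks, capability in [0, 1)."""
    interrupts = 30 * (1.0 - capability)             # frequency falls linearly
    minutes_each = 5.0 / (1.0 - capability) ** 0.5   # difficulty rises
    return interrupts * minutes_each

for cap in (0.0, 0.5, 0.9):
    print(cap, round(interrupt_time_per_100_tasks(cap), 1))
# 0.0 -> 150.0, 0.5 -> 106.1, 0.9 -> 47.4: total T_int falls, but far
# more slowly than the raw interrupt count (30 -> 15 -> 3).
```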
Review Time (T_rev)
The cost of verifying output quality. Even correct execution must be validated before use.
Optimization insight: Trust calibration. As an agent demonstrates reliability on specific task types, review time decreases. But for novel or high-stakes tasks, review must remain high. Trust is task-specific, not agent-general.
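A minimal sketch of task-specific trust calibration, assuming review time decays with consecutive verified runs but never drops below a per-task-type floor. The decay rule, floor values, and task-type names are our assumptions for illustration.

```python
# Trust is tracked per task type, never granted agent-wide.

REVIEW_FLOOR_MIN = {"routine_report": 2.0, "novel_analysis": 15.0}

def review_time(task_type: str,
                initial_review_min: float,
                clean_runs: int) -> float:
    """Review minutes after a streak of consecutive verified-correct runs."""
    floor = REVIEW_FLOOR_MIN[task_type]
    decayed = initial_review_min * 0.8 ** clean_runs  # trust builds gradually
    return max(floor, decayed)

print(review_time("routine_report", 15.0, 10))  # 2.0: trust earned
print(review_time("novel_analysis", 15.0, 10))  # 15.0: floor keeps review high
```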
Per-Task vs. Windowed Leverage
This is the paper’s most important strategic insight.
Per-task leverage measures a single execution. It is bounded by task novelty — novel work always requires human specification and review.
Windowed leverage measures across recurring tasks. Upfront specification, agent configuration, and workflow design get amortized across the window.
Consider an AI deployment for customer support ticket triage: per-task leverage on the first ticket might be 0.5x (setup outweighs savings). By the 100th ticket, with refined templates and a calibrated agent, per-ticket leverage might be 8x. Windowed leverage captures both phases in one number.
This changes investment strategy. Low per-task leverage on a new deployment is not failure — it may be upfront investment amortized across hundreds of tasks.
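Here is a minimal sketch of windowed leverage under a simple cost model: a one-time setup investment plus a small per-ticket supervision cost. The minute values are our assumptions, chosen to match the 0.5x and 8x shape described above, not numbers from the paper.

```python
def windowed_leverage(n_tasks: int,
                      displaced_per_task_min: float,
                      setup_min: float,
                      supervision_per_task_min: float) -> float:
    """Total displaced work over total human time across the window."""
    displaced = displaced_per_task_min * n_tasks
    human = setup_min + supervision_per_task_min * n_tasks
    return displaced / human

# First ticket alone: 10 displaced minutes vs 18 setup + 1.25 supervision.
print(round(windowed_leverage(1, 10, 18, 1.25), 2))    # 0.52x
# Steady-state per-ticket leverage once setup is paid: 10 / 1.25 = 8x.
# Over a 100-ticket window the setup amortizes toward that ceiling.
print(round(windowed_leverage(100, 10, 18, 1.25), 2))  # 6.99x
```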
The Task Novelty Floor
The paper identifies a fundamental constraint: truly novel tasks always require human time regardless of AI capability.
A novel task cannot be fully specified in advance — you don’t know what the output should look like until you see it. A novel task generates unexpected problems. And a novel task requires judgment about correctness that cannot be delegated.
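One way to state the floor in the spirit of the paper's notation: if novelty imposes a minimum human time T_floor on a task (T_floor is our shorthand here, not the paper's), then T_spec + T_int + T_rev ≥ T_floor, and the ratio is capped no matter how capable the agent gets:

L ≤ Work_displaced / T_floor

No amount of added capability moves that bound; only reducing the task's novelty does.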
The strategic implication: Organizations should not aim to eliminate human involvement. They should understand where the novelty floor sits for different task types. High-novelty tasks need human-in-the-loop. Low-novelty tasks are candidates for automation.
This guides workforce strategy. Routine, well-specified roles face the highest automation pressure. Novel problem-solving roles face the lowest.
What Business Leaders Should Do Next
- Audit your current AI tooling — For the top 10 AI-augmented workflows, estimate T_spec, T_int, and T_rev. Compute actual leverage ratios.
- Identify quick wins — Tasks above 3x are scaling candidates. Below 1x need redesign.
- Track the three channels — Add specification, interrupt, and review time to dashboards.
- Model amortization curves — How many recurring executions until upfront investment pays back? (A sketch follows this list.)
- Classify tasks by novelty — Map roles to novelty levels. Guide reskilling toward high-judgment work.
- Invest in agent memory — Context retention amplifies leverage on recurring tasks.
- Balance the portfolio — Low per-task leverage today might be high windowed leverage tomorrow.
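For the amortization-curve item above, a back-of-envelope payback calculation can sit directly on a dashboard. A hedged sketch, with all parameter names and values our own assumptions:

```python
# How many recurring executions before cumulative time saved covers
# the upfront setup investment?

import math

def executions_to_payback(setup_min: float,
                          displaced_per_task_min: float,
                          supervision_per_task_min: float) -> int:
    """Smallest task count at which net saved minutes turn positive."""
    net_per_task = displaced_per_task_min - supervision_per_task_min
    if net_per_task <= 0:
        raise ValueError("Each task costs more than it saves; no payback.")
    return math.ceil(setup_min / net_per_task)

# Ticket triage from earlier: 18 setup minutes, 8.75 net minutes saved
# per ticket (10 displaced minus 1.25 supervision).
print(executions_to_payback(18, 10, 1.25))  # 3 tickets to break even
```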
Conclusion
Stop asking if AI is productive. Start asking what your leverage ratio is per task type.
The Leverage Ratio framework exposes hidden human costs standard dashboards miss, distinguishes investment-phase from genuinely unproductive deployments, and provides a clear framework for prioritizing AI investment. Organizations that implement it will make better decisions. Organizations that don’t will systematically overestimate their AI productivity.