North-star Metric
Which north-star to pick for an agent product — recommendations and counter-examples across four typical business-model scenarios.
A north-star metric is the single highest-priority indicator that the entire product organization aligns on. The question it answers is not “which metric matters most overall?” but “if the team could optimize only one number, which would it be?”
For agent products, the choice of north-star depends heavily on the business model. The same agent application may call for opposite north-stars under a subscription model and under a usage-based model.
Four typical scenarios
Scenario 1: Subscription SaaS (per-seat)
Revenue scales linearly with seats; retention determines LTV.
Recommended north-star: Weekly Active Task Users (WATU)
Definition: users who completed at least one successful task during the week. This is stricter than MAU (monthly active users): it counts only users who actually use the agent for work and filters out zombie accounts that signed up but never used the product.
Why not MAU: in agent products, “opened the app but did not complete a task” and “completed a task” are qualitatively different forms of activity. Only the latter contributes to LTV.
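The WATU definition above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical event log of (user_id, task_date, succeeded) tuples and a week that runs for seven days from a chosen start date; the field names and sample data are not from any real system.

```python
from datetime import date, timedelta

# Hypothetical task-event records: (user_id, task_date, succeeded)
events = [
    ("u1", date(2024, 6, 3), True),
    ("u1", date(2024, 6, 4), True),    # same user, counted once
    ("u2", date(2024, 6, 5), False),   # active, but no successful task
    ("u3", date(2024, 6, 9), True),
    ("u4", date(2024, 6, 10), True),   # falls in the following week
]

def weekly_active_task_users(events, week_start):
    """Count distinct users with at least one successful task
    in the 7-day window starting at week_start."""
    week_end = week_start + timedelta(days=7)
    return len({
        user
        for user, day, succeeded in events
        if succeeded and week_start <= day < week_end
    })

watu = weekly_active_task_users(events, date(2024, 6, 3))
print(watu)  # 2 (u1 and u3)
```

A plain active-user count over the same window would also include u2, which is exactly the difference the definition is meant to expose.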
Scenario 2: Usage-based pricing
Revenue scales linearly with task volume; every task carries gross margin.
Recommended north-star: Monthly Successful Tasks
Directly tied to revenue. Note the word “successful”: failed tasks still consume tokens, but whether they can be billed depends on contract terms (most usage-based contracts do not bill for failed tasks).
Pitfall: optimizing for raw task volume alone may push the team to lower human-in-the-loop (HITL) review thresholds so that more tasks “auto-pass”, which drives up the failure rate and customer complaints. Pair the north-star with quality-layer guardrails.
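One way to keep volume and quality visible together is to report the failure rate alongside the monthly count. The sketch below assumes a hypothetical task log of (month, status) pairs with the status values "success" and "failed"; a real log would need its own mapping to these states.

```python
from collections import Counter

# Hypothetical task log: (month, status)
tasks = [
    ("2024-06", "success"), ("2024-06", "success"), ("2024-06", "failed"),
    ("2024-06", "success"), ("2024-07", "success"), ("2024-07", "failed"),
]

def monthly_summary(tasks, month):
    """Return the north-star (successful tasks) together with the failure rate."""
    counts = Counter(status for m, status in tasks if m == month)
    total = sum(counts.values())
    successful = counts["success"]
    failure_rate = (total - successful) / total if total else 0.0
    return {"successful_tasks": successful, "failure_rate": failure_rate}

june = monthly_summary(tasks, "2024-06")
print(june)  # {'successful_tasks': 3, 'failure_rate': 0.25}
```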
Scenario 3: Value-based / ROI-anchored pricing
The customer pays based on cost saved, for instance 30% of the labor-hour value that the agent displaces.
Recommended north-star: Cumulative Hours Saved
This is the closest fit to the business logic: the team’s optimization target is directly the creation of more customer value.
Practical difficulty: calibrating hours saved requires customer cooperation, because the same task may take different amounts of time in different organizations. It is typically calibrated against an industry baseline or an initial customer-specific measurement.
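The calibration step can be made concrete with a small calculation. This is a sketch under stated assumptions: the per-task baseline minutes, the task-type names, and the hourly labor cost are all hypothetical, and the calculation ignores any human time still spent reviewing agent output. The 30% share is the example figure from this scenario.

```python
# Hypothetical calibration: minutes a person needs per task type,
# taken from an industry baseline or an initial customer measurement.
BASELINE_MINUTES = {"invoice_review": 12.0, "ticket_triage": 4.0}

def hours_saved(completed, baseline=BASELINE_MINUTES):
    """completed maps task type -> number of successful agent tasks."""
    minutes = sum(baseline[task_type] * count for task_type, count in completed.items())
    return minutes / 60.0

def value_based_fee(hours, hourly_labor_cost, share=0.30):
    """Fee as a share of the labor value displaced by the agent."""
    return hours * hourly_labor_cost * share

saved = hours_saved({"invoice_review": 100, "ticket_triage": 300})
fee = value_based_fee(saved, hourly_labor_cost=40.0)
print(saved, round(fee, 2))
```

With these numbers, 100 invoice reviews and 300 triages each account for 1,200 baseline minutes, so the total is 40 hours saved; the fee follows from whatever labor cost the customer agrees to.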
Scenario 4: Internal productivity tool (not sold)
Agent deployed inside a company to boost employee efficiency.
Recommended north-star: Tasks Completed per Employee
Measures agent penetration: how far employees have integrated the agent into their workflow.
Not recommended: hours saved (internally reported numbers are typically inflated, and there is no external data to calibrate them against).
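A minimal sketch of the per-employee calculation follows. The adoption share (employees with at least one completed task) is an added assumption, included because an average alone can be dominated by a few heavy users; the names and counts are hypothetical.

```python
def tasks_per_employee(completed_by_employee, headcount):
    """Return (average completed tasks per employee, share of employees
    with at least one completed task). headcount covers all employees,
    including those absent from the log."""
    total = sum(completed_by_employee.values())
    active = sum(1 for count in completed_by_employee.values() if count > 0)
    return total / headcount, active / headcount

# Hypothetical monthly counts; two further employees never used the agent.
completed = {"alice": 12, "bob": 0, "carol": 8}
avg, adoption = tasks_per_employee(completed, headcount=5)
print(avg, adoption)  # 4.0 0.4
```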
Common pitfalls
- Selecting an unoptimizable metric: e.g., choosing customer NPS as the north-star, even though NPS is too slow to feed back into product decisions
- Selecting a gameable metric: e.g., “task initiation count” can be gamed by UI design that prompts more initiations while ignoring completion quality
- Lacking guardrails: focusing only on the north-star invites gaming along a single dimension. A volume north-star needs completion-rate guardrails; a retention north-star needs task-quality guardrails
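The guardrail pairing in the last item can be expressed as a simple check: a gain in the north-star only counts if no guardrail has fallen below its floor. The guardrail names, values, and floors below are hypothetical placeholders, not recommended thresholds.

```python
from dataclasses import dataclass

@dataclass
class Guardrail:
    name: str
    value: float    # current measurement
    minimum: float  # hypothetical floor agreed on by the team

def breached(guardrails):
    """Names of guardrails currently below their floor."""
    return [g.name for g in guardrails if g.value < g.minimum]

def north_star_gain_is_healthy(before, after, guardrails):
    """True only if the north-star rose and every guardrail held."""
    return after > before and not breached(guardrails)

rails = [
    Guardrail("completion_rate", value=0.91, minimum=0.95),
    Guardrail("task_quality_score", value=4.4, minimum=4.0),
]
print(breached(rails))                                # ['completion_rate']
print(north_star_gain_is_healthy(1000, 1200, rails))  # False
```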
Relationship to other metrics
The north-star is not the only metric, just the highest-priority one. A complete metric dashboard also needs:
- Guardrail metrics: prevent the north-star from being achieved by sacrificing quality
- Diagnostic metrics: decompose the north-star into its driving factors to locate the causes of change
- Input metrics: levers that product and operations teams can pull directly to influence the north-star
Cross-section connections
- Specific formulas and collection mechanisms for the selected north-star: operations/dashboards
- Correspondence between north-star and pricing model: pricing