North-star Metric

A north-star metric is the single highest-priority indicator that the entire product organization aligns on. It answers the question “if the team can only optimize one number, which one should it be?” rather than “which metric is most important overall?”.

For agent products, the choice of north-star depends heavily on the business model. The same agent application may call for opposite north-stars under a subscription model and under a usage-based model.

Four typical scenarios

Scenario 1: Subscription SaaS (per-seat)

Revenue scales linearly with seats; retention determines LTV.

Recommended north-star: Weekly Active Task Users (WATU)

Definition: “users who completed at least 1 successful task this week”. This is stricter than MAU: it counts only users who actually use the agent for work and filters out zombie accounts that were opened but never used.

Why not MAU: in agent products, “opened the app but did not complete a task” and “completed a task” are qualitatively different forms of activity. Only the latter contributes to LTV.
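The WATU definition above can be sketched in a few lines. This is a minimal illustration, assuming task events are available as dictionaries with hypothetical `user_id`, `day`, and `status` fields; the field names and the `"success"` status value are assumptions, not part of any specific logging schema.

```python
from datetime import date, timedelta

def weekly_active_task_users(task_events, week_start):
    """Count distinct users with at least one successful task
    in the 7-day window starting at week_start."""
    week_end = week_start + timedelta(days=7)
    return len({
        e["user_id"]
        for e in task_events
        if e["status"] == "success" and week_start <= e["day"] < week_end
    })

# Hypothetical event log: u1 succeeds twice, u2 only fails, u3 succeeds the following week.
events = [
    {"user_id": "u1", "day": date(2024, 3, 4), "status": "success"},
    {"user_id": "u1", "day": date(2024, 3, 5), "status": "success"},
    {"user_id": "u2", "day": date(2024, 3, 6), "status": "failed"},
    {"user_id": "u3", "day": date(2024, 3, 12), "status": "success"},
]

watu = weekly_active_task_users(events, date(2024, 3, 4))
print(watu)  # 1: u1 counts once, u2 has no successful task, u3 is outside the window
```

Counting distinct users rather than events is what separates this from a raw activity count: a user who opens the app or only produces failed tasks never enters the set.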

Scenario 2: Usage-based pricing

Revenue scales linearly with task volume; every task carries gross margin.

Recommended north-star: Monthly Successful Tasks

Directly tied to revenue. Note the word “successful”: failed tasks still consume tokens, but whether they can be billed depends on contract terms (most usage-based contracts do not bill for failed tasks).

Pitfall: optimizing for raw task volume alone may push the team to lower human-in-the-loop (HITL) review thresholds, letting more tasks “auto-pass”, which drives up the failure rate and customer complaints. Pair the north-star with quality-layer guardrails.
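One way to keep the volume north-star and its quality guardrail side by side is to compute both from the same task list. The sketch below assumes each task record carries a hypothetical `status` field, and the 10% failure-rate ceiling is an illustrative threshold, not a recommended value.

```python
MAX_FAILURE_RATE = 0.10  # hypothetical guardrail threshold

def monthly_successful_tasks(tasks):
    """Return (successful task count, failure rate) for one month of tasks."""
    total = len(tasks)
    successes = sum(1 for t in tasks if t["status"] == "success")
    failure_rate = (total - successes) / total if total else 0.0
    return successes, failure_rate

# Hypothetical month: 8 successful tasks and 2 failed ones.
tasks = [{"status": "success"}] * 8 + [{"status": "failed"}] * 2

successes, failure_rate = monthly_successful_tasks(tasks)
guardrail_breached = failure_rate > MAX_FAILURE_RATE

print(successes, failure_rate, guardrail_breached)  # 8 0.2 True
```

Reporting the pair together means a rise in successful tasks that comes with a breached guardrail is visible as such, instead of reading as a clean win.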

Scenario 3: Value-based / ROI-anchored pricing

The customer pays based on cost saved, for instance 30% of the monetary value of the labor hours the agent displaces.

Recommended north-star: Cumulative Hours Saved

Closest fit to the business logic: the team’s optimization target is directly creating more customer value.

Practical difficulty: calibrating hours saved requires customer cooperation, because the same task may take different amounts of time in different organizations. It is typically calibrated against an industry baseline or a customer-specific initial measurement.
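The calibration choice can be made explicit in code: use the customer-specific measurement where one exists and fall back to the industry baseline otherwise. The sketch below is illustrative only. The task types, baseline minutes, and hourly rate are invented, and it assumes each completed task displaces the full baseline time (no residual human review), which a real contract would need to refine.

```python
# Hypothetical industry baselines: minutes a human needs per task.
INDUSTRY_BASELINE_MIN = {"invoice_review": 15.0, "ticket_triage": 6.0}

def hours_saved(task_counts, customer_baseline_min=None):
    """Sum displaced human time, preferring customer-measured baselines."""
    customer_baseline_min = customer_baseline_min or {}
    minutes = 0.0
    for task_type, count in task_counts.items():
        per_task = customer_baseline_min.get(
            task_type, INDUSTRY_BASELINE_MIN[task_type]
        )
        minutes += count * per_task
    return minutes / 60.0

counts = {"invoice_review": 120, "ticket_triage": 200}

# This customer measured invoice review at 12 minutes; triage uses the industry value.
saved = hours_saved(counts, customer_baseline_min={"invoice_review": 12.0})
print(saved)  # 44.0 hours: (120 * 12 + 200 * 6) / 60

# Fee under the 30% example, assuming a hypothetical rate of 50 per labor hour.
fee = 0.30 * saved * 50.0
print(round(fee, 2))  # approximately 660
```

Keeping the baselines as data rather than constants inside the formula makes it easy to show the customer exactly which numbers the hours-saved figure rests on.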

Scenario 4: Internal productivity tool (not sold)

Agent deployed inside a company to boost employee efficiency.

Recommended north-star: Tasks Completed per Employee

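Serves as a proxy for agent penetration, i.e. how far employees have integrated the agent into their workflow. Because it is an average, a few heavy users can inflate it, so it is worth reading alongside the share of employees who completed at least one task.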

Not recommended: hours saved. Internal estimates are typically inflated and lack external calibration data.
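For the recommended metric, a minimal sketch follows. It assumes completed tasks are available as a list of employee identifiers (one entry per completed task) and that headcount is known; both the input shape and the names are hypothetical. Since a per-employee average can be dominated by a handful of heavy users, the sketch also reports the share of employees with at least one completed task.

```python
from collections import Counter

def tasks_per_employee(completed_task_owners, headcount):
    """Return (average completed tasks per employee, share of employees
    with at least one completed task)."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    per_user = Counter(completed_task_owners)
    average = len(completed_task_owners) / headcount
    adoption_share = len(per_user) / headcount
    return average, adoption_share

# Hypothetical team of 10: one heavy user, one light user, eight non-users.
owners = ["ana"] * 18 + ["ben"] * 2

average, adoption_share = tasks_per_employee(owners, headcount=10)
print(average, adoption_share)  # 2.0 0.2
```

In this example the average of 2.0 looks healthy while only 20% of employees have used the agent at all, which is why the two numbers are returned together.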

Common pitfalls

Selecting an unoptimizable metric: e.g., choosing customer NPS as the north-star, even though NPS is slow to feed back into product decisions
Selecting a gameable metric: e.g., “task initiation count” can be gamed by UI design that prompts more initiations while ignoring completion quality
Lacking guardrails: focusing only on the north-star invites single-dimensional gaming. A volume north-star needs completion-rate guardrails; a retention north-star needs task-quality guardrails

Relationship to other metrics

The north-star is not the only metric, just the highest-priority one. A complete metric dashboard also needs:

Guardrail metrics: prevent the north-star from being achieved by sacrificing quality
Diagnostic metrics: decompose the north-star into its driving factors to locate the causes of change
Input metrics: levers that product and operations teams can act on directly to influence the north-star
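The four metric roles can be written down as a small data structure so that a dashboard definition states them explicitly. This is only a sketch: the class names, metric names, and the 0.90 completion-rate floor are hypothetical, chosen to mirror the volume north-star with a completion-rate guardrail mentioned earlier.

```python
from dataclasses import dataclass, field

@dataclass
class Guardrail:
    metric: str
    minimum: float  # readings below this value count as a breach

@dataclass
class Dashboard:
    north_star: str
    guardrails: list = field(default_factory=list)
    diagnostics: list = field(default_factory=list)
    inputs: list = field(default_factory=list)

    def breached(self, readings):
        """Return the names of guardrail metrics whose reading is below the floor."""
        return [g.metric for g in self.guardrails if readings[g.metric] < g.minimum]

dashboard = Dashboard(
    north_star="monthly_successful_tasks",
    guardrails=[Guardrail("completion_rate", 0.90)],
    diagnostics=["tasks_started", "completion_rate_by_task_type"],
    inputs=["onboarding_template_count"],
)

breaches = dashboard.breached({"completion_rate": 0.86})
print(breaches)  # ['completion_rate']
```

Declaring the roles in one place makes it harder to report a north-star movement without also checking whether any guardrail was breached in the same period.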

Cross-section connections

Specific formulas and collection mechanisms for the selected north-star: operations/dashboards
Correspondence between north-star and pricing model: pricing