Measuring AI assistant effectiveness: six honest KPIs

The vanity metrics trap

Three indicators often put forward that don’t measure what they claim:

Number of queries: a user who asks the same question ten times because the answer is bad inflates the metric.
Number of users created: opened accounts say nothing about real usage.
NPS on AI: too generic, too influenced by the novelty effect.

Here are the six KPIs that measure real value.

1. Active adoption: weekly users / users created

Definition. Percent of created users who ask at least three questions in the past seven days.

Healthy threshold. > 60 percent at three months, > 70 percent at six months.

Diagnosis. Below 50 percent at three months, usage is shallow: revisit onboarding, check whether the assistant is actually faster than the existing tools.

2. Measured accuracy: percent of correct answers on a test set

Definition. On a stable test set of 30 to 50 questions, the percent of answers judged correct by a business reviewer.

Healthy threshold. > 90 percent in stable production.

Diagnosis. Below 85 percent, a source documentation or configuration problem. See Good documentation makes a good assistant.

3. Documentation coverage: percent of “I don’t know” answers

Definition. Percent of queries where the assistant says it can’t answer.

Healthy threshold. Between 5 percent and 15 percent in steady state. Below that, the assistant is likely making things up (hallucinations). Above that, the corpus is too thin for real usage.

Important noteA “don’t know” rate that’s too low is not good news. It often means the assistant is answering incorrectly rather than admitting a gap. Always check accuracy in parallel.

4. Time saved: valued in hours and euros

Definition. Hours saved per user per week on the covered tasks (search, writing, summary), valued at fully loaded hourly cost.

Measurement method. Quarterly survey on a panel of 10 to 15 users: “for the questions you asked the assistant this week, how long would it have taken you without it?”

Healthy threshold. 3 to 8 hours per user per week for moderate use cases. More for intensive cases (legal, support, doctrine).

5. Offload: percent of requests handled before reaching a human

Definition. For assistants open to the public or support: percent of requests fully handled by the assistant, with no escalation to an agent.

Healthy threshold. Between 50 percent and 75 percent depending on scope. Broader scope = lower offload.

6. User satisfaction: structured feedback, not generic NPS

Definition. Three short questions, asked at the end of a conversation on 10 percent of traffic:

Was this answer helpful? (yes/no)
Did you save time? (1 to 5)
Would you recommend this assistant? (1 to 10)

Healthy threshold. > 80 percent helpful answers, > 4/5 on time saved, > 8/10 on recommendation.

To turn these indicators into a ROI calculation, see Calculating and maximizing ROI.

Build your dashboard

Twenty minutes to identify the six KPIs that fit your case, their measurement frequency, and their alert thresholds. We hand you a ready-to-use template.

Book a demo→

Six honest KPIs to measure an AI assistant

The vanity metrics trap

1. Active adoption: weekly users / users created

2. Measured accuracy: percent of correct answers on a test set

3. Documentation coverage: percent of “I don’t know” answers

4. Time saved: valued in hours and euros

5. Offload: percent of requests handled before reaching a human

6. User satisfaction: structured feedback, not generic NPS

Read next.

Calculating and maximizing AI assistant ROI

How to make your AI assistant project succeed

Prompting like a pro