You're testing your API and get a mean response time of ~230 ms.
🎉 Good news — that’s (almost) acceptable!
But then you take a closer look:
- 90% of requests return in under 50 ms
- 9% take 1 second
- And 1% explode to a whopping 10 seconds!
💥 Your API is actually a disaster for some of your users.
The average lied to you. Here’s why — and how to avoid the trap.
The average hides the worst cases (where it really hurts)
Let’s take a simple example:
| Requests | Response Time (ms) |
|---|---|
| 190 | < 50 |
| 10 | > 5,000 |
Taking 50 ms for the fast requests and 5,000 ms for the slow ones, the average comes out around 300 ms. But does that reflect reality?
- For 19 out of 20 requests, response time is fine.
- But for 1 out of 20 (that’s 5%), it is a disaster.
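Here's a minimal sketch that reproduces those numbers, assuming 50 ms for the fast requests and 5,000 ms for the slow ones (the same representative values as above):

```python
import statistics

# Hypothetical sample matching the table: 190 fast requests, 10 slow ones.
latencies_ms = [50] * 190 + [5000] * 10

print(statistics.mean(latencies_ms))    # 297.5 ms: "almost acceptable"
print(statistics.median(latencies_ms))  # 50 ms: what most users actually see
print(max(latencies_ms))                # 5000 ms: what 1 in 20 requests gets
```

The mean blends both worlds into a number that nobody actually experiences.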
One request ≠ one user: Averages don’t reflect real user experience
Users don’t make just one request. If they make several and even one of them is slow, their experience suffers.
In many APIs, a business transaction often involves several calls:
- Authentication
- Fetching a product
- Creating an item
- Validating an action
(And in the case of paginated data, it’s common for users to load multiple pages.)
🚨 Services relying on your API often have timeout rules. If your response is too slow, they’ll drop the connection. And if they don’t implement retries, you risk breaking the whole transaction, even if it’s only a small fraction of requests causing the issue.
💡 Worst-case latencies can affect far more transactions than the average suggests.
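To get a feel for how a small slow fraction can hit a large share of transactions, here's a minimal sketch. It assumes 5% of individual requests are slow and that slow requests are spread independently across calls; both are illustrative assumptions, not measurements:

```python
# Probability that a transaction spanning n requests hits at least one
# slow request, if a fraction p of all requests is slow (independence assumed):
# P(at least one slow) = 1 - (1 - p) ** n
p_slow = 0.05  # illustrative 5% slow-request rate

for n_calls in (1, 4, 10, 20):
    p_hit = 1 - (1 - p_slow) ** n_calls
    print(f"{n_calls:>2} calls -> {p_hit:.0%} of transactions hit the slow tail")
```

With 4 calls per transaction, roughly 19% of transactions are affected; with 20 calls (think pagination), it climbs past 60%.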
🎯 Real-world example:
Imagine an e-commerce site.
- 95% of users browse smoothly (P95 = 200 ms).
- But the remaining 5% experience delays of 5 to 10 seconds!
- Guess what? Those are often the users trying to complete their purchase…
And considering that browsing typically involves multiple requests per page, chances are high that some users will hit those slow ones (your 5%).
If just one request is slow, it degrades the entire user experience. Users don’t feel an “average response time”; they feel every request individually.
💡 Tail latencies (P95/P99) have a direct impact on your business (conversion rates, customer frustration, cart abandonment…)
So what should you look at instead? (Percentiles!)
Forget the average — use percentiles:
- P50 (median): Half the requests are faster than this value.
- P90: 90% of requests are below this threshold — the slowest 10% begin here.
- P95: 5% of requests are slower — the ones that start to hurt the user experience.
- P99: The slowest 1%, the ones that truly enrage users (and even this may not be enough; see Gil Tene’s talk “How NOT to Measure Latency” on YouTube)
💡 If your P95 spikes to 5 seconds, you have a real problem, no matter how good your average looks.
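As a minimal sketch, here's how you might compute these percentiles with NumPy on a hypothetical set of measured response times (the sample below is made up for illustration):

```python
import numpy as np

# Hypothetical response times in ms, e.g. exported from a load-testing
# tool or access logs: mostly fast, with a one-second and a ten-second tail.
latencies_ms = np.array([
    45, 48, 50, 52, 47, 1100, 49, 51, 46, 9800,
    44, 53, 50, 48, 1050, 47, 52, 49, 46, 51,
])

print(f"mean: {latencies_ms.mean():.0f} ms")
for p in (50, 90, 95, 99):
    print(f"P{p}: {np.percentile(latencies_ms, p):.0f} ms")
```

On this sample the mean lands around 640 ms, while P50 stays around 50 ms and P99 climbs into the seconds: same measurements, two very different stories.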
Averages fail when there’s instability
Another problem with averages: they don’t reflect variance.
Example:
| Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6 | Test 7 |
|---|---|---|---|---|---|---|
| 100 ms | 200 ms | 300 ms | 50 ms | 5,000 ms | 500 ms | 45 ms |
Average = 885 ms ❌ (Already really bad)
But what does P95 say? 👉 5,000 ms… a nightmare for some users!
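A minimal sketch of that calculation (with only seven samples, a high percentile is essentially the slowest measurement, so the sketch picks the next measured value rather than interpolating):

```python
import numpy as np

tests_ms = [100, 200, 300, 50, 5000, 500, 45]

print(f"mean: {np.mean(tests_ms):.0f} ms")  # 885 ms
# method="higher" (NumPy >= 1.22) returns an actual measurement
# instead of interpolating between the two slowest values.
print(f"P95 : {np.percentile(tests_ms, 95, method='higher'):.0f} ms")  # 5000 ms
```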
What causes this variance?
- Exclusive access to critical resources (e.g., DB locks)
- Unstable third-party dependencies (external APIs, rate limits, timeouts)
- Batch jobs running during peak times, hogging system resources
💡 Look at the full response time distribution, not just one number! Variance is an early warning sign — a signal worth investigating.
In summary: Averages are meaningless!
- They hide the worst cases — where the real problems are
- They don’t reflect the actual user experience
- They ignore performance variability
How to analyze performance properly:
- Always (at least) use percentiles (P90, P95, P99)
- Visualize the response time distribution (histograms, box plots; a rough sketch follows this list)
- Monitor latency under load (scalability matters)
- Correlate metrics with real user experience (perceived load times)
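You don't need a full observability stack to get a first look at the distribution; a rough text histogram over your measured latencies already reveals a tail. A minimal sketch (the sample data and bucket boundaries are arbitrary illustrations):

```python
import numpy as np

# Hypothetical measured latencies in ms
latencies_ms = np.array([
    45, 48, 50, 52, 47, 1100, 49, 51, 46, 9800,
    44, 53, 50, 48, 1050, 47, 52, 49, 46, 51,
])

# Arbitrary buckets: fast, acceptable, slow, very slow
edges = [0, 100, 500, 2000, np.inf]
labels = ["< 100 ms", "100-500 ms", "500 ms - 2 s", "> 2 s"]

counts, _ = np.histogram(latencies_ms, bins=edges)
for label, count in zip(labels, counts):
    print(f"{label:>14} | {'#' * int(count)} ({count})")
```

Even this crude view makes the slow bucket impossible to miss, which is exactly what the average was hiding.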