You’ve set up a load test. Everything runs smoothly in PREPROD. But once you hit PROD—chaos.
Why?
Because your test fires off requests like a robot, not like real users.
The result?
You think your API handles the load… until it gets hit with an unexpected pattern (a spike, a sustained load) and crashes.
Let’s look at how to avoid this by simulating realistic traffic!
Stop Doing “Straight Line” Tests
The classic test:
- You send exactly 1000 requests per second
- Each request hits at the perfect time interval
- Everything is neat and predictable
But real life isn’t like that.
Your users don’t behave in deterministic, linear ways.
They come in waves, click like crazy, refresh pages… It’s messy. It’s organic.
💡 Add randomness to your traffic. Simulate irregular patterns like pauses, repeated actions, impatient users, or slow ones.
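As an illustration, here’s a minimal k6 (JavaScript) sketch of that kind of noisy traffic. The endpoint, virtual-user count, refresh probability, and think-time range are all placeholder assumptions to adapt to your own users:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// 50 virtual users for 5 minutes; arbitrary numbers for this sketch.
export const options = { vus: 50, duration: '5m' };

export default function () {
  // Hypothetical endpoint, replace with your own API.
  http.get('https://api.example.com/products');

  // Roughly 20% of "users" impatiently refresh right away.
  if (Math.random() < 0.2) {
    http.get('https://api.example.com/products');
  }

  // Random think time between 1 and 8 seconds instead of a fixed interval.
  sleep(1 + Math.random() * 7);
}
```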
Plan for Traffic Spikes
You tested under stable traffic—great.
But did you test what happens:
- Monday at 9am when everyone logs in at once?
- Black Friday when your servers are swamped with orders?
- A viral spike (that marketing campaign you launched—BOOM 💥 instant success)?
- Talk with your teams about seasonal load patterns. Personally, I rely on monitoring data (when available) or parse logs directly when possible. In the tire retail business, there are spikes during seasonal tire changes—especially at lunchtime.
- Test with gradual ramp-ups, sudden spikes, and sustained high loads (watch your bills!). Each pattern exposes different behaviors—more on that in a future article. A sketch of these patterns follows below.
- The app’s behavior after the stress is just as important: monitor your system and its capacity to recover properly.
💡 Always account for uneven traffic distribution and observe recovery behavior.
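As a rough sketch, this is what those patterns could look like with k6 stages. The durations, VU targets, and endpoint are illustrative assumptions, not recommendations:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 100 },   // gradual ramp-up (Monday, 9am)
    { duration: '30s', target: 1000 }, // sudden spike (the viral campaign)
    { duration: '10m', target: 1000 }, // sustained high load (Black Friday)
    { duration: '2m', target: 10 },    // traffic dies down...
    { duration: '5m', target: 10 },    // ...keep a trickle running to watch recovery
  ],
};

export default function () {
  http.get('https://api.example.com/orders'); // placeholder endpoint
  sleep(1);
}
```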
Simulate User Scenarios, Not Just Requests
A basic test sends isolated requests.
But a real user:
- 🔑 Logs in
- 🔍 Searches for a product
- 🛒 Adds it to cart
- 💳 Places the order
And that’s just one possible flow. There are other behaviors too: comparing products, opening multiple tabs, or hunting for a specific item (a search that drags on forever because the search bar doesn’t understand whatever I typed…)
- Write full user journeys, not just a firehose of requests. Tools like k6, Gatling, or Artillery can simulate actual user flows. Beyond “sending a lot of requests”, you can choreograph API calls to mimic user behavior (see the sketch after this list).
- Base your scenarios on real user behavior:
- Use tools such as Real User Monitoring, heatmaps, or even simply watch real users interact with your product.
- A quick call or video chat with API integrators can save hours—you’ll discover usage patterns you never expected.
💡 Leverage logs and RUM tools (Real User Monitoring) to identify actual usage and traffic volume.
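Here’s a hedged k6 sketch of that journey. The base URL, routes, payloads, and the token field returned by the login call are hypothetical placeholders standing in for your real API:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE = 'https://api.example.com'; // hypothetical base URL

export default function () {
  // 🔑 Log in (the /login route and token field are assumptions)
  const login = http.post(
    `${BASE}/login`,
    JSON.stringify({ user: 'demo', password: 'demo' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(login, { 'logged in': (r) => r.status === 200 });
  const auth = {
    headers: {
      Authorization: `Bearer ${login.json('token')}`,
      'Content-Type': 'application/json',
    },
  };
  sleep(2 + Math.random() * 3); // the user reads the page

  // 🔍 Search for a product
  http.get(`${BASE}/products?q=winter+tires`, auth);
  sleep(1 + Math.random() * 4);

  // 🛒 Add it to cart
  http.post(`${BASE}/cart`, JSON.stringify({ productId: 42, quantity: 1 }), auth);
  sleep(1 + Math.random() * 2);

  // 💳 Place the order
  const order = http.post(`${BASE}/orders`, null, auth);
  check(order, { 'order accepted': (r) => r.status === 201 });
}
```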
Hidden Bottlenecks and Test Environment Gotchas
Your test sends 10,000 requests, all green… but still, production lags.
Why?
🚫 Because (and multiple may apply):
- You missed a third-party rate limit
- Your cache is hiding the real bottlenecks: yes, sending the same data again and again doesn’t test actual processing
- Your database is heavier in production, with far more records than your test DB
- A scheduled batch job hits your API server, and you didn’t factor that in
- Simulate diverse users and data (a sketch follows after this list):
  - Multiple test machines to vary IPs
  - Different user accounts
  - Varied datasets (personally, I like Faker.js for generating fake but realistic data)
  - Tests with and without cache
- Disable stubs/mocks for third-party APIs and test “close-to-prod” setups:
  - Be careful, pay-per-request APIs can cost you!
  - Still, you might be surprised by how many third-party SLAs are broken
- Don’t test with an empty DB!
  - Indexes behave differently with full tables
  - Same goes for all other components: message queues, file systems…
💡 As always: test environments should resemble real production as closely as possible—and beware of caches!
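Here’s a minimal k6 sketch of that kind of data variation, assuming a hypothetical users.json file of pre-generated accounts (Faker.js could produce it offline) and a cacheable search endpoint:

```javascript
import http from 'k6/http';
import { SharedArray } from 'k6/data';

// Hypothetical file of test accounts, pre-generated (with Faker.js, for example).
const users = new SharedArray('users', () => JSON.parse(open('./users.json')));

const searchTerms = ['winter tires', 'summer tires', 'snow chains', 'alloy rims'];

export default function () {
  // Rotate across different accounts instead of hammering the same one.
  const user = users[Math.floor(Math.random() * users.length)];

  // Vary the query so the cache can't absorb every request.
  const q = searchTerms[Math.floor(Math.random() * searchTerms.length)];

  http.get(`https://api.example.com/search?q=${encodeURIComponent(q)}`, {
    headers: { Authorization: `Bearer ${user.token}` }, // token field is an assumption
  });
}
```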
Measure What Actually Matters
Okay, your API handles 1000 req/s. But… what does that really mean?
- Average response time: fine, but what about P95 or P99? Forget the mean (see “In performance, the average means nothing”). The sketch below shows how to assert on percentiles instead.
- HTTP errors: 200 OK is nice, but what about all those 429s (Too Many Requests) or the occasional 500s?
- Infrastructure impact: CPU, RAM, DB, filesystem… Are they holding up or melting down? I advise correlating your test data with infrastructure metrics.
Key points to watch for:
- Don’t trust metric aggregation blindly: averages can be meaningless. Sometimes min/max tells a better story.
- Track key metrics (latency, error rate, request durations, CPU/memory usage & saturation, IO pressure), don’t guesstimate!
- Don’t forget post-load recovery metrics: Can your system bounce back once the traffic slows?
💡 Three useful concepts to study: the Four Golden Signals, RED, and USE
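For instance, k6 can fail a run when percentiles or error rates drift, instead of relying on an average. The limits below are arbitrary examples, not targets to copy:

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    // p95 under 500 ms and p99 under 1.5 s; the average is never checked.
    http_req_duration: ['p(95)<500', 'p(99)<1500'],
    // Fail the run if more than 1% of requests error out (429s and 500s count here).
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  http.get('https://api.example.com/health'); // placeholder endpoint
}
```

With thresholds in place, a CI pipeline can fail the load test automatically instead of someone eyeballing a dashboard.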
TL;DR – Realistic Load Testing for Your API
“Smooth” load tests (linear traffic, identical requests, stable data) give you a false sense of security. In production, traffic is chaotic, irregular, and unpredictable.
✅ To simulate realistic traffic:
- Add noise: pauses, spikes, irregular behaviors
- Reproduce real usage: full user scenarios (login, search, checkout…)
- Use realistic and expected volumes: based on past data and forecasted usage
- Test edge cases: Monday 9am spike, Black Friday, viral campaign
- Vary the data: cache hit/miss, different users, varied IPs, a “real” DB
- Watch your metrics: HTTP errors, CPU/DB saturation, recovery behavior
- Correlate with real-world data: logs, APM, user needs
A solid load test reflects real-world behavior. Let’s ditch the “straight-line” tests and think like our users: unpredictable, impatient, and always surprising us.
And remember: a test is never “done.” I adjust my scenarios regularly based on live data (thanks, APMs).
Don’t live in theory—because in theory, everything works just fine…