Load Testing Kubernetes: Building a Framework (Part 1)
Slow response times, resource exhaustion, downtime, and long waits on workloads all lead to dissatisfied users. Load testing proves that the platform remains stable under heavy workloads and can recover quickly from failures.

Our Achievers multi-cluster Kubernetes platform needs to be flexible enough to handle both predicted and unexpected traffic. The SRE team designed a framework to help our engineers easily stress the system and monitor the results. Part one of this series will run through what the Achievers team did to prepare our load-testing framework and how we ensured we met our objectives. The next article will review how we unblocked bottlenecks and improved overall performance.
Before jumping into tools and processes, the team needed to come up with a list of requirements around success and failure. It was critical that users reviewing the results clearly understood whether performance had improved or degraded, and which data points to look at. The outcome of each test can be measured against the following criteria:
- Throughput: The number of requests the system can handle per second.
- Errors: The number of failed requests that occurred during the test period.
- Latency: The time taken to process a request, measured from when it was sent.
- Scalability: The platform’s ability to handle increasing loads without significant performance degradation.
Building out a Load-Testing framework in 4 steps:
To build the right framework, we worked through the following four steps:
1. Picking the right tool
Our monolith had been using JMeter for years; however, we wanted to take advantage of newer tools for our new framework. Automating a solution that anyone could contribute to would be a challenge with JMeter, because writing tests and reporting results the way we wanted was overly complicated.
After reviewing some of the newer players, we decided on K6. K6 was the new kid on the block, and it would be easy for anyone to write tests in JavaScript. It also had support for gRPC, which allowed us to test services directly rather than only through the gateway.
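To illustrate how approachable this is, here is a minimal sketch of a K6 test written in JavaScript; the endpoint and check values are hypothetical placeholders, not our actual test code:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Hypothetical gateway endpoint, used only for illustration.
const BASE_URL = 'https://pre-prod.example.com';

export default function () {
  // Each virtual user repeatedly calls an API endpoint through the gateway.
  const res = http.get(`${BASE_URL}/api/v1/recognitions`);

  // Basic assertions on status and response time.
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response under 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1); // simulated user think time between requests
}
```

Anyone comfortable with JavaScript can read and extend a script like this, which was a big part of the appeal over JMeter.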

2. Defining scope
The goal was to replicate a user experience on our platform using K6 with a focus on backend performance. The SRE team came up with a scope for the testing framework.
- Focus on backend components and performance.
- Omit third-party APIs from testing. They can be bottlenecks outside the scope of your test, and loading them could cause issues for those external services.
- Mind cost. CDN usage, extra logging, and cloud costs such as compute and egress are all more expensive when stressing the platform.
- Don’t test the CDN. As per the documentation on the K6 website, it may be better to test without cached resources.
- Don’t test media assets. End-to-end tests cover this case. Cost can also be a factor if you include large images or videos in load-testing.
During the testing period we could run the framework on demand, but the goal was to make it part of our CI/CD pipelines. Leveraging existing tooling, we could drop the K6 framework into pipelines and run load or stress tests.
Our main target for running load tests would be our pre-production environment. We agreed to run our tests from a Kubernetes cluster outside the network of our application clusters to simulate external users calling the platform. The testing cluster would have dedicated node pools for generating load, to avoid saturation from other workloads. By using Kubernetes, we also have the ability to distribute virtual users across multiple workers.
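As a rough sketch of how the load can be split across workers, K6 supports execution segments, which partition the total load so each instance runs only its share. The values below are illustrative; in practice the segment is usually supplied per instance (for example via the --execution-segment CLI flag or by distributed tooling) rather than hard-coded:

```javascript
export const options = {
  // This worker runs the first quarter of the total load;
  // the other three workers would each take a different quarter.
  executionSegment: '0:1/4',
  executionSegmentSequence: '0,1/4,2/4,3/4,1',
  vus: 400,        // 400 virtual users in total, so roughly 100 on this worker
  duration: '10m',
};
```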
3. Building a Baseline
A baseline is a starting point for performance values, used to show improvement or degradation over time. Baseline results give you an idea of what you can handle today and the goals you wish to achieve. Our initial trial-and-error tests helped us come up with a set of questions to frame the discussion when defining our baseline goals.
- How many virtual users can we handle today, and how many do we want to handle?
- Are there any known bottlenecks or limitations to document?
- What can be achieved today against the defined success criteria, and what do we want to achieve? (throughput/latency/errors/scaling)
- What are the threshold values?
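To help answer these questions, our trial-and-error runs amounted to gradually ramping virtual users and watching where the success criteria started to slip. A minimal sketch of that kind of baseline run, with made-up stage durations, targets, and endpoint, might look like this:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    baseline: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 100 },   // ramp up to a candidate baseline load
        { duration: '10m', target: 100 },  // hold steady to measure throughput, latency, errors
        { duration: '5m', target: 0 },     // ramp back down
      ],
    },
  },
};

export default function () {
  http.get('https://pre-prod.example.com/api/v1/recognitions'); // hypothetical endpoint
  sleep(1);
}
```

The steady-state window is where the baseline numbers are read off; the targets are then raised in later runs until one of the criteria degrades.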
**Note: Results and how we reached our goals will be covered in part 2**
4. Defining Thresholds
The thresholds we set are based on our initial criteria of throughput, errors, and latency. Thresholds are interesting because they correlate directly with individual service-level objectives (SLOs). Since we had already defined availability and latency SLOs, this gave us a great starting point for setting performance thresholds. For example, if an endpoint has a 99.00% availability SLO, the error rate during the testing period should not exceed 1%. If the error threshold is exceeded, the framework should fail that test.
Along with thresholds, we set goals. For each of our success criteria we wanted to set a specific goal and make sure we stayed within the allowable threshold. Goals help us show improvement when bottlenecks are addressed.
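In K6, thresholds are declared directly in the test options, and a failed threshold causes the run to exit with a non-zero code, which in turn fails the pipeline stage. A hedged example, with values chosen only to mirror the 99.00% availability discussion above rather than our real numbers:

```javascript
export const options = {
  thresholds: {
    // Errors: mirror a 99.00% availability SLO, so no more than 1% of requests may fail.
    http_req_failed: ['rate<0.01'],
    // Latency: for example, 95th percentile of request duration under 500ms.
    http_req_duration: ['p(95)<500'],
    // Throughput: for example, sustain at least 50 requests per second.
    http_reqs: ['rate>50'],
  },
};
```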

Conclusion
Load testing your platform is essential to ensuring you can handle unexpected workloads and run efficiently. Designing a reusable framework and creating a test plan with measurable performance metrics are best practices that set you up for success when performance testing Kubernetes. Following them helps ensure that your Kubernetes clusters perform optimally and meet the needs of your users.
In part two of the load-testing series, I will talk about how to analyze your test results and optimize cluster configurations. I will cover how Achievers improved performance by tracking and unblocking bottlenecks in Kubernetes workloads, Istio, Google Cloud Platform, and more.