This article describes how to perform scalability testing of a web application deployed in Azure Kubernetes Service (AKS). Typically, most applications have to adhere to the following:
- Meet the response times specified by the SLA. If there is no prior SLA, response times should stay low in proportion to the load.
- For the defined SLAs (plus buffer) and response times, server uptime should be high.
For testing, the hardware should resemble production, including data, capacity and volumes.
Scalability testing of the application should consider the following performance factors:
- Response time
- JVM usage like heap, non-heap and GC intervals
- Network bandwidth usage
- Server Activity including IO, User, and System usage
- Disk read/write speed
- Similar Hardware configurations resembling the production usage
Knowing how scalable the software is
Scalability testing will answer, for example, the following questions:
- How many users can I add without impacting the performance of the application?
- Will adding more servers solve my performance issues?
- Will doubling the hardware allow me to double the number of users?
The above questions can be answered through different kinds of scalability testing, which are briefly described below:
- Predictable Scalability: Keeping the hardware static while increasing or decreasing the volume of users, to see whether performance changes proportionally and the system therefore behaves predictably.
- Horizontal Scalability: When a server is not adequate to handle the current workload, the process of bringing in a new server to share the workload with the current server is called scaling out or horizontal scaling.
- Vertical Scalability: When the current server is not adequate to handle the workload, the process of replacing it with a higher capacity server is called scaling up or vertical scaling.
All the above approaches to scalability testing have their own advantages and disadvantages.
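One way to put a number on the question "will doubling the hardware allow me to double the number of users?" is a scaling-efficiency ratio. The sketch below is only an illustration: the function name and the sample figures are hypothetical, not measurements from the tests in this article.

```python
def scaling_efficiency(users_before, users_after, servers_before, servers_after):
    """Capacity gain divided by hardware gain; 1.0 means perfectly linear scaling."""
    capacity_gain = users_after / users_before
    hardware_gain = servers_after / servers_before
    return capacity_gain / hardware_gain


# Hypothetical numbers: going from 2 to 4 servers raised the supported
# user count from 1000 to 1700.
efficiency = scaling_efficiency(1000, 1700, 2, 4)
print(f"scaling efficiency: {efficiency:.2f}")
```

A value well below 1.0 suggests that adding servers alone will not solve the performance issue, and that a shared bottleneck such as the database deserves a closer look.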
Preparing for the Test
Before we start the test, we must prepare the load on hardware whose configuration resembles the production environment, so that it can handle the incoming traffic assumed for a real situation.
Before we test for scalability, we also need to find the breaking point of the application through stress testing. With the results from the stress test, we can analyze how much performance we are achieving and at what point the application or server breaks down.
Suppose 10,000 requests per second hitting the server breaks the application. This is the breaking point, and we could take 80% of it, i.e. 8,000 requests per second, as a standard threshold. Based on this analysis, we should plan to scale the application when utilization reaches this threshold. The threshold also depends on the load/start-up time of the application, including the time needed to provision the resources that start it.
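The threshold arithmetic can be sketched in a few lines. The function names and the `safety_percent` parameter are hypothetical; only the 10,000 requests per second breaking point and the 80% factor come from the example above.

```python
def scale_out_threshold(breaking_point_rps, safety_percent=80):
    """Request rate at which scaling should start, as a share of the breaking point."""
    return breaking_point_rps * safety_percent / 100


def should_scale(current_rps, breaking_point_rps, safety_percent=80):
    """True once the observed rate reaches the scale-out threshold."""
    return current_rps >= scale_out_threshold(breaking_point_rps, safety_percent)


threshold = scale_out_threshold(10_000)
print(threshold, should_scale(7_500, 10_000), should_scale(8_200, 10_000))
```

If the application takes a long time to start, a lower `safety_percent` leaves more room for new instances to come up before the breaking point is reached.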
Scalability Test plan
We followed these steps to execute the scalability testing:
- Pick a process or functionality that covers end-to-end or a most used scenario for conducting scalability tests
- Define response times and other critical performance criteria as per SLA
- Identify how to do the scalability testing, whether with scripts, tools or multiple environments, and establish the environment
- Make sure all required tweaks, configurations or changes are done prior to testing
- Run a sample scenario to ensure everything works as expected and then execute the planned test scenarios
- Execute load tests
- Analyze results and generate report
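The "execute load tests" step can be sketched as a small harness in which each virtual user sends a request, records the latency, and pauses for a think time. This is a minimal sketch and not the tool used for the tests in this article; `send_request` is a hypothetical callable standing in for the real login request, and `fake_login` only sleeps so the example runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_virtual_user(send_request, iterations, think_time_s):
    """Simulate one user: send a request, record its latency, then pause."""
    latencies_ms = []
    failures = 0
    for _ in range(iterations):
        start = time.perf_counter()
        try:
            send_request()
        except Exception:
            failures += 1
        else:
            latencies_ms.append((time.perf_counter() - start) * 1000)
        time.sleep(think_time_s)
    return latencies_ms, failures


def run_load_test(send_request, users, iterations, think_time_s):
    """Run all virtual users concurrently and merge their results."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(run_virtual_user, send_request, iterations, think_time_s)
            for _ in range(users)
        ]
        results = [future.result() for future in futures]
    latencies = [ms for user_latencies, _ in results for ms in user_latencies]
    failures = sum(user_failures for _, user_failures in results)
    return latencies, failures


def fake_login():
    """Stand-in for the real request (assumption: about 1 ms of work)."""
    time.sleep(0.001)


latencies, failures = run_load_test(fake_login, users=5, iterations=3, think_time_s=0)
print(len(latencies), "successful requests,", failures, "failures")
```

Passing the request as a callable keeps the harness independent of any HTTP client, so the same loop can be dry-run against a stub before it is pointed at the real environment.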
Executing Scalability Tests
Goal: The application under test performs a login request, which internally redirects for security validations and then lands in the application. The goal is to trigger 1 million requests against the application within a stipulated time period, ensuring that the redirects and logins complete with maximum success rate and minimal response time while using the resources to their maximum.
System Overview – a simple setup would have a Dockerized web application deployed in AKS, pointing to a relational database.
- Uses Azure Event Hub service with Kafka enabled
- Uses Azure SQL Service
Horizontal Scaling – auto-scaling enabled by default with a minimum of 4 PODs, scaling by 1 POD
- Scaling Threshold – when CPU usage exceeds 40% of the POD, auto-scaling kicks in and 1 new POD is created. No memory-based scaling filter is applied.
Approach to Goal: To achieve the goal cited above, we divided the tests into 2 categories, to identify the bottlenecks in the database and in the application code separately:
- Database load test – to identify the slow queries and missing optimizations in the DB schema such as indexes and other DB configurations
- Application load test – to identify the bottlenecks in the application code pertaining to optimized execution, memory usage and other resource utilization.
First, the tests are executed on the database. Test 1 describes the DB load test to identify the initial bottlenecks, which uses data insertions into the DB.
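An insertion-based DB load test boils down to timing how long a given number of rows takes to insert. The sketch below uses an in-memory SQLite database purely as a stand-in for Azure SQL so that it runs anywhere; the table name `login_events` and its columns are hypothetical and are not the schema used in these tests.

```python
import sqlite3
import time


def timed_insert(conn, row_count):
    """Insert row_count rows in one transaction and return the elapsed seconds."""
    rows = [(i, f"user{i}") for i in range(row_count)]
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO login_events (id, username) VALUES (?, ?)", rows
    )
    conn.commit()
    return time.perf_counter() - start


# In-memory database used only for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_events (id INTEGER PRIMARY KEY, username TEXT)")

elapsed = timed_insert(conn, 1000)
inserted = conn.execute("SELECT COUNT(*) FROM login_events").fetchone()[0]
print(f"{inserted} rows in {elapsed:.3f}s")
```

Repeating the same timed run after each schema or query change gives directly comparable numbers from one test to the next.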
Test 1: The initial test inserted 1K rows and took a total of 14 hrs.
Issues Identified: Certain DB calls were taking significant time. Detailed analysis identified the queries causing the delay, and we made the following change:
- Added proper indexes so that the queries deliver results faster
Test 2: Tested with insertion of 2K rows and identified a few more bottlenecks, even though throughput improved a lot.
Issues Identified: The DB calls were still taking significant time, so we made the following change:
- Tweaked the queries in the DB so that the inserts would happen much faster
Test 3: Performed Test 1 again after the above tweaks, and the 1K rows were now inserted in 56 secs, which is a dramatic performance improvement. The table below presents the results of the tests:
| Test Number | # rows inserted in DB | Time Taken |
|---|---|---|
| Test 1 | 1K | 14hrs |
| Test 2 | 2K | 2hrs 43min |
| Test 3 | 1K | 56secs |
| Test 4 | 50K | 9min |
Test 4: Now that the DB calls were optimized, we increased the number of inserted rows to 50K to validate whether the DB would sustain the load and still yield good results as the data grows. We observed that Test 4, with 50K rows inserted, yielded good throughput from a DB perspective.
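To compare the four runs on a common scale, the row counts and durations reported for Tests 1 to 4 can be converted to rows per second. The snippet below only restates those reported figures; the variable and function names are ours.

```python
# (test name, rows inserted, duration in seconds) as reported for Tests 1-4
db_tests = [
    ("Test 1", 1_000, 14 * 3600),
    ("Test 2", 2_000, 2 * 3600 + 43 * 60),
    ("Test 3", 1_000, 56),
    ("Test 4", 50_000, 9 * 60),
]


def rows_per_second(rows, seconds):
    """Average insert throughput over the whole run."""
    return rows / seconds


rates = {name: rows_per_second(rows, seconds) for name, rows, seconds in db_tests}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} rows/s")
```

Test 1 and Test 3 insert the same 1K rows, so the drop from 14 hrs (50,400 seconds) to 56 seconds is a 900-fold reduction in elapsed time.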
Now that we had optimized the DB side, we moved the testing phase on to application performance. Here the scalability test performs 1 million requests in a stipulated time to identify issues related to application logic, the JVM or any cloud configuration.
During the test, we found that requests were failing on the server side. After analysing these failures, we found that the existing configurations were not suitable for the test and optimized them. We repeated the same test multiple times to identify any further issues. After a couple of test cycles, we performed the following optimizations to improve application performance:
- Configured AKS auto-scaling based on CPU usage (40%). If the CPU usage of a POD exceeds 25%, a new POD is added automatically.
- Increased the number of partitions in Azure Event hub service
- Database tier increased from S2 to S4
- Upgraded to Java 10 for better resource management in Docker containers
- Optimized the default resource values of the Docker containers
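The CPU-based scaling rule can be sketched with the formula that the Kubernetes Horizontal Pod Autoscaler documents, desired = ceil(current replicas × current metric / target metric). The minimum of 4 PODs and the step of 1 POD are taken from the setup described in this article; modelling the step as a simple cap, and leaving out the autoscaler's tolerance and stabilization behaviour, are simplifying assumptions of this sketch.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas=4, max_step=1):
    """Replica count suggested by the HPA ratio rule, limited to one step up."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Assumption: scale-up is capped at max_step new PODs per evaluation.
    capped = min(raw, current_replicas + max_step)
    # Never drop below the configured minimum number of PODs.
    return max(capped, min_replicas)


# 4 PODs averaging 60% CPU against a 40% target: the ratio asks for 6,
# the one-POD step limits the result to 5.
print(desired_replicas(4, 60, 40))
# 4 PODs averaging 20% CPU: the ratio asks for 2, the minimum keeps 4.
print(desired_replicas(4, 20, 40))
```

The target percentage is a parameter here, so the same function can be used to reason about either CPU threshold mentioned for this setup.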
After the above optimizations, we were able to hit 1 million requests in a time span of 1 hr. To achieve these 1 million requests, we had to scale the number of users to 200 across 2 virtual machines. We also had to ensure that the requests did not hit the server continuously without any think time, as that leads to failures on the server side during validation. So we chose a think time of 120 sec and conducted the test as depicted in the scenario shown below:
| # Users | # VMs | # Users per VM | Time period | Think time | Total # of requests |
|---|---|---|---|---|---|
| 200 | 2 | 100 | 1hr | 120sec | 1 Million |
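The scenario figures imply an average request rate and a per-user and per-VM share of the load, which is useful when sizing the load generators. The calculation below uses only the numbers from the scenario (1 million requests, 1 hr, 200 users, 2 VMs); the function and key names are hypothetical.

```python
def load_profile(total_requests, duration_s, users, vms):
    """Average rates implied by a request target spread evenly over users and VMs."""
    return {
        "avg_requests_per_second": total_requests / duration_s,
        "requests_per_user": total_requests / users,
        "requests_per_vm": total_requests / vms,
        "users_per_vm": users / vms,
    }


profile = load_profile(total_requests=1_000_000, duration_s=3600, users=200, vms=2)
for key, value in profile.items():
    print(f"{key}: {value:,.1f}")
```

These are averages over the whole hour; the instantaneous rate during a run will vary with ramp-up and think time.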
Once the above scenario was tested, we analyzed the stats and observed that the average response time is <1 sec, which is within the acceptable range. The following table summarizes the test results:
| Avg time (ms) | Min (ms) | Max (ms) | Median | 95% | User Throughput /s |
|---|---|---|---|---|---|
| 934 | 22 | 79609 | 235 | 4283 ms | 80 ms |
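Summary figures of this kind (average, minimum, maximum, median, 95th percentile) are derived from the raw per-request latencies. The sketch below shows one common way to compute them, using a nearest-rank percentile; the sample data is hypothetical and the load tool used for these tests may calculate its percentiles differently.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Average, min, max, median and 95th percentile of request latencies."""
    ordered = sorted(latencies_ms)
    return {
        "avg": statistics.mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "p95": percentile(ordered, 95),
    }


# Hypothetical sample: 20 latencies from 10 ms to 200 ms.
sample = list(range(10, 201, 10))
summary = summarize(sample)
print(summary)
```

In the results above, the average (934 ms) is far higher than the median (235 ms), which indicates that a small number of very slow requests, up to the 79609 ms maximum, pull the average up.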
The Azure real-time graph taken during the test that triggered the 1 million requests is shown below. You can see that CPU utilization reached 100%, which is a good sign of resource usage. In real-world operation, however, we should not reach 100% utilization, as some CPU must be left as a buffer for the system and other processes running on the server. It would be ideal to limit CPU usage in production to roughly 70%-80%. Here, though, our goal was to test the maximum possible case, to see how quickly we can reach 1 million requests with full utilization of resources and without breaking anything.
The graph below shows the incoming requests to the event hub, where you can see that we hit 999.91k incoming messages.
The screenshot below shows the number of containers deployed during the test and their maximum usage. You can observe that the first container is utilized to its maximum before the other containers and that the status is OK, indicating that the test was successful.