Why does code that works in development and test environments fail in production? This is a very familiar question, and we all know the answer: add a performance engineering stage to the software development life cycle. Still, most product organisations are unsure what to do and how to do it.
Performance engineering is more than load testing. It means understanding user behaviour and usage patterns, how users interact with your platform, and simulating that in a controlled lab environment.
Before we talk about the engineering side, let’s understand what skill set is needed to establish performance engineering practices in your organisation.
Team:
Performance engineering is one area of site reliability engineering. It needs engineers who have programming skills, understand each layer of the production system and its deployment architecture, and have the attitude to solve operational problems with software engineering.
Why is programming an essential skill for a performance engineer? They need to code user behaviour exactly the way users interact with the system. They also need to automate test execution, report extraction, metrics comparison and other repetitive tasks so that they can focus on detecting and fixing problems.
Tools:
Many tools are available for executing performance tests. I recommend tools that are lightweight, can spawn a very large number of simulated users from low-configuration machines, and are programmable.
Locust and Gatling support performance tests as code and can be integrated with a CI/CD pipeline for continuous performance testing.
KPIs:
Performance engineering centres on two main KPIs:
1. Response time:
This is a critical metric that must stay consistent even at high throughput.
Consider the following criteria when designing the simulation:
- Understand the likely user behaviour for the services that make up 90% of traffic.
- Measure the 90th-percentile response time instead of the average response time.
- Increase load until hardware resources reach 60% utilisation.
As an industry practice, all APIs should respond within 200 ms.
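To see why the percentile matters more than the average, here is a minimal sketch in Python. The sample response times are invented for illustration, and the nearest-rank method is only one of several common ways to compute a percentile; load tools may use a different one.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: eight fast calls and two slow ones.
response_times_ms = [90, 95, 100, 100, 105, 105, 110, 115, 430, 450]

average_ms = mean(response_times_ms)
p90_ms = percentile(response_times_ms, 90)

print(f"average: {average_ms} ms, p90: {p90_ms} ms")
```

With these numbers the average is 170 ms, which looks comfortably inside a 200 ms target, while the 90th percentile is 430 ms. The average hides the fact that a meaningful share of users waits more than twice the target.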
2. Cost:
Performance engineering also means continuously reducing the hardware needed to deliver the same throughput.
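One simple way to track this KPI between releases is throughput per CPU core. The sketch below is only an illustration: the function names and the figures (2000 requests per second on 16 and then 12 cores) are hypothetical and do not describe any real system.

```python
def throughput_per_core(requests_per_second, cpu_cores):
    """Requests per second that each CPU core sustains."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return requests_per_second / cpu_cores


def hardware_saving_pct(baseline_cores, candidate_cores):
    """Percentage reduction in cores needed to serve the same throughput."""
    if baseline_cores <= 0:
        raise ValueError("baseline_cores must be positive")
    return (baseline_cores - candidate_cores) / baseline_cores * 100


# Hypothetical figures: both releases sustain 2000 requests per second.
baseline = throughput_per_core(2000, 16)   # previous release on 16 cores
candidate = throughput_per_core(2000, 12)  # tuned release on 12 cores
saving = hardware_saving_pct(16, 12)

print(f"per core: {baseline:.1f} -> {candidate:.1f} req/s, saving: {saving:.0f}%")
```

Tracking the per-core figure release after release makes a cost regression visible even when response times still look healthy.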
Performance engineering of mobiquity pay X:
mobiquity pay X is a next-generation, cloud-ready version of an award-winning stored value wallet management platform. At Comviva, we have made continuous performance engineering an integrated part of the product release pipeline.
Our performance engineering team is a highly skilled group of developers who have production experience and work on optimising the product's response time and cost.
Continuous Performance:
The first challenge was to identify user behaviour. Because we have a large customer base, we had plenty of data from which to identify traffic patterns, the daily active user base and the most-used services.
After that, scripting was easy. We used Python to script user tasks and Locust as the load-generation tool.
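As an illustration of turning usage data into a traffic mix, the sketch below picks the services that together account for 90% of calls and reduces their call counts to small integer weights. The service names and counts are hypothetical, not real production data. Weights like these could then be given to the load script, for example through Locust's task weighting.

```python
from functools import reduce
from math import gcd


def top_services(call_counts, coverage_pct=90):
    """Return the most-used services that together cover coverage_pct of calls."""
    total = sum(call_counts.values())
    if total <= 0:
        raise ValueError("call_counts must contain at least one call")
    selected = {}
    covered = 0
    ranked = sorted(call_counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        # Stop once the services chosen so far reach the coverage target.
        if covered * 100 >= coverage_pct * total:
            break
        selected[name] = count
        covered += count
    return selected


def to_weights(counts):
    """Reduce call counts to the smallest integers with the same ratio."""
    divisor = reduce(gcd, counts.values())
    return {name: count // divisor for name, count in counts.items()}


# Hypothetical daily call counts per service.
daily_calls = {
    "balance_enquiry": 500,
    "p2p_transfer": 250,
    "bill_payment": 150,
    "merchant_payment": 60,
    "statement": 30,
    "pin_change": 10,
}

weights = to_weights(top_services(daily_calls))
print(weights)
```

Here three of the six services cover 90% of calls, so the simulated users would perform only those three tasks, in a 10:5:3 ratio.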
The idea was to automate all repetitive tasks so that the team could focus on tuning response time and cost, the main KPIs for this exercise. The following tasks are automated and integrated into the CI/CD pipeline:
- Define traffic conditions
- Execute load
- Compare results with expected results
- Prepare report
- Re-run
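The "compare results with expected results" step can be sketched as a small gate that a pipeline job runs after the load test. This is a minimal illustration rather than the actual pipeline code: the metric names and the 1% failure-rate limit are assumptions, while the 200 ms and 60% limits come from the criteria discussed earlier.

```python
def find_violations(measured, expected_max):
    """Return a message for every metric that is missing or above its limit."""
    violations = []
    for metric, limit in expected_max.items():
        value = measured.get(metric)
        if value is None:
            violations.append(f"{metric}: no measurement")
        elif value > limit:
            violations.append(f"{metric}: {value} > {limit}")
    return violations


# Upper limits for one run (metric names are hypothetical).
expected_max = {
    "p90_response_ms": 200,
    "failure_rate_pct": 1.0,   # assumed limit, not from the article
    "cpu_utilisation_pct": 60,
}

# Hypothetical results extracted from a load run.
measured = {
    "p90_response_ms": 185,
    "failure_rate_pct": 0.2,
    "cpu_utilisation_pct": 72,
}

violations = find_violations(measured, expected_max)
# A CI job could pass this value to sys.exit() to fail the stage.
exit_code = 1 if violations else 0

print(violations, exit_code)
```

A non-zero exit code stops the build from being promoted. After a fix or a configuration change, the same gate is simply run again.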
Pipeline:
In the end-to-end pipeline, as presented in the diagram, a build is promoted to the performance environment after the functional tests pass.
All performance results, including application metrics, JVM metrics, infrastructure metrics and failure rates, are collected by exporters, scraped into Prometheus and visualised in Grafana.
We also integrated Slack so that results and analysis are sent as notifications from the pipeline itself.
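A notification step of this kind can be sketched with the Python standard library and a Slack incoming webhook, which accepts a JSON body with a `text` field. The message layout, the function names and the build identifier below are assumptions made for illustration. The webhook URL would come from your own Slack configuration.

```python
import json
import urllib.request


def build_slack_message(build_id, violations):
    """Build the JSON payload for a Slack incoming webhook."""
    status = "FAILED" if violations else "PASSED"
    lines = [f"Performance run for build {build_id}: {status}"]
    lines.extend(f"- {violation}" for violation in violations)
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, message):
    """POST the message as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical build identifier and gate result.
message = build_slack_message("42", ["cpu_utilisation_pct: 72 > 60"])
print(message["text"])
```

In a pipeline job, `post_to_slack` would be called with the webhook URL read from a secret or environment variable, so that the URL never appears in the repository.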
Performance engineering, if planned properly and made part of the development life cycle, can help avoid a lot of surprises in production systems.
This article is also available on Medium.