What is Performance Engineering?

 Introduction

This post will help you understand what performance engineering is all about, the basics behind it, and how it is practiced in the industry. Let’s get started!

This is how Wikipedia defines performance engineering:

Performance engineering encompasses the techniques applied during a systems development life cycle to ensure the non-functional requirements for performance (such as throughput, latency, or memory usage) will be met. It may be alternatively referred to as systems performance engineering within systems engineering, and software performance engineering or application performance engineering within software engineering.

The Basics

All of us in the IT / computer engineering / software industry have seen how applications have evolved over the years and how their usage has changed.

The first phase in the evolution of an application after development is ‘testing’, or QA.  This usually means testing the application against the defined functionalities gathered as part of the functional requirements.  It is typically a single-user activity, done manually (a rarity now) or in an automated way.  However, today’s applications are not built for a single user; they are built to be deployed and used on a global scale.  This is where the next phase of testing begins: performance tests.

Once the application is established as functionally in order, or is termed stable, load tests come into the picture.  This would ideally be the order, but in today’s agile world both phases run in parallel as a result of end-to-end automation.

More details

We just discussed functional requirements and mentioned that the application is tested for a single user, and that when this functionality is tested at a large scale it is termed load testing.  You may be thinking of the Wikipedia definition, where the term ‘non-functional’ requirements was used.  So where does that fit in?

Well, in actuality, what is tested is always the application and its functionality.  But what differentiates load tests from single-user functional tests are these –

  • How many users we want to run with
  • What kind of hardware we would like to run the application on
  • What functionality (use cases) we would like to cover
  • Finally, what kind of performance metrics we are looking to see as a result
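These four inputs can be captured together as a small test specification. A minimal sketch follows; the `LoadTestSpec` name, its fields, and the sample values are purely illustrative, not from any real tool:

```python
from dataclasses import dataclass, field

# Hypothetical container for the four load-test inputs discussed above.
@dataclass
class LoadTestSpec:
    users: int                                          # how many users to run with
    hardware: str                                       # target hardware / VM profile
    use_cases: list = field(default_factory=list)       # flows to cover
    target_metrics: dict = field(default_factory=dict)  # results we expect to see

# Example values are made up for illustration only.
spec = LoadTestSpec(
    users=1000,
    hardware="8 vCPU / 32 GB RAM app server",
    use_cases=["login", "search", "book"],
    target_metrics={"p90_response_ms": 800},
)
```

Writing the inputs down like this keeps each test run reproducible and comparable with earlier runs.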

Let’s look at each of these in some detail.

The insights

The first and foremost need of a global-scale application is that it should be able to run with and support multiple users.  So, the important requirement is to understand the user load.  This is usually given by the product owners, who have a specific user load in mind while designing the application, and it changes from product to product.  For example, the user load of an online shopping application would be totally different from that of an application dealing with online ticketing.  The user load also sees seasonal spikes, which need to be accounted for.

This is the best-case scenario, when such inputs are available.  But in many situations they do not come in so clean, and the performance engineers are asked to provide inputs.  In such cases, certain variables are taken into consideration and the engineers proceed with executing certain kinds of tests to understand the behavior.

The second requirement is the hardware.  This is where the ‘certain variables’ from the first requirement come into play.  While designing an application and anticipating a certain load, specific hardware is also considered in many situations.  Although cost is an important driving factor, a good deployment strategy always takes the best available hardware into consideration.

The requirements here look like –

  • The server / VM should have X CPUs across X sockets
  • The server / VM should have Y GB of RAM
  • The server / VM should have Z GB / TB of disk space on an HDD / SSD, or should have SAN / NAS partitions attached
  • The server / VM should have a certain number of NICs running at a certain speed, possibly with optical interfaces

With deployments happening in the cloud, all of these are selectable from an interface and can be mixed and matched as needed to fit the desired cost.  But with deployments now shrinking in size, and ‘containers’ / ‘pods’ replacing VMs, the behemoth of computing has contracted to fractions of a CPU.  I guess this will go into a separate discussion, perhaps later.

With the hardware requirement settled, the next question to answer is how many users this particular configuration would support.

This gets us to our third requirement – use cases.  In the beginning we discussed functional testing / QA.  That activity would probably cover each and every small aspect of the application.  But when we come to performance / load tests, all of that cannot be covered, and it really does not make sense to cover everything.  Each application has certain important flows, or business logic, that are used the most by the majority of users.  These flows have to be identified in consultation with the product owners for new products, or can be derived from the usage of the application in production deployments for existing applications.

Once the functional aspects are finalized, the next step is to identify the load at which each of these runs.  Let’s consider a small use case for a ticketing application.  One of the major use cases could be as follows –

  • User logs in
  • User searches for travel from point A to point B
  • User compares the fares on different dates
  • User books the ticket
  • User logs out

Here, log-in and log-out are executed by each and every user, but the intermediate steps can vary.  These intermediate steps are termed transactions.  One user might just come in, search, and log out.  Another might search, compare the fares across 15-30 days, and then log out.  Yet another might search for a specific date, make the booking, and log out.

With such variation at play, it is necessary to control the way our test script executes.  There are several methods for this, which again will be part of a separate discussion.
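One common way to control the mix is to assign each journey a weight and pick one per simulated user. A small sketch of that idea follows; the journey names and weights here are hypothetical, not taken from any real workload:

```python
import random

# Hypothetical journeys for the ticketing example above.
JOURNEYS = {
    "search_only":     ["login", "search", "logout"],
    "compare_fares":   ["login", "search", "compare_fares", "logout"],
    "search_and_book": ["login", "search", "book", "logout"],
}

# Illustrative mix: 50% just search, 30% compare fares, 20% book.
WEIGHTS = {"search_only": 0.5, "compare_fares": 0.3, "search_and_book": 0.2}

def pick_journey(rng=random):
    """Pick one journey for a simulated user according to the weights."""
    name = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    return JOURNEYS[name]
```

Every journey still starts with log-in and ends with log-out, matching the observation that only the intermediate transactions vary.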

Now, with the user load, hardware configuration, and use cases finalized, we come to the final step – the results.  With these three things in hand, we execute load tests and see results like –

  • X throughput
  • Y hits / sec
  • A CPU usage on the application server, B CPU on the DB server
  • D Mb/s network usage on the app server

In certain cases, we go in the opposite direction.  Meaning, we execute our tests to achieve, say, X throughput or Y hits / sec.  This would be the case for the sporadic / seasonal loads I referred to earlier.  To achieve this we have to vary the user load or control the script execution specifically to hit the target.
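When driving toward a target throughput, a useful first estimate of the user load comes from Little’s Law, N = X × (R + Z), where N is the number of concurrent users, X the throughput, R the response time, and Z the user think time. A minimal sketch:

```python
def users_for_throughput(target_tps, avg_response_s, think_time_s):
    """Little's Law: N = X * (R + Z).

    target_tps     -- desired transactions per second (X)
    avg_response_s -- average response time in seconds (R)
    think_time_s   -- pause between a user's transactions in seconds (Z)
    """
    return target_tps * (avg_response_s + think_time_s)

# e.g. 100 tx/s with 0.5 s responses and 4.5 s think time needs ~500 users
estimate = users_for_throughput(100, 0.5, 4.5)
```

This is only a starting point: the real user count is then tuned during test runs, since response time itself changes as load grows.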

With all of the above completed, the last step is result analysis.  The results are broadly classified into –

  • Client side metrics
  • Server side metrics

Let’s look at these briefly.  When a user accesses an application from the browser, the user expects a response within a specific time.  This is known as the response time: the lower the value, the better the responsiveness of the application.  It is one of the major metrics.  Here is a brief list –

  • Overall 90th-percentile response time (ms / sec)
  • Response time per transaction
  • Transactions / sec
  • Hits / sec
  • Pages / sec
  • KB / sec
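The 90th-percentile figure in the list above is just a percentile computed over all the collected response-time samples. A minimal nearest-rank sketch, assuming the samples are already gathered into a list:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times.

    pct=90 gives the value that 90% of samples fall at or below.
    """
    s = sorted(samples)
    # Nearest-rank: index = ceil(pct/100 * n) - 1
    k = math.ceil(pct / 100 * len(s)) - 1
    return s[k]
```

Load test tools report this out of the box; the point here is only that P90 summarizes the experience of most users while ignoring the worst outliers.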

The server side metrics include –

  • Overall CPU utilization (%) (min, max, avg)
  • Overall Memory utilization
  • Processor queue length
  • Process CPU utilization (application)
  • Disk usage / Disk busy
  • Network (KB/s, MB/s) 

With all this data in hand, it is the responsibility of the performance engineer to connect everything together and understand whether the test was successful and went as expected, or whether it had issues.  If it did, it is up to the engineer to delve deep into the problem, root-cause it at the various levels of application code, database design, or hardware, and propose remedies to overcome it.

Because it involves more than just designing and executing load tests, and looks into the systems side of things (as the Wikipedia definition puts it), this discipline is called performance engineering – and it is an engineering discipline in itself.

Let’s catch up again in another article soon.  Cheers!

 

 
