
TestOps: What Is It and Why We Need It

Years of experience in the software industry show that there is still much left to discover. Every approach the industry adopts aims to accelerate development while increasing customer satisfaction. The new term "TestOps" likewise brings a different perspective on the relationship between test activities and operational activities within the DevOps culture. In this post, I want to share my experience with TestOps.

What is TestOps

The name is short for tests and operations; it is a sub-discipline of DevOps within the DevOps culture. There are two approaches to TestOps, and which one fits depends on the overall test approach you are applying. If you are applying `shift-left testing`, then TestOps should be adopted to collaborate more closely with the development team; this can be called `TestOps shift-left`. On the other hand, if you are applying `shift-right testing`, then TestOps should be adopted to collaborate more closely with the operation team; this can be called `TestOps shift-right`.

Why We Need TestOps

Turning an idea into a product requires different kinds of expertise: product owner, developer, tester, DevOps engineer. No one can handle everything alone, and holding one of these responsibilities does not automatically mean you are the expert doing everything correctly. Testing is always changing. We have had to adapt to the waterfall, V-model, and agile development processes, and also to application architecture patterns such as monolithic and microservices. The tests required for each process and pattern are different, so the required capabilities of the testers differ as well. The emerging pattern nowadays is microservices, and the most popular development process is agile.

With the rise of the microservice architectural pattern, shift-right testing has gained significant traction. Netflix, one of the pioneers in applying the microservice pattern at scale, suggests testing those hundreds of microservices in the later stages of development as well. Let's look at the real-world examples of Amazon and Netflix below:
Figure: Number of integration points for Amazon and Netflix

Since there are hundreds of microservices, the number of integration points between them is very high. With 2 services there is 1 integration point; with 3 services there are 3; then 6, 10, 15, 21, and so on. The number grows drastically when the count reaches the hundreds: 100 services have 4950 pairwise integration points, and 500 services have 124750. The underlying combination formula is C(n, r) = n! / (r! (n - r)!), where n is the number of microservices and r is the subset size, which is 2 for the integration calculation; for r = 2 this simplifies to n(n - 1) / 2.
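As a quick sanity check of these numbers, here is a minimal Python sketch using the standard library's `math.comb` to compute the pairwise integration points for a few service counts:

```python
from math import comb  # available in Python 3.8+

def integration_points(services: int, group_size: int = 2) -> int:
    """Number of integration points among `services` microservices,
    counting every subset of `group_size` services (2 = pairwise)."""
    return comb(services, group_size)

for n in (2, 3, 7, 100, 500):
    print(f"{n} services -> {integration_points(n)} integration points")
# 2 -> 1, 3 -> 3, 7 -> 21, 100 -> 4950, 500 -> 124750
```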

Netflix, as well as Google and Amazon, has also been a pioneer in testing microservices. Thanks to Netflix, this transformation experience was explained publicly. Netflix, and also Spotify, have changed how the traditional test pyramid is used: it is reshaped into a test diamond, which says that unit tests are still important, but instead of writing exhaustive unit tests we should focus more on integration tests. For details on how Spotify changed the test pyramid for microservice testing, you can read this post.

Figure: Traffic in Netflix

What I have tried to show so far is that the systems whose quality we need to influence are becoming more and more complicated. The way we handle test activities for those systems should be updated with new approaches. Whether your testing approach is shift-left or shift-right, you should use the discipline of TestOps effectively.

TestOps shift-left 

TestOps is needed when you want to leverage the benefits of test automation. TestOps shift-left requires a lot of expertise; these are some of the basic tasks needed for a good test automation practice (a minimal sketch of a few of them follows the list):
  • Using containers
  • Using container orchestration tools
  • Creating test pipelines with automation tools
  • Integrating test tools into the pipeline
  • Integrating the test pipeline into the main development pipeline
  • Creating or using a well-known automation framework
  • Writing deterministic tests
  • Creating test data
  • Removing dependencies in the tests
  • Creating isolated test environments
  • Running tests in parallel
  • Creating test reports
  • Combining these test reports
  • Rerunning failed tests
  • Identifying flaky tests and having them investigated
  • Destroying the test environments after tests
  • Removing test data
  • Creating a monitoring system for the visibility of tests
  • and so on...
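To make a few of these items concrete, here is a minimal pytest sketch showing deterministic, isolated tests whose test data is created before and removed after each test. The `OrderService` class and its in-memory store are hypothetical stand-ins for a real system under test:

```python
import pytest


class OrderService:
    """Hypothetical system under test: a tiny in-memory order store."""

    def __init__(self):
        self._orders = {}

    def add(self, order_id, amount):
        self._orders[order_id] = amount

    def total(self):
        return sum(self._orders.values())


@pytest.fixture
def service_with_data():
    # Create isolated test data for this test only (no state shared between tests).
    service = OrderService()
    service.add("order-1", 10)
    service.add("order-2", 15)
    yield service
    # Remove the test data / tear down the isolated environment after the test.
    service._orders.clear()


def test_total_is_deterministic(service_with_data):
    # Deterministic: no randomness, no dependency on external systems.
    assert service_with_data.total() == 25
```

For parallel runs and rerunning failures, options such as pytest-xdist (`pytest -n auto`) and pytest's built-in `--last-failed` flag cover two more items on the list; in a real setup these commands would run inside the containers and pipelines mentioned above.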

TestOps shift-right

Now let's look at TestOps shift-right, which mostly focuses on collaboration with the operation team. Even though it includes most of the items from TestOps shift-left, since it requires a continuous testing practice, its requirements are also different. We can list them as follows (a toy sketch of the last item comes after the list):
  • Having continuous testing
  • Testing in the live environment
  • Effective use of a monitoring system
  • Leveraging data from the live environment to catch anomalies
  • Creating dashboards/charts from the live data
  • Chaos engineering, as defined by Netflix: identifying failures before an outage happens by injecting controlled failure scenarios to improve the reliability of the system.
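Since chaos engineering is the least familiar item, here is only a toy Python sketch of the core idea of controlled failure injection: a hypothetical `fetch_price` call is wrapped so that it randomly fails or slows down, and a retry-based caller is exercised to see whether it still copes. Real chaos experiments run against live infrastructure with tools such as Chaos Monkey, not against a single function:

```python
import random
import time


def chaotic(failure_rate=0.3, max_delay=0.5):
    """Wrap a function so that it randomly fails or adds latency (controlled failure injection)."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            if random.random() < failure_rate:
                raise ConnectionError("injected failure")
            time.sleep(random.uniform(0, max_delay))  # injected latency
            return func(*args, **kwargs)
        return wrapper
    return decorator


@chaotic(failure_rate=0.3)
def fetch_price(item):
    # Hypothetical downstream call that the experiment disturbs.
    return {"item": item, "price": 42}


def fetch_price_with_retry(item, attempts=3):
    # The behaviour under test: does the caller survive the injected failures?
    for _ in range(attempts):
        try:
            return fetch_price(item)
        except ConnectionError:
            continue
    raise RuntimeError("service did not recover within the retry budget")


if __name__ == "__main__":
    print(fetch_price_with_retry("book"))
```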

Where We Can Apply TestOps

We can see the benefits of TestOps especially in areas such as performance testing and test automation. For well-designed performance testing, the strategy should include real user scenarios. To be able to build these scenarios, we need to work with real user data collected from the live environment. Let's list the steps needed for performance testing (a minimal sketch of the scenario-weighting step follows below):
  • Real user flow data should be collected
  • Data should be analyzed to create a distribution table for each user flow
  • This distribution should be converted to a percentage of usage
  • Test scenarios should be created based on the real user flows
  • Each test scenario should be weighted with its usage percentage
  • The test environment should be created if necessary; most of the time the live environment should be used
  • Environments for running the performance scripts should be prepared; use cloud services. If a master-slave configuration is needed, the environment for each slave should be prepared. A better way is to create an Infrastructure as Code (IaC) definition to handle this creation and scaling automatically.
  • Prepare monitoring tools
  • Run the tests
  • Check the monitoring tools to confirm the load is being received correctly
  • If any failure occurs, reconfigure the test/live environment and re-run the script. If everything goes well, stop the performance scripts
  • Collect the data
  • Stop every environment 
  • Create a report
  • Possibly I missed something
Since these activities cannot be handled by a single person, the TestOps discipline helps to distribute and manage these tasks. In the same way, test automation also requires much of this technical expertise to deliver its benefits.
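As a minimal illustration of the scenario-weighting step, here is a short Locust sketch. The endpoints and the 70/20/10 split are assumptions standing in for the distribution extracted from real user data:

```python
from locust import HttpUser, task, between


class RealUserScenarios(HttpUser):
    wait_time = between(1, 3)  # think time between user actions

    @task(7)  # ~70% of real traffic browses the catalogue (assumed distribution)
    def browse_catalogue(self):
        self.client.get("/products")

    @task(2)  # ~20% searches
    def search(self):
        self.client.get("/search", params={"q": "book"})

    @task(1)  # ~10% goes through checkout
    def checkout(self):
        self.client.post("/cart/checkout", json={"items": ["book"]})
```

Running this headless in a pipeline is then a matter of something like `locust -f locustfile.py --headless -u 100 -r 10`, where the user count and spawn rate also come from the same traffic analysis.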

How We Can Align with TestOps

TestOps requires some new qualifications from QA/Test Engineers, such as learning how DevOps tools can accelerate the testing experience and how the QA team can help the operation team deliver a better customer experience. Let's summarize the qualifications needed to align with the TestOps discipline (a small monitoring-query sketch follows the list):
  • Learn DevOps tools
    • Containers
    • Container orchestration tools
    • Cloud testing ability
  • Learn scripting languages
  • How to spot defects
  • How to debug defects
  • More automation, not only of tests but also of processes
  • Apply continuous testing
  • Adapt yourself to new tools and technologies
  • Experience with monitoring tools
    • Search data
    • Create charts/graphs
    • Create reports
  • Learn how to communicate with the operation team
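To make the monitoring skills a bit more tangible, here is a minimal sketch that queries a Prometheus-style monitoring system over its HTTP API and flags a simple anomaly. The URL, metric name, and threshold are assumptions for illustration; your monitoring stack and query language may differ:

```python
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # assumed address
ERROR_RATE_QUERY = 'sum(rate(http_requests_total{status=~"5.."}[5m]))'  # assumed metric
THRESHOLD = 0.05  # errors/second treated as "anomalous" in this sketch


def current_error_rate():
    # Prometheus instant-query endpoint: /api/v1/query?query=<PromQL>
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query",
                        params={"query": ERROR_RATE_QUERY}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0


if __name__ == "__main__":
    rate = current_error_rate()
    status = "ANOMALY" if rate > THRESHOLD else "ok"
    print(f"5xx error rate: {rate:.4f}/s -> {status}")
```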
I have given a talk about "What is TestOps" in five questions; you can watch it (in Turkish).
  
