Search Based Software Testing – An introduction

[This post is part of my activities as M.Sc. Software Engineering student in University of Amsterdam]

Software testing is an important process during software development. Software developers and testers spend a lot of time and effort to create effective test cases and integrate the testing process in their workflow inside an organization. Although software testing is a great investment for the software’s quality and software’s life, creating test cases manually is a very time-consuming, high-cost and, most importantly, an error prone process. Since this is a common and considerable problem, there has been researched and developed methods that can automate the test generation and make the testing process effortless and reliable. Those methods can reduce the time and cost and increase the quality of the test cases set. A solution for automated software tests generation is given from the Search Based Software Testing which uses optimization algorithms, like genetic algorithms, simulated annealing, swarm optimization and more, to achieve the tests cases generation.

The term Search Based Software Engineering has been coined by Harman and Jones in 2001 [1], and is considered as an optimization discipline for software engineering. SBSE has applications on many software engineering areas such as Requirements engineering, Software metrics, Software project management, Automated Software Repair (automated bug fixes) and Software Testing. The first publication on Search-Based Software Testing was in 1976, from Webb Miller and David Spooner [4]. That work concerned test data generation which included a ‘cost-function’ (also called fitness function for meta-heuristics) for running a simple optimization process. It was a notable contribution to the area since it was different from the existing static methods of that time. The basic idea of the search based test data generation approach is that the set of possible inputs to the program forms a search space and the test adequacy criterion is coded as a fitness function [1]. Furthermore, the test in this situation is transformed into an optimization problem. The test object searches for test data that fulfils the respective test aim in the search space [6].

The simplest implementation of an optimization algorithm is a random search. This method is very poor on finding solution, especially when those solutions are widely spread to the search space [3]. A solution is provided from optimization algorithms and meta-heuristics which “guide” the solutions into the search space by using fitness functions which calculate the quality of the generated solutions [4]. A simple optimization algorithms is the Hill-Climbing. In that algorithm a solution starts at a random point and the points that are close to the current point are evaluated for their “quality” (fitness). If a better solution is found, then the algorithm moves to that point and the process is repeated until the best solution is found. The problem with this method is that when a space doesn’t offer a better solution then there may be a ‘local minima’ of that space. An alternative to simple Hill Climbing is Simulated Annealing. Search by Simulated Annealing is similar to Hill Climbing, except movement around the search space is less restricted [3] and therefore it provides the avoidance of local minimas by introducing a “jumping around” of the solution.

Although meta-heuristic methods like Hill-Climbing work for simple optimizations, software is a non-linear problem and the conversion of test aim to optimization problems mostly leads to complex, discontinuous, and non-linear search spaces. Evolutionary Algorithms have proved a very powerful optimization algorithm for software testing[6]. An important contribution was from Xanthakis et al who applied Genetic Algorithms to the problem [7]. Introducing genetic algorithms to such a problem helped a lot by expanding the search space and providing a form of ‘global’ search, sampling many points in the search space at once. Therefore, Evolutionary Testing implemented as a subfield of the SBT to apply genetic algorithms into the SBST problem.

Search-based optimization can be applied to many areas of testing, more specifically it requires the testing goal to be defined numerically. Therefore search based software testing is considered as a very good solution and many authors are trying to adapt that method [6]. The first application area of the SBT, taken from Miller and Spooner approach [4] is in the Structural Testing (White-Box). It is considered the most applicable and most researched area [3]. In Structural Testing the fitness functions are for path coverage, branch coverage, data flow coverage and more. The program that is under testing performs a code tracing process (Dynamic Structural Test) and is executed with inputs suggested by the meta-heuristic algorithm. The code instrumentation helps with the presence of loops and complex logic which makes it difficult to be analysed statically. The path that has been taken during the execution is compared with some structure of interest for which coverage is attempted to be found [3].

Another application area of SBT is the Temporal Testing in which the purpose is to  find the Best Case Execution Time (BCET) and the Worst Case Execution Time (WCET). This is very helpful for safety critical systems and embedded / real time systems. From reports, search based testing has good results in those kind of tests [8], [9]. The fitness function in this situation is the execution time of the software and it can be found by running it with some inputs. The genetic algorithm generates inputs and rates their quality through the fitness function. In the case of the BCET the search tries to find the minimum execution time, while in the case of WCET the search tries to find the longer execution time.

Although Functional Testing (Black-Box) has not that many publications [10], it is considered as an evolving area while many search techniques can be applied to this kind of testing including simulated annealing, genetic algorithms and particle swarm optimization. Particle swarm optimization is a “population based stochastic search technique” that is inspired by social metaphors of behavior and swarm theory. Functional testing describes the logical behaviour of a system. The fitness function rates the solutions based on how close they are to satisfying the conjuncts to each route. The solution generated tries to optimize this distance to the minimum [11].

Finally SBT can also be applied on the Gray-Box Testing  area which combines both structural and functional information. This area includes the following applied methods. Assertion Testing, is a method where the search tries to find test cases that violate assertion conditions which are inserted in the code by the developers. Another is the Exception Condition Testing in which the meta-heuristics search for inputs and test the run-time errors handling in the code (exceptions). The are many future references that can be applied in this kind of testing therefore it is considered as a growing area [4] [11].

Although the Search Based Software Engineering is not widely applied in the software industry, the Search Based Software Testing, as a sub-field, has been developed a lot all over the years with considerable contributions from both the academic and industry (mostly from embedded and real time systems) world. As it can be seen SBT can be applied on many areas including many kinds of testing and produce impressive results. In conclusion, the are many references and prospects of future work on this field that can help the software developers, testers and most importantly the manually testing process which is slow and painful for many organizations.

 

References

[1] Mark Harman, “The Current State and Future of SBSE”, Future of Software Engineering

(FOSE’07), IEEE Computer Society, 2007.

[2] Stefan Mairhofer, Robert Feldt, Richard Torkar, “Search-based Software Testing and Test Data Generation for a Dynamic Programming Language”, (GECCO’11), ACM, 2011.

[3] Phil McMinn, Search-Based Software Testing: Past, Present and Future, 4th International Workshop on Search-Based Software Testing Berlin Germany, March 2011

[4] W. Miller and D. Spooner, “Automatic generation of floating point test data,” IEEE Transactions on Software Engineering, vol. 2, no. 3, 1976.

[5]  M. Harman and J. Clark, “Metrics are fitness functions too,” in International Software Metrics Symposium (METRICS 2004). IEEE Computer Society, 2004.

[6] P. Maragathavalli, Search based software test data generation using evolutionary computation, International Journal of Computer Science & Information Technology (IJCSIT), Vol 3, No 1, Feb 2011.

[7] S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios, “Application of genetic algorithms to software testing (Application des algorithmes genetiques au test des logiciels),” in 5th International Conference on Software Engineering and its Applications, Toulouse, France, 1992

[8] P. Puschner and R. Nossal, “Testing the results of static worst-case execution-time analysis” Computer Society Press, 1998.

[9] J. Wegener, H. Sthamer, B. F. Jones, and D. E. Eyres, “Testing real-time systems using genetic algorithms” Software Quality Journal, vol. 6, no. 2, 1997.

[10] Raluca Lefticaru, Florentin Ipate, Functional Search-based Testing from State Machines, IEEE Computer Society, 2008.

[11] Phil McMinn, Search-based Software Test Data Generation: A Survey, Software Testing Verification and Reliability 14(2), 2004.