Being skeptical: Test Driven Development

Introduction
Test-driven development (TDD) is an agile practice introduced by Kent Beck, or, as he puts it, "rediscovered", in his book "Test-Driven Development: By Example" [Beck, 2002]. Its structure is simple, although a bit controversial because of the way it affects the development process. TDD is organized as a development cycle that always starts with writing tests, followed by writing the implementation code.

How it works
Indeed, the first thing to do is make a list of the tests you have to write in order to achieve the goal of your software. The idea behind this is that it makes you think about the requirements and the specifications of the software before you write any code. The next step is to run these tests and get the "red light", meaning that they should fail. This ensures that the tests really work and validates the test harness. With the tests in place, the following step is to write some code: since you broke a test, you need to fix it! TDD suggests writing the simplest code possible, just enough to pass the tests. In particular, Kent Beck [Beck, 2002] suggests "fake it till you make it", which allows even returning constant values in order to make a test succeed. As a result, you will never implement something that you are not going to need (YAGNI), but you will implement "just enough" to pass the tests. Once the tests and the implementation are in place, you run the tests again and get the "green light", meaning that all the code passes the tests. Now you need to step back and think about it. You started with failing tests in the morning, you wrote the least possible code at lunch time, and in the evening you have a fully working and tested program. Doesn't this make you more comfortable? Doesn't it make you feel more productive? Apparently, that is the point. Of course, you don't feel entirely comfortable, because even if your code passes the tests, you know that you forced it to. For this reason, the next step is to refactor the code. This is the time to apply your favorite software quality standards and principles, and having the design and the documentation already in place makes it a lot easier. Finally, since new implementation always implies new tests, you go back to step one and follow the development cycle again.
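To make the cycle concrete, here is a minimal sketch of one red-green iteration using Python's unittest; the Roman-numeral converter `roman` is a hypothetical example of mine, not one taken from Beck's book.

```python
import unittest

# Step 1 (red): the test is written first, before any implementation.
# Run against an empty module it fails, which validates the harness.
class TestRoman(unittest.TestCase):
    def test_one(self):
        self.assertEqual(roman(1), "I")

# Step 2 (green): "fake it till you make it" -- the simplest code that
# passes, even a hard-coded constant, is acceptable at this stage.
def roman(n):
    return "I"

# Step 3: a new test (e.g. roman(2) == "II") breaks the fake, forcing a
# real implementation and a refactoring pass, and the cycle restarts.

if __name__ == "__main__":
    unittest.main()
```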

Benefits
As can be seen from its definition, the benefits of TDD are numerous, from software design to the developer's psychology, and from software requirements to testing with high coverage. Personally, I believe this technique can often lead to more reliable software. To begin with, when a feature is finished, managers usually push developers to move straight on to the next one, since the first one "works". However, the feature is not fully tested, which eventually affects its safety. Using TDD, you ensure that the feature is already tested with high coverage. Beyond that, TDD significantly improves the design. As Ward Cunningham puts it, "test-first coding is not a testing technique". Kent Beck likewise argues [Beck, 2001] that TDD offers more than simple validation of correctness: it can also drive the design of a program. TDD is thus an analysis and design technique, since writing tests before the code requires considering the design of the solution. In my opinion, it is a good way to avoid the "analysis paralysis" effect. It forces you to think about the design, which is a top-down process, while writing tests, which is a bottom-up process. Consequently, it introduces a middle-out process that can positively affect the tests, the design process, and the developers' productivity.

Limitations
So now that your confidence and your test coverage are high and your designs are good, it is time to mention that TDD does not always work. As presented in the previous paragraph, TDD "ensures" reliability, but apparently not always. Most of the time, the same developers who write the tests also write the code. Because of that, the chance of leaving "blind spots" in the implementation is high, since developers are biased towards confidence in their own code's correctness: they think that everything has been tested and feel comfortable about it. There are, however, several techniques that mitigate this problem, for instance test coverage frameworks that analyze to what extent the code has been tested, or random testing that checks cases you never thought about. In addition, there is always the case where the requirements were elicited the wrong way; then you have a perfectly working and tested piece of software, but the wrong software. Furthermore, as [Oram and Wilson, 2010] and [Beck, 2001] note, TDD is not applicable in cases that require a lot of functional testing, for example GUI testing.

Additionally, there is no recommended best context for the use of TDD. It cannot be used as a framework for specific cases and, when applied, it cannot always be guaranteed to be the best option, because TDD is difficult to learn. It involves a steep learning curve that requires skill, maturity and time [Oram and Wilson, 2010]. From my experience, you need to invest a lot of time to learn TDD, and even more to practice it before applying it; developing some personal "heuristics" and patterns that make your workflow easier will help. Despite that, once you grasp it, it is easily applicable and provides all the aforementioned advantages.

Eventually, if not TDD itself, some of its variants help a lot in many difficult situations. One of these variants is Acceptance Test-Driven Development (ATDD). Whereas TDD is a tool for well-written units of code, ATDD is a communication tool for well-defined requirements. I believe that applying ATDD is a good practice: it does not affect the quality of your code, it is not very time-consuming (most of the time, at least, since you usually test only the APIs), and it can validate the requirements, which is a weak spot of TDD. I personally use ATDD a lot. Alongside ATDD there is also BDD, Behavior-Driven Development, which focuses on tests that describe behavior. Empirically speaking, there is evidence [Oram and Wilson, 2010] suggesting that, on the one hand, TDD strongly favors external quality and the modularization of an application, but on the other hand it hurts productivity. Because of the steep learning curve, it initially decreases productivity and introduces a lot of overhead. I believe, however, that it significantly increases productivity over time, because the end result is software that is both running and tested.

Conclusion
Test-Driven Development is a very controversial topic in the software process community. Its costs are usually unknown, which makes it an ambiguous technique to adopt. Online you can find advocates on both sides, favoring or opposing TDD with numerous success and failure stories. In the end, it is a nice idea that provides great insights. As agile suggests, "adjust": try it, and adapt it to your workflow. If it does not work, change it.

References
[Beck, 2001] Beck, K. (2001). Aim, fire. IEEE Software, 18(5):87–89.

[Beck, 2002] Beck, K. (2002). Test-Driven Development: By Example. Boston, Massachusetts: Addison-Wesley Professional.

[Oram and Wilson, 2010] Oram, A. and Wilson, G. (2010). Making Software: What Really Works, and Why We Believe It. O'Reilly Media, Inc.

Search Based Software Testing – An introduction

[This post is part of my activities as M.Sc. Software Engineering student in University of Amsterdam]

Software testing is an important process during software development. Developers and testers spend a lot of time and effort creating effective test cases and integrating the testing process into their workflow inside an organization. Although software testing is a great investment in the software's quality and life, creating test cases manually is a time-consuming, high-cost and, most importantly, error-prone process. Since this is a common and considerable problem, methods have been researched and developed that automate test generation and make the testing process effortless and reliable. Such methods can reduce time and cost and increase the quality of the test suite. One solution for automated test generation is given by Search-Based Software Testing (SBST), which uses optimization algorithms, such as genetic algorithms, simulated annealing, swarm optimization and more, to drive test case generation.

The term Search-Based Software Engineering (SBSE) was coined by Harman and Jones in 2001 [1], and it denotes an optimization discipline for software engineering. SBSE has applications in many software engineering areas, such as requirements engineering, software metrics, software project management, automated software repair (automated bug fixes) and software testing. The first publication on search-based software testing was in 1976, by Webb Miller and David Spooner [4]. That work concerned test data generation and included a 'cost function' (also called a fitness function in meta-heuristics) driving a simple optimization process. It was a notable contribution to the area, since it differed from the static methods existing at the time. The basic idea of search-based test data generation is that the set of possible inputs to the program forms a search space and the test adequacy criterion is coded as a fitness function [1]. Testing is thereby transformed into an optimization problem: the tester searches the space for test data that fulfils the respective test aim [6].

The simplest implementation of an optimization algorithm is random search. This method is very poor at finding solutions, especially when they are sparsely spread across the search space [3]. A better approach is provided by optimization algorithms and meta-heuristics, which "guide" the search through the space using fitness functions that score the quality of the generated solutions [4]. A simple optimization algorithm is Hill Climbing. The search starts at a random point, and the points close to the current one are evaluated for their fitness. If a better solution is found, the algorithm moves to that point and the process repeats until no neighbour improves on the current solution. The problem with this method is that the point where the search stops may be only a local optimum of the space. An alternative to simple Hill Climbing is Simulated Annealing. Search by Simulated Annealing is similar to Hill Climbing, except that movement around the search space is less restricted [3]: by occasionally letting the solution "jump around", it avoids getting trapped in local optima.
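As an illustration, here is a minimal sketch of both searches in Python, under the toy assumption that the goal is to find an input reaching the branch `if x == 4242:` and that the fitness to minimize is the branch distance |x - 4242|; all constants are arbitrary choices of mine.

```python
import math
import random

# Toy target: find an input x that takes the branch `if x == 4242:`.
# The fitness to minimise is the branch distance |x - 4242| -- a stand-in
# for whatever test adequacy criterion is coded as the fitness function.
def fitness(x):
    return abs(x - 4242)

def hill_climb(x, steps=100_000):
    for _ in range(steps):
        best = min((x - 1, x + 1), key=fitness)   # evaluate the neighbours
        if fitness(best) >= fitness(x):
            return x                  # stuck: a (possibly local) optimum
        x = best                      # move to the better neighbour
    return x

def simulated_annealing(x, temp=1000.0, cooling=0.999):
    while temp > 0.01:
        candidate = x + random.choice((-1, 1)) * random.randint(1, 100)
        delta = fitness(candidate) - fitness(x)
        # Unlike hill climbing, a worse move may still be accepted, with a
        # probability that shrinks as the temperature cools -- this is the
        # "jumping around" that lets the search escape local optima.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        temp *= cooling
    return x

print(hill_climb(random.randint(-10_000, 10_000)))          # reaches 4242
print(simulated_annealing(random.randint(-10_000, 10_000))) # lands near 4242
```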

Although meta-heuristics like Hill Climbing work for simple optimizations, software is a non-linear artifact, and converting test aims into optimization problems mostly leads to complex, discontinuous, non-linear search spaces. Evolutionary algorithms have proved to be very powerful optimizers for software testing [6]. An important contribution came from Xanthakis et al., who applied genetic algorithms to the problem [7]. Genetic algorithms helped a lot by providing a form of "global" search that samples many points of the search space at once. Evolutionary Testing thus emerged as the subfield of SBST that applies genetic algorithms to the test generation problem.
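A minimal genetic-algorithm sketch for the same toy goal might look as follows; the population size, the averaging crossover and the mutation range are illustrative assumptions of mine, not prescriptions from the literature.

```python
import random

# Same toy goal as before: evolve an integer input that covers the branch
# `if x == 4242:`; fitness (lower is better) is the branch distance.
def fitness(x):
    return abs(x - 4242)

def evolve(pop_size=50, generations=200):
    population = [random.randint(-10_000, 10_000) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half as parents (truncation selection).
        population.sort(key=fitness)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = (a + b) // 2                   # crossover: parent average
            if random.random() < 0.2:              # mutation: random nudge
                child += random.randint(-100, 100)
            children.append(child)
        population = children
    return min(population, key=fitness)

print(evolve())   # samples many points of the search space per generation
```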

Search-based optimization can be applied to many areas of testing; all it requires is that the testing goal be defined numerically. For this reason, search-based software testing is considered a very good solution, and many authors are trying to adopt the method [6]. The first application area of SBST, going back to Miller and Spooner's approach [4], is structural (white-box) testing. It is considered the most applicable and most researched area [3]. In structural testing the fitness functions target path coverage, branch coverage, data-flow coverage and more. The program under test is instrumented with code tracing (dynamic structural testing) and executed with inputs suggested by the meta-heuristic algorithm. The instrumentation helps in the presence of loops and complex logic, which are difficult to analyse statically. The path taken during execution is compared with the structure of interest for which coverage is sought [3].
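For instance, a simple branch-distance style fitness, in the spirit of the functions surveyed in [11], could be sketched like this; the program `under_test`, the failure constant K and the instrumented predicates are illustrative assumptions.

```python
# A common branch-distance scheme: each predicate is instrumented so that
# "how close was this input to taking the target branch?" is observable at
# run time. K is a small constant added when the predicate is false.
K = 1

def distance_eq(a, b):      # target branch: a == b
    return abs(a - b)

def distance_lt(a, b):      # target branch: a < b
    return 0 if a < b else (a - b) + K

# Hypothetical program under test: the search wants the inner branch.
def under_test(x, y):
    if x < 10:
        if y == x * 2:
            return "target covered"
    return "missed"

def branch_fitness(x, y):
    # Sum of the distances along the path to the structure of interest:
    # the closer to zero, the closer the input is to covering the branch.
    return distance_lt(x, 10) + distance_eq(y, x * 2)

print(branch_fitness(50, 0))   # far from the branch
print(branch_fitness(3, 6))    # 0 -> branch covered
```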

Another application area of SBST is temporal testing, whose purpose is to find the Best-Case Execution Time (BCET) and the Worst-Case Execution Time (WCET). This is very helpful for safety-critical systems and embedded / real-time systems, and search-based testing is reported to perform well on these kinds of tests [8], [9]. The fitness function in this situation is the execution time of the software, measured by running it on given inputs. The genetic algorithm generates inputs and rates their quality through this fitness function: for BCET the search tries to find the shortest execution time, while for WCET it tries to find the longest.
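A sketch of the idea, with a hypothetical `under_test` function and plain random search standing in for the genetic algorithm, could be:

```python
import random
import time

# Hypothetical function under test: its running time depends on the input.
def under_test(n):
    total = 0
    for i in range(n % 5000):
        total += i * i
    return total

# For WCET hunting, the fitness of an input is simply its measured
# execution time; the search keeps the slowest inputs (for BCET, the
# fastest). A real approach would evolve inputs rather than sample them.
def execution_time(n):
    start = time.perf_counter()
    under_test(n)
    return time.perf_counter() - start

candidates = [random.randint(0, 100_000) for _ in range(200)]
worst = max(candidates, key=execution_time)
print(worst, execution_time(worst))
```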

Although functional (black-box) testing has not attracted as many publications [10], it is considered an evolving area, and many search techniques can be applied to it, including simulated annealing, genetic algorithms and particle swarm optimization. Particle swarm optimization is a "population based stochastic search technique" inspired by social metaphors of behavior and swarm theory. Functional testing describes the logical behaviour of a system; the fitness function rates candidate solutions by how close they come to satisfying the conjuncts along each route, and the search tries to minimize this distance [11].
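A minimal particle swarm optimization sketch is shown below; the quadratic "requirement" and all parameter values (inertia w, acceleration coefficients c1 and c2) are illustrative assumptions, not values from [11].

```python
import random

# Toy functional requirement: find an input x for which the observed
# behaviour matches the required one, i.e. x^2 - 2x - 8 == 0; the fitness
# is the distance of the observed behaviour from the requirement.
def fitness(x):
    return abs(x * x - 2 * x - 8)     # 0 at the required inputs x = 4, x = -2

def pso(n_particles=30, iterations=200, w=0.7, c1=1.5, c2=1.5):
    pos = [random.uniform(-100, 100) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                            # per-particle best position
    global_best = min(pos, key=fitness)          # swarm-wide best position
    for _ in range(iterations):
        for i in range(n_particles):
            # Velocity update: inertia + pull towards personal and swarm bests.
            vel[i] = (w * vel[i]
                      + c1 * random.random() * (best_pos[i] - pos[i])
                      + c2 * random.random() * (global_best - pos[i]))
            pos[i] += vel[i]
            if fitness(pos[i]) < fitness(best_pos[i]):
                best_pos[i] = pos[i]
                if fitness(pos[i]) < fitness(global_best):
                    global_best = pos[i]
    return global_best

print(pso())   # converges near an input satisfying the requirement
```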

Finally, SBST can also be applied to gray-box testing, which combines structural and functional information. Two methods have been applied in this area. In assertion testing, the search tries to find test cases that violate assertion conditions inserted into the code by the developers. In exception condition testing, the meta-heuristic searches for inputs that exercise the run-time error handling (exceptions) in the code. There are many directions for future work in this kind of testing, so it is considered a growing area [4] [11].
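As a toy illustration of assertion testing, the sketch below descends a distance-based fitness until a developer-inserted assertion is violated; `divide_budget` is a hypothetical program of mine.

```python
import random

# Hypothetical code under test, with a developer-inserted assertion.
def divide_budget(total, people):
    assert people != 0, "people must be non-zero"
    return total / people

# The search hunts for an input that violates the assertion, guided by a
# fitness measuring the distance from the violating condition people == 0.
def fitness(people):
    return abs(people)

x = random.randint(-1000, 1000)
while fitness(x) > 0:                 # simple descent towards the violation
    x = min((x - 1, x + 1), key=fitness)

try:
    divide_budget(100, x)
except AssertionError as exc:
    print("assertion violated with people =", x, "->", exc)
```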

Although Search-Based Software Engineering is not yet widely applied in the software industry, Search-Based Software Testing, as a sub-field, has developed considerably over the years, with notable contributions from both academia and industry (mostly from the embedded and real-time systems world). As shown above, SBST can be applied to many areas and many kinds of testing and produce impressive results. In conclusion, there are many references to and prospects for future work in this field, which can help software developers, testers and, most importantly, replace the manual testing process that is slow and painful for many organizations.


References

[1] Mark Harman, "The Current State and Future of Search Based Software Engineering", Future of Software Engineering (FOSE'07), IEEE Computer Society, 2007.

[2] Stefan Mairhofer, Robert Feldt, Richard Torkar, "Search-based Software Testing and Test Data Generation for a Dynamic Programming Language", Genetic and Evolutionary Computation Conference (GECCO'11), ACM, 2011.

[3] Phil McMinn, "Search-Based Software Testing: Past, Present and Future", 4th International Workshop on Search-Based Software Testing, Berlin, Germany, March 2011.

[4] W. Miller and D. Spooner, "Automatic generation of floating point test data", IEEE Transactions on Software Engineering, vol. 2, no. 3, 1976.

[5] M. Harman and J. Clark, "Metrics are fitness functions too", International Software Metrics Symposium (METRICS 2004), IEEE Computer Society, 2004.

[6] P. Maragathavalli, "Search based software test data generation using evolutionary computation", International Journal of Computer Science & Information Technology (IJCSIT), vol. 3, no. 1, Feb 2011.

[7] S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios, "Application of genetic algorithms to software testing (Application des algorithmes genetiques au test des logiciels)", 5th International Conference on Software Engineering and its Applications, Toulouse, France, 1992.

[8] P. Puschner and R. Nossal, "Testing the results of static worst-case execution-time analysis", IEEE Computer Society Press, 1998.

[9] J. Wegener, H. Sthamer, B. F. Jones, and D. E. Eyres, "Testing real-time systems using genetic algorithms", Software Quality Journal, vol. 6, no. 2, 1997.

[10] Raluca Lefticaru, Florentin Ipate, "Functional Search-based Testing from State Machines", IEEE Computer Society, 2008.

[11] Phil McMinn, "Search-based Software Test Data Generation: A Survey", Software Testing, Verification and Reliability, 14(2), 2004.