Three arguments against the verification-validation dichotomy

Last week while talking with two colleagues, one of them mentioned the verification/validation thing. And I noticed it made me feel uneasy. Because I know well enough what is meant by the distinction, but on a practical level I simply can’t relate to it. When I think about what I do as a software tester and how verification versus validation applies to it, nothing happens. Blank mind. Crickets. Tumbleweed.
So after giving it some thought, I present you with three arguments against the verification-validation dichotomy.

First of course, we have the obligatory interlude of defining these two terms. A place to start is the Wikipedia page on Software verification and validation. Unfortunately it contains conflicting definitions, so if anyone cares enough, please do fix. Luckily there’s also the general Verification and validation page of Wikipedia, which gives us (among others) the tl;dr version of the distinction:
– Verification: Are we building the product right?
– Validation: Are we building the right product?
Finally there’s the ISTQB glossary v2.4 that borrows from ISO 9000:
– Verification: Confirmation by examination and through provision of objective evidence that specified requirements have been fulfilled.
– Validation: Confirmation by examination and through provision of objective evidence that the requirements for a specific intended use or application have been fulfilled.

Now on to the three arguments.

1. It screams V-model, silos and contracts.
When talking about verification vs validation, there is the (often implicit) assumption that we do verification first and then validation. That makes sense, if you ask someone else to build something for you. The developer in question can more easily verify (Am I building what I was asked to build?) than validate (Is this fit for my client’s purpose?).
The next step is to realize that in such cases client and developer are most likely in different departments or perhaps even companies. So we need to decide who’s responsible for what, who will pay for what, etc. And we might as well write that down somewhere in some sort of agreement or contract. The most sensible way to divide responsibilities, is to make the developers responsible for building what was designed and thus verification. The client will be responsible for the validation during what is often called acceptance tests. (Admittedly, there may be some verification first in those acceptance tests.)
In this context, one of assigning responsibilities within a contract, it makes sense to distinguish between verification and validation. As a result (bonus?) you enter the domain of utterances like: “That’s not a bug, that’s a change request.” and “Hey, works as designed.” (Or as some developers I know, lacking a proper design document, said: “Hey, works as built.”)
As a tester doing actual testing, however, I honestly don’t care. I don’t ask: Am I doing validation or verification here? I do ask: Could there be a problem here?
Oh, and as a member of a sufficiently agile team, I don’t think I care that much either.

2. There are more sources for test ideas.
Looking at the definitions there are only two sources for test ideas: “the specified requirements” (verification) and “the requirements for a specific intended use or application” (validation). Contrast this with the Little Black Book on Test Design by Rikard Edgren which contains 37 [sic] sources for test ideas divided into 6 categories: Product, Business, Team, Project, Stakeholders, External.
There are several ways you could try to map each of these 37 sources to either verification or validation and perhaps you would even succeed. However, does that make sense for test idea sources like rumors, product image or debt? More importantly, why limit yourself to a list with two items when you can use one with thirty-seven?

3. What about asking non-confirmatory questions?
Both verification and validation are defined as a form of confirmation (just reread the definitions, it’s the first word in there). They’re focused on the question: how might the product work? Yet in testing, there are two other questions that are at least as interesting: How might the product not work? and What can we make the product do?
My favorite description of the job of a software tester is: making software do interesting things. (Sometimes I still find it hard to believe you can get paid for that.) Some things are interesting because they show how the product might work; other things because they show how the product might not work. And yet other things are interesting because we’re not quite sure if it falls into the working or not-working category, but we really should make an effort to find out.
Within the verification-validation distinction, however, this whole area of exploration, investigation and experimentation is nowhere to be found. All we do is confirm against requirements.

And with that I say goodbye to another concept of traditional testing methodologies, because it has no use to me. So goodbye to both of you, verification and validation. Am surprised it lasted as long as it did.

The test case – an epistemological deconstruction

(This article was first published in Dutch in TestNet Nieuws 18. The article below is a translation with minor changes. Many thanks to Joris Meerts and Ruud Cox for reviewing the original version.)

Testing as an information problem

Testing is an information problem. We are in search of certain information, of an answer to the question: does this application fulfill the relevant explicit and implicit expectations? The exact way in which we can answer this question, however, is not immediately clear. First we will need to decide which questions to ask, how to ask them and how to evaluate the responses. Hence, testing is an information problem. For the traditional test methodologies (ISTQB and TMap being the most well-known) the test case is a large part of the solution. So let’s take this solution apart epistemologically and see what it is we have in front of us. If the traditional test case is our solution, what information does a test case contain? What changes occur after executing it? And also, where is the understanding in all of this that’s happening? In this article, I will first describe how a typical test case is created and how it is used. Then we shall take a look at which kinds of information a test case contains. Finally, we will analyze where the understanding of what happens during testing, is present and where it is not.

Creation and usage of the test case

To begin with let’s find out what the traditional test methodologies have to say about creating and using test cases. Because of the philosophical nature of this article, I will only look at what these methodologies describe and ignore possible pragmatic deviations.

Test basis

A test case is created starting from the test basis. In the test basis the expectations with regards to the application are documented. Most likely not all (but close to all) explicit expectations are present. And note that some of these explicit expectations have only become so during the documentation process. Besides the explicit expectations the test basis also contains a number of implicit expectations: expectations of which you can deduce the existence based on the explicit expectations present in the test basis. As a consequence the implicit expectations in the test basis will deviate from the ‘actual’ implicit expectations, for they are based on a different set of explicit expectations. To summarize, the test basis is not a copy, but a model of the expectations of the application. E-TS-TB-TDT

Test design technique

To create a test case one uses the test design techniques selected in the test strategy. Like the test basis, the test strategy is a transformation, a model of the explicit and implicit expectations of the application. While this is a fairly straightforward transformation in the case of the test basis (documenting expectations), it’s more complex for the test strategy. Besides expectations about the behaviour of the application, the test strategy also takes risks into account. The combination of these two models (test basis and test strategy) by means of test design techniques results in the third model: the collection of test cases. Obviously the same applies here as with the test basis: there will not be a one-to-one relation between the test cases and the actual expectations about the application. Even more, there won’t be a one-to-one relation between the expectations documented by the test basis and the expectations documented by the test cases. Some information will be lost, some will be gained. It would be interesting to explore how all these elements (actual expectations, test basis, test strategy and set of test cases) influence each other, but unfortunately I have to leave that out of scope for this article.

Test coverage matrix

One test design technique – and I hope we’re using more than one – results in multiple test cases. Most of the time we group these test cases into for instance a test script to make test execution easier. This makes it difficult to keep track to which part (or parts) of the test basis each test case relates. The solution to this is creating a coverage matrix (aka traceability matrix): a table that documents these relations.

Test case

A test case consists of two parts: on the one hand input (test data and steps to be executed) and preconditions, on the other expected output and postconditions. More precise would be to say “expected input and preconditions”. Setting aside the question if the executing tester correctly identifies the preconditions and correctly enters the input, there is the fact that it’s no more than an expectation of ours that it’s possible to enter the specific input of the test case under the preconditions described in the test case. Until we actually try to execute the test case and see that it can be done, it is no more than that, an expectation. The same applies to the processing of the input by the application. Hence the wavy lines in the illustration. A test case is our expectation based on the best knowledge we have when creating the test case, but that knowledge has not been tested yet. We don’t truly know anything about the application we are planning to test, until we actually test it. TestCase

Test result

When we execute the test case, we check the preconditions, enter the input and compare the actual output with the expected output and postconditions. Based on that we decide: ‘pass’ or ‘fail’. This moment is the first time the expectations that lead to a to-be-tested application come into direct contact with the expectations that lead to a set of test cases. The result is documented in these test cases as a series of green checks and red crosses, a series of passes and fails.

Types of information in a test case

Now we have described what a test case is (a possible solution to an information problem), it’s time to look at what information is present in a test case. We can distinguish the following four types (indicated by black numbers in the illustration): 1. How the application is supposed to work; 2. How the application actually behaves; 3. Why this test case was created; 4. What has been tested. Let’s go over these one by one.

How the application is supposed to work

The information on how the application is supposed to work is present in the test case as such: the input, the expected output, the preconditions, the postconditions. As said earlier it’s important to realize that when we create the test case, we don’t know yet how the application actually behaves. We work based on expectations, also when determining the input and the preconditions. Of some expectations we are quite certain, of others less so. This results in an interesting tension within the expectations about how the application is supposed to work: at what moment are you certain enough of a particular expectation to accept it as input and/or precondition of a test case? Another question is what information is lost when transforming the test basis by means of a test design technique to a number of test cases. We lose the implicit expectations present in the test basis in exchange for the implicit expectations present in the test cases. This exactly is both the strength and the weakness of test design techniques: they allow us to hone in on certain specific expectations; that there is also a loss in information we just have to accept. Another thing we lose in this transformation is the structure of the test basis, the relations between its parts. Often we try to compensate this loss with a coverage matrix: how does the structure of the test basis relate to the structure of the test cases? TestCase_EpsitemDeconstr

How the application actually behaves

During test execution we begin to discover how the application actually behaves. The expectations are tested against the application. One way to think about what happens is by means of John Boyd’s OODA-loop: Observe – Orient – Decide – Act. We execute the test case and go through each of the four phases: we see the output (Observe), we interpret our observations (Orient), on which we base our evaluation (Decide) and finally we do something (Act). (see illustration) For a test case the evaluation is all about the question: Is there a problem here? Does the output conform to the expected output or not? Since the test case describes the expected output, it also is the oracle, the mechanism based on which we decide if there is a problem or not. The test case describes what you should expect to see as output; if you don’t see it, there’s a problem. The thought processes of the tester during test execution – how we observe, how we orient, which decision we take – are thus for a large part determined in advance by the test case we have in front of us. Even more, the OODA-loop is not really a loop. After executing a test case, the tester will not go through an OODA-loop to determine which test case is to be executed next. The next case has been prepared already, it’s simply the next one on the stack.

Why this test case

Each test case exists for a reason. It was created because the test strategy determined a certain test design technique needed to be applied on the test basis. Or put differently, if we think of strategy/tactics/operations (see illustration), it’s the test strategy that describes the strategic level of our testing. The tactical level, however, which connects the test strategy with the actual testing, isn’t described explicitly anywhere. It’s hidden in our choice of test design techniques. The test operations, finally, are described in our test cases. This means that the reason of existence for a particular test case isn’t described or documented explicitly. We have to actively interpret the test case, the test design technique and the test strategy to reconstruct that reason. The big question here is how closely this reconstruction resembles the original reasoning.

What has been tested

The question of what has been tested, can be asked on several different levels. On the level of the test case this question can be answered fairly easily: a test case has been fully executed or not, it passed or it failed. Answering this question on a higher level immediately becomes much more difficult. As just mentioned, the test tactics are not described explicitly. To get to the strategy we will have to make that leap ourselves. That leap as such can be made, but it prevents us from talking about the test strategy on a different level than either the details of the test cases or the abstraction of the test strategy. There is nothing in between. A possible solution is to use a test coverage matrix. However, it’s a limited solution. In the end this matrix does nothing more than link the expectations from the test cases to the expectations from the test basis. Although that does give us another angle, it does not bring our thinking to another level. So the gain is limited. So both of these approaches (linking test cases to either test strategy or to test basis) bring along their own share of problems. Perhaps that’s why there is a third and easier solution: having faith in the work that has been done earlier.

Where is the understanding?

If we now take a step back to get a good overview, one thing that stands out is the dispersedness of information. Information is less available, not as easily accessible, as we would want it to be. (See my earlier post on information debt for some more thoughts on this.) Not only that, the understanding of what and how we are testing, is equally dispersed. Strategy and operations are separated by the implicit tactics of test design techniques. In the test operations the middle part of the OODA-loop, orientation and decision, have been separated from the other two elements, observation and action. The first two are part of test design; the latter two of test execution. And in fact the observation is strongly directed by test design. So only the action as such (marking the test case as passed or failed) happens completely inside the test execution activities. All in all this reminds me strongly of the Chinese room, a thought experiment by John Searle. A man is sitting in a room and he receives pieces of paper with Chinese characters on them. He has a big book with rules about what Chinese characters he has to write in response, depending on the pieces of paper he receives. Now, in fact, the pieces he receives contain questions and the characters he writes down are the correct answers. To an outside observer it would look that the person inside the room knows Chinese, yet this is not the case. So the question is: where is the knowledge, the understanding of Chinese? It’s not in the man and it’s not in the book. A possible answer is that the understanding resides in the system as a whole, in the man together with the book. A similar argument can be made about testing based on test cases. It’s impossible to point at one thing or person that understands the whole: from test strategy to tactics to planned and executed operations. This understanding is present however in the complete system consisting of test strategy, test design techniques, test coverage matrix, test cases, test results and people. If this is a problem or not will depend on how we evaluate the complexity of the information problem that is testing. With the ironic twist that the bigger we estimate the complexity, the more necessary but also the more difficult it will be to avoid this dispersedness of information – or at least limit it sufficiently.