the testing curve

my learning curve in software testing

Testing maturity in an agile/CDT context

One day, during a team meeting at Joep’s previous job at a bank, the Team Manager of Testing listed a number of topics his testers could work on in the coming months. One of those topics was “testing maturity”. This topic was on the list not because this manager was such a fan of maturity models, but because the other team managers (Business Analysis and Development) had produced one for their own teams and higher management wanted one for testing as well. And although Joep saw little value in a classic five-tiered maturity model either, he was intrigued by the question: so what can you do with respect to maturity models that is of value?

Joep asked Huib to help him think of a way to create a valuable, context-driven way to work on maturity. Since Huib had been working for the same bank, they met and discussed the possibilities. Soon they found out that the criteria should be variable since maturity depends on context. They started experimenting with stack ranking and quite soon they had the first version of their “maturity model”.

After a first try-out at the bank where Joep worked, we let it rest for a while. A couple of months later we wrote this article. It is the first version and it needs to be refined and polished. The lists of heuristics are probably too long and need to be reduced. We think of this model as a card game that can be played with teams.

Currently we are also working on an agile version of this model: a card game for agile teams to assess their “maturity” and help them find possible areas for improvement. More about that later.

We are curious about your thoughts. What do you think? Maybe you want to try the game? Feel free to try it out. We hope you will share your experiences with us.

Article (pdf) – Card game (pdf)

Regression testing, it means less than you think

The past weeks I have made several attempts at a blog post about regression testing. About how we use it to refer to different things: tests running on a CI server, people executing test scripts, etc. And about how often the term really doesn’t mean much at all, yet nobody questions you when you use it: “What are you doing?” “Regression testing.” “Oh good, carry on.” The point of the post would be to argue we should use the term ‘regression testing’ a lot less, because most of the time we can be more specific without having to be more verbose.

However, the more I thought about (what I would qualify as) proper regression testing, the more I felt that regression versus progression (or progressive) testing is a distinction without a difference. One interesting observation in this regard is that “regression testing” returns 30 times more results on Google than “progression testing” and “progressive testing” combined. So what’s going on here if we have a dichotomy with one member producing so much more discussion than the other? And there’s more: regression testing is commonly contrasted with test types like functional testing and usability testing. But how then should I categorize a regression test focusing on functionality?(1)

Anyhow, more thinking followed and the result is what you are reading here. A blog post describing four different things you could call regression testing. The first two I don’t consider to be regression testing, the third one I find confusing and in the fourth I am reluctant to use the term at all.

Oh wait, I do have to provide a definition of regression testing for use within this blog post, so here you go: Regression testing is testing the things you don’t expect to observe any changes in.

Regression testing as part of continuous integration
Situation: your regression test is a bunch of automated checks running on your CI server.

Although I see great value in this, it’s debatable if this qualifies as testing. You commit your code and check if the build is green or red. If it goes red, you know there is a problem somewhere. If it stays green, you know the checks did not detect anything alarming and you move on. This is why people refer to these checks as change detectors(2). Because so far, no real testing is going on. On the other hand, you likely will do testing when you are investigating why the build went red. And testing was involved when you wrote the checks, and arguably when people decided how to run which checks on the CI server. So whether running these checks should be considered testing or not strongly depends on your choice of perspective.

More important for this blog post and referring back to the definition, the decision which checks to run has no relation to where we don’t expect to see any changes. The checks that are run are simply the checks that the CI server has been configured to run. And some of those checks are checks you changed or created because of the code changes in your commit. So there really is no consideration for changed versus unchanged areas here, which means there is no good reason to call running these checks regression testing.
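
To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Check` class, the check names and the area labels are made up for illustration and do not come from any real CI setup. It only shows that the set of checks a CI server runs is the same whatever changed, while the definition used in this post would require splitting the checks by where we expect changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    area: str  # hypothetical label for the part of the application the check covers


# Hypothetical CI configuration: the full list of automated checks.
CONFIGURED_CHECKS = [
    Check("test_login", "auth"),
    Check("test_invoice_total", "billing"),
    Check("test_search_results", "search"),
]


def checks_run_by_ci(changed_areas):
    """What the CI server runs: everything configured, whatever changed."""
    return list(CONFIGURED_CHECKS)


def split_by_expectation(changed_areas):
    """What the definition would need: a split by where changes are expected."""
    progression = [c for c in CONFIGURED_CHECKS if c.area in changed_areas]
    regression = [c for c in CONFIGURED_CHECKS if c.area not in changed_areas]
    return progression, regression
```

In this sketch `checks_run_by_ci` ignores its argument completely, which is exactly the point: the commit does not influence which checks are run.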

(Not directly relevant, but related and interesting: for a critique of the fully automated approach to regression testing, watch this short Whiteboard Testing video “Regression Testing, the F.A.R.T Model”.)

Regression testing with a suite of regression test scripts
Situation: you have a suite of test scripts you execute every time you perform a regression test.

The problem here is that you have the worst of both worlds: people trying to be like CI servers, executing the same test scripts every time, while that’s not what people are good at. So it’s time to make a choice: either go machine-centric and automate it all, or go human-centric and give people more freedom.(3) In any case, what’s going on here is not regression testing. It’s slow, error-prone change detection, again with no consideration of what has and what hasn’t been changed.

Regression testing by doing actual testing
Situation: you identify the things you don’t expect to observe any changes in and you test those.

This is the part where my thoughts become a little messy. Let’s start with some questions:
– How do you decide that something is a thing?(4)
– When deciding what is a thing and what isn’t, why do we so strongly prefer modules and interfaces as things?(5)
– How do we decide that a change in thing A should not have any effect on thing B?
– How do we know which observations are needed to decide that thing B is not affected by a change in thing A?

The answer to all of these questions is: a model. You need a model for all three activities: mapping out the application, deciding the impact of a change, and designing your tests. However, if my model predicts there are no changes, how can that same model inform my test design? It provides me with no information whatsoever about what to focus my testing on. Which shows that we should not be using the same model for all three activities. Even more, we should be using different models within all three activities.

However, that does not solve the problem of ‘regression testing’, the definition of which largely depends on the second activity: deciding where you don’t expect any changes. And this decision is fully dependent on which model you use. What is regression testing according to one model, might be progression testing to a different model. So actually the distinction between regression and progression testing is based on our lack of knowledge and/or imagination to predict a change.
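
A small Python sketch of that dependence. The two models and the area names are invented for illustration; each model simply maps a changed area to the areas it predicts will be affected. The same test of the same area gets a different label depending on which model you consult.

```python
# Two hypothetical models of the same application. Each maps a changed
# area to the set of areas that model predicts will be affected.
MODEL_MODULES = {"checkout": {"checkout"}}
MODEL_WORKFLOW = {"checkout": {"checkout", "invoicing"}}


def classify(test_area, changed_area, model):
    """Label a test using this post's definition of regression testing."""
    # If the model has no entry, assume only the changed area is affected.
    affected = model.get(changed_area, {changed_area})
    return "progression" if test_area in affected else "regression"


print(classify("invoicing", "checkout", MODEL_MODULES))   # regression
print(classify("invoicing", "checkout", MODEL_WORKFLOW))  # progression
```

Nothing about the test or the change differs between the two calls. Only the model does.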

And this is where my understanding of ‘regression testing’ starts to break down. If you are testing a refactored feature, are you regression testing or progression testing? The code changed, so that’s progression testing, but the behaviour of the feature hasn’t, so you’re regression testing. If you are testing a feature and you find no bugs in the feature as such, but you do find a regression bug related to the feature, would you say the feature works? If so, how does that even make sense? If one part of a workflow changes and I am testing the whole workflow, am I progression or regression testing? Or does that depend on where I find a bug, if any?

Ok, enough questions. Let’s go over this one more time. A change was made in area A and none of my models predict an impact on area B. Why would I want to regression test area B? Two possible reasons: either I do not sufficiently trust my judgement, or area B is so important that my judgement doesn’t matter. So what I am actually saying when I want to regression test area B is that I have identified a low-risk area that’s important enough that I want to test it. But then… how is this different from testing in general? Why would it need a separate term? Well, that’s why we have a fourth and last section in this blog post.

Regression testing without regression testing
Situation: whenever you test something, you consider what needs testing and you test it. Which is to say: you’re testing a feature, a release, or a whatever, and you decide what testing will give you the information you’re looking for.

This is what in my opinion we should be doing: test the things that are important enough to test. The notion ‘regression testing’ is not relevant in that discussion.

Does this mean that the term ‘regression testing’ is meaningless within this approach? No, it doesn’t. It does mean that ‘regression testing’ is a lot less important as a concept. You might be testing a feature, focusing on usability. And you might not expect to find any issues, because it’s a regression test, i.e. you don’t expect the changes that were made, to have had any impact. Then it makes perfect sense to use the term, but at best ‘regression test’ is the fourth thing you mention, because only in the context of those other three things, it has any meaning.

— — —

(1) I’m definitely not the first one to notice this peculiarity, see e.g. Arborosa’s blog post about regression testing: “such a definition […] only provides the intention of this type, or should I say activity, of testing. Regression testing could still encompass any other testing type in practice.”

(2) I am not sure who first used the term ‘change detector’ in this way. The oldest reference I could find is from 2005: Regression testing by Cem Kaner and James Bach.

(3) Just to be clear: machine-centric will involve humans and human-centric will involve machines. The question is: what’s the focus? Humans supporting machines (i.e. machine-centric) or machines supporting humans (i.e. human-centric). And of course, until our robotic overlords are here, at the highest level what you do will be human-centric. (Afterwards too, actually, I hope.)

(4) Some further explanation for the less philosophically inclined: according to a city map, streets and buildings are things, but people and the weather are not. If your application is a city, what kind of map would you draw? What elements would it include and which would it omit?

(5) There’s a more general blog post hidden in this question, expanding into the philosophical: why are most Western ontologies focused on things instead of processes?

Why the testing/checking debate is so messy – a fruit salad analogy

Five days ago James Thomas posted the following in the Software Testing & Quality Assurance group on LinkedIn:

Are Testing and Checking different or not?
This article by Paul Gerrard explains why we shouldn’t be trying to draw a distinction between checking and testing, but should be paying more attention to the skills of the testers we employ to do the job.

I posted a reply there, but I think I can do better than those initial thoughts, so here we go.

Let’s imagine the following scene: Alice and Bob are preparing a fruit salad together.
Alice: “Ok, let’s make a nice fruit salad. We need some apples and some fruit.”
Bob: “Euh, aren’t apples fruit?”
Alice: “Yes. Of course. But when I say ‘fruit’, I mean ‘non-apple fruit’.”
Bob: “So you don’t think that an apple is fruit?”
Alice: “No, I do. It’s just when I say ‘fruit’, I want to focus on the non-apple fruit.”
Bob: “Uhuh. So fruit is stuff like bananas, pears and pomegranate?”
Alice: “Exactly. And that would actually make a great fruit salad: apple and those three fruits.”
Bob: “Ok, but what if I feel like having a fruit salad. And it turns out that I only have apples and bananas at home and I don’t have time to go to the store. And, importantly, I really really don’t like bananas. So I decide to only use apple. That’s still a fruit salad, right?”
Alice: “I suppose so, technically, but still… a fruit salad without any non-apple fruit… I mean, everyone puts apples in their fruit salad and there’s so much more fruit than just apples! So when I say ‘fruit’, I just really want to focus on the non-apple fruit, ok?”
Bob: “Ok, fine. Glad we cleared that up. One more question though: what about tomatoes?”
Alice: “Don’t. Just don’t.”

Now read this piece of dialogue again and replace ‘apple’ with ‘checking’ and ‘fruit’ with ‘testing’. Bob’s confusion is exactly the reason why the whole testing/checking debate is messy: most of the time it’s about testing *versus* checking. You can see it in the title of the LinkedIn post: “Are testing and checking different or not?” You can see it in Paul Gerrard’s article: “[…] the James Bach & Michael Bolton demarcation of or distinction between ‘testing versus checking’.” You can see it in Cem Kaner’s article: “According to the new doctrine, “checking” is the opposite of “testing” and therefore, automated tests that check against expectations are not only not sapient, they are not tests.” You can also see it in the original “Testing vs. Checking” blog post by Michael Bolton dated August 2009. It’s right there in the title. Do take note however, that this post has been retired and we are directed to the new version “Testing and Checking Refined”. However, the new version still contains a sub-title “Checking vs. Testing”.

“Testing and Checking Refined” also contains a helpful diagram, that’s key to the point I want to make in this post. The diagram shows us that there’s one overarching category ‘testing’ (fruit), which contains two things: ‘checking’ (apples) and all other testing activities (non-apple fruit). This helps us to understand two things.

First of all, it shows that any discussion about testing *versus* checking is bullshit. They are on different conceptual levels, just like fruit and apples, so any direct comparison is meaningless. To throw in one more analogy, what would you answer if, while visiting a friend at his home, you were asked: “Would you like coffee, or something to drink?”

Secondly, it explains why my previous point is difficult to get(1). The diagram presents people with two concepts: ‘testing’ and ‘checking’. Of course there’s also “learning by experimenting, including study, questioning, modeling, observation, inference, etc.”, but that’s just too vague to register mentally as an entity. It does not coalesce into a concept. What we’re left with are only two concepts, ‘testing’ and ‘checking’, and the non-checking part of testing is gone. This is actually illustrated by the title of James Bach’s and Michael Bolton’s blog post: “Testing and Checking Refined”.

So when you present two concepts in this way, is it really that surprising that people talk about them like apples and oranges, instead of like apples and fruit? I think not.

— — —

(1) Including for myself. See for instance how I’m struggling with this very problem in my blog post from August: “What’s the word for the part of testing that’s not checking?”