the testing curve

my learning curve in software testing

Regression testing, it means less than you think

The past weeks I have made several attempts at a blog post about regression testing. About how we use it to refer to different things: tests running on a CI server, people executing test scripts, etc. And about how often the term really doesn’t mean much at all, yet nobody questions you when you use it: “What are you doing?” “Regression testing.” “Oh good, carry on.” The point of the post would be to argue we should use the term ‘regression testing’ a lot less, because most of the time we can be more specific without having to be more verbose.

However, the more I thought about (what I would qualify as) proper regression testing, the more I felt that regression versus progression (or progressive) testing is a distinction without difference. One interesting observation in this regard is that “regression testing” returns 30 times more results on Google than “progression testing” and “progressive testing” combined. So what’s going on here if we have a dichotomy with one member producing so much more discussion than the other? And there’s more: regression testing is commonly contrasted with test types like functional testing and usability testing. But how then should I categorize a regression test focusing on functionality?(1)

Anyhow, more thinking followed and the result is what you are reading here. A blog post describing four different things you could call regression testing. The first two I don’t consider to be regression testing, the third one I find confusing and in the fourth I am reluctant to use the term at all.

Oh wait, I do have to provide a definition of regression testing for use within this blog post, so here you go: Regression testing is testing the things you don’t expect to observe any changes in.

Regression testing as part of continuous integration
Situation: your regression test is a bunch of automated checks running on your CI server.

Although I see great value in this, it’s debatable if this qualifies as testing. You commit your code and check if the build is green or red. If it goes red, you know there is a problem somewhere. If it stays green, you know the checks did not detect anything alarming and you move on. This is why people refer to these checks as change detectors(2). Because so far, no real testing is going on. On the other hand, you likely will do testing when you are investigating why the build went red. And testing was involved when you wrote the checks, and arguably when people decided how to run which checks on the CI server. So whether running these checks should be considered testing or not strongly depends on your choice of perspective.

More important for this blog post and referring back to the definition, the decision about which checks to run has no relation to where we don’t expect to see any changes. The checks that are run are simply the checks that the CI server has been configured to run. And some of those checks are checks you changed or created because of the code changes in your commit. So there really is no consideration for changed versus unchanged areas here, which means there is no good reason to call running these checks regression testing.
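To make the ‘change detector’ idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the function price_with_vat, the check names and the baseline values are assumptions, not taken from any real CI setup. The point is only that the detector runs whatever checks it has been configured with and reports green or red, without any notion of which areas were changed.

```python
from typing import Callable, Dict


def detect_changes(checks: Dict[str, Callable[[], object]],
                   baseline: Dict[str, object]) -> Dict[str, bool]:
    """Run every configured check and report whether its result still
    matches the recorded baseline (True means: no change detected)."""
    return {name: check() == baseline[name] for name, check in checks.items()}


def build_status(results: Dict[str, bool]) -> str:
    """Collapse all check results into the single signal a CI server shows."""
    return "green" if all(results.values()) else "red"


# Hypothetical function under test (an assumption for this example).
def price_with_vat(net: float) -> float:
    return round(net * 1.21, 2)


# The checks are simply whatever has been configured; nothing here knows
# which parts of the code were touched by the latest commit.
checks = {
    "vat_on_100": lambda: price_with_vat(100.0),
    "vat_on_0": lambda: price_with_vat(0.0),
}
baseline = {"vat_on_100": 121.0, "vat_on_0": 0.0}

status = build_status(detect_changes(checks, baseline))
```

If the build goes red, the detector has done its job; finding out why is where the testing starts.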

(Not directly relevant, but related and interesting: for a critique of the fully automated approach to regression testing, watch this short Whiteboard Testing video “Regression Testing, the F.A.R.T Model“.)

Regression testing with a suite of regression test scripts
Situation: you have a suite of test scripts you execute every time you perform a regression test.

The problem here is that you have the worst of both worlds: people trying to be like CI servers, executing the same test scripts every time, while that’s not what people are good at. So it’s time to make a choice: either go machine-centric and automate it all, or go human-centric and give them more freedom.(3) In any case, what’s going on here is not regression testing. It’s slow, error-prone change detection, again with no consideration of what has and what hasn’t been changed.

Regression testing by doing actual testing
Situation: you identify the things you don’t expect to observe any changes in and you test those.

This is the part where my thoughts become a little messy. Let’s start with some questions:
– How do you decide that something is a thing?(4)
– When deciding what is a thing and what isn’t, why do we so strongly prefer modules and interfaces as things?(5)
– How do we decide that a change in thing A should not have any effect on thing B?
– How do we know which observations are needed to decide that thing B is not affected by a change in thing A?

The answer to all of these questions is: a model. You need a model for all three activities: mapping out the application, deciding the impact of a change, and designing your tests. However, if my model predicts there are no changes, how can that same model inform my test design? It provides me with no information whatsoever about what to focus my testing on. Which shows that we should not be using the same model for all three activities. Even more, we should be using different models within all three activities.

However, that does not solve the problem of ‘regression testing’, the definition of which largely depends on the second activity: deciding where you don’t expect any changes. And this decision is fully dependent on which model you use. What is regression testing according to one model, might be progression testing to a different model. So actually the distinction between regression and progression testing is based on our lack of knowledge and/or imagination to predict a change.
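A small sketch in Python of how the label depends on the model. The area names (checkout, invoicing, search) and both impact models are hypothetical, invented purely for this example. Each model maps a changed area to the areas it predicts will be affected; anything the model does not predict a change for gets the label ‘regression’.

```python
from typing import Dict, Set

# Hypothetical impact model: maps a changed area to the other areas
# the model predicts will be affected by that change.
ImpactModel = Dict[str, Set[str]]


def classify(model: ImpactModel, changed: str, areas: Set[str]) -> Dict[str, str]:
    """Label each area 'progression' if the model predicts a change there
    and 'regression' if it does not."""
    predicted = model.get(changed, set()) | {changed}
    return {
        area: "progression" if area in predicted else "regression"
        for area in sorted(areas)
    }


areas = {"checkout", "invoicing", "search"}

# Model 1 only looks at module boundaries: a change stays inside its module.
module_model: ImpactModel = {"checkout": set()}

# Model 2 also follows the data flowing from checkout into invoicing.
data_flow_model: ImpactModel = {"checkout": {"invoicing"}}

by_module = classify(module_model, "checkout", areas)
by_data_flow = classify(data_flow_model, "checkout", areas)
```

With the same change to checkout, testing invoicing is regression testing according to the first model and progression testing according to the second. Nothing about the application differs between the two; only the model does.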

And this is where my understanding of ‘regression testing’ starts to break down. If you are testing a refactored feature, are you regression testing or progression testing? The code changed, so that’s progression testing, but the behaviour of the feature hasn’t, so you’re regression testing. If you are testing a feature and you find no bugs in the feature as such, but you do find a regression bug related to the feature, would you say the feature works? If so, how does that even make sense? If one part of a workflow changes and I am testing the whole workflow, am I progression or regression testing? Or does that depend on where I find a bug, if any?

Ok, enough questions. Let’s go over this one more time. A change was made in area A and none of my models predict an impact on area B. Why would I want to regression test area B? Two possible reasons: either I do not sufficiently trust my judgement, or area B is so important that my judgement doesn’t matter. So what I am actually saying when I want to regression test area B is that I have identified a low-risk area that’s important enough that I want to test it. But then… how is this different from testing in general? Why would it need a separate term? Well, that’s why we have a fourth and last section in this blog post.

Regression testing without regression testing
Situation: whenever you test something, you consider what needs testing and you test it. Which is to say: you’re testing a feature, a release, or a whatever, and you decide what testing will give you the information you’re looking for.

This is what in my opinion we should be doing: test the things that are important enough to test. The notion ‘regression testing’ is not relevant in that discussion.

Does this mean that the term ‘regression testing’ is meaningless within this approach? No, it doesn’t. It does mean that ‘regression testing’ is a lot less important as a concept. You might be testing a feature, focusing on usability. And you might not expect to find any issues, because it’s a regression test, i.e. you don’t expect the changes that were made to have had any impact. Then it makes perfect sense to use the term, but at best ‘regression test’ is the fourth thing you mention, because only in the context of those other three things does it have any meaning.

— — —

(1) I’m definitely not the first one to notice this peculiarity, see e.g. Arborosa’s blog post about regression testing: “such a definition […] only provides the intention of this type, or should I say activity, of testing. Regression testing could still encompass any other testing type in practice.”

(2) I am not sure who first used the term ‘change detector’ in this way. The oldest reference I could find is from 2005: Regression testing by Cem Kaner and James Bach.

(3) Just to be clear: machine-centric will involve humans and human-centric will involve machines. The question is: what’s the focus? Humans supporting machines (i.e. machine-centric) or machines supporting humans (i.e. human-centric). And of course, until our robotic overlords are here, at the highest level what you do will be human-centric. (Afterwards too, actually, I hope.)

(4) Some further explanation for the less philosophically inclined: according to a city map, streets and buildings are things, but people and the weather are not. If your application is a city, what kind of map would you draw? What elements would it include and which would it omit?

(5) There’s a more general blog post hidden in this question, expanding into the philosophical: why are most Western ontologies focused on things instead of processes?

Why the testing/checking debate is so messy – a fruit salad analogy

Five days ago James Thomas posted the following in the Software Testing & Quality Assurance group on LinkedIn:

Are Testing and Checking different or not?
This article by Paul Gerrard explains why we shouldn’t be trying to draw a distinction between checking and testing, but should be paying more attention to the skills of the testers we employ to do the job.

I posted a reply there, but I think I can do better than those initial thoughts, so here we go.

Let’s imagine the following scene: Alice and Bob are preparing a fruit salad together.
Alice: “Ok, let’s make a nice fruit salad. We need some apples and some fruit.”
Bob: “Euh, aren’t apples fruit?”
Alice: “Yes. Of course. But when I say ‘fruit’, I mean ‘non-apple fruit’.”
Bob: “So you don’t think that an apple is fruit?”
Alice: “No, I do. It’s just when I say ‘fruit’, I want to focus on the non-apple fruit.”
Bob: “Uhuh. So fruit is stuff like bananas, pears and pomegranate?”
Alice: “Exactly. And that would actually make a great fruit salad: apple and those three fruits.”
Bob: “Ok, but what if I feel like having a fruit salad. And it turns out that I only have apples and bananas at home and I don’t have time to go to the store. And, importantly, I really really don’t like bananas. So I decide to only use apple. That’s still a fruit salad, right?”
Alice: “I suppose so, technically, but still… a fruit salad without any non-apple fruit… I mean, everyone puts apples in their fruit salad and there’s so much more fruit than just apples! So when I say ‘fruit’, I just really want to focus on the non-apple fruit, ok?”
Bob: “Ok, fine. Glad we cleared that up. One more question though: what about tomatoes?”
Alice: “Don’t. Just don’t.”

Now read this piece of dialogue again and replace ‘apple’ with ‘checking’ and ‘fruit’ with ‘testing’. Bob’s confusion is exactly the reason why the whole testing/checking debate is messy: most of the time it’s about testing *versus* checking. You can see it in the title of the LinkedIn post: “Are testing and checking different or not?” You can see it in Paul Gerrard’s article: “[…] the James Bach & Michael Bolton demarcation of or distinction between ‘testing versus checking’.” You can see it in Cem Kaner’s article: “According to the new doctrine, “checking” is the opposite of “testing” and therefore, automated tests that check against expectations are not only not sapient, they are not tests.” You can also see it in the original “Testing vs. Checking” blog post by Michael Bolton dated August 2009. It’s right there in the title. Do take note however, that this post has been retired and we are directed to the new version “Testing and Checking Refined“. However, the new version still contains a sub-title “Checking vs. Testing”.

“Testing and Checking Refined” also contains a helpful diagram, that’s key to the point I want to make in this post. The diagram shows us that there’s one overarching category ‘testing’ (fruit), which contains two things: ‘checking’ (apples) and all other testing activities (non-apple fruit). This helps us to understand two things.

First of all, it shows that any discussion about testing *versus* checking is bullshit. They are on a different conceptual level, just like fruit and apples, so any direct comparison is meaningless. To throw in one more analogy, what would you answer when you were asked while visiting a friend at his home: “Would you like coffee, or something to drink?”

Secondly, it explains why my previous point is difficult to get(1). The diagram presents people with two concepts: ‘testing’ and ‘checking’. Of course there’s also “learning by experimenting, including study, questioning, modeling, observation, inference, etc.”, but that’s just too vague to register mentally as an entity. It does not coalesce into a concept. What we’re left with are only two concepts, ‘testing’ and ‘checking’, and the non-checking part of testing is gone. This is actually illustrated by the title of James Bach’s and Michael Bolton’s blog post: “Testing and Checking Refined”.

So when you present two concepts in this way, is it really that surprising that people talk about them like apples and oranges, instead of like apples and fruit? I think not.

— — —

(1) Including for myself. See for instance how I’m struggling with this very problem in my blog post from August: “What’s the word for the part of testing that’s not checking?”

Two styles of leadership in spreading context-driven testing (TITANconf)

The last weekend of August I spent with some great people – Kristoffer Ankarberg (@KrisAnkarberg), Kristoffer Nordström (@kristoffer_nord), Anna Brunell (@Anna_Brunell), Fredrik Thuresson (@Thure98), Maria Kedemo (@mariakedemo), Henrik Andersson (@henkeandersson), Maria Månsson, Amy Philips (@ItJustBroke), Richard Bradshaw (@FriendlyTester), Duncan Nisbet (@DuncNisbet), Alexandru Rotaru (@altomalex), Oana Casapu, Simon Schrijver (@SimonSaysNoMore), Zeger Van Hese (@TestSideStory), Helena Jeret-Mäe (@HelenaJ_M), Aleksis Tulonen (@al3ksis), Anders Dinsen (@andersdinsen) – at the awesome TITAN peer conference in Karlskrona, Sweden.

During the conference we discussed leadership and testing and on Sunday morning I got the opportunity to tell my story(1). (I do wish I had captured more of the discussion afterwards to include in this blog post.)

The first style
When thinking about my own leadership in testing, one of the first things that comes to mind is my attempts to influence my colleagues at work (testers, developers, project managers) to become more context-driven in their attitude towards testing.

Personally, I discovered context-driven testing at a time I was wondering if I wanted to be in testing at all. I had been working as a tester for about two years and a certain fatigue had set in: “Is this what I want to do the rest of my life?” One of the things I did to find an answer was to learn more about testing. So I searched the internet, discovered context-driven testing and after reading both James Bach’s and Michael Bolton’s blogs from the oldest post to the most current one, I was totally into software testing. To me context-driven testing was a huge discovery: through it I found my passion for software testing.

After that I wanted to share that passion. I talked to people and gave them pointers to blogs, books, conferences. I explained concepts, etc. And that is the first style of leadership I employed: reaching out. And although I have good enough manners not to be too pushy(2), part of my intention really was to make other people ‘see the light’, to convert them to context-driven testing. It shouldn’t come as a big surprise that my successes have been slim to none. Although I did influence some people, the bottom line is that in the end the scoreboard says “Conversions: 0”.

An Oriental excursion
Whenever I realised that my efforts didn’t seem to lead anywhere, I felt disappointed and just gave up for a while. As bad as that may sound, it has led me to discover a second style(3). Instead of reaching out, it keeps to itself and there isn’t really such a thing as ‘success’ the way there is in the first style. To be honest, at first I wasn’t sure if I had just found fancier words for ‘giving up’, but when I saw an analogy with how koryu present themselves to the outside, I realized it’s more than a rebranded towel thrown in the ring.

Koryu is the name for the old martial arts of Japan, with ‘old’ meaning that they originated sometime between 1400 and 1868(4). Where modern martial arts are very much about what you can get out of them (physical fitness, confidence-building, self defence, …), the attitude of koryu is quite different(5). Dave Lowry did a great job describing this attitude in his article “So You Want To Join The Ryu?“. (‘Ryu’ means ‘school’.) The first sentence of that article reads: “I don’t care about you.” After which he explains that what he does care about is the ryu. So it’s very much a case of “Ask not what your country can do for you – ask what you can do for your country.”

Does this mean it’s incredibly hard to join a ryu? Well, not exactly. It’s just that you’ll have to make an effort. First of all in finding one: few if any actively look for new members. Next there may be some prerequisites. A fairly obvious one is having read about koryu, so you know what you’re getting into. Finally there’s your attitude: sincerity, politeness and patience will get you a long way. So basically, all that is expected of you, is to show good manners and put in some work.
The reasons why koryu have adopted this attitude are many, so I will focus on just one that’s relevant to this blog post. A ryu is a family, a tight-knit community, passing on an heirloom, a body of knowledge and skills, from generation to generation. Both of these explain why it’s difficult for an outsider to just jump in and participate. You don’t just join a family and you don’t just get to lay your hands on an heirloom that has been passed on for several hundreds of years.

The second style
Where a ryu is both a family and an heirloom, context-driven testing is three things: a paradigm (or school of thought), a community and an approach. Put this way, the analogy is fairly obvious: both are communities focused around a body of knowledge and skills. Of course there are also differences. The most obvious one was pointed out by Duncan: we as context-driven testers do share our ‘heirloom’ openly with the world through blogs, at conferences, in discussions. (We also defend it fiercely when it comes under attack – sometimes too fiercely according to some.) However, I do think the analogy is strong enough to ask: what would a koryu-like style of leadership in spreading context-driven testing look like?

The basis of this style is doing your thing. You practice context-driven testing: you apply it and you work to get better at it(6).
Then, you leave crumbs. This can be anything: referring to a book or conference, sharing a blog post, mentioning a certain concept – as long as it’s something the other person can follow up on, if he or she is interested. Because that’s the main point: you don’t try to reach out to someone, you just show there’s something there. And whether or not the other person does something with that crumb, doesn’t matter to you. It’s all up to them.
Finally, if someone does pick up a crumb, if someone shows curiosity and puts effort into finding out more, you reward them. You give them a bigger crumb. You engage more. And perhaps curiosity wanes after that and perhaps the cycle repeats itself. If so, every time you engage a little more, you invest a little more. But the point is that instead of you reaching out, it’s the other person pulling him or herself in. You’re just there to give directions in case someone is looking for them.

In closing
Of course, reality is a little more messy than what I presented here. I’ve influenced people in different ways outside of work, for instance through this blog or by speaking at conferences. I still find myself switching back and forth between the two styles. And I need more practice in not caring. But I do think the second style suits me better than the first. It saves me from disappointments and it gives other people more freedom to find their own path.

— — —

(1) The slides can be found here.
(2) Do let me know if I’m mistaken here.
(3) This style has some similarities to the honey-badger style, mentioned by Henrik Andersson, which he and Ilari Henrik Aegerter identified.
(4) That’s not very descriptive, but fully explaining would take a full blog post at minimum.
(5) There are quite a few different ryu or schools, each with their own character. So please take note that all my generalizations are by definition wrong, because generalizations.
(6) I didn’t mention this explicitly in my presentation, but someone (forgot who, sorry) commented that part four should be ‘practice’. To me, it’s part of doing your thing.