Your CI/CD pipeline does not run regression tests

CI/CD pipelines

The purpose of a CI/CD pipeline is to allow you to deliver small changes in a fast and controlled way. Without any tests in your pipeline you would gain a lot of speed. You’d also lose a lot of control, which is why people in general do run tests in their pipeline. The purpose of the tests at each stage is to check whether that stage meets the minimum level of acceptable quality for that stage.

For example, commit stage tests will consist of mostly unit tests, a few integration tests, and even fewer end-to-end tests, because early in the pipeline speed is more important than comprehensiveness. When I commit my changes, I want the results fast enough so that I will wait for them – ready to fix any issue that might occur.
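As a rough sketch of what that trade-off can look like in practice (this assumes a Python project using pytest markers; the marker names, commands and stage names are purely illustrative, not a recommendation of any particular CI product):

```python
# A minimal, hypothetical sketch of per-stage test selection.
# Fast checks run on every commit; slower, more comprehensive ones run later.
import subprocess

# Mostly unit tests: fast enough that I will wait for the result.
COMMIT_STAGE = ["pytest", "-m", "not integration and not e2e"]
# Fewer, slower tests: here comprehensiveness matters more than speed.
ACCEPTANCE_STAGE = ["pytest", "-m", "integration or e2e"]

def stage_passes(command):
    """Run one stage's test selection; a non-zero exit code fails the stage."""
    return subprocess.run(command).returncode == 0

if __name__ == "__main__":
    if stage_passes(COMMIT_STAGE):
        print("Commit stage green: promote the build to the next stage.")
    else:
        print("Commit stage red: stop and fix before doing anything else.")
```

The split itself is the point, not the tooling: whatever mechanism you use, each stage runs the subset of tests that matches its position in the speed-versus-comprehensiveness trade-off.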

Regression testing

There are many definitions of regression testing, as you can read in Arborosa’s blog post on the topic. I have always defined regression testing along the lines of “testing the parts that weren’t impacted by a change to see if they really weren’t impacted.” (Which is really weird if you start thinking about it: whether something is regression testing depends on your knowledge of the system and the change.)

The tests in your pipeline are regression tests, …

Most of the tests that run in your pipeline are regression tests. Your commits are small and you have a lot of tests, so most of those will cover parts of the system that shouldn’t have been impacted by your changes. So yes, regression tests.

The one exception is when your commit contains both changes and new or updated tests related to that change. For that one run of the pipeline those tests are not regression tests; from the next commit onwards they are.
Or, since you ran those tests before committing, perhaps they have already become regression tests by the time the pipeline executes them?

Sidenote:
A grey area is when your commit is a pure refactoring, as in: you didn’t even have to change any of the tests. On the one hand, you made a change, so the tests covering that change are not regression tests. On the other hand, at the level at which these tests are defined there should be zero impact: they shouldn’t detect any changes. So in that sense they are regression tests.

…, but that’s irrelevant.

So sure, the tests run by your pipeline are regression tests. However, they are regression tests incidentally, not essentially. They happen to be regression tests, but that’s not really relevant.

To see why, we need to revisit the start of this blog post.

The purpose of a regression test is to check if unchanged parts of the system are indeed unchanged. It’s the testing that got a name, so we could distinguish it from the other testing, which never really got a name. (Progression testing? Feature testing?) It’s the testing you do after sufficient testing and fixing, when you’re not expecting any more changes and you need to check if all the “other stuff” still works.

The purpose of a test in a CI/CD pipeline is to check the level of quality at a particular stage in the pipeline. The pipeline stages, combined with all the practices that surround them, result in a continuous delivery of changes that can be deployed to production. Whether the tests at a particular stage are regression tests or not doesn’t matter. What does matter is whether they provide the information required to decide if we should proceed to the next stage or not.

And that’s why I claim that your CI/CD pipeline does not run regression tests. The definition of “regression test” may technically apply to the tests run by your pipeline; the context that comes with the term does not. So although it might (mostly) be correct to say that your pipeline runs regression tests, doing so is not helpful in how you think about your pipeline or about your tests. It moves your mind towards thinking about changed versus unchanged things – drawing it away from the continuous delivery of a good enough product.

Update August 6th:
After publishing this post, I got the following question on Twitter: so how does this impact actual decisions? In response, I came up with four things you might do if you think of the tests in your pipeline as regression tests:
1. Not looking for regressions when exploratory testing because you already have so many regression tests.
2. Poorly designing the stages of the pipeline, because all it needs to do is just run those regression tests.
3. Doing exploratory testing too early in the pipeline, because you should do feature testing before regression testing.
4. Being lenient towards a failed pipeline, because “they’re just regressions, we can fix them later.”

— — —

p.s. 1: One thing I’m glossing over is that your CI/CD pipeline can (should) have stages in which the testing involves a human. I don’t think it makes a difference for my argument. Yet I’m still conveniently limiting the scope of this post to the literal interpretation of “Your CI/CD pipeline does not run regression tests”.

p.s. 2: None of the ideas in this blog post are new, which you can see from the replies in the Twitter thread that led me to writing this blog post.

Regression testing, it means less than you think

The past weeks I have made several attempts at a blog post about regression testing. About how we use it to refer to different things: tests running on a CI server, people executing test scripts, etc. And about how often the term really doesn’t mean much at all, yet nobody questions you when you use it: “What are you doing?” “Regression testing.” “Oh good, carry on.” The point of the post would be to argue we should use the term ‘regression testing’ a lot less, because most of the time we can be more specific without having to be more verbose.

However, the more I thought about (what I would qualify as) proper regression testing, the more I felt that regression versus progression (or progressive) testing is a distinction without a difference. One interesting observation in this regard is that “regression testing” returns 30 times more results on Google than “progression testing” and “progressive testing” combined. So what’s going on here if we have a dichotomy with one member producing so much more discussion than the other? And there’s more: regression testing is commonly contrasted with test types like functional testing and usability testing. But how then should I categorize a regression test focusing on functionality?(1)

Anyhow, more thinking followed and the result is what you are reading here. A blog post describing four different things you could call regression testing. The first two I don’t consider to be regression testing, the third one I find confusing, and in the fourth I am reluctant to use the term at all.

Oh wait, I do have to provide a definition of regression testing for the purposes of this blog post, so here you go: Regression testing is testing the things you don’t expect to observe any changes in.

Regression testing as part of continuous integration
Situation: your regression test is a bunch of automated checks running on your CI server.

Although I see great value in this, it’s debatable whether this qualifies as testing. You commit your code and check if the build is green or red. If it goes red, you know there is a problem somewhere. If it stays green, you know the checks did not detect anything alarming and you move on. This is why people refer to these checks as change detectors(2). Because so far, no real testing is going on. On the other hand, you will likely do testing when you are investigating why the build went red. And testing was involved when you wrote the checks, and arguably when people decided which checks to run on the CI server and how. So whether running these checks should be considered testing or not strongly depends on your choice of perspective.

More important for this blog post, and referring back to the definition: the decision about which checks to run has no relation to where we don’t expect to see any changes. The checks that are run are simply the checks that the CI server has been configured to run. And some of those checks are checks you changed or created because of the code changes in your commit. So there really is no consideration of changed versus unchanged areas here, which means there is no good reason to call running these checks regression testing.
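To make that contrast concrete, here is a purely hypothetical sketch (every name in it is invented) of what change-aware selection would look like. The point of the paragraph above is that a typical CI configuration does nothing of the sort: it runs the same fixed set of checks no matter what changed.

```python
# Hypothetical sketch of change-aware test selection - NOT how a typical
# CI server behaves. A normal pipeline simply runs whatever it was
# configured to run, regardless of the commit's contents.
from typing import Dict, List, Set

# Invented mapping from source modules to the test modules that cover them.
COVERAGE_MAP: Dict[str, List[str]] = {
    "billing.py": ["tests/test_billing.py"],
    "orders.py": ["tests/test_orders.py", "tests/test_checkout.py"],
}

def tests_for_change(changed_files: List[str]) -> Set[str]:
    """Select only the tests covering the changed files. By construction these
    target the areas we DO expect to be impacted, so they are not regression tests."""
    selected: Set[str] = set()
    for path in changed_files:
        selected.update(COVERAGE_MAP.get(path, []))
    return selected

def tests_for_ci(changed_files: List[str]) -> Set[str]:
    """What the CI server effectively does: ignore the change and run everything."""
    return {test for tests in COVERAGE_MAP.values() for test in tests}
```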

(Not directly relevant, but related and interesting: for a critique of the fully automated approach to regression testing, watch this short Whiteboard Testing video “Regression Testing, the F.A.R.T Model”.)

Regression testing with a suite of regression test scripts
Situation: you have a suite of test scripts you execute every time you perform a regression test.

The problem here is that you have the worst of both worlds: people trying to be like CI servers, executing the same test scripts every time, while that’s not what people are good at. So it’s time to make a choice: either go machine-centric and automate it all, or go human-centric and give people more freedom.(3) In any case, what’s going on here is not regression testing. It’s slow, error-prone change detection, again with no consideration of what has and what hasn’t been changed.

Regression testing by doing actual testing
Situation: you identify the things you don’t expect to observe any changes in and you test those.

This is the part where my thoughts become a little messy. Let’s start with some questions:
– How do you decide that something is a thing?(4)
– When deciding what is a thing and what isn’t, why do we so strongly prefer modules and interfaces as things?(5)
– How do we decide that a change in thing A should not have any effect on thing B?
– How do we know which observations are needed to decide that thing B is not affected by a change in thing A?

The answer to all of these questions is: a model. You need a model for all three activities: mapping out the application, deciding the impact of a change, and designing your tests. However, if my model predicts there are no changes, how can that same model inform my test design? It provides me with no information whatsoever about what to focus my testing on. Which shows that we should not be using the same model for all three activities. Even more, we should be using different models within all three activities.

However, that does not solve the problem of ‘regression testing’, the definition of which largely depends on the second activity: deciding where you don’t expect any changes. And this decision is fully dependent on which model you use. What is regression testing according to one model might be progression testing according to a different model. So actually the distinction between regression and progression testing is based on our lack of the knowledge and/or imagination needed to predict a change.

And this is where my understanding of ‘regression testing’ starts to break down. If you are testing a refactored feature, are you regression testing or progression testing? The code changed, so that’s progression testing, but the behaviour of the feature hasn’t, so you’re regression testing. If you are testing a feature and you find no bugs in the feature as such, but you do find a regression bug related to the feature, would you say the feature works? If so, how does that even make sense? If one part of a workflow changes and I am testing the whole workflow, am I progression or regression testing? Or does that depend on where I find a bug, if any?

Ok, enough questions. Let’s go over this one more time. A change was made in area A and none of my models predict an impact on area B. Why would I want to regression test area B? Two possible reasons: either I do not sufficiently trust my judgement, or area B is so important that my judgement doesn’t matter. So what I am actually saying when I want to regression test area B is that I have identified a low-risk area that’s important enough that I want to test it. But then… how is this different from testing in general? Why would it need a separate term? Well, that’s why we have a fourth and last section in this blog post.

Regression testing without regression testing
Situation: whenever you test something, you consider what needs testing and you test it. Which is to say: you’re testing a feature, a release, or whatever, and you decide what testing will give you the information you’re looking for.

This is what in my opinion we should be doing: test the things that are important enough to test. The notion ‘regression testing’ is not relevant in that discussion.

Does this mean that the term ‘regression testing’ is meaningless within this approach? No, it doesn’t. It does mean that ‘regression testing’ is a lot less important as a concept. You might be testing a feature, focusing on usability. And you might not expect to find any issues, because it’s a regression test, i.e. you don’t expect the changes that were made to have had any impact. Then it makes perfect sense to use the term, but at best ‘regression test’ is the fourth thing you mention, because only in the context of those other three things does it have any meaning.

— — —

(1) I’m definitely not the first one to notice this peculiarity; see e.g. Arborosa’s blog post about regression testing: “such a definition […] only provides the intention of this type, or should I say activity, of testing. Regression testing could still encompass any other testing type in practice.”

(2) I am not sure who first used the term ‘change detector’ in this way. The oldest reference I could find is from 2005: Regression testing by Cem Kaner and James Bach.

(3) Just to be clear: machine-centric will involve humans and human-centric will involve machines. The question is: what’s the focus? Humans supporting machines (i.e. machine-centric) or machines supporting humans (i.e. human-centric). And of course, until our robotic overlords are here, at the highest level what you do will be human-centric. (Afterwards too, actually, I hope.)

(4) Some further explanation for the less philosophically inclined: according to a city map, streets and buildings are things, but people and the weather are not. If your application is a city, what kind of map would you draw? What elements would it include and which would it omit?

(5) There’s a more general blog post hidden in this question, expanding into the philosophical: why are most Western ontologies focused on things instead of processes?

Why the testing/checking debate is so messy – a fruit salad analogy

Five days ago James Thomas posted the following in the Software Testing & Quality Assurance group on LinkedIn:

Are Testing and Checking different or not?
This article by Paul Gerrard explains why we shouldn’t be trying to draw a distinction between checking and testing, but should be paying more attention to the skills of the testers we employ to do the job.

I posted a reply there, but I think I can do better than those initial thoughts, so here we go.

Let’s imagine the following scene: Alice and Bob are preparing a fruit salad together.
Alice: “Ok, let’s make a nice fruit salad. We need some apples and some fruit.”
Bob: “Euh, aren’t apples fruit?”
Alice: “Yes. Of course. But when I say ‘fruit’, I mean ‘non-apple fruit’.”
Bob: “So you don’t think that an apple is fruit?”
Alice: “No, I do. It’s just when I say ‘fruit’, I want to focus on the non-apple fruit.”
Bob: “Uhuh. So fruit is stuff like bananas, pears and pomegranate?”
Alice: “Exactly. And that would actually make a great fruit salad: apple and those three fruits.”
Bob: “Ok, but what if I feel like having a fruit salad. And it turns out that I only have apples and bananas at home and I don’t have time to go to the store. And, importantly, I really really don’t like bananas. So I decide to only use apple. That’s still a fruit salad, right?”
Alice: “I suppose so, technically, but still… a fruit salad without any non-apple fruit… I mean, everyone puts apples in their fruit salad and there’s so much more fruit than just apples! So when I say ‘fruit’, I just really want to focus on the non-apple fruit, ok?”
Bob: “Ok, fine. Glad we cleared that up. One more question though: what about tomatoes?”
Alice: “Don’t. Just don’t.”

Now read this piece of dialogue again and replace ‘apple’ with ‘checking’ and ‘fruit’ with ‘testing’. Bob’s confusion is exactly the reason why the whole testing/checking debate is messy: most of the time it’s about testing *versus* checking. You can see it in the title of the LinkedIn post: “Are testing and checking different or not?” You can see it in Paul Gerrard’s article: “[…] the James Bach & Michael Bolton demarcation of or distinction between ‘testing versus checking’.” You can see it in Cem Kaner’s article: “According to the new doctrine, “checking” is the opposite of “testing” and therefore, automated tests that check against expectations are not only not sapient, they are not tests.” You can also see it in the original “Testing vs. Checking” blog post by Michael Bolton dated August 2009. It’s right there in the title. Do take note, however, that this post has been retired and we are directed to the new version “Testing and Checking Refined”. However, the new version still contains a sub-title “Checking vs. Testing”.

“Testing and Checking Refined” also contains a helpful diagram that’s key to the point I want to make in this post. The diagram shows us that there’s one overarching category ‘testing’ (fruit), which contains two things: ‘checking’ (apples) and all other testing activities (non-apple fruit). This helps us to understand two things.

First of all, it shows that any discussion about testing *versus* checking is bullshit. They are on different conceptual levels, just like fruit and apples, so any direct comparison is meaningless. To throw in one more analogy: what would you answer if, while visiting a friend at his home, you were asked: “Would you like coffee, or something to drink?”

Secondly, it explains why my previous point is difficult to get(1). The diagram presents people with two concepts: ‘testing’ and ‘checking’. Of course there’s also “learning by experimenting, including study, questioning, modeling, observation, inference, etc.”, but that’s just too vague to register mentally as an entity. It does not coalesce into a concept. What we’re left with are only two concepts, ‘testing’ and ‘checking’, and the non-checking part of testing is gone. This is actually illustrated by the title of James Bach’s and Michael Bolton’s blog post: “Testing and Checking Refined”.

So when you present two concepts in this way, is it really that surprising that people talk about them like apples and oranges, instead of like apples and fruit? I think not.

— — —

(1) Including for myself. See for instance how I’m struggling with this very problem in my blog post from August: “What’s the word for the part of testing that’s not checking?”