It’s time to retire our test case management tools

(This post is also available here.)

Recently the topic of test case management tools came up a few times around me. In almost all cases I’d recommend against using these kinds of tools, and I found myself able to give a few reasons, but I also found that my thoughts lacked the clarity I’d like them to have. Hence this blog post, to force myself to think more deeply and communicate more clearly.

Before I go into that, there are a few things this blog post is not about. I won’t really be going into what effect test cases have on test execution, or rather whether test cases are a good tool to use when doing actual testing. Personally I don’t think they are and I wrote about my inability to use them in this post from July 2013. For some deeper thoughts on this, I recommend James Bach’s and Aaron Hodder’s article “Test cases are not testing: Towards a culture of test performance“.

What I do want to cover in this post is managing test cases: keeping a collection of test cases stored somewhere to re-use across releases, and reporting their pass/fail numbers. Both are important use cases for a test case management tool and both are, in my opinion, a bad idea.

Finally, thinking this through made me realize how incredibly hard it is to build a good test case management tool (if one were to choose to build one). It needs to be super-responsive and fast so as not to slow down my thinking. However, it also needs to support synchronous collaboration and run on any computer. (On one project I had to execute tests on one machine, then update the test cases on a different machine in a different room afterwards.) It needs to support pictures (drawings?), videos and have loads of searchable fields. Navigating all that data should be easy and fast. I’d want to link releases to user stories, user stories to test case results, test case results to test cases, but also user stories to features, and features to test cases. Which means that features and test cases need history because they are persistent, while releases, user stories and test case results relate to a fixed moment in time. Last but not least, it needs to have advanced features for super-users, while remaining welcoming for occasional – and even reluctant – users. In short, it’s probably not worth the cost to build a really good one.
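
To make that tangle of links a bit more concrete, here is a minimal sketch in Python of the data model described above (all names are my own invention, purely for illustration). Even this stripped-down version hints at how much a good tool would have to keep consistent.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Persistent entities (features, test cases) carry a revision history;
# point-in-time entities (releases, user stories, results) do not.

@dataclass
class Revision:
    changed_at: datetime
    summary: str

@dataclass
class Feature:
    name: str
    history: list[Revision] = field(default_factory=list)

@dataclass
class TestCase:
    title: str
    features: list[Feature] = field(default_factory=list)        # feature <-> test case
    history: list[Revision] = field(default_factory=list)

@dataclass
class TestCaseResult:
    test_case: TestCase
    verdict: str                                                  # e.g. "pass" or "fail"
    executed_at: datetime

@dataclass
class UserStory:
    title: str
    features: list[Feature] = field(default_factory=list)        # user story -> features
    results: list[TestCaseResult] = field(default_factory=list)  # user story -> results

@dataclass
class Release:
    version: str
    user_stories: list[UserStory] = field(default_factory=list)  # release -> user stories
```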

What is a test case?

There seem to be two definitions of test cases floating around.

The first one matches the definitions given by ISTQB, TMap, the IEEE Standard 610 (1990) and ISO 29119: a test case is a set of conditions, inputs, actions and expected results. These definitions do not explicitly state that test cases are the result of applying test design techniques, but that is how they expect you to specify your test cases.

The second is one I encounter more often: a test case is something you want to test, it’s a relatively specific and concrete test idea.

To check my understanding I sent out this tweet:

It got four replies – which also tells us something, although I’m not sure what – with two people not using test cases, one person sort of using them, describing them as “high level titles”, and one person using them, describing them as a “list of steps and their expected results.” As for test case management tools, the replies varied from “yes” to documenting in Jira/git/confluence to “no”, because “I want people to actually read and review what I write”.

Re-using test cases

A test case is an intermediate product

Testing requires test design and test execution. So test cases are really an intermediate product, since they are the result of test design. And the thing with intermediate products is that they have a limited shelf life. During exploratory testing, design and execution may be intertwined in a tight feedback loop. On the other end of the spectrum, you may be designing your tests for a few days before you move on to execution. In either case, however, the intermediate product that is the test case does not become a final product. It does not become a thing on its own.

That does happen if we store test cases for re-use later, for example for next release’s regression test. It reminds me of the days when a new tester would start with test execution and then would get “promoted” to test design. It suggests that all the thinking happens during design and all the doing happens during execution. In this way re-using test cases discourages thinking.

Of course, you can instruct people to keep thinking while they’re executing tests. But why require people to fight the structures you have set up to organize the work? Why not set up a structure that encourages thinking instead?

Re-use is a legacy of waterfall projects

Re-using test cases feels like a thing for waterfall projects. A big project kicks off that will run for several months (or years). There is no software yet, but there is documentation, so you start writing test cases. The software is delivered and you start executing your test cases. You edit some test cases, add a few more. The software goes back for fixing. There’s a second round of testing. Some of the first test cases you execute are the ones with which you found the bugs several weeks ago. There might be a second round of fixing and a third round of testing. Then the project is done, and in the final week you get time to edit and clean up your test cases before they are archived for a potential future project.

If you still work that way, perhaps a test case management tool is a good idea. On the other hand, one time I was the lead of a team inheriting a collection of test cases written by others and I decided to keep them archived and have my team start over. I figured resurrecting the old test cases would cost as much time as creating them anew. And starting over would result in deeper understanding and a greater sense of ownership.

If you don’t work this way, if you work in a more agile manner, it becomes a wholly different discussion – as illustrated by my comment above about linking releases and user stories etc. If the software keeps changing, so must the test cases, requiring a similar level of agility from your tooling.

It’s like backlogs and defect trackers

Finally, managing a collection of test cases is challenging in the same way it’s challenging to manage a collection of work items in a backlog or a collection of defects in a defect tracker. (I was once part of a project where the new defects had IDs in the 3000s, while the oldest open one had an ID of 42. When I brought this up with my test manager, he replied that he had a filter to ignore any defect with an ID below 2000.) It takes a ridiculous amount of discipline and effort, and good tooling, to manage a collection of test cases well.

This is not about automation

To be clear, I am not making the argument that if you want to re-use a test case, the solution is to just automate it. For the argument I am making here, it does not make a difference whether the test is executed by a human or by a machine. And I don’t think that automating manual test cases is a good approach to test automation. I won’t go deeper into that here, but this blog post of mine about regression testing and CI/CD pipelines should give you an idea of my thoughts on the matter.

Reporting test case results

Test case coverage without formal test design is meaningless

Reporting test case results (number executed, number passed, number failed) makes some sense if you are using formal test design techniques to create your test cases based on a set of requirements. The reason is that the number of test cases is determined by the techniques and the requirements. Give the same set to two different testers and they should come up with the same test cases. So the approach is: define all the test cases required by the test strategy, execute all the test cases, and full coverage (in relation to the test strategy) is achieved!
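
As a small illustration of why that works (a toy example of my own, not taken from any of those standards): a formal technique such as boundary value analysis prescribes the test cases, so the count follows directly from the requirement.

```python
def boundary_value_cases(minimum: int, maximum: int) -> list[int]:
    """Two-value boundary value analysis for an inclusive numeric range.

    The technique prescribes exactly these inputs, so any tester applying it
    to the same requirement ends up with the same four test cases.
    """
    return [minimum - 1, minimum, maximum, maximum + 1]

# Requirement: "age must be between 18 and 65, inclusive"
print(boundary_value_cases(18, 65))  # [17, 18, 65, 66] -> four test cases, every time
```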

That’s not how most software is tested. Informal test design techniques are used, where different testers come up with similar but not entirely identical sets of test cases. You do some exploratory testing, where you usually don’t identify individual test cases. Or, perhaps most common of all, your definition of a test case is not the first one mentioned above, but the second one: “something I want to test”.

Even with formally designed test cases, saying we have executed 73 of the 114 test cases has limited meaning, and only in context. When what counts as a test case becomes significantly vaguer, the same happens to any statement about how many of the (current) total you have executed.

What about CI/CD pipeline results?

If you have a CI/CD pipeline running automated tests, how do you report those results? You could import them into your test case management tool, but that seems contrary to the intention of a CI/CD pipeline. The pipeline should be the single source of truth, not some other tool. It makes more sense to refer to the test results in the pipeline. And even then, for the pipeline a green job is a green job. Do you even need that reference to your pipeline – except for audit purposes?

Pass/fail numbers are a poor conversation starter

Continuing the track of counting test cases, reporting pass/fail numbers of test cases is probably even worse. (One day a consultant showed me the test result dashboard of the test case management tool he was involved in rolling out. I said: “Oh yeah, I don’t care about those kinds of dashboards.” To which he replied: “That’s curious. Usually people are excited about these dashboards, but I got the exact same response from you as I got from your manager.”)

The problem with these numbers and counts and graphs is that they feel like they tell us something, like we’re getting a good overview of the current status, but in reality they don’t. Now you might argue that these numbers can be a good way to start a conversation. A 90% pass rate will lead to a different conversation than a 60% pass rate. But you don’t need those numbers to start the conversation. As Michael Bolton writes in this blog post, the question “How is the testing going?” requires an answer with three strands: a story about the product and its status, a story about the testing, and a story about how good the testing is. A pass/fail ratio is an incredibly narrow entryway to those three stories.

For a further exploration of this topic, I recommend watching Rikard Edgren’s talk “Curing Our Binary Disease“.

Test cases encourage management-by-test-case

Test case management tools invite managers to manage by (number of) test cases. (A former colleague of mine told me she was part of a project where the total number of test cases was determined at the start and each tester was expected to execute a certain number of test cases per day. The solution was to create a number of dummy test cases. When you came up with a test idea during testing, you could fill in a dummy test case. When you were not making your number for the day, you could pass a few dummy test cases.) Managing by test case is as bad an idea as managing programmers by lines of code or number of commits. A programmer’s job is not writing lines of code, even though that is a means they use to achieve their goal. The same goes for testers: their job is not executing test cases, but providing information about the quality of the product.

Which leads to a second question: why would you want to manage testing separately from all the other activities? If your testers are in the same team as the programmers, manage the quality and timeliness of the output of the team. If either is not where you want it to be, better testing might be part of the solution and it might not be. In any case, focusing on testing in the hope of testing quality in is at best going to improve things by accident.

If your testers are in a separate team, you do have to manage testing separately – at least to some degree. Then we’re back to the question: what’s the mission of your test team? And my bet is that test cases are not going to help you tell whether they are achieving that mission, for the reasons mentioned in the paragraphs above.

But what about audits?

The final argument in favor of test case management seems to be audits. The thing is that in my experience the people who claim the strictest audit requirements are also the ones who have never spoken to their auditors about what exactly it is they require. Auditors require traceability, but not that you use test cases to achieve that traceability. (In one of my previous jobs we achieved traceability by linking a release to user stories and each user story to a single test case. The test case on its own contained no information, but attached to it was an Excel sheet documenting the testing we had done. Traceability solved, and we could continue doing our jobs the way we wanted to.)

An alternative

So if not test case management tools, what then? Which is really two questions: what to document, and in which tool to do it?

For what to document I propose a combination of two things: models of the product and heuristics for test ideas.

Models can take many shapes, forms and sizes: architecture diagrams (aka boxes-and-arrows), sequence diagrams, an SFDIPOT model, an ACC (Attribute – Component – Capability) table, … Or as something smaller and more concrete, a list of the different types of users: anonymous, logged in, company admin, platform admin.
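
To show how little tooling such a model needs, here is a toy ACC table (contents invented for this post) kept as plain data: easy to store in any wiki or repository, easy to review, easy to change.

```python
# A made-up ACC (Attribute - Component - Capability) table for a web shop.
acc_model = {
    ("Secure", "Login"): [
        "Locks the account after repeated failed login attempts",
    ],
    ("Fast", "Search"): [
        "Returns results for common queries within a second",
    ],
    ("Reliable", "Checkout"): [
        "Completes an order even when the payment provider needs a retry",
    ],
}

# Each row doubles as a prompt for test ideas rather than a scripted test case.
for (attribute, component), capabilities in acc_model.items():
    for capability in capabilities:
        print(f"{component} should be {attribute.lower()}: {capability}")
```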

Some good heuristics for test ideas can be found in Elisabeth Hendrickson’s Test Heuristics Cheat Sheet and Rikard Edgren’s The Little Black Book On Test Design. Ideally you also have some heuristics specifically for your product based on your team’s experiences.

The advantage of this approach is that it encourages you to think every time about what and how you are testing. In addition to that, the models can also be used during refinement and programming.

A third advantage of documenting models and heuristics is that it gives you a wider choice of tools. Since you are not managing test cases, you don’t have to limit yourself to test case management tools. You can use any information management tool, and that’s significantly less of a niche market than test case management tools. And that might sneakily be one of the bigger advantages of not managing test cases. At the start of this post I mentioned how difficult it would be to build a good test case management tool. By no longer managing test cases, you’ll have a much better chance of finding a tool that enables your testing instead of hindering it.

Regression testing, it means less than you think

Over the past weeks I have made several attempts at a blog post about regression testing. About how we use the term to refer to different things: tests running on a CI server, people executing test scripts, etc. And about how often it really doesn’t mean much at all, yet nobody questions you when you use it: “What are you doing?” “Regression testing.” “Oh good, carry on.” The point of the post would be to argue that we should use the term ‘regression testing’ a lot less, because most of the time we can be more specific without having to be more verbose.

However, the more I thought about (what I would qualify as) proper regression testing, the more I felt that regression versus progression (or progressive) testing is a distinction without a difference. One interesting observation in this regard is that “regression testing” returns 30 times more results on Google than “progression testing” and “progressive testing” combined. So what’s going on here, if we have a dichotomy with one member producing so much more discussion than the other? And there’s more: regression testing is commonly contrasted with test types like functional testing and usability testing. But how then should I categorize a regression test focusing on functionality?(1)

Anyhow, more thinking followed and the result is what you are reading here. A blog post describing four different things you could call regression testing. The first two I don’t consider to be regression testing, the third one I find confusing and in the fourth I am reluctant to use the term at all.

Oh wait, I do have to provide a definition of regression testing for use within this blog post, so here you go: regression testing is testing the things you don’t expect to observe any changes in.

Regression testing as part of continuous integration
Situation: your regression test is a bunch of automated checks running on your CI server.

Although I see great value in this, it’s debatable whether this qualifies as testing. You commit your code and check if the build is green or red. If it goes red, you know there is a problem somewhere. If it stays green, you know the checks did not detect anything alarming and you move on. This is why people refer to these checks as change detectors(2). Because so far, no real testing is going on. On the other hand, you likely will do testing when you are investigating why the build went red. And testing was involved when you wrote the checks, and arguably when people decided how to run which checks on the CI server. So whether running these checks should be considered testing or not strongly depends on your choice of perspective.
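
To make ‘change detector’ a bit more tangible, here is a deliberately trivial, invented example of the kind of check a CI server runs (written in pytest style): it tells you when observed behaviour changes, but no one is doing any testing while it runs.

```python
# Production code under check (invented for this example).
def shipping_cost(order_total: float) -> float:
    return 0.0 if order_total >= 50 else 4.95

# The checks pin down today's behaviour. If a later commit changes it,
# the build goes red - change detected, and investigation (testing) starts.
def test_orders_of_fifty_or_more_ship_for_free():
    assert shipping_cost(50.0) == 0.0

def test_smaller_orders_pay_standard_shipping():
    assert shipping_cost(49.99) == 4.95
```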

More importantly for this blog post, and referring back to the definition: the decision about which checks to run bears no relation to where we don’t expect to see any changes. The checks that are run are simply the checks that the CI server has been configured to run. And some of those checks are checks you changed or created because of the code changes in your commit. So there really is no consideration of changed versus unchanged areas here, which means there is no good reason to call running these checks regression testing.

(Not directly relevant, but related and interesting: for a critique of the fully automated approach to regression testing, watch this short Whiteboard Testing video “Regression Testing, the F.A.R.T Model“.)

Regression testing with a suite of regression test scripts
Situation: you have a suite of test scripts you execute every time you perform a regression test.

The problem here is that you have the worst of both worlds: people trying to be like CI servers, executing the same test scripts every time, while that’s not what people are good at. So it’s time to make a choice: either go machine-centric and automate it all, or go human-centric and give people more freedom.(3) In any case, what’s going on here is not regression testing. It’s slow, error-prone change detection, again with no consideration of what has and what hasn’t been changed.

Regression testing by doing actual testing
Situation: you identify the things you don’t expect to observe any changes in and you test those.

This is the part where my thoughts become a little messy. Let’s start with some questions:
– How do you decide that something is a thing?(4)
– When deciding what is a thing and what isn’t, why do we so strongly prefer modules and interfaces as things?(5)
– How do we decide that a change in thing A should not have any effect on thing B?
– How do we know which observations are needed to decide that thing B is not affected by a change in thing A?

The answer to all of these questions is: a model. You need a model for all three activities: mapping out the application, deciding the impact of a change, and designing your tests. However, if my model predicts there are no changes, how can that same model inform my test design? It provides me with no information whatsoever about what to focus my testing on. Which shows that we should not be using the same model for all three activities. Even more, we should be using different models within each of the three activities.

However, that does not solve the problem of ‘regression testing’, the definition of which largely depends on the second activity: deciding where you don’t expect any changes. And this decision is fully dependent on which model you use. What is regression testing according to one model might be progression testing according to a different model. So actually the distinction between regression and progression testing is based on our lack of the knowledge and/or imagination needed to predict a change.

And this is where my understanding of ‘regression testing’ starts to break down. If you are testing a refactored feature, are you regression testing or progression testing? The code changed, so that’s progression testing, but the behaviour of the feature hasn’t, so you’re regression testing. If you are testing a feature and you find no bugs in the feature as such, but you do find a regression bug related to the feature, would you say the feature works? If so, how does that even make sense? If one part of a workflow changes and I am testing the whole workflow, am I progression or regression testing? Or does that depend on where I find a bug, if any?

Ok, enough questions. Let’s go over this one more time. A change was made in area A and none of my models predict an impact on area B. Why would I want to regression test area B? Two possible reasons: either I do not sufficiently trust my judgement, or area B is so important that my judgement doesn’t matter. So what I am actually saying when I want to regression test area B, is that I have identified a low-risk area that’s important enough that I want to test it. But then… how is this different from testing in general? Why would it need a separate term? Well, that’s why we have a fourth and last section in this blog post.

Regression testing without regression testing
Situation: whenever you test something, you consider what needs testing and you test it. Which is to say: you’re testing a feature, a release, or whatever, and you decide what testing will give you the information you’re looking for.

This is what in my opinion we should be doing: test the things that are important enough to test. The notion ‘regression testing’ is not relevant in that discussion.

Does this mean that the term ‘regression testing’ is meaningless within this approach? No, it doesn’t. It does mean that ‘regression testing’ is a lot less important as a concept. You might be testing a feature, focusing on usability. And you might not expect to find any issues, because it’s a regression test, i.e. you don’t expect the changes that were made to have had any impact. Then it makes perfect sense to use the term, but at best ‘regression test’ is the fourth thing you mention, because only in the context of those other three things does it have any meaning.

— — —

(1) I’m definitely not the first one to notice this peculiarity, see e.g. Arborosa’s blog post about regression testing: “such a definition […] only provides the intention of this type, or should I say activity, of testing. Regression testing could still encompass any other testing type in practice.”

(2) I am not sure who first used the term ‘change detector’ in this way. The oldest reference I could find is from 2005: Regression testing by Cem Kaner and James Bach.

(3) Just to be clear: machine-centric will involve humans and human-centric will involve machines. The question is: what’s the focus? Humans supporting machines (i.e. machine-centric) or machines supporting humans (i.e. human-centric). And of course, until our robotic overlords are here, at the highest level what you do will be human-centric. (Afterwards too, actually, I hope.)

(4) Some further explanation for the less philosophically inclined: according to a city map, streets and buildings are things, but people and the weather are not. If your application is a city, what kind of map would you draw? What elements would it include and which would it omit?

(5) There’s a more general blog post hidden in this question, expanding into the philosophical: why are most Western ontologies focused on things instead of processes?

Why I dislike test management

As I am enjoying these short, not very nuanced, not extremely well thought out blog posts, here’s another one.

Some people seem to think that it makes sense to think of testing as a project within a project, so they apply project management tools and techniques to testing. This simply doesn’t work.
Because what tools and techniques do they use? A plan with milestones no one is ever going to make, as unexpected stuff tends to happen. A budget that is too tight because it’s based on that same plan. Entry criteria that are not met, but never mind, we’re running out of time so you need to start testing anyhow. And finally exit criteria that we fail to meet as well, but hey, we’ll go live anyway, because the software really isn’t that bad (or so we hope).
So in the end, a lot of time and effort is spent on producing documents that are of little use in guiding the actual testing effort. The only thing they do is give some people a warm and fuzzy illusion of control.

But why doesn’t this test management thing work? In my opinion it’s quite simple: testing on its own doesn’t really do anything. There is no real product at the end of testing; we only produce information.
Of course, one could argue that the product of software testing is a test report, but that’s just weird. No one cares about your test report, they care about the software, about the product. Or rather (and more inspiring for us software testers): they don’t care about the documents you produce, they care about the service you provide. And that gets lost when you focus on the test project instead of on the software project.

p.s. Something is bugging me about this post, but I can’t put my finger on exactly what it is. Ideas anyone?