the testing curve

my learning curve in software testing

Two styles of leadership in spreading context-driven testing (TITANconf)

The last weekend of August I spent with some great people – Kristoffer Ankarberg (@KrisAnkarberg), Kristoffer Nordström (@kristoffer_nord), Anna Brunell (@Anna_Brunell), Fredrik Thuresson (@Thure98), Maria Kedemo (@mariakedemo), Henrik Andersson (@henkeandersson), Maria Månsson, Amy Philips (@ItJustBroke), Richard Bradshaw (@FriendlyTester), Duncan Nisbet (@DuncNisbet), Alexandru Rotaru (@altomalex), Oana Casapu, Simon Schrijver (@SimonSaysNoMore), Zeger Van Hese (@TestSideStory), Helena Jeret-Mäe (@HelenaJ_M), Aleksis Tulonen (@al3ksis), Anders Dinsen (@andersdinsen) – at the awesome TITAN peer conference in Karlskrona, Sweden.

During the conference we discussed leadership and testing and on Sunday morning I got the opportunity to tell my story(1). (I do wish I had captured more of the discussion afterwards to include in this blog post.)

The first style
When thinking about my own leadership in testing, one of the first things that comes to mind is my attempts to influence my colleagues at work (testers, developers, project managers) to become more context-driven in their attitude towards testing.

Personally, I discovered context-driven testing at a time when I was wondering if I wanted to be in testing at all. I had been working as a tester for about two years and a certain fatigue had set in: “Is this what I want to do for the rest of my life?” One of the things I did to find an answer was to learn more about testing. So I searched the internet, discovered context-driven testing and, after reading both James Bach’s and Michael Bolton’s blogs from the oldest post to the most current one, I was totally into software testing. To me context-driven testing was a huge discovery: through it I found my passion for software testing.

After that I wanted to share that passion. I talked to people and gave them pointers to blogs, books, conferences. I explained concepts, etc. And that is the first style of leadership I employed: reaching out. And although I have good enough manners not to be too pushy(2), part of my intention really was to make other people ‘see the light’, to convert them to context-driven testing. It shouldn’t come as a big surprise that my successes have been slim to none. Although I did influence some people, the bottom line is that in the end the scoreboard says “Conversions: 0”.

An Oriental excursion
Whenever I realized that my efforts didn’t seem to lead anywhere, I felt disappointed and just gave up for a while. As bad as that may sound, it has led me to discover a second style(3). Instead of reaching out, it keeps to itself and there isn’t really such a thing as ‘success’ like in the first style. To be honest, at first I wasn’t sure if I had just found fancier words for ‘giving up’, but when I saw an analogy with how koryu present themselves to the outside, I realized it’s more than a rebranded towel thrown in the ring.

Koryu is the name for the old martial arts of Japan, with ‘old’ meaning that they originated sometime between 1400 and 1868(4). Where modern martial arts are very much about what you can get out of them (physical fitness, confidence-building, self defence, …), the attitude of koryu is quite different(5). Dave Lowry did a great job describing this attitude in his article “So You Want To Join The Ryu?“. (‘Ryu’ means ‘school’.) The first sentence of that article reads: “I don’t care about you.” After which he explains that what he does care about is the ryu. So it’s very much a case of “Ask not what your country can do for you – ask what you can do for your country.”

Does this mean it’s incredibly hard to join a ryu? Well, not exactly. It’s just that you’ll have to make an effort. First of all in finding one: few if any actively look for new members. Next, there may be some prerequisites. A fairly obvious one is having read about koryu, so you know what you’re getting into. Finally there’s your attitude: sincerity, politeness and patience will get you a long way. So basically, all that is expected of you is to show good manners and put in some work.
The reasons why koryu have adopted this attitude are many, so I will focus on just one that’s relevant to this blog post. A ryu is a family, a tight-knit community, passing on an heirloom, a body of knowledge and skills, from generation to generation. Both of these aspects explain why it’s difficult for an outsider to just jump in and participate. You don’t just join a family and you don’t just get to lay your hands on an heirloom that has been passed on for several hundred years.

The second style
Where a ryu is both a family and an heirloom, context-driven testing is three things: a paradigm (or school of thought), a community and an approach. Put this way, the analogy is fairly obvious: both are communities focused around a body of knowledge and skills. Of course there are also differences. The most obvious one was pointed out by Duncan: we as context-driven testers do share our ‘heirloom’ openly with the world through blogs, at conferences, in discussions. (We also defend it fiercely when it comes under attack – sometimes too fiercely according to some.) However, I do think the analogy is strong enough to ask: what would a koryu-like style of leadership in spreading context-driven testing look like?

The basis of this style is doing your thing. You practice context-driven testing: you apply it and you work to get better at it(6).
Then, you leave crumbs. This can be anything: referring to a book or conference, sharing a blog post, mentioning a certain concept – as long as it’s something the other person can follow up on, if he or she is interested. Because that’s the main point: you don’t try to reach out to someone, you just show there’s something there. And whether or not the other person does something with that crumb doesn’t matter to you. It’s all up to them.
Finally, if someone does pick up a crumb, if someone shows curiosity and puts effort into finding out more, you reward them. You give them a bigger crumb. You engage more. And perhaps curiosity wanes after that, and perhaps the cycle repeats itself. If so, every time you engage a little more, you invest a little more. But the point is that instead of you reaching out, it’s the other person pulling himself or herself in. You’re just there to give directions in case someone is looking for them.

In closing
Of course, reality is a little more messy than what I presented here. I’ve influenced people in different ways outside of work, for instance through this blog or by speaking at conferences. I still find myself switching back and forth between the two styles. And I need more practice in not caring. But I do think the second style suits me better than the first. It saves me from disappointments and it gives other people more freedom to find their own path.

— — —

(1) The slides can be found here.
(2) Do let me know if I’m mistaken here.
(3) This style has some similarities to the honey-badger style, mentioned by Henrik Andersson, which he and Ilari Henrik Aegerter identified.
(4) That’s not very descriptive, but fully explaining would take a full blog post at minimum.
(5) There are quite a few different ryu or schools, each with their own character. So please take note that all my generalizations are by definition wrong, because generalizations.
(6) I didn’t mention this explicitly in my presentation, but someone (forgot who, sorry) commented that part four should be ‘practice’. To me, it’s part of doing your thing.

What’s the word for the part of testing that’s not checking?

The question I asked
Yesterday I asked on twitter: what’s the word for the part of testing that’s not checking?

The reason I asked is that I noticed I needed that word in discussions about testing and checking. If checking is part of testing – and in the RST namespace it most definitely is, see ‘Testing and checking refined‘ – then what can I contrast checking with? Contrasting checking with testing (as in ‘checking versus testing’) isn’t going to work: there’s one thing that’s checking and then there’s this other thing, testing, that contains that one thing and some other stuff(1), yet we talk about it as if it were a completely different thing. See the problem? Conceptually that just doesn’t work – at least not in my mind.

The answers I got
So I figured I’d ask twitter in all its infinite testing wisdom and lo and behold, not only did people reply, but a discussion ensued with the following people (listed in no particular order) participating in different configurations: @eddybruin, @mariakedemo, @SandroIbig, @TestPappy, @dwiersma, @ilarihenrik, @PhilipHoeben, @huibschoots and @deefex. Thank you all!

Do click on the embedded tweet to read all of it, but here’s a list of the answers they came up with:

  • Exploring
  • Learning
  • Evaluating
  • Monitoring
  • Confidence building and refining
  • Experimenting
  • Non-checking

It only took a few replies for me to realize I may have asked the wrong question – as in: not the question I had intended to ask. And a quick look at the diagram in ‘Testing and checking refined‘ confirmed this:

[Diagram from ‘Testing and checking refined’ – full blog post at http://www.satisfice.com/blog/archives/856]

Testing is a very big box. Learning is a part of it and so are experimenting, studying, questioning, modeling, etc. *and* checking. So the part of testing that’s not checking, isn’t just one thing, it’s many things. Hence Del Delwar’s (@deefex) reply: “I’d suggest that’s possibly too wide an array of things to encapsulate in a single word. Try ‘non-checking’ :-)”

The question I meant to ask
So with that settled, on to the question I meant to ask: if checking is “the process of making evaluations by applying algorithmic decision rules to specific observations of a product” (source: yet again, ‘Testing and checking refined‘), then what’s the name for the non-algorithmic evaluation of a product? A ‘heuristic evaluation’? Does such a thing exist? Or are all our evaluations during testing checks?
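To make that definition a bit more tangible, here’s a minimal sketch in Python (all names invented for illustration) of what a check looks like once it’s written down, and why a non-algorithmic evaluation resists that shape:

```python
# A check, written down: an algorithmic decision rule applied to a
# specific observation of the product. (Names invented for illustration.)
def check_response_time(observed_ms: float, budget_ms: float = 200.0) -> bool:
    # The rule is explicit and mechanical: anyone (human or machine)
    # applying it to the same observation reaches the same verdict.
    return observed_ms <= budget_ms

print(check_response_time(150.0))  # True

# A non-algorithmic evaluation resists this shape. I can observe the
# product and judge "this feels sluggish" or "this screen is confusing",
# but I cannot write down a rule that derives that verdict from the
# observation alone.
```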

First of all, when I test, it doesn’t feel like all my evaluations are checks. That may not be a very strong argument, but I do think it’s worth noting at the very least.

Secondly, where do these algorithmic decision rules come from? Do I need to have them beforehand? Do I need to have them recorded somewhere? Or can I just make them up as I go along? More importantly, do these rules need to be explicit?

And that last question led me to a bunch of philosophical questions:
– If not all of our evaluations are checks per se, is it possible to (re-)formulate them as checks?
– If I can’t express my evaluation as a check, how would I be able to communicate in a meaningful way about my evaluation?
– If my evaluation is founded on tacit knowledge and there’s no need to make that knowledge explicit, because the people I communicate with share in that tacit knowledge, can that evaluation still be considered a check?
– Does it matter if the tacit knowledge on which an evaluation is based, is ‘weak’ (we could make it explicit) or ‘strong’ (we can’t make it explicit)?
– Where does algorithmic end? I can make an algorithmic decision if I find something beautiful, by observing my aesthetic feelings towards that object. If I observe positive feelings within myself, I find the thing beautiful. However, I can’t make an algorithmic decision if I find something beautiful (yes, I know, that’s the exact same sentence), because I can’t specify a set of algorithmic rules that decide if I would find something beautiful. So what it boils down to is: is the algorithmic evaluation of an observation of my own mental state a valid check? Or should we go one level deeper, to the cause(s) of my mental state?
– Is the previous bullet point anything other than a philosophical quagmire one needs to extract oneself from, Münchhausen-style? In any case I highly recommend reading Raymond M. Smullyan’s “An Epistemological Nightmare” to sink a little deeper.

Back to the context of the question
So… why was I asking this question again about non-algorithmic evaluation during testing? It’s quite simple actually:
(1) If testing is investigating a product to evaluate it
AND
(2) all evaluation is done by applying algorithmic decision rules,
THEN
(3) the core of testing is checking.

Of course, there’s all the stuff going on around the checking. There’s all the investigating, modeling, experimenting to come to the checks. And there’s all the sense-making of the results of the checks to provide valuable information to our stakeholders. But it all revolves around checking.

So when I am talking to someone who seems to think that there’s nothing more to testing than checking, I can argue that there’s all this other stuff we testers do that is testing, but is not checking. But what I cannot argue is that there is something we do instead of checking (so something non-algorithmic) that leads to evaluative data(2) about the product, because there is no such thing. And that bugs me, because that’s not how I’ve been using the words ‘checking’ and ‘testing’.

— — —

(1) Advanced semantics question: is ‘checking versus testing’ more like ‘apples versus fruit’ or more like ‘squares versus rectangles’? (a)

(2) Data + interpretation = information. Hmm… or: Interpretation(data) = information.

— — —

(a) Apparently the correct answer is “leaves vs trees”. (https://twitter.com/al3ksis/status/633343017252995073)

Test automation – five questions leading to five heuristics

In 1984 Abelson and Sussman said in the Preface to ‘Structure and Interpretation of Computer Programs‘:

Our design of this introductory computer-science subject reflects two major concerns. First, we want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, *programs must be written for people to read, and only incidentally for machines to execute.* Second, we believe that the essential material to be addressed by a subject at this level is not the syntax of particular programming-language constructs, nor clever algorithms for computing particular functions efficiently, nor even the mathematical analysis of algorithms and the foundations of computing, but rather the techniques used to control the intellectual complexity of large software systems. [emphasis mine]

This oft-quoted sentence I emphasized is even more true if the purpose of our programs is test automation(1). So let’s say you run your test automation program and the result is a list of passes and fails. The purpose of testing is to produce information. You could say that this list of results qualifies as information and I would disagree. I would say it is data, data in need of interpretation. When we attempt this interpretation, we should consider the following five questions.

Question 1: What exactly is this list of results as such telling you?
Picture the list of test results. All it contains are the names of the test cases and whether they passed or failed. With just that list in front of you, how much do you know? How easy is it to identify potential problems? To identify where you need to start investigating? Are you able to do that based on the list as such? Or will you have to dive into the details of each test case to be able to do this? I certainly hope not…
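To make the question concrete, here’s a hypothetical results list (everything in it invented for illustration), first bare, then with some context attached:

```python
# A bare results list: all the information it carries as such.
results = {
    "test_042": "FAIL",
    "test_043": "PASS",
    "test_044": "FAIL",
}

# The same kind of run, but with names and failure messages that carry context.
results_with_context = {
    "login_rejects_expired_password":
        ("FAIL", "expected HTTP 403, got 500"),
    "login_accepts_valid_password": ("PASS", None),
    "login_locks_account_after_third_failure":
        ("FAIL", "account still active after 5 failed attempts"),
}
# The second list already points you to where to start investigating;
# the first forces you to open up every single test case.
```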

Question 2: How do you tell false negatives from true ones?
Going through the list of passes and fails, you’ll probably feel good about the passes and bad about the fails.(2) So you set out to investigate the failed test cases. However, some will be true negatives (the test exposed a bug) and some will be false negatives (the test is wrong). How will you be able to tell the difference?
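A minimal sketch of how such a false negative can arise (both the product code and the test are invented for illustration):

```python
def render_welcome(name: str) -> str:
    # Stand-in for the product code; the copy was recently changed.
    return f"Welcome back, {name}!"

def test_welcome_message():
    # The test still encodes the old copy. The product is fine; the
    # test is wrong. Yet in the results list this false negative looks
    # exactly like a fail that exposed a real bug.
    assert render_welcome("Alice") == "Welcome, Alice!"
```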

Question 3: How do you tell false positives from true ones?
Not only can we have false negatives in our test results, we might also have false positives. Test cases that pass, although they shouldn’t have. How will you be able to tell the difference here? And more poignantly, where will you find the motivation to even start looking for the false positives? Why can’t we just be happy all those tests passed?
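And here’s a sketch of how a false positive can sneak in without anyone noticing (again, all names invented for illustration):

```python
def get_error_banners(page) -> list:
    # Hypothetical helper: imagine it scrapes error banners from a page.
    # If its locator is wrong, it silently returns an empty list.
    return []

def test_no_fatal_errors_shown():
    page = object()  # stand-in for a real page object
    for banner in get_error_banners(page):
        assert "fatal" not in banner
    # When the list is empty, the loop body never runs and the test
    # passes, even if the page is covered in fatal errors: a false
    # positive that looks exactly like a true one in the results list.
```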

Question 4: How do you find the thing that’s broken? Or even more fun, the things that are broken?
So you have at least one test that doesn’t return the result you want. That means the result is either a fail or a false positive. (So yes, of the four possible outcomes, three require further action.) For your investigation there are basically four areas to focus on:
– The product under test. You found a bug. Good job!
– Test design. You designed a test to identify a potential problem, but it turns out that problem isn’t actually a problem.
– Test execution. You made a mistake in how you translated your test designs into test automation code.
– Test tooling. Your tool (this includes the test environment) had a ‘hiccup’ or a ‘glitch’.

These four areas are relevant whether you’re investigating automated tests or other tests. However, a major problem with automated tests is that this investigation is more difficult because two of the four areas are bigger. First of all there’s the test execution area. Your translated test designs will be interpreted by a computer, which has a lot less interpretative flexibility than a human being. So your translation needs to be of a higher quality than if you were translating for another human being. Secondly, the test tooling area is bigger, simply because you have more tooling.

Question 5 (bonus meta-question): What understanding are you losing by automating?
Toyota is not unfamiliar with automation. And last year, they decided to replace a number of robots in their factories with human workers. Why? As project lead Mitsuru Kawai says:

We cannot simply depend on the machines that only repeat the same task over and over again. To be the master of the machine, you have to have the knowledge and the skills to teach the machine. (source: Bloomberg)

Toyota realized that by fully automating the car manufacturing process, they were losing important knowledge and skills about how to build cars. So no, they’re not replacing all robots with humans, but they are putting humans back into the manufacturing process so that learning and improvement can happen. The same applies to test automation. If it is keeping you from interacting with the product, from actually testing yourself, it’s time to rethink your approach.

Epistemic testability
In the end it all boils down to one question: is your test automation increasing or decreasing your epistemic testability? Does it make it easier or harder to bridge the gap between what we know and what we need to know about the status of the product? Test automation is excellent at providing you with the illusion of increased epistemic testability (“Every night we run 10,000 tests in less than an hour!”) while actually decreasing it (“Alice and Bob spend four hours every day processing the results!”).

Having thought about those questions, I have gathered the following set of heuristics on test automation. Time and experience will tell if they’re any good…

Heuristic 0: Don’t call it test automation.
As James Bach pointed out at Tasting Let’s Test Benelux, developers used to talk about “automatic programming“. The meaning of the term has changed over time, but at no point in time did developers think that when you do automatic programming (e.g. use a compiler), all of programming has been automated. So either we change the meaning of ‘test automation’ in a similar way (which fails to account for the testing-checking distinction), or we come up with a better term. I’m still looking for a better term, all suggestions are welcome.

Heuristic 1: Never trust a test you haven’t seen fail. (source: Colin Vipurs via Rob Fletcher)
It will help you avoid false positives. But we should actually take this several steps further, as you can read in this blog post by Richard Bradshaw: Who tests the checks? Do go read the whole post, but one excellent thing he proposes is to test whether a failing test gives sufficient information about why it fails.
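A minimal sketch of what “seeing a test fail” can look like in practice (the product code is an invented stand-in):

```python
def total_price(items) -> int:
    # Trivial stand-in for the product code under test.
    return sum(items)

def test_total_price():
    assert total_price([1, 2, 3]) == 6

# Before trusting test_total_price, make it fail on purpose: break the
# product (return sum(items) + 1) or the expectation (== 7), run the
# test, and confirm both that it fails and that its failure message
# tells you enough to start investigating.
```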

Heuristic 2: Each test should test only one thing. (s/test/check, of course)
This will reduce the complexity of your investigation when your test needs investigating. If it fails, you can begin looking at the one thing your test is testing. Also, if each test tests only one thing, you will have several quite similar tests. Looking at all of them, seeing which passed and which failed, will give you useful clues in your investigation.
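A sketch of the difference (the User class and the helper are invented stand-ins for real product code):

```python
from dataclasses import dataclass

@dataclass
class User:
    # Minimal stand-in for the product code under test.
    name: str
    is_active: bool = True
    failed_logins: int = 0

def create_user(name: str) -> User:
    return User(name)

# One test testing three things: if the first assert fails, you learn
# nothing about the other two.
def test_user_account():
    user = create_user("alice")
    assert user.name == "alice"
    assert user.is_active
    assert user.failed_logins == 0

# Split up, each failure points at exactly one thing, and the pattern
# of passes and fails across the group is itself a clue.
def test_new_user_has_given_name():
    assert create_user("alice").name == "alice"

def test_new_user_is_active():
    assert create_user("alice").is_active

def test_new_user_has_no_failed_logins():
    assert create_user("alice").failed_logins == 0
```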

Heuristic 3: It’s better to have reliable information that doesn’t exactly tell you what you want to know, than unreliable information that does.
By reliable I mean: does it run all the tests every time with a minimal risk of false positives or negatives? If, to get that reliability, my tests don’t run at the level at which I would like to run them (e.g. the GUI level), I’m more than happy to make that trade-off. The additional interpretative step I need to make is less of a risk than the extra effort it takes to deal with a flaky, unreliable test set that doesn’t require that step.
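A sketch of that trade-off, assuming (purely for illustration) a product that exposes an HTTP API next to its GUI:

```python
import requests  # third-party HTTP library

def test_order_total_via_api():
    # Not quite what I want to know (I care about what the GUI shows),
    # but it runs the same way every time, with few false alarms.
    # The URL and the expected values are invented for this sketch.
    response = requests.get("https://shop.example/api/orders/42")
    assert response.status_code == 200
    assert response.json()["total"] == 119.95

# The remaining question, "does the GUI render this total correctly?",
# is the interpretative step I'm happy to make myself, or to cover
# with just a handful of GUI-level tests.
```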

Heuristic 4: Every minute spent debugging test automation code is wasted, because you learn nothing.
Going back to the four areas to investigate, the first three (product, test design, test execution) are interesting from a tester’s perspective. Investigating these will provide you with opportunities to learn about the product or about testing. Not so with a failure in your test tooling. It’s an impediment that needs to be solved quickly. In this respect there is no difference between a failure in your test automation tool and a failure of your keyboard.

Heuristic 5: Epistemic testability, epistemic testability, epistemic testability.
Repeating this because it is so important. It is the litmus test of your test automation. Consider it when choosing your tools, when deciding on abstraction layers, when designing your tests, when composing your test set, when writing your test automation code, when testing your tests, when documenting your tests, when interpreting the results. Because when you have your first test results, your first list of passes and fails, it’s the epistemic testability that will decide for a large part how useful that list will be.

(This post was deeply influenced by the ideas of James Bach, Michael Bolton, Alan Richardson, Pascal Dufour, Richard Bradshaw and the BBST Bug Advocacy Course. Thank you all.)

— — —
(1) Or, instead of test automation, a better term would be ‘check execution automation’. Although this is an important distinction, I’m not going to pursue it today. If you do want to, this post is a good starting point: Testing and Checking Refined by James Bach and Michael Bolton.

(2) Be wary of the binary disease! Luckily there’s a cure: Curing Our Binary Disease by Rikard Edgren at Øredev 2011.