how this tester writes code

A long time ago (March 2015) I wrote a post titled “Test automation – five questions leading to five heuristics”. Later that year Rich Rogers asked for a follow-up. To which I replied I should do a follow-up post (ahum) “soon”.
Then last Wednesday Noah Sussman said on Twitter: “I don’t know that I’ve *ever* seen ‘this is how testers write code’.” To which I replied “challenge accepted”, so now here we are, me writing a blog post about how I as a tester write code.

The format of this post turned out to be advice based on my experiences, so the usual disclaimers apply. And feel free to leave a comment if you have any feedback!

the basics

use an IDE

An IDE is not just an advanced text editor. It understands your code – to a degree, since it’s not interpreting, compiling, or executing it. So an IDE not only allows you to manipulate your code as text, it also allows you to manipulate your code as code.

The first place people seem to run into this is when renaming functions or variables. With a basic text editor you do all the renaming yourself. With a more advanced text editor you have several ways to do a find-and-replace. With an IDE, however, you can do a refactor-rename, which means the IDE figures out all the places that need the rename and applies it for you.

Realizing that this is the power of a good IDE is an important step. It allows you to interact with code-as-code, with code-as-text as a fall-back option.

using libraries well takes skill

Using libraries involves several skills. There’s the skill of picking a library based on the documentation, the number of recent commits/releases, number of stars, quality of the code, quality of the tests, trying it out. There’s the skill of using the library by reading the documentation, looking at examples, reading (parts of) the library code, experimenting with it.

Finding the right library that can do most of the heavy lifting for you, and being able to use it to its full potential, will make your life a lot easier. So acquire these skills.

use version control

As the saying goes: “Version control is a great way to do version control.” It’s definitely better than manually adding version numbers to your source code files (been there, done that).

As for version control tools, Git is a great one. It’s very popular and probably also complete overkill for what you need. Don’t let that discourage you. My own approach to using git consists of:
1. basic commands: add, commit, push, pull, checkout, checkout -b, merge, rebase, push -f, stash, stash apply.
2. git aliases for some commands I never remember, e.g. “git oops” for “git reset HEAD^” and “git tree” for some ridiculously long command that shows a tree-like view of commits.
3. reading whatever git, GitLab and GitHub tell me.
4. googling stuff, which often brings me to this page.
5. whatever I remember of a video (can’t find it, sorry) about how git works (local vs remote, hashes and trees).
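For what it’s worth, the “git oops” alias from the list above can be set up as shown below. The “git tree” command isn’t spelled out in this post, so the second alias is just one common candidate for that kind of tree-like view, not necessarily the exact one I use.

```shell
# "git oops": undo the last commit, keeping its changes in the working tree
git config --global alias.oops 'reset HEAD^'

# "git tree": one common variant of a compact, tree-like view of commits
# (an assumption -- the actual long command may differ)
git config --global alias.tree 'log --graph --oneline --decorate --all'
```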

One more thing: if you use version control, small commits are the way to go. And don’t feel bad if you have to learn that the hard way. I suspect most of us do it that way.

next steps

mess around with different languages

Over the years I have written code in Bash, Perl, PHP (if adding a string to an array counts), VBA for Excel, Java, JavaScript, TypeScript, and Python. Some more than others, and I must admit that in any of those languages except Python I’d have trouble writing three lines of correct code without looking something up.
And that’s ok. There’s value in having a little experience in a bunch of different languages, because it gives you an idea in which ways those languages are different, and in which ways they are similar.

dive deeper into one language

James Powell’s talk “So you want to be a Python expert?” was important to my learning in three different ways. First of all, watching it and realizing I understood most of it made me appreciate how much I had learned already. Secondly, I learned a lot about some more advanced features of Python and, more importantly, the ideas behind them. Thirdly, I realized that learning that stuff is important to progress beyond basic programming.

writing good code

readability

I was making two big mistakes in this area until a developer showed me the error of my ways. The first mistake was that I used short names for everything, because I didn’t realize that IDEs with their auto-completion and refactor-rename make using longer names trivial. The second mistake was that I only used functions and methods to avoid copy-pasting code, never to provide structure to the code.
Avoid those two mistakes by giving everything a proper descriptive name and by using functions and methods for structure, and you have two of the three basics of clean code covered. The third one is: refactoring. Getting naming and structure right in your first version is incredibly hard. So iterate a few times until it’s good enough.
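To make this concrete, here is a small made-up example (the names and data are hypothetical, not from any real project) of the same logic before and after applying descriptive names and functions-for-structure:

```python
# Before: short names and one undifferentiated block of logic.
def proc(us):
    r = []
    for u in us:
        if u["act"]:
            r.append(u["fn"] + " " + u["ln"])
    return r

# After: descriptive names, with small functions providing structure --
# each function does one thing and is named for that one thing.
def is_active(user):
    return user["act"]

def full_name(user):
    return user["fn"] + " " + user["ln"]

def active_user_names(users):
    return [full_name(user) for user in users if is_active(user)]
```

Both versions do the same thing; the second one you can read without having to decode it first.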

As soon as you have a basic grasp of the above, learn more about clean code and design patterns. Some smart people who have written a lot of code, have thought a lot about how to write good code. Learn from their insights.

separation of concerns

Separation of concerns is one of the most important design principles. If we should use functions and methods to structure code, then what should be the scope of one function or method? Ideally a function or method does one thing and one thing only. That also makes naming easier: you name it for the one thing it does.

If you write code this way, you will have separation of concerns on the lowest level. Next you might notice that some pieces of code are more similar than others. There’s the code that forms the actual tests, the code that interfaces with the software we are testing, the code that does data setup and teardown, etc. That too is separation of concerns, but one level higher: group functions and methods that do similar things together.

If this still seems a little abstract, the most widely known example of separation of concerns for test automation is the Page Object Pattern. You separate the code that knows how to interact with the DOM (the page objects) from the code that contains the test steps and asserts.
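As a sketch of the pattern (with hypothetical locators and a stand-in driver instead of real Selenium, so the example is self-contained):

```python
class FakeDriver:
    """Stand-in for a Selenium-like driver, just to make the sketch runnable."""
    def __init__(self):
        self.fields = {}

    def type(self, locator, value):
        self.fields[locator] = value

    def click(self, locator):
        pass  # a real driver would interact with the DOM here

    def text(self, locator):
        return f"Welcome, {self.fields['#username']}!"

class LoginPage:
    """The page object: the only place that knows about the DOM locators."""
    def __init__(self, driver):
        self.driver = driver

    def log_in(self, username, password):
        self.driver.type("#username", username)
        self.driver.type("#password", password)
        self.driver.click("#login-button")
        return self.driver.text("#message")

def test_login_shows_welcome():
    # The test only expresses intent; it knows nothing about locators.
    message = LoginPage(FakeDriver()).log_in("alice", "secret")
    assert message == "Welcome, alice!"
```

If a locator changes, only the page object changes; the tests stay untouched.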

logging and debugging

In the beginning it’s fine to use print statements to get an idea of what your code is doing. At some point, however, you’ll want to switch to logging. It gives you more flexibility through different loggers, different log levels, and different log handlers.
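In Python that flexibility comes from the standard library’s logging module. A minimal sketch (here the handler writes to an in-memory buffer just so the example is self-contained; normally you’d use a stream or file handler):

```python
import io
import logging

# A handler decides where output goes; a formatter decides what it looks like.
log_output = io.StringIO()
handler = logging.StreamHandler(log_output)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

# A named logger per module/area, with a level filtering what gets through.
logger = logging.getLogger("tests.checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.debug("not shown, below the INFO level")
logger.info("starting checkout test")
```

Changing the level to DEBUG, or swapping the handler for one that writes to a file, requires no changes to the lines that do the actual logging.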

If you find yourself adding more and more print statements and/or logging to figure out why your code is not working, switch to a debugger. Especially with a good IDE the learning curve is not that steep and nothing beats being able to inspect your code while it’s doing what it’s doing.

exploring and one-offs

Your code should allow you to easily add some new tests based on the existing ones, for the times you want to explore some behavior that’s not quite covered by the existing tests. Afterwards you can decide to clean up and commit, or to discard these tests.
It should also be easy to create one-off scripts based on your tests and framework. For example, a script to do the data setup for an exploratory test session or a script to generate load as part of a crude performance test.

If your code only allows you to run your tests as they are, you’re missing out.

building good tests

a test should do one thing and one thing only

This is separation of concerns applied to tests: a test should do one thing, not multiple. Of course, depending on the type of test, the size of that one thing will vary. A test on the unit level will be smaller than a test going through the whole stack.

A test doing one thing only also makes naming easier. Name the test for the thing it is testing. And by this I mean: capture the intention of the test, not the implementation. The implementation I can get by reading the code; the intention I might be able to deduce from the code, but there’s a good chance I can’t.

Separation of concerns for tests also means having all setup and teardown in separate functions or methods. This makes it a lot clearer what the test is trying to cover. So when you start reading a test, you are reading the actual test immediately, not some preparation steps.
Sidenote: ideally your setup and teardown do not have asserts, but will throw exceptions when unexpected things start to happen.
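A sketch of that sidenote, using a hypothetical API client (the stand-in classes exist only to make the example self-contained):

```python
class FakeResponse:
    def __init__(self, status, body):
        self.status = status
        self.body = body

class FakeApi:
    """Stand-in API client for the sketch; a real one would do HTTP calls."""
    def __init__(self, status=201):
        self._status = status

    def post(self, path, payload):
        return FakeResponse(self._status, {"id": 42})

def create_test_user(api):
    # Setup raises an exception on unexpected results instead of asserting,
    # so a broken precondition is reported as an error, not as a failed test.
    response = api.post("/users", {"name": "test-user"})
    if response.status != 201:
        raise RuntimeError(f"setup failed: user not created (HTTP {response.status})")
    return response.body["id"]
```

That way a test report distinguishes “the thing we’re testing is broken” from “we never got far enough to test it”.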

interfaces should be sufficiently transparent

So we have separated our interface code from our test code and most likely we are using some library (e.g. Selenium WebDriver) to do the actual interfacing. Life is easy and code is readable. However… how exactly are we interfacing with the software we’re testing? Do we know what our test is actually doing to the thing we’re testing?
That’s why I want interfaces to strike a balance between ease-of-use and transparency. I do want to say “do a GET on this url” without having to worry about the actual HTTP implementation, but I also want to know what my HTTP library puts in the headers of the requests.
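One way to get that balance is a thin wrapper that is easy to call but logs exactly what it sends. The sketch below uses a stand-in transport callable so it runs on its own; with the requests library you could log `response.request.headers` instead.

```python
import io
import logging

class TransparentClient:
    """Easy to use ('do a GET on this url') yet transparent about what it sends."""
    def __init__(self, transport, logger):
        self.transport = transport  # stand-in for the real HTTP machinery
        self.logger = logger
        self.headers = {"Accept": "application/json"}

    def get(self, url):
        # Log the outgoing request before sending, so the test log shows
        # exactly how we interfaced with the software under test.
        self.logger.info("GET %s headers=%s", url, self.headers)
        return self.transport(url, self.headers)

# Wire the client's log to a buffer so we can inspect what it sends.
log_buffer = io.StringIO()
logger = logging.getLogger("tests.http")
logger.addHandler(logging.StreamHandler(log_buffer))
logger.setLevel(logging.INFO)

client = TransparentClient(lambda url, headers: {"status": 200}, logger)
client.get("https://example.test/api/items")
```

The test code says “do a GET”, while the log answers “what did we actually send?”.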

Do you have to write tests for your tests?

(After I accepted Noah’s challenge, I asked if there were any topics in particular he wanted me to cover. This is one of them. Follow the link to read Pedro Gonzalez’s response and Noah’s reaction.)

If you feel the need to write tests for your tests, that suggests your tests are too complicated. So no, don’t write tests for your tests. I am in favor, however, of testing your tests. This testing can take many forms: a review by you or (even better) someone else, making the test fail through a change in the test or the software under test, mutation testing. You can also consider your tests as implicit tests for your other tests. Do note that how much coverage you get from this implicit testing can vary a lot depending on the kind of tests.

One thing you should consider writing tests for, is your test framework and utilities. If they’re fairly trivial, it might not be needed. As soon as some complexity creeps in, add some simple tests – rather add them a little too soon than a little too late. It will make refactoring your framework and utilities easier.

Can you apply TDD to building tests?

Thinking about writing tests for your tests, I began to wonder: what would doing TDD for tests look like? (Disclaimer: I’m familiar with the ideas behind TDD, but have never practiced it.)
You’d start by writing a meta-test, a test to test your test. You’d run it and it fails, because you haven’t written the test yet. So you write the test. You run the meta-test. It passes. (Or not, so you work on the test until the meta-test passes.)
Now what would a meta-test look like? It’s good test design to have the meta-test not be aware of the implementation details of the test. So the meta-test should only care about what the test returns, which is pass/fail/exception. But isn’t that what our test runner already does for us? Then why write a separate meta-test?

And that train of thought gave me an idea. Perhaps it could be a good practice to start building a test by giving it a name and writing the asserts you want for that test. If you’d run that test, it would fail horribly, because there’s nothing to assert yet. Then you add code to the test until the asserts can evaluate what they need to evaluate.
As I said, right now this is only an idea. If you decide to try it out, please let me know how it went.
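To illustrate the idea (everything here is a hypothetical example, since as said this is only an idea, not a practiced technique):

```python
# Step 1: give the test a name and write only the asserts you want.
# At this point the test would fail horribly -- nothing has been arranged yet:
#
# def test_discount_applied_to_order_total():
#     assert total == 90

def apply_discount(total, code):
    """Stand-in for the system under test."""
    return total - 10 if code == "TENOFF" else total

# Step 2: add code to the test until the asserts can evaluate what they need.
def test_discount_applied_to_order_total():
    total = apply_discount(100, "TENOFF")
    assert total == 90
```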

Is it ok to have conditional logic in your tests?

(After I accepted Noah’s challenge, I asked if there was any topic in particular he wanted me to cover. This is the other one. He answers this question himself here.)

Needing conditional logic in your tests suggests to me that you’re lacking control somewhere. That your tests are not as deterministic as they should be, so you add conditional logic to remedy that. This really should be a last resort where no other solution is feasible, and the value of the test outweighs it being only half-deterministic.

Conditional logic in setup or teardown is a different matter, by the way. For instance for tests where data creation is expensive, doing a get-or-create can be a good solution.
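A minimal sketch of such a get-or-create (the store here is just a dict standing in for whatever system holds the expensive test data):

```python
def get_or_create_customer(store, name):
    """Return the existing customer, or create it only when it's missing."""
    customer = store.get(name)
    if customer is None:
        # In a real system this branch would be the expensive part
        # (API calls, database inserts, ...), hence worth avoiding.
        customer = {"name": name, "orders": []}
        store[name] = customer
    return customer
```

The conditional lives in setup, where it saves time, rather than in the test itself, where it would hide what is actually being checked.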

knowing what’s going on when your test runs

Testing is investigating to evaluate a product. So you want as much information as possible – assuming it’s structured well enough that you can navigate it easily – which makes good test reports and logging crucial. When a test fails, you want more information than the name of the test, the failed assert, and its assert message.

tests-as-code versus tests-as-tests

two different perspectives

As hinted at by having a section on “writing good code” and one on “building good tests”, I approach auto-tests from two different perspectives: tests-as-tests and tests-as-code.

Usually these two perspectives align nicely, for example when talking about separation of concerns. Sometimes, however, they do not.
The don’t-repeat-yourself (DRY) principle is a good example. Even when you apply separation of concerns, auto-tests tend to get a bit repetitive when you want to test different variants of the same scenario. (Test parameterization can help here too.)
Let’s say you have a bunch of tests expecting the same (or a very similar) error message. Following the DRY principle, it would make sense to put that message in one variable shared across tests. From a tests-as-code perspective this makes perfect sense. From a tests-as-tests perspective, however, it does not, because it makes the test more difficult to read.
In cases like this I prefer to have easier to read tests over easier to maintain code.
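The trade-off in a hypothetical example (the validator and message are made up): the DRY version shares one constant, the readable version repeats the literal so each test can be read on its own.

```python
def validate_age(age):
    """Stand-in for the system under test."""
    if age < 0:
        return "age must not be negative"
    return None

# tests-as-code (DRY): one shared constant, easier to maintain
NEGATIVE_AGE_MESSAGE = "age must not be negative"

def test_negative_age_rejected_dry():
    assert validate_age(-1) == NEGATIVE_AGE_MESSAGE

# tests-as-tests (readable): the expected message is right there in the test
def test_negative_age_rejected_readable():
    assert validate_age(-1) == "age must not be negative"
```

If the message ever changes, the second style means touching several tests; I consider that a fair price for tests you can read without jumping to a constants file.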

Another thing is that when you’re focused on one of the two perspectives, it sucks to be forced to switch to the other perspective. When you’re in tests-as-tests mode, you’re thinking about the software you’re testing, about the problems it’s trying to solve, the capabilities it tries to deliver. In tests-as-code mode, you’re thinking about the code you’re writing, how to make it do what you need it to and how to keep it maintainable. Switching between the two in a natural rhythm is fine; being forced to switch not so much.

a programming language is a DSL

There are quite a few testing tools and libraries that introduce a domain-specific language (DSL) to make writing tests easier. Their claim is that it’s easier to learn their domain-specific language than it is to learn a general-purpose programming language.

In my opinion that’s not an entirely fair comparison. I don’t need to learn all of Java or Python or … to be able to build tests in those languages. I only need to learn the part of the language that’s relevant to the building I want to do. Hence, a programming language is a domain-specific language.

Now I must admit that most DSLs are easier to learn than the required sub-set of a programming language. Even so, I prefer general-purpose programming languages for two reasons.
First of all, a DSL is limited to its domain. If you want to do something it wasn’t designed for or something outside of that domain, you’ll either need to switch to a different DSL or to a general-purpose language. Since that’s bound to happen at some point, I’d rather learn a sub-set of a general-purpose language from the start and expand my knowledge of that language when needed.
Secondly, I think there’s value for anyone who’s part of software development to have a basic understanding of programming. The best way to get that is to do things in code on a regular basis – preferably as part of your job. Building tests is a great way for non-programmers to gain that experience.

revisiting my blog post from 2015

As mentioned in the introduction, in 2015 I wrote a blog post with the title “Test automation – five questions leading to five heuristics”. So let’s revisit those heuristics more than four years later.

Heuristic 0: Don’t call it test automation.
I don’t think I ever stopped calling it test automation. When I wrote the post, I was stricter in using the best possible words. Currently, my position is that if communication is happening, the words we are using are good enough. I do prefer the term “auto-tests” over “automated tests”, since the latter still suggests to me that perhaps we might be able to “automate all the testing”.
(And yes, I snuck six heuristics in my original post by starting with 0.)

Heuristic 1: Never trust a test you haven’t seen fail.
Yep, see “Do you have to write tests for your tests?“.

Heuristic 2: Each test should test only one thing.
Yep, see “A test should do one thing and one thing only“.

Heuristic 3: It’s better to have reliable information that doesn’t exactly tell you what you want to know, than unreliable information that does.
Not covered in this post, although it is related to having deterministic tests (“Is it ok to have conditional logic in your tests?“). And I still agree: I’d rather have a reliable test that doesn’t entirely test what I want to test, than an unreliable test that does (or more precisely: tries to).

Heuristic 4: Every minute spent debugging test automation code is wasted, because you learn nothing.
I was quite amused reading this, since it seems I was finding no joy at all in writing and debugging code. I don’t think that was actually the case, but it is true that I enjoy programming more now than I did back then in 2015. The reason is quite simple: I’ve gotten better at it.
Setting aside the tone of that paragraph, I do see a link with what I have written above about tests-as-tests and tests-as-code, and about how being forced to switch between them sucks.

Heuristic 5: Epistemic testability, epistemic testability, epistemic testability.
Revisiting the definition of epistemic testability, I don’t think I was using the term the way James Bach intended. Be that as it may, I still stand by the point I was making: is your test automation helping you to do better testing? And by extension, helping you to develop and deliver software better?

Solving Black Box Puzzle 31 with data analysis

James Lyndsay has created a number of amazing Black Box Puzzles: tiny applications that challenge you to figure out what they do. (You can support him in creating more of these at his Patreon page.) Two of these Puzzles, 29 and 31, not only have a GUI to explore, but also an API.

And that gave me an idea. If you explore these Puzzles through their GUI, you start from the inputs. You try out different inputs in the hope of discovering a pattern in the outputs. And then that pattern feeds back into your exploration.
With an API, however – and because of the nature of Puzzle 31 – it becomes easy to get the outputs for all possible combinations of inputs. Which means you can start your exploration from the outputs instead of the inputs.

Before I tell you how and what I did, three important remarks.
First of all, I will be spoiling the solution to the Puzzle in this blog post. So this is the right moment to go and solve Puzzle 31 for yourself first. Or at least go play a bit with it, so you have an idea what the inputs and outputs are.
Secondly, I had already solved the Puzzle through the GUI a few months ago. So it was more of a “Can I find the solution this way as well?” than a “Can I find the solution?” thing.
Finally, the code and the spreadsheet I created (linked throughout, also available on GitHub here), are not very clean. I thought about tidying them up, but my two reasons for not doing so are (1) laziness; (2) the way they are now gives a more honest picture of what I did.

Getting all the combinations for Puzzle 31

The API of Puzzle 31 is described here. You can actually use it from your browser by adding the query parameters for all the buttons like this: http://blackboxpuzzles.workroomprds.com:8002/puzzle31?buttonA1=up&buttonA2=up&buttonA3=up&buttonB1=up&buttonB2=up&buttonB3=up&buttonC1=up&buttonC2=up&buttonC3=up, which will return {"lamp1":"on","lamp2":"off","lamp3":"off","lamp4":"off"}.

With that figured out, the next question was how to iterate over all the possible inputs in the API requests. Turns out that the Python library itertools has a product function, which does exactly that: product(('up', 'down'), repeat=9).

Sending API requests with Python is something I have done before (yay requests library!). The same for writing data to a csv file. So I ended up writing this Python script, which got me this csv file.
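The actual script is linked above; the sketch below only shows its general shape. To keep it runnable offline, a dummy responder stands in for the requests call to the Puzzle 31 API, and the CSV goes to an in-memory buffer instead of a file.

```python
import csv
import io
from itertools import product

# buttonA1, buttonA2, buttonA3, buttonB1, ... -- the API's query parameters
BUTTONS = [f"button{letter}{number}" for letter in "ABC" for number in "123"]

def fake_puzzle_response(params):
    """Offline stand-in for requests.get(PUZZLE_URL, params=params).json()."""
    return {f"lamp{i}": "off" for i in range(1, 5)}

output = io.StringIO()
writer = csv.writer(output)
writer.writerow(BUTTONS + ["lamp1", "lamp2", "lamp3", "lamp4"])

# One row per input combination: 2^9 = 512 rows plus the header.
for states in product(("up", "down"), repeat=9):
    lamps = fake_puzzle_response(dict(zip(BUTTONS, states)))
    writer.writerow(list(states) + [lamps[f"lamp{i}"] for i in range(1, 5)])
```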

In hindsight I would make one change to the script. The values “down”/“up” and “on”/“off” make data analysis harder than it should be. So later on I created a different csv file replacing “down”/“up” and “on”/“off” with 0/1.

Data analysis with a spreadsheet

Before trying to tackle the analysis with Python (see the next section), I went at it with a spreadsheet. I have done some data analysis and manipulation with spreadsheets in the past, so I figured that with some filters, some formulas, some conditional formatting, and perhaps a pivot table, I should be able to solve the Puzzle.

You can find that spreadsheet here. Normally I would use Excel, but I got curious whether I could get it done with LibreOffice Calc instead. Turns out that yes, I could.

Solving lamp 1

First thing I did was checking how many input combinations would turn each of the lamps on/off. For lamp 1 there are only 2 combinations that turn it on. For all other 510 combinations, it is off. Lamp 2 is on for 337 combinations, off for 175. And lamp 3 and 4 have the same ratio: 169 combinations for on and 343 combinations for off.

These results suggest that although we have four lamps, there might be just three types of behavior. Or just three behaviors. So I checked if lamp 3 and 4 do the same thing, but that’s not the case. For 91 input combinations both are on, for 2 x 78 one is off and the other on, for 265 combinations both are off.

Ok, back to lamp 1, because with just 2 input combinations that switch it on, it should be easy to look at those two combinations and figure out how it works.

(Screenshot made after I added conditional formatting to solve lamp 3 and 4.)

And there it is: lamp 1 switches on when all buttons are either up or when all of them are down. So that’s lamp 1 solved.

Solving lamp 3 and 4

With lamps 2, 3 and 4 having so many combinations resulting in the lamps being either on or off, I added a column to the spreadsheet counting the number of “down” buttons for each input combination. This told me that switching lamp 2 on requires at least two “down” buttons. Switching on lamp 3 or 4 requires at least three “down” buttons. (So lamp 3 and 4 are still very similar.)

Another thing I did was adding conditional formatting similar to the GUI of the puzzle. “Down” buttons turn blue and “on” lights turn red. This made it a lot easier for me to spot patterns while looking at the data.

I decided to look further into lamp 3 and lamp 4 first. Looking at the input combinations with the minimal number of three “down” buttons that switch the lamps on, shows a clear pattern:

Lamp 3: buttons with the same number.
Lamp 4: buttons with the same letter.

Lamp 3 switches on if three buttons with the same number are down; lamp 4 if three buttons with the same letter are down.

Of course, the question remains if that’s all there is to lamps 3 and 4. I did a quick visual spot check for lamp 4 with more than three buttons in the down state. That was more of a formality, though, since I knew from solving the Puzzle through the GUI, that I had the correct solution.

What I should have done, however, was add a formula to the spreadsheet calculating the state of lamps 1, 3 and 4 based on the inputs, then add another formula comparing those states to the actual states and check if I got all of them correct. So I added these for lamp 4 in the spreadsheet in sheet “puzzle31_binary”.
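The same cross-check can also be done by brute force in a few lines of Python: generate all 512 combinations, apply the candidate rules for lamps 3 and 4, and compare the resulting counts with the ones found earlier in the spreadsheet (169 on / 343 off per lamp; 91 both on, 265 both off).

```python
from itertools import product

# Candidate rules: lamp 3 is on when all three buttons with the same number
# ("column") are down; lamp 4 when all three with the same letter ("row") are.
lamp3_on = lamp4_on = both_on = neither_on = 0
for bits in product((0, 1), repeat=9):  # 1 = "down"
    grid = [bits[0:3], bits[3:6], bits[6:9]]    # rows A, B, C
    lamp4 = any(all(row) for row in grid)       # some letter row all down
    lamp3 = any(all(col) for col in zip(*grid)) # some number column all down
    lamp3_on += lamp3
    lamp4_on += lamp4
    both_on += lamp3 and lamp4
    neither_on += not (lamp3 or lamp4)

print(lamp3_on, lamp4_on, both_on, neither_on)  # 169 169 91 265
```

These match the counts from the spreadsheet exactly, which is a much stronger check than a visual spot check.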

Solving lamp 2

Taking a similar approach as with lamps 3 and 4, I looked at all input combinations with only two “down” buttons that switched lamp 2 on. That suggested a pattern related to the middle buttons of the outer rows and columns. However, that pattern didn’t seem to hold up when also looking at the inputs with three “down” buttons. So I needed something more.

I decided to turn to pivot tables (pun intended). That didn’t gain me anything, though. I have only very rarely used pivot tables in Excel and now in LibreOffice Calc I failed to make a pivot table do anything useful. Back to using filters to make sense of lamp 2.

To make that a little easier, I copied all inputs that result in lamp 2 switching on to a new sheet. Filtering the input combinations on 2 or 3 “down” buttons and counting how often each button was “down” showed a pattern: the count would be either 5, 6 or 15. And only the ones with count 15 play a role in the combinations with 2 “down” buttons. So I filtered further, keeping only the combinations in which the inputs with count 15 are “up”, which left me with two input combinations: A1-B2-C3 and A3-B2-C1.

On a hunch I decided to go back to the first filter result, while filtering out the two input combinations I just found. (Hence the “man filter” column in the sheet.) Looking at the counts, that leaves me with only two groups of inputs: the one with a count of 4 and with a count of 15. So it seems that hunch was a good one.

Then I look at all the input combinations in which button A2 is “down”. If there are only two “down” buttons, these are either B1 or B3. And when there are three, either B1 or B3 are “down” as well.

Performing the same steps with B1 shows the same pattern, but with A2 and C2. Which means there are two types of input that switch lamp 2 on: (1) inputs of the pattern A2-B1|B3, and (2) inputs of the pattern A1-B2-C3.

And that solves lamp 2: where lamp 3 and 4 are about the straight lines, lamp 2 covers the diagonals.

Data analysis with pandas and seaborn

For the data analysis in Python I created a Jupyter notebook – been wanting to look into these – which you can find here (rendered very nicely by GitHub, btw). For analysis I used the pandas library, for visualization seaborn.

Visualizing a single input combination

After using pandas’ dataframes to replicate some of the steps with the spreadsheet, I decided to “mis-use” seaborn’s heatmap to visualize the two input combinations in which lamp 1 is on. That allowed me to figure out how to create a heatmap, and how it would be displayed in the notebook.

All buttons in the “up” state.
All buttons in the “down” state.

Lamp state heatmap – attempt 1

Now that I had an idea of how to create heatmaps, I decided to create one of all the input combinations that result in lamp 3 switching on.

Input combinations that result in lamp 3 being on.

Obviously that didn’t do me any good. To be honest, I couldn’t quite believe my eyes, so I verified the result using the spreadsheet. It checked out. Thinking about it I realized that the input combinations could be sorted into three groups. And each group could be defined as: three buttons with the same number that are “down”, combined with all possible combinations for the other buttons. So basically all the noise drowns out any signal.

Lamp state heatmap – attempt 2

I realized I had to filter the data in some way, so I decided to reduce the dataset to input combinations that resulted in only one of the lamps being switched on.

lamp 1 only – all on or all off
lamp 2 only – diagonals
lamp 3 only – verticals
lamp 4 only – horizontals

You could say that the lamp 2 heatmap suggests a diagonal pattern, lamp 3 a vertical one, and lamp 4 a horizontal pattern. However, it’s not really clear cut. If I hadn’t known the solution, I’m not sure what I would have concluded from these heatmaps.

lamp state heatmap – interlude

By now I had figured out I needed to find a way to analyse the correlation between different inputs – instead of throwing all inputs and outputs on one big pile.

After some googling I found that I could use pandas’ size() on the result of a DataFrame.groupby() to get more insight into these correlations. As you can see below, if I group by A1-A2-A3, I can see how often each unique combination of values of A1-A2-A3 occurs in a dataset.

So I took the dataset that only switches lamp 4 on and grouped by A1-A2-A3, A1-B1-C1 and A1-B2-C3. The first two you can see below. The third one you can find in the notebook; it’s very similar to the second one.

A1  A2  A3               A1  B1  C1
0   0   0     6          0   0   1     5
        1     2              1   0     3
    1   0     1                  1     2
        1     1          1   0   0     5
1   0   0     2                  1     7
        1     2              1   0     2
    1   0     1
        1     9

I noticed two things. Firstly, the A1-A2-A3 result seems to adhere to some logic, while the A1-B1-C1 looks more random. Secondly, the A1-A2-A3 grouping contains input combinations in which all three buttons are “down” or “up”. The other grouping does not.
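For reference, the grouping itself takes only a couple of lines of pandas. This is a tiny made-up frame (0 = “up”, 1 = “down”), not the actual Puzzle data:

```python
import pandas as pd

# Group by three columns and count how often each value combination occurs,
# the same idea as the A1-A2-A3 grouping above.
df = pd.DataFrame(
    {"A1": [0, 0, 1, 1, 1], "A2": [0, 0, 0, 1, 1], "A3": [1, 1, 0, 1, 1]}
)
counts = df.groupby(["A1", "A2", "A3"]).size()
print(counts)
```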

lamp state heatmap – correlation

Feeling I was on the right track, I started browsing the pandas documentation and there I found the solution: the DataFrame.corr() method. It does exactly what I need it to do. You feed it all the input combinations that result in for example lamp 4 switching on and it calculates the correlation between the inputs.

The result is a table, where a positive number indicates a positive linear correlation (highest is 1), zero indicates no correlation, and a negative number indicates a negative linear correlation (lowest is -1).
Since we are looking for a positive correlation (buttons being either up or down together resulting in lamp 4 switching on), this table allows us to solve lamp 4. Anything with a positive number is part of the pattern for lamp 4.
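On a tiny made-up dataset (not the Puzzle data) the method looks like this: columns a and b always move together, so their correlation is 1; a and c always move opposite, so theirs is -1.

```python
import pandas as pd

# DataFrame.corr() computes pairwise (by default Pearson) correlations
# between all columns of the frame.
df = pd.DataFrame({"a": [0, 1, 0, 1], "b": [0, 1, 0, 1], "c": [1, 0, 1, 0]})
correlations = df.corr()
print(correlations)
```

Feeding it the input combinations that switch a lamp on gives exactly the kind of table described above.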

And the pattern is even clearer in a heatmap:

lamp 4 input correlation heatmap

Since I was curious, I did the same for the dataset of input combinations in which only lamp 4 and none of the others are switched on. Interestingly, although the pattern is still visible, it is less clear. We need the full dataset with what I previously called “noise” to properly calculate the correlations.

lamp 4 only input correlation heatmap

Now let’s look at the other three heatmaps. The one for lamp 3 shows a clear pattern similar to lamp 4.

lamp 3 input correlation heatmap

The correlation heatmap of lamp 1 shows a clear pattern, although it does look a bit weird because of the logic behind its behavior: all buttons either up or down.

lamp 1 input correlation heatmap

Finally, the heatmap of lamp 2 does show a clear enough pattern (focus on the positive versus negative numbers), but it doesn’t jump out as much as with lamps 3 or 4.

lamp 2 input correlation heatmap

I can think of two reasons for the pattern not to jump out as clearly.
First of all, there are two patterns in the data: diagonals of two “down” buttons and diagonals of three “down” buttons. Add to that all the different states the rest of the buttons can be in and you get a lot of noise.
Secondly, some of the correlations of the two “down” buttons pattern cancel each other out. You can see that in this heatmap showing the correlation between input combinations with max two “down” buttons switching lamp 2 on:

lamp 2 input correlation heatmap – two “down” inputs (matplotlib render)

I don’t think this heatmap is a really valid way of using correlation coefficients because of the limited dataset, but it still tells us something about the behavior of lamp 2.

And that wraps that up: Puzzle 31 solved with Python data analysis.

Closing thoughts

Since I wasn’t familiar with all the tools I was using, solving the Puzzle this way took a long time – literally hours. I’d be faster next time, of course, but finding the solution through exploring the GUI can be done in 5-10 minutes.

I had a lot of fun trying to figure out how to get these tools to do what I wanted. So a big thank you to all the great people who built these tools.

I still have a lot to learn on how to do data analysis well. For instance, when I saw the positive numbers in the correlation tables, I thought: “Puzzle solved!” It was only while writing this blog post that I dove a little deeper into what those numbers actually mean.

Do not underestimate the power of spreadsheets. As Felienne Hermans says in her talk “How to teach programming and other things?” (great talk, go watch when you’re done here) at 5:25:

Excel is an amazing programming language that empowers ‘normal people’ to do programming in a variety of different domains in finance. And this became the motto of my PhD dissertation: “Spreadsheets are code.” Spreadsheets are a valid means of programming.

In that light it was interesting to notice how a Jupyter notebook is basically a spreadsheet on steroids.

Something that wasn’t as fun as I had hoped was solving the Puzzle with Python. I didn’t have the expected eureka feeling when I saw how the correlation heatmaps provided the solution. Okay, it was the end of a tiring day and I knew the solution beforehand, but still.

Writing this blog post made me realize how my decisions while doing the data analysis were informed by me already knowing the solution. To give one very basic example: I knew that putting buttons in their “down” state made the lamps switch on. So from the start the focus of my data analysis was: “What input combinations make the lights switch on?”

So if I hadn’t known the GUI and the solution, I think we could have seen a clearer difference between my data analysis approaches and a GUI approach. Instead of having the mental representation of 9 buttons in a 3×3 grid and 4 lamps, I would only have the data. 512 combinations of 9 binary inputs and 4 binary outputs. I would have found the patterns in the data and then… I’m not sure actually. Look at the GUI to see what the thing is that the data relates to?

I guess there’s only one way to find out: next time James Lyndsay releases a Black Box Puzzle with an API, I am going API-first.

Test strategy primer

I originally wrote this primer to share with people at work. The first section is intended to be generally applicable; the second not necessarily so. The main influences on this primer are:
– Rapid Software Testing’s Heuristic Test Strategy Model
– James Lyndsay’s Why Exploration has a place in any Strategy
– Rikard Edgren’s Little Black Book on Test Design

This document contains two sections:

  • Why have a test strategy? – what is a test strategy and what’s its purpose
  • Test strategy checklist – checklist for describing your test strategy

Why have a test strategy?

The purpose of a test strategy is to answer the question: “How do we know that what we are building is good enough?” As such, every team has a test strategy, even if it’s only implicitly defined through the actions of the team members.

The value of documenting your test strategy is to create an explicit shared understanding of your strategy among your stakeholders (that includes you as team members). So your test strategy document should be a living document, a tool that helps you deliver software better.

A good test strategy is characterized by diverse half-measures: diversity is often more important than depth.

“How do we know that what we are building is good enough?”

“what we are building”
You can look at what we are building from different perspectives:
– a piece of code
– one or more components
– a feature
– a capability
– a product

“good”
A good way to approach this is to split up “good” into different quality attributes such as capability, security, maintainability. Two good checklists with quality attributes are:
– The Test Eye’s Software Quality Characteristics
– The Heuristic Test Strategy Model’s Quality Criteria Categories (page 5)

“good enough”
Something is good enough when it provides sufficient value at low enough risk. Exactly where to draw that line is a decision for the team’s stakeholders.

Evaluating value starts from our expectations, e.g. acceptance criteria. It’s about showing that something works, that it provides the value we want it to provide. Scripted approaches, i.e. tests that are largely or completely defined beforehand, work well in this area – especially when you can define them in code instead of as manual instructions.
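As a minimal sketch of what “defined in code” can look like: a scripted check, written in the style of a pytest test. Both the function under test and the acceptance criterion are hypothetical:

```python
def apply_discount(price: float, percentage: float) -> float:
    """Hypothetical production code under test."""
    return round(price * (1 - percentage / 100), 2)

def test_ten_percent_discount():
    # The expectation (acceptance criterion) is fixed before the test runs.
    assert apply_discount(20.00, 10) == 18.00

test_ten_percent_discount()
print("scripted check passed")
```

The key characteristic is that the expected outcome is decided up front, so the check can run unattended, e.g. in a pipeline.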

Evaluating risk starts from the thing we are building. It’s about looking for problems, checking that what we’re building isn’t doing things we don’t want it to do. Exploratory approaches work well in this area, because you don’t know beforehand exactly what you are looking for.

A few important things to note:
– The two approaches (scripted versus exploratory) exist in a continuum. When testing, you move within this continuum.
– Both approaches can be applied to any level or across levels of your stack.
– Both approaches benefit greatly from using different kinds of tools.

“how do we know”
What are the things we do?
– static testing (code does not need to run), examples: code review, code analysis.
– dynamic testing (code needs to run), examples: unit testing, system testing, performance testing.
– monitoring (product is used by customers), examples: log alerts, server monitoring, support tickets dashboard.

Test strategy checklist

Products & components

  • What product(s) do we contribute to?
  • Why would our company want to have these products?
  • What components of those products are we responsible for?
  • Critical dependencies of our components:
    • What do our components depend on?
    • What has dependencies on our components?

Feature testing

  • What testing is done while developing a feature?
    • which quality characteristics do you focus on?
    • what levels of the stack and/or which components are part of the test?
    • what testing is done as part of a pipeline (see below)?
  • What testing have you decided not to do, i.e. is not in scope?

Release testing

  • What testing is performed on the release candidate?
    • which quality characteristics do you focus on?
    • what levels of the stack and/or which components are part of the test?
    • what testing is done as part of a pipeline (see below)?
  • To what degree do you repeat your feature testing during release testing?
  • What testing have you decided not to do, i.e. is not in scope?

CI/CD pipeline(s)

  • What is covered by the pipeline(s)?
    • code analysis
    • auto-tests
    • deployment
  • When/how are they triggered?
    • schedule
    • commits, merge requests, …
  • Is there a dashboard?
  • What happens if a pipeline goes red?

Testing done by others

  • What testing is done by others?
    • Examples: security testing by a 3rd party, beta-releases.

Monitoring

  • How do you monitor your production environment(s)?
    • Examples: server metrics, logs, user analytics, support tickets.

Impediments

  • What is slowing down your testing?
    • Examples: architecture, design, skills, tools, team capacity.