Today I was presented with an interesting engineering problem. (Important later: context was the code of an auto-test.) Given a string of the format “Name: [name]”, what’s the best way to get the [name] in Python?
There are several options:
- lstrip()
- split()
- replace()
- string slicing
- regex
So let’s look at each of them and then I’ll explain which one I prefer and why. All examples are in Python 3.6, using the Python Interpreter.
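First up: lstrip(). A minimal sketch of the idea, passing "Name: " as the characters to strip:

```python
# First attempt: lstrip() with "Name: " as its argument (note: lstrip()
# treats its argument as a set of characters, not as a prefix)
our_string = "Name: this-is-the-name"
print(our_string.lstrip("Name: "))  # this-is-the-name
```

This seems to do the job, but only because 't' doesn't happen to be one of the characters in "Name: ".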
That seems to work until we try a different string:
>>> our_other_string = "Name: a name"
>>> our_other_string.lstrip("Name: ")
'name'
This may seem weird, but lstrip() is doing exactly what it says it will do: "The chars argument is not a prefix; rather, all combinations of its values are stripped." So it will continue stripping until it encounters the first character that doesn't match any character in the string we gave to lstrip().
To fix that, we can do:
>>> our_other_string = "Name: a name"
>>> our_other_string.lstrip("Name:").lstrip(" ")
'a name'
Where the first lstrip() won’t get beyond the space and the second lstrip() will get rid of that space for us.
>>> our_string = "Name: this-is-the-name"
>>> our_string.split(":")[1].lstrip(" ")
'this-is-the-name'
>>> our_other_string = "Name: a name"
>>> our_other_string.split(":")[1].lstrip(" ")
'a name'
The split() will split the string on the colon (:) into a list with two items. We get the second item with '[1]' (list indices start at 0). We strip off the leading space with lstrip(" ").
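The replace() option works by replacing "Name: " with the empty string; a minimal sketch:

```python
# replace() returns a copy of the string with "Name: " replaced by ""
our_string = "Name: this-is-the-name"
print(our_string.replace("Name: ", ""))  # this-is-the-name

our_other_string = "Name: a name"
print(our_other_string.replace("Name: ", ""))  # a name
```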
Python allows you to grab slices from a string similar to what you do from a list (as we did for the split() above).
>>> our_string = "Name: this-is-the-name"
>>> our_string[6:]
'this-is-the-name'
>>> our_other_string = "Name: a name"
>>> our_other_string[6:]
'a name'
The number before the colon tells Python where to start, the one after the colon where to end. So '[6:]' means to start after the 6th character (or rather from index 6, since indices start at 0) and continue until the end of the string (because no number was given).
>>> import re
>>> regex = "(^Name: )(.+)"
>>> our_string = "Name: this-is-the-name"
>>> re.match(regex, our_string).group(2)
'this-is-the-name'
>>> our_other_string = "Name: a name"
>>> re.match(regex, our_other_string).group(2)
'a name'
The regular expression is defined in the regex variable. It describes a string that starts with "Name: " and is followed by at least one other character. The parentheses define groups, which we'll use to get the part we're interested in, i.e. the part after "Name: ".

match() will return a match object if zero or more characters at the beginning of the string match the regular expression pattern. Which means we don't actually need the '^' at the start of our regex to only match the start of the string.

group() will return one or more subgroups of the match. The zeroth group is the entire match. Thanks to the parentheses in our regex definition we also have a first group ("Name: ") and a second group (the thing we're after).
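As an aside (not part of the original comparison), the same extraction can be written with a named group, which makes the intention even more explicit:

```python
import re

# Named-group variant of the same regex; the group "name" captures
# everything after "Name: "
match = re.match(r"Name: (?P<name>.+)", "Name: a name")
print(match.group("name"))  # a name
```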
Which one to use
Going through the options they all have disadvantages:
- lstrip() strips characters, not strings, but we want to strip a certain string.
- split() returns two things of which we need only one, after we have lstrip()-ed it.
- replace() returns a changed copy of a string, while we just want to parse the original string.
- string slicing ditches the first 6 characters, not caring what they are.
- regex feels like a bit too much.
However, two of them have a distinct advantage over the others.
As I said in the introduction, the context of this problem was the code of an auto-test. More specifically, the “Name: [name]” is returned by Selenium WebDriver as the content of an html tag. So without knowing or seeing the application, just from reading the code that retrieves the content of the tag, you have no idea what this string is that we are manipulating.
So that’s the first thing we want from our solution: it should give as much information about the string we are manipulating as possible. This means that split() and string slicing are not an option. The split() solution only tells you there’s a colon followed by a space in the string. The string slicing solution only tells you there are at least 7 characters in the string.
Secondly, the solution should capture our intention as clearly as possible, which is getting the [name] from the string “Name: [name]”. That means lstrip() is out, because it strips characters, not a specific string.
You could argue something similar for replace(), since the “replace with empty string” is a somewhat clever hack to use something intended for editing to retrieve a specific part of a string. On the other hand, the code is clear enough – unlike lstrip() where the characters-not-string might become a bit of a surprise down the line.
So basically we’re left with replace() and regex. At the office we hadn’t figured out that replace() has a count argument, so based on ‘But what if there’s a “Name: ” in the content of the tag?’, we went with the regex solution. Regardless, I still think that the regex solution is the clearest one.
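For illustration (the thing we hadn't figured out at the office): str.replace() accepts an optional count argument that limits how many occurrences are replaced, which handles the 'what if there's a "Name: " in the content?' case:

```python
# count=1 replaces only the first occurrence of "Name: ",
# leaving any "Name: " inside the actual name untouched
tricky = "Name: Name: is part of the name"
print(tricky.replace("Name: ", "", 1))  # Name: is part of the name
```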
The Zen of Python states: “Explicit is better than implicit.” This one sentence is the reason I can write a whole blog post about what the best way is to get [name] from “Name: [name]”. Code being explicit about what it’s doing and what it’s trying to do matters. The more explicit it is, the better it can serve as documentation.
But what if you have to choose? The regex solution is indeed the clearest – assuming you know some regex, that is. So the solution that’s most explicit about what the string is and what our intentions with it are, is the least explicit about how it does what it does. (Also, if you have trouble understanding what it does, it will be hard to determine the intentions behind the code.) So what now?
Personally I will always trade the code being less explicit about what it does for being more explicit about the thing-under-test and what my intentions are. Because figuring out what the thing-under-test exactly does and what someone’s intentions were, are hard things to do. Learning some new things about a programming language or tool, significantly less so – even if it’s regex.
The purpose of a CI/CD pipeline is to allow you to deliver small changes in a fast and controlled way. Without any tests in your pipeline you would gain a lot of speed. You’d also lose a lot of control, which is why people in general do run tests in their pipeline. The purpose of these tests is to check if that stage of the pipeline meets the minimum level of acceptable quality for that stage.
For example, commit stage tests will consist of mostly unit tests, a few integration tests, and even fewer end-to-end tests, because early in the pipeline speed is more important than comprehensiveness. When I commit my changes, I want the results fast enough so that I will wait for them – ready to fix any issue that might occur.
There are many definitions of regression testing, as you can read in Arborosa’s blog post on the topic. I have always defined regression testing along the lines of “testing the parts that weren’t impacted by a change to see if they really weren’t impacted.” (Which is really weird if you start thinking about it: something is regression testing depending on your knowledge of the system and the change.)
The tests in your pipeline are regression tests, …
Most of the tests that run in your pipeline are regression tests. Your commits are small and you have a lot of tests, so most of those will cover parts of the system that shouldn’t have been impacted by your changes. So yes, regression tests.
The one exception is if your commit contains both changes and new or updated tests related to that change. For that one run of the pipeline those tests are not regression tests. By the next commit, they are. Or, since you ran those tests before committing, perhaps they have already become regression tests by the time they are executed by the pipeline?
Sidenote: A grey area is when your commit is a pure refactoring, as in: you didn’t even have to change any of the tests. On the one hand, you made a change, so the tests covering that change are not regression tests. On the other hand, at the level these tests are defined there should be zero impact; they shouldn’t detect any changes. So in that sense they are regression tests.
…, but that’s irrelevant.
So sure, the tests run by your pipeline are regression tests. However, they are regression tests incidentally, not essentially. They happen to be regression tests, but that’s not really relevant.
To see why, we need to revisit the start of this blog post.
The purpose of a regression test is to check if unchanged parts of the system are indeed unchanged. It’s the testing that got a name, so we could distinguish it from the other testing, which never really got a name. (Progression testing? Feature testing?) It’s the testing you do after sufficient testing and fixing, when you’re not expecting any more changes and you need to check if all the “other stuff” still works.
The purpose of a test in a CI/CD pipeline is to check the level of quality of a particular stage in the pipeline. The pipeline stages combined with all the practices that surround them, result in a continuous delivery of changes that can be deployed to production. Whether the tests at a particular stage are regression tests or not, doesn’t matter. What does matter is if they provide the information required to decide if we should proceed to the next stage or not.
And that’s why I claim that your CI/CD pipeline does not run regression tests. The definition of “regression test” may technically apply to the tests run by your pipeline; the context that comes with the term, does not. So although it might (mostly) be correct to say that your pipeline runs regression tests, doing so is not helpful in how you think about your pipeline or about your tests. It moves your mind towards thinking about changed versus unchanged things – drawing it away from the continuous delivery of a good enough product.
Update August 6th: After publishing this post, I got the following question on twitter: so how does this impact actual decisions? In response, I came up with four things you might do if you think of the tests in your pipeline as regression tests:
1. Not looking for regressions when exploratory testing, because you already have so many regression tests.
2. Poorly designing the stages of the pipeline, because all it needs to do is just run those regression tests.
3. Doing exploratory testing too early in the pipeline, because you should do feature testing before regression testing.
4. Being lenient towards a failed pipeline, because they’re just regressions and we can fix them later.
— — —
p.s. 1: One thing I’m glossing over is that your CI/CD pipeline can (should) have stages in which the testing involves a human. I don’t think it makes a difference for my argument. Yet I’m still conveniently limiting the scope of this post to the literal interpretation of “Your CI/CD pipeline does not run regression tests”.
p.s. 2: None of the ideas in this blog post are new, which you can see from the replies in the twitter thread that led me to writing this blog post.
James Lyndsay has created a number of amazing Black Box Puzzles: tiny applications that challenge you to figure out what they do. (You can support him in creating more of these at his Patreon page.) Two of these Puzzles, 29 and 31, not only have a GUI to explore, but also an API.
And that gave me an idea. If you explore these Puzzles through their GUI, you start from the inputs. You try out different inputs in the hope of discovering a pattern in the outputs. And then that pattern feeds back into your exploration. With an API, however – and because of the nature of Puzzle 31 – it becomes easy to get the outputs for all possible combinations of inputs. Which means you can start your exploration from the outputs instead of the inputs.
Before I tell you how and what I did, three important remarks. First of all, I will be spoiling the solution to the Puzzle in this blog post. So this is the right moment to go and solve Puzzle 31 for yourself first. Or at least go play a bit with it, so you have an idea what the inputs and outputs are. Secondly, I had already solved the Puzzle through the GUI a few months ago. So it was more of a “Can I find the solution this way as well?” than a “Can I find the solution?” thing. Finally, the code and the spreadsheet I created (linked throughout, also available on GitHub here), are not very clean. I thought about tidying them up, but my two reasons for not doing so are (1) laziness; (2) the way they are now gives a more honest picture of what I did.
With that figured out, the next question was how to iterate over all the possible inputs in the API requests. Turns out that the Python library itertools has a product function, which does exactly that: product(('up', 'down'), repeat=9).
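As a sketch, product() generates all 512 combinations of nine two-valued inputs:

```python
from itertools import product

# All combinations of 9 buttons that can each be 'up' or 'down'
combinations = list(product(('up', 'down'), repeat=9))
print(len(combinations))  # 512, i.e. 2 ** 9
print(combinations[0])    # ('up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up')
```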
Sending API requests with Python is something I have done before (yay requests library!). The same for writing data to a csv file. So I ended up writing this Python script, which got me this csv file.
In hindsight I would make one change to the script. The values “down”/”up” and “on”/”off” make data analysis harder than it should be. So later on I created a different csv file replacing down”/”up” and “on”/”off” with 0/1.
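The conversion itself is a simple mapping; a minimal sketch (the column layout of the row below is an assumption for illustration, nine button states followed by four lamp states):

```python
# Map the string values to 0/1 to make data analysis easier
mapping = {"up": 0, "down": 1, "off": 0, "on": 1}
row = ["down", "up", "down", "up", "up", "up", "down", "up", "up",
       "off", "on", "off", "off"]
binary_row = [mapping[value] for value in row]
print(binary_row)  # [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```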
Data analysis with a spreadsheet
Before trying to tackle the analysis with Python (see the next section), I went at it with a spreadsheet. I have done some data analysis and manipulation with spreadsheets in the past, so I figured that with some filters, some formulas, some conditional formatting, and perhaps a pivot table, I should be able to solve the Puzzle.
You can find that spreadsheet here. Normally I would use Excel, but I got curious if I could get it done with LibreOffice Calc instead. Turns out that yes.
Solving lamp 1
First thing I did was checking how many input combinations would turn each of the lamps on/off. For lamp 1 there are only 2 combinations that turn it on. For all other 510 combinations, it is off. Lamp 2 is on for 337 combinations, off for 175. And lamp 3 and 4 have the same ratio: 169 combinations for on and 343 combinations for off.
These results suggest that although we have four lamps, there might be just three types of behavior. Or just three behaviors. So I checked if lamp 3 and 4 do the same thing, but that’s not the case. For 91 input combinations both are on, for 2 x 78 one is off and the other on, for 265 combinations both are off.
Ok, back to lamp 1, because with just 2 input combinations that switch it on, it should be easy to look at those two combinations and figure out how it works.
And there it is: lamp 1 switches on when all buttons are either up or when all of them are down. So that’s lamp 1 solved.
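That rule is easy to express as a predicate; a sketch, representing each button as 0 for "up" and 1 for "down":

```python
def lamp1_on(buttons):
    """Lamp 1 is on when all nine buttons are in the same state."""
    return len(set(buttons)) == 1

print(lamp1_on([1] * 9))        # True: all buttons down
print(lamp1_on([0] * 9))        # True: all buttons up
print(lamp1_on([1] * 8 + [0]))  # False: mixed states
```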
Solving lamp 3 and 4
With lamps 2, 3 and 4 having so many combinations resulting in the lamps being either on or off, I added a column to the spreadsheet counting the number of “down” buttons for each input combination. This told me that switching lamp 2 requires at least two “down” buttons. Switching on lamp 3 or 4 required at least 3 “down” buttons. (So lamp 3 and 4 are still very similar.)
Another thing I did was adding conditional formatting similar to the GUI of the puzzle. “Down” buttons turn blue and “on” lights turn red. This made it a lot easier for me to spot patterns while looking at the data.
I decided to look further into lamp 3 and lamp 4 first. Looking at the input combinations with the minimal number of three “down” buttons that switch the lamps on, shows a clear pattern:
Lamp 3 switches on if three buttons with the same number are down; lamp 4 if three buttons with the same letter are down.
Of course, the question remains if that’s all there is to lamps 3 and 4. I did a quick visual spot check for lamp 4 with more than three buttons in the down state. That was more of a formality, though, since I knew from solving the Puzzle through the GUI, that I had the correct solution.
What I should have done, however, was add a formula to the spreadsheet calculating the state of lamps 1, 3 and 4 based on the inputs. Then add another formula comparing those states to the actual states and check if I got all of them correct. So I added these for lamp 4 in the spreadsheet in sheet “puzzle31_binary”.
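Such a formula could equally be sketched in Python, assuming the buttons form a 3×3 grid with the letters A-C as rows and the numbers 1-3 as columns (1 = "down"):

```python
def lamp3_on(grid):
    """On if some column (three buttons with the same number) is all down."""
    return any(all(grid[row][col] for row in range(3)) for col in range(3))

def lamp4_on(grid):
    """On if some row (three buttons with the same letter) is all down."""
    return any(all(grid[row][col] for col in range(3)) for row in range(3))

# A1, B1 and C1 down: a full column, so lamp 3 switches on but lamp 4 doesn't
column_down = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(lamp3_on(column_down), lamp4_on(column_down))  # True False
```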
Solving lamp 2
Taking a similar approach as with lamp 3 and 4, I looked at all input combinations with only 2 “down” buttons that switched lamp 2 on. That suggested a pattern related to the middle buttons of the outer rows and columns. However that pattern didn’t seem to hold up when also looking at the inputs with 3 “down” buttons. So I needed something more.
I decided to turn to pivot tables (pun intended). That didn’t gain me anything, though. I have only very rarely used pivot tables in Excel and now in LibreOffice Calc I failed to make a pivot table do anything useful. Back to using filters to make sense of lamp 2.
To make that a little easier, I copied all inputs that result in lamp 2 switching on to a new sheet. Filtering the input combinations on 2 or 3 “down” buttons and counting how often each button was “down”, showed a pattern. The count would be either 5, 6 or 15. And only the ones with count 15 play a role in the combinations with 2 “down” buttons. So I filter further, keeping only the combinations in which the inputs with count 15 are “up”. And I am left with two input combinations: A1-B2-C3 and A3-B2-C1.
On a hunch I decided to go back to the first filter result, while filtering out the two input combinations I just found. (Hence the “man filter” column in the sheet.) Looking at the counts, that leaves me with only two groups of inputs: the one with a count of 4 and with a count of 15. So it seems that hunch was a good one.
Then I look at all the input combinations in which button A2 is “down”. If there are only two “down” buttons, the other one is either B1 or B3. And when there are three “down” buttons, either B1 or B3 is “down” as well.
Performing the same steps with B1, shows the same pattern, but with A2 and C2. Which means there are two types of input that switch lamp 2 on: (1) inputs of the pattern A2-B1|B3, and (2) inputs of the pattern A1-B2-C3.
And that solves lamp 2: where lamp 3 and 4 are about the straight lines, lamp 2 covers the diagonals.
Data analysis with pandas and seaborn
For the data analysis in Python I created a Jupyter notebook – been wanting to look into these – which you can find here (rendered very nicely by GitHub, btw). For analysis I used the pandas library, for visualization seaborn.
Visualizing a single input combination
After using pandas’ dataframes to replicate some of the steps with the spreadsheet, I decided to “mis-use” seaborn’s heatmap to visualize the two input combinations in which lamp 1 is on. That allowed me to figure out how to create a heatmap, and how it would be displayed in the notebook.
Lamp state heatmap – attempt 1
Now that I had an idea of how to create heatmaps, I decided to create one of all the input combinations that result in lamp 3 switching on.
Obviously that didn’t do me any good. To be honest, I couldn’t quite believe my eyes, so I verified the result using the spreadsheet. It checked out. Thinking about it I realized that the input combinations could be sorted into three groups. And each group could be defined as: three buttons with the same number that are “down”, combined with all possible combinations for the other buttons. So basically all the noise drowns out any signal.
Lamp state heatmap – attempt 2
I realized I had to filter the data in some way, so I decided to reduce the dataset to input combinations that resulted in only one of the lamps being switched on.
You could say that the lamp 2 heatmap suggests a diagonal pattern, lamp 3 a vertical one, and lamp 4 a horizontal pattern. However, it’s not really clear cut. If I hadn’t known the solution, I’m not sure what I would have concluded from these heatmaps.
Lamp state heatmap – interlude
By now I had figured out I needed to find a way to analyse the correlation between different inputs – instead of throwing all inputs and outputs on one big pile.
After some googling I found that I could use pandas’ size() on the result of a DataFrame.groupby() to get more insight into these correlations. As you can see below, if I group by A1-A2-A3, I can see how often each unique combination of values of A1-A2-A3 occurs in a dataset.
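A minimal sketch of that groupby/size combination, on made-up data rather than the Puzzle dataset:

```python
import pandas as pd

# Toy dataset: four input combinations for buttons A1, A2 and A3
df = pd.DataFrame({
    "A1": [1, 1, 1, 0],
    "A2": [1, 0, 0, 0],
    "A3": [1, 1, 1, 0],
})
# size() counts how often each unique (A1, A2, A3) combination occurs
counts = df.groupby(["A1", "A2", "A3"]).size()
print(counts)  # e.g. the combination (1, 0, 1) occurs twice
```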
So I took the dataset that only switches lamp 4 on and grouped by A1-A2-A3, A1-B1-C1 and A1-B2-C3. The first two you can see below. The third one you can find in the notebook; it’s very similar to the second one.
I noticed two things. Firstly, the A1-A2-A3 result seems to adhere to some logic, while the A1-B1-C1 looks more random. Secondly, the A1-A2-A3 grouping contains input combinations in which all three buttons are “down” or “up”. The other grouping does not.
Lamp state heatmap – correlation
Feeling I was on the right track, I started browsing the pandas documentation and there I found the solution: the DataFrame.corr() method. It does exactly what I need it to do. You feed it all the input combinations that result in for example lamp 4 switching on and it calculates the correlation between the inputs.
The result is a table, where a positive number indicates a positive linear correlation (highest is 1), zero indicates no correlation, and a negative number indicates a negative linear correlation (lowest is -1). Since we are looking for a positive correlation (buttons being either up or down together resulting in lamp 4 switching on), this table allows us to solve lamp 4. Anything with a positive number is part of the pattern for lamp 4.
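A sketch of DataFrame.corr() on toy data (again, not the real dataset):

```python
import pandas as pd

# Toy data: A1 and B1 always move together, C1 is independent of A1
df = pd.DataFrame({
    "A1": [1, 1, 0, 0],
    "B1": [1, 1, 0, 0],
    "C1": [1, 0, 1, 0],
})
correlations = df.corr()
print(correlations.loc["A1", "B1"])  # 1.0: perfect positive correlation
print(correlations.loc["A1", "C1"])  # 0.0: no correlation
```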
And the pattern is even clearer in a heatmap:
Since I was curious, I did the same for the dataset of input combinations in which only lamp 4 and none of the others are switched on. Interestingly, although the pattern is still visible, it is less clear. We need the full dataset with what I previously called “noise” to properly calculate the correlations.
Now let’s look at the other three heatmaps. The one for lamp 3 shows a clear pattern similar to lamp 4.
The correlation heatmap of lamp 1 shows a clear pattern, although it does look a bit weird because of the logic behind its behavior: all buttons either up or down.
Finally, the heatmap of lamp 2 does show a clear enough pattern (focus on the positive versus negative numbers), but it doesn’t jump out as much as with lamps 3 or 4.
I can think of two reasons for the pattern not to jump out as clearly. First of all, there are two patterns in the data: diagonals of two “down” buttons and diagonals of three “down” buttons. Add to that all the different states the rest of the buttons can be in and you get a lot of noise. Secondly, some of the correlations of the two “down” buttons pattern cancel each other out. You can see that in this heatmap showing the correlation between input combinations with max two “down” buttons switching lamp 2 on:
I don’t think this heatmap is a really valid way of using correlation coefficients because of the limited dataset, but it still tells us something about the behavior of lamp 2.
And that wraps that up: Puzzle 31 solved with Python data analysis.
Since I wasn’t familiar with all the tools I was using, solving the Puzzle this way took a long time – literally hours. I’d be faster next time, of course, but finding the solution through exploring the GUI can be done in 5-10 minutes.
I had a lot of fun trying to figure out how to get these tools to do what I wanted. So a big thank you to all the great people who built these tools.
I still have a lot to learn on how to do data analysis well. For instance, when seeing the positive numbers in the correlation tables, I was like: “Puzzle solved!” It was only when writing this blog post that I dove a little deeper into what those numbers actually mean.
Excel is an amazing programming language that empowers ‘normal people’ to do programming in a variety of different domains in finance. And this became the motto of the PhD dissertation: “Spreadsheets are code.” Spreadsheets are a valid means of programming.
In that light it was interesting to notice how a Jupyter notebook is basically a spreadsheet on steroids.
Something that wasn’t as fun as I had hoped, was solving the Puzzle with Python. I didn’t have the expected eureka-feeling when I saw how the correlation heatmaps provided the solution. Ok, it was the end of a tiring day and I knew the solution beforehand, but still.
Writing this blog post made me realize how my decisions while doing the data analysis were informed by me already knowing the solution. To give one very basic example: I knew that putting buttons in their “down” state made the lamps switch on. So from the start the focus of my data analysis was: “What input combinations make the lights switch on?”
So if I hadn’t known the GUI and the solution, I think we could have seen a clearer difference between my data analysis approaches and a GUI approach. Instead of having the mental representation of 9 buttons in a 3×3 grid and 4 lamps, I would only have the data. 512 combinations of 9 binary inputs and 4 binary outputs. I would have found the patterns in the data and then… I’m not sure actually. Look at the GUI to see what the thing is that the data relates to?
I guess there’s only one way to find out: next time James Lyndsay releases a Black Box Puzzle with an API, I am going API-first.