Getting [name] from “Name: [name]” in Python – an engineering problem

Today I was presented with an interesting engineering problem. (Important later: context was the code of an auto-test.) Given a string of the format “Name: [name]”, what’s the best way to get the [name] in Python?

There are several options:
– lstrip()
– split()
– replace()
– string slicing
– regex

So let’s look at each of them and then I’ll explain which one I prefer and why. All examples are in Python 3.6, using the Python Interpreter.

lstrip()

>>> our_string = "Name: this-is-the-name"
>>> our_string.lstrip("Name: ")
'this-is-the-name'

That seems to work until we try a different string:

>>> our_other_string = "Name: a name"
>>> our_other_string.lstrip("Name: ")
'name'

This may seem weird, but lstrip() is doing exactly what it says it will do: “The chars argument is not a prefix; rather, all combinations of its values are stripped.” So it will continue stripping until it encounters the first character that doesn’t match any character in the string we gave to lstrip().

To fix that, we can do:

>>> our_other_string = "Name: a name"
>>> our_other_string.lstrip("Name:").lstrip()
'a name'

Where the first lstrip() won’t get beyond the space and the second lstrip() will get rid of that space for us.
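
To make the character-set behaviour concrete, here is a small sketch that wraps both variants in functions. The function names and the sample name "emma" are my own, picked only for illustration: "emma" is a nasty case because every one of its letters also occurs in "Name: ".

```python
def strip_label_naive(text):
    # lstrip() treats "Name: " as a set of characters: N, a, m, e, ':' and ' '.
    return text.lstrip("Name: ")


def strip_label_two_step(text):
    # The first lstrip() stops at the space (which is not in its set),
    # the second one removes the leading whitespace.
    return text.lstrip("Name:").lstrip()


print(repr(strip_label_naive("Name: emma")))     # ''
print(repr(strip_label_two_step("Name: emma")))  # 'emma'
```

Keep in mind that the two-step version still strips characters, not a prefix. It only works because a space separates the label from the name.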

split()

>>> our_string = "Name: this-is-the-name"
>>> our_string.split(":")[1].lstrip()
'this-is-the-name'

>>> our_other_string = "Name: a name"
>>> our_other_string.split(":")[1].lstrip()
'a name'

split() splits the string on the colon (:) into a list, which has two items as long as the name itself contains no colon. We get the second item with ‘[1]’ (list indices start at 0) and strip off the leading space with lstrip().
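
One thing to watch out for is a colon inside the name itself. The sketch below shows what happens in that case and how the optional maxsplit argument of split() deals with it. The function name and the sample string are made up for this example.

```python
def name_after_colon(text):
    # maxsplit=1 splits on the first colon only, so a colon
    # inside the name stays part of the name.
    return text.split(":", 1)[1].lstrip()


tricky = "Name: Dr. Who: the sequel"

print(tricky.split(":")[1].lstrip())  # Dr. Who
print(name_after_colon(tricky))       # Dr. Who: the sequel
```

Both variants raise an IndexError when the string contains no colon at all, since the list then has only one item.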

replace()

>>> our_string = "Name: this-is-the-name"
>>> our_string.replace("Name: ", "", 1)
'this-is-the-name'

>>> our_other_string = "Name: a name"
>>> our_other_string.replace("Name: ", "", 1)
'a name'

The replace() method replaces parts of a string. So here we replace “Name: ” with an empty string – deleting it. The ‘1‘ argument at the end tells replace() to only replace the first occurrence:

>>> too_many = "Name: more Name: please"
>>> too_many.replace("Name: ", "")
'more please'

>>> too_many.replace("Name: ", "", 1)
'more Name: please'
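
There are two more things worth knowing about replace() here: it does not care where in the string the first occurrence is, and it silently returns the string unchanged when there is nothing to replace. A minimal sketch, with a function name and sample strings of my own invention:

```python
def remove_label(text):
    # count=1: replace only the first occurrence of "Name: ",
    # wherever in the string that occurrence happens to be.
    return text.replace("Name: ", "", 1)


print(remove_label("Name: a name"))       # a name
print(remove_label("Full Name: a name"))  # Full a name
print(remove_label("Title: a name"))      # Title: a name
```

So replace() tells the reader which label we expect, but it will not complain when the string looks different.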

string slicing

Python allows you to grab slices from a string, much like you can from a list (we used the related indexing syntax on the result of split() above).

>>> our_string = "Name: this-is-the-name"
>>> our_string[6:]
'this-is-the-name'

>>> our_other_string = "Name: a name"
>>> our_other_string[6:]
'a name'

The number before the colon tells Python where to start, the one after the colon where to end. So ‘[6:]‘ means to start after the 6th character (or rather from index 6, since indices start at 0) and continue until the end of the string (because no number was given).
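
The 6 is a magic number, so one possible variation is to derive it from the label we expect. The sketch below does that; the constant and function names are hypothetical. It also shows that slicing never looks at the characters it throws away.

```python
LABEL = "Name: "  # the prefix we expect; it is 6 characters long


def slice_off_label(text):
    # Drops the first len(LABEL) characters without checking what they are.
    return text[len(LABEL):]


print(slice_off_label("Name: a name"))  # a name
print(slice_off_label("User: a name"))  # a name
print(repr(slice_off_label("Name")))    # ''
```

A string that is too short does not raise an error either: the slice simply comes back empty.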

regex

>>> import re
>>> regex = "(^Name: )(.+)"

>>> our_string = "Name: this-is-the-name"
>>> re.match(regex, our_string).group(2)
'this-is-the-name'

>>> our_other_string = "Name: a name"
>>> re.match(regex, our_other_string).group(2)
'a name'

The regular expression is defined in the regex variable. It describes a string that starts with “Name: ” and is followed by at least one other character. The parentheses define groups, which we’ll use to get the part we’re interested in, i.e. the part after “Name: “.

match() will return a match object if zero or more characters at the beginning of the string match the regular expression pattern, and None otherwise. Since match() only looks at the beginning of the string, we don’t actually need the ‘^’ at the start of our regex.

group() will return one or more subgroups of the match. The zeroth group is the entire match. Thanks to the parentheses in our regex definition we also have a first group (“Name: “) and a second group (the thing we’re after).
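
When the string does not match, re.match() returns None, and calling group() on that raises an AttributeError. The sketch below is one way to make that failure more readable; it is not the code we used at the office. The names NAME_PATTERN and extract_name are my own, and I dropped the parentheses around the label, so the name ends up in group 1.

```python
import re

# One capturing group, so the name is group(1) instead of group(2).
NAME_PATTERN = re.compile(r"Name: (.+)")


def extract_name(text):
    match = NAME_PATTERN.match(text)
    if match is None:
        # Fail with a message that says what we expected to find.
        raise ValueError("expected 'Name: [name]', got {!r}".format(text))
    return match.group(1)


print(extract_name("Name: a name"))             # a name
print(extract_name("Name: more Name: please"))  # more Name: please
```

In an auto-test the explicit error is a nice bonus: when the tag content changes, the failure message tells you what the test expected.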

which one to use

Going through the options they all have disadvantages:
– lstrip() strips characters not strings, but we want to strip a certain string.
– split() returns two things of which we need only one after we have lstrip()-ed it.
– replace() returns a changed copy of a string, while we just want to parse the original string.
– string slicing ditches the first 6 characters not caring what they are.
– regex feels like a bit too much.

However, two of them have a distinct advantage over the others.

As I said in the introduction, the context of this problem was the code of an auto-test. More specifically, the “Name: [name]” is returned by Selenium WebDriver as the content of an html tag. So without knowing or seeing the application, just from reading the code that retrieves the content of the tag, you have no idea what this string is that we are manipulating.

So that’s the first thing we want from our solution: it should give as much information about the string we are manipulating as possible. This means that split() and string slicing are not an option. The split() solution only tells you there’s a colon followed by whitespace in the string. The string slicing solution only tells you that we expect at least 7 characters in the string.

Secondly, the solution should capture our intention as clearly as possible, which is getting the [name] from the string “Name: [name]”. That means lstrip() is out, because it strips characters, not a specific string.

You could argue something similar for replace(), since the “replace with empty string” is a somewhat clever hack to use something intended for editing to retrieve a specific part of a string. On the other hand, the code is clear enough – unlike lstrip() where the characters-not-string might become a bit of a surprise down the line.

So basically we’re left with replace() and regex. At the office we hadn’t figured out that replace() has a count argument, so based on ‘But what if there’s a “Name: ” in the content of the tag?’, we went with the regex solution. Regardless, I still think that the regex solution is the clearest one.

conclusion

The Zen of Python states: “Explicit is better than implicit.” This one sentence is the reason I can write a whole blog post about what the best way is to get [name] from “Name: [name]”. Code being explicit about what it’s doing and what it is trying to do, matters. The more explicit it is, the better it can serve as documentation.

But what if you have to choose? The regex solution is indeed the clearest – assuming you know some regex, that is. So the solution that’s most explicit about what the string is and what our intentions with it are, is the least explicit about how it does what it does. (Also, if you have trouble understanding what it does, it will be hard to determine the intentions behind the code.) So what now?

Personally I will always trade the code being less explicit about what it does for being more explicit about the thing-under-test and what my intentions are. Because figuring out what the thing-under-test exactly does and what someone’s intentions were, are hard things to do. Learning some new things about a programming language or tool, significantly less so – even if it’s regex.

Your CI/CD pipeline does not run regression tests

CI/CD pipelines

The purpose of a CI/CD pipeline is to allow you to deliver small changes in a fast and controlled way. Without any tests in your pipeline you would gain a lot of speed. You’d also lose a lot of control, which is why people in general do run tests in their pipeline. The purpose of these tests is to check if that stage of the pipeline meets the minimum level of acceptable quality for that stage.

For example, commit stage tests will consist of mostly unit tests, a few integration tests, and even fewer end-to-end tests, because early in the pipeline speed is more important than comprehensiveness. When I commit my changes, I want the results fast enough so that I will wait for them – ready to fix any issue that might occur.

Regression testing

There are many definitions of regression testing, as you can read in Arborosa’s blog post on the topic. I have always defined regression testing along the lines of “testing the parts that weren’t impacted by a change to see if they really weren’t impacted.” (Which is really weird if you start thinking about it: something is regression testing depending on your knowledge of the system and the change.)

The tests in your pipeline are regression tests, …

Most of the tests that run in your pipeline are regression tests. Your commits are small and you have a lot of tests, so most of those will cover parts of the system that shouldn’t have been impacted by your changes. So yes, regression tests.

The one exception is if your commit contains both changes and new or updated tests related to that change. For that one run of the pipeline those tests are not regression tests. The next commit they are.
Or, since you ran those tests before committing, perhaps they already have become regression tests when they are executed by the pipeline?

Sidenote:
A grey area is when your commit is a pure refactoring, as in: you didn’t even have to change any of the tests. On the one hand, you made a change, so the tests covering that change, are not regression tests. On the other hand, at the level these tests are defined, there should be zero impact, they shouldn’t detect any changes. So in that sense they are regression tests.

…, but that’s irrelevant.

So sure, the tests run by your pipeline are regression tests. However, they are regression tests incidentally, not essentially. They happen to be regression tests, but that’s not really relevant.

To see why, we need to revisit the start of this blog post.

The purpose of a regression test is to check if unchanged parts of the system are indeed unchanged. It’s the testing that got a name, so we could distinguish it from the other testing, which never really got a name. (Progression testing? Feature testing?) It’s the testing you do after sufficient testing and fixing, when you’re not expecting any more changes and you need to check if all the “other stuff” still works.

The purpose of a test in a CI/CD pipeline is to check the level of quality of a particular stage in the pipeline. The pipeline stages combined with all the practices that surround them, result in a continuous delivery of changes that can be deployed to production. Whether the tests at a particular stage are regression tests or not, doesn’t matter. What does matter is if they provide the information required to decide if we should proceed to the next stage or not.

And that’s why I claim that your CI/CD pipeline does not run regression tests. The definition of “regression test” may technically apply to the tests run by your pipeline; the context that comes with the term, does not. So although it might (mostly) be correct to say that your pipeline runs regression tests, doing so is not helpful in how you think about your pipeline or about your tests. It moves your mind towards thinking about changed versus unchanged things – drawing it away from the continuous delivery of a good enough product.

update August 6th:
After publishing this post, I got the following question on Twitter: so how does this impact actual decisions? In response, I came up with four things you might do if you think of the tests in your pipeline as regression tests:
1. Not looking for regressions when exploratory testing because you already have so many regression tests.
2. Poorly designing the stages of the pipeline, because all it needs to do is just run those regression tests.
3. Doing exploratory testing too early in the pipeline, because you should do feature testing before regression testing.
4. Being lenient towards a failed pipeline because they’re just regressions, we can fix them later.

— — —

p.s. 1: One thing I’m glossing over is that your CI/CD pipeline can (should) have stages in which the testing involves a human. I don’t think it makes a difference for my argument. Yet I’m still conveniently limiting the scope of this post to the literal interpretation of “Your CI/CD pipeline does not run regression tests”.

p.s. 2: None of the ideas in this blog post are new, which you can see from the replies in the Twitter thread that led me to writing this blog post.