Saturday, June 29, 2019

Books I read (January - June 2019)

- Timeless Laws of Software Development, Jerry Fitzpatrick
- Writing to Learn, William Zinsser
- The End of the Affair, Graham Greene
- Beyond Legacy Code: Nine Practices to Extend the Life (and Value) of Your Software, David Scott Bernstein

- Refactoring Workbook, William C. Wake
- Binti, Nnedi Okorafor

- Home, Nnedi Okorafor
- The Night Masquerade, Nnedi Okorafor
- Developer Hegemony, Erik Dietrich
- The Ministry of Utmost Happiness, Arundhati Roy

- Cat on a Hot Tin Roof, Tennessee Williams
- Closely Watched Trains (Ostře sledované vlaky), Bohumil Hrabal
- The Chalk Giants, Keith Roberts
- Behold the Man, Michael Moorcock
- Cutting It Short (Postřižiny), Bohumil Hrabal
- La bicicleta de Sumji (סומכי : סיפור לבני הנעורים על אהבה והרפתקאות), Amos Oz
- The Sheltering Sky, Paul Bowles

- Elsewhere, Perhaps (מקום אחר), Amos Oz
- Liminal Thinking: Create the Change You Want by Changing the Way You Think, Dave Gray

- The Turn of the Screw, Henry James
- A Tale of Love and Darkness (סיפור על אהבה וחושך), Amos Oz
- Touch the Water, Touch the Wind (לגעת במים, לגעת ברוח), Amos Oz
- The Third Man, The Fallen Idol, Graham Greene
- The Little Book of Stupidity: How We Lie to Ourselves and Don't Believe Others, Sia Mohajer
- Slaughterhouse-Five, or The Children's Crusade: A Duty-Dance with Death, Kurt Vonnegut
- The Little Town Where Time Stood Still (Městečko, kde se zastavil čas), Bohumil Hrabal

Thursday, June 27, 2019

An example of listening to the tests to improve a design


Recently in the B2B team at LIFULL Connect, we improved the validation of the clicks our API receives using a service that detects whether the clicks were made by a bot or a human being.

So we used TDD to add this new validation to the previously existing validation that checked if the click contained all mandatory information. This was the resulting code:

and these were its tests:

The problem with these tests is that they know too much. They are coupled to many implementation details. They not only know the concrete validations we apply to a click and the order in which they are applied, but also details about what gets logged when a concrete validation fails. There are multiple axes of change that will make these tests break. The tests are fragile against those axes of change and, as such, they might become a maintenance burden if changes along those axes are required.
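To make that coupling concrete, here is a minimal Python sketch of what a procedural validation and one of its tests might look like. It is not the code from our API: the method names (`is_valid`, `is_bot`, `info`) and the log message are assumptions made up for illustration.

```python
from unittest.mock import Mock


class ClickValidation:
    """Procedural style: queries each validation, combines results, and logs."""

    def __init__(self, params_validator, bot_detector, logger):
        self.params_validator = params_validator
        self.bot_detector = bot_detector
        self.logger = logger

    def is_valid(self, click):
        if not self.params_validator.is_valid(click):
            return False
        if self.bot_detector.is_bot(click):
            # Hypothetical log message, only here to show the coupling.
            self.logger.info("Click rejected: made by a bot")
            return False
        return True


def test_rejects_and_logs_clicks_made_by_bots():
    # The test has to know every collaborator, how each one is queried,
    # and the exact log message: several separate reasons to break.
    params_validator = Mock()
    params_validator.is_valid.return_value = True
    bot_detector = Mock()
    bot_detector.is_bot.return_value = True
    logger = Mock()
    validation = ClickValidation(params_validator, bot_detector, logger)

    assert validation.is_valid({"id": "a-click"}) is False
    logger.info.assert_called_once_with("Click rejected: made by a bot")


test_rejects_and_logs_clicks_made_by_bots()
```

Renaming `is_bot`, rewording the log message or moving the logging elsewhere would each break this test even though the observable behaviour of the validation stays the same.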

So what might we do about that fragility when any of those changes come?

Improving the design to have less fragile tests.

As we said before, the test fragility was hinting at a design problem in the ClickValidation code. The problem is that it’s concentrating too much knowledge because it’s written in a procedural style in which it is querying every concrete validation to know if the click is ok, combining the results of all those validations and knowing when to log validation failures. Those are too many responsibilities for ClickValidation, and they are the cause of the fragility in the tests.

We can reverse this situation by changing to a more object-oriented implementation in which responsibilities are better distributed. Let’s see how that design might look:

1. Removing knowledge about logging.

After this change, ClickValidation will know nothing about logging. We can use the same technique to avoid knowing about any similar side-effects which concrete validations might produce.

First we create an interface, ClickValidator, that any object that validates clicks should implement:

Next we create a new class, NoBotClickValidator, that wraps the BotClickDetector and adapts[1] it to implement the ClickValidator interface. This wrapper also enriches BotClickDetector’s behavior by taking charge of logging in case the click is not valid.

These are the tests of NoBotClickValidator that takes care of the delegation to BotClickDetector and the logging:

If we used NoBotClickValidator in ClickValidation, we’d remove all knowledge about logging from ClickValidation.
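A minimal Python sketch of this step could look like the following. The class names come from the post, but everything else is an assumption: the `is_bot` method on the detector, the `info` method on the logger, the log message, and the two hand-rolled fakes used to exercise the wrapper.

```python
from abc import ABC, abstractmethod


class ClickValidator(ABC):
    """Interface implemented by every object that validates clicks."""

    @abstractmethod
    def is_valid(self, click) -> bool:
        ...


class NoBotClickValidator(ClickValidator):
    """Adapts a bot detector to ClickValidator and logs rejected clicks."""

    def __init__(self, bot_click_detector, logger):
        self._bot_click_detector = bot_click_detector
        self._logger = logger

    def is_valid(self, click) -> bool:
        if self._bot_click_detector.is_bot(click):
            self._logger.info("Click rejected: made by a bot")
            return False
        return True


# Hand-rolled test doubles (hypothetical), enough to exercise
# the delegation to the detector and the logging.
class FakeBotClickDetector:
    def __init__(self, bot_ids):
        self._bot_ids = set(bot_ids)

    def is_bot(self, click):
        return click["id"] in self._bot_ids


class RecordingLogger:
    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


logger = RecordingLogger()
validator = NoBotClickValidator(FakeBotClickDetector({"bot-click"}), logger)

assert validator.is_valid({"id": "human-click"}) is True
assert logger.messages == []
assert validator.is_valid({"id": "bot-click"}) is False
assert logger.messages == ["Click rejected: made by a bot"]
```

The logging now lives next to the only validation that needs it, so a caller that only sees `ClickValidator.is_valid` has nothing to learn about it.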

Of course, that knowledge would also disappear from its tests. By using the ClickValidator interface for all concrete validations and wrapping validations with side-effects like logging, we’d make ClickValidation tests robust to changes involving some of the possible axes of change that were making them fragile:

  1. Changing the interface of any of the individual validations.
  2. Adding side-effects to any of the validations.

2. Another improvement: don't use test doubles when it's not worth it[2].

There’s another way to make ClickValidation tests less fragile.

If we have a look at ClickParamsValidator and BotClickDetector (I can’t show their code here for security reasons), they have very different natures. ClickParamsValidator has no collaborators, no state and a very simple logic, whereas BotClickDetector has several collaborators, state and a complicated validation logic.

Stubbing ClickParamsValidator in ClickValidation tests is not giving us any benefit over directly using it, and it’s producing coupling between the tests and the code.

On the contrary, stubbing NoBotClickValidator (which wraps BotClickDetector) is really worth it, because, even though it also produces coupling, it makes ClickValidation tests much simpler.

Using a test double when you’d be better off using the real collaborator is a weakness in the design of the test, rather than in the code to be tested.

These would be the tests for the ClickValidation code with no logging knowledge, after applying this idea of not using test doubles for everything:

Notice how the tests now use the real ClickParamsValidator and how that reduces the coupling with the production code and makes the set up simpler.
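As an illustration, this is how such tests might be sketched in Python. Since the real ClickParamsValidator can’t be shown, the one below is a hypothetical stand-in that checks two made-up mandatory fields (`id` and `url`); only the complex collaborator is replaced by a stub.

```python
from unittest.mock import Mock


class ClickParamsValidator:
    """Hypothetical stand-in: stateless, no collaborators, trivial logic."""

    MANDATORY = ("id", "url")

    def is_valid(self, click):
        return all(click.get(name) for name in self.MANDATORY)


class ClickValidation:
    def __init__(self, click_params_validator, no_bot_click_validator):
        self._click_params_validator = click_params_validator
        self._no_bot_click_validator = no_bot_click_validator

    def is_valid(self, click):
        return (self._click_params_validator.is_valid(click)
                and self._no_bot_click_validator.is_valid(click))


def click_validation(made_by_human):
    # Only the complex collaborator is stubbed; the simple one is real.
    no_bot_click_validator = Mock()
    no_bot_click_validator.is_valid.return_value = made_by_human
    return ClickValidation(ClickParamsValidator(), no_bot_click_validator)


complete_click = {"id": "a-click", "url": "https://example.com"}
incomplete_click = {"id": "a-click"}

assert click_validation(made_by_human=True).is_valid(complete_click) is True
assert click_validation(made_by_human=False).is_valid(complete_click) is False
assert click_validation(made_by_human=True).is_valid(incomplete_click) is False
```

The test cases describe clicks (complete or incomplete) instead of programming the answers of a stubbed parameter validator, which is what makes the set up shorter.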

3. Removing knowledge about the concrete sequence of validations.

After this change, the new design will compose validations in a way that will result in ClickValidation being only in charge of combining the result of a given sequence of validations.

First we refactor the click validation so that the validation is now done by composing several validations:

The new validation code has several advantages over the previous one:

  • It does not depend on concrete validations any more
  • It does not depend on the order in which the validations are made.

It has only one responsibility: it applies several validations in sequence. If all of them are valid, it accepts the click, but if any given validation fails, it rejects the click and stops applying the rest of the validations. If you think about it, it’s behaving like an and operator.
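This “and operator” behaviour can be sketched in a few lines of Python. The sketch assumes every validation exposes an `is_valid(click)` method; `RecordingValidator` is a hypothetical helper that exists only to show that the sequence stops at the first failure.

```python
class ClickValidation:
    """Accepts a click only if every validator accepts it (a logical 'and')."""

    def __init__(self, validators):
        self._validators = list(validators)

    def is_valid(self, click):
        # all() over a generator stops at the first failing validator.
        return all(v.is_valid(click) for v in self._validators)


class RecordingValidator:
    """Test helper returning a fixed result and recording whether it ran."""

    def __init__(self, result):
        self._result = result
        self.called = False

    def is_valid(self, click):
        self.called = True
        return self._result


passing, failing, skipped = (RecordingValidator(True),
                             RecordingValidator(False),
                             RecordingValidator(True))

assert ClickValidation([passing, failing, skipped]).is_valid({}) is False
assert passing.called and failing.called
assert not skipped.called  # never consulted after the first failure
assert ClickValidation([RecordingValidator(True)]).is_valid({}) is True
```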

We may write these tests for this new version of the click validation:

These tests are robust to the changes that made the initial version of the tests fragile, which we described in the introduction:

  1. Changing the interface of any of the individual validations.
  2. Adding side-effects to any of the validations.
  3. Adding more validations.
  4. Changing the order of the validation.

However, this version of ClickValidationTest is so general and flexible, that using it, our tests would stop knowing which validations, and in which order, are applied to the clicks[3]. That sequence of validations is a business rule and, as such, we should protect it. We might keep this version of ClickValidationTest only if we had some outer test protecting the desired sequence of validations.

This other version of the tests, on the other hand, keeps protecting the business rule:

Notice how this version of the tests keeps in its setup the knowledge of which sequence of validations should be used, and how it only uses test doubles for NoBotClickValidator.

4. Avoid exposing internals.

The fact that we’re injecting into ClickValidation an object, ClickParamsValidator, that we realized we didn’t need to double is a smell which points to the possibility that ClickParamsValidator is an internal detail of ClickValidation instead of its peer. So by injecting it, we’re coupling ClickValidation users, or at least the code that creates it, to an internal detail of ClickValidation: ClickParamsValidator.

A better version of this code would hide ClickParamsValidator by instantiating it inside ClickValidation’s constructor:

With this change ClickValidation recovers the knowledge of the sequence of validations which in the previous section was located in the code that created ClickValidation.
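A rough Python sketch of that constructor follows. ClickParamsValidator is again a hypothetical stand-in with made-up mandatory fields, and `AlwaysHumanValidator` is a stub invented for the example.

```python
class ClickParamsValidator:
    """Hypothetical stand-in for the real, simple parameter check."""

    def is_valid(self, click):
        return bool(click.get("id")) and bool(click.get("url"))


class ClickValidation:
    def __init__(self, no_bot_click_validator):
        # The sequence of validations is decided here, not by the caller.
        self._validators = [ClickParamsValidator(), no_bot_click_validator]

    def is_valid(self, click):
        return all(v.is_valid(click) for v in self._validators)


class AlwaysHumanValidator:
    """Stub for the injected click-not-made-by-a-bot validation."""

    def is_valid(self, click):
        return True


validation = ClickValidation(AlwaysHumanValidator())
assert validation.is_valid({"id": "a-click", "url": "https://example.com"})
assert not validation.is_valid({"id": "a-click"})
```

The code that creates ClickValidation now passes a single collaborator, and it can’t get the order of the validations wrong.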

There are some stereotypes that can help us identify real collaborators (peers)[4]:

  1. Dependencies: services that the object needs from its environment so that it can fulfill its responsibilities.
  2. Notifications: other parts of the system that need to know when the object changes state or performs an action.
  3. Adjustments or Policies: objects that tweak or adapt the object’s behaviour to the needs of the system.

Following these stereotypes, we could argue that NoBotClickValidator is also an internal detail of ClickValidation and shouldn’t be exposed to the tests by injecting it. Hiding it, we’d arrive at this other version of ClickValidation:

in which we have to inject the real dependencies of the validation, and no internal details are exposed to the client code. This version is very similar to the one we’d have got using test doubles only for infrastructure.

The advantage of this version would be that its tests would know the least possible about ClickValidation. They’d know only ClickValidation’s boundaries marked by the ports injected through its constructor, and ClickValidation’s public API. That will reduce the coupling between tests and production code, and facilitate refactorings of the validation logic.

The drawback is that the combinations of test cases in ClickValidationTest would grow, and many of those test cases would talk about situations happening in the validation boundaries that might be far apart from ClickValidation’s callers. This might make the tests hard to understand, especially if some of the validations have complex logic. When this problem gets severe, we may reduce it by injecting very complex validators and using test doubles for them. This is a trade-off in which we decide to accept some coupling with the internals of ClickValidation in order to improve the understandability of its tests. In our case, the bot detection was one of those complex components, so we decided to test it separately and inject it into ClickValidation so we could double it in ClickValidation’s tests. That is why we kept the penultimate version of ClickValidation, in which we were injecting the click-not-made-by-a-bot validation.


In this post, we tried to play with an example to show how, by listening to the tests[5], we can detect possible design problems, and how we can use that feedback to improve both the design of our code and its tests, when changes that expose those design problems are required.

In this case, the initial tests were fragile because the production code was procedural and had too many responsibilities. The tests were fragile also because they were using test doubles for some collaborators when it wasn’t worth doing it.

Then we showed how refactoring the original code to be more object-oriented and separating its responsibilities better could remove some of the fragility of the tests. We also showed how reducing the use of test doubles only to those collaborators that really need to be substituted can improve the tests and reduce their fragility. Finally, we showed how we can go too far in trying to make the tests flexible and robust, and accidentally stop protecting a business rule, and how a less flexible version of the tests can fix that.

When faced with fragility due to coupling between tests and the code being tested caused by using test doubles, it’s easy and very usual to “blame the mocks”, but, we believe, it would be more productive to listen to the tests to notice which improvements in our design they are suggesting. If we act on this feedback the test doubles give us about our design, we can use test doubles to our advantage, as powerful feedback tools[6] that help us improve our designs, instead of just suffering and blaming them.


Many thanks to my Codesai colleagues Alfredo Casado, Fran Reyes, Antonio de la Torre and Manuel Tordesillas, and to my Aprendices colleagues Paulo Clavijo, Álvaro García and Fermin Saez for their feedback on the post, and to my colleagues at LIFULL Connect for all the mobs we enjoy together.


[2] See Test Smell: Everything is mocked by Steve Freeman where he talks about things you shouldn't be substituting with test doubles.
[3] Thanks Alfredo Casado for detecting that problem in the first version of the post.
[4] From Growing Object-Oriented Software, Guided by Tests > Chapter 6, Object-Oriented Style > Object Peer Stereotypes, page 52. You can also read about these stereotypes in a post by Steve Freeman: Object Collaboration Stereotypes.
[5] Difficulties in testing might be a hint of design problems. Have a look at this interesting series of posts about listening to the tests by Steve Freeman.
[6] According to Nat Pryce mocks were designed as a feedback tool for designing OO code following the 'Tell, Don't Ask' principle: "In my opinion it's better to focus on the benefits of different design styles in different contexts (there are usually many in the same system) and what that implies for modularisation and inter-module interfaces. Different design styles have different techniques that are most applicable for test-driving code written in those styles, and there are different tools that help you with those techniques. Those tools should give useful feedback about the external and *internal* quality of the system so that programmers can 'listen to the tests'. That's what we -- with the help of many vocal users over many years -- designed jMock to do for 'Tell, Don't Ask' object-oriented design." (from a conversation in Growing Object-Oriented Software Google Group).

I think that if your design follows a different OO style, it might be preferable to stick to a classical TDD style which nearly limits the use of test doubles only to infrastructure and undesirable side-effects.

Saturday, May 25, 2019

The curious case of the negative builder

Recently, one of the teams I’m coaching at my current client asked me to help them with a problem they were experiencing while using TDD to add and validate new mandatory query string parameters[1]. This is a shortened version (validating fewer parameters than the original code) of the tests they were having problems with:

and this is the implementation of the QueryStringBuilder used in this test:

which is a builder with a fluent interface that follows to the letter a typical implementation of the pattern. There are even libraries that help you to automatically create this kind of builder[2].

However, in this particular case, implementing the QueryStringBuilder following this typical recipe causes a lot of problems. Looking at the test code, you can see why.

To add a new mandatory parameter, for example sourceId, following the TDD cycle, you would first write a new test asserting that a query string lacking the parameter should not be valid.

So far so good. The problem comes when you change the production code to make this test pass. At that moment you’ll see how the first test, which was asserting that a query string with all the parameters was valid, starts to fail (if you check the query string of that test and the one in the new test, you’ll see that they are the same). Not only that, all the previous tests that were asserting that a query string was invalid because a given parameter was lacking won’t be “true” anymore because after this change they could fail for more than one reason.

So to carry on, you’d need to fix the first test and also change all the previous ones so that they fail again only for the reason described in the test name:

That’s a lot of rework on the tests only for adding a new parameter, and the team had to add many more. The typical implementation of a builder was not helping them.
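The effect can be reproduced with a small Python sketch of a typical additive builder. Only sourceId comes from the example above; the `country` and `page` parameters, the method names and the `is_valid` helper are assumptions invented for illustration.

```python
from urllib.parse import parse_qs, urlencode


class QueryStringBuilder:
    """Typical additive builder: starts empty, each method adds a part."""

    def __init__(self):
        self._params = {}

    def with_country(self, value):
        self._params["country"] = value
        return self

    def with_page(self, value):
        self._params["page"] = value
        return self

    def with_source_id(self, value):
        self._params["sourceId"] = value
        return self

    def build(self):
        return urlencode(self._params)


def is_valid(query_string, mandatory):
    # Hypothetical validation: every mandatory parameter must be present.
    return all(name in parse_qs(query_string) for name in mandatory)


# Query string used by the existing "all parameters present" test.
all_params = QueryStringBuilder().with_country("es").with_page("1").build()
assert is_valid(all_params, mandatory=("country", "page"))

# Once sourceId becomes mandatory, that same test fails and must be edited...
assert not is_valid(all_params, mandatory=("country", "page", "sourceId"))

# ...and an older "page is missing" test now lacks two parameters, not one.
without_page = QueryStringBuilder().with_country("es").build()
missing = {"country", "page", "sourceId"} - set(parse_qs(without_page))
assert missing == {"page", "sourceId"}
```

Every existing test has to add a `with_source_id(...)` call to go back to failing for a single reason.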

The problem we’ve just explained can be avoided by choosing a default value that creates a valid query string and what I call “a negative builder”, a builder with methods that remove parts instead of adding them. So we refactored together the initial version of the tests and the builder, until we got to this new version of the tests:

which used a “negative” QueryStringBuilder:

After this refactoring, to add the sourceId we wrote this test instead:

which only requires updating the valid method in QueryStringBuilder and adding a method that removes the sourceId parameter from a valid query string.

Now when we changed the code to make this last test pass, no other test failed or started to have descriptions that were not true anymore.
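For illustration, a “negative builder” might be sketched in Python like this. The `valid` method and sourceId come from the description above; the other parameters, the `without_*` method names and the `is_valid` helper are assumptions.

```python
from urllib.parse import parse_qs, urlencode


class QueryStringBuilder:
    """'Negative' builder: starts valid, each method removes a part."""

    def __init__(self, params):
        self._params = dict(params)

    @classmethod
    def valid(cls):
        # The only place to touch when a new mandatory parameter appears.
        return cls({"country": "es", "page": "1", "sourceId": "42"})

    def without_page(self):
        return self._without("page")

    def without_source_id(self):
        return self._without("sourceId")

    def _without(self, name):
        remaining = {k: v for k, v in self._params.items() if k != name}
        return QueryStringBuilder(remaining)

    def build(self):
        return urlencode(self._params)


MANDATORY = ("country", "page", "sourceId")


def is_valid(query_string):
    # Hypothetical validation: every mandatory parameter must be present.
    return all(name in parse_qs(query_string) for name in MANDATORY)


assert is_valid(QueryStringBuilder.valid().build())
assert not is_valid(QueryStringBuilder.valid().without_page().build())
assert not is_valid(QueryStringBuilder.valid().without_source_id().build())
```

Each “missing parameter” test removes exactly one part from a valid query string, so it keeps failing for one reason only, no matter how many mandatory parameters are added later.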

Leaving behind the typical recipe and adapting the idea of the builder pattern to the context of the problem at hand, led us to a curious implementation, a “negative builder”, that made the tests easier to maintain and improved our TDD flow.


Many thanks to my Codesai colleagues Antonio de la Torre and Fran Reyes, and to all the colleagues of the Prime Services Team at LIFULL Connect for all the mobs we enjoy together.


[1] Currently, this validation is not done in the controller anymore. The code shown above belongs to a very early stage of an API we're developing.
[2] Have a look, for instance, at Lombok's @Builder annotation for Java.

Tuesday, May 14, 2019

Killing mutants to improve your tests

At my current client we’re working on having a frontend architecture for writing SPAs in JavaScript similar to re-frame’s one: an event-driven bus with effects and coeffects for state management[1] (commands) and subscriptions using reselect’s selectors (queries).

One of the pieces we have developed to achieve that goal is reffects-store. Using this store, React components can be subscribed to given reselect’s selectors, so that they only render when the values in the application state tracked by the selectors change.

After we finished writing the code for the store, we decided to use mutation testing to evaluate the quality of our tests. Mutation testing is a technique in which you introduce bugs (mutations) into your production code, and then run your tests for each mutation. If your tests fail, it’s ok: the mutation was “killed”, which means that they were able to defend you against the regression caused by the mutation. If they don’t, it means your tests are not defending you against that regression. The higher the percentage of mutations killed, the more effective your tests are.
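The idea can be reproduced by hand with a tiny Python sketch. The `is_adult` function, its mutant and the two suites are made up for this illustration and have nothing to do with the store’s code.

```python
def is_adult(age):
    return age >= 18


def is_adult_mutant(age):
    # Mutation: the boundary operator `>=` was replaced by `>`.
    return age > 18


def weak_suite(fn):
    """Returns True when every check passes for the given implementation."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds the boundary case that the weak suite forgot."""
    return weak_suite(fn) and fn(18) is True


def mutant_survives(suite, mutant):
    # A mutant survives when the suite still passes against it.
    return suite(mutant)


assert weak_suite(is_adult) and strong_suite(is_adult)
assert mutant_survives(weak_suite, is_adult_mutant)        # survived
assert not mutant_survives(strong_suite, is_adult_mutant)  # killed
```

Both suites pass against the original code, but only the second one notices the mutated boundary. That difference is what the percentage of killed mutants measures.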

There are tools that do this automatically; stryker[2] is one of them. When you run stryker, it will create many mutant versions of your production code, and run your tests for each mutant (that’s what mutations are called in stryker’s documentation) version of the code. If your tests fail, then the mutant is killed. If your tests pass, the mutant survived. Let’s have a look at the result of running stryker against reffects-store’s code:

Notice how stryker shows the details of every mutation that survived our tests, and look at the summary it produces at the end of the process.

All the surviving mutants were produced by mutations to the store.js file. Having a closer look at the mutations in stryker’s output, we found that the functions with mutant code were unsubscribeAllListeners and unsubscribeListener. After a quick check of their tests, it was easy to find out why unsubscribeAllListeners was having surviving mutants. Since it was a function we used only in tests for cleaning the state after each test case was run, we had forgotten to test it.

However, finding out why unsubscribeListener mutants were surviving took us a bit more time and thinking. Let’s have a look at the tests that were exercising the code used to subscribe and unsubscribe listeners of state changes:

If we examine the mutations and the tests, we can see that the tests for unsubscribeListener are not good enough. They are throwing an exception from the subscribed function we unsubscribe, so that if the unsubscribeListener function doesn’t work and that function is called, the test fails. Unfortunately, the test also passes if that function is never called for any other reason. In fact, most of the surviving mutants that stryker found above are variations on that idea.

A better way to test unsubscribeListener is using spies to verify that subscribed functions are called and unsubscribed functions are not (this version of the tests includes also a test for unsubscribeAllListeners):
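The difference between both ways of testing can be shown with a Python sketch. The `Store` below is a simplified, hypothetical store (it is not reffects-store’s API), and `MutantStore` plays the role of one surviving mutant: its unsubscribe removes every listener.

```python
from unittest.mock import Mock


class Store:
    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def unsubscribe(self, listener):
        self._listeners = [l for l in self._listeners if l is not listener]

    def set_state(self, state):
        for listener in list(self._listeners):
            listener(state)


class MutantStore(Store):
    """A mutant whose unsubscribe removes every listener."""

    def unsubscribe(self, listener):
        self._listeners = []


def weak_test(store_class):
    store = store_class()

    def exploding(state):
        raise AssertionError("unsubscribed listener was called")

    store.subscribe(lambda state: None)
    store.subscribe(exploding)
    store.unsubscribe(exploding)
    store.set_state({"a": 1})  # passes as long as `exploding` is not called


def spy_test(store_class):
    store = store_class()
    kept, removed = Mock(), Mock()
    store.subscribe(kept)
    store.subscribe(removed)
    store.unsubscribe(removed)
    store.set_state({"a": 1})
    kept.assert_called_once_with({"a": 1})  # still subscribed: must be called
    removed.assert_not_called()             # unsubscribed: must not be called


weak_test(Store)
weak_test(MutantStore)  # the mutant survives the weak test
spy_test(Store)
try:
    spy_test(MutantStore)
    mutant_killed = False
except AssertionError:
    mutant_killed = True
assert mutant_killed    # the spy-based test kills it
```

The spy-based test checks both sides of the behaviour, that the remaining listener is called and that the removed one is not, so “nobody was called” no longer counts as success.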

After this change, when we ran stryker we got the following output:

No mutants survived!! This means this new version of the tests is more reliable and will protect us better from regressions than the initial version.

Mutation testing is a great tool to know if you can trust your tests. This is even more true when working with legacy code.


Many thanks to Mario Sánchez and Alex Casajuana Martín for all the great time coding together, and thanks to Porapak Apichodilok for the photo used in this post and to Pexels.


[1] See also reffects which is the synchronous event bus with effects and coeffects we wrote to manage the application state.
[2] The name of this tool comes from a fictional Marvel Comics supervillain, William Stryker, who was obsessed with the eradication of all mutants.