Garajeando: Tests

Showing posts with label Tests. Show all posts

Wednesday, May 4, 2022

Interesting Talk: "The deep synergy between testability and good design"

I've just watched this great talk by Michael Feathers

The deep synergy between testability and good design

Sunday, October 18, 2020

Sleeping is not the best option

Introduction.

Some time ago we were developing a code that stored some data with a given TTL. We wanted to check not only that the data was stored correctly but also that it expired after the given TTL. This is an example of testing asynchronous code.

When testing asynchronous code we need to carefully coordinate the test with the system it is testing to avoid running the assertion before the tested action has completed^[1]. For example, the following test will always fail because the assertion in line 30 is checked before the data has expired:

In this case the test always fails but in other cases it might be worse, failing intermittently when the system is working, or passing when the system is broken. We need to make the test wait to give the action we are testing time to complete successfully and fail if this doesn’t happen within a given timeout period.

Sleeping is not the best option.

This is an improved version of the previous test in which we are making the test code wait before the checking that the data has expired to give the code under test time to run:

The problem with the simple sleeping approach is that in some runs the timeout might be enough for the data to expire but in other runs it might not, so the test will fail intermittently; it becomes a flickering test. Flickering tests are confusing because when they fail, we don’t know whether it’s due to a real bug, or it is just a false positive. If the failure is relatively common, the team might start ignoring those tests which can mask real defects and completely destroy the value of having automated tests.

Since the intermittent failures happen because the timeout is too close to the time the behavior we are testing takes to run, many teams decide to reduce the frequency of those failures by increasing the time each test sleeps before checking that the action under test was successful. This is not practical because it soon leads to test suites that take too long to run.

Alternative approaches.

If we are able to detect success sooner, succeeding tests will provide rapid feedback, and we only have to wait for failing tests to timeout. This is a much better approach than waiting the same amount of time for each test regardless it fails or succeeds.

There are two main strategies to detect success sooner: capturing notifications^[2] and polling for changes.

In the case we are using as an example, polling was the only option because redis didn’t send any monitoring events we could listen to.

Polling for changes.

To detect success as soon as possible, we’re going to probe several times separated by a time interval which will be shorter than the previous timeout. If the result of a probe is what we expect the test pass, if the result we expect is not there yet, we sleep a bit and retry. If after several retries, the expected value is not there, the test will fail.

Have a look at the checkThatDataHasExpired method in the following code:

By polling for changes we avoid always waiting the maximum amount of time. Only in the worst case scenario, when consuming all the retries without detecting success, we’ll wait as much as in the just sleeping approach that used a fixed timeout.

Extracting a helper.

Scattering ad hoc low level code that polls and probes like the one in checkThatDataHasExpired throughout your tests not only make them difficult to understand, but also is a very bad case of duplication. So we extracted it to a helper so we could reuse it in different situations.

What varies from one application of this approach to another are the probe, the check, the number of probes before failing and the time between probes, everything else we extracted to the following helper^[3]:

This is how the previous tests would look after using the helper:

Notice that we’re passing the probe, the check, the number of probes and the sleep time between probes to the AsyncTestHelpers::assertWithPolling function.

Conclusions.

We showed an example in Php of an approach to test asynchronous code described by Steve Freeman and Nat Pryce in their Growing Object-Oriented Software, Guided by Tests book. This approach avoids flickering test and produces much faster test suites than using a fixed timeout. We also showed how we abstracted this approach by extracting a helper function that we are reusing in our code.

We hope you’ve found this approach interesting. If you want to learn more about this and several other techniques to effectively test asynchronous code, have a look at the wonderful Growing Object-Oriented Software, Guided by Tests book^[4].

Acknowledgements.

Thanks to my Codesai colleagues for reading the initial drafts and giving me feedback and to Chrisy Totty for the lovely cat picture.

Notes.

[1] This is a nice example of Connascence of Timing (CoT). CoT happens when when the timing of the execution of multiple components is important. In this case the action being tested must run before the assertion that checks its observable effects. That's the coordination we talk about. Check our post about Connascence to learn more about this interesting topic.

[2] In the capturing notifications strategy the test code "observes the system by listening for events that the system sends out. An event-based assertion waits for an event by blocking on a monitor until gets notified or times out", (from Growing Object-Oriented Software, Guided by Tests book).
Some time ago we developed some helpers using the capturing notifications strategy to test asynchronous ClojureScript code that was using core.async channels. Have a look at, for instance, the expect-async-message assertion helper in which we use core.async/alts! and core.async/timeout to implement this behaviour. The core.async/alts! function selects the first channel that responds. If that channel is the one the test code was observing we assert that the received message is what we expected. If the channel that responds first is the one generated by core.async/timeout we fail the test. We mentioned these async-test-tools in previous post: Testing Om components with cljs-react-test.

[3] Have a look at the testing asynchronous systems examples in the GOOS Code examples repository for a more object-oriented implementation of helper for the polling for changes strategy, and also examples of the capturing notifications strategy.

[4] Chapter 27, Testing Asynchronous code, contains a great explanation of the two main strategies to test asynchronous code effectively: capturing notifications and polling for changes.

References.

Growing Object-Oriented Software, Guided by Tests, Steve Freeman and Nat Pryce (chapter 27: Testing Asynchronous Code)
Eradicating Non-Determinism in Tests, Martin Fowler

Monday, July 13, 2020

Interesting Talk: "The Forgotten Art of Structured Programming"

I've just watched this great talk by Kevlin Henney

The Forgotten Art of Structured Programming

Thursday, April 23, 2020

Interesting Talk: "Breaking Illusions with Testing"

I've just watched this wonderful talk by Maaret Pyhäjärvi

Breaking Illusions with Testing

Thursday, April 16, 2020

Interesting Talk: "Ideology"

I've just watched this great talk by Gary Bernhardt

Ideology

Sunday, August 4, 2019

Interesting Talk: "Property-Based Testing The Ugly Parts: Case Studies from Komposition"

I've just watched this interesting talk by Oskar Wickström

Property-Based Testing The Ugly Parts: Case Studies from Komposition

Thursday, June 27, 2019

An example of listening to the tests to improve a design

Introduction.

Recently in the B2B team at LIFULL Connect, we improved the validation of the clicks our API receive using a service that detects whether the clicks were made by a bot or a human being.

So we used TDD to add this new validation to the previously existing validation that checked if the click contained all mandatory information. This was the resulting code:

and these were its tests:

The problem with these tests is that they know too much. They are coupled to many implementation details. They not only know the concrete validations we apply to a click and the order in which they are applied, but also details about what gets logged when a concrete validations fails. There are multiple axes of change that will make these tests break. The tests are fragile against those axes of changes and, as such, they might become a future maintenance burden, in case changes along those axes are required.

So what might we do about that fragility when any of those changes come?

Improving the design to have less fragile tests.

As we said before the test fragility was hinting to a design problem in the ClickValidation code. The problem is that it’s concentrating too much knowledge because it’s written in a procedural style in which it is querying every concrete validation to know if the click is ok, combining the result of all those validations and knowing when to log validation failures. Those are too many responsibilities for ClickValidation and is the cause of the fragility in the tests.

We can revert this situation by changing to a more object-oriented implementation in which responsibilities are better distributed. Let’s see how that design might look:

1. Removing knowledge about logging.

After this change, ClickValidation will know nothing about looging. We can use the same technique to avoid knowing about any similar side-effects which concrete validations might produce.

First we create an interface, ClickValidator, that any object that validates clicks should implement:

Next we create a new class NoBotClickValidator that wraps the BotClickDetector and adapts^[1] it to implement the ClickValidator interface. This wrapper also enrichs BotClickDetector’s’ behavior by taking charge of logging in case the click is not valid.

These are the tests of NoBotClickValidator that takes care of the delegation to BotClickDetector and the logging:

If we used NoBotClickValidator in ClickValidation, we’d remove all knowledge about logging from ClickValidation.

Of course, that knowledge would also disappear from its tests. By using the ClickValidator interface for all concrete validations and wrapping validations with side-effects like logging, we’d make ClickValidation tests robust to changes involving some of the possible axis of change that were making them fragile:

Changing the interface of any of the individual validations.
Adding side-effects to any of the validations.

2. Another improvement: don't use test doubles when it's not worth it^[2].

There’s another way to make ClickValidation tests less fragile.

If we have a look at ClickParamsValidator and BotClickDetector (I can’t show their code here for security reasons), they have very different natures. ClickParamsValidator has no collaborators, no state and a very simple logic, whereas BotClickDetector has several collaborators, state and a complicated validation logic.

Stubbing ClickParamsValidator in ClickValidation tests is not giving us any benefit over directly using it, and it’s producing coupling between the tests and the code.

On the contrary, stubbing NoBotClickValidator (which wraps BotClickDetector) is really worth it, because, even though it also produces coupling, it makes ClickValidation tests much simpler.

Using a test double when you’d be better of using the real collaborator is a weakness in the design of the test, rather than in the code to be tested.

These would be the tests for the ClickValidation code with no logging knowledge, after applying this idea of not using test doubles for everything:

Notice how the tests now use the real ClickParamsValidator and how that reduces the coupling with the production code and makes the set up simpler.

3. Removing knowledge about the concrete sequence of validations.

After this change, the new design will compose validations in a way that will result in ClickValidation being only in charge of combining the result of a given sequence of validations.

First we refactor the click validation so that the validation is now done by composing several validations:

The new validation code has several advantages over the previous one:

It does not depend on concrete validations any more
It does not depend on the order in which the validations are made.

It has only one responsibility: it applies several validations in sequence, if all of them are valid, it will accept the click, but if any given validation fails, it will reject the click and stop applying the rest of the validations. If you think about it, it’s behaving like an and operator.

We may write these tests for this new version of the click validation:

These tests are robust to the changes making the initial version of the tests fragile that we described in the introduction:

Changing the interface of any of the individual validations.
Adding side-effects to any of the validations.
Adding more validations.
Changing the order of the validation.

However, this version of ClickValidationTest is so general and flexible, that using it, our tests would stop knowing which validations, and in which order, are applied to the clicks^[3]. That sequence of validations is a business rule and, as such, we should protect it. We might keep this version of ClickValidationTest only if we had some outer test protecting the desired sequence of validations.

This other version of the tests, on the other hand, keeps protecting the business rule:

Notice how this version of the tests keeps in its setup the knowledge of which sequence of validations should be used, and how it only uses test doubles for NoBotClickValidator.

4. Avoid exposing internals.

The fact that we’re injecting into ClickValidation an object, ClickParamsValidator, that we realized we didn’t need to double, it’s a smell which points to the possibility that ClickParamsValidator is an internal detail of ClickValidation instead of its peer. So by injecting it, we’re coupling ClickValidation users, or at least the code that creates it, to an internal detail of ClickValidation: ClickParamsValidator.

A better version of this code would hide ClickParamsValidator by instantiating it inside ClickValidation’s constructor:

With this change ClickValidation recovers the knowledge of the sequence of validations which in the previous section was located in the code that created ClickValidation.

There are some stereotypes that can help us identify real collaborators (peers)^[4]:

Dependencies: services that the object needs from its environment so that it can fulfill its responsibilities.
Notifications: other parts of the system that need to know when the object changes state or performs an action.
Adjustments or Policies: objects that tweak or adapt the object’s behaviour to the needs of the system.

Following these stereotypes, we could argue that NoBotClickValidator is also an internal detail of ClickValidation and shouldn’t be exposed to the tests by injecting it. Hiding it we’d arrive to this other version of ClickValidation:

in which we have to inject the real dependencies of the validation, and no internal details are exposed to the client code. This version is very similar to the one we’d have got using tests doubles only for infrastructure.

The advantage of this version would be that its tests would know the least possible about ClickValidation. They’d know only ClickValidation’s boundaries marked by the ports injected through its constructor, and ClickValidation`’s public API. That will reduce the coupling between tests and production code, and facilitate refactorings of the validation logic.

The drawback is that the combinations of test cases in ClickValidationTest would grow, and may of those test cases would talk about situations happening in the validation boundaries that might be far apart from ClickValidation’s callers. This might make the tests hard to understand, specially if some of the validations have a complex logic. When this problem gets severe, we may reduce it by injecting and use test doubles for very complex validators, this is a trade-off in which we decide to accept some coupling with the internal of ClickValidation in order to improve the understandability of its tests. In our case, the bot detection was one of those complex components, so we decided to test it separately, and inject it in ClickValidation so we could double it in ClickValidation’s tests, which is why we kept the penultimate version of ClickValidation in which we were injecting the click-not-made-by-a-bot validation.

Conclusion.

In this post, we tried to play with an example to show how listening to the tests^[5] we can detect possible design problems, and how we can use that feedback to improve both the design of our code and its tests, when changes that expose those design problems are required.

In this case, the initial tests were fragile because the production code was procedural and had too many responsibilities. The tests were fragile also because they were using test doubles for some collaborators when it wasn’t worth to do it.

Then we showed how refactoring the original code to be more object-oriented and separating better its responsibilities, could remove some of the fragility of the tests. We also showed how reducing the use of test doubles only to those collaborators that really needs to be substituted can improve the tests and reduce their fragility. Finally, we showed how we can go too far in trying to make the tests flexible and robust, and accidentally stop protecting a business rule, and how a less flexible version of the tests can fix that.

When faced with fragility due to coupling between tests and the code being tested caused by using test doubles, it’s easy and very usual to “blame the mocks”, but, we believe, it would be more productive to listen to the tests to notice which improvements in our design they are suggesting. If we act on this feedback the tests doubles give us about our design, we can use tests doubles in our advantage, as powerful feedback tools^[6], that help us improve our designs, instead of just suffering and blaming them.

Acknowledgements.

Many thanks to my Codesai colleagues Alfredo Casado, Fran Reyes, Antonio de la Torre and Manuel Tordesillas, and to my Aprendices colleagues Paulo Clavijo, Álvaro García and Fermin Saez for their feedback on the post, and to my colleagues at LIFULL Connect for all the mobs we enjoy together.

Footnotes:

[1] See the Adapter or wrapper pattern.

[2] See Test Smell: Everything is mocked by Steve Freeman where he talks about things you shouldn't be substituting with tests doubles.

[3] Thanks Alfredo Casado for detecting that problem in the first version of the post.

[4] From Growing Object-Oriented Software, Guided by Tests > Chapter 6, Object-Oriented Style > Object Peer Stereotypes, page 52. You can also read about these stereotypes in a post by Steve Freeman: Object Collaboration Stereotypes.

[5] Difficulties in testing might be a hint of design problems. Have a look at this interesting series of posts about listening to the tests by Steve Freeman.

[6] According to Nat Pryce mocks were designed as a feedback tool for designing OO code following the 'Tell, Don't Ask' principle: "In my opinion it's better to focus on the benefits of different design styles in different contexts (there are usually many in the same system) and what that implies for modularisation and inter-module interfaces. Different design styles have different techniques that are most applicable for test-driving code written in those styles, and there are different tools that help you with those techniques. Those tools should give useful feedback about the external and *internal* quality of the system so that programmers can 'listen to the tests'. That's what we -- with the help of many vocal users over many years -- designed jMock to do for 'Tell, Don't Ask' object-oriented design." (from a conversation in Growing Object-Oriented Software Google Group).

I think that if your design follows a different OO style, it might be preferable to stick to a classical TDD style which nearly limits the use of test doubles only to infrastructure and undesirable side-effects.

Saturday, May 25, 2019

The curious case of the negative builder

Recently, one of the teams I’m coaching at my current client, asked me to help them with a problem, they were experiencing while using TDD to add and validate new mandatory query string parameters^[1]. This is a shortened version (validating fewer parameters than the original code) of the tests they were having problems with:

and this is the implementation of the QueryStringBuilder used in this test:

which is a builder with a fluid interface that follows to the letter a typical implementation of the pattern. There are even libraries that help you to automatically create this kind of builders^[2].

However, in this particular case, implementing the QueryStringBuilder following this typical recipe causes a lot of problems. Looking at the test code, you can see why.

To add a new mandatory parameter, for example sourceId, following the TDD cycle, you would first write a new test asserting that a query string lacking the parameter should not be valid.

So far so good, the problem comes when you change the production code to make this test pass, in that momento you’ll see how the first test that was asserting that a query string with all the parameters was valid starts to fail (if you check the query string of that tests and the one in the new test, you’ll see how they are the same). Not only that, all the previous tests that were asserting that a query string was invalid because a given parameter was lacking won’t be “true” anymore because after this change they could fail for more than one reason.

So to carry on, you’d need to fix the first test and also change all the previous ones so that they fail again only for the reason described in the test name:

That’s a lot of rework on the tests only for adding a new parameter, and the team had to add many more. The typical implementation of a builder was not helping them.

The problem we’ve just explained can be avoided by chosing a default value that creates a valid query string and what I call “a negative builder”, a builder with methods that remove parts instead of adding them. So we refactored together the initial version of the tests and the builder, until we got to this new version of the tests:

which used a “negative” QueryStringBuilder:

After this refactoring, to add the sourceId we wrote this test instead:

which only carries with it updating the valid method in QueryStringBuilder and adding a method that removes the sourceId parameter from a valid query string.

Now when we changed the code to make this last test pass, no other test failed or started to have descriptions that were not true anymore.

Leaving behind the typical recipe and adapting the idea of the builder pattern to the context of the problem at hand, led us to a curious implementation, a “negative builder”, that made the tests easier to maintain and improved our TDD flow.

Acknowledgements.

Many thanks to my Codesai colleagues Antonio de la Torre and Fran Reyes, and to all the colleagues of the Prime Services Team at LIFULL Connect for all the mobs we enjoy together.

Footnotes:

[1] Currently, this validation is not done in the controller anymore. The code showed above belongs to a very early stage of an API we're developing.

[2] Have a look, for instance, at lombok's' @Builder annotation for Java.

Tuesday, May 14, 2019

Killing mutants to improve your tests

At my current client we’re working on having a frontend architecture for writing SPAs in JavaScript similar to re-frame’s one: an event-driven bus with effects and coeffects for state management^[1] (commands) and subscriptions using reselect’s selectors (queries).

One of the pieces we have developed to achieved that goal is reffects-store. Using this store, React components can be subscribed to given reselect’s selectors, so that they only render when the values in the application state tracked by the selectors change.

After we finished writing the code for the store, we decided to use mutation testing to evaluate the quality of our tests. Mutation testing is a technique in which, you introduce bugs, (mutations), into your production code, and then run your tests for each mutation. If your tests fail, it’s ok, the mutation was “killed”, that means that they were able to defend you against the regression caused by the mutation. If they don’t, it means your tests are not defending you against that regression. The higher the percentage of mutations killed, the more effective your tests are.

There are tools that do this automatically, stryker ^[2] is one of them. When you run stryker, it will create many mutant versions of your production code, and run your tests for each mutant (that’s how mutations are called in stryker’s’ documentation) version of the code. If your tests fail then the mutant is killed. If your tests passed, the mutant survived. Let’s have a look at the the result of runnning stryker against reffects-store’s code:

Notice how stryker shows the details of every mutation that survived our tests, and look at the summary the it produces at the end of the process.

All the surviving mutants were produced by mutations to the store.js file. Having a closer look to the mutations in stryker’s output we found that the functions with mutant code were unsubscribeAllListeners and unsubscribeListener. After a quick check of their tests, it was esay to find out why unsubscribeAllListeners was having surviving mutants. Since it was a function we used only in tests for cleaning the state after each test case was run, we had forgotten to test it.

However, finding out why unsubscribeListener mutants were surviving took us a bit more time and thinking. Let’s have a look at the tests that were exercising the code used to subscribe and unsubscribe listeners of state changes:

If we examine the mutations and the tests, we can see that the tests for unsubscribeListener are not good enough. They are throwing an exception from the subscribed function we unsubscribe, so that if the unsubscribeListener function doesn’t work and that function is called the test fails. Unfortunately, the test passes also if that function is never called for any reason. In fact, most of the surviving mutants that stryker found above have are variations on that idea.

A better way to test unsubscribeListener is using spies to verify that subscribed functions are called and unsubscribed functions are not (this version of the tests includes also a test for unsubscribeAllListeners):

After this change, when we run stryker we got the following output:

No mutants survived!! This means this new version of the tests is more reliable and will protect us better from regressions than the initial version.

Mutation testing is a great tool to know if you can trust your tests. This is event more true when working with legacy code.

Acknowledgements.

Many thanks to Mario Sánchez and Alex Casajuana Martín for all the great time coding together, and thanks to Porapak Apichodilok for the photo used in this post and to Pexels.

Footnotes:

[1] See also reffects which is the synchronous event bus with effects and coeffects we wrote to manage the application state.

[2] The name of this tool comes from a fictional Marvel comics supervillain Willian Stryker who was obsessed with the eradication of all mutants.

Tuesday, March 26, 2019

Interesting Talk: "Mastering Spark Unit Testing"

I've just watched this wonderful talk by Ted Malaska

Mastering Spark Unit Testing

Tuesday, March 5, 2019

Interesting Talk: "Testing & validating Apache Spark jobs"

I've just watched this great talk by Holden Karau

Testing & validating Apache Spark jobs

Thursday, January 3, 2019

Avoid using subscriptions only as app-state getters

Introduction.

Subscriptions in re-frame or re-om are query functions that extract data from the app-state and provide it to view functions in the right format. When we use subscriptions well, they provide a lot of value^[1], because they avoid having to keep derived state the app-state and they dumb down the views, that end up being simple “data in, screen out” functions.

However, things are not that easy. When you start working with subscriptions, it might happen that you end up using them as mere getters of app-state. This is a missed opportunity because using subscriptions in this way, we won’t take advantage of all the value they can provide, and we run the risk of leaking pieces of untested logic into our views.

An example.

We’ll illustrate this problem with a small real example written with re-om subscriptions (in an app using re-frame subscriptions the problem would look similar). Have a look at this piece of view code in which some details have been elided for brevity sake:

this code is using subscriptions written in the horizon.domain.reports.dialogs.edit namespace.

The misuse of subscriptions we’d like to show appears on the following piece of the view:

Notice how the only thing that we need to render this piece of view is next-generation. To compute its value the code is using several subscriptions to get some values from the app-state and binding them to local vars (delay-unit start-at, delay-number and date-modes). Those values are then fed to a couple of private functions also defined in the view (get-next-generation and delay->interval) to obtain the value of next-generation.

This is a bad use of subscriptions. Remember subscriptions are query functions on app-state that, used well, help to make views as dumb (with no logic) as possible. If you push as much logic as possible into subscriptions, you might achieve views that are so dumb you nearly don’t need to test, and decide to limit your unit tests to do only subcutaneous testing of your SPA.

Refactoring: placing the logic in the right place.

We can refactor the code shown above to remove all the leaked logic from the view by writing only one subscription called next-generation which will produce the only information that the view needs. As a result both get-next-generation and delay->interval functions will get pushed into the logic behind the new next-generation subscription and dissappear from the view.

This is the resulting view code after this refactoring:

and this is the resulting pure logic of the new subscription. Notice that, since get-next-generation function wasn’t pure, we had to change it a bit to make it pure:

After this refactoring the view is much dumber. The previously leaked (an untested) logic in the view (the get-next-generation and delay->interval functions) has been removed from it. Now that logic can be easyly tested through the new next-generation subscription. This design is also much better than the previous one because it hides how we obtain the data that the view needs: now both the view and the tests ignore, and so are not coupled, to how the data the view needs is obtained. We might refactor both the app-state and the logic now in get-next-generation and delay->interval functions without affecting the view. This is another example of how what is more stable than how.

Summary

The idea to remember is that subscriptions by themselves don’t make code more testable and maintainable. It’s the way we use subscriptions that produces better code. For that the logic must be in the right place which is not inside the view but behind the subscriptions that provide the data that the view needs. If we keep writing “getter” subscriptions” and placing logic in views, we won’t gain all the advantages the subscriptions concept provides and we’ll write poorly designed views coupled to leaked bunches of (very likely untested) logic.

Acknowledgements.

Many thanks to André Stylianos Ramos and Fran Reyes for giving us great feedback to improve this post and for all the interesting conversations.

Footnotes:

[1] Subscriptions also make it easier to share code between different views and, in the case of re-frame (and soon re-om as well), they are optimized to minimize unnecessary re-renderings of the views and de-duplicate computations.

Tuesday, November 27, 2018

Interesting Screencast: "Refactoring item logic using ‘lift up conditional'"

I've just watched this great screencast by Emily Bache

Refactoring item logic using ‘lift up conditional'

Interesting Screencast: "Introducing the Gilded Rose kata and writing test cases using Approval Tests"

I've just watched this great screencast by Emily Bache

Introducing the Gilded Rose kata and writing test cases using Approval Tests

Saturday, May 19, 2018

Improving legacy Om code (II): Using effects and coeffects to isolate effectful code from pure code

Introduction.

In the previous post, we applied the humble object pattern idea to avoid having to write end-to-end tests for the interesting logic of a hard to test legacy Om view, and managed to write cheaper unit tests instead. Then, we saw how those unit tests were far from ideal because they were highly coupled to implementation details, and how these problems were caused by a lack of separation of concerns in the code design.

In this post we’ll show a solution to those design problems using effects and coeffects that will make the interesting logic pure and, as such, really easy to test and reason about.

Refactoring to isolate side-effects and side-causes using effects and coeffects.

We refactored the code to isolate side-effects and side-causes from pure logic. This way, not only testing the logic got much easier (the logic would be in pure functions), but also, it made tests less coupled to implementation details. To achieve this we introduced the concepts of coeffects and effects.

The basic idea of the new design was:

Extracting all the needed data from globals (using coeffects for getting application state, getting component state, getting DOM state, etc).
Using pure functions to compute the description of the side effects to be performed (returning effects for updating application state, sending messages, etc) given what was extracted in the previous step (the coeffects).
Performing the side effects described by the effects returned by the called pure functions.

The main difference that the code of horizon.controls.widgets.tree.hierarchy presented after this refactoring was that the event handler functions were moved back into it again, and that they were using the process-all! and extract-all! functions that were used to perform the side-effects described by effects, and extract the values of the side-causes tracked by coeffects, respectively. The event handler functions are shown in the next snippet (to see the whole code click here):

Now all the logic in the companion namespace was comprised of pure functions, with neither asynchronous nor mutating code:

Thus, its tests became much simpler:

Notice how the pure functions receive a map of coeffects already containing all the extracted values they need from the “world” and they return a map with descriptions of the effects. This makes testing really much easier than before, and remove the need to use test doubles.

Notice also how the test code is now around 100 lines shorter. The main reason for this is that the new tests know much less about how the production code is implemented than the previous one. This made possible to remove some tests that, in the previous version of the code, were testing some branches that we were considering reachable when testing implementation details, but when considering the whole behaviour are actually unreachable.

Now let’s see the code that is extracting the values tracked by the coeffects:

which is using several implementations of the Coeffect protocol:

All the coeffects were created using factories to localize in only one place the “shape” of each type of coeffect. This indirection proved very useful when we decided to refactor the code that extracts the value of each coeffect to substitute its initial implementation as a conditional to its current implementation using polymorphism with a protocol.

These are the coeffects factories:

Now there was only one place where we needed to test side causes (using test doubles for some of them). These are the tests for extracting the coeffects values:

A very similar code is processing the side-effects described by effects:

which uses different effects implementing the Effect protocol:

that are created with the following factories:

Finally, these are the tests for processing the effects:

Summary.

We have seen how by using the concept of effects and coeffects, we were able to refactor our code to get a new design that isolates the effectful code from the pure code. This made testing our most interesting logic really easy because it became comprised of only pure functions.

The basic idea of the new design was:

Extracting all the needed data from globals (using coeffects for getting application state, getting component state, getting DOM state, etc).
Computing in pure functions the description of the side effects to be performed (returning effects for updating application state, sending messages, etc) given what it was extracted in the previous step (the coeffects).
Performing the side effects described by the effects returned by the called pure functions.

Since the time we did this refactoring, we have decided to go deeper in this way of designing code and we’re implementing a full effects & coeffects system inspired by re-frame.

Acknowledgements.

Many thanks to Francesc Guillén, Daniel Ojeda, André Stylianos Ramos, Ricard Osorio, Ángel Rojo, Antonio de la Torre, Fran Reyes, Miguel Ángel Viera and Manuel Tordesillas for giving me great feedback to improve this post and for all the interesting conversations.

Improving legacy Om code (I): Adding a test harness

Introduction.

I’m working at GreenPowerMonitor as part of a team developing a challenging SPA to monitor and manage renewable energy portfolios using ClojureScript. It’s a two years old Om application which contains a lot of legacy code. When I say legacy, I’m using Michael Feathers’ definition of legacy code as code without tests. This definition views legacy code from the perspective of code being difficult to evolve because of a lack of automated regression tests.

The legacy (untested) Om code.

Recently I had to face one of these legacy parts when I had to fix some bugs in the user interface that was presenting all the devices of a given energy facility in a hierarchy tree (devices might be comprised of other devices). This is the original legacy view code:

This code contains not only the layout of several components but also the logic to both conditionally render some parts of them and to respond to user interactions. This interesting logic is full of asynchronous and effectful code that is reading and updating the state of the components, extracting information from the DOM itself and reading and updating the global application state. All this makes this code very hard to test.

Humble Object pattern.

It’s very difficult to make component tests for non-component code like the one in this namespace, which makes writing end-to-end tests look like the only option.

However, following the idea of the humble object pattern, we might reduce the untested code to just the layout of the view. The humble object can be used when a code is too closely coupled to its environment to make it testable. To apply it, the interesting logic is extracted into a separate easy-to-test component that is decoupled from its environment.

In this case we extracted the interesting logic to a separate namespace, where we thoroughly tested it. With this we avoided writing the slower and more fragile end-to-end tests.

We wrote the tests using the test-doubles library (I’ve talked about it in a recent post) and some home-made tools that help testing asynchronous code based on core.async.

This is the logic we extracted:

and these are the tests we wrote for it:

See here how the view looks after this extraction. Using the humble object pattern, we managed to test the most important bits of logic with fast unit tests instead of end-to-end tests.

The real problem was the design.

We could have left the code as it was (in fact we did for a while) but its tests were highly coupled to implementation details and hard to write because its design was far from ideal.

Even though, applying the humble object pattern idea, we had separated the important logic from the view, which allowed us to focus on writing tests with more ROI avoiding end-to-end tests, the extracted logic still contained many concerns. It was not only deciding how to interact with the user and what to render, but also mutating and reading state, getting data from global variables and from the DOM and making asynchronous calls. Its effectful parts were not isolated from its pure parts.

This lack of separation of concerns made the code hard to test and hard to reason about, forcing us to use heavy tools: the test-doubles library and our async-test-tools assertion functions to be able to test the code.

Summary.

First, we applied the humble object pattern idea to manage to write unit tests for the interesting logic of a hard to test legacy Om view, instead of having to write more expensive end-to-end tests.

Then, we saw how those unit tests were far from ideal because they were highly coupled to implementation details, and how these problems were caused by a lack of separation of concerns in the code design.

Next.

In the next post we’ll solve the lack of separation of concerns by using effects and coeffects to isolate the logic that decides how to interact with the user from all the effectful code. This new design will make the interesting logic pure and, as such, really easy to test and reason about.

Tuesday, May 15, 2018

Interesting Talk: "What vs. How, Writing tests that suck less"

I've just watched this very interesting talk by Torsten Mangner

What vs. How, Writing tests that suck less

Saturday, May 5, 2018

Interesting Talk: "Practical Generative Testing Patterns"

I've just watched this wonderful talk by Srihari Sriraman

Practical Generative Testing Patterns

Monday, April 9, 2018

test-doubles: A small spying and stubbing library for Clojure and ClojureScript

As you may know from a previous post I’m working for GreenPowerMonitor as part of a team that is developing a challenging SPA to monitor and manage renewable energy portfolios using ClojureScript.

We were dealing with some legacy code that was effectful and needed to be tested using test doubles, so we explored some existing ClojureScript libraries but we didn't feel comfortable with them. On one hand, we found that some of them had different macros for different types of test doubles and this made tests that needed both spies and stubs become very nested. We wanted to produce tests with as little nesting as possible. On the other hand, being used to Gerard Meszaros’ vocabulary for tests doubles, we found the naming used for different types of tests doubles in some of the existing libraries a bit confusing. We wanted to stick to Gerard Meszaros’ vocabulary for tests doubles.

So we decided we'd write our own stubs and spies library.

We started by manually creating our own spies and stubs during some time so that we could identify the different ways in which we were going to use them. After a while, my colleague André Stylianos Ramos and I wrote our own small DSL to create stubs and spies using macros to remove all that duplication and boiler plate. The result was a small library that we've been using in our ClojureScript project for nearly a year and that we've recently adapted to make it work in Clojure as well:

I’m really glad to announce that GreenPowerMonitor has open-sourced our small spying and stubbing library for Clojure and ClojureScript: test-doubles.

In the following example written in ClojureScript, we show how we are using test-doubles to create two stubs (one with the :maps option and another with the :returns option) and a spy:

We could show you more examples here of how test-doubles can be used and the different options it provides, but we’ve already included a lot of explained examples in its documentation.

Please do have a look and try our library. You can get its last version from Clojars. We hope it might be as useful to you as it has been for us.

Friday, March 30, 2018

Kata: A small kata to explore and play with property-based testing

1. Introduction.

I've been reading Fred Hebert's wonderful PropEr Testing online book about property-based testing. So to play with it a bit, I did a small exercise. This is its description:

1. 1. The kata.

We'll implement a function that can tell if two sequences are equal regardless of the order of their elements. The elements can be of any type.

We'll use property-based testing (PBT). Use the PBT library of your language (bring it already installed).

Follow these constraints:

You can't use or compute frequencies of elements.

Work test first: write a test, then write the code to make that test pass.

If you get stuck, you can use example-based tests to drive the implementation on. However, at the end of the exercise, only property-based tests can remain.

Use mutation testing to check if you tests are good enough (we'll do it manually injecting failures in the implementation code (by commenting or changing parts of it) and checking if the test are able to detect the failure to avoid using more libraries).

2. Driving a solution using both example-based and property-based tests.

I used Clojure and its test.check library (an implementation of QuickCheck) to do the exercise. I also used my favorite Clojure's test framework: Brian Marick's Midje which has a macro, forall, which makes it very easy to integrate property-based tests with Midje.

So I started to drive a solution using an example-based test (thanks to Clojure's dynamic nature, I could use vectors of integers to write the tests. ):

which I made pass using the following implementation:

Then I wrote a property-based test that failed:

This is how the failure looked in Midje (test.check returns more output when a property fails, but Midje extracts and shows only the information it considers more useful):

the most useful piece of information for us in this failure message is the quick-check shrunken failing values. When a property-based testing library finds a counter-example for a property, it applies a shrinking algorithm which tries to reduce it to find a minimal counter-example that produces the same test failure. In this case, the [1 0] vector is the minimal counter-example found by the shrinking algorithm that makes this test fails.

Next I made the property-based test pass by refining the implementation a bit:

I didn't know which property to write next, so I wrote a failing example-based test involving duplicate elements instead:

and refined the implementation to make it pass:

With this, the implementation was done (I chose a function that was easy to implement, so I could focus on thinking about properties).

3. Getting rid of example-based tests.

Then the next step was finding properties that could make the example-based tests redundant. I started by trying to remove the first example-based test. Since I didn't know test.check's generators and combinators library, I started exploring it on the REPL with the help of its API documentation and cheat sheet.

My sessions on the REPL to build generators bit by bit were a process of shallowly reading bits of documentation followed by trial and error. This tinkering sometimes lead to quick successes and most of the times to failures which lead to more deep and careful reading of the documentation, and more trial and error. In the end I managed to build the generators I wanted. The sample function was very useful during all the process to check what each part of the generator would generate.

For the sake of brevity I will show only summarized versions of my REPL sessions where everything seems easy and linear...

3. 1. First attempt: a partial success.

First, I wanted to create a generator that generated two different vectors of integers so that I could replace the example-based tests that were checking two different vectors. I used the list-distinct combinator to create it and the sample function to be able to see what the generator would generate:

I used this generator to write a new property which made it possible to remove the first example-based test but not the second one:

In principle, we might think that the new property should have been enough to also allow removing the last example-based test involving duplicate elements. A quick manual mutation test, after removing that example-based test, showed that it wasn't enough: I commented the line (= (count s1) (count s2)) in the implementation and the property-based tests weren't able to detect the regression.

This was due to the low probability of generating a pair of random vectors that were different because of having duplicate elements, which was what the commented line, (= (count s1) (count s2)), was in the implementation for. If we'd run the tests more times, we'd have finally won the lottery of generating a counter-example that would detect the regression. So we had to improve the generator in order to increase the probabilities, or, even better, make sure it'd be able to detect the regression.

In practice, we'd combine example-based and property-based tests. However, my goal was learning more about property-based testing, so I went on and tried to improve the generators (that's why this exercise has the constraint of using only property-based tests).

3. 2. Second attempt: success!

So, I worked a bit more on the REPL to create a generator that would always generate vectors with duplicate elements. For that I used test.check's let macro, the tuple, such-that and not-empty combinators, and Clojure's core library repeat function to build it.

The following snippet shows a summary of the work I did on the REPL to create the generator using again the sample function at each tiny step to see what inputs the growing generator would generate:

Next I used this new generator to write properties that this time did detect the regression mentioned above. Notice how there are separate properties for sequences with and without duplicates:

After tinkering a bit more with some other generators like return and vector-distinct, I managed to remove a redundant property-based test getting to this final version:

4. Conclusion.

All in all, this exercise was very useful to think about properties and to explore test.check's generators and combinators. Using the REPL made this exploration very interactive and a lot of fun. You can find the code of this exercise on this GitHub repository.

A couple of days later I proposed to solve this exercise at the last Clojure Developers Barcelona meetup. I received very positive feedback, so I'll probably propose it for a Barcelona Software Craftsmanship meetup event soon.

Wednesday, May 4, 2022

Sunday, October 18, 2020

Introduction.

Sleeping is not the best option.

Alternative approaches.

Polling for changes.

Extracting a helper.

Conclusions.

Acknowledgements.

Notes.

References.

Monday, July 13, 2020

Thursday, April 23, 2020

Thursday, April 16, 2020

Sunday, August 4, 2019

Thursday, June 27, 2019

Introduction.

Improving the design to have less fragile tests.

1. Removing knowledge about logging.

2. Another improvement: don't use test doubles when it's not worth it[2].

3. Removing knowledge about the concrete sequence of validations.

4. Avoid exposing internals.

Conclusion.

Saturday, May 25, 2019

Tuesday, May 14, 2019

Tuesday, March 26, 2019

Tuesday, March 5, 2019

Thursday, January 3, 2019

Introduction.

An example.

Refactoring: placing the logic in the right place.

Summary

Acknowledgements.

Tuesday, November 27, 2018

Saturday, May 19, 2018

Introduction.

Refactoring to isolate side-effects and side-causes using effects and coeffects.

Summary.

Acknowledgements.

Introduction.

The legacy (untested) Om code.

Humble Object pattern.

The real problem was the design.

Summary.

Next.

Tuesday, May 15, 2018

Saturday, May 5, 2018

Monday, April 9, 2018

Friday, March 30, 2018

1. Introduction.

1. 1. The kata.

2. Driving a solution using both example-based and property-based tests.

3. Getting rid of example-based tests.

3. 1. First attempt: a partial success.

3. 2. Second attempt: success!

4. Conclusion.

2. Another improvement: don't use test doubles when it's not worth it^[2].