Sunday, September 3, 2017

Data clumps, primitive obsession and hidden tuples

During the writing of a recent post about connascence for Codesai's blog some of us were discussing whether we could consider a data clump a form of Connascence of Meaning (CoM) or not. In the end, we agreed that data clumps are indeed a form of CoM and that introducing a class for the missing abstraction reduces their connascence to Connascence of Type (CoT).

I had wondered in the past why we use a similar refactoring to eliminate both primitive obsession and data clump smells. Thinking about them from the point of view of connascence has helped me a lot to understand why.

I also had an alternative, and curious, line of reasoning that reaches the same conclusion, in which a data clump is basically reduced to an implicit form of primitive obsession. The reasoning goes as follows:

The concept of primitive obsession might be extended to treat the collections that a given language offers as primitives. In such cases, encapsulating the collection reifies a new concept that might attract code that had nowhere to "live" and was therefore scattered all over. So far so good.
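As a minimal sketch of this idea, assume a bare list of email strings that several callers normalize and deduplicate on their own. The names Attendees, register, count and contains are hypothetical and only serve the illustration:

```python
class Attendees:
    """Hypothetical wrapper around what used to be a bare list of email strings."""

    def __init__(self, emails=()):
        self._emails = []
        for email in emails:
            self.register(email)

    def register(self, email):
        # Normalization and the duplicate check used to be repeated by every caller.
        normalized = self._normalize(email)
        if normalized not in self._emails:
            self._emails.append(normalized)

    def count(self):
        return len(self._emails)

    def contains(self, email):
        return self._normalize(email) in self._emails

    @staticmethod
    def _normalize(email):
        return email.strip().lower()
```

Once the list lives inside Attendees, the rules about what counts as "the same attendee" have a single place to live instead of being scattered across callers.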

From the point of view of connascence, primitive obsession is a form of CoM that we transform into CoT by introducing a new type. Once the type exists, we might find Connascence of Algorithm (CoA), which we'd remove by moving the offending code inside the new type.
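A small sketch of that progression, assuming a discount passed around as a bare float. Discount, from_percentage and apply_to are hypothetical names chosen for this example:

```python
from dataclasses import dataclass


# Before: every caller must agree that `discount` is a fraction in [0, 1]
# and not a percentage such as 20. That shared convention is the CoM.
def discounted_price_before(price, discount):
    return price * (1 - discount)


# After: callers only need to agree on the type (CoT).
@dataclass(frozen=True)
class Discount:
    fraction: float

    def __post_init__(self):
        # Range checks that callers might have duplicated (CoA) now live here.
        if not 0 <= self.fraction <= 1:
            raise ValueError("discount must be a fraction between 0 and 1")

    @classmethod
    def from_percentage(cls, percentage):
        return cls(percentage / 100)

    def apply_to(self, price):
        return price * (1 - self.fraction)
```

With the bare float, passing 20 instead of 0.2 silently produces a negative price; with the new type the same mistake fails at construction time.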

The composing elements of a data clump only make sense when they go together. This means that they're conceptually (but implicitly) grouped. In this sense a data clump could be seen as a "hidden or implicit tuple".

With this "hidden collection" in mind, it is now easier to see how closely related the data clump and primitive obsession smells are. We remove a data clump by encapsulating a collection, its "implicit or hidden tuple", inside a new class. Again, from the point of view of connascence, this encapsulation reduces CoM to CoT and might expose some CoA that leads us to move behavior into the new class, turning it into a value object.
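To make the "implicit tuple" concrete, here is a hedged sketch that assumes a start date and an end date that always travel together. DateRange, nights and includes are hypothetical names, not taken from any particular codebase:

```python
from dataclasses import dataclass
from datetime import date


# Before: `start` and `end` only make sense together, so every signature that
# takes both is really passing around an implicit (start, end) tuple.
def nights_before(start, end):
    return (end - start).days


# After: the implicit tuple becomes an explicit value object.
@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self):
        # The ordering rule used to be an unwritten agreement between callers.
        if self.end < self.start:
            raise ValueError("end must not precede start")

    def nights(self):
        return (self.end - self.start).days

    def includes(self, day):
        return self.start <= day <= self.end
```

The frozen dataclass compares by value and cannot be mutated after construction, which matches the usual expectations for a value object.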

This "implicit tuple" reasoning helped me make explicit the mental process that kept leading me to very similar refactorings for both code smells.

However, I think that CoM unifies both cases much more easily than relating the two smells does.

The fact that the collection (the grouping of the elements of a data clump) is implicit also makes it more difficult to recognize a data clump as CoM in the first place. That's why I think that a data clump is a more implicit example of CoM than primitive obsession and, thus, we might consider its CoM to be stronger than that of primitive obsession.

A curious reasoning, right?