Misusing the DRY Principle

Avoid abstracting incidental duplications

I still remember the very first time when the DRY principle was preached to me. "Whenever you find yourself writing duplicate code, always remember DRY. Don't Repeat Yourself" they said. I can only imagine that the majority of us has received this introduction to DRY and thinks the definition ends there. This definition can be true but it does not truly capture the spirit of the DRY principle. In this article, I want to talk about the idea of incidental duplication and its relationship with the DRY principle. This is an important advanced topic to discuss because it is a paradigm shift towards our view of duplications and abstractions. In addition, I want to change the notion that finding good abstractions is an art. I am convinced that it's a science and it can be mastered.

Purpose behind DRY

Let's start the article by taking a second to talk about the motivation behind the DRY principle. When the author's of "The Programmatic Programmer" introduced the concept, the following was used to describe the DRY principle:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

That's it! It's short, succinct and to the point. So how do people use this simple principle incorrectly? An innocent mistake we generally commit is thinking that the objective of this principle is to minimize the overall lines of code. This is false. If you noticed, this definition does not emphasize on code duplication at all. It just states that any rule or information about your business/app is deserving of an abstraction (i.e. function) and should have a single reference point in your application. These rules would be written once but potentially referenced multiple times. By abiding by this principle, you will also naturally notice a decrease in duplications in the code most times.

Incidental Duplication

So when we mention the term "incidental duplication" we are referring to duplication that has occurred in different places by mere chance (or "incidentally" if you will). It is code that looks the same but represents different behaviours/fact/knowledge in the system. DRY is generally misapplied when incidentally duplicated code is abstracted away. This tends to deliver negative outcomes in our system which we colloquially refer to as spaghetti code.

Example: The What Not To Do

Let's take a look at a practical example to illustrate how abstracting incidental duplication minimizes lines of code but also contributes to worse code quality. The following is hypothetical code that could possibly live in a restaurant's codebase. Let's imagine a restaurant serves pizzas and sandwiches.

const makePizza = () => {
  grateCheese();
  sliceTomatoes();
  sliceOnions();
  sliceBellPeppers();
  sliceOlives();
  putIngredientsOnDough();
  bakeDough();
}

const makeSandwich = () => {
  grateCheese();
  sliceTomatoes();
  sliceOnions();
  sliceBellPeppers();
  squirtMayonnaise();
  putIngredientsBetweenBread();
  toastBread();
}

At first glance, you will notice some commonality between making a pizza and making a sandwich. They both have similar ingredients such as cheese, tomatoes, onions, and bell peppers. One might be tempted to abstract this commonality for the sole purpose of reducing the number of lines of code.

const prepareIngredients = () => {
  grateCheese();
  sliceTomatoes();
  sliceOnions();
  sliceBellPeppers();
}

const makePizza = () => {
  prepareIngredients();
  sliceOlives();
  putIngredientsOnDough();
  bakeDough();
}

const makeSandwich = () => {
  prepareIngredients();
  squirtMayonnaise();
  putIngredientsBetweenBread();
}

The above refactoring has indeed reduced duplication in the code but there are a set of questions that we should ask ourselves. Is code now better? What does prepareIngredients represent? Is this DRY? Are there any new benefits gained from this? Have you seen code in your code base that has undergone this sort of "refactoring"?

Let's take a second to assess the prepareIngredients function and ask ourselves "what knowledge is being captured by this abstraction". It doesn't represent how to make a sandwich. It doesn't represent how to make a pizza. Instead, it represents the overlapping ingredients between how to make a pizza and a sandwich. If you think critically, you may come to realize that this information is most likely not significant to any person in the business and carries no real semantic value. Although truthful, its is considered to be arbitrary information. To put it in terms used by the DRY authors, it is not an authoritative representation of any significant piece of knowledge.

If you do not see the issue so far, consider what would happen if we introduce another item on the menu that also has overlaps with the sandwich and the pizza. Let's assume we need to introduce a burger to the menu and that a burger is simply tomatoes, cheese, a beef patty in a bun. If you are adamant about maintaining the prepareIngredients function, you would like experience the following updates.

const prepareIngredients = () => {
  grateCheese();
  sliceTomatoes();
}

const makeBurger = () => {
  prepareIngredients();
  cookBeefPatty();
  putInbuns();
}

const makePizza = () => {
  prepareIngredients();
  sliceOnions();
  sliceBellPeppers(); 
  sliceOlives();
  putIngredientsOnDough();
  bakeDough();
}

const makeSandwich = () => {
  prepareIngredients();
  sliceOnions();
  sliceBellPeppers();
  squirtMayonnaise();
  putIngredientsBetweenBread();
}

The prepareIngredients function has been minimized to only capture the overlap between all three menu items. In addition makeSandwich and makePizza has been updated to compensate for the changes done in prepareIngredients since it's a dependency to both these functions. To put it in other words, introducing a new item in the menu, which seemingly has nothing to do with the current items, will cause more updates in the code base due to this wrongful abstractions. Although this is a simple example, it implicitly is becoming a little harder to understand what ingredients are involved in a menu item.

The Fix

Although there is less code written in our example above, the quality of our code is clearly worse. Rather than do all this, you could have simply kept your code as follows:


const makeBurger = () => {
  grateCheese();
  sliceTomatoes();
  cookBeefPatty();
  putInbuns();
}

const makePizza = () => {
  grateCheese();
  sliceTomatoes();
  sliceOnions();
  sliceBellPeppers(); 
  sliceOlives();
  putIngredientsOnDough();
  bakeDough();
}

const makeSandwich = () => {
  grateCheese();
  sliceTomatoes();
  sliceOnions();
  sliceBellPeppers();
  squirtMayonnaise();
  putIngredientsBetweenBread();
}

Although longer, this code is linearly simple to understand, involves one less function, and has no interdependencies. This funnily just means that it was correct and better to have done nothing in the first place. There was not issue to begin with.

Last words

The examples I have written above are meant to easily demonstrate that aiming for less lines of code does not necessarily contribute to an increase in code quality. Unfortunately, I have seen signs of this misunderstanding in numerous code bases that I have been involved in. I would also be ballsy enough to say that I think all instances of what we call "spaghetti code" includes a similar misunderstanding of the DRY principle to that included in this article. If I can leave you with 1 simple message, I would leave you with this: Before creating a function, always ask yourself whether or not you are capturing knowledge that the business and other developers are interested in. If not, remove the abstraction!