Over-engineering?
When refactoring code, how far should you go? When does well-factored become over-engineered? In this article I'll refactor a small snippet of code, explaining as I go. Is the final result over-engineered or well-factored?
Building on my previous article about refactoring to concepts, I wanted to explore some more refactoring. Sometimes when I refactor to improve code (in my eyes), people tell me that my changes are over-engineered. I have made similar statements about other people's work myself. So who is right, and when does well-factored tip over into over-engineered?
Let's explore, starting with a small snippet of code:
require "csv"

class Inventory
  def import_products(file_path)
    CSV.read(file_path, :headers => true).each do |row|
      if row.field("name").present?
        # do something interesting
      end
    end
  end
end
The Inventory class is part of an e-commerce platform that accepts a CSV file of products and imports them into the system. It must validate each record and will only import the valid ones. For simplicity, we'll assume that the product name is the only thing that needs validating.
To load the data we are using CSV.read as an example. It loads the whole file into memory, so if you want to process large CSV files you should use CSV.foreach instead, as it yields each row as it is read. Thanks to @olkeene for mentioning it. :o)
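To make the difference concrete, here is a minimal sketch of the streaming approach (the sample file and its contents are invented for illustration):

```ruby
require "csv"
require "tempfile"

# Write a small CSV file so the example is self-contained.
file = Tempfile.new(["products", ".csv"])
file.write("name,sku\nWidget,W1\nGadget,G2\n")
file.close

names = []
# CSV.foreach yields one CSV::Row at a time instead of loading
# the whole file into memory like CSV.read does.
CSV.foreach(file.path, headers: true) do |row|
  names << row.field("name")
end

puts names.inspect  # => ["Widget", "Gadget"]
```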
First, let's look at the responsibilities of this method. I can see three different responsibilities:
- Reading in the file
- Validating a row
- Doing the actual work (something interesting)
I want to split this up into its separate pieces. The first thing I would do is move the validation logic into a new method, extracting the details from the conditional:
class Inventory
  def import_products(file_path)
    CSV.read(file_path, :headers => true).each do |row|
      if valid_product_row?(row)
        # do something interesting
      end
    end
  end

  def valid_product_row?(row)
    row.field("name").present?
  end
end
As we saw last time, I have a method that takes an object and then operates on its data. This really belongs on the object itself. But hang on a second, this time it's different: we don't own the object in question and we don't control how it gets created. In this case row is a CSV::Row object created for us when the rows in the CSV file are iterated. What can we do here? Well, there are a couple of options:
1. Monkey Patching
This is Ruby after all; we could just monkey-patch CSV::Row and add a #valid? method to it, right? I'm not against monkey-patching, but use it where it makes sense for the object and method in question. Validating a product is not a CSV::Row-ish thing at all. If we were talking about something that wasn't specific to our domain but was relevant to rows in any CSV file then I'd consider it, but there is another option.
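For illustration only, here is what that monkey-patch might look like (the method name valid_product? and the plain-Ruby presence check standing in for .present? are my own choices, not from the codebase):

```ruby
require "csv"

# Monkey-patching CSV::Row with a domain-specific method.
# This works, but product validation is not really a
# CSV::Row responsibility, which is why we reject this option.
class CSV::Row
  def valid_product?
    name = field("name")
    !name.nil? && !name.strip.empty?  # plain-Ruby stand-in for .present?
  end
end

row = CSV::Row.new(["name"], ["Widget"])
puts row.valid_product?  # => true
```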
2. Decorating
I think a better approach in this case would be to create a new class that decorates CSV::Row, which allows us to extend the behaviour of the class without changing it. Here is where we can place the product-specific logic:
class ProductRow
  def initialize(row)
    @row = row
  end

  def valid?
    @row.field("name").present?
  end
end
We can clean this up by using Ruby's SimpleDelegator class, as all missing methods will be delegated to the object passed into the constructor:
require "delegate"

class ProductRow < SimpleDelegator
  def valid?
    field("name").present?
  end
end
And we can use it in the Inventory like so:
class Inventory
  def import_products(file_path)
    CSV.read(file_path, :headers => true).each do |row|
      product_row = ProductRow.new(row)
      if product_row.valid?
        # do something interesting
      end
    end
  end
end
That moves the details of the validation out of this class but we still have that horrible conditional and we still have the file reading responsibility. Let's extract the file reading and remove the conditional:
class Inventory
  def import_products(file_path)
    valid_products_in(file_path).each do |product|
      # do something interesting
    end
  end

  def valid_products_in(file_path)
    CSV.read(file_path, :headers => true)
       .map { |r| ProductRow.new(r) }
       .select(&:valid?)
  end
end
The import_products method is looking pretty simple now: it just iterates over the valid products in the file and does the interesting work. Our new method valid_products_in takes care of reading the file, decorating each row, and filtering out any rows that are not valid.
When I described what this new method does, did you notice the number of things I listed? There were several and so I feel this needs to be broken up. Let's create a class for dealing with the CSV file:
class ProductCSVFile
  def initialize(file_path)
    @file_path = file_path
  end

  def products
    CSV.read(@file_path, :headers => true).map { |p| ProductRow.new(p) }
  end
end
This new class has the responsibility of dealing with the file and providing a list of products for us to work with. We'll keep the filtering of the valid products in Inventory:
class Inventory
  def import_products(file)
    valid_products_in(file).each do |product|
      # do something interesting
    end
  end

  def valid_products_in(file)
    ProductCSVFile.new(file).products.select(&:valid?)
  end
end
The final thing to do here is to remove the hard-coded use of ProductCSVFile; we should inject it instead, which will enable us to swap out file-handling strategies if we need to. Here is the final code in full:
require "delegate"
require "csv"

class ProductRow < SimpleDelegator
  def valid?
    field("name").present?
  end
end

class ProductCSVFile
  def initialize(file_path)
    @file_path = file_path
  end

  def products
    CSV.read(@file_path, :headers => true).map { |r| ProductRow.new(r) }
  end
end

class Inventory
  def initialize(import_class = ProductCSVFile)
    @import_class = import_class
  end

  def valid_products_in(file_path)
    @import_class.new(file_path).products.select(&:valid?)
  end

  def import_products(file_path)
    valid_products_in(file_path).each do |product|
      # do something interesting
    end
  end
end
After all that, what do you think of the final code: is it well-factored or over-engineered?
My thoughts start with a quote from Kent Beck:
any decent answer to an interesting question begins, "it depends..."
— Kent Beck (@KentBeck) May 6, 2015
Yes. It depends.
We took an 8 line piece of working code and turned it into about 30 lines. We also added two more classes to our codebase so there are more parts to think about. The code we had before was perfectly understandable and did the job we needed it to. The code isn't perfect but always remember that imperfect shipped code is better than perfect code that is never shipped.
Why refactor it?
This is all about change. Your software will change in the future and you can't predict how. The best defence we have is to make our code easy to modify when the time comes.
Single Responsibility Principle
The Single Responsibility Principle (SRP) not only applies to classes but also to methods. For each responsibility, there is one more reason to change. In our refactoring we separated out these responsibilities so that when the changes come we have a defined place to make them.
For example, what happens when the validation rules change? Maybe not only the product name is required but maybe also the SKU is now also mandatory. Where would that code be placed in the original example? What about if the SKU had to meet a certain length or format?
What typically happens is that humans follow patterns. We look at how existing problems are solved and follow the same pattern. We also tend to follow the path of least resistance. So in reality we would just lob some extra boolean logic onto our validation conditional and move on. That may be okay for some trivial things, but this is the start of code rot. Slowly over time we tack on new bits of code without enough thought, and before you know it you have an unmaintainable ball of mud.
Because we moved the validation out, there is now a clear place for this validation to live and when we need to change it the other methods are unaffected. This leads to a codebase that is much more maintainable in the future.
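To illustrate, here is a hedged sketch of how ProductRow's valid? method might grow when the SKU becomes mandatory. The SKU format rule (three letters, a dash, then digits) is an assumption invented for this example:

```ruby
require "delegate"
require "csv"

# Hypothetical extension of ProductRow: new validation rules get
# their own private predicates, and the rest of the code is untouched.
class ProductRow < SimpleDelegator
  def valid?
    valid_name? && valid_sku?
  end

  private

  def valid_name?
    !field("name").to_s.strip.empty?
  end

  def valid_sku?
    # Assumed format for illustration: e.g. "ABC-123"
    field("sku").to_s.match?(/\A[A-Z]{3}-\d+\z/)
  end
end

row = ProductRow.new(CSV::Row.new(["name", "sku"], ["Widget", "ABC-123"]))
puts row.valid?  # => true
```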
This was also the case for the new ProductCSVFile class. Its responsibility is focused on dealing with CSV files. The original is very simple, but it is missing a bunch of things that need to be put in place before it's production-ready. For example, where would error handling go in the original example to handle missing files or invalid file formats? Now we have a defined place for it, and our other code is unaware of the details encapsulated within it.
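As a sketch of that idea (ImportError, the rescue clauses, and the simplified ProductRow below are assumptions, not part of the original code), error handling now has an obvious home inside the file class:

```ruby
require "csv"
require "delegate"

# Simplified stand-in for the article's ProductRow.
class ProductRow < SimpleDelegator
  def valid?
    !field("name").to_s.strip.empty?
  end
end

# Hypothetical application-level error so callers don't need to
# know which file format (and which exceptions) sit underneath.
class ImportError < StandardError; end

class ProductCSVFile
  def initialize(file_path)
    @file_path = file_path
  end

  def products
    CSV.read(@file_path, headers: true).map { |r| ProductRow.new(r) }
  rescue Errno::ENOENT
    raise ImportError, "File not found: #{@file_path}"
  rescue CSV::MalformedCSVError => e
    raise ImportError, "Invalid CSV format: #{e.message}"
  end
end
```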
Open Closed Principle
The Open/Closed Principle (OCP) is another of the SOLID principles that leads to better software design, by allowing software to change its behaviour without modifying existing code. It allows our code to be open for extension but closed for modification.
In the refactoring above we used OCP to improve how files are dealt with. If you look at the new version of the Inventory class, there is no dependency on the file being a CSV. We do have a default for the class that processes files, but this can be overridden by the caller. If requirements change, we just have to create a new class that can handle the new file type; it only needs to have a #products method. If we need to support multiple file types, we could inject a mediator class that inspects the file and then delegates to the correct file-processing class. This would be transparent to the Inventory class. How would the original code have dealt with a change to XML, or even multiple formats? Probably massive changes to the whole method, rather than an isolated change to the one part designed to deal with files.
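As a sketch of that extension point, here is a hypothetical alternative file handler. The article mentions XML; I use JSON here purely to keep the sketch stdlib-only, and ProductJSONFile, ProductRecord, and the record format are all invented for illustration. The only contract is responding to #products:

```ruby
require "json"

# A lightweight record with the same #valid? contract as ProductRow.
ProductRecord = Struct.new(:name) do
  def valid?
    !name.to_s.strip.empty?
  end
end

# Hypothetical alternative to ProductCSVFile. Because Inventory only
# calls #products, it needs no changes to support this new format.
class ProductJSONFile
  def initialize(file_path)
    @file_path = file_path
  end

  def products
    JSON.parse(File.read(@file_path)).map { |h| ProductRecord.new(h["name"]) }
  end
end

# Inventory as in the article, minus the default argument so this
# sketch stays self-contained.
class Inventory
  def initialize(import_class)
    @import_class = import_class
  end

  def valid_products_in(file_path)
    @import_class.new(file_path).products.select(&:valid?)
  end
end

# Usage: Inventory.new(ProductJSONFile).valid_products_in("products.json")
```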
Maybe we're asking the wrong question.
When is the right time to do this type of refactoring?
Firstly, I would advise you to be pragmatic. If you have a demo in an hour and just need to ship something good enough, then go right ahead and make sure that demo is ready on time without worrying too much about it.
The other thing to consider is how sure you are that the code isn't going to change in the future. If the code in question will not change in the future then you should probably leave it but as change is unpredictable be ready to refactor when the change comes.
If your codebase has been around for a while, take a look at the churn and complexity metrics. You can use a service like Code Climate or tools such as metric_fu to see how often different parts of your code change and how complex they are. Classes that are high churn and high complexity are ideal candidates for refactoring. Using these tools can help you make an informed decision.
Maybe you don't know enough about the system you are building yet to be able to make the decision. In that case, don't make the change right away but wait until you do have enough knowledge that makes the refactoring the right thing to do.
Defer making the decision until the time is right. Keep code simple and use good Object Oriented programming techniques to make your code easily changeable.
Responses
Feel free to tweet me @andypike with what you think and I'll add a selection here.
Thanks to Paul Sturgess for his tweet that sums up my thoughts nicely.
@andypike To answer the q: I think code can be deemed over-engineered but often with time (& future changes) ends up looking well designed.
— Paul Sturgess (@paulsturgess) June 25, 2015