Better Unit Test Data For An Infinite Range
Imagine you come across the listing below. It’s for a parameterised unit test verifying that a ShoppingCart’s Add() method throws an InvalidQuantity exception when the quantity argument is either 0 or a negative integer value. The quantity will only ever be an integer.
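The original listing isn't reproduced here, so below is a minimal sketch of what such a test might look like, assuming xUnit and hypothetical ShoppingCart, Product and InvalidQuantityException types:

using Xunit;

public class ShoppingCartTests
{
    [Theory]
    [InlineData(-4)]
    [InlineData(-6)]
    [InlineData(-2)]
    [InlineData(-5)]
    public void Add_ThrowsInvalidQuantityException_ForZeroOrNegativeQuantity(int quantity)
    {
        var cart = new ShoppingCart();

        // Any non-positive quantity must be rejected.
        Assert.Throws<InvalidQuantityException>(() => cart.Add(new Product("Widget"), quantity));
    }
}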
The unit test code is OK. However, the same cannot be said for the test data values: -4, -6, -2, and -5.
I have two questions for you:
What problems can you see with the test data values?
And, what test data values would you choose for this unit test?
The above unit test asserts that a ShoppingCart instance will throw an exception when we add a quantity of zero or a negative (integer) value. Yet the test data, -4, -6, -2, and -5, does not do a great job of covering that range. On the contrary, the supplied data gives the misleading impression that we are testing merely a very narrow, finite range, namely from -2 to -6. The name of this unit test tells a somewhat different story from the test data, which is not what we want. We want to convey an accurate and congruous account of what is being tested to the uninitiated reader. Everything about the test, its name, structure, and data, should be consistent and unambiguously highlight the test’s purpose.
Let’s improve the test data.
First, we are not verifying the finite boundary—the test data does not include 0. Positive values will not throw an exception, but 0 will. Next, what about the first negative value, -1? I believe that is worth having too. Including -1 will indicate a continuity of values. If we test for 0 and then from -5 onwards, someone might wonder whether there is a gap where the test does not apply; i.e. between -1 and -4. Let’s make sure we include -1 in our test data.
So far, we have 0 and -1. Should we include the next few values, i.e. -2, -3, -4, -5? I believe that such a selection of values would, once again, give the (wrong) impression that we want to confine the test to a narrow finite range: 0 to -5.
How can we better communicate via our test data that the test covers, if not an infinite, then at least a vast range of negative integers? My preferred method is to increase the magnitude of the data values by half powers of 10:
10^0.5 = 3 (approx.)
10^1 = 10
10^1.5 = 30 (approx.)
10^2 = 100
What would our test data look like now? We will have 0, -1, -3, -10, -30, -100.
So far, so good. Why not whole powers of 10, e.g. -10, -100, -1000, and so on? Those are decent values too, but I believe they give the incorrect impression that only powers of 10 will work for our test.
Our test data looks great, and often I will leave it at that. However, we could do with one more value: an extreme negative quantity. We could choose the integer data type’s limit of -2,147,483,648. However, that strikes me as a bit misleading: we are choosing these test data values to exercise the invalid quantities we might legitimately come across when adding products to the ShoppingCart, not to test the limits of the integer data type. Given that our ShoppingCart implementation is unlikely ever to see an attempt to add -2,147,483,648 items, I prefer a more realistic figure, maybe around -10,000 or even as much as a negative million. Our test data becomes: 0, -1, -3, -10, -30, -100, -1000000.
Here’s our test again, now with a much broader range of test data values:
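Sketched under the same assumptions as before, only the data rows change:

[Theory]
[InlineData(0)]
[InlineData(-1)]
[InlineData(-3)]
[InlineData(-10)]
[InlineData(-30)]
[InlineData(-100)]
[InlineData(-1000000)]
public void Add_ThrowsInvalidQuantityException_ForZeroOrNegativeQuantity(int quantity)
{
    var cart = new ShoppingCart();

    Assert.Throws<InvalidQuantityException>(() => cart.Add(new Product("Widget"), quantity));
}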
In my opinion, our new set of test data is consistent with the name of the test and the narrative of testing for 0 and negative values. Unlike the original data set, our values are ordered, further adding to the clarity of the unit test.
Some people may consider this too much test data and worry that it will hurt the performance of our unit test suite. It won’t. Not if the tests are proper unit tests; we can run thousands of those in a few seconds.
How To Be Better Than a Genius Programmer—The Simple Way
If you’ve been programming for a few years, you likely will have come across a Genius Programmer or two. You may be sitting next to one right now.
Genius programmers are easy to spot—they’re the ones who know even obscure programming frameworks in intimate detail. They love exploring the quirks and idiosyncrasies of a computer language. Anyone not coding 16 hours a day doesn’t even come close.
In Praise of the Genius Programmer
The Genius Programmer is excellent. Programming challenges confounding mere mortals crumble before their skill at the keyboard. They get the job done and double-quick too. Management loves them for their productivity and holds the Genius Programmer up as a model to be emulated.
Trouble in Paradise
But not all is well with the Genius Programmer. A huge blind spot holds them back from being a genuinely productive genius: Their code is overly complicated for all its brilliance and correct working. Furthermore, only they can touch their code since they’re the only ones who can understand what it’s doing.
The 5 Levels of Cognitive Prowess
As the story goes, Albert Einstein proposed a system of 5 levels for measuring intellectual prowess:
1. ???
2. Genius
3. Brilliant
4. Intelligent
5. Smart
At the bottom is merely Smart. Then comes Intelligent, followed by Brilliant.
Genius sits at number 2. Genius is not number one. What’s better than genius?
According to Einstein, Simple is.
Here is the complete list:
1. Simple
2. Genius
3. Brilliant
4. Intelligent
5. Smart
Simple is at number 1. Simple beats the pants off genius.
How does this apply to me and programming?
It’s genius to meet a programming challenge with a complicated solution. However, a simple solution is even better. Writing simple code represents true programming genius.
The Simple Programmer Outperforms the Genius Programmer
Simple code is agile code. This deceptively short sentence hides all manner of ignore-at-your-peril truths for programmers and their organisations:
Complexity breeds more complexity. Entropy naturally builds up in our universe, and it takes effort to reverse this trend locally. Indeed, the lazy action is to write complex, hard to understand programs. We’re paving the road to ruin.
Writing simple code is hard. For us programmers, condensing the correct level of meaning into a function, module or component is difficult. It takes a persistent strong desire to improve, years of experience, practice and training to get to a point where we can write clean, clear code. While challenging, this journey is worthwhile and cognitively rewarding.
Another quote on the essence of simplicity from Einstein:
“Make things as simple as possible but no simpler.”
And two more from the ultimate Renaissance Man, Leonardo da Vinci:
“Simplicity is the ultimate sophistication.”
“A poet knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.”
So, which would you rather be? A Genius Programmer or a Simple Programmer?
The Two Synchronised Arrays Problem
Imagine we have a system for managing users. Say, within the system source code, two separate arrays hold information on users. One array contains unique identifiers for users, represented by integer ids:
user ids:
[ 1, 7, 9, 4, 12 ]
The second array contains the names of the users:
user names:
[ "Dave", "Carol", "Alice", "Eddie", "Bruce" ]
The elements at a given index correspond to the same user’s information in both arrays. Therefore, user “Alice” at (zero-based) index 2 in the names array has a user id of 9, also at index 2, in the user ids array.
The order of the elements in both arrays must be synchronised. If we remove the user with id = 9 at position 2 within the user ids array, we must remove it from the user names array also at position 2. The resulting arrays would see id = 9 and name = “Alice” spliced from both arrays at position 2:
user ids:
[ 1, 7, 4, 12 ]
user names:
[ "Dave", "Carol", "Eddie", "Bruce" ]
Can you see the problems we might run into with this flawed design?
What will happen when another programmer, unaware of the implicit relationship between the arrays, decides to reorder the user names in alphabetical order? Maybe they want to display the user names in alphabetic order in the UI:
reordered user names:
[ "Alice", "Bruce", "Carol", "Dave", "Eddie" ]
Yet the user ids array remains unchanged (!):
user ids:
[ 1, 7, 9, 4, 12 ]
Now, when we remove id = 9 at index 2, “Carol”—rather than “Alice”—will be deleted from the user names array. Oops!
Not only are we removing the wrong user name (“Carol”), but we are also retaining the user name we should have removed (“Alice”).
Why does this problem come about?
Even though the intent for these two arrays is to keep the ordering of the elements synchronised, there exists no mechanism to ensure this.
OK, how do we fix this?
A better way is to house related user ids and names together in an object:
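users:
[
  { "id": 1, "name": "Dave" },
  { "id": 7, "name": "Carol" },
  { "id": 9, "name": "Alice" },
  { "id": 4, "name": "Eddie" },
  { "id": 12, "name": "Bruce" }
]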
Why is this better? Combining the related user information into objects removes the need for order synchronisation between two arrays. If we want to delete the user with id = 9, we do so, and the entire user object disappears:
users with { "id": 9, "name": "Alice" } in position 2 removed:
[ { "id": 1, "name": "Dave" }, { "id": 7, "name": "Carol" }, { "id": 4, "name": "Eddie" }, { "id": 12, "name": "Bruce" } ]
And if we want to reorder the user objects by name alphabetically, then the user ids get reordered too:
users ordered by name:
[ { "id": 9, "name": "Alice" }, { "id": 12, "name": "Bruce" }, { "id": 7, "name": "Carol" }, { "id": 1, "name": "Dave" }, { "id": 4, "name": "Eddie" } ]
Is there a general rule or principle to help us avoid such problems?
Yes, there is: The Single Responsibility Principle (SRP) from The SOLID Principles:
“Gather together those things that change for the same reason, and separate those things that change for different reasons.”
When it comes to the SRP, many of us—myself included—primarily consider the things we should separate—like UI & database. Yet, it’s just as important, if not more so, to place together functions and data structures that change at the same time and for the same reasons—like our user ids and user names.
Always Separate Stable And Unstable Code
Recently we discovered a general truth: stable things should not depend on volatile things. If they do, then they too will become unstable by association.
Why does this matter in programming?
Say we are working with a messy legacy software system where high-level business logic is intermingled with data access code. The business logic hardly ever changes; it is stable. Conversely, the data access code is modified often to reflect changes to the database schema; it’s volatile. Since data access and business logic are united, business logic code is also affected whenever we modify data access logic. Business logic has become volatile by being closely associated with the unstable data access code.
We have a system where we must make more changes than we would like. And fewer changes are better.
OK, what might be a better system configuration?
It would be one where we affect fewer lines of code whenever we make a change.
Let’s consider two different 1,000 line systems:
System 1 is the one we have been considering all along. Business logic and data access logic are inseparably joined. Let’s assume changes always affect 10% of code in this system. Also, 80% of changes are to volatile data access code and 20% to stable business logic.
Under these conditions, how many lines of code do we modify for the average requirements change? (For the sake of simplicity, I am assuming that we are adding no new code.)
Lines changed: 1,000 lines * 10% = 100 lines.
In System 1, the average code change is 100 lines.
In another system, System 2, business logic is separated from data access logic. There exists a one-way dependency from data access logic onto business logic. So, data access logic could be affected when we alter business logic, yet business logic is isolated from data access logic changes.
Data access logic modification: 500 lines * 10% = 50 lines
Business logic change: 1,000 lines * 10% = 100 lines
(Simplifying assumption: a business logic change always affects data access logic.)
System 2 average number of lines changed: 20% * 100 lines + 80% * 50 lines = 60 lines
In System 2, the average code change is 60 lines.
In this simplified model, System 2 experiences only 60 lines changed on average, while System 1 is much more volatile, with 100 lines of code changing.
We have just derived the rationale for the Dependency Inversion Principle.
Name Converters After The Output Type
The question came up again. What shall we call a type converting from a source type to a destination type? Say, from an Order to a DbOrder? How about OrderToDbOrderConverter?
Wouldn’t that mean that other source inputs producing the same destination object, DbOrder, would require another converter class? Say, SalesInvoiceAndPurchaseStatusToDbOrderConverter? That’s a rather long name.
With conversions, we are primarily interested in the resultant type—here DbOrder. That’s the type we need to carry on with our program.
So, I suggest that we name converters after the output type. If we are converting to DbOrder then the converter type would shorten to DbOrderConverter. For ease of use, all the methods become overloads of Convert() or Map(). The source conversion types become the input parameters.
For example:
public class DbOrderConverter
{
    public DbOrder Convert(Order order)
    {
        // Convert from Order to DbOrder
    }

    public DbOrder Convert(SalesInvoice salesInvoice, PurchaseStatus purchaseStatus)
    {
        // Convert from SalesInvoice & PurchaseStatus to DbOrder
    }
}
The only caveat with this approach is that we must carefully construct converters to be aware of only one mechanism at a time. For example, if a converter knows how to convert both an ApiOrder, a web type, and a DbOrder, a database persistence type, to an Order, a business logic type, then this is a problem.
Why?
Consider this: Where would such a converter type be located? Within the database project? Now the database project would have to know about web! What about the web project? Not any better—the web project would need to reference and become coupled to the database project.
What is required here are two separate converters, each serving one adjacent transition between architectural layers, as sketched below.
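As an illustration (the namespaces and mapping details here are assumptions, not from the original post), each layer gets its own converter named after the shared output type, Order:

namespace Web.Converters
{
    // Knows only about the web type ApiOrder and the business type Order.
    public class OrderConverter
    {
        public Order Convert(ApiOrder apiOrder)
        {
            // Map web-facing fields onto the business type.
            return new Order();
        }
    }
}

namespace Database.Converters
{
    // Knows only about the persistence type DbOrder and the business type Order.
    public class OrderConverter
    {
        public Order Convert(DbOrder dbOrder)
        {
            // Map persistence fields onto the business type.
            return new Order();
        }
    }
}

This way, the web project never references the database project, and vice versa.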
Function Names Start With A Verb
Today’s Tip is foundational to good programming.
Functions describe system behaviour. They are actions. As such, a function’s name should describe the work it performs.
What do these functions do?
InvertedImage()
Transaction()
Processed()
TotalAmountWithTax()
These function names are ambiguous. Does InvertedImage() invert an image or return one that has already been inverted? It’s hard to tell what action, if any, each of them performs.
On the other hand, these function names are more descriptive:
ProcessExpenses()
ReconcileTransactions()
GetCustomer()
BuildVehicle()
The starting verb makes all the difference. The noun is also helpful—it describes the object of the behaviour; what the action is affecting. The noun is not always necessary. Sometimes it’s obvious. A File class with an Open() method will be about opening the file.
The verb ought to be in the present tense. Simple Present Tense, e.g. CreateGame(), is more suited for naming functions than Present Continuous Tense, e.g. CreatingGame().
Are there exceptions to the rule of putting the verb first?
Yes, there are.
Constructors: Constructors are functions setting up the initial state of an object. In many languages, the constructor name must be the name of the class:
public class ChessGame
{
    // Constructor
    public ChessGame()
    {
    }
}
Properties: In C#, we can have property getter and setter functions. We call these without parentheses, and they resemble data more than behaviour:
var totalAmount = ShoppingCart.GetTotal;
or
var totalAmount = ShoppingCart.Total;
The second option looks more natural; the first reads like a method call with the parentheses forgotten.
That’s it for today.
Please start function names with a verb.
Clean Architecture Is Wrong
Well, not hugely wrong. But still not entirely correct, as I see it. This inconsistency has been bothering me for a little while. It’s not a big deal, but it is something to be aware of.
And this flaw does not take away from the utility of Clean Architecture as a software engineering philosophy in any substantive way. I still firmly believe that Clean Architecture—for all its abstractness and potential flaw—leads to beautifully simple application designs that can be easily modified and extended. What’s not to love?
All right, let’s get into it.
Here is the familiar schematic diagram for Clean Architecture.
Clean Architecture
The Dependency Rule states that inner circles cannot know about outer ones. On the other hand, outer shells must depend on inner ones. In the end, some circles must know about one another, or it’s not a productive program but merely chunks of unconnected code.
So, outer rings know about inner ones, and the little arrows depict this one-way relationship:
Use Cases know about Entities—Entities are unaware of Use Cases.
Interface Adapters know about Business Logic—Business Logic is ignorant of Interface Adapters.
Frameworks & Devices are aware of Interface Adapters, yet Interface Adapters are oblivious to Frameworks & Devices.
I have a problem with the last statement. I believe it to be wrong.
Why?
Let’s assume the statement is correct and that Frameworks really are dependent on Interface Adapters.
Say, in a system, we use SQL Server as our database technology.
It would mean that our generic SQL Server data access SDK should reference, or depend on, our specific data access code to retrieve data from our SQL Server database!
Conversely, our specific SQL Server data access could not call generic SQL Server data access functions to fetch or save data, since there exists no reference! How is our application-specific data access meant to do its job?
In my opinion, the dependency should be the other way around between the outermost layers.
Interface Adapters depend on Frameworks & Drivers (red arrow).
The Interface Adapters shell connects interfaces exposed by Business Logic with those of the general code of Frameworks, libraries and SDKs.
Adapters act as connectors between two dissimilar interfaces. To see this, take a look at the adapter plug you use to connect the toaster/TV/microwave you bought in the UK to your local power outlet. It has both a male and a female plug interface. The toaster does not need to know about the UK adapter, and neither does your power outlet; neither depends on the adapter. On the other hand, the power adapter depends on both the UK and the local power outlet interfaces.
Adapters always depend on both interfaces. Yet, the things that an adapter connects do not, by themselves, depend on the adapter.
Adapters depend on both interfaces.
Therefore, it stands to reason that within the context of Clean Architecture, Interface Adapters depend on Business Logic interfaces and the outermost Frameworks circle.
That’s it. I hope that makes sense.
What do you think? Am I missing something? Feel free to comment or send your opinion to olaf@codecoach.co.nz. I want to learn and see where I am going wrong! :-)
Clean Architecture – Caching As A Proxy
In the last post on Clean Architecture, we explored how caching could be implemented in a pluggable manner within the Interface Adapters layer.
Here is the diagram of the data retrieval part of the system:
I have highlighted the interface connecting from CachedCustomerRepository to the database. However, there is a problem.
Again here is the listing for CachedCustomerRepository:
public class CachedCustomerRepository : ICustomerRepository
{
    private ICache<Customer> Cache { get; set; }
    private ICustomerDatabase Database { get; set; }

    public CachedCustomerRepository(ICache<Customer> cache, ICustomerDatabase database)
    {
        Cache = cache;
        Database = database;
    }

    public Customer GetCustomer(string emailAddress)
    {
        var customer = Cache.Get(emailAddress);
        if (customer != null)
            return customer;

        customer = Database.GetCustomer(emailAddress);
        if (customer == null)
            return null;

        // Put the customer into the cache for future calls.
        Cache.Set(customer.EmailAddress, customer);
        return customer;
    }
}
Notice how the interface to the database is named ICustomerDatabase.
As I see it, we can improve this design.
As it stands, if we wanted to revert to a simpler system, one without the caching infrastructure, and rip out CachedCustomerRepository so that we could connect SqlCustomerRepository directly to the use case, then we would be in trouble. The use case uses ICustomerRepository, yet SqlCustomerRepository implements ICustomerDatabase!
To make this work, could we have our business logic make its data calls against ICustomerDatabase?
We don’t want the business logic to be aware it’s calling a database—the use case should utilise a neutral interface like ICustomerRepository.
How about SqlCustomerRepository implementing ICustomerRepository instead of ICustomerDatabase?
Let’s try that out. Here is what the system diagram will look like without caching:
Yes, the system is pluggable—we can unplug SqlCustomerRepository and plug in a different database implementation.
OK, what about if we reinstate caching? Here is the system diagram:
Isn’t that nice?! The entire caching infrastructure, in particular CachedCustomerRepository, the logic module that switches between cache and database retrieval, plugs upstream into the use case and downstream into the SqlCustomerRepository module. It just connects up!
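One way to see the pluggability is at the composition root, where either configuration can be wired up without touching the use case. A sketch, assuming hypothetical GetCustomerUseCase and RedisCustomerCache classes, and assuming CachedCustomerRepository now takes an ICustomerRepository as its fall-through data source, as the proxy arrangement implies:

// Without caching: the use case talks straight to the SQL repository.
ICustomerRepository repository = new SqlCustomerRepository(connectionString);
var useCase = new GetCustomerUseCase(repository);

// With caching: the proxy slots in between, implementing the same interface it consumes.
ICustomerRepository cachedRepository = new CachedCustomerRepository(
    new RedisCustomerCache(redisConnectionString),
    new SqlCustomerRepository(connectionString));
var cachedUseCase = new GetCustomerUseCase(cachedRepository);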
Technically, the new implementation of CachedCustomerRepository is a proxy: a component that implements the same interface it consumes, sitting between the interface’s consumer and its real implementer. Proxy is a valuable design pattern; it allows us to slip behaviour between an interface consumer and an interface implementer.
Our new versatile system, whereby database implementation, caching implementation and the data retrieval workflow are pluggable modules, resembles how Lego blocks connect to one another. Elegant Software Architecture is highly pluggable—in the right way.
Clean Architecture – Caching With Interface Adapters
Last time we discovered the versatility of Clean Architecture’s Interface Adapters shell and how it acts as a connecting layer between the central Business Logic and our system’s specific technologies—the Frameworks & Devices.
In the example, we had business logic that wrote to and read from a SQL Server database. The code that lets us save and retrieve data from the specific database schema belongs in the Interface Adapters. Let’s check out the design we ended up with last time:
The advantage of this design is that when the database schema changes, our SQL Customer Repository Adapter will reflect those changes yet leave our Business Logic unaffected. Here we have powerful pluggability.
OK, let’s make things more interesting.
We want to introduce an optimisation—data caching. Instead of every data read running off to the database, we want first to check whether the data exists in a cache. If it does, return the data and do not read from the database. If not, go to the database, read the data, and put it in the cache for subsequent reads.
Now, where should the logic that reads either from the cache or the database live? And what about the entirely separate logic connecting us to a specific Redis cache implementation?
Do these belong to the business logic?
No, our business logic is the wrong place. These are both data concerns. All the business logic is concerned with is a way to read this data—that’s it. How the data is retrieved is not its concern.
OK, does this logic belong to the general SQL Server and Redis caching code? i.e. in the Frameworks & Devices shell?
No, that would also be incorrect. We don’t want to mix specific and generic data access.
On the other hand, it makes sense to have the code connecting our specific data retrieval from Redis cache (i.e. construction of cache keys, etc.) in a module at the same level as our specific SQL Server data retrieval code, SQL Customer Repository Adapter.
Furthermore, the logic switching between cache and database reads must sit in front of, and connect to, both the cache and database modules.
OK, so a picture is forming as to a system design that includes caching:
We now have two layers of logic in Interface Adapters—firstly, a module to switch between reading data from cache or database. Secondly, the adapters to retrieve data from Redis and SQL Server.
The logic to switch between cache and database is abstract. It does not mention Redis or SQL Server as the given cache or database technologies. Why? We get the flexibility to plug in other caching and database technologies.
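The class below depends on two abstractions whose definitions aren’t shown in the post; a minimal sketch of what they might look like:

public interface ICache<T> where T : class
{
    T Get(string key);              // returns null on a cache miss
    void Set(string key, T value);
}

public interface ICustomerDatabase
{
    Customer GetCustomer(string emailAddress);
}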
Here is an example implementation of a class managing the Customer data retrieval logic, first from a cache and then from a database:
public class CachedCustomerRepository : ICustomerRepository
{
    private ICache<Customer> Cache { get; set; }
    private ICustomerDatabase Database { get; set; }

    public CachedCustomerRepository(ICache<Customer> cache, ICustomerDatabase database)
    {
        Cache = cache;
        Database = database;
    }

    public Customer GetCustomer(string emailAddress)
    {
        var customer = Cache.Get(emailAddress);
        if (customer != null)
            return customer;

        customer = Database.GetCustomer(emailAddress);
        if (customer == null)
            return null;

        // Put the customer into the cache for future calls.
        Cache.Set(customer.EmailAddress, customer);
        return customer;
    }
}
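On the Redis side, the cache adapter might look roughly like this. It is only a sketch: the class name, the key scheme, and the use of StackExchange.Redis with System.Text.Json are my assumptions, not part of the original design:

using System.Text.Json;
using StackExchange.Redis;

public class RedisCache<T> : ICache<T> where T : class
{
    private readonly IDatabase _redis;

    public RedisCache(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    public T Get(string key)
    {
        var value = _redis.StringGet(Key(key));
        return value.HasValue ? JsonSerializer.Deserialize<T>((string)value) : null;
    }

    public void Set(string key, T value)
    {
        _redis.StringSet(Key(key), JsonSerializer.Serialize(value));
    }

    // Prefix keys with the type name so different cached types cannot collide.
    private static string Key(string key) => $"{typeof(T).Name}:{key}";
}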
What we have here is a decent pluggable design. However, we can improve on it further by making a small change. We’ll look into that next time.
Clean Architecture – Interface Adapters Example
Today I would like to look at a simple Clean Architecture example to see what part the Interface Adapters shell plays in data access.
I can’t wait until next time, when we will increase the capability of our Interface Adapters layer by introducing data caching.
OK, let’s get into it.
Say we have a system that we have designed with Clean Architecture guidelines in mind.
There is some business logic, a use case class, that needs access to customer data, and it gets this via an ICustomerRepository interface:
public interface ICustomerRepository
{
    Customer GetCustomer(string emailAddress);
}
Note: For the sake of simplicity, all the calls are synchronous.
Customer data is stored in a SQL Server database table called Customers. We have generic SQL Server framework code to save and retrieve data from SQL Server databases in general. This generalised framework code belongs in the outermost Frameworks & Devices shell of our Clean Architecture diagram.
OK, where do we put the connecting code, the behaviour that utilises the generalised SQL Server framework but is about our specific database and how we store and retrieve customer data from it? Behaviour like reading the database connection string from the application-specific configuration, and reading data from the Customers table and possibly some other joined tables.
Maybe such functionality belongs to the Business Logic?
Business Logic is meant to be ignorant of data mechanisms like SQL Server. Referencing SQL data access code that knows our database type and internal schema would mean knowing those data mechanisms intimately. No, we can’t put this code into the Business Logic layer.
What about the Frameworks & Devices shell? Well, it doesn’t belong here either. This outermost layer schematically holds generalised frameworks and is, in the context of SQL Server, concerned with generic SQL data access.
Our SQL Server database-specific code belongs to the Interface Adapters shell. This includes any auto-generated, schema-aware Object Relational Mapper (ORM) data access code.
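To make that concrete, the adapter might look roughly like the sketch below. The table columns, the Name property, and the use of Microsoft.Data.SqlClient are assumptions for illustration; only ICustomerRepository and the Customers table come from the example itself:

using Microsoft.Data.SqlClient;

public class SqlCustomerRepository : ICustomerRepository
{
    private readonly string _connectionString;

    public SqlCustomerRepository(string connectionString)
    {
        // Typically read from application-specific configuration.
        _connectionString = connectionString;
    }

    public Customer GetCustomer(string emailAddress)
    {
        using var connection = new SqlConnection(_connectionString);
        connection.Open();

        using var command = new SqlCommand(
            "SELECT EmailAddress, Name FROM Customers WHERE EmailAddress = @email",
            connection);
        command.Parameters.AddWithValue("@email", emailAddress);

        using var reader = command.ExecuteReader();
        if (!reader.Read())
            return null;

        // Translate the table row into the business-logic Customer type.
        return new Customer
        {
            EmailAddress = reader.GetString(0),
            Name = reader.GetString(1)
        };
    }
}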
Put into an architectural diagram, we end up with one layer of Interface Adapters:
Today has been a decent start to designing a slice of data access for a system designed with Clean Architecture. However, what if we wanted to optimise the system and include caching of data in, say, a Redis instance? How would this change our design? We’ll find out next time.