Simplifying Event-Driven Architecture: 3 Guardrails to Avoid Accidental Complexity
Overview of 3 guardrails you can put in place to manage event-driven architecture complexity
- David Boyne
- Published on • 12 min read
There are two types of complexity you will face when building event-driven architectures: essential complexity and accidental complexity.
Essential complexity: the inherent difficulty of the problem domain that cannot be removed or simplified. The business domain you operate in may be naturally complex, and distributed systems bring inherent challenges of their own: the CAP theorem, network latency, fault tolerance, eventual consistency, security, and so on.
Accidental complexity: the difficulty introduced by the tools, techniques, or approaches used to solve problems, which could be avoided or mitigated with better choices. This complexity is not intrinsic to the problem domain but is often a byproduct of implementation.
Much of the writing on event-driven architecture focuses on essential complexity and how to solve specific technical challenges, but skips over accidental complexity: what to look out for, and how to address it.
In this blog post I want to share three common ways accidental complexity creeps into your event-driven architecture, and the guardrails you can put in place to avoid them.
1. Understanding implementation vs behaviour
Event-driven architectures are becoming more accessible to us (as I mention in my previous post: 5 open source standards you should know when building event-driven architectures).
Cloud providers like AWS and Microsoft are offering ready-made services and solutions, whilst we see an increase in innovation in event-driven architectures from CloudFlare, Inngest, restate and trigger.dev.
As event-driven architectures become more accessible, we can implement solutions more quickly and easily. However, this advantage comes with a trade-off: it often leads to increased complexity when adopting an implementation-first approach.
Misunderstood value from implementation first
When we prioritize implementation first, we initially experience a surge in value. Downstream consumers are connected to our events, and managed services simplify this process. As time goes on, we continue adding more consumers and producers.
Eventually, you may begin to notice resistance in your architecture: maintenance becomes more challenging, and the initial value you got from event-driven architecture decreases. But why does this happen?
Complexity with event-driven architectures tends to rapidly increase over time. What starts so simple with a handful of producers and consumers can lead to a complex mess or big ball of mud.
We start to understand the importance of dead-letter queues and idempotency (essential complexity), we draw the wrong boundaries between our services and the ways they communicate with each other through events, and the governance and maintainability of the events themselves spiral out of control.
This complexity is often introduced with an implementation first mindset.
It’s the developers’ (mis)understanding, not the domain experts’ knowledge, that gets released into production. - Alberto Brandolini, creator of Event Storming
What if we decide to think about the behaviour of our system first? What benefits can we get?
Behaviour first, then implementation
Behaviour-first thinking involves taking the time to understand how your system behaves before diving into implementation.
Now, I know that might sound super basic, but I cannot stress enough how many people skip this stage, jump directly into implementation, and deploy their assumptions into production. Just take a minute to think.
When building event-driven architectures, significant value lies in identifying the key events within your system, including business events, domain events, and internal versus public events.
To effectively identify these events, collaboration with stakeholders is essential. Together, you can establish and refine a ubiquitous language that aligns with your domain, ensuring clarity and shared understanding across the system.
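To make that concrete, here is a minimal sketch (in TypeScript, with hypothetical names, not from any specific system) of the difference between an event that leaks implementation details and one named after the business behaviour in your ubiquitous language:

```typescript
// Leaks implementation details - consumers now depend on your database.
interface DynamoRecordUpdated {
  tableName: string;
  keys: Record<string, string>;
}

// Describes business behaviour - meaningful to any stakeholder, and
// named in the ubiquitous language agreed with domain experts.
interface OrderPlaced {
  orderId: string;
  customerId: string;
  placedAt: string; // ISO 8601 timestamp
  totalAmount: { value: number; currency: string };
}
```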
Luckily for us, there are some great communities out there focusing on this exact problem. Many companies and teams are using Event Storming and/or Event modelling to help them explore their own domains.
These processes help you identify your events, work with stakeholders, and build a shared understanding of your system. This is invaluable.
Gaining a shared understanding of your system and domains can prevent many of the downstream issues that come with an implementation-first approach. Take time to understand, then implement.
Note: implementation is itself part of exploration. You gain more understanding as time goes on. This is great, but make sure you continue to drive a shared understanding between your teams.
Actions
- Before you jump into implementation, think about the behaviour of your system
- Explore Event storming or Event Modelling to help
- Work with domain experts to identify your events, contracts and the flow of your architecture
2. Event strategy
Messages (events, commands and queries) are the core of your event-driven architecture. Events tell a story. Modern applications have both events and state (e.g. tables).
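As a rough sketch of that distinction (the message names below are illustrative): events are immutable facts named in the past tense, commands ask a specific service to do something, and queries ask for current state.

```typescript
// Event: an immutable fact, named in the past tense.
interface PaymentProcessed {
  type: 'PaymentProcessed';
  paymentId: string;
  processedAt: string;
}

// Command: a request for a specific service to act, imperative mood.
interface ProcessPayment {
  type: 'ProcessPayment';
  orderId: string;
  amount: number;
}

// Query: a request for current state, with no side effects.
interface GetPaymentStatus {
  type: 'GetPaymentStatus';
  paymentId: string;
}
```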
When you start using messages in your architecture, they will evolve. Payloads will evolve, your business will evolve; things don’t stay the same.
So how are you going to safely change your events for downstream consumers?
Many teams implementing event-driven architectures overlook this. They don’t define an event evolution strategy and end up in a mess. Things become hard to change and evolve as producers and consumers come and go over time.
How do we solve this? You need to define an event evolution strategy.
You need to treat your events like you treat your APIs. You need to version them, understand their coupling, and figure out your evolution strategy.
Where it all starts, and how complexity increases
The main problem with event-driven architectures is that things start simple and rapidly get harder. With a handful of consumers and producers, maintaining the relationships between them is relatively simple, and staying on top of versioning is not a problem.
As time goes on, more producers and consumers are added to your architecture. Keeping track of versioning, events and payloads becomes harder, to the point where it’s almost impossible to know who is consuming or producing what, and how versioning is managed in your architecture.
With many messages in your architecture, each evolving over time, it becomes hard to maintain compatibility between producers and consumers, and there is a high risk of breaking the contracts between them.
To help you overcome this challenge, you can define an evolution strategy.
Implementing a message evolution strategy
An evolution strategy can take the form of automated checks, such as contract testing, or a straightforward document outlining guidelines for your teams on how to evolve your events. There is no single solution that fits all situations.
Your event evolution strategy will depend on your use case and needs, but here are some ideas to get you started.
You need to think about how your events are going to evolve in your architecture. There are a few options to consider. Are your events going to be backwards compatible? Are all new fields optional? Are you going to publish multiple versions of your events?
There are many options when dealing with event versioning, including backwards compatibility, forwards compatibility and full compatibility. Confluent have a great resource on schema evolution that I highly recommend reading and understanding.
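As a hedged illustration of what a backwards-compatible choice can look like in practice (the event and field names below are hypothetical): nothing is removed or renamed, new fields are optional, and the payload carries a version so consumers know what they received.

```typescript
// v1 of the event, already in production.
interface OrderShippedV1 {
  version: 1;
  orderId: string;
  shippedAt: string; // ISO 8601 timestamp
}

// v2 adds data without breaking v1 consumers: existing fields keep
// their names and types, and every new field is optional.
interface OrderShippedV2 {
  version: 2;
  orderId: string;
  shippedAt: string;
  carrier?: string;        // new, optional
  trackingNumber?: string; // new, optional
}

type OrderShipped = OrderShippedV1 | OrderShippedV2;

// A consumer can handle both versions explicitly.
function handleOrderShipped(event: OrderShipped) {
  console.log(`Order ${event.orderId} shipped at ${event.shippedAt}`);
  if (event.version === 2 && event.trackingNumber) {
    console.log(`Tracking number: ${event.trackingNumber}`);
  }
}
```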
Even if you have a simple document and gain a shared understanding across your teams about your strategy, you are ahead of most in the industry.
Not having an evolution strategy can lead to complexity over time.
At the start, things are simple and don’t really require much coordination around change, but over time, as consumers and producers are added, this becomes hard to maintain and manage. So rather than facing this problem in the future, think about your strategy today and get ahead of it.
Actions
- Write a simple event evolution strategy
- Explore what options you have and dive deeper
- Treat your events like you treat your APIs.
3. Complexity with event consumers
When designing and implementing event-driven architectures, things may appear decoupled, but in reality, they often are not.
We rely on channels such as brokers, queues, or topics to transfer messages between systems. On paper, this might give the impression of a decoupled architecture, but...
The problem is that the complexity lies between the lines of these relationships.
One example of accidental complexity that arises is the coupling between producers and consumers, which often occurs due to the specific ways we design and consume events.
Coupling in your event consumers
When you consume an event from a producer you are a conformist by default.
You are conforming to the event/payload/model of the producer within your context and domain. This means you are coupled directly to the producer that sends the event. This may be OK, or it may be a problem (depending on the context).
The conformist pattern (one of the bounded context mappings) requires a high level of communication between the producer and consumer. As the producer changes the contract, downstream consumers may be impacted.
Is this pattern bad? No. Depending on your requirements and the coupling you are willing to accept, this pattern may be fine for you, but remember it is the one you get by default. So if you want to protect your domains, you need to map into something else.
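A minimal sketch of that default (the types below are hypothetical): the consumer codes directly against the producer's payload shape, so any change the producer makes to that shape ripples straight into the consumer.

```typescript
// The producer's contract, copied as-is into the consumer.
// If the payments team renames `amt` or changes its semantics,
// this handler (and everything downstream of it) breaks.
interface PaymentEvent {
  id: string;
  amt: number;      // producer's abbreviation, producer's semantics
  cust_ref: string; // producer's naming convention
}

function handlePayment(event: PaymentEvent) {
  // Business logic written directly against the producer's model.
  creditAccount(event.cust_ref, event.amt);
}

function creditAccount(customerRef: string, amount: number) {
  console.log(`Crediting ${customerRef} with ${amount}`);
}
```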
Anti-corruption layer for event consumers
If you don’t want to conform to the producer’s message/model structure, then a common pattern to introduce is an ACL (anti-corruption layer).
This bounded context mapping allows you to map the given message into a model you and your team understand. This can be useful when service A and service B have different ubiquitous languages or understandings of the systems they operate in.
The level of communication required is lower than with the conformist pattern, and it can help you decouple your models from the producer (only if you need to).
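Here is one way that mapping can look (again with hypothetical names): a single translation function is the only code that knows the producer's shape, so producer changes are absorbed in one place.

```typescript
// The producer's shape (external - not ours to control).
interface PaymentEvent {
  id: string;
  amt: number;
  cust_ref: string;
}

// Our internal model, expressed in our team's ubiquitous language.
interface PaymentReceived {
  paymentId: string;
  amount: number;
  customerId: string;
}

// The anti-corruption layer: the only code that knows both shapes.
function toPaymentReceived(external: PaymentEvent): PaymentReceived {
  return {
    paymentId: external.id,
    amount: external.amt,
    customerId: external.cust_ref,
  };
}

// Handlers depend only on the internal model; if the producer's
// contract changes, only toPaymentReceived needs to change.
function handlePayment(event: PaymentEvent) {
  const payment = toPaymentReceived(event);
  console.log(`Payment ${payment.paymentId} from ${payment.customerId}`);
}
```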
Open host service for event consumers
With the open host service pattern, we push back onto the producing service and agree on a contract. The producer (A) maps its models into the contract agreed by both parties.
The level of communication is lower than with the conformist pattern; it still requires an initial conversation, but once the contracts are agreed, the level of communication required tends to reduce.
This pattern can be a good fit for raising domain events: events that are important to the organisation. They are not represented by the implementation details of a given domain, but mapped into a language the business understands.
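A sketch of what this can look like from the producer's side (hypothetical names): the internal model stays private and is mapped into the agreed, business-facing contract before publishing.

```typescript
// The producer's internal model - free to change at any time.
interface OrderRecord {
  pk: string;            // storage key
  state: 'NEW' | 'PAID';
  total_cents: number;
}

// The agreed public contract (the published language of the open host).
interface OrderPaid {
  orderId: string;
  totalAmount: { value: number; currency: string };
}

// The producer maps internal details into the agreed contract,
// so no implementation detail leaks out to consumers.
function toOrderPaid(record: OrderRecord): OrderPaid {
  return {
    orderId: record.pk,
    totalAmount: { value: record.total_cents / 100, currency: 'USD' },
  };
}
```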
Which bounded context map to use?
So, which one should you use for your event consumers? It depends (as usual).
Coupling comes in all shapes and sizes, and the distance between components also has to be taken into consideration, as Vlad Khononov shared in his book Balancing Coupling in Software Design.
Understand the context. If your consumer is close to your producer (in the same domain, or within it), then conforming to its model may not be a problem. If your consumer is in another domain or organisation, then introducing context mappings can help.
You also have to take into consideration the level of communication required between these options. The conformist pattern requires a high level of communication, whereas the open host service requires initial communication that then lowers over time.
To summarise, understand bounded context mappings and how they couple or decouple you from the producing service.
Actions
- Understand that how you consume events may couple you back to the producer
- Explore bounded context mappings and how they can help you
- Read up more on bounded context mappings with the cheat sheet.
Want to learn more about event-driven architectures?
Get the EDA Visuals Book! (For free) Over 60 bite-sized visuals I created to help you learn about event-driven architectures.
Summary
Event-driven architecture has two types of complexity: essential complexity and accidental complexity.
Without managing accidental complexity, you can end up with a complex distributed system that is hard to maintain, evolve and work with.
In this blog post we explored three ways you can manage accidental complexity: understanding behaviour first vs implementation first, defining an event evolution strategy, and using bounded context mappings.
By implementing some of the actions in this blog post you can manage some of that complexity and keep the benefits that event-driven architecture offers you (resilient, decoupled and fault-tolerant architecture options).
If you would prefer to watch a talk I gave on complexity and event-driven architecture, you can watch it below.
If you have any questions feel free to reach out on X, Bluesky or LinkedIn.
I hope you enjoyed this blog post and it helps you on your event-driven architecture journey.