How to UnMesh Your Data
Is Data Mesh a Solution, or Problem?
The universe, they say, is “finite yet unbounded”. Capital, by contrast, is always finite and inevitably bounded. Deciding where to spend your hard-won allocations is forever the stuff of long meetings and agonizing choices, and your technology allocation can often be the most difficult. Vendors from seemingly everywhere are wreaking havoc on your inbox, promising to solve all sorts of problems. They may even be blowing up your phone or, hardest of all to ignore, their outreach may be bubbling up internally from your own staff.
Enter “Data Mesh”. If you haven’t heard about it from a coworker, you’ve heard about it from those outbound campaigns. It’s very buzzy and seemingly has legs. Is it living up to the hype or another solution looking for a problem? Is it text messaging or augmented reality?
One answer is…
Enter Data Mesh?
A few notes from the originating authors:
“...the convergence of Distributed Domain Driven Architecture, Self-serve Platform Design, and Product Thinking with Data.”
“In order to decentralize the monolithic data platform, we need to reverse how we think about data, its locality and ownership. Instead of flowing the data from domains into a centrally owned data lake or platform, domains need to host and serve their domain datasets in an easily consumable way.”
This decentralization has both the purpose and effect of kicking the integration considerations down the road. Is this a plus or a minus?
Why Data Mesh?
The trouble with data is that data is hard. Harder than application software. Harder even than infrastructure. Now throw in data pulled from third-party systems: either data they collect on your behalf (think web analytics, marketing traffic feeds), or data that enriches your own analysis or business perspective (perhaps census data, historical weather, or shared lists mapping genomic markers to cancer treatment responses).
In most cases, the structure, shape, and quality of data are afterthoughts. Software engineers and DevOps folks are concerned with getting it working, not getting it “right”. Completion, not accuracy. Accuracy is often left to QA or UAT or, ultimately, users. Add to this the comp-sci reality that the structure of data is often different for application concerns than for analytical ones. Throw in regulatory and security considerations and now we have “data silos” and an infinity of integration problems when we do get around to the data-consumption side of things.
One last detail: very often we are trying to integrate data from multiple sources, either inside or outside the organization. Sometimes our data engineers don’t have the domain expertise to make key decisions about which oranges are actually apples. Keep this in mind; we’ll talk about it a bit more later.
The trouble with good analytics is they make data seem easy. “For god’s sake, just count something” as an old boss used to say. Or as a developer friend of mine says “it is easy for them to imagine so they imagine it is easy”.
The fact that data is hard explains why “solve it for you” concepts peddled by high-priced vendors constantly pop up. But is this really what Data Mesh is? Or is it, finally, a “best-idea” that solves data difficulties for us and makes our capital expenditures on implementations cost less and deliver more?
Data Mesh What?
Many articles dive into the definition of what Data Mesh is and we’ll not repeat the details here.
Suffice to say that Data Mesh is a set of architectural principles for attempting to corral the aforementioned difficulties and whip them into shape. It should be no surprise that it often comes with vendor-supplied companion tools, either things you install in your own enterprise, or hosted cloud-based services that you ship your data to. Yes, they will cost you additional licenses and require fresh expertise. Not inherently a bad thing if real, challenging problems are being solved.
You’ll hear things like “Data Mesh needs a Data Fabric”, and probably very soon we will begin to see “Data Mesh Compatible” and “a Data Mesh Framework” and the like.
When Data Mesh?
Perhaps you have a data wing in your org. A Data Platform or Data Hub or Data Science team, maybe all of the above? Perhaps you know they are smart and talented, but still can’t seem to get it together when it comes to delivery. After all, the responsibility for their presence ultimately rolls up to you. So when they come to you asking for budget for new tech, you want to listen carefully.
The intensity and immediacy of your data difficulties will color how you see the proposed benefits of Data Mesh. As someone in leadership, it is natural to consider how adopting a “new” set of architectural principles and possible tools will affect not only the people who use them, but the organization as a whole. With Data Mesh, more than almost any other new tech I can name from the last 10 years, it is essential to think critically about impact.
At the core, Data Mesh wants us to create federated, interoperable Data Products. The nature of this federation is well intentioned: let the subject-matter experts (SMEs) in any given area own their own data. After all, advocates of Data Mesh argue, SMEs know best. They can publish all the information about their product back to some centralized system to make it discoverable, addressable, and interoperable.
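To make “discoverable, addressable, and interoperable” a little more concrete, here is a minimal sketch of what publishing a Data Product to a central system might look like. Everything in it is an assumption for illustration: the DataProduct and Catalog classes, their fields, and the domain/name address format are hypothetical and not part of any Data Mesh specification or vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    domain: str            # owning business area, e.g. "marketing"
    name: str              # product name, unique within the domain
    owner: str             # SME or team accountable for the data
    schema: dict           # column name -> type name; the interoperability contract
    description: str = ""

    @property
    def address(self) -> str:
        # A stable, unique address is what makes the product "addressable".
        return f"{self.domain}/{self.name}"


class Catalog:
    """Hypothetical central registry; it holds metadata only, never the data."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        if product.address in self._products:
            raise ValueError(f"{product.address} is already published")
        self._products[product.address] = product

    def get(self, address: str) -> DataProduct:
        return self._products[address]

    def search(self, term: str) -> list[str]:
        # Discoverability: naive case-insensitive match on name and description.
        term = term.lower()
        return sorted(
            addr for addr, p in self._products.items()
            if term in p.name.lower() or term in p.description.lower()
        )


catalog = Catalog()
catalog.publish(DataProduct(
    domain="marketing",
    name="web_traffic_daily",
    owner="marketing-analytics",
    schema={"date": "date", "sessions": "int"},
    description="Daily web sessions by date",
))
catalog.publish(DataProduct(
    domain="finance",
    name="invoices",
    owner="finance-ops",
    schema={"invoice_id": "str", "amount": "decimal"},
    description="Issued invoices",
))
found = catalog.search("web")
```

Note what the sketch leaves out: the data itself stays with the owning domain, so every team that publishes a product also has to host and serve it.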
If this sounds like “microservices for data”, you are right on the money, and probably for a lot more money.
If you were part of the microservices wave, you may remember struggling to see how it was actually benefiting you. Teams wrestled with how to split up their software, then began to realize that although each individual service was simpler, hosting and connecting them all had become a lot more complicated.
Data Mesh compounds this further. It needs the clear, clean service definition and separation that were so hard to get right in microservices, and it also wants the organization to push the development and deployment of those services into the organizational areas they properly belong to. Beyond that, it advises us to federate access, quality, and master data management as well, creating “localized decision making and autonomy”.
However, you still need global standards and a global governance team, at least in an advisory capacity. It’s a bit of a government mindset: “why buy one when you can have two at twice the price?”
How Data Mesh?
Data Mesh attempts to solve your “data is hard” problem by taking one of the toughest things about dealing with data - combining business/SME knowledge with the tools/systems processing and delivering data - and delegating those problems out to parts of the org they properly belong to.
There remains a strand of centralized governance to ensure the other benefits around access and interoperability.
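As a rough sketch of what that strand of centralized governance could amount to in practice, the check below validates a domain-owned product descriptor against a small set of global standards. The required fields, the allowed type names, and the descriptor layout are all hypothetical assumptions made for illustration, not rules taken from any Data Mesh specification.

```python
# Hypothetical global standards set by a central governance team.
REQUIRED_FIELDS = {"owner", "schema", "access_policy"}
ALLOWED_TYPES = {"str", "int", "decimal", "date", "bool"}


def governance_violations(descriptor: dict) -> list[str]:
    """Return human-readable violations of the global standards (empty if none)."""
    problems = [
        f"missing field: {name}"
        for name in sorted(REQUIRED_FIELDS - set(descriptor))
    ]
    # Interoperability: every column must use a type from the shared vocabulary.
    for column, type_name in descriptor.get("schema", {}).items():
        if type_name not in ALLOWED_TYPES:
            problems.append(
                f"column {column!r} uses non-standard type {type_name!r}"
            )
    return problems


compliant = {
    "owner": "finance-ops",
    "schema": {"invoice_id": "str", "amount": "decimal"},
    "access_policy": "finance-restricted",
}
non_compliant = {
    "owner": "marketing-analytics",
    "schema": {"visit_ts": "timestamp_ntz"},
}

print(governance_violations(compliant))
print(governance_violations(non_compliant))
```

Even in a sketch this small, someone has to own the list of standards and someone in each domain has to fix the violations, which is where the staffing and budget questions begin.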
On the surface this doesn’t seem like a bad idea, and depending on budget and organizational structure, it may well be perfect for you. That said, we’ve worked with companies of all sizes in many business domains, and it is a struggle to think of even one that has the internal behavioral characteristics, the discipline and consistency across lines of business, to expect a distributed approach like Data Mesh to work in practice. Still, some of these concepts can add value to your data approach - it isn’t all an epic fail.
Data Mesh absolutely can work, and the fundamental concepts are viable if you’ve got iron will, deep control across your groups, and the leadership to execute the vision. Oh, and if you’ve got the budget to give each of those groups for new hires or re-training, so that each has data engineers and DevOps resources to support the SMEs. No, these are not folks delegated from a central group in the Data Mesh proposition - they are resources dedicated to a specific domain area. We believe this sort of “resource-pod” approach makes a bit more sense for application developers than for data engineering, and for it to really work you’ll still need a centralized tooling group to ensure any kind of consistency. Some products, like Starburst, claim to help solve some of the tooling questions. Oh, and you’ll also need to keep funding the central group that provides support and tooling to the less-funded and likely less-trained lines of business building up their Data Products.
At its heart, and in the pejorative, we would characterize Data Mesh as “aspirational”. It has some good ideas, but it is dangerous, and it creates a world of fresh problems in an already difficult space.
Data is Meshy
Data is hard. Business domains represented as software+data are usually complex beneath the surface, for many reasons. Budgets are finite. SMEs know best. Regulation is painful. Access controls are difficult.
We want a Silver Bullet. A Golden Hammer.
“There was money to be made giving all these people what they thought they wanted.” - George Akerlof, Robert Shiller
One fundamental tenet of Data Mesh, the concept of the Data Product, is alluring in the same way microservices were. SME-tested, SME-developed, SME-managed, stakeholder approved! These are high-value aspects of the Data Mesh proposal, and they have real merit.
What this allure is truly saying is not “run off and allocate 2M to your Data Mesh implementation for next year”. Rather, it is saying “let’s think about data differently, let’s incentivize our SMEs and our Data Engineers to ‘sit closer together’ more often”. Let’s bring in expert resources that can help us align our vision with our reality. Let’s be open to input that isn’t purely technical in nature, but that instead speaks to our spend categories, our organizational structure, and the deftness of our leadership and its influence on how data and business feed each other.
Data Mesh is perhaps at the peak of its Hype Cycle here in Q1/2023. If you’ve got the time, the will, the budget, and the org structure to support it, it might be worth a shot. You might even find, as many have, that right out of the gate your team seems more productive with Data Products. It’s always easier to build a miniature than a real building. The best use of time and resources is more likely a solid analysis of why Data Mesh seems so appealing and what problems it purports to solve, followed by a look at simpler solutions to those problems.