Data Management

Embedded Analytics for Fun and Profit

Matt Brown
July 6, 2022

Embedded analytics refers to the practice of integrating a reporting or analytics package into an existing application or platform. In our client work, we see many different types of embedded analytics projects, ranging from corporations seeking to publish reports and dashboards to internal stakeholders, to commercial software companies seeking to provide better analytics to their customers in order to generate additional license revenue.

At Waterloo Data we have helped many commercial software companies embed analytics into their platform or SaaS offering.  This post is an overview of the most common high-level requirements that surface when embarking on that journey.

Reporting and analytics within a software application are often an afterthought. Software product teams can be so focused on the core features and functionality of a given release that they overlook opportunities for providing analytics back to their end customers. When reports and analytics are delivered, they are usually "tacked on" at the end of a development cycle rather than incorporated into the core architecture of the system.

This approach leads to scenarios where you have a handful of reports or dashboards, but they are all running off data from the core transactional tables of the application. That eventually creates serious performance and scalability issues, and it brings us to one of the most important best practices when embedding analytics into a software application: separate your transactional database schema from your analytics database schema.

Why should you separate your transactional schema from your analytics schema?  The biggest reason is performance. Transactional data models are optimized for workloads that write many small records to disk. Analytic data models are optimized for workloads that read and aggregate large volumes of data from disk. When users run analytic queries against a transactional data model, the result is poor query performance and database lockups.
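To make the contrast concrete, here is a minimal, hypothetical sketch (every table and column name is made up) that asks the same "revenue by region and month" question two ways: once as a multi-join aggregation over normalized transactional tables, the very tables the application is busy writing to, and once as a single scan over a denormalized table of the kind an analytics schema would hold.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Normalized transactional tables: optimized for many small, frequent writes.
    cur.executescript("""
    CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, amount REAL);

    -- Denormalized fact table of the kind an analytics schema would hold:
    -- optimized for scanning and aggregating large volumes of rows.
    CREATE TABLE fact_sales (order_date TEXT, region TEXT, amount REAL);
    """)

    # Against the transactional schema, the analytic question needs a three-way join...
    cur.execute("""
    SELECT c.region, substr(o.order_date, 1, 7) AS month, SUM(i.amount)
    FROM order_items i
    JOIN orders    o ON o.order_id    = i.order_id
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region, month;
    """)

    # ...while against the analytics schema it is a single scan-and-aggregate.
    cur.execute("""
    SELECT region, substr(order_date, 1, 7) AS month, SUM(amount)
    FROM fact_sales
    GROUP BY region, month;
    """)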

Another reason to create a separate analytics schema for your application is to support aggregations and roll-ups in your data, as well as the ability to track history over time. Consider the example of a sales rep and a sales manager. Let's say that a sales rep reported to one manager for the first half of the year and then reported to another manager for the second half of the year. When rolling up numbers from that sales rep to their manager, you need to support different calculations depending on when you are doing the analysis. This is tricky in a transactional schema because it does not contain the full history; it only knows the current relationship of the sales rep to the manager and does not understand how that has changed over time. This requirement, the ability to track slow, gradual changes over time, is very easy to satisfy in a data model designed for analytics, and it is another good reason to have two schemas for your application: one for day-to-day transactional needs and one for analytical purposes.
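In dimensional modeling this is commonly handled with what is known as a Type 2 slowly changing dimension: instead of overwriting the rep's manager, each change gets its own row with an effective date range. Here is a minimal, hypothetical sketch of the sales-rep example (all table and column names are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Type 2 slowly changing dimension: one row per version of the sales rep record,
    # with effective-date ranges instead of a single "current manager" column.
    cur.executescript("""
    CREATE TABLE dim_sales_rep (
        rep_key        INTEGER PRIMARY KEY,  -- surrogate key, one per version
        rep_id         TEXT,                 -- natural/business key
        manager        TEXT,
        effective_from TEXT,
        effective_to   TEXT                  -- '9999-12-31' marks the current row
    );
    CREATE TABLE fact_sales (rep_id TEXT, sale_date TEXT, amount REAL);
    """)

    cur.executemany(
        "INSERT INTO dim_sales_rep (rep_id, manager, effective_from, effective_to) "
        "VALUES (?, ?, ?, ?)",
        [("rep_42", "Alice", "2022-01-01", "2022-06-30"),   # first half of the year
         ("rep_42", "Bob",   "2022-07-01", "9999-12-31")],  # second half onward
    )
    cur.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [("rep_42", "2022-03-15", 1000.0), ("rep_42", "2022-09-20", 2500.0)],
    )

    # Roll sales up to the manager who was in place when each sale actually happened.
    cur.execute("""
    SELECT d.manager, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_sales_rep d
      ON d.rep_id = f.rep_id
     AND f.sale_date BETWEEN d.effective_from AND d.effective_to
    GROUP BY d.manager;
    """)
    print(cur.fetchall())  # e.g. [('Alice', 1000.0), ('Bob', 2500.0)]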

Once you have a schema up and running for your analytic needs, you will need some mechanism for keeping it up to date with the latest data from the transactional system and any other data sources you would like to incorporate. There are a number of approaches for supporting this requirement, but the typical practice is to incorporate some type of ETL layer into your solution. ETL stands for Extract, Transform, and Load. As the name implies, ETL refers to the process of pulling data out of a given source system, performing some transformations on that data, and loading it into a new data model. In some cases you can build your own stored procedures for handling ETL and do not need a purpose-built ETL tool. In other cases, using an ETL tool saves a ton of development time and helps create a more flexible solution. Over the past few years ETL tools have improved significantly, and there is a wealth of open-source and commercial software providers who can meet this need.
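As a rough illustration of the roll-your-own end of that spectrum, the sketch below shows all three ETL steps in a single hand-written pass. The connections, table names, columns, and the simple date-based incremental logic are all hypothetical (qmark-style placeholders as in sqlite3); a purpose-built ETL tool would add scheduling, error handling, and change data capture on top of this.

    from datetime import date

    def run_etl(source_conn, target_conn, since=None):
        """Minimal hand-rolled ETL pass: extract new orders from the transactional
        database, transform them, and load them into the analytics schema."""
        since = since or date.today().isoformat()

        # Extract: read only the new or changed rows from the source system.
        src = source_conn.cursor()
        src.execute(
            "SELECT order_id, customer_id, order_date, amount "
            "FROM orders WHERE order_date >= ?",
            (since,),
        )
        rows = src.fetchall()

        # Transform: apply whatever cleanup or derivation the analytics model needs
        # (here, just deriving the month each order belongs to).
        transformed = [
            (order_id, customer_id, order_date, order_date[:7], amount)
            for order_id, customer_id, order_date, amount in rows
        ]

        # Load: write into the analytics fact table in one batch.
        tgt = target_conn.cursor()
        tgt.executemany(
            "INSERT INTO fact_sales "
            "(order_id, customer_id, order_date, order_month, amount) "
            "VALUES (?, ?, ?, ?, ?)",
            transformed,
        )
        target_conn.commit()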

So you have a data model that is separate from your transactional schema and purpose-built for your analytic needs, and you have a data management solution in place that updates the analytical data mart with the relevant data you need for your analysis. This forms the backend of an embedded solution. Now you need a data access and/or visualization layer.

There are two components to a data access layer. The first is the data model you will query. This often takes the form of a star schema for ad-hoc/self-service queries, an OLAP cube for drilling from the highest aggregations down to transactional details, or perhaps a set of wide, flat tables for advanced analytic purposes. The second component is the tool you will use to query the data. This could consist of opening a terminal session and throwing hand-rolled SQL against a set of tables, but since we are talking about a customer-facing analytics solution, that just won't cut it.
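To give a sense of what a data access tool does on the user's behalf, here is a small, hypothetical sketch of generating a star-schema query from the attributes and measures a user picks. Every table and column name is illustrative, and real OEM tools layer security, caching, and a semantic model on top of this.

    # Hypothetical star schema: one central fact table plus the dimension tables it
    # joins out to. A data access tool generates queries like this for the user.
    DIMENSIONS = {
        "dim_date":     "date_key",
        "dim_product":  "product_key",
        "dim_customer": "customer_key",
    }

    def build_star_query(group_by, measures):
        """Tiny sketch of ad-hoc query generation: join the fact table to whichever
        dimensions the chosen attributes live in, then group by those attributes."""
        joins = "\n".join(
            f"JOIN {dim} ON {dim}.{key} = f.{key}"
            for dim, key in DIMENSIONS.items()
            if any(col.startswith(dim + ".") for col in group_by)
        )
        select = ", ".join(list(group_by) + list(measures))
        return (
            f"SELECT {select}\n"
            f"FROM fact_sales f\n"
            f"{joins}\n"
            f"GROUP BY {', '.join(group_by)};"
        )

    # e.g. revenue and order count by product category and customer region
    print(build_star_query(
        group_by=["dim_product.category", "dim_customer.region"],
        measures=["SUM(f.amount) AS revenue", "COUNT(*) AS order_count"],
    ))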

The good news is that there are a number of software vendors who can provide a robust data access tool as part of an OEM partnership, thus avoiding the engineering overhead of rolling your own solution.  Over the years, we have found that the following requirements are the most common when considering an OEM partner for providing data access:

Self-service/Ad-hoc Analysis – This requirement is always at the top of everyone’s list and is one of the most poorly understood. For some people it means that a user should be able to apply different filter sets to a given report, or select an existing report from a library and modify it by adding or removing columns. For others, it means that a user should have full access to the underlying data model and be able to create custom reports and analysis by selecting different data attributes and different visualization outputs. In working with our customers we typically find a happy medium between these two extremes. The reality is that, in spite of the vendor promises and industry hype, self-service and ad-hoc analytics are beyond the capabilities of 80% to 90% of the users you are attempting to serve. Before getting fired up about the “art of the possible” when it comes to self-service, it is better to prototype a solution using an evaluation copy from an OEM provider and get it in front of real users for feedback. When it comes to self-service, we typically find that the real winners are the internal Account Management and Customer Support teams. By providing self-service BI to these teams you are giving them an “easy button” for addressing customer requests while shielding customers from the complexity of producing custom analysis.

Multi-Tenant – Given that most software companies are deploying as a SaaS platform, supporting multi-tenant architectures is generally near the top of everyone’s list. However, like the requirement above, multi-tenancy means different things to different people. When thinking about multi-tenancy you have to consider not just the data access layer but the complete data architecture from source to target. Some organizations have to implement physically separate databases for each customer to meet regulatory requirements; others can allow customer data to coexist in the same physical database (see the sketch below). At the data access layer, basic multi-tenant requirements are usually fairly well supported out of the box by commercial BI packages. Where the major differences emerge is in how well a commercial package supports content migrations and overall change management. For example, how will you know if a schema change you are pushing out for client reporting breaks existing reports that clients have created? Does the vendor provide any tooling for managing content migrations? How easy is it to refresh a basic inventory of out-of-the-box reports and analytics across the whole client base?
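For the shared-database flavor of multi-tenancy, the common pattern is to stamp every row with a tenant identifier and have the data access layer enforce a tenant filter on every query it issues. A minimal, hypothetical sketch follows; commercial BI platforms typically implement the same idea as row-level security rules or per-tenant connection mappings.

    # Shared-database multi-tenancy sketch: every table carries a tenant_id, and the
    # data access layer appends a tenant filter to every query it generates, so one
    # customer's users can never see another customer's rows. (Physically separate
    # databases per tenant are the other common pattern.)
    def scope_to_tenant(sql, tenant_id):
        """Wrap an analytic query so it only runs against a single tenant's data."""
        inner = sql.rstrip().rstrip(";")
        scoped = f"SELECT * FROM ({inner}) AS q WHERE q.tenant_id = ?"
        return scoped, (tenant_id,)

    # Example: the same revenue report, scoped to the hypothetical tenant "acme-corp".
    query, params = scope_to_tenant(
        "SELECT tenant_id, region, SUM(amount) AS revenue "
        "FROM fact_sales GROUP BY tenant_id, region",
        tenant_id="acme-corp",
    )
    # cursor.execute(query, params)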

Web-Based – Managing fat clients or desktop applications introduces a level of complexity that most teams want to avoid. For many teams, having an OEM partner that supports a 100% web-based, server-side solution is a must-have. From a content delivery and overall user interaction perspective, this requirement is generally well supported by the major players.

Integration – When embedding analytics inside another application or platform, there has to be a well-understood mechanism for integrating the OEM software with the host application. This is pretty straightforward on the backend of an embedded solution, but it can become complex at the data access layer. User authentication, entitlement, and overall user experience are critical considerations. One of the first points to consider is whether you are seeking a true “white-label” experience, where the branding and identification of the OEM partner can be completely removed. Flesh this out early in discussions with potential OEM partners; we have seen deals where, once an OEM knows they are the preferred vendor, they use white labeling as leverage to negotiate a better deal. Once you have determined whether you need a true white-label solution, the next point to consider is whether you are seeking a tight integration or a loose integration. A tight integration is one where you have complete control over the layout and presentation of analytical content within the host application and are not limited in how you present and place reports, dashboards, or other visualizations. A loose integration is one where you are essentially passing an authenticated session from the host application to the analytics server and allowing the analytics server to control the UI framework and layout. If you do have valid self-service/ad-hoc requirements, make sure the analytics content creation experience aligns with your tight vs. loose decision. When considering tight vs. loose integration options with an OEM partner, dig into the low-level details of how they support passing data back and forth with the host application. In the past, vendors supported SOAP or REST web services and JavaScript APIs; more recently we have seen vendors providing GraphQL APIs and other capabilities.
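As an example of the loose-integration handoff, many vendors accept some form of short-lived signed token that identifies the user, their tenant, and the content they may see. The exact claims, secrets, and endpoints are vendor-specific, so everything in the sketch below (the shared secret, the /embed URL, the claim names) is hypothetical.

    import time

    import jwt  # PyJWT; the actual token format and claims are dictated by the vendor

    EMBED_SECRET = "shared-secret-configured-in-both-systems"    # hypothetical
    ANALYTICS_EMBED_URL = "https://analytics.example.com/embed"  # hypothetical endpoint

    def build_embed_url(user_id, tenant_id, dashboard_id):
        """Loose-integration sketch: the host application authenticates the user, then
        hands the analytics server a short-lived signed token identifying the user,
        their tenant, and the content they are entitled to see."""
        claims = {
            "sub": user_id,
            "tenant": tenant_id,
            "dashboard": dashboard_id,
            "exp": int(time.time()) + 300,  # short-lived: five minutes
        }
        token = jwt.encode(claims, EMBED_SECRET, algorithm="HS256")
        return f"{ANALYTICS_EMBED_URL}/{dashboard_id}?token={token}"

    # The host application renders this URL in an iframe for a loose integration, or
    # fetches content through the vendor's JavaScript/REST APIs for a tighter one.
    iframe_src = build_embed_url("user-123", "acme-corp", "sales-overview")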

Flexible OEM Model – One of the most critical factors when evaluating potential OEM partners is finding one who can construct a deal that aligns with your selling model. This is critical from a financial management standpoint, where you need a true understanding of your per-unit costs. If, for example, you sell your software based on the number of transactions, then you want to model your OEM partnership on transactions. If you price your software as a site-wide license with unlimited users, then you will want a partner who can support an “all you can eat” license scenario. Other characteristics of a good OEM partner include providing free development licenses and structuring payment terms so that they don’t get paid until you collect from your end customers. In many cases OEM partners will attempt to sell you an upfront quota of licenses so they can recognize revenue earlier in the deal. Do not go for this; there are plenty of OEM partners who are more than happy to construct a deal where they don’t get paid until you get paid.

These are just some of the factors to take into account when evaluating an embedded analytics solution in a commercial software setting. At Waterloo Data we have helped many commercial software teams evaluate their options when it comes to embedding analytics. Organizations enjoy working with us because we have our finger on the pulse of this space and can help companies quickly evaluate their options and deliver reference solutions.