Embedded analytics refers to the practice of bolting a reporting or analytics package onto an existing application or platform. In our client work, we see many different types of embedded analytics projects, ranging from corporations that want to publish reports and dashboards to internal stakeholders, to commercial software companies that want to provide better analytics to their customers in order to generate additional license revenue.
At Waterloo Data we have helped many commercial software companies embed analytics into their platform or SaaS offering. This post is an overview of the most common high-level requirements that surface when embarking on that journey.
Reporting and analytics within a software application are often an afterthought. Software product teams can be so focused on the core features and functionality of a given release that they overlook opportunities to provide analytics back to end customers. When reports and analytics are delivered, they are usually "tacked on" at the end of a development cycle rather than incorporated into the core architecture of the system.
This approach leads to scenarios where you have a handful of reports or dashboards, all running off the core transactional tables of the application. That eventually causes serious performance and scalability issues, and it brings us to one of the most important best practices when embedding analytics into a software application: separate your transactional database schema from your analytics database schema.
Why should you separate your transactional schema from your analytics schema? The biggest reason is performance. Transactional data models are optimized for writing many small records to disk; analytic data models are optimized for reading large volumes of data from disk. When users run analytic queries against a transactional data model, the result is poor query performance and database lockups.
Another reason to create a separate analytics schema for your application is to support aggregations and roll-ups in your data as well as the ability to track history over time. Consider the example of a sales rep and a sales manager. Say a sales rep reported to one manager for the first half of the year and to another manager for the second half. When rolling up numbers from that rep to their manager, you need to support different calculations depending on when you are doing the analysis. This is tricky in a transactional schema because it does not contain the full history: it only knows the current relationship of the rep to the manager, not how that relationship has changed over time. This requirement, known in data warehousing as a slowly changing dimension, is easy to satisfy in a data model designed for analytics, and it is another good reason to maintain two schemas for your application: one for day-to-day transactional needs and one for analytical purposes.
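The history-tracking pattern described above can be sketched in a few lines. Here is a minimal Python illustration of a Type 2 slowly changing dimension, where each row versions the rep-to-manager relationship with effective dates; the table and field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical Type 2 slowly changing dimension: each row captures the
# rep-to-manager relationship for one span of time, so history survives.
@dataclass
class RepDimensionRow:
    rep: str
    manager: str
    valid_from: date
    valid_to: date  # exclusive upper bound; far-future date means "current"

dim_sales_rep = [
    RepDimensionRow("Alice", "Bob",   date(2023, 1, 1), date(2023, 7, 1)),
    RepDimensionRow("Alice", "Carol", date(2023, 7, 1), date(9999, 1, 1)),
]

def manager_on(rep: str, as_of: date) -> str:
    """Return the manager a rep reported to on a given date."""
    for row in dim_sales_rep:
        if row.rep == rep and row.valid_from <= as_of < row.valid_to:
            return row.manager
    raise KeyError(f"no dimension row for {rep} on {as_of}")
```

With this shape, first-half sales roll up to the first manager and second-half sales to the second, simply by joining facts to the dimension row whose date range covers each transaction. A transactional schema, which overwrites the manager field in place, cannot answer that question.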
Once you have a schema up and running for your analytic needs, you will need some mechanism for keeping it up to date with the latest data from the transactional system and any other data sources you would like to incorporate. There are a number of approaches to this requirement, but the typical practice is to incorporate some type of ETL layer into your solution. ETL stands for Extract, Transform, and Load. As the name implies, ETL refers to the process of pulling data out of a given source system, performing some transformations on that data, and loading it into a new data model. In some cases you can build your own stored procedures for handling ETL and do not need a purpose-built ETL tool. In other cases, using an ETL tool saves a ton of development time and helps create a more flexible solution. Over the past few years ETL tools have improved significantly, and there is a wealth of open source and commercial providers who can meet this need.
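To make the three stages concrete, here is a minimal hand-rolled ETL sketch in Python. The source rows, field names, and revenue calculation are all illustrative; a real pipeline would read from the transactional database and handle incremental loads, errors, and scheduling:

```python
def extract(source_rows):
    # Extract: pull raw order rows out of the transactional system
    # (here, a plain list stands in for the source database).
    return list(source_rows)

def transform(rows):
    # Transform: derive the fields the analytics schema wants,
    # e.g. precomputing revenue per order.
    return [
        {"order_id": r["id"], "revenue": r["qty"] * r["unit_price"]}
        for r in rows
    ]

def load(rows, target):
    # Load: append the transformed rows into the analytics table
    # (a list stands in for the fact table).
    target.extend(rows)
    return target

transactional_orders = [
    {"id": 1, "qty": 2, "unit_price": 10.0},
    {"id": 2, "qty": 1, "unit_price": 99.0},
]

fact_orders = load(transform(extract(transactional_orders)), [])
```

Whether you hand-roll this logic or adopt an ETL tool, the shape is the same: the transformation step is where the transactional model gets reshaped into the analytics model.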
So you have a data model that is separate from your transactional schema and purpose built for your analytic needs, and you have a data management solution in place that updates the analytical data mart with the relevant data you need for your analysis. This forms the backend for an embedded solution. Now you need a data access and/or visualization layer.
There are two components to a data access layer. The first is the data model you will query. This often takes the form of a star schema for ad-hoc/self-service queries, an OLAP cube for drilling from the highest aggregations down to transactional details, or perhaps a set of wide, flat tables for advanced analytic purposes. The second component is the tool you will use to query the data. This could just consist of opening a terminal session and throwing hand-rolled SQL at a set of tables, but since we are talking about a customer-facing analytics solution, that just won't cut it.
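To illustrate the first component, here is a tiny star schema in an in-memory SQLite database: one fact table keyed to one dimension, plus the kind of join-and-aggregate query an analytics layer runs against it. Table and column names are hypothetical:

```python
import sqlite3

# A minimal star schema: a fact table of sales keyed to a product dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'widgets'), (2, 'gadgets');
    INSERT INTO fact_sales  VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# A typical analytic query: join the fact table to a dimension and aggregate.
rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(rows)  # [('gadgets', 7.5), ('widgets', 15.0)]
```

The second component, the query tool, is what sits on top of a model like this and is exactly where an OEM partner comes in.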
The good news is that there are a number of software vendors who can provide a robust data access tool as part of an OEM partnership, thus avoiding the engineering overhead of rolling your own solution. Over the years, we have found that the following requirements are the most common when considering an OEM partner for providing data access:
Self-service/Ad-hoc Analysis – This requirement is always at the top of everyone's list and is one of the most poorly understood. For some people it means a user should be able to apply different filter sets to a given report, or select an existing report from a library and modify it by adding or removing columns. For others, it means a user should have full access to the underlying data model and be able to create custom reports and analysis by selecting different data attributes and different visualization outputs. In working with our customers we typically find a happy medium between these two extremes. The reality is that, in spite of vendor promises and industry hype, self-service and ad-hoc analytics are beyond the capabilities of 80% to 90% of the users you are attempting to serve. Before getting fired up about the "art of the possible" when it comes to self-service, it is better to prototype a solution using an evaluation copy from an OEM provider and get it in front of real users for feedback. When it comes to self-service, we typically find that the real winners are the internal Account Management and Customer Support teams. By providing self-service BI to these teams you are giving them an "easy button" for addressing customer requests while shielding customers from the complexity of producing custom analysis.
Multi-Tenant – Given that most software companies deploy as a SaaS platform, supporting multi-tenant architectures is generally at the top of everyone's list. However, like the requirement above, multi-tenancy means different things to different people. When thinking about multi-tenancy as a requirement you have to consider not just the data access layer but the complete data architecture from source to target. Some organizations have to implement physically separate databases for each customer to meet regulatory requirements; others can allow customer data to coexist in the same physical database. When it comes to the data access layer, basic multi-tenant requirements are usually fairly well supported out of the box by commercial BI packages. Where the major differences emerge is in how well a commercial package supports content migrations and overall change management. For example, how will you know if a new schema change you are pushing out for client reporting breaks any existing reports that client has created? Does the vendor provide any tooling for managing content migrations? How easy is it to refresh a basic inventory of out-of-the-box reports and analytics across the whole client base?
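For the shared-database variant, the core idea is row-level isolation: every fact row carries a tenant identifier, and the data access layer injects the tenant filter on every query rather than trusting each report definition to do so. A minimal sketch, with illustrative names:

```python
# Shared-database multi-tenancy sketch: each row is tagged with the tenant
# that owns it, and the access layer filters on that tag for every query.
FACT_USAGE = [
    {"tenant_id": "acme",   "events": 120},
    {"tenant_id": "acme",   "events": 30},
    {"tenant_id": "globex", "events": 55},
]

def usage_for_tenant(tenant_id: str) -> int:
    """Total usage events visible to one tenant.

    The tenant filter lives here, in the shared access layer, so one
    customer can never see another customer's rows.
    """
    return sum(r["events"] for r in FACT_USAGE if r["tenant_id"] == tenant_id)
```

Organizations with stricter regulatory requirements skip this pattern entirely and provision a physically separate database per customer, trading operational overhead for hard isolation.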
Web Based – Managing fat clients or desktop applications requires a level of complexity that most teams want to avoid. For many teams, an OEM partner that supports a 100% web-based, server-side solution is a must-have. From a content delivery and overall user interaction perspective, this requirement is generally well supported by the major players.
Flexible OEM Model – One of the most critical components when evaluating potential OEM partners is finding a partner that can construct a deal aligned with your selling model. This is critical from a financial management standpoint, where you need a true understanding of your per-unit costs. If, for example, you sell your software based on the number of transactions, then you want to model your OEM partnership on transactions. If you price your software on a site-wide license with unlimited users, then you will want a partner who can support an "all you can eat" license scenario. Other characteristics of a good OEM partner include free development licenses and payment terms structured so that they don't get paid until you collect from your end customers. In many cases OEM partners will attempt to sell you an upfront quota of licenses to recognize revenue earlier in the deal. Do not go for this; there are many OEM partners who are more than happy to construct a deal where they don't get paid until you get paid.
These are just some of the considerations to weigh when evaluating an embedded analytics solution in a commercial software setting. At Waterloo Data we have helped many commercial software teams evaluate their options when it comes to embedding analytics. Organizations enjoy working with us because we have our finger squarely on the pulse of this space and can help companies quickly evaluate their options and deliver reference solutions.