Analytics for the Modern World - What are Complex Data Relationships?
Written by GoodData Author
As I’ve transitioned in my career from the world of relational database applications to analytics, one thing that has been a real thorn in my side is the inability of analytic tools to deal well with complex relationships.
By ‘complex relationships’ I don’t mean spending the holidays with your in-laws. I’m talking about the relationships between the different kinds of data used by software applications.
Compared to the client-server applications of 10-20 years ago, modern Web applications provide the user with far more sophistication and flexibility. Just think about your favorite social media app with its complex web of posts, hashtags, likes, shares and comments. Modern applications are better able to represent the multiplicity of the ‘real world,’ for example, employees that belong to multiple teams or project groups, or a sales opportunity that touches multiple regions or market segments.
All this richness brings complexity. There is much more data to deal with. We have loosely structured, deeply nested, and often variable kinds of data. Software is updated frequently, and it’s increasingly global, mobile, and available in the cloud.
Complexity is everywhere in modern software, and the truth is that most analytic systems have not done a good job of keeping up.
This three-part blog series will explain:
- In this part: How to recognize complex data relationships, especially a type of relationship called many-to-many, and why it is useful
- In part 2: How complex data can make analysis difficult for business users and analytics experts alike
- In part 3: How to embrace this ‘new normal’ of modern data complexity and make analytics even more relevant and accessible to everyone
Let’s begin this exploration of analytics and complex data by first understanding the difference between simple and complex data relationships.
What are data relationships?
In applications, data about a particular topic is often separated into different levels of detail. Here is a simple example:
Suppose we wish to analyze our sales performance. We need a table of invoices, which lists the customer, the date of the transaction, shipping details, and so on. We also need the details: a list of the products sold, the quantity of each, and the price. Here is a very simplified view of these two tables:
Note that in these tables every invoice line belongs to a single invoice, but every invoice can have many invoice lines. This is an example of a one-to-many relationship, the most common type of data relationship.
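To make the one-to-many idea concrete, here is a minimal sketch in Python. The table contents, the column names (`invoice_id`, `customer`, `product`, and so on) and the helper `total_per_invoice` are all made up for illustration; they are not the tables from the original post. The point is that each invoice line carries exactly one `invoice_id`, so rolling lines up to their invoice is unambiguous.

```python
from collections import defaultdict

# Hypothetical sample data, invented for this sketch.
invoices = [
    {"invoice_id": 1, "customer": "Acme", "date": "2016-03-01"},
    {"invoice_id": 2, "customer": "Globex", "date": "2016-03-02"},
]

# Each line points at exactly one invoice: the "many" side of one-to-many.
invoice_lines = [
    {"invoice_id": 1, "product": "Widget", "quantity": 2, "price": 10.0},
    {"invoice_id": 1, "product": "Gadget", "quantity": 1, "price": 25.0},
    {"invoice_id": 2, "product": "Widget", "quantity": 5, "price": 10.0},
]


def total_per_invoice(lines):
    """Sum quantity * price for the lines of each invoice."""
    totals = defaultdict(float)
    for line in lines:
        totals[line["invoice_id"]] += line["quantity"] * line["price"]
    return dict(totals)


totals = total_per_invoice(invoice_lines)
for invoice in invoices:
    print(invoice["customer"], totals.get(invoice["invoice_id"], 0.0))
```

Because every line belongs to a single invoice, each amount is counted exactly once no matter how the totals are grouped.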
So what exactly is a many-to-many data relationship?
A many-to-many relationship is similar to a one-to-many relationship in that tables are joined together, but the relationship is more complex. Here’s a common example:
Modern marketing automation systems typically let us assign one or more predefined ‘tags’ to each marketing campaign. This lets us organize campaigns in any way we want. A list of campaigns could look like this:
Every campaign can have multiple tags, and every defined tag can be attached to multiple campaigns. This is an example of a many-to-many relationship, a very useful structure that allows us to ask questions such as:
- What is the overall spend per channel for ‘Nurture’ campaigns?
- What has been the total cost of our ‘Spring16’ marketing program?
- For each program, how many Marketing Qualified Leads (MQL) did we generate, and what was the cost per MQL?
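As a rough sketch of how such questions can be answered, the Python below models campaigns and tags with a separate linking list, one common way to represent a many-to-many relationship. The campaign names, channels, costs, and the helper functions are hypothetical; only the tag names ‘Nurture’ and ‘Spring16’ come from the questions above.

```python
from collections import defaultdict

# Hypothetical campaigns, keyed by campaign id.
campaigns = {
    1: {"name": "Spring webinar", "channel": "Email", "cost": 1000.0},
    2: {"name": "Spring ads", "channel": "Paid search", "cost": 2500.0},
    3: {"name": "Welcome series", "channel": "Email", "cost": 400.0},
}

# Predefined tags, keyed by tag id.
tags = {10: "Spring16", 11: "Nurture"}

# Linking list of (campaign_id, tag_id) pairs. A campaign may appear with
# several tags, and a tag with several campaigns: that is the many-to-many part.
campaign_tags = [(1, 10), (1, 11), (2, 10), (3, 11)]


def campaigns_with_tag(tag_name):
    """Return the ids of all campaigns that carry the given tag."""
    tag_ids = {tag_id for tag_id, name in tags.items() if name == tag_name}
    return sorted({cid for cid, tid in campaign_tags if tid in tag_ids})


def total_cost_for_tag(tag_name):
    """Total cost of all campaigns carrying the given tag."""
    return sum(campaigns[cid]["cost"] for cid in campaigns_with_tag(tag_name))


def spend_per_channel_for_tag(tag_name):
    """Spend per channel, restricted to campaigns carrying the given tag."""
    spend = defaultdict(float)
    for cid in campaigns_with_tag(tag_name):
        campaign = campaigns[cid]
        spend[campaign["channel"]] += campaign["cost"]
    return dict(spend)


print(total_cost_for_tag("Spring16"))
print(spend_per_channel_for_tag("Nurture"))
```

Note that campaign 1 carries both tags, so its cost contributes to the ‘Spring16’ total and to the ‘Nurture’ total alike.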
These types of many-to-many (or M:N) relationships are typical of modern datasets for social media, marketing, product management, healthcare, sales and more. (For example, imagine the multiplicity of markets, opportunities, products, regions, and overlay teams in a typical sales organization.)
So, what’s the problem and why is that thorn still stuck in my side? Well, the dirty little secret of analytics tools is that most don’t handle many-to-many data relationships at all, and the ones that do don’t handle them all that well. Complex data is now ‘the new normal’ and the analytics model needs to keep up.
In the next part of this blog series on analytics and complex data, we will see the dangers lurking for business users, and explore the ways that analytics experts have attempted to deal with the problem so far.