Blog | tags:

FlexConnect: ML in BI Made Easy

5 min read | Published Dec 5, 2024

Štěpán is a developer advocate with a deep interest in ML and GPU computation. He started at GoodData in 2022 as a UX Designer, later switching to the role of Sr. Developer Advocate. He currently forms part of the AI team as a backend developer, shifting his interest towards AI and how it can help people utilize their data more efficiently.


Integrating machine learning (ML) models into business intelligence (BI) reports is often touted as a no-brainer, and rightly so: it has the potential to significantly enhance your ability to forecast and make truly data-driven decisions. To get the full benefit, it makes sense to connect the model in near real-time. And how about going a step further and making it react to your visualizations, for example by filtering to a specific city or branch? Well, that is where things get a little more complicated.

In the old paradigm, your BI and data analysts would have had to agree on what should be predicted, and then you’d get a tailored ML model specific to a single use case (models don’t usually support multiple time granularities, for example). The predictions would then be stored in a database and connected to your BI and, in the best-case scenario, kept as up to date as possible with frequent data refreshes.

However, what if you want predictions for a number of different cities and therefore end up with multiple columns for each of the ML model’s outputs?

Well, wonder no more because we have solved this very issue for you! With FlexConnect, you can easily connect to nearly any ML model (on-premise or even cloud!) and feed it any data — be it BI data you already possess or, as in this example, publicly available weather forecast data fed through APIs!

Read on to see an outline of how this would look in practice. And, since we’ve already written an article using the example of an ice cream shop and how it uses APIs, let’s continue on that track. Just don’t blame us if any sudden ice cream-related urges take hold.

This article is part of the FlexConnect Launch Series. To dive deeper into the overall concept, be sure to check out our architectural overview. Don’t miss our other articles on topics such as API ingestion, NoSQL integration, and Kafka connectivity.

The Ice Cream Shop

Our hypothetical ice cream shop has a problem — it either throws out too much ice cream at the end of the day or runs out before it closes. So, the owner would like to know how much to prepare based on the season, day of the week, and weather forecast.

Setting Up the Revenue Prediction Integration

We’ll use a pre-trained machine learning model (e.g., a regression model) that predicts revenue based on date-related features. This model is saved as revenue_model.pkl using pickle.

Starting with the FlexConnect Template

Begin by utilizing the FlexConnect Template Repository. This template offers a solid starting point for any FlexConnect use case, allowing you to focus on integrating your specific ML model and forget about all the fuss.

There are three key methods for FlexConnect:

__init__: Registers the class as a data source in the Logical Data Model.
on_load: Initializes static data and loads the ML model into memory, optimizing performance by avoiding repeated loading.
call: The main method for date extraction, feature preparation, and revenue prediction.
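
To make the division of labor between these methods concrete, here is a minimal plain-Python sketch of the same lifecycle. It deliberately does not import FlexConnect: the class, the ToyModel stand-in, and the simplified call signature are all hypothetical and only illustrate the load-once, predict-per-request pattern.

```python
class ToyModel:
    """Hypothetical stand-in for a pre-trained regressor."""

    def predict(self, rows: list[list[float]]) -> list[float]:
        # Pretend revenue grows with the first feature of each row.
        return [100.0 + 10.0 * row[0] for row in rows]


class RevenuePredictionSketch:
    """Mimics the lifecycle only; this is not the FlexConnect base class."""

    model = None  # class attribute, filled once by on_load

    @staticmethod
    def on_load() -> None:
        # Runs once at startup: expensive setup such as model loading goes here.
        RevenuePredictionSketch.model = ToyModel()

    def call(self, day_indexes: list[int]) -> dict[str, list]:
        # Runs per request: build features and reuse the already loaded model.
        rows = [[float(day)] for day in day_indexes]
        return {"Day": day_indexes, "PredictedRevenue": self.model.predict(rows)}


RevenuePredictionSketch.on_load()
result = RevenuePredictionSketch().call([0, 3])
print(result)
```

Because the model lives on the class rather than on an instance, every request handled after startup reuses the same loaded object.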

Okay, now let’s look at how you could easily implement Machine Learning for yourself!

Preparing Date Features for the Model

Machine learning models require numerical input features. We need to convert each date into features that the model can use, such as the day of the week, the day of the year, or the weather forecast.

The particular ML model I used had these features:

Day of Week: Captures weekly patterns in revenue.
Day of Year: Captures seasonal trends within the year.
Month: Helps identify monthly trends and effects (e.g., pre-holiday season).
Year: Accounts for yearly trends or growth (big-picture).
Precipitation: The chance of rain; people usually don’t eat that much ice cream when it rains.
Temperature: Temperature often influences the decision whether to have an ice cream.

Here’s the code that converts the dates to their respective features:

def prepare_features(self, date_list: list[datetime]) -> list[list[float]]:
    """Prepare features from dates for the model prediction."""
    # One row per date: day of week, day of year, month, year,
    # precipitation, and temperature (same order as the feature list above).
    features = []
    for date in date_list:
        day_of_week = date.weekday()  # Monday=0, Sunday=6
        day_of_year = date.timetuple().tm_yday
        month = date.month
        year = date.year
        # Weather forecast for this date
        precipitation, temperature = poll_weather_api(date)
        features.append(
            [day_of_week, day_of_year, month, year, precipitation, temperature]
        )
    return features
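
For readers who want to try the conversion in isolation, here is a standalone sketch that includes the two weather features. The poll_weather_api function below is a hypothetical stub with fixed return values, since the real weather lookup is not shown in this article.

```python
from datetime import datetime


def poll_weather_api(date: datetime) -> tuple[float, float]:
    """Hypothetical stub: returns (chance of rain, temperature in degrees C)."""
    # A real implementation would query a weather forecast API for this date.
    return 0.2, 25.0


def prepare_features(date_list: list[datetime]) -> list[list[float]]:
    """Turn each date into one numeric feature row."""
    features = []
    for date in date_list:
        precipitation, temperature = poll_weather_api(date)
        features.append(
            [
                date.weekday(),  # Monday=0, Sunday=6
                date.timetuple().tm_yday,  # day of year, 1-based
                date.month,
                date.year,
                precipitation,
                temperature,
            ]
        )
    return features


rows = prepare_features([datetime(2024, 12, 5)])
print(rows)
```

December 5, 2024 is a Thursday and the 340th day of a leap year, so the first two values of the row are 3 and 340.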

Loading the Pre-trained ML Model

When is the best time to load your pre-trained ML model? The answer depends on how large the model is and how fast your server is. A ~20MB model can easily be loaded during the call, but as a rule of thumb, I recommend using the on_load method to load the pre-trained revenue prediction model when the server starts. This is also where any other heavy initialization belongs, so that the server stays as responsive as possible.

Here we are using pickle, as it is very easy to work with, but any other serialization format works just as well.

@staticmethod
def on_load(ctx: ServerContext) -> None:
    # Load the ML model from a pickle file on load to reduce response time.
    with open('revenue_model.pkl', 'rb') as model_file:
        RevenuePredictionFunction.model = pickle.load(model_file)
    _LOGGER.info("Model loaded successfully")
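
If you do decide to load a small model during the call instead, a cached loader keeps the cost to a single disk read. The sketch below is one possible approach using only the standard library; the load_model helper, the counter, and the dict used as a stand-in "model" are assumptions made for illustration.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

load_calls = {"count": 0}  # only here to show how often the file is read


@lru_cache(maxsize=None)
def load_model(path: str) -> dict:
    """Read the pickled model from disk; repeated calls hit the cache."""
    load_calls["count"] += 1
    with open(path, "rb") as model_file:
        return pickle.load(model_file)


# Stand-in "model": a dict of coefficients instead of a fitted regressor.
model_path = Path(tempfile.mkdtemp()) / "revenue_model.pkl"
with open(model_path, "wb") as model_file:
    pickle.dump({"base": 100.0, "per_degree": 12.5}, model_file)

first = load_model(str(model_path))
second = load_model(str(model_path))  # served from the cache, no disk read
print(load_calls["count"])
```

The first request still pays the loading cost, which is why loading in on_load remains the safer default for larger models.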

Generating Revenue Predictions

With the dates extracted and features prepared, we can now use the model to make predictions. And since we extracted all the logic to functions, you can see the true power of FlexConnect and how simple it can be to set up!

def call(
    self,
    parameters: dict,
    columns: Optional[tuple[str, ...]],
    headers: dict[str, list[str]],
) -> ArrowData:
    execution_context = ExecutionContext.from_parameters(parameters)
    if execution_context is None:
        raise ValueError("Function did not receive execution context.")
    _LOGGER.info("execution_context", context=execution_context)

    date_list = self.handle_date(execution_context)
    input_features = self.prepare_features(date_list)
    predictions = self.model.predict(input_features)

    output = {
        "Date": [date.date() for date in date_list],
        "PredictedRevenue": predictions.tolist(),
    }

    return pyarrow.Table.from_pydict(output)
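
The handle_date helper called above is not shown in this article. As a rough idea of what such a helper might do, the sketch below assumes the date filter boils down to an inclusive from/to pair of ISO dates and expands it into one datetime per day. The function name and its inputs are hypothetical and not part of the FlexConnect API.

```python
from datetime import datetime, timedelta


def expand_date_range(start_iso: str, end_iso: str) -> list[datetime]:
    """Return one datetime per day from start to end, both inclusive."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    if end < start:
        raise ValueError("end date must not precede start date")
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


date_list = expand_date_range("2024-12-05", "2024-12-07")
print([d.date().isoformat() for d in date_list])
```

Each datetime in the resulting list then becomes one feature row and one predicted value in the output table.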

And here is how that will look in your BI:

Push Your ML in BI to the Next Level

With the function deployed, you can now use GoodData to visualize the predicted revenue. With all that we’ve done, your users can easily apply date filters and have responsive and dynamic dashboards.

However, you can also take your ML to a whole new level with a few changes to the presented code.

Multiple Models

For example, imagine that you have multiple ice cream shops. You can have a model-per-shop situation, or different models for different time granularities.
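
A model-per-shop setup could be as simple as a dictionary lookup. In this sketch, the shop names, the ConstantModel stand-in, and the predict_for_shop helper are all hypothetical; in practice each entry would be a real pre-trained model, most likely loaded in on_load.

```python
class ConstantModel:
    """Hypothetical stand-in for a fitted regressor with a predict method."""

    def __init__(self, revenue: float) -> None:
        self.revenue = revenue

    def predict(self, rows: list[list[float]]) -> list[float]:
        # Always predicts the same revenue, one value per input row.
        return [self.revenue for _ in rows]


# One model per shop, keyed by a shop identifier.
MODELS = {
    "prague": ConstantModel(1200.0),
    "brno": ConstantModel(800.0),
}


def predict_for_shop(shop: str, rows: list[list[float]]) -> list[float]:
    """Pick the model registered for the shop and run it on the rows."""
    try:
        model = MODELS[shop]
    except KeyError:
        raise ValueError(f"No model registered for shop {shop!r}") from None
    return model.predict(rows)


print(predict_for_shop("brno", [[0.0], [1.0]]))
```

The same lookup works for time granularities: key the dictionary by "day", "week", or "month" instead of by shop.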

Incorporate Cloud Computing

It is completely up to you where you compute all the hard work. You can even call a distinct ML platform, where you host your bigger models and use FlexConnect as the glue for your BI. Or you can use multiple different models via API — and FlexConnect can just orchestrate them and combine all the results.
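
As an illustration of the orchestration idea, the sketch below queries several backends in parallel and averages their predictions per row. The two backend functions are hypothetical stubs standing in for remote API calls, and averaging is just one assumed way of combining results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Backend = Callable[[list[list[float]]], list[float]]


def call_model_a(rows: list[list[float]]) -> list[float]:
    """Hypothetical stub for a remote model; a real one would call an API."""
    return [100.0 for _ in rows]


def call_model_b(rows: list[list[float]]) -> list[float]:
    """Second hypothetical stub with a different opinion."""
    return [120.0 for _ in rows]


def combined_prediction(rows: list[list[float]], backends: list[Backend]) -> list[float]:
    """Query all backends concurrently and average their outputs per row."""
    if not backends:
        raise ValueError("at least one backend is required")
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        results = list(pool.map(lambda backend: backend(rows), backends))
    # zip(*results) groups the predictions of all backends for the same row.
    return [sum(values) / len(values) for values in zip(*results)]


print(combined_prediction([[0.0], [1.0]], [call_model_a, call_model_b]))
```

Threads fit here because the real work would be waiting on network responses rather than computing locally.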

Enhancing the Model

You can also enhance your model by including, for example, more weather data (wind, etc.), holidays, special events, and any other data you might have on hand (or that is available through public or internal APIs).
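
Extra inputs like these end up as additional columns in each feature row. Here is a small sketch of how that might look, assuming a hand-maintained set of holiday dates and a wind speed value obtained elsewhere; both are hypothetical.

```python
from datetime import date

# Hypothetical holiday calendar; a real one might come from an API or a table.
HOLIDAYS = {date(2024, 12, 24), date(2024, 12, 25)}


def extra_features(day: date, wind_speed: float) -> list[float]:
    """Return the additional feature columns for one day."""
    is_holiday = 1.0 if day in HOLIDAYS else 0.0  # encode the flag as a number
    return [wind_speed, is_holiday]


print(extra_features(date(2024, 12, 24), 3.5))
print(extra_features(date(2024, 12, 5), 3.5))
```

Any feature added this way also has to be present when the model is trained, so the model needs retraining before it can use the wider rows.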

The Strength of ML and FlexConnect

As developers, you can enhance your BI reports by integrating pre-trained machine learning models using FlexConnect. More specifically, in this case, you can predict revenue based on dates, enabling more accurate forecasting and helping to anticipate trends, meaning less ice cream wasted.

Leveraging machine learning in your BI strategy through FlexConnect allows you to provide deeper insights and contribute directly to business success.

Learn More

FlexConnect is built for developers and companies looking to streamline the integration of diverse data sources into their BI workflows. It gives you the flexibility and control you need to get the job done with ease.

Explore detailed use cases like connecting APIs, running local machine learning models, handling semi-structured NoSQL data, streaming real-time data from Kafka, or integrating with Unity Catalog — each with its own step-by-step guide.

Want the bigger picture? Check out our architecture article on FlexConnect, or connect with us through our Slack community for support and discussion.
