Introducing FlexQuery
Save money, improve performance, and ensure data governance with GoodData’s in-memory, metadata-driven analytics layer.
FlexQuery is a versatile analytics engine built on Apache Arrow. It ingests data in batch or real time, federates across sources, computes in memory, and performs pre- and post-processing transformations for fast, cost-effective analytics.
- Includes caching engine and controls by data source, user, or workspace.
- Reduces burden and cost of source cloud data warehouses.
- Provides levers for performance vs. cost trade-offs to meet needs.
- Processes large datasets and runs complex algorithms with efficiency.
- Supports real-time analytics by balancing distributed and consolidated data.
- Built on Apache Arrow for speedy, analytical processing.
- Integrates a metrics layer within the analytics lake’s architecture.
- Enables the seamless federation of diverse data sources.
- Enhances data discoverability while maintaining data integrity.
FlexQuery powers the GoodData Analytics Lake
Read the whitepaper authored by renowned analyst Donald Farmer
FlexQuery features
The latest FlexQuery enhancements enable near-real-time analytics, with additional features coming soon.
FlexQuery, built on open standards
Apache Arrow is a cutting-edge, open-source project for in-memory data processing that supports diverse data formats.
- Columnar storage format, enabling the efficient handling and processing of large volumes of data.
- Optimized for live, high-speed analytics and machine learning rather than just data storage.
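To make the columnar idea concrete, here is a minimal sketch in plain Python. It does not use Apache Arrow itself and is not how FlexQuery is implemented; the sample records and the function names are hypothetical. It only illustrates why an aggregate over one field is cheaper when each field is stored as its own contiguous sequence.

```python
from array import array

# Row-oriented layout: every record keeps all of its fields together.
rows = [
    {"region": "EU", "revenue": 120.0},
    {"region": "US", "revenue": 340.5},
    {"region": "EU", "revenue": 80.25},
]

# Column-oriented layout: each field is stored as one contiguous sequence.
# (Hypothetical illustration; Arrow defines a far richer columnar format.)
columns = {
    "region": [r["region"] for r in rows],
    "revenue": array("d", (r["revenue"] for r in rows)),
}


def total_revenue_rows(records):
    # Visits every record, including fields the aggregate does not need.
    return sum(r["revenue"] for r in records)


def total_revenue_columns(cols):
    # Scans a single typed buffer of doubles and nothing else.
    return sum(cols["revenue"])


total_from_rows = total_revenue_rows(rows)
total_from_columns = total_revenue_columns(columns)
```

Both functions return the same total. The columnar version reads only the one column the query needs, which is the property analytical engines rely on at scale.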
Common questions
How would you define FlexQuery in layman's terms?
FlexQuery is GoodData’s in-memory, metadata-driven analytics service layer that improves performance and scalability while controlling cloud data warehouse costs. It is built on open standards and technologies such as Apache Arrow, Iceberg, and DuckDB. Using FlexQuery, analytics and business intelligence engineers federate, transform, and enrich data for dashboards, custom applications, and AI/ML use cases.
How would you explain FlexQuery to business people?
FlexQuery is the part of GoodData that makes reports really fast and makes your data developers more productive while saving money.
Is FlexQuery part of GoodData?
Yes, FlexQuery is the underlying caching and analytics layer that powers every GoodData environment. FlexQuery is available free to all GoodData customers.
What are the benefits of leveraging open-source tooling?
At GoodData, we are committed to using open-source standards and technologies as core components of our analytics platform, and FlexQuery continues that commitment. We do this for several reasons:
- Open-source technologies are widely adopted and understood by our customers and the broader data community.
- Skills for open-source technologies are easier to acquire than skills for proprietary commercial BI technologies.
- Customers get increased portability and less vendor lock-in with open-source technologies.
- GoodData’s development teams can do more, faster, by embracing open source, so our customers receive more innovation.
What cost and performance improvements can I expect with FlexQuery?
Business intelligence platforms are a prime driver of cloud data warehouse costs. By using cutting-edge caching and aggregate-awareness technology, FlexQuery dramatically reduces the number and complexity of queries processed by your data warehouse. With FlexQuery, we see a 55% average reduction in BI’s contribution to data warehouse spend.
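As a rough illustration of how result caching cuts the number of queries that reach a warehouse, consider the sketch below. It is an assumption-laden toy, not FlexQuery's caching engine: the `QueryCache` class, the `fake_warehouse` function, the time-to-live value, and the workspace names are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Illustrative result cache keyed by (workspace, query text)."""

    def __init__(
        self,
        run_query: Callable[[str], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}
        self.warehouse_calls = 0  # how many queries actually hit the warehouse

    def get(self, workspace: str, sql: str) -> Any:
        key = (workspace, sql)
        now = self._clock()
        entry = self._entries.get(key)
        # Serve from the cache while the stored entry is still fresh.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise pay for one warehouse query and remember the result.
        result = self._run_query(sql)
        self.warehouse_calls += 1
        self._entries[key] = (now, result)
        return result


def fake_warehouse(sql: str) -> list:
    # Stand-in for a real (billable) warehouse query.
    return [("EU", 200.25), ("US", 340.5)]


QUERY = "SELECT region, SUM(revenue) FROM orders GROUP BY region"
cache = QueryCache(fake_warehouse, ttl_seconds=60.0)

# Three dashboard refreshes in the same workspace trigger one warehouse query.
for _ in range(3):
    report = cache.get("sales", QUERY)
calls_after_repeats = cache.warehouse_calls

# A different workspace is cached separately in this sketch.
cache.get("finance", QUERY)
calls_after_new_workspace = cache.warehouse_calls
```

In this toy setup, repeated dashboard loads are answered from memory and only the first request per workspace is billed by the warehouse. Keying the cache by workspace is one simple way to scope cache behavior; a real system would also need invalidation when source data changes.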
You use the term 'analytics lake'. How is that different from a data lake or data warehouse?
‘Analytics lake’ is the term we use to describe the combination of analytics storage, processing, semantics, and visualization offered by the GoodData platform. Unlike a data warehouse or data lake, which are structured for data retention and system-of-record reporting, the analytics lake exists to provide a single point of access to all relevant analytics objects (data, transformations, semantic information, AI/ML processing, and visualizations/reports) through common interfaces such as REST APIs, Python libraries, and SQL endpoints.
Simply put, a data warehouse or data lake stores your data in structured or unstructured formats independent of any downstream uses and serves as a source for the analytics lake, which stores and provides the processes or objects necessary to turn that data into user-facing analytics.