Meet the Analytics Lake
Data, metadata, high-performance caching, custom services, and analytics, all in one platform!
What is an Analytics Lake?
The Analytics Lake forms a 'composable data service architecture' layer that combines the best in open-source tech with our semantic and analytics layers to give both human and machine consumers a single place to find all of their analytics assets.
The unique purpose of an Analytics Lake
The Analytics Lake is designed to integrate with raw data from a data lake and structured data from a data warehouse to provide advanced analytics, comprehensive business intelligence, and accurate predictive insights, driving better decision-making and outcomes.
Solution | Data Lake | Data Warehouse | Analytics Lake |
---|---|---|---|
Primary Focus | Scalable raw data storage | Business reporting and decision-making | Analytics and machine learning |
Storage Format & Optimization | Optimized for low-cost storage of large raw, unstructured data | Structured data optimized for query performance | Optimized compute for ML workflows |
Metadata & Semantic Layers | Limited metadata and semantics | Typically has integrated business metadata | Integrated metadata and headless semantic layer |
BI & Visualization Capabilities | Minimal BI support | Full support for BI workflows and visualizations | Integrated BI and visualizations environment |
Advanced Analytics Support | Strong support for advanced analytics | Basic predictive modeling capabilities | Optimized for advanced analytics and ML |
Data Science & ML Support | Supports data science through big data tools | Limited support for data science workflows | Tailored for data science and ML with compute optimization and APIs |
Compliance & Governance | Limited governance capabilities | Strong auditing, security, and governance capabilities | Robust governance through metadata integration |
APIs & Programmatic Access | Access via big data APIs and notebooks | Traditionally accessed through SQL | APIs designed specifically for analytics engineers |
Cost Efficiency | Very low-cost storage but unpredictable analytics costs | Predictable but relatively high storage costs | Optimized processing reduces overall costs |
Flexibility & Future Proofing | Flexible but typically requires migration for BI use | Schema-on-write limits flexibility | Designed to interoperate with multiple data platforms |
Key Strengths | Scalability, cost efficiency for storage | Performance, consistency, reliability | Purpose-built for advanced analytics and ML |
Key Limitations | Challenging to apply governance, reuse data for BI and reporting | Limited flexibility and ability to handle messy, large or streaming data | Emerging architecture |
The GoodData Analytics Lake is powered by FlexQuery
Read the whitepaper authored by renowned analyst Donald Farmer
Discover how the GoodData Analytics Lake will help boost your data product
Analytics Lake, built on open standards
Built on open-source technology, the GoodData Analytics Lake integrates seamlessly with Apache Arrow.
- Ensures accessibility for developers and users.
- Accelerates innovation and reduces vendor lock-in.
- Provides a widely supported in-memory analytics platform that interoperates with various tools.
Leveraging the data service layer
The concept of a data service layer involves creating an intermediary layer that connects various data sources and tools, providing a unified interface for accessing and managing data.
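As a rough illustration of the idea above, the sketch below puts one interface in front of two different sources. It is not GoodData's implementation: `DataSource`, `SqliteSource`, `CsvSource`, `DataServiceLayer`, and `build_demo_layer` are hypothetical names, and an in-memory SQLite table and a CSV string stand in for a warehouse and lake files.

```python
import csv
import io
import sqlite3
from typing import Protocol


class DataSource(Protocol):
    """Anything that can return rows for a named dataset."""

    def fetch(self, dataset: str) -> list[dict]: ...


class SqliteSource:
    """Hypothetical stand-in for a warehouse: rows live in SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row

    def fetch(self, dataset: str) -> list[dict]:
        # Dataset names come from the layer's own registry, not user input.
        cursor = self.conn.execute(f'SELECT * FROM "{dataset}"')
        return [dict(row) for row in cursor]


class CsvSource:
    """Hypothetical stand-in for lake files: rows live in CSV text."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def fetch(self, dataset: str) -> list[dict]:
        return list(csv.DictReader(io.StringIO(self.files[dataset])))


class DataServiceLayer:
    """Routes each dataset request to the source that owns it."""

    def __init__(self) -> None:
        self._owners: dict[str, DataSource] = {}

    def register(self, dataset: str, source: DataSource) -> None:
        self._owners[dataset] = source

    def fetch(self, dataset: str) -> list[dict]:
        # Callers never need to know where the data physically sits.
        return self._owners[dataset].fetch(dataset)


def build_demo_layer() -> DataServiceLayer:
    """Wire up two toy sources behind one interface."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

    layer = DataServiceLayer()
    layer.register("orders", SqliteSource(conn))
    layer.register("events", CsvSource({"events": "id,kind\n1,click\n2,view\n"}))
    return layer
```

Calling `build_demo_layer().fetch("orders")` and `.fetch("events")` goes through the same method even though the rows come from different backends, which is the point of the intermediary layer: each source is read in place rather than copied into a single store.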
No more moving data
Reduce data duplication and latency, and ensure real-time access to the most current data where it sits.
Tools to connect and orchestrate
Connect and orchestrate the tool that fits each job: S3 for business intelligence, TimescaleDB for real-time analytics, Snowflake for data warehousing, and Databricks for data science.
Efficient Data Management
Ensure that each tool and storage solution delivers the most value without unnecessary data movement or complexity.
Dive deeper into the details
Learn more about the Analytics Lake vision