Modern Data Warehouse

In Cloud Technology, Data Architecture, Data Warehousing by PeterLeave a Comment

What is a data warehouse?

A data warehouse is a system that aggregates data from different sources into a single, central data store to support analytics, data mining, machine learning and AI. Also known as an enterprise data warehouse (EDW), its functionality allows businesses to run sophisticated analytics on petabytes of data that could not be handled by a traditional relational database

Conceptual DW/BI vs Azure Modern Data Warehouse comparison

Approaches to Data Warehousing Architecture

  • Kimball’s methodology – Star Schema
  • Bill Inmon’s Corporate Information Factory
  • Data Vault modelling
  • Hybrid

Azure Synapse Analytics – Modern Data Warehouse and Big Data Analytics

Structured and Unstructured Data

Designed to work with Structured and Unstructured Data

All-in-one platform for analytical projects: Data Lake, Data Warehouse, Data Analytics, Data Integration

  • Ingest
  • Prepare
  • Manage
  • Serve

Deeply integrated with other services:

  • Azure Purview
  • Azure Machine Learning
  • Azure Cosmos DB (Link Service)
    • No ETL jobs to manage
    • Near real-time insight
    • No impact on operational workloads
    • Optimized for large-scale analytics workloads
    • Integration with Azure Synapse Analytics
  • Power BI

Main characteristics:

  • Dedicated SQL pool (data warehouse)
  • Serverless SQL pool (data exploration)
  • Serverless Apache Spark pool (big data)
  • Intelligent workload management
  • Materialized views and result-set cache
  • Power BI performance accelerator for Azure Synapse Analytics
  • Power BI integration
  • Streaming analytics (data warehouse)
  • Azure Machine Learning integration
  • Azure Synapse Link
  • Column and row-level security
  • Column-level encryption
  • MPP Massive Parallel Processing
  • Azure Data Lake Storage
  • Column and row level security
  • Limitless Scale
  • Supports Stored Procedures
  • Auto-scale
  • Dedicated SQL pool – formerly known as Azure SQL Data Warehouse (Data Warehouse)
    • The Dedicated SQL pool stores the data in relational tables with columnar storage, reducing data storage costs, while also improving query performance
  • Serverless SQL pool – query both structured and unstructured data (Spark Pool)
    • Don’t have their own storage
    • Don’t pay for compute resources, only pay for data processed
  • Spark integration
  • Synapse Pipelines – hybrid data integration (Azure Data Factory)
  • Studio
  • File support: Parquet, CSV, Json, (Spark supports many more formats)

Developer Tools:

  • Azure Synapse Analytics
  • Visual Studio
  • Azure Data Studio
  • SQL Server Management Studio
  • Visual Studio Code

Security

Managed Private Endpoints

Storage and Compute decoupled means:

  • Storage only charge
  • Compute charge and can be paused

Polybase capability to query data in a Data Lake

  • Polybase is a technology that accesses external data stored in Azure Blob storage, Hadoop, or Azure Data Lake store using the Transact-SQL language

Leave a Comment