Overview
Data modeling and data warehousing are two essential concepts in the field of data engineering and analytics. A data model is a way of organizing and representing data in a logical and consistent manner. A data warehouse is a centralized repository of integrated and historical data from various sources.
Objectives:
- Explain how organizations use data warehousing and data modeling to perform analytics
- Define data modeling and explain its importance in enabling data analytics at scale
- Identify common analytical data modeling terms and techniques
- Identify major data warehousing frameworks, their strengths and weaknesses
- Define ETL and Reverse ETL and their roles in data warehousing operations
Audience
This course is designed for beginners who want to get started with data engineering and analytics. Students will learn the basic concepts, principles, and techniques of data modeling and data warehousing as well as how to apply them in real-world scenarios.
Length
1 day
Outline
Operational and Analytical Processes
- Comparison and Contrast
- OLTP and OLAP systems
- Typical use cases
Data Warehousing Fundamentals
- Common terminology
- ACID
- Structured, Semi-Structured and Unstructured Data
- Schema-on-read vs Schema-on-write
Analytics Platform Types
- Relational data warehouses
- Advantages and disadvantages
- Common types and use cases
- Data Lake
- Advantages and disadvantages
- Common use cases
- Data Lakehouses
- Advantages and disadvantages
- Common use cases
Data Ingestion and Extraction
- ETL vs ELT
- Reverse ETL / ELT
- Use cases
- Data quality measurement
Dimensional Modeling
- Data Modeling Fundamentals
- Entities, Attributes, Relationships, Cardinality
- Dimensional Modeling Fundamentals
- Facts, Dimensions, Grain (Granularity), Slowly Changing Dimensions
- Dimensional Schemas
- Star Schema
- Snowflake Schema
- One Big Table (OBT)
- Data Vault
- Third Normal Form (Relational schema)
Activities and Exercises:
-
- Develop a star schema model from a relational table structure
- Explore and query a data lakehouse with structured and semi-structured data