Microsoft will retire DP-203: Data Engineering on Microsoft Azure on March 31, 2025. It will be replaced by DP-700: Microsoft Fabric Data Engineer.
DP-203 Course: Data Engineering on Microsoft Azure
In this course, students will learn about data engineering as it relates to working with batch and real-time analytics solutions using Azure data platform technologies. Students will begin by learning the core processing and storage technologies used to build an analytics solution, and how to interactively explore data stored in files in a data lake. They will learn about the various ingestion techniques that can be used to load data, whether using the Apache Spark capability included in Azure Synapse Analytics or Azure Databricks, or using Azure Data Factory or Azure Synapse pipelines. Students will also learn the different ways they can transform data using the same technologies used to ingest it. They will understand the importance of implementing security to ensure that data, at rest or in transit, is protected. Finally, they will learn how to use stream processing technologies to build real-time analytics solutions.

The course includes a certification exam and, as a bonus, the chance to receive a virtual gift! *Promotion valid until August 31 for customers in Spain only. Does not apply to self-paced learning.
Audience
The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytics solutions using the data platform technologies available in Microsoft Azure. The secondary audience for this course is data analysts and data scientists working with analytics solutions based on Microsoft Azure.
Elements of the DP-203 course
- Introduction to Data Engineering in Azure (3 units)
- Building Data Analytics Solutions with Azure Synapse Serverless SQL Pools (4 units)
- Performing Data Engineering Tasks with Apache Spark Pools in Azure Synapse (3 units)
- Data Transfer and Transformation with Azure Synapse Analytics Pipelines (2 units)
- Implementing a Data Analytics Solution with Azure Synapse Analytics (6 units)
- Working with Data Warehouses with Azure Synapse Analytics (4 units)
- Using Hybrid Transactional and Analytical Processing Solutions with Azure Synapse Analytics (3 units)
- Implementing a Data Streaming Solution with Azure Stream Analytics (3 units)
- Implementing a Data Lakehouse Analytics Solution with Azure Databricks (6 units)
DP-203 Course Content
Module 1: Exploring Computing and Storage Options for Data Engineering Workloads
This module provides an overview of the Azure compute and storage technology options available to data engineers building analytical workloads. It teaches you how to structure the data lake and optimize files for exploratory, streaming, and batch workloads. Students will learn how to organize the data lake into tiers of data refinement as they transform files through batch and stream processing. They will then learn how to create indexes on their datasets, such as CSV, JSON, and Parquet files, and use them to potentially accelerate queries and workloads.
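To give a feel for this kind of work, here is a minimal PySpark sketch of promoting data from a raw tier to a curated tier of a data lake; the storage account, container, and column names are illustrative, not taken from the course materials.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical ADLS Gen2 paths for the raw and curated tiers of the lake.
    raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/sales/*.csv"
    curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/sales"

    # Read the raw CSV files, type the date column, and write the result to
    # the curated tier as partitioned Parquet for faster analytical queries.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(raw_path))

    (df.withColumn("order_date", to_date(col("order_date")))
       .write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet(curated_path))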
Lessons
- Introduction to Azure Synapse Analytics
- Azure Databricks Overview
- Introduction to Azure Data Lake Storage
- Description of Delta Lake architecture
- Working with data streams using Azure Stream Analytics
Lab: Exploring Compute and Storage Options for Data Engineering Workloads
- Combine batch and stream processing in a single pipeline
- Organize the data lake into file transformation tiers
- Index data lake storage for query and workload acceleration
After completing this module, students will be able to do the following:
- Describe Azure Synapse Analytics
- Describe Azure Databricks
- Describe Azure Data Lake Storage
- Describe the Delta Lake architecture
- Describe Azure Stream Analytics
Module 2: Running interactive queries with Azure Synapse Analytics serverless SQL pools
In this module, students will learn how to work with files stored in the data lake and external file sources using T-SQL statements executed by a serverless SQL pool in Azure Synapse Analytics. They will query Parquet files stored in a data lake, as well as CSV files stored in an external data store. They will then create Azure Active Directory security groups and secure access to files in the data lake through role-based access control (RBAC) and access control lists (ACLs).
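As a rough illustration of the OPENROWSET pattern the module uses, the following Python sketch submits a T-SQL query to a serverless SQL pool; the workspace endpoint, storage path, and driver choice are assumptions, not part of the course.

    import pyodbc

    # Hypothetical serverless SQL endpoint of a Synapse workspace.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"
        "DATABASE=master;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # OPENROWSET lets the serverless pool query Parquet files in the data
    # lake in place, without loading them into a table first.
    query = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales
    """

    for row in conn.execute(query):
        print(row)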
Lessons
- Exploring the capabilities of Azure Synapse serverless SQL pools
- Querying data in the lake using Azure Synapse serverless SQL pools
- Creating metadata objects in Azure Synapse serverless SQL pools
- Securing data and managing users in Azure Synapse serverless SQL pools
Lab: Running Interactive Queries with Serverless SQL Pools
- Query Parquet data with serverless SQL pools
- Create external tables for Parquet and CSV files
- Create views with serverless SQL pools
- Secure access to data in a data lake when using serverless SQL pools
- Configure data lake security through role-based access control (RBAC) and access control lists (ACLs)
After completing this module, students will be able to do the following:
- Describe the capabilities of Azure Synapse serverless SQL pools
- Query data in the lake using Azure Synapse serverless SQL pools
- Create metadata objects in Azure Synapse serverless SQL pools
- Secure data and manage users in Azure Synapse serverless SQL pools
Module 3: Data Exploration and Transformation in Azure Databricks
This module teaches you how to use several Apache Spark DataFrame methods to explore and transform data in Azure Databricks. Students will learn how to use standard DataFrame methods to explore and transform data. They will also learn how to perform more advanced tasks, such as removing duplicate data, manipulating date and time values, renaming columns, and aggregating data.
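A minimal PySpark sketch of the kinds of DataFrame operations covered here; the file path and column names are invented for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date, avg

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical input: a CSV of web clickstream events.
    df = spark.read.option("header", "true").csv("/mnt/data/events.csv")

    cleaned = (df.dropDuplicates()                                     # remove duplicate rows
                 .withColumn("event_date", to_date(col("timestamp")))  # manipulate date values
                 .withColumnRenamed("usr", "user_id")                  # rename a column
                 .drop("debug_info"))                                  # drop an unneeded column

    cleaned.cache()  # cache for faster subsequent queries

    # Aggregate: average session length per day.
    (cleaned.groupBy("event_date")
            .agg(avg(col("session_seconds").cast("double")).alias("avg_session"))
            .show())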
Lessons
- Azure Databricks Overview
- Reading and writing data in Azure Databricks
- Working with DataFrames in Azure Databricks
- Working with advanced DataFrame methods in Azure Databricks
Lab: Performing data exploration and transformation in Azure Databricks
- Use DataFrames in Azure Databricks to explore and filter data
- Cache a DataFrame for faster subsequent queries
- Remove duplicate data
- Manipulate date and time values
- Remove and rename DataFrame columns
- Aggregate data stored in a DataFrame
After completing this module, students will be able to do the following:
- Describe Azure Databricks
- Read and write data in Azure Databricks
- Work with DataFrames in Azure Databricks
- Use advanced DataFrame methods in Azure Databricks
Module 4: Exploring, Transforming, and Loading Data into Data Warehouses with Apache Spark
This module teaches you how to explore data stored in a data lake, transform that data, and load it into a relational data warehouse. Students will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. They will then use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data from the dedicated SQL pool.
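As a sketch of the flatten-and-load pattern, assuming it runs in an Azure Synapse Spark notebook (where spark is predefined); the paths, columns, and pool names are invented.

    from pyspark.sql.functions import col, explode

    # Hypothetical hierarchical JSON: each order holds a nested array of items.
    orders = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/orders/*.json")

    # Flatten the nested item array into one row per order line.
    lines = (orders
             .select(col("orderId"), explode(col("items")).alias("item"))
             .select("orderId", "item.sku", "item.quantity", "item.price"))

    # In a Synapse Spark pool, the dedicated SQL pool connector exposes a
    # synapsesql method for writing a DataFrame to a warehouse table.
    lines.write.mode("overwrite").synapsesql("sqlpool01.dbo.OrderLines")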
Lessons
- Defining big data engineering with Apache Spark in Azure Synapse Analytics
- Ingesting data with Apache Spark notebooks in Azure Synapse Analytics
- Transforming data with DataFrames in Apache Spark pools in Azure Synapse Analytics
- Integrating SQL and Apache Spark pools in Azure Synapse Analytics
Lab: Exploring, Transforming, and Loading Data into Data Warehouses with Apache Spark
- Perform data exploration in Synapse Studio
- Ingest data with Spark notebooks in Azure Synapse Analytics
- Transform data with DataFrames in Spark pools in Azure Synapse Analytics
- Integrate SQL and Spark pools in Azure Synapse Analytics
After completing this module, students will be able to do the following:
- Describe big data engineering with Apache Spark in Azure Synapse Analytics
- Ingest data with Apache Spark notebooks in Azure Synapse Analytics
- Transform data with DataFrames in Apache Spark pools in Azure Synapse Analytics
- Integrate SQL and Apache Spark pools in Azure Synapse Analytics
Module 5: Ingesting and Loading Data into Data Warehouses
In this module, students learn how to ingest data into a data warehouse using T-SQL scripts and Synapse Analytics integration pipelines. Students will learn how to load data into dedicated Synapse SQL pools using PolyBase and COPY using T-SQL. They will also learn how to use workload management along with a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.
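To illustrate the COPY approach, here is a small Python sketch that runs the T-SQL COPY statement against a dedicated SQL pool; the server, database, table, and storage path are placeholders.

    import pyodbc

    # Hypothetical dedicated SQL pool endpoint.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myworkspace.sql.azuresynapse.net;"
        "DATABASE=sqlpool01;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # COPY bulk-loads files from the data lake into a warehouse table.
    conn.execute("""
    COPY INTO dbo.StageSales
    FROM 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet'
    WITH (FILE_TYPE = 'PARQUET')
    """)
    conn.commit()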
Lessons
Lab: Ingesting and Loading Data into Data Warehouses
- Perform petabyte-scale ingestion with Azure Synapse pipelines
- Import data with PolyBase and COPY using T-SQL
- Use best practices for loading data into Azure Synapse Analytics
After completing this module, students will be able to do the following:
- Load data into dedicated Synapse SQL pools with PolyBase and COPY using T-SQL
- Use workload management and a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion
Module 6: Data Transformation with Azure Data Factory or Azure Synapse Pipelines
This module teaches students how to create data integration pipelines to ingest data from multiple data sources, transform data using mapping dataflows, and perform data movement to one or more data sinks.
Lessons
Lab: Transforming data with Azure Data Factory or Azure Synapse pipelines
After completing this module, students will be able to do the following:
- Build data integration pipelines that ingest data from multiple data sources
- Transform data at scale with code-free mapping data flows
- Move data to one or more data sinks
Module 7: Orchestrating Data Movement and Transformation in Azure Synapse Pipelines
In this module, students will learn how to create linked services and orchestrate data movement and transformation using notebooks in Azure Synapse pipelines.
Lessons
Lab: Orchestrating data movement and transformation in Azure Synapse pipelines
After completing this module, students will be able to do the following:
- Create linked services in Azure Synapse Analytics
- Orchestrate data movement and transformation with notebooks in Azure Synapse pipelines
Module 8: Comprehensive Security with Azure Synapse Analytics
In this module, students will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. They will explore the SQL Active Directory admin, manage IP firewall rules, manage secrets with Azure Key Vault, and access those secrets through a Key Vault linked service and pipeline activities. They will also learn how to implement column-level security, row-level security, and dynamic data masking using dedicated SQL pools.
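As a small taste of the data-protection features in dedicated SQL pools, the sketch below applies dynamic data masking and a column-level grant via T-SQL from Python; the table, column, and role names are invented.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myworkspace.sql.azuresynapse.net;"
        "DATABASE=sqlpool01;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # Dynamic data masking: non-privileged users see masked addresses
    # (e.g. aXXX@XXXX.com) instead of the real values.
    conn.execute("""
    ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')
    """)

    # Column-level security: grant SELECT on specific columns only.
    conn.execute("GRANT SELECT ON dbo.Customers (CustomerId, City) TO DataAnalysts")
    conn.commit()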
Lessons
- Securing a data warehouse in Azure Synapse Analytics
- Configuring and managing secrets in Azure Key Vault
- Implementing compliance controls for sensitive data
Lab: Comprehensive Security with Azure Synapse Analytics
- Secure the infrastructure supporting Azure Synapse Analytics
- Secure the Azure Synapse Analytics workspace and managed services
- Secure Azure Synapse Analytics workspace data
After completing this module, students will be able to do the following:
- Secure a data warehouse in Azure Synapse Analytics
- Configure and manage secrets in Azure Key Vault
- Implement compliance controls for sensitive data
Module 9: Supporting Hybrid Transactional and Analytical Processing with Azure Synapse Link
In this module, students will learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and a Synapse workspace. Students will see how to enable and configure Synapse Link, and then how to query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL.
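For a feel of how such queries look, here is a minimal sketch for a Synapse Spark notebook (where spark is predefined); the linked service and container names are assumptions.

    # The cosmos.olap format reads from the Azure Cosmos DB analytical
    # store over Azure Synapse Link, without touching the transactional store.
    df = (spark.read
          .format("cosmos.olap")
          .option("spark.synapse.linkedService", "CosmosDbLinkedService")  # assumed name
          .option("spark.cosmos.container", "Orders")                      # assumed name
          .load())

    df.groupBy("status").count().show()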
Lessons
- Designing hybrid transactional and analytical processing using Azure Synapse Analytics
- Configuring Azure Synapse Link with Azure Cosmos DB
- Querying Azure Cosmos DB with Apache Spark pools
- Querying Azure Cosmos DB with serverless SQL pools
Lab: Supporting Hybrid Transactional and Analytical Processing with Azure Synapse Link
- Configure Azure Synapse Link with Azure Cosmos DB
- Query Azure Cosmos DB with Apache Spark for Synapse Analytics
- Query Azure Cosmos DB with serverless SQL pools for Azure Synapse Analytics
After completing this module, students will be able to do the following:
- Design hybrid transactional and analytical processing using Azure Synapse Analytics
- Configure Azure Synapse Link with Azure Cosmos DB
- Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics
- Query Azure Cosmos DB with serverless SQL for Azure Synapse Analytics
Module 10: Real-Time Stream Processing with Stream Analytics
In this module, students will learn how to process streaming data with Azure Stream Analytics. They will ingest vehicle telemetry data into Event Hubs and then process it in real time using various windowing functions in Azure Stream Analytics, sending the aggregated data to Azure Synapse Analytics. Finally, students will learn how to scale the Stream Analytics job to increase throughput.
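To sketch what a windowed Stream Analytics query looks like, the Python snippet below simply holds the query text that would go into the job's query editor; the input and output aliases (EventHubInput, SynapseOutput) and the EventTime column are hypothetical names defined on the job.

    # A Stream Analytics query counting telemetry events per vehicle in
    # 30-second tumbling windows and routing the aggregates to a Synapse
    # Analytics output defined on the job.
    saql_query = """
    SELECT
        VehicleId,
        COUNT(*) AS EventCount,
        System.Timestamp() AS WindowEnd
    INTO SynapseOutput
    FROM EventHubInput TIMESTAMP BY EventTime
    GROUP BY VehicleId, TumblingWindow(second, 30)
    """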
Lessons
- Enabling reliable messaging for big data applications with Azure Event Hubs
- Working with data streams using Azure Stream Analytics
- Ingesting data streams with Azure Stream Analytics
Lab: Real-Time Stream Processing with Stream Analytics
- Use Stream Analytics to process real-time data from Event Hubs
- Use Stream Analytics windowing functions to build aggregates and send them to Synapse Analytics
- Scale an Azure Stream Analytics job to increase throughput through partitioning
- Repartition the stream input to optimize parallelization
After completing this module, students will be able to do the following:
- Enable reliable messaging for big data applications with Azure Event Hubs
- Work with data streams using Azure Stream Analytics
- Ingest data streams with Azure Stream Analytics
Module 11: Building a Stream Processing Solution with Event Hubs and Azure Databricks
In this module, students will learn how to ingest and process stream data at scale using Event Hubs and Spark structured streaming in Azure Databricks. Students will learn the key uses and features of structured streaming. They will implement sliding windows to aggregate data chunks and apply watermarks to remove stale data. Finally, students will connect to Event Hubs to read and write streams.
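A hedged PySpark sketch of this pattern, assuming a Databricks notebook (spark predefined) with the Event Hubs Spark connector installed; the connection string and paths are placeholders.

    from pyspark.sql.functions import window, col

    # Placeholder connection string; the Event Hubs connector expects it
    # encrypted via its EventHubsUtils helper.
    conn_str = "Endpoint=sb://mynamespace.servicebus.windows.net/;EntityPath=telemetry"
    eh_conf = {
        "eventhubs.connectionString":
            spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str)
    }

    stream = spark.readStream.format("eventhubs").options(**eh_conf).load()

    # The watermark drops events that arrive more than 10 minutes late; the
    # sliding window aggregates 5-minute chunks advancing every minute.
    counts = (stream
              .withWatermark("enqueuedTime", "10 minutes")
              .groupBy(window(col("enqueuedTime"), "5 minutes", "1 minute"))
              .count())

    (counts.writeStream
           .outputMode("append")
           .format("delta")
           .option("checkpointLocation", "/tmp/checkpoints/telemetry")
           .start("/tmp/output/telemetry_counts"))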
Lessons
Lab: Building a Stream Processing Solution with Event Hubs and Azure Databricks
- Explore the key uses and features of Structured Streaming
- Stream data from a file and write it to a distributed file system
- Use sliding windows to aggregate over chunks of data rather than all the data
- Apply watermarking to remove stale data
- Connect to Event Hubs to read and write streams
After completing this module, students will be able to do the following:
- Describe the key uses and features of Structured Streaming
- Stream data from a file and write it to a distributed file system
- Use sliding windows to aggregate over chunks of data rather than all the data
- Apply watermarking to remove stale data
- Connect to Event Hubs to read and write streams
Prerequisites
Successful students begin this course with a background in cloud computing and data fundamentals, as well as professional experience with data solutions.
Specifically, having completed:
- AZ-900: Azure Fundamentals
- DP-900: Microsoft Azure Data Fundamentals
Certification
Microsoft Certified: Azure Data Engineer Associate
Demonstrate an understanding of common data engineering tasks to deploy and manage data engineering workloads in Microsoft Azure using a variety of Azure services.
Level: Intermediate
Role: Data Engineer
Product: Azure
Subject: Data and AI