DP-203: Data Engineering on Microsoft Azure

€295.00

________________________________________________________________

Do you want to take this course remotely or in person?

Contact us by email (info@nanforiberica.com), phone (+34 91 031 66 78), or WhatsApp (+34 685 60 05 91), or visit our offices.

________________________________________________________________

Microsoft will retire DP-203: Data Engineering on Microsoft Azure on March 31, 2025. It will be replaced by DP-700: Microsoft Fabric Data Engineer.

DP-203 Course: Data Engineering on Microsoft Azure

In this course, students learn data engineering for real-time and batch analytics solutions built on Azure data platform technologies. They begin with the core compute and storage technologies used to build an analytics solution, and learn how to interactively explore data stored in files in a data lake. They then cover the main ingestion techniques: loading data with the Apache Spark functionality included in Azure Synapse Analytics or Azure Databricks, or ingesting with Azure Data Factory or Azure Synapse pipelines. Students also learn the different ways to transform data using the same technologies used to ingest it, and why security must be implemented to protect data at rest and in transit. Finally, they are shown how to build a real-time analytics system.

The course includes the certification exam as a gift! *Promotion valid until February 28, for clients in Spain only.

Audience Profile

The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytics solutions using the data platform technologies in Microsoft Azure. The secondary audience for this course is data analysts and data scientists who work with analytics solutions built on Microsoft Azure.

Items in this collection

Introduction to Azure Synapse Analytics (7 Units)
Explore Azure Databricks (7 Units)
Introduction to Azure Data Lake Storage (7 Units)
Introduction to Azure Stream Analytics (7 Units)
Using Azure Synapse Serverless SQL Pool to Query Files in a Data Lake (7 Units)
Using Azure Synapse Serverless SQL Pools to Transform Data in a Data Lake (7 Units)
Creating a lake database in Azure Synapse Analytics (8 Units)
Data protection and user management in Azure Synapse serverless SQL pools (6 Units)
Using Apache Spark on Azure Databricks (9 Units)
Using Delta Lake on Azure Databricks (8 Units)
Data Analysis with Apache Spark in Azure Synapse Analytics (8 Units)
Integrating SQL Pools and Apache Spark in Azure Synapse Analytics (11 Units)
Using best practices for loading data into Azure Synapse Analytics (11 Units)
Petabyte-scale ingestion with Azure Data Factory or an Azure Synapse pipeline (9 Units)
Integrate data with Azure Data Factory or Azure Synapse pipeline (13 Units)
Perform code-free transformations at scale with Azure Data Factory or an Azure Synapse pipeline (10 Units)
Orchestrate data movement and transformation in Azure Data Factory or Azure Synapse pipelines (9 Units)
Planning hybrid transactional and analytical processing using Azure Synapse Analytics (5 Units)
Implementing Azure Synapse Link with Azure Cosmos DB (9 Units)
Creating a data warehouse in Azure Synapse Analytics (10 Units)
Configuring and managing secrets in Azure Key Vault (6 Units)
Implementing compliance controls for sensitive data (11 Units)
Enabling reliable messaging for big data applications with Azure Event Hubs (8 Units)

Course outline

Module 1: Exploring compute and storage options for data engineering workloads

This module provides an overview of the Azure compute and storage technology options available to data engineers building analytics workloads. This module teaches how to structure the data lake and optimize files for batch, stream, and exploration workloads. The student will learn how to organize the data lake into data refinement levels as they transform files through batch and stream processing. They will then learn how to create indexes on their datasets, such as CSV, JSON, and Parquet files, and use them for potential query and workload acceleration.

Lessons

Introduction to Azure Synapse Analytics
Azure Databricks Overview
Getting started with Azure Data Lake Storage
Delta Lake Architecture Overview
Working with data streams using Azure Stream Analytics

Lab: Exploring compute and storage options for data engineering workloads

Combine batch and stream processing in a single pipeline
Organize the data lake into file transformation tiers
Indexing data lake storage for query and workload acceleration

After completing this module, students will be able to:

Describe Azure Synapse Analytics
Describe Azure Databricks
Describe Azure Data Lake Storage
Describe the architecture of Delta Lake
Describe Azure Stream Analytics

Module 2: Running interactive queries with Azure Synapse Analytics serverless SQL pools

In this module, students will learn how to work with files stored in the data lake and external file sources using T-SQL statements executed by a serverless SQL pool in Azure Synapse Analytics. They will query Parquet files stored in a data lake, as well as CSV files stored in an external data store. They will then create Azure Active Directory security groups and enforce access to files in the data lake through role-based access control (RBAC) and access control lists (ACLs).

Lessons

Exploring the capabilities of Azure Synapse serverless SQL pools
Querying data from the lake using Azure Synapse serverless SQL pools
Creating metadata objects in Azure Synapse serverless SQL pools
Data protection and user management in Azure Synapse serverless SQL pools

Lab: Running interactive queries with serverless SQL pools

Querying Parquet data with serverless SQL pools
Creating external tables for Parquet and CSV files
Creating views with serverless SQL pools
Protecting access to data in a data lake when using serverless SQL pools
Configure data lake security through role-based access control (RBAC) and access control lists (ACLs)

After completing this module, students will be able to:

Describe the capabilities of Azure Synapse serverless SQL pools
Query data from the lake using Azure Synapse serverless SQL pools
Create metadata objects in Azure Synapse serverless SQL pools
Protect data and manage users in Azure Synapse serverless SQL pools

Module 3: Explore and transform data in Azure Databricks

This module teaches how to use various Apache Spark DataFrame methods to explore and transform data in Azure Databricks. Students will learn how to use standard DataFrame methods to explore and transform data. They will also learn how to perform more advanced tasks such as removing duplicate data, manipulating date and time values, renaming columns, and aggregating data.
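The transformations named above (deduplication, date handling, column renaming, aggregation) can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch of the same ideas, not the Spark DataFrame API itself; the sample rows, the column names, and every helper function are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; field names are illustrative only.
rows = [
    {"id": 1, "ts": "2024-01-05 10:00:00", "amt": 10.0},
    {"id": 1, "ts": "2024-01-05 10:00:00", "amt": 10.0},  # exact duplicate
    {"id": 2, "ts": "2024-01-05 11:30:00", "amt": 5.0},
    {"id": 3, "ts": "2024-01-06 09:15:00", "amt": 7.5},
]

def deduplicate(records):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def rename_column(records, old, new):
    """Return new rows with column `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in rec.items()} for rec in records]

def add_date_column(records, ts_col, date_col):
    """Parse a timestamp string and add its calendar date as a new column."""
    out = []
    for rec in records:
        parsed = datetime.strptime(rec[ts_col], "%Y-%m-%d %H:%M:%S")
        out.append({**rec, date_col: parsed.date().isoformat()})
    return out

def sum_by(records, key_col, value_col):
    """Group rows by `key_col` and sum `value_col` within each group."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_col]] += rec[value_col]
    return dict(totals)

clean = add_date_column(rename_column(deduplicate(rows), "amt", "amount"), "ts", "order_date")
daily_totals = sum_by(clean, "order_date", "amount")
```

In a DataFrame engine each helper would typically be a single method call on a distributed dataset; the sketch only shows what each step does to the rows.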

Lessons

Azure Databricks Overview
Reading and writing data in Azure Databricks
Working with DataFrame elements in Azure Databricks
Working with advanced DataFrame methods in Azure Databricks

Lab: Performing data explorations and transformations in Azure Databricks

Using DataFrames in Azure Databricks to explore and filter data
Caching DataFrames for faster queries later
Deduplication of data
Manipulating date and time values
Removing and renaming DataFrame columns
Aggregating data stored in a DataFrame

After completing this module, students will be able to:

Describe Azure Databricks
Read and write data in Azure Databricks
Work with DataFrame elements in Azure Databricks
Work with advanced DataFrame methods in Azure Databricks

Module 4: Exploring, transforming, and loading data into data warehouses with Apache Spark

This module teaches how to explore data stored in a data lake, transform the data, and load the data into a relational data warehouse. Students will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. They will then use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data from the dedicated SQL pool.
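To make "JSON files with hierarchical structures" concrete, here is a small plain-Python sketch of two common reshaping steps: flattening nested objects into dotted column names, and producing one row per array element. It is an illustration under assumed data, not Spark or Synapse code; the document shape and the `flatten` and `explode` helpers are hypothetical.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as values."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

def explode(record, list_key):
    """Return one output row per element of record[list_key]."""
    base = {k: v for k, v in record.items() if k != list_key}
    return [{**base, list_key: item} for item in record.get(list_key, [])]

# Hypothetical hierarchical document.
doc = json.loads('{"user": {"id": 7, "geo": {"country": "ES"}}, "items": ["a", "b"]}')
flat = flatten(doc)
exploded = explode(flat, "items")
```

After these two steps every row has only scalar columns, which is the shape a relational data warehouse table expects.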

Lessons

Defining Big Data Engineering with Apache Spark in Azure Synapse Analytics
Ingesting data with Apache Spark notebooks in Azure Synapse Analytics
Transforming data with DataFrame objects from Azure Synapse Analytics Apache Spark pools
Integrating SQL pools and Apache Spark in Azure Synapse Analytics

Lab: Exploring, transforming, and loading data into data warehouses with Apache Spark

Performing data explorations in Synapse Studio
Ingest data with Spark notebooks in Azure Synapse Analytics
Transform data with Azure Synapse Analytics Spark pools DataFrame
Integrate SQL and Spark pools in Azure Synapse Analytics

After completing this module, students will be able to:

Describe Big Data Engineering with Apache Spark in Azure Synapse Analytics
Ingest data with Apache Spark notebooks in Azure Synapse Analytics
Transform data with DataFrame objects from Azure Synapse Analytics Apache Spark pools
Integrate SQL pools and Apache Spark in Azure Synapse Analytics

Module 5: Ingesting and loading data into data warehouses

This module teaches students how to ingest data into the data warehouse using T-SQL scripts and Synapse Analytics integration pipelines. Students will learn how to load data into Synapse dedicated SQL pools using PolyBase and COPY using T-SQL. They will also learn how to use workload management along with a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.

Lessons

Using best practices for loading data into Azure Synapse Analytics
Petabyte-scale ingestion with Azure Data Factory

Lab: Ingesting and loading data into data warehouses

Perform petabyte-scale ingestions with Azure Synapse pipelines
Importing data with PolyBase and COPY using T-SQL
Using best practices for loading data into Azure Synapse Analytics

After completing this module, students will be able to:

Use best practices for loading data into Azure Synapse Analytics
Perform petabyte-scale ingestion with Azure Data Factory

Module 6: Transforming data with Azure Data Factory or Azure Synapse pipelines

This module teaches students how to create data integration pipelines to ingest from multiple data sources, transform data using mapping data flows, and perform data movements into one or more data sinks.

Lessons

Data integration with Azure Data Factory or Azure Synapse pipeline
Perform code-free transformations at scale with Azure Data Factory or Azure Synapse pipelines

Lab: Transforming data with Azure Data Factory or Azure Synapse pipelines

Run code-free transformations at scale with Azure Synapse pipelines
Create a data pipeline to import poorly formatted CSV files
Create mapping data flows

After completing this module, students will be able to:

Perform data integrations with Azure Data Factory
Perform code-free transformations at scale with Azure Data Factory

Module 7: Orchestrating data movements and transformations in Azure Synapse pipelines

In this module, students will learn how to create linked services and orchestrate data movement and transformation using notebooks in Azure Synapse pipelines.

Lessons

Orchestrating data movements and transformations in Azure Data Factory

Lab: Orchestrating data movements and transformations in Azure Synapse pipelines

Integrate data from notebooks with Azure Data Factory or Azure Synapse pipelines

After completing this module, students will be able to:

Orchestrate data movements and transformations in Azure Synapse pipelines

Module 8: End-to-end security with Azure Synapse Analytics

In this module, students will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. They will explore the SQL Active Directory admin, manage IP firewall rules, manage secrets with Azure Key Vault, and access those secrets through a Key Vault linked service and pipeline activities. They will also learn how to implement column-level and row-level security and dynamic data masking when using dedicated SQL pools.
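The three data-protection ideas mentioned above (row-level security, column-level security, and dynamic data masking) differ in what they hide. The plain-Python sketch below illustrates the distinction only; it is not how a dedicated SQL pool implements these features, and the `region` predicate, the masking rule, and the `read_rows` helper are all hypothetical.

```python
def mask_email(value):
    """Illustrative masking rule: keep the first character, hide the rest."""
    return value[0] + "XXX@XXXX.com" if value else value

def read_rows(rows, user, *, allowed_columns, unmask=False):
    """Apply row filtering, column filtering, and masking for one user."""
    visible = []
    for row in rows:
        # Row-level security: the user only sees rows for their own region.
        if row["region"] != user["region"]:
            continue
        # Column-level security: only granted columns are returned.
        out = {c: row[c] for c in allowed_columns if c in row}
        # Dynamic data masking: the column is returned, but its value is obscured.
        if "email" in out and not unmask:
            out["email"] = mask_email(out["email"])
        visible.append(out)
    return visible

# Hypothetical table contents and user.
table = [
    {"region": "EU", "email": "ana@example.com", "revenue": 100},
    {"region": "US", "email": "bob@example.com", "revenue": 200},
]
result = read_rows(table, {"region": "EU"}, allowed_columns=["region", "email"])
```

Here the US row is filtered out entirely, the `revenue` column is never returned, and the email is present but masked.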

Lessons

Creating a data warehouse in Azure Synapse Analytics
Setting up and managing secrets in Azure Key Vault
Implementing compliance controls for sensitive data

Lab: End-to-end security with Azure Synapse Analytics

Securing the infrastructure behind Azure Synapse Analytics
Securing Azure Synapse Analytics Workspace and Managed Services
Protect data in your Azure Synapse Analytics workspace

After completing this module, students will be able to:

Create a data warehouse in Azure Synapse Analytics
Set up and manage secrets in Azure Key Vault
Implement compliance controls for sensitive data

Module 9: Supporting hybrid transactional analytical processing (HTAP) with Azure Synapse Link

In this module, students will learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and a Synapse workspace. Students will see how to enable and configure Synapse Link, and then how to query the Azure Cosmos DB analytical store using Apache Spark and Serverless SQL.

Lessons

Designing hybrid transactional and analytical processing using Azure Synapse Analytics
Setting up Azure Synapse Link with Azure Cosmos DB
Querying Azure Cosmos DB with Apache Spark pools
Querying Azure Cosmos DB with serverless SQL pools

Lab: Supporting hybrid transactional analytical processing with Azure Synapse Link

Setting up Azure Synapse Link with Azure Cosmos DB
Query Azure Cosmos DB with Apache Spark for Synapse Analytics
Query Azure Cosmos DB with Serverless SQL Pools for Azure Synapse Analytics

After completing this module, students will be able to:

Design hybrid transactional and analytical processing using Azure Synapse Analytics
Set up Azure Synapse Link with Azure Cosmos DB
Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics
Query Azure Cosmos DB with serverless SQL for Azure Synapse Analytics

Module 10: Real-Time Stream Processing with Stream Analytics

In this module, students will learn how to process streaming data using Azure Stream Analytics. They will ingest vehicle telemetry data into Event Hubs and then process it in real-time using various window-based functions in Azure Stream Analytics. They will send the data to Azure Synapse Analytics. Finally, students will learn how to scale the Stream Analytics job to increase throughput.
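A tumbling window is one of the window-based functions used for this kind of aggregation: time is cut into fixed-size, non-overlapping intervals and each event contributes to exactly one of them. The plain-Python sketch below shows the idea on hypothetical telemetry readings. It assigns events to half-open intervals [start, start + size), which is an assumption of this sketch; the service's own boundary convention may differ.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds):
    """events: iterable of (epoch_seconds, value). Returns {window_start: average}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in events:
        # Floor the timestamp to the start of its fixed-size window.
        start = (ts // window_seconds) * window_seconds
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}

# Hypothetical vehicle speed readings as (seconds, speed).
telemetry = [(0, 50.0), (4, 70.0), (10, 80.0), (19, 100.0), (25, 60.0)]
averages = tumbling_window_avg(telemetry, 10)
```

Because windows never overlap, the per-window results can be computed independently, which is one reason partitioning the input helps a streaming job scale.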

Lessons

Enabling reliable messaging for big data applications with Azure Event Hubs
Working with data streams using Azure Stream Analytics
Ingesting data streams with Azure Stream Analytics

Lab: Real-Time Stream Processing with Stream Analytics

Using Stream Analytics to process real-time data from Event Hubs
Use Stream Analytics window-based functions to create aggregates and send them to Synapse Analytics
Scale Azure Stream Analytics jobs to increase performance through partitioning
Repartition the input streams to optimize parallelization

After completing this module, students will be able to:

Enable reliable messaging for big data applications with Azure Event Hubs
Work with data streams using Azure Stream Analytics
Ingest data streams with Azure Stream Analytics

Module 11: Building a Stream Processing Solution with Event Hubs and Azure Databricks

In this module, students will learn how to ingest and process stream data at scale using Event Hubs and Spark structured streaming on Azure Databricks. Students will learn the key features and uses of structured streaming. They will implement sliding windows to aggregate data chunks and apply watermarks to remove stale data. Finally, students will connect to Event Hubs to read and write streams.
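Sliding windows and watermarks can be hard to picture from a description alone. The sketch below is a simplified plain-Python model, not Spark Structured Streaming code: each event is counted in every overlapping window that contains it, and an event is discarded when its event time is older than the latest event time seen minus an allowed delay. The function name, the parameters, and the per-event watermark update are assumptions of this sketch; the real engine updates its watermark between micro-batches and its behavior differs in detail.

```python
from collections import defaultdict

def sliding_window_counts(event_times, window, slide, max_delay):
    """event_times: integer seconds in arrival order.
    Returns ({(start, end): count}, dropped_event_times)."""
    counts = defaultdict(int)
    dropped = []
    max_seen = None
    for t in event_times:
        max_seen = t if max_seen is None else max(max_seen, t)
        watermark = max_seen - max_delay
        if t < watermark:
            # Too late: the event falls behind the watermark and is ignored.
            dropped.append(t)
            continue
        # Smallest window start (a multiple of `slide`) whose window still contains t.
        first_start = ((t - window) // slide + 1) * slide
        for start in range(first_start, t + 1, slide):
            counts[(start, start + window)] += 1
    return dict(counts), dropped

# 10-second windows sliding every 5 seconds, tolerating 10 seconds of lateness.
counts, dropped = sliding_window_counts([12, 17, 31, 14], window=10, slide=5, max_delay=10)
```

The last event (time 14) arrives after an event at time 31, so it falls behind the watermark of 21 and is dropped instead of updating windows that are already considered complete.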

Lessons

Processing streaming data with Azure Databricks Structured Streaming

Lab: Building a Stream Processing Solution with Event Hubs and Azure Databricks

Analyze the key uses and features of structured streaming
Stream data from a file and write it to a distributed file system
Use sliding windows to aggregate over chunks of data rather than all data
Apply watermarks to remove stale data
Connect to Event Hubs to read and write streams

After completing this module, students will be able to:

Process streaming data with Azure Databricks Structured Streaming

Prerequisites

Successful students begin this course with knowledge of cloud computing and data fundamentals, and professional experience with data solutions.

Specifically, having completed:

AZ-900: Azure Fundamentals
DP-900: Data Fundamentals in Microsoft Azure

Language

Course: English
Labs: English

Information related to training

Training support: Always by your side

Do you need another training modality?

Self Learning - Virtual - In-person - Telepresence

Bonuses for companies
