DP-203: Data Engineering on Microsoft Azure

This associate-level training, with the official certification exam included, is a double opportunity: we also give you the DP-900: Microsoft Azure Data Fundamentals course for FREE, with labs, practice exercises, and study material.

* Promotion valid during January, February and March 2023

Course description

In this course, the student will learn about data engineering as it relates to working with batch and real-time analytics solutions using Azure data platform technologies. Students will begin by learning the core processing and storage technologies used to build an analytics solution. They will also learn to interactively explore data stored in files in a data lake, and about the various ingestion techniques that can be used to load data, whether using the Apache Spark capability found in Azure Synapse Analytics or Azure Databricks, or ingesting with Azure Data Factory or Azure Synapse pipelines. Students will also learn the different ways they can transform data using the same technologies used to ingest it. They will understand the importance of implementing security to ensure that data, at rest or in transit, is protected. Finally, they will learn how to build a real-time stream processing system to support real-time analytics solutions.

 

Audience Profile

The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and creating analytics solutions using the data platform technologies found in Microsoft Azure. The secondary audience for this course is data analysts and data scientists working with analytics solutions based on Microsoft Azure.
 

Items in this collection

  • Introduction to Azure Synapse Analytics (7 Units)
  • Explore Azure Databricks (7 Units)
  • Introduction to Azure Data Lake Storage (7 Units)
  • Introduction to Azure Stream Analytics (7 Units)
  • Using an Azure Synapse serverless SQL pool to query files in a data lake (7 Units)
  • Using Azure Synapse serverless SQL pools to transform data in a data lake (7 Units)
  • Creating a lake database in Azure Synapse Analytics (8 Units)
  • Data Protection and User Management in Azure Synapse Serverless SQL Pools (6 Units)
  • Using Apache Spark on Azure Databricks (9 Units)
  • Use of Delta Lake on Azure Databricks (8 Units)
  • Data analysis with Apache Spark in Azure Synapse Analytics (8 Units)
  • Integration of SQL pools and Apache Spark in Azure Synapse Analytics (11 Units)
  • Using best practices for loading data into Azure Synapse Analytics (11 Units)
  • Ingest at petabyte scale with Azure Data Factory or an Azure Synapse pipeline (9 Units)
  • Integrating data with Azure Data Factory or Azure Synapse pipelines (13 Units)
  • Perform no-code transformations at scale with Azure Data Factory or an Azure Synapse pipeline (10 Units)
  • Orchestrate data movement and transformation in Azure Data Factory or Azure Synapse pipelines (9 Units)
  • Planning hybrid transactional and analytical processing using Azure Synapse Analytics (5 Units) 
  • Implementation of Azure Synapse Link with Azure Cosmos DB (9 Units)
  • Securing a data warehouse in Azure Synapse Analytics (10 Units)
  • Configuration and management of secrets in Azure Key Vault (6 Units)
  • Implementation of compliance controls for sensitive data (11 Units)
  • Enabling reliable messaging for big data applications with Azure Event Hubs (8 Units)

 

Course outline

Module 1: Exploring compute and storage options for data engineering workloads

This module provides an overview of the Azure compute and storage technology options available to data engineers building analytics workloads. It teaches how to structure the data lake and how to optimize files for exploration, streaming, and batch workloads. The student will learn to organize the data lake into levels of data refinement as files are transformed through batch and stream processing. Then they will learn how to create indexes on their datasets, such as CSV, JSON, and Parquet files, and how to use them for potential query and workload acceleration.
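
As a taste of the indexing lesson, the sketch below creates a Hyperspace index over a Parquet dataset from a Synapse Spark notebook (where a `spark` session is predefined). The storage path, index name, and column names are hypothetical.

    # Minimal sketch, assuming a Synapse Spark notebook where `spark` already
    # exists. Path, index, and column names are hypothetical.
    from hyperspace import Hyperspace, IndexConfig

    df = spark.read.parquet(
        "abfss://data@contosolake.dfs.core.windows.net/bronze/sales/")

    hs = Hyperspace(spark)
    hs.createIndex(df, IndexConfig("salesIndex", ["CustomerId"], ["Amount"]))

    hs.indexes().show()  # list the indexes Hyperspace is tracking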

Lessons

  • Introduction to Azure Synapse Analytics

  • Azure Databricks overview

  • Introduction to Azure Data Lake Storage

  • Description of Delta Lake architecture

  • Work with data streams using Azure Stream Analytics

Lab : Exploring compute and storage options for data engineering workloads

  • Combine batch and stream processing in a single pipeline

  • Organize the data lake into file transformation tiers

  • Index data lake storage for query and workload acceleration

After completing this module, students will be able to do the following:

  • Describe Azure Synapse Analytics

  • Describe Azure Databricks

  • Describe Azure Data Lake Storage

  • Describe the architecture of Delta Lake

  • Describe Azure Stream Analytics

Module 2: Running interactive queries using Azure Synapse Analytics serverless SQL pools

In this module, students will learn how to work with files stored in the data lake and external file sources using T-SQL statements executed by a serverless SQL pool in Azure Synapse Analytics. They will query Parquet files stored in a data lake, as well as CSV files stored in an external data store. Then they'll create Azure Active Directory security groups and enforce access to data lake files through role-based access control (RBAC) and access control lists (ACLs).
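
For a feel of what such a query looks like, here is a minimal sketch that runs a T-SQL OPENROWSET query against a serverless SQL pool endpoint from Python via pyodbc; the server name, credentials, and storage path are hypothetical.

    # Minimal sketch: query a Parquet file in the data lake through a
    # serverless SQL pool. Server, credentials, and paths are hypothetical.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=contoso-ondemand.sql.azuresynapse.net;"
        "DATABASE=master;UID=sqladminuser;PWD=<password>"
    )

    sql = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/data/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS rows
    """

    for row in conn.cursor().execute(sql):
        print(row)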

Lessons

  • Explore the capabilities of Azure Synapse serverless SQL pools

  • Query data in the lake using Azure Synapse serverless SQL pools

  • Create metadata objects in Azure Synapse serverless SQL pools

  • Data protection and user management in Azure Synapse serverless SQL pools

Lab : Running interactive queries with serverless SQL pools

  • Query Parquet data with serverless SQL pools

  • Create external tables for Parquet and CSV files

  • Create views with serverless SQL pools

  • Protect access to data in a data lake when using serverless SQL pools

  • Configure data lake security through role-based access control (RBAC) and access control lists (ACLs)

After completing this module, students will be able to do the following:

  • Describe the capabilities of Azure Synapse serverless SQL pools

  • Query data in the lake using Azure Synapse serverless SQL pools

  • Create metadata objects in Azure Synapse serverless SQL pools

  • Protect data and manage users in Azure Synapse serverless SQL pools

Module 3: Data exploration and transformation in Azure Databricks

This module teaches you how to use various Apache Spark DataFrame methods to explore and transform data in Azure Databricks. Students will learn to use standard DataFrame methods to explore and transform data. They will also learn to perform more advanced tasks such as removing duplicate data, manipulating date and time values, renaming columns, and aggregating data.
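
The following sketch previews these DataFrame methods in PySpark on a Databricks cluster (where `spark` is predefined); the paths and column names are hypothetical.

    # Minimal sketch of the DataFrame operations covered in this module.
    # Paths and column names are hypothetical; `spark` is the notebook session.
    from pyspark.sql import functions as F

    df = spark.read.json("/mnt/raw/events/")

    cleaned = (
        df.dropDuplicates(["EventId"])                          # remove duplicates
          .withColumn("EventDate", F.to_date("EventTimestamp")) # parse dates
          .withColumnRenamed("Cust", "CustomerId")              # rename a column
    )

    # Aggregate: count events per customer per day, cached for reuse
    summary = cleaned.groupBy("CustomerId", "EventDate").count().cache()
    summary.show()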

Lessons

  • Azure Databricks overview

  • Reading and writing data in Azure Databricks

  • Work with DataFrames in Azure Databricks

  • Work with advanced DataFrame methods in Azure Databricks

Lab : Data exploration and transformation in Azure Databricks

  • Use DataFrames in Azure Databricks to explore and filter data

  • Cache DataFrames for faster subsequent queries

  • Remove duplicate data

  • Manipulate date and time values

  • Remove and rename DataFrame columns

  • Aggregate data stored in a DataFrame

After completing this module, students will be able to do the following:

  • Describe Azure Databricks

  • Read and write data in Azure Databricks

  • Work with DataFrames in Azure Databricks

  • Work with advanced DataFrame methods in Azure Databricks

Module 4: Exploring, Transforming, and Loading Data into Data Warehouses with Apache Spark

This module teaches you how to explore data stored in a data lake, how to transform the data, and how to load it into a relational data warehouse. Students will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. They will then use Apache Spark to load data into the data warehouse and join data from Parquet in the data lake with data from the dedicated SQL pool.
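
As a rough sketch of this load-and-join pattern, the code below reads Parquet from the lake, joins it with a dedicated SQL pool table, and appends the result back; it uses plain Spark JDBC for illustration, whereas the course covers the built-in Synapse Spark-to-SQL connector. All names, URLs, and credentials are hypothetical.

    # Minimal sketch: join Parquet data from the lake with a dedicated SQL
    # pool table and append the result back. Plain Spark JDBC is used for
    # illustration; all names, URLs, and credentials are hypothetical.
    jdbc_url = ("jdbc:sqlserver://contoso.sql.azuresynapse.net:1433;"
                "database=SQLPool01;user=loader;password=<password>")

    sales = spark.read.parquet(
        "abfss://data@contosolake.dfs.core.windows.net/silver/sales/")

    customers = (spark.read.format("jdbc")
                 .option("url", jdbc_url)
                 .option("dbtable", "dbo.DimCustomer")
                 .load())

    enriched = sales.join(customers, on="CustomerId", how="inner")

    (enriched.write.format("jdbc")
             .option("url", jdbc_url)
             .option("dbtable", "dbo.FactSale")
             .mode("append")
             .save())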

Lessons

  • Defining big data engineering with Apache Spark in Azure Synapse Analytics

  • Data ingestion with Apache Spark notebooks in Azure Synapse Analytics

  • Data transformation with DataFrames in Apache Spark pools in Azure Synapse Analytics

  • SQL and Apache Spark pool integration in Azure Synapse Analytics

Lab : Exploring, transforming, and loading data into data warehouses with Apache Spark

  • Perform data exploration in Synapse Studio

  • Ingest data with Spark notebooks in Azure Synapse Analytics

  • Transform data with DataFrames in Spark pools in Azure Synapse Analytics

  • Integrate SQL and Spark pools in Azure Synapse Analytics

After completing this module, students will be able to do the following:

  • Describe big data engineering with Apache Spark in Azure Synapse Analytics

  • Ingest data with Apache Spark notebooks in Azure Synapse Analytics

  • Transform data with DataFrames in Apache Spark pools in Azure Synapse Analytics

  • Integrate SQL and Apache Spark pools in Azure Synapse Analytics

Module 5: Ingesting and loading data into data warehouses

This module teaches students how to ingest data into the data warehouse using T-SQL scripts and Synapse Analytics integration pipelines. Students will learn how to load data into Synapse dedicated SQL pools with PolyBase and COPY using T-SQL. They will also learn how to use workload management in conjunction with a copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.
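
A minimal sketch of the COPY approach follows, submitting the T-SQL from Python with pyodbc; the table, storage account, and credentials are hypothetical.

    # Minimal sketch: load Parquet files into a dedicated SQL pool table with
    # the T-SQL COPY statement, submitted over pyodbc. Names are hypothetical.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=contoso.sql.azuresynapse.net;"
        "DATABASE=SQLPool01;UID=loader;PWD=<password>",
        autocommit=True,
    )

    conn.cursor().execute("""
    COPY INTO dbo.FactSale
    FROM 'https://contosolake.blob.core.windows.net/data/sales/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );
    """)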

Lessons

  • Use best practices for loading data into Azure Synapse Analytics

  • Ingest at petabyte scale with Azure Data Factory

Lab : Ingesting and Loading Data into Data Warehouses

  • Ingest at petabyte scale with Azure Synapse pipelines

  • Import data with PolyBase and COPY using T-SQL

  • Use best practices for loading data into Azure Synapse Analytics

After completing this module, students will be able to do the following:

  • Use best practices for loading data into Azure Synapse Analytics

  • Ingest at petabyte scale with Azure Data Factory

Module 6: Transform data with Azure Data Factory or Azure Synapse pipelines

This module teaches students how to create data integration pipelines to ingest from multiple data sources, transform data using mapping data flows, and perform data movement across one or more data sinks.

Lessons

  • Data integration with Azure Data Factory or Azure Synapse pipelines

  • Perform no-code transformations at scale with Azure Data Factory or Azure Synapse pipelines

Lab : Transform data with Azure Data Factory or Azure Synapse pipelines

  • Run code-free transformations at scale with Azure Synapse pipelines

  • Create a data pipeline to import poorly formatted CSV files

  • Create mapping data flows

After completing this module, students will be able to do the following:

  • Perform data integration with Azure Data Factory

  • Perform code-free transformations at scale with Azure Data Factory

Module 7: Orchestrate data movement and transformations in Azure Synapse pipelines

In this module we will learn how to create linked services and how to orchestrate the movement and transformation of data using notebooks in Azure Synapse pipelines.

Lessons

  • Orchestrate data movement and transformations in Azure Data Factory

Lab : Orchestrate data movement and transformations in Azure Synapse pipelines

  • Integrate Notebook data with Azure Data Factory or Azure Synapse pipelines

After completing this module, students will be able to do the following:

  • Orchestrate data movements and transformations in Azure Synapse pipelines

Module 8: End-to-end security with Azure Synapse Analytics

In this module, students will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. They will examine the SQL Active Directory admin, manage IP firewall rules, manage secrets with Azure Key Vault, and access those secrets through a Key Vault linked service and pipeline activities. They will also learn how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools.
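
For the Key Vault lesson, here is a minimal sketch of retrieving a secret with the Azure SDK for Python; the vault URL and secret name are hypothetical.

    # Minimal sketch: read a secret from Azure Key Vault with the Azure SDK
    # for Python. The vault URL and secret name are hypothetical.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url="https://contoso-kv.vault.azure.net",
        credential=DefaultAzureCredential(),  # e.g. managed identity or az login
    )

    secret = client.get_secret("StorageAccountKey")
    print(secret.name)  # avoid printing secret.value in real code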

Lessons

  • Secure a data warehouse in Azure Synapse Analytics

  • Configure and manage secrets in Azure Key Vault

  • Implement compliance controls for sensitive data

Lab : End-to-end security with Azure Synapse Analytics

  • Secure the infrastructure supporting Azure Synapse Analytics

  • Secure the Azure Synapse Analytics workspace and managed services

  • Secure Azure Synapse Analytics workspace data

After completing this module, students will be able to do the following:

  • Secure a data warehouse in Azure Synapse Analytics

  • Configure and manage secrets in Azure Key Vault

  • Implement compliance controls for sensitive data

Module 9: Supporting hybrid transactional and analytical processing (HTAP) with Azure Synapse Link

In this module, students will learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and a Synapse workspace. Students will see how to enable and configure Synapse Link, followed by how to query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL.
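
As a preview, the sketch below reads the Azure Cosmos DB analytical store from a Synapse Spark notebook with the `cosmos.olap` format; the linked service, container, and column names are hypothetical.

    # Minimal sketch: read the Cosmos DB analytical store from a Synapse
    # Spark pool. Linked service and container names are hypothetical.
    df = (spark.read.format("cosmos.olap")
          .option("spark.synapse.linkedService", "CosmosDbLink")
          .option("spark.cosmos.container", "Orders")
          .load())

    df.printSchema()
    df.groupBy("status").count().show()  # hypothetical `status` column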

Lessons

  • Design hybrid transactional and analytical processing using Azure Synapse Analytics

  • Configure Azure Synapse Link with Azure Cosmos DB

  • Query Azure Cosmos DB with Apache Spark pools

  • Query Azure Cosmos DB with serverless SQL pools

Lab : Implementing Azure Synapse Link with Azure Cosmos DB

  • Configure Azure Synapse Link with Azure Cosmos DB

  • Query Azure Cosmos DB with Apache Spark for Synapse Analytics

  • Query Azure Cosmos DB with serverless SQL pools for Azure Synapse Analytics

After completing this module, students will be able to do the following:

  • Design hybrid transactional and analytical processing using Azure Synapse Analytics

  • Configure Azure Synapse Link with Azure Cosmos DB

  • Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics

  • Query Azure Cosmos DB with serverless SQL for Azure Synapse Analytics

Module 10: Real-time stream processing with Stream Analytics

In this module, students will learn how to process streaming data with Azure Stream Analytics. They will ingest vehicle telemetry data into Event Hubs and then process it in real time using various windowing functions in Azure Stream Analytics. They will send the results to Azure Synapse Analytics. Finally, students will learn how to scale the Stream Analytics job to increase performance.
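
The windowed aggregation at the heart of this module looks roughly like the Stream Analytics query below, held in a Python string only to match the other examples; the input, output, and field names are hypothetical.

    # Minimal sketch: a Stream Analytics query that aggregates vehicle
    # telemetry from an Event Hubs input into 30-second tumbling windows and
    # writes the result to a Synapse output. All names are hypothetical.
    ASA_QUERY = """
    SELECT
        VehicleId,
        AVG(Speed) AS AvgSpeed,
        COUNT(*)   AS Readings,
        System.Timestamp() AS WindowEnd
    INTO
        [synapse-output]
    FROM
        [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
    GROUP BY
        VehicleId,
        TumblingWindow(second, 30)
    """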

Lessons

  • Enabling reliable messaging for big data applications with Azure Event Hubs

  • Work with data streams using Azure Stream Analytics

  • Ingest data streams with Azure Stream Analytics

Lab : Real-time Stream Processing with Stream Analytics

  • Use Stream Analytics to process real-time data from Event Hubs

  • Use Stream Analytics window-based functions to create aggregates and send them to Synapse Analytics

  • Scale Azure Stream Analytics jobs to increase performance through partitioning

  • Repartition input streams to optimize parallelization

After completing this module, students will be able to do the following:

  • Enable reliable messaging for big data applications with Azure Event Hubs

  • Work with data streams using Azure Stream Analytics

  • Ingest data streams with Azure Stream Analytics

Module 11: Building a Stream Processing Solution with Event Hubs and Azure Databricks

In this module, students will learn how to ingest and process streaming data at scale with Event Hubs and Spark Structured Streaming in Azure Databricks. Students will learn the uses and key features of Structured Streaming. They will implement sliding windows to aggregate chunks of data and apply watermarks to remove stale data. Finally, students will connect to Event Hubs to read and write streams.
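
The sketch below shows a Structured Streaming query with a sliding window and a watermark, the two concepts this module centers on. The source and sink paths, schema, and column names are hypothetical; on Databricks, reading directly from Event Hubs would use the Event Hubs connector instead of this file source.

    # Minimal sketch: sliding-window aggregation with a watermark in
    # Structured Streaming. Paths and column names are hypothetical.
    from pyspark.sql import functions as F
    from pyspark.sql.types import (StructType, StringType, DoubleType,
                                   TimestampType)

    schema = (StructType()
              .add("deviceId", StringType())
              .add("reading", DoubleType())
              .add("eventTime", TimestampType()))

    stream = spark.readStream.schema(schema).json("/mnt/raw/telemetry/")

    windowed = (
        stream
        .withWatermark("eventTime", "10 minutes")  # drop data older than 10 min
        .groupBy(
            F.window("eventTime", "10 minutes", "5 minutes"),  # sliding window
            "deviceId")
        .agg(F.avg("reading").alias("avgReading"))
    )

    query = (windowed.writeStream
             .outputMode("append")
             .format("delta")
             .option("checkpointLocation", "/mnt/checkpoints/telemetry")
             .start("/mnt/curated/telemetry_agg"))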

Lessons

  • Stream data processing with Azure Databricks Structured Streaming

Lab : Building a Stream Processing Solution with Event Hubs and Azure Databricks

  • Analyze the uses and key characteristics of Structured Streaming

  • Stream data from a file and write it to a distributed file system

  • Use sliding windows to aggregate chunks of data rather than all of the data

  • Apply watermarks to remove outdated data

  • Connect to Event Hubs to read and write streams

After completing this module, students will be able to do the following:

  • Process stream data with Azure Databricks Structured Streaming

 

Prerequisites

Successful students start this course with knowledge of cloud computing and core data concepts, and professional experience with data solutions.

Specifically, having completed:

  • AZ-900: Microsoft Azure Fundamentals

  • DP-900: Microsoft Azure Data Fundamentals

 

Language

  • Course: English

  • Labs: English

Price: €695.00