________________________________________________________________
Do you want to take this course remotely or in person?
Contact us by email: info@nanforiberica.com, phone: +34 91 031 66 78, WhatsApp: +34 685 60 05 91, or contact our offices
________________________________________________________________
Important: This course will be available on 07/18/25
Course DP-3027: Implement a data engineering solution with Azure Databricks
In this course, you will learn how to harness Apache Spark and powerful clusters running on the Azure Databricks platform to run large-scale data engineering workloads in the cloud.
Level: Beginner - Role: Data Analyst, Data Engineer, Data Scientist - Product: Azure - Subject: Data Engineering
Course aimed at
Data engineers, data scientists, and ELT developers who want to leverage Apache Spark and powerful clusters on the Azure Databricks platform to run large-scale data engineering workloads in the cloud.
Objectives of the official course DP-3027
- Understand Azure Databricks architecture: familiarize yourself with the platform's key components and how they integrate with other Azure services.
- Implement data ingestion techniques: capture data from multiple sources using tools such as Structured Streaming and Delta Lake.
- Perform data transformation and processing: use Apache Spark to cleanse, transform, and prepare data for analysis or storage.
- Develop scalable ETL workflows: build efficient, reusable data pipelines that support large volumes of data.
- Optimize pipeline performance: apply tuning, autoscaling, and observability strategies to improve workflow efficiency.
- Implement streaming architectures with Delta Live Tables: design real-time solutions for continuous data processing.
- Automate tasks with Azure Databricks Jobs: orchestrate and schedule workflows to reduce manual intervention and accelerate insight delivery.
- Apply CI/CD to data environments: integrate continuous-delivery practices to maintain the quality and stability of data solutions.
Contents of the official Azure Databricks DP-3027 course
Module 1: Performing Incremental Processing with Spark Structured Streaming
- Introduction
- Configuring real-time data sources for incremental processing
- Optimizing Delta Lake for incremental processing in Azure Databricks
- Handling late-arriving and out-of-order events in incremental processing
- Performance monitoring and tuning strategies for incremental processing in Azure Databricks
- Exercise: Real-time ingestion and processing with Delta Live Tables in Azure Databricks
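The late-event handling covered in this module rests on the idea of a watermark: the stream tracks how far event time has progressed, and events arriving behind the watermark are dropped rather than reprocessed. Below is a toy plain-Python illustration of that rule (not PySpark code; the function and threshold names are ours, sketching what `withWatermark` does conceptually):

```python
# Toy illustration of watermark-based late-event handling, the idea
# behind Spark Structured Streaming's withWatermark(). Events that
# arrive after the watermark has passed their event time are dropped.

def process_events(events, delay_threshold):
    """events: list of (event_time, payload) in arrival order.
    delay_threshold: how far the watermark trails the max event time
    seen so far (like withWatermark('ts', '10 minutes'))."""
    watermark = float("-inf")
    accepted, dropped = [], []
    for event_time, payload in events:
        if event_time < watermark:
            dropped.append((event_time, payload))   # too late: discard
        else:
            accepted.append((event_time, payload))
            watermark = max(watermark, event_time - delay_threshold)
    return accepted, dropped

accepted, dropped = process_events(
    [(100, "a"), (112, "b"), (95, "c"), (105, "d")], delay_threshold=10)
# "c" at t=95 arrives after the watermark advanced to 102, so it is dropped
```

In real Structured Streaming the watermark also bounds how long aggregation state is retained, which is what keeps incremental jobs from growing without limit.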
Module 2: Implementing Streaming Architecture Patterns with Delta Live Tables
- Introduction
- Event-driven architectures with Delta Live Tables
- Data ingestion with structured streaming
- Maintaining data consistency and reliability with structured streaming
- Scaling Streaming Workloads with Delta Live Tables
- Exercise: End-to-End Streaming Pipeline with Delta Live Tables
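Delta Live Tables pipelines like the one in this module are typically layered bronze → silver → gold, each table defined as a transformation of the previous one. The following is a hand-rolled plain-Python analogy of that dataflow (real DLT code uses `@dlt.table` functions on Databricks; the data and function names here are invented):

```python
# Plain-Python analogy of a three-layer (medallion) pipeline:
# bronze = raw events, silver = cleansed, gold = aggregated.
# In Delta Live Tables each stage would be an @dlt.table function.

raw_events = [
    {"user": "ana", "amount": "19.50"},
    {"user": "", "amount": "5.00"},      # invalid: missing user
    {"user": "ana", "amount": "30.25"},
    {"user": "luis", "amount": "7.50"},
]

def bronze(events):
    return list(events)  # ingest as-is

def silver(events):
    # cleanse: drop rows failing expectations (like @dlt.expect_or_drop)
    return [{"user": e["user"], "amount": float(e["amount"])}
            for e in events if e["user"]]

def gold(events):
    # aggregate: total spend per user
    totals = {}
    for e in events:
        totals[e["user"]] = totals.get(e["user"], 0.0) + e["amount"]
    return totals

totals = gold(silver(bronze(raw_events)))
# → {"ana": 49.75, "luis": 7.5}
```

What DLT adds over this sketch is the declarative part: it infers the dependency graph between the layers, runs them incrementally, and enforces the data-quality expectations for you.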
Module 3: Performance Optimization with Spark and Delta Live Tables
- Introduction
- Optimizing Performance with Spark and Delta Live Tables
- Performing cost-based optimization and query tuning
- Using Change Data Capture (CDC)
- Using enhanced autoscaling
- Implementing observability and data quality metrics
- Exercise: Optimizing data pipelines to improve performance in Azure Databricks
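The Change Data Capture topic above boils down to replaying a stream of insert/update/delete events onto a target table, which on Databricks is what `APPLY CHANGES INTO` or a Delta `MERGE` performs. A toy in-memory version of that apply step (all names are ours):

```python
# Toy CDC "apply changes" step: replay insert/update/delete events
# onto a target table keyed by primary key, in commit order. This is
# conceptually what APPLY CHANGES INTO does against a Delta table.

def apply_changes(target, changes):
    """target: dict mapping key -> row.
    changes: list of (op, key, row) events in commit order."""
    for op, key, row in changes:
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" and "update" both upsert
            target[key] = row
    return target

table = {1: {"name": "ana", "city": "Madrid"}}
table = apply_changes(table, [
    ("insert", 2, {"name": "luis", "city": "Bilbao"}),
    ("update", 1, {"name": "ana", "city": "Sevilla"}),
    ("delete", 2, None),
])
# table now holds only key 1, with city updated to "Sevilla"
```

The real Databricks feature adds what this sketch omits: ordering by a sequence column, out-of-order tolerance, and optional SCD type 2 history.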
Module 4: Implementing CI/CD Workflows in Azure Databricks
- Introduction
- Implementing version control and Git integration
- Performing unit tests and integration tests
- Environment administration and configuration
- Implementing rollback and roll-forward strategies
- Exercise: Implementing CI/CD Workflows
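The unit-testing topic in this module usually comes down to one pattern: factor transformation logic into plain functions that run off-cluster, so CI can test them without provisioning Databricks. A minimal sketch of that pattern (function and test names are ours):

```python
# CI/CD-friendly pattern: keep transformation logic in a pure function
# so it can be unit-tested without a Spark cluster, then apply the same
# function to DataFrame rows in the real pipeline.

def normalize_record(rec):
    """Trim whitespace, lower-case the email, coerce amount to float."""
    return {
        "email": rec["email"].strip().lower(),
        "amount": float(rec["amount"]),
    }

def test_normalize_record():
    out = normalize_record({"email": "  Ana@Example.COM ", "amount": "12.5"})
    assert out == {"email": "ana@example.com", "amount": 12.5}

test_normalize_record()  # in CI this would run under pytest
```

Integration tests then exercise the same function through a small Spark job against sample data, so the two test layers share one implementation.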
Module 5: Automating Workloads with Azure Databricks Jobs
- Introduction
- Implementing job scheduling and automation
- Optimizing workflows with parameters
- Managing task dependencies
- Implementing error handling and retry mechanisms
- Exploring best practices and guidelines
- Exercise: Automating data processing and ingestion
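On the error-handling topic: Databricks Jobs can retry failed tasks automatically, and the same retry-with-backoff idea applies inside task code when calling flaky external systems. A sketch of the pattern in plain Python (helper and task names are ours):

```python
import time

# Sketch of a retry helper with exponential backoff, the pattern
# behind the automatic task retries a Databricks Job can configure.

def with_retries(fn, max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise            # retries exhausted: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))  # back off

# Simulated flaky task: fails twice, then succeeds.
calls = {"n": 0}
def flaky_task():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = with_retries(flaky_task)
# → "done" after 3 attempts
```

In a real job you would usually prefer the platform-level retry setting, reserving in-code retries for fine-grained operations such as individual API calls.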
Module 6: Managing Data Privacy and Governance with Azure Databricks
- Introduction
- Implementing data encryption techniques in Azure Databricks
- Managing access controls in Azure Databricks
- Implementing data masking and anonymization in Azure Databricks
- Using compliance frameworks and secure data sharing in Azure Databricks
- Using data lineage and metadata management
- Implementing governance automation in Azure Databricks
- Exercise: Implementing Unity Catalog
Module 7: Using SQL Warehouses in Azure Databricks
- Introduction
- Introduction to SQL Warehouses
- Creating databases and tables
- Creating queries and dashboards
- Exercise: Using a SQL Warehouse in Azure Databricks
Module 8: Running Azure Databricks Notebooks with Azure Data Factory
- Introduction
- Understanding Azure Databricks Notebooks and Pipelines
- Creating a Linked Service for Azure Databricks
- Using a Notebook Activity in a Pipeline
- Using parameters in a notebook
- Exercise: Running an Azure Databricks Notebook with Azure Data Factory
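Module 8's parameter flow works in two halves: Azure Data Factory passes values through the Notebook activity's `baseParameters`, and the notebook reads them with `dbutils.widgets.get`. Below is a sketch of the ADF side expressed as the JSON the pipeline definition would carry, written as a Python dict (the activity name, linked service name, and notebook path are invented for illustration):

```python
# Sketch of an ADF "DatabricksNotebook" activity with baseParameters,
# as it would appear in the pipeline JSON. Names and the notebook path
# are placeholders; real values come from your workspace.

notebook_activity = {
    "name": "RunProcessNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Shared/process_data",
        "baseParameters": {
            # Inside the notebook: folder = dbutils.widgets.get("folder")
            "folder": "2025-07-18",
        },
    },
}
```

On the notebook side, a matching widget must exist (or be created with `dbutils.widgets.text`) for each key in `baseParameters`; unmatched parameters are simply ignored.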
Prerequisites
None
Language
- Course: English / Spanish