DP-3027: Implement a data engineering solution with Azure Databricks

€495.00

________________________________________________________________

Are you interested in this course in online or in-person format?
Contact us

📧info@nanforiberica.com • 📞+34 91 031 66 78 • 📱 +34 685 60 05 91 (WhatsApp) • 🏢 Our Offices

________________________________________________________________

NEW COURSE: The launch of the DP-3027 material, Implement a data engineering solution with Azure Databricks, has been delayed; the release date is still to be determined.

Course DP-3027: Implement a data engineering solution with Azure Databricks

In this course, you will learn how to leverage Apache Spark and the powerful clusters of the Azure Databricks platform to run large-scale data engineering workloads in the cloud.

Level: Beginner - Role: Data Analyst, Data Engineer, Data Scientist - Product: Azure - Subject: Data Engineering

Who this course is aimed at

This course is aimed at data engineers, data scientists, and ELT developers who want to learn how to leverage Apache Spark and Azure Databricks clusters to run large-scale data engineering workloads in the cloud.

Objectives of the official course DP-3027

  • Understand the Azure Databricks architecture: become familiar with the platform's key components and how they integrate with other Azure services.

  • Implement data ingestion techniques: capture data from multiple sources using tools such as Structured Streaming and Delta Lake.

  • Perform data transformations and processing: use Apache Spark to clean, transform, and prepare data for analysis or storage.

  • Develop scalable ETL flows: build efficient, reusable data pipelines that handle large volumes of data.

  • Optimize process performance: apply tuning, autoscaling, and observability strategies to improve workflow efficiency.

  • Implement streaming architectures with Delta Live Tables: design real-time solutions for continuous data processing.

  • Automate tasks with Azure Databricks Jobs: orchestrate and schedule workflows to reduce manual intervention and accelerate the delivery of insights.

  • Apply CI/CD in data environments: integrate continuous integration and delivery practices to maintain the quality and stability of data solutions.

Content of the official Azure Databricks DP-3027 course

Module 1: Perform incremental processing with Spark Structured Streaming

  • Introduction
  • Configuring real-time data sources for incremental processing
  • Optimize Delta Lake for incremental processing on Azure Databricks
  • Handling late data and out-of-order events in incremental processing
  • Performance monitoring and tuning strategies for incremental processing in Azure Databricks
  • Exercise: Real-time data ingestion and processing with Delta Live Tables using Azure Databricks
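The late-data handling covered in this module is based on the watermark idea: the stream tracks the maximum event time seen so far, and events that arrive too far behind it are discarded. The following is a minimal pure-Python sketch of that concept only, not the Spark Structured Streaming API (in Spark you would use `withWatermark` on a streaming DataFrame); timestamps and the lateness threshold are illustrative.

```python
# Sketch of watermark-based late-event handling (concept only).
# The watermark trails the maximum event time seen so far by a
# fixed allowed lateness; events behind the watermark are dropped.

ALLOWED_LATENESS = 10  # seconds of tolerated lateness (illustrative)

def process(events):
    """Accept (timestamp, payload) events unless behind the watermark."""
    max_event_time = 0
    accepted, dropped = [], []
    for ts, payload in events:
        max_event_time = max(max_event_time, ts)
        watermark = max_event_time - ALLOWED_LATENESS
        if ts >= watermark:
            accepted.append(payload)
        else:
            dropped.append(payload)  # arrived after the watermark passed it
    return accepted, dropped

# An out-of-order but tolerable event is kept; a very late one is not.
accepted, dropped = process(
    [(100, "a"), (105, "b"), (97, "late-but-ok"), (85, "too-late")]
)
```

Spark applies the same rule per aggregation window, which is what lets it bound the state it keeps for incremental processing.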

Module 2: Implementing streaming architecture patterns with Delta Live Tables

  • Introduction
  • Event-driven architectures with Delta Live Tables
  • Data ingestion with structured streaming
  • Maintaining data consistency and reliability with structured streaming
  • Scaling streaming workloads with Delta Live Tables
  • Exercise: End-to-end streaming pipeline with Delta Live Tables

Module 3: Performance Optimization with Spark and Delta Live Tables

  • Introduction
  • Performance optimization with Spark and Delta Live Tables
  • Cost-based optimization and query tuning
  • Use of Change Data Capture (CDC)
  • Use of enhanced autoscaling
  • Implement observability and data quality metrics
  • Exercise: Optimizing data pipelines to improve performance in Azure Databricks
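Change Data Capture (CDC), listed above, means merging a stream of insert/update/delete events into a keyed target table instead of reprocessing everything. The sketch below illustrates the merge semantics in plain Python; it is a conceptual illustration only, not the Databricks `APPLY CHANGES INTO` / `MERGE` syntax, and the keys and rows are made up.

```python
# Sketch of applying CDC events to a target keyed by primary key.
# Each event is (operation, key, row); events are assumed ordered.

def apply_changes(target, changes):
    """Merge ordered CDC events into a dict acting as the target table."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            target[key] = row          # upsert the latest image of the row
        elif op == "delete":
            target.pop(key, None)      # remove the row if present
    return target

table = {}
events = [
    ("insert", 1, {"name": "Ada"}),
    ("insert", 2, {"name": "Alan"}),
    ("update", 1, {"name": "Ada L."}),
    ("delete", 2, None),
]
table = apply_changes(table, events)  # only the final image of key 1 remains
```

The efficiency win of CDC is exactly this: the target is updated row by row from the change feed rather than rebuilt from the full source on every run.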

Module 4: Implementing CI/CD workflows in Azure Databricks

  • Introduction
  • Implementation of version control and Git integration
  • Performing unit tests and integration tests
  • Environment administration and configuration
  • Implementation of rollback and roll-forward strategies
  • Exercise: Implementation of CI/CD workflows
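The unit-testing step above usually means testing transformation logic in isolation so a CI pipeline can run the tests on every commit. Here is a small sketch in that style; the transformation, field names, and sample records are illustrative, not taken from the course materials.

```python
# Sketch of unit-testing a data transformation, the kind of check a
# CI/CD pipeline would run on every commit before deployment.

def clean_emails(rows):
    """Normalize e-mail addresses and drop rows without one."""
    return [
        {**r, "email": r["email"].strip().lower()}
        for r in rows
        if r.get("email")
    ]

def test_clean_emails():
    raw = [
        {"id": 1, "email": " Ada@Example.COM "},
        {"id": 2, "email": None},  # no address: should be dropped
    ]
    assert clean_emails(raw) == [{"id": 1, "email": "ada@example.com"}]

test_clean_emails()  # a CI runner (e.g. pytest) would discover this itself
```

Keeping transformations as plain functions like this, separate from notebook glue code, is what makes them testable outside a cluster.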

Module 5: Automating workloads with Azure Databricks Jobs

  • Introduction
  • Implementation of job scheduling and automation
  • Workflow optimization with parameters
  • Managing job dependencies
  • Implementation of error control and retry mechanisms
  • Exploring best practices and guidelines
  • Exercise: Automation of data processing and ingestion
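The "error control and retry mechanisms" item refers to the standard retry-with-exponential-backoff pattern that job schedulers apply to transient failures. Below is a minimal sketch of that pattern; the delays and the flaky task are illustrative, and a real scheduler would typically also add jitter and alerting.

```python
# Sketch of retry with exponential backoff for transient task failures.
import time

def run_with_retries(task, max_retries=3, base_delay=1.0):
    """Run task(); on failure, wait exponentially longer and retry."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise                              # retries exhausted
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# A task that fails twice before succeeding (base_delay=0 to skip waits).
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky, max_retries=3, base_delay=0)
```

Databricks Jobs expose the same idea declaratively (maximum retries and retry intervals per task), so hand-rolled loops like this are only needed for logic inside the task itself.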

Module 6: Privacy Management and Data Governance with Azure Databricks

  • Introduction
  • Implementing data encryption techniques in Azure Databricks
  • Managing access controls in Azure Databricks
  • Implementing data masking and anonymization in Azure Databricks
  • Using compliance frameworks and secure data sharing in Azure Databricks
  • Use of data lineage and metadata management
  • Implementing governance automation in Azure Databricks
  • Exercise: Implementing Unity Catalog
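Masking and anonymization, as covered above, typically combine two moves: redacting the part of a value that identifies a person, and replacing the full value with a stable pseudonym so records can still be joined. This is a generic sketch of both ideas, not a Databricks feature; the salt, field names, and masking rule are assumptions for illustration, and a real deployment would keep the salt in a secret manager.

```python
# Sketch of data masking and salted-hash pseudonymization.
import hashlib

SALT = b"example-salt"  # assumption: in practice, stored as a secret

def pseudonymize(value):
    """Replace an identifier with a deterministic salted SHA-256 digest."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

def mask_email(email):
    """Keep the domain for analytics; hide the identifying local part."""
    local, _, domain = email.partition("@")
    return "***@" + domain

email = "ada@example.com"
masked = mask_email(email)       # safe to show in reports
token = pseudonymize(email)      # stable join key without the raw value
```

Because the hash is deterministic for a given salt, the same person maps to the same token across tables, which is what keeps joins possible after anonymization.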

Module 7: Using SQL Warehouses in Azure Databricks

  • Introduction
  • Introduction to SQL Warehouses
  • Creating databases and tables
  • Creating queries and dashboards
  • Exercise: Using a SQL Warehouse in Azure Databricks
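The "creating tables" and "creating queries" steps boil down to standard SQL DDL and aggregate queries. The sketch below shows the shape of those statements using Python's built-in `sqlite3` module so it is runnable anywhere; a SQL Warehouse would run equivalent statements over Delta tables, and the table and column names here are invented for illustration.

```python
# Sketch of table creation and an aggregate query in standard SQL,
# run through sqlite3 for portability (not the Databricks SQL engine).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('EMEA', 120.0), ('EMEA', 80.0), ('APAC', 50.0);
""")

# The kind of grouped query a dashboard panel would be built on.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

In a SQL Warehouse the same query would be saved as a named query and attached to a dashboard visualization.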

Module 8: Running Azure Databricks notebooks with Azure Data Factory

  • Introduction
  • Description of Azure Databricks notebooks and pipelines
  • Creating a linked service for Azure Databricks
  • Using a Notebook activity in a pipeline
  • Using parameters in a notebook
  • Exercise: Running an Azure Databricks notebook with Azure Data Factory

Prerequisites

None

Language

  • Course: English / Spanish

💡 Did you know this course is included in LaaS Cert?

Take this course and many more with our LaaS Cert annual license. Unlimited training for only €1,295!

✅ Microsoft, Linux-LPI, SCRUM, ITIL and Nanfor technical courses

✅ Personalized support always by your side

✅ 100% online, official and updated

Get your license now!

LaaS Cert: unlimited training

Information related to training

Training support: always by your side

Training modalities: Self Learning - Virtual - In-person - Telepresence

Bonuses: for companies