• Welcome to CloudMonks
  • +91 96660 64406
  • info@thecloudmonks.com

Azure Data Engineering Full Stack


An Azure Data Engineering full-stack role encompasses the entire data lifecycle within the Azure ecosystem, from data ingestion to analysis and reporting. These engineers are responsible for designing, implementing, and maintaining data pipelines, data warehouses, and data lake solutions using a variety of Azure services. They also handle tasks like data transformation, security, and performance optimization.


Responsibilities:

Data Ingestion and Extraction:

Bringing data from various sources (structured, unstructured, real-time) into Azure.

Data Transformation and Cleaning:

Ensuring data quality and consistency through cleaning, transformation, and integration processes.

Data Storage:

Designing and implementing data storage solutions, including Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.

Data Warehousing:

Building and maintaining data warehouses using Azure Synapse Analytics.

Data Pipeline Development:

Creating and managing automated data pipelines for efficient data movement and processing using Azure Data Factory or Azure Databricks.

Data Security and Compliance:

Implementing security measures (encryption, access control) and ensuring compliance with data privacy laws.

Performance Monitoring and Optimization:

Identifying and resolving performance bottlenecks in data systems.

Collaboration:

Working with data scientists, analysts, and business stakeholders to understand their needs and implement appropriate data solutions.



Azure Data Engineering Full Stack Course Curriculum



Azure Databricks

Day 1:

  • What is Big Data Analytics
  • Data Analytics Platform
    • Storage
    • Compute
  • Data Processing Paradigms
    • Monolithic Computing
    • Distributed Computing

    Day 2:

  • Distributed Computing Frameworks
    • Hadoop MapReduce
    • Apache Spark
  • Big Data Analytics: Data Lakes
    • Tightly Coupled Data Lake
    • Loosely Coupled Data Lake

    Day 3:

  • Big Data File Formats
    • Row Storage Format
    • Columnar Storage Format
  • Scalability
    • Scale-Up (Vertical Scalability)
    • Scale-Out (Horizontal Scalability)

    Day 4: Introduction to Azure Databricks

    • Core Databricks Concepts
      • Workspace
      • Notebooks
      • Library
      • Folder
      • Repos
      • Data
      • Compute
      • Workflows

    Day 5: Introducing Spark Fundamentals

    • What is Apache Spark
    • Why Choose Apache Spark
    • What are the Spark use cases

    Day 6: Spark Architecture

    • Spark Components
      • Spark Driver
      • SparkSession
      • Cluster Manager
      • Spark Executors

    Day 7: Create Databricks Workspace

    • Workspace Assets

    Day 8: Creating Spark Cluster

    • All-Purpose Cluster
      • Single Node Cluster
      • Multi Node Cluster

    Day 9: Databricks - Internal Storage

    • Databricks File System (DBFS)
    • Uploading Files to DBFS

    Day 10: The dbutils Module

    • Interaction with DBFS
    • %fs Magic Command

    Day 11: Spark Data APIs

    • RDD (Resilient Distributed Dataset)
    • DataFrame
    • Dataset

    Day 12: Create DataFrame

    • Using Python Collection
    • Converting RDD to DataFrame

    Day 13: Reading CSV data with Apache Spark

    • Inferred Schema
    • Explicit Schema
    • Parsing Modes
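
    A minimal PySpark sketch of these reading modes, assuming a hypothetical
    /FileStore/sales.csv with product and amount columns (spark is the
    SparkSession that Databricks notebooks provide):

        from pyspark.sql.types import StructType, StructField, StringType, DoubleType

        # Inferred schema: Spark samples the file and guesses the column types
        df_inferred = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("/FileStore/sales.csv"))

        # Explicit schema: faster and safer for production jobs
        schema = StructType([
            StructField("product", StringType(), True),
            StructField("amount", DoubleType(), True),
        ])
        df_explicit = (spark.read
            .option("header", "true")
            .schema(schema)
            .option("mode", "PERMISSIVE")  # parsing modes: PERMISSIVE, DROPMALFORMED, FAILFAST
            .csv("/FileStore/sales.csv"))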

    Day 14: Reading JSON data with Apache Spark

    • Single-Line JSON
    • Multi-Line JSON
    • Complex JSON
    • explode() Function
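
    A short sketch of multi-line JSON reading and explode(), assuming a
    hypothetical orders.json whose records carry an order_id field and an
    items array:

        from pyspark.sql.functions import explode, col

        # A JSON document spanning multiple lines needs the multiLine option
        df = (spark.read
            .option("multiLine", "true")
            .json("/FileStore/orders.json"))

        # explode() turns each element of the items array into its own row
        flat = df.select(col("order_id"), explode(col("items")).alias("item"))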

    Day 15: Reading XML Data with Apache Spark

    • Install the spark-xml Library
    • User Defined Schema
      • DDL String Approach
      • StructType() with StructFields()
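
    The two user-defined schema styles side by side, sketched for a
    hypothetical employees.xml with an <employee> row tag (requires the
    spark-xml library on the cluster):

        from pyspark.sql.types import StructType, StructField, StringType, IntegerType

        # DDL-string approach
        ddl_schema = "id INT, name STRING, city STRING"

        # Equivalent StructType() with StructField()s
        struct_schema = StructType([
            StructField("id", IntegerType(), True),
            StructField("name", StringType(), True),
            StructField("city", StringType(), True),
        ])

        df = (spark.read
            .format("xml")
            .option("rowTag", "employee")
            .schema(struct_schema)
            .load("/FileStore/employees.xml"))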

    Day 16: Reading Excel Files with Apache Spark

    • Single Sheet Reading
    • Multiple Sheet Reading Using a List Object

    Day 17: Reading Excel Files with Apache Spark

    • Multiple Excel Sheets with Same Structure
    • Multiple Excel Sheets with Different Structures

    Day 18: Reading Parquet Data with Apache Spark

    • Uploading Parquet Data
    • Viewing the Data in the DataFrame
    • Viewing the Schema of the DataFrame
    • Limitations of the Parquet Format
    • Schema Evolution
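
    A minimal sketch, assuming a hypothetical /FileStore/events/ folder of
    Parquet part files:

        # Parquet embeds its own schema, so no header or schema options are needed
        df = spark.read.parquet("/FileStore/events/")
        df.show()
        df.printSchema()

        # Schema evolution: merge differing schemas across part files
        merged = (spark.read
            .option("mergeSchema", "true")
            .parquet("/FileStore/events/"))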

    Day 19: Introduction to Delta Lake

    • Delta Lake Features
    • Delta Lake Components

    Day 20: Delta Lake Features

    • DML Operations
    • Time Travel Operations
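
    A sketch of both feature groups against a hypothetical Delta table named
    sales:

        # DML operations run directly against the Delta table
        spark.sql("UPDATE sales SET amount = amount * 1.1 WHERE region = 'APAC'")
        spark.sql("DELETE FROM sales WHERE amount < 0")

        # Time travel: inspect the history, then query an older snapshot
        spark.sql("DESCRIBE HISTORY sales").show()
        v0 = spark.sql("SELECT * FROM sales VERSION AS OF 0")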

    Day 21: Delta Lake Features

    • Schema Validation and Enforcement
    • Schema Evolution

    Day 22: Access Data from Azure Blob Storage

    • Account Access Key
    • Windows Azure Storage Blob driver (WASB)
    • Read Operations
    • Write Operations
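
    A sketch of account-key access over WASB; the storage account, container,
    and key are placeholders to fill in:

        # Register the account access key with the Spark conf
        spark.conf.set(
            "fs.azure.account.key.<storage-account>.blob.core.windows.net",
            "<access-key>")

        # WASB URI: wasbs://<container>@<storage-account>.blob.core.windows.net/<path>
        df = spark.read.csv("wasbs://data@<storage-account>.blob.core.windows.net/input/")
        df.write.parquet("wasbs://data@<storage-account>.blob.core.windows.net/output/")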

    Day 23: Access Data from Azure Data Lake Gen2

    • Azure Service Principal
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations
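
    A sketch of OAuth access via an Azure service principal over ABFS; every
    angle-bracket value is a placeholder:

        # Service principal credentials registered per storage account
        base = "<storage-account>.dfs.core.windows.net"
        spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
        spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
                       "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
        spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", "<application-id>")
        spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}", "<client-secret>")
        spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
                       "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

        # ABFS URI: abfss://<container>@<storage-account>.dfs.core.windows.net/<path>
        df = spark.read.parquet(f"abfss://data@{base}/input/")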

    Day 24: Access Data from Azure Data Lake Gen2

    • Shared Access Signatures (SAS)
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations

    Day 25: Access Data from Azure SQL Database

    • Configure a Connection to SQL Server
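
    A minimal JDBC read, assuming placeholder server, database, table, and
    credentials (in practice the secret would come from a Key Vault-backed
    secret scope):

        jdbc_url = ("jdbc:sqlserver://<server>.database.windows.net:1433;"
                    "database=<database>")

        df = (spark.read
            .format("jdbc")
            .option("url", jdbc_url)
            .option("dbtable", "dbo.customers")
            .option("user", "<user>")
            .option("password", "<password>")
            .load())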

    Day 26: Access Data from Synapse Dedicated SQL Pool

    • Configure storage account access key
    • Read Data from an Azure Synapse Table
    • Write Data to an Azure Synapse Table

    Day 27: Access Data from Snowflake

    • Reading Data
    • Writing Data

    Day 28: Create Mount Points to Azure Cloud Storage

    • Azure Blob Storage
    • Azure Data Lake Storage
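
    A sketch of mounting ADLS Gen2 through a service principal; the mount
    point, container, and credentials are placeholders:

        configs = {
            "fs.azure.account.auth.type": "OAuth",
            "fs.azure.account.oauth.provider.type":
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
            "fs.azure.account.oauth2.client.id": "<application-id>",
            "fs.azure.account.oauth2.client.secret": "<client-secret>",
            "fs.azure.account.oauth2.client.endpoint":
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
        }

        dbutils.fs.mount(
            source="abfss://data@<storage-account>.dfs.core.windows.net/",
            mount_point="/mnt/data",
            extra_configs=configs)

        # Once mounted, the storage behaves like any DBFS path
        df = spark.read.parquet("/mnt/data/input/")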

    Day 29: Introduction to Spark SQL Module

    • Hive Metastore
    • Spark Catalog

    Day 30: Spark SQL - Create Global Managed Tables

    • DataFrame API
    • SQL API

    Day 31: Spark SQL - Create Global Unmanaged Tables

    • DataFrame API
    • SQL API

    Day 32: Spark SQL - Create Views

    • Temporary Views
    • Global Temporary Views
    • DataFrame API
    • SQL API
    • Dropping Views
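
    Both view flavors in a short sketch, assuming df is an existing DataFrame
    with region and amount columns:

        # Temporary view: visible only to the current SparkSession
        df.createOrReplaceTempView("sales_tmp")
        spark.sql("SELECT region, SUM(amount) FROM sales_tmp GROUP BY region").show()

        # Global temporary view: shared across sessions via the global_temp database
        df.createOrReplaceGlobalTempView("sales_gtmp")
        spark.sql("SELECT * FROM global_temp.sales_gtmp").show()

        # Dropping views
        spark.catalog.dropTempView("sales_tmp")
        spark.catalog.dropGlobalTempView("sales_gtmp")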

    Day 33: Spark Batch Processing

    • Reading Batch Data
    • Writing Batch Data

    Day 34: Spark Structured Streaming API

    • Reading Streaming Data
    • Writing Streaming Data
    • Checkpoint Location
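
    A minimal read-then-write streaming sketch; the paths and the schema
    variable are assumptions:

        # Streaming sources require an explicit schema
        stream_df = (spark.readStream
            .format("json")
            .schema(schema)
            .load("/mnt/data/incoming/"))

        # The checkpoint location lets the query recover exactly where it stopped
        query = (stream_df.writeStream
            .format("delta")
            .option("checkpointLocation", "/mnt/data/_checkpoints/events")
            .outputMode("append")
            .start("/mnt/data/events/"))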

    Day 35: Spark Structured Streaming API - Output Modes

    • Append
    • Complete
    • Update

    Day 36: Spark Structured Streaming API - Triggers

    • Unspecified Trigger (Default Behavior)
    • trigger(availableNow = True)
    • trigger(processingTime = "n minutes")

    Day 37: Spark Structured Streaming API

    • Data Processing
    • Joins
    • Aggregation

    Day 38: Code Modularity of Notebooks

    • %run Magic Command

    Day 39: dbutils.notebook Utility

    • run()
    • exit()
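
    A sketch of both calls; the child-notebook path, timeout, and parameters
    are assumptions:

        # In the parent: run() executes the child notebook and blocks until it finishes
        result = dbutils.notebook.run("/Shared/child_notebook", 600, {"env": "dev"})

        # In the child: exit() returns a string value to the caller
        dbutils.notebook.exit("load complete")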

    Day 40: Widgets - Types of Widgets

    • text
    • dropdown
    • multiselect
    • combobox
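
    All four widget types in one sketch; the names, defaults, and choices are
    assumptions:

        dbutils.widgets.text("file_name", "sales.csv")
        dbutils.widgets.dropdown("env", "dev", ["dev", "test", "prod"])
        dbutils.widgets.multiselect("regions", "APAC", ["APAC", "EMEA", "AMER"])
        dbutils.widgets.combobox("format", "csv", ["csv", "json", "parquet"])

        # Read a widget value inside the notebook
        file_name = dbutils.widgets.get("file_name")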

    Day 41: Parameterization of Notebooks

    • History Load
    • Incremental Load

    Day 42: Trigger Notebook from Data Factory Pipeline

    • Notebook Parameters

    Day 43: Databricks Workflow

    • Orchestration of Tasks

    Day 44: Databricks Workflow

    • Task Parameters
    • Job Trigger

    Day 45: Delta Lake Implementation

    • SCD Type 0 Dimension

    Day 46: Delta Lake Implementation

    • SCD Type 1 Dimension

    Day 47: Delta Lake Implementation

    • SCD Type 2 Dimension

    Day 48: Delta Lake Implementation

    • SCD Type 3 Dimension

    Day 49: PySpark Performance Optimization

    • cache()
    • persist()
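
    A quick sketch of the difference, assuming df is an existing DataFrame:

        from pyspark import StorageLevel

        # cache() stores the DataFrame at the default level (MEMORY_AND_DISK)
        df.cache()
        df.count()        # an action is needed to actually materialize the cache
        df.unpersist()    # release before choosing a different storage level

        # persist() lets you pick the storage level explicitly
        df.persist(StorageLevel.DISK_ONLY)
        df.count()
        df.unpersist()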

    Day 50: PySpark Performance Optimization

    • repartition()
    • coalesce()
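
    The contrast in one sketch, again assuming an existing df:

        # repartition(n) performs a full shuffle; it can raise or lower the partition count
        df_repart = df.repartition(8)

        # coalesce(n) only merges existing partitions (no full shuffle); it can only lower it
        df_coal = df.coalesce(2)

        print(df.rdd.getNumPartitions())   # inspect the current partition count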

    Day 51: PySpark Performance Optimization

    • Column Predicate Pushdown
    • partitionBy()

    Day 52: PySpark Performance Optimization

    • bucketBy()

    Day 53: PySpark Performance Optimization

    • Broadcast Join
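
    A one-line sketch, assuming a large fact_df and a small dim_df that share
    a product_id key:

        from pyspark.sql.functions import broadcast

        # Shipping the small table to every executor avoids shuffling the large one
        joined = fact_df.join(broadcast(dim_df), "product_id")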

    Day 54: Delta Lake - Performance Optimization

    • OPTIMIZE
    • ZORDER
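
    A sketch against a hypothetical sales Delta table; the Z-order columns are
    assumptions:

        # OPTIMIZE compacts many small files into fewer large ones;
        # ZORDER co-locates rows that share values in the listed columns
        spark.sql("OPTIMIZE sales ZORDER BY (region, order_date)")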

    Day 55: Delta Lake - Performance Optimization

    • Delta Cache

    Day 56: Delta Lake - Performance Optimization

    • Liquid Clustering

    Day 57: Delta Lake - Performance Optimization

    • Partitioning
    • Liquid Clustering

    Day 58: Databricks Unity Catalog

    • Metastore
    • Catalog
    • Schema
    • Tables
    • Volumes
    • Views

    Day 59: Databricks Unity Catalog

    • Managed Tables
    • External Tables

    Day 60: Databricks Unity Catalog

    • Managed Volumes
    • External Volumes

    Day 61: Databricks - Auto Loader

    • Auto Loader file detection modes
      • Directory Listing mode
      • File Notification mode
    • Schema Evolution with Auto Loader
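
    A minimal Auto Loader sketch with schema evolution enabled; all paths are
    placeholders:

        # The cloudFiles source is Auto Loader; schemaLocation enables schema
        # tracking and evolution across incoming files
        stream = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", "/mnt/data/_schemas/events")
            .load("/mnt/data/landing/"))

        (stream.writeStream
            .format("delta")
            .option("checkpointLocation", "/mnt/data/_checkpoints/autoloader")
            .start("/mnt/data/bronze/events/"))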

    Day 62: Delta Live Tables

    • Simple Declarative SQL & Python APIs
    • Automated Pipeline Creation
    • Data Quality Checks
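
    A declarative sketch of a DLT table with a data-quality expectation; the
    source table name and quality rule are assumptions:

        import dlt
        from pyspark.sql.functions import col

        @dlt.table(comment="Cleaned orders")
        @dlt.expect_or_drop("valid_amount", "amount > 0")   # rows failing the rule are dropped
        def clean_orders():
            return spark.read.table("raw_orders").where(col("order_id").isNotNull())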


    Databricks with PySpark Assessments (@Home)

    • ADB_Assessment1_ADB_PySpark_Types of Operations_Transformations_Actions
    • ADB_Assessment2_PySpark_Transformations_select()_selectExpr()
    • ADB_Assessment3_PySpark_Transformations_Data Cleansing_filter()_where()
    • ADB_Assessment4_PySpark_Transformation_Identifying Duplicates_Remove Duplicates
    • ADB_Assessment5_PySpark_Transformation_Sorting Data_sort()_orderBy()
    • ADB_Assessment6_PySpark_Transformation_Single_Multi_Aggregation
    • ADB_Assessment7_PySpark_DataFrame_Renaming Columns using List Comprehension
    • ADB_Assessment8_PySpark_Introduction to Join_Types of Joins
    • ADB_Assessment9_PySpark_Implementation of Joins_Types of Joins
    • ADB_Assessment10_PySpark_Drop Rows that Contains Nulls_dropna()_na.drop()
    • ADB_Assessment11_PySpark_Fill Rows that Contains Nulls_fillna()_na.fill()
    • ADB_Assessment12_PySpark_Window Functions_rank()_dense_rank()_row_number()

    Azure Data Factory

    • ADF Key Components
    • Copy Data from ADLS Gen2 to Azure SQL Database
    • Dynamic Data Ingestion from Files to Tables
    • Dynamic Data Ingestion from Files to Tables_Check File Existence
    • Dynamic Data Ingestion from Tables to Files Using Lookup Query
    • Data Factory Resource Integration with Azure Key Vault
    • Data Flows with Transformation Logic
    • Pipeline Execution Success and Error Logs
    • Pipeline with Incremental Loading
    • Pipeline that Sends Email Notifications on Success or Failure
    • Data Ingestion from On-Premise File System to Azure SQL Database


    Train your teams on the theory and build technical mastery of enterprise-essential cloud computing topics such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

    Talk With Us