• Welcome to CloudMonks
  • +91 96660 64406
  • info@thecloudmonks.com

Azure Data Engineering Full Stack


An Azure Data Engineering full-stack role encompasses the entire data lifecycle within the Azure ecosystem, from data ingestion to analysis and reporting. These engineers are responsible for designing, implementing, and maintaining data pipelines, data warehouses, and data lake solutions using a variety of Azure services. They also handle tasks like data transformation, security, and performance optimization.


Responsibilities:

Data Ingestion and Extraction:

Bringing data from various sources (structured, unstructured, real-time) into Azure.

Data Transformation and Cleaning:

Ensuring data quality and consistency through cleaning, transformation, and integration processes.

Data Storage:

Designing and implementing data storage solutions, including Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.

Data Warehousing:

Building and maintaining data warehouses using Azure Synapse Analytics.

Data Pipeline Development:

Creating and managing automated data pipelines for efficient data movement and processing using Azure Data Factory or Azure Databricks.

Data Security and Compliance:

Implementing security measures (encryption, access control) and ensuring compliance with data privacy laws.

Performance Monitoring and Optimization:

Identifying and resolving performance bottlenecks in data systems.

Collaboration:

Working with data scientists, analysts, and business stakeholders to understand their needs and implement appropriate data solutions.



Azure Data Engineering Full Stack Course Curriculum



Azure Databricks

Day 1:

  • What is Big Data Analytics
  • Data Analytics Platform
    • Storage
    • Compute
  • Data Processing Paradigms
    • Monolithic Computing
    • Distributed Computing

    Day 2:

  • Distributed Computing Frameworks
    • Hadoop MapReduce
    • Apache Spark
  • Big Data Analytics: Data Lakes
    • Tightly Coupled Data Lake
    • Loosely Coupled Data Lake

    Day 3:

  • Big Data File Formats
    • Row Storage Format
    • Columnar Storage Format
  • Scalability
    • Scale-Up (Vertical Scalability)
    • Scale-Out (Horizontal Scalability)

    Day 4: Introduction to Azure Databricks

    • Core Databricks Concepts
      • Workspace
      • Notebooks
      • Library
      • Folder
      • Repos
      • Data
      • Compute
      • Workflows

    Day 5: Introducing Spark Fundamentals

    • What is Apache Spark
    • Why Choose Apache Spark
    • What are the Spark use cases

    Day 6: Spark Architecture

    • Spark Components
      • Spark Driver
      • SparkSession
      • Cluster Manager
      • Spark Executors

    Day 7: Create Databricks Workspace

    • Workspace Assets

    Day 8: Creating Spark Cluster

    • All-Purpose Cluster
      • Single Node Cluster
      • Multi Node Cluster

    Day 9: Databricks - Internal Storage

    • Databricks File System (DBFS)
    • Uploading Files to DBFS

    Day 10: The dbutils Module

    • Interaction with DBFS
    • %fs Magic Command

    Day 11: Spark Data APIs

    • RDD (Resilient Distributed Dataset)
    • DataFrame
    • Dataset

    Day 12: Create DataFrame

    • Using Python Collection
    • Converting RDD to DataFrame

    Day 13: Reading CSV data with Apache Spark

    • Inferred Schema
    • Explicit Schema
    • Parsing Modes
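
    A minimal PySpark sketch of these reading modes, assuming a hypothetical
    /FileStore/sales.csv with product and amount columns (spark is the
    SparkSession that Databricks notebooks provide):

        from pyspark.sql.types import StructType, StructField, StringType, DoubleType

        # Inferred schema: Spark samples the file and guesses the column types
        df_inferred = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("/FileStore/sales.csv"))

        # Explicit schema: faster and safer for production jobs
        schema = StructType([
            StructField("product", StringType(), True),
            StructField("amount", DoubleType(), True),
        ])
        df_explicit = (spark.read
            .option("header", "true")
            .schema(schema)
            .option("mode", "PERMISSIVE")  # parsing modes: PERMISSIVE, DROPMALFORMED, FAILFAST
            .csv("/FileStore/sales.csv"))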

    Day 14: Reading JSON data with Apache Spark

    • Single-Line JSON
    • Multi-Line JSON
    • Complex JSON
    • explode() Function
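
    A short sketch of multi-line JSON reading and explode(), assuming a
    hypothetical orders.json whose records carry an order_id field and an
    items array:

        from pyspark.sql.functions import explode, col

        # A JSON document spanning multiple lines needs the multiLine option
        df = (spark.read
            .option("multiLine", "true")
            .json("/FileStore/orders.json"))

        # explode() turns each element of the items array into its own row
        flat = df.select(col("order_id"), explode(col("items")).alias("item"))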

    Day 15: Reading XML Data with Apache Spark

    • Install the spark-xml Library
    • User Defined Schema
      • DDL String Approach
      • StructType() with StructFields()
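
    The two user-defined schema styles side by side, sketched for a
    hypothetical employees.xml with an <employee> row tag (requires the
    spark-xml library on the cluster):

        from pyspark.sql.types import StructType, StructField, StringType, IntegerType

        # DDL-string approach
        ddl_schema = "id INT, name STRING, city STRING"

        # Equivalent StructType() with StructField()s
        struct_schema = StructType([
            StructField("id", IntegerType(), True),
            StructField("name", StringType(), True),
            StructField("city", StringType(), True),
        ])

        df = (spark.read
            .format("xml")
            .option("rowTag", "employee")
            .schema(struct_schema)
            .load("/FileStore/employees.xml"))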

    Day 16: Reading Excel Files with Apache Spark

    • Single Sheet Reading
    • Multiple Sheet Reading Using a List Object

    Day 17: Reading Excel Files with Apache Spark

    • Multiple Excel Sheets with Same Structure
    • Multiple Excel Sheets with Different Structures

    Day 18: Reading Parquet Data with Apache Spark

    • Uploading Parquet Data
    • Viewing the Data in the DataFrame
    • Viewing the Schema of the DataFrame
    • Limitations of the Parquet Format
    • Schema Evolution
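
    A minimal sketch, assuming a hypothetical /FileStore/events/ folder of
    Parquet part files:

        # Parquet embeds its own schema, so no header or schema options are needed
        df = spark.read.parquet("/FileStore/events/")
        df.show()
        df.printSchema()

        # Schema evolution: merge differing schemas across part files
        merged = (spark.read
            .option("mergeSchema", "true")
            .parquet("/FileStore/events/"))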

    Day 19: Introduction to Delta Lake

    • Delta Lake Features
    • Delta Lake Components

    Day 20: Delta Lake Features

    • DML Operations
    • Time Travel Operations
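
    A sketch of both feature groups against a hypothetical Delta table named
    sales:

        # DML operations run directly against the Delta table
        spark.sql("UPDATE sales SET amount = amount * 1.1 WHERE region = 'APAC'")
        spark.sql("DELETE FROM sales WHERE amount < 0")

        # Time travel: inspect the history, then query an older snapshot
        spark.sql("DESCRIBE HISTORY sales").show()
        v0 = spark.sql("SELECT * FROM sales VERSION AS OF 0")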

    Day 21: Delta Lake Features

    • Schema Validation and Enforcement
    • Schema Evolution

    Day 22: Access Data from Azure Blob Storage

    • Account Access Key
    • Windows Azure Storage Blob driver (WASB)
    • Read Operations
    • Write Operations
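
    A sketch of account-key access over WASB; the storage account, container,
    and key are placeholders to fill in:

        # Register the account access key with the Spark conf
        spark.conf.set(
            "fs.azure.account.key.<storage-account>.blob.core.windows.net",
            "<access-key>")

        # WASB URI: wasbs://<container>@<storage-account>.blob.core.windows.net/<path>
        df = spark.read.csv("wasbs://data@<storage-account>.blob.core.windows.net/input/")
        df.write.parquet("wasbs://data@<storage-account>.blob.core.windows.net/output/")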

    Day 23: Access Data from Azure Data Lake Gen2

    • Azure Service Principal
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations
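
    A sketch of OAuth access via an Azure service principal over ABFS; every
    angle-bracket value is a placeholder:

        # Service principal credentials registered per storage account
        base = "<storage-account>.dfs.core.windows.net"
        spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
        spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
                       "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
        spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", "<application-id>")
        spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}", "<client-secret>")
        spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
                       "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

        # ABFS URI: abfss://<container>@<storage-account>.dfs.core.windows.net/<path>
        df = spark.read.parquet(f"abfss://data@{base}/input/")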

    Day 24: Access Data from Azure Data Lake Gen2

    • Shared Access Signatures (SAS)
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations

    Day 25: Access Data from Azure SQL Database

    • Configure a Connection to SQL Server
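
    A minimal JDBC read, assuming placeholder server, database, table, and
    credentials (in practice the secret would come from a Key Vault-backed
    secret scope):

        jdbc_url = ("jdbc:sqlserver://<server>.database.windows.net:1433;"
                    "database=<database>")

        df = (spark.read
            .format("jdbc")
            .option("url", jdbc_url)
            .option("dbtable", "dbo.customers")
            .option("user", "<user>")
            .option("password", "<password>")
            .load())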

    Day 26: Access Data from Synapse Dedicated SQL Pool

    • Configure storage account access key
    • Read Data from an Azure Synapse Table
    • Write Data to an Azure Synapse Table

    Day 27: Access Data from Snowflake

    • Reading Data
    • Writing Data

    Day 28: Create Mount Points to Azure Cloud Storage

    • Azure Blob Storage
    • Azure Data Lake Storage
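
    A sketch of mounting ADLS Gen2 through a service principal; the mount
    point, container, and credentials are placeholders:

        configs = {
            "fs.azure.account.auth.type": "OAuth",
            "fs.azure.account.oauth.provider.type":
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
            "fs.azure.account.oauth2.client.id": "<application-id>",
            "fs.azure.account.oauth2.client.secret": "<client-secret>",
            "fs.azure.account.oauth2.client.endpoint":
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
        }

        dbutils.fs.mount(
            source="abfss://data@<storage-account>.dfs.core.windows.net/",
            mount_point="/mnt/data",
            extra_configs=configs)

        # Once mounted, the storage behaves like any DBFS path
        df = spark.read.parquet("/mnt/data/input/")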

    Day 29: Introduction to Spark SQL Module

    • Hive Metastore
    • Spark Catalog

    Day 30: Spark SQL - Create Global Managed Tables

    • DataFrame API
    • SQL API

    Day 31: Spark SQL - Create Global Unmanaged Tables

    • DataFrame API
    • SQL API

    Day 32: Spark SQL - Create Views

    • Temporary Views
    • Global Temporary Views
    • DataFrame API
    • SQL API
    • Dropping Views
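
    Both view flavors in a short sketch, assuming df is an existing DataFrame
    with region and amount columns:

        # Temporary view: visible only to the current SparkSession
        df.createOrReplaceTempView("sales_tmp")
        spark.sql("SELECT region, SUM(amount) FROM sales_tmp GROUP BY region").show()

        # Global temporary view: shared across sessions via the global_temp database
        df.createOrReplaceGlobalTempView("sales_gtmp")
        spark.sql("SELECT * FROM global_temp.sales_gtmp").show()

        # Dropping views
        spark.catalog.dropTempView("sales_tmp")
        spark.catalog.dropGlobalTempView("sales_gtmp")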

    Day 33: Spark Batch Processing

    • Reading Batch Data
    • Writing Batch Data

    Day 34: Spark Structured Streaming API

    • Reading Streaming Data
    • Writing Streaming Data
    • Checkpoint Location
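
    A minimal read-then-write streaming sketch; the paths and the schema
    variable are assumptions:

        # Streaming sources require an explicit schema
        stream_df = (spark.readStream
            .format("json")
            .schema(schema)
            .load("/mnt/data/incoming/"))

        # The checkpoint location lets the query recover exactly where it stopped
        query = (stream_df.writeStream
            .format("delta")
            .option("checkpointLocation", "/mnt/data/_checkpoints/events")
            .outputMode("append")
            .start("/mnt/data/events/"))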

    Day 35: Spark Structured Streaming API - Output Modes

    • Append
    • Complete
    • Update

    Day 36: Spark Structured Streaming API - Triggers

    • Unspecified Trigger (Default Behavior)
    • trigger(availableNow = True)
    • trigger(processingTime = "n minutes")

    Day 37: Spark Structured Streaming API

    • Data Processing
    • Joins
    • Aggregation

    Day 38: Code Modularity of Notebooks

    • %run Magic Command

    Day 39: dbutils.notebook Utility

    • run()
    • exit()
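
    A sketch of both calls; the child-notebook path, timeout, and parameters
    are assumptions:

        # In the parent: run() executes the child notebook and blocks until it finishes
        result = dbutils.notebook.run("/Shared/child_notebook", 600, {"env": "dev"})

        # In the child: exit() returns a string value to the caller
        dbutils.notebook.exit("load complete")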

    Day 40: Widgets - Types of Widgets

    • text
    • dropdown
    • multiselect
    • combobox
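
    All four widget types in one sketch; the names, defaults, and choices are
    assumptions:

        dbutils.widgets.text("file_name", "sales.csv")
        dbutils.widgets.dropdown("env", "dev", ["dev", "test", "prod"])
        dbutils.widgets.multiselect("regions", "APAC", ["APAC", "EMEA", "AMER"])
        dbutils.widgets.combobox("format", "csv", ["csv", "json", "parquet"])

        # Read a widget value inside the notebook
        file_name = dbutils.widgets.get("file_name")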

    Day 41: Parameterization of Notebooks

    • History Load
    • Incremental Load

    Day 42: Trigger Notebook from Data Factory Pipeline

    • Notebook Parameters

    Day 43: Databricks Workflow

    • Orchestration of Tasks

    Day 44: Databricks Workflow

    • Task Parameters
    • Job Trigger

    Day 45: Delta Lake Implementation

    • SCD Type 0 Dimension

    Day 46: Delta Lake Implementation

    • SCD Type 1 Dimension

    Day 47: Delta Lake Implementation

    • SCD Type 2 Dimension

    Day 48: Delta Lake Implementation

    • SCD Type 3 Dimension

    Day 49: PySpark Performance Optimization

    • cache()
    • persist()
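
    A quick sketch of the difference, assuming df is an existing DataFrame:

        from pyspark import StorageLevel

        # cache() stores the DataFrame at the default level (MEMORY_AND_DISK)
        df.cache()
        df.count()        # an action is needed to actually materialize the cache
        df.unpersist()    # release before choosing a different storage level

        # persist() lets you pick the storage level explicitly
        df.persist(StorageLevel.DISK_ONLY)
        df.count()
        df.unpersist()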

    Day 50: PySpark Performance Optimization

    • repartition()
    • coalesce()
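
    The contrast in one sketch, again assuming an existing df:

        # repartition(n) performs a full shuffle; it can raise or lower the partition count
        df_repart = df.repartition(8)

        # coalesce(n) only merges existing partitions (no full shuffle); it can only lower it
        df_coal = df.coalesce(2)

        print(df.rdd.getNumPartitions())   # inspect the current partition count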

    Day 51: PySpark Performance Optimization

    • Column Predicate Pushdown
    • partitionBy()

    Day 52: PySpark Performance Optimization

    • bucketBy()

    Day 53: PySpark Performance Optimization

    • Broadcast Join
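
    A one-line sketch, assuming a large fact_df and a small dim_df that share
    a product_id key:

        from pyspark.sql.functions import broadcast

        # Shipping the small table to every executor avoids shuffling the large one
        joined = fact_df.join(broadcast(dim_df), "product_id")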

    Day 54: Delta Lake - Performance Optimization

    • OPTIMIZE
    • ZORDER
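
    A sketch against a hypothetical sales Delta table; the Z-order columns are
    assumptions:

        # OPTIMIZE compacts many small files into fewer large ones;
        # ZORDER co-locates rows that share values in the listed columns
        spark.sql("OPTIMIZE sales ZORDER BY (region, order_date)")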

    Day 55: Delta Lake - Performance Optimization

    • Delta Cache

    Day 56: Delta Lake - Performance Optimization

    • Liquid Clustering

    Day 57: Delta Lake - Performance Optimization

    • Partitioning
    • Liquid Clustering

    Day 58: Databricks Unity Catalog

    • Metastore
    • Catalog
    • Schema
    • Tables
    • Volumes
    • Views

    Day 59: Databricks Unity Catalog

    • Managed Tables
    • External Tables

    Day 60: Databricks Unity Catalog

    • Managed Volumes
    • External Volumes

    Day 61: Databricks - Auto Loader

    • Auto Loader file detection modes
      • Directory Listing mode
      • File Notification mode
    • Schema Evolution with Auto Loader
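
    A minimal Auto Loader sketch with schema evolution enabled; all paths are
    placeholders:

        # The cloudFiles source is Auto Loader; schemaLocation enables schema
        # tracking and evolution across incoming files
        stream = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", "/mnt/data/_schemas/events")
            .load("/mnt/data/landing/"))

        (stream.writeStream
            .format("delta")
            .option("checkpointLocation", "/mnt/data/_checkpoints/autoloader")
            .start("/mnt/data/bronze/events/"))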

    Day 62: Delta Live Tables

    • Simple Declarative SQL & Python APIs
    • Automated Pipeline Creation
    • Data Quality Checks
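
    A declarative sketch of a DLT table with a data-quality expectation; the
    source table name and quality rule are assumptions:

        import dlt
        from pyspark.sql.functions import col

        @dlt.table(comment="Cleaned orders")
        @dlt.expect_or_drop("valid_amount", "amount > 0")   # rows failing the rule are dropped
        def clean_orders():
            return spark.read.table("raw_orders").where(col("order_id").isNotNull())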


    Databricks with PySpark Assessments (@Home)

    • ADB_Assessment1_ADB_PySpark_Types of Operations_Transformations_Actions
    • ADB_Assessment2_PySpark_Transformations_select()_selectExpr()
    • ADB_Assessment3_PySpark_Transformations_Data Cleansing_filter()_where()
    • ADB_Assessment4_PySpark_Transformation_Identifying Duplicates_Remove Duplicates
    • ADB_Assessment5_PySpark_Transformation_Sorting Data_sort()_orderBy()
    • ADB_Assessment6_PySpark_Transformation_Single_Multi_Aggregation
    • ADB_Assessment7_PySpark_DataFrame_Renaming Columns using List Comprehension
    • ADB_Assessment8_PySpark_Introduction to Join_Types of Joins
    • ADB_Assessment9_PySpark_Implementation of Joins_Types of Joins
    • ADB_Assessment10_PySpark_Drop Rows that Contains Nulls_dropna()_na.drop()
    • ADB_Assessment11_PySpark_Fill Rows that Contains Nulls_fillna()_na.fill()
    • ADB_Assessment12_PySpark_Window Functions_rank()_dense_rank()_row_number()

    Azure Data Factory

    • ADF Key Components
    • Copy Data from ADLS Gen2 to Azure SQL Database
    • Dynamic Data Ingestion from Files to Tables
    • Dynamic Data Ingestion from Files to Tables_Check File Existence
    • Dynamic Data Ingestion from Tables to Files Using Lookup Query
    • Data Factory Resource Integration with Azure Key Vault
    • Data Flows with Transformation Logic
    • Pipeline Execution Success and Error Logs
    • Pipeline with Incremental Loading
    • Pipeline that Sends Email Notifications on Success or Failure
    • Data Ingestion from On-Premise File System to Azure SQL Database


    Train your teams on the theory and build technical mastery of enterprise-essential cloud computing topics such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

    Talk With Us