• Welcome to CloudMonks
  • INDIA: +91 96660 64406
  • USA : +1(971)-243-1687
  • info@thecloudmonks.com

Azure Data Engineering Full Stack

About The Course

Azure data engineers are responsible for data-related tasks that include provisioning data storage services, ingesting streaming and batch data, implementing security requirements, transforming data, implementing data retention policies, identifying performance bottlenecks, and accessing external data sources. In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn't have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision makers.

Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

For example, imagine a gaming company that collects petabytes of game logs that are produced by games in the cloud. The company wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers.

To analyze these logs, the company needs to use reference data such as customer information, game information, and marketing campaign information that is in an on-premises data store. The company wants to utilize this data from the on-premises data store, combining it with additional log data that it has in a cloud data store.

To extract insights, it hopes to process the joined data by using a Spark cluster in the cloud (Azure HDInsight) and publish the transformed data into a cloud data warehouse such as Azure Synapse Analytics, so that it can easily build a report on top of it. The company wants to automate this workflow and monitor and manage it on a daily schedule. It also wants to execute the workflow when files land in a blob storage container.

Azure Data Factory is the platform that solves such data scenarios. It is a cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

Azure Data Factory:

Module 1: Azure Data Factory Introduction

  • What is Azure Data Factory (ADF)?
  • Azure Data Factory Key Components
    • Pipeline
    • Activity
    • Linked Service
    • Data Set
    • Integration Runtime
    • Triggers
    • Data Flows
  • Create Azure Blob Storage Account
  • Create Azure Data Lake Storage Gen2 Account
  • Create Azure SQL Database
  • Creation of Azure Data Factory Resource

Module 2: Working with Copy Data Activity

  • Understanding Azure Data Factory UI
  • Data Ingestion from Blob Storage Service to Azure SQL Database
  • Data Ingestion from Azure Blob Storage to Data Lake Storage Gen2
  • Create linked services for various data stores and compute services
  • Creation of datasets that point to files and tables
  • Design Pipelines with various activities
  • Create SQL Server on Virtual Machines (On-Premises)
  • Define Copy Activity and its features
  • Copy Activity - Copy Behaviour
  • Copy Activity - Data Integration Units
  • Copy Activity - User Properties
  • Copy Activity - Number of Parallel Copies

Module 3: Azure Data Factory - General Activities

  • Lookup Activity
  • Get Metadata Activity
  • Stored Procedure Activity
  • Execute Pipeline Activity
  • Delete Activity
  • Set Variable Activity
  • Script Activity
  • Validation Activity
  • Web Activity
  • Wait Activity
  • Understanding Each Activity
  • Filter Activity

Module 4: Azure Data Factory - Iteration & Conditionals

  • Filter Activity
  • ForEach Activity
  • Switch Activity
  • If Condition Activity
  • Until Activity

Module 5: Azure Data Factory - Types of Integration Runtimes

  • Azure IR (Auto Resolve Integration Runtime)
  • Self-hosted IR

Module 6: Azure Data Factory - Types of Triggers

  • Storage Event Trigger
  • Schedule Trigger
  • Tumbling Window Trigger
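A tumbling window trigger fires for a series of fixed-size, non-overlapping, contiguous time intervals measured from a start time. The minimal plain-Python sketch below, with a hypothetical function name and example dates, computes those windows; it ignores trigger settings such as delay, max concurrency, and retry policy. In a pipeline, each run typically reads its own window boundaries through the trigger outputs `windowStartTime` and `windowEndTime`.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, interval):
    """Return (window_start, window_end) pairs that tile [start, end) with no gaps or overlaps.

    Only complete windows are returned; a trailing partial interval is skipped,
    because each tumbling window run covers exactly one full interval.
    """
    windows = []
    window_start = start
    while window_start + interval <= end:
        window_end = window_start + interval
        windows.append((window_start, window_end))
        window_start = window_end  # the next window begins exactly where this one ended
    return windows


# Hypothetical example: hourly windows over a three-hour span.
for ws, we in tumbling_windows(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 0), timedelta(hours=1)):
    print(ws.isoformat(), "->", we.isoformat())
```

Because the windows never overlap, a failed window can be rerun on its own without reprocessing its neighbours.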

Module 7: Introduction to Data Flows

  • Filter Transformation
  • Select Transformation
  • Derived Column Transformation
  • Aggregator Transformation
  • Join Transformation
  • Union Transformation

Module 8: Practical Scenarios and Use Cases

  • Practice_Session1_Copy Data from File System to Azure SQL Database
  • Practice_Session2_Copy Data from Multiple Files(ADLS Gen2)To Multiple Tables(Azure SQL Database)
  • Practice_Session3_Copy Data from Multiple Files(ADLS Gen2)To Multiple Tables(Azure SQL Database) Using Parameters
  • Practice_Session4_Dynamically Copy Multiple Files(ADLS Gen2)To Multiple Tables(Azure SQL Database)
  • Practice_Session5_Dynamically Copy Multiple Files To Multiple Tables_Lookup_GetMetadata_For-Each_If Condition Activities
  • Practice_Session6_Copy Multiple CSV Files with Same Structure To Single Table
  • Practice_Session7_FilteringFileFormats Using Getmetadata_Filter_ForEach_Copy_Activity
  • Practice_Session8_Bulk Copy from Tables to Files Using Config Table
  • Practice_Session9_Bulk Copy from Tables to Files Using Lookup Activity_Custom SQL Query
  • Practice_Session10_Container Parameterization_Using Lookup and For-Each Activity
  • Practice_Session11_Azure Key Vault Integration with ADF Resource
  • Practice_Session12_Pipeline Execution_Success Audit log and Failure Audit Log
  • Practice_Session13_Pipeline Execution Automation_Schedule Trigger_Storage Event Trigger
  • Practice_Session14_Copy Data from On-Premise SQL Server to ADLS Gen2 using Self hosted IR
  • Practice_Session15_Email Notifications_Logic Apps
  • Practice_Session16_Incremental OR Delta Load Implementation
  • Practice_Session17_ADF_Designing DataFlows
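Practice Session 16 covers incremental (delta) loads. One common approach is a high-watermark pattern: remember the largest last-modified value loaded so far, copy only source rows newer than it, then advance the watermark. In ADF this is often assembled from Lookup, Copy, and Stored Procedure activities; the plain-Python sketch below only models the logic, and its column names (`id`, `last_modified`, `amount`) and function name are hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, target_rows, watermark):
    """Upsert source rows modified after `watermark` into `target_rows` and return the new watermark.

    Rows are dicts with the hypothetical keys 'id', 'last_modified', and 'amount'.
    """
    changed = [r for r in source_rows if r["last_modified"] > watermark]
    position = {r["id"]: i for i, r in enumerate(target_rows)}
    for row in changed:
        if row["id"] in position:
            target_rows[position[row["id"]]] = dict(row)  # existing key: update in place
        else:
            position[row["id"]] = len(target_rows)
            target_rows.append(dict(row))  # new key: insert
    # Advance the watermark only as far as the data actually loaded.
    return max((r["last_modified"] for r in changed), default=watermark)


source = [
    {"id": 1, "last_modified": datetime(2024, 1, 1), "amount": 100},
    {"id": 2, "last_modified": datetime(2024, 1, 5), "amount": 250},
]
target = []
watermark = incremental_load(source, target, datetime.min)  # first run behaves like a full (history) load

# Later, row 1 changes and row 3 arrives; only those two rows are picked up.
source[0] = {"id": 1, "last_modified": datetime(2024, 1, 10), "amount": 120}
source.append({"id": 3, "last_modified": datetime(2024, 1, 7), "amount": 80})
watermark = incremental_load(source, target, watermark)
print(len(target), watermark)
```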

Module 9: ADF_Assignments & Case Studies

  • ADF_Assignment1_Create Azure Blob Storage Account_Data Lake Storage Gen2 Account
  • ADF_Assignment2_Create Azure SQL Database Instance
  • ADF_Assignment3_Data Ingestion_Copy Data Tool(CDT)
  • ADF_Assignment4_Add New Columns While Copying Data
  • ADF_Assignment5_CopyData Activity_Executepipeline Activity_ADLS Gen2_SQLDB
  • ADF_Assignment6_FilterFileFormats based on File Size and Delete Files from Source Storage
  • ADF_Assignment7_Insert Metadata_Get Metadata_Stored Procedure Activity
  • ADF_Assignment8_Insert Metadata_About CSV Files in Azure Storage_Get Metadata_Stored Procedure Activity
  • ADF_Assignment9_CopyData Activity_Linked Service_Dataset_Pipeline Parameters_Copy Multiple Files_To_Tables
  • ADF_Assignment10_Copy Data Activity_Copy Behaviour
  • ADF_Assignment11_Snowflake_Integration
  • ADF_Assignment12_Snowflake_To_ADLS_Gen2_StagedCopy
  • ADF_Assignment13_ADF_AWS_S3_Bucket_Integration
  • ADF_Assignment14_GCP_To_ADLS_Gen2_Integration
  • ADF_Assignment15_Dataflows_Rank Transformation
  • ADF_Assignment16_Dataflows_Parse Transformation
  • ADF_Assignment17_Dataflows_Stringify Transformation
  • ADF_Assignment18_Dataflows_SurrogateKey_Transformation
  • ADF_Assignment19_Dataflows_Window Transformation
  • ADF_Assignment20_Dataflows_Conditional Split_Transformation
  • ADF_Assignment21_Dataflows_Aggregator_Sorter Transformation
  • ADF_Assignment22_Dataflows_Lookup Transformation
  • ADF_Assignment23_Dataflows_Exists Transformation
  • ADF_Assignment24_REST API Integration
  • ADF_Assignment25_Copy Data Activity_Filter By Last Modified Date
  • ADF_Assignment26_Copy Data Activity_Copy Behaviour_Preserve Hierarchy_Flatten Hierarchy_Merge Files
  • ADF_Assignment27_Copy Data Activity_Filter By Last Modified Date_Dynamic Date Expressions
  • ADF_Assignment28_Copy Data from JSON File To Azure SQL Database Table
  • ADF_Assignment29_Execute Copy Data Activity based on File Count in the Container
  • ADF_Assignment30_Copy Data Activity_List of Files Configuration
  • ADF_Assignment31_Dataflows_Flatten Transformations
  • ADF_Assignment32_Dataflows_Pivot Transformations
  • ADF_Assignment33_Databricks Notebook_Integration with Azure Data Factory
  • ADF_Assignment34_Tumbling Window Trigger_Introduction
  • ADF_Assignment35_Implement_Tumbling Window Trigger
  • ADF_Assignment36_Differences Between Debug VS Trigger Now
  • ADF_Assignment37_Row Format Storage Internals
  • ADF_Assignment38_Columnar Format Storage Internals
  • ADF_Assignment39_Copy Data_On-premise File System To ADLS Gen2
  • ADF_Assignment40_Copy Data from On-premise To Azure Cloud Storages
  • ADF_Assignment41_Copy Data Activity_Excel File Formats
  • ADF_Assignment42_Copy Data Activity_Excel File Formats_Lookup Activity_Pipeline Variables
  • ADF_Assignment43_Copy Data Activity_XML File Formats
  • ADF_Assignment44_Insert the Metadata about a storage Container Dynamically using Parameterized Stored Procedure
  • ADF_Assignment45_Introduction To Slowly Changing Dimensions
  • ADF_Assignment46_Implementation of SCD Type1 Dimension
  • ADF_Assignment47_SCD Type2 Introduction
  • ADF_Assignment48_SCD Type2 Implementation

Azure Synapse Analytics:

Module 1: Processing Data Using Azure Synapse Analytics

  • Provisioning an Azure Synapse Analytics Workspace
  • Analyzing data using serverless SQL pool
  • Provisioning and configuring Spark pools
  • Processing data using Spark pools and a lake database
  • Querying the data in a lake database from serverless SQL pool
  • Scheduling notebooks to process data incrementally

Module 2: Synapse DataFlows

  • Copying data using a Synapse data flow
  • Performing data transformation using transformations such as join, sort, and filter
  • Monitoring data flows and pipelines
  • Configuring partitions to optimize data flows
  • Parameterizing mapping data flows
  • Handling schema changes dynamically in data flows using schema drift

Module 3: Azure Synapse SQL Pool

  • Loading data into dedicated SQL pools using PolyBase and T-SQL
  • Loading data into dedicated SQL pools using COPY INTO
  • Creating distributed tables and modifying table distribution
  • Creating statistics and automating the update of statistics

Module 4: Monitoring Synapse SQL and Spark Pools

  • Configuring a Log Analytics workspace for Synapse SQL Pools
  • Configuring a Log Analytics workspace for Synapse Spark Pools
  • Using Kusto queries to monitor SQL and Spark Pools
  • Creating workbooks in a Log Analytics workspace to visualize monitoring data
  • Monitoring table distribution, data skew, and index health using Synapse DMVs
  • Building monitoring dashboards for Synapse with Azure Monitor

Module 5: Synapse Pipelines to Orchestrate Data

  • Introducing Synapse Pipelines
    • Integration runtime
      • Azure IR
      • Self Hosted IR
    • Activities
    • Pipelines
    • Triggers
      • Schedule trigger
      • Storage event trigger
      • Tumbling window trigger
  • Creating linked services
  • Defining source and target datasets
  • Using various activities in Synapse pipelines
  • Scheduling Synapse pipelines

Module 6: Working with Python and Spark SQL in Azure Synapse

  • PySpark (Python)
  • Spark (Scala)
  • .NET Spark (C#)
  • Spark SQL

Module 7: Azure Synapse dedicated SQL Pool

  • Hash-distributed tables
  • Round-robin-distributed tables
  • Replicated tables
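A dedicated SQL pool spreads every table across 60 distributions. A hash-distributed table assigns each row by hashing its distribution column, so equal values always land in the same distribution; a round-robin table deals rows out evenly regardless of content; a replicated table keeps a full copy on each compute node (not modeled here). The sketch below uses `zlib.crc32` only as a stand-in for the pool's internal hash function, with a hypothetical `orders` table, to show why a low-cardinality distribution column causes data skew.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table across 60 distributions


def hash_distribution(value):
    """Map a distribution-column value to a distribution id (crc32 is a stand-in hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_DISTRIBUTIONS


def round_robin_distribution(row_number):
    """Assign rows to distributions in turn, ignoring their content."""
    return row_number % NUM_DISTRIBUTIONS


# Hypothetical table: 600 orders but only 5 distinct customers.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(600)]

hash_counts = Counter(hash_distribution(o["customer_id"]) for o in orders)
rr_counts = Counter(round_robin_distribution(i) for i, _ in enumerate(orders))

# Hashing on customer_id can use at most 5 of the 60 distributions (heavy skew),
# while round-robin fills all 60 evenly.
print("hash:", len(hash_counts), "distributions used, largest holds", max(hash_counts.values()), "rows")
print("round-robin:", len(rr_counts), "distributions used, largest holds", max(rr_counts.values()), "rows")
```

The upside of hashing is that rows sharing a key are colocated, so joins and aggregations on that key can avoid data movement; the choice of distribution column is what decides whether that benefit outweighs the skew.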

Azure Databricks:

Module 1: Introduction to Azure Databricks

  • Introduction to Databricks
  • Azure Databricks Architecture
  • Azure Databricks Main Concepts
  • Types of Data Processing Paradigms: Traditional Data Processing Approach
  • Traditional Data Processing vs Distributed Computing Frameworks
  • Different Distributed Computing Frameworks: Hadoop vs Apache Spark
  • History and Evolution of Azure Databricks

Module 2: Core Databricks Concepts

  • Workspace
  • Notebooks
  • Library
  • Folder
  • Repos
  • Data
  • Compute
  • Workflows

Module 3: Types Of Clusters

  • All-Purpose Clusters
  • Job Clusters
  • Pools

Module 4: Databricks - Internal Storage

  • Databricks File System (DBFS)

Module 5: Databricks - External Storage

  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • Azure SQL Database
  • Azure Synapse Dedicated SQL Pool
  • Snowflake

Module 6: Storages - Azure Credentials

  • Account Access Key
  • Shared Access Signature Token
  • OAuth 2.0 with an Azure Service Principal

Module 7: Databricks Notebooks - Magic Commands

  • %python or %py
  • %r
  • %scala
  • %sql

Module 8: Databricks Utilities

  • File System Utility
  • Widgets Utility
  • Secrets Utility
  • Notebook Utility

Module 9: Big Data File Formats

  • Row-Based File Formats
    • CSV, TSV, and Avro
  • Columnar File Formats
    • Parquet, Delta, and ORC
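The difference between the two families is physical layout: a row-based file stores all fields of one record together, while a columnar file stores all values of one column together, so a query that needs one column can read just that column. The plain-Python sketch below shows only the layout idea with hypothetical sample records; real Parquet and ORC files add row groups, encodings, compression, and per-column statistics on top of it.

```python
records = [
    {"id": 1, "name": "alice", "amount": 120.0},
    {"id": 2, "name": "bob", "amount": 75.5},
    {"id": 3, "name": "carol", "amount": 300.0},
]
columns = ["id", "name", "amount"]


def to_row_layout(rows, cols):
    """Row-based layout (like CSV or Avro): each record's fields are stored together."""
    return [tuple(r[c] for c in cols) for r in rows]


def to_columnar_layout(rows, cols):
    """Columnar layout (like Parquet or ORC): each column's values are stored together."""
    return {c: [r[c] for r in rows] for c in cols}


row_store = to_row_layout(records, columns)
col_store = to_columnar_layout(records, columns)

# Summing one column: the row layout must walk every full record,
# while the columnar layout reads a single contiguous list.
total_from_rows = sum(row[columns.index("amount")] for row in row_store)
total_from_columns = sum(col_store["amount"])

print(row_store[0])         # one complete record
print(col_store["amount"])  # one complete column
```

This is why row formats suit write-heavy, whole-record workloads, while columnar formats suit analytical queries that scan a few columns of many rows.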

Module 10: CSV File Format

  • Reading Data
  • Reading Data from Multiple CSV Files
  • Writing Data

Module 11: JSON File Format

  • Single Line JSON
  • Multi Line JSON
  • Complex Multi Line JSON
    • Arrays
    • Struct Fields
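Flattening complex JSON comes down to two moves: struct fields are promoted to top-level columns, and each array element becomes its own row. In PySpark the second move is what `explode()` does (rows with empty or null arrays are dropped, whereas `explode_outer()` keeps them). The plain-Python sketch below mimics both moves on a hypothetical order document; the `parent_child` column naming is an illustrative convention, not a Spark default.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested structs into prefixed keys and explode arrays into multiple rows.

    Returns a list of flat dicts. Like explode(), an empty array yields no rows.
    """
    rows = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_rows = flatten(value, prefix=f"{name}_")  # struct: promote nested fields
        elif isinstance(value, list):
            sub_rows = []  # array: one output row per element
            for element in value:
                if isinstance(element, dict):
                    sub_rows.extend(flatten(element, prefix=f"{name}_"))
                else:
                    sub_rows.append({name: element})
        else:
            sub_rows = [{name: value}]
        # Combine every row built so far with every row produced by this field.
        rows = [{**left, **right} for left in rows for right in sub_rows]
    return rows


doc = json.loads("""
{
  "order_id": 101,
  "customer": {"name": "Asha", "address": {"city": "Hyderabad"}},
  "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]
}
""")

for row in flatten(doc):
    print(row)
```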

Module 12: Excel File Format

  • Single Sheet Reading
  • Multiple Sheet Reading Using List object
  • Dynamically Reading Multiple Sheets

Module 13: XML File Format

  • Simple XML Files
  • Complex XML Files

Module 14: Libraries

  • Install Cluster Libraries
    • Maven Package
    • PyPI Package
    • CRAN Package

Module 15: Databricks - Big Data Workloads

  • Batch Processing
  • Structured Streaming (Real-Time Processing)

Module 16: Databricks - Accessing Azure Data Lake

  • Account Access Key
  • Shared Access Signature Token
  • Mounting Azure Data Lake (Service Principal)

Module 17: Spark Structured Streaming

  • ReadStream
  • WriteStream
  • Output Modes
  • Triggers
    • Fixed Interval
    • One Time
    • Continuous
  • Managing Streams
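Output modes control what a streaming query writes to the sink after each trigger. For a running aggregation, complete mode rewrites the whole result table every time, while update mode writes only the rows that changed in that trigger; append mode writes only rows that will never change again, which for aggregations requires a watermark and is not modeled here. The plain-Python simulation below, with hypothetical event names, contrasts complete and update for a running count per key.

```python
from collections import Counter


def run_stream(micro_batches):
    """Simulate a running count per key and what each output mode would emit per trigger."""
    state = Counter()  # the running aggregation ("result table")
    complete_out, update_out = [], []
    for batch in micro_batches:
        changed = set(batch)
        state.update(batch)
        # complete mode: the entire result table is written on every trigger
        complete_out.append(dict(state))
        # update mode: only the keys whose aggregate changed in this trigger
        update_out.append({k: state[k] for k in changed})
    return complete_out, update_out


# Each inner list is one micro-batch of hypothetical click-stream events.
batches = [["click", "view"], ["click"], ["buy"]]
complete_out, update_out = run_stream(batches)

for i, (full, delta) in enumerate(zip(complete_out, update_out)):
    print(f"trigger {i}: complete={full} update={delta}")
```

Complete mode keeps the sink simple but grows with the result table; update mode writes less but needs a sink that can upsert by key.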

Module 18: Azure Databricks - Types of Loads

  • History Load
  • Incremental Load

Module 19: Notebook - Code Modularity

  • %run
  • dbutils.notebook.run()

Module 20: Introduction To Spark SQL Module

  • Managed Tables (Internal Tables)
    • DataFrame API
    • Spark SQL API
  • Unmanaged Tables (External Tables)
    • DataFrame API
    • Spark SQL API
  • Temporary Views (Temporary Tables)
  • Global Temporary Views

Module 21: Introduction To Delta Lake

  • Delta Lake Features
    • ACID transactions
    • Handling metadata
    • Streaming and batch workloads
    • Schema enforcement
    • Time travel
    • Upserts and deletes
  • Delta Lake Components
    • _delta_log (Transaction log)
    • Versioned Parquet files
  • Delta Lake Operations
    • Create Table
    • Upsert to a table
    • Read a table
    • Update a table
    • Delete from a table
    • Display table history
    • Query an earlier version of a table (time travel)
    • Clean up snapshots with VACUUM
    • Delta Lake table history
    • Restore a Delta table to an earlier state
    • Vacuum unused data files
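Delta Lake tracks table versions in the `_delta_log` transaction log: each commit records data files added and removed, and reading version N means replaying commits up to N. The plain-Python sketch below models only that idea, with hypothetical file names; the real log stores JSON commit files plus periodic Parquet checkpoints, and the real VACUUM uses a retention period (7 days by default) rather than a version count.

```python
def snapshot(log, version):
    """Replay commits 0..version and return the set of data files active at that version."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} does not exist")
    active = set()
    for commit in log[: version + 1]:
        active -= set(commit.get("remove", []))
        active |= set(commit.get("add", []))
    return active


def vacuum(log, all_files, keep_versions):
    """Return files that no snapshot among the last `keep_versions` versions still references."""
    latest = len(log) - 1
    oldest_kept = max(0, latest - keep_versions + 1)
    referenced = set()
    for v in range(oldest_kept, latest + 1):
        referenced |= snapshot(log, v)
    return set(all_files) - referenced


# Hypothetical history of one table.
log = [
    {"add": ["part-000.parquet"]},                                  # v0: initial write
    {"add": ["part-001.parquet"]},                                  # v1: append
    {"remove": ["part-000.parquet"], "add": ["part-002.parquet"]},  # v2: UPDATE rewrites part-000
]
all_files = {"part-000.parquet", "part-001.parquet", "part-002.parquet"}

print(sorted(snapshot(log, 0)))  # time travel to version 0
print(sorted(snapshot(log, 2)))  # current version
# Keeping only the latest version makes part-000 deletable, after which version 0 can no longer be read.
print(sorted(vacuum(log, all_files, keep_versions=1)))
```

The same replay idea explains RESTORE: restoring to version N writes a new commit whose active file set equals the snapshot at N.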

Module 22: Delta Lake - Slowly Changing Dimension

  • Type1 Dimension
  • Type2 Dimension
  • Type3 Dimension
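The three types differ in how they treat a changed attribute: Type 1 overwrites it, Type 2 keeps history by expiring the old row and inserting a new current version, and Type 3 keeps limited history in an extra column such as a previous value. The plain-Python sketch below illustrates Type 2 only; the housekeeping columns `start_date`, `end_date`, and `is_current` and the sample data are hypothetical conventions, and on Delta Lake the same logic is usually expressed with a MERGE statement.

```python
from datetime import date


def scd2_merge(dimension, incoming, key, tracked, load_date):
    """Apply SCD Type 2 changes to `dimension` in place and return it.

    dimension: list of dicts with the business key, tracked attributes, and the
    hypothetical columns 'start_date', 'end_date', 'is_current'.
    incoming: list of dicts with the business key and tracked attributes.
    """
    current = {row[key]: row for row in dimension if row["is_current"]}
    for src in incoming:
        existing = current.get(src[key])
        if existing is not None and all(existing[c] == src[c] for c in tracked):
            continue  # no change: keep the current version as-is
        if existing is not None:
            # Expire the old version instead of overwriting it (Type 1 would overwrite).
            existing["end_date"] = load_date
            existing["is_current"] = False
        new_row = {key: src[key], **{c: src[c] for c in tracked},
                   "start_date": load_date, "end_date": None, "is_current": True}
        dimension.append(new_row)
        current[src[key]] = new_row
    return dimension


dim = [{"customer_id": 1, "city": "Pune", "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
changes = [{"customer_id": 1, "city": "Hyderabad"}, {"customer_id": 2, "city": "Chennai"}]

scd2_merge(dim, changes, "customer_id", ["city"], date(2024, 6, 1))
for row in dim:
    print(row)
```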

Module 23: Databricks - Azure SQL Database

  • Reading Data With JDBC Driver
  • Writing Data With JDBC Driver

Module 24: Databricks - Synapse Dedicated SQL Pool

  • Reading Data From Synapse Table
  • Writing Data To Synapse Table

Module 25: Databricks - Snowflake

  • Reading Data From Snowflake Table
  • Writing Data To Snowflake Table

Module 26: Delta Lake - Performance Optimization Techniques

  • OPTIMIZE a Table
  • Z-ORDER by Columns
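Z-ordering colocates rows with similar values in several columns inside the same files, so data skipping based on per-file min/max statistics can prune more files for filters on any of those columns. The plain-Python sketch below uses a classic Morton (bit-interleaving) code on two hypothetical integer columns and compares it with a plain sort; Delta Lake's actual implementation differs in detail, so treat this only as an illustration of the idea.

```python
def z_order_key(x, y, bits=16):
    """Interleave the bits of two non-negative ints into a Morton (Z-order) code.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1, so points
    that are close in both dimensions tend to receive close keys.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def files_touched(sorted_points, rows_per_file, x_range, y_range):
    """Split sorted rows into 'files', keep min/max stats per file, and count files that overlap the filter."""
    touched = 0
    for start in range(0, len(sorted_points), rows_per_file):
        chunk = sorted_points[start:start + rows_per_file]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        if max(xs) >= x_range[0] and min(xs) <= x_range[1] and max(ys) >= y_range[0] and min(ys) <= y_range[1]:
            touched += 1
    return touched


points = [(x, y) for x in range(8) for y in range(8)]  # 64 rows with two hypothetical columns
by_x = sorted(points)                                  # linear sort on x, then y
by_z = sorted(points, key=lambda p: z_order_key(*p))   # Z-order on (x, y)

# Filter on x only, then on y only; 16 files of 4 rows each.
print("x filter:", files_touched(by_x, 4, (0, 1), (0, 7)), "vs", files_touched(by_z, 4, (0, 1), (0, 7)))
print("y filter:", files_touched(by_x, 4, (0, 7), (0, 1)), "vs", files_touched(by_z, 4, (0, 7), (0, 1)))
```

The linear sort prunes well only for the leading column, while the Z-ordered layout prunes comparably for either column.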

Module 27: Databricks Integration With Azure Data Factory

  • Call a Notebook using Notebook Activity
  • SetVariable Activity
  • Trigger ADF Pipeline

Module 28: Azure Key Vault Integration With Databricks

  • Create Secrets
  • Create Secret Scope

Azure Databricks Practice Sessions:

  • ADB_Session1_Types of Data Processing Paradigms_Traditional Data Processing Approach
  • ADB_Session2_Traditional Data Processing vs Distributed Computing Framework
  • ADB_Session3_Different Distributed Computing Frameworks_Hadoop vs Apache Spark
  • ADB_Session4_Evolution of Azure Databricks History
  • ADB_Session5_Introduction to Azure Databricks_Create Azure Databricks Workspace
  • ADB_Session6_Azure Databricks Workspace Assets
  • ADB_Session7_Azure Databricks_Magic Commands
  • ADB_Session8_Azure Databricks File System(DBFS)
  • ADB_Session9_DBFS_dbutils.fs Utility_%fs Magic Command
  • ADB_Session10_DBFS_dbutils.fs Utility_%fs Magic Command_%sh Shell Command
  • ADB_Session11_Azure Databricks_dbutils_Widgets Utility
  • ADB_Session12_Reading Data from CSV File Format
  • ADB_Session13_Reading Data from Simple Single Line JSON File Format
  • ADB_Session14_Reading Data from Simple Multi Line JSON File Format
  • ADB_Session15_Reading Data from Complex Multi Line JSON File _Flattening Arrays_Struct fields
  • ADB_Session16_Reading Data from Excel File Format
  • ADB_Session17_Reading Data from XML File Format
  • ADB_Session18_Azure Databricks_Batch Data Processing
  • ADB_Session19_Azure Databricks_Structured Streaming API
  • ADB_Session20_Azure Databricks_History Load_Incremental Load
  • ADB_Session21_Azure Databricks Integration with Azure Data Factory
  • ADB_Session22_Calling a Notebook from Another Notebook using %run
  • ADB_Session23_Calling a Notebook from Another Notebook using dbutils.notebook.run()
  • ADB_Session24_Introduction to Spark SQL Module
  • ADB_Session25_Create Managed Tables Using DataFrame API and Spark SQL API
  • ADB_Session26_Create Un-Managed Tables Using DataFrame API and Spark SQL API
  • ADB_Session27_Introduction to Delta Lake
  • ADB_Session28_Schema Validation_Schema Evolution_DeltaTableBuilder API
  • ADB_Session29_Accessing Azure Blob Storage Using Account Access Key_SecretScope
  • ADB_Session30_Accessing Azure Blob Storage Using Shared Access Signature Token_SecretScope
  • ADB_Session31_Create Mount Points to Azure BlobStorage_ADLS Gen2
  • ADB_Session32_Accessing Azure SQL Database_JDBC Driver
  • ADB_Session33_Reading_Writing Data to Azure Synapse Dedicated SQL Pool using JDBC ConnectionString
  • ADB_Session34_Implementation of Slowly Changing Dimension Type1
  • ADB_Session35_Implementation of Slowly Changing Dimension Type3

Azure Databricks Assignments & Case Studies:

  • ADB_Assignment1_Azure Databricks_Types of Clusters
  • ADB_Assignment2_Azure Databricks_Cluster_Pools
  • ADB_Assignment3_Azure Databricks_Compute_On-Demand vs Azure Spot VM Instances
  • ADB_Assignment4_Azure Databricks_Bigdata File formats
  • ADB_Assignment5_Reading Data from Multiple CSV Files With the Same Structure_Reading TSV Files_User Defined Schema
  • ADB_Assignment6_Apache Spark_Transformations_Actions
  • ADB_Assignment7_Create DataFrame Using Python Collection Objects_List_Tuple_Dictionary
  • ADB_Assignment8_Create DataFrame_Define Schema Programmatically Using StructType() & StructField()
  • ADB_Assignment9_Reading Single_Double_PIPE Delimited Files
  • ADB_Assignment10_Reading_Multiple_Different_Delimiter CSV Files
  • ADB_Assignment11_Spark Low-Level APIs vs Structured APIs
  • ADB_Assignment12_Creation of Structured API_DataFrame
  • ADB_Assignment13_Creation of DataFrame_Schemas
  • ADB_Assignment14_Python Functions
  • ADB_Assignment15_Python Dictionaries_Functions_Widgets
  • ADB_Assignment16_Flatten Multi Line Complex JSON Files_Python User Defined Function
  • ADB_Assignment17_Flatten Arrays_Maps_explode()_explode_outer() Functions
  • ADB_Assignment18_Batch ETL Processing_Replace Nulls with Literals
  • ADB_Assignment19_Batch ETL Processing_GroupBy_Aggregation Processing
  • ADB_Assignment20_Batch ETL Processing_PySpark_Join Types
  • ADB_Assignment21_Batch ETL Processing_PySpark_Union_UnionAll
  • ADB_Assignment22_Batch ETL Processing_PySpark_Distinct_DropDuplicates Methods
  • ADB_Assignment23_Batch ETL Processing_GroupBy_Aggregation Processing
  • ADB_Assignment24_Create Workflow to Orchestrate Multiple Tasks
  • ADB_Assignment25_Implement Slowly Changing Dimension Type1 and Type3
  • ADB_Assignment26_Batch Processing_Data Processing Techniques_Python List Comprehension
  • ADB_Assignment27_Batch Processing_Sorting on Single Column_sort() method
  • ADB_Assignment28_Batch Processing_Sorting on Multiple Columns
  • ADB_Assignment29_Batch Processing_PySpark_Date Functions
  • ADB_Assignment30_Batch Processing_PySpark_Date Functions
  • ADB_Assignment31_Batch Processing_PySpark_Identify or Check Duplicates in DataFrame
  • ADB_Assignment32_Batch Processing_PySpark_Dropping Rows that Contains Null Values using dropna() & na.drop() Methods
  • ADB_Assignment33_Batch Processing_PySpark_Replacing Nulls with another Value Using fillna() Method_na.fill() Method
  • ADB_Assignment34_Batch Processing_Reading and Writing Data to Snowflake Cloud Data Platform
  • ADB_Assignment35_Delta Lake_Schema Validation_Enforcement
  • ADB_Assignment36_Delta Lake_Schema Evolution
  • ADB_Assignment37_Update_Delete Operations in data lake with Delta Lake
  • ADB_Assignment38_Auditing Data Changes with Operation History

Train your teams on the theory and build technical mastery of the cloud computing topics essential to the enterprise, such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.