• Welcome to CloudMonks
  • +91 9849668223
  • info@thecloudmonks.com

Azure Data Factory Training

About The Course

Azure data engineers are responsible for data-related tasks that include provisioning data storage services, ingesting batch and streaming data, implementing security requirements, transforming data, implementing data retention policies, identifying performance bottlenecks, and accessing external data sources. In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn't have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision makers.

Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

For example, imagine a gaming company that collects petabytes of game logs that are produced by games in the cloud. The company wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers.

To analyze these logs, the company needs to use reference data such as customer information, game information, and marketing campaign information that is in an on-premises data store. The company wants to utilize this data from the on-premises data store, combining it with additional log data that it has in a cloud data store.

To extract insights, it hopes to process the joined data by using a Spark cluster in the cloud (Azure HDInsight), and publish the transformed data into a cloud data warehouse such as Azure Synapse Analytics to easily build a report on top of it. They want to automate this workflow, and monitor and manage it on a daily schedule. They also want to execute it when files land in a blob store container.

Azure Data Factory is the platform that solves such data scenarios. It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
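To give a feel for what such a pipeline looks like, here is a minimal sketch of the JSON shape of an ADF pipeline with a single Copy activity, expressed as a Python dict. The pipeline, dataset, and source/sink type names are illustrative placeholders, not a definitive definition.

```python
import json

# Hypothetical pipeline definition mirroring the JSON shape ADF uses:
# a pipeline holds activities; a Copy activity references input and
# output datasets and declares source/sink types.
pipeline = {
    "name": "CopyGameLogsPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyLogsToWarehouse",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "GameLogsBlobDataset",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "WarehouseStagingDataset",
                     "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "SqlDWSink"}
                }
            }
        ]
    }
}

print(json.dumps(pipeline, indent=2))
```

In practice this JSON is authored for you by the ADF visual designer; the sketch only shows how the key components (pipeline, activity, dataset references) nest.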

Module 1: Cloud Computing Concepts

  • What is the "Cloud"?
  • Why cloud services
  • Types of cloud models
    • Deployment Models
      • Private Cloud
      • Public Cloud
      • Hybrid Cloud
  • Types of cloud services
    • Infrastructure as a Service (IaaS)
    • Platform as a Service (PaaS)
    • Software as a Service (SaaS)
  • Comparing Cloud Platforms
    • Microsoft Azure
    • Amazon Web Services
    • Google Cloud Platform
  • Characteristics of cloud computing
    • On-demand self-service
    • Broad network access
    • Multi-tenancy and resource pooling
    • Rapid elasticity and scalability
    • Measured service
  • Cloud Data Warehouse Architecture
    • Shared Memory architecture
    • Shared Disk architecture
    • Shared Nothing architecture

Module 2: Big Data Introduction

  • What is Big Data?
  • Big Data Sources
  • Data vs Information
  • Characteristics of Big Data
    • Variety
    • Velocity
    • Volume
    • Veracity
    • Value
  • Types of Big Data
    • Structured Data
    • Unstructured Data
    • Semi-Structured Data

Module 3: Dimensional Modelling

  • OLTP System
    • Relational Modelling
  • Characteristic Features of OLTP
  • Enterprise Data Warehouse
    • Dimensional Modelling
  • Dimensional Modelling-Schemas
    • Star Schema
    • Snowflake Schema
    • Multi Star Schema
  • Dimension Tables
  • Fact Tables
  • Types of Slowly Changing Dimensions
    • Type 1 Dimension
    • Type 2 Dimension
    • Type 3 Dimension
  • Types of Facts
    • Additive Facts
    • Semi-Additive Facts
    • Non-Additive Facts
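To make the SCD types above concrete, here is a hedged pure-Python sketch using a hypothetical customer dimension: a Type 1 update overwrites the attribute in place (history is lost), while a Type 2 update expires the current row and appends a new current version.

```python
from datetime import date

# Hypothetical dimension table: Type 2 tracks history with
# valid_from/valid_to dates and an is_current flag.
dim_customer = [
    {"customer_id": 1, "city": "Mumbai", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
]

def scd_type1_update(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place -- no history kept."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["city"] = new_city

def scd_type2_update(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row, then append a new current version."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({"customer_id": customer_id, "city": new_city,
                 "valid_from": change_date, "valid_to": None,
                 "is_current": True})

scd_type2_update(dim_customer, 1, "Hyderabad", date(2023, 6, 1))
current = [r for r in dim_customer if r["is_current"]]
print(len(dim_customer), current[0]["city"])  # two rows of history, one current
```

Type 3 (not shown) would instead keep a single row with an extra "previous value" column such as `prior_city`.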

Module 4: Azure SQL Database

  • Introduction to Azure SQL Database
  • Comparing Single Database and Managed Instance
  • Creating and Using SQL Server
  • Creating SQL Database Services
  • Azure SQL Database Tools
  • Migrating an on-premises database to Azure SQL Database
  • Purchasing Models
  • DTU service tiers
  • vCore-based Model
  • Serverless compute tier
  • Service Tiers
    • General purpose / Standard
    • Business Critical / Premium
    • Hyperscale
  • Deployment of an Azure SQL Database
  • Elastic Pools
  • What are SQL elastic pools?
    • Choosing the correct pool size
  • Creating a New Pool
  • Manage Pools

Module 5: Azure Storage Service

  • Azure Storage Account
  • Features of Azure Storage Service
  • Introduction to Blob Storage Service
  • Blob Storage Architecture
  • Blob Storage Features
  • Types of Blobs
    • Block Blobs
    • Append Blobs
    • Page Blobs
  • Creating a Storage Account
  • Azure Storage Performance Tiers
    • Standard
    • Premium Performance
  • Understanding Data Replication
    • LRS (Locally Redundant Storage)
    • ZRS (Zone Redundant Storage)
    • GRS (Geo Redundant Storage)
  • Azure Storage-Access Tiers
    • Hot
    • Cool
    • Archive
  • Working with Containers and Blobs
  • Soft Delete
  • Azure Storage Explorer
  • Access blobs securely
  • Access Key
  • Account Shared Access Signature (SAS)
  • Service Shared Access Signature (SAS)
  • Azure Storage Scalability and Limits

Module 6: Azure Data Lake Storage Services

  • Introduction to Azure Data Lake
  • What is a Data Lake?
  • What is Azure Data Lake?
  • Data Lake Architecture
  • Working with Azure Data Lake Storage Gen1
  • Features of Data Lake Storage Gen1
  • Understanding Azure Data Lake Gen2
  • Features of Data Lake Storage Gen2
  • Differences Between Gen1 & Gen2 Storage
  • Exploring Data Lake Storage
  • Provisioning Data Lake Storage Gen1 Service
  • Provisioning Data Lake Storage Gen2 Service
  • Uploading Sample File
  • Using Azure Portal
  • Using Storage Explorer

Module 7: Azure Data Factory Introduction

  • What is Azure Data Factory(ADF)?
  • Azure Data Factory Key Components
    • Pipeline
    • Activity
    • Linked Service
    • Data Set
    • Integration Runtime
    • Triggers
    • Data Flows
  • Create Resource Group
  • Create Storage Account
  • Creation of Azure Data Factory Service

Module 8: Working with Copy Activity

  • Understanding Azure Data Factory UI
  • Copy Data from Blob Storage Service to Azure SQL Database
  • Copy data from file storage account to file storage account
  • Create Linked service for various data stores and compute
  • Creation of Datasets that points to file and table
  • Design Pipelines with various activities
  • Create SQL Server on Virtual Machines (On-Premises)
  • Define Copy Activity and its features
  • Copy Activity - Copy Behaviour
  • Copy Activity - Data Integration Units
  • Copy Activity - User Properties
  • Copy Activity - Number of parallel copies
  • Working with Lookup Activity
  • Understanding of Each Activity
  • Filter Activity
  • Get Metadata Activity
  • Lift and Shift
  • Hosting Azure - SSIS Integration Runtime
  • Execute SSIS Packages from ADF
  • Monitor pipeline runs
  • Debug pipeline
  • Trigger pipeline manually
  • Trigger pipeline on schedule
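The Get Metadata → Filter → ForEach → Copy pattern covered in this module can be sketched outside ADF as plain Python, just to show the control flow each activity contributes. The landing/staging folders below are hypothetical stand-ins for blob containers.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical source and sink folders standing in for blob containers.
landing = Path(tempfile.mkdtemp())
staging = Path(tempfile.mkdtemp())
for name in ["sales.csv", "sales.json", "orders.csv"]:
    (landing / name).write_text("demo data")

# Get Metadata activity: list the child items of the source folder.
child_items = [p.name for p in landing.iterdir()]

# Filter activity: keep only the CSV files.
csv_files = [n for n in child_items if n.endswith(".csv")]

# ForEach activity wrapping a Copy activity: copy each match to the sink.
for name in csv_files:
    shutil.copy(landing / name, staging / name)

print(sorted(p.name for p in staging.iterdir()))
# → ['orders.csv', 'sales.csv']
```

In ADF itself, each step is a separate activity wired together on the pipeline canvas, with the Filter output feeding the ForEach's items property.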

Module 9 : Practical Scenarios and Use Cases

  • ADF_PracticeSession1_Blob_To_Blob
  • ADF_PracticeSession2_CopyActivity_Prefix_Wildcard_FilePath_Blob_To_Blob
  • ADF_PracticeSession3_Blob_To_Azure_SQLDB
  • ADF_PracticeSession4_Blob_To_Azure_SQLDB
  • ADF_PracticeSession5_Dataset_Parameters_Blob_To_Azure_SQLDB
  • ADF_PracticeSession6_Blob_To_ADLS_Gen2
  • ADF_PracticeSession7_ADLS_Gen1_To_ADLS_Gen2
  • ADF_PracticeSession8_Pipeline_Dataset_LinkedService_Parameters
  • ADF_PracticeSession9_FilteringFileFormats_Getmetadata_Filter_ForEach_Copy_Activity
  • ADF_PracticeSession10_FilteringFileFormats_Getmetadata_Filter_ForEach_Copy_Activity
  • ADF_PracticeSession11_BulkCopy_Tables_Files
  • ADF_PracticeSession12_Container_Parameterization_Blob_To_Blob_Storage
  • ADF_PracticeSession13_ExecuteCopyActivity_BasedOnFileCount
  • ADF_PracticeSession14_StoredProcedures_Parameters
  • ADF_PracticeSession15_CopyActivity_CustomSQL_Queries_StoredProcedures
  • ADF_PracticeSession16_Pipeline_Audit_Log
  • ADF_PracticeSession17_Copybehaviour
  • ADF_PracticeSession18_CSV_To_JSON_Format
  • ADF_PracticeSession19_Copy_JSON_File_To_AzureSQL
  • ADF_PracticeSession20_Add_AdditionalColumns_WhileCopyingData
  • ADF_PracticeSession21_CopyDataTool
  • ADF_PracticeSession22_Custom_Email_Notification
  • ADF_PracticeSession23_AzureKeyVault_Integration
  • ADF_PracticeSession24_Incremental_Load
  • ADF_PracticeSession25_Integration_Runtime
  • ADF_PracticeSession26_On-Premise_SQLServer_ADLS_Gen2
  • ADF_PracticeSession27_On-Premise_FileSystem_ADLS_Gen2
  • ADF_PracticeSession28_AzureSynapseAnalytics_Integration
  • ADF_PracticeSession29_AzureSynapse_BlobStorage_Polybase_Integration
  • ADF_PracticeSession30_AzureSynapse_AzureSQLDatabase_Polybase_CopyStatement_Integration
  • ADF_PracticeSession31_AWS_S3_Integration
  • ADF_PracticeSession32_AWS_S3_Integration
  • ADF_PracticeSession33_GCP_Integration
  • ADF_PracticeSession34_Snowflake_Integration
  • ADF_PracticeSession35_REST_API_Integration
  • ADF_PracticeSession36_CosmosDB_Introduction
  • ADF_PracticeSession37_Eventbased_Trigger
  • ADF_PracticeSession38_Scheduled_Trigger
  • ADF_PracticeSession39_TumblingWindow_Trigger
  • ADF_PracticeSession40_Blob_SQLDB_Executepipeline_Activity
  • ADF_PracticeSession42_SQLDB_BLOB_Overwrite_Append_Mode
  • ADF_PracticeSession43_Dataflows_Introduction
  • ADF_PracticeSession44_Dataflows_Select_Filter_DerivedColumn_Transformation
  • ADF_PracticeSession45_Dataflows_Select_DerivedColumn_Aggregator_Sort_Transformation
  • ADF_PracticeSession46_Dataflows_ConditionalSplit_Transformation
  • ADF_PracticeSession47_Dataflows_Join_Transformation
  • ADF_PracticeSession48_Dataflows_Union_Transformation
  • ADF_PracticeSession49_Dataflows_Lookup_Transformation
  • ADF_PracticeSession50_Dataflows_Exists_Transformation
  • ADF_PracticeSession51_Dataflows_Rank_Transformation
  • ADF_PracticeSession52_Dataflows_Pivot_Transformation
  • ADF_PracticeSession53_Dataflows_UnPivot_Transformation
  • ADF_PracticeSession54_Dataflows_SurrogateKey_Transformation
  • ADF_PracticeSession55_Dataflows_Windows_Transformation
  • ADF_PracticeSession56_Dataflows_AlterRow_Transformation
  • ADF_PracticeSession57_Switch Activity-Move and delete data
  • ADF_PracticeSession58_Until Activity-Parameters & Variables
  • ADF_PracticeSession59_Copy Recent Files From Blob input to Blob Output folder without LPV
  • ADF_PracticeSession60_Nested ForEach -pass parameters from Master to child pipeline
  • ADF_PracticeSession61_Restart data processing from failure
  • ADF_PracticeSession62_Remove Duplicate rows using data flows
  • ADF_PracticeSession63_Slowly Changing Dimension Type1 (SCD1) with HashKey Function
  • ADF_PracticeSession64_Slowly Changing Dimension Type2

Module 10: Assignments & Case Studies

  • ADF_Azure_Functions Integration
  • ADF_Azure_HDInsight Integration
  • ADF_Azure_HDInsight with Spark Cluster
  • ADF_Azure_Databricks Integration
  • ADF_Azure_Data Lake Analytics integration

Module 11: Introduction to Azure Databricks

  • Introduction to Databricks
  • Azure Databricks Architecture
  • Azure Databricks Main Concepts

Module 12: Databricks Cluster Management

  • Creating and configuring clusters
  • Managing Clusters
    • Displaying clusters
    • Starting a cluster
    • Terminating a cluster
    • Delete a cluster
    • Cluster Information
    • Cluster logs
    • Cluster access control
  • Types of Clusters
    • All-purpose clusters
    • Job cluster
  • Databricks Pools
    • Databricks without pools
    • Databricks with Pools
  • Cluster Modes
    • Standard
    • High Concurrency
    • Single Node
  • Autoscaling
  • Databricks runtime versions
  • Multiuser Clusters

Module 13: Databricks Notebook Core Functionalities

  • Creating and managing notebooks
  • Exporting notebooks
  • Importing notebooks
  • Attaching a notebook to a cluster
  • Spark environment variables
    • SparkContext(sc)
    • SQLContext/HiveContext(sqlContext)
    • SparkSession(spark)
  • Scheduling a notebook
  • Default Language
  • Notebook permissions
  • Folder permissions
  • Cloning notebook
  • Renaming notebook

Module 14: Databricks Utilities and Notebook Parameters

  • Dbutils commands on files, directories
  • Notebooks and libraries
  • Databricks Variables
  • Widget Types
  • Databricks notebook parameters

Module 15: Databricks Integration with Azure Blob Storage

  • Read data from Blob Storage and Creating Blob mount point

Module 16: Databricks Integration with Azure Data Lake Storage Gen2

  • Reading files from Azure Data Lake Storage Gen2

Module 17: Databricks Integration with Azure Data Lake Storage Gen1

  • Reading Files from data lake storage Gen1

Module 18: Databricks Integration with Azure Data Lake Storage Gen2

  • Reading Files from data lake storage Gen2

Module 19: Reading and Writing CSV Files in Databricks

  • Read CSV Files
  • Read TSV Files and Pipe-Separated CSV Files
  • Read CSV Files with multiple delimiters in Spark 2 and Spark 3
  • Read multi-delimiter CSV files with delimiters in different positions
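In Databricks the separator is set on the reader, e.g. `spark.read.option("sep", "|").option("header", True).csv(path)`. As a hedged stand-in that runs without a Spark cluster, the same delimiter handling can be sketched with Python's standard csv module; the sample data below is hypothetical.

```python
import csv
import io

# Hypothetical pipe-separated content standing in for a file in storage.
pipe_data = "id|name|city\n1|Asha|Hyderabad\n2|Ravi|Pune\n"

# DictReader uses the first row as the header; delimiter="|" handles
# the pipe separator (use "\t" for TSV files).
reader = csv.DictReader(io.StringIO(pipe_data), delimiter="|")
rows = list(reader)
print(rows[0]["name"])  # → Asha
```

The concept is the same in both environments: the parser is told which character splits the fields, rather than assuming a comma.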

Module 20: Introduction to Python

  • Introduction to Python
  • Datatypes In Python
  • Operators in Python
  • Input And Output
  • Control Statements
  • Strings and Characters
  • Lists
  • Tuples
  • Dictionaries
  • Sets
  • Functions
  • Modules

Train your teams on the theory and build technical mastery of the cloud computing skills essential to the enterprise, such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

Talk With Us