• Welcome to CloudMonks
  • +91 9849668223
  • info@thecloudmonks.com

Azure Data Engineering

About The Course

Azure data engineers are responsible for data-related tasks that include provisioning data storage services, ingesting streaming and batch data, implementing security requirements, transforming data, implementing data retention policies, identifying performance bottlenecks, and accessing external data sources. In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn't have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision makers.

Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

For example, imagine a gaming company that collects petabytes of game logs that are produced by games in the cloud. The company wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers.

To analyze these logs, the company needs to use reference data such as customer information, game information, and marketing campaign information that is in an on-premises data store. The company wants to utilize this data from the on-premises data store, combining it with additional log data that it has in a cloud data store.

To extract insights, the company wants to process the joined data by using a Spark cluster in the cloud (Azure HDInsight), and publish the transformed data into a cloud data warehouse such as Azure Synapse Analytics to easily build a report on top of it. It wants to automate this workflow, and monitor and manage it on a daily schedule. It also wants to execute the workflow when files land in a blob store container.

Azure Data Factory is the platform that solves such data scenarios. It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
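As a rough illustration of what a pipeline and a daily schedule look like in practice, the sketch below builds two plain Python dictionaries shaped like the JSON definitions Data Factory works with: a pipeline containing one Copy activity, and a schedule trigger that runs it once a day. The dataset, pipeline, and trigger names are hypothetical, the helper functions are illustrative only, and the field names should be checked against the current Data Factory documentation before use.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a pipeline definition with a single Copy activity.

    The dataset names are assumed to refer to datasets already
    defined in the data factory (hypothetical in this sketch).
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyLogsToDatabase",
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    # Assumed source/sink types: Blob Storage to a SQL database.
                    "typeProperties": {
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "SqlSink"},
                    },
                }
            ]
        },
    }


def build_daily_trigger(name, pipeline_name, start_time):
    """Return a schedule trigger definition that runs a pipeline once a day."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


# Hypothetical names, used only for illustration.
pipeline = build_copy_pipeline("GameLogsPipeline", "GameLogsBlob", "GameLogsTable")
trigger = build_daily_trigger("DailyRun", pipeline["name"], "2024-01-01T00:00:00Z")

print(json.dumps(pipeline, indent=2))
print(json.dumps(trigger, indent=2))
```

Keeping the definitions as data makes it easy to review them, store them in source control, and reuse the same shape for other source and sink pairs.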

Module 1: Cloud Computing Concepts

  • What is the "Cloud"?
  • Why cloud services
  • Types of cloud models
    • Deployment Models
    • Private cloud deployment model
    • Public cloud deployment model
    • Hybrid cloud deployment model
  • Types of cloud services
    • Infrastructure as a Service (IaaS)
    • Platform as a Service (PaaS)
    • Software as a Service (SaaS)
  • Comparing Cloud Platforms
    • Microsoft Azure
    • Amazon Web Services
    • Google Cloud Platform
  • Characteristics of cloud computing
    • On-demand self-service
    • Broad network access
    • Multi-tenancy and resource pooling
    • Rapid elasticity and scalability
    • Measured service
  • Cloud Data Warehouse Architecture
    • Shared Memory architecture
    • Shared Disk architecture
    • Shared Nothing architecture

Module 2: Core Azure services

  • Core Azure Architectural components
  • Core Azure Services and Products
  • Azure solutions
  • Azure management tools

Module 3: Azure SQL Database

  • Introduction to Azure SQL Database
  • Comparing Single Database and Managed Instance
  • Creating and Using SQL Server
  • Creating SQL Database Services
  • Azure SQL Database Tools
  • Migrating an on-premises database to Azure SQL Database
  • Purchasing Models
    • DTU service tiers
    • vCore-based model
    • Serverless compute tier
  • Service Tiers
    • General purpose / Standard
    • Business Critical / Premium
    • Hyperscale
  • Deployment of an Azure SQL Database
  • Elastic Pools
    • What are SQL elastic pools?
    • Choosing the correct pool size
    • Creating a New Pool
    • Manage Pools
  • Monitoring and Tuning Azure SQL Database
  • Configure SQL Database Auditing
  • Export and Import of Database
  • Automated Backup
  • Point in Time Restore
  • Restore deleted databases
  • Long-term backup retention
  • Active Geo Replication
  • Auto Failover Group

Module 4: Azure Storage Service

  • Storage Service and Account
  • Creating a Storage Account
  • Standard and Premium Performance
  • Understanding Replication
  • Hot, Cool and Archive Access Tiers
  • Working with Containers and Blobs
  • Types of Blobs
    • Block Blobs
    • Append Blobs
    • Page Blobs
  • Blob Metadata
  • Soft Delete
  • Azure Storage Explorer
  • Access blobs securely
    • Access Key
    • Account Shared Access Token
    • Service Shared Access Token
    • Shared Access Policy
  • Storage Service Encryption
  • Azure Key Vault

Module 5: Azure Data Lake

  • Introduction to Azure Data Lake
  • What is Data Lake?
  • What is Azure Data Lake?
  • Data Lake Architecture
  • Working with Azure Data Lake
  • Provisioning Azure Data Lake
  • Explore Data Lake Analytics
  • Explore Data Lake Store
  • Uploading Sample File
    • Using Azure Portal
    • Using Storage Explorer
    • Using Azure CLI

Module 6: Azure Data Factory

  • What is Data Factory?
  • Data Factory Key Components
    • Pipeline and Activity
    • Linked Service
    • Dataset
    • Integration Runtime
  • Provision Required Azure Resources
    • Create Resource Group
    • Create Storage Account
    • Provision SQL Server and Create Database
    • Provision Data Factory

Module 7: Working with Copy Activity

  • Understanding Data Factory UI
  • Copy Data from Blob Storage to SQL Database
  • Copy data from storage account to storage account
  • Create Linked Service
  • Create Dataset
  • Create Pipeline
  • Integration Service
  • Copy Data from on-premises SQL Server to Blob Storage
  • Working with Activities
    • Understanding Lookup Activity
    • Understanding ForEach Activity
    • Filter Activity
    • Get Metadata Activity
  • Azure Lift and Shift
    • Provisioning Azure-SSIS Integration Runtime
    • Execute SSIS Packages from Azure
    • Execute SSIS Packages from SSISDB
  • Triggers and Monitoring Pipeline
    • Debug Pipeline
    • Trigger pipeline manually
    • Monitor pipeline
    • Trigger pipeline on schedule

Module 8: Practical Scenarios and Use Cases

  • ADF Introduction
  • Important Concepts in ADF
  • Create Azure Free Account for ADF
  • Integration Runtime and Types
  • Integration runtime in ADF-Azure IR
  • Create Your First ADF
  • Create Your First Pipeline in ADF
  • Azure Storage Account Integration with ADF
  • Copy multiple files from blob to blob
  • Filter Activity - Dynamic Copy Activity
  • Get File Names from Folder Dynamically
  • Deep dive into Copy Activity in ADF
  • Copy Activity Behavior in ADF
  • Copy Activity Performance Tuning in ADF
  • Validation in ADF
  • Get Count of files from folder in ADF
  • Validate copied data between source and sink in ADF
  • Azure SQL Database integration with ADF
  • Azure SQL Databases - Introduction to Relational Databases
  • Creating Your First Azure SQL Database
    • 1) Deployment Models
    • 2) Purchasing Models
  • Overwrite and Append Modes in Copy Activity
  • Full Load in ADF
  • Copy Data from Azure SQL Database to BLOB in ADF
  • Copy multiple tables in Bulk with Lookup & ForEach in Data Factory
  • Logging and Notification Azure Logic Apps
  • Log Pipeline Executions to SQL Table using ADF
  • Custom Email Notifications - Send Error Notification with Logic App
  • Use ForEach loop activity to copy multiple tables - Step-by-Step Explanation
  • Incremental Load in ADF
  • Incremental Load or Delta load from SQL to Blob Storage in ADF
  • Multi Table Incremental Load or Delta load from SQL to Blob Storage
  • Incrementally copy new and changed files based on Last Modified Date
  • Azure Key Vault integration with ADF
  • Azure Key Vault, Secure secrets, keys & certificates in Azure Data
  • ADF Triggers:
  • Event Based Trigger in ADF
  • Tumbling window trigger dependency & parameters
  • Schedule Trigger
  • Self-Hosted Integration Runtime
  • Copying on-premises data using Azure Self-Hosted Integration Runtime
  • Data Migration from on-premises SQL Server to cloud using ADF
  • Load data from on-premises SQL Server to Azure SQL DB
  • Data Migration with PolyBase and Bulk Insert
  • Copy Data from SQL Server to Azure SQL DW with PolyBase & Bulk Insert
  • Data Migration from on-premises File System to cloud using ADF
  • Copy Data from on-premises File System to ADLS Gen2
  • Copying data from REST API using ADF
  • Loop through REST API and copy data to ADLS Gen2 - Linked Service Parameters
  • AWS S3 integration with ADF
  • Migrate Data from AWS S3 Buckets to ADLS Gen2
  • Activities in ADF
  • Switch Activity-Move and delete data
  • Until Activity-Parameters & Variables
  • Copy Recent Files From Blob input to Blob Output folder without LPV
  • Snowflake integration with ADF
  • Copy data from Snowflake to ADLS Gen2
  • Copy data from ADLS Gen2 to Snowflake
  • Azure Cosmos DB integration with ADF
  • Copy data from Azure SQL DB to Cosmos DB
  • Copy data from Blob to Cosmos DB
  • Advanced Concepts in ADF
  • Nested ForEach - pass parameters from master to child pipeline
  • High Availability of Self-Hosted IR & Sharing IR with other ADF
  • Data Flows Introduction
  • Azure Data Flows Introduction
  • Setup Integration Runtime for Data Flows
  • Basics of SQL Joins for Azure Data Flows
  • Joins in Data Flows
  • Aggregations and Derived Column Transformations
  • Joins in Azure Data Flows
  • Advanced Join Transformations with Filter and Conditional Split
  • Data Flows - Data processing use case 1
  • Restart data processing from failure
  • Remove Duplicate Rows & Store Summary Credit Stats
  • Difference Between Join vs. Lookup Transformation & Merge Functionality
  • Dimensions in Data Flows
  • Flatten Transformation
  • Rank, Dense_Rank Transformations
  • Data Flows Performance Metrics and Data Flow Parameters
  • How to use pivot and unpivot Transformations
  • Data Quality Checks and Logging using Data Flows
  • Batch Account Integration with ADF
  • Custom Activity in ADF
  • Azure Functions Integration with ADF
  • Azure HDInsight Integration with ADF
  • Azure HDInsight with Spark Cluster
  • Azure Databricks Integration with ADF
  • ADF Integration with Azure Databricks
  • Azure Data Lake Analytics integration with ADF

Module 9: Spark Basics

  • Spark Architecture
  • Spark RDD
  • Spark SQL
  • Spark SQL Functions
  • Spark SQL Advanced

Module 10: Python Basics

  • Python Basics
  • Python Data Types
  • Python Functions
  • Python Modules and packages
  • Python File handling
  • Python Data Structures

Train your teams on the theory and build technical mastery of the cloud computing topics essential to the enterprise, such as security, compliance, and migration, on AWS, Azure, and Google Cloud Platform.
