• Welcome to CloudMonks
  • +91 96660 64406
  • info@thecloudmonks.com

Azure Data Engineering - Azure Databricks


An Azure Data Engineering - Databricks role encompasses the entire data lifecycle within the Azure ecosystem, from data ingestion to analysis and reporting. These engineers design, implement, and maintain data pipelines, data warehouses, and data lake solutions using a range of Azure services. They also handle data transformation, security, and performance optimization.


Responsibilities:

Data Ingestion and Extraction:

Bringing data from various sources (structured, unstructured, real-time) into Azure.

Data Transformation and Cleaning:

Ensuring data quality and consistency through cleaning, transformation, and integration processes.

Data Storage:

Designing and implementing data storage solutions, including Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.

Data Warehousing:

Building and maintaining data warehouses using Azure Synapse Analytics.

Data Pipeline Development:

Creating and managing automated data pipelines for efficient data movement and processing using Azure Data Factory or Azure Databricks.

Data Security and Compliance:

Implementing security measures (encryption, access control) and ensuring compliance with data privacy laws.

Performance Monitoring and Optimization:

Identifying and resolving performance bottlenecks in data systems.

Collaboration:

Working with data scientists, analysts, and business stakeholders to understand their needs and implement appropriate data solutions.



Azure Data Engineering - Azure Databricks - Interactive Live Sessions



  • 1: Introduction to Azure Databricks
    • Core Databricks Concepts
    • Workspace
    • Notebooks
    • Library
    • Folder
    • Repos
    • Data
    • Compute
    • Workflows
  • 2: Introducing Spark Fundamentals
    • What is Apache Spark
    • Why Choose Apache Spark
    • What are the Spark use cases
  • 3: Spark Architecture
    • Spark Components
    • Spark Driver
    • SparkSession
    • Cluster manager
    • Spark Executors
  • 4: Create Databricks Workspace
    • Workspace Assets
  • 5: Creating Spark Cluster
    • All-Purpose Cluster
    • Single Node Cluster
    • Multi Node Cluster
  • 6: Databricks - Internal Storage
    • Databricks File System (DBFS)
    • Uploading Files to DBFS
  • 7: DBUTILS Module
    • Interaction with DBFS
    • %fs Magic Command
  • 8: Spark Data APIs
    • RDD (Resilient Distributed Dataset)
    • DataFrame
    • Dataset
  • 9: Create DataFrame
    • Using Python Collection
    • Converting RDD to DataFrame
  • 10: Reading CSV data with Apache Spark
    • Inferred Schema
    • Explicit Schema
    • Parsing Modes
  • 11: Reading JSON data with Apache Spark
    • Single-Line JSON
    • Multi-Line JSON
    • Complex JSON
    • explode() Function
  • 12: Reading XML Data with Apache Spark
    • Install Spark-xml Library
    • User Defined Schema
    • DDL String Approach
    • StructType() with StructFields()
  • 13: Reading Excel File With Apache Spark
    • Single Sheet Reading
    • Multiple Sheet Reading Using a List Object
  • 14: Reading Excel File With Apache Spark
    • Multiple Excel Sheets with Same Structure
    • Multiple Excel Sheets with Different Structures
  • 15: Introduction to Delta Lake
    • Delta Lake Features
    • Delta Lake Components
  • 16: Delta Lake Features
    • DML Operations
    • Time Travel Operations
  • 17: Delta Lake Features
    • Schema Validation and Enforcement
    • Schema Evolution
  • 18: Introduction to Spark SQL Module
    • Hive Metastore
    • Spark Catalog
  • 19: Spark SQL - Create Global Managed Tables
    • DataFrame API
    • SQL API
  • 20: Spark SQL - Create Global Unmanaged Tables
    • DataFrame API
    • SQL API
  • 21: Spark SQL - Create Views
    • Temporary Views
    • Global Temporary Views
    • DataFrame API
    • SQL API
    • Dropping Views
  • 22: Access Data from Azure Blob Storage
    • Account Access Key
    • Windows Azure Storage Blob driver (WASB)
    • Read Operations
    • Write Operations
  • 23: Access Data from Azure Data Lake Gen2
    • Azure Service Principal
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations
  • 24: Access Data from Azure Data Lake Gen2
    • Shared Access Signatures (SAS)
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations
  • 25: Access Data from Azure SQL Database
    • Configure a Connection to SQL Server
  • 26: Access Data from Synapse Dedicated SQL Pool
    • Configure storage account access key
    • Read data from an Azure Synapse table
    • Write Data to Azure Synapse table
  • 27: Access Data from Snowflake
    • Reading Data
    • Writing Data
  • 28: Create Mount Points to Azure Cloud Storage
    • Azure Blob Storage
    • Azure Data Lake Storage
  • 29: Spark Batch Processing
    • Reading Batch Data
    • Writing Batch Data
  • 30: Spark Structured Streaming API
    • Reading Streaming Data
    • Write Streaming Data
    • Checkpoint Location
  • 31: Code Modularity of Notebooks
    • %run Magic Command
  • 32: dbutils.notebook Utility
    • run()
    • exit()
  • 33: Widgets - Types of Widgets
    • text
    • dropdown
    • multiselect
    • combobox
  • 34: Parameterization of Notebooks
    • History Load
    • Incremental Load
  • 35: Trigger Notebook from Data Factory Pipeline
    • Notebook Parameters
  • 36: Databricks Workflow
    • Orchestration of Tasks
  • 37: Databricks Workflow
    • Task Parameters
    • Job Trigger
  • 38: Delta Lake Implementation
    • SCD Type 0 Dimension
  • 39: Delta Lake Implementation
    • SCD Type 1 Dimension
  • 40: Delta Lake Implementation
    • SCD Type 3 Dimension
  • 41: Databricks - Auto Loader
    • Auto Loader file detection modes
    • Directory Listing mode
    • File Notification mode
    • Schema Evolution with Auto Loader
  • 42: Databricks Unity Catalog
    • Metastore
    • Catalog
    • Schema
    • Tables
    • Volumes
    • Views
    • Managed Tables
    • External Tables
    • Managed Volumes
    • External Volumes
  • 43: Delta Live Tables
    • Simple Declarative SQL & Python APIs
    • Automated Pipeline Creation
    • Data Quality Checks
  • 44: Data Engineering using Apache Spark, Delta Lake and Notebooks
    • Introduction to Spark Compute in Microsoft Fabric
    • Apache Spark Job Definition
    • Apache Spark Monitoring in Microsoft Fabric
    • Delta Lake Table Optimization and V-Order
    • Working with Fabric Notebooks
    • Create a Workspace and Build a Lakehouse in Fabric
    • Install OneLake Explorer & Data Studio
    • Create Your First Warehouse in Fabric | Lakehouse vs Warehouse
    • Apache Spark in Fabric
    • Work with Delta Lake Tables in Microsoft Fabric
  • 45: Introduction to Azure Data Factory (ADF)
    • ADF Key Components_Data Ingestion from Blob to Data Lake Storage
    • Bulk Ingestion of Data from Files to Tables Using Parameterization
    • Bulk Ingestion of Data from Tables to Files Using Parameterization
    • Copy Raw Data from On-premise File System to Cloud Storage
    • Integrate ADF with Azure Key Vault to Access Secrets
    • Introduction to DataFlows_Design Dataflow with Transformations


    Azure Data Engineering - Azure Databricks Assessments

  • PySpark_Transformation
    • Identify Duplicate Records
    • Eliminate Duplicate Records
    • Dropping Rows with Nulls
  • PySpark_Transformation
    • Join and Types of Joins
    • Filling Nulls with Values Using fillna()
  • PySpark_Transformation
    • Join and Types of Joins
  • PySpark_Transformation
    • Types of joins_Joins Pocket Guide
  • PySpark_Transformation
    • Merging DataFrames Using union()_unionByName()
  • PySpark_Transformation
    • Calculating Business Aggregates
    • Single and Multi Aggregations
  • PySpark_Transformation
    • Window Functions
    • row_number()
    • rank()
    • dense_rank()
  • PySpark_Transformation
    • Window Functions
    • sum()
    • rank()
    • lag()
  • PySpark_Transformation
    • Pivoting Data
    • Unpivoting Data
  • Delta Lake
    • Vacuum Command
  • Spark Structured Streaming API - outputModes
    • Append
    • Complete
    • Update
  • Spark Structured Streaming API_Triggers
    • Unspecified Trigger (Default Behavior)
    • trigger(availableNow = True)
    • trigger(processingTime = "n minutes")
  • Spark Structured Streaming API
    • Data Processing
    • Joins
    • Aggregation
  • Databricks_COPY INTO SQL Command
    • Incremental Data Ingestion
  • Databricks_Auto Loader
    • Schema Inference
    • Schema Hints
    • Schema Location
  • Databricks_Auto Loader
    • Schema Evolution Modes
  • dbutils.notebook Utility
    • run()
    • exit()
  • PySpark Performance Optimization
    • cache()
    • persist()
  • PySpark Performance Optimization
    • repartition()
    • coalesce()
  • PySpark Performance Optimization
    • Column Predicate Pushdown
    • partitionBy()
  • PySpark Performance Optimization
    • bucketBy()
  • PySpark Performance Optimization
    • Broadcast Join
  • Delta Lake_Performance Optimization
    • OPTIMIZE
    • ZORDER
  • Delta Lake_Performance Optimization
    • Delta Cache
  • Delta Lake_Performance Optimization
    • Liquid Clustering
  • Delta Lake_Performance Optimization
    • Partitioning
    • Liquid Clustering
  • Unity Catalog
    • Create Catalog
    • Schema
    • Tables Using UI and SQL
  • Unity Catalog Metastore - Storage Account Container
    • Read CSV Files
  • External Data Lake Storage Account
    • Storage Credentials
    • External Locations
    • Read CSV Files
  • Unity Catalog - Managed Tables
    • Managed Tables
    • Managed Storage Locations
  • Unity Catalog - Create External Tables
    • External Tables
  • Unity Catalog - Volumes
    • Create Managed Volume using Catalog Explorer UI
    • Create Managed Volume using SQL
  • Delta Lake Implementation
    • SCD Type 2 Dimension
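The assessments close with an SCD Type 2 dimension build. Below is a plain-Python conceptual sketch of the versioning logic — expire the current row on a changed attribute, insert a new current version, insert new keys directly. In the course itself this is implemented with Delta Lake MERGE; the customer fields and dates here are illustrative only:

```python
from datetime import date

# Existing dimension table: one "current" row per business key.
dim = [
    {"cust_id": 1, "city": "Pune", "start": date(2023, 1, 1), "end": None, "current": True},
]
# Incoming source rows: one changed customer, one brand-new customer.
incoming = [{"cust_id": 1, "city": "Mumbai"}, {"cust_id": 2, "city": "Delhi"}]

load_date = date(2024, 1, 1)
for row in incoming:
    match = next((d for d in dim if d["cust_id"] == row["cust_id"] and d["current"]), None)
    if match and match["city"] != row["city"]:
        # Type 2: expire the current version, then insert a new current version.
        match["end"], match["current"] = load_date, False
        dim.append({**row, "start": load_date, "end": None, "current": True})
    elif match is None:
        # New business key: insert as its first current version.
        dim.append({**row, "start": load_date, "end": None, "current": True})

current_rows = [d for d in dim if d["current"]]
```

The same three branches map onto a Delta MERGE: `WHEN MATCHED AND <attributes changed> THEN UPDATE` (expire) plus an insert of the new version, and `WHEN NOT MATCHED THEN INSERT`.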

    Azure Data Factory - Assessments


    • Data Ingestion_Copy Data Tool (CDT)
    • Add New Columns While Copying Data
    • Copy Data Activity_Execute Pipeline Activity_ADLS Gen2_SQL DB
    • Filter File Formats Based on File Size and Delete Files from Source Storage
    • Insert Metadata_Get Metadata_Stored Procedure Activity
    • Insert Metadata_About CSV Files in Azure Storage_Get Metadata_Stored Procedure Activity
    • Copy Data Activity_Linked Service_Dataset_Pipeline Parameters_Copy Multiple Files to Tables
    • Copy Data Activity_Copy Behaviour
    • Dataflows_Rank Transformation
    • Dataflows_Parse Transformation
    • Dataflows_Stringify Transformation
    • Dataflows_SurrogateKey_Transformation
    • Dataflows_Window Transformation
    • Dataflows_Conditional Split Transformation
    • Dataflows_Aggregator_Sorter Transformation
    • Dataflows_Lookup Transformation
    • Dataflows_Exists Transformation
    • REST API Integration
    • Copy Data Activity_Copy Behaviour_Preserve Hierarchy_Flatten Hierarchy_Merge Files
    • Copy Data Activity_Filter By Last Modified Date_Dynamic Date Expressions
    • Copy Data from JSON File To Azure SQL Database Table
    • Execute Copy Data Activity based on File Count in the Container
    • Copy Data Activity_List of Files Configuration
    • Dataflows_Flatten Transformations
    • Dataflows_Pivot Transformations
    • Implement Tumbling Window Trigger
    • Differences Between Debug vs Trigger Now
    • Copy Data_On-premise File System To ADLS Gen2
    • Copy Data from On-premise to Azure Cloud Storage
    • Copy Data Activity_Excel File Formats
    • Copy Data Activity_Excel File Formats_Lookup Activity_Pipeline Variables
    • Copy Data Activity_XML File Formats
    • Insert Metadata About a Storage Container Dynamically Using a Parameterized Stored Procedure
    • Copy Data from Azure Blob Storage To ADLS Gen2
    • Copy Data from Azure Data Lake Storage Gen2 To Azure SQL Database
    • Copy Data from Multiple Files(ADLS Gen2) To Multiple Tables(Azure SQL DB)
    • Copy Data Activity_Source File Path Type Configurations
    • Execution of Copy Data Activity based on File Count in the Container
    • Data Ingestion from JSON File Format to Table
    • Create Dataflows_Select_Filter_Derived Column Transformation
    • Create Dataflows_Join_Union Transformation


    Train your teams on the theory behind cloud computing and build technical mastery of enterprise-essential topics such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

    Talk With Us