• Welcome to CloudMonks
  • +91 96660 64406
  • info@thecloudmonks.com

Azure Data Engineering - Azure Databricks


An Azure Data Engineering - Databricks role encompasses the entire data lifecycle within the Azure ecosystem, from data ingestion to analysis and reporting. These engineers design, implement, and maintain data pipelines, data warehouses, and data lake solutions using a range of Azure services. They also handle data transformation, security, and performance optimization.


Responsibilities:

Data Ingestion and Extraction:

Bringing data from various sources (structured, unstructured, real-time) into Azure.

Data Transformation and Cleaning:

Ensuring data quality and consistency through cleaning, transformation, and integration processes.

Data Storage:

Designing and implementing data storage solutions, including Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.

Data Warehousing:

Building and maintaining data warehouses using Azure Synapse Analytics.

Data Pipeline Development:

Creating and managing automated data pipelines for efficient data movement and processing using Azure Data Factory or Azure Databricks.

Data Security and Compliance:

Implementing security measures (encryption, access control) and ensuring compliance with data privacy laws.

Performance Monitoring and Optimization:

Identifying and resolving performance bottlenecks in data systems.

Collaboration:

Working with data scientists, analysts, and business stakeholders to understand their needs and implement appropriate data solutions.



Azure Data Engineering - Azure Databricks - Interactive Live Sessions



  • 1: Introduction to Azure Databricks
    • Core Databricks Concepts
    • Workspace
    • Notebooks
    • Library
    • Folder
    • Repos
    • Data
    • Compute
    • Workflows
  • 2: Introducing Spark Fundamentals
    • What is Apache Spark
    • Why Choose Apache Spark
    • What are the Spark use cases
  • 3: Spark Architecture
    • Spark Components
    • Spark Driver
    • SparkSession
    • Cluster manager
    • Spark Executors
  • 4: Create Databricks Workspace
    • Workspace Assets
  • 5: Creating Spark Cluster
    • All-Purpose Cluster
    • Single Node Cluster
    • Multi Node Cluster
  • 6: Databricks - Internal Storage
    • Databricks File System (DBFS)
    • Uploading Files to DBFS
  • 7: DBUTILS Module
    • Interaction with DBFS
    • %fs Magic Command
  • 8: Spark Data APIs
    • RDD (Resilient Distributed Dataset)
    • DataFrame
    • Dataset
  • 9: Create DataFrame
    • Using Python Collection
    • Converting RDD to DataFrame
  • 10: Reading CSV data with Apache Spark
    • Inferred Schema
    • Explicit Schema
    • Parsing Modes
  • 11: Reading JSON data with Apache Spark
    • Single-Line JSON
    • Multi-Line JSON
    • Complex JSON
    • explode() Function
  • 12: Reading XML Data with Apache Spark
    • Install Spark-xml Library
    • User Defined Schema
    • DDL String Approach
    • StructType() with StructFields()
  • 13: Reading Excel File With Apache Spark
    • Single Sheet Reading
    • Multiple Sheet Reading Using a List Object
  • 14: Reading Excel File With Apache Spark
    • Multiple Excel Sheets with Same Structure
    • Multiple Excel Sheets with Different Structures
  • 15: Introduction to Delta Lake
    • Delta Lake Features
    • Delta Lake Components
  • 16: Delta Lake Features
    • DML Operations
    • Time Travel Operations
  • 17: Delta Lake Features
    • Schema Validation and Enforcement
    • Schema Evolution
  • 18: Introduction to Spark SQL Module
    • Hive Metastore
    • Spark Catalog
  • 19: Spark SQL - Create Global Managed Tables
    • DataFrame API
    • SQL API
  • 20: Spark SQL - Create Global Unmanaged Tables
    • DataFrame API
    • SQL API
  • 21: Spark SQL - Create Views
    • Temporary Views
    • Global Temporary Views
    • DataFrame API
    • SQL API
    • Dropping Views
  • 22: Access Data from Azure Blob Storage
    • Account Access Key
    • Windows Azure Storage Blob driver (WASB)
    • Read Operations
    • Write Operations
  • 23: Access Data from Azure Data Lake Gen2
    • Azure Service Principal
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations
  • 24: Access Data from Azure Data Lake Gen2
    • Shared Access Signatures (SAS)
    • Azure Blob Filesystem driver (ABFS)
    • Read Operations
    • Write Operations
  • 25: Access Data from Azure SQL Database
    • Configure a Connection to SQL Server
  • 26: Access Data from Synapse Dedicated SQL Pool
    • Configure storage account access key
    • Read data from an Azure Synapse table
    • Write Data to Azure Synapse table
  • 27: Access Data from Snowflake
    • Reading Data
    • Writing Data
  • 28: Create Mount Points to Azure Cloud Storage
    • Azure Blob Storage
    • Azure Data Lake Storage
  • 29: Spark Batch Processing
    • Reading Batch Data
    • Writing Batch Data
  • 30: Spark Structured Streaming API
    • Reading Streaming Data
    • Write Streaming Data
    • Checkpoint Location
  • 31: Code Modularity of Notebooks
    • %run Magic Command
  • 32: dbutils.notebook Utility
    • run()
    • exit()
  • 33: Widgets - Types of Widgets
    • text
    • dropdown
    • multiselect
    • combobox
  • 34: Parameterization of Notebooks
    • History Load
    • Incremental Load
  • 35: Trigger Notebook from Data Factory Pipeline
    • Notebook Parameters
  • 36: Databricks Workflow
    • Orchestration of Tasks
  • 37: Databricks Workflow
    • Task Parameters
    • Job Trigger
  • 38: Delta Lake Implementation
    • SCD Type 0 Dimension
  • 39: Delta Lake Implementation
    • SCD Type 1 Dimension
  • 40: Delta Lake Implementation
    • SCD Type 3 Dimension
  • 41: Databricks - Auto Loader
    • Auto Loader file detection modes
    • Directory Listing mode
    • File Notification mode
    • Schema Evolution with Auto Loader
  • 42: Databricks Unity Catalog
    • Metastore
    • Catalog
    • Schema
    • Tables
    • Volumes
    • Views
    • Managed Tables
    • External Tables
    • Managed Volumes
    • External Volumes
  • 43: Delta Live Tables
    • Simple Declarative SQL & Python APIs
    • Automated Pipeline Creation
    • Data Quality Checks
  • 44: Data Engineering using Apache Spark, Delta Lake and Notebooks
    • Introduction to Spark Compute in Microsoft Fabric
    • Apache Spark Job Definition
    • Apache Spark Monitoring in Microsoft Fabric
    • Delta Lake Table Optimization and V-Order
    • Working with Fabric Notebooks
    • Create a Workspace and Build a Lakehouse in Fabric
    • Install OneLake Explorer & Data Studio
    • Create Your First Warehouse in Fabric | Lakehouse vs Warehouse
    • Apache Spark in Fabric
    • Work with Delta Lake Tables in Microsoft Fabric
  • 45: Introduction to Azure Data Factory (ADF)
    • ADF Key Components_Data Ingestion from Blob to Data Lake Storage
    • Bulk Ingestion of Data from Files to Tables Using Parameterization
    • Bulk Ingestion of Data from Tables to Files Using Parameterization
    • Copy Raw Data from On-premise File System to Cloud Storage
    • Integrate ADF with Azure Key Vault to Access Secrets
    • Introduction to DataFlows_Design Dataflow with Transformations


    Azure Data Engineering - Azure Databricks Assessments

  • PySpark_Transformation
    • Identify Duplicate Records
    • Eliminate Duplicate Records
    • Dropping Rows with Nulls
  • PySpark_Transformation
    • Join and Types of Joins
    • Filling Nulls with Values Using fillna()
  • PySpark_Transformation
    • Join and Types of Joins
  • PySpark_Transformation
    • Types of joins_Joins Pocket Guide
  • PySpark_Transformation
    • Merging DataFrames Using union()_unionByName()
  • PySpark_Transformation
    • Calculating Business Aggregates
    • Single and Multi Aggregations
  • PySpark_Transformation
    • Window Functions
    • row_number()
    • rank()
    • dense_rank()
  • PySpark_Transformation
    • Window Functions
    • sum()
    • rank()
    • lag()
  • PySpark_Transformation
    • Pivoting Data
    • Unpivoting Data
  • Delta Lake
    • Vacuum Command
  • Spark Structured Streaming API - outputModes
    • Append
    • Complete
    • Update
  • Spark Structured Streaming API_Triggers
    • Unspecified Trigger (Default Behavior)
    • trigger(availableNow = True)
    • trigger(processingTime = "n minutes")
  • Spark Structured Streaming API
    • Data Processing
    • Joins
    • Aggregation
  • Databricks_COPY INTO SQL Command
    • Incremental Data Ingestion
  • Databricks_Auto Loader
    • Schema Inference
    • Schema Hints
    • Schema Location
  • Databricks_Auto Loader
    • Schema Evolution Modes
  • dbutils.notebook Utility
    • run()
    • exit()
  • PySpark Performance Optimization
    • cache()
    • persist()
  • PySpark Performance Optimization
    • repartition()
    • coalesce()
  • PySpark Performance Optimization
    • Column Predicate Pushdown
    • partitionBy()
  • PySpark Performance Optimization
    • bucketBy()
  • PySpark Performance Optimization
    • Broadcast Join
  • Delta Lake_Performance Optimization
    • OPTIMIZE
    • ZORDER
  • Delta Lake_Performance Optimization
    • Delta Cache
  • Delta Lake_Performance Optimization
    • Liquid Clustering
  • Delta Lake_Performance Optimization
    • Partitioning
    • Liquid Clustering
  • Unity Catalog
    • Create Catalog
    • Schema
    • Tables Using UI and SQL
  • Unity Catalog Metastore - Storage Account Container
    • Read CSV Files
  • External Data Lake Storage Account
    • Storage Credentials
    • External Locations
    • Read CSV Files
  • Unity Catalog - Managed Tables
    • Managed Tables
    • Managed Storage Locations
  • Unity Catalog - Create External Tables
    • External Tables
  • Unity Catalog - Volumes
    • Create Managed Volume using Catalog Explorer UI
    • Create Managed Volume using SQL
  • Delta Lake Implementation
    • SCD Type 2 Dimension
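The assessments close with an SCD Type 2 dimension build. Below is a plain-Python conceptual sketch of the versioning logic — expire the current row on a changed attribute, insert a new current version, insert new keys directly. In the course itself this is implemented with Delta Lake MERGE; the customer fields and dates here are illustrative only:

```python
from datetime import date

# Existing dimension table: one "current" row per business key.
dim = [
    {"cust_id": 1, "city": "Pune", "start": date(2023, 1, 1), "end": None, "current": True},
]
# Incoming source rows: one changed customer, one brand-new customer.
incoming = [{"cust_id": 1, "city": "Mumbai"}, {"cust_id": 2, "city": "Delhi"}]

load_date = date(2024, 1, 1)
for row in incoming:
    match = next((d for d in dim if d["cust_id"] == row["cust_id"] and d["current"]), None)
    if match and match["city"] != row["city"]:
        # Type 2: expire the current version, then insert a new current version.
        match["end"], match["current"] = load_date, False
        dim.append({**row, "start": load_date, "end": None, "current": True})
    elif match is None:
        # New business key: insert as its first current version.
        dim.append({**row, "start": load_date, "end": None, "current": True})

current_rows = [d for d in dim if d["current"]]
```

The same three branches map onto a Delta MERGE: `WHEN MATCHED AND <attributes changed> THEN UPDATE` (expire) plus an insert of the new version, and `WHEN NOT MATCHED THEN INSERT`.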

    Azure Data Factory - Assessments


    • Data Ingestion_Copy Data Tool (CDT)
    • Add New Columns While Copying Data
    • Copy Data Activity_Execute Pipeline Activity_ADLS Gen2_SQL DB
    • Filter File Formats Based on File Size and Delete Files from Source Storage
    • Insert Metadata_Get Metadata_Stored Procedure Activity
    • Insert Metadata_About CSV Files in Azure Storage_Get Metadata_Stored Procedure Activity
    • Copy Data Activity_Linked Service_Dataset_Pipeline Parameters_Copy Multiple Files to Tables
    • Copy Data Activity_Copy Behaviour
    • Dataflows_Rank Transformation
    • Dataflows_Parse Transformation
    • Dataflows_Stringify Transformation
    • Dataflows_SurrogateKey_Transformation
    • Dataflows_Window Transformation
    • Dataflows_Conditional Split Transformation
    • Dataflows_Aggregator_Sorter Transformation
    • Dataflows_Lookup Transformation
    • Dataflows_Exists Transformation
    • REST API Integration
    • Copy Data Activity_Copy Behaviour_Preserve Hierarchy_Flatten Hierarchy_Merge Files
    • Copy Data Activity_Filter By Last Modified Date_Dynamic Date Expressions
    • Copy Data from JSON File To Azure SQL Database Table
    • Execute Copy Data Activity based on File Count in the Container
    • Copy Data Activity_List of Files Configuration
    • Dataflows_Flatten Transformations
    • Dataflows_Pivot Transformations
    • Implement Tumbling Window Trigger
    • Differences Between Debug vs Trigger Now
    • Copy Data_On-premise File System To ADLS Gen2
    • Copy Data from On-premise to Azure Cloud Storage
    • Copy Data Activity_Excel File Formats
    • Copy Data Activity_Excel File Formats_Lookup Activity_Pipeline Variables
    • Copy Data Activity_XML File Formats
    • Insert Metadata About a Storage Container Dynamically Using a Parameterized Stored Procedure
    • Copy Data from Azure Blob Storage To ADLS Gen2
    • Copy Data from Azure Data Lake Storage Gen2 To Azure SQL Database
    • Copy Data from Multiple Files(ADLS Gen2) To Multiple Tables(Azure SQL DB)
    • Copy Data Activity_Source File Path Type Configurations
    • Execution of Copy Data Activity based on File Count in the Container
    • Data Ingestion from JSON File Format to Table
    • Create Dataflows_Select_Filter_Derived Column Transformation
    • Create Dataflows_Join_Union Transformation


    Train your teams on the theory behind cloud computing and build technical mastery of enterprise-essential topics such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

    Talk With Us