• Welcome to CloudMonks
  • +91 96660 64406 | +91 9849668223
  • info@thecloudmonks.com

Azure Databricks Online Training

About The Course

Our Azure Databricks training in Hyderabad covers Azure Databricks, an easy, fast, and collaborative Apache Spark-based analytics platform. Azure Databricks accelerates innovation by bringing data science, data engineering, and the business together, making data analytics more productive, more secure, more scalable, and optimized for Azure.

The Databricks cloud service was built by the team that started the Spark research project at UC Berkeley (the project that later became Apache Spark), and it is the leading Spark-based analytics platform. The Azure service, Microsoft Azure Databricks, provides data science and data engineering teams with a fast, easy, and collaborative Spark-based platform on Azure. It gives Azure users a single platform for big data processing and machine learning.

Azure Databricks is a “first party” Microsoft service, the result of a unique year-long collaboration between the Microsoft and Databricks teams to provide Databricks’ Apache Spark-based analytics service as an integral part of the Microsoft Azure platform.

Azure Databricks leverages Azure’s security and integrates seamlessly with Azure services such as Azure Active Directory, SQL Data Warehouse, and Power BI.

  • Databricks + Apache Spark + enterprise cloud = Azure Databricks
  • It is a fully managed version of the open-source Apache Spark analytics platform, featuring optimized connectors to storage platforms for the fastest possible data access.
  • It offers a notebook-oriented Apache Spark-as-a-service workspace environment that makes it easy to explore data interactively and manage clusters.
  • It is a secure, cloud-based machine learning and big data platform.
  • It supports multiple languages: Scala, Python, R, Java, and SQL.

Module 1: Cloud Computing Concepts

  • What is the "Cloud"?
  • Why Cloud Services
  • Types of cloud services
    • Infrastructure as a Service (IaaS)
    • Platform as a Service (PaaS)
    • Software as a Service (SaaS)

Module 2: Big Data Introduction

  • What is Big Data?
  • Characteristics of Big Data
  • Types of Big Data
    • Structured Data
    • Unstructured Data
    • Semi-Structured Data

Module 3: Azure Cloud Storage Technologies

  • Azure Blob Storage
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • Azure SQL Database
  • Synapse Dedicated SQL Pool

Module 4: Azure Blob Storage

  • Storage Account
  • Containers
  • Types Of Blobs
  • Performance Tiers
  • Access Tiers
  • Data Replication Policies

Module 5: Azure Data Lake Storage Gen2

  • Enable the Hierarchical Namespace
  • Access Control List (ACL)
  • Features of ADLS Gen2

Module 6: Azure SQL Database

  • Compute & Storage Configurations
  • vCore Based Purchasing Model
  • DTU Based Purchasing Model
  • Firewall Rules

Module 6: Introduction to Azure Databricks

  • Introduction to Databricks
  • Azure Databricks Architecture
  • Azure Databricks Main Concepts

Module 7: Creating an Azure Databricks Service

  • Creating a Databricks workspace in the Azure Portal
  • Creating a Databricks service using the Azure CLI (command-line interface)
  • Creating a Databricks service using Azure Resource Manager (ARM) templates
  • Adding users and groups to the workspace
  • Creating a cluster from the user interface (UI)
  • Getting started with notebooks and jobs in Azure Databricks
  • Authenticating to Databricks using a personal access token (PAT)

Module 8: Databricks Cluster Management

  • Creating and configuring clusters
  • Managing Clusters
    • Displaying clusters
    • Starting a cluster
    • Terminating a cluster
    • Deleting a cluster
    • Cluster Information
    • Cluster logs
    • Cluster access control
  • Types of Clusters
    • All-purpose clusters
    • Job clusters
  • Databricks Pools
    • Databricks without pools
    • Databricks with Pools
  • Cluster Modes
    • Standard
    • High Concurrency
    • Single Node
  • Autoscaling
  • Databricks runtime versions
  • Multiuser Clusters

Module 9: Databricks Notebook Core Functionalities

  • Creating and managing notebooks
  • Exporting notebooks
  • Importing notebooks
  • Attaching a notebook to a cluster
  • Spark environment variables
    • SparkContext(sc)
    • SQLContext/HiveContext(sqlContext)
    • SparkSession(spark)
  • Scheduling a notebook
  • Default Language
  • Notebook permissions
  • Folder permissions
  • Cloning a notebook
  • Renaming a notebook

Module 10: Databricks Utilities and Notebook Parameters

  • dbutils commands on files and directories
  • Notebooks and libraries
  • Databricks Variables
  • Widget Types
  • Databricks notebook parameters
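
A sketch of how widgets make a notebook parameterized. This fragment runs only inside a Databricks notebook, where `dbutils` is predefined; the widget name and paths are illustrative:

```python
# Databricks notebook fragment: `dbutils` exists only on Databricks.
# Create a text widget and read its value as a notebook parameter.
dbutils.widgets.text("env", "dev", "Environment")
env = dbutils.widgets.get("env")
print(f"Running against environment: {env}")

# Basic file-system utilities on DBFS.
dbutils.fs.mkdirs("/tmp/demo")
print(dbutils.fs.ls("/tmp"))
```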

Module 11: Azure Databricks Integration with Azure Blob Storage

  • Mount Azure Blob Storage To DBFS
  • Access Blob Storage Using Direct Connection - Account Access Key
  • Access Blob Storage Using SAS Token
  • Writing Data To Azure Blob Storage
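
A sketch of the direct-connection approach named above, for a Databricks notebook; the account name, container, and key below are placeholders, not real values:

```python
# Databricks notebook fragment -- all names and keys are placeholders.
storage_account = "mystorageacct"
container = "raw"

# Direct access using the storage account access key (no mount required).
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    "<account-access-key>")

# Read from, then write back to, the container over wasbs://
df = spark.read.csv(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/in/data.csv",
    header=True)
df.write.mode("overwrite").parquet(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/out/")
```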

Module 12: Azure Databricks Integration with Azure Data Lake Storage Gen2

  • Mount ADLS Gen2 to DBFS Using OAuth 2.0 with a Service Principal
  • Access ADLS Gen2 Using Direct Connection
  • Reading files from Azure Data Lake Storage Gen2
  • Writing files to Azure Data Lake Storage Gen2
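
The OAuth 2.0 / service-principal connection listed above can be sketched as follows for a Databricks notebook; the account, container, and credential values are placeholders:

```python
# Databricks notebook fragment -- service principal values are placeholders.
account = "mydatalake"
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
# Apply the settings per storage account (direct access, no mount).
for key, value in configs.items():
    spark.conf.set(f"{key}.{account}.dfs.core.windows.net", value)

df = spark.read.parquet(f"abfss://bronze@{account}.dfs.core.windows.net/events/")
```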

Module 13: Azure Databricks Integration with Azure Data Lake Storage Gen1

  • Mount ADLS Gen1 to DBFS Using OAuth 2.0 with a Service Principal
  • Access ADLS Gen1 Using Direct Connection
  • Reading Files from Data Lake Storage Gen1
  • Writing Files to Data Lake Storage Gen1

Module 14: Databricks Integration with Azure SQL Database

  • Reading and Writing data from Azure SQL Database
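
A minimal sketch of reading and writing over JDBC; the server, database, tables, and credentials below are placeholders (the SQL Server JDBC driver is typically preinstalled on Databricks runtimes):

```python
# Notebook fragment -- connection details are placeholders.
jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
            "database=mydb")
props = {"user": "sqladmin", "password": "<password>",
         "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}

# Read a table from Azure SQL Database into a DataFrame.
df = spark.read.jdbc(url=jdbc_url, table="dbo.Customers", properties=props)

# Write a DataFrame back, appending to an existing table.
df.write.jdbc(url=jdbc_url, table="dbo.Customers_copy",
              mode="append", properties=props)
```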

Module 15: Databricks Integration with Azure Synapse

  • Reading and Writing Azure Synapse data from Azure Databricks

Module 16: Databricks Integration with Snowflake

  • Reading and Writing data from Snowflake

Module 17: Databricks Integration with the Cosmos DB SQL API

  • Reading and Writing data from an Azure Cosmos DB Account

Module 18: Azure Databricks - CSV File Formats

  • Reading CSV Files
  • Reading TSV Files
  • Reading Pipe-Separated CSV Files
  • Reading CSV Files with Multiple Delimiters
  • Reading Multi-Delimiter CSV Files with Delimiters in Different Positions

Module 19: Azure Databricks - Parquet Files

  • Reading Parquet files from Data Lake Storage Gen2
  • Reading and Creating Partition files in Spark
  • Writing Parquet files to Data Lake Storage Gen2

Module 20: Azure Databricks - Reading and Writing Excel Files

  • Installing libraries on the cluster
  • Reading and Writing Excel files

Module 21: Azure Databricks - XML Files

  • Reading and writing XML Files

Module 22: Azure Databricks - Parsing Complex JSON Files

  • Reading and Writing JSON Files
  • Handling Complex JSON files

Module 23:Reading and Writing ORC and Avro Files

  • Reading and Writing ORC Files
  • Reading and Writing Avro Files

Module 24: Big Data Analytics

  • Understanding the collect() method
  • Understanding the use of inferSchema
  • Learning to differentiate CSV and Parquet
  • Learning to differentiate Pandas and Koalas
  • Understanding built-in Spark functions
  • Learning column predicate pushdown
  • Learning partitioning strategies in Spark
  • Understanding Spark SQL optimizations
  • Understanding bucketing in Spark

Module 25: Working with batch processing in Databricks

  • Reading data
  • Checking row count
  • Selecting columns
  • Filtering data
  • Dropping columns
  • Adding or replacing columns
  • Printing schema
  • Renaming a column
  • Dropping duplicate rows
  • Limiting output rows
  • Sorting rows
  • Grouping data
  • Visualizing data
  • Writing data to a sink

Module 26: Structured Streaming in Azure Databricks

  • Structured Streaming concepts
  • Managing streams
  • Sorting data
  • Productionizing Structured Streaming

Module 27: Abstracting Data with DataFrames

  • Creating DataFrames
  • Accessing underlying RDDs
  • Performance optimization
  • Inferring the schema using reflection
  • Specifying the schema programmatically
  • Creating a temporary table
  • Using SQL to interact with DataFrames
  • Overview of DataFrame transformations
  • Overview of DataFrame actions

Module 28: Databricks Delta Lake

  • Delta Lake Introduction
  • Delta Lake Architecture
  • Creation of Delta Table
    • Using Spark SQL
    • Using PySpark
    • Using Dataframe
  • Working with OPTIMIZE and ZORDER Commands
  • Auto Optimize
  • Delta Caching
  • Dynamic Partition pruning
  • Bloom filter indexing
  • Delta Lake audit log Table
  • Delta Lake: Restore Command
  • Delta Lake: Vacuum Command
  • Data insertion Approaches
  • Different Approaches to Delete Data

Module 29: Delta Lake - Batch Table Reads and Writes

  • Creating a table
  • Reading a Delta table
  • Partitioning data to speed up queries
  • Querying past states of a table
  • Using time travel to query a table
  • Working with past and present data
  • Schema Validation

Module 30: Delta Lake - Streaming Table Reads and Writes

  • Streaming from Delta tables
  • Managing table updates and deletes
  • Specifying an initial position
  • Streaming modes
  • Optimization with Delta Lake

Module 31: PySpark - Join Types

  • Inner Join
  • Left outer Join
  • Right outer Join
  • Full outer Join
  • Left semi Join
  • Left anti Join

Module 32: PySpark - Data Merging

  • Union
  • UnionAll

Module 33: PySpark - User Defined Functions (UDFs)

Module 34: Slowly Changing Dimensions

  • Implement a Type 1 Dimension using Delta Lake
  • Implement a Type 2 Dimension using Delta Lake
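
One common way to sketch a Type 2 dimension with Delta Lake's MERGE, for a Databricks notebook; the `dim_customer` and `updates` tables and their columns (`is_current`, `end_date`) are illustrative assumptions, not a fixed recipe:

```python
# Databricks/Delta notebook fragment -- all table/column names are assumed.
# Step 1: close out current rows whose tracked attribute changed.
spark.sql("""
    MERGE INTO dim_customer AS t
    USING updates AS s
    ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.address <> s.address THEN
      UPDATE SET t.is_current = false, t.end_date = current_date()
""")

# Step 2: insert new current rows for changed and brand-new customers
# (anyone who no longer has a current row).
spark.sql("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.address, true, current_date(), NULL
    FROM updates s
    WHERE NOT EXISTS (
      SELECT 1 FROM dim_customer t
      WHERE t.customer_id = s.customer_id AND t.is_current = true)
""")
```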

Module 35: PySpark Introduction

  • PySpark Introduction
  • PySpark Components and Features

Module 36: Spark Architecture and Internals

  • Apache Spark Internal architecture
  • Jobs, stages, and tasks
  • Spark Cluster Architecture Explained

Module 37: Spark RDD

  • Different Ways to create RDD in Databricks
  • Spark Lazy Evaluation Internals & Word Count Program
  • RDD Transformations in Databricks & coalesce vs repartition
  • RDD Transformation and Use Cases

Module 38: Spark SQL

  • Spark SQL Introduction
  • Different ways to create DataFrames

Module 39: Introduction to Python

Module 39: Writing Our First Python Program

Module 40: Datatypes In Python

Module 41: Operators in Python

Module 42: Input And Output

Module 43: Control Statements

Module 44: Strings and Characters

Module 45: Lists

Module 46: Tuples

Module 47: Dictionaries

Module 48: Sets

Module 49: Functions

Module 50: Modules
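
A minimal sketch tying together the core collection types and functions covered in the Python modules above:

```python
# Core Python collections, as covered in the modules above.
prices = [10, 20, 30]                 # list: ordered, mutable
point = (3, 4)                        # tuple: ordered, immutable
stock = {"apple": 5, "mango": 2}      # dictionary: key -> value mapping
tags = {"new", "sale", "new"}         # set: duplicates are discarded

def total(items):
    """Sum a list of numbers."""
    return sum(items)

print(total(prices))    # 60
print(len(tags))        # 2
print(stock["apple"])   # 5
```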

Train your teams on the theory and practice of cloud computing topics essential to the enterprise, such as security, compliance, and migration, on AWS, Azure, and Google Cloud Platform.

Talk With Us