• Welcome to CloudMonks
  • +91 9849668223
  • info@thecloudmonks.com

Azure Databricks Online Training

About The Course

Azure Databricks is an easy, fast, and collaborative Apache Spark-based analytics platform. It accelerates innovation by bringing data science, data engineering, and business together, making the process of data analytics more productive, more secure, more scalable, and optimized for Azure.

The Databricks cloud service is built by the team that started the Spark research project at UC Berkeley, which later became Apache Spark, and it is the leading Spark-based analytics platform. This new service, named Microsoft Azure Databricks, provides data science and data engineering teams with a fast, easy, and collaborative Spark-based platform on Azure. It gives Azure users a single platform for Big Data processing and Machine Learning.

Azure Databricks is a “first party” Microsoft service, the result of a unique year-long collaboration between the Microsoft and Databricks teams to provide Databricks’ Apache Spark-based analytics service as an integral part of the Microsoft Azure platform.

Azure Databricks leverages Azure’s security and integrates seamlessly with Azure services such as Azure Active Directory, SQL Data Warehouse, and Power BI.

  • Databricks + Apache Spark + enterprise cloud = Azure Databricks
  • It is a fully managed version of the open-source Apache Spark analytics engine, and it features optimized connectors to storage platforms for the quickest possible data access.
  • It offers a notebook-oriented Apache Spark as-a-service workspace environment, which makes it easy to explore data interactively and manage clusters.
  • It is a secure, cloud-based machine learning and big data platform.
  • It supports multiple languages, such as Scala, Python, R, Java, and SQL.

Module 1: Cloud Computing Concepts

  • What is the "Cloud"?
  • Why cloud services
  • Types of cloud models
    • Deployment Models
      • Private Cloud
      • Public Cloud
      • Hybrid Cloud
  • Types of cloud services
    • Infrastructure as a Service (IaaS)
    • Platform as a Service (PaaS)
    • Software as a Service (SaaS)
  • Comparing Cloud Platforms
    • Microsoft Azure
    • Amazon Web Services
    • Google Cloud Platform
  • Characteristics of cloud computing
    • On-demand self-service
    • Broad network access
    • Multi-tenancy and resource pooling
    • Rapid elasticity and scalability
    • Measured service
  • Cloud Data Warehouse Architecture
    • Shared Memory architecture
    • Shared Disk architecture
    • Shared Nothing architecture

Module 2: Big Data Introduction

  • What is Big Data?
  • Big Data Sources
  • Data vs Information
  • Characteristics of Big Data
    • Variety
    • Velocity
    • Volume
    • Veracity
    • Value
  • Types of Big Data
    • Structured Data
    • Unstructured Data
    • Semi-Structured Data

Module 3: Dimensional Modeling

  • OLTP System
    • Relational Modeling
  • Characteristic Features of OLTP
  • Enterprise Data Warehouse
    • Dimensional Modeling
  • Dimensional Modeling-Schemas
    • Star Schema
    • Snowflake Schema
    • Multi Star Schema
  • Dimension Tables
  • Fact Tables
  • Types of Slowly Changing Dimensions
    • Type 1 Dimension
    • Type 2 Dimension
    • Type 3 Dimension
  • Types of Facts
    • Additive Facts
    • Semi-Additive Facts
    • Non-Additive Facts
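
As a rough illustration of the slowly changing dimension types listed above, the plain Python sketch below contrasts a Type 1 change (overwrite in place, no history) with a Type 2 change (expire the current row and add a new version). The customer rows, column names, and the helper functions `apply_type1` and `apply_type2` are hypothetical and chosen only for this example; a real warehouse would do this in SQL or Spark.

```python
from datetime import date


def apply_type1(rows, key, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row["customer_id"] == key:
            row["city"] = new_city
    return rows


def apply_type2(rows, key, new_city, change_date):
    """Type 2: expire the current row and append a new current version."""
    for row in rows:
        if row["customer_id"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    rows.append({
        "customer_id": key,
        "city": new_city,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    return rows


# Hypothetical dimension rows for one customer.
dim_type1 = [{"customer_id": 1, "city": "Hyderabad"}]
apply_type1(dim_type1, 1, "Pune")

dim_type2 = [{
    "customer_id": 1,
    "city": "Hyderabad",
    "start_date": date(2020, 1, 1),
    "end_date": None,
    "is_current": True,
}]
apply_type2(dim_type2, 1, "Pune", date(2023, 6, 1))

print(dim_type1)       # a single row with the city overwritten
print(len(dim_type2))  # the expired version plus the new current version
```

A Type 3 dimension, by contrast, would keep the previous value in an extra column on the same row rather than adding a new row.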

Module 4: Azure SQL Database

  • Introduction to Azure SQL Database
  • Comparing Single Database and Managed Instance
  • Creating and Using SQL Server
  • Creating SQL Database Services
  • Azure SQL Database Tools
  • Migrating an on-premises database to Azure SQL Database
  • Purchasing Models
  • DTU service tiers
  • vCore based Model
  • Serverless compute tier
  • Service Tiers
    • General purpose / Standard
    • Business Critical / Premium
    • Hyperscale
  • Deployment of an Azure SQL Database
  • Elastic Pools
  • What are SQL elastic pools?
    • Choosing the correct pool size
  • Creating a New Pool
  • Manage Pools

Module 5: Azure Storage Service

  • Azure Storage Account
  • Features of Azure Storage Service
  • Introduction to Blob Storage Service
  • Blob Storage Architecture
  • Blob Storage Features
  • Types of Blobs
    • Block Blobs
    • Append Blobs
    • Page Blobs
  • Creating a Storage Account
  • Azure Storage Performance Tiers
    • Standard
    • Premium Performance
  • Understanding Data Replication
    • LRS (Locally Redundant Storage)
    • ZRS (Zone Redundant Storage)
    • GRS (Geo Redundant Storage)
  • Azure Storage Access Tiers
    • Hot
    • Cold
    • Archive
  • Working with Containers and Blobs
  • Soft Delete
  • Azure Storage Explorer
  • Access blobs securely
  • Access Key
  • Account Shared Access Token
  • Service Shared Access Token
  • Azure Maximum Scalability Or Limits

Module 6: Azure Data Lake Storage Services

  • Introduction to Azure Data Lake
  • What is Data Lake?
  • What is Azure Data Lake?
  • Data Lake Architecture?
  • Working with Azure Data Lake Storage Gen1
  • Features of Data Lake Storage Gen1
  • Understanding Azure Data Lake Gen2
  • Features of Data Lake Storage Gen2
  • Differences Between Gen1 & Gen2 Storage
  • Explore Data Lake Storage
  • Provisioning Data Lake Storage Gen1 Service
  • Provisioning Data Lake Storage Gen2 Service
  • Uploading Sample File
  • Using Azure Portal
  • Using Storage Explorer

Module 7: Introduction to Azure Databricks

  • Introduction to Databricks
  • Azure Databricks Architecture
  • Azure Databricks Main Concepts

Module 8: Creating an Azure Databricks Service

  • Creating a Databricks workspace in the Azure Portal
  • Databricks service using the Azure CLI (command-line interface)
  • Databricks service using Azure Resource Manager (ARM) templates
  • Adding users and groups to the workspace
  • Creating a cluster from the user interface (UI)
  • Getting started with notebooks and jobs in Azure Databricks
  • Authenticating to Databricks using a PAT

Module 9: Databricks Cluster Management

  • Creating and configuring clusters
  • Managing Clusters
    • Displaying clusters
    • Starting a cluster
    • Terminating a cluster
    • Delete a cluster
    • Cluster Information
    • Cluster logs
    • Cluster access control
  • Types of Clusters
    • All-purpose clusters
    • Job cluster
  • Databricks Pools
    • Databricks without pools
    • Databricks with Pools
  • Cluster Modes
    • Standard
    • High Concurrency
    • Single Node
  • Autoscaling
  • Databricks runtime versions
  • Multiuser Clusters

Module 10: Databricks notebook core functionalities

  • Creating and managing notebooks
  • Exporting notebooks
  • Importing notebooks
  • Attaching a notebook to a cluster
  • Spark environment variables
    • SparkContext(sc)
    • SQLContext/HiveContext(sqlContext)
    • SparkSession(spark)
  • Scheduling a notebook
  • Default Language
  • Notebook permissions
  • Folder permissions
  • Cloning notebook
  • Renaming notebook

Module 11: Databricks Utilities and Notebook Parameters

  • Dbutils commands on files, directories
  • Notebooks and libraries
  • Databricks Variables
  • Widget Types
  • Databricks notebook parameters

Module 12: Databricks CLI

  • Azure Databricks CLI Installation
  • Databricks CLI - DBFS, Libraries and Jobs

Module 13: Azure Databricks Integration with Azure Blob Storage

  • Creating Blob mount point
  • Writing data to the Blob Storage
  • Read data from Blob Storage

Module 14: Azure Databricks Integration with Azure Data Lake Storage Gen2

  • Reading files from Azure Data Lake Storage Gen2
  • Writing files to Azure Data Lake Storage Gen2

Module 15: Azure Databricks Integration with Azure Data Lake Storage Gen1

  • Reading Files from data lake storage Gen1

Module 16: Azure Databricks Integration with Azure Data Lake Storage Gen1

  • Reading Files from data lake storage Gen1

Module 17: Azure Databricks - CSV File Formats

  • Read CSV Files
  • Read TSV Files
  • Pipe-Separated CSV Files
  • Read CSV Files with multiple delimiters
  • Reading different position Multidelimiter CSV files
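
The delimiter variations listed above come down to telling the parser which separator to split on. The course covers this with Spark's CSV reader in Databricks; the sketch below uses only Python's standard `csv` module to show the same idea without a cluster. The sample text and the helpers `read_delimited` and `read_multi_delimited` are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical sample data; in the course these would be files in cloud storage.
tsv_text = "id\tname\n1\tAsha\n2\tRavi\n"
pipe_text = "id|name\n1|Asha\n2|Ravi\n"
multi_text = "id||name\n1||Asha\n2||Ravi\n"


def read_delimited(text, delimiter):
    """Parse text with a single-character delimiter into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_multi_delimited(text, delimiter):
    """Parse text whose delimiter is longer than one character.

    The csv module accepts only one-character delimiters, so this
    splits each line manually. It does not handle quoted fields.
    """
    lines = text.strip().splitlines()
    header = lines[0].split(delimiter)
    return [dict(zip(header, line.split(delimiter))) for line in lines[1:]]


tsv_rows = read_delimited(tsv_text, "\t")
pipe_rows = read_delimited(pipe_text, "|")
multi_rows = read_multi_delimited(multi_text, "||")

print(tsv_rows)
print(pipe_rows)
print(multi_rows)
```

All three inputs parse to the same two records; only the separator differs.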

Module 18: Parquet Files in Azure Databricks

  • Reading Parquet files from Data Lake Storage Gen2
  • Reading and Creating Partition files in Spark
  • Writing Parquet files to Data Lake Storage Gen2

Module 19: Azure Databricks - Parsing Complex JSON Files

  • Reading and Writing JSON Files
  • Handling Complex JSON files
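
"Complex" JSON usually means nested objects that have to be flattened into columns before analysis. The sketch below shows that idea in plain Python with the standard `json` module, not with the Spark functions used in the course. The `flatten` helper, the dotted-key naming, and the sample record are assumptions made for this example.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record.
raw = (
    '{"id": 1, '
    '"customer": {"name": "Asha", "address": {"city": "Hyderabad"}}, '
    '"tags": ["a", "b"]}'
)
record = flatten(json.loads(raw))
print(record)
```

Arrays such as `tags` are left untouched here; turning each array element into its own row is a separate step.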

Module 20: Reading and Writing ORC and Avro Files

  • Reading and Writing ORC Files
  • Reading and Writing Avro Files

Module 21: Databricks Integration with Azure SQL Database

  • Reading and Writing data from Azure SQL Database

Module 22: Databricks Integration with Azure Synapse

  • Reading and Writing Azure Synapse data from Azure Databricks

Module 23: Databricks Integration with Snowflake

  • Reading and Writing data from Snowflake

Module 24: Databricks Integration with CosmosDB SQL API

  • Reading and Writing data from Azure CosmosDB Account

Module 25: Introduction to Python

  • Features of Python
  • Python Virtual Machine (PVM)
  • Frozen Binaries
  • Memory management in Python
  • Garbage collection in Python

Module 26: Writing Our First Python Program

  • Writing our first Python program
  • Executing a Python program
  • Getting help in Python
  • Reopening the Python program in IDLE

Module 27: Datatypes In Python

  • Comments in Python
  • Docstrings
  • How Python sees variables
  • Datatypes in Python
  • Built-in datatypes
  • bool datatype
  • Sequences in Python
  • Sets
  • Literals in Python
  • Determining the datatype of a variable
  • Characters in Python
  • User-defined datatypes
  • Constants in Python
  • Identifiers and Reserved words
  • Naming conventions in Python

Module 28: Operators in Python

  • Arithmetic operators
  • Using Python interpreter as calculator
  • Assignment operators
  • Unary minus operator
  • Relational operators
  • Logical operators
  • Boolean operators
  • Membership operators
  • Identity operators
  • Operator precedence and associativity
  • Mathematical functions

Module 29: Input And Output

  • Output statements
  • Various formats of print()
  • Input statements
  • Command line arguments

Module 30: Control Statements

  • if statement
  • if … else statement
  • if … elif … else statement
  • for loop
  • Infinite loops
  • Nested loops
  • break statement
  • continue statement
  • pass statement
  • assert statement
  • return statement

Module 31: Strings and Characters

  • Creating strings
  • Length of a string
  • Indexing in strings
  • Repeating the strings
  • Concatenation of strings
  • Checking membership
  • Comparing strings
  • Removing spaces from a string
  • Finding sub strings
  • Strings are immutable
  • Replacing a string with another string
  • Splitting and joining strings
  • Changing case of a string
  • Checking starting and ending of a string
  • String testing methods
  • Formatting the strings
  • Sorting strings

Module 32: Lists

  • Creating lists using range() function
  • Updating the elements of a list
  • Concatenation of two lists
  • Repetition of lists
  • Membership in lists
  • Aliasing and cloning lists
  • Methods to process lists
  • Nested lists
  • List comprehensions
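
A short sketch of three of the list topics above: creating a list with `range()`, the difference between aliasing and cloning, and list comprehensions. The variable names are illustrative only.

```python
numbers = list(range(1, 6))   # create a list from range(): 1 through 5

alias = numbers               # alias: a second name for the same list object
clone = numbers[:]            # clone: a new list holding the same elements

numbers[0] = 100              # update an element through the original name

# The alias sees the change because it is the same object; the clone does not.
print(alias[0], clone[0])

# List comprehensions build a new list from an expression and an optional filter.
squares = [n * n for n in range(1, 6)]
evens = [n for n in range(1, 11) if n % 2 == 0]
print(squares, evens)
```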

Module 33: Tuples

  • Creating tuples
  • Accessing the tuple elements
  • Basic operations on tuples
  • Functions to process tuples
  • Nested tuples

Module 34: Dictionaries

  • Operations on dictionaries
  • Dictionary methods
  • Using for loop with dictionaries
  • Sorting the elements of a dictionary using lambdas
  • Converting lists into dictionary
  • Converting strings into dictionary
  • Ordered dictionaries
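
The sorting and conversion topics above can be sketched in a few lines. The sample names, marks, and the `key=value` string format are made up for the example.

```python
marks = {"Ravi": 72, "Asha": 91, "Kiran": 85}

# Sort the items by value, highest first, using a lambda as the sort key.
by_marks = sorted(marks.items(), key=lambda item: item[1], reverse=True)

# Convert two parallel lists into a dictionary.
names = ["id", "name", "city"]
values = [1, "Asha", "Hyderabad"]
row = dict(zip(names, values))

# Convert a "key=value,key=value" string into a dictionary.
text = "host=localhost,port=8080"
settings = dict(pair.split("=") for pair in text.split(","))

# Loop over a dictionary's key/value pairs.
for name, score in marks.items():
    print(name, score)
```

Note that values parsed from a string stay strings, so `settings["port"]` is `"8080"` rather than an integer.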

Module 35: Sets

  • Creation of set objects
  • Important functions of set
  • Mathematical Operations on set
  • Membership Operators (in, not in)
  • Set Comprehension
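
A minimal sketch of the set operations listed above, using two small example sets.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b            # elements in either set
intersection = a & b     # elements in both sets
difference = a - b       # elements in a but not in b
symmetric = a ^ b        # elements in exactly one of the two sets

has_three = 3 in a       # membership test
lacks_nine = 9 not in a

# Set comprehension: squares of the even numbers below 10.
even_squares = {n * n for n in range(10) if n % 2 == 0}

print(union, intersection, difference, symmetric)
```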

Module 36: Functions

  • Built in Functions
  • User Defined Functions
  • Parameters
  • Return Statement
  • Returning Multiple values from a function
  • Types of Arguments
  • Case study
  • Types of Variables
  • Global Keyword
  • Recursive Functions
  • Anonymous Functions
  • Normal Function
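
The sketch below touches several of the function topics above: returning multiple values, default and keyword arguments, recursion, and an anonymous (lambda) function next to a normal one. The function names are hypothetical examples.

```python
def min_max(values):
    """Return two values at once; Python packs them into a tuple."""
    return min(values), max(values)


def greet(name, greeting="Hello"):
    """`greeting` is a default argument and can be passed by keyword."""
    return f"{greeting}, {name}!"


def factorial(n):
    """Recursive function: n! = n * (n - 1)!, with 0! = 1."""
    return 1 if n == 0 else n * factorial(n - 1)


# Anonymous function bound to a name, equivalent to a one-line def.
square = lambda x: x * x

low, high = min_max([4, 9, 1, 7])                  # tuple unpacking
default_greeting = greet("Asha")                    # positional argument
keyword_greeting = greet(name="Ravi", greeting="Welcome")  # keyword arguments

print(low, high, default_greeting, keyword_greeting)
```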

Module 37: Modules

  • Renaming a Module at time of import
  • from…import
  • Various Possibilities of import
  • Member Aliasing
  • Reloading a Module
  • Finding members of module by using dir()
  • The special variable __name__
  • Working with math module
  • Working with random module

Module 38: PySpark Introduction

  • Pyspark Introduction
  • Pyspark Components and Features

Module 39: Spark Architecture and Internals

  • Apache Spark Internal architecture
  • Jobs, stages, and tasks
  • Spark Cluster Architecture Explained

Module 40: Spark RDD

  • Different Ways to create RDD in Databricks
  • Spark Lazy Evaluation Internals & Word Count Program
  • RDD Transformations in Databricks & coalesce vs repartition
  • RDD Transformation and Use Cases
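
Lazy evaluation means that transformations only describe work, and nothing runs until an action asks for a result. The sketch below is an analogy in plain Python using a generator, not Spark code: the `split_words` helper, the `log` list, and the sample lines are all made up to show when the work actually happens in a word count.

```python
from collections import Counter

lines = ["spark is fast", "spark is lazy"]
log = []  # records each time a line is actually processed


def split_words(rows):
    """Generator that yields words one at a time and logs each processed line."""
    for row in rows:
        log.append(f"split: {row}")
        yield from row.split()


# "Transformation": this only builds a generator; no line has been read yet.
words = split_words(lines)
steps_before_action = len(log)

# "Action": consuming the generator is what triggers the work.
word_counts = Counter(words)
steps_after_action = len(log)

print(steps_before_action, steps_after_action)
print(word_counts.most_common(2))
```

In Spark the same pattern appears at cluster scale: transformations build up a plan, and an action such as collecting or counting causes it to run.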

Module 41: Spark SQL

  • Spark SQL Introduction
  • Different ways to create DataFrames

Module 42: Spark SQL Internals

  • Catalyst Optimizer and Spark SQL Execution Plan
  • Deep dive on SparkSession vs SparkContext
  • Spark SQL Basics Part 1
  • RDD Transformation and Use Cases

Train your teams on the theory and enable technical mastery of the cloud computing topics essential to the enterprise, such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.
