• Welcome to CloudMonks
  • +91 96660 64406 | +91 9849668223
  • info@thecloudmonks.com

Azure Databricks Online Training

About The Course

This training in Hyderabad covers Azure Databricks, an easy, fast, and collaborative Apache Spark-based analytics platform. It accelerates innovation by bringing data science, data engineering, and business together, making the process of data analytics more productive, more secure, more scalable, and optimized for Azure.

The Databricks cloud service is built by the team that started the Spark research project at UC Berkeley, which later became Apache Spark, and it is the leading Spark-based analytics platform. The service, named Microsoft Azure Databricks, provides data science and data engineering teams with a fast, easy, and collaborative Spark-based platform on Azure. It gives Azure users a single platform for big data processing and machine learning.

Azure Databricks is a “first party” Microsoft service, the result of a unique year-long collaboration between the Microsoft and Databricks teams to provide Databricks’ Apache Spark-based analytics service as an integral part of the Microsoft Azure platform.

Azure Databricks leverages Azure’s security and seamlessly integrates with Azure services such as Azure Active Directory, SQL Data Warehouse, and Power BI.

  • Databricks + Apache Spark + enterprise cloud = Azure Databricks
  • It is a fully managed version of the open-source Apache Spark analytics engine, and it features optimized connectors to storage platforms for the quickest possible data access.
  • It offers a notebook-oriented Apache Spark as-a-service workspace environment, which makes it easy to explore data interactively and manage clusters.
  • It is a secure cloud-based machine learning and big data platform.
  • It supports multiple languages such as Scala, Python, R, Java, and SQL.

Module 1: Cloud Computing Concepts

  • What is the "Cloud"?
  • Why cloud services
  • Types of cloud models
    • Deployment Models
      • Private Cloud
      • Public Cloud
      • Hybrid Cloud
  • Types of cloud services
    • Infrastructure as a Service (IaaS)
    • Platform as a Service (PaaS)
    • Software as a Service (SaaS)

Module 2: Big Data Introduction

  • What is Big Data?
  • Big Data Sources
  • Data vs Information
  • Characteristics of Big Data
    • Variety
    • Velocity
    • Volume
    • Veracity
    • Value
  • Types of Big Data
    • Structured Data
    • Unstructured Data
    • Semi-Structured Data

Module 3: Azure SQL Database

  • Introduction to Azure SQL Database
  • Comparing Single Database and Managed Instance
  • Creating and Using SQL Server
  • Creating SQL Database Services
  • Azure SQL Database Tools
  • Migrating an on-premises database to SQL Azure
  • Purchasing Models
  • DTU service tiers
  • vCore-based Model
  • Serverless compute tier
  • Service Tiers
    • General purpose / Standard
    • Business Critical / Premium
    • Hyperscale

Module 4: Azure Storage Service

  • Azure Storage Account
  • Features of Azure Storage Service
  • Introduction to Blob Storage Service
  • Blob Storage Architecture
  • Blob Storage Features
  • Types of Blobs
    • Block Blobs
    • Append Blobs
    • Page Blobs
  • Creating a Storage Account
  • Azure Storage Performance Tiers
    • Standard
    • Premium Performance
  • Understanding Data Replication
    • LRS (Locally Redundant Storage)
    • ZRS (Zone Redundant Storage)
    • GRS (Geo Redundant Storage)
  • Azure Storage Access Tiers
    • Hot
    • Cold
    • Archive
  • Working with Containers and Blobs
  • Soft Delete
  • Azure Storage Explorer
  • Access blobs securely
  • Access Key
  • Account Shared Access Token
  • Service Shared Access Token
  • Azure Maximum Scalability Or Limits

Module 5: Azure Data Lake Storage Services

  • Introduction to Azure Data Lake
  • What is Data Lake?
  • What is Azure Data Lake?
  • Data Lake Architecture
  • Working with Azure Data Lake Storage Gen1
  • Features of Data Lake Storage Gen1
  • Understanding Azure Data Lake Gen2
  • Features of Data Lake Storage Gen2
  • Differences Between Gen1 & Gen2 Storage
  • Explore Data Lake Storage
  • Provisioning Data Lake Storage Gen1 Service
  • Provisioning Data Lake Storage Gen2 Service
  • Uploading Sample File
  • Using Azure Portal
  • Using Storage Explorer

Module 6: Introduction to Azure Databricks

  • Introduction to Databricks
  • Azure Databricks Architecture
  • Azure Databricks Main Concepts

Module 7: Creating an Azure Databricks Service

  • Creating a Databricks workspace in the Azure Portal
  • Creating a Databricks service using the Azure CLI (command-line interface)
  • Creating a Databricks service using Azure Resource Manager (ARM) templates
  • Adding users and groups to the workspace
  • Creating a cluster from the user interface (UI)
  • Getting started with notebooks and jobs in Azure Databricks
  • Authenticating to Databricks using a personal access token (PAT)

Module 8: Databricks Cluster Management

  • Creating and configuring clusters
  • Managing Clusters
    • Displaying clusters
    • Starting a cluster
    • Terminating a cluster
    • Deleting a cluster
    • Cluster Information
    • Cluster logs
    • Cluster access control
  • Types of Clusters
    • All-purpose clusters
    • Job clusters
  • Databricks Pools
    • Databricks without pools
    • Databricks with Pools
  • Cluster Modes
    • Standard
    • High Concurrency
    • Single Node
  • Autoscaling
  • Databricks runtime versions
  • Multiuser Clusters

Module 9: Databricks Notebook Core Functionalities

  • Creating and managing notebooks
  • Exporting notebooks
  • Importing notebooks
  • Attaching a notebook to a cluster
  • Spark environment variables
    • SparkContext(sc)
    • SQLContext/HiveContext(sqlContext)
    • SparkSession(spark)
  • Scheduling a notebook
  • Default Language
  • Notebook permissions
  • Folder permissions
  • Cloning notebook
  • Renaming notebook

Module 10: Databricks Utilities and Notebook Parameters

  • dbutils commands on files and directories
  • Notebooks and libraries
  • Databricks Variables
  • Widget Types
  • Databricks notebook parameters

Module 11: Azure Databricks Integration with Azure Blob Storage

  • Mount Azure Blob Storage To DBFS
  • Access Blob Storage Using Direct Connection (Account Access Key)
  • Access Blob Storage Using SAS Token
  • Writing Data To Azure Blob Storage

Module 12: Azure Databricks Integration with Azure Data Lake Storage Gen2

  • Mount ADLS Gen2 To DBFS Using OAuth2.0 With Service Principal
  • Access ADLS Gen2 Using Direct Connection
  • Reading files from Azure Data Lake Storage Gen2
  • Writing files to Azure Data Lake Storage Gen2

Module 13: Azure Databricks Integration with Azure Data Lake Storage Gen1

  • Mount ADLS Gen1 To DBFS Using OAuth2.0 With Service Principal
  • Access ADLS Gen1 Using Direct Connection
  • Reading Files from Data Lake Storage Gen1
  • Writing Files to Data Lake Storage Gen1

Module 14: Databricks Integration with Azure SQL Database

  • Reading and Writing data from Azure SQL Database

Module 15: Databricks Integration with Azure Synapse

  • Reading and Writing Azure Synapse data from Azure Databricks

Module 16: Databricks Integration with Snowflake

  • Reading and Writing data from Snowflake

Module 17: Databricks Integration with CosmosDB SQL API

  • Reading and Writing data from Azure CosmosDB Account

Module 18: Azure Databricks - CSV File Formats

  • Read CSV Files
  • Read TSV Files
  • Read Pipe-Separated CSV Files
  • Read CSV Files with Multiple Delimiters
  • Read CSV Files with Multiple Delimiters at Different Positions

Module 19: Azure Databricks - Parquet Files

  • Reading Parquet files from Data Lake Storage Gen2
  • Reading and Creating Partition files in Spark
  • Writing Parquet files to Data Lake Storage Gen2

Module 20: Azure Databricks - Read and Write Excel Files

  • Installing libraries on the cluster
  • Reading and Writing Excel files

Module 21: Azure Databricks - XML Files

  • Reading and writing XML Files

Module 22: Azure Databricks - Parsing Complex JSON Files

  • Reading and Writing JSON Files
  • Handling Complex JSON files

Module 23: Reading and Writing ORC and Avro Files

  • Reading and Writing ORC Files
  • Reading and Writing Avro Files

Module 24: Big Data Analytics

  • Understanding the collect() method
  • Understanding the use of inferSchema
  • Learning to differentiate CSV and Parquet
  • Learning to differentiate Pandas and Koalas
  • Understanding built-in Spark functions
  • Learning column predicate pushdown
  • Learning partitioning strategies in Spark
  • Understanding Spark SQL optimizations
  • Understanding bucketing in Spark

Module 25: Working with batch processing in Databricks

  • Reading data
  • Checking row count
  • Selecting columns
  • Filtering data
  • Dropping columns
  • Adding or replacing columns
  • Printing schema
  • Renaming a column
  • Dropping duplicate rows
  • Limiting output rows
  • Sorting rows
  • Grouping data
  • Visualizing data
  • Writing data to a sink

Module 26: Structured Streaming in Azure Databricks

  • Structured Streaming concepts
  • Managing streams
  • Sorting data
  • Productionizing Structured Streaming

Module 27: Abstracting Data with DataFrames

  • Creating DataFrames
  • Accessing underlying RDDs
  • Performance optimization
  • Inferring the schema using reflection
  • Specifying the schema programmatically
  • Creating a temporary table
  • Using SQL to interact with DataFrames
  • Overview of DataFrame transformations
  • Overview of DataFrame actions

Module 28: PySpark Introduction

  • PySpark Introduction
  • PySpark Components and Features

Module 29: Spark Architecture and Internals

  • Apache Spark Internal architecture
  • Jobs, stages, and tasks
  • Spark Cluster Architecture Explained

Module 30: Spark RDD

  • Different Ways to create RDD in Databricks
  • Spark Lazy Evaluation Internals & Word Count Program
  • RDD Transformations in Databricks & coalesce vs repartition
  • RDD Transformation and Use Cases

Module 31: Spark SQL

  • Spark SQL Introduction
  • Different ways to create DataFrames

Module 32: Introduction to Python

  • Features of Python
  • Python Virtual Machine (PVM)
  • Frozen Binaries
  • Memory management in Python
  • Garbage collection in Python

Module 33: Writing Our First Python Program

  • Writing our first Python program
  • Executing a Python program
  • Getting help in Python
  • Reopening the Python program in IDLE

Module 34: Datatypes In Python

  • Comments in Python
  • Docstrings
  • How Python sees variables
  • Datatypes in Python
  • Built-in datatypes
  • bool datatype
  • Sequences in Python
  • Sets
  • Literals in Python
  • Determining the datatype of a variable
  • Characters in Python
  • User-defined datatypes
  • Constants in Python
  • Identifiers and Reserved words
  • Naming conventions in Python

Module 35: Operators in Python

  • Arithmetic operators
  • Using Python interpreter as calculator
  • Assignment operators
  • Unary minus operator
  • Relational operators
  • Logical operators
  • Boolean operators
  • Membership operators
  • Identity operators
  • Operator precedence and associativity
  • Mathematical functions

Module 36: Input And Output

  • Output statements
  • Various formats of print()
  • Input statements
  • Command line arguments

Module 37: Control Statements

  • if statement
  • if … else statement
  • if … elif … else statement
  • for loop
  • Infinite loops
  • Nested loops
  • break statement
  • continue statement
  • pass statement
  • assert statement
  • return statement

Module 38: Strings and Characters

  • Creating strings
  • Length of a string
  • Indexing in strings
  • Repeating the strings
  • Concatenation of strings
  • Checking membership
  • Comparing strings
  • Removing spaces from a string
  • Finding sub strings
  • Strings are immutable
  • Replacing a string with another string
  • Splitting and joining strings
  • Changing case of a string
  • Checking starting and ending of a string
  • String testing methods
  • Formatting the strings
  • Sorting strings
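
Several of the string topics above can be tried out in a few lines of plain Python. This is a minimal sketch only; the sample text and the variable names are illustrative assumptions and are not taken from the course material.

```python
# Illustrative sample text (an assumption, not from the course material).
text = "  Azure Databricks  "

stripped = text.strip()                        # removing spaces from both ends
words = stripped.split(" ")                    # splitting into a list of words
joined = "-".join(words)                       # joining with a separator
upper = stripped.upper()                       # changing case
starts = stripped.startswith("Azure")          # checking the start of a string
position = stripped.find("Data")               # finding a substring; -1 if absent
replaced = stripped.replace("Azure", "Cloud")  # returns a new string

# Strings are immutable: every method above returns a new string
# and leaves the original value of `text` unchanged.
print(stripped, words, joined, upper, starts, position, replaced, sep="\n")
```

Because strings are immutable, `text` still holds its leading and trailing spaces after all of these calls.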

Module 39: Lists

  • Creating lists using range() function
  • Updating the elements of a list
  • Concatenation of two lists
  • Repetition of lists
  • Membership in lists
  • Aliasing and cloning lists
  • Methods to process lists
  • Nested lists
  • List comprehensions
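
The difference between aliasing and cloning a list is easiest to see in a small experiment. The sketch below is an illustration under assumed sample values, not an excerpt from the course; it also uses `range()` with a list comprehension.

```python
import copy

squares = [n * n for n in range(1, 6)]   # list comprehension over range()

alias = squares       # aliasing: both names refer to the same list object
clone = squares[:]    # cloning: a new list with the same elements (shallow copy)

squares[0] = 100      # the update is visible through the alias, not the clone

nested = [[1, 2], [3, 4]]
deep = copy.deepcopy(nested)   # nested lists need a deep copy to be independent
nested[0][0] = 99

print(alias[0], clone[0], deep[0][0])
```

A slice copy is enough for flat lists; for nested lists the inner lists are still shared unless `copy.deepcopy` is used.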

Module 40: Tuples

  • Creating tuples
  • Accessing the tuple elements
  • Basic operations on tuples
  • Functions to process tuples
  • Nested tuples

Module 41: Dictionaries

  • Operations on dictionaries
  • Dictionary methods
  • Using for loop with dictionaries
  • Sorting the elements of a dictionary using lambdas
  • Converting lists into dictionary
  • Converting strings into dictionary
  • Ordered dictionaries
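
As a rough illustration of the conversion and sorting topics above, the following sketch builds dictionaries from lists and from a string, then sorts by value with a lambda. The names, scores, and the `key=value` string format are assumptions made for the example.

```python
# Hypothetical sample data for illustration only.
names = ["spark", "python", "azure"]
scores = [82, 95, 78]

# Converting two lists into a dictionary.
score_by_name = dict(zip(names, scores))

# Sorting the items by value, highest first, with a lambda as the sort key.
by_score = sorted(score_by_name.items(), key=lambda item: item[1], reverse=True)

# Converting a "key=value,key=value" string into a dictionary.
settings = dict(pair.split("=") for pair in "mode=batch,format=csv".split(","))

# Using a for loop with a dictionary.
for name, score in score_by_name.items():
    print(name, score)
```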

Module 42: Sets

  • Creation of set objects
  • Important functions of set
  • Mathematical Operations on set
  • Membership Operators (in, not in)
  • Set Comprehension
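
The mathematical set operations listed above map directly onto Python operators. A minimal sketch, using made-up learner names as sample data:

```python
# Hypothetical sample data for illustration only.
python_learners = {"asha", "ravi", "meena"}
spark_learners = {"ravi", "meena", "john"}

both = python_learners & spark_learners          # intersection
either = python_learners | spark_learners        # union
only_python = python_learners - spark_learners   # difference
exactly_one = python_learners ^ spark_learners   # symmetric difference

is_member = "john" in python_learners            # membership test

name_lengths = {len(name) for name in either}    # set comprehension

print(both, either, only_python, exactly_one, is_member, name_lengths)
```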

Module 43: Functions

  • Built-in Functions
  • User Defined Functions
  • Parameters
  • Return Statement
  • Returning Multiple values from a function
  • Types of Arguments
  • Case study
  • Types of Variables
  • Global Keyword
  • Recursive Functions
  • Anonymous Functions
  • Normal Function
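
A few of the function topics above (returning multiple values, recursion, anonymous functions, and the `global` keyword) are sketched below. The function names and inputs are hypothetical and chosen only for illustration.

```python
def min_max(values):
    """Return the smallest and largest value as a tuple (multiple return values)."""
    return min(values), max(values)


def factorial(n):
    """Recursive factorial for non-negative integers."""
    return 1 if n <= 1 else n * factorial(n - 1)


# Anonymous (lambda) function bound to a name.
square = lambda x: x * x

counter = 0


def increment():
    """Modify the module-level variable by declaring it global."""
    global counter
    counter += 1
    return counter


low, high = min_max([4, 9, 1])   # tuple unpacking of the returned values
print(low, high, factorial(5), square(7))
```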

Module 44: Modules

  • Renaming a Module at time of import
  • from…import
  • Various Possibilities of import
  • Member Aliasing
  • Reloading a Module
  • Finding members of module by using dir()
  • The special variable __name__
  • Working with math module
  • Working with random module
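
The import variations above can be combined in one short script. This is a sketch under assumptions: the helper `circle_area` and the fixed seed are illustrative choices, not part of the course outline.

```python
import math as m               # renaming a module at import time
from random import Random      # from ... import


def circle_area(radius):
    """Area of a circle, using the math module through its alias."""
    return m.pi * radius ** 2


rng = Random(42)               # seeded generator so repeated runs behave the same
roll = rng.randint(1, 6)       # random integer between 1 and 6, inclusive

# Members of a module can be listed with dir().
sqrt_like = [name for name in dir(m) if name.startswith("sq")]

# __name__ equals "__main__" only when the file is run directly,
# not when it is imported by another module.
if __name__ == "__main__":
    print(circle_area(1.0), roll, sqrt_like)
```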

Module 45: Spark SQL Internals

  • Catalyst Optimizer and Spark SQL Execution Plan
  • Deep dive on SparkSession vs SparkContext
  • Spark SQL Basics Part 1
  • RDD Transformation and Use Cases

Train your teams on the theory and enable technical mastery of the cloud computing topics essential to the enterprise, such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

Address

Nilagiri Block, Flat No 602, Aditya Enclave, Ameerpet, Hyderabad, Telangana 500038