Big Data Hadoop Developer Certification Overview

This Big Data Hadoop Developer Certification training program provides online training on the popular skills required for a successful career in data engineering. Master the Hadoop Big Data framework, leverage the functionality of Apache Spark with Python, simplify data pipelines with Apache Kafka, and use the database management tool HBase to store data.

Big Data Hadoop Developer Certification Key Features

  • Dedicated mentoring sessions from industry experts
  • Lifetime access to self-paced learning
  • Includes three assignment-based exams to test Hadoop development skills
  • Includes real industry-based projects
  • 60 hours of blended learning

Skills Covered

  • Developing Applications Using Apache Spark
  • Hive, HBase and Spark
  • MapReduce processing framework

Training Options

Lifetime access to self-paced learning

₹ 3,750


Big Data Hadoop Developer Certification Curriculum

Tuning Spark

Data Serialization

Memory Tuning

Determining Memory Consumption

Tuning Data Structures

Serialized RDD Storage

Garbage Collection Tuning

Other Considerations

Level of Parallelism

Memory Usage of Reduce Tasks

Broadcasting Large Variables

Data Locality

Summary
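To make the tuning topics above concrete, here is a minimal PySpark sketch, with illustrative values, that enables Kryo serialization, sets a default level of parallelism, broadcasts a lookup table instead of shipping it with every task, and persists an RDD:

```python
from pyspark import SparkConf, SparkContext, StorageLevel

# Kryo serialization (see Data Serialization) is faster and more compact
# than the default Java serializer for shuffled and cached data.
conf = (SparkConf()
        .setAppName("tuning-demo")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        # Level of Parallelism: default number of partitions for shuffles.
        .set("spark.default.parallelism", "8"))
sc = SparkContext(conf=conf)

# Broadcasting Large Variables: ship a read-only lookup table to every
# executor once instead of once per task.
lookup = sc.broadcast({"IN": "India", "US": "United States"})

rdd = sc.parallelize([("IN", 1), ("US", 2), ("IN", 3)])
named = rdd.map(lambda kv: (lookup.value.get(kv[0], "unknown"), kv[1]))

# Persisting the RDD avoids recomputing it; memory consumption of cached
# data is what the Memory Tuning topics above are about.
named.persist(StorageLevel.MEMORY_AND_DISK)
print(named.reduceByKey(lambda a, b: a + b).collect())
```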

Real Time Stream Processing with Spark Streaming

Micro batch

Discretized Streams (DStreams)

Input DStreams and Receivers

DStream to RDD

Basic Sources

Advanced Sources

Transformations on DStreams

Output Operations on DStreams

Design Patterns for using foreachRDD

DataFrame and SQL Operations

Checkpointing

Socket stream

File Stream

Stateful operations

How do stateful operations work?

Window Operations

Join Operations

How is late-arriving data processed?

What is a watermark?
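Watermarks and late data are handled most directly in Structured Streaming (Spark 2.x and later) rather than in the DStream API above, so the following is a hedged sketch in that newer API; the socket source, host and port are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("late-data-demo").getOrCreate()

# Socket source producing text lines; includeTimestamp adds an event-time
# "timestamp" column alongside each "value".
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999)
         .option("includeTimestamp", "true")
         .load())

# The watermark tells the engine how long to wait for late-arriving rows:
# anything older than (max event time seen - 10 minutes) is dropped and the
# corresponding window state can be released.
counts = (lines
          .withWatermark("timestamp", "10 minutes")
          .groupBy(window(col("timestamp"), "5 minutes"), col("value"))
          .count())

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```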

Hands On (see the PySpark sketch after this list):

• Writing code for creating SparkContext, HiveContext and HBaseContext objects

• Writing code for running Hive queries using Spark SQL

• Writing code for loading and transforming text file data and converting it into a DataFrame

• Writing code for reading and storing JSON files as DataFrames inside Spark code

• Writing code for reading and storing Parquet files as DataFrames

• Reading and writing data into an RDBMS (MySQL, for example) using Spark SQL

• Caching DataFrames

• Writing code for processing Flume data using Spark Streaming

• Writing code for processing network data using Spark Streaming

• Writing code for processing Kafka data using Spark Streaming

• Writing code for storing the output into HDFS using Spark Streaming
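A minimal sketch covering several of these tasks, using the Spark 1.x SQLContext/HiveContext entry points that this curriculum follows; all paths, table names and MySQL credentials are illustrative, and the HBaseContext item is deferred to the HBase module below:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hands-on-demo")
sqlContext = HiveContext(sc)   # a HiveContext is also a full SQLContext

# Running a Hive query through Spark SQL (table name is illustrative).
sqlContext.sql("SELECT count(*) FROM default.orders").show()

# Loading and transforming a text file, then converting it to a DataFrame.
lines = sc.textFile("/data/people.txt")
rows = lines.map(lambda l: l.split(",")).map(lambda p: (p[0], int(p[1])))
people = sqlContext.createDataFrame(rows, ["name", "age"])

# Reading and writing JSON and Parquet as DataFrames.
json_df = sqlContext.read.json("/data/events.json")
people.write.mode("overwrite").parquet("/data/people.parquet")

# Reading from and writing to an RDBMS (MySQL here) over JDBC.
jdbc_url = "jdbc:mysql://dbhost:3306/shop"
props = {"user": "app", "password": "secret"}
orders = sqlContext.read.jdbc(jdbc_url, "orders", properties=props)
orders.write.jdbc(jdbc_url, "orders_copy", mode="append", properties=props)

# Caching a DataFrame that will be reused.
people.cache()
```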

Running SQL queries using Spark SQL

Starting Point: SQLContext

Creating DataFrames

DataFrame Operations

Running SQL Queries Programmatically

Interoperating with RDDs

Inferring the Schema Using Reflection

Programmatically Specifying the Schema
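A short sketch of the two ways of interoperating with RDDs listed above, in Spark 1.x style (newer versions replace SQLContext and registerTempTable with SparkSession and createOrReplaceTempView); the file layout is assumed to be comma-separated name,age pairs:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

sc = SparkContext(appName="schema-demo")
sqlContext = SQLContext(sc)

rdd = sc.textFile("/data/people.txt").map(lambda l: l.split(","))

# Inferring the Schema Using Reflection: build Row objects and let Spark
# derive column names and types from them.
people = sqlContext.createDataFrame(
    rdd.map(lambda p: Row(name=p[0], age=int(p[1]))))

# Programmatically Specifying the Schema: describe the columns explicitly
# with a StructType when they are only known at runtime.
schema = StructType([StructField("name", StringType(), True),
                     StructField("age", IntegerType(), True)])
people2 = sqlContext.createDataFrame(rdd.map(lambda p: (p[0], int(p[1]))),
                                     schema)

# Running SQL Queries Programmatically against the registered table.
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 20").show()
```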

Data Sources

Generic Load/Save Functions

Save Modes

Saving to Persistent Tables

Parquet Files

Loading Data Programmatically

Partition Discovery

Schema Merging

JSON Datasets

Hive Tables

JDBC To Other Databases

Troubleshooting

Performance Tuning

Caching Data In Memory

Compatibility with Apache Hive

Unsupported Hive Functionality
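Tying together several of the Data Sources topics above, here is a sketch, with illustrative paths and an assumed "year" column in the input, of generic load/save, save modes, partition discovery and schema merging:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="datasources-demo")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("/data/events.json")

# Generic Load/Save Functions: pick the format explicitly; the Save Modes
# are "error" (default), "append", "overwrite" and "ignore".
df.write.format("parquet").mode("overwrite").save("/data/events.parquet")
generic = sqlContext.read.format("parquet").load("/data/events.parquet")

# Partition Discovery: partitionBy lays the data out as .../year=2024/...
# directories that Spark later turns back into a "year" column on read.
df.write.partitionBy("year").mode("overwrite").parquet("/data/by_year")

# Schema Merging: reconcile Parquet files written with evolving schemas.
merged = sqlContext.read.option("mergeSchema", "true").parquet("/data/by_year")
merged.printSchema()
```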

Introduction to Spark

Evolution of distributed systems

Why do we need a new generation of distributed systems?

Limitations of MapReduce in Hadoop

Understanding the need for Batch vs. Real-Time Analytics

Batch Analytics: Hadoop Ecosystem Overview and Real-Time Analytics Options


Introduction to stream and in memory analysis

What is Spark?

A Brief History: Spark

Using PySpark for Creating Spark Applications

Invoking the PySpark Shell

Creating the SparkContext

Loading a File in Shell

Performing Some Basic Operations on Files in Spark Shell
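Inside the PySpark shell the SparkContext already exists as `sc`, so the basic operations above reduce to a few lines; the file path and the ERROR filter are illustrative:

```python
# Inside the PySpark shell, `sc` (the SparkContext) is created for you.
lines = sc.textFile("/data/sample.txt")          # Loading a File in Shell

# Basic operations: transformations are lazy, actions trigger execution.
lines.count()                                    # number of lines (action)
lines.first()                                    # first line (action)
errors = lines.filter(lambda l: "ERROR" in l)    # transformation (lazy)
errors.take(5)                                   # first five matches (action)
```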

Building a Spark Project with Maven

Distributed Persistence

Spark Streaming Overview

Example: Streaming Word Count

Shared Variables: Broadcast Variables

Shared Variables: Accumulators
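The streaming word count example, extended with the two shared variable types listed above; the host, port and stop-word list are illustrative:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-wordcount")
ssc = StreamingContext(sc, 10)   # 10-second micro-batches

# Broadcast variable: read-only stop-word set shared with all executors.
stop_words = sc.broadcast({"a", "an", "the"})
# Accumulator: a counter tasks add to and only the driver reads.
total_words = sc.accumulator(0)

lines = ssc.socketTextStream("localhost", 9999)
words = lines.flatMap(lambda line: line.split(" "))

def count_and_filter(word):
    total_words.add(1)
    return word.lower() not in stop_words.value

counts = (words.filter(count_and_filter)
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```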

Understanding HBase

HBase Data Model

HBase Shell

HBase Client API

Data Loading Techniques

ZooKeeper Data Model

ZooKeeper Service

Demos on Bulk Loading

Getting and Inserting Data

Filters in HBase

• Creating HBase tables and column families (sketched below)

• Writing code for loading data into an HBase table

• Writing code for performing CRUD operations in HBase

• Creating an external table in Hive for integrating HBase with Hive

• Writing code for loading data into HBase using MapReduce
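A sketch of these HBase operations using happybase, a third-party Python client that talks to HBase's Thrift gateway (which must be running; the Java client API covered in class offers the same operations); the table and column names are illustrative:

```python
import happybase

# Connect to the HBase Thrift gateway (default port 9090).
conn = happybase.Connection("localhost")

# Create a table with one column family, mirroring `create 'users', 'cf'`
# in the HBase shell.
conn.create_table("users", {"cf": dict()})
table = conn.table("users")

# CRUD: put, get (row), scan with a row-key prefix, delete.
table.put(b"row1", {b"cf:name": b"alice", b"cf:city": b"Pune"})
print(table.row(b"row1"))
for key, data in table.scan(row_prefix=b"row"):
    print(key, data)
table.delete(b"row1")
```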

Data Ingestion Tools

Here we will learn about the different data loading options available in Hadoop, then look at Flume and Sqoop in detail to demonstrate how to bring various kinds of data, such as web server logs, stream data, RDBMS tables and Twitter tweets, into HDFS.

Data warehousing using Hive

Hive Background

Hive Use Case

About Hive

Hive vs. Pig

Hive Architecture and Components

Metastore in Hive

Limitations of Hive

Comparison with Traditional Database

Hive Data Types and Data Models

Static and Dynamic Partitions

Buckets

Hive Tables (Managed Tables and External Tables)

Importing Data, Querying Data, Managing Outputs

Hive Script

Hive UDF

Hive Demo on Healthcare and Airline Data Sets
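A sketch of an external, partitioned Hive table driven from the Spark 1.x HiveContext used elsewhere in this curriculum; the table definition, paths and partition value are illustrative:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-demo")
hive = HiveContext(sc)

# External table: Hive owns only the metadata; dropping the table leaves
# the files under /data/airlines untouched.
hive.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS airlines (
        flight_id STRING, carrier STRING, delay INT)
    PARTITIONED BY (flight_year INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/airlines'
""")

# Static partition load: the partition value is spelled out explicitly.
hive.sql("""
    LOAD DATA INPATH '/staging/airlines_2015.csv'
    INTO TABLE airlines PARTITION (flight_year = 2015)
""")

hive.sql("SELECT carrier, avg(delay) FROM airlines GROUP BY carrier").show()
```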

Programming MapReduce Jobs

What MapReduce is and why it is popular

• The Big Picture of the MapReduce

• MapReduce process and terminology

• MapReduce components failures and recoveries

• Working with MapReduce

• Lab: Working with MapReduce


• Java MapReduce implementation

• Map() and Reduce() methods

• Java MapReduce calling code

• Lab: Programming Word Count
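The lab codes Word Count against the Java MapReduce API; purely to keep these notes in one language, the same map and reduce logic is sketched below as a Hadoop Streaming job in Python:

```python
#!/usr/bin/env python
# mapper.py - emit "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word.lower())
```

```python
#!/usr/bin/env python
# reducer.py - stdin arrives sorted by key, so counts can be summed per run.
import sys

current, count = None, 0
for line in sys.stdin:
    word, _, n = line.rstrip("\n").partition("\t")
    if word != current:
        if current is not None:
            print("%s\t%d" % (current, count))
        current, count = word, 0
    count += int(n)
if current is not None:
    print("%s\t%d" % (current, count))
```

A typical launch (the streaming jar path varies by distribution) looks like: hadoop jar .../hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /in -output /out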


Input/Output Formats and Conversion Between Different Formats

Default Input and Output formats

• Sequence File structure

• Sequence File Input and Output formats

• Sequence File access via Java API and HDFS

• MapFile

• Input Format Classes

• Format Conversion
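PySpark can read and write SequenceFiles directly, which makes for a compact sketch of the format-conversion idea above; the paths are illustrative:

```python
from pyspark import SparkContext

sc = SparkContext(appName="seqfile-demo")

# Write (key, value) pairs as a SequenceFile; PySpark maps the Python
# types to Writable types (here Text/IntWritable) automatically.
pairs = sc.parallelize([("apple", 3), ("banana", 5)])
pairs.saveAsSequenceFile("/data/fruit_seq")

# Read it back; this is also a format-conversion point: the same pairs
# can now be re-saved as plain text.
back = sc.sequenceFile("/data/fruit_seq")
back.saveAsTextFile("/data/fruit_text")
print(back.collect())
```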

MapReduce Features

• Joining Data Sets in MapReduce Jobs

• How to write a Map-Side Join

• How to write a Reduce-Side Join

• MapReduce Counters

• Built-in and user-defined counters

• Retrieving MapReduce counters

• Lab: Map-Side Join
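The Map-Side Join lab uses the Java API; as a Python stand-in, here is a Hadoop Streaming mapper that loads the small table (shipped to every task with -files) into memory and joins with no reduce phase (run with -numReduceTasks 0); the file names and record layouts are illustrative:

```python
#!/usr/bin/env python
# join_mapper.py - map-side join: the small table (countries.txt, shipped
# to every task with -files) is held in memory, so no reduce phase is
# needed to join it against the large input.
import sys

# Small side: lines like "IN,India" -> {"IN": "India"}.
countries = {}
with open("countries.txt") as f:
    for line in f:
        code, _, name = line.rstrip("\n").partition(",")
        countries[code] = name

# Large side streams through stdin: "user42,IN" -> "user42<TAB>India".
for line in sys.stdin:
    user, _, code = line.rstrip("\n").partition(",")
    print("%s\t%s" % (user, countries.get(code, "unknown")))
```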

What is the YARN framework?

What is YARN and what are its different components?

Who is the master of YARN?

What does the Resource Manager do?

What are the roles and responsibilities of the Application Master and Node Manager?

Working with HDFS

Ways of accessing data in HDFS

• Common HDFS operations and commands

• Different HDFS commands

• Internals of a file read in HDFS

• Data copying with ‘distcp’
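The common operations above map one-to-one onto `hdfs dfs` subcommands; a small Python wrapper, with illustrative paths and hostnames, makes that concrete:

```python
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` command and fail loudly if it errors."""
    subprocess.check_call(["hdfs", "dfs"] + list(args))

hdfs("-mkdir", "-p", "/user/demo")          # create a directory
hdfs("-put", "local.txt", "/user/demo/")    # copy a local file into HDFS
hdfs("-ls", "/user/demo")                   # list it
hdfs("-cat", "/user/demo/local.txt")        # read it back

# Cluster-to-cluster (or large intra-cluster) copies use distcp, which
# runs as a MapReduce job rather than a single client process.
subprocess.check_call(
    ["hadoop", "distcp", "hdfs://nn1:8020/user/demo", "hdfs://nn2:8020/backup"])
```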


Job Scheduling

• How to schedule Hadoop Jobs on the same cluster

• Default Hadoop FIFO Scheduler

• Fair Scheduler and its configuration

Hadoop Distributed File System (HDFS)

• HDFS overview and design

• HDFS architecture

• HDFS file storage

• Component failures and recoveries

• Block placement

• Balancing the Hadoop cluster

Different Modes of Hadoop Cluster Deployment

• Different Hadoop deployment types

• Hadoop distribution options

• Hadoop competitors

• Hadoop installation procedure

• Distributed cluster architecture

• Lab: Hadoop Installation

Introduction to Hadoop and its Ecosystem

The amount of data processed in today's world

• What Hadoop is and why it is important

• Hadoop comparison with traditional systems

• Hadoop history

• Hadoop main components and architecture

Introduction to Big Data

What qualifies as Big Data

• What are the business use cases for Big Data

• Big Data requirements for the traditional data warehousing and BI space

• Big Data solutions


Big Data Hadoop Developer Certification Advisor

Mukesh Kumar


Big Data Course Advisor

Mukesh has 15 years of industry experience overall. He started his career as a software project engineer and worked in roles such as Project Lead, Software Architect and Enterprise Architect for over 12 years. In the last 3 years, he has worked as a professional consultant and corporate trainer, conducting workshops and training programs in the area of Big Data analytics and helping clients migrate their data platforms and applications to Big Data platforms to leverage the scalability and cost effectiveness of these platforms.

As a corporate trainer, he has conducted around 450 corporate batches and 150 online batches and trained around 18,000 people. These training programs were conducted for 85 different companies, including Flipkart, Walmart Labs, Cisco and eBay.

The technologies covered in his Hadoop administration and development stack are HDFS, MapReduce, Hive, HBase, Hue, ZooKeeper, Kafka, Oozie, Flume, Solr, Sqoop, NiFi, Talend, Phoenix, Drill, Presto, Ranger, Kerberos, Ambari, Apache Spark, Apache Storm and Machine Learning using Spark ML, along with Python and R and, in the NoSQL world, Cassandra, Redis, MongoDB and ElasticSearch.

Apart from conducting classes, he has been engaged as a consultant with many clients, such as Scope International (a subsidiary of Standard Chartered Bank), Manhattan Associates, Hewlett Packard Enterprise and Subex, to harness the Big Data platform for enterprise-scale data analytics, data processing, distributed search and visualization.

Big Data Hadoop Developer Certification Certification


Why Tech Eureka

Tech Eureka's Blended Learning model brings the classroom learning experience online with its world-class LMS. It combines instructor-led training, self-paced learning and personalized mentoring to provide an immersive learning experience.

Classroom-in-Person

Self-Paced Online Video

Instructor-Led Online

Big Data Hadoop Developer Certification FAQs