Big Data Hadoop

  • Big data is a collection of large datasets that cannot be processed using traditional computing techniques.
  • It is not a single technique or tool; rather, it has become a complete subject involving various tools, techniques, and frameworks.

Syllabus

  1. ARCHITECTURE:
  • Introduction to Big Data / Hadoop
  • Understanding the Hadoop Ecosystem
  • Understanding Cluster Setup Activities
  • HIVE Architecture
  • PIG Architecture
  • Introduction to NoSQL
  • Understanding Linux & Hadoop Basic Commands
  • HBASE Architecture
  • Understanding of Cloudera Manager and HUE
  2. ADMINISTRATOR:
  • Introduction to Big Data / Hadoop
  • Understanding Cluster
  • Best Practices for Cluster Setup
  • How MapReduce Works
  • Install Pseudo-Distributed Cluster
  • Install Multi node cluster
  • Configuration
  • Setup cluster on Cloud - EC2
  • Tools
  • Metadata & Data Backups
  • File system check (fsck)
  • Backing Up NN
  3. HADOOP DEVELOPER:
  • Introduction to Big Data / Hadoop
  • Understanding Cluster
  • Developing MapReduce Application
  • How MapReduce Works
  • MapReduce Types
  • MapReduce Formats
  • MapReduce Features
  • HIVE Basics
  • HIVE UDFs
  • PIG Basics
  • PIG UDFs
  • Introduction to NoSQL
  • Introduction to HBASE
  • Zookeeper
  • Oozie
  • Usecases
  • Exam
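The "How MapReduce Works" topic above can be sketched in pure Python as a local word-count simulation of the map → shuffle → reduce flow (illustrative only; the function names are not Hadoop APIs, and a real job would be written against the Hadoop MapReduce framework in Java):

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, as a WordCount mapper would.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as Hadoop's shuffle/sort step does
    # before handing each key's values to a reducer.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one key, as a WordCount reducer would.
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["hadoop"], counts["data"])  # 2 2
```

In a real cluster the shuffle moves data across the network between mapper and reducer nodes; here it is just a dictionary grouping, which is the conceptual core of the step.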
  4. HADOOP DATA ANALYTICS:
  • Java Refresher Session: MR Introduction
  • Working With Hive: E-Commerce Use Case
  • Working With Pig: Financial Use Case
  • Twitter Use Case: Sentiment Analysis
  • MR Optimization
  • Custom Combiner, Custom Partitioner And Distributed Cache
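The combiner and partitioner topics above can be sketched in plain Python (a conceptual sketch only; in real Hadoop jobs these are Java classes implementing the Reducer and Partitioner interfaces):

```python
from collections import Counter

def combine(mapper_output):
    # A combiner pre-aggregates (word, 1) pairs on the mapper side,
    # shrinking the data sent over the network to reducers.
    return list(Counter(word for word, _ in mapper_output).items())

def partition(key, num_reducers):
    # A partitioner decides which reducer receives a key; the
    # default strategy is a hash of the key modulo the reducer count.
    return hash(key) % num_reducers

mapper_output = [("big", 1), ("data", 1), ("big", 1)]
combined = combine(mapper_output)   # [("big", 2), ("data", 1)]
targets = {k: partition(k, 4) for k, _ in combined}
print(combined, targets)
```

A custom partitioner is useful when the default hash spreads related keys across reducers but the job needs them co-located (e.g. all records for one customer on one reducer).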
  5. NEW DEVELOPMENTS:
  • Introduction to Yarn
  • Overview of BI Tools
  • Overview of Platform
  • Overview of Cloudera Manager


  6. HADOOP DATA INGESTION:
  • Data Ingestion Using Sqoop And Flume
  • From Source To HDFS
  • Deriving Insights From Log Files, Unstructured Data And DBMS
  • Installation And Understanding Of Sqoop And Flume
  • Understanding Architecture And Installation of HBase
  • Meeting ZooKeeper and HRegionServer
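As a small taste of "deriving insights from log files", the sketch below parses unstructured web-server-style log lines into structured records of the kind that would then be loaded into HDFS; the log format and field names are assumptions for illustration, not a real Flume source:

```python
import re

# Assumed log line format: "<ip> <HTTP method> <path> <status>"
LOG_PATTERN = re.compile(
    r"(?P<ip>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d{3})"
)

def parse_line(line):
    # Turn one raw log line into a structured dict, or None if malformed.
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

raw_logs = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 POST /checkout 500",
    "corrupted line",
]
records = [r for r in (parse_line(l) for l in raw_logs) if r]
errors = [r for r in records if r["status"].startswith("5")]
print(len(records), len(errors))  # 2 1
```

Malformed lines are dropped rather than crashing the pipeline, a common choice when ingesting noisy log data at scale.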
  7. PROJECT USE CASES:
  • Entertainment Use Case
  • Twitter Use Case
  • Health Care Use Case
  • E-Commerce Use Case
  • Bio-Informatics Use Case
  8. Course Objectives:

After completing the course successfully, participants should be able to:

  • Explain the need for Big Data, and list its applications.
  • Demonstrate mastery of HDFS concepts and the MapReduce framework
  • Use Sqoop and Flume to load data into the Hadoop File System
  • Run queries using Pig, and Hive
  • Install and configure HBase
  • Discuss and differentiate various commercial Hadoop distributions, such as Cloudera and Hortonworks
  • Differentiate between Hadoop 1.0 and Hadoop 2.0

For More Information

Feel free to contact us with your queries.