Big Data Hadoop Module

Overview

Big Data will create new career opportunities for job seekers and growth for entirely new categories of companies, such as those that aggregate and analyze industry data. Many of these will be companies that sit in the middle of large information flows, where data about products and services, buyers and suppliers, and consumer preferences and intent can be captured and analyzed. Forward-thinking leaders across sectors should begin building their organizations’ Big Data capabilities aggressively.

Module 1: Big Data courses include: R, Hadoop, Pig, Hive, HBase, Sqoop, Storm, Spark, MongoDB, Cassandra, Mahout

Module 2: Database, Tableau, R, Python Programming, NumPy, pandas, Linear Problem Solving, Forecasting, Statistics

Students are encouraged to combine Modules 1 and 2 for better career success and to build a strong knowledge base in current market tools and technologies.

Course Content   
  • Learn how to implement ETL and machine learning algorithms using Big Data technologies

  • Key concepts:

  • Module 1:  Big Data use cases with Hadoop Technology 

  • Time: ~32 hours

  • BIG DATA Foundation

  • Database – overview, Oracle PL/SQL   [OCP]

  • Data warehouse, ETL

  • Volume, Velocity, Veracity

  • Data Warehouse vs BIG DATA

  • BIG DATA – Use cases,   OLAP vs OLTP

  • BIG DATA Programming

  • Hadoop Architecture 

  • Linux Shell Scripting

  • Hadoop MapReduce

  • Sqoop – Data import and export

  • Programming with: 

  • Pig

  • Hive

  • Mahout

  • MySQL

  • HBase

  • ZooKeeper

  • Storm/Spark – Real-time Analytics

  • Java/Python - UDFs (User-Defined Functions)

  • Theory topics are complemented by hands-on labs and an end-to-end (E2E) project

  • Module 2: Data-Driven Decisions by Managers

  • Time: 32 hours

  • Database and Data Analysis

  • Database vs Data Warehouse vs Big Data

  • Managerial decisions based on data - Introduction

  • Product Valuation [NPV, DCF, FV, PV, WACC]

  • Statistics Overview

  • Forecasting and Projection Algorithms [R, Excel]

  • Classification, Clustering, and Regression Algorithms

  • Descriptive and Visual Data Analysis [Tableau]

  • Machine Learning [Python, R]

  • Linear Programming [LP]

  • Excel Solver

  • R Programming

  • Python Statistical Libraries [NumPy, SciPy, pandas, PyLab]

  • Theory topics are complemented by hands-on labs

  • Project Work/Case Study: solving a real industry problem in a classroom setting
