BIG DATA 

 

 

This course provides the knowledge and training to use current Big Data tools and techniques, and teaches ways of storing information that allow efficient processing and analysis for informed business decision-making. You will also learn to store, manage, process, and analyze massive amounts of unstructured data.

 

Courses: course topics and content can be customized based on client needs.

Hadoop
  • Master the concepts of the Hadoop framework and its deployment in a cluster environment

  • Learn to write complex MapReduce programs

  • Perform data analytics using Pig and Hive

  • Acquire an in-depth understanding of the Hadoop ecosystem, including Flume, the Apache Oozie workflow scheduler, etc.

  • Master advanced concepts of Hadoop 2.0: HBase, ZooKeeper, and Sqoop

  • Get hands-on experience in setting up different configurations of a Hadoop cluster

  • Work on real-life, industry-based projects using Hadoop
 
Pig
  • Introduction

  • Installing and Running Pig

  • Grunt

  • The Pig Data Model

  • Basic Pig Latin

  • Advanced Pig Latin

  • Developing and Testing Scripts

  • Tuning Pig

  • Embedding Pig Latin in Python

  • Writing Evaluation and Filter Functions

  • Writing Load and Store Functions

  • Pig and the Rest of the Hadoop Zoo

  • Built-in User-Defined Functions and Piggybank

Hive
 
  • Describe how Apache Hive fits in the Hadoop ecosystem

    • Understand the data pipeline

    • Describe other SQL-on-Hadoop tools

  • Create tables and load data in Apache Hive

    • Create databases

    • Create simple, external, and partitioned tables

    • Alter and drop tables

  • Query data with Apache Hive

    • Query tables

    • Manipulate tables with UDFs

    • Combine and store tables

ZooKeeper

 

  • Understand how Apache ZooKeeper solves coordination issues in traditional distributed systems

  • Learn the steps to set up and get started with ZooKeeper in both development and production environments

  • Administer Apache ZooKeeper for real-world use and production workloads

  • Use ZooKeeper client commands

R Programming

 

  • Introduction to R and basic statistical methods using R Commander

  • Basic elements of R programming and statistical methods

  • Essential R programming considerations and statistical methods

Spark

 

  • Understand Scala/Python and their implementation

  • Apply lazy values, control structures, loops, collections, etc.

  • Learn the concepts of traits and OOP in Scala

  • Understand functional programming in Scala

  • Get an insight into Big Data challenges

  • Learn how Spark acts as a solution to these challenges

  • Install Spark and implement Spark operations in the Spark shell

  • Understand what RDDs are in Spark

  • Implement a Spark application on YARN (Hadoop)

  • Analyze Hive and Spark SQL Architecture

MongoDB

 

  • Gain insight into the roles played by a MongoDB® expert

  • Learn how to design schemas using advanced queries

  • Troubleshoot performance issues

  • Understand the MongoDB® aggregation framework

  • Learn MongoDB® backup and recovery options and strategies

  • Understand scalability and availability in MongoDB® using sharding

  • Set up a replicated cluster and manage replica sets

  • Understand shards, shard keys, config servers, query routers, etc.

  • Set up sharding

  • Use various MongoDB® tools to develop and deploy your applications

  • Learn MongoDB® administration activities

  • Perform health checks, backup, recovery, performance tuning, etc.

  • Understand Hadoop and MongoDB integration

  • Migrate data between MongoDB and Hadoop (MongoDB to Hive)

  • Learn to integrate MongoDB with tools like Jaspersoft and Pentaho

  • Integrate MongoDB with the GUI tool Robomongo

Neo4j

 

  • Understanding NoSQL and graph databases

  • Examining basic graph modeling

  • Using Cypher to query Neo4j

  • Using paths to traverse multiple nodes

  • Getting properties back from paths

  • Using specific nodes

  • Creating entities

  • Deleting entities

Cassandra

 

  • Understand Cassandra and the NoSQL domain

  • Create Cassandra clusters for different kinds of applications

  • Understand the Apache Cassandra architecture

  • Design and model applications for Cassandra

  • Port an existing application from an RDBMS to Cassandra

  • Learn to use Cassandra with various programming languages

For more information, contact ITEXPS.
