Monday, 20 April 2015

INTRODUCTION:

Big Data has recently become one of the hottest trends in the IT sector, and Hadoop stands front and center in the discussion of how to implement a Big Data strategy. So it is important to know exactly what it means when somebody says “Hadoop”.

DEFINITION:
  1.  Hadoop can be defined as an open-source, Java-based programming framework that supports the processing of large data sets in a widely distributed computing environment.
  2.  Doug Cutting, the creator, named the framework HADOOP after his son’s stuffed toy elephant.
  3.  It was inspired by Google’s MapReduce.
  4.  It is sponsored by the Apache Software Foundation.


DESCRIPTION:

     Developer:         Apache Software Foundation
     Initial Release:   December 10, 2011
     Stable Release:    November 18, 2014
     Language:          Java
     Operating System:  Cross-platform (e.g. Windows, Linux; can also work with BSD and OS X)
     Type:              Distributed file system

WHERE IS IT USED?

   The Hadoop framework is used by:

 Google, Yahoo and IBM, largely for search engines and advertising.
 Other prominent users are Facebook, Microsoft Azure, Amazon Elastic Compute Cloud and Amazon Simple Storage Service, IBM Bluemix cloud services, and so on.

HOW AND WHEN DOES IT WORK?
  1.   Hadoop makes it possible to run applications on systems with thousands of nodes processing thousands of terabytes.
  2.  Its distributed file system facilitates rapid data transfer between the nodes and allows uninterrupted operation and processing in case of a node failure, even if a significant number of nodes become inoperative.
  3.  It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
  4.  All the modules in Hadoop are designed with the basic assumption that hardware failure of an individual machine or rack of machines should be tackled automatically, i.e. handled in software by the framework.
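The fault-tolerance idea above can be illustrated with a small, self-contained sketch (a hypothetical simulation, not real Hadoop code): each data block is replicated to several nodes, so when one node becomes inoperative, a surviving replica still serves the data. The class and node names are made up for illustration; HDFS uses a default replication factor of 3.

```java
import java.util.*;

// Hypothetical sketch of HDFS-style block replication: each block is copied
// to several nodes, so losing a node does not lose the data.
public class ReplicationSketch {
    static final int REPLICATION_FACTOR = 3;  // HDFS default

    // node name -> blocks stored on that node
    Map<String, Set<String>> nodes = new HashMap<>();

    ReplicationSketch(List<String> nodeNames) {
        for (String n : nodeNames) nodes.put(n, new HashSet<>());
    }

    // Write a block to REPLICATION_FACTOR distinct nodes.
    void writeBlock(String block) {
        List<String> targets = new ArrayList<>(nodes.keySet());
        Collections.shuffle(targets);
        for (String n : targets.subList(0, REPLICATION_FACTOR)) {
            nodes.get(n).add(block);
        }
    }

    // Simulate a node crash: drop the node and every replica on it.
    void failNode(String name) {
        nodes.remove(name);
    }

    // A block is still readable if any surviving node holds a replica.
    boolean isReadable(String block) {
        for (Set<String> stored : nodes.values()) {
            if (stored.contains(block)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ReplicationSketch cluster =
            new ReplicationSketch(Arrays.asList("node1", "node2", "node3", "node4"));
        cluster.writeBlock("block-A");
        cluster.failNode("node1");  // one node becomes inoperative
        System.out.println(cluster.isReadable("block-A"));  // true: replicas survive
    }
}
```

In the real framework, the NameNode tracks replica locations and re-replicates under-replicated blocks automatically; this sketch only shows why the data remains readable after a failure.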

MODULES OF HADOOP
   The base Apache Hadoop framework is composed of the following modules:

§  Hadoop Common: contains libraries and utilities needed by the other Hadoop modules.
§  HDFS: a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
§  Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them to schedule users’ applications.
§  Hadoop MapReduce: a programming model for large-scale data processing.

HADOOP TRAINING would provide a platform to learn to store, manage, retrieve, and analyse Big Data on clusters of servers in the cloud using the Hadoop ecosystem.
  1.     It also helps you learn how to analyse large amounts of data to bring out insights.
  2.     Relevant examples and cases make the learning easier and more effective.
  3.     Hadoop online training is preferred on a wider scale because the classes are live, online and interactive, and can be taken from home.

SUMMARY:

The Indian Big Data industry has been predicted to rise five-fold, from the current level of $200 mn to $1 bn in 2015, which is 4% of the expected global share. At the same time, surveys indicate there will be a significant gap between job openings and candidates with Big Data skills. Technology professionals need to be equipped for Big Data projects, which makes them more valuable and thereby more marketable to employers. So a professional with Hadoop online training would have the advantage of accelerated career growth and an increased pay package.