INTRODUCTION:
Big Data has recently become one of the hottest trends in the IT sector, and Hadoop stands front and center in any discussion of how to implement a big data strategy. It is therefore important to know exactly what it means when somebody says “Hadoop”.
DEFINITION:
- Hadoop can be defined as an open-source, Java-based programming framework that supports the processing of large data sets in a widely distributed computing environment.
- Doug Cutting, its creator, named the framework HADOOP after his son’s stuffed toy elephant.
- It was inspired by Google’s MapReduce.
- It is sponsored by the Apache Software Foundation.
DESCRIPTION:
- Developer: Apache Software Foundation
- Initial release: December 10, 2011
- Stable release: November 18, 2014
- Language: Java
- Operating system: Cross-platform (Windows and Linux; can also work with BSD and OS X)
- Type: Distributed file system
WHERE IS IT USED?
The Hadoop framework is used by Google, Yahoo and IBM, largely for search engines and advertising. Other prominent users include Facebook, Microsoft Azure, Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), IBM Bluemix cloud services and so on.
HOW AND WHEN DOES IT WORK?
- Hadoop makes it possible to run applications on systems with thousands of nodes processing thousands of terabytes of data.
- Its distributed file system facilitates rapid data transfer between the nodes and allows operation and processing to continue uninterrupted even when a significant number of nodes become inoperative.
- It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
- All the modules in Hadoop are designed with the basic assumption that hardware failures of individual machines or racks of machines are common, and should be automatically handled in software by the framework.
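One concrete piece of this fault-tolerance design is block replication: HDFS keeps multiple copies of every data block on different nodes, so the loss of a node does not lose data. The replication factor is set with the standard `dfs.replication` property in `hdfs-site.xml`; the value 3 shown below is HDFS's default, used here only as an illustrative sketch.

```xml
<!-- hdfs-site.xml: number of copies HDFS keeps of each data block.
     With replication 3 (the default), data stays available even if
     the nodes holding one or two of the copies fail. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```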
MODULES OF HADOOP
The base Apache Hadoop framework is composed of the following modules:
§ Hadoop Common: contains the libraries and utilities needed by the other Hadoop modules.
§ HDFS: a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
§ Hadoop YARN: a resource-management platform responsible for managing computing resources in clusters and using them to schedule users’ applications.
§ Hadoop MapReduce: a programming model for large-scale data processing.
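The MapReduce model above can be sketched without a Hadoop cluster at all. The classic example is word count: a map phase emits a (word, 1) pair for every word, a shuffle groups the pairs by word, and a reduce phase sums the counts. The following is a minimal local simulation in plain Java (the class and method names are illustrative, not Hadoop APIs); on a real cluster, the map calls would run in parallel across many nodes.

```java
import java.util.*;
import java.util.stream.*;

// A minimal local sketch of the MapReduce word-count pattern:
// map emits (word, 1) pairs, shuffle groups them by key,
// and reduce sums the counts for each word.
public class WordCount {
    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group pairs by word and sum the 1s.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big ideas", "data on hadoop");
        // On a real cluster, each line would be mapped on a different node.
        List<Map.Entry<String, Integer>> pairs = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        System.out.println(reduce(pairs));
        // prints {big=2, data=2, hadoop=1, ideas=1, on=1}
    }
}
```

Because the map phase only looks at one line at a time and the reduce phase only looks at one word's pairs at a time, both can be spread across thousands of machines, which is exactly what Hadoop MapReduce does.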
HADOOP TRAINING provides a platform to learn to store, manage, retrieve, and analyse Big Data on clusters of servers in the cloud using the Hadoop ecosystem.
- It also helps you learn how to analyse large amounts of data to draw out insights.
- Relevant examples and cases make the learning easier and more effective.
- Hadoop online training is widely preferred because the classes are live, online and interactive, and can be taken from home.
SUMMARY:
The Indian Big Data industry has been predicted to grow five-fold, from the current level of $200 mn to $1 bn in 2015, which is 4% of the expected global share. At the same time, the survey also indicates there will be a significant gap between job openings and candidates with Big Data skills. Technology professionals need to be equipped for Big Data projects, which makes them more valuable and thereby more marketable to employers. A Hadoop online training professional would therefore have the advantage of accelerated career growth and an increased pay package.