MapReduce applications can process vast amounts (multiple terabytes) of data in parallel on large clusters in a reliable, fault-tolerant manner. MapReduce is a computational paradigm in which an application is divided into self-contained units of work. Each of these units of work can run on any node in the cluster.
A MapReduce job splits the input data set into independent chunks that are processed by map tasks in parallel. The framework sorts the map outputs, which are then input to reduce tasks. Job inputs and outputs are stored in the file system. The MapReduce framework and the HDFS (Hadoop Distributed File System) are typically on the same set of nodes, which enables the framework to schedule tasks on nodes that contain data.
The MapReduce framework consists of a single primary JobTracker and one TaskTracker per worker node. The JobTracker schedules a job's component tasks on the TaskTrackers, monitors them, and re-executes any failed tasks. The TaskTrackers run the tasks as directed by the JobTracker.
MapReduce is composed of the following phases:
i) Map
ii) Reduce
The map phase
The map phase is the first part of the data processing sequence within the MapReduce framework. Map tasks run on worker nodes, each processing a smaller chunk of the entire data set. The MapReduce framework is responsible for dividing the input data set into these chunks and feeding each one to a corresponding map function. When you write a map function, you do not need to add any logic to spawn multiple map tasks or to exploit Hadoop's distributed computing architecture; the framework handles that parallelism. Map functions are oblivious to both the data volume and the cluster in which they operate, so they can be used unchanged for both small and large data sets (the latter being the common case for Hadoop users).
Important: Hadoop is a great engine for batch processing. However, if the data volume is small, the overhead incurred by the MapReduce framework might negate the benefits of this approach.
Based on the data set being processed, a programmer must construct a map function that works on a series of key/value pairs. After processing the chunk of data assigned to it, each map function emits zero or more output key/value pairs, which are passed forward to the next phase of the data processing sequence in Hadoop. The input and output types of the map can be (and often are) different from each other.
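To make this concrete, the following is a minimal sketch of a map function written against the org.apache.hadoop.mapreduce Java API. The class name WordCountMapper and the word-count logic are illustrative additions, not part of the original post. The function reads lines of text and emits a (word, 1) pair for each token, so the input types (LongWritable, Text) differ from the output types (Text, IntWritable), as noted above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: processes one chunk (input split) of the data set,
// emitting zero or more key/value pairs per input record.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Tokenize the line and emit (word, 1) for each token.
        StringTokenizer tokenizer = new StringTokenizer(line.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}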
The reduce phase
As with the map function, developers must also create a reduce function. Each key/value pair emitted by the map tasks is routed to the appropriate reducer partition so that the final results aggregate all of the data belonging to the same key. This process of moving map outputs to the reducers is known as shuffling. When the shuffle completes and a reducer has copied all of its map task outputs, it enters the merge process. During this part of the reduce phase, the copied map outputs are merged together, preserving the sort ordering established during the map phase. Merging is done in rounds for performance; when the final merge is complete, the reduce task consolidates the results for every key in the merged output, and the final result set is written to disk on HDFS.
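As a companion sketch (again using the org.apache.hadoop.mapreduce API, with the illustrative class name WordCountReducer), the reducer below receives all values for a given key after the shuffle and merge, and simply sums them to produce the aggregated result that the framework writes to HDFS.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: all values for one key arrive together after the
// shuffle and merge, so the reducer aggregates them (here, a simple sum).
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}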
Development languages: Java is the most common language used to develop map and reduce functions. However, a host of other development languages and frameworks are also supported, including Ruby, Python, and C++.
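For completeness, a hypothetical driver (the class name WordCountDriver is an assumption for this sketch) shows how the mapper and reducer above would typically be wired into a Java job and submitted to the cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver: configures the job, points it at input/output paths
// in HDFS, and waits for the map and reduce phases to complete.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // job input on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // job output on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a JAR, such a job would be launched with a command along the lines of: hadoop jar wordcount.jar WordCountDriver <input path> <output path>.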