Posts

Showing posts from June, 2015

MapReduce Technique : Hadoop Big Data

As a batch processing architecture, the major value of Hadoop is that it enables ad hoc queries to run against an entire data set and return results within a reasonable time frame. Distributed computing across a multi-node cluster is what allows this level of data processing to take place. MapReduce applications can process vast amounts (multiple terabytes) of data in parallel on large clusters in a reliable, fault-tolerant manner. MapReduce is a computational paradigm in which an application is divided into self-contained units of work. Each of these units of work can be issued on any node in the cluster. A MapReduce job splits the input data set into independent chunks that are processed by map tasks in parallel. The framework sorts the map outputs, which are then input to reduce tasks. Job inputs and outputs are stored in the file system. The MapReduce framework and the HDFS (Hadoop Distributed File System) are typically on the same set of nodes, which enabl...
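The split, map, sort and reduce flow described above can be sketched in a few lines of plain Python. This is only an illustration that runs in a single process, not Hadoop's actual API: the word-count task and the function names (`split_input`, `map_task`, `shuffle_and_sort`, `reduce_task`, `run_job`) are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, num_chunks):
    """Split the input into independent chunks, one per map task."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_task(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_and_sort(map_outputs):
    """Group map outputs by key and sort the keys before the reduce phase."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_outputs):
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_task(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(lines, num_chunks=2):
    """Run the whole job locally; on a cluster the map tasks would run in parallel."""
    chunks = split_input(lines, num_chunks)
    map_outputs = [map_task(chunk) for chunk in chunks]
    return dict(reduce_task(key, values) for key, values in shuffle_and_sort(map_outputs))


if __name__ == "__main__":
    print(run_job(["big data big", "data hadoop"]))
```

Each chunk is processed without reference to any other chunk, which is what lets a real framework place map tasks on whichever node holds the data.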

Operational Vs Analytical : Big Data Technology

Two classes of technology are used in Big Data: operational and analytical. Operational capabilities include capturing and storing data in real time, whereas analytical capabilities include complex analysis of all the data. The two are complementary and are therefore deployed together. Operational and analytical Big Data technologies have different requirements, and different architectures have evolved to address them. Operational systems include NoSQL databases, which respond to concurrent requests. Analytical systems focus on complex queries that touch almost all the data. Both systems work in tandem and manage hundreds of terabytes of data spanning billions of records. Operational Big Data For operational Big Data, NoSQL is generally used. It was developed to address the shortcomings of traditional databases; it is faster and can deal with large quantities of data spread over multiple servers. We are also using cloud compu...

Big Data Introduction

What is Big Data? Big Data is the large collection of data held by organisations. The amount of data is so huge that managing it has become a challenge, and the data keeps growing exponentially. For example: i) 200 of London's traffic cameras collect 8 TB of data per day. ii) One day of instant messaging in 2002 consumed 750 GB of data. iii) Annual email traffic, excluding spam, consumes 300 PB+ of data. iv) In 2004, the Walmart transaction DB contained 200 TB of data. v) Total digital data created in 2012 is estimated at 270000 PB. According to a report, this data will grow at a rate of 40% annually. Big Data techniques are gaining a lot of importance among organisations nowadays, both for handling this data and for using it for business growth. Big Data is a technology that uses data that is diverse and huge and requires special skills to handle. In other words, conventional technology cannot handle it effectively....
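To get a feel for what 40% annual growth means, the figures above can be compounded year by year. The sketch below assumes the rate stays constant and compounds once a year, which the excerpt does not state; the function name `projected_volume_pb` is hypothetical.

```python
def projected_volume_pb(base_pb, annual_growth, years):
    """Project data volume in PB, assuming constant compound annual growth."""
    return base_pb * (1 + annual_growth) ** years


if __name__ == "__main__":
    base_2012_pb = 270000  # total digital data created in 2012, from the text
    growth_rate = 0.40     # 40% per year, from the text

    for years_ahead in range(4):
        volume = projected_volume_pb(base_2012_pb, growth_rate, years_ahead)
        print(2012 + years_ahead, round(volume), "PB")
```

Under these assumptions the volume roughly doubles every two years, since 1.4 squared is 1.96.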

Big Data Analytics a high paying career

The next phase of demand for IT professionals will come from Big Data. It is a high-paying field and the demand is huge, so this is great news for IT professionals. Making money from Big Data is a challenge, and this is where data analytics comes in handy. Data analysts can come from different backgrounds, including data science, data mining, web analytics or even statistics. IT professionals have to work in tandem with data analysts in order to get something meaningful from the huge quantity of data. One of the major complaints of data analysts is that they don't get enough support from their IT team, which is a major hindrance to their work. The other major problem data analysts face is the quality of the data given to them. It is poorly documented, and they have to spend a huge amount of time reformatting it. IT professionals must understand the needs of data analysts and prepare data accordingly, so that analysts can spend their time analysing the data instead of reformatting it...