What is Big Data?
Big Data refers to the very large collections of data that organisations hold. The volume is so large that managing it has become a challenge, and, worse, it keeps growing exponentially. For example:
i) 200 of London's traffic cameras collect 8 TB of data per day.
ii) One day of instant messaging in 2002 consumed 750 GB of data.
iii) Annual email traffic, excluding spam, consumes 300+ PB of data.
iv) In 2004, Walmart's transaction database contained 200 TB of data.
According to one report, this data will grow at about 40% per year. Organisations are therefore giving Big Data techniques a lot of importance nowadays, both to handle this data and to use it for business growth.
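To make the 40% figure concrete, here is a short Python sketch of compound growth. Only the 40% rate comes from the text; the 100 TB starting volume is a hypothetical number chosen for illustration. At 40% a year, any amount of data roughly doubles every two years and grows about 29-fold in a decade.

```python
import math

ANNUAL_GROWTH = 0.40  # 40% per year, the rate quoted in the text


def project_volume(initial_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth at `annual_growth` (0.40 = 40%)."""
    return initial_tb * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, for illustration only
    for year in range(6):
        print(f"year {year}: {project_volume(start_tb, ANNUAL_GROWTH, year):.1f} TB")
    print(f"doubling time: {doubling_time(ANNUAL_GROWTH):.2f} years")
```

The doubling time follows from solving (1 + r)^t = 2 for t, which gives t = ln 2 / ln(1 + r), about 2.06 years for r = 0.40.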
Big Data is a technology for working with data that is diverse, huge and requires special skills to handle. In other words, conventional technology cannot handle it effectively. Such data is too large in terms of volume, complex to handle, varied (i.e. not all of the same type) and uncertain in terms of quality (veracity).
Today, however, we have technologies that can draw conclusions from Big Data. For example, a retailer can track its users' behaviour to infer their preferences and the price range they are looking for, and then stock its products accordingly. Similarly, social media signals can be used to detect events such as a disease outbreak or unrest in some part of the country.
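A minimal sketch of the retailer example, assuming a hypothetical clickstream in which each event records a user ID, a product category and the price of the item viewed. It infers each user's most-viewed category and the typical (median) price they look at, and counts views per category as a crude signal for what to stock. The event data and function names are illustrative only; a real retailer would run similar aggregations over far more data with distributed tools.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical clickstream: (user_id, product_category, price_viewed)
events = [
    ("u1", "shoes", 59.0),
    ("u1", "shoes", 75.0),
    ("u1", "jackets", 120.0),
    ("u2", "phones", 499.0),
    ("u2", "phones", 549.0),
    ("u2", "cases", 19.0),
]


def user_profiles(events):
    """Summarise each user's most-viewed category and median price viewed."""
    categories = defaultdict(Counter)
    prices = defaultdict(list)
    for user, category, price in events:
        categories[user][category] += 1
        prices[user].append(price)
    return {
        user: {
            "top_category": categories[user].most_common(1)[0][0],
            "median_price": median(prices[user]),
        }
        for user in categories
    }


def demand_by_category(events):
    """Count views per category, a rough indicator of what to stock."""
    return Counter(category for _, category, _ in events)


print(user_profiles(events))
print(demand_by_category(events).most_common())
```

The median is used instead of the mean so that a single unusually expensive view does not skew a user's typical price range.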
In short, Big Data refers to data sets so voluminous that they cannot be managed with traditional tools.
Working with Big Data therefore involves turning raw data into useful data, storing it, retrieving it when necessary, and analysing it to reach conclusions.
Some of the terms used in Big Data are:
Volume: We might have 500 GB of storage on a personal computer, but Facebook takes in 500 TB of new data every day. The heavy use of smartphones, with newer technologies such as sensors, creates still more data, including location information and videos.
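To put the Volume figures in perspective, the sketch below compares a 500 GB personal disk with 500 TB of new data per day. It assumes decimal units (1 TB = 1,000 GB) and that the data arrives evenly over the day; both are simplifications. Under those assumptions the daily intake would fill 1,000 such disks, one every 86.4 seconds, at a sustained rate of about 5.8 GB per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1000  # decimal units assumed


def disks_filled_per_day(daily_ingest_gb: float, disk_gb: float) -> float:
    """How many personal disks the daily intake would fill."""
    return daily_ingest_gb / disk_gb


def seconds_to_fill_disk(daily_ingest_gb: float, disk_gb: float) -> float:
    """Seconds to fill one disk, assuming data arrives evenly over the day."""
    return disk_gb * SECONDS_PER_DAY / daily_ingest_gb


def ingest_rate_gb_per_s(daily_ingest_gb: float) -> float:
    """Average sustained intake rate in GB per second."""
    return daily_ingest_gb / SECONDS_PER_DAY


daily_gb = 500 * GB_PER_TB  # 500 TB of new data per day (figure from the text)
disk_gb = 500               # a 500 GB personal system (figure from the text)

print(f"{disks_filled_per_day(daily_gb, disk_gb):,.0f} disks per day")
print(f"one disk every {seconds_to_fill_disk(daily_gb, disk_gb):.1f} s")
print(f"about {ingest_rate_gb_per_s(daily_gb):.2f} GB/s sustained")
```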
Velocity: Data is created very fast. Online games are played by millions of users simultaneously, stock-trading algorithms generate huge quantities of data every second, sensors generate data in real time, and ad impressions capture user behaviour at millions of events per second. Because data arrives at such a rapid pace, we need effective technologies to deal with it.
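One basic building block for high-velocity data is measuring how fast events are arriving. The following sketch is illustrative rather than taken from any particular system: it keeps a sliding one-second window over a stream of event timestamps (integer milliseconds, assumed to arrive in order) and reports the current event rate. The class name and the 200 ms sample stream are hypothetical.

```python
from collections import deque


class SlidingWindowCounter:
    """Tracks how many events arrived in the last `window_ms` milliseconds.

    Timestamps are integer milliseconds and must arrive in non-decreasing order.
    """

    def __init__(self, window_ms: int = 1000):
        self.window_ms = window_ms
        self._timestamps = deque()

    def record(self, ts_ms: int) -> None:
        """Register one event and discard events that fell out of the window."""
        self._timestamps.append(ts_ms)
        self._evict(ts_ms)

    def _evict(self, now_ms: int) -> None:
        # Keep only events in the half-open window (now - window, now].
        cutoff = now_ms - self.window_ms
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def events_per_second(self, now_ms: int) -> float:
        """Event rate over the window ending at `now_ms`, scaled to one second."""
        self._evict(now_ms)
        return len(self._timestamps) * 1000 / self.window_ms


counter = SlidingWindowCounter(window_ms=1000)
for ts in range(0, 1201, 200):  # hypothetical events every 200 ms
    counter.record(ts)
print(counter.events_per_second(now_ms=1200))
```

This version stores every timestamp inside the window, which is fine for a demonstration. At millions of events per second, a real system would more likely keep aggregated counts per time bucket rather than individual events.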
Variety: Data comes in many different types. It is not only numbers, dates and strings; some of it is audio, video or free text, which may be unstructured.
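As a rough illustration of variety, the sketch below sorts incoming records into structured, semi-structured and unstructured groups so that each kind can be sent to a suitable store or processing step. The fixed schema `{"id", "date", "amount"}` and the classification rules are hypothetical simplifications: a dict matching the schema counts as structured, JSON objects or arrays (or dicts with other keys) as semi-structured, and free text or raw bytes such as audio or video as unstructured.

```python
import json
from collections import defaultdict

SCHEMA = {"id", "date", "amount"}  # hypothetical fixed table schema


def classify(record) -> str:
    """Label a record as structured, semi-structured or unstructured."""
    if isinstance(record, dict):
        # Rows that match the fixed schema fit a traditional table.
        return "structured" if set(record) == SCHEMA else "semi-structured"
    if isinstance(record, (bytes, bytearray)):
        return "unstructured"  # e.g. raw audio or video bytes
    if isinstance(record, str):
        try:
            parsed = json.loads(record)
        except json.JSONDecodeError:
            return "unstructured"  # free text such as a review or a post
        if isinstance(parsed, (dict, list)):
            return "semi-structured"  # JSON document with flexible fields
    return "unstructured"


def route(records) -> dict:
    """Group records by kind so each group can be handled by a suitable tool."""
    buckets = defaultdict(list)
    for record in records:
        buckets[classify(record)].append(record)
    return dict(buckets)


incoming = [
    {"id": 1, "date": "2024-01-01", "amount": 9.5},
    '{"user": "u1", "tags": ["video", "music"]}',
    "Loved the product, delivery was quick",
    b"\x00\x01\x02",
]
for kind, items in route(incoming).items():
    print(kind, len(items))
```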
Traditional databases were designed for smaller volumes of data that were predictable and consistent. With the advent of new technologies and techniques, however, the amount of data generated is so large that it has to be handled differently, which is why we need data analysts and data scientists.