What is Big Data?
We have been dealing with data for many years, but in today's landscape the emphasis has shifted to analytics and Big Data.
Analytics gives its best results only when it is fed a large quantity of high-quality data: the more good data we have, the better our decisions can be. Today the data we deal with is measured in petabytes, and in future it will scale to zettabytes. As technology has evolved over the years, we have become proficient at handling massive databases, data marts and data warehouses. But now things have changed. We are getting data from many different sources, and much of it is unstructured. Organizations therefore face a new challenge: how to handle this vast amount of data, both structured and unstructured. This is the problem Big Data addresses.
We have reached a point of Data Explosion. Where is all this data coming from? The diagram below explains this.
The data comes from multiple sources: sensors that gather climate information, content posted on social media, online transaction records, call detail records, cell phone GPS signals and CCTV cameras.
Characteristics of Big Data
Big Data is characterized by four V's.
i) Volume: As data volumes increase, traditional infrastructure is unable to handle them, and managing such huge amounts of data within current budgets is not feasible. Organizations are flooded with growing data, sometimes in the range of petabytes.
ii) Velocity: We now have many data sources. Some of them, such as sensors, generate data at such a rapid pace and in such large volumes that retaining it has become a challenge. We also have to improve our response time: some real-time data, such as transactions screened for fraud, must be processed immediately.
iii) Variety: We now have both structured and unstructured data, such as text, sensor data, and audio and video clips. Analysing both kinds together requires a new approach. This matters all the more because, by an often-cited estimate, about 80% of data is unstructured.
iv) Veracity: Establishing trust in the data is also a challenge, because bad input produces bad output. Since we devote so much time to analysing the data, the data must be trustworthy.
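To make the velocity and veracity points concrete, here is a minimal sketch in Python, assuming a stream of transaction records represented as dictionaries. It rejects records that fail basic validity checks (veracity) and flags accounts that transact unusually often within a short time window, a simple stand-in for real-time fraud screening (velocity). The names `Transaction`, `is_valid`, `VelocityMonitor` and `process_stream`, the record fields, and the 60-second / 3-transaction thresholds are all hypothetical; real fraud detection relies on far richer models.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transaction:
    account: str
    amount: float
    timestamp: float  # seconds since an arbitrary start point (hypothetical field)


def is_valid(record: dict) -> bool:
    """Veracity: reject records with missing fields or impossible values."""
    try:
        return (
            bool(record["account"])
            and float(record["amount"]) > 0
            and float(record["timestamp"]) >= 0
        )
    except (KeyError, TypeError, ValueError):
        return False


class VelocityMonitor:
    """Velocity: flag an account that makes too many transactions in a short window.

    The window length and event limit are hypothetical thresholds.
    """

    def __init__(self, window_seconds: float = 60.0, max_events: int = 3):
        self.window = window_seconds
        self.max_events = max_events
        self.recent: dict[str, deque] = {}

    def observe(self, tx: Transaction) -> bool:
        q = self.recent.setdefault(tx.account, deque())
        q.append(tx.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while q and tx.timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_events


def process_stream(records):
    """Handle each record as it arrives: validate first, then check its rate."""
    monitor = VelocityMonitor()
    rejected, flagged = [], []
    for record in records:
        if not is_valid(record):
            rejected.append(record)
            continue
        tx = Transaction(record["account"], float(record["amount"]), float(record["timestamp"]))
        if monitor.observe(tx):
            flagged.append(tx)
    return rejected, flagged


stream = [
    {"account": "A", "amount": 20, "timestamp": 0},
    {"account": "A", "amount": 35, "timestamp": 10},
    {"account": "B", "amount": -5, "timestamp": 12},  # invalid amount
    {"account": "A", "amount": 15, "timestamp": 20},
    {"account": "A", "amount": 50, "timestamp": 30},  # 4th transaction within 60 s
    {"amount": 10, "timestamp": 31},                  # missing account
]
rejected, flagged = process_stream(stream)
print(len(rejected), [t.timestamp for t in flagged])  # prints: 2 [30.0]
```

Validating before the rate check keeps untrustworthy records from distorting the velocity signal, which is one way the two concerns interact in practice.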
Big Data Strategy
Organizations must fully exploit all sources of data. When making decisions, executives should consider not only operational data and customer demographics, but also customer feedback, details in contracts and agreements, and other types of unstructured data and content.
Factors for Big Data Strategy
i) Integrate and manage the full variety, velocity and volume of data
ii) Apply advanced analytics to information in its native form
iii) Visualize all available data for ad hoc analysis
iv) Provide a development environment for building new analytic applications
v) Optimize and schedule workloads
vi) Ensure security and governance
People often mistake Big Data for a technology. It is not just technology; it is a business strategy for making use of information resources. Success at each entry point is accelerated by products within the Big Data platform, which help build the foundation for future requirements as the organization expands further into the platform.
Big Data Tools
i) Hadoop
ii) Cloudera
iii) MongoDB
iv) Talend
Hadoop - Many people think that "Hadoop is Big Data and Big Data is Hadoop", but that is not the case: Hadoop is just one flavour of Big Data technology. It is an open-source software framework for storing and processing very large datasets. It provides enormous storage for any kind of data, coupled with an efficient processing system, and it can run many tasks concurrently.
Cloudera - Cloudera adds features that give people working in an organization better access to the data. It is an enterprise platform on which Hadoop can be deployed, and it offers additional security features. Because we are storing sensitive data, data security is especially important.
MongoDB - MongoDB is a modern document-oriented database that helps store unstructured data in an organized way.
Talend - Talend is a company that offers a number of data integration products, several of them open source.
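The processing side of Hadoop described above was originally built around the MapReduce model: a map step turns each input split into key/value pairs, the framework groups those pairs by key (the shuffle), and a reduce step combines each group into a result. The sketch below simulates that flow for a word count in plain, single-process Python. It is not Hadoop's Java API, and the function names are hypothetical; on a real cluster Hadoop runs many map and reduce tasks in parallel across nodes, which is what lets it process very large datasets.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # In Hadoop each split would be mapped on a different node; here we just chain them.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


splits = ["big data is not only Hadoop", "Hadoop stores big data"]
counts = word_count(splits)
print(counts["big"], counts["hadoop"], counts["stores"])  # prints: 2 2 1
```

Because each mapper only sees its own split and each reducer only sees one key's values, the two phases can be spread across many machines without coordination beyond the shuffle.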