Thursday, 3 March 2016

Real Time Analytics of Big Data

Big Data refers to enormous volumes of data, both structured and unstructured, coming from many different sources such as sensors. In this post I am going to explain real-time analytics of Big Data.

The data that we deal with can be analyzed in two ways.
  1. While the data is in motion, that is, while it is still flowing and has not yet been inserted into a database.
  2. After the data has been inserted into a database.

The world now moves so fast that if we wait for the data to be inserted into a database and only then analyze it, the result sometimes arrives too late to be useful.

Let me give an example. Suppose we have a CCTV camera at every traffic signal. Together they generate millions of data points every second. Traditionally, when a crime happens, we go back to the database afterwards and try to identify the criminal. This is the bottom-up approach. The better option is to analyze everything at the source in real time. We put a face scanner at every source, and the moment it finds a suspect it alerts the nearest crime control system. In this case we don't have to wait for the data to be inserted into a database. That way the suspect can be spotted and caught, possibly before they can commit a crime.

There are other areas where real-time analytics is useful too.

Nowadays there is so much data everywhere that it is practically impossible to store all of it. So we analyze the data before storing it in the database and discard what is unwanted. In this way we store only the important data.
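To make the idea of filtering before storing concrete, here is a minimal Python sketch. It is only an illustration: the sensor readings, the `temperature_c` field and the 40 degree threshold are all assumptions of mine, and the `stored` list simply stands in for a database.

```python
from typing import Iterable, Iterator

# Hypothetical rule: only readings above this temperature are worth storing.
TEMPERATURE_ALERT_C = 40.0


def keep_important(readings: Iterable[dict]) -> Iterator[dict]:
    """Look at each reading as it arrives and pass on only the important ones."""
    for reading in readings:
        if reading["temperature_c"] > TEMPERATURE_ALERT_C:
            yield reading


# Made-up sample data standing in for a live feed.
incoming = [
    {"sensor": "s1", "temperature_c": 21.5},
    {"sensor": "s2", "temperature_c": 47.0},
    {"sensor": "s1", "temperature_c": 22.0},
    {"sensor": "s3", "temperature_c": 55.2},
]

# Stand-in for the database insert: only the filtered readings are kept.
stored = list(keep_important(incoming))
print(stored)
```

Because `keep_important` is a generator, each reading is examined once and then forgotten unless it passes the check, so the unwanted data never has to be stored.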

Real time analytics tools

i) IBM InfoSphere Streams
ii) Apache Spark
iii) Apache Storm

IBM InfoSphere Streams is a core product of IBM which focuses on real-time analysis of big data. The aim is to analyze the data in real time and come out with meaningful conclusions. It works on the principle of a graph. Just as a graph is a set of vertices and edges, an application here is a set of operators (the vertices) connected by streams (the edges). We write the code in the operators, and tuples flow along the streams. A tuple is nothing but a row of data. There are different types of operators, each with a specific function.
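The graph idea can be sketched in plain Python. This is not InfoSphere Streams code, only a rough analogy under my own assumptions: each operator is a generator function, each stream is the iterator passed from one operator to the next, and the `Reading` tuple type, the two operators and the `connect` helper are hypothetical names. The graph here is a simple chain; a real application can branch and merge.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """One tuple: a single row of data flowing along a stream."""
    sensor_id: str
    value: float


Stream = Iterable[Reading]
Operator = Callable[[Stream], Iterator[Reading]]


def drop_negative(stream: Stream) -> Iterator[Reading]:
    """Operator (vertex): pass on only tuples with a non-negative value."""
    for t in stream:
        if t.value >= 0:
            yield t


def double_values(stream: Stream) -> Iterator[Reading]:
    """Operator (vertex): emit a new tuple with the value doubled."""
    for t in stream:
        yield Reading(t.sensor_id, t.value * 2)


def connect(source: Stream, *operators: Operator) -> Iterator[Reading]:
    """Wire operators in a chain; each hand-off between two operators is a stream (edge)."""
    stream = source
    for op in operators:
        stream = op(stream)
    return iter(stream)


source = [Reading("a", 1.5), Reading("b", -2.0), Reading("c", 4.0)]
result = list(connect(source, drop_negative, double_values))
print(result)
```

Each tuple travels through the whole chain one at a time, which is the "data in motion" behaviour described earlier.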


Source Type : Any outside data first comes into this operator. This is the entry point for data. It is capable of interacting with external devices, so it is the meeting point between software and hardware. This operator parses the external data and creates tuples from it.

Sink Type : The main work of this operator is to load the data into a database.

Filter : It does the tuple filtering. Tuples which do not meet the criteria are dropped.

Punctor : A Punctor operator inserts punctuation into the output stream based on a user-supplied condition.

Aggregate : An Aggregate operator is used for grouping and summarizing incoming tuples.

Join : A Join operator is used for correlating two streams.

Sort : A Sort operator is used for imposing an order on incoming tuples.
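To give a feel for what the Punctor, Aggregate, Join and Sort operators described above do, here is a small Python sketch. It imitates the ideas only and is not the product's API: the `PUNCT` marker, the function names and the sample data are all hypothetical, and I am assuming that the aggregate summarizes whatever arrived between two punctuations.

```python
PUNCT = object()  # hypothetical marker standing in for a punctuation


def punctor(stream, every):
    """Insert a punctuation after every `every` tuples (the user-supplied condition)."""
    for count, t in enumerate(stream, start=1):
        yield t
        if count % every == 0:
            yield PUNCT


def aggregate(stream):
    """Summarize the tuples seen between punctuations as (count, total amount)."""
    count, total = 0, 0
    for item in stream:
        if item is PUNCT:
            if count:
                yield (count, total)
            count, total = 0, 0
        else:
            count += 1
            total += item["amount"]
    # Tuples after the last punctuation are not summarized in this sketch.


def join(left, right, key):
    """Correlate two finite streams on a shared key."""
    index = {}
    for l in left:
        index.setdefault(l[key], []).append(l)
    for r in right:
        for l in index.get(r[key], []):
            yield {**l, **r}


def sort_batch(tuples, key):
    """Impose an order on a finite batch of tuples."""
    return sorted(tuples, key=lambda t: t[key])


# Made-up sample data.
payments = [
    {"user": "u1", "amount": 10},
    {"user": "u2", "amount": 5},
    {"user": "u1", "amount": 7},
    {"user": "u3", "amount": 1},
]
users = [{"user": "u1", "city": "Pune"}, {"user": "u2", "city": "Delhi"}]

summaries = list(aggregate(punctor(payments, every=2)))
joined = list(join(users, payments, "user"))
ordered = sort_batch(payments, "amount")
print(summaries, joined, ordered, sep="\n")
```

A stream never ends, so operators like Aggregate and Sort need something to tell them where one group of tuples stops. In this sketch the punctuation plays that role.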


So we have a source operator and a sink operator. The source interacts with the outside world and gets the required data from hardware or from a file.
The sink loads the final data into the database.
In between we have different operators, linked with each other via edges, which in our case are known as streams. All the data flows through these streams.
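The source, operator, sink flow described above can be sketched end to end in Python. Again this is only an analogy and not InfoSphere Streams code: the CSV text, the vehicle numbers, the 80 km/h limit and the `speeding` table are assumptions of mine, and an in-memory SQLite database stands in for the real one.

```python
import csv
import io
import sqlite3

# Hypothetical input; in practice the source could read from a device or a file.
RAW = "vehicle,speed_kmh\nKA01,42\nKA02,95\nKA03,61\nKA04,110\n"


def source(text):
    """Source operator: parse outside data and create tuples (dicts here)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"vehicle": row["vehicle"], "speed_kmh": int(row["speed_kmh"])}


def speeding(stream, limit):
    """Intermediate operator: keep only vehicles above the speed limit."""
    for t in stream:
        if t["speed_kmh"] > limit:
            yield t


def sink(stream, conn):
    """Sink operator: load the final tuples into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS speeding (vehicle TEXT, speed_kmh INTEGER)"
    )
    conn.executemany("INSERT INTO speeding VALUES (:vehicle, :speed_kmh)", stream)
    conn.commit()


conn = sqlite3.connect(":memory:")
sink(speeding(source(RAW), limit=80), conn)

rows = conn.execute(
    "SELECT vehicle, speed_kmh FROM speeding ORDER BY vehicle"
).fetchall()
print(rows)
```

Only the tuples that survive the intermediate operator ever reach the database, which is the whole point of analyzing the data while it is still in motion.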

Some cases where real time analytics of data is useful
i) Crime detection and prevention
ii) Stock Market - In the stock market, trading happens so fast that a fraction of a second changes
    everything. Here if we analyse the pattern in real time then we can generate meaningful
iii) Telecommunication - Nowadays the world is so densely connected that it becomes a headache for
     the companies to manage their CDRs (call detail records). One can imagine the vast quantity of data present in the CDRs.
     Not all of that data is relevant. So in order to store it efficiently, InfoSphere Streams can be
     used. It will parse all the details and remove the irrelevant ones.
iv) Health monitoring - The system can also be used for proper monitoring of health. Data from
    devices can be monitored and studied in order to find out if the patient is suffering from some
v) Transportation - Real-time data about the movement of buses or other vehicles can be made available, and
    customers can benefit from it.
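For the telecommunication case in the list above, parsing CDRs and removing the irrelevant details might look roughly like this in Python. The record layout, the field names and the rule that zero-length calls are irrelevant are all assumptions made for this sketch. Real CDR formats differ from operator to operator.

```python
# Hypothetical CDR layout: caller,callee,start,duration_s,cell_id,codec
RELEVANT_FIELDS = ("caller", "callee", "duration_s")


def parse_cdr(line):
    """Parse one comma-separated CDR line into a dict."""
    caller, callee, start, duration, cell_id, codec = line.strip().split(",")
    return {
        "caller": caller,
        "callee": callee,
        "start": start,
        "duration_s": int(duration),
        "cell_id": cell_id,
        "codec": codec,
    }


def clean_cdrs(lines):
    """Drop irrelevant records and keep only the relevant fields of the rest."""
    for line in lines:
        record = parse_cdr(line)
        if record["duration_s"] > 0:  # assumption: zero-length calls are irrelevant
            yield {field: record[field] for field in RELEVANT_FIELDS}


# Made-up sample records.
lines = [
    "9001,9002,2016-03-03T10:00:00,125,cell-17,AMR",
    "9003,9004,2016-03-03T10:00:05,0,cell-02,AMR",
    "9005,9001,2016-03-03T10:00:09,42,cell-17,EVS",
]

cleaned = list(clean_cdrs(lines))
print(cleaned)
```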

InfoSphere Streams and IoT (Internet of Things)

IoT is one of the technologies of the future. Many companies are investing heavily in this field. Streaming technology can be used to implement IoT.

For a successful implementation of IoT two things are required: the system must be capable of handling large amounts of data, and it must be capable of communicating with hardware. InfoSphere Streams qualifies on both counts, so it can be one of the technologies with which IoT is implemented.

Let me give an example of IoT:

With the onset of IoT everything will become smart, so we will have a smart chair. I can find out from anywhere in the world whether someone has occupied my chair. For this we give the chair a unique ID and connect it to a network. We use a sensor, such as a pressure sensor, to determine whether someone is sitting on it. The pressure sensor generates a reading at a fixed interval of time. Our Source operator communicates with the sensor and generates the required tuples, which are then parsed to find out whether someone is occupying the chair. So from anywhere in the world we can tell if someone has occupied my chair.
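The smart chair can be sketched in Python as well. Everything specific here is assumed for illustration: the chair ID, the 5 second sampling interval, the 20 kg threshold and the sample pressure values. A real deployment would read from the actual sensor instead of a list.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

CHAIR_ID = "chair-001"          # hypothetical unique ID of the chair
OCCUPIED_THRESHOLD_KG = 20.0    # assumption: above this, someone is sitting


class PressureTuple(NamedTuple):
    chair_id: str
    second: int
    pressure_kg: float


def source(samples: Iterable[float], interval_s: int = 5) -> Iterator[PressureTuple]:
    """Source operator: turn raw sensor samples into tuples at a fixed interval."""
    for i, pressure in enumerate(samples):
        yield PressureTuple(CHAIR_ID, i * interval_s, pressure)


def occupancy_changes(stream: Iterable[PressureTuple]) -> Iterator[Tuple[int, bool]]:
    """Emit (second, occupied) only when the occupied state changes."""
    previous = None
    for t in stream:
        occupied = t.pressure_kg >= OCCUPIED_THRESHOLD_KG
        if occupied != previous:
            yield (t.second, occupied)
            previous = occupied


# Made-up readings: empty, empty, someone sits down, stays, stays, leaves.
samples = [0.0, 0.3, 62.5, 63.1, 61.8, 0.2]
events = list(occupancy_changes(source(samples)))
print(events)
```

Emitting only the changes keeps the output small: the sensor reports continuously, but the only thing worth sending across the network is when the chair becomes occupied or free.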


