Posts

Showing posts with the label tutorial

Hadoop Distributed File System

Hadoop Distributed File System (HDFS) is the file system that Hadoop uses to store data, and the way it stores that data is distinctive. When a file is saved in HDFS, it is first split into blocks, with any remaining data occupying the final block. The block size depends on how HDFS is configured. The default was 64 megabytes (MB) in Hadoop 1.x; since Hadoop 2.x the default is 128 MB per block, which improves performance for larger files. Each block is then sent to a different data node and written to the hard disk drive (HDD). As the first data node writes the block to disk, it forwards the data to a second data node, which writes it as well. The second data node in turn forwards the data to a third data node. The third node confirms completion of the write to the second, which then confirms to the first. The NameNode is then notified and the block write is complete. After all blocks are w...
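The block splitting and the three-node write described above can be sketched in a few lines of Python. This is only an illustrative model, not HDFS code: the function names `split_into_blocks` and `replicate_block`, the node names `dn1` to `dn3`, and the event strings are all hypothetical, and the 128 MB default is an assumption that matches a Hadoop 2.x or later configuration.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the size in bytes of each block for a file of `file_size` bytes.

    Hypothetical helper: every block is full except possibly the last,
    which holds whatever data remains.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def replicate_block(pipeline: List[str]) -> List[str]:
    """Simulate one block moving down a replication pipeline.

    Hypothetical model: each node writes the block in order, then
    acknowledgements travel back from the last node to the first,
    and the first node's completion is reported to the NameNode.
    """
    if not pipeline:
        raise ValueError("pipeline must contain at least one data node")
    events = [f"{node}: write" for node in pipeline]
    # Acks flow in reverse: last node to its upstream neighbour, and so on.
    for downstream, upstream in zip(reversed(pipeline), reversed(pipeline[:-1])):
        events.append(f"{downstream} -> {upstream}: ack")
    events.append(f"{pipeline[0]} -> NameNode: block complete")
    return events
```

For example, `split_into_blocks(300 * MB)` yields two full 128 MB blocks followed by one 44 MB block, and `replicate_block(["dn1", "dn2", "dn3"])` lists three writes, two acknowledgements travelling back up the pipeline, and the final notification to the NameNode.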

Operational Vs Analytical : Big Data Technology

Big Data relies on two classes of technology: operational and analytical. Operational capabilities include capturing and storing data in real time, whereas analytical capabilities include complex analysis of all the data. The two are complementary and are therefore deployed together. Because operational and analytical technologies have different requirements, different architectures have evolved to address them. Operational systems include NoSQL databases, which respond to many concurrent requests. Analytical systems focus on complex queries that touch almost all the data. Both systems work in tandem and manage hundreds of terabytes of data spanning billions of records.

Operational Big Data

For operational Big Data, NoSQL is generally used. It was developed to address the shortcomings of traditional databases; it is faster and can handle large quantities of data spread over multiple servers. We are also using cloud compu...