TECHNOLOGIES

A MAPREDUCE SOLUTION FOR HANDLING LARGE DATA EFFICIENTLY

  • 1 Department of Computer Science, University of Chemical Technology and Metallurgy, Sofia, Bulgaria

Abstract

Big data is large-volume, heterogeneous, distributed data. In big data applications, where data collection grows continuously, it is expensive to capture, manage, extract, and process data with existing software tools. As the volume of data in a data warehouse increases, data analysis also becomes expensive. In recent years, a number of computation- and data-intensive scientific data analyses have been established. To perform large-scale data mining analyses that meet the scalability and performance requirements of big data, several efficient parallel and concurrent algorithms have been applied. Big data processing frameworks rely on cluster computers and the parallel execution framework provided by MapReduce. MapReduce is a parallel programming model and an associated implementation for processing and generating large data sets. In this paper, we examine MapReduce: we use a MapReduce solution for handling large data efficiently and discuss its advantages, its disadvantages, and how it can be integrated with other technologies.
