The importance of Big Data and Big Data Mining has grown significantly in recent years. Electronic sources such as social networks, e-commerce sites, e-mails, and sensors generate large amounts of structured and unstructured numerical and text data. This data carries valuable information about customers' preferences and their ratings of products or commodities, and sentiment analysis of this data is essential for making predictions from it. Sentiment analysis of large amounts of text data requires specialized big data and machine learning (ML) libraries. This paper proposes the implementation of a system for big data sentiment analysis using ML algorithms. It is based on two ML classification algorithms for text analysis: Naïve Bayes and Support Vector Machines (SVM). The system is implemented in Java and uses the Apache Spark ML libraries, which are flexible, fast, and scalable. The system is tested on a well-known Amazon dataset, and its performance is measured in terms of accuracy. The results confirm the effectiveness of the big data sentiment analysis algorithms. The system can be applied to recommending products and services or to predicting customers' needs.
Author: Al-Barznji K.
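The Naïve Bayes classification and accuracy measurement mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: it uses plain Java rather than Apache Spark ML, and the class name `NaiveBayesSketch`, the tokenizer, the add-one (Laplace) smoothing, and the toy reviews are all assumptions made for illustration, not details taken from the paper or from the Amazon dataset.

```java
import java.util.*;

// Illustrative multinomial Naive Bayes for text labels (hypothetical sketch,
// not the Spark ML implementation used in the paper).
class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumed tokenizer: lowercase, split on anything that is not a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Update per-label document and word counts with one labeled text.
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
            wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label with the highest log-posterior, or null if untrained.
    String predict(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: fraction of training documents with this label.
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokens) {
                int c = counts.getOrDefault(token, 0);
                // Add-one smoothing so unseen words do not zero the product.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Accuracy = correctly classified examples / all examples.
    // Each example is a two-element array: {label, text}.
    double accuracy(List<String[]> labeled) {
        if (labeled.isEmpty()) return 0.0;
        int correct = 0;
        for (String[] ex : labeled) {
            if (ex[0].equals(predict(ex[1]))) correct++;
        }
        return (double) correct / labeled.size();
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        // Made-up reviews for illustration only.
        nb.train("positive", "great product love it");
        nb.train("negative", "terrible product broke quickly");
        System.out.println(nb.predict("love this great product"));
    }
}
```

In a Spark-based system the same roles would be played by the library's feature extraction, classifier, and evaluator components. The sketch only shows the underlying computation on a single machine.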
Big data is large-volume, heterogeneous, distributed data. In big data applications, where data collection grows continuously, it is expensive to capture, manage, extract, and process data with existing software tools. As the size of the data in a data warehouse increases, data analysis also becomes expensive. In recent years, a number of computation- and data-intensive scientific data analyses have been established. To perform large-scale data mining analyses that meet the scalability and performance requirements of big data, several efficient parallel and concurrent algorithms have been applied. For data processing, big data frameworks rely on cluster computers and on the parallel execution framework provided by MapReduce. MapReduce is a parallel programming model and an associated implementation for processing and generating large data sets. In this paper, we examine MapReduce: how a MapReduce solution handles large data efficiently, its advantages and disadvantages, and how it can be integrated with other technologies.
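The MapReduce programming model described above can be sketched with the classic word-count example. This is a single-machine illustration in plain Java, assuming a parallel stream as a stand-in for cluster workers. It does not use Hadoop or any real MapReduce framework, and the class name `MapReduceSketch` and its method names are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative word count in the MapReduce style (hypothetical sketch;
// a real framework would distribute these phases across a cluster).
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : split.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(
            List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                   .add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Run the three phases over a list of input splits.
    static Map<String, Integer> run(List<String> splits) {
        // Splits are mapped independently, so they can run in parallel.
        List<Map.Entry<String, Integer>> pairs = splits.parallelStream()
            .flatMap(s -> map(s).stream())
            .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input splits for illustration only.
        System.out.println(run(Arrays.asList("big data is big", "data mining")));
    }
}
```

The point of the structure is that `map` needs only its own split and `reduce` needs only the values for its own key, which is what allows a framework to spread both phases across many machines.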