Big Data Analytics in Cloud environment using Hadoop


📝 Abstract

Managing Big Data is a pressing problem. Big Data is growing very quickly and is difficult to manage because of its various characteristics. This manuscript focuses on Big Data analytics in a cloud environment using Hadoop. We classify Big Data according to its characteristics: Volume, Value, Variety and Velocity. We set up separate nodes to process the data based on its volume, velocity, value and variety. In this work we classify the input data and route it to the appropriate processing node. Finally, after each node has finished processing, we combine the outputs of all nodes to obtain the final result. We use Hadoop both to partition the data and to process it.
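The classify, route, process and combine steps described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use Hadoop itself: the thresholds `LARGE_BYTES` and `FAST_EVENTS_PER_SEC`, the record fields `size_bytes`, `events_per_sec` and `structured`, and all function names are hypothetical, chosen only to make the flow concrete.

```python
from collections import defaultdict

# Hypothetical thresholds and field names; the paper does not specify them.
LARGE_BYTES = 1_000_000      # assumed cutoff for "high volume"
FAST_EVENTS_PER_SEC = 100    # assumed cutoff for "high velocity"


def classify(record):
    """Return the name of the processing node for one input record."""
    if record["size_bytes"] >= LARGE_BYTES:
        return "volume"
    if record["events_per_sec"] >= FAST_EVENTS_PER_SEC:
        return "velocity"
    if not record["structured"]:
        return "variety"
    return "value"


def route(records):
    """Group records by the node that should process them."""
    queues = defaultdict(list)
    for record in records:
        queues[classify(record)].append(record)
    return queues


def process(node, batch):
    """Stand-in for per-node work: count records and total bytes."""
    return {
        "node": node,
        "count": len(batch),
        "bytes": sum(r["size_bytes"] for r in batch),
    }


def combine(partials):
    """Merge the per-node outputs into one final result."""
    return {
        "count": sum(p["count"] for p in partials),
        "bytes": sum(p["bytes"] for p in partials),
        "per_node": {p["node"]: p["count"] for p in partials},
    }


def run(records):
    """Classify and route the input, process each batch, then combine."""
    queues = route(records)
    return combine([process(node, batch) for node, batch in queues.items()])
```

In a real deployment each batch would be handed to a separate Hadoop job or node rather than processed in the same Python process; the sketch only shows the order of the steps.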

📄 Content

Big Data Analytics in Cloud environment using Hadoop
Mansaf Alam and Kashish Ara Shakil
Department of Computer Science, Jamia Millia Islamia, New Delhi

Keywords: Big Data, Cloud, Processing, Hadoop, Hbase.

  1. Introduction

The amount of data being produced from varied sources is increasing rapidly. This can be attributed to the instrumentation of today's society and its people, which leads to the storage and production of vast amounts of data. Because the data being produced is huge, highly varied and generated at a rapid rate, traditional systems fail to manage it, and this is what gave rise to the buzzword Big Data. Big Data is a term that refers to the explosion of a variety of data produced from disparate sources [1]. It is characterized by five features or attributes: high volume, variety, veracity, visibility and velocity. Since this kind of data is beyond the management scope of traditional systems, mining it requires analytics solutions that can help gain insights from both structured and unstructured data. At present it is instrumental to blend big data and analytics into a single entity termed Big Data Analytics. Analytics involves examining data to derive meaningful insights, such as hidden patterns and trends, that can in turn help organizations make important business decisions and develop newer business models. The data deluge poses challenges in processing data and extracting useful information from it. It also requires skills in the management and analysis of huge data sets. Cloud computing serves as a quintessential solution for handling big data and hosting big data workloads. Cloud computing has revolutionized the way in which computing resources can be utilized by providing facilities such as pay per use, rapid elasticity and dynamic scalability. It provides users with an illusion of infinite storage and compute capacity. Cloud resources can be used privately through a private cloud or shared publicly through a public cloud such as Amazon EC2 or Microsoft Azure.
The cloud therefore serves as a scalable technology with low upfront investment costs. The value proposition of using the cloud as a platform for analytics is thus quite strong, and it is well suited to scalable data analytics.
Hadoop is a technology that can be used for handling big data. It can play a significant role in opening the gates to new insights from data and can readily handle the flood of huge unstructured data sets coming from sources such as sensors, mobile devices and social media. This paper presents how Hadoop can be used as a technology on the cloud to meet users' big data needs and discusses the proposed Hadoop-based workflow for handling big data. We also present a case study of an analysis carried out on movie data to mine useful information from it, including the number of movies released in a given period and the number of movies having a certain rating, among other information. The rest of this paper is organized as follows: Section 2 presents a survey of related approaches used for big data analytics. Section 3 discusses Hadoop as a platform for meeting big data needs and requirements. Section 4 shows our proposed workflow for carrying out big data analytics. Section 5 discusses our case study on the analytics of movie data. Finally, Section 6 concludes the paper and outlines future directions.
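The two movie queries mentioned in the introduction (movies released in a given period, and movies with a certain rating) fit the MapReduce counting pattern that Hadoop provides. The following is a minimal in-memory sketch of that pattern, not the authors' code and not a Hadoop job: the sample rows in `MOVIES`, the `(title, release_year, rating)` layout and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (title, release_year, rating).
MOVIES = [
    ("Movie A", 1995, 4),
    ("Movie B", 2001, 3),
    ("Movie C", 2003, 4),
    ("Movie D", 2010, 5),
]


def map_movie(movie, start_year, end_year):
    """Map step: emit (key, 1) pairs for each query a movie matches."""
    _title, year, rating = movie
    if start_year <= year <= end_year:
        yield ("released_in_period", 1)
    yield (("rating", rating), 1)


def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_movies(movies, start_year, end_year):
    """Run map, shuffle and reduce over the movie list."""
    pairs = (
        pair
        for movie in movies
        for pair in map_movie(movie, start_year, end_year)
    )
    return reduce_counts(shuffle(pairs))


counts = count_movies(MOVIES, 2000, 2010)
print(counts["released_in_period"])  # movies released from 2000 to 2010
print(counts[("rating", 4)])         # movies rated 4
```

On a cluster, the map and reduce functions would run as distributed tasks over data stored in HDFS or HBase, with the framework performing the shuffle; here a dictionary stands in for that step.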

  2. Related Work

In [6], the authors discuss how to assist developers of Big Data Analytics applications (BDA Apps) with cloud deployments. They propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments; the approach makes use of the readily available yet rarely used execution logs from these platforms. In a case study on three representative Hadoop-based BDA Apps, they show that their approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments.

This content is AI-processed based on ArXiv data.
