Big Data Analytics in Cloud environment using Hadoop
📝 Abstract
Managing Big Data is a pressing problem. Its volume is growing rapidly, and its varied characteristics make it difficult to manage. This manuscript focuses on Big Data analytics in a cloud environment using Hadoop. We classify Big Data according to its characteristics: Volume, Value, Variety and Velocity. We set up separate nodes to process data according to these characteristics. The input data is classified and routed to the appropriate processing node. Once each node has finished processing, the outputs of all nodes are combined to produce the final result. We use Hadoop both to partition the data and to process it.
📄 Content
Mansaf Alam and Kashish Ara Shakil
Department of Computer Science, Jamia Millia Islamia, New Delhi
Keywords: Big Data, Cloud, Processing, Hadoop, HBase.
- Introduction
The amount of data being produced from varied sources is increasing rapidly.
This can be attributed to the instrumentation of today's society and its people,
which leads to the production and storage of vast amounts of data. The data being
produced is huge and highly varied, and it is produced at a rapid rate. Traditional
systems therefore fail to manage this data, which has given rise to the buzzword Big Data.
Big Data is a term which refers to the explosion of a variety of data produced from
disparate sources [1]. It is characterized by five features or attributes: high volume,
variety, veracity, visibility and velocity. Since this kind of data is beyond the
management scope of traditional systems, mining it requires analytics solutions that
can help in gaining insights from both structured and unstructured data.
At present it is important to blend big data and analytics into a single entity
termed Big Data Analytics. Analytics involves the examination of data to derive
meaningful insights, such as hidden patterns and trends, that can in turn help
organizations make important business decisions and develop newer business
models. The data deluge poses challenges in processing data and extracting
useful information from it. It also requires skills for the management and
analysis of huge data sets.
Cloud computing is a natural solution for handling big data and hosting big data
workloads. It has revolutionized the way computing resources can be utilized by
providing facilities such as pay-per-use pricing, rapid elasticity and dynamic
scalability. It gives users the illusion of infinite storage and compute capacity.
Cloud resources can be used privately through a private cloud or shared publicly
through a public cloud such as Amazon EC2 or Microsoft Azure. The cloud therefore
serves as a scalable technology with low upfront investment costs. The value
proposition of using the cloud as a platform for analytics is thus quite strong,
and it is well suited for carrying out scalable data analytics.
Hadoop is a technology that can be used for handling big data. It can play a significant
role in opening the gates to new insights from data and can handle the flood of huge
unstructured data sets coming from sources such as sensors, mobile devices and social
media.
This paper presents how Hadoop can be used as a technology on the cloud for meeting
the big data needs of users and discusses the proposed Hadoop-based workflow for
handling big data. We also present a case study of an analysis carried out on movie data
to mine useful information from it, including the number of movies released within a
given period and the number of movies having a certain rating, among other
information.
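To make the two case-study queries concrete, the following is a minimal sketch of how they could be phrased in a map/reduce style. It is plain Python that imitates the map, shuffle and reduce phases in memory; it does not use Hadoop and is not the authors' implementation. The record layout (title, release year, rating), the sample data and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (title, release_year, rating).
# The paper's real dataset and schema are not described in this section.
Movie = Tuple[str, int, float]

MOVIES: List[Movie] = [
    ("Movie A", 1995, 4.0),
    ("Movie B", 2001, 3.5),
    ("Movie C", 2003, 4.0),
    ("Movie D", 2010, 2.0),
]


def map_release_window(movie: Movie, start: int, end: int) -> Iterator[Tuple[str, int]]:
    """Emit ("released", 1) for a movie released within [start, end]."""
    _, year, _ = movie
    if start <= year <= end:
        yield ("released", 1)


def map_rating(movie: Movie) -> Iterator[Tuple[float, int]]:
    """Emit (rating, 1) so that the reduce step can count movies per rating."""
    _, _, rating = movie
    yield (rating, 1)


def run_job(records: Iterable[Movie], mapper: Callable) -> Dict:
    """Apply the mapper, group values by key (shuffle), then sum them (reduce)."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


# Query 1: number of movies released between 1990 and 2005 (inclusive).
released = run_job(MOVIES, lambda m: map_release_window(m, 1990, 2005))

# Query 2: number of movies for each rating value.
by_rating = run_job(MOVIES, map_rating)

print(released.get("released", 0))  # 3 for the sample data above
print(by_rating.get(4.0, 0))        # 2 for the sample data above
```

On a real cluster the mapper and the summing reducer would run as separate tasks over partitions of the input, but the key/value logic would follow the same pattern.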
The rest of this paper is organized as follows. Section 2 presents a survey of related
approaches used for big data analytics. Section 3 discusses Hadoop as a platform for
meeting big data needs and requirements. Section 4 shows our proposed workflow for
carrying out big data analytics. Section 5 discusses our case study on the analytics of
movie data. Finally, Section 6 concludes the paper and outlines future directions.
- Related Work
In [6], the authors discuss assisting developers of Big Data Analytics (BDA) Apps with cloud deployments. They propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Their approach makes use of the readily available yet rarely used execution logs from these platforms. In a case study on three representative Hadoop-based BDA Apps, they show that their approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments.