In today's competitive business world, awareness of customer needs and market-oriented production is a key success factor for industries. To this end, efficient analytic algorithms ensure a better understanding of customer feedback and improve the next generation of products. Accordingly, the dramatic increase in the use of social media in daily life provides valuable sources for market analytics. The main challenge in this regard, however, is how traditional analytic algorithms and methods can scale up for such disparate and multi-structured data sources. This paper presents and discusses the technological and scientific focus of SoMABiT, a social media analysis platform based on big data technology. Sentiment analysis is employed to discover knowledge from social media. The main novelty of the proposed concept, in comparison to state-of-the-art technologies, is the use of MapReduce and the development of a distributed algorithm towards an integrated platform that can scale to any data volume and provide social-media-driven knowledge.
Scalable Decision Making (SDM) in data-intensive applications is a key challenge for enterprises dealing with big data. The complexity and expansion of resources and streams, as well as the velocity and variety of the data, have led to difficulties in enterprise asset management [1]. In this context, scalable algorithms and semantic technologies for the dynamic, economic and efficient management of information could facilitate the transition from the current challenging situation towards scalable and adaptive decision support systems in enterprise practice. As an example, in air traffic management, with data-intensive activities such as detecting a conflict between two arriving aircraft, a key challenge is to semantically integrate, analyze and interpret data such as trajectories together with flight plans, so that they can be used directly for the negotiation and resolution of conflicts [2]. The most important factors in such a scenario are time, cost and scalability.
Applications should not only be nearly on-time and cost-effective, but also able to scale up as data and processing demands grow, and be accessible via different types of platforms at any time.
On the other hand, social media provides a massive amount of data, especially user-generated content, that can be used for opinion mining for a wide variety of purposes. According to Nielsen's report, consumer-generated product reviews and ratings are the most preferred source of information among social media users [3]. Accordingly, social media analysis provides product improvement recommendations as well as smart and novel ideas for the next generation of products. It also enables more efficient marketing methods for enterprises in today's competitive business world. These applications are not limited to industry- and marketing-oriented purposes, but extend to social, medical and political goals as well.
However, the main challenges here are how to collect the data from social media, which data sources can be useful for specific goals, how to analyze the collected data and discover useful knowledge from it, and how to provide scalable algorithms that can process the large volume and variety of data sources available in social media. For instance, according to reports released in February 2015, every day Twitter produces about 500 million tweets, Facebook produces 2.5 billion pieces of content (100 terabytes), and 144,000 hours of video are uploaded to YouTube.
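The scalability challenge raised above is commonly addressed with the MapReduce pattern mentioned in the abstract: a map phase emits key-value pairs in parallel, a shuffle phase groups them by key, and a reduce phase aggregates each group. The following is a minimal single-process sketch of this pattern applied to word counting over tweets; the function names and sample data are illustrative, not part of SoMABiT.

```python
from collections import defaultdict

def map_phase(tweet):
    # Emit (word, 1) pairs for every token in a tweet.
    return [(word.lower(), 1) for word in tweet.split()]

def shuffle(pairs):
    # Group intermediate pairs by key, as the MapReduce runtime would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one key.
    return key, sum(values)

tweets = ["great product", "great support great product"]
intermediate = [pair for t in tweets for pair in map_phase(t)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# counts["great"] == 3, counts["product"] == 2
```

In a real distributed deployment the map and reduce calls run on different nodes, and the shuffle is performed by the framework over the network; the per-key independence of the reduce step is what lets the computation scale with data volume.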
Managers and key decision makers in various industrial and business sectors are involved in the continuous supervision of huge amounts of information, which has to be collected from social media and analyzed for predictions, judgments, evaluations, strategic plans and actions. In most cases, the decisions that have to be made are subject to strict restrictions regarding available resources and requested response times. Moreover, decision makers usually encounter sudden, unexpected and urgent events that can easily lead to dangerous situations threatening the safety and reliability of the whole system. The question of how to scale up or down for acceleration or deceleration imposes additional complexity on today's traditional decision-making methods [4]. This issue becomes even more challenging when there are no reliable estimates of future growth in data volumes and service demands.
Managing huge amounts of heterogeneous data has recently emerged as a key challenge in many computing applications. In addition to traditional Relational Database Management Systems (RDBMS), so-called NoSQL databases have appeared as high-performance alternatives, providing document-oriented storage for semi-structured or unstructured data [5]. These databases can also be deployed across many nodes and allow redundancy levels to be adjusted as required by the application. However, the right choice of database management system and its correct parameterization according to the data and the data processing requirements of a specific application are not yet fully understood in the big data era [4]. Many aspects have to be taken into account, such as the type of data assets, the access patterns in the data, the desired level of redundancy and availability, the isolation level of distributed nodes, and many more.
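To illustrate the schema flexibility that document-oriented storage offers over an RDBMS, the following minimal in-memory sketch stores heterogeneous social-media records in one collection without a fixed schema. No specific NoSQL product is assumed; the `insert`/`find` interface is hypothetical.

```python
# Minimal in-memory document store: documents are schema-free dicts,
# so a tweet and a video record can live in the same collection.
collection = {}

def insert(doc_id, doc):
    collection[doc_id] = doc

def find(predicate):
    # Full-collection scan; a real document store would use indexes.
    return [d for d in collection.values() if predicate(d)]

insert("t1", {"type": "tweet", "text": "great phone", "retweets": 12})
insert("v1", {"type": "video", "title": "review", "duration_s": 310})

tweets = find(lambda d: d.get("type") == "tweet")
# len(tweets) == 1
```

In an RDBMS the two record types would need either separate tables or a sparse common schema; a document store accepts both shapes directly, which is what makes it attractive for the multi-structured data described above.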
The integration of heterogeneous data sources presents another key challenge. Current approaches to joining diverse data sources and creating an abstraction layer for unified data access are often the result of ad-hoc development [6]. A comprehensive methodology for creating data federations over diverse data sources that is applicable across different domains is still missing. This is particularly critical when static and dynamic data sources have to be combined to create new insights from the data, especially when social media sources are involved.
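One common way to build such an abstraction layer is to wrap each source behind an adapter that normalizes its records into a shared format, so that queries never see source-specific structure. The sketch below assumes only that each source can be enumerated and mapped to a common dictionary shape; the adapter and query names are illustrative.

```python
from typing import Callable, Dict, Iterable

def make_adapter(fetch: Callable[[], Iterable],
                 normalize: Callable[[object], Dict]):
    # Wrap a heterogeneous source behind a uniform record interface.
    def adapter():
        for raw in fetch():
            yield normalize(raw)
    return adapter

# A static source (e.g. rows from an RDBMS) and a dynamic one
# (e.g. JSON messages from a social media API).
static_rows = [("p1", "phone")]
stream_msgs = [{"id": "t1", "body": "love my phone"}]

sources = [
    make_adapter(lambda: static_rows, lambda r: {"id": r[0], "text": r[1]}),
    make_adapter(lambda: stream_msgs, lambda m: {"id": m["id"], "text": m["body"]}),
]

def federated_query(keyword):
    # Unified access: the caller never touches source-specific formats.
    return [rec for src in sources for rec in src() if keyword in rec["text"]]
# federated_query("phone") returns two records, one from each source
```

The design choice here is that normalization happens at the adapter boundary, so adding a new source only requires a new `normalize` function rather than changes to every query.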
NoSQL database management systems are characterized by not using SQL as a query language, or at least by not using fully functional structured queries.
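To make the contrast concrete, the following sketch shows how a lookup that would be a declarative SQL statement against an RDBMS is typically expressed as a programmatic filter over documents in a document store. The in-memory collection is illustrative only.

```python
# SQL equivalent against an RDBMS: SELECT text FROM posts WHERE likes > 100;
posts = [
    {"text": "new release", "likes": 250},
    {"text": "old news", "likes": 4},
    {"text": "big update"},  # schema-free: the 'likes' field may be absent
]

# Document-store style: filter documents, tolerating missing fields.
popular = [p["text"] for p in posts if p.get("likes", 0) > 100]
# popular == ["new release"]
```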