Big Data and Social Media: The Big Shift
Below is the article recently published in Today Software Magazine no. 23/2014: “Big Data and Social Media: The Big Shift”.
As social media platforms have spread through our lives, the amount of data exchanged across them has surged. We write texts describing an idea, an opinion or a fact; we upload images and videos; we share our preferences with simple buttons (“like”, “favorite”, “follow”, “share”, “pin”, etc.); we accept into our networks people we know very well in real life and people we have never met and probably never will... and everything goes into the network almost in real time!
Suddenly we realize that the data handled in a given amount of time is measured in exabytes. This data is not only huge in volume, but also extremely diverse, and it moves at incredible speed. The information it contains is practically immeasurable. In fact, Facebook, Twitter and Pinterest can see when you fall in love, what your mood is, where you are and many other behaviors you choose to reveal.
The question is: what can we do with this massive amount of data created through social media?
According to an IBM report based on sources including the McKinsey Global Institute, Twitter, Cisco, EMC, SAS, MEPTEC and QAS, the following facts are worth paying attention to:
- Facebook ingests approximately 500 times more data each day than the New York Stock Exchange (NYSE).
- Twitter is storing at least 12 times more data each day than the NYSE.
- It’s estimated that 2.5 quintillion bytes (about 2.5 billion gigabytes) of data are created each day
- 6 billion of the world's 7 billion people have cell phones
- It is estimated that 40 zettabytes (about 40 trillion gigabytes) of data will be created by 2020 (300 times more than in 2005)
- 300 billion pieces of content are shared on Facebook every month
- 400 million tweets are sent per day by about 200 million monthly active users
- 4 billion hours of video are watched on YouTube every month
- By 2014, it’s anticipated that there will be about 420 million wearable, wireless health monitors
- NYSE captures 1 TB of trade information during each trading session
- Modern cars have close to 100 sensors that monitor items such as fuel level and tire pressure
- By 2016 it is estimated there will be 18.9 billion network connections (almost 2.5 connections per person on earth)
- 1 in 3 business leaders do not trust the information they use to make decisions
- 27 % of respondents in one survey were unsure about how much of their data was inaccurate
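The unit conversions in these figures can be checked with a few lines of arithmetic. A minimal Python sketch, assuming decimal SI prefixes (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes) and showing binary gibibytes (2^30 bytes) for comparison: 2.5 quintillion bytes comes to roughly 2.5 billion gigabytes (about 2.33 billion gibibytes), and 40 zettabytes to 40 trillion gigabytes. The helper names are illustrative.

```python
# Decimal (SI) units: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
# Binary unit for comparison: 1 GiB = 2**30 bytes.
GB = 10**9
GIB = 2**30
ZB = 10**21
QUINTILLION = 10**18

def to_gigabytes(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GB

def to_gibibytes(n_bytes: int) -> float:
    """Convert a byte count to binary gibibytes."""
    return n_bytes / GIB

daily_bytes = 25 * QUINTILLION // 10  # 2.5 quintillion bytes per day
by_2020_bytes = 40 * ZB               # 40 zettabytes by 2020

print(f"{to_gigabytes(daily_bytes):.3g} GB/day")   # 2.5e+09 GB/day (2.5 billion)
print(f"{to_gibibytes(daily_bytes):.3g} GiB/day")  # 2.33e+09 GiB/day
print(f"{to_gigabytes(by_2020_bytes):.3g} GB")     # 4e+13 GB (40 trillion)
```

Either way the daily figure is in the billions of gigabytes, not the trillions.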
What does Big Data actually mean?
At first sight, we can describe Big Data as very large and complex data sets that are impossible, or at least hard, to handle with classic data processing tools. The English expression is generally used as is; French specialists, however, currently translate it as “grosses données” (big data), “données massives” (massive data) or even “datamasse” (datamass), by analogy with “biomass”. The novelty of the concept and its blurred boundaries make the term hard to localize.
In 2012, Gartner (which had helped shape the term in the early 2000s) updated its definition: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
This definition outlines the dimensions of Big Data, the well-known 3Vs: volume, velocity and variety. Recently, a fourth V has been added to it: veracity. The great thing about this formulation is that it opens multiple perspectives on the Big Data concept: we may note a technology view, a process view and a business view.
Social Media Analytics and Big Data
One of the essential characteristics of Big Data originating from social media is that it is available in real time or near real time. This gives exploratory analysis a broad perspective on what is happening, and what is about to happen, at a given time in a given area.
Each fundamental trait of Big Data can be understood as a parameter for quantitative, qualitative and exploratory information analysis.
Social media platforms collect two types of data: structured and unstructured. In addition, the collection sources are diverse: HTM (human to machine), MTM (machine to machine) or sensor-based. For social scientists, the sheer mass of data makes it possible to define multiple classes and criteria and to refine analysis sets and subsets.
Data formats range from text documents and tables to video, audio and many more. This raises the analysis to a higher level of complexity; therefore, statistical models must also be adjusted in order to obtain meaningful information.
Speed is a key aspect when analyzing trends and real-life phenomena. The faster data is generated, shared and understood, the more information it can reveal. By analyzing how quickly a given data set spreads, one can gauge the potential impact of the information it contains on a specific social group in a defined territory. Another interesting aspect is that one can track the chain along which the data is distributed.
For the seasoned data analyst, it is essential to be able to evaluate the truthfulness, accuracy and honesty of the data being analyzed. Here the discussion revolves around the responsibility of the original data producer, the purpose for which the data is released and the reactions of its recipients.
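To make the variety point concrete, here is a minimal Python sketch of routing heterogeneous records to format-specific extractors. The `Record` shape, the format names and the extractor functions are hypothetical, chosen only for illustration; real pipelines would rely on proper parsers and media-analysis tools for the unstructured formats.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Record:
    source: str   # hypothetical: "HTM", "MTM" or "sensor", as in the text
    fmt: str      # hypothetical format tag: "text", "table", "video", ...
    payload: Any

def text_features(payload: str) -> dict:
    # Unstructured text: derive simple features (word count, hashtags).
    words = payload.split()
    return {"words": len(words), "hashtags": [w for w in words if w.startswith("#")]}

def table_features(payload: dict) -> dict:
    # Structured row: the fields are already explicit, so copy them.
    return dict(payload)

def media_features(payload: dict) -> dict:
    # Audio/video placeholder: keep only metadata such as duration.
    return {"duration_s": payload.get("duration_s")}

EXTRACTORS: Dict[str, Callable[[Any], dict]] = {
    "text": text_features,
    "table": table_features,
    "video": media_features,
    "audio": media_features,
}

def extract(records):
    """Route each record to its format's extractor; count unhandled formats."""
    features = defaultdict(list)
    skipped = Counter()
    for rec in records:
        handler = EXTRACTORS.get(rec.fmt)
        if handler is None:
            skipped[rec.fmt] += 1
        else:
            features[rec.fmt].append(handler(rec.payload))
    return features, skipped

records = [
    Record("HTM", "text", "Loving the #bigdata talk #tsm"),
    Record("sensor", "table", {"tire_pressure": 2.3}),
    Record("HTM", "video", {"duration_s": 42}),
    Record("MTM", "image", {"width": 640}),
]
features, skipped = extract(records)
print(features["text"])  # [{'words': 5, 'hashtags': ['#bigdata', '#tsm']}]
print(dict(skipped))     # {'image': 1}
```

Keeping a table of handlers per format means a new format only needs a new extractor, and unknown formats are counted rather than silently dropped.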
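A minimal Python sketch of the two velocity measures described here: a shares-per-hour histogram for one post, and the distribution chain from the original author to a given user. The share-log layout (`user`, `shared_from`, seconds since publication) and the sample data are hypothetical; the chain walk assumes each user appears once in the log.

```python
from collections import Counter

# Hypothetical share log for one post: (user, shared_from, seconds since publication).
# shared_from is None for the original author.
shares = [
    ("ana", None, 0),
    ("bob", "ana", 600),
    ("cris", "ana", 1500),
    ("dan", "bob", 4000),
    ("eva", "dan", 7300),
    ("filip", "cris", 7400),
]

def shares_per_hour(events):
    """Velocity: number of re-shares falling in each hour since publication."""
    return Counter(t // 3600 for _, parent, t in events if parent is not None)

def chain_to(user, events):
    """Distribution chain: follow shared_from links back to the origin."""
    parent_of = {u: p for u, p, _ in events}
    path = [user]
    while parent_of.get(path[-1]) is not None:
        if len(path) > len(parent_of):
            raise ValueError("share log contains a cycle")
        path.append(parent_of[path[-1]])
    return path[::-1]

print(sorted(shares_per_hour(shares).items()))  # [(0, 2), (1, 1), (2, 2)]
print(chain_to("eva", shares))                  # ['ana', 'bob', 'dan', 'eva']
```

Comparing such histograms across posts, or across regions, is one simple way to compare how fast different pieces of information spread.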
Big Data Management
One of the biggest challenges at present is building the proper tools and systems to manage big data. Since real-time or near-real-time information delivery is one of the key features of big data analytics, research aims to set up database management systems that can meet these new requirements.
Technology currently under development covers the following areas:
Storage: For storing and retrieving data, the underlying NoSQL developments are best represented by MongoDB, DynamoDB, Couchbase, Cassandra, Redis and Neo4j. They are currently known as the best-performing document, key-value, column, graph and distributed databases.
Software: The Apache Hadoop ecosystem includes the distributions from Cloudera, Hortonworks and MapR. Their main goal is to extend the use of big data platforms to a broader and more diverse range of users. Second, these technologies focus on making big data platforms more reliable, easier to manage and better performing.
Data Exploration and Discovery: Big data analytic discovery is a hot research and innovation topic. Major developments have been made by Datameer, Hadapt, Karmasphere, Platfora and Splunk.
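The Storage paragraph names databases built on different data models. As a rough illustration only, the sketch below shapes one hypothetical social post for the document, key-value, column-family and graph models using plain Python structures; it calls no database client, and the keys and fields are invented for the example.

```python
# One hypothetical social post, shaped for four NoSQL data models.
# Plain Python structures only; no database client APIs are used.

# Document model (e.g. MongoDB, Couchbase): one self-contained nested record.
document = {
    "_id": "post:1",
    "author": {"id": "u:ana", "name": "Ana"},
    "text": "Big Data talk tonight #bigdata",
    "likes": ["u:bob", "u:cris"],
}

# Key-value model (e.g. Redis, DynamoDB): opaque values looked up by key.
kv_store = {
    "post:1:text": document["text"],
    "post:1:like_count": len(document["likes"]),
}

# Column-family model (e.g. Cassandra): rows keyed by a partition key; within a
# row, columns are ordered by a clustering key (here a timestamp, listed in order).
likes_by_post = {
    "post:1": {1700000000: "u:bob", 1700000060: "u:cris"},
}

# Graph model (e.g. Neo4j): nodes plus typed, directed edges.
nodes = {"u:ana", "u:bob", "u:cris", "post:1"}
edges = [
    ("u:ana", "WROTE", "post:1"),
    ("u:bob", "LIKED", "post:1"),
    ("u:cris", "LIKED", "post:1"),
]

def likers(post_id):
    """Graph-style query: which nodes have a LIKED edge to this post?"""
    return sorted(src for src, rel, dst in edges if rel == "LIKED" and dst == post_id)

print(kv_store["post:1:like_count"])  # 2
print(likers("post:1"))               # ['u:bob', 'u:cris']
```

The same facts end up stored very differently; which shape fits best depends on the queries the application needs to answer quickly.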
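The Hadoop platforms listed under Software are built around the MapReduce programming model. The sketch below simulates its map, shuffle and reduce phases in a single Python process to count hashtags; it is not Hadoop's Java API, and in a real cluster the framework distributes these phases across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(post: str):
    """Mapper: emit (hashtag, 1) for every hashtag in one post."""
    for word in post.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one hashtag."""
    return key, sum(values)

def hashtag_counts(posts):
    intermediate = chain.from_iterable(map_phase(p) for p in posts)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

posts = [
    "Reading about #BigData and #NoSQL",
    "#bigdata meetup tonight",
    "New #Hadoop cluster is up #bigdata",
]
print(hashtag_counts(posts))  # {'#bigdata': 3, '#nosql': 1, '#hadoop': 1}
```

Because each mapper call looks at one post and each reducer call at one key, both phases can run in parallel on separate machines, which is what lets the model scale.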
At an entirely new scale, the way data is captured, stored, searched, distributed, analyzed and visualized must be redefined. The prospects opened by handling big data are enormous and still largely unsuspected!
Frequently cited possibilities include exploring information shared in the media, acquiring and assessing knowledge, analyzing trends and issuing forecasts, and managing risks of all kinds (commercial, insurance, industrial, natural) as well as phenomena of all kinds (social, political, religious, etc.). In geodynamics, meteorology, medicine and other exploratory fields, big data is expected to improve the way processes are deployed and data is interpreted.
The Big Shift
To answer our initial question, the best thing we can do with this mass of data is to EXPLORE it.
Simple as it may seem, this statement has deep implications for the way we will approach data analysis in the near future. The model is shifting from the traditional one, in which we plan, collect data and then analyze it, to a new one in which we collect everything first and then try to find significant patterns.
The new analysis model has its own risks, but it also opens the way for a new generation of data analysts and scientists. At this point, I believe this is the main impact social media has had on the way we see Big Data.
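The “collect everything, then look for patterns” model can be sketched in Python as a simplified, pairs-only form of frequent-itemset counting: no hypothesis is fixed in advance, every co-occurring hashtag pair is counted, and only pairs that reach a minimum support are kept. The `min_support` threshold and the sample posts are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def hashtags(post):
    """Distinct lower-cased hashtags of one post, sorted so pairs are canonical."""
    return sorted({w.lower() for w in post.split() if w.startswith("#") and len(w) > 1})

def frequent_pairs(posts, min_support=2):
    """Count every hashtag pair co-occurring in a post; keep pairs seen >= min_support times."""
    counts = Counter()
    for post in posts:
        counts.update(combinations(hashtags(post), 2))
    return {pair: n for pair, n in counts.most_common() if n >= min_support}

posts = [
    "#bigdata #hadoop workshop",
    "Scaling #hadoop for #bigdata",
    "#bigdata #nosql #hadoop notes",
    "#nosql tips",
]
print(frequent_pairs(posts))  # {('#bigdata', '#hadoop'): 3}
```

The patterns come out of the data rather than from a plan made before collection; deciding whether a frequent pair actually means anything is still the analyst's job.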
Author: Diana Ciorba.