You have probably heard talk of Big Data and may be wondering what exactly it is. This article is for readers who want to learn the basics and get an idea of what Big Data is.
Big Data is a large volume of data, whether structured or unstructured. Data captured during web searches and online shopping, and data captured by various ERP systems, can all contribute to Big Data. What matters is not the amount of data but what it is used for. Many organizations use Big Data for analytics and for planning business strategies.
Because of its volume, Big Data is often difficult to store and process using conventional methods. The Internet of Things, connected systems and e-commerce are contributing to the popularity and collection of data the very moment a transaction happens. Information such as shopping behaviour, travel patterns and customer preferences is captured and analyzed in real time, and suggestions and decisions are made on the go.
You can see “Related Products” and “Suggestions” on shopping websites like Amazon when you search for a specific product. Shopping and browsing trends are captured and used to produce such AI-driven suggestions.
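As a rough illustration of how captured shopping data could feed a "Related Products" list, here is a minimal Python sketch that counts which products are bought together. It is not how Amazon or any particular site works. The co-purchase counting technique, the function names and the sample baskets are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Use a sorted set so a product repeated in one basket is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related_products(counts, product, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    return [name for name, _ in counts[product].most_common(top_n)]


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    ["phone", "phone case", "charger"],
    ["phone", "phone case"],
    ["phone", "screen protector", "phone case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]

counts = build_co_purchase_counts(baskets)
print(related_products(counts, "phone"))
```

With these sample baskets, the phone case comes first for "phone" because it shares the most baskets with it. Real recommendation systems use far richer signals, but the principle of learning from captured behaviour is the same.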
5Vs of Big Data
The 5Vs or 5 keys that make Big Data a huge business are Volume, Velocity, Variety, Veracity and Value.
- Volume:
Organizations can capture data from various sources such as social media, smartphones, equipment, websites and IoT devices. The nature of the business decides the importance of these sources. This data must be stored for further analysis. Data storage is now less of an issue thanks to Data Lakes and Hadoop. A Data Lake is a single store of all enterprise data, both raw and transformed, used for various purposes and stored as blobs or files. Hadoop is an open-source framework for storing data and running applications on clusters of commodity hardware. Traditional Data Warehouses work on the principle of Extract, Transform and Load (ETL). They store structured data and use it for analytics and reporting. Data Lakes, on the other hand, store raw data in its native format until it is needed for further processing.
- Velocity:
Business organizations need information quickly, often in real time, for their analysis and planning.
- Variety:
Data can be captured in both structured and unstructured formats.
- Veracity:
The quality of data is often referred to as Veracity. Data arrives in different formats and from different sources. If this data cannot be properly linked, there is little use in collecting it. So the data must be of acceptable quality and must be capable of being related or linked together.
- Variability/Value:
Data flow can be unpredictable at times. A business needs to know when something is trending on social media. Managing continuous, periodic and peak data loads must be given importance.
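The difference between a Data Warehouse and a Data Lake described under Volume can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the event records, field names and function names are hypothetical, and plain Python lists stand in for real storage systems.

```python
import json

# Hypothetical raw events, as they might arrive from different sources.
raw_events = [
    '{"source": "web", "customer": "c1", "amount": "19.99"}',
    '{"source": "mobile", "customer": "c2", "amount": "5.00", "device": "android"}',
    '{"source": "web", "customer": "c1"}',  # no amount recorded
]


def etl_to_warehouse(events):
    """Warehouse style: transform first, load only rows that fit the schema."""
    table = []
    for line in events:
        record = json.loads(line)
        if "amount" not in record:
            continue  # rows that do not fit the schema are dropped before loading
        table.append({"customer": record["customer"], "amount": float(record["amount"])})
    return table


def store_in_lake(events):
    """Lake style: keep every event untouched, in its native format."""
    return list(events)


def total_spend_from_lake(lake, customer):
    """Parse and interpret the raw events only when a question is asked."""
    total = 0.0
    for line in lake:
        record = json.loads(line)
        if record.get("customer") == customer and "amount" in record:
            total += float(record["amount"])
    return total


warehouse = etl_to_warehouse(raw_events)
lake = store_in_lake(raw_events)
print(len(warehouse), len(lake))
print(total_spend_from_lake(lake, "c1"))
```

The warehouse table ends up with fewer rows than the lake because the incomplete event is filtered out during the transform step, while the lake keeps it for any later analysis that might need it.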
Big Data can run to petabytes (1 petabyte = 1,024 terabytes), exabytes (1 exabyte = 1,024 petabytes) or more, made up of billions or trillions of records.
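To get a feel for these units, the short Python sketch below converts between them using the factor of 1,024 quoted above. The helper names `to_bytes` and `convert` are made up for this example.

```python
# Binary unit sizes, using the factor of 1,024 quoted in the text.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (each unit step is 1,024x)."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value between any two units listed in UNITS."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


print(convert(1, "PB", "TB"))  # 1024.0
print(convert(1, "EB", "PB"))  # 1024.0
print(to_bytes(1, "PB"))       # 1125899906842624
```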
Examples of big data
Precisely offers the following insight into examples of Big Data.
However, we can gain a sense of just how much information the average organization has to store and analyze today. Toward that end, here are some metrics that help put hard numbers on the scale:
- IDC predicts that by 2025, the world’s data will grow to 175 Zettabytes. (To put that in perspective, if you attempted to download 175ZB at the average current internet connection speed, it would take you 1.8 billion years to download!)
- On average, there are about 500 million tweets sent every day
- According to Nerdwallet, the average smartphone owner uses 2 to 5 GB on his or her cell phone plan each month
- Walmart processes one million customer transactions per hour
- Amazon records $283,000 in sales per minute
- On average, office workers each receive 110 to 120 emails per day, equaling approximately 124 billion emails on any given day
- According to the 2019 Federal Reserve Payments Study, total card payment transactions reached 131.2 billion with a value of $7.08 trillion in 2018, representing growth of 8.9 percent in volume year-over-year
All of the above are examples of sources of big data, no matter how you define it. Whether you analyze this type of information using a platform like Hadoop, and regardless of whether the systems that generate and store the information are distributed, it’s a safe bet that datasets like those described above would count as big data in most people’s books.
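The download-time comparison in the IDC figure above can be checked with simple arithmetic. The quoted source does not say which connection speed it used, so the 25 Mbps in this Python sketch is purely an assumption, as is the use of decimal zettabytes (10^21 bytes) rather than the factor of 1,024.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year, including leap days
BYTES_PER_ZETTABYTE = 10 ** 21         # decimal zettabyte (assumption)


def download_years(zettabytes, speed_mbps):
    """Years needed to download the given volume at a constant speed in Mbps."""
    bits = zettabytes * BYTES_PER_ZETTABYTE * 8
    seconds = bits / (speed_mbps * 1_000_000)
    return seconds / SECONDS_PER_YEAR


# 25 Mbps is an assumed connection speed, not a figure from the source.
years = download_years(175, 25)
print(f"{years / 1e9:.2f} billion years")
```

Under these assumptions the result lands close to the 1.8 billion years quoted, which suggests the original estimate was based on a connection speed in that range.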