Big Data must surely be big enough to deserve that tag, right? Right! Several other factors arise as you look deeper into this concept and try to understand it. The question of how big Big Data really is comes up often, but in truth, what might seem massive to you is most definitely minute to a computer. Say, pal, how long would it take you to search through a table with 1,000,000 rows and count the occurrences of the term ‘big data’? Assume there is only one column and a maximum of two words per row – I’ll give you two weeks. Godspeed!
For a modern-day computer, a feat that would cost you two weeks of your time takes only a few seconds. In reality, one million rows of words cannot be considered Big Data. Let us put the term into perspective: in this article, we shall demystify this behemoth of the data science field by chopping it into small pieces – call them computer bytes.
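To see just how quickly a computer dispatches your two-week assignment, here is a minimal sketch of the experiment. The table contents are made up for illustration (a hypothetical word list, one column, at most two words per row), but the timing pattern is real:

```python
import random
import time

# Build a hypothetical one-column table: 1,000,000 rows,
# each holding at most two words.
words = ["data", "big data", "cloud", "analytics", "storage", "compute"]
rows = [random.choice(words) for _ in range(1_000_000)]

# Time a full scan that counts occurrences of the term 'big data'.
start = time.perf_counter()
count = sum(1 for row in rows if row == "big data")
elapsed = time.perf_counter() - start

print(f"Found 'big data' {count:,} times in {elapsed:.3f} seconds")
```

On an ordinary laptop the scan finishes in well under a second – and yet, as the article argues, a million rows is nowhere near Big Data territory.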
Whenever you hear the term Big Data being thrown around, I’d like you to think of truly enormous sizes of data – in the range of 1,000,000 GB and above. Yes, you read that right. This territory is not for small fish like gigabytes or terabytes. At the bare minimum, we speak of a petabyte, then move on to an exabyte and so forth.
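The jump from gigabytes to petabytes is easier to feel as arithmetic. A quick sketch, using decimal (SI) multiples where each unit is 1,000 times the previous one:

```python
# Storage units in bytes, using decimal (SI) multiples:
# 1 TB = 1,000 GB; 1 PB = 1,000 TB; 1 EB = 1,000 PB.
GB = 10**9        # gigabyte
TB = 10**3 * GB   # terabyte
PB = 10**3 * TB   # petabyte
EB = 10**3 * PB   # exabyte

print(f"1 PB = {PB // GB:,} GB")  # 1 PB = 1,000,000 GB
print(f"1 EB = {EB // GB:,} GB")  # 1 EB = 1,000,000,000 GB
```

So the 1,000,000 GB floor mentioned above is exactly one petabyte, and an exabyte is a thousand of those.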
To grasp the behemoth we are speaking about, try this experiment. Say you have a computer with 1024 GB (1 TB) of storage, filled with movies and music – a never-ending collection, perhaps? Search for any term across the whole disk using your file explorer, see how long it takes, then consider how long it would take for a million times that amount of data. This is where the definition of the term Big Data comes from – but that is not all, so read ahead to comprehend it better.
Big Data is data that is either too large or too complex for traditional computer software to deal with. The kind of traditional software we are talking about here is not the software that ran on the ENIAC; it is the software running on the laptop you are using right now. Whether large or complex, the data may also be structured or unstructured.
When describing Big Data, there are three V’s (more recently, four V’s) to consider: Volume, Variety, Velocity and Veracity. I shall briefly explain each V to make sure the concept is driven home.
Volume: This is the quantity of data that is stored or generated. In actuality, it is the primary determinant of whether data can be considered big or not. Remember the exabyte?
Variety: Data can be immense in quantity yet all of the same type, and it would thus lack complexity. Variety refers to the range of data types contained in a data set – text, audio, video and images, for example.
Velocity: Thinking of data as being in physical motion would be outright madness, which is why the term means something slightly different here. Velocity is the rate at which data is generated and processed to meet current needs. We won’t go further than that in this article.
Veracity: Simply put, this is the quality of the data collected for analysis. Quality affects the data’s value and, further, the accuracy of any subsequent analysis.
You now have a basic understanding of what Big Data is and how it is generally regarded. We shall revisit this concept in the future to look at the technologies involved, how they work, their differences from traditional software and so forth. For now, it is worth mentioning that Big Data has numerous uses in the current world of computing. Its concepts are widely applied in government, manufacturing industries, health care, education and so forth.
Now you know.