What is Big Data?

In essence, Big Data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. It usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process data within a tolerable elapsed time[1]. The “size” of Big Data is a constantly moving target, which doesn’t remain stable at any given point of time. As per a recent report, its size ranges from a few dozen terabytes to many petabytes of data.

The story of how data became big starts many years before the current buzz around big data. About seventy years ago we encountered the first attempts to quantify the growth rate in the volume of data or what has popularly been known as the Information Explosion (a term first used in 1941). The history of Big Data as a term may be brief – but many of the foundations it is built on were laid many years ago[2]. Long before computers (as we know today) were commonplace, the idea that we were creating an ever-expanding body of knowledge ripe for analysis was popular in academia.

Now, let’s look at a detailed account of the major milestones in the history of sizing data volumes in the evolution of the idea of “big data” and observations pertaining to data or information explosion:

1932 Skipping the important milestone of the population boom would not do justice to the history of Big Data. Information overload continued with the boom in the US population, the issuing of social security numbers, and the general growth of knowledge (research) which demanded more thorough and organized record-keeping.

1941 Scholars began referring to this incredible expansion of information as the “Information Explosion”. First referenced by the Lawton Constitution (newspaper) in 1941, the term was expanded upon in a New Statesman article in March 1964, which referred to the difficulty of managing the amount of information available.


Schematic showing a general communication system[3].

1944 The first flag of warning on the growth of knowledge storage and the retrieval problem came in 1944, when Fremont Rider, a Wesleyan University Librarian estimated that American university libraries were doubling in size every sixteen years. At this growth rate, Rider speculated that the Yale Library in 2040 would have “approximately 200,000,000 volumes, which will occupy over 6,000 miles of shelves… [requiring] a cataloging staff of over six thousand persons.”

1948 Claude Shannon published “Shannon’s Information Theory” which established a framework for determining the minimal data requirements to transmit information over a noisy (imperfect) channel. This was a landmark work that enabled much of today’s infrastructure. Without this understanding, data would be “bigger” than it is today.

1956 The concept of virtual memory was developed by German physicist Fritz-Rudolf Guntsch as an idea that treated finite storage as infinite. Storage, managed by integrated hardware and software to hide the details from the user, permitted us to process data without the hardware memory constraints that previously forced the problem to be partitioned.


Information Overload[4]

1961 Information Scientist, Derek Price, generalized Rider’s findings to include almost the entire range of scientific knowledge. The scientific revolution, as he called it, was responsible for the rapid communication of new ideas as scientific information. This rapid growth was in the form of new journals doubling every 15 years.

1963 In the early 1960’s, Price observed that the vast amount of scientific research was too much for humans to keep abreast of. Abstract journals, which were created in the late 1800’s as a way to manage the increasing knowledge-base, were also growing at the same trajectory and had already reached a “critical magnitude”. They were no longer a storage or organization solution for information.

1966 At around this time, the Centralized Computing Systems entered the scene. Not only was information booming in the science sector, it was booming in the business sector as well. Due to the information influx in the 1960’s, most organizations began to design, develop and implement centralized computing systems that allowed them to automate their inventory systems.

1970 Edgar F. Codd, an Oxford-educated mathematician working at the IBM Research Lab, published a paper showing how information stored in large databases could be accessed without knowing how the information was structures or where it resided on the database. Until then, retrieving information required relatively sophisticated computer knowledge, or even the services of specialists —a time-consuming and expensive task. Today, most routine data transactions—accessing bank accounts, using credit cards, trading stocks, making travel reservations, buying things online—all use structures based on relational database theory.


A relational database system[5]

1976 In the mid-1970’s, Materials Requirements Planning (MRP) systems were designed as a tool to help manufacturing firms to organize and schedule their information. Around the same time, PC’s were gaining huge popularity gradually which marked a shift in focus toward business processes and accounting capabilities. Companies like Oracle and SAP were founded around the same time.

1983 As advancements in technology continued further, every industry began to benefit from new ways to organize, store and produce data.


Information Explosion[6]

1996 Digital storage became more cost-effective for storing data than paper. Also, the boom in data brought more challenges to ERP vendors. The need to redesign ERP products, including breaking the barrier of proprietorship and customization, forced vendors to embrace the collaborative business over the internet in a seamless manner.

1997 The term “Big Data” was used for the first time in an article by NASA researchers Michael Cox and David Ellsworth. The pair claimed that the rise of data was becoming an issue for current computer systems. This was also known as the “problem of big data”.


The 4 V’s of Big Data[7].

1998 By the end of 90’s, many businesses began to believe that their data mining systems were not up to the mark and still needed improvements. Business workers were unable to get access to or answer the data they needed from searches. Also, IT resources were not so easily available at their disposal. So, whenever the employees needed access, they had to call the IT department due to lack of easily accessible information.

2001 The acronym SaaS (Software as a Service) first appeared around this time. It basically means an “on-demand software” delivery model which is licensed on a subscription basis and is centrally hosted.


Software as a Service[8]

2005 SaaS companies began appearing on the scene to offer an alternative to Oracle and SAP that was more focused on the usability of the end user. Adding to this was the creation of a new programming language named Hadoop. Free to download, use, enhance and improve, Hadoop is 100% open source way pf storing and processing data that enables distributed parallel procession of huge amounts of data across inexpensive, industry-standard servers that both store and process the data with extreme scalability.

2009 Business Intelligence became a top priority for Chief Information Officers in 2009. Tim Berners, director of the World Wide Web Consortium (W3C) was the first to use the term “linked data” during a presentation on the subject at the TED 2009 conference. A set of best practices for using the Web to create links between structured data is known as Linked Data.

2011 By this time, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data per company with more than 1000 employees. The writers also estimated the securities and investment industries led in terms of stored data per organization. The scientists calculated that 7.4 exabytes of original data were saved by enterprises and 6.8 exabytes by consumers in 2010 alone.

2012 After the launch of IPv6, identification and location system for computers on the networks and traffic routes across the internet became much faster. Technologically advanced features such as ability to generate reports from in-memory databases which provide faster and more predictable performance were also on the rise. Businesses began to implement new in-memory technology such as SAP HANA to analyze and optimize mass quantities of data. Companies became ever more reliant on utilizing data as a business asset to gain a competitive advantage, with big data leading the charge as arguably the most important new technology to understand and make use of in day-to-day business.


How does Hexanika make use of Big Data?

Hexanika is a FinTech big data software company which has developed an end-to-end solution for financial institutions to address data sourcing and reporting challenges for regulatory compliance. Hexanika’s innovative solution improves data quality, keeps regulatory reporting in harmony with the dynamic regulatory requirements and keeps pace with the new developments and latest regulatory updates.


Hexanika’s unique Big Data deployment approach by experienced professionals will simplify, optimize and reduce costs of deployment. It strives to achieve this by following the process as shown below:

Hexanika addresses Big Data using its unique product and solutions. To know more about us, see: https://hexanika.com/company-profile/

Feel free to get in touch with our experts to know more at: https://hexanika.com/contact-us-big-data-company/

Contributor : Akash Marathe


[1] Source: Wikipedia

[2] Link: https://www.linkedin.com/pulse/brief-history-big-data-everyone-should-read-bernard-marr

[3] Link: http://www.winshuttle.com/big-data-timeline/

[4] Image source: Google images

[5] Image source: IBM.com

[6] Image source: IBM.com

[7] Image source: IBM.com

[8] Image source: Google images

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.
You need to agree with the terms to proceed

36 − = 26