The Bigger The Data, The Better?

24th November, 2016 by

Does big data become more valuable the bigger it gets? The debate rumbles on in the big data world.

#1. Bigger Is Better

Some people are unequivocal: yes, data does become more useful the bigger it gets, so you had better collect as much of it as you possibly can or your competitors might find the golden nuggets before you do. (This is the traditional stance of many of the big players in big data solutions, as exemplified by this Teradata article.)

#2. Quality Over Quantity

As more organizations gain experience of using big data solutions, an increasing number are saying size isn’t everything – data quality is of equal importance – and that capturing more data than you can currently usefully analyze is a waste of resources. (This Forbes piece typifies the argument.)

As is so often the case, the truth lies somewhere in between these two positions. In a way, both camps are correct. But how much data, and what kind, you should be storing and analyzing will differ depending on your specific business circumstances.


When Is Bigger Better?

It’s true that the more useful data you can analyze, the more valuable the insights you can potentially uncover. The key words here, though, are ‘useful’ and ‘potentially’. The old maxim of ‘garbage in, garbage out’ still applies. A mass of raw data is not going to produce insights as valuable or reliable as those from a subset of that data which has been properly prepared.

So It’s Not A Better Big Data Solution?

It can be, but scaling up big data isn’t just a case of collecting anything and everything. It’s more about scaling outwards. The most valuable insights from big data solutions come when you are able to detect patterns and correlations across a range of different datasets that may have previously sat in isolation. Adding in new, relevant data sources to your big data solution is generally better than augmenting an existing source, since it opens up more opportunities to detect interesting correlations.

Can You Give An Example?

For instance, if a retail organization is trying to understand customer buying patterns, it might already be analyzing its own sales data, customer relationship data and marketing campaign data. While adding more data to one of these streams might improve the analysis a little, bringing an entirely new source of useful data on stream (e.g. social media sentiment data or customer location data) has a far greater potential to produce a whole new set of correlations and insights.
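To make the retail example concrete, here is a minimal sketch of what "going broader" might look like: two sources that previously sat in isolation are lined up on a shared key and checked for a correlation. The weekly figures, the `sales` and `sentiment` names, and the choice of a plain Pearson coefficient are all assumptions made for illustration, not data or methods from any real system.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical figures keyed by week number (invented for this sketch).
sales = {1: 120, 2: 135, 3: 90, 4: 160, 5: 150}          # existing source
sentiment = {1: 0.2, 2: 0.4, 3: -0.3, 4: 0.7, 5: 0.5}    # newly added source

# Only weeks present in both sources can be compared.
weeks = sorted(sales.keys() & sentiment.keys())
r = pearson([sales[w] for w in weeks], [sentiment[w] for w in weeks])
print(f"sales vs. sentiment correlation: {r:.2f}")
```

No amount of extra sales history would reveal this relationship, because the sentiment signal simply isn't in that stream. A strong correlation is still only a prompt for further investigation, not proof of cause.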

So To Be Better, Bigger Should Mean Broader Not Taller?

Yes, but bigger will also mean faster. The amount of data being produced daily continues to grow exponentially. And as the Internet of Things (IoT) takes hold, that is only going to accelerate. All manner of public and private IoT devices will be churning out ever more detailed streams of data, from household heating systems to personal health monitors and traffic sensors. When you’re talking about pulling in data for analysis in real time (rather than offline analysis of historical data), the increasing velocity at which new data arrives means that big data solutions will need to handle ever ‘bigger’ data to manage the speedier throughput.

Are There Any Limits On How Big This Will Become?

Clearly, big data solutions will only be able to ingest and analyze new data in real time if they are able to keep up with the throughput. This means you’ll need not just powerful real-time analysis tools, but ever more bandwidth and storage. Similarly, uncovering meaningful business insights from the patterns detected in data currently requires a lot of human intervention by skilled data scientists, so your level of access to these skills will also affect how big your big data can get.
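The throughput constraint comes down to simple arithmetic, sketched below. The function name `backlog_after` and the rates used are hypothetical, and the model assumes constant arrival and processing rates, which real workloads rarely have.

```python
def backlog_after(arrival_rate, processing_rate, seconds):
    """Events still waiting after `seconds`, starting from an empty queue.

    Assumes both rates (events per second) stay constant for the whole period.
    """
    return max(0, (arrival_rate - processing_rate) * seconds)


# Hypothetical pipeline that can process 10,000 events per second.
falling_behind = backlog_after(arrival_rate=12_000, processing_rate=10_000, seconds=3_600)
keeping_up = backlog_after(arrival_rate=8_000, processing_rate=10_000, seconds=3_600)
print(falling_behind, keeping_up)
```

Once arrivals outpace processing, the backlog grows for as long as the gap lasts, so the analysis stops being real time however powerful the individual tools are.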


Won’t Advancing Technology Eliminate Those Bottlenecks?

Many believe so, based on the rate at which big data solutions have matured in the past few years and continue to do so. For example:

  • Machine learning and AI technologies are growing more sophisticated every year, and it’s anticipated that identifying high-level insights, which today requires skilled human intervention, will become increasingly automated.
  • Although full automation looks to be more than a decade away, the tools available for analyzing and visualizing big data continue to mature apace. They are already allowing data scientists to work faster and more productively.
  • Automated data cleansing and classification tools are significantly easing the burden of bringing new data sources on stream.
  • IoT aggregation devices and tools can strip out much of the redundant data generated by connected appliances and sensors before it’s ingested by your big data solution.
  • The rise of cloud-based services for big data analysis will allow even those organizations with limited resources to take advantage of ever more powerful and functional tools without needing to dedicate resources to implementing or building them in-house.
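As an illustration of the IoT aggregation point in the list above, the sketch below shows one simple way redundant sensor readings could be stripped out before ingestion: a reading is forwarded only if it differs from the last forwarded value by more than a threshold. The `drop_redundant` function, the threshold and the sample readings are assumptions for this example; real aggregation devices may use quite different techniques.

```python
def drop_redundant(readings, threshold):
    """Keep a reading only if it differs from the last kept value by more than `threshold`.

    `readings` is a sequence of (timestamp, value) pairs in time order.
    """
    kept = []
    for timestamp, value in readings:
        if not kept or abs(value - kept[-1][1]) > threshold:
            kept.append((timestamp, value))
    return kept


# Hypothetical temperature readings from a household heating sensor.
readings = [(0, 20.0), (1, 20.1), (2, 20.1), (3, 21.0), (4, 21.05), (5, 19.5)]
filtered = drop_redundant(readings, threshold=0.5)
print(f"forwarded {len(filtered)} of {len(readings)} readings")
```

The trade-off is that small fluctuations below the threshold are lost, so the threshold needs to reflect what the downstream analysis actually cares about.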

So bigger, high-quality data isn’t just better – it’s inevitable.

