Dark data is the redundant, often forgotten-about data that you collect “just in case”. It’s unstructured, untagged and unused information that hides within your company’s networks and machines, taking up valuable space. IBM estimates that 80% of all data collected is dark data. Most of this is generated during regular business activities, but isn’t used for any other purpose, even though it was probably stored for a good reason at some point. Types of dark data include:
- User activity logs
- Customer conversations
- Server monitoring logs
- Emails, meeting minutes and employee-created data
Without thinking about it, companies will continue to collect and store enormous amounts of data they will never touch again.
It’s time to get your redundant data under control. Storing dark data can become costly if not managed early. Plus, there are serious security concerns surrounding unstructured data. If you don’t know exactly what you’re storing, how can you possibly protect it correctly? Finally, dark data isn’t just a hassle – it can be extremely valuable to businesses. As the tools to analyze large amounts of unstructured data become more readily available, organizations stand to glean valuable information about their customers and business processes.
Businesses need to decide whether they are going to take advantage of all this big, juicy data or start to reduce the amount of dark data they are storing unnecessarily.
The Costs of Redundant, Trivial Data
The global cost of dark data is expected to reach $3.3 trillion USD by 2020. Veritas estimates that it costs the average mid-sized organization (holding 1000TB of data) more than $650,000 annually to store non-critical information. That’s a vast amount of money to spend storing something you don’t even use!
Beyond the actual cost of storage, the management of dark data also eats up valuable resources. All electronically stored data is subject to legal discovery if litigation arises. How much time and energy will it take to pull years of customer conversations out of storage?
Pruning your stale, unnecessary data can reduce storage maintenance costs, saving your business thousands of dollars a year.
Security Risks of Dark Data
Perhaps a bigger concern is the security around dark data. In this era of hyper-personalization, businesses are collecting more personal information about their users than ever before – often without even realizing they are doing so.
For example, on May 3rd, 2018, Twitter announced a bug where user passwords were stored unmasked in internal logs, unbeknownst to Twitter administrators. This is one of the worst cases of dark data gone wrong. While Twitter doesn’t think anyone outside of Twitter was able to access the data, it’s still a big deal. Storing users’ passwords in logs is terrible practice, and it could have been avoided with better dark data policies.
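One way to keep secrets like these out of your logs is to mask them before they are ever written. Here is a minimal sketch using Python’s standard `logging` module. The `SENSITIVE` pattern, the key names it matches and the `redact` helper are assumptions made for illustration; this is not how Twitter handled the issue, and a real deployment would need a pattern tuned to its own log formats.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key looks sensitive.
SENSITIVE = re.compile(r"(?i)\b(password|passwd|token|secret)=(\S+)")


def redact(text: str) -> str:
    """Replace the value of any sensitive-looking key with a placeholder."""
    return SENSITIVE.sub(r"\1=[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Mask sensitive values before a record reaches any handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges the format arguments into the final string.
        record.msg = redact(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True  # keep the record, just with masked content


logger = logging.getLogger("app")
logger.addFilter(RedactingFilter())

print(redact("login user=ann password=hunter2"))
# prints: login user=ann password=[REDACTED]
```

A filter like this is a safety net, not a substitute for never passing credentials to the logger in the first place.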
In 2018, understanding what data is being collected, and for what purpose, is becoming increasingly important. With new GDPR rules coming into effect in May 2018, “personal data can only be gathered legally under strict conditions, for a legitimate purpose.” That means that storing data “just in case” isn’t a legally viable option anymore. Businesses need to show how they are using and protecting their customers’ data in order to avoid hefty fines.
Finally, all EU citizens have the “right to erasure”, which means they can request that companies delete any information they have stored on them. If you haven’t been accurately tagging and storing user data, this might be an impossible task. If you’re unable to comply, you can expect hefty fines, and a potential lawsuit (if users believe you aren’t using their data ethically).
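To see why tagging matters, here is a minimal sketch of a store that indexes every record by the user it belongs to, so an erasure request becomes a single lookup. The `TaggedStore` class and its method names are hypothetical, and the in-memory dictionaries stand in for whatever databases and file stores you actually use.

```python
from collections import defaultdict


class TaggedStore:
    """Toy record store that tags every record with its owner."""

    def __init__(self):
        self._records = {}                # record_id -> payload
        self._by_user = defaultdict(set)  # user_id -> record ids

    def add(self, record_id, user_id, payload):
        self._records[record_id] = payload
        self._by_user[user_id].add(record_id)

    def erase_user(self, user_id):
        """Delete everything tagged with user_id; return how many records went."""
        record_ids = self._by_user.pop(user_id, set())
        for record_id in record_ids:
            self._records.pop(record_id, None)
        return len(record_ids)

    def __len__(self):
        return len(self._records)


store = TaggedStore()
store.add("chat-1", "user-42", "support conversation")
store.add("log-7", "user-42", "activity log entry")
store.add("chat-2", "user-99", "another conversation")
print(store.erase_user("user-42"))  # prints: 2
```

Without the owner index, the same request would mean searching every untagged record for traces of one person.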
Getting your dark data in order
With the cost and security concerns around dark data, it’s absolutely essential to spend time getting your redundant and trivial data under control. Here’s how:
#1. Analyze your current data situation
The first step is to wrap your head around the current state of your dark data. Viewpointe explains that “analysis helps reveal what the data is, the format it is in, whether it is a duplicate and how much storage capacity it occupies.” Your goal is to determine what data might be useful, and what should simply be deleted (and no longer collected). With small amounts of data, you might be able to evaluate your storage manually. Bigger companies may need to turn to technology to identify what their dark data is hiding.
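For a small file share, that first pass can be a short script. The sketch below walks a directory and reports the three things the quote mentions: format (approximated here by file extension), storage used, and duplicates (detected by content hash). The `inventory` function is a hypothetical helper, and it reads each file fully into memory, which is only reasonable for modest file sizes.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def inventory(root):
    """Summarize files under root by extension and find exact duplicates."""
    by_ext = defaultdict(lambda: {"files": 0, "bytes": 0})
    by_hash = defaultdict(list)

    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        by_ext[ext]["files"] += 1
        by_ext[ext]["bytes"] += path.stat().st_size
        # Identical content produces an identical SHA-256 digest.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)

    duplicates = [paths for paths in by_hash.values() if len(paths) > 1]
    return dict(by_ext), duplicates


if __name__ == "__main__":
    summary, duplicates = inventory(".")
    for ext, stats in sorted(summary.items()):
        print(f"{ext}: {stats['files']} files, {stats['bytes']} bytes")
    print(f"{len(duplicates)} groups of duplicate files")
```

Extensions are only a rough proxy for format, so treat the output as a starting point for questions rather than a verdict on what to delete.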
#2. Classify existing dark data
Once you’ve identified what type of data you’re dealing with, and what you want to keep, it’s time to turn that unstructured mess into structured data. However, this is easier said than done. Understanding the structure of your data and classifying it manually is incredibly hard. But fortunately we’re beginning to see AI startups designed specifically for the task.
In May 2017, Apple bought Lattice, an AI dark data classification tool, for $200m. Lattice previously specialized in taking large amounts of unstructured data and turning it into useful, structured information. Quantta Analytics, a dark data startup, works with McDonald’s and the State Bank of India to tease out information from their large amounts of dark data.
Big data tools can carry out a similar task. Hadoop and similar big data frameworks are built to deal with large amounts of data. For example, a Hadoop-based data warehouse put into production by Edmunds.com Inc. has helped the company examine its dark data and reduce operating costs, says Paddy Hannon, Vice President of Architecture.
Especially important is identifying any stored data that contains regulated information (such as credit card numbers or government ID). Pulling this information into a searchable classification structure is the first step to meeting regulations.
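As an illustration of what that identification can look like, the sketch below scans text for likely credit card numbers. It combines a regular expression for 13 to 19 digit sequences with the Luhn checksum that payment card numbers satisfy. The function names are hypothetical, and the approach is deliberately simple: it will miss unusual formats and can still flag non-card numbers that happen to pass the checksum.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in text that look like valid card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(find_card_numbers("card 4111 1111 1111 1111 on file"))
# prints: ['4111111111111111']
```

The checksum step matters: without it, order numbers and phone numbers of the right length would flood the results.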
#3. Make a plan
Once you’ve classified your data and determined what needs to go, it’s time to think ahead. There are a lot of questions to be answered:
- Who needs access to this information?
- What data retention policy should we put into place in the future?
- How can we use search and retrieval tools to make accessing future data easier?
- How frequently should we review our dark data going forward?
It’s also a good time to review data regulations and see if there’s anything you need to lock down further. Think of it as spring cleaning, but for your data.
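A retention policy is easier to stick to when part of the review is automated. As a minimal sketch, the script below lists files that have not been modified within a chosen retention period. The `stale_files` helper and the 365-day threshold are assumptions for illustration, and modification time is only a rough proxy for whether data is still in use. The script reports candidates and leaves the decision to delete with a person.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days, now=None):
    """Return files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical policy: review anything untouched for a year.
    for path in stale_files(".", max_age_days=365):
        print(path)
```

Running something like this on a schedule turns the review into a recurring task instead of a one-off cleanup.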
Unlock the business potential in your dark data
While the cost and security concerns are a good enough reason to spring clean your data, there’s a huge bonus in paying attention to your dark data. It contains insight about what your customers want, and where your business needs optimizing.
Nitin Mittal, Head of Deloitte Consulting’s Analytics & Information Management practice, believes that bringing in decision makers from other areas of the business is critical to “lighting up” that dark data. “Work with business teams to identify specific questions that need to be answered. Then identify the sources of data that make the most sense for your analytics efforts.” For example, talk to your head of marketing about their goals for the quarter. How can data collected across the business (such as customer conversations or product usage logs) help marketing understand customers better?
Asking the right questions about your dark data will help uncover the insights locked up in the costly dark data you’re keeping – and not using!