Data Governance for Big Data

How many times have you read an article about data quality that uses the phrase "garbage in, garbage out"? It is almost a cliché, but when it comes to big data, there is no other phrase that better sums up the challenges faced by organizations that want to use data and use it well.

Big data presents huge opportunities and many organizations have already used it effectively to cut costs, develop products, improve customer experience, and more. But data presents challenges, even when the dataset is small. These challenges are amplified with big data.

Because big data arrives faster and at a much larger scale, its quality (or cleanliness) is more important than ever.

Dan O'Brien makes this point in his article "The Key to Quality Big Data Analytics" on the Inside Analysis blog. He says that, with big data, the issue of governance not only becomes more difficult but also more crucial.

It is a simple equation: if you are basing key business strategy decisions on an analysis of big data, the accuracy, completeness, relevance, and quality of that data are critically important. If the data is not clean, the consequences can range from a moderately poor decision to a catastrophic one.
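One way to make this concrete is to measure data quality before an analysis runs. The sketch below is a minimal Python illustration, not a method from O'Brien's article: the function names, the sample records, and the 95% threshold are all assumptions chosen for the example. It covers only completeness; accuracy and relevance would need their own checks.

```python
def completeness(records, required_fields):
    """Return the fraction of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)


def fit_for_analysis(records, required_fields, threshold=0.95):
    """Gate an analysis on a completeness threshold (0.95 is an illustrative value)."""
    return completeness(records, required_fields) >= threshold


# Hypothetical sample: two of the four records are missing a required value.
sample = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c2", "amount": None},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "c4", "amount": 12.5},
]

score = completeness(sample, ["customer_id", "amount"])
print(f"completeness: {score:.0%}")
print("fit for analysis:", fit_for_analysis(sample, ["customer_id", "amount"]))
```

A gate like this does not make data clean, but it can stop a decision from being based on a dataset that is known to be incomplete.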

The Future

Of course, businesses have to analyze and use the big data that they collect, or there is no point in collecting it. But O'Brien says the flip side is true too: there is no point in analyzing data unless proper governance measures are in place.

Governance can take many forms. For example, it is important to identify who owns particular datasets and who is responsible for their cleanliness. This should cover the entire organization, not just the IT teams or a few selected departments. Every area and individual involved needs to be a willing participant, and they need the technology in place to maintain data quality throughout the data's lifecycle.
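As a rough illustration of recording ownership, the following Python sketch keeps a small registry that maps each dataset to an owner and to a steward responsible for its cleanliness. The class names, dataset names, and roles are hypothetical and do not come from O'Brien's article; a real organization would more likely use a data catalog tool than hand-written code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    owner: str        # person or role accountable for the dataset
    steward: str      # person or team responsible for its cleanliness
    department: str   # ownership should span the organization, not only IT


class OwnershipRegistry:
    """In-memory registry of dataset ownership (illustrative only)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.name] = record

    def steward_for(self, dataset_name):
        """Return the steward of a dataset, or None if it is not registered."""
        record = self._records.get(dataset_name)
        return record.steward if record else None

    def unowned(self, dataset_names):
        """Return the datasets that nobody has been assigned to yet."""
        return [name for name in dataset_names if name not in self._records]


registry = OwnershipRegistry()
registry.register(
    DatasetRecord("web_clickstream", "marketing-lead", "data-eng-team", "Marketing")
)
registry.register(
    DatasetRecord("customer_orders", "sales-ops-lead", "sales-analyst", "Sales")
)

# Compare the registry against the datasets actually in use to find gaps.
gaps = registry.unowned(["web_clickstream", "customer_orders", "sensor_logs"])
print("datasets without an owner:", gaps)
```

The useful part is the gap check: any dataset that feeds an analysis but has no named owner or steward is a governance problem waiting to surface.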

O'Brien argues that businesses need to consider other things to get the most out of big data. These include proper data processing, effective use of the Hadoop platform, and finding tools that deliver the functionality required to fully realize the potential of big data. On the last point, he acknowledges that there is still some work to do as solution providers and software makers work to build better big data applications.

But the first goal, before any of this is considered, is to deliver data quality through effective governance.