Rich Miller - Considering the Value of Data in Banks.

I’ve just had the opportunity to read an excellent LinkedIn post by Paul Forrest entitled “How much value is in data for banks?”

Here are two thoughts on his three challenges.

“Understanding where your data is” all too often becomes a question of which dataset is the authentic, most relevant one, and which of the other replicas are out-of-date or incomplete versions that were retained out of inattention or a sense that “we might need it later”.

Eliminating redundant data is a cost and efficiency imperative.

Eliminating tainted, partial, or out-of-date data is about eliminating business risk.

The call to action: Deduplicate and cull potentially misleading data … early and often.
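As a rough illustration of what “early and often” could look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for the example: the DatasetCopy record, its fields, the triage function, and especially the rule that the most recently updated copy is the authoritative one. A real bank would need a far more careful definition of “authentic”.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetCopy:
    # Hypothetical description of one stored copy of a logical dataset.
    logical_name: str   # which dataset this is a copy of
    location: str       # where this particular copy lives
    last_updated: date  # when this copy was last refreshed
    content: bytes      # stand-in for the actual data


def fingerprint(copy):
    """Content hash used to detect byte-identical copies."""
    return hashlib.sha256(copy.content).hexdigest()


def triage(copies):
    """Return (keep, cull) for a list of DatasetCopy records.

    Assumption: the most recently updated copy of each logical dataset
    is treated as authoritative. Every other copy is a cull candidate,
    labeled "redundant" if it is byte-identical to the authoritative
    copy and "stale" if its content differs.
    """
    by_name = {}
    for c in copies:
        by_name.setdefault(c.logical_name, []).append(c)

    keep, cull = [], []
    for group in by_name.values():
        authoritative = max(group, key=lambda c: c.last_updated)
        keep.append(authoritative)
        auth_fp = fingerprint(authoritative)
        for c in group:
            if c is authoritative:
                continue
            reason = "redundant" if fingerprint(c) == auth_fp else "stale"
            cull.append((c.location, reason))
    return keep, cull


# Hypothetical inventory of three copies of the same dataset.
copies = [
    DatasetCopy("customer_master", "warehouse", date(2024, 3, 1), b"v3"),
    DatasetCopy("customer_master", "backup", date(2024, 2, 28), b"v3"),
    DatasetCopy("customer_master", "archive", date(2022, 1, 1), b"v1"),
]
keep, cull = triage(copies)
```

The two labels map onto the two motives above: “redundant” copies are a cost and efficiency problem, while “stale” copies are the ones that carry business risk if someone mistakes them for the real thing.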

Data value begins with source data quality, and, increasingly, data quality can only be determined by ascertaining its pedigree.

Until recently, banking data was generated primarily from within the bank’s internal operations. The in-house consumer of that “first-party” data generally knows how much remedial work is required, if any, in order to extract real value from that data.

During the past twenty years, banks have become increasingly dependent on data generated by second parties … necessary data which must be combined with their first-party data in order to make appropriate decisions and take necessary actions. Consider data provided by credit rating services, or transaction reporting from credit card companies and their networks.

If these external sources include erroneous or incomplete data, the quality of any subsequent processing by the consuming bank is at risk. The risk increases still more when the source of data is a third-party aggregator of data from multiple second-party sources, much of it supplied with little or no assumption of liability if it turns out that a datum is flawed.

The call to action: Consider the source(s), the lineage and the provenance of data… particularly data generated externally and provided by external suppliers.

Any data source that cannot or will not assume responsibility for its ‘data product’ must be treated accordingly. Use it if you must, but invest in additional efforts to assure data hygiene and independent verification.

Any data source that emphasizes quality by providing verifiable pedigree for the data product must be considered a preferable choice. The price of ‘data with pedigree’ may not make this source feasible, but before protecting your budget, consider the TOTAL cost of tainted data and the likelihood that tainted data will be introduced.
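One way to make the “TOTAL cost” comparison concrete is a simple expected-cost calculation: the price of the data, plus whatever you spend verifying it, plus the probability of tainted data multiplied by the damage it would do. The Python sketch below is only that, a sketch. The function name, its parameters, and every figure in the example are hypothetical and chosen purely to show the shape of the argument.

```python
def expected_total_cost(price, p_tainted, cost_if_tainted, verification_cost=0.0):
    """Expected total cost of using a data source.

    price             -- what the supplier charges
    p_tainted         -- estimated probability that tainted data gets through
    cost_if_tainted   -- estimated damage if it does
    verification_cost -- extra spend on hygiene and independent verification
    """
    if not 0.0 <= p_tainted <= 1.0:
        raise ValueError("p_tainted must be between 0 and 1")
    return price + verification_cost + p_tainted * cost_if_tainted


# Purely illustrative numbers, not drawn from any real supplier.
cheap_source = expected_total_cost(
    price=10_000,
    p_tainted=0.05,
    cost_if_tainted=1_000_000,
    verification_cost=20_000,
)
pedigreed_source = expected_total_cost(
    price=50_000,
    p_tainted=0.005,
    cost_if_tainted=1_000_000,
)

print(f"cheap source:     {cheap_source:,.0f}")
print(f"pedigreed source: {pedigreed_source:,.0f}")
```

With these made-up inputs the source with the higher sticker price comes out cheaper overall. Different estimates could easily reverse that, which is the point: the comparison should be made on total expected cost, not on price alone.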

Once again, thanks to Paul Forrest for the insightful post.