This is a part of a series of recaps from the 2014 National Conference on Managing Electronic Records, otherwise known as the MER Conference.
Jason Baron, Esq. and Jay Brudz, Esq., who work in the Information Governance and eDiscovery Group of Drinker Biddle & Reath, LLP, gave a joint presentation about how information governance (IG) can help organizations confront the challenge of data remediation. Baron and Brudz defined data remediation as “[T]he process of securely removing data from legacy electronic environments, with the intent of disposal after due diligence is accomplished.” In the context of TSLAC language and government recordkeeping, we could say that data remediation means the disposition of electronic records after any disposition holds have expired. We’ve talked many times before about the importance of disposition, both in general and specifically regarding electronic records. Disposition is an essential part of any RM program. But if you haven’t quite been able to get your organization to embrace disposition, Baron and Brudz had plenty of thoughts and tips to share about how information governance can help you.
There’s just so much stuff!
“We need to keep everything, just in case! It might be useful one day.” We’ve all come across this attitude – maybe we’ve even felt it ourselves. The advent of Big Data and data analytics has only increased that anxiety. Companies want to keep all of their data in case it becomes valuable later.
Baron likened data remediation to finding a needle in a haystack – but the metaphor could also be extended to discovery or to searching for any kind of record for any reason. The haystacks are getting larger (which could be good from a Big Data/analytics perspective), which makes it harder to find the needle. Brudz interjected that to find a needle in a haystack, all you really need is a match. But when you’re talking about data remediation, you’re not really looking for a needle in a haystack – you’re looking for a specific needle in a pile of other needles. In other words, you’re looking for a relevant record among a bunch of irrelevant records. Brudz lamented that spending so much time on the irrelevant records is an “affront to morality.” I think we can all relate to this rather strong choice of words!
How IG and Predictive Analytics Can Help your Disposition Program
We’ve talked about predictive analytics before, particularly as it applies to e-discovery. In fact, I covered a talk that Baron gave on the subject at ARMA 2013. The technology can and should also be applied to disposition. Manually reviewing every single electronic record that is ready for disposition is simply too enormous a task, especially in small local governments or state agencies where only one person might be doing the records management. Predictive analytics (also known as technology-assisted review) can help.
Once again, the Apollo program was held up as an example. When JFK told us to land a man on the moon, Grumman said it had the technology to build the lunar lander – but entire industries needed to be created to produce the materials and parts the lander required. Similarly, the technology exists to get rid of records with minimal human review – but all of the background work needs to get done first. That background work is learning what you have and what you’re supposed to keep. In other words, an inventory and a retention schedule are necessary before you can use predictive analytics. Once you have those elements in place, disposition via predictive analytics should be within reach. Then, once you embark on a remediation or disposition project, you need to do your part to ensure that you are duly diligent – that you are disposing only of records that can be disposed: records that have met their retention period and no longer have any business, legal, or financial value.
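To make the idea concrete, here is a deliberately tiny sketch (in Python, my illustration rather than anything from the presentation) of how technology-assisted review works in principle: a human labels a small sample of records, a simple model learns from those labels, and anything the model can’t call confidently goes back to a human reviewer. Real TAR tools use far more sophisticated classifiers; every document, label, and function name here is hypothetical.

```python
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase words with surrounding punctuation stripped.
    return [w.strip(".,;:").lower() for w in text.split()]

def train(labeled_docs):
    """labeled_docs: list of (text, label) pairs, label is 'keep' or 'dispose'."""
    counts = {"keep": Counter(), "dispose": Counter()}
    for text, label in labeled_docs:
        counts[label].update(tokenize(text))
    return counts

def score(model, text):
    """Return 'keep', 'dispose', or 'review' when the evidence is a tie."""
    keep = sum(model["keep"][w] for w in tokenize(text))
    dispose = sum(model["dispose"][w] for w in tokenize(text))
    if keep == dispose:
        return "review"  # weak or conflicting evidence: route to a human
    return "keep" if keep > dispose else "dispose"

# A human labels a small seed sample (illustrative documents).
sample = [
    ("contract renewal terms for vendor agreement", "keep"),
    ("retention schedule met for expired vendor agreement", "dispose"),
]
model = train(sample)
print(score(model, "renewal terms for the new contract"))  # keep
```

The point of the sketch is the workflow, not the model: human judgment seeds the system, the system triages the bulk, and ambiguous items come back for due-diligence review.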
The Theory of Triangles
To illustrate the value of IG both for remediation and in general, Brudz put forth a “Theory of Triangles.” He explained, “Any field of human endeavor can be characterized as a triangle” – and the real payoff comes from combining two of them. For example, consider the Reese’s peanut butter cup: the chocolate in a Reese’s is not the best chocolate in the world, nor is the peanut butter the best gosh darn peanut butter out there. But put the two together and you get a delicious chocolatey peanut buttery confection of goodness (at least according to people who like peanut butter and chocolate together).
Now let’s apply the triangles to the average American company. Most people in an organization are at the bottom of the triangle, doing most of the work, and not making a lot of money. And then there are a few people at the top. The top portion of the triangle might make as much money as the bottom portion, even if there are only two people at the top versus 5,000 at the bottom. If you work in a single industry, then your goal (if you’re the ambitious type) is to be at the top. But what if your work transcends two industries? For example, law and technology? If you put the triangles together, you don’t have to be at the top of both industries. You just have to be in the middle of both, and when you combine them – look! You can easily rise to the top of that triangle – the Law & Technology triangle.
While working in intersecting triangles is great for your career, you can also apply them to your IG program. Your IG program should be “cross-functional” – security and records, technology and law all need to be combined. In doing so, you will multiply the value of your program (or your career). It also helps address the problem of silos that you may encounter or hear about in RM. For example, your legal team and your records team might be addressing the same records-related compliance issues, but if the two departments don’t talk to each other, then your IG program isn’t particularly useful. Combining the two “triangles” with each other – and with other triangles – streamlines your IG program and makes it more efficient.
Baron and Brudz then went on to briefly illustrate their points with a few case studies.
- In case study #1, they had 750,000 boxes of hard copy records to go through. They tackled the problem with a “Three Bucket Approach,” dividing records into three categories: 1) the stuff you know enough about to keep, 2) the stuff you know enough about to throw away, and 3) the stuff you don’t have enough information on to make a reasonable decision. With those three parameters, they were able to reduce the number of boxes down to 80,000. That’s still a lot of boxes, but it’s nearly a 90 percent reduction in volume.
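The three-bucket logic can be sketched in a few lines of Python. This is my illustration, not anything from the case study – the field names (`description`, `retention_met`, `on_hold`) are hypothetical stand-ins for whatever metadata your inventory actually captures about each box.

```python
def bucket(record):
    """Sort a record into one of the three buckets."""
    if record.get("description") is None:
        return "review"   # bucket 3: not enough information to decide
    if record.get("on_hold") or not record.get("retention_met"):
        return "keep"     # bucket 1: known, and must be kept
    return "dispose"      # bucket 2: known, and eligible for disposal

# Illustrative boxes, not real inventory data.
boxes = [
    {"description": "2001 invoices", "retention_met": True, "on_hold": False},
    {"description": "active litigation files", "retention_met": True, "on_hold": True},
    {"description": None},
]
print([bucket(b) for b in boxes])  # ['dispose', 'keep', 'review']
```

Note that anything on hold or short of its retention period defaults to “keep” – the due-diligence posture the presenters emphasized – and only the “review” bucket still needs human eyes.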
- In case #2, 15 years’ worth of email needed to be assessed for remediation. The problem: there were over 100 active legal holds on the email. Going through every single document was impractical, so predictive coding was employed. The hard part, Brudz and Baron explained, is all the work that goes into preparing for predictive coding – determining what to keep, identifying legal holds, and making sure the legal holds are in good shape.
- In case #3, Baron talked about NARA’s move to the cloud for email with embedded autocategorization. Here, Baron warned the audience that moving to the cloud without having completed the requisite records management work is nothing short of “folly.” He also cautioned that autocategorization does not always work with a taxonomy. Most of us are familiar with taxonomy as it relates to animal classification, and from my non-scientist perspective, it seems to work fairly well for the scientific community. But what would happen if a creature such as this existed?
It would be pretty difficult to make that fit into a Linnaean taxonomy! It’s no different for records. Sometimes, records simply do not fit into one category – they could be classified several different ways. That’s why taxonomies are not a foolproof approach to organizing information (but that’s a whole different topic of discussion).
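A toy Python sketch shows the structural problem: a strict taxonomy forces each record into exactly one slot, while a tagging scheme happily accepts several. The record and category names here are made up for illustration.

```python
taxonomy = {}  # strict scheme: record -> exactly one category

def file_in_taxonomy(record, category):
    # A single-category taxonomy rejects a second, conflicting filing.
    if record in taxonomy and taxonomy[record] != category:
        raise ValueError(f"{record!r} is already filed under {taxonomy[record]!r}")
    taxonomy[record] = category

tags = {}  # flexible scheme: record -> set of categories

def tag(record, category):
    tags.setdefault(record, set()).add(category)

file_in_taxonomy("vendor-contract.pdf", "Contracts")
tag("vendor-contract.pdf", "Contracts")
tag("vendor-contract.pdf", "Procurement")  # fine: tags allow overlap

try:
    file_in_taxonomy("vendor-contract.pdf", "Procurement")
except ValueError as e:
    print(e)  # the strict taxonomy refuses the second category
```

The record genuinely belongs to both series, and only the tagging model can say so – which is the platypus problem in miniature.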
Make the case for IG and Predictive Analytics!
Baron and Brudz closed the session by emphasizing the importance of extracting the “signal from the noise” in the coming age of dark data (the large volume of data that isn’t readily visible or accessible in a system). As we all know, data volume increases at a seemingly exponential pace every year. But not all of that data is useful to an organization or qualifies as a record. Identifying and keeping the relevant data while filtering out personally identifiable information (PII) and other sensitive information will continue to present a hefty challenge for RIM professionals. That’s why it’s important to make the case to the business side of your organization for information governance and predictive analytics.
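As a tiny illustration of the “filtering” half of that challenge, here is a minimal Python sketch that redacts two obvious PII patterns before text goes on for further analysis. Real PII detection takes far more than a couple of regular expressions – these patterns, and the placeholder labels, are purely illustrative.

```python
import re

# Illustrative patterns only: a US Social Security number and an email address.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def redact(text):
    """Replace each matched PII pattern with its placeholder label."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
# Contact [EMAIL], SSN [SSN].
```

Even this toy version makes the point to the business side: screening sensitive information is automatable routine work at scale, not something to hand-review record by record.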
Brudz used the following anecdote to illustrate that point: as a young man in the military, Brudz was given his “basic initial issue” – all the items you would have on a tank, along with a chart showing which item goes in which spot. Brudz decided he would rather stow certain items in different spots, to better suit his idea of what worked. But then he realized that if it was 2 AM and he had to go help someone in a different tank, the standard layout meant he would know where everything was and could continue the mission, even in the dark.
Similarly, when information is well managed, everyone knows where everything is, and that will help your organization “get there firstest with the mostest,” as Brudz put it. The people who budget your projects might wave you off when you say, “If you don’t do this, bad things will happen.” But with well-managed information, your organization can figure out what it needs to do more quickly – and that will save time and money. And hey, who doesn’t want more time and money?