eRecords Conference 2013: Long-term Access through a Culture of Preservation

This is the ninth and final post of a multi-part recap of the 2013 e-Records Conference. Some presentation materials from the e-Records Conference are available on the e-Records 2013 website.

By Angela Ossar, Government Information Analyst

What’s the best way to preserve electronic records for the long term?  Kalani Kirk Hausman wrapped up this year’s e-Records Conference with a meditation on digital preservation that extended back to his childhood days, helping his dad with computer punch cards, then up through the magnetic tape of the Vietnam era into the present day.  Sometimes I think I know a thing or two about long-term digital preservation (it’s definitely my favorite professional subject), but then Hausman showed a picture of his computer in college…

He was the only person in his whole dorm to have his own computer.

…and, remembering that my college computer looked more like this:

Mine – practically Space Age!

…I knew I had much to learn.

Hausman began by discussing the first storage media we know of:  cave drawings.  We can “read” cave drawings today because they’ve survived a really, really long time. Their medium is durable.  He then compared these drawings to tweets, which he figured as having more like a 30-second life span.* Somewhere in between are the retention standards for electronic state (and local) records. Government records have retention periods of anywhere from “destroy immediately” to “keep indefinitely.”

“Any time I see ‘permanent,’” said Hausman, “this image [of 1’s and 0’s] makes me nervous.”

Unfortunately, there is no single solution to how all that data should be maintained — other than, “I’m gonna have to keep things a long time; maybe I should plan for it.”

So, back to storage media.  Should we chisel all permanent state records onto stone tablets?  Well, no: durable as they are, they do have a “low information density.”  Hausman talked about other obsolete media, like celluloid film, which “has a nasty habit of exploding after some time.”  Later film suffers from vinegar syndrome and decay: once that happens, the data can’t be reconstructed because it simply no longer exists.

So what about digital media?  After picking on the ZIP disk a little, Hausman talked about some modern digital storage media.  He said that hard drives are getting more complex. They’re helium-filled, which means you must keep them cool.  They’re becoming “solid state,” meaning they have no moving parts. He mentioned Blu-ray discs as having a fairly big storage capacity of 128 GB, but pointed out that it would take “an awful lot of Blu-rays to back up a 3 TB hard drive.”  He then talked about some storage media we’ll see in the near future, such as the Experimental Holographic Optical Disc (6 TB – 50 TB capacity) and the 7-platter magnetic 3.5″ drive (6 TB).

But ultimately, no matter how we store our stuff, we cannot ignore the need to migrate.  Software changes, formats change, both become obsolete.  He asked the audience if anyone remembered Gopher Space images.

Gopher Space. I had no idea what he was talking about, but the Internet forgets nothing.

They were about 480 pixels wide (an inch tall on a screen). Back then, it was cutting-edge technology. As was VisiCalc, I’m sure, which produced data that is completely incomprehensible to today’s systems.

VisiCalc to Excel

There is no way to migrate one to the other, other than retyping.  Applications are constantly in flux, constantly being updated, “like trying to change tires on a moving car.” No one format is going to work for all types of electronic records, and no one format is going to work 10 years from now.

Strategies

So, how to overcome this multitude of technical obsolescence issues?  The skills required to read older data and programs are becoming more and more rare, as the people who understand them are gradually retiring. Hausman offered five strategies.

1) Print Everything. Paper is durable; it will last longer than most electronic records. The difficulty with paper is that not everything fits on paper. And paper is not always the most convenient option, because not everything is static content. Modern society is focused on moving away from paper toward electronic records.

2) Use Archival Quality Media. If you have to store things on physical media at all, their life span in a best-case scenario would be 100 years. But in actuality, media life spans tend to be about 11 years (that’s the “common use” rate). Optical media is made up of layers of plastic and metal substrate. Tiny little cracks develop every time you put a disc into a machine. These cracks let air in, which causes the metal to oxidize. And you must store digital media in good environments, protected from UV light, temperature and humidity fluctuations, and seismic activity.

3) Keep Spares of Everything. “Permanent doesn’t mean ‘for the next 5 years,’” Hausman said. Unfortunately, creating backups of everything (and storing them in geographically diverse locations) is costly.

4) Virtualize Hardware. (I apologize, my notes on this part were scant, probably due to total unfamiliarity with the term — the points on the slide were: retain complexity, expand skill sets, enhance online threat, risk App/Host compatibility. That makes perfect sense to you, right?)

5) Use Redundancy in the Cloud. Why not store everything in the cloud?  Well, sometimes there’s a problem with loss of local control — “we need our stuff on OUR servers.” (This reminded me of a point made in a presentation about the University of Texas working with Google to implement Gmail university-wide: UT required that Google agree to keep sensitive research data only on servers physically located in the U.S.)  Also, cloud storage is subject to by-use cost (you pay for what you use — the utility model), and it requires operational internet contact. If the internet goes down, so does your data.

Having said that, he regards cloud storage as the best way to store information, for now, as long as the software comes with you (meaning, it’s not enough to just store a file; you have to keep the software that will read the file).

Is there another path?

 

Golden Gate Bridge, San Francisco

The Golden Gate Bridge is painted every 4 years and it takes 4 years to paint it. The same family has been painting the bridge since it was built. It requires continual maintenance because it’s constantly flexing; it’s always changing because of the environment in which it’s situated. What’s the lesson there?

What we need is a change in mindset.  It’s not just protection we should seek, but a mentality of preservation.  Preservation is not a one-time project, but a continuous, collaborative effort.

So, Hausman says, to protect our data we don’t just need a vault; that won’t protect us against accidents, changing skill sets, or other vagaries of technology. We can throw security up around that vault, create the best data storage that exists, use top quality media, make it once and forever perfect… except then, things are going to change again. So we can’t stop with the vault.

And that led Hausman to his final strategy:

6)  Build a Culture of Preservation around Our Data

Select and maintain standards.  Build a bridge between silos — everyone’s got different pieces of the solution, and we all need to work together. Getting money is hard and we all need to pool our resources — across state agencies, divisions, departments. We must accept that taking care of data for the long term takes money and is a continuous effort.

Also, we must implement a cycle of tech refresh for storage and apps. If your backup will last 11 years, then every 5 years you need to refresh. When you move from one software application to another, you need to have management and validation after migration.
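To make the “validation” part of a storage refresh a little more concrete, here is a minimal sketch of a fixity check in Python. This is my own illustration, not something from Hausman’s talk: the function names `sha256_of` and `verify_refresh`, the choice of SHA-256, and the idea of comparing an old storage directory against its freshly copied replacement are all assumptions. It only confirms that files were copied bit for bit onto new media; a migration that changes the file format would need a different kind of validation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_refresh(old_root: Path, new_root: Path) -> list[str]:
    """List files from old_root that are missing or altered under new_root.

    Hypothetical helper: both arguments are assumed to be directories
    holding the same records before and after a media refresh.
    """
    problems = []
    for old_file in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = old_file.relative_to(old_root)
        new_file = new_root / relative
        if not new_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(old_file) != sha256_of(new_file):
            problems.append(f"changed: {relative.as_posix()}")
    return problems
```

An empty list from `verify_refresh` would suggest the copy is intact; anything else is a file to investigate before the old media is retired.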

And we must identify data for preservation and store only what is needed. Not for business continuity, but for the “long term memory of the State.”   As records managers (or “data management professionals”), we must teach people that we are taking care of them.

Otherwise… this is what happens when you try to keep everything:

[shudder]

Hausman wrote a great article on his blog after the conference, if you would like to hear his thoughts directly (and I’d recommend it: he teaches undergraduates, so he has to be an engaging communicator!). Read the full text here: http://www.kkhausman.com/2013/11/06/long-term-data-preservation/

* Maybe because you can delete a tweet? As far as I knew, tweets were safe.