e-Records Conference 2014: Texas Digital Archive update

This is the sixth and final post of a multi-part recap of the 2014 e-Records Conference. Presentation materials from the e-Records Conference are available on the e-Records 2014 website.

By Angela Ossar, Government Information Analyst

I don’t think I’m the only one who looked forward to Mark Myers’ presentation with great anticipation. Mark is TSLAC’s first-ever Electronic Records Specialist: the person who will be building the Texas Digital Archive, the digital repository for the archival electronic records of Texas state government.

Mark’s experience with building digital archives goes all the way back to 1998, when he worked at the Alabama Department of Archives and History. Then, from 2001 until May 2014, he did the same job for the Kentucky Department for Libraries and Archives.

Mark takes issue with our referring to him as the electronic records “expert,” but if anyone is an expert in the long-term care and safekeeping of electronic records at our agency, it’s him. So, lucky for us, he was ready with a crash course on the issues surrounding digital preservation.

Digital Preservation

Electronic records have four essential characteristics, and digital preservation is the challenge of maintaining all of these characteristics over time:

  • Authenticity:  A record must be what it purports to be: it can be proven to have been created or sent by the person claiming to have created or sent it, and to have been created or sent at that time. Metadata — administrative (like access information and audit trails) and technical (like checksums) — is what proves authenticity.
"Authenticity" slide, Mark Myers, "Have Fun Storming the Castle! The importance of records management & IT collaboration for digital preservation and the development of the Texas Digital Archive" (e-Records Conference, November 18, 2014)

“Authenticity” slide. He asked: is that picture on the left a real Civil War soldier, or is it a reenactor’s Instagram post? And regarding the photo on the right: interesting how CBS’s broadcast featured the CBS Eye in Times Square, even though the NBC logo is actually on that building.

  • Reliability:  A reliable record is one whose contents can be trusted as a full and accurate representation of the transactions, activities, or facts to which it attests. To illustrate the importance of reliability, Mark showed two versions of the Kentucky Governor’s website from 2003, one from the live Web and one from an Archive-It capture.

The screenshot on the right is the Governor’s website on the live Web; the one on the left, the captured version. Because the website relied on stylesheets and templates that weren’t crawled, a lot of information was missing. So is the “archived” version reliable?

  • Integrity: The archives must be able to demonstrate that the record hasn’t changed since coming into its possession. That means protecting records from human errors like improper access (tampering) and accidental deletion, as well as technical problems like data loss (“bit rot” or media deterioration: bits get scrambled over time, and there are only so many “flips” an object can take before the data is permanently damaged). One part of integrity is protecting the relationships among the parts of compound (multi-part) files, such as GIS data:

To be able to put GIS data on a map (as it is intended to be expressed), you must preserve all of the parts of this compound file. If you only preserve the XLS spreadsheet in this list, you will end up with data that doesn’t make sense.

  • Usability:  You have to be able to FIND the records. This requires consistent file/folder names, indexes and inventories, and search and retrieval. “Keyword searching is not always your friend,” Mark said. “How many of you go to page 100 of a Google search?”
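The checksums mentioned under Authenticity, and the "hasn't changed" test under Integrity, can be illustrated with a small fixity check. This is only a sketch and not something shown in Mark's presentation: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical, and SHA-256 is simply an assumed choice of checksum. The idea is to record a checksum for every file when records are received, then recompute later and compare.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded at transfer time but no longer present (e.g. a lost part
        # of a compound object).
        "missing": sorted(name for name in manifest if name not in current),
        # Still present, but the bytes no longer match the recorded checksum.
        "altered": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Present now but never recorded in the manifest.
        "unexpected": sorted(name for name in current if name not in manifest),
    }
```

Because the manifest lists every file in a transfer, the same comparison also flags a compound object that has lost one of its parts, which is the GIS concern described under Integrity. A check like this only detects damage; recovering from it still depends on having other good copies.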

One serious digital preservation challenge is the fact that digital records do not survive without constant attention. Paper records, by contrast, can survive in a state of “benign neglect”: it’s not good to store records in a basement, but if the environment is controlled, the records may survive without the archivist’s attention.

Digital records, though, are vulnerable to change, which, of course, is a constant of technology. Technological obsolescence becomes a preservation challenge when software becomes obsolete; when records depend on specific hardware or a specific system (for example, how an iTunes song can only be played on an Apple product); when records are in non-standard or specialty formats (the more ubiquitous a file format, the better its chance of surviving); when systems are replaced; when records are no longer used (if they aren’t used, is anyone paying attention to them?); and when records are kept on removable media (he gave the scenario of putting files onto a CD, putting that CD on a shelf, and then pulling it down 20 years later, when the label has fallen off and you probably don’t have a CD drive on your computer anymore).

How to avoid some of these digital preservation pitfalls? Good records management! Follow your retention schedule, centralize control through recordkeeping systems, and establish policies and procedures that support the program.

Given Mark’s love of science fiction and fantasy (I can personally attest to his Lord of the Rings and Star Wars fandom; plus, the title of his presentation was “Have Fun Storming the Castle!”), I should have expected this appearance by Captain Kirk and crew in his slideshow:

He does not disappoint.

It’s just logical.

On that show, the ship’s computer would read anything dropped into it, regardless of what it was. Unfortunately, we don’t live in a world with that kind of technology (yet).

So we can assume that he’s hard at work replicating that computer for Texas, right? After all, he is the expert.

The Texas Digital Archive

Well, not exactly. The beginning of the Texas Digital Archive was much more down-to-earth: it began with people talking to each other. TSLAC was approached by the Office of the Governor in February 2014 about the transfer of 10 terabytes of data to the State Archives, and the office was willing to offer us funding to do it. Ten terabytes may not sound like a lot of data. In the previous session, Alan Webber had dismissed the idea that 10 TB was a big deal, considering the cost of 10 TB’s worth of external hard drives from Amazon.com. But Mark pointed out that in Kentucky, the entire digital archive (representing records from 100+ different agencies) comprised only 5 TB.

The State Archives couldn’t simply store all of the Governor’s records on an external hard drive, of course. For one thing, it would be extremely risky to maintain such important records exclusively on removable media; for another, the archives must also be able to actually find specific records later, should they be requested under the Public Information Act. Digital preservation doesn’t end with finding a storage medium: it’s about keeping records active and usable over time.

We (staff from the State Archives and the State and Local Records Management Division) began meeting with the Governor’s Office monthly to discuss a plan for the transfer. Budget has been central to those discussions. With the funding provided by the Governor’s Office, TSLAC will be on the way to building what’s called a Trustworthy Digital Repository (as specified by ISO 16363).

Once TSLAC has completed purchase of its digital preservation software system, we will be able to ingest the Governor’s Office archival records into the system, apply appropriate metadata, and (eventually) make the records accessible.

Plans for the Future

Mark mentioned two initiatives he’ll be working on at TSLAC. The first will be the creation of an Electronic Records Working Group through RMICC to focus on issues related to electronic recordkeeping/preservation and make sure that records managers and technologists from a variety of agencies are talking to each other.

The second is to build partnerships with other agencies. The State Archives does plan to begin taking in archival records from other state agencies in the future, and to work with our division (State and Local Records Management) to develop training packages for digital preservation and serve as a knowledge base for state agencies and local governments.

Speaking of partnerships, I have had the pleasure of working directly with Mark on the Governor’s Office project, attending monthly meetings alongside colleagues from the State Archives and Information Resources Technologies divisions since our first meeting in February. It has been incredibly educational to see how a project of this magnitude gets off the ground, and I can personally attest that the strong partnerships, both within our own agency and with the Governor’s Office, have been key to its success!