Why Isn’t Everything in the Archives Available Online?

By Steven Kantner, Digital Asset Coordinator

In our online culture today, we often expect to find everything we need just one click away on the Internet. It’s easy to wonder why everything in the archives isn’t digitized and online. Unfortunately, that is easier said than done. The Texas State Library and Archives Commission (TSLAC) holds more than 82,000 cubic feet of records, consisting of millions of sheets of paper and thousands of audiovisual recordings, and that total grows each year as more records are transferred to TSLAC. Digitization is an expensive undertaking that requires a great deal of time and labor. For many people, quickly scanning family photos to post on social media, or running a stack of standard office paper through a copier, can make digitization seem simple. However, digitizing photographs, documents, and recordings properly for digital preservation and online access is not as easy as it seems.

How do we prioritize what should be digitized? We usually weigh several factors. The first is demand for an item or collection. Digitizing frequently requested materials greatly reduces handling of the originals, because we are able to send the existing digital file to patrons who request a copy instead of retrieving the item again. Historical value, or interest in an item or collection, is also a factor in prioritization. This often overlaps with demand, but not always.

Media formats at high risk of degradation, such as motion picture film, audio, and video, often rise to the top of priority lists; some old motion picture film has already become unplayable.

Above: An acetate photographic negative displaying severe degradation (top) compared to a more stable example (bottom).

Estimates suggest that we have less than 20 years to digitize magnetic audio and videotape formats before the technology becomes so obsolete that no one is left to repair playback equipment or manufacture parts, or before the tape itself degrades beyond playback.

A delaminating lacquer disc from the 1940s.

The first step in a digitization project is gathering detailed information about what is in the collection. This involves assessing condition, creating an inventory, and documenting metadata. Every item in a collection to be digitized should be reviewed for condition. Some items, like 19th-century muster rolls, need conservation treatment; others, like motion picture film or audiotape, may be degrading and need remediation before they become unplayable.

While reviewing a collection’s condition and contents during inventory, we may find papers folded in ways that make them difficult to digitize without causing damage. Such papers must be humidified and flattened before digitization starts.

Metadata, or information about the digital item, is something most people don’t think about. Family photographs are a good example of where the average person may have experienced a lack of metadata. Often, the names of the people pictured aren’t written on a photograph, and nobody alive today can identify them. Those identities are now “lost” because the photo carries no metadata. This is why metadata is crucial for digitization, and why staff spend as much time in front of a spreadsheet as in front of a scanner. The better the metadata we can provide, the easier it is for the item to be discovered through online searches.

Creating metadata requires a good deal of time to inventory and document information about each item being digitized. Gathering this intellectual data is of the utmost importance, but sometimes we don’t have complete information, just like the family photos that are missing names. Nor do we have the resources to research in depth every single item we come across in a sparsely documented collection. We often rely on inventories provided by donors, which may or may not be as detailed as we would like. At the very minimum, before any digitization takes place, we need an inventory list that gives each object a basic description (such as a title) and a unique identifier.
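To make that minimum requirement concrete, here is a hypothetical Python sketch, not TSLAC’s actual tooling, that checks an inventory exported from a spreadsheet as CSV. It assumes made-up column names, `identifier` and `title`, and flags rows with a missing title or a missing or duplicated identifier.

```python
import csv
import io
from collections import Counter


def check_inventory(csv_text):
    """Return a list of problems found in a CSV inventory.

    Assumes hypothetical columns named 'identifier' and 'title', with one
    record per line and no blank lines, so row numbers match the spreadsheet.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Count how often each identifier appears so duplicates can be flagged.
    counts = Counter((row.get("identifier") or "").strip() for row in rows)
    problems = []
    for row_no, row in enumerate(rows, start=2):  # row 1 is the header
        ident = (row.get("identifier") or "").strip()
        title = (row.get("title") or "").strip()
        if not ident:
            problems.append(f"row {row_no}: missing identifier")
        elif counts[ident] > 1:
            problems.append(f"row {row_no}: duplicate identifier {ident}")
        if not title:
            problems.append(f"row {row_no}: missing title")
    return problems


# Hypothetical sample inventory; identifiers and titles are invented.
sample = """identifier,title
ITEM-001,Muster roll
ITEM-002,
ITEM-001,Letter
"""
for problem in check_inventory(sample):
    print(problem)
```

Running the check before any scanning starts means gaps surface while the items are still on the table, rather than after files have been named and processed.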

If we are going to digitize an item, we ideally want to go through the process only once. Digitizing the same documents over and over wastes time, money, and labor. Digitization also increases wear and tear on the object: bound volumes’ bindings weaken, audio cassette tape gets “eaten” by the player, and so on. Preserving an object means handling it as little as possible.

TSLAC Photograph Archivist Cait Burhan uses an overhead scanner to image a fragile document.

Today we digitize to create a high-quality digital master that is preserved in our digital preservation system. We create copies of this master file for the public to view online or receive upon request. We follow best-practice guidelines for digitization from sources such as FADGI (http://www.digitizationguidelines.gov/) and have created our own policies and rules for our in-house digitization program.

Standards for digitization are much more stringent than what you might do at home. Large uncompressed TIFF files are the standard for master images; compressed JPEG files are used only for web copies in our current processes. The resources required also vary by type of media. One typed document may be imaged fairly quickly, but one photograph may take five minutes to digitize at the proper preservation resolution, and time-based media may take hours, depending on the length of the audio or video recording. For example, it has taken TSLAC’s digital asset coordinator a year and a half to digitize almost 400 audio reels (about 13 cubic feet) from just one collection, while working on other projects simultaneously.

Once digitization is complete, the metadata must be converted into a format the digital preservation system can ingest alongside the item it describes. Essentially, each item listed in a large spreadsheet becomes an individual metadata record that travels with its digital object. Web copies must be created from the master files, and local file management takes time to keep the files safe before they go into the preservation system.
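The spreadsheet-to-record step can be sketched in Python with the standard library’s `csv` and `xml.etree.ElementTree` modules. This is a simplified illustration, not the format TSLAC’s preservation system expects: the flat `<record>` element structure and the `identifier` column are assumptions, and each column header is assumed to be a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_records(csv_text, root_tag="record"):
    """Turn each spreadsheet row into its own XML metadata record.

    Returns a dict mapping each row's (hypothetical) 'identifier' value to a
    serialized XML string. Column headers become element names as-is, so
    they must already be valid XML names.
    """
    records = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.Element(root_tag)
        for column, value in row.items():
            child = ET.SubElement(record, column)
            child.text = (value or "").strip()
        records[row["identifier"]] = ET.tostring(record, encoding="unicode")
    return records


# Hypothetical sample rows; the values are invented for illustration.
sample = """identifier,title,date
ITEM-001,Early Republic of Texas law,1836
ITEM-002,Muster roll,1861
"""
for identifier, xml_text in rows_to_records(sample).items():
    print(identifier, xml_text)
```

In practice each string would be written to its own file, for example one XML file per item saved beside that item’s TIFF masters, so the description stays with the object it describes.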

An XML metadata file along with two TIFF images of pages from an early Republic of Texas law.

In addition to all of these procedures, mistakes do happen; we are human, after all. Guarding against human error means building in quality control measures. The metadata and files should be reviewed by multiple people at multiple points to catch errors and reduce rework. This step adds some time and effort to a project, but it prevents redoing work, which would be a far greater investment of time and effort.
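Some quality control checks can be automated alongside human review. As a hypothetical example, the Python sketch below compares a listing of master files against the identifiers in the metadata and reports anything that appears on only one side. It assumes a made-up naming convention, `<identifier>_<page>.tif`, so that an item imaged as several pages maps back to a single metadata record.

```python
import os


def cross_check(filenames, metadata_ids, master_ext=".tif"):
    """Compare master files against metadata identifiers.

    Assumes a hypothetical naming convention '<identifier>_<page>.tif'
    (e.g. 'ITEM-001_0002.tif'), and that identifiers contain no underscores.
    Files with other extensions are ignored.
    """
    file_ids = set()
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() == master_ext:
            file_ids.add(stem.split("_")[0])  # 'ITEM-001_0002' -> 'ITEM-001'
    ids = set(metadata_ids)
    return {
        "files_without_metadata": sorted(file_ids - ids),
        "metadata_without_files": sorted(ids - file_ids),
    }


report = cross_check(
    ["ITEM-001_0001.tif", "ITEM-001_0002.tif", "ITEM-002_0001.tif",
     "ITEM-004_0001.tif", "notes.txt"],
    ["ITEM-001", "ITEM-002", "ITEM-003"],
)
print(report)
```

A check like this only catches mismatches between files and records; whether a title or date is actually correct still needs a person to look.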

For a large archival institution like TSLAC, imaging materials is only a small portion of the work involved in digitization. In each phase of a project, from selection to metadata creation, delivery, and preservation, there is an array of factors to consider and procedures to implement. Beyond the technical aspects, legal issues such as copyright and privacy can prevent archives from offering some images from their collections on the web. For now, patrons may explore a generous sampling of TSLAC’s holdings by visiting the Texas Digital Archive and other online collections.
