The Archives and Information Services division at the Texas State Library and Archives Commission (TSLAC) has sophisticated electronic records processes. Many agencies that submit their records to the Archives may not know how their records are incorporated into TSLAC’s archival inventory. During e-Records 2020, Brian Thomas, an Electronic Records Specialist at TSLAC, presented “What Happens to Electronic Records Sent to the State Archives?” The audience had the chance to get an in-depth look into the process of preserving electronic records. I spoke to Brian about his presentation and asked questions for our readers:
1. What is the difference between ingest and migration?
Put simply, ingest is the process of adding electronic records to the digital repository. Migration is the process of converting an electronic record from one format to another, whether that be an upgraded version of the same format (.doc to .docx) or a completely different format altogether (.doc to .pdf). In a digital preservation system both the original record and the migrated record are tied together as versions of the same thing. Since file formats can vary widely, migration to standardized and common formats is important to ensure records are accessible long-term.
Ingest itself encompasses many steps: packaging files; making checksums to ensure that the files added to the repository are the same as what was sent; virus scanning; characterization of files by the system to identify what types of files have been added; storage of extracted data as well as added descriptive data about a file; storage of the records themselves; and finally tying it all together in the repository database. The digital repository managed by TSLAC includes the option to perform standardized migrations, based upon rules we set, as a last step in the ingest process to ensure that this necessary step is taken.
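To make the checksum step concrete, here is a minimal Python sketch of a fixity check. It is an illustration only and not TSLAC's actual tooling: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the choice of SHA-256 are assumptions made for the example. A manifest is built before transfer and checked again after the files arrive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum (done before transfer)."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = folder / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    # Hypothetical transfer folder created just for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        transfer = Path(tmp)
        (transfer / "minutes.txt").write_text("Board minutes, January 2020")
        manifest = build_manifest(transfer)
        print(verify_manifest(transfer, manifest))  # no problems expected
        (transfer / "minutes.txt").write_text("altered after transfer")
        print(verify_manifest(transfer, manifest))  # the altered file is listed
```

If the list returned by `verify_manifest` is empty, the files received match what was sent. Any path in the list points to a file that was lost or changed along the way.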
2. What is metadata, and why is it so important when archiving electronic records?
Metadata is descriptive information (data) about stuff. This is such a pervasive thing that it can be hard to describe. To use a common example, imagine you are buying a season of your favorite television show, sold at a local store in whatever format you prefer (VHS, DVD, Blu-ray, etc.). The tangible physical object just is. At the store, the price tag is descriptive information about the cost of the object. The back of the box holding the copy of your favorite season of the show (you know the one) has descriptive information about the content of the season and perhaps the show as a whole. If there are multiple carriers (in 2020 this would be discs) of the season in the box, there is probably descriptive information about which episodes of the show are held on each carrier. Without all of this metadata you would be holding a blank box that may or may not be free but definitely is a mystery.
As with the prior example, metadata about electronic records is important to knowing what they are and what characteristics they have. Metadata about the records set is important to know the context of where the records come from and why we have them. Metadata embedded in the file itself tells us what characteristics the file has, which in turn tells us how to handle the file. Many different types of files share the same file extension (which is itself metadata to give a computer a heads up on how to handle the file), so embedded metadata can have more details about the format. Think of a PDF scan of a physical document vs. an email saved as a PDF: same file extension and same programs used to access them, but wildly different in how the program handles them. If a scan or a document goes through Optical Character Recognition (OCR), the embedded text of the document is metadata about the image that was scanned. If you use office productivity software or something like SharePoint, metadata may exist about who created or last modified a file. A digital repository extracts, stores, and uses metadata about records to manage them and provide end users contextual information about the record. If migration occurs, metadata in a database is used to tie the two versions of the record together as a single thing. If it is important to a researcher to know when files were created, embedded timestamp metadata (extracted by the repository) can allow them to search on that data point. Please see my blog post “Metadata: The Description is Out There” for more information.
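As a small illustration of why a file extension alone is not enough, the Python sketch below compares a file's extension with the leading bytes (the "signature") embedded in the file itself. It is a deliberately simplified stand-in for the characterization a digital repository performs, not a description of TSLAC's system: the `SIGNATURES` table covers only a handful of well-known formats, and the function names are hypothetical. It also cannot tell a scanned PDF from an email saved as a PDF, which requires looking deeper into the file.

```python
from pathlib import Path

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container (also used by .docx, .xlsx, .pptx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (legacy .doc, .xls, .ppt)"),
]


def identify(header: bytes) -> str:
    """Return a label for the first signature that matches the header bytes."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def describe(path: Path) -> dict[str, str]:
    """Report both the extension (external metadata) and the detected format."""
    with path.open("rb") as handle:
        header = handle.read(16)
    return {
        "name": path.name,
        "extension": path.suffix.lower(),
        "detected": identify(header),
    }
```

For example, a file named `report.doc` that actually begins with the bytes `PK\x03\x04` would be reported with the extension `.doc` but detected as a ZIP container, a hint that it is really a newer Office format that was misnamed.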
3. What do you do to make access easier for the public with regards to electronic records?
This is tough, so I’ll split it in two.
TSLAC takes the electronic records received as part of an agency transfer and verifies that they have archival value. After that assessment, we make sure that the records are arranged in a logical fashion and create metadata about that arrangement, as well as metadata about the contents of the records as a collection and the agency itself. This is called a finding aid, and it provides context for the public to help orient them on what they are looking for.
I then take electronic records that have been arranged and do pre-ingest migrations to accessible formats if the preservation system cannot handle that type of format (for example, databases). I break the finding aid into pieces that correspond to the folders and individual records being ingested and pair them to ensure a researcher always knows, at the very least, what collection the file comes from. I also take any metadata provided by an agency and try to leverage that into enhanced metadata for end-user research. In addition, I perform large-scale data manipulation of electronic records in the Texas Digital Archive (TDA) to rectify any problems and enhance access, such as a current project to assign dates of creation to every item in the TDA to enhance searchability. This type of work lends itself to programming and data science. I research how best to handle different types of records and formats, including topics such as Artificial Intelligence/Machine Learning and the question of whether social media is an archival record for Texas. Finally, I manage the TDA public access portal, which provides access to the electronic records that should be public and contains webpages with parts of the finding aids described above.
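The idea of pairing pieces of a finding aid with folders and files can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the `FINDING_AID` entries, the folder names, and the `context_for` function are invented for the example and do not reflect how the TDA stores its descriptions. The point is that every file inherits the descriptions of the folders above it, so it always carries at least the collection-level context.

```python
from pathlib import PurePosixPath

# Hypothetical finding-aid fragments keyed by folder path within a transfer.
# The empty key stands for the collection as a whole.
FINDING_AID = {
    "": "Example Agency records (collection level)",
    "minutes": "Meeting minutes series",
    "minutes/2019": "Meeting minutes, 2019",
}


def context_for(file_path: str, finding_aid: dict[str, str]) -> list[str]:
    """Return descriptions from the collection level down to the nearest folder."""
    folder = PurePosixPath(file_path).parent
    # Walk from the file's own folder up to the top of the transfer.
    chain = [folder, *folder.parents]
    keys = ["" if str(p) == "." else str(p) for p in chain]
    # Reverse so the broadest description (the collection) comes first.
    return [finding_aid[k] for k in reversed(keys) if k in finding_aid]


if __name__ == "__main__":
    for line in context_for("minutes/2019/january.pdf", FINDING_AID):
        print(line)
```

A file in a folder with no description of its own, such as `photos/event.jpg` in this example, still comes back with the collection-level entry, which is the minimum context a researcher needs.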
4. What type of preservation work do you do with regards to electronic records?
Preservation work depends on the needs of the format type in question. For standard formats handled by the preservation system, I will check on ingested files to ensure everything migrated properly to the correct file format. For file formats not handled, or not handled well, by the preservation system, I will track down software capable of performing an appropriate migration if it is available, manually make that migration to a long-term stable format, and ingest both versions of the record as a unit. For compound items, such as DVD video or GIS, I may make an access copy of the item which combines all of the components into a single unit. DVD video to web-streamable mp4 files is a great example of that work.
5. Where can we find more information with regards to electronic records preservation?
More information about digital preservation at TSLAC can be found in the Texas State Library and Archives Commission Digital Preservation Framework. This information does not include academic sources. Digital preservation tends to be a learned skill combining multiple fields of knowledge. Although TSLAC cannot endorse any particular vendor, a great academic resource is the InterPARES Project. The Community Owned digital Preservation Tool Registry (COPTR) is another resource for getting a handle on digital preservation. Finally, the Council of State Archivists has a section on handling electronic records which can be a good place to look.
It is important to note that the purpose of electronic records preservation can vary by circumstances. What works for one office or organization may not be viable for another. For records management questions, I would recommend reaching out to your SLRM liaison for information. For many common situations they may have an answer to your concern. If they do not have an answer to your question, they may refer you to TDA staff who can give guidance based on the situation you describe.