e-Records 2012: The Fascinating (and a Little Frightening) World of Computer Forensics

This article is the first in a series of recaps of the 2012 e-Records Conference, a conference dedicated to electronic records management that has been co-sponsored by TSLAC and the Department of Information Resources since 2000. Presentations from the e-Records Conference are available on the e-Records 2012 website.

By Angela Ossar, Government Information Analyst

I think I speak for everyone in Craig Ball’s session, “e-Discovery:  Nerdy Things You Should Know About Computer Forensics,” when I say that computer forensics (when explained right) is downright riveting.

Ninety-two percent of information is born digital. This means that, increasingly, forensic evidence is digital.  In a rousing afternoon session, Ball talked about the treasure trove of evidence that we leave behind when we use computers and smart phones, and introduced computer forensics as the way that we find the story behind the human drama of what people see, hide, steal, and think – essentially, using our digital fingerprints to read minds.

Sometimes these fingerprints are more overt than others, like an incriminating text message (“im so wasted”) or Facebook status.  But, as “mute witnesses,” cell phones are capturing a lot of evidence that many people don’t even know about. For example:

  • Your location.  The EXIF metadata embedded in a digital photo can record the latitude and longitude of the spot where the photo was taken, typically whenever location tagging is switched on. And even if you turn off your phone’s GPS functionality, if the phone is set to automatically look for a wireless signal (which most of our phones do), your movements are still traceable – an investigator would simply look for the locations where your phone “pinged” for a wireless signal.
  • Your words.  Phones keep track of every word you type into them – these keystroke logs (much like a device you’d illegally attach to a computer at the bank to try to capture people’s usernames and passwords) are how they anticipate what you’ll type next. They do this to serve you better – it’s faster to write a text message when your phone can suggest what you might want to say.
  • Your browsing activity.  Ball talked about the employee who views inappropriate content at work.  This employee probably knows to clear his History File and his Internet cache.  But Ball talked about all the other relatively easy ways that the browsing history can be tracked, including index.dat files, Thumbs.db files, registries, USBSTOR keys, Recycle Bin INFO2 files, Prefetch folders, swap files, system logs… Also, he explained how an Apple iPhone takes constant screenshots – it’s how it produces the nifty disappearing visual when you press the Home button.  And according to Ball, it stores those screenshots for a long time.

So, unless you really, really know what you’re doing, it’s pretty hard to conduct criminal activity on a computer or phone without getting caught, as long as a computer forensics expert knows what to look for.
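
For the curious, here is a rough sketch of how the location point above plays out in practice. EXIF stores each coordinate as degrees, minutes, and seconds plus a hemisphere letter, and a few lines of Python turn that into the decimal numbers a mapping tool expects. The function name `dms_to_decimal` and the sample values are made up for illustration; this is not a tool shown in the session.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere
    reference ('N', 'S', 'E', 'W') into signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # South latitudes and west longitudes are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical tag values, shaped like the EXIF GPS fields in a photo.
sample_gps = {
    "GPSLatitudeRef": "N",
    "GPSLatitude": (30, 16, Fraction(2802, 100)),
    "GPSLongitudeRef": "W",
    "GPSLongitude": (97, 44, Fraction(2538, 100)),
}

lat = dms_to_decimal(*sample_gps["GPSLatitude"], sample_gps["GPSLatitudeRef"])
lon = dms_to_decimal(*sample_gps["GPSLongitude"], sample_gps["GPSLongitudeRef"])
print(f"{lat:.5f}, {lon:.5f}")
```

Reading the tags out of a real photo would need an image library, which is left out here to keep the sketch self-contained.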

Did you delete that email…or delete-delete it?

In our Managing Electronic Records (MER) course, we talk about the difference between deleting data… and “delete-deleting” it.  Ball got particularly nerdy when talking about how, when we delete something from our computers, we haven’t really deleted it.  I cannot hope to explain the concepts of vectors, sectors, clusters, and undulating magnets in the dynamic way that Ball did.  But what I do understand is that when you delete something, it’s not gone until it’s overwritten.  Basically, the information still exists on your hard drive; you’ve merely told your computer that that part of the drive is available for reuse.  Ball compared it to throwing away a card from a library’s card catalog: the card is destroyed, but the book is still on the shelf.
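
To make the card catalog idea concrete, here is a toy model in Python. It is a deliberately simplified sketch, not how any real file system works: the `ToyDisk` class, its `carve` method, and the file names are all invented for illustration. The “directory” plays the part of the card catalog and the “blocks” play the part of the books.

```python
class ToyDisk:
    """A simplified model: a directory mapping file names to block
    numbers, plus a list of data blocks. Real file systems are far
    more involved; this only illustrates the idea."""

    def __init__(self, block_count):
        self.blocks = [b""] * block_count
        self.directory = {}               # file name -> block number
        self.free = set(range(block_count))

    def write(self, name, data):
        block = min(self.free)            # take the lowest free block
        self.free.remove(block)
        self.blocks[block] = data         # replaces whatever was there
        self.directory[name] = block

    def delete(self, name):
        # Only the directory entry goes away; the block is merely
        # marked as available. The bytes themselves stay put.
        block = self.directory.pop(name)
        self.free.add(block)

    def carve(self):
        """Return leftover data found in blocks marked as free."""
        return [self.blocks[b] for b in sorted(self.free) if self.blocks[b]]


disk = ToyDisk(block_count=2)
disk.write("memo.txt", b"meet me at noon")
disk.delete("memo.txt")
print(disk.carve())     # the "deleted" text is still readable

disk.write("report.txt", b"quarterly numbers")
print(disk.carve())     # gone only once the block has been reused
```

The first `carve` call still finds the memo, because deleting only removed the catalog card. The second finds nothing, because the new file was written over the same block.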

Another interesting take-away for me was, at long last, a good explanation of what a file header label is!  Why is this so exciting? Because in the aforementioned MER course, we have to explain that, when you image something, you have to make sure that “a non-proprietary image file header label” is used (13 TAC 6.96(f)(1) for state agencies, 13 TAC 7.76(g)(1) for local governments).


A screenshot of Bulletin 1’s title page opened as a text file – see the first four characters?

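So, what the heck is an image file header label?  Well, for a JPEG, explained Ball, it’s these 4 characters: ÿØÿà.  If you open up a typical JPEG as a text file, you will see that it begins with ÿØÿà (the bytes FF D8 FF E0 in hexadecimal).  Strictly speaking, the first three bytes are what mark a file as a JPEG; the fourth varies with the flavor of JPEG, so photos straight from a camera often begin with ÿØÿá instead.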

This information tells a computer forensics expert what kind of file they’re dealing with even when they do not have the software to open that kind of file. And it would tell a digital archivist, for example, what software to search for in the PRONOM Registry to enable the archivist or an archives researcher to access the file. The image file header can be the key to ever viewing that image again — while we may understand ÿØÿà, I, for one, do not get much from V¥»ü    n†«¨ëúÊÚJ=ÝýÜuõýœ/Éë7±ÛŒe36 — but maybe that’s just me.
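
Here is a minimal sketch of how a program might use those opening bytes to guess a file’s type without any software that can open it. The signature table below is a small hand-picked sample of well-known formats, not a substitute for a registry like PRONOM, and the function names `identify` and `identify_file` are made up for this example.

```python
# A small sample of well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header):
    """Guess a file type from the first bytes of its content."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path):
    """Read only the first few bytes of a file and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


jpeg_start = b"\xff\xd8\xff\xe0"
# Viewed as Latin-1 text, these four bytes are the characters ÿØÿà.
as_text = jpeg_start.decode("latin-1")
print(as_text == "ÿØÿà")
print(identify(jpeg_start))
```

Only the first three bytes are matched for JPEG, so the sketch recognizes a file whatever its fourth byte happens to be.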

The last nerdy thing that I found interesting was Ball’s explanation of “hashing.” Have you ever heard of de-duplication software?  Some organizations use it to ferret out all the convenience copies that are spread all over a network. It works through what is called a hash algorithm, which boils any stream of bits down to a short, fixed-length value. A digital file’s hash value is the item’s “digital fingerprint.” Hash values can be generated for any kind of electronic information: from a single typed word, to a word processing document, to an entire hard drive.  A hash value is several thousand times more unique to a digital file than a human’s DNA is to that person. By running two bitstreams through the same hash algorithm and comparing the results, you can tell whether the two pieces of information are exactly alike: matching values mean that, for all practical purposes, the contents are identical, and even a one-character difference produces a completely different value.
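
As a rough illustration of the de-duplication idea, the sketch below groups items by their hash value and reports any group with more than one member. It is an assumption-laden toy, not how any particular product works: SHA-256 is simply one widely used hash algorithm picked for the example, and the function names and sample documents are invented.

```python
import hashlib
from collections import defaultdict


def digest(data):
    """Return the SHA-256 hash of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(items):
    """Given a mapping of name -> bytes, return lists of names whose
    contents hash to the same value."""
    groups = defaultdict(list)
    for name, data in items.items():
        groups[digest(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical documents: two identical copies and one draft.
documents = {
    "policy_final.txt": b"Retention period: 3 years",
    "Copy of policy_final.txt": b"Retention period: 3 years",
    "policy_draft.txt": b"Retention period: 2 years",
}
print(find_duplicates(documents))
```

The file names play no part in the comparison; only the contents are hashed, which is why a renamed convenience copy is still caught.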

For example, the text string “Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.” should produce the hash value 73168f4191456bc526791a83c064997b using MD5 (one of the most common hash algorithms). No matter who runs the hash algorithm or what software they use to run it, the result should always be 73168f4191456bc526791a83c064997b, as long as the input is identical down to the last character.  Feel free to try it yourself. This is how a digital archivist can prove that a file has remained uncorrupted while in the archives’ custody: the hash algorithm produces a checksum, and a matching checksum says that this file is the same as it was when we got it.
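
If you would like to try it yourself, here is one way to do it with Python’s built-in `hashlib` module. This is a minimal sketch: the helper names `md5_hex` and `is_unchanged` are made up, and the short sample text stands in for a real file. Keep in mind that the result depends on the exact bytes fed in, so curly versus straight quotation marks, a trailing period, or a different text encoding will each change the value.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Return the MD5 hash of a text string as a hex string."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


def is_unchanged(data, recorded_digest):
    """Compare the current MD5 of some bytes with a digest recorded earlier."""
    return hashlib.md5(data).hexdigest() == recorded_digest


# Stand-in for a file's contents at the moment it entered the archives.
original = b"Four score and seven years ago"
recorded = hashlib.md5(original).hexdigest()

print(is_unchanged(original, recorded))
print(is_unchanged(b"Four score and seven years ago.", recorded))
print(md5_hex("abc"))
```

The second check fails because of a single added period, which is exactly the sensitivity that makes a checksum useful for proving a file has not changed.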

Deep into the creepier parts of computer forensics, Ball asked the audience something like, “who here wants to go home and throw their computer out the window?” It’s true: I think we were all a little dismayed by all the ways our computers track our every move (and don’t delete the information when we think we’ve deleted it). But the session definitely helped me understand some of the technical aspects of the way information is captured and stored, and succeeded in making a very nerdy topic extremely interesting.
