“Words – so innocent and powerless as they are, as standing in a dictionary, how potent for good and evil they become in the hands of one who knows how to combine them.”
Last time on “My Favorite Artifacts,” I gave a brief overview of what forensic artifacts are and which of my personal favorites I’d be covering in the months to come. In this month’s installment, I’m tackling user dictionary files and sharing a method I call “user dictionary date bracketing” to learn more from these files than you might think possible at first glance.
I first learned the power of smartphone user dictionary files during a child sexual assault investigation several years ago. Tragically, the 11-year-old child’s parents were killed in a car crash. She was sent to live with her aunt, who resided with a convicted sex offender. Within a few months of moving into the home, the child was victimized by her new guardian.
The perpetrator would use the Notes app on an iPad to pass messages to the child about what he intended to do to her. The child would pass messages back to the perpetrator on the same iPad. After the iPad was passed back to the perpetrator, he would erase the notes.
The child was able to provide details about the assaults in a Safe Harbor interview, but was unsure of the dates when they occurred. The prosecutor needed to know the date range of the assaults in order for the case to be charged, so the missing dates proved to be a frustrating roadblock. The child was able to provide some details about the contents of the messages passed back and forth on the iPad, though, which helped focus the forensic examination and ultimately led to a way of finding the needed date ranges.
While I found many fragments of deleted notes in free pages of the notes.db SQLite database file, I was unable to find the dates corresponding to when the entries were made. Frustrated, I resorted to keyword-searching individual words within the recovered fragments. I soon found that nearly all of the words I searched for had matches in the user dictionary file.
On smart devices with touchscreen keypads such as iPhones and iPads, user dictionary files are indispensable. Dictionary files help the device user spell things correctly and make the device easier to use, for example through predictive suggestions (though plenty of people have found autocorrect to be a bugbear in its own way). All sorts of devices use them, regardless of the mobile operating system involved. The dictionary file may be populated by some applications and not others, depending on whether the app has permission to write to the file.
Most default applications on the device can make use of the user dictionary. Its content may even include words synced from other associated devices. As a result, the dictionary file captures portions of the content that gets typed on the keyboard of the smartphone. In this way, the dictionary file is somewhat like a keystroke logger, although it only captures some typed content instead of everything.
A typical user dictionary file reads a bit like strange spoken word poetry and looks like this when opened in Notepad:
It’s easy to see how much information we can glean about the phone’s user just from reading the contents of the file. Usernames, app names, some context about what the user is typing about, places of employment, and potentially even passwords can be found in the dictionary file.
(This particular dictionary file might look like a passage from a twenty-first-century rewrite of James Joyce’s Ulysses, but it actually comes from the SANS FOR585 Advanced Smartphone Forensics course. Many thanks to my course co-authors Heather Mahalik and Lee Crognale for letting me use this dictionary from one of our iOS labs! By the way – they are aware of the potential OSINT factor – this is test data.)
Dictionary file content generated by user interaction with the device is generally (but not always) populated sequentially as the words are typed into the keyboard. This means not only that you can find snippets of conversations or typed strings in the dictionary, but also that they are laid down roughly in the order they were typed. There aren’t any conveniently placed notations about the date and time at which the content was typed, but the sequential order of the words themselves can be of great investigative value.
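To make the "sequential order" idea concrete, here is a minimal sketch of pulling readable entries out of a dictionary file while preserving their order. It is a generic strings-style pass, not a parser for any documented format: the assumption that entries are runs of printable ASCII separated by non-printable bytes is mine, and the function name ordered_entries and the sample bytes are hypothetical.

```python
import re
from typing import List

# Assumption: entries appear as runs of printable ASCII separated by NUL or
# other non-printable bytes. Runs shorter than two characters are skipped,
# so single-letter words are lost, as are non-ASCII words.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{2,}")


def ordered_entries(raw: bytes) -> List[str]:
    """Return printable runs in the order they appear in the raw bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(raw)]


# Hypothetical sample bytes, standing in for the contents of a real file.
sample = b"dude\x00what\x00the\x00\x01backup\x00sight\x00for\x00"
print(ordered_entries(sample))
```

On a real extraction you would read the file in binary mode and pass its bytes to the function. The position of each entry in the returned list is what matters for the bracketing described next.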
For example, imagine that the yellow highlighted string “backup sight for delta point reflex sight” is important to an investigation, but we don’t know when the user typed it. We can run keyword searches across all of the data extracted from the device, using unique terms and words positioned around the phrase, until we find matching hits.
In this case, the words “dude” and “what the” were created in the dynamic-text dictionary before the phrase we searched for. They were created when the user, Gus Thomas, sent a text message on May 9 at 1:45 and 1:46 (UTC+0). The term “the purge” was then searched for on May 22 at 1:51 (UTC+0), resulting in these words being populated to the user dictionary. Because our phrase sits between those two sets of entries, it was most likely typed between May 9 and May 22. We could potentially narrow the time frame further with additional keyword searches, but already we know more about the timeline of the events than we did before.
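The bracketing logic in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary has already been reduced to an ordered list of words, each dated artifact (a text message, a browser search) is supplied as a list of words plus a timestamp label, and only the first occurrence of each phrase is considered. The function names find_phrase and bracket are hypothetical, and the timestamps are kept as plain labels because the example above gives no year.

```python
from typing import List, Optional, Tuple


def find_phrase(words: List[str], phrase: List[str]) -> int:
    """Return the index of the first occurrence of phrase in words, or -1."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i
    return -1


def bracket(
    words: List[str],
    target: List[str],
    anchors: List[Tuple[List[str], str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return the timestamp labels of the nearest dated anchors found
    before and after the target phrase in the ordered word list."""
    start = find_phrase(words, target)
    if start < 0:
        raise ValueError("target phrase not found in dictionary")
    end = start + len(target)
    before = None  # (position, label) of the closest anchor preceding target
    after = None   # (position, label) of the closest anchor following target
    for tokens, label in anchors:
        pos = find_phrase(words, tokens)
        if pos < 0:
            continue  # this dated artifact left no trace in the dictionary
        if pos + len(tokens) <= start and (before is None or pos > before[0]):
            before = (pos, label)
        elif pos >= end and (after is None or pos < after[0]):
            after = (pos, label)
    return (before[1] if before else None, after[1] if after else None)


# Ordered words modeled on the example above (illustrative, not a real file).
words = ("dude what the backup sight for delta point "
         "reflex sight the purge").split()
target = "backup sight for delta point reflex sight".split()
anchors = [
    (["what", "the"], "May 9 01:46 UTC"),    # dated text message
    (["the", "purge"], "May 22 01:51 UTC"),  # dated search
]
print(bracket(words, target, anchors))
```

Because dictionary content is generally, but not always, written in order, and because common words can repeat, a result like this is a lead to verify by hand rather than a finding in itself.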
Now that I’ve described user dictionary date bracketing in more detail, it’s time to revisit the child abuse case in which I discovered and used this technique.
The user dictionary on the iPad proved to be instrumental in the investigation. I painstakingly worked through the dictionary file by hand, referencing typed word combinations back to the non-deleted entries in SMS messages, browser searches, and other database files that still had active content with intact dates and times.
With persistence, I was able to bracket the words from the undated, deleted notes entries between user dictionary entries with known dates. That gave me a fairly close date and time range for each of the notes entries, and therefore for the associated assaults. The information I gleaned was exactly what we needed in order to establish a date range for the abuse and bring the perpetrator to justice.
All of this work ended in a guilty plea by the suspect and ultimately a long prison sentence as well. That’s the power of words, even words deleted and forgotten in an electronic device. Since then, I’ve used this same bracketing technique on all sorts of different cases. While this example involved an iOS device, the technique works on both Android and iOS devices, and for third-party apps as well.
There are numerous third-party user dictionary apps on the market, and you may need to do some digging to locate them. The following is a list of common locations for the dictionary file on various devices.
Look for a backup file named:
From a file system extraction look for:
/private/var/mobile/Library/Keyboard/en_US-dynamic-text.dat (or whatever language the user chose.)
Look in the data/data directory for:
Samsung uses Swiftkey as a default, and the user dictionary can be found here:
Happy hunting, and stay tuned for next month’s installment of “My Favorite Artifacts!”