Android Email Extraction to .eml

Sometimes the Android ecosystem is a little lacking in tool support. For instance, I needed to extract a set of sent items from a POP3 mailbox, but the stock mail client only lets you perform three actions on them: delete, mark as unread or favourite.

Armed with the Android SDK, some SQL queries and a Groovy script, we’ll see how to recover email as RFC 822 .eml files.

Email storage

The stock com.android.email client stores message headers and message bodies in two separate SQLite databases, while the AttachmentProvider stores attachments on disk.

The source code is available from https://android.googlesource.com/platform/packages/apps/Email. In this case the phone was a Samsung running Android 4.1.2, so I checked out the jb-mr0-release branch to inspect the code when required.

Backup with ADB

ADB is the Android Debug Bridge, a debugging tool that is part of the Android SDK platform tools. Executing ‘adb usb’ (re)starts the adbd daemon on the device so that it listens for USB connections.

The next step is to select Developer options from the System section of the settings list (Figure 1) and then enable USB debugging on the device (Figure 2).

Figure 1 – System settings

Figure 2 – Developer options

Then connect the device with a USB cable and execute ‘adb backup -f mybackup.ab -all’ (you can also be more selective about the packages you back up, e.g. ‘adb backup -f mybackup.ab com.android.email’). ADB will prompt you to unlock the phone and confirm that the backup may proceed.

Inflating the backup

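An Android backup is a zlib-deflated tar file preceded by a short text header (24 bytes for an unencrypted backup). Thanks to http://nelenkov.blogspot.jp/2012/06/unpacking-android-backups.html for the hint that it can be reinflated by skipping that header and piping the rest through OpenSSL: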
dd if=mybackup.ab bs=24 skip=1|openssl zlib -d > mybackup.tar

Note that the OpenSSL installed on my OS X machine wasn’t compiled with zlib support, so I ran this command in an Ubuntu Vagrant VM to inflate the tar file.
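
If you hit the same OpenSSL limitation, you can also do the inflation with Python’s standard zlib module. The following is a minimal sketch, not part of the original workflow. It assumes an unencrypted backup whose header is four newline-terminated lines: the ‘ANDROID BACKUP’ magic, the format version, a compression flag and the encryption name ‘none’. That layout is what the 24-byte dd skip relies on. The file name ab2tar.py is hypothetical.

```python
import sys
import zlib

MAGIC = b"ANDROID BACKUP"


def ab_to_tar(ab_bytes: bytes) -> bytes:
    """Return the tar payload of an unencrypted Android backup (.ab)."""
    # Assumed header layout: magic, version, compressed flag, encryption,
    # each terminated by "\n"; everything after the fourth "\n" is payload.
    # maxsplit=4 keeps any newlines inside the binary payload intact.
    magic, _version, compressed, encryption, payload = ab_bytes.split(b"\n", 4)
    if magic != MAGIC:
        raise ValueError("not an Android backup file")
    if encryption != b"none":
        raise ValueError("encrypted backups are not handled by this sketch")
    # The payload is a zlib stream when the compression flag is "1".
    return zlib.decompress(payload) if compressed == b"1" else payload


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 ab2tar.py mybackup.ab mybackup.tar
    with open(sys.argv[1], "rb") as src:
        tar_bytes = ab_to_tar(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(tar_bytes)
```

The result should be the same tar file that the dd/openssl pipeline produces, without needing a zlib-enabled OpenSSL build.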

Examining the backup

Unpacking the tar file gives you the folder structure shown in Figure 3; in this case the SQLite databases are in db and the photo attachments taken with the phone are in f.
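
To find the interesting files without unpacking everything, a short Python sketch using the standard tarfile module can list them. It assumes the entries are laid out as apps/<package>/<folder>/… with the db and f folders seen in Figure 3. The helper names classify and email_members are hypothetical.

```python
import sys
import tarfile

PACKAGE = "com.android.email"
FOLDERS = ("db", "f")  # SQLite databases and camera photo attachments


def classify(names, package=PACKAGE):
    """Group entry names by folder for one package (assumed apps/<pkg>/<folder>/...)."""
    found = {folder: [] for folder in FOLDERS}
    for name in names:
        parts = name.split("/")
        if len(parts) >= 4 and parts[0] == "apps" and parts[1] == package and parts[2] in found:
            found[parts[2]].append(name)
    return found


def email_members(tar_path, package=PACKAGE):
    """Read the regular-file member names from the tar and classify them."""
    with tarfile.open(tar_path) as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    return classify(names, package)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 list_email_files.py mybackup.tar
    for folder, names in email_members(sys.argv[1]).items():
        for name in names:
            print(folder, name)
```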

Figure 3 – Expanded tar

Exploring the databases

Sqlitebrowser is a good tool for browsing SQLite databases and features in the following screenshots.

The current script restricts the messages processed to a single mailbox for one account. Determining the mailboxKey requires locating the target account in the Account table (Figure 4) and then finding the desired mailbox for that account in the Mailbox table (Figure 5).

Figure 4 – Account table structure

Figure 5 – Mailbox table structure

You can browse the data or execute the following query to get a list of mailboxes and their corresponding accounts:

SELECT a.displayName as accountName, m._id as mailboxKey, m.displayName as mailboxName
FROM Account a, Mailbox m
WHERE m.accountKey = a._id

If you don’t want to limit the extraction to a single mailbox or account, remove the WHERE clause from the script’s first SELECT query (the one that retrieves the messages).
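
You can also run the mailbox query without a GUI. Here is a minimal Python sketch using the standard sqlite3 module. The database file name EmailProvider.db comes from the AOSP source and is an assumption; check it against your own db folder. The ORDER BY is an addition for readable output.

```python
import sqlite3
import sys

# The query from above, with an ORDER BY added for stable output.
MAILBOX_QUERY = """
SELECT a.displayName AS accountName, m._id AS mailboxKey, m.displayName AS mailboxName
FROM Account a, Mailbox m
WHERE m.accountKey = a._id
ORDER BY a.displayName, m.displayName
"""


def list_mailboxes(conn: sqlite3.Connection):
    """Return (accountName, mailboxKey, mailboxName) tuples."""
    return conn.execute(MAILBOX_QUERY).fetchall()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (path assumed): python3 mailboxes.py apps/com.android.email/db/EmailProvider.db
    conn = sqlite3.connect(sys.argv[1])
    try:
        for account, key, mailbox in list_mailboxes(conn):
            print(f"{key}\t{account}\t{mailbox}")
    finally:
        conn.close()
```

The mailboxKey printed in the first column is the value to plug into the message query.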

Email Extraction

Being comfortable with the RFC 822 specification and JavaMail (I built an IMAP extension for Alfresco in 2006), I decided it would be easier to reconstruct each MimeMessage using SQL and Groovy than to adapt the Android source code to run off the device or to build a custom app to run in the emulator.

Shortcomings

The script is a first cut that was good enough for my purposes, so it has a few deficiencies:
1. It sets the Sender field to a default value rather than converting the value of Message.fromList and using the addFrom method.
2. Addresses use only the email address rather than the label (display name).
3. Body handling uses only plain text rather than multipart/alternative content.
4. Body.textReply is separated from Body.textContent by a separator line of 25 dashes; the script does not attempt to reconstruct the header information of the preceding message in the thread.
5. The script does not handle attachments. This was a conscious decision: whilst the camera photos were in com.android.email/f, the other ‘RAW’ attachments were not in com.android.email/1.db_att/ as the javadoc of the AttachmentProvider class suggests.

As an initial hint, to reconstruct the attachments you would need to use a MimeMultipart and create the MimeBodyPart objects, starting with a query like:
"""SELECT fileName, mimeType, size, contentId, contentUri, encoding
FROM Attachment
WHERE messageKey = ${msgKey}"""
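
For anyone who wants to pick this up, here is a sketch of the same idea that uses Python’s standard email package in place of JavaMail’s MimeMultipart and MimeBodyPart. It binds the message key as a parameter instead of using Groovy string interpolation, and it selects only the columns it needs. The attachment files were not where the javadoc suggested, so locating them is left to a hypothetical load_bytes callback. That callback maps a contentUri to the file’s bytes, or returns None when the file cannot be found.

```python
import sqlite3
from email.message import EmailMessage

# Subset of the query above, with a bound parameter for the message key.
ATTACHMENT_QUERY = """SELECT fileName, mimeType, contentUri
FROM Attachment
WHERE messageKey = ?"""


def add_attachments(msg: EmailMessage, conn: sqlite3.Connection, msg_key, load_bytes):
    """Attach every locatable attachment of msg_key; returns the number added.

    load_bytes is a hypothetical callback: contentUri -> bytes, or None if missing.
    """
    added = 0
    for file_name, mime_type, content_uri in conn.execute(ATTACHMENT_QUERY, (msg_key,)):
        data = load_bytes(content_uri)
        if data is None:
            continue  # could not find this attachment in the backup
        maintype, _, subtype = (mime_type or "").partition("/")
        if not maintype or not subtype:
            maintype, subtype = "application", "octet-stream"
        # add_attachment converts a single-part message into multipart/mixed.
        msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=file_name)
        added += 1
    return added
```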

Addresses

The script provides one utility method, which converts a stringified address list. In that list, records are separated by the SOH character (ASCII 1), and each email address is separated from its display name by the STX character (ASCII 2).
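
A minimal Python sketch of that conversion follows. Unlike the script (see deficiency 2 above), it keeps the display name and formats each entry with email.utils.formataddr. The record layout (address first, then STX and the optional display name) is an assumption based on the description above and the AOSP Address class.

```python
from email.utils import formataddr

SOH = "\x01"  # separates address records
STX = "\x02"  # separates an address from its display name


def parse_address_list(packed):
    """Split a packed list into (display_name, address) tuples."""
    pairs = []
    for record in (packed or "").split(SOH):
        if not record:
            continue  # tolerate empty input or a trailing separator
        address, _, name = record.partition(STX)
        pairs.append((name, address))
    return pairs


def to_header_value(packed):
    """Render a packed list as an RFC 822 style address header value."""
    return ", ".join(formataddr(pair) for pair in parse_address_list(packed))
```

For example, "alice@example.com\x02Alice\x01bob@example.com" becomes "Alice <alice@example.com>, bob@example.com".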

The processing iterates through the rows of the Message table result set. For each row, the message body is obtained from the separate body database, then a JavaMail MimeMessage is constructed and written to a file.
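
The same loop can be sketched in Python, with the standard sqlite3 and email packages standing in for Groovy and JavaMail. This is an illustration rather than the script itself, and it rests on several assumptions. The Message and Body column names (timeStamp, subject, toList, ccList, messageId, messageKey) are taken from the AOSP EmailContent classes. timeStamp is treated as milliseconds since the epoch. From is taken from fromList rather than defaulted. The optional limit plays the same role as the LIMIT 10 tip in Troubleshooting.

```python
import sqlite3
from email import policy
from email.message import EmailMessage
from email.utils import formatdate
from pathlib import Path

SEPARATOR = "-" * 25  # same 25-dash line the script puts before textReply

# Assumed column names, following the AOSP EmailContent definitions.
MESSAGE_QUERY = """SELECT _id, timeStamp, subject, fromList, toList, ccList, messageId
FROM Message WHERE mailboxKey = ? ORDER BY timeStamp"""
BODY_QUERY = "SELECT textContent, textReply FROM Body WHERE messageKey = ?"


def addresses(packed):
    """Address part of each SOH-separated record, dropping any STX label."""
    return [r.partition("\x02")[0] for r in (packed or "").split("\x01") if r]


def build_message(row, body_row):
    """Build a plain-text EmailMessage from one Message row and its Body row."""
    _id, timestamp, subject, from_list, to_list, cc_list, message_id = row
    msg = EmailMessage()
    for header, packed in (("From", from_list), ("To", to_list), ("Cc", cc_list)):
        addrs = addresses(packed)
        if addrs:
            msg[header] = ", ".join(addrs)
    if subject:
        msg["Subject"] = subject
    if timestamp:
        msg["Date"] = formatdate(timestamp / 1000)  # assumed epoch milliseconds
    if message_id:
        msg["Message-ID"] = message_id
    text, reply = body_row or (None, None)
    parts = [text or ""]
    if reply:
        parts += [SEPARATOR, reply]
    msg.set_content("\n".join(parts))
    return msg


def export_mailbox(msg_db, body_db, mailbox_key, out_dir, limit=None):
    """Write one .eml file per message in the mailbox; returns the count written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    query = MESSAGE_QUERY + (f" LIMIT {int(limit)}" if limit else "")
    msg_conn, body_conn = sqlite3.connect(msg_db), sqlite3.connect(body_db)
    written = 0
    try:
        for row in msg_conn.execute(query, (mailbox_key,)):
            body_row = body_conn.execute(BODY_QUERY, (row[0],)).fetchone()
            msg = build_message(row, body_row)
            # CRLF line endings, as RFC 822 expects.
            (out / f"{row[0]}.eml").write_bytes(msg.as_bytes(policy=policy.SMTP))
            written += 1
    finally:
        msg_conn.close()
        body_conn.close()
    return written
```

Printing msg.as_string() instead of writing the file gives a quick preview, much like msg.writeTo(System.out) in the Groovy Console.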

Troubleshooting

The script cannot cause data loss on your device, as it operates on a backup. If you want to experiment first, you may like to add a limit (e.g. LIMIT 10) to the first SELECT query to reduce the number of messages it retrieves.
Also, a simple way to view the output in, for example, the Groovy Console is to call msg.writeTo(System.out) instead of creating the .eml file.

Get the script

The usual disclaimers apply: the script comes without warranty, support, etc. Get it from GitHub: https://gist.github.com/rbramley/65261127dfb857b03bb6

Breaking the monolith

I’m lifting the lid on my latest pet project, which is set to revolutionise the ECM world. The codename is mu-fresco, as it puts Alfresco into a hadron collider with microservices.

This came about as I didn’t have access to 448 cores of Azul JVM goodness. The pretotype used 20 over-clocked Raspberry Pi units and an old HP Superdome picked up from eBay (I needed something beefy for the database and it was cheaper than RDS). After a bit of light surgery with a sharp scalpel, aka Spring Remoting with Hessian, it was time to awaken Frankenstein’s monster. The results are very promising, though there are still a few kinks to iron out: the Solr 1.4 index is one, and the shared database schema between microservices is a glaring architectural impurity.

As for the next iteration, well “I’ve decided to take my work back underground to stop it falling into the wrong hands”.

Groovy graphs with Grails and Gremlins

This article provides a practical introduction to graph databases with a focus on the Groovy ecosystem. It explores Grails GORM support for Neo4j as well as query comparisons between the Neo4j Cypher language, SQL and the Gremlin graph DSL.

This article originally appeared in the May 2013 issue of GroovyMag.

Free your mind (map) with Groovy

FreeMind is an open source mind-mapping tool. Version 0.9 (released February 2011) introduced Groovy scripting, which can be used to process the mind map nodes. In this article we’ll look at a couple of use cases and the supporting scripts.

This article originally appeared in the February 2013 issue of GroovyMag.

Groovy as a GoogleCode API client

This article gives an introduction to working with XML/HTTP APIs from Groovy in the context of a real-world scenario using the GoogleCode API.

This article first appeared in the March 2013 issue of GroovyMag. Since the script was written, Google has deprecated the Issue Tracker API and scheduled it for closure on 14 June 2013. So whilst the script is now of interest only, the principles are still valid for other purposes.

Monitoring elasticsearch

elasticsearch is an open source distributed RESTful search engine built on top of Apache Lucene.
Like any service or component in your architecture, you’ll want to monitor it to ensure that it’s available and gather performance data to help with tuning.

In this brief post, we’ll look at how we can monitor elasticsearch using Opsview, which is built on Nagios and thus has access to a wide range of plugins, yet provides a more approachable user interface for configuring service checks.

Monitoring OpenStack Swift with Opsview

This is a quick how-to for Opsview users who need to monitor an OpenStack (Essex) Swift installation. As a starting point, we’ll perform a ‘front door’ check, as this should work whichever Swift implementation you are using.