I keep a lot of information in my Gmail account, spread across about 72,488 email messages. All of my email accounts forward to my Gmail Inbox. The convenience of having all of my email in one place is great, but it also means I've put all of my eggs in one basket. It’s unlikely, but Google could lose those emails, or for some reason revoke my access to the free service that I’ve been using for so long.
My current backup is still running on the cheapest instance available over at DigitalOcean:
    $ fab docean status
    ...
    Status:
    backup running : yes
    number of emails : 52697
    size on disk : 1.2G
A few requirements I had for my Gmail backup:
- Free (as in beer and speech)
- Runs remotely, so it doesn't heat up my laptop or shut down when I close the lid
- Filesystem based for later statistical analysis
- Easy to back up
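To give a sense of why a filesystem-based backup is handy for later analysis, here is a minimal sketch that walks a backup directory and reports the message count and total size. It assumes one message per file with a `.eml` extension, which is my assumption for illustration and may not match the layout the backup tool actually writes; `backup_stats` is a made-up helper name.

```python
import os


def backup_stats(root):
    """Return (message_count, total_bytes) for a backup directory.

    Assumption: each message is stored as its own file ending in ".eml".
    Adjust the extension check to match the real on-disk layout.
    """
    count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".eml"):
                count += 1
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return count, total_bytes
```

Calling `backup_stats("/path/to/backup")` returns a tuple that could feed a simple status report or a longer-running analysis.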
In searching for a tool to host my own backup of Gmail, I came across gmailbackup. I’ve forked it and made a small change to its parsing of date fields in emails. One culprit message that caused the parse error is the introductory message sent by Google entitled “Gmail is different. Here’s what you need to know.”, sent when I first signed up on June 23, 2004. Viewed with the “Show Original” option, it lacks the orig-date field (the Date header) required by RFC 2822 (Internet Message Format).
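For illustration, here is a rough sketch of how a message with a missing Date header could be handled using only Python's standard library. This is not the actual change in my fork; it just shows one possible fallback, which is to take the timestamp from the first Received header that parses. The function name `message_date` is made up for this example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def message_date(raw, fallback=None):
    """Best-effort date for a raw message string.

    Tries the Date header first. If it is missing or unparseable,
    falls back to the timestamp after the last ';' of each Received
    header in turn, and finally to the caller-supplied fallback.
    """
    msg = message_from_string(raw)

    header = msg.get("Date")
    if header:
        try:
            return parsedate_to_datetime(header)
        except (TypeError, ValueError):
            pass  # malformed Date header, try Received instead

    for received in msg.get_all("Received", []):
        # By convention the timestamp follows the last semicolon.
        _, _, stamp = received.rpartition(";")
        try:
            return parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            continue

    return fallback
```

A message with no usable Date or Received header returns `fallback`, so the caller can decide whether to skip the message or file it under a default date.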
Backing up 72,000 emails takes a long time, so I’d rather not run it on my laptop. I’ve put together a Fabric script to handle the major functionality of running the backup on a remote server. I originally got this working inside a Vagrant-managed virtual machine, but have moved to running it remotely on a DigitalOcean VPS. Setting up a new instance and tearing it down for testing is incredibly quick and simple with them, and they’re not currently charging for backups, so it’s as close to free as can be. I’ll probably bring back the Vagrant configs shortly.
You can check out the source and instructions on how to get it running over at GitHub: https://github.com/adamw523/gmailarchive