
Backups with Tarsnap

I recently switched from local to online backups with Tarsnap. In this post I want to share what I think is great about Tarsnap and how I configured it.

Motivation

I changed my backup routine mainly because I couldn't get myself to regularly attach external storage and run Back In Time. So I went looking for a simple, secure and cheap way to do online backups.

Before coming to Tarsnap I experimented with SpiderOak and Duplicity. SpiderOak was too expensive and the Linux GUI felt unmaintained. Duplicity made a good impression, but creating the backups took a long time and it can't detect renamed files.

Meet Tarsnap

Tarsnap is not cheap (0.25 USD/GB-month) but it includes a de-duplication mechanism that treats the backup as a stream of data rather than a list of files. This way it can spot duplicate blocks and save storage and time. The other features of Tarsnap are summarized in the design principles documentation.

Tarsnap follows the Unix philosophy and consists of «programs that do one thing and do it well»: Before Tarsnap can create backups, a key has to be generated using tarsnap-keygen. This key includes the tarsnap.com account that will be charged with the storage and data-transfer costs, the key to encrypt the data and a set of permissions. A newly generated key includes the read, write and delete permissions. Later on, tarsnap-keymgmt can be used to generate subkeys that, for example, can only create new backups but cannot delete or read existing ones.
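As a rough sketch of that key setup, the script below only prints the two commands instead of running them. The key directory, e-mail address and machine name are placeholders, not values from my setup. tarsnap-keygen asks for the account password interactively when it is actually run.

```shell
#!/bin/sh
# Dry-run sketch: print the key-setup commands rather than executing them.
# KEYDIR, the e-mail address and the machine name are placeholders.
KEYDIR="${KEYDIR:-/root}"

keygen_cmd() {
    # $1 = tarsnap.com account e-mail, $2 = name for this machine
    printf 'tarsnap-keygen --keyfile %s/tarsnap.key --user %s --machine %s\n' \
        "$KEYDIR" "$1" "$2"
}

write_only_key_cmd() {
    # -w keeps only the write permission: the subkey can create new
    # archives but cannot read or delete existing ones.
    printf 'tarsnap-keymgmt --outkeyfile %s/tarsnap-write.key -w %s/tarsnap.key\n' \
        "$KEYDIR" "$KEYDIR"
}

keygen_cmd me@example.com mymachine
write_only_key_cmd
```

The write-only key is the one to leave on the machine for unattended backups; the full key can then be stored somewhere safe.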

tarsnap is the main program and its purpose is to «manipulate remote encrypted backups». Like the original tar it can be used to create new archives, list the contents of existing ones and unpack them. It can also delete archives.
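The modes mentioned above map to flags like the following. This is a sketch with a hypothetical archive name and file path; the small run helper (my own addition) prints each command by default and only executes it when DRY_RUN=0 is set.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

ARCHIVE="mymachine-$(date +%Y-%m-%d)"        # hypothetical archive name
MEMBER="home/me/files/important/notes.txt"   # hypothetical path, as listed by -t

run tarsnap -c -f "$ARCHIVE" "$HOME/files/important"   # create a new archive
run tarsnap --list-archives                            # list all archive names
run tarsnap -t -f "$ARCHIVE"                           # list the contents of one archive
run tarsnap -x -f "$ARCHIVE" "$MEMBER"                 # unpack a single file
run tarsnap -d -f "$ARCHIVE"                           # delete the archive
```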

Rotation schema

The first downside I discovered is that Tarsnap doesn't provide a way to do backups on a schedule. If you want daily backups of a particular directory you have to write your own script that does something like:

tarsnap -c -f /files/important-`date +%Y-%m-%d` ~/files/important

Even worse: You have to implement your own retention strategy. Fortunately, Michael Elsdörfer created tarsnapper.
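To give an idea of what a hand-rolled retention strategy involves, here is a deliberately naive sketch. It assumes archives are named PREFIX-YYYY-MM-DD as in the command above, and the function name expired_archives is made up for this example. It only selects names older than a cutoff date and deletes nothing by itself.

```shell
#!/bin/sh
# expired_archives PREFIX CUTOFF
# Reads archive names (one per line) on stdin and prints those named
# PREFIX-YYYY-MM-DD whose date is strictly older than CUTOFF (YYYY-MM-DD).
expired_archives() {
    prefix=$1
    # Turn 2014-11-01 into 20141101 so dates can be compared as integers.
    cutoff=$(printf '%s' "$2" | sed 's/-//g')
    while IFS= read -r name; do
        case $name in
            "$prefix"-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;   # skip names that do not follow the scheme
        esac
        stamp=$(printf '%s' "${name#"$prefix"-}" | sed 's/-//g')
        if [ "$stamp" -lt "$cutoff" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example with two made-up archive names; prints only the October one.
printf '%s\n' /files/important-2014-10-30 /files/important-2014-11-20 |
    expired_archives /files/important 2014-11-01
```

In real use the names would come from tarsnap --list-archives, and each selected name would be passed to tarsnap -d -f. A flat cutoff like this keeps no weekly or monthly generations, which is exactly the part that gets tedious to write yourself.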

Tarsnapper takes care of managing backup jobs and implements a grandfather-father-son backup rotation scheme. The following example shows a tarsnapper config file (in YAML format) that runs the "important" job and keeps daily, weekly and monthly backups.

deltas: 1d 7d 30d
target: /mymachine/$name-$date

jobs:
  important:
    source: ~/files/important
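A config like this still has to be handed to tarsnapper. The wrapper below is a sketch that assumes the config is saved to a file and that tarsnapper is invoked as "tarsnapper -c FILE make", as its README describes. The function name and the TARSNAPPER override are my own additions for previewing the command.

```shell
#!/bin/sh
# tarsnapper_make CONFIG
# Runs the jobs from CONFIG after checking that the file is readable.
# Set TARSNAPPER=echo to print the arguments instead of running tarsnapper.
tarsnapper_make() {
    config=$1
    if [ ! -r "$config" ]; then
        echo "config not readable: $config" >&2
        return 1
    fi
    "${TARSNAPPER:-tarsnapper}" -c "$config" make
}
```

Calling tarsnapper_make "$HOME/.tarsnapper.yml" (a hypothetical location) would then create the new archive for every job in the file.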

Automation

I'm using Anacron to run Tarsnapper regularly. In contrast to Cron, Anacron is made for systems that aren't running all the time, such as my laptop. When the backup is finished, the results are mailed to my email address together with some statistics:

tarsnap --print-stats --humanize-numbers

I don't want to run my own mail server locally, so I'm using msmtp to send the emails through my mail provider.
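Roughly, the mailing step can look like the sketch below. It is not my actual script: the function names and the MAILER override are made up for illustration, and it assumes msmtp is already configured for the mail provider. msmtp reads the message from standard input and takes the recipient as an argument.

```shell
#!/bin/sh
# build_report RECIPIENT SUBJECT
# Wraps the text on stdin in a minimal mail message:
# headers, one blank line, then the body.
build_report() {
    printf 'To: %s\nSubject: %s\n\n' "$1" "$2"
    cat
}

# send_report RECIPIENT SUBJECT
# Pipes the message to msmtp; MAILER can be overridden for testing.
send_report() {
    build_report "$1" "$2" | "${MAILER:-msmtp}" "$1"
}

# Example: show the message that would be sent (address is a placeholder).
echo "backup finished" | build_report me@example.com "Backup report"
```

The statistics can be piped in with tarsnap --print-stats --humanize-numbers 2>&1 | send_report me@example.com "Backup report". An /etc/anacrontab line of the form "1 10 tarsnapper-backup /path/to/backup-script" would run such a script once a day, ten minutes after Anacron starts; the job name and path are placeholders.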

Summary

I think that Tarsnap provides a good backup solution. What I really like is its speed. It does not start by computing a list of files that need to be sent to the server and instead immediately sends blocks of data. Its de-duplication mechanism makes sure that only those blocks are sent that have not been transferred yet. Currently I'm storing 12 GB of compressed data and a run of Tarsnapper takes less than 3 minutes.

It is a bit disappointing that it takes about 5 minutes to restore a single file from a large archive. In my opinion this is acceptable because most of the time I want to do backups and only restore files when something goes wrong.

I'm sure that Tarsnap is not the best approach out there, but it works for me. In the end it does not matter how you do backups as long as you do them regularly. If you're not doing backups yet, don't wait for the next World Backup Day: do something now.

Update (2014-11-24): In the meantime I went from mail to desktop notifications. I'm using notify-send for that, which integrates nicely with Ubuntu. You can view the backup script in my dotfiles repository.
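The script in the repository is the real thing; as a stand-alone illustration, a notification step could be sketched like this. The function name, the message texts and the NOTIFIER override are hypothetical and only there to make the example testable without a desktop session.

```shell
#!/bin/sh
# notify_backup STATUS
# Shows a desktop notification depending on the exit status of the backup.
# NOTIFIER can be overridden to test the function without a desktop session.
notify_backup() {
    if [ "$1" -eq 0 ]; then
        "${NOTIFIER:-notify-send}" "Backup finished" \
            "Tarsnapper completed successfully."
    else
        # -u critical marks the notification as urgent.
        "${NOTIFIER:-notify-send}" -u critical "Backup failed" \
            "Tarsnapper exited with status $1."
    fi
}
```

In a backup script it would be called right after the Tarsnapper run, for example with notify_backup "$?".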

about

Hi, I'm Jan Ahrens. In this blog you can read about my thoughts on various technical topics.

As you might have already guessed: My opinions are my own and don't necessarily represent those of my employer.

If you want to contact me you can use my PGP key. Its fingerprint is 3762 1152 E099 AB27 04E8 3FD1 B911 E6A2 2B4F 3B5F.

This blog is built with Jekyll. You can find its source code on GitHub.