Entries in backups (6)


Back to backups (yet again)

In the world of information technology, nothing is static or lasts forever, especially best practices. I’ve been pointing out to clients for a while now that backups need to be rethought in terms of the “jobs to be done” philosophy and no longer thought of as “the thing that happens overnight when files are copied to tapes”.

Historically, backups served two purposes:

  • Being able to go back in time and retrieve data that is no longer available
  • Serving as the basis for disaster recovery

Fundamentally, backups should really only serve the first point. We have better tools and mechanisms for handling disaster recovery and business continuity. Which brings me to snapshots. I have always told people that snapshots are not backups even though they respond to the criteria of being able to go back in time.

The hiccup is that snapshots that are dependent on your primary storage system should be considered fragile, in the sense that if your primary storage goes away (disaster), you no longer have access to the data or the snapshots. However, just about every storage system worth its salt today includes the ability to replicate data to another system based on or including the snapshots themselves. This is a core feature of ZFS and one I rely on regularly. Many of the modern scale-out systems also include this type of functionality, some even more advanced than ZFS, like the SimpliVity implementation.

When are snapshots backups?

They become backups once you have replicated them to another independent storage system. This meets the two basic criteria of being able to go back in time and residing on a separate physical system, so the loss of the primary does not preclude access to the data. They become part of your disaster recovery plan when the second system is physically distant from the primary.

Disk to Disk to Tape

We’ve already seen the traditional backup tools adopt this model to respond to the performance issues around coping with the ever-growing volume of file data, so that data can trickle over to a centralized disk store which is directly connected to tape drives that can then be fed at full speed. Exploiting snapshot based replication permits the same structure, but assigns the responsibility of the disk to disk portion to the storage system rather than the backup software.

The question I ask in most cases here is whether the volume of data involved justifies the inclusion of tape as a backup medium. According to the LTO Consortium, LTO-6 storage is as low as 1.3 cents per GB, but this only takes into account the media cost. The most bare bones of LTO drives runs around $2,200, which bumps up the overall cost per GB rather dramatically.

Assuming a configuration where we store 72 TB of data on tape (12 tapes), at the $80 cost per tape cited by the LTO Consortium plus the cost of the drive, this works out to about 4.3 cents/GB. At current street prices, the 6 TB WD Red drives run about $270, which converts to 4.5 cents per GB, not taking into account the additional flexibility of disks that permit compression and deduplication. Note that the 6 TB per tape cited for the LTO numbers already includes compression, whereas the 6 TB disk is raw capacity before compression and deduplication.
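To make the arithmetic above easy to rerun with your own prices, here is a minimal Python sketch. The prices and capacities are the ones quoted in the text; counting 1000 GB per TB is an assumption of the sketch, and counting 1024 GB per TB instead moves the tape figure from roughly 4.4 to roughly 4.3 cents per GB.

```python
# Rough media cost comparison using the figures quoted in the text.
# Assumption: 1 TB is counted as 1000 GB (use 1024 for binary units).
GB_PER_TB = 1000


def cents_per_gb(total_cost_usd: float, capacity_tb: float) -> float:
    """Return the cost in US cents per GB for a given spend and capacity."""
    return total_cost_usd * 100 / (capacity_tb * GB_PER_TB)


# Tape: 12 LTO-6 cartridges at $80 each plus one $2,200 drive,
# holding 72 TB in total (6 TB per tape, compression included).
tape_cost_usd = 12 * 80 + 2200
tape = cents_per_gb(tape_cost_usd, 72)

# Disk: one 6 TB drive at $270, raw capacity before compression.
disk = cents_per_gb(270, 6)

print(f"tape: {tape:.2f} cents/GB")
print(f"disk: {disk:.2f} cents/GB")
```

The point of keeping the drive cost separate from the cartridge cost is that the tape figure keeps falling as you add cartridges, while the disk figure stays flat.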

Tape does have some inherent advantages in certain use cases, particularly long term offline storage, and does cost less to operate in terms of power, but for many small to medium sized environments, the constraints of using it as a primary backup medium (especially when it is also the primary restore medium) are far outweighed by the flexibility, performance and convenience of a disk based system for daily operations.

Operational convenience of disk over tape.

Tape is a great medium for dumping a full copy of a dataset, but when compared with the flexibility of a modern disk based system it falls far behind. A good example that I use is the ability to prune snapshots from a data set to reorganize the space utilisation. In many systems, I use hourly snapshots in order to give users a decent amount of granularity to handle errors and issues during the day. This also means that the unit of replication on a given filesystem is relatively small, permitting me to recover from intersite communications failures and not have to resend huge data sets that might have been interrupted. Then on the primary system I prune out the hourly snapshots after a week to leave one daily instance to be retained for 2 weeks. A similar process is applied to weekly and monthly snapshots. Where this gets interesting is that I do not have to apply the same policy on the primary and backup storage systems. My backup storage system is designed for capacity and will retain a month of daily snapshots, 8 weekly snapshots and 12 monthly snapshots. The possibility of pruning data from a set is something that is impossible to do effectively using tape technology, so tape is used for an archival copy that needs to be retained beyond the yearly cycle.
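The pruning logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the scripts actually in use: the function names, the way a "week" and a "month" are bucketed, and the 365 day approximation of twelve months are all assumptions. Each rule keeps the newest snapshot per bucket within its window, and anything not claimed by a rule is a candidate for pruning.

```python
from datetime import datetime, timedelta


def hour_bucket(t):
    return (t.year, t.month, t.day, t.hour)


def day_bucket(t):
    return (t.year, t.month, t.day)


def week_bucket(t):
    # ISO year and ISO week number
    return tuple(t.isocalendar())[:2]


def month_bucket(t):
    return (t.year, t.month)


# Each rule is (bucket function, how far back the rule applies).
# Primary: hourlies for a week, then one daily for two weeks.
PRIMARY_POLICY = [(hour_bucket, timedelta(days=7)),
                  (day_bucket, timedelta(days=14))]
# Backup system: a month of dailies, 8 weeklies, 12 monthlies.
BACKUP_POLICY = [(day_bucket, timedelta(days=30)),
                 (week_bucket, timedelta(weeks=8)),
                 (month_bucket, timedelta(days=365))]


def snapshots_to_keep(snapshots, policy, now):
    """Return the set of snapshot timestamps retained by the policy."""
    keep = set()
    for bucket, max_age in policy:
        newest = {}
        for taken in snapshots:
            if now - taken <= max_age:
                key = bucket(taken)
                if key not in newest or taken > newest[key]:
                    newest[key] = taken
        keep.update(newest.values())
    return keep


def snapshots_to_prune(snapshots, policy, now):
    """Return the snapshot timestamps the policy no longer needs, oldest first."""
    keep = snapshots_to_keep(snapshots, policy, now)
    return sorted(t for t in snapshots if t not in keep)
```

Because the policy is just data, the same set of replicated snapshots can be fed through `PRIMARY_POLICY` on one system and `BACKUP_POLICY` on the other, which is exactly the asymmetry that tape cannot offer.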

Files vs virtual machines

The above-noted approach works equally well for file servers and storage systems hosting virtual machines, especially if we are using a file based protocol for hosting the VMs rather than a pure block protocol like FC or iSCSI. In the world of virtual machines, backup tools are considerably more intelligent about the initial analysis of the data to be backed up. Traditional file server backup is based on a process of scanning the contents of the source, matching this against an index of data known to be backed up and then copying the missing bits. This presents a number of practical issues:

  • the time to scan continues to grow with the number of files
  • copying many individual files is a slower process with more overhead than block based differentials

By applying the snapshot and replication technique, we can drastically reduce the backup window, since only the blocks modified between two moments in time need to be copied. In fact there is no longer a backup window since these operations are continuous in the background of the file server.

Virtual machines in the VMware world maintain tracking journals of modified blocks (CBT), which enables the backup software to ignore the filesystem representation of the data and just ask for the blocks modified since the last backup transaction. But again, if we are transmitting snapshots from the underlying storage system, we don’t even need to do this. It is, however, useful to issue VSS snapshots inside of Windows virtual machines to ensure that any inflight data in caches is flushed to disk before creating the storage layer snapshot.

The biggest issue with backing up virtual machines is the granularity of the restore operation. With only a simple replication, the result is a virtual machine with no visibility into the contents of its internal file systems. This is where the backup tools show their value in being able to back up a virtual machine at the block level, and yet still permit file level restores by peeking inside the envelope to look at the contents of the file systems therein.

The last mile

There are still issues with certain types of restore operations that require a high level of integration with the applications. If you want to restore a single email out of a backed up Exchange or Notes datastore, you need a more sophisticated level of integration than simply having a copy of the virtual machine.

But for the majority of general purpose systems, and particularly file services, the simple replicated snapshot approach is simpler and more effective, both from a cost and operational perspective.


Dump to tape from VMFS

A recurring issue that I see is sites that still have a requirement to externalize backups to tape for long-term storage (please don’t use the archive word). On the other hand, it’s clear that disk to disk backup solutions that leverage the VADP protocols are considerably more efficient tools for VMware environments.

Now assuming you have a decent budget, I highly recommend Veeam Backup & Replication as a complete solution that now integrates tape externalization. But if your environment is smaller or can’t justify the investment when there are excellent “free” tools like VMware Data Protection available, here’s a potential solution for long term dump to tape.

Assuming that you have some kind of existing backup solution that writes files to tape, the problem is that VMFS is pretty much an unreadable black box file system. This has been exacerbated by the fact that with ESXi the old fashioned approach of putting a Linux backup client in the Service Console is no longer really a viable option.

So we need a few little things in place here.

  • A server with your backup software connected to the SAN (iSCSI or FC)
  • A storage bay that can create and present snapshots (optional, but more efficient)
  • The open source VMFS driver

Some assumptions:

  • Your backup appliance is stored on VMFS3.x block storage, with no RDMs

The basic process involves the following steps:

  1. Stop the backup appliance
  2. Snapshot the LUN(s) for the backup appliance
  3. Start the backup appliance
  4. Present a cloned volume based on the snapshot to the backup server
  5. Connect to the LUNs using the fvmfs java applet and publish them over WebDAV
  6. Mount the WebDAV share as a disk
  7. Backup the contents using your backup software

Stop the backup appliance

In order to ensure a coherent state of the data on disk, you’ll want to stop the backup appliance. VDP can be stopped with a Shutdown Guest OS from the VI-Client or shutdown at the command line.

Snapshot the LUN(s) for the backup appliance

Snapshotting the LUN is an efficient method to have a copy of the appliance in the off state to ensure data consistency. Most systems will allow you to script this kind of activity.

Example using Nexenta recordings[^fn-1]:

create snapshot backuppool/vdp-01@totape

Start the backup appliance

Since we have a snapshot, we can now restart the backup appliance using the VI-Client or whatever is easiest for you.

Present a cloned volume based on the snapshot to the backup server

Now that the appliance is running normally, and we have a snapshot with the appliance in a stopped state we can now continue with the extraction to tape process without any time pressure that will impact new backups.

So we need to create a cloned volume from the snapshot and present it to the backup server:

Nexenta example:

setup snapshot backuppool/vdp-01@totape clone backuppool/totape
setup zvol backuppool/totape share -i backupserver -t iSCSI

Where -i points to the name of the initiator group and -t points to the registered target group (generally a set of interfaces).

Now to verify that the presentation worked, we go to the backup server and (assuming a Windows Server) open Computer Management > Disk Management. We should now see a new disk with an Unknown partition type. Don’t try to format this or mount it as a Windows disk. From a practical standpoint, you won’t be doing any harm to your source data since it’s a volume based on a snapshot, not the original, but you still want to be able to read the data on it. What you want to note is the name on the left side of the window, for example “Disk 3”.

NB If you are using VMFS extents based on multiple source LUNs, you’ll need to present all of them so take note of the new disks that are presented here.

Connect to the LUNs using the fvmfs java applet and publish them over WebDAV

Still on the Windows server, you’re going to launch the fvmfs java applet so you’ll need a recent java.exe. At the command line, navigate to the folder containing the fvmfs.jar file and launch it using the following syntax:

"c:\Program Files\Java\jre7\bin\java.exe" -jar fvmfs.jar \\.\PhysicalDrive3 webdav 80

Where the Physical Drive number maps to the Disk number noted in the Disk Management console.

If you are using extents, list all of the disks using the same syntax, separated by commas.
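If you script this step, the comma-separated device list is easy to get wrong by hand. Here is a small Python sketch that only assembles the command line in the syntax shown above; the function name is made up, the disk numbers 3 and 4 are hypothetical, and the Java path is simply the one used in this article.

```python
import subprocess


def fvmfs_command(disk_numbers,
                  java_exe=r"c:\Program Files\Java\jre7\bin\java.exe",
                  port=80):
    """Build the argument list that publishes a VMFS volume over WebDAV."""
    if not disk_numbers:
        raise ValueError("at least one disk number is required")
    # One \\.\PhysicalDriveN entry per extent, joined with commas.
    devices = ",".join(rf"\\.\PhysicalDrive{n}" for n in disk_numbers)
    return [java_exe, "-jar", "fvmfs.jar", devices, "webdav", str(port)]


if __name__ == "__main__":
    command = fvmfs_command([3, 4])
    print(" ".join(command))
    # To actually launch it from the folder containing fvmfs.jar:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list rather than one string sidesteps the quoting problem caused by the space in "Program Files".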

The WebDAV share is mountable on modern Windows systems with the classic “net use Z: http://localhost/vmfs”.

If you have the misfortune to still be using Windows 2003, you’ll also need to install the WebDAV client which may or may not work for you. If it still doesn’t work, then I recommend trying out WebDrive for mounting WebDAV shares to a letter.

Once the drive is mounted to a drive letter, you’ll have near native access speed to copy the data to tape.

Tear down

Cleaning up is mostly just walking through the steps in reverse. On the server doing the backups, unmount the drive by closing the command prompt running the fvmfs applet or control-C to kill the process.

Then we need to delete the cloned volume, followed by the snapshot. Another Nexenta example:

destroy zvol backuppool/totape -y
destroy snapshot backuppool/vdp-01@totape -y
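Since the same snapshot and clone names show up in both the setup and the tear down, one way to keep them consistent is to generate the NMS commands from a single set of names. A small Python sketch follows; it only assembles the command strings shown in this article and doesn't talk to the appliance, and the function and parameter names are invented for illustration.

```python
def tape_dump_plan(pool, volume, tag="totape",
                   initiator_group="backupserver", target_group="iSCSI"):
    """Return (setup, teardown) lists of NMS commands for one dump to tape."""
    snapshot = f"{pool}/{volume}@{tag}"
    clone = f"{pool}/{tag}"
    setup = [
        # Taken while the backup appliance is stopped.
        f"create snapshot {snapshot}",
        # Run after the appliance has been restarted.
        f"setup snapshot {snapshot} clone {clone}",
        f"setup zvol {clone} share -i {initiator_group} -t {target_group}",
    ]
    # Tear down in reverse: the clone has to go before its snapshot.
    teardown = [
        f"destroy zvol {clone} -y",
        f"destroy snapshot {snapshot} -y",
    ]
    return setup, teardown


if __name__ == "__main__":
    setup, teardown = tape_dump_plan("backuppool", "vdp-01")
    for line in setup + teardown:
        print(line)
```

The output could be saved as a recording so that the names used at tear down are guaranteed to match the ones used at setup.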

And we’re done.

Restoring data

To keep things simple, restoring this kind of data is best done to an NFS share that is visible to your ESXi hosts. This way you can restore directly from tape to the destination. The fvmfs tool presents a read-only copy of the VMFS datastore so you can’t use it for restores.

Under normal conditions, a restore like this would be a very exceptional event, and it should go to some kind of temporary storage.

Other options

A simple approach for extracting a VDP appliance is to export the system as an OVF, but there are a number of shortcomings to this approach:

  • it’s slow to extract
  • it can only extract via the network
  • you need a big chunk of temporary space

NB: This is a specific approach to the specific problem of long term externalization. In most operational day to day use cases, you’re better off using some kind of replication to ensure availability of your backups.

[^fn-1] Nexenta recordings are a method of building scripts based on the NMS command line syntax. They are stored in the .recordings hidden directory in /volumes and are simple text files that you can launch with the “run recording” command from the NMS command line without diving down to the expert mode raw bash shell.


Update to back to backups

In light of the recent hacking catastrophe of Mat Honan, there has been a flurry of articles discussing the requirement to have a good backup plan in place and what makes for a good backup plan.

Backing up

I’ve already detailed my backup architecture here, but there are a few things that I see people adding as an absolute requirement for any backup plan, notably a cloud component.

Several cite very useful services like Backblaze and Carbonite which are definitely good approaches if you don’t want to construct something as complex as my system. The cloud component isn’t an absolute necessity, but it brings the advantage of ensuring that recent changes have a copy outside of your local environment.

Dropbox is also often cited as a backup solution, but it’s important to remember that Dropbox is a sync tool, not a backup tool. That said, because of the way that Dropbox works, by keeping local copies of files in sync across different machines, it behaves in some ways like a backup since you can lose a computer and simply by relinking to your Dropbox account, you get the files back.

On the other hand, if your Dropbox account is hacked and the hacker deletes your files, they will be deleted across all of your computers. Now this problem is somewhat alleviated by the fact that with Dropbox you can restore missing or older versions of files, but if you’ve lost control of your account this may not be an option.

But since you have local copies of these files, your regular backup plan will ensure that your local backups (Time Machine, SuperDuper, CarbonCopyCloner, Retrospect etc.) will have the history of these files.

And I need to give a shout-out to Retrospect here for anyone that needs a really good corporate backup solution that scales from really small to quite large. It has been largely replaced by Time Machine in most personal and SOHO environments, but Time Machine doesn’t scale terribly well once you have more than two or three computers hitting the same server. And if you want to consider archiving to tape, then it’s absolutely the best way to go.

Anti Hacking

The other part of the story is avoiding getting hacked, and ensuring that your cloud services are well secured. A few basic principles:

  • Use complicated passwords
  • Use a password management tool to keep track of them (I like 1Password)
  • Don’t ever use the same password on different sites or services
  • Use any advanced security features offered (Google two factor authentication for example)

If you are building corporate web applications, there are affordable two factor authentication services like Duo Security that you should be looking into to avoid a personal hack spilling over into your work environment.


Tape in 2012

I’ve been reading a number of articles recently concerning LTFS as an opportunity for keeping tape alive as a new tier of data in big data environments. There are even some very interesting new appliances on the market that target this type of usage like Cache-a. As an archival tool, I find this a potentially useful approach. But as a near line cloud storage solution I have a few quibbles.

The more time I spend evaluating storage solutions, the more I keep coming back to a point that is currently only being addressed properly by ZFS and ReFS: guaranteed data integrity. In our current world of virtual-everything, you can pretty much be guaranteed that every single layer of the stack is lying in some way to the others, we have imperfect implementations, and to top it off we have bit rot to deal with.

LTFS looks interesting as a method of maintaining non critical data available, albeit with high latency compared to a disk solution. But there is no accounting for end to end data integrity, nor any kind of internal redundancy other than that integrated into the tapes themselves. Doing any kind of RAID on tape volumes strikes me as an exceedingly futile gesture so redundancy is out as are any of the ReFS/ZFS auto-correction approaches that require parity data on a separate physical device.

So from this point of view, unless I have yet another copy of all of this data elsewhere, I’m presenting a risk of data loss or compromised data that I can’t identify. And even then, just how do I audit and verify the integrity of the different copies?
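One low-tech way to at least detect silent corruption across copies, independent of the filesystem, is to keep a checksum manifest alongside the data and re-verify it against each copy. Below is a minimal Python sketch of that idea. It is not something LTFS provides, the function names are invented for illustration, and it only detects damage; it can’t repair anything the way a checksummed, redundant pool can.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(reference, copy):
    """Return (missing, changed) file names in copy relative to reference."""
    missing = sorted(set(reference) - set(copy))
    changed = sorted(
        name for name in reference
        if name in copy and reference[name] != copy[name]
    )
    return missing, changed
```

The manifest built when the data is first written becomes the reference; rebuilding it from a tape or a shipped disk and comparing the two answers the audit question, at the cost of reading every byte back.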

Moving data around

The other point being raised in favor of the use of tapes and LTFS is the possibility of being able to ship them to another site and make that data available relatively easily when WAN replication techniques are not viable due to the volume of data. If we start from the hypothesis of the traditional storage bay with traditional RAID controllers that require near-identical hardware at each end, obviously the tape solution is more interesting. But as we move towards commodity JBOD systems with on-disk structures like ZFS where, like LTFS, everything required to mount the filesystem is self-contained, shipping disks instead of tape is an entirely viable approach.

Then the question of data density comes into play. Currently LTO has a native capacity of 1.5 TB per tape. I’m going to leave compression out of consideration here since I can just as easily compress data to disk[1], and possibly go further with ZFS pool level deduplication.

Roughly, we have the following sizes and capacity measurements today in 2012:

Media        Volume     Capacity   Data Density
LTO-5        248 cm³    1.5 TB     6.2 GB/cm³
3.5” disk    376 cm³    3 TB       8.2 GB/cm³
2.5” disk    105 cm³    1 TB       9.7 GB/cm³
2.5” disk    105 cm³    2 TB       19.5 GB/cm³

Update: Western Digital just announced a 2 TB 2.5” drive.

Which means that under the current state of affairs, on a measurement of data by volume, using 2.5” disks is the most efficient means of transporting data, with a better than 50% advantage over LTO-5 (Update: roughly three times the density with the new 2 TB drives). Granted, if you consider data by weight, the tapes will probably come out ahead if that’s the primary factor for shipping costs, although the new 2 TB 2.5” drives are probably in the same ballpark.
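The density figures are simple enough to recompute when new drives or tape generations appear. A quick Python sketch using the volumes and capacities from the table; counting 1024 GB per TB is an assumption of the sketch, chosen because it reproduces the table’s numbers to within rounding.

```python
# Assumption: binary units, which is what matches the table's densities.
GB_PER_TB = 1024

# name: (physical volume in cm^3, capacity in TB), as listed in the table
MEDIA = {
    "LTO-5": (248, 1.5),
    '3.5" disk, 3 TB': (376, 3),
    '2.5" disk, 1 TB': (105, 1),
    '2.5" disk, 2 TB': (105, 2),
}


def density_gb_per_cm3(volume_cm3, capacity_tb):
    """Return storage density in GB per cubic centimetre."""
    return capacity_tb * GB_PER_TB / volume_cm3


baseline = density_gb_per_cm3(*MEDIA["LTO-5"])
for name, (volume, capacity) in MEDIA.items():
    density = density_gb_per_cm3(volume, capacity)
    print(f"{name:18} {density:5.1f} GB/cm3  {density / baseline:4.2f}x LTO-5")
```

Expressing everything as a multiple of the LTO-5 baseline avoids the ambiguity between "percent of" and "percent more than" when comparing media.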

The secondary issue is obviously cost/GB/cm3 where tape wins against the new 2 TB 2.5” drives, but if history has shown anything, it’s that the price for disk storage goes down very rapidly due to economies of scale and the overall size of the market.

Of course, LTO-6 will improve on this greatly, but will require equipment replacement, whereas higher density disk drives can plug into current equipment with a considerably longer lifecycle. Plus you have the advantage of being able to profit from incremental improvements in disk technologies with existing equipment.

Other use cases

I can think of a number of places where this would be exceedingly useful in some cloud environments. If your data is primarily self-correcting lossy data, and there’s more data than is actively being solicited, this lends itself quite well to this kind of architecture. I’m thinking photo or video sharing sites where there is tons of truly cold data that has no SLA attached, and even if a bit flips here or there the data structures are sufficiently forgiving that it will probably pass unnoticed. Photos are especially well adapted since I can keep smaller versions on more expensive disk to present to users while loading up original hi-res versions in the background.

But there are unforgiving binary formats that will simply choke if presented with a data structure that does not conform to its expectations. A flipped bit results in completely useless data without manual review and correction, which may not even be possible.


A side topic that often comes up is the much touted tape longevity which goes up to 30 years. Of course this is mostly a straw man argument since pretty much any data stored for 30 years will be completely unusable without the accompanying applications that generated the data. Text and a few very widely implemented binary formats will always be usable to a certain extent, but beyond that don’t bet on much. XML is not the answer. docx, xlsx and the family of the current generation formats will be useless if computing and the interpreters have changed significantly in the interim.

Recent history

I’ve just recently extracted a pile of data I had archived in 2000 on an AIT-1 tape. Absolutely no problems reading the data stored on the tape once I had built a machine with my 2000 era OS and an old copy of the backup software (Retrospect). Kudos to Retrospect for a solid product that stands the test of time. The tape was perfectly reliable despite being 12 years old and surviving several international moves under less than ideal conditions, but I did have to jump through a few hoops finding an AIT drive and getting the software to recatalog the contents. Purely from that standpoint, LTFS is a phenomenal step in the right direction.

But then came the step of dredging through the files. Fortunately this was an archive of a series of Mac systems where most of the application bundles are completely independent of the OS and I could open up many of the files and it would magically find the appropriate application that had been restored as well. But then there are a number of files for which I no longer have the associated applications. Anyone seen a copy of Cricket Draw 1.5 or MacDraw or MacDraft from the mid-90s?

So whatever happens, the longevity of the physical media is largely irrelevant since any archival plan must also include review and conversion cycles to ensure the exploitability of the data. In this case, archives must integrate a continuous or cyclical migration process to ensure their viability, maintainability and accessibility.

While researching the issue, I’ve been revisiting the possibility of bringing tape back into my home backup solution, but at the end of the day, it’s cost prohibitive for small to medium size environments. I can easily externalize naked drives with inexpensive eSATA or USB connections with a high confidence that in a disaster recovery situation, I can easily rebuild from these sources and that the necessary hardware is available and affordable. Asynchronous replication is even easier to deal with, but requires two well connected sites, which is currently outside of my budget.

The initial hardware tape investment is just too high to be practical and the hardware availability in a disaster recovery adds complexity and cost to the process. That said, my setup is only managing about 6 TB in the home lab and production. The cost viability of moving to tape seems to be interesting once you’ve gone over 30-50 TB of primary storage.

[1] zfs set compression=on pool/volume or more aggressively zfs set compression=gzip-9 pool/volume


Back to backups

It’s been a while since I documented the current backup architecture at the house, which has changed a little bit with the inclusion of two little HP Microservers running Solaris 11. I’m a big fan of the HP Microservers as they offer the reliability and flexibility of a true server, but with the energy consumption and silence of a small NAS at an unbeatable price.


The core of the backup and operations is based on ZFS and its ability to take snapshots and replicate them asynchronously. In addition the two servers use RAIDZ over four 2 TB drives for resiliency.

Starting at the point furthest away from the offsite disaster recovery copies, the core of the day to day action is on a Mac Mini and a ZFS Microserver in the living room. The Mac Mini is connected to the TV and the stereo and its primary role is the media center. The iTunes library is far too large to fit conveniently on a single disk, so the Mini contains only the iTunes application and the library database. The actual contents of the iTunes library are stored on the ZFS server via an NFS mount which ensures that the path is consistent and auto-mounted, even before a user session is opened. AFP mounts are user dependent and open with the session and in case of conflicts will append a “-1” to the name listed in /Volumes which can cause all sorts of problems.

The Media ZFS filesystem is snapshotted and replicated every hour to the second server in the office. The snapshot retention is set to 4 days (96 hourly snapshots). So in the case of data corruption, I can easily roll back to any snapshot state in the last few days, or I can manually restore any files deleted by accident by browsing the snapshots. A key point here is that the ZFS filesystem architecture follows a block level changelog so that replication activity contains only the modified blocks and can be calculated on the fly during the replication operation. This means that there are no long evaluation cycles like those in traditional backup approaches using Time Machine or rsync.
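For readers who haven’t scripted this before, one hourly cycle boils down to three things: take a snapshot, send the increment since the previous one, and expire anything beyond the retention count. The Python sketch below only builds the command strings. `zfs snapshot`, `zfs send -i` and `zfs receive` are standard ZFS commands, but the dataset names, the host name, the sortable snapshot naming scheme and the use of ssh as the transport are all assumptions for illustration, not a description of the scripts running here.

```python
def replication_commands(dataset, previous, current, remote_host, remote_dataset):
    """Return the shell commands for one incremental replication cycle."""
    new_snapshot = f"{dataset}@{current}"
    return [
        f"zfs snapshot {new_snapshot}",
        # Only the blocks changed between the two snapshots are sent.
        f"zfs send -i {dataset}@{previous} {new_snapshot}"
        f" | ssh {remote_host} zfs receive {remote_dataset}",
    ]


def expired_snapshots(snapshot_names, keep=96):
    """Return the oldest snapshot names beyond the retention count.

    Assumes the names sort chronologically, e.g. hourly-2012080113.
    """
    ordered = sorted(snapshot_names)
    return ordered[: max(len(ordered) - keep, 0)]


if __name__ == "__main__":
    for command in replication_commands(
        "tank/media", "hourly-2012080112", "hourly-2012080113",
        "office", "backup/media",
    ):
        print(command)
```

Each name returned by `expired_snapshots` would then be handed to `zfs destroy`, and nothing stops the receiving side from calling it with a larger `keep` value than the sender.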

iPhoto libraries are also stored on the server due to their size on a separate user volume and copied using the same methods.

Then there’s the question of the Mini’s backups. In order to minimize RPO and RTO, I have two approaches. One is that I use SuperDuper to clone the internal Mini SSD to an external 2.5” drive once per day. This permits an RTO of practically zero: if the internal drive dies, I can immediately reboot from the external drive with a maximum of 24 hours lag in the contents. To assuage the issue of data loss, the Mini is backed up every hour via Time Machine to the local ZFS server. I’m using the napp-it tool on the ZFS box to handle the installation and configuration of the Netatalk package to publish ZFS filesystems over AFP. Again, the backup volume is replicated hourly to the second server in the office.

RTO iTunes

Another advantage of this structure is that if the living room server dies, the only thing I need to do is to change the NFS mount point on the Mini to point to the server in the office and everything is back online. The catch is that because the house is very very old and I haven’t yet found an effective, discreet method for pulling GbE between the office and the living room, this connection is over Wifi, so there is a definite performance hit. But for music and up to 720p video it works just fine.


All of the iOS devices in the house are linked to the iTunes library on the Mini, including backups so they get a free ride on the backups of the Mini.


All of the MacBooks in the house are also backed up via Time Machine to volumes on the living room server, with hourly replication to the office so there are always two copies available at any moment.

The office

In the office I have the second ZFS server plus an older Mac Mini running OS X Server. The same strategy is applied on this Mac Mini as well: an external drive, cloned via SuperDuper, for a quick return to service. However, I’ve had issues with the sheer number of files on the server causing problems with Time Machine, so I also use SuperDuper to clone the server to a disk image on the ZFS server.

I have a number of virtual machines for lab work stored on the ZFS server in the office in various formats (VirtualBox, ESX, Fusion, Xen, …) on dedicated volumes, accessed via NFS. I’ve played with iSCSI on this system and it works well, but NFS is considerably more flexible and any performance difference is negligible. Currently the virtualisation host is an old white box machine, but I’m dreaming of building a proper ESX High Availability cluster using two Mac Minis based on the news that I can install ESXi 5 on the latest generation and virtualize OS X instances as well as my Linux and Windows VMs.


No serious backup plan would be complete without an offsite component. I currently use a simple USB dual drive dock to hold the backup zpool (striped for maximum space) made up of two 2 TB drives. They receive an incremental update to all of the filesystems on a daily basis, but only retain the most recent snapshot.

These disks are swapped out on a weekly or bi-weekly basis. With the contents of these two disks I can reconstruct my entire environment using any PC on which I can install Solaris.

The best part of this backup structure is that it requires practically no intervention on my part at all. I receive email notifications of the replication transactions so if anything goes wrong I’ll spot it in the logs. The only real work on my part is swapping out the offsite disks on a regular basis, but even there, the process is forgiving and I can swap the disks at any time as there is no hard schedule that has to be followed.