It’s all a matter of perspective

Catching up on my development RSS feeds, I ran across yet another insightful technical article by Mike Ash. The timing is funny, since I just gave a presentation at the Rendezvous de la Virtualisation 2013, hosted by Infralys (soon to be integrated into Ackacia!), discussing the impact of SSD and flash storage arriving in the storage stack. Here are the slides for those interested.

In my presentation, the coolest, most far-out SSD storage technology is Diablo’s Memory Channel storage, which puts NAND chips onto cards that plug into the RDIMM slots in your server. This ensures consistent (and vanishingly small) latency between the CPU and the storage. No jumping across the PCI bus and traversing various other components and protocols to get to storage; it’s right there, accessible via the memory bus.

And here I have Mike explaining from the developer perspective “Why Registers Are Fast and RAM Is Slow”.

It’s always good to be reminded that every part of the stack can be optimized, and that it’s all a matter of perspective. Multi-millisecond latency fetching data from a physical object across multiple networks is an eternity for a modern CPU.

Thought experiment of the day: what if we configured our servers to behave like resource-constrained devices, disabled swap, and killed processes that stepped out of bounds? We’ve been taking the easy route, throwing memory and hardware at problems that might have software optimization answers…


Dump to tape from VMFS

A recurring issue I keep seeing is shops that still have requirements to externalize backups to tape for long-term storage (please don’t use the word archive). On the other hand, it’s clear that disk-to-disk backup solutions that leverage the VADP APIs are considerably more efficient tools for VMware environments.

Now, assuming you have a decent budget, I highly recommend Veeam Backup & Replication as a complete solution that now integrates tape externalization. But if your environment is smaller, or can’t justify the investment when there are excellent “free” tools like VMware Data Protection available, here’s a potential solution for long-term dump to tape.

Assuming that you have some kind of existing backup solution that writes files to tape, the problem is that VMFS is pretty much an unreadable black-box file system. This has been exacerbated by the fact that with ESXi, the old-fashioned approach of putting a Linux backup client in the Service Console is no longer really a viable option.

So we need a few little things in place here.

  • A server with your backup software connected to the SAN (iSCSI or FC)
  • A storage bay that can create and present snapshots (optional, but more efficient)
  • The open source VMFS driver

Some assumptions:

  • Your backup appliance is stored on VMFS3.x block storage, with no RDMs

The basic process involves the following steps:

  1. Stop the backup appliance
  2. Snapshot the LUN(s) for the backup appliance
  3. Start the backup appliance
  4. Present a cloned volume based on the snapshot to the backup server
  5. Connect to the LUNs using the fvmfs java applet and publish them over WebDAV
  6. Mount the WebDAV share as a disk
  7. Backup the contents using your backup software

Stop the backup appliance

In order to ensure a coherent state of the data on disk, you’ll want to stop the backup appliance. VDP can be stopped with a Shutdown Guest OS from the VI-Client, or shut down at the command line.

Snapshot the LUN(s) for the backup appliance

Snapshotting the LUN is an efficient method to have a copy of the appliance in the off state to ensure data consistency. Most systems will allow you to script this kind of activity.

Example using Nexenta recordings[^fn-1]:

create snapshot backuppool/vdp-01@totape

Start the backup appliance

Since we have a snapshot, we can now restart the backup appliance using the VI-Client or whatever is easiest for you.

Present a cloned volume based on the snapshot to the backup server

Now that the appliance is running normally, and we have a snapshot of the appliance in a stopped state, we can continue with the extraction-to-tape process without any time pressure that would impact new backups.

So we need to create a cloned volume from the snapshot and present it to the backup server:

Nexenta example:

setup snapshot backuppool/vdp-01@totape clone backuppool/totape
setup zvol backuppool/totape share -i backupserver -t iSCSI

Where -i points to the name of the initiator group and -t points to the registered target group (generally a set of interfaces).

Now, to verify that the presentation worked, go to the backup server and (assuming a Windows server) open Computer Management > Disk Management. You should now see a new disk with an Unknown partition type. Don’t try to format this or mount it as a Windows disk. From a practical standpoint you wouldn’t harm your source data, since it’s a volume based on a snapshot rather than the original, but you do want access to the source data. What you want to note is the name on the left side of the window, e.g. “Disk 3”.

NB: If you are using VMFS extents based on multiple source LUNs, you’ll need to present all of them, so take note of all the new disks that appear here.

Connect to the LUNs using the fvmfs java applet and publish them over WebDAV

Still on the Windows server, you’re going to launch the fvmfs java applet, so you’ll need a recent java.exe. At the command line, navigate to the folder containing the fvmfs.jar file and launch it using the following syntax:

“c:\Program Files\Java\jre7\bin\java.exe” -jar fvmfs.jar \\.\PhysicalDrive2 webdav 80

Where the Physical Drive number maps to the Disk number noted in the Disk Management console.

If you are using extents, list all of the disks with the same syntax, separated by commas.
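As a sketch, this snippet assembles the multi-extent invocation; the disk numbers 2 and 3 are placeholders for whatever you noted in Disk Management:

```shell
# Build the comma-separated extent list for fvmfs from the disk numbers
# noted in Disk Management (2 and 3 here are placeholders).
disks="2 3"
arg=$(printf '\\\\.\\PhysicalDrive%s,' $disks)
arg=${arg%,}                      # drop the trailing comma
printf '%s\n' "java -jar fvmfs.jar $arg webdav 80"
```

The resulting command line is the same as the single-disk one above, with the second extent appended after a comma.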

The WebDAV share is mountable on modern Windows systems with the classic “net use Z: http://localhost/vmfs”.

If you have the misfortune to still be using Windows 2003, you’ll also need to install the WebDAV client, which may or may not work for you. If it still doesn’t work, I recommend trying out WebDrive for mounting WebDAV shares to a drive letter.

Once the drive is mounted to a drive letter, you’ll have near native access speed to copy the data to tape.

Tear down

Cleaning up is mostly just walking through the steps in reverse. On the server doing the backups, unmount the drive, then close the command prompt running the fvmfs applet or hit Control-C to kill the process.

Then we need to delete the cloned volume, followed by the snapshot. Another Nexenta example:

destroy zvol backuppool/totape -y
destroy snapshot backuppool/vdp-01@totape -y

And we’re done.
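For repeatability, the NMS one-liners from the whole cycle can be kept together. A minimal sketch that just assembles them as strings (using the same placeholder pool, volume, and group names as above); how you feed them to the appliance, whether as a recording or over ssh, is up to you:

```shell
# Assemble the NMS commands for the full snapshot/clone/teardown cycle.
# Pool, volume, and group names are the placeholders from the examples.
VOL="backuppool/vdp-01"
CLONE="backuppool/totape"
SNAP="totape"

snap_cmd="create snapshot ${VOL}@${SNAP}"
clone_cmd="setup snapshot ${VOL}@${SNAP} clone ${CLONE}"
share_cmd="setup zvol ${CLONE} share -i backupserver -t iSCSI"
destroy_clone_cmd="destroy zvol ${CLONE} -y"
destroy_snap_cmd="destroy snapshot ${VOL}@${SNAP} -y"

# Print them in execution order.
printf '%s\n' "$snap_cmd" "$clone_cmd" "$share_cmd" \
    "$destroy_clone_cmd" "$destroy_snap_cmd"
```

Keeping the names in variables also avoids the kind of vdp-01/vdp01 typo that bites you at the destroy step.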

Restoring data

To keep things simple, restoring this kind of data is best done to an NFS share that is visible to your ESXi hosts. This way you can restore directly from tape to the destination. The fvmfs tool presents a read-only copy of the VMFS datastore so you can’t use it for restores.

Under normal conditions this would be a very exceptional event, and the restore should go to some kind of temporary storage.

Other options

A simple approach for extracting a VDP appliance is to export the system as an OVF, but there are a number of shortcomings to this approach:

  • it’s slow to extract
  • it can only extract via the network
  • you need a big chunk of temporary space

NB: This is a specific approach to the specific problem of long term externalization. In most operational day to day use cases, you’re better off using some kind of replication to ensure availability of your backups.

[^fn-1]: Nexenta recordings are a method of building scripts based on the NMS command-line syntax. They are stored in the hidden .recordings directory in /volumes and are simple text files that you can launch with “run recording” from the NMS command line, without diving down to the expert-mode raw bash shell.


TV Fantasy League

I’ve been following many various discussions around the current state of affairs of the AppleTV and the future of media in its various forms, particularly around the TV experience.

The most recent deep discussion on the topic was Screen Time #40, with Moisés Chuillan, Horace Dediu, and Guy English, which kicked off a number of thoughts.

It seems to me that the TV discussion can be safely split into two categories:

  • streaming media, ie TV and movies
  • interactive media, ie games and apps

Streaming media

TV is an extremely provincial affair, with overlapping ownership, copyright, and distribution channels that vary from country to country. Then there are the technology variants: cable is the dominant method in North America, but practically non-existent in much of Europe, where over-the-air, TVoIP, and satellite dominate.

The key factor to remember when observing Apple in this market is that Apple is a global company, and while there’s lots of interesting soap opera analysis to be done regarding the US TV market, it’s only one market in a world where the US represents a shrinking portion of Apple’s revenue.

So any major moves on Apple’s part will have to be scalable to the world. This means that an Apple TV (in the current external box incarnation) will almost certainly not have any kind of direct cable connection.

Interactive media

Here we have some interesting stuff going on. Currently the iOS ecosystem of iPhones, iPod touches, and iPads has a thriving gaming market which dwarfs the gaming-console market in terms of people playing games.

Add to this the fact that the current-generation iPad with the A6X has a GPU capable of driving the 2048x1536 pixel display for highly detailed action games. So there’s no good reason this chip couldn’t go into the next-generation AppleTV, driving a 1920x1080 display with a decent gaming experience.

The latest generation of gaming rigs from Sony & Microsoft are targeting the upcoming 4K TVs that are just starting to hit the market, but this is a very small, rarefied market niche.

So Apple already has the core CPU and graphics hardware to transform an AppleTV into a decent, inexpensive gaming platform. In addition they now have 5 years of experience in developing frameworks for iOS aimed at making game development easier and accessible.

Missing pieces

The human/device interface

One thing that hobbles the current AppleTV and iOS devices is the lack of physical controls. The AppleTV remote is a minimalist’s dream, but beyond simple screen navigation, it’s pretty useless.

Using iOS devices as remotes is only barely viable, because for both gaming and driving a TV you’re not looking at your device, you’re looking at the TV. The lack of tactile feedback on the multitouch glass screen means you can’t tell where the buttons are without looking.

Hence it’s clear that we need some kind of device we can hold, with physical controls that can be driven by touch alone, without requiring visual interaction.

News is going around that iOS 7 will include APIs for (Apple approved) game controllers using Bluetooth 4. This is a logical step for adding a new type of gaming experience to iOS that will expand the possibilities far beyond what we can do well using a purely multi-touch interface.

In the short term, this enables the iPhone and iPad as portable gaming consoles that can be easily linked to the TV via an AppleTV using Airplay.

While this is a nice solution, the Airplay component is a rather significant bottleneck. This will be mitigated as more people migrate to 802.11ac, but this is going to take time and no matter how good the wireless network gets, radio is going to add latency and lag to the experience.

So the next logical step is to open up the AppleTV with a full fledged app API and go directly from the controller to the device driving the screen.

Now, in terms of raw horsepower, even the A6X can’t touch the capabilities of the new PlayStation and Xbox, but it will certainly be good enough for an awful lot of people. In addition, by starting with a limited API, the existing 13 million AppleTVs could participate immediately, ramping up to more powerful games with a newer model. Assuming the price remains roughly static, I can easily see an AppleTV and two controllers going for less than half the price of a full-fledged console, which puts it into impulse-buy territory. Breaking it into increments, where existing AppleTV owners only need to buy the controllers, makes it even easier to justify.

With this in mind, we can easily imagine many of the current console-style games available for the iPad coming to the AppleTV with minimal work for Apple and the game developers.

Handling the multiplicity of sources

For solving the streaming portion of the TV problem, the hardware controller is a definite must, and I suspect that the same controller APIs will be adapted for a richer input device, though still closer to the existing Apple Remote than to anything that currently exists in TV remote controls. So the input/manipulation problem can be solved relatively easily.

Moving on to the content, this is a fight best won by simply avoiding it entirely. Here I’m on shakier ground, since the changes I’m going to propose would probably require the price of the AppleTV to increase. While Apple isn’t shy about charging for its devices, it is pretty canny about knowing how the market values its products.

The current state of affairs for those of us without universal remotes looks something like the following:

  1. Turn on the TV with the TV Remote
  2. Select the input I want
  3. Put down the TV Remote and pick up the tuner/satellite/cable box remote
  4. Navigate through an awful UI to pick what I want to watch
  5. Put down that remote and pick up the TV Remote to adjust the volume

The first and biggest hurdle is getting the input switching out of the way, which could be solved by adding HDMI inputs to the AppleTV. The HDMI standard includes a control channel (CEC) that is implemented in some systems under a myriad of brand names. This means the TV can simply default to input 1 with the AppleTV, and waking the AppleTV will wake the TV. Then the jumping-off point for input selection, whether from an HDMI input or iTunes, is a single screen.

The interest of this approach is that HDMI has become the global video interconnection standard. Whether the box is consuming cable, Satellite input or whatever, the output is almost always HDMI. The weak point of this approach is that the CEC implementation is optional and the controls (channel up/down, browsing menus, etc.) are potentially unique to each device. But that’s a considerably simpler technical problem to solve with integration and compatibility testing rather than arguing over distribution rights in every geographic jurisdiction in the world.

However, adding HDMI ports will require a modification to the form factor and will almost certainly add to the price, although the cost per port appears to be fairly reasonable (a royalty of 4-14 cents per device). The vague part of the licence is that it seems to be per device, and I don’t know whether the number of ports on the device is a factor in the royalty pricing. And I have no idea of the component and integration cost impact.

So it appears to me that the logical evolution on the software side is an extension of the existing iOS toolchain, with the addition of hardware controllers to extend the interaction possibilities.

On the hardware side, there are more potential choices, from simply making the same box with more powerful internals, to extending the responsibility of the box to other components in the living room. It remains to be seen how Apple will approach this and on what time line, but it seems clear that tying more powerful hardware to an extended app ecosystem is the path of least resistance and most revenue. Attacking the streaming media situation head on is pretty much impossible, but interposing the AppleTV between the screen and the various sources seems to be a viable approach requiring relatively little dependency on the existing players.


Managing Thin Provisioning

This question has come to me via a number of different channels over the last few days. Thin provisioning is a really nice feature to give yourself some additional flexibility in managing storage usage. But things have gotten more than a little confusing lately since you can implement it on different levels with different issues.

The biggest fear is what I refer to as a margin call. You have declared more storage to your servers than you really have, and at some point you exceed your physical capacity and everything grinds to a halt. We’ve already seen similar issues with hypervisor snapshots, where the delta file grows unattended and the LUN fills up and can no longer accept any new writes.

In practical terms, I have a couple of different approaches, mostly depending on the environment you’re working in.


Production

You don’t want to take any chances in production. But this still doesn’t mean that thin provisioning is a bad idea. I strongly recommend that VMware’s thin provisioning not be used in production, since there is a noticeable performance impact. However, there are still good reasons to use it on the storage system:

  • Modern storage systems tend to use some kind of pooling technology to stripe IO across many disks. With fixed provisioning there’s a higher probability of running into hot and cold spots, and you might be limiting your performance.
  • Unexpected demands can arrive, and with fixed provisioning your reaction time may involve waiting on new hardware purchases.

So my general policy on production systems is to use thin provisioning, but never to overprovision. If I have an unexpected project that arrives and needs space, I can allocate it quickly, and start the purchasing process to ensure that my overprovisioned state is temporary. The key is to ensure that the demand is dependent on getting that purchase order approved, so the risk exposure is minimized.

Test, Dev, Integration, Qualification, …

In these environments the lifecycle is very very different from production. Much of the choices here depend on how you use these types of environments.

Much of the time the work can be exploratory, with unexpected demands for additional machines and storage as problems are identified and new test cases appear. In these environments I tend more towards a fixed allocation for a given project, but give the developers and testers the autonomy to deploy into it as they see fit. Within that fixed envelope, the logical choice is to lean more towards thin provisioning at the VM level.

However, to maintain maximum flexibility, it can be useful to continue using thin provisioning on the storage system as well. But in this case we have a different issue: how do you reclaim disk efficiently in an environment where machines are created and deleted frequently? The problem is that a deletion only updates the allocation table; the actual blocks that held the deleted VM have been written to, and thus remain allocated on the storage.

Reclaiming thin-provisioned storage today remains a PITA. Basically, we need to send some kind of command to clear the contents of unallocated blocks (zeroing out), and then instruct the storage system to reclaim those blocks, which generally involves the pretty brute-force approach of reading everything to see what can be reclaimed.
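As a sketch of the zeroing-out half of that process, here’s what clearing free space from inside a guest can look like. The mount point is a placeholder, and the optional size cap is purely for illustration; in real use you’d let dd run until the filesystem fills up:

```shell
# Sketch: write a file of zeros over the free space, then delete it,
# so the storage system can recognize those blocks as reclaimable.
# $1 = mount point (placeholder), $2 = optional cap in MB.
zero_free_space() {
    mnt=$1
    cap=${2:-}
    if [ -n "$cap" ]; then
        dd if=/dev/zero of="$mnt/zerofile" bs=1M count="$cap" 2>/dev/null || true
    else
        dd if=/dev/zero of="$mnt/zerofile" bs=1M 2>/dev/null || true  # stops at ENOSPC
    fi
    sync
    rm -f "$mnt/zerofile"
}
```

After the zero pass, the array-side reclaim still has to be triggered with whatever mechanism your storage system provides.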

To get around this issue, I have adopted a rolling approach where long-lived test and development environments are renewed quarterly (or monthly, depending on the volatility). This involves scripting the following actions:

  • Create a new LUN
  • Format the LUN as a datastore
  • svMotion the VMs on the source datastore to the new datastore
  • unmap the old LUN
  • delete the old LUN
  • possibly rename the new LUN (or use a minimal date stamp in the name)

This results in a freshly thin-provisioned datastore with only the current VMs’ storage allocated. Any thin-provisioned blocks on the original source LUN are freed with the deletion of the LUN.

Of course, you could always just use NFS backed by ZFS and let the system do its thing.

Other issues can come into play depending on your internal operating procedures, such as whether you do internal billing for allocated storage. In that case, the question of how to bill for thin-provisioned storage is an ongoing debate.


Restoring Open Directory from Time Machine on Mountain Lion

I just ran across an ugly situation where my Open Directory account went bad and was refusing to login to any services.

I was seeing these repeated errors in the System log :

Jun 20 18:40:51 PasswordService[168]: -[AuthDBFile getPasswordRec:putItHere:unObfuscate:]: no entries found for d24bd7b0-d8a7-11e1-ad93-000c29b10837
Jun 20 18:40:51 log[3195]: auth: Error: od(erik, Credential operation failed because an invalid parameter was provided.
Jun 20 18:40:51 log[3195]: auth: Error: od(erik, authentication failed for user=erik, method=CRAM-MD5

And the Password Service log was full of:

Jun 20 2013 16:25:24 74348us USER: {0xd24bd7b0d8a711e1ad93000c29b10837} bad ID.

Which were all of my various devices trying to catch up on mail.

So the obvious thing to do is restore Open Directory. But I know that I had made a number of changes since the last archive operation (yes, bad me) so I needed another way to get this back up and running quickly.

I do backup the server using Time Machine, SuperDuper and zfs snapshots, so I could easily do a full rollback to a previous point in time, but I would also lose whatever mail had arrived in the meantime. And the problem is so specific, I should be able to fix it by restoring just the Open Directory data.

So here’s how to restore your Open Directory from a Time Machine backup. Some steps can be accomplished in different ways, but this is probably the easiest way overall.

  • On the server, go to the Time Machine menu item and select enter Time Machine. This will mount your Time Machine disk image automatically.
  • On another machine open up an ssh session as an administrator (or you can mount the Time Machine backup image manually and do this locally)
  • sudo bash to get a root shell (the Open Directory files are not accessible to a regular admin account)
  • Stop the Open Directory Service with “serveradmin stop dirserv”
  • cd to /Volumes/Time Machine Backups/Backups.backupdb/servername
  • Here you will find a list of directories with the Time Machine backup sessions. Find one that is just before OD started going south and cd into it and descend to :
  • /Volumes/Time Machine Backups/Backups.backupdb/servername/date/servername/private/var/db
  • Then sync the data from the backup onto the source disk with :
  • rsync -av openldap/ /private/var/db/openldap/
  • Start the Open Directory Service with “serveradmin start dirserv”

You should be back in business.
