Hyperconvergence webinar

A quick news update: I’ll be cohosting a SimpliVity-sponsored webinar (in French) on the state of the hyperconvergence marketplace, based on the analysis of the survey done by ActualTech Media. Since the webinar is addressed to the French market, we’ll be going a little further into the analysis specific to EMEA.

Register here.

Reserve the date: July 1st at 10AM CEST. Looking forward to seeing you there!


Can't register vSphere Replication appliance

I ran into an interesting problem the other day when deploying vSphere Replication where the Appliance couldn’t register the service with vCenter. It turns out to be a combination of factors about the network configuration that can produce this problem. The problem is most likely to occur if you are using the vCSA.

As far as I can tell, the sequence of events for registering with vCenter is the following:

  • use the address or IP currently in use for the active Web Client session to contact vCenter
  • request the value of the Runtime settings vCenter Server name
  • contact the vExtension service based on the name returned in the previous step

And this is where the problem comes from. By default, when you install the vCSA, the value stored in the Runtime settings is the short name of the server, not the FQDN. At least this is the case on the v5.x versions. I haven’t yet tested the 6.0 vCSA.

The net result depends on how your network is configured and whether you are using DHCP or not. I was running into the problem and able to reproduce it with the following sequence of actions:

  • Configure DNS correctly with proper forward and reverse entries for the vCSA and the Replication Appliance
  • On a subnet with no DHCP services, deploy the vCSA with a fixed IP address
  • On the same subnet, deploy the vSphere Replication appliance with a fixed IP address

This will fail because when you configure the vSphere Replication appliance with a fixed IP, there’s no place to enter DNS search domains, so there’s no way name resolution will work for a short name returned by the vCSA. If you are deploying using DHCP, you will probably be sending search domains to the client, so the resolution will work properly.

When you go to the VAMI console of the replication appliance and try to manually connect to the vCenter server, you will get the following somewhat misleading error message:

“Unable to obtain SSL certificate: Bad server response; is a LookupService listening on the given address?”

It would have been nice if the message mentioned the address it was trying to contact, which would have highlighted the fact that it was looking up a short name.

The workaround is simply to update the Runtime settings vCenter Server name to the FQDN. It’s also probably a good idea to verify that the FQDN in Advanced settings has the correct value as well.

So if you ever see an appliance that has to register an extension to the vCenter web UI and it isn’t working, checking the value of the Runtime settings vCenter name might be the solution.
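
As a rough diagnostic, the short-name condition can be checked with a few lines of Python. This is only a sketch: the function name and the returned dictionary are illustrative, and the result reflects the DNS configuration of the machine running it, so it is only meaningful on a host configured like the appliance (same DNS servers, no search domains).

```python
import socket


def check_vcenter_name(name, resolve=socket.gethostbyname):
    """Report whether `name` is a short name and whether it resolves.

    `resolve` is injectable so the check can be exercised without a network;
    by default it uses the system resolver of the machine running the script.
    """
    # A name without a dot is treated as a short (unqualified) name.
    is_short = "." not in name
    try:
        address = resolve(name)
    except OSError:
        # socket.gaierror is a subclass of OSError: the lookup failed.
        address = None
    return {"name": name, "short_name": is_short, "address": address}
```

Calling `check_vcenter_name()` with the value found in the Runtime settings gives a quick answer: a result with `short_name` set and no `address` matches the failure described above.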


Edition issues

I’ve been standing on the sidelines of the discussions surrounding the Apple Watch Edition and its pricing, and learning a lot. The points of view range from the tech world, where value is derived uniquely from functionality and markets that function differently are met with total incomprehension, to the luxury watch world, where value is derived from craftsmanship and the cost in person-hours, to the fashion world and concepts like Veblen goods.

I would just like to bring up a thought that occurred to me this morning: much of the discussion revolves around the cost of goods, which is an incredibly limited method of analysis as a predictor of the eventual sale price of an object. This was underlined for me while reading a wonderful analysis of the videos Apple is showing about the manufacturing techniques they are using to produce the watches.

What this signaled to me is that there are complexities, costs and investments in the manufacturing process that go far beyond the raw materials costs and that need to be accounted for. Granted, for some of these systems, Apple is working at such an incredible scale that these inputs can sometimes be marginalized when considered on a cost per unit basis, but the Edition presents a special case which will clearly never be produced at the scale of any other product made by Apple.

And there’s one more thing…

The elephant in the room is simply this: it’s made of gold. Gold is both incredibly valuable on a price/weight basis and universally exchangeable. Many of the components in a modern smartphone, like individual chips and so on, are probably higher value on a cost/weight valuation, but they are only valuable to the greater market when assembled into a final product.

Which means that it’s very likely that there is an entirely separate production chain and set of facilities set up specifically for managing these new risks, which brings the investment on a cost per unit basis up even higher. This also means more in-depth security and background checking on the personnel that will be working in these facilities.

Gold as a major component of a product represents a hugely complicated security risk at all points in the production chain. This is an incremental cost that needs to be addressed from the source where the gold is purchased, through its transport to the factory where it is melted, converted to flattened ingots and then into blanks. From there the blanks will be taken to the facility where the machining is done (I find it doubtful that these processes are done on the same site). Anything dealing with machining produces swarf, only in this case the swarf is valued at $800-$1000/ounce rather than at the commodity pricing of aluminum. Not to mention that even if I recover a couple of pounds of aluminum by putting sweepings in my pocket, the available marketplace for reselling it remains limited.

Then we have the additional security through all of the following stages of stocking and transporting, and then additional security at the store level. This is a non-negligible cost factor that pretty much all of the discussions are ignoring. The watch people ignore it because it’s second nature to them and therefore obvious; the tech press ignores it because it’s so far outside the scope of the way the world works for them.

Remember, like all products, the cost is greater than the simple sum of the parts.


Datacenter SSDs cross over the price/GB barrier

This is a bit of a head scratcher. Samsung’s latest datacenter SSD lineup is now in the same price range as comparable enterprise SAS drives.

According to the documentation, the high endurance models are good for 10 drive writes per day over the 5 year guarantee. The kicker? I just found this drive for 670€, all taxes included.



Back to backups (yet again)

In the world of information technology, nothing is static or lasts forever, especially best practices. I’ve been pointing out to clients for a while now that backups need to be rethought in terms of the “jobs to be done” philosophy and no longer thought of as “the thing that happens overnight when files are copied to tapes”.

Historically, backups served two purposes:

  • Being able to go back in time and retrieve data that is no longer available
  • Serving as the basis for disaster recovery

Fundamentally, backups should really only serve the first point. We have better tools and mechanisms for handling disaster recovery and business continuity. Which brings me to snapshots. I have always told people that snapshots are not backups even though they respond to the criteria of being able to go back in time.

The hiccup is that snapshots that are dependent on your primary storage system should be considered fragile, in the sense that if your primary storage goes away (disaster), you no longer have access to the data or the snapshots. However, just about every storage system worth its salt today includes the ability to replicate data to another system based on or including the snapshots themselves. This is a core feature of ZFS and one I rely on regularly. Many of the modern scale-out systems also include this type of functionality, some even more advanced than ZFS, like the SimpliVity implementation.

When are snapshots backups?

They become backups once you have replicated them to another independent storage system. This responds to the two basic criteria of being able to go back in time and being on a separate physical system, so the loss of the primary does not preclude access to the data. They become part of your disaster recovery plan when the second system is physically distant from the primary.

Disk to Disk to Tape

We’ve already seen the traditional backup tools adopt this model to respond to the performance issues around coping with the ever-growing volume of file data: data can trickle over to a centralized disk store that is directly connected to tape drives, which can then be fed at full speed. Exploiting snapshot based replication permits the same structure, but assigns the responsibility of the disk to disk portion to the storage system rather than the backup software.

The question I ask in most cases here is whether the volume of data involved justifies the inclusion of tape as a backup medium. According to the LTO consortium, LTO6 storage is as low as 1.3 cents per GB, but this only takes into account the media cost. The most bare-bones LTO drive runs around $2,200, which bumps up the overall cost per GB rather dramatically.

Assuming a configuration where we store 72TB of data on tape (12 tapes), at the $80 cost per tape cited by the LTO Consortium plus the cost of the drive, this works out to about 4.4 cents/GB. At current street prices, the 6TB WD Red drives run about $270, which converts to 4.5 cents per GB, not taking into account the additional flexibility of disks that permit compression and deduplication. Note that the 6TB per tape used for the LTO numbers already includes compression, whereas the 6TB disk is raw capacity before compression and deduplication.
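
The arithmetic behind these figures is easy to reproduce. The sketch below uses only the prices and capacities quoted in the text and assumes decimal units (1 TB = 1000 GB); the helper name `cents_per_gb` is purely illustrative.

```python
def cents_per_gb(total_cost_usd, capacity_tb):
    """Cost in US cents per GB, using decimal units (1 TB = 1000 GB)."""
    return total_cost_usd * 100 / (capacity_tb * 1000)


# Figures quoted in the text.
TAPE_PRICE = 80        # USD per LTO6 cartridge
TAPE_CAPACITY_TB = 6   # per cartridge, compressed
DRIVE_PRICE = 2200     # USD, bare-bones LTO drive
DISK_PRICE = 270       # USD, 6 TB WD Red
DISK_CAPACITY_TB = 6   # raw, before compression or deduplication

TAPES = 12

tape_media_only = cents_per_gb(TAPE_PRICE, TAPE_CAPACITY_TB)
tape_with_drive = cents_per_gb(TAPES * TAPE_PRICE + DRIVE_PRICE,
                               TAPES * TAPE_CAPACITY_TB)
disk = cents_per_gb(DISK_PRICE, DISK_CAPACITY_TB)

print(f"tape, media only: {tape_media_only:.2f} cents/GB")
print(f"tape, with drive: {tape_with_drive:.2f} cents/GB")
print(f"disk:             {disk:.2f} cents/GB")
```

The tape figure with the drive included lands within a tenth of a cent of the number quoted above, and it keeps falling as more cartridges share the cost of the drive, so changing `TAPES` is a quick way to see where the crossover sits for a given data volume.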

Tape does have some inherent advantages in certain use cases, particularly long term offline storage, and does cost less to operate on a $/watt basis, but for many small to medium sized environments, the constraints for using it as a primary backup medium (especially when it is also the primary restore medium) are far outweighed by the flexibility, performance and convenience of a disk based system for daily operations.

Operational convenience of disk over tape

Tape is a great medium for dumping a full copy of a dataset, but when compared with the flexibility of a modern disk based system it falls far behind. A good example that I use is the ability to prune snapshots from a data set to reorganize the space utilisation. In many systems, I use hourly snapshots in order to give users a decent amount of granularity to handle errors and issues during the day. This also means that the unit of replication on a given filesystem is relatively small, permitting me to recover from intersite communications failures without having to resend huge data sets that might have been interrupted.

Then on the primary system I prune out the hourly snapshots after a week to leave one daily instance to be retained for 2 weeks. A similar process is applied to weekly and monthly snapshots. Where this gets interesting is that I do not have to apply the same policy on the primary and backup storage systems. My backup storage system is designed for capacity and will retain a month of daily snapshots, 8 weekly snapshots and 12 monthly snapshots.

Pruning data from a set is something that is impossible to do effectively using tape technology, so tape is used for an archival copy that needs to be retained beyond the yearly cycle.
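
To make the pruning logic concrete, here is a minimal sketch of a tiered retention policy in Python. It is not tied to any particular storage system. The function name, the policy dictionaries and several of the numbers are assumptions: the text does not give the weekly and monthly windows on the primary or the hourly window on the backup system, and each window is interpreted here as a total age counted back from now.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now, hourly_days, daily_days,
                      weekly_weeks, monthly_months):
    """Return the set of snapshot timestamps a tiered retention policy keeps.

    Everything newer than `hourly_days` is kept. Beyond that, only the first
    snapshot of each day, ISO week or month survives, for as long as the
    matching window allows.
    """
    tiers = [
        # (maximum age, function mapping a timestamp to its bucket)
        (timedelta(days=daily_days), lambda t: t.date()),
        (timedelta(weeks=weekly_weeks), lambda t: tuple(t.isocalendar())[:2]),
        (timedelta(days=31 * monthly_months), lambda t: (t.year, t.month)),
    ]
    keep = {t for t in snapshots if now - t <= timedelta(days=hourly_days)}
    for max_age, bucket in tiers:
        first_in_bucket = {}
        for t in sorted(snapshots):
            # Only the earliest snapshot of each bucket is a candidate.
            first_in_bucket.setdefault(bucket(t), t)
        keep |= {t for t in first_in_bucket.values() if now - t <= max_age}
    return keep


# Primary: hourlies for a week, dailies for 2 weeks (from the text);
# the weekly and monthly windows are assumed values.
PRIMARY = dict(hourly_days=7, daily_days=14, weekly_weeks=4, monthly_months=3)
# Backup: a month of dailies, 8 weeklies, 12 monthlies (from the text);
# the hourly window is an assumed value.
BACKUP = dict(hourly_days=7, daily_days=30, weekly_weeks=8, monthly_months=12)
```

Anything not in the returned set is a candidate for deletion on that system. Running the same snapshot list through `PRIMARY` and `BACKUP` shows how the two systems can diverge while still being fed by the same replication stream.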

Files vs virtual machines

The above-noted approach works equally well for file servers and storage systems hosting virtual machines, especially if we are using a file based protocol for hosting the VMs rather than a pure block protocol like FC or iSCSI. In the world of virtual machines, backup tools are considerably more intelligent about the initial analysis of the data to be backed up. Traditional file server backup is based on a two phase process of scanning the contents of the source, matching this against an index of data known to be backed up, and then copying the missing bits. This presents a number of practical issues:

  • the time to scan continues to grow with the number of files
  • copying many individual files is a slower process with more overhead than block based differentials

By applying the snapshot and replication technique, we can drastically reduce the backup window, since only the blocks modified between two moments in time need to be copied. In fact there is no longer a backup window since these operations are continuous in the background of the file server.
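
The idea of copying only the blocks that changed between two points in time can be illustrated with a toy example. This is a sketch only: real storage systems track changed blocks in their own metadata rather than by comparing content, and the block size and function names here are arbitrary choices for the illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; an arbitrary size chosen for the illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of `data` (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Indices of blocks in `new` that differ from, or do not exist in, `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]
```

Changing a single byte in a 16 KB buffer yields one block index to send rather than the whole buffer, which is the property that makes the replication stream between two snapshots so small compared with a file-by-file scan.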

Virtual machines in the VMware world maintain tracking journals of modified blocks (Changed Block Tracking, or CBT), which enables the backup software to ignore the filesystem representation of the data and just ask for the blocks modified since the last backup transaction. But again, if we are transmitting snapshots from the underlying storage system, we don’t even need to do this. It is, however, useful to issue VSS snapshots inside of Windows virtual machines to ensure that any inflight data in caches is flushed to disk before creating the storage layer snapshot.

The biggest issue with backing up virtual machines is the granularity of the restore operation. With only a simple replication, the result is a virtual machine with no visibility into the contents of its internal file systems. This is where the backup tools show their value in being able to back up a virtual machine at the block level, and yet still permit file level restores by peeking inside the envelope to look at the contents of the file systems therein.

The last mile

There are still issues with certain types of restore operations that require a high level of integration with the applications. If you want to restore a single email out of a backed up Exchange or Notes datastore, you need a more sophisticated level of integration than simply having a copy of the virtual machine.

But for the majority of general purpose systems, and particularly file services, the replicated snapshot approach is simpler and more effective, both from a cost and an operational perspective.