VDI and Storage De-Duplication: Good or Bad Marriage?

This article by a former VMware View engineer is interesting. It states that pooled desktops are the way to go. Funnily enough, those users can be served very well by Terminal Server, a feature that VMware lacks. Still a good article, though!

The idea of using data de-duplication to provide storage savings does not make sense for VDI. The real cost of a VDI solution lies in the number of I/O operations you get per dollar spent, not in the usable disk space you get per dollar.
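To make the IOPS-versus-capacity point concrete, here is a minimal sizing sketch. Every figure in it (desktop count, IOPS and GB per desktop, per-disk IOPS and capacity) is a hypothetical assumption chosen for illustration, not a measurement, and the helper `disks_needed` is not part of any VMware tool. It also ignores RAID write penalties and caching, which would change the numbers.

```python
import math

def disks_needed(desktops, iops_per_desktop, gb_per_desktop, disk_iops, disk_gb):
    """Return (disks required for IOPS, disks required for capacity)."""
    for_iops = math.ceil(desktops * iops_per_desktop / disk_iops)
    for_capacity = math.ceil(desktops * gb_per_desktop / disk_gb)
    return for_iops, for_capacity

# Hypothetical pool: 500 desktops, 20 IOPS and 30 GB each,
# on disks assumed to deliver 180 IOPS and 600 GB each.
for_iops, for_capacity = disks_needed(
    desktops=500, iops_per_desktop=20, gb_per_desktop=30,
    disk_iops=180, disk_gb=600,
)
print(f"Disks needed for IOPS:     {for_iops}")
print(f"Disks needed for capacity: {for_capacity}")
```

Under these assumed numbers the pool needs more than twice as many disks to satisfy IOPS as it does to satisfy capacity, so shrinking the capacity requirement through de-dupe would not reduce the number of disks to buy.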

VDI needs to be divided into two large buckets: Full Clones and Linked Clones.

De-dupe with Linked Clones

Linked Cloning technology uses a single image to provide a unique system disk to all virtual machines running in a View cluster (if you are interested in a deep dive on how Linked Cloning works, please read VMware View 4.5 Linked Cloning explained). This single image, called the Replica, will eventually serve hundreds of clone VMs, and its data will often be served from your fastest pool of disks, most likely SSD. Replicas have a very small storage footprint, often around a few GB. De-dupe is not useful for such a small amount of data, which will be kept as hot blocks 100% of the time.

In View 4.5, VMware introduced the concept of Disposable disks. These are .vmdk files created to host Windows temporary files such as logs, the Internet cache and the Windows swap file. Each VM will hold data specific to that particular VM, not necessarily common blocks that can be de-duped. If disposable disks are used, then a de-dupe solution that is not inline (I am not aware of any storage vendor providing inline de-dupe for primary storage) will not provide any benefit. Either way, just to close this thought: disposable disks are deleted with every VM power off.

It is also possible to have Persistent VMs using Linked Clones. This scenario gives you great operational flexibility, allowing the use of Refresh and Recompose operations. To this end it is common to make use of Persistent Disks. These are disks created to host the user’s personal settings and sometimes user data, such as My Documents. These disks are better served from a NAS than from the primary storage array, as they do not require the same performance. If there is an opportunity for de-dupe, this is the place; however, if you offload the data to the NAS, the persistent disk will only be a couple of MB, hosting the user and computer registries.

De-dupe with Full Clones

Full Clones are “full fat” VMs that persist across sessions. Full Clones can be created and managed by the connection manager (e.g. VMware View Manager) or created and managed by an external entity. If VMs are created by an external entity, they are then added to a Manual Pool through the available APIs and/or CLIs.

The first point I would like to raise about Full Clones is that they do not offer a supported and easy way to provide DR. If you are interested in a deeper read on DR for VDI, you may go on and read my article VMware View Disaster Recovery Scenarios & Options.

Full Clones are the only pool mode that would benefit from de-duplication, since all VMs start with exactly the same data in storage. However, most of the block commonality exists only up to the moment the user starts to use the VM. Once the VMs are in use they write their own data, log files and Windows page files. Some block commonality remains; however, as memory starts to swap to disk, it becomes much lower.
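The erosion of block commonality can be illustrated with a deliberately simplified model. It assumes every Full Clone starts with identical blocks and that a given fraction of each VM's blocks is later overwritten with data unique to that VM. The function name `dedup_ratio` and all input values are hypothetical; real arrays de-dupe at varying block sizes and will behave differently.

```python
def dedup_ratio(vms, blocks_per_vm, unique_fraction):
    """Logical blocks divided by distinct blocks stored after de-dupe.

    Simplified model: blocks not yet overwritten are shared by all VMs
    (stored once); overwritten blocks are unique to each VM.
    """
    unique_per_vm = round(blocks_per_vm * unique_fraction)
    shared = blocks_per_vm - unique_per_vm
    logical = vms * blocks_per_vm
    stored = shared + vms * unique_per_vm
    return logical / stored

# Hypothetical pool of 100 Full Clones with 1000 blocks each.
for fraction in (0.0, 0.1, 0.5):
    ratio = dedup_ratio(vms=100, blocks_per_vm=1000, unique_fraction=fraction)
    print(f"{fraction:.0%} unique blocks per VM -> {ratio:.2f}:1")
```

In this model the ratio is 100:1 while the clones are untouched, but it falls below 10:1 once a tenth of each VM's blocks are unique, and to roughly 2:1 when half of them are.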

In my opinion, it doesn’t make sense to have a de-dupe engine running in the background, or scheduled to run outside business hours, when for VDI solutions it is more important and more expensive to buy IOPS than usable disk space.

One of my customers had offline de-duplication scheduled to run overnight, and the task was running into business hours, affecting users’ ability to perform their work.

