While it is quite simple to find many people talking about Backup Exec and Veeam online, it is much more difficult to find anecdotal experience with CommVault Simpana. Having recently been part of an implementation in my company, I thought I would share my own opinions, particularly as it relates to a Hyper-V environment. These are my personal views and do not represent my employer in any way.
In short, if you run a Hyper-V environment, I cannot recommend CommVault Simpana in any capacity. You would be much better served investigating Veeam or Altaro instead, especially since those products are dedicated to virtualization support and have a track record of excellent word-of-mouth.
Initially I deployed on Simpana V10 SP12. Having completed the required (and outrageously expensive) CommVault training, I had a good understanding of how the system worked and how to implement it.
Overall, getting started went smoothly; however, it wasn’t long before I began encountering issues. Here are some of the things I’ve found with V10:
Lack of Change Block Tracking
Somehow during the investigation phase, no one at my company (including myself!) thought to check whether CommVault does Change Block Tracking (CBT) for Hyper-V. It turns out that in V10, it does not. This came as a very large surprise when I went to back up my 6TB virtual machine and the incremental required ~30 hours to complete.
Following an investigation with CommVault support, it was determined that a CRC process is done for every single bit in the VM, to assess whether it has changed or not. There were certain optimizations I made to ensure that this process was as fast as possible, such as changing my EqualLogic MPIO to Round Robin and ensuring the EQL was using 4xNIC with no dedicated management NIC.
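To show why a checksum comparison is so much slower than true change tracking, here is a minimal sketch of checksum-based change detection. It is not CommVault's implementation; the block size, function names, and sample data are all made up for illustration. The point is that every block has to be read and hashed on every run, even when almost nothing changed.

```python
import zlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return one CRC32 per fixed-size block; every byte of data is read."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous: list[int], current: list[int]) -> list[int]:
    """Return indexes of blocks whose checksum differs from the previous run."""
    changed = [i for i, (old, new) in enumerate(zip(previous, current)) if old != new]
    # Blocks past the end of the previous list are new, so treat them as changed.
    changed.extend(range(len(previous), len(current)))
    return changed


baseline = block_checksums(b"AAAABBBBCCCCDDDD")
tonight = block_checksums(b"AAAABBBBCCXCDDDD")
print(changed_blocks(baseline, tonight))  # [2]
```

Only one block changed here, yet all four were read to find it. A CBT driver avoids that full read by recording writes as they happen.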
By working through some of the other issues below, I was able to mostly mitigate the slow CRC process in my environment, but it was a major challenge.
Cluster Shared Volume Owner node
Windows Server 2012 did away with the concept of “Redirected Mode” for backup in a Hyper-V environment, but I don’t think CommVault got the message. While my CSV didn’t go into an actual Redirected Mode, it turns out that only the first Node specified within the CommVault Virtual Client for Hyper-V will stream the backup data, regardless of the owner of the CSV being worked on.
What this means is that in my 2-node cluster, the CRC read process occurred between my cluster hosts on the 1x1GbE network for cluster communication, rather than happening directly on the node that owned the CSV. This was a huge bottleneck and absolutely killed performance in the cluster.
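Some back-of-the-envelope arithmetic shows how tight a single 1 Gbit/s link is for a full read of a 6TB VM. This is a rough sketch assuming decimal units and ignoring protocol overhead; the helper names are made up, and the 30-hour job may well have had other limiting factors besides this link.

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def effective_rate_mb_s(bytes_moved: float, hours: float) -> float:
    """Average throughput in MB/s for a job that moved bytes_moved in hours."""
    return bytes_moved / (hours * 3600) / MB


def min_hours(bytes_moved: float, link_bits_per_s: float) -> float:
    """Best-case hours to move bytes_moved over a link with no overhead."""
    return bytes_moved / (link_bits_per_s / 8) / 3600


observed = effective_rate_mb_s(6 * TB, 30)  # the ~30 hour incremental
ceiling = 1e9 / 8 / MB                      # theoretical 1 Gbit/s ceiling
best_case = min_hours(6 * TB, 1e9)

print(f"observed: {observed:.1f} MB/s")     # observed: 55.6 MB/s
print(f"ceiling:  {ceiling:.1f} MB/s")      # ceiling:  125.0 MB/s
print(f"best case: {best_case:.1f} hours")  # best case: 13.3 hours
```

Even a perfectly efficient full read of 6TB over that link would take more than 13 hours, before accounting for the cluster traffic the network is actually meant to carry.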
The solution was to create a Pre-Job PowerShell script that moved the CSV ownership to one node in the cluster, which was set as the proxy for the CommVault backup. This is not ideal, especially as Windows Server automatically re-balances the CSV owner as of Server 2012.
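A minimal sketch of what such a pre-job step could look like follows. It is written as a small Python wrapper rather than the PowerShell used in production, and it is not the actual script. The CSV name and node name are hypothetical placeholders; the only cluster call it relies on is the `Move-ClusterSharedVolume` cmdlet from the FailoverClusters module.

```python
import subprocess


def build_move_csv_command(csv_name: str, target_node: str) -> list[str]:
    """Build the PowerShell command line that moves one CSV to target_node."""
    # Names are interpolated as-is, so this sketch assumes they contain no quotes.
    script = (
        "Import-Module FailoverClusters; "
        f"Move-ClusterSharedVolume -Name '{csv_name}' -Node '{target_node}'"
    )
    return ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command", script]


def move_csv(csv_name: str, target_node: str) -> None:
    """Run the move; raises CalledProcessError if PowerShell reports failure."""
    subprocess.run(build_move_csv_command(csv_name, target_node), check=True)


# Hypothetical names: substitute the real CSV resource and backup proxy node.
command = build_move_csv_command("Cluster Disk 1", "HV-NODE1")
print(command[-1])
```

Calling `move_csv(...)` for each CSV before the job starts pins ownership to the proxy node, but it has to be repeated every run because the cluster may re-balance ownership afterwards.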
To fully saturate my iSCSI connections, I had to split my large VMs into smaller ones. The recommendation I received from CommVault was to not have a VM larger than 2TB. I found that regardless of how many data readers and network streams I configured on the subclient, only a single iSCSI connection was utilized. Once I changed MPIO to Round Robin, all iSCSI connections were utilized but not fully saturated by one subclient.
Now I have 4 subclients, at ~2TB each, running concurrently. This caused some major effort in re-configuring our File Server(s) but thankfully we’re using DFS namespaces to obfuscate the actual server names and it was fairly invisible to our users.
Previously, using Microsoft DPM or Backup Exec, I never experienced VSS issues from the Hyper-V hosts or the guests. With Simpana, out of 7 jobs running nightly, at least one fails with some kind of VSS error. Whether it is “writer is in a transient state” or just errors getting the snap in the first place, it is a regular occurrence. I have mitigated some of these issues by ensuring that all guest drives have more than 15% free space on them, including the PageFile.VHDX volumes I’ve created per VM.
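The free-space rule is easy to check with a small script. The sketch below is only an illustration, assuming it runs inside a guest and is pointed at that guest's drives; the function names are made up, and the 15% figure comes from my own experience rather than any documented VSS requirement.

```python
import shutil

FREE_THRESHOLD = 0.15  # drives should have MORE than 15% free


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the volume that is free, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def needs_attention(total_bytes: int, free_bytes: int,
                    threshold: float = FREE_THRESHOLD) -> bool:
    """True when the volume has threshold or less of its space free."""
    return free_fraction(total_bytes, free_bytes) <= threshold


def check_path(path: str) -> bool:
    """Apply the rule to the volume that contains path."""
    usage = shutil.disk_usage(path)
    return needs_attention(usage.total, usage.free)


# Check the volume holding the current directory as a stand-in for a guest drive.
print("low on space" if check_path(".") else "ok")
```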
Still, for a top-tier product I would not expect this many errors, especially when the environment was fully stable prior to Simpana.
Lack of VSS Hardware Support
I would LOVE to use my EqualLogic hardware VSS provider, but it is not supported, and I have found zero indications that progress is being made in supporting additional VSS hardware providers. I actually tried it out and the backup completed successfully; however, there were numerous errors on the Hyper-V node, and since it is an unsupported configuration I cannot use it in production.
Now Version 11 SP2 has been released, and there are two crucial improvements that it is supposed to provide for Hyper-V:
Change Block Tracking
A 3rd party file system driver has been implemented for CBT in Hyper-V environments. After initial implementation (which requires a new Full backup) it seemed to work quite effectively; my 4 large VMs, which had each been taking 7-9 hours, now required 40-80 minutes for an incremental.
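To put those numbers in perspective, here is the speedup range worked out. This is simple arithmetic on the figures above, assuming the 7-9 hour range describes the earlier incrementals for the same VMs; the function name is made up.

```python
def speedup_range(before_hours: tuple[float, float],
                  after_minutes: tuple[float, float]) -> tuple[float, float]:
    """Return (smallest, largest) speedup given before and after duration ranges."""
    before_low, before_high = (hours * 60 for hours in before_hours)
    after_low, after_high = after_minutes
    # Smallest gain: shortest old job against longest new job, and vice versa.
    return before_low / after_high, before_high / after_low


smallest, largest = speedup_range((7, 9), (40, 80))
print(f"{smallest:.2f}x to {largest:.1f}x faster")  # 5.25x to 13.5x faster
```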
However, the second weekend something happened where a subclient job crashed, put a CSV into Redirected Mode, and hard-locked a cluster node when I tried to return the CSV to normal. Since then, CBT has been failing on at least 50% of my subclients, even after having CommVault support perform a reset on it.
At this point I am not very trusting of such a new feature.
CSV Owner recognition
V11 was supposed to introduce new algorithms for CSV owner identification, allowing all Cluster nodes to act as coordinators for backup of subclients. While this mostly works, there are still odd quirks (that I haven’t dug into deeply yet) such as a weekend job last night that was again saturating my cluster communication network between nodes and effectively locking up every VM running on the cluster. I think I’m still safer moving the CSV owner before every job right now.