Microsoft Ignite – Approval

I’m going to Microsoft Ignite!

 

In early May I’ll be traveling to Chicago for a week, and am extremely excited that my request to attend was approved by my company.

I have never been to a large conference like this, although I did attend a smaller single-vendor conference in 2013. At this point I don’t really know what to expect, but I’m most looking forward to the following areas:

  • Networking and discussions with companies and IT experts who see the same struggles I do
  • Using Hyper-V in real-world scenarios
  • Server 2012 R2 features like DirectAccess and Remote Desktop Services

Having never been to Chicago, and not knowing what to look for in accommodations, I took a stab in the dark and will be staying at the “Hotel Rush”.

I hope that my attendance this year proves very valuable to my company, so that the justification for attending in 2016 is stronger; with Windows 10 and Server vNext coming in late 2015, I anticipate Ignite 2016 will be even more exciting.

DPM synchronization failure on Secondary Server

I now have a Microsoft Data Protection Manager 2012 R2 environment set up as a replacement for Backup Exec 2010.

Despite the lack of some features, it has been performing quite well. However, I recently started receiving email notifications of errors relating to my secondary DPM server.

The primary DPM server exists in the head office and provides backup of Hyper-V VMs from my main cluster. The secondary DPM server exists in a branch office 300 km away and provides backup of the primary DPM server.

 

I started receiving errors like the following from the Secondary server, relating to individual resources on the primary:

Synchronization for replica of \Online\servername(servername.clustername) on PrimaryDPM failed because the replica is not in a valid state or is in an inactive state. (ID 30300 Details: VssError: The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur. (0x800423F4))

 

Every time I tried to perform a consistency check on these resources, it would begin and then end within 30 seconds.

To be honest I didn’t have a lot of time to troubleshoot this one. I tried restarting both DPM servers as well as the Hyper-V host and VM itself, and none of that seemed to have an impact.

At some point I noticed that the resources giving the errors on the Secondary server hadn’t had a recovery point on the Primary server in quite some time.

I forced an Express Full Backup of the VMs on the Primary server and allowed it to complete (successfully). I then initiated a consistency check on the Secondary server’s protected resources, and it too completed successfully!

Where I’m still confused is why I didn’t receive alerts from my Primary DPM server that recovery points were being missed.
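
For reference, the fix boils down to just a few commands in the DPM Management Shell. This is only a sketch with placeholder server and protection group names (and the exact New-DPMRecoveryPoint parameters can vary between DPM versions), not the exact script I ran:

```powershell
# Run from the DPM Management Shell. "PRIMARYDPM", "SECONDARYDPM" and the
# protection group names are placeholders for my real environment.

# On the primary: force a new disk recovery point (an express full) for the VM.
$pg = Get-DPMProtectionGroup -DPMServerName "PRIMARYDPM" |
      Where-Object { $_.FriendlyName -eq "Hyper-V VMs" }
$ds = Get-DPMDatasource -ProtectionGroup $pg |
      Where-Object { $_.Name -like "*servername*" }
New-DPMRecoveryPoint -Datasource $ds -Disk -BackupType ExpressFull

# On the secondary: once the recovery point on the primary completes, run a
# consistency check on the matching replica to bring it back to a valid state.
$pg = Get-DPMProtectionGroup -DPMServerName "SECONDARYDPM" |
      Where-Object { $_.FriendlyName -eq "Primary DPM protection" }
$ds = Get-DPMDatasource -ProtectionGroup $pg |
      Where-Object { $_.Name -like "*servername*" }
Start-DPMDatasourceConsistencyCheck -Datasource $ds
```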

EqualLogic DPM Hyper-V network tuning

I’m in the process of configuring DPM to back up my Hyper-V environment, which resides on an EqualLogic PS6500ES SAN.

It was during this that I encountered an issue where the DPM consistency check for a 3 TB VM locked up every other VM on my cluster due to high write latencies. During this period I couldn’t even get useful stats out of the EqualLogic, because SANHQ wouldn’t communicate with it and the Group Manager live sessions would fail to initialize.

 

After some investigation, I did the following on all my Hyper-V hosts and my DPM host:

  • Disabled “Large Send Offload” for every NIC
  • Set “Receive Side Scaling” queues to 8
  • Disabled the Nagle algorithm for the iSCSI NICs (http://social.technet.microsoft.com/wiki/contents/articles/7636.iscsi-and-the-nagle-algorithm.aspx)
  • Updated Broadcom firmware and drivers
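
For anyone who would rather script these changes than click through adapter properties on every host, here is roughly what they look like in PowerShell. Treat it as a sketch: the iSCSI adapter names are placeholders, and the TcpAckFrequency/TcpNoDelay registry values are the ones described in the TechNet article linked above.

```powershell
# Disable Large Send Offload on every NIC.
Disable-NetAdapterLso -Name "*"

# Set Receive Side Scaling to 8 receive queues (NICs without RSS will complain; ignore them).
Set-NetAdapterRss -Name "*" -NumberOfReceiveQueues 8

# Disable the Nagle algorithm (and delayed ACK) on the iSCSI interfaces only.
# "iSCSI1" and "iSCSI2" are placeholder adapter names.
foreach ($nic in Get-NetAdapter -Name "iSCSI1", "iSCSI2") {
    $key = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\$($nic.InterfaceGuid)"
    Set-ItemProperty -Path $key -Name TcpAckFrequency -Value 1 -Type DWord
    Set-ItemProperty -Path $key -Name TcpNoDelay -Value 1 -Type DWord
}
```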

 

Following these changes, I still see very high write latency on my backup datastore volume, but the other volumes operate perfectly.

Server 2012 R2 Upgrade and BSOD

I’m currently in the process of upgrading a standalone Server 2012 machine running Hyper-V to Server 2012 R2.

Due to resource constraints, I’m performing an in-place upgrade, despite this server residing 800 km away from me. Thank goodness for iDRAC Enterprise.

 

However, during the “Getting Devices Ready” phase of the upgrade, I received a Blue Screen of Death with the error message:

whea_uncorrectable_error

 

After it hit this BSOD twice, the upgrade process failed and rolled back to Server 2012. I was unable to find a log file with any more detail about what occurred, and was worried that I would be stuck on Server 2012.

Thankfully, I discovered a log file on the iDRAC with the following message:

A bus fatal error was detected on a component at slot 1.

This jogged my memory: I recalled that we have a USB 3.0 PCIe card installed for pre-seeding an external drive with backup data.

I used the BIOS setup (Integrated Devices > Slot Disablement) to disable Slot 1, and then retried the upgrade with fingers crossed.

Success!

 

Hyper-V 2012 migration to R2

A co-worker and I just completed an upgrade of our 2-node Server 2012 Hyper-V cluster to a 3-node Server 2012 R2 cluster, and it went very smoothly.

I’ve been looking forward to some of the improvements in Hyper-V 2012 R2, in addition to the 3rd node, which is going to be the basis for our Citrix XenApp implementation (with an nVIDIA GRID K1 GPU).

I’ve posted before about my Hyper-V implementation, which used iSCSI as the protocol but with direct connections rather than switching, since I only had 2 hosts.

For this most recent upgrade I needed to add a 3rd host, which meant moving to a properly switched iSCSI SAN. Here’s the network design I moved forward with:

Server 2012 R2 Network Design (diagram)

 

This time I actually checked compatibility of my hardware before proceeding, and found no issues to be concerned about.

The upgrade process is described below, including the various steps required when 1) renaming hosts in use with an MD3220i, and 2) converting to a switched iSCSI SAN instead of direct connect. Rough PowerShell sketches of the scriptable steps follow each list:

Before maintenance window

  • Install redundant switches in the rack (I used PowerConnect 5548s)
  • Live Migrate VMs from Server1 to Server2
  • Remove Server1 from Cluster membership (Evict Node)
  • Wipe and reinstall Windows Server 2012 R2 on Server1
  • Configure Server1 with new iSCSI configuration as documented
  • Re-cable iSCSI NIC ports to redundant switches
  • Create new Failover Cluster on Server1
  • From Server1 run “Copy Cluster Roles” wizard (previously known as “Cluster Migration Wizard”)
    • This will copy VM configuration, CSV info and cluster networks to the new cluster
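
Most of these steps were clicks in Failover Cluster Manager, but the pieces that can be scripted look roughly like this in PowerShell (cluster names, server names and the IP are placeholders; the “Copy Cluster Roles” step itself I ran through the wizard):

```powershell
# Drain Server1, then evict it from the old cluster.
Get-ClusterGroup -Cluster "OldCluster" |
    Where-Object { $_.GroupType -eq "VirtualMachine" } |
    Move-ClusterVirtualMachineRole -Node "Server2"

Remove-ClusterNode -Cluster "OldCluster" -Name "Server1"

# After the wipe/reinstall and iSCSI re-cabling, create the new cluster on Server1.
New-Cluster -Name "NewCluster" -Node "Server1" -StaticAddress "192.168.1.50"
```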

Within maintenance window

  • When ready to cut over:
    • Power down VMs on Server2.
    • Take the CSVs on the original cluster offline
    • Power down Server2
  • Remap the host mappings for each server in Modular Disk Storage Manager (MDSM) to the “unused iSCSI initiator” after renaming the host; otherwise you won’t find any available iSCSI disks
  • Reconfigure iSCSI port IP addresses for MD3220i controllers
  • Add host to MDSM (for new 3rd node)
  • Configure iSCSI Storage on Server1 (followed this helpful guide)
  • On Server1, bring the CSVs online
  • Start VMs on Server1, ensure they’re online and working properly
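
The cutover itself is mostly shutdown/startup work, which could be done along these lines (the MDSM remapping and taking the old CSVs offline I did in their respective consoles; names are placeholders):

```powershell
# On Server2 (old cluster): shut down the VMs cleanly before the cutover.
Get-VM | Stop-VM

# On Server1 (new cluster), once the storage is remapped: check that the
# CSVs are online, then start the copied VM roles.
Get-ClusterSharedVolume | Format-Table Name, State

Get-ClusterGroup |
    Where-Object { $_.GroupType -eq "VirtualMachine" } |
    Start-ClusterGroup
```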

 

At this point I had a fully functioning, single-node cluster on Server 2012 R2. With the right planning, you can do this with 5-15 minutes of downtime for your VMs.

Next I added the second node:

  • Evict Server2 from the old cluster, effectively killing it.
  • Wipe and reinstall Windows Server 2012 R2 on Server2
  • Configure Server2 with the new iSCSI configuration as documented
  • Re-cable iSCSI NICs to the redundant switches
  • Join Server2 to the new cluster
  • Re-allocate VMs to Server2 to share the load
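
Joining the rebuilt node and re-balancing the VMs is only a couple of lines (again with placeholder names):

```powershell
# Join the rebuilt Server2 to the new cluster.
Add-ClusterNode -Cluster "NewCluster" -Name "Server2"

# Live migrate a share of the VM roles over to Server2 ("VM1" is a placeholder).
Move-ClusterVirtualMachineRole -Cluster "NewCluster" -Name "VM1" -Node "Server2"
```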

I still had to reset the preferred node and failover options on each VM.
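
That part can be done in bulk too; a sketch, assuming every VM role should list both current nodes as possible owners:

```powershell
# Reset the preferred owners on every VM role to the two current nodes.
Get-ClusterGroup |
    Where-Object { $_.GroupType -eq "VirtualMachine" } |
    Set-ClusterOwnerNode -Owners "Server1", "Server2"
```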

Adding the 3rd node followed the exact same process. The Cluster Validation Wizard gave a few errors about the processors not being the exact same model; however, I had no concerns there, as it is simply a newer-generation Intel Xeon.

 

The remaining task is to upgrade the Integration Services for each of my VMs, which will require a reboot, so I’m holding off for now.