EqualLogic DPM Hyper-V network tuning

I’m in the process of configuring DPM to back up my Hyper-V environment, which resides on an EqualLogic PS6500ES SAN.

While doing this, I ran into an issue: the DPM consistency check for a 3TB VM locked up every other VM on my cluster because of high write latencies. During this period I couldn’t even get useful stats out of the EqualLogic, because SANHQ wouldn’t communicate with it and the Group Manager live sessions would fail to initialize.

 

After some investigation, I did the following on all my Hyper-V hosts and my DPM host:

– Disabled “Large Send Offload” for every NIC

– Set “Receive Side Scaling” queues to 8

– Disabled the Nagle algorithm for iSCSI NICs (http://social.technet.microsoft.com/wiki/contents/articles/7636.iscsi-and-the-nagle-algorithm.aspx)

– Updated Broadcom firmware and drivers
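
For the Nagle step, the TechNet article linked above describes per-interface registry values. Here is a minimal sketch, assuming the value names from that article (TcpAckFrequency and TcpNoDelay, both set to DWORD 1) and a placeholder interface GUID. The helper name is my own, and it only builds the `reg add` command lines so they can be reviewed before anything is run on a host:

```python
# Sketch: build (not run) the registry commands that disable Nagle on one iSCSI NIC.
# Assumptions: value names follow the TechNet article linked above;
# the GUID used in the demo is a placeholder, not a real adapter.
TCPIP_INTERFACES = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"

NAGLE_VALUES = {
    "TcpAckFrequency": 1,  # acknowledge every segment instead of delaying ACKs
    "TcpNoDelay": 1,       # send small packets without waiting to coalesce them
}


def nagle_commands(interface_guid):
    """Return the `reg add` command lines for one interface GUID."""
    key = f"{TCPIP_INTERFACES}\\{interface_guid}"
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'
        for name, value in NAGLE_VALUES.items()
    ]


if __name__ == "__main__":
    for line in nagle_commands("{00000000-0000-0000-0000-000000000000}"):
        print(line)
```

Only the iSCSI NICs should get these values, so the GUID list needs to be picked by hand for each host.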

 

Following these changes, I still see very high write latency on my backup datastore volume, but the other volumes operate perfectly.

Strange Sonicwall network issue

Since May I’ve been struggling with a very odd issue with the Sonicwall NSA 2400 in my head office. It was first discovered when our VPNs kept going down without warning, multiple times per day.

After some internal investigation, my team noticed a pattern: one of us was configuring SSL-VPN for the first time, and every time they changed a setting, our X2 interface went down.

Only X2 went down though; we have X1 connected to an entirely different ISP, and it never had any issue. Unfortunately X2 was the interface providing connectivity for all our site-to-site VPNs, as well as our external client-facing services.

I narrowed down how to replicate the issue, and discovered that any change to a NAT policy caused it, as did some other seemingly unrelated settings changes. However, changes to firewall access rules did not impact X2 connectivity.

I could verify the issue by pinging my X2 gateway from the Sonicwall. Before enabling or disabling a NAT policy, the ping was successful. However, as soon as I made a change, the ping timed out.

Connectivity was automatically restored after 5-6 minutes; there was nothing I could do to force traffic to resume.
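
To put numbers on an outage like this, ping results can be reduced to down/up windows. This is only a sketch: the function name and the sample data are hypothetical, and the samples are made up to resemble the 5-6 minute gap described above rather than taken from my logs.

```python
def outage_windows(samples):
    """Turn time-ordered (timestamp_seconds, ping_ok) samples into outage windows.

    Each window is (first_failed_sample, first_successful_sample_after).
    An outage still open at the end of the data has None as its end.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # outage begins
        elif ok and start is not None:
            windows.append((start, ts))   # outage ends
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


if __name__ == "__main__":
    # Hypothetical data: one ping per minute, NAT policy toggled just before t=120.
    samples = [(0, True), (60, True), (120, False), (180, False),
               (240, False), (300, False), (360, False), (420, True)]
    for start, end in outage_windows(samples):
        print(f"X2 gateway unreachable from t={start}s until t={end}s")
```

Feeding it timestamps from a continuous ping makes it easy to compare outage lengths before and after a change such as a firmware upgrade.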

I got in touch with my ISP but they confirmed that it wasn’t a problem on their network.

I had a ticket open with Sonicwall for quite some time, and diligently followed their directions, including wiping the Sonicwall and starting from factory defaults (that didn’t work).

Next they asked me to reconfigure the link on X5 to replace X2, but that didn’t work either.

After a few delays in troubleshooting, Sonicwall support recommended a hard reset: boot into safe mode, upgrade to the 5.9 firmware, and then reset to factory defaults. Apparently the first reset to defaults was considered a ‘soft reset’ and isn’t as effective. To be honest, I don’t understand how a hard reset could resolve an issue like this, but I was willing to give it a shot.

After planning a 2 hour maintenance window, I began the hard reset procedure. When the Sonicwall came back up in Safe Mode, I upgraded to 5.9 firmware and booted to factory defaults. Then I reconfigured the LAN and WAN interfaces, and tested my original issue. Success! X2 didn’t go down.

I was really hoping to avoid a full reconfigure from scratch, so after my successful test I imported my most recent config backup and crossed my fingers that the problem wouldn’t return. After the reboot I disabled a NAT policy, and confirmed that X2 stayed up the entire time. Success again!

Overall, I was very pleased with Sonicwall support. Although they couldn’t pinpoint a root cause, they were always quick to respond and understood that I needed to schedule maintenance windows for any work on the device. Sonicwall gets a bad reputation in some IT circles, but I will have no hesitation in purchasing additional units and recommending them to others.

 

I always forget the basics

I had a strange issue with one of my branch offices, where they would lose access to local resources and external Internet sites whenever our Site-to-Site VPN with the head office went down.

I spent around 3 hours troubleshooting this issue, desperately looking for a logical cause. It wasn’t until I paid closer attention to the DNS settings handed out by the DHCP server that I noticed the primary DNS nameserver was a legacy domain controller within the branch office that no longer existed, and the secondary DNS was a domain controller in our head office, across the VPN.

When the VPN link went down, the clients had no resolvable DNS servers, and thus couldn’t access anything except by direct IP.

When I discovered this, it was a quick fix that brought services back online promptly.
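
A check like the following would have caught this in minutes. It is a rough sketch: the function names are mine, the addresses in the demo are documentation placeholders, and it assumes the DNS servers accept TCP connections on port 53, which may not hold for every resolver.

```python
import socket


def tcp_probe(server, port=53, timeout=2.0):
    """Return True if a TCP connection to server:port succeeds."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        return False


def usable_dns_servers(servers, probe=tcp_probe):
    """Return the DHCP-supplied DNS servers that currently answer the probe."""
    return [s for s in servers if probe(s)]


if __name__ == "__main__":
    # Placeholder addresses: a retired local DC and a head-office DC behind the VPN.
    dhcp_dns = ["192.0.2.10", "198.51.100.10"]

    # Stub probe simulating "VPN down": nothing answers, so no network traffic is sent.
    def vpn_down(server):
        return False

    alive = usable_dns_servers(dhcp_dns, probe=vpn_down)
    print("usable DNS servers:", alive or "none, clients cannot resolve names")
```

Calling `usable_dns_servers(dhcp_dns)` without the stub runs the real probe from a client on the branch network.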

Unfortunately, all too often I dive into a problem looking for a complex cause without seeing the simple issue right in front of me. I need to learn to be a little more methodical in my problem solving, and start at Layer 1.

Well that was unexpected

I’m working late tonight fixing some stuff with Sonicwall, and it worked on the first try! Now that my SSL-VPN is configured over port 443, hopefully it will pass through a client’s super restrictive firewall and solve a long-standing problem for my user.

I’m calling it a night, ending on a high note.

Windows Server 2012 Windows Update Error 0x80240440

I have begun setting up a new server for a branch office, and have decided to use Windows Server 2012 on it; thanks Software Assurance! This way I can utilize the new Hyper-V features when I’m ready, as well as virtualize a domain controller properly.

 

However, I ran into a problem with Windows Update on both the host and the guest running Server 2012. Windows Update reported error 0x80240440.

The Windows Update log, located at %windir%\WindowsUpdate.log, reported this:

+++++++++++  PT: Synchronizing server updates  +++++++++++
  + ServiceId = {9482F4B4-E343-43B6-B170-9A65BC822C77}, Server URL = https://fe1.update.microsoft.com/v6/ClientWebService/client.asmx
WARNING: Nws Failure: errorCode=0x803d0014
WARNING: Original error code: 0x80072efe
WARNING: There was an error communicating with the endpoint at 'https://fe1.update.microsoft.com/v6/ClientWebService/client.asmx'.
WARNING: There was an error sending the HTTP request.
WARNING: The connection with the remote endpoint was terminated.
WARNING: The connection with the server was terminated abnormally
WARNING: Web service call failed with hr = 80240440.
WARNING: Current service auth scheme='None'.
WARNING: Proxy List used: '(null)', Bypass List used: '(null)', Last Proxy used: '(null)', Last auth Schemes used: 'None'.
FATAL: OnCallFailure(hrCall, m_error) failed with hr=0x80240440
WARNING: PTError: 0x80240440
WARNING: SyncUpdates_WithRecovery failed.: 0x80240440
WARNING: Sync of Updates: 0x80240440
WARNING: SyncServerUpdatesInternal failed: 0x80240440
WARNING: Failed to synchronize, error = 0x80240440
WARNING: Exit code = 0x80240440
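
When a log repeats the same code a dozen times, it helps to pull out the distinct codes and split each into its facility and low-order code. The sketch below uses my own helper names and a few lines copied from the excerpt above. In this case 0x80072efe decodes to facility 7 (Win32) and code 12030, which WinINet documents as ERROR_INTERNET_CONNECTION_ABORTED. That fits the "connection was terminated" warnings and points at something on the network path rather than at Windows Update itself.

```python
import re

HEX_CODE = re.compile(r"0x[0-9a-fA-F]{8}")


def error_codes(log_text):
    """Return the distinct 0x-prefixed 8-digit codes, in order of first appearance."""
    seen = []
    for code in HEX_CODE.findall(log_text):
        code = code.lower()
        if code not in seen:
            seen.append(code)
    return seen


def decode_hresult(code):
    """Split an HRESULT string into (is_failure, facility, low 16-bit code)."""
    value = int(code, 16)
    is_failure = bool(value & 0x80000000)  # severity bit
    facility = (value >> 16) & 0x1FFF      # same mask as the HRESULT_FACILITY macro
    return is_failure, facility, value & 0xFFFF


if __name__ == "__main__":
    excerpt = """WARNING: Nws Failure: errorCode=0x803d0014
WARNING: Original error code: 0x80072efe
WARNING: Web service call failed with hr = 80240440.
FATAL: OnCallFailure(hrCall, m_error) failed with hr=0x80240440"""
    for code in error_codes(excerpt):
        print(code, decode_hresult(code))
```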

 

At first I thought this might be related to the “Trusted Sites” list within Internet Explorer. I have mine set through GPO, so I added “https://*.update.microsoft.com” to that GPO and then ran “gpupdate /force”, but the error remained.

 

Then I thought to look at my Sonicwall NSA 2400. We have Application Control enabled, and it has been known to cause strange network connectivity issues where I least expect them, so I’ve started checking there by default.

Unsurprisingly this turned out to be the problem. The strange thing is, the AppControl rule that was blocking the traffic isn’t visible in the list of applications; only through the logging did I find it.
To find it, navigate to the AppControl settings page and use “Lookup Signature” to look up signature #6. Click on the pencil icon to open its settings.

It turns out the rule “Non-SSL Traffic over SSL port” was blocking this Windows Update traffic.

Setting the Block option to Disabled for this rule allows Windows Update to work properly.