Windows Updates failing on Azure VM

While playing around with some VMs in Azure, I ran into an issue where they could not perform Windows Updates. This was first noticed with failing Update deployments through Azure Automation:

To narrow down the issue, I tried to run Windows Update manually from the VM itself. I confirmed that the public Internet was accessible, but still received this error:

There were some problems installing updates, but we'll try again later. If you keep seeing this and want to search the web or contact support for information, this may help: (0x8024402f)

I ran the PowerShell command “Get-WindowsUpdateLog” which populates C:\Windows\WindowsUpdate.log (new behavior in Server 2016), and only found a brief error showing:

Failed to retrieve SLS response data for service
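For reference, here is a minimal sketch of generating the log and pulling out the relevant lines. It assumes an elevated PowerShell session on Server 2016 or later; the output path under the temp folder is just an example, and the search patterns are taken from the errors quoted above.

```powershell
# Assumption: elevated session on Windows Server 2016 or later.
# The output path is a hypothetical choice; -LogPath sets where the merged log is written.
$logPath = Join-Path $env:TEMP 'WindowsUpdate.log'

# Merge the Windows Update ETL traces into a single readable text log.
Get-WindowsUpdateLog -LogPath $logPath

# Show the most recent lines that mention SLS or the error code from the Settings app.
Select-String -Path $logPath -Pattern 'SLS', '0x8024402f' |
    Select-Object -Last 20
```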

It was right around this time that I noticed a popup for memory exhaustion on this VM. The VM size I had chosen included 2GB of RAM.

I did a little test, and looked at Task Manager prior to running Windows Update – 45% of memory used:

Then I clicked “Retry” and saw the memory utilization ramp up to 55%, 75%, 85%, 95% and then the Windows Update process returned an error and Task Manager immediately dropped back down to 45% memory utilization.

It appears that memory exhaustion was causing the Windows Update process to crash out. What I don’t understand is why the page file didn’t come into play and grow to accommodate the memory demand; it was set to System Managed and there was more than enough space on the temporary disk to grow into.
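The same observation can be made from PowerShell instead of Task Manager. The snippet below is only a sketch: it samples physical memory and page file usage once, so it would need to be run (or looped) while Windows Update is retrying to see the ramp-up.

```powershell
# Win32_OperatingSystem reports memory in kilobytes, so dividing by 1KB (1024) gives megabytes.
$os      = Get-CimInstance -ClassName Win32_OperatingSystem
$totalMB = [math]::Round($os.TotalVisibleMemorySize / 1KB)
$freeMB  = [math]::Round($os.FreePhysicalMemory / 1KB)
$usedPct = [math]::Round(100 * (1 - $os.FreePhysicalMemory / $os.TotalVisibleMemorySize))

'{0} MB total, {1} MB free, {2}% used' -f $totalMB, $freeMB, $usedPct

# Page file size and usage (values in MB), to check whether it is actually growing.
Get-CimInstance -ClassName Win32_PageFileUsage |
    Select-Object -Property Name, AllocatedBaseSize, CurrentUsage, PeakUsage
```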

 

After I resized the VM to a size with 4GB of RAM, it updated without issue.
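A resize like this can also be done with the Az PowerShell module. This is a sketch under assumptions: the resource group and VM names are hypothetical, Standard_B2s is just one example of a size with 4 GiB of RAM, and the session is already signed in with Connect-AzAccount.

```powershell
# Assumptions: Az.Compute is installed and Connect-AzAccount has already been run.
# 'rg-test' and 'vm-test' are hypothetical names; Standard_B2s is one example of a 4 GiB size.
$resourceGroup = 'rg-test'
$vmName        = 'vm-test'

$vm = Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName
$vm.HardwareProfile.VmSize = 'Standard_B2s'

# Applying the change restarts a running VM.
Update-AzVM -ResourceGroupName $resourceGroup -VM $vm
```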

Azure Application Gateway through NSG

I’m testing some things with Azure Application Gateway this week, and ran into a problem after trying to lock down a network security group (NSG) to restrict virtual network traffic between subnets and peered VNETs.

Here’s the test layout:

The NSG applied to “sub-clt-test” has a default inbound rule that allows VirtualNetwork traffic on any port. VirtualNetwork is a service tag, and according to Microsoft it includes all subnets within a VNET as well as peered VNETs. In my diagram, this means there’s an “allow all” rule between “vnet-edge” and “vnet-client”. Not ideal.

I created a new rule on the NSG for “sub-clt-test” to deny all HTTP traffic into the subnet, and then the following rules, intended to allow the Application Gateway to communicate with its backend pool targets:

Source                           Destination   Port
10.8.48.6                        Any           80
AzureLoadBalancer (Service Tag)  Any           80

The IP address listed there is the frontend private IP of my Application Gateway, within the subnet 10.8.48.0/24. I added the second rule during testing in case the AzureLoadBalancer service tag included the Application Gateway within its dynamic range.

What I discovered is that this configuration broke the Application Gateway’s ability to communicate with the backend targets. The health probes went unresponsive, and the error suggested that the NSG be reviewed.

I knew that this wasn’t due to outbound restrictions on any of my NSGs, because as soon as I removed the incoming port 80 deny on this subnet, it began functioning again.

I removed the Deny rule, and then installed Wireshark on the backend web server to collect information about which IP was actually making the connection.

I discovered that while the frontend private IP was listed as 10.8.48.6, it was actually 10.8.48.4 and 10.8.48.5 making the connections to the backend pool. The frustrating part is that I couldn’t find any explanation for this behavior in Microsoft documentation – I know that the Application Gateway requires its own subnet, not shared with other resources, but there are no references to the reserved IPs that traffic would be coming from.
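As a lighter-weight alternative to a packet capture, something like the following could show which remote addresses are connecting on port 80. It is only a sketch: it assumes the backend server runs Windows Server 2012 or later (where Get-NetTCPConnection is available), and short-lived probe connections may need to be sampled a few times before they show up.

```powershell
# List remote addresses with TCP connections to local port 80, most frequent first.
# Listening sockets report a remote address of 0.0.0.0 or ::, so those are filtered out.
Get-NetTCPConnection -LocalPort 80 -ErrorAction SilentlyContinue |
    Where-Object { $_.RemoteAddress -notin '0.0.0.0', '::' } |
    Group-Object -Property RemoteAddress |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Name, Count
```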

Since the subnet is effectively reserved for this resource, it’s easy enough to modify my NSG to allow the whole range, but I felt like I was missing something obvious as to why these IP addresses were being used for the connections.

Halfway through writing this post though, I came across this blog post by RoudyBob, with a bit of insight. Each instance of the Application Gateway uses an IP from the subnet assigned, and it is these IPs that will communicate with your backend targets.
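Given that, the allow rule needs to cover the whole Application Gateway subnet rather than just the frontend IP. A sketch of what that could look like with the Az module follows; the resource group, NSG name, rule name and priority are hypothetical, and the priority has to be a lower number than the HTTP deny rule so that it is evaluated first.

```powershell
# Assumptions: Az.Network is installed and Connect-AzAccount has already been run.
# 'rg-test', 'nsg-sub-clt-test', the rule name and the priority are hypothetical values.
$nsg = Get-AzNetworkSecurityGroup -ResourceGroupName 'rg-test' -Name 'nsg-sub-clt-test'

# Allow HTTP from the entire Application Gateway subnet, since each gateway instance
# connects to the backend pool from its own IP within that range.
$nsg | Add-AzNetworkSecurityRuleConfig `
        -Name 'Allow-AppGw-Subnet-HTTP' `
        -Access Allow `
        -Protocol Tcp `
        -Direction Inbound `
        -Priority 100 `
        -SourceAddressPrefix '10.8.48.0/24' `
        -SourcePortRange '*' `
        -DestinationAddressPrefix '*' `
        -DestinationPortRange 80 |
    Set-AzNetworkSecurityGroup
```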

Dell PowerEdge BIOS failed due to IPMI driver

I was updating the BIOS on a couple of Dell PowerEdge R620s today and was presented with an error I hadn’t seen before:

IPMI driver is disabled. Please enable or load the driver and then reboot the system.

This was very odd. A little bit of searching showed that there should be an IPMI driver service running in Windows, so I checked that:

Service 'ipmidrv (IPMIDRV)' cannot be started due to the following error: Cannot start service IPMIDRV on computer '.

Knowing this was a driver, I went to look in Device Manager, and was quite surprised to find this IPMI driver listed:

This is a Dell PowerEdge, so I have no idea how this HP driver got installed – I’m not even convinced it wasn’t there prior to the other driver updates I was performing. In any case, this device wasn’t starting properly, and was preventing the service from starting. I uninstalled both the device and its driver:

Following this, the appropriate device appeared under “System Devices”:

Now I could start the service, and the BIOS update proceeded properly.
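The same checks can be done from PowerShell rather than through the Services and Device Manager consoles. This sketch assumes PowerShell 5 or later and a Windows version that includes the PnpDevice module (Server 2016 / Windows 10 onward); on older systems Device Manager is still the way to go.

```powershell
# Check the state of the IPMI driver service named in the error.
Get-Service -Name ipmidrv | Select-Object -Property Name, Status, StartType

# List present devices whose name mentions IPMI, to spot an unexpected vendor's driver.
# Assumption: the PnpDevice module is available (Server 2016 / Windows 10 or later).
Get-PnpDevice -PresentOnly |
    Where-Object { $_.FriendlyName -match 'IPMI' } |
    Select-Object -Property Status, Class, FriendlyName, InstanceId

# Once the correct device is in place, start the service before retrying the BIOS update.
Start-Service -Name ipmidrv
```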

PowerShell command as scheduled task

Here’s the syntax to use a PowerShell command in a Task Scheduler action, rather than a script:

Program/script:

powershell.exe

Add Arguments:

-noninteractive -executionpolicy bypass -command &{Checkpoint-VM -Name pxetest -SnapshotName 2018-06-23-PreMaintenance}

 

The key here is the ampersand before the command; without it, the command would not run.
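The same task can be created from PowerShell with the ScheduledTasks module instead of the Task Scheduler GUI. This is a sketch only: the task name, trigger time and the choice to run as SYSTEM are hypothetical, and the command is wrapped in double quotes here so the whole script block is passed to -Command as one argument.

```powershell
# The action mirrors the Program/script and Add arguments fields from the GUI.
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NonInteractive -ExecutionPolicy Bypass -Command "& {Checkpoint-VM -Name pxetest -SnapshotName 2018-06-23-PreMaintenance}"'

# Hypothetical one-time trigger; adjust to the actual maintenance window.
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date '2018-06-23 02:00')

# Hypothetical task name; running as SYSTEM with highest privileges is an assumption.
Register-ScheduledTask -TaskName 'PreMaintenance checkpoint' `
    -Action $action -Trigger $trigger -User 'SYSTEM' -RunLevel Highest
```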

 

 

SharePoint library column missing

I’ve been working on migrating documents between document libraries in a SharePoint site, and have randomly struggled with an odd issue.

In the new destination library, I’ve configured a managed metadata column based on a term store, and made it mandatory. This column does not exist on the source library.

For the majority of my documents this has been working well – after moving or copying the document I set the column value with the quick-edit info pane. However, some of the documents have a blank value rather than showing “Required Information”:

When I view the properties of the document on the quick-edit pane, the column doesn’t appear at all!

The root cause is that these problem documents carried over their Content Type with them into the new library, and this content type does not have the managed metadata column added to it. However, it isn’t immediately clear this is what happened because the default setting on this library is to not manage Content Types individually.

To fix this, I went into the advanced library settings, and chose Yes for “Allow management of content types”:

Once I did this, the extra content type appeared in the settings page:

Now when I go back to my document and open the quick-edit pane, there is a new option to select Content Type, and I can set it back to the default:

This then updates the list and puts a “Required Information” block in that column, and allows me to fill it in.
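With many affected documents, the same fix could probably be scripted. The sketch below assumes SharePoint Online and the PnP.PowerShell module, neither of which is confirmed above; the site URL, library name, item ID and the default content type name ('Document') are all hypothetical placeholders, and the sign-in options depend on the PnP.PowerShell version in use.

```powershell
# Assumptions: PnP.PowerShell is installed and the site is SharePoint Online.
# The URL, library name, item ID and content type name are hypothetical.
Connect-PnPOnline -Url 'https://contoso.sharepoint.com/sites/docs' -Interactive

$library = 'Destination Library'

# Equivalent of choosing Yes for "Allow management of content types".
Set-PnPList -Identity $library -EnableContentTypes $true

# List the content types now attached to the library, including any carried over.
Get-PnPContentType -List $library | Select-Object -Property Name, Id

# Set one affected document back to the library's default content type.
Set-PnPListItem -List $library -Identity 42 -ContentType 'Document'
```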