AKS StorageClass for Standard HDD managed disk

Today while exploring the Azure Kubernetes Service docs, specifically looking at Storage, I came across a note about StorageClasses:

You can create a StorageClass for additional needs using kubectl

This, combined with the description of the default StorageClasses for Managed Disks being Premium and Standard SSD, led me to ask, “what if I want a Standard HDD for my pod?”

This is absolutely possible!

First I took a look at the parameters for an existing StorageClass, the ‘managed-csi’:
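One way to view those parameters is to ask kubectl for the StorageClass as YAML. This is a minimal sketch: the `show_storageclass` helper name is made up for illustration, and it assumes kubectl is already authenticated against the AKS cluster.

```shell
# Hypothetical helper (name is illustrative, not from the AKS docs):
# print a StorageClass, including its provisioner and parameters, as YAML.
show_storageclass() {
  kubectl get storageclass "$1" -o yaml
}
```

Calling `show_storageclass managed-csi` should list the `provisioner` and the `skuname` parameter of the built-in class.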

While the example provided in the link above uses the old ‘in-tree’ methods of StorageClasses, this gave me the proper Provisioner value to use the Container Storage Interface (CSI) method.

I created a yaml file with these contents:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-csi-hdd
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  skuname: StandardHDD_LRS

In reality, I took a guess at the “skuname” parameter here, replacing “StandardSSD_LRS” with “StandardHDD_LRS”. Having used Terraform with Managed Disk SKUs before, I figured this wasn’t going to be valid, but I wanted to see what happened.

Then I performed a ‘kubectl apply -f filename.yaml’ to create my StorageClass. This worked without any errors.

To test, I created a PersistentVolumeClaim, and then a simple Pod, with this yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-hdd-disk
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-csi-hdd
  resources:
    requests:
      storage: 5Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: teststorage-pod
spec:
  nodeSelector:
    "kubernetes.io/os": linux
  containers:
    - name: teststorage
      image: mcr.microsoft.com/oss/nginx/nginx:1.15.5-alpine
      volumeMounts:
      - mountPath: "/mnt/azurehdd"
        name: hddvolume
  volumes:
    - name: hddvolume
      persistentVolumeClaim:
        claimName: test-hdd-disk

After applying this with kubectl, my PersistentVolumeClaim was stuck in a Pending state and the Pod wouldn’t start. I looked at the Events of my PersistentVolumeClaim and found an error, as expected:

This is telling me my ‘skuname’ value isn’t valid and instead I should be using a supported type like “Standard_LRS”.
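The status and events of a claim can be pulled with kubectl. A minimal sketch, where `pvc_status` is a hypothetical helper name and the claim name comes from the manifest above:

```shell
# Hypothetical helper: show a PersistentVolumeClaim's status and its events.
pvc_status() {
  kubectl get pvc "$1"        # STATUS column shows Pending or Bound
  kubectl describe pvc "$1"   # the Events section carries provisioning errors
}
```

For this test that would be `pvc_status test-hdd-disk`.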

Using kubectl I deleted my Pod, PersistentVolumeClaim, and StorageClass, modified my yaml, and re-applied.
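That cleanup and re-apply can be sketched as a small function. The resource names match the manifests above; the `reset_and_apply` name and the manifest file names in the usage line are assumptions for illustration.

```shell
# Hypothetical helper: delete the test resources (Pod first, then the claim,
# then the StorageClass) and re-apply whichever manifest files are passed in.
reset_and_apply() {
  kubectl delete pod teststorage-pod
  kubectl delete pvc test-hdd-disk
  kubectl delete storageclass managed-csi-hdd
  for manifest in "$@"; do
    kubectl apply -f "$manifest"
  done
}
```

For example: `reset_and_apply storageclass.yaml pvc-and-pod.yaml`.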

This time, the claim was created successfully, and a persistent volume was dynamically generated. I can see that disk created as the correct type in the Azure Portal listing of disks:

The supported values listed in that error message also tell me I can create ZRS-enabled StorageClasses, but only for Premium and StandardSSD managed disks.
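As an untested sketch of what a ZRS-enabled class might look like, the function below writes a manifest that reuses the same settings with a `StandardSSD_ZRS` skuname. The class name `managed-csi-ssd-zrs` and the helper name are assumptions, and ZRS disks are only offered in some regions, so treat this as a starting point rather than a verified configuration.

```shell
# Hypothetical helper: write a zone-redundant StorageClass manifest to the
# path given as the first argument. The class name is illustrative only.
write_zrs_storageclass() {
  cat > "$1" <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-csi-ssd-zrs
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  skuname: StandardSSD_ZRS
EOF
}
```

Usage would be along the lines of `write_zrs_storageclass zrs.yaml` followed by `kubectl apply -f zrs.yaml`.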

Here’s the properly functioning YAML for the StorageClass, with the skuname fixed:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-csi-hdd
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  skuname: Standard_LRS

 

AKS Windows Node problem after 1.22 upgrade

Here’s a bit of a troubleshooting log as I worked through an experimental cluster in Azure Kubernetes Service (AKS).

As a starting point, my cluster was on K8s version 1.21.4, with one node pool of “system” type on Linux and one node pool of “user” type on Windows.

I performed an upgrade to 1.22.4, upgrading both the cluster and the node pools.

Following this, two issues appeared in the Azure Portal for my node pools:

  1. The Linux node pool only rebuilt one of the Virtual Machine Scale Set (VMSS) instances to run 1.22.4 – the other instance was still running 1.21.4 when I viewed the node list in the Portal or with kubectl.
  2. The Windows node pool displayed a node count of 3, but it also showed “0/0 ready” with NO instances in the node list.

Problem #1 was solved by scaling down the pool to 1 instance, and then scaling back to 2. AKS removed and re-created the VMSS instance properly and it all looked good.
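The same scale-down and scale-up can be done from the Azure CLI. A minimal sketch, assuming the resource group, cluster, and pool names are passed in (the `bounce_nodepool` name is hypothetical, and the counts of 1 and 2 match my pool):

```shell
# Hypothetical helper: scale an AKS node pool down to 1 and back to 2 so the
# stale VMSS instance is removed and re-created, then list node versions.
bounce_nodepool() {
  rg="$1"; cluster="$2"; pool="$3"
  az aks nodepool scale --resource-group "$rg" --cluster-name "$cluster" --name "$pool" --node-count 1
  az aks nodepool scale --resource-group "$rg" --cluster-name "$cluster" --name "$pool" --node-count 2
  kubectl get nodes -o wide   # VERSION column should now match on every node
}
```

For example: `bounce_nodepool my-rg my-aks linuxpool`, with placeholder names.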

Problem #2 was harder. kubectl didn’t see the nodes at all, but I did find the VMSS with the correct number of instances, and they appeared healthy (as far as Virtual Machines go). Scaling operations on the node pool through AKS affected the VMSS properly (even scaling right down to zero), but these actions didn’t resolve the problem of kubectl not knowing the nodes existed.

I’m coming into both AKS and Kubernetes pretty blind and ignorant, so I began looking at how I could get onto the Nodes themselves and dig through some logs.

This Microsoft Doc talks about viewing the kubelet logs, using an SSH connection to your nodes through a debug container. However, this didn’t work for me because I didn’t have the original SSH keys from cluster setup, and even though I reset the Windows Node credentials (az aks update --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --windows-admin-password $NEW_PW), I still received public key errors when attempting to SSH.

Instead, I dropped a new VM into the virtual network with a public IP, and gave myself RDP access to this as a jump host. From here, I could perform RDP directly into my Windows Nodes, as well as SMB access to \\nodeIP\c$.

This let me look at this path: c:\k\kubelet.log

Where I found this error:

E1223 15:50:40.001852 4532 server.go:194] "Failed to validate kubelet flags" err="the DynamicKubeletConfig feature gate must be enabled in order to use the --dynamic-config-dir flag"

Exception calling "RunProcess" with "3" argument(s): "Cannot process request because the process (4532) has exited."
At C:\k\kubeletstart.ps1:266 char:1
+ [RunProcess.exec]::RunProcess($exe, $args, [System.Diagnostics.Proces ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : InvalidOperationException

I also found errors in the file c:\k\kubeproxy.err.log about missing processes azure-vnet.exe and azure-vnet-ipam.exe.

I did a bunch of reading about Troubleshooting Kubernetes Networking on Windows, and ran “hnsdiag list all” through that process, discovering it had zero entries.

At this point, I spun up a new Windows node pool to use as a comparison. Here are a couple of things I found:

  • c:\k\config was missing on my broken node
  • “hnsdiag list all” produced lots of output on a good node, and was virtually empty on my bad node
  • The good node had a lot of extra files in C:\k\ related to azure-vnet and azure-vnet-ipam

I began looking into the error listed above, specifically around “the DynamicKubeletConfig feature gate must be enabled”. My searching led me to this K8s page on dynamic kubelet configuration, which states that it is deprecated as of 1.22.

Now I wanted to find where that feature flag was coming from.

The Kubelet process runs as a service on these Windows nodes:

I wanted to see what executable these were actually running, which you can do with this command:

Get-WmiObject win32_service | ?{$_.Name -like '*kube*'} | select Name, DisplayName, State, PathName

Interesting, it is using NSSM. Luckily I’m familiar with that for running Windows services, and you can inspect the config for a service like this:

.\nssm dump kubelet

OK, so the Kubelet is a PowerShell script: c:\k\kubeletstart.ps1.

I opened that file and started digging. Right away it became apparent where this “DynamicKubeletConfig” flag, passed as an argument to the kubelet service, was coming from.

The first line pulls in ClusterConfiguration from a file, and then on line 35 that is turned into the $KubeletArgList variable:

# Line 1
$Global:ClusterConfiguration = ConvertFrom-Json ((Get-Content "c:\k\kubeclusterconfig.json" -ErrorAction Stop) | out-string)
# Skip a bunch of stuff until line 35:
$KubeletArgList = $Global:ClusterConfiguration.Kubernetes.Kubelet.ConfigArgs # This is the initial list passed in from aks-engine

I can inspect this PowerShell variable and see the flag added there. Comparing the “C:\k\kubeclusterconfig.json” file between my good and bad nodes, I find that this flag is the only difference between the two!

I removed that line and saved the file, and then forced a restart of the Kubelet and KubeProxy services.

It appeared to work! Now kubectl and Azure Portal recognize my node, the C:\k\config file and c:\k\azure-vnet.* files were auto-generated, and my pods started being scheduled properly.

Now my question is, “how come this file didn’t get updated properly to remove the flag, and why did this continue to be an issue every time I scaled a new instance in the VMSS?”.

With one working node, I scaled my node pool to a count of 2. I expected the count to register as 2 but the pool to report “1/1 ready”, with only a single node still listed by ‘kubectl get nodes’. My assumption is that, however this config is stored for the VMSS, editing the file on a single running instance doesn’t update it for all of them.

And that is exactly what happened:

That is the next thread I’ll be pulling on, and will post an update to this when I find out more.

Update – 2022-01-05

I’ve received information from Microsoft Support that this is an internal (non-public) bug, “Nodes failed to register with API server after upgrading to 1.22 AKS version”, that the AKS team is working on. However, I’m told that even after a fix has been rolled out, I will need to create a new node pool to resolve this issue, because the fix won’t be back-ported to existing node pools.


DSC disk resource failing due to defrag

I worked through an interesting problem today occurring with Desired State Configuration tied into Azure Automation.

In this scenario, Azure Virtual Machines are connected to Azure Automation for Desired State Configuration and are configured with a variety of resources. One of them, the “Disk” resource, is failing, although it was working previously.

The PowerShell DSC resource ‘[Disk]EVolume’ with SourceInfo ‘::1208::13::Disk’ threw one or more non-terminating errors while running the Test-TargetResource functionality. These errors are logged to the ETW channel called Microsoft-Windows-DSC/Operational. Refer to this channel for more details.

I need more detail, so let’s see what an interactive run of DSC on the failing virtual machine shows. While I can view the logs located in “C:\Windows\System32\Configuration\ConfigurationStatus”, I found that in this case they don’t reveal any additional detail beyond what the Azure Portal does.

I run DSC interactively with this command:

Invoke-CimMethod -CimSession $env:computername -Name PerformRequiredConfigurationChecks -Namespace root/Microsoft/Windows/DesiredStateConfiguration -Arguments @{Flags=[Uint32]2} -ClassName MSFT_DscLocalConfigurationManager -Verbose

Now we can see the output of this resource in DSC better:

Invoke-CimMethod : Invalid Parameter
Activity ID: {aab6d6cd-1125-4e9c-8c4e-044e7a14ba07}
At line:1 char:1
+ Invoke-CimMethod -CimSession $env:computername -Name PerformRequiredC ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (StorageWMI:) [Invoke-CimMethod], CimException
    + FullyQualifiedErrorId : StorageWMI 5,Get-PartitionSupportedSize,Microsoft.Management.Infrastructure.CimCmdlets.InvokeCimMethodCommand

This isn’t very useful on its own, but the error does lead to an issue logged against the StorageDSC module that is directly related:

https://github.com/dsccommunity/StorageDsc/issues/248

The “out-of-resource” test that ianwalkeruk provides reproduces the error on my system:

$partition = Get-Partition -DriveLetter 'E' | Select-Object -First 1
$partition | Get-PartitionSupportedSize

Running this on my system produces a similar error:

There does happen to be a known issue on the Disk resource in GitHub with Get-PartitionSupportedSize and the defragsvc:

https://github.com/dsccommunity/StorageDsc/wiki/Disk#defragsvc-conflict

Looking at the event logs on my VM, I can see that the nightly defrag from the default scheduled task has been failing:

The volume Websites (E:) was not optimized because an error was encountered: The parameter is incorrect. (0x80070057)

Looking at the docs for Get-PartitionSupportedSize, there is a note that says “This cmdlet starts the “Optimize Drive” (defragsvc) service.”

Based on the timing of events, it appears that defrag hasn’t been able to complete successfully in a long time, because its duration is longer than the DSC refresh interval: when DSC runs and eventually triggers Get-PartitionSupportedSize, it aborts the defrag. Even running this manually, I can see this occur:

The user cancelled the operation. (0x890000006)

At this point, I don’t know what it is about a failed defrag state that is causing Get-PartitionSupportedSize to fail with “Invalid Parameter” – even when defrag isn’t running that cmdlet fails.

However, in one of my systems with this problem, if I ensure that the defrag successfully finishes (by manually running it after each time DSC kills it, making incremental progress), then we can see Get-PartitionSupportedSize all of a sudden succeed!

And following this, DSC now succeeds!

So if you’re seeing “Invalid Parameter” coming from Get-PartitionSupportedSize, make sure you’ve got successful Defrag happening on that volume!

Azure KMS and NSGs

A co-worker recently posed a question to me regarding virtual machines in Azure – “how do they activate Windows?”

The short answer is through KMS – using a KMS key and “kms.core.windows.net:1688”. You can see this on an Azure VM by typing:

slmgr /dlv

The functionality of KMS within Azure isn’t well documented, but there are buried references to things like:

The IP address of the KMS server for the Azure Global cloud is 23.102.135.246. Its DNS name is kms.core.windows.net.

This is a single IP used globally and, per that same doc, KMS “requires that the activation request come from an Azure public IP address.”

 

This leads to another question – “In an environment with deny-by-default outbound rules, how does KMS communicate?”

The short answer here is, “Magic?”

Let’s say I have an NSG attached to a subnet, with the following outbound rules:

There is nothing specifically allowing access to the “kms.core.windows.net” IP address, but there IS a deny rule to the Internet Service Tag, so I would expect this traffic to be denied.

But a Test-NetConnection succeeds!

I check Network Watcher, with the IP Flow Verify tool. It says that this traffic will be denied by my Internet deny rule.

I check the NSG Flow Logs, and surprisingly I see zero references to my traffic on ANY rules! But, if I change my Test-NetConnection to a different port (say 1687), then it does appear as denied:

 

What is happening here?

I have yet to find anything authoritative in Microsoft’s documentation or any related GitHub issue. But based on what has been tested and the existing documents, I think that there is a hidden default NSG rule (likely below priority 100) which is configured to not be visible or logged (even from Network Watcher!) but allows traffic despite my own best efforts to block it.

This is operating effectively the same way that Azure DNS does, I believe – the IP address 168.63.129.16 is always reachable, regardless of the NSG rules you put in place.

There can be exceptions to that statement – there is a Service Tag named AzurePlatformLKM, which can be used to “disable the defaults for licensing” – I believe (but haven’t yet tested) that using this Service Tag in a deny rule would effectively block this traffic on 1688.
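A deny rule using that Service Tag might look like the sketch below. This is untested, like the idea itself: the rule name `DenyAzurePlatformLKM`, the priority of 200, and the helper name are all assumptions, and whether the rule actually blocks activation traffic is exactly the open question.

```shell
# Hypothetical, untested helper: add an outbound deny rule for the
# AzurePlatformLKM service tag on TCP 1688 to an existing NSG.
# Arguments: resource group name, NSG name.
deny_kms_outbound() {
  az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name DenyAzurePlatformLKM \
    --priority 200 \
    --direction Outbound \
    --access Deny \
    --protocol Tcp \
    --destination-address-prefixes AzurePlatformLKM \
    --destination-port-ranges 1688
}
```

After applying it, repeating the Test-NetConnection to port 1688 would show whether the traffic is still allowed.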

Add Maintenance Configuration to Azure Dedicated Host issue

When using Azure Dedicated Hosts, Microsoft offers a way to have more control over maintenance events that occur on the host. This is done through “Maintenance Control” configuration objects.

I’m attempting to test this, and have been receiving failures that I need to investigate.

To start, I created a host group and a host in the Azure Portal. Then I created a maintenance configuration.

Now when I go into the Maintenance Configuration and try to assign my Host, the activity fails:

 

After a few minutes, this shows up in the Host’s activity log:

 

With this detail:

 

I have two assumptions to test at this point:

  1. Maybe something is broken in the portal, because the error indicates it is trying to create an assignment with “dummyname” which doesn’t look right
  2. Perhaps a VM must be assigned to the Dedicated Host before a Maintenance Configuration can be assigned to it

I wanted to test #1 before #2 to reduce the number of variables, and I used Az PowerShell to do so.

First, I Get-AzMaintenanceConfiguration:

Then I create an assignment, using PowerShell splatting:

$AssignmentParams = @{
    ResourceGroupName           = $resourcegroupname
    ResourceParentType          = "hostGroups"
    ResourceParentName          = "eastus2-hostgroup"
    ResourceType                = "hosts"
    ResourceName                = "eastus2-host1"
    ProviderName                = "Microsoft.Compute"
    ConfigurationAssignmentName = "$($config.name)-host1"
    MaintenanceConfigurationId  = $config.Id
    Location                    = $config.location
}

New-AzConfigurationAssignment @AssignmentParams

Surprisingly this seemed to work, with no errors output. I say surprisingly, because I’m sure I tested this with errors last week before I decided to sit down and write a blog post about it!

Performing a Get-AzConfigurationAssignment with a smaller set of parameters in a hashtable returns the assignment that I expect to see:

$GetAssignmentParams = @{
    ResourceGroupName  = $resourcegroupname
    ResourceParentType = "hostGroups"
    ResourceParentName = "eastus2-hostgroup"
    ResourceType       = "hosts"
    ResourceName       = "eastus2-host1"
    ProviderName       = "Microsoft.Compute"
}
Get-AzConfigurationAssignment @GetAssignmentParams


When I look in the Portal, I can see the assignment on the Host:

 

And it also appears when looking at the Maintenance Configuration too: