AKS image pull failed from ProGet

July 2022 Update

After a support case with Inedo and Microsoft, the cause was determined to be Azure AD Application Proxy setting the Content-Length header to 0 on every HEAD request.

Microsoft is aware of this and has it in their backlog, along with a corresponding feedback item:


Original Post

Just solved (kind of) an issue with Azure Kubernetes Service performing an Image Pull from a private container registry provided by Inedo ProGet.

Using AKS 1.22.4 with containerd 1.55, I followed the K8s instructions to pull an image from a private registry: create a secret, then reference it in the YAML manifest.

However, when I apply the manifest, my Pod doesn’t start, ending with an ErrImagePull error.

Performing a “kubectl describe pod [podname]” shows this error:

Failed to pull image "source repo/imagename:tag": rpc error: code = InvalidArgument desc = failed to pull and unpack image "source repo/imagename:tag": unable to fetch descriptor (sha256:hash) which reports content size of zero: invalid argument

There aren’t many other insights online related to this message, but I CAN see that it comes directly from the containerd source code here: containerd/handlers.go at main · containerd/containerd · GitHub
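To picture why a zero size is fatal, here is a minimal Python sketch of the kind of check that produces this message. It is not containerd's actual code (containerd is written in Go); the function names `descriptor_from_head` and `ensure_fetchable` and the sample header values are hypothetical, and the only thing taken from the real error is its wording.

```python
class InvalidDescriptor(ValueError):
    """Raised when a descriptor claims to have no content."""


def descriptor_from_head(digest, headers):
    """Build a minimal descriptor from the headers of a HEAD response."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "digest": digest,
        "mediaType": lowered.get("content-type", ""),
        "size": int(lowered.get("content-length", "0")),
    }


def ensure_fetchable(descriptor):
    """Reject descriptors whose reported size is zero."""
    if descriptor["size"] == 0:
        raise InvalidDescriptor(
            f"unable to fetch descriptor ({descriptor['digest']}) "
            "which reports content size of zero"
        )
    return descriptor


# Hypothetical header values, for illustration only.
good = descriptor_from_head(
    "sha256:hash",
    {
        "Content-Type": "application/vnd.docker.distribution.manifest.v2+json",
        "Content-Length": "1574",
    },
)
proxied = descriptor_from_head("sha256:hash", {"Content-Length": "0"})

ensure_fetchable(good)  # passes silently
try:
    ensure_fetchable(proxied)
except InvalidDescriptor as err:
    print(err)
```

Anything between the client and the registry that reports a length of 0 would trip a check like this, even when the image itself is intact.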

I did find one reference talking about this error and an “Azure proxy”, and my ProGet instance was exposed to the Internet over Azure AD Application Proxy. Even though I could successfully reference and pull other container images from my ProGet instance, I modified my connectivity to be direct through my firewall temporarily – this didn’t resolve the problem.

I spent a little time making sure my K8s “imagePullSecrets” was correct, and again verified that a pull from ProGet succeeded.

On a test machine, I also verified I could perform a ‘docker login’ command against ProGet and a ‘docker pull’, which was successful with this troublesome image.

I used ‘docker image inspect [image name]’ to compare against my working image and broken image, but didn’t find anything conclusive.

I knew this SAME image worked from Azure Container Registry (ACR), so I performed some docker tag and push commands to get the image into my ProGet:

docker tag myrepo.azurecr.io/path/landingpage:tag privaterepo.domain.com/path/landing-page:tag
docker push privaterepo.domain.com/path/landing-page:tag

My thought was, “same image, should work!” but I still received the same problem.


Wanting to look a little deeper, I tried to learn how to see a bit more interaction when my AKS cluster was attempting the image pull. This is when I came across Debugging K8s Nodes with crictl.

Using the knowledge of this tool, as well as Microsoft Docs on Connecting to AKS nodes, I was able to establish a privileged container and gain access to commands against my node with “chroot /host”.

I hit a roadblock trying to use crictl help docs to pull an image, as it continuously gave me a 403 Forbidden error despite proper credentials. But then I learned about “ctr”, which I discovered was already usable on my nodes from my privileged container!

Now we’re in business: the command “ctr image pull” has a --http-dump flag, which gave me a lot more information.

I performed image pulls for my working image and broken image, and noticed the broken one was a LOT more chatty – multiple HTTP requests that seemed to be repeating.

Here are the requests made for the working image:

  1. HEAD /v2/[image path]/[image name]/manifests/[tag]
  2. POST /v2/_auth
  3. GET /v2/_auth?scope=repository%3A[image path]%2F[image name]%3Apull&service=[repo name]
  4. HEAD /v2/[image path]/[image name]/manifests/[tag]

And this is what I saw for the broken image:

  1. HEAD /v2/[image path]/[image name]/manifests/[tag]
  2. POST /v2/_auth
  3. GET /v2/_auth?scope=repository%3A[image path]%2F[image name]%3Apull&service=[repo name]
  4. GET /v2/[image path]/[image name]/manifests/sha256:[hash]
  5. POST /v2/_auth
  6. GET /v2/_auth?scope=repository%3A[image path]%2F[image name]%3Apull&service=[repo name]
  7. GET /v2/[image path]/[image name]/manifests/sha256:[hash]
  8. GET /v2/[image path]/[image name]/blobs/sha256:[hash]

Quite a bit different, and some of the output from request #7 caught my eye – the response contained this header:

INFO[0001] Content-Type: application/vnd.docker.distribution.manifest.v2+json

And this content (truncated):

INFO[0001] "schemaVersion": 2,
INFO[0001] "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
INFO[0001] "config": {
INFO[0001]     "mediaType": "application/vnd.docker.container.image.v1+json",
INFO[0001]     "digest": "sha256:hash1"
INFO[0001] },
INFO[0001] "layers": [
INFO[0001] {
INFO[0001]    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
INFO[0001]    "size": 2814559,
INFO[0001]    "digest": "sha256:hash2"
INFO[0001] },
INFO[0001] {
INFO[0001]    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
INFO[0001]    "size": 7341522,
INFO[0001]    "digest": "sha256:hash3"
INFO[0001] },

I had seen this output before, in the metadata information provided by both ProGet and ACR, so I compared my broken image between ProGet and ACR, and also against my working image in ProGet.

What I found different in my broken image in ProGet was the “size” attribute missing within the config property of the docker manifest!

With the size attribute present (the image in ACR):

  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 8486,
    "digest": "sha256:hash"

With the size attribute missing (the same image in ProGet):

  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:samehash"

Even though I simply re-tagged the ACR image and pushed it into ProGet, that somehow dropped this size attribute.
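Spotting this by eye across several manifests gets tedious, so here is a small Python sketch that flags descriptors with a missing or zero size. It assumes the manifest JSON has already been saved somewhere (for example, the output of “docker manifest inspect”); the helper name `descriptors_without_size` is my own, and the sample manifests are trimmed to the fields shown above.

```python
import json


def descriptors_without_size(manifest):
    """Return the names of descriptors whose "size" is missing or zero."""
    problems = []
    if not manifest.get("config", {}).get("size"):
        problems.append("config")
    for index, layer in enumerate(manifest.get("layers", [])):
        if not layer.get("size"):
            problems.append(f"layers[{index}]")
    return problems


# Trimmed manifests modelled on the excerpts above (digests are placeholders).
acr_manifest = json.loads("""
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 8486,
    "digest": "sha256:hash"
  },
  "layers": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 2814559, "digest": "sha256:hash2"}
  ]
}
""")

proget_manifest = json.loads("""
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:samehash"
  },
  "layers": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 2814559, "digest": "sha256:hash2"}
  ]
}
""")

print(descriptors_without_size(acr_manifest))     # []
print(descriptors_without_size(proget_manifest))  # ['config']
```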

I thought a short term fix would be to manually edit the docker manifest in ProGet to add the size property, but after finding and modifying the file (D:\ProGet\Packages\.docker\F7\manifests\sha256) it hadn’t updated in the ProGet GUI and I didn’t dig into that further.

Instead I re-checked my tests from the docker CLI, with “docker manifest inspect [image name]:[tag]”. What I found was that the ACR image had the size attribute, but somehow the image I re-tagged and pushed to ProGet did not!

So I re-tagged my ACR image again (call it “jeff2”), and then re-pushed to ProGet; this time, the size attribute stayed! I was then able to successfully perform a ‘ctr image pull’ command, and my ‘kubectl apply’ worked!

Here’s what I think happened:

  • First image push (done by teammate) to ProGet occurred over the Internet, behind the Azure AD App Proxy, which stripped the size attribute from the docker manifest
    • Perhaps similar in a way to this bug worked on by the wikimedia team with their varnish proxy?
  • Initial testing, including docker CLI re-tag and push done when ProGet was still behind the Azure AD App Proxy
  • Assume docker client doesn’t care about missing size attribute, but containerd does
  • Assume actual containerd image pull doesn’t mind coming from Azure AD App Proxy (why my other images worked from ProGet originally)
  • After switching to direct HTTPS, pushing an image to ProGet retains the size attribute.

One final test to explain my re-tagging behavior: I reinstated ProGet behind the Azure AD App Proxy, and then re-pushed my “jeff2” tagged image (not a pull from ProGet, just a push). On both my workstation and in ProGet, the size attribute was gone!

  • This confirms my assumption that re-tagging and pushing to ProGet not only dropped the attribute in ProGet, but my local docker image cache too.


For now, the takeaway is: don’t put ProGet behind Azure AD App Proxy! I’ve got a support case open with Inedo, and I’ll relay this information to them.

Populate Azure File Share from DevOps Pipeline

Use case: There’s a set of files/scripts/templates that I want to keep in sync on a set of servers, but only on-demand.

There are a few different ways to solve this, but one pattern I’ve used a few times is to have an Azure DevOps pipeline that populates an Azure File Share, and then a separate script deployed on the servers that can pull in files from the File Share on demand.

The script below is a YAML pipeline for Azure DevOps, that uses an AzurePowerShell task.

The primary issue I had to work around with this (at least using the Azure PowerShell module) is that the cmdlet “Set-AzStorageFileContent” requires the parent directory to exist; it won’t auto-create it. And unfortunately “New-AzStorageDirectory” has the same problem: it doesn’t create directories recursively.

So the PowerShell script below has two sections: first to create all the folders by ensuring each leaf in the path of each distinct folder gets created, and then populating with files.
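The folder step is easier to see in isolation. The Python sketch below is only an illustration of the idea, not part of the pipeline: it expands each relative folder into every level that has to exist, parents first and without duplicates. The function name `directories_to_create` and the sample folder names are hypothetical; the default top-level folder name follows the prefix used for the file paths in the script.

```python
def directories_to_create(relative_folders, top_level="scripts"):
    """Return every directory level needed, parents before children, no duplicates."""
    ordered = []
    seen = set()
    for folder in relative_folders:
        # Accept either slash style and ignore empty segments (files at the root).
        parts = [top_level] + [p for p in folder.replace("\\", "/").split("/") if p]
        for depth in range(1, len(parts) + 1):
            path = "/".join(parts[:depth])
            if path not in seen:
                seen.add(path)
                ordered.append(path)
    return ordered


print(directories_to_create(["sql/tables", "sql/views", ""]))
# ['scripts', 'scripts/sql', 'scripts/sql/tables', 'scripts/sql/views']
```

Each entry in the result corresponds to one directory-creation call, issued in order, so a parent always exists before its child is created.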


variables:
  storageAccountName: "stg123"
  resourcegroupName: "teststorage-rg"
  fileShareName: "firstfileshare"

trigger:
  branches:
    include:
    - main
  paths:
    include: # Only trigger the pipeline on this path in the git repo
    - 'FileTemplates/*'

pool:
  vmImage: 'windows-latest'

steps:
- task: AzurePowerShell@5
  inputs:
    azureSubscription: 'AzureSubConnection' #This is the devops service connection name
    ErrorActionPreference: 'Stop'
    FailOnStandardError: true
    ScriptType: 'inlineScript'
    inline: |
      $accountKey = (Get-AzStorageAccountKey -ResourceGroupName $(resourcegroupName) -Name $(storageAccountName))[0].Value
      $ctx = New-AzStorageContext -StorageAccountName $(storageAccountName) -StorageAccountKey $accountKey
      $s = Get-AzStorageShare $(fileShareName) -Context $ctx
      # We only want to copy a subset of files in the repo, so we'll set our script location to that path
      Set-Location "$(Build.SourcesDirectory)\FileTemplates"
      $CurrentFolder = (Get-Item .).FullName
      $files = Get-ChildItem -Recurse -File

      # Get all the unique folders without filenames (Sort-Object -Unique, since Get-Unique only de-duplicates sorted input)
      $folders = $files.FullName.Substring($CurrentFolder.Length+1).Replace("\","/") | Split-Path -Parent | Sort-Object -Unique

      # Create Folders for every possible path
      foreach ($folder in $folders) {
        if ($folder -ne ""){
          $folderpath = ("scripts\" + $folder).Replace("\","/") # Create a toplevel folder in front of each path to organize within the Azure Share
          $foldersPathLeafs = $folderpath.Split("/")
          if ($foldersPathLeafs.Count -gt 1) {
            foreach ($index in 0..($foldersPathLeafs.Count - 1)) {
              $desiredfolderpath = [string]::Join("/", $foldersPathLeafs[0..$index])
              try {
                # Assumed call: create this level of the path; a level that already exists ends up in the catch below
                New-AzStorageDirectory -Share $s.CloudFileShare -Path $desiredfolderpath -ErrorAction Stop | Out-Null
              }
              catch {
                $message = $_
                Write-Warning "That didn't work: $message"
              }
            }
          }
        }
      }

      # Create each file
      foreach ($file in $files) {
        $path = $file.FullName.Substring($CurrentFolder.Length+1).Replace("\","/")
        $path = "scripts/"+$path # Create a toplevel folder in front of each path to organize within the Azure Share
        Write-Output "Writing: $($file.FullName)"
        try {
          Set-AzStorageFileContent -Share $s.CloudFileShare -Source $file.FullName -Path $path -Force
        }
        catch {
          $message = $_
          Write-Warning "That didn't work: $message"
        }
      }
    azurePowerShellVersion: 'LatestVersion'
  displayName: "Azure Files Storage Copy"

IIS applications and virtual directories with PowerShell

I’m currently building a container on a Windows Server Core base image with IIS. The intention will be to run this within Azure Kubernetes Service (AKS), on Windows node pools.

A very useful resource in understanding the IIS concepts discussed in this post comes from Octopus: https://octopus.com/blog/iis-powershell#recap-iis-theory

One of the challenges I’m working with is the desire to meet both these requirements:

  • Able to always place our application in a consistent and standard path (like c:\app)
  • Need to be able to serve the app behind customizable virtual paths
    • For example, /env/app/webservice or /env/endpoint
    • These virtual paths should be specified at runtime, not in the container build (to reduce the number of unique containers)
    • A unique domain cannot be required for each application

One of the thoughts is that while testing the application locally, I want to be able to reach the application at the root path (i.e. http://localhost:8080/) but when put together in the context of a distributed system, I want to serve this application behind a customizable path.

In AKS, using the ingress-nginx controller, I can use the “rewrite-target” annotation in order to have my ingress represent the virtual path while maintaining the application at the root of IIS in the container. However, this quickly falls down when various applications are used that might have non-relative links for stylesheets and javascript includes.

One idea was to place the application in the root (c:\inetpub\wwwroot) and then add a new Application on my virtual path pointing to the same physical path. However, this caused problems with duplicate web.config being recognized because it was picked up from the physical path at the root Application and my virtual path Application. This could be mitigated in the web.config with the use of “<location inheritInChildApplications="false">” tags, but I also realized I don’t need BOTH requirements to be available at the same time. If a variable virtual path is passed into my container, I don’t need the application served at the root.

With this in mind, I set about creating logic like this:

  1. In the Dockerfile, place the application at c:\app
  2. If the environment variable “Virtual Path” exists
    1. Create an IIS Application pointing at the supplied Virtual Path, with a physical path of c:\app
  3. else
    1. Change the physical path of “Default Web Site” to c:\app

I tested this in the GUI on a Windows Server 2019 test virtual machine, and it appeared to work for my application just fine. However, when I tested using PowerShell (intending to move functional code into my docker run.ps1 script), unexpected errors occurred.

Here’s what I was attempting:

New-WebVirtualDirectory -Name "envtest/app1/webservice" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot"

And here is the error it produced for me:

The view at '~/Views/Home/Index.cshtml' must derive from WebViewPage, or WebViewPage<TModel>.

Interestingly, displaying straight HTML within this virtual path for the Application works just fine – it is only the MVC app that has an error.

The application I’m testing with is a dotnet MVC application, but none of the common solutions to this problem are relevant – the application works just fine at the root of a website, just not when applied under a virtual path.

Using the context from the Octopus link above, I began digging a little deeper and testing, primarily targeting the ApplicationHost.config file located at “C:\windows\system32\inetsrv\Config”.

When I created the path manually in the GUI (creating each virtual subdirectory), which worked, the structure within the Site in this config file looked like this:

<site name="Default Web Site" id="1">
    <application path="/">
        <virtualDirectory path="/" physicalPath="%SystemDrive%\inetpub\wwwroot" />
        <virtualDirectory path="/envtest" physicalPath="%SystemDrive%\inetpub\wwwroot" />
        <virtualDirectory path="/envtest/app1" physicalPath="%SystemDrive%\inetpub\wwwroot" />
    </application>
    <application path="envtest/app1/webservice" applicationPool="DefaultAppPool">
        <virtualDirectory path="/" physicalPath="C:\inetpub\wwwroot" />
    </application>
    <bindings>
        <binding protocol="http" bindingInformation="*:80:" />
    </bindings>
    <logFile logTargetW3C="ETW" />
</site>

However, when I used the PowerShell example above, this is what was generated:

<site name="Default Web Site" id="1">
    <application path="/">
        <virtualDirectory path="/" physicalPath="%SystemDrive%\inetpub\wwwroot" />
    </application>
    <application path="envtest/app1/webservice" applicationPool="DefaultAppPool">
        <virtualDirectory path="/" physicalPath="C:\inetpub\wwwroot" />
    </application>
    <bindings>
        <binding protocol="http" bindingInformation="*:80:" />
    </bindings>
    <logFile logTargetW3C="ETW" />
</site>

It seems clear that while IIS can serve content under the virtual path I created, MVC doesn’t like the missing virtual directories.


When I expanded my manual PowerShell implementation to look like this, then the application began to work without error:

New-WebVirtualDirectory -Name "/envtest" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot"
New-WebVirtualDirectory -Name "/envtest/app1" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot"
New-WebApplication -Name "/envtest/app1/webservice" -PhysicalPath "C:\app\" -Site "Default Web Site" -ApplicationPool "DefaultAppPool"

I could then confirm that my ApplicationHost.config file matched what was created in the GUI.


The last piece of this for me was turning a Virtual Path environment variable that could contain any kind of pathing, into the correct representation of IIS virtual directories and applications.

Here’s an example of how I’m doing that:

# Both branches use the WebAdministration module (the else branch needs its IIS: drive)
Import-Module WebAdministration

if (Test-Path "ENV:VirtualPath") {
    # Trim the start in case a prefix forwardslash was supplied
    $ENV:VirtualPath = $ENV:VirtualPath.TrimStart("/")
    Write-Host "Virtual Path is passed, will configure IIS web application"
    # We have to ensure the Application/VirtualDirectory in IIS gets created properly in the event of multiple elements in the path
    # Otherwise IIS won't serve some applications properly, like ASP.NET MVC sites

    $segments = $ENV:VirtualPath.Split("/")
    # For each item in the Virtual Path, excluding the last leaf.
    # A for loop is used because 0..($segments.Count-2) would count down (0, -1) for a single-element path.
    for ($leaf = 0; $leaf -lt $segments.Count - 1; $leaf++) {
        # Join every element up to and including the current index
        $usepath = [string]::Join("/", $segments[0..$leaf])
        New-WebVirtualDirectory -Name "$usepath" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot" # Don't specify Application, default to root
    }

    # Create Application with the full Virtual Path (making last element effective)
    New-WebApplication -Name "$ENV:VirtualPath" -PhysicalPath "C:\app\" -Site "Default Web Site" -ApplicationPool "DefaultAppPool" # Expect no beginning forward slash
} else {
    # Since no virtual path was passed, we want Default Web Site to point to C:\app
    Set-ItemProperty -Path "IIS:\Sites\Default Web Site" -name "physicalPath" -value "C:\app\"
}
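
The decision logic itself can be checked without IIS. The Python sketch below is just a model of the plan, assuming the same paths as above: it turns an optional virtual path into the ordered list of operations the script would perform. The function name `plan_iis_layout` and the operation labels are hypothetical, not IIS terminology.

```python
def plan_iis_layout(virtual_path, app_root="C:\\app", site_root="C:\\inetpub\\wwwroot"):
    """Return the ordered IIS operations for an optional virtual path."""
    # Drop empty segments so leading or trailing slashes do not matter.
    segments = [s for s in (virtual_path or "").split("/") if s]
    if not segments:
        # No virtual path: serve the app from the root of the site.
        return [("set-site-physical-path", "/", app_root)]
    steps = []
    # Every intermediate level becomes a virtual directory under the root application.
    for depth in range(1, len(segments)):
        steps.append(("virtual-directory", "/".join(segments[:depth]), site_root))
    # The full path becomes the application that points at the app folder.
    steps.append(("application", "/".join(segments), app_root))
    return steps


for step in plan_iis_layout("/envtest/app1/webservice"):
    print(step)
```

A single-element path such as “endpoint” yields only the application step, which is the case the for loop in the script above has to handle without creating stray virtual directories.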


AKS StorageClass for Standard HDD managed disk

Today while exploring the Azure Kubernetes Service docs, specifically looking at Storage, I came across a note about StorageClasses:

You can create a StorageClass for additional needs using kubectl

This combined with the description of the default StorageClasses for Managed Disks being Premium and Standard SSD led me to question “what if I want a Standard HDD for my pod?”

This is absolutely possible!

First I took a look at the parameters for an existing StorageClass, the ‘managed-csi’:

While the example provided in the link above uses the old ‘in-tree’ methods of StorageClasses, this gave me the proper Provisioner value to use the Container Storage Interface (CSI) method.

I created a yaml file with these contents:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-csi-hdd
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  skuname: StandardHDD_LRS

In reality, I took a guess at the “skuname” parameter here, replacing the “StandardSSD_LRS” with “StandardHDD_LRS”. Having used Terraform before with Managed Disk sku’s I figured this wasn’t going to be valid, but I wanted to see what happened.

Then I performed a ‘kubectl apply -f filename.yaml’ to create my StorageClass. This worked without any errors.

To test, I created a PersistentVolumeClaim, and then a simple Pod, with this yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-hdd-disk
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-csi-hdd
  resources:
    requests:
      storage: 5Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: teststorage-pod
spec:
  nodeSelector:
    "kubernetes.io/os": linux
  containers:
    - name: teststorage
      image: mcr.microsoft.com/oss/nginx/nginx:1.15.5-alpine
      volumeMounts:
      - mountPath: "/mnt/azurehdd"
        name: hddvolume
  volumes:
    - name: hddvolume
      persistentVolumeClaim:
        claimName: test-hdd-disk

After applying this with kubectl, my PersistentVolumeClaim was in a Pending state, and the Pod wouldn’t create. I looked at the Events of my PersistentVolumeClaim, and found an error as expected:

This is telling me my ‘skuname’ value isn’t valid and instead I should be using a supported type like “Standard_LRS”.

Using kubectl I deleted my Pod, PersistentVolumeClaim, and StorageClass, modified my yaml, and re-applied.

This time, the claim was created successfully, and a persistent volume was dynamically generated. I can see that disk created as the correct type in the Azure Portal listing of disks:

The Supported Values in that error message also tell me I can create ZRS-enabled StorageClasses, but only for Premium and StandardSSD managed disks.

Here’s the proper functioning yaml for the StorageClass, with the skuname fixed:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-csi-hdd
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  skuname: Standard_LRS
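
Since a bad skuname is only rejected once a claim tries to provision a disk, a quick check before applying can save a round trip. The Python sketch below builds the same StorageClass as a dictionary and validates the SKU first. The set `ASSUMED_DISK_SKUS` is my own assumption about the accepted names, not a copy of the list in the error message, and `build_storage_class` is a hypothetical helper.

```python
import json

# Assumption: managed disk SKU names accepted for "skuname"; verify against the
# supported values reported by your own cluster before relying on this list.
ASSUMED_DISK_SKUS = {
    "Standard_LRS",
    "StandardSSD_LRS",
    "Premium_LRS",
    "UltraSSD_LRS",
    "StandardSSD_ZRS",
    "Premium_ZRS",
}


def build_storage_class(name, skuname):
    """Return a StorageClass manifest as a dict, rejecting unknown SKU names early."""
    if skuname not in ASSUMED_DISK_SKUS:
        raise ValueError(f"unsupported skuname: {skuname}")
    return {
        "kind": "StorageClass",
        "apiVersion": "storage.k8s.io/v1",
        "metadata": {"name": name},
        "provisioner": "disk.csi.azure.com",
        "reclaimPolicy": "Retain",
        "allowVolumeExpansion": True,
        "volumeBindingMode": "WaitForFirstConsumer",
        "parameters": {"skuname": skuname},
    }


print(json.dumps(build_storage_class("managed-csi-hdd", "Standard_LRS"), indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output can be saved to a file and applied with ‘kubectl apply -f’.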


AKS Windows Node problem after 1.22 upgrade

Here’s a bit of a troubleshooting log as I worked through an experimental cluster in Azure Kubernetes Service (AKS).

As a starting point, my cluster was on K8s version 1.21.4, with one node pool of “system” type on Linux, and one node pool of “user” type on Windows.

I performed an upgrade to 1.22.4, upgrading both the cluster and the node pools.

Following this I had 2 issues appear in the Azure Portal for my node pools:

  1. The Linux node pool only rebuilt one of the Virtual Machine Scale Set (VMSS) instances to run 1.22.4 – the other instance was still running 1.21.4 when I viewed the node list in the Portal or with kubectl.
  2. The Windows node pool displayed a node count of 3, but it also showed “0/0 ready” with NO instances in the node list.

Problem #1 was solved by scaling down the pool to 1 instance, and then scaling back to 2. AKS removed and re-created the VMSS instance properly and it all looked good.

Problem #2 was harder – kubectl didn’t see the nodes at all, but I did find the VMSS with the correct number of instances and they appeared healthy (as far as Virtual Machines go). Performing scaling operations on the node pool through AKS affected the VMSS properly (scaling right down to zero even) however these actions didn’t resolve the problem of kubectl not knowing the nodes existed.

I’m coming into both AKS and Kubernetes pretty blind and ignorant, so I began looking at how I could get onto the Nodes themselves and dig through some logs.

This Microsoft Doc talks about viewing the kubelet logs, using an SSH connection to your nodes through a debug container. However, this didn’t work for me because I didn’t have the original SSH keys from cluster setup, and even though I reset the Windows Node credentials (az aks update --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --windows-admin-password $NEW_PW), I still received public key errors when attempting to SSH.

Instead, I dropped a new VM into the virtual network with a public IP, and gave myself RDP access to this as a jump host. From here, I could perform RDP directly into my Windows Nodes, as well as SMB access to \\nodeIP\c$.

This let me look at this path: c:\k\kubelet.log

Where I found this error:

E1223 15:50:40.001852 4532 server.go:194] "Failed to validate kubelet flags" err="the DynamicKubeletConfig feature gate must be enabled in order to use the --dynamic-config-dir flag"

Exception calling "RunProcess" with "3" argument(s): "Cannot process request because the process (4532) has exited."
At C:\k\kubeletstart.ps1:266 char:1
+ [RunProcess.exec]::RunProcess($exe, $args, [System.Diagnostics.Proces ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : InvalidOperationException

I also found errors in the file c:\k\kubeproxy.err.log about missing processes azure-vnet.exe and azure-vnet-ipam.exe.

I did a bunch of reading about Troubleshooting Kubernetes Networking on Windows, and ran “hnsdiag list all” through that process, discovering it had zero entries.

At this point, I spun up a new Windows node pool to use as a comparison. Here are a couple of things I found:

  • c:\k\config was missing on my broken node
  • “hnsdiag list all” produced lots of output on a good node, and virtually empty on my bad node
  • The good node had a lot of extra files in C:\k\ related to azure-vnet and azure-vnet-ipam

I began looking into the error listed above, specifically around “the DynamicKubeletConfig feature gate must be enabled”. My searching led me to this K8s page on dynamic kubelet configuration, which states that the feature is deprecated as of 1.22.

Now I wanted to find where that feature flag was coming from.

The Kubelet process runs as a service on these Windows nodes:

I wanted to see what executable these were actually running, which you can do with this command:

Get-WmiObject win32_service | ?{$_.Name -like '*kube*'} | select Name, DisplayName, State, PathName

Interesting, it is using NSSM. Luckily I’m familiar with that for running Windows services, and you can inspect the config for a service like this:

.\nssm dump kubelet

Ok so the Kubelet is a Powershell script: c:\k\kubeletstart.ps1.

I opened that file and started digging. Right away it became apparent where the “DynamicKubeletConfig”-related argument on the kubelet service was coming from.

The first line pulls in ClusterConfiguration from a file, and then on line 35 that is turned into $KubeletArgList variable:

# Line 1
$Global:ClusterConfiguration = ConvertFrom-Json ((Get-Content "c:\k\kubeclusterconfig.json" -ErrorAction Stop) | out-string)
# Skip a bunch of stuff until line 35:
$KubeletArgList = $Global:ClusterConfiguration.Kubernetes.Kubelet.ConfigArgs # This is the initial list passed in from aks-engine


I can inspect this PowerShell variable and see the flag added there. Next I compared the “C:\k\kubeclusterconfig.json” file between my good and bad nodes, and found that this flag is the only difference between the two!

I removed that line and saved the file, and then forced a restart of the Kubelet and KubeProxy services.
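
I made the edit by hand, but the change is simple enough to sketch in Python for anyone who wants to script it. Only the Kubernetes.Kubelet.ConfigArgs path comes from kubeletstart.ps1; the assumption that ConfigArgs is a list of “--flag=value” strings, the sample values, and the helper name `drop_kubelet_flag` are mine.

```python
import json


def drop_kubelet_flag(cluster_config, flag_name="--dynamic-config-dir"):
    """Remove one flag from Kubernetes.Kubelet.ConfigArgs in place; return what was removed."""
    kubelet = cluster_config["Kubernetes"]["Kubelet"]
    removed = [arg for arg in kubelet["ConfigArgs"] if arg.split("=", 1)[0] == flag_name]
    kubelet["ConfigArgs"] = [
        arg for arg in kubelet["ConfigArgs"] if arg.split("=", 1)[0] != flag_name
    ]
    return removed


# Hypothetical excerpt; a real kubeclusterconfig.json contains far more than this.
sample = json.loads("""
{
  "Kubernetes": {
    "Kubelet": {
      "ConfigArgs": [
        "--dynamic-config-dir=/var/lib/kubelet",
        "--node-status-update-frequency=10s"
      ]
    }
  }
}
""")

removed = drop_kubelet_flag(sample)
print(removed)
print(sample["Kubernetes"]["Kubelet"]["ConfigArgs"])
```

As with the manual edit, a change like this only affects the one instance where the file lives, and the Kubelet service still needs a restart afterwards.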

It appeared to work! Now kubectl and Azure Portal recognize my node, the C:\k\config file and c:\k\azure-vnet.* files were auto-generated, and my pods started being scheduled properly.

Now my question is, “how come this file didn’t get updated properly to remove the flag, and why did this continue to be an issue every time I scaled a new instance in the VMSS?”.

With 1 working node, I scaled my node pool to a count of 2. What I expected was that the count would recognize as 2 but it would say “1/1 ready” with only a single node still listed from ‘kubectl get nodes’. I am assuming that however this config is stored for the VMSS, editing the file on a single running instance doesn’t update it for all of them.

And that is exactly what happened:

That is the next thread I’ll be pulling on, and will post an update to this when I find out more.

Update – 2022-01-05

I’ve received information from Microsoft Support that this is an internal (non-public) bug, “Nodes failed to register with API server after upgrading to 1.22 AKS version”, that the AKS team is working on. However, I’m told that even after a fix has been rolled out, I will need to create a new node pool to resolve this issue, because the fix won’t be back-ported to current node pools.