AKS image pull failed from ProGet

Just solved (kind of) an issue with Azure Kubernetes Service performing an Image Pull from a private container registry provided by Inedo ProGet.

Using AKS 1.22.4 with containderd 1.55, I’ve followed the K8s instructions to pull an image from a private registry with the creation of a secret which is then referenced in the yaml manifest.

However, when I apply the manifest, my Pod doesn’t start, ending with an ErrImagePull error.

Performing a “kubectl describe pod [podname]” shows this error:

Failed to pull image "source repo/imagename:tag": rpc error: code = InvalidArgument desc = failed to pull and unpack image "source repo/imagename:tag": unable to fetch descriptor (sha256:hash) which reports content size of zero: invalid argument

Not a lot of other insights online related to this message, but I CAN see that it comes directly from containerd source code here: containerd/handlers.go at main · containerd/containerd · GitHub

I did find one reference talking about this error and an “Azure proxy”, and my ProGet instance was exposed to the Internet over Azure AD Application Proxy. Even though I could successfully reference and pull other container images from my ProGet instance, I modified my connectivity to be direct through my firewall temporarily – this didn’t resolve the problem.

I spent a little bit of time making sure my K8s “imagePullSecrets” was correct – again was able to verify successful pull from ProGet.

On a test machine, I also verified I could perform a ‘docker login’ command against ProGet and a ‘docker pull’, which was successful with this troublesome image.

I used ‘docker image inspect [image name]’ to compare against my working image and broken image, but didn’t find anything conclusive.

I knew this SAME image worked from Azure Container Registry (ACR), so I performed some docker tag and push commands to get the image into my ProGet:

docker tag myrepo.azurecr.io/path/landingpage:tag privaterepo.domain.com/path/landing-page:tag
docker push privaterepo.domain.com/path/landing-page:tag

My thought was, “same image, should work!” but I still received the same problem.

 

Wanting to look a little deeper, I tried to learn how to see a bit more interaction when my AKS cluster was attempting the image pull. This is when I came across Debugging K8s Nodes with crictl.

Using the knowledge of this tool, as well as Microsoft Docs on Connecting to AKS nodes, I was able to establish a privileged container and gain access to commands against my node with “chroot /host”.

I hit a roadblock trying to use crictl help docs to pull an image, as it continuously gave me a 403 Forbidden error despite proper credentials. But then I learned about “ctr”, which I discovered was already usable on my nodes from my privileged container!

Now we’re in business – the command “ctr image pull” has a flag for –http-dump which gave me a lot more information

I performed image pulls for my working image and broken image, and noticed the broken one was a LOT more chatty – multiple HTTP requests that seemed to be repeating.

Here’s the requests made for the working image:

  1. HEAD /v2/[image path]/[image name]/manifests/[tag]
  2. POST /v2/_auth
  3. GET /v2/_auth?scope=repository%3A[image path]%2F[image name]%3Apull&service=[repo name]
  4. HEAD /v2/[image path]/[image name]/manifests/[tag]

And this is what I saw for the broken image:

  1. HEAD /v2/[image path]/[image name]/manifests/[tag]
  2. POST /v2/_auth
  3. GET /v2/_auth?scope=repository%3A[image path]%2F[image name]%3Apull&service=[repo name]
  4. GET /v2/[image path]/[image name]/manifests/sha256:[hash]
  5. POST /v2/_auth
  6. GET /v2/_auth?scope=repository%3A[image path]%2F[image name]%3Apull&service=[repo name]
  7. GET /v2/[image path]/[image name]/manifests/sha256:[hash]
  8. GET /v2/[image path]/[image name]/blobs/sha256:[hash]
Quite a bit different, and some of the output from request #7 caught my eye – the response contained this header:
INFO[0001] Content-Type: application/vnd.docker.distribution.manifest.v2+json

And this content (truncated):

INFO[0001] "schemaVersion": 2,
INFO[0001] "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
INFO[0001] "config": {
INFO[0001]     "mediaType": "application/vnd.docker.container.image.v1+json",
INFO[0001]     "digest": "sha256:hash1"
INFO[0001] },
INFO[0001] "layers": [
INFO[0001] {
INFO[0001]    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
INFO[0001]    "size": 2814559,
INFO[0001]    "digest": "sha256:hash2"
INFO[0001] },
INFO[0001] {
INFO[0001]    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
INFO[0001]    "size": 7341522,
INFO[0001]    "digest": "sha256:hash3"
INFO[0001] },

I had seen this output before, in the metadata information provided by both ProGet, and ACR, so I did a comparison of my broken image between ProGet and ACR, and also my working image in ProGet.

What I found different in my broken image in ProGet was the “Size” attribute missing within the config property of the docker manifest!

ACR:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 8486,
    "digest": "sha256:hash"
  },
}

 

ProGet:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:samehash"
  },
}

Even though I simply re-tagged the ACR image and pushed it into ProGet, that somehow dropped this size attribute.

I thought a short term fix would be to manually edit the docker manifest in ProGet to add the size property, but after finding and modifying the file (D:\ProGet\Packages\.docker\F7\manifests\sha256) it hadn’t updated in the ProGet GUI and I didn’t dig into that further.

Instead I re-checked my tests from the docker CLI, with “docker manifest inspect [image name]:[tag]”. What I found was that the ACR image had the size attribute, but somehow the image I re-tagged and pushed to ProGet did not!

So I re-tagged my ACR image again (call it “jeff2”), and then re-pushed to ProGet; this time, the size attribute stayed! I was then able to successfully perform a ‘ctr image pull’ command, and my ‘kubectl apply’ worked!

Here’s what I think happened:

  • First image push (done by teammate) to ProGet occurred over the Internet, behind the Azure AD App Proxy, which stripped the size attribute from the docker manifest
    • Perhaps similar in a way to this bug worked on by the wikimedia team with their varnish proxy?
  • Initial testing, including docker CLI re-tag and push done when ProGet was still behind the Azure AD App Proxy
  • Assume docker client doesn’t care about missing size attribute, but containerd does
  • Assume actual containerd image pull doesn’t mind coming from Azure AD App Proxy (why my other images worked from ProGet originally)
  • After switching to direct HTTPS, pushing an image to ProGet retains the size attribute.

One final test to explain my re-tagging behavior. I re-instated ProGet behind the Azure AD App Proxy, and then re-pushed my “jeff2” tagged image (not a pull from ProGet, just a push) and in both my workstation, and in ProGet, the size attribute was gone!

  • This confirms my assumption that re-tagging and pushing to ProGet not only dropped the attribute in ProGet, but my local docker image cache too.

 

For now, takeaway is don’t put ProGet behind Azure AD App Proxy! I’ve got a support case open with Inedo that I’ll relay this information to.

IIS applications and virtual directories with PowerShell

I’m currently building a container on a Windows Server Core base image with IIS. The intention will be to run this within Azure Kubernetes Service (AKS), on Windows node pools.

A very useful resource in understanding the IIS concepts discussed in this post comes from Octopus: https://octopus.com/blog/iis-powershell#recap-iis-theory

One of the challenges I’m working with is the desire to meet both these requirements:

  • Able to always place our application in a consistent and standard path (like c:\app)
  • Need to be able to serve the app behind customizable virtual paths
    • For example, /env/app/webservice or /env/endpoint
    • These virtual paths should be specified at runtime, not in the container build (to reduce the number of unique containers)
    • A unique domain cannot be required for each application

One of the thoughts is that while testing the application locally, I want to be able to reach the application at the root path (i.e. http://localhost:8080/) but when put together in the context of a distributed system, I want to serve this application behind a customizable path.

In AKS, using the ingress-nginx controller, I can use the “rewrite-target” annotation in order to have my ingress represent the virtual path while maintaining the application at the root of IIS in the container. However, this quickly falls down when various applications are used that might have non-relative links for stylesheets and javascript includes.

One idea was to place the application in the root (c:\inetpub\wwwroot) and then add a new Application on my virtual path pointing to the same physical path. However, this caused problems with duplicate web.config being recognized because it was picked up from the physical path at the root Application and my virtual path Application. This could be mitigated in the web.config with the use of “<location inheritInChildApplications=”false”>” tags, but I also realized I don’t need BOTH requirements to be available at the same time. If a variable virtual path is passed into my container, I don’t need the application served at the root.

With this in mind, I set about creating logic like this:

  1. In the Dockerfile, place the application at c:\app
  2. If the environment variable “Virtual Path” exists
    1. Create an IIS Application pointing at the supplied Virtual Path, with a physical path of c:\app
  3. else
    1. Change the physical path of “Default Web Site” to c:\app

I tested this in the GUI on a Windows Server 2019 test virtual machine, and it appeared to work for my application just fine. However, when I tested using PowerShell (intending to move functional code into my docker run.ps1 script), unexpected errors occurred.

Here’s what I was attempting:

New-WebVirtualDirectory -Name "envtest/app1/webservice" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot"

And here is the error it produced for me:

The view at ‘~/Views/Home/Index.cshtml’ must derive from WebViewPage, or WebViewPage

Interestingly, displaying straight HTML within this virtual path for the Application works just fine – it is only the MVC app that has an error.

The application I’m testing with is a dotnet MVC application, but none of the common solutions to this problem are relevant – the application works just fine at the root of a website, just not when applied under a virtual path.

Using the context from the Octopus link above, I began digging a little deeper and testing. Primarily targeting the ApplicationHost.config file located at “C:\windows\system32\inetsrv\Config”.

When I manually created my pathing in the GUI that was successful (creating each virtual subdir), the structure within the Site in this config file looked like this:

<site name="Default Web Site" id="1">
    <application path="/">
        <virtualDirectory path="/" physicalPath="%SystemDrive%\inetpub\wwwroot" />
		<virtualDirectory path="/envtest" physicalPath="%SystemDrive%\inetpub\wwwroot" />
		<virtualDirectory path="/envtest/app1" physicalPath="%SystemDrive%\inetpub\wwwroot" />
    </application>
    <application path="envtest/app1/webservice" applicationPool="DefaultAppPool">
        <virtualDirectory path="/" physicalPath="C:\inetpub\wwwroot" />
    </application>
    <bindings>
        <binding protocol="http" bindingInformation="*:80:" />
    </bindings>
    <logFile logTargetW3C="ETW" />
</site>

However, when I used the PowerShell example above, this is what was generated:

<site name="Default Web Site" id="1">
    <application path="/">
        <virtualDirectory path="/" physicalPath="%SystemDrive%\inetpub\wwwroot" />
    </application>
    <application path="envtest/app1/webservice" applicationPool="DefaultAppPool">
        <virtualDirectory path="/" physicalPath="C:\inetpub\wwwroot" />
    </application>
    <bindings>
        <binding protocol="http" bindingInformation="*:80:" />
    </bindings>
    <logFile logTargetW3C="ETW" />
</site>

It seems clear that while IIS can serve content under the virtual path I created, MVC doesn’t like the missing virtual directories.

 

When I expanded my manual PowerShell implementation to look like this, then the application began to work without error:

New-WebVirtualDirectory -Name "/envtest" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot"
New-WebVirtualDirectory -Name "/envtest/app1" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot"
New-WebApplication -Name "/envtest/app1/webservice" -PhysicalPath "C:\app\" -Site "Default Web Site" -ApplicationPool "DefaultAppPool"

I could then confirm that my ApplicationHost.config file matched what was created in the GUI.

 

The last piece of this for me was turning a Virtual Path environment variable that could contain any kind of pathing, into the correct representation of IIS virtual directories and applications.

Here’s an example of how I’m doing that:

if (Test-Path "ENV:VirtualPath")
{
    # Trim the start in case a prefix forwardslash was supplied
    $ENV:VirtualPath = $ENV:VirtualPath.TrimStart("/")
    Write-Host "Virtual Path is passed, will configure IIS web application"
    # We have to ensure the Application/VirtualDirectory in IIS gets created properly in the event of multiple elements in the path
    # Otherwise IIS won't serve some applications properly, like ASP.NET MVC sites

    Import-Module WebAdministration
    # for each item in the Virtual Path, excluding the last Leaf
    foreach ($leaf in 0..($ENV:VirtualPath.Split("/").Count-2)) { # minus 1 for 0-based counting, minus 2 for dropping the last leaf
        if ($leaf -eq 0){
            # Check and see if we're the first index of the VirtualPath, and if so just use it
            $usepath = $ENV:VirtualPath.Split("/")[$leaf]
        } else {
            # If not first index, go through all previous index and concat
            $usepath = [string]::Join("/",$ENV:VirtualPath.Split("/")[0..$leaf])
        }
        New-WebVirtualDirectory -Name "$usepath" -Site "Default Web Site" -PhysicalPath "C:\inetpub\wwwroot" # Don't specify Application, default to root
    }

    # Create Application with the full Virtual Path (making last element effective)
    New-WebApplication -Name "$ENV:VirtualPath" -PhysicalPath "C:\app\" -Site "Default Web Site" -ApplicationPool "DefaultAppPool" # Expect no beginning forward slash
} else {
    # Since no virtual path was passed, we want Default Web Site to point to C:\app
    Set-ItemProperty -Path "IIS:\Sites\Default Web Site" -name "physicalPath" -value "C:\app\"
}

 

AzCopy with Packer out of memory

One of my Packer builds for a Windows image is using AzCopy to download files from Azure blob storage. In some circumstances I’ve  had issues where the AzCopy “copy” command fails with a Go error, like this:

2022/01/06 10:00:02 ui:     hyperv-vmcx: Job e1fcf7c7-f32e-d247-79aa-376ef5d49bd6 has started
2022/01/06 10:00:02 ui:     hyperv-vmcx: Log file is located at: C:\Users\cxadmin\.azcopy\e1fcf7c7-f32e-d247-79aa-376ef5d49bd6.log
2022/01/06 10:00:02 ui:     hyperv-vmcx:
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime: VirtualAlloc of 8388608 bytes failed with errno=1455
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: fatal error: out of memory
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx:
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime stack:
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime.throw(0xbeac4b, 0xd)
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: 	/opt/hostedtoolcache/go/1.16.0/x64/src/runtime/panic.go:1117 +0x79
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime.sysUsed(0xc023d94000, 0x800000)
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: 	/opt/hostedtoolcache/go/1.16.0/x64/src/runtime/mem_windows.go:83 +0x22e
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime.(*mheap).allocSpan(0x136f960, 0x400, 0xc000040100, 0xc000eb9b00)
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: 	/opt/hostedtoolcache/go/1.16.0/x64/src/runtime/mheap.go:1271 +0x3b1
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime.(*mheap).alloc.func1()
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: 	/opt/hostedtoolcache/go/1.16.0/x64/src/runtime/mheap.go:910 +0x5f
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime.systemstack(0x0)
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: 	/opt/hostedtoolcache/go/1.16.0/x64/src/runtime/asm_amd64.s:379 +0x6b
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: runtime.mstart()
2022/01/06 10:00:06 ui error: ==> hyperv-vmcx: 	/opt/hostedtoolcache/go/1.16.0/x64/src/runtime/proc.go:1246

Notice the “fatal error: out of memory” there.

I had already set the AzCopy environment variable AZCOPY_BUFFER_GB to 1GB, and I also increased my pagefile size (knowing Windows doesn’t always grow it upon demand reliably) but these didn’t improve it.

Then I stumbled upon this GitHub issue from tomconte: https://github.com/Azure/azure-storage-azcopy/issues/781

I added this into my Packer build before AzCopy gets called, and it seems to have resolved my problem.

VMM stuck logical network on NIC

I ran into an issue while setting up a new Hyper-V server in System Center VMM yesterday. I’m using a Switch Independent team on the server, and while I configured it on the host first, I started going down the path of setting up the networking using VMM components, like uplink port profiles and logical switches. At one point I decided to revert back to my original configuration, but I found that my new logical switch had a dependency on my host, despite removing all visible configuration.

The only thing odd about this host that I could see was that on the NIC used for the Hyper-V virtual switch, it was pinned to a logical network and greyed out; I couldn’t remove it:

This is a server in a cluster, and the second host didn’t exhibit the same problem. I decided to check off a logical network on a different NIC, and then hit the “View Script” button to see the PowerShell that VMM generated, to try and reverse engineer what was happening.

The PowerShell used the cmdlets “Get-SCVMHost”, “Get-SCVMHostNetworkAdapter”, and “Set-SCVMHostNetworkAdapter”. After following those through, I ended up with a command to remove the logical network from my NIC:

Set-SCVMHostNetworkAdapter -VMHostNetworkAdapter $vmhostnetworkadapter -RemoveLogicalNetwork $logicalNetwork

However, this produced an error:

Set-SCVMHostNetworkAdapter : The selected host adapter ‘Intel(R) Ethernet 10G 4P X520/I350 rNDC #2$$$Microsoft:{F17CF86F-A125-4EE7-9DB3-0777D9935BA4}’ has an uplink
port profile set configured with network sites, so logical networks, IP subnets, or VLANs cannot be directly modified on the host network adapter. (Error ID: 25234)

When I viewed the dependency on the Uplink Port Profile, it displayed my server name; but I couldn’t see this uplink port profile anywhere in the GUI for the server.

Reviewing the docs on the “Set-SCVMHostNetworkAdapter” led me to another switch: RemoveUplinkPortProfileSet

Here’s the full PowerShell I used to remove this, which allowed me to remove my logical switch and uplink port profile.

$vmHost = Get-SCVMHost -Computername "hostname"
#Get-SCVMHostNetworkAdapter -VMHost $vmHost | select name, connectionName # Find the NIC by connectionName, so I can use the real name in the next command
$vmHostNetworkAdapter = Get-SCVMHostNetworkAdapter -VMHost $vmHost -name "Intel(R) Ethernet 10G 2P X520 Adapter"
$logicalNetwork = Get-SCLogicalNetwork -Name "logical network name"
Set-SCVMHostNetworkAdapter -VMHostNetworkAdapter $vmhostnetworkadapter -RemoveUplinkPortProfileSet

 

docker-compose environment variables and quotes

Today I am learning about using docker-compose to run a simple dotnet core Blazor server app, and I hit a snag.

For various reasons I won’t detail right now, I want my docker container to serve my app up over HTTPS, and this requires a bit of extra configuration for dotnet core.

After producing a certificate, I managed to get my container running with a a “docker run”, like this:

docker run --rm -p 44381:443 -e ASPNETCORE_HTTPS_PORT=44381 -e ASPNETCORE_URLS="https://+;http://+" -e Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx -e Kestrel__Certificates__Default__Password=password -v $env:USERPROFILE\.aspnet\https:/https/ samplewebapp-blazor
No problems, I could hit https://localhost:44381 and it all worked great.
However, that’s messy and I wanted to experiment with docker-compose yml files to clean it up a bit. I produced this:
version: "3.8"
services:
  web:
    image: samplewebapp-blazor
    ports:
      - "44381:443"
    environment:
      - COMPOSE_CONVERT_WINDOWS_PATHS=1
      - ASPNETCORE_HTTPS_PORT=44381
      - ASPNETCORE_URLS="https://+;http://+"
      - Kestrel__Certificates__Default__Password="password"
      - Kestrel__Certificates__Default__Path="/https/aspnetapp.pfx"
    volumes:
      - "/c/Users/jeff.miles/.aspnet/https:/https/"
Then, I run “docker-compose up”. However, instead of success, I saw errors!
crit: Microsoft.AspNetCore.Server.Kestrel[0]
web_1  |       Unable to start Kestrel.
web_1  | Interop+Crypto+OpenSslCryptographicException: error:2006D080:BIO routines:BIO_new_file:no such file

My first thought was, “That’s got to be referring to the certificate – I must not have the volume syntax correct, and it isn’t mounted”. So I messed around with a bunch of different ways of specifying the local mount point, investigated edge cases with WSL2 and Docker Desktop, and wasted about 45 minutes with no results.

So I tagged in my buddy Matthew for his insight, and his first suggestion was “is it actually mounted?” In order to check, I had to get the container to run with docker-compose, so I commented out the environment variables for ASPNETCORE_URLS, and the Kestral values. This allowed the container to run, although I couldn’t actually hit the web app.

Then I was able to do: “docker exec -it containername bash”

Using this I could browse the filesystem, and verify the volume was mounted and the certificate was present.

Within that bash prompt, I manually set the environment variables, and then re-ran dotnet with the same entrypoint command as what builds my docker image. Surprisingly, the application loaded up successfully!

This tells me the volume is good, but something’s wrong with the passed-in variables.

First, I tried taking the quotes off the value of the Kestrel__Certificates__Default__Path variable. But then docker-compose gave me this error:

web_1  | crit: Microsoft.AspNetCore.Server.Kestrel[0]
web_1  |       Unable to start Kestrel.
web_1  | System.InvalidOperationException: Unrecognized scheme in server address '"https://+""'. Only 'http://' is supported.

I decided to remove all quotes from all environment variables (as a shot in the dark), and again surprisingly, it worked!

A bit of internet sleuthing later, and Matthew had produced this GitHub issue as explanation of what was going on.

Because I was wrapping the environment variables in quotes, they were actually getting injected into the container with quotes!

Here’s the end result of my compose file:

version: "3.8"
services:
  web:
    image: samplewebapp-blazor
    ports:
      - "44381:443"
    environment:
      - COMPOSE_CONVERT_WINDOWS_PATHS=1
      - ASPNETCORE_HTTPS_PORT=44381
      - ASPNETCORE_URLS=https://+;http://+
      - Kestrel__Certificates__Default__Password=password
      - Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx
    volumes:
      - "/c/Users/jeff.miles/.aspnet/https:/https/"

It looks like as of docker-compose 1.26 (out now) that if you need quotes around environment variable values, you should use a .env file, which will work properly.