Azure Availability Zones and latency testing

While working on a design for virtual machine placement in Azure, I got to wondering about the specifics of Availability Zones and the potential performance impact of not actually choosing one. My findings below are partly conjecture at this point, since I have not found direct confirmation from Microsoft on the topic.

Availability Zones are a method within Azure to provide resiliency for resources by using multiple datacenters within a region.

Resources within Azure fall into one of 3 types in relation to these zones:

  • Zonal services – a resource is pinned to a specific zone (for example, virtual machines, managed disks, Standard IP addresses).
  • Zone-redundant services – the Azure platform replicates automatically across zones (for example, zone-redundant storage, SQL Database).
  • None – not actually documented (yet?), but this is what you get when you have a Zonal service and do not select a zone.

The last item there is of particular interest – if you don’t select a zone for a Zonal service, where does it go? This issue from Microsoft Docs describes an “allocator” that works behind the scenes to decide on zone placement, but that decision is never surfaced to you; it is not even available in the Azure Resource Explorer.

For example, here’s a snippet of the metadata available for a VM with a specific Zone placement:

And here’s one without any at all:
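As a rough illustration of the difference between the two, the Python sketch below checks an exported VM resource for a zone assignment. It assumes the Resource Manager JSON shape in which a zonal VM carries a top-level `zones` array; the sample resources and the `zone_of` helper are hypothetical stand-ins, not output from a real subscription.

```python
from typing import Optional


def zone_of(vm_resource: dict) -> Optional[str]:
    """Return the zone a VM is pinned to, or None if no zone was selected."""
    zones = vm_resource.get("zones") or []
    return zones[0] if zones else None


# Hypothetical, trimmed-down resources for illustration only
zonal_vm = {"name": "vm-zonal", "location": "eastus2", "zones": ["1"]}
regional_vm = {"name": "vm-nozone", "location": "eastus2"}

for vm in (zonal_vm, regional_vm):
    print(vm["name"], "->", zone_of(vm) or "no zone recorded")
```

A `None` result here only means that no zone was recorded on the resource; it says nothing about where the allocator physically placed the VM.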

This led to some questions for me:

  1. Am I losing performance (higher latency) by not setting my VMs in the same zone (if they happen to be placed in separate zones by the “allocator”)?
  2. Will I be charged for bandwidth between zones when billing begins on July 1, 2021 for it, if my VMs don’t have a zone selected but get placed in separate zones?

I’ve asked #2 in an Issue on the doc, and hopefully will receive an answer. I set out to test #1 within EastUS2.

Starting with Microsoft’s recommendation for latency testing on a virtual network, I downloaded the “latte.exe” tool and spun up some VMs.

The advantage of this tool, according to Microsoft, is:

latte.exe (for Windows) can isolate and measure network latency while excluding other types of latency, such as application latency.

Other common connectivity tools, such as Ping … employ the Internet Control Message Protocol (ICMP), which can be treated differently from application traffic and whose results might not apply to workloads that use TCP and UDP.

The output of this tool looks like this, and it is the Latency value we’re after:

While running multiple tests on idle VMs, I found a discrepancy of ~20-30 us (microseconds) between otherwise identical runs, so take that into account when viewing the results below.

Here are some of the results that I found:

Test | Result (us)
2 VMs, same availability zone, accelerated networking is false | 340
2 VMs, same availability zone, accelerated networking is true | 169
2 VMs, different availability zone, accelerated networking is false | 397
2 VMs, different availability zone, accelerated networking is true | 150
2 VMs, no availability zone selected, accelerated networking is false | 427
2 VMs, no availability zone selected, accelerated networking is true | 144
2 VMs, same availability zone, accelerated networking is true, proximity placement group aligned | 158

 

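It doesn’t seem right, but the conclusion that I draw from this is that, at least in EastUS2 and with accelerated networking enabled, latency between availability zones is functionally equivalent to latency within a zone, and even to latency within a proximity placement group, which is supposed to reduce latency even further. Without accelerated networking, the cross-zone and no-zone results were 57-87 us higher than the same-zone result, which is more than the run-to-run variation.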
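To put the table next to the ~20-30 us run-to-run variation, here is a minimal Python sketch that flags which differences exceed that noise. The figures are copied from the results above; the 30 us threshold and the `within_noise` helper are assumptions chosen for illustration and are not part of latte.exe.

```python
# Latency figures (microseconds) copied from the results above,
# keyed by (placement, accelerated networking enabled)
results_us = {
    ("same zone", False): 340,
    ("same zone", True): 169,
    ("different zones", False): 397,
    ("different zones", True): 150,
    ("no zone selected", False): 427,
    ("no zone selected", True): 144,
}

NOISE_US = 30  # assumed threshold: upper end of the reported run-to-run variation


def within_noise(a_us: int, b_us: int, noise_us: int = NOISE_US) -> bool:
    """True when two measurements differ by no more than the assumed noise."""
    return abs(a_us - b_us) <= noise_us


for accelerated in (False, True):
    baseline = results_us[("same zone", accelerated)]
    for placement in ("different zones", "no zone selected"):
        value = results_us[(placement, accelerated)]
        verdict = "within noise" if within_noise(baseline, value) else "outside noise"
        print(f"accelerated={accelerated}, {placement}: {value - baseline:+d} us ({verdict})")
```

With these figures, the accelerated-networking pairs land within 30 us of the same-zone result, while the pairs without accelerated networking differ by more than that. A single run per configuration is still a thin basis for any firm conclusion.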

I don’t have a good explanation for these results yet. Perhaps my testing is flawed in some way, or perhaps this is specific to EastUS2 and the differences are larger in other regions where the datacenters are further apart, or where each zone consists of more datacenters.

 

 

 

Migrate Azure Managed Disk between regions

This post is a reference for moving an Azure Managed Disk between regions. It is based on this Microsoft Docs article.

There are older posts that describe using Az PowerShell to export the disk to blob storage (as a vhd file) and then import it in the target region.

This procedure skips those intermediate steps, and uses AzCopy to do it directly.

If there’s no active data changing on the disk, you could consider taking a snapshot and then producing a new Managed Disk from your snapshot to perform the steps below – this way you wouldn’t need to turn off the VM using the Disk. However, the situations where this might be viable are probably rare.

You will need AzCopy and the Az PowerShell module, both of which are used by the script below.

First, shut down and deallocate your VM.

Then open a PowerShell terminal and connect to your Azure subscription.

Then populate this script, and execute each command in sequence.

# Name of the Managed Disk you are starting with
$sourceDiskName = "testweb1_c"
# Name of the resource group the source disk resides in
$sourceRG = "test-centralus-rg"
# Name you want the destination disk to have
$targetDiskName = "testweb1_c"
# Name of the resource group to create the destination disk in
$targetRG = "test-eastus2-rg"
# Azure region the target disk will be in
$targetLocate = "EastUS2"

# Gather properties of the source disk
$sourceDisk = Get-AzDisk -ResourceGroupName $sourceRG -DiskName $sourceDiskName

# Create the target disk config, adding 512 bytes to the size for the VHD footer, and using the 'Upload' create option
# If this is an OS disk, add this property: -OsType $sourceDisk.OsType
$targetDiskconfig = New-AzDiskConfig -SkuName 'Premium_LRS' -UploadSizeInBytes $($sourceDisk.DiskSizeBytes+512) -Location $targetLocate -CreateOption 'Upload'

# Create the target disk (empty)
$targetDisk = New-AzDisk -ResourceGroupName $targetRG -DiskName $targetDiskName -Disk $targetDiskconfig

# Get a SAS token for the source disk, so that AzCopy can read it
$sourceDiskSas = Grant-AzDiskAccess -ResourceGroupName $sourceRG -DiskName $sourceDiskName -DurationInSecond 86400 -Access 'Read'

# Get a SAS token for the target disk, so that AzCopy can write to it
$targetDiskSas = Grant-AzDiskAccess -ResourceGroupName $targetRG -DiskName $targetDiskName -DurationInSecond 86400 -Access 'Write'

# Begin the copy!
.\azcopy copy $sourceDiskSas.AccessSAS $targetDiskSas.AccessSAS --blob-type PageBlob

# Revoke the source SAS so that the source disk can be attached to a VM again
Revoke-AzDiskAccess -ResourceGroupName $sourceRG -DiskName $sourceDiskName

# Revoke the target SAS so that the new disk can be attached to a VM
Revoke-AzDiskAccess -ResourceGroupName $targetRG -DiskName $targetDiskName
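The size arithmetic in the script can be sanity-checked on its own. The Python sketch below mirrors the `DiskSizeBytes+512` expression, on the understanding that the extra 512 bytes account for the VHD footer; the `upload_size_bytes` helper and the 256 GiB example size are illustrative assumptions.

```python
VHD_FOOTER_BYTES = 512  # the upload size must include the 512-byte VHD footer


def upload_size_bytes(disk_size_bytes: int) -> int:
    """Value to pass as UploadSizeInBytes for a disk of the given size."""
    return disk_size_bytes + VHD_FOOTER_BYTES


disk_size = 256 * 1024**3  # hypothetical 256 GiB source disk
print(upload_size_bytes(disk_size))
```

If the footer is left out, the target disk would be sized for the data alone, which is why the script adds the offset before creating the empty disk.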

 

When you get to the AzCopy step, you should see results something like this:

In my experience, the transfer will go as fast as the rated throughput of your managed disk allows. The screenshot above was from a Premium P15 disk (256 GB) rated at 125 MBps (or ~1 Gbps).
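For planning purposes, that throughput figure can be turned into a rough copy-time estimate. This is a back-of-the-envelope Python sketch only: it assumes the whole provisioned size is transferred, that the disk's rated throughput is the bottleneck, and that "MBps" means decimal megabytes per second. Both helper names are hypothetical.

```python
def mbps_to_gbps(throughput_mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return throughput_mb_per_s * 8 / 1000


def transfer_minutes(disk_gib: float, throughput_mb_per_s: float) -> float:
    """Approximate copy time in minutes when disk throughput is the bottleneck."""
    size_mb = disk_gib * 1024**3 / 1_000_000  # GiB to decimal megabytes
    return size_mb / throughput_mb_per_s / 60


print(f"{mbps_to_gbps(125):.1f} Gbps")
print(f"{transfer_minutes(256, 125):.0f} minutes")
```

For the P15 example this works out to about 1 Gbps and roughly 37 minutes, which is a useful figure when deciding how long the VM needs to stay deallocated.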

 

Azure WAF Policy and Application Gateway limitation

Today I encountered a concerning product limitation of the Azure Application Gateway and Web Application Firewall (WAF) Policies.

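Some background first – when working with an Application Gateway v2 sku, you can apply a WAF in 2 different ways: through the WAF configuration built into the gateway itself, or through a separate WAF Policy object that is associated with the gateway.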

Microsoft’s documentation appears to be updated to display a preference for using the WAF Policy object, including a scripted method for converting to it: Migrate Web Application Firewall policies using Azure PowerShell

I moved to using a WAF Policy because I wanted to use a series of Terraform local variables to supply WAF rule configuration (exclusions) to both Application Gateway and Azure FrontDoor WAF Policies, without duplicating code.

But there is a severe limitation that you may not have noticed in the docs:

You might think, “that’s not a problem, I shouldn’t ever need to disassociate my application gateway” but here’s where it gets wild.

Let’s say that you decide you under-sized your Application Gateway, and want to increase the maximum scale units, or set it to auto-scale. You go to the Configuration blade, and modify the setting, and then hit “save”. You will see this error:

Uh-oh.

You can try to disassociate the policy in the Portal, but you’ll just see this:

What if we try to disassociate the Policy in PowerShell?

"Firewall policy cannot be removed from Application Gateway, changing from one firewall policy to another is permitted."

This is the limitation – once you’ve applied a WAF Policy, the only way to make a configuration change against the Application Gateway is to destroy it and re-create it. This is absolutely crazy, and means I will not deploy another WAF Policy object until it is resolved. Generally speaking one wouldn’t expect to be changing the AppGw configuration often, but being stuck like this is not a good place to be in.

Others have talked about this, such as on ServerFault with a suggestion to shut down the AppGw (not effective), or in this issue on azure-cli describing the limitation. To date I haven’t found any viable workaround.

 

Azure B-Series CPU Credit workbook in Azure Monitor

Today I produced a workbook for Azure Monitor that can help watch CPU Credit utilization for Azure B-Series VMs. If you’re using this VM Sku, you want to be aware of trends in your compute usage to avoid credit exhaustion that would force your VM to operate at the baseline performance level instead of being able to burst above.

Once you import this into Azure Monitor, you will have a very easy way to view CPU credits remaining, consumed, and CPU usage for all virtual machines in a subscription.

To get started, grab the gallery JSON of the workbook from here: CPU Credits Remaining.workbook

  • This is currently in a branch of my own fork of the Azure Monitor Community repository – this will be updated when my pull request is approved and merged

In the Azure Portal, navigate to Azure Monitor -> Workbooks:

Select “New” from the top menu, and then click the advanced editor to be able to paste the JSON code:

Then click the “Apply” button and you should see the Workbook load. Make sure to save this and give it a name!

You should see a workbook that gives you the capability to filter by subscription and set the metric time-range, along with a display for each VM. The trendlines may not show much graphically at 1-hour intervals, but change it to 12 or 24 hours and you’ll begin to see more movement (assuming variable CPU usage).

This workbook doesn’t filter out VMs that are NOT B-series VMs, so blank areas in the table belong to VMs that don’t have credits at all.
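If you want a number to go with the trendline, the remaining-credit samples can be extrapolated to a rough time-to-exhaustion. The Python sketch below is a simple linear estimate over hypothetical hourly samples of the “CPU Credits Remaining” metric; the `hours_until_exhausted` helper is an assumption for illustration and is not something the workbook itself calculates.

```python
from typing import Optional, Sequence


def hours_until_exhausted(credits_remaining: Sequence[float], interval_hours: float) -> Optional[float]:
    """Linear estimate of hours until credits reach zero; None if credits are not falling."""
    if len(credits_remaining) < 2:
        return None
    elapsed = interval_hours * (len(credits_remaining) - 1)
    burn_per_hour = (credits_remaining[0] - credits_remaining[-1]) / elapsed
    if burn_per_hour <= 0:
        return None
    return credits_remaining[-1] / burn_per_hour


# Hypothetical hourly samples, oldest first
samples = [120.0, 110.0, 100.0, 90.0]
print(hours_until_exhausted(samples, interval_hours=1.0))
```

Real workloads are rarely this linear, so treat the result as an early warning rather than a prediction.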

 

HTTPS for dotnet Blazor container and Azure AD authentication

As I delve further into working with some new technologies, like docker and Blazor, I keep adding new use-cases that I want to address.

Today’s is Azure AD OAuth authentication for single-sign-on. But in order to do this properly, I want to add HTTPS support.

This post will go over what I found I had to do to run a local docker container with HTTPS and Azure AD login.

I’m assuming that the basics of Docker and working with dotnet core are understood.

dotnet Blazor test project

We start with a dotnet Blazor project. We can pull a template from the dotnet templates using a command like this:

dotnet new blazorserver -o BlazorApp --no-https -f netcoreapp3.1

We can see that it has produced an application from the template for us:

After moving into the BlazorApp directory, you can perform a “dotnet run” and hit the presented URL in the browser to confirm the site is working.

 

Now we’re going to build a dockerfile, based on an example provided by Microsoft. You will need Docker Desktop as a prerequisite.

The dockerfile below came from a different Microsoft example that I can no longer find a reference to. Place this dockerfile inside the BlazorApp directory.

### ------ Base ------ ###
# Base contains only the .NET Core runtime
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
WORKDIR /app
EXPOSE 80

### ------ Build ------ ###
# Build stage uses an image with the .NET Core SDK
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build

# Sets the working directory, so any actions taken are relative to here
WORKDIR /src

# Copy the csproj file; COPY takes source then destination, and we use . because we want it in our WORKDIR
COPY BlazorApp.csproj .

# dotnet restore uses nuget to install the dependencies and tools specified in the project
RUN dotnet restore "BlazorApp.csproj"

# Now we copy everything from source (where this dockerfile is) to the WORKDIR
COPY . .

# dotnet build - builds the project and its dependencies into a set of binaries
RUN dotnet build "BlazorApp.csproj" -c Release -o /app/build
    # Here we're using "release" configuration - can specify multiple in the dotnet project
    # We don't technically need to do this, because dotnet publish will do the same


### ------ Publish ------ ###
FROM build AS publish
# dotnet publish will compile the application, and put the resulting set of files into a directory
RUN dotnet publish "BlazorApp.csproj" -c Release -o /app/publish


### ------ Final ------ ###
# Take the original lightweight base image as our source
FROM base AS final
WORKDIR /app
# Specify the context of our source, which is the Publish stage of the docker build, and the folder /app/publish, and put it in the WORKDIR
COPY --from=publish /app/publish .

# This is the instruction that tells the image how to start the process it will run for us
ENTRYPOINT ["dotnet", "BlazorApp.dll"]

Now we build the dockerfile with this:

docker build -t "blazorapp" .
    # -t sets the name (tag) we give the image
    # The . tells docker to look for the dockerfile in the current directory

This produces a docker image, which we can see from the docker cli with “docker image ls”:

To run the container, we’ll use:

docker run --rm -p 44381:80 blazorapp

We expose port 80 from the container (in the dockerfile) and map it to host port 44381, and then test that we can hit this from a browser on my local workstation:

 

Add Azure AD authentication

I haven’t found an easy 100-level intro to integrating Azure AD authentication into an existing project. Instead, we can create a new application from a template, supplying additional command line switches to pre-create a project with this enabled.

Before we do that, you will need to create an App Registration in your Azure AD. This guide is simple to follow to do so. Note that on step 6, where you supply the redirect URIs, they must use the port that your “docker run” command is using (i.e. 44381). Also note that the URL entered there is HTTPS – we’ll get to that.

Once your app registration is created, you can use properties from it (on the Overview page) to create a new blazor app:

dotnet new blazorserver -o BlazorApp --no-https -f netcoreapp3.1 --auth SingleOrg --client-id "Enter_the_Application_Id_here" --tenant-id "yourTenantId"

Make sure you modify the appsettings.json file to include your domain name from Azure AD.

Now when you try to run your application using “docker run”, you’ll get an error:

This is because the reply URL sent by the application is HTTP, but your App Registration is configured for HTTPS only.
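The mismatch is easy to reason about once the two URLs are compared piece by piece. The Python sketch below does that comparison; the URLs are hypothetical, the `/signin-oidc` path is assumed to be the callback path the template configures, and `redirect_uri_problems` is an illustrative helper rather than anything Azure AD exposes.

```python
from urllib.parse import urlsplit


def redirect_uri_problems(requested: str, registered: str) -> list:
    """List the ways a requested redirect URI differs from a registered one."""
    req, reg = urlsplit(requested), urlsplit(registered)
    problems = []
    if req.scheme != reg.scheme:
        problems.append(f"scheme is {req.scheme}, registration expects {reg.scheme}")
    if req.hostname != reg.hostname or req.port != reg.port:
        problems.append(f"host is {req.netloc}, registration expects {reg.netloc}")
    if req.path != reg.path:
        problems.append(f"path is {req.path}, registration expects {reg.path}")
    return problems


# Hypothetical values: the app served over plain HTTP versus an HTTPS-only registration
requested = "http://localhost:44381/signin-oidc"
registered = "https://localhost:44381/signin-oidc"
print(redirect_uri_problems(requested, registered))
```

In this scenario only the scheme differs, so serving the same host and port over TLS is enough to make the two match.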

So let’s add in TLS to get that HTTPS URL.

TLS support

There is a way to get HTTPS enabled within the container, using certificate references and configuration in the application. But in my mind this seems to make the application less portable between environments and hosting methods (local development, running in a container, in Azure App Service, etc).

Instead, I’m looking at using the sidecar pattern. This uses a paired container that serves up the HTTPS, and reverse proxies web requests to the application. Here’s an example describing the process (although this is for Azure Container Instances).

To work with this locally, we will use Docker Compose, to coordinate multiple containers that can talk to each other.

There are a few things to prepare. First, we need to generate a certificate, which will then be injected into the sidecar container at runtime. For this we follow the instructions from the Azure Container Instances example linked above.

openssl req -new -newkey rsa:2048 -nodes -keyout ssl.key -out ssl.csr
# Follow the prompts to add the identification information. For Common Name, enter the hostname associated with the certificate. When prompted for a password, press Enter without typing, to skip adding a password.
openssl x509 -req -days 365 -in ssl.csr -signkey ssl.key -out ssl.crt

I placed these certificate files in a subfolder named “tls_sidecar”.

Next we need a configuration file for nginx, which we will use as the application inside our sidecar container.

# nginx Configuration File
# https://wiki.nginx.org/Configuration

# Run as a less privileged user for security reasons.
user nginx;

worker_processes auto;

events {
    worker_connections 1024;
}

pid        /var/run/nginx.pid;

http {
    proxy_buffer_size   128k;
    proxy_buffers   4 256k;
    proxy_busy_buffers_size   256k;
    large_client_header_buffers 4 16k;
    #Redirect to https, using 307 instead of 301 to preserve post data

    server {
        listen [::]:443 ssl;
        listen 443 ssl;

        server_name localhost;

        # Protect against the BEAST attack by not using SSLv3 at all. If you need to support older browsers (IE6) you may need to add
        # SSLv3 to the list of protocols below.
        ssl_protocols              TLSv1.2;

        # Ciphers set to best allow protection from Beast, while providing forwarding secrecy, as defined by Mozilla - https://wiki.mozilla.org/Security/Server_Side_TLS#Nginx
        ssl_ciphers                ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:ECDHE-RSA-RC4-SHA:ECDHE-ECDSA-RC4-SHA:AES128:AES256:RC4-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!3DES:!MD5:!PSK;
        ssl_prefer_server_ciphers  on;

        # Optimize TLS/SSL by caching session parameters for 10 minutes. This cuts down on the number of expensive TLS/SSL handshakes.
        # The handshake is the most CPU-intensive operation, and by default it is re-negotiated on every new/parallel connection.
        # By enabling a cache (of type "shared between all Nginx workers"), we tell the client to re-use the already negotiated state.
        # Further optimization can be achieved by raising keepalive_timeout, but that shouldn't be done unless you serve primarily HTTPS.
        ssl_session_cache    shared:SSL:10m; # a 1mb cache can hold about 4000 sessions, so we can hold 40000 sessions
        ssl_session_timeout  24h;


        # Use a higher keepalive timeout to reduce the need for repeated handshakes
        keepalive_timeout 300; # up from 75 secs default

        # remember the certificate for a year and automatically connect to HTTPS
        add_header Strict-Transport-Security 'max-age=31536000; includeSubDomains';

        ssl_certificate      /etc/nginx/ssl.crt;
        ssl_certificate_key  /etc/nginx/ssl.key;

        location / {
            proxy_pass http://web:80; # this uses the service name from docker compose

            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection keep-alive;
            proxy_set_header Host $http_host;
            proxy_cache_bypass $http_upgrade;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Proto $scheme;
        }
    }
}

I found that there were some specific nginx settings I needed to set in order to get this fully working which are included in my conf above (but I didn’t need the fastcgi entries from that link).

You can also see that the proxy_pass directive is referencing a name of “web” – this maps to the service name that we’ll be using in docker-compose.

This nginx.conf file is also placed into the “tls_sidecar” subfolder.

There are a couple of changes required to be made to the dotnet Core application itself as well, based on this link for placing it behind a reverse proxy.

In Startup.cs, I added 2 new namespaces at the top of the file:

using System.Net;
using Microsoft.AspNetCore.HttpOverrides;

Then within the “ConfigureServices” method, we add options for dealing with the ForwardedFor headers:

services.Configure<ForwardedHeadersOptions>(options =>
            {
                options.ForwardedHeaders =
                    ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
                options.KnownNetworks.Add(new IPNetwork(IPAddress.Parse("10.0.0.0"), 8));
                options.KnownNetworks.Add(new IPNetwork(IPAddress.Parse("172.16.0.0"), 12));
                options.KnownNetworks.Add(new IPNetwork(IPAddress.Parse("192.168.0.0"), 16));
            });

So that it looks like this:


We also add another line in the “Configure” method: app.UseForwardedHeaders();
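The reason for registering those three ranges is that the forwarded headers middleware only honours headers sent from proxies it knows about, and the nginx sidecar will have a private address on the Docker network. The Python sketch below checks that idea with the standard `ipaddress` module; the sample addresses are hypothetical, and `is_trusted_proxy` only mimics the range check rather than reproducing ASP.NET Core's implementation.

```python
import ipaddress

# The three private ranges registered as KnownNetworks in Startup.cs
KNOWN_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_trusted_proxy(address: str) -> bool:
    """True when the address falls inside one of the known networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in KNOWN_NETWORKS)


# Hypothetical addresses: a container on a Docker bridge network, and a public IP
print(is_trusted_proxy("172.18.0.3"))
print(is_trusted_proxy("203.0.113.10"))
```

Trusting every private range is convenient for local development; in a shared environment it would be safer to narrow this to the proxy's actual network.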

Lastly, we create a docker-compose file – this will define 2 services, and allow Docker to build private networking between them.

version: "3.8"

services:
  web:
    image: blazorapp:latest
  nginx:
    image: nginx:latest
    container_name: nginx_with_ssl
    volumes:
      - ./tls_sidecar/nginx.conf:/etc/nginx/nginx.conf
      - ./tls_sidecar/ssl.crt:/etc/nginx/ssl.crt
      - ./tls_sidecar/ssl.key:/etc/nginx/ssl.key
    ports:
      - "44381:443"

Here in this YAML file we are mounting our nginx.conf and SSL certificate files as volumes, in a one-to-one fashion (rather than directories). This will allow the nginx container to use these files from source control without having to build a custom container image with them inside.

Now that we’re prepared, the final steps are to re-run our “docker build” step from above (since we made application changes) and then run Docker Compose, with this:

docker-compose up

Then try to hit your application at https://localhost:44381 (you’ll likely get a cert warning for a name mismatch).

We did it!