Azure Site Recovery learnings – Azure-To-Azure

A few things I’ve learned about Azure Site Recovery (ASR) recently, while doing an Azure-to-Azure DR design – some quite surprising:

  • Both your Recovery Services Vault AND it’s resource group must be located in a different region than the source
  • You need a cache storage account in the source region, however the staged incremental data will have negligible cost according to Microsoft
  • For source VMs utilizing Managed Disks, ASR will create destination Managed Disks and you will be charged for provisioned storage size, not consumed size. This differs from using unmanaged disks or for on-premises ASR, where consumed size is stored as page blobs in the destination storage account. I can’t find a Microsoft Doc link that specifically outlines this.
  • Egress bandwidth compression is estimated at about 60%, according to this blog post: Know exactly how much it will cost for enabling DR to your Azure VMs
  • VM Extensions are not replicated to a failover VM, and need to be manually installed: Doc Link
  • Secondary IP addresses are not replicated! These will need to be re-added through a Post-Failover task: Doc Link

 

Some things I still need to research and test are:

  • What happens if you perform a failover (or test failover) to a VM that is reporting into Log Analytics and Azure Automation (DSC, Update Management)? Will it seamlessly continue these operations, even when the VM extension no longer exists?
  • What happens to Azure Backup? Will the test failover VM using the Azure Backup Agent try to send backup data cross-region to the source Recovery Services Vault if the schedule time is triggered?

Azure NSG Update issue with PowerShell

Today I’m working on creating and updating Azure Network Security Group rules from the Az PowerShell module. However I hit a bit of a bump.

I’m trying to do something simple like add 1 rule to an existing NSG that has other rules:

# Find the security group where it matches a variable
$nsgs = Get-AzNetworkSecurityGroup -ResourceGroupName $rgname | where-object { $_.Name -like "$nsgsuffix" }
# For the selected security group, add a rule, and then apply it
$nsgs | add-aznetworksecurityruleconfig -Access Allow -DestinationAddressPrefix $ips -DestinationPortRange 443 `
 -Direction Outbound -name allowOut-To3rdParty_443 -Priority 400 `
-SourceAddressPrefix * -SourcePortRange * -Protocol * | Set-AzNetworkSecurityGroup

This NSG and other rules were previously deployed through TerraForm, and I want to add an “out-of-band” rule that isn’t tracked in the Terraform state.

When I run these commands, I get this error:

Required security rule parameters are missing for security rule with Id:
Security rule must specify either DestinationPortRange or DestinationPortRanges.

That’s super strange; when I look at that rule in the Portal, it has ports listed:

So I check my $nsgs variable that was previously populated, looking specifically for the SecurityRules property:

$nsgs.SecurityRules

What I find is that for any rule that the portal displays having multiple ports, it lists them just fine (see Destination Port Range property below):

Name                                 : test_test.test
Id                                   : /subscriptions/1/resourceGroups/rg/providers/Microsoft.Network
/networkSecurityGroups/db-nsg/securityRules/test_test.test
Etag                                 : W/"3a98e199-3bf6-4308-806c-8d84bc03723e"
ProvisioningState                    : Succeeded
Description                          :
Protocol                             : *
SourcePortRange                      : {*}
DestinationPortRange                 : {123, 456, 789}
SourceAddressPrefix                  : {10.8.1.}
DestinationAddressPrefix             : {10.8.1.}
SourceApplicationSecurityGroups      : []
DestinationApplicationSecurityGroups : []
Access                               : Allow
Priority                             : 109
Direction                            : Inbound

But for any rule that only has a SINGLE port shown in the portal, it is blank for that property:

Name                                 : web_.db
Id                                   : /subscriptions/1/resourceGroups/rg/providers/Microsoft.Network
/networkSecurityGroups/db-nsg/securityRules/web_.db
Etag                                 : W/"3a98e199-3bf6-4308-806c-8d84bc03723e"
ProvisioningState                    : Succeeded
Description                          :
Protocol                             : *
SourcePortRange                      : {*}
DestinationPortRange                 : {}
SourceAddressPrefix                  : {10.0.0.1, 10.0.0.2}
DestinationAddressPrefix             : {10.1.1.100}
SourceApplicationSecurityGroups      : []
DestinationApplicationSecurityGroups : []
Access                               : Allow
Priority                             : 107
Direction                            : Inbound

My first thought was “how is this possible”?

I think this is because for these single port rules, we’re using the following Terraform syntax:

destination_port_ranges    = ["${var.port-https}"]

Terraform is expecting a list, but we’re only passing a single value. Terraform doesn’t consider this invalid, and it is applied to Azure successfully. However the flaw is revealed when trying to use the “Set-AzNetworkSecurityGroup” cmdlet because that attempts to re-validate ALL rules on the NSG, not just the one that I added or modified.

If I change that Terraform property to destination_port_range (note the singular) then everything appears to work properly when using Az PowerShell afterwards.

Get Azure VM Uptime – sorta

Let’s say you have a few virtual machines in Azure, that should only be running for a limited period of time. You want to ensure that they are stopped after that period of time, but you aren’t the one responsible for the service running on the VMs, and as such, want to empower the owner(s) with both the responsibility and capability to determine when that period of time is finished; perhaps with just a bit of nudging…

Thinking about this scenario had me determine, “what if I could find out when a VM uptime exceeds a set value (say 14 days)” and notify someone that “hey, maybe you forgot to turn this off”.

Turns out there isn’t an accessible “uptime” property for an Azure VM. However, there is a time stamp property from the last state change, and that’s what we can use to enable this logic.

The key is to use “Get-AzVM” with the -Status switch. When you run Get-AzVM, you can either return the “model view” or the “instance view” of a collection of virtual machines, or an individual VM. The Microsoft Doc page for this cmdlet describes using the -Status switch to provide the “instance view” which is where we get the properties we’re after.

If you run it for a collection of VMs you can retrieve the PowerState property:

For an individual VM, you get the “Statuses” property, which contains a collection of items:

The first item within the “Statuses” collection is the “ProvisioningState”, which displays (from what I can gather) the most recent provisioning action taken on the VM (starting it, stopping/deallocating it), along with a Time property. The second item is the current PowerState of the VM.

Using this information, I’ve built the PowerShell below to grab all VMs within a subscription, and for each of them evaluate the status against a point-in-time, outputting results for those which are running and where the successful provisioning exceeds the age of that point-in-time.

Select-AzSubscription 
$comparedate = (get-date).AddDays(-14)
$rg = "resourcegroup"
#Get the Instance view of a collection of virtual machines (returns the PowerState property)
$vms = get-azvm -status -resourcegroup $rg

#Iterate through the collection
foreach ($vm in $vms)
{
	# only check if the VM is running, because if it's off we don't care
	if ($vm.powerstate -ceq "VM running")
	{
		# Get the instance view of a single virtual machine (returns the "statuses" object)
		$foundvm = get-azvm -resourcegroup $vm.ResourceGroupName -name $vm.Name -status
		    #$foundvm.Statuses.Time
                # check if time since it was provisioned (in Statuses[0]) is greater than a value
		if ($foundvm.Statuses.Time -le $comparedate)
		{
			write-output "$($foundvm.name) : running longer than 14 days"
		}
	}
}

Using this PowerShell, you could modify the results to populate an array, and email it to a recipient. This could be placed into an Azure Automation runbook and ran on a schedule.

Run script inside Azure VM from PowerShell

Today I was working on a way to initialize and format disks inside an Azure VM from PowerShell, specifically outside the VM itself.

The two most common ways to accomplish this are:

One of my key requirements was to not have the VM itself go and grab the script file to run; I wanted to pass a scriptblock to my VM from PowerShell instead. This immediately ruled out Set-AzVMCustomScriptExtension, as it needs a file that is accessible to the VM in order to run.

I had originally thought the same about Invoke-AzVmRunCommand. The example I saw for the ‘-ScriptPath’ parameter were all from a storage account or something similar. However after testing, I found that this ScriptPath could be local to MY computer, where I was running the PowerShell, not the VM itself. The command will inject the contents of the script file into the VM.

In order to simulate the “scriptblock” effect without having too many files, I put my code into a script file at runtime (so that it would be wherever the user was running the prompt), referenced it, and then removed it afterwards, like this:

# Build a command that will be run inside the VM.
$remoteCommand =
@"
#Get first disk that is raw with the lowest disk number (because we may not know what number it will be)
# F volume
Get-Disk | Where-Object partitionstyle -eq 'raw' | Sort-Object Number | Select-Object -first 1 | Initialize-Disk -PartitionStyle GPT  -confirm:$false -PassThru |
New-Partition -DriveLetter 'F' -UseMaximumSize |
Format-Volume -FileSystem NTFS -NewFileSystemLabel "F volume" -Confirm:$false
"@
# Save the command to a local file
Set-Content -Path .\DriveCommand.ps1 -Value $remoteCommand
# Invoke the command on the VM, using the local file
Invoke-AzureRmVMRunCommand -Name $vm.name -ResourceGroupName $vm.ResourceGroupName -CommandId 'RunPowerShellScript' -ScriptPath .\DriveCommand.ps1
# Clean-up the local file
Remove-Item .\DriveCommand.ps1

 

Azure VPN Gateway Connection with custom IPSEC Policy

I was recently setting up a VPN tunnel between an Azure VPN Gateay and an on-premise location, and ran into issues with the tunnel connecting.

The connection in Azure kept saying “connecting”. I was trying to use the VPN troubleshooter to log diagnostics to a storage account for parsing, however the diags didn’t contain the actual errors, and the wizard in the Portal wouldn’t refresh from subsequent runs, so it was stuck on an error with the pre-shared key which I had already corrected.

The on-premise device is a Cisco, and so there were accessible error messages from it:

crypo map policy not found for remote traffic selector 0.0.0.0

This led me down a path of searching resulting in the Cisco example configuration from Microsoft. The key part of this is that a Cisco ASA cannot make a connection to a native RouteBased VPN Gateway in Azure.

The fix is to apply a custom IPSec policy to your connection, particularly with this flag: -UsePolicyBasedTrafficSelectors $True

I used a small bit of PowerShell in order to try this out:

$rg          = "default-rg"
$ConnectionHEN = "vpngw"
$connection = Get-AzVirtualNetworkGatewayConnection -Name $ConnectionHEN -ResourceGroupName $rg
$newpolicy   = New-AzIpsecPolicy -IkeEncryption AES256 -ikeintegrity SHA256 -DhGroup DHGroup2 -IpsecEncryption AES256 -IpsecIntegrity SHA256 -PfsGroup none -SALifeTimeSeconds 28800
Set-AzVirtualNetworkGatewayConnection -VirtualNetworkGatewayConnection $connection -IpsecPolicies $newpolicy -UsePolicyBasedTrafficSelectors $true

However, this returned the following error:

A virtual network gateway SKU of Standard or higher is required for Ipsec Policies support on virtual network gateway

My VPN Gateway is of SKU “Basic”, so it does not support IPSec policies, according to this documentation page.

Because it’s Basic, I can’t simply upgrade to a “VpnGW1” – I have to destroy and re-create my gateway as the new SKU, which will also generate a new public IP address.

So I did these things, but I did them using Terraform since this environment is managed with that tool.

First I ‘tainted‘ the existing resource to mark it for deletion and recreation. Then I ran “terraform apply” to make the modifications, based on the resources here below:

 

resource "azurerm_resource_group" "srv-rg" {
  name     = "srv-rg"
  location = "${var.location}"
}
resource "azurerm_public_ip" "vpngw-pip" {
  name = "vpngateway-ip"
  location = "${azurerm_resource_group.srv-rg.location}"
  resource_group_name = "${azurerm_resource_group.srv-rg.name}"
  allocation_method = "Dynamic"
}
resource "azurerm_local_network_gateway" "localgateway" {
  name                = "localgateway"
  resource_group_name = "${azurerm_resource_group.srv-rg.name}"
  location            = "${azurerm_resource_group.srv-rg.location}"
  gateway_address     = "1.2.3.4"
  address_space       = "10.10.0.0/24"
}
resource "azurerm_virtual_network_gateway" "vpngw" {
  name = "vpngw"
  location = "${azurerm_resource_group.srv-rg.location}"
  resource_group_name = "${azurerm_resource_group.srv-rg.name}"
  type = "Vpn"
  vpn_type = "RouteBased"

  active_active = false
  enable_bgp = false
    sku = "VpnGw1"

  ip_configuration {
    name = "vpngateway_ipconfig"
    public_ip_address_id = "${azurerm_public_ip.vpngw-pip.id}"
    private_ip_address_allocation = "Dynamic"
    subnet_id = "${azurerm_subnet.GatewaySubnet.id}"
  }
}

# Client VPN Connection
resource "azurerm_virtual_network_gateway_connection" "vpnconnection" {
  name = "vpnconnection"
  location = "${azurerm_resource_group.srv-rg.location}"
  resource_group_name = "${azurerm_resource_group.srv-rg.name}"

  type = "IPsec"
  virtual_network_gateway_id = "${azurerm_virtual_network_gateway.vpngw.id}"
  local_network_gateway_id = "${azurerm_local_network_gateway.localgateway.id}"
  use_policy_based_traffic_selectors = true

  shared_key = "${var.ipsec_key}"

  ipsec_policy {
    dh_group = "DHGroup2"
    ike_encryption = "AES256"
    ike_integrity = "SHA256"
    ipsec_encryption = "AES256"
    ipsec_integrity = "SHA256"
    pfs_group = "None"
    sa_lifetime = "28800"

  }
}