Terraform AzureRM Backend

One of the primary items I wanted to accomplish before my latest use of Terraform in production was storing the state file in a central location for shared use within my team.

This is controlled in Terraform by the “backend”. In my particular case, I was interested in the AzureRM backend.

There are a few ways to accomplish this configuration, but one of my requirements was to not store the storage account key in a file. Instead, I rely on the Azure Cloud Shell as my deployment environment for Terraform; it is already authenticated, so I can dynamically connect to resources within my subscription.

First I define a simple “backend.tf” file.

terraform {
    backend "azurerm" {}
}

Next, I wrote a wrapper script (“InitWrapper.ps1”) to run my “terraform init” command, passing in the backend settings as documented by Terraform.

 
$subscription_id = "e86a3dce"
$resource_group = "Default"
$storage_account = "terraformstates"
# Azure blob container names must be lowercase
$containerName = "client1"
 
# Get the Storage Account Key for use in loading the backend file
Select-AzureRmSubscription -SubscriptionId $subscription_id
$accountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $resource_group -Name $storage_account)[0].Value
 
# Perform init
terraform init `
    -backend-config="storage_account_name=$storage_account" `
    -backend-config="container_name=$containerName" `
    -backend-config="access_key=$accountKey" `
    -backend-config="key=prod.terraform.tfstate"

You can see here I’m referencing a storage account, and a blob container. This way you can isolate state files within a container for purposes of RBAC or management.
The last parameter I’m passing in is “key” which tells Terraform what to name the state file.
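
One possible variation on the wrapper: Terraform's partial backend configuration also accepts a path to a file of key/value pairs through -backend-config, so the non-secret settings don't have to be repeated on the command line. A minimal sketch, assuming a file named backend.hcl (a hypothetical name) that reuses the values from the wrapper; the access key is deliberately left out so it still never lands in a file:

```hcl
# backend.hcl (hypothetical file name): non-secret backend settings only.
# The access key is intentionally omitted and is still supplied at init time.
storage_account_name = "terraformstates"
container_name       = "client1" # Azure blob container names must be lowercase
key                  = "prod.terraform.tfstate"
```

Under that assumption, the init call in the wrapper would shrink to two arguments: -backend-config="backend.hcl" and -backend-config="access_key=$accountKey".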

The end result of running this is a file in my Azure Cloud Shell share (/home/azureuser/clouddrive/client1 is where I would typically change directory to) named “terraform.tfstate”, stored in the “.terraform” subdirectory, which is a pointer to my backend sitting in an Azure storage account blob container.

Now I no longer need to run the init wrapper unless there are specific Terraform changes that need to be initialized in my Cloud Shell.

If another team member wishes to work with the same state file, they can run the Init wrapper, and their local state pointer will connect to the same “prod.terraform.tfstate” and be able to create/modify infrastructure in the same way.

Terraform Azure VM Extensions

Having recently gone through getting Terraform to deploy a virtual machine and a VM extension to register Desired State Configuration (DSC) with Azure Automation, I thought I’d note the method and code here for future reference.

This presumes a functioning Azure Automation account with a DSC configuration and generated node configurations.

 

First, I specify my variables in a file “Variables.tf”:

// For all VMs
 
variable "subscription" { type = "string" }
variable "location" { type = "string" }
variable "vmsize" { type = "map" }
variable "username" { type = "string" }
variable "clientcode" { type = "string" }
variable "password" { type = "map" }
variable "networkipaddress" { type = "map" }
variable "serveripaddress" { type = "map" }
variable "dnsservers" { type = "list" }
variable "dcdnsservers" { type = "list" }
// DSC related
# The registration key for the Azure Automation account
variable "dsc_key" { type = "string" }
# Endpoint, also referred to as the Registration URL
variable "dsc_endpoint" { type = "string" }
# This can be ApplyOnly, ApplyAndMonitor, or ApplyAndAutoCorrect
variable "dsc_mode" { type = "string" }
# Here's where you define the node configuration that you actually want to apply to the VM
variable "dsc_nodeconfigname" { type = "map" }
# How often (in minutes) the current configuration is checked and applied
variable "dsc_configfrequency" { type = "string" }
# How often (in minutes) the node checks the pull service for an updated configuration
variable "dsc_refreshfrequency" { type = "string" }
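
The subscription variable is declared here but is not referenced by any of the resources in this post. One plausible place for it, shown purely as an assumption since the provider configuration isn't part of this post, is a provider block. This sketch pins only the subscription and leaves authentication to the environment:

```hcl
# Hypothetical provider block; not part of the original configuration.
# Only the subscription is pinned here. Credentials are expected to come
# from the environment (for example, an authenticated Azure Cloud Shell session).
provider "azurerm" {
  subscription_id = "${var.subscription}"
}
```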

Then I deploy the VM, along with the required dependencies:

# Initial Resource Group
resource "azurerm_resource_group" "Default" {
  name     = "az${var.clientcode}"
  location = "${var.location}"
}
 
resource "azurerm_virtual_network" "VirtualNetwork" {
  name                = "azcx${var.clientcode}"
  address_space       = ["${lookup(var.networkipaddress, "VMNet")}"]
  location            = "${var.location}"
  resource_group_name = "${azurerm_resource_group.Default.name}"
  dns_servers         = "${var.dnsservers}"
}
 
resource "azurerm_subnet" "subnet-lan" {
  name                 = "az${var.clientcode}-lan"
  resource_group_name  = "${azurerm_resource_group.Default.name}"
  virtual_network_name = "${azurerm_virtual_network.VirtualNetwork.name}"
  address_prefix       = "${lookup(var.networkipaddress, "subnet-lan")}"
}

resource "azurerm_network_interface" "dc1nic1" {
  name                = "az${var.clientcode}1nic1"
  location            = "${var.location}"
  resource_group_name = "${azurerm_resource_group.Default.name}"
  # DNS server order is reversed for the domain controller
  dns_servers = "${var.dcdnsservers}"
 
  ip_configuration {
    name                          = "ipconfig1"
    subnet_id                     = "${azurerm_subnet.subnet-lan.id}"
    private_ip_address_allocation = "static"
    private_ip_address            = "${lookup(var.serveripaddress, "az${var.clientcode}1")}"
  }
}

resource "azurerm_virtual_machine" "dc1" {
  name                             = "az${var.clientcode}1"
  location                         = "${var.location}"
  resource_group_name              = "${azurerm_resource_group.Default.name}"
  network_interface_ids            = ["${azurerm_network_interface.dc1nic1.id}"]
  vm_size                          = "${lookup(var.vmsize, "az${var.clientcode}1")}"
  delete_os_disk_on_termination    = true
  delete_data_disks_on_termination = true
 
  storage_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2012-R2-Datacenter"
    version   = "latest"
  }
 
  storage_os_disk {
    name              = "az${var.clientcode}1_c"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
    disk_size_gb      = "128"
  }
 
  storage_data_disk {
    name              = "az${var.clientcode}1_d"
    managed_disk_type = "Standard_LRS"
    caching           = "None"
    create_option     = "Empty"
    lun               = 0
    disk_size_gb      = "64"
  }
 
  os_profile {
    computer_name  = "az${var.clientcode}1"
    admin_username = "${var.username}"
    admin_password = "${lookup(var.password, "az${var.clientcode}1")}"
  }
 
  os_profile_windows_config {
    provision_vm_agent        = true
    enable_automatic_upgrades = false
  }
}

Now with the VM deployed, the extension can be applied. This passes in the variables defined above, installs the VM extension, and registers the VM with the Azure Automation account to begin the initial DSC deployment. Note that settings and protected_settings are heredoc strings, so each assignment opens with “<<” (for example, “settings = <<SETTINGS”) and the closing marker sits on its own line.

resource "azurerm_virtual_machine_extension" "dc1-dsc" {
  name                       = "Microsoft.Powershell.DSC"
  location                   = "${var.location}"
  resource_group_name        = "${azurerm_resource_group.Default.name}"
  virtual_machine_name       = "az${var.clientcode}1"
  publisher                  = "Microsoft.Powershell"
  type                       = "DSC"
  auto_upgrade_minor_version = true
  type_handler_version       = "2.76"
  depends_on                 = ["azurerm_virtual_machine.dc1"]

  settings = <<SETTINGS
        {
            "WmfVersion": "latest",
            "advancedOptions": {
                "forcePullAndApply": true
            },
            "Properties": {
                "RegistrationKey": {
                  "UserName": "PLACEHOLDER_DONOTUSE",
                  "Password": "PrivateSettingsRef:registrationKeyPrivate"
                },
                "RegistrationUrl": "${var.dsc_endpoint}",
                "NodeConfigurationName": "${lookup(var.dsc_nodeconfigname, "dc1")}",
                "ConfigurationMode": "${var.dsc_mode}",
                "ConfigurationModeFrequencyMins": ${var.dsc_configfrequency},
                "RefreshFrequencyMins": ${var.dsc_refreshfrequency},
                "RebootNodeIfNeeded": true,
                "ActionAfterReboot": "continueConfiguration",
                "AllowModuleOverwrite": true
            }
        }
SETTINGS

  protected_settings = <<PROTECTED_SETTINGS
    {
      "Items": {
        "registrationKeyPrivate" : "${var.dsc_key}"
      }
    }
PROTECTED_SETTINGS
}

For passing in the variable values, I use two files. One titled inputs.auto.tfvars with the following:

# This file contains input values for variables defined in "Variables.tf"
# As this file doesn't contain secrets, it can be committed to source control.
 
subscription = "bc5242b8"
 
location = "eastus2"
 
vmsize = {
  "azclient1"   = "Standard_A1_v2"
  "azclientweb" = "Standard_A1_v2"
}
 
username = "admin"
 
clientcode = "client"
 
serveripaddress = {
  "azclient1" = "10.0.0.211"
  "azclientweb" = "10.0.0.71"
}
 
networkipaddress = {
  "VMNet"                    = "10.0.0.0/23"
  "subnet-lan"               = "10.0.0.0/26"
}
 
resource_group_name = "azclient"
 
dnsservers = ["10.0.0.10", "10.0.0.111"]
dcdnsservers = ["10.0.0.111","10.0.0.10"]
 
# This is the registration URL of Azure Automation
dsc_endpoint = "https://eus2-agentservice-prod-1.azure-automation.net/accounts/guid"
dsc_mode = "ApplyAndAutoCorrect"
dsc_configfrequency = "240"
dsc_refreshfrequency = "720"
dsc_nodeconfigname = {
  "dc1"   = "deploymentconfig.domaincontroller"
  "web1"   = "deploymentconfig.webserver"
}

And then another titled secrets.auto.tfvars which does not get uploaded to source control:

password = {
  "azclient1" = "password"
  "azclientweb" = "password"
}
 
dsc_key = "insert key here"
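
To see how these values line up with the lookup() calls in the resources, it can help to trace one by hand. With clientcode = "client", the key "az${var.clientcode}1" resolves to "azclient1", which therefore has to exist in each map that is looked up with it. A small sketch, using a hypothetical output name, that surfaces one result:

```hcl
# Hypothetical output, added only to illustrate how the map lookup resolves.
# "az${var.clientcode}1" -> "azclient1" -> the "azclient1" entry of serveripaddress
output "dc1_private_ip" {
  value = "${lookup(var.serveripaddress, "az${var.clientcode}1")}"
}
```

With the inputs shown in inputs.auto.tfvars, this evaluates to 10.0.0.211. If a key is missing from the map, lookup() fails unless a default is passed as a third argument.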

Windows Updates failing on Azure VM

While playing around with some VMs in Azure, I ran into an issue where they could not perform Windows Updates. This was first noticed with failing Update deployments through Azure Automation.

In order to narrow down the issue, I tried to manually run Windows Updates from the VM itself. I confirmed that public Internet was accessible, but still received this error:

There were some problems installing updates, but we'll try again later. If you keep seeing this and want to search the web or contact support for information, this may help: (0x8024402f)

I ran the PowerShell command “Get-WindowsUpdateLog” which populates C:\Windows\WindowsUpdate.log (new behavior in Server 2016), and only found a brief error showing:

Failed to retrieve SLS response data for service

It was right around this time that I noticed a popup for memory exhaustion on this VM. The VM size I had chosen included 2 GB of RAM.

I did a little test and looked at Task Manager prior to running Windows Update: 45% of memory was in use.

Then I clicked “Retry” and saw the memory utilization ramp up to 55%, 75%, 85%, 95% and then the Windows Update process returned an error and Task Manager immediately dropped back down to 45% memory utilization.

It appears that memory exhaustion was causing the Windows Update process to crash out. What I don’t understand is why the page file didn’t come into play and grow to accommodate the memory demand; it was set to System Managed and there was more than enough space on the temporary disk to grow into.

 

After I resized the VM to one with 4 GB of RAM, it updated without issue.

Azure Application Gateway through NSG

I’m testing some things with Azure Application Gateway this week, and ran into a problem after trying to lock down a network security group (NSG) to restrict virtual network traffic between subnets and peered VNETs.

Here’s the test layout:


The NSG applied to the “sub-clt-test” subnet has a default inbound rule that allows VirtualNetwork traffic on any port. This is a service tag, and according to Microsoft the VirtualNetwork tag includes all subnets within a VNET as well as peered VNETs. This means in my diagram, there’s an “allow all” rule between my “vnet-edge” and “vnet-client”. Not ideal.

I created a new rule on the NSG for “sub-clt-test” to deny all HTTP traffic into the subnet, and then added the following rules, intending to allow the Application Gateway to communicate with its backend pool targets:

Source                           Destination  Port
10.8.48.6                        Any          80
AzureLoadBalancer (Service Tag)  Any          80

The IP address listed there is the frontend private IP of my Application Gateway, within the subnet 10.8.48.0/24. I added the second rule during testing in case the Microsoft service tag included the Application Gateway within its dynamic range.

What I discovered is that this configuration broke the Application Gateway’s ability to communicate with the backend targets. The health probes went unresponsive, and the portal suggested that the NSG be reviewed.

I knew that this wasn’t due to outbound restrictions on any of my NSGs because as soon as I removed the inbound port 80 deny on this subnet, it began functioning again.

I removed the Deny rule, and then installed Wireshark on the backend web server to collect information about which IP was actually making the connection.

I discovered that while the frontend private IP was listed as 10.8.48.6, it is actually the IP addresses 10.8.48.4 and 10.8.48.5 that make the connection to the backend pool. The frustrating part is that I couldn’t find any explanation for this behavior in Microsoft documentation. I know that the Application Gateway requires its own subnet, not shared with other resources, but there are no references to the reserved IPs that traffic would be coming from.

Since the subnet is effectively reserved for this resource, it’s easy enough to modify my NSG for the range, but I felt like I was missing something obvious as to why these IP addresses were being used for the connections.

Halfway through writing this post though, I came across this blog post by RoudyBob, with a bit of insight. Each instance of the Application Gateway uses an IP from the subnet assigned, and it is these IPs that will communicate with your backend targets.
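
Because the instance IPs can come from anywhere in the gateway’s dedicated subnet, the allow rule is easier to express against the whole subnet than against individual addresses. A sketch in Terraform, assuming hypothetical resource group and NSG names and an arbitrary priority; only the subnet range and the port come from this post:

```hcl
# Hypothetical names and priority; only the subnet range and port are from the post.
resource "azurerm_network_security_rule" "allow-appgw-http" {
  name                        = "Allow-AppGw-HTTP"
  resource_group_name         = "rg-client"        # assumed name
  network_security_group_name = "nsg-sub-clt-test" # assumed name
  priority                    = 200                # must be a lower number than the HTTP deny rule
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_address_prefix       = "10.8.48.0/24"     # Application Gateway subnet
  source_port_range           = "*"
  destination_address_prefix  = "*"
  destination_port_range      = "80"
}
```

NSG rules are evaluated in priority order, lowest number first, so this rule only takes effect if its priority number is below that of the deny rule.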

Dell PowerEdge BIOS failed due to IPMI driver

I was updating the BIOS on a couple of Dell PowerEdge R620s today and was presented with an error I hadn’t seen before:

IPMI driver is disabled. Please enable or load the driver and then reboot the system.

This was very odd. A little bit of searching showed that there should be an IPMI driver service running in Windows, so I checked that:

Service 'ipmidrv (IPMIDRV)' cannot be started due to the following error: Cannot start service IPMIDRV on computer '.'.

Knowing this was a driver, I went to look in Device Manager, and was quite surprised to find an HP IPMI driver listed.

This is a Dell PowerEdge, and I have no idea how this HP driver got installed; I’m not even convinced it wasn’t there prior to the other driver updates I was performing. In any case, this device wasn’t starting properly, and was preventing the service from starting. I uninstalled the device and its driver.

Following this, the appropriate device appeared under “System Devices”.

Now I could start the service, and the BIOS update proceeded properly.