Programmatic deployment of Azure Site Recovery for Azure VMs – that’s the target I started with for a project a little while ago.
There is a large amount of information on Azure Site Recovery in Microsoft’s Docs; however, the code available online to programmatically deploy a full setup is very sparse! Much of what Microsoft provides is PowerShell, which isn’t idempotent and doesn’t fit with my current tooling (Terraform and declarative infrastructure-as-code). I did consider Azure CLI, but couldn’t find any references for Site Recovery.
While working on this I came across an Azure Quickstart template that is a little incomplete: it starts with some good variable definitions but quickly devolves into hard-coded values from the environment it came from. I also discovered a blog post from Pratap Bhaskar, which was useful, especially for understanding the loop mechanism in the template, but it didn’t go far enough for my purposes.
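For readers who haven't met the loop mechanism before, here is a rough sketch of the idea, written in Python only so the template fragment can be built and inspected as data. It is not taken from the Quickstart, from the blog post, or from my repo. The parameter names (`vaultName`, `fabricName`, `containerName`, `vmNames`), the loop name, and the API version are placeholders I chose for illustration, and the resource's `properties` are deliberately left empty, so this shows the shape of a `copy` loop rather than a deployable template.

```python
import json


def build_template(api_version: str = "2018-07-10") -> dict:
    """Build a skeleton ARM template with one resource repeated via `copy`.

    All parameter names and the API version are illustrative assumptions.
    """
    protected_item = {
        "type": (
            "Microsoft.RecoveryServices/vaults/replicationFabrics/"
            "replicationProtectionContainers/replicationProtectedItems"
        ),
        "apiVersion": api_version,
        # copyIndex() yields 0..count-1, so each iteration picks one VM name.
        "name": (
            "[concat(parameters('vaultName'), '/', parameters('fabricName'), '/', "
            "parameters('containerName'), '/', parameters('vmNames')[copyIndex()])]"
        ),
        # The copy element makes ARM deploy this resource once per VM name.
        "copy": {
            "name": "protectedItemLoop",
            "count": "[length(parameters('vmNames'))]",
        },
        # Intentionally empty: the real replication settings are omitted here.
        "properties": {},
    }
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "vaultName": {"type": "string"},
            "fabricName": {"type": "string"},
            "containerName": {"type": "string"},
            "vmNames": {"type": "array"},
        },
        "resources": [protected_item],
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The point is that a single resource definition plus an array parameter replaces one hand-written resource per VM.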
So I spun up a few VMs, manually configured ASR, and did an ARM template export. What I received was a huge number of properties on the resources that I was sure were relevant only at runtime, not at creation. This was also a good reference, but not exactly where I needed to be.
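One way to make an export like that easier to read is to strip out the keys you believe are runtime-only before comparing it with the documentation. The following is a minimal sketch of that idea, not something I used on the project. The key names in `RUNTIME_ONLY_KEYS` are hypothetical examples, and the real list would have to come from checking each property against the API reference.

```python
from typing import Any, FrozenSet

# Hypothetical examples of keys that describe runtime state rather than
# desired configuration; adjust after checking the API documentation.
RUNTIME_ONLY_KEYS: FrozenSet[str] = frozenset(
    {"provisioningState", "lastHeartbeat", "currentProtectionState"}
)


def prune(node: Any, drop: FrozenSet[str] = RUNTIME_ONLY_KEYS) -> Any:
    """Return a copy of `node` with every key in `drop` removed, at any depth."""
    if isinstance(node, dict):
        return {k: prune(v, drop) for k, v in node.items() if k not in drop}
    if isinstance(node, list):
        return [prune(item, drop) for item in node]
    return node  # scalars are returned unchanged
```

Running `prune` over the parsed export (for example, the result of `json.load`) leaves the original untouched and returns a slimmer structure to work from.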
The final piece of the puzzle that got me on my way was the REST API docs for Site Recovery. With this in hand, and the other sources I had at my disposal, I had the references I needed to begin putting together an ARM Template that would configure my environment end-to-end including Recovery Plans with automation runbooks.
I made a lot of design decisions to fit my environment when building this. Some of them won’t make sense without additional context, most of which I can’t provide. That’s OK, as I hope it at least serves as a reference for “what’s possible” to others who come across it. Here is the overall structure:
- Pre-define and create destination resources like resource groups, virtual networks, and subnets with Terraform
- Deploy ASR for a subset of Virtual Machines, targeting the destination resources
- Include dependent resources like a source-side storage account for cache, and an Azure Automation account in the same region and subscription as the ASR resources
- Deploy a Recovery Plan that provides runbook functionality to configure a Test Failover environment
  - This environment was intended to be completely isolated, to ensure there’s no chance of contamination with prod
  - The current design of my web servers has multiple IP configurations; these need to be replaced
  - Access to the environment is provided through a Jump Host, which needs a known-in-advance IP address
I have documented the result in a GitHub repo called arm-azuresiterecovery.
The majority of my time was actually spent cleaning up the template into proper parameters and variables for effective re-use, and then solving all the syntax challenges and typos that come along with that.
There are still some loose ends in what I’ve created, mainly a few manual steps that are still required. However, as I’m sure is common in the industry, it is good enough to deploy and I must move on – fine-tuning comes later.