Governance at Scale - Azure Policy, Locks, and Tags

The Governance Paradox
In the on-premises era, governance was physical. You couldn't deploy a server without a purchase order, a rack mount, and a network cable. In Azure, that friction is gone. A developer with Contributor access can spin up a geo-redundant Kubernetes cluster in minutes.
Governance is the engineering discipline of reintroducing necessary friction, in the form of guardrails, without destroying the agility that drove the cloud migration in the first place. This guide dissects three mechanisms built into Azure Resource Manager (ARM): Azure Policy, Resource Locks, and Tagging.
Section 1: Azure Policy - The Guardrails
Azure Policy operates at the heart of the ARM request pipeline. It intercepts PUT and PATCH requests to validate resource state before it is persisted.
Definitions vs. Initiatives
A common mistake is assigning individual Policy Definitions directly to subscriptions. While this works for a sandbox, it fails at scale due to the Assignment Limit. Azure imposes a strict limit of 200 policy assignments per scope.
The Initiative Strategy: Always group related policies into an Initiative (PolicySet). Even if you only have one rule today (e.g., "Audit Public IPs"), wrap it in an Initiative. This allows you to add 50 more network rules later without consuming additional assignment slots.
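To make the single-rule Initiative concrete, here is a minimal sketch of a policy set definition body that wraps one custom policy. The display name, the reference ID `auditPublicIps`, and the `audit-public-ips` definition path are all hypothetical placeholders, not values from a real tenant.

```json
{
  "properties": {
    "displayName": "Network guardrails (hypothetical)",
    "policyType": "Custom",
    "metadata": {
      "category": "Network"
    },
    "policyDefinitions": [
      {
        "policyDefinitionReferenceId": "auditPublicIps",
        "policyDefinitionId": "/providers/Microsoft.Management/managementGroups/Contoso/providers/Microsoft.Authorization/policyDefinitions/audit-public-ips"
      }
    ]
  }
}
```

Adding the next network rule means appending another entry to `policyDefinitions`. The assignment that points at this Initiative stays as it is and still occupies a single slot.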
The "Effect" Types
Understanding the effect is critical for predicting behavior:
- Audit: The safest starting point. It logs violations but does not block deployment.
- Deny: Blocks the request immediately. Warning: This can break CI/CD pipelines if not tested in a "Canary" ring first.
- Modify: Used to add or edit properties (like Tags) during the deployment. It is synchronous.
- DeployIfNotExists (DINE): The most complex. It runs asynchronously, after the triggering request has already succeeded.
  - The Trap: If you use DINE to install a VM Extension, the VM is created successfully (201 Created), and the extension installs 10-15 minutes later. If your application requires that extension immediately upon boot, your deployment will fail.
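One way to move a rule from Audit to Deny ring by ring is to parameterize the effect instead of hard-coding it. The sketch below assumes a simple rule that targets public IP addresses; the parameter name `effect` is a common convention, not a requirement.

```json
{
  "parameters": {
    "effect": {
      "type": "String",
      "allowedValues": ["Audit", "Deny", "Disabled"],
      "defaultValue": "Audit"
    }
  },
  "policyRule": {
    "if": {
      "field": "type",
      "equals": "Microsoft.Network/publicIPAddresses"
    },
    "then": {
      "effect": "[parameters('effect')]"
    }
  }
}
```

With this shape, the Canary ring assignment can pass `Deny` while every other scope keeps the `Audit` default, and the definition itself never changes.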
Remediation Tasks: The Identity Gotcha
Policies with Modify or DINE effects can auto-remediate new resources, but they do not automatically fix existing ones. You must trigger a Remediation Task.
This requires a Managed Identity. When deploying via the Portal, Azure tries to grant this identity the necessary roles automatically. When deploying via code (Bicep/Terraform), you must explicitly assign the role.
Example: A policy to "Deploy Diagnostic Settings" requires the Managed Identity to have Log Analytics Contributor rights. If you miss this role assignment, the remediation task will fail silently.
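For template-based deployments, the explicit role assignment might look like the ARM template resource below. Treat it as a sketch under assumptions: `policyIdentityPrincipalId` is a hypothetical template parameter holding the principal ID of the policy assignment's Managed Identity, and the GUID is believed to be the built-in Log Analytics Contributor role; verify it against the built-in roles list before use.

```json
{
  "type": "Microsoft.Authorization/roleAssignments",
  "apiVersion": "2022-04-01",
  "name": "[guid(subscription().id, parameters('policyIdentityPrincipalId'), 'log-analytics-contributor')]",
  "properties": {
    "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '92aaf0da-9dab-42b6-94a3-d43ce8d16293')]",
    "principalId": "[parameters('policyIdentityPrincipalId')]",
    "principalType": "ServicePrincipal"
  }
}
```

Deriving the name with `guid()` keeps the assignment idempotent across redeployments.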
PowerShell: Triggering Batch Remediation
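```powershell
# -Name is mandatory; "remediate-tag-enforcement" is a placeholder task name.
Start-AzPolicyRemediation `
    -Name "remediate-tag-enforcement" `
    -PolicyAssignmentId "/providers/Microsoft.Management/managementGroups/Contoso/providers/Microsoft.Authorization/policyAssignments/Tag-Enforcement" `
    -ResourceDiscoveryMode ReEvaluateCompliance
```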
Using ReEvaluateCompliance ensures you don't waste cycles on resources that were already deleted or fixed manually.
Section 2: Resource Locks – The Kill Switch
Resource Locks are blunt instruments. They lack the granularity of RBAC, but they are effective at preventing accidental deletion.
CanNotDelete vs. ReadOnly
| Feature | CanNotDelete | ReadOnly |
|---|---|---|
| Delete Resource | Blocked | Blocked |
| Modify Config | Allowed | Blocked |
| Read Data | Allowed | Allowed |
| Restart VM | Allowed | Blocked |
| Access Storage Keys | Allowed | Blocked |
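In a template, a lock is just another resource. The fragment below is a minimal sketch of a `CanNotDelete` lock as it might appear in the `resources` array of a resource-group-scoped ARM template; the name `do-not-delete` and the note text are hypothetical.

```json
{
  "type": "Microsoft.Authorization/locks",
  "apiVersion": "2020-05-01",
  "name": "do-not-delete",
  "properties": {
    "level": "CanNotDelete",
    "notes": "Protects production data. Remove only through an approved change."
  }
}
```

Changing `level` to `ReadOnly` switches the lock to the right-hand column of the table above.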
The Control Plane vs. Data Plane Distinction
Locks only apply to the Control Plane (management.azure.com). They do not restrict the Data Plane.
- A `ReadOnly` lock on a SQL Database prevents you from changing the "Pricing Tier" (Control Plane).
- It does NOT prevent you from running `DELETE FROM Users` via SQL Server Management Studio (Data Plane).
Troubleshooting: Services that Break under ReadOnly
The ReadOnly lock is dangerous because many "Read" operations in Azure are technically "Write" (POST) operations on the Control Plane.
- Virtual Machines: You cannot start, stop, or restart a locked VM. These actions require a POST to an action endpoint such as `restart`, which the lock blocks.
- Storage Accounts: You cannot "List Keys" (view connection strings). This breaks any application or Azure Function that needs to retrieve keys to connect to Blob Storage.
- App Services: Visual Studio Server Explorer and basic scaling operations will fail. The `ReadOnly` lock prevents the "Write" access needed to bridge the tooling to the server.
Warning: The "Owner" Paradox
Locks apply to everyone, including Global Admins and Owners. If you are an Owner and need to delete a locked resource, you cannot simply bypass it. You must remove the lock first, perform the deletion, and then (optionally) re-apply it.
Section 3: Tagging for FinOps & Operations
Tagging is the only link between your engineering reality and your finance team's invoices.
The Case Sensitivity Trap
This is the most common FinOps failure mode.
- ARM (Control Plane): Tag Names (Keys) are case-insensitive. `CostCenter` and `costcenter` are treated as the same key.
- Cost Management (Billing): Tag Values are case-sensitive. `Env: Prod` and `Env: prod` appear as two completely different buckets in your billing report.
Strategy: Do not rely on human discipline. Use Azure Policy to enforce exact casing.
Enforcement: Modify > Append
Historically, admins used the Append effect to add tags. You should stop doing this.
- Append: Adds a tag if missing. Cannot fix incorrect values.
- Modify: Can `addOrReplace`. If a user tags a resource `Env: prod`, a Modify policy can force it to `Env: Prod`, correcting the data quality issue automatically.
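The casing fix can be sketched as a policy rule like the one below. It assumes the tag key `Env` and the canonical value `Prod` from the example above, and it relies on Azure Policy's `equals` condition comparing case-insensitively while `notMatch` compares case-sensitively. The rule therefore fires only when the value is some casing of "prod" other than the exact string `Prod`.

```json
{
  "if": {
    "allOf": [
      {
        "field": "tags['Env']",
        "equals": "Prod"
      },
      {
        "field": "tags['Env']",
        "notMatch": "Prod"
      }
    ]
  },
  "then": {
    "effect": "modify",
    "details": {
      "roleDefinitionIds": [
        "/providers/microsoft.authorization/roledefinitions/b24988ac-6180-42a0-ab88-4908832b8eec"
      ],
      "operations": [
        {
          "operation": "addOrReplace",
          "field": "tags['Env']",
          "value": "Prod"
        }
      ]
    }
  }
}
```

In practice you would likely parameterize the tag name and canonical value so that one definition covers `Dev`, `Test`, and the rest.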
The "Inheritance" Myth
Tags do not inherit from Resource Groups to Resources by default. You must build a policy to force this "waterfall" behavior.
JSON Snippet: Enforce Tag Inheritance
This policy looks at the Resource Group's tags and stamps them onto any new resource created inside it that is missing the tag. The `value` condition skips resource groups that do not carry the tag themselves, so no empty tag is written. `tagName` is a policy parameter supplied at assignment time.

```json
{
  "if": {
    "allOf": [
      {
        "field": "[concat('tags[', parameters('tagName'), ']')]",
        "exists": "false"
      },
      {
        "field": "type",
        "notEquals": "Microsoft.Resources/subscriptions/resourceGroups"
      },
      {
        "value": "[resourceGroup().tags[parameters('tagName')]]",
        "notEquals": ""
      }
    ]
  },
  "then": {
    "effect": "modify",
    "details": {
      "roleDefinitionIds": [
        "/providers/microsoft.authorization/roledefinitions/b24988ac-6180-42a0-ab88-4908832b8eec"
      ],
      "operations": [
        {
          "operation": "add",
          "field": "[concat('tags[', parameters('tagName'), ']')]",
          "value": "[resourceGroup().tags[parameters('tagName')]]"
        }
      ]
    }
  }
}
```