Governance at Scale - Azure Policy, Locks, and Tags

December 11, 2025
12 min read

The Governance Paradox

In the on-premises era, governance was physical. You couldn't deploy a server without a purchase order, a rack mount, and a network cable. In Azure, that friction is gone. A developer with Contributor access can spin up a geo-redundant Kubernetes cluster in minutes.

Governance is the engineering discipline of reintroducing necessary friction—guardrails—without destroying the agility that drove the cloud migration in the first place. This guide dissects the three mechanisms provided by the Azure Resource Manager (ARM): Azure Policy, Resource Locks, and Tagging.


Section 1: Azure Policy - The Guardrails

Azure Policy operates at the heart of the ARM request pipeline. It intercepts PUT and PATCH requests to validate resource state before it is persisted.

Definitions vs. Initiatives

A common mistake is assigning individual Policy Definitions directly to subscriptions. While this works for a sandbox, it fails at scale due to the Assignment Limit. Azure imposes a strict limit of 200 policy assignments per scope.

The Initiative Strategy: Always group related policies into an Initiative (PolicySet). Even if you only have one rule today (e.g., "Audit Public IPs"), wrap it in an Initiative. This allows you to add 50 more network rules later without consuming additional assignment slots.
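
As a rough illustration of the wrapping step, the Python sketch below builds the request body for an initiative that contains a single policy. The `policyDefinitions`, `policyDefinitionId`, and `policyDefinitionReferenceId` property names follow the ARM policy set definition schema. The helper name `build_initiative`, the display name, and the definition ID are placeholders invented for this example, not values from a real tenant.

```python
import json


def build_initiative(display_name, policy_definition_ids):
    """Return a policy set definition body wrapping the given policy IDs."""
    members = []
    for index, definition_id in enumerate(policy_definition_ids):
        members.append({
            "policyDefinitionId": definition_id,
            # Reference IDs must be unique within the initiative.
            "policyDefinitionReferenceId": f"rule-{index}",
        })
    return {
        "properties": {
            "displayName": display_name,
            "policyType": "Custom",
            "policyDefinitions": members,
        }
    }


# Placeholder ID for a hypothetical custom "Audit Public IPs" definition.
audit_public_ips = (
    "/providers/Microsoft.Management/managementGroups/Contoso"
    "/providers/Microsoft.Authorization/policyDefinitions/audit-public-ips"
)

initiative = build_initiative("Network Guardrails", [audit_public_ips])
print(json.dumps(initiative, indent=2))
```

Adding a rule later means appending another ID to the list and updating the initiative. The assignment that points at the initiative stays the same, so no new assignment slot is used.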

The "Effect" Types

Understanding the effect is critical for predicting behavior:

  • Audit: The safest starting point. It logs violations but does not block deployment.
  • Deny: Blocks the request immediately. Warning: This can break CI/CD pipelines if not tested in a "Canary" ring first.
  • Modify: Used to add or edit properties (like Tags) during the deployment. It is synchronous.
  • DeployIfNotExists (DINE): The most complex. It runs asynchronously, after the original request has already returned success.
    • The Trap: If you use DINE to install a VM Extension, the VM creates successfully (201 Created), and the extension typically installs 10-15 minutes later. Anything that needs that extension immediately upon boot (your application, or a later pipeline step) will fail even though the VM deployment itself reported success.
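
The four effects above can be summarised in a small Python model. This is a deliberately simplified sketch of the behaviour described in the list, not how the policy engine is implemented. The function name `apply_effect` and its return shape are made up for illustration.

```python
def apply_effect(effect, violates_rule):
    """Return (request_succeeds, change_applied) for one create/update request.

    change_applied is None, "during request", or "after request".
    """
    if not violates_rule:
        return (True, None)
    outcomes = {
        "audit": (True, None),                         # logged only, nothing blocked
        "deny": (False, None),                         # request rejected
        "modify": (True, "during request"),            # synchronous change
        "deployifnotexists": (True, "after request"),  # asynchronous deployment
    }
    return outcomes[effect.lower()]


for effect in ("Audit", "Deny", "Modify", "DeployIfNotExists"):
    succeeds, change = apply_effect(effect, violates_rule=True)
    print(f"{effect:18} request succeeds: {succeeds!s:5} change applied: {change}")
```

The DINE row is the one to watch: the request succeeds, but the change lands afterwards, which is exactly the gap the trap above describes.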

Remediation Tasks: The Identity Gotcha

Policies with Modify or DINE effects can auto-remediate new resources, but they do not automatically fix existing ones. You must trigger a Remediation Task.

This requires a Managed Identity. When deploying via the Portal, Azure tries to grant this identity the necessary roles automatically. When deploying via code (Bicep/Terraform), you must explicitly assign the role.

Example: A policy to "Deploy Diagnostic Settings" requires the Managed Identity to have Log Analytics Contributor rights. If you miss this role assignment, the policy assignment itself still deploys cleanly, and the problem only surfaces later as failed remediation deployments.
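
One way to catch this before it bites is a pre-flight check in your pipeline that compares the roles a policy asks for (its `roleDefinitionIds`) with the roles the identity actually holds. The Python sketch below works on plain lists and makes no Azure calls. The function name `missing_roles` is hypothetical, and how you fetch the identity's current role assignments is left to your tooling. The two GUIDs are the built-in Contributor and Reader roles.

```python
def missing_roles(required_role_ids, assigned_role_ids):
    """Return the required role definition IDs the identity does not hold."""
    def guid(role_id):
        # Role definition IDs end in a GUID. Compare on that, case-insensitively,
        # because the same role can be written with different scope prefixes.
        return role_id.rstrip("/").split("/")[-1].lower()

    held = {guid(role_id) for role_id in assigned_role_ids}
    return [role_id for role_id in required_role_ids if guid(role_id) not in held]


CONTRIBUTOR = (
    "/providers/microsoft.authorization/roledefinitions/"
    "b24988ac-6180-42a0-ab88-4908832b8eec"
)
READER = (
    "/providers/Microsoft.Authorization/roleDefinitions/"
    "acdd72a7-3385-48ef-bd42-f606fba81ae7"
)

# The policy needs Contributor, but the identity was only given Reader.
gaps = missing_roles([CONTRIBUTOR], [READER])
if gaps:
    print("Remediation would fail; missing roles:", gaps)
```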

PowerShell: Triggering Batch Remediation

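# -Name is required; "remediate-tag-enforcement" is a placeholder task name.
Start-AzPolicyRemediation `
    -Name "remediate-tag-enforcement" `
    -PolicyAssignmentId "/providers/Microsoft.Management/managementGroups/Contoso/providers/Microsoft.Authorization/policyAssignments/Tag-Enforcement" `
    -ResourceDiscoveryMode ReEvaluateCompliance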

Using ReEvaluateCompliance ensures you don't waste cycles on resources that were already deleted or fixed manually.


Section 2: Resource Locks - The Kill Switch

Resource Locks are blunt instruments. They lack the granularity of RBAC, but they are effective at preventing accidental deletion.

CanNotDelete vs. ReadOnly

| Feature | CanNotDelete | ReadOnly |
|---|---|---|
| Delete Resource | Blocked | Blocked |
| Modify Config | Allowed | Blocked |
| Read Data | Allowed | Allowed |
| Restart VM | Allowed | Blocked |
| Access Storage Keys | Allowed | Blocked |

The Control Plane vs. Data Plane Distinction

Locks only apply to the Control Plane (management.azure.com). They do not restrict the Data Plane.

  • A ReadOnly lock on a SQL Database prevents you from changing the "Pricing Tier" (Control Plane).
  • It does NOT prevent you from running DELETE FROM Users via SQL Server Management Studio (Data Plane).

Troubleshooting: Services that Break under ReadOnly

The ReadOnly lock is dangerous because many "Read" operations in Azure are technically "Write" (POST) operations on the Control Plane.

  1. Virtual Machines: You cannot start, stop, or restart a locked VM. Each of these actions is a POST to an action endpoint on the VM (for example, restart), which the lock blocks.
  2. Storage Accounts: You cannot "List Keys" (view connection strings), because listing keys is itself a POST operation. This breaks any application, pipeline, or Azure Function that retrieves the keys through ARM in order to connect to Blob Storage.
  3. App Services: Visual Studio Server Explorer and basic scaling operations will fail. The ReadOnly lock prevents the "Write" access needed to bridge the tooling to the server.
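
The pattern behind all three cases can be approximated with a few lines of Python. This is a simplified model that decides purely on the HTTP method of the Control Plane call. That is an assumption made for illustration, as are the function name `lock_blocks` and the operation labels.

```python
READ_METHODS = {"GET", "HEAD"}


def lock_blocks(lock_level, http_method):
    """Return True if a lock of this level would block the Control Plane call."""
    method = http_method.upper()
    if lock_level == "CanNotDelete":
        return method == "DELETE"
    if lock_level == "ReadOnly":
        # Anything that is not a plain read counts as a write, including POST
        # actions that feel like reads (List Keys) or operations (Restart).
        return method not in READ_METHODS
    raise ValueError(f"unknown lock level: {lock_level}")


operations = {
    "Read VM properties": "GET",
    "Change SQL pricing tier": "PUT",
    "Restart VM": "POST",
    "List storage account keys": "POST",
    "Delete resource": "DELETE",
}

for name, method in operations.items():
    print(
        f"{name:28} CanNotDelete blocks: {lock_blocks('CanNotDelete', method)!s:5} "
        f"ReadOnly blocks: {lock_blocks('ReadOnly', method)}"
    )
```

Data Plane calls, such as a SQL query sent straight to the database endpoint, never pass through this check at all, which is why the lock cannot stop them.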

Warning: The "Owner" Paradox. Locks apply to everyone, including Global Admins and Owners. If you are an Owner and need to delete a locked resource, you cannot simply bypass it. You must remove the lock first, perform the deletion, and then (optionally) re-apply it.


Section 3: Tagging for FinOps & Operations

Tagging is the only link between your engineering reality and your finance team's invoices.

Figure: Cost Visibility Impact. Impact of inconsistent tagging on Cost Management reporting, as percentage of bill (45%, 30%, 25%).

The Case Sensitivity Trap

This is the most common FinOps failure mode.

  1. ARM (Control Plane): Tag Names (Keys) are case-insensitive for operations. CostCenter and costcenter are treated as the same key, although the resource provider may keep whatever casing was submitted, and that casing can show up in cost reports.
  2. Cost Management (Billing): Tag Values are case-sensitive. Env: Prod and Env: prod appear as two completely different buckets in your billing report.

Strategy: Do not rely on human discipline. Use Azure Policy to enforce exact casing.
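
To see how the split plays out in a report, here is a minimal Python sketch that groups cost rows by tag value, first exactly as billed and then with the casing normalised. The rows and amounts are invented for illustration, and `cost_by_tag` is a hypothetical helper, not a Cost Management API.

```python
from collections import defaultdict


def cost_by_tag(rows, tag, normalize=False):
    """Sum cost per tag value; optionally fold values to lower case first."""
    totals = defaultdict(float)
    for row in rows:
        value = row["tags"].get(tag, "(untagged)")
        if normalize:
            value = value.lower()
        totals[value] += row["cost"]
    return dict(totals)


# Made-up billing rows: two teams tagged the same environment differently.
rows = [
    {"resource": "vm-web-01", "cost": 100.0, "tags": {"Env": "Prod"}},
    {"resource": "vm-web-02", "cost": 50.0, "tags": {"Env": "prod"}},
    {"resource": "disk-tmp", "cost": 25.0, "tags": {}},
]

print("As billed: ", cost_by_tag(rows, "Env"))
print("Normalised:", cost_by_tag(rows, "Env", normalize=True))
```

Normalising in the report hides the problem rather than fixing it. Enforcing the casing at deployment time keeps the source data clean for every downstream consumer.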

Enforcement: Modify > Append

Historically, admins used the Append effect to add tags. You should stop doing this.

  • Append: Adds a tag if missing. Cannot fix incorrect values.
  • Modify: Supports the addOrReplace operation. If a user tags a resource Env: prod, a Modify policy can force it to Env: Prod, correcting the data quality issue automatically.
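
The difference between the two effects comes down to what happens when the tag already exists. The Python sketch below models that on a plain dictionary. Both function names are hypothetical, and the model ignores the case-insensitivity of tag names for brevity.

```python
def append_tag(tags, name, value):
    """Model of Append: add the tag only when it is missing."""
    result = dict(tags)
    result.setdefault(name, value)
    return result


def modify_add_or_replace(tags, name, value):
    """Model of Modify with addOrReplace: set the tag regardless."""
    result = dict(tags)
    result[name] = value
    return result


wrong_case = {"Env": "prod"}

print("Append:", append_tag(wrong_case, "Env", "Prod"))
print("Modify:", modify_add_or_replace(wrong_case, "Env", "Prod"))
```

In this model Append leaves the incorrect value in place, while the addOrReplace operation overwrites it.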

The "Inheritance" Myth

Tags do not inherit from Resource Groups to Resources by default. You must build a policy to force this "waterfall" behavior.

JSON Snippet: Enforce Tag Inheritance. This policy rule looks up the tag named by the tagName parameter on the Resource Group and stamps it onto any new resource created inside it that does not already carry that tag.

{
  "if": {
    "allOf": [
      {
        "field": "[concat('tags[', parameters('tagName'), ']')]",
        "exists": "false"
      },
      {
        "field": "type",
        "notEquals": "Microsoft.Resources/subscriptions/resourceGroups"
      },
      {
        "value": "[resourceGroup().tags[parameters('tagName')]]",
        "notEquals": ""
      }
    ]
  },
  "then": {
    "effect": "modify",
    "details": {
      "roleDefinitionIds": [
        "/providers/microsoft.authorization/roledefinitions/b24988ac-6180-42a0-ab88-4908832b8eec"
      ],
      "operations": [
        {
          "operation": "add",
          "field": "[concat('tags[', parameters('tagName'), ']')]",
          "value": "[resourceGroup().tags[parameters('tagName')]]"
        }
      ]
    }
  }
}
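
If it helps to reason about the rule before assigning it, the decision it makes can be modelled in Python. This is a sketch of the intent only: `inherit_tag` is a hypothetical name, and the model assumes nothing should change when the resource group itself lacks the tag.

```python
def inherit_tag(resource_tags, resource_group_tags, tag_name):
    """Return the resource's tags after the inheritance rule is applied."""
    result = dict(resource_tags)
    if tag_name in result:
        # The resource already carries the tag: the "add" operation leaves it alone.
        return result
    if tag_name not in resource_group_tags:
        # Assumption of this model: nothing to inherit, so nothing changes.
        return result
    result[tag_name] = resource_group_tags[tag_name]
    return result


group_tags = {"CostCenter": "CC-1042"}  # placeholder value

print(inherit_tag({}, group_tags, "CostCenter"))
print(inherit_tag({"CostCenter": "CC-0007"}, group_tags, "CostCenter"))
```

Because the rule only adds a missing tag, a value set deliberately on the resource wins over the resource group's value.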
