As demand for cloud resources continues to surge, particularly with the rise of AI workloads, Azure's UK South region is reportedly facing capacity challenges. Anecdotal reports in the public domain appear to confirm this: requests for subscriptions and compute quota are being closely managed, which is leading to project delays and compute allocation failures.
Understanding the capacity crisis
Azure UK South is currently experiencing intense pressure on its capacity, resulting in:
- New quota requests that are no longer auto-approved.
- Frequent AllocationFailed errors when customers attempt to deploy, scale, or resize VMs.
Whether or not this situation has been exacerbated by increased demand from AI and other resource-intensive applications, it requires IT leaders to adopt proactive strategies to navigate these challenges effectively.
Key strategies for managing capacity issues
- Preserve your existing capacity
  - Avoid autoscaling deallocation: During shortage windows, avoid autoscaling settings that deallocate VMs when load drops. Deallocating a VM releases its hardware allocation, and that capacity may not be available when you need it again. Instead, temporarily disable any scale-in rules that stop or deallocate compute resources.
- Utilise on-demand capacity reservations
  - Microsoft offers a capacity reservation feature that allows you to guarantee VM capacity in UK South by reserving compute resources in advance. This is critical for:
    - Migrations
    - Production workloads
    - Seasonal or predictable demand spikes
  - While this requires a commitment, it significantly reduces the risk of deployment failures.
- Choose VM sizes wisely
  - Opt for VM sizes that have better availability. Newer SKUs are often the most oversubscribed. Consider using more commonly available VM families, which currently offer improved allocation success rates.
- Move away from legacy VM SKUs
  - Older VM families cannot leverage modern hardware and are more prone to failures during capacity shortages. Upgrading to current-generation SKUs provides:
    - Higher allocation success
    - Enhanced performance
    - Potential long-term cost savings
- Explore alternative regions
  - If governance permits, consider deploying workloads in alternative regions during constraint windows. Regions should be selected based on:
    - Latency: Choose a region that minimises latency for your users.
    - Compliance: Ensure the selected region meets all regulatory requirements relevant to your industry.
    - Service availability: Check the availability of specific Azure services in your chosen region.
- Deploy across multiple availability zones (AZs)
  - Capacity can vary significantly by zone within a single region. Leveraging multiple AZs increases resilience and mitigates the risk of zonal allocation failures.
- Leverage Microsoft's allocation success recommender
  - This Azure Portal tool predicts the likelihood of successful VM allocations in your chosen region over the next week. It provides:
    - Recommended VM sizes
    - Real-time capacity indicators
    - Deployment success forecasting
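The size, zone, and region strategies above can be combined into a simple fallback order. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: `deploy` is a hypothetical callable standing in for whatever deployment tooling you use, `AllocationFailed` is a local exception class rather than an Azure SDK type, and the region, zone, and VM size values are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Sequence


class AllocationFailed(Exception):
    """Hypothetical stand-in for a deployment rejected for lack of capacity."""


@dataclass(frozen=True)
class Placement:
    region: str
    zone: str
    vm_size: str


def candidate_placements(
    regions: Sequence[str], zones: Sequence[str], vm_sizes: Sequence[str]
) -> Iterator[Placement]:
    """Yield placements in preference order.

    The preferred region is exhausted first; within a region, every zone is
    tried for a VM size before moving on to the next size.
    """
    for region in regions:
        for vm_size in vm_sizes:
            for zone in zones:
                yield Placement(region, zone, vm_size)


def deploy_with_fallback(
    deploy: Callable[[Placement], None], placements: Iterable[Placement]
) -> Optional[Placement]:
    """Try each placement in turn and return the first that succeeds.

    Only AllocationFailed triggers a fallback; any other exception propagates,
    because it is unlikely to be fixed by changing zone, size, or region.
    """
    for placement in placements:
        try:
            deploy(placement)
        except AllocationFailed:
            continue
        return placement
    return None


if __name__ == "__main__":
    # Simulated capacity: only one combination can currently be allocated.
    available = {Placement("uksouth", "3", "Standard_D4as_v5")}

    def fake_deploy(placement: Placement) -> None:
        if placement not in available:
            raise AllocationFailed(f"no capacity for {placement}")

    plan = candidate_placements(
        regions=["uksouth", "northeurope"],  # alternative region is illustrative
        zones=["1", "2", "3"],
        vm_sizes=["Standard_D4s_v5", "Standard_D4as_v5"],
    )
    print(deploy_with_fallback(fake_deploy, plan))
```

Placing the region in the outermost loop keeps workloads in UK South for as long as any acceptable size and zone combination remains, which suits cases where latency or compliance makes an alternative region a last resort.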
Quick troubleshooting checklist for deployment failures
If you encounter deployment failures, consider the following steps:
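As a starting point, the strategies described earlier can be turned into a small triage helper keyed on the error code returned by a failed deployment. This is a hedged sketch: the function name `triage` and the suggested steps are illustrative, and the error codes listed reflect our understanding of common Azure deployment errors, so check them against current Azure documentation before relying on them.

```python
from typing import List

# Error codes assumed to indicate a capacity shortage rather than a
# configuration problem. Verify against current Azure documentation.
ZONAL_CAPACITY_CODES = {
    "ZonalAllocationFailed",
    "OverconstrainedZonalAllocationRequest",
}
REGIONAL_CAPACITY_CODES = {
    "AllocationFailed",
    "OverconstrainedAllocationRequest",
}

CAPACITY_STEPS = [
    "Retry with a more commonly available, current-generation VM size.",
    "Consider an alternative region if governance permits.",
    "Create an on-demand capacity reservation for critical workloads.",
]


def triage(error_code: str) -> List[str]:
    """Map a deployment error code to suggested next steps, in order."""
    if error_code in ZONAL_CAPACITY_CODES:
        # Capacity varies by zone, so another zone is the cheapest first try.
        return ["Retry in a different availability zone."] + CAPACITY_STEPS
    if error_code in REGIONAL_CAPACITY_CODES:
        return list(CAPACITY_STEPS)
    if error_code == "SkuNotAvailable":
        return [
            "Choose a VM size that is offered in this region and zone.",
            "Consider an alternative region if governance permits.",
        ]
    if error_code == "QuotaExceeded":
        return [
            "Submit a quota increase request early; approval may not be automatic.",
        ]
    return ["Not a recognised capacity error; inspect the full deployment error."]


if __name__ == "__main__":
    for step in triage("ZonalAllocationFailed"):
        print("-", step)
```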
Conclusion
The capacity issues in Azure UK South are a pressing concern for IT leaders and organisations reliant on cloud resources. By implementing these management strategies and making informed decisions about resource allocation and usage, organisations can navigate these challenges more effectively. Ensuring that workloads are resilient and prepared for potential capacity constraints will not only enhance operational efficiency but also safeguard against future deployment failures.
For those operating in Azure's UK South region, it is crucial to stay informed and proactive in managing your cloud infrastructure amidst these evolving challenges.