Microsoft's mission to migrate 90% of internal IT to the Azure cloud


Microsoft is credited with coining the term 'eating your own dogfood' in the '80s – a phrase since rebranded to the more marketing-friendly 'drinking your own champagne' – and now the software giant has revealed how it is consuming its own cloud.

Microsoft's IT teams plan to migrate 90 percent of the company's computing resources to Azure by July 2017, anticipating a 31 percent increase in Azure workloads from its current footprint of 20,000 Azure virtual machines, 110,000 cores and 1,500 Azure-based applications.

As Microsoft's IT has moved to Azure, there has been a concurrent nosedive in its on-premises footprint, from 60,000 on-prem VMs in late 2014 to around 18,000 today. Its Azure operating system instances have grown by almost 10,000 over two years, while on-premises OS instances have fallen by more than 18,000.

Microsoft's internal IT unit is driven by the same impetus as many of its customers: a desire to reduce capex spend on physical hardware and data centres. The company said it started this move with test and dev workloads and has since turned its focus to production resources.

Active but under-utilised resources – the cloud equivalent of 'shelfware' – have been a significant focus in Microsoft's partner strategy. The vendor rejigged channel incentives in 2015 to ensure partner rewards were based on customers' active usage of cloud, as opposed to simply purchasing the licences. It turns out the software vendor went through this same journey internally.

"Our early adoption of cloud technology was focused on getting solutions and resources migrated from on-premises to Azure," according to a blog post. "Many of these migrations were performed as 'lift and shift' – the on-premises architecture was virtualised and placed in Azure IaaS, with the intent of optimisation sometime in the future.

"The lift and shift migration was fast paced – our Azure resources grew quickly, and our subscription base became very large with many Azure resources being underutilised. We’ve realised the need for optimisation and management of our Azure resources, and we understand the business benefits that optimisation brings. The requirement for optimisation will only become more important in the future as our environment continues to grow."

One answer to this problem has been to "heavily" promote migration to platform-as-a-service for "high value on-premises internal solutions and newly designed apps".

"Shocking" cost savings

Cloud sprawl is a problem familiar to many customers: the ease of spinning up new instances means workloads proliferate and costs can skyrocket, often unbeknownst to the company until the bill arrives. It's the opposite of the economic challenge of traditional IT, where costs were sunk upfront.

As Microsoft puts it: "In the on-premises world, teams would acquire servers for a project with one-time capital expenditure budgets, and server utilisation wasn’t a concern because the hardware was a fixed cost. In a cloud-first world, we pay by the hour, and that turns our old practices sideways. Each hour counts, and each core counts."

It turns out Microsoft's tech teams are not immune from a common mistake made by customers: over-speccing cloud instances. The vendor set out to take advantage of cloud's scalability without being hit with unexpected, unwelcome charges. It found the potential for cost savings to be "shocking".

It's impossible to gauge the price tag for Microsoft's Azure footprint, not least because it owns the platform, but it's bound to be extremely costly. At rate card prices, 20,000 D1 instances – the default option on Microsoft's online calculator – in the Azure West US region would cost more than US$2 million per month (but prices could range from a few hundred thousand dollars per month to tens of millions depending on the number of cores, memory and storage specified in each instance).
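
As a rough check on that figure, the sketch below works backwards from the article's numbers to the per-instance hourly price they imply. The 730-hour month and the function names are assumptions for illustration, not Microsoft's rate card.

```python
# Back-of-envelope check of the US$2 million per month figure quoted above.
# Assumption: an average month has 730 hours (8,760 hours / 12).
HOURS_PER_MONTH = 730


def implied_hourly_rate(monthly_total_usd, instances, hours_per_month=HOURS_PER_MONTH):
    """Hourly price per instance implied by a fleet-wide monthly bill."""
    return monthly_total_usd / instances / hours_per_month


def monthly_fleet_cost(hourly_rate_usd, instances, hours_per_month=HOURS_PER_MONTH):
    """Monthly bill for a fleet that runs every hour of the month."""
    return hourly_rate_usd * instances * hours_per_month


# 20,000 instances at US$2 million per month is US$100 per instance per month.
rate = implied_hourly_rate(2_000_000, 20_000)
print(f"Implied rate: ${rate:.3f} per instance-hour")
```

Under those assumptions the implied price is roughly 14 US cents per instance-hour, which shows how quickly a small hourly charge compounds across a fleet that never switches off.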

How Microsoft avoids bill shock

Having been told to reduce its infrastructure spend, Microsoft IT achieved a 38 percent reduction in cloud spending and expects to hit its goal by budget reporting time.

The stats are impressive, including a rise in average CPU utilisation across Azure IaaS instances from 4.5 percent to 16 percent. Microsoft also decreased the number of operating system instances by more than 20,000.

To manage its vast IaaS footprint, Microsoft's internal IT teams created Azure Resource Optimisation (ARO), a combination of tools, processes and education that allowed them to analyse total cost of cloud resources and identify underutilised assets.

ARO evaluates the cost-effectiveness of everything from virtual machines to SQL databases, PaaS and unused Azure storage. By identifying issues such as underutilised servers, misconfigured resources and other unused resources, ARO offers recommendations such as adjusting SKU size, deleting unused resources or turning off resources during downtime.
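
Microsoft has not published ARO's rules, but the three kinds of recommendation described above could be sketched along these lines. The `Resource` fields, the 20 percent threshold and the `recommend` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical summary of one cloud resource over a review period."""
    name: str
    p95_cpu_percent: float  # 95th-percentile CPU utilisation
    in_use: bool            # False if nothing depends on the resource
    is_production: bool


# Assumption: below this P95 CPU figure a resource counts as underutilised.
UNDERUTILISED_BELOW = 20.0


def recommend(resource):
    """Return a list of cost-saving actions for one resource."""
    if not resource.in_use:
        return ["delete unused resource"]
    actions = []
    if resource.p95_cpu_percent < UNDERUTILISED_BELOW:
        actions.append("adjust SKU size")
    if not resource.is_production:
        actions.append("turn off during downtime")
    return actions


print(recommend(Resource("dev-sql-01", 6.0, True, False)))
```

A real tool would weigh memory, storage and cost data as well; the point here is only that each recommendation maps to a simple, checkable condition.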

The IT teams have a "cultural focus on optimisation" and review the ARO dashboard on a weekly basis. They "drill down" into ARO data, switching off or killing virtual machines in real time. The dashboard then updates with the new, reduced cost.

"Watching the benefit of reducing core counts and resource types in near real time provides a great benefit to teams that are trying to stay within their budgets," the blog post said.

Microsoft has shared its learnings on how to optimise Azure consumption, starting by identifying poorly utilised servers.

"The ARO team monitors on-premises data centres and Azure virtual servers daily, using performance counters from System Center Operations Manager (SCOM). Processor, memory and hard drive data is gathered, and then the team uses industry-standard P95 values to determine whether specific assets are underutilised.

"We categorise all servers into five performance categories: frozen, cold, warm, hot and on fire. Teams often use hardware that’s much larger than needed, which leads to very low utilisation. Part of the ARO program is designed to educate engineering organisations about the cost associated with picking server sizes that are too large."

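The blog post names the five categories but not the cut-offs between them. A minimal sketch of the approach, with hypothetical thresholds and a nearest-rank P95 calculation standing in for whatever ARO actually uses, might look like this:

```python
# Hypothetical upper bounds (P95 CPU percent) for each category.
# The category names come from the blog post; the thresholds are assumptions.
CATEGORY_BOUNDS = [
    (5.0, "frozen"),
    (20.0, "cold"),
    (50.0, "warm"),
    (80.0, "hot"),
]


def p95(samples):
    """95th percentile of the samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def categorise(samples):
    """Map a server's utilisation samples to one of the five categories."""
    value = p95(samples)
    for upper, name in CATEGORY_BOUNDS:
        if value < upper:
            return name
    return "on fire"


# A server that idles at 2 percent CPU apart from a few short spikes.
mostly_idle = [2.0] * 95 + [90.0] * 5
print(p95(mostly_idle), categorise(mostly_idle))
```

Using P95 rather than the peak means a handful of brief spikes does not disguise a server that is idle for the vast majority of the time.
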
The blog post outlines further detail on making best use of Azure resources by collecting performance data, creating recommendations and continuing to review metrics to ensure the best configuration and cost.

Microsoft recommends a number of tools to keep on top of Azure, including its own ARO dashboard, as well as Cloud Cruiser, a third-party system that was acquired by Hewlett Packard Enterprise just a few days after the Microsoft blog went live.

Other tools include Snooze, which turns off non-production servers when employees are not actively working on them, as well as Resize, which does what it says on the tin – changes the size of an Azure server, dropping the size and cost of a VM.
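
The article does not describe how Snooze decides that nobody is working on a server. One simple stand-in, assuming a fixed weekday working window, is a schedule check like the one below; the hours and the `should_snooze` function are hypothetical.

```python
from datetime import datetime

# Assumption: staff work 08:00 to 18:00, Monday to Friday.
WORK_START_HOUR = 8
WORK_END_HOUR = 18


def should_snooze(is_production, now):
    """Return True if a server may be switched off at the given time."""
    if is_production:
        return False  # production servers are never snoozed
    is_weekend = now.weekday() >= 5  # Saturday is 5, Sunday is 6
    in_working_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return is_weekend or not in_working_hours


# A non-production server late on a Wednesday evening.
print(should_snooze(False, datetime(2017, 1, 25, 22, 0)))
```

A tool that tracks actual activity, as the description of Snooze suggests, would replace the clock test with a usage signal, but the saving comes from the same place: hours that are not run are not billed.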

Copyright © nextmedia Pty Ltd. All rights reserved.