The Australian IT industry has failed to deliver computing services at scale, according to Melbourne IT CTO Glenn Gore, and will have to lose its addiction to “hugging boxes” to compete in today’s economy.
Gore delivered a blistering attack on the online efforts of Australia’s major retailers in what was undoubtedly the most confronting and engaging presentation at the Australian Data Centre Strategy Summit.
Whether it is predictable events such as the ClickFrenzy sales event or Boxing Day sales, or responses to unplanned ones such as bushfires and floods, Australian web sites invariably fail to scale up when traffic peaks, Gore told the audience.
Gore said the most common (and annoying) excuse he hears during planned events is that “marketing sized it wrong”, meaning that IT expects users to give it accurate demand forecasts for a given service.
He marvelled that while sizing a campaign too small results in an outage, sizing it too large can get an IT manager in just as much trouble — they are then accused of being wasteful and “over-engineering” a solution.
“That’s just excuses, to be honest,” Gore told the audience. “The problem is actually an incorrect architectural approach to supporting the load. If you get the right IT architecture in place those issues go away.”
The likes of Facebook, Google, Amazon and Microsoft use massively distributed services on commodity hardware that are “designed from the start for scale,” he said.
A similar architectural approach is required for any operation with “true unpredictability and load that you cannot predict”.
The Australian IT industry’s approach, by contrast, was to “build silos of infrastructure” to support enterprise applications over decade-long cycles, all to be controlled by large, formal IT operations teams.
“Australia, we love our infrastructure,” he said. “Sometimes I think we are a nation of box huggers. We love our data centres, we love flashing lights, we love spinning things. The problem with that is it doesn't scale once you max it out.
“The operations departments are there to provide a sense of security and control for the business and the systems that you're on. The problem is that locking down infrastructure doesn't work. It doesn't scale.”
Neither will enterprise software that is developed for today’s user base but must still serve users ten years down the track, he said.
“In the online world, success is linked to how quickly you can make changes,” he said. “It was only a couple of years ago that if you still went to some of the big retail sites, they had where the store location was and a pdf of the catalogue that they put in your mailbox. It's not where we need to be and it hasn't improved significantly.”
Even the leading-edge media sites in Australia make changes to their online applications weekly, he said, versus hourly for their equivalents in the US.
Gore’s solution is not one many IT managers want to hear.
“IT operations is critical to maintaining IT systems. The services they run absolutely generate huge amounts of money for the business. But does IT operations create value? It's a cost.”
Australia is counting the cost of laying off or outsourcing software developers by the hundreds in the early 2000s, he said, under the assumption that IT administrators were cheaper to keep on the books than software developers.
He advocates that Australia “shifts headcount from Ops Admin back to Apps Development” — first to customise the COTS software more companies are stuck with, but second to “create services that don't require a traditional operations team to manage.”
NoOps?
The solution to the problem is “the cloud”, Gore said, but not “the cloud” that infrastructure vendors are selling. “It’s not about moving VMs around,” he said. “It’s about being API-driven. When you use APIs, you can build in automation.”
With the right amount of automation, IT organisations “don't have to have human beings making decisions for every little thing that’s happening in the IT environment. The systems can manage themselves.”
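As a rough illustration of what API-driven automation might look like, the Python sketch below runs one pass of a scaling loop: measure load, decide on a fleet size, and act through a provisioning API without a human in between. It is not drawn from Gore's talk or from Melbourne IT's systems. `ScalingPolicy`, `FakeCloudApi`, `set_server_count` and the figure of 200 requests per second per server are all hypothetical stand-ins for whatever a real provider exposes.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: target load per server and fleet size limits."""
    target_rps_per_server: float = 200.0
    min_servers: int = 2
    max_servers: int = 50


def desired_servers(current_rps: float, policy: ScalingPolicy) -> int:
    """Return how many servers the policy wants for the observed load."""
    needed = math.ceil(current_rps / policy.target_rps_per_server)
    # Clamp to the configured floor and ceiling.
    return max(policy.min_servers, min(policy.max_servers, needed))


class FakeCloudApi:
    """Stand-in for a provider's provisioning API (not a real library)."""

    def __init__(self, servers: int) -> None:
        self.servers = servers

    def set_server_count(self, count: int) -> None:
        self.servers = count


def reconcile(api: FakeCloudApi, current_rps: float, policy: ScalingPolicy) -> int:
    """One pass of the control loop: measure, decide, act through the API."""
    target = desired_servers(current_rps, policy)
    if target != api.servers:
        api.set_server_count(target)
    return api.servers
```

Run on a schedule, a loop like this would grow the fleet when a sales event hits and shrink it again afterwards, which is one way to avoid depending on an accurate forecast from marketing.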
Gore is a proponent of the NoOps movement (“or DevOps, to use the more politically correct term”).
This theory relegates the operations team to areas that manage risk — security and standards, for example, or projects to move off legacy systems. And it empowers the developer to not only produce code, but deploy it into production, scale and manage it.
Gore asked the audience whether they would consider allowing their developers this much responsibility. None raised their hands.
“If you look at the companies that are really successful in operating online at scale, the developers are running the production systems,” he said. “They're running on platforms that allow them to do that, with lots of safety control and frameworks [in place]. Lifecycle management is handled at a platform layer, and the dev is in control.
“It's really confronting stuff — a bit scary. But if you get that right, the reward of doing it is it increases your rate of change.
"The developer can push, they can deploy, manage and scale the application as they're changing it because nobody knows the application better than the guy who wrote it. [The alternative] is to try to code the change, document it, hand it to the test department, go into operational acceptance, go through a change control process, and — assuming I got through every single step of that absolutely spotless with not a single question coming back — then I might be able to release to production.
“You've got a team of 20-30-50-100 developers in that process. The overhead of that slows you down, it means your rate of change can't go up.”
Gore said there may not be a place for test and dev managers in this new world. Software developers will need to “compartmentalise” any change to restrict the damage it can do, and have rollback mechanisms in place.
“If you make change easy, maybe it's easier to just fix the code and redeploy. It's a very different way of thinking, and the architecture and skills matrix fundamentally changes.”
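One way to picture a rollback mechanism is a deploy step that reverts itself when a health check fails. The Python sketch below is a minimal illustration under that assumption, not a description of any platform Gore named. The `Service` class, the `deploy_with_rollback` function and the health check are hypothetical.

```python
from typing import Callable, List


class Service:
    """Toy service that tracks deployed versions (hypothetical)."""

    def __init__(self, version: str) -> None:
        self.history: List[str] = [version]

    @property
    def version(self) -> str:
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> None:
        # Never discard the last known version.
        if len(self.history) > 1:
            self.history.pop()


def deploy_with_rollback(service: Service, new_version: str,
                         health_check: Callable[[str], bool]) -> bool:
    """Deploy new_version; revert to the previous one if the check fails."""
    service.deploy(new_version)
    if health_check(service.version):
        return True
    service.rollback()
    return False
```

Because a failed release is undone automatically, the developer who pushed it can fix the code and redeploy rather than wait on a manual change control process.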
The operations team at Melbourne IT, which recently embarked on the NoOps journey, “absolutely hate” the model, Gore said.
“They're all sitting there, almost saying, I can't wait for this to fail. It will fail because [the developers] need operations, they need the framework, they need the control that we provide.”
But Gore insists that without more rapid change, the consequences are far more dire.
“We need to do this, if we are to survive,” he said. “As a hosting company, we won't survive long-term without a faster rate of change, because our competition is already well ahead in this at a global level.”