Industry hysteria and vendor missteps in dealing with the Spectre and Meltdown CPU vulnerabilities may have complicated the response for Australian cloud and managed service providers (MSPs), but actual operational damage from the bugs has so far been rather less than many feared at first.
Patches for the vulnerabilities began trickling into the market days after the announcement that two hardware-level bugs had left nearly every computer in the world exposed to snooping from exploitation of speculative-execution capabilities designed to speed CPUs’ performance.
Intel – which was, subsequent investigations have revealed, hiding the vulnerability for months because it wasn’t sure how to fix it – offered initial patches that were feared to slow down machines and ended up spawning a PR disaster when they began rebooting systems at random and bricking Red Hat Linux servers.
Intel quickly advised customers to stop using the patches, and Microsoft released a patch of its own to disable the earlier one.
Caught in the middle of this glaring show of incompetence were cloud solutions providers, whose reliance on thousands of machines meant they needed reliable, effective patches that would let them confidently reassure customers that their data was secure.
Press play, then pause
Hosting giant Bulletproof Networks was already rolling out the patches using a “pretty standard process” built on commercial and homegrown patching tools, according to CTO John Ferlito.
The company uses a staggered patching process, updating servers in groups to avoid creating potential points of failure. Although a few Red Hat-based systems had “some issues, the biggest cost has been in man-hours,” Ferlito said. “There was a significant effort spent on this.”
When word of the patches’ problems reached the technical team, it paused the rollout and began rolling back the changes installed to date. At the same time, the team co-ordinated an internal response that included updating staff desktops and reconfiguring Chrome to prevent processes reading data across tabs.
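A staggered rollout with a pause-and-roll-back step can be sketched roughly as follows. This is an illustration only, not Bulletproof's tooling: the function names, the batch size and the `apply_patch`, `is_healthy` and `roll_back` callbacks are all hypothetical stand-ins for whatever commercial or homegrown tools a provider actually uses.

```python
from typing import Callable, Iterable, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def staggered_rollout(
    hosts: Sequence[str],
    batch_size: int,
    apply_patch: Callable[[str], None],   # hypothetical: installs the patch
    is_healthy: Callable[[str], bool],    # hypothetical: post-patch check
    roll_back: Callable[[str], None],     # hypothetical: removes the patch
) -> dict:
    """Patch hosts one group at a time. If any host in a group fails its
    health check, stop and roll back everything patched so far."""
    patched: List[str] = []
    for group in batches(hosts, batch_size):
        for host in group:
            apply_patch(host)
            patched.append(host)
        if not all(is_healthy(host) for host in group):
            # Undo in reverse order so the most recent change goes first.
            for host in reversed(patched):
                roll_back(host)
            untouched = [h for h in hosts if h not in patched]
            return {"status": "rolled_back",
                    "rolled_back": patched,
                    "untouched": untouched}
    return {"status": "complete", "patched": patched}
```

Because only one group is patched between health checks, a bad patch reaches at most the groups already processed, and the remaining hosts are never touched.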
The biggest problem was planning strategy in the information vacuum that Intel left, Ferlito said, contrasting the company’s hush-hush approach with the industry-wide collaboration around resolving 2014’s Heartbleed vulnerability.
“Probably the most disappointing bit for me has been the level of communication from Intel, which for an organisation of their size and magnitude has been less than I would expect,” Ferlito said. “It’s all very PR stuff without a lot of substance. It’s their product – but right now, if I want to know what’s happening with Spectre-Meltdown, I’m going to go to Google’s Project Zero Web site,” where the bugs were first reported.
It took more than a month for Intel to release new Spectre patches. Ferlito said that Bulletproof’s processes helped contain the effects of the industry chaos. There have been no reports of performance being impacted, he said, although this may be due to the relatively low transactional workloads the company’s customers are generally running.
Shifting the decision-making
Hosting and cloud provider Servers Australia took a rather different approach after the Meltdown and Spectre vulnerabilities were announced: they did nothing at all.
The decision followed a meeting of the company’s internal Security Advisory Council (SAC) – made up of seven technical and business leaders – which evaluated the real risk and concluded that the bugs were only an issue in shared environments, according to chief executive Jared Hirst.
Because the firm’s core user base is built on dedicated servers, the behaviour of potential exploits was simply not relevant – and not worth the potential impact of patching straight away. Instead, the company stopped selling services to unverified customers. “We were very picky around who we were giving access to the platform,” said Hirst.
“We were so glad we didn’t patch on day one because we saw everyone else rolling out patches, then unrolling the patches. It was just a nightmare. And patching a private environment is far more detrimental to the end user than leaving it unpatched if there is no one else having access to it.”
Once stable patches became available, Servers Australia did eventually patch the servers supporting its public-cloud offering – the company maintains a Puppet- and Cobbler-based server management system that made the logistics relatively straightforward – but this only happened after extensive testing in a lab to make sure there was no effect on the platform.
Although the team decided internally not to mass-patch the company’s hosting platform, it has been liaising with customers to allow them to make the ultimate decision. But when the nominal risk on single-owner platforms was weighed against the potential risks of the patches – slower performance, instability and crashes among them – Hirst says that 95 percent of the company’s customers agreed to wait.
“The key is education,” he explained. “If the customer does run a shared cloud platform then we are 100 percent advising them to patch it straight away; if two people are on the same machine, it’s the biggest vulnerability. However, if they have a dedicated server with one website on it, then there is almost no risk at this stage.”
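Hirst's rule of thumb amounts to a simple triage on tenancy, which might be expressed along these lines. The `Server` record, its `tenants` field and the `triage` function are hypothetical names chosen for illustration; they are not part of Servers Australia's systems.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PATCH_NOW = "patch straight away"
    WAIT = "wait for a stable, tested patch"


@dataclass(frozen=True)
class Server:
    name: str
    tenants: int  # hypothetical field: customers sharing the same machine


def triage(server: Server) -> Action:
    """Shared machines are patched immediately; single-owner machines
    can wait, as there is no other tenant to snoop on."""
    if server.tenants > 1:
        return Action.PATCH_NOW
    return Action.WAIT
```

A real policy would likely weigh more than tenancy, such as whether the machine runs code from untrusted sources, but the sketch captures the shared-versus-dedicated split described here.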
A long-term plan
Ultimately, Meltdown and Spectre patches are likely to make their way into every environment as increasingly efficient, resilient patches become commonplace.
Microprocessor vendors are redesigning their microcode to avoid the issue, and Gartner has predicted that, by next year, updated patches for existing systems will reduce the performance hit to less than 5 percent.
That firm’s analysis recommends that IT organisations evaluate the potential exposure of installed systems to attackers, prioritising updates to all systems that might be exposed to untrusted applications first.
Where patches or updates pose a potential performance risk for new equipment, Gartner suggests, this impact should be factored into equipment specification processes and faster CPUs recommended to compensate.
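As a rough worked example of that sizing adjustment: if a patch costs a given percentage of throughput, the capacity needed to do the same work grows by a factor of 100 / (100 - hit). The sketch below assumes workload scales linearly with CPU throughput, which is a simplification, and the function name is hypothetical.

```python
def servers_needed(current_servers: int, hit_percent: int) -> int:
    """Servers required to match pre-patch throughput, rounded up.

    Assumes (for illustration) that throughput scales linearly with
    server count and that each server loses `hit_percent` of its capacity.
    """
    if not 0 <= hit_percent < 100:
        raise ValueError("hit_percent must be between 0 and 99")
    remaining = 100 - hit_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-current_servers * 100 // remaining)
```

Under these assumptions, a 5 percent hit means a workload that previously needed 100 servers would need 106.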
Australian MSP itGenius had no issues from Meltdown and Spectre due to a decision, years ago, to shift to a business model based on Google’s cloud applications.
By buying into that cloud ecosystem, the company has been able to buffer itself from such infrastructure issues; Google’s cloud services, as well as the Chromebook devices the company recommends and manages for its more than 1000 customers, were automatically updated.
When the bugs were announced, itGenius founder and managing director Peter Moriarty said the company “didn’t have any monumental effort rushing to patch our computers. The patches just trickled down.”
“Nowadays with our approach to managed services, we have quite a hands-off approach considering that these devices don’t need much management intervention. Google’s response was brilliant, and they did the hard work for us,” Moriarty said.