KubeCon London: Prepare for a shake-up

During a panel discussion at the recent KubeCon + CloudNativeCon Europe 2025 event in London, run by the Cloud Native Computing Foundation (CNCF), panellists tackled one of the big criticisms of modern software development: why is code deployment still fraught with problems?

Shifting left, where developers take on more responsibility for putting their code into production, appears to have stalled in some organisations.

Kendall Roden, tech product lead at Diagrid, said one of the big challenges is the rise of distributed systems, which, in her view, has saddled developers with what she called “additional cognitive load”. “There’s a lot of new responsibility, such as how to do resiliency when [the code] no longer runs as a single process,” she said.

In Roden’s experience, many organisations lack a platform team to help take on these responsibilities. “Developers are less productive; the DevOps experience has become a little bit bloated and there’s no clear delineation of responsibility,” she warned.

While some organisations may adopt platform engineering to support developers, it still remains a challenge for developers to put code into production. Randy Bias, vice-president of open source strategy and technology at Mirantis, said the goal of DevOps and practices like platform engineering was to break down the walls between developers and operators “so that everybody could operate like Google”.

Platform engineering sits between IT operations staff and software developers: a dedicated team provides a platform, usually built on top of Kubernetes. This, he said, adds a layer of abstraction and gives developers a path to production.

However, Bias added: “If you look at the average enterprise, they have no chance in hell of operating like Google. The silos are as siloed as ever, and there are huge barriers between developers and operations people.” He pointed out that developers do not want to know about IT infrastructure. “They don’t care about it,” said Bias. “They want to develop their applications and have them magically deployed.”

Kubernetes’ relationship with AI

Artificial intelligence (AI) and machine learning (ML) workloads represent something entirely new for the Kubernetes community, according to Jago Macleod, director of engineering at Google Cloud. “This new round of AI and ML workloads is really different to what the CNCF and KubeCon communities have built so far,” he said.

AI and ML training workloads are large. In the world of Kubernetes, workloads are deployed in pods, but given the size of AI and ML workloads, Macleod said this typically means running one pod per node. “If any one of the nodes stops, you need to get another one in place and then restart [the workload] from a checkpoint,” he said. “These new kinds of workloads are not very fault tolerant.”

According to Macleod, another challenge is the fact that AI researchers who work with AI foundation models tend to program using Python.

“Many come from academia and they write Python code, which is not what we would call elegant code in software engineering, and there’s no source [code] control,” he said, adding that the AI researchers then try to distribute these Python workloads across tens of thousands of nodes.

“It’s a completely different mental model compared to our preferred way, with IT operations, and you check it into the source and then that blasts out in a controlled manner; it’s a very different world,” said Macleod.

AI and ML present very different problems from other Kubernetes workloads. Roden said that in the AI space, there needs to be a shift in mindset among developers to improve their understanding of how they can work with AI. “I don’t necessarily know if the scale of usage from the average enterprise developer is actually there, because it is quite complex and it does require a different skill set, a different approach and a different set of infrastructure,” she added.

Bias went further, adding: “People don’t realise that these are HPC workloads,” referring to the fact that AI and ML face problems similar to those of supercomputing. He said the problem of AI and ML resiliency was the same kind of problem that supercomputing centres have been tackling for decades. “We don’t have to reinvent the wheel,” said Bias. “We just need to go back and take that knowledge that’s been in the HPC niche.”

Global pressure

Beyond writing and managing code, one of the questions put to the panel asked how geopolitical tension is affecting open source and the Linux Foundation. “There’s a tonne of perils and pitfalls in front of us,” said Bias.

At the CNCF event in Hong Kong, he said the presenters were all Mandarin speakers discussing projects being developed in China. “There is a danger of regional fractionalisation,” said Bias.

But while open source tends to be politically neutral and the community tries to operate across international boundaries, geopolitics can undermine this openness.

“There are geopolitical realities for an organisation like the Linux Foundation because it operates primarily inside the United States,” he said, referring to the Linux Foundation’s decision in November 2024 to exclude a cohort of Russian Linux kernel maintainers.

Bias suggested that, given it is based in the US, the Linux Foundation is no longer well placed to oversee the global open source community. Instead, he said, what is needed is a United Nations for open source that operates in a geographically neutral region.
