Implementing an AWS multi-account strategy is a popular approach that helps organizations to manage their cloud resources efficiently. In my previous post, I discussed our reasons for implementing an AWS multi-account strategy, our journey, and some of the benefits we gained as an organization.
However, implementing this strategy comes with its fair share of challenges. This article explores some of the challenges Detectify faced and the lessons we learned when adopting an AWS multi-account strategy.
Challenges when adopting an AWS multi-account strategy
Our domain teams needed help understanding where to draw the responsibility and accountability line.
We tried to shield these teams from spending time on infrastructure so that they could focus on what mattered most: the business. We aimed to ensure that domain teams didn’t need to worry about infrastructure, networking, and much more. Our concern was that it would slow down business development, which was the opposite of our goal.
Prioritization also became an issue for our Platform team. Some teams were eager to join this journey, while others feared the new world they would face.
Cognitive load overload
The Platform team (described by Team Topologies as a grouping of other team types that provide a compelling internal product to accelerate delivery by Stream-aligned teams) had to learn and adapt to many new technical challenges to help and assist other teams. The Platform team became the Enabling team (a team that helps a Stream-aligned team to overcome obstacles) as well.
But what does that mean, exactly?
Below I’ll outline some of our challenges and how they impacted our teams.
We had a team that provided a compelling internal product to accelerate delivery by Stream-aligned teams (a team that is aligned to a flow of work from a segment of the business domain). Options were narrow, but everything worked from a development and deployment perspective.
Previously, there was one way for our internal teams to develop and deploy their services or applications. This consisted of using only the pipelines and Kubernetes clusters provided by the Platform team.
By adopting the AWS multi-account strategy, we had many more opportunities. However, teams, including the Platform team, had to adapt to the new world while maintaining a historical set-up. Becoming an Enabling team meant the Platform team had to help the Stream-aligned teams overcome obstacles and detect missing capabilities.
The increased cognitive load on the Platform team delayed development. Stream-aligned teams started spending most of their time writing infrastructure code, which delayed application development. Teams also started using AWS services that we had not used before, and they relied on the Platform team for help and support with them.
This created cognitive load overload, which was the opposite of what we had set out to achieve.
Misalignment
Alignment and consistency are two of the most important things an organization needs to maintain in order to change, restructure, and adapt quickly. However, because we moved to a multi-account strategy so fast, we struggled to maintain both during our journey.
Teams couldn’t just rely on the Platform team, so they started exploring and implementing infrastructure components on their own. The idea was that they would help each other while the Platform team worked on the Golden Path (a recommended and supported way of building and deploying services) by collecting all the information and requirements.
It seemed like a great idea, but teams were not aligned, and that became a challenge. Each team had its own structure, naming conventions, and even deployment pipelines. It was hard for teams to help each other, and it was even harder to get help from the Platform team since things were inconsistent. The deployment pipelines that the Platform team offered didn’t work with these set-ups either.
Too generous with options
We used to have too few options for the AWS services we could use. By shifting to a multi-account strategy that gave teams their own AWS accounts, we were faced with the opposite issue. This created a challenge for the Platform team since teams still relied on them for help. Introducing one or two AWS services is fine, but introducing five or more at a time can be challenging for a not-so-big team.
Sometimes, the Platform team was too late to the party and suggested improvements after the service was already in production. This situation created frustration since it pushed development backward, which is never fun.
Technical challenges
To manage multi-account orchestration in this way, we had to learn a lot, especially the Platform team. These are some of the challenges we had to solve during our journey to a multi-account strategy:
Managing network infrastructure in a multi-account environment
There are different ways to handle network infrastructure in a multi-account environment:
- Deploy network infrastructure on each account and connect them via AWS Transit Gateway or VPC peering
- Have a dedicated network account and share network resources using AWS Resource Access Manager
We chose the second option, which provides better security, more control, and lower cost. However, it comes with challenges when sharing network resources with other AWS accounts: the names of the resources are not shared along with them, so we had to tackle naming differently.
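As an illustration only, and not a description of the solution Detectify actually built, here is a minimal sketch of one way to work around missing names: read the `Name` tags in the network account and plan equivalent tags for the account that receives the shared subnets. The function names and sample subnet IDs are hypothetical, and the input dictionaries are assumed to follow the shape of EC2 `DescribeSubnets` results (`SubnetId`, `Tags`).

```python
from typing import Dict, List


def name_by_subnet(owner_subnets: List[dict]) -> Dict[str, str]:
    """Map subnet ID to its Name tag, as seen from the network (owner) account."""
    names: Dict[str, str] = {}
    for subnet in owner_subnets:
        for tag in subnet.get("Tags", []):
            if tag["Key"] == "Name":
                names[subnet["SubnetId"]] = tag["Value"]
    return names


def plan_name_tags(owner_subnets: List[dict], participant_subnets: List[dict]) -> List[dict]:
    """Build tag requests for shared subnets that have no Name tag
    in the participant account but do have one in the owner account."""
    names = name_by_subnet(owner_subnets)
    plan = []
    for subnet in participant_subnets:
        subnet_id = subnet["SubnetId"]
        has_name = any(t["Key"] == "Name" for t in subnet.get("Tags", []))
        if not has_name and subnet_id in names:
            # Shaped like the keyword arguments of an EC2 CreateTags call.
            plan.append(
                {
                    "Resources": [subnet_id],
                    "Tags": [{"Key": "Name", "Value": names[subnet_id]}],
                }
            )
    return plan


if __name__ == "__main__":
    # Hypothetical sample data standing in for real API responses.
    owner = [{"SubnetId": "subnet-1", "Tags": [{"Key": "Name", "Value": "private-a"}]}]
    participant = [{"SubnetId": "subnet-1"}]
    print(plan_name_tags(owner, participant))
```

Keeping the planning step free of API calls makes it easy to review what would change before anything is applied. Whether the resulting requests can be applied from the participant account depends on the permissions in place, so treat that part as an assumption to verify.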
Enforcing security and compliance
We had a lot more control when we had one pipeline and one way of deploying services to production. The same level of control is possible in a multi-account environment, but there is a much wider surface to cover.
The Platform team had to implement the following:
- Secure deployment pipelines with single domain access
- Security controls on all used AWS resources and AWS accounts
- Access to AWS accounts based on the principle of least privilege
- Backup policies
- And much more
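To make the least-privilege item above more concrete, here is a minimal sketch of a helper that builds an IAM policy document and refuses wildcard actions or a bare `*` resource. The helper name, the validation rules, and the example bucket ARN are hypothetical and chosen for illustration; only the policy document layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON format.

```python
import json
from typing import Iterable


def least_privilege_policy(actions: Iterable[str], resource_arns: Iterable[str]) -> dict:
    """Build an Allow policy limited to explicit actions and resources."""
    action_list = sorted(set(actions))
    resource_list = sorted(set(resource_arns))
    if not action_list or not resource_list:
        raise ValueError("at least one action and one resource are required")
    for action in action_list:
        # Reject "*" and patterns such as "s3:*" so every action is explicit.
        if "*" in action:
            raise ValueError(f"wildcard action not allowed: {action}")
    if "*" in resource_list:
        raise ValueError("resource '*' is not allowed")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": action_list, "Resource": resource_list}
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name used only as an example.
    policy = least_privilege_policy(
        ["s3:GetObject"], ["arn:aws:s3:::example-team-artifacts/*"]
    )
    print(json.dumps(policy, indent=2))
```

A check like this could run in a deployment pipeline before a policy is applied, so overly broad permissions are caught early rather than after a service is in production.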
Configuring AWS accounts
The Platform team had to anticipate all the resources that old and new services run by Stream-aligned teams would need. Therefore, when creating an AWS account, we ran an account set-up that deployed and configured everything the team would need on that account. We had to ensure that the solution was scalable and that we could always add more resources as we progressed.
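The idea of an account set-up that can grow over time can be sketched as a small registry of set-up steps, where each new capability is added as one more step and accounts that already have a step skip it. This is a simplified illustration under assumptions, not the actual tooling: the step names, the placeholder step bodies, and the account ID are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

SetupStep = Callable[[str], str]

# Registry of set-up steps, kept in registration order.
_STEPS: Dict[str, SetupStep] = {}


def setup_step(name: str) -> Callable[[SetupStep], SetupStep]:
    """Decorator that registers a function as a named set-up step."""
    def register(func: SetupStep) -> SetupStep:
        _STEPS[name] = func
        return func
    return register


@setup_step("network")
def attach_shared_network(account_id: str) -> str:
    # Placeholder: a real step would configure access to shared subnets.
    return f"{account_id}: shared network attached"


@setup_step("backup")
def apply_backup_policy(account_id: str) -> str:
    # Placeholder: a real step would apply the backup policy.
    return f"{account_id}: backup policy applied"


def run_account_setup(account_id: str, already_applied: Iterable[str] = ()) -> List[str]:
    """Run every registered step that has not been applied to the account yet."""
    applied = set(already_applied)
    results = []
    for name, step in _STEPS.items():
        if name in applied:
            continue
        results.append(step(account_id))
    return results


if __name__ == "__main__":
    # New account: all steps run.
    print(run_account_setup("111122223333"))
    # Existing account that only lacks the backup step.
    print(run_account_setup("111122223333", already_applied=["network"]))
```

Adding a resource later means registering one more step, and re-running the set-up against existing accounts applies only what they are missing.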
Shared services
One of the most time-consuming set-ups was shared services like Amazon Managed Streaming for Apache Kafka (MSK). Given the service’s cost, having an MSK cluster in each Stream-aligned team’s AWS account would not be feasible. Therefore, we decided to deploy it to a central services account and manage permissions according to the principle of least privilege. This came with challenges, especially with AWS Lambda event source mapping, where a Lambda function is triggered directly by a Kafka message in a topic. We might share our solution in one of our future articles.
Conclusion
Implementing the AWS multi-account strategy has given Detectify scalability and flexibility, along with increased security, improved cost management, and better visibility into our cloud environment. However, it also came with challenges, such as cognitive load overload, misalignment, too many options, and significant technical hurdles. Despite these challenges, our Platform team was able to implement the strategy and successfully assist other teams.
Our journey is still not over, and we are looking forward to many more future learning opportunities!