A service mesh is a tool for adding observability, security, and traffic management capabilities at the application layer. It is intended to help developers and site reliability engineers (SREs) with service-to-service communication within Kubernetes clusters. The challenges of deploying and managing microservices led to the creation of the service mesh, but service mesh solutions introduce complexities and challenges of their own.
Here, we’ll explore use cases, challenges, and how to decide if a service mesh is right for you.
Use Cases Driving Adoption
There are four main use cases driving interest in service mesh adoption among DevOps teams, platform owners, and service owners: data-in-transit encryption, service-level observability, service-level control, and secure cross-cluster connectivity.
- Data-in-transit encryption – Security for data in transit within a cluster. This is often driven by industry-specific regulatory requirements, such as PCI compliance or HIPAA, or by internal data security requirements. When the security of internet-facing applications is at the core of an organization’s brand and reputation, security becomes a top priority.
- Service-level observability – Visibility into how workloads and services are communicating at the application layer. Kubernetes is a multi-tenant environment. As more workloads and services are deployed, it becomes harder to understand how everything is working together, especially if an organization is embracing a microservices-based architecture. Service teams want to understand their upstream and downstream dependencies.
- Service-level control – Controlling which services can talk to one another. This includes the ability to implement best practices around a zero-trust model.
- Secure cross-cluster connectivity – As services become shared and centralized in multi-cluster environments, securing and authorizing communication between clusters is another requirement for platform operators.
These are the main drivers for service mesh adoption, but the operational complexity involved in achieving robust security, encryption, service-level observability, and service-level control through the use of a service mesh can be a deterrent to adoption.
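To make the service-level control use case more concrete, here is a minimal sketch of a default-deny (zero-trust style) allowlist: a call between two services is permitted only if a rule explicitly names that pair. The service names and the `Rule` and `ServicePolicy` types are hypothetical and exist only for illustration. Real meshes and network policy engines express the same idea in their own configuration formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Permit `source` to call `destination` (hypothetical service names)."""
    source: str
    destination: str


class ServicePolicy:
    """Default-deny policy: a call is allowed only if a rule explicitly permits it."""

    def __init__(self, rules):
        # Store the permitted (source, destination) pairs for fast lookup.
        self._allowed = {(r.source, r.destination) for r in rules}

    def is_allowed(self, source: str, destination: str) -> bool:
        # Anything not listed is denied, including the reverse direction.
        return (source, destination) in self._allowed


policy = ServicePolicy([
    Rule("frontend", "checkout"),
    Rule("checkout", "payments"),
])

print(policy.is_allowed("frontend", "checkout"))  # True
print(policy.is_allowed("frontend", "payments"))  # False
```

Because nothing is allowed by default, adding a new service does not widen access until someone writes a rule for it.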
Challenges of Service Mesh
Complexity and performance are the two main challenges posed by service mesh.
- Additional control plane: Service mesh is difficult to set up and manage, and requires a specialized skill set. Using a service mesh introduces an additional control plane for security teams to manage, which causes increased deployment complexity and significant operational overhead.
- Specialized skills: While there are many service meshes available, there’s no one-size-fits-all solution to address the needs of different organizations. With that, it’s likely that security teams will spend time figuring out which service mesh will work for their applications. Use of a service mesh requires domain knowledge and specialized skills around whichever service mesh the user chooses. That adds another layer of complexity in addition to the work the team is already doing with Kubernetes.
- Performance issues: A service mesh adds latency to service-to-service calls, typically because traffic passes through additional proxies, and this can create performance issues.
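As a rough illustration of how added latency accumulates, the sketch below assumes a sidecar-style mesh in which each service-to-service call passes through two proxies (one beside the caller, one beside the callee). The function name and the per-proxy figure are hypothetical placeholders, not measurements of any particular mesh.

```python
def added_latency_ms(call_depth: int, per_proxy_ms: float, proxies_per_hop: int = 2) -> float:
    """Estimate the extra latency a mesh adds to one request.

    call_depth      -- number of sequential service-to-service calls in the request path
    per_proxy_ms    -- assumed overhead of a single proxy traversal (placeholder value)
    proxies_per_hop -- proxies crossed per call; 2 assumes a sidecar on each side
    """
    return call_depth * proxies_per_hop * per_proxy_ms


# A request that fans through 4 sequential calls, with an assumed 0.5 ms per proxy.
print(added_latency_ms(call_depth=4, per_proxy_ms=0.5))  # 4.0
```

The overhead grows with the depth of the call chain, which is why deeply nested microservice architectures tend to feel it most.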
Platform owners, DevOps teams, and SREs have limited resources, so adopting a service mesh is a significant undertaking due to the resources required for configuration and operation. So, who needs a standalone service mesh?
Do I Need a Service Mesh?
If security and observability are your primary drivers, a lightweight approach will likely suffice in place of a separate, standalone service mesh. With a lightweight service mesh, you can achieve full-stack observability and security, deploy highly performant encryption, and tightly integrate with existing security infrastructure like firewalls.
Lightweight Approach to Service Mesh
When looking for a lightweight service mesh, it’s important to consider the following:
- Security: Look for a solution that offers encryption for data in transit that leverages the latest in crypto technology. This will ensure that encryption is highly performant while still allowing visibility into all traffic flows.
- Observability: The solution should offer visibility into service-to-service communication in a way that is resource efficient and cost effective. It should provide Kubernetes-native visualizations of all the data it collects so you can visualize communication flows across services and team spaces, to facilitate troubleshooting. This is beneficial to platform operators, service owners, and development teams.
- Implementing controls: Your chosen solution should provide capabilities for implementing controls for the full stack, from the network layer up through the application layer. This will ensure that you get the application-layer controls you would get with a service mesh, but are able to combine those with controls you might want to implement at the network or transport layer.
One specific capability to look for is egress access control. This capability makes it easier to integrate with firewalls or other controls where you want to understand the origin of egress traffic and apply rules based on it. If you’re working with a SIEM or another log management or monitoring tool, it helps to be able to trace egress traffic seen outside the cluster back to the specific application or namespace that produced it.
- Cluster mesh and federation: Look for a solution that provides a cluster mesh that can be used to federate the identity of endpoints and services across clusters, allowing teams to define policies that explicitly authorize and secure the cross-cluster communication that is required as more shared services and APIs become centralized. This allows platform owners to realize all of the benefits of a multi-cluster environment without incurring the overhead of operating a service mesh in each cluster.
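The egress attribution idea described above can be sketched as a small enrichment step: before a flow record is shipped to a SIEM, it is tagged with the namespace and workload that produced it. The inventory, IP addresses, field names, and the `attribute_egress` helper are all hypothetical. In practice, this mapping would come from whatever component tracks pod identity in your cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    namespace: str
    name: str


# Hypothetical mapping from pod IP to the workload that owns it.
POD_INVENTORY = {
    "10.0.1.12": Workload("payments", "card-processor"),
    "10.0.2.7": Workload("storefront", "frontend"),
}


def attribute_egress(flow: dict, inventory: dict) -> dict:
    """Return a copy of `flow` tagged with the originating namespace and workload."""
    workload = inventory.get(flow["src_ip"])
    enriched = dict(flow)  # leave the original record untouched
    enriched["namespace"] = workload.namespace if workload else "unknown"
    enriched["workload"] = workload.name if workload else "unknown"
    return enriched


flow = {"src_ip": "10.0.1.12", "dst": "api.example.com", "dst_port": 443}
record = attribute_egress(flow, POD_INVENTORY)
print(record["namespace"], record["workload"])  # payments card-processor
```

Records from unrecognized sources are tagged "unknown" rather than dropped, so unexpected egress still shows up in downstream tooling.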
Deciding What’s Right for You
While standalone service meshes are right for some use cases, if security and observability are your organization’s main goals, a dedicated service mesh probably isn’t necessary. Other solutions can provide granular observability and security, not just at the application layer but across the full stack, while avoiding the operational complexity and overhead often associated with deploying a service mesh.