Skip to content

Commit

Permalink
Add post "k8s-multi-tenancy" (#4364)
Browse files Browse the repository at this point in the history
Signed-off-by: Carson Yang <[email protected]>
  • Loading branch information
yangchuansheng authored Nov 29, 2023
1 parent 9466c12 commit 67dd44e
Show file tree
Hide file tree
Showing 2 changed files with 210 additions and 0 deletions.
94 changes: 94 additions & 0 deletions docs/blog/en/2023/k8s-multi-tenancy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
---
slug: k8s-multi-tenancy
title: The Promise and Challenges of Kubernetes Multi-Tenant
description: explores the value proposition of multi-tenant Kubernetes, implementation hurdles, and potential solutions to unlock its benefits.
authors: [fanux]
tags: [Kubernetes, Sealos, Multi-Tenant]
keywords: [Cloud Operating System, Sealos, K8s, Cloud Native, Cloud Computing, Cloud OS, PaaS, Multi-Tenant, Runtime Isolation, Namespace]
image: https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-17-36-fBsk9p.jpg
date: 2023-11-29T10:00
---

In today's business landscape, managing cloud and server resources is becoming increasingly unwieldy as companies diversify and scale. While powerful, Kubernetes lacks native support for multi-tenancy - the ability to securely isolate multiple tenant workloads. This gap creates deployment friction for teams and missed efficiency opportunities for enterprises.

**This article explores the value proposition of multi-tenant Kubernetes, implementation hurdles, and potential solutions to unlock its benefits.**

<!--truncate-->

## The Potential of Multi-Tenant Kubernetes

Multi-tenancy refers to an architecture allowing multiple users or "tenants" to share resources from the same system while keeping data isolated and secure. For Kubernetes, this means running workloads from different teams on a shared cluster without risk of resource conflicts, data leaks, or security issues.

![Diagram of single vs multi-tenant Kubernetes](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-rLPyaY.jpg)

### Pain Points of Single-Tenant Setups

Consider an enterprise Kubernetes cluster used by 20 internal departments. Without multi-tenancy, several pain points emerge:

1. **Inefficiency** - Deployments get bottlenecked through cluster administrators, hampering velocity.
2. **Underutilization** - Workloads cannot mix, leading to resource stranding.
3. **Sprawl** - Lacking isolation allows cluster entanglement over time.
4. **Limitations** - Fixed single-tenant structure strains under changing demands.

![Comparison table of single vs multi-tenant Kubernetes](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-15-53-DGg4ig.png)

### The Multi-Tenant Advantage

Conversely, effective multi-tenancy unlocks greater cloud agility:

1. **Organization** - Teams self-manage resources without wasteful allocation conflicts.
2. **Velocity** - Services rapidly provision without administrative bottlenecks.
3. **Efficiency** - Cluster administrators focus holistically rather than application-by-application.
4. **Resilience** - Workload isolation enhances stability despite diverse deployments.

## Navigating Multi-Tenancy Challenges in Kubernetes

In the Kubernetes (K8s) landscape, establishing a multi-tenant architecture is a complex endeavor, transcending the mere application of namespaces and involving an array of technical intricacies.

### Challenge 1: Curbing Over-Privilege Risks

Central to a multi-tenant K8s framework is the strict regulation of user permissions. In scenarios where a cluster is shared among several users, overly privileged individuals can pose a serious threat. Prohibitions against accessing server nodes or executing node-level commands, such as `kubectl get node`, are essential. Further, it's crucial to curtail other high-risk activities, including the activation of container privileged modes, and sharing of host filesystems, ports, and networks.

Sealos addresses these concerns through a multi-faceted isolation approach. It employs OpenEBS for block-level storage isolation, Firecracker and Cloud Hypervisor for computational isolation, and Cilium for network isolation, ensuring that the activities of one tenant do not adversely affect others.

### Challenge 2: User Identity, Authorization, and Namespace Ties

Inherent to K8s is the absence of a native user management framework. This necessitates the creation of a user identity system, integration with external user management platforms, and issuance of unique kubeconfig files or tokens. Moreover, it's imperative to forge a multifaceted linkage between users and namespaces, coupled with the distribution of tailored permissions.

![Image Depicting User-Namespace Association](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-Dfn5xa.png)

Sealos's framework enables administrators to effectively slot users into designated namespaces and regulate their roles, thereby achieving a granular control over permissions. This guarantees that users access only the resources they are legitimately permitted to use.

![Image Illustrating User Permissions Management](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-wknQxI.png)

![Image Showcasing User Role Control](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-RQFrTB.png)

### Challenge 3: Metering and Managing Quotas

A critical aspect of multi-tenancy in K8s is the equitable distribution and meticulous tracking of resource usage, including CPU, memory, disk, and network utilization. Managing excess usage and differentiating between internal and external network traffic are particularly challenging, as is accurately attributing traffic to specific containers and tenants.

Utilizing eBPF technology, Sealos adeptly monitors network traffic, correlating it with tenant information and storing it in a database for precise billing and resource management. For compute and storage resources, Sealos relies on controllers to gather and administer relevant data, ensuring efficient resource oversight.

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-36-HsycaI.png)

## Extreme Multi-Tenancy - The Sealos Challenge

In the realm of multi-tenancy, Sealos embarks on an ambitious journey, operating within the unpredictable confines of a public network. This scenario invites any developer to join and partake in a communal Kubernetes cluster, which inherently raises substantial security and stability risks.

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-54-kbCMsN.png)

The method adopted by Sealos brings forth distinct benefits: cost-effectiveness, as it negates the need for users to independently build and manage their clusters, leading to significant cost reductions in cloud services. It also enhances resource utilization, allowing container operations on a smaller scale, thereby leveraging the platform’s flexibility and resources. Crucially, establishing strong isolation in such a public network setting can bolster security and stability.

Nevertheless, Sealos is confronted with a series of formidable challenges:

**Challenge 1: Overcoming Gateway Limitations** The substantial user base of Sealos generates immense traffic, pushing the boundaries of many mainstream open-source gateways. A single user update can potentially impact the entire user base, as systems like Nginx require configuration reloads. Moreover, specific well-known gateways, which remain unnamed, struggle with issues like CPU overload and delays in configuration effectiveness. Sealos has proactively engaged with upstream communities for potential improvements.

**Challenge 2: Addressing Runtime Isolation** Strong isolation is critical for ensuring security in Sealos's multi-tenant environment, but current mainstream runtime environments fall short of meeting these requirements. For example, Firecracker's inadequate GPU support presents a significant limitation for high-performance computing applications.

**Challenge 3: Ensuring Storage Isolation** A paramount concern for Sealos is the isolation of tenant data to prevent unauthorized access or data breaches. The goal is to implement block-level storage isolation, a challenging but necessary endeavor.

**Challenge 4: Network Metering and Contention Management** Managing and accurately metering network resources is essential in a multi-tenant infrastructure. Sealos is committed to distributing these resources fairly, particularly in situations where resource contention occurs, to ensure that all users have fair and efficient access to network resources.

**Summary**

Only with the advancement and maturity of multi-tenancy can the true essence of a cloud be actualized, harnessing over ninety percent of its inherent capabilities. Facing the complexities and unpredictabilities of the public internet, Sealos excels by providing secure and isolated multi-tenant environments. It achieves this while ensuring efficiency and reducing operational costs. The underlying technology Sealos uses is elegantly designed, catering perfectly to businesses where a single cloud platform is shared among all developers.
116 changes: 116 additions & 0 deletions docs/blog/zh-Hans/2023/k8s-multi-tenancy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
---
slug: k8s-multi-tenancy
title: K8s 多租户方案的挑战与价值
description: 本文深入探讨了 K8s 多租户的概念、其在现代企业中的应用价值,以及实现这一机制所面临的技术挑战和解决方案。
authors: [fanux]
tags: [Kubernetes, Sealos, 多租户]
keywords: [云操作系统, Sealos, K8s, 云原生, 多租户, 隔离, 命名空间]
image: https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-17-36-fBsk9p.jpg
date: 2023-11-29T10:00
---

在当今企业环境中,随着业务的快速增长和多样化,服务器和云资源的管理会越来越让人头疼。K8s 虽然很强大,但在处理多个部门或团队的业务部署需求时,如果缺乏有效的多租户支持,在效率和资源管理方面都会不尽如人意。

**本文将深入探讨 K8s 多租户的概念、其在现代企业中的应用价值,以及实现这一机制所面临的技术挑战和解决方案。**

<!--truncate-->

## K8s 多租户的价值

“多租户”是一种软件架构的设计方式,允许多个用户(租户)共享相同的系统或程序组件,同时保持各自数据的隔离性和安全性。在 K8s 环境中,实现有效的多租户机制意味着能够在同一 K8s 集群中运行多个独立的租户工作负载,而无需担心资源冲突、数据泄露或安全问题。

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-rLPyaY.jpg)

### 没有多租户支持的挑战

当企业购买很多服务器并安装 K8s 后,企业内部多个部门 (例如 20 个) 可能都需要在同一集群上部署各自的业务应用。在没有多租户机制的情况下,这种集群使用方式有很多弊端:

1. **低效率:**每个部门不能自主使用集群,必须通过集群管理员进行部署和管理。不仅减慢了部署进程,还可能造成排队等待的情况,大大降低工作效率。

2. **资源利用不充分:**业务应用不能混合部署,需要对服务器资源进行划分,最终可能会导致资源无法充分利用,造成浪费。

3. **业务和资源管理混乱:**在一个没有租户隔离的集群中,各部门的业务相互干扰,难以管理。随着时间的推移,集群的管理和运维变得越来越复杂。

4. **规模扩展受限:**在一个单一租户的环境下,集群难以支持多样化的业务需求,限制了企业的扩展能力。

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-yua4G3.png)

### 多租户架构的优势

一旦有了多租户能力,企业就可以真正意义上构建自己的云环境,实现资源的最大化利用和高效管理:

1. **资源管理有序:**通过账户系统,每个部门可以自主管理其资源使用,无需担心资源分配和使用上的混乱。

2. **效率大幅提升:**各部门可以独立进行业务部署和更新,无需麻烦集群管理员,极大提高了操作效率和业务的灵活性。

3. **资源扩展灵活:**集群管理员只需关注整个集群的资源状况,而不是单个业务应用,资源按需分配和扩展会更加灵活和高效。

4. **业务隔离保障稳定性:**不同部门的业务应用在集群中彼此隔离,避免了相互干扰,保障了业务的稳定运行。

## K8s 多租户的挑战

在 K8s 环境中实现多租户架构难度非常大,不是简单使用命名空间的能力就能实现的,还涉及到非常多的技术挑战。

### 挑战 1:防止越权

在 K8s 多租户环境中,限制每个用户的权限是关键。当多个用户共享一个集群时,一个权限过高的用户可能会对整个集群构成致命威胁。例如,禁止用户访问服务器节点或执行节点级别的操作,如使用 `kubectl get node` 命令。此外,需要限制其他高风险操作,如启用容器特权模式、共享主机文件系统、端口和网络等。

为了解决这些问题,Sealos 在其底层架构中采用了多种隔离手段。例如,使用 OpenEBS 进行存储的块级别隔离,Firecracker 以及 Cloud Hypervisor 用于计算运行时的隔离,以及通过 Cilium 实现网络隔离。这些措施确保即使在共享环境中,每个租户的操作也不会影响到其他租户。

### 挑战 2:用户的概念、授权与命名空间绑定

K8s 本身不具备原生的用户管理系统。因此,需要通过扩展功能来构建用户概念,与第三方用户系统对接,为每个用户生成独立的 kubeconfig 认证文件或 token。此外,需要建立用户与命名空间 (namespace) 之间的多对多关系,并为用户分配适当的权限。

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-Dfn5xa.png)

Sealos 的设计允许管理员将用户加入特定的命名空间,并对其角色进行管理,从而有效地控制权限。这样管理员就可以细粒度地管理用户权限,确保每个用户只能访问和修改他们被授权的资源。

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-wknQxI.png)

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-34-RQFrTB.png)

### 挑战 3:计量与配额管理

在多租户环境中,合理地分配和监控资源使用是另一个重大挑战。需要明确每个租户使用了多少 CPU、内存、磁盘和网络资源,并在资源使用超出配额时进行适当的处理。网络计量尤其复杂,需要区分内外网流量,而且要追踪到达特定容器的流量,并确定这些容器属于哪个租户。

Sealos 采用 eBPF 技术来监控网络流量,并通过控制器将流量数据与租户信息相关联,存储到数据库中。这样可以与计量计费系统对接,实现对资源使用的准确计费。对于计算和存储资源的监控,Sealos 同样采用了控制器来收集和管理这些信息。

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-36-HsycaI.png)

## Sealos 多租户的挑战

如果说上面的这些问题很难解决,那么 **Sealos 的场景是在上述难度上乘以了 10 倍**,因为 Sealos 选择了在公网这个不可信的环境中解决多租户问题,意味着给任意的开发者公开注册,然后一起共享一个 K8s 集群。

![](https://jsd.onmicrosoft.cn/gh/yangchuansheng/imghosting6@main/uPic/2023-11-29-10-54-kbCMsN.png)

公网环境的不可信性和开放性使得实现多租户变得尤为复杂。在这种环境下,任何开发者都可以注册并共享同一个 K8s 集群,这就带来了巨大的安全和稳定性挑战。但是,如果能够成功实现,其好处也是显而易见的:

1. **成本效益**:用户无需单独搭建和维护完整的集群,显著降低了云服务的使用成本。
2. **资源优化**:允许每个容器运行在更小的规模上,充分利用平台的弹性和资源。
3. **强隔离性**:在公网环境中实现良好的多租户隔离,可以确保更高的安全性和稳定性。

前途的光明的,但挑战也是巨大的。

### 挑战 1:网关限制

Sealos 目前拥有数万注册用户,流量很大,我们几乎已经打爆了市面上很多主流的开源网关。在这种情况下,一个用户的更新可能导致所有其他用户受到影响,因为 Nginx 需要重新加载配置,这显然是不能接受的。

还有某知名网关 (不便透露),控制器 CPU 很容易就被打爆。

还有某知名网关 (同样不便透露),数量一多配置生效需要超过两分钟。不过我们已经和上游社区进行了反馈,应该很快会有改进。

### 挑战 2:运行时隔离问题

Sealos 需要实现强隔离以保证多租户环境的安全。然而,市面上的主流运行时环境并不能满足 Sealos 的隔离需求。例如,Firecracker 无法提供对 GPU 的良好支持,这对于需要高性能计算的应用来说是一个比较严重的限制。

### 挑战 3:存储隔离问题

Sealos 需要确保不同租户的数据彼此隔离,防止数据泄露或被其他租户错误访问。这就需要实现块级别的存储隔离,挑战也很大。

### 挑战 4:网络计量和争用管理

最后,网络资源的计量和管理也是多租户环境中的关键问题。Sealos 需要准确地计量每个用户的网络使用情况,并且在资源有限的情况下合理地分配网络资源。当网络资源产生争用时,需要有机制来公平地解决这些争用,确保所有用户都能公平合理地使用网络资源。

## 总结

多租户成熟了才能算作是一朵真正的云,才能把云的威力发挥到九成以上。面对公网这一极其复杂和不可预测的环境,Sealos 不仅实现了多租户的隔离和安全,还在保障高效运行的同时,降低了成本。且底层使用了非常多优雅的技术方案,彻底解决企业所有开发者共享一朵云的需求。

0 comments on commit 67dd44e

Please sign in to comment.