refactor: full-stack restructure with multi-tenancy, workspace management, and K8s diagnostics

- Add Workspace domain (entity, repository, service, handler, DTO)
- Add multi-tenant K8s client with tenant binding and quota management
- Add K8s diagnostics client (instance diagnostics)
- Add authorization middleware (authz package)
- Restructure frontend to feature-based architecture (features/)
- Add User Management page in configuration
- Add AccessDenied page and route guards
- Refactor shared components (form inputs, layout, UI)
- Update Tailwind config for new design system
- Add comprehensive documentation (docs/, tasks/, plans)
- Improve cluster service with better kubeconfig handling
- Add tests for crypto, config, helm client, tenant binding

127
Multi-Tenant Kubeconfig.md
Normal file
@ -0,0 +1,127 @@
# Technical Specification: Multi-Tenant Kubeconfig & Auth Gateway

## 1. System Overview & Goals

- **Objective**: Develop a backend API service that automates Kubernetes multi-tenant onboarding (Namespace + Quota isolation) and securely distributes short-lived, dynamic `kubeconfig` files using the Kubernetes `TokenRequest` API.
- **Architecture Independence**: This backend service acts as a standalone control plane. It is **not** strictly bound to a BFF pattern and does **not** need to run inside the target Kubernetes cluster (it supports Out-of-Cluster execution).
- **Out of Scope**: This spec does NOT cover the frontend UI implementation or the downstream workload deployment. It focuses strictly on identity, tenant provisioning, and credential brokering.
- **Security Principles**: Adhere strictly to Zero-Knowledge architecture (no token storage in DB), Ephemeral Credentials (short-lived tokens only), and Least Privilege (the Gateway must NOT be a `cluster-admin`).

## 2. Architecture & Topology

- **Tech Stack**: Go `net/http` (or FastAPI), utilizing the official Kubernetes Client SDK (`client-go` or `kubernetes-client/python`).
- **Control Plane Flow**:
  1. Client/Frontend -> Gateway: User requests environment access.
  2. Gateway -> K8s API: Gateway authenticates to the target K8s cluster using its own restricted credentials (e.g., an Out-of-Cluster `kubeconfig`; see 6.2).
  3. Gateway -> K8s API: Executes Namespace/SA creation (if new) or calls the `TokenRequest` API (if existing).
  4. Gateway -> Client/Frontend: Returns a generated `kubeconfig` YAML string with the short-lived JWT token.

## 3. Core Business Logic Workflows

### Phase 1: Tenant Initialization (Onboarding)

Triggered when a new user registers or requests a workspace for the first time. The Gateway must create the following four resources in order. The Kubernetes API has no multi-object transactions, so each step must be idempotent and safe to retry after a partial failure:

1. **Namespace**: `tenant-{user_uuid}`
2. **ServiceAccount**: `sa-tenant-admin` (created inside the tenant's namespace).
3. **RoleBinding**: Bind `sa-tenant-admin` to the `admin` (or custom) ClusterRole, strictly isolated within `tenant-{user_uuid}`.
4. **ResourceQuota**: Enforce limits (e.g., `requests.cpu: "4"`, `limits.memory: "16Gi"`) to prevent noisy neighbors.

### Phase 2: Credential Distribution (Dynamic Token)

Triggered when the user requests CLI access or downloads a kubeconfig.

1. Locate the user's associated Namespace and ServiceAccount, verifying the user's ownership of the workspace.
2. Audit Logging: Record the credential issuance event (User, IP, Workspace) into the database.
3. Call the `authentication.k8s.io/v1 TokenRequest` API targeting `sa-tenant-admin` in the specific tenant's namespace.
4. Set `expirationSeconds: 7200` (2 hours). Hard limit; cannot be extended.
5. Retrieve the generated JWT token and inject it into a pre-defined `kubeconfig` text template.

### Phase 3: Automated Renewal & Emergency Suspension

- **Session Management**: If accessed via a Web UI, the Gateway intercepts requests, attaches the dynamic token, and forwards them. If the token is within 10 minutes of expiration, the Gateway automatically issues a new TokenRequest.
- **Emergency Suspension**: If a workspace is marked compromised, the Gateway deletes its K8s `RoleBinding`. Tokens already issued still authenticate until they expire, but RBAC denies every request they make, so access is cut off immediately for all active tokens of that tenant.

## 4. API Contracts

### 4.1. Initialize Tenant Workspace

- **Route**: `POST /api/v1/workspaces/init`
- **Auth**: Gateway Session / Bearer Token
- **Rate Limit**: Strictly rate-limited per user to prevent Namespace exhaustion.
- **Request Payload** (`tier` determines the ResourceQuota template):

```json
{
  "tier": "basic"
}
```

- **Response Payload (201 Created)**:

```json
{
  "namespace": "tenant-a1b2c3d4",
  "status": "provisioned",
  "quota": {"cpu": "4", "memory": "8Gi"}
}
```

### 4.2. Generate Dynamic Kubeconfig

- **Route**: `GET /api/v1/workspaces/credentials/kubeconfig`
- **Auth**: Gateway Session / Bearer Token
- **Response Payload (200 OK)**: Returns raw `application/x-yaml` content.

```yaml
apiVersion: v1
clusters:
- cluster:
    server: https://<k8s-api-server>
    certificate-authority-data: <ca-base64>
  name: internal-cluster
contexts:
- context:
    cluster: internal-cluster
    namespace: tenant-a1b2c3d4 # Default context locked to their namespace
    user: sa-tenant-admin
  name: tenant-context
current-context: tenant-context
kind: Config
users:
- name: sa-tenant-admin
  user:
    token: "eyJhbGciOiJSUzI1NiIs..." # Short-lived token injected here
```

### 4.3. Suspend Workspace (Emergency Kill Switch)

- **Route**: `POST /api/v1/workspaces/{id}/suspend`
- **Auth**: Admin Only
- **Behavior**: Updates DB status to `suspended` and deletes the associated K8s `RoleBinding`.

## 5. Data Architecture & Persistence

- **Database**: PostgreSQL (Relational mapping between Users and K8s Namespaces).
- **Table**: `users`
  - `id` (UUID, PK), `email`, `password_hash`, `status`
- **Table**: `workspaces`
  - `id` (UUID, PK)
  - `user_id` (UUID, FK to Users table)
  - `k8s_namespace` (String, unique)
  - `k8s_sa_name` (String)
  - `tier` (String)
  - `created_at` (Timestamp)
- **Table**: `audit_logs` (Security Compliance)
  - `id` (UUID, PK), `user_id` (UUID), `workspace_id` (UUID), `action` (e.g., IssueKubeconfig), `ip_address`, `created_at`
- **Constraint**: We do NOT store the K8s Token in the database. Tokens are ephemeral and generated on-the-fly.

## 6. Security, Threat Mitigation & Infrastructure Constraints

### 6.1 Threat Model

| Threat | Mitigation Strategy |
| :--- | :--- |
| **Gateway Compromise** | The Gateway uses a strictly restricted K8s role. It cannot read existing `Secrets` or interfere with other tenants' running Pods. |
| **Token Theft (XSS)** | Application-level Auth must use `HttpOnly, Secure` Cookies. Generated Kubeconfigs expire in 2 hours. |
| **Resource Abuse (Mining)** | Hardcoded `ResourceQuota` per tenant upon creation. `LimitRange` defaults applied in every tenant namespace (`LimitRange` is namespace-scoped, so there is no single cluster-wide instance). |

### 6.2 Restricted Gateway Credentials (Crucial)

The Gateway requires a K8s credential (Out-of-Cluster `kubeconfig` or Cloud IAM Role) to operate. **This credential MUST NOT have `cluster-admin` privileges.** It should be bound to a custom `ClusterRole` with ONLY the following permissions:

- `create`, `get`, `list` on `namespaces`, `resourcequotas`.
- `create`, `get`, `list` on `serviceaccounts`, `rolebindings`.
- `delete` on `rolebindings` (required by the suspension flow in 4.3).
- `bind` on the `ClusterRole` referenced by tenant RoleBindings (e.g., `admin`). Without it, Kubernetes RBAC escalation prevention rejects a RoleBinding that grants permissions the Gateway does not itself hold.
- `create` on `serviceaccounts/token` (CRITICAL for TokenRequest API).
- *Strictly prohibited*: `get` or `list` on `secrets`, `pods`, or `deployments`.

### 6.3 Deployment & Networking

- **Deployment Agnostic**: The application will be packaged as a Docker image and can be deployed via Docker Compose, standalone VMs, or within a Kubernetes cluster.
- **CORS/CSP**: Since this might not be a single-origin BFF, explicit CORS policies (`Access-Control-Allow-Origin`) must be tightly defined if the frontend is hosted on a separate domain. Wildcards (`*`) are prohibited.
