# Technical Specification: Multi-Tenant Kubeconfig & Auth Gateway

## 1. System Overview & Goals

- **Objective**: Develop a backend API service that automates Kubernetes multi-tenant onboarding (Namespace + Quota isolation) and securely distributes short-lived, dynamic `kubeconfig` files using the Kubernetes `TokenRequest` API.
- **Architecture Independence**: This backend service acts as a standalone control plane. It is **not** strictly bound to a BFF pattern and does **not** need to run inside the target Kubernetes cluster (it supports Out-of-Cluster execution).
- **Out of Scope**: This spec does NOT cover the frontend UI implementation or the downstream workload deployment. It focuses strictly on identity, tenant provisioning, and credential brokering.
- **Security Principles**: Adhere strictly to Zero-Knowledge architecture (no token storage in DB), Ephemeral Credentials (short-lived tokens only), and Least Privilege (the Gateway must NOT be a `cluster-admin`).

## 2. Architecture & Topology

- **Tech Stack**: Go `net/http` (or FastAPI), utilizing the official Kubernetes Client SDK (`client-go` or `kubernetes-client/python`).
- **Control Plane Flow**:
    1. Client/Frontend -> Gateway: User requests environment access.
    2. Gateway -> K8s API: Gateway authenticates to the target K8s cluster using its own restricted credentials (e.g., an Out-of-Cluster `kubeconfig`; see 6.2).
    3. Gateway -> K8s API: Executes Namespace/SA creation (if new) or calls the `TokenRequest` API (if existing).
    4. Gateway -> Client/Frontend: Returns a generated `kubeconfig` YAML string containing the short-lived JWT token.

## 3. Core Business Logic Workflows

### Phase 1: Tenant Initialization (Onboarding)

Triggered when a new user registers or requests a workspace for the first time. The Gateway must create the four resources below, in order. The Kubernetes API has no multi-object transactions, so each step should be idempotent and a partially provisioned tenant must be safe to retry or clean up.

1. **Namespace**: `tenant-{user_uuid}`
2. **ServiceAccount**: `sa-tenant-admin` (created inside the tenant's namespace).
3. **RoleBinding**: A namespaced RoleBinding in `tenant-{user_uuid}` that binds `sa-tenant-admin` to the `admin` (or a custom) ClusterRole, so the granted permissions apply only inside that namespace.
4. **ResourceQuota**: Enforce limits (e.g., `requests.cpu: "4"`, `limits.memory: "16Gi"`) to prevent noisy neighbors.

### Phase 2: Credential Distribution (Dynamic Token)

Triggered when the user requests CLI access or downloads a kubeconfig.

1. Locate the user's associated Namespace and ServiceAccount, verifying the user's ownership of the workspace.
2. Audit Logging: Record the credential issuance event (User, IP, Workspace) in the database.
3. Call the `authentication.k8s.io/v1` `TokenRequest` API targeting `sa-tenant-admin` in the specific tenant's namespace.
4. Set `expirationSeconds: 7200` (2 hours). This is a hard limit and cannot be extended. The API server may issue a token with a different lifetime than requested, so the Gateway should read the expiration timestamp returned in the response rather than assume 7200 seconds.
5. Retrieve the generated JWT token and inject it into a pre-defined `kubeconfig` text template.

### Phase 3: Automated Renewal & Emergency Suspension

- **Session Management**: If accessed via a Web UI, the Gateway intercepts requests, attaches the dynamic token, and forwards them. If the token is within 10 minutes of expiration, the Gateway automatically issues a new `TokenRequest`.
- **Emergency Suspension**: If a workspace is marked compromised, the Gateway deletes its K8s `RoleBinding`. Tokens already issued for that tenant still authenticate until they expire, but every request they make is denied by RBAC as soon as the API server observes the deletion, which revokes access for all active tokens at once.

## 4. API Contracts

### 4.1. Initialize Tenant Workspace

- **Route**: `POST /api/v1/workspaces/init`
- **Auth**: Gateway Session / Bearer Token
- **Rate Limit**: Strictly rate-limited per user to prevent Namespace exhaustion.
- **Request Payload** (`tier` determines the ResourceQuota template):
```json
{
  "tier": "basic"
}
```
- **Response Payload (201 Created)**:
```json
{
  "namespace": "tenant-a1b2c3d4",
  "status": "provisioned",
  "quota": {"cpu": "4", "memory": "8Gi"}
}
```

### 4.2. Generate Dynamic Kubeconfig

- **Route**: `GET /api/v1/workspaces/credentials/kubeconfig`
- **Auth**: Gateway Session / Bearer Token
- **Response Payload (200 OK)**: Returns raw `application/x-yaml` content.
```yaml
apiVersion: v1
clusters:
- cluster:
    server: https://
    certificate-authority-data:
  name: internal-cluster
contexts:
- context:
    cluster: internal-cluster
    namespace: tenant-a1b2c3d4 # Default context locked to their namespace
    user: sa-tenant-admin
  name: tenant-context
current-context: tenant-context
kind: Config
users:
- name: sa-tenant-admin
  user:
    token: "eyJhbGciOiJSUzI1NiIs..." # Short-lived token injected here
```

### 4.3. Suspend Workspace (Emergency Kill Switch)

- **Route**: `POST /api/v1/workspaces/{id}/suspend`
- **Auth**: Admin Only
- **Behavior**: Updates the DB status to `suspended` and deletes the associated K8s `RoleBinding`.

## 5. Data Architecture & Persistence

- **Database**: PostgreSQL (relational mapping between Users and K8s Namespaces).
- **Table**: `users`
    - `id` (UUID, PK), `email`, `password_hash`, `status`
- **Table**: `workspaces`
    - `id` (UUID, PK)
    - `user_id` (UUID, FK to `users`)
    - `k8s_namespace` (String, unique)
    - `k8s_sa_name` (String)
    - `tier` (String)
    - `created_at` (Timestamp)
- **Table**: `audit_logs` (Security Compliance)
    - `id` (UUID, PK), `user_id` (UUID), `workspace_id` (UUID), `action` (e.g., `IssueKubeconfig`), `ip_address`, `created_at`
- **Constraint**: We do NOT store the K8s token in the database. Tokens are ephemeral and generated on the fly.

## 6. Security, Threat Mitigation & Infrastructure Constraints

### 6.1 Threat Model

| Threat | Mitigation Strategy |
| :--- | :--- |
| **Gateway Compromise** | The Gateway uses a strictly restricted K8s role. It cannot read existing `Secrets` or interfere with other tenants' running Pods. |
| **Token Theft (XSS)** | Application-level auth must use `HttpOnly` and `Secure` cookies. Generated kubeconfigs expire in 2 hours. |
| **Resource Abuse (Mining)** | Hardcoded `ResourceQuota` per tenant upon creation. A `LimitRange` enforced in every tenant namespace (LimitRange objects are namespaced, so there is no single cluster-wide instance). |

### 6.2 Restricted Gateway Credentials (Crucial)

The Gateway requires a K8s credential (Out-of-Cluster `kubeconfig` or Cloud IAM Role) to operate. **This credential MUST NOT have `cluster-admin` privileges.** It should be bound to a custom `ClusterRole` with ONLY the following permissions:

- `create`, `get`, `list` on `namespaces`, `resourcequotas`.
- `create`, `get`, `list` on `serviceaccounts`, `rolebindings`, plus `delete` on `rolebindings` (required by the suspend operation in 4.3).
- `bind` on the specific ClusterRole granted to tenants (e.g., `admin`). Without it, Kubernetes RBAC rejects a RoleBinding that grants permissions the Gateway does not itself hold.
- `create` on `serviceaccounts/token` (CRITICAL for the TokenRequest API).
- *Strictly prohibited*: `get` or `list` on `secrets`, `pods`, or `deployments`.

### 6.3 Deployment & Networking

- **Deployment Agnostic**: The application will be packaged as a Docker image and can be deployed via Docker Compose, standalone VMs, or within a Kubernetes cluster.
- **CORS/CSP**: Since this might not be a single-origin BFF, explicit CORS policies (`Access-Control-Allow-Origin`) must be tightly defined if the frontend is hosted on a separate domain. Wildcards (`*`) are prohibited.
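
The CORS rule above can be illustrated with a small origin check. This is a minimal sketch in Python (one of the two stacks named in Section 2) using only the standard library, not a prescribed implementation. The function name `cors_headers`, the `ALLOWED_ORIGINS` set, and the example origin are hypothetical. Sending `Access-Control-Allow-Credentials` is an assumption based on the cookie-based auth in 6.1.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({"https://console.example.com"})


def cors_headers(origin: Optional[str],
                 allowed: FrozenSet[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return CORS response headers for `origin`, or {} if it is not allowed."""
    # Refuse to run with a wildcard in the allowlist, per the rule in 6.3.
    if "*" in allowed:
        raise ValueError("wildcard origins are prohibited")
    # Exact string match only: no suffix or substring matching.
    if origin is None or origin not in allowed:
        return {}
    return {
        # Echo the single matched origin instead of a wildcard.
        "Access-Control-Allow-Origin": origin,
        # Assumption: the frontend sends the session cookie cross-origin.
        "Access-Control-Allow-Credentials": "true",
        # Tell caches that the response differs per Origin header.
        "Vary": "Origin",
    }
```

Returning an empty dict for unknown origins means the browser receives no `Access-Control-Allow-Origin` header and blocks the cross-origin read. Exact matching avoids the common mistake of accepting any origin that merely contains a trusted domain name.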