ocdp-go/Multi-Tenant Kubeconfig.md
Ivan087 47849042a7 feat: complete E2E deployment flow with storage layered config and values template versioning
- Instance deployment: charts browser, deploy modal, instances list
- Values Template version management (create/history/rollback)
- Storage layered config (cluster > workspace > shared priority)
- Cluster credential decryptIfNeeded for mixed encrypted/plaintext kubeconfig
- YAML syntax validation (client-side + server-side warning)
- Frontend: charts, instances, storage, templates, admin pages
- Backend: storage service, instance service, cluster service, helm client
- Multi-Tenant Kubeconfig.md: added by user
2026-04-30 16:31:00 +08:00


Technical Specification: Multi-Tenant Kubeconfig & Auth Gateway

1. System Overview & Goals

  • Objective: Develop a backend API service that automates Kubernetes multi-tenant onboarding (Namespace + Quota isolation) and securely distributes short-lived, dynamic kubeconfig files using the Kubernetes TokenRequest API.
  • Architecture Independence: This backend service acts as a standalone control plane. It is not strictly bound to a BFF pattern and does not need to run inside the target Kubernetes cluster (it supports Out-of-Cluster execution).
  • Out of Scope: This spec does NOT cover the frontend UI implementation or the downstream workload deployment. It focuses strictly on identity, tenant provisioning, and credential brokering.
  • Security Principles: Adhere strictly to Zero-Knowledge architecture (no token storage in DB), Ephemeral Credentials (short-lived tokens only), and Least Privilege (the Gateway must NOT be a cluster-admin).

2. Architecture & Topology

  • Tech Stack: Go net/http (or FastAPI), utilizing the official Kubernetes Client SDK (client-go or kubernetes-client/python).
  • Control Plane Flow:
    1. Client/Frontend -> Gateway: User requests environment access.
    2. Gateway -> K8s API: Gateway authenticates to the target K8s cluster using its own restricted service credential (e.g., an Out-of-Cluster kubeconfig; see 6.2), never a cluster-admin credential.
    3. Gateway -> K8s API: Executes Namespace/SA creation (if new) or calls TokenRequest API (if existing).
    4. Gateway -> Client/Frontend: Returns a generated kubeconfig YAML string with the short-lived JWT token.

3. Core Business Logic Workflows

Phase 1: Tenant Initialization (Onboarding)

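Triggered when a new user registers or requests a workspace for the first time. Kubernetes has no multi-object transactions, so the Gateway must create the following four resources as an ordered, idempotent sequence: an "already exists" response counts as success, so a partially completed onboarding can be retried safely.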

  1. Namespace: tenant-{user_uuid}
  2. ServiceAccount: sa-tenant-admin (Created inside the tenant's namespace).
  3. RoleBinding: Bind sa-tenant-admin to the admin (or custom) ClusterRole, strictly isolated within tenant-{user_uuid}.
  4. ResourceQuota: Enforce limits (e.g., requests.cpu: "4", limits.memory: "16Gi") to prevent noisy neighbors.
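
The four onboarding resources can be planned as plain data before any cluster call is made. The following is a minimal sketch, not the Gateway's actual implementation: `TenantPlan`, `PlanTenant`, and the binding name `rb-tenant-admin` are hypothetical names introduced for illustration, and the quota values are the example figures from step 4.

```go
package main

import (
	"errors"
	"fmt"
)

// maxNamespaceLen is the Kubernetes limit for a Namespace name,
// which must be a valid RFC 1123 DNS label.
const maxNamespaceLen = 63

// TenantPlan lists what Phase 1 creates, in creation order.
// Only the "tenant-{user_uuid}" and "sa-tenant-admin" names come from
// the spec; everything else is an illustrative assumption.
type TenantPlan struct {
	Namespace      string
	ServiceAccount string
	RoleBinding    string            // hypothetical binding name
	ClusterRole    string            // role referenced by the RoleBinding
	QuotaHard      map[string]string // ResourceQuota "hard" limits
}

// PlanTenant derives the resource names for one user and rejects
// inputs that cannot produce a legal Namespace name by length.
func PlanTenant(userUUID string) (TenantPlan, error) {
	if userUUID == "" {
		return TenantPlan{}, errors.New("user UUID must not be empty")
	}
	ns := "tenant-" + userUUID
	if len(ns) > maxNamespaceLen {
		return TenantPlan{}, fmt.Errorf("namespace %q is longer than %d characters", ns, maxNamespaceLen)
	}
	return TenantPlan{
		Namespace:      ns,
		ServiceAccount: "sa-tenant-admin",
		RoleBinding:    "rb-tenant-admin",
		ClusterRole:    "admin",
		QuotaHard: map[string]string{
			"requests.cpu":  "4",
			"limits.memory": "16Gi",
		},
	}, nil
}

func main() {
	plan, err := PlanTenant("a1b2c3d4")
	if err != nil {
		panic(err)
	}
	fmt.Println("create in order:", plan.Namespace, plan.ServiceAccount, plan.RoleBinding, plan.QuotaHard)
}
```

Keeping name derivation separate from the API calls makes the sequence easy to retry, since every attempt computes the same names.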

Phase 2: Credential Distribution (Dynamic Token)

Triggered when the user requests CLI access or downloads a kubeconfig.

  1. Locate the user's associated Namespace and ServiceAccount, verifying the user's ownership of the workspace.
  2. Audit Logging: Record the credential issuance event (User, IP, Workspace) into the database.
  3. Call the authentication.k8s.io/v1 TokenRequest API targeting sa-tenant-admin in the specific tenant's namespace.
  4. Set expirationSeconds: 7200 (2 hours). Hard limit; cannot be extended.
  5. Retrieve the generated JWT token and inject it into a pre-defined kubeconfig text template.
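
Steps 3 and 4 amount to one POST against the ServiceAccount's `token` subresource. The sketch below builds only the REST path and request body using the standard library; the helper names `TokenRequestPath` and `TokenRequestBody` are hypothetical, and sending the request with the Gateway's credential is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenTTLSeconds is the fixed 2-hour lifetime required by the spec.
const tokenTTLSeconds = 7200

// tokenRequest mirrors the subset of the authentication.k8s.io/v1
// TokenRequest object that this flow needs.
type tokenRequest struct {
	APIVersion string           `json:"apiVersion"`
	Kind       string           `json:"kind"`
	Spec       tokenRequestSpec `json:"spec"`
}

type tokenRequestSpec struct {
	ExpirationSeconds int64 `json:"expirationSeconds"`
}

// TokenRequestPath returns the API path of the token subresource for
// the given ServiceAccount.
func TokenRequestPath(namespace, serviceAccount string) string {
	return fmt.Sprintf("/api/v1/namespaces/%s/serviceaccounts/%s/token", namespace, serviceAccount)
}

// TokenRequestBody returns the JSON body to POST to that path.
func TokenRequestBody() ([]byte, error) {
	return json.Marshal(tokenRequest{
		APIVersion: "authentication.k8s.io/v1",
		Kind:       "TokenRequest",
		Spec:       tokenRequestSpec{ExpirationSeconds: tokenTTLSeconds},
	})
}

func main() {
	body, err := TokenRequestBody()
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", TokenRequestPath("tenant-a1b2c3d4", "sa-tenant-admin"))
	fmt.Println(string(body))
}
```

With client-go, the equivalent call is `CoreV1().ServiceAccounts(namespace).CreateToken(...)`. The API server may return a lifetime that differs from the requested one, so the Gateway should read the expiry from the response rather than assume 7200 seconds.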

Phase 3: Automated Renewal & Emergency Suspension

  • Session Management: If accessed via a Web UI, the Gateway intercepts requests, attaches the dynamic token, and forwards them. If the token is within 10 minutes of expiration, the Gateway automatically issues a new TokenRequest.
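  • Emergency Suspension: If a workspace is marked compromised, the Gateway deletes its K8s RoleBinding. Tokens already issued still authenticate until they expire, but RBAC denies every request they make in the tenant namespace from that point on, so access is effectively revoked for all active tokens of that tenant.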
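
The 10-minute renewal rule reduces to a comparison between the token's expiry and the current time. This is a minimal sketch, assuming the Gateway tracks the expiry as Unix seconds (for example, from the JWT `exp` claim); `NeedsRenewal` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// renewalWindowSeconds is the 10-minute threshold from the spec.
const renewalWindowSeconds = 10 * 60

// NeedsRenewal reports whether a token expiring at expUnix should be
// replaced at nowUnix: true once the remaining lifetime is 10 minutes
// or less, including when the token has already expired.
func NeedsRenewal(expUnix, nowUnix int64) bool {
	return expUnix-nowUnix <= renewalWindowSeconds
}

func main() {
	now := time.Now()
	exp := now.Add(2 * time.Hour).Unix()

	fmt.Println(NeedsRenewal(exp, now.Unix()))                      // false: 2 hours left
	fmt.Println(NeedsRenewal(exp, now.Add(115*time.Minute).Unix())) // true: 5 minutes left
}
```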

4. API Contracts

4.1. Initialize Tenant Workspace

  • Route: POST /api/v1/workspaces/init
  • Auth: Gateway Session / Bearer Token
  • Rate Limit: Strictly rate-limited per user to prevent Namespace exhaustion.
  • Request Payload (tier determines the ResourceQuota template):
    {
      "tier": "basic"
    }
    
  • Response Payload (201 Created):
    {
      "namespace": "tenant-a1b2c3d4",
      "status": "provisioned",
      "quota": {"cpu": "4", "memory": "8Gi"}
    }
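
A handler for this contract might look like the following sketch. It is illustrative only: `QuotaForTier` and `InitHandler` are hypothetical names, only the `basic` tier from the example above is defined, authentication and rate limiting are omitted, and the provisioning step is stubbed with the example namespace.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Quota is the quota summary returned to the client.
type Quota struct {
	CPU    string `json:"cpu"`
	Memory string `json:"memory"`
}

// quotaTemplates maps a tier to its quota. Only "basic" appears in
// the spec; further tiers would be added here.
var quotaTemplates = map[string]Quota{
	"basic": {CPU: "4", Memory: "8Gi"},
}

// QuotaForTier looks up the quota template for a tier.
func QuotaForTier(tier string) (Quota, bool) {
	q, ok := quotaTemplates[tier]
	return q, ok
}

type initRequest struct {
	Tier string `json:"tier"`
}

type initResponse struct {
	Namespace string `json:"namespace"`
	Status    string `json:"status"`
	Quota     Quota  `json:"quota"`
}

// InitHandler validates the request and answers 201 Created.
// Real provisioning (Phase 1) is stubbed out in this sketch.
func InitHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req initRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	quota, ok := QuotaForTier(req.Tier)
	if !ok {
		http.Error(w, "unknown tier", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusCreated)
	_ = json.NewEncoder(w).Encode(initResponse{
		Namespace: "tenant-a1b2c3d4", // placeholder; derived from the user in practice
		Status:    "provisioned",
		Quota:     quota,
	})
}

func main() {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/workspaces/init", strings.NewReader(`{"tier":"basic"}`))
	rec := httptest.NewRecorder()
	InitHandler(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```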
    

4.2. Generate Dynamic Kubeconfig

  • Route: GET /api/v1/workspaces/credentials/kubeconfig
  • Auth: Gateway Session / Bearer Token
  • Response Payload (200 OK): Returns raw application/x-yaml content.
    apiVersion: v1
    clusters:
    - cluster:
        server: https://<k8s-api-server>
        certificate-authority-data: <ca-base64>
      name: internal-cluster
    contexts:
    - context:
        cluster: internal-cluster
        namespace: tenant-a1b2c3d4 # Default context locked to their namespace
        user: sa-tenant-admin
      name: tenant-context
    current-context: tenant-context
    kind: Config
    users:
    - name: sa-tenant-admin
      user:
        token: "eyJhbGciOiJSUzI1NiIs..." # Short-lived token injected here
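
Injecting the token into a fixed template can be done with Go's `text/template`. The sketch below is one possible approach under stated assumptions: `KubeconfigParams` and `RenderKubeconfig` are hypothetical names, and the input check is a simple guard against breaking the YAML, not a full validator.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// kubeconfigText follows the layout shown above; only the server,
// CA data, namespace, and token vary per request.
const kubeconfigText = `apiVersion: v1
kind: Config
clusters:
- cluster:
    server: {{.Server}}
    certificate-authority-data: {{.CAData}}
  name: internal-cluster
contexts:
- context:
    cluster: internal-cluster
    namespace: {{.Namespace}}
    user: sa-tenant-admin
  name: tenant-context
current-context: tenant-context
users:
- name: sa-tenant-admin
  user:
    token: "{{.Token}}"
`

var kubeconfigTmpl = template.Must(template.New("kubeconfig").Parse(kubeconfigText))

// KubeconfigParams holds the per-request values.
type KubeconfigParams struct {
	Server    string
	CAData    string
	Namespace string
	Token     string
}

// RenderKubeconfig fills the template. text/template does no escaping,
// so values containing quotes or line breaks are rejected up front.
func RenderKubeconfig(p KubeconfigParams) (string, error) {
	for _, v := range []string{p.Server, p.CAData, p.Namespace, p.Token} {
		if v == "" {
			return "", errors.New("all kubeconfig fields are required")
		}
		if strings.ContainsAny(v, "\"\r\n") {
			return "", errors.New("kubeconfig field contains a quote or line break")
		}
	}
	var out strings.Builder
	if err := kubeconfigTmpl.Execute(&out, p); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	cfg, err := RenderKubeconfig(KubeconfigParams{
		Server:    "https://k8s.example.invalid", // placeholder address
		CAData:    "Q0E=",                        // placeholder base64
		Namespace: "tenant-a1b2c3d4",
		Token:     "header.payload.signature", // placeholder token
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(cfg)
}
```

The rendered string is returned to the caller and never persisted, which keeps the token out of the database.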
    

4.3. Suspend Workspace (Emergency Kill Switch)

  • Route: POST /api/v1/workspaces/{id}/suspend
  • Auth: Admin Only
  • Behavior: Updates DB status to suspended and deletes the associated K8s RoleBinding.

5. Data Architecture & Persistence

  • Database: PostgreSQL (Relational mapping between Users and K8s Namespaces).

  • Table: users

    • id (UUID, PK), email, password_hash, status
  • Table: workspaces

    • id (UUID, PK)

    • user_id (UUID, FK to Users table)

    • k8s_namespace (String, unique)

    • k8s_sa_name (String)

    • tier (String)

    • created_at (Timestamp)

  • Table: audit_logs (Security Compliance)

    • id (UUID, PK), user_id (UUID), workspace_id (UUID), action (e.g., IssueKubeconfig), ip_address, created_at
  • Constraint: We do NOT store the K8s Token in the database. Tokens are ephemeral and generated on-the-fly.

6. Security, Threat Mitigation & Infrastructure Constraints

6.1 Threat Model

| Threat | Mitigation Strategy |
| --- | --- |
| Gateway Compromise | The Gateway uses a strictly restricted K8s role. It cannot read existing Secrets or interfere with other tenants' running Pods. |
| Token Theft (XSS) | Application-level Auth must use HttpOnly, Secure Cookies. Generated Kubeconfigs expire in 2 hours. |
| Resource Abuse (Mining) | Hardcoded ResourceQuota per tenant upon creation. A LimitRange applied in every tenant namespace (LimitRange is namespace-scoped; Kubernetes has no cluster-wide variant). |

6.2 Restricted Gateway Credentials (Crucial)

The Gateway requires a K8s credential (Out-of-Cluster kubeconfig or Cloud IAM Role) to operate. This credential MUST NOT have cluster-admin privileges. It must be bound to a custom ClusterRole with ONLY the following permissions:

  • create, get, list on namespaces, resourcequotas.
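  • create, get, list on serviceaccounts.
  • create, get, list, delete on rolebindings (delete is needed for the suspension flow in 4.3).
  • bind on the ClusterRole referenced by tenant RoleBindings (e.g., admin); Kubernetes RBAC otherwise rejects creating a RoleBinding that grants permissions the Gateway does not itself hold.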
  • create on serviceaccounts/token (CRITICAL for TokenRequest API).
  • Strictly prohibited: get or list on secrets, pods, or deployments.
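
The permission list can be written down as a ClusterRole object and checked in code. The sketch below is an illustration, not a reviewed policy: the role name `tenant-gateway` and the `Allows` helper are hypothetical. It also includes two rules that follow from other parts of this spec: `delete` on rolebindings (needed by the suspend endpoint in 4.3) and `bind` on the `admin` ClusterRole (needed because Kubernetes RBAC blocks creating a binding to a role whose permissions the caller does not hold).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PolicyRule and ClusterRole mirror the subset of
// rbac.authorization.k8s.io/v1 used here.
type PolicyRule struct {
	APIGroups     []string `json:"apiGroups"`
	Resources     []string `json:"resources"`
	ResourceNames []string `json:"resourceNames,omitempty"`
	Verbs         []string `json:"verbs"`
}

type ClusterRole struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Rules      []PolicyRule      `json:"rules"`
}

const rbacGroup = "rbac.authorization.k8s.io"

// GatewayClusterRole returns the restricted role for the Gateway.
func GatewayClusterRole() ClusterRole {
	return ClusterRole{
		APIVersion: rbacGroup + "/v1",
		Kind:       "ClusterRole",
		Metadata:   map[string]string{"name": "tenant-gateway"}, // hypothetical name
		Rules: []PolicyRule{
			{APIGroups: []string{""}, Resources: []string{"namespaces", "resourcequotas", "serviceaccounts"}, Verbs: []string{"create", "get", "list"}},
			{APIGroups: []string{rbacGroup}, Resources: []string{"rolebindings"}, Verbs: []string{"create", "get", "list", "delete"}},
			{APIGroups: []string{""}, Resources: []string{"serviceaccounts/token"}, Verbs: []string{"create"}},
			{APIGroups: []string{rbacGroup}, Resources: []string{"clusterroles"}, ResourceNames: []string{"admin"}, Verbs: []string{"bind"}},
		},
	}
}

// Allows reports whether any rule grants verb on resource. It is a
// simplified check that ignores API groups and resourceNames.
func (c ClusterRole) Allows(verb, resource string) bool {
	for _, rule := range c.Rules {
		for _, r := range rule.Resources {
			if r != resource {
				continue
			}
			for _, v := range rule.Verbs {
				if v == verb {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	role := GatewayClusterRole()
	manifest, err := json.MarshalIndent(role, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(manifest)) // JSON manifests are accepted by kubectl apply
	fmt.Println("can read secrets:", role.Allows("get", "secrets"))
}
```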

6.3 Deployment & Networking

  • Deployment Agnostic: The application will be packaged as a Docker image and can be deployed via Docker Compose, standalone VMs, or within a Kubernetes cluster.
  • CORS/CSP: Since this might not be a single-origin BFF, explicit CORS policies (Access-Control-Allow-Origin) must be tightly defined if the frontend is hosted on a separate domain. Wildcards (*) are prohibited.
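
An explicit origin allowlist can be enforced with a small `net/http` middleware. This is a sketch under assumptions: the origin `https://console.example.com` is a placeholder, `OriginAllowed` and `WithCORS` are hypothetical names, and credentials are allowed because the spec calls for cookie-based application auth.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// allowedOrigins is the explicit allowlist; the entry is a placeholder.
var allowedOrigins = map[string]bool{
	"https://console.example.com": true,
}

// OriginAllowed does an exact match, so "*" and unknown hosts fail.
func OriginAllowed(origin string) bool {
	return allowedOrigins[origin]
}

// WithCORS echoes the Origin header back only for allowlisted origins
// and answers preflight requests itself.
func WithCORS(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		// The response depends on Origin, so caches must key on it.
		w.Header().Add("Vary", "Origin")

		allowed := OriginAllowed(origin)
		if allowed {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Credentials", "true")
		}

		// A preflight is an OPTIONS request carrying this header.
		if r.Method == http.MethodOptions && r.Header.Get("Access-Control-Request-Method") != "" {
			if allowed {
				w.Header().Set("Access-Control-Allow-Methods", "GET, POST")
				w.Header().Set("Access-Control-Allow-Headers", "Authorization, Content-Type")
			}
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := WithCORS(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))

	req := httptest.NewRequest(http.MethodGet, "/api/v1/workspaces/credentials/kubeconfig", nil)
	req.Header.Set("Origin", "https://console.example.com")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)

	fmt.Println(rec.Header().Get("Access-Control-Allow-Origin"))
}
```

Requests from origins outside the allowlist still reach the handler, but the browser blocks the page from reading the response because no `Access-Control-Allow-Origin` header is set.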