ocdp-go/docs/user-guide.md
Ivan087 7f238a3168 refactor: full-stack restructure with multi-tenancy, workspace management, and K8s diagnostics
- Add Workspace domain (entity, repository, service, handler, DTO)
- Add multi-tenant K8s client with tenant binding and quota management
- Add K8s diagnostics client (instance diagnostics)
- Add authorization middleware (authz package)
- Restructure frontend to feature-based architecture (features/)
- Add User Management page in configuration
- Add AccessDenied page and route guards
- Refactor shared components (form inputs, layout, UI)
- Update Tailwind config for new design system
- Add comprehensive documentation (docs/, tasks/, plans)
- Improve cluster service with better kubeconfig handling
- Add tests for crypto, config, helm client, tenant binding
2026-05-12 16:15:14 +08:00

OCDP Platform User Guide

Table of Contents

  1. Overview
  2. Login / Authentication
  3. Home Page
  4. Launch Instance (Chart Browser)
  5. Instances Management
  6. Cluster Monitoring
  7. Setup — Clusters
  8. Setup — Registries
  9. Setup — Users (Admin)
  10. Navigation

1. Overview

OCDP (Open Cloud Deployment Platform) is a platform for deploying LLM inference services to Kubernetes. In the primary workflow, a user selects a vllm-serve Helm chart from a Harbor registry and fills in the instance name, namespace, and values; the backend then pulls the packaged OCI Helm chart and deploys it to a configured Kubernetes cluster via the Helm SDK.

Architecture

Frontend (React 18 + TypeScript + Vite + TailwindCSS)
        |
        | HTTP /api/*
        v
Nginx (Reverse Proxy / Static File Server)
        |
        | HTTP /api/*
        v
Backend (Go 1.24 + Gorilla Mux + Hexagonal Architecture)
        |
        +---> PostgreSQL (persistence)
        +---> ORAS SDK (OCI chart pull)
        +---> Helm SDK (deploy/upgrade/delete)
        +---> client-go (Kubernetes API)

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, TailwindCSS, React Router, Lucide icons |
| Backend | Go 1.24, Gorilla Mux, PostgreSQL, ORAS SDK, Helm SDK, client-go |
| Gateway | Nginx (reverse proxy + static file serving) |
| Database | PostgreSQL |
| Deployment | Docker Compose |

2. Login / Authentication

Access

The frontend is deployed at http://10.6.80.114:18080. Navigating to the root URL redirects to the login page.

Login Page

The login page displays:

  • OCDP Console title with a shield icon
  • Subtitle: "Sign in with an account created by an administrator"
  • Username text field (required)
  • Password text field (required, masked)
  • Login button — blue, centered, full-width

When you click Login:

  1. A toast notification "Logging in..." appears briefly
  2. The button shows a spinning loader and "Logging in..." text
  3. On success: a "Welcome, {username}!" toast appears, and you are redirected to /home
  4. On failure: a red error message is shown below the button (e.g., "Invalid credentials" or "Network error")

Default Admin Credentials

If the system was bootstrapped via .env configuration, the default admin credentials are:

  • Username: admin (or whatever was set as BOOTSTRAP_ADMIN_USER)
  • Password: The value of BOOTSTRAP_ADMIN_PASS in your .env file

JWT Session Behavior

  • The backend issues JWT tokens upon successful login
  • The frontend stores the tokens and sends them as Authorization: Bearer <token> headers
  • Session persists until the token expires or the user signs out
  • Clicking the logout icon (top-right corner, person icon with a door arrow) signs the user out
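
As a rough illustration of the bearer-token behavior described above, the sketch below builds request headers from a stored token. The function name `buildAuthHeaders` and the `Content-Type` default are assumptions made for this example, not the frontend's actual code.

```typescript
// Hypothetical helper: builds headers for an API request.
// The Authorization header is only added when a token is present.
function buildAuthHeaders(token: string | null): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}
```

A request made after sign-out would pass `null` here and therefore carry no Authorization header.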

Routing When Authenticated

  • Unauthenticated users are always redirected to / (login page)
  • Authenticated users visiting / are redirected to /home
  • Protected routes are wrapped in a ProtectedRoute component; unauthorized access redirects to login
  • Route-level access is enforced per user role (admin vs regular user)
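
The routing rules above can be summarized as a small decision function. This is a minimal sketch, not the actual ProtectedRoute implementation: the type names, the `decideRoute` function, and the assumption that /configuration/users is the only admin-only route are illustrative. How a role denial is presented is deliberately left open.

```typescript
type UserRole = "admin" | "user";

interface SessionUser {
  username: string;
  role: UserRole;
}

type RouteDecision =
  | { kind: "render" }
  | { kind: "redirect"; to: string }
  | { kind: "denied" };

// Assumption: only the user management page is restricted to admins.
const ADMIN_ONLY_ROUTES: string[] = ["/configuration/users"];

function decideRoute(path: string, user: SessionUser | null): RouteDecision {
  if (user === null) {
    // Unauthenticated users only ever see the login page at "/".
    return path === "/" ? { kind: "render" } : { kind: "redirect", to: "/" };
  }
  if (path === "/") {
    // Authenticated users skip the login page.
    return { kind: "redirect", to: "/home" };
  }
  const adminOnly = ADMIN_ONLY_ROUTES.some((route) => route === path);
  if (adminOnly && user.role !== "admin") {
    return { kind: "denied" };
  }
  return { kind: "render" };
}
```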

3. Home Page

The home page at /home is the main landing page after login. It has three sections.

Section 1: Primary Actions (3 Cards)

A large card titled "One Click Deployment Platform" / "Operations Workbench" contains three action cards arranged in a row:

1. Launch Instance

  • Icon: Rocket (blue background)
  • Description: "Browse Helm charts and deploy a new inference service."
  • Clicking navigates to /artifact/registries
  • Shows "Open" with an arrow on hover

2. Instances

  • Icon: Package (emerald background)
  • Description: "Check release status, entries, upgrades, and deletion."
  • Clicking navigates to /artifact/instances
  • Shows "Open" with an arrow on hover

3. Cluster Monitoring

  • Icon: Activity (dark slate background)
  • Description: "Inspect cluster health and node resource pressure."
  • Clicking navigates to /monitoring/clusters
  • Shows "Open" with an arrow on hover

Each card:

  • Has a subtle border, slate background, and hover effect (lifts slightly, adds blue border)
  • Shows a colored icon box at the top
  • Shows title and description
  • Has an "Open" link with right-arrow at the bottom

Section 2: Runtime Focus Sidebar

To the right of the primary actions, a smaller card titled "Runtime Focus" (subtitle "High-frequency checks") contains two rows:

  • Release status — clickable row that navigates to /artifact/instances. Subtitle: "Installed, failed, deleting"
  • Cluster health — clickable row that navigates to /monitoring/clusters. Subtitle: "Nodes, pods, CPU, memory"

Section 3: Setup

A bottom section titled "Setup", with subtitle "Less frequent administrative tasks", contains up to three buttons in a row:

  1. Clusters — Server icon. Description: "Kubeconfig and namespace policy". Navigates to /configuration/clusters
  2. Registries — Database icon. Description: "Harbor robot account and chart access". Navigates to /configuration/registries
  3. Users — Users icon. Description: "Admin-only account management". Navigates to /configuration/users. Only visible to admin users.

4. Launch Instance (Chart Browser)

The Artifact Browser page at /artifact/registries is the chart browser for selecting and launching Helm charts.

Page Layout

The page is a split-pane layout:

  • Left sidebar (w-80): Registry tree with search
  • Right main panel: Repository info, tags, and launch actions

Left Panel: Registry Tree

Header bar (top of page):

  • Title: "Chart Browser"
  • Subtitle: "Select a Harbor chart and launch it into a Kubernetes cluster"
  • Refresh button (secondary style, refresh icon) — reloads all registries and repositories, clears cache

Search bar at the top of the sidebar:

  • Placeholder: "Search registries / repositories..."
  • Filters the tree as you type (matches registry name and repository name)
  • Shows a search icon on the left

Registry nodes listed below the search bar:

  • Each registry shows:
    • Chevron (down/right) to expand/collapse
    • Database icon (blue)
    • Registry name
    • Registry URL (truncated, small text)
    • Badge showing count of repositories
  • Registries are expanded by default
  • Clicking a registry header toggles expansion

Repository items under each registry:

  • Each shows the repository name
  • Clicking a repository selects it (highlighted blue background) and loads its artifacts in the right panel
  • Shows artifact count if available
  • If no repositories: shows "No chart repositories found."
  • If loading: shows "Loading repositories..."

Right Panel: Repository Details

When a repository is selected:

Repository header:

  • Label: "Chart repository" (uppercase, small)
  • Repository name (large, bold)
  • Registry name below
  • Filter chips: Two toggle buttons: "Charts" (default selected, blue) and "All tags"
  • "Charts" filter shows only artifacts of type chart (i.e., deployable Helm charts)
  • "All tags" shows every artifact version regardless of type
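
The two filter chips behave like a simple predicate over the artifact list. The sketch below assumes each artifact carries a `type` field with the values `chart`, `image`, or `other`; the interface and function names are hypothetical.

```typescript
type ArtifactType = "chart" | "image" | "other";

interface ArtifactTag {
  tag: string;
  type: ArtifactType;
}

type TagFilter = "charts" | "all";

// "charts" keeps only deployable Helm charts; "all" returns every tag unchanged.
function filterTags(tags: ArtifactTag[], filter: TagFilter): ArtifactTag[] {
  return filter === "charts" ? tags.filter((t) => t.type === "chart") : tags;
}
```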

Artifact grid (responsive: 1-3 columns):

  • Each artifact is displayed in a TagCard component (see below)

When no repository is selected:

  • Shows empty state: "Select a repository" with "Choose a chart repository from the left panel."

TagCard Component

Each TagCard shows:

  • Type icon: Package (chart), Box (image), or File (other) with color-coded background
  • Tag name (e.g., 1.0.0) with a type badge (e.g., "chart" in blue)
  • Repository path (truncated)
  • Size in KB or MB (e.g., "12.5 MB")

TagCard buttons:

  1. Launch button — Blue button, only visible when the artifact type is chart. Shows rocket icon + "Launch". Opens the LaunchModal.
  2. Copy button — White button with copy icon. Copies the helm pull oci://... command to the clipboard. Shows a success toast.
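
For reference, a `helm pull` command for an OCI chart has the form `helm pull oci://<host>/<repository> --version <tag>`. The sketch below assembles such a command from a registry URL, repository path, and tag. The function name and the `library/vllm-serve` path in the usage note are illustrative, and the exact string the Copy button produces may differ.

```typescript
// Builds a `helm pull` command for an OCI chart reference.
// The URL scheme and any trailing slashes are stripped from the registry URL,
// because OCI references use the bare host.
function buildHelmPullCommand(registryUrl: string, repository: string, tag: string): string {
  const host = registryUrl.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  const repo = repository.replace(/^\/+/, "");
  return `helm pull oci://${host}/${repo} --version ${tag}`;
}
```

For example, `buildHelmPullCommand("https://registry.example.com", "library/vllm-serve", "1.0.0")` yields `helm pull oci://registry.example.com/library/vllm-serve --version 1.0.0`.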

LaunchModal

Opens when "Launch" is clicked on a chart artifact. Title: "Launch Instance" with rocket icon.

Modal header:

  • Shows the repository name and tag (e.g., vllm-serve:1.0.0)

Form fields:

  1. Target Cluster (required)

    • Dropdown select listing all configured clusters
    • Auto-selects the first available cluster (or user's default cluster)
    • If no clusters: shows an amber warning "No clusters available. Please add a cluster first."
    • Shows loading state while fetching
  2. Instance Name (required)

    • Text input, placeholder: "my-app"
    • Help text: "Lowercase alphanumeric characters, '-' or '.'"
  3. Namespace (required)

    • If the selected cluster has allowed namespaces: shows a dropdown of allowed namespaces
    • If no restrictions: shows a text input, default "default"
    • If namespace is controlled by workspace policy: input is disabled with a blue info notice
  4. Description (optional)

    • Text input, placeholder: "Optional description"
  5. Configuration Values — Three input modes:

    a) Quick mode (default):

    • Blue info box explaining "Quick launch uses the chart defaults"
    • Shows badges: "No values override" and if available "Chart values.yaml available"
    • If values.yaml exists: "Load Defaults from values.yaml" button switches to YAML mode with defaults pre-filled
    • Best for simple deployments with no custom overrides

    b) Guided mode (form):

    • Only available when the chart provides a JSON Schema for its values
    • Dynamically generates form fields based on the schema
    • Supports various schema types: string, number, boolean, object, array
    • "Load Defaults" button — fills in values from the schema defaults
    • Shows schema-generated form in a scrollable container

    c) YAML mode:

    • A code editor (textarea) for entering custom values in YAML format
    • Real-time YAML validation with error display
    • "Load Defaults from values.yaml" button
    • "Load Schema Defaults" button (if no values.yaml but schema exists)
    • "Clear" button to reset the YAML
    • Help text changes based on whether schema is available
  6. Artifact Info (read-only summary):

    • Repository name
    • Tag (badge)
    • Type

Footer buttons:

  • Cancel (secondary) — closes the modal
  • Launch (success/green style with rocket icon) — submits the deployment

Validation on submit:

  • Cluster must be selected
  • Instance name must not be empty
  • Namespace must not be empty
  • If namespace policy restricts namespaces, the selected namespace must be in the allowed list
  • YAML values are parsed and validated before submission
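
The field checks above can be sketched as a function that collects validation errors. The `LaunchForm` shape and the error messages are assumptions for illustration. YAML parsing is omitted because it would require a YAML library.

```typescript
interface LaunchForm {
  clusterId: string;
  instanceName: string;
  namespace: string;
  // Empty array means the cluster imposes no namespace restriction.
  allowedNamespaces: string[];
}

// Returns a list of validation errors; an empty list means the form may be submitted.
function validateLaunchForm(form: LaunchForm): string[] {
  const errors: string[] = [];
  if (form.clusterId.trim() === "") {
    errors.push("Cluster must be selected");
  }
  if (form.instanceName.trim() === "") {
    errors.push("Instance name must not be empty");
  }
  const ns = form.namespace.trim();
  if (ns === "") {
    errors.push("Namespace must not be empty");
  } else if (form.allowedNamespaces.length > 0 && !form.allowedNamespaces.some((n) => n === ns)) {
    errors.push(`Namespace "${ns}" is not in the allowed list`);
  }
  return errors;
}
```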

After successful submit:

  • Form resets
  • Modal closes
  • Navigates to /artifact/instances to show the deploying instance
  • Shows "Instance deployed successfully" toast

Error states:

  • Loading clusters fails: error toast
  • Missing required fields: validation error toast
  • YAML parse error: inline error + toast
  • API failure: error toast with message

5. Instances Management

The Instances page at /artifact/instances manages all deployed Helm releases across clusters.

Stats Cards

Three gradient stat cards at the top (shown when clusters exist):

  1. Total Instances (blue) — total count across all clusters
  2. Clusters (emerald) — number of clusters
  3. Showing (violet) — count of currently displayed instances (only shown when filtering across 2+ clusters)

Filter Controls

When more than one cluster exists, a filter bar appears:

  • Filter by Cluster dropdown with "All Clusters" and each cluster with instance count
  • Selecting a cluster filters the instance list to that cluster only

Instance Display

Instances are grouped by cluster when "All Clusters" is selected, each cluster section showing:

  • Cluster name with instance count
  • Instances in a responsive 2-column grid
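
Grouping for the "All Clusters" view amounts to bucketing instances by cluster. A minimal sketch, assuming each instance record exposes a `clusterName` field (the interface and function names are hypothetical):

```typescript
interface InstanceSummary {
  name: string;
  clusterName: string;
}

// Buckets instances by cluster name, preserving the input order within each bucket.
function groupInstancesByCluster(instances: InstanceSummary[]): Record<string, InstanceSummary[]> {
  const groups: Record<string, InstanceSummary[]> = {};
  for (const inst of instances) {
    if (!Object.prototype.hasOwnProperty.call(groups, inst.clusterName)) {
      groups[inst.clusterName] = [];
    }
    groups[inst.clusterName].push(inst);
  }
  return groups;
}
```

The per-cluster instance count shown in each section header is then the length of the corresponding bucket.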

InstanceCard Component

Each card shows:

Header:

  • Instance name (bold, large)
  • Repository name with version badge (cyan)
  • Status badge with colored background glow:

    | Status | Badge Color | Description |
    |---|---|---|
    | Deployed | Emerald | Deployment completed successfully |
    | Failed | Rose/Red | Last operation reported a failure |
    | Pending Install | Amber | Installation is in progress |
    | Pending Upgrade | Amber | Upgrade is in progress |
    | Pending Rollback | Amber | Rollback is in progress |
    | Pending Delete | Orange | Deletion is in progress |
    | Superseded | Indigo | A newer revision has replaced this instance |
    | Uninstalled | Slate | Instance has been removed from the cluster |
    | Unknown | Slate | Awaiting next state update |

  • Status reason text (e.g., "Deployment completed successfully." or a custom message)
  • Last operation label (Install / Upgrade / Rollback / Delete / Sync)
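
The status-to-color relationship can be expressed as a lookup table. In this sketch the keys are the display labels used in this guide; the identifiers the application uses internally are not documented here and may differ. Unrecognized statuses fall back to slate, matching the "Unknown" entry.

```typescript
type BadgeColor = "emerald" | "rose" | "amber" | "orange" | "indigo" | "slate";

// Display label -> badge color, following the status list in this guide.
const STATUS_BADGE_COLORS: Record<string, BadgeColor> = {
  "Deployed": "emerald",
  "Failed": "rose",
  "Pending Install": "amber",
  "Pending Upgrade": "amber",
  "Pending Rollback": "amber",
  "Pending Delete": "orange",
  "Superseded": "indigo",
  "Uninstalled": "slate",
  "Unknown": "slate",
};

function badgeColorFor(status: string): BadgeColor {
  return Object.prototype.hasOwnProperty.call(STATUS_BADGE_COLORS, status)
    ? STATUS_BADGE_COLORS[status]
    : "slate";
}
```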

Details grid:

  • Namespace — purple icon
  • Revision — green icon, Helm revision number
  • Repository — full-width, truncated, monospace
  • Launched — date the instance was created

Last error alert (conditionally shown):

  • Red alert box with warning icon
  • Shows the last error message if the instance encountered errors

Action buttons (5 buttons in a row):

  1. Refresh — Refresh icon. Refreshes the status of this specific instance from the cluster.
  2. Entries — Network icon (emerald). Opens Entries modal.
  3. Diagnostics — Activity icon (indigo). Opens Diagnostics modal.
  4. Modify — Settings icon (blue). Opens Modify modal.
  5. Delete — Stop icon (rose/red). Prompts confirmation, then deletes.

Empty / Loading / Error States

  • Loading: Shows spinning indicator with "Loading instances..."
  • Error: Shows error state with retry button
  • Empty: "No instances found" with link to launch from registries
  • Auto-refresh: Data refreshes every 30 seconds silently

Entries Modal

Displays network entry information for the instance.

Header:

  • Title: "Instance Entries"
  • Instance name and namespace

Data Source badge:

  • Live from Kubernetes (green) — fetched directly from the cluster
  • From Helm Manifest (blue) — extracted from Helm manifest
  • From Helm Notes (yellow) — from Helm release notes
  • No Data Available (gray)

Services section (if any):

  • Lists each Kubernetes Service with:
    • Service name and type badge
    • Cluster IP (copyable)
    • Ports with mapping (e.g., 80 → 8080 TCP, NodePort)
    • LoadBalancer entries (if applicable) with external link and copy

Ingresses section (if any):

  • Lists each Kubernetes Ingress with:
    • Ingress name and class
    • Host with external link and copy buttons
    • Path routing (e.g., / → service:80)
    • TLS indicator if HTTPS is configured

Helm Notes (as fallback):

  • Raw Helm notes text shown in a monospace pre block

Footer: Close button

Diagnostics Modal

Provides Kubernetes-level diagnostics for the instance.

Header:

  • "Runtime diagnostics" label
  • Instance name
  • Namespace and data collection timestamp

Refresh button in the header — reloads diagnostics data from Kubernetes

Three tabs:

1. Describe tab (default):

  • Summary metrics: Pods count, Services count, Events count
  • Pods section — each pod shows:
    • Pod name, node, pod IP, restart count
    • Status badge (Running=success, other=warning)
    • Containers with name, state badge, image, and reason/message
  • Services section — each service with name, type badge, ClusterIP, ports

2. Events tab:

  • Kubernetes events sorted by time
  • Each event shows: type badge (Warning / Normal), reason, timestamp, message, involved object, count

3. Pod Logs tab:

  • Logs from each container, labeled by pod/container name
  • Monospace display on dark background
  • Copy Logs button copies all logs to clipboard
  • Last 300 lines are fetched per container
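
The guide does not say how the 300-line limit is applied. The Kubernetes log API offers a `tailLines` option that a backend could use for this. Whatever the mechanism, the effect is equivalent to the following sketch, where `tailLines` is an illustrative helper name rather than the platform's code:

```typescript
// Returns the last `maxLines` lines of a log, ignoring a single trailing newline.
function tailLines(log: string, maxLines: number = 300): string {
  if (maxLines <= 0) {
    return "";
  }
  const lines = log.split("\n");
  // A trailing newline yields an empty final element; drop it so it is not counted.
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines.slice(-maxLines).join("\n");
}
```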

Error states:

  • Loading fails: error toast
  • No data: amber info box "Diagnostics data is not available"
  • Empty pods/events/logs: relevant empty state message

Modify Modal

Allows modifying an existing instance.

Header: "Modify Instance - {name}" with settings icon

Current info section (read-only):

  • Current version
  • Cluster ID
  • Repository

Fields:

  1. Version Tag — text input, pre-filled with current version. Help: "Leave unchanged to keep current version"
  2. Description — text input
  3. Configuration Values — Form or YAML mode (auto-detects if schema exists):
    • Form mode: Dynamic form generated from values schema, with real-time sync to YAML
    • YAML mode: Textarea with monospace font, pre-filled with current values

Footer: Cancel / Modify buttons

After submit: Instance is upgraded via Helm, data refreshes, modal closes.


6. Cluster Monitoring

The Monitoring page at /monitoring/clusters shows the health and resource usage of all configured Kubernetes clusters.

Summary Stats Cards

Four stat cards at the top:

  1. Total Clusters (blue) — total number of clusters
  2. Healthy (green) — clusters with status "healthy"
  3. Warning (orange) — clusters with status "warning" or "unknown"
  4. Error (red) — clusters with status "error" or "unhealthy"
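
The bucketing rule behind these four cards (unknown counts as a warning, unhealthy counts as an error) can be sketched as follows. The type and function names are assumptions for illustration.

```typescript
type ClusterStatus = "healthy" | "warning" | "unknown" | "error" | "unhealthy";

interface HealthSummary {
  total: number;
  healthy: number;
  warning: number;
  error: number;
}

function summarizeClusterHealth(statuses: ClusterStatus[]): HealthSummary {
  const summary: HealthSummary = { total: statuses.length, healthy: 0, warning: 0, error: 0 };
  for (const status of statuses) {
    if (status === "healthy") {
      summary.healthy += 1;
    } else if (status === "warning" || status === "unknown") {
      summary.warning += 1;
    } else {
      // Remaining statuses are "error" and "unhealthy".
      summary.error += 1;
    }
  }
  return summary;
}
```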

Auto-Refresh

  • The page auto-refreshes every 30 seconds
  • A small note shows "Auto-refresh every 30 seconds" with a refresh indicator
  • Manual Refresh button in the page header

ClusterMonitorCard

Each cluster is shown in an expandable card.

Card header:

  • Status icon (green check, yellow warning, red X, or gray question mark)
  • Cluster name with status badge (Healthy / Warning / Error)

Metrics grid (4 columns):

  • Uptime — how long the cluster has been running
  • Nodes — node count
  • Pods — pod count
  • GPU — used/total GPU count

Resource usage (3 columns with progress bars):

  • CPU — used/total, percentage bar, max per node with peak usage
  • Memory — used/total, percentage bar, max per node with peak usage
  • GPU (only if GPUs exist) — used/total, percentage bar, max per node with peak usage

Last checked timestamp

Show Nodes / Hide Nodes toggle button (only if nodes exist)

NodeMetricCard (Expandable Nodes)

When nodes are expanded, each node shows:

  • Node name with status icon (Ready green / NotReady red)
  • Status badge (Ready / NotReady) and role badge (Control Plane / Worker)
  • Age
  • CPU — usage/allocatable with progress bar
  • Memory — usage/allocatable with progress bar
  • GPU — usage/capacity with progress bar (shows "No GPU" if none)
  • Additional info: Pod count, Kubelet version

States

  • Loading: Shows "Loading cluster monitoring data..."
  • Error: Error state with retry button, "Failed to Load Clusters"
  • Empty: "No Clusters Available" with suggestion to add clusters in configuration

7. Setup — Clusters

The Cluster Configuration page at /configuration/clusters manages Kubernetes cluster connections.

Page Header

  • Title: "Configuration - Clusters"
  • Description changes based on role (admins see "Manage all...", regular users see "Manage your private...")
  • Refresh button (secondary)
  • Add Cluster button (primary, blue, plus icon)

ClusterList Component

Loading state: Spinner with "Loading clusters..."

Empty state: Server icon with "No clusters" and "Add your first cluster..."

Cluster cards (2-column grid):

  • Cluster name with server icon and visibility label (Private / Workspace / Global)
  • Description (if any)
  • Three action buttons:
    • Test Connection (Activity icon, emerald) — performs a health check against the cluster
    • Edit (pencil icon, blue) — opens edit modal
    • Delete (trash icon, red) — prompts confirmation then deletes
  • API Server URL (monospace, truncated)
  • Auth status grid (3 columns): CA Certificate, Client Cert, Client Key — each shows "✓ Configured" or "✗ Not Configured"
  • Created date

Action buttons may be disabled based on user permissions (read-only access).

Add / Edit Cluster Modal

Form fields:

  1. Cluster Name (required) — text input, e.g., "Production Cluster"
  2. API Server URL (required) — must start with http:// or https:// (see Validation below), e.g., https://kubernetes.example.com:6443
  3. CA Certificate (Base64) — required for create unless a Bearer Token is provided. Textarea for base64-encoded CA cert. In edit mode: shows current status and optional new input.
  4. Client Certificate (Base64) — required for create unless a Bearer Token is provided. In edit mode: shows current status.
  5. Client Key (Base64) — required for create unless a Bearer Token is provided. In edit mode: shows current status.
  6. Bearer Token — optional alternative to client certificates. Textarea for service account token.
  7. Description — optional textarea

Validation:

  • Name and API Server URL required
  • URL must start with http:// or https://
  • Create mode requires either token OR all three certificate fields
  • Edit mode: certificate fields are optional (leave blank to keep existing)
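
The validation rules listed above translate into a check like the one below. This is a sketch under stated assumptions: the `ClusterForm` field names and error messages are invented for the example, and only the rules from this list are covered.

```typescript
interface ClusterForm {
  name: string;
  apiServerUrl: string;
  caCert: string;
  clientCert: string;
  clientKey: string;
  token: string;
}

type ClusterFormMode = "create" | "edit";

// Returns validation errors; an empty list means the form passes.
function validateClusterForm(form: ClusterForm, mode: ClusterFormMode): string[] {
  const errors: string[] = [];
  if (form.name.trim() === "") {
    errors.push("Cluster name is required");
  }
  const url = form.apiServerUrl.trim();
  if (url === "") {
    errors.push("API Server URL is required");
  } else if (!/^https?:\/\//.test(url)) {
    errors.push("URL must start with http:// or https://");
  }
  if (mode === "create") {
    // Create mode needs either a token or the complete certificate triple.
    const hasToken = form.token.trim() !== "";
    const hasAllCerts = [form.caCert, form.clientCert, form.clientKey].every((v) => v.trim() !== "");
    if (!hasToken && !hasAllCerts) {
      errors.push("Provide a bearer token or all three certificate fields");
    }
  }
  // In edit mode, blank credential fields keep the existing values, so nothing is checked.
  return errors;
}
```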

Footer: Cancel / Add Cluster or Save

Health Check

Clicking the Test Connection (Activity) button on a cluster:

  • Shows "Testing cluster..." toast
  • If successful: green toast with success message
  • If failed: red toast with error message
  • Checks connectivity to the Kubernetes API server

8. Setup — Registries

The Registry Configuration page at /configuration/registries manages OCI registry connections.

Page Header

  • Title: "Configuration - Registries"
  • Refresh button (secondary)
  • Add Registry button (primary, blue, plus icon)

RegistryList Component

Loading state: "Loading registries..."

Empty state: Database icon with "No registries" and "Add your first registry..."

Registry cards (vertical list):

  • Registry name with database icon and visibility label
  • Insecure badge (yellow, if insecure flag is on)
  • Registry URL (clickable link, opens in new tab)
  • Description (if any)
  • Username display
  • Two action buttons:
    • Edit (pencil icon, blue) — opens edit modal
    • Delete (trash icon, red) — prompts confirmation then deletes

Add / Edit Registry Modal

Form fields:

  1. Name (required) — e.g., "Harbor Production"
  2. Registry URL (required) — e.g., https://registry.example.com
  3. Username (required) — registry username (Harbor robot account recommended)
  4. Password — required for create. In edit mode: shows current status ("Password set - encrypted") and optional new password input
  5. Description — optional textarea
  6. Insecure — checkbox. "Allow insecure connection (skip SSL certificate verification)" — for registries using HTTP or self-signed certs

Test Connection button (in edit mode only, after saving):

  • Tests the registry connectivity by calling the backend health endpoint
  • Button shows a pulsing test tube icon while testing
  • Shows success/failure toast

Footer: Save / Test Connection / Cancel


9. Setup — Users (Admin)

The User Management page at /configuration/users is admin-only. Non-admin users cannot access this route.

Page Header

  • "Admin only" label with shield icon
  • Title: "User Management"
  • Description: "Create accounts, assign roles, and disable access without public self-registration."
  • Refresh button (secondary)

Create User Form (Left Panel)

  • Username (required) — text input
  • Initial password (required) — masked input
  • Role dropdown — "User" or "Admin"

When User role is selected, additional fields appear:

Tenant namespace section:

  • Namespace — text input, auto-generated from username as ocdp-u-{username}
  • Default cluster — dropdown of available clusters

Resource limits section:

  • CPU — default "4" (Kubernetes quantity, e.g., "4" or "500m")
  • Memory — default "16Gi"
  • GPU — default "0" (integer count)
  • GPU Mem — default "0" (integer MB, e.g., 10000)
  • Help text explains the units
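
The defaults described in the two sections above can be gathered into one object. In this sketch all values are kept as strings to mirror form inputs; the interface and function names are hypothetical. The username is inserted as-is, since this guide does not describe any sanitization of the generated namespace.

```typescript
interface TenantDefaults {
  namespace: string;
  cpu: string;    // Kubernetes quantity, e.g. "4" or "500m"
  memory: string; // Kubernetes quantity, e.g. "16Gi"
  gpu: string;    // integer count
  gpuMem: string; // integer MB
}

// Produces the pre-filled tenant values for a new regular user.
function defaultTenantFor(username: string): TenantDefaults {
  return {
    namespace: `ocdp-u-${username}`,
    cpu: "4",
    memory: "16Gi",
    gpu: "0",
    gpuMem: "0",
  };
}
```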

Checkbox:

  • "Require password change after first login" — checked by default

Create User button (primary, full-width, user-plus icon)

Accounts Table (Right Panel)

A table with columns:

| Column | Content |
|---|---|
| User | Username + email |
| Role | Badge: "admin" (info blue) or "user" (secondary) |
| Status | Badge: "Active" (green) or "Disabled" (warning) |
| Namespace | Namespace + workspace name + default cluster |
| Quota | CPU, Memory, GPU/GPU Mem (admin shows "default workspace") |
| Actions | See below |

Actions per row (4 buttons):

  1. Make User / Make Admin — toggles the user's role between admin and user
  2. Limits (pencil icon) — opens the Edit Limits modal (only for non-admin users)
  3. Enable / Disable — toggles the user's active status (disabled for own account)
  4. Delete (trash icon, red) — deletes user after confirmation (disabled for own account)
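
The per-row rules above (limits only for non-admins, no disabling or deleting your own account) can be sketched as a function that derives the button state for one row. The `Account` and `RowActions` shapes are assumptions for illustration. Whether the role toggle is also restricted for your own account is not stated in this guide, so the sketch leaves it unrestricted.

```typescript
interface Account {
  username: string;
  role: "admin" | "user";
  active: boolean;
}

interface RowActions {
  roleToggleLabel: "Make User" | "Make Admin";
  statusToggleLabel: "Enable" | "Disable";
  canEditLimits: boolean;
  canToggleStatus: boolean;
  canDelete: boolean;
}

function rowActionsFor(target: Account, currentUsername: string): RowActions {
  const isSelf = target.username === currentUsername;
  return {
    roleToggleLabel: target.role === "admin" ? "Make User" : "Make Admin",
    statusToggleLabel: target.active ? "Disable" : "Enable",
    canEditLimits: target.role !== "admin", // limits apply to non-admin users only
    canToggleStatus: !isSelf,               // cannot disable your own account
    canDelete: !isSelf,                     // cannot delete your own account
  };
}
```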

Edit Limits Modal

Opens when "Limits" button is clicked for a non-admin user:

  • Tenant limits label with gauge icon
  • User's name as title
  • Description: "Changes are applied to workspace metadata..."
  • Fields: Namespace, Default cluster, CPU, Memory, GPU, GPU Memory
  • Cancel / Save Limits buttons

10. Navigation

Left Sidebar

The sidebar shows the "Operations" branding at the top with the following navigation items:

| Item | Icon | Route |
|---|---|---|
| Home | Home (gray) | /home |
| Launch Instance | Rocket (blue) | /artifact/registries |
| Instances | Boxes (emerald) | /artifact/instances |
| Cluster Monitoring | LineChart (teal) | /monitoring/clusters |
| Setup (collapsible) | Settings | (none) |
| └ Clusters | Server (teal) | /configuration/clusters |
| └ Registries | Database | /configuration/registries |
| └ Users | Users (blue) | /configuration/users |

  • The Setup section is expanded by default
  • Active nav item is highlighted with a blue background
  • Sidebar collapses on mobile with a hamburger menu toggle
  • Navigation items are filtered dynamically based on user role:
    • "Users" is only shown to admin users
    • Routes are protected server-side too

Page Header / Breadcrumbs

Each page shows a header in the top navigation bar with:

  • Page icon
  • Page title (e.g., "Launch Instance", "Instances", "Setup - Clusters")
  • Current user's name and role badge on the right
  • Sign Out button (door icon with arrow, top-right)

The title mapping is:

| Route | Header Title |
|---|---|
| /artifact/registries | Launch Instance |
| /artifact/instances | Instances |
| /configuration/clusters | Setup - Clusters |
| /configuration/registries | Setup - Registries |
| /configuration/users | Setup - Users |
| /monitoring/clusters | Monitoring - Clusters |
| /home | OCDP Platform |

Legacy Route Redirects

Several legacy URL patterns redirect to current routes:

  • /config/* → /configuration/clusters
  • /monitor, /cluster, /cluster/monitor → /monitoring/clusters
  • /artifact/registry → /artifact/registries
  • /artifact/instance → /artifact/instances
  • /registry → /artifact/registries
  • /register → /
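
Read as a lookup, the legacy redirects map an old path to its current route. The sketch below encodes one reading of the list above: the /config/* wildcard is assumed to cover /config itself, and /register is assumed to redirect to the login page at /. The function name is illustrative and this is not the router configuration itself.

```typescript
// Exact-match legacy paths and their current targets.
const LEGACY_REDIRECTS: Record<string, string> = {
  "/monitor": "/monitoring/clusters",
  "/cluster": "/monitoring/clusters",
  "/cluster/monitor": "/monitoring/clusters",
  "/artifact/registry": "/artifact/registries",
  "/artifact/instance": "/artifact/instances",
  "/registry": "/artifact/registries",
  "/register": "/",
};

// Returns the redirect target for a legacy path, or null if the path is not legacy.
function resolveLegacyRoute(path: string): string | null {
  // Wildcard rule: /config and anything below it. /configuration/... does not match.
  if (path === "/config" || /^\/config\//.test(path)) {
    return "/configuration/clusters";
  }
  return Object.prototype.hasOwnProperty.call(LEGACY_REDIRECTS, path)
    ? LEGACY_REDIRECTS[path]
    : null;
}
```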