# OCDP Platform User Guide

## Table of Contents

1. [Overview](#1-overview)
2. [Login / Authentication](#2-login--authentication)
3. [Home Page](#3-home-page)
4. [Launch Instance (Chart Browser)](#4-launch-instance-chart-browser)
5. [Instances Management](#5-instances-management)
6. [Cluster Monitoring](#6-cluster-monitoring)
7. [Setup — Clusters](#7-setup--clusters)
8. [Setup — Registries](#8-setup--registries)
9. [Setup — Users (Admin)](#9-setup--users-admin)
10. [Navigation](#10-navigation)

---

## 1. Overview

**OCDP (Open Cloud Deployment Platform)** is a Kubernetes LLM inference deployment platform. Its primary use case is: a user selects a `vllm-serve` Helm chart from a Harbor registry, fills in the instance name, namespace, and values, and the backend pulls the packaged OCI Helm chart and deploys it to a configured Kubernetes cluster via the Helm SDK.

### Architecture

```
Frontend (React 18 + TypeScript + Vite + TailwindCSS)
    |
    | HTTP /api/*
    v
Nginx (Reverse Proxy / Static File Server)
    |
    | HTTP /api/*
    v
Backend (Go 1.24 + Gorilla Mux + Hexagonal Architecture)
    |
    +---> PostgreSQL (persistence)
    +---> ORAS SDK (OCI chart pull)
    +---> Helm SDK (deploy/upgrade/delete)
    +---> client-go (Kubernetes API)
```

### Tech Stack

| Layer      | Technology                                                          |
|------------|---------------------------------------------------------------------|
| Frontend   | React 18, TypeScript, Vite, TailwindCSS, React Router, Lucide icons |
| Backend    | Go 1.24, Gorilla Mux, PostgreSQL, ORAS SDK, Helm SDK, client-go     |
| Gateway    | Nginx (reverse proxy + static file serving)                         |
| Database   | PostgreSQL                                                          |
| Deployment | Docker Compose                                                      |

---

## 2. Login / Authentication

### Access

The frontend is deployed at `http://10.6.80.114:18080`. Navigating to the root URL redirects to the login page.
### Login Page

The login page displays:

- **OCDP Console** title with a shield icon
- Subtitle: "Sign in with an account created by an administrator"
- **Username** text field (required)
- **Password** text field (required, masked)
- **Login** button — blue, centered, full-width

When you click **Login**:

1. A toast notification "Logging in..." appears briefly
2. The button shows a spinning loader and "Logging in..." text
3. On success: a "Welcome, {username}!" toast appears, and you are redirected to `/home`
4. On failure: a red error message is shown below the button (e.g., "Invalid credentials" or "Network error")

### Default Admin Credentials

If the system was bootstrapped via `.env` configuration, the default admin credentials are:

- **Username:** `admin` (or whatever was set as `BOOTSTRAP_ADMIN_USER`)
- **Password:** The value of `BOOTSTRAP_ADMIN_PASS` in your `.env` file

### JWT Session Behavior

- The backend issues JWT tokens upon successful login
- The frontend stores the tokens and sends them as `Authorization: Bearer {token}` headers
- Session persists until the token expires or the user signs out
- Clicking the **logout icon** (top-right corner, person icon with a door arrow) signs the user out

### Routing When Authenticated

- Unauthenticated users are always redirected to `/` (login page)
- Authenticated users visiting `/` are redirected to `/home`
- Protected routes are wrapped in a `ProtectedRoute` component; unauthorized access redirects to login
- Route-level access is enforced per user role (admin vs regular user)

---

## 3. Home Page

The home page at `/home` is the main landing page after login. It has three sections.

### Section 1: Primary Actions (3 Cards)

A large card titled "One Click Deployment Platform" / "Operations Workbench" contains three action cards arranged in a row:

**1. Launch Instance**

- Icon: Rocket (blue background)
- Description: "Browse Helm charts and deploy a new inference service."
- Clicking navigates to `/artifact/registries`
- Shows "Open" with an arrow on hover

**2. Instances**

- Icon: Package (emerald background)
- Description: "Check release status, entries, upgrades, and deletion."
- Clicking navigates to `/artifact/instances`
- Shows "Open" with an arrow on hover

**3. Cluster Monitoring**

- Icon: Activity (dark slate background)
- Description: "Inspect cluster health and node resource pressure."
- Clicking navigates to `/monitoring/clusters`
- Shows "Open" with an arrow on hover

Each card:

- Has a subtle border, slate background, and hover effect (lifts slightly, adds blue border)
- Shows a colored icon box at the top
- Shows title and description
- Has an "Open" link with right-arrow at the bottom

### Section 2: Runtime Focus Sidebar

On the right side of the primary actions is a smaller card titled "Runtime Focus" with subtitle "High-frequency checks":

- **Release status** — clickable row that navigates to `/artifact/instances`. Subtitle: "Installed, failed, deleting"
- **Cluster health** — clickable row that navigates to `/monitoring/clusters`. Subtitle: "Nodes, pods, CPU, memory"

### Section 3: Setup

The bottom section, titled "Setup" with subtitle "Less frequent administrative tasks", contains three buttons in a row:

1. **Clusters** — Server icon. Description: "Kubeconfig and namespace policy". Navigates to `/configuration/clusters`
2. **Registries** — Database icon. Description: "Harbor robot account and chart access". Navigates to `/configuration/registries`
3. **Users** — Users icon. Description: "Admin-only account management". Navigates to `/configuration/users`. Only visible to admin users.

---

## 4. Launch Instance (Chart Browser)

The Artifact Browser page at `/artifact/registries` is the chart browser for selecting and launching Helm charts.
### Page Layout

The page is a split-pane layout:

- **Left sidebar (w-80):** Registry tree with search
- **Right main panel:** Repository info, tags, and launch actions

### Left Panel: Registry Tree

**Header bar** (top of page):

- Title: "Chart Browser"
- Subtitle: "Select a Harbor chart and launch it into a Kubernetes cluster"
- **Refresh** button (secondary style, refresh icon) — reloads all registries and repositories, clears cache

**Search bar** at the top of the sidebar:

- Placeholder: "Search registries / repositories..."
- Filters the tree as you type (matches registry name and repository name)
- Shows a search icon on the left

**Registry nodes** listed below the search bar:

- Each registry shows:
  - Chevron (down/right) to expand/collapse
  - Database icon (blue)
  - Registry name
  - Registry URL (truncated, small text)
  - Badge showing count of repositories
- Registries are expanded by default
- Clicking a registry header toggles expansion

**Repository items** under each registry:

- Each shows the repository name
- Clicking a repository selects it (highlighted blue background) and loads its artifacts in the right panel
- Shows artifact count if available
- If no repositories: shows "No chart repositories found."
- If loading: shows "Loading repositories..."

### Right Panel: Repository Details

When a repository is selected:

**Repository header:**

- Label: "Chart repository" (uppercase, small)
- Repository name (large, bold)
- Registry name below
- **Filter chips:** Two toggle buttons: "Charts" (default selected, blue) and "All tags"
  - "Charts" filter shows only artifacts of type `chart` (i.e., deployable Helm charts)
  - "All tags" shows every artifact version regardless of type

**Artifact grid** (responsive: 1-3 columns):

- Each artifact is displayed in a **TagCard** component (see below)

When no repository is selected:

- Shows empty state: "Select a repository" with "Choose a chart repository from the left panel."
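
The sidebar search and the "Charts" / "All tags" filter chips described in this section can be sketched as two small pure functions. This is only an illustration: the `Artifact` shape, the `TagFilter` values, and both function names are hypothetical, and case-insensitive matching is an assumption, since the guide only says the search matches registry and repository names.

```typescript
// Hypothetical artifact shape; the guide only states that each artifact has a tag and a type.
type ArtifactType = "chart" | "image" | "other";

interface Artifact {
  tag: string;
  type: ArtifactType;
}

// Hypothetical identifiers for the two filter chips.
type TagFilter = "charts" | "all";

// "Charts" keeps only deployable Helm charts; "All tags" keeps every artifact.
function filterArtifacts(artifacts: Artifact[], filter: TagFilter): Artifact[] {
  if (filter === "all") {
    return artifacts;
  }
  return artifacts.filter((a) => a.type === "chart");
}

// A repository stays visible when the query matches the registry name or the
// repository name. Case-insensitive substring matching is an assumption.
function matchesSearch(registryName: string, repositoryName: string, query: string): boolean {
  const q = query.trim().toLowerCase();
  if (q === "") {
    return true; // an empty search box shows the whole tree
  }
  return (
    registryName.toLowerCase().indexOf(q) !== -1 ||
    repositoryName.toLowerCase().indexOf(q) !== -1
  );
}
```

Keeping both checks free of UI state makes them easy to reuse for the tree and the artifact grid, whatever component structure the real frontend uses.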
### TagCard Component

Each TagCard shows:

- **Type icon**: Package (chart), Box (image), or File (other) with color-coded background
- **Tag name** (e.g., `1.0.0`) with a type badge (e.g., "chart" in blue)
- **Repository path** (truncated)
- **Size** in KB or MB (e.g., "12.5 MB")

**TagCard buttons:**

1. **Launch button** — Blue button, only visible when the artifact type is `chart`. Shows rocket icon + "Launch". Opens the LaunchModal.
2. **Copy button** — White button with copy icon. Copies the `helm pull oci://...` command to the clipboard. Shows a success toast.

### LaunchModal

Opens when "Launch" is clicked on a chart artifact. Title: "Launch Instance" with rocket icon.

**Modal header:**

- Shows the repository name and tag (e.g., `vllm-serve:1.0.0`)

**Form fields:**

1. **Target Cluster** (required)
   - Dropdown select listing all configured clusters
   - Auto-selects the first available cluster (or user's default cluster)
   - If no clusters: shows an amber warning "No clusters available. Please add a cluster first."
   - Shows loading state while fetching

2. **Instance Name** (required)
   - Text input, placeholder: "my-app"
   - Help text: "Lowercase alphanumeric characters, '-' or '.'"

3. **Namespace** (required)
   - If the selected cluster has allowed namespaces: shows a dropdown of allowed namespaces
   - If no restrictions: shows a text input, default "default"
   - If namespace is controlled by workspace policy: input is disabled with a blue info notice

4. **Description** (optional)
   - Text input, placeholder: "Optional description"

5. **Configuration Values** — Three input modes:

   **a) Quick mode** (default):
   - Blue info box explaining "Quick launch uses the chart defaults"
   - Shows badges: "No values override" and if available "Chart values.yaml available"
   - If `values.yaml` exists: "Load Defaults from values.yaml" button switches to YAML mode with defaults pre-filled
   - Best for simple deployments with no custom overrides

   **b) Guided mode** (form):
   - Only available when the chart provides a JSON Schema for its values
   - Dynamically generates form fields based on the schema
   - Supports various schema types: string, number, boolean, object, array
   - **"Load Defaults"** button — fills in values from the schema defaults
   - Shows schema-generated form in a scrollable container

   **c) YAML mode**:
   - A code editor (textarea) for entering custom values in YAML format
   - Real-time YAML validation with error display
   - "Load Defaults from values.yaml" button
   - "Load Schema Defaults" button (if no values.yaml but schema exists)
   - "Clear" button to reset the YAML
   - Help text changes based on whether schema is available

6. **Artifact Info** (read-only summary):
   - Repository name
   - Tag (badge)
   - Type

**Footer buttons:**

- **Cancel** (secondary) — closes the modal
- **Launch** (success/green style with rocket icon) — submits the deployment

**Validation on submit:**

- Cluster must be selected
- Instance name must not be empty
- Namespace must not be empty
- If namespace policy restricts namespaces, the selected namespace must be in the allowed list
- YAML values are parsed and validated before submission

**After successful submit:**

- Form resets
- Modal closes
- Navigates to `/artifact/instances` to show the deploying instance
- Shows "Instance deployed successfully" toast

**Error states:**

- Loading clusters fails: error toast
- Missing required fields: validation error toast
- YAML parse error: inline error + toast
- API failure: error toast with message

---

## 5. Instances Management

The Instances page at `/artifact/instances` manages all deployed Helm releases across clusters.

### Stats Cards

Three gradient stat cards at the top (shown when clusters exist):

1. **Total Instances** (blue) — total count across all clusters
2. **Clusters** (emerald) — number of clusters
3. **Showing** (violet) — count of currently displayed instances (only shown when filtering across 2+ clusters)

### Filter Controls

When more than one cluster exists, a filter bar appears:

- **Filter by Cluster** dropdown with "All Clusters" and each cluster with instance count
- Selecting a cluster filters the instance list to that cluster only

### Instance Display

Instances are grouped by cluster when "All Clusters" is selected, each cluster section showing:

- Cluster name with instance count
- Instances in a responsive 2-column grid

### InstanceCard Component

Each card shows:

**Header:**

- Instance name (bold, large)
- Repository name with version badge (cyan)
- **Status badge** with colored background glow:

| Status           | Badge Color | Description                                 |
|------------------|-------------|---------------------------------------------|
| Deployed         | Emerald     | Deployment completed successfully           |
| Failed           | Rose/Red    | Last operation reported a failure           |
| Pending Install  | Amber       | Installation is in progress                 |
| Pending Upgrade  | Amber       | Upgrade is in progress                      |
| Pending Rollback | Amber       | Rollback is in progress                     |
| Pending Delete   | Orange      | Deletion is in progress                     |
| Superseded       | Indigo      | A newer revision has replaced this instance |
| Uninstalled      | Slate       | Instance has been removed from the cluster  |
| Unknown          | Slate       | Awaiting next state update                  |

- Status reason text (e.g., "Deployment completed successfully."
  or a custom message)
- Last operation label (Install / Upgrade / Rollback / Delete / Sync)

**Details grid:**

- **Namespace** — purple icon
- **Revision** — green icon, Helm revision number
- **Repository** — full-width, truncated, monospace
- **Launched** — date the instance was created

**Last error alert** (conditionally shown):

- Red alert box with warning icon
- Shows the last error message if the instance encountered errors

**Action buttons (5 buttons in a row):**

1. **Refresh** — Refresh icon. Refreshes the status of this specific instance from the cluster.
2. **Entries** — Network icon (emerald). Opens Entries modal.
3. **Diagnostics** — Activity icon (indigo). Opens Diagnostics modal.
4. **Modify** — Settings icon (blue). Opens Modify modal.
5. **Delete** — Stop icon (rose/red). Prompts confirmation, then deletes.

### Empty / Loading / Error States

- **Loading:** Shows spinning indicator with "Loading instances..."
- **Error:** Shows error state with retry button
- **Empty:** "No instances found" with link to launch from registries
- **Auto-refresh:** Data refreshes every 30 seconds silently

### Entries Modal

Displays network entry information for the instance.
**Header:**

- Title: "Instance Entries"
- Instance name and namespace

**Data Source badge:**

- **Live from Kubernetes** (green) — fetched directly from the cluster
- **From Helm Manifest** (blue) — extracted from Helm manifest
- **From Helm Notes** (yellow) — from Helm release notes
- **No Data Available** (gray)

**Services section** (if any):

- Lists each Kubernetes Service with:
  - Service name and type badge
  - Cluster IP (copyable)
  - Ports with mapping (e.g., `80 → 8080 TCP`, NodePort)
  - LoadBalancer entries (if applicable) with external link and copy

**Ingresses section** (if any):

- Lists each Kubernetes Ingress with:
  - Ingress name and class
  - Host with external link and copy buttons
  - Path routing (e.g., `/ → service:80`)
  - TLS indicator if HTTPS is configured

**Helm Notes** (as fallback):

- Raw Helm notes text shown in a monospace pre block

**Footer:** Close button

### Diagnostics Modal

Provides Kubernetes-level diagnostics for the instance.

**Header:**

- "Runtime diagnostics" label
- Instance name
- Namespace and data collection timestamp

**Refresh button** in the header — reloads diagnostics data from Kubernetes

**Three tabs:**

**1. Describe tab** (default):

- Summary metrics: Pods count, Services count, Events count
- **Pods section** — each pod shows:
  - Pod name, node, pod IP, restart count
  - Status badge (Running=success, other=warning)
  - Containers with name, state badge, image, and reason/message
- **Services section** — each service with name, type badge, ClusterIP, ports

**2. Events tab**:

- Kubernetes events sorted by time
- Each event shows: type badge (Warning / Normal), reason, timestamp, message, involved object, count

**3. Pod Logs tab**:

- Logs from each container, labeled by pod/container name
- Monospace display on dark background
- **Copy Logs** button copies all logs to clipboard
- Last 300 lines are fetched per container

**Error states:**

- Loading fails: error toast
- No data: amber info box "Diagnostics data is not available"
- Empty pods/events/logs: relevant empty state message

### Modify Modal

Allows modifying an existing instance.

**Header:** "Modify Instance - {name}" with settings icon

**Current info section** (read-only):

- Current version
- Cluster ID
- Repository

**Fields:**

1. **Version Tag** — text input, pre-filled with current version. Help: "Leave unchanged to keep current version"
2. **Description** — text input
3. **Configuration Values** — Form or YAML mode (auto-detects if schema exists):
   - **Form mode:** Dynamic form generated from values schema, with real-time sync to YAML
   - **YAML mode:** Textarea with monospace font, pre-filled with current values

**Footer:** Cancel / Modify buttons

**After submit:** Instance is upgraded via Helm, data refreshes, modal closes.

---

## 6. Cluster Monitoring

The Monitoring page at `/monitoring/clusters` shows the health and resource usage of all configured Kubernetes clusters.

### Summary Stats Cards

Four stat cards at the top:

1. **Total Clusters** (blue) — total number of clusters
2. **Healthy** (green) — clusters with status "healthy"
3. **Warning** (orange) — clusters with status "warning" or "unknown"
4. **Error** (red) — clusters with status "error" or "unhealthy"

### Auto-Refresh

- The page auto-refreshes every **30 seconds**
- A small note shows "Auto-refresh every 30 seconds" with a refresh indicator
- Manual **Refresh** button in the page header

### ClusterMonitorCard

Each cluster is shown in an expandable card.
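
As a side note on the Summary Stats Cards above, the grouping of raw cluster statuses into the Healthy, Warning, and Error counts can be sketched as follows. The function and type names are hypothetical, the status strings are taken from the list above, and two details are assumptions: statuses are compared case-insensitively, and any unrecognized status is counted as a warning.

```typescript
// The three buckets shown on the summary cards.
type SummaryBucket = "healthy" | "warning" | "error";

// Maps a raw cluster status string to its summary bucket.
function bucketFor(status: string): SummaryBucket {
  switch (status.toLowerCase()) {
    case "healthy":
      return "healthy";
    case "error":
    case "unhealthy":
      return "error";
    case "warning":
    case "unknown":
      return "warning";
    default:
      // Assumption: statuses not listed in the guide fall back to "warning".
      return "warning";
  }
}

// Produces the four numbers shown on the stat cards.
function summarize(statuses: string[]) {
  const counts = { total: statuses.length, healthy: 0, warning: 0, error: 0 };
  for (const status of statuses) {
    counts[bucketFor(status)] += 1;
  }
  return counts;
}
```

With this grouping the three bucket counts always add up to the total, so a cluster can never drop out of the summary because of an unexpected status value.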
**Card header:**

- Status icon (green check, yellow warning, red X, or gray question mark)
- Cluster name with status badge (Healthy / Warning / Error)

**Metrics grid** (4 columns):

- **Uptime** — how long the cluster has been running
- **Nodes** — node count
- **Pods** — pod count
- **GPU** — used/total GPU count

**Resource usage** (3 columns with progress bars):

- **CPU** — used/total, percentage bar, max per node with peak usage
- **Memory** — used/total, percentage bar, max per node with peak usage
- **GPU** (only if GPUs exist) — used/total, percentage bar, max per node with peak usage

**Last checked** timestamp

**Show Nodes / Hide Nodes** toggle button (only if nodes exist)

### NodeMetricCard (Expandable Nodes)

When nodes are expanded, each node shows:

- Node name with status icon (Ready green / NotReady red)
- Status badge (Ready / NotReady) and role badge (Control Plane / Worker)
- Age
- **CPU** — usage/allocatable with progress bar
- **Memory** — usage/allocatable with progress bar
- **GPU** — usage/capacity with progress bar (shows "No GPU" if none)
- Additional info: Pod count, Kubelet version

### States

- **Loading:** Shows "Loading cluster monitoring data..."
- **Error:** Error state with retry button, "Failed to Load Clusters"
- **Empty:** "No Clusters Available" with suggestion to add clusters in configuration

---

## 7. Setup — Clusters

The Cluster Configuration page at `/configuration/clusters` manages Kubernetes cluster connections.

### Page Header

- Title: "Configuration - Clusters"
- Description changes based on role (admin sees "Manage all...", regular user sees "Manage your private...")
- **Refresh** button (secondary)
- **Add Cluster** button (primary, blue, plus icon)

### ClusterList Component

**Loading state:** Spinner with "Loading clusters..."

**Empty state:** Server icon with "No clusters" and "Add your first cluster..."
**Cluster cards** (2-column grid):

- Cluster name with server icon and visibility label (Private / Workspace / Global)
- Description (if any)
- Three action buttons:
  - **Test Connection** (Activity icon, emerald) — performs a health check against the cluster
  - **Edit** (pencil icon, blue) — opens edit modal
  - **Delete** (trash icon, red) — prompts confirmation then deletes
- API Server URL (monospace, truncated)
- Auth status grid (3 columns): CA Certificate, Client Cert, Client Key — each shows "✅ Configured" or "✗ Not Configured"
- Created date

Action buttons may be disabled based on user permissions (read-only access).

### Add / Edit Cluster Modal

**Form fields:**

1. **Cluster Name** (required) — text input, e.g., "Production Cluster"
2. **API Server URL** (required) — must start with `http://` or `https://`, e.g., `https://kubernetes.example.com:6443`
3. **CA Certificate (Base64)** — required for create unless a bearer token is supplied. Textarea for base64-encoded CA cert. In edit mode: shows current status and optional new input.
4. **Client Certificate (Base64)** — required for create unless a bearer token is supplied. In edit mode: shows current status.
5. **Client Key (Base64)** — required for create unless a bearer token is supplied. In edit mode: shows current status.
6. **Bearer Token** — optional alternative to client certificates. Textarea for service account token.
7. **Description** — optional textarea

**Validation:**

- Name and API Server URL required
- URL must start with `http://` or `https://`
- Create mode requires either token OR all three certificate fields
- Edit mode: certificate fields are optional (leave blank to keep existing)

**Footer:** Cancel / Add Cluster or Save

### Health Check

Clicking the **Test Connection** (Activity) button on a cluster:

- Shows "Testing cluster..." toast
- If successful: green toast with success message
- If failed: red toast with error message
- Checks connectivity to the Kubernetes API server

---

## 8. Setup — Registries

The Registry Configuration page at `/configuration/registries` manages OCI registry connections.
### Page Header

- Title: "Configuration - Registries"
- **Refresh** button (secondary)
- **Add Registry** button (primary, blue, plus icon)

### RegistryList Component

**Loading state:** "Loading registries..."

**Empty state:** Database icon with "No registries" and "Add your first registry..."

**Registry cards** (vertical list):

- Registry name with database icon and visibility label
- **Insecure** badge (yellow, if insecure flag is on)
- Registry URL (clickable link, opens in new tab)
- Description (if any)
- Username display
- Two action buttons:
  - **Edit** (pencil icon, blue) — opens edit modal
  - **Delete** (trash icon, red) — prompts confirmation then deletes

### Add / Edit Registry Modal

**Form fields:**

1. **Name** (required) — e.g., "Harbor Production"
2. **Registry URL** (required) — e.g., `https://registry.example.com`
3. **Username** (required) — registry username (Harbor robot account recommended)
4. **Password** — required for create. In edit mode: shows current status ("Password set - encrypted") and optional new password input
5. **Description** — optional textarea
6. **Insecure** — checkbox. "Allow insecure connection (skip SSL certificate verification)" — for registries using HTTP or self-signed certs

**Test Connection button** (in edit mode only, after saving):

- Tests the registry connectivity by calling the backend health endpoint
- Button shows a pulsing test tube icon while testing
- Shows success/failure toast

**Footer:** Save / Test Connection / Cancel

---

## 9. Setup — Users (Admin)

The User Management page at `/configuration/users` is **admin-only**. Non-admin users cannot access this route.

### Page Header

- "Admin only" label with shield icon
- Title: "User Management"
- Description: "Create accounts, assign roles, and disable access without public self-registration."
- **Refresh** button (secondary)

### Create User Form (Left Panel)

- **Username** (required) — text input
- **Initial password** (required) — masked input
- **Role** dropdown — "User" or "Admin"

When **User** role is selected, additional fields appear:

**Tenant namespace section:**

- **Namespace** — text input, auto-generated from username as `ocdp-u-{username}`
- **Default cluster** — dropdown of available clusters

**Resource limits section:**

- **CPU** — default "4" (Kubernetes quantity, e.g., "4" or "500m")
- **Memory** — default "16Gi"
- **GPU** — default "0" (integer count)
- **GPU Mem** — default "0" (integer MB, e.g., 10000)
- Help text explains the units

**Checkbox:**

- "Require password change after first login" — checked by default

**Create User button** (primary, full-width, user-plus icon)

### Accounts Table (Right Panel)

A table with columns:

| Column    | Content                                                    |
|-----------|------------------------------------------------------------|
| User      | Username + email                                           |
| Role      | Badge: "admin" (info blue) or "user" (secondary)           |
| Status    | Badge: "Active" (green) or "Disabled" (warning)            |
| Namespace | Namespace + workspace name + default cluster               |
| Quota     | CPU, Memory, GPU/GPU Mem (admin shows "default workspace") |
| Actions   | See below                                                  |

**Actions per row (4 buttons):**

1. **Make User / Make Admin** — toggles the user's role between admin and user
2. **Limits** (pencil icon) — opens the Edit Limits modal (only for non-admin users)
3. **Enable / Disable** — toggles the user's active status (disabled for own account)
4. **Delete** (trash icon, red) — deletes user after confirmation (disabled for own account)

### Edit Limits Modal

Opens when "Limits" button is clicked for a non-admin user:

- **Tenant limits** label with gauge icon
- User's name as title
- Description: "Changes are applied to workspace metadata..."
- Fields: Namespace, Default cluster, CPU, Memory, GPU, GPU Memory
- **Cancel** / **Save Limits** buttons

---

## 10. Navigation

### Left Sidebar

The sidebar shows the "Operations" branding at the top with the following navigation items:

| Item                    | Icon             | Route                       |
|-------------------------|------------------|-----------------------------|
| Home                    | Home (gray)      | `/home`                     |
| Launch Instance         | Rocket (blue)    | `/artifact/registries`      |
| Instances               | Boxes (emerald)  | `/artifact/instances`       |
| Cluster Monitoring      | LineChart (teal) | `/monitoring/clusters`      |
| **Setup** (collapsible) | Settings         |                             |
| └ Clusters              | Server (teal)    | `/configuration/clusters`   |
| └ Registries            | Database         | `/configuration/registries` |
| └ Users                 | Users (blue)     | `/configuration/users`      |

- The Setup section is expanded by default
- Active nav item is highlighted with a blue background
- Sidebar collapses on mobile with a hamburger menu toggle
- Navigation items are filtered dynamically based on user role:
  - "Users" is only shown to admin users
  - Routes are also protected server-side

### Page Header / Breadcrumbs

Each page shows a header in the top navigation bar with:

- Page icon
- Page title (e.g., "Launch Instance", "Instances", "Setup - Clusters")
- Current user's name and role badge on the right
- **Sign Out** button (door icon with arrow, top-right)

The title mapping is:

| Route                       | Header Title          |
|-----------------------------|-----------------------|
| `/artifact/registries`      | Launch Instance       |
| `/artifact/instances`       | Instances             |
| `/configuration/clusters`   | Setup - Clusters      |
| `/configuration/registries` | Setup - Registries    |
| `/configuration/users`      | Setup - Users         |
| `/monitoring/clusters`      | Monitoring - Clusters |
| `/home`                     | OCDP Platform         |

### Legacy Route Redirects

Several legacy URL patterns redirect to current routes:

- `/config/*` → `/configuration/clusters`
- `/monitor`, `/cluster`, `/cluster/monitor` → `/monitoring/clusters`
- `/artifact/registry` → `/artifact/registries`
- `/artifact/instance` → `/artifact/instances`
- `/registry` → `/artifact/registries`
- `/register` → `/`

---
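
The legacy redirect rules listed above can be expressed as a small lookup. This is a minimal sketch rather than the frontend's actual router configuration: `resolveLegacyRoute` and `EXACT_REDIRECTS` are hypothetical names, and treating `/config/*` as covering `/config` itself is an assumption.

```typescript
// Exact-match legacy paths and their current targets, as listed in the guide.
const EXACT_REDIRECTS: [string, string][] = [
  ["/monitor", "/monitoring/clusters"],
  ["/cluster", "/monitoring/clusters"],
  ["/cluster/monitor", "/monitoring/clusters"],
  ["/artifact/registry", "/artifact/registries"],
  ["/artifact/instance", "/artifact/instances"],
  ["/registry", "/artifact/registries"],
  ["/register", "/"],
];

// Hypothetical helper: returns the current route for a legacy path,
// or the path unchanged when no redirect applies.
function resolveLegacyRoute(path: string): string {
  // Assumption: `/config/*` covers `/config` and everything below it.
  // The trailing slash in the prefix check keeps `/configuration/...` untouched.
  if (path === "/config" || path.indexOf("/config/") === 0) {
    return "/configuration/clusters";
  }
  for (const [legacy, current] of EXACT_REDIRECTS) {
    if (legacy === path) {
      return current;
    }
  }
  return path;
}
```

Checking the `/config/` prefix with its trailing slash matters because the current routes under `/configuration/` share the same leading characters and must not be redirected.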