OCDP Platform User Guide
Table of Contents
- Overview
- Login / Authentication
- Home Page
- Launch Instance (Chart Browser)
- Instances Management
- Cluster Monitoring
- Setup — Clusters
- Setup — Registries
- Setup — Users (Admin)
- Navigation
1. Overview
OCDP (Open Cloud Deployment Platform) deploys LLM inference services to Kubernetes. In the primary workflow, a user selects a vllm-serve Helm chart from a Harbor registry and fills in the instance name, namespace, and values. The backend then pulls the packaged OCI Helm chart and deploys it to a configured Kubernetes cluster via the Helm SDK.
Architecture
Frontend (React 18 + TypeScript + Vite + TailwindCSS)
|
| HTTP /api/*
v
Nginx (Reverse Proxy / Static File Server)
|
| HTTP /api/*
v
Backend (Go 1.24 + Gorilla Mux + Hexagonal Architecture)
|
+---> PostgreSQL (persistence)
+---> ORAS SDK (OCI chart pull)
+---> Helm SDK (deploy/upgrade/delete)
+---> client-go (Kubernetes API)
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, TailwindCSS, React Router, Lucide icons |
| Backend | Go 1.24, Gorilla Mux, PostgreSQL, ORAS SDK, Helm SDK, client-go |
| Gateway | Nginx (reverse proxy + static file serving) |
| Database | PostgreSQL |
| Deployment | Docker Compose |
2. Login / Authentication
Access
The frontend is deployed at http://10.6.80.114:18080. Navigating to the root URL redirects to the login page.
Login Page
The login page displays:
- OCDP Console title with a shield icon
- Subtitle: "Sign in with an account created by an administrator"
- Username text field (required)
- Password text field (required, masked)
- Login button — blue, centered, full-width
When you click Login:
- A toast notification "Logging in..." appears briefly
- The button shows a spinning loader and "Logging in..." text
- On success: a "Welcome, {username}!" toast appears, and you are redirected to /home
- On failure: a red error message is shown below the button (e.g., "Invalid credentials" or "Network error")
Default Admin Credentials
If the system was bootstrapped via .env configuration, the default admin credentials are:
- Username: admin (or whatever was set as BOOTSTRAP_ADMIN_USER)
- Password: The value of BOOTSTRAP_ADMIN_PASS in your .env file
JWT Session Behavior
- The backend issues JWT tokens upon successful login
- The frontend stores the tokens and sends them as Authorization: Bearer <token> headers
- Session persists until the token expires or the user signs out
- Clicking the logout icon (top-right corner, person icon with a door arrow) signs the user out
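As a minimal sketch of the bearer-token behavior listed above, the helper below builds the headers for an API request. The function name `buildAuthHeaders` and the choice to omit the header when no token is stored are illustrative assumptions, not the frontend's actual code.

```typescript
// Hypothetical helper, not taken from the OCDP frontend.
function buildAuthHeaders(token: string | null): Record<string, string> {
  const headers: Record<string, string> = {};
  if (token) {
    // Format described in this guide: "Authorization: Bearer <token>"
    headers["Authorization"] = "Bearer " + token;
  }
  return headers;
}
```

A signed-out session (no token) produces no Authorization header, so the backend would treat the request as unauthenticated.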
Routing When Authenticated
- Unauthenticated users are always redirected to / (login page)
- Authenticated users visiting / are redirected to /home
- Protected routes are wrapped in a ProtectedRoute component; unauthorized access redirects to login
- Route-level access is enforced per user role (admin vs regular user)
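The routing rules above can be sketched as a pure decision function. This is not the actual ProtectedRoute implementation: the names `decideRoute`, `SessionUser`, and `ADMIN_ONLY_ROUTES` are hypothetical, and the "denied" outcome for a signed-in non-admin is an assumption about how the role check surfaces.

```typescript
type Role = "admin" | "user";
interface SessionUser { username: string; role: Role; }

// Assumption: user management is the only admin-only route named in this guide.
const ADMIN_ONLY_ROUTES = ["/configuration/users"];

type RouteDecision =
  | { kind: "render" }
  | { kind: "redirect"; to: string }
  | { kind: "denied" };

function decideRoute(path: string, user: SessionUser | null): RouteDecision {
  if (path === "/") {
    // The root URL is the login page; signed-in users skip it.
    return user ? { kind: "redirect", to: "/home" } : { kind: "render" };
  }
  if (!user) return { kind: "redirect", to: "/" };
  if (ADMIN_ONLY_ROUTES.indexOf(path) !== -1 && user.role !== "admin") {
    return { kind: "denied" };
  }
  return { kind: "render" };
}
```

Keeping the decision separate from rendering makes each rule easy to check in isolation.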
3. Home Page
The home page at /home is the main landing page after login. It has three sections.
Section 1: Primary Actions (3 Cards)
A large card titled "One Click Deployment Platform" / "Operations Workbench" contains three action cards arranged in a row:
1. Launch Instance
- Icon: Rocket (blue background)
- Description: "Browse Helm charts and deploy a new inference service."
- Clicking navigates to /artifact/registries
- Shows "Open" with an arrow on hover
2. Instances
- Icon: Package (emerald background)
- Description: "Check release status, entries, upgrades, and deletion."
- Clicking navigates to /artifact/instances
- Shows "Open" with an arrow on hover
3. Cluster Monitoring
- Icon: Activity (dark slate background)
- Description: "Inspect cluster health and node resource pressure."
- Clicking navigates to /monitoring/clusters
- Shows "Open" with an arrow on hover
Each card:
- Has a subtle border, slate background, and hover effect (lifts slightly, adds blue border)
- Shows a colored icon box at the top
- Shows title and description
- Has an "Open" link with right-arrow at the bottom
Section 2: Runtime Focus Sidebar
On the right side of the primary actions, a smaller card titled "Runtime Focus" with subtitle "High-frequency checks":
- Release status — clickable row that navigates to /artifact/instances. Subtitle: "Installed, failed, deleting"
- Cluster health — clickable row that navigates to /monitoring/clusters. Subtitle: "Nodes, pods, CPU, memory"
Section 3: Setup
A bottom section titled "Setup" with subtitle "Less frequent administrative tasks". Contains three buttons in a row:
- Clusters — Server icon. Description: "Kubeconfig and namespace policy". Navigates to /configuration/clusters
- Registries — Database icon. Description: "Harbor robot account and chart access". Navigates to /configuration/registries
- Users — Users icon. Description: "Admin-only account management". Navigates to /configuration/users. Only visible to admin users.
4. Launch Instance (Chart Browser)
The Artifact Browser page at /artifact/registries is the chart browser for selecting and launching Helm charts.
Page Layout
The page is a split-pane layout:
- Left sidebar (w-80): Registry tree with search
- Right main panel: Repository info, tags, and launch actions
Left Panel: Registry Tree
Header bar (top of page):
- Title: "Chart Browser"
- Subtitle: "Select a Harbor chart and launch it into a Kubernetes cluster"
- Refresh button (secondary style, refresh icon) — reloads all registries and repositories, clears cache
Search bar at the top of the sidebar:
- Placeholder: "Search registries / repositories..."
- Filters the tree as you type (matches registry name and repository name)
- Shows a search icon on the left
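One way the tree filter could behave is sketched below. The guide only states that the search matches registry and repository names. Case-insensitive substring matching, keeping every repository when the registry name itself matches, and the names `RegistryNode` and `filterRegistryTree` are all assumptions.

```typescript
interface RegistryNode { name: string; repositories: string[]; }

function filterRegistryTree(tree: RegistryNode[], query: string): RegistryNode[] {
  const q = query.trim().toLowerCase();
  if (q === "") return tree; // empty search shows the full tree
  const result: RegistryNode[] = [];
  tree.forEach((registry) => {
    if (registry.name.toLowerCase().indexOf(q) !== -1) {
      result.push(registry); // registry name matches: keep all its repositories
      return;
    }
    const repos = registry.repositories.filter(
      (repo) => repo.toLowerCase().indexOf(q) !== -1
    );
    // Registries with no matching repositories are dropped from the view.
    if (repos.length > 0) result.push({ name: registry.name, repositories: repos });
  });
  return result;
}
```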
Registry nodes listed below the search bar:
- Each registry shows:
- Chevron (down/right) to expand/collapse
- Database icon (blue)
- Registry name
- Registry URL (truncated, small text)
- Badge showing count of repositories
- Registries are expanded by default
- Clicking a registry header toggles expansion
Repository items under each registry:
- Each shows the repository name
- Clicking a repository selects it (highlighted blue background) and loads its artifacts in the right panel
- Shows artifact count if available
- If no repositories: shows "No chart repositories found."
- If loading: shows "Loading repositories..."
Right Panel: Repository Details
When a repository is selected:
Repository header:
- Label: "Chart repository" (uppercase, small)
- Repository name (large, bold)
- Registry name below
- Filter chips: Two toggle buttons: "Charts" (default selected, blue) and "All tags"
- "Charts" filter shows only artifacts of type chart (i.e., deployable Helm charts)
- "All tags" shows every artifact version regardless of type
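The two filter chips amount to a simple predicate over the artifact type. The sketch below assumes hypothetical names (`ArtifactTag`, `applyTagFilter`) and the three artifact types this guide mentions for TagCard icons (chart, image, other).

```typescript
type ArtifactType = "chart" | "image" | "other";
interface ArtifactTag { tag: string; type: ArtifactType; }
type TagFilter = "charts" | "all";

function applyTagFilter(tags: ArtifactTag[], filter: TagFilter): ArtifactTag[] {
  // "charts" keeps deployable Helm charts only; "all" returns every tag.
  return filter === "charts" ? tags.filter((t) => t.type === "chart") : tags;
}
```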
Artifact grid (responsive: 1-3 columns):
- Each artifact is displayed in a TagCard component (see below)
When no repository is selected:
- Shows empty state: "Select a repository" with "Choose a chart repository from the left panel."
TagCard Component
Each TagCard shows:
- Type icon: Package (chart), Box (image), or File (other) with color-coded background
- Tag name (e.g., 1.0.0) with a type badge (e.g., "chart" in blue)
- Repository path (truncated)
- Size in KB or MB (e.g., "12.5 MB")
TagCard buttons:
- Launch button — Blue button, only visible when the artifact type is chart. Shows rocket icon + "Launch". Opens the LaunchModal.
- Copy button — White button with copy icon. Copies the helm pull oci://... command to the clipboard. Shows a success toast.
LaunchModal
Opens when "Launch" is clicked on a chart artifact. Title: "Launch Instance" with rocket icon.
Modal header:
- Shows the repository name and tag (e.g., vllm-serve:1.0.0)
Form fields:
- Target Cluster (required)
- Dropdown select listing all configured clusters
- Auto-selects the first available cluster (or user's default cluster)
- If no clusters: shows an amber warning "No clusters available. Please add a cluster first."
- Shows loading state while fetching
- Instance Name (required)
- Text input, placeholder: "my-app"
- Help text: "Lowercase alphanumeric characters, '-' or '.'"
- Namespace (required)
- If the selected cluster has allowed namespaces: shows a dropdown of allowed namespaces
- If no restrictions: shows a text input, default "default"
- If namespace is controlled by workspace policy: input is disabled with a blue info notice
- Description (optional)
- Text input, placeholder: "Optional description"
- Configuration Values — Three input modes:
a) Quick mode (default):
- Blue info box explaining "Quick launch uses the chart defaults"
- Shows badges: "No values override" and if available "Chart values.yaml available"
- If values.yaml exists: "Load Defaults from values.yaml" button switches to YAML mode with defaults pre-filled
- Best for simple deployments with no custom overrides
b) Guided mode (form):
- Only available when the chart provides a JSON Schema for its values
- Dynamically generates form fields based on the schema
- Supports various schema types: string, number, boolean, object, array
- "Load Defaults" button — fills in values from the schema defaults
- Shows schema-generated form in a scrollable container
c) YAML mode:
- A code editor (textarea) for entering custom values in YAML format
- Real-time YAML validation with error display
- "Load Defaults from values.yaml" button
- "Load Schema Defaults" button (if no values.yaml but schema exists)
- "Clear" button to reset the YAML
- Help text changes based on whether schema is available
- Artifact Info (read-only summary):
- Repository name
- Tag (badge)
- Type
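To illustrate what "Load Defaults" in Guided mode might do, the sketch below collects `default` values from a JSON Schema into a values object. It only handles the `default` and `properties` keywords; arrays, `$ref`, and composition keywords are ignored. The names `ValuesSchema` and `schemaDefaults` are hypothetical, and the real form generator may resolve defaults differently.

```typescript
interface ValuesSchema {
  type?: string;
  default?: unknown;
  properties?: { [key: string]: ValuesSchema };
}

// Returns the default value for a schema node, or undefined if none is declared.
function schemaDefaults(schema: ValuesSchema): unknown {
  if (schema.default !== undefined) return schema.default;
  const props = schema.properties;
  if (!props) return undefined;
  const out: { [key: string]: unknown } = {};
  Object.keys(props).forEach((key) => {
    const value = schemaDefaults(props[key]);
    if (value !== undefined) out[key] = value; // skip fields without defaults
  });
  return Object.keys(out).length > 0 ? out : undefined;
}
```

Fields without a declared default are left out of the result, so the chart's own values.yaml would still apply to them at install time.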
Footer buttons:
- Cancel (secondary) — closes the modal
- Launch (success/green style with rocket icon) — submits the deployment
Validation on submit:
- Cluster must be selected
- Instance name must not be empty
- Namespace must not be empty
- If namespace policy restricts namespaces, the selected namespace must be in the allowed list
- YAML values are parsed and validated before submission
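The submit-time checks above can be sketched as a function that returns a list of error messages. The names (`LaunchForm`, `validateLaunchForm`), the message wording, and the convention that an empty `allowedNamespaces` array means "no restriction" are assumptions. YAML parsing is left out because it needs a parser library.

```typescript
interface LaunchForm {
  clusterId: string;
  instanceName: string;
  namespace: string;
}

// allowedNamespaces: empty array is treated as "no namespace restriction" (assumption).
function validateLaunchForm(form: LaunchForm, allowedNamespaces: string[]): string[] {
  const errors: string[] = [];
  if (form.clusterId.trim() === "") errors.push("Cluster must be selected");
  if (form.instanceName.trim() === "") errors.push("Instance name must not be empty");
  if (form.namespace.trim() === "") {
    errors.push("Namespace must not be empty");
  } else if (
    allowedNamespaces.length > 0 &&
    allowedNamespaces.indexOf(form.namespace) === -1
  ) {
    errors.push("Namespace is not in the allowed list");
  }
  return errors;
}
```

An empty result means the form can be submitted; otherwise each message could be shown as a validation toast.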
After successful submit:
- Form resets
- Modal closes
- Navigates to /artifact/instances to show the deploying instance
- Shows "Instance deployed successfully" toast
Error states:
- Loading clusters fails: error toast
- Missing required fields: validation error toast
- YAML parse error: inline error + toast
- API failure: error toast with message
5. Instances Management
The Instances page at /artifact/instances manages all deployed Helm releases across clusters.
Stats Cards
Three gradient stat cards at the top (shown when clusters exist):
- Total Instances (blue) — total count across all clusters
- Clusters (emerald) — number of clusters
- Showing (violet) — count of currently displayed instances (only shown when filtering across 2+ clusters)
Filter Controls
When more than one cluster exists, a filter bar appears:
- Filter by Cluster dropdown with "All Clusters" and each cluster with instance count
- Selecting a cluster filters the instance list to that cluster only
Instance Display
Instances are grouped by cluster when "All Clusters" is selected, each cluster section showing:
- Cluster name with instance count
- Instances in a responsive 2-column grid
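The filtering and grouping described above could look like the following sketch. The shape `InstanceSummary` and both function names are hypothetical; `null` stands for the "All Clusters" choice.

```typescript
interface InstanceSummary { name: string; cluster: string; }

// selected === null models the "All Clusters" option.
function filterByCluster(instances: InstanceSummary[], selected: string | null): InstanceSummary[] {
  return selected === null ? instances : instances.filter((i) => i.cluster === selected);
}

// Groups instances under their cluster name, preserving first-seen order.
function groupByCluster(instances: InstanceSummary[]): { [cluster: string]: InstanceSummary[] } {
  const groups: { [cluster: string]: InstanceSummary[] } = {};
  instances.forEach((inst) => {
    const bucket = groups[inst.cluster] || [];
    bucket.push(inst);
    groups[inst.cluster] = bucket;
  });
  return groups;
}
```

The per-cluster counts shown in the dropdown and section headers would then be the length of each group.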
InstanceCard Component
Each card shows:
Header:
- Instance name (bold, large)
- Repository name with version badge (cyan)
- Status badge with colored background glow:
| Status | Badge Color | Description |
|---|---|---|
| Deployed | Emerald | Deployment completed successfully |
| Failed | Rose/Red | Last operation reported a failure |
| Pending Install | Amber | Installation is in progress |
| Pending Upgrade | Amber | Upgrade is in progress |
| Pending Rollback | Amber | Rollback is in progress |
| Pending Delete | Orange | Deletion is in progress |
| Superseded | Indigo | A newer revision has replaced this instance |
| Uninstalled | Slate | Instance has been removed from the cluster |
| Unknown | Slate | Awaiting next state update |
- Status reason text (e.g., "Deployment completed successfully." or a custom message)
- Last operation label (Install / Upgrade / Rollback / Delete / Sync)
Details grid:
- Namespace — purple icon
- Revision — green icon, Helm revision number
- Repository — full-width, truncated, monospace
- Launched — date the instance was created
Last error alert (conditionally shown):
- Red alert box with warning icon
- Shows the last error message if the instance encountered errors
Action buttons (5 buttons in a row):
- Refresh — Refresh icon. Refreshes the status of this specific instance from the cluster.
- Entries — Network icon (emerald). Opens Entries modal.
- Diagnostics — Activity icon (indigo). Opens Diagnostics modal.
- Modify — Settings icon (blue). Opens Modify modal.
- Delete — Stop icon (rose/red). Prompts confirmation, then deletes.
Empty / Loading / Error States
- Loading: Shows spinning indicator with "Loading instances..."
- Error: Shows error state with retry button
- Empty: "No instances found" with link to launch from registries
- Auto-refresh: Data refreshes every 30 seconds silently
Entries Modal
Displays network entry information for the instance.
Header:
- Title: "Instance Entries"
- Instance name and namespace
Data Source badge:
- Live from Kubernetes (green) — fetched directly from the cluster
- From Helm Manifest (blue) — extracted from Helm manifest
- From Helm Notes (yellow) — from Helm release notes
- No Data Available (gray)
Services section (if any):
- Lists each Kubernetes Service with:
- Service name and type badge
- Cluster IP (copyable)
- Ports with mapping (e.g., 80 → 8080 TCP, NodePort)
- LoadBalancer entries (if applicable) with external link and copy
Ingresses section (if any):
- Lists each Kubernetes Ingress with:
- Ingress name and class
- Host with external link and copy buttons
- Path routing (e.g., / → service:80)
- TLS indicator if HTTPS is configured
Helm Notes (as fallback):
- Raw Helm notes text shown in a monospace pre block
Footer: Close button
Diagnostics Modal
Provides Kubernetes-level diagnostics for the instance.
Header:
- "Runtime diagnostics" label
- Instance name
- Namespace and data collection timestamp
Refresh button in the header — reloads diagnostics data from Kubernetes
Three tabs:
1. Describe tab (default):
- Summary metrics: Pods count, Services count, Events count
- Pods section — each pod shows:
- Pod name, node, pod IP, restart count
- Status badge (Running=success, other=warning)
- Containers with name, state badge, image, and reason/message
- Services section — each service with name, type badge, ClusterIP, ports
2. Events tab:
- Kubernetes events sorted by time
- Each event shows: type badge (Warning/Normal), reason, timestamp, message, involved object, count
3. Pod Logs tab:
- Logs from each container, labeled by pod/container name
- Monospace display on dark background
- Copy Logs button copies all logs to clipboard
- Last 300 lines are fetched per container
Error states:
- Loading fails: error toast
- No data: amber info box "Diagnostics data is not available"
- Empty pods/events/logs: relevant empty state message
Modify Modal
Allows modifying an existing instance.
Header: "Modify Instance - {name}" with settings icon
Current info section (read-only):
- Current version
- Cluster ID
- Repository
Fields:
- Version Tag — text input, pre-filled with current version. Help: "Leave unchanged to keep current version"
- Description — text input
- Configuration Values — Form or YAML mode (auto-detects if schema exists):
- Form mode: Dynamic form generated from values schema, with real-time sync to YAML
- YAML mode: Textarea with monospace font, pre-filled with current values
Footer: Cancel / Modify buttons
After submit: Instance is upgraded via Helm, data refreshes, modal closes.
6. Cluster Monitoring
The Monitoring page at /monitoring/clusters shows the health and resource usage of all configured Kubernetes clusters.
Summary Stats Cards
Four stat cards at the top:
- Total Clusters (blue) — total number of clusters
- Healthy (green) — clusters with status "healthy"
- Warning (orange) — clusters with status "warning" or "unknown"
- Error (red) — clusters with status "error" or "unhealthy"
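The bucketing rules for the summary cards can be written down directly. In this sketch the type and function names are hypothetical; the mapping of "unknown" to Warning and "unhealthy" to Error follows the list above.

```typescript
type ClusterHealth = "healthy" | "warning" | "unknown" | "error" | "unhealthy";

interface HealthSummary { total: number; healthy: number; warning: number; error: number; }

function summarizeHealth(statuses: ClusterHealth[]): HealthSummary {
  const summary: HealthSummary = { total: statuses.length, healthy: 0, warning: 0, error: 0 };
  statuses.forEach((s) => {
    if (s === "healthy") summary.healthy += 1;
    else if (s === "warning" || s === "unknown") summary.warning += 1;
    else summary.error += 1; // "error" or "unhealthy"
  });
  return summary;
}
```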
Auto-Refresh
- The page auto-refreshes every 30 seconds
- A small note shows "Auto-refresh every 30 seconds" with a refresh indicator
- Manual Refresh button in the page header
ClusterMonitorCard
Each cluster is shown in an expandable card.
Card header:
- Status icon (green check, yellow warning, red X, or gray question mark)
- Cluster name with status badge (Healthy / Warning / Error)
Metrics grid (4 columns):
- Uptime — how long the cluster has been running
- Nodes — node count
- Pods — pod count
- GPU — used/total GPU count
Resource usage (3 columns with progress bars):
- CPU — used/total, percentage bar, max per node with peak usage
- Memory — used/total, percentage bar, max per node with peak usage
- GPU (only if GPUs exist) — used/total, percentage bar, max per node with peak usage
Last checked timestamp
Show Nodes / Hide Nodes toggle button (only if nodes exist)
NodeMetricCard (Expandable Nodes)
When nodes are expanded, each node shows:
- Node name with status icon (Ready green / NotReady red)
- Status badge (Ready / NotReady) and role badge (Control Plane / Worker)
- Age
- CPU — usage/allocatable with progress bar
- Memory — usage/allocatable with progress bar
- GPU — usage/capacity with progress bar (shows "No GPU" if none)
- Additional info: Pod count, Kubelet version
States
- Loading: Shows "Loading cluster monitoring data..."
- Error: Error state with retry button, "Failed to Load Clusters"
- Empty: "No Clusters Available" with suggestion to add clusters in configuration
7. Setup — Clusters
The Cluster Configuration page at /configuration/clusters manages Kubernetes cluster connections.
Page Header
- Title: "Configuration - Clusters"
- Description changes based on role (admin sees "Manage all...", regular user sees "Manage your private...")
- Refresh button (secondary)
- Add Cluster button (primary, blue, plus icon)
ClusterList Component
Loading state: Spinner with "Loading clusters..."
Empty state: Server icon with "No clusters" and "Add your first cluster..."
Cluster cards (2-column grid):
- Cluster name with server icon and visibility label (Private / Workspace / Global)
- Description (if any)
- Three action buttons:
- Test Connection (Activity icon, emerald) — performs a health check against the cluster
- Edit (pencil icon, blue) — opens edit modal
- Delete (trash icon, red) — prompts confirmation then deletes
- API Server URL (monospace, truncated)
- Auth status grid (3 columns): CA Certificate, Client Cert, Client Key — each shows "✅ Configured" or "✗ Not Configured"
- Created date
Action buttons may be disabled based on user permissions (read-only access).
Add / Edit Cluster Modal
Form fields:
- Cluster Name (required) — text input, e.g., "Production Cluster"
- API Server URL (required) — must start with https://, e.g., https://kubernetes.example.com:6443
- CA Certificate (Base64) — required for create. Textarea for base64-encoded CA cert. In edit mode: shows current status and optional new input.
- Client Certificate (Base64) — required for create. In edit mode: shows current status.
- Client Key (Base64) — required for create. In edit mode: shows current status.
- Bearer Token — optional alternative to client certificates. Textarea for service account token.
- Description — optional textarea
Validation:
- Name and API Server URL required
- URL must start with http:// or https://
- Create mode requires either token OR all three certificate fields
- Edit mode: certificate fields are optional (leave blank to keep existing)
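A sketch of these validation rules follows. The field names in `ClusterForm`, the function name, and the error wording are assumptions; the rules themselves (required name and URL, URL scheme, token or all three certificate fields on create, optional credentials on edit) come from the list above.

```typescript
interface ClusterForm {
  name: string;
  apiServerUrl: string;
  caCert: string;
  clientCert: string;
  clientKey: string;
  bearerToken: string;
}

function validateClusterForm(form: ClusterForm, mode: "create" | "edit"): string[] {
  const errors: string[] = [];
  if (form.name.trim() === "") errors.push("Cluster name is required");
  const url = form.apiServerUrl.trim();
  if (url === "") {
    errors.push("API server URL is required");
  } else if (!/^https?:\/\//.test(url)) {
    errors.push("API server URL must start with http:// or https://");
  }
  if (mode === "create") {
    const hasToken = form.bearerToken.trim() !== "";
    const hasAllCerts = [form.caCert, form.clientCert, form.clientKey].every(
      (value) => value.trim() !== ""
    );
    if (!hasToken && !hasAllCerts) {
      errors.push("Provide a bearer token or all three certificate fields");
    }
  }
  // Edit mode: blank credential fields mean "keep the existing value".
  return errors;
}
```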
Footer: Cancel / Add Cluster or Save
Health Check
Clicking the Test Connection (Activity) button on a cluster:
- Shows "Testing cluster..." toast
- If successful: green toast with success message
- If failed: red toast with error message
- Checks connectivity to the Kubernetes API server
8. Setup — Registries
The Registry Configuration page at /configuration/registries manages OCI registry connections.
Page Header
- Title: "Configuration - Registries"
- Refresh button (secondary)
- Add Registry button (primary, blue, plus icon)
RegistryList Component
Loading state: "Loading registries..."
Empty state: Database icon with "No registries" and "Add your first registry..."
Registry cards (vertical list):
- Registry name with database icon and visibility label
- Insecure badge (yellow, if insecure flag is on)
- Registry URL (clickable link, opens in new tab)
- Description (if any)
- Username display
- Two action buttons:
- Edit (pencil icon, blue) — opens edit modal
- Delete (trash icon, red) — prompts confirmation then deletes
Add / Edit Registry Modal
Form fields:
- Name (required) — e.g., "Harbor Production"
- Registry URL (required) — e.g., https://registry.example.com
- Username (required) — registry username (Harbor robot account recommended)
- Password — required for create. In edit mode: shows current status ("Password set - encrypted") and optional new password input
- Description — optional textarea
- Insecure — checkbox. "Allow insecure connection (skip SSL certificate verification)" — for registries using HTTP or self-signed certs
Test Connection button (in edit mode only, after saving):
- Tests the registry connectivity by calling the backend health endpoint
- Button shows a pulsing test tube icon while testing
- Shows success/failure toast
Footer: Save / Test Connection / Cancel
9. Setup — Users (Admin)
The User Management page at /configuration/users is admin-only. Non-admin users cannot access this route.
Page Header
- "Admin only" label with shield icon
- Title: "User Management"
- Description: "Create accounts, assign roles, and disable access without public self-registration."
- Refresh button (secondary)
Create User Form (Left Panel)
- Username (required) — text input
- Initial password (required) — masked input
- Role dropdown — "User" or "Admin"
When User role is selected, additional fields appear:
Tenant namespace section:
- Namespace — text input, auto-generated from username as ocdp-u-{username}
- Default cluster — dropdown of available clusters
Resource limits section:
- CPU — default "4" (Kubernetes quantity, e.g., "4" or "500m")
- Memory — default "16Gi"
- GPU — default "0" (integer count)
- GPU Mem — default "0" (integer MB, e.g., 10000)
- Help text explains the units
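To make the units concrete, the sketch below builds the form defaults listed above and converts the CPU and memory quantities to plain numbers. All names are hypothetical. The parsers cover only whole or decimal cores, millicores (such as "500m"), and the binary memory suffixes Ki, Mi, Gi, and Ti. Kubernetes accepts more quantity forms than this.

```typescript
interface TenantDefaults {
  namespace: string;
  cpu: string;
  memory: string;
  gpu: number;
  gpuMemMb: number;
}

// Defaults as listed in this guide; no username normalization is applied here.
function defaultTenantSettings(username: string): TenantDefaults {
  return { namespace: "ocdp-u-" + username, cpu: "4", memory: "16Gi", gpu: 0, gpuMemMb: 0 };
}

// "4" means 4 cores; "500m" means 500 millicores, i.e. 0.5 cores.
function parseCpuCores(quantity: string): number {
  const q = quantity.trim();
  if (/^\d+m$/.test(q)) return parseInt(q, 10) / 1000;
  if (/^\d+(\.\d+)?$/.test(q)) return parseFloat(q);
  throw new Error("Unsupported CPU quantity: " + quantity);
}

// "16Gi" means 16 * 1024^3 bytes.
function parseMemoryBytes(quantity: string): number {
  const match = /^(\d+)(Ki|Mi|Gi|Ti)$/.exec(quantity.trim());
  if (!match) throw new Error("Unsupported memory quantity: " + quantity);
  const exponent = ["Ki", "Mi", "Gi", "Ti"].indexOf(match[2]) + 1;
  return parseInt(match[1], 10) * Math.pow(1024, exponent);
}
```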
Checkbox:
- "Require password change after first login" — checked by default
Create User button (primary, full-width, user-plus icon)
Accounts Table (Right Panel)
A table with columns:
| Column | Content |
|---|---|
| User | Username + email |
| Role | Badge: "admin" (info blue) or "user" (secondary) |
| Status | Badge: "Active" (green) or "Disabled" (warning) |
| Namespace | Namespace + workspace name + default cluster |
| Quota | CPU, Memory, GPU/GPU Mem (admin shows "default workspace") |
| Actions | See below |
Actions per row (4 buttons):
- Make User / Make Admin — toggles the user's role between admin and user
- Limits (pencil icon) — opens the Edit Limits modal (only for non-admin users)
- Enable / Disable — toggles the user's active status (disabled for own account)
- Delete (trash icon, red) — deletes user after confirmation (disabled for own account)
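The per-row rules can be summarized as a function of the row and the signed-in admin. This is a sketch with hypothetical names (`AccountRow`, `RowActions`, `rowActions`). It assumes each toggle button is labeled with the action it would perform.

```typescript
interface AccountRow { username: string; role: "admin" | "user"; active: boolean; }

interface RowActions {
  roleToggleLabel: "Make User" | "Make Admin";
  canEditLimits: boolean;
  statusToggleLabel: "Enable" | "Disable";
  canToggleStatus: boolean;
  canDelete: boolean;
}

function rowActions(row: AccountRow, currentUsername: string): RowActions {
  const isOwnAccount = row.username === currentUsername;
  return {
    roleToggleLabel: row.role === "admin" ? "Make User" : "Make Admin",
    canEditLimits: row.role !== "admin", // limits apply to non-admin users only
    statusToggleLabel: row.active ? "Disable" : "Enable",
    canToggleStatus: !isOwnAccount, // admins cannot disable themselves
    canDelete: !isOwnAccount, // admins cannot delete themselves
  };
}
```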
Edit Limits Modal
Opens when "Limits" button is clicked for a non-admin user:
- Tenant limits label with gauge icon
- User's name as title
- Description: "Changes are applied to workspace metadata..."
- Fields: Namespace, Default cluster, CPU, Memory, GPU, GPU Memory
- Cancel / Save Limits buttons
10. Navigation
Left Sidebar
The sidebar shows the "Operations" branding at the top with the following navigation items:
| Item | Icon | Route |
|---|---|---|
| Home | Home (gray) | /home |
| Launch Instance | Rocket (blue) | /artifact/registries |
| Instances | Boxes (emerald) | /artifact/instances |
| Cluster Monitoring | LineChart (teal) | /monitoring/clusters |
| Setup (collapsible) | Settings | |
| └ Clusters | Server (teal) | /configuration/clusters |
| └ Registries | Database | /configuration/registries |
| └ Users | Users (blue) | /configuration/users |
- The Setup section is expanded by default
- Active nav item is highlighted with a blue background
- Sidebar collapses on mobile with a hamburger menu toggle
- Navigation items are filtered dynamically based on user role:
- "Users" is only shown to admin users
- Routes are protected server-side too
Page Header / Breadcrumbs
Each page shows a header in the top navigation bar with:
- Page icon
- Page title (e.g., "Launch Instance", "Instances", "Setup - Clusters")
- Current user's name and role badge on the right
- Sign Out button (door icon with arrow, top-right)
The title mapping is:
| Route | Header Title |
|---|---|
| /artifact/registries | Launch Instance |
| /artifact/instances | Instances |
| /configuration/clusters | Setup - Clusters |
| /configuration/registries | Setup - Registries |
| /configuration/users | Setup - Users |
| /monitoring/clusters | Monitoring - Clusters |
| /home | OCDP Platform |
Legacy Route Redirects
Several legacy URL patterns redirect to current routes:
- /config/* → /configuration/clusters
- /monitor, /cluster, /cluster/monitor → /monitoring/clusters
- /artifact/registry → /artifact/registries
- /artifact/instance → /artifact/instances
- /registry → /artifact/registries
- /register → /
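The legacy redirects can be sketched as a lookup table plus one wildcard rule. This reflects one reading of the redirect list and is not the router configuration itself. The names `LEGACY_REDIRECTS` and `resolveLegacyRoute` are hypothetical, and treating a bare /config as part of the /config/* wildcard is an assumption.

```typescript
const LEGACY_REDIRECTS: { [path: string]: string } = {
  "/monitor": "/monitoring/clusters",
  "/cluster": "/monitoring/clusters",
  "/cluster/monitor": "/monitoring/clusters",
  "/artifact/registry": "/artifact/registries",
  "/artifact/instance": "/artifact/instances",
  "/registry": "/artifact/registries",
  "/register": "/",
};

// Returns the current route for a legacy path, or the path unchanged.
function resolveLegacyRoute(path: string): string {
  // Wildcard rule: /config and anything below it, but not /configuration/...
  if (/^\/config(\/|$)/.test(path)) return "/configuration/clusters";
  const target = LEGACY_REDIRECTS[path];
  return typeof target === "string" ? target : path;
}
```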