refactor: full-stack restructure with multi-tenancy, workspace management, and K8s diagnostics

- Add Workspace domain (entity, repository, service, handler, DTO)
- Add multi-tenant K8s client with tenant binding and quota management
- Add K8s diagnostics client (instance diagnostics)
- Add authorization middleware (authz package)
- Restructure frontend to feature-based architecture (features/)
- Add User Management page in configuration
- Add AccessDenied page and route guards
- Refactor shared components (form inputs, layout, UI)
- Update Tailwind config for new design system
- Add comprehensive documentation (docs/, tasks/, plans)
- Improve cluster service with better kubeconfig handling
- Add tests for crypto, config, helm client, tenant binding
164
docs/bug-report.md
Normal file
@@ -0,0 +1,164 @@
# OCDP System Test Bug Report

**Test date:** 2026-05-11
**Test environment:** http://10.6.80.114:18080
**Clusters:** k3s (dbf824f1-9962-4d8e-881e-870c75fdb6f5), k8s (23880994-dfe4-48d0-abc0-b49692cc630a)
**Harbor:** harbor.bwgdi.com (83b823af-873b-457c-912c-9ccde3cb12e6)

---

## Test Team

| Agent | Role | Account |
|-------|------|---------|
| user-a-agent | Frontend UI testing | test-user-a / TestUserA123! |
| user-b-agent | API / deployment testing | test-user-b / TestUserB123! |
| user-c-agent | Permission isolation testing | test-user-c / TestUserC123! |
| security-agent | Security testing | admin + regular users |

---

## Bug List (sorted by severity)

### P0 - Blocker (core functionality unavailable)

| ID | Title | Found by | Page / Endpoint | Description |
|----|-------|----------|-----------------|-------------|
| BUG-001 | **Launch button does nothing when clicked** | user-a | `/artifact/registries` (TagCard) | The "Launch" button on a TagCard in the Chart Browser appears enabled (`is_enabled() == True`), but clicking it has no effect: the Launch Modal does not open, the URL does not change, and no console error is logged. **The core "one-click deploy" flow is completely blocked** |
| BUG-002 | **Direct SPA routes return a blank page** | user-a | `/clusters`, `/registries`, `/monitoring`, `/launch` | Visiting a legacy SPA route directly renders only the empty `<div id="root">` shell; the React SPA fails to mount. Redirect mappings are defined in the code but do not take effect (e.g. `/clusters` → `/configuration/clusters`) |

### P1 - High

| ID | Title | Found by | Page / Endpoint | Description |
|----|-------|----------|-----------------|-------------|
| BUG-003 | DELETE instance returns 404 although the deletion succeeds | user-b, user-c | `DELETE /clusters/{id}/instances/{id}` | The delete correctly triggers the `pending-delete` state transition, but the HTTP response is **404** (empty body) instead of the expected 202/204. Clients misread it as a failure |
| BUG-004 | DELETE instance returns an empty response body | user-b | `DELETE /clusters/{id}/instances/{id}` | A request with a valid token and ID returns an empty body (no JSON), so frontend parsing fails |

### P2 - Medium

| ID | Title | Found by | Page / Endpoint | Description |
|----|-------|----------|-----------------|-------------|
| BUG-005 | Dedicated tags endpoint missing | user-b | `GET /registries/{id}/repositories/{repo}/tags` | The endpoint is not implemented and returns plain text "404 page not found". Tags can still be obtained via `/artifacts`, but the API is incomplete |
| BUG-006 | Cross-user namespace silently overridden on deploy | user-c | `POST /clusters/{id}/instances` | When a user requests deployment into another user's namespace, the server silently uses the caller's own namespace and returns 200 with no warning or notice |
| BUG-007 | Cluster Metrics API missing | user-b | `GET /monitoring/clusters/{id}/metrics` | A data endpoint that the monitoring page may need is not implemented (404) |
| BUG-008 | Cluster Stats API missing | user-b | `GET /clusters/{id}/stats` | The statistics endpoint is not implemented (404) |
| BUG-009 | Kubeconfig API missing | user-b | `GET /clusters/{id}/kubeconfig` | The kubeconfig issuance endpoint is not implemented (404) |
| BUG-010 | "Launch" button lacks an accessible label | user-a | TagCard "Launch" | The "Launch" button on a chart has no `aria-label`, and its label collides with the sidebar "Launch Instance" navigation item, so screen reader users cannot tell them apart |

### P3 - Low

| ID | Title | Found by | Page / Endpoint | Description |
|----|-------|----------|-----------------|-------------|
| BUG-011 | Inconsistent API response format | user-b | List APIs | Clusters/Registries return bare arrays; Instances returns a `{ "instances": [...], "total": N }` wrapper object |
| BUG-012 | `/auth/me` returns empty token fields | user-b | `GET /auth/me` | The response contains empty `"accessToken": ""` and `"refreshToken": ""` fields; the login response DTO is reused without clearing them |
| BUG-013 | Login endpoint allows user enumeration | security | `POST /auth/login` | A nonexistent user yields "user not found" while an existing user yields "invalid password", so an attacker can enumerate valid usernames |
| BUG-014 | No rate limiting on the login endpoint | security | `POST /auth/login` | 10 consecutive requests all return 401, with no 429 throttling or lockout |
| BUG-015 | Nginx version disclosure | security | HTTP Headers | `Server: nginx/1.27.5` exposes the exact version number |
| BUG-016 | Overly permissive CORS configuration | security | All API | `Access-Control-Allow-Origin: *` allows cross-origin requests from any origin |
| BUG-017 | Missing security response headers | security | All pages | HSTS, X-Frame-Options, Content-Security-Policy and others are missing |
| BUG-018 | `/health` endpoint returns SPA HTML | security | `GET /health` | The health check returns the full index.html rather than a JSON status response |

---

## Summary by Category

### Frontend Bugs
| ID | Description | Severity |
|----|-------------|----------|
| BUG-001 | Launch button unresponsive (core feature blocked) | P0 🔴 |
| BUG-002 | Blank page on SPA routes | P0 🔴 |
| BUG-010 | Launch button missing aria-label | P2 🟡 |

### Backend API Bugs
| ID | Description | Severity |
|----|-------------|----------|
| BUG-003 | DELETE returns 404 | P1 🟠 |
| BUG-004 | DELETE returns empty body | P1 🟠 |
| BUG-005 | Tags endpoint missing | P2 🟡 |
| BUG-007 | Metrics API missing | P2 🟡 |
| BUG-008 | Stats API missing | P2 🟡 |
| BUG-009 | Kubeconfig API missing | P2 🟡 |
| BUG-011 | Inconsistent response format | P3 🔵 |
| BUG-012 | auth/me empty token fields | P3 🔵 |
| BUG-018 | /health returns HTML | P3 🔵 |

### Security / Permission Bugs
| ID | Description | Severity |
|----|-------------|----------|
| BUG-006 | Silent namespace override (secure but confusing) | P2 🟡 |
| BUG-013 | User enumeration (differing error messages) | P3 🔵 |
| BUG-014 | No rate limiting | P3 🔵 |
| BUG-015 | Nginx version disclosure | P3 🔵 |
| BUG-016 | CORS Origin: * | P3 🔵 |
| BUG-017 | Missing security response headers | P3 🔵 |

### Severity Distribution
| Level | Count |
|-------|-------|
| P0 (Blocker) | 2 |
| P1 (High) | 2 |
| P2 (Medium) | 6 |
| P3 (Low) | 8 |
| **Total** | **18** |

---

## Passed Tests

### Authentication
- [x] Login with valid credentials (admin + all test-user accounts)
- [x] Invalid credentials return 401
- [x] Accessing protected APIs without a token returns 401
- [x] Invalid/tampered JWT tokens are all rejected
- [x] /auth/me returns correct user information
- [x] JWT payload contains role, permissions, namespace

### Cluster / Registry API
- [x] Cluster list returns normally
- [x] Cluster health check works
- [x] Registry list returns normally
- [x] Browsing repositories through the artifacts endpoint works
- [x] Invalid registry/repository returns an appropriate error

### Permission Isolation
- [x] GET /users returns 403 (regular user)
- [x] POST /auth/register returns 403 (regular user)
- [x] Users cannot access other users' workspace resources
- [x] Users cannot deploy into other users' Kubernetes namespaces
- [x] Security architecture: core authentication, authorization, masking, and isolation controls are all implemented correctly

### Instance Deployment Lifecycle
- [x] Instance creation succeeds (pending-install)
- [x] Instance status is tracked correctly (pending-install → deployed)
- [x] Instance deletion transitions state correctly (pending-delete → gone)
- [x] Instance list is filtered correctly by clusterId

### Passed Security Tests
- [x] XSS/SQLi injection handled safely
- [x] Path traversal attacks blocked
- [x] JWT alg=none / invalid formats rejected
- [x] Cluster credentials and registry passwords are displayed masked (••••••••)
- [x] Self-registration endpoint requires authentication (401)

---

## Recommended Fix Priority

### Fix Immediately (P0)
1. **BUG-001**: Investigate the Launch button's onClick handler. Either the `onLaunch` prop in the TagCard component is not passed through to LaunchModal correctly, or a launch state / artifactType check prevents the modal from opening
2. **BUG-002**: Check the React Router `<Navigate>` redirect components and the SPA's index.html configuration to ensure legacy routes redirect correctly

### Fix Soon (P1)
3. **BUG-003/004**: InstanceHandler.Delete should return 202 Accepted + `{"status":"deleting"}` instead of 404 with an empty body

### Short-Term Fixes (P2)
4. Implement the missing APIs such as `/metrics` and `/stats`
5. Add an `aria-label` attribute to the Launch button
6. Return a warning or 403 when the namespace is overridden

### Security Hardening (P3)
7. Unify login error messages to "Invalid username or password"
8. Implement rate limiting
9. Harden Nginx: `server_tokens off` + security response headers
10. Restrict CORS to specific origins
11. Fix the `/health` endpoint
12. Unify the API response format
92
docs/bugs-user-a.md
Normal file
@@ -0,0 +1,92 @@
# OCDP Platform QA Report - test-user-a

**Date:** 2026-05-11
**Environment:** http://10.6.80.114:18080
**User:** test-user-a (non-admin)

## Summary

- **Total Bugs Found:** 3
- **Screenshots Taken:** 12
- **Test Status:** 7/8 areas covered, 1 blocked (Launch button non-functional)

---

## Bug List

### Bug #1: Direct SPA Routes Return Empty Pages (🔴 HIGH)

- **Page:** Multiple: `/clusters`, `/registries`, `/monitoring`, `/launch`
- **Action:** Navigate directly to these URLs
- **Actual:** Returns only the React `<div id="root">` shell with no rendered content (~0 chars body text). The SPA fails to mount when hitting these routes directly.
- **Expected:** Should either render content or redirect to correct working routes:
  - `/clusters` → `/configuration/clusters`
  - `/registries` → `/configuration/registries`
  - `/monitoring` → `/monitoring/clusters`
  - `/launch` → `/artifact/registries`
- **Severity:** HIGH. Users who bookmark or type these URLs see blank pages
- **Screenshot:** `01-login` (representative of empty state)

**Working routes for reference:**
- `/configuration/clusters` ✅
- `/configuration/registries` ✅
- `/monitoring/clusters` ✅
- `/artifact/registries` ✅
- `/artifact/instances` ✅

---

### Bug #2: Launch Button Does Nothing When Clicked (🔴 HIGH)

- **Page:** Chart Browser (`/artifact/registries`)
- **Action:**
  1. Navigate to `/artifact/registries`
  2. Registry `harbor-bwgdi` loads with 13 charts
  3. Expand `charts/chromadb` folder
  4. Tag `0.1.4` appears with "Launch" and "Copy" buttons
  5. Click the "Launch" button
- **Actual:** No visible reaction: no modal opens, no URL change, no console error. The button is not disabled (no `disabled` attribute, no `aria-disabled`), is visibly styled as active (`bg-blue-50 text-blue-700 border-blue-200 shadow-sm`), and Playwright confirms `is_enabled() == True`. The React onClick handler produces no observable effect.
- **Expected:** Clicking "Launch" on a chart tag should open a deployment form/dialog with cluster selector, instance name, namespace, and values configuration fields.
- **Severity:** HIGH. Core platform feature (deploying Helm charts) is completely blocked
- **Screenshot:** `04-chart-expanded`

---

### Bug #3: Ambiguous "Launch" Button Labels (🟡 MEDIUM)

- **Page:** Chart Browser (`/artifact/registries`)
- **Action:** Inspect button accessible names
- **Actual:** Both the sidebar navigation item "Launch Instance" and the chart action button "Launch" appear on the same page. The chart action button has no distinguishing `aria-label` or accessible description. The "Copy" button next to it has a `title="Copy pull command"` attribute, but "Launch" does not.
- **Expected:** The chart action should have a descriptive label like `aria-label="Launch chart chromadb version 0.1.4"` to differentiate from the nav item.
- **Severity:** MEDIUM. Accessibility concern; minor confusion for sighted users with multiple "Launch" targets

---

## Test Results by Area

| Area | Status | Notes |
|------|--------|-------|
| Login | ✅ PASS | test-user-a login successful, redirect to `/home` |
| Home Page | ✅ PASS | All cards visible, nav clicks work, no Users section |
| Sidebar Nav | ✅ PASS | All 6 items navigate correctly, Users hidden |
| Chart Browser | ❌ BLOCKED | Registry loads, charts expand, but **Launch button dead** |
| Instances | ✅ PASS | Empty state, filter, refresh all work |
| Monitoring | ✅ PASS | 2 clusters, health data, CPU/Memory/GPU stats all load |
| Config - Clusters | ✅ PASS | Both clusters listed, Add form opens |
| Config - Registries | ✅ PASS | Harbor registry listed, Add form opens |
| Direct Routes | ❌ FAIL | 4 routes return empty pages |

## Screenshots

- `01-login` → `/tmp/ocdp-qa-screenshots/01-login.png`
- `02-home` → `/tmp/ocdp-qa-screenshots/02-home.png`
- `02-home-full` → `/tmp/ocdp-qa-screenshots/02-home-full.png`
- `04-chart-browser` → `/tmp/ocdp-qa-screenshots/04-chart-browser.png`
- `04-chart-expanded` → `/tmp/ocdp-qa-screenshots/04-chart-expanded.png`
- `04-launch-modal` → `/tmp/ocdp-qa-screenshots/04-launch-modal.png`
- `05-instances` → `/tmp/ocdp-qa-screenshots/05-instances.png`
- `06-monitoring` → `/tmp/ocdp-qa-screenshots/06-monitoring.png`
- `07-clusters` → `/tmp/ocdp-qa-screenshots/07-clusters.png`
- `07-add-cluster-form` → `/tmp/ocdp-qa-screenshots/07-add-cluster-form.png`
- `08-registries` → `/tmp/ocdp-qa-screenshots/08-registries.png`
- `08-add-registry-form` → `/tmp/ocdp-qa-screenshots/08-add-registry-form.png`
149
docs/bugs-user-b.md
Normal file
@@ -0,0 +1,149 @@
# Bug Report: test-user-b QA Test

**Tester:** test-user-b (user role)
**Date:** 2026-05-11
**Environment:** http://10.6.80.114:18080

---

## Bug 1: Repository Tags Endpoint Returns 404

**Endpoint:** `GET /api/v1/registries/{registryId}/repositories/{repository}/tags`
**Status Code:** 404
**Response Body:** `404 page not found` (plain text, not JSON)

**Expected:** Should return a list of tags for the chart/artifact.
**Actual:** The dedicated tags endpoint is not implemented or routes incorrectly. The artifacts endpoint (`/repositories/{repository}/artifacts`) does work and returns tag info.

**Severity:** Medium. Tags are still discoverable via the artifacts endpoint, but the dedicated tags API is broken.

---

## Bug 2: DELETE Instance Returns Empty Response Body

**Endpoint:** `DELETE /api/v1/clusters/{clusterId}/instances/{instanceId}`
**Status Code:** 200
**Response Body:** (empty, no content at all)

**Expected:** Should return a confirmation JSON body (e.g., `{"message": "Instance deletion initiated", "id": "..."}`) or at minimum a 202 Accepted with status details.

**Actual:** Returns a completely empty body. The instance does transition to `pending-delete` state, but the API consumer receives no feedback.

**Severity:** Medium. The operation works but the API consumer gets no confirmation.

---

## Bug 3: Cluster Stats Endpoint Returns 404

**Endpoint:** `GET /api/v1/clusters/{clusterId}/stats`
**Status Code:** 404
**Response Body:** `404 page not found` (plain text)

**Expected:** Should return cluster resource statistics (CPU, memory, pod counts, etc.) or a proper JSON error if not implemented.

**Actual:** Endpoint is not implemented; it returns a raw 404 with no JSON error structure.

**Severity:** Low, but given the user has `monitoring:clusters:view` permission, this is a missing feature.

---

## Bug 4: Kubeconfig Endpoint Returns 404

**Endpoint:** `GET /api/v1/clusters/{clusterId}/kubeconfig`
**Status Code:** 404
**Response Body:** `404 page not found` (plain text)

**Expected:** Should return a kubeconfig file content or JSON error. User has `kubeconfig:issue:own` permission.

**Actual:** Endpoint is not implemented.

**Severity:** Low. The permission exists but the endpoint does nothing.

---

## Bug 5: Monitoring Metrics Endpoint Returns 404

**Endpoint:** `GET /api/v1/monitoring/clusters/{clusterId}/metrics`
**Status Code:** 404
**Response Body:** `404 page not found` (plain text)

**Expected:** Monitoring metrics data. User has `monitoring:clusters:view` permission.

**Actual:** Endpoint not found.

**Severity:** Low. Monitoring permissions exist but the backend endpoints are missing.

---

## Bug 6: Inconsistent API Response Format (Array vs Object Wrapper)

**Clusters and Registries** return bare arrays:
```json
[
  { "id": "...", "name": "k3s", ... }
]
```

**Instances** returns an object wrapper:
```json
{
  "instances": [
    { "id": "...", "name": "test-nginx-b", ... }
  ],
  "total": 1
}
```

**Expected:** Consistent response format across all list endpoints. Either all return bare arrays or all use the `{ "items": [...], "total": N }` wrapper pattern.

**Severity:** Low — API consistency issue. Makes client code harder to write generically.
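
As an illustration of the wrapper pattern, here is a minimal Python sketch, not the project's actual code. The helper names `list_envelope` and `normalize_list_response` are hypothetical; the second one is a client-side shim that accepts either of the two shapes observed above and returns the `{ "items": [...], "total": N }` form.

```python
from typing import Any, Optional


def list_envelope(items: list, total: Optional[int] = None) -> dict:
    """Wrap a list in the {"items": [...], "total": N} shape."""
    items = list(items)
    return {"items": items, "total": len(items) if total is None else total}


def normalize_list_response(payload: Any, legacy_key: Optional[str] = None) -> dict:
    """Accept a bare array or a wrapper object and return the common envelope.

    legacy_key is the wrapper's current field name (e.g. "instances").
    """
    if isinstance(payload, list):
        return list_envelope(payload)
    if "items" in payload:
        key = "items"
    elif legacy_key is not None and legacy_key in payload:
        key = legacy_key
    else:
        raise ValueError("unrecognized list response shape")
    return list_envelope(payload[key], payload.get("total"))
```

With a shim like this, generic client code only has to handle one shape while the server endpoints are migrated.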

---

## Bug 7: auth/me Returns Empty Token Fields

**Endpoint:** `GET /api/v1/auth/me`
**Response includes empty/unpopulated fields:**
```json
{
  "accessToken": "",
  "refreshToken": "",
  ...
}
```

**Expected:** Either remove these fields from the `/auth/me` response (they are only meaningful in login/refresh responses) or populate them with valid values.

**Actual:** Empty string values for both token fields create confusion about whether they should be present.

**Severity:** Low — cosmetic issue, but suggests the DTO is reusing the login response struct without clearing token fields.
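
One way to avoid the shared struct is to give the profile its own type and add the token fields only in the login path. The Python sketch below is illustrative: `UserProfile` and its fields (`username`, `role`, `namespace`) and the sample values are assumptions; only `accessToken` and `refreshToken` come from the observed response.

```python
from dataclasses import asdict, dataclass


@dataclass
class UserProfile:
    # Hypothetical profile fields, for illustration only.
    username: str
    role: str
    namespace: str


def me_response(profile: UserProfile) -> dict:
    """Body for GET /auth/me: profile data only, no token fields."""
    return asdict(profile)


def login_response(profile: UserProfile, access_token: str, refresh_token: str) -> dict:
    """Body for login/refresh: profile data plus freshly issued tokens."""
    body = asdict(profile)
    body["accessToken"] = access_token
    body["refreshToken"] = refresh_token
    return body
```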

---

## Summary

| # | Bug | Severity | Category |
|---|-----|----------|----------|
| 1 | Tags endpoint 404 | Medium | Missing Implementation |
| 2 | DELETE returns empty body | Medium | API Response Quality |
| 3 | Cluster stats endpoint 404 | Low | Missing Implementation |
| 4 | Kubeconfig endpoint 404 | Low | Missing Implementation |
| 5 | Monitoring metrics endpoint 404 | Low | Missing Implementation |
| 6 | Inconsistent list response format | Low | API Consistency |
| 7 | auth/me returns empty tokens | Low | API Response Quality |

**Passed Tests:**
- Login/authentication ✓
- Auth/me user info ✓
- Cluster listing ✓
- Cluster health check ✓
- Registry listing ✓
- Repository browsing (artifacts) ✓
- Instance deployment (nginx chart) ✓
- Instance status tracking (pending-install → deployed) ✓
- Instance deletion (async, transitions to pending-delete then removed) ✓
- Error handling for invalid repository ✓
- Error handling for missing required fields ✓
- Auth rejects invalid tokens ✓
- Auth rejects missing tokens ✓
- Instance cleanup confirmed ✓
109
docs/bugs-user-c.md
Normal file
@@ -0,0 +1,109 @@
# QA Report: Permission Isolation & Multi-Tenancy Testing - test-user-c

**Tester:** test-user-c (role: `user`)
**Date:** 2026-05-11
**Environment:** http://10.6.80.114:18080

## Summary

test-user-c has the standard `user` role, with namespace `ocdp-u-test-c` and workspace `71459030-7166-4c79-b53c-81c61da4c313`. Its permissions follow the `manage_own` / `view` pattern, with no admin-level permissions.

---

## Test Results

### 1. Login & Basic Access ✅

| Test | Result | Notes |
|------|--------|-------|
| POST /auth/login | ✅ Pass | Token issued, role=`user`, workspace/namespace correctly assigned |
| GET /auth/me | ✅ Pass | Returns correct user profile with permissions |
| GET /clusters | ✅ Pass | Sees all `global_shared` clusters (k8s, k3s) |
| GET /registries | ✅ Pass | Sees all `global_shared` registries (harbor) |

### 2. Admin Endpoint Protection

| Test | Result | Notes |
|------|--------|-------|
| GET /api/v1/users | ✅ **403 Forbidden** | Properly blocked: `permission denied` |
| POST /auth/register | ✅ **403 Forbidden** | Cannot register new users as non-admin |
| GET /api/v1/admin/* | ✅ **404** | Admin route prefix doesn't exist (not a bypass risk) |

### 3. Frontend Access

| Test | Result | Notes |
|------|--------|-------|
| GET /configuration/users | ⚠️ **200 (OK)** | SPA returns index.html, as expected. Auth is enforced via API, not routes. |
| GET /configuration/clusters | ⚠️ **200 (OK)** | Same SPA behavior. |
| GET /configuration/registries | ⚠️ **200 (OK)** | Same. |

**Risk: Low.** This is standard SPA behavior. Authorization is enforced at the API level. However, if the frontend relies solely on hiding UI elements rather than checking permissions, users who manually navigate could see empty/error states.

### 4. Namespace Isolation Enforcement

| Test | Result | Notes |
|------|--------|-------|
| Deploy with `namespace: ocdp-u-test-a` | ⚠️ **Silently overridden** | Server ignored requested namespace and used `ocdp-u-test-c` instead. **No warning or error returned.** |
| PATCH to change namespace | ✅ **404** | PATCH endpoint doesn't exist, so the namespace cannot be changed after creation |

🔴 **Bug: Silent namespace override (Low severity)**
When a user specifies a namespace that doesn't belong to them in the instance creation request, the server silently overrides it with the user's own namespace. This is secure (prevents cross-namespace deployment) but:
- The user receives HTTP 200 with the overridden value, with no indication that their request was modified
- The response does not differentiate between "user's own namespace" and "requested namespace"
- This could lead to user confusion about where their resources were actually deployed
- It's unclear whether the user's Helm values also get silently overridden (e.g., the `values.namespace` field)
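
A minimal Python sketch of how the override could be made explicit, under the assumption that each user owns exactly one namespace. The function name `resolve_namespace` and the response fields `requestedNamespace` and `namespaceOverridden` are hypothetical, not part of the current API.

```python
from typing import Optional, Tuple


def resolve_namespace(requested: Optional[str], own_namespace: str,
                      strict: bool = True) -> Tuple[int, dict]:
    """Return (HTTP status, response body) for the namespace part of a deploy request."""
    # No namespace requested, or the caller's own: nothing to override.
    if not requested or requested == own_namespace:
        return 200, {"namespace": own_namespace, "namespaceOverridden": False}
    # Strict mode: reject the request outright.
    if strict:
        return 403, {
            "error": "Forbidden",
            "message": "cannot deploy into namespace owned by another user",
            "code": 403,
        }
    # Lenient mode: keep the override, but report it to the caller.
    return 200, {
        "namespace": own_namespace,
        "requestedNamespace": requested,
        "namespaceOverridden": True,
    }
```

Either branch removes the ambiguity: the caller gets a clear rejection, or a response that states which namespace was actually used.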

### 5. Resource Isolation

| Test | Result | Notes |
|------|--------|-------|
| GET instances with other workspaceId query param | ✅ **Isolated** | Returns only own instances (workspaceId filter is server-enforced) |
| DELETE on own instance | ⚠️ **Async deletion** | Returns HTTP 404 on DELETE itself, but instance transitions to `pending-delete` then disappears |

🔴 **Bug: DELETE returns 404 on success (Medium severity)**
When deleting an instance via `DELETE /clusters/{clusterId}/instances/{instanceId}`:
- The instance transitions to `pending-delete` status
- But the HTTP response status code is **404** rather than 200/202/204
- The first raw DELETE call returns an empty body (causing JSON parse errors)
- This is an API inconsistency — async deletions should return HTTP 202 Accepted
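
The intended contract can be sketched as follows. This is a simplified Python model rather than the real handler: the in-memory `INSTANCES` store and the response body fields are assumptions, while the `pending-delete` state and the 202 status come from the findings above.

```python
import json
from typing import Tuple

# Hypothetical in-memory store standing in for the instance repository.
INSTANCES = {"inst-1": {"status": "deployed"}}


def delete_instance(instance_id: str) -> Tuple[int, str]:
    """Start an async deletion and return (HTTP status, JSON body)."""
    instance = INSTANCES.get(instance_id)
    if instance is None:
        # 404 is reserved for instances that really do not exist.
        return 404, json.dumps({"error": "Not found", "message": "instance not found", "code": 404})
    instance["status"] = "pending-delete"
    # 202 Accepted: the deletion was queued, not yet completed.
    return 202, json.dumps({"id": instance_id, "status": "pending-delete"})
```

A non-empty JSON body on both paths also removes the client-side parse error.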

### 6. Monitoring & Other Endpoints

| Test | Result | Notes |
|------|--------|-------|
| GET /monitoring/clusters/.../pods | ✅ **404** | Monitoring endpoints not implemented for this cluster type |
| POST /kubeconfig | ✅ **404** | Kubeconfig endpoint not implemented |

These endpoints return 404, which is acceptable behavior for features not yet implemented.

---

## Security Assessment

### Works as Intended ✅
- Admin endpoints (`/users`, `/auth/register`) properly return 403
- User cannot access other users' instances via workspaceId manipulation
- User cannot deploy into other users' Kubernetes namespaces
- No PATCH/PUT verbs available to modify existing instance namespaces
- No admin-specific route paths leak data

### Bugs Found

1. **DELETE returns 404 on successful async deletion** (Medium)
   - Endpoint: `DELETE /clusters/{id}/instances/{id}`
   - After call, instance status becomes `pending-delete` and eventually disappears
   - But the HTTP response is `404` with empty body
   - Expected: `202 Accepted` with a `status: "deleted"` or similar response
   - Risk: Clients interpreting HTTP 404 as "not found" will retry or report errors incorrectly

2. **Silent namespace override without user feedback** (Low)
   - Endpoint: `POST /clusters/{id}/instances`
   - When requesting deployment into another user's namespace, the server silently uses the caller's namespace
   - No warning, no error, no indication in the response
   - Expected: Either `403 Forbidden` with "cannot deploy into namespace owned by another user" or a response field indicating the override occurred
   - Risk: Low for security (the override correctly prevents cross-tenant deployment), but could cause user confusion

### No Critical Vulnerabilities Found
- No privilege escalation vectors identified
- No data leakage across workspaces
- No ability to access or manipulate other users' resources
284
docs/security/bugs-security.md
Normal file
@@ -0,0 +1,284 @@
# OCDP Security Audit Report

**Date:** 2026-05-11
**Target:** http://10.6.80.114:18080
**API Base:** http://10.6.80.114:18080/api/v1

---

## Finding 1: User Enumeration via Login Error Messages

| Field | Value |
|-------|-------|
| **Test** | Authentication Error Disclosure |
| **Severity** | **Medium** |
| **Endpoint** | `POST /api/v1/auth/login` |
| **Status** | Confirmed |

### What I Did

```bash
# Non-existent user
curl -s -X POST http://10.6.80.114:18080/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"nonexistent_user_xyz","password":"test123"}'

# Existing user with wrong password
curl -s -X POST http://10.6.80.114:18080/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"wrongpassword"}'
```

### Expected

Both requests should return the same generic error message (e.g., "Invalid credentials") to prevent username enumeration.

### Actual

- Non-existent user: `{"error":"Login failed","message":"user not found","code":401}`
- Existing user: `{"error":"Login failed","message":"invalid password","code":401}`

The error messages are different, allowing an attacker to determine whether a username exists in the system.

### Impact

An attacker can enumerate valid usernames by observing the error message difference. This is the first step in a targeted brute force or credential stuffing attack.

### Recommendation

Return identical error messages for both cases, e.g., `"Invalid username or password"`.
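
A minimal Python sketch of the idea, assuming a hypothetical in-memory user store and PBKDF2 password hashing (neither is taken from the OCDP codebase). Both failure causes return the same body, and unknown usernames still go through the hashing step so response timing does not trivially reveal which case occurred.

```python
import hashlib
import hmac
import os

# Same error shape as the observed responses, with one generic message.
GENERIC_LOGIN_ERROR = {"error": "Login failed", "message": "Invalid username or password", "code": 401}


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"admin": (_salt, hash_password("correct-horse", _salt))}
# Dummy record so unknown usernames cost the same hashing work.
_DUMMY = (os.urandom(16), os.urandom(32))


def login(username: str, password: str) -> dict:
    salt, expected = USERS.get(username, _DUMMY)
    candidate = hash_password(password, salt)
    matches = hmac.compare_digest(candidate, expected)
    if username not in USERS or not matches:
        # Identical response for "no such user" and "wrong password".
        return dict(GENERIC_LOGIN_ERROR)
    return {"code": 200, "message": "ok"}
```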

---

## Finding 2: No Rate Limiting on Login Endpoint

| Field | Value |
|-------|-------|
| **Test** | Brute Force Protection |
| **Severity** | **Medium** |
| **Endpoint** | `POST /api/v1/auth/login` |
| **Status** | Confirmed |

### What I Did

```bash
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}" \
    -X POST http://10.6.80.114:18080/api/v1/auth/login \
    -H "Content-Type: application/json" \
    -d '{"username":"admin","password":"wrongpassword"}'
done
```

### Expected

After a threshold (e.g., 5 failed attempts), the server should return HTTP 429 Too Many Requests or temporarily lock the account.

### Actual

All 10 rapid sequential attempts returned HTTP 401. No rate limiting, no account lockout, no progressive delay.

### Impact

An attacker can brute force passwords without restriction. Combined with Finding 1 (user enumeration), the attack surface is increased.

### Recommendation

- Implement rate limiting on the login endpoint (e.g., max 5 attempts per minute per IP).
- Consider account lockout after N failed attempts.
- Add progressive response delays after repeated failures.
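
The per-IP limit in the first bullet could look roughly like the sliding-window sketch below. It is an in-memory Python illustration only: the class and function names are hypothetical, and a multi-replica deployment would need shared state instead of a process-local dictionary.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Sliding window: at most max_attempts per window_seconds per key."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock
        self._attempts = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        attempts = self._attempts[key]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


def status_for_failed_login(limiter: LoginRateLimiter, client_ip: str) -> int:
    """401 while the caller is under the limit, 429 once it is exceeded."""
    return 401 if limiter.allow(client_ip) else 429
```

Replaying the ten-request test above against this sketch would give five 401 responses followed by five 429 responses.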

---

## Finding 3: Server Version Disclosure

| Field | Value |
|-------|-------|
| **Test** | Information Disclosure |
| **Severity** | **Low** |
| **Endpoint** | All (HTTP response headers) |
| **Status** | Confirmed |

### What I Did

```bash
curl -s -D - http://10.6.80.114:18080/ | head -10
```

### Expected

Server header should be generic (e.g., `Server: nginx`) or removed entirely.

### Actual

```http
Server: nginx/1.27.5
```

### Impact

Knowing the exact nginx version helps attackers target known vulnerabilities for that specific version.

### Recommendation

Hide the version in the Server header via the nginx configuration (this keeps `Server: nginx` but drops the version number):

```nginx
server_tokens off;
```

---

## Finding 4: Permissive CORS Policy

| Field | Value |
|-------|-------|
| **Test** | CORS Misconfiguration |
| **Severity** | **Low** |
| **Endpoint** | All API endpoints |
| **Status** | Confirmed |

### What I Did

```bash
curl -s -D - http://10.6.80.114:18080/api/v1/auth/login \
  -X POST -H "Content-Type: application/json" \
  -d '{"username":"test","password":"test"}'
```

### Expected

CORS `Access-Control-Allow-Origin` should be restricted to the application's origin (e.g., `http://10.6.80.114:18080`) rather than allowing all origins.

### Actual

```http
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With
Access-Control-Max-Age: 86400
```

### Impact

Any website can make cross-origin requests to the API. If a user is logged in, a malicious site could potentially make authenticated API calls on their behalf (a CSRF-style attack, though mitigated by the Bearer token requirement). Browsers also reject credentialed cross-origin responses when `Access-Control-Allow-Origin` is `*`, so combining the wildcard with `Access-Control-Allow-Credentials: true` is itself a misconfiguration.

### Recommendation

Restrict `Access-Control-Allow-Origin` to the specific frontend origin(s) instead of `*`.
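
A sketch of an origin allow-list in Python, assuming the frontend is served only from the audited address. `ALLOWED_ORIGINS` and `cors_headers` are hypothetical names; the method and header lists are copied from the observed response.

```python
from typing import Optional

# Assumption: the single origin that serves the frontend.
ALLOWED_ORIGINS = {"http://10.6.80.114:18080"}


def cors_headers(request_origin: Optional[str]) -> dict:
    """Return CORS response headers, or none at all for unknown origins."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}
    return {
        # Echo the specific origin instead of "*".
        "Access-Control-Allow-Origin": request_origin,
        # Tell caches that the response varies by Origin.
        "Vary": "Origin",
        "Access-Control-Allow-Credentials": "true",
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Authorization, X-Requested-With",
    }
```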

---

## Finding 5: Missing Security Headers

| Field | Value |
|-------|-------|
| **Test** | Security Headers Audit |
| **Severity** | **Low** |
| **Endpoint** | All |
| **Status** | Confirmed |

### What I Did

```bash
curl -s -D - http://10.6.80.114:18080/ | head -20
```

### Expected

Security headers should include:

- `Strict-Transport-Security`
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY`
- `Content-Security-Policy`

### Actual

None of these security headers are present in responses.

### Impact

The missing headers increase the attack surface for clickjacking, MIME-type confusion, and XSS attacks.

### Recommendation

Add the following headers to the nginx configuration. `Strict-Transport-Security` is only honored over HTTPS, so it takes effect once the site is served with TLS:

```nginx
add_header X-Frame-Options "DENY" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "0" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline';" always;
```
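To re-check this finding after the nginx change, the audit could be automated along these lines. This is a minimal sketch: `missing_security_headers` is a hypothetical helper, and it expects the raw response head as text (for example the output of `curl -s -D - -o /dev/null <url>`).

```python
# The four headers listed under "Expected" in this finding.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
)


def missing_security_headers(raw_response_head: str) -> list:
    """Return the required header names that are absent from a raw HTTP response head."""
    present = set()
    for line in raw_response_head.splitlines()[1:]:  # skip the status line
        name, sep, _value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so compare in lowercase.
            present.add(name.strip().lower())
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]
```

An empty list means all four headers are present; the check only looks at header names, not at whether the values are sensible.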

---

## Finding 6: `/health` Endpoint Returns HTML Instead of Health Status

| Field | Value |
|-------|-------|
| **Test** | Health Endpoint Behavior |
| **Severity** | **Low** |
| **Endpoint** | `GET /health` |
| **Status** | Confirmed |

### What I Did

```bash
curl -s http://10.6.80.114:18080/health
```

### Expected

A health check endpoint should return a structured JSON response (e.g., `{"status":"healthy"}`) with HTTP 200.

### Actual

Returns the full `index.html` SPA page with HTTP 200:

```html
<!doctype html>
<html lang="en">
<head>
<title>OCDP Platform</title>
...
```

### Impact

Not a direct vulnerability, but misconfigured health checks can cause false positives in monitoring/load balancer health checks. It also means the SPA is served at `/health`, which is unexpected.

### Recommendation

Implement a dedicated health endpoint that returns `{"status":"ok"}` with appropriate content type, or remove the `/health` route if not needed.
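Until a dedicated endpoint exists, a monitoring probe can guard against the false positive described above by validating the response body as well as the status code. The sketch below assumes a hypothetical `is_valid_health_response` helper and the JSON shape suggested in the recommendation; it is not part of the platform.

```python
import json


def is_valid_health_response(status_code: int, content_type: str, body: str) -> bool:
    """True only for an HTTP 200 JSON response that carries a string "status" field."""
    if status_code != 200:
        return False
    if not content_type.lower().startswith("application/json"):
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(payload, dict) and isinstance(payload.get("status"), str)
```

With this check, the SPA fallback page observed in this finding is reported as unhealthy even though it arrives with HTTP 200.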

---

## Tests Passed (No Issues Found)

| Test | Result |
|------|--------|
| **1. Unauthenticated Access** | **PASS** - All business endpoints return 401 |
| **2. JWT Token Manipulation** | **PASS** - Tampered tokens, alg=none, invalid formats all rejected (401) |
| **3. XSS/SQLi Testing** | **PASS** - Script injection, SQLi patterns safely handled |
| **4. IDOR - Instance Access** | **PASS (limited)** - No instances were deployed, so instance-level access could not be exercised; cluster/registry isolation confirmed working |
| **5. Sensitive Data Masking** | **PASS** - Cluster certs/keys and registry passwords masked as `••••••••` |
| **6. Self-Registration** | **PASS** - Registration endpoint requires authentication (401) |
| **7. Path Traversal** | **PASS** - Path traversal attempts return index.html (not /etc/passwd) |
| **8. Admin Permission Escalation** | **PASS** - Regular users blocked from admin endpoints (403) |

---

## Summary

| Severity | Count | Findings |
|----------|-------|----------|
| Critical | 0 | None |
| High | 0 | None |
| **Medium** | **2** | User enumeration, No rate limiting |
| **Low** | **4** | Server version disclosure, Permissive CORS, Missing security headers, `/health` returns HTML |
| **Total** | **6** | |

The platform's core security controls (authentication, JWT validation, authorization, sensitive data masking) are properly implemented. The main areas for improvement are authentication hardening (rate limiting, user enumeration) and HTTP security hardening (headers, CORS).
1640
docs/test-scenarios.md
Normal file
File diff suppressed because it is too large
79
docs/test-users.json
Normal file
@ -0,0 +1,79 @@
{
  "meta": {
    "createdAt": "2026-05-11T09:58:00Z",
    "apiBase": "http://10.6.80.114:18080/api/v1",
    "adminUsername": "admin",
    "adminPassword": "admin123"
  },
  "existingResources": {
    "clusters": {
      "k8s": {
        "id": "23880994-dfe4-48d0-abc0-b49692cc630a",
        "host": "https://10.6.80.12:6443"
      },
      "k3s": {
        "id": "dbf824f1-9962-4d8e-881e-870c75fdb6f5",
        "host": "https://10.6.80.23:6443"
      }
    },
    "registries": {
      "harbor-bwgdi": {
        "id": "83b823af-873b-457c-912c-9ccde3cb12e6",
        "url": "https://harbor.bwgdi.com"
      }
    }
  },
  "testUsers": [
    {
      "id": "0c70fce6-fa69-4231-979a-5970ff9b854b",
      "username": "test-user-a",
      "password": "TestUserA123!",
      "email": "test-user-a@local.ocdp",
      "role": "user",
      "purpose": "Frontend UI testing",
      "namespace": "ocdp-u-test-a",
      "defaultClusterId": "dbf824f1-9962-4d8e-881e-870c75fdb6f5",
      "quotaCpu": "4",
      "quotaMemory": "8Gi",
      "quotaGpu": "1",
      "quotaGpuMemory": "5000"
    },
    {
      "id": "819b12ec-718e-48be-92bc-0cd1f7205926",
      "username": "test-user-b",
      "password": "TestUserB123!",
      "email": "test-user-b@local.ocdp",
      "role": "user",
      "purpose": "API/deploy testing",
      "namespace": "ocdp-u-test-b",
      "defaultClusterId": "dbf824f1-9962-4d8e-881e-870c75fdb6f5",
      "quotaCpu": "2",
      "quotaMemory": "4Gi",
      "quotaGpu": "0",
      "quotaGpuMemory": "0"
    },
    {
      "id": "04ef67ba-49c2-44e2-87b4-b71b5d9f36dc",
      "username": "test-user-c",
      "password": "TestUserC123!",
      "email": "test-user-c@local.ocdp",
      "role": "user",
      "purpose": "Permission isolation testing",
      "namespace": "ocdp-u-test-c",
      "defaultClusterId": "dbf824f1-9962-4d8e-881e-870c75fdb6f5",
      "quotaCpu": "4",
      "quotaMemory": "8Gi",
      "quotaGpu": "1",
      "quotaGpuMemory": "5000"
    },
    {
      "id": "8bcffd0e-4e7a-4e9a-a47b-bfdb463698c2",
      "username": "test-admin-d",
      "password": "TestAdminD123!",
      "email": "test-admin-d@local.ocdp",
      "role": "admin",
      "purpose": "Admin features testing",
      "namespace": "ocdp-ws-default"
    }
  ]
}
156
docs/test2-quota.md
Normal file
@ -0,0 +1,156 @@
# Resource Quota Enforcement Test Report

**Date:** 2026-05-11
**Tester:** test-user-b
**Namespace:** ocdp-u-test-b
**User Quota:** cpu=2, memory=4Gi, gpu=0, gpumem=0

---

## Test Summary

| Test | Description | Expected | Actual | Result |
|------|-------------|----------|--------|--------|
| A | Deploy nginx (default, within quota) | Success | Deployed (status: `deployed`) | ✅ PASS |
| B | Deploy nginx (cpu=4, mem=8Gi, replicas=5, exceeds quota) | Blocked by quota | Helm release created, Service created, all pods blocked by ResourceQuota (status: `pending-install`) | ⚠️ PARTIAL |
| C | Deploy vllm-serve with gpu=1 (gpu quota = 0) | Blocked by quota | Helm release created, all pods blocked by ResourceQuota (status: `pending-install`) | ⚠️ PARTIAL |

---

## Detailed Results

### Test A: Deploy nginx within quota limits

- **Instance:** `quota-test-nginx` (ed846c33-3631-4d54-adce-c7f00210176f)
- **Chart:** charts/nginx:22.1.1
- **Values:** defaults
- **API Response:** HTTP 200, status: `pending-install`
- **Final Status after 21s:** `deployed` ("Instance deployed successfully")
- **K8s Resource Usage:** requests.cpu=100m/2, requests.memory=128Mi/4Gi

### Test B: Deploy nginx exceeding quota

- **Instance:** `quota-test-nginx-2` (36c0350f-089c-41c2-a66e-e93539c00d52)
- **Chart:** charts/nginx:22.1.1
- **Values:** replicaCount=5, resources.limits.cpu=4/memory=8Gi, resources.requests.cpu=2/memory=4Gi
- **API Response:** HTTP 200, status: `pending-install`
- **Final Status (observed for 90s+):** `pending-install` (never transitioned to `deployed` or `failed`)
- **K8s Behavior:**
  - Helm release created: `sh.helm.release.v1.quota-test-nginx-2.v1`
  - TLS secret created
  - Service created, IP assigned
  - Deployment created, ReplicaSet scaled up
  - **All pod creations FAILED** with: `Error creating: pods "..." is forbidden: exceeded quota: tenant-quota, requested: requests.cpu=2,requests.memory=4Gi, used: requests.cpu=100m,requests.memory=128Mi, limited: requests.cpu=2,requests.memory=4Gi`

### Test C: Deploy GPU instance (gpu quota = 0)

- **Instance:** `quota-test-gpu` (a0d692c8-cdf8-4248-a6d4-1468ad4a7cc7)
- **Chart:** charts/vllm-serve:0.6.0
- **Values:** resources.gpuLimit=1, resources.gpuMem=5000
- **API Response:** HTTP 200, status: `pending-install`
- **Final Status (observed for 30s+):** `pending-install`
- **K8s Behavior:**
  - vllm-serve chart defaults: requests.cpu=8, requests.memory=16Gi, requests.nvidia.com/gpu=1, requests.nvidia.com/gpumem=5k
  - All pods blocked: `exceeded quota: tenant-quota, requested: requests.cpu=8,requests.memory=16Gi,requests.nvidia.com/gpu=1,..., limited: requests.cpu=2,requests.memory=4Gi,requests.nvidia.com/gpu=0`

---

## Key Findings

### 1. No API-Level (Pre-flight) Quota Enforcement

The backend API accepts **all** deployment requests regardless of whether they exceed the user's quota. There is no validation at the API layer that checks:

- Whether the requested resources exceed the user's quota limits
- Whether the user's quota is already fully consumed by existing deployments

**Evidence:** All three deployments returned HTTP 200 with `status: pending-install`. The backend logs contain zero quota-related entries.
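The two missing checks can be illustrated with the numbers from Test B. The sketch below is an assumption about how a pre-flight check might look, not the backend's code: `parse_quantity` and `quota_violations` are hypothetical names, and the parser handles only the quantity suffixes that appear in this report plus a few common ones, not the full Kubernetes quantity grammar.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes (assumption: enough for the values in this report).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Fraction(1, 1000),
}


def parse_quantity(text: str) -> Fraction:
    """Parse a quantity such as "100m", "2" or "4Gi" into an exact number."""
    # Try two-letter suffixes first so "Mi" is not mistaken for "M" or "m".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(text)


def quota_violations(requested: dict, used: dict, hard: dict, replicas: int = 1) -> list:
    """List resources for which used + replicas * requested would exceed the hard limit."""
    violations = []
    for name, limit in hard.items():
        per_pod = parse_quantity(requested.get(name, "0"))
        total = parse_quantity(used.get(name, "0")) + replicas * per_pod
        if total > parse_quantity(limit):
            violations.append(name)
    return violations
```

For Test B, a single pod already fails: 100m used plus 2 requested exceeds the CPU limit of 2, and 128Mi used plus 4Gi requested exceeds 4Gi. Multiplying by `replicas` is a conservative pre-flight estimate; Kubernetes itself evaluates the quota each time a pod is created.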

### 2. Kubernetes ResourceQuota Enforces at Pod Level

The Kubernetes `ResourceQuota` object `tenant-quota` in namespace `ocdp-u-test-b` does enforce limits, but only at the **pod creation** level:

```yaml
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    requests.nvidia.com/gpu: "0"
    requests.nvidia.com/gpumem: "0"
```

When pods exceed quota, Kubernetes explicitly refuses to create them with a clear error message.
However, Helm releases, Services, Deployments, and ReplicaSets are **still created** even when pods are blocked.

### 3. Stuck at "pending-install"

Instances that exceed quota remain stuck in `pending-install` status **indefinitely**: they never transition to `deployed`, `failed`, or any error status. The OCDP platform does not detect the ResourceQuota rejection and update the instance status accordingly. The only way to know about the failure is to check Kubernetes events directly:

```bash
kubectl get events -n ocdp-u-test-b
```
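A status watcher could recognise these rejections by parsing the event message. The sketch below is hypothetical (`parse_quota_rejection` is not an existing function) and assumes the message layout quoted in Test B: `exceeded quota: <name>, requested: ..., used: ..., limited: ...`.

```python
import re
from typing import Optional

# Matches the tail of the ResourceQuota rejection message quoted in Test B.
_QUOTA_RE = re.compile(
    r"exceeded quota: (?P<quota>[^,]+), "
    r"requested: (?P<requested>.+?), "
    r"used: (?P<used>.+?), "
    r"limited: (?P<limited>.+)$"
)


def _pairs(text: str) -> dict:
    """Turn "a=1,b=2" into {"a": "1", "b": "2"}, ignoring items without "="."""
    return dict(item.split("=", 1) for item in text.split(",") if "=" in item)


def parse_quota_rejection(message: str) -> Optional[dict]:
    """Return the quota name and resource maps from an event message, or None if it is unrelated."""
    match = _QUOTA_RE.search(message)
    if match is None:
        return None
    return {
        "quota": match["quota"],
        "requested": _pairs(match["requested"]),
        "used": _pairs(match["used"]),
        "limited": _pairs(match["limited"]),
    }
```

A non-`None` result would be enough for the platform to move the instance out of `pending-install` and to build a descriptive status reason from the `requested` and `limited` maps.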

### 4. GPU Quota Enforcement

Users with `gpu=0` quota **can** submit deployments referencing GPU-enabled charts. The API does not reject them. Only the K8s ResourceQuota blocks pod creation at runtime. This could lead to:

- Unnecessary Helm releases and resource overhead in the cluster
- Confusion for users whose deployments appear to hang at `pending-install`

### 5. Quota Exposed in Login Response

The login response includes quota information:

```json
{
  "quotaCpu": "2",
  "quotaMemory": "4Gi",
  "quotaGpu": "0",
  "quotaGpuMemory": "0"
}
```

This could be used by the frontend to show usage limits, but no pre-flight check uses it server-side.

---

## Recommendations

1. **Add pre-flight quota validation** in the backend API: before accepting a deployment, check whether the requested resources (from chart values) would exceed the user's quota. Return HTTP 4xx with a clear error message.

2. **Handle "pending-install" timeout**: implement a watcher that detects when a Helm release has been created but pods remain stuck (e.g., due to ResourceQuota) and:
   - Update instance status to `failed` with a descriptive `statusReason`
   - Clean up the Helm release, Service, etc.
   - Optionally surface the K8s error message via the API

3. **GPU quota pre-check**: if a chart requests GPU resources and the user's `gpu=0`, reject the deployment at the API level before creating any Kubernetes resources.

4. **UI quota indicator**: show remaining quota (used vs. hard limit) on the deployment form so users know their limits before submitting.

---

## ResourceQuota YAML (for reference)

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: ocdp-u-test-b
  labels:
    ocdp.io/managed-by: ocdp
    ocdp.io/tenant: ocdp-u-test-b
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    requests.nvidia.com/gpu: "0"
    requests.nvidia.com/gpumem: "0"
```

---

## Cleanup Verification

All test instances were removed after testing:

- `quota-test-nginx` ✅ deleted (pods terminated, helm release removed, quota back to 0)
- `quota-test-nginx-2` ✅ cleaned up (no pods created, resources released)
- `quota-test-gpu` ✅ cleaned up (no pods created, resources released)
- ResourceQuota used: all resources at 0
141
docs/test2-report.md
Normal file
@ -0,0 +1,141 @@
# OCDP Second Test Report

**Test date:** 2026-05-11
**Test environment:** http://10.6.80.114:18080

---

## Test 1: Resource Quota Limits

### Method

Deploy the nginx chart to the k3s cluster as test-user-b (quota: cpu=2, mem=4Gi, gpu=0, gpumem=0).

### Results

| Test | Action | Expected | Actual | Verdict |
|------|--------|----------|--------|---------|
| Test A | Deploy nginx (defaults, within quota) | Success | Deployment completed, status `deployed` | ✅ |
| Test B | Deploy nginx (requests.cpu=2, mem=4Gi, replica=5, over quota) | Blocked by quota | Helm release created; all Pods blocked by ResourceQuota; status stuck in `pending-install` indefinitely | ⚠️ Partial pass |
| Test C | Deploy vllm-serve (gpuLimit=1, GPU quota=0) | Blocked by quota | Helm release created; Pods blocked by ResourceQuota; status `pending-install` | ⚠️ Partial pass |

### Key Findings

**1. No pre-flight quota validation at the API layer**

- The backend API accepts every deployment request unconditionally and does not check whether it exceeds the quota
- All over-quota requests return HTTP 200 + status: pending-install
- The backend logs contain **no quota-related entries**

**2. K8s ResourceQuota enforces at the Pod level**

- The `tenant-quota` ResourceQuota object exists and enforces the limits
- When a Pod exceeds the quota, K8s explicitly refuses to create it and returns an error message
- However, the Helm release, Service, Deployment, and ReplicaSet **are still created**

**3. Instances stay stuck in "pending-install" forever**

- Over-quota instances never transition to deployed/failed/error
- The OCDP platform does not detect ResourceQuota rejection events
- The only way to learn of the failure is to query K8s events directly

**4. GPU quota bypass**

- Users with gpu=0 can submit deployments of charts that require a GPU
- K8s ResourceQuota eventually blocks them, but the Helm release and other resources have already been created

**5. Effective ResourceQuota configuration**

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: ocdp-u-test-b
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    requests.nvidia.com/gpu: "0"
    requests.nvidia.com/gpumem: "0"
```

### Recommendations

1. **Add pre-flight quota validation at the API layer**: check whether the requested resources exceed the user's quota before accepting a deployment
2. **Handle pending-install timeouts**: monitor whether Pods remain stuck after the Helm release is created, and update the status to failed
3. **GPU quota pre-check**: if the chart requires a GPU and the user has gpu=0, reject the request at the API layer
4. **UI quota indicator**: show the remaining quota on the deployment form

---
## Test 2: values.yaml Override Priority

### Method

Deploy the vllm-serve:0.6.0 chart as test-user-c (quota: cpu=4, mem=8Gi, gpu=1, gpumem=5000).

### Results

| Method | Submission | Deployed successfully | Stored values | Conclusion |
|--------|------------|-----------------------|---------------|------------|
| Method 1 | `values` JSON field | ✅ | cpuRequest=2, gpuMem=10000 | JSON values accepted and stored exactly |
| Method 2 | `valuesYaml` string | ✅ | cpuRequest=4, gpuMem=10000 | YAML correctly parsed into structured values |
| Method 3 | Both `values` + `valuesYaml` (conflicting) | ✅ No error or warning | **`values` JSON wins every conflict** | `values` JSON silently overrides `valuesYaml` |
| Method 4 | No values provided (chart defaults) | ✅ | namespace only | Chart defaults are not stored in the API response |

### Final Priority Order

| Priority | Source | Notes |
|----------|--------|-------|
| **Highest** | `values` JSON field | Structured JSON in the request body |
| **Medium** | `valuesYaml` string | YAML string in the request body |
| **Lowest** | Chart's built-in values.yaml | Defaults packaged with the Helm chart |

### Conflict Test Details

When both `values` and `valuesYaml` are provided with conflicting values:

- The `values` JSON field **completely overrides** `valuesYaml`
- **No error or warning** is returned to the user
- Both are merged into a single `values` field in the DB

### gpuMem=10000 Behavior

- The integer value `10000` is **accepted correctly** in both `values` JSON and `valuesYaml`
- No unit conversion is applied (stored as an integer scalar in MB)
- Consistent with the project conventions

### Recommendations

1. **Document the priority order**: users need to know that `values` JSON takes precedence when both fields are provided
2. **Add a conflict warning**: return a warning when the two fields contain conflicting values
3. **Consider deprecating one field**: `values` and `valuesYaml` overlap semantically and are easy to confuse

---
## Test 3: Frontend UI Overflow / Scrolling / Refresh

### Method

Playwright plus source-code analysis, tested at three viewport widths: 1920/768/375.

### Results

**Overall conclusion: PASS.** No critical overflow issues that cause functional problems.

| Item | Result | Details |
|------|--------|---------|
| Horizontal overflow | ✅ No issues | No horizontal overflow at any viewport |
| Text truncation | ⚠️ 1 low risk | InstanceCard h3 title uses truncate without a title tooltip |
| Responsive design | ✅ Correct | sm/md/lg/xl breakpoints fully covered |
| Scroll behavior | ✅ Smooth | Sidebar and content area scroll independently; overscroll-contain prevents scroll chaining |
| Modal layout | ✅ Correct | body scroll lock + independently scrolling content |
| Page refresh | ✅ Normal | Protected routes correctly redirect to the login page |
| Color contrast | ⚠️ 1 medium risk | Login-page error text red-400 on a white background is only 2.5:1 (WCAG AA requires 4.5:1) |

### Passing Sub-items

- Chart Browser full-height + overflow-y-auto layout ✅
- InstanceCard action-button grid grid-cols-2/3/5 responds correctly ✅
- Tabs support horizontal scrolling via overflow-x-auto ✅
- User management table overflow-x-auto ✅
- iOS touch scrolling (`-webkit-overflow-scrolling: touch`) configured ✅

### Recommendations

1. Change the login-page error text from text-red-400 to text-red-600/700
2. Add a title attribute to the InstanceCard h3 title

---

## Overall Recommendations

1. Add API-layer quota pre-checks
2. Handle pending-install timeouts and update the status
3. Document the values override priority and add conflict warnings
4. Unify the API design for values JSON/YAML
271
docs/test2-ui-overflow.md
Normal file
@ -0,0 +1,271 @@
# QA Report: UI Layout Overflow & Responsiveness Test

**Date:** 2026-05-11
**Environment:** http://10.6.80.114:18080
**Browser:** Chromium (Playwright headless)
**Test Credentials:** test-user-a / TestUserA123!

---

## Test Results Summary

| # | Test | Status | Issues Found |
|----|------|--------|-------------|
| 1 | Login Page Layout | ✅ Pass | 1 Low |
| 2 | Home Page | ✅ Pass | 0 |
| 3 | Chart Browser (Registries) | ✅ Pass | 0 |
| 4 | Instances Page | ✅ Pass | 0 |
| 5 | Monitoring Page | ✅ Pass | 0 |
| 6 | Tablet Responsive (768px) | ✅ Pass | 0 |
| 7 | Mobile Responsive (375px) | ✅ Pass | 0 |
| 8 | Deep DOM Overflow Analysis | ✅ Pass | 0 |
| 9 | Source Code CSS Pattern Audit | ✅ Pass | 2 Info |
| 10 | Text Visibility & Contrast | ⚠️ 1 Issue | 1 Medium |

---

## 1. Login Page (AuthPage.tsx)

**Location:** `frontend/src/features/auth/pages/AuthPage.tsx`

**Layout:**
- Form card is `max-w-md` (448px), horizontally centered via `flex items-center justify-center`
- Desktop viewport (1920×1080): card is perfectly centered (checked via bounding rect)
- Background: `bg-slate-50` with gradient overlay
- Card: `bg-white/95 backdrop-blur-xl` with `shadow-2xl`

**Responsive:**
- Padding: `px-4 sm:px-6`, which increases from 16px → 24px at the `sm:` breakpoint
- Card padding: `p-6 sm:p-7`
- Icon: `w-11 h-11` (fixed size, not responsive)

### ✅ Issue #1-LOW: Login error text color contrast

- **File:** `AuthPage.tsx:96`
- **Pattern:** `<p className="text-red-400 text-center text-sm">`
- **Problem:** `text-red-400` (`#f87171`) on white background has a contrast ratio of ~2.5:1, which fails WCAG AA (minimum 4.5:1 for normal text). Error messages may be hard to read for users with visual impairments.
- **Recommendation:** Use `text-red-600` or `text-red-700` for error text on white backgrounds.
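The contrast figure can be reproduced with the WCAG relative-luminance formula. The sketch below is illustrative: the function names are hypothetical, `#f87171` is the red-400 value quoted above, and `#dc2626` for red-600 assumes the project uses Tailwind's default palette.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a "#rrggbb" color as defined by WCAG."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Calling `contrast_ratio("#f87171", "#ffffff")` gives a value below 3:1, which is well under the 4.5:1 AA threshold and in line with the finding, while the assumed red-600 value clears 4.5:1 on white.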

---

## 2. Home Page

**Location:** `frontend/src/features/home/pages/HomePage.tsx`

**Layout:**
- Main container: `min-h-full bg-slate-50 px-4 py-6 sm:px-6 lg:px-8`
- Two-column layout on large screens: `lg:grid-cols-[1.4fr_0.8fr]`
- Feature cards: `md:grid-cols-3`
- Quick actions: `md:grid-cols-3`

**Scroll:** ScrollHeight=1080, Viewport=1080: content fits exactly without scrolling on 1080p.

**Overflow:** No horizontal overflow detected. Proper use of responsive padding and grid columns.

### Passing: no issues found.

---

## 3. Chart Browser / Registries

**Location:** `frontend/src/features/artifact/registries/pages/ArtifactBrowserPage.tsx`

**Layout (Desktop):**
- Main layout: `flex-1 flex overflow-hidden bg-slate-50` (sidebar + detail panes)
- Sidebar tree: `flex-1 overflow-y-auto custom-scrollbar`
- Detail pane: `flex-1 flex flex-col bg-white overflow-hidden`
- Tag grid: `grid-cols-1 md:grid-cols-2 xl:grid-cols-3 gap-4`

**Tablet (768px):** No overflow. Grid collapses to 2 columns.

**Mobile (375px):** No overflow. Grid collapses to 1 column.

### Key Patterns Found:
- `RepositoryItem.tsx:212`: `<span className="text-sm text-gray-200 font-mono truncate" title={repository}>` gives proper truncation with a `title` tooltip
- `ArtifactBrowserPage.tsx:336`: `<p className="text-[11px] text-slate-500 truncate">` uses 11px text with truncation
- `TagCard.tsx`: uses truncation with a `title` attribute for long names

### Passing: no overflow issues found.

---
## 4. Instances Page

**Location:** `frontend/src/features/artifact/instances/pages/InstancesManagementPage.tsx`
**Component:** `InstanceCard.tsx`

**Layout:**
- Cluster cards: responsive grid `clusters.length > 1 ? 'md:grid-cols-3' : 'md:grid-cols-2'`
- Instance cards are listed in a single column, switching to `lg:grid-cols-2 gap-6` on large screens
- Action buttons grid: `grid-cols-2 gap-2 md:grid-cols-3 xl:grid-cols-5`

### ✅ Issue #2-INFO: Action button text truncation on InstanceCards

- **File:** `InstanceCard.tsx:285-327`
- **Pattern:**
  ```tsx
  <div className="grid grid-cols-2 gap-2 md:grid-cols-3 xl:grid-cols-5">
    <button>
      <span className="truncate">Refresh</span>
    </button>
  </div>
  ```
- **Analysis:** At `grid-cols-2` (small screens), two buttons share each row. The buttons use `min-w-0`, which allows them to shrink, and `truncate` on the text span. However, the button text is short ("Refresh", "Entries", "Diagnostics", "Modify", "Delete"), so truncation is unlikely to occur in practice.
- **Mitigation:** Each button carries a `title` attribute, which provides a tooltip fallback for its `<span>` label.
- **Verdict:** Acceptable. Button labels are intentionally short and tooltips are present.

### ✅ Issue #3-INFO: Header text truncation without tooltip

- **File:** `InstanceCard.tsx:185`
- **Pattern:** `<h3 className="text-xl font-bold text-slate-950 truncate">{instanceName}</h3>`
- **Analysis:** Instance names can be long, and `truncate` clips them with an ellipsis. This element has no `title` attribute, unlike the repository text below it.
- **Recommendation:** Add `title={instanceName}` to the `<h3>` element for a tooltip on overflow.

### Passing: no critical overflow issues found.

---
## 5. Monitoring / Clusters

**Location:** `frontend/src/features/monitoring/clusters/`

**Layout:**
- Cluster cards grid: `grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-4`
- Card header: `<h3 className="text-lg font-semibold text-slate-900 truncate">` with cluster name
- Metrics: `grid-cols-2 sm:grid-cols-4 gap-4 mb-3`
- Resource bars: `overflow-hidden` for proper progress bar clipping
- Node details: `grid-cols-1 lg:grid-cols-2 gap-3`

**Overflow Check:** ScrollWidth = clientWidth at all tested viewports, so there is no horizontal overflow.

**Responsive:**
- 1920px: 4 columns of cluster cards
- 768px: 2 columns
- 375px: 1 column

### Passing: no issues found.

---
## 6. Sidebar Layout

**Location:** `frontend/src/shared/components/layout/SidebarLayout/`

**Layout:**
- Parent: `min-h-screen flex bg-dark text-primary overflow-hidden`
- Nav: `flex-1 p-3 space-y-1 overflow-y-auto` (independently scrollable)
- Footer: Fixed at bottom, `p-3 text-xs text-muted`

**Scroll Analysis:**
- The `<nav>` element has `overflow-y-auto`, so sidebar nav items scroll independently when they exceed the viewport height
- The footer anchors to the bottom of the sidebar (not the scroll area)
- At 1080px viewport, sidebar content fits without scrolling

**Potential Concern:** If many nav items are added, the nav list will no longer fit in the viewport and the user must scroll it to reach the lower items. The `overflow-y-auto` on the `<nav>` element handles this correctly while the footer stays anchored at the bottom.

### Passing: no issues found.

---
## 7. Tabs Component

**Location:** `frontend/src/shared/components/layout/Tabs.tsx`

**Pattern:**
```tsx
<div className="flex gap-2 sm:gap-4 overflow-x-auto scrollbar-thin scrollbar-thumb-gray-600 scrollbar-track-transparent">
  <button className="whitespace-nowrap flex-shrink-0">
```

**Analysis:**
- Uses `overflow-x-auto`, so a horizontal scrollbar appears when the tabs exceed the container width
- `whitespace-nowrap` prevents tab text from wrapping
- `flex-shrink-0` prevents tab items from compressing
- Custom thin scrollbar styling for webkit browsers

**Edge Case:** On very small viewports with many tabs, users need to scroll horizontally. This is acceptable UX for a tabs pattern.

### Passing: no issues found.

---
## 8. Modal Component

**Location:** `frontend/src/shared/components/layout/Modal.tsx`

**Pattern:**
- Body scroll lock: sets `document.body.style.overflow = 'hidden'` on open
- Modal overlay: `fixed inset-0 z-[90] flex items-start sm:items-center justify-center overflow-y-auto p-4 sm:p-6`
- Content: `max-h-[calc(100vh-12rem)] sm:max-h-[calc(100vh-10rem)] overflow-y-auto`

**Analysis:**
- Properly handles body scroll prevention
- Modal content is independently scrollable when content exceeds viewport
- `padding-right` compensation prevents layout shift when scrollbar disappears

### Passing: no issues found.

---
## 9. Deep DOM Overflow Analysis

The Playwright script ran a comprehensive scan of ALL DOM elements, checking:
- `overflow: hidden` elements where `scrollWidth > clientWidth` (clipped content)
- `text-overflow: ellipsis` elements where content overflows
- `white-space: nowrap` causing overflow
- Tiny text (< 10px)

**Result: No overflow:hidden clipping detected, no text truncation overflow detected on the monitoring page (last page tested).**

Note: The test scans the current page's DOM after navigating through pages. Some truncation exists (`InstanceCard.tsx`, `RepositoryItem.tsx`, etc.); all uses are intentional, and most include `title` tooltip fallbacks. The exception is the `InstanceCard` `<h3>` title noted in Issue #3.

---
## 10. CSS Pattern Summary

### Overflow Patterns Used in Codebase

| Pattern | Used For | Risk |
|---------|----------|------|
| `overflow-hidden` | Card containers, modal wrappers, progress bars | Low: decorative/structural |
| `overflow-y-auto` | Scrollable content areas (sidebar nav, modals, detail panes) | None: intentional scroll |
| `overflow-x-auto` | Tabs, data tables | Low: scroll indicator needed |
| `truncate` | Instance names, repository names, tags, button labels | Low: tooltips provided on most |
| `whitespace-nowrap` | Tab items, table headers | Low: paired with overflow-x-auto |
| `line-clamp-1` | Registry descriptions | None: CSS line clamp |

### Responsive Breakpoints Used

| Breakpoint | Usage |
|-----------|-------|
| `sm:` (640px) | Login form padding, cluster form layout, button layouts, tabs spacing |
| `md:` (768px) | Grid columns (2-3 cols), card layouts, diagnostics modal |
| `lg:` (1024px) | Two-column layouts, 4-col monitoring grids |
| `xl:` (1280px) | 5 action button columns, 3-col tag grids |
| `2xl:` (1536px) | Not used |
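
The column counts reported for each viewport follow directly from these min-width breakpoints. As a worked example, the sketch below resolves the effective `grid-cols-N` utility for a given viewport width; `grid_columns` is a hypothetical helper that mimics the mobile-first rule and is not part of the frontend or of Tailwind itself.

```python
# Breakpoint minimum widths in px, taken from the table above.
BREAKPOINTS = {"sm": 640, "md": 768, "lg": 1024, "xl": 1280, "2xl": 1536}


def grid_columns(class_names: str, viewport_width: int) -> int:
    """Resolve the effective grid-cols-N for a viewport using mobile-first precedence."""
    best_min_width, columns = -1, 1  # a grid without grid-cols-* has one implicit column
    for token in class_names.split():
        prefix, _, utility = token.rpartition(":")
        if not utility.startswith("grid-cols-"):
            continue
        suffix = utility[len("grid-cols-"):]
        if not suffix.isdigit():
            continue  # skip arbitrary values such as grid-cols-[1.4fr_0.8fr]
        min_width = BREAKPOINTS.get(prefix) if prefix else 0
        if min_width is None or viewport_width < min_width:
            continue  # unknown prefix, or breakpoint not active at this width
        if min_width >= best_min_width:
            # The widest active breakpoint wins.
            best_min_width, columns = min_width, int(suffix)
    return columns
```

For the tag grid classes `grid-cols-1 md:grid-cols-2 xl:grid-cols-3 gap-4`, this yields 1, 2, and 3 columns at 375px, 768px, and 1920px, matching the Chart Browser observations.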

### Fixed Widths Checked

No problematic fixed widths found. All layouts use `max-w-` constraints (`max-w-md`, `max-w-6xl`, `max-w-7xl`) rather than fixed pixel widths.

---

## 11. Page Refresh Behavior

The SPA uses React Router. When navigating to authenticated routes:
- `ProtectedRoute` component checks `isAuthenticated` and `isAllowed`
- If not authenticated, users are redirected to login page (`/`)
- After login, `navigate("/home", { replace: true })` navigates to home
- Page refresh at any route should redirect to login if token is expired

**Note:** The `/login` route path does not exist in the SPA router. Login is handled by `AuthPage`, rendered at the root `/` path when the user is unauthenticated.

---

## Final Verdict

| Category | Score |
|----------|-------|
| Horizontal Overflow | ✅ No issues at any viewport |
| Text Truncation | ⚠️ InstanceCard `<h3>` missing tooltip fallback |
| Responsive Design | ✅ Proper breakpoints at sm/md/lg/xl |
| Scroll Behavior | ✅ Sidebar and content areas properly scrollable |
| Color Contrast | ⚠️ Login error text (red-400) fails WCAG AA |
| Modal Layout | ✅ Body scroll lock + content scroll work correctly |
| Page Refresh | ✅ Protected routes redirect to login |

**Overall: PASS.** Two minor issues found (color contrast on login error text, missing tooltip on InstanceCard title), neither causing functional problems.
110
docs/test2-values-priority.md
Normal file
@ -0,0 +1,110 @@
|
||||
# Test Report: values.yaml Override Priority
|
||||
|
||||
**Date:** 2026-05-11
|
||||
**Tester:** test-user-c
|
||||
**Cluster:** dbf824f1-9962-4d8e-881e-870c75fdb6f5
|
||||
**Chart:** charts/vllm-serve:0.6.0
|
||||
**Namespace:** ocdp-u-test-c
|
||||
|
||||
---
|
||||
|
||||
## Test Results
|
||||
|
||||
### Method 1: `values` JSON field only (vllm-values-json)
|
||||
|
||||
- **Deployment:** ✅ Success (status: pending-install, no errors)
|
||||
- **Submitted values:**
|
||||
```json
|
||||
{ "cpuRequest": 2, "gpuLimit": 1, "gpuMem": 10000, "memoryLimit": "4Gi" }
|
||||
```
|
||||
- **Stored values (from API response):**
|
||||
```json
|
||||
{ "cpuRequest": 2, "gpuLimit": 1, "gpuMem": 10000, "memoryLimit": "4Gi" }
|
||||
```
|
||||
- **Result:** Values were accepted and stored exactly as provided. No chart defaults were merged into the stored representation (e.g., `shmSize: "8Gi"` from chart defaults is absent).
|
||||
|
||||
### Method 2: `valuesYaml` string field only (vllm-values-yaml)
|
||||
|
||||
- **Deployment:** ✅ Success (status: pending-install, no errors)
|
||||
- **Submitted valuesYaml:**
|
||||
```yaml
resources:
  cpuRequest: 4
  gpuLimit: 1
  gpuMem: 10000
  memoryLimit: "8Gi"
model:
  huggingfaceName: "Qwen/Qwen2.5-0.5B-Instruct"
```
|
||||
- **Stored values (parsed and stored in DB):**
|
||||
```json
|
||||
{ "cpuRequest": 4, "gpuLimit": 1, "gpuMem": 10000, "memoryLimit": "8Gi" }
|
||||
```
|
||||
- **Result:** The YAML string was correctly parsed into the structured `values` field in the database. YAML parsing works correctly.
|
||||
|
||||
### Method 3: Both `values` JSON AND `valuesYaml` with conflict (vllm-conflict-test)
|
||||
|
||||
- **Deployment:** ✅ Success (status: pending-install, **no error or warning returned**)
|
||||
- **`values` JSON submitted:**
|
||||
```json
|
||||
{ "cpuRequest": 4, "memoryLimit": "8Gi", "huggingfaceName": "Qwen/Qwen2.5-0.5B-Instruct" }
|
||||
```
|
||||
- **`valuesYaml` submitted:**
|
||||
```yaml
resources:
  cpuRequest: 2
  memoryLimit: "4Gi"
model:
  huggingfaceName: "Qwen/Qwen2.5-7B-Instruct"
```
|
||||
- **Stored values:**
|
||||
```json
|
||||
{ "cpuRequest": 4, "gpuLimit": 1, "gpuMem": 10000, "memoryLimit": "8Gi", "huggingfaceName": "Qwen/Qwen2.5-0.5B-Instruct" }
|
||||
```
|
||||
- **Result:** The `values` JSON field **won every conflict**. The `valuesYaml` values (`cpuRequest: 2`, `memoryLimit: "4Gi"`, `Qwen/Qwen2.5-7B-Instruct`) were completely overridden by the `values` JSON values (`cpuRequest: 4`, `memoryLimit: "8Gi"`, `Qwen/Qwen2.5-0.5B-Instruct`). No error or warning was presented to the user.
|
||||
|
||||
### Method 4: No values (chart defaults, vllm-defaults-test)
|
||||
|
||||
- **Deployment:** ✅ Success (status: pending-install, no errors)
|
||||
- **Stored values:**
|
||||
```json
|
||||
{ "namespace": "ocdp-u-test-c" }
|
||||
```
|
||||
- **Result:** Only the auto-injected `namespace` was stored. Chart defaults (`cpuRequest: 8`, `memoryLimit: "16Gi"`, etc.) are not stored in the API response — they are resolved at Helm deploy time.
|
||||
|
||||
---
|
||||
|
||||
## Key Findings
|
||||
|
||||
### 1. Override Priority Order (when both fields provided)
|
||||
|
||||
| Priority | Source | Description |
|----------|--------|-------------|
| **Highest** | `values` JSON field | Structured JSON object in the request body |
| **Middle** | `valuesYaml` string field | Raw YAML string in the request body |
| **Lowest (baseline)** | Chart built-in `values.yaml` | Default values packaged in the Helm chart |
|
||||
|
||||
### 2. Conflict Resolution
|
||||
|
||||
When both `values` and `valuesYaml` are provided with conflicting values:
|
||||
- **`values` JSON wins** — the structured JSON field takes priority over the YAML string
|
||||
- **No error or warning** is returned to the user
|
||||
- The system silently prefers the `values` JSON field
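The observed precedence can be modelled with a small merge function. This is only a sketch of the behaviour seen in Method 3, not the backend's actual code: the helper names `deep_merge` and `resolve_user_values` are hypothetical, and the inputs are written in the flattened form that the API returned.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which keys from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key instead of replacing them.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_user_values(values_yaml: dict, values_json: dict) -> dict:
    """Apply the observed order: `values` JSON overrides `valuesYaml`."""
    return deep_merge(values_yaml, values_json)


# Flattened inputs modelled on Method 3 (assumed shape, for illustration).
from_yaml = {"cpuRequest": 2, "memoryLimit": "4Gi",
             "huggingfaceName": "Qwen/Qwen2.5-7B-Instruct"}
from_json = {"cpuRequest": 4, "memoryLimit": "8Gi",
             "huggingfaceName": "Qwen/Qwen2.5-0.5B-Instruct"}
resolved = resolve_user_values(from_yaml, from_json)
```

Chart defaults are left out on purpose: according to Method 4 they are not part of the stored values and are resolved at Helm deploy time.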
|
||||
|
||||
### 3. `gpuMem=10000` Behavior
|
||||
|
||||
- The integer value `10000` was **accepted without issues** in both `values` JSON and `valuesYaml` formats
|
||||
- No normalization or unit conversion was applied (stored as-is: `10000`)
|
||||
- This is consistent with the project convention of treating `nvidia.com/gpumem` as a vendor-specific integer quantity in MB
|
||||
|
||||
### 4. All values are stored in a unified `values` field in the DB
|
||||
|
||||
Both `values` JSON and `valuesYaml` inputs are converted to a single structured `values` JSON object in the database. The API response always returns the structured `values` field regardless of how the input was provided.
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **Document the priority order** — users should know that when providing both `values` and `valuesYaml`, the `values` JSON field takes precedence and no error is raised.
|
||||
2. **Consider returning a warning** when both fields are provided with conflicting values, as silent override could cause confusion.
|
||||
3. **The naming convention** (`values` vs `valuesYaml`) can be misleading since both ultimately serve the same purpose. Consider deprecating one in the API to avoid ambiguity.
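Recommendation 2 could be prototyped roughly as follows. This is a hedged sketch, not an existing OCDP function: `flatten_leaves` and `find_conflicts` are hypothetical names, and comparing by leaf key name is an assumption based on the flattened stored values seen in this report (it would mis-handle two different sections that share a leaf name).

```python
def flatten_leaves(values: dict) -> dict:
    """Collapse nested mappings to {leaf_key: value}; later duplicates win."""
    flat = {}
    for key, value in values.items():
        if isinstance(value, dict):
            flat.update(flatten_leaves(value))
        else:
            flat[key] = value
    return flat


def find_conflicts(values_json: dict, values_yaml: dict) -> list[str]:
    """List one warning per leaf key that both inputs set to different values."""
    a = flatten_leaves(values_json)
    b = flatten_leaves(values_yaml)
    return [
        f"{key}: values={a[key]!r} overrides valuesYaml={b[key]!r}"
        for key in sorted(a.keys() & b.keys())
        if a[key] != b[key]
    ]


# Inputs modelled on Method 3.
warnings = find_conflicts(
    {"cpuRequest": 4, "memoryLimit": "8Gi",
     "huggingfaceName": "Qwen/Qwen2.5-0.5B-Instruct"},
    {"resources": {"cpuRequest": 2, "memoryLimit": "4Gi"},
     "model": {"huggingfaceName": "Qwen/Qwen2.5-7B-Instruct"}},
)
```

The resulting list could be returned alongside the deployment response so that the override is no longer silent.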
|
||||
752
docs/user-guide.md
Normal file
@ -0,0 +1,752 @@
|
||||
# OCDP Platform User Guide
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#1-overview)
|
||||
2. [Login / Authentication](#2-login--authentication)
|
||||
3. [Home Page](#3-home-page)
|
||||
4. [Launch Instance (Chart Browser)](#4-launch-instance-chart-browser)
|
||||
5. [Instances Management](#5-instances-management)
|
||||
6. [Cluster Monitoring](#6-cluster-monitoring)
|
||||
7. [Setup — Clusters](#7-setup--clusters)
|
||||
8. [Setup — Registries](#8-setup--registries)
|
||||
9. [Setup — Users (Admin)](#9-setup--users-admin)
|
||||
10. [Navigation](#10-navigation)
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
**OCDP (Open Cloud Deployment Platform)** is a platform for deploying LLM inference services on Kubernetes. In the primary workflow, a user selects a `vllm-serve` Helm chart from a Harbor registry and fills in the instance name, namespace, and values; the backend then pulls the packaged OCI Helm chart and deploys it to a configured Kubernetes cluster via the Helm SDK.
|
||||
|
||||
### Architecture
|
||||
|
||||
```
Frontend (React 18 + TypeScript + Vite + TailwindCSS)
    |
    | HTTP /api/*
    v
Nginx (Reverse Proxy / Static File Server)
    |
    | HTTP /api/*
    v
Backend (Go 1.24 + Gorilla Mux + Hexagonal Architecture)
    |
    +---> PostgreSQL (persistence)
    +---> ORAS SDK (OCI chart pull)
    +---> Helm SDK (deploy/upgrade/delete)
    +---> client-go (Kubernetes API)
```
|
||||
|
||||
### Tech Stack
|
||||
|
||||
| Layer      | Technology                                                          |
|------------|---------------------------------------------------------------------|
| Frontend   | React 18, TypeScript, Vite, TailwindCSS, React Router, Lucide icons |
| Backend    | Go 1.24, Gorilla Mux, PostgreSQL, ORAS SDK, Helm SDK, client-go     |
| Gateway    | Nginx (reverse proxy + static file serving)                         |
| Database   | PostgreSQL                                                          |
| Deployment | Docker Compose                                                      |
|
||||
|
||||
---
|
||||
|
||||
## 2. Login / Authentication
|
||||
|
||||
### Access
|
||||
|
||||
The frontend is deployed at `http://10.6.80.114:18080`. Navigating to the root URL redirects to the login page.
|
||||
|
||||
### Login Page
|
||||
|
||||
The login page displays:
|
||||
|
||||
- **OCDP Console** title with a shield icon
|
||||
- Subtitle: "Sign in with an account created by an administrator"
|
||||
- **Username** text field (required)
|
||||
- **Password** text field (required, masked)
|
||||
- **Login** button — blue, centered, full-width
|
||||
|
||||
When you click **Login**:
|
||||
|
||||
1. A toast notification "Logging in..." appears briefly
|
||||
2. The button shows a spinning loader and "Logging in..." text
|
||||
3. On success: a "Welcome, {username}!" toast appears, and you are redirected to `/home`
|
||||
4. On failure: a red error message is shown below the button (e.g., "Invalid credentials" or "Network error")
|
||||
|
||||
### Default Admin Credentials
|
||||
|
||||
If the system was bootstrapped via `.env` configuration, the default admin credentials are:
|
||||
|
||||
- **Username:** `admin` (or whatever was set as `BOOTSTRAP_ADMIN_USER`)
|
||||
- **Password:** The value of `BOOTSTRAP_ADMIN_PASS` in your `.env` file
|
||||
|
||||
### JWT Session Behavior
|
||||
|
||||
- The backend issues JWT tokens upon successful login
|
||||
- The frontend stores the tokens and sends them as `Authorization: Bearer <token>` headers
|
||||
- Session persists until the token expires or the user signs out
|
||||
- Clicking the **logout icon** (top-right corner, person icon with a door arrow) signs the user out
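As an illustration of the session behavior above, the following sketch builds the `Authorization` header and reads a token's expiry. It assumes the backend uses the standard JWT `exp` claim (Unix seconds), which this guide does not confirm; the function names are hypothetical, and the signature is deliberately not verified because that is the backend's job.

```python
import base64
import json
import time


def auth_headers(token: str) -> dict:
    """Build the header the frontend sends with every API request."""
    return {"Authorization": f"Bearer {token}"}


def jwt_expiry(token: str):
    """Return the `exp` claim from the JWT payload, or None if absent."""
    payload_b64 = token.split(".")[1]
    # base64url segments in a JWT omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload.get("exp")


def is_expired(token: str, now=None) -> bool:
    """True once the current time has reached the token's `exp` claim."""
    exp = jwt_expiry(token)
    current = time.time() if now is None else now
    return exp is not None and current >= exp


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Unsigned demo token, for illustration only.
demo_token = ".".join([_b64url({"alg": "none"}),
                       _b64url({"sub": "admin", "exp": 2000}),
                       "signature"])
```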
|
||||
|
||||
### Routing When Authenticated
|
||||
|
||||
- Unauthenticated users are always redirected to `/` (login page)
|
||||
- Authenticated users visiting `/` are redirected to `/home`
|
||||
- Protected routes are wrapped in a `ProtectedRoute` component; unauthorized access redirects to login
|
||||
- Route-level access is enforced per user role (admin vs regular user)
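The routing rules above can be summarised as one decision function. This is a sketch under stated assumptions, not the actual `ProtectedRoute` implementation: the `/access-denied` path and the list of admin-only prefixes are hypothetical, chosen only to make the example concrete.

```python
from typing import Optional

LOGIN_PATH = "/"
HOME_PATH = "/home"
ACCESS_DENIED_PATH = "/access-denied"  # hypothetical path, not from this guide
ADMIN_ONLY_PREFIXES = ("/configuration/users",)  # assumed admin-only routes


def resolve_route(path: str, role: Optional[str]) -> str:
    """Return the path a visitor ends up on; `role` is None when logged out."""
    if role is None:
        # Unauthenticated users are always sent to the login page.
        return LOGIN_PATH
    if path == LOGIN_PATH:
        # Authenticated users never stay on the login page.
        return HOME_PATH
    if role != "admin" and path.startswith(ADMIN_ONLY_PREFIXES):
        return ACCESS_DENIED_PATH
    return path
```

For example, `resolve_route("/artifact/instances", None)` yields the login path, while the same call with role `"user"` leaves the path unchanged.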
|
||||
|
||||
---
|
||||
|
||||
## 3. Home Page
|
||||
|
||||
The home page at `/home` is the main landing page after login. It has three sections.
|
||||
|
||||
### Section 1: Primary Actions (3 Cards)
|
||||
|
||||
A large card titled "One Click Deployment Platform" / "Operations Workbench" contains three action cards arranged in a row:
|
||||
|
||||
**1. Launch Instance**
|
||||
- Icon: Rocket (blue background)
|
||||
- Description: "Browse Helm charts and deploy a new inference service."
|
||||
- Clicking navigates to `/artifact/registries`
|
||||
- Shows "Open" with an arrow on hover
|
||||
|
||||
**2. Instances**
|
||||
- Icon: Package (emerald background)
|
||||
- Description: "Check release status, entries, upgrades, and deletion."
|
||||
- Clicking navigates to `/artifact/instances`
|
||||
- Shows "Open" with an arrow on hover
|
||||
|
||||
**3. Cluster Monitoring**
|
||||
- Icon: Activity (dark slate background)
|
||||
- Description: "Inspect cluster health and node resource pressure."
|
||||
- Clicking navigates to `/monitoring/clusters`
|
||||
- Shows "Open" with an arrow on hover
|
||||
|
||||
Each card:
|
||||
- Has a subtle border, slate background, and hover effect (lifts slightly, adds blue border)
|
||||
- Shows a colored icon box at the top
|
||||
- Shows title and description
|
||||
- Has an "Open" link with right-arrow at the bottom
|
||||
|
||||
### Section 2: Runtime Focus Sidebar
|
||||
|
||||
On the right side of the primary actions, a smaller card titled "Runtime Focus" with subtitle "High-frequency checks":
|
||||
|
||||
- **Release status** — clickable row that navigates to `/artifact/instances`. Subtitle: "Installed, failed, deleting"
|
||||
- **Cluster health** — clickable row that navigates to `/monitoring/clusters`. Subtitle: "Nodes, pods, CPU, memory"
|
||||
|
||||
### Section 3: Setup
|
||||
|
||||
A bottom section titled "Setup" with subtitle "Less frequent administrative tasks". Contains three buttons in a row:
|
||||
|
||||
1. **Clusters** — Server icon. Description: "Kubeconfig and namespace policy". Navigates to `/configuration/clusters`
|
||||
2. **Registries** — Database icon. Description: "Harbor robot account and chart access". Navigates to `/configuration/registries`
|
||||
3. **Users** — Users icon. Description: "Admin-only account management". Navigates to `/configuration/users`. Only visible to admin users.
|
||||
|
||||
---
|
||||
|
||||
## 4. Launch Instance (Chart Browser)
|
||||
|
||||
The Artifact Browser page at `/artifact/registries` is the chart browser for selecting and launching Helm charts.
|
||||
|
||||
### Page Layout
|
||||
|
||||
The page is a split-pane layout:
|
||||
- **Left sidebar (w-80):** Registry tree with search
|
||||
- **Right main panel:** Repository info, tags, and launch actions
|
||||
|
||||
### Left Panel: Registry Tree
|
||||
|
||||
**Header bar** (top of page):
|
||||
- Title: "Chart Browser"
|
||||
- Subtitle: "Select a Harbor chart and launch it into a Kubernetes cluster"
|
||||
- **Refresh** button (secondary style, refresh icon) — reloads all registries and repositories, clears cache
|
||||
|
||||
**Search bar** at the top of the sidebar:
|
||||
- Placeholder: "Search registries / repositories..."
|
||||
- Filters the tree as you type (matches registry name and repository name)
|
||||
- Shows a search icon on the left
|
||||
|
||||
**Registry nodes** listed below the search bar:
|
||||
- Each registry shows:
  - Chevron (down/right) to expand/collapse
  - Database icon (blue)
  - Registry name
  - Registry URL (truncated, small text)
  - Badge showing count of repositories
|
||||
- Registries are expanded by default
|
||||
- Clicking a registry header toggles expansion
|
||||
|
||||
**Repository items** under each registry:
|
||||
- Each shows the repository name
|
||||
- Clicking a repository selects it (highlighted blue background) and loads its artifacts in the right panel
|
||||
- Shows artifact count if available
|
||||
- If no repositories: shows "No chart repositories found."
|
||||
- If loading: shows "Loading repositories..."
|
||||
|
||||
### Right Panel: Repository Details
|
||||
|
||||
When a repository is selected:
|
||||
|
||||
**Repository header:**
|
||||
- Label: "Chart repository" (uppercase, small)
|
||||
- Repository name (large, bold)
|
||||
- Registry name below
|
||||
- **Filter chips:** Two toggle buttons: "Charts" (default selected, blue) and "All tags"
|
||||
- "Charts" filter shows only artifacts of type `chart` (i.e., deployable Helm charts)
|
||||
- "All tags" shows every artifact version regardless of type
|
||||
|
||||
**Artifact grid** (responsive: 1-3 columns):
|
||||
- Each artifact is displayed in a **TagCard** component (see below)
|
||||
|
||||
When no repository is selected:
|
||||
- Shows empty state: "Select a repository" with "Choose a chart repository from the left panel."
|
||||
|
||||
### TagCard Component
|
||||
|
||||
Each TagCard shows:
|
||||
|
||||
- **Type icon**: Package (chart), Box (image), or File (other) with color-coded background
|
||||
- **Tag name** (e.g., `1.0.0`) with a type badge (e.g., "chart" in blue)
|
||||
- **Repository path** (truncated)
|
||||
- **Size** in KB or MB (e.g., "12.5 MB")
|
||||
|
||||
**TagCard buttons:**
|
||||
|
||||
1. **Launch button** — Blue button, only visible when the artifact type is `chart`. Shows rocket icon + "Launch". Opens the LaunchModal.
|
||||
2. **Copy button** — White button with copy icon. Copies the `helm pull oci://...` command to the clipboard. Shows a success toast.
|
||||
|
||||
### LaunchModal
|
||||
|
||||
Opens when "Launch" is clicked on a chart artifact. Title: "Launch Instance" with rocket icon.
|
||||
|
||||
**Modal header:**
|
||||
- Shows the repository name and tag (e.g., `vllm-serve:1.0.0`)
|
||||
|
||||
**Form fields:**
|
||||
|
||||
1. **Target Cluster** (required)
|
||||
- Dropdown select listing all configured clusters
|
||||
- Auto-selects the first available cluster (or the user's default cluster)
|
||||
- If no clusters: shows an amber warning "No clusters available. Please add a cluster first."
|
||||
- Shows loading state while fetching
|
||||
|
||||
2. **Instance Name** (required)
|
||||
- Text input, placeholder: "my-app"
|
||||
- Help text: "Lowercase alphanumeric characters, '-' or '.'"
|
||||
|
||||
3. **Namespace** (required)
|
||||
- If the selected cluster has allowed namespaces: shows a dropdown of allowed namespaces
|
||||
- If no restrictions: shows a text input, default "default"
|
||||
- If namespace is controlled by workspace policy: input is disabled with a blue info notice
|
||||
|
||||
4. **Description** (optional)
|
||||
- Text input, placeholder: "Optional description"
|
||||
|
||||
5. **Configuration Values** — Three input modes:
|
||||
|
||||
**a) Quick mode** (default):
|
||||
- Blue info box explaining "Quick launch uses the chart defaults"
|
||||
- Shows badges: "No values override" and, if available, "Chart values.yaml available"
|
||||
- If `values.yaml` exists: "Load Defaults from values.yaml" button switches to YAML mode with defaults pre-filled
|
||||
- Best for simple deployments with no custom overrides
|
||||
|
||||
**b) Guided mode** (form):
|
||||
- Only available when the chart provides a JSON Schema for its values
|
||||
- Dynamically generates form fields based on the schema
|
||||
- Supports various schema types: string, number, boolean, object, array
|
||||
- **"Load Defaults"** button — fills in values from the schema defaults
|
||||
- Shows schema-generated form in a scrollable container
|
||||
|
||||
**c) YAML mode**:
|
||||
- A code editor (textarea) for entering custom values in YAML format
|
||||
- Real-time YAML validation with error display
|
||||
- "Load Defaults from values.yaml" button
|
||||
- "Load Schema Defaults" button (if no values.yaml but schema exists)
|
||||
- "Clear" button to reset the YAML
|
||||
- Help text changes based on whether schema is available
|
||||
|
||||
6. **Artifact Info** (read-only summary):
|
||||
- Repository name
|
||||
- Tag (badge)
|
||||
- Type
|
||||
|
||||
**Footer buttons:**
|
||||
- **Cancel** (secondary) — closes the modal
|
||||
- **Launch** (success/green style with rocket icon) — submits the deployment
|
||||
|
||||
**Validation on submit:**
|
||||
- Cluster must be selected
|
||||
- Instance name must not be empty
|
||||
- Namespace must not be empty
|
||||
- If namespace policy restricts namespaces, the selected namespace must be in the allowed list
|
||||
- YAML values are parsed and validated before submission
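The submit-time checks listed above can be sketched as a single validation function. The field names (`cluster_id`, `name`, `namespace`) and the name pattern are assumptions: the help text only names the permitted characters, so the rule that a name must start and end with an alphanumeric character is borrowed from Kubernetes naming conventions. YAML parsing is omitted because it needs a third-party parser.

```python
import re

# Assumed pattern: lowercase alphanumerics, '-' or '.', alphanumeric at both ends.
NAME_PATTERN = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$")


def validate_launch(form: dict, allowed_namespaces: list) -> list:
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    if not form.get("cluster_id"):
        errors.append("Cluster must be selected")

    name = (form.get("name") or "").strip()
    if not name:
        errors.append("Instance name must not be empty")
    elif not NAME_PATTERN.match(name):
        errors.append("Instance name: lowercase alphanumeric characters, '-' or '.'")

    namespace = (form.get("namespace") or "").strip()
    if not namespace:
        errors.append("Namespace must not be empty")
    elif allowed_namespaces and namespace not in allowed_namespaces:
        # An empty allow-list means the cluster imposes no restriction.
        errors.append(f"Namespace '{namespace}' is not in the allowed list")
    return errors
```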
|
||||
|
||||
**After successful submit:**
|
||||
- Form resets
|
||||
- Modal closes
|
||||
- Navigates to `/artifact/instances` to show the deploying instance
|
||||
- Shows "Instance deployed successfully" toast
|
||||
|
||||
**Error states:**
|
||||
- Loading clusters fails: error toast
|
||||
- Missing required fields: validation error toast
|
||||
- YAML parse error: inline error + toast
|
||||
- API failure: error toast with message
|
||||
|
||||
---
|
||||
|
||||
## 5. Instances Management
|
||||
|
||||
The Instances page at `/artifact/instances` manages all deployed Helm releases across clusters.
|
||||
|
||||
### Stats Cards
|
||||
|
||||
Three gradient stat cards at the top (shown when clusters exist):
|
||||
|
||||
1. **Total Instances** (blue) — total count across all clusters
|
||||
2. **Clusters** (emerald) — number of clusters
|
||||
3. **Showing** (violet) — count of currently displayed instances (only shown when filtering across 2+ clusters)
|
||||
|
||||
### Filter Controls
|
||||
|
||||
When more than one cluster exists, a filter bar appears:
|
||||
- **Filter by Cluster** dropdown with "All Clusters" and each cluster with instance count
|
||||
- Selecting a cluster filters the instance list to that cluster only
|
||||
|
||||
### Instance Display
|
||||
|
||||
Instances are grouped by cluster when "All Clusters" is selected, each cluster section showing:
|
||||
- Cluster name with instance count
|
||||
- Instances in a responsive 2-column grid
|
||||
|
||||
### InstanceCard Component
|
||||
|
||||
Each card shows:
|
||||
|
||||
**Header:**
|
||||
- Instance name (bold, large)
|
||||
- Repository name with version badge (cyan)
|
||||
- **Status badge** with colored background glow:
|
||||
|
||||
| Status           | Badge Color | Description                                  |
|------------------|-------------|----------------------------------------------|
| Deployed         | Emerald     | Deployment completed successfully            |
| Failed           | Rose/Red    | Last operation reported a failure            |
| Pending Install  | Amber       | Installation is in progress                  |
| Pending Upgrade  | Amber       | Upgrade is in progress                       |
| Pending Rollback | Amber       | Rollback is in progress                      |
| Pending Delete   | Orange      | Deletion is in progress                      |
| Superseded       | Indigo      | A newer revision has replaced this instance  |
| Uninstalled      | Slate       | Instance has been removed from the cluster   |
| Unknown          | Slate       | Awaiting next state update                   |
|
||||
|
||||
- Status reason text (e.g., "Deployment completed successfully." or a custom message)
|
||||
- Last operation label (Install / Upgrade / Rollback / Delete / Sync)
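The status table can be expressed as a lookup keyed by release status. This is an illustrative sketch: the keys follow Helm's release status strings (for example `pending-install`, as seen in API responses), and mapping Helm's `uninstalling` to the "Pending Delete" badge is an assumption, as is falling back to "Unknown" for unrecognised values.

```python
# Maps a release status string to (badge label, badge color).
STATUS_BADGES = {
    "deployed": ("Deployed", "emerald"),
    "failed": ("Failed", "rose"),
    "pending-install": ("Pending Install", "amber"),
    "pending-upgrade": ("Pending Upgrade", "amber"),
    "pending-rollback": ("Pending Rollback", "amber"),
    "uninstalling": ("Pending Delete", "orange"),  # assumed mapping
    "superseded": ("Superseded", "indigo"),
    "uninstalled": ("Uninstalled", "slate"),
    "unknown": ("Unknown", "slate"),
}


def badge_for(status: str) -> tuple:
    """Return (label, color) for a status, defaulting to the Unknown badge."""
    return STATUS_BADGES.get(status, STATUS_BADGES["unknown"])
```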
|
||||
|
||||
**Details grid:**
|
||||
- **Namespace** — purple icon
|
||||
- **Revision** — green icon, Helm revision number
|
||||
- **Repository** — full-width, truncated, monospace
|
||||
- **Launched** — date the instance was created
|
||||
|
||||
**Last error alert** (conditionally shown):
|
||||
- Red alert box with warning icon
|
||||
- Shows the last error message if the instance encountered errors
|
||||
|
||||
**Action buttons (5 buttons in a row):**
|
||||
|
||||
1. **Refresh** — Refresh icon. Refreshes the status of this specific instance from the cluster.
|
||||
2. **Entries** — Network icon (emerald). Opens Entries modal.
|
||||
3. **Diagnostics** — Activity icon (indigo). Opens Diagnostics modal.
|
||||
4. **Modify** — Settings icon (blue). Opens Modify modal.
|
||||
5. **Delete** — Stop icon (rose/red). Prompts confirmation, then deletes.
|
||||
|
||||
### Empty / Loading / Error States
|
||||
|
||||
- **Loading:** Shows spinning indicator with "Loading instances..."
|
||||
- **Error:** Shows error state with retry button
|
||||
- **Empty:** "No instances found" with link to launch from registries
|
||||
- **Auto-refresh:** Data refreshes every 30 seconds silently
|
||||
|
||||
### Entries Modal
|
||||
|
||||
Displays network entry information for the instance.
|
||||
|
||||
**Header:**
|
||||
- Title: "Instance Entries"
|
||||
- Instance name and namespace
|
||||
|
||||
**Data Source badge:**
|
||||
- **Live from Kubernetes** (green) — fetched directly from the cluster
|
||||
- **From Helm Manifest** (blue) — extracted from Helm manifest
|
||||
- **From Helm Notes** (yellow) — from Helm release notes
|
||||
- **No Data Available** (gray)
|
||||
|
||||
**Services section** (if any):
|
||||
- Lists each Kubernetes Service with:
|
||||
- Service name and type badge
|
||||
- Cluster IP (copyable)
|
||||
- Ports with mapping (e.g., `80 → 8080 TCP`, NodePort)
|
||||
- LoadBalancer entries (if applicable) with external link and copy
|
||||
|
||||
**Ingresses section** (if any):
|
||||
- Lists each Kubernetes Ingress with:
|
||||
- Ingress name and class
|
||||
- Host with external link and copy buttons
|
||||
- Path routing (e.g., `/ → service:80`)
|
||||
- TLS indicator if HTTPS is configured
|
||||
|
||||
**Helm Notes** (as fallback):
|
||||
- Raw Helm notes text shown in a monospace pre block
|
||||
|
||||
**Footer:** Close button
|
||||
|
||||
### Diagnostics Modal
|
||||
|
||||
Provides Kubernetes-level diagnostics for the instance.
|
||||
|
||||
**Header:**
|
||||
- "Runtime diagnostics" label
|
||||
- Instance name
|
||||
- Namespace and data collection timestamp
|
||||
|
||||
**Refresh button** in the header — reloads diagnostics data from Kubernetes
|
||||
|
||||
**Three tabs:**
|
||||
|
||||
**1. Describe tab** (default):
|
||||
- Summary metrics: Pods count, Services count, Events count
|
||||
- **Pods section** — each pod shows:
|
||||
- Pod name, node, pod IP, restart count
|
||||
- Status badge (Running=success, other=warning)
|
||||
- Containers with name, state badge, image, and reason/message
|
||||
- **Services section** — each service with name, type badge, ClusterIP, ports
|
||||
|
||||
**2. Events tab**:
|
||||
- Kubernetes events sorted by time
|
||||
- Each event shows: type badge (Warning / Normal), reason, timestamp, message, involved object, count
|
||||
|
||||
**3. Pod Logs tab**:
|
||||
- Logs from each container, labeled by pod/container name
|
||||
- Monospace display on dark background
|
||||
- **Copy Logs** button copies all logs to clipboard
|
||||
- Last 300 lines are fetched per container
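To make the 300-line limit concrete, here is a minimal sketch of trimming a log to its last lines. It only illustrates the effect; the backend most likely asks the Kubernetes API for a tail of the log rather than trimming text itself, and `tail_lines` is a hypothetical helper.

```python
from collections import deque


def tail_lines(log_text: str, limit: int = 300) -> str:
    """Keep only the last `limit` lines of a container log."""
    # A bounded deque discards the oldest lines as newer ones are appended.
    return "\n".join(deque(log_text.splitlines(), maxlen=limit))


sample_log = "\n".join(f"line {i}" for i in range(1, 501))
trimmed = tail_lines(sample_log)
```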
|
||||
|
||||
**Error states:**
|
||||
- Loading fails: error toast
|
||||
- No data: amber info box "Diagnostics data is not available"
|
||||
- Empty pods/events/logs: relevant empty state message
|
||||
|
||||
### Modify Modal
|
||||
|
||||
Allows modifying an existing instance.
|
||||
|
||||
**Header:** "Modify Instance - {name}" with settings icon
|
||||
|
||||
**Current info section** (read-only):
|
||||
- Current version
|
||||
- Cluster ID
|
||||
- Repository
|
||||
|
||||
**Fields:**
|
||||
1. **Version Tag** — text input, pre-filled with current version. Help: "Leave unchanged to keep current version"
|
||||
2. **Description** — text input
|
||||
3. **Configuration Values** — Form or YAML mode (auto-detects if schema exists):
|
||||
- **Form mode:** Dynamic form generated from values schema, with real-time sync to YAML
|
||||
- **YAML mode:** Textarea with monospace font, pre-filled with current values
|
||||
|
||||
**Footer:** Cancel / Modify buttons
|
||||
|
||||
**After submit:** Instance is upgraded via Helm, data refreshes, modal closes.
|
||||
|
||||
---
|
||||
|
||||
## 6. Cluster Monitoring
|
||||
|
||||
The Monitoring page at `/monitoring/clusters` shows the health and resource usage of all configured Kubernetes clusters.
|
||||
|
||||
### Summary Stats Cards
|
||||
|
||||
Four stat cards at the top:
|
||||
|
||||
1. **Total Clusters** (blue) — total number of clusters
|
||||
2. **Healthy** (green) — clusters with status "healthy"
|
||||
3. **Warning** (orange) — clusters with status "warning" or "unknown"
|
||||
4. **Error** (red) — clusters with status "error" or "unhealthy"
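The grouping behind the four cards can be sketched as follows. The status strings come from the list above; counting an unrecognised status under "warning" is an assumption, and `summarize` is a hypothetical name.

```python
from collections import Counter

# Maps each reported cluster status to the card it is counted under.
STATUS_BUCKETS = {
    "healthy": "healthy",
    "warning": "warning",
    "unknown": "warning",
    "error": "error",
    "unhealthy": "error",
}


def summarize(statuses: list) -> dict:
    """Count clusters per summary card from a list of status strings."""
    counts = Counter(STATUS_BUCKETS.get(s, "warning") for s in statuses)
    return {
        "total": len(statuses),
        "healthy": counts["healthy"],
        "warning": counts["warning"],
        "error": counts["error"],
    }
```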
|
||||
|
||||
### Auto-Refresh
|
||||
|
||||
- The page auto-refreshes every **30 seconds**
|
||||
- A small note shows "Auto-refresh every 30 seconds" with a refresh indicator
|
||||
- Manual **Refresh** button in the page header
|
||||
|
||||
### ClusterMonitorCard
|
||||
|
||||
Each cluster is shown in an expandable card.
|
||||
|
||||
**Card header:**
|
||||
- Status icon (green check, yellow warning, red X, or gray question mark)
|
||||
- Cluster name with status badge (Healthy / Warning / Error)
|
||||
|
||||
**Metrics grid** (4 columns):
|
||||
- **Uptime** — how long the cluster has been running
|
||||
- **Nodes** — node count
|
||||
- **Pods** — pod count
|
||||
- **GPU** — used/total GPU count
|
||||
|
||||
**Resource usage** (3 columns with progress bars):
|
||||
- **CPU** — used/total, percentage bar, max per node with peak usage
|
||||
- **Memory** — used/total, percentage bar, max per node with peak usage
|
||||
- **GPU** (only if GPUs exist) — used/total, percentage bar, max per node with peak usage
|
||||
|
||||
**Last checked** timestamp
|
||||
|
||||
**Show Nodes / Hide Nodes** toggle button (only if nodes exist)
|
||||
|
||||
### NodeMetricCard (Expandable Nodes)
|
||||
|
||||
When nodes are expanded, each node shows:
|
||||
|
||||
- Node name with status icon (Ready green / NotReady red)
|
||||
- Status badge (Ready / NotReady) and role badge (Control Plane / Worker)
|
||||
- Age
|
||||
- **CPU** — usage/allocatable with progress bar
|
||||
- **Memory** — usage/allocatable with progress bar
|
||||
- **GPU** — usage/capacity with progress bar (shows "No GPU" if none)
|
||||
- Additional info: Pod count, Kubelet version
|
||||
|
||||
### States
|
||||
|
||||
- **Loading:** Shows "Loading cluster monitoring data..."
|
||||
- **Error:** Error state with retry button, "Failed to Load Clusters"
|
||||
- **Empty:** "No Clusters Available" with suggestion to add clusters in configuration
|
||||
|
||||
---
|
||||
|
||||
## 7. Setup — Clusters
|
||||
|
||||
The Cluster Configuration page at `/configuration/clusters` manages Kubernetes cluster connections.
|
||||
|
||||
### Page Header
|
||||
|
||||
- Title: "Configuration - Clusters"
|
||||
- Description changes based on role (admin sees "Manage all...", regular user sees "Manage your private...")
|
||||
- **Refresh** button (secondary)
|
||||
- **Add Cluster** button (primary, blue, plus icon)
|
||||
|
||||
### ClusterList Component
|
||||
|
||||
**Loading state:** Spinner with "Loading clusters..."
|
||||
**Empty state:** Server icon with "No clusters" and "Add your first cluster..."
|
||||
|
||||
**Cluster cards** (2-column grid):
|
||||
- Cluster name with server icon and visibility label (Private / Workspace / Global)
|
||||
- Description (if any)
|
||||
- Three action buttons:
|
||||
- **Test Connection** (Activity icon, emerald) — performs a health check against the cluster
|
||||
- **Edit** (pencil icon, blue) — opens edit modal
|
||||
- **Delete** (trash icon, red) — prompts confirmation then deletes
|
||||
- API Server URL (monospace, truncated)
|
||||
- Auth status grid (3 columns): CA Certificate, Client Cert, Client Key — each shows "✅ Configured" or "✗ Not Configured"
|
||||
- Created date
|
||||
|
||||
Action buttons may be disabled based on user permissions (read-only access).
|
||||
|
||||
### Add / Edit Cluster Modal
|
||||
|
||||
**Form fields:**
|
||||
|
||||
1. **Cluster Name** (required) — text input, e.g., "Production Cluster"
|
||||
2. **API Server URL** (required) — must start with `http://` or `https://` (see Validation below), e.g., `https://kubernetes.example.com:6443`
|
||||
3. **CA Certificate (Base64)** — required for create. Textarea for base64-encoded CA cert. In edit mode: shows current status and optional new input.
|
||||
4. **Client Certificate (Base64)** — required for create. In edit mode: shows current status.
|
||||
5. **Client Key (Base64)** — required for create. In edit mode: shows current status.
|
||||
6. **Bearer Token** — optional alternative to client certificates. Textarea for service account token.
|
||||
7. **Description** — optional textarea
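If a certificate or key is only available as PEM text, it can be Base64-encoded before pasting it into the form. This sketch assumes the fields expect the same encoding as the `certificate-authority-data` style entries in a kubeconfig (Base64 of the whole PEM text), which this guide does not state explicitly; the PEM body below is a placeholder, not a real certificate.

```python
import base64


def encode_pem_for_form(pem_text: str) -> str:
    """Base64-encode PEM text (header and footer lines included)."""
    return base64.b64encode(pem_text.encode("ascii")).decode("ascii")


# Placeholder PEM content for illustration only.
sample_pem = (
    "-----BEGIN CERTIFICATE-----\n"
    "PLACEHOLDERBODY\n"
    "-----END CERTIFICATE-----\n"
)
encoded = encode_pem_for_form(sample_pem)
```

Values copied directly from a kubeconfig's `*-data` fields are already Base64-encoded and would not need this step.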
|
||||
|
||||
**Validation:**
|
||||
- Name and API Server URL required
|
||||
- URL must start with `http://` or `https://`
|
||||
- Create mode requires either token OR all three certificate fields
|
||||
- Edit mode: certificate fields are optional (leave blank to keep existing)
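The validation rules above translate into a short check. This is a sketch with hypothetical field names (`name`, `api_server_url`, `token`, `ca_cert`, `client_cert`, `client_key`), not the frontend's actual code.

```python
CERT_FIELDS = ("ca_cert", "client_cert", "client_key")


def validate_cluster_form(form: dict, mode: str) -> list:
    """Validate a cluster form; `mode` is "create" or "edit"."""
    errors = []
    if not (form.get("name") or "").strip():
        errors.append("Name is required")

    url = (form.get("api_server_url") or "").strip()
    if not url:
        errors.append("API Server URL is required")
    elif not url.startswith(("http://", "https://")):
        errors.append("URL must start with http:// or https://")

    if mode == "create":
        has_token = bool(form.get("token"))
        has_all_certs = all(form.get(field) for field in CERT_FIELDS)
        if not (has_token or has_all_certs):
            errors.append("Provide a bearer token or all three certificate fields")
    # In edit mode, blank credential fields mean "keep the existing value".
    return errors
```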
|
||||
|
||||
**Footer:** Cancel / Add Cluster or Save
|
||||
|
||||
### Health Check
|
||||
|
||||
Clicking the **Test Connection** (Activity) button on a cluster:
|
||||
- Shows "Testing cluster..." toast
|
||||
- If successful: green toast with success message
|
||||
- If failed: red toast with error message
|
||||
- Checks connectivity to the Kubernetes API server
|
||||
|
||||
---
|
||||
|
||||
## 8. Setup — Registries
|
||||
|
||||
The Registry Configuration page at `/configuration/registries` manages OCI registry connections.
|
||||
|
||||
### Page Header
|
||||
|
||||
- Title: "Configuration - Registries"
|
||||
- **Refresh** button (secondary)
|
||||
- **Add Registry** button (primary, blue, plus icon)
|
||||
|
||||
### RegistryList Component
|
||||
|
||||
**Loading state:** "Loading registries..."
|
||||
**Empty state:** Database icon with "No registries" and "Add your first registry..."
|
||||
|
||||
**Registry cards** (vertical list):
|
||||
- Registry name with database icon and visibility label
|
||||
- **Insecure** badge (yellow, if insecure flag is on)
|
||||
- Registry URL (clickable link, opens in new tab)
|
||||
- Description (if any)
|
||||
- Username display
|
||||
- Two action buttons:
|
||||
- **Edit** (pencil icon, blue) — opens edit modal
|
||||
- **Delete** (trash icon, red) — prompts confirmation then deletes
|
||||
|
||||
### Add / Edit Registry Modal
|
||||
|
||||
**Form fields:**
|
||||
|
||||
1. **Name** (required) — e.g., "Harbor Production"
|
||||
2. **Registry URL** (required) — e.g., `https://registry.example.com`
|
||||
3. **Username** (required) — registry username (Harbor robot account recommended)
|
||||
4. **Password** — required for create. In edit mode: shows current status ("Password set - encrypted") and optional new password input
|
||||
5. **Description** — optional textarea
|
||||
6. **Insecure** — checkbox. "Allow insecure connection (skip SSL certificate verification)" — for registries using HTTP or self-signed certs
|
||||
|
||||
**Test Connection button** (in edit mode only, after saving):
|
||||
- Tests the registry connectivity by calling the backend health endpoint
|
||||
- Button shows a pulsing test tube icon while testing
|
||||
- Shows success/failure toast
|
||||
|
||||
**Footer:** Save / Test Connection / Cancel
|
||||
|
||||
---
|
||||
|
||||
## 9. Setup — Users (Admin)

The User Management page at `/configuration/users` is **admin-only**. Non-admin users cannot access this route.

### Page Header

- "Admin only" label with shield icon
- Title: "User Management"
- Description: "Create accounts, assign roles, and disable access without public self-registration."
- **Refresh** button (secondary)

### Create User Form (Left Panel)

- **Username** (required): text input
- **Initial password** (required): masked input
- **Role**: dropdown with "User" or "Admin"

When the **User** role is selected, additional fields appear:

**Tenant namespace section:**

- **Namespace**: text input, auto-generated from the username as `ocdp-u-{username}`
- **Default cluster**: dropdown of available clusters

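
The namespace default described above could be derived roughly as follows. This is a minimal sketch, not the application's actual code: only the `ocdp-u-{username}` pattern comes from this guide. The helper name `defaultNamespace`, the lowercasing, and the character cleanup are assumptions, added so that the result is a valid Kubernetes namespace (an RFC 1123 label of at most 63 characters).

```typescript
// Hypothetical helper: derives the default tenant namespace from a username.
// Only the "ocdp-u-{username}" pattern is taken from this guide; the
// normalization steps below are assumptions.
const NAMESPACE_PREFIX = "ocdp-u-";
const MAX_LABEL_LENGTH = 63; // Kubernetes limit for namespace names

function defaultNamespace(username: string): string {
  const slug = username
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-") // replace runs of characters outside [a-z0-9-]
    .replace(/^-+|-+$/g, "");     // trim leading and trailing dashes
  // Truncate to the label limit, then make sure the name does not end in "-".
  return (NAMESPACE_PREFIX + slug).slice(0, MAX_LABEL_LENGTH).replace(/-+$/, "");
}
```

Because the Namespace field is an editable text input, a value like this would only prefill the field; the admin can still overwrite it.
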
**Resource limits section:**

- **CPU**: default "4" (Kubernetes quantity, e.g., "4" or "500m")
- **Memory**: default "16Gi"
- **GPU**: default "0" (integer count)
- **GPU Mem**: default "0" (integer MB, e.g., 10000)
- Help text explains the units

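
To make the units concrete, the sketch below models the four limit fields with the defaults listed above and runs a basic client-side check. The `TenantLimits` shape, its field names, and `validateLimits` are hypothetical. The two patterns are deliberately simplified and cover only common forms such as "4", "500m", and "16Gi", not the full Kubernetes quantity grammar.

```typescript
// Hypothetical shape for the resource-limit fields; names are assumptions.
interface TenantLimits {
  cpu: string;      // Kubernetes quantity, e.g. "4" or "500m"
  memory: string;   // Kubernetes quantity, e.g. "16Gi"
  gpu: number;      // integer count
  gpuMemMb: number; // integer MB, e.g. 10000
}

// Defaults as listed in the form description.
const DEFAULT_LIMITS: TenantLimits = { cpu: "4", memory: "16Gi", gpu: 0, gpuMemMb: 0 };

// Simplified patterns (assumption): whole or decimal cores, or millicores.
const CPU_PATTERN = /^(\d+(\.\d+)?|\d+m)$/;
// Simplified patterns (assumption): a number with an optional binary or decimal suffix.
const MEMORY_PATTERN = /^\d+(\.\d+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$/;

function isNonNegativeInteger(value: number): boolean {
  return isFinite(value) && value >= 0 && Math.floor(value) === value;
}

// Returns the names of the fields that fail validation (empty when all pass).
function validateLimits(limits: TenantLimits): string[] {
  const invalid: string[] = [];
  if (!CPU_PATTERN.test(limits.cpu)) invalid.push("cpu");
  if (!MEMORY_PATTERN.test(limits.memory)) invalid.push("memory");
  if (!isNonNegativeInteger(limits.gpu)) invalid.push("gpu");
  if (!isNonNegativeInteger(limits.gpuMemMb)) invalid.push("gpuMemMb");
  return invalid;
}
```

A check like this only catches obvious typos in the form; the backend and the cluster remain the authority on whether a quantity is accepted.
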
**Checkbox:**

- "Require password change after first login" (checked by default)

**Create User button** (primary, full-width, user-plus icon)

### Accounts Table (Right Panel)

A table with columns:

| Column    | Content                                                    |
|-----------|------------------------------------------------------------|
| User      | Username + email                                           |
| Role      | Badge: "admin" (info blue) or "user" (secondary)           |
| Status    | Badge: "Active" (green) or "Disabled" (warning)            |
| Namespace | Namespace + workspace name + default cluster               |
| Quota     | CPU, Memory, GPU/GPU Mem (admin shows "default workspace") |
| Actions   | See below                                                  |

**Actions per row (4 buttons):**

1. **Make User / Make Admin**: toggles the user's role between admin and user
2. **Limits** (pencil icon): opens the Edit Limits modal (available only for non-admin users)
3. **Enable / Disable**: toggles the user's active status (the button is disabled for your own account)
4. **Delete** (trash icon, red): deletes the user after confirmation (the button is disabled for your own account)

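
The per-row rules above can be summarized as one pure function. This is an illustrative sketch only: the `Account` and `RowActions` types and the `rowActions` helper are hypothetical, and it assumes each toggle button is labeled with the state it switches to (an admin row shows "Make User", an active row shows "Disable").

```typescript
type Role = "admin" | "user";

// Hypothetical row model; field names are assumptions.
interface Account {
  username: string;
  role: Role;
  active: boolean;
}

interface RowActions {
  roleToggleLabel: "Make User" | "Make Admin";
  canEditLimits: boolean;      // Limits applies to non-admin users only
  statusToggleLabel: "Enable" | "Disable";
  canToggleStatus: boolean;    // not available for your own account
  canDelete: boolean;          // not available for your own account
}

function rowActions(row: Account, currentUsername: string): RowActions {
  const isSelf = row.username === currentUsername;
  return {
    roleToggleLabel: row.role === "admin" ? "Make User" : "Make Admin",
    canEditLimits: row.role !== "admin",
    statusToggleLabel: row.active ? "Disable" : "Enable",
    canToggleStatus: !isSelf,
    canDelete: !isSelf,
  };
}
```

Keeping these rules in one place makes it harder for the table to offer an action, such as deleting your own account, that should be unavailable.
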
### Edit Limits Modal

Opens when the **Limits** button is clicked for a non-admin user:

- **Tenant limits** label with gauge icon
- User's name as title
- Description: "Changes are applied to workspace metadata..."
- Fields: Namespace, Default cluster, CPU, Memory, GPU, GPU Memory
- **Cancel** / **Save Limits** buttons

---

## 10. Navigation

### Left Sidebar

The sidebar shows the "Operations" branding at the top, followed by these navigation items:

| Item                    | Icon              | Route                       |
|-------------------------|-------------------|-----------------------------|
| Home                    | Home (gray)       | `/home`                     |
| Launch Instance         | Rocket (blue)     | `/artifact/registries`      |
| Instances               | Boxes (emerald)   | `/artifact/instances`       |
| Cluster Monitoring      | LineChart (teal)  | `/monitoring/clusters`      |
| **Setup** (collapsible) | Settings          |                             |
| └ Clusters              | Server (teal)     | `/configuration/clusters`   |
| └ Registries            | Database          | `/configuration/registries` |
| └ Users                 | Users (blue)      | `/configuration/users`      |

- The Setup section is expanded by default
- The active nav item is highlighted with a blue background
- On mobile, the sidebar collapses behind a hamburger menu toggle
- Navigation items are filtered dynamically by user role:
  - "Users" is shown only to admin users
  - Routes are also protected server-side

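
The role-based filtering described above might look like the following sketch. The labels and routes are taken from the sidebar table; the `NavItem` type, the `adminOnly` flag, and `visibleNavItems` are hypothetical names used only for illustration.

```typescript
type UserRole = "admin" | "user";

// Hypothetical sidebar model; `adminOnly` is an assumed flag.
interface NavItem {
  label: string;
  route?: string;
  adminOnly?: boolean;
  children?: NavItem[];
}

const NAV_ITEMS: NavItem[] = [
  { label: "Home", route: "/home" },
  { label: "Launch Instance", route: "/artifact/registries" },
  { label: "Instances", route: "/artifact/instances" },
  { label: "Cluster Monitoring", route: "/monitoring/clusters" },
  {
    label: "Setup",
    children: [
      { label: "Clusters", route: "/configuration/clusters" },
      { label: "Registries", route: "/configuration/registries" },
      { label: "Users", route: "/configuration/users", adminOnly: true },
    ],
  },
];

// Returns a copy of the tree without items the given role may not see.
function visibleNavItems(items: NavItem[], role: UserRole): NavItem[] {
  return items
    .filter((item) => role === "admin" || !item.adminOnly)
    .map((item) =>
      item.children
        ? { ...item, children: visibleNavItems(item.children, role) }
        : item
    );
}
```

Hiding the item is a convenience for the user, not a security control, which is why the routes are also protected server-side.
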
### Page Header / Breadcrumbs

Each page shows a header in the top navigation bar with:

- Page icon
- Page title (e.g., "Launch Instance", "Instances", "Setup - Clusters")
- Current user's name and role badge on the right
- **Sign Out** button (door icon with arrow, top-right)

The title mapping is:

| Route                       | Header Title          |
|-----------------------------|-----------------------|
| `/artifact/registries`      | Launch Instance       |
| `/artifact/instances`       | Instances             |
| `/configuration/clusters`   | Setup - Clusters      |
| `/configuration/registries` | Setup - Registries    |
| `/configuration/users`      | Setup - Users         |
| `/monitoring/clusters`      | Monitoring - Clusters |
| `/home`                     | OCDP Platform         |

### Legacy Route Redirects

Several legacy URL patterns redirect to current routes:

- `/config/*` → `/configuration/clusters`
- `/monitor`, `/cluster`, `/cluster/monitor` → `/monitoring/clusters`
- `/artifact/registry` → `/artifact/registries`
- `/artifact/instance` → `/artifact/instances`
- `/registry` → `/artifact/registries`
- `/register` → `/`

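
The redirect list above can be expressed as a small lookup table. The sketch below is an assumption about how such a mapping could be resolved, not the router configuration the application uses: `RedirectRule` and `resolveLegacyRoute` are hypothetical names, and treating `/config/*` as matching both `/config` and anything beneath it is an assumed interpretation of the wildcard.

```typescript
interface RedirectRule {
  from: string; // a pattern ending in "/*" also matches every sub-path
  to: string;
}

// Rules copied from the list above.
const LEGACY_REDIRECTS: RedirectRule[] = [
  { from: "/config/*", to: "/configuration/clusters" },
  { from: "/monitor", to: "/monitoring/clusters" },
  { from: "/cluster", to: "/monitoring/clusters" },
  { from: "/cluster/monitor", to: "/monitoring/clusters" },
  { from: "/artifact/registry", to: "/artifact/registries" },
  { from: "/artifact/instance", to: "/artifact/instances" },
  { from: "/registry", to: "/artifact/registries" },
  { from: "/register", to: "/" },
];

// Returns the redirect target for a legacy path, or null if no rule matches.
function resolveLegacyRoute(pathname: string): string | null {
  for (const rule of LEGACY_REDIRECTS) {
    if (rule.from.slice(-2) === "/*") {
      const base = rule.from.slice(0, -2); // "/config/*" -> "/config"
      if (pathname === base || pathname.indexOf(base + "/") === 0) {
        return rule.to;
      }
    } else if (pathname === rule.from) {
      return rule.to; // exact match, so "/registry" never catches "/registries"
    }
  }
  return null;
}
```

Non-wildcard rules are matched exactly so that a current route such as `/configuration/clusters` or `/artifact/registries` is never mistaken for a legacy path.
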
---