refactor: full-stack restructure with multi-tenancy, workspace management, and K8s diagnostics

- Add Workspace domain (entity, repository, service, handler, DTO)
- Add multi-tenant K8s client with tenant binding and quota management
- Add K8s diagnostics client (instance diagnostics)
- Add authorization middleware (authz package)
- Restructure frontend to feature-based architecture (features/)
- Add User Management page in configuration
- Add AccessDenied page and route guards
- Refactor shared components (form inputs, layout, UI)
- Update Tailwind config for new design system
- Add comprehensive documentation (docs/, tasks/, plans)
- Improve cluster service with better kubeconfig handling
- Add tests for crypto, config, helm client, tenant binding
8
tasks/lessons.md
Normal file
@@ -0,0 +1,8 @@
# Lessons

- Do not leave real bootstrap credentials, cluster endpoints, certificates, or passwords in code fallbacks. Bootstrap defaults must be empty/disabled; real data must come only from `.env`, `BOOTSTRAP_CONFIG_JSON`, or explicit config files.
- Keep backend permission names aligned with frontend route guards. Returning legacy domain permissions like `clusters:manage:own` without UI permissions such as `configuration:clusters:manage_own` makes ordinary users appear logged in but blocked by every page.
- Treat `requests.nvidia.com/gpumem` as a vendor integer MB scalar in this project. Do not normalize it through Kubernetes memory units such as `M`, `G`, or `Gi`; use values like `10000`.
- Multi-cluster tenant resources must be scoped by `(workspace_id, cluster_id)`. Do not infer the target cluster from list order; user/workspace defaults, kubeconfig issuance, namespace creation, ResourceQuota, and deploy must all use the same selected cluster.
- For real Helm smoke tests, wait for platform instance deletion to remove the DB record before deleting the Kubernetes namespace manually. Deleting the namespace too early can make the async Helm uninstall mark the instance failed.
- When embedding Helm, setting `actionConfig.Init(..., namespace, ...)` and `Install.Namespace` is not enough. The custom `RESTClientGetter` must also override the raw kubeconfig loader namespace, or manifests without `metadata.namespace` can be created in the kubeconfig context namespace such as `default`.

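The `requests.nvidia.com/gpumem` lesson can be illustrated with a small validator. This is a minimal sketch rather than the project's actual code: the function name `parse_gpumem_mb` and the idea that the value arrives as a string or an int are assumptions made for illustration.

```python
import re

# A plain non-negative integer: no unit suffix, no sign, no decimal point.
_INT_MB = re.compile(r"[0-9]+")


def parse_gpumem_mb(raw):
    """Return raw as an integer MB count, rejecting Kubernetes-style units.

    Hypothetical helper: values such as "10Gi" or "10000M" are refused
    instead of being normalized through Kubernetes quantity parsing.
    """
    if isinstance(raw, bool):
        # bool is a subclass of int in Python, so rule it out explicitly.
        raise ValueError(f"gpumem must be an integer MB value, got {raw!r}")
    if isinstance(raw, int):
        if raw < 0:
            raise ValueError(f"gpumem must be non-negative, got {raw!r}")
        return raw
    text = str(raw).strip()
    if not _INT_MB.fullmatch(text):
        raise ValueError(
            f"gpumem must be an integer MB value like 10000, got {raw!r}"
        )
    return int(text)
```

Rejecting suffixed values outright, instead of converting them, keeps a mistyped `10Gi` from silently turning into a different amount of GPU memory.
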
30
tasks/session-notes.md
Normal file
@@ -0,0 +1,30 @@
# OCDP System Testing - Completion Report

## Delivered Documents

| Document | Path | Size / contents |
|------|------|------|
| User operation guide | `docs/user-guide.md` | 752 lines |
| Test scenario design | `docs/test-scenarios.md` | 67 KB, 12 categories, 100+ cases |
| Test user credentials | `docs/test-users.json` | 4 accounts |
| Consolidated bug report | `docs/bug-report.md` | 18 bugs, including security findings |
| user-a test report | `docs/bugs-user-a.md` | Frontend UI findings |
| user-b test report | `docs/bugs-user-b.md` | API/deployment findings |
| user-c test report | `docs/bugs-user-c.md` | Permission isolation findings |
| Security test report | `docs/security/bugs-security.md` | 6 security findings |

## Bug Summary: 18 Total

| Severity | Count | Description |
|--------|------|------|
| **P0 (Blocker)** | 2 | Launch button does not respond; SPA route renders a blank page |
| **P1 (High)** | 2 | DELETE returns 404, empty response body |
| **P2 (Medium)** | 6 | Missing APIs, silent namespace override, accessibility issues |
| **P3 (Low)** | 8 | Response format, missing security headers, CORS, user enumeration, etc. |
| **Total** | **18** | |

## Test Team

- user-a-agent ✅ (frontend testing)
- user-b-agent ✅ (API/deployment testing)
- user-c-agent ✅ (permission isolation testing)
- security-agent ✅ (security testing)

36
tasks/todo.md
Normal file
@@ -0,0 +1,36 @@
# OCDP Second Test Round - Complete

## Delivered Documents

| Document | Path | Contents |
|------|------|------|
| Consolidated report | `docs/test2-report.md` | Full results of the 3 tests |
| Quota test details | `docs/test2-quota.md` | Detailed analysis of quota limits |
| Values priority test | `docs/test2-values-priority.md` | Values override tests and conflict tests |
| UI overflow/scroll/refresh | `docs/test2-ui-overflow.md` | Playwright plus source code analysis |

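## Key Findings

### 1. Resource Quotas

| Finding | Impact |
|------|------|
| ✅ K8s ResourceQuota objects are created correctly and take effect | cpu/gpu/mem limits are enforced at the pod level |
| ❌ **No API-layer pre-check** | The backend accepts every deployment request; once the quota is exhausted, pods stay stuck in pending-install |
| ❌ **GPU quota can be bypassed** | A user with gpu=0 can submit a chart that requires GPUs |
| ❌ **Instances never move to failed automatically** | Over-quota instances stay stuck in pending-install indefinitely |
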
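The missing API-layer pre-check could look roughly like the following. This is a hedged sketch under stated assumptions, not the backend's implementation: the `Resources` type, its field names, and `quota_violations` are all hypothetical, and the sketch assumes the backend can obtain the tenant's limit, its current usage, and the chart's total request before starting the install.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource triple used for limits, usage, and requests."""
    cpu_millicores: int = 0
    memory_mb: int = 0
    gpu: int = 0


def quota_violations(limit, used, requested):
    """Return human-readable violations; an empty list means the request fits."""
    violations = []
    for field in ("cpu_millicores", "memory_mb", "gpu"):
        total = getattr(used, field) + getattr(requested, field)
        cap = getattr(limit, field)
        if total > cap:
            violations.append(f"{field}: {total} requested in total, limit is {cap}")
    return violations


# A tenant with gpu=0 submitting a chart that needs one GPU is rejected
# up front instead of ending up stuck in pending-install.
limit = Resources(cpu_millicores=4000, memory_mb=8192, gpu=0)
used = Resources(cpu_millicores=1000, memory_mb=2048, gpu=0)
request = Resources(cpu_millicores=500, memory_mb=1024, gpu=1)
problems = quota_violations(limit, used, request)
```

A handler could return these violations as a client error before calling Helm. The ResourceQuota object would remain the enforcement of last resort.
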
### 2. Values Override Priority

| Priority | Source | Notes |
|--------|------|------|
| 🥇 **Highest** | `values` JSON field | Structured JSON; overrides everything |
| 🥈 **Medium** | `valuesYaml` string | Overridden by `values` JSON |
| 🥉 **Lowest** | Chart built-in values.yaml | Default baseline |
| ⚠️ **Conflict** | Both `values` and `valuesYaml` provided | `values` JSON wins outright; the override is silent, with no warning |

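The precedence order above, plus a way to surface the currently silent conflicts, can be sketched as follows. This is an illustration only: `deep_merge` and `effective_values` are hypothetical names, the inputs are assumed to be already parsed into dicts, and the recursive merge (maps merged key by key, scalars and lists replaced) is an assumption about how the layers combine.

```python
def deep_merge(base, override, path="", conflicts=None):
    """Return a new dict with override layered over base.

    When a conflicts list is supplied, record the dotted path of every key
    whose existing value is replaced by a different one.
    """
    merged = dict(base)
    for key, value in override.items():
        here = f"{path}.{key}" if path else str(key)
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value, here, conflicts)
        else:
            if conflicts is not None and key in merged and merged[key] != value:
                conflicts.append(here)
            merged[key] = value
    return merged


def effective_values(chart_defaults, values_yaml, values_json):
    """Apply chart defaults < valuesYaml < values JSON and report conflicts."""
    conflicts = []
    # Only disagreements between the two user-supplied layers are reported.
    deep_merge(values_yaml, values_json, conflicts=conflicts)
    layered = deep_merge(chart_defaults, values_yaml)
    return deep_merge(layered, values_json), conflicts


defaults = {"replicas": 1, "image": {"repo": "nginx", "tag": "1.25"}}
from_yaml = {"replicas": 2, "image": {"tag": "1.26"}}
from_json = {"replicas": 3}
final, conflicts = effective_values(defaults, from_yaml, from_json)
```

Returning the conflict paths alongside the merged values would let the API attach a warning to the response instead of overriding silently.
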
### 3. Frontend UI

| Test | Result |
|------|------|
| Horizontal overflow | ✅ No issues |
| Responsive layout | ✅ sm/md/lg/xl correct |
| Scrolling | ✅ Smooth |
| Refresh | ✅ Works normally |
| Color contrast | ⚠️ Login error text in red-400 fails WCAG AA |

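The color contrast finding can be reproduced with the WCAG 2.x contrast formula. The sketch below rests on two assumptions that the report does not state: that `red-400` is the Tailwind v3 default palette value `#f87171`, and that the login error text sits on a white background. WCAG AA requires at least 4.5:1 for normal-size text.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color such as '#f87171', per WCAG 2.x."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding (0.03928 is the WCAG 2.x threshold).
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, always 1.0 or greater."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5
# Assumed colors: Tailwind v3 red-400 on a white background.
ratio = contrast_ratio("#f87171", "#ffffff")
passes_aa = ratio >= AA_NORMAL_TEXT
```

Under these assumptions the ratio falls below 4.5:1. A darker shade from the same palette could be checked with the same function before changing the login form.
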