fix: scale replicas in response, K8s metrics client, quota precheck, auth tests
- Add GetMetrics method to MetricsClient interface and implement cluster metrics API
- Add QuotaPrecheck service for validating resource quotas before deployment
- Add auth DTO with role/permission models and auth handler tests
- Add instance diagnostics: mounted NFS volumes, labels, annotations in pod diagnostics
- Update workspace handler with GetWorkspace endpoint and shared-user list
- Fix monitoring handler to use correct service method name
- Add tail_lines fallback in instance handler for snake_case query params
- Update nginx config for SSE log streaming support (no buffering)
- Add comprehensive test coverage: auth_service_test, auth_handler_test, auth_dto_test, metrics_client_test, quota_precheck_test
- Update error messages for quota validation and instance operations
- ModifyModal: fix YAML lineWidth:0, modified keys summary, delta-only submit
- InstanceCard: correctly disable scale-minus when replicas <= 0
- SidebarLayout: add hover transition for sidebar items
- Update todo.md and lessons.md with latest fixes
49
Makefile
@ -8,7 +8,7 @@ COMPOSE_BIN ?= docker compose
|
|||||||
ROOT_COMPOSE := docker-compose.yml
|
ROOT_COMPOSE := docker-compose.yml
|
||||||
COMPOSE := $(COMPOSE_BIN) -f $(ROOT_COMPOSE)
|
COMPOSE := $(COMPOSE_BIN) -f $(ROOT_COMPOSE)
|
||||||
|
|
||||||
.PHONY: help install run-2 clean-2 docker-dev docker-prod docker-up docker-down docker-logs docker-ps test
|
.PHONY: help install up restart stop clean run-2 clean-2 docker-dev docker-prod docker-up docker-down docker-logs docker-ps test
|
||||||
|
|
||||||
.DEFAULT_GOAL := help
|
.DEFAULT_GOAL := help
|
||||||
|
|
||||||
@ -17,10 +17,14 @@ help:
|
|||||||
@echo "OCDP commands"
|
@echo "OCDP commands"
|
||||||
@echo "────────────────────────────────────────"
|
@echo "────────────────────────────────────────"
|
||||||
@echo " make install Install local Go / frontend dependencies"
|
@echo " make install Install local Go / frontend dependencies"
|
||||||
@echo " make run-2 Build and start full Docker Compose stack in background"
|
@echo " make up Build and start the complete platform: DB + API + web gateway"
|
||||||
@echo " make docker-dev Alias of run-2, kept for old docs / muscle memory"
|
@echo " make restart Restart the complete platform without removing volumes"
|
||||||
@echo " make docker-prod Alias of run-2"
|
@echo " make stop Stop containers, keep volumes"
|
||||||
@echo " make docker-up Alias of run-2"
|
@echo " make clean Stop containers and remove project volumes"
|
||||||
|
@echo " make run-2 Alias of up, kept for old docs / muscle memory"
|
||||||
|
@echo " make docker-dev Alias of up"
|
||||||
|
@echo " make docker-prod Alias of up"
|
||||||
|
@echo " make docker-up Alias of up"
|
||||||
@echo " make docker-down Stop containers, keep volumes"
|
@echo " make docker-down Stop containers, keep volumes"
|
||||||
@echo " make clean-2 Stop containers and remove project volumes"
|
@echo " make clean-2 Stop containers and remove project volumes"
|
||||||
@echo " make docker-logs Follow Compose logs"
|
@echo " make docker-logs Follow Compose logs"
|
||||||
@ -37,32 +41,45 @@ install:
|
|||||||
@echo "→ Installing frontend dependencies"
|
@echo "→ Installing frontend dependencies"
|
||||||
@cd frontend && npm ci
|
@cd frontend && npm ci
|
||||||
|
|
||||||
run-2:
|
up:
|
||||||
@echo "→ Building and starting OCDP stack"
|
@echo "→ Building and starting OCDP stack"
|
||||||
@$(COMPOSE) up --build -d postgres backend nginx
|
@$(COMPOSE) up --build -d
|
||||||
@echo ""
|
@echo ""
|
||||||
@$(COMPOSE) ps
|
@$(COMPOSE) ps -a
|
||||||
@echo ""
|
@echo ""
|
||||||
@echo "Web: http://localhost:$${WEB_HTTP_PORT:-18080}"
|
@echo "Web: http://localhost:$${WEB_HTTP_PORT:-18080}"
|
||||||
@echo "Backend: http://localhost:$${BACKEND_PORT:-18081}/health"
|
@echo "Backend: http://localhost:$${BACKEND_PORT:-18081}/health"
|
||||||
|
|
||||||
docker-dev: run-2
|
restart:
|
||||||
|
@echo "→ Restarting OCDP stack"
|
||||||
|
@$(COMPOSE) up --build -d --force-recreate
|
||||||
|
@$(COMPOSE) ps -a
|
||||||
|
|
||||||
docker-prod: run-2
|
stop:
|
||||||
|
|
||||||
docker-up: run-2
|
|
||||||
|
|
||||||
docker-down:
|
|
||||||
@$(COMPOSE) down --remove-orphans
|
@$(COMPOSE) down --remove-orphans
|
||||||
|
|
||||||
clean-2:
|
clean:
|
||||||
@$(COMPOSE) down -v --remove-orphans
|
@$(COMPOSE) down -v --remove-orphans
|
||||||
|
|
||||||
|
run-2: up
|
||||||
|
|
||||||
|
docker-dev: up
|
||||||
|
|
||||||
|
docker-prod: up
|
||||||
|
|
||||||
|
docker-up: up
|
||||||
|
|
||||||
|
docker-down:
|
||||||
|
@$(MAKE) stop
|
||||||
|
|
||||||
|
clean-2:
|
||||||
|
@$(MAKE) clean
|
||||||
|
|
||||||
docker-logs:
|
docker-logs:
|
||||||
@$(COMPOSE) logs -f
|
@$(COMPOSE) logs -f
|
||||||
|
|
||||||
docker-ps:
|
docker-ps:
|
||||||
@$(COMPOSE) ps
|
@$(COMPOSE) ps -a
|
||||||
|
|
||||||
test:
|
test:
|
||||||
@test/readme-deployment-refresh.sh
|
@test/readme-deployment-refresh.sh
|
||||||
|
|||||||
72
README.md
@ -1,4 +1,4 @@
|
|||||||
# OCDP - Open Cloud Deployment Platform
|
# OCDP - One Click Deployment Platform
|
||||||
|
|
||||||
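OCDP is a large-model inference deployment platform for Kubernetes. The current core scenario: the user selects the `vllm-serve` Helm Chart from Harbor in the UI and fills in the instance name, namespace, and values; the backend then pulls the packaged OCI Helm Chart from Harbor and deploys it to the configured Kubernetes cluster through the Helm SDK.
|
OCDP is a large-model inference deployment platform for Kubernetes. The current core scenario: the user selects the `vllm-serve` Helm Chart from Harbor in the UI and fills in the instance name, namespace, and values; the backend then pulls the packaged OCI Helm Chart from Harbor and deploys it to the configured Kubernetes cluster through the Helm SDK.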
|
||||||
|
|
||||||
@ -29,9 +29,9 @@ ocdp-go/
|
|||||||
│ └── internal/bootstrap/ # First-start data seeding
|
│ └── internal/bootstrap/ # First-start data seeding
|
||||||
├── frontend/ # React + Vite frontend
|
├── frontend/ # React + Vite frontend
|
||||||
├── infra/nginx/ # Nginx gateway config and TLS certificates
|
├── infra/nginx/ # Nginx gateway config and TLS certificates
|
||||||
├── docker-compose.yml # Full local deployment: PostgreSQL + Backend + frontend build + Nginx
|
├── docker-compose.yml # Full local deployment entry point: PostgreSQL + Backend + frontend build job + Nginx
|
||||||
├── backend/docker-compose.yml # PostgreSQL + Backend + pgAdmin
|
├── backend/docker-compose.yml # PostgreSQL + Backend + pgAdmin
|
||||||
├── Makefile # Recommended entry point: install / run-2 / docker-dev / docker-down
|
├── Makefile # Recommended entry point: up / restart / stop / logs / ps
|
||||||
└── tasks/ # Agent work log
|
└── tasks/ # Agent work log
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -119,29 +119,29 @@ GOSUMDB=sum.golang.google.cn
|
|||||||
|
|
||||||
## Recommended deployment flow
|
## Recommended deployment flow
|
||||||
|
|
||||||
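The root Makefile is the recommended entry point. `docker-dev`, `docker-prod`, and `docker-up` are aliases kept for compatibility with older docs; they all start the same full Docker Compose stack: PostgreSQL, Backend, the frontend static build, and Nginx.
|
There is now a single recommended entry point: run `make up` in the project root. It starts the same full Docker Compose stack: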
|
||||||
|
|
||||||
|
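- `postgres`: database, long-running service.
|
||||||
|
- `backend`: Go API, long-running service; default host port `18081`.
|
||||||
|
- `frontend-build`: one-off build job; exiting after the build completes is expected.
|
||||||
|
- `nginx`: unified web entry point, long-running service; default host port `18080`. Both the frontend static files and `/api/*` are served through it.
|
||||||
|
|
||||||
|
So seeing `frontend-build` in `Exited (0)` does not mean the frontend is down; the frontend is served by `nginx`. In normal operation, at least the `postgres`, `backend`, and `nginx` containers should be `Up`.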
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# 1. Check .env in the project root
|
# 1. Check .env in the project root
|
||||||
ls .env
|
ls .env
|
||||||
|
|
||||||
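# 2. Optional: install local dependencies. Not required when only deploying the Docker stack, but the command is available.
|
# 2. If the default high ports are taken by other projects, change ports first; do not kill other projects' processes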
|
||||||
make install
|
|
||||||
|
|
||||||
# 3. If the default high ports are still taken by other projects, temporarily switch ports
|
|
||||||
export WEB_HTTP_PORT=18080
|
export WEB_HTTP_PORT=18080
|
||||||
export WEB_HTTPS_PORT=18443
|
export WEB_HTTPS_PORT=18443
|
||||||
export BACKEND_PORT=18081
|
export BACKEND_PORT=18081
|
||||||
export POSTGRES_PORT=15432
|
export POSTGRES_PORT=15432
|
||||||
|
|
||||||
# 4. Build and start the full stack in the background
|
# 3. Build and start the complete platform in the background
|
||||||
make run-2
|
make up
|
||||||
|
|
||||||
# For compatibility with older docs, you can also run:
|
# 4. Check services; postgres/backend/nginx should be Up, and frontend-build Exited(0) is normal
|
||||||
make docker-dev
|
|
||||||
make docker-prod
|
|
||||||
|
|
||||||
# 5. Check services
|
|
||||||
make docker-ps
|
make docker-ps
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -152,14 +152,23 @@ make docker-ps
|
|||||||
- Swagger UI: http://localhost:${BACKEND_PORT:-18081}/api/docs
|
- Swagger UI: http://localhost:${BACKEND_PORT:-18081}/api/docs
|
||||||
- Nginx health check: http://localhost:${WEB_HTTP_PORT:-18080}/healthz
|
- Nginx health check: http://localhost:${WEB_HTTP_PORT:-18080}/healthz
|
||||||
|
|
||||||
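Without Make, use the root Compose file directly. Be sure to add `--build`, because the backend image and frontend static assets need to be built:
|
Commands from older docs still work, but they are only aliases of `make up`: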
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
docker compose up --build -d postgres backend nginx
|
make run-2
|
||||||
docker compose ps
|
make docker-dev
|
||||||
|
make docker-prod
|
||||||
|
make docker-up
|
||||||
```
|
```
|
||||||
|
|
||||||
Running `docker compose up` directly also uses the same full stack; after code or Dockerfile changes, add `--build` explicitly to avoid reusing old images.
|
Without Make, use the root Compose file directly:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose up --build -d
|
||||||
|
docker compose ps -a
|
||||||
|
```
|
||||||
|
|
||||||
|
After changes to code, Dockerfiles, or frontend assets, use `make up` or `docker compose up --build -d` to avoid reusing old images or stale frontend static assets.
|
||||||
|
|
||||||
## Verifying the deployment
|
## Verifying the deployment
|
||||||
|
|
||||||
@ -210,20 +219,30 @@ ADMIN_PASS="<BOOTSTRAP_ADMIN_PASS>" \
|
|||||||
## Common operations commands
|
## Common operations commands
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
# Start or update the complete platform with one command
|
||||||
|
make up
|
||||||
|
|
||||||
|
# Force-recreate and restart the complete platform
|
||||||
|
make restart
|
||||||
|
|
||||||
|
# Check current status; confirm postgres/backend/nginx are Up
|
||||||
|
make docker-ps
|
||||||
|
docker compose ps -a
|
||||||
|
|
||||||
# View logs
|
# View logs
|
||||||
make docker-logs
|
make docker-logs
|
||||||
|
|
||||||
# Restart the backend
|
# Restart only the backend
|
||||||
docker compose restart backend
|
docker compose restart backend
|
||||||
|
|
||||||
# If the backend container was recreated, Nginx may still cache the old upstream IP; just restart this project's Nginx
|
# Restart only the web gateway
|
||||||
docker compose restart nginx
|
docker compose restart nginx
|
||||||
|
|
||||||
# Stop this project's services but keep data volumes
|
# Stop this project's services; keep the database and frontend build volumes
|
||||||
make docker-down
|
make stop
|
||||||
|
|
||||||
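# Remove this project's containers and data volumes; use with caution
|
# Remove this project's containers and data volumes; use with caution, this deletes PostgreSQL data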
|
||||||
make clean-2
|
make clean
|
||||||
```
|
```
|
||||||
|
|
||||||
## Local development and testing
|
## Local development and testing
|
||||||
@ -253,7 +272,8 @@ docker compose -f backend/docker-compose.yml --profile mock up -d backend-mock
|
|||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
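- Do not stop other projects to resolve port conflicts; prefer changing ports through `WEB_HTTP_PORT`, `WEB_HTTPS_PORT`, `BACKEND_PORT`, and `POSTGRES_PORT`. The current defaults are already `18080/18443/18081/15432`.
|
- Do not stop other projects to resolve port conflicts; prefer changing ports through `WEB_HTTP_PORT`, `WEB_HTTPS_PORT`, `BACKEND_PORT`, and `POSTGRES_PORT`. The current defaults are already `18080/18443/18081/15432`.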
|
||||||
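- If older docs mention `make docker-dev` or `make docker-prod`, these commands still work and start the same Docker stack.
|
- `frontend-build` is a one-off build job, and exit code `0` is its normal state; the frontend pages are served by the `nginx` container. If only backend/postgres are running, run `make up` or `docker compose up --build -d` to restore the full stack.
|
||||||
|
- If older docs mention `make docker-dev` or `make docker-prod`, these commands still work and call `make up` to start the same Docker stack.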
|
||||||
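- If a previous start with an old configuration failed, the PostgreSQL volume may still hold old encrypted data, which shows up as decryption failures on `/api/v1/clusters` or `/api/v1/registries`. In development or reinstall environments, run `make clean-2 && make docker-dev` to reinitialize; in production, do not delete the volume directly, back up the database first.
|
- If a previous start with an old configuration failed, the PostgreSQL volume may still hold old encrypted data, which shows up as decryption failures on `/api/v1/clusters` or `/api/v1/registries`. In development or reinstall environments, run `make clean-2 && make docker-dev` to reinitialize; in production, do not delete the volume directly, back up the database first.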
|
||||||
- `vllm-serve` must exist in Harbor as a Helm Chart OCI artifact; the backend looks for the Helm Chart layer and saves it as `.tgz`.
|
- `vllm-serve` must exist in Harbor as a Helm Chart OCI artifact; the backend looks for the Helm Chart layer and saves it as `.tgz`.
|
||||||
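- Harbor browsing uses `/api/v2.0/projects`, the project repositories API, and the artifacts API. If the robot account cannot list projects or artifacts, the page shows an explicit error; check the Harbor project membership and robot permissions rather than opening the global catalog to ordinary users.
|
- Harbor browsing uses `/api/v2.0/projects`, the project repositories API, and the artifacts API. If the robot account cannot list projects or artifacts, the page shows an explicit error; check the Harbor project membership and robot permissions rather than opening the global catalog to ordinary users.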
|
||||||
|
|||||||
@ -79,6 +79,12 @@ func main() {
|
|||||||
passwordHasher,
|
passwordHasher,
|
||||||
tokenGenerator,
|
tokenGenerator,
|
||||||
)
|
)
|
||||||
|
authService.SetUserLifecycleCleanup(
|
||||||
|
repos.InstanceRepo,
|
||||||
|
repos.ClusterRepo,
|
||||||
|
repos.BindingRepo,
|
||||||
|
repos.TenantKubeClient,
|
||||||
|
)
|
||||||
|
|
||||||
clusterService := service.NewClusterService(
|
clusterService := service.NewClusterService(
|
||||||
repos.ClusterRepo,
|
repos.ClusterRepo,
|
||||||
@ -106,10 +112,13 @@ func main() {
|
|||||||
instanceService.SetDiagnosticsClient(repos.DiagnosticsClient)
|
instanceService.SetDiagnosticsClient(repos.DiagnosticsClient)
|
||||||
instanceService.SetTenantProvisioning(repos.WorkspaceRepo, repos.TenantKubeClient)
|
instanceService.SetTenantProvisioning(repos.WorkspaceRepo, repos.TenantKubeClient)
|
||||||
instanceService.SetScaleClient(k8s.NewScaleClient())
|
instanceService.SetScaleClient(k8s.NewScaleClient())
|
||||||
|
instanceService.SetUserRepository(repos.UserRepo)
|
||||||
|
|
||||||
monitoringService := service.NewMonitoringService(
|
monitoringService := service.NewMonitoringService(
|
||||||
repos.ClusterRepo,
|
repos.ClusterRepo,
|
||||||
repos.MetricsClient,
|
repos.MetricsClient,
|
||||||
|
repos.InstanceRepo,
|
||||||
|
repos.UserRepo,
|
||||||
)
|
)
|
||||||
|
|
||||||
workspaceService := service.NewWorkspaceService(
|
workspaceService := service.NewWorkspaceService(
|
||||||
@ -243,8 +252,8 @@ func setupRouter(
|
|||||||
api := router.PathPrefix("/api/v1").Subrouter()
|
api := router.PathPrefix("/api/v1").Subrouter()
|
||||||
|
|
||||||
// ===== Auth routes =====
|
// ===== Auth routes =====
|
||||||
api.HandleFunc("/auth/login", authHandler.Login)
|
api.HandleFunc("/auth/login", authHandler.Login).Methods(http.MethodPost)
|
||||||
api.HandleFunc("/auth/refresh", authHandler.RefreshToken)
|
api.HandleFunc("/auth/refresh", authHandler.RefreshToken).Methods(http.MethodPost)
|
||||||
|
|
||||||
protected := api.PathPrefix("").Subrouter()
|
protected := api.PathPrefix("").Subrouter()
|
||||||
protected.Use(authMiddleware(authService))
|
protected.Use(authMiddleware(authService))
|
||||||
@ -262,6 +271,8 @@ func setupRouter(
|
|||||||
protected.HandleFunc("/clusters/{cluster_id}", clusterHandler.UpdateCluster).Methods(http.MethodPut)
|
protected.HandleFunc("/clusters/{cluster_id}", clusterHandler.UpdateCluster).Methods(http.MethodPut)
|
||||||
protected.HandleFunc("/clusters/{cluster_id}", clusterHandler.DeleteCluster).Methods(http.MethodDelete)
|
protected.HandleFunc("/clusters/{cluster_id}", clusterHandler.DeleteCluster).Methods(http.MethodDelete)
|
||||||
protected.HandleFunc("/clusters/{cluster_id}/health", clusterHandler.GetClusterHealth).Methods(http.MethodGet)
|
protected.HandleFunc("/clusters/{cluster_id}/health", clusterHandler.GetClusterHealth).Methods(http.MethodGet)
|
||||||
|
protected.HandleFunc("/clusters/{cluster_id}/stats", monitoringHandler.GetClusterStats).Methods(http.MethodGet)
|
||||||
|
protected.HandleFunc("/clusters/{cluster_id}/kubeconfig", workspaceHandler.IssueClusterKubeconfig).Methods(http.MethodGet)
|
||||||
|
|
||||||
// ===== Registry routes =====
|
// ===== Registry routes =====
|
||||||
protected.HandleFunc("/registries", registryHandler.CreateRegistry).Methods(http.MethodPost)
|
protected.HandleFunc("/registries", registryHandler.CreateRegistry).Methods(http.MethodPost)
|
||||||
@ -273,7 +284,9 @@ func setupRouter(
|
|||||||
|
|
||||||
// ===== Artifact routes =====
|
// ===== Artifact routes =====
|
||||||
protected.HandleFunc("/registries/{registry_id}/repositories", artifactHandler.ListRepositories).Methods(http.MethodGet)
|
protected.HandleFunc("/registries/{registry_id}/repositories", artifactHandler.ListRepositories).Methods(http.MethodGet)
|
||||||
|
protected.HandleFunc("/repositories/{repository_name:.+}/tags", artifactHandler.ListRepositoryTags).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts", artifactHandler.ListArtifacts).Methods(http.MethodGet)
|
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts", artifactHandler.ListArtifacts).Methods(http.MethodGet)
|
||||||
|
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/tags", artifactHandler.ListRepositoryTags).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts/{reference}", artifactHandler.GetArtifact).Methods(http.MethodGet)
|
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts/{reference}", artifactHandler.GetArtifact).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts/{reference}/values-schema", artifactHandler.GetArtifactValuesSchema).Methods(http.MethodGet)
|
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts/{reference}/values-schema", artifactHandler.GetArtifactValuesSchema).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts/{reference}/values-yaml", artifactHandler.GetArtifactValuesYAML).Methods(http.MethodGet)
|
protected.HandleFunc("/registries/{registry_id}/repositories/{repository_name:.+}/artifacts/{reference}/values-yaml", artifactHandler.GetArtifactValuesYAML).Methods(http.MethodGet)
|
||||||
@ -293,6 +306,7 @@ func setupRouter(
|
|||||||
// ===== Monitoring routes =====
|
// ===== Monitoring routes =====
|
||||||
protected.HandleFunc("/monitoring/clusters", monitoringHandler.ListClusterMonitoring).Methods(http.MethodGet)
|
protected.HandleFunc("/monitoring/clusters", monitoringHandler.ListClusterMonitoring).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/monitoring/clusters/{cluster_id}", monitoringHandler.GetClusterMonitoring).Methods(http.MethodGet)
|
protected.HandleFunc("/monitoring/clusters/{cluster_id}", monitoringHandler.GetClusterMonitoring).Methods(http.MethodGet)
|
||||||
|
protected.HandleFunc("/monitoring/clusters/{cluster_id}/metrics", monitoringHandler.GetClusterMonitoring).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/monitoring/clusters/{cluster_id}/nodes", monitoringHandler.GetNodeMetrics).Methods(http.MethodGet)
|
protected.HandleFunc("/monitoring/clusters/{cluster_id}/nodes", monitoringHandler.GetNodeMetrics).Methods(http.MethodGet)
|
||||||
protected.HandleFunc("/monitoring/summary", monitoringHandler.GetMonitoringSummary).Methods(http.MethodGet)
|
protected.HandleFunc("/monitoring/summary", monitoringHandler.GetMonitoringSummary).Methods(http.MethodGet)
|
||||||
|
|
||||||
@ -358,15 +372,16 @@ func loggingMiddleware(next http.Handler) http.Handler {
|
|||||||
// corsMiddleware is the CORS middleware
|
// corsMiddleware is the CORS middleware
|
||||||
func corsMiddleware(next http.Handler) http.Handler {
|
func corsMiddleware(next http.Handler) http.Handler {
|
||||||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
// Set CORS headers
|
|
||||||
origin := r.Header.Get("Origin")
|
origin := r.Header.Get("Origin")
|
||||||
if origin == "" {
|
if origin != "" {
|
||||||
origin = "*"
|
w.Header().Add("Vary", "Origin")
|
||||||
|
if corsOriginAllowed(origin) {
|
||||||
|
w.Header().Set("Access-Control-Allow-Origin", origin)
|
||||||
|
w.Header().Set("Access-Control-Allow-Credentials", "true")
|
||||||
|
}
|
||||||
}
|
}
|
||||||
w.Header().Set("Access-Control-Allow-Origin", origin)
|
|
||||||
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
|
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
|
||||||
w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization, X-Requested-With")
|
w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization, X-Requested-With")
|
||||||
w.Header().Set("Access-Control-Allow-Credentials", "true")
|
|
||||||
w.Header().Set("Access-Control-Max-Age", "86400")
|
w.Header().Set("Access-Control-Max-Age", "86400")
|
||||||
|
|
||||||
// Handle OPTIONS preflight requests
|
// Handle OPTIONS preflight requests
|
||||||
@ -378,3 +393,47 @@ func corsMiddleware(next http.Handler) http.Handler {
|
|||||||
next.ServeHTTP(w, r)
|
next.ServeHTTP(w, r)
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func corsOriginAllowed(origin string) bool {
|
||||||
|
origin = strings.TrimSpace(origin)
|
||||||
|
if origin == "" {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
for _, allowed := range corsAllowedOrigins() {
|
||||||
|
if origin == allowed {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
func corsAllowedOrigins() []string {
|
||||||
|
configured := strings.TrimSpace(os.Getenv("CORS_ALLOWED_ORIGINS"))
|
||||||
|
if configured == "" {
|
||||||
|
configured = strings.TrimSpace(os.Getenv("ALLOWED_ORIGINS"))
|
||||||
|
}
|
||||||
|
if configured == "" {
|
||||||
|
return []string{
|
||||||
|
"http://localhost:3000",
|
||||||
|
"http://localhost:5173",
|
||||||
|
"http://localhost:8080",
|
||||||
|
"http://localhost:18080",
|
||||||
|
"http://localhost:18081",
|
||||||
|
"http://127.0.0.1:3000",
|
||||||
|
"http://127.0.0.1:5173",
|
||||||
|
"http://127.0.0.1:8080",
|
||||||
|
"http://127.0.0.1:18080",
|
||||||
|
"http://127.0.0.1:18081",
|
||||||
|
"http://10.6.80.114:18080",
|
||||||
|
}
|
||||||
|
}
|
||||||
|
origins := make([]string, 0)
|
||||||
|
for _, origin := range strings.Split(configured, ",") {
|
||||||
|
origin = strings.TrimSpace(origin)
|
||||||
|
if origin == "" || origin == "*" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
origins = append(origins, origin)
|
||||||
|
}
|
||||||
|
return origins
|
||||||
|
}
|
||||||
|
|||||||
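To make the allow-list behaviour in this hunk concrete, the parsing loop can be restated as a pure function. This is a sketch that mirrors the logic shown in the diff; `parseAllowedOrigins` and `originAllowed` are hypothetical names introduced here, whereas the diff's functions read the environment directly.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedOrigins is a hypothetical pure-function restatement of the
// parsing loop in corsAllowedOrigins: split on commas, trim whitespace, and
// drop empty entries and the "*" wildcard.
func parseAllowedOrigins(configured string) []string {
	origins := make([]string, 0)
	for _, origin := range strings.Split(configured, ",") {
		origin = strings.TrimSpace(origin)
		if origin == "" || origin == "*" {
			continue
		}
		origins = append(origins, origin)
	}
	return origins
}

// originAllowed reports whether origin exactly matches an allow-list entry.
func originAllowed(origin string, allowed []string) bool {
	origin = strings.TrimSpace(origin)
	if origin == "" {
		return false
	}
	for _, a := range allowed {
		if origin == a {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowedOrigins(" https://app.example.com , *, ")
	fmt.Println(allowed)
	fmt.Println(originAllowed("https://app.example.com", allowed))
	fmt.Println(originAllowed("https://evil.example.com", allowed))
}
```

One consequence of this loop as written in the diff: a configured value consisting only of `*` yields an empty allow-list rather than the built-in localhost defaults, so no origin would receive CORS headers in that case.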
50
backend/cmd/api/main_test.go
Normal file
@ -0,0 +1,50 @@
|
|||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"net/http"
|
||||||
|
"net/http/httptest"
|
||||||
|
"testing"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestCORSMiddlewareAllowsDefaultLocalhostOrigin(t *testing.T) {
|
||||||
|
t.Setenv("CORS_ALLOWED_ORIGINS", "")
|
||||||
|
t.Setenv("ALLOWED_ORIGINS", "")
|
||||||
|
|
||||||
|
req := httptest.NewRequest(http.MethodGet, "/health", nil)
|
||||||
|
req.Header.Set("Origin", "http://localhost:5173")
|
||||||
|
rec := httptest.NewRecorder()
|
||||||
|
|
||||||
|
corsMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
w.WriteHeader(http.StatusOK)
|
||||||
|
})).ServeHTTP(rec, req)
|
||||||
|
|
||||||
|
if got := rec.Header().Get("Access-Control-Allow-Origin"); got != "http://localhost:5173" {
|
||||||
|
t.Fatalf("expected localhost origin to be allowed, got %q", got)
|
||||||
|
}
|
||||||
|
if got := rec.Header().Get("Access-Control-Allow-Credentials"); got != "true" {
|
||||||
|
t.Fatalf("expected credentials header for allowed origin, got %q", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestCORSMiddlewareDoesNotReflectDisallowedOrigin(t *testing.T) {
|
||||||
|
t.Setenv("CORS_ALLOWED_ORIGINS", "https://app.example.com")
|
||||||
|
t.Setenv("ALLOWED_ORIGINS", "")
|
||||||
|
|
||||||
|
req := httptest.NewRequest(http.MethodOptions, "/api/v1/auth/login", nil)
|
||||||
|
req.Header.Set("Origin", "https://evil.example.com")
|
||||||
|
rec := httptest.NewRecorder()
|
||||||
|
|
||||||
|
corsMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
t.Fatal("preflight should not call next handler")
|
||||||
|
})).ServeHTTP(rec, req)
|
||||||
|
|
||||||
|
if got := rec.Code; got != http.StatusNoContent {
|
||||||
|
t.Fatalf("expected preflight status %d, got %d", http.StatusNoContent, got)
|
||||||
|
}
|
||||||
|
if got := rec.Header().Get("Access-Control-Allow-Origin"); got != "" {
|
||||||
|
t.Fatalf("expected disallowed origin not to be reflected, got %q", got)
|
||||||
|
}
|
||||||
|
if got := rec.Header().Get("Access-Control-Allow-Credentials"); got != "" {
|
||||||
|
t.Fatalf("expected credentials header to be omitted for disallowed origin, got %q", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -1,19 +1,47 @@
|
|||||||
package dto
|
package dto
|
||||||
|
|
||||||
|
import "strings"
|
||||||
|
|
||||||
// RegisterRequest is the user registration request
|
// RegisterRequest is the user registration request
|
||||||
type RegisterRequest struct {
|
type RegisterRequest struct {
|
||||||
Username string `json:"username" binding:"required"`
|
Username string `json:"username" binding:"required"`
|
||||||
Password string `json:"password" binding:"required,min=6"`
|
Password string `json:"password" binding:"required,min=6"`
|
||||||
Role string `json:"role,omitempty"`
|
Role string `json:"role,omitempty"`
|
||||||
WorkspaceID string `json:"workspaceId,omitempty"`
|
WorkspaceID string `json:"workspaceId,omitempty"`
|
||||||
Namespace string `json:"namespace,omitempty"`
|
WorkspaceIDSnake string `json:"workspace_id,omitempty"`
|
||||||
DefaultClusterID string `json:"defaultClusterId,omitempty"`
|
Namespace string `json:"namespace,omitempty"`
|
||||||
QuotaCPU string `json:"quotaCpu,omitempty"`
|
DefaultClusterID string `json:"defaultClusterId,omitempty"`
|
||||||
QuotaMemory string `json:"quotaMemory,omitempty"`
|
DefaultClusterIDSnake string `json:"default_cluster_id,omitempty"`
|
||||||
QuotaGPU string `json:"quotaGpu,omitempty"`
|
QuotaCPU string `json:"quotaCpu,omitempty"`
|
||||||
QuotaGPUMem string `json:"quotaGpuMemory,omitempty"`
|
QuotaCPUSnake string `json:"quota_cpu,omitempty"`
|
||||||
IsActive *bool `json:"isActive,omitempty"`
|
QuotaMemory string `json:"quotaMemory,omitempty"`
|
||||||
MustChangePassword *bool `json:"mustChangePassword,omitempty"`
|
QuotaMemorySnake string `json:"quota_memory,omitempty"`
|
||||||
|
QuotaGPU string `json:"quotaGpu,omitempty"`
|
||||||
|
QuotaGPUSnake string `json:"quota_gpu,omitempty"`
|
||||||
|
QuotaGPUMem string `json:"quotaGpuMemory,omitempty"`
|
||||||
|
QuotaGPUMemSnake string `json:"quota_gpu_memory,omitempty"`
|
||||||
|
IsActive *bool `json:"isActive,omitempty"`
|
||||||
|
IsActiveSnake *bool `json:"is_active,omitempty"`
|
||||||
|
MustChangePassword *bool `json:"mustChangePassword,omitempty"`
|
||||||
|
MustChangePasswordSnake *bool `json:"must_change_password,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *RegisterRequest) Normalize() {
|
||||||
|
if r == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
r.WorkspaceID = firstNonBlank(r.WorkspaceID, r.WorkspaceIDSnake)
|
||||||
|
r.DefaultClusterID = firstNonBlank(r.DefaultClusterID, r.DefaultClusterIDSnake)
|
||||||
|
r.QuotaCPU = firstNonBlank(r.QuotaCPU, r.QuotaCPUSnake)
|
||||||
|
r.QuotaMemory = firstNonBlank(r.QuotaMemory, r.QuotaMemorySnake)
|
||||||
|
r.QuotaGPU = firstNonBlank(r.QuotaGPU, r.QuotaGPUSnake)
|
||||||
|
r.QuotaGPUMem = firstNonBlank(r.QuotaGPUMem, r.QuotaGPUMemSnake)
|
||||||
|
if r.IsActive == nil {
|
||||||
|
r.IsActive = r.IsActiveSnake
|
||||||
|
}
|
||||||
|
if r.MustChangePassword == nil {
|
||||||
|
r.MustChangePassword = r.MustChangePasswordSnake
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// LoginRequest is the user login request
|
// LoginRequest is the user login request
|
||||||
@ -68,14 +96,47 @@ type UserResponse struct {
|
|||||||
|
|
||||||
// UpdateUserRequest is the admin request to update a user's status/role
|
// UpdateUserRequest is the admin request to update a user's status/role
|
||||||
type UpdateUserRequest struct {
|
type UpdateUserRequest struct {
|
||||||
Role string `json:"role,omitempty"`
|
Role string `json:"role,omitempty"`
|
||||||
WorkspaceID string `json:"workspaceId,omitempty"`
|
WorkspaceID string `json:"workspaceId,omitempty"`
|
||||||
Namespace string `json:"namespace,omitempty"`
|
WorkspaceIDSnake string `json:"workspace_id,omitempty"`
|
||||||
DefaultClusterID string `json:"defaultClusterId,omitempty"`
|
Namespace string `json:"namespace,omitempty"`
|
||||||
QuotaCPU string `json:"quotaCpu,omitempty"`
|
DefaultClusterID string `json:"defaultClusterId,omitempty"`
|
||||||
QuotaMemory string `json:"quotaMemory,omitempty"`
|
DefaultClusterIDSnake string `json:"default_cluster_id,omitempty"`
|
||||||
QuotaGPU string `json:"quotaGpu,omitempty"`
|
QuotaCPU string `json:"quotaCpu,omitempty"`
|
||||||
QuotaGPUMem string `json:"quotaGpuMemory,omitempty"`
|
QuotaCPUSnake string `json:"quota_cpu,omitempty"`
|
||||||
IsActive *bool `json:"isActive,omitempty"`
|
QuotaMemory string `json:"quotaMemory,omitempty"`
|
||||||
MustChangePassword *bool `json:"mustChangePassword,omitempty"`
|
QuotaMemorySnake string `json:"quota_memory,omitempty"`
|
||||||
|
QuotaGPU string `json:"quotaGpu,omitempty"`
|
||||||
|
QuotaGPUSnake string `json:"quota_gpu,omitempty"`
|
||||||
|
QuotaGPUMem string `json:"quotaGpuMemory,omitempty"`
|
||||||
|
QuotaGPUMemSnake string `json:"quota_gpu_memory,omitempty"`
|
||||||
|
IsActive *bool `json:"isActive,omitempty"`
|
||||||
|
IsActiveSnake *bool `json:"is_active,omitempty"`
|
||||||
|
MustChangePassword *bool `json:"mustChangePassword,omitempty"`
|
||||||
|
MustChangePasswordSnake *bool `json:"must_change_password,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *UpdateUserRequest) Normalize() {
|
||||||
|
if r == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
r.WorkspaceID = firstNonBlank(r.WorkspaceID, r.WorkspaceIDSnake)
|
||||||
|
r.DefaultClusterID = firstNonBlank(r.DefaultClusterID, r.DefaultClusterIDSnake)
|
||||||
|
r.QuotaCPU = firstNonBlank(r.QuotaCPU, r.QuotaCPUSnake)
|
||||||
|
r.QuotaMemory = firstNonBlank(r.QuotaMemory, r.QuotaMemorySnake)
|
||||||
|
r.QuotaGPU = firstNonBlank(r.QuotaGPU, r.QuotaGPUSnake)
|
||||||
|
r.QuotaGPUMem = firstNonBlank(r.QuotaGPUMem, r.QuotaGPUMemSnake)
|
||||||
|
if r.IsActive == nil {
|
||||||
|
r.IsActive = r.IsActiveSnake
|
||||||
|
}
|
||||||
|
if r.MustChangePassword == nil {
|
||||||
|
r.MustChangePassword = r.MustChangePasswordSnake
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func firstNonBlank(primary, alternate string) string {
|
||||||
|
if strings.TrimSpace(primary) != "" {
|
||||||
|
return primary
|
||||||
|
}
|
||||||
|
return alternate
|
||||||
}
|
}
|
||||||
|
|||||||
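The effect of `Normalize` on an incoming JSON body can be illustrated with a trimmed-down struct. This is a sketch under assumptions: `quotaRequest` and `decode` are hypothetical stand-ins, not the DTOs or handler code from this commit, and only one string field and one `*bool` field are kept.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// quotaRequest is a hypothetical, reduced version of the DTOs in the diff,
// carrying both the camelCase and snake_case spelling of each field.
type quotaRequest struct {
	QuotaCPU      string `json:"quotaCpu,omitempty"`
	QuotaCPUSnake string `json:"quota_cpu,omitempty"`
	IsActive      *bool  `json:"isActive,omitempty"`
	IsActiveSnake *bool  `json:"is_active,omitempty"`
}

// firstNonBlank returns primary unless it is empty or whitespace-only.
func firstNonBlank(primary, alternate string) string {
	if strings.TrimSpace(primary) != "" {
		return primary
	}
	return alternate
}

// Normalize folds the snake_case alternates into the camelCase fields.
func (r *quotaRequest) Normalize() {
	if r == nil {
		return
	}
	r.QuotaCPU = firstNonBlank(r.QuotaCPU, r.QuotaCPUSnake)
	if r.IsActive == nil {
		r.IsActive = r.IsActiveSnake
	}
}

// decode unmarshals a JSON body and normalizes it in one step.
func decode(body string) (quotaRequest, error) {
	var req quotaRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return req, err
	}
	req.Normalize()
	return req, nil
}

func main() {
	req, err := decode(`{"quota_cpu":"2","is_active":false}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.QuotaCPU, *req.IsActive)
}
```

The `*bool` fields matter here: a pointer distinguishes "field absent" from an explicit `false`, so `is_active: false` survives normalization instead of being mistaken for a missing value.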
51
backend/internal/adapter/input/http/dto/auth_dto_test.go
Normal file
@ -0,0 +1,51 @@
|
|||||||
|
package dto
|
||||||
|
|
||||||
|
import "testing"
|
||||||
|
|
||||||
|
func TestRegisterRequestNormalizeUsesSnakeCaseAlternates(t *testing.T) {
|
||||||
|
active := false
|
||||||
|
mustChange := true
|
||||||
|
req := RegisterRequest{
|
||||||
|
WorkspaceIDSnake: "workspace-1",
|
||||||
|
DefaultClusterIDSnake: "cluster-1",
|
||||||
|
QuotaCPUSnake: "2",
|
||||||
|
QuotaMemorySnake: "4Gi",
|
||||||
|
QuotaGPUSnake: "1",
|
||||||
|
QuotaGPUMemSnake: "10000",
|
||||||
|
IsActiveSnake: &active,
|
||||||
|
MustChangePasswordSnake: &mustChange,
|
||||||
|
}
|
||||||
|
|
||||||
|
req.Normalize()
|
||||||
|
|
||||||
|
if req.WorkspaceID != "workspace-1" || req.DefaultClusterID != "cluster-1" {
|
||||||
|
t.Fatalf("expected snake case workspace/cluster fields to normalize, got %#v", req)
|
||||||
|
}
|
||||||
|
if req.QuotaCPU != "2" || req.QuotaMemory != "4Gi" || req.QuotaGPU != "1" || req.QuotaGPUMem != "10000" {
|
||||||
|
t.Fatalf("expected snake case quota fields to normalize, got %#v", req)
|
||||||
|
}
|
||||||
|
if req.IsActive == nil || *req.IsActive {
|
||||||
|
t.Fatalf("expected is_active=false to normalize, got %#v", req.IsActive)
|
||||||
|
}
|
||||||
|
if req.MustChangePassword == nil || !*req.MustChangePassword {
|
||||||
|
t.Fatalf("expected must_change_password=true to normalize, got %#v", req.MustChangePassword)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUpdateUserRequestNormalizeKeepsCamelCasePrimary(t *testing.T) {
|
||||||
|
req := UpdateUserRequest{
|
||||||
|
DefaultClusterID: "camel-cluster",
|
||||||
|
DefaultClusterIDSnake: "snake-cluster",
|
||||||
|
QuotaCPU: "3",
|
||||||
|
QuotaCPUSnake: "4",
|
||||||
|
}
|
||||||
|
|
||||||
|
req.Normalize()
|
||||||
|
|
||||||
|
if req.DefaultClusterID != "camel-cluster" {
|
||||||
|
t.Fatalf("expected camelCase defaultClusterId to win, got %q", req.DefaultClusterID)
|
||||||
|
}
|
||||||
|
if req.QuotaCPU != "3" {
|
||||||
|
t.Fatalf("expected camelCase quotaCpu to win, got %q", req.QuotaCPU)
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -2,25 +2,25 @@ package dto
|
|||||||
|
|
||||||
// CreateInstanceRequest is the request to create an instance
|
// CreateInstanceRequest is the request to create an instance
|
||||||
type CreateInstanceRequest struct {
|
type CreateInstanceRequest struct {
|
||||||
Name string `json:"name" binding:"required"`
|
Name string `json:"name" binding:"required"`
|
||||||
Namespace string `json:"namespace" binding:"required"`
|
Namespace string `json:"namespace" binding:"required"`
|
||||||
RegistryID string `json:"registryId" binding:"required"`
|
RegistryID string `json:"registryId" binding:"required"`
|
||||||
RegistryIDAlt string `json:"registry_id"`
|
RegistryIDAlt string `json:"registry_id"`
|
||||||
Repository string `json:"repository" binding:"required"`
|
Repository string `json:"repository" binding:"required"`
|
||||||
Tag string `json:"tag" binding:"required"`
|
Tag string `json:"tag" binding:"required"`
|
||||||
Description string `json:"description"`
|
Description string `json:"description"`
|
||||||
Values map[string]interface{} `json:"values"`
|
Values map[string]interface{} `json:"values"`
|
||||||
ValuesYAML string `json:"valuesYaml"`
|
ValuesYAML string `json:"valuesYaml"`
|
||||||
ValuesYAMLAlt string `json:"values_yaml"`
|
ValuesYAMLAlt string `json:"values_yaml"`
|
||||||
}
|
}
|
||||||
|
|
||||||
// UpdateInstanceRequest is the request payload for updating an instance
|
// UpdateInstanceRequest is the request payload for updating an instance
|
||||||
type UpdateInstanceRequest struct {
|
type UpdateInstanceRequest struct {
|
||||||
Version string `json:"version"`
|
Version string `json:"version"`
|
||||||
Description string `json:"description"`
|
Description string `json:"description"`
|
||||||
Values map[string]interface{} `json:"values"`
|
Values map[string]interface{} `json:"values"`
|
||||||
ValuesYAML string `json:"valuesYaml"`
|
ValuesYAML string `json:"valuesYaml"`
|
||||||
ValuesYAMLAlt string `json:"values_yaml"`
|
ValuesYAMLAlt string `json:"values_yaml"`
|
||||||
}
|
}
|
||||||
|
|
||||||
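// Normalize merges fields supplied in different naming styles into the canonical fields
|
// Normalize merges fields supplied in different naming styles into the canonical fields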
|
||||||
@ -67,6 +67,7 @@ type InstanceResponse struct {
|
|||||||
Status string `json:"status"`
|
Status string `json:"status"`
|
||||||
WorkspaceID string `json:"workspaceId"`
|
WorkspaceID string `json:"workspaceId"`
|
||||||
OwnerID string `json:"ownerId"`
|
OwnerID string `json:"ownerId"`
|
||||||
|
OwnerUsername string `json:"ownerUsername,omitempty"`
|
||||||
AllowedActions []string `json:"allowedActions,omitempty"`
|
AllowedActions []string `json:"allowedActions,omitempty"`
|
||||||
StatusReason string `json:"statusReason,omitempty"`
|
StatusReason string `json:"statusReason,omitempty"`
|
||||||
LastOperation string `json:"lastOperation,omitempty"`
|
LastOperation string `json:"lastOperation,omitempty"`
|
||||||
|
|||||||
@ -8,29 +8,56 @@ import (
|
|||||||
|
|
||||||
// ClusterMetricsResponse is the cluster monitoring response
|
// ClusterMetricsResponse is the cluster monitoring response
|
||||||
type ClusterMetricsResponse struct {
|
type ClusterMetricsResponse struct {
|
||||||
ClusterID string `json:"clusterId"`
|
ClusterID string `json:"clusterId"`
|
||||||
ClusterName string `json:"clusterName"`
|
ClusterName string `json:"clusterName"`
|
||||||
Status string `json:"status"`
|
Status string `json:"status"`
|
||||||
Uptime string `json:"uptime"`
|
Uptime string `json:"uptime"`
|
||||||
NodeCount int `json:"nodeCount"`
|
NodeCount int `json:"nodeCount"`
|
||||||
PodCount int `json:"podCount"`
|
PodCount int `json:"podCount"`
|
||||||
LastCheck time.Time `json:"lastCheck"`
|
LastCheck time.Time `json:"lastCheck"`
|
||||||
TotalCPU string `json:"totalCpu"`
|
TotalCPU string `json:"totalCpu"`
|
||||||
TotalMemory string `json:"totalMemory"`
|
TotalMemory string `json:"totalMemory"`
|
||||||
TotalGPU int `json:"totalGpu"`
|
TotalGPU int `json:"totalGpu"`
|
||||||
UsedCPU string `json:"usedCpu"`
|
UsedCPU string `json:"usedCpu"`
|
||||||
UsedMemory string `json:"usedMemory"`
|
UsedMemory string `json:"usedMemory"`
|
||||||
UsedGPU int `json:"usedGpu"`
|
UsedGPU int `json:"usedGpu"`
|
||||||
CPUUsage float64 `json:"cpuUsage"`
|
CPUUsage float64 `json:"cpuUsage"`
|
||||||
MemoryUsage float64 `json:"memoryUsage"`
|
MemoryUsage float64 `json:"memoryUsage"`
|
||||||
GPUUsage float64 `json:"gpuUsage"`
|
GPUUsage float64 `json:"gpuUsage"`
|
||||||
MaxNodeCPU string `json:"maxNodeCpu"`
|
CPURequests string `json:"cpuRequests,omitempty"`
|
||||||
MaxNodeMemory string `json:"maxNodeMemory"`
|
CPULimits string `json:"cpuLimits,omitempty"`
|
||||||
MaxNodeGPU int `json:"maxNodeGpu"`
|
MemoryRequests string `json:"memoryRequests,omitempty"`
|
||||||
MaxNodeCPUUsage float64 `json:"maxNodeCpuUsage"`
|
MemoryLimits string `json:"memoryLimits,omitempty"`
|
||||||
MaxNodeMemUsage float64 `json:"maxNodeMemUsage"`
|
GPURequests int64 `json:"gpuRequests,omitempty"`
|
||||||
MaxNodeGPUUsage float64 `json:"maxNodeGpuUsage"`
|
GPULimits int64 `json:"gpuLimits,omitempty"`
|
||||||
Nodes []NodeMetricsResponse `json:"nodes,omitempty"`
|
GPUMemoryRequestsMB int64 `json:"gpuMemoryRequestsMb,omitempty"`
|
||||||
|
GPUMemoryLimitsMB int64 `json:"gpuMemoryLimitsMb,omitempty"`
|
||||||
|
AllocatedGPU int64 `json:"allocatedGpu,omitempty"`
|
||||||
|
AllocatedGPUMemoryMB int64 `json:"allocatedGpuMemoryMb,omitempty"`
|
||||||
|
ResourceUsageByUser []UserResourceUsageResponse `json:"resourceUsageByUser,omitempty"`
|
||||||
|
MaxNodeCPU string `json:"maxNodeCpu"`
|
||||||
|
MaxNodeMemory string `json:"maxNodeMemory"`
|
||||||
|
MaxNodeGPU int `json:"maxNodeGpu"`
|
||||||
|
MaxNodeCPUUsage float64 `json:"maxNodeCpuUsage"`
|
||||||
|
MaxNodeMemUsage float64 `json:"maxNodeMemUsage"`
|
||||||
|
MaxNodeGPUUsage float64 `json:"maxNodeGpuUsage"`
|
||||||
|
Nodes []NodeMetricsResponse `json:"nodes,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
type UserResourceUsageResponse struct {
|
||||||
|
UserID string `json:"userId"`
|
||||||
|
Username string `json:"username"`
|
||||||
|
WorkspaceID string `json:"workspaceId"`
|
||||||
|
InstanceCount int `json:"instanceCount"`
|
||||||
|
PodCount int `json:"podCount"`
|
||||||
|
CPURequests string `json:"cpuRequests"`
|
||||||
|
CPULimits string `json:"cpuLimits"`
|
||||||
|
MemoryRequests string `json:"memoryRequests"`
|
||||||
|
MemoryLimits string `json:"memoryLimits"`
|
||||||
|
GPURequests int64 `json:"gpuRequests"`
|
||||||
|
GPULimits int64 `json:"gpuLimits"`
|
||||||
|
GPUMemoryRequestsMB int64 `json:"gpuMemoryRequestsMb"`
|
||||||
|
GPUMemoryLimitsMB int64 `json:"gpuMemoryLimitsMb"`
|
||||||
}
|
}
|
||||||
|
|
||||||
// NodeMetricsResponse is the node monitoring response
|
// NodeMetricsResponse is the node monitoring response
|
||||||
@ -72,28 +99,59 @@ type MonitoringSummaryResponse struct {
|
|||||||
// ToClusterMetricsResponse converts cluster metrics into a response
|
// ToClusterMetricsResponse converts cluster metrics into a response
|
||||||
func ToClusterMetricsResponse(m *entity.ClusterMetrics) *ClusterMetricsResponse {
|
func ToClusterMetricsResponse(m *entity.ClusterMetrics) *ClusterMetricsResponse {
|
||||||
resp := &ClusterMetricsResponse{
|
resp := &ClusterMetricsResponse{
|
||||||
ClusterID: m.ClusterID,
|
ClusterID: m.ClusterID,
|
||||||
ClusterName: m.ClusterName,
|
ClusterName: m.ClusterName,
|
||||||
Status: m.Status,
|
Status: m.Status,
|
||||||
Uptime: m.Uptime,
|
Uptime: m.Uptime,
|
||||||
NodeCount: m.NodeCount,
|
NodeCount: m.NodeCount,
|
||||||
PodCount: m.PodCount,
|
PodCount: m.PodCount,
|
||||||
LastCheck: m.LastCheck,
|
LastCheck: m.LastCheck,
|
||||||
TotalCPU: m.TotalCPU,
|
TotalCPU: m.TotalCPU,
|
||||||
TotalMemory: m.TotalMemory,
|
TotalMemory: m.TotalMemory,
|
||||||
TotalGPU: m.TotalGPU,
|
TotalGPU: m.TotalGPU,
|
||||||
UsedCPU: m.UsedCPU,
|
UsedCPU: m.UsedCPU,
|
||||||
UsedMemory: m.UsedMemory,
|
UsedMemory: m.UsedMemory,
|
||||||
UsedGPU: m.UsedGPU,
|
UsedGPU: m.UsedGPU,
|
||||||
CPUUsage: m.CPUUsage,
|
CPUUsage: m.CPUUsage,
|
||||||
MemoryUsage: m.MemoryUsage,
|
MemoryUsage: m.MemoryUsage,
|
||||||
GPUUsage: m.GPUUsage,
|
GPUUsage: m.GPUUsage,
|
||||||
MaxNodeCPU: m.MaxNodeCPU,
|
CPURequests: m.CPURequests,
|
||||||
MaxNodeMemory: m.MaxNodeMemory,
|
CPULimits: m.CPULimits,
|
||||||
MaxNodeGPU: m.MaxNodeGPU,
|
MemoryRequests: m.MemoryRequests,
|
||||||
MaxNodeCPUUsage: m.MaxNodeCPUUsage,
|
MemoryLimits: m.MemoryLimits,
|
||||||
MaxNodeMemUsage: m.MaxNodeMemUsage,
|
GPURequests: m.GPURequests,
|
||||||
MaxNodeGPUUsage: m.MaxNodeGPUUsage,
|
GPULimits: m.GPULimits,
|
||||||
|
GPUMemoryRequestsMB: m.GPUMemoryRequestsMB,
|
||||||
|
GPUMemoryLimitsMB: m.GPUMemoryLimitsMB,
|
||||||
|
AllocatedGPU: m.AllocatedGPU,
|
||||||
|
AllocatedGPUMemoryMB: m.AllocatedGPUMemoryMB,
|
||||||
|
MaxNodeCPU: m.MaxNodeCPU,
|
||||||
|
MaxNodeMemory: m.MaxNodeMemory,
|
||||||
|
MaxNodeGPU: m.MaxNodeGPU,
|
||||||
|
MaxNodeCPUUsage: m.MaxNodeCPUUsage,
|
||||||
|
MaxNodeMemUsage: m.MaxNodeMemUsage,
|
||||||
|
MaxNodeGPUUsage: m.MaxNodeGPUUsage,
|
||||||
|
}
|
||||||
|
|
||||||
|
if len(m.ResourceUsageByUser) > 0 {
|
||||||
|
resp.ResourceUsageByUser = make([]UserResourceUsageResponse, len(m.ResourceUsageByUser))
|
||||||
|
for i, usage := range m.ResourceUsageByUser {
|
||||||
|
resp.ResourceUsageByUser[i] = UserResourceUsageResponse{
|
||||||
|
UserID: usage.UserID,
|
||||||
|
Username: usage.Username,
|
||||||
|
WorkspaceID: usage.WorkspaceID,
|
||||||
|
InstanceCount: usage.InstanceCount,
|
||||||
|
PodCount: usage.PodCount,
|
||||||
|
CPURequests: usage.CPURequests,
|
||||||
|
CPULimits: usage.CPULimits,
|
||||||
|
MemoryRequests: usage.MemoryRequests,
|
||||||
|
MemoryLimits: usage.MemoryLimits,
|
||||||
|
GPURequests: usage.GPURequests,
|
||||||
|
GPULimits: usage.GPULimits,
|
||||||
|
GPUMemoryRequestsMB: usage.GPUMemoryRequestsMB,
|
||||||
|
GPUMemoryLimitsMB: usage.GPUMemoryLimitsMB,
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(m.Nodes) > 0 {
|
if len(m.Nodes) > 0 {
|
||||||
|
|||||||
@ -126,6 +126,25 @@ func (h *ArtifactHandler) ListArtifacts(w http.ResponseWriter, r *http.Request)
|
|||||||
respondJSON(w, http.StatusOK, tagResponses)
|
respondJSON(w, http.StatusOK, tagResponses)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ListRepositoryTags is a compatibility alias for clients that request tags
|
||||||
|
// directly instead of the canonical artifacts endpoint.
|
||||||
|
func (h *ArtifactHandler) ListRepositoryTags(w http.ResponseWriter, r *http.Request) {
|
||||||
|
vars := mux.Vars(r)
|
||||||
|
if vars["registry_id"] == "" {
|
||||||
|
registryID := r.URL.Query().Get("registry_id")
|
||||||
|
if registryID == "" {
|
||||||
|
registryID = r.URL.Query().Get("registryId")
|
||||||
|
}
|
||||||
|
if registryID == "" {
|
||||||
|
respondError(w, http.StatusBadRequest, "Missing registry ID", "registry_id query parameter is required")
|
||||||
|
return
|
||||||
|
}
|
||||||
|
vars["registry_id"] = registryID
|
||||||
|
r = mux.SetURLVars(r, vars)
|
||||||
|
}
|
||||||
|
h.ListArtifacts(w, r)
|
||||||
|
}
|
||||||
|
|
||||||
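// GetArtifact returns the details of an artifact
|
// GetArtifact returns the details of an artifact
|
||||||
// @Summary Get artifact details
|
// @Summary Get artifact details
|
||||||
// @Description Get detailed information about the specified artifact
|
// @Description Get detailed information about the specified artifact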
|
||||||
|
|||||||
@ -3,8 +3,11 @@ package rest
|
|||||||
import (
|
import (
|
||||||
"context"
|
"context"
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
|
"net"
|
||||||
"net/http"
|
"net/http"
|
||||||
"strings"
|
"strings"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
"github.com/gorilla/mux"
|
"github.com/gorilla/mux"
|
||||||
"github.com/ocdp/cluster-service/internal/adapter/input/http/dto"
|
"github.com/ocdp/cluster-service/internal/adapter/input/http/dto"
|
||||||
@ -18,6 +21,74 @@ type AuthHandler struct {
|
|||||||
authService *service.AuthService
|
authService *service.AuthService
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const (
|
||||||
|
loginRateLimitWindow = time.Minute
|
||||||
|
loginRateLimitFailures = 5
|
||||||
|
)
|
||||||
|
|
||||||
|
var defaultLoginRateLimiter = newLoginRateLimiter(loginRateLimitWindow, loginRateLimitFailures)
|
||||||
|
|
||||||
|
type loginRateLimiter struct {
|
||||||
|
mu sync.Mutex
|
||||||
|
window time.Duration
|
||||||
|
limit int
|
||||||
|
failures map[string]loginFailureState
|
||||||
|
now func() time.Time
|
||||||
|
}
|
||||||
|
|
||||||
|
type loginFailureState struct {
|
||||||
|
count int
|
||||||
|
windowEnds time.Time
|
||||||
|
}
|
||||||
|
|
||||||
|
func newLoginRateLimiter(window time.Duration, limit int) *loginRateLimiter {
|
||||||
|
return &loginRateLimiter{
|
||||||
|
window: window,
|
||||||
|
limit: limit,
|
||||||
|
failures: make(map[string]loginFailureState),
|
||||||
|
now: time.Now,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (l *loginRateLimiter) Allow(key string) bool {
|
||||||
|
if l == nil || key == "" {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
l.mu.Lock()
|
||||||
|
defer l.mu.Unlock()
|
||||||
|
state, ok := l.failures[key]
|
||||||
|
now := l.now()
|
||||||
|
if !ok || now.After(state.windowEnds) {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
return state.count < l.limit
|
||||||
|
}
|
||||||
|
|
||||||
|
func (l *loginRateLimiter) RecordFailure(key string) {
|
||||||
|
if l == nil || key == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
l.mu.Lock()
|
||||||
|
defer l.mu.Unlock()
|
||||||
|
now := l.now()
|
||||||
|
state, ok := l.failures[key]
|
||||||
|
if !ok || now.After(state.windowEnds) {
|
||||||
|
l.failures[key] = loginFailureState{count: 1, windowEnds: now.Add(l.window)}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
state.count++
|
||||||
|
l.failures[key] = state
|
||||||
|
}
|
||||||
|
|
||||||
|
func (l *loginRateLimiter) Reset(key string) {
|
||||||
|
if l == nil || key == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
l.mu.Lock()
|
||||||
|
defer l.mu.Unlock()
|
||||||
|
delete(l.failures, key)
|
||||||
|
}
|
||||||
|
|
||||||
// NewAuthHandler creates the authentication handler
|
// NewAuthHandler creates the authentication handler
|
||||||
func NewAuthHandler(authService *service.AuthService) *AuthHandler {
|
func NewAuthHandler(authService *service.AuthService) *AuthHandler {
|
||||||
return &AuthHandler{
|
return &AuthHandler{
|
||||||
@ -41,6 +112,7 @@ func (h *AuthHandler) Register(w http.ResponseWriter, r *http.Request) {
|
|||||||
respondError(w, http.StatusBadRequest, "Invalid request body", err.Error())
|
respondError(w, http.StatusBadRequest, "Invalid request body", err.Error())
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
req.Normalize()
|
||||||
|
|
||||||
// Call the domain service
|
// Call the domain service
|
||||||
user, err := h.authService.Register(r.Context(), req.Username, req.Password, req.Role, req.WorkspaceID, service.UserWorkspaceOptions{
|
user, err := h.authService.Register(r.Context(), req.Username, req.Password, req.Role, req.WorkspaceID, service.UserWorkspaceOptions{
|
||||||
@ -79,6 +151,7 @@ func (h *AuthHandler) UpdateUser(w http.ResponseWriter, r *http.Request) {
|
|||||||
respondError(w, http.StatusBadRequest, "Invalid request body", err.Error())
|
respondError(w, http.StatusBadRequest, "Invalid request body", err.Error())
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
req.Normalize()
|
||||||
user, err := h.authService.UpdateUser(r.Context(), userID, req.Role, req.WorkspaceID, service.UserWorkspaceOptions{
|
user, err := h.authService.UpdateUser(r.Context(), userID, req.Role, req.WorkspaceID, service.UserWorkspaceOptions{
|
||||||
Namespace: req.Namespace,
|
Namespace: req.Namespace,
|
||||||
DefaultClusterID: req.DefaultClusterID,
|
DefaultClusterID: req.DefaultClusterID,
|
||||||
@ -120,12 +193,21 @@ func (h *AuthHandler) Login(w http.ResponseWriter, r *http.Request) {
|
|||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
rateLimitKey := loginRateLimitKey(r, req.Username)
|
||||||
|
if !defaultLoginRateLimiter.Allow(rateLimitKey) {
|
||||||
|
w.Header().Set("Retry-After", "60")
|
||||||
|
respondError(w, http.StatusTooManyRequests, "Too many login attempts", "too many login attempts; retry later")
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
// Call the domain service
|
// Call the domain service
|
||||||
accessToken, refreshToken, user, err := h.authService.Login(r.Context(), req.Username, req.Password)
|
accessToken, refreshToken, user, err := h.authService.Login(r.Context(), req.Username, req.Password)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
respondError(w, http.StatusUnauthorized, "Login failed", err.Error())
|
defaultLoginRateLimiter.RecordFailure(rateLimitKey)
|
||||||
|
respondError(w, http.StatusUnauthorized, "Invalid username or password", "invalid username or password")
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
defaultLoginRateLimiter.Reset(rateLimitKey)
|
||||||
|
|
||||||
workspace, _ := h.authService.GetWorkspaceByID(r.Context(), user.WorkspaceID)
|
workspace, _ := h.authService.GetWorkspaceByID(r.Context(), user.WorkspaceID)
|
||||||
|
|
||||||
@ -151,6 +233,23 @@ func (h *AuthHandler) Login(w http.ResponseWriter, r *http.Request) {
|
|||||||
respondJSON(w, http.StatusOK, response)
|
respondJSON(w, http.StatusOK, response)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func loginRateLimitKey(r *http.Request, username string) string {
|
||||||
|
client := strings.TrimSpace(r.Header.Get("X-Forwarded-For"))
|
||||||
|
if idx := strings.Index(client, ","); idx >= 0 {
|
||||||
|
client = strings.TrimSpace(client[:idx])
|
||||||
|
}
|
||||||
|
if client == "" {
|
||||||
|
client = strings.TrimSpace(r.Header.Get("X-Real-IP"))
|
||||||
|
}
|
||||||
|
if client == "" {
|
||||||
|
client = r.RemoteAddr
|
||||||
|
if host, _, err := net.SplitHostPort(client); err == nil {
|
||||||
|
client = host
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return strings.ToLower(strings.TrimSpace(username)) + "|" + client
|
||||||
|
}
|
||||||
|
|
||||||
func (h *AuthHandler) convertUserResponse(ctx context.Context, user *entity.User) *dto.UserResponse {
|
func (h *AuthHandler) convertUserResponse(ctx context.Context, user *entity.User) *dto.UserResponse {
|
||||||
workspace, _ := h.authService.GetWorkspaceByID(ctx, user.WorkspaceID)
|
workspace, _ := h.authService.GetWorkspaceByID(ctx, user.WorkspaceID)
|
||||||
return &dto.UserResponse{
|
return &dto.UserResponse{
|
||||||
|
|||||||
@ -0,0 +1,44 @@
|
|||||||
|
package rest
|
||||||
|
|
||||||
|
import (
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestLoginRateLimiterBlocksAfterConfiguredFailures(t *testing.T) {
|
||||||
|
now := time.Date(2026, 5, 14, 12, 0, 0, 0, time.UTC)
|
||||||
|
limiter := newLoginRateLimiter(time.Minute, 2)
|
||||||
|
limiter.now = func() time.Time { return now }
|
||||||
|
|
||||||
|
key := "user|127.0.0.1"
|
||||||
|
if !limiter.Allow(key) {
|
||||||
|
t.Fatal("expected first attempt to be allowed")
|
||||||
|
}
|
||||||
|
limiter.RecordFailure(key)
|
||||||
|
if !limiter.Allow(key) {
|
||||||
|
t.Fatal("expected second attempt to be allowed")
|
||||||
|
}
|
||||||
|
limiter.RecordFailure(key)
|
||||||
|
if limiter.Allow(key) {
|
||||||
|
t.Fatal("expected third attempt inside the window to be blocked")
|
||||||
|
}
|
||||||
|
|
||||||
|
now = now.Add(time.Minute + time.Second)
|
||||||
|
if !limiter.Allow(key) {
|
||||||
|
t.Fatal("expected attempts to be allowed after the window expires")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestLoginRateLimiterResetClearsFailures(t *testing.T) {
|
||||||
|
limiter := newLoginRateLimiter(time.Minute, 1)
|
||||||
|
key := "user|127.0.0.1"
|
||||||
|
|
||||||
|
limiter.RecordFailure(key)
|
||||||
|
if limiter.Allow(key) {
|
||||||
|
t.Fatal("expected key to be blocked after one failure")
|
||||||
|
}
|
||||||
|
limiter.Reset(key)
|
||||||
|
if !limiter.Allow(key) {
|
||||||
|
t.Fatal("expected reset key to be allowed")
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -4,6 +4,7 @@ import (
|
|||||||
"encoding/json"
|
"encoding/json"
|
||||||
"fmt"
|
"fmt"
|
||||||
"net/http"
|
"net/http"
|
||||||
|
"reflect"
|
||||||
"strconv"
|
"strconv"
|
||||||
"strings"
|
"strings"
|
||||||
"time"
|
"time"
|
||||||
@ -49,6 +50,11 @@ func (h *InstanceHandler) CreateInstance(w http.ResponseWriter, r *http.Request)
|
|||||||
return
|
return
|
||||||
}
|
}
|
||||||
req.Normalize()
|
req.Normalize()
|
||||||
|
parsedYAML, hasValuesYAML, err := parseAndCompareValues(req.Values, req.ValuesYAML)
|
||||||
|
if err != nil {
|
||||||
|
respondError(w, http.StatusBadRequest, "Invalid values", err.Error())
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
// Extract chart name from repository (e.g., "charts/nginx" -> "nginx")
|
// Extract chart name from repository (e.g., "charts/nginx" -> "nginx")
|
||||||
chart := req.Repository
|
chart := req.Repository
|
||||||
@ -71,21 +77,16 @@ func (h *InstanceHandler) CreateInstance(w http.ResponseWriter, r *http.Request)
|
|||||||
if req.Values != nil {
|
if req.Values != nil {
|
||||||
instance.SetValues(req.Values)
|
instance.SetValues(req.Values)
|
||||||
}
|
}
|
||||||
if req.ValuesYAML != "" {
|
if hasValuesYAML {
|
||||||
instance.SetValuesYAML(req.ValuesYAML)
|
instance.SetValuesYAML(req.ValuesYAML)
|
||||||
if req.Values == nil {
|
if req.Values == nil {
|
||||||
values, err := parseValuesYAML(req.ValuesYAML)
|
instance.SetValues(parsedYAML)
|
||||||
if err != nil {
|
|
||||||
respondError(w, http.StatusBadRequest, "Invalid values YAML", err.Error())
|
|
||||||
return
|
|
||||||
}
|
|
||||||
instance.SetValues(values)
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Call the domain service
|
// Call the domain service
|
||||||
if err := h.instanceService.CreateInstance(r.Context(), instance); err != nil {
|
if err := h.instanceService.CreateInstance(r.Context(), instance); err != nil {
|
||||||
respondError(w, http.StatusBadRequest, "Failed to create instance", err.Error())
|
respondServiceError(w, err, "Failed to create instance")
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -116,6 +117,7 @@ func (h *InstanceHandler) GetInstance(w http.ResponseWriter, r *http.Request) {
|
|||||||
respondError(w, http.StatusNotFound, "Instance not found", "resource does not belong to cluster")
|
respondError(w, http.StatusNotFound, "Instance not found", "resource does not belong to cluster")
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
h.instanceService.EnrichReplicas(r.Context(), clusterID, []*entity.Instance{instance})
|
||||||
|
|
||||||
respondJSON(w, http.StatusOK, convertInstanceResponse(instance, true))
|
respondJSON(w, http.StatusOK, convertInstanceResponse(instance, true))
|
||||||
}
|
}
|
||||||
@ -144,7 +146,7 @@ func (h *InstanceHandler) ListInstances(w http.ResponseWriter, r *http.Request)
|
|||||||
|
|
||||||
responses := make([]*dto.InstanceResponse, 0, len(instances))
|
responses := make([]*dto.InstanceResponse, 0, len(instances))
|
||||||
for _, instance := range instances {
|
for _, instance := range instances {
|
||||||
responses = append(responses, convertInstanceResponse(instance, false))
|
responses = append(responses, convertInstanceResponse(instance, true))
|
||||||
}
|
}
|
||||||
|
|
||||||
response := &dto.InstanceListResponse{
|
response := &dto.InstanceListResponse{
|
||||||
@ -177,6 +179,11 @@ func (h *InstanceHandler) UpdateInstance(w http.ResponseWriter, r *http.Request)
|
|||||||
return
|
return
|
||||||
}
|
}
|
||||||
req.Normalize()
|
req.Normalize()
|
||||||
|
parsedYAML, hasValuesYAML, err := parseAndCompareValues(req.Values, req.ValuesYAML)
|
||||||
|
if err != nil {
|
||||||
|
respondError(w, http.StatusBadRequest, "Invalid values", err.Error())
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
// Fetch the existing instance
|
// Fetch the existing instance
|
||||||
instance, err := h.instanceService.GetInstance(r.Context(), instanceID)
|
instance, err := h.instanceService.GetInstance(r.Context(), instanceID)
|
||||||
@ -194,21 +201,16 @@ func (h *InstanceHandler) UpdateInstance(w http.ResponseWriter, r *http.Request)
|
|||||||
if req.Description != "" {
|
if req.Description != "" {
|
||||||
instance.Description = req.Description
|
instance.Description = req.Description
|
||||||
}
|
}
|
||||||
if req.ValuesYAML != "" {
|
if hasValuesYAML {
|
||||||
instance.SetValuesYAML(req.ValuesYAML)
|
instance.SetValuesYAML(req.ValuesYAML)
|
||||||
if req.Values == nil {
|
if req.Values == nil {
|
||||||
values, err := parseValuesYAML(req.ValuesYAML)
|
instance.SetValues(parsedYAML)
|
||||||
if err != nil {
|
|
||||||
respondError(w, http.StatusBadRequest, "Invalid values YAML", err.Error())
|
|
||||||
return
|
|
||||||
}
|
|
||||||
instance.SetValues(values)
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Call the domain service
|
// Call the domain service
|
||||||
if err := h.instanceService.UpdateInstance(r.Context(), instance); err != nil {
|
if err := h.instanceService.UpdateInstance(r.Context(), instance); err != nil {
|
||||||
respondError(w, http.StatusBadRequest, "Failed to update instance", err.Error())
|
respondServiceError(w, err, "Failed to update instance")
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -345,7 +347,6 @@ func (h *InstanceHandler) StreamInstanceLogs(w http.ResponseWriter, r *http.Requ
|
|||||||
w.Header().Set("Content-Type", "text/event-stream")
|
w.Header().Set("Content-Type", "text/event-stream")
|
||||||
w.Header().Set("Cache-Control", "no-cache")
|
w.Header().Set("Cache-Control", "no-cache")
|
||||||
w.Header().Set("Connection", "keep-alive")
|
w.Header().Set("Connection", "keep-alive")
|
||||||
w.Header().Set("Access-Control-Allow-Origin", "*")
|
|
||||||
|
|
||||||
flusher, ok := w.(http.Flusher)
|
flusher, ok := w.(http.Flusher)
|
||||||
if !ok {
|
if !ok {
|
||||||
@ -585,6 +586,7 @@ func convertInstanceResponse(instance *entity.Instance, includeValues bool) *dto
|
|||||||
Status: string(instance.Status),
|
Status: string(instance.Status),
|
||||||
WorkspaceID: instance.WorkspaceID,
|
WorkspaceID: instance.WorkspaceID,
|
||||||
OwnerID: instance.OwnerID,
|
OwnerID: instance.OwnerID,
|
||||||
|
OwnerUsername: instance.OwnerUsername,
|
||||||
StatusReason: instance.StatusReason,
|
StatusReason: instance.StatusReason,
|
||||||
LastOperation: string(instance.LastOperation),
|
LastOperation: string(instance.LastOperation),
|
||||||
LastError: instance.LastError,
|
LastError: instance.LastError,
|
||||||
@ -622,6 +624,43 @@ func parseValuesYAML(valuesYAML string) (map[string]interface{}, error) {
|
|||||||
return values, nil
|
return values, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func parseAndCompareValues(values map[string]interface{}, valuesYAML string) (map[string]interface{}, bool, error) {
|
||||||
|
if strings.TrimSpace(valuesYAML) == "" {
|
||||||
|
return nil, false, nil
|
||||||
|
}
|
||||||
|
parsed, err := parseValuesYAML(valuesYAML)
|
||||||
|
if err != nil {
|
||||||
|
return nil, true, fmt.Errorf("invalid values YAML: %w", err)
|
||||||
|
}
|
||||||
|
if values == nil {
|
||||||
|
return parsed, true, nil
|
||||||
|
}
|
||||||
|
normalizedValues, err := normalizeJSONComparable(values)
|
||||||
|
if err != nil {
|
||||||
|
return nil, true, fmt.Errorf("invalid values: %w", err)
|
||||||
|
}
|
||||||
|
normalizedYAML, err := normalizeJSONComparable(parsed)
|
||||||
|
if err != nil {
|
||||||
|
return nil, true, fmt.Errorf("invalid values YAML: %w", err)
|
||||||
|
}
|
||||||
|
if !reflect.DeepEqual(normalizedValues, normalizedYAML) {
|
||||||
|
return nil, true, fmt.Errorf("values and valuesYaml conflict")
|
||||||
|
}
|
||||||
|
return parsed, true, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func normalizeJSONComparable(value interface{}) (interface{}, error) {
|
||||||
|
data, err := json.Marshal(value)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
var normalized interface{}
|
||||||
|
if err := json.Unmarshal(data, &normalized); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
return normalized, nil
|
||||||
|
}
|
||||||
|
|
||||||
func normalizeYAMLValue(value interface{}) (interface{}, error) {
|
func normalizeYAMLValue(value interface{}) (interface{}, error) {
|
||||||
switch typed := value.(type) {
|
switch typed := value.(type) {
|
||||||
case map[string]interface{}:
|
case map[string]interface{}:
|
||||||
|
|||||||
@ -43,6 +43,12 @@ func (h *MonitoringHandler) GetClusterMonitoring(w http.ResponseWriter, r *http.
|
|||||||
respondJSON(w, http.StatusOK, response)
|
respondJSON(w, http.StatusOK, response)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// GetClusterStats is a compatibility alias for cluster detail dashboards that
|
||||||
|
// historically read stats from /clusters/{id}/stats.
|
||||||
|
func (h *MonitoringHandler) GetClusterStats(w http.ResponseWriter, r *http.Request) {
|
||||||
|
h.GetClusterMonitoring(w, r)
|
||||||
|
}
|
||||||
|
|
||||||
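// ListClusterMonitoring returns monitoring information for all clusters
|
// ListClusterMonitoring returns monitoring information for all clusters
|
||||||
// @Summary List cluster monitoring
|
// @Summary List cluster monitoring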
|
||||||
// @Tags Monitoring
|
// @Tags Monitoring
|
||||||
|
|||||||
@ -2,6 +2,7 @@ package rest
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
|
"errors"
|
||||||
"net/http"
|
"net/http"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
@ -113,6 +114,15 @@ func (h *WorkspaceHandler) IssueCurrentKubeconfig(w http.ResponseWriter, r *http
|
|||||||
if clusterID == "" {
|
if clusterID == "" {
|
||||||
clusterID = r.URL.Query().Get("cluster_id")
|
clusterID = r.URL.Query().Get("cluster_id")
|
||||||
}
|
}
|
||||||
|
h.issueCurrentKubeconfigForCluster(w, r, clusterID)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (h *WorkspaceHandler) IssueClusterKubeconfig(w http.ResponseWriter, r *http.Request) {
|
||||||
|
clusterID := mux.Vars(r)["cluster_id"]
|
||||||
|
h.issueCurrentKubeconfigForCluster(w, r, clusterID)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (h *WorkspaceHandler) issueCurrentKubeconfigForCluster(w http.ResponseWriter, r *http.Request, clusterID string) {
|
||||||
kubeconfig, err := h.workspaceService.IssueCurrentKubeconfig(r.Context(), clusterID, 2*time.Hour)
|
kubeconfig, err := h.workspaceService.IssueCurrentKubeconfig(r.Context(), clusterID, 2*time.Hour)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
respondServiceError(w, err, "Failed to issue kubeconfig")
|
respondServiceError(w, err, "Failed to issue kubeconfig")
|
||||||
@ -152,11 +162,19 @@ func toWorkspaceResponse(workspace *entity.Workspace) workspaceResponse {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func respondServiceError(w http.ResponseWriter, err error, fallback string) {
|
func respondServiceError(w http.ResponseWriter, err error, fallback string) {
|
||||||
|
if errors.Is(err, service.ErrQuotaExceeded) {
|
||||||
|
respondError(w, http.StatusUnprocessableEntity, "Quota exceeded", err.Error())
|
||||||
|
return
|
||||||
|
}
|
||||||
switch err {
|
switch err {
|
||||||
case entity.ErrUnauthorized, authz.ErrUnauthenticated:
|
case entity.ErrUnauthorized, authz.ErrUnauthenticated:
|
||||||
respondError(w, http.StatusUnauthorized, "Unauthorized", err.Error())
|
respondError(w, http.StatusUnauthorized, "Unauthorized", err.Error())
|
||||||
case entity.ErrForbidden, authz.ErrForbidden, entity.ErrUserInactive, entity.ErrWorkspaceSuspended:
|
case entity.ErrForbidden, authz.ErrForbidden, entity.ErrUserInactive, entity.ErrWorkspaceSuspended:
|
||||||
respondError(w, http.StatusForbidden, "Forbidden", err.Error())
|
respondError(w, http.StatusForbidden, "Forbidden", err.Error())
|
||||||
|
case entity.ErrWorkspaceNamespaceConflict, entity.ErrUserHasInstances, entity.ErrWorkspaceExists, entity.ErrInstanceExists:
|
||||||
|
respondError(w, http.StatusConflict, "Conflict", err.Error())
|
||||||
|
case entity.ErrProtectedNamespace:
|
||||||
|
respondError(w, http.StatusForbidden, "Forbidden", err.Error())
|
||||||
case entity.ErrClusterNotFound, entity.ErrRegistryNotFound, entity.ErrInstanceNotFound, entity.ErrWorkspaceNotFound:
|
case entity.ErrClusterNotFound, entity.ErrRegistryNotFound, entity.ErrInstanceNotFound, entity.ErrWorkspaceNotFound:
|
||||||
respondError(w, http.StatusNotFound, fallback, err.Error())
|
respondError(w, http.StatusNotFound, fallback, err.Error())
|
||||||
default:
|
default:
|
||||||
|
|||||||
@ -4,7 +4,7 @@ import (
|
|||||||
"context"
|
"context"
|
||||||
"fmt"
|
"fmt"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
"github.com/ocdp/cluster-service/internal/domain/entity"
|
"github.com/ocdp/cluster-service/internal/domain/entity"
|
||||||
"github.com/ocdp/cluster-service/internal/domain/repository"
|
"github.com/ocdp/cluster-service/internal/domain/repository"
|
||||||
)
|
)
|
||||||
@ -12,38 +12,47 @@ import (
|
|||||||
// HelmClientMock is a mock implementation of the Helm client.
|
// HelmClientMock is a mock implementation of the Helm client.
|
||||||
type HelmClientMock struct {
|
type HelmClientMock struct {
|
||||||
// Mock data store
|
// Mock data store
|
||||||
releases map[string]map[string]*entity.Instance // clusterID -> releaseName -> instance
|
releases map[string]map[string]*entity.Instance // clusterID -> releaseName -> instance
|
||||||
history map[string]map[string][]*entity.ReleaseHistory // clusterID -> releaseName -> []history
|
history map[string]map[string][]*entity.ReleaseHistory // clusterID -> releaseName -> []history
|
||||||
|
estimates map[string]map[string]*repository.ResourceEstimate // clusterID -> releaseName -> estimate
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewHelmClientMock creates the mock implementation.
|
// NewHelmClientMock creates the mock implementation.
|
||||||
func NewHelmClientMock() repository.HelmClient {
|
func NewHelmClientMock() repository.HelmClient {
|
||||||
return &HelmClientMock{
|
return &HelmClientMock{
|
||||||
releases: make(map[string]map[string]*entity.Instance),
|
releases: make(map[string]map[string]*entity.Instance),
|
||||||
history: make(map[string]map[string][]*entity.ReleaseHistory),
|
history: make(map[string]map[string][]*entity.ReleaseHistory),
|
||||||
|
estimates: make(map[string]map[string]*repository.ResourceEstimate),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c *HelmClientMock) SetResourceEstimate(clusterID, namespace, releaseName string, estimate *repository.ResourceEstimate) {
|
||||||
|
if c.estimates[clusterID] == nil {
|
||||||
|
c.estimates[clusterID] = make(map[string]*repository.ResourceEstimate)
|
||||||
|
}
|
||||||
|
c.estimates[clusterID][fmt.Sprintf("%s/%s", namespace, releaseName)] = estimate
|
||||||
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) Install(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
|
func (c *HelmClientMock) Install(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
|
||||||
// Initialize per-cluster data
|
// Initialize per-cluster data
|
||||||
if c.releases[cluster.ID] == nil {
|
if c.releases[cluster.ID] == nil {
|
||||||
c.releases[cluster.ID] = make(map[string]*entity.Instance)
|
c.releases[cluster.ID] = make(map[string]*entity.Instance)
|
||||||
c.history[cluster.ID] = make(map[string][]*entity.ReleaseHistory)
|
c.history[cluster.ID] = make(map[string][]*entity.ReleaseHistory)
|
||||||
}
|
}
|
||||||
|
|
||||||
// Check whether the release already exists
|
// Check whether the release already exists
|
||||||
key := fmt.Sprintf("%s/%s", instance.Namespace, instance.Name)
|
key := fmt.Sprintf("%s/%s", instance.Namespace, instance.Name)
|
||||||
if _, exists := c.releases[cluster.ID][key]; exists {
|
if _, exists := c.releases[cluster.ID][key]; exists {
|
||||||
return entity.ErrInstanceExists
|
return entity.ErrInstanceExists
|
||||||
}
|
}
|
||||||
|
|
||||||
// Mock install
|
// Mock install
|
||||||
instance.Status = entity.StatusDeployed
|
instance.Status = entity.StatusDeployed
|
||||||
instance.Revision = 1
|
instance.Revision = 1
|
||||||
instance.UpdatedAt = time.Now()
|
instance.UpdatedAt = time.Now()
|
||||||
|
|
||||||
c.releases[cluster.ID][key] = instance
|
c.releases[cluster.ID][key] = instance
|
||||||
|
|
||||||
// Append a history record
|
// Append a history record
|
||||||
c.history[cluster.ID][key] = []*entity.ReleaseHistory{
|
c.history[cluster.ID][key] = []*entity.ReleaseHistory{
|
||||||
{
|
{
|
||||||
@ -55,25 +64,25 @@ func (c *HelmClientMock) Install(ctx context.Context, cluster *entity.Cluster, i
|
|||||||
Description: "Install complete",
|
Description: "Install complete",
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) Upgrade(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
|
func (c *HelmClientMock) Upgrade(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
|
||||||
key := fmt.Sprintf("%s/%s", instance.Namespace, instance.Name)
|
key := fmt.Sprintf("%s/%s", instance.Namespace, instance.Name)
|
||||||
|
|
||||||
existing, exists := c.releases[cluster.ID][key]
|
existing, exists := c.releases[cluster.ID][key]
|
||||||
if !exists {
|
if !exists {
|
||||||
return entity.ErrInstanceNotFound
|
return entity.ErrInstanceNotFound
|
||||||
}
|
}
|
||||||
|
|
||||||
// Mock upgrade
|
// Mock upgrade
|
||||||
instance.Revision = existing.Revision + 1
|
instance.Revision = existing.Revision + 1
|
||||||
instance.Status = entity.StatusDeployed
|
instance.Status = entity.StatusDeployed
|
||||||
instance.UpdatedAt = time.Now()
|
instance.UpdatedAt = time.Now()
|
||||||
|
|
||||||
c.releases[cluster.ID][key] = instance
|
c.releases[cluster.ID][key] = instance
|
||||||
|
|
||||||
// Append a history record
|
// Append a history record
|
||||||
history := &entity.ReleaseHistory{
|
history := &entity.ReleaseHistory{
|
||||||
Revision: instance.Revision,
|
Revision: instance.Revision,
|
||||||
@ -84,44 +93,44 @@ func (c *HelmClientMock) Upgrade(ctx context.Context, cluster *entity.Cluster, i
|
|||||||
Description: "Upgrade complete",
|
Description: "Upgrade complete",
|
||||||
}
|
}
|
||||||
c.history[cluster.ID][key] = append(c.history[cluster.ID][key], history)
|
c.history[cluster.ID][key] = append(c.history[cluster.ID][key], history)
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) Uninstall(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) error {
|
func (c *HelmClientMock) Uninstall(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) error {
|
||||||
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
||||||
|
|
||||||
if _, exists := c.releases[cluster.ID][key]; !exists {
|
if _, exists := c.releases[cluster.ID][key]; !exists {
|
||||||
return entity.ErrInstanceNotFound
|
return entity.ErrInstanceNotFound
|
||||||
}
|
}
|
||||||
|
|
||||||
// Mock uninstall
|
// Mock uninstall
|
||||||
delete(c.releases[cluster.ID], key)
|
delete(c.releases[cluster.ID], key)
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) Rollback(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string, revision int) error {
|
func (c *HelmClientMock) Rollback(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string, revision int) error {
|
||||||
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
||||||
|
|
||||||
instance, exists := c.releases[cluster.ID][key]
|
instance, exists := c.releases[cluster.ID][key]
|
||||||
if !exists {
|
if !exists {
|
||||||
return entity.ErrInstanceNotFound
|
return entity.ErrInstanceNotFound
|
||||||
}
|
}
|
||||||
|
|
||||||
// Check that the requested revision exists in the history
|
// Check that the requested revision exists in the history
|
||||||
histories := c.history[cluster.ID][key]
|
histories := c.history[cluster.ID][key]
|
||||||
if revision > len(histories) || revision < 1 {
|
if revision > len(histories) || revision < 1 {
|
||||||
return fmt.Errorf("revision %d not found", revision)
|
return fmt.Errorf("revision %d not found", revision)
|
||||||
}
|
}
|
||||||
|
|
||||||
// Mock rollback
|
// Mock rollback
|
||||||
instance.Revision = len(histories) + 1
|
instance.Revision = len(histories) + 1
|
||||||
instance.Status = entity.StatusDeployed
|
instance.Status = entity.StatusDeployed
|
||||||
instance.UpdatedAt = time.Now()
|
instance.UpdatedAt = time.Now()
|
||||||
|
|
||||||
c.releases[cluster.ID][key] = instance
|
c.releases[cluster.ID][key] = instance
|
||||||
|
|
||||||
// Append a rollback history record
|
// Append a rollback history record
|
||||||
history := &entity.ReleaseHistory{
|
history := &entity.ReleaseHistory{
|
||||||
Revision: instance.Revision,
|
Revision: instance.Revision,
|
||||||
@ -132,33 +141,33 @@ func (c *HelmClientMock) Rollback(ctx context.Context, cluster *entity.Cluster,
|
|||||||
Description: fmt.Sprintf("Rollback to revision %d", revision),
|
Description: fmt.Sprintf("Rollback to revision %d", revision),
|
||||||
}
|
}
|
||||||
c.history[cluster.ID][key] = append(c.history[cluster.ID][key], history)
|
c.history[cluster.ID][key] = append(c.history[cluster.ID][key], history)
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) GetStatus(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) (*entity.Instance, error) {
|
func (c *HelmClientMock) GetStatus(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) (*entity.Instance, error) {
|
||||||
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
||||||
|
|
||||||
instance, exists := c.releases[cluster.ID][key]
|
instance, exists := c.releases[cluster.ID][key]
|
||||||
if !exists {
|
if !exists {
|
||||||
return nil, entity.ErrInstanceNotFound
|
return nil, entity.ErrInstanceNotFound
|
||||||
}
|
}
|
||||||
|
|
||||||
return instance, nil
|
return instance, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) GetHistory(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) ([]*entity.ReleaseHistory, error) {
|
func (c *HelmClientMock) GetHistory(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) ([]*entity.ReleaseHistory, error) {
|
||||||
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
||||||
|
|
||||||
if _, exists := c.releases[cluster.ID][key]; !exists {
|
if _, exists := c.releases[cluster.ID][key]; !exists {
|
||||||
return nil, entity.ErrInstanceNotFound
|
return nil, entity.ErrInstanceNotFound
|
||||||
}
|
}
|
||||||
|
|
||||||
histories := c.history[cluster.ID][key]
|
histories := c.history[cluster.ID][key]
|
||||||
if histories == nil {
|
if histories == nil {
|
||||||
return []*entity.ReleaseHistory{}, nil
|
return []*entity.ReleaseHistory{}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
return histories, nil
|
return histories, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -167,7 +176,7 @@ func (c *HelmClientMock) List(ctx context.Context, cluster *entity.Cluster, name
|
|||||||
if clusterReleases == nil {
|
if clusterReleases == nil {
|
||||||
return []*entity.Instance{}, nil
|
return []*entity.Instance{}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
instances := make([]*entity.Instance, 0)
|
instances := make([]*entity.Instance, 0)
|
||||||
for key, instance := range clusterReleases {
|
for key, instance := range clusterReleases {
|
||||||
// If a namespace is specified, return only the releases in that namespace
|
// If a namespace is specified, return only the releases in that namespace
|
||||||
@ -179,18 +188,18 @@ func (c *HelmClientMock) List(ctx context.Context, cluster *entity.Cluster, name
|
|||||||
}
|
}
|
||||||
instances = append(instances, c.releases[cluster.ID][key])
|
instances = append(instances, c.releases[cluster.ID][key])
|
||||||
}
|
}
|
||||||
|
|
||||||
return instances, nil
|
return instances, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (c *HelmClientMock) GetValues(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) (map[string]interface{}, error) {
|
func (c *HelmClientMock) GetValues(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) (map[string]interface{}, error) {
|
||||||
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
key := fmt.Sprintf("%s/%s", namespace, releaseName)
|
||||||
|
|
||||||
instance, exists := c.releases[cluster.ID][key]
|
instance, exists := c.releases[cluster.ID][key]
|
||||||
if !exists {
|
if !exists {
|
||||||
return nil, entity.ErrInstanceNotFound
|
return nil, entity.ErrInstanceNotFound
|
||||||
}
|
}
|
||||||
|
|
||||||
return instance.Values, nil
|
return instance.Values, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -204,3 +213,16 @@ func (c *HelmClientMock) GetChartDefaultValues(chartPath string) (map[string]int
|
|||||||
}, nil
|
}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c *HelmClientMock) EstimateInstanceResources(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) (*repository.ResourceEstimate, error) {
|
||||||
|
clusterID := ""
|
||||||
|
if cluster != nil {
|
||||||
|
clusterID = cluster.ID
|
||||||
|
}
|
||||||
|
key := fmt.Sprintf("%s/%s", instance.Namespace, instance.Name)
|
||||||
|
if c.estimates[clusterID] != nil {
|
||||||
|
if estimate := c.estimates[clusterID][key]; estimate != nil {
|
||||||
|
return estimate, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return &repository.ResourceEstimate{}, nil
|
||||||
|
}
|
||||||
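The estimate store added to the mock above is a two-level map keyed by cluster ID and then by "namespace/name", with a lazily created inner map and an empty estimate as the fallback. A minimal standalone sketch of that pattern follows; the estimate and store types are hypothetical, trimmed-down stand-ins for repository.ResourceEstimate and HelmClientMock.

```go
package main

import "fmt"

// estimate is a hypothetical stand-in for repository.ResourceEstimate.
type estimate struct{ CPUMilli int64 }

// store keeps estimates as clusterID -> "namespace/name" -> estimate.
type store struct {
	estimates map[string]map[string]*estimate
}

func newStore() *store {
	return &store{estimates: make(map[string]map[string]*estimate)}
}

// set creates the inner map on first use; writing to a nil map would panic.
func (s *store) set(clusterID, namespace, name string, e *estimate) {
	if s.estimates[clusterID] == nil {
		s.estimates[clusterID] = make(map[string]*estimate)
	}
	s.estimates[clusterID][namespace+"/"+name] = e
}

// get returns the stored estimate or an empty one. Reading through a missing
// (nil) inner map is safe in Go and simply yields the zero value.
func (s *store) get(clusterID, namespace, name string) *estimate {
	if e := s.estimates[clusterID][namespace+"/"+name]; e != nil {
		return e
	}
	return &estimate{}
}

func main() {
	s := newStore()
	s.set("c1", "team-a", "web", &estimate{CPUMilli: 1500})
	fmt.Println(s.get("c1", "team-a", "web").CPUMilli)
	fmt.Println(s.get("c2", "team-a", "web").CPUMilli)
}
```

Returning an empty estimate rather than an error keeps tests that never call the setter working unchanged.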
|
|||||||
@ -10,6 +10,7 @@ import (
|
|||||||
|
|
||||||
"github.com/ocdp/cluster-service/internal/domain/entity"
|
"github.com/ocdp/cluster-service/internal/domain/entity"
|
||||||
"github.com/ocdp/cluster-service/internal/domain/repository"
|
"github.com/ocdp/cluster-service/internal/domain/repository"
|
||||||
|
domainservice "github.com/ocdp/cluster-service/internal/domain/service"
|
||||||
"helm.sh/helm/v3/pkg/action"
|
"helm.sh/helm/v3/pkg/action"
|
||||||
"helm.sh/helm/v3/pkg/chart/loader"
|
"helm.sh/helm/v3/pkg/chart/loader"
|
||||||
"helm.sh/helm/v3/pkg/cli"
|
"helm.sh/helm/v3/pkg/cli"
|
||||||
@ -346,6 +347,41 @@ func (h *HelmClient) GetChartDefaultValues(chartPath string) (map[string]interfa
|
|||||||
return vals, nil
|
return vals, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (h *HelmClient) EstimateInstanceResources(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) (*repository.ResourceEstimate, error) {
|
||||||
|
chartPath := fmt.Sprintf("/tmp/charts/%s-%s.tgz", instance.Chart, instance.Version)
|
||||||
|
chart, err := loader.Load(chartPath)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to load chart: %w", err)
|
||||||
|
}
|
||||||
|
actionConfig := new(action.Configuration)
|
||||||
|
actionConfig.Log = func(format string, v ...interface{}) {}
|
||||||
|
|
||||||
|
install := action.NewInstall(actionConfig)
|
||||||
|
install.ReleaseName = instance.Name
|
||||||
|
if install.ReleaseName == "" {
|
||||||
|
install.ReleaseName = "quota-precheck"
|
||||||
|
}
|
||||||
|
install.Namespace = instance.Namespace
|
||||||
|
if install.Namespace == "" {
|
||||||
|
install.Namespace = "default"
|
||||||
|
}
|
||||||
|
install.DryRun = true
|
||||||
|
install.DryRunOption = "client"
|
||||||
|
install.ClientOnly = true
|
||||||
|
install.Replace = true
|
||||||
|
install.SkipSchemaValidation = true
|
||||||
|
|
||||||
|
values := instance.Values
|
||||||
|
if values == nil {
|
||||||
|
values = map[string]interface{}{}
|
||||||
|
}
|
||||||
|
release, err := install.RunWithContext(ctx, chart, values)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to render chart for quota estimate: %w", err)
|
||||||
|
}
|
||||||
|
return domainservice.EstimateRenderedManifestResources(release.Manifest)
|
||||||
|
}
|
||||||
|
|
||||||
// convertReleaseToInstance converts a Helm Release into an Instance.
|
// convertReleaseToInstance converts a Helm Release into an Instance.
|
||||||
func (h *HelmClient) convertReleaseToInstance(rel *release.Release) *entity.Instance {
|
func (h *HelmClient) convertReleaseToInstance(rel *release.Release) *entity.Instance {
|
||||||
return &entity.Instance{
|
return &entity.Instance{
|
||||||
|
|||||||
@ -63,7 +63,7 @@ func (c *MetricsClient) GetClusterMetrics(ctx context.Context, clusterID string)
|
|||||||
|
|
||||||
// Compute cluster-level aggregates
|
// Compute cluster-level aggregates
|
||||||
metrics := c.aggregateClusterMetrics(cluster, nodes.Items, pods.Items, nodeMetrics)
|
metrics := c.aggregateClusterMetrics(cluster, nodes.Items, pods.Items, nodeMetrics)
|
||||||
|
|
||||||
return metrics, nil
|
return metrics, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -87,6 +87,37 @@ func (c *MetricsClient) GetNodeMetrics(ctx context.Context, clusterID string) ([
|
|||||||
return c.getNodeMetricsData(ctx, clientset, metricsClient, nodes.Items)
|
return c.getNodeMetricsData(ctx, clientset, metricsClient, nodes.Items)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// GetPodResourceAllocations returns Kubernetes Pod requests/limits without
|
||||||
|
// inventing utilization values. GPU memory is treated as vendor integer MB.
|
||||||
|
func (c *MetricsClient) GetPodResourceAllocations(ctx context.Context, clusterID string) ([]*entity.PodResourceAllocation, error) {
|
||||||
|
cluster, err := c.clusterRepo.GetByID(ctx, clusterID)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to get cluster: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
clientset, _, err := c.createK8sClients(cluster)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to create k8s client: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
pods, err := clientset.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to list pods: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
result := make([]*entity.PodResourceAllocation, 0, len(pods.Items))
|
||||||
|
for _, pod := range pods.Items {
|
||||||
|
result = append(result, &entity.PodResourceAllocation{
|
||||||
|
ClusterID: clusterID,
|
||||||
|
Namespace: pod.Namespace,
|
||||||
|
PodName: pod.Name,
|
||||||
|
InstanceName: inferHelmReleaseName(pod.Labels),
|
||||||
|
Allocation: podResourceAllocation(&pod),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
return result, nil
|
||||||
|
}
|
||||||
|
|
||||||
// createK8sClients creates the Kubernetes clients.
|
// createK8sClients creates the Kubernetes clients.
|
||||||
func (c *MetricsClient) createK8sClients(cluster *entity.Cluster) (*kubernetes.Clientset, *metricsv.Clientset, error) {
|
func (c *MetricsClient) createK8sClients(cluster *entity.Cluster) (*kubernetes.Clientset, *metricsv.Clientset, error) {
|
||||||
config, err := clientcmd.RESTConfigFromKubeConfig([]byte(cluster.GetKubeConfig()))
|
config, err := clientcmd.RESTConfigFromKubeConfig([]byte(cluster.GetKubeConfig()))
|
||||||
@ -127,14 +158,14 @@ func (c *MetricsClient) getNodeMetricsData(
|
|||||||
|
|
||||||
for _, node := range nodes {
|
for _, node := range nodes {
|
||||||
nodeMetric := &entity.NodeMetrics{
|
nodeMetric := &entity.NodeMetrics{
|
||||||
NodeName: node.Name,
|
NodeName: node.Name,
|
||||||
Status: getNodeStatus(&node),
|
Status: getNodeStatus(&node),
|
||||||
Role: getNodeRole(&node),
|
Role: getNodeRole(&node),
|
||||||
Age: getNodeAge(&node),
|
Age: getNodeAge(&node),
|
||||||
OSImage: node.Status.NodeInfo.OSImage,
|
OSImage: node.Status.NodeInfo.OSImage,
|
||||||
KernelVersion: node.Status.NodeInfo.KernelVersion,
|
KernelVersion: node.Status.NodeInfo.KernelVersion,
|
||||||
ContainerRuntime: node.Status.NodeInfo.ContainerRuntimeVersion,
|
ContainerRuntime: node.Status.NodeInfo.ContainerRuntimeVersion,
|
||||||
KubeletVersion: node.Status.NodeInfo.KubeletVersion,
|
KubeletVersion: node.Status.NodeInfo.KubeletVersion,
|
||||||
}
|
}
|
||||||
|
|
||||||
// CPU
|
// CPU
|
||||||
@ -213,7 +244,7 @@ func (c *MetricsClient) aggregateClusterMetrics(
|
|||||||
var totalCPU, totalMem, usedCPU, usedMem int64
|
var totalCPU, totalMem, usedCPU, usedMem int64
|
||||||
var totalGPU, usedGPU int
|
var totalGPU, usedGPU int
|
||||||
healthyNodes := 0
|
healthyNodes := 0
|
||||||
|
|
||||||
// Per-node maximums
|
// Per-node maximums
|
||||||
var maxNodeCPU, maxNodeMem int64
|
var maxNodeCPU, maxNodeMem int64
|
||||||
var maxNodeGPU int
|
var maxNodeGPU int
|
||||||
@ -251,7 +282,7 @@ func (c *MetricsClient) aggregateClusterMetrics(
|
|||||||
// Read usage from nodeMetrics
|
// Read usage from nodeMetrics
|
||||||
if i < len(nodeMetrics) && nodeMetrics[i] != nil {
|
if i < len(nodeMetrics) && nodeMetrics[i] != nil {
|
||||||
metrics.Nodes = append(metrics.Nodes, *nodeMetrics[i])
|
metrics.Nodes = append(metrics.Nodes, *nodeMetrics[i])
|
||||||
|
|
||||||
// Update the per-node maximum utilization
|
// Update the per-node maximum utilization
|
||||||
if nodeMetrics[i].CPUPercent > maxNodeCPUUsage {
|
if nodeMetrics[i].CPUPercent > maxNodeCPUUsage {
|
||||||
maxNodeCPUUsage = nodeMetrics[i].CPUPercent
|
maxNodeCPUUsage = nodeMetrics[i].CPUPercent
|
||||||
@ -274,7 +305,7 @@ func (c *MetricsClient) aggregateClusterMetrics(
|
|||||||
metrics.TotalCPU = fmt.Sprintf("%.2f cores", float64(totalCPU)/1000.0)
|
metrics.TotalCPU = fmt.Sprintf("%.2f cores", float64(totalCPU)/1000.0)
|
||||||
metrics.TotalMemory = formatBytes(totalMem)
|
metrics.TotalMemory = formatBytes(totalMem)
|
||||||
metrics.TotalGPU = totalGPU
|
metrics.TotalGPU = totalGPU
|
||||||
|
|
||||||
// Format the per-node maximums
|
// Format the per-node maximums
|
||||||
metrics.MaxNodeCPU = fmt.Sprintf("%.2f cores", float64(maxNodeCPU)/1000.0)
|
metrics.MaxNodeCPU = fmt.Sprintf("%.2f cores", float64(maxNodeCPU)/1000.0)
|
||||||
metrics.MaxNodeMemory = formatBytes(maxNodeMem)
|
metrics.MaxNodeMemory = formatBytes(maxNodeMem)
|
||||||
@ -292,7 +323,7 @@ func (c *MetricsClient) aggregateClusterMetrics(
|
|||||||
usedMem += int64(nm.MemoryPercent * float64(totalMem) / 100.0)
|
usedMem += int64(nm.MemoryPercent * float64(totalMem) / 100.0)
|
||||||
usedGPU += nm.GPUUsage
|
usedGPU += nm.GPUUsage
|
||||||
}
|
}
|
||||||
|
|
||||||
if totalCPU > 0 {
|
if totalCPU > 0 {
|
||||||
metrics.CPUUsage = float64(usedCPU) / float64(totalCPU) * 100
|
metrics.CPUUsage = float64(usedCPU) / float64(totalCPU) * 100
|
||||||
}
|
}
|
||||||
@ -302,7 +333,7 @@ func (c *MetricsClient) aggregateClusterMetrics(
|
|||||||
if totalGPU > 0 {
|
if totalGPU > 0 {
|
||||||
metrics.GPUUsage = float64(usedGPU) / float64(totalGPU) * 100
|
metrics.GPUUsage = float64(usedGPU) / float64(totalGPU) * 100
|
||||||
}
|
}
|
||||||
|
|
||||||
metrics.UsedCPU = fmt.Sprintf("%.2f cores", float64(usedCPU)/1000.0)
|
metrics.UsedCPU = fmt.Sprintf("%.2f cores", float64(usedCPU)/1000.0)
|
||||||
metrics.UsedMemory = formatBytes(usedMem)
|
metrics.UsedMemory = formatBytes(usedMem)
|
||||||
metrics.UsedGPU = usedGPU
|
metrics.UsedGPU = usedGPU
|
||||||
@ -348,7 +379,7 @@ func getNodeAge(node *corev1.Node) string {
|
|||||||
age := time.Since(node.CreationTimestamp.Time)
|
age := time.Since(node.CreationTimestamp.Time)
|
||||||
days := int(age.Hours() / 24)
|
days := int(age.Hours() / 24)
|
||||||
hours := int(age.Hours()) % 24
|
hours := int(age.Hours()) % 24
|
||||||
|
|
||||||
if days > 0 {
|
if days > 0 {
|
||||||
return fmt.Sprintf("%dd %dh", days, hours)
|
return fmt.Sprintf("%dd %dh", days, hours)
|
||||||
}
|
}
|
||||||
@ -368,3 +399,110 @@ func formatBytes(bytes int64) string {
|
|||||||
return fmt.Sprintf("%.1f %ciB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
return fmt.Sprintf("%.1f %ciB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func inferHelmReleaseName(labels map[string]string) string {
|
||||||
|
if labels == nil {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
for _, key := range []string{
|
||||||
|
"app.kubernetes.io/instance",
|
||||||
|
"release",
|
||||||
|
"helm.sh/release",
|
||||||
|
"meta.helm.sh/release-name",
|
||||||
|
"app",
|
||||||
|
} {
|
||||||
|
if value := labels[key]; value != "" {
|
||||||
|
return value
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return ""
|
||||||
|
}
|
||||||
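inferHelmReleaseName above walks a fixed list of label keys and returns the first non-empty value, so the order of the list decides which label wins. A minimal sketch of that ordered lookup follows; firstLabel and the sample labels are hypothetical and only illustrate the precedence.

```go
package main

import "fmt"

// firstLabel returns the value of the first key in keys that has a non-empty
// value in labels. Indexing a nil map is safe in Go and yields "".
func firstLabel(labels map[string]string, keys ...string) string {
	for _, k := range keys {
		if v := labels[k]; v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Same precedence idea as the diff: more specific keys first, "app" last.
	keys := []string{"app.kubernetes.io/instance", "release", "app"}

	pod := map[string]string{"app": "web", "release": "web-prod"}
	fmt.Println(firstLabel(pod, keys...)) // "release" outranks "app"

	fmt.Println(firstLabel(nil, keys...) == "") // nil labels are fine
}
```

One thing worth checking in review: Helm normally writes meta.helm.sh/release-name as an annotation rather than a label, so that key may never match when only pod.Labels is passed in.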
|
|
||||||
|
func podResourceAllocation(pod *corev1.Pod) entity.ResourceAllocation {
|
||||||
|
if pod == nil {
|
||||||
|
return entity.ResourceAllocation{}
|
||||||
|
}
|
||||||
|
sum := entity.ResourceAllocation{}
|
||||||
|
for _, container := range pod.Spec.Containers {
|
||||||
|
sum = addContainerAllocation(sum, container)
|
||||||
|
}
|
||||||
|
initMax := entity.ResourceAllocation{}
|
||||||
|
for _, container := range pod.Spec.InitContainers {
|
||||||
|
initMax = maxAllocation(initMax, containerAllocation(container))
|
||||||
|
}
|
||||||
|
return maxAllocation(sum, initMax)
|
||||||
|
}
|
||||||
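podResourceAllocation above applies the usual pod-level rule: regular containers run side by side, so their values add up; init containers run one at a time, so only the largest counts; the pod's figure is the per-resource maximum of the two. A minimal sketch with a hypothetical two-field alloc struct standing in for entity.ResourceAllocation:

```go
package main

import "fmt"

// alloc is a hypothetical, trimmed-down stand-in for entity.ResourceAllocation.
type alloc struct {
	CPUMilli int64
	MemBytes int64
}

func add(a, b alloc) alloc {
	return alloc{a.CPUMilli + b.CPUMilli, a.MemBytes + b.MemBytes}
}

func maxI(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

func maxOf(a, b alloc) alloc {
	return alloc{maxI(a.CPUMilli, b.CPUMilli), maxI(a.MemBytes, b.MemBytes)}
}

// effective returns max(sum(containers), max(initContainers)) per resource.
func effective(containers, initContainers []alloc) alloc {
	var sum, initMax alloc
	for _, c := range containers {
		sum = add(sum, c)
	}
	for _, c := range initContainers {
		initMax = maxOf(initMax, c)
	}
	return maxOf(sum, initMax)
}

func main() {
	got := effective(
		[]alloc{{500, 256 << 20}, {250, 128 << 20}}, // app containers: 750m, 384 MiB
		[]alloc{{1000, 64 << 20}},                   // one init container: 1000m, 64 MiB
	)
	// CPU comes from the init container, memory from the summed app containers.
	fmt.Printf("%+v\n", got)
}
```

The maximum is taken per field, so the result can mix values from different containers, as in the example.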
|
|
||||||
|
func addContainerAllocation(base entity.ResourceAllocation, container corev1.Container) entity.ResourceAllocation {
|
||||||
|
return addAllocation(base, containerAllocation(container))
|
||||||
|
}
|
||||||
|
|
||||||
|
func containerAllocation(container corev1.Container) entity.ResourceAllocation {
|
||||||
|
requests := container.Resources.Requests
|
||||||
|
limits := container.Resources.Limits
|
||||||
|
return entity.ResourceAllocation{
|
||||||
|
CPURequestsMilli: quantityMilliValue(requests, corev1.ResourceCPU),
|
||||||
|
CPULimitsMilli: quantityMilliValue(limits, corev1.ResourceCPU),
|
||||||
|
MemoryRequestsBytes: quantityValue(requests, corev1.ResourceMemory),
|
||||||
|
MemoryLimitsBytes: quantityValue(limits, corev1.ResourceMemory),
|
||||||
|
GPURequests: quantityValue(requests, corev1.ResourceName("nvidia.com/gpu")),
|
||||||
|
GPULimits: quantityValue(limits, corev1.ResourceName("nvidia.com/gpu")),
|
||||||
|
GPUMemoryRequestsMB: quantityValueAny(requests, corev1.ResourceName("nvidia.com/gpumem"), corev1.ResourceName("requests.nvidia.com/gpumem")),
|
||||||
|
GPUMemoryLimitsMB: quantityValueAny(limits, corev1.ResourceName("nvidia.com/gpumem"), corev1.ResourceName("requests.nvidia.com/gpumem")),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func addAllocation(left, right entity.ResourceAllocation) entity.ResourceAllocation {
|
||||||
|
return entity.ResourceAllocation{
|
||||||
|
CPURequestsMilli: left.CPURequestsMilli + right.CPURequestsMilli,
|
||||||
|
CPULimitsMilli: left.CPULimitsMilli + right.CPULimitsMilli,
|
||||||
|
MemoryRequestsBytes: left.MemoryRequestsBytes + right.MemoryRequestsBytes,
|
||||||
|
MemoryLimitsBytes: left.MemoryLimitsBytes + right.MemoryLimitsBytes,
|
||||||
|
GPURequests: left.GPURequests + right.GPURequests,
|
||||||
|
GPULimits: left.GPULimits + right.GPULimits,
|
||||||
|
GPUMemoryRequestsMB: left.GPUMemoryRequestsMB + right.GPUMemoryRequestsMB,
|
||||||
|
GPUMemoryLimitsMB: left.GPUMemoryLimitsMB + right.GPUMemoryLimitsMB,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func maxAllocation(left, right entity.ResourceAllocation) entity.ResourceAllocation {
|
||||||
|
return entity.ResourceAllocation{
|
||||||
|
CPURequestsMilli: maxInt64(left.CPURequestsMilli, right.CPURequestsMilli),
|
||||||
|
CPULimitsMilli: maxInt64(left.CPULimitsMilli, right.CPULimitsMilli),
|
||||||
|
MemoryRequestsBytes: maxInt64(left.MemoryRequestsBytes, right.MemoryRequestsBytes),
|
||||||
|
MemoryLimitsBytes: maxInt64(left.MemoryLimitsBytes, right.MemoryLimitsBytes),
|
||||||
|
GPURequests: maxInt64(left.GPURequests, right.GPURequests),
|
||||||
|
GPULimits: maxInt64(left.GPULimits, right.GPULimits),
|
||||||
|
GPUMemoryRequestsMB: maxInt64(left.GPUMemoryRequestsMB, right.GPUMemoryRequestsMB),
|
||||||
|
GPUMemoryLimitsMB: maxInt64(left.GPUMemoryLimitsMB, right.GPUMemoryLimitsMB),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func quantityMilliValue(resources corev1.ResourceList, name corev1.ResourceName) int64 {
|
||||||
|
if quantity, ok := resources[name]; ok {
|
||||||
|
return quantity.MilliValue()
|
||||||
|
}
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func quantityValue(resources corev1.ResourceList, name corev1.ResourceName) int64 {
|
||||||
|
if quantity, ok := resources[name]; ok {
|
||||||
|
return quantity.Value()
|
||||||
|
}
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func quantityValueAny(resources corev1.ResourceList, names ...corev1.ResourceName) int64 {
|
||||||
|
for _, name := range names {
|
||||||
|
if quantity, ok := resources[name]; ok {
|
||||||
|
return quantity.Value()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func maxInt64(left, right int64) int64 {
|
||||||
|
if left > right {
|
||||||
|
return left
|
||||||
|
}
|
||||||
|
return right
|
||||||
|
}
|
||||||
|
|||||||
29
backend/internal/adapter/output/k8s/metrics_client_test.go
Normal file
@ -0,0 +1,29 @@
|
|||||||
|
package k8s
|
||||||
|
|
||||||
|
import (
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
corev1 "k8s.io/api/core/v1"
|
||||||
|
"k8s.io/apimachinery/pkg/api/resource"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestContainerAllocationCountsVendorGPUMemoryKey(t *testing.T) {
|
||||||
|
container := corev1.Container{
|
||||||
|
Resources: corev1.ResourceRequirements{
|
||||||
|
Requests: corev1.ResourceList{
|
||||||
|
corev1.ResourceName("nvidia.com/gpumem"): resource.MustParse("10000"),
|
||||||
|
},
|
||||||
|
Limits: corev1.ResourceList{
|
||||||
|
corev1.ResourceName("nvidia.com/gpumem"): resource.MustParse("12000"),
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
allocation := containerAllocation(container)
|
||||||
|
if allocation.GPUMemoryRequestsMB != 10000 {
|
||||||
|
t.Fatalf("expected GPU memory requests 10000 MB, got %d", allocation.GPUMemoryRequestsMB)
|
||||||
|
}
|
||||||
|
if allocation.GPUMemoryLimitsMB != 12000 {
|
||||||
|
t.Fatalf("expected GPU memory limits 12000 MB, got %d", allocation.GPUMemoryLimitsMB)
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -106,6 +106,25 @@ func (c *TenantClient) IssueKubeconfig(ctx context.Context, cluster *entity.Clus
|
|||||||
}, nil
|
}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c *TenantClient) GetResourceQuotaUsage(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) (*repository.ResourceQuotaUsage, error) {
|
||||||
|
binding = binding.WithDefaults()
|
||||||
|
if err := binding.Validate(); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
clientset, _, err := c.clientsetForCluster(cluster)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
quota, err := clientset.CoreV1().ResourceQuotas(binding.Namespace).Get(ctx, binding.ResourceQuotaName, metav1.GetOptions{})
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to get tenant resource quota usage: %w", err)
|
||||||
|
}
|
||||||
|
return &repository.ResourceQuotaUsage{
|
||||||
|
Hard: resourceVectorFromList(quota.Status.Hard),
|
||||||
|
Used: resourceVectorFromList(quota.Status.Used),
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
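GetResourceQuotaUsage above returns the quota's hard limits and current usage; a precheck can then compare used plus requested against hard for each dimension. The QuotaPrecheck service itself is not part of this hunk, so the sketch below is only an assumption about how such a comparison might look. The vector type uses plain integers instead of the real repository.ResourceVector, and treating a hard value of 0 as "no limit" is a choice made for this sketch.

```go
package main

import "fmt"

// vector is a hypothetical, simplified stand-in for repository.ResourceVector
// (CPU in millicores, memory in bytes, GPU memory in MB).
type vector struct {
	CPUMilli    int64
	MemoryBytes int64
	GPU         int64
	GPUMemoryMB int64
}

// exceeded lists the dimensions where used+requested would pass hard.
// Assumption: hard == 0 means the quota does not constrain that dimension.
func exceeded(hard, used, requested vector) []string {
	var out []string
	check := func(name string, h, u, r int64) {
		if h > 0 && u+r > h {
			out = append(out, name)
		}
	}
	check("cpu", hard.CPUMilli, used.CPUMilli, requested.CPUMilli)
	check("memory", hard.MemoryBytes, used.MemoryBytes, requested.MemoryBytes)
	check("gpu", hard.GPU, used.GPU, requested.GPU)
	check("gpumem", hard.GPUMemoryMB, used.GPUMemoryMB, requested.GPUMemoryMB)
	return out
}

func main() {
	hard := vector{CPUMilli: 4000, MemoryBytes: 8 << 30, GPU: 2, GPUMemoryMB: 24000}
	used := vector{CPUMilli: 3000, MemoryBytes: 4 << 30, GPU: 1, GPUMemoryMB: 10000}
	req := vector{CPUMilli: 1500, MemoryBytes: 1 << 30, GPU: 1, GPUMemoryMB: 12000}

	// Only CPU goes over: 3000m + 1500m > 4000m.
	fmt.Println(exceeded(hard, used, req))
}
```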
|
|
||||||
// SuspendTenant revokes tenant API access by deleting only the RoleBinding.
|
// SuspendTenant revokes tenant API access by deleting only the RoleBinding.
|
||||||
func (c *TenantClient) SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
func (c *TenantClient) SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
binding = binding.WithDefaults()
|
binding = binding.WithDefaults()
|
||||||
@ -128,6 +147,82 @@ func (c *TenantClient) SuspendTenant(ctx context.Context, cluster *entity.Cluste
|
|||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c *TenantClient) DeleteTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
|
binding = binding.WithDefaults()
|
||||||
|
if err := binding.Validate(); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if isProtectedTenantNamespace(binding.Namespace) {
|
||||||
|
return entity.ErrProtectedNamespace
|
||||||
|
}
|
||||||
|
clientset, _, err := c.clientsetForCluster(cluster)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if err := deleteIgnoringNotFound(ctx, func() error {
|
||||||
|
return clientset.RbacV1().RoleBindings(binding.Namespace).Delete(ctx, binding.RoleBindingName, metav1.DeleteOptions{})
|
||||||
|
}); err != nil {
|
||||||
|
return fmt.Errorf("failed to delete tenant role binding: %w", err)
|
||||||
|
}
|
||||||
|
if err := deleteIgnoringNotFound(ctx, func() error {
|
||||||
|
return clientset.CoreV1().ResourceQuotas(binding.Namespace).Delete(ctx, binding.ResourceQuotaName, metav1.DeleteOptions{})
|
||||||
|
}); err != nil {
|
||||||
|
return fmt.Errorf("failed to delete tenant resource quota: %w", err)
|
||||||
|
}
|
||||||
|
if err := deleteIgnoringNotFound(ctx, func() error {
|
||||||
|
return clientset.CoreV1().ServiceAccounts(binding.Namespace).Delete(ctx, binding.ServiceAccountName, metav1.DeleteOptions{})
|
||||||
|
}); err != nil {
|
||||||
|
return fmt.Errorf("failed to delete tenant service account: %w", err)
|
||||||
|
}
|
||||||
|
namespace, err := clientset.CoreV1().Namespaces().Get(ctx, binding.Namespace, metav1.GetOptions{})
|
||||||
|
if apierrors.IsNotFound(err) {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to get tenant namespace before deletion: %w", err)
|
||||||
|
}
|
||||||
|
if namespace.Labels["ocdp.io/managed-by"] != "ocdp" || namespace.Labels["ocdp.io/tenant"] != binding.Namespace {
|
||||||
|
return fmt.Errorf("refusing to delete unmanaged namespace %q", binding.Namespace)
|
||||||
|
}
|
||||||
|
if err := deleteIgnoringNotFound(ctx, func() error {
|
||||||
|
return clientset.CoreV1().Namespaces().Delete(ctx, binding.Namespace, metav1.DeleteOptions{})
|
||||||
|
}); err != nil {
|
||||||
|
return fmt.Errorf("failed to delete tenant namespace: %w", err)
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func deleteIgnoringNotFound(ctx context.Context, deleteFn func() error) error {
|
||||||
|
if err := ctx.Err(); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
err := deleteFn()
|
||||||
|
if apierrors.IsNotFound(err) {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
func isProtectedTenantNamespace(namespace string) bool {
|
||||||
|
switch strings.TrimSpace(namespace) {
|
||||||
|
case "", "default", "kube-system", "kube-public", "kube-node-lease":
|
||||||
|
return true
|
||||||
|
default:
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func resourceVectorFromList(values corev1.ResourceList) repository.ResourceVector {
|
||||||
|
gpu := values[corev1.ResourceName("requests.nvidia.com/gpu")]
|
||||||
|
gpuMem := values[corev1.ResourceName("requests.nvidia.com/gpumem")]
|
||||||
|
return repository.ResourceVector{
|
||||||
|
CPU: values[corev1.ResourceName("requests.cpu")],
|
||||||
|
Memory: values[corev1.ResourceName("requests.memory")],
|
||||||
|
GPU: gpu.Value(),
|
||||||
|
GPUMemoryMB: gpuMem.Value(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func (c *TenantClient) clientsetForCluster(cluster *entity.Cluster) (kubernetes.Interface, *rest.Config, error) {
|
func (c *TenantClient) clientsetForCluster(cluster *entity.Cluster) (kubernetes.Interface, *rest.Config, error) {
|
||||||
if c.clientset != nil {
|
if c.clientset != nil {
|
||||||
config := &rest.Config{Host: "https://kubernetes.default.svc"}
|
config := &rest.Config{Host: "https://kubernetes.default.svc"}
|
||||||
|
|||||||
@ -2,6 +2,7 @@ package k8s
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"context"
|
"context"
|
||||||
|
"errors"
|
||||||
"strings"
|
"strings"
|
||||||
"testing"
|
"testing"
|
||||||
"time"
|
"time"
|
||||||
@ -58,7 +59,7 @@ func TestTenantClientEnsureTenantUpdatesExistingResources(t *testing.T) {
|
|||||||
ctx := context.Background()
|
ctx := context.Background()
|
||||||
binding := tenantBinding()
|
binding := tenantBinding()
|
||||||
clientset := fake.NewSimpleClientset(
|
clientset := fake.NewSimpleClientset(
|
||||||
&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: binding.Namespace}},
|
&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: binding.Namespace, Labels: binding.Labels}},
|
||||||
&corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: binding.ServiceAccountName, Namespace: binding.Namespace}},
|
&corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: binding.ServiceAccountName, Namespace: binding.Namespace}},
|
||||||
&rbacv1.RoleBinding{
|
&rbacv1.RoleBinding{
|
||||||
ObjectMeta: metav1.ObjectMeta{Name: binding.RoleBindingName, Namespace: binding.Namespace},
|
ObjectMeta: metav1.ObjectMeta{Name: binding.RoleBindingName, Namespace: binding.Namespace},
|
||||||
@ -100,7 +101,7 @@ func TestTenantClientSuspendTenantDeletesOnlyRoleBinding(t *testing.T) {
|
|||||||
ctx := context.Background()
|
ctx := context.Background()
|
||||||
binding := tenantBinding()
|
binding := tenantBinding()
|
||||||
clientset := fake.NewSimpleClientset(
|
clientset := fake.NewSimpleClientset(
|
||||||
&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: binding.Namespace}},
|
&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: binding.Namespace, Labels: binding.Labels}},
|
||||||
&corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: binding.ServiceAccountName, Namespace: binding.Namespace}},
|
&corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: binding.ServiceAccountName, Namespace: binding.Namespace}},
|
||||||
desiredRoleBinding(binding),
|
desiredRoleBinding(binding),
|
||||||
)
|
)
|
||||||
@ -117,6 +118,47 @@ func TestTenantClientSuspendTenantDeletesOnlyRoleBinding(t *testing.T) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestTenantClientDeleteTenantDeletesTenantResources(t *testing.T) {
|
||||||
|
ctx := context.Background()
|
||||||
|
binding := tenantBinding()
|
||||||
|
clientset := fake.NewSimpleClientset(
|
||||||
|
&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: binding.Namespace, Labels: binding.Labels}},
|
||||||
|
&corev1.ServiceAccount{ObjectMeta: metav1.ObjectMeta{Name: binding.ServiceAccountName, Namespace: binding.Namespace}},
|
||||||
|
desiredRoleBinding(binding),
|
||||||
|
&corev1.ResourceQuota{ObjectMeta: metav1.ObjectMeta{Name: binding.ResourceQuotaName, Namespace: binding.Namespace}},
|
||||||
|
)
|
||||||
|
client := NewTenantClientForClientset(clientset)
|
||||||
|
|
||||||
|
if err := client.DeleteTenant(ctx, nil, binding); err != nil {
|
||||||
|
t.Fatalf("DeleteTenant returned error: %v", err)
|
||||||
|
}
|
||||||
|
if _, err := clientset.RbacV1().RoleBindings(binding.Namespace).Get(ctx, binding.RoleBindingName, metav1.GetOptions{}); !apierrors.IsNotFound(err) {
|
||||||
|
t.Fatalf("expected role binding deleted, got %v", err)
|
||||||
|
}
|
||||||
|
if _, err := clientset.CoreV1().ResourceQuotas(binding.Namespace).Get(ctx, binding.ResourceQuotaName, metav1.GetOptions{}); !apierrors.IsNotFound(err) {
|
||||||
|
t.Fatalf("expected resource quota deleted, got %v", err)
|
||||||
|
}
|
||||||
|
if _, err := clientset.CoreV1().ServiceAccounts(binding.Namespace).Get(ctx, binding.ServiceAccountName, metav1.GetOptions{}); !apierrors.IsNotFound(err) {
|
||||||
|
t.Fatalf("expected service account deleted, got %v", err)
|
||||||
|
}
|
||||||
|
if _, err := clientset.CoreV1().Namespaces().Get(ctx, binding.Namespace, metav1.GetOptions{}); !apierrors.IsNotFound(err) {
|
||||||
|
t.Fatalf("expected namespace deleted, got %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestTenantClientDeleteTenantRejectsProtectedNamespace(t *testing.T) {
|
||||||
|
ctx := context.Background()
|
||||||
|
client := NewTenantClientForClientset(fake.NewSimpleClientset(
|
||||||
|
&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: "default"}},
|
||||||
|
))
|
||||||
|
binding := entity.NewTenantBinding("default")
|
||||||
|
|
||||||
|
err := client.DeleteTenant(ctx, nil, binding)
|
||||||
|
if !errors.Is(err, entity.ErrProtectedNamespace) {
|
||||||
|
t.Fatalf("expected protected namespace error, got %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func TestTenantClientIssueKubeconfigCapsTokenTTL(t *testing.T) {
|
func TestTenantClientIssueKubeconfigCapsTokenTTL(t *testing.T) {
|
||||||
ctx := context.Background()
|
ctx := context.Background()
|
||||||
binding := tenantBinding()
|
binding := tenantBinding()
|
||||||
|
|||||||
@ -31,6 +31,28 @@ func (c *MockTenantClient) IssueKubeconfig(ctx context.Context, cluster *entity.
|
|||||||
}, nil
|
}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c *MockTenantClient) GetResourceQuotaUsage(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) (*repository.ResourceQuotaUsage, error) {
|
||||||
|
if err := binding.Validate(); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
return &repository.ResourceQuotaUsage{
|
||||||
|
Hard: resourceVectorFromList(binding.ResourceQuotaHard),
|
||||||
|
Used: repository.ResourceVector{},
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
|
|
||||||
func (c *MockTenantClient) SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
func (c *MockTenantClient) SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
return binding.Validate()
|
return binding.Validate()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c *MockTenantClient) DeleteTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
|
if err := binding.Validate(); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
switch binding.Namespace {
|
||||||
|
case "", "default", "kube-system", "kube-public", "kube-node-lease":
|
||||||
|
return entity.ErrProtectedNamespace
|
||||||
|
default:
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@ -72,6 +72,16 @@ func (r *WorkspaceRepositoryMock) Update(ctx context.Context, workspace *entity.
|
|||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (r *WorkspaceRepositoryMock) Delete(ctx context.Context, id string) error {
|
||||||
|
r.mu.Lock()
|
||||||
|
defer r.mu.Unlock()
|
||||||
|
if _, ok := r.workspaces[id]; !ok {
|
||||||
|
return entity.ErrWorkspaceNotFound
|
||||||
|
}
|
||||||
|
delete(r.workspaces, id)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
func (r *WorkspaceRepositoryMock) List(ctx context.Context) ([]*entity.Workspace, error) {
|
func (r *WorkspaceRepositoryMock) List(ctx context.Context) ([]*entity.Workspace, error) {
|
||||||
r.mu.RLock()
|
r.mu.RLock()
|
||||||
defer r.mu.RUnlock()
|
defer r.mu.RUnlock()
|
||||||
@ -118,6 +128,20 @@ func (r *WorkspaceClusterBindingRepositoryMock) Get(ctx context.Context, workspa
|
|||||||
return &copy, nil
|
return &copy, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (r *WorkspaceClusterBindingRepositoryMock) ListByWorkspace(ctx context.Context, workspaceID string) ([]*entity.WorkspaceClusterBinding, error) {
|
||||||
|
r.mu.RLock()
|
||||||
|
defer r.mu.RUnlock()
|
||||||
|
result := make([]*entity.WorkspaceClusterBinding, 0)
|
||||||
|
for _, binding := range r.bindings {
|
||||||
|
if binding.WorkspaceID != workspaceID {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
copy := *binding
|
||||||
|
result = append(result, &copy)
|
||||||
|
}
|
||||||
|
return result, nil
|
||||||
|
}
|
||||||
|
|
||||||
func (r *WorkspaceClusterBindingRepositoryMock) Delete(ctx context.Context, workspaceID, clusterID string) error {
|
func (r *WorkspaceClusterBindingRepositoryMock) Delete(ctx context.Context, workspaceID, clusterID string) error {
|
||||||
r.mu.Lock()
|
r.mu.Lock()
|
||||||
defer r.mu.Unlock()
|
defer r.mu.Unlock()
|
||||||
|
|||||||
@ -27,8 +27,9 @@ func (r *WorkspaceRepository) Create(ctx context.Context, workspace *entity.Work
|
|||||||
query := `
|
query := `
|
||||||
INSERT INTO workspaces (id, name, status, k8s_namespace, k8s_sa_name, default_cluster_id, quota_cpu, quota_memory, quota_gpu, quota_gpu_memory, created_by, created_at, updated_at)
|
INSERT INTO workspaces (id, name, status, k8s_namespace, k8s_sa_name, default_cluster_id, quota_cpu, quota_memory, quota_gpu, quota_gpu_memory, created_by, created_at, updated_at)
|
||||||
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13)
|
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13)
|
||||||
|
ON CONFLICT (name) DO NOTHING
|
||||||
`
|
`
|
||||||
_, err := r.db.conn.ExecContext(ctx, query,
|
result, err := r.db.conn.ExecContext(ctx, query,
|
||||||
workspace.ID,
|
workspace.ID,
|
||||||
workspace.Name,
|
workspace.Name,
|
||||||
workspace.Status,
|
workspace.Status,
|
||||||
@ -46,6 +47,13 @@ func (r *WorkspaceRepository) Create(ctx context.Context, workspace *entity.Work
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("failed to create workspace: %w", err)
|
return fmt.Errorf("failed to create workspace: %w", err)
|
||||||
}
|
}
|
||||||
|
rows, err := result.RowsAffected()
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to get affected rows: %w", err)
|
||||||
|
}
|
||||||
|
if rows == 0 {
|
||||||
|
return entity.ErrWorkspaceExists
|
||||||
|
}
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -132,6 +140,21 @@ func (r *WorkspaceRepository) Update(ctx context.Context, workspace *entity.Work
|
|||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (r *WorkspaceRepository) Delete(ctx context.Context, id string) error {
|
||||||
|
result, err := r.db.conn.ExecContext(ctx, `DELETE FROM workspaces WHERE id = $1`, id)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to delete workspace: %w", err)
|
||||||
|
}
|
||||||
|
rows, err := result.RowsAffected()
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("failed to get affected rows: %w", err)
|
||||||
|
}
|
||||||
|
if rows == 0 {
|
||||||
|
return entity.ErrWorkspaceNotFound
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
func (r *WorkspaceRepository) List(ctx context.Context) ([]*entity.Workspace, error) {
|
func (r *WorkspaceRepository) List(ctx context.Context) ([]*entity.Workspace, error) {
|
||||||
query := `
|
query := `
|
||||||
SELECT id, name, status, k8s_namespace, k8s_sa_name, default_cluster_id, quota_cpu, quota_memory, quota_gpu, quota_gpu_memory, created_by, created_at, updated_at
|
SELECT id, name, status, k8s_namespace, k8s_sa_name, default_cluster_id, quota_cpu, quota_memory, quota_gpu, quota_gpu_memory, created_by, created_at, updated_at
|
||||||
@ -256,6 +279,42 @@ func (r *WorkspaceClusterBindingRepository) Get(ctx context.Context, workspaceID
|
|||||||
return binding, nil
|
return binding, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (r *WorkspaceClusterBindingRepository) ListByWorkspace(ctx context.Context, workspaceID string) ([]*entity.WorkspaceClusterBinding, error) {
|
||||||
|
query := `
|
||||||
|
SELECT id, workspace_id, cluster_id, namespace, service_account, quota_cpu, quota_memory, quota_gpu, quota_gpu_memory, status, created_at, updated_at
|
||||||
|
FROM workspace_cluster_bindings
|
||||||
|
WHERE workspace_id = $1
|
||||||
|
ORDER BY created_at ASC
|
||||||
|
`
|
||||||
|
rows, err := r.db.conn.QueryContext(ctx, query, workspaceID)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to list workspace cluster bindings: %w", err)
|
||||||
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
bindings := make([]*entity.WorkspaceClusterBinding, 0)
|
||||||
|
for rows.Next() {
|
||||||
|
binding := &entity.WorkspaceClusterBinding{}
|
||||||
|
if err := rows.Scan(
|
||||||
|
&binding.ID,
|
||||||
|
&binding.WorkspaceID,
|
||||||
|
&binding.ClusterID,
|
||||||
|
&binding.Namespace,
|
||||||
|
&binding.ServiceAccount,
|
||||||
|
&binding.QuotaCPU,
|
||||||
|
&binding.QuotaMemory,
|
||||||
|
&binding.QuotaGPU,
|
||||||
|
&binding.QuotaGPUMem,
|
||||||
|
&binding.Status,
|
||||||
|
&binding.CreatedAt,
|
||||||
|
&binding.UpdatedAt,
|
||||||
|
); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to scan workspace cluster binding: %w", err)
|
||||||
|
}
|
||||||
|
bindings = append(bindings, binding)
|
||||||
|
}
|
||||||
|
return bindings, rows.Err()
|
||||||
|
}
|
||||||
|
|
||||||
func (r *WorkspaceClusterBindingRepository) Delete(ctx context.Context, workspaceID, clusterID string) error {
|
func (r *WorkspaceClusterBindingRepository) Delete(ctx context.Context, workspaceID, clusterID string) error {
|
||||||
_, err := r.db.conn.ExecContext(ctx, `DELETE FROM workspace_cluster_bindings WHERE workspace_id = $1 AND cluster_id = $2`, workspaceID, clusterID)
|
_, err := r.db.conn.ExecContext(ctx, `DELETE FROM workspace_cluster_bindings WHERE workspace_id = $1 AND cluster_id = $2`, workspaceID, clusterID)
|
||||||
return err
|
return err
|
||||||
|
|||||||
@ -43,6 +43,9 @@ var (
|
|||||||
ErrValuesSchemaNotFound = errors.New("values schema not found")
|
ErrValuesSchemaNotFound = errors.New("values schema not found")
|
||||||
|
|
||||||
// Workspace errors
|
// Workspace errors
|
||||||
ErrWorkspaceNotFound = errors.New("workspace not found")
|
ErrWorkspaceNotFound = errors.New("workspace not found")
|
||||||
ErrWorkspaceExists = errors.New("workspace already exists")
|
ErrWorkspaceExists = errors.New("workspace already exists")
|
||||||
|
ErrWorkspaceNamespaceConflict = errors.New("workspace namespace conflict")
|
||||||
|
ErrUserHasInstances = errors.New("user has active instances")
|
||||||
|
ErrProtectedNamespace = errors.New("protected namespace")
|
||||||
)
|
)
|
||||||
|
|||||||
@ -54,6 +54,7 @@ type Instance struct {
|
|||||||
CreatedAt time.Time
|
CreatedAt time.Time
|
||||||
UpdatedAt time.Time
|
UpdatedAt time.Time
|
||||||
Replicas int // Running K8s replicas (enriched, not persisted)
|
Replicas int // Running K8s replicas (enriched, not persisted)
|
||||||
|
OwnerUsername string
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewInstance creates a new instance
|
// NewInstance creates a new instance
|
||||||
|
|||||||
@ -25,6 +25,18 @@ type ClusterMetrics struct {
|
|||||||
MemoryUsage float64 `json:"memory_usage"` // percentage
|
MemoryUsage float64 `json:"memory_usage"` // percentage
|
||||||
GPUUsage float64 `json:"gpu_usage"` // percentage
|
GPUUsage float64 `json:"gpu_usage"` // percentage
|
||||||
|
|
||||||
|
CPURequests string `json:"cpu_requests,omitempty"`
|
||||||
|
CPULimits string `json:"cpu_limits,omitempty"`
|
||||||
|
MemoryRequests string `json:"memory_requests,omitempty"`
|
||||||
|
MemoryLimits string `json:"memory_limits,omitempty"`
|
||||||
|
GPURequests int64 `json:"gpu_requests,omitempty"`
|
||||||
|
GPULimits int64 `json:"gpu_limits,omitempty"`
|
||||||
|
GPUMemoryRequestsMB int64 `json:"gpu_memory_requests_mb,omitempty"`
|
||||||
|
GPUMemoryLimitsMB int64 `json:"gpu_memory_limits_mb,omitempty"`
|
||||||
|
AllocatedGPU int64 `json:"allocated_gpu,omitempty"`
|
||||||
|
AllocatedGPUMemoryMB int64 `json:"allocated_gpu_memory_mb,omitempty"`
|
||||||
|
ResourceUsageByUser []UserResourceUsage `json:"resource_usage_by_user,omitempty"`
|
||||||
|
|
||||||
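// Per-node resource maximums
|
// Per-node resource maximums
|
||||||
MaxNodeCPU string `json:"max_node_cpu"` // Maximum CPU capacity of a single node, e.g. "8 cores"
|
MaxNodeCPU string `json:"max_node_cpu"` // Maximum CPU capacity of a single node, e.g. "8 cores"
|
||||||
MaxNodeMemory string `json:"max_node_memory"` // Maximum memory capacity of a single node, e.g. "32 GB"
|
MaxNodeMemory string `json:"max_node_memory"` // Maximum memory capacity of a single node, e.g. "32 GB"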
|
||||||
@ -37,6 +49,42 @@ type ClusterMetrics struct {
|
|||||||
Nodes []NodeMetrics `json:"nodes,omitempty"`
|
Nodes []NodeMetrics `json:"nodes,omitempty"`
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ResourceAllocation is derived from Kubernetes Pod resources requests/limits.
|
||||||
|
type ResourceAllocation struct {
|
||||||
|
CPURequestsMilli int64
|
||||||
|
CPULimitsMilli int64
|
||||||
|
MemoryRequestsBytes int64
|
||||||
|
MemoryLimitsBytes int64
|
||||||
|
GPURequests int64
|
||||||
|
GPULimits int64
|
||||||
|
GPUMemoryRequestsMB int64
|
||||||
|
GPUMemoryLimitsMB int64
|
||||||
|
}
|
||||||
|
|
||||||
|
type PodResourceAllocation struct {
|
||||||
|
ClusterID string
|
||||||
|
Namespace string
|
||||||
|
PodName string
|
||||||
|
InstanceName string
|
||||||
|
Allocation ResourceAllocation
|
||||||
|
}
|
||||||
|
|
||||||
|
type UserResourceUsage struct {
|
||||||
|
UserID string `json:"user_id"`
|
||||||
|
Username string `json:"username"`
|
||||||
|
WorkspaceID string `json:"workspace_id"`
|
||||||
|
InstanceCount int `json:"instance_count"`
|
||||||
|
PodCount int `json:"pod_count"`
|
||||||
|
CPURequests string `json:"cpu_requests"`
|
||||||
|
CPULimits string `json:"cpu_limits"`
|
||||||
|
MemoryRequests string `json:"memory_requests"`
|
||||||
|
MemoryLimits string `json:"memory_limits"`
|
||||||
|
GPURequests int64 `json:"gpu_requests"`
|
||||||
|
GPULimits int64 `json:"gpu_limits"`
|
||||||
|
GPUMemoryRequestsMB int64 `json:"gpu_memory_requests_mb"`
|
||||||
|
GPUMemoryLimitsMB int64 `json:"gpu_memory_limits_mb"`
|
||||||
|
}
|
||||||
|
|
||||||
// NodeMetrics holds per-node monitoring metrics
|
// NodeMetrics holds per-node monitoring metrics
|
||||||
type NodeMetrics struct {
|
type NodeMetrics struct {
|
||||||
NodeName string `json:"node_name"`
|
NodeName string `json:"node_name"`
|
||||||
|
|||||||
@ -3,8 +3,21 @@ package repository
|
|||||||
import (
|
import (
|
||||||
"context"
|
"context"
|
||||||
"github.com/ocdp/cluster-service/internal/domain/entity"
|
"github.com/ocdp/cluster-service/internal/domain/entity"
|
||||||
|
"k8s.io/apimachinery/pkg/api/resource"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
type ResourceVector struct {
|
||||||
|
CPU resource.Quantity
|
||||||
|
Memory resource.Quantity
|
||||||
|
GPU int64
|
||||||
|
GPUMemoryMB int64
|
||||||
|
}
|
||||||
|
|
||||||
|
type ResourceEstimate struct {
|
||||||
|
Requests ResourceVector
|
||||||
|
Limits ResourceVector
|
||||||
|
}
|
||||||
|
|
||||||
// HelmClient is the Helm client interface (output port)
|
// HelmClient is the Helm client interface (output port)
|
||||||
type HelmClient interface {
|
type HelmClient interface {
|
||||||
// Install installs a Helm chart
|
// Install installs a Helm chart
|
||||||
@ -33,4 +46,7 @@ type HelmClient interface {
|
|||||||
|
|
||||||
// GetChartDefaultValues reads the default values from a chart package
|
// GetChartDefaultValues reads the default values from a chart package
|
||||||
GetChartDefaultValues(chartPath string) (map[string]interface{}, error)
|
GetChartDefaultValues(chartPath string) (map[string]interface{}, error)
|
||||||
|
|
||||||
|
// EstimateInstanceResources renders an instance chart with final values and sums Pod template resources.
|
||||||
|
EstimateInstanceResources(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) (*ResourceEstimate, error)
|
||||||
}
|
}
|
||||||
|
|||||||
@ -13,4 +13,7 @@ type MetricsClient interface {
|
|||||||
|
|
||||||
// GetNodeMetrics returns the node metrics for a cluster
|
// GetNodeMetrics returns the node metrics for a cluster
|
||||||
GetNodeMetrics(ctx context.Context, clusterID string) ([]*entity.NodeMetrics, error)
|
GetNodeMetrics(ctx context.Context, clusterID string) ([]*entity.NodeMetrics, error)
|
||||||
|
|
||||||
|
// GetPodResourceAllocations returns Pod requests/limits grouped by Pod.
|
||||||
|
GetPodResourceAllocations(ctx context.Context, clusterID string) ([]*entity.PodResourceAllocation, error)
|
||||||
}
|
}
|
||||||
|
|||||||
@ -11,5 +11,12 @@ import (
|
|||||||
type TenantKubeClient interface {
|
type TenantKubeClient interface {
|
||||||
EnsureTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error
|
EnsureTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error
|
||||||
IssueKubeconfig(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding, ttl time.Duration) (*entity.TenantKubeconfig, error)
|
IssueKubeconfig(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding, ttl time.Duration) (*entity.TenantKubeconfig, error)
|
||||||
|
GetResourceQuotaUsage(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) (*ResourceQuotaUsage, error)
|
||||||
SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error
|
SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error
|
||||||
|
DeleteTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error
|
||||||
|
}
|
||||||
|
|
||||||
|
type ResourceQuotaUsage struct {
|
||||||
|
Hard ResourceVector
|
||||||
|
Used ResourceVector
|
||||||
}
|
}
|
||||||
|
|||||||
@ -11,12 +11,14 @@ type WorkspaceRepository interface {
|
|||||||
GetByID(ctx context.Context, id string) (*entity.Workspace, error)
|
GetByID(ctx context.Context, id string) (*entity.Workspace, error)
|
||||||
GetByName(ctx context.Context, name string) (*entity.Workspace, error)
|
GetByName(ctx context.Context, name string) (*entity.Workspace, error)
|
||||||
Update(ctx context.Context, workspace *entity.Workspace) error
|
Update(ctx context.Context, workspace *entity.Workspace) error
|
||||||
|
Delete(ctx context.Context, id string) error
|
||||||
List(ctx context.Context) ([]*entity.Workspace, error)
|
List(ctx context.Context) ([]*entity.Workspace, error)
|
||||||
}
|
}
|
||||||
|
|
||||||
type WorkspaceClusterBindingRepository interface {
|
type WorkspaceClusterBindingRepository interface {
|
||||||
Upsert(ctx context.Context, binding *entity.WorkspaceClusterBinding) error
|
Upsert(ctx context.Context, binding *entity.WorkspaceClusterBinding) error
|
||||||
Get(ctx context.Context, workspaceID, clusterID string) (*entity.WorkspaceClusterBinding, error)
|
Get(ctx context.Context, workspaceID, clusterID string) (*entity.WorkspaceClusterBinding, error)
|
||||||
|
ListByWorkspace(ctx context.Context, workspaceID string) ([]*entity.WorkspaceClusterBinding, error)
|
||||||
Delete(ctx context.Context, workspaceID, clusterID string) error
|
Delete(ctx context.Context, workspaceID, clusterID string) error
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -2,6 +2,7 @@ package service
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"context"
|
"context"
|
||||||
|
"errors"
|
||||||
"strings"
|
"strings"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
@ -18,6 +19,10 @@ import (
|
|||||||
type AuthService struct {
|
type AuthService struct {
|
||||||
userRepo repository.UserRepository
|
userRepo repository.UserRepository
|
||||||
workspaceRepo repository.WorkspaceRepository
|
workspaceRepo repository.WorkspaceRepository
|
||||||
|
instanceRepo repository.InstanceRepository
|
||||||
|
clusterRepo repository.ClusterRepository
|
||||||
|
bindingRepo repository.WorkspaceClusterBindingRepository
|
||||||
|
tenantClient repository.TenantKubeClient
|
||||||
passwordHasher PasswordHasher
|
passwordHasher PasswordHasher
|
||||||
tokenGenerator TokenGenerator
|
tokenGenerator TokenGenerator
|
||||||
}
|
}
|
||||||
@ -53,6 +58,18 @@ func NewAuthService(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) SetUserLifecycleCleanup(
|
||||||
|
instanceRepo repository.InstanceRepository,
|
||||||
|
clusterRepo repository.ClusterRepository,
|
||||||
|
bindingRepo repository.WorkspaceClusterBindingRepository,
|
||||||
|
tenantClient repository.TenantKubeClient,
|
||||||
|
) {
|
||||||
|
s.instanceRepo = instanceRepo
|
||||||
|
s.clusterRepo = clusterRepo
|
||||||
|
s.bindingRepo = bindingRepo
|
||||||
|
s.tenantClient = tenantClient
|
||||||
|
}
|
||||||
|
|
||||||
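// Register registers a new user. Only admins may call this business entry point; the initial admin is created by the bootstrap seeder.
|
// Register registers a new user. Only admins may call this business entry point; the initial admin is created by the bootstrap seeder.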
|
||||||
type UserWorkspaceOptions struct {
|
type UserWorkspaceOptions struct {
|
||||||
Namespace string
|
Namespace string
|
||||||
@ -87,6 +104,9 @@ func (s *AuthService) Register(ctx context.Context, username, password, role, wo
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
|
if normalizeUserRole(role) == authz.RoleUser {
|
||||||
|
normalizedOpts = defaultUserQuotaOptions(normalizedOpts)
|
||||||
|
}
|
||||||
|
|
||||||
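// Generate a placeholder email by default to avoid database constraint failures
|
// Generate a placeholder email by default to avoid database constraint failures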
|
||||||
email := username + "@local.ocdp"
|
email := username + "@local.ocdp"
|
||||||
@ -96,7 +116,7 @@ func (s *AuthService) Register(ctx context.Context, username, password, role, wo
|
|||||||
user.ID = uuid.New().String()
|
user.ID = uuid.New().String()
|
||||||
user.Role = normalizeUserRole(role)
|
user.Role = normalizeUserRole(role)
|
||||||
user.WorkspaceID = workspaceID
|
user.WorkspaceID = workspaceID
|
||||||
if user.Role == authz.RoleUser && (user.WorkspaceID == "" || user.WorkspaceID == entity.DefaultWorkspaceID) {
|
if user.Role == authz.RoleUser {
|
||||||
workspace, err := s.createUserWorkspace(ctx, username, principal.UserID, normalizedOpts)
|
workspace, err := s.createUserWorkspace(ctx, username, principal.UserID, normalizedOpts)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
@ -131,10 +151,7 @@ func (s *AuthService) createUserWorkspace(ctx context.Context, username, created
|
|||||||
if s.workspaceRepo == nil {
|
if s.workspaceRepo == nil {
|
||||||
return nil, entity.ErrWorkspaceNotFound
|
return nil, entity.ErrWorkspaceNotFound
|
||||||
}
|
}
|
||||||
name := strings.TrimPrefix(entity.NamespaceForUser(username), "ocdp-u-")
|
name := userWorkspaceName(username)
|
||||||
workspace := entity.NewWorkspace(name, createdBy)
|
|
||||||
workspace.ID = uuid.New().String()
|
|
||||||
workspace.DefaultClusterID = strings.TrimSpace(opts.DefaultClusterID)
|
|
||||||
namespace := strings.TrimSpace(opts.Namespace)
|
namespace := strings.TrimSpace(opts.Namespace)
|
||||||
if namespace == "" {
|
if namespace == "" {
|
||||||
namespace = entity.NamespaceForUser(username)
|
namespace = entity.NamespaceForUser(username)
|
||||||
@ -143,6 +160,32 @@ func (s *AuthService) createUserWorkspace(ctx context.Context, username, created
|
|||||||
if len(validation.IsDNS1123Label(namespace)) > 0 {
|
if len(validation.IsDNS1123Label(namespace)) > 0 {
|
||||||
return nil, entity.ErrInvalidNamespace
|
return nil, entity.ErrInvalidNamespace
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
if existing, err := s.workspaceRepo.GetByName(ctx, name); err == nil && existing != nil {
|
||||||
|
if namespace != "" && existing.K8sNamespace != namespace {
|
||||||
|
if err := s.ensureNamespaceAvailable(ctx, namespace, existing.ID); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
applyWorkspaceOptions(existing, opts)
|
||||||
|
if namespace != "" {
|
||||||
|
existing.K8sNamespace = namespace
|
||||||
|
existing.K8sSAName = entity.ServiceAccountForNamespace(namespace)
|
||||||
|
}
|
||||||
|
if err := s.workspaceRepo.Update(ctx, existing); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
return existing, nil
|
||||||
|
} else if err != nil && !errors.Is(err, entity.ErrWorkspaceNotFound) {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
if err := s.ensureNamespaceAvailable(ctx, namespace, ""); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
workspace := entity.NewWorkspace(name, createdBy)
|
||||||
|
workspace.ID = uuid.New().String()
|
||||||
|
workspace.DefaultClusterID = strings.TrimSpace(opts.DefaultClusterID)
|
||||||
|
if namespace != "" {
|
||||||
workspace.K8sNamespace = namespace
|
workspace.K8sNamespace = namespace
|
||||||
workspace.K8sSAName = entity.ServiceAccountForNamespace(namespace)
|
workspace.K8sSAName = entity.ServiceAccountForNamespace(namespace)
|
||||||
}
|
}
|
||||||
@ -151,11 +194,45 @@ func (s *AuthService) createUserWorkspace(ctx context.Context, username, created
|
|||||||
workspace.QuotaGPU = strings.TrimSpace(opts.QuotaGPU)
|
workspace.QuotaGPU = strings.TrimSpace(opts.QuotaGPU)
|
||||||
workspace.QuotaGPUMem = strings.TrimSpace(opts.QuotaGPUMem)
|
workspace.QuotaGPUMem = strings.TrimSpace(opts.QuotaGPUMem)
|
||||||
if err := s.workspaceRepo.Create(ctx, workspace); err != nil {
|
if err := s.workspaceRepo.Create(ctx, workspace); err != nil {
|
||||||
|
if errors.Is(err, entity.ErrWorkspaceExists) {
|
||||||
|
existing, getErr := s.workspaceRepo.GetByName(ctx, name)
|
||||||
|
if getErr != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
if existing.K8sNamespace != namespace {
|
||||||
|
return nil, entity.ErrWorkspaceNamespaceConflict
|
||||||
|
}
|
||||||
|
return existing, nil
|
||||||
|
}
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
return workspace, nil
|
return workspace, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func userWorkspaceName(username string) string {
|
||||||
|
return strings.TrimPrefix(entity.NamespaceForUser(username), "ocdp-u-")
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) ensureNamespaceAvailable(ctx context.Context, namespace, allowedWorkspaceID string) error {
|
||||||
|
if s.workspaceRepo == nil || strings.TrimSpace(namespace) == "" {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
workspaces, err := s.workspaceRepo.List(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
for _, workspace := range workspaces {
|
||||||
|
if workspace == nil || workspace.K8sNamespace != namespace {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if allowedWorkspaceID != "" && workspace.ID == allowedWorkspaceID {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
return entity.ErrWorkspaceNamespaceConflict
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
func normalizeQuotaOptions(opts UserWorkspaceOptions) (UserWorkspaceOptions, error) {
|
func normalizeQuotaOptions(opts UserWorkspaceOptions) (UserWorkspaceOptions, error) {
|
||||||
opts.Namespace = strings.TrimSpace(opts.Namespace)
|
opts.Namespace = strings.TrimSpace(opts.Namespace)
|
||||||
opts.DefaultClusterID = strings.TrimSpace(opts.DefaultClusterID)
|
opts.DefaultClusterID = strings.TrimSpace(opts.DefaultClusterID)
|
||||||
@ -181,6 +258,16 @@ func normalizeQuotaOptions(opts UserWorkspaceOptions) (UserWorkspaceOptions, err
|
|||||||
return opts, nil
|
return opts, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func defaultUserQuotaOptions(opts UserWorkspaceOptions) UserWorkspaceOptions {
|
||||||
|
if strings.TrimSpace(opts.QuotaGPU) == "" {
|
||||||
|
opts.QuotaGPU = "0"
|
||||||
|
}
|
||||||
|
if strings.TrimSpace(opts.QuotaGPUMem) == "" {
|
||||||
|
opts.QuotaGPUMem = "0"
|
||||||
|
}
|
||||||
|
return opts
|
||||||
|
}
|
||||||
|
|
||||||
func (s *AuthService) ListUsers(ctx context.Context) ([]*entity.User, error) {
|
func (s *AuthService) ListUsers(ctx context.Context) ([]*entity.User, error) {
|
||||||
principal, err := authz.RequirePrincipal(ctx)
|
principal, err := authz.RequirePrincipal(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@ -204,25 +291,35 @@ func (s *AuthService) UpdateUser(ctx context.Context, userID, role, workspaceID
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, entity.ErrUserNotFound
|
return nil, entity.ErrUserNotFound
|
||||||
}
|
}
|
||||||
|
previousRole := user.Role
|
||||||
if role != "" {
|
if role != "" {
|
||||||
user.Role = normalizeUserRole(role)
|
user.Role = normalizeUserRole(role)
|
||||||
}
|
}
|
||||||
if workspaceID != "" {
|
if workspaceID != "" && user.Role != authz.RoleUser {
|
||||||
user.WorkspaceID = workspaceID
|
user.WorkspaceID = workspaceID
|
||||||
}
|
}
|
||||||
|
workspaceHandled := false
|
||||||
if user.Role == authz.RoleAdmin {
|
if user.Role == authz.RoleAdmin {
|
||||||
user.WorkspaceID = entity.DefaultWorkspaceID
|
user.WorkspaceID = entity.DefaultWorkspaceID
|
||||||
}
|
}
|
||||||
if user.Role == authz.RoleUser && (user.WorkspaceID == "" || user.WorkspaceID == entity.DefaultWorkspaceID) {
|
if user.Role == authz.RoleUser && (role != "" || workspaceID != "" || hasWorkspaceUpdates(opts)) {
|
||||||
normalizedOpts, err := normalizeQuotaOptions(opts)
|
normalizedOpts, err := normalizeQuotaOptions(opts)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
workspace, err := s.createUserWorkspace(ctx, user.Username, principal.UserID, normalizedOpts)
|
normalizedOpts = defaultUserQuotaOptions(normalizedOpts)
|
||||||
|
currentWorkspace, _ := s.currentUserWorkspace(ctx, user)
|
||||||
|
if currentWorkspace != nil && shouldCreatePrivateWorkspace(user, previousRole, currentWorkspace) {
|
||||||
|
if normalizedOpts.Namespace == "" || normalizedOpts.Namespace == currentWorkspace.K8sNamespace {
|
||||||
|
normalizedOpts.Namespace = ""
|
||||||
|
}
|
||||||
|
}
|
||||||
|
workspace, err := s.ensureUserWorkspaceForUpdate(ctx, user, previousRole, currentWorkspace, opts, normalizedOpts, principal.UserID)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
user.WorkspaceID = workspace.ID
|
user.WorkspaceID = workspace.ID
|
||||||
|
workspaceHandled = true
|
||||||
}
|
}
|
||||||
if isActive != nil {
|
if isActive != nil {
|
||||||
if user.ID == principal.UserID && !*isActive {
|
if user.ID == principal.UserID && !*isActive {
|
||||||
@ -233,7 +330,7 @@ func (s *AuthService) UpdateUser(ctx context.Context, userID, role, workspaceID
|
|||||||
if mustChangePassword != nil {
|
if mustChangePassword != nil {
|
||||||
user.MustChangePassword = *mustChangePassword
|
user.MustChangePassword = *mustChangePassword
|
||||||
}
|
}
|
||||||
if user.Role != authz.RoleAdmin && hasWorkspaceUpdates(opts) {
|
if user.Role != authz.RoleAdmin && !workspaceHandled && hasWorkspaceUpdates(opts) {
|
||||||
normalizedOpts, err := normalizeQuotaOptions(opts)
|
normalizedOpts, err := normalizeQuotaOptions(opts)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
@ -242,10 +339,13 @@ func (s *AuthService) UpdateUser(ctx context.Context, userID, role, workspaceID
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
applyWorkspaceOptions(workspace, normalizedOpts)
|
applyWorkspaceOptionsForUpdate(workspace, opts, normalizedOpts)
|
||||||
if err := s.workspaceRepo.Update(ctx, workspace); err != nil {
|
if err := s.workspaceRepo.Update(ctx, workspace); err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
|
if err := s.syncWorkspaceBindings(ctx, workspace); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
}
|
}
|
||||||
user.RevokedAfter = time.Now()
|
user.RevokedAfter = time.Now()
|
||||||
user.UpdatedAt = time.Now()
|
user.UpdatedAt = time.Now()
|
||||||
@ -289,6 +389,115 @@ func applyWorkspaceOptions(workspace *entity.Workspace, opts UserWorkspaceOption
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) currentUserWorkspace(ctx context.Context, user *entity.User) (*entity.Workspace, error) {
|
||||||
|
if s.workspaceRepo == nil || user == nil || user.WorkspaceID == "" {
|
||||||
|
return nil, entity.ErrWorkspaceNotFound
|
||||||
|
}
|
||||||
|
return s.workspaceRepo.GetByID(ctx, user.WorkspaceID)
|
||||||
|
}
|
||||||
|
|
||||||
|
func shouldCreatePrivateWorkspace(user *entity.User, previousRole string, current *entity.Workspace) bool {
|
||||||
|
if user == nil {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
if previousRole == authz.RoleAdmin || user.WorkspaceID == "" || user.WorkspaceID == entity.DefaultWorkspaceID {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
if current == nil {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
return current.Name != userWorkspaceName(user.Username)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) ensureUserWorkspaceForUpdate(ctx context.Context, user *entity.User, previousRole string, current *entity.Workspace, rawOpts, normalizedOpts UserWorkspaceOptions, createdBy string) (*entity.Workspace, error) {
|
||||||
|
if s.workspaceRepo == nil {
|
||||||
|
return nil, entity.ErrWorkspaceNotFound
|
||||||
|
}
|
||||||
|
if shouldCreatePrivateWorkspace(user, previousRole, current) {
|
||||||
|
return s.createUserWorkspace(ctx, user.Username, createdBy, normalizedOpts)
|
||||||
|
}
|
||||||
|
if rawNamespace := strings.TrimSpace(rawOpts.Namespace); rawNamespace != "" && rawNamespace != current.K8sNamespace {
|
||||||
|
if err := s.ensureNamespaceAvailable(ctx, rawNamespace, current.ID); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
applyWorkspaceOptionsForUpdate(current, rawOpts, normalizedOpts)
|
||||||
|
if err := s.workspaceRepo.Update(ctx, current); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
if err := s.syncWorkspaceBindings(ctx, current); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
return current, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func applyWorkspaceOptionsForUpdate(workspace *entity.Workspace, rawOpts, normalizedOpts UserWorkspaceOptions) {
|
||||||
|
if namespace := strings.TrimSpace(rawOpts.Namespace); namespace != "" {
|
||||||
|
workspace.K8sNamespace = namespace
|
||||||
|
workspace.K8sSAName = entity.ServiceAccountForNamespace(namespace)
|
||||||
|
}
|
||||||
|
if strings.TrimSpace(rawOpts.DefaultClusterID) != "" {
|
||||||
|
workspace.DefaultClusterID = normalizedOpts.DefaultClusterID
|
||||||
|
}
|
||||||
|
if strings.TrimSpace(rawOpts.QuotaCPU) != "" {
|
||||||
|
workspace.QuotaCPU = normalizedOpts.QuotaCPU
|
||||||
|
}
|
||||||
|
if strings.TrimSpace(rawOpts.QuotaMemory) != "" {
|
||||||
|
workspace.QuotaMemory = normalizedOpts.QuotaMemory
|
||||||
|
}
|
||||||
|
if strings.TrimSpace(rawOpts.QuotaGPU) != "" {
|
||||||
|
workspace.QuotaGPU = normalizedOpts.QuotaGPU
|
||||||
|
}
|
||||||
|
if strings.TrimSpace(rawOpts.QuotaGPUMem) != "" {
|
||||||
|
workspace.QuotaGPUMem = normalizedOpts.QuotaGPUMem
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) syncWorkspaceBindings(ctx context.Context, workspace *entity.Workspace) error {
|
||||||
|
if workspace == nil || s.bindingRepo == nil {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
bindings, err := s.bindingRepo.ListByWorkspace(ctx, workspace.ID)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
for _, binding := range bindings {
|
||||||
|
if binding == nil {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
binding.QuotaCPU = strings.TrimSpace(workspace.QuotaCPU)
|
||||||
|
binding.QuotaMemory = strings.TrimSpace(workspace.QuotaMemory)
|
||||||
|
binding.QuotaGPU = strings.TrimSpace(workspace.QuotaGPU)
|
||||||
|
if binding.QuotaGPU == "" {
|
||||||
|
binding.QuotaGPU = "0"
|
||||||
|
}
|
||||||
|
binding.QuotaGPUMem = strings.TrimSpace(workspace.QuotaGPUMem)
|
||||||
|
if binding.QuotaGPUMem == "" {
|
||||||
|
binding.QuotaGPUMem = "0"
|
||||||
|
}
|
||||||
|
binding.UpdatedAt = time.Now()
|
||||||
|
if s.tenantClient != nil && s.clusterRepo != nil {
|
||||||
|
cluster, err := s.clusterRepo.GetByID(ctx, binding.ClusterID)
|
||||||
|
if err != nil {
|
||||||
|
if errors.Is(err, entity.ErrClusterNotFound) {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
tenantBinding := entity.NewTenantBinding(binding.Namespace)
|
||||||
|
tenantBinding.ServiceAccountName = binding.ServiceAccount
|
||||||
|
tenantBinding.ResourceQuotaHard = bindingQuotaHard(binding)
|
||||||
|
if err := s.tenantClient.EnsureTenant(ctx, cluster, tenantBinding); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if err := s.bindingRepo.Upsert(ctx, binding); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
func (s *AuthService) DeleteUser(ctx context.Context, userID string) error {
|
func (s *AuthService) DeleteUser(ctx context.Context, userID string) error {
|
||||||
principal, err := authz.RequirePrincipal(ctx)
|
principal, err := authz.RequirePrincipal(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@ -300,9 +509,117 @@ func (s *AuthService) DeleteUser(ctx context.Context, userID string) error {
|
|||||||
if userID == principal.UserID {
|
if userID == principal.UserID {
|
||||||
return entity.ErrForbidden
|
return entity.ErrForbidden
|
||||||
}
|
}
|
||||||
|
user, err := s.userRepo.GetByID(ctx, userID)
|
||||||
|
if err != nil {
|
||||||
|
return entity.ErrUserNotFound
|
||||||
|
}
|
||||||
|
if err := s.ensureUserHasNoInstances(ctx, user); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if s.isExclusiveUserWorkspace(ctx, user) {
|
||||||
|
if err := s.cleanupUserWorkspace(ctx, user.WorkspaceID); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
return s.userRepo.Delete(ctx, userID)
|
return s.userRepo.Delete(ctx, userID)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) ensureUserHasNoInstances(ctx context.Context, user *entity.User) error {
|
||||||
|
if s.instanceRepo == nil || user == nil {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
instances, err := s.instanceRepo.List(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
for _, instance := range instances {
|
||||||
|
if instance == nil {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if instance.OwnerID == user.ID {
|
||||||
|
return entity.ErrUserHasInstances
|
||||||
|
}
|
||||||
|
if user.WorkspaceID != "" && user.WorkspaceID != entity.DefaultWorkspaceID && instance.WorkspaceID == user.WorkspaceID {
|
||||||
|
return entity.ErrUserHasInstances
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) isExclusiveUserWorkspace(ctx context.Context, user *entity.User) bool {
|
||||||
|
if user == nil || user.Role == authz.RoleAdmin || user.WorkspaceID == "" || user.WorkspaceID == entity.DefaultWorkspaceID {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
users, err := s.userRepo.List(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
for _, other := range users {
|
||||||
|
if other == nil || other.ID == user.ID {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if other.WorkspaceID == user.WorkspaceID {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *AuthService) cleanupUserWorkspace(ctx context.Context, workspaceID string) error {
|
||||||
|
if s.workspaceRepo == nil || s.bindingRepo == nil {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
workspace, err := s.workspaceRepo.GetByID(ctx, workspaceID)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if isProtectedWorkspaceNamespace(workspace.K8sNamespace) {
|
||||||
|
return entity.ErrProtectedNamespace
|
||||||
|
}
|
||||||
|
bindings, err := s.bindingRepo.ListByWorkspace(ctx, workspace.ID)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
for _, binding := range bindings {
|
||||||
|
if binding == nil {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if isProtectedWorkspaceNamespace(binding.Namespace) {
|
||||||
|
return entity.ErrProtectedNamespace
|
||||||
|
}
|
||||||
|
if s.tenantClient != nil && s.clusterRepo != nil {
|
||||||
|
cluster, err := s.clusterRepo.GetByID(ctx, binding.ClusterID)
|
||||||
|
if err != nil && !errors.Is(err, entity.ErrClusterNotFound) {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if err == nil {
|
||||||
|
tenantBinding := entity.NewTenantBinding(binding.Namespace)
|
||||||
|
tenantBinding.ServiceAccountName = binding.ServiceAccount
|
||||||
|
tenantBinding.ResourceQuotaHard = resourceQuotaHard(workspace)
|
||||||
|
if err := s.tenantClient.DeleteTenant(ctx, cluster, tenantBinding); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if err := s.bindingRepo.Delete(ctx, binding.WorkspaceID, binding.ClusterID); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if err := s.workspaceRepo.Delete(ctx, workspace.ID); err != nil && !errors.Is(err, entity.ErrWorkspaceNotFound) {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func isProtectedWorkspaceNamespace(namespace string) bool {
|
||||||
|
switch strings.TrimSpace(namespace) {
|
||||||
|
case "", "default", "kube-system", "kube-public", "kube-node-lease":
|
||||||
|
return true
|
||||||
|
default:
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func normalizeUserRole(role string) string {
|
func normalizeUserRole(role string) string {
|
||||||
if role == authz.RoleAdmin {
|
if role == authz.RoleAdmin {
|
||||||
return authz.RoleAdmin
|
return authz.RoleAdmin
|
||||||
|
|||||||
322
backend/internal/domain/service/auth_service_test.go
Normal file
@ -0,0 +1,322 @@
|
|||||||
|
package service
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"errors"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"github.com/google/uuid"
|
||||||
|
"github.com/ocdp/cluster-service/internal/adapter/output/persistence/mock"
|
||||||
|
"github.com/ocdp/cluster-service/internal/domain/entity"
|
||||||
|
"github.com/ocdp/cluster-service/internal/domain/repository"
|
||||||
|
"github.com/ocdp/cluster-service/internal/pkg/authz"
|
||||||
|
jwtpkg "github.com/ocdp/cluster-service/internal/pkg/jwt"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestAuthServiceUpdateUserDowngradeReusesUsernameWorkspace(t *testing.T) {
|
||||||
|
ctx := adminContext()
|
||||||
|
userRepo := mock.NewUserRepositoryMock()
|
||||||
|
workspaceRepo := mock.NewWorkspaceRepositoryMock()
|
||||||
|
svc := NewAuthService(userRepo, workspaceRepo, testPasswordHasher{}, testTokenGenerator{})
|
||||||
|
|
||||||
|
target := testUser("user-1", "alice", authz.RoleAdmin, entity.DefaultWorkspaceID)
|
||||||
|
if err := userRepo.Create(ctx, target); err != nil {
|
||||||
|
t.Fatalf("seed user: %v", err)
|
||||||
|
}
|
||||||
|
workspace := entity.NewWorkspace(userWorkspaceName("alice"), "admin")
|
||||||
|
workspace.ID = "workspace-alice"
|
||||||
|
workspace.K8sNamespace = entity.NamespaceForUser("alice")
|
||||||
|
workspace.K8sSAName = entity.ServiceAccountForNamespace(workspace.K8sNamespace)
|
||||||
|
if err := workspaceRepo.Create(ctx, workspace); err != nil {
|
||||||
|
t.Fatalf("seed workspace: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
updated, err := svc.UpdateUser(ctx, target.ID, authz.RoleUser, "", UserWorkspaceOptions{DefaultClusterID: "cluster-1"}, nil, nil)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpdateUser returned error: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if updated.Role != authz.RoleUser {
|
||||||
|
t.Fatalf("expected user role, got %q", updated.Role)
|
||||||
|
}
|
||||||
|
if updated.WorkspaceID != workspace.ID {
|
||||||
|
t.Fatalf("expected reused workspace %q, got %q", workspace.ID, updated.WorkspaceID)
|
||||||
|
}
|
||||||
|
reused, err := workspaceRepo.GetByID(ctx, workspace.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("get reused workspace: %v", err)
|
||||||
|
}
|
||||||
|
if reused.DefaultClusterID != "cluster-1" {
|
||||||
|
t.Fatalf("expected updated default cluster, got %q", reused.DefaultClusterID)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestAuthServiceRegisterUserAlwaysCreatesPrivateWorkspaceWithZeroDefaultQuotas(t *testing.T) {
|
||||||
|
ctx := adminContext()
|
||||||
|
userRepo := mock.NewUserRepositoryMock()
|
||||||
|
workspaceRepo := mock.NewWorkspaceRepositoryMock()
|
||||||
|
svc := NewAuthService(userRepo, workspaceRepo, testPasswordHasher{}, testTokenGenerator{})
|
||||||
|
|
||||||
|
user, err := svc.Register(ctx, "alice", "password", authz.RoleUser, "shared-workspace", UserWorkspaceOptions{}, nil, nil)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Register returned error: %v", err)
|
||||||
|
}
|
||||||
|
if user.WorkspaceID == "shared-workspace" || user.WorkspaceID == entity.DefaultWorkspaceID {
|
||||||
|
t.Fatalf("expected private user workspace, got %q", user.WorkspaceID)
|
||||||
|
}
|
||||||
|
workspace, err := workspaceRepo.GetByID(ctx, user.WorkspaceID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("get user workspace: %v", err)
|
||||||
|
}
|
||||||
|
if workspace.K8sNamespace != entity.NamespaceForUser("alice") {
|
||||||
|
t.Fatalf("expected user namespace %q, got %q", entity.NamespaceForUser("alice"), workspace.K8sNamespace)
|
||||||
|
}
|
||||||
|
if workspace.QuotaCPU != "" || workspace.QuotaMemory != "" || workspace.QuotaGPU != "0" || workspace.QuotaGPUMem != "0" {
|
||||||
|
t.Fatalf("expected omitted CPU/memory to stay unlimited and GPU/gpumem to default zero, got cpu=%q memory=%q gpu=%q gpumem=%q", workspace.QuotaCPU, workspace.QuotaMemory, workspace.QuotaGPU, workspace.QuotaGPUMem)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestAuthServiceUpdateUserDowngradeRejectsNamespaceConflict(t *testing.T) {
|
||||||
|
ctx := adminContext()
|
||||||
|
userRepo := mock.NewUserRepositoryMock()
|
||||||
|
workspaceRepo := mock.NewWorkspaceRepositoryMock()
|
||||||
|
svc := NewAuthService(userRepo, workspaceRepo, testPasswordHasher{}, testTokenGenerator{})
|
||||||
|
|
||||||
|
target := testUser("user-1", "alice", authz.RoleAdmin, entity.DefaultWorkspaceID)
|
||||||
|
if err := userRepo.Create(ctx, target); err != nil {
|
||||||
|
t.Fatalf("seed user: %v", err)
|
||||||
|
}
|
||||||
|
conflicting := entity.NewWorkspace("someone-else", "admin")
|
||||||
|
conflicting.ID = "workspace-other"
|
||||||
|
conflicting.K8sNamespace = entity.NamespaceForUser("alice")
|
||||||
|
conflicting.K8sSAName = entity.ServiceAccountForNamespace(conflicting.K8sNamespace)
|
||||||
|
if err := workspaceRepo.Create(ctx, conflicting); err != nil {
|
||||||
|
t.Fatalf("seed conflicting workspace: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
_, err := svc.UpdateUser(ctx, target.ID, authz.RoleUser, "", UserWorkspaceOptions{}, nil, nil)
|
||||||
|
if !errors.Is(err, entity.ErrWorkspaceNamespaceConflict) {
|
||||||
|
t.Fatalf("expected namespace conflict, got %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestAuthServiceDeleteUserRejectsUserWithInstances(t *testing.T) {
|
||||||
|
ctx := adminContext()
|
||||||
|
userRepo := mock.NewUserRepositoryMock()
|
||||||
|
workspaceRepo := mock.NewWorkspaceRepositoryMock()
|
||||||
|
instanceRepo := mock.NewInstanceRepositoryMock()
|
||||||
|
svc := NewAuthService(userRepo, workspaceRepo, testPasswordHasher{}, testTokenGenerator{})
|
||||||
|
svc.SetUserLifecycleCleanup(instanceRepo, nil, nil, nil)
|
||||||
|
|
||||||
|
user := testUser("user-1", "alice", authz.RoleUser, "workspace-alice")
|
||||||
|
if err := userRepo.Create(ctx, user); err != nil {
|
||||||
|
t.Fatalf("seed user: %v", err)
|
||||||
|
}
|
||||||
|
instance := entity.NewInstance("cluster-1", "app", "ocdp-u-alice", "registry-1", "repo", "chart", "1.0.0")
|
||||||
|
instance.ID = "instance-1"
|
||||||
|
instance.OwnerID = user.ID
|
||||||
|
instance.WorkspaceID = user.WorkspaceID
|
||||||
|
if err := instanceRepo.Create(ctx, instance); err != nil {
|
||||||
|
t.Fatalf("seed instance: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
err := svc.DeleteUser(ctx, user.ID)
|
||||||
|
if !errors.Is(err, entity.ErrUserHasInstances) {
|
||||||
|
t.Fatalf("expected user instance conflict, got %v", err)
|
||||||
|
}
|
||||||
|
if _, err := userRepo.GetByID(ctx, user.ID); err != nil {
|
||||||
|
t.Fatalf("user should not be deleted: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestAuthServiceDeleteUserRejectsWorkspaceInstanceEvenWithDifferentOwner(t *testing.T) {
|
||||||
|
ctx := adminContext()
|
||||||
|
userRepo := mock.NewUserRepositoryMock()
|
||||||
|
workspaceRepo := mock.NewWorkspaceRepositoryMock()
|
||||||
|
instanceRepo := mock.NewInstanceRepositoryMock()
|
||||||
|
svc := NewAuthService(userRepo, workspaceRepo, testPasswordHasher{}, testTokenGenerator{})
|
||||||
|
svc.SetUserLifecycleCleanup(instanceRepo, nil, nil, nil)
|
||||||
|
|
||||||
|
user := testUser("user-1", "alice", authz.RoleUser, "workspace-alice")
|
||||||
|
if err := userRepo.Create(ctx, user); err != nil {
|
||||||
|
t.Fatalf("seed user: %v", err)
|
||||||
|
}
|
||||||
|
instance := entity.NewInstance("cluster-1", "shared-workspace-app", "ocdp-u-alice", "registry-1", "repo", "chart", "1.0.0")
|
||||||
|
instance.ID = "instance-1"
|
||||||
|
instance.OwnerID = "other-user"
|
||||||
|
instance.WorkspaceID = user.WorkspaceID
|
||||||
|
if err := instanceRepo.Create(ctx, instance); err != nil {
|
||||||
|
t.Fatalf("seed workspace instance: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
err := svc.DeleteUser(ctx, user.ID)
|
||||||
|
if !errors.Is(err, entity.ErrUserHasInstances) {
|
||||||
|
t.Fatalf("expected workspace instance conflict, got %v", err)
|
||||||
|
}
|
||||||
|
if _, err := userRepo.GetByID(ctx, user.ID); err != nil {
|
||||||
|
t.Fatalf("user should not be deleted: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestAuthServiceDeleteUserCleansExclusiveWorkspaceBindings(t *testing.T) {
|
||||||
|
ctx := adminContext()
|
||||||
|
userRepo := mock.NewUserRepositoryMock()
|
||||||
|
workspaceRepo := mock.NewWorkspaceRepositoryMock()
|
||||||
|
instanceRepo := mock.NewInstanceRepositoryMock()
|
||||||
|
bindingRepo := mock.NewWorkspaceClusterBindingRepositoryMock()
|
||||||
|
clusterRepo := &testClusterRepo{clusters: map[string]*entity.Cluster{
|
||||||
|
"cluster-1": {ID: "cluster-1", Name: "cluster-1", Host: "https://cluster.invalid", Token: "token"},
|
||||||
|
}}
|
||||||
|
tenantClient := &recordingTenantClient{}
|
||||||
|
svc := NewAuthService(userRepo, workspaceRepo, testPasswordHasher{}, testTokenGenerator{})
|
||||||
|
svc.SetUserLifecycleCleanup(instanceRepo, clusterRepo, bindingRepo, tenantClient)
|
||||||
|
|
||||||
|
workspace := entity.NewWorkspace(userWorkspaceName("alice"), "admin")
|
||||||
|
workspace.ID = "workspace-alice"
|
||||||
|
workspace.K8sNamespace = entity.NamespaceForUser("alice")
|
||||||
|
workspace.K8sSAName = entity.ServiceAccountForNamespace(workspace.K8sNamespace)
|
||||||
|
if err := workspaceRepo.Create(ctx, workspace); err != nil {
|
||||||
|
t.Fatalf("seed workspace: %v", err)
|
||||||
|
}
|
||||||
|
user := testUser("user-1", "alice", authz.RoleUser, workspace.ID)
|
||||||
|
if err := userRepo.Create(ctx, user); err != nil {
|
||||||
|
t.Fatalf("seed user: %v", err)
|
||||||
|
}
|
||||||
|
if err := bindingRepo.Upsert(ctx, &entity.WorkspaceClusterBinding{
|
||||||
|
ID: "binding-1",
|
||||||
|
WorkspaceID: workspace.ID,
|
||||||
|
ClusterID: "cluster-1",
|
||||||
|
Namespace: workspace.K8sNamespace,
|
||||||
|
ServiceAccount: workspace.K8sSAName,
|
||||||
|
Status: "active",
|
||||||
|
}); err != nil {
|
||||||
|
t.Fatalf("seed binding: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := svc.DeleteUser(ctx, user.ID); err != nil {
|
||||||
|
t.Fatalf("DeleteUser returned error: %v", err)
|
||||||
|
}
|
||||||
|
if _, err := userRepo.GetByID(ctx, user.ID); !errors.Is(err, entity.ErrUserNotFound) {
|
||||||
|
t.Fatalf("expected user deleted, got %v", err)
|
||||||
|
}
|
||||||
|
if bindings, err := bindingRepo.ListByWorkspace(ctx, workspace.ID); err != nil || len(bindings) != 0 {
|
||||||
|
t.Fatalf("expected bindings cleaned, got len=%d err=%v", len(bindings), err)
|
||||||
|
}
|
||||||
|
if len(tenantClient.deleted) != 1 || tenantClient.deleted[0] != workspace.K8sNamespace {
|
||||||
|
t.Fatalf("expected tenant namespace cleanup, got %#v", tenantClient.deleted)
|
||||||
|
}
|
||||||
|
if _, err := workspaceRepo.GetByID(ctx, workspace.ID); !errors.Is(err, entity.ErrWorkspaceNotFound) {
|
||||||
|
t.Fatalf("expected exclusive workspace deleted, got %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func adminContext() context.Context {
|
||||||
|
return authz.WithPrincipal(context.Background(), &authz.Principal{
|
||||||
|
UserID: "admin-1",
|
||||||
|
Username: "admin",
|
||||||
|
Role: authz.RoleAdmin,
|
||||||
|
WorkspaceID: entity.DefaultWorkspaceID,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
func testUser(id, username, role, workspaceID string) *entity.User {
|
||||||
|
user := entity.NewUser(username, "hash", username+"@local.ocdp")
|
||||||
|
user.ID = id
|
||||||
|
user.Role = role
|
||||||
|
user.WorkspaceID = workspaceID
|
||||||
|
return user
|
||||||
|
}
|
||||||
|
|
||||||
|
type testPasswordHasher struct{}
|
||||||
|
|
||||||
|
func (testPasswordHasher) Hash(password string) (string, error) { return "hash:" + password, nil }
|
||||||
|
func (testPasswordHasher) Verify(password, hash string) error { return nil }
|
||||||
|
|
||||||
|
type testTokenGenerator struct{}
|
||||||
|
|
||||||
|
func (testTokenGenerator) Generate(userID, username, role, workspaceID string) (string, string, error) {
|
||||||
|
return "access", "refresh", nil
|
||||||
|
}
|
||||||
|
func (testTokenGenerator) Verify(token string) (string, string, error) { return "", "", nil }
|
||||||
|
func (testTokenGenerator) VerifyWithIssuedAt(token string) (string, string, int64, error) {
|
||||||
|
return "", "", 0, nil
|
||||||
|
}
|
||||||
|
func (testTokenGenerator) VerifyAccess(token string) (*jwtpkg.Claims, error) { return nil, nil }
|
||||||
|
func (testTokenGenerator) VerifyRefresh(token string) (*jwtpkg.Claims, error) { return nil, nil }
|
||||||
|
func (testTokenGenerator) Refresh(refreshToken string) (string, error) { return "access", nil }
|
||||||
|
|
||||||
|
type testClusterRepo struct {
|
||||||
|
clusters map[string]*entity.Cluster
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *testClusterRepo) Create(ctx context.Context, cluster *entity.Cluster) error {
|
||||||
|
if cluster.ID == "" {
|
||||||
|
cluster.ID = uuid.New().String()
|
||||||
|
}
|
||||||
|
copy := *cluster
|
||||||
|
r.clusters[cluster.ID] = &copy
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
func (r *testClusterRepo) GetByID(ctx context.Context, id string) (*entity.Cluster, error) {
|
||||||
|
cluster, ok := r.clusters[id]
|
||||||
|
if !ok {
|
||||||
|
return nil, entity.ErrClusterNotFound
|
||||||
|
}
|
||||||
|
copy := *cluster
|
||||||
|
return &copy, nil
|
||||||
|
}
|
||||||
|
func (r *testClusterRepo) GetByName(ctx context.Context, name string) (*entity.Cluster, error) {
|
||||||
|
for _, cluster := range r.clusters {
|
||||||
|
if cluster.Name == name {
|
||||||
|
copy := *cluster
|
||||||
|
return &copy, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil, entity.ErrClusterNotFound
|
||||||
|
}
|
||||||
|
func (r *testClusterRepo) Update(ctx context.Context, cluster *entity.Cluster) error {
|
||||||
|
copy := *cluster
|
||||||
|
r.clusters[cluster.ID] = &copy
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
func (r *testClusterRepo) Delete(ctx context.Context, id string) error {
|
||||||
|
delete(r.clusters, id)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
func (r *testClusterRepo) List(ctx context.Context) ([]*entity.Cluster, error) {
|
||||||
|
result := make([]*entity.Cluster, 0, len(r.clusters))
|
||||||
|
for _, cluster := range r.clusters {
|
||||||
|
copy := *cluster
|
||||||
|
result = append(result, &copy)
|
||||||
|
}
|
||||||
|
return result, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
type recordingTenantClient struct {
|
||||||
|
deleted []string
|
||||||
|
usage *repository.ResourceQuotaUsage
|
||||||
|
}
|
||||||
|
|
||||||
|
func (c *recordingTenantClient) EnsureTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
func (c *recordingTenantClient) IssueKubeconfig(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding, ttl time.Duration) (*entity.TenantKubeconfig, error) {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
func (c *recordingTenantClient) GetResourceQuotaUsage(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) (*repository.ResourceQuotaUsage, error) {
|
||||||
|
if c.usage != nil {
|
||||||
|
return c.usage, nil
|
||||||
|
}
|
||||||
|
return &repository.ResourceQuotaUsage{}, nil
|
||||||
|
}
|
||||||
|
func (c *recordingTenantClient) SuspendTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
func (c *recordingTenantClient) DeleteTenant(ctx context.Context, cluster *entity.Cluster, binding entity.TenantBinding) error {
|
||||||
|
if err := binding.Validate(); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
c.deleted = append(c.deleted, binding.Namespace)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
@ -6,6 +6,7 @@ import (
|
|||||||
"fmt"
|
"fmt"
|
||||||
"os"
|
"os"
|
||||||
"path/filepath"
|
"path/filepath"
|
||||||
|
"strings"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
"github.com/google/uuid"
|
"github.com/google/uuid"
|
||||||
@ -34,6 +35,7 @@ type InstanceService struct {
|
|||||||
entryClient repository.InstanceEntryClient
|
entryClient repository.InstanceEntryClient
|
||||||
diagClient repository.InstanceDiagnosticsClient
|
diagClient repository.InstanceDiagnosticsClient
|
||||||
workspaceRepo repository.WorkspaceRepository
|
workspaceRepo repository.WorkspaceRepository
|
||||||
|
userRepo repository.UserRepository
|
||||||
tenantClient repository.TenantKubeClient
|
tenantClient repository.TenantKubeClient
|
||||||
scaleClient ScaleClient
|
scaleClient ScaleClient
|
||||||
}
|
}
|
||||||
@@ -76,6 +78,10 @@ func (s *InstanceService) SetTenantProvisioning(workspaceRepo repository.Workspa
 	s.tenantClient = tenantClient
 }
 
+func (s *InstanceService) SetUserRepository(userRepo repository.UserRepository) {
+	s.userRepo = userRepo
+}
+
 const chartCacheDir = "/tmp/charts"
 
 func (s *InstanceService) chartArchivePath(instance *entity.Instance) string {
@@ -131,15 +137,21 @@ func (s *InstanceService) CreateInstance(ctx context.Context, instance *entity.I
 		return err
 	}
 	enforceNamespaceValues(instance)
-	if err := s.ensureTenantForInstance(ctx, principal, cluster, instance); err != nil {
-		return err
-	}
 
-	// Check whether the instance already exists
 	existingInstance, _ := s.instanceRepo.GetByClusterAndName(ctx, instance.ClusterID, instance.Name)
 	if existingInstance != nil {
 		return entity.ErrInstanceExists
 	}
+	if err := s.downloadChart(ctx, registry, instance); err != nil {
+		return err
+	}
+	binding, err := s.ensureTenantForInstance(ctx, principal, cluster, instance)
+	if err != nil {
+		return err
+	}
+	if err := s.precheckInstanceQuota(ctx, principal, cluster, binding, instance, nil); err != nil {
+		return err
+	}
 
 	instance.BeginOperation(entity.OperationInstall, "Preparing installation")
 
@@ -148,13 +160,6 @@ func (s *InstanceService) CreateInstance(ctx context.Context, instance *entity.I
 		return err
 	}
 
-	// Download the chart artifact for Helm to use
-	if err := s.downloadChart(ctx, registry, instance); err != nil {
-		instance.MarkFailure("Failed to download chart", err)
-		_ = s.instanceRepo.Update(ctx, instance)
-		return err
-	}
-
 	// Run the Helm install asynchronously and monitor its status
 	go s.executeAndSyncInstall(context.Background(), instance.ID, cluster, registry, instance)
 
@@ -175,6 +180,7 @@ func (s *InstanceService) GetInstance(ctx context.Context, id string) (*entity.I
 	if !s.canReadInstance(principal, instance) {
 		return nil, entity.ErrInstanceNotFound
 	}
+	s.enrichOwnerUsernames(ctx, []*entity.Instance{instance})
 	return instance, nil
 }
 
@@ -219,8 +225,22 @@ func (s *InstanceService) UpdateInstance(ctx context.Context, instance *entity.I
 	if !s.canWriteInstance(principal, existingInstance) {
 		return entity.ErrForbidden
 	}
+	instance.ClusterID = existingInstance.ClusterID
 	instance.WorkspaceID = existingInstance.WorkspaceID
 	instance.OwnerID = existingInstance.OwnerID
+	instance.Name = existingInstance.Name
+	if instance.RegistryID == "" {
+		instance.RegistryID = existingInstance.RegistryID
+	}
+	if instance.Repository == "" {
+		instance.Repository = existingInstance.Repository
+	}
+	if instance.Chart == "" {
+		instance.Chart = existingInstance.Chart
+	}
+	if instance.Version == "" {
+		instance.Version = existingInstance.Version
+	}
 
 	// Fetch cluster information
 	cluster, err := s.clusterRepo.GetByID(ctx, existingInstance.ClusterID)
@@ -236,15 +256,21 @@ func (s *InstanceService) UpdateInstance(ctx context.Context, instance *entity.I
 
 	instance.Namespace = existingInstance.Namespace
 	enforceNamespaceValues(instance)
-	instance.BeginOperation(entity.OperationUpgrade, "Pending upgrade")
-	if err := s.instanceRepo.Update(ctx, instance); err != nil {
-		return err
-	}
 
 	// Download the required chart
 	if err := s.downloadChart(ctx, registry, instance); err != nil {
-		instance.MarkFailure("Failed to download chart", err)
-		_ = s.instanceRepo.Update(ctx, instance)
+		return err
+	}
+	binding, err := s.ensureTenantForInstance(ctx, principal, cluster, instance)
+	if err != nil {
+		return err
+	}
+	if err := s.precheckInstanceQuota(ctx, principal, cluster, binding, instance, existingInstance); err != nil {
+		return err
+	}
+
+	instance.BeginOperation(entity.OperationUpgrade, "Pending upgrade")
+	if err := s.instanceRepo.Update(ctx, instance); err != nil {
 		return err
 	}
 
@@ -364,9 +390,32 @@ func (s *InstanceService) ListInstancesByCluster(ctx context.Context, clusterID
 			visible = append(visible, instance)
 		}
 	}
+	s.enrichOwnerUsernames(ctx, visible)
 	return visible, nil
 }
 
+func (s *InstanceService) enrichOwnerUsernames(ctx context.Context, instances []*entity.Instance) {
+	if s.userRepo == nil || len(instances) == 0 {
+		return
+	}
+	usernames := make(map[string]string)
+	for _, instance := range instances {
+		if instance == nil || instance.OwnerID == "" {
+			continue
+		}
+		if username, ok := usernames[instance.OwnerID]; ok {
+			instance.OwnerUsername = username
+			continue
+		}
+		user, err := s.userRepo.GetByID(ctx, instance.OwnerID)
+		if err != nil || user == nil {
+			continue
+		}
+		usernames[instance.OwnerID] = user.Username
+		instance.OwnerUsername = user.Username
+	}
+}
+
 // ListInstanceEntries lists the entry points (Service / Ingress) associated with the instance
 func (s *InstanceService) ListInstanceEntries(ctx context.Context, clusterID, instanceID string) ([]*entity.InstanceEntry, error) {
 	instance, err := s.GetInstance(ctx, instanceID)
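Review note: the enrichOwnerUsernames helper added in this hunk memoizes owner lookups per call, so a list with many instances owned by the same user triggers only one repository read for that user. The following is a minimal standalone sketch of that pattern, not the project's code. The `instance` type, the `lookup` callback, and the `enrich` function are hypothetical stand-ins for entity.Instance, UserRepository.GetByID, and the real method.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for entity.Instance.
type instance struct {
	OwnerID       string
	OwnerUsername string
}

// enrich fills OwnerUsername and calls lookup at most once per owner ID
// that resolves successfully. lookup is a hypothetical stand-in for
// UserRepository.GetByID.
func enrich(instances []*instance, lookup func(id string) (string, bool)) {
	cache := make(map[string]string)
	for _, inst := range instances {
		if inst == nil || inst.OwnerID == "" {
			continue
		}
		if name, ok := cache[inst.OwnerID]; ok {
			inst.OwnerUsername = name
			continue
		}
		name, ok := lookup(inst.OwnerID)
		if !ok {
			continue // failed lookups are not cached, as in the diff
		}
		cache[inst.OwnerID] = name
		inst.OwnerUsername = name
	}
}

func main() {
	calls := 0
	lookup := func(id string) (string, bool) {
		calls++
		if id == "user-1" {
			return "alice", true
		}
		return "", false
	}
	list := []*instance{{OwnerID: "user-1"}, {OwnerID: "user-1"}, {OwnerID: ""}, nil}
	enrich(list, lookup)
	fmt.Println(list[0].OwnerUsername, list[1].OwnerUsername, calls) // alice alice 1
}
```

Because failed lookups are skipped rather than cached, an owner ID that cannot be resolved is retried for every instance that references it. That is probably acceptable for short lists but may be worth revisiting for large ones.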
@@ -442,27 +491,57 @@ func (s *InstanceService) ScaleInstance(ctx context.Context, clusterID, instance
 	if !s.canWriteInstance(principal, instance) {
 		return nil, entity.ErrForbidden
 	}
+	if instance.ClusterID != clusterID {
+		return nil, entity.ErrInstanceNotFound
+	}
 	cluster, err := s.clusterRepo.GetByID(ctx, clusterID)
 	if err != nil {
 		return nil, entity.ErrClusterNotFound
 	}
 
+	current := cloneInstanceForQuota(instance)
+	currentValues, err := s.helmClient.GetValues(ctx, cluster, instance.Name, instance.Namespace)
+	if err == nil && currentValues != nil {
+		current.SetValues(currentValues)
+	}
+	target := cloneInstanceForQuota(instance)
+	targetValues := copyValues(current.Values)
+	if targetValues == nil {
+		targetValues = copyValues(instance.Values)
+	}
+	if targetValues == nil {
+		targetValues = map[string]interface{}{}
+	}
+	targetValues["replicaCount"] = replicas
+	target.SetValues(targetValues)
+	registry, err := s.registryRepo.GetByID(ctx, instance.RegistryID)
+	if err != nil {
+		return nil, entity.ErrRegistryNotFound
+	}
+	if err := s.downloadChart(ctx, registry, target); err != nil {
+		return nil, err
+	}
+	binding, err := s.ensureTenantForInstance(ctx, principal, cluster, target)
+	if err != nil {
+		return nil, err
+	}
+	if err := s.precheckInstanceQuota(ctx, principal, cluster, binding, target, current); err != nil {
+		return nil, err
+	}
+
 	// Scale via K8s API directly (like kubectl scale deploy --replicas=N)
 	if s.scaleClient != nil {
 		if err := s.scaleClient.ScaleDeployment(ctx, cluster, instance.Namespace, instance.Name, int32(replicas)); err != nil {
 			return nil, fmt.Errorf("failed to scale deployment: %w", err)
 		}
+		instance.SetValues(targetValues)
+		instance.Replicas = replicas
+		if err := s.instanceRepo.Update(ctx, instance); err != nil {
+			return nil, err
+		}
 	} else {
 		// Fallback: Helm upgrade with replicaCount
-		vals, err := s.helmClient.GetValues(ctx, cluster, instance.Name, instance.Namespace)
-		if err != nil {
-			return nil, fmt.Errorf("failed to get current values: %w", err)
-		}
-		if vals == nil {
-			vals = make(map[string]interface{})
-		}
-		vals["replicaCount"] = replicas
-		instance.SetValues(vals)
+		instance.SetValues(targetValues)
 		instance.BeginOperation(entity.OperationUpgrade, fmt.Sprintf("Scaling to %d replicas", replicas))
 		if err := s.instanceRepo.Update(ctx, instance); err != nil {
 			return nil, err
@@ -516,6 +595,9 @@ func (s *InstanceService) GetInstanceValuesDiff(ctx context.Context, clusterID,
 	if !s.canReadInstance(principal, instance) {
 		return nil, entity.ErrInstanceNotFound
 	}
+	if instance.ClusterID != clusterID {
+		return nil, entity.ErrInstanceNotFound
+	}
 	cluster, err := s.clusterRepo.GetByID(ctx, clusterID)
 	if err != nil {
 		return nil, entity.ErrClusterNotFound
@@ -528,6 +610,18 @@ func (s *InstanceService) GetInstanceValuesDiff(ctx context.Context, clusterID,
 
 	// Get default values from the chart archive
 	chartPath := s.chartArchivePath(instance)
+	if _, statErr := os.Stat(chartPath); statErr != nil {
+		if !errors.Is(statErr, os.ErrNotExist) {
+			return nil, fmt.Errorf("failed to inspect chart defaults: %w", statErr)
+		}
+		registry, err := s.registryRepo.GetByID(ctx, instance.RegistryID)
+		if err != nil {
+			return nil, entity.ErrRegistryNotFound
+		}
+		if err := s.downloadChart(ctx, registry, instance); err != nil {
+			return nil, err
+		}
+	}
 	defaults, err := s.helmClient.GetChartDefaultValues(chartPath)
 	if err != nil {
 		return nil, fmt.Errorf("failed to read chart defaults: %w", err)
@@ -593,9 +687,6 @@ func (s *InstanceService) applyNamespacePolicy(ctx context.Context, principal *a
 		}
 		return nil
 	}
-	if isReservedNamespace(instance.Namespace) {
-		return entity.ErrInvalidNamespace
-	}
 	if cluster.Visibility != authz.VisibilityPrivate || cluster.OwnerID != principal.UserID {
 		namespace := principal.Namespace
 		if namespace == "" {
@@ -606,9 +697,15 @@ func (s *InstanceService) applyNamespacePolicy(ctx context.Context, principal *a
 				namespace = binding.Namespace
 			}
 		}
+		if instance.Namespace != "" && instance.Namespace != namespace {
+			return entity.ErrForbidden
+		}
 		instance.Namespace = namespace
 		return nil
 	}
+	if isReservedNamespace(instance.Namespace) {
+		return entity.ErrInvalidNamespace
+	}
 	if instance.Namespace == "" {
 		if cluster.DefaultNamespace != "" {
 			instance.Namespace = cluster.DefaultNamespace
@@ -621,8 +718,62 @@ func (s *InstanceService) applyNamespacePolicy(ctx context.Context, principal *a
 	return nil
 }
 
-func (s *InstanceService) ensureTenantForInstance(ctx context.Context, principal *authz.Principal, cluster *entity.Cluster, instance *entity.Instance) error {
+func (s *InstanceService) ensureTenantForInstance(ctx context.Context, principal *authz.Principal, cluster *entity.Cluster, instance *entity.Instance) (*entity.WorkspaceClusterBinding, error) {
 	if principal.IsAdmin() || s.workspaceRepo == nil || s.tenantClient == nil {
+		return nil, nil
+	}
+	workspace, err := s.workspaceRepo.GetByID(ctx, principal.WorkspaceID)
+	if err != nil {
+		return nil, err
+	}
+	if workspace.Status == entity.WorkspaceSuspended {
+		return nil, entity.ErrWorkspaceSuspended
+	}
+	binding := &entity.WorkspaceClusterBinding{
+		ID:             uuid.New().String(),
+		WorkspaceID:    workspace.ID,
+		ClusterID:      cluster.ID,
+		Namespace:      instance.Namespace,
+		ServiceAccount: workspace.K8sSAName,
+		QuotaCPU:       strings.TrimSpace(workspace.QuotaCPU),
+		QuotaMemory:    strings.TrimSpace(workspace.QuotaMemory),
+		QuotaGPU:       zeroIfEmptyQuota(workspace.QuotaGPU),
+		QuotaGPUMem:    zeroIfEmptyQuota(workspace.QuotaGPUMem),
+		Status:         "active",
+		CreatedAt:      time.Now(),
+		UpdatedAt:      time.Now(),
+	}
+	if s.bindingRepo != nil {
+		if existing, err := s.bindingRepo.Get(ctx, workspace.ID, cluster.ID); err == nil && existing != nil {
+			binding.ID = existing.ID
+			binding.CreatedAt = existing.CreatedAt
+			if existing.Namespace != "" {
+				binding.Namespace = existing.Namespace
+				instance.Namespace = existing.Namespace
+				enforceNamespaceValues(instance)
+			}
+			if existing.ServiceAccount != "" {
+				binding.ServiceAccount = existing.ServiceAccount
+			}
+			if existing.Status != "" {
+				binding.Status = existing.Status
+			}
+		}
+	}
+	tenantBinding := tenantBindingFromWorkspaceClusterBinding(binding)
+	if err := s.tenantClient.EnsureTenant(ctx, cluster, tenantBinding); err != nil {
+		return nil, err
+	}
+	if s.bindingRepo != nil {
+		if err := s.bindingRepo.Upsert(ctx, binding); err != nil {
+			return nil, err
+		}
+	}
+	return binding, nil
+}
+
+func (s *InstanceService) precheckInstanceQuota(ctx context.Context, principal *authz.Principal, cluster *entity.Cluster, binding *entity.WorkspaceClusterBinding, target, current *entity.Instance) error {
+	if principal.IsAdmin() || s.workspaceRepo == nil || s.helmClient == nil {
 		return nil
 	}
 	workspace, err := s.workspaceRepo.GetByID(ctx, principal.WorkspaceID)
@@ -632,29 +783,45 @@ func (s *InstanceService) ensureTenantForInstance(ctx context.Context, principal
 	if workspace.Status == entity.WorkspaceSuspended {
 		return entity.ErrWorkspaceSuspended
 	}
-	binding := entity.NewTenantBinding(instance.Namespace)
-	binding.ServiceAccountName = workspace.K8sSAName
-	binding.ResourceQuotaHard = instanceResourceQuotaHard(workspace)
-	if err := s.tenantClient.EnsureTenant(ctx, cluster, binding); err != nil {
-		return err
+	if binding == nil {
+		binding = &entity.WorkspaceClusterBinding{
+			WorkspaceID: principal.WorkspaceID,
+			ClusterID:   cluster.ID,
+			Namespace:   target.Namespace,
+			QuotaCPU:    strings.TrimSpace(workspace.QuotaCPU),
+			QuotaMemory: strings.TrimSpace(workspace.QuotaMemory),
+			QuotaGPU:    zeroIfEmptyQuota(workspace.QuotaGPU),
+			QuotaGPUMem: zeroIfEmptyQuota(workspace.QuotaGPUMem),
+		}
 	}
-	if s.bindingRepo != nil {
-		_ = s.bindingRepo.Upsert(ctx, &entity.WorkspaceClusterBinding{
-			ID:             uuid.New().String(),
-			WorkspaceID:    workspace.ID,
-			ClusterID:      cluster.ID,
-			Namespace:      instance.Namespace,
-			ServiceAccount: workspace.K8sSAName,
-			QuotaCPU:       workspace.QuotaCPU,
-			QuotaMemory:    workspace.QuotaMemory,
-			QuotaGPU:       workspace.QuotaGPU,
-			QuotaGPUMem:    workspace.QuotaGPUMem,
-			Status:         "active",
-			CreatedAt:      time.Now(),
-			UpdatedAt:      time.Now(),
-		})
+	var usage *repository.ResourceQuotaUsage
+	if s.tenantClient != nil {
+		tenantBinding := tenantBindingFromWorkspaceClusterBinding(binding)
+		quotaUsage, err := s.tenantClient.GetResourceQuotaUsage(ctx, cluster, tenantBinding)
+		if err != nil {
+			return err
+		}
+		usage = quotaUsage
 	}
-	return nil
+	result, err := NewQuotaPrecheckService(s.helmClient).EstimateAndCompareBinding(ctx, cluster, binding, usage, target, current)
+	if err == nil {
+		return nil
+	}
+	if errors.Is(err, ErrQuotaExceeded) && result != nil {
+		return fmt.Errorf("%w: %s", ErrQuotaExceeded, formatQuotaExceeded(result.Exceeded))
+	}
+	return err
+}
+
+func formatQuotaExceeded(exceeded []QuotaExceededResource) string {
+	if len(exceeded) == 0 {
+		return "requested resources exceed workspace quota"
+	}
+	parts := make([]string, 0, len(exceeded))
+	for _, item := range exceeded {
+		parts = append(parts, fmt.Sprintf("%s required=%s quota=%s", item.Name, item.Required, item.Hard))
+	}
+	return strings.Join(parts, "; ")
 }
 
 func instanceResourceQuotaHard(workspace *entity.Workspace) corev1.ResourceList {
@@ -687,6 +854,46 @@ func instanceResourceQuotaHard(workspace *entity.Workspace) corev1.ResourceList
 	return hard
 }
 
+func tenantBindingFromWorkspaceClusterBinding(binding *entity.WorkspaceClusterBinding) entity.TenantBinding {
+	namespace := ""
+	if binding != nil {
+		namespace = binding.Namespace
+	}
+	tenantBinding := entity.NewTenantBinding(namespace)
+	if binding != nil {
+		tenantBinding.ServiceAccountName = binding.ServiceAccount
+		tenantBinding.ResourceQuotaHard = bindingQuotaHard(binding)
+	}
+	return tenantBinding
+}
+
+func zeroIfEmptyQuota(value string) string {
+	if strings.TrimSpace(value) == "" {
+		return "0"
+	}
+	return strings.TrimSpace(value)
+}
+
+func cloneInstanceForQuota(instance *entity.Instance) *entity.Instance {
+	if instance == nil {
+		return nil
+	}
+	cloned := *instance
+	cloned.SetValues(copyValues(instance.Values))
+	return &cloned
+}
+
+func copyValues(values map[string]interface{}) map[string]interface{} {
+	if values == nil {
+		return nil
+	}
+	copied := make(map[string]interface{}, len(values))
+	for key, value := range values {
+		copied[key] = value
+	}
+	return copied
+}
+
 func isReservedNamespace(namespace string) bool {
 	switch namespace {
 	case "default", "kube-system", "kube-public", "kube-node-lease":
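Review note: copyValues in this hunk copies only the top level of the values map. That is enough for ScaleInstance, which writes the top-level `replicaCount` key, but nested maps such as `image` remain shared between the clone and the original. The sketch below reproduces the helper as shown in the diff and demonstrates the aliasing; the sample values in `main` are made up for illustration.

```go
package main

import "fmt"

// copyValues mirrors the helper in the diff: a one-level (shallow) copy.
func copyValues(values map[string]interface{}) map[string]interface{} {
	if values == nil {
		return nil
	}
	copied := make(map[string]interface{}, len(values))
	for key, value := range values {
		copied[key] = value
	}
	return copied
}

func main() {
	// Illustrative values only; not taken from a real chart.
	orig := map[string]interface{}{
		"replicaCount": 1,
		"image":        map[string]interface{}{"tag": "v1"},
	}
	dup := copyValues(orig)

	// A top-level write affects only the copy.
	dup["replicaCount"] = 3
	// A nested write goes through the shared inner map.
	dup["image"].(map[string]interface{})["tag"] = "v2"

	fmt.Println(orig["replicaCount"], orig["image"].(map[string]interface{})["tag"]) // 1 v2
}
```

If a later change needs to modify nested keys on a quota clone, a recursive copy would be the safer choice.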
@@ -10,6 +10,7 @@ import (
 	"github.com/ocdp/cluster-service/internal/domain/entity"
 	"github.com/ocdp/cluster-service/internal/domain/repository"
 	"github.com/ocdp/cluster-service/internal/pkg/authz"
+	"k8s.io/apimachinery/pkg/api/resource"
 )
 
 func TestDeleteInstanceIgnoresMissingRelease(t *testing.T) {
@@ -85,6 +86,210 @@ func TestEnforceNamespaceValuesOverridesChartNamespaceKnobs(t *testing.T) {
 	}
 }
 
+func TestApplyNamespacePolicyRejectsMismatchedTenantNamespace(t *testing.T) {
+	principal := &authz.Principal{
+		UserID:        "user-1",
+		Username:      "alice",
+		Role:          authz.RoleUser,
+		WorkspaceID:   "workspace-1",
+		WorkspaceName: "alice",
+		Namespace:     "ocdp-u-alice",
+	}
+	cluster := &entity.Cluster{
+		ID:         "cluster-1",
+		OwnerID:    "admin",
+		Visibility: authz.VisibilityWorkspaceShared,
+	}
+	instance := &entity.Instance{Namespace: "other-namespace"}
+	svc := NewInstanceService(nil, nil, nil, nil, nil, nil)
+
+	if err := svc.applyNamespacePolicy(context.Background(), principal, cluster, instance); !errors.Is(err, entity.ErrForbidden) {
+		t.Fatalf("expected ErrForbidden for mismatched tenant namespace, got %v", err)
+	}
+	if instance.Namespace != "other-namespace" {
+		t.Fatalf("expected namespace to remain unchanged on rejection, got %q", instance.Namespace)
+	}
+}
+
+func TestApplyNamespacePolicyAllowsTenantNamespace(t *testing.T) {
+	principal := &authz.Principal{
+		UserID:        "user-1",
+		Username:      "alice",
+		Role:          authz.RoleUser,
+		WorkspaceID:   "workspace-1",
+		WorkspaceName: "alice",
+		Namespace:     "ocdp-u-alice",
+	}
+	cluster := &entity.Cluster{
+		ID:         "cluster-1",
+		OwnerID:    "admin",
+		Visibility: authz.VisibilityWorkspaceShared,
+	}
+	instance := &entity.Instance{Namespace: "ocdp-u-alice"}
+	svc := NewInstanceService(nil, nil, nil, nil, nil, nil)
+
+	if err := svc.applyNamespacePolicy(context.Background(), principal, cluster, instance); err != nil {
+		t.Fatalf("expected matching tenant namespace to be allowed, got %v", err)
+	}
+	if instance.Namespace != "ocdp-u-alice" {
+		t.Fatalf("expected namespace to remain the allowed tenant namespace, got %q", instance.Namespace)
+	}
+}
+
+func TestEnrichReplicasSetsLiveReplicaCount(t *testing.T) {
+	ctx := context.Background()
+	cluster := &entity.Cluster{ID: "cluster-1", Name: "cluster"}
+	svc := NewInstanceService(nil, &stubClusterRepo{cluster: cluster}, nil, nil, nil, nil)
+	svc.SetScaleClient(&stubScaleClient{replicas: 3})
+
+	instances := []*entity.Instance{{
+		ID:        "inst-1",
+		ClusterID: "cluster-1",
+		Name:      "demo",
+		Namespace: "ocdp-u-alice",
+		Replicas:  1,
+	}}
+
+	enriched := svc.EnrichReplicas(ctx, "cluster-1", instances)
+	if enriched[0].Replicas != 3 {
+		t.Fatalf("expected live replicas to overwrite stored count, got %d", enriched[0].Replicas)
+	}
+}
+
+func TestListInstancesByClusterHydratesOwnerUsername(t *testing.T) {
+	ctx := authz.WithPrincipal(context.Background(), &authz.Principal{
+		UserID:      "admin-1",
+		Username:    "admin",
+		Role:        authz.RoleAdmin,
+		WorkspaceID: "workspace-admin",
+	})
+	instanceRepo := persistencemock.NewInstanceRepositoryMock()
+	userRepo := persistencemock.NewUserRepositoryMock()
+	if err := userRepo.Create(ctx, &entity.User{ID: "user-1", Username: "alice", PasswordHash: "hash", Role: "user", WorkspaceID: "workspace-1"}); err != nil {
+		t.Fatalf("failed to seed user: %v", err)
+	}
+	instance := &entity.Instance{
+		ID:          "inst-1",
+		WorkspaceID: "workspace-1",
+		OwnerID:     "user-1",
+		ClusterID:   "cluster-1",
+		Name:        "demo",
+		Namespace:   "ocdp-u-alice",
+	}
+	if err := instanceRepo.Create(ctx, instance); err != nil {
+		t.Fatalf("failed to seed instance: %v", err)
+	}
+	svc := NewInstanceService(
+		instanceRepo,
+		&stubClusterRepo{cluster: &entity.Cluster{ID: "cluster-1", Name: "cluster"}},
+		nil,
+		nil,
+		nil,
+		nil,
+	)
+	svc.SetUserRepository(userRepo)
+
+	instances, err := svc.ListInstancesByCluster(ctx, "cluster-1")
+	if err != nil {
+		t.Fatalf("ListInstancesByCluster returned error: %v", err)
+	}
+	if len(instances) != 1 {
+		t.Fatalf("expected 1 instance, got %d", len(instances))
+	}
+	if instances[0].OwnerUsername != "alice" {
+		t.Fatalf("expected owner username alice, got %q", instances[0].OwnerUsername)
+	}
+}
+
+func TestCreateInstanceRejectsGPUWhenWorkspaceQuotaEmptyBeforeCreate(t *testing.T) {
+	ctx := authz.WithPrincipal(context.Background(), &authz.Principal{
+		UserID:        "user-ivanwu",
+		Username:      "ivanwu",
+		Role:          authz.RoleUser,
+		WorkspaceID:   "workspace-ivanwu",
+		WorkspaceName: "ivanwu",
+		Namespace:     "ocdp-u-ivanwu",
+	})
+	instanceRepo := persistencemock.NewInstanceRepositoryMock()
+	workspaceRepo := persistencemock.NewWorkspaceRepositoryMock()
+	bindingRepo := persistencemock.NewWorkspaceClusterBindingRepositoryMock()
+	workspace := entity.NewWorkspace("ivanwu", "admin")
+	workspace.ID = "workspace-ivanwu"
+	workspace.K8sNamespace = "ocdp-u-ivanwu"
+	workspace.K8sSAName = entity.ServiceAccountForNamespace(workspace.K8sNamespace)
+	workspace.QuotaCPU = "8"
+	workspace.QuotaMemory = "32Gi"
+	workspace.QuotaGPU = ""
+	workspace.QuotaGPUMem = ""
+	if err := workspaceRepo.Create(ctx, workspace); err != nil {
+		t.Fatalf("seed workspace: %v", err)
+	}
+
+	cluster := &entity.Cluster{
+		ID:         "k3s",
+		Name:       "k3s",
+		Host:       "https://k3s.invalid",
+		Token:      "token",
+		OwnerID:    "admin",
+		Visibility: authz.VisibilityGlobalShared,
+	}
+	registry := &entity.Registry{
+		ID:         "registry-1",
+		Name:       "harbor",
+		URL:        "https://harbor.invalid",
+		OwnerID:    "admin",
+		Visibility: authz.VisibilityGlobalShared,
+	}
+	helm := &stubHelmClient{
+		estimate: &repository.ResourceEstimate{
+			Requests: repository.ResourceVector{
+				CPU:         resource.MustParse("2"),
+				Memory:      resource.MustParse("8Gi"),
+				GPU:         1,
+				GPUMemoryMB: 10000,
+			},
+		},
+	}
+	oci := &stubOCIClient{}
+	svc := NewInstanceService(
+		instanceRepo,
+		&stubClusterRepo{cluster: cluster},
+		&stubRegistryRepo{registry: registry},
+		helm,
+		oci,
+		nil,
+		bindingRepo,
+	)
+	svc.SetTenantProvisioning(workspaceRepo, &recordingTenantClient{usage: &repository.ResourceQuotaUsage{}})
+
+	instance := entity.NewInstance("k3s", "vllm-qwen", "ocdp-u-ivanwu", registry.ID, "library/vllm-serve", "vllm-serve", "0.1.0")
+	instance.SetValues(map[string]interface{}{
+		"image": map[string]interface{}{
+			"repository": "harbor.bwgdi.com/library/vllm-openai",
+			"tag":        "v0.17.1",
+		},
+		"model": "Qwen/Qwen2.5-0.5B",
+	})
+
+	err := svc.CreateInstance(ctx, instance)
+	if !errors.Is(err, ErrQuotaExceeded) {
+		t.Fatalf("expected GPU quota rejection, got %v", err)
+	}
+	instances, listErr := instanceRepo.List(ctx)
+	if listErr != nil {
+		t.Fatalf("list instances: %v", listErr)
+	}
+	if len(instances) != 0 {
+		t.Fatalf("expected quota rejection before instance DB create, got %#v", instances)
+	}
+	if helm.installCalls != 0 {
+		t.Fatalf("expected Helm install not to be called, got %d calls", helm.installCalls)
+	}
+	if oci.pullCalls != 1 {
+		t.Fatalf("expected chart pull for quota rendering, got %d pulls", oci.pullCalls)
+	}
+}
+
 func waitForInstanceDeleted(t *testing.T, ctx context.Context, repo repository.InstanceRepository, id string) {
 	t.Helper()
 
@@ -133,13 +338,19 @@ func (*stubClusterRepo) List(ctx context.Context) ([]*entity.Cluster, error) { r
 
 type stubHelmClient struct {
 	uninstallErr error
+	estimate     *repository.ResourceEstimate
+	values       map[string]interface{}
+	installCalls int
+	upgradeCalls int
 }
 
-func (*stubHelmClient) Install(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
+func (s *stubHelmClient) Install(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
+	s.installCalls++
 	return nil
 }
 
-func (*stubHelmClient) Upgrade(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
+func (s *stubHelmClient) Upgrade(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) error {
+	s.upgradeCalls++
 	return nil
 }
 
@@ -163,13 +374,116 @@ func (*stubHelmClient) List(ctx context.Context, cluster *entity.Cluster, namesp
|
|||||||
return nil, nil
|
return nil, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (*stubHelmClient) GetValues(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) (map[string]interface{}, error) {
|
func (s *stubHelmClient) GetValues(ctx context.Context, cluster *entity.Cluster, releaseName, namespace string) (map[string]interface{}, error) {
|
||||||
return nil, nil
|
return s.values, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (*stubHelmClient) GetChartDefaultValues(chartPath string) (map[string]interface{}, error) {
|
func (*stubHelmClient) GetChartDefaultValues(chartPath string) (map[string]interface{}, error) {
|
||||||
return nil, nil
|
return nil, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (s *stubHelmClient) EstimateInstanceResources(ctx context.Context, cluster *entity.Cluster, instance *entity.Instance) (*repository.ResourceEstimate, error) {
|
||||||
|
if s.estimate != nil {
|
||||||
|
return s.estimate, nil
|
||||||
|
}
|
||||||
|
return &repository.ResourceEstimate{}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
type stubRegistryRepo struct {
|
||||||
|
registry *entity.Registry
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubRegistryRepo) Create(ctx context.Context, registry *entity.Registry) error {
|
||||||
|
s.registry = registry
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubRegistryRepo) GetByID(ctx context.Context, id string) (*entity.Registry, error) {
|
||||||
|
if s.registry != nil && s.registry.ID == id {
|
||||||
|
return s.registry, nil
|
||||||
|
}
|
||||||
|
return nil, entity.ErrRegistryNotFound
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubRegistryRepo) GetByName(ctx context.Context, name string) (*entity.Registry, error) {
|
||||||
|
if s.registry != nil && s.registry.Name == name {
|
||||||
|
return s.registry, nil
|
||||||
|
}
|
||||||
|
return nil, entity.ErrRegistryNotFound
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubRegistryRepo) Update(ctx context.Context, registry *entity.Registry) error {
|
||||||
|
s.registry = registry
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubRegistryRepo) Delete(ctx context.Context, id string) error {
|
||||||
|
if s.registry != nil && s.registry.ID == id {
|
||||||
|
s.registry = nil
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
return entity.ErrRegistryNotFound
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubRegistryRepo) List(ctx context.Context) ([]*entity.Registry, error) {
|
||||||
|
if s.registry == nil {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
return []*entity.Registry{s.registry}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
type stubOCIClient struct {
|
||||||
|
pullCalls int
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) ListRepositories(ctx context.Context, registry *entity.Registry, artifactType string) ([]string, error) {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) ListArtifacts(ctx context.Context, registry *entity.Registry, repositoryName, mediaTypeFilter string) ([]*entity.Artifact, error) {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) GetArtifact(ctx context.Context, registry *entity.Registry, repositoryName, reference string) (*entity.Artifact, error) {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) GetValuesSchema(ctx context.Context, registry *entity.Registry, repositoryName, reference string) (string, error) {
|
||||||
|
return "", nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) GetValuesYAML(ctx context.Context, registry *entity.Registry, repositoryName, reference string) (string, error) {
|
||||||
|
return "", nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubOCIClient) PullArtifact(ctx context.Context, registry *entity.Registry, repositoryName, reference, destPath string) error {
|
||||||
|
s.pullCalls++
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) PushArtifact(ctx context.Context, registry *entity.Registry, repositoryName, tag, sourcePath string) error {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (*stubOCIClient) CheckHealth(ctx context.Context, registry *entity.Registry) error {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
type stubScaleClient struct {
|
||||||
|
replicas int32
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubScaleClient) GetDeploymentReplicas(ctx context.Context, cluster *entity.Cluster, namespace, releaseName string) (int32, error) {
|
||||||
|
return s.replicas, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *stubScaleClient) ScaleDeployment(ctx context.Context, cluster *entity.Cluster, namespace, releaseName string, replicas int32) error {
|
||||||
|
s.replicas = replicas
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
var _ repository.ClusterRepository = (*stubClusterRepo)(nil)
|
var _ repository.ClusterRepository = (*stubClusterRepo)(nil)
|
||||||
|
var _ repository.RegistryRepository = (*stubRegistryRepo)(nil)
|
||||||
var _ repository.HelmClient = (*stubHelmClient)(nil)
|
var _ repository.HelmClient = (*stubHelmClient)(nil)
|
||||||
|
var _ repository.OCIClient = (*stubOCIClient)(nil)
|
||||||
|
var _ ScaleClient = (*stubScaleClient)(nil)
|
||||||
|
|||||||
@@ -3,6 +3,7 @@ package service
|
|||||||
import (
|
import (
|
||||||
"context"
|
"context"
|
||||||
"fmt"
|
"fmt"
|
||||||
|
"sort"
|
||||||
|
|
||||||
"github.com/ocdp/cluster-service/internal/domain/entity"
|
"github.com/ocdp/cluster-service/internal/domain/entity"
|
||||||
"github.com/ocdp/cluster-service/internal/domain/repository"
|
"github.com/ocdp/cluster-service/internal/domain/repository"
|
||||||
@@ -13,16 +14,22 @@ import (
|
|||||||
type MonitoringService struct {
|
type MonitoringService struct {
|
||||||
clusterRepo repository.ClusterRepository
|
clusterRepo repository.ClusterRepository
|
||||||
metricsClient repository.MetricsClient
|
metricsClient repository.MetricsClient
|
||||||
|
instanceRepo repository.InstanceRepository
|
||||||
|
userRepo repository.UserRepository
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewMonitoringService creates the monitoring service.
|
// NewMonitoringService creates the monitoring service.
|
||||||
func NewMonitoringService(
|
func NewMonitoringService(
|
||||||
clusterRepo repository.ClusterRepository,
|
clusterRepo repository.ClusterRepository,
|
||||||
metricsClient repository.MetricsClient,
|
metricsClient repository.MetricsClient,
|
||||||
|
instanceRepo repository.InstanceRepository,
|
||||||
|
userRepo repository.UserRepository,
|
||||||
) *MonitoringService {
|
) *MonitoringService {
|
||||||
return &MonitoringService{
|
return &MonitoringService{
|
||||||
clusterRepo: clusterRepo,
|
clusterRepo: clusterRepo,
|
||||||
metricsClient: metricsClient,
|
metricsClient: metricsClient,
|
||||||
|
instanceRepo: instanceRepo,
|
||||||
|
userRepo: userRepo,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -43,6 +50,8 @@ func (s *MonitoringService) GetClusterMonitoring(ctx context.Context, clusterID
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, fmt.Errorf("failed to get cluster metrics: %w", err)
|
return nil, fmt.Errorf("failed to get cluster metrics: %w", err)
|
||||||
}
|
}
|
||||||
|
s.enrichResourceUsage(ctx, principal, metrics)
|
||||||
|
s.scopeTenantMetrics(principal, metrics)
|
||||||
return metrics, nil
|
return metrics, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -75,12 +84,310 @@ func (s *MonitoringService) ListClusterMonitoring(ctx context.Context) ([]*entit
|
|||||||
Status: "unknown",
|
Status: "unknown",
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
s.enrichResourceUsage(ctx, principal, metrics)
|
||||||
|
s.scopeTenantMetrics(principal, metrics)
|
||||||
result = append(result, metrics)
|
result = append(result, metrics)
|
||||||
}
|
}
|
||||||
|
|
||||||
return result, nil
|
return result, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (s *MonitoringService) enrichResourceUsage(ctx context.Context, principal *authz.Principal, metrics *entity.ClusterMetrics) {
|
||||||
|
if metrics == nil || s.instanceRepo == nil || s.metricsClient == nil {
|
||||||
|
s.addVisibleUserRows(ctx, principal, metrics)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
instances, err := s.instanceRepo.ListByCluster(ctx, metrics.ClusterID)
|
||||||
|
if err != nil {
|
||||||
|
fmt.Printf("Warning: failed to list instances for cluster %s resource usage: %v\n", metrics.ClusterID, err)
|
||||||
|
s.addVisibleUserRows(ctx, principal, metrics)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
allocations, err := s.metricsClient.GetPodResourceAllocations(ctx, metrics.ClusterID)
|
||||||
|
if err != nil {
|
||||||
|
fmt.Printf("Warning: failed to list pod resource allocations for cluster %s: %v\n", metrics.ClusterID, err)
|
||||||
|
s.addVisibleUserRows(ctx, principal, metrics)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
visibleInstances := make(map[string]*entity.Instance)
|
||||||
|
for _, instance := range instances {
|
||||||
|
if instance == nil || !canReadMonitoringInstance(principal, instance) {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
key := monitoringInstanceKey(instance.Namespace, instance.Name)
|
||||||
|
visibleInstances[key] = instance
|
||||||
|
}
|
||||||
|
|
||||||
|
type usageAccumulator struct {
|
||||||
|
userID string
|
||||||
|
username string
|
||||||
|
workspaceID string
|
||||||
|
allocation entity.ResourceAllocation
|
||||||
|
podCount int
|
||||||
|
instances map[string]struct{}
|
||||||
|
}
|
||||||
|
byUser := make(map[string]*usageAccumulator)
|
||||||
|
total := entity.ResourceAllocation{}
|
||||||
|
|
||||||
|
for _, pod := range allocations {
|
||||||
|
if pod == nil {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
instance := visibleInstances[monitoringInstanceKey(pod.Namespace, pod.InstanceName)]
|
||||||
|
if instance == nil {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
total = addResourceAllocation(total, pod.Allocation)
|
||||||
|
username := instance.OwnerUsername
|
||||||
|
if username == "" {
|
||||||
|
username = s.usernameForOwner(ctx, instance.OwnerID, principal)
|
||||||
|
}
|
||||||
|
acc := byUser[instance.OwnerID]
|
||||||
|
if acc == nil {
|
||||||
|
acc = &usageAccumulator{
|
||||||
|
userID: instance.OwnerID,
|
||||||
|
username: username,
|
||||||
|
workspaceID: instance.WorkspaceID,
|
||||||
|
instances: map[string]struct{}{},
|
||||||
|
}
|
||||||
|
byUser[instance.OwnerID] = acc
|
||||||
|
}
|
||||||
|
if acc.username == "" {
|
||||||
|
acc.username = username
|
||||||
|
}
|
||||||
|
acc.allocation = addResourceAllocation(acc.allocation, pod.Allocation)
|
||||||
|
acc.podCount++
|
||||||
|
acc.instances[instance.ID] = struct{}{}
|
||||||
|
}
|
||||||
|
|
||||||
|
metrics.CPURequests = formatCPUAllocation(total.CPURequestsMilli)
|
||||||
|
metrics.CPULimits = formatCPUAllocation(total.CPULimitsMilli)
|
||||||
|
metrics.MemoryRequests = formatMemoryAllocation(total.MemoryRequestsBytes)
|
||||||
|
metrics.MemoryLimits = formatMemoryAllocation(total.MemoryLimitsBytes)
|
||||||
|
metrics.GPURequests = total.GPURequests
|
||||||
|
metrics.GPULimits = total.GPULimits
|
||||||
|
metrics.GPUMemoryRequestsMB = total.GPUMemoryRequestsMB
|
||||||
|
metrics.GPUMemoryLimitsMB = total.GPUMemoryLimitsMB
|
||||||
|
metrics.AllocatedGPU = total.GPURequests
|
||||||
|
metrics.AllocatedGPUMemoryMB = total.GPUMemoryRequestsMB
|
||||||
|
|
||||||
|
userIDs := make([]string, 0, len(byUser))
|
||||||
|
for userID := range byUser {
|
||||||
|
userIDs = append(userIDs, userID)
|
||||||
|
}
|
||||||
|
sort.Slice(userIDs, func(i, j int) bool {
|
||||||
|
left := byUser[userIDs[i]]
|
||||||
|
right := byUser[userIDs[j]]
|
||||||
|
if left.username == right.username {
|
||||||
|
return left.userID < right.userID
|
||||||
|
}
|
||||||
|
return left.username < right.username
|
||||||
|
})
|
||||||
|
|
||||||
|
usage := make([]entity.UserResourceUsage, 0, len(userIDs))
|
||||||
|
for _, userID := range userIDs {
|
||||||
|
acc := byUser[userID]
|
||||||
|
usage = append(usage, entity.UserResourceUsage{
|
||||||
|
UserID: acc.userID,
|
||||||
|
Username: acc.username,
|
||||||
|
WorkspaceID: acc.workspaceID,
|
||||||
|
InstanceCount: len(acc.instances),
|
||||||
|
PodCount: acc.podCount,
|
||||||
|
CPURequests: formatCPUAllocation(acc.allocation.CPURequestsMilli),
|
||||||
|
CPULimits: formatCPUAllocation(acc.allocation.CPULimitsMilli),
|
||||||
|
MemoryRequests: formatMemoryAllocation(acc.allocation.MemoryRequestsBytes),
|
||||||
|
MemoryLimits: formatMemoryAllocation(acc.allocation.MemoryLimitsBytes),
|
||||||
|
GPURequests: acc.allocation.GPURequests,
|
||||||
|
GPULimits: acc.allocation.GPULimits,
|
||||||
|
GPUMemoryRequestsMB: acc.allocation.GPUMemoryRequestsMB,
|
||||||
|
GPUMemoryLimitsMB: acc.allocation.GPUMemoryLimitsMB,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
metrics.ResourceUsageByUser = usage
|
||||||
|
s.addVisibleUserRows(ctx, principal, metrics)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *MonitoringService) addVisibleUserRows(ctx context.Context, principal *authz.Principal, metrics *entity.ClusterMetrics) {
|
||||||
|
if principal == nil || metrics == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
existing := make(map[string]struct{}, len(metrics.ResourceUsageByUser))
|
||||||
|
for _, row := range metrics.ResourceUsageByUser {
|
||||||
|
if row.UserID != "" {
|
||||||
|
existing[row.UserID] = struct{}{}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
appendEmpty := func(userID, username, workspaceID string) {
|
||||||
|
if userID == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if _, ok := existing[userID]; ok {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
metrics.ResourceUsageByUser = append(metrics.ResourceUsageByUser, entity.UserResourceUsage{
|
||||||
|
UserID: userID,
|
||||||
|
Username: username,
|
||||||
|
WorkspaceID: workspaceID,
|
||||||
|
InstanceCount: 0,
|
||||||
|
PodCount: 0,
|
||||||
|
CPURequests: "0 cores",
|
||||||
|
CPULimits: "0 cores",
|
||||||
|
MemoryRequests: "0 B",
|
||||||
|
MemoryLimits: "0 B",
|
||||||
|
})
|
||||||
|
existing[userID] = struct{}{}
|
||||||
|
}
|
||||||
|
if !principal.IsAdmin() {
|
||||||
|
appendEmpty(principal.UserID, principal.Username, principal.WorkspaceID)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if s.userRepo == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
users, err := s.userRepo.List(ctx)
|
||||||
|
if err != nil {
|
||||||
|
fmt.Printf("Warning: failed to list users for monitoring rows: %v\n", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
for _, user := range users {
|
||||||
|
if user == nil || user.Role != authz.RoleUser || !user.IsActive {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
appendEmpty(user.ID, user.Username, user.WorkspaceID)
|
||||||
|
}
|
||||||
|
sort.Slice(metrics.ResourceUsageByUser, func(i, j int) bool {
|
||||||
|
left := metrics.ResourceUsageByUser[i]
|
||||||
|
right := metrics.ResourceUsageByUser[j]
|
||||||
|
if left.Username == right.Username {
|
||||||
|
return left.UserID < right.UserID
|
||||||
|
}
|
||||||
|
return left.Username < right.Username
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *MonitoringService) scopeTenantMetrics(principal *authz.Principal, metrics *entity.ClusterMetrics) {
|
||||||
|
if principal == nil || principal.IsAdmin() || metrics == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
var total entity.ResourceAllocation
|
||||||
|
podCount := 0
|
||||||
|
instanceCount := 0
|
||||||
|
for _, usage := range metrics.ResourceUsageByUser {
|
||||||
|
if usage.UserID != principal.UserID {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
podCount += usage.PodCount
|
||||||
|
instanceCount += usage.InstanceCount
|
||||||
|
total.GPURequests += usage.GPURequests
|
||||||
|
total.GPULimits += usage.GPULimits
|
||||||
|
total.GPUMemoryRequestsMB += usage.GPUMemoryRequestsMB
|
||||||
|
total.GPUMemoryLimitsMB += usage.GPUMemoryLimitsMB
|
||||||
|
}
|
||||||
|
metrics.NodeCount = 0
|
||||||
|
metrics.Nodes = nil
|
||||||
|
metrics.PodCount = podCount
|
||||||
|
metrics.TotalCPU = ""
|
||||||
|
metrics.TotalMemory = ""
|
||||||
|
metrics.TotalGPU = 0
|
||||||
|
metrics.UsedCPU = metrics.CPURequests
|
||||||
|
metrics.UsedMemory = metrics.MemoryRequests
|
||||||
|
metrics.UsedGPU = int(total.GPURequests)
|
||||||
|
metrics.CPUUsage = 0
|
||||||
|
metrics.MemoryUsage = 0
|
||||||
|
metrics.GPUUsage = 0
|
||||||
|
metrics.MaxNodeCPU = ""
|
||||||
|
metrics.MaxNodeMemory = ""
|
||||||
|
metrics.MaxNodeGPU = 0
|
||||||
|
metrics.MaxNodeCPUUsage = 0
|
||||||
|
metrics.MaxNodeMemUsage = 0
|
||||||
|
metrics.MaxNodeGPUUsage = 0
|
||||||
|
metrics.ResourceUsageByUser = filterSelfUsage(principal.UserID, metrics.ResourceUsageByUser)
|
||||||
|
if instanceCount == 0 {
|
||||||
|
metrics.CPURequests = ""
|
||||||
|
metrics.CPULimits = ""
|
||||||
|
metrics.MemoryRequests = ""
|
||||||
|
metrics.MemoryLimits = ""
|
||||||
|
metrics.GPURequests = 0
|
||||||
|
metrics.GPULimits = 0
|
||||||
|
metrics.GPUMemoryRequestsMB = 0
|
||||||
|
metrics.GPUMemoryLimitsMB = 0
|
||||||
|
metrics.AllocatedGPU = 0
|
||||||
|
metrics.AllocatedGPUMemoryMB = 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func filterSelfUsage(userID string, usage []entity.UserResourceUsage) []entity.UserResourceUsage {
|
||||||
|
filtered := make([]entity.UserResourceUsage, 0, len(usage))
|
||||||
|
for _, row := range usage {
|
||||||
|
if row.UserID == userID {
|
||||||
|
filtered = append(filtered, row)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return filtered
|
||||||
|
}
|
||||||
|
|
||||||
|
func canReadMonitoringInstance(principal *authz.Principal, instance *entity.Instance) bool {
|
||||||
|
if principal == nil || instance == nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
if principal.IsAdmin() {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
return instance.WorkspaceID == principal.WorkspaceID && instance.OwnerID == principal.UserID
|
||||||
|
}
|
||||||
|
|
||||||
|
func (s *MonitoringService) usernameForOwner(ctx context.Context, ownerID string, principal *authz.Principal) string {
|
||||||
|
if ownerID == "" {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
if principal != nil && ownerID == principal.UserID {
|
||||||
|
return principal.Username
|
||||||
|
}
|
||||||
|
if s.userRepo == nil {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
user, err := s.userRepo.GetByID(ctx, ownerID)
|
||||||
|
if err != nil || user == nil {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
return user.Username
|
||||||
|
}
|
||||||
|
|
||||||
|
func monitoringInstanceKey(namespace, name string) string {
|
||||||
|
return namespace + "/" + name
|
||||||
|
}
|
||||||
|
|
||||||
|
func addResourceAllocation(left, right entity.ResourceAllocation) entity.ResourceAllocation {
|
||||||
|
return entity.ResourceAllocation{
|
||||||
|
CPURequestsMilli: left.CPURequestsMilli + right.CPURequestsMilli,
|
||||||
|
CPULimitsMilli: left.CPULimitsMilli + right.CPULimitsMilli,
|
||||||
|
MemoryRequestsBytes: left.MemoryRequestsBytes + right.MemoryRequestsBytes,
|
||||||
|
MemoryLimitsBytes: left.MemoryLimitsBytes + right.MemoryLimitsBytes,
|
||||||
|
GPURequests: left.GPURequests + right.GPURequests,
|
||||||
|
GPULimits: left.GPULimits + right.GPULimits,
|
||||||
|
GPUMemoryRequestsMB: left.GPUMemoryRequestsMB + right.GPUMemoryRequestsMB,
|
||||||
|
GPUMemoryLimitsMB: left.GPUMemoryLimitsMB + right.GPUMemoryLimitsMB,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func formatCPUAllocation(milli int64) string {
|
||||||
|
return fmt.Sprintf("%.2f cores", float64(milli)/1000.0)
|
||||||
|
}
|
||||||
|
|
||||||
|
func formatMemoryAllocation(bytes int64) string {
|
||||||
|
const unit = 1024
|
||||||
|
if bytes < unit {
|
||||||
|
return fmt.Sprintf("%d B", bytes)
|
||||||
|
}
|
||||||
|
div, exp := int64(unit), 0
|
||||||
|
for n := bytes / unit; n >= unit; n /= unit {
|
||||||
|
div *= unit
|
||||||
|
exp++
|
||||||
|
}
|
||||||
|
return fmt.Sprintf("%.1f %ciB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||||
|
}
|
||||||
|
|
||||||
// GetMonitoringSummary returns the aggregated monitoring summary.
|
// GetMonitoringSummary returns the aggregated monitoring summary.
|
||||||
func (s *MonitoringService) GetMonitoringSummary(ctx context.Context) (*entity.MonitoringSummary, error) {
|
func (s *MonitoringService) GetMonitoringSummary(ctx context.Context) (*entity.MonitoringSummary, error) {
|
||||||
// Fetch monitoring data for all clusters.
|
// Fetch monitoring data for all clusters.
|
||||||
@@ -123,6 +430,9 @@ func (s *MonitoringService) GetNodeMetrics(ctx context.Context, clusterID string
|
|||||||
if !authz.CanReadResource(principal, cluster.WorkspaceID, cluster.OwnerID, cluster.Visibility) {
|
if !authz.CanReadResource(principal, cluster.WorkspaceID, cluster.OwnerID, cluster.Visibility) {
|
||||||
return nil, entity.ErrClusterNotFound
|
return nil, entity.ErrClusterNotFound
|
||||||
}
|
}
|
||||||
|
if !principal.IsAdmin() {
|
||||||
|
return nil, entity.ErrForbidden
|
||||||
|
}
|
||||||
nodes, err := s.metricsClient.GetNodeMetrics(ctx, clusterID)
|
nodes, err := s.metricsClient.GetNodeMetrics(ctx, clusterID)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, fmt.Errorf("failed to get node metrics: %w", err)
|
return nil, fmt.Errorf("failed to get node metrics: %w", err)
|
||||||
|
|||||||
228
backend/internal/domain/service/monitoring_service_test.go
Normal file
@@ -0,0 +1,228 @@
|
|||||||
|
package service
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
persistencemock "github.com/ocdp/cluster-service/internal/adapter/output/persistence/mock"
|
||||||
|
"github.com/ocdp/cluster-service/internal/domain/entity"
|
||||||
|
"github.com/ocdp/cluster-service/internal/pkg/authz"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestListClusterMonitoringAggregatesResourceUsageForAdmin(t *testing.T) {
|
||||||
|
ctx := authz.WithPrincipal(context.Background(), &authz.Principal{
|
||||||
|
UserID: "admin-1",
|
||||||
|
Username: "admin",
|
||||||
|
Role: authz.RoleAdmin,
|
||||||
|
WorkspaceID: "workspace-admin",
|
||||||
|
})
|
||||||
|
instanceRepo, userRepo := seedMonitoringOwners(t, ctx)
|
||||||
|
svc := NewMonitoringService(
|
||||||
|
&monitoringClusterRepo{clusters: []*entity.Cluster{{ID: "cluster-1", Name: "cluster", Visibility: authz.VisibilityGlobalShared}}},
|
||||||
|
&stubMetricsClient{allocations: monitoringAllocations()},
|
||||||
|
instanceRepo,
|
||||||
|
userRepo,
|
||||||
|
)
|
||||||
|
|
||||||
|
metrics, err := svc.ListClusterMonitoring(ctx)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("ListClusterMonitoring returned error: %v", err)
|
||||||
|
}
|
||||||
|
if len(metrics) != 1 {
|
||||||
|
t.Fatalf("expected 1 cluster metric, got %d", len(metrics))
|
||||||
|
}
|
||||||
|
got := metrics[0]
|
||||||
|
if got.AllocatedGPU != 3 || got.AllocatedGPUMemoryMB != 30000 {
|
||||||
|
t.Fatalf("expected total GPU/gpumem allocation 3/30000, got %d/%d", got.AllocatedGPU, got.AllocatedGPUMemoryMB)
|
||||||
|
}
|
||||||
|
if len(got.ResourceUsageByUser) != 2 {
|
||||||
|
t.Fatalf("expected 2 user usage rows, got %d: %#v", len(got.ResourceUsageByUser), got.ResourceUsageByUser)
|
||||||
|
}
|
||||||
|
if got.ResourceUsageByUser[0].Username != "alice" || got.ResourceUsageByUser[0].GPURequests != 1 {
|
||||||
|
t.Fatalf("expected alice GPU request row first, got %#v", got.ResourceUsageByUser[0])
|
||||||
|
}
|
||||||
|
if got.ResourceUsageByUser[1].Username != "bob" || got.ResourceUsageByUser[1].GPURequests != 2 {
|
||||||
|
t.Fatalf("expected bob GPU request row second, got %#v", got.ResourceUsageByUser[1])
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestListClusterMonitoringFiltersResourceUsageForOrdinaryUser(t *testing.T) {
|
||||||
|
ctx := authz.WithPrincipal(context.Background(), &authz.Principal{
|
||||||
|
UserID: "user-1",
|
||||||
|
Username: "alice",
|
||||||
|
Role: authz.RoleUser,
|
||||||
|
WorkspaceID: "workspace-1",
|
||||||
|
})
|
||||||
|
instanceRepo, userRepo := seedMonitoringOwners(t, ctx)
|
||||||
|
svc := NewMonitoringService(
|
||||||
|
&monitoringClusterRepo{clusters: []*entity.Cluster{{ID: "cluster-1", Name: "cluster", Visibility: authz.VisibilityGlobalShared}}},
|
||||||
|
&stubMetricsClient{allocations: monitoringAllocations()},
|
||||||
|
instanceRepo,
|
||||||
|
userRepo,
|
||||||
|
)
|
||||||
|
|
||||||
|
metrics, err := svc.ListClusterMonitoring(ctx)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("ListClusterMonitoring returned error: %v", err)
|
||||||
|
}
|
||||||
|
got := metrics[0]
|
||||||
|
if got.AllocatedGPU != 1 || got.AllocatedGPUMemoryMB != 10000 {
|
||||||
|
t.Fatalf("expected ordinary user allocation to be scoped to alice, got %d/%d", got.AllocatedGPU, got.AllocatedGPUMemoryMB)
|
||||||
|
}
|
||||||
|
if len(got.ResourceUsageByUser) != 1 {
|
||||||
|
t.Fatalf("expected only alice usage row, got %d: %#v", len(got.ResourceUsageByUser), got.ResourceUsageByUser)
|
||||||
|
}
|
||||||
|
if got.ResourceUsageByUser[0].UserID != "user-1" || got.ResourceUsageByUser[0].Username != "alice" {
|
||||||
|
t.Fatalf("expected alice usage row, got %#v", got.ResourceUsageByUser[0])
|
||||||
|
}
|
||||||
|
if got.NodeCount != 0 || len(got.Nodes) != 0 || got.TotalCPU != "" || got.TotalMemory != "" {
|
||||||
|
t.Fatalf("expected ordinary user cluster-wide metrics to be sanitized, got nodes=%d/%d totalCPU=%q totalMemory=%q", got.NodeCount, len(got.Nodes), got.TotalCPU, got.TotalMemory)
|
||||||
|
}
|
||||||
|
if got.PodCount != 1 {
|
||||||
|
t.Fatalf("expected ordinary user pod count to be self scoped, got %d", got.PodCount)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestGetNodeMetricsForbiddenForOrdinaryUser(t *testing.T) {
|
||||||
|
ctx := authz.WithPrincipal(context.Background(), &authz.Principal{
|
||||||
|
UserID: "user-1",
|
||||||
|
Username: "alice",
|
||||||
|
Role: authz.RoleUser,
|
||||||
|
WorkspaceID: "workspace-1",
|
||||||
|
})
|
||||||
|
svc := NewMonitoringService(
|
||||||
|
&monitoringClusterRepo{clusters: []*entity.Cluster{{ID: "cluster-1", Name: "cluster", Visibility: authz.VisibilityGlobalShared}}},
|
||||||
|
&stubMetricsClient{allocations: monitoringAllocations()},
|
||||||
|
nil,
|
||||||
|
nil,
|
||||||
|
)
|
||||||
|
|
||||||
|
_, err := svc.GetNodeMetrics(ctx, "cluster-1")
|
||||||
|
if err != entity.ErrForbidden {
|
||||||
|
t.Fatalf("expected ordinary user node metrics to be forbidden, got %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func seedMonitoringOwners(t *testing.T, ctx context.Context) (*persistencemock.InstanceRepositoryMock, *persistencemock.UserRepositoryMock) {
|
||||||
|
t.Helper()
|
||||||
|
instanceRepo := persistencemock.NewInstanceRepositoryMock().(*persistencemock.InstanceRepositoryMock)
|
||||||
|
userRepo := persistencemock.NewUserRepositoryMock().(*persistencemock.UserRepositoryMock)
|
||||||
|
for _, user := range []*entity.User{
|
||||||
|
{ID: "user-1", Username: "alice", PasswordHash: "hash", Role: "user", WorkspaceID: "workspace-1"},
|
||||||
|
{ID: "user-2", Username: "bob", PasswordHash: "hash", Role: "user", WorkspaceID: "workspace-2"},
|
||||||
|
} {
|
||||||
|
if err := userRepo.Create(ctx, user); err != nil {
|
||||||
|
t.Fatalf("failed to seed user %s: %v", user.ID, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for _, instance := range []*entity.Instance{
|
||||||
|
{ID: "inst-1", ClusterID: "cluster-1", Name: "alice-app", Namespace: "ocdp-u-alice", WorkspaceID: "workspace-1", OwnerID: "user-1"},
|
||||||
|
{ID: "inst-2", ClusterID: "cluster-1", Name: "bob-app", Namespace: "ocdp-u-bob", WorkspaceID: "workspace-2", OwnerID: "user-2"},
|
||||||
|
} {
|
||||||
|
if err := instanceRepo.Create(ctx, instance); err != nil {
|
||||||
|
t.Fatalf("failed to seed instance %s: %v", instance.ID, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return instanceRepo, userRepo
|
||||||
|
}
|
||||||
|
|
||||||
|
func monitoringAllocations() []*entity.PodResourceAllocation {
|
||||||
|
return []*entity.PodResourceAllocation{
|
||||||
|
{
|
||||||
|
ClusterID: "cluster-1",
|
||||||
|
Namespace: "ocdp-u-alice",
|
||||||
|
PodName: "alice-app-0",
|
||||||
|
InstanceName: "alice-app",
|
||||||
|
Allocation: entity.ResourceAllocation{
|
||||||
|
CPURequestsMilli: 500,
|
||||||
|
CPULimitsMilli: 1000,
|
||||||
|
MemoryRequestsBytes: 1024 * 1024 * 1024,
|
||||||
|
MemoryLimitsBytes: 2 * 1024 * 1024 * 1024,
|
||||||
|
GPURequests: 1,
|
||||||
|
GPULimits: 1,
|
||||||
|
GPUMemoryRequestsMB: 10000,
|
||||||
|
GPUMemoryLimitsMB: 10000,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
ClusterID: "cluster-1",
|
||||||
|
Namespace: "ocdp-u-bob",
|
||||||
|
PodName: "bob-app-0",
|
||||||
|
InstanceName: "bob-app",
|
||||||
|
Allocation: entity.ResourceAllocation{
|
||||||
|
CPURequestsMilli: 2000,
|
||||||
|
CPULimitsMilli: 4000,
|
||||||
|
MemoryRequestsBytes: 4 * 1024 * 1024 * 1024,
|
||||||
|
MemoryLimitsBytes: 8 * 1024 * 1024 * 1024,
|
||||||
|
GPURequests: 2,
|
||||||
|
GPULimits: 2,
|
||||||
|
GPUMemoryRequestsMB: 20000,
|
||||||
|
GPUMemoryLimitsMB: 20000,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
type monitoringClusterRepo struct {
|
||||||
|
clusters []*entity.Cluster
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *monitoringClusterRepo) Create(ctx context.Context, cluster *entity.Cluster) error {
|
||||||
|
r.clusters = append(r.clusters, cluster)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *monitoringClusterRepo) GetByID(ctx context.Context, id string) (*entity.Cluster, error) {
|
||||||
|
for _, cluster := range r.clusters {
|
||||||
|
if cluster.ID == id {
|
||||||
|
return cluster, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil, entity.ErrClusterNotFound
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *monitoringClusterRepo) GetByName(ctx context.Context, name string) (*entity.Cluster, error) {
|
||||||
|
for _, cluster := range r.clusters {
|
||||||
|
if cluster.Name == name {
|
||||||
|
return cluster, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil, entity.ErrClusterNotFound
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *monitoringClusterRepo) Update(ctx context.Context, cluster *entity.Cluster) error {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *monitoringClusterRepo) Delete(ctx context.Context, id string) error { return nil }
|
||||||
|
|
||||||
|
func (r *monitoringClusterRepo) List(ctx context.Context) ([]*entity.Cluster, error) {
|
||||||
|
return r.clusters, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
type stubMetricsClient struct {
|
||||||
|
allocations []*entity.PodResourceAllocation
|
||||||
|
}
|
||||||
|
|
||||||
|
func (c *stubMetricsClient) GetClusterMetrics(ctx context.Context, clusterID string) (*entity.ClusterMetrics, error) {
|
||||||
|
return &entity.ClusterMetrics{
|
||||||
|
ClusterID: clusterID,
|
||||||
|
ClusterName: "cluster",
|
||||||
|
Status: "healthy",
|
||||||
|
NodeCount: 3,
|
||||||
|
PodCount: 99,
|
||||||
|
TotalCPU: "48 cores",
|
||||||
|
TotalMemory: "256Gi",
|
||||||
|
Nodes: []entity.NodeMetrics{{NodeName: "node-a"}},
|
||||||
|
LastCheck: time.Now(),
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (c *stubMetricsClient) GetNodeMetrics(ctx context.Context, clusterID string) ([]*entity.NodeMetrics, error) {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (c *stubMetricsClient) GetPodResourceAllocations(ctx context.Context, clusterID string) ([]*entity.PodResourceAllocation, error) {
|
||||||
|
return c.allocations, nil
|
||||||
|
}
|
||||||
400
backend/internal/domain/service/quota_precheck.go
Normal file
@@ -0,0 +1,400 @@
+package service
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"sort"
+	"strconv"
+	"strings"
+
+	"github.com/ocdp/cluster-service/internal/domain/entity"
+	"github.com/ocdp/cluster-service/internal/domain/repository"
+	corev1 "k8s.io/api/core/v1"
+	"k8s.io/apimachinery/pkg/api/resource"
+	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
+	"k8s.io/apimachinery/pkg/util/yaml"
+)
+
+var ErrQuotaExceeded = errors.New("quota exceeded")
+
+type QuotaExceededResource struct {
+	Name string
+	Required string
+	Hard string
+}
+
+type QuotaPrecheckResult struct {
+	Allowed bool
+	Required repository.ResourceEstimate
+	Hard repository.ResourceVector
+	Exceeded []QuotaExceededResource
+}
+
+type QuotaPrecheckService struct {
+	helmClient repository.HelmClient
+}
+
+func NewQuotaPrecheckService(helmClient repository.HelmClient) *QuotaPrecheckService {
+	return &QuotaPrecheckService{helmClient: helmClient}
+}
+
+func (s *QuotaPrecheckService) EstimateAndCompare(ctx context.Context, cluster *entity.Cluster, workspace *entity.Workspace, instance *entity.Instance) (*QuotaPrecheckResult, error) {
+	if s == nil || s.helmClient == nil {
+		return nil, errors.New("quota precheck requires helm client")
+	}
+	estimate, err := s.helmClient.EstimateInstanceResources(ctx, cluster, instance)
+	if err != nil {
+		return nil, err
+	}
+	result, err := CompareWorkspaceQuota(workspace, estimate)
+	if err != nil {
+		return result, err
+	}
+	return result, nil
+}
+
+func (s *QuotaPrecheckService) EstimateAndCompareBinding(ctx context.Context, cluster *entity.Cluster, binding *entity.WorkspaceClusterBinding, usage *repository.ResourceQuotaUsage, target *entity.Instance, current *entity.Instance) (*QuotaPrecheckResult, error) {
+	if s == nil || s.helmClient == nil {
+		return nil, errors.New("quota precheck requires helm client")
+	}
+	targetEstimate, err := s.helmClient.EstimateInstanceResources(ctx, cluster, target)
+	if err != nil {
+		return nil, err
+	}
+	var currentEstimate *repository.ResourceEstimate
+	if current != nil {
+		currentEstimate, err = s.helmClient.EstimateInstanceResources(ctx, cluster, current)
+		if err != nil {
+			return nil, err
+		}
+	}
+	result, err := CompareBindingQuota(binding, usage, targetEstimate, currentEstimate)
+	if err != nil {
+		return result, err
+	}
+	return result, nil
+}
+
+func CompareWorkspaceQuota(workspace *entity.Workspace, estimate *repository.ResourceEstimate) (*QuotaPrecheckResult, error) {
+	return compareQuotaList(resourceQuotaHard(workspace), nil, estimate, nil)
+}
+
+func CompareBindingQuota(binding *entity.WorkspaceClusterBinding, usage *repository.ResourceQuotaUsage, targetEstimate, currentEstimate *repository.ResourceEstimate) (*QuotaPrecheckResult, error) {
+	return compareQuotaList(bindingQuotaHard(binding), usage, targetEstimate, currentEstimate)
+}
+
+func compareQuotaList(hardList corev1.ResourceList, usage *repository.ResourceQuotaUsage, targetEstimate, currentEstimate *repository.ResourceEstimate) (*QuotaPrecheckResult, error) {
+	if targetEstimate == nil {
+		targetEstimate = &repository.ResourceEstimate{}
+	}
+	current := effectiveQuotaRequests(currentEstimate)
+	target := effectiveQuotaRequests(targetEstimate)
+	used := repository.ResourceVector{}
+	if usage != nil {
+		used = usage.Used
+	}
+	required := addResourceVector(subtractResourceVectorFloorZero(used, current), target)
+	hard := resourceVectorFromQuotaHard(hardList)
+	result := &QuotaPrecheckResult{
+		Allowed: true,
+		Required: repository.ResourceEstimate{
+			Requests: required,
+		},
+		Hard: hard,
+	}
+	addExceeded := func(name, required, limit string) {
+		result.Allowed = false
+		result.Exceeded = append(result.Exceeded, QuotaExceededResource{
+			Name: name,
+			Required: required,
+			Hard: limit,
+		})
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.cpu")]; ok && required.CPU.Cmp(quantity) > 0 {
+		addExceeded("requests.cpu", required.CPU.String(), quantity.String())
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.memory")]; ok && required.Memory.Cmp(quantity) > 0 {
+		addExceeded("requests.memory", required.Memory.String(), quantity.String())
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.nvidia.com/gpu")]; ok && required.GPU > quantity.Value() {
+		addExceeded("requests.nvidia.com/gpu", strconv.FormatInt(required.GPU, 10), quantity.String())
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.nvidia.com/gpumem")]; ok && required.GPUMemoryMB > quantity.Value() {
+		addExceeded("requests.nvidia.com/gpumem", strconv.FormatInt(required.GPUMemoryMB, 10), quantity.String())
+	}
+	sort.Slice(result.Exceeded, func(i, j int) bool {
+		return result.Exceeded[i].Name < result.Exceeded[j].Name
+	})
+	if !result.Allowed {
+		return result, ErrQuotaExceeded
+	}
+	return result, nil
+}
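The arithmetic in `compareQuotaList` is `required = max(used - current, 0) + target`, checked only against resources that have an entry in the hard list. A simplified sketch of that rule follows, under loud assumptions: plain `int64` maps replace `repository.ResourceVector` and `corev1.ResourceList`, and the helper names `requiredAfterUpdate` and `exceededNames` are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredAfterUpdate mirrors the rule in the diff: release what the current
// instance holds (never below zero), then add what the target would need.
// Quantities are plain integers here, e.g. CPU in millicores.
func requiredAfterUpdate(used, current, target map[string]int64) map[string]int64 {
	required := map[string]int64{}
	for name, value := range used {
		remaining := value - current[name]
		if remaining < 0 {
			remaining = 0
		}
		required[name] = remaining
	}
	for name, value := range target {
		required[name] += value
	}
	return required
}

// exceededNames returns the sorted resource names whose requirement is above
// the hard limit. A resource with no hard entry is treated as unlimited.
func exceededNames(required, hard map[string]int64) []string {
	var names []string
	for name, limit := range hard {
		if required[name] > limit {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	hard := map[string]int64{"requests.cpu": 1000, "requests.nvidia.com/gpu": 1}
	used := map[string]int64{"requests.cpu": 1000, "requests.nvidia.com/gpu": 1}
	current := used // the release being updated accounts for all current usage

	sameSize := requiredAfterUpdate(used, current, map[string]int64{"requests.cpu": 1000, "requests.nvidia.com/gpu": 1})
	fmt.Println(exceededNames(sameSize, hard)) // prints []

	scaledUp := requiredAfterUpdate(used, current, map[string]int64{"requests.cpu": 2000, "requests.nvidia.com/gpu": 2})
	fmt.Println(exceededNames(scaledUp, hard)) // prints [requests.cpu requests.nvidia.com/gpu]
}
```

Subtracting the current release first is what lets an in-place update with an unchanged footprint pass even when the namespace is already at its limit.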
+
+func legacyCompareWorkspaceQuota(workspace *entity.Workspace, estimate *repository.ResourceEstimate) (*QuotaPrecheckResult, error) {
+	if estimate == nil {
+		estimate = &repository.ResourceEstimate{}
+	}
+	hardList := resourceQuotaHard(workspace)
+	hard := resourceVectorFromQuotaHard(hardList)
+	result := &QuotaPrecheckResult{
+		Allowed: true,
+		Required: *estimate,
+		Hard: hard,
+	}
+	effectiveRequests := effectiveQuotaRequests(estimate)
+	addExceeded := func(name, required, limit string) {
+		result.Allowed = false
+		result.Exceeded = append(result.Exceeded, QuotaExceededResource{
+			Name: name,
+			Required: required,
+			Hard: limit,
+		})
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.cpu")]; ok && effectiveRequests.CPU.Cmp(quantity) > 0 {
+		addExceeded("requests.cpu", effectiveRequests.CPU.String(), quantity.String())
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.memory")]; ok && effectiveRequests.Memory.Cmp(quantity) > 0 {
+		addExceeded("requests.memory", effectiveRequests.Memory.String(), quantity.String())
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.nvidia.com/gpu")]; ok && effectiveRequests.GPU > quantity.Value() {
+		addExceeded("requests.nvidia.com/gpu", strconv.FormatInt(effectiveRequests.GPU, 10), quantity.String())
+	}
+	if quantity, ok := hardList[corev1.ResourceName("requests.nvidia.com/gpumem")]; ok && effectiveRequests.GPUMemoryMB > quantity.Value() {
+		addExceeded("requests.nvidia.com/gpumem", strconv.FormatInt(effectiveRequests.GPUMemoryMB, 10), quantity.String())
+	}
+	sort.Slice(result.Exceeded, func(i, j int) bool {
+		return result.Exceeded[i].Name < result.Exceeded[j].Name
+	})
+	if !result.Allowed {
+		return result, ErrQuotaExceeded
+	}
+	return result, nil
+}
+
+func effectiveQuotaRequests(estimate *repository.ResourceEstimate) repository.ResourceVector {
+	if estimate == nil {
+		return repository.ResourceVector{}
+	}
+	return repository.ResourceVector{
+		CPU: maxQuantity(estimate.Requests.CPU, estimate.Limits.CPU),
+		Memory: maxQuantity(estimate.Requests.Memory, estimate.Limits.Memory),
+		GPU: maxInt64(estimate.Requests.GPU, estimate.Limits.GPU),
+		GPUMemoryMB: maxInt64(estimate.Requests.GPUMemoryMB, estimate.Limits.GPUMemoryMB),
+	}
+}
+
+func addResourceVector(left, right repository.ResourceVector) repository.ResourceVector {
+	out := left
+	out.CPU.Add(right.CPU)
+	out.Memory.Add(right.Memory)
+	out.GPU += right.GPU
+	out.GPUMemoryMB += right.GPUMemoryMB
+	return out
+}
+
+func subtractResourceVectorFloorZero(left, right repository.ResourceVector) repository.ResourceVector {
+	out := left
+	out.CPU.Sub(right.CPU)
+	if out.CPU.Sign() < 0 {
+		out.CPU = resource.Quantity{}
+	}
+	out.Memory.Sub(right.Memory)
+	if out.Memory.Sign() < 0 {
+		out.Memory = resource.Quantity{}
+	}
+	out.GPU -= right.GPU
+	if out.GPU < 0 {
+		out.GPU = 0
+	}
+	out.GPUMemoryMB -= right.GPUMemoryMB
+	if out.GPUMemoryMB < 0 {
+		out.GPUMemoryMB = 0
+	}
+	return out
+}
+
+func maxQuantity(left, right resource.Quantity) resource.Quantity {
+	if left.Cmp(right) >= 0 {
+		return left
+	}
+	return right
+}
+
+func maxInt64(left, right int64) int64 {
+	if left >= right {
+		return left
+	}
+	return right
+}
+
+func EstimateRenderedManifestResources(manifest string) (*repository.ResourceEstimate, error) {
+	decoder := yaml.NewYAMLOrJSONDecoder(strings.NewReader(manifest), 4096)
+	estimate := &repository.ResourceEstimate{}
+	for {
+		var obj unstructured.Unstructured
+		if err := decoder.Decode(&obj); err != nil {
+			if errors.Is(err, io.EOF) {
+				break
+			}
+			return nil, fmt.Errorf("failed to decode rendered manifest: %w", err)
+		}
+		if obj.GetKind() == "" {
+			continue
+		}
+		podSpec, replicas, ok := podTemplateSpec(obj.Object)
+		if !ok {
+			continue
+		}
+		addPodSpecResources(estimate, podSpec, replicas)
+	}
+	return estimate, nil
+}
+
+func resourceVectorFromQuotaHard(hard corev1.ResourceList) repository.ResourceVector {
+	gpu := hard[corev1.ResourceName("requests.nvidia.com/gpu")]
+	gpuMemory := hard[corev1.ResourceName("requests.nvidia.com/gpumem")]
+	return repository.ResourceVector{
+		CPU: hard[corev1.ResourceName("requests.cpu")],
+		Memory: hard[corev1.ResourceName("requests.memory")],
+		GPU: gpu.Value(),
+		GPUMemoryMB: gpuMemory.Value(),
+	}
+}
+
+func bindingQuotaHard(binding *entity.WorkspaceClusterBinding) corev1.ResourceList {
+	hard := corev1.ResourceList{}
+	if binding == nil {
+		return hard
+	}
+	addQuantity := func(name corev1.ResourceName, value string) {
+		value = normalizeStandardQuotaQuantity(value)
+		if value == "" {
+			return
+		}
+		if quantity, err := resource.ParseQuantity(value); err == nil {
+			hard[name] = quantity
+		}
+	}
+	addGPUMemoryQuantity := func(value string) {
+		value, err := normalizeGPUMemoryQuota(value)
+		if err != nil || value == "" {
+			return
+		}
+		if quantity, err := resource.ParseQuantity(value); err == nil {
+			hard[corev1.ResourceName("requests.nvidia.com/gpumem")] = quantity
+		}
+	}
+	addQuantity(corev1.ResourceName("requests.cpu"), binding.QuotaCPU)
+	addQuantity(corev1.ResourceName("requests.memory"), binding.QuotaMemory)
+	addQuantity(corev1.ResourceName("requests.nvidia.com/gpu"), binding.QuotaGPU)
+	addGPUMemoryQuantity(binding.QuotaGPUMem)
+	return hard
+}
+
+func podTemplateSpec(obj map[string]interface{}) (map[string]interface{}, int64, bool) {
+	kind, _, _ := unstructured.NestedString(obj, "kind")
+	switch kind {
+	case "Pod":
+		spec, ok := nestedMap(obj, "spec")
+		return spec, 1, ok
+	case "Deployment", "ReplicaSet", "StatefulSet", "ReplicationController":
+		spec, replicas, ok := workloadTemplateSpec(obj)
+		return spec, replicas, ok
+	case "DaemonSet", "Job":
+		spec, ok := nestedMap(obj, "spec", "template", "spec")
+		return spec, 1, ok
+	case "CronJob":
+		spec, ok := nestedMap(obj, "spec", "jobTemplate", "spec", "template", "spec")
+		return spec, 1, ok
+	default:
+		return nil, 0, false
+	}
+}
+
+func workloadTemplateSpec(obj map[string]interface{}) (map[string]interface{}, int64, bool) {
+	spec, ok := nestedMap(obj, "spec", "template", "spec")
+	if !ok {
+		return nil, 0, false
+	}
+	replicas, _, err := unstructured.NestedInt64(obj, "spec", "replicas")
+	if err != nil || replicas < 1 {
+		replicas = 1
+	}
+	return spec, replicas, true
+}
+
+func nestedMap(obj map[string]interface{}, fields ...string) (map[string]interface{}, bool) {
+	value, ok, err := unstructured.NestedMap(obj, fields...)
+	return value, ok && err == nil
+}
+
+func addPodSpecResources(estimate *repository.ResourceEstimate, podSpec map[string]interface{}, replicas int64) {
+	if replicas < 1 {
+		replicas = 1
+	}
+	for _, field := range []string{"initContainers", "containers"} {
+		containers, ok, err := unstructured.NestedSlice(podSpec, field)
+		if err != nil || !ok {
+			continue
+		}
+		for _, item := range containers {
+			container, ok := item.(map[string]interface{})
+			if !ok {
+				continue
+			}
+			addContainerResourceList(&estimate.Requests, replicas, container, "resources", "requests")
+			addContainerResourceList(&estimate.Limits, replicas, container, "resources", "limits")
+		}
+	}
+}
+
+func addContainerResourceList(target *repository.ResourceVector, replicas int64, container map[string]interface{}, fields ...string) {
+	resources, ok := nestedMap(container, fields...)
+	if !ok {
+		return
+	}
+	for name, value := range resources {
+		switch name {
+		case "cpu":
+			addQuantity(&target.CPU, value, replicas)
+		case "memory":
+			addQuantity(&target.Memory, value, replicas)
+		case "nvidia.com/gpu", "requests.nvidia.com/gpu", "limits.nvidia.com/gpu":
+			target.GPU += parseIntegerResource(value) * replicas
+		case "nvidia.com/gpumem", "requests.nvidia.com/gpumem", "limits.nvidia.com/gpumem":
+			target.GPUMemoryMB += parseGPUMemoryResource(value) * replicas
+		}
+	}
+}
+
+func addQuantity(target *resource.Quantity, value interface{}, replicas int64) {
+	quantity, err := resource.ParseQuantity(fmt.Sprint(value))
+	if err != nil {
+		return
+	}
+	quantity.Mul(replicas)
+	target.Add(quantity)
+}
+
+func parseIntegerResource(value interface{}) int64 {
+	quantity, err := resource.ParseQuantity(fmt.Sprint(value))
+	if err != nil {
+		return 0
+	}
+	return quantity.Value()
+}
+
+func parseGPUMemoryResource(value interface{}) int64 {
+	normalized, err := normalizeGPUMemoryQuota(fmt.Sprint(value))
+	if err != nil || normalized == "" {
+		return 0
+	}
+	parsed, err := strconv.ParseInt(normalized, 10, 64)
+	if err != nil {
+		return 0
+	}
+	return parsed
+}
241
backend/internal/domain/service/quota_precheck_test.go
Normal file
@@ -0,0 +1,241 @@
+package service
+
+import (
+	"errors"
+	"testing"
+
+	"github.com/ocdp/cluster-service/internal/domain/entity"
+	"github.com/ocdp/cluster-service/internal/domain/repository"
+	corev1 "k8s.io/api/core/v1"
+	"k8s.io/apimachinery/pkg/api/resource"
+)
+
+func TestCompareWorkspaceQuotaReportsExceededRequests(t *testing.T) {
+	t.Parallel()
+
+	workspace := &entity.Workspace{
+		QuotaCPU: "2",
+		QuotaMemory: "4Gi",
+		QuotaGPU: "1",
+		QuotaGPUMem: "10000",
+	}
+	estimate := &repository.ResourceEstimate{
+		Requests: repository.ResourceVector{
+			CPU: resource.MustParse("2500m"),
+			Memory: resource.MustParse("3Gi"),
+			GPU: 1,
+			GPUMemoryMB: 12000,
+		},
+	}
+
+	result, err := CompareWorkspaceQuota(workspace, estimate)
+	if !errors.Is(err, ErrQuotaExceeded) {
+		t.Fatalf("expected ErrQuotaExceeded, got %v", err)
+	}
+	if result == nil || result.Allowed {
+		t.Fatalf("expected denied result, got %#v", result)
+	}
+	if len(result.Exceeded) != 2 {
+		t.Fatalf("expected 2 exceeded resources, got %#v", result.Exceeded)
+	}
+	if result.Exceeded[0].Name != "requests.cpu" {
+		t.Fatalf("expected requests.cpu exceeded first, got %#v", result.Exceeded)
+	}
+	if result.Exceeded[1].Name != "requests.nvidia.com/gpumem" {
+		t.Fatalf("expected requests.nvidia.com/gpumem exceeded second, got %#v", result.Exceeded)
+	}
+}
+
+func TestCompareWorkspaceQuotaUsesLimitsAsEffectiveRequests(t *testing.T) {
+	t.Parallel()
+
+	workspace := &entity.Workspace{
+		QuotaGPU: "0",
+		QuotaGPUMem: "9999",
+	}
+	estimate := &repository.ResourceEstimate{
+		Limits: repository.ResourceVector{
+			GPU: 1,
+			GPUMemoryMB: 10000,
+		},
+	}
+
+	result, err := CompareWorkspaceQuota(workspace, estimate)
+	if !errors.Is(err, ErrQuotaExceeded) {
+		t.Fatalf("expected ErrQuotaExceeded from limits-only GPU resources, got %v", err)
+	}
+	if result == nil || len(result.Exceeded) != 2 {
+		t.Fatalf("expected gpu and gpumem to be exceeded, got %#v", result)
+	}
+}
+
+func TestCompareBindingQuotaSubtractsCurrentReleaseFromUsedQuota(t *testing.T) {
+	t.Parallel()
+
+	binding := &entity.WorkspaceClusterBinding{
+		QuotaCPU: "1",
+		QuotaMemory: "2Gi",
+		QuotaGPU: "1",
+		QuotaGPUMem: "10000",
+	}
+	usage := &repository.ResourceQuotaUsage{
+		Used: repository.ResourceVector{
+			CPU: resource.MustParse("1"),
+			Memory: resource.MustParse("2Gi"),
+			GPU: 1,
+			GPUMemoryMB: 10000,
+		},
+	}
+	current := &repository.ResourceEstimate{
+		Requests: repository.ResourceVector{
+			CPU: resource.MustParse("1"),
+			Memory: resource.MustParse("2Gi"),
+			GPU: 1,
+			GPUMemoryMB: 10000,
+		},
+	}
+	targetSameSize := &repository.ResourceEstimate{
+		Requests: repository.ResourceVector{
+			CPU: resource.MustParse("1"),
+			Memory: resource.MustParse("2Gi"),
+			GPU: 1,
+			GPUMemoryMB: 10000,
+		},
+	}
+
+	result, err := CompareBindingQuota(binding, usage, targetSameSize, current)
+	if err != nil {
+		t.Fatalf("expected update with same resource footprint to fit quota, got %v", err)
+	}
+	if result.Required.Requests.GPU != 1 || result.Required.Requests.GPUMemoryMB != 10000 {
+		t.Fatalf("expected required resources to subtract current release before target, got %#v", result.Required.Requests)
+	}
+
+	targetScaledUp := &repository.ResourceEstimate{
+		Requests: repository.ResourceVector{
+			CPU: resource.MustParse("2"),
+			Memory: resource.MustParse("4Gi"),
+			GPU: 2,
+			GPUMemoryMB: 20000,
+		},
+	}
+	result, err = CompareBindingQuota(binding, usage, targetScaledUp, current)
+	if !errors.Is(err, ErrQuotaExceeded) {
+		t.Fatalf("expected scale-up beyond quota to be rejected, got %v", err)
+	}
+	if result == nil || result.Allowed {
+		t.Fatalf("expected denied quota result, got %#v", result)
+	}
+}
+
+func TestCompareBindingQuotaTreatsExplicitZeroGPUAsNoGPUAllowed(t *testing.T) {
+	t.Parallel()
+
+	binding := &entity.WorkspaceClusterBinding{
+		QuotaCPU: "8",
+		QuotaMemory: "32Gi",
+		QuotaGPU: "0",
+		QuotaGPUMem: "0",
+	}
+	vllmLikeEstimate := &repository.ResourceEstimate{
+		Requests: repository.ResourceVector{
+			CPU: resource.MustParse("2"),
+			Memory: resource.MustParse("8Gi"),
+			GPU: 1,
+			GPUMemoryMB: 10000,
+		},
+	}
+
+	result, err := CompareBindingQuota(binding, &repository.ResourceQuotaUsage{}, vllmLikeEstimate, nil)
+	if !errors.Is(err, ErrQuotaExceeded) {
+		t.Fatalf("expected GPU request to exceed explicit zero quota, got %v", err)
+	}
+	exceeded := map[string]bool{}
+	for _, item := range result.Exceeded {
+		exceeded[item.Name] = true
+	}
+	for _, name := range []string{"requests.nvidia.com/gpu", "requests.nvidia.com/gpumem"} {
+		if !exceeded[name] {
+			t.Fatalf("expected %s to be exceeded, got %#v", name, result.Exceeded)
+		}
+	}
+}
+
+func TestBindingQuotaHardKeepsGPUMemoryAsIntegerMB(t *testing.T) {
+	t.Parallel()
+
+	hard := bindingQuotaHard(&entity.WorkspaceClusterBinding{QuotaGPU: "1", QuotaGPUMem: "10000"})
+	gpuMem := hard[corev1.ResourceName("requests.nvidia.com/gpumem")]
+	if gpuMem.Value() != 10000 {
+		t.Fatalf("expected gpumem quota to remain integer MB 10000, got %s value=%d", gpuMem.String(), gpuMem.Value())
+	}
+}
+
+func TestEstimateRenderedManifestResourcesSumsPodTemplates(t *testing.T) {
+	t.Parallel()
+
+	manifest := `
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: gpu-worker
+spec:
+  replicas: 3
+  template:
+    spec:
+      initContainers:
+        - name: init
+          image: busybox
+          resources:
+            requests:
+              cpu: 100m
+              memory: 128Mi
+      containers:
+        - name: app
+          image: busybox
+          resources:
+            requests:
+              cpu: 500m
+              memory: 1Gi
+              nvidia.com/gpu: "1"
+              nvidia.com/gpumem: "10000"
+            limits:
+              cpu: "1"
+              memory: 2Gi
+              nvidia.com/gpu: "1"
+              nvidia.com/gpumem: "12000"
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: ignored
+`
+	estimate, err := EstimateRenderedManifestResources(manifest)
+	if err != nil {
+		t.Fatalf("EstimateRenderedManifestResources returned error: %v", err)
+	}
+	if estimate.Requests.CPU.Cmp(resource.MustParse("1800m")) != 0 {
+		t.Fatalf("expected requests cpu 1800m, got %s", estimate.Requests.CPU.String())
+	}
+	if estimate.Requests.Memory.Cmp(resource.MustParse("3456Mi")) != 0 {
+		t.Fatalf("expected requests memory 3456Mi, got %s", estimate.Requests.Memory.String())
+	}
+	if estimate.Requests.GPU != 3 {
+		t.Fatalf("expected requests gpu 3, got %d", estimate.Requests.GPU)
+	}
+	if estimate.Requests.GPUMemoryMB != 30000 {
+		t.Fatalf("expected requests gpumem 30000, got %d", estimate.Requests.GPUMemoryMB)
+	}
+	if estimate.Limits.CPU.Cmp(resource.MustParse("3")) != 0 {
+		t.Fatalf("expected limits cpu 3, got %s", estimate.Limits.CPU.String())
+	}
+	if estimate.Limits.Memory.Cmp(resource.MustParse("6Gi")) != 0 {
+		t.Fatalf("expected limits memory 6Gi, got %s", estimate.Limits.Memory.String())
+	}
+	if estimate.Limits.GPU != 3 {
+		t.Fatalf("expected limits gpu 3, got %d", estimate.Limits.GPU)
+	}
+	if estimate.Limits.GPUMemoryMB != 36000 {
+		t.Fatalf("expected limits gpumem 36000, got %d", estimate.Limits.GPUMemoryMB)
+	}
+}
@@ -9,6 +9,10 @@ import (
 
 func normalizeStandardQuotaQuantity(value string) string {
 	value = strings.TrimSpace(value)
+	switch strings.ToLower(value) {
+	case "unlimited", "none", "no-limit", "nolimit":
+		return ""
+	}
 	upper := strings.ToUpper(value)
 	switch {
 	case strings.HasSuffix(upper, "MB"):
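The added branch maps a handful of "no limit" spellings to the empty string. Judging by `bindingQuotaHard` earlier in this commit, an empty value means no entry is written to the hard list, so the corresponding check is skipped. A minimal sketch of just that branch follows. `unlimitedToEmpty` is a hypothetical name, and the unit-suffix handling that follows in the real function is cut off in this hunk and is deliberately not reproduced.

```go
package main

import (
	"fmt"
	"strings"
)

// unlimitedToEmpty returns "" for the case-insensitive keywords that the
// hunk treats as "no limit", and the trimmed input otherwise.
func unlimitedToEmpty(value string) string {
	value = strings.TrimSpace(value)
	switch strings.ToLower(value) {
	case "unlimited", "none", "no-limit", "nolimit":
		return ""
	}
	return value
}

func main() {
	for _, raw := range []string{" Unlimited ", "NONE", "4Gi", ""} {
		fmt.Printf("%q -> %q\n", raw, unlimitedToEmpty(raw))
	}
}
```

Note that an empty input and an explicit "unlimited" end up indistinguishable after this step, which matters for fields where the commit separately rewrites empty values through `zeroIfEmptyQuota`.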
@@ -3,6 +3,7 @@ package service
 import (
 	"context"
 	"sort"
+	"strings"
 	"time"
 
 	"github.com/google/uuid"
@@ -94,17 +95,17 @@ func (s *WorkspaceService) EnsureClusterBinding(ctx context.Context, workspaceID
 		ClusterID: cluster.ID,
 		Namespace: workspace.K8sNamespace,
 		ServiceAccount: workspace.K8sSAName,
-		QuotaCPU: workspace.QuotaCPU,
-		QuotaMemory: workspace.QuotaMemory,
-		QuotaGPU: workspace.QuotaGPU,
-		QuotaGPUMem: workspace.QuotaGPUMem,
+		QuotaCPU: strings.TrimSpace(workspace.QuotaCPU),
+		QuotaMemory: strings.TrimSpace(workspace.QuotaMemory),
+		QuotaGPU: zeroIfEmptyQuota(workspace.QuotaGPU),
+		QuotaGPUMem: zeroIfEmptyQuota(workspace.QuotaGPUMem),
 		Status: "active",
 		CreatedAt: time.Now(),
 		UpdatedAt: time.Now(),
 	}
 	tenantBinding := entity.NewTenantBinding(binding.Namespace)
 	tenantBinding.ServiceAccountName = binding.ServiceAccount
-	tenantBinding.ResourceQuotaHard = resourceQuotaHard(workspace)
+	tenantBinding.ResourceQuotaHard = bindingQuotaHard(binding)
 	if s.tenantClient != nil {
 		if err := s.tenantClient.EnsureTenant(ctx, cluster, tenantBinding); err != nil {
 			return nil, err
@@ -145,10 +146,22 @@ func (s *WorkspaceService) IssueKubeconfig(ctx context.Context, workspaceID, clu
 		if err != nil {
 			return nil, err
 		}
+	} else {
+		binding.QuotaCPU = strings.TrimSpace(workspace.QuotaCPU)
+		binding.QuotaMemory = strings.TrimSpace(workspace.QuotaMemory)
+		binding.QuotaGPU = zeroIfEmptyQuota(workspace.QuotaGPU)
+		binding.QuotaGPUMem = zeroIfEmptyQuota(workspace.QuotaGPUMem)
+		binding.UpdatedAt = time.Now()
 	}
 	tenantBinding := entity.NewTenantBinding(binding.Namespace)
 	tenantBinding.ServiceAccountName = binding.ServiceAccount
-	tenantBinding.ResourceQuotaHard = resourceQuotaHard(workspace)
+	tenantBinding.ResourceQuotaHard = bindingQuotaHard(binding)
+	if s.tenantClient != nil {
+		if err := s.tenantClient.EnsureTenant(ctx, cluster, tenantBinding); err != nil {
+			return nil, err
+		}
+	}
+	_ = s.bindingRepo.Upsert(ctx, binding)
 	kubeconfig, err := s.tenantClient.IssueKubeconfig(ctx, cluster, tenantBinding, ttl)
 	if err != nil {
 		return nil, err
@@ -117,6 +117,7 @@ services:
   nginx:
     image: nginx:1.27-alpine
     container_name: ocdp-nginx
+    restart: unless-stopped
     depends_on:
       frontend-build:
         condition: service_completed_successfully
@@ -3,7 +3,7 @@
  * Export configured API client, generated functions, and friendly aliases.
  */
 
-type AxiosOptions<T extends (...args: any) => any> = Parameters<T>[2];
+type AxiosOptions<T extends (...args: never[]) => unknown> = Parameters<T>[2];
 
 import {
   deleteClustersClusterId,
@@ -143,7 +143,10 @@ export type CreateRegistryRequest = GeneratedCreateRegistryRequest;
 export type UpdateRegistryRequest = GeneratedUpdateRegistryRequest;
 export type RegistryHealthResponse = GeneratedRegistryHealthResponse;
 
-export type InstanceResponse = GeneratedInstanceResponse;
+export type InstanceResponse = GeneratedInstanceResponse & {
+  ownerId?: string;
+  ownerUsername?: string;
+};
 export type CreateInstanceRequest = GeneratedCreateInstanceRequest;
 export type UpdateInstanceRequest = GeneratedUpdateInstanceRequest;
 export type InstanceEntry = GeneratedInstanceEntry;
@@ -242,7 +245,7 @@ export const scaleInstance = (
   instanceId: string,
   body: { replicas: number; workload?: string },
 ) => {
-  return customAxiosInstance<{ instance: any; replicas: number; message: string }>({
+  return customAxiosInstance<{ instance: InstanceResponse; replicas: number; message: string }>({
     url: `/clusters/${encodeURIComponent(clusterId)}/instances/${encodeURIComponent(instanceId)}/scale`,
     method: "POST",
     data: body,
@ -252,7 +255,7 @@ export const getInstanceValuesDiff = (
|
|||||||
clusterId: string,
|
clusterId: string,
|
||||||
instanceId: string,
|
instanceId: string,
|
||||||
) => {
|
) => {
|
||||||
return customAxiosInstance<{ current: Record<string, any>; defaults: Record<string, any> }>({
|
return customAxiosInstance<{ current: Record<string, unknown>; defaults: Record<string, unknown> }>({
|
||||||
url: `/clusters/${encodeURIComponent(clusterId)}/instances/${encodeURIComponent(instanceId)}/values-diff`,
|
url: `/clusters/${encodeURIComponent(clusterId)}/instances/${encodeURIComponent(instanceId)}/values-diff`,
|
||||||
method: "GET",
|
method: "GET",
|
||||||
});
|
});
|
||||||
|
|||||||
@ -71,11 +71,66 @@ import type { ClusterMonitoring, ClusterMonitoringStatus, NodeMetricsResponse }
|
|||||||
|
|
||||||
export type NodeMetrics = NodeMetricsResponse;
|
export type NodeMetrics = NodeMetricsResponse;
|
||||||
|
|
||||||
|
export interface UserResourceUsage {
|
||||||
|
userId?: string;
|
||||||
|
userName?: string;
|
||||||
|
username?: string;
|
||||||
|
namespace?: string;
|
||||||
|
cpuUsed?: string;
|
||||||
|
usedCpu?: string;
|
||||||
|
cpuRequest?: string;
|
||||||
|
cpuLimit?: string;
|
||||||
|
memoryUsed?: string;
|
||||||
|
usedMemory?: string;
|
||||||
|
memoryRequest?: string;
|
||||||
|
memoryLimit?: string;
|
||||||
|
gpuUsed?: number;
|
||||||
|
usedGpu?: number;
|
||||||
|
gpuAllocated?: number;
|
||||||
|
gpuAllocation?: number;
|
||||||
|
gpuMemoryUsed?: string | number;
|
||||||
|
usedGpuMemory?: string | number;
|
||||||
|
gpuMemUsed?: string | number;
|
||||||
|
gpuMemoryAllocated?: string | number;
|
||||||
|
gpuMemAllocated?: string | number;
|
||||||
|
podCount?: number;
|
||||||
|
instanceCount?: number;
|
||||||
|
cpuRequests?: string;
|
||||||
|
cpuLimits?: string;
|
||||||
|
memoryRequests?: string;
|
||||||
|
memoryLimits?: string;
|
||||||
|
gpuRequests?: number;
|
||||||
|
gpuLimits?: number;
|
||||||
|
gpuMemoryRequestsMb?: number;
|
||||||
|
gpuMemoryLimitsMb?: number;
|
||||||
|
}
|
||||||
|
|
||||||
export interface ClusterMetrics extends ClusterMonitoring {
|
export interface ClusterMetrics extends ClusterMonitoring {
|
||||||
/** Internal UI identifier (legacy) */
|
/** Internal UI identifier (legacy) */
|
||||||
id?: string;
|
id?: string;
|
||||||
nodes?: NodeMetrics[];
|
nodes?: NodeMetrics[];
|
||||||
status?: ClusterMonitoringStatus | 'warning' | 'error';
|
status?: ClusterMonitoringStatus | 'warning' | 'error';
|
||||||
|
allocatedGpu?: number;
|
||||||
|
allocatedGpuMemoryMb?: number;
|
||||||
|
allocatedGpuMemoryMB?: number;
|
||||||
|
gpuMemoryRequestsMb?: number;
|
||||||
|
gpuMemoryLimitsMb?: number;
|
||||||
|
gpuAllocated?: number;
|
||||||
|
gpuAllocation?: number;
|
||||||
|
cpuRequests?: string;
|
||||||
|
cpuLimits?: string;
|
||||||
|
memoryRequests?: string;
|
||||||
|
memoryLimits?: string;
|
||||||
|
totalGpuMemory?: string | number;
|
||||||
|
usedGpuMemory?: string | number;
|
||||||
|
gpuMemoryUsed?: string | number;
|
||||||
|
totalGpuMem?: string | number;
|
||||||
|
usedGpuMem?: string | number;
|
||||||
|
userResources?: UserResourceUsage[];
|
||||||
|
resourceUsageByUser?: UserResourceUsage[];
|
||||||
|
userResourceUsage?: UserResourceUsage[];
|
||||||
|
resourcesByUser?: UserResourceUsage[];
|
||||||
|
userResourceRows?: UserResourceUsage[];
|
||||||
}
|
}
|
||||||
|
|
||||||
// ==================== Common Types ====================
|
// ==================== Common Types ====================
|
||||||
|
|||||||
@ -5,7 +5,7 @@
|
|||||||
import React from "react";
|
import React from "react";
|
||||||
import {
|
import {
|
||||||
Box, Settings, StopCircle, CheckCircle, XCircle, Clock,
|
Box, Settings, StopCircle, CheckCircle, XCircle, Clock,
|
||||||
Network, Activity, GitBranch, Layers,
|
Network, Activity, GitBranch, Layers, User,
|
||||||
AlertTriangle, HelpCircle, Minus, Plus, Loader2,
|
AlertTriangle, HelpCircle, Minus, Plus, Loader2,
|
||||||
} from "lucide-react";
|
} from "lucide-react";
|
||||||
import type { InstanceResponse } from "@/api";
|
import type { InstanceResponse } from "@/api";
|
||||||
@ -116,6 +116,7 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
const version = instance.version || "—";
|
const version = instance.version || "—";
|
||||||
const namespace = instance.namespace || "default";
|
const namespace = instance.namespace || "default";
|
||||||
const revision = instance.revision ?? "—";
|
const revision = instance.revision ?? "—";
|
||||||
|
const ownerLabel = ownerDisplayName(instance.ownerUsername, instance.ownerId);
|
||||||
|
|
||||||
const currentReplicas: number = instance.replicas ?? 0;
|
const currentReplicas: number = instance.replicas ?? 0;
|
||||||
|
|
||||||
@ -150,7 +151,7 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
};
|
};
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="hover-lift group relative bg-white border border-slate-200 rounded-lg flex items-center gap-3 px-4 py-3 transition-all">
|
<div className="hover-lift group relative bg-white border border-slate-200 rounded-lg flex flex-col gap-3 px-4 py-3 transition-all lg:flex-row lg:items-center">
|
||||||
{/* Left color bar (status) */}
|
{/* Left color bar (status) */}
|
||||||
<div className={`self-stretch w-1 rounded-full flex-shrink-0 ${statusInfo.bg} border ${statusInfo.border}`} />
|
<div className={`self-stretch w-1 rounded-full flex-shrink-0 ${statusInfo.bg} border ${statusInfo.border}`} />
|
||||||
|
|
||||||
@ -161,10 +162,10 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Name + Chart info */}
|
{/* Name + Chart info */}
|
||||||
<div className="flex-1 min-w-0 flex items-center gap-4">
|
<div className="w-full min-w-0 flex-1 flex items-center gap-4 lg:w-auto">
|
||||||
<div className="min-w-0">
|
<div className="min-w-0">
|
||||||
<h4 className="text-sm font-semibold text-slate-900 truncate">{instanceName}</h4>
|
<h4 className="text-sm font-semibold text-slate-900 truncate">{instanceName}</h4>
|
||||||
<div className="flex items-center gap-3 text-xs text-slate-500 mt-0.5">
|
<div className="flex flex-wrap items-center gap-x-3 gap-y-1 text-xs text-slate-500 mt-0.5">
|
||||||
<span className="flex items-center gap-1">
|
<span className="flex items-center gap-1">
|
||||||
<Box className="w-3 h-3" />
|
<Box className="w-3 h-3" />
|
||||||
<span className="truncate max-w-[200px]">{chart}:{version}</span>
|
<span className="truncate max-w-[200px]">{chart}:{version}</span>
|
||||||
@ -177,6 +178,12 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
<GitBranch className="w-3 h-3" />
|
<GitBranch className="w-3 h-3" />
|
||||||
rev{revision}
|
rev{revision}
|
||||||
</span>
|
</span>
|
||||||
|
{ownerLabel && (
|
||||||
|
<span className="flex min-w-0 items-center gap-1">
|
||||||
|
<User className="w-3 h-3" />
|
||||||
|
<span className="truncate max-w-[120px]">{ownerLabel}</span>
|
||||||
|
</span>
|
||||||
|
)}
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -196,7 +203,7 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
)}
|
)}
|
||||||
|
|
||||||
{/* Scale controls */}
|
{/* Scale controls */}
|
||||||
<div className="flex items-center gap-0.5 flex-shrink-0">
|
<div className="flex w-full items-center justify-end gap-0.5 flex-shrink-0 lg:w-auto">
|
||||||
{canScale ? (
|
{canScale ? (
|
||||||
<>
|
<>
|
||||||
<button
|
<button
|
||||||
@ -229,7 +236,7 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Action buttons */}
|
{/* Action buttons */}
|
||||||
<div className="flex items-center gap-1.5 flex-shrink-0">
|
<div className="flex w-full flex-wrap items-center justify-end gap-1.5 flex-shrink-0 lg:w-auto">
|
||||||
<button
|
<button
|
||||||
onClick={() => onViewEntries(instance)}
|
onClick={() => onViewEntries(instance)}
|
||||||
className="flex items-center gap-1 px-2 py-1.5 rounded-md text-slate-500 hover:text-blue-600 hover:bg-blue-50 transition-colors text-xs font-medium"
|
className="flex items-center gap-1 px-2 py-1.5 rounded-md text-slate-500 hover:text-blue-600 hover:bg-blue-50 transition-colors text-xs font-medium"
|
||||||
@ -266,3 +273,12 @@ export const InstanceCard: React.FC<InstanceCardProps> = ({
|
|||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
};
|
};
|
||||||
|
|
||||||
|
const ownerDisplayName = (ownerUsername?: string, ownerId?: string): string => {
|
||||||
|
const username = ownerUsername?.trim();
|
||||||
|
if (username) return username;
|
||||||
|
const id = ownerId?.trim();
|
||||||
|
if (!id) return "";
|
||||||
|
if (id.length <= 12) return id;
|
||||||
|
return `${id.slice(0, 8)}...${id.slice(-4)}`;
|
||||||
|
};
|
||||||
|
|||||||
@ -35,7 +35,6 @@ export const ModifyModal: React.FC<ModifyModalProps> = ({
|
|||||||
const [valuesYaml, setValuesYaml] = useState("");
|
const [valuesYaml, setValuesYaml] = useState("");
|
||||||
const [loading, setLoading] = useState(false);
|
const [loading, setLoading] = useState(false);
|
||||||
const [error, setError] = useState<string | null>(null);
|
const [error, setError] = useState<string | null>(null);
|
||||||
const [originalValuesYaml, setOriginalValuesYaml] = useState("");
|
|
||||||
const [modifiedKeys, setModifiedKeys] = useState<string[]>([]);
|
const [modifiedKeys, setModifiedKeys] = useState<string[]>([]);
|
||||||
|
|
||||||
// Values Diff support
|
// Values Diff support
|
||||||
@ -60,7 +59,6 @@ export const ModifyModal: React.FC<ModifyModalProps> = ({
|
|||||||
if (data?.current && Object.keys(data.current).length > 0) {
|
if (data?.current && Object.keys(data.current).length > 0) {
|
||||||
const currentYaml = stringifyYaml(data.current, { lineWidth: 0 });
|
const currentYaml = stringifyYaml(data.current, { lineWidth: 0 });
|
||||||
setValuesYaml(currentYaml);
|
setValuesYaml(currentYaml);
|
||||||
setOriginalValuesYaml(currentYaml);
|
|
||||||
setDiffData({ current: data.current, defaults: data.defaults ?? {} });
|
setDiffData({ current: data.current, defaults: data.defaults ?? {} });
|
||||||
}
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
@ -71,7 +69,6 @@ export const ModifyModal: React.FC<ModifyModalProps> = ({
|
|||||||
if (detail.values && Object.keys(detail.values).length > 0) {
|
if (detail.values && Object.keys(detail.values).length > 0) {
|
||||||
const y = stringifyYaml(detail.values, { lineWidth: 0 });
|
const y = stringifyYaml(detail.values, { lineWidth: 0 });
|
||||||
setValuesYaml(y);
|
setValuesYaml(y);
|
||||||
setOriginalValuesYaml(y);
|
|
||||||
}
|
}
|
||||||
} catch (err2) {
|
} catch (err2) {
|
||||||
console.error('[ModifyModal] Failed to load instance detail:', err2);
|
console.error('[ModifyModal] Failed to load instance detail:', err2);
|
||||||
@ -143,29 +140,18 @@ export const ModifyModal: React.FC<ModifyModalProps> = ({
|
|||||||
});
|
});
|
||||||
};
|
};
|
||||||
|
|
||||||
const computeDeltaValues = (): Record<string, any> | undefined => {
|
|
||||||
if (!valuesYaml.trim()) return undefined;
|
|
||||||
try {
|
|
||||||
const current = parseYaml(valuesYaml);
|
|
||||||
if (!originalValuesYaml) return current; // no original to compare against
|
|
||||||
const original = parseYaml(originalValuesYaml);
|
|
||||||
return diffObjects(current, original);
|
|
||||||
} catch {
|
|
||||||
return parseValuesYaml(valuesYaml);
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
const handleSubmit = async (e: React.FormEvent) => {
|
const handleSubmit = async (e: React.FormEvent) => {
|
||||||
e.preventDefault();
|
e.preventDefault();
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
setError(null);
|
setError(null);
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const deltaValues = computeDeltaValues();
|
if (valuesYaml.trim()) {
|
||||||
|
parseValuesYaml(valuesYaml);
|
||||||
|
}
|
||||||
const payload: UpdateInstanceRequest = {
|
const payload: UpdateInstanceRequest = {
|
||||||
version: tag && tag !== instance.version ? tag : undefined,
|
version: tag && tag !== instance.version ? tag : undefined,
|
||||||
description: description.trim() || undefined,
|
description: description.trim() || undefined,
|
||||||
values: deltaValues,
|
|
||||||
valuesYaml: valuesYaml.trim() || undefined,
|
valuesYaml: valuesYaml.trim() || undefined,
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -268,7 +254,7 @@ export const ModifyModal: React.FC<ModifyModalProps> = ({
|
|||||||
Configuration Values
|
Configuration Values
|
||||||
</label>
|
</label>
|
||||||
<p className="text-xs text-slate-500">
|
<p className="text-xs text-slate-500">
|
||||||
Editing current deployed values. Only modified keys are sent to the server.
|
Editing current deployed values. The full YAML is submitted so nested chart values stay intact.
|
||||||
</p>
|
</p>
|
||||||
{modifiedKeys.length > 0 && (
|
{modifiedKeys.length > 0 && (
|
||||||
<div className="flex flex-wrap items-center gap-1.5 text-xs">
|
<div className="flex flex-wrap items-center gap-1.5 text-xs">
|
||||||
@ -368,27 +354,6 @@ export const ModifyModal: React.FC<ModifyModalProps> = ({
|
|||||||
);
|
);
|
||||||
};
|
};
|
||||||
|
|
||||||
/** Deep diff: returns only keys in `current` whose values differ from `original` */
|
|
||||||
const diffObjects = (current: Record<string, any>, original: Record<string, any>): Record<string, any> => {
|
|
||||||
const delta: Record<string, any> = {};
|
|
||||||
for (const key of Object.keys(current)) {
|
|
||||||
const cv = current[key];
|
|
||||||
const ov = original[key];
|
|
||||||
if (JSON.stringify(cv) !== JSON.stringify(ov)) {
|
|
||||||
if (typeof cv === 'object' && cv !== null && !Array.isArray(cv) &&
|
|
||||||
typeof ov === 'object' && ov !== null && !Array.isArray(ov)) {
|
|
||||||
const nested = diffObjects(cv, ov);
|
|
||||||
if (Object.keys(nested).length > 0) {
|
|
||||||
delta[key] = nested;
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
delta[key] = cv;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return delta;
|
|
||||||
};
|
|
||||||
|
|
||||||
const parseValuesYaml = (source: string): Record<string, any> => {
|
const parseValuesYaml = (source: string): Record<string, any> => {
|
||||||
const parsed = parseYaml(source);
|
const parsed = parseYaml(source);
|
||||||
if (parsed == null) {
|
if (parsed == null) {
|
||||||
|
|||||||
@ -222,7 +222,7 @@ export const LaunchModal: React.FC<LaunchModalProps> = ({
|
|||||||
valuesObj = pruneEmptyValues(valuesForm);
|
valuesObj = pruneEmptyValues(valuesForm);
|
||||||
} else if (inputMethod === "yaml" && valuesYaml.trim()) {
|
} else if (inputMethod === "yaml" && valuesYaml.trim()) {
|
||||||
try {
|
try {
|
||||||
valuesObj = parseValuesYaml(valuesYaml);
|
parseValuesYaml(valuesYaml);
|
||||||
normalizedValuesYaml = valuesYaml.trim();
|
normalizedValuesYaml = valuesYaml.trim();
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error(err);
|
console.error(err);
|
||||||
|
|||||||
@ -1,11 +1,13 @@
|
|||||||
import React, { useEffect, useMemo, useState } from "react";
|
import React, { useCallback, useEffect, useMemo, useState } from "react";
|
||||||
import { Gauge, KeyRound, Pencil, RefreshCw, Shield, Trash2, UserPlus, Users, X } from "lucide-react";
|
import { Gauge, KeyRound, Pencil, RefreshCw, Shield, Trash2, UserPlus, Users, X, type LucideIcon } from "lucide-react";
|
||||||
import { createUser, listClusters, listUsers, updateUser, deleteUser, type ClusterResponse, type UserResponse } from "@/api";
|
import { createUser, listClusters, listUsers, updateUser, deleteUser, type ClusterResponse, type UserResponse } from "@/api";
|
||||||
import { useToast } from "@/shared";
|
import { useToast } from "@/shared";
|
||||||
import { Button, Input, Badge, LoadingState } from "@/shared/components";
|
import { Button, Input, Badge, LoadingState } from "@/shared/components";
|
||||||
import { formatApiError } from "@/shared/utils";
|
import { formatApiError } from "@/shared/utils";
|
||||||
import { useAuth } from "@/app/providers";
|
import { useAuth } from "@/app/providers";
|
||||||
|
|
||||||
|
type LimitsEditorMode = "edit" | "downgrade";
|
||||||
|
|
||||||
const UserManagementPage: React.FC = () => {
|
const UserManagementPage: React.FC = () => {
|
||||||
const { user } = useAuth();
|
const { user } = useAuth();
|
||||||
const { success, error: toastError } = useToast();
|
const { success, error: toastError } = useToast();
|
||||||
@ -18,8 +20,8 @@ const UserManagementPage: React.FC = () => {
|
|||||||
const [role, setRole] = useState("user");
|
const [role, setRole] = useState("user");
|
||||||
const [namespace, setNamespace] = useState("");
|
const [namespace, setNamespace] = useState("");
|
||||||
const [defaultClusterId, setDefaultClusterId] = useState("");
|
const [defaultClusterId, setDefaultClusterId] = useState("");
|
||||||
const [quotaCpu, setQuotaCpu] = useState("4");
|
const [quotaCpu, setQuotaCpu] = useState("");
|
||||||
const [quotaMemory, setQuotaMemory] = useState("16Gi");
|
const [quotaMemory, setQuotaMemory] = useState("");
|
||||||
const [quotaGpu, setQuotaGpu] = useState("0");
|
const [quotaGpu, setQuotaGpu] = useState("0");
|
||||||
const [quotaGpuMemory, setQuotaGpuMemory] = useState("0");
|
const [quotaGpuMemory, setQuotaGpuMemory] = useState("0");
|
||||||
const [mustChangePassword, setMustChangePassword] = useState(true);
|
const [mustChangePassword, setMustChangePassword] = useState(true);
|
||||||
@ -31,13 +33,14 @@ const UserManagementPage: React.FC = () => {
|
|||||||
const [editQuotaGpu, setEditQuotaGpu] = useState("");
|
const [editQuotaGpu, setEditQuotaGpu] = useState("");
|
||||||
const [editQuotaGpuMemory, setEditQuotaGpuMemory] = useState("");
|
const [editQuotaGpuMemory, setEditQuotaGpuMemory] = useState("");
|
||||||
const [savingLimits, setSavingLimits] = useState(false);
|
const [savingLimits, setSavingLimits] = useState(false);
|
||||||
|
const [limitsEditorMode, setLimitsEditorMode] = useState<LimitsEditorMode>("edit");
|
||||||
|
|
||||||
const sortedUsers = useMemo(
|
const sortedUsers = useMemo(
|
||||||
() => [...users].sort((a, b) => (a.username ?? "").localeCompare(b.username ?? "")),
|
() => [...users].sort((a, b) => (a.username ?? "").localeCompare(b.username ?? "")),
|
||||||
[users]
|
[users]
|
||||||
);
|
);
|
||||||
|
|
||||||
const loadUsers = async () => {
|
const loadUsers = useCallback(async () => {
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
try {
|
try {
|
||||||
setUsers(await listUsers());
|
setUsers(await listUsers());
|
||||||
@ -46,9 +49,9 @@ const UserManagementPage: React.FC = () => {
|
|||||||
} finally {
|
} finally {
|
||||||
setLoading(false);
|
setLoading(false);
|
||||||
}
|
}
|
||||||
};
|
}, [toastError]);
|
||||||
|
|
||||||
const loadClusters = async () => {
|
const loadClusters = useCallback(async () => {
|
||||||
try {
|
try {
|
||||||
const data = await listClusters();
|
const data = await listClusters();
|
||||||
const available = data.filter((cluster) => typeof cluster.id === "string" && cluster.id.length > 0);
|
const available = data.filter((cluster) => typeof cluster.id === "string" && cluster.id.length > 0);
|
||||||
@ -57,12 +60,12 @@ const UserManagementPage: React.FC = () => {
|
|||||||
} catch (err) {
|
} catch (err) {
|
||||||
toastError(formatApiError(err) || "Failed to load clusters");
|
toastError(formatApiError(err) || "Failed to load clusters");
|
||||||
}
|
}
|
||||||
};
|
}, [toastError]);
|
||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
void loadUsers();
|
void loadUsers();
|
||||||
void loadClusters();
|
void loadClusters();
|
||||||
}, []);
|
}, [loadClusters, loadUsers]);
|
||||||
|
|
||||||
const handleCreate = async (event: React.FormEvent) => {
|
const handleCreate = async (event: React.FormEvent) => {
|
||||||
event.preventDefault();
|
event.preventDefault();
|
||||||
@ -96,8 +99,8 @@ const UserManagementPage: React.FC = () => {
|
|||||||
setRole("user");
|
setRole("user");
|
||||||
setNamespace("");
|
setNamespace("");
|
||||||
setDefaultClusterId(clusters[0]?.id || "");
|
setDefaultClusterId(clusters[0]?.id || "");
|
||||||
setQuotaCpu("4");
|
setQuotaCpu("");
|
||||||
setQuotaMemory("16Gi");
|
setQuotaMemory("");
|
||||||
setQuotaGpu("0");
|
setQuotaGpu("0");
|
||||||
setQuotaGpuMemory("0");
|
setQuotaGpuMemory("0");
|
||||||
setMustChangePassword(true);
|
setMustChangePassword(true);
|
||||||
@ -135,6 +138,10 @@ const UserManagementPage: React.FC = () => {
|
|||||||
const toggleRole = async (target: UserResponse) => {
|
const toggleRole = async (target: UserResponse) => {
|
||||||
if (!target.id) return;
|
if (!target.id) return;
|
||||||
const nextRole = target.role === "admin" ? "user" : "admin";
|
const nextRole = target.role === "admin" ? "user" : "admin";
|
||||||
|
if (nextRole === "user") {
|
||||||
|
openLimitsEditor(target, "downgrade");
|
||||||
|
return;
|
||||||
|
}
|
||||||
try {
|
try {
|
||||||
await updateUser(target.id, { role: nextRole });
|
await updateUser(target.id, { role: nextRole });
|
||||||
success("User role updated");
|
success("User role updated");
|
||||||
@ -156,12 +163,13 @@ const UserManagementPage: React.FC = () => {
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
const openLimitsEditor = (target: UserResponse) => {
|
const openLimitsEditor = (target: UserResponse, mode: LimitsEditorMode = "edit") => {
|
||||||
|
setLimitsEditorMode(mode);
|
||||||
setEditingLimits(target);
|
setEditingLimits(target);
|
||||||
setEditNamespace(target.namespace || namespaceForUsername(target.username || ""));
|
setEditNamespace(target.namespace || namespaceForUsername(target.username || ""));
|
||||||
setEditDefaultClusterId(target.defaultClusterId || clusters[0]?.id || "");
|
setEditDefaultClusterId(target.defaultClusterId || clusters[0]?.id || "");
|
||||||
setEditQuotaCpu(target.quotaCpu || "4");
|
setEditQuotaCpu(target.quotaCpu || "");
|
||||||
setEditQuotaMemory(target.quotaMemory || "16Gi");
|
setEditQuotaMemory(target.quotaMemory || "");
|
||||||
setEditQuotaGpu(target.quotaGpu || "0");
|
setEditQuotaGpu(target.quotaGpu || "0");
|
||||||
setEditQuotaGpuMemory(target.quotaGpuMemory || "0");
|
setEditQuotaGpuMemory(target.quotaGpuMemory || "0");
|
||||||
};
|
};
|
||||||
@ -172,14 +180,15 @@ const UserManagementPage: React.FC = () => {
|
|||||||
setSavingLimits(true);
|
setSavingLimits(true);
|
||||||
try {
|
try {
|
||||||
await updateUser(editingLimits.id, {
|
await updateUser(editingLimits.id, {
|
||||||
|
...(limitsEditorMode === "downgrade" ? { role: "user" } : {}),
|
||||||
namespace: editNamespace.trim(),
|
namespace: editNamespace.trim(),
|
||||||
defaultClusterId: editDefaultClusterId.trim(),
|
defaultClusterId: editDefaultClusterId.trim(),
|
||||||
quotaCpu: editQuotaCpu.trim(),
|
quotaCpu: quotaForApi(editQuotaCpu, true),
|
||||||
quotaMemory: editQuotaMemory.trim(),
|
quotaMemory: quotaForApi(editQuotaMemory, true),
|
||||||
quotaGpu: editQuotaGpu.trim(),
|
quotaGpu: editQuotaGpu.trim(),
|
||||||
quotaGpuMemory: editQuotaGpuMemory.trim(),
|
quotaGpuMemory: editQuotaGpuMemory.trim(),
|
||||||
});
|
});
|
||||||
success("User limits updated");
|
success(limitsEditorMode === "downgrade" ? "User role and limits updated" : "User limits updated");
|
||||||
setEditingLimits(null);
|
setEditingLimits(null);
|
||||||
await loadUsers();
|
await loadUsers();
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
@ -190,7 +199,7 @@ const UserManagementPage: React.FC = () => {
|
|||||||
};
|
};
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="mx-auto max-w-7xl px-4 py-6 sm:px-6 lg:px-8">
|
<div className="mx-auto max-w-screen-2xl px-4 py-6 sm:px-6 lg:px-8">
|
||||||
<div className="mb-6 flex flex-col gap-3 sm:flex-row sm:items-end sm:justify-between">
|
<div className="mb-6 flex flex-col gap-3 sm:flex-row sm:items-end sm:justify-between">
|
||||||
<div>
|
<div>
|
||||||
<div className="mb-2 flex items-center gap-2 text-sm font-medium text-blue-700">
|
<div className="mb-2 flex items-center gap-2 text-sm font-medium text-blue-700">
|
||||||
@ -207,7 +216,7 @@ const UserManagementPage: React.FC = () => {
|
|||||||
</Button>
|
</Button>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div className="grid gap-6 lg:grid-cols-[360px_1fr]">
|
<div className="grid gap-6 xl:grid-cols-[380px_minmax(0,1fr)]">
|
||||||
<form onSubmit={handleCreate} className="rounded-lg border border-slate-200 bg-white p-5 shadow-soft">
|
<form onSubmit={handleCreate} className="rounded-lg border border-slate-200 bg-white p-5 shadow-soft">
|
||||||
<div className="mb-4 flex items-center gap-2">
|
<div className="mb-4 flex items-center gap-2">
|
||||||
<UserPlus className="h-5 w-5 text-blue-600" />
|
<UserPlus className="h-5 w-5 text-blue-600" />
|
||||||
@ -281,11 +290,11 @@ const UserManagementPage: React.FC = () => {
|
|||||||
<div className="grid grid-cols-2 gap-3">
|
<div className="grid grid-cols-2 gap-3">
|
||||||
<label className="block text-sm font-medium text-slate-700">
|
<label className="block text-sm font-medium text-slate-700">
|
||||||
CPU
|
CPU
|
||||||
<Input value={quotaCpu} onChange={(e) => setQuotaCpu(e.target.value)} className="mt-1" placeholder="4" />
|
<Input value={quotaCpu} onChange={(e) => setQuotaCpu(e.target.value)} className="mt-1" placeholder="Unlimited" />
|
||||||
</label>
|
</label>
|
||||||
<label className="block text-sm font-medium text-slate-700">
|
<label className="block text-sm font-medium text-slate-700">
|
||||||
Memory
|
Memory
|
||||||
<Input value={quotaMemory} onChange={(e) => setQuotaMemory(e.target.value)} className="mt-1" placeholder="16Gi" />
|
<Input value={quotaMemory} onChange={(e) => setQuotaMemory(e.target.value)} className="mt-1" placeholder="Unlimited" />
|
||||||
</label>
|
</label>
|
||||||
<label className="block text-sm font-medium text-slate-700">
|
<label className="block text-sm font-medium text-slate-700">
|
||||||
GPU
|
GPU
|
||||||
@ -297,7 +306,7 @@ const UserManagementPage: React.FC = () => {
|
|||||||
</label>
|
</label>
|
||||||
</div>
|
</div>
|
||||||
<p className="mt-2 text-xs text-slate-500">
|
<p className="mt-2 text-xs text-slate-500">
|
||||||
CPU and memory use Kubernetes quantities. GPU memory is an integer MB value, for example 10000.
|
Leave CPU or memory blank for no platform quota. GPU and GPU memory default to 0. GPU memory is integer MB, for example 10000.
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
@ -327,90 +336,87 @@ const UserManagementPage: React.FC = () => {
|
|||||||
{loading ? (
|
{loading ? (
|
||||||
<LoadingState message="Loading users..." />
|
<LoadingState message="Loading users..." />
|
||||||
) : (
|
) : (
|
||||||
<div className="overflow-x-auto">
|
<div className="divide-y divide-slate-100">
|
||||||
<table className="min-w-full divide-y divide-slate-200 text-sm">
|
{sortedUsers.map((target) => (
|
||||||
<thead className="bg-slate-50 text-left text-xs font-semibold uppercase tracking-wide text-slate-500">
|
<article key={target.id} className="min-w-0 overflow-hidden px-4 py-4 hover:bg-slate-50 sm:px-5">
|
||||||
<tr>
|
<div className="flex min-w-0 flex-col gap-4 xl:flex-row xl:items-start xl:justify-between">
|
||||||
<th className="px-5 py-3">User</th>
|
<div className="min-w-0">
|
||||||
<th className="px-5 py-3">Role</th>
|
<div className="flex flex-wrap items-center gap-2">
|
||||||
<th className="px-5 py-3">Status</th>
|
<h3 className="min-w-0 truncate text-sm font-semibold text-slate-950">{target.username}</h3>
|
||||||
<th className="px-5 py-3">Namespace</th>
|
|
||||||
<th className="px-5 py-3">Quota</th>
|
|
||||||
<th className="sticky right-0 z-10 bg-slate-50 px-5 py-3 text-right shadow-[-12px_0_18px_-18px_rgba(15,23,42,0.35)]">
|
|
||||||
Actions
|
|
||||||
</th>
|
|
||||||
</tr>
|
|
||||||
</thead>
|
|
||||||
<tbody className="divide-y divide-slate-100">
|
|
||||||
{sortedUsers.map((target) => (
|
|
||||||
<tr key={target.id} className="group hover:bg-slate-50">
|
|
||||||
<td className="px-5 py-3">
|
|
||||||
<div className="font-medium text-slate-900">{target.username}</div>
|
|
||||||
<div className="text-xs text-slate-500">{target.email}</div>
|
|
||||||
</td>
|
|
||||||
<td className="px-5 py-3">
|
|
||||||
<Badge variant={target.role === "admin" ? "info" : "secondary"} size="sm">
|
<Badge variant={target.role === "admin" ? "info" : "secondary"} size="sm">
|
||||||
{target.role}
|
{target.role}
|
||||||
</Badge>
|
</Badge>
|
||||||
</td>
|
|
||||||
<td className="px-5 py-3">
|
|
||||||
<Badge variant={target.isActive ? "success" : "warning"} size="sm">
|
<Badge variant={target.isActive ? "success" : "warning"} size="sm">
|
||||||
{target.isActive ? "Active" : "Disabled"}
|
{target.isActive ? "Active" : "Disabled"}
|
||||||
</Badge>
|
</Badge>
|
||||||
</td>
|
</div>
|
||||||
<td className="px-5 py-3">
|
<div className="mt-1 truncate text-xs text-slate-500">{target.email || target.id}</div>
|
||||||
<div className="font-mono text-xs text-slate-700">{target.namespace || "-"}</div>
|
</div>
|
||||||
<div className="text-xs text-slate-500">{target.workspaceName || target.workspaceId}</div>
|
|
||||||
{target.defaultClusterId && (
|
<div className="grid w-full min-w-0 grid-cols-2 gap-2 sm:grid-cols-4 xl:w-auto xl:min-w-[360px] xl:max-w-[480px]">
|
||||||
<div className="mt-1 text-xs text-blue-700">{clusterName(clusters, target.defaultClusterId)}</div>
|
<ActionButton onClick={() => toggleRole(target)} title={`Make ${target.role === "admin" ? "User" : "Admin"}`}>
|
||||||
)}
|
{target.role === "admin" ? "To User" : "To Admin"}
|
||||||
</td>
|
</ActionButton>
|
||||||
<td className="px-5 py-3 text-xs text-slate-600">
|
{target.role !== "admin" && (
|
||||||
{target.role === "admin" ? (
|
<ActionButton variant="secondary" icon={Pencil} onClick={() => openLimitsEditor(target)}>
|
||||||
<span className="text-slate-400">default workspace</span>
|
Limits
|
||||||
) : (
|
</ActionButton>
|
||||||
<div className="grid gap-1">
|
)}
|
||||||
<span>CPU {target.quotaCpu || "-"}</span>
|
<ActionButton
|
||||||
<span>Mem {target.quotaMemory || "-"}</span>
|
variant="secondary"
|
||||||
<span>GPU {target.quotaGpu || "0"} / Mem {target.quotaGpuMemory || "0"}</span>
|
onClick={() => toggleActive(target)}
|
||||||
</div>
|
disabled={target.id === user?.userId}
|
||||||
)}
|
>
|
||||||
</td>
|
{target.isActive ? "Disable" : "Enable"}
|
||||||
<td className="sticky right-0 bg-white px-5 py-3 shadow-[-12px_0_18px_-18px_rgba(15,23,42,0.35)] group-hover:bg-slate-50">
|
</ActionButton>
|
||||||
<div className="grid w-[260px] grid-cols-2 gap-2">
|
<ActionButton
|
||||||
<Button type="button" variant="ghost" size="sm" onClick={() => toggleRole(target)}>
|
variant="danger"
|
||||||
Make {target.role === "admin" ? "User" : "Admin"}
|
icon={Trash2}
|
||||||
</Button>
|
onClick={() => handleDelete(target)}
|
||||||
{target.role !== "admin" && (
|
disabled={target.id === user?.userId}
|
||||||
<Button type="button" variant="secondary" size="sm" icon={Pencil} onClick={() => openLimitsEditor(target)}>
|
>
|
||||||
Limits
|
Delete
|
||||||
</Button>
|
</ActionButton>
|
||||||
)}
|
</div>
|
||||||
<Button
|
</div>
|
||||||
type="button"
|
|
||||||
variant="secondary"
|
<div className="mt-4 grid min-w-0 gap-4 lg:grid-cols-[minmax(240px,0.85fr)_minmax(0,1.15fr)]">
|
||||||
size="sm"
|
<div className="min-w-0 rounded-md border border-slate-200 bg-slate-50 px-3 py-2">
|
||||||
onClick={() => toggleActive(target)}
|
<div className="grid gap-1 text-xs">
|
||||||
disabled={target.id === user?.userId}
|
<div className="flex min-w-0 items-center justify-between gap-3">
|
||||||
>
|
<span className="shrink-0 text-slate-500">Namespace</span>
|
||||||
{target.isActive ? "Disable" : "Enable"}
|
<span className="min-w-0 truncate font-mono text-slate-800">{target.namespace || "-"}</span>
|
||||||
</Button>
|
|
||||||
<Button
|
|
||||||
type="button"
|
|
||||||
variant="danger"
|
|
||||||
size="sm"
|
|
||||||
icon={Trash2}
|
|
||||||
onClick={() => handleDelete(target)}
|
|
||||||
disabled={target.id === user?.userId}
|
|
||||||
>
|
|
||||||
Delete
|
|
||||||
</Button>
|
|
||||||
</div>
|
</div>
|
||||||
</td>
|
<div className="flex min-w-0 items-center justify-between gap-3">
|
||||||
</tr>
|
<span className="shrink-0 text-slate-500">Workspace</span>
|
||||||
))}
|
<span className="min-w-0 truncate text-slate-700">{target.workspaceName || target.workspaceId || "-"}</span>
|
||||||
</tbody>
|
</div>
|
||||||
</table>
|
<div className="flex min-w-0 items-center justify-between gap-3">
|
||||||
|
<span className="shrink-0 text-slate-500">Cluster</span>
|
||||||
|
<span className="min-w-0 truncate text-blue-700">
|
||||||
|
{target.defaultClusterId ? clusterName(clusters, target.defaultClusterId) : "-"}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div className="min-w-0">
|
||||||
|
{target.role === "admin" ? (
|
||||||
|
<div className="inline-flex rounded-full border border-slate-200 bg-slate-50 px-3 py-1 text-xs text-slate-500">
|
||||||
|
default workspace
|
||||||
|
</div>
|
||||||
|
) : (
|
||||||
|
<div className="grid min-w-0 grid-cols-2 gap-2 sm:grid-cols-4">
|
||||||
|
{quotaChip("CPU", target.quotaCpu || "Unlimited")}
|
||||||
|
{quotaChip("Memory", target.quotaMemory || "Unlimited")}
|
||||||
|
{quotaChip("GPU", target.quotaGpu || "0")}
|
||||||
|
{quotaChip("GPU Mem", target.quotaGpuMemory || "0")}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</article>
|
||||||
|
))}
|
||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
</section>
|
</section>
|
||||||
@ -422,10 +428,14 @@ const UserManagementPage: React.FC = () => {
|
|||||||
<div>
|
<div>
|
||||||
<div className="flex items-center gap-2 text-sm font-semibold text-blue-700">
|
<div className="flex items-center gap-2 text-sm font-semibold text-blue-700">
|
||||||
<Gauge className="h-4 w-4" />
|
<Gauge className="h-4 w-4" />
|
||||||
Tenant limits
|
{limitsEditorMode === "downgrade" ? "Convert admin to user" : "Tenant limits"}
|
||||||
</div>
|
</div>
|
||||||
<h2 className="mt-1 text-xl font-semibold text-slate-950">{editingLimits.username}</h2>
|
<h2 className="mt-1 text-xl font-semibold text-slate-950">{editingLimits.username}</h2>
|
||||||
<p className="mt-1 text-sm text-slate-500">Changes are applied to workspace metadata and the next tenant binding/deploy refreshes Kubernetes ResourceQuota.</p>
|
<p className="mt-1 text-sm text-slate-500">
|
||||||
|
{limitsEditorMode === "downgrade"
|
||||||
|
? "Set the tenant namespace, default cluster, and resource quota before applying the user role."
|
||||||
|
: "Changes are applied to workspace metadata and the next tenant binding/deploy refreshes Kubernetes ResourceQuota."}
|
||||||
|
</p>
|
||||||
</div>
|
</div>
|
||||||
<button type="button" onClick={() => setEditingLimits(null)} className="rounded-lg p-2 text-slate-500 hover:bg-slate-100 hover:text-slate-900">
|
<button type="button" onClick={() => setEditingLimits(null)} className="rounded-lg p-2 text-slate-500 hover:bg-slate-100 hover:text-slate-900">
|
||||||
<X className="h-5 w-5" />
|
<X className="h-5 w-5" />
|
||||||
@ -454,11 +464,11 @@ const UserManagementPage: React.FC = () => {
|
|||||||
<div className="grid gap-3 sm:grid-cols-2">
|
<div className="grid gap-3 sm:grid-cols-2">
|
||||||
<label className="block text-sm font-medium text-slate-700">
|
<label className="block text-sm font-medium text-slate-700">
|
||||||
CPU
|
CPU
|
||||||
<Input value={editQuotaCpu} onChange={(e) => setEditQuotaCpu(e.target.value)} className="mt-1" required />
|
<Input value={editQuotaCpu} onChange={(e) => setEditQuotaCpu(e.target.value)} className="mt-1" placeholder="Unlimited" />
|
||||||
</label>
|
</label>
|
||||||
<label className="block text-sm font-medium text-slate-700">
|
<label className="block text-sm font-medium text-slate-700">
|
||||||
Memory
|
Memory
|
||||||
<Input value={editQuotaMemory} onChange={(e) => setEditQuotaMemory(e.target.value)} className="mt-1" required />
|
<Input value={editQuotaMemory} onChange={(e) => setEditQuotaMemory(e.target.value)} className="mt-1" placeholder="Unlimited" />
|
||||||
</label>
|
</label>
|
||||||
<label className="block text-sm font-medium text-slate-700">
|
<label className="block text-sm font-medium text-slate-700">
|
||||||
GPU
|
GPU
|
||||||
@ -475,7 +485,7 @@ const UserManagementPage: React.FC = () => {
|
|||||||
Cancel
|
Cancel
|
||||||
</Button>
|
</Button>
|
||||||
<Button type="submit" variant="primary" loading={savingLimits}>
|
<Button type="submit" variant="primary" loading={savingLimits}>
|
||||||
Save Limits
|
{limitsEditorMode === "downgrade" ? "Save Role and Limits" : "Save Limits"}
|
||||||
</Button>
|
</Button>
|
||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
@ -499,4 +509,41 @@ const clusterName = (clusters: ClusterResponse[], clusterId: string): string =>
|
|||||||
return cluster?.name || clusterId;
|
return cluster?.name || clusterId;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
const quotaChip = (label: string, value: React.ReactNode): React.ReactElement => (
|
||||||
|
<div className="min-w-0 rounded-md border border-slate-200 bg-white px-3 py-2">
|
||||||
|
<div className="text-[11px] font-medium uppercase tracking-wide text-slate-500">{label}</div>
|
||||||
|
<div className="mt-0.5 truncate font-mono text-sm text-slate-900">{value}</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
|
||||||
|
const ActionButton = ({
|
||||||
|
children,
|
||||||
|
icon,
|
||||||
|
variant = "ghost",
|
||||||
|
...props
|
||||||
|
}: React.ButtonHTMLAttributes<HTMLButtonElement> & {
|
||||||
|
children: React.ReactNode;
|
||||||
|
icon?: LucideIcon;
|
||||||
|
variant?: "secondary" | "danger" | "ghost";
|
||||||
|
}): React.ReactElement => (
|
||||||
|
<Button
|
||||||
|
type="button"
|
||||||
|
variant={variant}
|
||||||
|
size="sm"
|
||||||
|
icon={icon}
|
||||||
|
className="min-w-0 max-w-full px-2 text-xs leading-tight [&>svg]:shrink-0"
|
||||||
|
{...props}
|
||||||
|
>
|
||||||
|
<span className="min-w-0 truncate">{children}</span>
|
||||||
|
</Button>
|
||||||
|
);
|
||||||
|
|
||||||
|
const quotaForApi = (value: string, blankMeansUnlimited = false): string => {
|
||||||
|
const trimmed = value.trim();
|
||||||
|
if (!trimmed && blankMeansUnlimited) {
|
||||||
|
return "unlimited";
|
||||||
|
}
|
||||||
|
return trimmed;
|
||||||
|
};
|
||||||
|
|
||||||
export default UserManagementPage;
|
export default UserManagementPage;
|
||||||
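The `quotaForApi` helper added in this file pairs with the new "Unlimited" placeholders on the CPU and Memory inputs. A minimal sketch of how a form might be normalized before submission is given below. The `LimitsPayload` shape and `buildLimitsPayload` are hypothetical names for illustration, and treating only CPU and Memory as blank-means-unlimited is an assumption based on the placeholders, not something the diff confirms.

```typescript
// Reproduced from the diff: trims the input and, when asked, maps a blank value to "unlimited".
const quotaForApi = (value: string, blankMeansUnlimited = false): string => {
  const trimmed = value.trim();
  if (!trimmed && blankMeansUnlimited) {
    return "unlimited";
  }
  return trimmed;
};

// Hypothetical request shape; the real API type is not shown in this diff.
interface LimitsPayload {
  quotaCpu: string;
  quotaMemory: string;
  quotaGpu: string;
  quotaGpuMemory: string;
}

// Hypothetical form state, one raw string per input field.
interface LimitsForm {
  cpu: string;
  memory: string;
  gpu: string;
  gpuMemory: string;
}

// Assumption: blank CPU/Memory mean "unlimited"; GPU fields are passed through trimmed.
const buildLimitsPayload = (form: LimitsForm): LimitsPayload => ({
  quotaCpu: quotaForApi(form.cpu, true),
  quotaMemory: quotaForApi(form.memory, true),
  quotaGpu: quotaForApi(form.gpu),
  quotaGpuMemory: quotaForApi(form.gpuMemory),
});

const payload = buildLimitsPayload({ cpu: "   ", memory: " 16Gi ", gpu: "1", gpuMemory: "" });
console.log(payload);
```

With this input, the blank CPU field becomes `"unlimited"`, Memory is trimmed to `"16Gi"`, and the empty GPU memory field stays an empty string because `blankMeansUnlimited` defaults to `false`.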
|
|||||||
@ -3,7 +3,7 @@
|
|||||||
* Displays monitoring information for a single cluster
|
* Displays monitoring information for a single cluster
|
||||||
*/
|
*/
|
||||||
import React, { useState } from "react";
|
import React, { useState } from "react";
|
||||||
import { Activity, CheckCircle, AlertTriangle, XCircle, HelpCircle, Clock, Cpu, Database, Server as ServerIcon, ChevronDown, ChevronUp, TrendingUp } from "lucide-react";
|
import { Activity, CheckCircle, AlertTriangle, XCircle, HelpCircle, Clock, Cpu, Database, Server as ServerIcon, ChevronDown, ChevronUp, TrendingUp, Users } from "lucide-react";
|
||||||
import { Card, Badge } from "@/shared/components";
|
import { Card, Badge } from "@/shared/components";
|
||||||
import type { ClusterMetrics } from "@/core/types";
|
import type { ClusterMetrics } from "@/core/types";
|
||||||
import { NodeMetricCard } from "./NodeMetricCard";
|
import { NodeMetricCard } from "./NodeMetricCard";
|
||||||
@ -20,6 +20,9 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
const podCount = cluster.podCount ?? 0;
|
const podCount = cluster.podCount ?? 0;
|
||||||
const totalGpu = cluster.totalGpu ?? 0;
|
const totalGpu = cluster.totalGpu ?? 0;
|
||||||
const usedGpu = cluster.usedGpu ?? 0;
|
const usedGpu = cluster.usedGpu ?? 0;
|
||||||
|
const allocatedGpu = firstNumber(cluster.gpuAllocated, cluster.allocatedGpu, cluster.gpuAllocation, usedGpu);
|
||||||
|
const usedGpuMemory = firstDisplayValue(cluster.allocatedGpuMemoryMb, cluster.allocatedGpuMemoryMB, cluster.gpuMemoryRequestsMb, cluster.usedGpuMemory, cluster.gpuMemoryUsed, cluster.usedGpuMem);
|
||||||
|
const totalGpuMemory = firstDisplayValue(cluster.totalGpuMemory, cluster.totalGpuMem);
|
||||||
const cpuUsage = cluster.cpuUsage ?? 0;
|
const cpuUsage = cluster.cpuUsage ?? 0;
|
||||||
const memoryUsage = cluster.memoryUsage ?? 0;
|
const memoryUsage = cluster.memoryUsage ?? 0;
|
||||||
const gpuUsage = cluster.gpuUsage ?? 0;
|
const gpuUsage = cluster.gpuUsage ?? 0;
|
||||||
@ -27,7 +30,11 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
const totalCpu = cluster.totalCpu ?? "N/A";
|
const totalCpu = cluster.totalCpu ?? "N/A";
|
||||||
const usedMemory = cluster.usedMemory ?? "N/A";
|
const usedMemory = cluster.usedMemory ?? "N/A";
|
||||||
const totalMemory = cluster.totalMemory ?? "N/A";
|
const totalMemory = cluster.totalMemory ?? "N/A";
|
||||||
|
const cpuRequestText = firstDisplayValue(cluster.cpuRequests, usedCpu);
|
||||||
|
const memoryRequestText = firstDisplayValue(cluster.memoryRequests, usedMemory);
|
||||||
|
const hasClusterTotals = Boolean(cluster.totalCpu || cluster.totalMemory || cluster.nodeCount);
|
||||||
const lastCheckedText = cluster.lastCheck ? new Date(cluster.lastCheck).toLocaleString() : "N/A";
|
const lastCheckedText = cluster.lastCheck ? new Date(cluster.lastCheck).toLocaleString() : "N/A";
|
||||||
|
const userResourceRows = getUserResourceRows(cluster);
|
||||||
|
|
||||||
const getStatusBadge = () => {
|
const getStatusBadge = () => {
|
||||||
switch (status) {
|
switch (status) {
|
||||||
@ -76,13 +83,13 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Metrics Grid */}
|
{/* Metrics Grid */}
|
||||||
<div className="grid grid-cols-2 sm:grid-cols-4 gap-4 mb-3">
|
<div className="grid grid-cols-2 gap-4 mb-3 md:grid-cols-3 xl:grid-cols-5">
|
||||||
<div>
|
<div>
|
||||||
<p className="text-xs text-slate-500">Uptime</p>
|
<p className="text-xs text-slate-500">Uptime</p>
|
||||||
<p className="text-sm text-slate-700 font-mono mt-1">{uptime}</p>
|
<p className="text-sm text-slate-700 font-mono mt-1">{uptime}</p>
|
||||||
</div>
|
</div>
|
||||||
<div>
|
<div>
|
||||||
<p className="text-xs text-slate-500">Nodes</p>
|
<p className="text-xs text-slate-500">{hasClusterTotals ? "Nodes" : "Visible Nodes"}</p>
|
||||||
<div className="flex items-center gap-1 mt-1">
|
<div className="flex items-center gap-1 mt-1">
|
||||||
<ServerIcon className="w-3 h-3 text-blue-400" />
|
<ServerIcon className="w-3 h-3 text-blue-400" />
|
||||||
<p className="text-sm text-slate-700 font-mono">{nodeCount}</p>
|
<p className="text-sm text-slate-700 font-mono">{nodeCount}</p>
|
||||||
@ -95,7 +102,13 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
<div>
|
<div>
|
||||||
<p className="text-xs text-slate-500">GPU</p>
|
<p className="text-xs text-slate-500">GPU</p>
|
||||||
<p className="text-sm text-slate-700 font-mono mt-1">
|
<p className="text-sm text-slate-700 font-mono mt-1">
|
||||||
{usedGpu}/{totalGpu || "N/A"}
|
{hasClusterTotals ? `${usedGpu}/${totalGpu || "N/A"}` : `${allocatedGpu} allocated`}
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
<div>
|
||||||
|
<p className="text-xs text-slate-500">GPU Mem</p>
|
||||||
|
<p className="text-sm text-slate-700 font-mono mt-1">
|
||||||
|
{usedGpuMemory || "N/A"}{totalGpuMemory ? ` / ${totalGpuMemory}` : ""}
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -105,16 +118,18 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
<div>
|
<div>
|
||||||
<div className="flex items-center gap-2 mb-1">
|
<div className="flex items-center gap-2 mb-1">
|
||||||
<Cpu className="w-3 h-3 text-blue-400" />
|
<Cpu className="w-3 h-3 text-blue-400" />
|
||||||
<p className="text-xs text-slate-500">CPU (Cluster Total)</p>
|
<p className="text-xs text-slate-500">{hasClusterTotals ? "CPU (Cluster Total)" : "CPU Requests"}</p>
|
||||||
</div>
|
</div>
|
||||||
<p className="text-sm text-slate-700 font-mono">{usedCpu} / {totalCpu}</p>
|
<p className="text-sm text-slate-700 font-mono">
|
||||||
|
{hasClusterTotals ? `${usedCpu} / ${totalCpu}` : cpuRequestText || "0 cores"}
|
||||||
|
</p>
|
||||||
<div className="mt-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
|
<div className="mt-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
|
||||||
<div
|
<div
|
||||||
className="h-full bg-blue-500 rounded-full transition-all"
|
className="h-full bg-blue-500 rounded-full transition-all"
|
||||||
style={{ width: `${Math.min(cpuUsage, 100)}%` }}
|
style={{ width: `${Math.min(cpuUsage, 100)}%` }}
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
<p className="text-xs text-slate-500 mt-1">{cpuUsage.toFixed(1)}%</p>
|
<p className="text-xs text-slate-500 mt-1">{hasClusterTotals ? `${cpuUsage.toFixed(1)}%` : "self-scoped allocation"}</p>
|
||||||
{cluster.maxNodeCpu && (
|
{cluster.maxNodeCpu && (
|
||||||
<div className="mt-1.5 pt-1.5 border-t border-slate-200">
|
<div className="mt-1.5 pt-1.5 border-t border-slate-200">
|
||||||
<div className="flex items-center gap-1">
|
<div className="flex items-center gap-1">
|
||||||
@ -132,16 +147,18 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
<div>
|
<div>
|
||||||
<div className="flex items-center gap-2 mb-1">
|
<div className="flex items-center gap-2 mb-1">
|
||||||
<Database className="w-3 h-3 text-green-400" />
|
<Database className="w-3 h-3 text-green-400" />
|
||||||
<p className="text-xs text-slate-500">Memory (Cluster Total)</p>
|
<p className="text-xs text-slate-500">{hasClusterTotals ? "Memory (Cluster Total)" : "Memory Requests"}</p>
|
||||||
</div>
|
</div>
|
||||||
<p className="text-sm text-slate-700 font-mono">{usedMemory} / {totalMemory}</p>
|
<p className="text-sm text-slate-700 font-mono">
|
||||||
|
{hasClusterTotals ? `${usedMemory} / ${totalMemory}` : memoryRequestText || "0 B"}
|
||||||
|
</p>
|
||||||
<div className="mt-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
|
<div className="mt-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
|
||||||
<div
|
<div
|
||||||
className="h-full bg-green-500 rounded-full transition-all"
|
className="h-full bg-green-500 rounded-full transition-all"
|
||||||
style={{ width: `${Math.min(memoryUsage, 100)}%` }}
|
style={{ width: `${Math.min(memoryUsage, 100)}%` }}
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
<p className="text-xs text-slate-500 mt-1">{memoryUsage.toFixed(1)}%</p>
|
<p className="text-xs text-slate-500 mt-1">{hasClusterTotals ? `${memoryUsage.toFixed(1)}%` : "self-scoped allocation"}</p>
|
||||||
{cluster.maxNodeMemory && (
|
{cluster.maxNodeMemory && (
|
||||||
<div className="mt-1.5 pt-1.5 border-t border-slate-200">
|
<div className="mt-1.5 pt-1.5 border-t border-slate-200">
|
||||||
<div className="flex items-center gap-1">
|
<div className="flex items-center gap-1">
|
||||||
@ -156,20 +173,20 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
)}
|
)}
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{totalGpu > 0 && (
|
{(totalGpu > 0 || allocatedGpu > 0) && (
|
||||||
<div>
|
<div>
|
||||||
<div className="flex items-center gap-2 mb-1">
|
<div className="flex items-center gap-2 mb-1">
|
||||||
<Activity className="w-3 h-3 text-purple-400" />
|
<Activity className="w-3 h-3 text-purple-400" />
|
||||||
<p className="text-xs text-slate-500">GPU (Cluster Total)</p>
|
<p className="text-xs text-slate-500">GPU Allocation</p>
|
||||||
</div>
|
</div>
|
||||||
<p className="text-sm text-slate-700 font-mono">{usedGpu} / {totalGpu}</p>
|
<p className="text-sm text-slate-700 font-mono">{allocatedGpu} / {totalGpu || "N/A"}</p>
|
||||||
<div className="mt-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
|
<div className="mt-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
|
||||||
<div
|
<div
|
||||||
className="h-full bg-purple-500 rounded-full transition-all"
|
className="h-full bg-purple-500 rounded-full transition-all"
|
||||||
style={{ width: `${Math.min(gpuUsage, 100)}%` }}
|
style={{ width: `${Math.min(gpuUsage, 100)}%` }}
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
<p className="text-xs text-slate-500 mt-1">{gpuUsage.toFixed(1)}%</p>
|
<p className="text-xs text-slate-500 mt-1">{hasClusterTotals ? `${gpuUsage.toFixed(1)}%` : "self-scoped allocation"}</p>
|
||||||
{cluster.maxNodeGpu && cluster.maxNodeGpu > 0 && (
|
{cluster.maxNodeGpu && cluster.maxNodeGpu > 0 && (
|
||||||
<div className="mt-1.5 pt-1.5 border-t border-slate-200">
|
<div className="mt-1.5 pt-1.5 border-t border-slate-200">
|
||||||
<div className="flex items-center gap-1">
|
<div className="flex items-center gap-1">
|
||||||
@ -184,8 +201,62 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
)}
|
)}
|
||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
|
{(usedGpuMemory || totalGpuMemory) && (
|
||||||
|
<div>
|
||||||
|
<div className="flex items-center gap-2 mb-1">
|
||||||
|
<Database className="w-3 h-3 text-fuchsia-500" />
|
||||||
|
<p className="text-xs text-slate-500">GPU Mem</p>
|
||||||
|
</div>
|
||||||
|
<p className="text-sm text-slate-700 font-mono">
|
||||||
|
{usedGpuMemory || "0"}{totalGpuMemory ? ` / ${totalGpuMemory}` : ""}
|
||||||
|
</p>
|
||||||
|
<p className="text-xs text-slate-500 mt-1">requests.nvidia.com/gpumem</p>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
{userResourceRows.length > 0 && (
|
||||||
|
<div className="mt-3 overflow-hidden rounded-lg border border-slate-200">
|
||||||
|
<div className="flex items-center gap-2 border-b border-slate-200 bg-slate-50 px-3 py-2">
|
||||||
|
<Users className="h-4 w-4 text-slate-500" />
|
||||||
|
<h4 className="text-sm font-semibold text-slate-900">User Resources</h4>
|
||||||
|
</div>
|
||||||
|
<div className="overflow-x-auto">
|
||||||
|
<table className="min-w-[720px] w-full text-left text-xs">
|
||||||
|
<thead className="bg-white text-slate-500">
|
||||||
|
<tr>
|
||||||
|
<th className="px-3 py-2 font-medium">User</th>
|
||||||
|
<th className="px-3 py-2 font-medium">Namespace</th>
|
||||||
|
<th className="px-3 py-2 font-medium">CPU</th>
|
||||||
|
<th className="px-3 py-2 font-medium">Memory</th>
|
||||||
|
<th className="px-3 py-2 font-medium">GPU</th>
|
||||||
|
<th className="px-3 py-2 font-medium">GPU Mem</th>
|
||||||
|
<th className="px-3 py-2 font-medium">Pods</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody className="divide-y divide-slate-100">
|
||||||
|
{userResourceRows.map((row, index) => (
|
||||||
|
<tr key={`${row.userId || row.username || row.userName || "user"}-${index}`} className="bg-white">
|
||||||
|
<td className="max-w-[180px] truncate px-3 py-2 font-medium text-slate-800">
|
||||||
|
{row.username || row.userName || shortId(row.userId) || "-"}
|
||||||
|
</td>
|
||||||
|
<td className="max-w-[180px] truncate px-3 py-2 font-mono text-slate-600">{row.namespace || "-"}</td>
|
||||||
|
<td className="px-3 py-2 font-mono text-slate-700">{firstDisplayValue(row.cpuRequests, row.usedCpu, row.cpuUsed, row.cpuRequest, row.cpuLimits, row.cpuLimit) || "-"}</td>
|
||||||
|
<td className="px-3 py-2 font-mono text-slate-700">{firstDisplayValue(row.memoryRequests, row.usedMemory, row.memoryUsed, row.memoryRequest, row.memoryLimits, row.memoryLimit) || "-"}</td>
|
||||||
|
<td className="px-3 py-2 font-mono text-slate-700">{firstNumber(row.gpuRequests, row.gpuAllocated, row.gpuAllocation, row.usedGpu, row.gpuUsed) ?? 0}</td>
|
||||||
|
<td className="px-3 py-2 font-mono text-slate-700">
|
||||||
|
{firstDisplayValue(row.gpuMemoryRequestsMb, row.gpuMemoryAllocated, row.gpuMemAllocated, row.usedGpuMemory, row.gpuMemoryUsed, row.gpuMemUsed) || "0"}
|
||||||
|
</td>
|
||||||
|
<td className="px-3 py-2 font-mono text-slate-700">{row.podCount ?? "-"}</td>
|
||||||
|
</tr>
|
||||||
|
))}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
<div className="mt-3 flex items-center gap-2 text-xs text-slate-500">
|
<div className="mt-3 flex items-center gap-2 text-xs text-slate-500">
|
||||||
<Clock className="w-3 h-3" />
|
<Clock className="w-3 h-3" />
|
||||||
<span>Last checked: {lastCheckedText}</span>
|
<span>Last checked: {lastCheckedText}</span>
|
||||||
@ -233,3 +304,34 @@ export const ClusterMonitorCard: React.FC<ClusterMonitorCardProps> = ({ cluster
|
|||||||
</Card>
|
</Card>
|
||||||
);
|
);
|
||||||
};
|
};
|
||||||
|
|
||||||
|
const firstNumber = (...values: Array<number | undefined | null>): number => {
|
||||||
|
for (const value of values) {
|
||||||
|
if (typeof value === "number" && Number.isFinite(value)) {
|
||||||
|
return value;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
};
|
||||||
|
|
||||||
|
const firstDisplayValue = (...values: Array<string | number | undefined | null>): string => {
|
||||||
|
for (const value of values) {
|
||||||
|
if (typeof value === "number" && Number.isFinite(value)) {
|
||||||
|
return String(value);
|
||||||
|
}
|
||||||
|
if (typeof value === "string" && value.trim()) {
|
||||||
|
return value.trim();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return "";
|
||||||
|
};
|
||||||
|
|
||||||
|
const getUserResourceRows = (cluster: ClusterMetrics) =>
|
||||||
|
cluster.resourceUsageByUser || cluster.userResources || cluster.userResourceUsage || cluster.resourcesByUser || cluster.userResourceRows || [];
|
||||||
|
|
||||||
|
const shortId = (value?: string): string => {
|
||||||
|
const id = value?.trim();
|
||||||
|
if (!id) return "";
|
||||||
|
if (id.length <= 12) return id;
|
||||||
|
return `${id.slice(0, 8)}...${id.slice(-4)}`;
|
||||||
|
};
|
||||||
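The `firstNumber` and `firstDisplayValue` helpers added above let the card tolerate several alternative field names in the metrics payload. The sketch below reproduces both helpers and runs them against a hypothetical `SampleMetrics` object; the field subset and the sample values are assumptions for illustration only.

```typescript
// Reproduced from the diff: returns the first finite number, or 0 when none is present.
const firstNumber = (...values: Array<number | undefined | null>): number => {
  for (const value of values) {
    if (typeof value === "number" && Number.isFinite(value)) {
      return value;
    }
  }
  return 0;
};

// Reproduced from the diff: returns the first finite number or non-blank string, as trimmed text.
const firstDisplayValue = (...values: Array<string | number | undefined | null>): string => {
  for (const value of values) {
    if (typeof value === "number" && Number.isFinite(value)) {
      return String(value);
    }
    if (typeof value === "string" && value.trim()) {
      return value.trim();
    }
  }
  return "";
};

// Hypothetical subset of the metrics payload; the real ClusterMetrics type is not shown in the diff.
interface SampleMetrics {
  gpuAllocated?: number | null;
  allocatedGpu?: number | null;
  usedGpu?: number | null;
  cpuRequests?: string | null;
  usedCpu?: string | null;
}

const sample: SampleMetrics = { allocatedGpu: 2, usedGpu: 5, cpuRequests: "  ", usedCpu: " 3.5 cores " };

// gpuAllocated is missing, so the second candidate wins.
const allocatedGpu = firstNumber(sample.gpuAllocated, sample.allocatedGpu, sample.usedGpu);
// cpuRequests is blank, so the trimmed usedCpu value is used.
const cpuText = firstDisplayValue(sample.cpuRequests, sample.usedCpu);
console.log(allocatedGpu, cpuText);
```

One consequence of this design is that a literal `0` in an earlier field wins over a non-zero value in a later one, because `0` is a finite number. Since `firstNumber` always returns a number, a trailing `?? 0` after it never takes effect.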
|
|||||||
@ -3,7 +3,7 @@
|
|||||||
* Monitors cluster status and health information
|
* Monitors cluster status and health information
|
||||||
*/
|
*/
|
||||||
import React, { useState, useEffect } from "react";
|
import React, { useState, useEffect } from "react";
|
||||||
import { Activity, Server, RefreshCw } from "lucide-react";
|
import { Activity, Database, Server, RefreshCw } from "lucide-react";
|
||||||
import { PageHeader, StatsCard, Button, LoadingState, ErrorState, EmptyState } from "@/shared";
|
import { PageHeader, StatsCard, Button, LoadingState, ErrorState, EmptyState } from "@/shared";
|
||||||
import { useToast } from "@/shared";
|
import { useToast } from "@/shared";
|
||||||
import { ClusterErrors, SuccessMessages, formatApiError } from "@/shared/utils";
|
import { ClusterErrors, SuccessMessages, formatApiError } from "@/shared/utils";
|
||||||
@ -107,6 +107,12 @@ const MonitoringClustersPage: React.FC = () => {
|
|||||||
const healthyCount = clusters.filter(c => c.status === "healthy").length;
|
const healthyCount = clusters.filter(c => c.status === "healthy").length;
|
||||||
const warningCount = clusters.filter(c => c.status === "warning" || c.status === "unknown").length;
|
const warningCount = clusters.filter(c => c.status === "warning" || c.status === "unknown").length;
|
||||||
const errorCount = clusters.filter(c => c.status === "error" || c.status === "unhealthy").length;
|
const errorCount = clusters.filter(c => c.status === "error" || c.status === "unhealthy").length;
|
||||||
|
const allocatedGpu = clusters.reduce(
|
||||||
|
(sum, cluster) => sum + firstNumber(cluster.gpuAllocated, cluster.allocatedGpu, cluster.gpuAllocation, cluster.usedGpu),
|
||||||
|
0
|
||||||
|
);
|
||||||
|
const totalGpu = clusters.reduce((sum, cluster) => sum + (cluster.totalGpu ?? 0), 0);
|
||||||
|
const gpuMemoryText = summarizeGpuMemory(clusters);
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="space-y-6">
|
<div className="space-y-6">
|
||||||
@ -127,7 +133,7 @@ const MonitoringClustersPage: React.FC = () => {
|
|||||||
</PageHeader>
|
</PageHeader>
|
||||||
|
|
||||||
{/* Summary Stats */}
|
{/* Summary Stats */}
|
||||||
<div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-4">
|
<div className="grid grid-cols-1 sm:grid-cols-2 xl:grid-cols-6 gap-4">
|
||||||
<StatsCard
|
<StatsCard
|
||||||
title="Total Clusters"
|
title="Total Clusters"
|
||||||
value={clusters.length}
|
value={clusters.length}
|
||||||
@ -152,6 +158,18 @@ const MonitoringClustersPage: React.FC = () => {
|
|||||||
icon={Activity}
|
icon={Activity}
|
||||||
variant="red"
|
variant="red"
|
||||||
/>
|
/>
|
||||||
|
<StatsCard
|
||||||
|
title="GPU Allocation"
|
||||||
|
value={`${allocatedGpu}/${totalGpu || "N/A"}`}
|
||||||
|
icon={Activity}
|
||||||
|
variant="purple"
|
||||||
|
/>
|
||||||
|
<StatsCard
|
||||||
|
title="GPU Mem"
|
||||||
|
value={gpuMemoryText}
|
||||||
|
icon={Database}
|
||||||
|
variant="orange"
|
||||||
|
/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Auto-refresh Info */}
|
{/* Auto-refresh Info */}
|
||||||
@ -173,3 +191,40 @@ const MonitoringClustersPage: React.FC = () => {
|
|||||||
};
|
};
|
||||||
|
|
||||||
export default MonitoringClustersPage;
|
export default MonitoringClustersPage;
|
||||||
|
|
||||||
|
const firstNumber = (...values: Array<number | undefined | null>): number => {
|
||||||
|
for (const value of values) {
|
||||||
|
if (typeof value === "number" && Number.isFinite(value)) {
|
||||||
|
return value;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
};
|
||||||
|
|
||||||
|
const summarizeGpuMemory = (clusters: ClusterMetrics[]): string => {
|
||||||
|
const usedValues = clusters
|
||||||
|
.map((cluster) => firstText(cluster.allocatedGpuMemoryMb, cluster.allocatedGpuMemoryMB, cluster.gpuMemoryRequestsMb, cluster.usedGpuMemory, cluster.gpuMemoryUsed, cluster.usedGpuMem))
|
||||||
|
.filter(Boolean);
|
||||||
|
const totalValues = clusters
|
||||||
|
.map((cluster) => firstText(cluster.totalGpuMemory, cluster.totalGpuMem))
|
||||||
|
.filter(Boolean);
|
||||||
|
if (usedValues.length === 0 && totalValues.length === 0) {
|
||||||
|
return "N/A";
|
||||||
|
}
|
||||||
|
if (usedValues.length === 1 && totalValues.length <= 1) {
|
||||||
|
return totalValues[0] ? `${usedValues[0] || "0"} / ${totalValues[0]}` : usedValues[0];
|
||||||
|
}
|
||||||
|
return `${usedValues.length || 0} clusters`;
|
||||||
|
};
|
||||||
|
|
||||||
|
const firstText = (...values: Array<string | number | undefined | null>): string => {
|
||||||
|
for (const value of values) {
|
||||||
|
if (typeof value === "number" && Number.isFinite(value)) {
|
||||||
|
return String(value);
|
||||||
|
}
|
||||||
|
if (typeof value === "string" && value.trim()) {
|
||||||
|
return value.trim();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return "";
|
||||||
|
};
|
||||||
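The branching in `summarizeGpuMemory` can be easier to follow with concrete inputs. The sketch below keeps the same three branches but reads only two fields per cluster; `GpuMemoryFields` and `summarizeGpuMemorySketch` are hypothetical names, and the real function checks several more alternative field names.

```typescript
// Hypothetical subset of ClusterMetrics, reduced to one "used" and one "total" field.
interface GpuMemoryFields {
  usedGpuMemory?: string | number | null;
  totalGpuMemory?: string | number | null;
}

// Same logic as firstText in the diff: first finite number or non-blank string, as text.
const firstText = (...values: Array<string | number | undefined | null>): string => {
  for (const value of values) {
    if (typeof value === "number" && Number.isFinite(value)) {
      return String(value);
    }
    if (typeof value === "string" && value.trim()) {
      return value.trim();
    }
  }
  return "";
};

// Same three branches as summarizeGpuMemory in the diff.
const summarizeGpuMemorySketch = (clusters: GpuMemoryFields[]): string => {
  const usedValues = clusters.map((c) => firstText(c.usedGpuMemory)).filter(Boolean);
  const totalValues = clusters.map((c) => firstText(c.totalGpuMemory)).filter(Boolean);
  if (usedValues.length === 0 && totalValues.length === 0) {
    return "N/A"; // nothing reported by any cluster
  }
  if (usedValues.length === 1 && totalValues.length <= 1) {
    // exactly one cluster reports usage: show "used / total" or just "used"
    return totalValues[0] ? `${usedValues[0] || "0"} / ${totalValues[0]}` : usedValues[0];
  }
  // otherwise only a count of reporting clusters is shown
  return `${usedValues.length || 0} clusters`;
};

console.log(summarizeGpuMemorySketch([{ usedGpuMemory: "4096 MB", totalGpuMemory: "16384 MB" }]));
```

Values are never summed across clusters, presumably because they arrive as display strings with units. Under this logic, a single cluster that reports only a total and no usage falls through to the last branch and renders as `0 clusters`, which may be worth a dedicated branch.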
|
|||||||
@ -21,7 +21,7 @@ export default function SidebarLayout({ items, children }: SidebarLayoutProps) {
|
|||||||
isOpen={isSidebarOpen}
|
isOpen={isSidebarOpen}
|
||||||
onClose={() => setIsSidebarOpen(false)}
|
onClose={() => setIsSidebarOpen(false)}
|
||||||
/>
|
/>
|
||||||
<div className="relative z-10 flex flex-col flex-1">
|
<div className="relative z-10 flex min-w-0 flex-1 flex-col">
|
||||||
{React.Children.map(children, (child) => {
|
{React.Children.map(children, (child) => {
|
||||||
// Pass the toggleSidebar function down to child components
|
// Pass the toggleSidebar function down to child components
|
||||||
if (React.isValidElement(child)) {
|
if (React.isValidElement(child)) {
|
||||||
|
|||||||
@ -30,7 +30,7 @@ export default function TopNavLayout({
|
|||||||
onSignOut={onSignOut}
|
onSignOut={onSignOut}
|
||||||
onToggleSidebar={onToggleSidebar}
|
onToggleSidebar={onToggleSidebar}
|
||||||
/>
|
/>
|
||||||
<main className="flex-1 w-full max-w-screen-2xl mx-auto px-4 sm:px-6 lg:px-10 py-6 space-y-6">
|
<main className="min-w-0 flex-1 w-full max-w-screen-2xl mx-auto px-4 sm:px-6 lg:px-10 py-6 space-y-6">
|
||||||
{children}
|
{children}
|
||||||
</main>
|
</main>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@ -143,6 +143,14 @@ export function formatApiError(error: any): string | null {
|
|||||||
|
|
||||||
// Layer 2: HTTP Status Codes (standard HTTP errors)
|
// Layer 2: HTTP Status Codes (standard HTTP errors)
|
||||||
const status = error?.response?.status;
|
const status = error?.response?.status;
|
||||||
|
const apiMessage = error?.response?.data?.message;
|
||||||
|
const apiError = error?.response?.data?.error;
|
||||||
|
if ((status === 400 || status === 403 || status === 422) && apiMessage) {
|
||||||
|
return apiMessage;
|
||||||
|
}
|
||||||
|
if ((status === 400 || status === 403 || status === 422) && apiError) {
|
||||||
|
return apiError;
|
||||||
|
}
|
||||||
if (status) {
|
if (status) {
|
||||||
switch (status) {
|
switch (status) {
|
||||||
case 401:
|
case 401:
|
||||||
@ -162,12 +170,12 @@ export function formatApiError(error: any): string | null {
|
|||||||
|
|
||||||
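// Layer 3: API Response Messages (specific errors returned by the backend)
|
// Layer 3: API Response Messages (specific errors returned by the backend)
|
||||||
// Extract the detailed error message returned by the backend
|
// Extract the detailed error message returned by the backend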
|
||||||
if (error?.response?.data?.message) {
|
if (apiMessage) {
|
||||||
return error.response.data.message;
|
return apiMessage;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (error?.response?.data?.error) {
|
if (apiError) {
|
||||||
return error.response.data.error;
|
return apiError;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Layer 4: Generic Error Messages
|
// Layer 4: Generic Error Messages
|
||||||
@ -243,4 +251,3 @@ export type ErrorMessages =
|
|||||||
| typeof InstanceErrors
|
| typeof InstanceErrors
|
||||||
| typeof ValidationErrors
|
| typeof ValidationErrors
|
||||||
| typeof BusinessErrors;
|
| typeof BusinessErrors;
|
||||||
|
|
||||||
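The change to `formatApiError` makes the backend's own text take precedence over generic status-code messages for 400, 403, and 422 responses. A simplified sketch of that ordering follows. `ApiErrorLike`, `pickApiErrorText`, and the 401 placeholder text are hypothetical; the real function has more layers and its per-status messages are not fully visible in this diff.

```typescript
// Hypothetical, simplified error shape mirroring the optional chaining used in the diff.
interface ApiErrorLike {
  response?: {
    status?: number;
    data?: { message?: string; error?: string };
  };
}

// Status codes for which the diff prefers the backend's own text.
const PASS_THROUGH_STATUSES = new Set([400, 403, 422]);

const pickApiErrorText = (error: ApiErrorLike): string | null => {
  const status = error?.response?.status;
  const apiMessage = error?.response?.data?.message;
  const apiError = error?.response?.data?.error;

  // New in this commit: validation and permission errors surface the backend text first.
  if (status !== undefined && PASS_THROUGH_STATUSES.has(status)) {
    if (apiMessage) return apiMessage;
    if (apiError) return apiError;
  }

  // Stand-in for the per-status switch; the message text here is a placeholder.
  if (status === 401) {
    return "Session expired (placeholder text)";
  }

  // Layer 3 from the diff: fall back to whatever the backend returned.
  return apiMessage || apiError || null;
};

console.log(pickApiErrorText({ response: { status: 422, data: { message: "CPU quota exceeded" } } }));
```

The point of the ordering is that quota validation failures, which the commit message says received updated error text, reach the user verbatim instead of being replaced by a generic message for the status code.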
|
|||||||
@ -7,6 +7,7 @@ server {
|
|||||||
listen 80 default_server;
|
listen 80 default_server;
|
||||||
listen 443 ssl http2 default_server;
|
listen 443 ssl http2 default_server;
|
||||||
server_name _;
|
server_name _;
|
||||||
|
server_tokens off;
|
||||||
|
|
||||||
ssl_certificate /etc/nginx/certs/tls.crt;
|
ssl_certificate /etc/nginx/certs/tls.crt;
|
||||||
ssl_certificate_key /etc/nginx/certs/tls.key;
|
ssl_certificate_key /etc/nginx/certs/tls.key;
|
||||||
@ -18,6 +19,13 @@ server {
|
|||||||
|
|
||||||
root /usr/share/nginx/html;
|
root /usr/share/nginx/html;
|
||||||
index index.html;
|
index index.html;
|
||||||
|
resolver 127.0.0.11 valid=10s ipv6=off;
|
||||||
|
|
||||||
|
add_header X-Frame-Options "DENY" always;
|
||||||
|
add_header X-Content-Type-Options "nosniff" always;
|
||||||
|
add_header Referrer-Policy "no-referrer" always;
|
||||||
|
add_header Content-Security-Policy "default-src 'self'; connect-src 'self' http: https: ws: wss:; img-src 'self' data:; style-src 'self' 'unsafe-inline'; script-src 'self'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'" always;
|
||||||
|
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
|
||||||
|
|
||||||
# Frontend SPA routing fallback
|
# Frontend SPA routing fallback
|
||||||
location / {
|
location / {
|
||||||
@ -26,7 +34,8 @@ server {
|
|||||||
|
|
||||||
# Proxy API requests to the backend service
|
# Proxy API requests to the backend service
|
||||||
location /api/ {
|
location /api/ {
|
||||||
proxy_pass http://backend:8080;
|
set $backend_upstream http://backend:8080;
|
||||||
|
proxy_pass $backend_upstream;
|
||||||
proxy_http_version 1.1;
|
proxy_http_version 1.1;
|
||||||
proxy_set_header Host $host;
|
proxy_set_header Host $host;
|
||||||
proxy_set_header X-Real-IP $remote_addr;
|
proxy_set_header X-Real-IP $remote_addr;
|
||||||
@ -34,10 +43,22 @@ server {
|
|||||||
proxy_set_header X-Forwarded-Proto $scheme;
|
proxy_set_header X-Forwarded-Proto $scheme;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# External health check; keeps /health from falling through to the SPA
|
||||||
|
location = /health {
|
||||||
|
access_log off;
|
||||||
|
default_type application/json;
|
||||||
|
return 200 '{"status":"ok"}';
|
||||||
|
}
|
||||||
|
|
||||||
# Nginx health check
|
# Nginx health check
|
||||||
location = /healthz {
|
location = /healthz {
|
||||||
access_log off;
|
access_log off;
|
||||||
add_header Content-Type text/plain;
|
add_header Content-Type text/plain;
|
||||||
|
add_header X-Frame-Options "DENY" always;
|
||||||
|
add_header X-Content-Type-Options "nosniff" always;
|
||||||
|
add_header Referrer-Policy "no-referrer" always;
|
||||||
|
add_header Content-Security-Policy "default-src 'self'; connect-src 'self' http: https: ws: wss:; img-src 'self' data:; style-src 'self' 'unsafe-inline'; script-src 'self'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'" always;
|
||||||
|
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
|
||||||
return 200 'ok';
|
return 200 'ok';
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -45,6 +66,11 @@ server {
|
|||||||
location ~* \.(js|css|png|jpg|jpeg|gif|svg|ico)$ {
|
location ~* \.(js|css|png|jpg|jpeg|gif|svg|ico)$ {
|
||||||
expires 7d;
|
expires 7d;
|
||||||
add_header Cache-Control "public, max-age=604800, immutable";
|
add_header Cache-Control "public, max-age=604800, immutable";
|
||||||
|
add_header X-Frame-Options "DENY" always;
|
||||||
|
add_header X-Content-Type-Options "nosniff" always;
|
||||||
|
add_header Referrer-Policy "no-referrer" always;
|
||||||
|
add_header Content-Security-Policy "default-src 'self'; connect-src 'self' http: https: ws: wss:; img-src 'self' data:; style-src 'self' 'unsafe-inline'; script-src 'self'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'" always;
|
||||||
|
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
|
||||||
try_files $uri /index.html;
|
try_files $uri /index.html;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
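The nginx hunks above repeat the same `add_header` lines inside `/healthz` and the static-asset location because nginx only inherits `add_header` directives from the enclosing level when the current level defines none of its own. A minimal sketch of a check for the resulting response headers follows; the function name `header_problems` and the "version in the `Server` header" heuristic are assumptions for illustration, not part of the repository's test scripts.

```python
# Expected values are copied from the add_header directives in the diff above.
EXPECTED_HEADERS = {
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "no-referrer",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
}


def header_problems(headers: dict[str, str]) -> list[str]:
    """Return a human-readable problem list for one response's headers.

    Hypothetical helper: header names are compared case-insensitively.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    for name, expected in EXPECTED_HEADERS.items():
        actual = normalized.get(name.lower())
        if actual is None:
            problems.append(f"missing {name}")
        elif actual != expected:
            problems.append(f"unexpected {name}: {actual!r}")
    # Assumption: with `server_tokens off` the Server header carries no
    # "/<version>" suffix, so a slash is treated as a version leak.
    if "/" in normalized.get("server", ""):
        problems.append("Server header exposes a version")
    return problems
```

Such a helper could be fed the `headers` dict from a request against `/health`, `/healthz`, and a static asset, so that a location that forgot to repeat the headers shows up as a non-empty list.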
|
|||||||
@ -7,3 +7,9 @@
|
|||||||
- For real Helm smoke tests, wait for platform instance deletion to remove the DB record before deleting the Kubernetes namespace manually. Deleting the namespace too early can make the async Helm uninstall mark the instance failed.
|
- For real Helm smoke tests, wait for platform instance deletion to remove the DB record before deleting the Kubernetes namespace manually. Deleting the namespace too early can make the async Helm uninstall mark the instance failed.
|
||||||
- When embedding Helm, setting `actionConfig.Init(..., namespace, ...)` and `Install.Namespace` is not enough. The custom `RESTClientGetter` must also override the raw kubeconfig loader namespace, or manifests without `metadata.namespace` can be created in the kubeconfig context namespace such as `default`.
|
- When embedding Helm, setting `actionConfig.Init(..., namespace, ...)` and `Install.Namespace` is not enough. The custom `RESTClientGetter` must also override the raw kubeconfig loader namespace, or manifests without `metadata.namespace` can be created in the kubeconfig context namespace such as `default`.
|
||||||
- **Axios keysToSnake recursively converts ALL object keys including user-provided values map.** This silently renames Helm chart values (gpuMem → gpu_mem) causing chart to ignore user settings. Fix: skip recursion for known data fields (values, valuesYaml) while still converting field names. Backend DTOs must provide dual json tags (camelCase + snake_case) with Normalize() fallback.
|
- **Axios keysToSnake recursively converts ALL object keys including user-provided values map.** This silently renames Helm chart values (gpuMem → gpu_mem) causing chart to ignore user settings. Fix: skip recursion for known data fields (values, valuesYaml) while still converting field names. Backend DTOs must provide dual json tags (camelCase + snake_case) with Normalize() fallback.
|
||||||
|
- In the current two-role model, ordinary users must be forced into username-derived private workspaces/namespaces. Do not accept arbitrary `workspaceId` for role=`user`, or `(workspace_id, cluster_id)` quotas become shared across users. When editing an existing user, update the existing private workspace in place; only migrate users still attached to the default workspace.
|
||||||
|
- CPU and memory quotas are allowed to be blank, which means no platform ResourceQuota limit for that resource. GPU and `requests.nvidia.com/gpumem` should still default to explicit `0` for ordinary users unless admin sets them.
|
||||||
|
- Layout regression tests must not depend on deliberately invalid charts leaving DB instances behind. The safer behavior is to reject before DB persistence; use mocked API data for pure frontend overflow checks.
|
||||||
|
- Monitoring Pod-to-instance attribution cannot rely only on Helm standard labels. Some local charts, including `vllm-serve`, use only `app=<release>`; include that fallback before concluding allocation is zero.
|
||||||
|
- In this Compose stack the React frontend is not a long-running frontend container. `frontend-build` is a one-shot asset build and `nginx` is the frontend runtime plus API gateway; README and status commands must make that explicit or users will think the stack is partially down.
|
||||||
|
- For admin tables/cards inside the sidebar shell, fixed multi-column grids can still overflow even when individual buttons use `min-w-0`. Prefer responsive card layouts with a wrapping action region, and test at 1440/1280/1024/900/768 widths.
|
||||||
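The `keysToSnake` lesson above (convert field names, but leave the user-provided values map untouched) can be illustrated with a small Python sketch. The real converter lives in the frontend's Axios layer, so this is only an analogue; `OPAQUE_FIELDS`, `to_snake`, and `keys_to_snake` are hypothetical names.

```python
import re

# Assumption: these are the data fields whose contents must not be renamed.
OPAQUE_FIELDS = {"values", "valuesYaml"}


def to_snake(name: str) -> str:
    """Convert a camelCase key to snake_case (gpuMem -> gpu_mem)."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def keys_to_snake(obj, opaque=OPAQUE_FIELDS):
    """Recursively rename dict keys, skipping recursion into opaque fields."""
    if isinstance(obj, list):
        return [keys_to_snake(item, opaque) for item in obj]
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            # The field name itself is still converted; only its contents
            # are passed through unchanged when the field is opaque.
            out[to_snake(key)] = value if key in opaque else keys_to_snake(value, opaque)
        return out
    return obj
```

With this shape, a payload such as `{"clusterId": "c1", "values": {"gpuMem": 10000}}` keeps `gpuMem` intact inside `values` while `clusterId` becomes `cluster_id`.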
|
|||||||
118
tasks/todo.md
@ -1,5 +1,77 @@
|
|||||||
# OCDP final documentation structure
|
# OCDP final documentation structure
|
||||||
|
|
||||||
|
## Quota lifecycle monitoring implementation 2026-05-14
|
||||||
|
|
||||||
|
- [x] Main: integrate per-user per-cluster quota semantics and final verification.
|
||||||
|
- [x] Treat ordinary-user empty CPU/memory/GPU/GPU memory quotas as explicit zero.
|
||||||
|
- [x] Make create/update/scale quota checks use the selected cluster binding and sync ResourceQuota first.
|
||||||
|
- [x] Reject GPU=0 user vllm deployment on k3s before DB instance/release creation.
|
||||||
|
- [x] Worker A: implement backend quota evaluator/resource quota sync without touching frontend.
|
||||||
|
- [x] Worker B: implement user lifecycle cleanup, snake_case DTO normalization, and safe admin/user role transitions.
|
||||||
|
- [x] Add auth DTO alternate snake_case fields plus `Normalize()` for register/update requests, and call it in auth handler before service mapping.
|
||||||
|
- [x] Make admin-to-user role transitions create or safely reuse the username-derived workspace; detect namespace ownership conflicts and return an explicit domain conflict error.
|
||||||
|
- [x] Extend workspace binding repository to list/delete all bindings for a workspace so user deletion can clean every cluster binding.
|
||||||
|
- [x] Extend tenant kube client with idempotent tenant cleanup for namespace/service account/role binding/resource quota, refusing system namespaces such as `default` and `kube-system`.
|
||||||
|
- [x] Extend `AuthService` dependencies for instance/cluster/binding/tenant cleanup, preserving existing callers and avoiding frontend changes.
|
||||||
|
- [x] Update `DeleteUser` to reject deletion when the user owns instances; when safe, clean exclusive user workspace cluster bindings and OCDP tenant resources before deleting the user.
|
||||||
|
- [x] Add focused Go tests for DTO normalization, role downgrade workspace reuse/conflict, delete-with-instances conflict, cleanup path, and protected namespace cleanup.
|
||||||
|
- [x] Run targeted Go tests, review diff, and add Worker B Review summary here.
|
||||||
|
- Review: changed auth DTO normalization, auth handler normalization calls, auth service workspace reuse/delete cleanup logic, workspace binding repository ports/adapters, tenant kube cleanup, domain errors, mock/test coverage, and API wiring. Namespace conflicts now return `ErrWorkspaceNamespaceConflict`/HTTP 409; deleting users with owned/workspace instances returns `ErrUserHasInstances`/HTTP 409; protected tenant namespaces return forbidden-style `ErrProtectedNamespace`. Validation passed with targeted Worker B tests and full backend `go test ./...`.
|
||||||
|
- [x] Worker C: implement monitoring resource aggregation and instance owner username fields.
|
||||||
|
- [x] Worker D: implement frontend user management, instance card, and monitoring UI changes.
|
||||||
|
- [x] Inspect current API/generated/UI type contracts for owner and monitoring resource fields without changing backend.
|
||||||
|
- [x] Rework User Management accounts area into a wider operations layout with quota chips/split columns and actions that do not squeeze quota content.
|
||||||
|
- [x] Change admin-to-user downgrade flow to open/reuse the tenant resource limit editor and submit role plus namespace/cluster/quota fields together.
|
||||||
|
- [x] Show instance owner as `ownerUsername` when present, otherwise a shortened `ownerId`.
|
||||||
|
- [x] Extend monitoring frontend types/adapters as needed for GPU allocation, GPU memory, and per-user resource rows returned by the backend.
|
||||||
|
- [x] Update Cluster Monitoring cards/page to render GPU allocation/GPU Mem and per-user resource tables while respecting backend-scoped data for normal users.
|
||||||
|
- [x] Check responsive behavior for the touched UI and avoid obvious desktop/mobile overflow.
|
||||||
|
- [x] Run targeted frontend type/build tests available in the repo and review diff.
|
||||||
|
- [x] Add Worker D Review summary with changed files and verification results.
|
||||||
|
- Review: changed `frontend/src/features/configuration/users/pages/UserManagementPage.tsx`, `frontend/src/features/artifact/instances/components/InstanceCard.tsx`, `frontend/src/features/monitoring/clusters/components/ClusterMonitorCard.tsx`, `frontend/src/features/monitoring/clusters/pages/MonitoringClustersPage.tsx`, `frontend/src/core/types/index.ts`, and `frontend/src/api/index.ts`. User Management now uses wider operation rows with quota chips and admin-to-user downgrade saves role plus tenant limits. Instance cards show owner username or short owner ID. Cluster monitoring renders GPU allocation, GPU memory, and backend-returned per-user resource rows. Validation: `npm run build` passed; targeted `npx eslint ...` on changed frontend source files passed; full `npm run lint` remains blocked by pre-existing generated/cache and legacy lint errors; Playwright viewport check passed for `/configuration/users` and `/monitoring/clusters` at 390x844 and 1440x1000 with mocked API data and no horizontal overflow detected.
|
||||||
|
- [x] Worker E: add API/Playwright/k3s regression tests for this plan.
|
||||||
|
- [x] Worker F: read-only review for quota bypass, namespace deletion safety, and monitoring privacy.
|
||||||
|
- [x] Run `go test ./...` and `npm run build`.
|
||||||
|
- [x] Run Docker Compose smoke plus API/Playwright regression scripts.
|
||||||
|
- [x] Run real k3s negative vllm quota deployment test and clean up test users.
|
||||||
|
- [ ] Run positive GPU=1 k3s vllm deployment when cluster resources are available.
|
||||||
|
- [x] Add Review summary and lessons.
|
||||||
|
- Review: ordinary users are now forced into username-derived private workspace/namespace so per-cluster bindings behave as per-user/per-cluster quota buckets. Empty user CPU/memory/GPU/GPU Mem default to explicit `0`; create/update/scale use rendered Helm resource estimates, live ResourceQuota usage minus current release delta, and synced tenant ResourceQuota before persistence/Helm mutation. User deletion blocks on owned/workspace instances, then cleans tenant bindings/namespaces and deletes the exclusive workspace record. Monitoring now returns per-user resource rows and strips cluster-wide node/total metrics for ordinary users. Frontend renders `resourceUsageByUser`, instance owners, less cramped user quotas, and wraps InstanceCard actions to keep Delete inside the viewport. Validation passed: backend `go test ./...`, frontend `npm run build`, Docker Compose health checks, `test/unresolved_bugs_security_gateway_contract.py`, `test/unresolved_bugs_api_contract.py`, `test/user_namespace_quota_api_contract.py`, `test/frontend-playwright-smoke.py`, and `test/instance_card_action_layout_playwright.py`. Positive GPU=1 vllm deployment was not run because the cluster resource constraint remains external; negative GPU=0 vllm quota rejection on k3s passed.
|
||||||
|
|
||||||
|
## Unresolved bugs implementation 2026-05-14
|
||||||
|
|
||||||
|
- [x] Worker A: fix instance API contract: detail replicas, list values, values/valuesYaml conflict, namespace 403.
|
||||||
|
- [x] Inspect current instance handler/service tests and avoid touching other workers' areas.
|
||||||
|
- [x] Add request validation so `values` plus `valuesYaml`/`values_yaml` conflicts return HTTP 400, while YAML-only still populates values.
|
||||||
|
- [x] Enrich `GetInstance` replica count using the same live K8s source as list.
|
||||||
|
- [x] Include `values` in list responses for API compatibility.
|
||||||
|
- [x] Change normal-user tenant namespace mismatch from silent override to `ErrForbidden`/HTTP 403.
|
||||||
|
- [x] Add focused Go tests for namespace mismatch and replica enrichment/list values where practical.
|
||||||
|
- [x] Run targeted Go tests and review diff.
|
||||||
|
- Review: changed only scoped instance backend files plus this task tracker; validated with `go test ./internal/domain/service` and `go test ./internal/adapter/input/http/rest` from `backend/`.
|
||||||
|
- [x] Worker B: add Helm-rendered quota pre-check helper before DB create/Helm install.
|
||||||
|
- [x] Inspect Helm client/service quota contracts and preserve other workers' edits.
|
||||||
|
- [x] Add domain quota precheck types and compare logic for CPU, memory, GPU, and integer-MB gpumem.
|
||||||
|
- [x] Add Helm render estimator output port and real/mock Helm implementations that render final chart values and sum Pod template requests/limits.
|
||||||
|
- [x] Add focused Go tests for quota comparison and rendered manifest estimation where feasible.
|
||||||
|
- [x] Run targeted Go tests and review diff.
|
||||||
|
- Review: exposed `QuotaPrecheckService.EstimateAndCompare` plus `CompareWorkspaceQuota`; real Helm now dry-renders `/tmp/charts/{chart}-{version}.tgz` with final values and estimates Pod template requests/limits. Added quota and manifest estimator tests. Validation passed with `go test ./internal/domain/service ./internal/adapter/output/helm/...` and full backend `go test ./...`.
|
||||||
|
- [ ] Worker C: add compatibility/security backend endpoints and auth/CORS/rate-limit fixes.
|
||||||
|
- [x] Inspect backend route/handler/service contracts and preserve other workers' edits.
|
||||||
|
- [x] Add `/repositories/{repo}/tags` compatibility alias without changing existing artifact behavior.
|
||||||
|
- [x] Add `/monitoring/clusters/{id}/metrics` alias and `/clusters/{id}/stats` compatibility response.
|
||||||
|
- [x] Add `/clusters/{id}/kubeconfig` tenant kubeconfig endpoint scoped to the authenticated user's workspace and requested cluster.
|
||||||
|
- [x] Make login failures uniform and add a lightweight per-client login rate limit.
|
||||||
|
- [x] Replace permissive CORS reflection/wildcard defaults with an allowlist-driven default suitable for local dev.
|
||||||
|
- [ ] Add focused Go tests where straightforward, then run relevant Go tests and review diff.
|
||||||
|
- Review: changed scoped backend route/auth/CORS handlers, added CORS and login limiter tests, and removed the direct SSE CORS wildcard so global CORS applies. Validation attempted with `go test ./cmd/api ./internal/adapter/input/http/rest` from `backend/`, but it is currently blocked by concurrent Worker B compile errors in `internal/domain/service/quota_precheck.go` and Helm client implementations missing `EstimateInstanceResources`.
|
||||||
|
- [x] Worker D: harden Nginx gateway `/health`, server tokens, and security headers.
|
||||||
|
- [ ] Worker E: align frontend/API client and Playwright coverage for conflict/namespace/scale flows.
|
||||||
|
- [ ] Worker F: add API/security/regression test scripts and review coverage.
|
||||||
|
- [ ] Integrate worker changes, resolve conflicts, and run Go/frontend builds.
|
||||||
|
- [ ] Run Docker Compose smoke, API contracts, Playwright, and real k3s deploy cleanup.
|
||||||
|
- [ ] Update `tasks/lessons.md` and add Review summary here.
|
||||||
|
|
||||||
## docs/ directory (cleaned up)
|
## docs/ directory (cleaned up)
|
||||||
|
|
||||||
| File | Purpose | Status |
|
| File | Purpose | Status |
|
||||||
@ -9,3 +81,49 @@
|
|||||||
| `test-users.json` | Credentials for 4 test accounts | ✅ Permanent reference |
|
| `test-users.json` | Credentials for 4 test accounts | ✅ Permanent reference |
|
||||||
| `regression-full-report.md` | Latest full regression report | ✅ Can be deleted (next version) |
|
| `regression-full-report.md` | Latest full regression report | ✅ Can be deleted (next version) |
|
||||||
| `UNRESOLVED-BUGS.md` | List of unresolved issues (15) | ✅ Current version |
|
| `UNRESOLVED-BUGS.md` | List of unresolved issues (15) | ✅ Current version |
|
||||||
|
|
||||||
|
## Worker C monitoring and instance owner backend 2026-05-14
|
||||||
|
|
||||||
|
- [x] Inspect existing instance/monitoring permission, repository, DTO, and K8s metrics contracts without reverting other workers' changes.
|
||||||
|
- [x] Add `ownerUsername` to instance entity/DTO responses and hydrate it for detail/list via user repository while preserving ordinary-user/admin visibility rules.
|
||||||
|
- [x] Add K8s Pod resource allocation collection from requests/limits, including GPU and `requests.nvidia.com/gpumem` as integer MB.
|
||||||
|
- [x] Aggregate `resourceUsageByUser` in monitoring service by matching Pods to visible instances/workspaces/owners, with ordinary users scoped to themselves and admins seeing all visible owners.
|
||||||
|
- [x] Expose cluster-level GPU/GPU memory allocation fields and per-user resource usage in `/monitoring/clusters`, detail, and existing aliases.
|
||||||
|
- [x] Add focused Go tests for instance owner username and monitoring resource aggregation/privacy.
|
||||||
|
- [x] Run relevant Go tests, review diff, and add Review summary here.
|
||||||
|
- Review: Instance list/detail now include `ownerUsername` hydrated from the user repository. Monitoring responses now include per-user resource usage plus CPU/memory/GPU/GPU-memory request/limit allocation fields derived from Kubernetes Pod resources and DB instance ownership mapping; ordinary users only see their own allocation rows/totals, admins see all visible instance owners. Validation passed with `go test ./internal/domain/service`, `go test ./cmd/api ./internal/adapter/input/http/rest ./internal/adapter/output/k8s`, and backend `go test ./...`.
|
||||||
|
|
||||||
|
## Debug quota limits monitoring UI 2026-05-15
|
||||||
|
|
||||||
|
- [x] Inspect current runtime logs for workspace conflict/quota errors without killing other services.
|
||||||
|
- [x] Fix quota semantics: CPU/memory blank means unlimited; GPU/GPU Mem blank means explicit zero for ordinary users.
|
||||||
|
- [x] Fix admin user update so editing an existing user's quota does not recreate/reassign namespace and does not raise false `workspace namespace conflict`.
|
||||||
|
- [x] Rework User Management action controls so they wrap inside the viewport on desktop and mobile.
|
||||||
|
- [x] Improve monitoring for ordinary users with self-scoped useful fields instead of all `N/A`; make admin monitoring show the new resource allocation rows clearly.
|
||||||
|
- [x] Rebuild Docker Compose stack and run backend/frontend tests plus Playwright overflow smoke.
|
||||||
|
- [x] Use ivanwu on k3s with vllm-serve 0.6.0, CPU/memory unlimited and gpumem `10000`, then verify/clean up.
|
||||||
|
- [x] Add Review summary and lessons.
|
||||||
|
|
||||||
|
Review:
|
||||||
|
- Runtime logs were checked before and after changes. The only 502s observed were during intentional backend rebuild; final backend/nginx logs had no error/fatal/5xx entries.
|
||||||
|
- Admin can now update ivanwu without `workspace namespace conflict`; ivanwu was migrated to workspace `ivanwu`, namespace `ocdp-u-ivanwu`, default cluster k3s, CPU/memory unlimited, GPU `1`, GPU Mem `10000`.
|
||||||
|
- k3s ResourceQuota for ivanwu contains only GPU and GPU Mem hard limits; CPU/memory are omitted as unlimited. A vllm-serve `0.6.0` deployment used `harbor.bwgdi.com/library/vllm-openai:v0.17.1`, reached `deployed`, Pod `1/1 Running`, then was deleted through the platform and quota usage returned to `0/1` GPU and `0/10k` gpumem.
|
||||||
|
- Monitoring now shows ordinary users self-scoped allocation rows and admin per-user rows. The vLLM deployment was visible as CPU `1.00 cores`, memory `9.8 GiB`, GPU `1`, GPU Mem `10000`.
|
||||||
|
- Verification passed: `go test ./...`, `npm run build`, `test/frontend-playwright-smoke.py`, `test/instance_card_action_layout_playwright.py`, `test/user_management_layout_playwright.py`, `test/user_namespace_quota_api_contract.py`, and `test/unresolved_bugs_api_contract.py`.
|
||||||
|
|
||||||
|
## Restart docs and user management overflow 2026-05-18
|
||||||
|
|
||||||
|
- [x] Inspect current Docker Compose service lifecycle and identify why frontend/backend feel disconnected.
|
||||||
|
- [x] Update Makefile/README so one clear command starts the whole platform, with explicit rebuild/restart/status/log commands.
|
||||||
|
- [x] Restart the full stack through the documented command and verify health endpoints.
|
||||||
|
- [x] Reproduce User Management overflow with Playwright at desktop/tablet/mobile widths.
|
||||||
|
- [x] Fix User Management layout so action buttons and quota controls stay inside the viewport.
|
||||||
|
- [x] Run backend/frontend builds plus Playwright layout smoke.
|
||||||
|
- [x] Record Review summary and lessons.
|
||||||
|
|
||||||
|
Review:
|
||||||
|
- `make up` is now the single documented platform start command. It runs `docker compose up --build -d` for the whole stack, and old commands (`run-2`, `docker-dev`, `docker-prod`, `docker-up`) are compatibility aliases.
|
||||||
|
- `docker-compose.yml` now keeps `nginx` under `restart: unless-stopped`; `make docker-ps` and `make up` show `docker compose ps -a`, so the expected `frontend-build Exited (0)` state is visible and less confusing.
|
||||||
|
- README now explains that frontend-build is a one-shot build job and the actual frontend runtime is `nginx`, which also proxies `/api`.
|
||||||
|
- User Management layout was changed from a fixed four-column row to a responsive card layout with a wrapping action area. The app shell content column also has `min-w-0` so wide children cannot force browser overflow.
|
||||||
|
- Verification passed: `go test ./...`, `npm run build`, `make up`, health checks for backend/nginx/web, `test/user_management_layout_playwright.py` across 1440/1280/1024/900/768 widths, `test/frontend-playwright-smoke.py`, and `test/instance_card_action_layout_playwright.py`.
|
||||||
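Two rules from the task notes above lend themselves to a short sketch: blank CPU/memory quotas mean "no limit" while blank GPU/GPU Mem quotas mean an explicit `0` for ordinary users, and the precheck compares live usage minus the current release plus the new request against the hard limit. The backend is Go, so the Python below is only an illustration; the function names are hypothetical, and the `requests.cpu`, `requests.memory`, and `requests.nvidia.com/gpu` keys are assumptions (only `requests.nvidia.com/gpumem` is named in the notes).

```python
def build_hard_limits(cpu: str, memory: str, gpu: str, gpu_mem: str) -> dict[str, str]:
    """Build the ResourceQuota hard map for an ordinary user (sketch)."""
    hard = {}
    # Blank CPU/memory: omit the key entirely, which leaves it unlimited.
    if cpu.strip():
        hard["requests.cpu"] = cpu.strip()
    if memory.strip():
        hard["requests.memory"] = memory.strip()
    # Blank GPU / GPU memory: fall back to an explicit zero.
    hard["requests.nvidia.com/gpu"] = gpu.strip() or "0"
    hard["requests.nvidia.com/gpumem"] = gpu_mem.strip() or "0"
    return hard


def exceeding_resources(
    hard: dict[str, int],
    used: dict[str, int],
    current_release: dict[str, int],
    requested: dict[str, int],
) -> list[str]:
    """Return the resources whose projected usage would exceed the limit.

    Projected usage = live usage - what this release already holds
    + what the rendered chart now asks for. Integer units only (for
    example GPU count and gpumem in MB); quantity parsing is out of scope.
    """
    over = []
    for name, limit in hard.items():
        projected = used.get(name, 0) - current_release.get(name, 0) + requested.get(name, 0)
        if projected > limit:
            over.append(name)
    return sorted(over)
```

Subtracting the current release matters for update and scale operations: without it, an instance that already consumes the whole quota could never be modified, even when the new request is no larger than the old one.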
|
|||||||
@ -1,7 +1,8 @@
|
|||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
# Covers InstanceCard action layout: creates a harmless failed metadata instance
|
# Covers InstanceCard action layout. It prefers a harmless failed metadata
|
||||||
# with an invalid chart before Helm runs, opens the Instances page, verifies the
|
# instance when the API preserves one; if chart validation rejects before DB
|
||||||
# Delete button remains inside the card and viewport, clicks it, and cleans up.
|
# persistence, it falls back to mocking only the instance list/delete API so the
|
||||||
|
# visual overflow assertion remains independent from deployment behavior.
|
||||||
|
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
@ -102,6 +103,7 @@ def main() -> int:
|
|||||||
release = f"ocdp-ui-overflow-{suffix}"
|
release = f"ocdp-ui-overflow-{suffix}"
|
||||||
namespace = f"ocdp-ui-overflow-{suffix}"
|
namespace = f"ocdp-ui-overflow-{suffix}"
|
||||||
instance_id = ""
|
instance_id = ""
|
||||||
|
synthetic = False
|
||||||
try:
|
try:
|
||||||
create = request(
|
create = request(
|
||||||
"POST",
|
"POST",
|
||||||
@ -125,16 +127,44 @@ def main() -> int:
|
|||||||
break
|
break
|
||||||
time.sleep(0.5)
|
time.sleep(0.5)
|
||||||
if not instance_id:
|
if not instance_id:
|
||||||
raise AssertionError("test instance was not visible after failed chart download")
|
synthetic = True
|
||||||
|
|
||||||
with sync_playwright() as p:
|
with sync_playwright() as p:
|
||||||
browser = p.chromium.launch(headless=True)
|
browser = p.chromium.launch(headless=True)
|
||||||
page = browser.new_page(viewport={"width": 920, "height": 760})
|
page = browser.new_page(viewport={"width": 920, "height": 760})
|
||||||
page.on("dialog", lambda dialog: dialog.accept())
|
page.on("dialog", lambda dialog: dialog.accept())
|
||||||
login_ui(page)
|
login_ui(page)
|
||||||
|
if synthetic:
|
||||||
|
synthetic_instance = {
|
||||||
|
"id": f"synthetic-{suffix}",
|
||||||
|
"name": release,
|
||||||
|
"namespace": namespace,
|
||||||
|
"clusterId": cluster_id,
|
||||||
|
"registryId": registry_id,
|
||||||
|
"repository": "charts/nonexistent",
|
||||||
|
"chart": "nonexistent",
|
||||||
|
"version": "0.0.0",
|
||||||
|
"status": "failed",
|
||||||
|
"ownerUsername": ADMIN_USER,
|
||||||
|
"values": {},
|
||||||
|
"createdAt": "2026-05-15T00:00:00Z",
|
||||||
|
"updatedAt": "2026-05-15T00:00:00Z",
|
||||||
|
}
|
||||||
|
|
||||||
|
def fulfill_instances(route):
|
||||||
|
if route.request.method == "GET":
|
||||||
|
route.fulfill(status=200, content_type="application/json", body=json.dumps({"instances": [synthetic_instance], "total": 1}))
|
||||||
|
return
|
||||||
|
if route.request.method == "DELETE":
|
||||||
|
route.fulfill(status=204, body="")
|
||||||
|
return
|
||||||
|
route.continue_()
|
||||||
|
|
||||||
|
page.route("**/api/v1/clusters/*/instances", fulfill_instances)
|
||||||
|
page.route("**/api/v1/clusters/*/instances/*", fulfill_instances)
|
||||||
page.get_by_role("button", name="Instances", exact=True).click()
|
page.get_by_role("button", name="Instances", exact=True).click()
|
||||||
page.wait_for_load_state("networkidle")
|
page.wait_for_load_state("networkidle")
|
||||||
heading = page.get_by_role("heading", name=release, exact=True)
|
heading = page.get_by_role("heading", name=release, exact=True).first
|
||||||
expect(heading).to_be_visible(timeout=15000)
|
expect(heading).to_be_visible(timeout=15000)
|
||||||
card = heading.locator("xpath=ancestor::div[contains(@class, 'group')][1]")
|
card = heading.locator("xpath=ancestor::div[contains(@class, 'group')][1]")
|
||||||
delete_button = card.get_by_role("button", name="Delete", exact=True)
|
delete_button = card.get_by_role("button", name="Delete", exact=True)
|
||||||
|
|||||||
259
test/unresolved_bugs_api_contract.py
Normal file
@ -0,0 +1,259 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
# Covers unresolved API regressions: compatibility tags/metrics/stats/kubeconfig
|
||||||
|
# endpoints, values/valuesYaml conflict handling, ordinary-user namespace 403,
|
||||||
|
# and quota precheck rejection before an instance is persisted.
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import uuid
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import Any
|
||||||
|
from urllib.error import HTTPError, URLError
|
||||||
|
from urllib.parse import quote, urljoin
|
||||||
|
from urllib.request import Request, urlopen
|
||||||
|
|
||||||
|
|
||||||
|
RAW_BASE_URL = os.environ.get("BASE_URL", "http://localhost:18081/api/v1").rstrip("/")
|
||||||
|
BASE_URL = RAW_BASE_URL + "/"
|
||||||
|
ADMIN_USER = os.environ.get("ADMIN_USER", os.environ.get("BOOTSTRAP_ADMIN_USER", "admin"))
|
||||||
|
ADMIN_PASS = os.environ.get("ADMIN_PASS", os.environ.get("BOOTSTRAP_ADMIN_PASS", ""))
|
||||||
|
TARGET_CLUSTER_NAME = os.environ.get("TARGET_CLUSTER_NAME", "k3s")
|
||||||
|
TARGET_REGISTRY_NAME = os.environ.get("TARGET_REGISTRY_NAME", "harbor-bwgdi")
|
||||||
|
NGINX_REPOSITORY = os.environ.get("NGINX_CHART_REPOSITORY", "charts/nginx")
|
||||||
|
NGINX_TAG = os.environ.get("NGINX_CHART_TAG", "22.1.1")
|
||||||
|
VLLM_REPOSITORY = os.environ.get("VLLM_CHART_REPOSITORY", "charts/vllm-serve")
|
||||||
|
VLLM_TAG = os.environ.get("VLLM_CHART_TAG", "0.6.0")
|
||||||
|
GPU_MEM_MB = os.environ.get("GPU_MEM_MB", "10000")
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Response:
|
||||||
|
status: int
|
||||||
|
headers: dict[str, str]
|
||||||
|
body: str
|
||||||
|
json: Any
|
||||||
|
|
||||||
|
|
||||||
|
def parse_json(body: str) -> Any:
|
||||||
|
try:
|
||||||
|
return json.loads(body) if body else None
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def request(method: str, path: str, token: str | None = None, payload: Any = None, timeout: int = 60) -> Response:
|
||||||
|
data = None
|
||||||
|
headers = {"Accept": "application/json"}
|
||||||
|
if payload is not None:
|
||||||
|
data = json.dumps(payload).encode("utf-8")
|
||||||
|
headers["Content-Type"] = "application/json"
|
||||||
|
if token:
|
||||||
|
headers["Authorization"] = f"Bearer {token}"
|
||||||
|
url = path if path.startswith("http") else urljoin(BASE_URL, path.lstrip("/"))
|
||||||
|
try:
|
||||||
|
with urlopen(Request(url, data=data, headers=headers, method=method), timeout=timeout) as res:
|
||||||
|
body = res.read().decode("utf-8", errors="replace")
|
||||||
|
return Response(res.status, dict(res.headers), body, parse_json(body))
|
||||||
|
except HTTPError as exc:
|
||||||
|
body = exc.read().decode("utf-8", errors="replace")
|
||||||
|
return Response(exc.code, dict(exc.headers), body, parse_json(body))
|
||||||
|
except URLError as exc:
|
||||||
|
raise AssertionError(f"Cannot reach {url}: {exc}") from exc
|
||||||
|
|
||||||
|
|
||||||
|
def assert_status(resp: Response, expected: set[int], context: str) -> None:
|
||||||
|
if resp.status not in expected:
|
||||||
|
raise AssertionError(f"{context}: expected HTTP {sorted(expected)}, got {resp.status}. Body: {resp.body[:800]}")
|
||||||
|
|
||||||
|
|
||||||
|
def login(username: str, password: str) -> str:
|
||||||
|
resp = request("POST", "/auth/login", payload={"username": username, "password": password})
|
||||||
|
assert_status(resp, {200}, f"login {username}")
|
||||||
|
if not isinstance(resp.json, dict) or not resp.json.get("accessToken"):
|
||||||
|
raise AssertionError(f"login {username}: missing accessToken")
|
||||||
|
return str(resp.json["accessToken"])
|
||||||
|
|
||||||
|
|
||||||
|
def list_items(path: str, token: str, context: str) -> list[dict[str, Any]]:
|
||||||
|
resp = request("GET", path, token)
|
||||||
|
assert_status(resp, {200}, context)
|
||||||
|
if isinstance(resp.json, list):
|
||||||
|
return [item for item in resp.json if isinstance(item, dict)]
|
||||||
|
if isinstance(resp.json, dict):
|
||||||
|
for key in ("items", "clusters", "registries", "instances"):
|
||||||
|
value = resp.json.get(key)
|
||||||
|
if isinstance(value, list):
|
||||||
|
return [item for item in value if isinstance(item, dict)]
|
||||||
|
raise AssertionError(f"{context}: expected list response, got {resp.body[:800]}")
|
||||||
|
|
||||||
|
|
||||||
|
def find_by_name(items: list[dict[str, Any]], name: str, context: str) -> dict[str, Any]:
|
||||||
|
for item in items:
|
||||||
|
if item.get("name") == name:
|
||||||
|
return item
|
||||||
|
raise AssertionError(f"{context}: could not find {name!r}. Available: {[item.get('name') for item in items]}")
|
||||||
|
|
||||||
|
|
||||||
|
def encoded_repo(repo: str) -> str:
|
||||||
|
return quote(repo, safe="")
|
||||||
|
|
||||||
|
|
||||||
|
def create_test_user(admin_token: str, cluster_id: str, suffix: str, quota_gpu: str = "1") -> tuple[str, str, str]:
    username = f"api-bugs-{suffix}"
    password = "ApiBugs123!"
    namespace = f"ocdp-u-api-bugs-{suffix}"
    created = request(
        "POST",
        "/users",
        admin_token,
        {
            "username": username,
            "password": password,
            "role": "user",
            "namespace": namespace,
            "defaultClusterId": cluster_id,
            "quotaCpu": "2",
            "quotaMemory": "8Gi",
            "quotaGpu": quota_gpu,
            "quotaGpuMemory": GPU_MEM_MB,
            "isActive": True,
            "mustChangePassword": False,
        },
    )
    assert_status(created, {201}, "create API contract test user")
    return str(created.json["id"]), username, password


def instance_names(cluster_id: str, token: str) -> set[str]:
    resp = request("GET", f"/clusters/{cluster_id}/instances", token)
    assert_status(resp, {200}, "list instances")
    instances = resp.json.get("instances", []) if isinstance(resp.json, dict) else []
    return {str(item.get("name")) for item in instances if isinstance(item, dict)}


def main() -> int:
    if not ADMIN_PASS:
        raise AssertionError("ADMIN_PASS or BOOTSTRAP_ADMIN_PASS is required")

    suffix = uuid.uuid4().hex[:6]
    admin_token = login(ADMIN_USER, ADMIN_PASS)
    user_id = ""
    quota_user_id = ""
    try:
        clusters = list_items("/clusters", admin_token, "list clusters")
        cluster = find_by_name(clusters, TARGET_CLUSTER_NAME, "select target cluster")
        cluster_id = str(cluster["id"])
        registries = list_items("/registries", admin_token, "list registries")
        registry = find_by_name(registries, TARGET_REGISTRY_NAME, "select target registry")
        registry_id = str(registry["id"])

        tags = request("GET", f"/registries/{registry_id}/repositories/{encoded_repo(NGINX_REPOSITORY)}/tags?media_type=chart", admin_token)
        assert_status(tags, {200}, "registry repository tags alias")
        if NGINX_TAG not in tags.body:
            raise AssertionError(f"tags alias did not include expected {NGINX_REPOSITORY}:{NGINX_TAG}")

        metrics = request("GET", f"/monitoring/clusters/{cluster_id}/metrics", admin_token)
        assert_status(metrics, {200}, "monitoring metrics alias")
        stats = request("GET", f"/clusters/{cluster_id}/stats", admin_token)
        assert_status(stats, {200}, "cluster stats alias")

        user_id, username, password = create_test_user(admin_token, cluster_id, suffix)
        user_token = login(username, password)

        kubeconfig = request("GET", f"/clusters/{cluster_id}/kubeconfig", user_token)
        assert_status(kubeconfig, {200}, "cluster kubeconfig compatibility endpoint")
        if "apiVersion: v1" not in kubeconfig.body or "kind: Config" not in kubeconfig.body or "token:" not in kubeconfig.body:
            raise AssertionError(f"kubeconfig endpoint did not return tenant token kubeconfig: {kubeconfig.body[:500]}")
        forbidden_fields = ("client-key-data:", "client-certificate-data:")
        leaked = [field for field in forbidden_fields if field in kubeconfig.body]
        if leaked:
            raise AssertionError(f"kubeconfig endpoint leaked stored cert/key fields: {leaked}")

        conflict = request(
            "POST",
            f"/clusters/{cluster_id}/instances",
            user_token,
            {
                "name": f"values-conflict-{suffix}",
                "namespace": f"ocdp-u-api-bugs-{suffix}",
                "registryId": registry_id,
                "repository": NGINX_REPOSITORY,
                "tag": NGINX_TAG,
                "values": {"replicaCount": 1},
                "valuesYaml": "replicaCount: 2\n",
            },
        )
        assert_status(conflict, {400}, "values/valuesYaml conflict")
        if "conflict" not in conflict.body.lower():
            raise AssertionError(f"values conflict response should explain conflict, got {conflict.body[:500]}")

        before = instance_names(cluster_id, user_token)
        forbidden_name = f"namespace-forbidden-{suffix}"
        namespace_forbidden = request(
            "POST",
            f"/clusters/{cluster_id}/instances",
            user_token,
            {
                "name": forbidden_name,
                "namespace": "default",
                "registryId": registry_id,
                "repository": NGINX_REPOSITORY,
                "tag": NGINX_TAG,
                "valuesYaml": "replicaCount: 1\n",
            },
        )
        assert_status(namespace_forbidden, {403}, "ordinary user forbidden namespace")
        after = instance_names(cluster_id, user_token)
        if forbidden_name in after or before != after:
            raise AssertionError("forbidden namespace request must not create an instance")

        quota_user_id, quota_username, quota_password = create_test_user(admin_token, cluster_id, f"quota-{suffix}", quota_gpu="0")
        quota_token = login(quota_username, quota_password)
        quota_name = f"quota-precheck-{suffix}"
        quota_resp = request(
            "POST",
            f"/clusters/{cluster_id}/instances",
            quota_token,
            {
                "name": quota_name,
                "namespace": f"ocdp-u-api-bugs-quota-{suffix}",
                "registryId": registry_id,
                "repository": VLLM_REPOSITORY,
                "tag": VLLM_TAG,
                "valuesYaml": f"""resources:
  gpuLimit: 1
  gpuMem: {GPU_MEM_MB}
  cpuRequest: 1
  memoryLimit: "4Gi"
replicaCount: 1
workerSize: 1
initContainers:
  enabled: false
""",
            },
            timeout=600,
        )
        assert_status(quota_resp, {403, 422}, "quota precheck rejects over-quota deployment")
        if quota_name in instance_names(cluster_id, quota_token):
            raise AssertionError("quota precheck must reject before persisting an instance")

        print("PASS: unresolved API contract")
        return 0
    finally:
        if user_id:
            cleanup = request("DELETE", f"/users/{user_id}", admin_token)
            if cleanup.status not in {204, 404}:
                print(f"WARN: cleanup user {user_id} returned HTTP {cleanup.status}: {cleanup.body[:300]}", file=sys.stderr)
        if quota_user_id:
            cleanup = request("DELETE", f"/users/{quota_user_id}", admin_token)
            if cleanup.status not in {204, 404}:
                print(f"WARN: cleanup quota user {quota_user_id} returned HTTP {cleanup.status}: {cleanup.body[:300]}", file=sys.stderr)


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except AssertionError as exc:
        print(f"FAIL: {exc}", file=sys.stderr)
        raise SystemExit(1)
150
test/unresolved_bugs_security_gateway_contract.py
Normal file
@ -0,0 +1,150 @@
#!/usr/bin/env python3
# Covers unresolved security/gateway regressions: uniform login failures,
# per-IP+username login rate limiting with Retry-After, backend CORS allowlist,
# gateway /health JSON, Nginx version hiding, and security response headers.

import json
import os
import re
import sys
import uuid
from dataclasses import dataclass
from typing import Any
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


RAW_BASE_URL = os.environ.get("BASE_URL", "http://localhost:18081/api/v1").rstrip("/")
BASE_URL = RAW_BASE_URL + "/"
GATEWAY_URL = os.environ.get("GATEWAY_URL", "http://localhost:18080").rstrip("/")
ADMIN_USER = os.environ.get("ADMIN_USER", os.environ.get("BOOTSTRAP_ADMIN_USER", "admin"))


@dataclass
class Response:
    status: int
    headers: dict[str, str]
    body: str
    json: Any


def parse_json(body: str) -> Any:
    try:
        return json.loads(body) if body else None
    except json.JSONDecodeError:
        return None


def request(method: str, url: str, payload: Any = None, headers: dict[str, str] | None = None) -> Response:
    data = None
    req_headers = dict(headers or {})
    req_headers.setdefault("Accept", "application/json")
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        req_headers["Content-Type"] = "application/json"
    target = url if url.startswith("http") else urljoin(BASE_URL, url.lstrip("/"))
    try:
        with urlopen(Request(target, data=data, headers=req_headers, method=method), timeout=20) as res:
            body = res.read().decode("utf-8", errors="replace")
            return Response(res.status, dict(res.headers), body, parse_json(body))
    except HTTPError as exc:
        body = exc.read().decode("utf-8", errors="replace")
        return Response(exc.code, dict(exc.headers), body, parse_json(body))
    except URLError as exc:
        raise AssertionError(f"Cannot reach {target}: {exc}") from exc


def header(resp: Response, name: str) -> str:
    for key, value in resp.headers.items():
        if key.lower() == name.lower():
            return value
    return ""


def assert_status(resp: Response, expected: set[int], context: str) -> None:
    if resp.status not in expected:
        raise AssertionError(f"{context}: expected HTTP {sorted(expected)}, got {resp.status}. Body: {resp.body[:500]}")


def main() -> int:
    fake_user = f"no-such-user-{uuid.uuid4().hex[:8]}"
    existing_failure = request(
        "POST",
        "/auth/login",
        {"username": ADMIN_USER, "password": f"wrong-{uuid.uuid4().hex}"},
        {"X-Forwarded-For": "203.0.113.10"},
    )
    missing_failure = request(
        "POST",
        "/auth/login",
        {"username": fake_user, "password": "wrong-password"},
        {"X-Forwarded-For": "203.0.113.11"},
    )
    assert_status(existing_failure, {401}, "existing-user login failure")
    assert_status(missing_failure, {401}, "missing-user login failure")
    if existing_failure.json != missing_failure.json:
        raise AssertionError(f"login failures must be uniform, got {existing_failure.body!r} vs {missing_failure.body!r}")

    limited_user = f"rate-limit-{uuid.uuid4().hex[:8]}"
    rate_resp = None
    for _ in range(6):
        rate_resp = request(
            "POST",
            "/auth/login",
            {"username": limited_user, "password": "bad-password"},
            {"X-Forwarded-For": "203.0.113.12"},
        )
    assert rate_resp is not None
    assert_status(rate_resp, {429}, "login rate limit")
    if not header(rate_resp, "Retry-After"):
        raise AssertionError("login rate limit response must include Retry-After")

    allowed_origin = "http://localhost:18080"
    unknown_origin = "https://evil.example"
    allowed = request(
        "POST",
        "/auth/login",
        {"username": f"cors-allowed-{uuid.uuid4().hex[:8]}", "password": "bad-password"},
        {"Origin": allowed_origin, "X-Forwarded-For": "203.0.113.13"},
    )
    assert_status(allowed, {401}, "allowed CORS login response")
    if header(allowed, "Access-Control-Allow-Origin") != allowed_origin:
        raise AssertionError(f"allowed origin was not echoed: {allowed.headers}")
    unknown = request(
        "POST",
        "/auth/login",
        {"username": f"cors-unknown-{uuid.uuid4().hex[:8]}", "password": "bad-password"},
        {"Origin": unknown_origin, "X-Forwarded-For": "203.0.113.14"},
    )
    assert_status(unknown, {401}, "unknown CORS login response")
    if header(unknown, "Access-Control-Allow-Origin"):
        raise AssertionError(f"unknown origin must not be allowed: {unknown.headers}")

    health = request("GET", f"{GATEWAY_URL}/health")
    assert_status(health, {200}, "gateway /health")
    if health.json != {"status": "ok"}:
        raise AssertionError(f"/health must return JSON status ok, got {health.body[:300]!r}")
    server = header(health, "Server")
    if re.search(r"nginx/\d", server, re.IGNORECASE):
        raise AssertionError(f"Nginx precise version leaked in Server header: {server}")
    for name in ("X-Frame-Options", "X-Content-Type-Options", "Referrer-Policy", "Content-Security-Policy"):
        if not header(health, name):
            raise AssertionError(f"missing security header {name} on /health")

    healthz = request("GET", f"{GATEWAY_URL}/healthz")
    assert_status(healthz, {200}, "gateway /healthz")
    for name in ("X-Frame-Options", "X-Content-Type-Options", "Referrer-Policy", "Content-Security-Policy"):
        if not header(healthz, name):
            raise AssertionError(f"missing security header {name} on /healthz")

    print("PASS: unresolved security/gateway contract")
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except AssertionError as exc:
        print(f"FAIL: {exc}", file=sys.stderr)
        raise SystemExit(1)
63
test/user_management_layout_playwright.py
Normal file
@ -0,0 +1,63 @@
#!/usr/bin/env python3
# Covers admin User Management layout: quota fields and action buttons remain
# visible inside the viewport on desktop and tablet widths.

import os

from playwright.sync_api import expect, sync_playwright


FRONTEND_URL = os.environ.get("FRONTEND_URL", "http://localhost:18080")
ADMIN_USER = os.environ.get("ADMIN_USER", "admin")
ADMIN_PASS = os.environ["ADMIN_PASS"]


def login(page):
    page.goto(FRONTEND_URL, wait_until="networkidle")
    if page.locator("input[type='password']").count() == 0:
        return
    page.locator("input:not([type='password'])").first.fill(ADMIN_USER)
    page.locator("input[type='password']").first.fill(ADMIN_PASS)
    page.get_by_role("button").filter(has_text="Login").last.click()
    page.wait_for_url("**/home", timeout=15000)
    page.wait_for_load_state("networkidle")


def assert_no_action_overflow(page):
    expect(page.get_by_role("heading", name="User Management")).to_be_visible(timeout=15000)
    expect(page.get_by_text("GPU Mem").first).to_be_visible(timeout=15000)
    overflow = page.evaluate("document.documentElement.scrollWidth > document.documentElement.clientWidth + 2")
    assert not overflow, "User Management page has horizontal document overflow"
    buttons = page.locator("button").filter(has_text="Limits")
    expect(buttons.first).to_be_visible(timeout=15000)
    viewport = page.viewport_size or {"width": 0, "height": 0}
    action_labels = ("To User", "To Admin", "Limits", "Disable", "Enable", "Delete")
    for label in action_labels:
        matches = page.get_by_role("button", name=label, exact=True)
        for index in range(min(matches.count(), 8)):
            box = matches.nth(index).bounding_box()
            if not box:
                continue
            assert box["x"] >= -1, f"{label} button overflows left viewport edge"
            assert box["x"] + box["width"] <= viewport["width"] + 1, f"{label} button overflows right viewport edge"


with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    for viewport in (
        {"width": 1440, "height": 900},
        {"width": 1280, "height": 900},
        {"width": 1024, "height": 800},
        {"width": 900, "height": 760},
        {"width": 768, "height": 900},
    ):
        page = browser.new_page(viewport=viewport)
        login(page)
        page.get_by_role("button", name="Users", exact=True).click()
        page.wait_for_load_state("networkidle")
        page.wait_for_timeout(500)
        assert_no_action_overflow(page)
        page.close()
    browser.close()

print("PASS: user management layout")