- Fix Axios keysToSnake converting user values map keys (gpuMem -> gpu_mem)
- Add skipRecurseKeys to keysToSnake for values/valuesYaml fields
- Add values_yaml alt json tag and Normalize() in DTOs
- Check both camelCase/snake_case in enforceNamespaceValues
- Read both tailLines/tail_lines query param for diagnostics
- Admin users can freely choose namespace in LaunchModal (free-text input)
- Block only kube-system/kube-public/kube-node-lease for admin
- Regular users keep existing namespace restrictions
- Add SSE streaming pod logs endpoint (backend + frontend)
- New PodLogStreamer interface and K8s Follow:true implementation
- SSE handler with text/event-stream output
- Frontend DiagnosticsModal: Stream button, auto-scroll, live indicator
- Remove per-card Refresh button from InstanceCard (redundant with page refresh)
- Deploy bge-m3 on vllm-serve 0.6.0 (gpuMem=10000, status=deployed)
Lessons
- Do not leave real bootstrap credentials, cluster endpoints, certificates, or passwords in code fallbacks. Bootstrap defaults must be empty/disabled; real data must come only from `.env`, `BOOTSTRAP_CONFIG_JSON`, or explicit config files.
- Keep backend permission names aligned with frontend route guards. Returning legacy domain permissions like `clusters:manage:own` without UI permissions such as `configuration:clusters:manage_own` makes ordinary users appear logged in but blocked by every page.
- Treat `requests.nvidia.com/gpumem` as a vendor integer MB scalar in this project. Do not normalize it through Kubernetes memory units such as `M`, `G`, or `Gi`; use values like `10000`.
- Multi-cluster tenant resources must be scoped by `(workspace_id, cluster_id)`. Do not infer the target cluster from list order; user/workspace defaults, kubeconfig issuance, namespace creation, ResourceQuota, and deploy must all use the same selected cluster.
- For real Helm smoke tests, wait for platform instance deletion to remove the DB record before deleting the Kubernetes namespace manually. Deleting the namespace too early can make the async Helm uninstall mark the instance failed.
- When embedding Helm, setting `actionConfig.Init(..., namespace, ...)` and `Install.Namespace` is not enough. The custom `RESTClientGetter` must also override the raw kubeconfig loader namespace, or manifests without `metadata.namespace` can be created in the kubeconfig context namespace such as `default`.
- Axios keysToSnake recursively converts ALL object keys, including the user-provided values map. This silently renames Helm chart values (gpuMem → gpu_mem), causing the chart to ignore user settings. Fix: skip recursion for known data fields (values, valuesYaml) while still converting field names. Backend DTOs must provide dual json tags (camelCase + snake_case) with Normalize() fallback.
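
The keysToSnake lesson can be illustrated with a minimal TypeScript sketch. This is not the project's actual implementation: the `Json` type, the `toSnake` helper, the `DEFAULT_SKIP_RECURSE_KEYS` constant, and the exact signature of `keysToSnake` are assumptions made for illustration. Only the behavior follows the lesson: field names are still converted, but the contents of values and valuesYaml pass through untouched.

```typescript
// Hypothetical sketch; the real keysToSnake in the frontend may differ.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Convert one camelCase key to snake_case (gpuMem -> gpu_mem).
const toSnake = (key: string): string =>
  key.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());

// Assumed names of fields whose contents are user data, not API field names.
const DEFAULT_SKIP_RECURSE_KEYS = new Set<string>(["values", "valuesYaml"]);

function keysToSnake(
  input: Json,
  skipRecurseKeys: Set<string> = DEFAULT_SKIP_RECURSE_KEYS,
): Json {
  if (Array.isArray(input)) {
    return input.map((item) => keysToSnake(item, skipRecurseKeys));
  }
  if (input !== null && typeof input === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(input)) {
      // The field name itself is always converted; only the descent
      // into the value is skipped for user-owned data fields.
      out[toSnake(key)] = skipRecurseKeys.has(key)
        ? value
        : keysToSnake(value, skipRecurseKeys);
    }
    return out;
  }
  return input;
}

const payload = keysToSnake({
  chartName: "vllm-serve",
  valuesYaml: "gpuMem: 10000",
  values: { gpuMem: 10000, extraArgs: { maxModelLen: 8192 } },
});
console.log(JSON.stringify(payload));
// {"chart_name":"vllm-serve","values_yaml":"gpuMem: 10000","values":{"gpuMem":10000,"extraArgs":{"maxModelLen":8192}}}
```

Matching on the original camelCase key keeps the skip list aligned with how the frontend writes its payloads. The backend still has to accept `values_yaml` as well as `valuesYaml`, which is why the lesson also calls for dual json tags.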