# vllm-server

OpenAI-compatible model serving with vLLM. The base is CPU-safe YAML. Add `components/gpu-nvidia` in environments that provide NVIDIA GPUs, and let the instance overlay patch model name, resources, and cache size.
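
As a rough illustration of the layering described above, an instance overlay might look like the sketch below. It assumes Kustomize is the tool in use. The relative paths, the `vllm-server` Deployment and container names, the `vllm-cache` PersistentVolumeClaim, the model, and every size are hypothetical placeholders, not values taken from this repository; adjust them to match the actual base.

```yaml
# kustomization.yaml for a hypothetical instance overlay (sketch only)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base/vllm-server          # assumed location of this base

components:
  - ../../components/gpu-nvidia     # include only where NVIDIA GPUs exist

patches:
  # Model name and resources (Deployment/container names are assumptions)
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: vllm-server
      spec:
        template:
          spec:
            containers:
              - name: vllm-server
                args: ["--model", "facebook/opt-125m"]   # example model
                resources:
                  requests:
                    cpu: "2"
                    memory: 8Gi
                  limits:
                    memory: 8Gi
  # Cache size (assumes the cache is backed by a PVC named vllm-cache)
  - patch: |-
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: vllm-cache
      spec:
        resources:
          requests:
            storage: 50Gi
```

Keeping the GPU settings in a component means CPU-only environments can omit the `components` entry and still build the same base, while the per-instance values stay in the overlay.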