vllm-server
OpenAI-compatible model serving with vLLM.
The base manifests are CPU-safe: they deploy without requesting GPUs. Add
components/gpu-nvidia in environments that provide NVIDIA GPUs, and let the
instance overlay patch the model name, resources, and cache size.
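
As a rough illustration of that layering, an instance overlay might look like the sketch below. Everything specific in it is an assumption made for illustration and is not taken from this repository: the relative paths, the resource names `vllm-server` and `vllm-cache`, the use of a PersistentVolumeClaim for the cache, the container index, and the model identifier. Adjust each to match the actual base.

```yaml
# kustomization.yaml for a hypothetical instance overlay (all names assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base/vllm-server          # assumed relative path to this base

components:
  - ../../components/gpu-nvidia     # assumed path; omit on CPU-only clusters

patches:
  # Model name and resources, assuming a Deployment named vllm-server
  # whose first container already defines args and resources.
  - target:
      kind: Deployment
      name: vllm-server
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/args
        value: ["--model", "example-org/example-model"]   # placeholder model
      - op: replace
        path: /spec/template/spec/containers/0/resources
        value:
          requests:
            cpu: "4"
            memory: 16Gi
          limits:
            memory: 16Gi
  # Cache size, assuming the cache is backed by a PVC named vllm-cache.
  - target:
      kind: PersistentVolumeClaim
      name: vllm-cache
    patch: |-
      - op: replace
        path: /spec/resources/requests/storage
        value: 50Gi
```

The JSON 6902 `replace` operations only succeed if the target paths already exist in the base; if the base omits a field, `add` would be the operation to use instead. Rendering with `kubectl kustomize <overlay-dir>` before applying is a cheap way to confirm the patches land where intended.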