# Serve LLMs locally

## Serve LLMs with Llama-cpp
```python
from fastserve.models import ServeLlamaCpp

# Path to a local GGUF model file
model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"

serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
```
Alternatively, start the server from the terminal:

```bash
python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf
```
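Once the server is running, you can query it over HTTP. Below is a minimal client sketch: the host/port, endpoint path, and payload fields are assumptions for illustration, not the confirmed API. Since FastServe is built on FastAPI, the exact request schema is visible in the auto-generated interactive docs (typically at `/docs` on the serving host).

```python
# Illustrative client sketch -- the route and payload below are assumptions;
# verify the actual request schema at http://localhost:8000/docs.
import requests

response = requests.post(
    "http://localhost:8000/endpoint",  # assumed route; check /docs for the real one
    json={"prompt": "Write a haiku about serving LLMs locally."},  # assumed payload shape
    timeout=120,
)
response.raise_for_status()
print(response.json())
```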