vLLM

A high-throughput, memory-efficient engine for LLM inference and serving. vLLM uses PagedAttention to manage the attention key-value (KV) cache and reports up to 24x higher throughput than Hugging Face Transformers. It is widely used for production LLM inference.
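To give a feel for the idea behind PagedAttention, here is a minimal toy sketch of paged KV-cache bookkeeping. It is not vLLM's implementation: the names `ToyBlockAllocator`, `ToySequence`, and the `BLOCK_SIZE` value are hypothetical and chosen only for illustration. The sketch assumes the cache is split into fixed-size blocks drawn from a shared pool, so a sequence takes memory only as it grows instead of reserving its maximum length up front.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value, an assumption)


@dataclass
class ToyBlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool (hypothetical)."""
    num_blocks: int
    free_blocks: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.free_blocks = list(range(self.num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free KV-cache blocks")
        return self.free_blocks.pop()

    def release(self, blocks: List[int]) -> None:
        self.free_blocks.extend(blocks)


@dataclass
class ToySequence:
    """Tracks which physical blocks hold one sequence's tokens (hypothetical)."""
    allocator: ToyBlockAllocator
    num_tokens: int = 0
    block_table: List[int] = field(default_factory=list)

    def append_token(self) -> None:
        # A new physical block is requested only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def free(self) -> None:
        # Return every block to the shared pool for other sequences to reuse.
        self.allocator.release(self.block_table)
        self.block_table = []
        self.num_tokens = 0


allocator = ToyBlockAllocator(num_blocks=8)
seq_a = ToySequence(allocator)
seq_b = ToySequence(allocator)
for _ in range(20):
    seq_a.append_token()  # 20 tokens occupy 2 blocks
for _ in range(5):
    seq_b.append_token()  # 5 tokens occupy 1 block
```

Because blocks go back to the pool as soon as a sequence finishes, more concurrent requests can fit in the same memory, which is the intuition behind the throughput claim above.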
