Announcing Our Partnership With vLLM to Advance AI Inference

Novita AI, a leading global AI cloud platform, is thrilled to announce a strategic partnership with vLLM, the pioneering open-source inference engine for large language models (LLMs). This collaboration marks a significant step forward in the two organizations' shared mission to drive innovation in AI and promote growth within the open-source community.

vLLM is renowned for its groundbreaking PagedAttention algorithm, which significantly boosts the performance and efficiency of large language models during inference. This technology has made vLLM a trusted solution for developers, offering memory-optimized inference capabilities across public clouds, model providers, and AI-powered applications. By open-sourcing its technology, vLLM has democratized access to cutting-edge AI tools, enabling developers to streamline their workflows and reduce operational costs.

“vLLM’s PagedAttention algorithm highlights the transformative potential of open-source AI,” said Junyu Huang, Co-Founder & COO at Novita AI. “Through this collaboration, we aim to help developers and organizations unlock the full range of efficiencies and opportunities these advancements bring to AI deployment.”

As part of this collaboration, Novita AI is supporting vLLM’s growth by providing access to high-performance compute resources for testing, benchmarking, and research and development. This access supports continuous improvement of vLLM’s capabilities, helping keep its tools optimized for a wide range of applications and giving developers efficient ways to deploy large language models.

Developers using Novita AI’s platform can easily deploy open-source LLMs like Llama 3.1, leveraging vLLM’s advanced inference capabilities. This streamlines the development process, speeds up application deployment, and helps organizations scale their AI solutions with ease.

“This collaboration marks the start of a long-term effort to accelerate AI advancements and equip developers with cutting-edge tools to innovate at scale,” remarked Junyu Huang.

This partnership reflects the shared commitment of Novita AI and vLLM to empower developers and advance open-source AI. By combining Novita AI’s scalable GPU cloud infrastructure with vLLM’s state-of-the-art inference engine, the collaboration aims to provide developers with the tools and resources needed to create impactful AI solutions. Additionally, the effort seeks to foster a vibrant open-source ecosystem that encourages technological innovation, ultimately driving the development of groundbreaking AI applications across industries.

Junyu Huang emphasized, “This partnership is more than just collaboration — it’s a testament to our shared mission to advance open-source AI and create new opportunities for developers around the world.”

Originally published at Novita AI

About Novita AI

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

By supporting open-source libraries such as vLLM, a fast and easy-to-use library for LLM inference and serving, Novita AI is helping shape the future of AI and drive innovation across the industry.