
llama.cpp

A C/C++ inference engine for running large language models locally, with minimal setup and state-of-the-art performance across a wide range of CPU and GPU hardware.