
llama.cpp

A C/C++ inference engine for running large language models locally, with minimal setup and state-of-the-art performance across a wide range of CPU and GPU hardware.