Remember when OpenAI started charging $20/month for ChatGPT Plus? That pricing shift woke up every developer who'd been happily feeding proprietary code to external APIs. Text-generation-webui emerged as the answer - oobabooga's project that lets you run LLaMA, Mistral, and dozens of other models on your own hardware.
I've been using this for about 8 months now, mostly for coding help when I don't want to send proprietary code to external APIs. Works great for that, though setup can be a pain depending on your system.
The main thing that sets this apart is the backend flexibility. While Ollama basically just wraps llama.cpp and LM Studio locks you into a single closed-source app, text-generation-webui supports multiple loaders: Transformers, llama.cpp, ExLlamaV2, AutoGPTQ, and others. You pick per model - in the Model tab or with the --loader flag - so you can actually use what works best for your hardware instead of being stuck with one approach.
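You can even switch loaders without touching the UI if the server is running with --api. Here's a minimal sketch, with the caveat that the internal endpoint and payload shape are from my install and may differ on yours - and both the model name and loader value are placeholders for whatever sits in your models folder:

```python
# Hedged sketch: asking a running text-generation-webui instance to
# reload a model with a specific loader via its internal API.
# Endpoint and payload shape taken from my install; verify against your version.
import requests

payload = {
    "model_name": "Mistral-7B-Instruct-v0.2",  # placeholder: any folder under models/
    "args": {"loader": "ExLlamav2_HF"},        # placeholder: the backend that suits your GPU
}

resp = requests.post("http://127.0.0.1:5000/v1/internal/model/load", json=payload)
resp.raise_for_status()
print("Load request accepted:", resp.status_code)
```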
The Gradio web interface is decent - not as polished as ChatGPT but gets the job done. You get chat mode for conversations, instruct mode for tasks, and notebook mode for long-form generation. Plus it runs entirely offline, so your conversations stay on your machine.
Recent updates added vision model support (you can feed it images), file uploads for PDFs, and an OpenAI-compatible API. The API part is huge: anything already built against OpenAI's client libraries can point at your local instance instead of the real thing.
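To show how drop-in it is, here's a minimal sketch assuming you launched the server with --api (which exposes the OpenAI-compatible endpoint on port 5000 by default on my install). The model name is just a placeholder - the server answers with whatever model is currently loaded:

```python
# Minimal sketch: pointing the official openai client at a local
# text-generation-webui instance instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5000/v1",  # local OpenAI-compatible endpoint
    api_key="not-needed",                 # the local server doesn't check keys by default
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; the loaded model is used regardless
    messages=[{"role": "user", "content": "Summarize what a mutex does."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

The only changes from a stock OpenAI integration are the base_url and the dummy key, which is exactly why this matters for existing tools.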
Installation runs the full spectrum from "just works" to "there goes my weekend debugging CUDA dependencies." The one-click installers help, but Windows users still battle driver conflicts and path issues. Linux users typically breeze through setup, assuming they don't mind compiling things from source.
Performance scales directly with your wallet - my RTX 3090 handles most 7B models at 15-20 tokens/second, but anything 13B+ starts crawling. CPU-only inference exists in theory but barely qualifies as usable at 1-2 tokens/second.
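If you want to sanity-check your own numbers, here's a rough sketch against the same local API. It counts streamed chunks as a proxy for tokens, which is close enough for ballpark figures like the ones above:

```python
# Rough tokens/sec measurement: time a streamed completion and count
# the content chunks (approximately one token each).
import time
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")

start = time.time()
chunks = 0
stream = client.chat.completions.create(
    model="local-model",  # placeholder; the loaded model is used
    messages=[{"role": "user", "content": "Write a paragraph about rivers."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    # Guard against keep-alive/final chunks with no content
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1

elapsed = time.time() - start
print(f"~{chunks / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```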
If you're spending $20+ monthly on OpenAI and comfortable tinkering with hardware configs, the initial setup pain pays dividends. Plus you actually own your conversation history.