What Claude Computer Use Actually Does
Claude Computer Use is basically Claude with eyes and hands. It takes screenshots of your desktop, figures out what's on screen using computer vision, then clicks and types like you would. No APIs, no special integrations - just raw screenshot analysis and coordinate clicking.
This is huge because most software doesn't have APIs, especially legacy enterprise crap that's been running since Windows XP. I've used it to automate our ancient ERP system that predates REST APIs. Claude just sees the UI and clicks through it like a human would, except it doesn't get tired or make typos at 3am.
How It Actually Works (And Why It Breaks)
Here's how it actually works: Claude takes a screenshot, uses computer vision to identify clickable elements, calculates pixel coordinates, then sends mouse/keyboard commands. After each action, it takes another screenshot to see what happened.
This feedback loop is where I've watched everything go wrong. Claude clicks the wrong button because UI elements moved, or it gets completely confused by modal dialogs that pop up unexpectedly. The pixel counting accuracy problem is real - Claude has to literally count pixels to know where to click, which breaks when screen resolutions change.
I've watched it click on button shadows, get stuck in infinite loops when websites dynamically load content, and completely give up when faced with CAPTCHAs. But when it works, it's pretty satisfying watching an AI navigate through complex multi-step processes.
What Models Actually Work (August 2025)
Right now you can use Computer Use with:
- Claude Sonnet 3.5: The original, works but scrolling is janky (deprecated)
- Claude Sonnet 3.7: Better scrolling and stability, has extended thinking mode
- Claude Sonnet 4: Current flagship - much more reliable, handles complex interactions well
- Claude Opus 4/4.1: Most capable but expensive, overkill for most automation tasks
The difference is night and day. Sonnet 3.5 randomly fails at scrolling through long pages and I've given up on it. Sonnet 3.7 fixed most stability issues I was hitting. Sonnet 4 is where it gets reliable enough that I actually use it for real work without expecting it to break every 5 minutes.
Stick with Sonnet 4 for most use cases. The older models will frustrate you with random failures.
Docker Setup Hell
You need Docker with X11 forwarding, which is its own special kind of pain. The official setup uses Xvfb (virtual framebuffer) with a desktop environment running inside the container.
Plan on spending at least 2 hours getting display forwarding working correctly. On macOS, you'll need XQuartz and it breaks every OS update. On Windows, forget about it - Docker Desktop's X11 forwarding is completely broken half the time. Linux works best but you'll spend 30 minutes fighting xhost permissions.
The Docker container randomly stops working after system updates and nobody knows why. I have a bash script that restarts the container every 6 hours because of memory leaks. Your display will randomly go black and you'll have to rebuild the entire thing.
Keep your resolution at 1280x800 or lower - higher resolutions make Claude less accurate because it has to resize images. I learned this the hard way after wondering why it kept missing buttons on my 4K monitor.