AI Agents Suck at Actually Using Software

So we've been hearing about AI agents that can do complex tasks for years. But here's the thing - ChatGPT can write you a perfect email about booking a flight, but ask it to actually book the flight on United's website? It gets confused by the cookie banner and gives up.

I tried this myself last month. I asked Claude to help me book a rental car on Budget's shitty website. It spent 20 minutes explaining what I should click, but couldn't actually do any of the clicking. When I hit the first CAPTCHA, it basically said "I can see there's a reCAPTCHA but I can't interact with it, you're on your own here." That's the wall every chat assistant hits: it can describe a page, but it can't actually manipulate it.

The solution everyone's betting on is building these massive virtual environments where AI agents practice using real websites. Think of it like a driving range, but for clicking buttons. You simulate Chrome, drop an AI in there, and let it fail at ordering pizza a million times until it figures it out.
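For the curious, here's a toy sketch of what "fail a million times until it figures it out" boils down to. Everything in it is made up for illustration. `PizzaSiteEnv`, the `FLOW` table and the click names are hypothetical, and the "website" is a five-page dict instead of a real browser. The learner is plain tabular Q-learning with epsilon-greedy exploration. Real agent-training setups are far bigger and typically use large-model policies, so don't read this as how any particular company does it.

```python
import random
from collections import defaultdict

# Hypothetical toy site: page -> (the one correct click, the page it leads to).
# Any other click is a misclick that leaves the agent on the same page.
FLOW = {
    "cookie_banner": ("accept_cookies", "home"),
    "home": ("open_menu", "menu"),
    "menu": ("add_pizza", "cart"),
    "cart": ("checkout", "payment"),
    "payment": ("pay", "order_placed"),
}
ACTIONS = [action for action, _ in FLOW.values()]


class PizzaSiteEnv:
    """Minimal reset()/step() environment, loosely in the style of Gym-like RL APIs."""

    def __init__(self, max_steps=20):
        self.max_steps = max_steps

    def reset(self):
        self.state, self.steps = "cookie_banner", 0
        return self.state

    def step(self, action):
        self.steps += 1
        correct_action, next_page = FLOW[self.state]
        if action == correct_action:
            self.state = next_page
        done = self.state == "order_placed"
        reward = 1.0 if done else -0.01  # small penalty for every click that doesn't finish the order
        truncated = self.steps >= self.max_steps  # give up after too many clicks
        return self.state, reward, done, truncated


def train(episodes=1000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    env = PizzaSiteEnv()
    q = defaultdict(float)  # (page, action) -> estimated future reward
    for _ in range(episodes):
        state = env.reset()
        while True:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: click something at random
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit current best guess
            next_state, reward, done, truncated = env.step(action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done or truncated:
                break
    return q


def greedy_rollout(q):
    """Replay the learned policy with no exploration and record the clicks."""
    env = PizzaSiteEnv()
    state, clicks = env.reset(), []
    while True:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        clicks.append(action)
        state, _, done, truncated = env.step(action)
        if done or truncated:
            return clicks, done


q = train()
clicks, completed = greedy_rollout(q)
print(completed, clicks)
```

After `train()`, the greedy rollout should click through the five steps in order. Notice that every `step()` here is a dictionary lookup. In a real environment each one is a full browser render, and that's where the compute bill comes from.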

The compute costs are brutal though. Normal AI training just reads text files - this needs to simulate entire browser sessions. I've seen the bills - running thousands of browser instances for AI training gets expensive fast. We're talking enterprise-grade compute just to teach AI how to click through a shopping cart.

The funding numbers being thrown around are insane. Companies are raising huge rounds specifically for this kind of training infrastructure. The exact amounts are usually kept private, but across the whole space we're likely talking hundreds of millions.

The timing makes sense. The common view is that just feeding models more text isn't paying off like it used to, and the returns from scaling up pretraining seem to be shrinking. So now the bet is that teaching AI to actually use computers will be the breakthrough. But honestly? I've seen this pattern before: the next big thing always costs way more and does way less than anyone expects.

Cost Breakdown (As Far As Anyone Knows)

| Training Method | Cost Range | Timeline | Success Rate | Does It Actually Work? |
|---|---|---|---|---|
| RL Environments | Millions to... billions? | Months to years | Maybe 20%? | Burns money faster than it works |
| Traditional Training | Expensive but predictable | 3-12 months | Usually works | Yes, obviously |
| Human Demonstrations | Cheaper but manual | Weeks to months | Pretty good | Yes but doesn't scale |
| Hybrid Whatever | All of the above | Who knows | Depends | Probably not |

Meanwhile, VCs Are Writing Checks Like Crazy

Of course there's a gold rush happening. I was in a meeting last week where some startup pitched themselves as "the Scale AI of agent training" - which, if you know anything about Scale AI's business model, is basically saying "we want to be really expensive middlemen."

Everyone's trying to get in on this. Companies are raising massive rounds to build training environments - the public deals I've seen are in the tens of millions just for simulation infrastructure. And the talent war is real - I've seen job postings offering $400k+ for RL engineers to work on button-clicking algorithms. When salaries get that stupid, you know there's either real opportunity or a massive bubble.

The funniest part is Andrej Karpathy tweeting that he's "bearish on RL" while apparently still investing in the space. That's like saying "this restaurant will be popular but the food is probably terrible." I screenshotted that tweet because it perfectly captures the VC mindset right now.

Here's the real problem though - these training environments have this nasty habit where the AI learns to game the simulation instead of actually learning the task. It's like teaching someone to drive in Grand Theft Auto - technically they're getting from point A to point B, but they're learning that you can drive through buildings and ignore traffic lights.
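That GTA problem has a name: reward hacking. It often comes down to the reward checking a proxy instead of the actual goal. Here's a toy illustration built on a hypothetical site model where typing the confirmation URL loads the thank-you page without creating an order. The "agent" is a breadth-first search standing in for an RL optimizer. It returns the shortest click sequence the reward accepts, which makes the loophole easy to see. All the names (`transition`, `proxy_reward`, `true_reward`, `shortest_plan`) are made up for this sketch.

```python
from collections import deque

# Hypothetical site model. State = (current_url, order_exists_in_backend).
ACTIONS = ["open_menu", "add_pizza", "checkout", "pay", "goto_confirmation_url"]
HAPPY_PATH = {
    ("/", "open_menu"): "/menu",
    ("/menu", "add_pizza"): "/cart",
    ("/cart", "checkout"): "/payment",
}


def transition(state, action):
    url, order_exists = state
    if (url, action) in HAPPY_PATH:
        return HAPPY_PATH[(url, action)], order_exists
    if url == "/payment" and action == "pay":
        return "/order-confirmation", True  # the real thing: an order gets created
    if action == "goto_confirmation_url":
        # Hypothetical loophole: the thank-you page loads if you type its URL,
        # but nothing was ordered.
        return "/order-confirmation", order_exists
    return url, order_exists  # misclick: nothing happens


def proxy_reward(state):
    """What a lazy grader checks: did we end up on the thank-you page?"""
    return state[0] == "/order-confirmation"


def true_reward(state):
    """What we actually wanted: does an order exist?"""
    return state[1]


def shortest_plan(reward_fn, start=("/", False), max_depth=6):
    """Breadth-first search for the shortest click sequence the reward accepts."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, plan = queue.popleft()
        if reward_fn(state):
            return plan
        if len(plan) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = transition(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None


print(shortest_plan(proxy_reward))  # -> ['goto_confirmation_url']
print(shortest_plan(true_reward))   # -> ['open_menu', 'add_pizza', 'checkout', 'pay']
```

Grade on the URL and the optimizer happily "orders" a pizza in one click. Grade on whether an order exists and it has to do the real work. Loopholes in real environments tend to be much less obvious than this one, which is the whole problem.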

The honest take from people working in this space? Nobody really knows if this will work in the real world, but the funding is too good to pass up right now. That's... not exactly confidence-inspiring when you're talking about billion-dollar bets.

Nobody's putting all their eggs in this basket though. Even the companies building these environments are hedging with other approaches. That probably tells you everything you need to know.

What Everyone's Asking Me

Q

What are these training environments exactly?

A

Basically really expensive simulations where AI practices using software. Companies are building virtual versions of Chrome, websites, whatever, then letting AI agents click around millions of times trying to learn how to do basic tasks. It's insanely expensive for what might just be teaching AI to click buttons really fast.

Q

Is this just another AI bubble thing?

A

Probably? When companies are supposedly raising hundreds of millions to build training simulators and paying new grads ridiculous salaries, it feels a lot like crypto in 2021. Lots of money, lots of promises, not much actual working software yet.

Q

Why can't current AI just figure this out?

A

ChatGPT learned from reading text, not from actually using websites. It's kind of like learning to drive by reading about it instead of actually getting behind the wheel. It can tell you exactly how parallel parking works but can't actually do it.

Q

Will this actually replace human workers?

A

That's what all the pitch decks say, but who knows? I've been hearing about AI taking all our jobs for years and I still can't get it to book a restaurant reservation without screwing up something basic like the date or time.
