For years we've been hearing about AI agents that can do complex tasks. But here's the thing: ChatGPT can write you a perfect email about booking a flight, but ask it to actually book the flight on United's website and it gets confused by the cookie banner and gives up.
I tried this myself last month. I asked Claude to help me book a rental car on Budget's shitty website. It spent 20 minutes explaining what I should click, but couldn't actually do any of the clicking. It hit the first CAPTCHA and basically said "I can see there's a reCAPTCHA but I can't interact with it, you're on your own here." It's the same wall every AI hits: they can see the DOM but can't actually manipulate it.
The solution everyone's betting on is building these massive virtual environments where AI agents practice using real websites. Think of it like a driving range, but for clicking buttons. You simulate Chrome, drop an AI in there, and let it fail at ordering pizza a million times until it figures it out.
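To make the driving-range idea concrete, here's a rough sketch of what one of these practice loops looks like. Everything in it is made up for illustration: `ToyPizzaSite`, the `FLOW` steps, and both policies are hypothetical stand-ins, and a real setup would drive an actual browser instead of a four-step state machine. It also only shows the environment and the episode loop, not the part where the model learns from its failures.

```python
import random

# Hypothetical click sequence for a toy pizza site (an assumption for
# illustration; a real environment would drive an actual browser).
FLOW = ["dismiss_cookie_banner", "pick_pizza", "add_to_cart", "checkout"]
ACTIONS = FLOW + ["click_ad", "scroll"]  # two distractor actions


class ToyPizzaSite:
    """Stand-in for a browser session: tracks progress through FLOW."""

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        """Start a fresh session and return the initial observation."""
        self.progress = 0
        self.steps = 0
        return self.progress

    def step(self, action):
        """Apply one click and return (observation, reward, done)."""
        self.steps += 1
        # Only the correct next click moves the order forward.
        if self.progress < len(FLOW) and action == FLOW[self.progress]:
            self.progress += 1
        ordered = self.progress == len(FLOW)
        done = ordered or self.steps >= self.max_steps
        return self.progress, (1.0 if ordered else 0.0), done


def run_episode(env, policy):
    """Run one attempt; return 1.0 if the pizza got ordered, else 0.0."""
    obs = env.reset()
    reward, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
    return reward


def make_random_policy(seed):
    """An agent that clicks at random, i.e. one that has learned nothing."""
    rng = random.Random(seed)
    return lambda obs: rng.choice(ACTIONS)


def scripted_policy(obs):
    """An agent that already knows the right click for each stage."""
    return FLOW[obs]


def success_rate(policy, episodes):
    """Fraction of episodes that end with a completed order."""
    return sum(run_episode(ToyPizzaSite(), policy) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    print("random agent:  ", success_rate(make_random_policy(0), 1000))
    print("scripted agent:", success_rate(scripted_policy, 1000))
```

Training is about closing the gap between those two numbers. Now swap the toy class for a full browser per episode and multiply by a million attempts.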
The compute costs are brutal though. Normal AI training just reads text files, but this needs to simulate entire browser sessions. I've seen the bills, and running thousands of browser instances for AI training gets expensive fast. We're talking enterprise-grade compute just to teach AI how to click through a shopping cart.
The funding numbers being thrown around are insane. Companies are raising huge rounds specifically for this kind of training infrastructure. The exact amounts are usually kept private, but we're definitely talking hundreds of millions of dollars across the whole space.
The timing makes sense. Everyone knows just feeding models more text isn't working like it used to. The scaling laws are getting weird. So now the bet is that teaching AI to actually use computers will be the breakthrough. But honestly? I've seen this pattern before: the next big thing always costs way more and does way less than anyone expects.