Another Jensen Huang Keynote, Another Impossible GPU
Nvidia's September 9 announcement follows the same script: Jensen walks on stage in a leather jacket, throws around numbers that sound impressive, promises to revolutionize computing, then mentions it won't ship for two years. The Rubin CPX is pitched as handling million-token contexts without the memory-related crashes that plague current systems. SemiAnalysis confirms the architecture is deliberately disaggregated: compute-heavy prefill runs on CPX while bandwidth-heavy token generation stays on HBM-equipped Rubin GPUs - basically an admission that current GPUs are shit at long context.
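To see why that split makes sense, here's a back-of-the-envelope sketch. The model size and weight precision below are made-up assumptions, not anything from Nvidia's spec sheet; the point is only the shape of the numbers: prefill does thousands of FLOPs per byte of weights it reads, decode does roughly one.

```python
# Back-of-the-envelope: why prefill and decode want different hardware.
# Model size and precision are illustrative assumptions, not Rubin CPX specs.

def arithmetic_intensity(tokens_per_pass: int, params: float, bytes_per_param: int) -> float:
    """Rough FLOPs per byte of weights read for one forward pass.

    FLOPs ~ 2 * params * tokens, bytes ~ params * bytes_per_param
    (weights streamed from memory once per pass; KV traffic ignored).
    """
    return (2 * params * tokens_per_pass) / (params * bytes_per_param)

PARAMS = 70e9   # assume a 70B-parameter model
BYTES = 2       # FP16/BF16 weights

# Prefill: the whole 128k-token prompt goes through in one batched pass.
print(f"prefill: ~{arithmetic_intensity(128_000, PARAMS, BYTES):,.0f} FLOPs per byte (compute-bound)")
# Decode: one token per step, and the weights get re-read every step.
print(f"decode:  ~{arithmetic_intensity(1, PARAMS, BYTES):,.0f} FLOP per byte (bandwidth-bound)")
```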
The spec sheet: 30 petaflops of NVFP4 compute (Nvidia's latest made-up number format) and 128GB of GDDR7 memory. Technical teardowns show it's a single massive die instead of chiplets - probably because chiplet hops add latency that ruins long-context performance. Nvidia claims 3x the attention throughput of GB300, which is exactly what you need when you're processing entire codebases or War and Peace.
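For scale, here's roughly how many weights fit in 128GB at different precisions. The bits-per-weight values are ballpark assumptions (NVFP4 counted as ~4.5 bits to cover block-scaling overhead), not official figures:

```python
# Rough capacity math: how big a model fits in 128 GB at different precisions.
# Bits-per-weight figures are approximations; treat this as a sketch, not a spec.

MEMORY_GB = 128
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "NVFP4 (approx.)": 4.5}

for fmt, bits in BITS_PER_WEIGHT.items():
    bytes_per_weight = bits / 8
    max_params = MEMORY_GB * 1e9 / bytes_per_weight
    print(f"{fmt:>16}: ~{max_params / 1e9:.0f}B parameters (weights only, no KV cache)")
```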
I've run GB300 systems that crash when context windows hit 500k tokens. The memory bandwidth just can't keep up. CPX supposedly fixes this by redesigning the entire memory subsystem. Power consumption is still classified, which means it's apocalyptically high.
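A rough KV-cache estimate shows where the memory goes. The model shape below is a generic 70B-class assumption, not a measurement of any particular system:

```python
# Why long contexts hurt: a rough KV-cache size estimate for one sequence.
# Layer count, KV heads, and head dim are generic 70B-class assumptions.

def kv_cache_gb(seq_len: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """GB of keys + values across all layers, one sequence, FP16."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value / 1e9

for tokens in (128_000, 500_000, 1_000_000):
    print(f"{tokens:>9,} tokens -> ~{kv_cache_gb(tokens):.0f} GB of KV cache per sequence")
```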
The Math Is Completely Fucked
Vera Rubin NVL144 CPX platform: 8 exaflops per rack, 7.5x the performance of today's GB300 NVL72, priced somewhere between "new yacht" and "small country GDP." Nvidia claims $5 billion in token revenue for every $100 million of hardware investment. Translation: spend $100M, then somehow process 500 trillion tokens just to hit the revenue number Nvidia is promising.
At current OpenAI pricing ($0.01 per 1k tokens), you need to process roughly 500 trillion tokens to generate $5 billion. That's every Wikipedia article ever written, processed 50,000 times. Or one really long conversation with ChatGPT.
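The arithmetic, for anyone who wants to check it - the per-token price is the same ballpark assumption as above:

```python
# Nvidia's revenue claim, spelled out. The price per token is an assumption
# (roughly $0.01 per 1k tokens); actual API pricing varies a lot.

claimed_revenue = 5e9            # "$5 billion in token revenue"
hardware_cost = 100e6            # ...per "$100 million of hardware"
price_per_token = 0.01 / 1000    # $0.01 per 1k tokens

tokens_needed = claimed_revenue / price_per_token
print(f"Tokens to hit the claim: {tokens_needed:.1e}")                    # ~5.0e14, i.e. 500 trillion
print(f"Revenue multiple on hardware: {claimed_revenue / hardware_cost:.0f}x")
```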
I ran the numbers for our last GPU cluster purchase. A 16-GPU H100 setup cost $800k, burns $50k/month in electricity, and generates maybe $200k/month in revenue on a good day. Average in the bad days and the rest of the opex, and the payback timeline is about 3 years if nothing breaks. CPX systems will cost 10x more and probably still take 3 years to pay off.
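Here's the toy payback math if you want to poke at it. The "average revenue" and "other opex" values in the second call are illustrative guesses - which is exactly the stuff vendor ROI slides leave out:

```python
# Toy payback calculator. Capex, power, and good-day revenue mirror the
# figures above; average revenue and other_opex (staff, colo, networking,
# idle time) are made-up knobs, since electricity is never the only cost.

def payback_months(capex: float, monthly_revenue: float,
                   power_cost: float, other_opex: float) -> float:
    margin = monthly_revenue - power_cost - other_opex
    return float("inf") if margin <= 0 else capex / margin

print(f"{payback_months(800_000, 200_000, 50_000, 0):.1f} months (good-day revenue, power only)")
print(f"{payback_months(800_000, 120_000, 50_000, 50_000):.1f} months (guessed average revenue, full opex)")
```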
The MGX platform supports both InfiniBand and Ethernet because Nvidia wants to sell you the networking equipment too. 1.7 petabytes/second of aggregate memory bandwidth per rack means your entire network infrastructure needs upgrading, or this becomes the world's most expensive paperweight.
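For a sense of the mismatch, compare that rack-level memory bandwidth to a fast network link. The link speeds below are generic 400G/800G assumptions, not anything Nvidia specified:

```python
# Scale check: quoted rack memory bandwidth vs. a single fast network link.
# Link speeds are generic assumptions (400G / 800G class interconnects).

rack_mem_bw = 1.7e15  # 1.7 PB/s aggregate per rack, in bytes/second

links = {"400 Gb/s": 400e9 / 8, "800 Gb/s": 800e9 / 8}  # bytes/second

for name, bw in links.items():
    print(f"{name} link: rack memory bandwidth is ~{rack_mem_bw / bw:,.0f}x faster")
```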
The Usual Suspects Line Up
Cursor's Michael Truell wants CPX for "lightning-fast code generation" because current AI can't understand a full codebase without shitting itself. Makes sense - I've watched Claude try to fix a bug in our React app and suggest importing a component that doesn't exist because it only saw 10% of the context. Full codebase understanding would actually be useful.
Runway's CEO talks about "agent-driven creative workflows" which is marketing speak for "AI that can make videos longer than 15 seconds without going insane." Current AI video breaks down faster than a 2003 Honda in winter. Longer context windows might fix the consistency problem where AI forgets what the protagonist looks like halfway through.
Magic is building 100-million-token context models for software engineering. Their pitch: AI that can see your entire codebase, documentation, GitHub history, and every Stack Overflow answer you've ever copied. Either that's the future of programming or we're training AI to write enterprise Java-level spaghetti code at unprecedented scale.
Two Years for Everyone Else to Catch Up (They Won't)
Late 2026 ship date gives AMD, Intel, and the other also-rans two years to build something competitive. Nvidia's betting CUDA lock-in will keep their 6 million developers trapped forever. They're probably right - I've tried migrating CUDA code to ROCm and it's like translating Shakespearean English to Klingon.
TechPowerUp confirms the single-die approach reduces manufacturing complexity, which means fewer things can go wrong during production. Smart move when you're building something this complicated.
AMD's MI300X is decent hardware, but its software ecosystem is a ghost town. Tom's Hardware notes that CPX's disaggregated architecture is unique because nobody else is crazy enough to build specialized chips for specific AI workloads. Intel's Gaudi costs less, but good luck finding developers who want to rewrite their entire stack.
NotebookCheck found six different Rubin chips at TSMC, confirming this isn't incremental bullshit - it's a complete platform rebuild. PC Mag's take is that Rubin addresses "AI's skyrocketing costs" by making them skyrocket even harder.
The big question: do we actually need million-token context or is this another useless benchmark? Most AI apps don't need to memorize entire novels. But if you're building AI lawyers that need to understand case law or AI coders that need full repository context, this might be the only option that doesn't crash when memory pressure hits.
NIM microservices will allegedly be ready when hardware ships. Assuming your power grid can handle whatever apocalyptic wattage this thing draws.