GraphQL's biggest lie is that it's performant out of the box. The reality? Your innocent "fetch users and their posts" query will hammer your database with hundreds of queries and bring everything to its knees.
I learned this the hard way when our "simple" user feed took forever to load. Each user card was a separate SELECT, each post was another SELECT, and so on. Classic N+1 hell.
DataLoader fixes this by batching the fuck out of your database calls. It collects the individual lookups your resolvers make while a GraphQL query runs and sends them to the database in bulk. Instead of a shitload of individual queries, you get maybe 2-3 batched ones.
The Problem: GraphQL Resolvers Are Dumb
GraphQL resolvers run independently and have no clue what other resolvers are doing. This fundamental resolver design creates performance disasters. So when you query:
{
  users {
    name
    posts {
      title
    }
  }
}
You get:
- One query for users: SELECT * FROM users
- N queries for posts: SELECT * FROM posts WHERE user_id = 1, SELECT * FROM posts WHERE user_id = 2, etc.
This is fine for 10 users. It'll bring your server to its knees with 100 users and murder your database with 1000.
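To make the math concrete, here's a minimal sketch of what naive resolvers effectively do. Everything in it is made up for illustration: the in-memory arrays stand in for tables, and selectAllUsers, selectPostsByUser, and queryCount are hypothetical helpers, not part of any GraphQL library.

```typescript
// Hypothetical in-memory "database" that counts every query it receives.
type User = { id: number; name: string };
type Post = { id: number; userId: number; title: string };

const users: User[] = [1, 2, 3].map((id) => ({ id, name: `user${id}` }));
const posts: Post[] = users.map((u) => ({
  id: u.id * 10,
  userId: u.id,
  title: `post by ${u.name}`,
}));

let queryCount = 0;

// Stands in for: SELECT * FROM users
function selectAllUsers(): User[] {
  queryCount++;
  return users;
}

// Stands in for: SELECT * FROM posts WHERE user_id = ?
function selectPostsByUser(userId: number): Post[] {
  queryCount++;
  return posts.filter((p) => p.userId === userId);
}

// The posts lookup runs once per user, because each resolver call
// knows nothing about its siblings.
function resolveUsersWithPosts() {
  return selectAllUsers().map((user) => ({
    name: user.name,
    posts: selectPostsByUser(user.id).map((p) => ({ title: p.title })),
  }));
}

const feed = resolveUsersWithPosts();
console.log(queryCount); // 4: one users query plus one posts query per user
```

Three users cost four queries. Swap in 1000 users and the same code issues 1001.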
How DataLoader Actually Works
DataLoader collects every .load() call made during a single tick of the event loop, then calls your batch function once with all the requested IDs. This event loop batching mechanism is what makes DataLoader work:
-- Instead of 100 individual queries:
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;
-- ... 98 more queries
-- You get one batched query:
SELECT * FROM users WHERE id IN (1,2,3...100);
The trick is that DataLoader does this automatically. You write code that looks like it's making individual database calls, but under the hood it's batching everything. Facebook's original DataLoader pattern has become the standard for GraphQL performance optimization.
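If the batching sounds like magic, a toy version helps. The sketch below is not the real DataLoader and skips everything that makes the library production-ready (caching, per-key errors, batch size limits). TinyLoader and the fake batch function are hypothetical names I made up to show the core idea: queue the keys, flush once after the current synchronous code finishes.

```typescript
// A toy batcher, NOT the real DataLoader: no caching, no per-key errors.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

type Pending<K, V> = {
  key: K;
  resolve: (value: V) => void;
  reject: (err: unknown) => void;
};

class TinyLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in a tick schedules one flush; later loads just queue up.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    // One call to the batch function for every key collected so far.
    this.batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i])),
      (err) => batch.forEach((item) => item.reject(err)),
    );
  }
}

let batchCalls = 0;

// Hypothetical batch function standing in for:
// SELECT * FROM users WHERE id IN (...)
const userLoader = new TinyLoader<number, string>(async (ids) => {
  batchCalls++;
  return ids.map((id) => `user${id}`);
});

// Three "individual" loads issued in the same tick...
const result = Promise.all([
  userLoader.load(1),
  userLoader.load(2),
  userLoader.load(3),
]);

// ...are served by a single call to the batch function.
result.then((names) => console.log(names, batchCalls));
```

The calling code still reads like three separate lookups, which is the whole point. Note that resolving values[i] for the i-th key only works if the batch function returns results in key order.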
Every Implementation Has Gotchas
JavaScript - The original and still the best. Just works, but you'll fuck up the ordering requirement (your batch function has to return results in the same order as the keys it was given) and spend hours debugging wrong data.
Java - CompletableFuture hell. Works fine once you understand Java's async patterns, which you probably don't.
Go - Goroutine-safe but the documentation is shit. You'll have to read the source code.
Python - aiodataloader is barely maintained. Use Strawberry GraphQL's built-in implementation instead, which actually follows modern Python async patterns.
Ruby - Shopify's graphql-batch actually works well. Rare for a Ruby library.
Real Performance Numbers
Production teams commonly see 90-95% fewer database queries for the same data. That's the difference between 200 queries per page load and maybe 10-20.
We went from page loads that took ages to sub-second responses. But here's the thing - DataLoader only helps if you implement it correctly. And there are about 5 ways to fuck it up that the docs don't warn you about.
The ordering requirement alone has fucked over more developers than any other API design decision. I've watched senior engineers spend entire afternoons debugging this shit.
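Here's roughly what the fix looks like. A WHERE id IN (...) query returns rows in whatever order the database feels like and silently skips missing IDs, so the batch function has to re-align the rows to the keys before returning. This is a sketch under assumptions: UserRow, rowsFromDb, and alignToKeys are hypothetical names, and returning null for a missing row is one reasonable choice (returning an Error for that slot is another).

```typescript
type UserRow = { id: number; name: string };

// Hypothetical query result for the keys [3, 1, 2]: rows come back in the
// database's order, and there is no row for id 2.
const rowsFromDb: UserRow[] = [
  { id: 1, name: "ada" },
  { id: 3, name: "grace" },
];

// Re-align rows with the requested keys: same length, same order,
// with null wherever nothing was found.
function alignToKeys(keys: number[], rows: UserRow[]): (UserRow | null)[] {
  const byId = new Map<number, UserRow>();
  for (const row of rows) {
    byId.set(row.id, row);
  }
  return keys.map((key) => byId.get(key) ?? null);
}

const aligned = alignToKeys([3, 1, 2], rowsFromDb);
console.log(aligned.map((row) => (row ? row.name : null))); // [ 'grace', 'ada', null ]
```

Return rowsFromDb as-is and the loader hands ada's record to whoever asked for user 3. No error, no warning, just wrong data.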