After running GitHub Projects for 18 months across 12 enterprise teams with 23,000+ active items, here's what nobody tells you about scaling beyond the marketing bullshit.
The 15,000 Item Performance Cliff
GitHub says 50,000 items per project. Reality? Performance turns to dogshit around 15,000 active items. I learned this during our Q3 planning when filtering 18,000 items locked up the UI for 45 seconds. The table view becomes unusable, searches time out, and bulk operations fail silently. The GitHub documentation doesn't mention these performance limitations.
What actually happens at scale:
- Table loading times jump from 2 seconds to 30+ seconds
- GraphQL queries start hitting timeouts (10 second limit)
- Browser memory usage spikes to 2GB+ per tab
- Mobile becomes completely worthless (it already was, but now it's worse)
We had to split our monolithic project into 4 smaller projects of roughly 8,000 items each. Not ideal, but the alternative was watching senior engineers curse at loading spinners all day.
API Rate Limits Will Ruin Your Weekend
GitHub's 5,000 requests per hour limit sounds generous until you try bulk operations. Moving 500 items between projects? That's 1,000+ API calls (a read plus an update for every item). Run a couple of those scripts back to back at full speed and you'll burn the hourly quota in 12 minutes.
Real scenario that destroyed our Friday deployment:
- Automated script to update 800 items with new sprint assignments
- Hit rate limit after 200 items, remaining 600 stuck in limbo
- Weekend on-call got paged when deployment dashboard showed "unknown" status
- Took 3 hours to manually fix because bulk retry logic didn't exist
Rate limit math that'll save your ass:
- Moving items between projects: 2 API calls per item
- Updating custom fields: 1 API call per field per item
- Adding items to projects: 3 API calls (create, link, update)
- Bulk status updates: 1 API call per item (sounds simple, isn't)
Plan for 2,500 operations max per hour if you want buffer space for other team activity.
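Here's a minimal sketch of what that buffer looks like in code, assuming Python with the requests library and a token in GITHUB_TOKEN. It checks GitHub's GraphQL rateLimit object every 50 items and backs off once usage eats into the reserve (the 2,500 matches the budget above); the project, item, and field IDs and the text-field mutation are placeholders for whatever your bulk job actually updates:

import os
import time
import requests

API = "https://api.github.com/graphql"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
RESERVE = 2500  # leave half the hourly budget for everyone else

RATE_QUERY = "query { rateLimit { remaining resetAt } }"

# Placeholder mutation: updates one text field on one project item
UPDATE = """
mutation($project: ID!, $item: ID!, $field: ID!, $text: String!) {
  updateProjectV2ItemFieldValue(
    input: {projectId: $project, itemId: $item, fieldId: $field, value: {text: $text}}
  ) { projectV2Item { id } }
}
"""

def remaining_budget():
    resp = requests.post(API, json={"query": RATE_QUERY}, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["rateLimit"]["remaining"]

def bulk_update(project_id, field_id, updates):
    # updates: list of (item_id, new_text) tuples
    for i, (item_id, text) in enumerate(updates):
        if i % 50 == 0 and remaining_budget() <= RESERVE:
            print("Into the reserve, sleeping until the window resets")
            time.sleep(3600)  # crude; parse resetAt if you want to be clever
        variables = {"project": project_id, "item": item_id,
                     "field": field_id, "text": text}
        requests.post(API, json={"query": UPDATE, "variables": variables},
                      headers=HEADERS, timeout=30).raise_for_status()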
GraphQL Queries That Don't Suck at Scale
The web interface chokes on large datasets, but the GraphQL API can handle it with proper query structure. Most developers write shit queries because the GitHub GraphQL docs don't explain the performance implications, and the official API guides stick to basic examples rather than production patterns.
Bad query pattern (kills performance):
query {
  organization(login: "yourorg") {
    projectsV2(first: 100) {
      nodes {
        items(first: 50) {           # This nested query explodes
          nodes {
            fieldValues(first: 20) { # Now you're fucked
              nodes {
                ... on ProjectV2ItemFieldTextValue {
                  text
                }
              }
            }
          }
        }
      }
    }
  }
}
Query that actually works with 20k+ items:
query($projectId: ID!, $cursor: String) {
  node(id: $projectId) {
    ... on ProjectV2 {
      items(first: 100, after: $cursor) {
        pageInfo {
          hasNextPage
          endCursor
        }
        nodes {
          id
          fieldValues(first: 10) {
            nodes {
              ... on ProjectV2ItemFieldSingleSelectValue {
                name
              }
            }
          }
        }
      }
    }
  }
}
Use cursor-based pagination, limit field queries to the essentials, and process in batches of 100 items max. Anything else will time out in production. The GraphQL best practices guide explains cursor pagination, but GitHub's implementation has its own quirks.
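If you're driving that query from a script, the pagination loop is the part people get wrong. A minimal sketch, assuming Python with requests and a token in GITHUB_TOKEN; the only state you carry between pages is endCursor:

import os
import requests

API = "https://api.github.com/graphql"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

QUERY = """
query($projectId: ID!, $cursor: String) {
  node(id: $projectId) {
    ... on ProjectV2 {
      items(first: 100, after: $cursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          id
          fieldValues(first: 10) {
            nodes {
              ... on ProjectV2ItemFieldSingleSelectValue { name }
            }
          }
        }
      }
    }
  }
}
"""

def fetch_all_items(project_id):
    items, cursor = [], None
    while True:
        resp = requests.post(
            API,
            json={"query": QUERY,
                  "variables": {"projectId": project_id, "cursor": cursor}},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()["data"]["node"]["items"]
        items.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return items
        cursor = page["pageInfo"]["endCursor"]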
Automation Patterns That Won't Break at 3AM
Enterprise automation needs error handling, retry logic, and monitoring. GitHub's basic automation works until it doesn't, and then you're debugging at 3AM wondering why 400 items are stuck in "In Progress" status.
Bulletproof automation architecture:
- Queue-based processing - Don't hit APIs directly from triggers
- Exponential backoff retry - Rate limits are temporary, failures aren't
- Dead letter queues - Some items will always fail, isolate them
- Monitoring with actual alerts - Silent failures are worse than loud ones
We use GitHub Actions with AWS SQS for reliable processing. When a PR gets merged, it queues an update instead of trying to hit the API immediately. The queue processor handles retries, rate limiting, and failure isolation. The GitHub Actions marketplace has tools for this, but most are poorly maintained.
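A sketch of the consumer side of that setup, assuming Python with boto3 and requests, an SQS queue whose URL is a placeholder here, and a redrive policy already configured so poison messages land in the dead letter queue. The message format (a GraphQL query plus variables) is our convention, not anything GitHub defines:

import json
import os
import time
import boto3
import requests

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/project-updates"  # placeholder

def apply_update(msg):
    # One project update = one GraphQL call; raise so the retry loop sees failures
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": msg["query"], "variables": msg["variables"]},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()

def process_forever():
    while True:
        batch = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                                    WaitTimeSeconds=20)
        for record in batch.get("Messages", []):
            msg = json.loads(record["Body"])
            for attempt in range(5):
                try:
                    apply_update(msg)
                    sqs.delete_message(QueueUrl=QUEUE_URL,
                                       ReceiptHandle=record["ReceiptHandle"])
                    break
                except requests.HTTPError:
                    time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, 8s, 16s
            # If every retry fails we leave the message alone; after enough
            # receives, SQS's redrive policy ships it to the dead letter queue.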
Production incident that taught us this:
- Automated sprint rollover script ran at midnight (bad idea)
- Hit rate limits, partially updated 1,200 items
- Morning standup showed half the team with "undefined" sprint assignments
- Took 4 hours to identify which items were corrupted
- Lost half a day of sprint planning while we unfucked the data
Custom Field Performance Is a Nightmare
Custom fields are where performance goes to die. Each field adds API overhead, and GitHub's field querying is inefficient as hell. We started with 15 custom fields because "flexibility." Big mistake. The field limit documentation says 50 fields max, but doesn't mention the performance implications.
Fields that destroy performance:
- Text fields with long content (descriptions, notes) - slow to query
- Date fields with complex calculations - kills roadmap view rendering
- Multiple select fields with 20+ options - UI becomes unusable
- Calculated fields depending on other fields - creates query cascades
What actually works in production:
- Priority: Single select (High/Medium/Low) - simple, fast
- Story Points: Number field - essential for velocity tracking
- Component: Single select (5-8 options max) - for filtering
- Status: Built-in status field - don't create custom status fields
- Sprint: Iteration field - works with GitHub's sprint planning
Kill everything else. Seriously. That "estimated completion date" field isn't worth the 3-second load time penalty.
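Before you start deleting, dump what you've actually accumulated. A minimal audit sketch, assuming Python with requests and a token in GITHUB_TOKEN; the ProjectV2FieldCommon fragment is the same one GitHub's field-listing examples use, and you pass in your project's node ID:

import os
import requests

FIELDS_QUERY = """
query($projectId: ID!) {
  node(id: $projectId) {
    ... on ProjectV2 {
      fields(first: 50) {
        nodes { ... on ProjectV2FieldCommon { name dataType } }
      }
    }
  }
}
"""

def list_fields(project_id):
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": FIELDS_QUERY, "variables": {"projectId": project_id}},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    for field in resp.json()["data"]["node"]["fields"]["nodes"]:
        print(f"{field['name']}: {field['dataType']}")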
Enterprise Permission Hell
GitHub Projects permissions are designed by people who never worked in enterprise environments. The model breaks down with complex org structures, external contractors, and compliance requirements.
Permission edge cases that will bite you:
- External contractors can see project data but not underlying repos
- Admin permissions don't grant project management rights automatically
- Service accounts need separate permission grants for API automation
- SSO failures lock users out of projects but not repos (confusing as hell)
- Cross-org projects require manual permission coordination
We maintain a separate permissions audit spreadsheet because GitHub's permission reporting is garbage. We also run monthly permission review meetings, because people keep getting access they shouldn't have and removing access breaks automation scripts.
Monitoring and Alerting That Actually Matters
GitHub doesn't provide operational metrics for projects, so you're flying blind without custom monitoring. Basic "it's working" checks aren't enough when automation manages critical planning data.
Essential monitoring (based on painful experience):
- API rate limit consumption (alert at 80% usage)
- Automation queue depth (alert if processing falls behind)
- Failed API calls by operation type (update vs create vs delete)
- Project performance metrics (query response times)
- Data consistency checks (items in wrong status, missing fields)
We built custom monitoring because GitHub's built-in insights are worthless for operations. The GitHub Status API doesn't cover Projects specifically, so project outages often go unnoticed until users complain.
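The rate limit alert is the cheapest of these to build. A sketch, assuming Python with requests, a Slack incoming-webhook URL in SLACK_WEBHOOK_URL, and a cron job or scheduled Actions workflow to run it; the 80% threshold matches the alert line in the list above:

import os
import requests

def check_rate_limit(threshold=0.8):
    resp = requests.get(
        "https://api.github.com/rate_limit",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=10,
    )
    resp.raise_for_status()
    resources = resp.json()["resources"]
    for bucket in ("core", "graphql"):  # REST and GraphQL have separate budgets
        limit = resources[bucket]["limit"]
        used = limit - resources[bucket]["remaining"]
        if used / limit >= threshold:
            requests.post(
                os.environ["SLACK_WEBHOOK_URL"],
                json={"text": f"GitHub {bucket} API at {used}/{limit} for this hour"},
                timeout=10,
            )

if __name__ == "__main__":
    check_rate_limit()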
Monitoring stack that saves your ass:
- DataDog for API call metrics and response times
- Custom health check endpoints for automation services
- Slack alerts for rate limit warnings (not errors, warnings)
- Weekly automated data consistency reports
- Dashboard showing project performance trends
The goal isn't perfect monitoring - it's early warning before things break so badly that you're fixing data corruption instead of preventing it.
Essential Documentation and Tools
Critical resources for enterprise implementations:
- GitHub GraphQL API Explorer
- Rate limiting documentation
- Webhook configuration
- GitHub Actions marketplace
- Project automation examples
- Enterprise security policies
- Audit log API
- API pagination best practices
- GraphQL cursor pagination
- GitHub Status API
- Third-party monitoring tools for comprehensive project health tracking