The Error → Chat → Wake Someone Up Pipeline
Sentry catches your errors, Slack yells about them, and PagerDuty wakes up whoever's unlucky enough to be on call. Simple concept, but the devil's in the webhook details.
Core Architecture Components
Sentry: The Error Catcher
Sentry is your early warning system. It sits in your app catching crashes, performance problems, and whatever else your users manage to break. It supports pretty much every language, though its JavaScript error fingerprinting is garbage at grouping similar errors. Version gotcha: the Sentry SDK 7.x to 8.x migration changed error boundary handling - check the migration docs or your React errors might not get captured.
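Here's roughly what an 8.x-style React setup looks like. The DSN and fallback are placeholders - check the migration guide against your actual SDK version before trusting any of this:

```typescript
import * as React from "react";
import * as Sentry from "@sentry/react";

// 8.x-style init - the DSN and environment values are placeholders.
Sentry.init({
  dsn: "https://examplePublicKey@o0.ingest.sentry.io/0",
  environment: process.env.NODE_ENV,
});

// Wrap your tree so render errors actually get captured instead of
// disappearing into a blank screen.
export function AppRoot({ children }: { children: React.ReactNode }) {
  return (
    <Sentry.ErrorBoundary fallback={<p>Something broke. We know.</p>}>
      {children}
    </Sentry.ErrorBoundary>
  );
}
```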
The important part: Sentry's webhook system fires off JSON payloads when shit hits the fan. Make sure you understand webhook signature verification or you'll be debugging security issues for weeks. These webhooks include the error message, stack trace, user context, and environment info - basically everything you need to figure out what broke and why.
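A minimal verification sketch, assuming the hex-encoded HMAC-SHA256 scheme and the sentry-hook-signature header that Sentry's integration webhooks use - confirm the header name for your integration type before shipping it:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute HMAC-SHA256 of the *raw* request body with your integration's
// client secret and compare it to the signature header Sentry sent.
export function verifySentrySignature(
  rawBody: string,
  signatureHeader: string | undefined,
  clientSecret: string
): boolean {
  if (!signatureHeader) return false;
  const expected = createHmac("sha256", clientSecret)
    .update(rawBody, "utf8")
    .digest("hex");
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signatureHeader, "hex");
  // Constant-time comparison so timing doesn't leak anything.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The detail that bites people: verify against the raw body bytes, not a JSON.parse/JSON.stringify round trip, or the digest won't match.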
Slack: The Message Hub
Slack is where your team panics together in an organized fashion. When Sentry fires a webhook, your middleware formats it into a readable message and posts it to the appropriate channel. Block Kit lets you create fancy interactive messages with buttons like "View in Sentry" or "Page the on-call engineer."
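Something like this - the SentryAlert shape and the #incidents channel are assumptions about what your middleware pulls out of the webhook payload, not Sentry's exact field names:

```typescript
// Hypothetical shape of what the middleware extracts from a Sentry webhook.
interface SentryAlert {
  title: string;
  culprit: string;     // function/file where it blew up
  webUrl: string;      // deep link back to the issue in Sentry
  environment: string;
}

// Build the chat.postMessage body: a summary section plus an action button.
export function buildSlackMessage(alert: SentryAlert) {
  return {
    channel: "#incidents", // assumption: your alerts channel
    text: `:rotating_light: ${alert.title}`, // plain-text fallback for notifications
    blocks: [
      {
        type: "section",
        text: {
          type: "mrkdwn",
          text: `:rotating_light: *${alert.title}*\n\`${alert.culprit}\` in *${alert.environment}*`,
        },
      },
      {
        type: "actions",
        elements: [
          {
            type: "button",
            text: { type: "plain_text", text: "View in Sentry" },
            url: alert.webUrl,
          },
        ],
      },
    ],
  };
}
```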
Here's where rate limiting will bite you: Slack enforces a 1 message per second limit per channel. During a major outage when errors are flying, you'll hit this limit fast. Check out Slack's rate limiting guide and plan for message batching or your integration will choke when you need it most.
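A minimal throttle for a long-running worker - on serverless you'd back this with a real queue (SQS, a job table, whatever) since in-memory state dies with the function, but the 429/Retry-After handling is the part that matters:

```typescript
const queue: object[] = [];
let draining = false;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export function enqueueSlackMessage(payload: object): void {
  queue.push(payload);
  if (!draining) void drain();
}

async function drain(): Promise<void> {
  draining = true;
  while (queue.length > 0) {
    const payload = queue[0];
    const res = await fetch("https://slack.com/api/chat.postMessage", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.SLACK_BOT_TOKEN ?? ""}`,
      },
      body: JSON.stringify(payload),
    });
    if (res.status === 429) {
      // Slack tells you how long to wait - believe it.
      const retryAfter = Number(res.headers.get("Retry-After") ?? "1");
      await sleep(retryAfter * 1000);
      continue; // retry the same message
    }
    queue.shift();
    await sleep(1000); // stay around 1 message/second per channel
  }
  draining = false;
}
```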
PagerDuty: The Human Wakeup Service
PagerDuty is expensive but it beats missing critical alerts. When your Slack integration determines an error needs human attention, it fires off an event to PagerDuty's Events API. PagerDuty then proceeds to annoy whoever's on call until they acknowledge the incident.
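The trigger itself is one POST to the Events API v2. The routing key, source, and severity below are placeholders for however you've configured your service:

```typescript
// Open (or re-trigger) a PagerDuty incident via the Events API v2.
export async function pageOnCall(summary: string, dedupKey: string): Promise<void> {
  const res = await fetch("https://events.pagerduty.com/v2/enqueue", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      routing_key: process.env.PAGERDUTY_ROUTING_KEY, // Events API v2 integration key
      event_action: "trigger",
      dedup_key: dedupKey, // same key = same incident, not another 3am page
      payload: {
        summary,                        // the first thing the responder reads
        source: "sentry-slack-bridge",  // hypothetical name for this middleware
        severity: "critical",
      },
    }),
  });
  if (!res.ok) {
    throw new Error(`PagerDuty enqueue failed: ${res.status}`);
  }
}
```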
The Event Intelligence feature tries to group related alerts so you don't get 47 pages for the same database outage. Their alert grouping algorithms use machine learning to correlate similar incidents. Works okay most of the time, fails spectacularly during actual emergencies when you need it most. But hey, at least someone gets woken up.
How the Data Actually Flows (And Where It Breaks)
Webhook Flow Architecture
Here's what happens when your production app decides to shit the bed:
- Error happens: Your app crashes, Sentry captures the stack trace and user context
- Webhook fires: Sentry sends a POST request to your serverless function (this fails constantly)
- Middleware processes: Your function verifies the webhook signature, parses the error data
- Decision logic: Function decides if this error deserves human attention based on your rules (see the sketch after this list)
- Slack notification: Posts formatted message to team channel (rate limits kick in here)
- Escalation check: If error meets severity criteria, creates PagerDuty incident
- Human wakeup: PagerDuty calls/texts whoever's on call until they acknowledge
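Here's a sketch of step 4, the decision logic. The field names and thresholds are assumptions - tune them to whatever "deserves human attention" means for your app:

```typescript
// Rough shape of the error data the middleware cares about. Sentry's real
// webhook payload is richer; map it into something like this first.
interface ErrorEvent {
  level: "fatal" | "error" | "warning" | "info";
  environment: string;
  affectedUsers: number; // hypothetical: derived from Sentry issue stats
  tags: Record<string, string>;
}

type Action = "ignore" | "slack" | "slack_and_page";

export function decideAction(event: ErrorEvent): Action {
  // Never page anyone for staging noise.
  if (event.environment !== "production") return "ignore";

  // Fatal errors or anything touching payments wakes someone up.
  if (event.level === "fatal" || event.tags["component"] === "payments") {
    return "slack_and_page";
  }

  // Wide blast radius also warrants a page.
  if (event.affectedUsers >= 50) return "slack_and_page";

  // Everything else lands in the channel for daytime triage.
  return event.level === "error" ? "slack" : "ignore";
}
```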
Where this breaks in practice:
- Webhook delivery failures (network timeouts, server overload)
- Signature verification issues when secrets rotate (War story: we spent 4 hours debugging "invalid signature" errors during Black Friday because someone rotated the webhook secret)
- Rate limiting during outage storms (Reality check: during our Redis outage, Slack rate limited us pretty quickly, after maybe 400-500 messages)
- PagerDuty API throttling when you need it most
- Cold start delays in serverless functions when milliseconds matter (Gotcha: Vercel cold starts can take 2-5 seconds, AWS Lambda 500ms-2s depending on runtime)
You'll need proper retry logic with exponential backoff and dead letter queues for the inevitable failures.
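A sketch of that retry wrapper - sendToDeadLetterQueue is a stand-in for whatever durable store you actually replay failed webhooks from:

```typescript
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff plus jitter; after the last attempt the
// failure goes to a dead letter queue instead of vanishing.
export async function withRetry<T>(
  fn: () => Promise<T>,
  sendToDeadLetterQueue: (err: unknown) => Promise<void>,
  maxAttempts = 5
): Promise<T | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) {
        await sendToDeadLetterQueue(err); // give up, but don't lose it
        return undefined;
      }
      // 1s, 2s, 4s, 8s... plus jitter so retries don't synchronize.
      const backoffMs = 1000 * 2 ** (attempt - 1) + Math.random() * 250;
      await sleep(backoffMs);
    }
  }
  return undefined;
}
```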
Lessons Learned From Production Hell
Use Slack as Mission Control
Put all incident communication in Slack channels. Makes it easy to see who's working on what, and you get a timeline for postmortems. Just don't try to manage complex incidents over DM - that's how important context gets lost.
Stop Alert Storms Before They Start
When your database goes down, you don't need 200 individual error alerts. Build in correlation logic to group related errors by service, deployment, or error signature. PagerDuty's grouping helps, but you'll still need custom logic for your specific fuckups.
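A naive version of that correlation logic. The fingerprint fields and the 5-minute window are assumptions, and an in-memory map only works in a long-lived process - on serverless, keep the timestamps in Redis or a table:

```typescript
const WINDOW_MS = 5 * 60 * 1000;
const lastAlerted = new Map<string, number>();

// Collapse errors that share a fingerprint (service + error type + release)
// within the window into a single alert.
export function shouldAlert(
  service: string,
  errorType: string,
  release: string,
  now: number = Date.now()
): boolean {
  const fingerprint = `${service}:${errorType}:${release}`;
  const last = lastAlerted.get(fingerprint);
  if (last !== undefined && now - last < WINDOW_MS) {
    return false; // already alerted for this failure recently
  }
  lastAlerted.set(fingerprint, now);
  return true;
}
```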
Include Actual Context
Every alert should answer: What broke? How many users affected? What changed recently? Link directly to Sentry issues, not just generic "something went wrong" messages. Your 3am self will thank you.
Monitor Your Monitoring
Track webhook delivery rates, API response times, and end-to-end notification latency. Your integration will fail when you need it most, so build in health checks and dead letter queues for failed webhooks. Most teams see 95%+ reliability once they implement proper retry logic.
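A minimal instrumentation sketch - emitMetric is a placeholder for your real metrics client (CloudWatch, Datadog, StatsD, take your pick):

```typescript
type MetricName =
  | "webhook.received"
  | "webhook.signature_failed"
  | "notify.delivered"
  | "notify.failed"
  | "notify.latency_ms";

// Placeholder: swap in your actual metrics client here.
async function emitMetric(name: MetricName, value = 1): Promise<void> {
  console.log(JSON.stringify({ metric: name, value, ts: Date.now() }));
}

// Wrap the notification path so every run records success/failure and
// end-to-end latency - the numbers your dashboard and health checks watch.
export async function instrumented<T>(work: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await work();
    await emitMetric("notify.delivered");
    return result;
  } catch (err) {
    await emitMetric("notify.failed");
    throw err;
  } finally {
    await emitMetric("notify.latency_ms", Date.now() - start);
  }
}
```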