AWS X-Ray: Distributed Tracing & 2027 Migration Strategy
Critical Timeline Warning
X-Ray SDKs reach end-of-support: February 25, 2027
- Maintenance mode begins: February 25, 2026 (no new features, critical bugs only)
- Migration window: 12-18 months for complex microservices
- AWS will not extend deadline - 18+ months notice given
Configuration That Actually Works
Production-Ready Settings
- Sampling: Start low (around 1%) rather than 100% to avoid bill shock; the SDK default rule is 1 request/second plus 5% of additional requests
- Custom sampling rules: aim for 100% of errors and roughly 0.1% of successful requests; sampling decisions are made before the response is known, so error-based sampling needs custom instrumentation on top of the rules (see the sketch after this list)
- Daemon: Use the official `aws-xray-daemon` Docker image or a systemd service
- Port: UDP 2000 (daemon crashes = lost traces)
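If you'd rather codify that low rate than click around the console, here's a minimal sketch of registering a centralized sampling rule with boto3. The service name and rate are assumptions, not a prescription:

```python
# Sketch: register a centralized sampling rule via boto3 (assumes AWS
# credentials are configured; "checkout-api" is a hypothetical service name).
import boto3

xray = boto3.client("xray")

xray.create_sampling_rule(
    SamplingRule={
        "RuleName": "checkout-low-rate",
        "Priority": 100,              # lower number wins when rules overlap
        "FixedRate": 0.01,            # sample 1% after the reservoir is used
        "ReservoirSize": 1,           # always trace at least 1 request/second
        "ServiceName": "checkout-api",
        "ServiceType": "*",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": "*",
        "ResourceARN": "*",
        "Version": 1,
    }
)
```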
AWS Service Integration (Automatic)
- RDS, DynamoDB, SQS, SNS, ElastiCache
- Lambda (built-in, just enable tracing; see the sketch after this list)
- Elastic Beanstalk (pre-installed daemon)
- ECS/EKS (run daemon as sidecar/DaemonSet)
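For Lambda, "just enable tracing" can be done from code as well as the console. A quick boto3 sketch, where the function name is hypothetical:

```python
# Sketch: enable active tracing on an existing Lambda function.
# "orders-handler" is a hypothetical function name.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_function_configuration(
    FunctionName="orders-handler",
    TracingConfig={"Mode": "Active"},   # "PassThrough" is the default
)
```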
Language SDK Reliability
Language | Production Readiness | Notes |
---|---|---|
Java | Excellent | Spring Boot integration solid |
Node.js | Good | Express.js works, manual for others |
Python | Decent | Flask/Django middleware requires work (see the sketch after this table) |
.NET | Fair | ASP.NET Core fine, Framework janky |
Go | Basic | Expect boilerplate code |
Ruby | Limited | Rails integration exists, docs poor |
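To give a sense of the "requires work" rating, here's a minimal Flask sketch with the Python SDK (which, remember, is on the 2027 chopping block). It assumes a daemon reachable on 127.0.0.1:2000 and a made-up service name:

```python
# Sketch: minimal Flask instrumentation with the Python X-Ray SDK.
# Assumes `pip install aws-xray-sdk` and a daemon on 127.0.0.1:2000.
from flask import Flask
from aws_xray_sdk.core import xray_recorder, patch_all
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware

app = Flask(__name__)

xray_recorder.configure(
    service="payments-api",              # hypothetical service name
    daemon_address="127.0.0.1:2000",     # UDP endpoint of the X-Ray daemon
    sampling=True,
)
XRayMiddleware(app, xray_recorder)       # traces incoming HTTP requests
patch_all()                              # patches supported libs (boto3, requests, ...) if installed

@app.route("/health")
def health():
    return "ok"
```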
Resource Requirements
Time Investment
- Simple Lambda: Few hours per function
- Complex microservices: Weeks per service for migration
- Initial setup: 1-2 days for basic configuration
- Migration testing: 6-12 months for enterprise systems
Expertise Requirements
- IAM permission management (xray:PutTraceSegments alone is insufficient; the daemon also needs xray:PutTelemetryRecords and the GetSampling* actions, see the policy sketch after this list)
- Container networking for ECS/EKS deployments
- UDP networking troubleshooting
- OpenTelemetry knowledge for migration
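On the IAM point, a sketch of the write-side permissions the daemon and SDKs need. Role and policy names are hypothetical; in practice the managed policy AWSXRayDaemonWriteAccess covers the same actions:

```python
# Sketch: attach X-Ray write permissions to a role via an inline policy.
# Role and policy names are hypothetical.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "xray:PutTraceSegments",
            "xray:PutTelemetryRecords",
            "xray:GetSamplingRules",
            "xray:GetSamplingTargets",
            "xray:GetSamplingStatisticsSummaries",
        ],
        "Resource": "*",
    }],
}

iam.put_role_policy(
    RoleName="ecs-task-role",            # hypothetical role name
    PolicyName="xray-write",
    PolicyDocument=json.dumps(policy),
)
```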
Financial Costs
Free Tier (genuinely useful):
- 100K traces recorded/month
- 1M traces scanned/month
Paid Pricing:
- $5 per 1M traces recorded
- $0.50 per 1M traces scanned
- $1 per 1M traces for ML-powered Insights
Cost Disaster Examples:
- 100% sampling on high-volume service: $847 weekend bill
- Default sampling (1 request/sec reservoir + 5%): 100K+ traces/day on busy services (see the back-of-envelope math below)
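Rough math behind those numbers, with made-up request volumes:

```python
# Sketch: back-of-envelope X-Ray recording cost (request volumes are made up).
RECORD_PRICE_PER_MILLION = 5.00    # USD per 1M traces recorded

def monthly_recording_cost(requests_per_second: float, sample_rate: float) -> float:
    traces = requests_per_second * 86_400 * 30 * sample_rate
    return traces / 1_000_000 * RECORD_PRICE_PER_MILLION

# A 500 req/s service, recording cost only (scans and Insights are extra):
print(monthly_recording_cost(500, 1.00))   # 100% sampling: ~$6,480/month
print(monthly_recording_cost(500, 0.01))   # 1% sampling:   ~$65/month
```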
Critical Warnings
What AWS Documentation Doesn't Tell You
- UDP daemon failures lose traces silently
- 30-day retention only (no historical analysis)
- AWS-only (multi-cloud requires different solution)
- Service map breaks above ~1000 spans (debugging impossible)
- Daemon must be monitored or traces disappear during incidents
Migration Breaking Points
- Custom instrumentation code requires complete rewrite
- Testing migration across dozens of services takes months
- Edge cases not covered in official migration guide
- OpenTelemetry adds operational complexity (running an OTel Collector alongside the X-Ray daemon during the transition)
Production Failure Scenarios
- Daemon crashes during incident (no debugging capability)
- Sampling misconfiguration causes budget overrun
- IAM permission gaps break trace collection
- Container networking issues prevent daemon communication
- High trace volume overwhelms collection pipeline
Decision Criteria
Choose X-Ray When:
- Already on AWS with existing X-Ray implementation
- Simple Lambda-based architecture
- Need immediate distributed tracing (pre-migration)
- AWS service integration is primary requirement
Avoid X-Ray When:
- Starting new projects in 2025+ (EOL in 2027)
- Multi-cloud or on-premises requirements
- Need custom dashboards or long-term data retention
- Operating under tight budget constraints
Alternative Evaluation
Solution | Migration Effort | Long-term Viability | AWS Integration |
---|---|---|---|
AWS Distro for OpenTelemetry | Medium | High | Native |
Jaeger | High | High | Manual |
New Relic/Datadog | Medium | High | Agent-based |
Implementation Reality
What Actually Works
- Error correlation: Shows cascading failures across services
- Performance analytics: Compares good vs bad traces for patterns
- Service maps: Visual representation of service dependencies
- Subsegments: Break down slow operations (200ms auth + 2.8s DB query); see the sketch after this list
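A sketch of what that subsegment breakdown looks like with the Python SDK, assuming the recorder is configured as in the Flask example above and you're already inside a traced request. Operation names and sleeps are stand-ins for real work:

```python
# Sketch: split a slow handler into subsegments and tag it for filtering.
import time
from aws_xray_sdk.core import xray_recorder

def handle_checkout(user_id, cart):
    with xray_recorder.in_subsegment("auth") as sub:
        sub.put_annotation("user_id", user_id)   # indexed: filterable in the console
        time.sleep(0.2)                          # stand-in for the real auth call

    with xray_recorder.in_subsegment("db-query") as sub:
        sub.put_metadata("cart_size", len(cart)) # not indexed, debugging context only
        time.sleep(2.8)                          # stand-in for the slow query
        return []
```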
Common Implementation Problems
- Daemon installation/management outside managed services
- IAM permission complexity beyond basic xray:PutTraceSegments
- Container networking configuration for sidecar deployments
- Sampling rule optimization to prevent cost overruns
Performance Impact
- 1-2% CPU overhead (generally acceptable)
- UDP async transmission (minimal latency impact)
- Bigger issue: daemon reliability and monitoring
Migration Strategy (Required by 2027)
Phase 1: Assessment (Now - 2025)
- Inventory current X-Ray usage across services
- Learn OpenTelemetry fundamentals
- Pilot ADOT on non-critical services
- Establish migration testing procedures
Phase 2: Migration Planning (2025-2026)
- Service-by-service migration plan
- Integration testing framework
- Rollback procedures for failed migrations
- Team training on OpenTelemetry
Phase 3: Execution (2026-Early 2027)
- Gradual rollout starting with least critical services
- Parallel running of X-Ray and OpenTelemetry
- Validation of trace data consistency
- Final cutover before February 2027 deadline
Migration Options Ranked by Difficulty
1. AWS Distro for OpenTelemetry: Easiest path, works with the X-Ray backend (see the sketch after this list)
2. OpenTelemetry + AWS Application Signals: AWS's future direction (currently in preview)
3. OpenTelemetry + Jaeger: Full vendor independence, highest operational overhead
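For a feel of the ADOT path, a minimal OpenTelemetry sketch that exports OTLP to a local collector, which can forward to the X-Ray backend. The endpoint and service name are assumptions, the collector needs its own pipeline config, and for X-Ray-compatible trace IDs you would also wire in the AWS X-Ray ID generator and propagator from the opentelemetry-sdk-extension-aws package:

```python
# Sketch: OTel instrumentation exporting OTLP to a local ADOT Collector.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "payments-api"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("checkout") as span:
    span.set_attribute("user_id", "u-123")   # roughly equivalent to an X-Ray annotation
```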
Operational Intelligence
Success Patterns
- Start with 1% sampling, increase based on data needs
- Monitor daemon health as critically as application health
- Use annotations for filtering (user IDs, feature flags, error types)
- Export historical data before the 30-day retention window expires (see the export sketch after this list)
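One way to get data out before it ages off, sketched with boto3. The time range and output format are illustrative only:

```python
# Sketch: dump trace summaries to a local JSONL file before retention expires.
import datetime
import json
import boto3

xray = boto3.client("xray")
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=7)

kwargs = {"StartTime": start, "EndTime": end}
with open("trace-summaries.jsonl", "w") as out:
    while True:
        resp = xray.get_trace_summaries(**kwargs)
        for summary in resp.get("TraceSummaries", []):
            out.write(json.dumps(summary, default=str) + "\n")
        token = resp.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```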
Failure Patterns
- 100% sampling on production traffic
- Ignoring daemon health monitoring
- Complex IAM permissions without proper testing
- Assuming X-Ray will work outside AWS ecosystem
Emergency Procedures
- Daemon failure: Check systemd status or container health and restart the service (see the probe sketch after this list)
- High costs: Immediately reduce sampling percentage
- Missing traces: Verify IAM permissions and daemon connectivity
- Service map overload: Implement trace filtering by service/operation
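A crude liveness probe you can wire into whatever already pages you. It assumes daemon v3+, which serves the sampling-rule proxy over TCP 2000 next to the UDP segment listener:

```python
# Sketch: check whether the local X-Ray daemon is reachable on TCP 2000.
import socket
import sys

def daemon_reachable(host: str = "127.0.0.1", port: int = 2000, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if not daemon_reachable():
        print("X-Ray daemon not reachable on TCP 2000; traces are likely being dropped")
        sys.exit(1)
    print("daemon reachable")
```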
Long-term Viability Assessment
- Current State: Functional but deprecated technology
- 2026: Maintenance mode only (no new features)
- 2027+: End of support, OpenTelemetry migration mandatory
- Recommendation: Plan migration now, don't wait for deadline panic
Useful Links for Further Investigation
Essential Resources for X-Ray and Migration Planning
Link | Description |
---|---|
AWS X-Ray Service Page | Official product overview, features, and use cases directly from AWS |
AWS X-Ray Developer Guide | Comprehensive technical documentation covering setup, configuration, and advanced features |
AWS X-Ray API Reference | Complete API documentation for programmatic access to X-Ray services |
AWS X-Ray Pricing | Current pricing information, free tier limits, and cost calculation examples |
AWS X-Ray Features | Detailed breakdown of X-Ray capabilities and differentiators |
Getting Started with AWS X-Ray | Step-by-step guide for implementing X-Ray in your applications |
AWS Observability Workshop | Hands-on training covering X-Ray, CloudWatch, and other AWS observability tools (decent but skips the hard parts about container networking) |
X-Ray Analytics Workshop | Advanced workshop focused on X-Ray analytics and root cause analysis |
AWS X-Ray Daemon Documentation | Installation and configuration guide for the X-Ray daemon |
AWS X-Ray SDK for Java | Java implementation guide with framework-specific integrations |
AWS X-Ray SDK for Node.js | Node.js SDK documentation with Express.js and framework examples |
AWS X-Ray SDK for .NET | .NET Core and ASP.NET integration documentation |
AWS X-Ray SDK for Python | Python SDK guide covering Django, Flask, and other frameworks |
AWS X-Ray SDK for Go | Go language SDK implementation and examples |
AWS X-Ray SDK for Ruby | Ruby and Rails integration documentation |
Using X-Ray with AWS Lambda | Lambda-specific X-Ray configuration and best practices |
X-Ray with Amazon ECS | Containerized application tracing on ECS |
X-Ray with Elastic Beanstalk | Built-in X-Ray integration for Elastic Beanstalk applications |
X-Ray Service Integrations | Complete list of AWS services with native X-Ray integration |
X-Ray Data Protection and Encryption | Security configuration and compliance information |
X-Ray IAM Permissions | Access control and IAM policy examples |
X-Ray VPC Endpoints | Private network access configuration |
X-Ray Sampling Rules | Advanced sampling configuration for cost optimization |
X-Ray SDK and Daemon End of Support Timeline | Official AWS timeline and migration requirements |
Migrating from X-Ray to OpenTelemetry | Step-by-step migration guide from AWS |
AWS Distro for OpenTelemetry | AWS's supported OpenTelemetry distribution - your migration path |
AWS Application Signals (Preview) | AWS's next-generation observability platform |
OpenTelemetry Main Website | Official OpenTelemetry documentation and getting started guides |
CNCF Jaeger Project | Open source distributed tracing platform - viable X-Ray alternative |
AWS re:Post X-Ray Questions | Community-driven Q&A platform for X-Ray questions and migration help |
AWS X-Ray Docker Images | Official Docker images for the X-Ray daemon (until 2027) |