Step 1: Identify the Error Category
When your Solana app explodes in production, don't panic-refresh everything. First, categorize what broke:
- Network Issues (50% of problems): RPC timeouts, connection failures, rate limiting
- Transaction Issues (30% of problems): Blockhash expired, simulation failed, wrong fees
- Account Issues (15% of problems): PDA derivation, missing accounts, wrong data
- Code Issues (5% of problems): Memory leaks, logic errors, version conflicts
Most developers immediately assume it's a code problem. It's usually network issues.
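To make the triage above repeatable, here is a minimal sketch that buckets an error by its message. The patterns and the `categorizeError` name are assumptions for illustration, not an official list; tune them against the messages that show up in your own logs.

```javascript
// Hypothetical message patterns, checked in order; the first match wins.
const ERROR_PATTERNS = [
  { category: 'network', pattern: /timeout|timed out|ECONNRESET|ECONNREFUSED|429|rate limit|fetch failed/i },
  { category: 'transaction', pattern: /blockhash not found|block height exceeded|simulation failed|insufficient funds/i },
  { category: 'account', pattern: /account not found|AccountNotInitialized|invalid account data|seeds constraint/i },
];

// Returns 'network', 'transaction', 'account', or 'code' (the fallback bucket).
function categorizeError(error) {
  const message = String(error && error.message ? error.message : error);
  const match = ERROR_PATTERNS.find(({ pattern }) => pattern.test(message));
  return match ? match.category : 'code';
}
```

Counting errors per category for a day or two is usually enough to tell whether you are chasing a bug or an unreliable endpoint.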
Step 2: Check the Obvious Shit First
Before diving into complex debugging, check these in order:
- Network Status: Is status.solana.com green? If not, it's not your fault.
- RPC Health: Try switching to a different RPC provider temporarily. Helius, QuickNode, or Chainstack have better reliability than public endpoints.
- Recent Changes: What changed in the last deployment? Revert if possible.
- Error Frequency: Is this affecting 100% of transactions or just some users?
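For the error-frequency question, a rough sketch like the one below can answer "everyone or just some users?" from your own logs. The `events` shape (`{ user, ok }`) and the `summarizeFailures` name are hypothetical; adapt them to whatever your logging already records.

```javascript
// `events` is an assumed array of { user, ok } records pulled from your logs.
function summarizeFailures(events) {
  const failed = events.filter((e) => !e.ok);
  const allUsers = new Set(events.map((e) => e.user));
  const failedUsers = new Set(failed.map((e) => e.user));
  return {
    // Share of all attempts that failed (0 when there is no data)
    failureRate: events.length === 0 ? 0 : failed.length / events.length,
    // Number of distinct users who saw at least one failure
    affectedUsers: failedUsers.size,
    totalUsers: allUsers.size,
  };
}
```

A high failure rate spread across all users points toward the network or RPC provider; failures concentrated in a few users point toward specific accounts or inputs.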
Step 3: Enable Comprehensive Logging
Out of the box, @solana/web3.js tells you almost nothing about what it sends over the wire. Here's what actually helps:
// Production logging setup
import { Connection } from '@solana/web3.js';

const connection = new Connection(RPC_URL, {
  commitment: 'confirmed',
  confirmTransactionInitialTimeout: 60000,
  disableRetryOnRateLimit: false,
  // Custom fetch that logs every RPC request and any HTTP-level failure
  fetch: async (url, options = {}) => {
    console.log(`RPC Request: ${options.method || 'GET'} ${url}`);
    try {
      console.log('Request body:', JSON.parse(options.body || '{}'));
    } catch {
      // Never let a logging problem break the actual request
      console.log('Request body (not JSON):', options.body);
    }
    const response = await fetch(url, options);
    if (!response.ok) {
      console.error(`RPC Error: ${response.status} ${response.statusText}`);
    }
    return response;
  }
});
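One caveat when logging RPC requests: many paid endpoints carry the API key in the URL, so logging the raw URL leaks it. The sketch below masks it before logging. The parameter names in `SECRET_PARAMS` and the `redactRpcUrl` helper are assumptions; check how your provider embeds its key.

```javascript
// Assumed query parameter names that may hold a secret.
const SECRET_PARAMS = ['api-key', 'apikey', 'token'];

function redactRpcUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const name of SECRET_PARAMS) {
    if (url.searchParams.has(name)) url.searchParams.set(name, 'REDACTED');
  }
  // Some providers put the token in the path instead, so mask any path too.
  if (url.pathname.length > 1) url.pathname = '/REDACTED';
  return url.toString();
}
```

Call it on the URL before passing it to `console.log`, and the host stays visible for debugging while the key does not end up in your log aggregator.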
Step 4: Transaction Debugging Workflow
For transaction failures, follow this systematic approach:
a) Check Transaction Status
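const signature = await connection.sendRawTransaction(transaction.serialize());
const status = await connection.confirmTransaction(signature);
// confirmTransaction resolves even when the transaction failed on-chain, so check value.err
if (status.value.err) {
  console.error('Transaction failed:', status.value.err);
}
console.log('Transaction status:', status);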
b) Inspect Transaction Details
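// maxSupportedTransactionVersion is required to fetch versioned (v0) transactions
const parsedTx = await connection.getParsedTransaction(signature, {
  maxSupportedTransactionVersion: 0
});
console.log('Parsed transaction:', JSON.stringify(parsedTx, null, 2));
// The program logs usually name the instruction that failed and why
console.log('Program logs:', parsedTx?.meta?.logMessages);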
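The full parsed transaction is a lot of JSON to scan. As a sketch, a small helper can boil it down to the fields that matter during an incident. `summarizeTransaction` is a hypothetical name; it only reads `meta.err`, `meta.fee`, and `meta.logMessages` from the object returned by `getParsedTransaction`.

```javascript
// `parsedTx` is the (possibly null) result of getParsedTransaction.
function summarizeTransaction(parsedTx) {
  if (!parsedTx || !parsedTx.meta) {
    // null usually means the RPC node has not seen (or no longer stores) the transaction
    return { found: false, failed: false, errorLogs: [] };
  }
  const logs = parsedTx.meta.logMessages || [];
  return {
    found: true,
    failed: parsedTx.meta.err != null,
    fee: parsedTx.meta.fee,
    // Keep only the log lines that mention a failure or an error
    errorLogs: logs.filter((line) => /failed|error/i.test(line)),
  };
}
```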
c) Check Account States
// Before transaction
const accountBefore = await connection.getAccountInfo(targetAccount);
console.log('Account before:', accountBefore);
// After failed transaction
const accountAfter = await connection.getAccountInfo(targetAccount);
console.log('Account after:', accountAfter);
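Eyeballing two raw account dumps is error-prone, so a small diff helps. This is a minimal sketch assuming both arguments have the shape returned by `getAccountInfo` (`lamports`, `owner`, `data`); `diffAccountInfo` is a hypothetical helper name.

```javascript
// Compares two getAccountInfo results; either one may be null (account missing).
function diffAccountInfo(before, after) {
  if (!before || !after) {
    return { existedBefore: Boolean(before), existsAfter: Boolean(after) };
  }
  return {
    existedBefore: true,
    existsAfter: true,
    lamportsDelta: after.lamports - before.lamports,
    // String() works for both PublicKey objects and plain base58 strings
    ownerChanged: String(before.owner) !== String(after.owner),
    dataChanged: !Buffer.from(before.data).equals(Buffer.from(after.data)),
  };
}
```

If a transaction failed and nothing changed except the fee payer's lamports, the failure happened before your program wrote any state.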
Step 5: Memory Leak Detection
Memory leaks kill Node.js apps slowly, then suddenly. Monitor these metrics:
// Add to your health check endpoint
app.get('/health', (req, res) => {
  const usage = process.memoryUsage();
  const mb = (bytes) => Math.round(bytes / 1024 / 1024);
  res.json({
    memory: {
      rss: `${mb(usage.rss)}MB`,
      heapTotal: `${mb(usage.heapTotal)}MB`,
      heapUsed: `${mb(usage.heapUsed)}MB`,
      external: `${mb(usage.external)}MB`
    },
    uptime: process.uptime()
  });
});
Warning signs: heapUsed growing continuously, never declining after GC.
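To turn that warning sign into something you can alert on, here is a rough sketch that looks at a series of `heapUsed` samples. The thresholds (80% of intervals rising, 10 MB total growth) and the `isHeapGrowing` name are arbitrary assumptions; pick values that fit your app's normal behavior.

```javascript
// `samples` is an array of heapUsed readings in bytes, oldest first,
// e.g. collected with: setInterval(() => samples.push(process.memoryUsage().heapUsed), 60000)
function isHeapGrowing(samples, minGrowthBytes = 10 * 1024 * 1024) {
  if (samples.length < 3) return false; // not enough data to call it a trend
  let rises = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] > samples[i - 1]) rises++;
  }
  const growth = samples[samples.length - 1] - samples[0];
  const mostlyRising = rises / (samples.length - 1) >= 0.8;
  return mostlyRising && growth >= minGrowthBytes;
}
```

A healthy heap produces a sawtooth (up, then down after GC), which this check ignores; a leak produces a staircase, which it flags.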
Step 6: PDA Debugging Deep Dive
PDA issues are subtle and evil. Here's how to debug them systematically:
// Debug PDA derivation
function debugPDA(seeds, programId) {
  // Print seeds as hex so encoding differences are visible
  console.log('Seeds:', seeds.map(s =>
    Buffer.isBuffer(s) ? s.toString('hex') : s
  ));
  console.log('Program ID:', programId.toString());
  try {
    const [pda, bump] = PublicKey.findProgramAddressSync(seeds, programId);
    console.log('Derived PDA:', pda.toString());
    console.log('Bump:', bump);
    return pda;
  } catch (error) {
    console.error('PDA derivation failed:', error);
    throw error;
  }
}
Common PDA failures:
- Seed encoding issues (string vs Buffer)
- Wrong program ID (copy-paste from wrong environment)
- Seed order matters: [a, b] ≠ [b, a]
- Missing bump seed validation
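Seed encoding is the failure that is easiest to rule out in code. The sketch below normalizes each seed to bytes before derivation. `toSeed` is a hypothetical helper, and the little-endian `u64` encoding is an assumption that matches a program using `to_le_bytes()`; confirm how your on-chain program builds its seeds.

```javascript
const MAX_SEED_LENGTH = 32; // each individual seed may be at most 32 bytes

function toSeed(value) {
  let seed;
  if (typeof value === 'string') {
    seed = Buffer.from(value, 'utf8');
  } else if (typeof value === 'bigint') {
    // Assumption: the program encodes this number as a little-endian u64
    seed = Buffer.alloc(8);
    seed.writeBigUInt64LE(value);
  } else {
    seed = Buffer.from(value); // Buffer, Uint8Array, or array of bytes
  }
  if (seed.length > MAX_SEED_LENGTH) {
    throw new Error(`Seed is ${seed.length} bytes, max is ${MAX_SEED_LENGTH}`);
  }
  return seed;
}

// Usage: const seeds = ['vault', userPublicKey.toBuffer(), 7n].map(toSeed);
```

Comparing the hex of each seed on the client against what the program logs is often the fastest way to find the mismatch.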
Step 7: Performance Profiling in Production
Don't guess where bottlenecks are. Measure:
// Wrap RPC calls with timing
async function timedRPC(operation, ...args) {
  const start = Date.now();
  try {
    const result = await operation(...args);
    const duration = Date.now() - start;
    console.log(`RPC took ${duration}ms`);
    return result;
  } catch (error) {
    const duration = Date.now() - start;
    console.error(`RPC failed after ${duration}ms:`, error.message);
    throw error;
  }
}
// Usage
const balance = await timedRPC(
  connection.getBalance.bind(connection),
  publicKey
);
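Individual timings in a log are hard to read at volume. A minimal sketch, assuming you push each measured duration into an array, is to report percentiles instead. `percentile` here uses the nearest-rank method and is a hypothetical helper, not part of any library.

```javascript
// Nearest-rank percentile over an array of durations in milliseconds.
function percentile(durations, p) {
  if (durations.length === 0) return null;
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  const index = Math.max(0, Math.min(sorted.length - 1, rank));
  return sorted[index];
}

// Usage: log a summary every minute instead of one line per call
// console.log(`p50=${percentile(durations, 50)}ms p95=${percentile(durations, 95)}ms`);
```

The p95 is usually more telling than the average: one slow RPC node shows up there long before it moves the mean.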
Step 8: Emergency Production Fixes
When everything's broken and users are screaming:
1. Circuit Breaker Pattern
let failureCount = 0;
const MAX_FAILURES = 5;
const RESET_TIMEOUT = 30000;

async function circuitBreakerRPC(operation) {
  // Fail fast while the breaker is open
  if (failureCount >= MAX_FAILURES) {
    throw new Error('Circuit breaker open - too many failures');
  }
  try {
    const result = await operation();
    failureCount = 0; // Reset on success
    return result;
  } catch (error) {
    failureCount++;
    if (failureCount >= MAX_FAILURES) {
      // Close the breaker again after the cool-down period
      setTimeout(() => { failureCount = 0; }, RESET_TIMEOUT);
    }
    throw error;
  }
}
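If you need more than one breaker (say, one per RPC endpoint), module-level counters get awkward. Here is a sketch of the same idea as a class with an injectable clock so it can be tested without waiting. The `CircuitBreaker` name and its option names are assumptions for illustration. After the cool-down it lets a trial request through; a failed trial reopens the breaker.

```javascript
class CircuitBreaker {
  constructor({ maxFailures = 5, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.maxFailures = maxFailures;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;          // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;    // timestamp when the breaker last opened
  }

  canRequest() {
    if (this.openedAt === null) return true;
    // After the cool-down, allow trial requests ("half-open")
    return this.now() - this.openedAt >= this.resetTimeoutMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures++;
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }

  async run(operation) {
    if (!this.canRequest()) throw new Error('Circuit breaker open');
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }
}
```

Note that this sketch does not limit how many trial requests run concurrently once the cool-down has passed; add that if a burst of trials would hurt.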
2. Fallback RPC Providers
// Placeholder URLs: substitute the exact endpoints and keys from your provider dashboards
const RPC_ENDPOINTS = [
  'https://api.mainnet-beta.solana.com',
  'https://mainnet.helius-rpc.com/?api-key=YOUR_API_KEY',
  'https://solana-mainnet.rpc.quicknode.pro/YOUR_TOKEN'
];

async function fallbackRPC(operation) {
  // Try each endpoint in order and return the first successful result
  for (const endpoint of RPC_ENDPOINTS) {
    try {
      const connection = new Connection(endpoint);
      return await operation(connection);
    } catch (error) {
      console.warn(`RPC ${endpoint} failed:`, error.message);
      continue;
    }
  }
  throw new Error('All RPC endpoints failed');
}
3. Transaction Retry with Exponential Backoff
// transactionFn should build and sign the transaction with a fresh blockhash on
// every call; resending one whose blockhash has expired will just fail again.
async function retryTransaction(transactionFn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await transactionFn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s
      console.log(`Transaction failed, retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
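When many clients fail at the same moment, fixed 1s/2s/4s delays make them all retry at the same moment too. A common refinement is to randomize the delay ("full jitter") and cap it. The sketch below shows only the delay calculation; `backoffDelay` and its option names are hypothetical.

```javascript
// Returns a random delay in [0, min(capMs, baseMs * 2^attempt)) milliseconds.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Usage inside a retry loop:
// await new Promise(resolve => setTimeout(resolve, backoffDelay(i)));
```

The `random` parameter exists only so the function can be tested deterministically; in production the default `Math.random` is fine.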
What NOT to Do (Common Panic Mistakes)
- Don't restart everything - You'll lose debugging context
- Don't immediately deploy fixes - Debug first, fix second
- Don't ignore Solana Explorer - Looking up the transaction signature there shows the program logs and the exact instruction that failed
- Don't assume it's your code - Network issues are more common
- Don't disable error handling - You need those error details
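On the Explorer point, it helps to log a clickable link next to every signature so nobody has to paste it by hand mid-incident. A minimal sketch, with `explorerTxUrl` as a hypothetical helper name:

```javascript
// Builds a Solana Explorer link for a transaction signature.
// Mainnet is the Explorer's default, so only other clusters need the query parameter.
function explorerTxUrl(signature, cluster = 'mainnet-beta') {
  const base = `https://explorer.solana.com/tx/${encodeURIComponent(signature)}`;
  return cluster === 'mainnet-beta' ? base : `${base}?cluster=${encodeURIComponent(cluster)}`;
}

// Usage: console.error('Transaction failed:', explorerTxUrl(signature, 'devnet'));
```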