Why tracemalloc Exists and When You Actually Need It

Python's garbage collector can't free objects your code is still holding references to. They stick around eating memory until your app dies with MemoryError: unable to allocate array of size X. External profilers are either too slow or miss the important shit.

What tracemalloc Actually Does

tracemalloc hooks into Python's memory allocation and records what matters:

Where memory gets allocated: Every time Python creates an object, tracemalloc records the complete stack trace. No more guessing which line is eating your RAM.

How much memory each part uses: It shows you exactly which function is allocating 2GB of dictionaries.

Memory growth over time: Take snapshots and compare them. Growing allocations = memory leaks.
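
The whole workflow in one minimal sketch (the loop here is just a stand-in for your own allocations):

import tracemalloc

tracemalloc.start(25)  ## record up to 25 stack frames per allocation

leaky = []
for i in range(10_000):
    leaky.append("x" * 100)  ## stand-in for whatever your app actually allocates

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)  ## file, line number, total size, and allocation count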

When External Tools Fall Short

I've debugged production leaks where memory_profiler made the app 10x slower and py-spy couldn't see Python's internal allocations. tracemalloc runs in-process with about 30% overhead (docs say 30%, feels like more) - painful but workable for debugging.

The killer feature is detailed stack traces. When your Flask app starts eating memory after 6 hours, tracemalloc tells you exactly which view function and line is the problem. External profilers just dump useless aggregate data on you.

Production Reality Check

You need tracemalloc when:

  • Memory usage keeps growing in long-running services
  • Your app works locally but crashes in production after hours/days
  • Memory profilers are too slow or miss Python-specific allocations
  • You need to debug without deploying debug builds

Don't leave it running 24/7 in production - the performance hit adds up. Turn it on when things break, get your data, turn it off.

tracemalloc vs Other Memory Profilers (Reality Check)

| Tool | Built-in | Performance Hit | Status | My Take |
|------|----------|-----------------|--------|---------|
| tracemalloc | ✅ Python 3.4+ | 10-50% slower | Just works | Already there, kinda slow but whatever |
| memory_profiler | ❌ pip install | Makes everything unusable | Barely works | Don't bother |
| pympler | ❌ pip install | 2-5x slower | Works if you can wait | Good for deep shit |
| py-spy | ❌ pip install | ~5% overhead | Works great | Wrong tool - this is for CPU |
| memray | ❌ pip install | ~10% slower | Best overall | I gave up after 2 hours with the dependencies |

How to Actually Use tracemalloc (With All the Gotchas)

Start tracing, run code, take snapshots, analyze. Simple pattern but full of landmines.

Don't Start with Default Settings

import tracemalloc

## DON'T: Start with 1 frame - useless traces
tracemalloc.start()

## DO: Start with 10-25 frames to get useful call stacks  
tracemalloc.start(25)

More frames = more overhead. Don't use 100 frames unless you hate performance.

The Environment Variable Trick

Debug production without touching code:

export PYTHONTRACEMALLOC=10
python your_app.py

This starts tracing from the beginning. You'll see import overhead too - output gets noisy but that's usually fine.
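
Once the process is running, you can confirm tracing is on and see how much Python has allocated so far - a quick sanity check using only stdlib calls:

import tracemalloc

print(tracemalloc.is_tracing())  ## True when PYTHONTRACEMALLOC (or start()) enabled tracing

current, peak = tracemalloc.get_traced_memory()  ## bytes allocated by Python since tracing began
print(f"current={current / 1024 / 1024:.1f} MB, peak={peak / 1024 / 1024:.1f} MB")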

Memory Leak Detection

## Tracing must already be running, e.g. tracemalloc.start(25)
## Before the suspected operation
snapshot1 = tracemalloc.take_snapshot()

## Run the suspected leaky code
for i in range(100):
    process_request()

## After
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

## Show only growth
for stat in top_stats[:10]:
    if stat.size_diff > 0:
        mb_diff = stat.size_diff / 1024 / 1024
        print(f"LEAKED {mb_diff:.1f} MB at:")
        for line in stat.traceback.format():
            print(f"  {line}")

Common Screwups

Starting tracing too late: If you start after hours of uptime, you'll miss the interesting allocations. Start early or use environment variables.

Not filtering stdlib noise: Half your results are Python internals. Filter them out:

filters = [
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap_external>"),
]
filtered_snapshot = snapshot.filter_traces(filters)

Forgetting the performance hit: Don't leave this running 24/7. Users will notice the slowdown.

The environment variable trick is clutch for emergency debugging - no code changes, just restart with PYTHONTRACEMALLOC=10 and wait for the leak to reproduce.

Questions You'll Actually Ask

Q

Will this kill my app's performance?

A

Yeah, it's slow. Sometimes 10% overhead, sometimes kills everything. Don't leave it on in production unless you want angry users.

Q

Why am I seeing `<unknown>` everywhere?

A

You started tracing too late. All that memory was allocated before you called tracemalloc.start(). Either start earlier or use export PYTHONTRACEMALLOC=10 to trace from startup. This will fail silently and waste your afternoon if you get it wrong.

Q

Does this see NumPy/pandas memory usage?

A

Nope. tracemalloc only sees Python's allocations. If you're using NumPy, pandas, or any C extension, you're blind to most memory usage. The results are full of Python internals garbage while the real memory hogs stay hidden.

Q

My results are full of Python internals - how do I filter that out?

A

Add filters to hide stdlib noise:

filters = [
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap_external>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
]
filtered = snapshot.filter_traces(filters)

Q

How do I find what's actually leaking memory?

A

Take snapshots before and after the suspected operation, then compare:

before = tracemalloc.take_snapshot()
## Run suspicious code
after = tracemalloc.take_snapshot()
growing = after.compare_to(before, 'lineno')
## Look for positive size_diff values

Q

Does this work with async/await code?

A

Yeah, but async code has weird memory patterns. The event loop holds references longer than you'd expect. tracemalloc shows the allocations but figuring out why shit isn't getting cleaned up is harder.

Q

Can I save snapshots for later analysis?

A

snapshot.dump('debug_snapshot.dump')
## Later...
loaded = tracemalloc.Snapshot.load('debug_snapshot.dump')

Don't do this in production - dump files can be hundreds of MB for complex apps.

Q

What's the difference between 'lineno', 'filename', and 'traceback' grouping?

A
  • 'lineno': Shows exact line numbers - best for finding the specific problem
  • 'filename': Groups by file - good for seeing which modules are fucked
  • 'traceback': Groups by full call stack - useful when the same line gets called from different places

Start with 'lineno', use 'traceback' if you need more context.
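
Switching the grouping is just a different argument to statistics() - a quick sketch, assuming snapshot is one you already took:

## Group by full call stack instead of a single line
for stat in snapshot.statistics('traceback')[:3]:
    print(f"{stat.size / 1024:.1f} KiB across {stat.count} blocks")
    for line in stat.traceback.format():
        print(f"  {line}")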

Q

How much memory does tracemalloc itself use?

A

Usually 1-5 MB but scales with allocations you're tracking. Check with tracemalloc.get_tracemalloc_memory() if you're worried about it eating your memory budget.
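
A quick way to check that overhead while tracing is running:

import tracemalloc

overhead = tracemalloc.get_tracemalloc_memory()  ## bytes tracemalloc spends storing traces
print(f"tracemalloc bookkeeping: {overhead / 1024 / 1024:.1f} MB")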

Real-World War Stories and Lessons Learned

I've debugged memory leaks that would have been impossible to find without tracemalloc. Here's what actually works and the stupid mistakes I made.

Flask App That Ate AWS Budget

Had this Flask app that looked fine for a few hours, then memory would climb until Kubernetes killed it. AWS bill went nuts - like $800 one day when it's usually $200. Panic mode engaged.

Tried profiling with memory_profiler first but it made everything 10x slower. External profilers just dumped useless aggregate shit on me. Finally used PYTHONTRACEMALLOC=10 and restarted the service.

Turns out the leak was in a caching decorator holding references to request objects. The cache was supposed to have a size limit but the cleanup logic was fucked. Without tracemalloc's stack traces showing me the exact line in the middleware, I would've spent days hunting this down.

Don't be an idiot like me - check your caching logic actually cleans up.
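
If the cache only needs to memoize per-key results, a bounded stdlib cache sidesteps the whole problem - a sketch, not the actual middleware from that app:

from functools import lru_cache

@lru_cache(maxsize=1024)  ## bounded: old entries get evicted instead of piling up forever
def expensive_lookup(key: str) -> str:
    return key.upper()  ## stand-in for the real computation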

Data Pipeline That Killed the Server

Had this ETL job processing CSVs. Load 500MB file, transform it, wonder why I need 8GB RAM and the server crashes every few hours.

Used snapshot comparisons to track memory at each stage:

before_load = tracemalloc.take_snapshot()
data = load_huge_csv()
after_load = tracemalloc.take_snapshot()
load_stats = after_load.compare_to(before_load, 'lineno')

Turns out pandas was keeping copies of data way longer than I expected, plus I was creating unnecessary intermediate DataFrames. Cut memory usage by 60% just by adding explicit del dataframe calls and being less stupid about chaining operations.

Background Job From Hell

Background job processing uploaded images. Memory usage climbed over days until the container died. AWS costs went from normal to 'oh shit' in one day.

tracemalloc showed the leak was in the image processing library holding decoded image data. Library wasn't cleaning up after exceptions. Added explicit cleanup in finally blocks and costs dropped.

Spent 3 days thinking it was pandas being stupid; turns out the image library was the asshole all along. Should have figured that out way earlier, but I was being lazy about proper exception handling.
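
The fix was nothing fancy - roughly this shape, with io.BytesIO standing in for the real decoded image buffer:

import io

def handle_upload(data: bytes) -> int:
    buf = io.BytesIO(data)  ## stand-in for a large decoded image
    try:
        return len(buf.getvalue())  ## stand-in for the real processing
    finally:
        buf.close()  ## cleanup runs even when processing raises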

When NOT to Use tracemalloc

High-performance APIs: 30% overhead is noticeable. Don't leave it on during peak traffic or you'll get complaints.

NumPy-heavy workloads: tracemalloc won't see most memory usage. The real hogs are invisible.

Distributed systems: Only shows per-process memory. If your leak is in Redis connections or shared memory, you're blind.

What Actually Works

Environment variable toggling: if os.environ.get('DEBUG_MEMORY'): tracemalloc.start(10). Deploy with this, enable only when shit breaks.

Memory threshold alerts: Hook into monitoring to dump a snapshot when memory hits 80% of container limits. This ugly hack saved me hours of debugging.
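
One way to wire that up - the 2 GiB limit and dump path here are made up, adjust to your container:

import tracemalloc

MEMORY_LIMIT_BYTES = 2 * 1024**3  ## hypothetical container limit: 2 GiB

def check_memory_and_dump():
    if not tracemalloc.is_tracing():
        return
    current, _peak = tracemalloc.get_traced_memory()
    if current > 0.8 * MEMORY_LIMIT_BYTES:  ## 80% of the limit, per the alert idea above
        tracemalloc.take_snapshot().dump('/tmp/high_memory.dump')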

CI regression tests: Compare memory snapshots in tests to catch leaks before production. This prevented at least 3 memory disasters.
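
A rough shape for one of those tests - pytest-style, with allocate_for_request and the 5 MB budget as placeholders:

import tracemalloc

def allocate_for_request():
    return [object() for _ in range(1000)]  ## placeholder for the code path under test

def test_request_path_does_not_leak():
    tracemalloc.start(10)
    try:
        allocate_for_request()  ## warm-up pass: caches, lazy imports
        before = tracemalloc.take_snapshot()
        for _ in range(100):
            allocate_for_request()
        after = tracemalloc.take_snapshot()
        growth = sum(s.size_diff for s in after.compare_to(before, 'lineno'))
        assert growth < 5 * 1024 * 1024  ## fail if the loop leaked more than ~5 MB
    finally:
        tracemalloc.stop()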

tracemalloc is a debugging tool, not monitoring. Turn it on when things break, get data, fix the problem, turn it off. Don't overthink it.
