Been dealing with this API since beta, and the function calling was broken as hell. A user asks a question, then it's dead silence for several seconds while the API waits on your database. OpenAI finally fixed this with the GA release - now it keeps talking while your queries run in the background.
You can read the official docs if you hate yourself, or check Microsoft's guide for the enterprise version of this mess.
The Problem That Drove Everyone Insane
Every function call was dead air. Customer calls asking for their account balance, then 5 seconds of absolute silence while my slow-ass database query runs. I lost so many calls because people thought the line went dead. Support tickets constantly saying "your bot just stops talking and hangs up on me."
Now it actually keeps the conversation going while functions run.
What actually improved:
- Function calls don't kill conversation flow anymore
- Better at figuring out which function to call (still screws this up sometimes)
- Decent at extracting parameters from speech (except when people have accents)
Here's the catch: it's still not perfect. The timing can be weird, and if your function takes too long, users notice the AI is bullshitting to fill time.
The Setup That Won't Make You Hate Your Life
Here's the basic config. Nothing fancy, just what works:
// Basic function setup - don't overcomplicate this
const sessionConfig = {
  type: "session.update",
  session: {
    tools: [
      {
        type: "function",
        name: "getAccountBalance",
        description: "Get user account balance", // Keep descriptions short or it hallucinates
        parameters: {
          type: "object",
          properties: {
            accountId: {
              type: "string",
              description: "Account ID"
            }
          },
          required: ["accountId"]
        }
      }
    ],
    // This truncation thing actually helps with costs
    truncation: {
      type: "retention_ratio",
      retention_ratio: 0.8 // Cuts 20% when you hit token limits
    },
    max_response_output_tokens: 4096 // Don't set this too high or it rambles
  }
};
Learned this the hard way - if your function descriptions are too detailed, the AI gets creative and starts hallucinating functions that don't exist. I wrote some long-ass description for a user lookup function and it started calling made-up functions like "getUserDetailedProfile" that crashed my app. Keep descriptions short and boring. The OpenAI community forum is full of people bitching about this exact thing.
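To make that concrete, here's the difference - the verbose description below is a hypothetical reconstruction of the kind that burned me, not my exact tool:
// BAD (hypothetical example): too much detail, the model starts inventing
// sibling functions like "getUserDetailedProfile" that don't exist
const badTool = {
  type: "function",
  name: "getUser",
  description: "Looks up a user and returns their complete profile, " +
    "including name, email, preferences, order history, loyalty tier, " +
    "linked accounts, and anything else you could possibly want",
  parameters: {
    type: "object",
    properties: { userId: { type: "string" } },
    required: ["userId"]
  }
};

// GOOD: short and boring, the model sticks to what's declared
const goodTool = {
  type: "function",
  name: "getUser",
  description: "Look up a user by ID",
  parameters: {
    type: "object",
    properties: { userId: { type: "string" } },
    required: ["userId"]
  }
};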
How to Handle Slow Functions Without Users Hanging Up
Users hang up after 2 seconds of silence thinking the call dropped. I had to implement this janky "please hold" pattern to stop losing customers:
// The "please hold" pattern that saves your ass
ws.on('message', (data) => {
const event = JSON.parse(data);
if (event.type === 'response.function_call_arguments.done') {
// Immediately tell them you're working on it
ws.send(JSON.stringify({
type: "conversation.item.create",
item: {
type: "function_call_output",
call_id: event.call_id,
output: JSON.stringify({
status: "processing",
message: "Give me a sec to check that..."
})
}
}));
// Actually do the work
processFunction(event.name, event.arguments)
.then(result => {
ws.send(JSON.stringify({
type: "conversation.item.create",
item: {
type: "function_call_output",
call_id: event.call_id,
output: JSON.stringify(result)
}
}));
})
.catch(error => {
// This WILL happen in production
ws.send(JSON.stringify({
type: "conversation.item.create",
item: {
type: "function_call_output",
call_id: event.call_id,
output: JSON.stringify({
error: "Database is having a moment. Can you try that again?"
})
}
}));
});
}
});
What actually happens with timing:
- Under 2 seconds: Fine, nobody notices
- 2-5 seconds: Users get antsy but the "hold on" message helps
- Over 5 seconds: People start hanging up or asking if the call dropped
- Over 10 seconds: Just return an error and try again later
Learned these timing rules from watching users hang up. The Twilio voice integration examples actually show decent timeout handling if you don't want to figure it out yourself.
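If you'd rather enforce that 10-second cutoff in code instead of trusting your database, a plain Promise.race wrapper works. This is a sketch - withTimeout and sendFunctionOutput are my own helper names, not API calls:
// Hypothetical wrapper - caps any function call at a hard deadline
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  promise.catch(() => {}); // swallow late rejections after the race is lost
  // Whichever settles first wins; always clear the timer
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage: give up at 10 seconds and return an error instead of dead air
withTimeout(processFunction(name, args), 10000)
  .then(result => sendFunctionOutput(callId, result))
  .catch(err => sendFunctionOutput(callId, {
    error: err.message === 'timeout'
      ? "That's taking too long. Let me try again in a moment."
      : "Something went wrong on my end."
  }));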
Why Your Token Costs Will Explode (And How to Stop It)
Token costs get crazy if you're not careful. Had one customer ramble for half an hour about their life story before asking a simple question. Burned through something like 18k tokens because the API remembers every "um" and "well, you know" they said. Bill jumped from around $130 to almost $900 that month. Check the OpenAI pricing calculator for current costs.
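The back-of-envelope math is worth wiring into your own monitoring. The per-million-token rates below are placeholders I made up - pull the real numbers from the pricing page before trusting any output:
// Placeholder rates ($ per 1M tokens) - NOT current pricing, check the
// pricing page and update these
const INPUT_RATE = 30;
const OUTPUT_RATE = 60;

function estimateCallCost(inputTokens, outputTokens) {
  return (inputTokens / 1e6) * INPUT_RATE + (outputTokens / 1e6) * OUTPUT_RATE;
}

// A rambling half-hour caller at ~18k tokens adds up fast across a month
console.log(estimateCallCost(15000, 3000).toFixed(2));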
The truncation feature actually helps here, but you need to set it up right:
const sessionConfig = {
  type: "session.update",
  session: {
    instructions: "Keep it short.", // Seriously, long prompts make this worse
    // This automatically drops the oldest conversation parts
    truncation: {
      type: "retention_ratio",
      retention_ratio: 0.8 // Keeps 80%, drops 20% when hitting limits
    },
    // Don't set this too high or costs go insane
    max_response_output_tokens: 2048
  }
};
What actually gets truncated:
- Old conversation turns (keeps the last 10-15)
- System prompts stay (thank god)
- Active function calls are preserved
- All the "um" and "like" filler gets dropped
This can cut your token usage by half in long conversations, but it's not magic. Users who talk forever will still cost you money. The prompt caching docs have more cost-cutting tricks if you're desperate.
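To catch runaway conversations before the invoice does, watch the usage block that comes back on every response.done event. Rough sketch - the 10k warning threshold is arbitrary, pick one that matches your budget:
// Log token burn per response so cost spikes show up in your metrics,
// not on your bill
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'response.done' && event.response.usage) {
    const { total_tokens, input_tokens, output_tokens } = event.response.usage;
    console.log(`tokens: ${total_tokens} (in: ${input_tokens}, out: ${output_tokens})`);
    // Made-up threshold - tune to your own traffic
    if (total_tokens > 10000) {
      console.warn('Long conversation - truncation should kick in soon');
    }
  }
});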
When Everything Goes Wrong (And It Will)
Your functions will fail. Database timeouts, API rate limits, network hiccups - production is a hostile environment. Here's how to handle it without the conversation falling apart:
// Error handling that doesn't suck
async function handleFunctionCall(name, args, callId) {
  try {
    const result = await executeFunction(name, args);
    return {
      call_id: callId,
      output: JSON.stringify(result)
    };
  } catch (error) {
    // Don't just dump raw errors on users
    // (optional chaining: non-Error throws may have no .message)
    if (error.message?.includes('timeout')) {
      return {
        call_id: callId,
        output: JSON.stringify({
          error: "The system is running slow right now. Want to try again?"
        })
      };
    } else if (error.status === 429) {
      return {
        call_id: callId,
        output: JSON.stringify({
          error: "Too many people are using this right now. Give me a minute to try again."
        })
      };
    } else {
      // Generic "something broke" message - don't expose internals
      return {
        call_id: callId,
        output: JSON.stringify({
          error: "Something went wrong on my end. Can you repeat that?"
        })
      };
    }
  }
}
Don't be dumb like I was initially - never return raw database errors to users. "Connection refused to postgres-prod-1:5432" just confuses people and makes your app look broken. Learned this when someone from accounting called asking what "ECONNREFUSED" meant and if they were getting hacked.
The OpenAI Realtime Console repo shows how they handle errors, and their GitHub issues are full of people hitting the same problems you will.
The async function calling is way better than the old system, but it's not magic. Your functions still need to be fast, your error handling still needs to be solid, and you still need to watch those costs like a hawk. And don't even get me started on when the AI decides to call the same function 3 times in a row because it didn't like your response format.
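My workaround for the duplicate-call problem is a dumb in-flight guard: if the model asks for the same function with the same arguments while the first call is still running, hand back the pending promise instead of hitting the database again. Sketch, reusing processFunction from earlier:
// Dedupe identical in-flight calls so the model's retries don't
// hit the database three times
const inFlight = new Map();

function dedupedCall(name, args) {
  const key = `${name}:${JSON.stringify(args)}`;
  if (!inFlight.has(key)) {
    const promise = processFunction(name, args)
      .finally(() => inFlight.delete(key)); // clean up once it settles
    inFlight.set(key, promise);
  }
  return inFlight.get(key);
}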
The GA model also added image support, which sounds cool until you see what it does to your bill.