Stream an OpenAI Realtime API agent with a cXML script

Put OpenAI Speech-to-Speech models on the phone with cXML <Stream>

In this guide, we build a Node.js application that serves a cXML Script, which opens a two-way (bidirectional) <Stream> to a Speech-to-Speech model on the OpenAI Realtime API. When a caller dials the assigned phone number, the SignalWire platform requests and runs the cXML Script.

Prerequisites

Before you begin, ensure you have:

  • SignalWire Space - Sign up free
  • OpenAI API Key - Get access (requires paid account)
  • Node.js 20+ - For running the TypeScript server (Install Node)
  • ngrok or other tunneling service - For local development tunneling (Install ngrok)
  • Docker (optional) - For containerized deployment

Quickstart

Clone and install

Clone the SignalWire Solutions repository, navigate to this example, and install.

git clone https://github.com/signalwire/cXML-realtime-agent-stream
cd cXML-realtime-agent-stream
npm install

Add OpenAI credentials


When running the server on your local machine, store your credentials in a .env file.

cp .env.example .env

Edit .env and add your OpenAI API key:

OPENAI_API_KEY=sk-your-actual-api-key-here

Run application

Build and run the application locally:
npm run build
npm start

Your AI assistant webhook is now running at http://localhost:5050/incoming-call.

Health check

Make sure your server is running and the health check passes:

curl http://localhost:5050/health
# Should return: {"status":"healthy"}
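
For reference, the health endpoint can be as small as the following Fastify route. This is a minimal sketch inferred from the curl output above, not necessarily the repository's exact implementation.

import type { FastifyInstance } from 'fastify';

// Minimal /health route matching the expected curl output above.
// Sketch only - the repository's actual handler may differ.
export async function healthRoute(fastify: FastifyInstance) {
  fastify.get('/health', async () => ({ status: 'healthy' }));
}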

Create a cXML script

Next, we need to tell SignalWire to request cXML from your server when a call comes in.

  • Navigate to My Resources in your Dashboard.
  • Click Create Resource, select Script as the resource type, and choose cXML.
  • Under Handle Using, select External Url.
  • Set the Primary Script URL to your server's webhook endpoint.


SignalWire must be able to reach your webhook from the internet. For local development, use ngrok or another tunneling service to expose your local server.

Use ngrok to expose port 5050 on your development machine:

ngrok http 5050

The output will look like:

Forwarding                    https://abc123def456.ngrok.io -> http://localhost:5050

Append /incoming-call to the HTTPS URL provided by ngrok:

https://abc123def456.ngrok.io/incoming-call

Use this as the Primary Script URL when creating your cXML script in the SignalWire Dashboard.

Set routes

For this example, you must include /incoming-call at the end of your URL. This is the specific webhook endpoint that our application uses to handle incoming calls.

  • Give the cXML Script a descriptive name, such as "AI Voice Assistant".
  • Save your new Resource.

Assign phone number or SIP address

To test your AI assistant, create a SIP address or phone number and assign it as a handler for your cXML Script Resource.

  • From the My Resources tab, select your cXML Script.
  • Open the Addresses & Phone Numbers tab.
  • Click Add, and select either SIP Address or Phone Number.
  • Fill out any required details, and save the configuration.

Test application

Dial the SIP address or phone number assigned to your cXML Script. You should now be speaking to your newly created agent!


How it works

This section walks through the key components of the integration. Start with the system architecture to understand the full picture, then explore each component in detail.

The application consists of four key components:

  1. cXML Server (Fastify): Receives incoming call webhooks and returns instructions, in the form of a cXML Script, to SignalWire
  2. WebSocket Bridge (SignalWireCompatibilityTransportLayer): Translates between SignalWire's media stream protocol and OpenAI's Realtime API format
  3. Realtime agent and session: Creates a RealtimeAgent with behavior instructions and attaches it to a RealtimeSession, which manages the bidirectional connection to OpenAI's Realtime API using the transport layer for speech processing and AI responses
  4. Function calling: Server-side tool execution (weather, time, custom functions) during conversations

When SignalWire receives an incoming call, it sends a webhook to your server. Your server responds with instructions, in the form of a cXML Script, that tell SignalWire to stream the audio to your WebSocket endpoint, which is connected to OpenAI.

src/routes/webhook.ts
import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
import { WEBHOOK_MESSAGES, AUDIO_FORMAT, SIGNALWIRE_CODECS } from '../constants.js';
import { AGENT_CONFIG } from '../config.js';

export async function webhookRoute(fastify: FastifyInstance) {
  fastify.all('/incoming-call', async (request: FastifyRequest, reply: FastifyReply) => {
    // Dynamically construct WebSocket URL from request headers
    const host = request.headers.host || 'localhost';
    const protocol = request.headers['x-forwarded-proto'] === 'https' ? 'wss' : 'ws';
    const websocketUrl = `${protocol}://${host}/media-stream`;

    // Get codec attribute based on configured audio format
    const codec = AGENT_CONFIG.audioFormat === AUDIO_FORMAT.PCM16
      ? SIGNALWIRE_CODECS.PCM16
      : SIGNALWIRE_CODECS.G711_ULAW;
    const codecAttribute = codec ? ` codec="${codec}"` : '';

    // Log codec selection for debugging
    console.log(`📞 Incoming call - Audio format: ${AGENT_CONFIG.audioFormat}, SignalWire codec: ${codec || 'default (G.711 μ-law)'}`);

    // Generate cXML response to stream audio to our WebSocket
    const cXMLResponse = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>${WEBHOOK_MESSAGES.CONNECTING}</Say>
  <Connect>
    <Stream url="${websocketUrl}"${codecAttribute} />
  </Connect>
</Response>`;

    reply.type('text/xml').send(cXMLResponse);
  });
}

The webhook automatically sets the correct codec based on your AUDIO_FORMAT configuration. See Configure audio format for how to set your desired audio quality.

Webhook URL Format

Your webhook URL must include /incoming-call at the end:

  • Local: https://your-ngrok-url.ngrok.io/incoming-call
  • Production: https://your-domain.com/incoming-call

This is the core of the integration. When a WebSocket connection is established at /media-stream, you create a transport layer that bridges SignalWire's audio stream to OpenAI's Realtime API. The transport layer automatically handles protocol translation and audio format conversions, so audio flows bidirectionally without any manual conversion.

In the code below, we handle WebSocket lifecycle events (connection, disconnection, errors), create the transport layer with the configured audio format, connect to OpenAI's Realtime API while logging key events (AI responses, transcriptions, tool calls), and trigger an immediate AI response to greet the caller.

src/routes/streaming.ts
import type { FastifyInstance } from 'fastify';
import type { WebSocket } from 'ws';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';
import { SignalWireCompatibilityTransportLayer } from '../transports/SignalWireCompatibilityTransportLayer.js';
import { AGENT_CONFIG } from '../config.js';

// Route registration wrapper mirrors webhook.ts
export async function streamingRoute(fastify: FastifyInstance) {
  fastify.get('/media-stream', { websocket: true }, async (connection: WebSocket) => {
    // Handle client disconnection
    connection.on('close', () => {
      console.log('Client disconnected');
    });

    // Handle connection errors
    connection.on('error', (error) => {
      console.error('Connection error:', error);
    });

    try {
      // Create the agent from the shared configuration (see src/config.ts)
      const realtimeAgent = new RealtimeAgent(AGENT_CONFIG);

      // Create SignalWire transport layer with configured audio format
      const signalWireTransportLayer = new SignalWireCompatibilityTransportLayer({
        signalWireWebSocket: connection,
        audioFormat: AGENT_CONFIG.audioFormat
      });

      // Create session with SignalWire transport
      const session = new RealtimeSession(realtimeAgent, {
        transport: signalWireTransportLayer,
        model: AGENT_CONFIG.model
      });

      // Listen to raw transport events for debugging
      session.transport.on('*', (event) => {
        switch (event.type) {
          case 'response.done':
            console.log('AI response completed', event);
            break;
          case 'conversation.item.input_audio_transcription.completed':
            console.log('User transcription completed', event);
            break;
          default:
            console.debug('Raw transport event:', event);
        }
      });

      // Listen to session events for tool call lifecycle
      session.on('agent_tool_start', (context, agent, tool, details) => {
        console.log('Tool call started:', details);
      });

      session.on('agent_tool_end', (context, agent, tool, result, details) => {
        console.log('Tool call completed:', details);
      });

      // Handle errors gracefully
      session.on('error', (error) => {
        console.error('Session error:', error);
      });

      // Connect to OpenAI Realtime API via the transport layer
      await session.connect({
        apiKey: process.env.OPENAI_API_KEY
      });

      // Trigger immediate AI response so the agent greets the caller
      try {
        const responseEvent = { type: 'response.create' };
        signalWireTransportLayer.sendEvent(responseEvent);
      } catch (error) {
        // AI-first response trigger failed, but session continues
      }

    } catch (error) {
      console.error('Error initializing session:', error);
    }
  });
}

The agent configuration controls how your assistant behaves during conversations. You define its personality through instructions, select its voice, specify which tools it can use, and configure the audio format:

src/config.ts - Agent configuration
import type { RealtimeAgentConfiguration } from '@openai/agents/realtime';
import { allTools } from './tools/index.js';

export const AGENT_CONFIG: RealtimeAgentConfiguration = {
  name: 'SignalWire Voice Assistant',
  voice: 'alloy',
  model: 'gpt-4o-realtime-preview',
  audioFormat: process.env.AUDIO_FORMAT || 'g711_ulaw',
  tools: allTools, // Server-side tools the agent can call (see src/tools/)
  instructions: `
You are a helpful and friendly voice assistant integrated with SignalWire.

IMPORTANT: Always start every conversation by greeting the caller first. Begin with something like "Hello! I'm your AI voice assistant. How can I help you today?"

You can help with weather information, time queries, and general conversation.
Be concise and friendly in your responses, remembering you're on a phone call.
When you first greet someone, briefly mention that you can help with weather, time, and answering questions.
Always confirm when you're about to use a tool.
`
};

// The agent is instantiated in the WebSocket handler with: new RealtimeAgent(AGENT_CONFIG)

Tools are server-side functions that the AI can call during a conversation. In production, extract the implementation logic into separate files as shown in the repository.

The get_time tool returns the current time in Eastern Time. It uses JavaScript's built-in toLocaleString method, so no external APIs are required.

src/tools/time.tool.ts
import { z } from 'zod';
import { tool as realtimeTool } from '@openai/agents/realtime';
import { ERROR_MESSAGES } from '../constants.js';

export const timeTool = realtimeTool({
  name: 'get_time',
  description: 'Get the current time in Eastern Time',
  parameters: z.object({}), // No parameters needed
  execute: async () => {
    try {
      const now = new Date();

      // Always format for Eastern Time
      const easternTime = now.toLocaleString('en-US', {
        timeZone: 'America/New_York',
        timeZoneName: 'short',
        weekday: 'long',
        year: 'numeric',
        month: 'long',
        day: 'numeric',
        hour: 'numeric',
        minute: '2-digit'
      });

      return `The current time in Eastern Time is ${easternTime}.`;
    } catch (error) {
      // Return fallback message if time formatting fails
      return ERROR_MESSAGES.TIME_UNAVAILABLE;
    }
  },
});
When a caller asks for the time, the end-to-end flow looks like this:

  1. User asks: "What time is it?"
  2. AI recognizes intent: Needs time information
  3. Function call triggered: get_time()
  4. Server executes: Gets current Eastern Time
  5. Result returned: AI incorporates into response
  6. User hears: "The current time is 3:45 PM Eastern Time."
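
The time tool takes no parameters. For a tool that does, the sketch below is loosely modeled on the weather tool mentioned above; the name get_weather, the coordinate parameters, and the use of the free Open-Meteo endpoint are illustrative assumptions, not the repository's actual implementation.

import { z } from 'zod';
import { tool as realtimeTool } from '@openai/agents/realtime';

// Illustrative sketch only - the repository's weather tool may differ.
// Takes coordinates so no geocoding step is needed, and queries the
// free Open-Meteo API (no key required).
export const weatherTool = realtimeTool({
  name: 'get_weather',
  description: 'Get the current temperature for a latitude/longitude',
  parameters: z.object({
    latitude: z.number().describe('Latitude in decimal degrees'),
    longitude: z.number().describe('Longitude in decimal degrees'),
  }),
  execute: async ({ latitude, longitude }) => {
    try {
      const url = `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&longitude=${longitude}&current_weather=true`;
      const response = await fetch(url);
      const data = await response.json() as {
        current_weather?: { temperature: number; windspeed: number };
      };
      if (!data.current_weather) {
        return 'Weather data is currently unavailable.';
      }
      const { temperature, windspeed } = data.current_weather;
      return `It is currently ${temperature}°C with wind at ${windspeed} km/h.`;
    } catch {
      return 'Weather data is currently unavailable.';
    }
  },
});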

Configuration

Environment variables

Set up your environment variables for different deployment scenarios:

Create a .env file in your project root:

# Required
OPENAI_API_KEY=sk-your-actual-api-key-here

# Optional
PORT=5050
AUDIO_FORMAT=g711_ulaw # or 'pcm16' for HD audio
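
The application reads these values via process.env. If you wire up a server of your own around this pattern, load the file at startup; a minimal sketch assuming the dotenv package:

// Load .env before any configuration is read (assumes the dotenv package).
import 'dotenv/config';

const PORT = Number(process.env.PORT ?? 5050);

// Fail fast when the required key is missing (see Troubleshooting below).
if (!process.env.OPENAI_API_KEY) {
  throw new Error('Missing OPENAI_API_KEY - set it in your .env file');
}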

Audio codec

Choose the right audio codec for your use case. The default is G.711 μ-law.

PCM16 @ 24kHz

Crystal-clear audio for demos and high-quality applications.

  • Sample rate: 24 kHz
  • Bandwidth: ~384 kbps
  • Quality: High definition

G.711 μ-law @ 8kHz

Standard telephony quality with lower bandwidth usage.

  • Sample rate: 8 kHz
  • Bandwidth: ~64 kbps
  • Quality: Standard telephony

The application automatically sets the correct codec in your cXML response based on the AUDIO_FORMAT environment variable:

# In your .env file
AUDIO_FORMAT=pcm16 # or g711_ulaw

The application will use pcm16 (24kHz HD audio) when set, or default to g711_ulaw (8kHz standard telephony) if not set.
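
Values other than these two are rejected at startup (see the Invalid AUDIO_FORMAT row under Troubleshooting). A minimal sketch of such a check, with hypothetical names:

// Hypothetical validation helper - names are illustrative, not the
// repository's actual implementation.
const VALID_AUDIO_FORMATS = ['g711_ulaw', 'pcm16'] as const;
type AudioFormat = (typeof VALID_AUDIO_FORMATS)[number];

export function resolveAudioFormat(raw: string | undefined): AudioFormat {
  // Default to standard telephony when the variable is unset
  const value = (raw ?? 'g711_ulaw').trim();
  if (!(VALID_AUDIO_FORMATS as readonly string[]).includes(value)) {
    throw new Error(`Invalid AUDIO_FORMAT "${value}": use g711_ulaw or pcm16`);
  }
  return value as AudioFormat;
}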

Troubleshooting

Refer to this table if you encounter issues running the application.

Issue: No audio from AI
Cause: Codec mismatch or incorrect codec configuration
Solution:
  • Check the AUDIO_FORMAT environment variable
  • Verify the SignalWire and application codecs match

Issue: Invalid AUDIO_FORMAT error
Cause: Invalid environment variable value
Solution:
  • Verify AUDIO_FORMAT is either g711_ulaw or pcm16
  • Check for typos or extra whitespace
  • Remove the variable to use the default (g711_ulaw)

Issue: Server fails to start
Cause: Port 5050 already in use
Solution:
  • Check what's running on port 5050: lsof -i :5050
  • Stop the conflicting application, or use a different port with PORT=5051 npm start

Issue: Health check failing
Cause: Server crashed or not responding
Solution:
  • Check server logs for error messages
  • Verify all configuration is correct
  • Try accessing the /health endpoint directly in a browser

Issue: Missing OPENAI_API_KEY
Cause: Configuration error
Solution:
  • Verify OPENAI_API_KEY in the .env file (local)
  • Verify Docker secrets are configured (Docker)

Issue: Calls not connecting after ngrok restart
Cause: ngrok generates a new URL each time it restarts
Solution:
  • Update the webhook URL in the SignalWire Dashboard with the new ngrok URL
  • Restart ngrok and update SignalWire before testing
