LIVE API

Telephony middleware
for AI voice developers.

Control calls with REST APIs. Get real-time audio via webhooks. Bring your own SIP carrier. Build anything.

Try it now — enter your email and phone number, then hit Execute. We'll call you with a quick automated demo.

~ / try-monkeydial

$ curl -X POST https://monkeydial.com/demo/api/v1/dial \
-H "X-API-Key: " \
-H "Content-Type: application/json" \
-d '{
"to": "+1",
"from": "+17029860828"
}'


01 — Outbound Dialer

Make a phone call with one POST.

Send a single HTTP request and the phone rings. You get a call_uuid back immediately. Configure a webhook URL to receive real-time events: call answered, audio segments, DTMF digits, call ended. No XML. No state machines. Just REST in, webhooks out.

Your App

POST /dial

MonkeyDial

Phone Rings

outbound-call

$ curl -X POST https://api.monkeydial.com/v1/dial \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "to": "+13105551234",
    "from": "+17029860828",
    "webhook_url": "https://your-app.com/webhook"
  }'

# Response 200 OK
{
  "success": true,
  "call_uuid": "86fd19ef-d17d-482f-bbda-65287272ac22",
  "message": "Call initiated successfully"
}
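The same request can be made from application code. The following is a minimal Python sketch using only the standard library. The endpoint, headers, and JSON fields are taken from the curl example above; the function names `build_dial_request` and `place_call`, the timeout, and the error handling are assumptions made for illustration, not part of a MonkeyDial SDK.

```python
import json
import urllib.request

API_BASE = "https://api.monkeydial.com/v1"  # base URL as shown in the curl example


def build_dial_request(api_key, to, from_, webhook_url=None):
    """Build (but do not send) the POST /dial request from the curl example."""
    payload = {"to": to, "from": from_}
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return urllib.request.Request(
        f"{API_BASE}/dial",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def place_call(api_key, to, from_, webhook_url=None, timeout=10):
    """Send the request and return the call_uuid from the JSON response.

    Assumes the response shape shown above: success, call_uuid, message.
    """
    request = build_dial_request(api_key, to, from_, webhook_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(body.get("message", "dial failed"))
    return body["call_uuid"]
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without placing a real call.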

02 — AI Voice Agent

Your models. Your pipeline. Our audio.

When a call connects, voice activity detection (VAD) listens for speech and delivers audio segments to your webhook in real time. Run them through your own stack: Deepgram, Whisper, OpenAI, ElevenLabs, or anything else. Send the generated audio back and MonkeyDial plays it to the caller.

Call

VAD

Your STT

Your LLM

Your TTS

Playback

ai-voice-agent

── webhook event ──────────────────────
POST https://your-app.com/webhook
{
  "event": "vad_audio",
  "call_uuid": "86fd19ef-d17d-482f",
  "audio_url": "/segments/seg_003.wav",
  "duration_ms": 2100
}

── your app (any stack) ───────────────
STT → LLM → TTS
Whisper, Deepgram, OpenAI, ElevenLabs
...whatever you want

── play response back ─────────────────
POST /v1/vad/{call_uuid}/playback_audio
Content-Type: multipart/form-data
audio_file: response.wav

← 200 {"success": true, "status": "playing"}
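The STT → LLM → TTS step in the middle is your code. One possible shape for it is sketched below in Python. The event fields (`event`, `call_uuid`, `audio_url`) come from the webhook sample above. Everything else is an assumption: `handle_vad_event` and its five callable parameters are hypothetical names, standing in for whichever providers and HTTP client you use.

```python
from typing import Callable, Optional


def handle_vad_event(
    event: dict,
    fetch_audio: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play_audio: Callable[[str, bytes], None],
) -> Optional[str]:
    """Run one vad_audio segment through STT -> LLM -> TTS and play the result.

    Returns the reply text, or None if the event was ignored.
    """
    if event.get("event") != "vad_audio":
        return None  # other event types are handled elsewhere

    call_uuid = event["call_uuid"]
    audio = fetch_audio(event["audio_url"])   # download the WAV segment
    transcript = transcribe(audio)            # your STT
    if not transcript.strip():
        return None                           # silence or noise: nothing to answer
    reply = generate_reply(transcript)        # your LLM
    # Your TTS, then an upload to POST /v1/vad/{call_uuid}/playback_audio.
    play_audio(call_uuid, synthesize(reply))
    return reply
```

Passing the providers in as callables keeps the handler independent of any one vendor and lets you exercise it with stubs before wiring up real services.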

03 — Call Bridging

Connect any two calls. Or ten.

Bridge an inbound caller to an outbound call with a single join — caller asks for billing, your app dials billing and connects them. For more complex scenarios, use conferences: agents wait in a room, outbound calls get placed and joined to available agents as they connect. Warm transfers, escalations, multi-party handoffs — your application decides the logic, MonkeyDial connects the calls.

Call A

Bridge / Conference

Call B

call-bridging

── simple bridge: connect two calls ───
POST /v1/dial
{"to": "+18005551234", "from": "+17029860828"}
← 200 {"call_uuid": "b7c2d4e1..."}

POST /v1/bridge
{"call_a": "86fd19ef...", "call_b": "b7c2d4e1..."}
← 200 {"success": true}

── conference: agents + warm transfers ─
POST /v1/conference/create
{"name": "agent-room-12"}

POST /v1/conference/agent-room-12/join
{"call_ids": ["86fd19ef...", "b7c2d4e1..."]}
← 200 {"success": true, "participants": 2}
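Choosing between a bridge and a conference is application logic. The Python sketch below shows one way to express that choice as a list of requests to send. The paths and JSON bodies mirror the sample above; the helper names and the rule "two calls use a bridge, more use a conference" are assumptions for illustration only.

```python
def bridge_request(call_a, call_b):
    """Request tuple for a simple two-call bridge."""
    return ("POST", "/v1/bridge", {"call_a": call_a, "call_b": call_b})


def conference_requests(name, call_ids):
    """Request tuples to create a conference and join calls to it."""
    return [
        ("POST", "/v1/conference/create", {"name": name}),
        ("POST", f"/v1/conference/{name}/join", {"call_ids": list(call_ids)}),
    ]


def plan_connection(call_ids, room_name="agent-room-12"):
    """Return the (method, path, body) requests needed to connect the calls.

    Hypothetical policy: exactly two calls are bridged directly,
    three or more are placed in a conference room.
    """
    call_ids = list(call_ids)
    if len(call_ids) < 2:
        raise ValueError("need at least two calls to connect")
    if len(call_ids) == 2:
        return [bridge_request(call_ids[0], call_ids[1])]
    return conference_requests(room_name, call_ids)
```

Returning plain tuples keeps the decision separate from the HTTP client, so the same plan can be logged, tested, or replayed.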

03.5 — Call Routing

Route calls with keypresses, voice, or both.

Build IVR menus that route calls via DTMF — press 1 for sales, 2 for support. Or go further: capture audio segments, run them through your own speech recognition, and route based on what the caller says. Mix both modes freely. Transfer to agents, forward to another number, or hand off to an AI — all from your webhook handler.

Inbound Call

DTMF or Voice

Your Logic

Route

call-routing

── option A: route by keypress ────────
WEBHOOK → your-app.com/webhook
{
  "event": "dtmf",
  "call_uuid": "86fd19ef-d17d-482f",
  "digit": "1"
}

── option B: route by voice ───────────
WEBHOOK → your-app.com/webhook
{
  "event": "vad_audio",
  "call_uuid": "86fd19ef-d17d-482f",
  "audio_url": "/segments/seg_001.wav"
}
→ your STT: "representative"

── route the call ─────────────────────
POST /v1/dial
{"to": "+18005551234", "from": "+17029860828"}

← 200 {"success": true, "call_uuid": "a3b8c9..."}
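A webhook handler that mixes both modes might look like the following Python sketch. The `dtmf` and `vad_audio` event shapes are taken from the sample above. The routing tables, the destination numbers, and the `transcribe` callable are hypothetical placeholders for your own menu and speech recognition.

```python
# Hypothetical routing tables: keypress or spoken keyword -> destination number.
DTMF_ROUTES = {"1": "+18005551234", "2": "+18005555678"}
VOICE_ROUTES = {
    "sales": "+18005551234",
    "support": "+18005555678",
    "representative": "+18005555678",
}


def route_event(event, transcribe=None):
    """Return the number to dial for this webhook event, or None.

    `transcribe` is your own STT function: it takes the segment's
    audio_url and returns the recognized text.
    """
    kind = event.get("event")
    if kind == "dtmf":
        return DTMF_ROUTES.get(event.get("digit"))
    if kind == "vad_audio" and transcribe is not None:
        text = transcribe(event["audio_url"])
        for word in text.lower().split():
            destination = VOICE_ROUTES.get(word.strip(".,!?"))
            if destination is not None:
                return destination
    return None
```

When `route_event` returns a number, the handler would follow up with the POST /v1/dial call shown above; a `None` result is a natural place to replay the menu prompt.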

04 — Phone Payments

Collect payments over the phone.

Build payment flows where callers enter card numbers on their keypad via DTMF, or speak them — capture audio segments through VAD, run your own STT to extract the details, validate, and charge. Mix both modes: keypad for card numbers, voice for confirmation. Your app handles the input tracking, validation, and payment processing.

Caller

DTMF or Voice

Your App

Charge

phone-payment

── prompt the caller ──────────────────
POST /v1/vad/{call_uuid}/playback_audio
audio_file: enter-card-number.wav

── DTMF digits arrive at your webhook ─
WEBHOOK → your-app.com/webhook
{"event": "dtmf", "digit": "4"}
{"event": "dtmf", "digit": "2"}
{"event": "dtmf", "digit": "4"}
{"event": "dtmf", "digit": "2"}
...collect all 16 digits, exp, CVV

── your app charges the card ──────────
Stripe, Authorize.net, etc.

── confirm to the caller ──────────────
POST /v1/vad/{call_uuid}/playback_audio
audio_file: payment-confirmed.wav
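The input tracking and validation mentioned above could be sketched in Python as follows. It buffers `dtmf` events per call and applies the standard Luhn checksum once enough digits have arrived. The `CardCollector` class, its 16-digit default, and the use of `call_uuid` as the buffer key are assumptions for illustration; expiry date, CVV, and the actual charge are left to your payment processor.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


class CardCollector:
    """Accumulate DTMF digits per call until a full card number is entered."""

    def __init__(self, length=16):
        self.length = length
        self.buffers = {}  # call_uuid -> list of digits received so far

    def on_dtmf(self, event):
        """Feed one dtmf event.

        Returns None while still collecting, or (number, is_valid)
        once `length` digits have arrived for that call.
        """
        digit = event.get("digit", "")
        if not digit.isdigit():
            return None  # ignore * and # in this sketch
        buffer = self.buffers.setdefault(event.get("call_uuid", ""), [])
        buffer.append(digit)
        if len(buffer) < self.length:
            return None
        number = "".join(buffer)
        buffer.clear()  # do not keep the card number around after returning it
        return number, luhn_valid(number)
```

A failed check is a reasonable point to replay the prompt audio and start collecting again instead of attempting a charge.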

Bring Your Own Carrier

Your carrier. Our middleware.

No migration

Keep your existing SIP provider. Point your DID inbound route to your MonkeyDial SIP URI.

No lock-in

Swap carriers any time. Each DID can route from a different provider. Mix and match freely.

Any SIP carrier

If your carrier can deliver calls to a SIP URI via IP whitelist, username/password, or tech prefix — it works.

Tested with Flowroute, Twilio, Telnyx, and Bandwidth. Any other carrier that supports challenge-based auth on dial is also supported (no full SIP registration required).

FAQ

Frequently Asked Questions

What does MonkeyDial actually do?

MonkeyDial is telephony middleware. It connects your SIP carrier to your application via REST APIs and webhooks. You make API calls to dial, hang up, play audio, and create conferences. You receive webhooks for events like audio segments, DTMF digits, and call state changes. You write the application logic: the AI agent, the IVR, the routing rules.

Do I need to switch carriers?

No. Keep your existing Twilio, Flowroute, Telnyx, or any other SIP provider. Just point your DID's inbound route to your MonkeyDial SIP URI. It takes about 2 minutes. No migration, no downtime.

How does billing work?

Pay as you go, with no subscriptions. You pay your carrier for call costs (phone numbers, per-minute rates). MonkeyDial charges separately for processing minutes only, starting at $0.02/min billed in 6-second increments. Volume discounts kick in automatically. See pricing for details.

What about audio streaming?

Today, MonkeyDial delivers audio as file-based segments via webhooks: VAD detects speech, packages the audio, and sends it to your endpoint. This works well for most voice agent and IVR use cases. Real-time bidirectional audio streaming is in active development and will be available for beta testing soon. If streaming is critical for your use case, reach out. We'd love to have you in the beta.

What's included vs. what do I build?

MonkeyDial handles: inbound/outbound calls, VAD, conference bridges, DTMF detection, call recording, AMD (answering machine detection), audio playback, and webhook delivery.

You build: your application logic, meaning the AI agent, the routing rules, and the IVR flows. Use any language, any AI provider, any framework.

Is there a free trial?

Try the live demo above. It makes a real call to your phone using the actual MonkeyDial API. When you're ready, sign up and get $10 in free credit to start building. No credit card required.