Give any face a voice.
Real-time lip-synced avatars from a single photo. Stream audio in, get video out. Pay per minute with USDC — no accounts, no API keys.
Any face, instantly
Upload any photo. A custom avatar is live within five minutes, or instantly on a warm start. No training pipeline, no pre-built library.
Agent-native
The only avatar API an autonomous agent can discover, pay for, and operate without human setup. skill.md + x402.
LiveKit first
Video streams over LiveKit by default for browser and OBS viewer flows, or to any RTMP endpoint when you provide one.
Zero friction
No accounts. No API keys. No subscriptions. Pay per minute with USDC — your wallet is your only credential.
Built different
Other avatar platforms ship pre-rendered video, not real-time. They require training your avatar for hours or picking from a library, plus accounts, API keys, and monthly subscriptions.
Those that do stream send output via WebRTC to a browser tab, require account signup and API key management, and offer no agent discovery or programmatic payment.
No GPU provisioning. No CUDA dependencies. No FFmpeg pipeline maintenance. No scaling headaches. Upload a photo and go.
How it works
Create a session
Send a face image. If you omit output_url, the gateway uses LiveKit. Pay with USDC via x402 — your wallet is your identity.
Wait for warm-up
The GPU loads your avatar. Cold starts take ~5 minutes. Warm starts are instant. The timer doesn't start until you're live.
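Since a cold start can take around five minutes, clients should poll rather than assume readiness. A sketch of that wait loop, assuming a status endpoint that eventually reports `"ready"` (the status values and polling interface are assumptions):

```python
import time

def wait_for_warmup(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until it returns "ready" or the timeout elapses.

    get_status is any callable that fetches the session state; the
    "ready" value and timings here are illustrative assumptions.
    """
    waited = 0.0
    while waited < timeout_seconds:
        if get_status() == "ready":
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False
```

Because billing only starts once the session is live, polling during warm-up costs nothing but patience.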
Stream audio
Connect via WebSocket and send audio in any format — PCM, MP3, WAV, OGG, or FLAC. Auto-detected. The GPU renders lip-synced video in real time.
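The format auto-detection described above can be done from magic bytes alone. A sketch of that sniffing logic for the five listed formats; treating headerless data as raw PCM is an assumption about the gateway's fallback:

```python
def sniff_audio_format(chunk: bytes) -> str:
    """Guess the audio container from the first bytes of a stream."""
    # MP3: ID3 tag or an MPEG frame sync header.
    if chunk[:3] == b"ID3" or chunk[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return "mp3"
    # WAV: RIFF container with a WAVE form type.
    if chunk[:4] == b"RIFF" and chunk[8:12] == b"WAVE":
        return "wav"
    if chunk[:4] == b"OggS":
        return "ogg"
    if chunk[:4] == b"fLaC":
        return "flac"
    # No recognizable header: assume raw PCM.
    return "pcm"
```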
Watch the output
Video streams to LiveKit by default, or to your RTMP endpoint if you provide one. Extend, end, or let it expire.
Integrate
Point your agent at the skill file. Discovery, payment, and rendering happen autonomously — no human in the loop.
