API Reference
Endpoints
No signup. No API key. No avatar library. Send any face photo to render.mirrorstage.ai and get a lip-synced video stream back. Payment happens inline via x402 — your wallet is your only credential.
Getting started
That's it. No dashboard, no API key, no avatar training. USDC on Base or Solana.
Payment
Sessions are paid in USDC via x402. The first 5 minutes are free; after that the rate is $0.10/min. The minimum session is 15 minutes, which costs $1.00 (10 billable minutes after the 5 free ones). Both Base (EVM) and Solana are accepted.
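The pricing arithmetic can be sketched as a small helper. This is only an estimate built from the numbers quoted here: it assumes the 5 free minutes apply once per session, and the function name is illustrative, not part of the API. The price in the 402 response remains the authoritative figure.

```python
from decimal import Decimal

FREE_MINUTES = 5                    # free minutes quoted in the pricing text
RATE_PER_MINUTE = Decimal("0.10")   # USD per billable minute
MIN_SESSION_MINUTES = 15            # shortest session that can be bought


def session_cost_usd(minutes: int) -> Decimal:
    """Estimate the USDC cost of a session lasting `minutes` minutes.

    Assumption: the free minutes are deducted once per session.
    """
    if minutes < MIN_SESSION_MINUTES:
        raise ValueError(f"minimum session is {MIN_SESSION_MINUTES} minutes")
    billable = minutes - FREE_MINUTES
    return billable * RATE_PER_MINUTE
```

For the 15-minute minimum this gives 10 billable minutes at $0.10, which matches the $1.00 quoted above.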
EVM (Base) — Python
pip install 'x402[evm]' httpx eth-account
Solana
Always confirm with the user before paying. Show the price, duration, and wallet — never sign a transaction without explicit approval.
Payment is settled on-chain when the session is created. USDC moves from your wallet.
Each signed payment can only be used once (replay protection).
If the retry fails for a non-payment reason, you have not been charged.
The 402 response lists all accepted chains — the SDK picks the first one your wallet supports.
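The two client-side rules above (pick the first accepted chain the wallet supports, and never pay without approval) can be sketched as follows. The `network` key and both function names are assumptions for illustration; check the actual 402 body and your SDK for the real field names.

```python
def pick_payment_option(accepts, wallet_networks):
    """Return the first offered option whose network the wallet supports.

    `accepts` is the list of payment options from the 402 response;
    the "network" key is an assumed field name.
    """
    for option in accepts:
        if option.get("network") in wallet_networks:
            return option
    return None  # the wallet supports none of the offered chains


def user_approves(price_usd, minutes, wallet, network, ask=input):
    """Show price, duration, and wallet; return True only on explicit yes."""
    prompt = (
        f"Pay ${price_usd} USDC on {network} from {wallet} "
        f"for a {minutes}-minute session? [y/N] "
    )
    return ask(prompt).strip().lower() in {"y", "yes"}
```

Anything other than an explicit "y" or "yes" is treated as a refusal, so an empty answer never results in a signed transaction.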
Examples
Your agent already generates text. Pipe it through any TTS provider, then stream the audio to Mirrorstage. The avatar speaks what your agent writes.
Any TTS provider works — here are a few
Send PCM for the lowest latency, or any encoded format (MP3, WAV, OGG, FLAC) — auto-detected on the server.
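To illustrate what format auto-detection generally looks like, here is a minimal sketch that inspects the leading bytes of a buffer. It is not the server's actual logic, which this page does not describe; it only uses the well-known signatures of the WAV, OGG, FLAC, and MP3 formats, and the function name is hypothetical.

```python
def sniff_audio_format(head: bytes) -> str:
    """Guess the audio format from the first bytes of a stream."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync (11 set bits)
    return "pcm"      # no known signature: treat as raw samples
```

Raw PCM has no header, so it can only be the fallback case, and a PCM buffer that happens to start with 0xFF bytes would be misread as MP3 by this heuristic. A client may find a check like this useful for logging what it is about to send.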
Agent + ElevenLabs
Agent generates text → ElevenLabs converts to PCM → Mirrorstage renders the avatar.
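The shape of this pipeline can be sketched with the provider calls left as placeholders. `synthesize` stands in for whatever TTS call you use and `send_audio` for whatever writes to the Mirrorstage audio stream; both are hypothetical callables, and no real ElevenLabs or Mirrorstage client code is shown here.

```python
from typing import Callable, Iterable


def speak(
    sentences: Iterable[str],
    synthesize: Callable[[str], bytes],
    send_audio: Callable[[bytes], None],
) -> int:
    """Convert each sentence to audio and forward it; return bytes sent."""
    total = 0
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank agent output
        audio = synthesize(text)  # text -> audio bytes (e.g. PCM)
        send_audio(audio)         # audio bytes -> avatar stream
        total += len(audio)
    return total
```

Working sentence by sentence lets the avatar start speaking before the agent has finished its whole reply.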
Agent + Fish Audio
Budget-friendly alternative. Send MP3 as one frame — the server handles decoding.
Check if the service is up and get current pricing. No auth required.
Terms of Service. Must accept before creating a session.
Send any face photo. There are no pre-trained avatars and no face IDs: the photo you send is the avatar. If you omit output_url, the gateway uses its configured LiveKit URL; if you provide output_url, it can be a LiveKit or RTMP URL. The server returns 402 with payment details; sign the payment and retry. If you already have a running session, don't create a new one: use POST /extend to add time, or send a change_reference message over the WebSocket to swap the face.
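The "402, sign, retry" sequence can be sketched independently of any HTTP library. Here `send` and `sign` are hypothetical callables: `send(payment)` performs the create-session request (with the signed payment attached when it is not None) and returns a status code plus parsed body, and `sign(details)` produces a signed payment from the 402 details. An x402 SDK would typically handle these steps for you; the sketch only makes the order explicit.

```python
from typing import Any, Callable, Optional, Tuple

Response = Tuple[int, Any]  # (HTTP status code, parsed body)


def create_with_payment(
    send: Callable[[Optional[str]], Response],
    sign: Callable[[Any], str],
) -> Any:
    """Request a session, paying once if the server answers 402."""
    status, body = send(None)  # first attempt carries no payment
    if status == 402:
        payment = sign(body)   # sign the returned payment details
        status, body = send(payment)
        if status == 402:
            raise RuntimeError("payment was not accepted")
    if status >= 400:
        raise RuntimeError(f"session request failed with status {status}")
    return body
```

Because each signed payment can be used only once, a later attempt should call `sign` again on fresh 402 details rather than resend the old payment.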
Poll until status is "active", then start streaming audio. Cold starts take ~5 minutes. Warm starts are instant. The billable timer doesn't start until you're live.
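A minimal polling loop for this step might look like the following. `get_status` is a hypothetical callable that fetches the session and returns its status string; only the "active" value comes from this page. The default timeout of 10 minutes is an assumption chosen to leave headroom over a cold start of about 5 minutes.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,   # assumed: roughly twice a cold start
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until the session reports "active" or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "active":
            return
        if clock() >= deadline:
            raise TimeoutError(f"session still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.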
Stream audio in while lip-synced video is delivered on the session’s configured LiveKit or RTMP output. Send any audio format — it's auto-detected and decoded on the server.
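When streaming raw PCM, audio is usually sent in small fixed-duration frames rather than one large buffer. The sketch below splits a PCM buffer into such frames. The sample rate, channel count, sample width, and frame duration are all assumptions, since this page does not state the expected PCM parameters; take the real values from the endpoint's specification.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed; not stated on this page
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # bytes per sample (16-bit), assumed
    frame_ms: int = 20,        # assumed frame duration
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of a raw PCM buffer."""
    frame_bytes = sample_rate * frame_ms // 1000 * channels * sample_width
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]  # last frame may be shorter
```

With the defaults, each frame is 640 bytes (320 samples of 16-bit mono audio). Encoded formats such as MP3 do not need this step if they are sent as a single frame.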
Add more time without dropping the stream. Same avatar, no re-warm. Payment required for the additional minutes. Must pay from the same wallet that created the session.
End a session early. The GPU releases immediately and the LiveKit or RTMP output stops.
