The Streaming STT API uses WebSockets for real-time transcription. You connect once, stream audio chunks, receive partial transcripts as they become available, and send a commit message to receive the final transcript.

Endpoint

wss://infer.voice.intron.io/stt/v1/stream

Authentication

All connections require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY

Connection query parameters

Pass these values as query parameters on the WebSocket URL.

| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| session_id | String | Existing session ID. If omitted, the server creates one and returns it in SESSION_CREATED. | No | Auto-generated |
| sample_rate | Integer | Input audio sample rate in Hz. | No | 16000 |
| bit_rate | Integer | Input PCM bit depth. | No | 16 |
| num_channels | Integer | Number of input audio channels. | No | 1 |
| use_language_asr_input | String | Input language code (see Supported Languages). | No | en |

Supported Languages

See the comprehensive list of supported languages.

WebSocket handshake example

wscat -c 'wss://infer.voice.intron.io/stt/v1/stream?sample_rate=16000&bit_rate=16&num_channels=1&use_language_asr_input=en' \
	-H 'Authorization: Bearer YOUR_API_KEY'
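If you are connecting from code rather than wscat, the handshake needs the same two pieces: the URL with its query parameters and the Authorization header. The following is a minimal sketch using only the Python standard library; the function names build_stream_url and build_auth_headers are hypothetical helpers, not part of the API, and the defaults mirror the parameter table above.

```python
from urllib.parse import urlencode

STREAM_ENDPOINT = "wss://infer.voice.intron.io/stt/v1/stream"


def build_stream_url(sample_rate=16000, bit_rate=16, num_channels=1,
                     language="en", session_id=None):
    """Return the streaming URL with the documented query parameters."""
    params = {
        "sample_rate": sample_rate,
        "bit_rate": bit_rate,
        "num_channels": num_channels,
        "use_language_asr_input": language,
    }
    # session_id is optional; leave it out to let the server generate one.
    if session_id is not None:
        params["session_id"] = session_id
    return f"{STREAM_ENDPOINT}?{urlencode(params)}"


def build_auth_headers(api_key):
    """Return the Authorization header required on the handshake."""
    return {"Authorization": f"Bearer {api_key}"}
```

With the defaults, build_stream_url() produces the same URL as the wscat command above. Pass the URL and header dictionary to the connect call of whichever WebSocket client library you use; the exact keyword for extra headers differs between libraries.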

Input message types

INPUT_AUDIO_CHUNK

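| Field | Type | Required | Description |
|---|---|---|---|
| message_type | String | Yes | Must be INPUT_AUDIO_CHUNK |
| audio_base_64 | String | Yes | Base64-encoded PCM16 little-endian audio bytes |
| ack_id | Integer | No | Client-assigned chunk sequence ID, echoed back in the server's AUDIO_CHUCK_ACK message |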
JSON
{
	"message_type": "INPUT_AUDIO_CHUNK",
	"audio_base_64": "<base64_pcm16_chunk>",
	"ack_id": 1
}
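A minimal Python sketch of building this message from raw samples is shown here, assuming your audio is already 16-bit signed integers. The helper names pcm16_from_samples and make_audio_chunk_message are hypothetical. The odd-length check reflects the documented INPUT_ERROR for an invalid PCM16 payload length, since every PCM16 sample occupies 2 bytes.

```python
import base64
import json
import struct


def pcm16_from_samples(samples):
    """Pack signed 16-bit integer samples as little-endian PCM16 bytes."""
    # "<" forces little-endian byte order; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(samples)}h", *samples)


def make_audio_chunk_message(pcm16_bytes, ack_id=None):
    """Build an INPUT_AUDIO_CHUNK JSON message from raw PCM16 bytes."""
    # Each PCM16 sample is 2 bytes, so an odd length cannot be valid audio.
    if len(pcm16_bytes) % 2 != 0:
        raise ValueError("PCM16 payload length must be a multiple of 2 bytes")
    message = {
        "message_type": "INPUT_AUDIO_CHUNK",
        "audio_base_64": base64.b64encode(pcm16_bytes).decode("ascii"),
    }
    # ack_id is optional; when present, the server echoes it in its ACK.
    if ack_id is not None:
        message["ack_id"] = ack_id
    return json.dumps(message)
```

For multichannel input the payload would also need to contain whole frames (2 bytes × num_channels); this sketch only checks the mono case.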

COMMIT

| Field | Type | Required | Description |
|---|---|---|---|
| message_type | String | Yes | Must be COMMIT |
JSON
{
	"message_type": "COMMIT"
}

Response message types

Every server message is a JSON object whose message_type field identifies which of the payloads below it is.

Session and transcript messages

SESSION_CREATED

Sent immediately after successful authentication, capacity checks, and quota checks. It returns the active session_id, the current credit balance, and the effective stream configuration.
JSON
{
	"message_type": "SESSION_CREATED",
	"session_id": "12a9760f-b165-4404-91d0-a65d4cdt78fs",
	"credit_balance": 120.0,
	"configs": {
		"sample_rate": 16000,
		"bit_rate": 16,
		"num_channels": 1,
		"use_prompt_id": null,
		"use_language_asr_input": "en",
		"use_language_asr_output": "en"
	}
}
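Because configs reports the effective stream configuration, a client can compare it against what it requested before sending audio. This is a minimal sketch under that reading; check_session_created is a hypothetical helper, and what to do about a mismatch (abort, resample, or continue) is left to the caller.

```python
import json


def check_session_created(raw, requested):
    """Return (session_id, mismatches) from a SESSION_CREATED message.

    mismatches maps each requested key whose effective value differs
    to a (requested, effective) tuple.
    """
    msg = json.loads(raw)
    if msg.get("message_type") != "SESSION_CREATED":
        raise RuntimeError(f"expected SESSION_CREATED, got {msg.get('message_type')!r}")
    configs = msg.get("configs", {})
    mismatches = {
        key: (value, configs.get(key))
        for key, value in requested.items()
        if configs.get(key) != value
    }
    return msg["session_id"], mismatches
```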
AUDIO_CHUCK_ACK

Sent after the server accepts and stores an audio chunk. The chunck_id field echoes the client's ack_id when one was sent; otherwise it holds the running chunk count.
JSON
{
	"message_type": "AUDIO_CHUCK_ACK",
	"chunck_id": 1,
	"total_chunks": 1
}
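If you send ack_id values, you can track which chunks the server has not yet acknowledged. This is a minimal sketch assuming, per the description above, that the echoed ack_id arrives in chunck_id. AckTracker is a hypothetical class; the message type and field names are spelled exactly as this page documents them.

```python
import json


class AckTracker:
    """Track client ack_ids that have not been acknowledged yet."""

    def __init__(self):
        self.next_id = 1
        self.pending = set()

    def next_ack_id(self):
        """Allocate the ack_id for the next outgoing INPUT_AUDIO_CHUNK."""
        ack_id = self.next_id
        self.next_id += 1
        self.pending.add(ack_id)
        return ack_id

    def handle_ack(self, raw):
        """Clear the acknowledged id; return False for non-ACK messages."""
        msg = json.loads(raw)
        # Spelling follows the documented message: AUDIO_CHUCK_ACK / chunck_id.
        if msg.get("message_type") != "AUDIO_CHUCK_ACK":
            return False
        self.pending.discard(msg["chunck_id"])
        return True
```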
PARTIAL_TRANSCRIPT

Sent whenever a new partial transcript is available that differs from the last partial transcript already sent on the connection.
JSON
{
	"message_type": "PARTIAL_TRANSCRIPT",
	"transcript": "patient reports intermittent chest pain"
}
COMMITTED_TRANSCRIPT

Sent after the client sends COMMIT and the backend finishes processing the full stream. This is the final transcript payload for the session.
JSON
{
	"message_type": "COMMITTED_TRANSCRIPT",
	"transcript_id": "e1c4a90b-e319-4de6-9f22-0e0cf5e8b7a2",
	"transcript_text": "patient reports intermittent chest pain for two days",
	"audio_len": 10
}
PING

Optional keep-alive message sent by the server when periodic pinging is enabled for the connection.
JSON
{
	"message_type": "PING",
	"message": "ping"
}

Error and terminal messages

AUTHENTICATION_ERROR

Sent when the Authorization header is missing or invalid, or the access key fails authentication. The server closes the connection after sending this message.
JSON
{
	"message_type": "AUTHENTICATION_ERROR",
	"message": "permission denied,access-key error"
}
RESOURCE_EXHAUSTED

Sent when the service does not currently have enough available capacity to accept the stream. The status field indicates the capacity state returned by the backend.
JSON
{
	"message_type": "RESOURCE_EXHAUSTED",
	"status": "CAPACITY_NOT_AVAILABLE"
}
QUOTA_EXCEEDED

Sent when the authenticated user or entity has insufficient credits or has exceeded an allowed usage limit. The server closes the connection after sending this message.
JSON
{
	"message_type": "QUOTA_EXCEEDED",
	"credits_balance": "0",
	"message": "insufficient credits balance"
}
INPUT_ERROR

Sent when the request payload is invalid: for example, malformed JSON, an invalid message structure, invalid base64 audio data, an invalid PCM16 payload length, or audio sent after COMMIT.
JSON
{
	"message_type": "INPUT_ERROR",
	"message": "Invalid base64 audio payload"
}
CHUNCK_SIZE_TOO_SMALL

Sent when a received audio chunk is smaller than the minimum allowed chunk size. The chunk_size_min value is the limit in KB (see Runtime limits).
JSON
{
	"message_type": "CHUNCK_SIZE_TOO_SMALL",
	"chunk_size_min": "1"
}
CHUNK_SIZE_TOO_LARGE

Sent when a received audio chunk is larger than the maximum allowed chunk size. The chunk_size_max value is the limit in KB (see Runtime limits).
JSON
{
	"message_type": "CHUNK_SIZE_TOO_LARGE",
	"chunk_size_max": "32"
}
INSUFFICIENT_AUDIO_ACTIVITY

Sent when no audio chunk has been received for longer than the allowed idle timeout (60 seconds) while the session is still active.
JSON
{
	"message_type": "INSUFFICIENT_AUDIO_ACTIVITY",
	"message": "insufficient audio activity detected,exceed max idle time of 60 seconds"
}
SESSION_TIME_LIMIT_EXCEEDED

Sent when the WebSocket session exceeds the maximum allowed session lifetime. The session_time_limit value is the limit in seconds.
JSON
{
	"message_type": "SESSION_TIME_LIMIT_EXCEEDED",
	"session_time_limit": "300"
}
CONNECTION_LOST

Sent when the server detects that the WebSocket connection has been lost.
JSON
{
	"message_type": "CONNECTION_LOST",
	"message": "connection lost"
}
ERROR

Sent for unexpected server-side failures that do not map to a more specific error message.
JSON
{
	"message_type": "ERROR",
	"message": "connection error"
}
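A client typically decodes each incoming message once and surfaces the types in this section as failures. This is a minimal sketch; parse_server_message and StreamError are hypothetical names. It treats every message type listed in this section as fatal for the session, which is a conservative assumption: the page only states explicitly that the server closes the connection after AUTHENTICATION_ERROR and QUOTA_EXCEEDED.

```python
import json

# Message types from "Error and terminal messages", spelled as documented.
ERROR_MESSAGE_TYPES = {
    "AUTHENTICATION_ERROR", "RESOURCE_EXHAUSTED", "QUOTA_EXCEEDED",
    "INPUT_ERROR", "CHUNCK_SIZE_TOO_SMALL", "CHUNK_SIZE_TOO_LARGE",
    "INSUFFICIENT_AUDIO_ACTIVITY", "SESSION_TIME_LIMIT_EXCEEDED",
    "CONNECTION_LOST", "ERROR",
}


class StreamError(Exception):
    """Raised when the server reports an error or terminal condition."""

    def __init__(self, message_type, payload):
        super().__init__(f"{message_type}: {payload}")
        self.message_type = message_type
        self.payload = payload


def parse_server_message(raw):
    """Decode one server message; raise StreamError for error types."""
    msg = json.loads(raw)
    message_type = msg.get("message_type")
    if message_type in ERROR_MESSAGE_TYPES:
        raise StreamError(message_type, msg)
    return msg
```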

Runtime limits

Max session lifetime: 300 seconds
Max idle audio gap: 60 seconds
Min chunk size: 1 KB
Max chunk size: 32 KB
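The limits translate directly into byte counts for raw audio. At the default 16 kHz, 16-bit, mono settings the data rate is 16000 × 2 × 1 = 32,000 bytes per second, so 300 seconds corresponds to 9,600,000 bytes. The minimal sketch below splits a buffer into chunks inside the documented range. It assumes 1 KB = 1024 bytes and that the limits apply to the raw PCM bytes rather than the base64 text; the default 8192-byte chunk stays within range under either reading. Padding a short final chunk with silence is this sketch's own choice, not documented behavior, and all function names are hypothetical.

```python
MIN_CHUNK_BYTES = 1 * 1024   # documented 1 KB minimum (assumes 1 KB = 1024 bytes)
MAX_CHUNK_BYTES = 32 * 1024  # documented 32 KB maximum
MAX_SESSION_SECONDS = 300    # documented maximum session lifetime


def bytes_per_second(sample_rate=16000, bit_rate=16, num_channels=1):
    """Raw PCM data rate for the given connection parameters."""
    return sample_rate * (bit_rate // 8) * num_channels


def max_session_bytes(sample_rate=16000, bit_rate=16, num_channels=1):
    """Raw bytes corresponding to MAX_SESSION_SECONDS of audio."""
    return bytes_per_second(sample_rate, bit_rate, num_channels) * MAX_SESSION_SECONDS


def split_pcm(pcm, chunk_bytes=8192, frame_bytes=2):
    """Split raw PCM into chunks within the documented size limits."""
    if not MIN_CHUNK_BYTES <= chunk_bytes <= MAX_CHUNK_BYTES:
        raise ValueError("chunk_bytes is outside the documented limits")
    if chunk_bytes % frame_bytes or len(pcm) % frame_bytes:
        raise ValueError("sizes must be a whole number of PCM frames")
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
    # Pad a short final chunk with zero samples (silence) up to the minimum.
    if chunks and len(chunks[-1]) < MIN_CHUNK_BYTES:
        chunks[-1] = chunks[-1] + bytes(MIN_CHUNK_BYTES - len(chunks[-1]))
    return chunks
```

At the default settings, each 8192-byte chunk carries 0.256 seconds of audio, which also keeps the gap between chunks far below the 60-second idle limit when streaming live.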

Typical flow

  1. Open a WebSocket connection with the query parameters and the Authorization header.
  2. Receive SESSION_CREATED.
  3. Send INPUT_AUDIO_CHUNK messages repeatedly.
  4. Receive AUDIO_CHUCK_ACK and PARTIAL_TRANSCRIPT messages.
  5. Send COMMIT when done streaming audio.
  6. Receive COMMITTED_TRANSCRIPT; the server then closes the connection.
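The flow above can be sketched as a transport-agnostic function. It assumes an already-open connection exposed as two hypothetical callables, send(text) and recv() returning text, which you would wire to the WebSocket client library of your choice. For simplicity it sends all audio before reading replies, which suits a pre-recorded buffer; a live client would read messages concurrently while it sends. Any message it does not expect is treated as a failure.

```python
import base64
import json


def run_stream(send, recv, chunks):
    """Drive the typical flow; return (committed payload, partial transcripts)."""
    # Step 2: the first server message should be SESSION_CREATED.
    first = json.loads(recv())
    if first.get("message_type") != "SESSION_CREATED":
        raise RuntimeError(f"stream not started: {first}")

    # Step 3: send every PCM16 chunk, numbering them with ack_id from 1.
    for ack_id, chunk in enumerate(chunks, start=1):
        send(json.dumps({
            "message_type": "INPUT_AUDIO_CHUNK",
            "audio_base_64": base64.b64encode(chunk).decode("ascii"),
            "ack_id": ack_id,
        }))

    # Step 5: signal that no more audio is coming.
    send(json.dumps({"message_type": "COMMIT"}))

    # Steps 4 and 6: collect partials until the final transcript arrives.
    partials = []
    while True:
        msg = json.loads(recv())
        kind = msg.get("message_type")
        if kind == "PARTIAL_TRANSCRIPT":
            partials.append(msg["transcript"])
        elif kind == "COMMITTED_TRANSCRIPT":
            return msg, partials
        elif kind in ("AUDIO_CHUCK_ACK", "PING"):
            continue  # acknowledgements and keep-alives need no reply here
        else:
            raise RuntimeError(f"stream failed: {msg}")
```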