Streaming

The Streaming STT API uses WebSockets for real-time transcription. You connect once, stream audio chunks, receive partial transcripts as they become available, and send a commit message to receive the final transcript.

This endpoint requires an Integrator Account.

Runtime limits

Max session lifetime: 300 seconds
Max idle audio gap: 60 seconds
Min chunk size: 1 KB
Max chunk size: 32 KB
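The chunk-size limits above can be checked on the client before anything is sent. The following is a minimal sketch, not an official helper: the name `split_pcm` and the 4096-byte default are hypothetical, and it assumes that the limits apply to the raw audio bytes (before Base64 encoding) and that 1 KB means 1024 bytes, neither of which this page states.

```python
MIN_CHUNK_BYTES = 1 * 1024   # documented minimum: 1 KB (assumed to mean 1024 bytes)
MAX_CHUNK_BYTES = 32 * 1024  # documented maximum: 32 KB


def split_pcm(pcm: bytes, chunk_bytes: int = 4096) -> list[bytes]:
    """Split raw PCM16 audio into chunks that respect the documented limits.

    A trailing remainder shorter than the minimum is merged into the previous
    chunk when the merged chunk still fits under the maximum.
    """
    if not MIN_CHUNK_BYTES <= chunk_bytes <= MAX_CHUNK_BYTES:
        raise ValueError("chunk_bytes must be between 1 KB and 32 KB")
    if chunk_bytes % 2:
        raise ValueError("chunk_bytes must be even so 16-bit samples stay whole")
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
    if len(chunks) > 1 and len(chunks[-1]) < MIN_CHUNK_BYTES:
        if len(chunks[-2]) + len(chunks[-1]) <= MAX_CHUNK_BYTES:
            tail = chunks.pop()
            chunks[-1] += tail
    return chunks
```

If the whole recording is shorter than 1 KB, or the tail cannot be merged, the short chunk is returned unchanged and the caller has to decide how to handle it (for example by padding with silence).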

Endpoint

wss://infer.voice.intron.io/stt/v1/stream

Authentication

All connections require a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Connection query parameters

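Set these values as query parameters on the connection URL. The server reads them when the WebSocket connection is opened and falls back to the defaults below for any parameter that is omitted.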

Parameter	Type	Description	Required	Default
`sample_rate`	Integer	Input audio sample rate in Hz.	No	`16000`
`bit_rate`	Integer	Input PCM bit depth.	No	`16`
`num_channels`	Integer	Number of input channels.	No	`1`
`use_language_asr_input`	String	Input language code (see the supported languages).	No	`en`

Supported Languages

Comprehensive list of languages supported

WebSocket handshake examples

wscat -c 'wss://infer.voice.intron.io/stt/v1/stream?sample_rate=16000&bit_rate=16&num_channels=1&use_language_asr_input=en' \
	-H 'Authorization: Bearer YOUR_API_KEY'
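The same handshake can be prepared from code. This sketch only builds the URL and the header using the Python standard library; the function names `build_stream_url` and `auth_headers` are hypothetical, and how the header is passed to the socket depends on the WebSocket client library in use.

```python
from urllib.parse import urlencode

STREAM_URL = "wss://infer.voice.intron.io/stt/v1/stream"


def build_stream_url(sample_rate=16000, bit_rate=16, num_channels=1, language="en"):
    """Return the stream URL with the documented query parameters attached."""
    query = urlencode({
        "sample_rate": sample_rate,
        "bit_rate": bit_rate,
        "num_channels": num_channels,
        "use_language_asr_input": language,
    })
    return f"{STREAM_URL}?{query}"


def auth_headers(api_key):
    """Return the Authorization header required on the handshake request."""
    return {"Authorization": f"Bearer {api_key}"}
```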

Input message types

`INPUT_AUDIO_CHUNK`

Field	Type	Required	Description
`message_type`	String	Yes	Must be `INPUT_AUDIO_CHUNK`
`audio_base_64`	String	Yes	Base64 encoded PCM16 little-endian audio bytes
`ack_id`	Integer	No	Client chunk sequence ID echoed by server ACK

JSON

{
	"message_type": "INPUT_AUDIO_CHUNK",
	"audio_base_64": "<base64_pcm16_chunk>",
	"ack_id": 1
}
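As an illustration, an `INPUT_AUDIO_CHUNK` message can be assembled like this. The helper name `audio_chunk_message` is hypothetical. The odd-length check is an assumption based on the `Invalid PCM-16 payload length` error listed further down; the page does not say exactly what triggers that error.

```python
import base64
import json


def audio_chunk_message(pcm16_le: bytes, ack_id=None) -> str:
    """Serialize raw PCM16 little-endian bytes as an INPUT_AUDIO_CHUNK message."""
    if len(pcm16_le) % 2:
        # 16-bit samples occupy two bytes each, so an odd length is malformed.
        raise ValueError("PCM16 payload must contain whole 2-byte samples")
    message = {
        "message_type": "INPUT_AUDIO_CHUNK",
        "audio_base_64": base64.b64encode(pcm16_le).decode("ascii"),
    }
    if ack_id is not None:
        # ack_id is optional; include it only when the caller tracks sequence IDs.
        message["ack_id"] = ack_id
    return json.dumps(message)
```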

`COMMIT`

Field	Type	Required	Description
`message_type`	String	Yes	Must be `COMMIT`

JSON

{
	"message_type": "COMMIT"
}

Response message types

Output payloads are based on websocket response classes (WebsocketResponse... and stream response classes).

Session and transcript messages

SESSION_CREATED

Sent immediately after successful authentication, capacity checks, and quota checks. It returns the active session_id, current credit balance, and the effective stream configuration.

JSON

{
	"message_type": "SESSION_CREATED",
	"session_id": "12a9760f-b165-4404-91d0-a65d4cdt78fs",
	"credit_balance": 120.0,
	"configs": {
		"sample_rate": 16000,
		"bit_rate": 16,
		"num_channels": 1,
		"use_prompt_id": null,
		"use_language_asr_input": "en",
		"use_language_asr_output": "en"
	}
}

AUDIO_CHUCK_ACK

Sent after the server accepts and stores an audio chunk. The server echoes the client ack_id when one was provided; otherwise it uses the running chunk count.

JSON

{
	"message_type": "AUDIO_CHUCK_ACK",
	"chunck_id": 1,
	"total_chunks": 1
}

PARTIAL_TRANSCRIPT

Sent whenever a new partial transcript is available and differs from the last partial transcript already sent on the connection.

JSON

{
	"message_type": "PARTIAL_TRANSCRIPT",
	"transcript": "patient reports intermittent chest pain"
}

COMMITTED_TRANSCRIPT

Sent after the client sends COMMIT and the backend finishes processing the full stream. This is the final transcript payload for the session.

JSON

{
	"message_type": "COMMITTED_TRANSCRIPT",
	"transcript_id": "e1c4a90b-e319-4de6-9f22-0e0cf5e8b7a2",
	"transcript_text": "patient reports intermittent chest pain for two days",
	"audio_len": 10
}
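Note that the text field is named `transcript` in `PARTIAL_TRANSCRIPT` but `transcript_text` in `COMMITTED_TRANSCRIPT`. A client-side handler might track both as sketched below. The class `TranscriptState` is hypothetical, and it assumes each partial transcript replaces the previous one rather than extending it, which the examples here suggest but do not state.

```python
import json


class TranscriptState:
    """Tracks the session ID, the latest partial text, and the final transcript."""

    def __init__(self):
        self.session_id = None
        self.partial = ""
        self.final = None

    def handle(self, raw: str) -> None:
        """Update the state from one JSON text frame received from the server."""
        msg = json.loads(raw)
        kind = msg.get("message_type")
        if kind == "SESSION_CREATED":
            self.session_id = msg["session_id"]
        elif kind == "PARTIAL_TRANSCRIPT":
            # Assumption: a new partial supersedes the previous one.
            self.partial = msg["transcript"]
        elif kind == "COMMITTED_TRANSCRIPT":
            self.final = msg["transcript_text"]
```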

Error and terminal messages

ERROR

Sent for unexpected server-side failures that do not map to a more specific websocket error message.

JSON

{
	"message_type": "ERROR",
	"message": "connection error"
}

INPUT_ERROR

Sent when the request payload is invalid, for example malformed JSON, an invalid message structure, or invalid request data. The examples below show the messages the server may return.

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "Invalid data structure for commit audio request"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "Commit request already received, cannot accept more audio chunks"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "Invalid data structure for audio chunk"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "Invalid base64 audio payload"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "Invalid PCM-16 payload length"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "no data received"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "invalid input message_type received"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "invalid input data"
}

JSON

{
	"message_type": "INPUT_ERROR",
	"message": "Error processing datas"
}

AUTHENTICATION_ERROR

Sent when the Authorization header is missing, invalid, or the access key fails authentication. The server closes the connection after sending this message.

JSON

{
	"message_type": "AUTHENTICATION_ERROR",
	"message": "permission denied,access-key error"
}

RESOURCE_EXHAUSTED

Sent when the service does not currently have enough available capacity to accept the stream. The status field indicates the capacity state returned by the backend.

JSON

{
	"message_type": "RESOURCE_EXHAUSTED",
	"status": "CAPACITY_NOT_AVAILABLE"
}

QUOTA_EXCEEDED

Sent when the authenticated user or entity has insufficient credits or has exceeded an allowed usage limit. The server closes the connection after sending this message.

JSON

{
	"message_type": "QUOTA_EXCEEDED",
	"credits_balance": "0",
	"message": "insufficient credits balance"
}

CHUNCK_SIZE_TOO_SMALL

Sent when the received audio chunk is smaller than the minimum allowed chunk size.

JSON

{
	"message_type": "CHUNCK_SIZE_TOO_SMALL",
	"chunk_size_min": "1"
}

CHUNK_SIZE_TOO_LARGE

Sent when the received audio chunk is larger than the maximum allowed chunk size.

JSON

{
	"message_type": "CHUNK_SIZE_TOO_LARGE",
	"chunk_size_max": "32"
}

INSUFFICIENT_AUDIO_ACTIVITY

Sent when no audio chunk has been received for longer than the allowed idle timeout while the session is still active.

JSON

{
	"message_type": "INSUFFICIENT_AUDIO_ACTIVITY",
	"message": "insufficient audio activity detected,exceed max idle time of 60 seconds"
}

SESSION_TIME_LIMIT_EXCEEDED

Sent when the websocket session exceeds the maximum allowed session lifetime.

JSON

{
	"message_type": "SESSION_TIME_LIMIT_EXCEEDED",
	"session_time_limit": "300"
}

CHUNK_ID_MISMATCH_WITH_TOTAL

Sent when the sequential integer ack_id provided in an INPUT_AUDIO_CHUNK request does not match the expected value, which is derived from the total number of chunks received so far.

JSON

{
	"message_type": "CHUNK_ID_MISMATCH_WITH_TOTAL",
	"chunk_id_input": 20,
	"chunk_id_expected": 3,
	"chunk_id_total": 2
}
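One way to handle the messages in this section is to convert them into exceptions on the client. The sketch below is an assumption-laden illustration: `StreamError` and `raise_for_error` are hypothetical names, and only `AUTHENTICATION_ERROR` and `QUOTA_EXCEEDED` are flagged as closing the connection because those are the only two this page describes that way. Other types may also end the session.

```python
import json

# Message types documented under "Error and terminal messages",
# spelled exactly as they appear in the payloads.
ERROR_TYPES = {
    "ERROR", "INPUT_ERROR", "AUTHENTICATION_ERROR", "RESOURCE_EXHAUSTED",
    "QUOTA_EXCEEDED", "CHUNCK_SIZE_TOO_SMALL", "CHUNK_SIZE_TOO_LARGE",
    "INSUFFICIENT_AUDIO_ACTIVITY", "SESSION_TIME_LIMIT_EXCEEDED",
    "CHUNK_ID_MISMATCH_WITH_TOTAL",
}
# Only these two are documented as closing the connection.
CLOSES_CONNECTION = {"AUTHENTICATION_ERROR", "QUOTA_EXCEEDED"}


class StreamError(Exception):
    """Raised when the server sends one of the documented error messages."""

    def __init__(self, message_type, payload, closes_connection):
        super().__init__(f"{message_type}: {payload}")
        self.message_type = message_type
        self.payload = payload
        self.closes_connection = closes_connection


def raise_for_error(raw: str) -> dict:
    """Return the decoded message, or raise StreamError for an error type."""
    msg = json.loads(raw)
    kind = msg.get("message_type")
    if kind in ERROR_TYPES:
        raise StreamError(kind, msg, kind in CLOSES_CONNECTION)
    return msg
```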

Typical flow

1. Open the WebSocket connection with the query parameters and the Authorization header.
2. Receive SESSION_CREATED.
3. Send INPUT_AUDIO_CHUNK messages repeatedly.
4. Receive AUDIO_CHUCK_ACK and PARTIAL_TRANSCRIPT messages.
5. Send COMMIT when done streaming audio.
6. Receive COMMITTED_TRANSCRIPT, after which the connection closes.
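
The flow above can be sketched independently of any particular WebSocket library. In this illustration, `transcribe`, `send`, and `recv` are hypothetical: `send(text)` and `recv()` stand for whatever calls the chosen client uses to write and read text frames on an already-open, authenticated connection. For brevity it sends every chunk before reading replies; a real-time client would normally read ACKs and partial transcripts concurrently while audio is still being captured.

```python
import base64
import json


def transcribe(send, recv, chunks):
    """Run one streaming session and return the committed transcript text.

    `send(text)` and `recv() -> text` are hypothetical callables wrapping the
    WebSocket client in use; `chunks` yields raw PCM16 little-endian bytes.
    """
    first = json.loads(recv())
    if first.get("message_type") != "SESSION_CREATED":
        raise RuntimeError(f"session not created: {first}")

    # ack_id starts at 1 and increases by one per chunk, matching the
    # sequential numbering implied by CHUNK_ID_MISMATCH_WITH_TOTAL.
    for ack_id, chunk in enumerate(chunks, start=1):
        send(json.dumps({
            "message_type": "INPUT_AUDIO_CHUNK",
            "audio_base_64": base64.b64encode(chunk).decode("ascii"),
            "ack_id": ack_id,
        }))
    send(json.dumps({"message_type": "COMMIT"}))

    # Drain ACKs and partial transcripts until the final transcript arrives.
    while True:
        msg = json.loads(recv())
        kind = msg.get("message_type")
        if kind == "COMMITTED_TRANSCRIPT":
            return msg["transcript_text"]
        if kind not in ("AUDIO_CHUCK_ACK", "PARTIAL_TRANSCRIPT"):
            raise RuntimeError(f"stream failed: {msg}")
```

Injecting `send` and `recv` keeps the protocol logic testable with an in-memory fake before it is wired to a live connection.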
