VoiceBase provides APIs for speech recognition and speech analytics. Our customers use the APIs to transcribe recordings with high accuracy, discover the keywords and topics discussed, and predict business outcomes.
The core workflow of the API is to generate transcriptions and analytics from voice recordings. This workflow is asynchronous. Typical usage is to:
- Upload a voice recording, starting the transcription and analysis process
- Wait for completion, using periodic polling for status or callbacks
- Process or retrieve results, including the transcript, keywords, topics and predictions
To achieve scalability, this workflow runs for multiple recordings in parallel.
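As a rough illustration of how a client might run the workflow for several recordings at once, the sketch below fans work out over a thread pool. It assumes a caller-supplied function, here called `process_one` (a hypothetical name, not part of the API), that performs the upload, wait, and retrieve steps for one recording.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_all(
    recordings: Iterable[str],
    process_one: Callable[[str], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Run the upload / wait / retrieve workflow for many recordings at once.

    `process_one` is a hypothetical, caller-supplied function that handles a
    single recording and returns its results as a dict.
    """
    recordings = list(recordings)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with recordings.
        results = pool.map(process_one, recordings)
        return dict(zip(recordings, results))
```

Threads suit this case because each recording spends most of its time waiting on the network rather than using the CPU.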
REST Call Flow
A typical pattern of REST API calls to accomplish the workflow is to:
The body of the POST request is MIME multipart, with three parts (the first is supplied as either media or mediaUrl):
- media: the voice recording as an attachment, or
- mediaUrl: a URL from which the API can retrieve the voice recording (used in place of media)
- configuration: (optional) a JSON object with customized processing instructions
- metadata: (optional) a JSON object with metadata
The API will return a unique identifier for the new object, called a mediaId.
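The multipart body described above could be assembled along the following lines. This is a minimal sketch using only the Python standard library. The part names come from the list above, but the `multipart/form-data` subtype, the per-part content types, and the placeholder filename are assumptions rather than documented requirements.

```python
import json
import uuid
from typing import Optional, Tuple


def build_multipart_body(
    media_url: Optional[str] = None,
    media: Optional[bytes] = None,
    configuration: Optional[dict] = None,
    metadata: Optional[dict] = None,
) -> Tuple[str, bytes]:
    """Return (Content-Type header value, encoded body) for an upload request."""
    # The recording is supplied either inline or by URL, never both.
    if (media is None) == (media_url is None):
        raise ValueError("provide exactly one of media or media_url")

    boundary = uuid.uuid4().hex
    chunks = []

    def add_part(name, payload, content_type, filename=None):
        disposition = f'form-data; name="{name}"'
        if filename:
            disposition += f'; filename="{filename}"'
        header = (
            f"--{boundary}\r\n"
            f"Content-Disposition: {disposition}\r\n"
            f"Content-Type: {content_type}\r\n\r\n"
        )
        chunks.append(header.encode("utf-8") + payload + b"\r\n")

    if media is not None:
        # Placeholder filename and content type: both are assumptions.
        add_part("media", media, "application/octet-stream", filename="recording")
    else:
        add_part("mediaUrl", media_url.encode("utf-8"), "text/plain")
    if configuration is not None:
        add_part("configuration", json.dumps(configuration).encode("utf-8"),
                 "application/json")
    if metadata is not None:
        add_part("metadata", json.dumps(metadata).encode("utf-8"),
                 "application/json")

    # The closing delimiter ends the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)
```

The returned content type would be sent as the request's `Content-Type` header, with the bytes as the request body. In practice an HTTP client library will usually build this encoding for you.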
A follow-up status call retrieves status and progress information. When processing is finished, the transcript and analytics can be retrieved.
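Periodic polling for status can be sketched as a small loop with a timeout. The example takes a caller-supplied `get_status` function so that it stays independent of any particular HTTP client. The status strings `"finished"` and `"failed"` are illustrative assumptions, not values confirmed by this document.

```python
import time
from typing import Callable, Iterable


def wait_for_completion(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    terminal: Iterable[str] = ("finished", "failed"),  # assumed status names
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or time runs out."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        # Pause between checks so the API is not queried continuously.
        sleep(poll_interval)
```

Here `get_status` would wrap whatever request returns the status for a given mediaId. Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.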
The API also supports Callbacks as an alternative to polling for status. This pattern is recommended for production integrations.
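On the receiving side, a callback endpoint might look something like the sketch below, built on Python's standard `http.server`. It assumes the callback arrives as an HTTP POST with a JSON body. The actual method, payload shape, and any authentication are not described here, so treat those details as placeholders. `make_callback_server` and `on_result` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_callback_server(
    on_result: Callable[[dict], None],
    host: str = "127.0.0.1",
    port: int = 0,  # 0 lets the OS pick a free port
) -> HTTPServer:
    """Build a small HTTP server that passes each JSON callback to on_result."""

    class CallbackHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            raw = self.rfile.read(length)
            try:
                # Assumption: the callback body is a JSON document.
                payload = json.loads(raw or b"{}")
            except json.JSONDecodeError:
                self.send_response(400)
                self.end_headers()
                return
            on_result(payload)
            self.send_response(200)
            self.end_headers()

        def log_message(self, format, *args):
            # Silence the default per-request logging to stderr.
            pass

    return HTTPServer((host, port), CallbackHandler)
```

Calling `serve_forever()` on the returned server starts handling requests. A production endpoint would normally sit behind HTTPS and hand the payload to a queue rather than process it inside the request handler.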