API Overview

VoiceBase provides APIs for speech recognition and speech analytics, offering customers a wide variety of insights about the data in their audio files.


The VoiceBase /v3 REST API is secured using OAuth Bearer tokens. Bearer tokens are issued and managed through your VoiceBase account.

For details on creating your first token, see the How to Get Your Bearer Token section of the Hello, World How-To Guide.
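
As an illustration of how a Bearer token is typically attached to a request, the following minimal Python sketch builds (but does not send) an authorized request using only the standard library. The helper name authorized_request and the example URL are assumptions made for illustration, not part of the VoiceBase API; the header format is the standard OAuth Bearer scheme.

```python
import urllib.request


def authorized_request(url, token, method="GET"):
    """Build a request carrying an OAuth Bearer token.

    Hypothetical helper for illustration; nothing is sent over the
    network until the request is passed to urllib.request.urlopen().
    """
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


# Example: the URL below is a placeholder, not a documented endpoint.
req = authorized_request("https://example.com/v3/media", "YOUR_TOKEN")
print(req.get_header("Authorization"))
```

Keeping the token in one helper makes it easier to rotate tokens without touching every call site.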

Core Workflow

The core workflow of the API is to generate transcriptions and analytics from voice recordings. This workflow is asynchronous, and typical usage is to:

  1. Upload a voice recording, starting the transcription and analysis process
  2. Wait for completion, using periodic polling for status or callbacks
  3. Process or retrieve results, including the transcript, keywords, topics and predictions

To achieve scalability, this workflow runs for multiple recordings in parallel.
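
One way a client might take advantage of this is to drive the three steps above for several recordings concurrently. The sketch below is a client-side pattern only, assuming a hypothetical process_one callable that uploads a recording, waits for completion, and returns the results; neither function is part of the VoiceBase API.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(recordings, process_one, max_workers=4):
    """Run process_one for every recording using a thread pool.

    process_one is a hypothetical callable supplied by the caller; it is
    expected to upload one recording, wait for completion, and return
    the results. Results come back in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, recordings))


# Example with a stand-in for the real upload/wait/retrieve steps.
results = process_all(["a.mp3", "b.mp3"], lambda name: f"processed {name}")
print(results)
```

Threads suit this workload because each step spends most of its time waiting on the network rather than computing.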

REST Call Flow

A typical sequence of REST API calls to accomplish the workflow is:

POST /media

The body of the POST request is MIME multipart, with up to three parts:

One of:

  • media: the voice recording, sent as an attachment, or
  • mediaUrl: URL where the API can retrieve the voice recording

and optionally:

  • configuration: a JSON object with customized processing instructions
  • metadata: a JSON object with metadata

The API will return a unique identifier for the new object, called a mediaId.
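
The request described above can be sketched in Python with only the standard library. This example builds (but does not send) a POST /media request using the mediaUrl variant. It assumes that the multipart body is multipart/form-data and that BASE_URL matches your account's endpoint; the helper names are hypothetical. Parsing the response for the mediaId is omitted because the response format is not described here.

```python
import json
import urllib.request
import uuid

# Assumption: replace with the base URL for your VoiceBase account.
BASE_URL = "https://apis.voicebase.com/v3"


def encode_multipart(fields):
    """Encode text fields as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_upload_request(token, media_url, configuration=None, metadata=None):
    """Build a POST /media request with mediaUrl and the optional JSON parts."""
    fields = {"mediaUrl": media_url}
    if configuration is not None:
        fields["configuration"] = json.dumps(configuration)
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        f"{BASE_URL}/media",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )


req = build_upload_request("YOUR_TOKEN", "https://example.com/recording.mp3")
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen(). Uploading the recording itself through the media part would additionally need a file part with a filename and a binary body, which this sketch leaves out.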

GET /media/{mediaId}/progress

This call retrieves status and progress information. When processing is finished, the transcript and analytics can be retrieved with:

GET /media/{mediaId}

The API supports Callbacks instead of polling for status, and this pattern is recommended for production integrations.
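
Where polling is used instead of callbacks, the loop can be sketched as follows. The structure of the progress response is not described in this overview, so the sketch takes two caller-supplied callables: get_progress, which would perform GET /media/{mediaId}/progress, and is_finished, which inspects the response. Both names, and the "status" values in the example, are hypothetical.

```python
import time


def poll_until_finished(get_progress, is_finished, interval=5.0, max_wait=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_progress() repeatedly until is_finished(progress) is true.

    Raises TimeoutError if max_wait seconds pass without completion.
    sleep and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + max_wait
    while True:
        progress = get_progress()
        if is_finished(progress):
            return progress
        if clock() >= deadline:
            raise TimeoutError("processing did not finish within max_wait")
        sleep(interval)


# Example with canned responses standing in for real API calls.
# The "status" field and its values are assumptions for illustration.
canned = iter([{"status": "running"}, {"status": "finished"}])
final = poll_until_finished(
    get_progress=lambda: next(canned),
    is_finished=lambda p: p["status"] == "finished",
    interval=0,
)
print(final)
```

Once the loop returns, the transcript and analytics can be fetched with GET /media/{mediaId}. A fixed interval keeps the sketch short; a production client might prefer increasing delays between polls.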

Getting Started

The Hello, World How-To Guide provides a practical introduction to getting started with the VoiceBase V3 REST API.