AI APIs
The most powerful models at your fingertips.
These endpoints require an API key, which you can obtain from our website. Pass the key in the api-key request header, for example: api-key: YOUR_API_KEY.
Whisper (https://github.com/openai/whisper) is a general-purpose speech recognition model. This endpoint uses Stable-ts (https://github.com/jianfch/stable-ts) and Faster-Whisper (https://github.com/SYSTRAN/faster-whisper) to improve the speed and timestamp accuracy of the original Whisper. It currently runs stable-ts version 2.17.3.
This endpoint creates an asynchronous job that aligns pre-transcribed text (without timestamps) to the provided audio file. Jobs are placed in a queue and processed in the order they are received. You can check a job's status with the /stablets/status endpoint.
audio_data must be passed as a JSON object in the format below, where file_ext is either "mp3" or "wav":
{
"audio_data": "<audio file encoded as base64>",
"file_ext": "mp3"
}
If your audio file is in another format or is a video, consider converting it to one of the supported formats (mp3 or wav) using a tool such as ffmpeg.
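As an illustration of the format above, the shell sketch below base64-encodes a local file and prints the audio_data object. The function name build_audio_data is hypothetical, and it assumes the file extension reflects the real audio format, since the extension is copied into file_ext.

```shell
#!/bin/sh
# Hypothetical helper: print the audio_data object for a local mp3 or wav file.
# Assumption: the file extension matches the real format, because it becomes file_ext.
build_audio_data() {
  file=$1
  # Take the text after the last dot and lowercase it (clip.MP3 -> mp3).
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav) ;;
    *) echo "unsupported extension '$ext': convert to mp3 or wav first" >&2
       return 1 ;;
  esac
  # Encode the raw bytes as base64 and strip the line wraps GNU base64 inserts.
  b64=$(base64 < "$file" | tr -d '\n')
  printf '{"audio_data": "%s", "file_ext": "%s"}\n' "$b64" "$ext"
}
```

For a video or an unsupported format, a conversion such as ffmpeg -i input.mp4 -vn output.mp3 (-vn drops the video stream) followed by build_audio_data output.mp3 is one way to produce a supported file.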
Exactly one of the following two body parameters is required: audio_data (this parameter) or audio_url.
Example:
{
"audio_data": "SUQzAwAAAAAAU1RTU0UAAAAPAAADTGF2ZjU2LjIzLjEwMAAAAAAAAAAAAAAA//tAxAAAAAAA",
"file_ext": "mp3"
}

audio_url: A public URL to a hosted audio file. You can use services such as Azure Storage to obtain a public link to your audio file and pass it here. Only mp3 or wav files are supported. If your audio file is in another format or is a video, consider converting it to one of the supported formats (mp3 or wav) using a tool such as ffmpeg.
Exactly one of the following two body parameters is required: audio_data or audio_url (this parameter).
Example: https://longstoragevoila.blob.core.windows.net/long/zaiye.mp3

model: The model to use for transcription. Example: large-v2
The language of the audio file. Auto-detected if not provided. Example: en
Punctuation to prepend to the transcription.
Punctuation to append to the transcription.
Whether to regroup the transcription segments. Example: true
Whether to suppress silence in the transcription. Example: true
Whether to suppress word timestamps in the transcription. Example: true
Whether to use word positions in the transcription. Example: true
The quality levels for the transcription. Example: 2
The kernel size for the transcription. Example: 3
The denoiser to use for the transcription. Example: default_denoiser
Options for the denoiser.
Whether to use voice activity detection. Example: true
Threshold for voice activity detection. Example: 0.5
Minimum duration of words in the transcription. Example: 0.2
Minimum duration of silence in the transcription. Example: 0.5
Whether to treat nonspeech as an error. Example: true
Whether to use only voice frequencies. Example: true
text: The text to align with the audio.
Whether to remove instant words from the alignment. Example: true
Step size for token alignment. Example: 1
Whether to use the original split for alignment. Example: true
Maximum duration of words in the alignment. Example: 1.5
Whether to skip nonspeech segments during alignment. Example: true
Whether to enable fast mode for alignment. Example: true
Whether to stream the alignment results. Example: true
Threshold for considering the alignment a failure. Example: 0.2
Whether to presplit the text for alignment. Example: true
Padding for gaps in the alignment. Example: 0.1

Responses:
Successful response
Bad Request
Example request:
curl --request POST \
--url https://ytdlp-voilatech-apim.azure-api.net/v1/stablets/align \
--header 'Content-Type: application/json' \
--header 'api-key: YOUR_API_KEY' \
--data '{
"text": "Hello, how are you?",
"model": "small",
"audio_url": "https://longstoragevoila.blob.core.windows.net/long/zaiye.mp3"
}'
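The request above passes audio_url. To send a local file instead, the body carries audio_data (nested in the documented {"audio_data", "file_ext"} format), and exactly one of the two sources may be present. The shell sketch below builds either body variant and rejects calls that supply both or neither. The names align_body and json_escape are hypothetical, and json_escape only handles backslashes, double quotes, and newlines; a tool such as jq is safer for arbitrary text.

```shell
#!/bin/sh
# Hypothetical helpers that print a JSON body for POST /v1/stablets/align.
json_escape() {
  # Escape backslashes and double quotes, then join lines with a literal \n.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Usage: align_body TEXT MODEL AUDIO_URL AUDIO_FILE  (pass "" for the unused source)
align_body() {
  text=$1 model=$2 url=$3 file=$4
  # Exactly one of audio_url / audio_data must be supplied.
  if [ -n "$url" ] && [ -n "$file" ]; then
    echo "pass either an audio URL or a local file, not both" >&2; return 1
  fi
  if [ -z "$url" ] && [ -z "$file" ]; then
    echo "an audio URL or a local file is required" >&2; return 1
  fi
  if [ -n "$url" ]; then
    source_json="\"audio_url\": \"$(json_escape "$url")\""
  else
    # Assumption: the file extension (mp3 or wav) is the real format.
    ext=${file##*.}
    b64=$(base64 < "$file" | tr -d '\n')
    source_json="\"audio_data\": {\"audio_data\": \"$b64\", \"file_ext\": \"$ext\"}"
  fi
  printf '{"text": "%s", "model": "%s", %s}\n' "$(json_escape "$text")" "$model" "$source_json"
}
```

The output can be piped to the same curl command by replacing the inline --data '{...}' with --data @- so that curl reads the body from standard input.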
Example successful response:
{
"id": "66bf0b2a-28c4-43a9-895c-96ec04aa49d1-e1",
"status": "IN_QUEUE"
}
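Because the job runs asynchronously, the id in this response is what you later use with the /stablets/status endpoint. The shell sketch below extracts a string field from the response with sed; the name job_field is hypothetical, it assumes the flat JSON object shown above, and how the id is passed to /stablets/status is not described on this page.

```shell
#!/bin/sh
# Hypothetical helper: print a string field (e.g. id or status) from the align response.
# Assumes a flat JSON object like the example response; use jq for anything more complex.
job_field() {
  # $1 = field name; the response JSON is read from stdin and joined into one line.
  tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}
```

For example, piping the saved response through job_field id prints the job id, and job_field status prints IN_QUEUE while the job waits in the queue.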