nova-2
Model Overview
Nova-2 builds on the advancements of Nova-1 with speech-specific optimizations to its Transformer architecture, refined data curation techniques, and a multi-stage training approach. These improvements result in a lower word error rate (WER) and better entity recognition (including proper nouns and alphanumeric sequences), as well as enhanced punctuation and capitalization.
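For reference, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard calculation, not the evaluation code used for Nova-2. The function name `word_error_rate` is an illustrative assumption, and the lowercasing and whitespace tokenization are simplifications.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a transcript that drops one word from a six-word reference has a WER of 1/6.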
Nova-2 offers the following model options:
automotive: Optimized for audio with automotive-oriented vocabulary.
conversationalai: Optimized for use cases in which a human is talking to an automated bot, such as IVR, a voice assistant, or an automated kiosk.
drivethru: Optimized for audio sources from drive-thrus.
finance: Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance-oriented.
general: Optimized for everyday audio processing.
medical: Optimized for audio with medical-oriented vocabulary.
meeting: Optimized for conference room settings, which include multiple speakers with a single microphone.
phonecall: Optimized for low-bandwidth audio phone calls.
video: Optimized for audio sourced from videos.
voicemail: Optimized for low-bandwidth audio clips with a single speaker. Derived from the phonecall model.
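The code examples on this page select the meeting option with the model ID `#g1_nova-2-meeting`. Assuming the other options follow the same `#g1_nova-2-<option>` pattern (an inference from that single ID, not something this page confirms), a small helper could validate an option name before building the ID. The names `NOVA2_OPTIONS` and `nova2_model_id` are hypothetical.

```python
# Option names as listed above.
NOVA2_OPTIONS = {
    "automotive", "conversationalai", "drivethru", "finance", "general",
    "medical", "meeting", "phonecall", "video", "voicemail",
}


def nova2_model_id(option: str) -> str:
    """Build a model ID, ASSUMING the '#g1_nova-2-<option>' naming pattern."""
    if option not in NOVA2_OPTIONS:
        raise ValueError(f"Unknown Nova-2 option: {option!r}")
    return f"#g1_nova-2-{option}"
```

Rejecting unknown names locally surfaces a typo before a request is sent.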
Set up your API Key
If you don’t have an API key for the Apilaplas API yet, follow our Quickstart guide to get one.
Submit a request
API Schema
Creating and sending a speech-to-text conversion task to the server
Requesting the result of the task from the server using the generation_id
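These two calls form a submit-then-poll workflow: the first returns a generation_id, and the second is repeated until the task is no longer in progress. A minimal sketch of the polling half follows, with the HTTP call abstracted behind a `fetch` callable so the loop can be exercised without network access. The helper name `poll_until_done` and its parameters are illustrative assumptions; the `waiting` and `active` status values are the ones checked in the examples on this page.

```python
import time
from typing import Callable, Optional

# In-progress status values, as used in the examples on this page.
PENDING_STATUSES = {"waiting", "active"}


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until its status is not pending; return None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()
        if data.get("status") not in PENDING_STATUSES:
            return data
        sleep(interval)
    return None
```

With a `get_stt` function like the one in the examples on this page, this could be called as `poll_until_done(lambda: get_stt(gen_id))`.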
Quick Code Examples
Let's use the #g1_nova-2-meeting model to transcribe the following audio fragment:
Example #1: Processing a Speech Audio File via URL
import time
import requests

base_url = "https://api.apilaplas.com/v1"
# Insert your LAPLAS API Key instead of <YOUR_LAPLASAPI_KEY>:
api_key = "<YOUR_LAPLASAPI_KEY>"


# Creating and sending a speech-to-text conversion task to the server
def create_stt():
    url = f"{base_url}/stt/create"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    data = {
        "model": "#g1_nova-2-meeting",
        "url": "https://audio-samples.github.io/samples/mp3/blizzard_primed/sample-0.mp3",
    }
    response = requests.post(url, json=data, headers=headers)
    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
        # Return None explicitly so the caller can detect the failure
        return None
    response_data = response.json()
    print(response_data)
    return response_data


# Requesting the result of the task from the server using the generation_id
def get_stt(gen_id):
    url = f"{base_url}/stt/{gen_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    response = requests.get(url, headers=headers)
    return response.json()


# First, start the generation, then repeatedly request the result from the server every 10 seconds.
def main():
    stt_response = create_stt()
    if stt_response is None:
        return None
    gen_id = stt_response.get("generation_id")
    if gen_id:
        start_time = time.time()
        timeout = 600  # seconds
        while time.time() - start_time < timeout:
            response_data = get_stt(gen_id)
            if response_data is None:
                print("Error: No response from API")
                break
            status = response_data.get("status")
            if status == "waiting" or status == "active":
                print("Still waiting... Checking again in 10 seconds.")
                time.sleep(10)
            else:
                print("Processing complete:\n", response_data["result"]["results"]["channels"][0]["alternatives"][0]["transcript"])
                return response_data
        print("Timeout reached. Stopping.")
        return None


if __name__ == "__main__":
    main()
Example #2: Processing a Speech Audio File via File Path
import time
import requests

base_url = "https://api.apilaplas.com/v1"
# Insert your LAPLAS API Key instead of <YOUR_LAPLASAPI_KEY>:
api_key = "<YOUR_LAPLASAPI_KEY>"


# Creating and sending a speech-to-text conversion task to the server
def create_stt():
    url = f"{base_url}/stt/create"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    data = {
        "model": "#g1_nova-2-meeting",
    }
    # Upload a local audio file as multipart form data
    with open("stt-sample.mp3", "rb") as file:
        files = {"audio": ("sample.mp3", file, "audio/mpeg")}
        response = requests.post(url, data=data, headers=headers, files=files)
    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
        # Return None explicitly so the caller can detect the failure
        return None
    response_data = response.json()
    print(response_data)
    return response_data


# Requesting the result of the task from the server using the generation_id
def get_stt(gen_id):
    url = f"{base_url}/stt/{gen_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    response = requests.get(url, headers=headers)
    return response.json()


# First, start the generation, then repeatedly request the result from the server every 10 seconds.
def main():
    stt_response = create_stt()
    if stt_response is None:
        return None
    gen_id = stt_response.get("generation_id")
    if gen_id:
        start_time = time.time()
        timeout = 600  # seconds
        while time.time() - start_time < timeout:
            response_data = get_stt(gen_id)
            if response_data is None:
                print("Error: No response from API")
                break
            status = response_data.get("status")
            if status == "waiting" or status == "active":
                print("Still waiting... Checking again in 10 seconds.")
                time.sleep(10)
            else:
                print("Processing complete:\n", response_data["result"]["results"]["channels"][0]["alternatives"][0]["transcript"])
                return response_data
        print("Timeout reached. Stopping.")
        return None


if __name__ == "__main__":
    main()
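Both examples read the transcript through a chain of direct lookups, which raises `KeyError` or `IndexError` if the task failed or the response has a different shape. A defensive variant might look like the following. The helper name `extract_transcript` is hypothetical, and the key path is the one used in the examples above rather than a documented schema.

```python
from typing import Optional


def extract_transcript(response_data: dict) -> Optional[str]:
    """Return the first channel's first alternative transcript, or None if absent."""
    try:
        # Key path assumed from the examples on this page.
        return response_data["result"]["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return None
```

Returning None lets the caller decide whether to retry, log the raw response, or stop.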