Resemble AI is an AI voice toolkit that uses GPT-3.5 alongside its own models, all accessible through an API.
Some of Resemble’s key tools include voice cloning, voice blending, and localization. The localization feature allows the creation of synthetic voices in multiple languages, extending the reach of content to a global audience.
Resemble’s voice blending feature allows users to combine human and synthetic voices for a seamless audio experience.
Resemble AI’s toolkit includes text-to-speech, speech-to-speech, neural audio editing, and voice dubbing capabilities for a wide range of applications. The voice dubbing is impressive.
The platform’s emotion feature adds an infinite range of emotions to speech without requiring new data, providing more nuanced and authentic communication.
It also supports real-time speech-to-speech transformations with granular control over inflection and intonation.
Resemble AI claims to have perfected its product with 200,000 AI voices producing more than two million minutes of audio per month. Resemble AI has found widespread use in various industries, including at the enterprise level.
The Resemblyzer Python package can be used for speaker verification, speaker diarization, deepfake detection, and more.
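To give a rough idea of what speaker verification involves, the sketch below compares two voice embeddings (fixed-length vectors summarizing how a speaker sounds) using cosine similarity, which is a common way to do it. This is not the package's own code: the function names, the toy vectors, and the 0.75 threshold are all assumptions made for illustration. In practice the embeddings would come from a speaker encoder model and the threshold would be tuned on real recordings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.75) -> bool:
    """Hypothetical decision rule: accept if similarity meets the threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real voice embeddings.
    enrolled = [0.9, 0.1, 0.4]
    attempt = [0.8, 0.2, 0.5]
    print(same_speaker(enrolled, attempt))
```

A higher threshold makes the check stricter (fewer false accepts, more false rejects), so the right value depends on the application.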
With a simple Unity plugin, game developers can create unique voices and speech assets, attach them to characters, and animate them.
Sample code is available on GitHub for an Alexa Skill project that shows how to dynamically integrate AI-generated text using GPT-3 with a custom voice powered by Resemble. This is a bit technical for standard users.
The driving force behind Resemble AI is its co-founders, Saqib Muhammad and Zohaib Ahmed.
Muhammad is based in California but studied in Canada. He has a background in business, having graduated from McGill and worked stints at capital management firms.
Ahmed is a software developer who previously worked at BlackBerry. He studied computer science at the University of Toronto.
Resemble AI offers a flexible pricing model that scales with the user’s needs.
The ‘Basic’ plan is pay-as-you-go and includes web-recorded custom voices, localization to Spanish (MX) and French, and access to over 50 marketplace voices. The price per second is $0.006.
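To put the per-second rate in perspective, here is a small sketch that estimates Basic plan charges for a given amount of generated audio. It uses only the $0.006 per second figure quoted above. The function name is made up for illustration, and it assumes billing is strictly linear with no minimums, rounding rules, or taxes, which the pricing description does not confirm.

```python
from decimal import Decimal

# Basic plan rate quoted in the text: $0.006 per second of audio.
BASIC_RATE_PER_SECOND = Decimal("0.006")


def basic_plan_cost(seconds: int) -> Decimal:
    """Estimated charge in USD for `seconds` of audio, assuming linear billing."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return BASIC_RATE_PER_SECOND * seconds


if __name__ == "__main__":
    for label, secs in [("1 minute", 60), ("1 hour", 3600)]:
        print(f"{label}: ${basic_plan_cost(secs):.2f}")
```

Under these assumptions, a minute of audio comes to $0.36 and an hour to $21.60.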
For more extensive needs, the ‘Pro’ plan includes additional features such as custom data upload, advanced emotion control, low latency APIs, and cross-lingual support in over 24 languages. Pricing is not publicly available.
Both plans include unlimited team users and projects, with more advanced features available in the Pro plan.