
Microsoft claims it has developed an AI speech generator that mimics real people's voices so convincingly that the company will not release it to the public. The tool is called VALL-E 2, and it can capture your speech patterns and tone from just a few seconds of recorded audio of your voice.
Excerpt from www.businessinsider.in
In a world where technological advancements are often heralded with great fanfare and widespread availability, Microsoft has taken an unusually cautious step. The tech giant has developed an artificial intelligence (AI) speech generator so convincing and advanced that it has decided to withhold it from public release.
VALL-E 2 is an AI marvel capable of mimicking human speech with uncanny accuracy, using just a few seconds of audio. Representing a significant leap in text-to-speech (TTS) technology, Microsoft’s researchers boast that it achieves “human parity” in generating speech — meaning its output is virtually indistinguishable from a human’s voice.
This capability has been made possible through a couple of key features. The first of these is “Repetition Aware Sampling”, which keeps VALL-E 2 from producing monotonous speech by checking for repetitions of “tokens”, the small discrete units of audio the model generates one at a time. This feature prevents the AI from getting stuck in a loop of sounds, so its speech flows more naturally.
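The excerpt does not spell out how the repetition check works, so the following is only a minimal sketch of the general idea, not Microsoft's implementation. It assumes the model picks each token with nucleus (top-p) sampling and, when the chosen token already dominates a recent window of output, falls back to a plain random draw from the full distribution. The function names, the window size, and the thresholds are all hypothetical.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample a token index from the smallest top-ranked set whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(probs, history, rng, top_p=0.8, window=10, max_ratio=0.1):
    """Pick the next token, avoiding one that is over-repeated in recent history.

    All parameter values here are illustrative assumptions, not published settings.
    """
    candidate = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent:
        # Fraction of the recent window already occupied by the candidate token.
        ratio = recent.count(candidate) / len(recent)
        if ratio > max_ratio:
            # Too repetitive: redraw from the full, untruncated distribution.
            candidate = rng.choices(list(range(len(probs))), weights=probs, k=1)[0]
    return candidate


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.9, 0.05, 0.05]   # toy distribution over three tokens
    history = []
    for _ in range(20):
        history.append(repetition_aware_sample(probs, history, rng, top_p=0.5))
    print(history)
```

In this toy setup, nucleus sampling with `top_p=0.5` alone would emit token 0 every time. The fallback gives the less likely tokens a chance once token 0 starts to fill the recent window, which is the loop-breaking behaviour the article describes.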
