Meta had seemed a little lost, but in recent weeks it has significantly stepped up its efforts in artificial intelligence. The open source community embraced the launch of its LLaMA model and has used it as the foundation for numerous independent projects, and the company recently debuted MusicGen, its generative AI for producing music, to great acclaim. Now another striking project arrives. Its name is Voicebox.
Meta researchers introduced Voicebox last Friday, claiming that it is the first model capable of generalizing to speech-generation tasks it was not specifically trained for, and of doing so with excellent results. It also does more than other models.
Text to speech. To begin with, Voicebox needs no task-specific training: a user simply writes the sentence they want read aloud, and the system produces a variety of plausible, if not quite flawless, synthetic voices in the style of their choice.
Text to a specific voice. The most "traditional" function is the ability to mimic someone's voice pronouncing any sentence, and Voicebox can do just that. Simply supply a short audio clip (for instance, two seconds of our own voice) alongside the written sentence to be pronounced, and the model will produce that sentence in the same voice.
Now you can speak several languages. You can also supply a written text in another language together with an audio clip in your native tongue. Voicebox will then make you "say" that sentence in that language as if it were your own, which could help break down language barriers in a variety of situations.
Get rid of the noise. If a dog is barking while you're shooting a video and you don't want that barking to be audible while you're speaking, Voicebox can also identify and remove that background noise.
Finally, Meta's model can also replace any word you said in the original audio clip, recorded in your own voice, with a new one specified in the text prompt. For instance, you could easily change the phrase "Hey guys, today we're going to talk about artificial intelligence" to "Ladies and gentlemen, today we're going to talk about artificial intelligence."
Public domain training. To train Voicebox, the engineers at Meta fed it 50,000 hours of audiobooks in English and another 60,000 hours of audiobooks in other languages. Because of this, the voices in the demos sound as if they are reading from a book rather than adopting a more relaxed, conversational cadence. The idea is for the model to evolve in that direction. Once again, Meta does not identify the audiobooks that were used, although a company representative told Gizmodo that they were "public domain" audiobooks.
Deepfakes in sight. Although this kind of system has noteworthy benefits and many useful applications, it can also be abused to produce deepfakes. Because deepfakes make it possible to impersonate people in scams of all kinds, Meta had to make a decision about Voicebox on this occasion.
The software won't be open source. Meta has chosen not to publish the Voicebox source code, in contrast to LLaMA, which is open source and was distributed to the academic community. The company says that, because of the risk of misuse, it prefers not to make the model available to the general public so that it can continue its AI research responsibly. That said, it maintains that it wants to remain open about progress in this field.