Meta has revealed AudioCraft, their new generative AI tool that generates high-quality audio from text prompts, much as ChatGPT does with text or DALL-E with images.
“Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or a small business owner adding a soundtrack to their latest video ad on Instagram with ease. That’s the promise of AudioCraft — our latest AI tool that generates high-quality, realistic audio and music from text,” Meta wrote on their company blog.
AudioCraft consists of three models: MusicGen, AudioGen and EnCodec. MusicGen, trained on Meta-owned and specifically licensed music, generates music from text prompts; AudioGen, trained on public sound effects, generates audio from text prompts; and EnCodec, a neural audio codec, compresses audio and decodes the output of the other two models.
“Today, we’re excited to release an improved version of our EnCodec decoder, which allows higher quality music generation with fewer artifacts,” they said.
Meta also announced that they are open-sourcing the tools so that researchers and practitioners can “train their own models with their own datasets for the first time and help advance the field of AI-generated audio and music.”
“Having a solid open-source foundation will foster innovation and complement the way we produce and listen to audio and music in the future,” the company said.
Meta said they envision MusicGen eventually turning into a new type of instrument, much as synthesizers did when they first appeared in 1955.
“We see the AudioCraft family of models as tools for musicians and sound designers to provide inspiration, help people quickly brainstorm and iterate on their compositions in new ways. We can’t wait to see what people create with AudioCraft,” they said.