How we created amazing music with video using Artificial Intelligence

26 Jul, 2023
Pixelated Poets: How Generative AI fully fuels our music production

The journey of Pixelated Poets is a testament to the endless possibilities AI brings into our lives. It’s an expedition into the unknown, a chronicle of innovation, and most importantly, a celebration of how creativity and technology can together compose a beautiful symphony. So, let’s dive in and hit the high notes of this exciting musical saga.

In this era of rapid technological advancement, Artificial Intelligence (AI) is no longer confined to data analytics or automated systems. It’s spreading its wings across diverse fields, revolutionizing industries and creating unexpected wonders. One such wonder is the impact AI has made on the creative world, specifically music. Welcome to the incredible journey of "Pixelated Poets", a whimsical AI-driven folk band where creativity meets artificial intelligence.

These aren’t your regular musicians. There’s no lead vocalist passionately belting out notes, no guitarist strumming heart-tugging melodies, and no drummer rhythmically guiding the tempo. Instead, our band members are a complex network of algorithms, working in unison to compose music that evokes emotions, tells stories, and sets your foot tapping.

Everything you’re about to explore here, from the catchy band name to the mesmerizing tunes, has been birthed by AI. This includes the song descriptions that capture the spirit of each track, the lyrics that weave intricate narratives, the melody that paints the air with musical hues, and the rhythm section consisting of bass, backing tracks, and drums that set the groove.

We’ve also utilized AI to produce our very own music videos, creating a rich visual narrative that harmonizes perfectly with our music. Every frame of these videos is a testament to the power of AI in generating compelling visual art.

In this blog post, we’re going to dive deep into the fascinating process behind Pixelated Poets. How does AI compose lyrics that resonate with human experiences? How does it craft a melody that makes a song memorable? And most importantly, how does AI go beyond ones and zeros to create art that touches hearts? Join us on this magical musical adventure to uncover the answers.

The song is available on Spotify at:

You can view the music video of the AI-produced song here:

Pixelated Dreams

Behind the Music: How AI Writes Lyrics

When you listen to a song from Pixelated Poets, you might wonder: "How does an AI write such emotionally resonant lyrics?" The answer lies in the complex machinery of Large Language Models (LLMs). These models are powerful tools that use machine learning to generate human-like text.

Large Language Models have been trained by ingesting enormous amounts of text data from the internet, including books, articles, and, importantly for our purposes, song lyrics. This allows the model to learn the structures, patterns, and nuances of language.

Through this process, the AI learns about grammar, rhymes, common phrases, and even thematic elements. However, it’s important to clarify that the AI doesn’t "understand" the text in the way humans do. It identifies statistical patterns and uses them to predict what comes next.

We don’t need to train a large language model ourselves: the technology is readily available nowadays. The most notable examples are Google Bard and ChatGPT.

The actual lyric generation begins with a "prompt". In the case of "Pixelated Dreams" it was: "We’re the Pixelated Poets, a whimsical folk band that generates music using artificial intelligence. Our members include Dennis, Matt, Dean and Riccardo. Write a song that takes these elements into account in the style of Bob Dylan".

Once the AI has a prompt, it begins to predict the next most likely word. It does this by analyzing the context provided by the prompt and all the previous words it has generated. The model makes an educated guess, choosing a word that is statistically likely to fit the given context.

This process repeats word by word, line by line, until the AI has constructed an entire set of lyrics. The AI doesn’t plan ahead – each word is generated based on the context up to that point.
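The word-by-word loop described above can be illustrated with a toy model. This is only a sketch: it counts which word follows which in a tiny made-up corpus, whereas a real Large Language Model uses a neural network and the full preceding context. The corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real LLM is trained on vastly more text.
CORPUS = (
    "we dream in pixels and we sing in code "
    "we dream in color and we sing in time"
)

def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(prompt, counts, max_words=8):
    """Extend the prompt one word at a time with the most frequent follower."""
    output = prompt.split()
    for _ in range(max_words):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last word never had a follower in the corpus
        next_word, _count = followers.most_common(1)[0]
        output.append(next_word)
    return " ".join(output)

counts = build_bigram_counts(CORPUS)
lyrics = generate("we dream", counts)
print(lyrics)
```

Because this toy always picks the single most frequent follower and only looks at the last word, it quickly starts repeating itself, which is one reason real systems sample from the predicted probabilities instead.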

We used parts of the generated lyrics for the song Pixelated Dreams.

Generative AI Tweaking: Temperature and More

The generation process is influenced by several parameters, such as temperature and max tokens.

The temperature parameter controls the randomness of the AI’s choices. A lower temperature (like 0.2) makes the AI’s predictions more focused and deterministic, leading to repetitive and conservative text. On the other hand, a higher temperature (like 0.8) encourages more diversity and creativity, at the risk of generating less coherent or grammatically incorrect text.
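A common way to implement temperature is to divide the model's raw scores by the temperature before converting them into probabilities. The sketch below shows that idea with hypothetical scores for four candidate words; the numbers and function names are assumptions for illustration, not output from a real model.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores.values()]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {word: e / total for word, e in zip(scores, exps)}

def sample_word(scores, temperature, rng):
    """Draw one word at random, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical scores for the word that follows "we dream in".
scores = {"pixels": 2.0, "color": 1.5, "code": 1.0, "silence": 0.2}

cold = softmax_with_temperature(scores, 0.2)
warm = softmax_with_temperature(scores, 0.8)
print("temperature 0.2:", {w: round(p, 3) for w, p in cold.items()})
print("temperature 0.8:", {w: round(p, 3) for w, p in warm.items()})
print("sampled at 0.8:", sample_word(scores, 0.8, random.Random(42)))
```

At temperature 0.2 nearly all of the probability lands on the top-scoring word, while at 0.8 the alternatives keep a realistic chance of being picked.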

Max tokens is another parameter, one that caps the length of the output. It limits the number of tokens (words or pieces of words) the AI can generate, enabling control over the length of the generated lyrics.

Human Touch

While the AI does a fantastic job of creating unique and intriguing lyrics, the raw output often requires some refining. A human touch can help arrange lines, ensure thematic consistency, and polish any rough edges. This collaboration between AI and human creativity is what gave life to the music of Pixelated Poets.

So, when you listen to Pixelated Dreams, take a moment to appreciate the complex dance of algorithms and human creativity that brought the lyrics to life.

The Melody of Machine Learning: Creating Music with AI

When you listen to Pixelated Poets, you might find yourself mesmerized by the harmonious blend of instruments, the captivating melodies, and the rhythmic beats. All of our music was composed not by a traditional musician, but by AI. Our partner in this symphony of technology is Soundraw, an innovative AI composition tool that transforms the music creation process. It’s designed to empower creators, allowing anyone to craft songs perfectly suited to their content in just minutes, even without any knowledge of music composition.

Creating a new song with Soundraw begins by defining the parameters of your piece. These include the style of the song (say, folk or pop), the mood (such as happy, sad, or dramatic), which instruments to use, the tempo, and the desired length of the track.

Once your parameters are set, it’s time to let the AI do its magic. Soundraw generates a couple of complete songs based on the parameters you’ve provided. This isn’t a simple process of stringing together pre-made loops—it’s the result of complex algorithms working in harmony to create a unique piece of music from scratch.

Each song consists of several tracks: melody, backing, bass, drums, and fill-ins. And if you like the generated song, you can continue iterating on it, making variations, fine-tuning the instruments, and so on until you achieve the perfect composition.

When you’re satisfied with your song, you can download it in high-quality WAV format. To provide maximum flexibility for producing and mixing, Soundraw allows you to save each track of the song individually. Here’s how: you favourite a generated track, press the ‘share’ button, and turn off all tracks except for one. This process is repeated until you have all the tracks saved as individual WAV files.

These individual tracks can then be easily imported into digital audio workstation software like GarageBand, giving you full control over the final mix.

The combination of Soundraw’s AI and our creative direction resulted in Pixelated Dreams. The melodies, harmonies, and rhythms all spring from this powerful blend of technology and creativity. It’s a testament to how AI is not only revolutionizing the way we create music but also expanding the horizons of what’s possible.

Singing with Synthesis: AI-Powered Vocals with Synthesizer-V

Creating the melody and rhythm of our songs is only part of the musical journey. A crucial element that breathes life into our lyrics is the vocals. For Pixelated Poets, this was achieved using an innovative software package called Synthesizer-V. And that’s a good thing: we did the vocals ourselves on our first song, The Government Knows (a Knower-inspired song), and it came out as a mixture of Tom Waits and Cookie Monster. Still love the song to death, though.

Synthesizer-V is a vocal synthesizer that uses AI technology to create incredibly realistic singing voices. The software provides several voicebanks, each representing a unique "singer". Each voicebank has its own character and style, from soft and breathy to bold and powerful.

Once the lyrics and melody are ready, we feed them into Synthesizer-V. The software allows us to define the melody that the virtual singer will follow, and to input the lyrics they’ll be singing. The latter requires significant human effort: finding the correct notes of the generated tune, matching words to them, playing with phonetics, and creating harmonies. We can also control aspects of the vocal performance, like the pitch, dynamics, tone, and more. This granular control allows us to fine-tune the performance until it perfectly fits the song.

Once the AI-generated vocals are created, each vocal track can be exported as an individual WAV file. This file can then be mixed with the rest of the music tracks in a digital audio workstation like GarageBand.
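We did the mixing in GarageBand's interface, but the underlying idea is simple enough to sketch: each stem is a sequence of samples, and a mix is their weighted sum. The example below is a rough illustration on made-up 16-bit sample values; the stem names, gains, and the mix_stems function are hypothetical and are not part of any tool mentioned here.

```python
def mix_stems(stems, gains):
    """Sum lists of 16-bit samples, scaling each stem by its gain."""
    length = min(len(samples) for samples in stems.values())
    mixed = []
    for i in range(length):
        total = sum(stems[name][i] * gains.get(name, 1.0) for name in stems)
        # Clamp to the 16-bit range so loud passages clip instead of wrapping around.
        mixed.append(max(-32768, min(32767, round(total))))
    return mixed

# Made-up sample values standing in for real WAV data.
stems = {
    "melody": [1000, 2000, 30000],
    "bass": [500, -500, 10000],
    "vocals": [200, 300, 400],
}
gains = {"melody": 0.8, "bass": 1.0, "vocals": 1.2}

mix = mix_stems(stems, gains)
print(mix)
```

Keeping every instrument and the vocals as separate files is what makes this kind of per-track volume control possible in the first place.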

Visualizing the Music: AI-Generated Video Production

A music video is a powerful way to enhance the storytelling of a song. For Pixelated Poets, we decided to lean further into the world of AI to create our music videos, using DALL-E and RunwayML.

DALL-E, like GPT-3, is a model by OpenAI, but instead of generating text, it generates images from text descriptions. We used DALL-E to generate our band logo:

Pixelated Dreams

Once we had our static image, the next step was to add a layer of motion and dynamism. For this, we turned to RunwayML and its impressive Gen-2 feature, which specializes in text-to-video generation. RunwayML Gen-2 harnesses the power of AI to transform textual descriptions into dynamic videos. We uploaded our Pixelated Poets logo image and prompted RunwayML to create videos in the style of that image. Our prompts asked for the Pixelated Poets watching beautiful scenery, and the Pixelated Poets dancing.

RunwayML proceeded to generate previews of the videos, and we selected the ones we thought had potential. This saves credits, as you don’t spend them on videos that you don’t like. Each generated video is 4 seconds long and had the perfect pixelated vibe for our music video. It definitely brought a new layer of depth to our music.

Now with our AI-generated animation and music in hand, it was time to piece them together into a complete music video.

Putting It All Together with iMovie

With our animated visuals and AI-created music tracks at hand, we needed a way to bring them together into a cohesive music video. We chose iMovie, Apple’s user-friendly video editing software, for this task.

We imported our produced music track and animated clips into iMovie, and started the process of synchronizing the visuals with the music. This involved slowing down most of the generated clips so that we had enough video material to cover the music, and timing the transitions and animations to match the rhythm and mood of the song, emphasizing key moments and enhancing the overall storytelling.
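How much the clips need to be slowed down is a matter of simple arithmetic. The sketch below works it out for 4-second clips; the song length and clip count are hypothetical numbers chosen for illustration, not the actual figures for Pixelated Dreams.

```python
import math

CLIP_SECONDS = 4.0  # length of each generated clip

def playback_speed(song_seconds, clip_count, clip_seconds=CLIP_SECONDS):
    """Speed factor (1.0 = normal) at which the clips exactly cover the song."""
    footage_seconds = clip_count * clip_seconds
    return min(1.0, footage_seconds / song_seconds)

def clips_needed(song_seconds, speed, clip_seconds=CLIP_SECONDS):
    """Number of clips required to cover the song at a given playback speed."""
    return math.ceil(song_seconds * speed / clip_seconds)

# Hypothetical example: a 3-minute song and 30 usable clips.
speed = playback_speed(song_seconds=180, clip_count=30)
print(f"Play the clips at {speed:.0%} speed")
print("Clips needed at half speed:", clips_needed(180, 0.5))
```

With these example numbers, 30 clips give 120 seconds of footage, so playing them at about two thirds of normal speed stretches them over the full 180 seconds.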

In Conclusion: The Future of Generative AI in Music

Through the combination of Large Language Models, Soundraw, Synthesizer-V, DALL-E, RunwayML, and iMovie, we were able to craft a captivating AI-generated music video with great visuals and vocals.

Our journey with Pixelated Poets has been an awe-inspiring exploration of the intersection between creativity and technology. The band is the result of not just advanced AI algorithms but also countless hours of human effort in refining outputs and blending different elements together.

In the course of this project, we’ve seen firsthand how generative AI can open up new possibilities in music and art. It’s important to note, however, that while AI played a significant role in generating lyrics, composing music, providing vocals, and creating visuals, it was our human input and intuition that guided the entire process. We made critical choices at every step: the style and mood of our songs, the voice of our singer, the themes of our visuals, and the final composition of our music videos.

The results have thrilled us beyond measure. From whimsical lyrics to enchanting melodies, from expressive vocals to captivating visuals, Pixelated Poets is a testament to the creative potential of AI. But it’s also a testament to human creativity, which has used this cutting-edge technology to craft a unique musical experience.

And we’re just getting started. The field of generative AI is still in its infancy, and we’re excited about its future. We anticipate a world where AI becomes an even more versatile and powerful tool, pushing the boundaries of what’s possible in music and beyond. We can’t wait to see where this journey takes us next.

Dennis Vink
Crafting digital leaders through innovative AI & cloud solutions. Empowering businesses with cutting-edge strategies for growth and transformation.
