Artificial Intelligence (AI) and text-to-image technology have boomed in the last year. With multiple tools now widely accessible, it's possible to create your very own custom imagery from any one idea by way of simple text-to-image prompts. With that in mind, we wanted to test the capabilities of these models to see if we could put together a Christmas music video. Because that's exactly what the industry needs: more ingenuity, creative innovation and horrifyingly good Christmas music videos.
Last Christmas, we looked at ways to create our very own Christmas song using a similar approach (you can find the blog here). Based on the tech available at the time, we opted for a Markov chain, a statistical model that generates text one word at a time, picking each next word from probabilities learned during training on the steps that came before it. It was apparent to us that this model was perfect for songwriting, and we'll show you just how capable it really was.
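The blog doesn't include the songwriting code, but the core idea of a word-level Markov chain can be sketched in a few lines of Python. This is a minimal illustration, not our production setup, and the lyric snippet is invented for the example:

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=8, seed=0):
    """Walk the chain, sampling each next word from the learned transitions."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        options = chain.get(tuple(out[-len(state):]))
        if not options:  # dead end: the last word was never followed by anything
            break
        out.append(rng.choice(options))
    return " ".join(out)

# Illustrative training lyric, not the actual song:
lyrics = "we share happiness at christmas we share joy at christmas"
chain = build_chain(lyrics)
print(generate(chain, length=6))  # a short line stitched from the training words
```

Trained on a much larger lyric corpus, the same sampling loop produces lines that sound plausibly song-like, which is all a Christmas single really needs.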
The Christmas Video
So, as we already had the AI ghostwriter behind this one, all we needed was some help with the visuals to try and compete with the likes of Mariah Carey this Christmas. While the tech required for a true text-to-video music pipeline is not yet available to us, we decided we would have to adopt the aforementioned text-to-image methodology for each of our frames.
Just as we did with the songwriting, we now had to consider which existing tools could help us create our own music video (without asking for too much help from our Creative team). With the popularity of DALL-E and similar tools on the rise, and DALL-E 2 now in the mix, we wanted to explore the wide realm of text-to-image technology. All things considered, it seemed obvious to use Stable Diffusion, a model that can be paired with Whisper, a speech-recognition tool that transcribes the song lyrics into the text prompts the AI requires to produce the desired imagery. This seemed the most suitable method: a fundamental aspect of music video production is surely that each image produced is at least a small representation of the lyrics, right?
With the appropriate tech in place and the workflow understood, the process seemed pretty straightforward, or so we had thought. With Whisper breaking our Markov-chain lyrics down into prompts, we now had the opportunity to overwrite those prompts. Adding styles and themes in this way helped direct the Stable Diffusion model to create images better suited to a Christmas audience.
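In practice, the overwriting step can be as simple as mapping transcript segments to styled prompts. Here is a minimal sketch: the segment data mimics the shape Whisper returns (a list of dicts with `start`, `end` and `text`), but the lyric lines, override and style string are all invented for illustration:

```python
# Hypothetical transcript segments in the shape Whisper returns
# (invented lyric lines for illustration):
segments = [
    {"start": 0.0, "end": 3.2, "text": "We share happiness at Christmas"},
    {"start": 3.2, "end": 6.5, "text": "Snow is falling all around"},
]

# Shared theme/style suffix appended to every prompt (our assumption,
# not the blog's exact wording):
STYLE = "watercolour, festive, warm lighting"

# Manually overwrite prompts whose raw lyric makes a poor image description:
overrides = {1: "A snowy village street at night"}

def to_prompts(segments, overrides, style):
    """Turn transcript segments into image prompts, swapping in manual
    overrides and appending the shared style to each one."""
    prompts = []
    for i, seg in enumerate(segments):
        text = overrides.get(i, seg["text"])
        prompts.append(f"{text}, {style}")
    return prompts

for p in to_prompts(segments, overrides, STYLE):
    print(p)
```

Each resulting string is then fed to Stable Diffusion as the prompt for that segment's frames, with the segment timings telling us how long each image should stay on screen.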
A steeper learning curve than expected
“Santa Delivering Presents at Christmas” – We had thought that a simple prompt such as this one would be exactly the sort of thing to produce a faithful image of the instruction. Of course we were wrong, and the results were simply horrifying. To get something more in the way of a John Lewis advert and less ‘The Hills Have Eyes: Christmas Special’, we would have to reconsider our approach and tweak our prompt to help the computer understand exactly what we wanted from it.
Without a real understanding of the tech being used and its capabilities, this is what we can expect. The AI doesn't deal well with human faces, nor does it stitch together realistic, lifelike images. There are a number of limitations to text-to-image technology, most of which can be overcome with more accurate prompting or with styles and themes that are less complex.
“[A Watercolor Image of Santa Claus Delivering Presents], [In the style of Peter Rabbit]” – So, we thought it best to take some time to better understand prompt engineering, and we found that anything left unsaid in a prompt will surprise you, with no predicting whether the surprise will be pleasant. Here, we follow the pattern “[a what, doing what, in what style]” to communicate with our model in a more structured format. This makes sense, given that we (humans) are asking a model (a computer) to execute a conceptual human task using human language. We also opted for a simpler idea and a more basic theme to help the model produce a clearer image, and here's how it went in comparison to the first image.
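A tiny helper makes the pattern concrete. The field names below are our own labels for the pattern's slots; the model only ever sees the final string:

```python
def bracket_prompt(subject, action, style):
    """Build a prompt following the '[a what, doing what, in what style]' pattern."""
    return f"[A {subject} {action}], [In the style of {style}]"

prompt = bracket_prompt("Watercolor Image of Santa Claus",
                        "Delivering Presents",
                        "Peter Rabbit")
print(prompt)
# [A Watercolor Image of Santa Claus Delivering Presents], [In the style of Peter Rabbit]
```

Keeping the subject, action and style in separate slots made it easy to swap one element at a time and see which part of the prompt was confusing the model.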
We eventually found a way to be more precise with our model, which earned us much more bespoke imagery in return. However, we could not ignore that the creative spark and artistic control one has during music video production was lacking. After all, most of the control is in the hands of the model, with us relying on its previous training to be fully comprehensive. We personally do not see AI models such as these as a reliable outsourcing solution for Creative studios or Design teams just yet, unless of course you are aiming for an end product with a bit of a quirk.
Our real aim here was not just to SHARE Happiness at Christmas, but to test the capabilities of AI in a creative domain, with a specific brief: produce a Christmas music video. We'll leave the decision as to whether we met that brief up to you. For now, Merry Christmas from everyone at SHARE Creative, and we hope you enjoy the slightly creepy visuals we have put together, along with the wonderfully written ‘SHARE Happiness (At Christmas)’ – with music by the brilliant Groove Machine Ltd.