Meta's Make-A-Video AI is Dall-E for video clips

We walk deeper into the shadow of the uncanny valley every day.
By Elizabeth de Luna
Make-A-Video results for the prompts "a dog wearing a superhero cape flying through the sky" and "a spaceship landing on Mars." Credit: Meta

Everyone's favorite text-to-image generator Dall-E has a new competitor from Meta: a text-to-video generator called Make-A-Video. The tool generates short, soundless video snippets based on the same type of text prompts you feed to Dall-E.

But Dall-E is child's play compared to Make-A-Video, at least according to Mark Zuckerberg. The Meta CEO noted in a Facebook post, "It's much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they'll change over time." Make-A-Video doesn't have that problem because it "understand[s] motion in the physical world and appl[ies] it to traditional text-to-image generation."



Another Make-A-Video feature is the ability to add motion to static images. Make-A-Video's transformation of a static image of a woman doing a yoga pose, for example, has her leaning deeper into her stretch as a light flare shimmers on the lens. Other examples of the tool's output are available on its website, which notes that you can also show Make-A-Video an existing video and be presented with several new interpretations.

We'll take all these examples with a grain of salt, since Make-A-Video isn't yet available to the public, but it is a wild new potential development for artificial intelligence.

Meta has published a paper about the tool detailing how it was trained, along with its technical limitations: it cannot generate clips longer than five seconds or at resolutions higher than 768 by 768 pixels at 16 frames per second. The Verge notes that the only text-to-video model available to the public, called CogVideo, is burdened by the same limitations.

Elizabeth de Luna
Culture Reporter

Elizabeth is a digital culture reporter covering the internet's influence on self-expression, fashion, and fandom. Her work explores how technology shapes our identities, communities, and emotions. Before joining Mashable, Elizabeth spent six years in tech. Her reporting can be found in Rolling Stone, The Guardian, TIME, and Teen Vogue.
