Meta's Make-A-Video AI is Dall-E for video clips

We walk deeper into the shadow of the uncanny valley every day.
By Elizabeth de Luna
Make-A-Video results for the prompts "a dog wearing a superhero cape flying through the sky" and "a spaceship landing on Mars." Credit: Meta

Everyone's favorite text-to-image generator Dall-E has a new competitor from Meta: a text-to-video generator called Make-A-Video. The tool generates short, soundless video snippets based on the same type of text prompts you feed to Dall-E.

But Dall-E is child's play compared to Make-A-Video, at least according to Mark Zuckerberg. The Meta CEO noted in a Facebook post, "It's much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they'll change over time." Make-A-Video doesn't have that problem because it "understand[s] motion in the physical world and appl[ies] it to traditional text-to-image generation."



Another Make-A-Video feature is the ability to add motion to static images. Make-A-Video's transformation of a static image of a woman doing a yoga pose, for example, has her leaning deeper into her stretch as a light flare shimmers on the lens. Other examples of the tool's output are available on its website, which notes that you can also show Make-A-Video an existing video and be presented with several new interpretations.

We'll take all these examples with a grain of salt, since Make-A-Video isn't yet available to the public, but it is a wild new potential development for artificial intelligence.

Meta has published a paper about the tool detailing how it was trained, along with its technical limitations: it cannot generate clips longer than five seconds or at resolutions higher than 768 by 768 pixels at 16 frames per second. The Verge notes that the only text-to-video model available to the public, called CogVideo, is burdened by the same limitations.

Elizabeth de Luna
Culture Reporter

Elizabeth is a digital culture reporter covering the internet's influence on self-expression, fashion, and fandom. Her work explores how technology shapes our identities, communities, and emotions. Before joining Mashable, Elizabeth spent six years in tech. Her reporting can be found in Rolling Stone, The Guardian, TIME, and Teen Vogue.
