Veo (text-to-video model)

From Wikipedia, the free encyclopedia
Veo
Developer(s): Google DeepMind
Initial release: May 2024
Stable release: Veo 3 / 20 May 2025
Type: Text-to-video model
Website: deepmind.google/models/veo/

Veo, also known as Google Veo, is a text-to-video model developed by Google DeepMind and announced in May 2024. As a generative AI model, it creates videos based on user prompts. Veo 3, released in May 2025, can also generate accompanying audio.

Development

In May 2024, Google announced Veo, a multimodal video generation model, at Google I/O 2024.[1] Google claimed that it could generate 1080p videos over a minute long.[1] In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics.[2] In April 2025, Google announced that Veo 2 had become available to Gemini Advanced subscribers in the Gemini app.[3] In May 2025, Google released Veo 3, which generates not only video but also synchronized audio, including dialogue, sound effects, and ambient noise, to match the visuals.[4][5] Google also announced Flow, a video-creation tool powered by Veo and Imagen.[6]

A key innovation of Veo 3 was its ability to generate audio, including music and voices, that matches the video.[5] Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film.[5]

Capabilities and limitations

Video caption: An LGBTQ romantic thriller short film generated entirely with Google Veo 3. The clip illustrates detailed and diverse character models, continuity of characters and story across shots, and generated voice acting.

Access to Veo is sold through several subscription tiers and through Google "AI credits". The model can be used through two interfaces, Google Gemini and Google Flow. Gemini, which uses the Gemini AI chat model, is geared toward shorter, quicker projects. Flow is essentially a video editor that lets users create longer projects and maintain continuity by reusing the same characters across clips. Each generated clip has a maximum length of eight seconds.[7]

Veo has a relatively simple interface and dashboard. However, users with little or no experience in writing or filmmaking may find that the software misunderstands the intent of a prompt, no matter how detailed it is. Prompts, which are central to the software, need to be short and to the point but also very specific if the output is to match the user's vision. Prompts must accurately describe the places, people, and things in each scene, and knowledge of film and camera terminology, such as panning, zooming, and the names of camera angles, is also important. Veo can generate people of various races, ethnicities, and body types. It can also generate stand-up comedy routines, music videos, animals, cartoons, and animation.[8]

Veo enforces strict content guidelines. Before a clip is generated, automated software reviews the request, and the clip is not generated if the content is deemed inappropriate, too sexually graphic, or illegal, or if it depicts graphic abuse, assault, or fighting; gross behavior; antisemitic, racist, or homophobic content; ruling regimes; rioting; blood; gore; or warfare. In some cases a clip may still be generated if the prompt specifies that the scene is fictitious, such as a martial arts scene or a period drama. Veo also does not generate characters that look identical to celebrities or other real people. Users have complained that, however descriptive and detailed their prompts are, Veo may misunderstand the prompt and generate something completely different; render incorrect subtitles and captions; leave a complex scene incomplete because of the eight-second limit; produce garbled or gibberish speech; or generate characters that look or move in deformed ways. Users have also complained that prompts and generations are wrongly flagged as violating the guidelines. Getting optimal results may therefore require trial and error.[9]

Reactions

A reporter for Gizmodo reacted to the release of Veo 3 by observing that users were directing the model to generate low-quality content, such as man-on-the-street interviews or haul videos of people unboxing products.[10] Another media commentator reported that the tool tended to repeat the same joke in response to different prompts.[11]

Commentators speculated that Google had trained the service on YouTube videos[5] or Reddit posts.[11] Google itself had not stated the source of its training content.[5]

In July 2025, Media Matters for America reported that racist and antisemitic videos generated using Veo 3 were being uploaded to TikTok.[12][13] Ryan Whitwam of Ars Technica commented, "In a perfect world, Veo 3 would refuse to create these videos, but vagueness in the prompt and the AI's inability to understand the subtleties of racist tropes (i.e., the use of monkeys instead of humans in some videos) make it easy to skirt the rules."[13]

References

  1. Wiggers, Kyle (14 May 2024). "Google Veo, a serious swing at AI-generated video, debuts at Google I/O 2024". TechCrunch.
  2. "Google unveils improved AI video generator Veo 2 to rival OpenAI's Sora". The Hindu. 17 December 2024. ISSN 0971-751X. Retrieved 20 December 2024.
  3. Wiggers, Kyle (15 April 2025). "Google's Veo 2 video generating model comes to Gemini". TechCrunch. Archived from the original on 16 April 2025. Retrieved 16 April 2025.
  4. "Google launches Veo 3, an AI video generator that incorporates audio". CNBC. 20 May 2025. Retrieved 20 May 2025.
  5. Wiggers, Kyle (20 May 2025). "Veo 3 can generate videos — and soundtracks to go along with them". TechCrunch.
  6. Peters, Jay (20 May 2025). "Google has a new tool just for making AI videos". The Verge. Archived from the original on 20 May 2025. Retrieved 20 May 2025.
  7. Caswell, Amanda (20 May 2025). "Google Veo 3 and Flow: The future of AI filmmaking is here, and here's how it works". Tom's Guide.
  8. Olteanu, Alex (22 May 2025). "Google's Veo 3: A Guide To Prompts, With Practical Examples". DataCamp.
  9. "Generative AI Prohibited Use Policy". Google. 17 December 2024.
  10. Pero, James (22 May 2025). "Google's Veo 3 Is Already Deepfaking All of YouTube's Most Smooth-Brained Content". Gizmodo. Archived from the original on 23 May 2025. Retrieved 23 May 2025.
  11. Maiberg, Emanuel (21 May 2025). "Why Does Google's New Veo 3 AI Video Generator Love This Dad Joke?". 404 Media.
  12. Richards, Abbie (1 July 2025). "Racist AI-generated videos are the newest slop garnering millions of views on TikTok". Media Matters for America. Retrieved 4 July 2025.
  13. Whitwam, Ryan (2 July 2025). "TikTok is being flooded with racist AI videos generated by Google's Veo 3". Ars Technica. Retrieved 3 July 2025.