

Google's DeepMind unit releases new video generator that beats Sora on resolution

DAVID MEYER
2024-12-19

DeepMind's new Veo 2 AI video generator surpasses OpenAI's Sora model with 4K resolution.




Just seven months after it unveiled its Veo AI video generator, Alphabet division Google DeepMind has announced Veo 2.

The new tool can generate videos of up to 4K resolution, whereas the first Veo could only handle up to 1080p. Google is claiming improvements in the physics of the scenes that the upgraded Veo generates, as well as better “camera control” (there is no real camera involved, but the user can prompt the model for specific camera shots and angles, from close-ups to pans to “establishing shots”).

DeepMind also announced an updated version of its Imagen 3 text-to-image model, though the changes—like “more compositionally balanced” images and improved adherence to artistic styles—clearly aren’t big enough to warrant a full new version number. Imagen 3 first rolled out in August.

Veo 2’s step up to 4K suggests DeepMind is pulling ahead of rival AI labs in video generation.

OpenAI finally released its Sora video generator a week ago, after having unveiled it all the way back in February, but the output of Sora (specifically, the Sora Turbo version that is now available to ChatGPT Plus and Pro users) remains limited to a maximum resolution of 1080p. Runway, which is perhaps the most popular of the current AI video generators, can only export at an even fuzzier 720p.

“Low resolution video is great for mobile, but creators want to see their work shine on the big screen,” Google said in a presentation on Veo 2.

Veo 2’s 4K clips are limited to eight seconds by default, but they can be extended to two minutes or more, said a Google spokesperson. Sora’s 1080p clips are capped at 20 seconds.

DeepMind claims that, when comparing Veo 2 to Sora Turbo, 59% of human raters preferred Google’s service, with 27% opting for Sora Turbo. It also claims similar victories against Minimax and Meta’s Movie Gen, with Veo 2 preference only slipping slightly below 50% when the rival was Kling v1.5, a service from China’s Kuaishou Technology.

When it comes to “prompt adherence”—i.e. doing what it was asked to do—Veo 2 was preferred at similar rates, according to DeepMind.

The Google unit also claims to have made significant strides in combating “hallucinated” details, like bonus fingers, and in demonstrating “a better understanding of real-world physics and the nuances of human movement and expression.”

The physics issue is one that continues to bedevil video generators. Sora, for example, struggles to generate plausible footage of gymnasts and their complex movements. It remains to be seen how much better Veo 2 will prove in this regard.

Some, like Stanford professor and World Labs co-founder Fei-Fei Li, argue that issues like physics and object permanence can only really be solved with so-called world models that have the “spatial intelligence” to understand and generate 3D environments. Google unveiled its own Genie 2 world model earlier this month, but with a focus on generating environments that can be used to train and evaluate AI “agents” that operate in virtual environments.

The more plausible the output of image and video generators, the greater the risk of them being used for nefarious purposes. DeepMind applies invisible SynthID watermarks to Veo 2 clips, which should make it more difficult to use them for political disinformation, if people are checking videos for such telltale signs of AI origins. The same may not hold true for more mundane fraudulent applications, where victims would be less likely to check the file for invisible watermarks.
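SynthID's actual mechanism is proprietary and embedded during generation, so the following is not how it works under the hood. But a toy least-significant-bit (LSB) scheme illustrates the general point made above: an invisible watermark changes nothing a casual viewer sees, yet is trivial to verify if, and only if, someone knows to run the check. The `WATERMARK` tag and pixel values here are made up for illustration.

```python
# Toy LSB watermark (illustrative only; SynthID uses a different,
# proprietary approach embedded at generation time).

WATERMARK = 0b10110010  # hypothetical 8-bit provenance tag


def embed(pixels):
    """Hide WATERMARK in the LSBs of the first 8 pixel values."""
    out = list(pixels)
    for i in range(8):
        bit = (WATERMARK >> (7 - i)) & 1
        out[i] = (out[i] & ~1) | bit  # overwrite only the lowest bit
    return out


def check(pixels):
    """Recover the first 8 LSBs and compare against WATERMARK."""
    tag = 0
    for i in range(8):
        tag = (tag << 1) | (pixels[i] & 1)
    return tag == WATERMARK


frame = [200, 201, 199, 180, 175, 170, 168, 166]
marked = embed(frame)

# Each pixel value shifts by at most 1 -- imperceptible to a viewer...
assert all(abs(a - b) <= 1 for a, b in zip(frame, marked))

# ...but an explicit check recovers the tag exactly.
print(check(marked))  # True
print(check(frame))   # False: unmarked pixels don't match the tag
```

This is why the article distinguishes political disinformation (where fact-checkers may actively scan files) from everyday fraud (where victims have no reason to run any check at all): the mark only helps when someone looks for it.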

By way of contrast, OpenAI’s Sora embeds a visible animation in the bottom right corner of its videos. Sora also uses the open-source C2PA watermarking protocol, an alternative system to SynthID (though Google also joined the C2PA initiative in February).

Veo 2 is now powering Google Labs’s VideoFX generation tool (which has a resolution cap of 720p), while the revised Imagen 3 model can now be used in the ImageFX tool. VideoFX is currently only rolling out in the U.S., but ImageFX is available in over 100 countries.

Google DeepMind has not said what data was used to train Veo 2 or the new version of Imagen 3, though it previously hinted that YouTube videos (both companies fall under the Alphabet umbrella) comprised some of the training data for the original Veo.

Many artists, photographers, creators and filmmakers are concerned their copyrighted works have been used to train such systems without their consent. OpenAI has refused to say what data was used to train Sora but the New York Times, citing sources familiar with Sora’s training, has reported that the company used videos from Google’s YouTube service to train the AI model. 404 Media has previously reported that Runway also seems to have used YouTube videos to train Gen 3 Alpha.

ImageFX is not available in Germany, where this writer is based. However, a Google DeepMind spokesperson denied that this had anything to do with the EU’s new AI Act, which demands that Big Tech firms provide a detailed summary of what copyright-protected data they use to train their AI models. “We often ramp up experiments in one or limited markets before expanding more broadly,” they said.

All content published by Fortune China is the exclusive intellectual property of Fortune Media IP Limited and/or the relevant rights holders. Reproduction, excerpting, copying, or mirroring without permission is prohibited.