Difference between revisions of "DALL-E"
What if format like English version?
Revision as of 09:26, 6 March 2021 (Saturday)
DALL-E sample.png: Images produced by DALL-E when given the text prompt "a professional high quality illustration of a giraffe dragon chimera. a giraffe imitating a dragon. a giraffe made of dragon."

Original author | OpenAI |
---|---|
Initial release | 5 January 2021 |
Type | Transformer language model |
Website | www.openai.com/blog/dall-e/ |
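DALL-E, stylized as DALL·E, is an artificial intelligence program that generates images from textual descriptions, announced by OpenAI on 5 January 2021[1][2]. The program is based on a 12-billion-parameter[3] version of the GPT-3 Transformer model. It interprets natural-language inputs (such as "a green leather purse shaped like a pentagon" or "an isometric view of a sad capybara") and generates corresponding images. It can create images of realistic objects ("a stained glass window with an image of a blue strawberry") as well as objects that do not exist in reality ("a cube with the texture of a porcupine")[4][5][6][7]. The name DALL-E is a portmanteau of WALL-E and Dalí.
Many neural networks since the 2000s have been able to generate realistic images[1], but DALL-E can generate them from natural-language prompts, which it "understands [...] and rarely fails in any serious way"[1].
DALL-E was developed and announced to the public together with another model, CLIP (Contrastive Language-Image Pre-training)[2], whose role is to "understand and rank" DALL-E's output[1]. DALL-E's raw outputs are curated by CLIP, which picks out the highest-quality images for any given prompt. OpenAI has declined to release source code for either model. A "controlled demo" of DALL-E is available on OpenAI's website, where output from a limited selection of sample prompts can be viewed[3].
According to MIT Technology Review, one of OpenAI's goals in developing DALL-E was to "give language models a better grasp of the everyday concepts that humans use to make sense of things"[2].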
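Architecture
The Generative Pre-trained Transformer (GPT) model was first developed by OpenAI in 2018[8], using the Transformer architecture. The first iteration, GPT, was scaled up in 2019 to produce GPT-2[9]. In 2020, GPT-2 was scaled up in a similar way to produce GPT-3[10], of which DALL-E is an implementation[3][11]. It uses zero-shot learning to generate output from a description and cue without further training[12].
DALL-E's model is a 12-billion-parameter version of GPT-3[3] (scaled down from GPT-3's full size of 175 billion parameters)[10]. It "swaps text for pixels" and is trained on text-image pairs from the Internet[2].
DALL-E generates a large number of images in response to a prompt. Another OpenAI model, CLIP, developed and announced together with DALL-E, is responsible for "understanding and ranking" its output[1]. CLIP was trained on over 400 million pairs of images and text[3]. CLIP is an image recognition system[2]; however, unlike most classifier models, it was not trained on curated datasets of labeled images (such as ImageNet) but on images and descriptions scraped from the Internet. Rather than learning from a single label, CLIP learns to associate an image with its entire caption. A trained CLIP can predict which caption (out of a "random selection" of 32,768 possible captions) best fits an image, which enables it to identify objects in a wide variety of images outside its training set.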
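The ranking step described above can be illustrated with a toy sketch. It is not OpenAI's code, which has not been released. It assumes that each caption and each candidate image has already been mapped to an embedding vector, and that candidates are scored by cosine similarity; the function names and the hand-written vectors are hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate indices sorted from best to worst match, plus raw scores.

    Assumption: a higher cosine similarity means a better caption-image match.
    """
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy embeddings (not produced by any real model).
caption = [1.0, 0.0, 0.5]
images = [
    [0.9, 0.1, 0.4],  # points in nearly the same direction as the caption
    [0.0, 1.0, 0.0],  # orthogonal to the caption
    [0.5, 0.5, 0.5],  # partially aligned
]

order, scores = rank_candidates(caption, images)
print(order)
```

The same function covers both directions mentioned in the text: ranking many generated images against one prompt, or, with the roles swapped, choosing which of several candidate captions best fits one image.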
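Performance
DALL-E can generate imagery in a variety of styles, from photorealistic images[3] to paintings and emoji. It can also "manipulate and rearrange" objects in its images[3]. One ability noted by its creators is the correct placement of design elements in novel compositions without explicit instruction: "For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL·E often draws the kerchief, hands, and feet in plausible locations."[13]
While DALL-E exhibited a wide variety of skills and abilities, most coverage following the release of its public demo focused on a small subset of "surreal"[2] or "quirky"[14] output images. Specifically, DALL-E's output for "an illustration of a baby radish in a tutu walking a dog" was mentioned in pieces from Input, NBC, Nature, VentureBeat, Wired, CNN, New Scientist and the BBC[15][16][17][3][18][19][20][21]; its output for the prompt "an armchair in the shape of an avocado" was covered by Wired, VentureBeat, New Scientist, NBC, MIT Technology Review, CNBC, CNN and the BBC[2][3][14][18][19][20][21]. In contrast, machine learning engineer Dale Markowitz reported in TheNextWeb on the visual reasoning skills that DALL-E developed unintentionally, which are sufficient to solve Raven's Matrices (visual tests often administered to humans to measure intelligence)[22].
Nature described DALL-E as "an artificial-intelligence program that can draw pretty much anything you ask for"[17]. Thomas Macaulay of TheNextWeb called its images "striking" and "seriously impressive", singling out its ability, when given a prompt that includes fantastical objects it was never trained on and that combine unrelated ideas, to explore the structure of the prompt and create entirely new pictures[23]. ExtremeTech said that "sometimes the renderings are little better than fingerpainting, but other times they're startlingly accurate portrayals"[24]. TechCrunch noted that, while DALL-E was "fabulously interesting and powerful work", it occasionally produced bizarre or incomprehensible output, and "a lot of images it generates are more than a little… off"[1].
Saying "a green leather purse shaped like a pentagon" may produce what's expected but "a blue suede purse shaped like a pentagon" might produce nightmares. Why? It's hard to say, given the black-box nature of these systems[1].
Nevertheless, DALL-E was described as "remarkably robust to such changes" and as reliable in producing images for a wide variety of arbitrary descriptions[1]. Sam Shead, reporting for CNBC, called its images "quirky" and quoted Neil Lawrence, a professor of machine learning at the University of Cambridge, who described it as an "inspirational demonstration of the capacity of these models to store information about our world and generalize in ways that humans find very natural". He also quoted Mark Riedl, an associate professor at the Georgia Tech School of Interactive Computing, as saying that the demonstration results showed that DALL-E could "coherently blend concepts", a key element of human creativity, and that "the DALL-E demo is remarkable for producing illustrations that are much more coherent than other Text2Image systems I've seen in the past few years." The BBC also quoted Riedl as saying that he was "impressed by what the system could do"[21].
DALL-E can "fill in the blanks", inferring appropriate details without being specifically prompted. ExtremeTech noted that a prompt asking for a penguin wearing a Christmas sweater produced images of penguins that were wearing not only a sweater but also a thematically related Santa hat[24]. Engadget likewise noted that shadows were placed appropriately in the output for the prompt "a painting of a fox sitting in a field during winter"[12]. In other examples, DALL-E also showed a broad understanding of visual and design trends; ExtremeTech said that "you can ask DALL-E for a picture of a phone or vacuum cleaner from a specified period of time, and it understands how those objects have changed"[24]. Engadget also noted its unusual capacity for "understanding how telephones and other objects change over time"[12].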
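Possible implications
OpenAI has declined to release the source code for DALL-E or to allow its use beyond a small number of sample prompts[3]; OpenAI stated that it plans to "analyze the societal impacts"[23] and "potential for bias"[14] of models like DALL-E. Despite the lack of access, at least one possible implication of DALL-E has been discussed, with several journalists and content writers predicting mainly that DALL-E could affect the field of journalism and content writing. Sam Shead's CNBC piece noted that some had concerns about the lack of a published paper describing the system, and that DALL-E had not been "opened sourced" [sic][14].
While TechCrunch said "don't write stock photography and illustration's obituaries just yet"[1], Engadget said: "If developed further, DALL-E has vast potential to disrupt fields like stock photography and illustration, with all the good and bad that entails"[12].
In a Forbes opinion piece, venture capitalist Rob Toews said that DALL-E "presages the dawn of a new AI paradigm known as multimodal AI", in which systems would be capable of "interpreting, synthesizing, and translating between multiple informational modalities". He went on to say that DALL-E demonstrated that "it is becoming harder and harder to deny that artificial intelligence is capable of creativity". Based on the sample prompts (which included clothed mannequins and items of furniture), he predicted that DALL-E might be used by fashion designers and furniture designers, but also that "the technology is going to continue to improve rapidly"[25].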
References
1. Coldewey, Devin (5 January 2021). "OpenAI's DALL-E creates plausible images of literally anything you ask it to". TechCrunch. Retrieved 5 January 2021.
2. Heaven, Will Douglas (5 January 2021). "This avocado armchair could be the future of AI". MIT Technology Review. Retrieved 5 January 2021.
3. Johnson, Khari (5 January 2021). "OpenAI debuts DALL-E for generating images from text". VentureBeat. Retrieved 5 January 2021.
4. Grossman, Gary (16 January 2021). "OpenAI's text-to-image engine, DALL-E, is a powerful visual idea generator". VentureBeat. Retrieved 2 March 2021.
5. Andrei, Mihai (8 January 2021). "This AI module can create stunning images out of any text input". ZME Science. Retrieved 2 March 2021.
6. Walsh, Bryan (5 January 2021). "A new AI model draws images from text". Axios. Retrieved 2 March 2021.
7. "For Its Latest Trick, OpenAI's GPT-3 Generates Images From Text Captions". Synced. 5 January 2021. Retrieved 2 March 2021.
8. Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Retrieved 23 January 2021.
9. Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (14 February 2019). "Language models are unsupervised multitask learners" (PDF). 1 (8). Retrieved 19 December 2020.
10. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (July 22, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
11. Ramesh, Aditya; Pavlov, Mikhail; Goh, Gabriel; Gray, Scott; Voss, Chelsea; Radford, Alec; Chen, Mark; Sutskever, Ilya (24 February 2021). "Zero-Shot Text-to-Image Generation". arXiv:2101.12092 [cs.LG].
12. Dent, Steve (6 January 2021). "OpenAI's DALL-E app generates images from just a description". Engadget. Retrieved 2 March 2021.
13. Dunn, Thom (10 February 2021). "This AI neural network transforms text captions into art, like a jellyfish Pikachu". BoingBoing. Retrieved 2 March 2021.
14. Shead, Sam (8 January 2021). "Why everyone is talking about an image generator released by an Elon Musk-backed A.I. lab". CNBC. Retrieved 2 March 2021.
15. Kasana, Mehreen (7 January 2021). "This AI turns text into surreal, suggestion-driven art". Input. Retrieved 2 March 2021.
16. Ehrenkranz, Melanie (27 January 2021). "Here's DALL-E: An algorithm learned to draw anything you tell it". NBC News. Retrieved 2 March 2021.
17. Stove, Emma (5 February 2021). "Tardigrade circus and a tree of life — January's best science images". Nature. Retrieved 2 March 2021.
18. Knight, Will (26 January 2021). "This AI Could Go From 'Art' to Steering a Self-Driving Car". Wired. Retrieved 2 March 2021.
19. Metz, Rachel (2 February 2021). "A radish in a tutu walking a dog? This AI can draw it really well". CNN. Retrieved 2 March 2021.
20. Stokel-Walker, Chris (5 January 2021). "AI illustrator draws imaginative pictures to go with text captions". New Scientist. Retrieved 4 March 2021.
21. Wakefield, Jane (6 January 2021). "AI draws dog-walking baby radish in a tutu". British Broadcasting Corporation. Retrieved 3 March 2021.
22. Markowitz, Dale (10 January 2021). "Here's how OpenAI's magical DALL-E image generator works". TheNextWeb. Retrieved 2 March 2021.
23. Macaulay, Thomas (6 January 2021). "Say hello to OpenAI's DALL-E, a GPT-3-powered bot that creates weird images from text". TheNextWeb. Retrieved 2 March 2021.
24. Whitwam, Ryan (6 January 2021). "OpenAI's 'DALL-E' Generates Images From Text Descriptions". ExtremeTech. Retrieved 2 March 2021.
25. Toews, Rob (18 January 2021). "AI And Creativity: Why OpenAI's Latest Model Matters". Forbes. Retrieved 2 March 2021.