[Diffusion Study Group] Diffusion Bridge Models


August 27, 24

#DiffusionModels #ImageTranslation #SDEdit #InstructPix2Pix #DDBM


Deep Learning JP

@DeepLearning2023


Text of each page

Diffusion Bridge Models
Photography and disclosure to third parties without permission are prohibited.
2024/08/27 O.Sodtavilan ©︎MATSUO LAB, THE UNIVERSITY OF TOKYO

Introduction
Agenda
1. Score-based Diffusion Models
   - SDE
   - VP-SDE
   - VE-SDE
2. Image 2 Image translations
   - SDEdit
   - InstructPix2Pix
3. DDBM

1. Score-based Diffusion Models
Stochastic Differential Equation (SDE):
Forward process: $dx_t = F(x_t, \sigma_t)\,d\sigma_t + G(\sigma_t)\,d\omega_t$
Reverse process: $dx_t = \left[F(x_t, \sigma_t) - \tfrac{1}{2} G(\sigma_t)^2 \nabla_{x_t} \log p_\sigma(x_t)\right] d\sigma_t + G(\sigma_t)\,d\omega_t$

1. Score-based Diffusion Models
VP-SDE: DDPM-like diffusion models
$F(x_t, \sigma_t) = -\tfrac{1}{2}\beta(\sigma_t)\,x_t$, $G(\sigma_t) = \sqrt{\beta(\sigma_t)}$
Solution of the SDE: $x_t = \alpha_t x_0 + \sigma_t z_t$, where $z_t \sim N(0, I)$
VE-SDE: Variance Exploding
$F(x_t, \sigma_t) = 0$, $G(\sigma_t) = \sqrt{2\sigma_t}$
Solution of the SDE: $x_t = x_0 + \sigma_t z_t$, where $z_t \sim N(0, I)$
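
The two closed-form solutions on this slide can be sampled directly, without simulating the SDE step by step. The following is a minimal NumPy sketch, not the presenter's code. The function names are hypothetical, and the VP case assumes the usual variance-preserving relation $\alpha_t^2 + \sigma_t^2 = 1$, which the slide does not state.

```python
import numpy as np

def perturb_ve(x0, sigma, rng):
    """VE closed form: x_t = x_0 + sigma_t * z, with z ~ N(0, I)."""
    z = rng.standard_normal(x0.shape)
    return x0 + sigma * z

def perturb_vp(x0, alpha, rng):
    """VP closed form: x_t = alpha_t * x_0 + sigma_t * z.

    Assumption (not on the slide): alpha_t^2 + sigma_t^2 = 1.
    """
    sigma = np.sqrt(1.0 - alpha**2)
    z = rng.standard_normal(x0.shape)
    return alpha * x0 + sigma * z

rng = np.random.default_rng(0)
x0 = np.full(100_000, 2.0)               # toy "data": every sample equals 2
x_ve = perturb_ve(x0, sigma=3.0, rng=rng)  # mean stays at 2, std grows to 3
x_vp = perturb_vp(x0, alpha=0.6, rng=rng)  # mean shrinks to 1.2, std is 0.8
```

In the VE case the signal is left untouched and only the noise level grows, while in the VP case the signal is scaled down as noise is added so that the total variance stays bounded.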

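1. Score-based Diffusion Models
Score-matching
$dx_t = \left[F(x_t, \sigma_t) - \tfrac{1}{2} G(\sigma_t)^2 \nabla_{x_t} \log p_\sigma(x_t)\right] d\sigma_t + G(\sigma_t)\,d\omega_t$
Train this parameter: the score term $\nabla_{x_t} \log p_\sigma(x_t)$ is approximated by a network $s_\theta$.
$\mathcal{L}(\theta) = \mathbb{E}_{x_t \sim p(x_t \mid x_0),\ x_0 \sim p_{data},\ t \sim U(0, T)} \left[ \left\| s_\theta(x_t, \sigma_t) - \nabla_{x_t} \log p_\sigma(x_t) \right\|^2 \right]$
Sampling = Solving ODE:
$dx_t = \left[F(x_t, \sigma_t) - \tfrac{1}{2} G(\sigma_t)^2 s_\theta(x_t, \sigma_t)\right] d\sigma_t$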
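
The score-matching objective can be estimated by Monte Carlo. The sketch below is an illustration under stated assumptions, not the presenter's implementation: it uses the VE perturbation $x_t = x_0 + \sigma z$ at a single noise level, and it regresses on the conditional score $-(x_t - x_0)/\sigma^2$, which is the standard denoising score matching target. The toy data distribution $N(0, 1)$ and all function names are hypothetical.

```python
import numpy as np

def dsm_loss(score_fn, x0, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss (VE case)."""
    z = rng.standard_normal(x0.shape)
    xt = x0 + sigma * z
    # Score of the Gaussian kernel p(x_t | x_0): equals -z / sigma.
    target = -(xt - x0) / sigma**2
    return np.mean((score_fn(xt, sigma) - target) ** 2)

S_DATA = 1.0  # std of the toy data distribution N(0, S_DATA^2) (assumption)

def gaussian_score(x, sigma):
    """Exact marginal score when the data are N(0, S_DATA^2)."""
    return -x / (S_DATA**2 + sigma**2)

def zero_score(x, sigma):
    """A deliberately poor model that always predicts zero."""
    return np.zeros_like(x)

rng = np.random.default_rng(0)
x0 = S_DATA * rng.standard_normal(50_000)
loss_good = dsm_loss(gaussian_score, x0, 1.0, rng)
loss_zero = dsm_loss(zero_score, x0, 1.0, rng)
```

The exact marginal score does not drive this loss to zero, because the regression target depends on the unobserved $x_0$. It does, however, attain a lower value than the zero predictor, which is the property training relies on.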

2. Image 2 Image translations
Image 2 Image translation task: [Figure: a sample input image and the corresponding generated image]

2. Image 2 Image translations
Image 2 Image translation methods:
- SDEdit: Denoising Translation
- InstructPix2Pix: Conditional Translation, Classifier-Free Guidance (CFG) for 2 conditions
- Bridge Models: Optimal Transport, $T_{\#} q = p$ with $y \sim q(y)$ and $x \sim p(x)$

2. Image 2 Image translations
SDEdit
Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J., & Ermon, S. (2021). SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. International Conference on Learning Representations.
✓ Improved sampling speed.
✓ Easy to integrate with large diffusion models.
× Realism-faithfulness trade-off.

2. Image 2 Image translations
SDEdit
Realism-faithfulness trade-off and its sweet spot: [Figure not captured in the text]
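
The trade-off can be reproduced on a one-dimensional toy problem. The sketch below is a simplified illustration, not SDEdit as published: it perturbs a guide to an intermediate noise level and then integrates the deterministic VE probability flow ODE from the earlier slide with Euler steps, whereas the paper denoises with the reverse SDE. The data distribution $N(0, 1)$, its exact score, and all names are assumptions made for the example.

```python
import numpy as np

MU, S = 0.0, 1.0  # toy data distribution N(MU, S^2) (assumption)

def score(x, sigma):
    """Exact score of the VE-perturbed marginal N(MU, S^2 + sigma^2)."""
    return -(x - MU) / (S**2 + sigma**2)

def sdedit(guide, sigma0, rng, n_steps=500):
    """Noise the guide up to sigma0, then denoise down to sigma = 0."""
    x = guide + sigma0 * rng.standard_normal(guide.shape)
    sigmas = np.linspace(sigma0, 0.0, n_steps + 1)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # VE probability flow ODE: dx/dsigma = -(1/2) G^2 * score = -sigma * score
        drift = -s_cur * score(x, s_cur)
        x = x + drift * (s_next - s_cur)  # Euler step (s_next < s_cur)
    return x

rng = np.random.default_rng(0)
guide = np.full(2000, 5.0)  # an "unrealistic" guide, far from the data mean

out_small = sdedit(guide, sigma0=0.5, rng=rng)  # little noise: stays near the guide
out_large = sdedit(guide, sigma0=5.0, rng=rng)  # much noise: pulled toward the data

faith_small = np.mean(np.abs(out_small - guide))
faith_large = np.mean(np.abs(out_large - guide))
```

A small starting noise level keeps the output close to the guide but far from the data distribution, and a large one does the opposite. The sweet spot on the slide refers to choosing an intermediate level between these extremes.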


2. Image 2 Image translations
InstructPix2Pix
Brooks, T., Holynski, A., & Efros, A.A. (2022). InstructPix2Pix: Learning to Follow Image Editing Instructions. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18392-18402.
[Figure: text-conditioned CFG (Stable Diffusion) compared with CFG for 2 conditions, both with the instruction “Turn him into a cyborg!”]
✓ No realism-faithfulness trade-off.
✓ Visual quality is good.
× Relies on guidance or projected sampling (a cost of building it into the generation process).
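
CFG for 2 conditions combines three noise predictions per step: fully unconditional, image-conditioned only, and image-plus-text conditioned. The sketch below follows the combination rule given in the InstructPix2Pix paper, as far as I understand it. The function and argument names are hypothetical, and the arrays stand in for the outputs of a real denoising network.

```python
import numpy as np

def cfg_two_conditions(eps_uncond, eps_img, eps_full, s_img, s_txt):
    """Combine three noise predictions with separate guidance scales.

    eps_uncond: prediction with neither image nor text condition
    eps_img:    prediction with the image condition only
    eps_full:   prediction with both image and text conditions
    s_img:      image guidance scale
    s_txt:      text guidance scale
    """
    return (
        eps_uncond
        + s_img * (eps_img - eps_uncond)
        + s_txt * (eps_full - eps_img)
    )

# Stand-in predictions; a real model would produce these from the same latent.
eps_uncond = np.array([0.0, 0.0])
eps_img = np.array([1.0, 0.0])
eps_full = np.array([1.0, 2.0])

plain = cfg_two_conditions(eps_uncond, eps_img, eps_full, s_img=1.0, s_txt=1.0)
guided = cfg_two_conditions(eps_uncond, eps_img, eps_full, s_img=1.5, s_txt=7.5)
```

With both scales set to 1 the rule reduces to the fully conditioned prediction. Larger scales extrapolate along the image and text directions independently, at the price of three network evaluations per sampling step.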


2. Image 2 Image translations
InstructPix2Pix: building the conditions into the generation process
(1) Generate text edits: GPT-3 (finetuned)
    Input Caption: “photograph of a girl riding a horse”
    Instruction: “have her ride a dragon”
    Edited Caption: “photograph of a girl riding a dragon”
(2) Generate paired images: Stable Diffusion + Prompt2Prompt
    Input Caption: “photograph of a girl riding a horse”
    Edited Caption: “photograph of a girl riding a dragon”
Generated training examples: “have her ride a dragon”, “Color the cars pink”, “Make it lit by fireworks”, “convert to brick”, …


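3. DDBM
De Bortoli, V., et al. (2021). Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling. ArXiv, abs/2106.01357.
Definition of the Schrödinger Bridge Problem
Proposed by Schrödinger when computing the state transitions of particles (?)
・State of the particle before the motion: $\psi_1$
・State of the particle after the motion: $\psi_2$
Transition of the probability density: $|\psi_1|^2 \to |\psi_2|^2$
Optimal transport problem on the probability space: [equation not captured in the text]
Parameters:
$p_{data}$: dataset distribution, $x_0 \sim p_{data} = \pi_0$
$p_{prior}$: condition distribution, $x_N \sim p_{prior} = \pi_N$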
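
For reference, the Schrödinger bridge problem in the cited De Bortoli et al. paper is commonly written as a KL minimization over path measures with both endpoint marginals fixed. The statement below is offered as a reading aid and may differ from what the slide shows; $p^{\mathrm{ref}}$, the reference path measure (for example, the forward noising process), is notation assumed here.

```latex
\pi^{\star} = \operatorname*{arg\,min}_{\pi}
\left\{ \mathrm{KL}\!\left(\pi \,\|\, p^{\mathrm{ref}}\right)
\;:\; \pi_0 = p_{\mathrm{data}},\ \pi_N = p_{\mathrm{prior}} \right\}
```

Among all processes that start at $p_{data}$ and end at $p_{prior}$, the bridge is the one closest to the reference dynamics.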


3. DDBM
Denoising Diffusion Bridge Model
Zhou, L., Lou, A., Khanna, S., & Ermon, S. (2023). Denoising Diffusion Bridge Models. ArXiv, abs/2309.16948.
✓ No realism-faithfulness trade-off.
✓ Visual quality is good.
✓ Does not rely on guidance or projected sampling.
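
In a diffusion bridge both endpoints are pinned: $x_0$ is the target image and $x_T$ is the source image. For the VE schedule, the intermediate state given both endpoints is Gaussian, and the sketch below samples from it. It follows my reading of the bridge marginal in the DDBM paper specialized to the VE case; the function name and the toy endpoints are hypothetical.

```python
import numpy as np

def ve_bridge_sample(x0, xT, sigma_t, sigma_T, rng):
    """Sample x_t given both endpoints of a VE diffusion bridge.

    mean = r * xT + (1 - r) * x0,  std = sigma_t * sqrt(1 - r),
    where r = sigma_t^2 / sigma_T^2.
    """
    r = sigma_t**2 / sigma_T**2
    mean = r * xT + (1.0 - r) * x0
    std = sigma_t * np.sqrt(1.0 - r)
    return mean + std * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = np.zeros(50_000)       # toy target endpoint
xT = np.full(50_000, 4.0)   # toy source endpoint (the condition)

x_start = ve_bridge_sample(x0, xT, sigma_t=0.0, sigma_T=2.0, rng=rng)
x_mid = ve_bridge_sample(x0, xT, sigma_t=1.0, sigma_T=2.0, rng=rng)
x_end = ve_bridge_sample(x0, xT, sigma_t=2.0, sigma_T=2.0, rng=rng)
```

The noise vanishes at both ends, so the process starts exactly at $x_0$ and lands exactly on $x_T$. This is why no separate guidance term is needed to tie the sample to the condition.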


3. DDBM
Research Ideas:
・Computational Costs: three model evaluations are required to compute a single step.
・Apply to Large Models: could DDBM be integrated with Stable Diffusion by extending the schedule of step t? → Since unconditional sampling may be possible, fine-tune Stable Diffusion into a bridge model?
