April 19, 21
2021/04/16
Deep Learning JP:
http://deeplearning.jp/seminar-2/
DL Reading Group material
NeRF-VAE: A Geometry Aware 3D Scene Generative Model
Shohei Taniguchi, Matsuo Lab
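Overview
• A NeRF that can reconstruct and generate unseen scenes
• Authors: Adam R. Kosiorek, Heiko Strathmann, Daniel Zoran, Pol Moreno, Rosalia Schneider, Soňa Mokrá, Danilo J. Rezende
• DeepMind
• A model that uses NeRF as the generator of GQN
• The last author, Rezende, is also one of the proposers of GQN
• Written in the ICML format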
Outline
1. Background
 • Neural Radiance Fields (NeRF)
 • Generative Query Networks (GQN)
2. Method: NeRF-VAE
3. Experiments
4. Summary
Background
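NeRF [Mildenhall et al., ECCV2020]
• A neural network that takes a 3D coordinate (x) and a viewing direction (d) as input and outputs a radiance (r, g, b) and a density σ (the scene function):
 $F_\theta : (x, d) \mapsto ((r, g, b), \sigma)$
• Trained on photos of a scene taken from many different angles
 ➡︎ it can then generate photos of the scene from new angles (novel view synthesis)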
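NeRF [Mildenhall et al., ECCV2020]
• Represents a scene as a function that maps a 3D coordinate and a viewing direction to radiance and density
• Once this function is known, an image from an arbitrary viewpoint can be rendered with volume rendering (see the earlier reading-group slides [1, 2] for details)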
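To make the volume-rendering step concrete, here is a minimal plain-Python sketch of the standard quadrature along a single ray: each sample contributes its color weighted by its opacity and by the transmittance accumulated in front of it. It is an illustration under stated assumptions, not the authors' implementation: `render_ray`, `toy_sphere`, and the sample spacing are hypothetical, and the last segment is closed at `t_far` (implementations differ on how they treat the final interval).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# A scene function maps (position x, viewing direction d) to ((r, g, b), sigma).
SceneFn = Callable[[Vec3, Vec3], Tuple[Vec3, float]]


def render_ray(scene: SceneFn, origin: Vec3, direction: Vec3,
               t_vals: Sequence[float], t_far: float) -> Vec3:
    """Quadrature estimate of the volume-rendering integral along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i = prod_{j<i} (1 - alpha_j) = exp(-sum_{j<i} sigma_j * delta_j)."""
    color: List[float] = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    for i, t in enumerate(t_vals):
        # Length of the segment owned by sample i (hypothetical: last ends at t_far).
        t_next = t_vals[i + 1] if i + 1 < len(t_vals) else t_far
        delta = t_next - t
        x = (origin[0] + t * direction[0],
             origin[1] + t * direction[1],
             origin[2] + t * direction[2])
        rgb, sigma = scene(x, direction)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return (color[0], color[1], color[2])


def toy_sphere(x: Vec3, d: Vec3) -> Tuple[Vec3, float]:
    """Hypothetical scene: a solid red unit sphere at the origin (d is ignored)."""
    inside = x[0] ** 2 + x[1] ** 2 + x[2] ** 2 <= 1.0
    return (1.0, 0.0, 0.0), (50.0 if inside else 0.0)


ts = [0.02 * i for i in range(200)]  # evenly spaced samples in [0, 4)
hit = render_ray(toy_sphere, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0), ts, 4.0)
miss = render_ray(toy_sphere, (0.0, 2.0, -2.0), (0.0, 0.0, 1.0), ts, 4.0)
```

The ray that passes through the dense sphere renders as (almost exactly) pure red, while the ray that misses it renders as black, because zero density contributes zero opacity.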
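NeRF [Mildenhall et al., ECCV2020]
• Training minimizes the squared error between the rendered images and the ground-truth images
• Because volume rendering is differentiable, the model can be trained end to end
• There are many tricks, for example in how the sample points used for rendering are chosen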
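Two of the ingredients above can be sketched in a few lines: stratified sampling of points along a ray (the NeRF paper splits the near/far interval into evenly spaced bins and draws one uniform sample per bin) and the squared-error loss between rendered and ground-truth colors. This is a hedged sketch: the function names are hypothetical, and the hierarchical (coarse-to-fine) resampling NeRF also uses is not shown.

```python
import random
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def stratified_samples(t_near: float, t_far: float, n_bins: int,
                       rng: random.Random) -> List[float]:
    """Split [t_near, t_far] into n_bins equal bins and draw one uniform
    sample from each, so samples cover the ray but change every iteration."""
    width = (t_far - t_near) / n_bins
    return [t_near + (i + rng.random()) * width for i in range(n_bins)]


def squared_error_loss(rendered: Sequence[RGB], target: Sequence[RGB]) -> float:
    """Sum over rays of ||C_hat(r) - C(r)||^2 (rendered vs. ground-truth color)."""
    return sum((a - b) ** 2
               for c_hat, c in zip(rendered, target)
               for a, b in zip(c_hat, c))


rng = random.Random(0)
ts = stratified_samples(2.0, 6.0, 8, rng)
loss = squared_error_loss([(1.0, 0.5, 0.0)], [(0.5, 0.5, 0.0)])
```

Randomizing the sample positions per bin lets the network be queried at continuous positions over training, rather than at a fixed grid of depths.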
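NeRF [Mildenhall et al., ECCV2020]
Pros
• A breakthrough as a representation of 3D scenes
• Earlier representations such as point clouds and meshes are discrete and costly
• An implicit representation based on a neural network can capture complex scenes in fine detail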
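NeRF [Mildenhall et al., ECCV2020]
Cons
• A separate model must be optimized for every scene
• Whenever a new scene is obtained, a model has to be trained on it
• Many images must be prepared for each scene
• Training takes 1 to 2 days per scene
• (Naturally) it cannot generate new scenes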
GQN [Eslami et al., 2018]
• A VAE that reconstructs 3D scenes
• Uses an encoder to reconstruct new scenes quickly
• The model is convolution-based
• See the earlier reading-group slides [3] for details
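GQN [Eslami et al., 2018]
• Let I be the image seen from viewpoint c, and represent the scene with a latent variable z
• As in a VAE, the model is trained by maximizing the variational lower bound:
$$
\begin{aligned}
\log p\left(\{I_k\}_{k=1}^{N} \mid \{c_k\}_{k=1}^{N}\right)
&= \log \int p(z) \prod_{k=1}^{N} p(I_k \mid c_k, z)\,dz \\
&\ge \mathbb{E}_{q(z \mid \{I_k, c_k\}_{k=1}^{N})}\left[\sum_{k=1}^{N} \log p(I_k \mid c_k, z)\right] - D_{\mathrm{KL}}(q \,\|\, p)
\end{aligned}
$$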
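The lower bound above can be estimated with the reparameterization trick. The sketch below assumes, for simplicity, a diagonal Gaussian posterior and a standard normal prior so that the KL term has a closed form; the actual models may use richer latent distributions (GQN's is autoregressive). `elbo`, `kl_to_standard_normal`, and the `log_lik` callback are hypothetical names, with `log_lik(z)` standing in for the decoder's summed log-likelihood over all N views.

```python
import math
import random
from typing import Callable, List, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def elbo(mu: Sequence[float], log_var: Sequence[float],
         log_lik: Callable[[List[float]], float],
         n_samples: int, rng: random.Random) -> float:
    """Monte Carlo ELBO: E_q[sum_k log p(I_k | c_k, z)] - KL(q || p).
    log_lik(z) must return sum_k log p(I_k | c_k, z) for one latent sample z."""
    total = 0.0
    for _ in range(n_samples):
        # Reparameterization: z = mu + std * eps with eps ~ N(0, I)
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
        total += log_lik(z)
    return total / n_samples - kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
# Toy check with a constant likelihood: the ELBO is then -3 minus the KL term.
value = elbo([1.0, 0.0], [0.0, 0.0], lambda z: -3.0, n_samples=4, rng=rng)
```

Because `z` is written as a deterministic function of `mu`, `log_var`, and independent noise, gradients of this estimate can flow back into the encoder that produced `mu` and `log_var`.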
GQN [Eslami et al., 2018]
• A new scene can be reconstructed quickly by using the encoder q:
$$
p\left(I \mid c, \{I_k, c_k\}_{k=1}^{M}\right) \approx \mathbb{E}_{q(z \mid \{I_k, c_k\}_{k=1}^{M})}\left[p(I \mid c, z)\right]
$$
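The approximation above amounts to averaging the query-view likelihood over latents drawn from the encoder's posterior. A minimal sketch, assuming the encoder has already produced a diagonal Gaussian `N(mu, diag(exp(log_var)))` from the M context views; `log_predictive` and `log_lik_query` are hypothetical names. Working in log space with a log-mean-exp avoids underflow, since image likelihoods are tiny numbers.

```python
import math
import random
from typing import Callable, List, Sequence


def log_predictive(mu: Sequence[float], log_var: Sequence[float],
                   log_lik_query: Callable[[List[float]], float],
                   n_samples: int, rng: random.Random) -> float:
    """Estimate log p(I | c, context) ~= log (1/S) sum_s p(I | c, z_s),
    where z_s ~ q(z | context) = N(mu, diag(exp(log_var))).
    log_lik_query(z) returns log p(I | c, z) for the query view."""
    logs = []
    for _ in range(n_samples):
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
        logs.append(log_lik_query(z))
    # log-mean-exp: subtract the max before exponentiating for stability
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / n_samples)


rng = random.Random(0)
est = log_predictive([0.0], [0.0], lambda z: -2.0, n_samples=8, rng=rng)
```

Only a single encoder pass is needed per new scene; no per-scene optimization is run, which is what makes this inference "amortized".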
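GQN [Eslami et al., 2018]
Pros
• The encoder reconstructs unseen scenes quickly (amortized inference)
• Training does not take very long
Cons
• It does not use geometric information, so the reconstructed images are not consistent across views
• It cannot generate images as cleanly as NeRF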
Method
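NeRF-VAE
• Extends NeRF so that it can reconstruct unseen scenes, by giving it a latent variable and training it like a VAE
• Adds a latent variable to the input of the scene function:
 $G_\theta(\cdot, z) : (x, d) \mapsto ((r, g, b), \sigma)$
• The scene-function parameters θ learn the structure shared by all scenes, while the latent variable z captures the features specific to each scene
• Sampling from the prior p(z) also makes it possible to generate new scenes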
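The split between shared parameters θ and a per-scene latent z can be illustrated with a toy scene function. This sketch conditions on z by simply concatenating it to (x, d) and feeding a tiny one-hidden-layer MLP; the paper itself uses an attention-based architecture, so everything here (`init_params`, `make_scene_fn`, the tanh hidden layer, sigmoid colors, softplus density, the layer sizes) is a hypothetical stand-in. Two latents drawn from the prior N(0, I) give two different scenes from the same θ.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Params = Tuple[List[List[float]], List[List[float]]]
SceneFn = Callable[[Vec3, Vec3], Tuple[Vec3, float]]


def sigmoid(v: float) -> float:
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def softplus(v: float) -> float:
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def init_params(in_dim: int, hidden: int, rng: random.Random) -> Params:
    """Hypothetical shared parameters theta of a one-hidden-layer MLP."""
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
          for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
          for _ in range(4)]  # 3 color outputs + 1 density output
    return w1, w2


def make_scene_fn(theta: Params, z: Sequence[float]) -> SceneFn:
    """Return G_theta(., z): (x, d) -> ((r, g, b), sigma) for one fixed latent z."""
    w1, w2 = theta

    def scene(x: Vec3, d: Vec3) -> Tuple[Vec3, float]:
        inp = list(x) + list(d) + list(z)  # condition on z by concatenation
        h = [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in w1]
        out = [sum(w * v for w, v in zip(row, h)) for row in w2]
        rgb = (sigmoid(out[0]), sigmoid(out[1]), sigmoid(out[2]))  # in [0, 1]
        return rgb, softplus(out[3])  # density sigma >= 0

    return scene


latent_dim = 4
rng = random.Random(0)
theta = init_params(3 + 3 + latent_dim, 16, rng)  # shared across all scenes
# Two "new scenes": latents sampled from the prior p(z) = N(0, I)
scene_a = make_scene_fn(theta, [rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
scene_b = make_scene_fn(theta, [rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
query = ((0.1, 0.2, 0.3), (0.0, 0.0, 1.0))
out_a = scene_a(*query)
out_b = scene_b(*query)
```

Each returned `scene` has the same signature as a plain NeRF scene function, so the same volume renderer can be used unchanged for every latent.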
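NeRF-VAE: Optimization
• Let $\hat{I} = \mathrm{render}(G_\theta(\cdot, z), c)$ be the image rendered from viewpoint c; the likelihood is then
$$
p_\theta(I \mid z, c) = \prod_{i,j} \mathcal{N}\left(I(i, j) \mid \hat{I}(i, j), \sigma_{\mathrm{lik}}^2\right)
$$
 (σ_lik is the pixel noise scale, not the density σ)
• As in GQN, training maximizes the variational lower bound:
$$
\mathbb{E}_{q(z \mid \{I_k, c_k\}_{k=1}^{N})}\left[\sum_{k=1}^{N} \log p(I_k \mid c_k, z)\right] - D_{\mathrm{KL}}(q \,\|\, p)
$$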
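The per-pixel Gaussian likelihood above can be written directly as a sum of log-densities. This sketch assumes a grayscale image stored as a list of rows so that `image[i][j]` plays the role of I(i, j); `gaussian_log_lik` is a hypothetical name, and a color image would simply add a channel index to the sum.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]  # grayscale image indexed as image[i][j]


def gaussian_log_lik(image: Image, rendered: Image, sigma_lik: float) -> float:
    """log p_theta(I | z, c) = sum_{i,j} log N(I(i,j) | I_hat(i,j), sigma_lik^2)."""
    var = sigma_lik ** 2
    log_norm = -0.5 * math.log(2.0 * math.pi * var)  # per-pixel normalizer
    return sum(log_norm - (a - b) ** 2 / (2.0 * var)
               for row, row_hat in zip(image, rendered)
               for a, b in zip(row, row_hat))


image = [[0.0, 1.0], [0.5, 0.25]]
image_hat = [[0.0, 0.5], [0.5, 0.25]]  # one pixel off by 0.5
ll_perfect = gaussian_log_lik(image, image, sigma_lik=1.0)
ll = gaussian_log_lik(image, image_hat, sigma_lik=1.0)
```

Up to the constant normalizer, this log-likelihood is the negative squared error scaled by 1/(2σ_lik²), so the reconstruction term of NeRF-VAE plays the same role as NeRF's squared-error loss.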
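NeRF-VAE: Implementation details
1. The encoder q embeds each image with a ResNet, averages the resulting features, and maps the average to the parameters of a Gaussian distribution
2. Iterative amortized inference is used when running the encoder
3. The scene function $G_\theta(\cdot, z)$ uses an attention-based architecture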
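Item 1 can be sketched as: embed each (image, camera) view, take the mean of the embeddings, and map the mean linearly to a diagonal Gaussian. This is a heavily simplified, hypothetical sketch: `toy_embed` stands in for the per-image ResNet, the linear heads `w_mu` / `w_logvar` are assumptions, and the iterative amortized inference refinement from item 2 is not shown. Averaging makes the encoder independent of the order in which views are given.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

View = Tuple[Sequence[float], Sequence[float]]  # (flattened pixels I_k, camera c_k)


def toy_embed(view: View) -> List[float]:
    """Hypothetical stand-in for the per-image ResNet: mean intensity + camera."""
    pixels, camera = view
    return [sum(pixels) / len(pixels)] + list(camera)


def encode(views: Sequence[View], embed: Callable[[View], List[float]],
           w_mu: Sequence[Sequence[float]], w_logvar: Sequence[Sequence[float]]
           ) -> Tuple[List[float], List[float]]:
    """q(z | {I_k, c_k}): embed every view, average the features, and map the
    average linearly to the mean and log-variance of a diagonal Gaussian."""
    feats = [embed(v) for v in views]
    pooled = [sum(col) / len(feats) for col in zip(*feats)]  # mean over views
    mu = [sum(w * f for w, f in zip(row, pooled)) for row in w_mu]
    log_var = [sum(w * f for w, f in zip(row, pooled)) for row in w_logvar]
    return mu, log_var


def sample_z(mu: Sequence[float], log_var: Sequence[float],
             rng: random.Random) -> List[float]:
    """Reparameterized sample z = mu + exp(log_var / 2) * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)
feat_dim, latent_dim = 4, 2  # 1 intensity feature + 3 camera features
w_mu = [[rng.gauss(0.0, 0.5) for _ in range(feat_dim)] for _ in range(latent_dim)]
w_logvar = [[rng.gauss(0.0, 0.5) for _ in range(feat_dim)] for _ in range(latent_dim)]
views = [([0.1, 0.4, 0.7], [1.0, 0.0, 0.0]),
         ([0.9, 0.2, 0.3], [0.0, 1.0, 0.0]),
         ([0.5, 0.5, 0.5], [0.0, 0.0, 1.0])]
mu, log_var = encode(views, toy_embed, w_mu, w_logvar)
mu_rev, log_var_rev = encode(views[::-1], toy_embed, w_mu, w_logvar)
z = sample_z(mu, log_var, rng)
```

Reversing the list of views leaves the posterior parameters unchanged (up to floating-point rounding), which is the property mean pooling is chosen for.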
Experiments
Comparison with NeRF
• Works well with fewer viewpoints than NeRF
• When there are enough viewpoints, NeRF produces cleaner images (as expected)
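Comparison with GQN (CONV-AR-VAE): rendering consistency
• GQN is inconsistent (objects appear and disappear between views)
• The proposed method is always consistent, because NeRF builds in geometric prior knowledge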
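Comparison with GQN (CONV-AR-VAE): out-of-distribution generalization
• GQN cannot render well from viewpoints it never saw during training
• The proposed method generalizes well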
Generating new scenes
• New scenes can be generated by sampling from the prior
• In principle GQN can do this too, but it probably cannot generate scenes this cleanly
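Summary & impressions
• Proposes NeRF-VAE, a model that can reconstruct and generate unseen scenes by combining NeRF with a VAE
• Enables consistent rendering and the generation of new scenes
Impressions
• A straightforward extension that looks promising, but the experimental results felt somewhat weak
• If it scaled to scenes as complex as those NeRF handles, it would be quite impressive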
References
[1] [DL Reading Group] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (https://www.slideshare.net/DeepLearningJP2016/dlnerf-representingscenes-as-neural-radiance-fields-for-view-synthesis)
[2] [DL Reading Group] A Summary of Research Derived from Neural Radiance Field (NeRF) (https://www.slideshare.net/DeepLearningJP2016/dlneural-radiance-field-nerf?ref=https://deeplearning.jp/)
[3] [DL Reading Group] GQN, Related Work, and Its Relation to World Models (https://www.slideshare.net/DeepLearningJP2016/dlgqn-111725780)