【DL輪読会】Mastering Diverse Domains through World Models

1.4K Views

January 16, 23

スライド概要

2023/1/13
Deep Learning JP
http://deeplearning.jp/seminar-2/

シェア

またはPlayer版

埋め込む »CMSなどでJSが使えない場合

(ダウンロード不可)

関連スライド

各ページのテキスト
1.

Mastering Diverse Domains through World Models Shohei Taniguchi, Matsuo Lab

2.

ॻࢽ৘ใ Mastering Diverse Domains through World Models https://arxiv.org/abs/2301.04104 • ஶऀ • Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap • ֓ཁ • ੈքϞσϧΛ࢖ͬͨ‫ڧ‬Խֶशख๏Dreamerͷվળ൛ (ver. 3) • εΫϥονͷ‫ڧ‬ԽֶशͰॳΊͯMinecraftͰμΠϠϞϯυΛͱΔ͜ͱʹ੒ޭ 2

3.

Minecraft ObtainDiamond • MinecraftͰμΠϠϞϯυΛͱΔλεΫ • ใु͸ɼதؒΞΠςϜ͔μΠϠΛͱͬͨͱ͖ͷΈಘΒΕΔ • NeurIPSͰ2019೥͔Βίϯϖ͕ߦΘΕ͓ͯΓɼRL‫ڀݚ‬ͷ1ͭϚΠϧετʔϯ • ͜Ε·ͰεΫϥονͷRLͰμΠϠ֫ಘ·Ͱ ੒ޭͨ͠ྫ͸ͳ͠ • ਓؒͷσϞΛ࢖͏ख๏Ͱͷ੒ޭྫ͸͋Γ

4.

ൃද֓ཁ • લఏ஌ࣝ • ੈքϞσϧ x ‫ڧ‬Խֶश • PlaNet, Dreamer, DreamerV2 • DreamerV3 • ·ͱΊ εϥΠυͷҰ෦ΛҎԼ͔Βྲྀ༻͍ͯ͠·͢ 4 https://www.slideshare.net/ShoheiTaniguchi2/ss-238325780

5.

‫ڧ‬Խֶशͷ՝୊ αϯϓϧޮ཰ • ֶशʹେྔͷ͕͔͔࣌ؒΔ • ϩϘοτͳͲ͸ͦΜͳʹසൟʹ࣮‫ֶͰػ‬शͤ͞Δͷ͸ίετతʹ‫͍͠ݫ‬ 5

6.

ੈքϞσϧ x ‫ڧ‬Խֶश ‫ڥ؀‬ͷϞσϧΛਂ૚ֶशͰ֫ಘͰ͖Ε͹ ͦͷϞσϧ಺Ͱ‫ڥ؀‬ΛγϛϡϨʔτͯ͠ ํࡦΛֶशͰ͖Δ͸ͣ ➡ ੈքϞσϧ 6

7.

ੈքϞσϧ x ‫ڧ‬Խֶश ֶशͷྲྀΕ 1. ํࡦ π Ͱ‫͔ڥ؀‬Βσʔλ D ΛूΊΔ D = {x1, a1, r1, …, xT, aT, rT} 2. D Λ༻͍ͯੈքϞσϧ pψ Λֶश pψ (x1:T, r1:T ∣ a1:T) 3. ੈքϞσϧΛ༻͍ͯํࡦ π Λߋ৽ https://arxiv.org/abs/1903.00374 • 1 ~ 3Λ‫܁‬Γฦ͢ 7

8.

World Models https://worldmodels.github.io/ [Ha and Schmidhuber, 2018] • ੈքϞσϧ‫ܥ‬ͷ‫ڀݚ‬ͷ૸Γͱ͍͑Δ࿦จ • ੈքϞσϧͷֶशɿVAE + MDN-RNN • ํࡦͷֶशɿCMA-ES • ࠓճ͸ৄ͍͠಺༰͸ׂѪ͠·͢ ʢҎԼͷεϥΠυͳͲΛࢀরʣ https://www.slideshare.net/masa_s/ss-97848402 8 https://arxiv.org/abs/1803.10122

9.

https://planetrl.github.io/ PlaNet [Hafner, et al., 2019] • ੈքϞσϧͷֶशɿ • Recurrent State Space Model ্ɿ࣮‫Ͱڥ؀‬ͷϩʔϧΞ΢τ • ํࡦͷֶशɿCEM ԼɿੈքϞσϧʹΑΔγϛϡϨʔγϣϯ • ϞσϧϑϦʔͱ΄΅ಉ౳ͷੑೳ DM Control SuiteͰͷ࣮‫݁ݧ‬Ռ 9 https://arxiv.org/abs/1811.04551

10.

Ψ΢ε‫ܕ‬ঢ়ଶۭؒϞσϧ Gaussian State Space Model • ঢ়ଶભҠ֬཰ʹਖ਼‫ن‬෼෍Λ࢖͏Ϟσϧ pψ (st+1 ∣ st, at) • = Normal μψ (st, at), diag σψ2 (st, at) ( ( )) • 2 ؔ਺ μψ, σψ ʹ͸DNNͳͲΛ༻͍Δ • ͜Εͩͱ࣮‫ݧ‬తʹ͏·͍͔͘ͳ͍ʢޯ഑ফࣦͳͲʣ 10 rt rt+1 at at+1 st st+1 ot ot+1

11.

࠶‫ؼ‬తঢ়ଶۭؒϞσϧ Recurrent State Space Model (RSSM) • ঢ়ଶ s Λܾఆ࿦తʹભҠ͢Δ h ͱ ֬཰తʹભҠ͢Δ z ʹ෼͚ͯϞσϧԽ͢Δ ht+1 = fψ (ht, st, at) pψ (st ∣ ht) = 2 Normal (μψ (ht), diag (σψ (ht))) • fψ ͸LSTMͳͲͷRNN‫ܕ‬ͷؔ਺ 11 rt rt+1 at at+1 ht ht+1 st st+1 xt xt+1

12.

࠶‫ؼ‬తঢ়ଶۭؒϞσϧ Recurrent State Space Model (RSSM) RSSMΛ࢖͏ͱ͔ͳΓੑೳ্͕͕Δ 12

13.

https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html Dreamer [Hafner, et al., 2019] • PlaNetΛϕʔεʹͯ͠ɺ ํࡦͷֶशΛActor-Critic‫ʹܕ‬มߋ • Ձ஋ؔ਺ʹ λ ऩӹΛ༻͍Δ • PlaNet͔Βੑೳ͕େ෯ʹվળ 13 https://arxiv.org/abs/1912.01603

14.

Ձ஋ؔ਺ͷਪఆ ϕϧϚϯํఔࣜ V (st) = π π [r (st, at)] + V (st+1) π n εςοϓʹ֦ு͢Δͱ = r s , a + V s ( ) ( π ∑ t+k t+k t+n) [ k=1 ] π 𝔼 𝔼 π Vn (st) n−1 14

15.

Ձ஋ؔ਺ͷਪఆ π Vn (st) = n−1 π r (st+k, at+k) + V (st+n) ∑ [ k=1 ] π n = 1,…, ∞ Ͱࢦ਺ฏ‫ۉ‬ΛͱΔͱ V̄ (st, λ) = (1 − π ∞ n−1 π λ) λ Vn (st) ∑ n=1 𝔼 ͜ΕΛ λ ऩӹͱ‫Ϳݺ‬ 15

16.

Ձ஋ؔ਺ͷਪఆ DreamerͰ͸ɺλ ऩӹΛՁ஋ؔ਺ͷλʔήοτͱ͢Δ θ ← θ − ηθ ∇θ pψ,πϕ [ πϕ Vθ (st) − V̄ (st, λ) π ] 2 ͨͩ͠ɺࢦ਺ฏ‫ۉ‬ͷ࿨͸ద౰ͳେ͖͞ʢHͱ͢ΔʣͰଧͪ੾Δ V̄ (st, λ) ≈ (1 − n−1 π λ) λ Vn (st) ∑ n=1 𝔼 π H−1 16 + H−1 π λ VH (st)

17.

λ ऩӹͷޮՌ No value͸ํࡦޯ഑๏Ͱֶशͨ͠৔߹ͷ݁Ռ λ ऩӹΛ༻͍Δ͜ͱͰɺH ʹґΒͣੑೳ͕վળ 17

18.

DreamerV2 [Hafner, et al., 2020] Dreamerͷվྑ൛ 1. જࡏม਺ʹ཭ࢄͳΧςΰϦΧϧ෼෍Λ࢖͏ 2. Τϯίʔμ͕ա౓ʹਖ਼ଇԽ͞Εͳ͍Α͏ʹ KL߲ͷֶश཰Λௐ੔͢Δ • AtariͰਓؒϨϕϧͷੑೳΛୡ੒ 18

19.

཭ࢄજࡏม਺ • PlaNet΍DreamerV1Ͱ͸ɼ࿈ଓతͳજࡏม਺Λ࢖͍ɼਖ਼‫ن‬෼෍ͰϞσϧԽ • DreamerV2Ͱ͸ɼ཭ࢄͳΧςΰϦΧϧ෼෍ʹมߋ 19

20.

཭ࢄજࡏม਺ • ཭ࢄʹͨ͜͠ͱͰɼޯ഑ͷਪఆʹreparameterization trick͸࢖͑ͳ͘ͳΔ • ୅ΘΓʹstraight-through estimatorͰਪఆ • ਪఆྔʹόΠΞε͕৐Δ͕ɼ࣮૷͕؆୯ 20

21.

KL Balancing • ੈքϞσϧͷϩεʹ͓͍ͯɼKL߲͸encoderͱભҠϞσϧͷpriorΛ͚ۙͮΔ ਖ਼ଇԽͷ໾ׂΛ͢Δ • ͔͠͠ɼಛʹֶशॳ‫ʹظ‬ભҠϞσϧ͕े෼ʹֶशͰ͖͍ͯͳ͍ঢ়ଶͩͱ ͜ͷKLਖ਼ଇԽ͕‫ͳ͘ڧ‬Γֶ͗ͯ͢शͷ๦͛ʹͳΔ 21

22.

KL Balancing • EncoderͱભҠϞσϧͷKL߲ʹ͍ͭͯͷֶश཰Λௐ੔͢Δ͜ͱͰܰ‫ݮ‬ • α͸0.8ʹઃఆ 22

23.

࣮‫ݧ‬ • AtariͰਓؒ௒͑ • ϞσϧϑϦʔͷDQN, RainbowͳͲΑΓ΋‫͍ڧ‬ 23

24.

࣮‫ݧ‬ Ablation • ΧςΰϦΧϧม਺΍KL balancingͷޮՌ΋͔ͳΓେ͖͍ 24

25.

DreamerV3 25

26.

DreamerV3 • DreamerV2ΛΑΓ൚༻తʹ࢖͑Δख๏ʹ͢ΔͨΊʹ͍͔ͭ͘޻෉Λ௥Ճ • υϝΠϯ͕มΘͬͯ΋ৗʹಉ͡ϋΠύϥͰֶशͰ͖ΔΑ͏ʹ 1. ‫؍‬ଌ΍ใुͷ஋Λsymlogؔ਺Ͱม‫͢׵‬Δ 2. Actorͷ໨తؔ਺Ͱ͸λऩӹͷ஋Λਖ਼‫ن‬Խ͢Δ 26

27.

Symlog Prediction • υϝΠϯ͕มΘΔͱɼ‫؍‬ଌ΍ใुͷ஋ͷεέʔϧ͕มΘΔͷͰɼ ஞҰϋΠύϥΛௐ੔͢Δඞཁ͕͋Δ • ͦΕΛ͠ͳ͍͍ͯ͘Α͏ʹɼsymlogؔ਺Λ͔͚Δ͜ͱͰ஋Λ͋Δఔ౓ἧ͑Δ • Մ‫ͳ਺ؔͳٯ‬ͷͰɼ‫਺ؔٯ‬Λ͔͚Ε͹‫ݩ‬ͷ஋ʹ໭ͤΔ 27

28.

λऩӹͷਖ਼‫ن‬Խ • Τϯτϩϐʔਖ਼ଇԽ෇͖ͰactorΛֶश͢Δ৔߹ɼͦͷ܎਺ͷνϡʔχϯά͸ ใुͷεέʔϧ΍εύʔεੑʹґଘ͢ΔͷͰ೉͍͠ • ͏·͘ใुͷ஋Λਖ਼‫ن‬ԽͰ͖Ε͹ɼυϝΠϯʹΑΒͣΤϯτϩϐʔ߲ͷ܎਺Λ ‫ݻ‬ఆͰ͖Δ͸ͣ 28

29.

λऩӹͷਖ਼‫ن‬Խ • ऩӹΛ5ʙ95%෼Ґ਺ͷ෯Ͱਖ਼‫ن‬Խ͢Δ • ୯७ʹ෼ࢄͰਖ਼‫ن‬Խ͢Δͱɼใु͕εύʔεͳͱ͖ʹɼऩӹ͕աେධՁ͞Εͯ ͠·͏ͷͰɼ֎Ε஋Λ஄͚ΔΑ͏ʹ͜ͷ‫͢ʹܗ‬Δ 29

30.

࣮‫ݧ‬ • ͢΂ͯͷυϝΠϯɾλεΫͰಉ͡ϋΠύϥͰߴ͍ੑೳ͕ग़ͤΔ 30

31.

࣮‫ݧ‬ • ϞσϧͷαΠζʹΑͬͯੑೳ͕εέʔϧ͢Δ͜ͱ΋֬ೝ 31

32.

࣮‫ݧ‬ ੈքϞσϧʹΑΔະདྷ༧ଌ 32

33.

࣮‫ݧ‬ • MinecraftͰॳΊͯRL agent͕μΠϠϞϯυΛͱΔ͜ͱʹ੒ޭ 33

34.

·ͱΊ • ੈքϞσϧͷ୅දతͳख๏DreamerͷൃలΛղઆ • V3ʹؔͯ͠͸ਖ਼௚ώϡʔϦεςΟοΫͷմ‫ײ‬͸൱Ίͳ͍ • ݁Ռ͸͍͢͝ 34