[DL Reading Group] The frontier of simulation-based inference

February 19, 21

Slide overview

2020/05/08
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page
1.

The frontier of simulation-based inference
Shohei Taniguchi, Matsuo Lab (M1)

2.

Paper Info
• Authors: Kyle Cranmer, Johann Brehmer, and Gilles Louppe
  • New York University, University of Liège
• Accepted to PNAS
• Reason for selection: interest in likelihood-free inference

3.

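Outline
  1. Background
    • Likelihood-free inference
  2. Recent progress
    • Advances in machine learning (especially deep learning)
    • Their contributions to likelihood-free inference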

4.

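Statistical Inference
• Estimating the parameters $\Theta$ of a statistical model $p(X \mid \Theta)$ from data
• Common approaches
  1. Maximum likelihood estimation: $\arg\max_{\Theta} p(X = x \mid \Theta)$
  2. MAP estimation: $\arg\max_{\Theta} p(X = x \mid \Theta)\, p(\Theta)$
  3. Bayesian inference: $p(\Theta \mid X = x) = \dfrac{p(X = x \mid \Theta)\, p(\Theta)}{\int p(X = x \mid \Theta)\, p(\Theta)\, d\Theta}$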

5.

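Likelihood
What is the likelihood? Consider $p(X \mid \Theta)$:
• Viewed as a function of $X$ with $\Theta = \theta$ fixed → probability density function $p(X \mid \Theta = \theta)$
• Viewed as a function of $\Theta$ with $X = x$ fixed → likelihood function $p(X = x \mid \Theta)$
  • To make the distinction, it is sometimes written explicitly as $L(\theta \mid X = x)$
  • Machine learning papers often write $p(x \mid \theta)$ for both, without distinguishing them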

6.

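Markov Chain Monte Carlo (MCMC)
With initial value $\theta_0$ and proposal distribution $q(\Theta' \mid \Theta)$, repeat:
  1. Sample $\theta_t \sim q(\Theta' \mid \theta_{t-1})$
  2. Accept $\theta_t$ with probability $\min\left(1, \dfrac{p(x \mid \theta_t)\, p(\theta_t)\, q(\theta_{t-1} \mid \theta_t)}{p(x \mid \theta_{t-1})\, p(\theta_{t-1})\, q(\theta_t \mid \theta_{t-1})}\right)$; otherwise reject it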
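
The two steps above can be sketched in Python as follows. This is only an illustrative sketch, not code from the slides: the toy model (Gaussian data with unknown mean and a wide Gaussian prior), the random-walk proposal, and all function and variable names are assumptions. Because the assumed proposal is symmetric, the $q$ terms cancel in the acceptance ratio.

```python
import numpy as np

def metropolis_hastings(log_likelihood, log_prior, theta0, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal (assumed)."""
    theta = theta0
    log_post = log_likelihood(theta) + log_prior(theta)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Step 1: propose theta_t ~ q(. | theta_{t-1}).
        proposal = theta + step_size * rng.normal()
        log_post_new = log_likelihood(proposal) + log_prior(proposal)
        # Step 2: accept with probability min(1, ratio), evaluated in log space.
        # The symmetric proposal cancels, leaving likelihood * prior.
        if np.log(rng.uniform()) < log_post_new - log_post:
            theta, log_post = proposal, log_post_new
        samples[t] = theta  # on rejection the previous value is repeated
    return samples

# Toy problem (assumed): x_i ~ N(theta, 1), prior theta ~ N(0, 10^2).
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)
log_lik = lambda th: -0.5 * np.sum((x - th) ** 2)
log_pri = lambda th: -0.5 * (th / 10.0) ** 2
samples = metropolis_hastings(log_lik, log_pri, 0.0, 5000, 0.5, rng)
posterior_mean = samples[1000:].mean()  # discard the first 1000 steps as burn-in
```

Note that this sampler needs `log_likelihood`, which is exactly the quantity that is unavailable in the likelihood-free setting.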

7.

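Likelihood-free Inference (Simulation-based Inference)
• Inference in models where the likelihood $p(x \mid \theta)$ cannot be written explicitly but sampling $x \sim p(X \mid \theta)$ is possible
  (Example) Latent variable model: $p(X \mid \theta) = \int p(X \mid Z, \theta)\, p(Z \mid \theta)\, dZ$
• Any model whose sampling process can be written as $x = f(z, \theta)$ is a likelihood-free model
  • This includes cases where even $p(X \mid Z, \Theta)$ is unknown
• Sometimes the latent variables are inferred as well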

8.

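Example 1: Population Genetics
• Estimate quantities such as the mutation rate and the time to the MRCA from the DNA data of a population
  • MRCA: most recent common ancestor
  https://www.ism.ac.jp/~fukumizu/ABC2015/ABC_review.pdf
• Once the parameters are fixed, simulating the process of inheritance is easy, but the likelihood often cannot be defined explicitly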

9.

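Example 2: GAN (Generative Adversarial Networks)
• Instead of computing the likelihood, a discriminator is introduced and trained adversarially
  $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p(X)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(Z)}\left[\log\left(1 - D(G(z))\right)\right]$
• The discriminator can be regarded as estimating the density ratio between the data distribution and the model distribution
  $r(x) = \dfrac{p(x)}{p(x \mid \theta)} = \dfrac{D(x)}{1 - D(x)}$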
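
The density-ratio identity above holds for the optimal discriminator. The short derivation below is a standard argument added for illustration, not taken from the slides; it assumes the generator is fixed, so that $G(z)$ with $z \sim p(Z)$ has density $p(x \mid \theta)$, and it writes $D^{*}$ for the maximizing discriminator.

```latex
\begin{align}
V(D, G) &= \int \Big[ p(x) \log D(x) + p(x \mid \theta) \log\big(1 - D(x)\big) \Big]\, dx \\
% Maximize the integrand pointwise in D(x): set the derivative to zero.
0 &= \frac{p(x)}{D^{*}(x)} - \frac{p(x \mid \theta)}{1 - D^{*}(x)}
\quad\Longrightarrow\quad
D^{*}(x) = \frac{p(x)}{p(x) + p(x \mid \theta)} \\
\frac{D^{*}(x)}{1 - D^{*}(x)} &= \frac{p(x)}{p(x \mid \theta)} = r(x)
\end{align}
```

In practice a trained discriminator only approximates $D^{*}$, so $D(x) / (1 - D(x))$ is an estimate of $r(x)$ rather than its exact value.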

10.

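Traditional Methods
  1. Approximate Bayesian Computation (ABC)
    • A sampling-based method
    • "Likelihood-free inference" most often refers to ABC
  2. Surrogate model
    • Prepare a substitute model that stands in for computing quantities such as the likelihood
    • Classically, kernel density estimation and similar methods are used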

11.

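Approximate Bayesian Computation (ABC)
Repeat:
  1. Sample $\theta \sim p(\Theta)$
  2. Sample $x_\theta \sim p(X \mid \theta)$
  3. For some distance measure $d$, accept $\theta$ if $d(x_\theta, x) < \epsilon$; otherwise reject it
• Converges to the posterior $p(\Theta \mid x)$ as $\epsilon \to 0$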
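
The three steps above can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the slides: a toy Gaussian simulator, a uniform prior on $[-5, 5]$, a distance on the sample mean (a hand-picked summary statistic), and hypothetical function names.

```python
import numpy as np

def abc_rejection(x_obs, sample_prior, simulate, distance, eps, n_trials, rng):
    """Rejection ABC: keep the prior draws whose simulation lands within eps of x_obs."""
    accepted = []
    for _ in range(n_trials):
        theta = sample_prior(rng)          # step 1: theta ~ p(Theta)
        x_sim = simulate(theta, rng)       # step 2: x_theta ~ p(X | theta)
        if distance(x_sim, x_obs) < eps:   # step 3: accept or reject
            accepted.append(theta)
    return np.array(accepted)

# Toy setup (assumed): 50 draws from N(theta, 1), compared through their sample mean.
rng = np.random.default_rng(0)
x_obs = rng.normal(2.0, 1.0, size=50)
sample_prior = lambda rng: rng.uniform(-5.0, 5.0)
simulate = lambda theta, rng: rng.normal(theta, 1.0, size=50)
distance = lambda a, b: abs(a.mean() - b.mean())

accepted = abc_rejection(x_obs, sample_prior, simulate, distance, 0.1, 20000, rng)
acceptance_rate = len(accepted) / 20000
```

Only a small fraction of the trials is accepted in this setup, and shrinking `eps` lowers that fraction further.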

12.

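ABC with MCMC
With initial value $\theta_0$ and proposal distribution $q(\Theta' \mid \Theta)$, repeat:
  1. Sample $\theta_t \sim q(\Theta' \mid \theta_{t-1})$
  2. Sample $x_{\theta_t} \sim p(X \mid \theta_t)$
  3. If $d(x_{\theta_t}, x) < \epsilon$, go to step 4; otherwise reject
  4. Accept $\theta_t$ with probability $\min\left(1, \dfrac{p(\theta_t)\, q(\theta_{t-1} \mid \theta_t)}{p(\theta_{t-1})\, q(\theta_t \mid \theta_{t-1})}\right)$; otherwise reject it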
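
A minimal Python sketch of the four steps above, under assumptions not in the slides: the toy Gaussian simulator, the uniform prior on $[-5, 5]$, the symmetric random-walk proposal (so the $q$ terms cancel), and the choice to start the chain at the observed sample mean. A chain started far from the region where simulations match the data would rarely pass step 3 and could stay stuck.

```python
import numpy as np

def abc_mcmc(x_obs, simulate, distance, log_prior, theta0, eps, n_steps, step_size, rng):
    """ABC-MCMC with a symmetric Gaussian random-walk proposal (assumed)."""
    theta = theta0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        proposal = theta + step_size * rng.normal()   # step 1
        x_sim = simulate(proposal, rng)               # step 2
        if distance(x_sim, x_obs) < eps:              # step 3
            # Step 4: with a symmetric proposal the ratio reduces to the prior ratio.
            if np.log(rng.uniform()) < log_prior(proposal) - log_prior(theta):
                theta = proposal
        chain[t] = theta  # on rejection the previous value is repeated
    return chain

# Toy setup (assumed): 50 draws from N(theta, 1), compared through their sample mean.
rng = np.random.default_rng(0)
x_obs = rng.normal(2.0, 1.0, size=50)
simulate = lambda theta, rng: rng.normal(theta, 1.0, size=50)
distance = lambda a, b: abs(a.mean() - b.mean())
log_prior = lambda th: 0.0 if -5.0 < th < 5.0 else -np.inf  # uniform prior on (-5, 5)

chain = abc_mcmc(x_obs, simulate, distance, log_prior, x_obs.mean(), 0.1, 5000, 0.3, rng)
```

Compared with drawing every candidate from the prior, the proposals here stay near the current state, so far fewer simulations are wasted once the chain is in a plausible region.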

13.

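Challenges of ABC
• Trade-off between the threshold $\epsilon$ and accuracy
  • A smaller $\epsilon$ improves accuracy, but most samples are then rejected
• Design of the distance measure $d$
  • Measuring distance directly on $X$ is ideal, but samples are easily rejected
  • Distances between summary statistics $S(x)$ are commonly used, but designing $S$ is difficult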

14.

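Surrogate model
Prepare a surrogate model $\hat{p}(X \mid \Theta; w)$ that approximates the likelihood function, then:
  1. Train the surrogate model using samples $\theta \sim p(\Theta)$, $x_\theta \sim p(X \mid \theta)$ as training data
  2. Estimate $\theta$ from the data using the surrogate model
  3. (Optional) Further train the surrogate model with the estimated $\theta$
➡ Repeat steps 2 and 3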
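
Steps 1 and 2 above can be illustrated with a deliberately simple surrogate. The sketch below does not use kernel density estimation or a neural density estimator; it swaps in a linear-Gaussian surrogate $\hat{p}(x \mid \theta; w) = \mathcal{N}(a\theta + b, \sigma^2)$ fitted by least squares, purely to keep the example short. The simulator, the prior range, the grid search, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, rng):
    """Hypothetical simulator: the sample mean of 50 draws from N(theta, 1)."""
    return rng.normal(theta, 1.0, size=50).mean()

# Step 1: simulate training pairs (theta, x_theta) and fit the surrogate.
thetas = rng.uniform(-5.0, 5.0, size=2000)
xs = np.array([simulate(th, rng) for th in thetas])
design = np.stack([thetas, np.ones_like(thetas)], axis=1)
(a, b), *_ = np.linalg.lstsq(design, xs, rcond=None)   # surrogate mean a*theta + b
sigma = np.std(xs - (a * thetas + b))                  # surrogate noise scale

def surrogate_log_lik(x, theta):
    """Log-density of the fitted surrogate N(a*theta + b, sigma^2) at x."""
    z = (x - (a * theta + b)) / sigma
    return -0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

# Step 2: estimate theta for observed data by maximizing the surrogate likelihood.
x_obs = simulate(2.0, rng)
grid = np.linspace(-5.0, 5.0, 1001)
theta_hat = grid[np.argmax(surrogate_log_lik(x_obs, grid))]
```

Once fitted, the surrogate can be evaluated for any new `x_obs` without running the simulator again.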

15.

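Challenges of surrogate models
• Estimation accuracy depends on the surrogate model
  • If the surrogate model has weak expressive power, accuracy degrades
• Does not scale to high dimensions
  • Kernel density estimation has been the usual surrogate model, but it struggles in high dimensions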

16.

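Frontiers of simulation-based inference: challenges in likelihood-free inference
  1. Sample efficiency
    • Estimate with few samples
  2. Estimation accuracy
    • Estimate accurately
  3. Data efficiency (amortization)
    • Estimate efficiently even as the amount of data grows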

17.

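Revolution of Machine Learning
• In recent years, research in machine learning, and deep learning in particular, has advanced rapidly
  • The results in supervised learning are especially striking
• Deep learning methods are starting to be used for likelihood-free inference as well
  • Neural density estimators as surrogate models
  • Inference methods based on density ratio estimation, as in GANs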

19.

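Quantities
Choose a method based on which of the following quantities can be computed:
  I. $p(x \mid z, \theta)$: probability density given the latent variables
  II. $t(x, z \mid \theta) \equiv \nabla_\theta \log p(x, z \mid \theta)$: gradient of the log joint distribution with the latent variables
  III. $\nabla_z \log p(x, z \mid \theta)$: gradient of the log joint distribution of observations and latent variables with respect to the latent variables
  IV. $r(x, z \mid \theta, \theta') \equiv \dfrac{p(x, z \mid \theta)}{p(x, z \mid \theta')}$: density ratio of the joint distribution at different parameters
  V. $\nabla_\theta (x, z)$: gradient of the observations and latent variables with respect to the parameters
  VI. $\nabla_z x$: gradient of the observations with respect to the latent variables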

20.

Approximate Bayesian Computation with Monte Carlo Sampling
• Ordinary ABC
• Repeat:
  1. Sample $\theta \sim p(\Theta)$
  2. Sample $x_\theta \sim p(X \mid \theta)$
  3. For some distance measure $d$, accept $\theta$ if $d(x_\theta, x) < \epsilon$; otherwise reject it

21.

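Approximate Bayesian Computation with Learned Summary Statistics
• Classically, summary statistics were designed by experts with domain knowledge
• Instead, learn summary statistics that have good properties
  (Example) $t(x \mid \theta) \equiv \nabla_\theta \log p(x \mid \theta)$ satisfies $p(x) = p(x \mid \theta) \Rightarrow t(x \mid \theta) = 0$
  • (V) If $\nabla_\theta (x, z)$ is available, an exponential-family approximation is possible
  • (II) If $t(x, z \mid \theta)$ is available, an NN that approximates $t(x \mid \theta)$ can be trained from samples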

22.

Probabilistic Programming with Monte Carlo sampling
• Probabilistic programming languages (PPLs) such as Stan have matured
• (I) If $p(x \mid z, \theta)$ is available, sampling methods such as MCMC can be run quickly
• ABC has also become easy to do in a PPL

23.

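Probabilistic Programming with Inference Compilation
• So-called amortized inference
• Define an approximate posterior $q(z, \theta \mid x)$ with an NN and train it using (I) $p(x \mid z, \theta)$
• Use $q(z, \theta \mid x)$ as the proposal distribution for sequential importance sampling to obtain the final inference result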

24.

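Amortized Likelihood
• Refers to the ordinary surrogate-model approach
• In recent years, neural density estimators such as normalizing flows have come to be used in place of kernel density estimation
• Once trained, the surrogate model can perform inference efficiently on new data as well, which reduces the cost (amortization)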

25.

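Amortized Posterior
• Build a surrogate model $\hat{p}(\Theta \mid X; w)$ of the posterior $p(\Theta \mid X)$
• Feeding data $x$ into the trained surrogate model yields the posterior
• Building a surrogate model of the posterior used to be difficult, but highly expressive neural density estimators have made it possible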

26.

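Amortized Likelihood Ratio
• Build and train a surrogate model $\hat{r}(X, \Theta, \Theta'; w)$ of the likelihood ratio $\dfrac{p(X \mid \Theta)}{p(X \mid \Theta')}$
• Almost the same as a GAN discriminator, except that the parameters are also inputs
• Use the surrogate model to compute the MCMC acceptance probability
  $\min\left(1, \hat{r}(x, \theta_t, \theta_{t-1}; w)\, \dfrac{p(\theta_t)\, q(\theta_{t-1} \mid \theta_t)}{p(\theta_{t-1})\, q(\theta_t \mid \theta_{t-1})}\right)$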
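
The classifier-based idea above can be illustrated on a toy case. The sketch below is a simplification and not the amortized model of the slide: it fixes a single pair $(\theta, \theta')$ instead of feeding the parameters to the network, and it uses logistic regression trained by plain gradient descent in place of a neural network. The Gaussian simulator and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, theta_prime = 1.0, 0.0   # fixed pair (assumed); the slide's model takes these as inputs
n = 20000

# Simulate from both parameter settings and label the samples.
x_num = rng.normal(theta, 1.0, size=n)         # label 1: x ~ p(X | theta)
x_den = rng.normal(theta_prime, 1.0, size=n)   # label 0: x ~ p(X | theta')
x = np.concatenate([x_num, x_den])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic regression D(x) = sigmoid(w * x + b), trained on the cross-entropy loss.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((p - y) * x)
    b -= 0.5 * np.mean(p - y)

def ratio_hat(x_new):
    """Estimated likelihood ratio D / (1 - D), which equals exp(logit)."""
    return np.exp(w * x_new + b)

# For this toy model the exact log-ratio is (theta - theta') * x - (theta^2 - theta'^2) / 2.
exact_ratio_at_1 = np.exp(1.0 * 1.0 - 0.5)
```

A linear classifier is enough here only because the exact log-ratio of two unit-variance Gaussians is linear in $x$; a real simulator would need a more flexible model.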

27.

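Amortized surrogates trained with augmented data
When (II) $t(x, z \mid \theta)$ or (IV) $r(x, z \mid \theta, \theta')$ is available, the surrogate models of the likelihood and the likelihood ratio, $\hat{p}(X \mid \Theta; w)$ and $\hat{r}(X, \Theta, \Theta'; w)$, can be trained by minimizing the following loss functions:
  $L_{\mathrm{ROLR}}[\hat{r}] = \dfrac{1}{N} \sum_i \left( y_i \left| r(x_i, z_i) - \hat{r}(x_i) \right|^2 + (1 - y_i) \left| \dfrac{1}{r(x_i, z_i)} - \dfrac{1}{\hat{r}(x_i)} \right|^2 \right)$
  $L_{\mathrm{SCANDAL}}[\hat{p}] = L_{\mathrm{MLE}} + \alpha\, \dfrac{1}{N} \sum_i \left| t(x_i, z_i) - \nabla_\theta \log \hat{p}(x) \right|^2$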
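
The two losses above are simple to evaluate once the joint quantities are available. The NumPy sketch below only computes the loss values for given arrays; the function names, the array shapes, and the reading of $y_i$ as a binary label that selects which term applies are assumptions, and the training loop and the $L_{\mathrm{MLE}}$ term are left out.

```python
import numpy as np

def rolr_loss(r_joint, r_hat, y):
    """ROLR loss: squared error on r where y = 1, and on 1/r where y = 0.

    r_joint: joint ratios r(x_i, z_i), shape (N,)
    r_hat:   surrogate outputs r_hat(x_i), shape (N,)
    y:       binary labels, shape (N,)
    """
    term_r = y * (r_joint - r_hat) ** 2
    term_inv = (1.0 - y) * (1.0 / r_joint - 1.0 / r_hat) ** 2
    return np.mean(term_r + term_inv)

def score_penalty(t_joint, score_hat):
    """Score-matching term of the SCANDAL loss (without L_MLE and without alpha).

    t_joint:   joint scores t(x_i, z_i), shape (N, d)
    score_hat: gradients of log p_hat with respect to theta, shape (N, d)
    """
    return np.mean(np.sum((t_joint - score_hat) ** 2, axis=-1))

# Small worked example with made-up numbers.
r_joint = np.array([2.0, 0.5])
r_hat = np.array([1.0, 1.0])
y = np.array([1.0, 0.0])
loss_example = rolr_loss(r_joint, r_hat, y)  # (2 - 1)^2 and (1/0.5 - 1/1)^2, averaged
```

In an actual training setup, `r_hat` and `score_hat` would come from a differentiable model so that the losses can be minimized with respect to its weights $w$.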

28.

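Asymptotically Exact Bayesian inference
• (VI) When $\nabla_z x$ is available, exact Bayesian inference over the latent variables is possible
• Treating $x = f(z, \theta)$ as a constraint, a constrained MCMC method yields samples from the posterior
• With initial value $z_0$ and proposal distribution $q(Z' \mid Z)$, repeat:
  1. Sample $z_t \sim q(Z' \mid z_{t-1})$
  2. Solve $d(f(z_t, \theta), x) = 0$ with a quasi-Newton method
  3. Accept $z_t$ with probability $\min\left(1, \dfrac{p(z_t)\, q(z_{t-1} \mid z_t)}{p(z_{t-1})\, q(z_t \mid z_{t-1})}\right)$; otherwise reject it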

29.

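Summary: progress in likelihood-free inference
• By drawing on deep learning methods, likelihood-free inference can now be done more efficiently and more accurately
  • Summary statistics designed by experts → summary statistics learned by an NN
  • Kernel density estimation → neural density estimation (normalizing flows)
  • Advances in probabilistic programming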

30.

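Impressions
• The paper focuses on what deep learning contributes to likelihood-free inference; conversely, it would be interesting if the theory of likelihood-free inference yielded insights for deep learning
• Likelihood-free inference usually deals with a small number of parameters, so a methodology for likelihood-free inference in deep learning, which handles many parameters, also seems necessary
• GAN is one success story, but its connection to existing likelihood-free inference methods is still unclear
  • Perhaps it corresponds to using the JS/Wasserstein divergence as the summary statistic (?)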