[DL Reading Group] Deep Learning and Curved Parameter Spaces (Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation)

March 23, 2018

Slide overview

2018/03/02
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page
1.

DEEP LEARNING JP [DL Papers] Deep Learning and Curved Parameter Spaces   Reiji Hatsugai, DeepX   http://deeplearning.jp/

2.
M\kxgn[UFW
• Hessian and Fisher Information Matrix […]
• \'
–
–
–
–
–
–

^S]R\

$[

Ec

L\mqliXP

+/\&
tgjf{ow|xyplmz|i
hvls
Catastrophic forgetting
11\b
11\rxliulik

• H and FIM […]
• d:2352?=XFWF^P
–

VaHTSaP_^QeCC

• ISKNeWJ^PDD
– EYX %P`G[HKFW^PCX]d#Pc

[Zb^P
)

3.

     $   $"&#   $!%&#

4.

   !%   %#'$   %"& '$

5.

 

6.

 •     • 

7.


F+,/

• RUSXOIL CK
•
BDEWN?A@GK L>K
• HRUSXO!ITXMVPQ")
– *.JHL CK $3'5'#170;;64610<698%
–  I-.&26=3:53813L>K

, 

, https://www.jstage.jst.go.jp/article/sicejl1962/40/10/40_10_735/_pdf

(

8.

  

9.

  

10.

&   '  •  #+/,*0-/.1'&$!' # )%"( 

11.

      

12.

     $   $"&#   $!%&# 

13.

•  $$!#.



• ",-,-3+(<
• 07.")&&(,-3+(<  2*0&/&'4%5
•  $$!#6+5 1 .+5:8;98

• 11



D.P.Kingma, Adam: A Method for Stochastic Optimization, https://arxiv.org/pdf/1412.6980.pdf



14.

 

15.

 

16.

• Trust Region Policy Optimization
– ,3&/0& 3&$1-/./-%2$1! "# $@

<

;

>?

• Kronecker Factored Approximate Curvature
– @CDBA



• Kronecker Factored Recursive Approximation
– @CDBA

• 7<;



989=:

John Schulman et al, Trust Region Policy Optimization, https://arxiv.org/pdf/1502.05477.pdf
Yuhuai Wu et al, Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation,
https://arxiv.org/pdf/1708.05144.pdf
Roger Grosse & James Martens, A Kronecker-factored approximate Fisher matrix for convolutional layers, https://arxiv.org/pdf/1602.01407.pdf
James Martens & Roger Grosse, Optimizing neural networks with Kronecker-factored approximate curvature,
https://arxiv.org/pdf/1503.05671.pdf
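
The Kronecker-factored methods cited above approximate one layer's Fisher matrix as a Kronecker product of two small matrices, so the inverse-Fisher-vector product reduces to two small solves. The following is a minimal NumPy sketch of that identity only, not the papers' full algorithm. The names `A`, `G`, `V`, and `random_spd` are illustrative assumptions, and random positive-definite matrices stand in for the activation and gradient statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2

def random_spd(n):
    """Random symmetric positive-definite matrix (a stand-in for real statistics)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

A = random_spd(n_in)                     # assumed stand-in for E[a a^T] (layer inputs)
G = random_spd(n_out)                    # assumed stand-in for E[g g^T] (backpropagated grads)
V = rng.standard_normal((n_out, n_in))   # gradient with respect to the weights, shape (out, in)

# Naive route: build the full (n_in*n_out) x (n_in*n_out) matrix A (x) G and solve.
F = np.kron(A, G)
naive = np.linalg.solve(F, V.reshape(-1, order="F"))   # column-major vec(V)

# Factored route: (A (x) G)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for symmetric A.
factored = np.linalg.solve(G, V) @ np.linalg.inv(A)

print(np.allclose(naive, factored.reshape(-1, order="F")))
```

The factored route never forms the large matrix, which is the point of the approximation: its cost scales with the layer's input and output sizes separately rather than with their product.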


17.
^RTQeZafbc[WYdfS
• ^RTQeDCL<2
– @I\b_fVE CP?H;D 9=J><
–   H?1%*@3M 3?1CP?:MEFHPB782

• b]bU

• Θ∗ FDG>?2ME@1'-00.,/OHMA 
• `Xc48I IHKNM
• E)&+#O><5($*+ "D>?2M

5@6Mg

C.M.Bishop, PRML () 213p
Hippolyt Ritter et al, A Scalable Laplace Approximation for Neural Networks, https://openreview.net/pdf?id=Skdvd2xAZ

!

18.
$#"!'% &"!
• 987305/)*+., ((.
– $#"!
–  &"!

• :<;=6 -214

-&$!""&-

-

Nitish Shirish Keskar et al, On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima,
https://arxiv.org/pdf/1609.04836.pdf



19.
Overcoming Catastrophic Forgetting
• ##EBGLKJIHG;AB:

=GD9

– 6GLKJI <>CLKJ I =GALKJC=GI
– […] multitask learning […]

• LKJ@DMONPLDD

7

I<B9FLKJ I 

James Kirkpatrick et al, Overcoming catastrophic forgetting in neural networks, https://arxiv.org/pdf/1612.00796.pdf
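
The paper cited above keeps parameters close to the values learned on an earlier task, weighting each parameter by the corresponding diagonal entry of the Fisher information matrix. The following is a minimal sketch of that quadratic penalty and its gradient. The function names, the example values, and the strength `lam` are assumptions made for illustration.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Penalty 0.5 * lam * sum_i F_i * (theta_i - theta_star_i)^2."""
    diff = theta - theta_star
    return 0.5 * lam * np.sum(fisher_diag * diff ** 2)

def ewc_penalty_grad(theta, theta_star, fisher_diag, lam):
    """Gradient of the penalty with respect to theta."""
    return lam * fisher_diag * (theta - theta_star)

# Hypothetical values: theta_star is the old-task solution, fisher_diag its
# estimated diagonal Fisher information, theta the current parameters.
theta_star = np.array([1.0, -2.0, 0.5])
fisher_diag = np.array([10.0, 0.1, 1.0])
theta = np.array([1.5, -1.0, 0.5])

penalty = ewc_penalty(theta, theta_star, fisher_diag, lam=2.0)
grad = ewc_penalty_grad(theta, theta_star, fisher_diag, lam=2.0)
print(penalty, grad)
```

In this example the first parameter has a large Fisher value, so moving it by 0.5 costs far more than moving the second parameter by 1.0.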



20.
6;A<>745@A1*
• +8<9A3#
– 
– 9:=0

•  *745@A1", (8<9A30 -%.
• .8<9A3*#?2 )'/&$ 0!."
–

*()

Yann Le Cun et al, Optimal Brain Damage, http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
Yoojin Choi et al, Towards the Limit of Network Quantization, https://arxiv.org/pdf/1612.01543.pdf



21.

•

HTWPMUPMN
–  L;D<E9
– Adversarial perturbation

•
•


– 8KRZO6L 9C@BVRXF:>K7
8KQNSRZO!"#$" DGYNG

(22/$%%/,%!0-)-'*/3-$%012!-$)-'",!#+".5/0%$)#2).-14)!)-&,3%-#%&3-#2).-1
–

??G

F@=9C8JIA

Pang Wei Koh, Understanding Black-box Predictions via Influence Functions, https://arxiv.org/pdf/1703.04730.pdf



22.

     $   $"&#   $!%&#

23.
#++&"(&+%#*($)*'",&)(",*&-:EFH
• =?:

5

<>6CGEIA ::EFH1



– !#+ #,:CGEIA 
–
8EFH.≈ 10$% J

• B@ID9=? <>:;2430

• 87:

/8



24.




• (

)!" ('

– 
– &#!  %$%#!" !#%% 
– "% # 

• $$

C.M.Bishop, PRML p251
D.P.Kingma, Adam: A Method for Stochastic Optimization, https://arxiv.org/pdf/1412.6980.pdf
James Kirkpatrick et al, Overcoming catastrophic forgetting in neural networks, https://arxiv.org/pdf/1612.00796.pdf
Yann Le Cun et al, Optimal Brain Damage, http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf

25.

!"# $ (& "# $).  *,  •  &#'!"# $ () & "# $ )-( • 1/012$ .!"+. * %! "# $ (& "# $).  3 

26.
?6!":=
• !*)(-&#,%"+#$'%),/ 
•
C :5A C3 !" = $<.= !%& $C@A
•  <FDEGx;708()C5A4:192B>!%& $C

http://www.slis.tsukuba.ac.jp/~fujisawa.makoto.fu/cgibin/wiki/index.php?%CF%A2%CE%A91%BC%A1%CA%FD%C4%F8%BC%B0%A1%A7%B6%A6%CC%F2%B8%FB%C7%DB%CB%A1
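
The page linked above covers the conjugate gradient method for systems of linear equations. The following is a minimal NumPy sketch, assuming a symmetric positive-definite matrix that is accessed only through matrix-vector products. The function name `conjugate_gradient`, the callback `matvec`, and the 3x3 example are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    """Approximately solve A x = b for symmetric positive-definite A,
    using only products matvec(v) = A v."""
    x = np.zeros_like(b)
    r = b.copy()            # residual b - A x, with x starting at zero
    p = r.copy()            # current search direction
    rs_old = r @ r
    for _ in range(iters):
        if rs_old < tol:    # squared residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Hypothetical small system; in practice matvec would be a Hessian- or
# Fisher-vector product that never forms the matrix explicitly.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
g = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(lambda v: A @ v, g)
print(np.allclose(A @ x, g))
```

Because the solver only needs `matvec`, it pairs naturally with a Hessian-vector product routine when the matrix is too large to store.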



27.
!"# $ (& "# $)=



9; ,#)"

•  526!"# $ () & "# $. 8<7• @>?@A3 =01:= 9/4! "# $ (& "# $)= 
• !((#$ !)%' '% *) +&

B

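import tensorflow as tf  # TF1 graph-mode API; loss, params, and x are assumed to be defined
# x: list of constant tensors, one per entry of params, with matching shapes
grads = tf.gradients(loss, params)
grad_dot_x = tf.add_n([tf.reduce_sum(g * v) for g, v in zip(grads, x)])
hvp = tf.gradients(grad_dot_x, params)  # Hessian-vector product H x, without forming H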
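
The TensorFlow snippet above differentiates the inner product of the gradient with a fixed vector, which yields the Hessian-vector product without building the Hessian. As a cross-check, the same product can be approximated with a different technique, a central finite difference of gradients. The following is a minimal NumPy sketch under the assumption of a toy quadratic loss whose Hessian is known exactly. `A`, `grad`, and `hvp_finite_difference` are illustrative names.

```python
import numpy as np

# Hypothetical quadratic loss L(theta) = 0.5 * theta^T A theta, so the Hessian is A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def grad(theta):
    """Gradient of the toy quadratic loss."""
    return A @ theta

def hvp_finite_difference(grad_fn, theta, v, eps=1e-5):
    """Approximate H v by a central difference of gradients along v."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)

theta = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
hv = hvp_finite_difference(grad, theta, v)
print(np.allclose(hv, A @ v, atol=1e-6))
```

The finite-difference version costs two gradient evaluations and carries truncation and rounding error, whereas the double-gradient form is exact up to floating-point arithmetic.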


28.

 •     

29.

%#" % &#%$$%#)!&'%(&'% • 178652 %#" %4*. • %#" %1 ! "# 4 0 -3,/+ 4*3/  

30.
&$#!& ($&&)&' *%%&$+ "( $#
• 

1.

• 3-

2

69:87

2/,0

45



31.
.>@6 5!"7

859

• 
– .
–

$**%#&/207/-= ;2.

• !"# $ %& ' "# $? 

– CDF/. 

9=
7%+$)#+%'&741 ,(?3=/-=

• HIEB
– 

.

– :<



– GA.



32.
92 7:<b[c
• NXVoqprm%,`K*^
– ."K*a_

SiYKh

NK

• ;7>52/;756_!Z)PYjg[$]92/7:<kcfiheL^]XV
– 0dL

bXYhQ\1/Sf^I]*^jiYP[L

– 

'`UZ^jiYKh_Z85=[MD@H? C@?FEBEA[M

• nlqr +`URK
• 9@GGB?E_(&`bWbWJh
• [_
– 7:<N

]-jg

#^(&ZOheL^]XYOV_Z]^MT^]f]KMs

43