[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (NIPS2018)


スライド概要

2018/11/02
Deep Learning JP:
http://deeplearning.jp/seminar-2/


各ページのテキスト
1.

DEEP LEARNING JP [DL Papers] Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (NIPS2018) Kazuki Fujikawa, DeNA http://deeplearning.jp/ 1

2.

サマリ • 書誌情報 – Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation • NIPS2018(to appear) • Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec • 概要 – Graph Convolutional Policy Network(GCPN)を提案 • 強化学習で所望の属性を最適化する分子グラフを生成する • ドメイン特有の報酬と敵対的な損失が最適化されるように方策を学習する – 分子属性の最適化、ターゲティングなどの実験で既存の手法を上回る性能を示した • 生成過程で原子価チェックが入るため、原子価に違反した分子が生まれることは無い 2

3.

アウトライン • 背景 • 関連研究 • 提案手法 • 実験・結果 3

4.

アウトライン • 背景 • 関連研究 • 提案手法 • 実験・結果 4

5.

背景 • 目的関数が最適化できるグラフ構造を生成することは、創薬や材料化学の分野において重要 – 一般的な分子構造設計では、原子価などの物理法則に従いながら、Drug-likenessや合成可能性といった特性を理想的な値に近づけることを考える – 複雑で微分不可能なルールに対して最適化することは依然として困難 • 可変長のグラフを直接生成することは容易ではない – 自然言語のような直列の系列と比較して、「分岐・結合種の存在」「始点が不明確」などの理由で難易度が高い 図引用: Gomez-Bombarelli+, 2018 5

6.

アウトライン • 背景 • 関連研究 • 提案手法 • 実験・結果 6

7.

関連研究 • 扱うデータ形式の違いで2種類に大別できる – テキストベース • SMILES – 分子の化学構造を文字列で表現する記法 • SMILES CFG (Context-free Grammar) – SMILESを生成する文脈自由文法の生成規則列 – グラフベース • 隣接行列を直接生成 • ノード・結合を自己回帰的に生成(隣接行列を一行ずつ生成)
例: ベンゼンの各表現
– 構造式(グラフ): 図は省略(六員環)
– 隣接行列:
0 1 0 0 0 1
1 0 1 0 0 0
0 1 0 1 0 0
0 0 1 0 1 0
0 0 0 1 0 1
1 0 0 0 1 0
– SMILES: c1ccccc1
– SMILES CFG: smiles → chain / chain → chain, branched atom / chain → branched atom / branched atom → atom, ringbond / branched atom → atom / atom → aromatic organic / atom → aliphatic organic / ringbond → digit / aromatic organic → 'c' / aliphatic organic → 'C' / aliphatic organic → 'N' / digit → '1' / digit → '2'
7
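補足として、SMILES表現と隣接行列の対応はRDKit(本スライドでは言及されていないライブラリ)で確認できる。以下はベンゼンを例にした最小のスケッチ。

```python
# ベンゼンのSMILESを分子グラフに変換し、上記の6×6隣接行列と一致することを確認する
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1")  # ベンゼン
adj = Chem.GetAdjacencyMatrix(mol)    # numpy配列 (6, 6)
print(adj)                            # 環状構造なので各原子は隣の2原子とだけ結合する
```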

8.
関連研究(SMILES-based) • テキストベースの生成モデルを使ってSMILESを生成するアプローチ – Automatic chemical design using a data-driven continuous representation of molecules [Gómez-Bombarelli+, 2018] • 入力SMILESをAuto-Encoder, VAEで再構築するように学習することで、潜在空間を学習 • ベイズ最適化で目的変数を最適化 – ORGAN [Guimaraes+, 2017] • RNN DecoderによるSMILESの文字列生成をGAN+RLで最適化 • SeqGAN [Yu+, 2017] と同様、Discriminatorが評価したスコア平均を報酬に学習 • 任意のヒューリスティクス(Diversity等)から得たスコアも同時に最大化する 図引用: Gómez-Bombarelli+, 2018 / Guimaraes+, 2017 8

9.

関連研究(Graph-based) • グラフベースの生成モデルを使って分子グラフを直接生成するアプローチ – Learning deep generative models of graphs [Li+, 2018] • ノード・結合を順々に自己回帰的に生成する • 生成途中のグラフに対してGraph Convolutionで特徴抽出を行い、その結果を用いて次に生成するノード・結合を決める – Junction Tree Variational Autoencoder for Molecular Graph Generation [Jin+, 2018] • 環などの原子団を一つのグループにまとめることにより、グラフ構造を木構造に変換する(Tree decomposition) • 木構造をVAEの枠組みで再構築するように学習する • Graph Convolutionで特徴抽出した結果も使って木構造からグラフへと戻す 図引用: Li+, 2018 / Jin+, 2018 9

10.

アウトライン • 背景 • 関連研究 • 提案手法 • 実験・結果 10

11.

Graph Generation as MDP • 反復的なグラフ生成のプロセスをMDPで定式化 – 状態: S = {s_t} • エージェントが観測する、時刻 t での中間的なグラフ – 行動: A = {a_t} • 各時刻で現在のグラフに対する修正を記述する行動の集合(ノード・結合の追加など) – 状態遷移: P = p(s_{t+1} | s_t, …, s_0, a_t) • s_t, …, s_0 において行動 a_t を取った時の状態遷移確率 – 報酬: R = r(s_t) • 状態 s_t 到達時に得られる報酬関数 11
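このMDPの定式化をループとして書き下すと概ね以下のようになる。env・policy の各メソッド名は説明用の仮のもので、論文の実装を写したものではない点に注意。

```python
# MDP (S, A, P, R) としてのグラフ生成の最小スケッチ(メソッド名は仮)
def generate_graph(env, policy, max_steps=100):
    s = env.reset()                       # s_0: 初期状態(単一原子などの小さなグラフ)
    for t in range(max_steps):
        a = policy.sample(s)              # a_t ~ π_θ(a_t | s_t)
        s_next, r, done = env.step(a)     # 状態遷移 p(s_{t+1} | s_t, ..., s_0, a_t) と報酬 r(s_t)
        policy.store(s, a, r)             # 方策更新(PPO)のために軌跡を保存
        s = s_next
        if done:                          # a_stop が「終了」を選んだら打ち切り
            break
    return s                              # 完成した分子グラフ
```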

12.

Graph Convolutional Policy Network (GCPN) • Graph convolution による生成済みグラフ G_t と候補構造 C の特徴抽出 – 候補構造(Scaffold) • 生成済みのグラフに対して、新たに追加される部分グラフの候補 • いくつかの原子からなる集合も考えられるが、本研究では単一の原子のみを想定 – 拡張グラフ G_t ∪ C に対し、GCNの一種 [Kipf+, 2017] を拡張したモデルを使って特徴抽出 • Kipf+ の手法を結合が考慮できるように拡張 – l 層目のノード埋め込み H^(l) を結合の種類毎に定義した重み W_i^(l) を使って畳み込む – 非線形変換などを行った後、AGG処理で各結合の種類に関して統合した結果を H^(l+1) とする – E_i: 結合に関する次元を追加した隣接テンソル E の i 番目の slice、Ẽ_i = E_i + I、D̃_i = Σ_k Ẽ_ijk 12
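この畳み込みをNumPyで書き下すと以下のようになる(説明用の最小スケッチで、実際の実装では重みの学習やバッチ処理が加わる)。

```python
# H^(l+1) = AGG_i( ReLU( D̃_i^{-1/2} Ẽ_i D̃_i^{-1/2} H^(l) W_i^(l) ) ) の最小スケッチ
import numpy as np

def gcpn_conv(E, H, W_list):
    """E: (b, n, n) 結合種類毎の隣接テンソル / H: (n, d) ノード埋め込み / W_list: b個の (d, d') 重み"""
    n = E.shape[1]
    outs = []
    for i, W in enumerate(W_list):
        E_i = E[i] + np.eye(n)                                # Ẽ_i = E_i + I(自己ループの追加)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(E_i.sum(axis=1)))  # D̃_i^{-1/2}
        msg = D_inv_sqrt @ E_i @ D_inv_sqrt @ H @ W           # 結合種類 i に関する正規化畳み込み
        outs.append(np.maximum(msg, 0.0))                     # ReLU
    return np.sum(outs, axis=0)                               # AGGとしてSUMを仮定(実験設定に合わせた)
```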

13.

Graph Convolutional Policy Network (GCPN) • 行動の予測 – グラフにおけるリンク予測の要領で、a_{t+1} = concat(a_first, a_second, a_edge, a_stop) を推定する • 前項で計算したノード埋め込みベクトルを使ってどのノードを最初に選択するか決める – f_first(s_t) = softmax(m_f(X)), a_first ~ f_first(s_t) ∈ {0, 1}^n (m_f: ℝ^{n×k} → ℝ^n へ写像するMLP) • 最初に選択されたノードに関する情報も使ってどのノードを2番目に選択するか決める – f_second(s_t) = softmax(m_s(X_{a_first}, X)), a_second ~ f_second(s_t) ∈ {0, 1}^{n+c} • 選択された2つのノードの情報を使って結合の種類を決める – f_edge(s_t) = softmax(m_e(X_{a_first}, X_{a_second})), a_edge ~ f_edge(s_t) ∈ {0, 1}^b • 現在のグラフ全体の情報を使って生成プロセスを終了させるか決める – f_stop(s_t) = softmax(m_t(AGG(X))), a_stop ~ f_stop(s_t) ∈ {0, 1} 13
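4つの行動要素のサンプリングの流れは、概ね以下のように書ける(PyTorchによる説明用スケッチ。m_f, m_s, m_e, m_t は各出力次元へ写像するMLPという想定で、細部は論文実装と異なりうる)。

```python
import torch
import torch.nn.functional as F

def sample_action(X, X_scaffold, m_f, m_s, m_e, m_t):
    """X: (n, k) 生成中グラフのノード埋め込み / X_scaffold: (c, k) 候補構造(単一原子)の埋め込み"""
    X_all = torch.cat([X, X_scaffold], dim=0)                       # n + c 個の選択候補
    p_first = F.softmax(m_f(X).squeeze(-1), dim=0)
    a_first = p_first.multinomial(1)                                # 1つ目のノード(既存グラフ側)
    h_first = X[a_first].expand(X_all.size(0), -1)
    p_second = F.softmax(m_s(torch.cat([h_first, X_all], 1)).squeeze(-1), dim=0)
    a_second = p_second.multinomial(1)                              # 2つ目のノード(候補構造側もあり得る)
    p_edge = F.softmax(m_e(torch.cat([X[a_first], X_all[a_second]], 1)), dim=-1)
    a_edge = p_edge.multinomial(1)                                  # 結合の種類(b通り)
    p_stop = F.softmax(m_t(X.sum(0, keepdim=True)), dim=-1)         # AGG(X) としてSUMを仮定
    a_stop = p_stop.multinomial(1)                                  # 生成を終了するか
    return a_first, a_second, a_edge, a_stop
```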

14.

状態遷移 / 報酬 • 状態遷移 – 生成器が提案したノード / エッジが追加された分子に対して原子価チェックを行い、その時点でin-validだった場合は状態を更新せず再度行動のサンプリングを行う • 報酬 – Step reward • 原子価ルールに違反したかどうか + Adversarial reward: V(π_θ, D_φ) • Adversarial rewardを算出するDiscriminatorは一般的なGANフレームワークに従って学習する – Final reward • ドメイン固有の報酬(LogP, QED, 分子量等の組み合わせ)+ Adversarial reward: V(π_θ, D_φ) 14
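原子価チェック自体は、例えばRDKitのサニタイズ処理で代用する形のスケッチが書ける(論文の実装を写したものではない)。

```python
# 中間分子の原子価チェックの一例: SanitizeMol(catchErrors=True) は違反があるとエラー種別を返す
from rdkit import Chem

def passes_valency_check(mol):
    flag = Chem.SanitizeMol(mol, catchErrors=True)   # 例外を投げずにフラグを返すモード
    return flag == Chem.SanitizeFlags.SANITIZE_NONE  # SANITIZE_NONE == エラーなし
# チェックに通らない場合は s_{t+1} = s_t のまま行動を再サンプリングする
```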

15.

方策勾配ベースの手法による方策の学習 • Proximal Policy Optimization (PPO) [Schulman+, 2017] により方策を学習 – 通常の方策勾配法 • L^{PG}(θ) = Ê_t[log π_θ(a_t|s_t) Â_t] – Conservative Policy Iteration (CPI): 過去の方策との差分に注目 • L^{CPI}(θ) = Ê_t[(π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t)) Â_t] = Ê_t[r_t(θ) Â_t] – Proximal Policy Optimization (PPO): 方策の更新幅に制限を加えて学習を安定化させる • L^{CLIP}(θ) = Ê_t[min(r_t(θ) Â_t, clip(r_t(θ), 1−ε, 1+ε) Â_t)] 15
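L^{CLIP} の計算は以下のように書ける(PyTorchによる説明用の最小スケッチ)。

```python
import torch

def ppo_clip_loss(log_prob, log_prob_old, advantage, eps=0.2):
    """log_prob: log π_θ(a_t|s_t) / log_prob_old: 更新前方策での対数確率 / advantage: Â_t"""
    ratio = torch.exp(log_prob - log_prob_old)                     # r_t(θ)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)             # clip(r_t(θ), 1−ε, 1+ε)
    objective = torch.min(ratio * advantage, clipped * advantage)  # min(r_t Â_t, clip(...) Â_t)
    return -objective.mean()                                       # 最大化対象なので符号を反転して損失にする
```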

16.

アウトライン • 背景 • 関連研究 • 提案手法 • 実験・結果 16

17.

実験設定 • データセット – ZINCからサンプリングした25万件の分子を使用 – 原子数の上限: 38、原子の種類: 9、結合の種類: 3 • GCPNの設定 – 3層64次元の中間層 + 各層の出力に対してBatch Normalizationを適用 – Aggregation functionにはSUMを採用 – RLの学習率は0.001、expert pretrainingについての学習率は0.00025 – Adam optimizer、バッチサイズ: 32 • ベースライン – JT-VAEとORGANをベースラインに設定 17

18.

実験1: 属性最適化 • 下記二種の属性値を最大化することを目的に実験を行った – Penalized logP: ring sizeや合成可能性スコアも含めたLogP(疎水性)スコア – QED: Drug-likenessを測る指標 • 一貫して既存法よりも優れた結果を達成 – LogP: JT-VAEと比較して約61%、ORGANと比較して約186%の改善 – Step-wiseの原子価チェックにより、in-validな分子は全く生成されなかった • スコアが非常に高い、非現実的な分子を生成してしまう例が稀に見られた – 下図2(a)右下の分子のように、Penalized logPは非常に良いが非現実的であるような、スコア関数の欠陥を突くような生成例も存在した 図引用: You+, 2018, Figure 2 (Samples of generated molecules in property optimization and constrained property optimization task) 18
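補足として、logPとQEDはRDKitで計算できる(Penalized logPの合成可能性スコア項やring size項は省略した最小のスケッチ)。

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, QED

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # 例: アスピリン
print(Descriptors.MolLogP(mol))  # Crippen logP(疎水性の指標)
print(QED.qed(mol))              # QED: drug-likeness を 0〜1 で表すスコア
```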

19.

実験2: 属性ターゲティング • LogP、分子量を特定の値域に収めることを目的に実験を行った – スコアが範囲に収まっているかどうかに加えて、生成物の多様性も含めて評価を行った – 多様性は生成物同士の全ペアに対するMorgan Fingerprintのタニモト距離平均で評価(下記のスケッチも参照) • 値が大きいほど多様性が高い • 値域の制御については一貫して既存法よりも優れた結果を達成 – 多様性については一部他手法より劣っているものの、致命的なものは無く、値域の制御と多様性を両立できていると言える 19
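この多様性指標(Morgan Fingerprintのタニモト距離の全ペア平均)は、RDKitで以下のように計算できる(説明用の最小スケッチ)。

```python
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def diversity(smiles_list):
    """生成分子集合の多様性: 全ペアのタニモト距離 (1 − 類似度) の平均。大きいほど多様"""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
           for s in smiles_list]
    dists = [1.0 - DataStructs.TanimotoSimilarity(a, b) for a, b in combinations(fps, 2)]
    return sum(dists) / len(dists)

print(diversity(["c1ccccc1", "CCO", "CC(=O)O"]))  # 例
```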

20.

実験3: 制約付き属性最適化 • 所与の分子との類似度とPenalized logPとの両立を目的に実験を行った – 800個ピックアップしたZINC分子との類似度を最適化後、Penalized logPについて最適化 する – JT-VAEについては目的関数による制御ができないため、類似度の閾値δでフィルタを行った • 一貫して既存法よりも優れた結果を達成 – Penalized logPの改善幅については平均して148%の改善を達成 – 元の分子の部分構造を保ちながら、目的関数を最適化する新たな分子の生成に関して 一定水準の品質で成功した 20

21.

結論 • Graph Convolutional Policy Network(GCPN)を提案し、分子設計に適用した – 分子属性の最適化、ターゲティングなどのタスクにおいて、既存の手法を上回る性能を 示した – 生成過程で原子価チェックが入るため、原子価に違反した分子が生まれることは無い • GCPNは一般的な枠組みであり、分子生成以外の分野にも適用可能 – 電子回路やSNSなどの分野でも、ドメイン固有の目的関数を変更することで適用可能だと 考えられる 21

22.

References • Text-based generative models – Gómez-Bombarelli, Rafael, et al. "Automatic chemical design using a data-driven continuous representation of molecules." ACS central science 4.2 (2018): 268-276. – Guimaraes, Gabriel Lima, et al. "Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models." arXiv preprint arXiv:1705.10843 (2017). – Yu, Lantao, et al. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI. 2017. • Graph-based generative models – You, Jiaxuan, et al. "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation." NIPS (2018 to appear). – Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018). – Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." ICML (2018). – Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." ICLR (2017). • Others – Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). 22