[DL Seminar] Relational inductive biases, deep learning, and graph networks

July 06, 18

#deep learning #graph networks #relational reasoning #combinatorial generalization #structured knowledge

Slide overview

2018/06/29
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Deep Learning JP

@DeepLearning2023

Text of each page

DEEP LEARNING JP [DL Seminar] Relational inductive biases, deep learning, and graph networks Hiromi Nakagawa, Matsuo Lab http://deeplearning.jp/

Bibliographic information
• Posted to arXiv on 2018/06/04
  – https://arxiv.org/abs/1806.01261
  – position paper
  – 23 pages
• Authors (27 people)

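Summary
• Recent progress in deep learning has been driven by massive data and compute
  – Train by feeding in large amounts of data
  – No hand-engineering needed; being end-to-end is treated as the right way
• However, many real-world problems cannot be solved by that alone
  – To perform human-like tasks, models need to use structured knowledge, as humans do
• The paper proposes Graph Networks as a more general approach to reasoning
  – A network that generalizes FCNs, CNNs, RNNs, and similar architectures
  – Lets structured knowledge and the flexibility of deep learning complement each other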

Agenda 1. Introduction 2. Relational inductive biases 3. Graph Network 4. Design principles for graph network architectures 5. Discussion

1. Introduction
• Human intelligence: "Infinite use of finite means"
  – Create anything by combining a small number of entities
  – combinatorial generalization
  – Produce new inferences, predictions, and behaviors from known elements (building blocks)
• When learning
  – Fit new knowledge into existing structured representations
  – Or adjust the structure itself

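1. Introduction
• Structured approaches are a long-standing theme in AI research
• Traditional machine learning emphasized structured approaches
  – Compute and data were scarce
  – Improving sample efficiency through strong inductive biases was highly valuable
Inductive bias
• Constraints imposed during learning, for example on the relationships between entities
• Needed in order to predict data that does not appear in the training data
• Trades model flexibility for better sample efficiency
• Examples
  – The squared-error objective in linear regression
  – The prior distribution in Bayesian inference
  – Regularization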

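1. Introduction
• Recently, DL that learns end-to-end under minimal a priori representations and assumptions has been favored
  – Explicit structure and hand-engineering are avoided
  – Major successes in image recognition, natural language processing, and other areas
• On the other hand, tasks that require combinatorial generalization are difficult for conventional DL approaches
  – Complex language and scene understanding
  – Reasoning over structured data
  – Transfer learning to environments not seen during training
  – Learning from a small amount of experience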

1. Introduction
• Combinatorial generalization matters for AI research going forward
  – Methods that achieve both structure and flexibility are needed
• Such research existed before, but recent work uses graphs to reason about explicitly structured data while keeping the DL framework
  – Can handle discrete entities and the relations between them
  – Can learn the representations and structure of entities and relations = relational inductive bias
• This paper describes Graph Networks, which unify and extend existing methods, as a general framework for entity- and relation-based reasoning
  – It also discusses design principles for building more effective architectures

2. Relational inductive biases
• Relational reasoning
  – Manipulating structured representations of entities and relations, using rules for how they can be composed
  – entity: an element that has attributes
    • Example: a physical object with a size and a mass
  – relation: a property between entities; it can itself have attributes
    • Examples: "Same size as", "Heavier than"
  – rule: a function that maps entities and relations to other entities and relations
    • Example: "Is entity X heavier than Y?"
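
The three terms above can be made concrete with a small sketch. Everything here is illustrative: the class names `Entity` and `Relation`, the helper `derive_relations`, and the example objects are assumptions, not something defined on the slide or in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # An entity is an element with attributes (here: size and mass).
    name: str
    size: float
    mass: float


@dataclass(frozen=True)
class Relation:
    # A relation is a property that holds between two entities.
    kind: str
    subject: str
    other: str


def is_heavier(x: Entity, y: Entity) -> bool:
    # Rule: "Is entity X heavier than Y?"
    return x.mass > y.mass


def derive_relations(entities):
    # Apply the rule to every ordered pair and keep the relations that hold.
    return [
        Relation("heavier_than", x.name, y.name)
        for x in entities
        for y in entities
        if x is not y and is_heavier(x, y)
    ]


# Hypothetical objects, used only for illustration.
objects = [
    Entity("ball", size=0.1, mass=0.5),
    Entity("brick", size=0.2, mass=2.0),
    Entity("crate", size=0.5, mass=8.0),
]
relations = derive_relations(objects)
```

Here the rule is hand-written; the point of the paper is that such functions over entities and relations can instead be learned.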

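2. Relational inductive biases
• Many deep learning methods also use some relational inductive biases
  – For example, the bias toward hierarchical processing that comes from stacking multiple layers, and the biases introduced by specialized blocks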

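2. Relational inductive biases
• Many deep learning methods also use some relational inductive biases
  – None of them can serve as the "default"
  – We need an algorithm that can handle arbitrary relational structure and discover the rules over entities and relations
• Many real-world entities have no inherent order
  – They are ordered through the relations between entities (size, age, price, ...)
  – Order invariance is a property that relational reasoning should satisfy
• Entities with no defined order, invariant to ordering → sets
  – When a set is processed with an MLP, the result is order-invariant only if a symmetric aggregation is used, such as summing the outputs
  – On the other hand, when relations exist between specific elements of the set, aggregation alone does not work
  – What should we do?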
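
A minimal sketch of the order-invariance point, assuming a toy per-element function in place of a trained MLP. The function names and the weights are hypothetical; the only claim is that summing per-element outputs ignores input order, while a model with one weight per input position does not.

```python
import math


def phi(x, w=0.5, b=0.1):
    # Per-element function shared by all set members (stand-in for an MLP).
    return math.tanh(w * x + b)


def sum_pool(xs):
    # Symmetric aggregation: apply phi to each element, then sum the outputs.
    return sum(phi(x) for x in xs)


def positional_model(xs, weights=(1.0, -2.0, 3.0)):
    # Order-sensitive baseline: every input position has its own weight.
    return math.tanh(sum(w * x for w, x in zip(weights, xs)))


a = [0.2, 1.5, -0.7]
b = [1.5, -0.7, 0.2]  # the same set, presented in a different order

same_after_pooling = math.isclose(sum_pool(a), sum_pool(b))
same_without_pooling = math.isclose(positional_model(a), positional_model(b))
```

With these values `same_after_pooling` is true and `same_without_pooling` is false. The sum, however, discards which element interacted with which, which is the limitation the last two bullets raise.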

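2. Relational inductive biases
• Real-world systems mix parts that are related with parts that are not → use graphs
  – Components that interact only with their neighbors = graph structure
  – Graphs can represent arbitrary relational structure, and computations over graphs can reflect inductive biases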

3. Graph networks
• Graph neural networks have been shown to be effective on a variety of tasks
  – visual scene understanding tasks (Raposo et al., 2017; Santoro et al., 2017)
  – few-shot learning (Garcia and Bruna, 2018)
  – learn the dynamics of physical systems (Battaglia et al., 2016; Chang et al., 2017; Watters et al., 2017; van Steenkiste et al., 2018; Sanchez-Gonzalez et al., 2018)
  – multi-agent systems (Sukhbaatar et al., 2016; Hoshen, 2017; Kipf et al., 2018)
  – reason about knowledge graphs (Bordes et al., 2013; Onoro-Rubio et al., 2017; Hamaguchi et al., 2017)
  – predict the chemical properties of molecules (Duvenaud et al., 2015; Gilmer et al., 2017)
  – predict traffic on roads (Cui et al., 2018)
  – classify and segment videos (Wang et al., 2018c) and 3D meshes and point clouds (Wang et al., 2018d)
  – classify regions in images (Chen et al., 2018a)
  – perform semi-supervised text classification (Kipf and Welling, 2017)
  – machine translation (Vaswani et al., 2017; Shaw et al., 2018; Gulcehre et al., 2018)
  – model-free (Wang et al., 2018b) and model-based (Hamrick et al., 2017; Pascanu et al., 2017; Sanchez-Gonzalez et al., 2018) continuous control
  – model-free reinforcement learning (Hamrick et al., 2018; Zambaldi et al., 2018)
  – more classical approaches to planning (Toyer et al., 2017)

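3. Graph networks
• Graph network (GN) block
  – A "graph-to-graph" module that takes a graph as input and returns a graph as output
  – A graph is written G = (u, V, E)
    • u: a property of the whole graph, the global attribute (example: the gravitational field)
    • V: the set of nodes (v_i), which are the entities (example: individual balls, each with attributes such as position and velocity)
    • E: the set of edges (e_k), which are the relations (example: whether a spring connects two balls, with the spring constant as its attribute)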
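
One way to hold G = (u, V, E) in code, as a sketch only. The `receiver` and `sender` indices follow the original paper, where each edge is stored with the indices of the nodes it connects; the class names, attribute layouts, and numeric values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edge:
    attr: List[float]  # e_k, here [spring_constant]
    receiver: int      # r_k: index of the node the edge points to
    sender: int        # s_k: index of the node the edge comes from


@dataclass
class Graph:
    u: List[float]            # global attribute, here a gravity vector
    nodes: List[List[float]]  # V: one attribute vector per ball [x, y, vx, vy]
    edges: List[Edge]         # E: one entry per directed spring connection


# Three balls; balls 0 and 1 are joined by a spring, ball 2 is free.
# All numbers are made up for illustration.
balls = Graph(
    u=[0.0, -9.8],
    nodes=[
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
    ],
    edges=[
        Edge(attr=[50.0], receiver=1, sender=0),
        Edge(attr=[50.0], receiver=0, sender=1),
    ],
)
```

An undirected spring is represented as two directed edges, so that each ball receives the force exerted on it by the other.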

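3. Graph networks
• A GN block has update functions φ and aggregation functions ρ
  – φ updates each node, edge, and global attribute
  – ρ takes a set as input and outputs a single element summarizing it; it must be order-invariant and accept variable-length input
• Computation steps in the spring-and-ball example (figure labels)
  1. Update the tension between each pair of balls
  2. Aggregate all the tension acting on each ball
  3. Update each ball's position, velocity, and so on
  4. Aggregate the total tension over the whole system (= 0)
  5. Aggregate the kinetic energy of the whole system
  6. Update the total energy of the whole system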
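
The six steps can be sketched as one function. This is a simplified illustration with scalar attributes and hand-written toy functions; in a real GN block the φ functions would be learned (for example, neural networks). The name `gn_block`, the edge tuple layout `(attribute, receiver, sender)`, and the toy physics are assumptions for this example.

```python
def gn_block(u, nodes, edges, phi_e, phi_v, phi_u, rho=sum):
    """One pass of a GN block. edges is a list of (attribute, receiver, sender)."""
    # 1. Update each edge from its attribute, its two endpoint nodes, and u.
    new_edges = [(phi_e(e, nodes[r], nodes[s], u), r, s) for e, r, s in edges]
    # 2. For each node, aggregate the updated edges that point to it.
    per_node = [
        rho([e for e, r, _ in new_edges if r == i]) for i in range(len(nodes))
    ]
    # 3. Update each node from its aggregated edges, its attribute, and u.
    new_nodes = [phi_v(agg, v, u) for agg, v in zip(per_node, nodes)]
    # 4. Aggregate all updated edges; 5. aggregate all updated nodes.
    all_edges = rho([e for e, _, _ in new_edges])
    all_nodes = rho(new_nodes)
    # 6. Update the global attribute.
    new_u = phi_u(all_edges, all_nodes, u)
    return new_u, new_nodes, new_edges


# Toy 1-D system: node attribute = position, edge attribute = spring constant.
def spring_force(k, x_receiver, x_sender, u):
    return k * (x_sender - x_receiver)  # force pulling the receiver


def move(force, x, u):
    return x + 0.1 * force  # small step in the direction of the net force


def net_force(total_edge, total_node, u):
    return u + total_edge  # global attribute tracks the summed force


nodes = [0.0, 1.0, 3.0]
edges = [(2.0, 0, 1), (2.0, 1, 0), (2.0, 1, 2), (2.0, 2, 1)]
new_u, new_nodes, new_edges = gn_block(
    0.0, nodes, edges, spring_force, move, net_force
)
```

Because every spring appears as a pair of opposite directed edges, the aggregated force over all edges cancels, matching the "(= 0)" note in step 4. Python's built-in `sum` is used as ρ because it is order-invariant and accepts any number of inputs, including none.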

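3. Graph networks
• GNs can build strong inductive biases into learning
  – They can represent arbitrary relations between entities
    • The input to the GN determines which representations interact and which stay isolated (cf. fixed architectures)
  – They represent entities and relations in an order-invariant way
    • To encode order, add an index to the attributes (example: positional encoding)
  – Per-edge and per-node functions are reused (shared) across the whole network
    • Graphs with different numbers of nodes and edges, or different connectivity, are handled uniformly
    • Combinatorial generalization is supported automatically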
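
A small sketch of the last two properties, using a hypothetical hand-written message-passing step rather than a learned GN. The same function runs unchanged on graphs of different sizes, and relabeling the nodes only relabels the output.

```python
def message_pass(nodes, edges):
    # Shared per-edge function: message = sender value - receiver value.
    # Shared per-node function: add half of the summed incoming messages.
    incoming = [0.0] * len(nodes)
    for receiver, sender in edges:
        incoming[receiver] += nodes[sender] - nodes[receiver]
    return [v + 0.5 * m for v, m in zip(nodes, incoming)]


def chain(n):
    # Bidirectional chain graph 0 - 1 - ... - (n-1) as (receiver, sender) pairs.
    forward = [(i, i + 1) for i in range(n - 1)]
    backward = [(i + 1, i) for i in range(n - 1)]
    return forward + backward


# The same functions handle graphs of different sizes.
values = [0.0, 1.0, 2.0]
small = message_pass(values, chain(3))
large = message_pass([float(i) for i in range(6)], chain(6))

# Relabel the nodes of the small graph: old index i becomes perm[i].
perm = [2, 0, 1]
values_p = [0.0] * 3
for i, v in enumerate(values):
    values_p[perm[i]] = v
edges_p = [(perm[r], perm[s]) for r, s in chain(3)]
out_p = message_pass(values_p, edges_p)
relabel_consistent = all(out_p[perm[i]] == small[i] for i in range(3))
```

No index appears inside the per-edge or per-node function, which is why order carries no meaning unless it is added explicitly as an attribute.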

4. Design principles for graph network architectures
• Focusing on GNs as learnable graph-to-graph function approximators, particularly as deep learning architectures, this section covers:
  – Flexible representations
  – Configurable within-block structure
  – Composable multi-block architectures

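4. Design principles for graph network architectures
• Flexible representations
  – Global, node, and edge attributes can use arbitrary representations
    • Real-valued vectors, tensors, sequences, sets, graphs
  – Outputs are also tensors, so they can be fed into an MLP, CNN, RNN, and so on
    • edge-focused output: when you want to know about the interactions between entities
    • node-focused output: when you want to reason about a physical system
    • graph-focused output: when you want to predict the potential energy of a physical system
    • These can also be combined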

4. Design principles for graph network architectures
• Flexible representations
  – The relational structure of the input can be specified explicitly
    • knowledge graphs, social networks, parse trees, optimization problems, chemical graphs, road networks, and physical systems with known interactions

4. Design principles for graph network architectures
• Flexible representations
  – The relational structure can also be left unspecified and inferred
    • visual scenes, text corpora, programming language source code, and multi-agent systems
  – However, when the relations are entirely unknown, connecting everything to everything makes the computation explode, so methods for inferring sparse structure from unstructured data are needed [Kipf et al. 2018]
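
To make the cost argument concrete: a fully connected directed graph on n nodes has n(n-1) edges, so the per-edge work grows quadratically. The sketch below compares that with a k-nearest-neighbor graph. The kNN heuristic is only a simple stand-in chosen for this example; it is not the structure-inference method of Kipf et al. (2018), and the function names are hypothetical.

```python
def fully_connected_edges(n):
    # Every ordered pair of distinct nodes: n * (n - 1) directed edges.
    return [(r, s) for r in range(n) for s in range(n) if r != s]


def knn_edges(positions, k):
    # Each node receives edges only from its k nearest neighbors (1-D positions).
    edges = []
    for r, p in enumerate(positions):
        by_distance = sorted(
            (abs(p - q), s) for s, q in enumerate(positions) if s != r
        )
        edges += [(r, s) for _, s in by_distance[:k]]
    return edges


positions = [float(i) for i in range(100)]
dense = fully_connected_edges(len(positions))
sparse = knn_edges(positions, k=3)
edge_counts = (len(dense), len(sparse))
```

For 100 nodes this gives 9900 edges in the dense case against 300 in the sparse one, and the gap widens as n grows.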

4. Design principles for graph network architectures
• Configurable within-block structure
  – The functions inside a GN block can be configured in many ways

4. Design principles for graph network architectures
• Composable multi-block architectures
  – Complex architectures can be built by combining GN blocks
  – Example: (c) Recurrent GN architecture → trajectory of a dynamical system over time
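
Because a block maps a graph to a graph, blocks can be chained or applied repeatedly. The sketch below shows both patterns with a hypothetical toy block standing in for a learned GN block; `compose`, `rollout`, and `step` are names invented for this example, and the recurrent variant here simply reuses one block at every time step without any extra hidden state.

```python
from functools import reduce


def compose(*blocks):
    # Sequential composition: the output graph of one block feeds the next.
    return lambda graph: reduce(lambda g, block: block(g), blocks, graph)


def rollout(block, graph, steps):
    # Recurrent use: apply the same block repeatedly and keep the trajectory.
    trajectory = [graph]
    for _ in range(steps):
        graph = block(graph)
        trajectory.append(graph)
    return trajectory


def step(graph):
    # Toy graph-to-graph block: each node moves a quarter of the way toward
    # the nodes it receives edges from. Edges are (receiver, sender) pairs.
    nodes, edges = graph["nodes"], graph["edges"]
    incoming = [0.0] * len(nodes)
    for receiver, sender in edges:
        incoming[receiver] += nodes[sender] - nodes[receiver]
    moved = [v + 0.25 * m for v, m in zip(nodes, incoming)]
    return {"nodes": moved, "edges": edges}


start = {"nodes": [0.0, 4.0], "edges": [(0, 1), (1, 0)]}
two_blocks = compose(step, step)(start)
trajectory = rollout(step, start, steps=2)
positions_over_time = [g["nodes"] for g in trajectory]
```

Stacking two different blocks and unrolling one shared block differ only in whether parameters are shared across steps; the graph-in, graph-out interface is what makes both possible.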

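5. Discussion
• Combinatorial generalization in graph networks
  – The structure of GNs supports combinatorial generalization
    • Computation is shared not only at the level of the whole system but also across entities and relations
  – Even for a previously unseen system, inference is possible as long as its constituent elements are understood
• Limitations of graph networks
  – Recursion, control flow, and conditional iteration are hard to express with graphs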

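5. Discussion
• Open questions
  – Devising the best way to turn sensor readings into structured representations such as graphs
    • Fully connecting everything is an option, but entities must first be defined and sparse representations obtained
  – Handling situations where the graph structure changes during computation
    • For example, a node splits
  – Developing more interpretable analysis and visualization methods
    • Interpretability is already fairly high, because the approach is close to human cognition, in which the world is composed of objects and relations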

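Conclusion
• AI has advanced greatly in recent years, especially through DL, but a large gap remains between it and the efficient, generalizable learning of human intelligence
• The authors argue that this should be addressed through combinatorial generalization
• By combining DL with structured knowledge from human cognition and traditional engineering, they propose a framework that reflects strong relational inductive biases while still learning flexibly, and that treats prior Graph Network research in a unified way
• Graph Networks are still maturing, but they look promising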

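Impressions
• The observation that many existing DL models already reflect inductive biases, the idea of generalizing them as Graph Networks on that basis, and the representation in terms of entities, relations, and rules are all interesting
• Having the relations and rules themselves be learned seems interesting, for example gaining insight through visualization and interpretation and then feeding that insight back into the model
• How concrete data and structured knowledge would be defined in the proposed framework, and what results training would produce, was hard to picture
  – Reading the references might clarify this to some extent
  – There are about 180 of them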