第10回配信講義　計算科学技術特論A（2025）

Computational Science Alliance T h e Un ive r s i t y of To k yo テンソルネットワークを用いた大規模計算東京大学大学院理学系研究科大久保毅 1

講師の背景毅（OKUBO Tsuyoshi) 大久保東京大学理学系研究科量子ソフトウェア寄付講座特任准教授研究分野：統計物理、物性物理、磁性、計算物理 • • • 剛体円盤のランダムパッキング階層社会形成の平均場解析・シミュレーション（古典）フラストレート磁性体の新規秩序 • • • multiple-Q 状態, Z2-vortex,… 脱閉じ込め量子臨界現象テンソルネットワーク • （量子）スピン系、テンソル繰り込み • • Monte Carlo 機械学習、量子アルゴリズム …. 2

3.

今日の話の流れ • はじめに • • • • テンソルとテンソルネットワークテンソルネットワーク計算の基礎 • 基本的な演算 • 近似的なテンソルネットワーク縮約：テンソル繰り込み群 • 変分法による固有値問題への応用テンソルネットワーク計算の大規模化 • テンソルネットワーク計算の実装・特徴 • アルゴリズムに特化した並列化：HOTRG • 汎用的なライブラリを使った並列化：TeNeS まとめ 3

4.

はじめに：テンソルとテンソルネットワーク 4

5.

テンソル？ • ベクトル： 1次元的な数字の並び • 行列： 2次元的な数字の並び一般化 • (n階の）テンソル： n次元的な数字の並び【基本的な演算＝縮約】行列積：縮約： "足"が多くなると表記が複雑... 5

6.

ダイアグラムを用いたテンソル表記 R. Penrose, Combinatorial Mathematics and its Applications, 221-244 (1971) • ベクトル： • 行列： • テンソル：テンソルの積（縮約）の表現＊n階のテンソル＝n本の足 C ＝ B A C D ＝ B A 6

7.

縮約の計算量 3本行列積： A, B = C ＝ B A の計算量= テンソル縮約： A= B= 4本 C ＝ A B の計算量= ダイアグラムとの対応 • • 縮約の計算量はダイアグラムの足の数で分かる（メモリ使用量も分かる） 4本 7

8.

縮約の計算量と計算順テンソル縮約： A= B= C= C D ＝ B A Case 1: の計算量= Case 2: の計算量= 縮約の評価順で計算量が変わる！ C B A AB C A B AC ＊最適順序の決定はNP困難。実用的なアルゴリズム例 R.N.C. Pfeifer, et al., Phys. Rev. E 90, 033315 (2014). 8

9.

テンソルネットワーク：テンソルの縮約で構成されたネットワークテンソルネットワーク（TN）【（ざっくりした）分類】 • • Openな足：あり or なし • Openな足があり：TN自身が大きなテンソル • Openな足がなし：TNは数字ネットワーク構造：規則的 or 不規則 • • ネットワーク構造は問題に応じて変わる • 例：スピン模型の分配関数は規則的 • 例：分子の多体電子状態は不規則ネットワークサイズ：有限 or 無限 • 基本的に有限だが、場合によっては無限系も取り扱える 9

10.

テンソルネットワークの例1：統計物理学古典イジング模型（磁性体のモデル）温度T での確率分布：ボルツマン分布状態：逆温度：熱力学自由エネルギー分配関数： (2Nの和) • Openな足は"なし" 【分配関数のテンソルネットワーク表示】• 規則的 A A • 有限〜無限 A A A A A A A A 10

11.

[beta]

テンソルネットワークの例2：量子回路
量子回路：

googleの"量子超越" 回路
F. Arute, et al., Nature 574, 505 (2019)

量子ビットに演算するゲート操作の回路図
Article

developed fast, high-fidelity gates that can be executed simultaneously
across a two-dimensional qubit array. We calibrated and benchmarked
the processor at both the component and system level using a powerful
new tool: cross-entropy benchmarking11. Finally, we used componentlevel fidelities to accurately predict the performance of the whole system, further showing that quantum information behaves as expected
when scaling to large systems.

2次元のベクトル。

適当な基底の元では、ユニタリ

適当な基底、（ | 0⟩, | 1⟩）で

行列（or "テンソル"）

a

A suitable computational task

表現すると2次元の複素ベクトル

To demonstrate quantum supremacy, we compare our quantum processor against state-of-the-art classical computers in the task of sampling
the output of a pseudo-random quantum circuit11,13,14. Random circuits
are a suitable choice for benchmarking because they do not possess
structure and therefore allow for limited guarantees of computational
hardness10–12. We design the circuits to entangle
b a set of quantum bits
Single-qubit
gate: logi(qubits) by repeated application of single-qubit and
two-qubit
25 ns
cal operations. Sampling the quantum circuit’s output produces a set
of bitstrings, for example {0000101, 1011100, …}.
Owing to quantum
Qubit
XYbitstrings
control
interference, the probability distribution of the
resembles
a speckled intensity pattern produced by light interference in laser
Two-qubit gate:
scatter, such that some bitstrings are much more likely
to occur than
12 ns
others. Classically computing this probability distribution
becomes
Qubit 1
Z control
exponentially more difficult as the number of qubits
(width) and number
of gate cycles (depth) grow.
Coupler
We verify that the quantum processor is working properly using a
2 compares how
benchmarking11,12,14,Qubit
which
A
B
CmethodDcalled cross-entropy
Z control
often each bitstring is observed experimentally with its corresponding
m
5
6
7
8
ideal probability computed via simulation on a classical computer. For
a given
circuit,into
we four
collect
the measured
bitstrings
i} and compute the
couplers
are divided
subsets
(ABCD), each
of which is{x
executed
11,13,14
•
linear
cross-entropy
benchmarking
fidelity
(see
also
Supplementary
simultaneously across the entire array corresponding to shaded colours.
Here
Information),
which
is
the
mean
of
the
simulated
probabilities
of the
we show an intractable sequence (repeat ABCDCDAB); we also use different
bitstrings
we
measured:
coupler subsets along with a simplifiable sequence (repeat EFGHEFGH, not

googleの"量子超越"
回路 F. Arute, et al., Nature 574, 505 (2019)
Article
a
0
0

C

A

0

B
D

0
0

Row

Column

W
X
X
W
Y

Time
Cycle 1

A

B
2

D

C
3

4

Fig. 3 | Control operations for the quantum supremacy circuits. a, Example
quantum circuit instance used in our experiment. Every cycle includes a layer
each of single- and two-qubit gates. The single-qubit gates are chosen randomly
from { X , Y , W }, where W = (X + Y )/ 2 and gates do not repeat sequentially.
The sequence of two-qubit gates is chosen according to a tiling pattern,
coupling each qubit sequentially to its four nearest-neighbour qubits. The

量子回路=テンソルネットワーク
single-qubit gates chosen randomly from
on all qubits,
followed by two-qubit gates on pairs of qubits. The sequences of gates
which form the ‘supremacy circuits’ are designed to minimize the circuit
depth required to create a highly entangled state, which is needed for

Adjustable coupler

b

Openな足は"あり"
• "なし"もある

shown) that can be simulated on a classical computer. b, Waveform of control
FXEB = 2n"P(xi)#i − 1
signals for single- and two-qubit gates.

量子コンピュータの古典シミュレーション
{ X, Y, W}
＝テンソルネットワークの縮約

Qubit

• 不規則

(1)

where nthese
is thecircuits
number
qubits,
P(xi)processor
is the probability
ofthan
bitstring
of running
onofthe
quantum
is greater
at xi
computed
for
the
ideal
quantum
circuit,
and
the
average
is
over
least 0.1%. We expect that the full
• data for Fig. 4b should have similar the
observed
bitstrings.
Intuitively,
is correlated
withtoo
how
often
FXEB
fidelities, but since the simulation times
(red
numbers) take
long
to we
sample
high-probability
bitstrings.
When
there
are
no
errors
in the
check, we have archived the data (see ‘Data availability’ section). The

有限

10 mm
Fig. 1 | The Sycamore processor. a, Layout of processor, showing a rectangular
array of 54 qubits (grey), each connected to its four nearest neighbours with
couplers (blue). The inoperable qubit is outlined. b, Photograph of the
Sycamore chip.

11

12.

テンソルネットワークによる量子回路シミュレーション量子回路のシミュレーション=テンソルネットワークの縮約古典コンピュータでの計算：実際の回路の実行順序によらず、最適な順番でテンソルの縮約計算を行うことで、計算コスト、メモリコストが低下最先端の計算： Y. A. Liu, et al., Gordon bell Prize in SC21 (2021), Googleが量子超越を主張したランダム量子回路の古典サンプリング 10,000年（最初の見積もり） 304秒！（cf. 量子コンピュータ=200秒） 12

https://www.doi.org/10.1145/3458817.3487399

13.

テンソルネットワークの例3：量子多体状態量子多体状態：基底量子スピン・bit：量子化学： i = 原子軌道・分子軌道の占有数係数はテンソル量子多体状態 (a) (b) テンソルネットワーク分解 (c) PEPS (for 2d system) <latexit sha1_base64="XkC8uXv0uT1Qilx+79HHlBouX2k=">AAACZ3icjVG7SgNBFD1ZXzE+EhVEsFFDxCrMiqBYBW0s4yMPSELYXSfJkH2xuwnE4A9Y2EawUhARP8PGH7DwE8RSwcbCu5sFsYh6h5l758w9956ZUW1duB5jzxFpaHhkdCw6HpuYnJqOJ2Zm867VcjSe0yzdcoqq4nJdmDznCU/nRdvhiqHqvKA2d/3zQps7rrDMI69j84qh1E1RE5ri+VA564pqIimnWWBLg4MkQstaiVuUcQwLGlowwGHCo1iHApdGCTIYbMIq6BLmUCSCc45TxIjboixOGQqhTVrrtCuFqEl7v6YbsDXqotN0iLmEFHtid+yNPbJ79sI+B9bqBjV8LR3yap/L7Wr8bOHw40+WQd5D45v1q2YPNWwFWgVptwPEv4XW57dPem+H2wep7iq7Zq+k/4o9swe6gdl+1272+cElYv/7gPx6WmZpeX8jmdkJvyKKRaxgjd57ExnsIYsc9W3gHD1cRF6kuDQvLfRTpUjImcMPk5a/AI6fiyk=</latexit> 量子相関の小さいテンソルに分解特徴を利用した近似 ~eNの独立要素 ~O(N)の独立要素量子多体系の低エネルギー状態： • 一般の状態（ランダムベクトル）に比べて、少ない量子相関 • c.f. エンタングルメントエントロピーの面積則テンソルネットワークによる高精度の近似 • Openな足は"あり" • 規則・不規則 • 有限・無限 13

14.

テンソルネットワークによる近似シミュレーション量子回路の近似シミュレーション古典コンピュータでの計算：量子回路に従って移り変わる量子状態をテンソルネットワークで近似的に表現する • 初期は小さいテンソルで表現可能 • • 非常に多くのqubitを古典コンピュータで取り扱える回路が深くなると、一般にテンソルが大きくなる • • 計算を進めるには（テンソルを小さく保つ）近似が必要深くなればなるほど、近似精度が低下 14

15.

10 TABLE 2 a f2 (x) with error bound setting to 10 3 . For the 10th-order tensor, all 9 k r̄ as well as the number of total parameters Np are compared. 9 4 5 5 7 9 テンソルネットワークの例4：テンソル型データ 1 2 3 4 Np 5 6 7 8 9 1512 1944 2084 2144 2732 2328 3088 10376 3312 任意のテンソル型データ 1512 1944 2064 2144 4804 4224 9424 7728 2080 1360 1828 1384 1360 2064 1324 1360 1788 1384 1360 1544 1348 1360 1556 1348 1360 1864 1384 1360 2832 1384 1360 1600 1272 ：量子多体状態と同様にして分解可能テンソルネットワーク分解テンソル型データ ce, sive results that are similar to that in noise free cases. TRpt ALSAR slightly overestimates the TR-ranks. It should be As noted that TR-BALS can estimate the true rank correctly m- and obtain the best compression ratio as TR-ALS given 情報の相関構造の In true rank. In addition, TR-BALS is more computationally ral efficient than TR-ALSAR. In summary, TT-SVD and TR特徴を利用した近似 are SVD have limitations for representing the tensor data with be symmetric ranks, and this problem becomes more severe TR when noise is considered. The ALS algorithm can avoid this ral problem due to the flexibility on distribution of ranks. More (Q. Zhao, et al arXiv:1606.05535) ata detailed results can be found in Table 3. s aCOIL-100 dataset = 32 x 32 x 3 x 7200 テンソル her 6.2 COIL-100 dataset ors We are 10. mal der ng ks. ed go3, en • Openな足は"あり" 1360 1324 1324 • 規則・不規則 • 有限 T 例1：画像データセットピクセル色画像数 M1 M2 r テンソルリング分解 M4 M3 例２：ニューラルネットワークの重み行列 (Z.-F. Gao et al, Phys. Rev. Research 2, 023300 (2020).) xi: input neuron (pixel) yi: output neuron matrix x and y Wij: weightZE-FENG GAO etconnecting al. since t grows out th of the ment e the sys The ap quantu [40,41 In t 15 neural

16.

[beta]

13th IEEE International Conference on Renewable Energy Research and Applications

TABLE I.

RESULTS FOR MAXIMUM COMPRESSION CASE

テンソルネットワークと機械学習
Method

compression ratio of target
layers (%) / # of total
parameters

MAPE (%)

NLL (-)

Original

100.00 / 35.06K

33.31

295.16

3

FIG. 1. Procedures to be iterated for optimizing the tensors and network
structure in1.98
the/ 5.36K
adaptive tensor
tree
(ATT) method.
SNIP
35.13
297.82
(a) Three candidate decompositions of a combined tensor
withH.
four
legs. Before
computing
the and
bondK.mutual
information
D. Watari,
Tanimoto,
T. Okubo,
S. Todo,
Nishiyama,
SNAP
0.43 / 2.87K
41.75
306.36
(BMI) for each configuration, the component tensors (red) are Proceedings
improved. (b)
For a target bond
thick black line), we
of ICRERA2024,
429(the
(2024)
SVDthe one with the2.97
/ 5.66K BMI41.86
obtain a bigger tensor with four legs by contracting it. We choose
smallest
among 306.22
the three possible
TT
1.64 / 5.26K
34.36
296.04
decompositions.

ニューラルネットワークの圧縮
13th IEEE International Conference on Renewable Energy Research and Applications

Input layers
Hidden layers

Output layers

…

…
…

Mixing
coefficient 𝜋𝜋𝑚𝑚

Mixing
coefficient 𝜋𝜋1𝑚𝑚

𝑗𝑗

𝑗𝑗2

…

…
…

Mean 𝜇𝜇𝑚𝑚

…

…

exp

Standard
deviation 𝜎𝜎𝑚𝑚

exp

exp

…

…

Embedding

Embedding (lookup-table)
(lookup-table)

𝑗𝑗3

exp

…

…

Mean 𝜇𝜇𝑚𝑚

…

…

…

…

…

…

…

…
…
…

…

…

…

input
𝒙𝒙

softmax

…

…

…
input
𝒙𝒙

Output layers

softmax

Input layers

Hidden layers

Standard
deviation 𝜎𝜎𝑚𝑚

Fig. 2. Model architecture of MDN model.

in

𝑗𝑗1
Weight
𝑗𝑗2
tensor

*Bold: best / Italic: second best
高圧縮率でも性能の劣化が非常に小さい
Fig. 6. Validation

November 9-13, Nagasaki, JAPAN

out
in
𝑖𝑖1 1
outWeight𝑗𝑗1

𝑗𝑗1
𝑖𝑖1 𝑟𝑟

𝓖𝓖

𝑖𝑖1
tensor

𝑟𝑟1

𝑖𝑖2𝓦𝓦 𝑗𝑗2

𝓖𝓖2

42

𝓖𝓖1
𝓖𝓖2

𝑖𝑖1

40

1

MAPE (%)

重み行列をTN分解して情報圧縮

th IEEE International Conference on Renewable Energy Research and Applications

November 9-13, Nagasaki, JAPAN

Original
TT
SVD
SNIP
SNAP

indicating its l
38
information und
2
𝓦𝓦
𝑟𝑟
𝐼𝐼
×
𝐽𝐽
performance ove
𝑗𝑗3
𝑖𝑖23
3
𝑗𝑗
𝑖𝑖
𝐼𝐼FIG.
× 𝐽𝐽 1. Procedures
𝓖𝓖
to be iterated
for
optimizing
the
tensors
and
network
structure
in
the
adaptive
tensor
tree
(ATT)
method. loss.
𝑖𝑖3
3
3
validation
3
36
𝑗𝑗
𝑖𝑖
𝓖𝓖
3
3
(a) Three candidate decompositions of a combined tensor with four legs. Before computing the bond mutual information
𝐼𝐼 = 𝑖𝑖1 × 𝑖𝑖2 × 𝑖𝑖3
pattern similar t
each configuration, the component tensors (red) are improved. (b) For a target bond (the thick black line), we
𝐼𝐼 = 𝑖𝑖1(BMI)
× 𝑖𝑖2 × 𝑖𝑖3for
𝐽𝐽 = 𝑗𝑗1 × 𝑗𝑗2 × 𝑗𝑗3
Decompose into
exhibited a perf
with four
legs by contracting
𝐽𝐽 = 𝑗𝑗1obtain
× 𝑗𝑗2 × 𝑗𝑗3a bigger tensor
𝑘𝑘 34 it. We choose the one with the smallest BMI among the three possible
Decompose
into
tensor
node
𝓖𝓖
and
decompositions. tensor node 𝓖𝓖𝑘𝑘 and
leading to highe
shrink tensor rank 𝑟𝑟𝑘𝑘
0
10
20
30
40
50
shrink tensor rank 𝑟𝑟𝑘𝑘
different compre
Fig. 3. Illustration of TT decomposition applied to a fully connected layer.
𝑖𝑖2

𝑗𝑗2

𝑖𝑖2 𝑟𝑟

𝑖𝑖2

Compression ratio of target layers (%)
Fig. 3. Illustration of TTOriginal
decomposition
a fully
connected
weightapplied
tensorto𝓦𝓦
(matrix
𝑾𝑾) islayer.
decomposed into three
Original weight interconnected
tensor 𝓦𝓦 (matrix
𝑾𝑾) nodes.
is decomposed into three
tensor
Fig. 5. Compression ratio vs accuracy (lower left is better).
interconnected tensor nodes.
(consumer ID, time slot, day of week, and holiday indicators)

Fig. 2. Model architecture of MDN model.

ボルンマシン表現による生成モデル

(consumer ID, time slot, day of week, and holiday indicators)
are processed
an embedding
layer,
mapping
them to
are processed
through anthrough
embedding
layer, mapping
them
to
dense
vector
representations
in
a
V-dimensional
Euclidean
an example
with where
d = 3 the
nodes,
where the original
dense vector representations in a V-dimensional Euclidean
illustrates an illustrates
example with
d = 3 nodes,
original
space.fully
Multiple
fully hidden
connected
hidden
layers
core ofmatrix
B. 𝓦𝓦
Performance
weight
matrix
𝑾𝑾
(regarded
as
the
tensor
) can be Comparison
space. Multiple
connected
layers
form the
coreform
of theweight
𝑾𝑾 (regarded as the tensor 𝓦𝓦 ) can be
the multi-layer
output
layer produces
represented
through the
contraction
of these three
nodes.I presents the results for the maximum compression
the multi-layer
perceptron.perceptron.
The outputThe
layer
produces
represented through
the contraction
of these
three nodes.
Table
for the
Gaussian distribution:
mixing
parametersparameters
for the mixture
of mixture
Gaussianofdistribution:
mixing
After TT decomposition,
(4) can
be reformulated
case.
The proposed TT method achieved significant parameter
After
equation (4) canequation
be reformulated
coefficients
𝜋𝜋𝑚𝑚𝜇𝜇,𝑚𝑚 means
𝜇𝜇𝑚𝑚 , anddeviations
standard 𝜎𝜎deviations
𝜎𝜎𝑚𝑚 . TT decomposition,
coefficients
𝜋𝜋𝑚𝑚 , means
, and standard
𝑚𝑚 .
by
transforming
the
output,
input,
and
bias
into
tensors
𝓗𝓗,
by transforming
the output, input, and bias into tensors 𝓤𝓤, 𝓗𝓗,
reduction,𝓤𝓤,compressing
the target layers to 1.64% of their
These parameters
learned by minimizing
logThese parameters
are learnedare
by minimizing
the negative the
log-negative
and
𝓑𝓑
respectively:
and
𝓑𝓑
respectively:
the loss
function
𝐿𝐿 over𝒟𝒟:
the dataset 𝒟𝒟:
likelihoodlikelihood
as the loss as
function
𝐿𝐿 over
the dataset
original size (5.26K total parameters). Despite this dramatic

K. Harada, T. Okubo, and N. Kawashima,
Mach. Learn.: Sci. Technol. 6 025002, (2025)

•

reduction, TT maintained the best MAPE (34.36%) and NLL
(296.04) among all compression methods, closely matching
�𝒙𝒙𝑝𝑝,𝑡𝑡 , 𝑦𝑦𝑝𝑝,𝑡𝑡 �∈𝒟𝒟�𝒙𝒙𝑝𝑝,𝑡𝑡 , 𝑦𝑦𝑝𝑝,𝑡𝑡 �∈𝒟𝒟
the original model's performance (MAPE: 33.31%, NLL:
• B. Tensor-Train
through TT decomposition
is achieved by is achieved by
B. Tensor-Train
Decomposition
for Model Compression
through TT decomposition
Decomposition
for Model CompressionCompression Compression
295.16). SNIP achieved similar accuracy (MAPE: 35.13%)
𝑘𝑘
the rank
. If the
size of tensor
constraining the
therank
rank 𝑟𝑟𝑘𝑘 . If the rank
the 𝓖𝓖sizewith
of tensor
𝓖𝓖𝑘𝑘𝑟𝑟𝑘𝑘with
constraining
TT decomposition
[14] is a [14]
powerful
TT decomposition
is a technique
powerful for
technique for
but
doesn't
reduce the actual model size as pruned weights are
𝑟𝑟𝑘𝑘 is sufficiently
it is possible
toitreduce
both the
number
compressing
high-dimensional
data structures
representing
𝑟𝑟𝑘𝑘 issmall,
sufficiently
small,
is possible
to reduce
both
the number
compressing
high-dimensional
databystructures
by representing
of model parameters
and
computational
complexity [10].complexity
The
zero-filled.
This demonstrates TT's ability to preserve model
them as athem
product
lower-dimensional
tensors. Thistensors.
study This
of model
parameters
and computational
[10]. The
as aofproduct
of lower-dimensional
study
rank
is a crucial
hyperparameter;
smaller
ranks
yield
higher
applies TTapplies
decomposition
to
the
fully
connected
layers
of
the
rank is a crucial hyperparameter; smaller ranks
yield higher even under extreme compression.
expressiveness
TT decomposition to the fully connected layers of the
𝑝𝑝,𝑡𝑡

𝑝𝑝,𝑡𝑡

𝑝𝑝,𝑡𝑡

𝑖𝑖1 ,…,𝑖𝑖𝑑𝑑

量子状態にTN分解を適用

𝑘𝑘
𝑖𝑖𝑘𝑘 ,𝑗𝑗𝑘𝑘 ,𝛼𝛼𝑘𝑘−1 ,𝛼𝛼𝑘𝑘
𝑖𝑖1 ,…,𝑖𝑖𝑑𝑑
𝑗𝑗1 ,…,𝑗𝑗𝑑𝑑 𝛼𝛼0 ,…,𝛼𝛼𝑑𝑑 𝑗𝑗𝑘𝑘1 ,…,𝑗𝑗𝑑𝑑 𝛼𝛼0 ,…,𝛼𝛼𝑑𝑑 𝑘𝑘

𝑘𝑘

+ 𝓑𝓑𝑖𝑖1𝑘𝑘,…,𝑖𝑖
𝓗𝓗𝑑𝑑𝑗𝑗(6)
+ 𝓑𝓑𝑖𝑖1 ,…,𝑖𝑖𝑑𝑑 (6)
𝑗𝑗1𝑖𝑖,…,𝑗𝑗
𝑘𝑘,𝛼𝛼𝑘𝑘−1 ,𝛼𝛼
1 ,…,𝑗𝑗𝑘𝑘
𝑘𝑘 ,𝑗𝑗𝑘𝑘

データに合うように

compression compression
rates but potentially
embedded MDN model. This approach aims to reduce the
rates but sacrifice
potentiallymodel
sacrifice model
embedded MDN model. This approach aims to reduce
the
expressiveness.
The
optimal
approach
is
to
set
the rank as low
number of parameters and computational complexity while
Figure
illustrates the performance across various
expressiveness.
The
optimal
approach
is
to
set
the rank5as low
number of parameters and computational complexity
while without compromising model accuracy.
as possible
preserving the network's expressive power. To distinguish
as possible without compromising model accuracy.
compression ratios. The TT method demonstrated remarkable
preserving
the network's
expressive
power.
To distinguish
tensors from
scalar, vector,
and matrix
quantities,
we use
This
TT
decomposition
technique
enables
the
creation
of
tensors
scalar,
vector,
use
consistency,
maintaining a MAPE around 34% even as the
This TT
decomposition
technique
enables
calligraphic
letters from
to denote
tensors
(e.g.,and
𝒜𝒜). matrix quantities, awe
more efficient MDN
model
for electrical load
forecasting
in the creation of
<latexit sha1_base64="vOJX0Msp2QRw56wrP2TXE8H3N2E=">AAACs3ichVE9LwRBGH6s7+9DI9FcXMhpZE4IkRCJRnk+DnHHZXfMMbFf2d3bhHV/QKFVqEgUolVJaDT+gMJPECWJRuHdvQ1B8E5m5pln3uedZ/Jqti5dj7GHGqW2rr6hsam5pbWtvaMz0dW97Fplh4sct3TLWdVUV+jSFDlPerpYtR2hGpouVrSd2fB+xReOKy1zydu1xbqhbpmyJLnqEVVMpLPpgi944FeGklPJQslRebBfyLryg97fGKkEa5ViIsWGWRTJnyATgxTiyFqJKxSwCQscZRgQMOER1qHCpZFHBgw2cesIiHMIyeheoIIW0pYpS1CGSuwOrVt0ysesSeewphupOb2i03RImcQAu2fn7JndsQv2yN5+rRVENUIvu7RrVa2wi50HvYuv/6oM2j1sf6r+9OyhhInIqyTvdsSEv+BVvb939Lw4uTAQDLJT9kT+T9gDu6UfmP4LP5sXC8d/+NHIS1QlRNSmzPem/ATLI8MZwvOjqZnpuGFN6EM/0tSVccxgDlnk6JVDXOIaN8qYklc0ZbOaqtTEmh58CcV4B51poSY=</latexit>

Notably, the
the best perform
close while off
trade-off betwee
favorable for TT
The consistent
compression rati
in various applic

手描き文字データ分布のTN表現

確率分布を量子状態を通して表現
𝓤𝓤 (3)= �
𝓖𝓖 � �
𝓗𝓗𝓖𝓖
𝓤𝓤 �= �
�
𝐿𝐿 = − � 𝐿𝐿 = − � log 𝑃𝑃�𝑦𝑦 �𝒙𝒙log�𝑃𝑃�𝑦𝑦 �𝒙𝒙(3)�
𝑝𝑝,𝑡𝑡

methods. Line
standard devia

TN構造を最適化

C. Matrix Rank
To understan
accuracy while
depth analysis o
weight matrices.
calculate two ke
matrix rank, de
values, indicate
space. The soft r
16
squared Frobeni

17.

NJO ror in the L2 norm due to this approximation is ǿ = ȕ + ȉ ǿ . ds number for the 2D and 3D systems in b and c, respectively. edure can be straightforwardly generalized by replacing bits with TJO(ɿY (2D) or octals (3D), that is, by replacing 1D line segments with or cubes (3D). parametrizing The maximal numbers are then given by ! umber of variables theSchmidt solution (NVPS) by more E(O) one order of magnitude. VJ ( U S R ) = ȉǿ (U)3ǿ (U 9L )Gǿ (U YM ) SR = 9L + Y!M −(Z−Z ) I n 2D and equation The (5) conceptual in 3D. similarity between the NBY E ( Y Z) = ɿ F r network algorithm presented here and those used in quanǿ = テンソルネットワークの広がり：偏微分方程式 hysics opens the possibility of conducting computational fluid Nikita Gourianov et. al., Nat. Comput. Sci. 2, 20 (2022). uct state algorithm. The INSE comprises a coupled set of partial mics on a quantum computer. Positions Xk correspond to a quadratic grid with 2n × 2n[ARTICLES points NATURE COMPUTATIONAL SCIENCE DPT(ɿY-CP “近似表現に必要なボンド次元” (coarse grid), and xl correspond to a fine sub-grid with 2N − n × 2N − n quations for the velocity field V and pressure p: a b grid points. The functions Rα and fα obey the orthonormality lts + DPT(ɿY256 y/L tifying interscale Throughout this work we fol- condition ∇ ·correlations. 7= subgrid 4 1 TDJ, t/T = 0.25 he standard approach in computational fluid dynamics and ! ! ! TDJ, t/T = 0.75 4 Ⱥ7 . Th 3/4 tize the computational domain. Each spatial dimension is disδ = u /(40A) and 3 G U Y ) = Ȃ = NBY E + E + ( 7 · ∇) 7 = −∇ Q + ȋ ∇ 7 ǿ (U 9L )3 Ȁ (U 9L ) =0 ǿ (U YM )G Ȁ (" ǿȀ M TDJ, t/T = 1.25 ȺU Y Z ed by 2N grid points, where N is a positive integer. In this way, 4 TDJ, t/Tas = 1.75 L M tensor shown in Fig. 2b are defined 1/2 locity field , 1,024 grid 4 N 1/4 kinematic viscosity ∇ is the nabla operator. Afterwhere discretizing “流れ場”が 2 and グリッド点上にある δαβ is the Kronecker delta. The parameter n = 1, …, N − 1, 256 grid , χȒ JK =( 25Z U) 4 labels the possible bipartitions of the square lattice in coarse and ional domain as described at the beginning of this Article, we , 0 ! x/L fine grids = 10 and n = 2). The Schmidt number d(n) 7(U SR ) = VJ (U SR )F̂J 1/2 3/4 1 (in Fig. 1a N 1 0 1/4 2 3 4 7 8 9 denotes the number of1 retained terms in5 the6 summation in equaJ= n th bipartition tion (2), and each product Rαfα is weighted by a Schmidt coefficient NATUR MPUTATIONAL SCIENCE | VOL 2 | JANUARY 2022 | 30–37 | www.nature.com/natcomputsci ARTICLES d c λ s Cartesian components ui are discrete functions of the grid appear in descending order λ1 ≥ λ2... ≥ λd(n), α ≥ 0. These coefficients 8 4 TGV, t/T = 0.2 t =1.75 =0.25 tor=0.75 tleast =1.25 s rq, where K is the spatial dimension and F̂J are Cartesian soa varying d(n) twill only add remove the important of b x/L TGV, t/T = 0.8 the orthonormal0.2basis ectors. We measure the interscale correlations4 by using the グリッド 8 functions. Here we take d(n) as a quantita0.5 0.8 0.2 0.5 0.8 0.2 0.5 0.8 0.2 0.5 0.8= 1.4 TGV, t/T idt (singular value) decomposition to systematically divide the tive measure for the interscale correlations of turbulent flows at a TGV, t/T = 2 0.7 8 4 1a for K = 2. utational grid into sub-grids, as illustrated in Fig. given bipartition n of the lattice: d(n) = 1 corresponds to an uncor-, 256 grid DNS D 0.5 grid , 64 ecompose (for details, see the Schmidt decomposition section related product state, and with increasing d(n) the flow becomes TGV (3D) 0.3 8 N N 4 χ ∼Re Methods) each component ui on this 2 × 2 grid into func- more strongly correlated between the coarse and the fine grid., χ = 207 TDJ (2D) 0.7 Note that although the1 d(n) = 1 product state exhibits no interscale R and f on a coarse二進数表記 and a fine subgrid,量子ビット系 respectively: 1 流体力学方程式の数値解法 2 box 5 0 4 0 0 3 d(n) 0 2 2 2 99 box 4 4 0 box 0 3 3 99 101 d(n) 2 y/Lbox χ99 0 102 Re 0 2 3 3 0.71 χ = 118 E COMPUTATIONAL SCIENCE | VOL 2 | JANUARY 2022 | 30–37 | www.nature.com/natcomputsci 99 0.5 3 10 0.3 1 2 3 4 n th bipartition 5 6 χ 7 31 Fig. 1 | Interscale correlations of turbulent fluid flows. 0.7 Panels a and b correspond to a square with edge length Lbox on a 2 !×!2 grid. a, The subgrid structure when decomposing a function ui according to equation (2) for n!=!2. Red dots are the 22!×!22 grid points Xk of the coarse grid. Each blue square 0.5 the 28!×!28 grid points x of the fine grid. b, The Schmidt numbers d (n,!t) on a logarithmic scale χ = 74subgrid with attached to the Xk indicates the quadratic k 99 such that the decomposition in equation (2) results0.3 in a 99% accurate representation of DNS solutions to the INSE at four different times (Fig. 2). The domain D indicated by the black dashed line corresponds to DNS. The gray shaded area W is for solutions on a 28!×!28 grid. The blue shaded area M is for d(n)!≤!25 in equation (2). c, Same as in b but for0.7 the 3D simulations shown in Fig. 3. In b and c, T0 is the characteristic timescale on which the quickest particles in the initial flow fields can traverse the box (Set-up of numerical experiments section in the Methods). d, Scaling of ȕ = NBY E (O U) with the χ = 33 0.5 Reynolds number for the 2D and 3D systems in b and c, respectively. 10 テンソルネットワークで表現された流れ場の時間発展を解く ↔︎ 0.3 10 17 χ χ

18.

テンソルネットワーク計算の基礎 18

19.

テンソルネットワークの数値計算テンソルネットワークを用いた応用の基本計算要素 • • • テンソルの縮約 • 基本的に、2つずつ縮約計算をする • テンソルを行列に変形し、BLASなどを用いるテンソルの低ランク近似 • 特異値分解による低ランク近似の拡張 • 近似的な縮約を行う目的などに用いられる • 多くの場合、テンソルを行列に変形し、行列の特異値分解を用いるテンソルの線形問題 • テンソルから構成される行列の（一般化）固有値問題 • 量子多体問題、テンソル分解などの"最適化"で使用テンソルの基本演算は、（現状は）行列に変形して行われる cf. TBLIS テンソル向けのBLAS（BLIS= BLAS-like Library Instantiation Software) 19 https://github.com/devinamatthews/tblis

https://github.com/devinamatthews/tblis

20.

テンソルの行列への変形テンソルの足をまとめて行列とみなす l ダイアグラム j l j k i i i j l k k (0,0) → 0 (0,1) → 1 i, l = 0, 1のとき (1,0) → 2 (1,1) → 3 テンソル形状 • テンソル用のライブラリで簡単に行える。（例：numpy.reshape) • 行列への変形は一般に、一意ではない • どの様に行列化するかは、目的に合わせる 20

21.

テンソルネットワークの縮約テンソルネットワーク縮約の計算量ループのないツリー型の構造以外では、計算量はテンソル数に関して、指数関数的に増大する長さNのchain L×Lのsquare lattice 局所テンソル：局所テンソル：端から順に縮約：端から順に縮約：大規模なテンソルネットワーク縮約は近似的に評価 2d 規則TNに対する汎用的アプローチ： • テンソル繰り込み群＊不規則でも同種の近似は可能 • 行列積状態法 21 • 角転送繰り込み群

22.

テンソルネットワーク計算の基礎：近似縮約 22

23.

テンソルネットワーク繰り込み群 • M. Levin and C. P. Nave, PRL (2007)による Tensor network Renormalization Group (TRG)から始まった比較的新しい流れ • テンソルネットワークを粗視化していくことで、近似的に縮約を計算する • • 粗視化⟷実空間繰り込み群種々の（規則格子）TNに適用可能 • 分配関数の計算に適用することで、物性、素粒子、原子核分野の物理研究に応用されている 23

24.

TRGでやりたいこと繰り込み（長さスケールが√２倍） : : （近似） L×L の正方格子 (L×L)/2 の正方格子テンソルの大きさを変えずにテンソルの数を減らす 24

25.

TRGの準備：特異値分解特異値分解任意の行列N×M行列Aは以下の形に一意に分解できる A = は非負の実数。 rank(A) = 非ゼロの特異値の数と並べると便利対角成分がλの対角行列一般化ユニタリ行列 Aの最適なRランク近似：特異値を大きい方からR個だけ残し、残りをゼロで置き換える 25

26.

TRGの準備：特異値分解による近似 Aの最適なRランク近似：特異値を大きい方からR個だけ残し、残りをゼロで置き換える A = :M×N (M ≦ N) ≃ :M×M :(M, N)×M 近似 :R×R :(M, N)×R さらに = = ：対角成分がの対角行列 X SVDを使うと Y :M×R :R×N Aを小さい行列の積に分解できる 26

27.

テンソルネットワーク繰り込みのレシピ１．分解行列とみなす i l i i j l j j k SVD による近似 l k k rank(A)=χ に近似 : : （近似） 27

28.

テンソルネットワーク繰り込みのレシピ２．粗視化内側の足を縮約元のテンソル２つが新しいテンソル１つに粗視化された : 28

29.

テンソルネットワーク繰り込みの計算量 M. Levin and C. P. Nave, Phys. Rev. Lett. 99, 120601 (2007) Z.-C. Gu, M. Levin and X.-G. Wen, Phys. Rev. B 78, 205116 (2008) 実はO(χ5)まで減らせる（後で議論します）計算量： SVD= O(χ6) 縮約= O(χ6) （＊テンソルあたり）＊1 TRG ステップでテンソル数は1/2になるテンソルネットワーク縮約がテンソル数に対して多項式で計算可能 29

30.

テンソルネットワーク計算の基礎：固有値問題 30

31.

固有値問題の変分法例：最低エネルギー状態コスト関数： Fの最小値その時の変分法 • • 固有値問題の近似解を得る方法の一つ Fの最小値を制限された空間の範囲で探すの形を仮定する＝試行関数、変分波動関数例：平均場近似、テンソルネットワーク状態、ニューラルネットワーク, ... • 良い試行関数→高精度の最低エネルギー • 複雑な試行関数→コスト関数の計算量が増大 31

32.

テンソルネットワークによる変分法変分法の試行関数としてテンソルネットワークを用いるテンソル積状態（TPS, PEPS）例：行列積状態（MPS） (b) (c) 小さいテンソルに分解コスト関数： Fを最小にするを探す！メリット：元のベクトル空間の次元がaNの時に、O(N)のコストで計算可能＊ただし、行列は（TN表現可能な）疎行列 32

33.

Iterative optimization (F. Verstraete, D. Porras, and J. I. Cirac, Phys. Rev. Lett. 93, 227205 (2004)) i 番目のテンソルに着目した局所最適化 Minimize 小さいサイズの一般化固有値問題 (最小固有値の固有状態を探す） i 注: 行列サイズ= Aiを取り除く 33

34.

Iterative optimization (F. Verstraete, D. Porras, and J. I. Cirac, Phys. Rev. Lett. 93, 227205 (2004)) Aiをi =1 から Nまでsweepしてupdate … i =N から 1まで逆方向にsweep … 収束するまで繰り返す 34

35.

行列のコンパクトな表現注！このアルゴリズムは、行列自体がテンソルネットワークで効率的に表現できる場合にのみ有効に使えるこの行列は例えば Matrix Product Operator (MPO) と呼ばれる形式で表すことができる（場合がある）。 = 例： 1次元量子S=1/2ハイゼンベルグ模型のハミルトニアンのMPOで表現できる（χはNに依存しない） 35

36.

テンソルネットワーク計算の大規模化 36

37.

テンソルネットワーク計算の実装テンソル繰り込み、変分法などのテンソルネットワーク計算の実装中・小規模の計算は、既存のライブラリなどで簡単に実装できる例： • Python + numpy, scipy • Julia • MATLAB • 機械学習フレームワーク • Fortranでも（大変だけど）できる大規模計算では、分散メモリによる並列化が不可避 cf. TRGの計算コスト~O(χ6), PEPSだとO(χ9)~O(χ12) 2つのアプローチ • アルゴリズムに特化した並列実装を行う • 汎用的な並列ライブラリを用いる例：mptensor (https://github.com/smorita/mptensor) 慶應大の森田悟史さんが開発＊前者の方が性能が高くなりやすいが、多くのTN計算は（とても）複雑であり、汎用的なライブラリの方が実装コストが大幅に低い 37

https://github.com/smorita/mptensor

38.

テンソルネットワーク計算の特徴 • テンソルネットワークの基本的な計算は行列の線形代数になる • 行列に変形した後は、既存の行列用ライブラリが使える • • • BLAS, LAPACK, ScaLAPACK, .... テンソルと行列の相互変換、"transpose"が何度も現れる • 分散メモリ並列化では、通信が多数発生する可能性 • 共有メモリでも、メモリ書き換えコストがある • （変換のためのindex計算も場合によっては重い）行列のサイズは、正方行列より"長方形の行列"が多数でてくる • メモリと計算量のバランスが悪くなりやすい • 計算効率が出にくい場合も多い 38

39.

効率的なTN計算のための注意点テンソル縮約の順序縮約の評価順序で計算コストが変わるため、計算順序の最適化が重要 • 簡単なネットワークであれば（慣れた）人間が最適化する • NCONなどのアルゴリズム、ライブラリを使って最適化する • 最近のライブラリだと、最適化した順序でTN縮約をやってくれるものもある • pythonのopt_einsum • 最適化のアルゴリズムにも注意が必要（真面目にやると遅い） Transposeの遅延プログラムのある行でテンソルのindexを並び替えても、実際の計算で必要になるまでは、メモリ上の並び替えはしない＊numpyなどの多くのライブラリで基本的に実装されている（注）numpyのtranspose（reshape）はスレッド並列に対応していないため、大規模計算でここがボトルネックになることがある →自分でスレッド並列版の実装を書くなどして対応する 39

40.

効率的なTN計算のための工夫疎行列の特異値分解 TN計算では、特異値分解を（大幅な）低ランク近似に用いるため、全特異値を求める必要がないことが多い Full SVD ではなく、partial SVD （truncated SVD) を用いる (密行列ソルバー) (疎行列ソルバー) 例：TRGでの特異値分解 SVDによる近似 i l j k rank(A)=χ に近似密行列ソルバー： (全特異値を求める）疎行列ソルバー： (上位χ個の特異値を求める） 40

41.

効率的なTN計算のための工夫疎行列の特異値分解（つづき）疎行列ソルバーでは、行列ベクトル積が計算できれば良いため、行列を陽に作る必要もなくなる例：TRGでの特異値分解 SVDによる近似 i l j k rank(A)=χ に近似テンソルの内部構造：この縮約はこの縮約は χ個の特異値 41

42.

テンソルネットワーク計算の大規模化：HOTRG （アルゴリズムに特化した並列化の例） 42

43.

[beta]

分散メモリ並列化の例：3D-HOTRG法に特化
HOTRG: SVDをテンソルに拡張したHOSVDによるTRG
Z. Y. Xie et al, Phys. Rev. B 86, 045139 (2012)

⌘
<latexit sha1_base64="6fFfiOA5K65pfNgux7h1VXJdczM=">AAACaXichVFNLwNBGH66vqq+Wi4NF7GpODWzJSEO0sTFkVZLQtPsrsGwX/ajSTX+gJOb4EQiIn6Giz/g4CeII4mLg7fbTYQG72Rmnnnmfd55ZkZzDOH5jD3FpI7Oru6eeG+ir39gcCiZGi57duDqvKTbhu2ua6rHDWHxki98g687LldNzeBr2v5ic3+txl1P2NaqX3d4xVR3LLEtdNUnqrzJDwJRqyZllmVhjLcDJQIyoli2kzfYxBZs6AhggsOCT9iACo/aBhQwOMRV0CDOJSTCfY4jJEgbUBanDJXYfRp3aLURsRatmzW9UK3TKQZ1l5TjyLBHdste2QO7Y8/s49dajbBG00udZq2l5U516DhdfP9XZdLsY/dL9adnH9uYC70K8u6ETPMWektfOzx9Lc4XMo1JdsVeyP8le2L3dAOr9qZfr/DCBRL0AcrP524H5VxWmc7mVmbk/EL0FXGMYQJT9N6zyGMJyyjRuXs4wRnOYy9SSkpLo61UKRZpRvAtJPkT06uMOA==</latexit>

任意次元の超立方格子に適用可能
3次元の場合

の計算コスト
計算のボトルネック

⌘
<latexit sha1_base64="6fFfiOA5K65pfNgux7h1VXJdczM=">AAACaXichVFNLwNBGH66vqq+Wi4NF7GpODWzJSEO0sTFkVZLQtPsrsGwX/ajSTX+gJOb4EQiIn6Giz/g4CeII4mLg7fbTYQG72Rmnnnmfd55ZkZzDOH5jD3FpI7Oru6eeG+ir39gcCiZGi57duDqvKTbhu2ua6rHDWHxki98g687LldNzeBr2v5ic3+txl1P2NaqX3d4xVR3LLEtdNUnqrzJDwJRqyZllmVhjLcDJQIyoli2kzfYxBZs6AhggsOCT9iACo/aBhQwOMRV0CDOJSTCfY4jJEgbUBanDJXYfRp3aLURsRatmzW9UK3TKQZ1l5TjyLBHdste2QO7Y8/s49dajbBG00udZq2l5U516DhdfP9XZdLsY/dL9adnH9uYC70K8u6ETPMWektfOzx9Lc4XMo1JdsVeyP8le2L3dAOr9qZfr/DCBRL0AcrP524H5VxWmc7mVmbk/EL0FXGMYQJT9N6zyGMJyyjRuXs4wRnOYy9SSkpLo61UKRZpRvAtJPkT06uMOA==</latexit>

のメモリコスト

43

44.

分散メモリ並列化の例：3D-HOTRG法特化並列化の指針（筑波大の山下・櫻井らのアイデアを改良） z T. Yamashita and T. Sakurai, Comp. Phys. Comm. 278, 108423 (2022) • 全部でχ2のMPIプロセス • 各プロセスは(z,z')の各indexに対応するダイアグラムを計算 • 各プロセスは(z,z')を固定した、O(χ5)のテンソルをメモリに保持 • 計算を方向を変えて繰り返すために、χ4のデータをχ2の相手にbroadcst • z' Broadcastするデータはχ4に減らせるの計算コスト /プロセスの計算コストのメモリコスト /プロセスのメモリコスト MPI並列化 44

45.

HOTRG 1stepの並列化性能 ISSP sekirei • 実行時間は予想通りのO(χ9)スケーリング • スレッド並列の性能も概ね出ている京 45

46.

計算要素の重み = 24 Sekirei make̲U Bcast̲U make̲T K Bcast̲T reshape̲T make̲U 3E+02 6.00000E+02 2.25E+02 4.50000E+02 1.5E+02 3.00000E+02 7.5E+01 1.50000E+02 0E+00 chi=24, omp=1 chi=24, omp=2 chi=24, omp=4 chi=24, omp=6 0.00000E+00 Bcast̲U make̲T Bcast̲T reshape̲T chi=24, omp=1 chi=24, omp=2 chi=24, omp=4 chi=24, omp=8 • make_U （HOSVDによるprojectorの計算）はmake_T（縮約）に比べて無視できる • Bcast_T（データのbroadcast）は実行時間の 5 ~10 % 概ね満足できる性能 46

47.

テンソルネットワーク計算の大規模化：TeNeS （汎用ライブラリを用いた並列化の例。紹介のみ） 47

48.

分散メモリ並列化の例：2次元量子多体問題 TPS (Tensor Product State) (AKLT, T. Nishino, K. Okunishi, …) PEPS (Projected Entangled-Pair State) (F. Verstraete and J. Cirac, arXiv:cond-mat/0407066) 例：2次元正方格子のTPS 4+1 階のテンソルが敷き詰められたネットワーク局所自由度：s Virtual自由度：i, j, k, l 各インデックスの次元＝ボンド次元（D）変分波動関数としての精度に関係するパラメタ（D→∞で厳密に） TPSを変分波動関数とする変分法 • • 面積則を満たすため、有限Dでも精度の良い近似 • 無限系も直接、有限のDで計算できる：iTPS テンソルネットワークのみを仮定した、バイアスの少ない変分波動関数 • ボンド次元の増大により、系統的に精度を改善できる 48

49.

iTPS法の適用例例：（モンテカルロ法のできない）フラストレート磁性体 H. YAMAGUCHI et al. RE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09063-7 フラストレート正方格子模型 5/9 0.4 3/9 Spin 1 0 0 1 site 1 0.1 3 B/J g. 1 Calculated magnetization process for the spin-1/2 KAFM with the earest-neighbor interactionD. J. Nakamura, The tensor network method with the R. Okuma, T. Okubo et al, rojected entangledNat. pair Commun. state (PEPS)10, is used. vertical and horizontal 1229The (2019). xes represent magnetization M divided by saturated magnetization Ms and 0 site 2 1.0 0.5 1.5 (c) 2.0 site 1 0.3 site 2 0.2 2 site 1 H 0.4 0.3 2 Magnon 1/9 0 (b) <Sz > M /MS 7/9 (a) Intensity M/Msat 1 1.0 J6 J1 0.9 site 1 0.8 J2 J5 0.7 J4 J3 site 2 0.6 0.5 0.4 0.3 H=0 0.2 0.1 0 0 0.5 0.5 (< Sx>2+<Sy>2+<Sz>2 )1/2 カゴメ格子ハイゼンベルグ模型の磁化曲線 0.2 site 2 0 0.5 1.0 1.5 0.1 H/|J1| 0 0.5 1.0 1.5 FIG. 4. The ground states under magnetic fields obtained from H. Yamaguchi, Y. Sasaki, T. Okubo, D = 6 iTPS calculation. (a) Normalized magnetization curve, (b) Phys. Rev. B 98, 094402 (2018). average local magnetization, (c) average local moment at T = 049 calculated using the tensor network method assuming the ratios of var P(1 den

50.

Y. Motoyama, T. Okubo, K. Yoshimi, et al., Comput. Phys. Commun. 279, 108437 (2022). Y. Motoyama, T. Okubo, K. Yoshimi, et al., Comput. Phys. Commun. 315, 109692 (2025). Tensor Netwok Solver (TeNeS) https://github.com/issp-center-dev/TeNeS 無限系のTPS（iTPS）を用いた変分法による計算虚時間発展法によるテンソルの最適化カゴメ格子模型の磁化曲線 1 Ver.2 では実時間発展・有限温度にも対応 D = 10 0.8 @sekirei 72node ~4時間弱 MPI/OpenMPによる大規模並列計算に対応 mptensor（森田）によるテンソル演算の並列化 0.6 二次元の量子スピン系やボゾン系が簡単に計算可能標準的な二次元格子にデフォルトで対応原理的には任意の二次元格子に対応可能 0.4 3-site unit cell 0.2 0 開発チーム 0 0.5 1 1.5 2 2.5 • 大久保毅（東大理）：アルゴリズム部分の実装森田悟史（慶應大）：関連ライブラリ・ツール作成本山裕一（物性研）：メインプログラム等の設計・実装吉見一慶（物性研）：ユーザーテスト・サンプルの作成、マネージメント青山龍美（物性研）：ユーザーテスト・サンプルの作成【物性研高度化加藤岳生（物性研）：ユーザーテスト・サンプルの作成 • 川島直輝（物性研）：プロジェト統括 • • • • • プロジェクト】 50 3

51.

まとめ • テンソルネットワーク（TN）は計算科学のいろいろな場面に現れる • • 問題自体がTN形式で表現される。近似としてTN形式が現れる。 TNの基本的な計算は、縮約（行列積）、低ランク近似（SVD）、および、固有値問題で、行列演算ライブラリが活用できる • • 小・中規模の計算は、汎用的なライブラリで簡単に実装できる • • 重要な応用例：テンソル繰り込み群、固有値問題の変分法 Transpose や縮約順序など、テンソル独自の問題もある大規模計算には、分散メモリでの並列化が必要になる • アルゴリズムに特化した並列化：HOTRGなど • 汎用ライブラリを用いた並列化：TeNeSなど 51

52.

参考文献 • 日本語の文献 • 「テンソルネットワーク形式の進展と応用」西野友年、大久保毅、日本物理学会誌 Vol 72, No. 10, 2017 • 「テンソルネットワークによる情報圧縮とフラストレート磁性体への応用」大久保毅、第63回物性若手夏の学校テキスト、物性研究 Vol. 7, No.2 (2018) • 「テンソルネットワークの基礎と応用統計物理・量子情報・機械学習」西野友年、SGCライブラリ、サイエンス社 • • 数理科学 2022年2月号特集「テンソルネットワークの進展」 • 「テンソルデータ解析の基礎と応用」横田達也、次世代信号情報処理シリーズ7、コロナ社 English • "A practical introduction to tensor networks: Matrix product states and projected entangled pair states", R. Orús, Annals of Physics 349, 117 (2014). • “Tensor Network Techniques for Quantum Computation”, M. Collura et al., SISSA Medialab S.r.l. (2024) • “Density matrix and tensor network renormalization”, T. Xian, Cambridge University Press (2023). • "Tensor Network Contractions", Shi-Ju Ran, Emanuele Tirrito, Cheng Peng, Xi Chen, Luca Tagliacozzo, Gang Su, Maciej Lewenstein, Lecture Note in Physics, vol. 964, Springer, (2020). (Open access: https://link.springer.com/book/10.1007%2F978-3-030-34489-4) 52

https://link.springer.com/book/10.1007/978-3-030-34489-4

第10回配信講義　計算科学技術特論A（2025）

R-CCS　計算科学研究推進室

関連スライド

第1回配信講義　計算科学技術特論A （2023）

第1回配信講義　計算科学技術特論A（2025）

第2回配信講義　計算科学技術特論A （2023）

第6回配信講義　計算科学技術特論A （2023）

第8回配信講義　計算科学技術特論A （2023）

第４回配信講義　計算科学技術特論A （2023）

各ページのテキスト

第10回 配信講義 計算科学技術特論A（2025）

R-CCS 計算科学研究推進室

関連スライド

第1回 配信講義 計算科学技術特論A （2023）

第1回 配信講義 計算科学技術特論A（2025）

第2回 配信講義 計算科学技術特論A （2023）

第6回 配信講義 計算科学技術特論A （2023）

第8回 配信講義 計算科学技術特論A （2023）

第４回 配信講義 計算科学技術特論A （2023）

各ページのテキスト

第10回配信講義　計算科学技術特論A（2025）

R-CCS　計算科学研究推進室

第1回配信講義　計算科学技術特論A （2023）

第1回配信講義　計算科学技術特論A（2025）

第2回配信講義　計算科学技術特論A （2023）

第6回配信講義　計算科学技術特論A （2023）

第8回配信講義　計算科学技術特論A （2023）

第４回配信講義　計算科学技術特論A （2023）