"Will HPC be a next decade disruptor, or will it be disrupted?"

May 28, 24

Slide overview

Feasibility Study Invited Talk (DAY-1 : Jan 29, 2024)
Eric Monchalin (Eviden)

The 6th R-CCS International Symposium
https://www.r-ccs.riken.jp/R-CCS-Symposium/2024/

Text of each page
1.

Will HPC be a next decade disruptor, or will it be disrupted?
Eric Monchalin, Chair of the European Processor Initiative; Vice President at Eviden, Head of Machine Intelligence
29/01/2024
© European Processor Initiative © Eviden SAS

2.

The uncertain certainties
1945, Thomas J. Watson (CEO of IBM): « World market for maybe five computers »
1977, Ken Olsen (CEO of DEC): « No reason for anyone to have a computer at home »
1980, IBM study: « Only about 50 Cray-1 class computers will be sold per year »

3.

So, let's be cautious
The views expressed on the following slides are those of the presenter. They do not necessarily represent any view of Eviden, EPI or EuroHPC JU organizations, affiliates or employees.

4.

1 The ground truth
2 What’s at stake?
3 The threats
4 The weaknesses
5 Making HPC future a reality
6 The European initiative

5.

1 The ground truth

6.

Software is eating the world (Marc Andreessen). Yet hardware is still shaping it.
[Diagram: stack from consumption to provision, with a value axis: Containers & Virtualization; Cloud Infrastructure; SW Defined Data Center (SW defined compute, SW defined storage, SW defined network); SW Defined Infrastructure; Data Center Facilities]

7.

The divergences of the microelectronics performances
[Chart: normalized scaling (0 to 10), improvements per 5 years: Flops, DRAM bandwidth, interconnect bandwidth; reference: Moore's law (2x / 2 yrs)]
Source: medium.com (AI and Memory Wall)

8.

Supremacy of data generation over computation
Large Hadron Collider: tens of petabytes per year
Square Kilometer Array: 400 exabytes per year
Climate modeling: 400 petabytes per year
Advanced Light Source: 7 terabytes per hour
Genomics: exabytes per year
Data volumes are growing far beyond

9.

New era of cooperation in the compute and data continuum
[Diagram: Data Center, Edge and IoT, moving from hierarchy to swarm]

10.

2 What’s at stake?

11.

Tackle research, economic, industry, and societal challenges
Anticipate climate changes
Study earthquakes
Control epidemics
Care for aging population
Innovate without limit
Secure energy resources

12.

Who knows what HPC and numerical simulation are?

13.

Keeping HPC so confidential is a risk

14.

Reinvent HPC for a brilliant phygital future

15.

3 The threats

16.

Gen Z motto: You Only Live Once
Leave an impact on the world; Entrepreneur; Align job and interests; Tech-savvy; Multi-taskers; Few focus; Work-life balance; Digital interactive; Freedom

17.

Make scientific education great again
[Cartoon captions: « I love sciences »; « Is there anyone else? »]

18.

Foreseen mismatch between electricity supply and demand
2020: energy accounts for 91% of global CO2 emissions; 14% of energy is green; Information & Communication Technology takes 7% of electricity
2030: +50% increase in demand for electricity; Information & Communication Technology takes 20% of electricity

19.

Energy still a sensitive subject
The paper that forced Timnit Gebru out of Google in 2020

20.

4 The weaknesses

21.

From Flops to Bytes per Flop
Direct solver, dense matrix: O(N^3)
Direct solver, banded matrix: O(N^2.7)
Iterative solver (CG), sparse matrix: O(N^1.5)
Iterative solver (AMG), sparse matrix: O(N log N)
Spectral solver (FFT), regular grid: O(N log N)
Linpack benchmark: dense blocks, n^2 memory accesses for n^3 computations (core intensive)
HPCG benchmark: sparse data, n memory accesses for n computations (memory intensive)
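The contrast between the two benchmarks can be put in numbers. The following is a minimal sketch, assuming only the access and computation counts quoted on the slide (n^2 memory accesses for n^3 computations in the dense case, n for n in the sparse case); the function names are illustrative and do not come from the talk.

```python
def dense_flops_per_access(n: int) -> float:
    """Dense blocks (Linpack-like): n^3 computations over n^2 memory accesses."""
    return n**3 / n**2


def sparse_flops_per_access(n: int) -> float:
    """Sparse data (HPCG-like): n computations over n memory accesses."""
    return n / n


# The dense ratio grows with the problem size; the sparse ratio stays at 1.
for n in (10, 1_000, 100_000):
    print(n, dense_flops_per_access(n), sparse_flops_per_access(n))
```

Under these counts, a dense kernel can reuse each loaded value more as the problem grows, while a sparse kernel needs one memory access per computation whatever the size, which is why the slide labels the first core intensive and the second memory intensive.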

22.

Lack of memory throughput
Top 5 (2023-06)
System      Tflop/s peak   Linpack efficiency   HPCG efficiency
Frontier    1 679 819      71%                  0.8%
Fugaku      537 212        82%                  3.0%
LUMI        428 704        72%                  0.8%
Leonardo    304 466        78%                  1.0%
Summit      200 795        74%                  1.5%

Chip               FP64 (Tflop/s)   Bandwidth (TW*/s)   BW / FP64
NVIDIA A100        9.70             0.485               5%
Milan 64C 2 GHz    2.05             0.051               2.5%
W* = 4 bytes
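The BW / FP64 column can be recomputed from the other two. This is a small sketch using only the figures on the slide and its convention that one word (W*) is 4 bytes; the dictionary layout and function names are assumptions made for illustration.

```python
BYTES_PER_WORD = 4  # slide convention: W* = 4 bytes

# Figures copied from the slide: FP64 peak in Tflop/s, memory bandwidth in TW/s.
CHIPS = {
    "NVIDIA A100": {"fp64_tflops": 9.70, "bandwidth_tw_s": 0.485},
    "Milan 64C 2 GHz": {"fp64_tflops": 2.05, "bandwidth_tw_s": 0.051},
}


def words_per_flop(fp64_tflops: float, bandwidth_tw_s: float) -> float:
    """Memory words deliverable per FP64 operation at peak rates."""
    return bandwidth_tw_s / fp64_tflops


def bytes_per_flop(fp64_tflops: float, bandwidth_tw_s: float) -> float:
    """Same ratio expressed in bytes per FP64 operation."""
    return words_per_flop(fp64_tflops, bandwidth_tw_s) * BYTES_PER_WORD


for name, spec in CHIPS.items():
    print(f"{name}: {words_per_flop(**spec):.1%} words/flop, "
          f"{bytes_per_flop(**spec):.2f} bytes/flop")
```

Both chips can feed only a few percent of a word per peak floating-point operation, which is consistent with the gap between Linpack and HPCG efficiency in the Top 5 figures.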

23.

Needed but so energy-expensive memory throughput
Picojoules per operation
Operation                     45 nm    7 nm       Ratio
Add., IEEE FP16               0.4      0.16       2.5
Add., IEEE FP32               0.9      0.38       2.4
Mult., IEEE FP16              1.1      0.34       3.2
Mult., IEEE FP32              3.7      1.31       2.8
SRAM 64b access, 32 KB        20       8.5        2.4
SRAM 64b access, 1 MB         100      14         7.1
DRAM 64b access, DDR 3/4      1 300    1 300      1.0
DRAM 64b access, HBM2                  250-450
(DRAM columns are circa 45 nm and circa 7 nm)
[Chart: 7 nm picojoules per operation, log10 scale: DDR 3/4 (64b) 1300; HBM2 (64b) 250; SRAM 1 MB 14; SRAM 32 KB 8.5; Mult. IEEE FP32 1.31; Add. IEEE FP32 0.38]
Source: Ten Lessons From Three Generations Shaped Google’s TPUv4i

24.

The uncertain future of monolithic chips
[Chart: relative scaling (x1, x2, x4, x8) of logic, SRAM and analog across the 7nm, 5nm, 3nm and 2nm nodes, 2018 to 2026; chip development cost rising through $249M, $449M, $581M and $725M; wafer foundry cost rising through $10K, $16K and $20K]
Sources: fuse.wikichip.org, granitefirm.com (Andy Lin’s blog), TSMC, IBS (July 2022), Denis Dutoit (CEA LIST) 2022

25.

The irresistible rise in energy consumption
[Chart, log scale, 2002 to 2023: Top 500 #1 power (kW) and training (flops); annotated growth rates: x1.6 / 5 years and x750 / 2 years]
Source: medium.com (AI and Memory Wall)
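The two growth rates annotated on the chart are quoted over different periods, so they are easier to compare once annualized. A minimal sketch, assuming steady compound growth over each period (an assumption for illustration, not a claim from the talk); `annual_factor` is a hypothetical helper name.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Per-year multiplier equivalent to `total_factor` over `years`,
    assuming steady compound growth."""
    return total_factor ** (1.0 / years)


# Growth rates as annotated on the slide.
RATES = {
    "x1.6 / 5 years": (1.6, 5),
    "x750 / 2 years": (750, 2),
}

for label, (factor, years) in RATES.items():
    print(f"{label}: about x{annual_factor(factor, years):.2f} per year")
```

Annualized this way, the first rate is roughly a 10% increase per year while the second is more than a 27-fold increase per year.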

26.

Gate-expensive FP64 computation is not so useful for AI
[Chart: NVIDIA GPU performances, log scale (0.1 to 1000), 2011 to 2022: TF32 TC (Tflops), FP64 TC (Tflops), FP64 (Tflops), memory bandwidth (TB/s)]
More expensive than ever Eflops Linpack in the coming years?

27.

Flops are isolated, unlike IOs, which are polluted by workflow concurrency
[Histogram: Lattice Quantum Chromodynamics workflow, 4KB-16KB sequential IOs, reads and writes, counts on a log10 scale from 1 to 1 000 000 (100%); latency bins: 0 µsec to 4 µsec, 4 µsec to 64 µsec, 64 µsec to 1 msec, 1 msec to 66 msec, 66 msec to 1 sec, 1 sec to 17 sec]

28.

5 Making HPC future a reality

29.

The basic ingredients
Specific computing chips
Contained power budget
Ease of life of GAFAM generation
Greener ecosystem

30.

I have a dream
[Roadmap, from short term through mid term to long term:] HPC/Cloud user environment convergence; Generate/store key data only; Programming language abstraction; Standardized accelerator interface; Less FP64 sensitive algorithms; Prescriptive maintenance; Monolithic simulations to workflows; Data life cycle management; AI augmented algorithms; End user tooling; Augmented orchestration; Standardized carbon footprint management; Augmented management; Unified CPU/accelerator memory; Predictive maintenance; Ultra Ethernet for scale up/out; Photonic computing; E2E & unified security of a federation; Near memory computing; System on wafer; FP64 profiler; Dynamic Frequency Management; 3D memory; Inner / outer photonics; Classical/Quantum convergence; Superconducting CMOS; Polymorphic & Disaggregated IOs; Disaggregated SoC; Standardized chiplet foundation; Disaggregated architecture; MPI offload; Reconditionable infrastructure; Full VHV to LV DC power

31.

Dataflow challenges
Data moving through the continuum
Application workflow pressure on file system
[Diagram: IoT, Edge, Data Center; legend: init dataflow, running dataflow]

32.

Polymorphic & Disaggregated IOs
[Diagram: applications reach data repositories, name spaces and data containers through multi-protocol I/O, name space router services and dataflow nodes; legend: init dataflow, running dataflow]
IoT to Data center data continuum

33.

Dataflow nodes in action: write elapsed time divided by 2
[Histogram: Lattice Quantum Chromodynamics workflow, 4KB-16KB sequential writes, with and without dataflow nodes, counts on a log10 scale from 1 to 1 000 000 (100%); latency bins: 0 µsec to 4 µsec, 4 µsec to 64 µsec, 64 µsec to 1 msec, 1 msec to 66 msec, 66 msec to 1 sec, 1 sec to 17 sec]

34.

Save power at almost no performance cost
Detect execution phases, classify phases, modulate CPU frequency
• Nucleus for European Modelling of the Ocean: 16% energy saving (4% longer execution time)
• High Performance Conjugate Gradient: 15% energy saving (3% longer execution time)
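Since energy is average power multiplied by time, the two figures quoted for each code imply a change in average power. A minimal sketch, assuming the percentage in parentheses is the increase in execution time (an interpretation of the slide wording); the function name and the short labels NEMO and HPCG are introduced here for illustration.

```python
def average_power_ratio(energy_saving: float, time_overhead: float) -> float:
    """Average power relative to baseline, from E = P * t.

    energy_saving: fraction of energy saved (0.16 for 16%).
    time_overhead: fractional increase in execution time (0.04 for 4%).
    """
    energy_ratio = 1.0 - energy_saving
    time_ratio = 1.0 + time_overhead
    return energy_ratio / time_ratio


# (energy saving, time overhead) as quoted on the slide.
RESULTS = {
    "NEMO": (0.16, 0.04),
    "HPCG": (0.15, 0.03),
}

for name, (saving, overhead) in RESULTS.items():
    ratio = average_power_ratio(saving, overhead)
    print(f"{name}: average power at {ratio:.1%} of baseline")
```

Under that reading, the average power draw falls by roughly 19% and 17% respectively, somewhat more than the energy saving alone suggests, because the runs also take slightly longer.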

35.

6 The European initiative

36.

EU back in the race with EuroHPC JU
Deploy: develop, deploy, extend & maintain a world-leading supercomputing, quantum computing, service & data infrastructure ecosystem in Europe
Innovate: support the development of innovative supercomputing components, technologies, knowledge & applications to underpin a competitive European supply chain
Value: widen the use of HPC & quantum infrastructures to a large number of public & private users wherever they are located in Europe, and support the development of key HPC skills for European science and industry

37.

Fuel European ambition (2021-2027)
Digital Europe Program, 1.98B Eur: Infrastructure; Federation of supercomputing services; Widening usage and skills
Horizon Europe Program, 900M Eur: Technology; Application; International Cooperation
Connecting Europe Facility, 200M Eur: Hyperconnectivity; Data connectivity
*Member states to match this with national contributions

38.

Supercomputer deployment
Nov 2023
System           TOP500   Green500
LUMI             #5       #7
LEONARDO         #6       #18
MARENOSTRUM 5    #8       #6
MELUXINA         #71      #27
KAROLINA         #113     #25
DISCOVERER       #166     #216
VEGA             #198     #253

Underway
System           Year     Performance
JUPITER          2024     Exascale
DAEDALUS         2024     Mid-range
JULES VERNES     2025     Exascale

39.

Quantum deployment
• Agreements with six hosting entities
• 2 quantum simulators (100+ qubits), in Joliot Curie (GENCI / France) and Juwels (FZJ / Germany)
• Two procurements in progress: EuroQCS-Poland (PSNC / Poland) and Euro-Q-Exa (LRZ / Germany)
• Call in progress for 2 quantum Excellence Centers

40.

Federate EuroHPC systems (2023+)
Systems: Lumi (FI), Jupiter (DE), Meluxina (LU), Karolina (CZ), Vega (SI), Discoverer (BG), Leonardo (IT), Deucalion (PT), MareNostrum 5 (ES), Daedalus (GR)
Authentication, Authorization and Identification services (AAI)
Computing services: interactive computing; cloud access (virtual machines, containers)
Data services: archival services and data repositories; data mover / transport services
User and resource management

41.

Infrastructure roadmap
[Timeline by quarter, 2023 to 2025, then 2026 - 2027: JUPITER; DAEDALUS; Mid-range; Upgrades; JULES VERNES; POST-EXA; Federation/Hyperconnectivity; Industrial systems; Quantum systems]

42.

Areas of Strategic Research & Innovation
Leadership in Use & Skills (User): Competence Centres and training programs in HPC commensurate with the labor market.
Applications and Algorithms (Application Software): Centres of Excellence for HPC Applications and new algorithms for European exascale technology.
European Software Stack (System Software): software and algorithms, programming models and tools for exascale and post exascale systems.
European Open Hardware (Hardware): ecosystem for the low power high-end general purpose processor and accelerator.
Currently around 40 running projects

43.

(Partial) system technology roadmap
IO-Sea: data management and storage
Deep-Sea: SW for exascale architecture
Red-Sea: interconnect
EUpilot: RISC-V pilot
EUpex: ARM pilot
EPI: ARM & RISC-V
Energy Efficient Technologies 1; ultra-high-speed Interconnect 1; ARM next gen 1; DARE 1: RISC-V
Energy Efficient Technologies 2; ultra-high-speed Interconnect 2; ARM next gen 2; DARE 2: RISC-V

44.

One-stop shop for HPC training offers of 33 National Competence Centers
Pan-European Master (2 years) for HPC
14 Training Centres with ~100 training events each year
Coming: virtual HPC academy

45.

6 Conclusion

46.

“They did not know it was impossible… So they did it” (Mark Twain)

47.

We are still moving data to compute!
CPU, GPU, IPU, FPGA, ASIC, Neuromorphic, Quantum, DNA

48.

We tend to shy away from simple and obvious solutions
[Timeline: 3500 BC, AD 1850, AD 1972, AD 1987, AD 2004]

49.

Questions?

50.

[email protected]
Confidential information owned by the EPI consortium, to be used by the recipient only. This document, or any part of it, may not be reproduced, copied, circulated and/or distributed nor quoted without prior written approval from the EPI consortium.