Dynamic Provisioning and Capacity-Aware Scheduling for Local Storage

Slide Overview

KubeCon + CloudNativeCon Japan 2025

June 17, 2025
https://kccncjpn2025.sched.com/event/1x724/dynamic-provisioning-and-capacity-aware-scheduling-for-local-storage-yuma-ogami-cybozu-inc

Yuma Ogami
Software Engineer, Cybozu, Inc.

This account publishes materials mainly from the development division of Cybozu, Inc.

Text of each slide
1.

Dynamic Provisioning and Capacity-Aware Scheduling for Local Storage
Yuma Ogami, Cybozu, Inc.
June 17th, 2025

2.

About Me
• Yuma Ogami
• Cloud Infrastructure Division, Cybozu, Inc.
• One of the maintainers of TopoLVM
• Owner of the following KEP:
  o Storage Capacity Scoring of Nodes for Dynamic Provisioning (KEP-4049)

3.

Agenda
• Local PV Overview and Limitations
• Introduction to TopoLVM
• Storage Capacity Scoring of Nodes for Dynamic Provisioning (KEP-4049)
• Summary

4.

Agenda
• Local PV Overview and Limitations
• Introduction to TopoLVM
• Storage Capacity Scoring of Nodes for Dynamic Provisioning (KEP-4049)
• Summary

5.

Local Storage and Local PV
• Local storage:
  • Storage (like SSD/HDD) directly attached to nodes
• Benefits:
  • Low latency & high throughput
  • Cost-effective and easy to set up, without special equipment like NAS or FC storage
• Use case:
  • Applications requiring high I/O performance (full-text search engines, databases, etc.)
• Typically accessed through a Local PV in Kubernetes

6.

Local PV is Simple but Not Flexible
• Dynamic provisioning is not supported
• Volume resizing is not supported

7.

Dynamic Provisioning is Not Supported
When a user creates a PVC, a corresponding PV is not automatically created.
[Diagram: the user creates a Pod and a PVC, and both remain Pending]

8.

Dynamic Provisioning is Not Supported
Administrators must prepare PVs in advance (increasing operational costs).
[Diagram: the admin allocates LVs from a node's volume group and creates the corresponding PVs, wondering what size each PV should be]
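To illustrate the manual work involved, the following is a minimal sketch of what an administrator would have to create by hand for each local volume; the StorageClass name, node name, and path are placeholders:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                     # placeholder name
provisioner: kubernetes.io/no-provisioner # local PVs have no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node-a                   # one PV per pre-created volume
spec:
  capacity:
    storage: 10Gi                         # the admin must guess a size in advance
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1                 # placeholder path on the node
  nodeAffinity:                           # required for local PVs: pins the PV to one node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-a                        # placeholder node name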

9.

Volume Resizing is Not Supported
Storage usage increases over time.
[Diagram: PVs on a node cannot be resized and eventually run out of space]

10.

Agenda
• Local PV Overview and Limitations
• Introduction to TopoLVM
• Storage Capacity Scoring of Nodes for Dynamic Provisioning (KEP-4049)
• Summary

11.

What is TopoLVM?
• CSI plugin to manage local storage
• Overcomes all the limitations mentioned before
• Leverages Logical Volume Manager (LVM)
• Open source
  • https://github.com/topolvm/topolvm

12.

Key Features of TopoLVM
• Dynamic provisioning
• Volume resizing
• Pod scheduling based on node storage capacity

13.

Dynamic provisioning
[Diagram]
1. The user creates a Pod and a PVC
2. TopoLVM and the CSI sidecars find the PVC
3. An LV is created automatically on the node
4. The corresponding PV is created
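For reference, a minimal sketch of how this looks from the user's side, assuming the topolvm.io provisioner name and an illustrative StorageClass name (check your TopoLVM installation for the actual values):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topolvm-provisioner               # illustrative name
provisioner: topolvm.io                   # assumed TopoLVM CSI driver name
volumeBindingMode: WaitForFirstConsumer   # delay binding until the Pod is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topolvm-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: topolvm-provisioner
  resources:
    requests:
      storage: 10Gi                       # an LV of this size is carved out automatically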

14.

Volume resizing
[Diagram]
1. The user edits the requested size on the PVC
2. TopoLVM and the CSI sidecars find the change
3. The LV on the node is expanded
4. The PV is updated
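A resize is requested simply by raising spec.resources.requests.storage on the existing PVC (assuming the StorageClass sets allowVolumeExpansion: true, as in the sketch above); TopoLVM then expands the underlying LV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topolvm-pvc                       # same illustrative PVC as before
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: topolvm-provisioner
  resources:
    requests:
      storage: 20Gi                       # raised from 10Gi; triggers expansion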

15.

W/o Pod Scheduling Based on Available Storage Capacity
Without capacity awareness, a node with lower available storage capacity might be selected.
[Diagram: a Pod with a 10 GB PVC; node A has 100 GB available, node B 50 GB, node C 10 GB]

16.

W/o Pod Scheduling Based on Available Storage Capacity
Once the volume is placed on such a node, further volume expansion is not possible.
[Diagram: the 10 GB LV lands on node C, which has only 10 GB available, leaving no room to expand]

17.

Pod Scheduling Based on Available Storage Capacity
So we want to place the volume on a node with higher available storage capacity.
[Diagram: the LV is placed on node A, which has 100 GB available, instead of node B (50 GB) or node C (10 GB)]

18.

Benefits of Free Space Based Scoring
This becomes possible by scoring nodes based on their available storage capacity.
[Diagram: node A (100 GB available) scores 90, node B (50 GB) scores 80, node C (10 GB) scores 0]

19.

Extending the Kubernetes Scheduler with a Scheduler Extender
• When scheduling a Pod, a scheduler extender allows an external process to filter and prioritize nodes
• HTTP calls are issued to the scheduler extender: one for the "filter" action and one for the "prioritize" action
• Each call is passed the Pod and Node manifests
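For reference, a scheduler extender is registered through the extenders section of the KubeSchedulerConfiguration; the sketch below uses placeholder values for the URL and verbs rather than TopoLVM's actual deployment settings:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
- urlPrefix: http://topolvm-scheduler.example.svc:9251   # placeholder endpoint
  filterVerb: predicate        # path appended to urlPrefix for the "filter" call
  prioritizeVerb: prioritize   # path appended for the "prioritize" call
  weight: 1
  nodeCacheCapable: false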

20.

Extending the Kubernetes Scheduler with a Scheduler Extender
TopoLVM's approach for storage-aware scheduling (topolvm-scheduler):
1. A webhook calculates the requested PVC sizes and adds this information as annotations to Pods
2. topolvm-node adds the available storage capacity as annotations to Nodes
3. On each call ("filter" or "prioritize"):
  o Get the required storage from the Pod annotation
  o Get the available capacity from the Node annotations
  o Filter nodes for the "filter" action / score nodes for the "prioritize" action
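To make this flow concrete, here is a rough sketch of the kind of metadata being compared; the annotation keys below are invented for illustration and are not TopoLVM's actual keys:

# Pod, after the webhook has run (hypothetical annotation key)
apiVersion: v1
kind: Pod
metadata:
  name: app-0
  annotations:
    example.com/requested-storage: "50Gi"    # total storage requested by the Pod's PVCs
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
---
# Node, as updated by topolvm-node (hypothetical annotation key)
apiVersion: v1
kind: Node
metadata:
  name: node-a
  annotations:
    example.com/available-storage: "100Gi"   # free space in the node's LVM volume group

On each extender call, topolvm-scheduler compares these two values: nodes whose available capacity is below the request are filtered out, and the remaining nodes are scored.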

21.

Behavior of topolvm-scheduler
1. The user creates a Pod and a PVC requesting 50 GB
[Diagram: kube-scheduler, topolvm-scheduler, and three nodes with 100 GB, 50 GB, and 10 GB available]

22.

Behavior of topolvm-scheduler
2. The webhook reads the PVC (request size: 50 GB)
3. It adds an annotation for the requested storage size (50 GB) to the Pod
[Diagram: same components as before]

23.

Behavior of topolvm-scheduler
4. topolvm-node on each node gets the available capacity
5. It adds an annotation for the available capacity to its Node (100 GB on node A, 50 GB on node B, 10 GB on node C)
[Diagram: topolvm-node running on each of the three nodes]

24.

Behavior of topolvm-scheduler
6. kube-scheduler reads the annotated Pod and Node manifests
7. It calls topolvm-scheduler with these manifests:
  ① In the filter phase, nodes that do not have enough available storage capacity for the requested PVC size are filtered out
  ② In the prioritize phase, the remaining nodes are scored based on their available storage capacity
[Diagram: node annotations of 100 GB, 50 GB, and 10 GB compared against the 50 GB request]

25.

Limitations with Scheduler Extenders
• Many managed Kubernetes services do not allow kube-scheduler configuration changes
  o Multiple users have reported problems caused by this limitation (topolvm: discussion#713, issue#235)
• topolvm-scheduler cannot be used by other CSI plugins
These limitations are the primary motivation behind KEP-4049.

26.

Agenda
• Local PV Overview and Limitations
• Introduction to TopoLVM
• Storage Capacity Scoring of Nodes for Dynamic Provisioning (KEP-4049)
• Summary

27.

Overview
• Purpose:
  • Add scoring functionality based on node storage capacity
  • Enable capacity-aware scheduling without a scheduler extender
  • Allow CSI plugins other than TopoLVM to also benefit from storage-aware scheduling
• Status:
  • Alpha stage (as of Kubernetes 1.33)
  • Can be enabled with the StorageCapacityScoring feature gate
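Since the feature gate must be enabled on the control plane, here is a minimal sketch of one way to turn it on for a test cluster using a kind Cluster config; this is just an example setup, not part of KEP-4049 itself:

# kind cluster config that enables the alpha feature gate cluster-wide
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  StorageCapacityScoring: true
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker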

28.

Behavior of StorageCapacityScoring
The user creates a Pod and a PVC; GetCapacity is called on the CSI plugin (TopoLVM) to collect storage capacity information for kube-scheduler with StorageCapacityScoring.
[Diagram: nodes A, B, and C with 100 GB, 50 GB, and 10 GB available]
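Under the hood, this capacity information reaches the scheduler as CSIStorageCapacity objects published via Storage Capacity Tracking; a sketch of such an object, with illustrative names and label key, looks like this:

apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: csisc-node-a-example          # illustrative; real names are generated
  namespace: topolvm-system           # illustrative namespace
storageClassName: topolvm-provisioner # the StorageClass this capacity applies to
capacity: 100Gi                       # free space reported for this topology segment
nodeTopology:                         # which node(s) this capacity applies to
  matchLabels:
    topology.topolvm.io/node: node-a  # illustrative topology label key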

29.

Behavior of StorageCapacityScoring
kube-scheduler with StorageCapacityScoring schedules the Pod to the node with as much available storage capacity as possible.
[Diagram: the Pod with a 10 GB PVC is placed on node A (100 GB available) rather than node B (50 GB) or node C (10 GB)]

30.

Demo: Video
• There are 3 nodes in a Kubernetes cluster
• Create multiple Pods and PVCs one by one
• The first demo:
  o The StorageCapacityScoring feature gate is disabled
  o This will help us understand the problem we're trying to solve
• The second demo:
  o The StorageCapacityScoring feature gate is enabled
  o You will be able to see how volumes are provisioned based on available storage capacity

31.

Configure scoring policy
You can configure the scoring policy with the shape setting in the KubeSchedulerConfiguration.

Prioritize nodes with higher available storage capacity (default):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  ...
  pluginConfig:
  - name: VolumeBinding
    args:
      ...
      shape:
      - utilization: 0
        score: 10
      - utilization: 100
        score: 0

Prioritize nodes with lower available storage capacity:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  ...
  pluginConfig:
  - name: VolumeBinding
    args:
      ...
      shape:
      - utilization: 0
        score: 0
      - utilization: 100
        score: 10

32.

Configure Scoring Policy
[Chart: score (0 to 10, max) vs. utilization (0 to 100%); the default shape gives higher scores at lower utilization, prioritizing nodes with higher available storage capacity, while the inverted shape prioritizes nodes with lower available storage capacity]

33.

Agenda
• Local PV Overview and Limitations
• Introduction to TopoLVM
• Storage Capacity Scoring of Nodes for Dynamic Provisioning (KEP-4049)
• Summary

34.

Summary
• Local PV offers high performance but presents management limitations in Kubernetes
• TopoLVM overcomes the limitations of local PV
• TopoLVM's capacity-aware scheduling relies on the Kubernetes scheduler extender mechanism
• KEP-4049 (StorageCapacityScoring) provides this scheduling for all CSI drivers that support Storage Capacity Tracking

35.

Acknowledgments
• Special thanks to sig-storage members for their reviews and support of KEP-4049
• Thanks to everyone who provided feedback

36.

For More Details
• Want to learn more about the design of TopoLVM or StorageCapacityScoring (KEP-4049)?
• The design document of TopoLVM:
  • https://github.com/topolvm/topolvm/blob/main/docs/design.md
• The design document of KEP-4049:
  • https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/4049-storage-capacity-scoring-of-nodes-for-dynamic-provisioning/README.md

37.

Thank You!
• Questions?

38.

©️ Cybozu, Inc.