A Deep Reinforcement Learning-based Approach for Revenue Optimization in PV-Battery Storage Systems


November 28, 2025

#DeepReinforcementLearning #Photovoltaics #BatteryStorageSystems #ElectricityMarket #RevenueOptimization

Slide overview

ICECET 2025

Daisuke Kodaira

@daisuke-kodaira


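Daisuke Kodaira: Assistant Professor (energy and environment), University of Tsukuba. Current research topics include charging scheduling for electric vehicles, blockchain for energy trading, and forecasting of PV generation and energy demand. Feel free to get in touch about the content of these slides: kodaira.daisuke.gf[at]u.tsukuba.ac.jp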


Text of each slide

June 1, 2025
ICECET 2025, Paris
A Deep Reinforcement Learning-based Approach for Revenue Optimization in PV-Battery Storage Systems
Yuki Osone
Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
[email protected]

Introduction
Background and Problem
• Battery owner: Revenue (+ JPY), Imbalance Cost (− JPY)
• Imbalance penalties incurred in electricity trading [1]
• Penalty: gap between the day-ahead schedule and the real-time supply
[Diagram: Electricity Market and PV-Battery Storage Systems (Battery, Photovoltaic (PV))]
Research Objective
Develop a PV–battery control algorithm that reduces imbalance costs while maximizing revenue
[1] Tokyo Electric Power Company Holdings, Inc., "Open-Platform Aggregation Business Demonstration Project" (in Japanese), https://sii.or.jp/vpp31/uploads/B_1_2_tepco.pdf

Previous Research
Category / Methods / Limitations or Strengths
• Model-based / MPC, MILP [Abdullah et al., 2015] / Slow computation; hard to redesign
• Reinforcement learning-based / DQN, DDPG [Karimi Madahi et al., 2024] / Rarely consider imbalance; weak forecasting–control integration
• Our proposed deep reinforcement learning (DRL)-based / PPO (hybrid with MPC) / Fast and flexible; explicitly reduces imbalance loss

Simulation Workflow
Battery charge/discharge planning flow on the prediction day
• Input: weather-forecast data
• [Ⅰ. Forecast]: PV forecast, electricity-price forecast, imbalance-price forecast
• [Ⅱ. Control]: formulate the battery charge/discharge schedule
• Plan submission
• [Ⅲ. Battery System], with feedback
[Ⅱ. Control] extended to a hybrid DRL + MPC control
• Long-term strategy: [DRL] Proximal Policy Optimization
• Short-term optimization: Model Predictive Control

Simulation Condition
Overview of Simulation Conditions
[Assumed scenario (prosumer side)]
◼ Owns a commercially available PV array (4 kW) and battery (4 kWh)
◼ The battery control is independent of the household electricity consumption
◼ No charging from the grid is permitted
[Input data]
◼ Training data for the forecasters: observation data collected at Tsukuba City, Japan, 1 Apr 2022 – 31 Mar 2023
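The scenario above can be written down as a small configuration object. The following is only a sketch: the names `Scenario` and `clip_charge` are hypothetical, and clipping a charge request is shown purely to illustrate what "no charging from the grid" implies. The slides handle infeasible actions through a negative reward instead, not through clipping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Prosumer-side assumptions taken from the slide."""
    pv_capacity_kw: float = 4.0
    battery_capacity_kwh: float = 4.0
    grid_charging_allowed: bool = False


def clip_charge(requested_kwh, pv_kwh, stored_kwh, scenario=Scenario()):
    """Largest charge (kWh) that fits the battery headroom and, when grid
    charging is forbidden, does not exceed the PV energy available.
    Illustration only; not the method used in the slides."""
    limit = scenario.battery_capacity_kwh - stored_kwh
    if not scenario.grid_charging_allowed:
        limit = min(limit, pv_kwh)
    return max(0.0, min(requested_kwh, limit))
```

For instance, with 2 kWh already stored and only 1 kWh of PV energy available in the step, a 3 kWh charge request would be limited by the PV energy rather than by the battery headroom.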

Simulation Model
Formulate the battery-storage control problem as a Markov Decision Process
◼ State $S_t$: PV power output, electricity price, imbalance price, battery state-of-charge, previous action $a_{t-1}$, and time-series features (sin, cos)
◼ Action $a_t$: Charge/discharge command given as a continuous value
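A minimal sketch of how such a state vector might be assembled is given below. Several details are assumptions that do not come from the slides: the field order, the 48 steps per day (30-minute slots), the [-1, 1] action range with positive values meaning discharge, and the function names `build_state` and `clip_action`.

```python
import math


def build_state(pv_kw, price, imbalance_price, soc, prev_action,
                step, steps_per_day=48):
    """Return S_t as a flat list.

    The time of day is encoded as (sin, cos) so that the last step of one
    day and the first step of the next are close in feature space.
    steps_per_day=48 is an assumed resolution, not taken from the slides.
    """
    angle = 2.0 * math.pi * (step % steps_per_day) / steps_per_day
    return [pv_kw, price, imbalance_price, soc, prev_action,
            math.sin(angle), math.cos(angle)]


def clip_action(a_t, low=-1.0, high=1.0):
    """Keep the continuous charge/discharge command inside an assumed range."""
    return max(low, min(high, a_t))
```

The cyclic encoding is a common choice for time-of-day inputs; the slides only state that sin and cos features are used, not how they are computed.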

Simulation Model
Reward design of the PPO model: $R_1$, $R_2$, $R_3$
① Positive reward for discharge, $R_1$
Discharged energy (kWh) × Electricity price (JPY/kWh)
② Negative reward for infeasible actions, $R_2$
• Charging beyond PV output
• Discharging beyond remaining battery capacity
③ Negative reward considering imbalance loss, $R_3$
PV output (kWh) × Forecast error $\varepsilon$ × Imbalance price (JPY/kWh)
• Using forecast error $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ to synthetically reproduce the imbalance losses
• Calculate actual forecast errors from past data ($\sigma = 0.22$)
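The three reward terms can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the fixed penalty size `infeasible_penalty`, the use of the absolute value of the forecast error, and the function names are all hypothetical. Only the product forms of $R_1$ and $R_3$, the two infeasibility conditions, and $\sigma = 0.22$ come from the slide.

```python
import random

SIGMA = 0.22  # forecast-error standard deviation reported on the slide


def sample_forecast_error(rng, sigma=SIGMA):
    """Draw eps ~ N(0, sigma^2) to synthesize an imbalance-causing error."""
    return rng.gauss(0.0, sigma)


def reward_terms(discharge_kwh, charge_kwh, pv_kwh, stored_kwh,
                 price, imbalance_price, eps, infeasible_penalty=1.0):
    """Return (R1, R2, R3) for one time step."""
    # R1: revenue from discharged energy.
    r1 = discharge_kwh * price
    # R2: penalty when charging exceeds PV output or discharging exceeds
    # the stored energy. The penalty magnitude is an assumption.
    infeasible = charge_kwh > pv_kwh or discharge_kwh > stored_kwh
    r2 = -infeasible_penalty if infeasible else 0.0
    # R3: imbalance loss. Taking |eps| so the term is always a loss is an
    # assumption; the slide gives only the product form.
    r3 = -pv_kwh * abs(eps) * imbalance_price
    return r1, r2, r3


rng = random.Random(0)  # seeded for reproducibility
eps = sample_forecast_error(rng)
r1, r2, r3 = reward_terms(1.0, 0.0, 2.0, 3.0, 10.0, 20.0, eps)
total_reward = r1 + r2 + r3
```

Summing the three terms with equal weight is also an assumption; the slides do not state how $R_1$, $R_2$, and $R_3$ are combined.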

Simulation result
Two-Day Charge/Discharge Schedule Results (8–9 Sep 2022)
[Figure: charge/discharge schedules of the Proposed Model (DRL - Imbalance Aware) and the Conventional Model (DRL - Imbalance Unaware), with annotations "Surge in Imbalance Prices", "Discharge", and "Charge"]
The proposed model follows electricity prices and adapts to surges in imbalance prices, whereas the conventional model responds only to electricity prices.

Simulation result
Revenue and Cost Comparison over One Month of Simulation
Model / Net Revenue / Imbalance Loss
• Proposed Model (DRL - Imbalance Aware) / +35% / −47%
• Conventional Model (DRL - Imbalance Unaware) / +33% / −29%
*Change relative to the rule-based model
The proposed model outperforms the conventional model in both profit and penalty reduction.
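The percentages above are changes relative to the rule-based baseline. The helper below is a small sketch of that arithmetic; `relative_change_pct` is a hypothetical name, and the slides do not report the absolute revenue or loss values, so none are assumed here.

```python
def relative_change_pct(value, baseline):
    """Percentage change of `value` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / abs(baseline)
```

Under this definition, a model that earns 1.35 times the rule-based net revenue is reported as +35%, and one that incurs 0.53 times the rule-based imbalance loss is reported as −47%.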

Simulation result
Seasonal Revenue Comparison
Net revenue of the Proposed Model compared to the Conventional Model
• April: +4.2%
• July: +6.8%
• October: +2.5%
• January: +1.4%
• Overall: +4.2%
Throughout the year, the proposed model yields higher revenue than the conventional model.

Summary
• Objective: Develop a PV-battery system control algorithm for reducing imbalance costs and optimizing revenue
• Methodology: Integrated forecasting and control using a deep reinforcement learning-based battery control approach
• Results: The proposed model increased revenue by 35% and reduced imbalance cost by 47% compared to the rule-based model
• Next: Demonstrate on our laboratory's physical battery system

Thank you for your attention. Questions? [email protected]