Energy optimization management method for microgrids based on safe reinforcement learning
Affiliation:

(1. Shizuishan Power Supply Company, State Grid Ningxia Electric Power Co., Ltd., Yinchuan 753000, China; 2. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China)

Author biography:

Corresponding author:

LI Jingjing (born 2000), female, master's student, mainly engaged in research on microgrid energy optimization management; E-mail: ljj_66@tju.edu.cn

CLC number:

TM732

Fund project:

Supported by the Science and Technology Project of State Grid Corporation of China (No. 5229SZ230003)


Abstract:

Microgrid energy management faces two challenges: poor adaptability to dynamic environments and insufficient safety during training. Traditional model-based energy optimization methods rely heavily on accurate microgrid parameters, which makes it difficult for them to cope with dynamic changes in the microgrid. A safe reinforcement learning method based on a constrained Markov game is proposed. First, multi-agent safety boundary constraints covering wind turbines, energy storage, and adjustable loads are constructed to confine policy exploration to a preset operating domain. Second, an asynchronous safety verification thread is designed to correct the gradient update direction of the policy network in real time. Finally, the proposed method is evaluated through simulation on a case study. The results show that, while ensuring system safety, the proposed method increases daily profit by 120 yuan compared with other methods and obtains the highest reward value; it also reduces wind curtailment and improves energy storage utilization. By decoupling the spatiotemporal correlation between safety constraints and policy optimization, the method provides a scalable safe reinforcement learning paradigm for distributed energy systems.
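The abstract does not give the form of the safety boundary constraints, so the following is only a minimal sketch of one common way to confine exploration to a preset operating domain: projecting a raw action onto simple box limits before it is applied. The class `SafeDomain`, the function `project_action`, the sign convention for storage power, the lossless battery model, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SafeDomain:
    """Hypothetical per-step operating limits (all values are illustrative)."""
    wind_available_kw: float     # wind power available in this step
    storage_power_max_kw: float  # charge/discharge power rating
    storage_capacity_kwh: float  # usable battery capacity
    soc_min: float               # lowest allowed state of charge (0..1)
    soc_max: float               # highest allowed state of charge (0..1)
    load_shift_max_kw: float     # largest allowed adjustable-load change
    dt_h: float = 1.0            # step length in hours


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def project_action(raw, soc, dom):
    """Project a raw action (wind_kw, storage_kw, load_kw) into the safe domain.

    Assumed convention: storage_kw > 0 charges, storage_kw < 0 discharges.
    Battery losses are ignored in this sketch.
    """
    wind_kw, storage_kw, load_kw = raw
    # Wind output cannot be negative or exceed what is available.
    wind_kw = clip(wind_kw, 0.0, dom.wind_available_kw)
    # SOC-aware limits: never charge past soc_max or discharge below soc_min.
    max_charge = min(dom.storage_power_max_kw,
                     (dom.soc_max - soc) * dom.storage_capacity_kwh / dom.dt_h)
    max_discharge = min(dom.storage_power_max_kw,
                        (soc - dom.soc_min) * dom.storage_capacity_kwh / dom.dt_h)
    storage_kw = clip(storage_kw, -max_discharge, max_charge)
    # Adjustable load may shift up or down within a symmetric band.
    load_kw = clip(load_kw, -dom.load_shift_max_kw, dom.load_shift_max_kw)
    return (wind_kw, storage_kw, load_kw)


# Example: every component of the raw action violates a limit.
domain = SafeDomain(wind_available_kw=100.0, storage_power_max_kw=50.0,
                    storage_capacity_kwh=100.0, soc_min=0.25, soc_max=0.75,
                    load_shift_max_kw=20.0)
safe_action = project_action((120.0, 40.0, -30.0), soc=0.5, dom=domain)
print(safe_action)  # (100.0, 25.0, -20.0)
```

A projection like this guarantees that every explored action is feasible, but it only covers per-step box limits; the gradient correction performed by the asynchronous verification thread described in the abstract is not modeled here.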

Cite this article

ZHANG Yunfeng, XU Tao, LI Wen, et al. Energy optimization management method for microgrids based on safe reinforcement learning[J]. Journal of Electric Power Science and Technology, 2026, 41(2): 259-270.

History
  • Received: 2025-02-28
  • Published online: 2026-05-01