132 84 70MB
English Pages 767 [759] Year 2023
Yuan Tian · Tinghuai Ma · Qingshan Jiang · Qi Liu · Muhammad Khurram Khan (Eds.)
Communications in Computer and Information Science
1796
Big Data and Security 4th International Conference, ICBDS 2022 Xiamen, China, December 8–12, 2022 Proceedings
Communications in Computer and Information Science Editorial Board Members Joaquim Filipe , Polytechnic Institute of Setúbal, Setúbal, Portugal Ashish Ghosh , Indian Statistical Institute, Kolkata, India Raquel Oliveira Prates , Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil Lizhu Zhou, Tsinghua University, Beijing, China
1796
Rationale The CCIS series is devoted to the publication of proceedings of computer science conferences. Its aim is to efficiently disseminate original research results in informatics in printed and electronic form. While the focus is on publication of peer-reviewed full papers presenting mature work, inclusion of reviewed short papers reporting on work in progress is welcome, too. Besides globally relevant meetings with internationally representative program committees guaranteeing a strict peer-reviewing and paper selection process, conferences run by societies or of high regional or national relevance are also considered for publication. Topics The topical scope of CCIS spans the entire spectrum of informatics ranging from foundational topics in the theory of computing to information and communications science and technology and a broad variety of interdisciplinary application fields. Information for Volume Editors and Authors Publication in CCIS is free of charge. No royalties are paid, however, we offer registered conference participants temporary free access to the online version of the conference proceedings on SpringerLink (http://link.springer.com) by means of an http referrer from the conference website and/or a number of complimentary printed copies, as specified in the official acceptance email of the event. CCIS proceedings can be published in time for distribution at conferences or as postproceedings, and delivered in the form of printed books and/or electronically as USBs and/or e-content licenses for accessing proceedings at SpringerLink. Furthermore, CCIS proceedings are included in the CCIS electronic book series hosted in the SpringerLink digital library at http://link.springer.com/bookseries/7899. Conferences publishing in CCIS are allowed to use Online Conference Service (OCS) for managing the whole proceedings lifecycle (from submission and reviewing to preparing for publication) free of charge. Publication process The language of publication is exclusively English. Authors publishing in CCIS have to sign the Springer CCIS copyright transfer form, however, they are free to use their material published in CCIS for substantially changed, more elaborate subsequent publications elsewhere. For the preparation of the camera-ready papers/files, authors have to strictly adhere to the Springer CCIS Authors’ Instructions and are strongly encouraged to use the CCIS LaTeX style files or templates. Abstracting/Indexing CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, Scopus. CCIS volumes are also submitted for the inclusion in ISI Proceedings. How to start To start the evaluation of your proposal for inclusion in the CCIS series, please send an e-mail to [email protected].
Yuan Tian · Tinghuai Ma · Qingshan Jiang · Qi Liu · Muhammad Khurram Khan Editors
Big Data and Security 4th International Conference, ICBDS 2022 Xiamen, China, December 8–12, 2022 Proceedings
Editors Yuan Tian Nanjing Institute of Technology Nanjing, China Qingshan Jiang Chinese Academy of Sciences Shenzhen, China Muhammad Khurram Khan King Saud University Riyadh, Saudi Arabia
Tinghuai Ma Nanjing University of Information Science and Technology Nanjing, Jiangsu, China Qi Liu Nanjing University of Information Science and Technology Nanjing, China
ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-981-99-3299-3 ISBN 978-981-99-3300-6 (eBook) https://doi.org/10.1007/978-981-99-3300-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This volume contains the papers from the 4th International Conference on Big Data and Security (ICBDS 2022). The event was held in Xiamen and Quanzhou, and was organized by Nanjing Institute of Technology, Xiamen University, JiangSu Computer Society, and IEEE Broadcast Technology Society. The International Conference on Big Data and Security (ICBDS 2022) brings experts and researchers together from all over the world to discuss the current status and potential ways to address security and privacy regarding the use of Big Data systems. Big Data systems are complex and heterogeneous; due to their extraordinary scale and the integration of different technologies, new security and privacy issues are introduced and must be properly addressed. The ongoing digitalization of the business world is putting companies and users at risk of cyber-attacks more than ever before. Big data analysis has the potential to offer protection against these attacks. Participation in workshops on specific topics of the conference was expected to achieve progress, and global networking in transferring and exchanging ideas. The papers come from researchers who work in universities and research institutions, and give us the opportunity to achieve a good level of understanding of the mutual needs, requirements, and technical means available in this field of research. The topics included in the second edition of this event include the following fields connected to Big Data: Security in Blockchain, IoT Security, Security in Cloud and Fog Computing, Artificial Intelligence/Machine Learning Security, and Cybersecurity and Privacy. We received 211 submissions and accepted 54 papers. All the accepted papers were peer reviewed by three qualified reviewers chosen from our technical program committee based on their qualifications and experience. The proceedings editors wish to thank the dedicated Scientific Committee members and all the other reviewers for their efforts and contributions. We also thank Springer for their trust and for publishing the proceedings of ICBDS 2022. January 2022
Yuan Tian Tinghuai Ma Qingshan Jiang Qi Liu Muhammad Khurram Khan
Organization
General Chairs Ting Huai Ma Yi Pan Muhammad Khurram Khan Yuan Tian
Nanjing University of Information Science and Technology, China Georgia State University, USA King Saud University, Saudi Arabia Nanjing Institute of Technology, China
Organization Chair Zhong Tan
Xiamen University, China
Technical Program Chairs Victor S. Sheng Qi Liu
University of Central Arkansas, USA Nanjing University of Information Science and Technology, China
Technical Program Committee Members Eui-Nam Huh Heba Abdullataif Kurdi Omar Alfandi Dongxue Liang Mohammed Al-Dhelaan Päivi Raulamo-Jurvanen Zeeshan Pervez Adil Mehmood Khan Wajahat Ali Khan Qiao Lin Ye Pertti Karhapää Farkhund Iqbal Muhammad Ovais Ahmad Lejun Zhang
Kyung Hee University, South Korea Massachusetts Institute of Technology, USA Zayed University, UAE Tsinghua University, China King Saud University, Saudi Arabia University of Oulu, Finland University of the West of Scotland, UK Innopolis University, Russia Kyung Hee University, South Korea Nanjing Forestry University, China University of Oulu, Finland Zayed University, UAE Karlstad University, Sweden Yangzhou University, China
viii
Organization
Linshan Shen Ghada Al-Hudhud Lei Han Tang Xin Mznah Al-Rodhaan Yao Zhenjian Thant Zin Oo Mohammad Rawashdeh Alia Alabdulkarim Elina Annanperä Soha Zaghloul Mekki Basmah Alotibi Mariya Muneeb Maryam Hajakbari Miada Murad Pilar Rodríguez Zhiwei Wang Rand. J Jiagao Wu Weipeng Jing Yu Zhang Nguyen H. Tran Hang Chen Sarah Alkharji Chunguo Li Babar Shah Tianyang Zhou Manal Hazazi Amiya Kumar Tripathy Shaoyong Guo Shadan AlHamed Cunjie Cao Linfeng Liu Chunliang Yang Patrick Hung Xinjian Zhao
Harbin Engineering University, China King Saud University, Saudi Arabia Nanjing Institute of Technology, China University of International Relations, China King Saud University, Saudi Arabia Huazhong University of Science and Technology, China Kyung Hee University, South Korea University of Central Missouri, USA King Saud University, Saudi Arabia University of Oulu, Finland King Saud University, Saudi Arabia King Saud University, Saudi Arabia King Saud University, Saudi Arabia Islamic Azad University, Iran King Saud University, Saudi Arabia Technical University of Madrid, Spain Hebei Normal University, China Shaqra University, Saudi Arabia Nanjing University of Posts and Telecommunications, China Northeast Forestry University, China Harbin Institute of Technology, China University of Sydney, Australia Nanjing Institute of Technology, China King Saud University, Saudi Arabia Southeast University, China Zayed University, UAE State Key Laboratory of Mathematical Engineering and Advanced Computing, China King Saud University, Saudi Arabia Edith Cowan University, Australia Beijing University of Posts and Telecommunications, China King Saud University, Saudi Arabia Hainan University, China Nanjing University of Posts and Telecommunications, China China Mobile IoT Company Limited, China University of Ontario Institute of Technology, Canada State Grid Nanjing Power Supply Company, China
Organization
Sungyoung Lee Zhengyu Chen Jian Zhou Pasi Kuvaja Xiao Xue Jianguo Sun Farkhund Iqbal Zilong Jin Susheela Dahiya Ming Pang Yuanfeng Jin Maram Al-Shablan Kejia Chen Valentina Lenarduzzi Davide Taibi Jinghua Ding XueSong Yin Qiang Ma Shiwen Hu Manar Hosny Lei Cui Yonghua Gong Kashif Saleem Xiaojian Ding Irfan Mohiuddin Ming Su Yunyun Wang Abdullah Al-Dhelaan
ix
Kyung Hee University, South Korea Jinling Institute of Technology, China Nanjing University of Posts and Telecommunications, China University of Oulu, Finland Tianjin University, China Harbin Engineering University, China Zayed University, UAE Nanjing University of Information Science and Technology, China University of Petroleum & Energy Studies, India Harbin Engineering University, China Yanbian University, China King Saud University, Saudi Arabia Nanjing University of Posts and Telecommunications, China University of Tampere, Finland University of Tampere, Finland Sungkyunkwan University, South Korea Nanjing Institute of Technology, China King Saud University, Saudi Arabia Accelor Ltd., USA King Saud University, Saudi Arabia Chinese Academy of Sciences (CAS), China Nanjing University of Posts and Telecommunications, China King Saud University, Saudi Arabia Nanjing University of Finance and Economics, China King Saud University, Saudi Arabia Beijing University of Posts and Telecommunications, China Nanjing University of Posts and Telecommunications, China King Saud University, Saudi Arabia
x
Organization
Workshop Chairs Mohammad Mehedi Hassan Asad Masood Khattak
King Saud University, Saudi Arabia Zayed University, UAE
Publication Chair Vidyasagar Potdar
Curtin University, Australia
Organization Committee Members Fan Lin Yuyao Zhang Jalal Al-Muhtadi Geng Yang Qiao Lin Ye Pertti Karhapää Lei Han Yong Zhu Bin Xie Dawei Li Jing Rong Chen Thant Zin Oo Alia Alabdulkarim Rand. J Hang Chen Jiagao Wu
Xiamen University, China Xiamen University, China King Saud University, Saudi Arabia Nanjing University of Posts and Telecommunications, China Nanjing Forestry University, China University of Oulu, Finland Nanjing Institute of Technology, China Jingling Institute of Technology, China Hebei Normal University, China Nanjing Institute of Technology, China Nanjing Institute of Technology, China Kyung Hee University, South Korea King Saud University, Saudi Arabia Shaqra University, Saudi Arabia Nanjing Institute of Technology, China Nanjing University of Posts and Telecommunications, China
Contents
Big Data and New Method Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection of Multi-type Enterprises Based on Energy Consumption Big Data Mining . . . . Xiangguo Liu, Jian Geng, Zhonglong Wang, Lingyao Cai, and Fanjiao Yin
3
Research on Data-Driven AGC Instruction Execution Effect Recognition Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Haiyang Jiang, Hongtong Liu, and Yangfei Zhang
14
Research on Day-Ahead Scheduling Strategy of the Power System Includes Wind Power Plants and Photovoltaic Power Stations Based on Big Data Clustering and Filling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Feng Qi, Qiang Wang, Xiaoqiang Wei, Yiping Zhang, Wenbin Wu, Donyang He, Taicheng Wang, and Suyang Shen
25
Day-Ahead Scenario Generation Method for Renewable Energy Based on Historical Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hong Wang, Yang Liu, Chao Yin, Xuesong Bai, Bo Sun, Menglei Li, and Xiyong Yang
37
A High-Frequency Stock Price Prediction Method Based on Mode Decomposition and Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weijie Chen, Qingshan Jiang, Xibei Jia, Abdur Rasool, and Weihui Jiang
47
A Cross-Platform Instant Messaging User Association Method Based on Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pei Zhou, Xiangyang Luo, Shaoyong Du, Wenqi Shi, and Jiashan Guo
63
Research on Data Watermark Tracing System in Hadoop Environment . . . . . . . . Wenyu Qiao and Jiexi Wang Application of RFID Tag in the Localization of Power Cable Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhenyu Zhang, Shuming Feng, Min Xiao, Yongcheng Yang, and Gangbo Song
80
92
Research Hotspots and Evolution Trend of Virtual Power Plant in China: An Empirical Analysis Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Yan Zhang, Pengcheng Liu, Hao Xu, and Mulan Wang
xii
Contents
Influencing Factors Analysis and Prediction Model of Pavement Transverse Crack Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Yuqin Zhu, Wengang Ma, Ling Cong, Chengtao Li, and Shixiang Hu Research on the Evolution of New Energy Industry Financing Ecosystem Under the Background of Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 Hairong Wang and Qiuchi Wu A Survey of the State-of-the-Art and Some Extensions of Recommender System Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Lixin Jia, Lixiu Jia, and Lihang Feng An Innovative AdaBoost Process Using Flexible Soft Labels on Imbalanced Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 Jinke Wang, Biao Song, Xinchang Zhang, Yuan Tian, and Ran Guo Artificial Intelligence and Machine Learning Security Feature Fusion Based IPSO-LSSVM Fault Diagnosis of On-Load Tap Changers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Honghua Xu, Laibi Yin, Ziqiang Xu, Qian Xin, and Mengjie Lv Graphlet-Based Measure to Assess Institutional Research Teams . . . . . . . . . . . . . 200 Shengqing Li and Jiulei Jiang Logical Relationship Extraction of Multimodal South China Sea Big Data Using BERT and Knowledge Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 Peng Yufang, Xu Hao, Jin Weijian, and Yang Haiping Research on Application of Knowledge Graph in War Archive Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 Huang Yongqin, Chen Xushan, Yang Anlian, and Ping Shuo Semi-supervised Learning Enabled Fault Analysis Method for Power Distribution Network Based on LSTM Autoencoder and Attention Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 Haining Xie, Liming Zhuang, and Xiang Wang Fault Detection Method for Power Distribution Network Based on Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 Haining Xie and Lijuan Chao Factors Influencing Chinese Users’ Willingness to Pay for O2O Knowledge Products Based on Information Adoption Model . . . . . . . . . . . . . . . . . 276 Zhu Zhentao, Yue Wen, Zhang Yan, and Wu Qiuchi
Contents
xiii
A Survey of Integrating Federated Learning with Smart Grids: Application Prospect, Privacy Preserving and Challenges Analysis . . . . . . . . . . . . . . . . . . . . . . 296 Zhichao Tang, Yan Yan, Dong Wu, Tianhao Yang, Ruixuan Dong, Shuyang Hao, Wei Wang, Yizhi Chen, and Yuan Tian Application of the Fusion Access Technology of Carrier and 5G in Power Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306 Yang Hu, Ming Zhang, Yingli He, Guangxiang Jin, Qi Wei, and Jia Yu Muscle Fatigue Classification Based on GA Optimization of BP Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 Mengjie Zang, Lidong Xing, Zhiyu Qian, and Liuye Yao Data Technology and Network Security Research on Typical Scenario Generation Based on Distribution Network Data Mining and Improved Policy Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Jinhu Wang, Tongzhou Zhang, Ming Chen, Wei Han, Mingze Ji, and Yuzhuo Zhang A Data-Driven and Deep Learning-Based Economic Evaluation Method for New Power System Distribution Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Yunzhao Wu, Jialei Zhang, Qing Duan, Guanglin Sha, and Yao Zhang Study on Random Generation of Virtual Avatars Based on Big Data . . . . . . . . . . 355 Jian Zhao, Mo Peng, Bo-Lin Zhu, and Ling-Ling Li Ordering, Pricing, and Coordination of a Closed-Loop Supply Chain with Risk Preference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 Ronghua Lu, Yisheng Wu, and Feng Xu An Explainable Optimization Method for Assembly Process Parameter . . . . . . . . 391 Hongsong Peng, Weiwei Yuan, Yimin Pu, Xiying Yang, Donghai Guan, and Ran Guo A Correlational Strategy for the Prediction of High-Dimensional Stock Data by Neural Networks and Technical Indicators . . . . . . . . . . . . . . . . . . . . . . . . . 405 Jingwei Hong, Ping Han, Abdur Rasool, Hui Chen, Zhiling Hong, Zhong Tan, Fan Lin, Steven X. Wei, and Qingshan Jiang Research on Attribute-Based Privacy-Preserving Computing Technologies . . . . . 420 Shuo Qiu, Wenhui Ni, Yanfeng Shi, and Wanni Xu
xiv
Contents
Research on BIM Modeling Technology from the Perspective of Power Big Data Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433 Yan Li Security Risk Management of the Internet of Things Based on 5G Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Wei Cao, Yang Hu, Shuang Yang, Xue-yang Zhu, and Jia Yu Multi Slice SLA Collaboration and Optimization of Power 5G Big Data Security Business . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 Daohua Zhu, Yajuan Guo, Lei Wei, Yunxiao Sun, and Wei Liu Virtualized Network Functions Placement Scheme in Cloud Network Collaborative Operation Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 Youjun Hu, Lan Gan, E. Longhui, Fangcheng Chu, Yang Lu, Xifan Nie, and Fei Zhao Grounded Theory-Driven Knowledge Production Features Mining: One Empirical Study Based on Big Data Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . 493 Hao Xu, Yiyang Li, Mulan Wang, Yufang Peng, Qinwei Chen, Pengcheng Liu, and Yijing Li Research on Digital Twin Technology of Main Equipment for Power Transmission and Transformation Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . 512 Yu Chen, Ziqian Zhang, and Ning Tang Efficient Spatiotemporal Big Data Indexing Algorithm with Loss Control . . . . . . 524 Ziyu Wang, Runda Guan, Xiaokang Pan, Biao Song, Xinchang Zhang, and Yuan Tian Cybersecurity and Privacy Privacy Measurement Based on Social Network Properties and Structure . . . . . . 537 YuHong Gong, Biao Jin, and ZhiQiang Yao Research on 5G-Based Zero Trust Network Security Platform . . . . . . . . . . . . . . . . 552 Zaojian Dai, Jidong Zhang, Yong Li, Xinyi Li, Ziang Lu, and Wengao Fang A Mobile Data Leakage Prevention System Based on Encryption Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 Wen Shen and Hongzhang Xiong Research on Data Security Access Control Mechanism in Cloud Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 Chen Luo, Yang Li, and Qianxuan Wang
Contents
xv
Research on Data Security Storage System Based on Distributed Database . . . . . 586 Bo Liu, Lan Zhang, and Jinke Wang Design of an Active Data Watermark Detection System . . . . . . . . . . . . . . . . . . . . . 597 Lan Zhang, Xiangyang Zhang, and Tiejun Yang Research on Privacy Protection Methods for Data Mining . . . . . . . . . . . . . . . . . . . 609 Jindong He, Rongyan Cai, Shanshan Lei, and Dan Wu A Cluster-Based Facial Image Anonymization Method Using Variational Autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621 Yuanzhe Yang, Zhiyi Niu, Yuying Qiu, Biao Song, Xinchang Zhang, Yuan Tian, and Ran Guo IoT Security Research on the Security Diagnosis Platform for Partial Discharge of 10 kV Cables in Urban Distribution Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637 Xinping Wang, Guofei Guan, Chunpeng Li, Hao Zhang, Chao Jiang, and Qingwu Song Research on the Electromagnetic Sensor-Based Partial Discharge Security Monitoring System of Distribution Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651 Xinping Wang, Chunpeng Li, Xiaoping Yang, Feng Jiang, Qiqi Luan, and Yifan Wang Design of Ultrasonic-Based Remote Distribution Network Online Security Monitoring Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664 Qiqi Luan, Feng Jiang, Xinping Wang, Qingwu Song, Tianze Zhu, and Jiangbin Wang An Intelligent IoT Terminal Detection System Based on Data Sniffing . . . . . . . . 677 Tao Chen, Can Cao, and Pengcheng Ni Intelligent IoT Terminal Software Identification System Based on Behavior Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 Zhiyuan Ye, Cen Chen, Nuannuan Li, Wen Yang, Zheng Zhang, and Can Cao Research on the Identification Model of Power Terminal . . . . . . . . . . . . . . . . . . . . 701 Wenbo Shang, Xin Jin, Biying Sun, Xiaoqin Liu, and Chunhui Du
xvi
Contents
Research on Firmware Vulnerability Mining Model of Power Internet of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713 Chao Zhou, Ziying Wang, Jing Guo, Yajuan Guo, Haitao Jiang, Zhimin Gu, and Wei Huang How Can Social Workers Participate in Big Data Governance? The Third-Party Perspective on Big Data Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . 725 Di Zhao Power Network Scheduling Strategy Based on Federated Learning Algorithm in Edge Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737 Xiaowei Xu, Han Ding, Jiayu Wang, and Liang Hua Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
Big Data and New Method
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection of Multi-type Enterprises Based on Energy Consumption Big Data Mining Xiangguo Liu(B) , Jian Geng, Zhonglong Wang, Lingyao Cai, and Fanjiao Yin State Grid Shandong Electric Power Company, Tai’an Power Supply Company, Tai’an 271000, China [email protected]
Abstract. In view of the continuous improvement of the current energy consumption data of many types of enterprises, the effective monitoring of enterprise energy consumption and the early warning of different reward energy use will be the top priority. This paper proposes a data-driven energy efficiency evaluation and energy anomaly detection method for multi-type enterprises based on energy consumption big data mining. This method uses K-means Clustering algorithm identifies the energy consumption patterns of different enterprises, which is convenient for the evaluation of enterprise energy efficiency. Then, on the basis of pattern division, the outlier detection of enterprise energy consumption data is completed by CEEMDAN-LOF algorithm, and the abnormal energy consumption detection and research of enterprises are realized. The example uses the real energy consumption data of power grid enterprises, and the simulation results show the effectiveness of the proposed method. Keywords: Early warning of abnormal energy use · big data mining · K-means clustering · outlier detection
1 Introduction Energy conservation and emissions reduction has become the consensus of the whole society, is the only way to sustainable development [1, 2], so in the continuous improvement of the enterprise energy consumption data, the effective use of enterprise energy consumption data for energy consumption can effectively monitor and abnormal use of early warning is the most important [3], identification of abnormal energy consumption to reduce energy consumption of enterprises improves the management level of the power grid provides the positive support. Project Supported by Science and Technologyleft-248285 Project of State Grid Shandong Electric Power Company “Research on the key technologies for intelligent research and judgment of energy efficiency anomalies of multi type enterprises based on massive data mining” (5206002000QW). © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 3–13, 2023. https://doi.org/10.1007/978-981-99-3300-6_1
4
X. Liu et al.
Reference [4] proposed the concept of multi-energy system to obtain the optimal energy ratio through energy efficiency and energy consumption ratio. Reference [5] proposed the establishment of a comprehensive energy system evaluation index body through the related architecture of energy Internet. Reference [6] proposed a Stacking ensemble model for anomaly monitoring based on five ML models. Reference [7] designed different screening methods for measurement anomalies based on the differences in massive data features.
2 Determination of the Optimal Number of Clusters Evaluation index, SC is a common clustering algorithm for arbitrary bunch of class C j and a single sample of SC calculation formula is: b−a max(a, b)
(1)
1 (xi − cj )2 n1
(2)
1 (xi − xk )2 m
(3)
s= where: a=
xi ∈Cj
b=
xi ∈Cj xk ∈Cl
where: S is the contour coefficient of a single sample; A is the average distance between samples and all other points in class C j ; B is the average distance between the samples in class C l and all points in the samples in the nearest class C j ; C j is the center of mass of class C j ; M and N represent the number of samples in class C j and C l , respectively. However, SC will be overrepresented on convex clusters. Therefore, GSA algorithm is considered to be introduced on the basis of SC. GSA can estimate the optimal number of clusters in the dataset, and the effect is obvious in the scenario where the number of clusters is set at a low value. The clustering dispersion of K clusters is defined as: W (K) =
K
(xi − cj )2
(4)
i=1 xi ∈Cj
The main idea of GSA algorithm is to calculate the natural logarithm of the corresponding cluster dispersion for each set cluster number K, and compare it with the threshold, so as to determine the optimal cluster number. After natural logarithm processing, the data situation tends to be more linearized, which makes it easier to determine the Gap value Gap(K) between clustering dispersion and threshold. The Gap value is defined as follows: Gap(K) = E ln[Wr (K)] − ln[W (K)]
(5)
where: R is the selected reference dataset, and E is the mathematical expectation of the reference dataset. The Gap (K) value contains the information of clustering dispersion
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection
5
between the data set to be tested and the reference distribution. With the change of K, the Gap (K) value will also change, and the analysis of Gap (K) can determine the optimal number of clusters. The above describes the algorithm principles of GSA and SC respectively. However, the determination of the optimal clustering number by simply using GSA or SC methods may lead to misjudgment, resulting in insignificant clustering effect. Therefore, in order to improve the accuracy of clustering results, GSA-SC fusion algorithm was introduced to determine the optimal number of clusters, and the average value of Gap(K) and S was calculated to construct a new clustering evaluation index G, as shown in the following equation. G=
Gap(K) + s 2
(6)
3 Energy Consumption Pattern Recognition Based on K-Means Clustering K-means is based on partition clustering algorithm. Its main idea is that, for a data set given multiple data objects, it can be divided into several dissimilar clustering clusters by partition method, and the data objects of each cluster cannot belong to another cluster. The partition process is shown as follows (Fig. 1):
Select the iniƟal center
Produce a parƟƟon result
Are the results reasonable?
Y
The final result
N Modify the division
Fig. 1. Flowchart of partitioning based clustering algorithm
K-means clustering algorithm was proposed by MacQueen in L976, and its main idea is to divide the data set with N objects into K clusters. K-means algorithm generally uses Euclidean distance to measure similarity, and the evaluation function S of clustering effect can be defined as follows. S=
n k
Dij xi − h2
(7)
i=1 j=1
The end of the partition is marked by the minimum sum of squared distances of data objects in each cluster. However, in the actual experiment, it is impossible to minimize the sum of squared distances of each cluster, and the partitioning algorithm will fall into the local optimal solution. The execution steps of K-means clustering algorithm are divided into the following four steps: Step 1: Randomly select the initial K cluster center points, ci ;
6
X. Liu et al.
Step 2: Traverse the points in the data space to obtain the nearest initial clustering center to the point, and finally merge this point into the class. Step 3: Calculate the mean of the sum of squared distances of points within each cluster, and take the mean as the next cluster center. Step 4: If the sum of squared distances of points in the cluster does not change, end the clustering; otherwise, iterate step 2 and step 3 continuously. The time complexity of k-means algorithm is O(nkd), where n represents the number of data points, K represents the number of clustering centers, and D represents the dimension of data points. For k-means algorithm, the time taken for each iteration is O(nkd)). If the number of iterations is T, then the time complexity of K-mean clustering algorithm is O(nkd). In fact, most of the time spent by k-means algorithm is to calculate the distance between data points. Future improvement direction can be aimed at reducing the time consumption and the nearest neighbor problem.
4 Abnormal Analysis of Energy Consumption Data In this module, the abnormal analysis of energy consumption data is completed by combining CEEMDAN algorithm and LOF algorithm, and the data analysis helps enterprises to better improve energy monitoring and abnormal detection of energy consumption data. 4.1 LOF Algorithm Density-based LOF algorithm is based on the definition of KTH distance, k-distance neighborhood, reachable distance and local reachable density of data objects. Related concepts are as follows: For a given dataset D, p ∈ D and O ∈ D, the distance between point P and O is denoted as D (p, O), and the KTH distance k(p) of point P is defined as: (1) There are at least k points O ∈ D, such that D (P, O ) ≤ D (P,O): (2) There are at most k−1 points O ∈ D such that D (P, O ) < D (P, O) holds. The set of points whose distance is not greater than the KTH distance of P is the k-distance neighborhood of point P, which is defined as follows. Nk (p) = {q ∈ D − {p}|d (p, o) ≤ k(p)}
(8)
For a given dataset D, p ∈ D, O ∈ D, the KTH distance of point O and the maximum distance between point P and O are the reachable distance, and the calculation formula is as follows: rr (p, o) = max{k(o), d (p, o)}
(9)
It can be understood that when point p is far away from point O, the reachable distance is the actual distance between the two points; When point P is close to point O, the reachable distance is the K distance of point p. That is, for a point o in Nk (p), when the reachable distance rk (p, o) is taken as the distance (p, o) between two points, it indicates that K of point P from point O is more likely to be far away from a point in the
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection
7
p1 Reach-distk(p1,o)=k-distance(o) o Reach-distk(p2,o)
p2 Fig. 2. When k = 4, reach distance (P1 , O) and reach distance (P2 , O)
neighborhood. On the contrary, if rk (p, o) takes k(O), then for the point O in Nk (p), the probability that point p is in the neighborhood of K distance of point O is large (Fig. 2) The reciprocal of the average reachable distance of all points in the K-neighborhood Nk (p) based on point P is the local reachable density of point P, and the calculation formula is: lk (P) =
|Nk (p)| o∈Nk (p) rk (p, o)
(10)
According to the local reachable density, the greater the distance between a data point and its neighbors, the smaller the density between them, and vice versa. That is, the locally accessible density reflects the density of the local area around the data point. The local anomaly factor is expressed as the average of the ratio between the local accessible density of other points in N(P) and the local accessible density of point P. The calculation formula is as follows: o∈Nk (p) lk (o) (11) Lk (p) = |Nk (p)|lk (p) The local anomaly factor reflects the degree of outliers between point P and its neighbors. If the value of Lk (p) approaches 1, it means that the local density of point P is close to that of its neighbors. If the value of Lk (p) is greater than 1 and larger, it means that point P is more distant from neighboring points and can be regarded as a possible outlier. Therefore, the outlier degree of any point P in the data set can be judged by calculating Lk (p). The calculation flow of LOF algorithm is as follows: (1) Determine the parameter, k, and calculate the k-th distance. For a given dataset D, there is one sample data Pi ∈ D, i = 1, 2, n, k is a positive integer. Determine the value of parameter k and calculate each point P in dataset D and its k distance k(pi ) in its neighborhood Nk (pi ).
8
X. Liu et al.
(2) Calculate the accessible distance and the local accessible density. k(pi ) obtained from the upper dragon was used to calculate the accessible distance between p: and the k distance from all the points in the neighborhood Nk (pi ), thus calculating the local accessible density of p of lk (pi ) (3) Calculate the local anomalous factor value and divide the anomalous data set. Local anomaly factor lk (pi ) is calculated for all sample data in data set D and compared with the threshold 1. If local anomaly factor lk (pi ) ≤ 1, data P is considered normal and otherwise outliers. The local anomaly factor of all points in the dataset compared to 1. For comparison, data greater than 1 are divided into anomalous data sets, and the algorithm ends. 4.2 CEEMDAN Algorithm The specific steps of CEEMDAN are as follows: (1) Add white Gaussian noise εωi (n) to the signal X(n) to be decomposed: x(n) = x(n) + εωi (n), where ω I (n) represents the white Gaussian noise added in the ith experiment, ε represents the amplitude of the noise, and the first IMF component can be obtained by using EMD algorithm to decompose the above expression for I times and calculate the mean value: 1 IMF1 (n) I I
IMF1 (n) =
(12)
i=1
IMF1 (n) denotes the first IMF component obtained using CEEMDAN. (2) By subtracting the first F-component from the original signal to be decomposed, the margin after the first step of decomposition can be obtained: r1 (n): r1 (n) = x(n) − IMF1 (n); (3) The second IMF can be obtained by 1 decomposition experiment of R1 (n) = x(n) − IMF1 (n) using EMD [8] algorithm and taking the average value; (4) For k = 2, 3 … K, calculate the KTH allowance rk (n), The k + 1 IMF component can be obtained by decomposing rk (n) + Ek εωi (n) I times and taking the average value: 1 Ei πk (n) + εEk ωi (n) I I
IMFk+1 (n) =
(13)
i=1
where IMFk+1 (n) represents the k + 1 IMF component obtained using CEEMDAN algorithm. (5) Repeat until you can’t break it down.
5 Example Simulation and Analysis The example data adopts the historical energy consumption data of a cement enterprise as the research sample to study the abnormal data detection proposed in this section. Daily level data of the enterprise from January 1, 2020 to December 31, 2020 are selected as the detection data, a total of 365 data.
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection
9
5.1 Enterprise Energy Consumption Mode Division The historical energy consumption data of enterprises were normalized. The figure shows the distribution of the actual and normalized values of the enterprise’s historical energy consumption. It can be seen from the figure that the energy consumption is mainly distributed within the interval [0, 2.5], but there is an isolated point (the actual value is 20.45). If the traditional clustering algorithm is used for analysis, because the number of clusters is a hyperparameter, it needs to be artificially input, which leads to the optimal number of clusters of samples is not easy to determine and has strong randomness. Therefore, the GSA-SC fusion algorithm is considered to determine the optimal clustering number of energy consumption set (Fig. 3).
Fig. 3. Schematic diagram of historical energy consumption distribution of enterprises
Due to the shut-down and refurbishment during the data statistics period, the total energy consumption, coal consumption and power consumption of the enterprise are all 0 data, so this type of 0 data is ignored, and the preprocessed data is shown in Fig. 4.
Fig. 4. Data display after preprocessing
In order to verify the superiority of the GSA-SC algorithm presented in this paper, GSA and SC are respectively used for comparative analysis, and the results are shown in
10
X. Liu et al.
the table. As can be seen from the table, if only GSA method is used to judge, when K ≥ 3, the change trend of Gap(K) is not obvious, and all values are very similar, leading to the difficulty in determining the optimal cluster number. If only SC algorithm is used for analysis, when K is 2 and 3, its corresponding S value is close, and the optimal cluster number is not easy to determine. When the GSA-SC algorithm proposed in this paper is used, it is easy to determine that when K is 3, G value is the largest, and no neighboring point value is similar to it. Therefore, the optimal number of clusters was selected as 3 (Table 1). Table 1. Effect comparison of GSA-SC algorithm K
GSA
SC
GSA-SC
2
0.241
0.42
0.3305
3
0.302
0.43
0.366
4
0.305
0.35
0.3275
5
0.304
0.312
0.308
6
0.311
0.29
0.3
The binary K-means++ algorithm was used for cluster analysis to obtain different energy consumption patterns. The clustering results are shown in the figure, and the details of cluster classes are shown in the table. According to the information in the table, the energy consumption data are divided into three clusters: A, B and C. Before normalization, the standard interval of cluster A is [5.01 TCE, 10.54 TCE], the standard interval of cluster B is [11.9 TCE, 16.6 TCE], and the standard interval of cluster C is [4.56 TCE, 9.04 TCE].
Fig. 5. Clustering results of enterprise energy consumption data
Table 2 shows the data clustering results achieved by combining IBM SPSS Statistics software. As can be seen from the table, when the number of cluster classes is set to
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection
11
the optimal number of cluster classes 3, the final cluster center and the number of cases in each cluster are completely consistent with the situation in the figure. It can be seen from Table 2 combined with Fig. 5. Table 2. Number of cases in each cluster class Clusters of class
A
B
C
Numbers
93
67
158
5.2 Abnormal Energy Consumption Detection for Enterprises As shown in Fig. 6, the solid red line is the running trend of the energy consumption time series curve of enterprises. Figure 7 shows that the low-frequency recombination sequence obtained by CEEMDAN is relatively flat and has the ability to resist outliers, which can well represent the running trend of the energy consumption time series curve of enterprises.
Number of days/d
Fig. 6. Comparison of energy consumption data and operation trend of enterprises
Figure 7 for the primitive LOF detection under the temporal sequence k = 30 result diagram, as shown in Fig. 7, you can see the serial number 0–20 and other time data is identified as the discrete points, this is because the test results by this period enterprise regular changes in the energy use, lead to the results of the discriminant deviation, thus unable to accurately identify outlier data points. As shown in Fig. 8, compared with the original outlier detection, CEEMDAN-LOF outlier data detection method adopted in this paper takes into account the running trend of time series, and LOF values of outlier data and normal data are significantly different, which is convenient to distinguish group data from normal data. The detailed diagnostic results of the two algorithms are shown in Table 3. When k = 10, 20, 30, and 40, there are many false and missed values when the original sequence
12
X. Liu et al. Normal MiscalculaƟo Residual valu
6 5
LOF value
Outliers 4 3 2 1 The sequential
Fig. 7. Adopts LOF outlier data detection method
is used to detect the outlier data. However, there is no false or missed values when CEEMDAN is used to detect the outlier data after eliminating the sequence running trend. It further validates the effectiveness of the method used in this paper to detect abnormal data. Table 3. Detection results using LOF and CEEMDAN-LOF algorithms K
LOF Number of outliers
CEEMDAN-LOF The number of residual
Number of mistakes
Number of outliers
The number of residual
Number of mistakes
10
10
7
0
17
0
0
20
15
8
6
17
0
0
30
17
13
13
17
0
0
40
18
11
12
17
0
0
Fig. 8. CEEMDAN-LOF outlier data detection method
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection
13
6 Conclusion Driven by the presented data based on the energy consumption of big energy efficiency evaluation and the type of data mining enterprises can use anomaly detection method to solve the energy consumption of the anomaly detection problem under the huge amounts of data, and for different types of enterprise energy consumption data monitoring and puts forward the corresponding method of energy efficiency for the enterprise, the power grid to improve the management level.
References 1. Guo, Y., Zhang, X., Jiang, F., Zhang, J.: Research and management of water supply energy saving technology based on carbon peaking and carbon neutralization target. Water Supply Drainage 1–5 (2022). Accessed 28 Aug 2022 2. Zhang, B., Lan, C.: Research on the present situation and countermeasures of energyconservation and emission-reduction in China’s iron and steel industry. In: 2010 Asia-Pacific Power and Energy Engineering Conference, pp. 1–4 (2010) 3. Zhang, J., Lei, Y.: Research on the multi-objective optimized model of power generation dispatching based on low-carbon economy and energy-conservation. In: The 27th Chinese Control and Decision Conference (2015 CCDC), pp. 1651–1655 (2015) 4. Tian, Y., Yu, Z., Zhao, N., Zhu, Y., Xia, R.: Optimized operation of multiple energy interconnection network based on energy utilization rate and global energy consumption ratio. In: 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), pp. 1–6 (2018) 5. Liu, H., Li, Z., Cheng, Q., Yao, T., Yang, X., Hu, Y.: Construction of the evaluation index system of the regional integrated energy system compatible with the hierarchical structure of the energy Internet. In: 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2), pp. 342–348 (2020) 6. Jian, S., Li, W., Peng, X., Yan, Z., Cheng, C., Yuan, H.: Abnormal detection of power consumption based on a stacking ensemble model. In: 2021 4th International Conference on Energy, Electrical and Power Engineering (CEEPE), pp. 1021–1026 (2021) 7. Li, Z.: Abnormal energy consumption analysis based on big data mining technology. In: 2020 Asia Energy and Electrical Engineering Symposium (AEEES), pp. 64–68 (2020) 8. Han, Z., Zhang, C., Gao, M.: Hybrid energy storage planning for island-type integrated energy system based on EMD decomposition. Therm. Power Gener. 2022(09), 72–78 (2022). Accessed 28 Aug 2022 9. Arkhangelski, J., Abdou-Tankari, M., Lefebvre, G.: Efficient abnormal building consumption detection by deep learning LSTM IOT data classification. In: 2022 11th International Conference on Renewable Energy Research and Application (ICRERA), pp. 125–129 (2022) 10. Zhang, W., Dong, X., Li, H., Xu, J., Wang, D.: Unsupervised detection of abnormal electricity consumption behavior based on feature engineering. IEEE Access 8, 55483–55500 (2020)
Research on Data-Driven AGC Instruction Execution Effect Recognition Method Haiyang Jiang1 , Hongtong Liu2(B) , and Yangfei Zhang2 1 State Grid Heilongjiang Electric Power Company, Harbin 150090, Heilongjiang, China 2 School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167,
Jiangsu, China [email protected]
Abstract. With the high penetration of random energy such as wind power and photovoltaic in the power grid, the influence of the accuracy of regulation of traditional thermal power units on the operation of the power grid is gradually increasing. Aiming at the problem of the deviation between the actual output of thermal power units and the AGC command of the power grid, this paper proposes a data-driven AGC command execution effect identification method. Firstly, based on Kernel Principal Component Analysis (KPCA), a data preprocessing method is proposed, which maps feature datasets into low-dimensional vectors to achieve dimensionality reduction. Secondly, the Independent Recurrent Neural Network (IndRNN) is used to process and predict the dimensionality reduction data, so as to realize the accurate perception of the adjustment effect of the unit execution command. Finally, the real power grid data is used to simulate and verify the proposed method. The results show that the model can effectively reduce the deviation of instruction execution. Keywords: recurrent neural network · kernel principal component analysis · Long short-term memory network · gated recurrent unit
1 Introduction Automatic generation control (AGC) has become increasingly mature and widely used after years of research [1, 2], in recent years, the installed capacity of renewable energy, mainly wind energy and photovoltaic energy, has increased year by year, due to the uncertainty of renewable energy output, it is necessary for the generator to undertake heavy output regulation task for a long time, however, the regulating capacity of thermal power units is far from meeting the demand of frequency modulation [3]. Therefore, it is urgent to improve the regulation capacity of thermal power units to deal with the problems caused by new energy grid connection. In the field of active power scheduling under the background of new energy generation, it is difficult to improve the regulation ability of thermal power units by traditional frequency modulation strategy optimization. However, with the rise of intelligent algorithms in recent years, more and more scholars use intelligent algorithms to improve the regulation ability of thermal power © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 14–24, 2023. https://doi.org/10.1007/978-981-99-3300-6_2
Research on Data-Driven AGC Instruction Execution Effect Recognition Method
15
units. Literature [4] proposed a multi-model predictive function control strategy and applied it to the design of power regulation control system to achieve better primary frequency modulation performance of thermal power units. Literature [5] proposed a deep fusion learning model based on convolutional neural network and fully connected neural network for short-term wind power generation prediction. In addition, in recent years, the regulation technology based on artificial intelligence technology has been gradually paid attention to by researchers. Literature [6] proposed an online decision method for unit recovery based on deep learning and Monte Carlo search and obtained an online recovery scheme with high robustness. Literature [7] proposed a self-learning decision-making method for unit combination based on longterm deep learning network. Compared with traditional methods, the decision accuracy and efficiency are improved, and the decision results are more robust. The strategies proposed in the above literatures are still based on the premise that thermal power units can accurately respond to AGC instructions, and they cannot overcome the problems caused by low response accuracy of thermal power units. To solve the above problems, this paper firstly uses deep recurrent neural network (DRNN) to predict and perceive the accuracy of AGC control command execution. Secondly, a preprocessing strategy for the input parameters of the deep network model is proposed to improve the overall training speed and output accuracy of the model. Finally, the real unit operation data in a provincial power grid is used for simulation analysis, and the advantages of the proposed preprocessing strategy in model training and convergence are verified. It is proved that the prediction model proposed in this paper can accurately predict the unit AGC command.
2 Independent Recurrent Neural Network 2.1 Recurrent Neural Network Recurrent neural network [8] (RNN) is a commonly used model for predicting time series. Its most important feature is that the state of cyclic neurons at a certain time not only depends on the current input, but also can be affected by the input before the current time. Such characteristics make it very suitable for solving time series prediction problems. The RNN is not only fully connected from layer to layer, but also the output and input of the hidden layer are connected through a specific arithmetic unit, forming a feedback loop and forming a flip-flop like structure, which endows the RNN unit with the ability to remember the state of the previous cycle. Figure 1 shows a standard single-layer RNN network.
16
H. Jiang et al. Output layer
O
Hidden layer
S
Input layer
X
W
Fig. 1. Structure diagram of recurrent neural network
Where X represents the value of the Input layer, S represents the value of the Hidden layer, O represents the value of the Output layer, U represents the Input weight matrix represents the Output weight matrix, and W represents the cyclic weight matrix. To solve more complex problems, the RNN network can be superimposed to build a deep recurrent neural network (DRNN) model. The DRNN model has more nonlinear features and weight parameters and can fit more complex functions. Figure 2 shows the structure diagram of a DRNN with three hidden layers.
Fig. 2. Deep recurrent neural network
Research on Data-Driven AGC Instruction Execution Effect Recognition Method
17
2.2 Improvement of Deep Recurrent Neural Network To solve the problem of gradient disappearance or gradient explosion in DRNN network, an improvement is made on the basis of RNN, long short-term memory network [9] (LSTM) and gated recurrent unit [10] (GRU) are proposed. These two network structures improve the problem of gradient disappearance or gradient explosion to a certain extent. However, because tanh or sigmoid is used as the activation function, the gradient attenuation between network layers is caused, and it is difficult to construct and train RNN-based deep LSTM or GRU. The above problems can be solved effectively by using Independent Recurrent Neural Network (IndRNN). The status update formula of IndRNN is: ht = σ (Wxt + Uht−1 + bh )
(1)
Stands for Hadamard product, This is the biggest difference with traditional RNN networks. A neuron in the cell of IndRNN at each time is only related to the corresponding neuron in the cell of other time, and has nothing to do with other neurons. Therefore, the connections between neurons can be achieved by stacking multiple layers of IndRNN units, making the IndRNN expand to deeper layers. For the n neuron, the hidden state at time t can be calculated as follows: (2) hn,t = σ Wn xt + un hn,t−1 + bn IndRNN presents a new perspective of aggregating spatial patterns (via W ) independently over time steps (via U ). By stacking multiple layers of neurons, each neuron in the next layer independently processes the output of all neurons in the previous layer, which reduces the difficulty of building deep network structures and enhances the processing ability of long sequence problems.
3 KPCA Preprocessing Process The input preprocessing process based on KPCA is as follows: 1) The parameters of KPCA preprocessing model are adjusted. The Grid Search (GS) algorithm with k-fold cross-validation was used to Search for the optimal KPCA preprocessing model hyperparameters. 2) Data normalization stage. The feature variables were input into the model in the form of vectors, and the corresponding data sets of the feature variables were standardized to map the data within the range of 0–1. 3) Calculate eigenvalues and eigenvectors. Firstly, the preprocessed data is mapped to a high-dimensional space by constructing a kernel function. Secondly, the covariance moments are obtained by calculating the eigenvectors and eigenvalues of the covariance matrix in the high-dimensional space. Finally, the calculated eigenvalues are sorted.
18
H. Jiang et al.
4) Principal components are determined by calculating the contribution rate of eigenvalues. First, calculating the contribution rate of each characteristic value, the characteristics of the contribution rate is greater than the fixed threshold value in a collection of C, and calculate the collection C characteristic value in the cumulative contribution rate, and if the cumulative contribution rate is greater than the threshold, then determine the characteristic value from the set of feature vector of the principal component number, and will get the number of principal component dimension reduction after save to set U ; If the cumulative contribution rate is less than the threshold, adjust the parameters and repeat step (4). 5) Judge whether GS algorithm completes the traversal of parameters. If so, select the KPCA model with the optimal parameters according to the maximization of the principal component contribution rate, and output the set U of the optimal KPCA model. If not, return to step 3. Figure 3 is the flow chart of input preprocessing based on KPCA. Start
Determine the optimization parameters, value range and step size
K-fold cross validation and grid search algorithm search the traversal parameters to obtain the KPCA model
The preprocessed dataset is fed into the KPCA model
Establish kernel function
Calculate the contribution of eigenvalues
Set C is composed of eigenvalues whose contribution rate is greater than a given threshold
Adjusting Parameter Values
Calculate the cumulative contribution of eigenvalues
Cumulative contribution rate is greater than the threshold
No
Yes No
GS completes the parameter traversal
Yes Determine the pivot number and save the number of pivot entries into the set U According to the maximum principal cumulative contribution to construct the optimal parameters of KPCA model
Output the pivot set U of the optimal KPCA model
End
Fig. 3. Flow chart of input variable preprocessing based on KPCA
Research on Data-Driven AGC Instruction Execution Effect Recognition Method
19
4 Identification Process of AGC Instruction Execution Effect The identification process of AGC instruction execution effect based on KPCA preprocessing strategy and DIndRNN model is as follows: 1) Data preprocessing. The original data of unit operation were normalized. 2) Construct DIndRNN model. The preprocessed data set was used as input to construct DIndRNN model. 3) Train DIndRNN model. The preprocessed training data is input into the model, and the predicted value obtained is compared with the training label. The weight of DIndRNN model is updated through back propagation, and step (3) is repeated until the training of DIndRNN model is completed. 4) DIndRNN model deployment. The preprocessed test data set is input into the trained DIndRNN model, and the predicted response value of the unit to AGC instruction is output, so as to realize the identification of the effect of the unit executing AGC instruction (Fig. 4).
Start
Raw data of unit output sequence
Data were normalized
The original data was input to K-fold cross validation of the KPCA model constructed by grid search
The training dataset after dimensionality reduction
Construct DIndRNN model
Output predicted value Update the weight The predicted value is compared with the data label
No Complete the training Yes DIndRNN model deployment
Import the dimensionally reduced test dataset
Output predicted value
End
Fig. 4. Flowchart of AGC instruction execution effect identification algorithm
20
H. Jiang et al.
5 Example Analysis 5.1 Basic Information of the Example To verify the validity and rationality of the model and control method proposed in this paper, real data of power grid are introduced as the analysis object. The data time span is half a year, the sampling frequency is 1 min, and the original data is 262,080 pieces. The models used in the examples are built by Tensorflow2.0, a deep learning framework owned by Google. The language is python3.7.6; The computer used is configured with Intel(R)Core(TM)[email protected] GHz and 16G memory. 5.2 Comparative Analysis of Training Set and Validation Set Randomly selected 80 predicted points, the prediction accuracy of DIndRNN model on training set and validation set is compared, and the single-step prediction result is shown in Fig. 5. It can be clearly seen from the figure that, in both the training set and the validation set, DIndRNN model can better give the output prediction results of the unit under AGC command, which reflects the good generalization ability of DIndRNN model, that is, it can also have a good effect on the data not in the training set.
Fig. 5. Comparative analysis of prediction results of training set and validation set
5.3 Comparative Analysis of DIndRNN Model and MLP Model Figure 6(a) shows the prediction results of DIndRNN model and MLP model. The red broken line represents the DIndRNN model prediction result, the blue broken line represents the MLP model prediction result, and the green broken line represents the real unit output value. MLP model presents a large error in predicting unit output sequence, while DIndRNN model has a higher prediction accuracy. This is because the MLP model cannot store sequence information and is difficult to process sequence data. DIndRNN has the ability of memory and has a good effect on sequence prediction.
Research on Data-Driven AGC Instruction Execution Effect Recognition Method
21
(a)Comparison between DIndRNN model and MLP model
(b) Compared with the MLP model prediction error distribution DIndRNN model
Fig. 6. DIndRNN model and MLP model analysis
Figure 6(b) shows the prediction error distribution of DIndRNN model and MLP model in the form of histogram. The horizontal coordinate of the histogram is each interval of the error distribution, and the vertical coordinate is the proportion of the model prediction error within the interval. The curve in the figure is the kernel density fitting curve. The orange histogram represents the error distribution of the DIndRNN model, while the blue histogram represents the error distribution of the MLP model. As can be seen from Fig. 6(b), the error distribution of DIndRNN model is far more concentrated than that of MLP model. This also shows from another aspect that DIndRNN model has a smaller error value (Table 1).
22
H. Jiang et al. Table 1. DIndRNN model and MLP model evaluation index
Evaluation index
DIndRNN
MLP
MAE
3.4211
8.5662
MSE
46.3561
132.1398
5.4 Influence of Preprocessing Strategy on Model Prediction Accuracy 1) Comparison of different dimensionality reduction models. In order to verify the effectiveness of the KPCA dimensionality reduction model, the KPCA dimensionality reduction model, local linear embedding (LLE) and principal component analysis (PCA) models are compared and analyzed in this section. Different dimensionality reduction models are used to process the original data, and DIndRNN model is used to output the data after processing. The evaluation indexes of the prediction results are shown in Table 2. Table 2. Different pretreatment strategy prediction precision quantitative contrast The evaluation index
KPCA
LLE
PCA
MAE
7.3239
16.3661
138.9642
MSE
110.2366
588.6422
36142.6866
As can be seen from the table, the linear dimensionality reduction method PCA has been unable to effectively process the data set in this paper, and too much effective information has been lost in the dimensionality reduction process, leading to the failure of the model to give prediction results. The nonlinear dimensionality reduction methods KPCA and LLE perform much better. Among them, KPCA model is better than LLE model. 2) Analysis of model prediction accuracy based on dimensionality reduction. Table 3 shows the prediction accuracy of DIndRNN model under different dimensionality reduction dimensions. Table 3. Comparison of model training errors in different dimensions The evaluation index
3d
4d
5d
MAE
11.8624
7.3239
7.6322
MSE
368.2402
110.2366
14.6988
Research on Data-Driven AGC Instruction Execution Effect Recognition Method
23
It can be seen from Table 3 that the prediction accuracy is better in 4d than in 3d. However, when the dimension changes from 4d to 5d, the prediction accuracy drops slightly. Dimension 4 is the optimal parameter. 3) The influence of preprocessing on the model training process. Figure 7 shows the error reduction of DIndRNN model in the training process whether KPCA dimensionality reduction preprocessing process is adopted.
Fig. 7. Diagram of the effect of preprocessing on model training
It can be seen from the orange broken line that the error decreases rapidly in the first five iterations, and the rate of decline is faster than that of the blue broken line. In addition, after 30 iterations, both kinds of broken lines tend to be flat. Compared with the blue curve, the error value of the yellow curve is smaller, which indicates that the KPCA dimension reduction strategy effectively simplifies the model.
6 Conclusion In this paper, an AGC instruction identification method based on independent recurrent neural network is proposed. The main conclusions are as follows: 1) The KPCA dimension reduction model constructed in this paper can effectively eliminate the factors that have little correlation with unit output. 2) Compared with the traditional recurrent neural network model, the DIndRNN prediction model constructed in this paper has better prediction accuracy. 3) Using KPCA preprocessing strategy and DIndRNN model, the dimensionality of data can be reduced under the premise of retaining as much effective information as possible, shorten the training time of the model, and improve the prediction accuracy.
24
H. Jiang et al.
References 1. He, C., Wang, H., Wei, Z., et al.: Distributed collaborative real-time control of wind farm and AGC unit. Acta Electr. Eng. Sin. 35(02), 302–309 (2015) 2. Liao, X., Liu, K., Le, J., et al.: Coordinated control strategy for cross regional AGC units based on bi level model predictive structure. Acta Electr. Eng. Sin. 39(16), 4674–4685+4970 (2019) 3. Lin, L., Zou, L., Zhou, P., Tian, X.: Multi-angle economic analysis on deep peak regulation of thermal power units with large-scale wind power integration. Autom. Electr. Power Syst. 41(07), 21–27 (2017) 4. Shu, J., et al.: Primary frequency modulation control strategy in deep peak load cycling state for thermal power units based on multi-model predictive control. Boiler Technol. 51(06), 12–17 (2020) 5. Hossain, M.A., Chakrabortty, R.K., Elsawah, S., Gray, E.M., Ryan, M.J.: Predicting wind power generation using hybrid deep learning with optimization. IEEE Trans. Appl. Supercond. 31(8), 1–5 (2021) 6. Sun, R., Liu, Y.: Online decision making of unit recovery based on deep learning and Monte Carlo tree search. Power Syst. Autom. 42(14), 40–47 (2018) 7. Yang, N., Ye, D., Lin, J., et al.: Research on intelligent decision-making method of unit commitment with self-learning ability based on data-driven. Chin. J. Electr. Eng. 39(10), 2934–2945 (2019) 8. Zhao, H., Xue, W., Li, X., et al.: Multi-mode neural network for human action recognition. IET Comput. Vis. 14(8), 587–596 (2020) 9. Xuejun, Y., Yiqun, Z., Shuxian, S., et al.: LSTM network for carrier module detection data classification. In: 2019 International Conference on Smart Grid and Electrical Automation (ICSGEA), Xiangtan, China (2019) 10. Dong, M., Grumbach, L.: A hybrid distribution feeder long-term load forecasting method based on sequence prediction. IEEE Trans. Smart Grid 11(1), 470–482
Research on Day-Ahead Scheduling Strategy of the Power System Includes Wind Power Plants and Photovoltaic Power Stations Based on Big Data Clustering and Filling Feng Qi1 , Qiang Wang2 , Xiaoqiang Wei1 , Yiping Zhang1 , Wenbin Wu1 , Donyang He1 , Taicheng Wang3(B) , and Suyang Shen3 1 State Grid Heilongjiang Electric Power Co., Ltd., Harbin 150090, Heilongjiang, China 2 Jixi Power Supply Company of State Grid Heilongjiang Electric Power Co., Ltd., Jixi 158100,
China 3 School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167,
Jiangsu, China [email protected]
Abstract. Traditional power system scheduling optimization methods cannot fully deal with the massive data brought by the increase of new energy penetration. Aiming at the above problems, this paper proposes a day-ahead scheduling optimization method based on big data clustering and filling for power system includes wind power plants and photovoltaic power stations. Firstly, according to the collected historical data of power grid, the K-means clustering method of big data is used to generate representative load, wind power and solar illumination scene sets. Secondly, a missing value filling method based on historical data assisted scene analysis was proposed. Finally, the day-ahead scheduling model of power system with wind power plants and photovoltaic power stations is established and solved by improved particle swarm optimization algorithm. The simulation results show that this method can improve the filling accuracy of the missing value of the dayahead dispatching historical data of the power system, and meet the development demand of the power system with new energy. Keywords: Data filling · Power system includes new energy · Scene analysis · Wave cross correlation analysis · Reactive power optimization
1 The Introduction With the continuous improvement of power grid automation and the wide application of power grid data intelligent acquisition system, the power grid data collection system is becoming mature. However, the lack of frequency and accuracy in the collection process of power grid data inevitably leads to some missing values in the data, so as to interfere with the process of data analysis and affect the final identification effect of the model. Therefore, how to effectively fill the missing value of power grid data has gradually become a difficult problem to be solved. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 25–36, 2023. https://doi.org/10.1007/978-981-99-3300-6_3
26
F. Qi et al.
In recent years, the research on big data [1] has been upsurge around the world. Big data technology has injected fresh blood into the development of smart grid and achieved good results [2]. Literature [3] proposes a medium and long-term load forecasting method for power system based on big data clustering. The calculation results show that the big data clustering algorithm can effectively cluster and partition a large number of load data, realize the discrimination and prediction of loads with different growth characteristics, and has high prediction accuracy. Literature [4] proposes complete compatibility classes based on the incomplete data fill algorithm, the energy consumption of large data management model, to help green data center, effective management of solar energy and other resources to match the utility to provide stable and adequate power supply, thus providing efficient service system for the whole data center energy services. Literature [5] proposes a missing data filling algorithm for the unified data model of panoramic regulation, which uses the improved Markov Monte Carlo method to estimate the missing data according to the known data, and obtains the best parameters of incomplete regulatory data through fewer iterations, thus improving the accuracy of missing data estimation. This paper proposes a day-ahead scheduling optimization strategy of power system based on big data clustering and filling. Firstly, the K-means clustering method of big data is used to generate representative load, wind power and solar illumination scene sets. Then, a missing value filling method of power grid data based on multi-parameter IOT fusion technology is proposed. The fluctuation cross-correlation analysis is carried out according to the historical data of power grid. By combining the weights and the dynamic time warping distance to measure a certain date missing attribute data is similar to the date of the known data integrated similarity to find out the attribute the date of the highest similarity, and the date at the same time to missing data combined with the linear fitting curve data attribute data fill further improved the precision of data missing value fill. Finally, the day-ahead scheduling model of the power system includes wind power plant and photovoltaic power station is established, and the improved particle swarm optimization algorithm is used to solve the problem. The simulation results show that the data after clustering can meet the requirements of data integrity and accuracy including day-ahead scheduling optimization of system with wind power plants and photovoltaic power stations.
2 K-Means Clustering Method for Massive Power Grid Data The process of grouping a collection of physical or abstract objects into multiple classes consisting of similar objects is called clustering. A cluster generated by clustering is a collection of data objects that are similar to objects in the same cluster but different from objects in other clusters. Dissimilarity is calculated according to the membership value of the description object, and distance is often used as a measure. With the increasing perfection of the power grid information collection system, a large amount of network data can be collected. Therefore, the selected clustering method is required not only to be able to cluster quickly and accurately when the amount of information is not large, but also to adapt to the demand of increasing data volume and to output the clustering results timely and quickly. Because k-means algorithm can deal
Research on Day-Ahead Scheduling Strategy of the Power System
27
with large data sets, has good scalability and high efficiency, is simple and fast, can adapt to the demand of real-time processing of data volume growth, and is widely used in large-scale data clustering, so K-means algorithm is selected to cluster samples in this paper. The steps of classical k-means algorithm are as follows: 1) Determine the number of clusters N and the maximum number of iterations M. 2) Choose any N objects in the sample as the initial clustering center. 3) Calculate the Euclidean distance between the samples and N objects, and classify the samples according to the size of the Euclidean distance. The Euclidean distance is defined as: p (xik − xjk )2 (1) dij = k=1
In the formula: xik represents the observed value of the ith variable of the kth sample; p represents the number of samples; dij Represents the Euclidean distance between sample j and sample i. 4) Calculate the average value of various objects and update the clustering center. 5) Calculate the square error criterion function to determine whether convergence conditions are met. If convergence occurs, the algorithm ends. If it does not converge, judge whether the number of iterations is greater than M. if it is less than M, go to the third step, otherwise, the algorithm ends. The squared error criterion function is: E=
k
(xq − mi )2 , mi =
i=1 xq ∈Ci
xq nCi
(2)
xq ∈Ci
6) Output clustering results The validity of clustering can be from two aspects of the distance between the class and class distance measure, class distance means that the condensation degree of similar samples, the distance between the class means that between different classes of separating degree, thus a good clustering results, should meet the distance between the class, class distance is small, so the greater the similarity of the same class, the greater the difference of the sample in different classes. Contour coefficient is an index that comprehensively reflects the similarity and difference between classes, so it can be used to evaluate the quality of clustering and determine the reasonable number of clusters. The contour coefficient S is defined as: S=
n si i=1
n
(3)
In the formula, n represents the total number of samples; si represents the contour coefficient of sample i, its definition is: si =
yi − xi max{xi , yi }
(4)
28
F. Qi et al.
In the formula, xi represents the average distance between the ith sample in class x and other samples in class x, and represents the degree of cohesion in the class; Calculate the average of distances between xi and samples in all classes except x, and denote yi as the minimum value of this average. Obviously, the larger the value, the higher the clustering quality.
3 Missing Data Processing Strategy Based on Historical Data Assisted Scene Analysis In the field of day-ahead reactive power optimization of new energy grids, the accuracy and completeness of data have a great impact on the line loss analysis results. However, with the exponential growth of collected data, the problem of missing voltage data caused by manual input and acquisition device failure occurs frequently, so the missing data need to be identified or completed. The traditional maximum expected value algorithm [6], K-proximity algorithm [7] and other methods provide solutions, but the filling effect is not ideal because historical data are rarely used as the analysis basis. In recent years, the world raised a hot wave of big data research, big data technology for smart grid development injected fresh blood, and good results were obtained, therefore this section presents a missing value based on historical data aided scenario analysis to fill method, further improve the missing value in the line loss analysis fill precision at the same time, meet the demand of power grid development, The overall process is shown in Fig. 1: The specific steps are as follows: Step 1: Obtain the historical data of the power grid and go to Step 2. Step 2: The fluctuation cross-correlation between the known attribute data and the missing attribute data at the same time was calculated by the fluctuation cross-correlation analysis algorithm, and then Step 3 was performed. The calculation steps are as follows: 1) Certain two time series of equal length xi and yi , i = 1, 2, · · · , N 2) Calculate the sum of differences between xi , yi and the average value: ⎧ l
⎪ ⎪ (xi − x), l = 1, 2, · · · , N ⎨ x(l) = i=1
l
⎪ ⎪ ⎩ y(l) = (yi − y), l = 1, 2, · · · , N
(5)
i=1
In the formula, l represents the sampling length, x(l) and y(l) respectively represent the sum of differences between xi , yi and the average value at sampling length l, x and y respectively represent the average value of xi and yi . 3) Calculate the forward difference representing xi and yi autocorrelation respectively: x(l, l0 ) = x(l0 + l) − x(l0 ), l0 = 1, 2, · · · , N − l (6) y(l, l0 ) = y(l0 + l) − y(l0 ), l0 = 1, 2, · · · , N − l
Research on Day-Ahead Scheduling Strategy of the Power System
29
Start
Obtaining grid data Analyze data attributes, get M properties and set N=1 Input attribute data Calculate the wave cross correlation
N
Whether the threshold is exceeded
N=N+1
Y Discarded
N 0, it means xi and yi are positively correlated; When hxy < 0, it means that xi and yi are negatively correlated; A higher value of hxy indicates a higher degree of correlation between xi and yi . Step 3: If the fluctuation correlation between a known attribute data and the missing attribute data exceeds the comparison threshold, the known attribute data is retained and the missing attribute data goes to Step 4. Otherwise, the known attribute data is discarded. Step 4: The attributes corresponding to the M known attribute data retained are called Know attributes, and the attributes corresponding to the missing attribute data are called Unknow attributes. The combined weights of Know attributes and Unknow attributes are calculated respectively. In this step, the combined weight wj of Know attribute j and Unknow attributes are calculated by the following formula: ⎧ cj wj =
⎪ M ⎪ ⎪ ⎨ cj j=1 (9) M
⎪ ⎪ ⎪ wj = 1 ⎩ j=1
In the formula: M represents the number of Know attributes j = 1, 2, . . . , M , cj represents the fluctuation correlation coefficient between Know attribute j and Unknow attributes. Step 5: The scene analysis was carried out on the date containing Unknow attributes, and the H most similar scenes were found after clustering historical data of the power grid. The date with attribute Unknow is called the missing date, and The H most similar scenes found are called H similar scenes. Step 6: Firstly, the time period of the missing attribute data in the missing date is determined. Then, for the same time period of each similar scene, the similarity between each Know attribute data of the missing date and each Know attribute data of each similar scene is measured by the dynamic time-bending distance. The specific steps are as follows: 1) Set the occurrence time of missing attribute data as time tn , select n time points backward at time tn , select n time points forward at time tn , and finally form the time period (tn , t2n ) of missing attribute data in the missing date, including 2n + 1 time points t0 , t1 , . . . , t2n ; Let M Know attributes retained after comparison threshold judgment and screening be denoted as A1 , A2 , . . . , AM , Unknow attribute be denoted as A0 ;
Research on Day-Ahead Scheduling Strategy of the Power System
31
2) The Know attribute data A1 , A2 , . . . , AM at time t0 , t1 , . . . , t2n of the hth similar 2n
scene are denoted as D(1,h) , D(2,h) , . . . , D(M ,h) respectively. D(j,h) = d(j,h,g) , g=0
d(j,h,g) denotes the attribute data of Know attribute j at time tg in the hth similar scene, j = 1, 2, . . . , M , h = 1, 2, . . . , H , g = 1, 2, . . . , 2n; 3) The dynamic time-bending distance is used to measure the similarity S(j,h) between the Know attribute data Aj at time t0 , t1 , . . . , t2n in the hth similar scene and the attribute data D(j,h) at time t0 , t1 , . . . , t2n in the missing date. p represents the missing date. Step 7: Combined with the combined weights of each Know attribute and Unknow attribute, the comprehensive similarity of Unknow attribute of each similar scene is calculated using the following formula: Ch =
H M
wj × S(j,h)
(10)
j=1 h=1
In the formula, Ch represents the comprehensive similarity of Unknow attribute in the hth similar scene. Step 8: After finding N scenes whose comprehensive similarity of Unknow attribute is within the threshold, the data T1 of the Unknow attribute at the time tn of the scene is extracted from the ith scene as the longitudinal filling data. At the same time, the linear fitting of the curve for the Unknow attribute was used to find out the data T2 at time tn of this scene as the transverse filling data, and the filling value of the missing attribute data in this scene was solved: Ti = α × T1 + β × T2 (11) α+β =1 In the formula, time tn is the occurrence time of missing attribute data, α is the weight of T1 , and β is the weight of T2 . Call the probability of each scenario pi , the final imputation value of missing attribute data is obtained by multiplying the imputation value of each scene by its scene probability: T= pi Ti
4 Construction of Power System Optimization Model Including Wind Power Plant and Optical Power Station 4.1 Doubly-Fed Asynchronous Fan The relationship between DFIG active power PDFIG output and wind speed v [8] is as follows: ⎧ v ≤ vi , v ≥ vo ⎨ 0, PDFIG = k1 v + k2 , vi < v ≤ ve (12) ⎩ Pe , vo > v > ve
32
F. Qi et al.
In the formula, Pe is DFIG rated power; vi is the cut in wind speed; ve is the rated wind speed; vo is the cut out wind speed; k1 and k2 are the parameter of the fan power generation system. The relation expression of active and reactive power output in DFIG is as follows: ⎧ 2 ⎪ 2 ⎨ PDFIG + Q2 DFIG = (3Us Is ) 1−s (13) 2 2 2 2 ⎪ Xm ⎩ PDFIG + QDFIG + 3 Us = 3 U I s r 1−s Xs Xs In the formula: s is the slip rate; Us is the stator side voltage; Is is the stator winding current; Xs is the stator leakage reactance; Xm is excitation reactance; Ir is the rotor side converter current. 4.2 Photovoltaic Power Array The active power output of PV array shall meet the following equation: pPV = EηA
(14)
In the formula, η - photoelectric conversion efficiency; A - Effective light area. Reactive power output characteristics of PV array shall meet the following equation: 2 − P2 |QPV |max = SPV (15) PV In the formula: |QPV |max - maximum reactive power regulation capacity of PV; SPV grid-connected inverter capacity of PV system; PPV - Active power output of PV system. 4.3 The Objective Function The objective is to minimize the active power loss, and its objective function is as follows: min y = Ploss =
Nl
Gk (Ui2 + Uj2 − 2Ui Uj cos θij )
(16)
k=1
In the formula, Ploss is the active power loss of the system; Ui and Uj are node i and node j voltage amplitudes respectively; Nl is the number of system branches; Gk is the conductance of the branch; θij is the voltage phase Angle difference between node i and node j.
Research on Day-Ahead Scheduling Strategy of the Power System
33
4.4 The Constraint 1) System power flow constraint ⎧
⎪ Uk (Gik cos δik + Bik sin δik ) ⎨ PLi − PDGi = Ui k∈i
⎪ Uk (Gik sin δik − Bik cos δik ) ⎩ QLi − QDGi = Ui
(17)
k∈i
In the formula, node k is all the nodes connected with the node; PLi , QLi are the active and reactive power injected into node i; PDGi , QDGi are the active and reactive power of DG or battery injected into node i; δik is the phase Angle difference of nodes at both ends of branch ik; Gik , Bik are the real and imaginary parts of admittance on branch ik; Ui , Uk are the voltage amplitudes of node I and node k respectively. 2) Node voltage constraint Ui min ≤ Ui ≤ Ui max i = 1, 2, . . . , n
(18)
In the formula, n is the number of system nodes; Ui , Ui max , Ui min are respectively the node voltage of the node and its allowed maximum and minimum values. 3) Constraints on DG operation conditions QDGi min ≤ QDGi ≤ QDGi max
(19)
In the formula, QDGi , QDGi max , QDGi min are the reactive power output of DG of station I (group) and its allowable maximum and minimum value respectively.
5 Example Analysis 5.1 The Example Described In this paper, an improved IEEE14 node power system simulation analysis is adopted, as shown in Fig. 2. Node 9 is connected to a large-scale wind power plant, node 5 is connected to a centralized photovoltaic power station, and node 2 and node 10 are connected to 10 sets of switchable capacitor banks with a reactive power capacity of 200 kVar each. It is assumed that the transformer between branch 5–6 and branch 4– 9 is a 17-level transformer with load adjusting sub, and the improved particle swarm optimization algorithm is used to optimize the solution [9]. The voltage reference value of the power system is 220 kV, and the capacity reference value is 100 MVA. The fan inlet wind speed, rated wind speed and cut out wind speed are 4 m/s, 12 m/s and 25 m/s, respectively, with a rated capacity of 8 MW. The effective illumination area of the photovoltaic power station is 30000 m2 , the conversion efficiency is 0.9, and the rated capacity is 8 MW. The allowable range of the node voltage is ±6% of the rated voltage; The population of the improved PSO algorithm is 50, the number of iterations is 150, the GAMA parameter is 0.95, and the learning factor is C1 = C2 = 0.5.
34
F. Qi et al.
Fig. 2. Improved IEEE14 node system
5.2 Analysis of Optimization Results An electrical network of typical years of historical data as fill the database, and assumes that one of the typical day has the problem of the missing data, will fill it as object, the method proposed in this paper after data packing on that day, part of the extract containing fill data into the improved IEEE14 node in the system, and a scheduling optimization. The optimization results of control variables are shown in Fig. 3:
Fig. 3. Optimization results of control variables
The result of active power loss of the whole system for a whole day under various conditions is shown in the Table 1.
Research on Day-Ahead Scheduling Strategy of the Power System
35
Table 1. The simulation results contrast The layer number
Situation name
Value of loss (kW)
The error rate from the true value
1
The loss of the original system
345.77
5.01%
2
The loss of the system before reactive power optimization with real data
341.62
3.75%
3
The loss of the system after reactive power optimization with real data
329.28
0%
4
The loss after clustering and filling in out possible scenarios with big data
327.53
0.532%
Standard system is the first case IEEE14 node in the table of a scheduling cycle loss, the second is in the system after modification is not a scheduling a scheduling cycle loss, and the third is the transformation system has a scheduling cycle loss after scheduling optimization, the fourth case is to use the clustered scene data to fill in the missing data, and carry out day-ahead scheduling optimization to optimize the loss of a scheduling cycle. According to the data in the table, the method proposed in this paper is used to fill the grid data, and then the grid with the data filled is optimized for day-ahead scheduling. The results obtained are close to accurate data, which can meet the requirements of data integrity and accuracy for day-ahead scheduling optimization of new energy grid. This paper proposes a day-ahead scheduling optimization method for power systems including wind power plants and photovoltaic power stations based on big data clustering and filling. The main conclusions are as follows: 1) The proposed missing value filling method based on historical data assisted scenario analysis can improve the filling accuracy of the missing value of the historical data of the day-ahead scheduling of the power system. 2) The load, wind power and photoelectric scene sets obtained by K-means clustering method of big data can be accurately represented. 3) Establish day-ahead scheduling model of power system including wind power plants and photovoltaic power stations, and use improved particle swarm optimization algorithm to solve it. The simulation results show that this method can improve the filling accuracy of the missing value of the day-ahead dispatching historical data of the power system, and meet the development demand of the power grid with new energy.
36
F. Qi et al.
References 1. Li, G., Cheng, X.: Big Data Research: a major strategic field of future technology and economic and social development – research status and scientific thinking of big data. Proc. Chin. Acad. Sci. 27(06), 647–657 (2012) 2. Song, Y., Zhou, G., Zhu, Y.: Current status and challenges of smart grid big data processing technology. Power Grid Technol. 37(04), 927–935 (2013) 3. Xu, Y., Cheng, X., Li, Y., Zhang, H., Yu, W., He, B.: Medium and long-term load forecasting of power system based on big data clustering. J. Electr. Power Syst. Autom. 29(08), 43–48 (2017) 4. Yuan, J., Zhong, L., Yang, G., Chen, M., Gu, J., Li, T.: Research on filling and classification algorithm of incomplete energy consuming big data in green data center. Chin. J. Comput. 38(12), 2499–2516 (2015) 5. Tang, L., Wang, R., Wu, R., Fan, B.: Missing data filling algorithm for panoramic regulation unified data model. Autom. Electr. Power Syst. 41(01), 25–30+87 (2017) 6. Mallick, P., Chen, Z., Zamani, M.: Reinforcement learning using expectation maximization based guided policy search for stochastic dynamics. Neurocomputing 484, 79–88 (2017) 7. Kim, Y.K., Kim, H.J., Lee, H., Chang, J.W.: Privacy-preserving parallel kNN classification algorithm using index-based filtering in cloud computing. PLoS ONE 17(5), e0267908 (2017) 8. Lang, Y., Zhang, X., Xu, D., Ma, H., Hadianmrei, S.R.: Reactive power analysis and control strategy for doubly-fed wind farm. Proc. CSEE 2007(09), 77–82 (2007) 9. Hematpour, N., Ahadpour, S.: Execution examination of chaotic S-box dependent on improved PSO algorithm. Neural Comput. Appl. 33(10), 5111–5133 (2020)
Day-Ahead Scenario Generation Method for Renewable Energy Based on Historical Data Analysis Hong Wang1 , Yang Liu1 , Chao Yin1 , Xuesong Bai1 , Bo Sun1 , Menglei Li2(B) , and Xiyong Yang2 1 State Grid Heilongjiang Electric Power Company, Harbin 150090, Heilongjiang, China 2 Nanjing Institute of Technology, Nanjing 211167, Jiangsu, China
[email protected]
Abstract. With the massive application of new energy, the contradiction of power grid regulation has become increasingly prominent. How to effectively predict the range of the power grid is a huge challenge faced by the day-ahead dispatch of power system. Aiming at this problem, this paper proposes a method for generating day-ahead scenarios for renewable energy based on historical data analysis. First, the deep embedding clustering (DEC) algorithm is used to analyze historical data, and periods with similar characteristics are divided into one group. Then the conditional deep convolutions generative adversarial network (C-DCGAN) model generates a day-ahead scenario set for renewable energy. At last, the Belgian Elia renewable energy data is used for simulation analysis, and the results show that the proposed method can accurately describe the uncertainty of renewable energy. Keywords: Renewable energy · Cluster analysis · Conditional deep convolutional generative adversarial networks · Day-ahead scenario set
1 The Introduction Global energy is undergoing a deep transformation with renewable energy becoming the focus of energy development [3]. However, due to the uncertainty of renewable energy, it brings huge challenges to the power system [4]. How to accurately describe the uncertainty is one of the key issues to overcome these challenges. The stochastic optimization method uses the scenario analysis method to deal with the uncertainty of renewable energy [1]. Scholars has used statistical methods to generate renewable energy scenarios: Class method, time series simulation method, Markov chain, scene tree generation method, autoregressive moving average model, etc. [2, 3]. The output of renewable energy is easily affected by various factors such as geographical environment, climate and weather. In recent years, with the rapid development of artificial intelligence, deep learning algorithms have become popular. Compared with statistical methods, renewable power output scenarios generated by deep learning models can accurately describe the uncertainty of renewable energy due to excavating multi-feature and high-dimensional raw © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 37–46, 2023. https://doi.org/10.1007/978-981-99-3300-6_4
38
H. Wang et al.
data. Main deep learning models include: variational auto-encoder (VAE) [4], deep belief network (DBN) [5], generative adversarial networks (GAN) [6]. In recent years, a large number of scholars have studied the method of new energy output scenarios based on unsupervised learning models. Reference [7] proposed a method for new energy output scenarios based on variational autoencoders. Using variational autoencoders to extract non-Euclidean correlation features and temporal correlation features and effectively describe the correlation between multi-source scenarios. Reference [8] uses GAN to find the time-space correlation features of new energy output and introduce Wasserstein distance to improve the training quality of the model. Reference [9] proposes a renewable energy day-ahead scenario generation method based on a conditional generative adversarial network (CGAN), learning the relation between the noise distribution and the day-ahead scenarios through the training of the adversarial network. Compared with the Markov chain scenario generation method, it can describe the uncertainty of wind power more accurately. Although a series of research has achieved a lot, there are few studies on the generation methods of day-ahead scenarios with new energy access. Aiming at such problems, this paper proposes a method for generating day-ahead scenarios for renewable energy based on historical data analysis. First, the deep embedding clustering (DEC) algorithm is used to cluster and analyze historical data, and time periods with similar characteristics are collected together; second, the conditional deep convolutional generative adversarial network (C-DCGAN) model generates a set of renewable energy day-ahead scenarios; finally, the simulation analysis is carried out with the Belgian Elia renewable energy data to verify the effect of the method in this paper. The example results show that the method proposed in this paper can more accurately describe the uncertainty of renewable energy.
2 General Framework of Day-Ahead Scenario Generation Method for Renewable Energy Figure 1 shows the framework of the day-ahead prediction method under the high proportion of renewable energy, which is mainly divided into three steps: Step 1: Collect historical data and use autoencoder to get low-dimensional features; use K-means clustering to get initial clustering results; calculate rework loss and clustering loss function and use the K-means algorithm to iterate repeatedly until the clustering result is within the set deviation range. Step 2: Establish a C-DCGAN model for each cluster; collect historical prediction data and actual data, input into C-DCGAN for training. After training, the model can learn the relationship between the noise and the actual output. Step 3: Obtain the prediction value of the new energy output, judge its feature and select the C-DCGAN model, input the prediction value, and day-ahead scenario is generated multiple times to form a scenario set; calculate the prediction interval.
K-means clustering
Optimizer Tuning Parameters
Calculate the loss function
decoder
Training network
C-DCGAN1
C-DCGAN2
...
Generate day-ahead scenario set
Choose C-CDGAN model type n Prediction value
...
Actual value
Prediction value
type2
Actual value
Prediction value
type1
Generated scenario evaluation
No
Iterative error