Big Data and Security: 4th International Conference, ICBDS 2022, Xiamen, China, December 8–12, 2022, Proceedings (Communications in Computer and Information Science, 1796) 9819932998, 9789819932993

This book constitutes the refereed proceedings of the 4th International Conference on Big Data and Security, ICBDS 2022,

131 84 70MB

English Pages 767 [759] Year 2023

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Preface
Organization
Contents
Big Data and New Method
Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection of Multi-type Enterprises Based on Energy Consumption Big Data Mining
1 Introduction
2 Determination of the Optimal Number of Clusters
3 Energy Consumption Pattern Recognition Based on K-Means Clustering
4 Abnormal Analysis of Energy Consumption Data
4.1 LOF Algorithm
4.2 CEEMDAN Algorithm
5 Example Simulation and Analysis
5.1 Enterprise Energy Consumption Mode Division
5.2 Abnormal Energy Consumption Detection for Enterprises
6 Conclusion
References
Research on Data-Driven AGC Instruction Execution Effect Recognition Method
1 Introduction
2 Independent Recurrent Neural Network
2.1 Recurrent Neural Network
2.2 Improvement of Deep Recurrent Neural Network
3 KPCA Preprocessing Process
4 Identification Process of AGC Instruction Execution Effect
5 Example Analysis
5.1 Basic Information of the Example
5.2 Comparative Analysis of Training Set and Validation Set
5.3 Comparative Analysis of DIndRNN Model and MLP Model
5.4 Influence of Preprocessing Strategy on Model Prediction Accuracy
6 Conclusion
References
Research on Day-Ahead Scheduling Strategy of the Power System Includes Wind Power Plants and Photovoltaic Power Stations Based on Big Data Clustering and Filling
1 The Introduction
2 K-Means Clustering Method for Massive Power Grid Data
3 Missing Data Processing Strategy Based on Historical Data Assisted Scene Analysis
4 Construction of Power System Optimization Model Including Wind Power Plant and Optical Power Station
4.1 Doubly-Fed Asynchronous Fan
4.2 Photovoltaic Power Array
4.3 The Objective Function
4.4 The Constraint
5 Example Analysis
5.1 The Example Described
5.2 Analysis of Optimization Results
References
Day-Ahead Scenario Generation Method for Renewable Energy Based on Historical Data Analysis
1 The Introduction
2 General Framework of Day-Ahead Scenario Generation Method for Renewable Energy
3 Cluster Analysis Based on Deep Embedding Clustering
3.1 Overall Framework of Deep Embedding Clustering Algorithm
3.2 Stacked Autoencoder
3.3 Initial Clustering of Low-Dimensional Features
3.4 Joint Optimization
4 Day-Ahead Scenario Generation Based on C-DCGAN
4.1 Conditional Deep Convolution GAN
4.2 Building Models Based on C-DCGAN
4.3 Training of the C-DCGAN Model
5 Determination and Evaluation of the Day-Ahead Value
5.1 Determination of the Day-Ahead Prediction Value
5.2 Evaluation of the Day-Ahead Prediction Value
6 Example Analysis
6.1 Day-Ahead Scenario Generation
6.2 Method Comparison
7 Conclusion
References
A High-Frequency Stock Price Prediction Method Based on Mode Decomposition and Deep Learning
1 Introduction
2 Related Work
3 Methodology
3.1 EMD-Based Price Series Decomposition
3.2 CNN-Based High-Dimensional Feature Extraction
3.3 GRU-Based Modeling and Price Prediction
4 Results Evaluation
4.1 Datasets and Evaluation Metrics
4.2 High-Frequency Price Series Decomposition
4.3 High-Frequency Stock Price Prediction Results
5 Discussion and Analysis
5.1 Comparative Analysis of Different Model Parameters
5.2 Comparative Analysis of Different Prediction Methods
6 Conclusion
References
A Cross-Platform Instant Messaging User Association Method Based on Supervised Learning
1 Introduction
2 Background and Related Work
3 Cross-Platform Instant Messaging User Association Based on Supervised Learning
3.1 Get User Information and Trajectory Information
3.2 Cross-Platform User Association Feature Selection
3.3 Classification Model Training and Testing
3.4 Cross-Platform User Association
4 Experiments
4.1 Experimental Dataset and Evaluation Criteria
4.2 Experimental Setup
4.3 Comparative Analysis of Experimental Results
5 Conclusion
References
Research on Data Watermark Tracing System in Hadoop Environment
1 Introduction
2 Data Tracing Model
2.1 Data Tracing Technology
2.2 Data Watermark Tracing Model
3 Hadoop Environment Architecture
4 Data Watermark Tracing System in Hadoop Environment
4.1 Data Watermarking Method
4.2 Data Watermark Tracing System
5 Validation of System Performance
5.1 Verification Parameters
5.2 Verification Results
6 Conclusion
References
Application of RFID Tag in the Localization of Power Cable Based on Big Data
1 Introduction
2 UHF RFID Technology
3 Positioning Algorithm of Faulty Cables Based on RFID
3.1 Deployment Plan for Positioning
3.2 The Algorithm for Fault Localization
4 Positioning Algorithm of Faulty Cables Based on RFID
4.1 Single-End Traveling Wave Ranging Principle
4.2 Single-End Traveling Wave Ranging Principle
5 Simulation Results
6 Conclusion
References
Research Hotspots and Evolution Trend of Virtual Power Plant in China: An Empirical Analysis Based on Big Data
1 Introduction
2 Related Work
2.1 Status Quo of Literature Big Data Mining
2.2 Research Status of VPP
3 Data Sources and Research Design
3.1 Data Collection
3.2 Building Knowledge Graphs Based on CiteSpace
4 Research Findings
4.1 VPP Area Research Heat Analysis
4.2 Hot Topics of VPP
4.3 Clustering Mapping Analysis of Keywords in the Field of VPP Research
4.4 Research Hotspots Evolutionary Trends
4.5 Journal Distribution
5 Conclusion and Outlook
5.1 Conclusion
5.2 Outlook
References
Influencing Factors Analysis and Prediction Model of Pavement Transverse Crack Based on Big Data
1 Introduction
2 Evaluation of Transverse Cracks in Asphalt Pavement with Semi-rigid Base
2.1 Evaluation Indicators
2.2 Evaluation Results
3 Influencing Factors of Transverse Cracks Evaluation Index
3.1 IF of Node Index
3.2 IF of Development Index
4 Fracture Energy Test and Correlation Analysis with TCS
5 TCS Prediction Model Construction
6 Conclusions
References
Research on the Evolution of New Energy Industry Financing Ecosystem Under the Background of Big Data
1 Introduction
2 Theoretical Basis
2.1 Big Data
2.2 New Energy Industry Financing Ecosystem in the Context of Big Data
3 Analysis of the Evolutionary Process of New Energy Industry Financing Ecosystem Based on Logistic Equation in the Context of Big Data
3.1 An Evolutionary Model of New Energy Industry Financing Ecosystem Based on Logistic Equation
3.2 The Evolutionary Process of New Energy Industry Financing Ecosystem Based on Logistic Equation
4 An Empirical Test of the Evolution of New Energy Industry Financing Ecosystem Based on Logistic Equation in the Context of Big Data
5 Co-Evolutionary Strategy of New Energy Industry Financing Ecosystem in the Context of Big Data
References
A Survey of the State-of-the-Art and Some Extensions of Recommender System Based on Big Data
1 Introduction
2 Recommender System Approaches
2.1 Content-Based Approach
2.2 Collaborative Filtering Approach
2.3 Knowledge-Based Approach
2.4 Hybrid Approach
3 Emerging Approach
3.1 Deep Learning-Based Approach
3.2 Knowledge Graphs-Based Approach
3.3 Parallel Computing-Based Approach
4 Conclusions and Discussion
References
An Innovative AdaBoost Process Using Flexible Soft Labels on Imbalanced Big Data
1 Introduction
2 Related Work
2.1 AdaBoost
2.2 Soft Label
3 Methodology
3.1 The General Framework
3.2 Creating Soft Labels
3.3 AdaBoost Based on the Soft Label
4 Experiments
4.1 Datasets
4.2 Experiments Results
5 Conclusion
References
Artificial Intelligence and Machine Learning Security
Feature Fusion Based IPSO-LSSVM Fault Diagnosis of On-Load Tap Changers
1 Introduction
2 Construction of Characteristic Matrix Based on OLTC Vibration Signal
2.1 Time Domain Feature Extraction
2.2 IMPE Feature Extraction
2.3 Construction of Feature Fusion Matrix
3 OLTC Fault Diagnosis Model Based on IPSO-LSSVM
3.1 IPSO Algorithm
3.2 LSSVM Algorithm
3.3 OLTC Fault Diagnosis Model Based on IPSO-LSSVM
4 Experimental Verification and Analysis
4.1 Experimental Data Acquisition
4.2 Comparison of Calculation Results
5 Conclusion
References
Graphlet-Based Measure to Assess Institutional Research Teams
1 Introduction
2 Related Work
3 Data and Methods
3.1 Data Source and Preprocessing
3.2 Research Method
4 Case Study
4.1 Choice of Top 20 Institutions
4.2 Distribution of Connected Subgraphs of Research Teams in the Top 20 Institutions
4.3 The Graphlet Structure of the Largest Connected Subgraph of the Scientific Research Team of the Top 20 Institutions
4.4 Similarity Analysis of the Largest Connected Subgraph of Top 20 Institutional Research Teams
5 Discussion
6 Conclusion
References
Logical Relationship Extraction of Multimodal South China Sea Big Data Using BERT and Knowledge Graph
1 Introduction
2 Related Work
2.1 Research on the South China Sea Big Data
2.2 Research on the Logical Relationship Extraction
3 Data and Methods
3.1 Data Collection and Preprocessing
3.2 Methods
4 Logical Relationship Extraction of the South China Sea Evidence Data Based on BERT
4.1 Experimental Dataset
4.2 Experimental Environment
4.3 Experimental Design and Evaluation Metrics
4.4 Experimental Results and Analysis
5 Visualization of Logical Relationship of the South China Sea Evidence Data Based on Knowledge Graph
6 Conclusion and Future Work
References
Research on Application of Knowledge Graph in War Archive Based on Big Data
1 Introduction
2 Construction of Knowledge Graph
2.1 Knowledge Modeling
2.2 Knowledge Extraction
2.3 Knowledge Fusion
2.4 Knowledge Management
3 Application of Knowledge Graph
4 Discussion and Future Work
References
Semi-supervised Learning Enabled Fault Analysis Method for Power Distribution Network Based on LSTM Autoencoder and Attention Mechanism
1 Introduction
2 Related Work
3 System Model
3.1 Unsupervised LSTM Autoencoder
3.2 Attention Mechanism
3.3 Semi-supervised LSTM Autoencoder
4 Time Series Anomaly Detection Algorithm Based on SS-LSTM-AAE
5 Performance Evaluation and Analysis
5.1 Evaluation Index
5.2 Experimental Results
6 Conclusion
References
Fault Detection Method for Power Distribution Network Based on Ensemble Learning
1 Introduction
2 Related Work
3 Construction of Semi-supervised Anomaly Detection Model for Electric Power Networks Based on Ensemble Learning
3.1 Unsupervised LSTM Prediction Model
3.2 Semi-supervised LSTM Prediction Model
3.3 Semi-supervised Anomaly Detection Model Based on Ensemble Learning
4 Time Series Anomaly Detection Algorithm Based on SS-LSTM-PAAE
4.1 Data Preprocessing
4.2 Training Process of the Proposed SS-LSTM-PAAE Model
4.3 Abnormal Detection
5 Experimental Results and Analysis
5.1 Experiment Settings
5.2 Experimental Results of Semi-supervised LSTM Prediction Model
6 Conclusion
References
Factors Influencing Chinese Users’ Willingness to Pay for O2O Knowledge Products Based on Information Adoption Model
1 Introduction
2 Literature Review
3 Theoretical Model Construction
3.1 Theoretical Model
3.2 Research Hypothesis
4 Research Design
4.1 Data Sources and Variable Descriptions
4.2 Statistical Model
5 Results
5.1 Descriptive Statistics of the Sample
5.2 Statistical Model Analysis
6 Conclusions
6.1 Conclusion and Practical Implications
6.2 Limitations and Future Research Directions
References
A Survey of Integrating Federated Learning with Smart Grids: Application Prospect, Privacy Preserving and Challenges Analysis
1 Introduction
2 Related Work
3 Local Data Collection and Analysis
4 Data Privacy Protection
4.1 Current Privacy Issues in Modern Smart Grids
4.2 Effective Solutions to Privacy Issues
5 Model Training Framework
6 Conclusion
References
Application of the Fusion Access Technology of Carrier and 5G in Power Communication
1 Introduction
2 Research on Communication Technology
2.1 Low Delay Technology for 5G
2.2 The Necessity for Carrier Communication
3 Fusion Access Technology of Carrier and 5G
4 Testing of the Switching Device
4.1 Design of the Scheme
4.2 The Results of the Experiment
5 Conclusion
References
Muscle Fatigue Classification Based on GA Optimization of BP Neural Network
1 Introduction
2 Theory and Method
2.1 Characteristics analysis of sEMG signal
2.2 BP Neural Network
2.3 BP Neural Network Optimized by GA
3 Experiments
3.1 Classification of Muscle Fatigue
3.2 Experiment Scheme Design
4 Results
4.1 The Characteristics of Signals Analysis
4.2 Performance Comparison of Different Algorithms
5 Conclusion
References
Data Technology and Network Security
Research on Typical Scenario Generation Based on Distribution Network Data Mining and Improved Policy Clustering
1 The Introduction
2 The Initial Scenario is Divided Based on the Multivariate Load Evaluation Model
2.1 To Construct the Multivariate Evaluation Model of Distribution Network Load
2.2 The Initial Scene is Generated Based on K-means Clustering Method
3 Design the Intermediate Scene Set Based on Different Ways of Energy Use
3.1 “Single Power Supply” Requirement
3.2 “Electricity and Other Energy Supply” Demand
4 The Final Scene is Generated by the Density Clustering Method Based on the Improved Strategy
4.1 Improved Density Clustering Method
4.2 The Final Typical Scene is Generated According to the Clustering Results
5 Example Analysis
5.1 Construct Multivariate Load Evaluation Model
5.2 K-means Clustering Analysis
5.3 Density Clustering of Improved Strategies
6 Conclusion
References
A Data-Driven and Deep Learning-Based Economic Evaluation Method for New Power System Distribution Grid
1 Introduction
2 Multi-index for Economic Evaluation
3 Multi-level Fuzzy Comprehensive Evaluation Method
3.1 Calculate Subjective Weight with AHP
3.2 Calculate Objective Weight with EWM
3.3 Calculate the Combined Weight
3.4 Multi-level Fuzzy Comprehensive Evaluation Method
4 DBN Model
5 Example Analysis
6 Conclusion
References
Study on Random Generation of Virtual Avatars Based on Big Data
1 Introduction
2 Methods
2.1 Data Set Optimization
2.2 Build GAN
2.3 The Optimization Function and the Loss Function
3 Results
4 Conclusion
References
Ordering, Pricing, and Coordination of a Closed-Loop Supply Chain with Risk Preference
1 Introduction
2 Literature Review
3 Problem Clarification and Basic Assumptions
4 CLSC Model Under the M-CVaR Criterion
4.1 M-CVaR Model
4.2 Decentralized Decision (DD) Model
4.3 Centralized Decision (CD) Model
4.4 The Coordination Based on the M-CVaR Criterion
5 Numerical Examples
5.1 Effects of the Risk-Averse and Pessimistic Coefficients on Optimal Order Quantity
5.2 Effects of the Risk-Averse and Pessimistic Coefficients on the Optimal Wholesale Price
5.3 Effects of the Risk-Averse and Pessimistic Coefficients on the Expected Profit of Each CLSC Member
5.4 Effects of the Risk-Averse Coefficient and Pessimistic Coefficient on CLSC Coordination
6 Conclusion
References
An Explainable Optimization Method for Assembly Process Parameter
1 Introduction
2 Process Parameter Optimization Plan and Information Collection
2.1 Process Parameter Optimization Plan
2.2 Data Collection of Engine Assembly Process Parameters
3 Vibration Prediction Model Training and Explanation
3.1 Vibration Prediction Model Training
3.2 SHAP Explains Engine Vibration Value Prediction Model
4 Assembly Process Parameter Optimization Results
5 Conclusion
References
A Correlational Strategy for the Prediction of High-Dimensional Stock Data by Neural Networks and Technical Indicators
1 Introduction
2 Related Work
3 Preliminaries
3.1 Stock Technical Indicators (STIs)
3.2 Deep Neural Networks
3.3 Performance Metrics (PM)
4 Proposed Strategy
4.1 Data Collection and Preprocessing
4.2 Stock Movements
4.3 Correlational Strategy
4.4 Neural Networks Implementation with STI
5 Experiments
5.1 Datasets and Experimental Processes
6 Results Evaluation
6.1 Movement Analysis of Hong Kong Stock
6.2 Correlation of Movement and STI
6.3 Stock Prediction with NNs
6.4 Comparative Analysis
7 Conclusion
References
Research on Attribute-Based Privacy-Preserving Computing Technologies
1 Introduction
2 Schemes
2.1 Attribute-Based Encryption with Keyword Search
2.2 Attribute-Based Encryption with Symmetric Search
2.3 Attribute-Based Encryption with Private Set Intersection
2.4 Attribute-Based Encryption with Homomorphic Encryption
3 Conclusions
References
Research on BIM Modeling Technology from the Perspective of Power Big Data Security
1 Introduction
2 Introduction to BIM Technology
3 BIM Based Security Modeling
3.1 Basic Model of Safety Management
3.2 Data Security Fusion and Integration
3.3 Secure Digital Twinning Mechanism
3.4 Safety Risk Identification
3.5 BIM Based Power Production Safety Plan
3.6 BIM Based Safety Inspection
3.7 BIM Based Safety Training
4 Security Model
4.1 Satellite Positioning Data Comparison
4.2 Hazard Identification Technology
5 Application Example
6 Conclusion
References
Security Risk Management of the Internet of Things Based on 5G Technology
1 Introduction
2 Internet of Things Technology
2.1 Basic Knowledge
2.2 Network Security Management Game
2.3 Limited and Rational Safety Management Game
2.4 Limited and Rational Safety Management Game
2.5 Format Tower Nash Equilibrium
3 The Design of the Algorithm
3.1 Rewrite the Constraint Optimization Formula
3.2 End-To-End Time-Delay Jitter Analysis Model
3.3 End-To-End Time-Lapse Analysis
4 Results of the Case Analysis
5 Conclusion
References
Multi Slice SLA Collaboration and Optimization of Power 5G Big Data Security Business
1 Introduction
2 Related Work
3 Security Model for Power 5G Network Slice
3.1 Big Data Security SLA Analysis of Power 5G Service
3.2 Power Business Big Data Security Perception Collaborative Optimization Model
4 Simulation
5 Conclusion
References
Virtualized Network Functions Placement Scheme in Cloud Network Collaborative Operation Platform
1 Introduction
2 Related Work
2.1 The IETF Standard
2.2 The BBF Standard
2.3 The ETSI Standard
2.4 The ITU-T Standard
3 System Model
4 Proposed Solutions
4.1 SDN IPRAN Solution
4.2 Loose Coupling Scheme
4.3 No Coupling Scheme
4.4 Solutions for SDON
5 Evaluation and Discussions
6 Conclusion
References
Grounded Theory-Driven Knowledge Production Features Mining: One Empirical Study Based on Big Data Technology
1 Introduction
2 Related Work
3 Data Sources and Methods
3.1 Data Collection
3.2 Data Preprocessing
3.3 Data Set Storage
4 Research Focus Mining Driven by Grounded Theory
5 Scientific Research Collaboration Pattern Driven by Grounded Theory
5.1 Author Collaboration Networks
5.2 Institutional Collaborative Network
6 Summary and Outlook
References
Research on Digital Twin Technology of Main Equipment for Power Transmission and Transformation Based on Big Data
1 Introduction
2 Research on Technology
2.1 Geometric Information Measurement Technology Based on Multi-visual Image
2.2 Real-Time Target Detection and Tracking Technology Based on Real-Scene Video
3 Image Synthesis Technique from a Free Perspective
3.1 Free-Perspective Image Synthesis Based on Depth Information
3.2 Free-Perspective Image Synthesis Based on Light-Field Reconstruction
3.3 Camera Array Building Strategy
3.4 Automatic Fusion and Matching of Images with 3D Models
4 Simulation and Implementation
5 Conclusion
References
Efficient Spatiotemporal Big Data Indexing Algorithm with Loss Control
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 System Model
3.2 Construct MBRs with Hilbert Curve
3.3 Indexing Algorithm
4 Experimentation and Results
4.1 Experimental Setup
4.2 Performance Evaluation for Range Query
5 Conclusion
References
Cybersecurity and Privacy
Privacy Measurement Based on Social Network Properties and Structure
1 Introduction
2 Related Work
3 Privacy Guarantee Framework
4 Privacy Risk Calculation Methods
4.1 Individual Privacy Risk Score
4.2 Similarity Calculation Method
4.3 Global Privacy Risk Score
5 Experimental Results
5.1 Dataset
5.2 Individual Privacy Risk Score
5.3 User Importance Analysis
5.4 Global Similarity Score
5.5 Global Privacy Risk Score
6 Conclusion
References
Research on 5G-Based Zero Trust Network Security Platform
1 Introduction
2 Requirement Analysis
3 Security Model
3.1 Security Architecture
3.2 Security Model
4 Security Platform
4.1 System Architecture
4.2 System Functions
5 Application Results and Discussion
5.1 Application Results
5.2 Discussion
6 Conclusion
References
A Mobile Data Leakage Prevention System Based on Encryption Algorithms
1 Introduction
2 Mobile Data Encryption Methods
2.1 SM4 Encryption Algorithm
2.2 SM3 Hash Function
2.3 Cipher-Text Retrieval Model
3 Mobile Data Leakage Prevention System
3.1 System Model
3.2 Functions Design
4 Conclusion
References
Research on Data Security Access Control Mechanism in Cloud Computing Environment
1 Introduction
2 Cloud Computing OpenStack Platform
3 Data Security Access Control Model
4 Access Control System Based on OpenStack
5 Conclusion
References
Research on Data Security Storage System Based on Distributed Database
1 Introduction
2 Data Security Storage Model
3 Data Security Storage System Based on Distributed Storage
4 Conclusion
References
Design of an Active Data Watermark Detection System
1 Introduction
2 Data Watermark Detection Technology
3 Active Data Watermark Detection System
4 Conclusion
References
Research on Privacy Protection Methods for Data Mining
1 Introduction
2 Privacy Preserving Data Mining Methods
2.1 Privacy Preserving K-Means Algorithm
2.2 Privacy Preserving SVM
2.3 Privacy Preserving Decision Tree
2.4 Privacy Preserving Association Rule Mining
3 Privacy Protection Methods for Data Mining
3.1 Restricted Release
3.2 Searchable Symmetric Encryption
3.3 Homomorphic Encryption
3.4 Digital Envelope
4 Conclusion
References
A Cluster-Based Facial Image Anonymization Method Using Variational Autoencoder
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 System Model
3.2 Architecture and Working
3.3 Method Execution Process
4 Experimentation and Results
4.1 Experimental Setup
4.2 Privacy Preservation Evaluation
4.3 Image Utility Maintenance
5 Conclusion
References
IoT Security
Research on the Security Diagnosis Platform for Partial Discharge of 10 kV Cables in Urban Distribution Networks
1 Introduction
2 Partial Discharge Model
3 Security Diagnosis Method
4 Security Diagnosis Platform
4.1 Hardware System of the Proposed Platform
4.2 Software System of the Proposed Platform
4.3 Security Diagnostic Process of the Proposed Platform
5 Application of the Security Diagnosis Platform
6 Conclusion
References
Research on the Electromagnetic Sensor-Based Partial Discharge Security Monitoring System of Distribution Network
1 Introduction
2 Theory of Partial Discharge
2.1 Factors of Partial Discharge in Distribution Network
2.2 Security Monitoring Parameters of Partial Discharge
3 Security Monitoring Method of Partial Discharge
3.1 Security Monitoring Based on Electromagnetic Sensing
3.2 Partial Discharge Electromagnetic Radiation Characteristics
3.3 Local Discharge Signal Transmission Mechanism
4 Security Monitoring System of Discharge Signal
4.1 Deployment of Electromagnetic Sensors
4.2 Interference Sources of Local Discharge Signals
4.3 Anti-interference Scheme for Security Monitoring
5 Tests and Analysis
6 Conclusion
References
Design of Ultrasonic-Based Remote Distribution Network Online Security Monitoring Device
1 Introduction
2 Principles of Online Security Monitoring
2.1 Mechanism of Partial Discharge
2.2 Calculation of Partial Discharge
2.3 Ultrasound and the Monitoring Mechanism
2.4 Ultrasonic Determination Method of Partial Discharge
3 Ultrasonic-Based Security Monitoring Device
3.1 Hardware Architecture
3.2 Software Implementation
4 Validation of the Effectiveness of Security Monitoring Device
5 Conclusion
References
An Intelligent IoT Terminal Detection System Based on Data Sniffing
1 Introduction
2 Typical IoT Architecture
3 Intelligent IoT Terminal Detection Methods
3.1 Comprehensive Detection Model
3.2 IoT Protocol Analysis
4 Intelligent IoT Terminal Detection System
4.1 System Architecture
4.2 System Workflow
5 Discussion on IoT Terminal Security
6 Conclusion
References
Intelligent IoT Terminal Software Identification System Based on Behavior Features
1 Introduction
2 IoT Terminal Software Behavior Detection
2.1 Software Architecture of Intelligent IoT Terminal
2.2 Non-invasive Instruction and Data Tracing
3 IoT Terminal Software Identification System
3.1 System Architecture
3.2 System Processes
4 Discussion
4.1 Use Multifaceted Information for Identification
4.2 Use a Variety of Terminal Codes
4.3 Use Multiple Perceptual Points
5 Conclusion
References
Research on the Identification Model of Power Terminal
1 Introduction
2 Power Terminal Identification Technology Analysis
2.1 Principle of Power Terminal Identification Technology
2.2 Comparison of Power Terminal Identification Technologies
3 Power Terminal Identification Model
3.1 Power Terminal Identification Requirements
3.2 Power Terminal Identification Model
3.3 Power Terminal Identification Algorithm
4 Implementation of the Identification Model
5 Experiment and Results Analysis
6 Conclusion
References
Research on Firmware Vulnerability Mining Model of Power Internet of Things
1 Introduction
2 Power IoT Terminal Firmware
2.1 Architecture of Power IoT
2.2 Acquisition of Terminal Firmware
3 Firmware Vulnerability Mining Techniques
3.1 Static Analysis Techniques
3.2 Fuzzy Testing Techniques
4 Firmware Vulnerability Mining Model
4.1 Firmware Vulnerability Mining Process
4.2 Firmware Sample Selection Method
4.3 Firmware Status Detection
5 Tests and Results Analysis
6 Conclusion
References
How Can Social Workers Participate in Big Data Governance? The Third-Party Perspective on Big Data Governance
1 Introduction
2 Literature Review and Analytical Framework
2.1 Platform Organization-Level Analysis of Big Data Governance
2.2 Stakeholder Analysis of Big Data Governance
2.3 Industry Supply Chain Analysis of Big Data Governance
2.4 An Analytical Framework of Big Data Governance: The Third Party of Social Work Service
3 Retrospective Analysis of Ethical Issues in Big Data Governance
4 Multidimensional Isomorphism of Third-Party Social Work Services and Big Data Governance
4.1 Work Object Isomorphism
4.2 Isomorphism of Ethical Connotation
4.3 Technical Requirement Isomorphism
5 How Third-Party Social Work Participates in Big Data Governance
6 Discussion and Conclusion
6.1 Discussion
6.2 Conclusion
References
Power Network Scheduling Strategy Based on Federated Learning Algorithm in Edge Computing Environment
1 Introduction
2 Related Works
3 The Proposed Method
4 Experimental Results and Analysis
5 Experimental Results and Analysis
References
Author Index
Recommend Papers

Big Data and Security: 4th International Conference, ICBDS 2022, Xiamen, China, December 8–12, 2022, Proceedings (Communications in Computer and Information Science, 1796)
 9819932998, 9789819932993

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

Yuan Tian · Tinghuai Ma · Qingshan Jiang · Qi Liu · Muhammad Khurram Khan (Eds.)

Communications in Computer and Information Science

1796

Big Data and Security 4th International Conference, ICBDS 2022 Xiamen, China, December 8–12, 2022 Proceedings

Communications in Computer and Information Science Editorial Board Members Joaquim Filipe , Polytechnic Institute of Setúbal, Setúbal, Portugal Ashish Ghosh , Indian Statistical Institute, Kolkata, India Raquel Oliveira Prates , Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil Lizhu Zhou, Tsinghua University, Beijing, China

1796

Rationale The CCIS series is devoted to the publication of proceedings of computer science conferences. Its aim is to efficiently disseminate original research results in informatics in printed and electronic form. While the focus is on publication of peer-reviewed full papers presenting mature work, inclusion of reviewed short papers reporting on work in progress is welcome, too. Besides globally relevant meetings with internationally representative program committees guaranteeing a strict peer-reviewing and paper selection process, conferences run by societies or of high regional or national relevance are also considered for publication. Topics The topical scope of CCIS spans the entire spectrum of informatics ranging from foundational topics in the theory of computing to information and communications science and technology and a broad variety of interdisciplinary application fields. Information for Volume Editors and Authors Publication in CCIS is free of charge. No royalties are paid, however, we offer registered conference participants temporary free access to the online version of the conference proceedings on SpringerLink (http://link.springer.com) by means of an http referrer from the conference website and/or a number of complimentary printed copies, as specified in the official acceptance email of the event. CCIS proceedings can be published in time for distribution at conferences or as postproceedings, and delivered in the form of printed books and/or electronically as USBs and/or e-content licenses for accessing proceedings at SpringerLink. Furthermore, CCIS proceedings are included in the CCIS electronic book series hosted in the SpringerLink digital library at http://link.springer.com/bookseries/7899. Conferences publishing in CCIS are allowed to use Online Conference Service (OCS) for managing the whole proceedings lifecycle (from submission and reviewing to preparing for publication) free of charge. Publication process The language of publication is exclusively English. Authors publishing in CCIS have to sign the Springer CCIS copyright transfer form, however, they are free to use their material published in CCIS for substantially changed, more elaborate subsequent publications elsewhere. For the preparation of the camera-ready papers/files, authors have to strictly adhere to the Springer CCIS Authors’ Instructions and are strongly encouraged to use the CCIS LaTeX style files or templates. Abstracting/Indexing CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, Scopus. CCIS volumes are also submitted for the inclusion in ISI Proceedings. How to start To start the evaluation of your proposal for inclusion in the CCIS series, please send an e-mail to [email protected].

Yuan Tian · Tinghuai Ma · Qingshan Jiang · Qi Liu · Muhammad Khurram Khan Editors

Big Data and Security 4th International Conference, ICBDS 2022 Xiamen, China, December 8–12, 2022 Proceedings

Editors Yuan Tian Nanjing Institute of Technology Nanjing, China Qingshan Jiang Chinese Academy of Sciences Shenzhen, China Muhammad Khurram Khan King Saud University Riyadh, Saudi Arabia

Tinghuai Ma Nanjing University of Information Science and Technology Nanjing, Jiangsu, China Qi Liu Nanjing University of Information Science and Technology Nanjing, China

ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-981-99-3299-3 ISBN 978-981-99-3300-6 (eBook) https://doi.org/10.1007/978-981-99-3300-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This volume contains the papers from the 4th International Conference on Big Data and Security (ICBDS 2022). The event was held in Xiamen and Quanzhou, and was organized by Nanjing Institute of Technology, Xiamen University, JiangSu Computer Society, and IEEE Broadcast Technology Society. The International Conference on Big Data and Security (ICBDS 2022) brings experts and researchers together from all over the world to discuss the current status and potential ways to address security and privacy regarding the use of Big Data systems. Big Data systems are complex and heterogeneous; due to their extraordinary scale and the integration of different technologies, new security and privacy issues are introduced and must be properly addressed. The ongoing digitalization of the business world is putting companies and users at risk of cyber-attacks more than ever before. Big data analysis has the potential to offer protection against these attacks. Participation in workshops on specific topics of the conference was expected to achieve progress, and global networking in transferring and exchanging ideas. The papers come from researchers who work in universities and research institutions, and give us the opportunity to achieve a good level of understanding of the mutual needs, requirements, and technical means available in this field of research. The topics included in the second edition of this event include the following fields connected to Big Data: Security in Blockchain, IoT Security, Security in Cloud and Fog Computing, Artificial Intelligence/Machine Learning Security, and Cybersecurity and Privacy. We received 211 submissions and accepted 54 papers. All the accepted papers were peer reviewed by three qualified reviewers chosen from our technical program committee based on their qualifications and experience. The proceedings editors wish to thank the dedicated Scientific Committee members and all the other reviewers for their efforts and contributions. We also thank Springer for their trust and for publishing the proceedings of ICBDS 2022. January 2022

Yuan Tian Tinghuai Ma Qingshan Jiang Qi Liu Muhammad Khurram Khan

Organization

General Chairs Ting Huai Ma Yi Pan Muhammad Khurram Khan Yuan Tian

Nanjing University of Information Science and Technology, China Georgia State University, USA King Saud University, Saudi Arabia Nanjing Institute of Technology, China

Organization Chair Zhong Tan

Xiamen University, China

Technical Program Chairs Victor S. Sheng Qi Liu

University of Central Arkansas, USA Nanjing University of Information Science and Technology, China

Technical Program Committee Members Eui-Nam Huh Heba Abdullataif Kurdi Omar Alfandi Dongxue Liang Mohammed Al-Dhelaan Päivi Raulamo-Jurvanen Zeeshan Pervez Adil Mehmood Khan Wajahat Ali Khan Qiao Lin Ye Pertti Karhapää Farkhund Iqbal Muhammad Ovais Ahmad Lejun Zhang

Kyung Hee University, South Korea Massachusetts Institute of Technology, USA Zayed University, UAE Tsinghua University, China King Saud University, Saudi Arabia University of Oulu, Finland University of the West of Scotland, UK Innopolis University, Russia Kyung Hee University, South Korea Nanjing Forestry University, China University of Oulu, Finland Zayed University, UAE Karlstad University, Sweden Yangzhou University, China

viii

Organization

Linshan Shen Ghada Al-Hudhud Lei Han Tang Xin Mznah Al-Rodhaan Yao Zhenjian Thant Zin Oo Mohammad Rawashdeh Alia Alabdulkarim Elina Annanperä Soha Zaghloul Mekki Basmah Alotibi Mariya Muneeb Maryam Hajakbari Miada Murad Pilar Rodríguez Zhiwei Wang Rand. J Jiagao Wu Weipeng Jing Yu Zhang Nguyen H. Tran Hang Chen Sarah Alkharji Chunguo Li Babar Shah Tianyang Zhou Manal Hazazi Amiya Kumar Tripathy Shaoyong Guo Shadan AlHamed Cunjie Cao Linfeng Liu Chunliang Yang Patrick Hung Xinjian Zhao

Harbin Engineering University, China King Saud University, Saudi Arabia Nanjing Institute of Technology, China University of International Relations, China King Saud University, Saudi Arabia Huazhong University of Science and Technology, China Kyung Hee University, South Korea University of Central Missouri, USA King Saud University, Saudi Arabia University of Oulu, Finland King Saud University, Saudi Arabia King Saud University, Saudi Arabia King Saud University, Saudi Arabia Islamic Azad University, Iran King Saud University, Saudi Arabia Technical University of Madrid, Spain Hebei Normal University, China Shaqra University, Saudi Arabia Nanjing University of Posts and Telecommunications, China Northeast Forestry University, China Harbin Institute of Technology, China University of Sydney, Australia Nanjing Institute of Technology, China King Saud University, Saudi Arabia Southeast University, China Zayed University, UAE State Key Laboratory of Mathematical Engineering and Advanced Computing, China King Saud University, Saudi Arabia Edith Cowan University, Australia Beijing University of Posts and Telecommunications, China King Saud University, Saudi Arabia Hainan University, China Nanjing University of Posts and Telecommunications, China China Mobile IoT Company Limited, China University of Ontario Institute of Technology, Canada State Grid Nanjing Power Supply Company, China

Organization

Sungyoung Lee Zhengyu Chen Jian Zhou Pasi Kuvaja Xiao Xue Jianguo Sun Farkhund Iqbal Zilong Jin Susheela Dahiya Ming Pang Yuanfeng Jin Maram Al-Shablan Kejia Chen Valentina Lenarduzzi Davide Taibi Jinghua Ding XueSong Yin Qiang Ma Shiwen Hu Manar Hosny Lei Cui Yonghua Gong Kashif Saleem Xiaojian Ding Irfan Mohiuddin Ming Su Yunyun Wang Abdullah Al-Dhelaan

ix

Kyung Hee University, South Korea Jinling Institute of Technology, China Nanjing University of Posts and Telecommunications, China University of Oulu, Finland Tianjin University, China Harbin Engineering University, China Zayed University, UAE Nanjing University of Information Science and Technology, China University of Petroleum & Energy Studies, India Harbin Engineering University, China Yanbian University, China King Saud University, Saudi Arabia Nanjing University of Posts and Telecommunications, China University of Tampere, Finland University of Tampere, Finland Sungkyunkwan University, South Korea Nanjing Institute of Technology, China King Saud University, Saudi Arabia Accelor Ltd., USA King Saud University, Saudi Arabia Chinese Academy of Sciences (CAS), China Nanjing University of Posts and Telecommunications, China King Saud University, Saudi Arabia Nanjing University of Finance and Economics, China King Saud University, Saudi Arabia Beijing University of Posts and Telecommunications, China Nanjing University of Posts and Telecommunications, China King Saud University, Saudi Arabia

x

Organization

Workshop Chairs Mohammad Mehedi Hassan Asad Masood Khattak

King Saud University, Saudi Arabia Zayed University, UAE

Publication Chair Vidyasagar Potdar

Curtin University, Australia

Organization Committee Members Fan Lin Yuyao Zhang Jalal Al-Muhtadi Geng Yang Qiao Lin Ye Pertti Karhapää Lei Han Yong Zhu Bin Xie Dawei Li Jing Rong Chen Thant Zin Oo Alia Alabdulkarim Rand. J Hang Chen Jiagao Wu

Xiamen University, China Xiamen University, China King Saud University, Saudi Arabia Nanjing University of Posts and Telecommunications, China Nanjing Forestry University, China University of Oulu, Finland Nanjing Institute of Technology, China Jingling Institute of Technology, China Hebei Normal University, China Nanjing Institute of Technology, China Nanjing Institute of Technology, China Kyung Hee University, South Korea King Saud University, Saudi Arabia Shaqra University, Saudi Arabia Nanjing Institute of Technology, China Nanjing University of Posts and Telecommunications, China

Contents

Big Data and New Method Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection of Multi-type Enterprises Based on Energy Consumption Big Data Mining . . . . Xiangguo Liu, Jian Geng, Zhonglong Wang, Lingyao Cai, and Fanjiao Yin

3

Research on Data-Driven AGC Instruction Execution Effect Recognition Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Haiyang Jiang, Hongtong Liu, and Yangfei Zhang

14

Research on Day-Ahead Scheduling Strategy of the Power System Includes Wind Power Plants and Photovoltaic Power Stations Based on Big Data Clustering and Filling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Feng Qi, Qiang Wang, Xiaoqiang Wei, Yiping Zhang, Wenbin Wu, Donyang He, Taicheng Wang, and Suyang Shen

25

Day-Ahead Scenario Generation Method for Renewable Energy Based on Historical Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hong Wang, Yang Liu, Chao Yin, Xuesong Bai, Bo Sun, Menglei Li, and Xiyong Yang

37

A High-Frequency Stock Price Prediction Method Based on Mode Decomposition and Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weijie Chen, Qingshan Jiang, Xibei Jia, Abdur Rasool, and Weihui Jiang

47

A Cross-Platform Instant Messaging User Association Method Based on Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pei Zhou, Xiangyang Luo, Shaoyong Du, Wenqi Shi, and Jiashan Guo

63

Research on Data Watermark Tracing System in Hadoop Environment . . . . . . . . Wenyu Qiao and Jiexi Wang Application of RFID Tag in the Localization of Power Cable Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhenyu Zhang, Shuming Feng, Min Xiao, Yongcheng Yang, and Gangbo Song

80

92

Research Hotspots and Evolution Trend of Virtual Power Plant in China: An Empirical Analysis Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Yan Zhang, Pengcheng Liu, Hao Xu, and Mulan Wang

xii

Contents

Influencing Factors Analysis and Prediction Model of Pavement Transverse Crack Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Yuqin Zhu, Wengang Ma, Ling Cong, Chengtao Li, and Shixiang Hu Research on the Evolution of New Energy Industry Financing Ecosystem Under the Background of Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 Hairong Wang and Qiuchi Wu A Survey of the State-of-the-Art and Some Extensions of Recommender System Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Lixin Jia, Lixiu Jia, and Lihang Feng An Innovative AdaBoost Process Using Flexible Soft Labels on Imbalanced Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 Jinke Wang, Biao Song, Xinchang Zhang, Yuan Tian, and Ran Guo Artificial Intelligence and Machine Learning Security Feature Fusion Based IPSO-LSSVM Fault Diagnosis of On-Load Tap Changers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Honghua Xu, Laibi Yin, Ziqiang Xu, Qian Xin, and Mengjie Lv Graphlet-Based Measure to Assess Institutional Research Teams . . . . . . . . . . . . . 200 Shengqing Li and Jiulei Jiang Logical Relationship Extraction of Multimodal South China Sea Big Data Using BERT and Knowledge Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 Peng Yufang, Xu Hao, Jin Weijian, and Yang Haiping Research on Application of Knowledge Graph in War Archive Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 Huang Yongqin, Chen Xushan, Yang Anlian, and Ping Shuo Semi-supervised Learning Enabled Fault Analysis Method for Power Distribution Network Based on LSTM Autoencoder and Attention Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 Haining Xie, Liming Zhuang, and Xiang Wang Fault Detection Method for Power Distribution Network Based on Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 Haining Xie and Lijuan Chao Factors Influencing Chinese Users’ Willingness to Pay for O2O Knowledge Products Based on Information Adoption Model . . . . . . . . . . . . . . . . . 276 Zhu Zhentao, Yue Wen, Zhang Yan, and Wu Qiuchi

Contents

xiii

A Survey of Integrating Federated Learning with Smart Grids: Application Prospect, Privacy Preserving and Challenges Analysis . . . . . . . . . . . . . . . . . . . . . . 296 Zhichao Tang, Yan Yan, Dong Wu, Tianhao Yang, Ruixuan Dong, Shuyang Hao, Wei Wang, Yizhi Chen, and Yuan Tian Application of the Fusion Access Technology of Carrier and 5G in Power Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306 Yang Hu, Ming Zhang, Yingli He, Guangxiang Jin, Qi Wei, and Jia Yu Muscle Fatigue Classification Based on GA Optimization of BP Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 Mengjie Zang, Lidong Xing, Zhiyu Qian, and Liuye Yao Data Technology and Network Security Research on Typical Scenario Generation Based on Distribution Network Data Mining and Improved Policy Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Jinhu Wang, Tongzhou Zhang, Ming Chen, Wei Han, Mingze Ji, and Yuzhuo Zhang A Data-Driven and Deep Learning-Based Economic Evaluation Method for New Power System Distribution Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Yunzhao Wu, Jialei Zhang, Qing Duan, Guanglin Sha, and Yao Zhang Study on Random Generation of Virtual Avatars Based on Big Data . . . . . . . . . . 355 Jian Zhao, Mo Peng, Bo-Lin Zhu, and Ling-Ling Li Ordering, Pricing, and Coordination of a Closed-Loop Supply Chain with Risk Preference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 Ronghua Lu, Yisheng Wu, and Feng Xu An Explainable Optimization Method for Assembly Process Parameter . . . . . . . . 391 Hongsong Peng, Weiwei Yuan, Yimin Pu, Xiying Yang, Donghai Guan, and Ran Guo A Correlational Strategy for the Prediction of High-Dimensional Stock Data by Neural Networks and Technical Indicators . . . . . . . . . . . . . . . . . . . . . . . . . 405 Jingwei Hong, Ping Han, Abdur Rasool, Hui Chen, Zhiling Hong, Zhong Tan, Fan Lin, Steven X. Wei, and Qingshan Jiang Research on Attribute-Based Privacy-Preserving Computing Technologies . . . . . 420 Shuo Qiu, Wenhui Ni, Yanfeng Shi, and Wanni Xu

xiv

Contents

Research on BIM Modeling Technology from the Perspective of Power Big Data Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433 Yan Li Security Risk Management of the Internet of Things Based on 5G Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Wei Cao, Yang Hu, Shuang Yang, Xue-yang Zhu, and Jia Yu Multi Slice SLA Collaboration and Optimization of Power 5G Big Data Security Business . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 Daohua Zhu, Yajuan Guo, Lei Wei, Yunxiao Sun, and Wei Liu Virtualized Network Functions Placement Scheme in Cloud Network Collaborative Operation Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 Youjun Hu, Lan Gan, E. Longhui, Fangcheng Chu, Yang Lu, Xifan Nie, and Fei Zhao Grounded Theory-Driven Knowledge Production Features Mining: One Empirical Study Based on Big Data Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . 493 Hao Xu, Yiyang Li, Mulan Wang, Yufang Peng, Qinwei Chen, Pengcheng Liu, and Yijing Li Research on Digital Twin Technology of Main Equipment for Power Transmission and Transformation Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . 512 Yu Chen, Ziqian Zhang, and Ning Tang Efficient Spatiotemporal Big Data Indexing Algorithm with Loss Control . . . . . . 524 Ziyu Wang, Runda Guan, Xiaokang Pan, Biao Song, Xinchang Zhang, and Yuan Tian Cybersecurity and Privacy Privacy Measurement Based on Social Network Properties and Structure . . . . . . 537 YuHong Gong, Biao Jin, and ZhiQiang Yao Research on 5G-Based Zero Trust Network Security Platform . . . . . . . . . . . . . . . . 552 Zaojian Dai, Jidong Zhang, Yong Li, Xinyi Li, Ziang Lu, and Wengao Fang A Mobile Data Leakage Prevention System Based on Encryption Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 Wen Shen and Hongzhang Xiong Research on Data Security Access Control Mechanism in Cloud Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 Chen Luo, Yang Li, and Qianxuan Wang

Contents

xv

Research on Data Security Storage System Based on Distributed Database . . . . . 586 Bo Liu, Lan Zhang, and Jinke Wang Design of an Active Data Watermark Detection System . . . . . . . . . . . . . . . . . . . . . 597 Lan Zhang, Xiangyang Zhang, and Tiejun Yang Research on Privacy Protection Methods for Data Mining . . . . . . . . . . . . . . . . . . . 609 Jindong He, Rongyan Cai, Shanshan Lei, and Dan Wu A Cluster-Based Facial Image Anonymization Method Using Variational Autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621 Yuanzhe Yang, Zhiyi Niu, Yuying Qiu, Biao Song, Xinchang Zhang, Yuan Tian, and Ran Guo IoT Security Research on the Security Diagnosis Platform for Partial Discharge of 10 kV Cables in Urban Distribution Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637 Xinping Wang, Guofei Guan, Chunpeng Li, Hao Zhang, Chao Jiang, and Qingwu Song Research on the Electromagnetic Sensor-Based Partial Discharge Security Monitoring System of Distribution Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651 Xinping Wang, Chunpeng Li, Xiaoping Yang, Feng Jiang, Qiqi Luan, and Yifan Wang Design of Ultrasonic-Based Remote Distribution Network Online Security Monitoring Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664 Qiqi Luan, Feng Jiang, Xinping Wang, Qingwu Song, Tianze Zhu, and Jiangbin Wang An Intelligent IoT Terminal Detection System Based on Data Sniffing . . . . . . . . 677 Tao Chen, Can Cao, and Pengcheng Ni Intelligent IoT Terminal Software Identification System Based on Behavior Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 Zhiyuan Ye, Cen Chen, Nuannuan Li, Wen Yang, Zheng Zhang, and Can Cao Research on the Identification Model of Power Terminal . . . . . . . . . . . . . . . . . . . . 701 Wenbo Shang, Xin Jin, Biying Sun, Xiaoqin Liu, and Chunhui Du

xvi

Contents

Research on Firmware Vulnerability Mining Model of Power Internet of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713 Chao Zhou, Ziying Wang, Jing Guo, Yajuan Guo, Haitao Jiang, Zhimin Gu, and Wei Huang How Can Social Workers Participate in Big Data Governance? The Third-Party Perspective on Big Data Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . 725 Di Zhao Power Network Scheduling Strategy Based on Federated Learning Algorithm in Edge Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737 Xiaowei Xu, Han Ding, Jiayu Wang, and Liang Hua Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749

Big Data and New Method

Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection of Multi-type Enterprises Based on Energy Consumption Big Data Mining Xiangguo Liu(B) , Jian Geng, Zhonglong Wang, Lingyao Cai, and Fanjiao Yin State Grid Shandong Electric Power Company, Tai’an Power Supply Company, Tai’an 271000, China [email protected]

Abstract. In view of the continuous improvement of the current energy consumption data of many types of enterprises, the effective monitoring of enterprise energy consumption and the early warning of different reward energy use will be the top priority. This paper proposes a data-driven energy efficiency evaluation and energy anomaly detection method for multi-type enterprises based on energy consumption big data mining. This method uses K-means Clustering algorithm identifies the energy consumption patterns of different enterprises, which is convenient for the evaluation of enterprise energy efficiency. Then, on the basis of pattern division, the outlier detection of enterprise energy consumption data is completed by CEEMDAN-LOF algorithm, and the abnormal energy consumption detection and research of enterprises are realized. The example uses the real energy consumption data of power grid enterprises, and the simulation results show the effectiveness of the proposed method. Keywords: Early warning of abnormal energy use · big data mining · K-means clustering · outlier detection

1 Introduction Energy conservation and emissions reduction has become the consensus of the whole society, is the only way to sustainable development [1, 2], so in the continuous improvement of the enterprise energy consumption data, the effective use of enterprise energy consumption data for energy consumption can effectively monitor and abnormal use of early warning is the most important [3], identification of abnormal energy consumption to reduce energy consumption of enterprises improves the management level of the power grid provides the positive support. Project Supported by Science and Technologyleft-248285 Project of State Grid Shandong Electric Power Company “Research on the key technologies for intelligent research and judgment of energy efficiency anomalies of multi type enterprises based on massive data mining” (5206002000QW). © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 3–13, 2023. https://doi.org/10.1007/978-981-99-3300-6_1

4

X. Liu et al.

Reference [4] proposed the concept of multi-energy system to obtain the optimal energy ratio through energy efficiency and energy consumption ratio. Reference [5] proposed the establishment of a comprehensive energy system evaluation index body through the related architecture of energy Internet. Reference [6] proposed a Stacking ensemble model for anomaly monitoring based on five ML models. Reference [7] designed different screening methods for measurement anomalies based on the differences in massive data features.

2 Determination of the Optimal Number of Clusters Evaluation index, SC is a common clustering algorithm for arbitrary bunch of class C j and a single sample of SC calculation formula is: b−a max(a, b)

(1)

1  (xi − cj )2 n1

(2)

1   (xi − xk )2 m

(3)

s= where: a=

xi ∈Cj

b=

xi ∈Cj xk ∈Cl

where: S is the contour coefficient of a single sample; A is the average distance between samples and all other points in class C j ; B is the average distance between the samples in class C l and all points in the samples in the nearest class C j ; C j is the center of mass of class C j ; M and N represent the number of samples in class C j and C l , respectively. However, SC will be overrepresented on convex clusters. Therefore, GSA algorithm is considered to be introduced on the basis of SC. GSA can estimate the optimal number of clusters in the dataset, and the effect is obvious in the scenario where the number of clusters is set at a low value. The clustering dispersion of K clusters is defined as: W (K) =

K  

(xi − cj )2

(4)

i=1 xi ∈Cj

The main idea of GSA algorithm is to calculate the natural logarithm of the corresponding cluster dispersion for each set cluster number K, and compare it with the threshold, so as to determine the optimal cluster number. After natural logarithm processing, the data situation tends to be more linearized, which makes it easier to determine the Gap value Gap(K) between clustering dispersion and threshold. The Gap value is defined as follows: Gap(K) = E ln[Wr (K)] − ln[W (K)]

(5)

where: R is the selected reference dataset, and E is the mathematical expectation of the reference dataset. The Gap (K) value contains the information of clustering dispersion

Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection

5

between the data set to be tested and the reference distribution. With the change of K, the Gap (K) value will also change, and the analysis of Gap (K) can determine the optimal number of clusters. The above describes the algorithm principles of GSA and SC respectively. However, the determination of the optimal clustering number by simply using GSA or SC methods may lead to misjudgment, resulting in insignificant clustering effect. Therefore, in order to improve the accuracy of clustering results, GSA-SC fusion algorithm was introduced to determine the optimal number of clusters, and the average value of Gap(K) and S was calculated to construct a new clustering evaluation index G, as shown in the following equation. G=

Gap(K) + s 2

(6)

3 Energy Consumption Pattern Recognition Based on K-Means Clustering K-means is based on partition clustering algorithm. Its main idea is that, for a data set given multiple data objects, it can be divided into several dissimilar clustering clusters by partition method, and the data objects of each cluster cannot belong to another cluster. The partition process is shown as follows (Fig. 1):

Select the iniƟal center

Produce a parƟƟon result

Are the results reasonable?

Y

The final result

N Modify the division

Fig. 1. Flowchart of partitioning based clustering algorithm

K-means clustering algorithm was proposed by MacQueen in L976, and its main idea is to divide the data set with N objects into K clusters. K-means algorithm generally uses Euclidean distance to measure similarity, and the evaluation function S of clustering effect can be defined as follows. S=

n k  

Dij xi − h2

(7)

i=1 j=1

The end of the partition is marked by the minimum sum of squared distances of data objects in each cluster. However, in the actual experiment, it is impossible to minimize the sum of squared distances of each cluster, and the partitioning algorithm will fall into the local optimal solution. The execution steps of K-means clustering algorithm are divided into the following four steps: Step 1: Randomly select the initial K cluster center points, ci ;

6

X. Liu et al.

Step 2: Traverse the points in the data space to obtain the nearest initial clustering center to the point, and finally merge this point into the class. Step 3: Calculate the mean of the sum of squared distances of points within each cluster, and take the mean as the next cluster center. Step 4: If the sum of squared distances of points in the cluster does not change, end the clustering; otherwise, iterate step 2 and step 3 continuously. The time complexity of k-means algorithm is O(nkd), where n represents the number of data points, K represents the number of clustering centers, and D represents the dimension of data points. For k-means algorithm, the time taken for each iteration is O(nkd)). If the number of iterations is T, then the time complexity of K-mean clustering algorithm is O(nkd). In fact, most of the time spent by k-means algorithm is to calculate the distance between data points. Future improvement direction can be aimed at reducing the time consumption and the nearest neighbor problem.

4 Abnormal Analysis of Energy Consumption Data In this module, the abnormal analysis of energy consumption data is completed by combining CEEMDAN algorithm and LOF algorithm, and the data analysis helps enterprises to better improve energy monitoring and abnormal detection of energy consumption data. 4.1 LOF Algorithm Density-based LOF algorithm is based on the definition of KTH distance, k-distance neighborhood, reachable distance and local reachable density of data objects. Related concepts are as follows: For a given dataset D, p ∈ D and O ∈ D, the distance between point P and O is denoted as D (p, O), and the KTH distance k(p) of point P is defined as: (1) There are at least k points O ∈ D, such that D (P, O ) ≤ D (P,O): (2) There are at most k−1 points O ∈ D such that D (P, O ) < D (P, O) holds. The set of points whose distance is not greater than the KTH distance of P is the k-distance neighborhood of point P, which is defined as follows. Nk (p) = {q ∈ D − {p}|d (p, o) ≤ k(p)}

(8)

For a given dataset D, p ∈ D, O ∈ D, the KTH distance of point O and the maximum distance between point P and O are the reachable distance, and the calculation formula is as follows: rr (p, o) = max{k(o), d (p, o)}

(9)

It can be understood that when point p is far away from point O, the reachable distance is the actual distance between the two points; When point P is close to point O, the reachable distance is the K distance of point p. That is, for a point o in Nk (p), when the reachable distance rk (p, o) is taken as the distance (p, o) between two points, it indicates that K of point P from point O is more likely to be far away from a point in the

Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection

7

p1 Reach-distk(p1,o)=k-distance(o) o Reach-distk(p2,o)

p2 Fig. 2. When k = 4, reach distance (P1 , O) and reach distance (P2 , O)

neighborhood. On the contrary, if rk (p, o) takes k(O), then for the point O in Nk (p), the probability that point p is in the neighborhood of K distance of point O is large (Fig. 2) The reciprocal of the average reachable distance of all points in the K-neighborhood Nk (p) based on point P is the local reachable density of point P, and the calculation formula is: lk (P) = 

|Nk (p)| o∈Nk (p) rk (p, o)

(10)

According to the local reachable density, the greater the distance between a data point and its neighbors, the smaller the density between them, and vice versa. That is, the locally accessible density reflects the density of the local area around the data point. The local anomaly factor is expressed as the average of the ratio between the local accessible density of other points in N(P) and the local accessible density of point P. The calculation formula is as follows:  o∈Nk (p) lk (o) (11) Lk (p) = |Nk (p)|lk (p) The local anomaly factor reflects the degree of outliers between point P and its neighbors. If the value of Lk (p) approaches 1, it means that the local density of point P is close to that of its neighbors. If the value of Lk (p) is greater than 1 and larger, it means that point P is more distant from neighboring points and can be regarded as a possible outlier. Therefore, the outlier degree of any point P in the data set can be judged by calculating Lk (p). The calculation flow of LOF algorithm is as follows: (1) Determine the parameter, k, and calculate the k-th distance. For a given dataset D, there is one sample data Pi ∈ D, i = 1, 2, n, k is a positive integer. Determine the value of parameter k and calculate each point P in dataset D and its k distance k(pi ) in its neighborhood Nk (pi ).

8

X. Liu et al.

(2) Calculate the accessible distance and the local accessible density. k(pi ) obtained from the upper dragon was used to calculate the accessible distance between p: and the k distance from all the points in the neighborhood Nk (pi ), thus calculating the local accessible density of p of lk (pi ) (3) Calculate the local anomalous factor value and divide the anomalous data set. Local anomaly factor lk (pi ) is calculated for all sample data in data set D and compared with the threshold 1. If local anomaly factor lk (pi ) ≤ 1, data P is considered normal and otherwise outliers. The local anomaly factor of all points in the dataset compared to 1. For comparison, data greater than 1 are divided into anomalous data sets, and the algorithm ends. 4.2 CEEMDAN Algorithm The specific steps of CEEMDAN are as follows: (1) Add white Gaussian noise εωi (n) to the signal X(n) to be decomposed: x(n) = x(n) + εωi (n), where ω I (n) represents the white Gaussian noise added in the ith experiment, ε represents the amplitude of the noise, and the first IMF component can be obtained by using EMD algorithm to decompose the above expression for I times and calculate the mean value: 1 IMF1 (n) I I

IMF1 (n) =

(12)

i=1

IMF1 (n) denotes the first IMF component obtained using CEEMDAN. (2) By subtracting the first F-component from the original signal to be decomposed, the margin after the first step of decomposition can be obtained: r1 (n): r1 (n) = x(n) − IMF1 (n); (3) The second IMF can be obtained by 1 decomposition experiment of R1 (n) = x(n) − IMF1 (n) using EMD [8] algorithm and taking the average value; (4) For k = 2, 3 … K, calculate the KTH allowance rk (n),   The k + 1 IMF component can be obtained by decomposing rk (n) + Ek εωi (n) I times and taking the average value:   1  Ei πk (n) + εEk ωi (n) I I

IMFk+1 (n) =

(13)

i=1

where IMFk+1 (n) represents the k + 1 IMF component obtained using CEEMDAN algorithm. (5) Repeat until you can’t break it down.

5 Example Simulation and Analysis The example data adopts the historical energy consumption data of a cement enterprise as the research sample to study the abnormal data detection proposed in this section. Daily level data of the enterprise from January 1, 2020 to December 31, 2020 are selected as the detection data, a total of 365 data.

Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection

9

5.1 Enterprise Energy Consumption Mode Division The historical energy consumption data of enterprises were normalized. The figure shows the distribution of the actual and normalized values of the enterprise’s historical energy consumption. It can be seen from the figure that the energy consumption is mainly distributed within the interval [0, 2.5], but there is an isolated point (the actual value is 20.45). If the traditional clustering algorithm is used for analysis, because the number of clusters is a hyperparameter, it needs to be artificially input, which leads to the optimal number of clusters of samples is not easy to determine and has strong randomness. Therefore, the GSA-SC fusion algorithm is considered to determine the optimal clustering number of energy consumption set (Fig. 3).

Fig. 3. Schematic diagram of historical energy consumption distribution of enterprises

Due to the shut-down and refurbishment during the data statistics period, the total energy consumption, coal consumption and power consumption of the enterprise are all 0 data, so this type of 0 data is ignored, and the preprocessed data is shown in Fig. 4.

Fig. 4. Data display after preprocessing

In order to verify the superiority of the GSA-SC algorithm presented in this paper, GSA and SC are respectively used for comparative analysis, and the results are shown in

10

X. Liu et al.

the table. As can be seen from the table, if only GSA method is used to judge, when K ≥ 3, the change trend of Gap(K) is not obvious, and all values are very similar, leading to the difficulty in determining the optimal cluster number. If only SC algorithm is used for analysis, when K is 2 and 3, its corresponding S value is close, and the optimal cluster number is not easy to determine. When the GSA-SC algorithm proposed in this paper is used, it is easy to determine that when K is 3, G value is the largest, and no neighboring point value is similar to it. Therefore, the optimal number of clusters was selected as 3 (Table 1). Table 1. Effect comparison of GSA-SC algorithm K

GSA

SC

GSA-SC

2

0.241

0.42

0.3305

3

0.302

0.43

0.366

4

0.305

0.35

0.3275

5

0.304

0.312

0.308

6

0.311

0.29

0.3

The binary K-means++ algorithm was used for cluster analysis to obtain different energy consumption patterns. The clustering results are shown in the figure, and the details of cluster classes are shown in the table. According to the information in the table, the energy consumption data are divided into three clusters: A, B and C. Before normalization, the standard interval of cluster A is [5.01 TCE, 10.54 TCE], the standard interval of cluster B is [11.9 TCE, 16.6 TCE], and the standard interval of cluster C is [4.56 TCE, 9.04 TCE].

Fig. 5. Clustering results of enterprise energy consumption data

Table 2 shows the data clustering results achieved by combining IBM SPSS Statistics software. As can be seen from the table, when the number of cluster classes is set to

Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection

11

the optimal number of cluster classes 3, the final cluster center and the number of cases in each cluster are completely consistent with the situation in the figure. It can be seen from Table 2 combined with Fig. 5. Table 2. Number of cases in each cluster class Clusters of class

A

B

C

Numbers

93

67

158

5.2 Abnormal Energy Consumption Detection for Enterprises As shown in Fig. 6, the solid red line is the running trend of the energy consumption time series curve of enterprises. Figure 7 shows that the low-frequency recombination sequence obtained by CEEMDAN is relatively flat and has the ability to resist outliers, which can well represent the running trend of the energy consumption time series curve of enterprises.

Number of days/d

Fig. 6. Comparison of energy consumption data and operation trend of enterprises

Figure 7 for the primitive LOF detection under the temporal sequence k = 30 result diagram, as shown in Fig. 7, you can see the serial number 0–20 and other time data is identified as the discrete points, this is because the test results by this period enterprise regular changes in the energy use, lead to the results of the discriminant deviation, thus unable to accurately identify outlier data points. As shown in Fig. 8, compared with the original outlier detection, CEEMDAN-LOF outlier data detection method adopted in this paper takes into account the running trend of time series, and LOF values of outlier data and normal data are significantly different, which is convenient to distinguish group data from normal data. The detailed diagnostic results of the two algorithms are shown in Table 3. When k = 10, 20, 30, and 40, there are many false and missed values when the original sequence

12

X. Liu et al. Normal MiscalculaƟo Residual valu

6 5

LOF value

Outliers 4 3 2 1 The sequential

Fig. 7. Adopts LOF outlier data detection method

is used to detect the outlier data. However, there is no false or missed values when CEEMDAN is used to detect the outlier data after eliminating the sequence running trend. It further validates the effectiveness of the method used in this paper to detect abnormal data. Table 3. Detection results using LOF and CEEMDAN-LOF algorithms K

LOF Number of outliers

CEEMDAN-LOF The number of residual

Number of mistakes

Number of outliers

The number of residual

Number of mistakes

10

10

7

0

17

0

0

20

15

8

6

17

0

0

30

17

13

13

17

0

0

40

18

11

12

17

0

0

Fig. 8. CEEMDAN-LOF outlier data detection method

Data-Driven Energy Efficiency Evaluation and Energy Anomaly Detection

13

6 Conclusion Driven by the presented data based on the energy consumption of big energy efficiency evaluation and the type of data mining enterprises can use anomaly detection method to solve the energy consumption of the anomaly detection problem under the huge amounts of data, and for different types of enterprise energy consumption data monitoring and puts forward the corresponding method of energy efficiency for the enterprise, the power grid to improve the management level.

References 1. Guo, Y., Zhang, X., Jiang, F., Zhang, J.: Research and management of water supply energy saving technology based on carbon peaking and carbon neutralization target. Water Supply Drainage 1–5 (2022). Accessed 28 Aug 2022 2. Zhang, B., Lan, C.: Research on the present situation and countermeasures of energyconservation and emission-reduction in China’s iron and steel industry. In: 2010 Asia-Pacific Power and Energy Engineering Conference, pp. 1–4 (2010) 3. Zhang, J., Lei, Y.: Research on the multi-objective optimized model of power generation dispatching based on low-carbon economy and energy-conservation. In: The 27th Chinese Control and Decision Conference (2015 CCDC), pp. 1651–1655 (2015) 4. Tian, Y., Yu, Z., Zhao, N., Zhu, Y., Xia, R.: Optimized operation of multiple energy interconnection network based on energy utilization rate and global energy consumption ratio. In: 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), pp. 1–6 (2018) 5. Liu, H., Li, Z., Cheng, Q., Yao, T., Yang, X., Hu, Y.: Construction of the evaluation index system of the regional integrated energy system compatible with the hierarchical structure of the energy Internet. In: 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2), pp. 342–348 (2020) 6. Jian, S., Li, W., Peng, X., Yan, Z., Cheng, C., Yuan, H.: Abnormal detection of power consumption based on a stacking ensemble model. In: 2021 4th International Conference on Energy, Electrical and Power Engineering (CEEPE), pp. 1021–1026 (2021) 7. Li, Z.: Abnormal energy consumption analysis based on big data mining technology. In: 2020 Asia Energy and Electrical Engineering Symposium (AEEES), pp. 64–68 (2020) 8. Han, Z., Zhang, C., Gao, M.: Hybrid energy storage planning for island-type integrated energy system based on EMD decomposition. Therm. Power Gener. 2022(09), 72–78 (2022). Accessed 28 Aug 2022 9. Arkhangelski, J., Abdou-Tankari, M., Lefebvre, G.: Efficient abnormal building consumption detection by deep learning LSTM IOT data classification. In: 2022 11th International Conference on Renewable Energy Research and Application (ICRERA), pp. 125–129 (2022) 10. Zhang, W., Dong, X., Li, H., Xu, J., Wang, D.: Unsupervised detection of abnormal electricity consumption behavior based on feature engineering. IEEE Access 8, 55483–55500 (2020)

Research on Data-Driven AGC Instruction Execution Effect Recognition Method Haiyang Jiang1 , Hongtong Liu2(B) , and Yangfei Zhang2 1 State Grid Heilongjiang Electric Power Company, Harbin 150090, Heilongjiang, China 2 School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167,

Jiangsu, China [email protected]

Abstract. With the high penetration of random energy such as wind power and photovoltaic in the power grid, the influence of the accuracy of regulation of traditional thermal power units on the operation of the power grid is gradually increasing. Aiming at the problem of the deviation between the actual output of thermal power units and the AGC command of the power grid, this paper proposes a data-driven AGC command execution effect identification method. Firstly, based on Kernel Principal Component Analysis (KPCA), a data preprocessing method is proposed, which maps feature datasets into low-dimensional vectors to achieve dimensionality reduction. Secondly, the Independent Recurrent Neural Network (IndRNN) is used to process and predict the dimensionality reduction data, so as to realize the accurate perception of the adjustment effect of the unit execution command. Finally, the real power grid data is used to simulate and verify the proposed method. The results show that the model can effectively reduce the deviation of instruction execution. Keywords: recurrent neural network · kernel principal component analysis · Long short-term memory network · gated recurrent unit

1 Introduction Automatic generation control (AGC) has become increasingly mature and widely used after years of research [1, 2], in recent years, the installed capacity of renewable energy, mainly wind energy and photovoltaic energy, has increased year by year, due to the uncertainty of renewable energy output, it is necessary for the generator to undertake heavy output regulation task for a long time, however, the regulating capacity of thermal power units is far from meeting the demand of frequency modulation [3]. Therefore, it is urgent to improve the regulation capacity of thermal power units to deal with the problems caused by new energy grid connection. In the field of active power scheduling under the background of new energy generation, it is difficult to improve the regulation ability of thermal power units by traditional frequency modulation strategy optimization. However, with the rise of intelligent algorithms in recent years, more and more scholars use intelligent algorithms to improve the regulation ability of thermal power © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 14–24, 2023. https://doi.org/10.1007/978-981-99-3300-6_2

Research on Data-Driven AGC Instruction Execution Effect Recognition Method

15

units. Literature [4] proposed a multi-model predictive function control strategy and applied it to the design of power regulation control system to achieve better primary frequency modulation performance of thermal power units. Literature [5] proposed a deep fusion learning model based on convolutional neural network and fully connected neural network for short-term wind power generation prediction. In addition, in recent years, the regulation technology based on artificial intelligence technology has been gradually paid attention to by researchers. Literature [6] proposed an online decision method for unit recovery based on deep learning and Monte Carlo search and obtained an online recovery scheme with high robustness. Literature [7] proposed a self-learning decision-making method for unit combination based on longterm deep learning network. Compared with traditional methods, the decision accuracy and efficiency are improved, and the decision results are more robust. The strategies proposed in the above literatures are still based on the premise that thermal power units can accurately respond to AGC instructions, and they cannot overcome the problems caused by low response accuracy of thermal power units. To solve the above problems, this paper firstly uses deep recurrent neural network (DRNN) to predict and perceive the accuracy of AGC control command execution. Secondly, a preprocessing strategy for the input parameters of the deep network model is proposed to improve the overall training speed and output accuracy of the model. Finally, the real unit operation data in a provincial power grid is used for simulation analysis, and the advantages of the proposed preprocessing strategy in model training and convergence are verified. It is proved that the prediction model proposed in this paper can accurately predict the unit AGC command.

2 Independent Recurrent Neural Network 2.1 Recurrent Neural Network Recurrent neural network [8] (RNN) is a commonly used model for predicting time series. Its most important feature is that the state of cyclic neurons at a certain time not only depends on the current input, but also can be affected by the input before the current time. Such characteristics make it very suitable for solving time series prediction problems. The RNN is not only fully connected from layer to layer, but also the output and input of the hidden layer are connected through a specific arithmetic unit, forming a feedback loop and forming a flip-flop like structure, which endows the RNN unit with the ability to remember the state of the previous cycle. Figure 1 shows a standard single-layer RNN network.

16

H. Jiang et al. Output layer

O

Hidden layer

S

Input layer

X

W

Fig. 1. Structure diagram of recurrent neural network

Where X represents the value of the Input layer, S represents the value of the Hidden layer, O represents the value of the Output layer, U represents the Input weight matrix represents the Output weight matrix, and W represents the cyclic weight matrix. To solve more complex problems, the RNN network can be superimposed to build a deep recurrent neural network (DRNN) model. The DRNN model has more nonlinear features and weight parameters and can fit more complex functions. Figure 2 shows the structure diagram of a DRNN with three hidden layers.

Fig. 2. Deep recurrent neural network

Research on Data-Driven AGC Instruction Execution Effect Recognition Method

17

2.2 Improvement of Deep Recurrent Neural Network To solve the problem of gradient disappearance or gradient explosion in DRNN network, an improvement is made on the basis of RNN, long short-term memory network [9] (LSTM) and gated recurrent unit [10] (GRU) are proposed. These two network structures improve the problem of gradient disappearance or gradient explosion to a certain extent. However, because tanh or sigmoid is used as the activation function, the gradient attenuation between network layers is caused, and it is difficult to construct and train RNN-based deep LSTM or GRU. The above problems can be solved effectively by using Independent Recurrent Neural Network (IndRNN). The status update formula of IndRNN is: ht = σ (Wxt + Uht−1 + bh )

(1)

Stands for Hadamard product, This is the biggest difference with traditional RNN networks. A neuron in the cell of IndRNN at each time is only related to the corresponding neuron in the cell of other time, and has nothing to do with other neurons. Therefore, the connections between neurons can be achieved by stacking multiple layers of IndRNN units, making the IndRNN expand to deeper layers. For the n neuron, the hidden state at time t can be calculated as follows:   (2) hn,t = σ Wn xt + un  hn,t−1 + bn IndRNN presents a new perspective of aggregating spatial patterns (via W ) independently over time steps (via U ). By stacking multiple layers of neurons, each neuron in the next layer independently processes the output of all neurons in the previous layer, which reduces the difficulty of building deep network structures and enhances the processing ability of long sequence problems.

3 KPCA Preprocessing Process The input preprocessing process based on KPCA is as follows: 1) The parameters of KPCA preprocessing model are adjusted. The Grid Search (GS) algorithm with k-fold cross-validation was used to Search for the optimal KPCA preprocessing model hyperparameters. 2) Data normalization stage. The feature variables were input into the model in the form of vectors, and the corresponding data sets of the feature variables were standardized to map the data within the range of 0–1. 3) Calculate eigenvalues and eigenvectors. Firstly, the preprocessed data is mapped to a high-dimensional space by constructing a kernel function. Secondly, the covariance moments are obtained by calculating the eigenvectors and eigenvalues of the covariance matrix in the high-dimensional space. Finally, the calculated eigenvalues are sorted.

18

H. Jiang et al.

4) Principal components are determined by calculating the contribution rate of eigenvalues. First, calculating the contribution rate of each characteristic value, the characteristics of the contribution rate is greater than the fixed threshold value in a collection of C, and calculate the collection C characteristic value in the cumulative contribution rate, and if the cumulative contribution rate is greater than the threshold, then determine the characteristic value from the set of feature vector of the principal component number, and will get the number of principal component dimension reduction after save to set U ; If the cumulative contribution rate is less than the threshold, adjust the parameters and repeat step (4). 5) Judge whether GS algorithm completes the traversal of parameters. If so, select the KPCA model with the optimal parameters according to the maximization of the principal component contribution rate, and output the set U of the optimal KPCA model. If not, return to step 3. Figure 3 is the flow chart of input preprocessing based on KPCA. Start

Determine the optimization parameters, value range and step size

K-fold cross validation and grid search algorithm search the traversal parameters to obtain the KPCA model

The preprocessed dataset is fed into the KPCA model

Establish kernel function

Calculate the contribution of eigenvalues

Set C is composed of eigenvalues whose contribution rate is greater than a given threshold

Adjusting Parameter Values

Calculate the cumulative contribution of eigenvalues

Cumulative contribution rate is greater than the threshold

No

Yes No

GS completes the parameter traversal

Yes Determine the pivot number and save the number of pivot entries into the set U According to the maximum principal cumulative contribution to construct the optimal parameters of KPCA model

Output the pivot set U of the optimal KPCA model

End

Fig. 3. Flow chart of input variable preprocessing based on KPCA

Research on Data-Driven AGC Instruction Execution Effect Recognition Method

19

4 Identification Process of AGC Instruction Execution Effect The identification process of AGC instruction execution effect based on KPCA preprocessing strategy and DIndRNN model is as follows: 1) Data preprocessing. The original data of unit operation were normalized. 2) Construct DIndRNN model. The preprocessed data set was used as input to construct DIndRNN model. 3) Train DIndRNN model. The preprocessed training data is input into the model, and the predicted value obtained is compared with the training label. The weight of DIndRNN model is updated through back propagation, and step (3) is repeated until the training of DIndRNN model is completed. 4) DIndRNN model deployment. The preprocessed test data set is input into the trained DIndRNN model, and the predicted response value of the unit to AGC instruction is output, so as to realize the identification of the effect of the unit executing AGC instruction (Fig. 4).

Start

Raw data of unit output sequence

Data were normalized

The original data was input to K-fold cross validation of the KPCA model constructed by grid search

The training dataset after dimensionality reduction

Construct DIndRNN model

Output predicted value Update the weight The predicted value is compared with the data label

No Complete the training Yes DIndRNN model deployment

Import the dimensionally reduced test dataset

Output predicted value

End

Fig. 4. Flowchart of AGC instruction execution effect identification algorithm

20

H. Jiang et al.

5 Example Analysis 5.1 Basic Information of the Example To verify the validity and rationality of the model and control method proposed in this paper, real data of power grid are introduced as the analysis object. The data time span is half a year, the sampling frequency is 1 min, and the original data is 262,080 pieces. The models used in the examples are built by Tensorflow2.0, a deep learning framework owned by Google. The language is python3.7.6; The computer used is configured with Intel(R)Core(TM)[email protected] GHz and 16G memory. 5.2 Comparative Analysis of Training Set and Validation Set Randomly selected 80 predicted points, the prediction accuracy of DIndRNN model on training set and validation set is compared, and the single-step prediction result is shown in Fig. 5. It can be clearly seen from the figure that, in both the training set and the validation set, DIndRNN model can better give the output prediction results of the unit under AGC command, which reflects the good generalization ability of DIndRNN model, that is, it can also have a good effect on the data not in the training set.

Fig. 5. Comparative analysis of prediction results of training set and validation set

5.3 Comparative Analysis of DIndRNN Model and MLP Model Figure 6(a) shows the prediction results of DIndRNN model and MLP model. The red broken line represents the DIndRNN model prediction result, the blue broken line represents the MLP model prediction result, and the green broken line represents the real unit output value. MLP model presents a large error in predicting unit output sequence, while DIndRNN model has a higher prediction accuracy. This is because the MLP model cannot store sequence information and is difficult to process sequence data. DIndRNN has the ability of memory and has a good effect on sequence prediction.

Research on Data-Driven AGC Instruction Execution Effect Recognition Method

21

(a)Comparison between DIndRNN model and MLP model

(b) Compared with the MLP model prediction error distribution DIndRNN model

Fig. 6. DIndRNN model and MLP model analysis

Figure 6(b) shows the prediction error distribution of DIndRNN model and MLP model in the form of histogram. The horizontal coordinate of the histogram is each interval of the error distribution, and the vertical coordinate is the proportion of the model prediction error within the interval. The curve in the figure is the kernel density fitting curve. The orange histogram represents the error distribution of the DIndRNN model, while the blue histogram represents the error distribution of the MLP model. As can be seen from Fig. 6(b), the error distribution of DIndRNN model is far more concentrated than that of MLP model. This also shows from another aspect that DIndRNN model has a smaller error value (Table 1).

22

H. Jiang et al. Table 1. DIndRNN model and MLP model evaluation index

Evaluation index

DIndRNN

MLP

MAE

3.4211

8.5662

MSE

46.3561

132.1398

5.4 Influence of Preprocessing Strategy on Model Prediction Accuracy 1) Comparison of different dimensionality reduction models. In order to verify the effectiveness of the KPCA dimensionality reduction model, the KPCA dimensionality reduction model, local linear embedding (LLE) and principal component analysis (PCA) models are compared and analyzed in this section. Different dimensionality reduction models are used to process the original data, and DIndRNN model is used to output the data after processing. The evaluation indexes of the prediction results are shown in Table 2. Table 2. Different pretreatment strategy prediction precision quantitative contrast The evaluation index

KPCA

LLE

PCA

MAE

7.3239

16.3661

138.9642

MSE

110.2366

588.6422

36142.6866

As can be seen from the table, the linear dimensionality reduction method PCA has been unable to effectively process the data set in this paper, and too much effective information has been lost in the dimensionality reduction process, leading to the failure of the model to give prediction results. The nonlinear dimensionality reduction methods KPCA and LLE perform much better. Among them, KPCA model is better than LLE model. 2) Analysis of model prediction accuracy based on dimensionality reduction. Table 3 shows the prediction accuracy of DIndRNN model under different dimensionality reduction dimensions. Table 3. Comparison of model training errors in different dimensions The evaluation index

3d

4d

5d

MAE

11.8624

7.3239

7.6322

MSE

368.2402

110.2366

14.6988

Research on Data-Driven AGC Instruction Execution Effect Recognition Method

23

It can be seen from Table 3 that the prediction accuracy is better in 4d than in 3d. However, when the dimension changes from 4d to 5d, the prediction accuracy drops slightly. Dimension 4 is the optimal parameter. 3) The influence of preprocessing on the model training process. Figure 7 shows the error reduction of DIndRNN model in the training process whether KPCA dimensionality reduction preprocessing process is adopted.

Fig. 7. Diagram of the effect of preprocessing on model training

It can be seen from the orange broken line that the error decreases rapidly in the first five iterations, and the rate of decline is faster than that of the blue broken line. In addition, after 30 iterations, both kinds of broken lines tend to be flat. Compared with the blue curve, the error value of the yellow curve is smaller, which indicates that the KPCA dimension reduction strategy effectively simplifies the model.

6 Conclusion In this paper, an AGC instruction identification method based on independent recurrent neural network is proposed. The main conclusions are as follows: 1) The KPCA dimension reduction model constructed in this paper can effectively eliminate the factors that have little correlation with unit output. 2) Compared with the traditional recurrent neural network model, the DIndRNN prediction model constructed in this paper has better prediction accuracy. 3) Using KPCA preprocessing strategy and DIndRNN model, the dimensionality of data can be reduced under the premise of retaining as much effective information as possible, shorten the training time of the model, and improve the prediction accuracy.

24

H. Jiang et al.

References 1. He, C., Wang, H., Wei, Z., et al.: Distributed collaborative real-time control of wind farm and AGC unit. Acta Electr. Eng. Sin. 35(02), 302–309 (2015) 2. Liao, X., Liu, K., Le, J., et al.: Coordinated control strategy for cross regional AGC units based on bi level model predictive structure. Acta Electr. Eng. Sin. 39(16), 4674–4685+4970 (2019) 3. Lin, L., Zou, L., Zhou, P., Tian, X.: Multi-angle economic analysis on deep peak regulation of thermal power units with large-scale wind power integration. Autom. Electr. Power Syst. 41(07), 21–27 (2017) 4. Shu, J., et al.: Primary frequency modulation control strategy in deep peak load cycling state for thermal power units based on multi-model predictive control. Boiler Technol. 51(06), 12–17 (2020) 5. Hossain, M.A., Chakrabortty, R.K., Elsawah, S., Gray, E.M., Ryan, M.J.: Predicting wind power generation using hybrid deep learning with optimization. IEEE Trans. Appl. Supercond. 31(8), 1–5 (2021) 6. Sun, R., Liu, Y.: Online decision making of unit recovery based on deep learning and Monte Carlo tree search. Power Syst. Autom. 42(14), 40–47 (2018) 7. Yang, N., Ye, D., Lin, J., et al.: Research on intelligent decision-making method of unit commitment with self-learning ability based on data-driven. Chin. J. Electr. Eng. 39(10), 2934–2945 (2019) 8. Zhao, H., Xue, W., Li, X., et al.: Multi-mode neural network for human action recognition. IET Comput. Vis. 14(8), 587–596 (2020) 9. Xuejun, Y., Yiqun, Z., Shuxian, S., et al.: LSTM network for carrier module detection data classification. In: 2019 International Conference on Smart Grid and Electrical Automation (ICSGEA), Xiangtan, China (2019) 10. Dong, M., Grumbach, L.: A hybrid distribution feeder long-term load forecasting method based on sequence prediction. IEEE Trans. Smart Grid 11(1), 470–482

Research on Day-Ahead Scheduling Strategy of the Power System Includes Wind Power Plants and Photovoltaic Power Stations Based on Big Data Clustering and Filling Feng Qi1 , Qiang Wang2 , Xiaoqiang Wei1 , Yiping Zhang1 , Wenbin Wu1 , Donyang He1 , Taicheng Wang3(B) , and Suyang Shen3 1 State Grid Heilongjiang Electric Power Co., Ltd., Harbin 150090, Heilongjiang, China 2 Jixi Power Supply Company of State Grid Heilongjiang Electric Power Co., Ltd., Jixi 158100,

China 3 School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167,

Jiangsu, China [email protected]

Abstract. Traditional power system scheduling optimization methods cannot fully deal with the massive data brought by the increase of new energy penetration. Aiming at the above problems, this paper proposes a day-ahead scheduling optimization method based on big data clustering and filling for power system includes wind power plants and photovoltaic power stations. Firstly, according to the collected historical data of power grid, the K-means clustering method of big data is used to generate representative load, wind power and solar illumination scene sets. Secondly, a missing value filling method based on historical data assisted scene analysis was proposed. Finally, the day-ahead scheduling model of power system with wind power plants and photovoltaic power stations is established and solved by improved particle swarm optimization algorithm. The simulation results show that this method can improve the filling accuracy of the missing value of the dayahead dispatching historical data of the power system, and meet the development demand of the power system with new energy. Keywords: Data filling · Power system includes new energy · Scene analysis · Wave cross correlation analysis · Reactive power optimization

1 The Introduction With the continuous improvement of power grid automation and the wide application of power grid data intelligent acquisition system, the power grid data collection system is becoming mature. However, the lack of frequency and accuracy in the collection process of power grid data inevitably leads to some missing values in the data, so as to interfere with the process of data analysis and affect the final identification effect of the model. Therefore, how to effectively fill the missing value of power grid data has gradually become a difficult problem to be solved. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 25–36, 2023. https://doi.org/10.1007/978-981-99-3300-6_3

26

F. Qi et al.

In recent years, the research on big data [1] has been upsurge around the world. Big data technology has injected fresh blood into the development of smart grid and achieved good results [2]. Literature [3] proposes a medium and long-term load forecasting method for power system based on big data clustering. The calculation results show that the big data clustering algorithm can effectively cluster and partition a large number of load data, realize the discrimination and prediction of loads with different growth characteristics, and has high prediction accuracy. Literature [4] proposes complete compatibility classes based on the incomplete data fill algorithm, the energy consumption of large data management model, to help green data center, effective management of solar energy and other resources to match the utility to provide stable and adequate power supply, thus providing efficient service system for the whole data center energy services. Literature [5] proposes a missing data filling algorithm for the unified data model of panoramic regulation, which uses the improved Markov Monte Carlo method to estimate the missing data according to the known data, and obtains the best parameters of incomplete regulatory data through fewer iterations, thus improving the accuracy of missing data estimation. This paper proposes a day-ahead scheduling optimization strategy of power system based on big data clustering and filling. Firstly, the K-means clustering method of big data is used to generate representative load, wind power and solar illumination scene sets. Then, a missing value filling method of power grid data based on multi-parameter IOT fusion technology is proposed. The fluctuation cross-correlation analysis is carried out according to the historical data of power grid. By combining the weights and the dynamic time warping distance to measure a certain date missing attribute data is similar to the date of the known data integrated similarity to find out the attribute the date of the highest similarity, and the date at the same time to missing data combined with the linear fitting curve data attribute data fill further improved the precision of data missing value fill. Finally, the day-ahead scheduling model of the power system includes wind power plant and photovoltaic power station is established, and the improved particle swarm optimization algorithm is used to solve the problem. The simulation results show that the data after clustering can meet the requirements of data integrity and accuracy including day-ahead scheduling optimization of system with wind power plants and photovoltaic power stations.

2 K-Means Clustering Method for Massive Power Grid Data The process of grouping a collection of physical or abstract objects into multiple classes consisting of similar objects is called clustering. A cluster generated by clustering is a collection of data objects that are similar to objects in the same cluster but different from objects in other clusters. Dissimilarity is calculated according to the membership value of the description object, and distance is often used as a measure. With the increasing perfection of the power grid information collection system, a large amount of network data can be collected. Therefore, the selected clustering method is required not only to be able to cluster quickly and accurately when the amount of information is not large, but also to adapt to the demand of increasing data volume and to output the clustering results timely and quickly. Because k-means algorithm can deal

Research on Day-Ahead Scheduling Strategy of the Power System

27

with large data sets, has good scalability and high efficiency, is simple and fast, can adapt to the demand of real-time processing of data volume growth, and is widely used in large-scale data clustering, so K-means algorithm is selected to cluster samples in this paper. The steps of classical k-means algorithm are as follows: 1) Determine the number of clusters N and the maximum number of iterations M. 2) Choose any N objects in the sample as the initial clustering center. 3) Calculate the Euclidean distance between the samples and N objects, and classify the samples according to the size of the Euclidean distance. The Euclidean distance is defined as:   p  (xik − xjk )2 (1) dij =  k=1

In the formula: xik represents the observed value of the ith variable of the kth sample; p represents the number of samples; dij Represents the Euclidean distance between sample j and sample i. 4) Calculate the average value of various objects and update the clustering center. 5) Calculate the square error criterion function to determine whether convergence conditions are met. If convergence occurs, the algorithm ends. If it does not converge, judge whether the number of iterations is greater than M. if it is less than M, go to the third step, otherwise, the algorithm ends. The squared error criterion function is: E=

k  

(xq − mi )2 , mi =

i=1 xq ∈Ci

 xq nCi

(2)

xq ∈Ci

6) Output clustering results The validity of clustering can be from two aspects of the distance between the class and class distance measure, class distance means that the condensation degree of similar samples, the distance between the class means that between different classes of separating degree, thus a good clustering results, should meet the distance between the class, class distance is small, so the greater the similarity of the same class, the greater the difference of the sample in different classes. Contour coefficient is an index that comprehensively reflects the similarity and difference between classes, so it can be used to evaluate the quality of clustering and determine the reasonable number of clusters. The contour coefficient S is defined as: S=

n  si i=1

n

(3)

In the formula, n represents the total number of samples; si represents the contour coefficient of sample i, its definition is: si =

yi − xi max{xi , yi }

(4)

28

F. Qi et al.

In the formula, xi represents the average distance between the ith sample in class x and other samples in class x, and represents the degree of cohesion in the class; Calculate the average of distances between xi and samples in all classes except x, and denote yi as the minimum value of this average. Obviously, the larger the value, the higher the clustering quality.

3 Missing Data Processing Strategy Based on Historical Data Assisted Scene Analysis In the field of day-ahead reactive power optimization of new energy grids, the accuracy and completeness of data have a great impact on the line loss analysis results. However, with the exponential growth of collected data, the problem of missing voltage data caused by manual input and acquisition device failure occurs frequently, so the missing data need to be identified or completed. The traditional maximum expected value algorithm [6], K-proximity algorithm [7] and other methods provide solutions, but the filling effect is not ideal because historical data are rarely used as the analysis basis. In recent years, the world raised a hot wave of big data research, big data technology for smart grid development injected fresh blood, and good results were obtained, therefore this section presents a missing value based on historical data aided scenario analysis to fill method, further improve the missing value in the line loss analysis fill precision at the same time, meet the demand of power grid development, The overall process is shown in Fig. 1: The specific steps are as follows: Step 1: Obtain the historical data of the power grid and go to Step 2. Step 2: The fluctuation cross-correlation between the known attribute data and the missing attribute data at the same time was calculated by the fluctuation cross-correlation analysis algorithm, and then Step 3 was performed. The calculation steps are as follows: 1) Certain two time series of equal length xi and yi , i = 1, 2, · · · , N 2) Calculate the sum of differences between xi , yi and the average value: ⎧ l

⎪ ⎪ (xi − x), l = 1, 2, · · · , N ⎨ x(l) = i=1

l

⎪ ⎪ ⎩ y(l) = (yi − y), l = 1, 2, · · · , N

(5)

i=1

In the formula, l represents the sampling length, x(l) and y(l) respectively represent the sum of differences between xi , yi and the average value at sampling length l, x and y respectively represent the average value of xi and yi . 3) Calculate the forward difference representing xi and yi autocorrelation respectively: x(l, l0 ) = x(l0 + l) − x(l0 ), l0 = 1, 2, · · · , N − l (6) y(l, l0 ) = y(l0 + l) − y(l0 ), l0 = 1, 2, · · · , N − l

Research on Day-Ahead Scheduling Strategy of the Power System

29

Start

Obtaining grid data Analyze data attributes, get M properties and set N=1 Input attribute data Calculate the wave cross correlation

N

Whether the threshold is exceeded

N=N+1

Y Discarded

N 0, it means xi and yi are positively correlated; When hxy < 0, it means that xi and yi are negatively correlated; A higher value of hxy indicates a higher degree of correlation between xi and yi . Step 3: If the fluctuation correlation between a known attribute data and the missing attribute data exceeds the comparison threshold, the known attribute data is retained and the missing attribute data goes to Step 4. Otherwise, the known attribute data is discarded. Step 4: The attributes corresponding to the M known attribute data retained are called Know attributes, and the attributes corresponding to the missing attribute data are called Unknow attributes. The combined weights of Know attributes and Unknow attributes are calculated respectively. In this step, the combined weight wj of Know attribute j and Unknow attributes are calculated by the following formula: ⎧ cj wj =

⎪ M ⎪ ⎪ ⎨ cj j=1 (9) M

⎪ ⎪ ⎪ wj = 1 ⎩ j=1

In the formula: M represents the number of Know attributes j = 1, 2, . . . , M , cj represents the fluctuation correlation coefficient between Know attribute j and Unknow attributes. Step 5: The scene analysis was carried out on the date containing Unknow attributes, and the H most similar scenes were found after clustering historical data of the power grid. The date with attribute Unknow is called the missing date, and The H most similar scenes found are called H similar scenes. Step 6: Firstly, the time period of the missing attribute data in the missing date is determined. Then, for the same time period of each similar scene, the similarity between each Know attribute data of the missing date and each Know attribute data of each similar scene is measured by the dynamic time-bending distance. The specific steps are as follows: 1) Set the occurrence time of missing attribute data as time tn , select n time points backward at time tn , select n time points forward at time tn , and finally form the time period (tn , t2n ) of missing attribute data in the missing date, including 2n + 1 time points t0 , t1 , . . . , t2n ; Let M Know attributes retained after comparison threshold judgment and screening be denoted as A1 , A2 , . . . , AM , Unknow attribute be denoted as A0 ;

Research on Day-Ahead Scheduling Strategy of the Power System

31

2) The Know attribute data A1 , A2 , . . . , AM at time t0 , t1 , . . . , t2n of the hth similar 2n

scene are denoted as D(1,h) , D(2,h) , . . . , D(M ,h) respectively. D(j,h) = d(j,h,g) , g=0

d(j,h,g) denotes the attribute data of Know attribute j at time tg in the hth similar scene, j = 1, 2, . . . , M , h = 1, 2, . . . , H , g = 1, 2, . . . , 2n; 3) The dynamic time-bending distance is used to measure the similarity S(j,h) between the Know attribute data Aj at time t0 , t1 , . . . , t2n in the hth similar scene and the attribute data D(j,h) at time t0 , t1 , . . . , t2n in the missing date. p represents the missing date. Step 7: Combined with the combined weights of each Know attribute and Unknow attribute, the comprehensive similarity of Unknow attribute of each similar scene is calculated using the following formula: Ch =

H M  

wj × S(j,h)

(10)

j=1 h=1

In the formula, Ch represents the comprehensive similarity of Unknow attribute in the hth similar scene. Step 8: After finding N scenes whose comprehensive similarity of Unknow attribute is within the threshold, the data T1 of the Unknow attribute at the time tn of the scene is extracted from the ith scene as the longitudinal filling data. At the same time, the linear fitting of the curve for the Unknow attribute was used to find out the data T2 at time tn of this scene as the transverse filling data, and the filling value of the missing attribute data in this scene was solved: Ti = α × T1 + β × T2 (11) α+β =1 In the formula, time tn is the occurrence time of missing attribute data, α is the weight of T1 , and β is the weight of T2 . Call the probability of each scenario pi , the final imputation value of missing attribute data is obtained by multiplying the imputation value of each scene by its scene probability:  T= pi Ti

4 Construction of Power System Optimization Model Including Wind Power Plant and Optical Power Station 4.1 Doubly-Fed Asynchronous Fan The relationship between DFIG active power PDFIG output and wind speed v [8] is as follows: ⎧ v ≤ vi , v ≥ vo ⎨ 0, PDFIG = k1 v + k2 , vi < v ≤ ve (12) ⎩ Pe , vo > v > ve

32

F. Qi et al.

In the formula, Pe is DFIG rated power; vi is the cut in wind speed; ve is the rated wind speed; vo is the cut out wind speed; k1 and k2 are the parameter of the fan power generation system. The relation expression of active and reactive power output in DFIG is as follows: ⎧ 2 ⎪ 2 ⎨ PDFIG + Q2 DFIG = (3Us Is ) 1−s (13)  2   2  2 2 ⎪ Xm ⎩ PDFIG + QDFIG + 3 Us = 3 U I s r 1−s Xs Xs In the formula: s is the slip rate; Us is the stator side voltage; Is is the stator winding current; Xs is the stator leakage reactance; Xm is excitation reactance; Ir is the rotor side converter current. 4.2 Photovoltaic Power Array The active power output of PV array shall meet the following equation: pPV = EηA

(14)

In the formula, η - photoelectric conversion efficiency; A - Effective light area. Reactive power output characteristics of PV array shall meet the following equation:  2 − P2 |QPV |max = SPV (15) PV In the formula: |QPV |max - maximum reactive power regulation capacity of PV; SPV grid-connected inverter capacity of PV system; PPV - Active power output of PV system. 4.3 The Objective Function The objective is to minimize the active power loss, and its objective function is as follows: min y = Ploss =

Nl 

Gk (Ui2 + Uj2 − 2Ui Uj cos θij )

(16)

k=1

In the formula, Ploss is the active power loss of the system; Ui and Uj are node i and node j voltage amplitudes respectively; Nl is the number of system branches; Gk is the conductance of the branch; θij is the voltage phase Angle difference between node i and node j.

Research on Day-Ahead Scheduling Strategy of the Power System

33

4.4 The Constraint 1) System power flow constraint ⎧

⎪ Uk (Gik cos δik + Bik sin δik ) ⎨ PLi − PDGi = Ui k∈i

⎪ Uk (Gik sin δik − Bik cos δik ) ⎩ QLi − QDGi = Ui

(17)

k∈i

In the formula, node k is all the nodes connected with the node; PLi , QLi are the active and reactive power injected into node i; PDGi , QDGi are the active and reactive power of DG or battery injected into node i; δik is the phase Angle difference of nodes at both ends of branch ik; Gik , Bik are the real and imaginary parts of admittance on branch ik; Ui , Uk are the voltage amplitudes of node I and node k respectively. 2) Node voltage constraint Ui min ≤ Ui ≤ Ui max i = 1, 2, . . . , n

(18)

In the formula, n is the number of system nodes; Ui , Ui max , Ui min are respectively the node voltage of the node and its allowed maximum and minimum values. 3) Constraints on DG operation conditions QDGi min ≤ QDGi ≤ QDGi max

(19)

In the formula, QDGi , QDGi max , QDGi min are the reactive power output of DG of station I (group) and its allowable maximum and minimum value respectively.

5 Example Analysis 5.1 The Example Described In this paper, an improved IEEE14 node power system simulation analysis is adopted, as shown in Fig. 2. Node 9 is connected to a large-scale wind power plant, node 5 is connected to a centralized photovoltaic power station, and node 2 and node 10 are connected to 10 sets of switchable capacitor banks with a reactive power capacity of 200 kVar each. It is assumed that the transformer between branch 5–6 and branch 4– 9 is a 17-level transformer with load adjusting sub, and the improved particle swarm optimization algorithm is used to optimize the solution [9]. The voltage reference value of the power system is 220 kV, and the capacity reference value is 100 MVA. The fan inlet wind speed, rated wind speed and cut out wind speed are 4 m/s, 12 m/s and 25 m/s, respectively, with a rated capacity of 8 MW. The effective illumination area of the photovoltaic power station is 30000 m2 , the conversion efficiency is 0.9, and the rated capacity is 8 MW. The allowable range of the node voltage is ±6% of the rated voltage; The population of the improved PSO algorithm is 50, the number of iterations is 150, the GAMA parameter is 0.95, and the learning factor is C1 = C2 = 0.5.

34

F. Qi et al.

Fig. 2. Improved IEEE14 node system

5.2 Analysis of Optimization Results An electrical network of typical years of historical data as fill the database, and assumes that one of the typical day has the problem of the missing data, will fill it as object, the method proposed in this paper after data packing on that day, part of the extract containing fill data into the improved IEEE14 node in the system, and a scheduling optimization. The optimization results of control variables are shown in Fig. 3:

Fig. 3. Optimization results of control variables

The result of active power loss of the whole system for a whole day under various conditions is shown in the Table 1.

Research on Day-Ahead Scheduling Strategy of the Power System

35

Table 1. The simulation results contrast The layer number

Situation name

Value of loss (kW)

The error rate from the true value

1

The loss of the original system

345.77

5.01%

2

The loss of the system before reactive power optimization with real data

341.62

3.75%

3

The loss of the system after reactive power optimization with real data

329.28

0%

4

The loss after clustering and filling in out possible scenarios with big data

327.53

0.532%

Standard system is the first case IEEE14 node in the table of a scheduling cycle loss, the second is in the system after modification is not a scheduling a scheduling cycle loss, and the third is the transformation system has a scheduling cycle loss after scheduling optimization, the fourth case is to use the clustered scene data to fill in the missing data, and carry out day-ahead scheduling optimization to optimize the loss of a scheduling cycle. According to the data in the table, the method proposed in this paper is used to fill the grid data, and then the grid with the data filled is optimized for day-ahead scheduling. The results obtained are close to accurate data, which can meet the requirements of data integrity and accuracy for day-ahead scheduling optimization of new energy grid. This paper proposes a day-ahead scheduling optimization method for power systems including wind power plants and photovoltaic power stations based on big data clustering and filling. The main conclusions are as follows: 1) The proposed missing value filling method based on historical data assisted scenario analysis can improve the filling accuracy of the missing value of the historical data of the day-ahead scheduling of the power system. 2) The load, wind power and photoelectric scene sets obtained by K-means clustering method of big data can be accurately represented. 3) Establish day-ahead scheduling model of power system including wind power plants and photovoltaic power stations, and use improved particle swarm optimization algorithm to solve it. The simulation results show that this method can improve the filling accuracy of the missing value of the day-ahead dispatching historical data of the power system, and meet the development demand of the power grid with new energy.

36

F. Qi et al.

References 1. Li, G., Cheng, X.: Big Data Research: a major strategic field of future technology and economic and social development – research status and scientific thinking of big data. Proc. Chin. Acad. Sci. 27(06), 647–657 (2012) 2. Song, Y., Zhou, G., Zhu, Y.: Current status and challenges of smart grid big data processing technology. Power Grid Technol. 37(04), 927–935 (2013) 3. Xu, Y., Cheng, X., Li, Y., Zhang, H., Yu, W., He, B.: Medium and long-term load forecasting of power system based on big data clustering. J. Electr. Power Syst. Autom. 29(08), 43–48 (2017) 4. Yuan, J., Zhong, L., Yang, G., Chen, M., Gu, J., Li, T.: Research on filling and classification algorithm of incomplete energy consuming big data in green data center. Chin. J. Comput. 38(12), 2499–2516 (2015) 5. Tang, L., Wang, R., Wu, R., Fan, B.: Missing data filling algorithm for panoramic regulation unified data model. Autom. Electr. Power Syst. 41(01), 25–30+87 (2017) 6. Mallick, P., Chen, Z., Zamani, M.: Reinforcement learning using expectation maximization based guided policy search for stochastic dynamics. Neurocomputing 484, 79–88 (2017) 7. Kim, Y.K., Kim, H.J., Lee, H., Chang, J.W.: Privacy-preserving parallel kNN classification algorithm using index-based filtering in cloud computing. PLoS ONE 17(5), e0267908 (2017) 8. Lang, Y., Zhang, X., Xu, D., Ma, H., Hadianmrei, S.R.: Reactive power analysis and control strategy for doubly-fed wind farm. Proc. CSEE 2007(09), 77–82 (2007) 9. Hematpour, N., Ahadpour, S.: Execution examination of chaotic S-box dependent on improved PSO algorithm. Neural Comput. Appl. 33(10), 5111–5133 (2020)

Day-Ahead Scenario Generation Method for Renewable Energy Based on Historical Data Analysis Hong Wang1 , Yang Liu1 , Chao Yin1 , Xuesong Bai1 , Bo Sun1 , Menglei Li2(B) , and Xiyong Yang2 1 State Grid Heilongjiang Electric Power Company, Harbin 150090, Heilongjiang, China 2 Nanjing Institute of Technology, Nanjing 211167, Jiangsu, China

[email protected]

Abstract. With the massive application of new energy, the contradiction of power grid regulation has become increasingly prominent. How to effectively predict the range of the power grid is a huge challenge faced by the day-ahead dispatch of power system. Aiming at this problem, this paper proposes a method for generating day-ahead scenarios for renewable energy based on historical data analysis. First, the deep embedding clustering (DEC) algorithm is used to analyze historical data, and periods with similar characteristics are divided into one group. Then the conditional deep convolutions generative adversarial network (C-DCGAN) model generates a day-ahead scenario set for renewable energy. At last, the Belgian Elia renewable energy data is used for simulation analysis, and the results show that the proposed method can accurately describe the uncertainty of renewable energy. Keywords: Renewable energy · Cluster analysis · Conditional deep convolutional generative adversarial networks · Day-ahead scenario set

1 The Introduction Global energy is undergoing a deep transformation with renewable energy becoming the focus of energy development [3]. However, due to the uncertainty of renewable energy, it brings huge challenges to the power system [4]. How to accurately describe the uncertainty is one of the key issues to overcome these challenges. The stochastic optimization method uses the scenario analysis method to deal with the uncertainty of renewable energy [1]. Scholars has used statistical methods to generate renewable energy scenarios: Class method, time series simulation method, Markov chain, scene tree generation method, autoregressive moving average model, etc. [2, 3]. The output of renewable energy is easily affected by various factors such as geographical environment, climate and weather. In recent years, with the rapid development of artificial intelligence, deep learning algorithms have become popular. Compared with statistical methods, renewable power output scenarios generated by deep learning models can accurately describe the uncertainty of renewable energy due to excavating multi-feature and high-dimensional raw © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Tian et al. (Eds.): ICBDS 2022, CCIS 1796, pp. 37–46, 2023. https://doi.org/10.1007/978-981-99-3300-6_4

38

H. Wang et al.

data. Main deep learning models include: variational auto-encoder (VAE) [4], deep belief network (DBN) [5], generative adversarial networks (GAN) [6]. In recent years, a large number of scholars have studied the method of new energy output scenarios based on unsupervised learning models. Reference [7] proposed a method for new energy output scenarios based on variational autoencoders. Using variational autoencoders to extract non-Euclidean correlation features and temporal correlation features and effectively describe the correlation between multi-source scenarios. Reference [8] uses GAN to find the time-space correlation features of new energy output and introduce Wasserstein distance to improve the training quality of the model. Reference [9] proposes a renewable energy day-ahead scenario generation method based on a conditional generative adversarial network (CGAN), learning the relation between the noise distribution and the day-ahead scenarios through the training of the adversarial network. Compared with the Markov chain scenario generation method, it can describe the uncertainty of wind power more accurately. Although a series of research has achieved a lot, there are few studies on the generation methods of day-ahead scenarios with new energy access. Aiming at such problems, this paper proposes a method for generating day-ahead scenarios for renewable energy based on historical data analysis. First, the deep embedding clustering (DEC) algorithm is used to cluster and analyze historical data, and time periods with similar characteristics are collected together; second, the conditional deep convolutional generative adversarial network (C-DCGAN) model generates a set of renewable energy day-ahead scenarios; finally, the simulation analysis is carried out with the Belgian Elia renewable energy data to verify the effect of the method in this paper. The example results show that the method proposed in this paper can more accurately describe the uncertainty of renewable energy.

2 General Framework of Day-Ahead Scenario Generation Method for Renewable Energy Figure 1 shows the framework of the day-ahead prediction method under the high proportion of renewable energy, which is mainly divided into three steps: Step 1: Collect historical data and use autoencoder to get low-dimensional features; use K-means clustering to get initial clustering results; calculate rework loss and clustering loss function and use the K-means algorithm to iterate repeatedly until the clustering result is within the set deviation range. Step 2: Establish a C-DCGAN model for each cluster; collect historical prediction data and actual data, input into C-DCGAN for training. After training, the model can learn the relationship between the noise and the actual output. Step 3: Obtain the prediction value of the new energy output, judge its feature and select the C-DCGAN model, input the prediction value, and day-ahead scenario is generated multiple times to form a scenario set; calculate the prediction interval.

K-means clustering

Optimizer Tuning Parameters

Calculate the loss function

decoder

Training network

C-DCGAN1

C-DCGAN2

...

Generate day-ahead scenario set

Choose C-CDGAN model type n Prediction value

...

Actual value

Prediction value

type2

Actual value

Prediction value

type1

Generated scenario evaluation

No

Iterative error