Advances in Computational Vision and Robotics: Proceedings of the International Conference on Computational Vision and Robotics (ISBN 978-3-031-38650-3, 978-3-031-38651-0)


Table of Contents:
ICCVR-2023 Conference Committee
Preface
Acknowledgements
About This Book
Contents
About the Editors
Part I Pattern Recognition and Robotic Vision
1 Design of Piano Automatic Accompaniment System Based on Markov Model
1.1 Introduction
1.2 Methodology
1.2.1 Related Theory of Algorithmic Composition
1.2.2 Automatic Piano Accompaniment System Based on HMM
1.3 Result Analysis and Discussion
1.4 Conclusions
References
2 3D Visual Design of Music Based on Multi-audio Features
2.1 Introduction
2.2 Methodology
2.2.1 Audio Visualization Method
2.2.2 Multi-audio Feature Extraction of Music
2.3 Result Analysis and Discussion
2.4 Conclusions
References
3 Construction of Humming Music Retrieval Model Based on Particle Swarm Optimization
3.1 Introduction
3.2 Related Music Knowledge in Humming Music Retrieval
3.2.1 Elements of Sound
3.2.2 The Relationship and Difference Between Voice Signals and Humming Music Signals
3.3 Constructing a Humming Music Retrieval Model
3.3.1 Basic Framework
3.3.2 Model Application
3.4 Conclusions
References
4 Research on Audio Processing Method Based on 3D Technology
4.1 Introduction
4.2 Audio Processing Method Based on 3D Technology
4.3 Result Analysis and Discussion
4.4 Conclusions
References
5 Design and Optimization of Point Cloud Registration Algorithm Based on Stereo Vision and Feature Matching
5.1 Introduction
5.2 Research Method
5.2.1 Stereo Vision Model
5.2.2 Feature Matching Based on Stereo Vision
5.3 Experiment and Analysis
5.4 Conclusion
References
6 Design of 3D Point Cloud Real-Time Cloud Matching Algorithm Based on Multi-scale Feature Extraction
6.1 Introduction
6.2 Research Method
6.2.1 Multiscale Feature Extraction
6.2.2 Real-Time Cloud Matching Acceleration of 3D Point Cloud
6.3 Experimental Analysis
6.4 Conclusion
References
7 Design of Digital Music Copyright Protection System Based on Blockchain Technology
7.1 Introduction
7.2 Research Method
7.2.1 System Overall Design
7.2.2 Key Technology Realization
7.3 Result Analysis
7.4 Conclusion
References
8 Personalized Music Recommendation Model Based on Collaborative Filtering Algorithm and K-Means Clustering
8.1 Introduction
8.2 Collaborative Filtering Algorithm
8.3 Design of Personalized Music Recommendation Model
8.4 Implementation of a Personalized Music Recommendation Model
8.4.1 Personalized Music Model Recommendation Process
8.4.2 Result Analysis
8.5 Conclusions
References
9 Simulation of Fuzzy Calculation Model of Music Emotion Based on Improved Genetic Algorithm
9.1 Introduction
9.2 Simulation Research on Fuzzy Computing Model of Music Emotion
9.2.1 Quantitative Research on Musical Emotion
9.2.2 Construction of a Fuzzy Computing Model for Music Emotion
9.3 Simulation of Fuzzy Calculation Model of Music Emotion Based on Improved GA
9.3.1 Improved GA Model
9.3.2 Analysis of Experimental Results
9.4 Conclusion
References
10 Design and Implementation of Piano Performance Automatic Evaluation System Based on Support Vector Machine
10.1 Introduction
10.2 Methodology
10.2.1 Related Technical Basis
10.2.2 Design and Implementation of Automatic Assessment System for Piano Performance
10.3 Result Analysis and Discussion
10.4 Conclusions
References
11 Simulation of Music Personalized Recommendation Model Based on Collaborative Filtering
11.1 Introduction
11.2 Simulation of Music Personalized Recommendation Model
11.2.1 Basic Theory of Music Recommendation System
11.2.2 Personalized Recommendation System and Related Technologies
11.3 Simulation of Music Personalized Recommendation Model Based on CF
11.3.1 Collaborative Filtering Algorithm
11.3.2 Analysis of Experimental Results
11.4 Conclusion
References
12 Design and Optimization of Image Recognition and Classification Algorithm Based on Machine Learning
12.1 Introduction
12.2 Image Classification and Retrieval Method Based on Image Visual Features
12.2.1 Theoretical Basis of Machine Learning Recognition Algorithm
12.2.2 Research on Virtual Sample Algorithm in Image Recognition
12.3 Image Retrieval Method Combining Machine Learning with Image Visual Features
12.3.1 Image Recognition Advertising Classification Algorithm Model
12.3.2 Experimental Analysis Results
12.4 Conclusion
References
13 Design of Path Planning Algorithm for Intelligent Robot Based on Chaos Genetic Algorithm
13.1 Introduction
13.2 Concept and Principle of Chaotic Genetic Algorithm
13.3 Path Planning Method
13.3.1 Coding Based on Geographic Information
13.3.2 Simplification of Robot Control Parameters
13.4 Simulation Study
13.5 Conclusions
References
14 Design and Development of Rail Transit Overhead Contact Line Monitoring System Based on Image Processing
14.1 Introduction
14.2 Related Concepts
14.2.1 Catenary
14.2.2 Image Processing Technology
14.3 System Design
14.3.1 Overall Scheme Design
14.3.2 Monitoring Preprocessing Module
14.3.3 Monitoring Terminal Energy Consumption Analysis
14.4 System Implementation
14.4.1 Monitoring Terminal Energy-Saving Mode
14.4.2 System Performance Test
14.5 Conclusion
References
15 Ultrasonic Signal Processing Method for Transformer Oil Based on Improved EMD
15.1 Introduction
15.2 Tests and Methods
15.2.1 Ultrasonic Testing
15.2.2 Ultrasonic Signal Processing Method Based on Improved EMD
15.3 Results and Discussion
15.4 Conclusion
References
16 Research on UHV Transmission Line Selection Strategy Aided by Satellite Remote Sensing Image
16.1 Introduction
16.2 Research Method
16.2.1 Data Processing
16.2.2 Precise Correction of RS Image
16.2.3 Transmission Line Path Optimization
16.3 Accuracy Analysis
16.4 Conclusions
References
17 Research on the Evaluation of the Teaching Process of Public Physical Education in Universities Based on Markov Model
17.1 Introduction
17.2 Research Method
17.2.1 Establishment of Evaluation Index System
17.2.2 Markov Model
17.3 An Example of Evaluation of Public PE Class Teaching Process
17.4 Conclusion
References
Part II Artificial Intelligence and Deep Learning Application
18 Simulation Design of Matching Model Between Action and Music Tempo Characteristics Based on Artificial Intelligence Algorithm
18.1 Introduction
18.2 Methodology
18.2.1 Basic Technology of Artificial Intelligence
18.2.2 Construction of Matching Model Between Dance Actions and Music Tempo Characteristics
18.3 Result Analysis and Discussion
18.4 Conclusions
References
19 Design and Optimization of Frequency Identification Algorithm for Monomelody Musical Instruments Based on Artificial Intelligence Technology
19.1 Introduction
19.2 Methodology
19.2.1 Overall Structure of Audio Features
19.2.2 Frequency Identification Algorithm of Musical Instruments in Single Melody Music
19.3 Result Analysis and Discussion
19.4 Conclusion
References
20 Design of Intelligent Evaluation Algorithm for Matching Degree of Music Words and Songs Based on Grey Clustering
20.1 Introduction
20.2 Methodology
20.2.1 Representation and Extraction of Melody Features
20.2.2 Digital Music Signal Denoising Algorithm
20.3 Result Analysis and Discussion
20.4 Conclusion
References
21 Construction of Evaluation Model for Singing Pronunciation Quality Based on Artificial Intelligence Algorithms
21.1 Introduction
21.2 Two Evaluation Systems Based on Artificial Intelligence Algorithm
21.2.1 Objective Evaluation Based on the Extraction of Evaluation Parameters of Singing Voice
21.2.2 An Objective Evaluation Mechanism Based on Subjective Evaluation Criteria Quantification
21.3 Evaluation Model of Singing Pronunciation Quality
21.4 Analysis of Experimental Results
21.5 Conclusions
References
22 Design and Optimization of Intelligent Composition Algorithm Based on Artificial Intelligence
22.1 Introduction
22.2 Model and Algorithm Design
22.2.1 Music Feature Extraction
22.2.2 Intelligent Composition Algorithm
22.3 Result Analysis and Discussion
22.4 Conclusions
References
23 Design of Computer-Aided Music Generation Model Based on Artificial Intelligence Algorithm
23.1 Introduction
23.2 Research Method
23.2.1 Data Preprocessing
23.2.2 Implementation of AI Algorithm
23.3 Experimental Analysis
23.4 Conclusion
References
24 Construction of Electronic Music Classification Model Based on Machine Learning and Deep Learning Algorithm
24.1 Introduction
24.2 Constructing EM Classification Model
24.2.1 Overall Structural Design
24.2.2 Multi Layer Perceptual Feature Classification Processing
24.3 Construction of EM Classification Model Based on ML and DL Algorithms
24.3.1 NN Algorithm Model for ML Optimization
24.3.2 Analysis of Experimental Results
24.4 Conclusion
References
25 Design of Piano Automatic Accompaniment System Based on Artificial Intelligence Algorithm
25.1 Introduction
25.2 Research Method
25.2.1 Design of Piano Automatic Accompaniment Algorithm
25.2.2 System Structure Design
25.3 Result Analysis
25.4 Conclusion
References
26 The Importance of the Application of Intelligent Management System to Laboratory Management in Colleges and Universities
26.1 Introduction
26.2 Composition of Intelligent Management System
26.3 Advantages and Workflow Analysis of Laboratory Intelligent Management System
26.4 Application of Intelligent Management System in Laboratory Management in Colleges and Universities
26.5 Conclusion
References
27 Design of Defect Detection Algorithm for Printed Packaging Products Based on Computer Vision
27.1 Introduction
27.2 Methodology
27.2.1 Printing Packaging Defect Detection
27.2.2 Printing Defect Detection Algorithm
27.3 Result Analysis and Discussion
27.4 Conclusions
References
28 Leap Motion Gesture Information Collection and Gesture Interaction System Construction
28.1 Introduction
28.2 Methodology
28.2.1 Collection and Processing of Gesture Data
28.2.2 Gesture Recognition Model of Interactive System
28.3 Result Analysis and Discussion
28.4 Conclusion
References
29 Optimization of Moving Object Tracking Algorithm Based on Computer Vision and Vision Sensor
29.1 Introduction
29.2 Methodology
29.2.1 Computer Vision and Moving Object Detection
29.2.2 Moving Target Tracking Algorithm
29.3 Result Analysis and Discussions
29.4 Conclusion
References
30 Simulation Experiment of DPCM Compression System for High Resolution Multi-spectral Remote Sensing Images
30.1 Introduction
30.2 Research Method
30.2.1 WT and Bit Plane Coding
30.2.2 Segmentation DPCM Compression Algorithm
30.3 Simulation Experiment Analysis
30.4 Conclusions
References
Part III Big Data Application in Robotics
31 Construction of Music Classification and Detection Model Based on Big Data Analysis and Genetic Algorithm
31.1 Introduction
31.2 Construction of Music Classification and Detection Model
31.2.1 Related Technologies and Implementation Methods of Music Retrieval
31.2.2 Construction of Music Classification Model
31.3 Analysis of Music Classification and Detection Construction Based on Big Data Analysis and Genetic Algorithm
31.3.1 Model Simulation Based on Big Data Analysis and Genetic Algorithm
31.3.2 Analysis of Experimental Results
31.4 Conclusion
References
32 Simulation of Electronic Music Signal Identification Model Based on Big Data Algorithm
32.1 Introduction
32.2 Music Signal Feature Recognition
32.3 Result Analysis and Discussion
32.4 Conclusion
References
33 Design of Interactive Teaching Music Intelligent System Based on AI and Big Data Analysis
33.1 Introduction
33.2 Methodology
33.2.1 Relevant Theoretical and Technical Basis
33.2.2 Construction of Reciprocal Teaching Music Intelligent System
33.3 Result Analysis and Discussion
33.4 Conclusions
References
34 Research on Music Database Construction Combining Big Data Analysis and Machine Learning Algorithm
34.1 Introduction
34.2 Methodology
34.2.1 Big Data Analysis and Machine Learning Algorithms
34.2.2 Construction of Music Database and Recommendation System
34.3 Result Analysis and Discussion
34.4 Conclusions
References
35 Construction and Optimization Design of Pop Music Database Based on Big Data Technology
35.1 Introduction
35.2 Research Method
35.2.1 Overall Design of Pop Music Database
35.2.2 Database Query Optimization
35.3 Simulation Experiment
35.4 Conclusion
References
36 The Application of Big Data in the Construction of Modern Vocal Education Score Database
36.1 Introduction
36.2 Basic Content of Music Score Database Construction
36.2.1 Construction of Music Score Database
36.2.2 The Concrete Method of Music Score Database Construction
36.3 Application of BD in the Construction of Music Score Database for Modern Vocal Pedagogy
36.3.1 Improved Parallel Association Rule Mining Algorithm Based on Vocal Music Education Data
36.3.2 Analysis of Experimental Results
36.4 Conclusion
References
37 Application of Big Data Analysis Technology in Music Style Recognition and Classification
37.1 Introduction
37.2 Methodology
37.2.1 Digital Management of Music Audio and Video Driven by Big Data
37.2.2 Music Style Recognition and Classification Algorithm
37.3 Result Analysis and Discussion
37.4 Conclusion
References
38 Construction of Piano Music Recognition System Based on Big Data Algorithm
38.1 Introduction
38.2 Basic Characteristics of Piano Music and Music Signal Preprocessing
38.2.1 Overview of Piano Music Characteristics
38.2.2 Piano Music Signal Preprocessing
38.3 System Design
38.4 System Test and Result Analysis
38.5 Conclusions
References
39 Design of Music Recommendation Algorithm Based on Big Data Analysis and Cloud Computing
39.1 Introduction
39.2 Research Method
39.2.1 User Behavior and User Portrait Modeling
39.2.2 Implementation of Music Recommendation Algorithm
39.3 Experimental Analysis
39.4 Conclusion
References
40 Simulation of Sports Damage Assessment Model Based on Big Data Analysis
40.1 Introduction
40.2 Research Method
40.2.1 Establishment of Risk Factors of Sports Injury
40.2.2 Construction of Sports Injury Assessment Model
40.3 Simulation Analysis
40.4 Conclusion
References
41 Construction of Purchase Intention Model of Digital Music Products Based on Data Mining Algorithm
41.1 Introduction
41.2 Methodology
41.2.1 Features of Digital Music Products
41.2.2 Digital Music Product Purchase Prediction Model
41.3 Result Analysis and Discussion
41.4 Conclusion
References
42 English Learning Analysis and Individualized Teaching Strategies Based on Big Data Technology
42.1 Introduction
42.2 Introduction of Educational Big Data and Related Technologies
42.2.1 Big Data for Education
42.2.2 Education Data Mining
42.2.3 The Relationship Between English Teaching and Big Data
42.3 Application of Data Mining Technology in Learning Analysis
42.3.1 Learning Analysis Techniques
42.3.2 Construction of Learning Analysis Technology Model
42.4 Application of Data Mining Technology in Personalized English Teaching
42.4.1 Teaching Design
42.4.2 Use Learning Analysis Technology to Analyze Personalized Data
42.4.3 Personalized Data Feedback
42.5 English Teaching Practice
42.5.1 Research Object Selection and Analysis
42.5.2 Analysis of the Effect of Personalized Learning Supported by WeChat Network Platform
42.5.3 Discussion of Practical Results
42.6 Conclusion
References
43 Research on Distributed Storage and Efficient Distribution Technology of High Resolution Optical Remote Sensing Data
43.1 Introduction
43.2 Related Work
43.3 Methodology
43.4 Result Analysis and Discussion
43.5 Conclusions
References
44 Research and Application of Digital Modeling Technology Based on Multi-source Data of Power Grid
44.1 Introduction
44.2 Related Work
44.3 Research on Digital Modeling Technology Based on Multi-source Data of Power Grid
44.3.1 Power Grid Multi-source Data Perception and Extraction
44.3.2 Digital Modeling of Multi-source Data in Power Grids
44.4 Application of Digital Model Based on Multi-source Data of Power Grid
44.4.1 Electricity Compliance Inspection Based on Digital Power Model
44.4.2 Application of Power Equipment State Feature Analysis Based on Digital Power Model
44.5 Application Results
References
45 Construction of Music Popular Trend Prediction Model Based on Big Data Analysis
45.1 Introduction
45.2 Description and Preprocessing of Music Data
45.2.1 Data Description
45.2.2 Data Analysis
45.3 Construction and Analysis of Numerical Prediction Model
45.3.1 Prediction Model of Music Pop Trend
45.3.2 Prediction and Analysis of Music Trend Based on Big Data Analysis
45.4 Conclusions
References
Part IV Deep Learning and Neural Network
46 Construction of Personalized Music Emotion Classification Model Based on BP Neural Network Algorithm
46.1 Introduction
46.2 Analysis and Extraction of Music Emotional Eigenvalues
46.2.1 MIDI Audio File Analysis
46.2.2 Main Track Recognition
46.2.3 Feature Extraction of MIDI Audio Files
46.3 Construction of Music Emotion Classification Model Based on BP Neural Network
46.4 Analysis of Experimental Results
46.5 Conclusions
References
47 Music Main Melody Recognition Algorithm Based on BP Neural Network Model
47.1 Introduction
47.2 Methodology
47.2.1 Digital Processing of Acoustic and Speech Signals
47.2.2 Music Main Melody Recognition Algorithm
47.3 Result Analysis and Discussion
47.4 Conclusions
References
48 Design of Piano Score Difficulty Level Recognition Algorithm Based on Artificial Neural Network Model
48.1 Introduction
48.2 Methodology
48.2.1 Piano Audio Signal Recognition
48.2.2 Algorithm Design of Piano Score Difficulty Level Recognition
48.3 Result Analysis and Discussion
48.4 Conclusions
References
49 Design and Optimization of Improved Recognition Algorithm for Piano Music Based on BP Neural Network
49.1 Introduction
49.2 Methodology
49.2.1 The Main Task of Music Recognition
49.2.2 Improvement of BPNN in Piano Music Recognition
49.3 Result Analysis and Discussion
49.4 Conclusions
References
50 Design of Piano Music Type Recognition Algorithm Based on Convolutional Neural Network
50.1 Introduction
50.2 Methodology
50.2.1 Related Technical Basis
50.2.2 Design of Piano Music Type Recognition Algorithm
50.3 Result Analysis and Discussion
50.4 Conclusions
References
51 Application of Emotion Recognition Technology Based on Support Vector Machine Algorithm in Interactive Music Visualization System
51.1 Introduction
51.2 Research Method
51.2.1 Emotion Recognition Based on SVM Algorithm
51.2.2 Realization of Interactive Music Visualization System
51.3 Result Analysis
51.4 Conclusion
References
52 Research on Image Classification Algorithm of Film and Television Animation Based on Generative Adversarial Network
52.1 Introduction
52.2 Methodology
52.2.1 Image Characteristics of Contemporary Film and Television Animation
52.2.2 Video Animation Image Classification Algorithm Based on GAN
52.3 Result Analysis and Discussion
52.4 Conclusion
References
53 Study on Monitoring Forest Disturbance During Power Grid Construction Based on BJ-3 Satellite Image
53.1 Introduction
53.2 Study Area and Data
53.3 Study Method
53.3.1 High-Resolution Image Segmentation
53.3.2 Feature Selection
53.3.3 Forest Information Extraction and Interference Monitoring
53.4 Result Analysis
53.5 Conclusion
References
54 Transformative Advances in English Learning: Harnessing Neural Network-Based Speech Recognition for Proficient Communication
54.1 Introduction
54.2 Related Research
54.2.1 Introduction of Neural Network
54.2.2 Introduction of Speech Recognition in Oral English Learning
54.3 Simulation and Experiment of Speech Recognition in Oral English Learning Based on Neural Network
54.3.1 Introduction
54.3.2 Evaluation Metrics for Assessing Speech Recognition Performance in Oral English Learning
54.3.3 Experimental Results and Analysis
54.4 The Role of Neural Network-Based Speech Recognition in Shaping the Future of Oral English Learning
54.5 Conclusion
References

Learning and Analytics in Intelligent Systems 33

George A. Tsihrintzis Margarita N. Favorskaya Roumen Kountchev Srikanta Patnaik   Editors

Advances in Computational Vision and Robotics Proceedings of the International Conference on Computational Vision and Robotics

Learning and Analytics in Intelligent Systems Volume 33

Series Editors
George A. Tsihrintzis, University of Piraeus, Piraeus, Greece
Maria Virvou, University of Piraeus, Piraeus, Greece
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK

The main aim of the series is to make available a publication of books in hard copy form and soft copy form on all aspects of learning, analytics and advanced intelligent systems and related technologies. The mentioned disciplines are strongly related and complement one another significantly. Thus, the series encourages cross-fertilization highlighting research and knowledge of common interest. The series allows a unified/integrated approach to themes and topics in these scientific disciplines which will result in significant cross-fertilization and research dissemination. To maximize dissemination of research results and knowledge in these disciplines, the series publishes edited books, monographs, handbooks, textbooks and conference proceedings. Indexed by EI Compendex.

George A. Tsihrintzis · Margarita N. Favorskaya · Roumen Kountchev · Srikanta Patnaik Editors

Advances in Computational Vision and Robotics Proceedings of the International Conference on Computational Vision and Robotics

Editors
George A. Tsihrintzis, Department of Informatics, University of Piraeus, Piraeus, Greece
Margarita N. Favorskaya, Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Krasnoyarsk, Russia
Roumen Kountchev, Department of Radio Communications, University of Sofia, Sofia, Bulgaria
Srikanta Patnaik, Interscience Institute of Management and Technology, Bhubaneswar, Odisha, India

ISSN 2662-3447  ISSN 2662-3455 (electronic)
Learning and Analytics in Intelligent Systems
ISBN 978-3-031-38650-3  ISBN 978-3-031-38651-0 (eBook)
https://doi.org/10.1007/978-3-031-38651-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.

ICCVR-2023 Conference Committee

Honorary Chair Professor-Dr. George A. Tsihrintzis, Department of Informatics, University of Piraeus, Greece

General Chairs Prof. Dr. Margarita N. Favorskaya, Institute of Informatics and Telecommunications, 31, Krasnoyarsky Rabochyave, Krasnoyarsk, 660037, Russia Prof. Dr. Srikanta Patnaik, Interscience Institute of Management and Technology, Bhubaneswar, Odisha, India

Publications Chair Prof. Hakim Bendjenna, Professor, Department of Computer Science, Tebessi University, Algeria

Program Chair Prof. Dr. Junsheng Shi, School of Physics and Electronic Information, Yunnan Normal University

Program Co-chair Prof. (Dr.) Roumen Kountchev, Faculty of Telecommunications, Department of Radio Communications and Video Technologies at the Technical University of Sofia, Bulgaria

Finance Chair Prof. Xiaonan Xiao, Vice Dean of School of Information Science and Technology, Xiamen University Tan Kah Kee College, China

Organizing Chairs Prof. Dhananjaya Sarangi, Interscience Institute of Management and Technology (IIMT), Bhubaneswar, India

Co-organizing Chair Dr. Yonghang Tai, Color and Image Vision Lab., Yunnan Normal University, Kunming, China

Technical Committee Chair Prof. (Dr.) Sushanta Kumar Panigrahi, Principal, Interscience Institute of Management and Technology, Bhubaneswar, India

International Advisory Chair Dr. Roumiana Kountcheva, Vice President, TK Engineering, Sofia

Student Symposium Chairs Prof. Jacek Lukasz Wilk-Jakubowski, Faculty of Electrical Engineering, Automatic Control and Computer Science, Poland Dr. Junlin Hu, Associate Professor, School of Software, Beihang University, Beijing, China

Technical Program Committee Prof. Vladicescu Popentiu, Florin, City University, UK Prof. Guangzhi Qu, Oakland University, USA Prof. Dr. Zhengtao Yu, Kunming University of Science and Technology Prof. V. S. S. Yadavalli, University of Pretoria, South Africa Prof. Bruno Apolloni, Università degli Studi di Milano, Italy Prof. Harry Bouwman, Delft University of Technology, Netherlands Prof. Shyi-Ming Chen, National Taiwan University of Science and Technology, Taiwan Prof. Yahaya Coulibaly, University Technology Malaysia, Malaysia Prof. Ing Kong, RMIT University, Australia Prof. Gerald Midgley, Centre for Systems Studies, Business School, University of Hull, UK Prof. Khubaib Ahmed, Hamdard University, Pakistan Prof. Moustafa Mohammed Eissa, Faculty of Engineering-Helwan University, Egypt Dr. Fernando Boronat Seguí, Universitat Politecnica de Valencia, Spain Dr. Alexandros Fragkiadakis, Institute of Computer Science (FORTH-ICS), Greece Dr. Cristina Alcaraz, University of Malaga, Spain Dr. Mohamed Atef, Assiut University, Egypt Dr. Weilin Wang, University of Georgia, USA Dr. Bensafi Abd-Ei-Hamid, World Islamic Sciences and Education University, Jordan Dr. YudiGondokaryono, Institute of Teknologi Bandung, Indonesia Dr. Hadi Arabshahi, Ferdowsi University of Mashhad, Iran Dr. Qian Lv, Western Digital, USA Dr. Alojz Poredo, University of Ljubljana, Slovenia Dr. Mohamed F. El-Santawy, Cairo University, Egypt Dr. Tongpin Liu, University of Massachusetts Amherst, USA Dr. Seema Verma, Banasthali University, India Dr. Imran Memon, Zhejiang University, Pakistan Prof. Hakim Bendjenna, University of Larbi Tebessi, Algeria Prof. Sanjiv K. Bhatia, University of Missouri—St. Louis, USA Prof. Debasis Chaudhuri, Defence Electronics Applications Laboratory, India Prof. R. I. Damper, University of Southampton, UK Prof. Guilherme N. DeSouza, University of Western Australia, Australia

Prof. Maria Gini, University of Minnesota, USA Prof. Lapo Governi, University of Florence, Italy Prof. Wang Han, Nanyang Technological University, Singapore Prof. Takamasa Koshizen, Honda Research Institute Japan Co. Ltd., Japan Dr. Ravi Kothari, IBM Research, India Prof. Hamid Krim, North Carolina State University, USA Prof. Chan-Su Lee, Yeungnam University, South Korea Prof. Pedro U. Lima, Instituto Superior Técnico, Puerto Rico Prof. Mario Mata, Universidad Europea de Madrid, Spain Prof. Zbigniew Michalewicz, University of Adelaide, Australia Prof. Luis Moreno, University of Carlos III, Spain Prof. Ali K. N. ToosiNahvi, University of Technology, Iran Prof. Jayanthi Ranjan, Institute of Management Technology (IMT), India Yassine Ruichek, University of Technology of Belfort-Montbéliard, France Prof. Ashok Samal, University of Nebraska-Lincoln, USA Prof. Ishwar Sethi, Oakland University, USA Prof. M. Hassan Shafazand, Shahid Bahonar University of Kerman, Iran Prof. Rajeev Sharma, Pennsylvania State University, USA Prof. Yu Sun, Michigan State University, USA Prof. Kok Kiong Tan, National University of Singapore, Singapore Prof. Christopher R. Wren, Mitsubishi Electric Research Laboratories, USA Prof. Yeon-Mo Yang, Kumoh National Institute of Technology, South Korea Prof. Mohammed Yeasin, University of Memphis, USA Prof. Anthony Yezzi, Georgia Institute of Technology, USA Prof. Peng-Yeng, Yin, National Chi Nan University, Taiwan Dr. Manas Kumar Pal, Birla Global University, Bhubaneswar Dr. Sharada Prasad Sahoo, Department of Economics and Management, Berham University, Berhampur, Odisha Dr. Rabi Narayan Subudhi, Senior Professor of Management, KIIT University Bhubaneswar, Odisha Dr. Sreekumar, Professor (Decision Science), RIMS, Rourkela, Sundargarh, Odisha, India Dr. Debendra Kumar Mahalik, Department of Business Administration, Odisha, India Dr. Manoranjan Parhi, Department of CSE, Siksha ‘O’ Anusandhan, Odisha, India Dr. Padmalita Routray, Fakir Mohan University, Baleshwar, Odisha, India Dr. Manoranjan Dash, Faculty of Management Sciences, Siksha ‘O’ Anusandhan, Odisha, India

Conference Secretary Mrs. Soma Mitra, Assistant Professor, Interscience Institute of Management and Technology, Bhubaneswar, India

Preface

Computer Vision and Robotics is one of the most prominent disciplines in our information-rich and technology-driven world. With the goal of advancing research in this field, the International Conference on Computational Vision and Robotics (ICCVR-2023) brought together eminent and budding researchers, academicians, experts, and practical innovators from around the world to share their latest research and practical applications.

ICCVR-2023 brought together researchers, academicians, students, users, and developers from the diverse fields of engineering, computer science, and social and biomedical science who contributed to the context of Computer Vision and Robotics and discussed the urgency of adopting the radical dynamics of automation and intelligent cognition effectively and efficiently. The participating researchers were provided with a platform to present and share their work. Moreover, they were offered the opportunity to emphasize various applications in key functional areas via discussions among the best brains across the world.

The conference proceedings contain a wealth of knowledge and insights into the latest developments and practical applications in the field of Computer Vision and Robotics. The papers included in these proceedings represent voluminous efforts to process massive data, extract information, and create knowledge from both the theoretical and the applied point of view. It is clear that a step forward has been taken toward further research and exploration of the future of Computer Vision and Robotics, while a broader range of applications is expected to emerge in the years to come.

We are proud to present these proceedings as a valuable resource for researchers, academicians, and practitioners alike, who seek to deepen their understanding and make their own contributions to the advancement of Computer Vision and Robotics. We extend our sincerest gratitude to all the authors, presenters, and attendees who made ICCVR-2023 a great success and contributed to the growth and development of this important field.

Bhubaneswar, India

Srikanta Patnaik
General Chair

Acknowledgements

We would like to express our sincere appreciation and gratitude to all those who have contributed to the success of the International Conference on Computational Vision and Robotics (ICCVR-2023). First of all, we would like to express our heartfelt thanks to all the participants, presenters, and attendees of ICCVR-2023. Their active involvement, insightful discussions, and valuable contributions made the conference a truly enriching experience. We would also like to acknowledge EasyChair for their efficient conference management system. Their platform streamlined the submission and review process, making it easier for authors and reviewers to participate in the conference. Our heartfelt appreciation also goes out to the program and technical committee for their diligent work in reviewing and selecting the papers that were presented at the conference. Their expertise and commitment to academic excellence helped to ensure that the papers presented were of the highest quality. Furthermore, we extend our deepest appreciation to the Organizing Committee for their hard work and dedication in planning and executing ICCVR-2023. Their meticulous attention to detail, tireless coordination, and outstanding organizational skills contributed significantly to the smooth running of the conference. Last but not least, we extend our sincere thanks to the Springer Series Editors of Learning and Analytics in Intelligent Systems George A. Tsihrintzis, Maria Virvou, and Lakhmi C. Jain for their support and guidance in publishing the conference proceedings Advances in Computational Vision and Robotics in their esteemed book series. Their commitment to disseminating scholarly research in the field of Computational Vision and Robotics is greatly appreciated.

We are grateful to all the individuals and organizations that played a part in making ICCVR-2023 a successful and memorable event. Without your support and dedication, this conference would not have been possible. Thank you all for your contributions and commitment to advancing the field of Computational Vision and Robotics.

George A. Tsihrintzis, Piraeus, Greece
Margarita N. Favorskaya, Krasnoyarsk, Russia
Roumen Kountchev, Sofia, Bulgaria
Srikanta Patnaik, Bhubaneswar, India

About This Book

There is a pressing demand for further research in the fields of Computational Vision and Robotics, which will enable robots and other machines to better emulate human vision. Robot vision is the scientific area which employs algorithms, cameras, and any other hardware that leads robots toward developing visual insight. Computer Vision is important in robotics, as without vision a robot would be blind and only capable of repeating the same exact task over and over until it is reprogrammed. Computer Vision allows a robot to adjust to obstacles in its environment and complete different preprogrammed tasks by recognizing which ones need to be completed.

Computer Vision is a multidisciplinary field that could be considered a subfield of artificial intelligence and machine learning, making use of learning methodologies and algorithms. These learning algorithms allow a system to improve and become more efficient in performing the tasks for which it has been designed. One can say that the goal of Computer Vision and Robotics is to understand and perceive the content of digital images. Typically, this involves developing methods that attempt to emulate human vision. Understanding the content of digital images may involve extracting information from an image, which might be an object, a text description, a three-dimensional model, and so forth.

The International Conference on Computational Vision and Robotics (ICCVR-2023), which was successfully held in Hong Kong during June 3rd–4th, 2023, brought together researchers, academicians, and industry practitioners to explore new trends in the field of Computational Vision and Robotics. The participants showcased their works and research findings while exchanging ideas with other collaborators. The conference was a great success, receiving around 80 contributions from researchers across the world. Finally, 54 papers have been carefully shortlisted for publication in the conference proceedings based on rigorous review criteria. These papers are broadly categorized into the following parts:

i. Pattern Recognition and Robotic Vision
ii. Artificial Intelligence and Deep Learning Application
iii. Big Data Application in Robotics
iv. Deep Learning and Neural Network

Part I of these proceedings is entitled "Pattern Recognition and Robotic Vision." It comprises research works that are directly related to the field of robotics, which examines massive amounts of sensor data, providing the ability for self-driven navigation, forecasting maintenance needs, and enhancing human–robot communication. A total of 17 papers have been included in this part based on the problem they address. Specifically, papers range from systems based on Markov models, retrieval models based on particle swarm optimization, and cloud matching algorithms based on multi-scale feature extraction, to an overhead contact line monitoring system based on image processing and a UHV transmission line selection strategy aided by satellite remote sensing imagery.

Another 13 papers are included in the following part, entitled "Artificial Intelligence and Deep Learning Application." The research focus of these papers is on numerous applications across various industries, including personalized experiences, efficient decision-making, grey clustering, defect detection algorithms for printed packaging products based on Computer Vision, gesture information collection and gesture interaction system construction, optimization of moving object tracking algorithms, and simulation of high-resolution multi-spectral remote sensing images.

Next, 15 papers have been selected for Part III, entitled "Big Data Application in Robotics." This is a crucial research topic for robotics as it helps to analyze large amounts of sensor data, leading to autonomous navigation, predictive maintenance, and better human–robot interaction. It also enables machine learning algorithm adaptation, task optimization, and collaborative robotics. This part presents contributions of researchers in the form of a classification and detection model based on big data analysis and genetic algorithms, database construction combining big data analysis and machine learning algorithms, a recommendation algorithm based on big data analysis and cloud computing, teaching strategies based on big data technology, and the application of digital modeling technology based on multi-source data of power grids.

"Deep Learning and Neural Network" is the topic of the last part, which includes nine research papers selected on the basis of their applicability, covering image recognition, a melody recognition algorithm based on a backpropagation neural network model, a recognition algorithm based on a convolutional neural network, monitoring forest disturbance during power grid construction based on BJ-3 satellite imagery, and more.

With this, the proceedings of ICCVR-2023 present an extremely informative compilation of recent developments in these areas. I am sure readers will find many insights in this domain.

Contents

Part I Pattern Recognition and Robotic Vision

1 Design of Piano Automatic Accompaniment System Based on Markov Model (Yuqiu Wu and Chun Liu), p. 3
2 3D Visual Design of Music Based on Multi-audio Features (Duo Liu and Chun Liu), p. 11
3 Construction of Humming Music Retrieval Model Based on Particle Swarm Optimization (Hua Quan and Sanjun Yao), p. 21
4 Research on Audio Processing Method Based on 3D Technology (Kai Li, Yaping Tang, and Yuanling Ouyang), p. 31
5 Design and Optimization of Point Cloud Registration Algorithm Based on Stereo Vision and Feature Matching (Yifeng Wang, Shanshan Li, and Shuai Huang), p. 43
6 Design of 3D Point Cloud Real-Time Cloud Matching Algorithm Based on Multi-scale Feature Extraction (Shanshan Li, Yifeng Wang, and Shuai Huang), p. 53
7 Design of Digital Music Copyright Protection System Based on Blockchain Technology (Yirui Kuang and Yao Sanjun), p. 63
8 Personalized Music Recommendation Model Based on Collaborative Filtering Algorithm and K-Means Clustering (Huijia Peng and Sanjun Yao), p. 73
9 Simulation of Fuzzy Calculation Model of Music Emotion Based on Improved Genetic Algorithm (Yiming Wang, Yaping Tang, and Yuanling Ouyang), p. 83
10 Design and Implementation of Piano Performance Automatic Evaluation System Based on Support Vector Machine (Mingzhu Zhang and Chun Liu), p. 95
11 Simulation of Music Personalized Recommendation Model Based on Collaborative Filtering (Miao Zhong and Chaozhi Cheng), p. 105
12 Design and Optimization of Image Recognition and Classification Algorithm Based on Machine Learning (Zeng Dan and Chen Yi), p. 115
13 Design of Path Planning Algorithm for Intelligent Robot Based on Chaos Genetic Algorithm (Min Sun), p. 127
14 Design and Development of Rail Transit Overhead Contact Line Monitoring System Based on Image Processing (Zhigang Jiang, Wu Tan, Hao Tan, and Jun Huang), p. 137
15 Ultrasonic Signal Processing Method for Transformer Oil Based on Improved EMD (Yihua Qian, Qing Wang, Yaohong Zhao, Dingkun Yang, and Zhuang Yang), p. 147
16 Research on UHV Transmission Line Selection Strategy Aided by Satellite Remote Sensing Image (Wei Du, Guozhu Yang, Chuntian Ma, Enhui Wei, and Chao Gao), p. 159
17 Research on the Evaluation of the Teaching Process of Public Physical Education in Universities Based on Markov Model (Yilin Li), p. 169

Part II Artificial Intelligence and Deep Learning Application

18 Simulation Design of Matching Model Between Action and Music Tempo Characteristics Based on Artificial Intelligence Algorithm (Leizhi Yu, Yaping Tang, and Yuanling Ouyang), p. 181
19 Design and Optimization of Frequency Identification Algorithm for Monomelody Musical Instruments Based on Artificial Intelligence Technology (Wenxiao Wang and Sanjun Yao), p. 191
20 Design of Intelligent Evaluation Algorithm for Matching Degree of Music Words and Songs Based on Grey Clustering (Yipeng Li and Sanjun Yao), p. 201
21 Construction of Evaluation Model for Singing Pronunciation Quality Based on Artificial Intelligence Algorithms (Yaping Tang, Yunfei Gao, and Yuanling Ouyang), p. 209
22 Design and Optimization of Intelligent Composition Algorithm Based on Artificial Intelligence (Yuxi Chen and Chunqiu Wang), p. 219
23 Design of Computer-Aided Music Generation Model Based on Artificial Intelligence Algorithm (Wenyi Peng, Yaping Tang, and Yuanling Ouyang), p. 229
24 Construction of Electronic Music Classification Model Based on Machine Learning and Deep Learning Algorithm (Yipeng Li and Sanjun Yao), p. 239
25 Design of Piano Automatic Accompaniment System Based on Artificial Intelligence Algorithm (Xinwen Zhang and Chun Liu), p. 249
26 The Importance of the Application of Intelligent Management System to Laboratory Management in Colleges and Universities (Xu Feijian), p. 259
27 Design of Defect Detection Algorithm for Printed Packaging Products Based on Computer Vision (Shubao Zhou), p. 269
28 Leap Motion Gesture Information Collection and Gesture Interaction System Construction (Yuan Wang), p. 281
29 Optimization of Moving Object Tracking Algorithm Based on Computer Vision and Vision Sensor (Gongchao Liu), p. 293
30 Simulation Experiment of DPCM Compression System for High Resolution Multi-spectral Remote Sensing Images (Wei Du, Yi Wu, Maojie Tian, Wei Hu, and Zhidong Li), p. 303

Part III Big Data Application in Robotics

31 Construction of Music Classification and Detection Model Based on Big Data Analysis and Genetic Algorithm (Lingjian Tang and Sanjun Yao), p. 315
32 Simulation of Electronic Music Signal Identification Model Based on Big Data Algorithm (Sanjun Yao and Hongru Ji), p. 325
33 Design of Interactive Teaching Music Intelligent System Based on AI and Big Data Analysis (Chun Liu and Shanshan Li), p. 333
34 Research on Music Database Construction Combining Big Data Analysis and Machine Learning Algorithm (Sanjun Yao and Yipeng Li), p. 343
35 Construction and Optimization Design of Pop Music Database Based on Big Data Technology (Chunqiu Wang and Xucheng Geng), p. 353
36 The Application of Big Data in the Construction of Modern Vocal Education Score Database (Hongru Ji and Sanjun Yao), p. 363
37 Application of Big Data Analysis Technology in Music Style Recognition and Classification (Haiqing Wu and Feng Wu), p. 373
38 Construction of Piano Music Recognition System Based on Big Data Algorithm (Qianmei Kuang and Chun Liu), p. 381
39 Design of Music Recommendation Algorithm Based on Big Data Analysis and Cloud Computing (Yanxin Bai and Chun Liu), p. 391
40 Simulation of Sports Damage Assessment Model Based on Big Data Analysis (Xiaodong Li and Zujun Song), p. 401
41 Construction of Purchase Intention Model of Digital Music Products Based on Data Mining Algorithm (Wenjie Hu, Chunqiu Wang, and Xucheng Geng), p. 411
42 English Learning Analysis and Individualized Teaching Strategies Based on Big Data Technology (Yang Wu), p. 421
43 Research on Distributed Storage and Efficient Distribution Technology of High Resolution Optical Remote Sensing Data (Guozhu Yang, Wei Du, Wei Hu, Chao Gao, Enhui Wei, and Bangbo Zhao), p. 431
44 Research and Application of Digital Modeling Technology Based on Multi-source Data of Power Grid (Junfeng Qiao, Hai Yu, Zhimin He, Lianteng Shen, and Xiaodong Du), p. 441
45 Construction of Music Popular Trend Prediction Model Based on Big Data Analysis (Yitong Zhou and Chunqiu Wang), p. 451

Part IV Deep Learning and Neural Network

46 Construction of Personalized Music Emotion Classification Model Based on BP Neural Network Algorithm (Siyu Yan, Chunqiu Wang, and Xucheng Geng), p. 463
47 Music Main Melody Recognition Algorithm Based on BP Neural Network Model (Peng Tongxin and Chaozhi Cheng), p. 473
48 Design of Piano Score Difficulty Level Recognition Algorithm Based on Artificial Neural Network Model (Zhaoheng Chen and Chun Liu), p. 483
49 Design and Optimization of Improved Recognition Algorithm for Piano Music Based on BP Neural Network (Zhaoheng Chen and Chun Liu), p. 495
50 Design of Piano Music Type Recognition Algorithm Based on Convolutional Neural Network (Yuche Liu and Chun Liu), p. 505
51 Application of Emotion Recognition Technology Based on Support Vector Machine Algorithm in Interactive Music Visualization System (Feng Wu and Haiqing Wu), p. 515
52 Research on Image Classification Algorithm of Film and Television Animation Based on Generative Adversarial Network (Li Yang), p. 525
53 Study on Monitoring Forest Disturbance During Power Grid Construction Based on BJ-3 Satellite Image (Zijian Zhang, Peng Li, and Xiaobin Zheng), p. 535
54 Transformative Advances in English Learning: Harnessing Neural Network-Based Speech Recognition for Proficient Communication (Tianshi Ge and Yang Wu), p. 547

About the Editors

Prof. (Dr.) George A. Tsihrintzis received the Diploma of Electrical Engineering from the National Technical University of Athens, Greece (with honors) and the M.Sc. and Ph.D. degrees in Electrical Engineering from Northeastern University, Boston, Massachusetts, USA. He is currently a Professor at the University of Piraeus, Greece, and Head of its Department of Informatics. From 2008 to 2016, he also served as the Director of the Graduate Program of Study in "Advanced Computing and Informatics Systems" of the department. His current research interests include knowledge-based/intelligent software engineering, software personalization, information retrieval from databases, artificial intelligence, machine learning, pattern recognition, decision theory, and statistical signal processing and their applications in multimedia interactive services, user modeling, knowledge-based software systems, human–computer interaction and information retrieval. He has authored or co-authored over 350 research publications in these areas, which have appeared in international journals, book chapters, and conference proceedings, and has served as the principal investigator or co-investigator in several R&D projects. He has (co-)edited about 40 research books, and he has organized and chaired about 40 international conferences. He has supervised ten doctoral students who have received their doctoral degrees and is currently supervising an additional five students.

Prof. (Dr.) Margarita N. Favorskaya is a Professor and Head of the Department of Informatics and Computer Techniques at Reshetnev Siberian State University of Science and Technology, Russian Federation. She has been a member of the KES organization since 2010, and has served as an IPC member and Chair of invited sessions of over 30 international conferences. She serves as a reviewer for international journals (Neurocomputing, Knowledge Engineering and Soft Data Paradigms, Pattern Recognition Letters, Engineering Applications of Artificial Intelligence), an associate editor of Intelligent Decision Technologies Journal, International Journal of Knowledge-Based and Intelligent Engineering Systems, and International Journal of Reasoning-based Intelligent Systems, an Honorary Editor of the International Journal of Knowledge Engineering and Soft Data Paradigms, and a Reviewer, Guest Editor, and Book Editor (Springer). She is the author or co-author of 200 publications and 20 educational manuals in computer science. She co-authored/co-edited around 20 books/conference proceedings for Springer in the last 10 years. She supervised nine Ph.D. candidates and is presently supervising four Ph.D. students. Her main research interests are digital image and video processing, remote sensing, pattern recognition, fractal image processing, artificial intelligence, and information technologies.

Prof. (Dr.) Roumen Kountchev, Ph.D., D.Sc., is a professor at the Faculty of Telecommunications, Department of Radio Communications and Video Technologies at the Technical University of Sofia, Bulgaria. His scientific areas of interest are: digital signal and image processing, image compression, multimedia watermarking, video communications, pattern recognition, and neural networks. Prof. Kountchev has 341 papers published in magazines and conference proceedings (71 international); 15 books; 46 book chapters; 20 patents (3 international). He has been principal investigator of 38 research projects (6 international). At present, he is a member of the Euro Mediterranean Academy of Arts and Sciences (EMAAS) and President of the Bulgarian Association for Pattern Recognition (member of the International Association for Pattern Recognition). He is an editorial board member of: International Journal of Reasoning-based Intelligent Systems; International Journal Broad Research in Artificial Intelligence and Neuroscience; KES Focus Group on Intelligent Decision Technologies; Egyptian Computer Science Journal; International Journal of Bio-Medical Informatics and e-Health; and International Journal Intelligent Decision Technologies. He has been a plenary speaker at: WSEAS International Conference on Signal Processing 2009, Istanbul; WSEAS International Conference on Signal Processing, Robotics and Automation, University of Cambridge 2010, UK; WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision 2012, Istanbul, Turkey; International Workshop on Bioinformatics, Medical Informatics and e-Health 2013, Ain Shams University, Cairo, Egypt; Workshop SCCIBOV 2015, Djillali Liabès University, Sidi Bel Abbès, Algérie; International Conference on Information Technology 2015 and 2017, Al Zayatoonah University, Amman, Jordan; and WSEAS European Conference of Computer Science 2016, Rome, Italy.

Prof. (Dr.) Srikanta Patnaik is the Director of IIMT, Bhubaneswar. He received his Ph.D. (Engineering) on Computational Intelligence from Jadavpur University, India, in 1999 and has supervised 34 Ph.D. theses and more than 60 M.Tech. theses in the areas of machine intelligence, soft computing applications, and re-engineering. He has published more than 100 research papers in international journals and conference proceedings. He is the author of two textbooks and has edited 82 books and a few invited book chapters, published by leading international publishers such as Springer-Verlag and Kluwer Academic. He is the Editor-in-Chief of the International Journal of Information and Communication Technology and the International Journal of Computational Vision and Robotics, published by Inderscience Publishing House, England, and also Editor-in-Chief of the book series "Modeling and Optimization in Science and Technology" published by Springer, Germany.

Part I

Pattern Recognition and Robotic Vision

Chapter 1

Design of Piano Automatic Accompaniment System Based on Markov Model Yuqiu Wu and Chun Liu

Abstract Based on the increasing demand for spiritual and cultural life in modern society and the deepening of cultural system reform, computer automatic composition has become an increasingly important topic. In this paper, an automatic piano accompaniment scheme based on hidden Markov model is proposed, which expresses the link probability of adjacent pitch melody elements and provides basic data for melody calculation. The test results show that the stability of the piano automatic accompaniment system constructed in this paper is still at a high level under the condition of many transaction sets. From the comparative experiments of the algorithm on the sample set, it can be found that the accuracy of the algorithm in this paper is about 95% on the test set and 95.78% on the experimental set. Compared with the traditional composition method based on notes, the composition based on HMM can improve the logical rigor and aesthetic feeling of melody, and can provide algorithm support for the construction of piano automatic accompaniment system. Keywords Markov model · Piano accompaniment · Algorithm composition

1.1 Introduction

Now that computer technology has become a mainstream discipline in today's society, people are gradually accustomed to completing various tasks with intelligent computers, and audio recognition by computer is a core research topic within speech recognition [1]. Research on automatic piano accompaniment systems belongs to the field of "automatic harmony for melody". It is a research branch of algorithmic composition, namely the study of multi-voice algorithmic composition systems [2]. Automatic composition attempts to use a formal process to minimize human involvement in music creation by using computers. Algorithmic composition refers to using some logical process to control the generation of music [3]. A Markov transition table can be constructed from music of a certain style, with the note set as the state space of the Markov model; the probability of the next note can then be calculated from the state transition matrix [4, 5]. The common pitch detection method is based on the autocorrelation function, a time-domain detection algorithm. Its calculation is simple, but it is prone to pitch-doubling and pitch-halving errors. Huang et al. proposed a method of piano audio signal feature recognition based on the wavelet packet transform, which reduced signal noise by spectral feature analysis and decomposed the characteristics of the piano audio signal by threshold detection, thus improving the intelligent analysis of audio [6, 7]. This method, however, can only identify noise in the voiced playing part and cannot identify noise in other periods. Wang et al. proposed an image denoising algorithm based on a convolutional neural network: starting from the noise captured in the image, the noisy image and the generated image are fed to a discriminator together and the model parameters are systematically optimized, until the generator and discriminator reach a balanced state [8]. This method works well for identifying low-threshold noise on both sides of the range, but it cannot accurately extract noise mixed with the audio. Dong et al. put forward the thinking model of possibility construction space and applied this theory of creative thinking to the field of music composition, providing a new way for research on intelligent composition [9].
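To make the transition-table idea above concrete, the sketch below builds a first-order Markov model over a toy note sequence and samples the next note from the estimated transition probabilities. The note names and the training melody are illustrative placeholders, not data from the paper.

```python
import random
from collections import defaultdict

def build_transition_table(melody):
    """Count note-to-note transitions and normalize them into probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(melody, melody[1:]):
        counts[prev][nxt] += 1
    table = {}
    for prev, nxts in counts.items():
        total = sum(nxts.values())
        table[prev] = {note: c / total for note, c in nxts.items()}
    return table

def sample_next(table, current, rng=random):
    """Draw the next note according to the transition probabilities of `current`."""
    notes, probs = zip(*table[current].items())
    return rng.choices(notes, weights=probs, k=1)[0]

# Toy training melody (placeholder); the state space is simply its set of notes.
melody = ["C4", "D4", "E4", "C4", "E4", "G4", "E4", "D4", "C4"]
table = build_transition_table(melody)
print(table["E4"])           # e.g. {'C4': 0.33..., 'G4': 0.33..., 'D4': 0.33...}
print(sample_next(table, "C4"))
```

In a full system the table would of course be estimated from a corpus in the chosen style rather than from a single toy melody.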

1.2 Methodology

1.2.1 Related Theory of Algorithmic Composition

A good algorithm produces effective, standardized output within a specified time from standardized input data [10]. At the same time, by measuring the space complexity and time complexity of the algorithm, its strengths and weaknesses can be assessed so that it can be continuously upgraded and optimized. Algorithmic composition is essentially a composition method that takes different algorithms as its creative technical means, using logical and algorithmic processes formulated by composers or program designers to control the generation of music data [11]. Once the algorithm for the type of music to be generated is set, human interference in the generation process is kept to a minimum, and the computer is allowed to control the process and produce musical works as far as possible. The diversity of means of music creation makes the structure and characteristics of modernist musical works more complicated than those of any previous historical period. In terms of compositional technique, modernist music broke the creative laws of Western traditional music, such as harmony, dynamic contrast and rhythmic change, which disrupted the audience's ingrained listening habits, leading to divided opinions on modernist musical aesthetics and a sense of estrangement from modern musical works. This can be briefly summarized as an insurmountable distance between the aesthetics of modernist music creation and traditional musical aesthetics. For the artistic expression of composed works, the key lies in the creator's understanding and expression of human compositional thinking when designing the algorithms and their constraining rules. Artificial intelligence music creation based on algorithmic composition has been criticized by the industry on ethical and aesthetic grounds. Returning to the starting point of AI music creation: when artificial intelligence creates music it does not compose notes as such, but only follows the theory built into the program, and its melody exists independently. Although harmony and other musical elements are not ignored, the result is ultimately a mathematical phenomenon produced from repeated melodies.

1.2.2 Automatic Piano Accompaniment System Based on HMM

The common pitch detection method is based on the autocorrelation function, a time-domain detection algorithm. Its calculation is simple, but it is prone to pitch-doubling and pitch-halving errors [12]. Sound intensity is the strength of a sound, that is, the vibration amplitude of the vibrating body. A sound that is too strong evokes strong emotions, while a sound that is too weak demands close attention and makes listeners feel tense, so soft music needs medium-strong sound to express it. The framework of piano playing audio recognition is shown in Fig. 1.1.

In signal processing it is necessary to calculate the short-term energy change of the signal. When short-term amplitude is used to represent the signal energy, it is difficult to ensure the accuracy of the energy calculation because the effective length of the audio data is uneven. Therefore, the audio data must first be divided into frames, turning it into uniform audio segments. Assume that the time-domain signal of the music waveform is $x(n)$, and that the signal of the $i$-th frame obtained after framing with the window function $w(n)$ is $y_i(n)$. Then

$$y_i(n) = w(n)\, x\big((i-1)\cdot \mathrm{inc} + n\big), \quad 1 \le n \le L,\ 1 \le i \le f_n \tag{1.1}$$

$$n = 1, 2, \ldots, L, \quad i = 1, 2, \ldots, f_n \tag{1.2}$$

where $w(n)$ is the window function, $y_i(n)$ is the amplitude of one frame, $L$ is the frame length, $\mathrm{inc}$ is the frame shift, and $f_n$ is the number of frames after framing.


Fig. 1.1 Piano playing audio recognition framework

Calculate the short-term average energy of any frame $y_i(n)$ of the music signal according to the following formula:

$$E(i) = \sum_{n=0}^{L-1} y_i^2(n) \tag{1.3}$$
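As a concrete illustration of Eqs. (1.1)–(1.3), the following sketch frames a mono signal with a Hamming window and computes the short-term energy of each frame. The frame length and frame shift are arbitrary example values, not parameters prescribed by the paper.

```python
import numpy as np

def short_term_energy(x, frame_len=400, inc=160):
    """Split x into overlapping windowed frames and return E(i) per frame (Eq. 1.3)."""
    w = np.hamming(frame_len)                          # window function w(n)
    n_frames = 1 + max(0, (len(x) - frame_len) // inc)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        # y_i(n) = w(n) * x((i-1)*inc + n), here with 0-based frame index (Eq. 1.1/1.2)
        frame = x[i * inc : i * inc + frame_len] * w
        energies[i] = np.sum(frame ** 2)               # E(i) = sum_n y_i(n)^2
    return energies

# Example: 1 s of a 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
print(short_term_energy(x)[:5])
```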

By arranging all the related elements of the audio signal's waveform in matrix form, the signal can identify its own noise adaptively. In this process the size and characteristics of the elements do not change; only the position information of each element changes.

1.3 Result Analysis and Discussion

Automatic piano accompaniment is a research branch of algorithmic composition, namely the study of multi-voice algorithmic composition systems. Automatic composition attempts to use a formal process to minimize human involvement in music creation by using computers. In order to fully measure the performance of the algorithm, two indicators were proposed for evaluating the annotation results, namely the proportion of incorrect fingering and the unreasonable rate, and the results were compared with other existing algorithms on a self-built dataset. In addition to the consistency rate with manually annotated fingering sequences, the article also proposes two new evaluation indicators, the proportion of non-elastic fingering and the unreasonable rate, to measure the flexibility and performance efficiency of the fingering. To analyze and verify the feasibility of the above algorithm, the short-time autocorrelation algorithm and the short-time amplitude difference algorithm were compared with this algorithm; the error results are shown in Fig. 1.2. Compared with the other two algorithms, this algorithm has clear advantages. For the harmony HMM, the observed values are basically divided into segments; however, if a bar is a single-beat independent sound-group structure in the rhythm-contrast HMM, then the observed value of this bar needs to be calculated by a special method. The accuracies of the different algorithms on the test set and the experimental set are shown in Figs. 1.3 and 1.4, respectively. The accuracy of the algorithm in this paper is about 95% on the test set and 95.78% on the experimental set, which is generally a high level. The database cannot recognize the observation status of the current segment, so these absent observations are replaced, according to musical knowledge, with observations the database can identify, ensuring the smooth completion of the music arrangement stage.

Fig. 1.2 Comparison of frame fundamental frequency extraction error results of different algorithms


Fig. 1.3 Accuracy of the algorithm on the test set

Fig. 1.4 Accuracy of the algorithm on the experimental set


Fig. 1.5 System stability

Three different methods are used for comparative experiments, and the stability of the piano automatic accompaniment system based on the HMM is obtained, as shown in Fig. 1.5. It can be seen that the stability of the system constructed in this article remains high even as the number of transaction sets grows. From the comparative experiments of the algorithm on the sample set, the accuracy of the algorithm in this article is about 95% on the test set and 95.78% on the experimental set. Overall, the performance of the piano automatic accompaniment system is excellent and reaches the expected level. Compared with the traditional approach of analyzing a large number of works to obtain transition-table data, this method avoids the huge workload such analysis would generate. Compared with the creative mode of "input existing works and output simulated works", it avoids the problem of generalizing from a part to the whole. At the same time, because the initial melody element of the music is set manually, it not only achieves automation but also retains a certain degree of flexibility.

1.4 Conclusions

Piano automatic accompaniment is a research branch of algorithmic composition, namely the study of multi-voice algorithmic composition systems. Automatic composition is an attempt to use a formal process to minimize human participation in music creation by using computers. This paper proposes an automatic piano accompaniment scheme based on an HMM. The results show that the accuracy of the algorithm is about 95% on the test set and 95.78% on the experimental set. Moreover, the stability of the piano automatic accompaniment system constructed in this paper remains at a high level, and the expected effect has been achieved. The music generated by this algorithm fully considers the specific problems of musical structure and the organization of musical units; the generation process is always driven by a particular style, so the target is clear. The algorithm shows limitations when generating note-dense melodies, because more notes in a bar means more transitions between pitch melody elements. In future work, the melody elements can be expanded along the dimension of the number of notes, and melody elements with appropriate pitches can be designed.

References

1. A.J. Kirke, Applying quantum hardware to non-scientific problems: Grover's algorithm and rule-based algorithmic music composition. Int. J. Unconv. Comput. 14(3/4), 349–374 (2019)
2. L. Döbereiner, Between the abstract and the concrete: a constraint-based approach to navigating instrumental space. Comput. Music J. 43(1), 8–20 (2020)
3. W. Meng, R. Qu, Automated design of search algorithms: learning on algorithmic components. Expert Syst. Appl. 185(6), 115493 (2021)
4. Y. Zhang, Application of fuzzy neural network in music recognition. Rev. Fac. Ing. 32(6), 171–180 (2017)
5. A. Baro, P. Riba, J. Calvo-Zaragoza et al., From optical music recognition to handwritten music recognition: a baseline. Pattern Recogn. Lett. 123(5), 1–8 (2019)
6. Y. Wang, Research on music recognition algorithm based on RBF neural network. Rev. Fac. Ing. 32(8), 707–712 (2017)
7. Z. Huang, Study on the role of computer aided audio recognition in music conductor. Bol. Tec./Tech. Bull. 55(16), 476–483 (2017)
8. Y. Wang, Research on handwritten note recognition in digital music classroom based on deep learning. J. Internet Technol. 2021(6), 22 (2021)
9. Y. Dong, X. Yang, X. Zhao et al., Bidirectional convolutional recurrent sparse network (BCRSN): an efficient model for music emotion recognition. IEEE Trans. Multimed. 21(12), 3150–3163 (2019)
10. V. Chaturvedi, A.B. Kaur, V. Varshney et al., Music mood and human emotion recognition based on physiological signals: a systematic review. Multimed. Syst. 2022(1), 28 (2022)
11. S.W. Hsiao, S.K. Chen, C.H. Lee, Methodology for stage lighting control based on music emotions. Inf. Sci. 412–413, 14–35 (2017)
12. Y. Zhao, Research on resonance state of music signal based on filter model. Bol. Tec./Tech. Bull. 55(15), 206–210 (2017)

Chapter 2

3D Visual Design of Music Based on Multi-audio Features

Duo Liu and Chun Liu

Abstract In order to better transform the auditory image of music into a visual image and to serve the values and aesthetics of modern listeners, the perceptual auditory standards of music are integrated more deeply with emerging multimedia, forming music visualization. This paper proposes a three-dimensional visual design method for music based on multiple audio features: building on music style classification, the various features of a piece of music are extracted and then given a comprehensive visual design, so that the resulting images can express more musical information. In this study, the tonal characteristics of notes are treated as identification marks for the notes, which improves the sensitivity and ability of tone recognition and promotes the development of music visualization.

Keywords Audio features · Music information · Visualization

2.1 Introduction

Music, as a unique artistic form of emotional expression, focuses on touching the soul of the audience and on a re-experience of a virtual world; this experience is all-round, not limited to hearing [1]. In the current high-speed Internet information age, computer technology is booming, and music visualization has received increasing attention from both academia and the market, which in turn has spawned a series of visualization applications [2]. After a piece of music is composed, interpretations such as instrumental performance or the singing voice convey it to the audience. In order to make the audience fully understand the emotions and thoughts expressed by the music, creators and performers adopt different auxiliary forms of expression [3]. The development of visualization provides a new channel for the expression of music, using specific rules to interpret it and produce reproducible visual effects. In order to better transform the auditory image of music into a visual image and serve the values and aesthetics of modern listeners, the perceptual auditory standards of music are integrated more deeply with emerging multimedia, forming music visualization, that is, the transformation from music to images [4]. It is a process-oriented presentation method that provides a brand-new way of interpreting and performing music for music appreciation. Music visualization is a non-subjective interpretation and judgment of musical expression with vision at its core, and a presentation technology for understanding, analyzing and comparing the expressive power and internal structure of music [5]. Data visualization is an intuitive, simple and reasonable way to summarize and present data, and its main purpose is to convey information more clearly and efficiently by means of graphics or images. A reasonable visualization result can help users analyze and interpret data, making originally complex data easy to understand and use [6]. Music expresses people's thoughts in auditory form, but when the artistic conception of music is profound and difficult to understand, music visualization solves this problem [7]. This technology makes music more vivid and helps the public better appreciate the charm of musical art. Music is understood differently from different perspectives: some studies treat music as an abstract art form, whereas in synaesthesia research, hearing and vision are considered very close in underlying nature and psychological characteristics, so the two art forms are often used to assist each other to achieve better results [8]. Musical notes are used not only to learn music but also to search and classify songs by note recognition, thus promoting the diversified development of music [9]. In order to appreciate music from the perspective of data and image visualization and to enhance the understanding, analysis and processing of music, this article puts forward a 3D visualization design method for music based on multiple audio features, which takes the tonal characteristics of notes as their identification marks, improves the sensitivity and ability of tone recognition, and promotes the development of the field of music visualization.

2.2 Methodology

2.2.1 Audio Visualization Method

In the process of audio visualization, expressing the characteristics of the audio through different shapes, sizes, colors and animations allows visual psychology and auditory psychology to reach a unified state, so that the audio can be visualized more appropriately. When visualizing multiple pieces of audio, it is also necessary to analyze the degree of correlation between them. The metrics used include edit distance, the length of the dynamic time warping path, the Euclidean distance between musical tonalities, and so on; the correlation is finally presented through visual graphics. The artistic creation of music visualization is not simple computer 3D animation, but the creation of real three-dimensional images in an immersive virtual environment. The extension of the creative mode brought by the visual art of music mainly lies in integrating immersive art with virtual technology, integrating modern media forms, merging images, sounds and pictures into one, and giving the audience a full sense of audio-visual immersion. Data visualization presents abstract data in the form of intuitive graphics or images, and the visualized data can convey the information in the data more clearly and effectively. It uses computer graphics and related technologies to convert data into graphics or images displayed on the screen, so that users can fully grasp the information the data conveys by observing the visualization [10]. To create a sense of immersion in music visualization, a virtual environment must be presented to give the audience a high-level experience, which requires combining the image system with the projection system. Music visualization is inseparable from emotion, and images directly touch human perception, so images become the core element of immersion. In order to convey the information in the data effectively, visualization needs to give full consideration to the artistic form of presentation on the basis of accurate data characteristics, so as to lead users to observe deeply the obscure and complicated characteristics and associations in the data set. The graphic elements of audio features include not only line type and color but also the size of the shapes: a shape that is too large gives a sense of oppression, while one that is too small may scatter the audience's visual focus and cause visual confusion; the size of the shape is also directly related to the rendering effect of the animation, and a shape that is too big or too small weakens its impact. In the process of visualization, only by balancing the form and content of the visual presentation can flashy or boring results be avoided. Music visualization is reflected not only in the creative process; it is also a refreshing experience for the audience and participants who appreciate the works. The persistence of images keeps visual works responsive in real time when users interact, and the computer system processes and computes them to create a virtual environment.
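Of the correlation metrics listed above, dynamic time warping is the least obvious to implement from scratch, so here is a minimal sketch that computes a DTW alignment cost between two one-dimensional feature sequences. It is a generic textbook formulation under simple assumptions (absolute-difference local cost, no path constraints), not the specific metric implementation used by the authors.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Two pitch contours of different lengths describing a similar melodic shape.
print(dtw_distance([60, 62, 64, 65], [60, 60, 62, 64, 64, 65]))  # small cost
print(dtw_distance([60, 62, 64, 65], [72, 71, 69, 67]))          # larger cost
```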

2.2.2 Multi-audio Feature Extraction of Music

Feature Extraction and Analysis of Intelligent Segmentation of Music Notes. Musical notes propagate through air vibration, using various media for error-free propagation. A sound is usually distinguished by its frequency, wavelength, phase and volume [11]. For musical notes, frequency, amplitude, sound intensity, pitch, sound length and melody are the main physical characteristics; each of them reveals a different aspect of the notes, and these basic characteristics play a major role in forming chords, rhythms and melodies at the middle and high levels of music. Given that most visualization effects are based on a single feature, this article proposes a music visualization method based on multiple features, which selectively extracts multiple features from the music in the music library and constructs a feature set. Then, according to the feature set, the multi-feature visual design of different types of music is realized. The sound source separation technology used in audio processing is shown in Fig. 2.1.

Fig. 2.1 Sound source separation technology

The intelligent note segmentation and recognition method studied in this article preprocesses the notes to be segmented before cutting, eliminates tones that are redundant in the melody and do not meet the cutting conditions, and groups notes with the same characteristics after feature extraction. This reduces repetitive segmentation and recognition and shortens the processing time while maintaining segmentation efficiency. The musical note feature recognition model is shown in Fig. 2.2. Through computer-recognizable parameters in the music or images, combined with parameters pre-built by designers to express the content, new images or music are created.

Fig. 2.2 Music note feature recognition model

For a hidden unit, let $x_t$ denote the input at step $t$. The activation value of the current unit is

$$s = f\big(U x_t + W s_{t-1}\big) \tag{2.1}$$

where $f$ is the activation function (ReLU is used in this article), and the output of step $t$ is computed by the Softmax layer. The value $i_t$ of the input gate controls how much of the input at the current time step can enter the memory unit, and is computed as

$$i_t = \sigma\big(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i\big) \tag{2.2}$$

where $W_{xi}$, $W_{hi}$ and $W_{ci}$ are the connection weights related to the input gate and $b_i$ is the bias term.
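A minimal NumPy sketch of the input-gate computation in Eq. (2.2) follows; the dimensions and randomly initialised weights are placeholders for illustration only, not values used by the authors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16                        # example sizes, not from the paper

# Gate parameters W_xi, W_hi, W_ci and bias b_i (Eq. 2.2)
W_xi = rng.standard_normal((d_hid, d_in)) * 0.1
W_hi = rng.standard_normal((d_hid, d_hid)) * 0.1
W_ci = rng.standard_normal((d_hid, d_hid)) * 0.1
b_i = np.zeros(d_hid)

x_t = rng.standard_normal(d_in)            # current input x_t
h_prev = np.zeros(d_hid)                   # previous hidden state h_{t-1}
c_prev = np.zeros(d_hid)                   # previous cell state c_{t-1}

i_t = sigmoid(W_xi @ x_t + W_hi @ h_prev + W_ci @ c_prev + b_i)
print(i_t.shape, i_t.min(), i_t.max())     # values lie in (0, 1): how far the gate opens
```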

Framing is performed to divide the time-domain discrete signal into overlapping frames:

$$X_{STFT}(k, n) = \sum_{m=0}^{N-1} x(n-m)\, w(m)\, e^{-j 2 k \pi m / N} \tag{2.3}$$

where $k$ is the frequency index, $n$ is the center of the short-time Fourier transform window, and $w(m)$ is the Hamming window. Each frequency bin of $X_{STFT}(k, n)$ is mapped onto a twelve-dimensional vector, each dimension representing the intensity of one semitone class. The mapping formula is

$$p(k) = \left[\, 12 \log_2\!\left( \frac{k \cdot f_{sr}}{N \cdot f_{ref}} \right) \right] \bmod 12 \tag{2.4}$$

where $f_{ref}$ is the reference frequency and $f_{sr}$ is the sampling rate. Accumulating the energies of the frequency bins assigned to each pitch class gives the value of each pitch-class contour component for each time segment:

$$PCP(p) = \sum_{k:\, p(k) = p} |X(k)|^2, \quad p = 1, 2, 3, \ldots, 11 \tag{2.5}$$

The pitch-class contour features represent the tonal content by a twelve-dimensional vector, which reflects the relative strength of the notes of the chromatic scale within each of the twelve semitone classes.
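The sketch below strings Eqs. (2.3)–(2.5) together for a single frame: it takes the magnitude spectrum of one windowed frame, maps each bin to a pitch class with the standard reference-frequency mapping, and accumulates |X(k)|² per class. The reference frequency, sampling rate and window length are example values, not parameters fixed by the paper.

```python
import numpy as np

def pitch_class_profile(frame, fs, f_ref=440.0):
    """12-dimensional PCP of one frame: sum of |X(k)|^2 per semitone class."""
    N = len(frame)
    X = np.fft.rfft(frame * np.hamming(N))                # windowed spectrum (Eq. 2.3)
    pcp = np.zeros(12)
    for k in range(1, len(X)):                            # skip the DC bin
        f_k = k * fs / N                                  # frequency of bin k
        p = int(round(12 * np.log2(f_k / f_ref))) % 12    # semitone class (Eq. 2.4)
        pcp[p] += np.abs(X[k]) ** 2                       # accumulate energy (Eq. 2.5)
    return pcp

fs, N = 22050, 4096
t = np.arange(N) / fs
frame = np.sin(2 * np.pi * 440 * t)                       # a pure A4 tone
print(np.argmax(pitch_class_profile(frame, fs)))          # 0, the class of the reference A
```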

2.3 Result Analysis and Discussion

In artistic expression, each medium has its own logic of development, and different media can find the similarities in their cues and match each other during the transition. For example, the pitch, intensity, timbre, melody and rhythm of music can establish rich counterpoint relationships with characteristics such as color, brightness, shape and volume in visual experience. Music visualization can be divided into two categories according to the dimension of the visual space: two-dimensional and three-dimensional. Two-dimensional music visualization requires low data dimensionality and is relatively simple to realize, while three-dimensional music visualization, with its strong expressive power, is suitable for constructing complex objects and is mainly used where a high degree of immersion is required. In two-dimensional visualization, static images show the correlation between parameters along two dimensions of the audio, or mine structural and textual features, and there are also dynamic representations of images changing over time. Three-dimensional visualization, however, transcends the dimensional limits of planar space and can superimpose the flow of time on two non-temporal dimensions. Each data sample contains more than 50 pitch values, and the effective data volume of each data set exceeds 1500. The experimental results are judged manually according to the waveform. The data set selected for the experiment is evenly distributed over all features, the samples are representative, and the experimental results are general and of reference value. The change of experience from auditory to visual changes not only the audience's experience but also the way music is realized. Figure 2.3 shows the subjective scores given by observers on the 3D visualization effect of music. The music visualization method in this article obtains a higher score, so the proposed 3D music visualization method can be considered to achieve a better user experience.

Fig. 2.3 Subjective assessment of 3D visualization effect of music given by observers

Pitch and sound intensity are two different concepts: pitch refers to the perceived highness of a musical sound, while sound intensity refers to its strength. The pitch of a note is determined by the frequency of the sound: the higher the vibration frequency, the higher the pitch, and the lower the frequency, the lower the pitch. Feature extraction must be used in the digital signal analysis of music, but the features to be extracted differ with the needs and purposes. In the field of musical notes, the pitch is set reasonably according to the melody, but some notes must be embodied at a specific pitch; these sounds are called musical notes, and pitch is also the basis of note segmentation and recognition. Sound length is the concrete embodiment of the duration of music and is what best mobilizes human perception. In music segmentation, syncopated sound length is the first step of note segmentation and recognition, and it is used to improve segmentation accuracy. Figure 2.4 compares the audio feature recognition accuracy of different algorithms; the z-axis in the figure distinguishes the algorithm in this paper from the traditional SVM algorithm. Compared with the SVM algorithm, the 3D music visualization method proposed in this article improves the accuracy by 25.77%. Once the energy of a note is too small, it is ignored during intelligent segmentation and recognition; if the recognition data of a note is missing from the segmentation result, that note can be re-cut, improving the intelligent segmentation result and the accuracy of note segmentation. Note height is one of the four properties of sound. The sound spectrum of the same melody differs from person to person and from one piece of equipment to another, but the sound intensity of the notes does not change; if the sound intensity of the notes is changed even slightly, the melody of the music also changes. In the visual display of multiple pieces of audio, a reasonable form of expression presents the correlation between the data according to their similarity: when the similarity is high, similar visual elements are presented, forming the same visual-psychological feeling; when the similarity is low, contrasting visual elements are used to guide users to explore the differences in data characteristics.


Fig. 2.4 Audio feature recognition accuracy of different algorithms

Different methods are used to compare the processing time of music 3D visualization, as shown in Fig. 2.5; the z-axis in the figure distinguishes the algorithm in this paper from the traditional SVM algorithm. As can be seen from Fig. 2.5, the processing time of traditional music 3D visualization grows quickly as the amount of audio feature information increases. The audio recognition algorithm proposed in this article has obvious advantages over the SVM, although its processing time also rises.

Fig. 2.5 Time consumption of music 3D visualization processing with different methods

Compared with traditional methods, the advantage of the intelligent note segmentation and recognition method studied in this article is that it adopts a more advanced feature extraction method and a voice-endpoint-based note segmentation and recognition method, making up for the shortcomings of traditional methods in note segmentation. The intelligent note segmentation and recognition method based on audio feature technology applies hierarchical filtering and segmentation to a musical melody, which reduces the note segmentation workload and improves the efficiency of note segmentation and recognition.

2.4 Conclusions

In the current high-speed Internet information age, computer technology is booming, and music visualization has received increasing attention from both academia and the market, giving birth to a series of visualization applications. Musical notes are used not only to learn music but also to search and classify songs by note recognition, thus promoting the diversified development of music. In order to better transform the auditory image of music into a visual image, the perceptual auditory standards of music are integrated more deeply with emerging multimedia, forming music visualization. This article proposes a 3D visual design method for music based on multiple audio features, which takes the tonal characteristics of notes as their identification marks, improves the sensitivity and ability of tone recognition, and promotes the development of the field of music visualization. The results show that the accuracy of the proposed 3D music visualization method is 25.77% higher than that of the SVM algorithm. Compared with traditional methods, the advantage of the intelligent note segmentation and recognition method studied in this article is that it adopts a more advanced feature extraction method and a voice-endpoint-based note segmentation and recognition method, making up for the shortcomings of traditional methods in note segmentation. The research results can be used to guide the design and assessment of music information visualization systems, and also contribute to research on high-dimensional data description and the interactive design of multimedia application systems.

Acknowledgements This research is supported by the "5G Era—Intelligent Platform for Artificial Intelligence Piano Teaching" project of the Hunan University Students' Innovation and Entrepreneurship Training Program in 2020 (Project No.: 3534).

References

1. H.B. Lima, C. Santos, B.S. Meiguins, A survey of music visualization techniques. ACM Comput. Surv. 54(7), 1–29 (2021)
2. A. Dorochowicz, B. Kostek, A study of music features derived from audio recordings examples—a quantitative analysis. Arch. Acoust. 43(3), 505–516 (2018)
3. V. Chaturvedi, A.B. Kaur, V. Varshney et al., Music mood and human emotion recognition based on physiological signals: a systematic review. Multimed. Syst. 2022(1), 28 (2022)
4. Y. Fan, K. Fang, R. Sun et al., Hierarchical auditory perception for species discrimination and individual recognition in the music frog. Curr. Zool. 2021(5), 5 (2021)
5. A. Baro, P. Riba, J. Calvo-Zaragoza et al., From optical music recognition to handwritten music recognition: a baseline. Pattern Recogn. Lett. 123(5), 1–8 (2019)
6. X. Wang, Research on the improved method of fundamental frequency extraction for music automatic recognition of piano music. J. Intell. Fuzzy Syst. 35(3), 1–7 (2018)
7. X. Bai, Music style recognition system based on the data mining. Basic Clin. Pharmacol. Toxicol. 2019(1), 124 (2019)
8. Y. Wang, Research on handwritten note recognition in digital music classroom based on deep learning. J. Internet Technol. 2021(6), 22 (2021)
9. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Process. 11(7), 884–891 (2017)
10. H. Nordström, P. Laukka, The time course of emotion recognition in speech and music. J. Acoust. Soc. Am. 145(5), 3058–3074 (2019)
11. P. Yao, Key frame extraction method of music and dance video based on multicore learning feature fusion. Sci. Program. 2022(7), 1–8 (2022)

Chapter 3

Construction of Humming Music Retrieval Model Based on Particle Swarm Optimization

Hua Quan and Sanjun Yao

Abstract In daily life we may encounter the following situation: a melody overheard by chance feels familiar yet strange. You must have heard the song, but you cannot remember its title or singer; only the melody remains. Building on particle swarm optimization (PSO), this paper further studies humming-based music retrieval and tests the pitch contrast between male and female voices under the same weighting coefficients. The experimental results show that when the male-voice weighting coefficients are applied to female humming signals, the low-frequency part is over-weighted, producing noise points below the original melody contour. Various types of original electronic music data are collected, denoised, and then divided into frames with endpoint detection to obtain effective electronic music signals. With reference to the analysis of male voices, the fundamental frequency range reserved for female humming signals should mainly be selected from sub-bands 2, 3 and 4; since most female voices fall within the 120–220 Hz sub-band range, the sub-band weighting coefficient for this range should be set slightly larger. It is shown that the algorithm used in the model can realize the query function.

Keywords Particle swarm optimization · Humming music · Retrieval model

3.1 Introduction

With the informatization of society, advances in communication technology, significant improvements in Internet transmission speed, the rapid development of multimedia technology, and the continuous emergence of new and effective multimedia coding technologies, multimedia data has become the main information transmitted on the network [1]. In daily life we may encounter situations where a melody heard by chance feels familiar yet strange: you must have heard the song before, but you cannot recall its title or singer, only its melody. However, the retrieval of multimedia data is still performed using text-based retrieval, so researching more efficient retrieval technologies that conform to human communication habits is an urgent need for the development of information technology. Currently, music search services on the Internet are essentially just text searches that return relevant results by matching keywords such as song titles, singer names or lyrics. In traditional music retrieval technology, music is generally represented by its attributes; to search, people are required to supply at least one piece of information such as the music title, singer, publisher or lyrics. This means that users must remember one of these items in order to search at all, which obviously cannot meet people's needs: the search method is not natural enough, and if this information is forgotten, the desired music cannot be found [2, 3]. Humming-based music retrieval is another popular content-based retrieval method for large music databases. When a person wants to find a piece of music but knows neither the song title nor the singer and remembers only a certain melody, they hope to find the music by humming it [4]. Compared with the traditional search method based on external information such as song titles and singers, it searches according to the inherent characteristics of music such as melody and rhythm. Investigation and analysis suggest that the best-performing humming music classification models at present are based on the particle swarm optimization algorithm. In this way of querying, users only need to hum a music clip into a microphone, and the computer searches for the target music based on the hummed content. This article therefore designs, on the basis of PSO, a system that finds the desired song from a hummed or whistled tune; it is very useful when you want to find a song in a music library but have forgotten its title or singer. Compared with the traditional keyword-based user interface, humming music retrieval thus gives users a better search experience [5]. To establish a humming music retrieval model, it is necessary to analyze the shortcomings of current music classification models.

3.2 Related Music Knowledge in Humming Music Retrieval

3.2.1 Elements of Sound

The sounds used in music are a series of sounds with fixed frequencies, or fixed pitches. This series of sounds, formed over the long development of musical practice, constitutes a fixed system that is used to express musical ideas and shape musical images. The four elements of musical sound are sound intensity, pitch, timbre and waveform envelope; the ideological content and artistic beauty of a musical work can only be expressed through these elements.

(1) Sound intensity. Sound intensity is the loudness people feel when they hear a sound, that is, whether the sound is, as we usually say, big or small, heavy or light. It is one of the subjective criteria by which the human ear evaluates sound. Music is organized by the length, strength, fixity and accuracy of sounds and their relationships; the repetition, in a certain order, of equal time segments with and without stress is called the beat.

(2) Pitch. The lowest frequency in a sound wave is called the fundamental, and frequencies that are integer multiples of the fundamental are called overtones. Pitch, or tone height, is the human ear's subjective evaluation scale for the height of a sound, and its objective counterpart is the frequency of the sound wave. Unlike the relationship between sound intensity and amplitude, pitch and frequency are basically consistent. When an instrument vibrates, its pitch is determined by the fundamental, while the frequencies and intensities of the overtones emitted at the same time determine the timbre of the music.

(3) Timbre. Timbre is the acoustic characteristic of different voices, different musical instruments and different combinations of them. It is determined by the overtone spectrum of the music, which is to say by its waveform, because a musical waveform is not a simple sine wave but a complex wave. Through contrast and change of timbre, the expressive power of music can be enriched and strengthened.

(4) Waveform. Changes in the waveform envelope also affect the timbre of musical instruments; playback equipment therefore needs good transient-following ability, otherwise the natural envelope of the music is distorted. Changes in sound intensity also play an important role in shaping the musical image, and the function of harmony directly affects the strength, the tightness of the rhythm and the power of the music.

The relationship between pitch and frequency is logarithmic, and pitch is measured in semitones, so the conversion from fundamental frequency to semitones also follows a logarithmic relationship, as illustrated below. According to the relationship between pitch and melody and the rules of musical tuning, we can determine the pitch difference between the humming and the original singing and the processing method for eliminating this difference. Melody organically combines all the basic elements of music into a complete and inseparable unity [6]; it is inconceivable to separate melody from the other musical elements, because the expression of melody is realized through the function and interaction of the various musical elements.
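Because the pitch–frequency relationship is logarithmic, converting a fundamental frequency to a semitone index is a one-line computation. The sketch below uses the common MIDI convention (A4 = 440 Hz = note 69) as an illustrative reference; the paper itself does not fix a particular reference.

```python
import math

def freq_to_semitone(f0, f_ref=440.0, ref_note=69):
    """Map a fundamental frequency (Hz) to a semitone number on a logarithmic scale."""
    return ref_note + 12 * math.log2(f0 / f_ref)

for f0 in (220.0, 261.63, 440.0):
    print(f"{f0:7.2f} Hz -> semitone {freq_to_semitone(f0):.2f}")
# 220 Hz is exactly 12 semitones (one octave) below 440 Hz.
```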


3.2.2 The Relationship and Difference Between Voice Signals and Humming Music Signals

Pitch in music describes the height of a sound and corresponds to frequency; loudness describes the strength of a sound and corresponds to intensity. Timbre is determined by the frequency spectrum of the sound: the proportions of the harmonics at the various levels differ, as does their attenuation over time, resulting in different timbres [7]. From the perspective of human auditory perception, the cognition of humming music is based mainly on two factors: time and frequency.

As shown in Fig. 3.1, speech recognition generally selects a recognition method that meets the requirements of the type of recognition system. Speech analysis methods extract the speech feature parameters required by this recognition method, and the recognition results are obtained by comparing them with the system model according to certain criteria and measures. The preprocessing unit smooths the signal spectrum through high-frequency pre-emphasis and divides the speech data sequence into successive signal frames with a window function. The endpoint detection unit determines the start and end frames of a word, and the feature extraction unit computes a spectrum-based feature vector; when the endpoint detection unit detects that the current speech frame is a starting frame, the feature extraction unit begins its calculation and stores the frame feature vector.

Fig. 3.1 Speech recognition system

Zero-crossing analysis is the simplest time-domain analysis of sound. For a continuous sound signal plotted against time, one can observe where the time-domain waveform crosses the horizontal axis [8]. For a discrete-time sound signal, a zero crossing occurs whenever adjacent samples have different algebraic signs. If a sinusoidal signal with frequency $f_0$ is sampled at rate $f_s$, its average zero-crossing rate is

$$Zero = 2 f_0 / f_s \tag{3.1}$$

The short-term average zero-crossing rate is based on short-term processing and can be expressed as

$$Q_n = \sum_{m=-\infty}^{\infty} T(m) \tag{3.2}$$

where $T(m)$ is a linear or nonlinear transformation applied to the sound sample sequence obtained at a certain sampling frequency. Unlike voice recognition for human voices, in music recognition the core voice recognition technology can provide valuable information about the musical content, but there is other audio data that needs to be processed; music search is a music-centric search. The autocorrelation function is used to study a signal itself, such as the synchronization and periodicity of its waveform, and is defined as

$$C_{xx}(\tau) = \int x(t)\, x(t+\tau)\, dt \tag{3.3}$$

where $\tau$ is the time lag. The characteristics of voice signals change slowly over time, so the temporal characteristics of a humming voice signal can be assumed to be fixed over short time intervals, which leads to short-term analysis of the humming signal. Applying short-term analysis to the Fourier transform yields the short-time Fourier transform.
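A short sketch of the two time-domain tools just described: the zero-crossing rate of Eqs. (3.1)/(3.2) and a discrete autocorrelation-based pitch estimate in the spirit of Eq. (3.3). The frame length, sampling rate and pitch search range are example values, not parameters from the paper.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose algebraic signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def autocorr_pitch(frame, fs, f_min=80, f_max=500):
    """Estimate f0 from the lag that maximises the autocorrelation C_xx(tau)."""
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / f_max), int(fs / f_min)
    lag = lo + np.argmax(corr[lo:hi])
    return fs / lag

fs = 16000
t = np.arange(2048) / fs
frame = np.sin(2 * np.pi * 200 * t)           # 200 Hz test tone
print(zero_crossing_rate(frame))              # ~ 2*f0/fs = 0.025 (cf. Eq. 3.1)
print(autocorr_pitch(frame, fs))              # ~ 200 Hz
```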

3.3 Constructing a Humming Music Retrieval Model

3.3.1 Basic Framework

PSO is a stochastic global optimization technique based on swarm intelligence. Its advantages lie in the simplicity and ease of implementation of the algorithm: there are few parameters to adjust and no gradient information is required. From the perspective of information theory, vector quantization can also be classified as a feature extraction unit; the vector quantization codebook must be generated and optimized on a computer in advance, and a search algorithm for the codebook must also be provided [9]. Two assumptions are needed to characterize speech signals: first, the transition between internal states depends only on the previous state; second, the output value depends only on the current state or the current state transition. These two assumptions greatly reduce the complexity of the model, treating speech as a sequence of specific states that cannot be observed directly; for example, such a state may be a phoneme. When constructing the PSO-based humming music retrieval model in this article, various types of original electronic music data are first collected and denoised, and the denoised electronic music is divided into frames with endpoint detection to obtain effective electronic music signals [10]. The time-domain and frequency-domain variance features and energy features of the electronic music are extracted from the effective signal through PSO, and the extracted features are combined into a feature vector.

Suppose there is an insertion error during humming, that is, an extra note is hummed, as shown in Fig. 3.2. Then a completely wrong match will occur in the matching process, as shown in Fig. 3.3. Usually a threshold is set: when the number of differing characters in the two sequences exceeds the threshold, this part of the audio is judged dissimilar to the hummed part, and it is skipped and the comparison continues with the later part of the music until the end of the piece is reached.

Fig. 3.2 Insertion error fragment

Fig. 3.3 Incorrect matching

After framing, each frame must be windowed; direct framing is equivalent to applying a rectangular window by default. According to formula (3.4), the probability $P_q$ that the humming melody appears in the song is

$$P_q = \frac{182 \cdot 0.0087}{39} \tag{3.4}$$


Assume that the number of songs in the song library is $T$, the number of similar songs after matching is $t$, and the probability of $t$ or more identical results appearing in the query results is $P_t$. Then

$$P_t = \sum_{i=0}^{t-1} P_q^i \left(1 - P_q\right) \tag{3.5}$$

Therefore, in large music libraries, or in fuzzy matching where humming errors are allowed, this simple algorithm cannot meet the requirements of practical applications. Through PSO training, the model parameters are adaptively adjusted to the training sequence and optimized to obtain the best model for practical use. Because multiplication in the time domain corresponds to convolution in the frequency domain, and the side lobes of a rectangular window are large, truncating the signal in the time domain causes spectrum leakage: energy leaks to other frequencies, and the shorter the frame, the more obvious this effect. Solving this problem is, in effect, the training process of the humming music retrieval model.
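For readers unfamiliar with the optimizer itself, here is a minimal particle swarm optimization sketch on a toy objective; the swarm size, inertia weight and acceleration constants are common textbook defaults, not the parameters tuned in this paper, and the sphere objective stands in for the model's actual fitness function.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))       # particle positions
    v = np.zeros_like(x)                             # particle velocities
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([objective(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Toy objective: sphere function, minimum at the origin.
best_x, best_val = pso(lambda p: np.sum(p ** 2), dim=4)
print(best_x, best_val)
```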

3.3.2 Model Application

The purpose of a matching algorithm based on the humming music retrieval model is to find music with the same melody in the music library. If the humming were identical to the original singing, music retrieval through PSO would be very efficient; in fact this is not possible, because there are many differences between humming and original singing. Therefore, one of the most important issues a music matching algorithm must solve is how to reduce the impact of these differences on matching. For male humming signals, the humming music retrieval model finds a fundamental frequency range between 90 and 230 Hz. Therefore, when selecting the sub-band range, levels 2, 3, 4 and 5 should be chosen; level 2 is reserved for signals with a fundamental frequency between 290 and 600 Hz, which under special circumstances only certain tenors can sing. For female humming signals, if the same weighting coefficients are used as for male voices, the pitch comparison results shown in Fig. 3.4 are obtained.

Fig. 3.4 Comparison of pitch between male and female voices using the same coefficient

As can be seen from Fig. 3.4, because the male-voice weighting coefficients are used, the low-frequency portion of the female humming signal is excessively weighted, resulting in noise points that lie below the original melody contour. Referring to the analysis method for male voices, the fundamental frequency range to be reserved for female humming signals should mainly lie in sub-bands 2, 3 and 4. Moreover, most female voices fall within the 120–220 Hz sub-band range, and the sub-band weighting coefficient in this range should be set slightly higher. After using different weighting coefficients for the two kinds of signal, smooth melody contours can be obtained, as shown in Fig. 3.5. Figure 3.5 shows the melody contours produced by a male voice and a female voice humming the same song: the average pitch of the male voice is 6–8 semitones lower than that of the female voice.

Fig. 3.5 Pitch contrast of male and female voices


In other words, the fundamental frequency ranges of male and female voices are not the same. It can also be seen from the figure that as long as the same song is hummed, the melody contour is roughly the same regardless of the pitch register of the male or female voice. The retrieval tests of the humming music retrieval model in this chapter show that humming also follows the melody of the song: the hummer needs to match their humming to the score, so the humming signal must also be considered in terms of musical characteristics such as pitch, sound intensity, sound length, rhythm, beat, tempo and melody. From a discussion of how musical melodies are represented, it is concluded that the rise and fall of a melody has a natural affinity with numbers, so representing melodies with numerical values has unparalleled advantages. Some people hum with an effect close to the original song, while for others it may be impossible to judge what they are humming; the quality of the humming therefore has a significant impact on the accuracy of the system. The significance of the results of the music retrieval model established in this article is to prove that the algorithm used in the model can realize the query function.
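One common way to reduce the key difference between a hummed query and the original melody is to compare interval (pitch-difference) sequences rather than absolute pitches. The sketch below illustrates the idea with an edit distance over interval sequences; it is a simplified stand-in for the matching procedure described above, not the authors' exact algorithm, and the example pitch lists are invented.

```python
def intervals(pitches):
    """Key-invariant representation: successive semitone differences."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(a, b):
    """Levenshtein distance, tolerant of inserted or missed notes in the humming."""
    dp = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,                          # deletion
                           dp[i][j - 1] + 1,                          # insertion
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1])) # substitution
    return dp[len(a)][len(b)]

original = [60, 62, 64, 65, 67]          # melody in one key
hummed = [67, 69, 71, 72, 74]            # same melody hummed a fifth higher
print(edit_distance(intervals(original), intervals(hummed)))   # 0: contours match
```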

3.4 Conclusions

Audio and video contain more and richer information. The pitch of humming is related to the frequency of the sound vibration at each moment: the greater the frequency, the higher the pitch; the lower the frequency, the lower the pitch. Humming-based music retrieval is an important part of content-based retrieval technology, whereas most retrieval today is still limited to text retrieval. Based on PSO, this paper studies humming music retrieval and discusses the relationship between pitch and frequency and the principle of similarity between humming and original singing. The experimental results show that when the male-voice weighting coefficients are applied, the female humming signal is over-weighted, resulting in noise below the original melody contour. The fundamental frequency of the music signal is extracted using the fundamental frequency extraction techniques of speech signal processing. This paper tests the pitch comparison of male and female voices using the same coefficients. With reference to the analysis of the male voice, the fundamental frequency range to be reserved for female humming signals should mainly be selected from sub-bands 2, 3 and 4, and most female voices lie in the 120–220 Hz sub-band range; the sub-band weighting coefficient for this range should be set slightly larger, which can effectively reduce the influence of humming errors. PSO can balance the algorithm between time and accuracy by adjusting its empirical parameters.

Acknowledgements This work is a phased achievement of the Outstanding Youth Project "Ecological Research on the Inheritance of Meishan Sacrificial Music" (Grant No. 22B0849), a scientific research project of the Hunan Provincial Department of Education.


References

1. X. Liu, An improved particle swarm optimization-powered adaptive classification and migration visualization for music style. Complexity 48(27), 46–63 (2022)
2. S. Deng, A. Zhao, S. Fu et al., Music-search behaviour on a social Q&A site. J. Inf. Sci. 67(18), 23–48 (2022)
3. T. Ammari, J. Kaye, J. Tsai et al., Music, search, and IoT. ACM Trans. Comput.-Hum. Interact. TOCHI 49(17), 61–82 (2022)
4. A. Killin, C. Brusse, A. Currie et al., Not by signalling alone: music's mosaicism undermines the search for a proper function. Behav. Brain Sci. 63(20), 51–67 (2021)
5. J.H. Saetre, Why school music teachers teach the way they do: a search for statistical regularities. Music Educ. Res. 20(10), 31–44 (2022)
6. L.M. Caton, In search of Mr Baptiste: on early Caribbean music, race, and a colonial composer. Early Music 42(1), 6–18 (2021)
7. H.J. Kim, M. Sung, J.J. Lee et al., Application for automatic creation of music sheet and information search of sound source. Complexity 48(12), 26–58 (2022)
8. Q. Huang, N. Lu, Optimized real-time MUSIC algorithm with CPU-GPU architecture. IEEE Access 55(99), 11–47 (2021)
9. P. Jarumaneeroj, N. Sakulsom, An adaptive large neighborhood search for the multiple-day music rehearsal problems. Comput. Ind. Eng. 48(21), 107279–107298 (2021)
10. S. Ma, Music rhythm detection algorithm based on multipath search and cluster analysis. Complexity 20(11), 19–28 (2021)

Chapter 4

Research on Audio Processing Method Based on 3D Technology Kai Li, Yaping Tang, and Yuanling Ouyang

Abstract The perception of sound by human auditory system includes not only subjective attributes such as loudness, tone and timbre, but also spatial attributes of sound. 3D sound effect is an acoustic concept, which has the characteristics of broad sound stage and strong sense of sound localization, and can bring advanced auditory enjoyment to users. To analyze the audio signal, we must first preprocess the signal, filter out the noise in the audio signal and extract useful signal components. Aiming at the problems that may be faced in 3D audio signal processing, an improved algorithm for determining the threshold based on decomposition scale is proposed, and the optimal decomposition scale is determined by comparing adjacent highfrequency coefficient graphs. The improved algorithm in this article better preserves the characteristics of the signal. The accuracy of audio processing using this method is as high as 95.69%, which is higher than that of the two models, 5.98% and 9.53% respectively. The results show that the method proposed in this article has obvious advantages in audio processing. The improved algorithm can effectively remove noise interference and enhance the stereo effect of 3D audio, and the signal-to-noise ratio is obviously better than the original algorithm. Keywords 3D sound effects · Audio processing · Noise reduction · Signal noise ratio

K. Li · Y. Tang (B) College of Music and Dance, Hunan University of Humanities, Science and Technology, Loudi 417000, China e-mail: [email protected] Y. Ouyang Changsha Human Resources Public Service Center, Changsha 410000, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_4

31

32

K. Li et al.

4.1 Introduction 3D sound effect is an acoustic concept, which has the characteristics of wide sound stage and strong sense of sound localization, and can bring advanced auditory enjoyment to users. The principle of realizing 3D sound effect is to reprocess the left and right channel signals specifically, expand the sound stage, produce extremely realistic 3D positioning sound effect around the listener, and bring real auditory experience [1]. Binaural audio processing technology based on human auditory perception characteristics uses signal processing, computer and other technical means to simulate the same sound pressure as the real sound source scene at the eardrum of the listener as much as possible, so that the listener can perceive the virtual sound image at a specific location in space [2]. Generally speaking, various signals in nature can be compressed in a specific transform domain, or can be sparsely represented, for example, image signals can be sparsely represented in the wavelet transform domain [3]. The reason why natural signals have such characteristics is that there is a lot of redundancy among the data they contain, and there is also some correlation between adjacent data, such as a lot of redundant information between adjacent frames of video signals and adjacent pixels of image signals [4]. The wide application of audio codec in digital audio system makes it possible to improve the efficiency of audio signal transmission and storage. However, with the introduction of coding and decoding algorithm, the traditional objective measurement index no longer corresponds to the perceived sound quality. Audio coding is to digitize analog audio signals, which can be transmitted, stored or processed as digital signals. In order to reduce storage space or reduce transmission bit rate and save bandwidth, it is necessary to compress and encode the digitized audio signals [5]. Sound source signals in natural scenes include not only the azimuth information of each sound source, but also the surrounding environment information. Therefore, binaural technology first needs to use binaural recording or synthesis to create binaural signals containing specific sound source spatial information [6]. When using headphones to play back binaural signals, it is necessary to equalize the transmission function of headphones because it does not meet the natural transmission conditions, and there will also be problems such as head positioning and direction confusion [7]. Waveform coding of audio signal directly encodes the waveform of audio signal in time domain or frequency domain, trying to make the reconstructed audio waveform keep the shape of the original audio signal [8]. Waveform encoder has strong adaptability, good audio quality and easy implementation, but the required coding rate is high. Parameter coding tries to make the reconstructed signal as intelligible as possible by extracting and coding the characteristic parameters of the audio signal, that is, to maintain the semantics of the original audio, and the waveform of the reconstructed signal may be quite different from that of the original audio signal [9]. Aiming at the problems that may be faced in 3D audio signal processing, this article proposes an improved algorithm to determine the threshold based on the decomposition scale and determine the optimal decomposition scale according to the comparison of adjacent high-frequency coefficient graphs.

4 Research on Audio Processing Method Based on 3D Technology

33

4.2 Audio Processing Method Based on 3D Technology Scalable coding technology realizes that an audio codec has a core layer and multiple enhancement layers. When coding, the code stream with lower sound quality is first coded in the core layer, and then the enhancement layer is coded to improve the sound quality. The more enhancement layers, the higher the coding sound quality and the higher the coding rate [10]. The average amplitude of the audio signal is relatively small due to the interference in the transmission stage of the audio signal, so it is necessary to increase its gain value to ensure that the overall amplitude of the signal is increased, so as to ensure that the audio signal is not distorted. The binary mask technology can be used to extract all sound source signals in time–frequency domain. If the time–frequency information of each audio source signal is non-overlapping after aliasing, that is, it is sparse, and the time–frequency components belonging to the same source signal are marked by binary time–frequency mask. Except for the low frequency band, the coefficients of each high frequency band are thresholded, and the threshold size is shown in the following formula: √ t = σ 2 ln N

(4.1)

When the wavelet decomposition coefficient is less than the threshold, it is set to zero, indicating that this coefficient is noisy. Otherwise, the coefficients generated by the signal that are greater than the threshold are retained. The expression is as follows:    ω j,i  ≤ t 0, (4.2) θˆ j,i = ω j,i other The wavelet transform suitable for the signal to be analyzed is used for reconstruction, and some parameters of signal filtering are obtained. In this article, the amount of sampled data is reduced by compressing the data at the same time. By developing the sparsity of the signal reasonably, the original signal can be reconstructed by using the randomly downsampled data obtained after compressed sensing at the receiving end, and satisfactory results can be obtained. Most of the existing audio processing methods are based on frequency domain, because frequency domain is more in line with the physical model of human auditory system compared with time domain signals [11]. For the audio compressed stream using MDCT transform, IMDCT transform is first performed to obtain the time domain signal, and then DFT transform is used to transform it into frequency domain. After audio processing, IDFT transform is needed to get the time domain signal again before playing. In the stage of audio signal input, if the set input value range is too large, it will cause great noise in the signal transmission process and affect the input and output quality of the signal. On the contrary, if the set range of the audio input signal is small, it will be ignored because of the small signal, thus causing the distortion of the audio input signal. According to the input time of

34

K. Li et al.

audio signal, the signal intensity of 36 dB is determined as noise. Except for the low frequency band, the coefficients of each high frequency band are thresholded. Similar to the hard threshold denoising method, the value form is shown in Formula (4.1). When a wavelet decomposition coefficient is less than the threshold, it is set to zero, indicating that this coefficient is noisy. Otherwise, write down the coefficient of the point generated by the signal that is greater than the threshold. The expression of the formula is:      θˆ j,i = sgn ω j,i · ω j,i  − t

(4.3)

The processed signal is reconstructed and the parameter values are calculated. When calculating the average of local minimum extreme value in each critical frequency band, the following conditions should be considered respectively: whether there is a local minimum point in the first critical frequency band, whether there is a local minimum point in the last critical frequency band, whether there is a local minimum point in a critical frequency band, and whether the average of local minimum extreme value is the boundary point of the critical frequency band [12]. In audio processing, it is often assumed that the signal is short-term stationary, that is, the parameters of the sinusoidal model remain unchanged within a short time interval (20 ~ 40 ms). In this way, a short-time audio signal can be expressed as a stationary sinusoidal signal whose group parameters do not change with time. When the gain adjustment time of audio signal is longer than that of normal audio signal, the gain value will not change greatly at this time, and the gain adjustment time should be set to 5 s according to the characteristics of audio signal output. If the audio signal output is large, the gain at this time needs to be reduced rapidly, and if the reduction adjustment is not made, the device will be damaged. When the output value of the signal is greater than the upper limit of the expected value, the gain value adjustment time should be set to 0.5 ms. The schematic diagram of sound source separation technology is show in Fig. 4.1.

Fig. 4.1 Schematic diagram of sound source separation technology

4 Research on Audio Processing Method Based on 3D Technology

35

Since the audio signal is a time-varying audio signal, each channel can be regarded as a single one-dimensional signal, and then the multi-channel audio signal can be regarded as a two-dimensional signal. The traditional algorithm will cause deviation from the optimal threshold, and there is no clear requirement for the decomposition scale, which may also affect the final effect of filtering. Therefore, this article proposes a method to determine the optimal decomposition scale by comparing adjacent highfrequency coefficient graphs. In practice, the standard deviation of noise is unknown. Therefore, in this article, the standard deviation of noise is estimated according to the high frequency coefficient cdk of each layer of wavelet, as shown in the following formula: σk = Mx (|cdk|)/0.6745

(4.4)

The wavelet coefficients of each layer have noise components, and the ratio of noise components of each layer to total noise should be fully considered when selecting the threshold of each layer. The quantization and coding module quantizes and codes according to the allocated quantity of bits, and sends the result to the multiplexer to package the bit stream, and adds header information and necessary sideband information to the bit stream to finally form the output bit stream. Its function is to coordinate the contradiction between perceptual model and bit allocation, so as to achieve satisfactory audio quality under certain bit rate restrictions and limited divisible bits [13]. When the amplitude of the audio signal changes, the adjustment is relatively fast at the beginning of the gain, which has a great influence on the gain, that is, the signal gain changes greatly. After gain adjustment of about 5 s, the output value of signal amplitude can be adjusted to the expected value. With the rapid change of input audio signal, the output audio signal will change correspondingly with the change of input signal, but when the input signal suddenly increases, the audio output signal will not change obviously.

4.3 Result Analysis and Discussion In this experiment, eight mono audio sequences were selected, including four musical instruments (Zhong Qin, guitar, piano and violin), two sets of audio (male and female) and two songs (male and female). These audio sequences are from the standard test sequences organized by 3GPP, the actually recorded music albums and the network resources. The average interception time of audio sequence is 12 s. In audio signal processing, simulation experiments are made according to the amplitude changes of audio input and output. Because the audio signal is a quasi-periodic signal with short-term stationarity, it is generally considered that its length is within 10 ~ 30 ms, so it needs to be framed, and the framed audio signal can be regarded as a small signal block satisfying the condition of “sparse domain”. At the same time, in order to make the framed signal closer to the original signal and ensure the correlation between two adjacent frames, it is usually necessary to set a frame shift of 5 ~ 15 ms,

36

K. Li et al.

Fig. 4.2 Algorithm training situation

that is, the overlap between two adjacent frames of audio signal. The experimental data comes from Million Song Dataset (MSD), which contains a lot of music-related data. This experiment uses MSD’s subdata set, which contains 5000 users’ behavior records of 20,000 music works and related metadata of users and music. The core data after data screening includes user information table, song information table and behavior data table. According to the flow of the algorithm, the output signal of audio is calculated. At this time, when the audio signal suddenly decreases, overshoot will occur. In order to eliminate/avoid overshoot, this article adds delay to the output of audio signal according to the algorithm. The training of the algorithm is shown in Fig. 4.2. Increasing the delay of audio signal can solve and eliminate the overshoot phenomenon. Although increasing the delay will have a certain impact on audio signal, its adverse impact is within an acceptable range. In order to evaluate the effectiveness of the audio processing method proposed in this article, F1 index and MSE (Mean squared error), which can comprehensively reflect the performance of the algorithm, are selected to experiment the algorithm. In addition, consider comparing the performance of different algorithms. The F1 indicators were tested respectively, and the results as shown in Fig. 4.3 were obtained. The MSE indicators were tested respectively, and the results as shown in Fig. 4.4 were obtained. In order to verify the effectiveness of the method proposed in this article in audio enhancement, six pure mono audio signals with sampling frequency of 4.8 kHz are selected, and the white noise in NOIZEX92 noise library is superimposed on them. Information steganography in audio signals will inevitably affect the SNR (Signal to

4 Research on Audio Processing Method Based on 3D Technology

Fig. 4.3 F1 value comparison

Fig. 4.4 MSE comparison

37

38

K. Li et al.

noise ratio) value. SNR is measured in decibels. The higher the SNR, the better the sound quality of the audio signal. The SNR value is calculated as follows: S N R = −10 lg

    F − F  2 F2

(4.5)

where F is the original audio signal and F  is the audio signal containing hidden information. Noise is inevitable in practical application, and noise will affect the accurate estimation of frequency and phase. Gaussian white noise with different signal-to-noise ratio is added to the sinusoidal signal, and the frequency and phase are re-estimated and the average error at different sinusoidal signal frequencies is counted. The influence of Gaussian white noise with different signal-to-noise ratios on frequency and amplitude errors is shown in Fig. 4.5. In the figure, the abscissa represents different signal-to-noise ratios, and the ordinate represents the logarithmic coordinates of the error mean. It can be seen that this method is obviously better than Merdjani method when the signal-to-noise ratio is higher than 1000. The experimental results verify the superiority of the method proposed in this article, which can effectively remove noise interference and enhance the stereo effect of 3D audio, and the signal-to-noise ratio is obviously better than the original algorithm. Because Dobes wavelet is a layered orthogonal wavelet, with the increase of order, the smoothness will be more ideal, which is suitable for digital signal processing. Considering that the order is too large to retain the characteristic information of the signal, db10 wavelet is selected to decompose the signal. In the stage of audio signal

Fig. 4.5 Frequency estimation error under the influence of Gaussian white noise

4 Research on Audio Processing Method Based on 3D Technology

39

Fig. 4.6 Accuracy of audio processing using different methods

input, when the gain of audio signal becomes larger, it will be slower. At this time, the gain delay is smaller and the amplitude of gain increase is smaller. Although the output audio signal is large, it is close to the ideal output amplitude. Therefore, the safety of the signal output device will not be affected. The accuracy of audio processing using different methods is shown in Fig. 4.6. The algorithm in this article makes reasonable use of the unique sparsity of natural audio signals, so that the enhancement effect is not affected by the added noise, and it has good robustness. From the experiments in this section, it can be seen that the improved algorithm in this article better retains the characteristics of the signal. The accuracy of audio processing using this method is as high as 95.69%, which is higher than that of the two models, 5.98% and 9.53% respectively. The results show that the method proposed in this article has obvious advantages in audio processing.

4.4 Conclusions The twenty-first century is an information society. With the continuous improvement of people’s quality of life, people have higher and higher requirements for audio quality. At the same time, in order to meet users’ requirements for music effects, more and more devices support 3D sound playback. Therefore, audio processing technology has become the focus in the field of signal analysis and processing. Aiming at the problems that may be faced in 3D audio signal processing, this article

40

K. Li et al.

proposes an improved algorithm to determine the threshold based on the decomposition scale and determine the optimal decomposition scale according to the comparison of adjacent high-frequency coefficient graphs. In practical application scenarios, it is generally necessary to ensure that the sampling frequency is 5–10 times of the highest frequency of the signal, so as to ensure the correctness of the obtained information, thus introducing more data redundancy, increasing the storage space and reducing the utilization rate of the channel. The simulation of actual signal shows that the accuracy of audio processing using this method is as high as 95.69%, which is higher than that of the two models, 5.98% and 9.53% respectively. The results show that the method proposed in this article has obvious advantages in audio processing, can effectively remove noise interference, enhance the stereo effect of 3D audio, and is obviously superior to the original algorithm in signal-to-noise ratio. It shows that the improved algorithm is effective and has good performance and certain application value. Because the original intention of the model is to provide users with fast and effective audio processing functions, it is slightly insufficient in other auxiliary functions, and it is hoped that more functions can be added in the future to further improve the user experience. In future practical applications, audio signals can be classified according to their characteristics. Different error concealment algorithms can be used for different types of audio sequences, thus improving the effect of error concealment. With the deepening of research and the development of other related fields, it is believed that more and more technical problems in audio signal processing will be solved, which will be more widely used in practice.

References 1. M. Heck, M. Hobiger, A.V. Herwijnen et al., Localization of seismic events produced by avalanches using multiple signal classification. Geophys. J. Int. 216(1), 201–217 (2019) 2. L.J. Nowak, K.M. Nowak, Perceptual audio processing stethoscope. J. Acoust. Soc. Am. 146(3), 1769–1773 (2019) 3. L. Jing, L. Bo, J. Choi et al., DCAR: a discriminative and compact audio representation for audio processing. IEEE Trans. Multimed. 19(12), 2637–2650 (2017) 4. D.M. Rasetshwane, J.G. Kopun, R.W. Mccreery et al., Electroacoustic and behavioral evaluation of an open source audio processing platform. J. Acoust. Soc. Am. 143(3), 1738–1738 (2018) 5. D. Monroe, Digital hearing advances in audio processing help separate the conversation from background noise. Commun. ACM 60(10), 18–20 (2017) 6. C. Maximo, R. Sandra, SART3D: a MATLAB toolbox for spatial audio and signal processing education. Comput. Appl. Eng. Educ. 27(4), 971–985 (2019) 7. S.H. Hawley, B.L. Colburn, S.I. Mimilakis, Profiling musical audio processing effects with deep neural networks. J. Acoust. Soc. Am. 144(3), 1753–1753 (2018) 8. Mustaqeem, S. Kwon, A CNN-assisted enhanced audio signal processing for speech emotion recognition. Sensors 20(1), 183 (2019) 9. M. Matsumoto, Vision-referential speech enhancement of an audio signal using mask information captured as visual data. J. Acoust. Soc. Am. 145(1), 338–348 (2019)

4 Research on Audio Processing Method Based on 3D Technology

41

10. M.R. Bai, S.S. Lan, J.Y. Huang et al., Audio enhancement and intelligent classification of household sound events using a sparsely deployed array. J. Acoust. Soc. Am. 147(1), 11–24 (2020) 11. B. Munson, Audiovisual enhancement and single-word intelligibility in children’s speech. J. Acoust. Soc. Am. 148(4), 2765–2765 (2020) 12. F. Rumsey, Room acoustics modeling, enhancement, measurement. J. Audio Eng. Soc. 66(7–8), 637–641 (2018) 13. B. Ma, J. Teng, H. Zhu et al., Three-dimensional wind measurement based on ultrasonic sensor array and multiple signal classification. Sensors 20(2), 523 (2020)

Chapter 5

Design and Optimization of Point Cloud Registration Algorithm Based on Stereo Vision and Feature Matching Yifeng Wang, Shanshan Li, and Shuai Huang

Abstract Based on scale space division and sparse PCA (Principal Component Analysis) method, the scale that can represent the maximum information of point cloud is set as the representative scale value of the point cloud, so as to complete the initial scale matching between point clouds. The relationship between normal vectors of 3D points is used to achieve scale invariance, and the concept of histogram is combined to enhance the robustness of 3D feature extraction algorithm in the process of point cloud matching. The experimental results show that this method uses many algorithms to optimize, and its convergence speed is faster than ICP (Iterative Closest Point) method. For point clouds with different overlapping degrees, the registration time and convergence error of this algorithm are obviously lower than those of the contrast algorithm. The algorithm in this paper converges quickly and has high registration accuracy when the overlap rate is low and the initial position is quite different, which proves the robustness of the algorithm in this paper. Keywords Stereo vision · Feature matching · Point cloud registration

5.1 Introduction In recent years, 3D laser scanning technology has developed rapidly in all walks of life. Such as 3D reconstruction [1, 2], reverse engineering and digital city. As an important representation of 3D real scenes, 3D point clouds have been used in many research and application fields, such as computer vision, robotics and unmanned driving. Target recognition for point cloud data has become a research hotspot in Y. Wang · S. Li (B) · S. Huang School of Physics, Electronics and Intelligent Manufacturing, Huaihua University, Huaihua 418008, Hunan, China e-mail: [email protected] S. Li Key Laboratory of Intelligent Control Technology for Wuling-Mountain Ecological Agriculture in Hunan Province, Huaihua 418008, Hunan, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_5

43

44

Y. Wang et al.

target recognition technology [3]. Because the point cloud is composed of discrete points, the target recognition algorithm for two-dimensional images is no longer applicable, so it is of practical significance to study the point cloud target recognition algorithm. The matching of 3D point clouds has always been a hot topic in the field of computer vision. However, due to the different methods based on 3D reconstruction, the quality of 3D point clouds can be different, so how to propose a method to match several different types of point clouds at the same time has become a difficult problem in the research. Reference [4] proposed the method of mesh-resolution to calculate the relative scale of point cloud. Mesh-resolution method is to calculate the average of the sum of the side distances of adjacent points in point clouds. Obviously, this method is only suitable for using laser scanning equipment data, and cannot be applied to point clouds with different point distribution densities. Literature [5] proposes an ICP (Iterative Closest Point) matching method which is different from the estimation method of similar change matrix of rigid change. Reference [6] puts forward a method for measuring saliency points. This method can distinguish the salient areas on the surface of point cloud based on perceptual estimation and match them. Although in recent years, the problem of matching and stitching 3D point clouds has gradually attracted researchers’ attention, there are not many achievements in dealing with different types of point clouds at present. Most of the existing methods are based on point clouds with uniform distribution, less noise and sufficient information, and there are more or less defects and deficiencies, which should be improved in terms of accuracy, robustness and computational cost. Due to the complexity of visual scene and the limitation of space, there is no method to reconstruct perfect point cloud in various complex environments. This makes researchers need to propose more optimized methods to deal with low-quality point clouds and complete the subsequent matching tasks in post-processing [7, 8]. Based on the research status of the above scholars, in order to improve the problem that ICP algorithm needs better initial value and slow convergence speed, this paper proposes a design and optimization method of point cloud registration algorithm based on stereo vision and feature matching. By using scale space division and sparse PCA (Principal Component Analysis) method, the scale that can represent the maximum information of point cloud is set as the representative scale value of the point cloud, so as to complete the initial scale matching between point clouds. The relationship between normal vectors of 3D points is used to achieve scale invariance, and the concept of histogram is combined to enhance the robustness of 3D feature extraction algorithm in the process of point cloud matching.

5 Design and Optimization of Point Cloud Registration Algorithm Based …

45

5.2 Research Method 5.2.1 Stereo Vision Model Computer vision technology mainly shoots objects in the real world through a camera, extracts key information from two or more digital images, forms a 3D structural system, restores the appearance shape and geometric stereo information of objects, and then enables computers to have the perception ability of the 3D world. There are two ways to realize 3D reconstruction of objects: active method and passive method. Active methods include contact and non-contact measurement [9]. Passive methods generally do not need to use light sources or touch the measured object itself, but only need to obtain two or more images with overlapping areas. The most widely studied passive 3D reconstruction method is binocular stereo vision technology, which is very similar to human visual system [10, 11]. The ideal binocular stereo vision system adopts parallel binocular cameras, and obtains the imaging pairs of the left and right cameras of the same object from two different viewpoints in the real world at the same time, so as to calculate the parallax of the horizontal position of the object in the stereo image pair, and the distance between the object and the camera can be calculated by using the similar triangle method in mathematics. Binocular stereo vision is to imitate the human visual system, using two cameras to collect the scene image information, calculating the difference between the left and right image pairs and then measuring the coordinate information of each pixel. The binocular stereo vision model used in this paper is shown in Fig. 5.1. Assuming that two cameras shoot a certain point P(xc , yc , z c ) in the real scene at the same time, the imaging points of the P point in the left and right cameras are P1 (u 1 , v1 ), P2 (u 2 , v2 ), and its P1 , P2 ordinate is the same, but its abscissa is u 1 , u 2 , so the position deviation of the left and right image pairs in the horizontal direction is u 1 − u 2 , which is recorded as parallax d. The size of u 1 , u 2 , v1 can be obtained from the triangle similarity principle, as shown in (5.1). ⎧ xc ⎪ ⎨ u 1 = f zc c (5.1) u 2 = f (x z−b) c ⎪ ⎩ v = v = f yc 1 2 zc After camera calibration and correction, the epipolar lines of the left and right images are collinear and parallel to the horizontal direction, that is to say, the left and right matching points have the same ordinate, so the search can be carried out along the one-dimensional scanning line in stereo matching. The relationship between matched pixel pairs is: d(x, y) = x L − x R

(5.2)

46

Y. Wang et al.

Fig. 5.1 Parallel binocular stereo vision imaging model

where x L is the characteristic pixel in the left image, x R is the corresponding pixel in the right image, and d(x, y) is called parallax, that is, the offset of the characteristic point in the left and right image positions.

5.2.2 Feature Matching Based on Stereo Vision Point cloud data refers to a set of point sets that collect 3D coordinates of discrete points on the target surface by various scanning devices. According to the different collection methods and production requirements, there are similarities and differences in the data collected by various methods. The common point is that point cloud data all store the spatial position information of points, but the difference is that some point cloud data store surface normals, RGB information, texture and other information. Point clouds are generally divided into high-density point clouds and low-density point clouds according to the density of points in the collected point clouds. The total number of point clouds collected by scanning instruments that do not use optical principles is several hundred to tens of thousands, which is called low-density point clouds. However, the point cloud obtained by the equipment based on optical principle can reach millions or even tens of millions, which is called high-density point cloud. Sparse point cloud registration is to make two sets of point cloud coordinates in different spatial coordinates under the same spatial coordinate system, but the

5 Design and Optimization of Point Cloud Registration Algorithm Based …

47

point set coordinates after spatial coordinate transformation are different because of considering different coordinate transformation factors. In order to achieve a better point cloud coordinate normalization effect, it cannot be completed by one point cloud coordinate normalization, so it is necessary to search the nearest iteration point, select a point set with good matching effect, and find the transformation parameters for the next iteration, and it is better to reach a threshold [12]. In this paper, a PCA based on the axis direction of point cloud is proposed. Firstly, divide the two point clouds equally to get their centroid, that is, the coordinate centers of the two points; the core problem is to determine the rotation matrix and translation vector of two-point clouds based on the similarity of point cloud axes, so as to obtain the coordinate relationship between two-point clouds. In view of the complexity of the target object, the target object is completely scanned from multiple stations in different directions, and the scanning data of each station has its own coordinate system. The point cloud registration process is shown in Fig. 5.2. Fig. 5.2 Point cloud registration process

48

Y. Wang et al.

Firstly, the feature descriptors of point cloud are calculated, and according to the feature information of each sample point, the local correlation of point cloud data is established by sparse PCA to narrow the search range of matching point pairs. By using the corresponding relationship of feature point pairs and initial registration of sampling consistency, the optimal rotation and translation matrix between two point clouds is found to realize the initial transformation of the two point clouds. Taking this as the initial value, the scale invariance is realized by using the relationship between normal vectors of 3D points, and the robustness of 3D feature extraction algorithm in the process of point cloud matching is enhanced by combining the concept of histogram. Solving the features of covariance matrix of two-point cloud to obtain feature vectors and corresponding eigenvalues, and sorting them according to the size of eigenvalues to obtain two eigenvector matrices of matrix T1 , T2 : T1 = T2 × (T × R)

(5.3)

where T is the translation matrix and R is the rotation matrix. Point qi represents a point in point cloud Q, and point qi represents a point in the coordinate system where point qi is converted to point cloud 1. The expression formula is as follows: qi = qi × inv(T × R)

(5.4)

PCA is widely used in data processing and dimension reduction. The data after PCA dimensionality reduction is a linear transformation of the original data, which can not be processed nonlinearly. The proposal of sparse PCA has effectively improved PCA. When all the point coordinates in point cloud 2 are converted to the spatial coordinate system where point cloud 1 is located through this conversion matrix, the rough registration of point cloud is completed, and the error between the coordinate values of two points cloud and the actual value is very small. Suppose there are N training images, the gray value of each image is extended to , x N ] ∈ R n×N is obtained by normalization, x ∈ R n , and the matrix Y = [x 1 , x 2 , . . .  and the matrix variance is expressed as Y N1 Y Y T . The first sparse feature vector satisfying the following formula is the projection vector:  α αs arg max α T (5.5) subj. α2 = 1, α1 ≤ k In this paper, the feature point extraction method based on normal vector is used to extract point sets with obvious features from 3D point sets. When the region changes little, the normal vector envelope angle changes little, on the contrary, the normal vector envelope angle changes greatly, so we can extract the feature point set by using the similarity between the normal vector envelope angle and the adjacent region.

5 Design and Optimization of Point Cloud Registration Algorithm Based …

49

The change degree of the normal vector at a point pi in the point cloud is defined as the arithmetic mean of the included angle between its normal vector and its neighboring point normal vector: 1 θ= θi j k j=1 k

(5.6)

where θi j is the included angle between point pi and the normal vector of its neighboring point p j , and θ is the arithmetic average of the included angle between pi and its normal vectors of k neighboring points. The greater the θ , the more obvious the characteristics of pi point. The flat area of point cloud can be deleted by setting the included angle threshold, that is, if θ is less than a certain angle, pi will be deleted. In this paper, the relationship between normal vectors of 3D points is used to achieve scale invariance, and the concept of histogram is combined to enhance the robustness of 3D feature extraction algorithm in the process of point cloud matching. Color histogram, as a widely used color feature in image retrieval system, can express color intuitively. The histogram shows the distribution of different colors in the whole image without considering the spatial position of each color. Color histogram is mostly applied to those images that do not need to consider the spatial position of the object and are difficult to segment. After transforming RGB color space into HSV color space, the color information of each pixel is composed of three color components: H, S and V. Unlike the color histogram in RGB color space, the color components in HSV color space are relatively independent and hardly interact with each other. Color histogram is an important factor affecting the color distribution of color images. Among these three vectors, H, S and V have a decreasing trend on the ability of human eyes to distinguish colors. According to the different quantization series of H, S and V and their frequencies, one-dimensional vector L can be defined as: L = H QS QV + S QV + V

(5.7)

where Q S , Q V is the quantization series of S, V , respectively, and taking Q S = Q V = 4, we can get: L = 16H + 4S + V

(5.8)

One-dimensional histogram of 256 handles is obtained by calculation. The three 3D components H, S and V are converted into a one-dimensional vector. The weights of H, S and V are 16, 4 and 1 respectively, which reduces the influence of S and V on the calculation and retrieval results.

50

Y. Wang et al.

5.3 Experiment and Analysis This paper uses 64-bit Win7 system and VS2010 development environment, configures OpenCV open source computer vision library, and calls function library to realize binocular camera to collect and process real scene images. The time and accuracy of this method are verified by a series of control tests on point clouds. In order to carry out the control experiment, this paper uses Bunny, Angel, Lion and Statue point cloud data models in the public data set and the data Room model collected in the field. The experimental test data has been processed by time-based band-pass filtering, that is, the test points that meet the test distance are retained, and the rest of the point clouds caused by stray light and far background noise are directly eliminated. Figure 5.3 shows the comparison of the optimization speed between this method and ICP method. Because this method uses many algorithms for optimization, its convergence speed is faster than ICP method. In addition, although ICP method can estimate the scale ratio of point cloud, the initial point cloud pose is very important. At the same time, the real point clouds in this experiment can’t overlap each other completely, so ICP method can’t be applied. In order to verify the registration performance of the algorithm in this paper under different overlapping degrees, three groups of point clouds with different overlapping degrees are selected and compared with three precise registration algorithms ICP, ref[4] and ref[5]. The registration error and time under different overlapping degrees are shown in Fig. 5.4. According to the registration results, the registration time and convergence error of this algorithm are obviously lower than those of the other three algorithms for point clouds with different overlapping degrees. From the experimental data, it can be seen that the greater the overlapping degree of two groups of point clouds, the

Fig. 5.3 Comparison of convergence speed

5 Design and Optimization of Point Cloud Registration Algorithm Based …

51

Fig. 5.4 Comparison of registration algorithms

higher the registration efficiency, while the registration efficiency of two groups of point clouds with similar overlapping degrees will be higher under the condition of ensuring good initial position. However, the algorithm in this paper has faster convergence speed and higher registration accuracy when the overlap rate is low and the initial position is quite different, which proves the robustness of the algorithm in this paper.

5.4 Conclusion Target recognition for point cloud data has become a research hotspot in target recognition technology. Because the point cloud is composed of discrete points, the target recognition algorithm for two-dimensional images is no longer applicable, so it is of practical significance to study the point cloud target recognition algorithm. In this paper, a design and optimization method of point cloud registration algorithm based on stereo vision and feature matching is proposed. Firstly, the stereo vision model is established, and then the feature matching based on stereo vision is constructed. The experimental results show that this method uses many algorithms to optimize, and its convergence speed is faster than ICP method. For point clouds with different overlapping degrees, the registration time and convergence error of this algorithm are obviously lower than those of the contrast algorithm. The algorithm in this paper converges quickly and has high registration accuracy when the overlap rate is low and the initial position is quite different, which proves the robustness of the algorithm in this paper.

52

Y. Wang et al.

Acknowledgements This work was supported by the Natural Science Foundation of Hunan Province in 2023 (Research on Key Technologies and Optimization Algorithms for Real Time Matching of 3D Point Clouds in Complex Environments, Project No. 2023J50454) and the Science Research Project Fund of the Hunan Provincial Department of Education (Project No. 20B460), as well as the Double First Class Applied Characteristic Discipline of Control Science and Engineering at Huaihua University.

References 1. C. Wang, Y. Yang, Q. Shu et al., Point cloud registration algorithm based on Cauchy mixture model. IEEE Photonics J. 2020(99), 1 (2020) 2. Y. Feng, J. Tang, B. Su et al., Point cloud registration algorithm based on the grey wolf optimizer. IEEE Access 2020(99), 1 (2020) 3. Z. Hu, B. Qi, Y. Luo et al., Improved point cloud registration algorithm and mobile robot experiment in V-SLAM system. Harbin Gongye Daxue Xuebao/J. Harbin Inst. Technol. 51(1), 170–177 (2019) 4. T. Watanabe, T. Niwa, H. Masuda, Registration of point-clouds from terrestrial and portable laser scanners. Int. J. Autom. Technol. 10(2), 163–171 (2016) 5. R.Y. Takimoto, M. Tsuzuki, R. Vogelaar et al., 3D reconstruction and multiple point cloud registration using a low precision RGB-D sensor. Mechatronics 35(7), 11–22 (2016) 6. J.H. Lyu, Z.W. Wang, Y. Zhou et al., A laser scanner point cloud registration method using difference of normals (DoN) based segmentation. Lasers Eng. 2022(2), 53 (2022) 7. Q. Shu, X. He, C. Wang et al., Parallel registration algorithm with arbitrary affine transformation. Chin. Opt. Lett. 18(7), 071001 (2020) 8. F. Mutz, L.P. Veronese, T. Oliveira-Santos et al., Large-scale mapping in complex field scenarios using an autonomous car. Expert Syst. Appl. 46(10), 439–462 (2016) 9. S. Li, Y. Hao, H. Mo et al., Fast non-rigid human motion 3D reconstruction. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/J. Comput.-Aided Des. Comput. Graph. 30(8), 1505 (2018) 10. R. Qi, W. Liang, Research on the integrated manipulator of point cloud measurement and precise cutting for waste nuclear tank. Ind. Robot. 2022(4), 49 (2022) 11. K. Yousif, A. Bab-Hadiashar, R. Hoseinnezhad, 3D SLAM in texture-less environments using rank order statistics. Robotica 35(4), 809–831 (2015) 12. Q. Hu, S. Wang, C. Fu et al., Fine surveying and 3D modeling approach for wooden ancient architecture via multiple laser scanner integration. Remote Sens. 8(4), 270 (2016)

Chapter 6

Design of 3D Point Cloud Real-Time Cloud Matching Algorithm Based on Multi-scale Feature Extraction Shanshan Li, Yifeng Wang, and Shuai Huang

Abstract On this basis, multi-scale feature extraction method is adopted to realize real-time cloud matching of 3D point clouds. In this project, multi-scale Harris corner is used to extract feature points of object edges, and LR (Lagrangian relaxation) is introduced through KD tree to improve the speed of point cloud matching. LR solution involves solving the difference equation, which is very suitable for FPGA (Field Programmable Gate Array) devices containing a large number of sequential logic circuit units. Based on FPGA, the point cloud data is effectively accelerated by combining software and hardware. The results show that the repeatability of this method is about 74% without noise, which is about 22% higher. The method in this paper is more robust and repeatable to noise. The running speed of the FPGA accelerator proposed in this paper is increased by 22.9879 and 14.8551 times. The accelerator based on FPGA also has obvious advantages in running power consumption. Keywords Multi-scale · Feature extraction · 3D point cloud · FPGA

6.1 Introduction The key of 3D shape matching is surface matching [1, 2]. Because the image of a single mode usually can’t provide enough information, images of different modes are usually fused. In order to fuse the information in different modal images, image registration must be carried out first [3]. The complete information of the target surface often needs to be obtained through multi-angle and multiple scans, which S. Li (B) · Y. Wang · S. Huang School of Physics, Electronics and Intelligent Manufacturing, Huaihua University, Huaihua 418008, Hunan, China e-mail: [email protected] S. Li Key Laboratory of Intelligent Control Technology for Wuling-Mountain Ecological Agriculture in Hunan Province, Huaihua 418008, Hunan, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_6

53

54

S. Li et al.

leads to the point cloud data obtained in different coordinates, so it is necessary to register the point clouds obtained from different angles and batches [4]. With the development of computer hardware and other related equipment, hardware acceleration has been widely used in computing acceleration. The simplest and commonly used method is GPU (Graphic Processing Unit). Compared with general CPU, this method increases the depth of pipeline and improves the working clock frequency. There are a lot of digital signal processing problems in image processing, so DSP (Digital Signal Processor) is widely used [5]. A large number of hardware technologies are used in DSP to support the operation of fixed-point or floatingpoint numbers. At the same time, the instruction system is rich and flexible, with high data accuracy and processing speed. FPGA (Field Programmable Gate Array) is a digital integrated circuit composed of configurable (programmable) logic units, which has been widely used in hardware product development, hardware accelerated simulation, function verification and other fields. Reference [6] proposes a 3D object recognition method based on point signature, which uses point signature as the matching basis to identify the target object in the scene. This method is sensitive to noise and prone to false matching. Reference [7] constructs a matching point pair based on the distance between points and surfaces, which puts forward a new idea for the selection of point cloud features, but the accuracy is affected by the poor fitting of surfaces; Literature [8] applies the classic 2D image registration algorithm to the problem of point cloud registration. The proposed algorithm can register various shapes of point clouds with high computational efficiency, but its robustness is poor in noisy environment. The extraction of robust features based on point cloud data is of great significance and application prospect. There are a lot of data in the point cloud, and the computation of the algorithm is very large, which makes the real-time learning of 3D vision very difficult, especially the matching operation of adjacent points, which makes it difficult to complete these operations with exhaustive operation in practical application. With the development of point cloud data processing in computer vision, robotics, remote sensing image processing and other fields, algorithms such as filtering and noise reduction, feature extraction, feature matching and target recognition have gradually matured [9, 10]. In this paper, a 3D point cloud real-time matching algorithm based on multi-scale feature extraction is proposed, and LR (Lagrangian relaxation) is added to KD tree to accelerate the point cloud matching operation. LR solution involves solving the difference equation, which is very suitable for FPGA devices containing a large number of sequential logic circuit units. Based on FPGA, the point cloud data is effectively accelerated by combining software and hardware.

6 Design of 3D Point Cloud Real-Time Cloud Matching Algorithm Based …

55

6.2 Research Method 6.2.1 Multiscale Feature Extraction Image registration methods can basically be divided into three categories: image registration based on gray level, image registration based on transformation and registration based on image feature points. In order to register the reference image and the floating image, we must find the transformation matrix between the two images [11], and then map the pixels in the floating image to the coordinates of the reference image. Therefore, in image registration, the estimation of transformation matrix parameters is the core problem of image registration. Compared with the image registration algorithm based on feature extraction, the registration method based on image gray level uses all the gray information in the image, so this kind of algorithm has the general disadvantages of large calculation and slow image registration speed. Given a multi-scale variational estimate, an appropriate scale can be made according to the demand, and then the variational estimate of this scale can be used as the feature weight. Therefore, when choosing a parameter, that is, scale, it can be decided whether to extract fine-scale or coarse-scale features. The principle of choosing the scale is that the scale standard points to the local maximum of some normal derivative operators, reflecting the characteristic length of the corresponding data structure. This principle can be easily applied to the multi-scale space just introduced. In two point clouds to be registered, it takes a long time to find matching point pairs, and there are a large number of incorrect point pairs. In order to realize fast and accurate point cloud data acquisition, it is necessary to preprocess the data to get the feature point set. The average distance between points in the K-neighborhood of a point Pi in the point cloud, that is, the characteristic distance gi , is defined as the basis for judging whether the point Pi is a characteristic point. The calculation formula is: 1 di j k j=1 k

g=

(6.1)

where di j indicates the distance from a point in the neighborhood of Pi to the tangent plane of Pi , and gi is the average distance from the neighborhood of Pi to the tangent plane of Pi . At present, multi-scale feature point extraction and matching of images are mainly realized by computer software programming, which has some problems such as limited feature extraction, high algorithm complexity and slow processing speed. For the processing of high-resolution pictures in practical applications, CPU can no longer meet the real-time requirements. Compared with GPU, FPGA can run independently from computer, which is more suitable for real-time processing of pictures by moving equipment such as mobile robots. In this paper, multi-scale Harris corner is used to extract the feature points of the object boundary in the image, and

56

S. Li et al.

Laplacian operator of 2D Gaussian function is proposed to detect the speckle signal of 2D image. According to the parallel structure characteristics of FPGA, the multiscale Harris corner detection calculation is improved, which is more conducive to the realization of FPGA. The 2D Gaussian function G(x, y, σ ) is convolved with the image function I (x, y) to obtain a Gaussian scale image: L(x, y, σ ) = G(x, y, σ ) ⊗ I (x, y)

(6.2)

In programming, Gaussian templates with different weights are selected to replace Gaussian functions. The larger the standard deviation σ is, the larger the corresponding Gaussian template will be. For the calculation of response value, the improved Harris corner response function is introduced, and the improved position space response is obtained as follows: Corner =

A ∗ B − C2 ∗ σD A+B

(6.3)

Among them: A = L 2x (x, y, σ ) ⊗ G(x, y, σ D ), B = L 2y (x, y, σ I ) ⊗ G(x, y, σ D ), C = L x L y (x, y, σ I )(x, y, σ D ). After obtaining the multi-scale feature weights of each point, we need to select a specific threshold, and then compare the feature weights with the selected threshold. In order to select the relevant feature node set Q ∈ P, all the points whose feature weights are below a certain threshold ωmin are discarded. Points with weight less than ωmin are thrown away, and points with weight ωi > ωmin are placed in the feature node set Q. The points in Q constitute the feature point set of the sample surface we want, that is, the base point set of the subsequent matching process. Generally set as a linear function of the average value of the feature weights of all points, namely: N 1  ωi λ=α∗ N i=1

(6.4)

6.2.2 Real-Time Cloud Matching Acceleration of 3D Point Cloud Point cloud acquisition technology can be divided into contact scanner, lidar, structured light and triangulation [12]. Because 3D point cloud target recognition has the

6 Design of 3D Point Cloud Real-Time Cloud Matching Algorithm Based …

57

Fig. 6.1 3D point cloud information processing system

characteristics of large amount of data and high processing algorithm complexity, in order to realize stable, measurable, controllable and high-speed intelligent information processing in complex environment, an efficient and stable information processing architecture is needed. This paper constructs a reliable and efficient realtime image processor system. The process of 3D point cloud information processing system is shown in Fig. 6.1. The whole recognition system is divided into two parts: offline learning and online recognition. The task of the off-line part is to extract the feature vectors from 3D point cloud data of various ground targets at different angles and distances. Then the feature subset is learned offline, and LR is added on the basis of KD tree to accelerate the operation of point cloud matching. LR solution involves solving the difference equation, which is very suitable for FPGA devices containing a large number of sequential logic circuit units. Based on FPGA, the point cloud data is effectively accelerated by combining software and hardware. LR method is an algorithm for solving a class of mathematical optimization problems, which are characterized by the separability of objective function relative to decision variables, but the strong coupling between constraints. LR method is to relax these coupling constraints into the objective function in the form of similar penalty terms by introducing so-called Lagrangian multipliers, and transform the original problem into a relaxation problem without coupling constraints. LR algorithm belongs to an approximate optimization algorithm, which embodies the idea of decomposition and coordination. The task of decomposition is to divide a large-scale difficult problem into several small-scale simple problems to solve independently, and the task of coordination is to coordinate the conflicts among the solutions of each sub-problem by adjusting Lagrangian multipliers. In this paper, LR is added to KD tree to accelerate the operation of point cloud matching. We consider the following optimization problem P:


Z_P = min f(x)
s.t. g_j(x) ≤ 0, j = 1, 2, …, m,  x ∈ S ⊂ R^n    (6.5)

where f(x) and g_j(x) are convex functions and S is a convex set. The coupling constraints are relaxed into the objective function to form the relaxed problem LRP:

Z_LRP(λ) = min { f(x) + λ^T g(x) }  s.t. x ∈ S    (6.6)

where g(x) = (g_1(x), …, g_m(x))^T and the vector λ ≥ 0 is the Lagrange multiplier. The optimal value of the relaxed problem LRP is a lower bound on the optimal value of the original problem P. To improve the quality of this lower bound, we consider the following LR dual problem DP:

Z_DP = max_λ Z_LRP(λ) = max_λ min_{x ∈ S} { f(x) + λ^T g(x) }  s.t. λ ≥ 0    (6.7)
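To make Eqs. (6.5)–(6.7) concrete, the sketch below performs projected subgradient ascent on the dual: at each step it solves the relaxed problem over a finite candidate set standing in for S, then moves λ along the constraint values g(x*). The toy objective, constraint, grid and step-size rule are illustrative assumptions, not the system treated in this chapter.

```python
import numpy as np

def lagrangian_dual(f, g, candidates, steps=200, step0=1.0):
    """Projected subgradient ascent on Z_DP = max_{lambda>=0} min_{x in S} f(x) + lambda^T g(x)."""
    lam = np.zeros(len(g(candidates[0])))
    best_bound = -np.inf
    for t in range(1, steps + 1):
        # relaxed problem (6.6): pick the candidate minimising the Lagrangian
        values = [f(x) + lam @ g(x) for x in candidates]
        x_star = candidates[int(np.argmin(values))]
        best_bound = max(best_bound, min(values))          # a valid lower bound on Z_P
        # a subgradient of the dual at lambda is g(x_star); project back onto lambda >= 0
        lam = np.maximum(0.0, lam + (step0 / t) * g(x_star))
    return lam, best_bound

# toy instance: minimise x1^2 + x2^2 subject to 1 - x1 - x2 <= 0 on a grid
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: np.array([1.0 - x[0] - x[1]])
S = [np.array([a, b]) for a in np.linspace(0, 1, 21) for b in np.linspace(0, 1, 21)]
lam, lower_bound = lagrangian_dual(f, g, S)   # lower_bound approaches the optimum 0.5
```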

Because the dual function is concave and the feasible set is convex, the dual problem is a convex optimization problem, and the maximum value of the dual problem is a lower bound on the minimum value of the original problem. To satisfy the LR characteristics and the resource constraints of the FPGA hardware platform, and to determine the parallel computing parameters of the network deployment, a design space exploration method is needed. This design considers DSP resources while establishing the resource model, so as to explore the design space of computing and storage resources. The number of cycles the system uses to process a decision sample is given by Formula (6.8):

FoldCycle = FoldCycle_FP + FoldCycle_BP    (6.8)

where FoldCycle_FP represents the time spent on forward inference in the Current_Q module, and FoldCycle_BP represents the time spent on backward propagation in the Current_Q module.
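The FPGA/LR acceleration itself is hardware-specific, but the KD-tree correspondence search it speeds up can be sketched on the CPU as follows. The use of scipy, the distance threshold and the array shapes are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_feature_points(source, target, max_dist=0.05):
    """For each source feature point, find its nearest neighbour in the target
    cloud with a KD-tree; pairs farther apart than max_dist are rejected."""
    tree = cKDTree(target)                      # build once, O(N log N)
    dist, idx = tree.query(source, k=1)         # nearest target index per source point
    keep = dist < max_dist
    return np.flatnonzero(keep), idx[keep]      # (source indices, matched target indices)
```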

6.3 Experimental Analysis

The image used in this experiment is a building: the reference image is a jpg image of size 273 × 377 × 3, and the floating image is a jpg image of size 208 × 212 × 3. Working platform configuration: Lenovo THINKPAD E40, Intel(R) Core(TM) i5 processor, 2 GB memory, 32-bit operating system, MATLAB.


Fig. 6.2 Time consumption of algorithm after rotation transformation

The whole image registration system includes the following parts: image acquisition, image preprocessing, feature extraction, feature description, feature matching, mismatch elimination and transformation matrix estimation. Image acquisition uses pictures of the same scene taken by the camera from different angles as the experimental objects. Image preprocessing mainly eliminates noise and other interference in the experimental pictures before feature extraction. After rotating the floating image by different angles, the experimental registration results are shown in Fig. 6.2. The registration times of both the ref. [6] method and the cloud matching algorithm in this paper fluctuate considerably as the rotation angle changes, but the time of our algorithm always stays below the curve of the ref. [6] algorithm, so the cloud matching algorithm in this paper registers faster than the ref. [6] method. Figure 6.3 compares the repeatability of feature point extraction between this method and the ref. [6] method under different noise levels, i.e. the noise immunity of the two methods. The feature points detected by the two methods are almost the same without noise, but after adding Gaussian noise some points are misjudged as feature points, and the ref. [6] method has more misjudged points than ours, which shows that it also detects feature points in relatively flat areas. The repeatability of the ref. [6] method is 55%, while that of this method is about 74%, roughly 22% higher. Therefore, this method is more robust and repeatable under noise, can effectively control the number of feature points, and is of great significance for surface registration and 3D object recognition.

Fig. 6.3 Noise immunity contrast

Table 6.1 Comparison of sample processing time

Platform    Running time (ms)    Power consumption (W)    Speed-up ratio
GPU         6.011                260                      22.9879
CPU         4.0076               90                       14.8551
Our         0.201                10                       1

In order to explore the overall acceleration performance of the FPGA accelerator proposed in this paper for real-time 3D point cloud matching, a comparative experiment against CPU and GPU implementations was carried out; the results are shown in Table 6.1. Compared with the GPU and CPU, the running speed of the proposed FPGA accelerator is increased by 22.9879 and 14.8551 times, respectively. The FPGA-based accelerator also has an obvious advantage in running power consumption.
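For reference, the 2D registration pipeline enumerated at the start of this section (acquisition, preprocessing, feature extraction and description, matching, mismatch elimination, transform estimation) can be prototyped with off-the-shelf OpenCV components as below. This uses ORB features and a RANSAC homography purely as stand-ins; it is not the multi-scale algorithm proposed in this chapter, and the file paths and parameters are placeholders.

```python
import cv2
import numpy as np

def register_images(reference_path, floating_path):
    """Feature-based registration: ORB detection/description, brute-force
    matching, RANSAC mismatch elimination and homography estimation."""
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    flo = cv2.imread(floating_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(ref, None)      # feature extraction + description
    kp2, des2 = orb.detectAndCompute(flo, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # drops wrong matches
    return H, int(mask.sum())          # H maps the floating image onto the reference

# hypothetical usage with the building images described above
# H, n_inliers = register_images("reference.jpg", "floating.jpg")
```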

6.4 Conclusion

In order to fuse the information in different modal images, image registration must be carried out first. The complete information of a target surface often has to be obtained through multiple scans from multiple angles, so the point cloud data are acquired in different coordinate systems and the point clouds obtained from different angles and batches must be registered. In this paper, a real-time 3D point cloud matching algorithm based on multi-scale feature extraction is proposed, and LR is added to the KD tree to accelerate point cloud matching. The results show that under noise the repeatability of this method is about 74%, roughly 22% higher than the compared method, so the method is more robust and repeatable to noise. The running speed of the proposed FPGA accelerator is increased by 22.9879 and 14.8551 times compared with the GPU and CPU, and the FPGA-based accelerator also has an obvious advantage in running power consumption.

Acknowledgements This work was supported by the Natural Science Foundation of Hunan Province in 2023 (Research on Key Technologies and Optimization Algorithms for Real Time Matching of 3D Point Clouds in Complex Environments, Project No. 2023J50454) and the Science Research Project Fund of the Hunan Provincial Department of Education (Project No. 20B460), as well as the Double First Class Applied Characteristic Discipline of Control Science and Engineering at Huaihua University.

References 1. M. Shu, G. Chen, Z. Zhang, 3D point cloud-based indoor mobile robot in 6-DoF pose localization using a Wi-Fi-aided localization system. IEEE Access 2021(99), 1–1 (2021) 2. L. Tong, Y. Xiang, 3D point cloud initial registration using surface curvature and SURF matching. 3D Res. 9(3), 41 (2018) 3. C. Dinesh, I.V. Bajic, G. Cheung, Adaptive non-rigid inpainting of 3D point cloud geometry. IEEE Signal Process. Lett. 25(6), 878–882 (2018) 4. H. Farhood, S. Perry, E. Cheng et al., Enhanced 3D point cloud from a light field image. Remote Sens. 12(7), 1125 (2020) 5. Y. Nan, Q. Cheng, X. Xiao et al., Point cloud optimization method of low-altitude remote sensing image based on vertical patch-based least square matching. J. Appl. Remote Sens. 10(3), 035003 (2016) 6. L. Wang, Y. Liu, S. Zhang et al., Structure-aware convolution for 3D point cloud classification and segmentation. Remote Sens. 12(4), 634 (2020) 7. X. Li, S. Du, G. Li et al., Integrate point-cloud segmentation with 3D LiDAR scan-matching for mobile robot localization and mapping. Sensors 20(1), 237 (2019) 8. B. Bayram, T. Ozkan, H.C. Reis et al., Open source library-based 3D face point cloud generation. Chiang Mai J. Sci. 45(4), 1875–1887 (2018) 9. Y. He, Y. Mei, An efficient registration algorithm based on spin image for LiDAR 3D point cloud models. Neurocomputing 151(1), 354–363 (2015) 10. W. Liu, C. Wang, X. Bian et al., AE-GAN-Net: learning invariant feature descriptor to match ground camera images and a large-scale 3D image-based point cloud for outdoor augmented reality. Remote Sens. 11(19), 2243 (2019) 11. Y. Zhang, C. Li, B. Guo et al., KDD: a kernel density based descriptor for 3D point clouds. Pattern Recogn. 111(2), 107691 (2021) 12. S. Biookaghazadeh, P.K. Ravi, M. Zhao, Toward multi-FPGA acceleration of the neural networks. ACM J. Emerg. Technol. Comput. Syst. 17(2), 1–23 (2021)

Chapter 7

Design of Digital Music Copyright Protection System Based on Blockchain Technology Yirui Kuang and Yao Sanjun

Abstract The traditional copyright protection method has some problems, such as difficulty in confirming the right, complicated steps and long time for confirming the right, which can not effectively protect the legitimate rights and interests of the original author. Under the blockchain technology, the process of reshaping the trust mechanism through decentralization, and the trust mechanism determines the right subject and determines that the right subject is beneficial to copyright ownership can clarify the copyright ownership of digital music. Based on the in-depth analysis of existing technologies, standards and systems, this paper designs and implements a complete digital music copyright protection system. When reading resources, users are required to provide relevant authentication information, and users can browse resources only after providing correct authentication information and meeting the validity period of resources. A digital audio watermarking algorithm based on chaotic encryption is designed, which introduces chaos theory. Firstly, the watermark information is encrypted by low-dimensional chaotic mapping, and then the watermark is encrypted by three-dimensional Rossler chaotic attractor, which further improves the security of the watermark and enhances the robustness of the watermark. The research results show that the embedding time of this algorithm is kept within 2 s, which meets the real-time embedding requirements of watermark. The algorithm adopts the way of embedding multiple identical watermarks at the same time, which improves the robustness of watermarks. Keywords Blockchain · Digital music · Copyright protection

Y. Kuang · Y. Sanjun (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_7


7.1 Introduction The rise of digital technology has subverted the traditional music industry. Digital music not only brings more convenience and quickness to consumers, but also challenges the copyright system of traditional digital music. A large number of free pirated music has entered the market, which has caused a great impact on the legitimate rights and interests of copyright owners. With the advent of the digital age, the music industry is developing towards digitalization [1]. Perhaps when people enjoy this beautiful and convenient music, they don’t think too much about or even realize the copyright of digital music. However, with the popularization and development of Internet technology, the issue of copyright of Internet digital music is gradually emerging. Because it is extremely easy and low-cost to transfer and copy information between networks, digital copyright infringement incidents are common. The traditional copyright protection method has some problems, such as difficulty in confirming the right, complicated steps and long time for confirming the right, which can not effectively protect the legitimate rights and interests of the original author. In this context, the continuous development and maturity of blockchain technology will provide a new opportunity for the copyright protection of digital music. Under the blockchain technology, the trust mechanism is rebuilt through decentralization, and the process that the trust mechanism determines the right subject and determines that the right subject is beneficial to copyright ownership can clarify the copyright ownership of digital music [2, 3]. Blockchain system, like computer network system, is also a layered system, which can be roughly divided into protocol layer, extension layer and application layer. Different levels are transparent, and now most blockchain systems have provided protocol layer and extension layer for developers, and developers only need to develop appropriate programs or contracts through the application layer [4]. The unique digital abstract of the work is realized by blockchain technology, which can ensure that the original digital content will not be forged or tampered with, thus achieving registration and certification, and can be docked with government agencies, thus being protected and recognized. In this way, blockchain technology reshapes the trust mechanism of copyright transactions for users and rights holders by decentralizing and establishing safe, transparent and immediate trust guarantee [5, 6]. Digital rights management is a method to protect digital music content from unauthorized playing or copying. It provides a means for content providers to protect digital media content from illegal copying or use. In recent years, the booming blockchain technology has attracted people’s attention [7]. The data of blockchain technology is difficult to tamper with and decentralized, which makes it possible to effectively protect digital rights. Based on the in-depth analysis of existing technologies, standards and systems, this paper puts forward some new methods and strategies to improve the overall performance of digital music copyright protection system, and designs and implements a complete digital music copyright protection system.


7.2 Research Method 7.2.1 System Overall Design The feasibility of using blockchain technology to protect digital music copyright mainly benefits from the characteristics of distributed decentralization, extremely difficult to change, data encryption and data disclosure of blockchain technology, which provides a new and more convenient way for the registration, use, solidification of infringement and loss compensation calculation of digital music copyright. Because the infringer can easily delete the evidence of infringement by internet technology, it is difficult for the copyright holder of digital music to prove that the infringer has infringed his own digital music copyright in the infringement lawsuit, and at the same time, the infringer will not take the initiative to admit that there is infringement of other people’s digital music copyright. The development of blockchain technology provides a new way for copyright protection of digital music in China. Digital technology is the technical guarantee to realize the simplification and barrier-free of music copying and transmission, which makes the time for music works to reach the audience shorter, more convenient and more extensive [8, 9]. The transparency, openness and authenticity of blockchain technology realize the direct communication between the right holder and the right holder, query the information and the chain flow bar, and realize the real-time monitoring of the copyright of the work by the right holder. Blockchain technology uses asymmetric encryption to ensure data security, and operates data with intelligent contracts generated by automated script code, which is more open, independent and anonymous. Its information is true and reliable, which is conducive to solving the problem of lack of trust between people. On this basis, the DRM (Data Rights Management) trading system constructed by blockchain technology can effectively solve the problems of DRM privacy protection and DRM trading, and provide a powerful tool for promoting the healthy development of DRM industry. The traditional practice is to ask for a user name and password. The user name and password entered by the user will be verified by the system to confirm the identity of the user. With the development of technology and equipment, smart cards, fingerprint readers and digital certificates can also greatly enhance the security of media resources [10]. As a digital music copyright protection system, it is necessary to distinguish, encode, encrypt, and finally declare, broadcast and synchronize the relevant words, images, audio and video, binary files uploaded by users. Therefore, how to identify and store the files uploaded by users has become the core of the system design, and at the same time, corresponding measures should be taken for the authenticity of user identity and the copyright legality of files on the blockchain platform. When the smart contract runs the script code in the blockchain, a transaction that meets its conditional parameters is sent to this address on the blockchain where the smart contract is located. If the conditions are met, the judgment mechanism will make a decision to automatically execute the contract according to the preset information,


Fig. 7.1 Digital music copyright protection system framework

and then the transaction is reached. At this time, the contract is recorded in the blockchain in the form of code. This system adds digital copyright protection technology to the original digital media content after the copyright protection encryption server. The copyright protection technology adopted includes encrypting the digital media content with a symmetric key and adding a watermark to the digital media content. The metadata of digital media content (that is, the introductory information of media content) is released to the outside world through the WEB server. When users provide services such as streaming media, they must first go through the authentication and authority server, and then obtain the corresponding authority. Figure 7.1 is the framework of a digital music copyright protection system based on blockchain technology. The resource browser of digital copyright protection system is mainly used to browse encrypted resources. When reading resources, users are required to provide relevant authentication information. Only after providing correct authentication information and meeting the validity period of resources can users browse resources, but the resource browser system will still prohibit users from copying, copying and saving resource files. If you want to encourage creation and protect copyright, you must pay attention to the property rights of works. From a practical perspective, the scientific development is changing with each passing day, and it is the general trend that traditional music communication carriers lose their position and are eventually eliminated. This is not something that can be recovered by strengthening the supervision of the Internet and strengthening the protection of network copyright. Faced with this situation, the copyright of audio-visual works such as music must find new forms and new carriers. The resource obtained by the user is a resource file packaged by the server of the resource management system. It is necessary to identify the type of the file in


the automatic resource identification module, open the resource file correctly, and present the resource information to the user. Capture these instructions, gain control, process or change them before they reach the resource browser window.

7.2.2 Key Technology Realization

Under blockchain technology, digital music copyright data are all stamped with timestamps and arranged in chronological order, which is irreversible. Moreover, the distributed, synchronized ledger stores the data in a decentralized database, which is transparent and authentic, effectively prevents deletion and tampering of the data, and can serve as a powerful basis for proof in rights protection and accountability. Therefore, we should turn our attention to the Internet, strengthen the construction of legitimate music websites, authorize and license websites, and obtain corresponding benefits; this is the way for copyright owners to realize their rights in the network environment. At the same time, the website pays the copyright owner the license fee while network users can still download music for free, so there is no conflict between them. Because the application and business models of digital technology in our country are still at an exploratory stage, legal norms are still imperfect and law enforcement is weak, the entry barriers for enterprises are relatively low. Music platforms' indifference to copyright directly caused large-scale piracy on the Internet, and the scale of infringement expanded rapidly, which seriously misled the consumption habits of netizens in China in the early stage of industrialization and brought serious negative effects to legitimate music websites. Copyright infringement here means that pirates resell works without the permission of the copyright owner; therefore, data encryption, digital signatures and related technologies are the focus of current digital product owners. Cryptographic technology based on public or private keys can convert plaintext into confidential information that others cannot understand and can control data access [11]. Audio watermarking is a young, multidisciplinary subject that integrates audio signal processing, cryptography, communication theory, signal compression and human auditory system theory. It makes full use of the redundancy of the human visual and auditory systems and embeds secret information related to the copyright owner in order to confirm copyright ownership. In this paper, a digital audio watermarking algorithm based on chaotic encryption is designed. The algorithm introduces chaos theory into watermark information processing: first the watermark information is encrypted by a low-dimensional chaotic map, and then the watermark is encrypted again by the three-dimensional Rossler chaotic attractor, which further improves the security of the watermark and enhances its robustness.

DWT (Discrete Wavelet Transform) has been widely used in the field of digital audio. After DWT, the audio signal is decomposed in the time and frequency domains at the same time; its time resolution is mainly reflected in the high-frequency band, while its frequency resolution is mainly reflected in the low-frequency band [12]. Watermark information is added to the low-frequency component of the original signal, and coefficients with higher energy are selected for embedding, so that the watermark information is easily masked; this ensures the robustness of the system while maintaining transparency. Therefore, the watermark embedding points are selected from the low-frequency component after a three-level DWT. In order to compare the original watermark information with the extracted watermark information more intuitively, a 64 × 64 binary image is used as the watermark. The watermark image is encrypted twice by chaos, and the 3D Rossler chaotic system, with a more complex structure and larger key space, is introduced in the second encryption. First, the original watermark image is scrambled and encrypted by the two-dimensional Arnold mapping, and the once-encrypted image is encrypted again by the 3D Rossler chaotic attractor. Figure 7.2 shows the watermark encryption process.

Fig. 7.2 Watermark encryption process
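The embedding step described above (three-level DWT of the audio, embedding into the low-frequency approximation band) can be sketched with the PyWavelets package as follows. The wavelet choice, the quantisation (QIM) rule used to embed each bit and the frame handling are assumptions, since the chapter does not spell out its exact embedding rule.

```python
import numpy as np
import pywt

def embed_bits(audio_frame, bits, delta=0.05, wavelet="db4"):
    """Embed watermark bits into the level-3 DWT approximation coefficients of
    one audio frame using a simple quantisation rule (assumes len(bits) <= len(cA3))."""
    coeffs = pywt.wavedec(audio_frame, wavelet, level=3)   # [cA3, cD3, cD2, cD1]
    cA3 = coeffs[0].copy()
    for i, b in enumerate(bits):                           # one bit per coefficient
        # quantise onto the sub-lattice associated with bit b
        cA3[i] = delta * np.round((cA3[i] - b * delta / 2) / delta) + b * delta / 2
    coeffs[0] = cA3
    return pywt.waverec(coeffs, wavelet)[: len(audio_frame)]

def extract_bits(audio_frame, n_bits, delta=0.05, wavelet="db4"):
    """Recover the bits by checking which sub-lattice each coefficient is closer to."""
    cA3 = pywt.wavedec(audio_frame, wavelet, level=3)[0]
    bits = []
    for c in cA3[:n_bits]:
        d0 = abs(c - delta * np.round(c / delta))
        d1 = abs(c - (delta * np.round((c - delta / 2) / delta) + delta / 2))
        bits.append(0 if d0 <= d1 else 1)
    return np.array(bits, dtype=np.uint8)
```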


The two-dimensional binary image used as watermark information is encrypted by the Arnold scrambling transform, which scatters damaged points throughout the whole picture and improves the recognizability of the image. Arnold scrambling permutes an image by repeatedly changing the positions of its pixels. The two-dimensional Arnold scrambling of a digital image of size N × N is defined as:

(x′, y′)^T = [[1, 1], [1, 2]] (x, y)^T (mod N),  i.e.  x′ = (x + y) mod N,  y′ = (x + 2y) mod N    (7.1)

where x, y ∈ {0, 1, …, N − 1} is the pixel position before Arnold scrambling, (x′, y′) is the pixel position after scrambling, and mod N denotes the modulo-N operation. Logistic chaotic encryption is then performed on the dimension-reduced binary image: suitable values x_0, μ are selected (and saved as key key1) to generate a one-dimensional chaotic encryption sequence R(k), and the encrypted sequence becomes:

Z′(k) = Z(k) ⊕ R(k),  0 ≤ k < N    (7.2)
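A minimal sketch of the two operations just described — Arnold scrambling of the 64 × 64 binary watermark (Eq. (7.1)) followed by the logistic-map XOR of Eq. (7.2) — is given below. The specific key values x0 and μ and the number of scrambling rounds are illustrative assumptions; applying the same XOR again with the same key decrypts.

```python
import numpy as np

def arnold_scramble(img, rounds=5):
    """Arnold scrambling of an N x N image: (x', y') = ((x + y) mod N, (x + 2y) mod N)."""
    n = img.shape[0]
    out = img.copy()
    for _ in range(rounds):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out

def logistic_xor(bits, x0=0.3579, mu=3.99):
    """Eq. (7.2): XOR the flattened watermark bits with a binary keystream R(k)
    produced by the logistic map x_{k+1} = mu * x_k * (1 - x_k); (x0, mu) act as key1."""
    x, stream = x0, np.empty(bits.size, dtype=np.uint8)
    for k in range(bits.size):
        x = mu * x * (1.0 - x)
        stream[k] = 1 if x > 0.5 else 0          # binarise the chaotic orbit
    return bits ^ stream                          # Z'(k) = Z(k) XOR R(k)

# usage on a 64 x 64 binary watermark
w = (np.random.rand(64, 64) > 0.5).astype(np.uint8)
encrypted = logistic_xor(arnold_scramble(w).ravel())
```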

The Rossler chaotic system is well known in the field of nonlinear dynamics and has been applied successfully in many areas. Its mathematical model is:

ẋ = −(y + z)
ẏ = x + b·y
ż = c + z·(x − d)    (7.3)

where b, c, d are the system parameters; when b = c = 0.2 and d = 5.7 the system enters the chaotic regime. Applying the 3D Rossler chaotic system to encryption offers characteristics that low-dimensional chaotic systems lack: first, the unpredictability of the obtained chaotic sequence is improved; second, the high-dimensional system compensates for the period degradation of low-dimensional chaotic maps under finite precision, which enlarges the key space and further improves the security of the algorithm.
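For the second encryption stage, one common way to turn the Rossler attractor of Eq. (7.3) into a key sequence is to integrate the system numerically and binarise one coordinate, as sketched below. The Euler integration, step size, initial state and thresholding rule are assumptions — the chapter does not specify how the attractor is converted into key bits.

```python
import numpy as np

def rossler_keystream(n, b=0.2, c=0.2, d=5.7, dt=0.01, state=(0.1, 0.0, 0.0)):
    """Euler-integrate the Rossler system of Eq. (7.3) and binarise the x
    coordinate to obtain n key bits for the second encryption stage."""
    x, y, z = state
    bits = np.empty(n, dtype=np.uint8)
    for k in range(n):
        dx = -(y + z)
        dy = x + b * y
        dz = c + z * (x - d)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        bits[k] = 1 if x > 0.0 else 0      # crude binarisation of the chaotic orbit
    return bits

# second-stage encryption of the already scrambled/encrypted watermark bits
# doubly_encrypted = encrypted ^ rossler_keystream(encrypted.size)
```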

7.3 Result Analysis

In order to verify the correctness of the algorithm, we test the performance of the audio digital watermarking. Simulation experiments are carried out with MATLAB under the Windows 10 operating system. The sample types include voice, country music, pop music, rock music, etc.; the audio is mono, in wav format, with a sampling rate of 44,200 Hz. The running-efficiency test results for the different types of audio files are shown in Fig. 7.3.

Fig. 7.3 Embedding time of different types of audio watermark embedding algorithms

From the data, the embedding time of this algorithm is within 2 s for digital audio files with a playback time of 60 s, while the embedding time of the classic MP3Stego algorithm is about 18 s; the embedding speed of this algorithm is therefore fast and the waiting time short, which meets the requirement of real-time watermark embedding. The future business model of content production platforms is bound to be a profit model based on micro-copyright payment; platforms will also need to give up some original benefits to develop knowledge payment and use blockchain technology to maximize the benefits of micro-copyright while realizing micro-copyright protection. To test robustness, the experiment attacks the watermarking system in the following ways: adding Gaussian white noise, adding colored noise, re-sampling, random cropping and low-pass filtering. A Butterworth low-pass filter of order 6 with a cut-off frequency of 10 kHz is used to low-pass filter the audio signal containing the watermark, and the watermark is then extracted. Figure 7.4 shows the SNR (signal-to-noise ratio) and NCC (normalized correlation coefficient) for pop music.

Fig. 7.4 SNR and NCC of pop music

From the experimental results, the information extracted from the watermark has high similarity with the original information and a high SNR, which shows that the method is robust and imperceptible. In terms of watermark robustness, the algorithm embeds multiple identical watermarks at the same time and determines each watermark bit by majority vote during extraction, which enhances the error-correction ability of the watermark and improves robustness compared with embedding a single watermark. Only when copyright is fully respected and integrated into a normalized digital music business model can the digital music industry gain development momentum and creative enthusiasm. For search engine vendors, service providers and digital music operators, cooperation based on copyright protection is the only way to achieve sustainable, mature development of the digital music industry and a win–win situation for all parties.
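For completeness, the two quality measures reported in Fig. 7.4 — the signal-to-noise ratio of the watermarked audio and the normalized correlation coefficient between the embedded and extracted watermarks — are computed with the standard formulas sketched below (a minimal illustration, not code from the chapter).

```python
import numpy as np

def snr_db(original, watermarked):
    """SNR in dB between the original and the watermarked audio signal."""
    noise = original - watermarked
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))

def ncc(w_embedded, w_extracted):
    """Normalized correlation coefficient between two (binary) watermarks."""
    a = w_embedded.astype(float).ravel()
    b = w_extracted.astype(float).ravel()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))
```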

7.4 Conclusion

The continuous improvement of blockchain technology and its application in the field of digital music copyright protection provide a new opportunity for digital music copyright protection. Blockchain is tamper-resistant and decentralized, which makes it possible to protect digital copyright effectively. By analyzing existing technologies, standards and systems, this paper puts forward a new scheme for digital music copyright protection and completes the design and implementation of a digital music copyright protection system. Audio watermarking is a young, multidisciplinary subject that integrates audio signal processing, cryptography, communication theory, signal compression and human auditory system theory. This paper designs a digital audio watermarking algorithm based on chaotic encryption. The research results show that the embedding time of the algorithm stays within 2 s, which meets the real-time embedding requirement, and embedding multiple identical watermarks at the same time improves the robustness of the watermark.

References 1. J. Nie, Blockchain-based digital publishing copyright protection. Publ. Res. 2017(9), 4 (2017) 2. N. Mao, X. Zhang, Network copyright protection based on blockchain technology. Libr. Forum 39(8), 7 (2019) 3. S. Zhang, Y. Dong, Research on digital copyright protection based on blockchain technology. Sci. Technol. Manag. Res. 40(1), 5 (2020) 4. A. Kaur, M.K. Dutta, High embedding capacity and robust audio watermarking for secure transmission using tamper detection. ETRI J. 40(1), 133–145 (2018) 5. Y. Jia, Research on Internet copyright trading based on blockchain technology. Technol. Publ. 2018(7), 9 (2018) 6. J. Sun, Digital infringement status and copyright protection of sci-tech journals—a preliminary study on the feasibility of blockchain technology. China Sci-Tech J. Res. 029(010), 1000–1005 (2018) 7. Y. Qin, Y. Wang, Application prospect of audio book copyright protection in the “blockchain” + era. Chin. Editor. 2020(4), 7 (2020) 8. J. Lv, Z. Zhang, Y. Xu, Blockchain-based digital copyright protection method for social networks. Comput. Eng. Des. 2021(006), 042 (2021) 9. W. She, J. Chen, Z. Gu et al., Location privacy protection model of internet of things nodes based on blockchain. J. Appl. Sci. 38(1), 13 (2020) 10. M.J. Hwang, J.S. Lee, M.S. Lee et al., SVD based adaptive QIM watermarking on stereo audio signals. IEEE Trans. Multimed. 2017(99), 1 (2017) 11. G. Hua, J. Huang, Y.Q. Shi et al., Twenty years of digital audio watermarking—a comprehensive review. Signal Process. 128(10), 222–242 (2016) 12. M. Zamani, A. Manaf, Genetic algorithm for fragile audio watermarking. Telecommun. Syst. 59(3), 1–14 (2015)

Chapter 8

Personalized Music Recommendation Model Based on Collaborative Filtering Algorithm and K-Means Clustering Huijia Peng and Sanjun Yao

Abstract Personalized music recommendation technology provides personalized service for users, so that users can find the products they are interested in in the shortest time. There are many recommendation technologies: traditional information filtering is mostly content-based, and in addition there are collaborative filtering, clustering, Bayesian networks, association rules and so on. Among them, CF (Collaborative Filtering) is the most mature and successful technology. Therefore, this paper establishes a personalized music recommendation model based on the CF algorithm and K-means clustering, and verifies through simulation experiments that its accuracy is better than that of the model established in reference [5]: the average accuracy for the three interest levels (general interest, interested, very interested) is 54.16%, 70.77% and 81.83%, respectively, better than the test results of the model in reference [5]. It can be seen that this model measures the interests of these professional users more accurately, which makes the recommendation results better match users' needs. Keywords Collaborative filtering algorithm · K-means clustering · Personalized music recommendation model

8.1 Introduction With the development of online music, massive amounts of music can be easily obtained by people through the internet. How to discover their favorite music from the massive amount of music has become a problem faced by users. Personalized music recommendation technology provides users with personalized services, enabling them to find products of interest in the shortest possible time. There are many recommended technologies. Therefore, personalized music recommendation H. Peng · S. Yao (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_8


technology has become increasingly important, which can effectively help users quickly discover their favorite music from the massive amount of music. Most of the traditional information filtering technologies are content-based filtering technologies. In addition, there are collaborative filtering technologies, clustering, Bayesian network, association rules, etc. [1]. This algorithm is very general, but for songs that have not been rated by users, CF appears powerless; Content filtering establishes feature vectors based on song content, and uses similarity algorithms to identify songs with similar content. This recommendation is applicable to songs that have not been rated by users, and to some extent complements CF [2]. Among them, CF technology is the most mature and successful technology used. CF recommends songs to users based on ratings from others with similar preferences. This article establishes a recommendation model based on the relationship between songs using the CF algorithm and K-means clustering, as well as the probability distribution of songs under different themes. K-means clustering is used to study the implicit dependency relationships between column attributes, which is beneficial for fusing column attribute values. This simplifies the classification of missing values by the K-means clustering algorithm in the data [3]. The algorithm has expanded the application field of naive Bayesian network. Obtaining recommended songs for a certain song under different themes is a beneficial supplement to content filtering. CF first analyzes user interests, and then provides their experiences and suggestions to users within the same user group as a reference, in order to satisfy people’s mentality of referring to others’ opinions before making decisions. But sometimes users hope to get recommendations that can be refined into different themes, such as emotional similarity, music type similarity, and even musical instrument similarity. Compared with other algorithms, this algorithm has the greatest advantage of being simple, easy to implement, high execution efficiency, and relatively high accuracy in music recommendation [4].

8.2 Collaborative Filtering Algorithm The basic idea of CF algorithm is to recommend information interested by users with similar interests to target users. Such as music recommendation and movie recommendation. CF algorithm is divided into: user-based CF and project-based CF. CF algorithm can be divided into two categories: storage-based CF algorithm and model-based CF algorithm. At present, most of the practical applications of CF algorithms belong to the category of storage-based CF algorithms. Unlike the traditional content-based filtering, which directly analyzes the content for recommendation, CF analyzes the user’s interest, finds the similar users of the specified user in the user group, and synthesizes the evaluations of these similar users on a certain information, thus forming a system’s preference prediction for the specified user [5]. The working principle of content filtering: probability statistics and machine learning are used to realize filtering. First, a user interest vector is used to represent the user’s information needs. Then, the text in the text set is segmented, indexed, and weighted by word


Fig. 8.1 CF algorithm flowchart

frequency statistics to generate a text vector; Finally, the similarity between the user vector and the text vector is calculated, and the resource items with high similarity are sent to the registered users of the user model. CF algorithm is different from the traditional algorithm, and its algorithm flow is shown in Fig. 8.1. The core of CF recommendation algorithm is to calculate the nearest neighbor set through similarity measurement method, and return the nearest neighbor rating result as the recommendation prediction result to the user. The CF algorithm mainly studies a set of projects evaluated by the target user, calculates the similarity between these projects and the target project, and then outputs from the top K projects with the highest similarity, which is different from CF [6]. CF recommendations rely on the relationships between projects to determine recommendations. According to the projects that have been evaluated by the target users, calculate the similarity between the project and other projects, and select a series of the most similar projects to generate recommendations. Due to the similarity between projects, the rating range of other projects can be inferred from the ratings of known projects, and projects with similar ratings can be recommended.


8.3 Design of Personalized Music Recommendation Model

During registration, new users fill in basic information such as age, gender and occupation; after registration they fill in personal interest and hobby labels, so that they can then rate and browse normally. A user's rating of a resource reflects their satisfaction with it and, in the vast majority of cases, represents the quality of the resource. CF recommendations based on rating data have excellent accuracy, and the quality of their recommendation results can be guaranteed [7, 8]. As the system interacts with users, it records their information and gradually captures their interests and hobbies. Users can rate their favorite music, and the system stores their ratings and browsing behavior in a log; the data analysis engine is then called to analyze the user's behavior and update the stored user information. As a supplement to personalized recommendations, common charts include popular music charts, high-scoring blockbuster charts, box-office charts, etc. The core data layer stores the core business data of the entire system, such as the user information and music information required for calculating recommendation lists, user similarity lists and music similarity lists. Because of the large amount of data processed, the principal component analysis method is combined with the CF algorithm to perform dimensionality reduction while preserving the information content, which greatly helps the algorithm focus on the classification model [9]. We study personalized music through the CF algorithm combined with K-means clustering and construct a personalized music recommendation model; the model design is shown in Fig. 8.2. There are two methods for obtaining user behavior data in personalized music recommendation models: explicit rating and implicit rating. Explicit rating is the process in which users rate resources directly; implicit rating records user behavior data and weights it into the user's rating of resources. Both have advantages and disadvantages: the former is more intuitive and accurately reflects users' preferences for resources, but because rating operations may interrupt the user's activity, it can be inconvenient [10]. Based on the user's recommendation content, the following similarity model for personalized music recommendation content can be obtained:

LogInfo(q) = a · score(d, d_read) − γ · score(d, d_delete)    (8.1)

where a and γ are the weights of the related (read) content set and the irrelevant (deleted) content set, respectively. Thus, the personalized music recommendation model can be obtained:

Interest(q) = Sim(d_BG, d) + LogInfo(q)    (8.2)

Fig. 8.2 Personalized music recommendation model

To sum up, the measure of a user's interest in personalized music recommendation content combines the correlation between a document and the background information filled in by the user with the interest model established from the user log. In this paper, the music content that a user has read is treated as relevant content — we assume the user is interested in it — and is used to obtain the user's interest information. The similarity score between the target content and the read content can be expressed as:

score(d, d_read) = Sim(d, d_read) / n_read    (8.3)

where the score is normalized by n_read, the total number of times all relevant (read) documents have been read; the more often a document has been read, the greater its influence, and the normalization makes the influence of the personalized recommendation content far greater than the weight of the original recommendation content. Similarly, the similarity score between the target content and the deleted content can be expressed as:

score(d, d_delete) = Sim(d, d_delete) / n_delete    (8.4)


where n_delete is the number of deleted music recommendation items. When the user's interests change, the model can capture the change and make new recommendations. These data are direct inputs to personalized music recommendation, such as a user's K nearest neighbors and the N pieces of music recommended to that user. Finally, the algorithm uses K-means clustering to study the implicit dependency relationships between column attributes, which helps fuse column attribute values and simplifies the classification of missing values in the subsequent K-means clustering step [11]. This is equivalent to expanding the application field of the naive Bayesian network: when the data volume is large and there are implicit relationships between the attributes of each dimension, a naive Bayesian network struggles to achieve ideal results, and the improved CF algorithm is an effective way to deal with these problems. We preprocess the personalized music recommendation content and perform session recognition, that is, extract, decompose and merge the original data, convert it into a format suitable for data mining, and save it to relational database tables.
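Pulling Eqs. (8.1)–(8.4) together, a minimal sketch of the interest computation is given below. Documents are assumed to be represented as fixed-length vectors (e.g. TF-IDF), and the weight values a and γ, the averaging over the read/deleted sets and the cosine similarity are illustrative choices consistent with, but not dictated by, the formulas above.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def interest(doc, background, read_docs, deleted_docs, a=1.0, gamma=0.5):
    """Interest(q) of Eq. (8.2) = Sim(d_BG, d) + LogInfo(q), where LogInfo(q)
    (Eq. (8.1)) combines the normalised read/deleted scores of Eqs. (8.3)-(8.4)."""
    score_read = sum(cosine(doc, d) for d in read_docs) / max(len(read_docs), 1)
    score_deleted = sum(cosine(doc, d) for d in deleted_docs) / max(len(deleted_docs), 1)
    log_info = a * score_read - gamma * score_deleted      # Eq. (8.1)
    return cosine(doc, background) + log_info              # Eq. (8.2)
```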

8.4 Implementation of a Personalized Music Recommendation Model 8.4.1 Personalized Music Model Recommendation Process The music song network is a ringed undirected graph of music composed of nodes and undirected edges. The undirected edge is established based on the similarity between songs. Music only builds an edge when the similarity between two songs exceeds a dynamically set threshold. Music clustering is the process of dividing a dataset into several classes to complete a music clustering task, which mainly depends on feature selection and nearest neighbor measurement. Feature selection requires appropriate selection of features for music, including as much information as possible related to the task. In this model, the edge factor function represents the impact of the relationship between songs in the input song music network on song recommendation under the theme. In the input song network, reducing and minimizing information redundancy in music features is the main goal. Because in supervised music classification, it is necessary to use pre processing of previous features. After analyzing the music recommendation content and obtaining user preferences, music can be


calculated based on user preferences to calculate similar users and items, and then recommendations can be made based on similar users or items [12]. This means that music is two branches of CF, user based and music recommended content based CF. The music online section is mainly a real-time recommendation engine. The engine first analyzes the user’s access behavior online, and then executes CF music calculation based on the current user access page sequence and user configuration files to generate recommendation pages for the server, and then returns them to the browser, so that music can more accurately personalized music recommendation content to users. When calculating the similarity between users, music takes a user’s preference for all items of music as a vector, while when calculating the similarity between items, it takes all users’ preferences for a certain music item as a vector. The similarity relationship between users in a music group is a set of users with similar access patterns. The CF recommendation algorithm music believes that if two users have similar access patterns, it means that their interests in accessing other music websites are also similar. After the similarity of the music is finally calculated, the next step is to find the neighborhood music.

8.4.2 Result Analysis

In the personalized music recommendation model based on the CF algorithm and K-means clustering, we only need to find the cluster center with the highest similarity first and then search for neighbors within that cluster; if not all neighbors are found in the most similar cluster, the search moves to the cluster with the second-highest similarity, and so on until all neighbors are found (a minimal sketch of this search follows this paragraph). In order to verify the effectiveness of the proposed model, it is tested through simulation experiments. The data set comes from the Internet. For both the model of reference [5] and the model established in this paper, search logs of 25 users looking for teaching and scientific-research information were collected; each user searches for a course keyword and downloads 200 pieces of music. The relevance of the 200 pieces of music downloaded by each of the 25 users was judged manually, yielding three levels of test results: general interest, interested and very interested. The experimental results are shown in Figs. 8.3 and 8.4.
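The cluster-first neighbour search described above can be sketched with scikit-learn's K-means as follows; the number of clusters, the cosine ranking of cluster centres and the rating-matrix layout are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def neighbours_via_clusters(R, user, n_clusters=20, k=10):
    """Rank K-means cluster centres by similarity to the target user, then collect
    neighbours cluster by cluster until k of them have been found."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(R)
    u = R[user]
    centre_sim = km.cluster_centers_ @ u / (
        np.linalg.norm(km.cluster_centers_, axis=1) * np.linalg.norm(u) + 1e-9)
    found = []
    for c in np.argsort(-centre_sim):                     # most similar cluster first
        members = np.flatnonzero((km.labels_ == c) & (np.arange(len(R)) != user))
        sims = R[members] @ u / (
            np.linalg.norm(R[members], axis=1) * np.linalg.norm(u) + 1e-9)
        found.extend(members[np.argsort(-sims)])
        if len(found) >= k:
            break
    return found[:k]
```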


Fig. 8.3 Test results of the model in reference [5]

Fig. 8.4 Test results of this model

find rules from users’ behaviors and preferences and make recommendations based on them, so how to collect users’ preference information becomes the most basic determinant of the system recommendation effect. There are many ways for users to provide their own preference information to the system. It can be seen that this

8 Personalized Music Recommendation Model Based on Collaborative …

81

model can measure the interest of these professional users more accurately, which makes the recommendation results more in line with the needs of users.

8.5 Conclusions This article establishes a personalized music recommendation model based on the CF algorithm and K-means clustering, which combines the local attributes and global structure of songs to obtain theme based song recommendations. The interests of the vast majority of people also apply to an individual, that is, public interests can represent individual interests. CF refers to building models for a large number of users. The collaborative recommendation algorithm proposed in this article is currently a commonly used recommendation algorithm, mainly based on two types of filtering: project based and user based. At the same time, combined with K-means clustering, personalized music recommendation adopts a popular evaluation strategy to filter and recommend personalized music recommendation content, which has significant value in improving the accuracy of user content and enhancing user recall. There are two methods for obtaining user behavior data in personalized music recommendation models: explicit rating and implicit rating. Explicit scoring refers to the process in which users rate resources directly. Implicit rating is the process of recording user behavior data and weighting it into the user’s rating of resources. It is not recommended in a general manner without considering the theme. On the other hand, with the concept of probability graph models, songs are no longer independent individuals, but instead become an interconnected and interdependent network, providing a foundation for better recommendations.

References 1. H. Ning, Q. Li, Personalized music recommendation simulation based on improved collaborative filtering algorithm. Complexity 20(11), 61–78 (2020) 2. R. Zhang, Z. Hu, Collaborative filtering recommendation algorithm based on bee colony Kmeans clustering model. Microprocess. Microsyst. 65(10), 103424–103444 (2020) 3. L. Chen, Y. Luo, X. Liu et al., Improved collaborative filtering recommendation algorithm based on user attributes and K-means clustering algorithm. J. Phys. Conf. Ser. 1903(1), 012036– 012055 (2021) 4. X. Han, Z. Wang, H.J. Xu, Time-weighted collaborative filtering algorithm based on improved mini batch K-means clustering. Adv. Sci. Technol. 105(87), 309–317 (2021) 5. W.U. Yun, J. Lin, M.A. Yanlong, A hybrid music recommendation model based on personalized measurement and game theory. Chin. J. Electron. 32(15), 1–10 (2022) 6. J.S. Gomez-Canon, E. Cano, T. Eerola et al., Music emotion recognition: toward new, robust standards in personalized and context-sensitive applications. IEEE Signal Process. Mag. 66(38), 49–62 (2021) 7. M.U. Hassan, N. Zafar, H. Ali et al., Collaborative filtering based hybrid music recommendation system. Complexity 49(15), 59–67 (2022)


8. J. Chen, C. Zhao, Uliji et al., Collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering. Complex Intell. Syst. 48(1), 19–46 (2020) 9. N.F. Al-Bakri, S. Hassan, Collaborative filtering recommendation model based on K-means clustering. Al-Nahrain J. Eng. Sci. 22(1), 74–79 (2022) 10. X. Wang, Z. Dai, H. Li et al., Research on hybrid collaborative filtering recommendation algorithm based on the time effect and sentiment analysis. Complexity 20(2), 1–11 (2021) 11. J. Liu, J. Song, C. Li et al., A hybrid news recommendation algorithm based on K-means clustering and collaborative filtering. J. Phys. Conf. Ser. 1881(3), 032050–032066 (2021) 12. W.U. Haijin, J. Chen, Context-aware music recommendation algorithm based on classification and collaborative filtering. J. Fuzhou Univ. Nat. Sci. Ed. 11(7), 41–56 (2019)

Chapter 9

Simulation of Fuzzy Calculation Model of Music Emotion Based on Improved Genetic Algorithm Yiming Wang, Yaping Tang, and Yuanling Ouyang

Abstract With the development of science and technology, the storage capacity of electronic data is getting stronger and stronger, which makes it possible to store and manage massive electronic multimedia data. Music emotion calculation involves the complex emotion representation of multi-dimensional and multi-level structure, and the fuzziness, subtlety and diversity of emotion itself make the traditional emotion recognition methods generally inefficient and inaccurate. In order to improve the recognition accuracy, firstly, Gaussian radial basis function is used for nonlinear mapping to distinguish, extract and amplify more detailed information. This article combines GAs with the “model music” composition method. Generate a music model according to the model music composition method, and then use Genetic Algorithms (GA) to evolve the music to produce music that is different and more characteristic from traditional composition methods. And verified by experiments. Based on this, a rule adjustment strategy based on improved GA is proposed to further address the shortcomings of uniformly setting cluster radius thresholds in the algorithm, and systematic experiments are conducted. The experimental results show that the optimized fuzzy computing model has better recognition effects compared to the Bayesian classification method based on probability and statistics. Keywords Artificial intelligence · Improved genetic algorithm · Support vector machine · Music · Emotional analysis

Y. Wang · Y. Tang (B) College of Music and Dance, Hunan University of Humanities, Science and Technology, Loudi 417000, China e-mail: [email protected] Y. Ouyang Changsha Human Resources Public Service Center, Changsha 410000, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_9


9.1 Introduction In the history of human music, the outpouring and externalization of emotion is an eternal theme, and emotion is the essential feature of music. Modern psychological research shows that the non-semantic organizational structure of music vibrating by sound waves has a direct isomorphic relationship with human emotions and will activities. It is this “isomorphic relationship” that provides all kinds of possibilities for music to simulate and depict people’s emotional activities in many ways by analogy or analogy [1]. Music emotion analysis is a research direction of artificial intelligence. In the last century, when artificial intelligence just started, some people raised the related problems of music emotion analysis, hoping that the computer can identify the emotions expressed by music by using the related technologies of artificial intelligence, and then make the computer achieve breakthroughs in emotion identification, automatic composition, emotional music retrieval and so on [2]. Music can subtly affect people’s hearts and cultivate their sentiments. Beautiful and peaceful music can soothe people’s hearts, heal heartbreaks, and lyrical music can express people’s love for a happy life and their longing for a better life in the future [3]. Therefore, music, an ancient and long-standing art form, has almost accompanied the entire process of human development and has always occupied an indispensable and important position in people’s lives. However, for computer composition systems, it is important to consider how to generate more musical segments and more diverse pieces of music [4]. Moreover, interactive composition system needs a lot of users’ participation, which will undoubtedly add a lot of workload to users, and one of the purposes of computer composition system design is to make computers imitate composers to automatically generate ideal music. Emotional calculation of music includes expression, recognition, retrieval and synthesis of emotions in music [5]. Emotion recognition of music is a typical pattern recognition problem. The commonly used methods include statistical methods, machine learning, various clustering and classification methods, neural network technology and fuzzy pattern recognition. Music itself is also the object that people should devote themselves to study, because music has a strong influence on human personal life, personal growth, group action and even a nation’s way of thinking [6]. In terms of the impact of music on individuals, the French thinker Rousseau once said, “Many of my achievements in science are inspired by music.“ Scientists Einstein, writers Gorky, and Balzac are also inextricably linked to music. Music can also relieve stress and relax the body and mind. Everyone should have the experience of waking up in the morning and playing a lively piece of music, which can make people physically and mentally happy and full of energy [7]. The existing music emotion models generally adopt a discrete classification approach, while the corresponding labeling of music emotions adopts an emotional clustering form, which is not consistent with the actual situation of music emotion cognition. For example, it is difficult to determine that a piece of music with a “happy” emotion definitely does not have a “satisfying” emotion [8]. People’s description of music emotion is a subjective description and perceptual knowledge based on fuzzy cognition, and fundamentally, it is a description of fuzzy


phenomena. Music emotion has, in essence, the characteristics of fuzziness, subjectivity, objectivity and dynamism. It is quite difficult to establish a traditional, precise mathematical model in the study of artificial music emotion. In previous research, this paper attempted to define a music emotion calculation model based on fuzzy logic, using membership functions to express the semantic attributes of fuzzy concepts and establishing fuzzy relations. However, this must be done through extensive experiments under the guidance of experts, and the construction process is quite time-consuming and laborious. To solve this problem, this article establishes a fuzzy calculation model of music emotion based on an improved GA. According to the definition of music emotion language values, the definition of music emotion vectors can easily be given by expanding the concept of semantic similarity. We compared our method with other methods, and the experimental results show that our method achieves higher recognition and is more stable.

9.2 Simulation Research on Fuzzy Computing Model of Music Emotion

9.2.1 Quantitative Research on Musical Emotion

Music emotion is a special kind of psychological ambiguity, which includes emotions, moods, preferences, tastes and attitudes related to musical practice. Usually, various psychological scales are used for subjective evaluation, and adjectives and adverbs are used to indirectly express the relatively short-lived emotional state of individuals under the influence of music [9]. Currently, the most common music retrieval approach based on text description and sentiment analysis still relies on text metadata. This kind of retrieval is usually simple and effective, but with the emergence of electronic music it is not possible to add detailed descriptive information to many music items. Alm used supervised learning to predict the emotion of text, and for music retrieval Liu et al. proposed the detection of music emotional information at ISMIR. Fuzziness is an important feature of psychological phenomena. Psychological fuzziness refers to the intermediate transitions with which the human brain reflects objective differences and to the uncertainty these transitions cause [10]. This psychological ambiguity often fluctuates (individuality), but the fluctuation also shows relative stability (commonality) [11]. It is precisely this relative stability that allows people to communicate with each other and convey common information through vague concepts. Film producers also like to add background music to movies to create resonance and shared emotions between viewers and the characters, thereby creating a "favorable impression" for the viewers. From these examples we can see that music and emotion are inextricably linked: music can affect emotions and also express them. Emotions


Fig. 9.1 The relationship between music and emotion

can affect the perception of music by people listening to songs. The relationship between music and emotion is shown in Fig. 9.1. Emotion is an essential feature of music, and cognition and emotion are closely related to it. Human emotional and volitional activities can form corresponding relationships with the acoustic vibration of music and its non-semantic organizational structure. With the rapid development of Internet information dissemination technology and electronic data storage technology, the number of online music resources is increasing exponentially, which provides music system users with more choices but also leaves many of them unable to choose. In this work, experts were then asked, through online surveys and interviews, to select adjectives that could represent the various subcategories of musical emotion, and an open interface was provided so that additional adjectives could be supplemented.

9.2.2 Construction of a Fuzzy Computing Model for Music Emotion

It can be said that music cannot exist without audio; therefore, many researchers try to understand and analyze the emotion of music through audio processing. Music retrieval includes three main aspects: the organization and management of music data resources, the understanding of music query forms, and the matching and ranking between music data resources and user queries. Music data can be represented in many forms, including scores, audio, lyrics, tunes, categories, comments and so on, and these data constitute the data set for music retrieval. Nowadays, the more popular music retrieval algorithms are all based on these attributes and content information, and they use the intrinsic or external features of music to construct retrieval models; for example, Baidu MP3 and Google Music have developed retrieval systems based on query-by-humming technology. The construction idea of the fuzzy classification model based on the kernel clustering evolutionary algorithm (FCMBEKC) is as follows. First, select appropriate kernel functions to map the initial pattern space into a high-dimensional feature space, making the distribution of patterns in the high-dimensional feature space simpler and more separable. Then, using a kernel clustering evolutionary algorithm, the training patterns of the same class are divided into different hyperspheres in the high-dimensional feature space, with each


hypersphere corresponding to a cluster and regarded as a fuzzy partition. The membership function of each hypersphere is defined using the cluster center of the corresponding cluster as a parameter. Finally, a fuzzy rule is established for each cluster; the specific construction process is shown in Fig. 9.2. This model does not need to define a fuzzy set and its membership function for the language value L; instead, it defines the similarity relation of language values through a fuzzy relation matrix to explain the connotation of the language values. People's cognition of the semantic relationships among language values is formed through acquired learning. Therefore, we cannot take the form of the fuzzy relation matrix for granted; this cognitive relationship should be analyzed through experiments.

Fig. 9.2 Classification model construction flow chart
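The hypersphere membership computation in the classification model above can be sketched as follows. This is a minimal Python illustration, assuming an RBF kernel and a Gaussian-style membership function parameterized by the cluster centre in the kernel-induced feature space; the chapter does not give the exact functional form, so the function names and the specific membership formula are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_distance_sq(x, cluster, gamma=0.5):
    """Squared distance between phi(x) and the cluster mean in the
    kernel-induced feature space, computed with the kernel trick only."""
    k_xx = rbf_kernel(x, x, gamma)
    k_xc = np.mean([rbf_kernel(x, c, gamma) for c in cluster])
    k_cc = np.mean([[rbf_kernel(ci, cj, gamma) for cj in cluster] for ci in cluster])
    return k_xx - 2.0 * k_xc + k_cc

def membership(x, cluster, gamma=0.5, sigma=1.0):
    """Fuzzy membership of x in the hypersphere of one cluster (Gaussian form;
    an assumption -- the chapter only states that the cluster centre is used
    as the parameter of the membership function)."""
    return np.exp(-kernel_distance_sq(x, cluster, gamma) / (2.0 * sigma ** 2))

# toy usage: two clusters of training patterns, one query sample
cluster_a = [np.array([0.1, 0.2]), np.array([0.2, 0.1])]
cluster_b = [np.array([0.9, 0.8]), np.array([1.0, 0.9])]
x = np.array([0.15, 0.18])
print(membership(x, cluster_a), membership(x, cluster_b))
```

Each cluster then contributes one fuzzy rule, with the membership value acting as the rule's firing strength during classification.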


9.3 Simulation of Fuzzy Calculation Model of Music Emotion Based on Improved GA

9.3.1 Improved GA Model

Many twentieth-century musicians abandoned traditional composing habits in some of their works: instead of using all the notes of a specific scale structure as creative material, they limited the material to only a few notes. These notes are combined vertically to form a chord, or horizontally to form a set of melodic motives (namely the "longitudinal chord" and the "lateral motive"), and the whole piece is then built on this basis through transposition, displacement, reflection, variation, development and completion, which differs from the traditional procedures of theme presentation, development and recapitulation. This kind of music is often called "atonal music". Since each model can be regarded as a "cell" or a "small unit", genetic operations can be performed on the original model. Mutation operation: according to the characteristic notes of the original model, the prototype model is mutated in terms of longitudinal chords and lateral motives; the variation is carried out either in a chordal way, with a given probability, or in a non-chordal (that is, non-harmonic) way, with a given probability. According to the definition of musical emotion language values, the definition of the musical emotion vector can easily be given by expanding the concept of semantic similarity.

Definition 2 (music emotion vector). The emotional connotation of a piece of music M with independent emotional semantics is described on the Hevner emotion ring by an eight-dimensional vector e, in which the element value $e_i$ represents the semantic similarity relationship between the music and each sub-emotion language value; the degree of similarity is expressed by a value between 0 and 1, and e is called the music emotion vector:

$E = (r(M, L_{AoM_1}), \ldots, r(M, L_{AoM_i}), \ldots, r(M, L_{AoM_8}))$   (9.1)

Among them, the sub-emotion with the largest value is defined as the dominant emotion $E_{don}$ of the music:

$E_{don} = \max(e_i), \quad i = 1, 2, \ldots, 8$   (9.2)

In the formula, $\max(\cdot)$ selects the $L_{AoM_i}$ corresponding to the maximum value. For example, after a certain reasoning process the emotion of a piece of music M can be expressed as (0.2, 0.6, 0.9, 0.4, 0.3, 0.1, 0.0, 0.1); its dominant emotion is "yearning", and the corresponding emotional semantic similarity value is 0.9. However, due to the particularity of search queries, not all queries have emotional tendencies. For example, music queries such as "Liang Jingru" or "one like summer and one like autumn" do not contain any emotion in the original attribute data of


music. Therefore, in order to better meet the emotional needs of music retrieval systems, this article defines an additional NONE category on top of the six emotion categories defined by WordNet-Affect, indicating that the current text does not contain any emotional tendency. Based on the above definitions, the emotion of a text is defined in the following form:

$E = (e_1, e_2, \ldots, e_n)$   (9.3)

Among them, $e_i$ is an emotional category, and $c^* = \arg\max_c P(c \mid d)$ gives the corresponding emotional feature within this category. Under this definition of emotional dimensions, the text-based emotional semantic space can be expressed in the following form:

$EP = (E_1, E_2, \ldots, E_i, \ldots, E_n)$   (9.4)

where EP represents the whole emotional semantic space, $E_i$ corresponds to the dimension of a certain emotional category, and the value of i ranges from 0 to 7. In this experiment, lyrics attributes and tags are treated as the document d, and the frequency of emotional indicator words is counted as the feature weight. Text information such as lyrics attributes and tags is represented with a vector space model. A document is assigned to the emotional semantic category for which the words in document d have the highest classification probability.
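As a concrete illustration of Eqs. (9.1)–(9.4), the minimal sketch below builds a music emotion vector over the eight Hevner sub-emotions, picks the dominant emotion, and falls back to a NONE category when no element is noticeable. The sub-emotion label names and the threshold are illustrative assumptions, not values taken from the chapter; only the worked example vector comes from the text above.

```python
import numpy as np

# hypothetical labels for the eight Hevner sub-emotion language values
LABELS = ["solemn", "sad", "yearning", "lyrical",
          "calm", "joyous", "exciting", "vigorous"]

def dominant_emotion(e, none_threshold=0.05):
    """Return (label, value) of the dominant emotion E_don = max(e_i),
    or ("NONE", 0.0) when no sub-emotion similarity is noticeable."""
    e = np.asarray(e, dtype=float)
    i = int(np.argmax(e))
    if e[i] < none_threshold:        # no emotional tendency in the query/text
        return "NONE", 0.0
    return LABELS[i], float(e[i])

# the worked example from the text: the third component (0.9) dominates
e = [0.2, 0.6, 0.9, 0.4, 0.3, 0.1, 0.0, 0.1]
print(dominant_emotion(e))           # -> ('yearning', 0.9)
```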

9.3.2 Analysis of Experimental Results

Through acquired learning, people form a cognitive understanding of the semantic relationships between language values. This cognitive relationship can be analyzed experimentally in two ways. One is the direct test, which obtains the semantic similarity relationship of language values by having participants score the semantic similarity of the language values and analyzing the statistics. The other is the indirect test, which asks participants to describe a certain aspect of the music using language values and then derives the semantic similarity relationship by analyzing the similarity of the resulting characteristics. In order to verify the effect of emotion classification for non-descriptive music queries in the high-dimensional emotion space defined in this paper, a comparative experiment is designed from the perspective of text emotion classification. The selection of the feature set in the experiment follows the principle of "from simple to complex": the factors that may affect the emotion recognition of a query are first examined separately and then integrated. Specifically, for a query, the factors that affect its emotional judgment are emotional indicator words, degree modifiers and negation words. Usually, for popular songs, the listener will choose whether to continue listening to


the song according to whether the emotion expressed by the lyrics matches his or her own emotion at that time. If the emotion expressed by the lyrics matches the listener's current emotional appeal, the song is more likely to be collected and recommended by the user; if not, the user is more inclined to listen to other songs and give up the current one. As shown in Table 9.1, in the music information retrieval experiments based on emotional analysis and non-descriptive query processing, the most effective method is the SVM classification method, which is determined by the number of texts available in the machine learning process; owing to the nature of the SVM model, the more training samples there are, the better the classification results. In addition, this experiment uses cross-validation to overcome overfitting and to ensure that the feature representation of the data stays closest to the original spatial representation. Because affective computing requires a deep understanding of the specific content of the context, the absence of contextual information has a great impact on the performance of affective prediction; therefore, the next step for the music retrieval model based on emotional semantic similarity proposed in this paper is how to add contextual information. The other five emotional feature values are obtained by statistical methods, and the change curves of the different features are shown in Figs. 9.3 and 9.4. As can be seen from these figures, the rhythm curve changes very little, because relatively little rhythm information is recorded in the ancient qin score and the rhythm is mainly shaped by the player's breathing during performance, so the rhythm feature has little influence on emotion recognition. The large jump intervals and the sliding tones change greatly and therefore play a large role in emotion recognition. In summary, an increase in the value of rd, that is, a decrease in the threshold on the cluster radius, has no effect on improving the recognition rate. When the number of training patterns in each cluster reaches a certain preset value, fuzzy rules are generated. If the preset value is set too small, many small and scattered clusters frequently generate fuzzy rules; these rules have little effect in actual classification, and their generalization ability is weak. Conversely, if the preset value is increased, the number of generated rules decreases and their generalization ability increases accordingly, but more small hyperspheres (small clusters) will be ignored,

Table 9.1 Comparison of search results for non-descriptive queries

Experiment No.   Method                 MAP      P@5      NDCG@5
No. 1            Keywords matching      0.1315   0.4827   0.3324
No. 2            Naive Bayes            0.3015   0.6342   0.4967
No. 3            SVM                    0.2643   0.6031   0.4976
No. 4            Logistic regression    0.3012   0.6224   0.4930
No. 5            Maximum entropy        0.3022   0.6246   0.5060


Fig. 9.3 Emotional eigenvalues of Qinqu (a)

Fig. 9.4 Emotional eigenvalues of Qinqu (b)

resulting in poor recognition results. For the emotional semantic classification of songs based on lyrics and social tags, this paper describes in detail the acquisition and processing of the lyrics and social tags used in the experiment, as well as the representation model of music data in the high-dimensional emotional semantic space, and then compares and verifies the emotion recognition performance in the various emotional feature spaces with experimental data. Finally, the music retrieval models based on emotional semantic space similarity calculation are described in detail,


and their retrieval performance under different emotional recognition methods is compared.

9.4 Conclusion

Music emotion analysis is a hot topic that is gradually emerging in the field of artificial intelligence; its goal is to enable computers to recognize the emotions of music. In research on music emotion analysis, computer-automated composition, emotion-based music retrieval and other technologies continue to emerge, playing an important role in the development of multimedia technology as well as in the creation and dissemination of music. Currently, a relatively large number of GA composition systems are based on interactive GAs, which rely on human subjective evaluation of how pleasing the generated musical segments are. Although this fully reflects the personal feelings of users, it requires a large number of users to participate in the composition process. The music emotion calculation model constructed in this paper is closely related to research in music psychology and music aesthetics. The semantic similarity experiments and the research on multi-source information fusion of music emotion show that this model conforms to the cognitive laws of human music emotion. Further research will construct a cognitive model of musical emotion and an automatic labeling mechanism for musical emotion, combined with cognitive experiments on musical emotion. The experimental results show that the model and implementation method in this paper achieve a good recognition effect.

References

1. A. Fh, B. Ma, E. Smcd, Continuous emotion recognition during music listening using EEG signals: a fuzzy parallel cascades model. Appl. Soft Comput. 24(7), 25 (2020)
2. D.R. Unune et al., Fuzzy logic-based model for predicting material removal rate and average surface roughness of machined Nimonic 80A using abrasive-mixed electro-discharge diamond surface grinding. Neural Comput. Appl. 29(9), 37 (2018)
3. J. Taverner, E. Vivancos, V. Botti, A fuzzy appraisal model for affective agents adapted to cultural environments using the pleasure and arousal dimensions. Inf. Sci. 54(6), 13 (2020)
4. N. Tuan-Linh, K. Swathi, L. Minho, A fuzzy convolutional neural network for text sentiment analysis. J. Intell. Fuzzy Syst. 10(12), 60 (2018)
5. R. Malheiro, R. Panda, P. Gomes et al., Emotionally-relevant features for classification and regression of music lyrics. IEEE Trans. Affect. Comput. 58(9), 12 (2018)
6. I. Dufour, G. Tzanetakis, Using circular models to improve music emotion recognition. IEEE Trans. Affect. Comput. 44(8), 36 (2018)
7. S. Huang, H. Dang, R. Jiang et al., Multi-layer hybrid fuzzy classification based on SVM and improved PSO for speech emotion recognition. Electronics 22(7), 16 (2021)
8. J. Wang, Music education to rescue psychological stress in social crisis based on fuzzy prediction algorithm. Sci. Program. 55(8), 11 (2021)


9. C. Chen, Q. Li, A multimodal music emotion classification method based on multifeature combined network classifier. Math. Probl. Eng. 30(1), 36 (2020)
10. Z.T. Liu, Q. Xie, M. Wu et al., Speech emotion recognition based on an improved brain emotion learning model. Neurocomputing 70(7), 46 (2018)
11. F. Fiumani, S. Kopenick, The importance of the emotional development in the music therapy with individuals with intellectual disability. J. Intellect. Disabil. Res.: JIDR 21(8), 65 (2021)

Chapter 10

Design and Implementation of Piano Performance Automatic Evaluation System Based on Support Vector Machine

Mingzhu Zhang and Chun Liu

Abstract Computer automatic evaluation and analysis of piano performance is of great theoretical significance and practical value for the online teaching, retrieval, classification, distribution and research of piano music works. In this article, the SVM (support vector machine) algorithm is applied to automatic piano performance evaluation, and an SVM-based automatic piano performance evaluation system is designed and implemented. Firstly, starting from the basic stage of piano performance evaluation, the analysis and extraction of piano performance characteristics are studied. Then, through the computer's automatic analysis and processing of audio signals, the automatic evaluation of piano music is realized. Finally, the model constructed in this article is simulated on two piano music data sets with different characteristics. The simulation results show that the SVM-based automatic piano performance evaluation system has high stability, maintaining more than 90% stability and high accuracy even with larger transaction sets. The automatic evaluation system achieves the expected effect and provides a new method for the automatic evaluation of piano performance.

Keywords Support vector machine · Piano performance · Evaluation

M. Zhang · C. Liu (B) College of Music and Dance, Huaihua University, Huaihua 418008, Hunan, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_10

10.1 Introduction

Due to the growth of the Internet, the main media of pop music have gradually changed from traditional radio stations and records to network downloads and online radio stations [1]. At the same time, influenced by the Internet, music education has gradually entered a new field of development. Internet-based music education has broken through the limitations of traditional education. By giving full play to the role of


the network, students can learn knowledge anytime and anywhere, and the rich learning resources mobilize students' enthusiasm and interest in learning. Computer music systems have great advantages in assisting students' piano performance training. Among them, the automatic assessment of piano performance is a very challenging but promising task in today's music field [2]. Computer automatic assessment and analysis of piano performance is of great theoretical significance and practical value for the online teaching, retrieval, classification, distribution and research of piano music works [3]. The traditional piano performance assessment method is mostly realized by expert scoring. Because the assessment of a performance is influenced by each expert's subjective judgment and personal thinking and feelings, the score given is based on impressions [4]. Although scoring yields a specific number, it cannot explain the exact meaning of that number, so a score that looks very precise cannot accurately describe the specific level of the performance or the degree of excellence of the performer. There are many factors that affect the piano playing effect, and the degree of influence of each factor is also different [5]. An automatic piano performance assessment system with stable, reliable functions and good interactivity can provide users with more accurate assessment results, which can greatly mobilize people's autonomy and enthusiasm in learning the piano [6]. Therefore, research on automatic piano performance assessment systems has practical significance. The SVM algorithm has been widely used in texture classification, text classification, face recognition, speech recognition and other fields [7]. Theory and practice have proved that the SVM algorithm is robust to noise and outliers and has strong generalization ability; after extension, it can solve multi-classification problems. Piano music consists of 88 notes, and its range is the widest and most complete, covering the range of all current instrumental and vocal music. Whenever the same piano key is pressed, by whomever and at whatever time, the sounds produced differ somewhat yet have a high degree of acoustic similarity; moreover, the frequencies of music signals are richer, and their frequency band coverage is much wider than that of voice signals [8]. In data processing, speech recognition is completed by establishing a corpus, which involves a large amount of data, while the single-tone processing in this article aims at identifying the 88 notes of the whole piano range. Music signal processing is an important part of the signal processing field [9]. As a special quasi-periodic signal, a music signal is richer in timbre content, more complex in frequency composition, wider in spectrum range and more obvious in time-domain rhythm characteristics than a speech signal [10]. By analyzing the physical characteristics and piano-specific characteristics of piano music signals, this article applies the SVM algorithm to automatic piano performance assessment, and designs and implements an SVM-based automatic piano performance assessment system by using its solid theoretical foundation and good classification performance. Experiments show that the automatic assessment system achieves the expected goal.


10.2 Methodology

10.2.1 Related Technical Basis

SVM is a support-vector method for solving pattern recognition problems. The method originally comes from the binary classification problem, and its mechanism can be described simply as finding a hyperplane in the sample space that separates the positive and negative samples in the training set while maximizing the margin on both sides [11]. The kernel function is the key to SVM classification: it transforms a problem that is linearly inseparable in low dimensions into a high-dimensional space where it becomes linearly separable, while avoiding excessive computation caused by the increase in dimensionality. Because SVM can solve the basic problem of musical sound recognition well, this article applies the SVM algorithm to the automatic assessment of piano performance. Music and speech are both sound signals, and the basic principles of their recognition are similar: both analyze the sound signal and apply noise processing, feature analysis, recognition and other processing steps. The piano is the most commonly used musical instrument and is more adaptable and representative than other instruments; its bass is rich, its midrange natural and smooth, and its treble bright and brilliant. The development of the piano and its music is inseparable from twelve-tone equal temperament [12]. Piano music consists of 88 notes, and its range is the widest and most complete, covering the range of all current instrumental and vocal music. As a sound signal, the single-tone signal of the piano follows the basic laws of acoustics. The characteristic parameters of music signals can be classified in different ways; from the point of view of signal processing, they can be divided into time-domain parameters and frequency-domain parameters, which represent the characteristics in the time domain and the frequency domain, respectively [13]. Speech signals and music signals show different characteristics in the time and frequency domains. When a signal is decomposed by wavelets, the decomposition scale must be chosen so that, on the premise of filtering out the noise, the frequency characteristics of the effective signal are preserved to the greatest extent. In this article, the sampling frequency of the signal is 22,050 Hz, the piano keys cover the whole range from A2 to C5, and the frequency range of the signal is 27.50–4185.00 Hz; therefore, the maximum frequency of the components under the chosen wavelet decomposition scale should cover 4185.00 Hz. For piano music, faster allegro pieces run at roughly 240 beats per minute. Even assuming a pianist of extreme ability who plays thirty-second notes within each beat at such a speed, and counting only changes between adjacent different notes, 960 notes can be played per minute, so a single note occupies about 0.0625 s. It follows that the short-term stationarity assumption of short-time Fourier analysis holds for piano music. In this article, the SVM algorithm is applied to the automatic assessment of piano performance, and two features, ZCPA and MFCC (Mel-frequency cepstral coefficients), are introduced to design and implement an SVM-based automatic piano performance assessment system. The audio preprocessing module of the system preprocesses


the audio before piano performance assessment, that is, file format conversion and music recognition. In the system, the performance assessment module is the main carrier to realize the assessment function, which mainly takes the server system algorithm as the carrier. The core algorithm of assessment module belongs to multipitch detection algorithm. This module mainly includes the following four parts: performance positioning, multi-pitch detection, note level judgment and assessment result generation.
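The chapter does not specify its feature-extraction toolchain (the later experiments use MATLAB), so the following Python sketch with librosa is only an illustrative equivalent of how the MFCC features described above might be obtained; the synthetic tone stands in for a recorded piano note, and the frame sizes are chosen to respect the ~0.0625 s single-note duration discussed in the text.

```python
import numpy as np
import librosa

sr = 22050                               # sampling rate used in the chapter
t = np.arange(0, 1.0, 1 / sr)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # stand-in for a recorded piano tone (A4)

# Keep each analysis window shorter than the ~0.0625 s single-note duration
# (1024 samples is roughly 46 ms at 22,050 Hz).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=1024, hop_length=512)

# A simple fixed-length descriptor per recording: mean and standard deviation
# of each coefficient over time, usable directly as an SVM input vector.
feature_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(feature_vector.shape)              # (26,)
```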

10.2.2 Design and Implementation of Automatic Assessment System for Piano Performance

Traditional pattern recognition technology aims at minimizing the training error: it only considers how well the classifier fits the training samples and tries to improve the recognition rate on new test samples by providing enough training samples. However, for a finite training set there is no guarantee that a classifier that is effective on the training samples can also classify test samples effectively, and with a small sample set, blindly pursuing classification accuracy on the training set leads to overfitting. Theory and practice have proved that SVM is robust to noise and outliers and has strong generalization ability; after extension, it can solve multi-classification problems. This article therefore designs a new automatic piano performance assessment system based on the SVM algorithm. Because recorded data are usually mixed with white noise, this article removes the noise by wavelet transform. The E-R diagram of the database of the SVM-based automatic piano performance assessment system is shown in Fig. 10.1. The traditional SVM algorithm cannot treat categories differently according to features, so this article uses metric learning to learn a projection matrix from training scores under supervision. The contribution of each feature to the automatic assessment of piano performance is fully considered, and different

Fig. 10.1 E-R diagram of database of piano performance automatic assessment system


weights are assigned to the features; a new distance measure $D_M$ is then used to improve the original Gaussian radial basis function. The improved Gaussian radial basis kernel function is:

$k_M(x_i, x_j) = \exp\left(-\dfrac{1}{2\sigma^2} D_M(x_i, x_j)\right)$   (10.1)

where $\sigma$ is the parameter of the Gaussian radial basis function, which can be obtained by a grid search algorithm; $x_i, x_j$ are feature vectors; $\exp$ is the exponential function with base $e$; and $D_M$ is the new distance measure:

$D_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$   (10.2)
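A minimal sketch of Eqs. (10.1)–(10.2) is given below, assuming the learned matrix M is diagonal (i.e. a per-feature weight vector); the chapter does not state the exact form of M, so this is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def weighted_rbf_kernel(X, Y, weights, sigma=1.0):
    """Improved RBF kernel k_M(x_i, x_j) = exp(-D_M(x_i, x_j) / (2 * sigma^2))
    with D_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j) and M = diag(weights)."""
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i, xi in enumerate(X):
        for j, yj in enumerate(Y):
            d = xi - yj
            D_M = float(d @ (weights * d))          # diagonal M
            K[i, j] = np.exp(-D_M / (2.0 * sigma ** 2))
    return K

# toy usage: 3 samples with 4 features, the third feature weighted most heavily
X = np.random.rand(3, 4)
w = np.array([0.5, 0.5, 2.0, 1.0])
print(weighted_rbf_kernel(X, X, w, sigma=0.8))
```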

The key problem of the automatic piano performance assessment system is to select appropriate characteristic parameters, which improves both the assessment accuracy and the speed. The pitch frequency of a musical note determines its essential characteristics; however, when two notes of the same pitch appear consecutively, the pitch frequency can only reflect one of them. Considering this situation, this article introduces two features: ZCPA and MFCC. MFCC features are usually effective for speech recognition and musical instrument category recognition; here they are used as the identification features of piano single tones. Because MFCCs reflect the distribution of audio signal energy over different frequency bands, and the energy of different piano single tones is concentrated in fixed frequency bands, MFCC can effectively describe the acoustic characteristics of piano single tones. Note-level judgment determines the corresponding position in the standard audio from the information obtained in performance positioning, obtains the notes to be played from the standard performance information, and compares them with the pitch information of each frame from multi-pitch detection, so as to obtain the accuracy of the notes played in the current frame. In practice, when the samples in the training set are linearly inseparable, any classification hyperplane will misclassify some samples, so not all training points can satisfy the constraint:

$y_i(w^T x_i + b) \geq 1$   (10.3)

In order to solve this problem, this article relaxes the constraint conditions when the training samples are linearly inseparable, and introduces a slack variable $\xi_i \geq 0$ for the $i$-th training sample $(x_i, y_i)$. The relaxed constraint is:

$y_i(w^T x_i + b) + \xi_i \geq 1$   (10.4)


Such a vector $\xi = (\xi_1, \xi_2, \xi_3, \ldots, \xi_p)^T$ reflects the extent to which the training set is allowed to be crossed. In this article, the linearly inseparable problem is mapped into a high-dimensional space in which it becomes linearly separable; the linear decision only requires inner-product operations in the high-dimensional space, and the explicit form of the nonlinear transformation does not even need to be known, so the cost of high-dimensional computation is avoided and the problem is greatly simplified. Through a detailed analysis of the automatic piano performance assessment task, this system focuses on automatic assessment: the user completes the basic settings and inputs the piano music, and the system automatically analyzes the music files and finally completes the automatic assessment of the piano performance.
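To show how such a kernel could be plugged into an off-the-shelf soft-margin SVM, the sketch below trains a multi-class scikit-learn SVC with a callable kernel on synthetic feature vectors; the data, the diagonal feature weights and the labels are placeholders, not the chapter's experimental setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# placeholder training data: 60 feature vectors (e.g. MFCC statistics),
# 3 note/assessment classes
X_train = rng.normal(size=(60, 26))
y_train = rng.integers(0, 3, size=60)

weights = np.ones(26)      # diagonal M; learned by metric learning in the chapter
sigma = 1.0

def kernel(A, B):
    """Callable kernel for SVC: returns the Gram matrix of the weighted RBF."""
    diff = A[:, None, :] - B[None, :, :]
    D_M = np.einsum("ijk,k,ijk->ij", diff, weights, diff)
    return np.exp(-D_M / (2.0 * sigma ** 2))

clf = SVC(kernel=kernel)   # SVC handles multi-class classification internally
clf.fit(X_train, y_train)
print(clf.predict(X_train[:5]))
```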

10.3 Result Analysis and Discussion

Because the automatic piano performance assessment system is mainly aimed at piano learners and its function is to help them practice, the tests should focus on the effectiveness and practicality of the system and on how the automatic assessment algorithm behaves on piano performance data. In this section, a simulation experiment of the SVM-based automatic piano performance assessment system is carried out. All algorithms are implemented in MATLAB; the computer environment is a 32-bit Windows operating system with an Intel i5-4200M processor and 4 GB of memory. The data set used in this article is mainly divided into two parts. The first part is the data set used to separate mixed audio of voice and musical accompaniment. After filtering out the invalid information in the data set, this article uses random filling to complete some of the missing records, so as to ensure the integrity, cleanliness and effectiveness of the data set and to prepare for the subsequent analysis of user behavior characteristics. The system first extracts the piano music features one by one, and this module is the core part of the whole system. Figure 10.2 shows the training situation of the algorithm. The automatic assessment system detects multiple pitches through the multi-pitch detection algorithm, and the detection result is the set of notes played in each frame. The detection result of each frame is saved in the detection result database so that it can be used for subsequent note-level judgment. The analysis and extraction of piano music features plays a vital role in automatic assessment, and the soundness of the chosen features, the feasibility of the extraction method and the accuracy of the extracted results directly affect automatic identification. In this article, the traditional SVM algorithm is improved to better analyze and extract the characteristics of piano music, so as to improve the accuracy of automatic assessment. The comparison of accuracy before and after the improvement of the algorithm is shown in Fig. 10.3.


Fig. 10.2 Algorithm training situation

Fig. 10.3 Comparison of accuracy before and after improvement of the algorithm

It can be seen that the accuracy of the improved SVM algorithm is higher; the classification accuracy is improved by about 9%. This shows that the improvement in this article effectively raises the classification performance of the Gaussian-radial-basis-function-based SVM algorithm in the automatic assessment of piano performance.


Fig. 10.4 Algorithm efficiency comparison

In this section, using the single-note denoising method, Daubechies-4 (db4) wavelet filtering is applied to the continuous music signal in MATLAB. Wavelet denoising removes some of the high-frequency content of the signal and emphasizes the useful low-frequency information, and the filtering effect is good. In order to further verify the effectiveness of the proposed algorithm, this section compares the classification accuracy of the LR algorithm, the L-SVM algorithm and the improved SVM algorithm. Figure 10.4 shows the efficiency curves of the LR algorithm, the L-SVM algorithm and the improved SVM algorithm in this article. Figure 10.5 shows the stability of the automatic piano performance assessment system.
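The db4 denoising step described above is sketched below in Python with PyWavelets, as an illustrative equivalent of the MATLAB processing; the soft-thresholding rule and the universal threshold are common defaults, not values taken from the chapter.

```python
import numpy as np
import pywt

def db4_denoise(signal, level=4):
    """Daubechies-4 wavelet denoising: decompose, soft-threshold the detail
    coefficients, and reconstruct. The universal threshold used here is a
    common default, not a value specified in the chapter."""
    coeffs = pywt.wavedec(signal, "db4", level=level)
    # noise level estimated from the finest detail coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, "db4")[: len(signal)]

# toy usage: a 440 Hz tone sampled at 22,050 Hz with additive white noise
fs = 22050
t = np.arange(0, 0.5, 1 / fs)
noisy = np.sin(2 * np.pi * 440 * t) + 0.2 * np.random.randn(len(t))
clean = db4_denoise(noisy)
```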


Fig. 10.5 Stability comparison of automatic piano performance assessment system

10.4 Conclusions

In manual assessment, the examiners are greatly influenced by subjective factors, and during scoring the assessment indicators are easily given different weights for different players, which makes the results unstable. Therefore, this article designs an automatic piano performance evaluation system based on the SVM algorithm. Because the SVM algorithm can model nonlinear systems well, the experimental results show that the model is basically consistent with the actual situation. The back end of the automatic piano performance evaluation system maintains the database of users and system resources, and the server carries the piano performance evaluation algorithm; the assessment results are fed back to users in the form of pictures. Finally, the evaluation system is verified through implementation and testing. The simulation results show that the improved SVM algorithm has higher accuracy and efficiency. At the same time, the SVM-based automatic piano performance evaluation system has high stability and can still reach more than 90% stability with larger transaction sets. The accuracy is also high: compared with the algorithm before improvement, the classification accuracy is improved by about 9%. The automatic evaluation system has achieved the expected effect and provides a new method for the automatic evaluation of piano performance. In the future, semi-supervised algorithms can be considered to make full use of the large quantity of easily obtained unlabeled data, which is expected to greatly improve the training of the classifier and thus the evaluation accuracy of the algorithm.


References

1. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Process. 11(7), 884–891 (2017)
2. J.A. Bugos, Intense piano training on self-efficacy and physiological stress in aging. Psychol. Music 44(4), 1–14 (2016)
3. Z. Li, On the style of the romantic period's piano works. North. Music 39(10), 2 (2019)
4. X. Wang, Research on the improved method of fundamental frequency extraction for music automatic recognition of piano music. J. Intell. Fuzzy Syst. 35(3), 1–7 (2018)
5. C. Xi, S. Qin, S. Lima et al., The design and construction based on the ASEAN piano music library and display platform. J. Intell. Fuzzy Syst. 35(3), 1–6 (2018)
6. Z. Zhang, On the style and characteristics of piano works in the middle of romanticism. Art Technol. 30(5), 2 (2017)
7. M. Liu, J. Huang, Piano playing teaching system based on artificial intelligence—design and research. J. Intell. Fuzzy Syst. 40(1), 1–9 (2020)
8. T.A. Langlois, K.B. Schloss, S.E. Palmer, Music-to-color associations of single-line piano melodies in non-synesthetes. Multisens. Res. 29(1–3), 157 (2016)
9. Y. Zhu, The importance of standardizing the piano playing skills training in the piano collective class teaching mode. Music Space-Time 2015(21), 1 (2015)
10. X. Wang, Playing styles of piano works in different periods. Art Rev. 6, 4 (2018)
11. W. David, D. Nick, H. Stefan, Zero waste: mapping the evolution of the iterative sight-reading of a piano score. Music Theory Spectr. 2, 302–313 (2018)
12. M. Mueller, A. Arzt, S. Balke et al., Cross-modal music retrieval and applications: an overview of key methodologies. IEEE Signal Process. Mag. 36(1), 52–62 (2018)
13. M. Noorduin, Re-examining Czerny's and Moscheles's metronome marks for Beethoven's piano sonatas. Ninet.-Century Music Rev. 15(2), 1–27 (2017)

Chapter 11

Simulation of Music Personalized Recommendation Model Based on Collaborative Filtering

Miao Zhong and Chaozhi Cheng

Abstract A large amount of user data is transmitted through the network, resulting in rapid expansion of data information. This paper constructs a personalized music recommendation model based on a collaborative filtering algorithm, which combines user profiles, user ratings and music tags on the basis of traditional collaborative filtering. The user-based collaborative filtering algorithm is used to calculate the similarity between users, and music tags are used to estimate unknown ratings. The experimental results indicate that the method proposed in this paper is more feasible and accurate than other methods.

Keywords Collaborative filtering algorithm · Music personalized recommendation · Model simulation

11.1 Introduction

With the rapid development of information technology and the widespread popularity of the Internet, the number of users accessing the network is increasing day by day, and the extensive participation of people has led to an explosion of network information. It is difficult for people to obtain their favorite products and information from this massive amount of data, and doing so is increasingly time-consuming and laborious [1]. In the field of e-commerce, users are often limited by their potential needs: the information they really need is often exactly what cannot be expressed or searched precisely with keywords, yet it must still be extracted from the vast amount of information [2]. How to quickly and accurately select the part we are interested in from the mass of information has become the focus of attention in the online world. This requires an accurate understanding of users' daily

M. Zhong · C. Cheng (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_11


access habits and interests on the Internet. Based on users' own behavioral characteristics, personalized recommendation services can be implemented for specific users [3]. At present, in order to solve the problems of "information explosion" and "information overload", we want to improve users' utilization of information, so that useful information is not overwhelmed by information that is irrelevant or uninteresting to users, so that users do not have to waste a lot of time screening for valuable information, and so that information system managers do not waste costs and resources managing redundant information. One solution is the traditional information retrieval system represented by search engines such as Google and Baidu [4]. As far as music is concerned, there is also a problem of information overload. A large number of new songs appear on the Internet every day, but most people only encounter a small part of them, so they miss many good songs, and the missed music is simply forgotten. The most typical solutions are retrieval and recommendation: every music forum has a search function as well as a recommendation catalogue, in which songs of various styles are classified and recommended to users [5]. The search function quickly locates the specified resources, but it lacks intelligence and requires users to take the initiative [6]. The recommendation method based on classified catalogues promotes a certain type of music that users like, but each user's tastes are not single, which shows that this method lacks flexibility. A large amount of redundant information in the network makes it harder for network users to obtain valuable information directly and conveniently, while people want to obtain valuable information directly from a large amount of information. How to obtain information accurately and quickly from complicated and lengthy data has become a key and difficult point of general concern, and how to obtain and properly use data has also become a hot direction [7]. The application of recommendation methods has increased merchants' sales volume, and the correct use of recommendation algorithms brings commercial value. Although recommendation algorithms have developed well in practical applications, with the further development of websites they still face challenges, because nothing is perfect [8]. The same applies to music recommendation algorithms. On this basis, this paper proposes corresponding improvements and innovative explorations to address the problems faced by existing CF algorithms in music recommendation. Firstly, a personalized recommendation method based on CF is studied to effectively alleviate problems such as the cold start of users and items and the sparsity of data. On this basis, a recommendation algorithm based on CF and a playback factor is proposed to address the sparsity of users and data in the grey-sheep group.


11.2 Simulation of Music Personalized Recommendation Model

11.2.1 Basic Theory of Music Recommendation System

With the rapid development of the Internet and the gradual maturity of information technology, the cost of establishing Internet sites has become lower and lower, and the number of websites and pages in the world has increased exponentially. When traditional information search with search engines is used, the results on a given topic often number in the tens of thousands, of which more than 99% are irrelevant, uninteresting or low-quality [9]. At present, almost all music platforms provide personalized music recommendation services, for example Spotify, Pandora, Douban Music Radio and NetEase Cloud Music. Currently, most music recommendation systems explore users' long-term preferences based on their historical behavior [10]. This article first introduces the basic concepts of music recommendation systems and then, on this basis, discusses development based on the B/S and C/S architectures. Each development mode has its own advantages and disadvantages; therefore, the selection must start from the actual needs of the project. A traditional recommendation system can be regarded as the process of filtering and recommending, based on a recommendation model, by analyzing user historical behavior. As shown in Fig. 11.1, a complete recommendation model can be divided into three modules: the input module, the recommendation algorithm module and the output module. The recommendation module applies the corresponding recommendation algorithm to the data fed into it to obtain a user similarity list and an item similarity list; unknown ratings can also be calculated at this stage to reduce the adverse effects caused by sparse ratings. The quality of the recommendation algorithm directly affects the quality of the output module, so the recommendation module plays a central role in the whole recommendation system. The output module is the last module of the system and also plays a very important role; its output is usually a rating prediction matrix or an item recommendation list. The most critical step is how to present the results produced by the recommendation module to users in an appropriate form. In practical recommendation systems, such as those of Taobao, JD.COM and other domestic e-commerce giants, special attention is paid to the quality of the output module, because it directly affects the interests of enterprises and the experience of users.


Fig. 11.1 Overall steps of recommendation system

11.2.2 Personalized Recommendation System and Related Technologies

A recommendation system quickly helps users find the items they are interested in: it takes relevant data as input and, after algorithmic calculation, outputs item information that users may be interested in. For users, finding their favorite information within such a large amount of information is difficult; for the provider of network service content, presenting that content to users in a way they like is also a relatively difficult task. In order to fill this gap, many companies have begun to develop recommendation systems, which combine users' preferences to search, within network services, for information that meets those preferences and present it to users. A typical content-based recommendation algorithm is used by the Pandora music platform, which collects and preprocesses music-related attributes, models them according to these attributes, and then recommends music with similar attributes to the target user according to the user's historical preferences. After entering the twenty-first century, Internet technology made great progress. Because of the limited cross-platform capability and extensibility of the C/S mode, developers created the B/S mode on its basis. The advantage of the B/S mode is that different users can use it across platforms: they only need to access the same server through a browser, and the client only needs a browser installed to complete the access. The database returns the query results to the server in a certain format, the server converts the returned results according to the requirements of the HTTP protocol, and finally an HTML document is formed and returned to the client. The browser follows certain rules to convert the hypertext into a Web page that users can read directly, as shown in Fig. 11.2.


Fig. 11.2 B/S mode structure block diagram

the same server through a browser, and the client only needs to install a browser to complete the access. Then the database returns the query results to the Server according to a certain format, and the Server converts the results returned by the database according to the requirements of HTTP protocol again, and finally forms an HTML document and returns it to the Client. The browser follows certain rules to convert hypertext into a Web page that users can directly read. As shown in Fig. 11.2. There are two ways to collect data information: explicit collection and implicit collection. Explicit collection refers to recording users’ specific behaviors, such as listening to a certain music and grading the music. Explicit collection has such advantages. Compared with other data, this part of the data is deterministic, but the disadvantages are also obvious. Information like this data is very limited. Data preprocessing mainly deals with two aspects: first, reducing noise data, which will be mixed in the collected data due to external factors, and because these data will affect the recommendation quality of the system to a certain extent, we should try to filter out invalid data; secondly, when there is less data, the sparse matrix needs to be processed. Facing the problem of sparse data of user-project scoring matrix, we can use the method of filling in values to deal with it. Filled values can be median score, mode, etc.

11.3 Simulation of Music Personalized Recommendation Model Based on CF

11.3.1 Collaborative Filtering Algorithm

User-based CF recommends items to a target user based on the ratings given to items by the group of users most similar to that user, that is, users whose interest preferences are close to the target user's. In user-based CF, mathematical and statistical methods are usually used to find the nearest neighbor user group


that is close to the behavior habits and consumption interests of the target user. Due to the rapid growth of music data on the network, providing users with high-quality personalized recommendation services has become a huge challenge for music platforms. Researchers have proposed relevant recommendation algorithms that effectively mine users' music preferences, but these algorithms do not handle problems such as new users, new music and sparse ratings well. CF technology divides users into different groups according to their interests; because the users within a group have similar preferences, information can be provided to a user based on the preferences of the other users in the group. This project studies CF-based music recommendation methods for new users, new items, sparse ratings and related issues. The algorithm combines user profiles, user ratings and music tags on the basis of traditional CF: user-based CF is used to calculate the similarity between users, and music tags are used to estimate the unknown ratings. For different systems, the order of the three steps mentioned above may differ. The essence of a CF recommendation system is to find the "nearest neighbor set" of the target user, and the final result is a prediction list. Let the nearest neighbor set of user u be denoted by U; then the predicted rating $P_{u,i}$ of user u for item i can be obtained from the neighbors' ratings of the item, and the calculation formula is as follows:

$P_{u,i} = \bar{R}_u + \dfrac{\sum_{n \in U} sim(u, n) \cdot (R_{n,i} - \bar{R}_n)}{\sum_{n \in U} |sim(u, n)|}$   (11.1)

$sim(u, n)$ denotes the similarity between user u and user n, and $R_{n,i}$ denotes the rating of item i by user n. $\bar{R}_u$ and $\bar{R}_n$ denote the average ratings of users u and n over the items they have rated, respectively. After obtaining the nearest neighbor set of the target user, the ratings of these neighbors are combined to generate the predicted rating for the active user. After calculating the similarity between the active user and the other users, the predicted rating of active user a for item j is the weighted sum of the other users' ratings of item j, as shown in formula (11.2):

$P_{a,j} = \bar{R}_a + x \sum_{u=1}^{Q} sim(a, u) (R_{u,j} - \bar{R}_u)$   (11.2)

$sim(a, u)$ is the similarity between target user a and neighbor user u, $\bar{R}_u$ is the average rating of neighbor user u, and x is a normalization coefficient. All unrated items are predicted, and the items with the highest predicted values are fed back to the current user as recommendation results. Compared with the traditional similarity between two users, which is estimated from the users' common ratings, the correlation similarity is based on the two users' joint evaluation of the music items. The calculation method is shown in formula (11.3).


$sim(U_i, U_j) = \dfrac{\sum_{c \in I_{U_i} \cap I_{U_j}} (R_{U_i,c} - \bar{R}_{U_i})(R_{U_j,c} - \bar{R}_{U_j})}{\sqrt{\sum_{c \in I_{U_i} \cap I_{U_j}} (R_{U_i,c} - \bar{R}_{U_i})^2} \sqrt{\sum_{c \in I_{U_i} \cap I_{U_j}} (R_{U_j,c} - \bar{R}_{U_j})^2}}$   (11.3)

The meanings of the symbols in the formula are the same as for the modified cosine. Retain the k largest singular values to form a new diagonal matrix $S_k$, and transform T and D in the same way to obtain $T_k$ and $D_k$; the newly generated matrix is then $R_k = T_k \times S_k \times D_k^T$, with $R_k \approx R$. Singular value decomposition (SVD) generates the matrix most similar to R among all matrices whose rank equals k. User a's predicted score for item i is:

$P_{a,i} = \bar{R}_a + T_k \sqrt{S_k}^T(a) \times \sqrt{S_k} D_k^T(i)$   (11.4)

where \bar{R}_a is the average score user a has given to the items he has rated, and k is the number of dimensions retained after singular value decomposition, which should be determined in advance through experiments. The value of k affects the quality of the score prediction: if k is too small, important structure in the original rating matrix is lost, while if k is too large, the dimensionality reduction loses its purpose.
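As an illustration of formula (11.4), the sketch below performs the truncated-SVD prediction with NumPy, assuming the rating matrix is centered by the user mean before decomposition; splitting S_k symmetrically between the user and item factors is one common reading of the formula, not necessarily the chapter's exact implementation.

```python
import numpy as np

def svd_predict(R, k):
    """Truncated-SVD reconstruction of the rating matrix (cf. formula 11.4).

    R is a dense user-item matrix (missing entries already filled, e.g. by the
    user mean); k is the number of retained singular values."""
    user_mean = R.mean(axis=1, keepdims=True)
    T, s, Dt = np.linalg.svd(R - user_mean, full_matrices=False)
    Tk, Sk, Dkt = T[:, :k], np.diag(s[:k]), Dt[:k, :]
    left = Tk @ np.sqrt(Sk)          # user factors  T_k * sqrt(S_k)
    right = np.sqrt(Sk) @ Dkt        # item factors  sqrt(S_k) * D_k^T
    return user_mean + left @ right  # predicted score for every (user, item) pair

R = np.array([[5., 3., 4., 1.],
              [4., 3., 4., 1.],
              [1., 1., 2., 5.],
              [1., 2., 1., 4.]])
print(np.round(svd_predict(R, k=2), 2))
```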

11.3.2 Analysis of Experimental Results

The experimental data comes from a music website and includes users' audition records and actual download records of music works. On this basis, this paper adds further information, including the singer, category and lyrics of the requested works. Only users with more than five audition records are kept in the experiment. The data covers 53,642 users and 365,874 music works, with 906,123 audition records and 210,254 download records. The experiments mainly use the download and audition records, with 80% of the data as the training set and 20% as the test set; the simulation tool is MATLAB. The CF experiments cover both user-based and music-work-based filtering. In the CF model, we must first determine how the threshold on user similarity affects accuracy and how performance is distributed. The prototype recommendation system packaged with the recommendation engine serves a large number of users: the more users, the more successful the product, but this also places higher demands on system performance. The purpose of this experiment is therefore to measure how much concurrency the system can bear and how many users it can serve. The number of users varies from 25 to 400 and is tested in three cases: starting 1 user every 2 s, starting 1 user every second, and starting 2 users every second. The test results are shown in Table 11.1.


Table 11.1 System pressure test

Threads | Average time at 0.5 users/s (ms) | Average time at 1 user/s (ms) | Average time at 2 users/s (ms)
25      | 1362                             | 3325                          | 12,335
50      | 1663                             | 2252                          | 22,463
100     | 1428                             | 2637                          | 52,416
120     | 1524                             | 2415                          | 205,669
400     | 1752                             | 2751                          | 203,654

The test results show that only the case of starting two users per second shows the average time consumption growing with the number of users; in the other two cases the average time consumption changes little. In user-based CF, the choice of similarity measure determines whether the recommendation results are satisfactory. For this reason, this paper runs user-based CF on the network music data and tests three different similarity formulas under different numbers of user neighbors, obtaining the MAE values shown in Fig. 11.3. As can be seen from the figure, among the three similarity methods the Pearson correlation coefficient performs better than the other two and effectively reduces the MAE (mean absolute error). At the same time, we find that the optimal neighbor number n of user-based CF on this data set is around 30, as shown in Fig. 11.4.

Fig. 11.3 Comparison chart of different similarity calculation accuracy

Fig. 11.4 Comparison diagram of MAE simulation of average absolute error of three algorithms

From Fig. 11.4, the MAE first falls as the number of nearest neighbors n increases, and once n exceeds about 30 the MAE of the algorithm tends to a stable state. Viewed vertically, the optimized and improved algorithm in this paper achieves a relatively low mean absolute error, which means that its recommendation results are relatively accurate. It can therefore be seen that the optimized and improved algorithm alleviates the sparsity and cold-start problems of the recommendation algorithm to a certain extent, and the optimization of the algorithm is feasible.
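For reference, the MAE reported in Figs. 11.3 and 11.4 can be computed over the 20% test split as in the short sketch below; the example numbers are illustrative only.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE over the held-out (user, item) test ratings."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# e.g. actual test-set ratings vs. scores predicted by the CF model
print(mean_absolute_error([4, 2, 5, 3], [3.6, 2.4, 4.2, 3.1]))
```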

11.4 Conclusion

Recommendation systems are currently a research hotspot in the fields of machine learning and data mining, and over the past few decades personalized recommendation technology has made significant progress through the collaborative efforts of numerous researchers. The approach studied here searches for groups similar to a user or an item and then generates recommendation lists from the historical behavior records of these groups, allowing users to discover music they have not been exposed to before and making it convenient for them to choose and enjoy their favorite music. The improved algorithm in this paper achieves a relatively low mean absolute error, which means that its recommendation results are relatively accurate. From this, it can be seen that the optimized algorithm alleviates the sparsity and cold-start issues of the recommendation algorithm to a certain extent, and the optimization of the algorithm is feasible.


References

1. W.U. Yun, J. Lin, M.A. Yanlong, A hybrid music recommendation model based on personalized measurement and game theory. Chin. J. Electron. 32(1), 10 (2022)
2. Y. Huo, Music personalized label clustering and recommendation visualization. Complexity 44(7), 24 (2021)
3. J. Liu, Evaluation and numerical simulation of music education informationization based on the local linear regression method. Complexity 55(9), 13 (2021)
4. H. Ning, Q. Li, Personalized music recommendation simulation based on improved CF. Complexity 69(8), 30 (2020)
5. H. Gao, Automatic recommendation of online music tracks based on deep learning. Math. Probl. Eng. 4(7), 23 (2022)
6. L. Zhang, Z. Tian, Research on the recommendation of aerobics music adaptation based on computer aided design software. J. Intell. Fuzzy Syst. 12(1), 2 (2021)
7. L. Wei, Design of a music recommendation model on the basis of multilayer attention representation. Sci. Program. 8(1), 58 (2022)
8. J. Liu, Evaluation and numerical simulation of music education informationization based on the local linear regression method. Complexity 55(7), 58 (2021)
9. K. Han, Personalized news recommendation and simulation based on improved CF. Complexity 44(7), 46 (2020)
10. Z. Tong, Research on the model of music sight-singing guidance system based on artificial intelligence. Complexity 22(1), 36 (2021)

Chapter 12

Design and Optimization of Image Recognition and Classification Algorithm Based on Machine Learning

Zeng Dan and Chen Yi

Abstract In recent years, traditional keyword-based image retrieval and classification have been unable to meet people's daily needs, and image recognition methods based on image content have emerged. This article uses a computer to achieve visual understanding of human body images, and analyzes the improvement in recognition obtained by adding different numbers of virtual samples, as well as the impact of adding different proportions of virtual images. At the same time, the virtual sample algorithm is applied to SAR image recognition. The experimental results show that expanding the training set with virtual samples constructed by this method improves the target recognition rate of SAR images, especially in the case of small samples, where the improvement is significant. The algorithm effectively characterizes the features of the image, improves the classification accuracy, and has good classification performance.

Keywords Machine learning · Image recognition · Classification algorithm · Design optimization

12.1 Introduction

In recent years, with the rapid growth of social networking sites, portals and mobile apps, as well as the rapid development of digital devices and wearable devices, the amount of information explodes exponentially. Searching massive data by way of term index can't meet the needs of human beings [1]. Image recognition is mainly divided into the following steps: image sample capture, preprocessing, image feature extraction and selection, recognition and classification. However, in the actual situation, it is often difficult to obtain a sufficient number of image samples, and the training samples are limited, which will seriously affect the performance of image

Z. Dan (B) · C. Yi
Wuhan Institute of Design and Sciences, Wuhan, China
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_12


recognition [2]. Especially at present, the commonly used machine learning algorithms are usually based on the premise of large data samples, and insufficient image samples will lead to over-fit. Remote sensing technology judges the information of specific targets on the ground according to the electromagnetic wave segments reflected and radiated by the targets, and through processing and analyzing the known information, it realizes target recognition, accurate positioning and quantitative expression, etc. [3]. Generally, the data provided by remote sensing technology has the advantages of high timeliness, wide coverage, abundant and objective information, etc., so it has been widely used in many research fields, such as land use, biological environment monitoring, resource exploration, crop yield estimation and real-time monitoring of ocean sky, etc., and plays an increasingly important role in various economic developments in modern society [4]. In this information age, people have to accept and process a lot of image information every day, which contains many different levels of information. We can get simple external information at a glance, while we can use intelligent means to dig deeper information [5]. At present, the research focus in this field is Content-based image retrieval (CBIR), which applies the technology of pattern recognition, data mining, database, image processing and other research fields to retrieve images in massive databases [6]. In short, image content recognition is to obtain the object recognition in the scene, the description of the relationship between objects in the scene, the recognition of the scene and the description of high-level semantics of the image from the input image obtained from the vision through a series of calculation, analysis and learning processes [7]. This paper focuses on the image content recognition algorithm based on machine learning, systematically expounds the research significance and related work of image content recognition algorithm, and puts forward an innovative image content recognition algorithm. Secondly, this paper studies the validity evaluation criteria and basis of virtual samples from the feature point of view, and verifies the validity of the virtual samples constructed by the methods of resampling, singular value reconstruction and contour wave reconstruction through experiments.

12.2 Image Classification and Retrieval Method Based on Image Visual Features

12.2.1 Theoretical Basis of Machine Learning Recognition Algorithm

Since the 1990s, many breakthroughs have been made in the field of image recognition. Now, there are a large number of application examples in many fields including face or fingerprint recognition, handwritten digit recognition or vehicle recognition [8]. In order to make full use of multi-source data to obtain all kinds of information that people need, this section analyzes the characteristics of hyperspectral and


Fig. 12.1 Basic process of image recognition

SAR remote sensing images, and discusses the complementary advantages of data and the problems that should be paid attention to in the process of fusion, classification and processing. Among them, the high-resolution spectral features provided by hyperspectral images are the features that SAR images don’t have, which are helpful to the follow-up study of ground object classification. However, for polarized SAR images, it is difficult for hyperspectral images to obtain more abundant target observation information through their dielectric characteristics and different polarization scattering components [9]. The original image samples usually contain a large number of irrelevant areas, which do not contribute to the final recognition effect, thus affecting the performance of the whole recognition system. Usually, the target area needs to be selected, and in some cases, the original image samples need to be preprocessed by means of histogram equalization, brightness adjustment and contrast adjustment [10]. Generally, the image recognition process can be divided into image acquisition, image preprocessing, feature extraction and dimension reduction, and classification recognition. The specific process is shown in Fig. 12.1. With respect to prediction models, supervised learning is a learning from the training data set. The machine already knows the correct answer, and the algorithm predicts and corrects the training data iteratively. When the algorithm reaches an acceptable performance level, learning stops. Supervised learning problems can be further classified into regression and classification problems. The goal of unsupervised learning is to model the underlying structure or distribution of data in order to learn more about the data. Many image processing algorithms use biological theories for reference. For example, in image compression, the quantization of discrete cosine transform coefficients is based on the fact that human vision is more sensitive to low-frequency components than to high-frequency, so that the high-frequency quantization step is larger and the low-frequency quantization step is smaller, so that human eyes can see that the compressed image still has higher quality. Which is quite different from human visual processing. Therefore, feature extraction is to find out the features that best represent a certain kind of object and use them as another expression of that kind of object. At the same time, because some image targets have many features, for these original feature data in high-dimensional space, not only the calculation amount is huge, but also the calculation efficiency is low, which seriously affects the stability of recognition results.


12.2.2 Research on Virtual Sample Algorithm in Image Recognition The global feature refers to the original feature attribute of the image, which belongs to pixel level features, generally including color, texture, shape, etc. Local feature is an image mode that is different from its domain defined according to some saliency criteria. It is usually associated with the change of one or several image attributes, and has multiple forms such as points, edges, and image regions. Among them, in view of the small sample set problem caused by the difficulty that the number of training samples is too small compared with the feature dimension, and the limitations of traditional basic classifiers, then, using different query criteria in active learning theory, manually and circularly select the samples to be labeled that contain the largest amount of information and are most conducive to improving the classification performance, predict their categories and add them to the total training set. The MGB-LOOC classifier is used again to further improve the classification accuracy of various ground objects, so as to finally solve a series of problems in the classification process of multi-source remote sensing images. Quasi sample technique is an effective method to solve the problem of small sample size. In the case of unknown sample distribution, virtual samples are constructed by using existing prior knowledge and original sample data to expand the number of sample sets. At present, the application of virtual image sample technology in image recognition is mainly through geometric transformation, integral projection, wavelet transformation and polynomial transformation. In this paper, the histogram method of image is used to extract the feature vector of image. In order to prevent over fitting, L2 regularization is introduced into the extracted feature vector to improve the generalization ability of the model. The method shown in Fig. 12.2 is used for classification and recognition. With the development of virtual sample research, more and more methods of expanding virtual image samples have emerged. In the process of image recognition, the shortage of training samples means that there is less image feature information. Generally speaking, it is necessary to construct virtual samples through prior knowledge in this research field. The constructed virtual sample undoubtedly has high rationality. However, in practice, due to the limited depth of research, it is often difficult to obtain complete prior knowledge. At present, many virtual sample algorithms
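As one possible reading of the pipeline in Fig. 12.2 (a histogram feature vector plus an L2-regularized classifier), the following sketch uses NumPy and scikit-learn; the random toy images, bin count and regularization strength are assumptions for illustration, not the chapter's actual settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def histogram_feature(image, bins=64):
    """Normalized grey-level histogram used as the image feature vector."""
    hist, _ = np.histogram(image.ravel(), bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

# toy data: random "images" and labels; real use would load training images
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(40, 32, 32))
labels = rng.integers(0, 2, size=40)

X = np.stack([histogram_feature(img) for img in images])
# C is the inverse of the L2 penalty strength, which limits over-fitting
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, labels)
print(clf.score(X, labels))
```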

Fig. 12.2 Traditional machine learning recognition method


are directly extended based on training samples. Most machine learning algorithms need to extract features from the sample set for further training and recognition, and the extracted feature components have a crucial impact on the final recognition performance. Therefore, this paper introduces the concept of feature selection to evaluate the effectiveness of the generated virtual samples. However, with the birth of machine learning algorithms in 1990s, the features of samples tend to be high-dimensional and complicated, and the demand for feature selection is increasing. On the one hand, the feature information contains a lot of important representative information, while there are some irrelevant and redundant information. The purpose of feature selection is to retain the feature components with strong correlation in all feature information through certain standards, and discard irrelevant redundant feature information at the same time, so as to achieve the best classification prediction effect.

12.3 Image Retrieval Method Combining Machine Learning with Image Visual Features

12.3.1 Image Recognition Advertising Classification Algorithm Model

The support vector machine is a machine learning technique based on the structural risk minimization principle of statistical learning theory [11]. Given limited sample information, it seeks the best compromise between model complexity and learning ability in order to obtain the best generalization, and it generalizes well in small-sample, high-dimensional and nonlinear situations [12]. In addition, manual evaluation is not only inefficient but also prone to errors. As the basis for evaluating the effectiveness of virtual samples, this paper compares the feature components before and after virtual samples are added. Generally, feature evaluation criteria are divided into independent criteria and relevant criteria. Independent criteria include separability, information, relevance, consistency, etc.; the relevant criterion is the classification accuracy obtained after classifying the specified feature set with a specific classifier. In this chapter, the Euclidean distance between the PCA feature components of the training set and the test set before and after adding virtual samples, together with their mutual information, is selected as the independent criterion, while the classification and recognition accuracy obtained with an SVM is used as the relevant criterion to evaluate the effectiveness of the virtual samples generated by the proposed virtual sample algorithm. Let the Euclidean distance ρ(A, B) between two points A(a_1, a_2, a_3, ..., a_m) and B(b_1, b_2, b_3, ..., b_m) in m-dimensional space be expressed as:


\rho(A, B) = \sqrt{\sum_{i=1}^{m} (a_i - b_i)^2}    (12.1)

In essence, a machine learning algorithm is a system that uses an input set of instances to learn the relationships among them through certain rules, and information is the form in which the state of things is described [13]. In information theory, information entropy measures the uncertainty of the information carried by a random variable [14]: the greater the uncertainty, the greater the entropy, and the more information is required. The information entropy H(X) of a random variable X is:

H(X) = -\int_{x} p(x) \log p(x)\, dx    (12.2)
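A small sketch of the two independent criteria, formula (12.1) and a discrete estimate of formula (12.2), is given below; the histogram-based entropy estimate and the sample data are illustrative assumptions.

```python
import numpy as np

def euclidean(a, b):
    """Formula (12.1): Euclidean distance between two m-dimensional feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def entropy(samples, bins=16):
    """Discrete (histogram-based) estimate of the information entropy in formula (12.2)."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(euclidean([1, 2, 3], [2, 0, 3]))
print(entropy(np.random.default_rng(0).normal(size=1000)))
```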

The MILES algorithm is based on the DD algorithm. Its idea is to construct a target space and embed each multi-instance bag into a point of that space through a defined nonlinear projection function, so that a bag becomes a single sample, and the multi-instance learning problem can then be solved with a support vector machine. In this algorithm, all instances are regarded as candidate feature concepts. Assume the concept set is C = {x^k} (k = 1, 2, ..., n), where n is the total number of instances in the training bags; the similarity between the concept x^k and the bag B_i is defined by the most similar instance in the bag, as follows:

s(x^k, B_i) = \max_{j} \exp\left( -\frac{\| x_{ij} - x^k \|^2}{\sigma^2} \right)    (12.3)

where σ is a predetermined scale parameter. First, the instances in these images, namely the visual vocabulary, are classified and the instances most relevant to the object are selected; these are called positive instances. The image segmentation algorithm proposed in this chapter is then used to segment the image [15]. The weights of the positive instances within the same region are accumulated and compared with a pre-set threshold to determine whether the region belongs to the object; all regions that belong to the object together form the object area. Combining sparse coding with local image features, the sparse coding model keeps the advantages of the visual bag-of-words model for image content recognition: generalizing complex large-scale datasets, improving the resolving power of features, and reducing the time of feature matching and classification. The sparse coding model also represents images better: it generates the most representative visual vocabulary through sparse coding theory and represents image blocks through the one or several visual words with the highest correlation. The feature vector of bag B_i is defined as follows:

m(B_i) = \left[ s(x^1, B_i), \; s(x^2, B_i), \; \ldots, \; s(x^n, B_i) \right]^T    (12.4)

Here E(p_i(r), q_j(s)) denotes the Euclidean distance between the i-th image of class r in the training set and the j-th image of class s in the test set [16]. The number of real samples in each training set is fixed, and the number of virtual training samples added is N. In addition, the recognition accuracy rates R_1 (before adding virtual samples) and R_2 (after adding them) can be used as the relevant basis for judging the validity of the virtual samples: if the recognition accuracy after adding the virtual samples is greater than before, the added virtual samples are effective from the perspective of the relevant criterion.
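The following sketch shows how formulas (12.3) and (12.4) map each bag to a single feature vector in the MILES manner, after which a standard SVM can be trained on the embedded vectors; the toy bags and the choice of all pooled training instances as candidate concepts are assumptions for illustration.

```python
import numpy as np

def instance_bag_similarity(x_k, bag, sigma):
    """Formula (12.3): similarity between candidate concept x_k and bag B_i."""
    d2 = np.sum((bag - x_k) ** 2, axis=1)         # squared distances to every instance
    return float(np.exp(-d2 / sigma ** 2).max())  # the closest instance dominates

def miles_embedding(bags, concepts, sigma=1.0):
    """Formula (12.4): embed every bag into the concept space as m(B_i)."""
    return np.array([[instance_bag_similarity(x_k, bag, sigma) for x_k in concepts]
                     for bag in bags])

# toy bags of 2-D instances; concepts are all training instances pooled together
bags = [np.random.default_rng(i).normal(size=(5, 2)) for i in range(4)]
concepts = np.vstack(bags)
print(miles_embedding(bags, concepts).shape)   # (num_bags, num_concepts)
```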

12.3.2 Experimental Analysis Results

Images collected outdoors under natural light are affected by lighting factors such as directional light, backlight and overcast weather. This experiment therefore adopts a hue-saturation-intensity color model, which removes the influence of the light intensity component from the color information carried by color images. Based on active learning theory, the experiment applies a variety of query mechanisms to hyperspectral and polarimetric SAR feature data and conducts image recognition and classification experiments. The accuracy results are shown in Table 12.1. The column analysis in Fig. 12.3 compares the classification accuracy of each of the five categories after 50 iterations with the original classification accuracy, and Fig. 12.4 shows how the total classification accuracy changes with the number of iterations under the different query algorithms of active learning theory. The 10-dimensional hyperspectral feature data comes from the SVM-PP extraction method above with genetic-algorithm feature selection, while the polarimetric SAR image data consists of the 8-dimensional features selected by multiple target decompositions. The number of basic training samples per category is 30, and the query mechanisms applied to the same initial training set mainly include least confidence (LC), margin, information entropy (NEP) and the improved committee selection method (NQBC).

Table 12.1 COREL1000 classification results

Classifier | Accuracy | Time
KNN        | 0.52     | 47 s
Bayes      | 0.65     | 22 s
SVM        | 0.85     | 1 min 23 s
CNN        | 0.25     | 2 min 33 s
LSTM       | 0.77     | 5 min 29 s


Fig. 12.3 Classification results of each active learning algorithm

Fig. 12.4 Schematic diagram of using SVM to add different proportions of virtual sample recognition rate

In order to eliminate white small areas and reduce the impact of noise on image segmentation, morphological processing is also necessary. Firstly, perform the corrosion operation in morphological filtering. In non boundary conditions, the gradient change is relatively gentle, while the gradient change at the boundary will be significant [17]. So in image segmentation, the gradient features of the image can be used to obtain image boundaries, and then the image can be segmented. Many edge detectors also use various differential operator to extract image boundaries.


First of all, by analyzing the overall classification experiment results in Table 12.1, we can see that the four query criteria adopted can improve the classification accuracy to some extent, which preliminarily shows that the core theory of active learning theory studied in this section is feasible to expand the number of too few samples by adding samples to the training set, reducing the demand for the number of training samples. It further solves the problem of low classification accuracy caused by insufficient number of samples in a small sample set. In the training set, five images are randomly selected as training samples, and the images are respectively expanded by three virtual sample methods, and corresponding virtual images are generated and added to the training set. After repeated five experiments, the average recognition results are obtained. It can be seen that choosing to add different proportions of virtual samples will have a certain impact on the final recognition result. After adding different proportions to the three virtual samples, the recognition results fluctuate between 1 and 2%, instead of monotonically increasing or decreasing. From the above experimental analysis, it can be seen that for different numbers of actual training samples, the SAR image target recognition rate will be improved after adding the virtual samples constructed by this method to expand the training set, especially in the case of small samples, the recognition rate will be improved significantly. And after adding virtual samples, it can improve the recognition accuracy of SVM and DBN two different recognition algorithms, and the recognition rate after adding virtual samples to SVM is more obvious.

12.4 Conclusion

With the gradual maturing of remote sensing applications of Earth observation technology, more and more distinctive remote sensing image data has become available. At the same time, under the complicated conditions of practical applications, a single information source can rarely meet specific needs. How to make rational use of multi-source information with complementary advantages has therefore become a new research hotspot, and image fusion technology is one of the most effective approaches; the goal of this work is to realize visual understanding of human body images by computer. On the virtual sample side, the virtual sample algorithm used in this paper is analyzed through a large number of experiments on several public face data sets, covering the improvement in recognition from adding different numbers of virtual samples and the influence of adding virtual images in different proportions. The virtual sample algorithm is also applied to SAR image recognition. The comparative experiments show that the SVM-PP method for hyperspectral images can effectively combine the principles of support vector machines and projection pursuit to solve a series of problems caused by extracting hyperspectral features from small sample sets.


Acknowledgements This article is supported by the Hubei Provincial Department of Education 2019 Scientific Research Project: “Research on Multi-sensor Data Fusion Technology in Wireless Sensor Networks (No. B2019331)” and the Annual school-level science Project Research Project in 2022 of Wuhan Institute of Design and Sciences (No. K2022221).

References

1. J. Cui, K. Tian, Edge detection algorithm optimization and simulation based on machine learning method and image depth information. IEEE Sens. J. 20, 11770–11777 (2019)
2. G. Liu, S.B. Tsai, A study on the design and implementation of an improved AdaBoost optimization mathematical algorithm based on recognition of packaging bottles. Math. Probl. Eng. 2022 (2022)
3. V. Hamedpour, P. Oliveri, R. Leardi, D. Citterio, Chemometric challenges in development of paper-based analytical devices: optimization and image processing. Anal. Chim. Acta 1101, 1–8 (2020)
4. R. Yan, T. Wang, X. Jiang, Q. Zhong, X. Huang, L. Wang, X. Yue, Design of high-performance plasmonic nanosensors by particle swarm optimization algorithm combined with machine learning. Nanotechnology 31, 375202 (2020)
5. P. Gu, Y.Z. Feng, L. Zhu, L.Q. Kong, X.L. Zhang, S. Zhang, S.W. Li, G.F. Jia, Unified classification of bacterial colonies on different agar media based on hyperspectral imaging and machine learning. Molecules 25, 1797 (2020)
6. Y. Xin, M. Cui, C. Liu, T. Hou, L. Liu, C. Qian, Y. Yan, A bionic piezoelectric tactile sensor for features recognition of object surface based on machine learning. Rev. Sci. Instrum. 92, 095003 (2021)
7. S. Zhang, L. Bian, Y. Zhang, High-accuracy inverse optical design by combining machine learning and knowledge-depended optimization. J. Opt. 22, 105802 (2020)
8. Z. Pan, S. Fang, H. Wang, LightGBM technique and differential evolution algorithm-based multi-objective optimization design of DS-APMM. IEEE Trans. Energy Convers. 36, 441–455 (2020)
9. Y. Sun, B. Xue, M. Zhang, G.G. Yen, J. Lv, Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans. Cybern. 50, 3840–3854 (2020)
10. M.P. Dang, T.P. Dao, N.L. Chau, H.G. Le, Effective hybrid algorithm of Taguchi method, FEM, RSM, and teaching learning-based optimization for multiobjective optimization design of a compliant rotary positioning stage for nanoindentation tester. Math. Probl. Eng. 2019 (2019)
11. E.T.L. Ann, N.S. Hao, G.W. Wei, K.C. Hee, Feast in: a machine learning image recognition model of recipe and lifestyle applications. MATEC Web Conf. 335, 04006 (2021)
12. H. Su, Design of the online platform of intelligent library based on machine learning and image recognition. Microprocess. Microsyst. 82, 103851 (2021)
13. C. Carpenter, Machine-learning image recognition enhances rock classification. J. Petrol. Technol. 72, 63–64 (2020)
14. W. Niu, Y. Luo, K. Ding, X. Zhang, Y. Wang, B. Li, A novel generation method for diverse privacy image based on machine learning. Comput. J. 66, 540–553 (2023)
15. B.G. Ashinsky, M. Bouhrara, C.E. Coletta, B. Lehallier, K.L. Urish, P.C. Lin, I.G. Goldberg, R.G. Spencer, Predicting early symptomatic osteoarthritis in the human knee using machine learning classification of magnetic resonance images from the osteoarthritis initiative. J. Orthop. Res. 35, 2243–2250 (2017)
16. E. Elyan, P. Vuttipittayamongkol, P. Johnston, K. Martin, K. McPherson, C. Jayne, M.K. Sarker, Computer vision and machine learning for medical image analysis: recent advances, challenges, and way forward. Artif. Intell. Surg. 2 (2022)
17. O.C. King, Machine learning and irresponsible inference: morally assessing the training data for image recognition systems, in On the Cognitive, Ethical, and Scientific Dimensions of Artificial Intelligence: Themes from IACAP 2016 (2019), pp. 265–282

Chapter 13

Design of Path Planning Algorithm for Intelligent Robot Based on Chaos Genetic Algorithm

Min Sun

Abstract Path planning is one of the core problems in the field of intelligent robot research, and also an important aspect of artificial intelligence in robotics. This article combines the GA (genetic algorithm) with a chaotic algorithm, exploiting the randomness of chaos studied in chaos theory, and designs a path planning algorithm for intelligent robots based on a chaotic genetic algorithm. A geographic-information encoding method is selected for chromosome encoding, and the serial number is converted into coordinate form when evaluating a path; the coordinate form is more convenient for representing the relative positions of grids, calculating the path length, and verifying the feasibility of the path. The simulation results show that, when planning with the obtained path, the larger number of path nodes causes the robot's direction and angle to be adjusted more often, which strengthens the effect of positive feedback, increases the diversity of solutions, reduces the possibility of the GA falling into local minima, and speeds up convergence. The algorithm is also well suited to parallel execution and application. Simulation experiments verify the effectiveness of the proposed algorithm, and the chaotic genetic algorithm achieves good results in solving the path planning problem of robots in unknown environments.

Keywords Chaotic genetic algorithm · Intelligent robot · Path planning

13.1 Introduction

The appearance of robots, especially intelligent robots, marks a new stage of human exploration of machines and intelligence. Robot technology has been extended to many fields of global economic development, which has had a great impact on industry, national defense, national economy, and people's lives in many countries.

M. Sun (B)
Wuhan Institute of Design and Sciences, Wuhan, China
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_13


Path planning is one of the core problems in the research field of intelligent robots, and it is also an important aspect of artificial intelligence in robotics [1]. The basic behaviors involved in robot navigation are perception and recognition of the robot's own position, posture and surrounding environment, path planning, path tracking, obstacle avoidance and so on. In terms of the intelligence evaluation indicators of mobile robots, the main research contents of mobile robot technology include navigation and positioning, path planning, motion control and multi-robot systems, and path planning is one of the most important tasks in robot navigation. Local planning is an online planning method that focuses on the robot's current local environment information, giving the robot good collision-avoidance ability. Collision-free path planning in obstacle environments can be divided into global path planning with complete environmental information and local path planning with completely or partially unknown environmental information. Many robot navigation methods are local methods, because their information acquisition depends only on the sensors and changes in real time with the environment [2]. In recent years, intelligent algorithms have been widely applied to robot path planning; commonly used methods include the artificial potential field method, fuzzy logic, neural networks, genetic algorithms, etc. When planning with an obtained path that has many nodes, the robot's direction and angle have to be adjusted many times; when there are few path nodes, the robot can reduce running time, reach the destination quickly along the shortest path, and the path is relatively smooth. The goal is to strengthen the role of positive feedback, increase the diversity of solutions, reduce the possibility of the CGA falling into local minima, and accelerate convergence [3, 4]. This paper uses the genetic algorithm, combined with a chaotic algorithm, to solve the path planning problem of mobile robots. Based on the GA, the selection probability is adjusted and extra operators are added to improve the basic genetic algorithm, chaotic operations are introduced into the genetic algorithm, and the algorithm is applied to robot path planning. Simulation experiments show that the algorithm can effectively perform robot path planning and achieves good results in unknown environments. Chaos theory is studied and the randomness of chaos is used to add disturbances to the population, avoiding premature convergence of the genetic algorithm.

13.2 Concept and Principle of Chaotic Genetic Algorithm

The main feature of the GA is its population-based search strategy, and the ergodicity of the evolutionary operators makes the genetic operators very effective for global search in the probabilistic sense. The GA has strong robustness and adaptability, but while it provides evolutionary opportunities for the individuals in the population, it also inevitably allows the possibility of degradation, and it tends to run for a long time on path planning problems. Chaotic motion is a common phenomenon, but it is also an unstable,


limited and constant motion that only appears in nonlinear dynamic systems [5, 6]. The ergodicity of the chaos optimization strategy is used to generate a good initial population; according to the principle of survival of the fittest, the population is then selected, crossed and mutated by the genetic operations, while deletion and insertion operations are added to accelerate the convergence of the GA. The randomness of chaos is then used to add a disturbance to the population to avoid premature convergence of the CGA. The CGA flow chart is shown in Fig. 13.1.

Fig. 13.1 CGA flow chart

The basic idea of the CGA designed in this paper is to generate a better initial population by using the ergodicity of the chaos optimization strategy. According to the principle of survival of the fittest, genetic operations such as selection, crossover and mutation are carried out according to the fitness probability [7]. The randomness of chaos is then used to add a disturbance to the population to avoid premature convergence of the CGA, and through continuous reproduction and evolution the better individuals satisfying the fitness function are finally obtained [8]. If the objective function is a maximization problem, then

Fit(f(x)) = f(x)    (13.1)

If the objective function is a minimization problem, then

Fit(f(x)) = -f(x)    (13.2)

Relying on the crossover operation alone to realize the evolution mechanism is not feasible in most cases, much as inbreeding in the biological world hampers evolution [9]. Because the population size is limited, after several generations of crossover the offspring of a better ancestor gradually fill the whole population and the search converges prematurely; the individuals finally obtained then do not represent the optimal solution of the problem.
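A minimal sketch of the chaotic initialization and disturbance described above is given below; the logistic map is used as one common chaotic generator (the chapter does not fix a particular map), and the parameter values are illustrative.

```python
import numpy as np

def logistic_sequence(length, x0=0.3457, mu=4.0):
    """Chaotic sequence in (0, 1) from the logistic map x_{t+1} = mu * x_t * (1 - x_t)."""
    xs = np.empty(length)
    x = x0
    for t in range(length):
        x = mu * x * (1.0 - x)
        xs[t] = x
    return xs

def chaotic_population(pop_size, chrom_len, low, high):
    """Initial GA population spread over the search space by the chaotic sequence."""
    seq = logistic_sequence(pop_size * chrom_len).reshape(pop_size, chrom_len)
    return low + seq * (high - low)

def chaotic_disturbance(population, scale=0.05):
    """Small chaotic perturbation added to the population to avoid premature convergence."""
    seq = logistic_sequence(population.size).reshape(population.shape)
    return population + scale * (seq - 0.5)

pop = chaotic_population(pop_size=20, chrom_len=10, low=0.0, high=19.0)
pop = chaotic_disturbance(pop)
```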

13.3 Path Planning Method

13.3.1 Coding Based on Geographic Information

In the CGA, for a fixed fitness function, the length of the encoding and the size of the search space directly determine the online computing time, so a simple and practical encoding technique is needed to shorten the planning time and achieve real-time control. The width assumed for the robot should be its maximum width in motion; otherwise the robot may become trapped in obstacles in actual operation. The grid granularity determined by the CGA is first obtained from the obstacles alone, and the final grid granularity is the larger of this value and the robot's own width [10]. In actual movement, the path points are two-dimensional; if the size of the path coordinate representation can be reduced, the calculation speed will be greatly improved. Each obstacle is a convex polygon whose position and size are known, and its geometric shape, position and size do not change during the robot's movement. When the population is initialized, the paths represented by the individuals differ only in their vertical-axis values; varying the vertical axis lets the robot avoid obstacles and move to the target point along different paths. The robot's workspace is a two-dimensional structured space, again assuming that the positions and sizes of obstacles are known and do not change during the robot's movement. After the coordinate transformation, discrete points on the same vertical line segment share the same abscissa in the new coordinate system, while the abscissas of discrete points on different vertical line segments are uniquely determined by the sequence numbers of those segments. Going from left to right and from top to bottom, starting from the first grid in the upper left corner of the grid array, give


each grid a serial number p; the serial numbers then correspond one-to-one with the grid blocks. The mutual mapping relationship is:

p = x × total + y    (13.3)

or

x = int(p / total),  y = mod(p, total)    (13.4)

where total is the total number of columns, int(p / total) is the integer part of p divided by total, and mod(p, total) is the remainder of p divided by total. Because the encoding method has a great impact on the execution efficiency of the crossover and mutation operators, and the widely used binary encoding produces invalid solutions, this paper selects the geographic information encoding method for chromosome encoding.
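The mapping of formulas (13.3) and (13.4) between a grid's serial number and its (row, column) coordinates can be written directly, as in the short sketch below; the 20 × 20 grid is only an example.

```python
def grid_to_serial(x, y, total):
    """Formula (13.3): (row x, column y) -> serial number p, for a grid with `total` columns."""
    return x * total + y

def serial_to_grid(p, total):
    """Formula (13.4): serial number p -> (row, column) by integer division and remainder."""
    return p // total, p % total

total = 20                       # 20 x 20 grid map
p = grid_to_serial(3, 7, total)  # -> 67
assert serial_to_grid(p, total) == (3, 7)
```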

13.3.2 Simplification of Robot Control Parameters

The control of the robot is simplified by determining its rotation angle from the changes between successive genes along the path and using this to control the robot's motion. Robot control involves speed, step size and angle; here the speed is fixed and the step size is one grid cell. In the environment representation the obstacles are convex polygons whose boundaries are extended outward by a certain safety distance, and a generalized spiral is later introduced to smooth the path. Assuming that the sizes and positions of the obstacles are known, that the robot's workspace is a two-dimensional structured space, and that the obstacles remain stationary during the robot's movement, the fitness function only needs to include the length of the path and a check of whether it collides with an obstacle. Global path planning for the mobile robot, in which the distribution of all obstacles in the environment is known, requires the robot to move from the starting position to the goal position without colliding with the known obstacles, so that the best moving path is obtained. The robot must be able to move freely within a grid cell: if there is no obstacle within the cell, it is a free grid, otherwise it is an obstacle grid. A two-dimensional workspace map of the robot, in which the black areas represent obstacles, is shown in Fig. 13.2.


Fig. 13.2 Two-dimensional workspace diagram with obstacles

disturbance is added to the population to avoid premature convergence in GA. Select probability and add operator to improve the basic GA, and introduce chaos operation into GA, and apply this algorithm to robot path planning. Finally, simulation experiments show that this algorithm can effectively carry out robot path planning. If this grid is a free grid and not in the path, insert the middle of two discontinuous grids as the path. If there is no grid that meets the conditions after traversing the four grids, delete this path.

13.4 Simulation Study

While walking, the robot obtains its current position from its own sensors and calculates the angle difference between its current heading and the target point from the coordinates of the current position and the target point. The planned path is followed as follows: the robot starts from the starting point of the planned path, takes the subsequent path nodes as target points one by one until the final target point is reached, and adjusts its forward direction onto the straight line between the current position and the target point according to the calculated angle difference. Using the map environments, the basic ant colony algorithm, the algorithm of Ref. [8] and the algorithm in this paper are each executed 150 times and their respective optimal paths are obtained. The results are compared with the typical solutions of Environment 1, Environment 2 and Environment 3; if the difference is within 35%, the path search is considered successful, otherwise it fails. The results are given in Table 13.1.

Table 13.1 Number of successful searches

              | ACA | Literature [8] algorithm | Algorithm in this paper
Environment 1 | 74  | 92                       | 100
Environment 2 | 36  | 88                       | 100
Environment 3 | 18  | 75                       | 100

Fig. 13.3 Robot path planning

It can be seen from Table 13.1 that in Environment 1, the three algorithms can converge around the typical solution in Environment 1 with very high probability, and the number of successful searches is relatively high. In Environment 2, the performance of the basic ant colony algorithm is greatly degraded. During the experiment, it is found that sometimes the ant colony will fall into the local optimal solution, or even cannot find the global optimal solution. In Environment 3, due to the large number of obstacles, the search scale will increase, and the search difficulty will increase accordingly. The performance of the basic ant colony algorithm has been very poor, and the performance of the algorithm in literature [8] has also declined to some extent, but the performance of the algorithm in this paper is still good. For the path planning experiment of intelligent robot, the running track of the robot is shown in Fig. 13.3. A is the planned path, and B is the actual walking path of the robot. Figure 13.4 shows the change of left–right speed of the robot. As can be seen from Fig. 13.4, in the initial stage, the walking path of the robot basically coincides with the planned path, and then gradually deviates from the planned path. There are two reasons. First, the sensor of the robot has accumulated errors after obtaining information many times. Second, it is caused by the experimental environment. Because of the randomly generated initial population and the mutation operation in the genetic process, new individuals may be generated, including the barrier grid serial number, so it is necessary to design a deletion operator. At the same time, when the genetic operator is executed, the same serial number will be generated in the individual, and the operation should also be deleted. When planning according to the obtained path, because there are many path nodes, the robot’s direction and angle are adjusted many times, while when there are few path nodes, the robot can reduce the running time, reach the destination quickly according to the shortest path, and the path is relatively smooth. Therefore, the effect


Fig. 13.4 Speed of left and right wheels of robot

of positive feedback is strengthened, the diversity of understanding is increased, the possibility of CGA falling into local minima is reduced, and the convergence speed is accelerated. At the same time, the algorithm is also conducive to parallel execution and application. Simulation experiments verify the effectiveness of the proposed algorithm.

13.5 Conclusions

In this paper, the randomness, ergodicity and regularity of chaotic motion are used to overcome the premature convergence of the traditional GA, and the resulting algorithm is easy to operate and implement. The simulation results show that, when planning with the obtained path, the larger number of path nodes causes the robot's heading angle to be adjusted more often, which strengthens the effect of positive feedback, increases the diversity of solutions, reduces the possibility of the CGA falling into local minima, and accelerates convergence; the algorithm is also well suited to parallel execution and application. In the early iterations the pheromone update ratio on the currently optimal paths is kept higher while the update ratio on the global optimal path is kept lower, which favors the generation of multiple solutions, and in later iterations the pheromone update ratio on the global optimal path is higher. Updating the "pheromone" with the optimal path information obtained after crossover likewise enhances positive feedback, increases the diversity of solutions, reduces the possibility of the CGA falling into local minima, and accelerates convergence. In the robot's actual working environment, the characteristics of the real environment and a non-uniform grid search space must be considered in order to facilitate the exploration of the optimal path. In the path planning of intelligent robots, the CGA can make full use of the distributed and parallel characteristics of the algorithm through the cooperation of multiple robots and improve the search efficiency.


and application. Simulation results show the effectiveness of the algorithm. In the actual working environment of the robot, it is necessary to consider the characteristics of the actual environment and the non-uniform grid search space in order to facilitate the exploration of the optimal path. In the path planning of intelligent robots, CGA can make full use of the distributed and parallel characteristics of the algorithm through the cooperation of multiple robots, and improve the search efficiency. Acknowledgements This article is supported by the 2021 school-level scientific research project of Wuhan Institute of Design and Engineering: “Narrative Research and Visual Animation Interpretation of Dunhuang’s ‘Deer King Bensheng’ (No.: K202107)”.

References

1. Q. Song, S. Li, J. Yang, Q. Bai, J. Hu, X. Zhang, A. Zhang, Intelligent optimization algorithm-based path planning for a mobile robot. Comput. Intell. Neurosci. 2021 (2021)
2. Z. Chen, J. Zhou, C. Cheng, L. Kang, R. Sun, Intelligent camera path planning for scientific visualization based on genetic algorithm and cardinal spline. Int. Agric. Eng. J. 29, 393–402 (2020)
3. X. Li, Path planning of intelligent mobile robot based on Dijkstra algorithm. J. Phys. Conf. Ser. 042034 (2021)
4. M.Z.K. Hawari, N.I.A. Apandi, Industry 4.0 with intelligent manufacturing 5G mobile robot based on genetic algorithm. Indones. J. Electr. Eng. Comput. Sci. 23, 1376–1384 (2021)
5. A.K. Rath, D.R. Parhi, H.C. Das, P.B. Kumar, M.K. Mahto, Design of a hybrid controller using genetic algorithm and neural network for path planning of a humanoid robot. Int. J. Intell. Unmanned Syst. 9, 169–177 (2021)
6. X. Wang, Y. Yan, X. Gu, Spot welding robot path planning using intelligent algorithm. J. Manuf. Process. 42, 1–10 (2019)
7. T. Yifei, Z. Meng, L. Jingwei, L. Dongbo, W. Yulin, Research on intelligent welding robot path optimization based on GA and PSO algorithms. IEEE Access 1 (2018)
8. X. Yin, J. Mao, Y. Zhou, X. Chen, Path planning of EOD robot based on hybrid chaotic sequence and genetic algorithm. Comput. Era 5–8 (2019)
9. A.S. Al-Araji, A.K. Ahmed, K.E. Dagher, A cognition path planning with a nonlinear controller design for wheeled mobile robot based on an intelligent algorithm. J. Eng. 25, 64–83 (2019)
10. Y. Pei, L. Yang, C. Yang, Mobile robot path planning based on a hybrid genetic algorithm. Mod. Electron. Tech. 42, 183–186 (2019)

Chapter 14

Design and Development of Rail Transit Overhead Contact Line Monitoring System Based on Image Processing

Zhigang Jiang, Wu Tan, Hao Tan, and Jun Huang

Abstract With the development and continuous expansion of China's high-speed railway, the requirements for railway operation safety have increased, and manual inspection of catenary safety hazards cannot meet the requirements of modern high-efficiency inspection. This paper therefore designs a rail transit catenary monitoring system based on image processing technology. Specifically, it introduces the catenary and image processing technology, then designs the catenary monitoring system and puts forward its overall scheme, and finally implements and tests the monitoring system, with the focus on analyzing the energy-saving mode of the monitoring terminal.

Keywords Image processing · Catenary monitoring · Monitoring system · System design · Rail transit

14.1 Introduction

In recent years, the popularity of high-speed railways in China has led to increased attention towards railway driving safety [1]. Traditional manual detection methods are known for their low performance, lengthy processing times, and high detection costs, making them unsuitable for high-performance driving requirements. As a result, intelligent detection methods based on image processing technology are

Z. Jiang (B)
China Railway Nanchang Group Co., Ltd, Nanchang, China
e-mail: [email protected]
W. Tan
China Railway Nanning Group Co., Ltd, Nanning, China
H. Tan
Nantong Rail Transit Group Co., Ltd, Nantong, China
J. Huang
Nanchang Rail Transit Design and Research Institute Co., Ltd, Nanchang, China
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_14


being explored [2]. One such technology is the Rail Transit Overhead Contact Line Monitoring System, which uses an intelligent inspection device installed in the train driver’s cab to acquire images of the overhead contact line while the train is in motion. The system then uses image processing algorithms to identify and locate the catenary equipment, as well as detect any abnormal states. By automating this process, the system eliminates the need for manual inspection, freeing up manpower and ensuring timely detection of potential safety hazards [3]. The development of such a system is crucial for the operational safety of high-speed railways, and has the potential to significantly enhance the safety and efficiency of rail transit. Many researchers have conducted in-depth research on rail transit catenary monitoring system based on image processing technology and achieved good research results [4]. For example, Nelson and Hibberd introduced in detail the detection technology and detection methods adopted by the pantograph monitoring system in arc detection, pantograph, etc. security, and provides a research direction for the realization of digital and intelligent pantograph-catenary dynamic monitoring [5]. Schonfeld et al. proposed a new method based on digital filtering techniques for temperature compensation in fiber Bragg grating sensor systems to monitor the conditions of electrified pantograph catenary systems. The Butterworth high-pass filter was designed to suppress the temperature-dependent signal components of the stopband and capture the true strain in the passband in the frequency domain [6]. Baranov et al. proposed a novel image processing algorithm for identifying and locating pantograph-catenary disconnections in high-speed railway systems. The algorithm integrates edge detection, morphology operations, and machine learning techniques to accurately detect and locate disconnections [7]. Additionally, Karakose and Yaman developed a visual fault diagnosis system for catenary systems in urban rail transit. The system uses highresolution images and image processing algorithms to identify and diagnose faults in catenary systems, enabling timely repairs and reducing maintenance costs. These recent studies demonstrate the potential for further advancements in the development of rail transit catenary monitoring systems based on image processing technology. As such, continued research in this area can lead to the development of even more advanced and effective monitoring systems that enhance the safety and efficiency of rail transit systems [8]. The catenary monitoring system is of great significance to the safe operation of trains, so this paper studies the catenary monitoring system based on image processing technology. The structure of the article is divided into three parts, namely related concept introduction, system design and system implementation. In the related concept introduction part, the catenary and image processing technology are introduced. In the system design part, the overall design scheme is proposed and the energy consumption of the main components of the monitoring terminal is analyzed. Finally, the energy-saving mode of the monitoring terminal is analyzed in the system implementation part to analyze and teste the performance of the system.


14.2 Related Concepts 14.2.1 Catenary The catenary is a vital part of the traction network and is the key facility to ensure the normal running of the electric locomotive. The catenary compensation device is a supporting device for the catenary, which can reasonably adjust the tension of the contact wire and the load-bearing cable, improve the power supply quality of the catenary, and improve the current receiving conditions of the pantograph and catenary [8].

14.2.2 Image Processing Technology Image analysis is performed on image sequences [9]. Video image analysis is divided into three steps: image segmentation, target recognition, and behavior and scene analysis. Image segmentation mainly separates the foreground targets; if multiple targets are present in the foreground, each target must be separated individually [10]. Target recognition only determines the target type, for example whether the target is a person. Behavior and scene analysis interprets the actions performed by the target.

14.3 System Design 14.3.1 Overall Scheme Design The system mainly realizes the monitoring of the catenary compensation device and is divided into two parts: the lower computer and the upper computer. The lower computer is arranged in the compensation device at each monitoring point. One MCU corresponds to multiple sensors, and multiple lower computers correspond to one upper computer. The main function of the lower computer is to collect the necessary parameters of the monitoring point compensation device through the sensors; the MCU collects and packages the sensor data and sends it to the upper computer server through the wireless communication module. After the upper computer server receives the data, it stores the data in the database, displays the system status to the user through the client, and raises an alarm when a monitoring point fails. When the user needs to know the on-site situation of a monitoring point, a photographing instruction can also be issued to the lower computer through the upper computer; the lower computer then uploads the captured pictures to the upper computer, and image processing technology is used to assist monitoring. The role of the upper computer is to display and store the data collected by the lower computer and the fault identification results, so as to provide a human-computer interaction platform for the user. The design diagram is shown in Fig. 14.1. The upper computer part of the system mainly includes three aspects: cloud server, database and management client. The functions of the upper computer client mainly include monitoring point number display, sensor data display, monitoring point picture display, etc. The role of the database is to store the data (including sensor data and pictures) collected by the lower computer, and the user can add, delete, modify and query the data in the database at any time when needed. The design of the catenary monitoring system not only realizes the basic hardware acquisition and software monitoring, but also covers research on the terminal energy-saving strategy. High energy consumption of the terminal not only shortens the power supply time, but also reduces the reliability of the system; therefore, it is necessary to conduct in-depth research on energy-saving strategies to ensure the reliable operation of the terminal. The vibration of the catenary will not only loosen the components of the catenary, but can also cause the contact wire to break due to fatigue when the amplitude is too large. By analyzing the vibration frequency characteristics of the catenary, the state of the catenary can be indirectly understood, and a theoretical basis is provided for the maintenance and replacement of wires and components.

Fig. 14.1 Design of the upper camera position

14.3.2 Monitoring Preprocessing Module The monitoring preprocessing module performs image processing operations on the video stream data after the main program starts real-time monitoring, mainly to support the subsequent detection of dangerous phenomena and alarming. As shown in Fig. 14.2, the main process of the monitoring preprocessing module is: the first step is to initialize the relevant parameters, the second step is to obtain the data stream and decode the data, and the third step is to create an image processing thread. The relevant parameters are loaded into the main program, and by processing the video stream data, the system can capture evidence and raise an alarm after a dangerous phenomenon is detected.

Fig. 14.2 Monitoring preprocessing flow chart
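For illustration, the flow just described (initialize parameters, acquire and decode the stream, create an image processing thread, capture evidence and alarm) can be sketched as below. The stream source, the analysis rule and the alarm action are hypothetical placeholders and not part of the system described in this chapter.

```python
import threading
import queue
import cv2  # OpenCV is assumed here; the chapter does not name a specific library

STREAM_URL = 0          # hypothetical source: 0 = local camera, or an RTSP URL string
FRAME_QUEUE = queue.Queue(maxsize=64)

def processing_thread():
    """Consume decoded frames and run a placeholder danger check."""
    while True:
        frame = FRAME_QUEUE.get()
        if frame is None:                      # sentinel: stop the thread
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 80, 160)       # placeholder analysis step
        if edges.mean() > 30:                  # hypothetical "dangerous phenomenon" rule
            cv2.imwrite("evidence.jpg", frame)  # capture and store evidence
            print("ALARM: abnormal catenary state suspected")

def run_monitoring():
    cap = cv2.VideoCapture(STREAM_URL)         # Step 1: initialize related parameters
    worker = threading.Thread(target=processing_thread, daemon=True)
    worker.start()                             # Step 3: create image processing thread
    while cap.isOpened():
        ok, frame = cap.read()                 # Step 2: acquire and decode the data stream
        if not ok:
            break
        if not FRAME_QUEUE.full():
            FRAME_QUEUE.put(frame)
    FRAME_QUEUE.put(None)
    worker.join()
    cap.release()

if __name__ == "__main__":
    run_monitoring()
```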

14.3.3 Monitoring Terminal Energy Consumption Analysis The power of the monitoring terminal needs to be self-sufficient. If the terminal's power consumption is too large, it will directly affect the continuity and stability of the power supply. In response to this problem, Table 14.1 summarizes the power consumption and daily energy consumption of the main components of the monitoring terminal under all-day operation. The daily energy consumption is calculated with (14.1) for a device that alternates between two operating states and with (14.2) for a device that runs all day, where U is the working voltage, I is the working current, and t is the active time of the device:

Q_1 = U_{11} I_{11} t_1 + U_{12} I_{12} (24 - t_1)    (14.1)

Q = U \times I \times t    (14.2)

By calculating the power consumption and energy consumption of the major modules of the monitoring terminal with these formulas, the all-day power consumption of the STM32 is 0.414 W and its daily energy consumption is 9.936 Wh; the positioning module consumes 0.162 W and 3.888 Wh per day; the laser sensor consumes 1.08 W and 25.92 Wh; and the communication module consumes 0.40 W and 9.6 Wh.


Table 14.1 Monitoring terminal daily electricity consumption

Device type             Operating voltage (V)   Working current (mA)   Power consumption (W)   Energy consumption (Wh)
STM32                   3.6                     115                    0.414                   9.936
Temperature sensor      3.6                     2                      0.0072                  0.1728
Positioning module      3.6                     45                     0.162                   3.888
Laser sensor            27                      40                     1.08                    25.92
Communication module    16                      25                     0.40                    9.6

Analyzing the data in Table 14.1 shows that when the monitoring terminal runs all day, the daily power consumption is too large and an energy-saving design is required. Figure 14.3 shows the energy consumption ratio of each device more intuitively. According to actual needs, the monitoring terminal does not need to collect data most of the time. Therefore, from the perspective of energy saving, the terminal only needs to enter the working state when data is collected and can remain in a low-energy-consumption state at other times. It can be seen from Fig. 14.3 that the main energy-consuming devices of the monitoring terminal are the laser sensor, the STM32 and the communication module; the laser sensor accounts for 52%, the highest proportion, followed by the STM32 and the communication module.

Fig. 14.3 The energy consumption ratio of the main components of the monitoring terminal
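As a cross-check of Table 14.1 and Fig. 14.3, the following sketch recomputes the all-day power and energy figures from Eq. (14.2) using the voltages and currents listed in the table; it is illustrative only and not part of the monitoring software.

```python
# Daily energy budget of the monitoring terminal, Q = U * I * t (Eq. 14.2), t = 24 h.
# (voltage V, current mA) pairs are taken from Table 14.1.
devices = {
    "STM32":                (3.6, 115),
    "Temperature sensor":   (3.6, 2),
    "Positioning module":   (3.6, 45),
    "Laser sensor":         (27.0, 40),
    "Communication module": (16.0, 25),
}

energy_wh = {}
for name, (u_volt, i_ma) in devices.items():
    power_w = u_volt * i_ma / 1000.0           # power in W
    energy_wh[name] = power_w * 24.0           # daily energy in Wh, Eq. (14.2)
    print(f"{name:22s} P = {power_w:.4f} W, Q = {energy_wh[name]:.4f} Wh")

total = sum(energy_wh.values())
for name, q in energy_wh.items():
    print(f"{name:22s} share = {100 * q / total:.1f} %")   # laser sensor ~52 %
```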


14.4 System Implementation 14.4.1 Monitoring Terminal Energy-Saving Mode The low-power design of the main controller mainly consists of shutting down the unnecessary power supply areas inside the STM32 when it does not need to work. Through programming, the STM32 can actively enter stop mode when it is not working. The current of the STM32 in stop mode is only about 20 µA, which greatly reduces the main controller's power and energy consumption. The communication module can save energy by reducing the power consumption in the sending state and increasing the sleep time. The power consumption of the communication module in the sending state is mainly related to the amount of data transmitted and the transmission frequency: the larger the data volume or the higher the transmission frequency, the more power is consumed. To reduce power consumption, the data packets that the monitoring terminal sends to the upper computer should have long transmission intervals, small data volumes and a low transmission rate. When the communication module is in the dormant state, it no longer receives or sends data and its energy consumption is lowest, so the communication module can be kept dormant for long periods to reduce power consumption. When the catenary monitoring system adopts the energy-saving design, the monitoring terminal collects data every 60 min by default. When data needs to be collected, the main controller enters the working mode for 6 min and stays in stop mode for the rest of the cycle. Therefore, the main controller works for 2.4 h a day by default, with the operating current and voltage shown in Table 14.1; the rest of the time it is dormant, with a current consumption of about 20 µA. Using (14.1), the daily energy consumption of the main controller is obtained as 0.994 Wh. The working time of the laser sensor in each cycle is 4 min, and the relay disconnects its power supply for the rest of the time. Therefore, the laser sensor works 1.6 h per day, and its daily energy consumption calculated with (14.2) is 1.728 Wh. The working time of the communication module in each cycle is 6 min, and it sleeps for the rest of the time; the sleep energy consumption is negligible relative to the working energy consumption. Therefore, the communication module works 2.4 h per day, and its daily energy consumption is 0.96 Wh. To visualize the energy-saving effect, Fig. 14.4 uses a bar graph to compare the daily energy consumption of the main controller, laser sensor and communication module before and after the energy-saving design. It can be seen from the figure that after adopting the energy-saving design, the daily energy consumption of the main controller, the laser sensor and the communication module is 0.994 Wh, 1.728 Wh and 0.96 Wh respectively. The energy consumption is reduced, which further verifies the method's effectiveness.


Fig. 14.4 Comparison of device energy consumption before and after energy saving
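The duty-cycled figures compared in Fig. 14.4 follow directly from Eqs. (14.1) and (14.2); the short sketch below reproduces them, with the 60-min cycle, per-cycle working times and 20 µA stop-mode current taken from the text above.

```python
# Energy-saving mode: 24 one-hour cycles per day.
# Main controller (STM32): 6 min working per cycle, stop mode otherwise (Eq. 14.1).
t_work = 24 * 6 / 60.0                              # 2.4 h of working time per day
q_mcu = 3.6 * 0.115 * t_work + 3.6 * 20e-6 * (24 - t_work)
print(f"Main controller: {q_mcu:.3f} Wh/day")       # ~0.995 Wh (the text rounds to 0.994 Wh)

# Laser sensor: 4 min per cycle, relay cuts power otherwise (Eq. 14.2).
q_laser = 27 * 0.040 * (24 * 4 / 60.0)
print(f"Laser sensor:    {q_laser:.3f} Wh/day")     # 1.728 Wh

# Communication module: 6 min per cycle, sleep consumption neglected (Eq. 14.2).
q_comm = 16 * 0.025 * t_work
print(f"Comm. module:    {q_comm:.3f} Wh/day")      # 0.960 Wh
```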

Table 14.2 System performance test table

Concurrency                   20      60      150     300
Average response time (s)     0.025   0.135   1.153   1.987
Denial of service rate (%)    0       0       0.8     1.7

14.4.2 System Performance Test After the functional module tests of the system are completed, the overall performance of the system needs to be tested. The monitoring system's performance is tested with load-testing software that simulates concurrent client access to the system. It can be seen from Table 14.2 that as the number of concurrent client users increases, the system response time becomes longer; still, the response time stays essentially within 2 s, which meets the clients' needs, so the monitoring system performs well and can meet daily use.
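A minimal sketch of such a concurrency test is shown below, assuming a hypothetical HTTP endpoint exposed by the upper computer (the chapter does not name the load-testing software that was used); it measures the average response time and the denial-of-service rate at the concurrency levels of Table 14.2.

```python
import time
from concurrent.futures import ThreadPoolExecutor
import urllib.request

URL = "http://127.0.0.1:8080/status"   # hypothetical upper-computer endpoint (must be running)

def one_request(timeout=5.0):
    start = time.perf_counter()
    try:
        urllib.request.urlopen(URL, timeout=timeout).read()
        return time.perf_counter() - start, True
    except Exception:
        return time.perf_counter() - start, False

def load_test(concurrency):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: one_request(), range(concurrency)))
    times = [t for t, ok in results if ok]
    denied = sum(1 for _, ok in results if not ok)
    avg = sum(times) / len(times) if times else float("nan")
    print(f"{concurrency:4d} clients: avg response {avg:.3f} s, "
          f"denial rate {100 * denied / concurrency:.1f} %")

for n in (20, 60, 150, 300):            # concurrency levels from Table 14.2
    load_test(n)
```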

14.5 Conclusion This paper studies the catenary monitoring system based on image processing technology. The catenary is an important facility for ensuring the safe operation of electrified trains, so the catenary must be monitored. In the system design part, this paper designs the upper computer of the catenary monitoring system, explains the monitoring preprocessing process, and analyzes the energy consumption of the monitoring terminal. In the system implementation part, the monitoring terminal energy-saving strategy is proposed and the energy consumption before and after it is compared; it is found that the energy-saving mode reduces the energy consumption. This paper still has deficiencies that need to be corrected, but catenary monitoring systems based on image processing technology remain a research direction worth pursuing.

References
1. B. Firlik, M. Tabaszewski, Monitoring of the technical condition of tracks based on machine learning. Proc. Inst. Mech. Eng. F J. Rail Rapid Transit 234, 702–708 (2020)
2. S. Derosa, P. Nåvik, A. Collina, A. Rønnquist, Railway catenary tension force monitoring via the analysis of wave propagation in cables. Proc. Inst. Mech. Eng. F J. Rail Rapid Transit 235, 494–504 (2021)
3. S.-X. Chen, L. Zhou, Y.-Q. Ni, X.-Z. Liu, An acoustic-homologous transfer learning approach for acoustic emission–based rail condition evaluation. Struct. Health Monit. 20, 2161–2181 (2021)
4. B.-R. Lin, Series-connected high frequency converters in a DC microgrid system for DC light rail transit. Energies 11, 266 (2018)
5. A.C. Nelson, R. Hibberd, Influence of rail transit on development patterns in the mountain mega-region with a surprise and implications for rail transit and land-use planning. Transp. Res. Rec. 2675, 374–390 (2021)
6. J. Liu, P.M. Schonfeld, A. Li, Q. Peng, Y. Yin, Effects of line-capacity reductions on urban rail transit system service performance. J. Transp. Eng. A Syst. 146, 04020118 (2020)
7. V. Baranov, I. Vikulov, A. Kiselev, A. Maznev, Monitoring system of electrodynamic braking of DC electric train with collector traction motors. Russ. Railw. Sci. J. 77, 301–309 (2018)
8. M. Karakose, O. Yaman, Complex fuzzy system based predictive maintenance approach in railways. IEEE Trans. Ind. Inf. 16, 6023–6032 (2020)
9. S. Xu, C.-Y. Ji, C. Guedes Soares, Experimental study on effect of side-mooring lines on dynamics of a catenary moored semi-submersible system. Proc. Inst. Mech. Eng. M J. Eng. Marit. Environ. 234, 127–142 (2020)
10. F. Vesali, H. Molatefi, M.A. Rezvani, B. Moaveni, M. Hecht, New control approaches to improve contact quality in the conventional spans and overlap section in a high-speed catenary system. Proc. Inst. Mech. Eng. F J. Rail Rapid Transit 233, 988–999 (2019)

Chapter 15

Ultrasonic Signal Processing Method for Transformer Oil Based on Improved EMD Yihua Qian, Qing Wang, Yaohong Zhao, Dingkun Yang, and Zhuang Yang

Abstract In order to improve the accuracy of transformer oil quality assessment based on ultrasonic characteristics and to guarantee the safe operation of transformers, it is necessary to process the raw ultrasonic signals. In this paper, an improved empirical mode decomposition (EMD) based ultrasonic signal processing method for transformer oil is proposed. Firstly, Gaussian white noise is added to the collected oil ultrasonic signal, and then its EMD decomposition is performed. Then the correlation coefficient between each intrinsic mode function (IMF) and the original signal is calculated using the Pearson correlation coefficient method. Finally, the IMFs with high correlation coefficients are selected for signal reconstruction. The signal-to-noise ratio of the reconstructed signal is 18.312 and its correlation coefficient with the original signal is 0.9612. In addition, the moisture in oil is simply evaluated by using the reconstructed signal. The results show that the detection accuracy of moisture in oil based on the reconstructed signal is 3% higher than that based on the original signal. Keywords Ultrasonic signal · Signal processing · Transformer oil · Empirical mode decomposition


15.1 Introduction Monitoring the quality status of transformer oil is an important means of ensuring safe transformer operation [1]. Ultrasonic detection technology is a new and effective method for monitoring the quality of transformer oil [2]. However, the ultrasonic signal of the oil contains not only the characteristic signals related to the quality of the transformer oil, but also many interference signals [3]. Therefore, processing the ultrasonic signals to filter out interference and retain only the acoustic signature signals related to oil quality information is important for achieving effective oil quality assessment with ultrasonic technology. For the measured oil ultrasonic signal, the effective components need to be extracted and utilized, and the useless noise components need to be eliminated in order to guarantee the reliability of the subsequent signal processing and the accuracy of the evaluation [4]. Besides the zero drift of the sensor and other detection components, interference from external electromagnetic fields and environmental factors means that the invalid components of the actually measured signal often have no clear physical meaning and are random in nature [5]. This brings many difficulties to the accurate extraction and elimination of noise and other invalid components, and a series of problems remain to be studied. The ultrasonic signal is a non-stationary, non-linear, abruptly changing signal, and if the ultrasonic echo is studied only with spectrum-analysis methods, reliable signal processing results cannot be obtained [6]. Time-frequency analysis methods allow the signal to be resolved with sufficient resolution in both the time domain and the frequency domain, which is more conducive to the analysis and processing of non-stationary signals such as speech and sound waves [7]. In recent years, time-frequency analysis methods have often been applied in signal processing research such as noise reduction and feature extraction of ultrasonic signals. The authors in [8] proposed an EMD algorithm based on the wavelet packet transform and applied it to the processing of ultrasonic signals in engineering flaw detection, achieving good noise elimination performance, but the method also suffers from signal energy leakage. The authors in [9] proposed a low-rank and joint-sparse representation model to process the image signal in the ultrasonic signal. The authors in [10] remove clutter using deep unfolded robust PCA, which has been applied in medical imaging. However, few researchers have studied processing methods for the ultrasonic signal of transformer oil in depth. Even though the studies in [11, 12] have considered the feature extraction of the ultrasonic signal of transformer oil, because the signals are not filtered before extraction, a large amount of redundant information remains. In this paper, an improved empirical mode decomposition based ultrasonic signal processing method for transformer oil is proposed. Firstly, an ultrasonic detection platform is used to measure the transformer oil and obtain the original ultrasonic signal. Then, the original signal is decomposed using the ensemble empirical mode decomposition (EEMD) method to obtain multiple intrinsic mode function (IMF) components. Finally, the invalid IMF components are screened out using the Pearson correlation coefficient method, and the valid IMF components are reconstructed to obtain the reconstructed ultrasonic signal. In addition, the moisture in oil is simply evaluated using the reconstructed signal. The experimental results show that this method can effectively eliminate noise and retain the real information of the waveform, so as to improve the ultrasonic voiceprint feature extraction and state detection of oil.

15.2 Tests and Methods 15.2.1 Ultrasonic Testing Ultrasonic inspection technology obtains the physical and chemical properties of the measured medium by exploiting the relationship between the acoustic parameters of the medium and the quantity to be measured. The main sensing methods of ultrasonic detection are the through-transmission method and the reflection method. This study uses the through-transmission method, in which two ultrasonic transducers are located on either side of the measured substance: one transducer emits an ultrasonic signal, and the other receives the signal after it has propagated through the measured substance. In this process, since the ultrasonic waves propagate through the measured substance, some of its properties affect the attenuation characteristics of the ultrasonic signal. Therefore, by analyzing the difference between the ultrasonic signal at the transmitting end and at the receiving end, information on the characteristics of the substance being measured can be obtained. However, due to external factors, the measured signal inevitably contains interference; processing the acquired ultrasonic signals is therefore essential. The ultrasonic testing platform adopted in this paper is shown in Fig. 15.1. Its main components are the ultrasonic control unit, the ultrasonic sensing unit and the data processing unit. During detection, the signal generator in the ultrasonic control unit sends out a set of sinusoidal waves with a frequency of 2 MHz and an amplitude of 200 mV, which are then transmitted to the transducer in the ultrasonic sensing unit through the amplifier in the control unit. The ultrasonic signals passing through the oil are received by another sensor, and the signals received by the data processing unit are collected and analyzed by the computer.

Fig. 15.1 Ultrasonic signal detection platform

15.2.2 Ultrasonic Signal Processing Method Based on Improved EMD The EMD method is an adaptive data processing method proposed by Huang in 1998 [13]. It decomposes nonlinear and non-stationary signals to extract IMFs, but it is prone to modal aliasing.


To avoid this problem, this paper proposes an ultrasonic signal processing method based on EEMD [14] and Pearson correlation analysis. The specific steps are as follows:

Step 1: Add Gaussian white noise to the original signal x(t), where the Gaussian white noise follows a normal distribution and the number of ensemble trials is set to I. This gives x_i(t) = x(t) + wgn_i(t), where i = 1, 2, …, I, wgn_i(t) is the white Gaussian noise added in the ith trial, and x_i(t) is the signal with the ith added noise. Each x_i(t) is then decomposed by EMD:

x_i(t) = \sum_{j=1}^{J} c_{i,j}(t) + r_{i,J}(t)    (15.1)

where c_{i,j}(t) is the jth IMF obtained after adding white noise in the ith trial, r_{i,J}(t) is the residual component, and J is the number of IMFs obtained. The above steps are repeated I times.

Step 2: Average the IMFs obtained over all trials to obtain the true component c_j(t):

c_j(t) = \frac{1}{I} \sum_{i=1}^{I} c_{i,j}(t)    (15.2)

where c_j(t) is the jth IMF component of the EEMD decomposition.

Step 3: Use the Pearson correlation coefficient method to calculate the correlation coefficients between the IMF components and the original signal, and screen the IMF components that are correlated with the original signal. The correlation measure R between an IMF component and the original signal is

R(j) = \frac{1}{T} \sum_{t=1}^{T} x(t) c_j(t)    (15.3)

where R(j) is the correlation between the jth IMF component c_j(t) and the original signal x(t), and T is the length of the signal.

Step 4: Eliminate the IMF components whose correlation coefficient R is less than the threshold 0.2, and then sum the remaining IMF components to reconstruct the filtered ultrasonic signal.

Step 5: Use the signal-to-noise ratio (SNR) to evaluate the filtered signal. It is calculated as

SNR = 10 \lg \frac{\sum_{t=1}^{T} x_t^{2}}{\sum_{t=1}^{T} (x_t - X_t)^{2}}    (15.4)

where X_t denotes the reconstructed (filtered) signal.
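For illustration, Steps 1–5 can be sketched as follows. The decomposition relies on the third-party PyEMD package (an assumption — the chapter does not name an implementation, and PyEMD's class and argument names may differ between versions); the screening uses the Pearson correlation coefficient of Step 3, the summation of Step 4, and the SNR of Eq. (15.4), and the 2 MHz test signal is a synthetic stand-in rather than measured oil data.

```python
import numpy as np
from PyEMD import EEMD   # assumed third-party package ("EMD-signal"); API may vary by version

def eemd_denoise(x, trials=50, noise_width=0.05, r_threshold=0.2):
    """Steps 1-4: EEMD decomposition, Pearson screening, reconstruction."""
    eemd = EEMD(trials=trials, noise_width=noise_width)   # Steps 1-2: ensemble of noisy EMDs
    imfs = eemd.eemd(x)                                   # rows are the averaged IMFs c_j(t)
    keep = [c for c in imfs
            if abs(np.corrcoef(x, c)[0, 1]) >= r_threshold]  # Steps 3-4: screen at threshold 0.2
    return np.sum(keep, axis=0) if keep else np.zeros_like(x)

def snr_db(x, x_rec):
    """Eq. (15.4): SNR of the reconstructed signal in dB."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_rec) ** 2))

if __name__ == "__main__":
    fs = 10e6                                             # assumed sampling rate
    t = np.arange(1000) / fs                              # 1000 sampling points, as in Sect. 15.3
    x = np.sin(2 * np.pi * 2e6 * t) + 0.3 * np.random.randn(t.size)  # synthetic stand-in signal
    x_rec = eemd_denoise(x)
    print(f"SNR of reconstructed signal: {snr_db(x, x_rec):.3f} dB")
```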

15.3 Results and Discussion In this paper, the ultrasonic platform described in Sect. 15.2.1 is first used for ultrasonic testing of transformer oil samples, and data of 1000 sampling points are intercepted by the computer for subsequent analysis. The original ultrasonic signal is shown in Fig. 15.2. The standard deviation of the white noise is set to 0.05, and the number of ensemble trials is I = 50. The measured ultrasonic signal is then decomposed by Steps 1 and 2 described in Sect. 15.2.2, and 7 IMF components and 1 residual component are obtained, as shown in Fig. 15.3. For the EEMD decomposition results of the original ultrasonic signal, the Pearson correlation coefficient method is used to obtain the correlation coefficient between each IMF component and the original signal, and the results are shown in Table 15.1. As shown in Table 15.1, IMF2 has the highest correlation with the original signal. Besides IMF2, only IMF3 and IMF4 have correlation coefficients exceeding 0.2 with the original signal. Comparing Figs. 15.2 and 15.3, it can be seen that IMF2, IMF3 and IMF4 are closer to the original information, and the rest of the components can be considered noise or invalid components.

Fig. 15.2 Original ultrasonic signal


Fig. 15.3 EEMD decomposition results of ultrasonic signal

The filtered signal is acquired by removing the noise and invalid components and summing and reconstructing the remaining components. The result is shown in Fig. 15.4. As can be seen from Fig. 15.4, the proposed method provides effective filtering of the ultrasonic signal. The SNR of the filtered ultrasonic signal is 18.312, and the correlation coefficient R between the filtered ultrasonic signal and the original signal is 0.9612.

Table 15.1 Correlation coefficients of signal components

IMF component   Correlation coefficient
IMF1            0.1743
IMF2            0.9431
IMF3            0.2816
IMF4            0.2158
IMF5            -0.0061
IMF6            -0.0018
IMF7            -0.0017
r               -0.0002


Fig. 15.4 Original and reconstructed signal of the ultrasonic signal

In addition, an in-depth study was conducted to further verify that the reconstructed transformer oil ultrasonic signal is more conducive to transformer oil quality assessment than the original signal. First, transformer oil samples with different moisture contents were prepared by vacuum drying and humidification. Then ultrasonic testing and moisture testing were performed on each oil sample. To ensure the authenticity and reliability of the data, the moisture testing strictly followed the Karl Fischer titration method, and the ultrasonic testing was performed at a constant temperature (26 °C) in a sealed environment. The oil ultrasonic signals were then processed using the method proposed in this paper. Support vector regression (SVR) [15] was introduced, with the two kinds of ultrasonic signals taken as input and the water content in oil as output, to construct a prediction model of water content in oil based on the original ultrasonic signal and a prediction model based on the reconstructed ultrasonic signal. Finally, the comparison of the prediction accuracies of the two models demonstrates whether the reconstructed signals can improve the accuracy of transformer oil quality assessment. Given a sample set (x_1, y_1), …, (x_n, y_n), x_i ∈ R^m, y_i ∈ R, the SVR model can be expressed as

f(x) = \omega^{T} \varphi(x) + b    (15.5)

where \omega = (\omega_1, …, \omega_m) is the normal vector, b is the bias, and \varphi is a nonlinear mapping that maps the sample input into a high-dimensional feature space. Solving for the optimal hyperplane can be transformed into the following optimization problem:

\min_{\omega, b, \xi^{\pm}} \; \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{n} (\xi_i^{+} + \xi_i^{-})    (15.6)

\text{s.t.} \quad \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{+}, \quad y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i^{-}, \quad \xi_i^{+}, \xi_i^{-} \ge 0    (15.7)

where C is the regularization factor, \xi_i^{+} and \xi_i^{-} are slack variables, and \varepsilon is the insensitivity loss factor. For this linearly constrained quadratic programming problem, the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient. The solution obtained from its dual problem is a linear combination of the support vectors:

\omega = \sum_{i \in S.V.} \beta_i \varphi(x_i)    (15.8)

f_{\omega,b}(x) = \sum_{i \in S.V.} \beta_i \langle \varphi(x_i), \varphi(x) \rangle + b    (15.9)

where \langle \varphi(x_i), \varphi(x) \rangle is a valid kernel function. In this paper, radial basis functions are used as the kernel. The oil ultrasonic signal is processed using the method proposed in this paper, and Fig. 15.5 shows the reconstructed ultrasonic signals of transformer oil with five different moisture contents.

Fig. 15.5 Ultrasonic reconstruction signal of transformer oil with different moisture
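A minimal sketch of such a prediction model is given below, assuming scikit-learn's `SVR` with an RBF kernel (the chapter does not state which SVR implementation or hyperparameters were used); the random arrays are placeholders standing in for the ultrasonic signals and measured moisture values.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: one row per oil sample, columns = sampled ultrasonic signal (original or reconstructed)
# y: measured moisture in oil (mL/L), e.g. from Karl Fischer titration
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 1000))          # placeholder for 110 signals of 1000 points
y = rng.uniform(2, 23, size=110)          # placeholder moisture values

model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=10.0, epsilon=0.1))  # RBF kernel, as in Eq. (15.9)
model.fit(X[:100], y[:100])               # 100 training sets, 10 test sets, as in the chapter
pred = model.predict(X[100:])

err_pct = 100 * np.abs(pred - y[100:]) / y[100:]
print("max error %:", err_pct.max(), " min error %:", err_pct.min())
```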

Table 15.2 The moisture of the five oil samples in Fig. 15.5

Oil sample   Moisture (mL/L)
#1           21.35
#2           17.32
#3           11.25
#4           6.01
#5           2.59

As can be seen from Fig. 15.5 and Table 15.2, the reconstructed ultrasonic signal effectively filters out redundant information such as noise, thus eliminating its influence on the detection of micro-water content in transformer oil. It can also be seen from the figure that the amplitude and sound velocity of the ultrasonic signal differ for transformer oils with different micro-water contents. The amplitude is affected most obviously by the moisture and decreases as the moisture increases. In the verification in this paper, a total of 110 oil samples from different real transformers were collected, of which 100 were randomly selected as the training set and the remaining 10 as the test set. In order to ensure that all ultrasonic characteristic parameters related to transformer oil moisture are included, this paper uses the whole ultrasonic signal as input to establish the model for detecting moisture in transformer oil. The predicted results are shown in Table 15.3.

Table 15.3 Prediction of moisture of transformer oil

Number   True value (mL/L)   Original signal (mL/L)   Reconstructed signal (mL/L)
1        14.3                13.1                     13.56
2        11.32               12.26                    10.52
3        16.48               14.95                    15.63
4        6.44                5.98                     5.98
5        11                  12.14                    10.25
6        2.29                2.67                     2.55
7        4.22                3.79                     3.99
8        7.49                6.44                     6.77
9        22.16               20.01                    21
10       18.43               20.11                    17.02


As can be seen from Table 15.3, the maximum error percentage of the moisture prediction based on the reconstructed ultrasonic signal is 11% and the minimum is 5%, while the maximum error percentage based on the original ultrasonic signal is 16% and the minimum is 7%. The prediction accuracy of moisture in transformer oil based on the reconstructed ultrasonic signal is therefore about 3% better than that based on the original signal. The results show that the proposed ultrasonic signal processing method for transformer oil can eliminate invalid information while retaining valid information, and that the reconstructed signal is more conducive to transformer oil quality assessment.
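The error percentages quoted above can be recomputed directly from Table 15.3; the snippet below does so as a quick consistency check (it is not part of the original experiment).

```python
import numpy as np

true_v  = np.array([14.3, 11.32, 16.48, 6.44, 11, 2.29, 4.22, 7.49, 22.16, 18.43])
orig_v  = np.array([13.1, 12.26, 14.95, 5.98, 12.14, 2.67, 3.79, 6.44, 20.01, 20.11])
recon_v = np.array([13.56, 10.52, 15.63, 5.98, 10.25, 2.55, 3.99, 6.77, 21, 17.02])

err_orig  = 100 * np.abs(orig_v - true_v) / true_v
err_recon = 100 * np.abs(recon_v - true_v) / true_v
# The chapter reports max 16 %, min 7 % for the original signal
print(f"original signal:      max {err_orig.max():.1f} %, min {err_orig.min():.1f} %")
# The chapter reports max 11 %, min 5 % for the reconstructed signal
print(f"reconstructed signal: max {err_recon.max():.1f} %, min {err_recon.min():.1f} %")
```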

15.4 Conclusion In order to extract the effective information in the oil ultrasonic signal and improve the accuracy of oil quality assessment based on it, this paper uses the EEMD algorithm to decompose the original ultrasonic signal of transformer oil, applies Pearson correlation coefficient analysis to screen the decomposed signal components, and reconstructs the useful components to obtain the filtered ultrasonic signal. The results show that the SNR of the filtered ultrasonic signal is 18.312 and the correlation coefficient between the filtered ultrasonic signal and the original signal is 0.9612. Further study verifies that the prediction accuracy of the moisture prediction model based on the reconstructed signal is 3% higher than that of the model based on the original signal. The results show that the method proposed in this paper can effectively eliminate the interference information in the oil ultrasonic signal and improve the accuracy of monitoring transformer oil quality using the oil ultrasonic signal. The work lays a foundation for online monitoring of oil quality based on ultrasonic signals and provides a new idea for future online monitoring technology in the power industry. Acknowledgements Project supported by the Science and Technology Program of China Southern Power Grid Co., Ltd. (GDKJXM20210087).

References
1. Z. Yang, W. Chen, D. Yang, R. Song, A novel recognition method of aging stage of transformer oil-paper insulation using Raman spectroscopic recurrence plots. IEEE Trans. Dielectr. Electr. Insul. 29, 1152–1159 (2022)
2. G.Z. Wang, D.H. Fu, F. Du, H. Yu, R. Cai, Q. Wang, Y.P. Gong, Transformer fault voiceprint recognition based on repeating pattern extraction and gaussian mixture model. Guangdong Electr. Power 36, 126–134 (2023)
3. Z. Yang, Q. Zhou, X. Wu, Z. Zhao, C. Tang, W. Chen, Detection of water content in transformer oil using multi frequency ultrasonic with PCA-GA-BPNN. Energies 12, 1379 (2019)
4. B. Li, Q. Zhou, Y. Liu, J. Chen, A novel nondestructive testing method for dielectric loss factor of transformer oil based on multifrequency ultrasound. IEEE Trans. Dielectr. Electr. Insul. 29, 1659–1665 (2022)
5. W. He, Y. Hao, Z. Zou, Application and prospect of ultrasonic testing technology to stress in basin insulators for GIS. Guangdong Electr. Power 34, 13–20 (2021)
6. Q. Ke, W. Li, W. Wang, S. Song, Correction method with stress field effects in ultrasound nondestructive testing. Nondestruct. Test. Eval. 37, 277–296 (2022)
7. K.M. Tant, A.J. Mulholland, M. Langer, A. Gachagan, A fractional Fourier transform analysis of the scattering of ultrasonic waves. Proc. R. Soc. A Math. Phys. Eng. Sci. 471, 20140958 (2015)
8. Y. Zhang, X. Xu, Application of wavelet packet-based EMD algorithm in ultrasonic echo signal de-noising. J. Nanjing Univ. Inf. Sci. Technol. Nat. Sci. Ed. 4, 442 (2012)
9. M. Zhang, I. Markovsky, C. Schretter, J. D'hooge, Compressed ultrasound signal reconstruction using a low-rank and joint-sparse representation model. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 66, 1232–1245 (2019)
10. O. Solomon, R. Cohen, Y. Zhang, Y. Yang, Q. He, J. Luo, R.J. van Sloun, Y.C. Eldar, Deep unfolded robust PCA with application to clutter suppression in ultrasound. IEEE Trans. Med. Imaging 39, 1051–1063 (2019)
11. Z. Yang, Q. Zhou, Y.H. Zhao, X.D. Wu, C. Tang, Prediction of interfacial tension of transformer oil based on artificial neural network and multi-frequency ultrasonic testing technology. High Volt. Eng. 45, 3343–3349 (2019)
12. Z. Yang, Q. Zhou, X. Wu, Z. Zhao, A novel measuring method of interfacial tension of transformer oil combined PSO optimized SVM and multi frequency ultrasonic technology. IEEE Access 7, 182624–182631 (2019)
13. N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.-C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. 454, 903–995 (1998)
14. L. Wang, Y. Shao, Fault feature extraction of rotating machinery using a reweighted complete ensemble empirical mode decomposition with adaptive noise and demodulation analysis. Mech. Syst. Signal Process. 138, 106545 (2020)
15. H. Yu, J. Lu, G. Zhang, An online robust support vector regression for data streams. IEEE Trans. Knowl. Data Eng. 34, 150–163 (2020)

Chapter 16

Research on UHV Transmission Line Selection Strategy Aided by Satellite Remote Sensing Image Wei Du, Guozhu Yang, Chuntian Ma, Enhui Wei, and Chao Gao

Abstract The core of power transmission line planning and design is the selection of the transmission line path scheme, while ultra-high voltage (UHV) overhead transmission lines span long distances and diverse geographical environments, which involves many influencing factors in the optimization. In this paper, taking UHV transmission line selection as an example, the uncontrolled and controlled positioning accuracies of satellite RS (remote sensing) images are analyzed and compared, and the adjustment accuracy is statistically analyzed for the cases without and with control points. The research shows that, after adding control points, the regional network adjustment accuracy of the DSM reaches 2.71 m in plane and 2.91 m in elevation, which meets the 1:10,000-scale elevation accuracy requirements in mountainous and alpine areas. The GA (genetic algorithm) constructed in this study can effectively bypass avoidance areas such as residential areas, disaster points, nature reserves, scenic spots and ecologically sensitive areas when optimizing UHV transmission line selection, which greatly improves the accuracy and level of the initial design of UHV transmission line selection. Keywords RS images · UHV transmission · Line selection

16.1 Introduction The rapid development of UHV power grid has led to the research and development of a series of new technologies in power grid construction, involving various stages of power grid construction feasibility study, design, construction and operation. Satellite RS (remote sensing) technology is widely used in various fields of national production because of its strong macro-scale, large scale, short cycle, low cost, ability to reflect dynamic changes, and less restriction by ground conditions
[1]. By using high-tech means such as satellite photos, aerial photos, GPS system and all-digital photogrammetry technology, the route scheme is optimized, and the survey and design data are obtained, so as to realize the optimization of the whole process of power transmission line from planning, survey and design, tower arrangement design, so as to reduce the project cost, protect the environment and improve the survey and design level of power transmission projects [2, 3]. RS technology can real-time, quickly and dynamically extract the characteristics of geology, landform and topography along the transmission line, and provide a basis for the selection and establishment of the line. The core of power grid transmission line planning and design is the selection of transmission line path scheme. However, the ultra-high voltage overhead transmission line spans a long distance, and the geographical environment is diverse, and there are many influencing factors involved in optimization. It is difficult for designers to weigh the influence of each factor in a short time to achieve the economic, safety and environmental protection goals of the planned path. Using satellite RS technology, the ground elevation model is extracted by stereo image pair analysis, and its spatial resolution is 1 arc second. Overhead transmission lines belong to long-distance linear projects, and satellite RS has obvious application advantages in power line selection because of its characteristics of large-scale synchronous observation, high timeliness and all-weather work, and there have been many different levels of application research [4]. Reference [5] puts forward the method of using RS technology to assist survey work. Literature [6] uses artificial neural network and GIS to simulate soil erosion rate and spatial change. Reference [7] applies simulated annealing algorithm, GA (genetic algorithm) and PSO (Particle Swarm Optimization) to the spatial optimization of multi-objective land allocation. By introducing artificial intelligence into the field of spatial information processing, the spatial analysis ability of GIS (Geographic Information System) is improved and the model of intelligent spatial data processing is constructed. Transmission line design is to plan an optimal transmission line from power plant to substation according to certain principles. The function of GIS spatial analysis and the ability of RS to provide rich geospatial information can realistically reflect the topography, topography and geomorphology of the research area, with good visual effect, giving analysts a sense of presence and providing a virtual environment for their analysis and handling of problems. It can comprehensively improve the level of engineering construction and management, and ensure the scientificity of engineering design scheme, timeliness of construction safety control, economy of engineering management and scientificity of command and decision [8, 9]. In this paper, the representative stereo mapping satellites among domestic RS satellites are studied, and their mapping accuracy under uncontrolled and controlled conditions are analyzed respectively. Combined with the requirements of mapping scale in each stage of power line selection in relevant specifications, the stage applicability of multi-source domestic satellite data accuracy in UHV transmission line selection strategy is discussed.


16.2 Research Method 16.2.1 Data Processing A satellite RS photograph is an image of the ground obtained by an Earth-observation satellite from orbit. It has all the characteristics of aerial photographs, but it is far superior to aerial photographs in terms of updating, because satellites observe and measure the Earth continuously and the imagery only needs to be collected from the relevant units when required, which eliminates the preparatory work necessary for aerial surveys. According to the path scheme determined by the design specialty in the feasibility study stage and the requirements of the design specialty, the range for acquiring satellite photos is set to within 30 km on both sides of the path center line [10]. Because a satellite image can only reflect the ground in 2D and cannot itself convey elevation information, a DEM that provides elevation support for the satellite image must be formed through external control or other means, so that the satellite image can reflect 3D information of the ground. In practical work, the planning of the line should take into account not only the topography and geomorphology along the line direction, but also the natural, cultural and social resources occupied in the line corridor and the areas where key natural disasters must be avoided, including towns and villages, planning areas, heavily polluted areas, earthquake intensity zones and ice-covered areas. In mountainous and sparsely populated areas, medium-resolution Landsat images with a spatial resolution of about 28.5 m are used for coverage [11]. Based on digital photogrammetry, satellite stereo image pairs can be used to build DEM data and to make orthophoto maps, 3D ground models and so on. Therefore, combining high-resolution multi-spectral data and satellite stereo image pair data, the features of ground objects and topography along the transmission line are automatically extracted, a GIS database is constructed, and the route optimization of the transmission line is realized by GIS spatial analysis. The data processing flow is shown in Fig. 16.1. Firstly, the forward-, nadir- and backward-view data of the ZY-3 (Resource No. 3) satellite are matched together to construct the DSM; then the relative orientation and connection points are matched, the points are manually confirmed and checked, and points with large errors are deleted for free network adjustment. After that, the control points are selected: the uncontrolled accuracy of the experimental area is checked by setting all control points as checkpoints, and then some points are used as checkpoints and the rest as control points for regional network adjustment to evaluate the controlled accuracy of the experimental area.


Fig. 16.1 Data processing flow

16.2.2 Precise Correction of RS Image With the help of RS images and a digital elevation model, high-precision landform data for the project area are collected, processed and published using UAV aerial photography, and the 3D digital landform around the project is constructed on a 3D digital earth, which also assists in planning the implementation of the patrol road. UAV aerial survey technology can also be used to build a visual construction management and control system and a 3D visual house-demolition management system, to carry out panoramic monitoring of the construction site, and to assist in clearing channels such as those for environmental and water conservation and house demolition [12]. When RS images are acquired, they are often distorted by the platform, the sensor, the curvature of the Earth and topographic relief. To correct these distortions and bring the images into a specific mapping coordinate system, geometric correction of the images is required. Initially, the contrast of the RS image is low, with few levels, poor color richness and low saturation; the fused image has the same shortcomings, so it needs to be enhanced to remove the various distortions imposed on the radiation brightness in the image data. In this paper, the fused image is mainly processed by histogram equalization, histogram normalization and piecewise linear stretching. The enhanced RS image is bright and clear, and its quality is obviously improved. Using the method of undetermined coefficients of a bivariate quadratic polynomial, nine control points are selected for geometric fine correction of each image in the three phases, and bilinear sampling is used for radiometric resampling. The root mean square error RMS_{error} of each ground control point is calculated with the following formula:

RMS_{error} = \sqrt{(x^{*} - x)^{2} + (y^{*} - y)^{2}}    (16.1)

where x, y are the coordinates of the ground control point in the original image, and x^{*}, y^{*} are the coordinates of the corresponding control point computed from the polynomial. The geometric correction of the image uses a 1:10,000 topographic map as the correction base map. First, the paper topographic map is scanned and corrected by coordinate grid in the South CASS software. The corrected digital raster map is combined with the DEM to perform segmented geometric correction on the 2.5 m SPOT5 image. The correction is carried out on the lidar and satellite RS image survey and design system. After correction, the maximum plane error is 2.4 m and the elevation error is 2.1 m. After geometric correction, the grid structure of the pixels has changed, that is, the size, shape and geometric relationship of the pixels have changed, so the output pixels must be resampled. High-order interpolation methods, such as bilinear interpolation or cubic convolution interpolation, can be chosen for resampling.
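The control-point check of Eq. (16.1), together with the bivariate quadratic polynomial fit it refers to, can be sketched as follows; the nine control-point coordinates are placeholders, not values from this project.

```python
import numpy as np

# Hypothetical ground control points: image coords (x, y) and reference map coords (X, Y)
img = np.array([[120, 80], [540, 95], [930, 70], [110, 520], [500, 500],
                [900, 480], [130, 940], [520, 950], [940, 930]], dtype=float)
ref = img + np.array([3.0, -2.0])     # placeholder "true" positions on the base map

def quad_terms(p):
    # Bivariate quadratic polynomial terms: 1, x, y, x^2, xy, y^2
    x, y = p[:, 0], p[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

A = quad_terms(img)
coef_x, *_ = np.linalg.lstsq(A, ref[:, 0], rcond=None)   # undetermined coefficients (x direction)
coef_y, *_ = np.linalg.lstsq(A, ref[:, 1], rcond=None)   # undetermined coefficients (y direction)

pred = np.column_stack([A @ coef_x, A @ coef_y])          # polynomial-predicted positions (x*, y*)
rms = np.sqrt(np.sum((pred - ref) ** 2, axis=1))          # Eq. (16.1) for each control point
print("RMS error of each control point:", np.round(rms, 3))
```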

16.2.3 Transmission Line Path Optimization In the feasibility study, initial design and construction drawing stages of UHV transmission line selection, natural and human factors such as planning areas, nature reserves, mining areas, military facilities and airports need to be reasonably avoided; at the same time, the route situation should be analyzed comprehensively by integrating geological, hydrological and crossing conditions, and various economic indicators, especially statistics and accounting of house demolition, should be compiled so as to comprehensively optimize the route scheme. It is also necessary to build an information service and guarantee system for the pipe gallery project and a monitoring and command center integrating real-time monitoring, dispatching and coordination, emergency command and display of technical results. The selection of the transmission path must fully implement the relevant national construction guidelines and technical policies, and must also fully consider the convenience of construction, the safety and reliability of line operation, and the convenience of maintenance and emergency repair. From the perspective of construction cost, the designed route of an overhead transmission line should be as short as possible, because in theory the shorter the route, the more economical the construction. From the point of view of safety and stability, the line should be built in areas with gentle terrain, good geological conditions, no icing and no pollution. From the perspective of environmental impact, the smaller the impact, the better. Transmission line planning is the search for an optimal path, where the optimal solution is a problem that needs to be solved in theory or by equations. This study uses the advantages of GA in solving complex problems to find the optimal solution, or the solution closest to optimal, that meets the requirements of power grid transmission planning and design. When GA is applied to transmission line optimization, a planar linear model of the transmission line is constructed for convenience of research, which is suitable for easy solution. The process of applying GA to solve the transmission line path planning optimization problem is shown in Fig. 16.2. Because the conditions involved in solving transmission lines are complex and the calculation amount is large, the genetic operators are designed as knowledge-based operators, and the genes are encoded with floating-point coding, because floating-point coding is convenient for designing knowledge-based genetic operators for special problems, handles complex constraints on the decision variables, can represent a wider range of numbers, obtains higher accuracy and improves the efficiency of the algorithm. The fitness function plays a decisive role in GA evolution and in finding the optimal solution. Theoretically, the shortest line segment between two points is the best, but complex terrain makes it difficult to build and maintain a line strictly between two points, and such a line may cross ecologically sensitive areas, lakes or cliffs, which makes construction and maintenance more expensive or even impossible. The optimization problem of this study is to find the optimal transmission line. The planning and optimization of the transmission line is regarded as the problem of seeking the minimum total cost, with all the factors involved converted into costs: the lower the total cost of the line, the better. According to the principle of the lowest total line cost, the fitness function is defined as

F_{\min} = \mathrm{SUM}(line.point.cost)    (16.2)

that is, the sum of the costs of all points along the candidate line.

Fig. 16.2 The process of solving the optimization problem of transmission line path planning by using GA
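The sketch below illustrates a GA of this kind on a synthetic cost raster: a candidate line is a sequence of floating-point waypoints between two fixed terminals, the fitness follows Eq. (16.2) as the summed cell cost along the line, and avoidance areas are simply assigned a very high cost. The grid size, cost values and GA parameters are illustrative assumptions, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100                                    # cost raster is N x N cells
cost = rng.uniform(1.0, 3.0, (N, N))       # base construction cost per cell
cost[40:60, 30:70] = 1e3                   # an "avoidance area" (e.g. nature reserve)
START, END = (5.0, 5.0), (95.0, 95.0)      # fixed terminals of the line
N_WAY, POP, GENS = 8, 60, 40               # waypoints per line, population size, generations

def line_cost(waypoints):
    """Eq. (16.2): sum of cell costs sampled along the polyline (longer lines cost more)."""
    pts = np.vstack([START, waypoints.reshape(-1, 2), END])
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        steps = int(np.hypot(*(b - a))) + 1
        for s in np.linspace(0.0, 1.0, steps):
            r, c = (a + s * (b - a)).astype(int)
            total += cost[min(r, N - 1), min(c, N - 1)]
    return total

pop = rng.uniform(0, N - 1, (POP, N_WAY * 2))              # floating-point gene encoding
for gen in range(GENS):
    fit = np.array([line_cost(ind) for ind in pop])
    parents = pop[np.argsort(fit)[:POP // 2]]              # lower total cost = fitter
    children = []
    while len(children) < POP - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(a.size) < 0.5, a, b)                    # uniform crossover
        child += rng.normal(0, 2.0, child.size) * (rng.random(child.size) < 0.1)  # mutation
        children.append(np.clip(child, 0, N - 1))
    pop = np.vstack([parents, children])

print("best total line cost:", min(line_cost(ind) for ind in pop))
```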


16.3 Accuracy Analysis An area of about 180 km × 150 km is selected as the research area for path design. The landform of the research area consists of low mountains and hills, with a maximum elevation of about 240 m, and most of the terrain is relatively flat. The water system is well developed, with many reservoirs and rivers in the area, and there are basically no adverse geological structures. The total length of the collected images is 200.8 km and the working width is 170 km. In this study, RS stereo image pair data from a stereo mapping satellite were used to establish the digital elevation model of the study area. The regional network adjustment of the summer satellite RS images of the experimental area is carried out under uncontrolled and controlled conditions respectively, and the DSM accuracy of the generated 5 m grid is shown in Fig. 16.3. It can be seen that, without control points, the plane accuracy and elevation accuracy of the DSM after free network adjustment reach 6.79 m and 4.44 m respectively, and the maximum checkpoint errors are 23.07 m in plane and 26.59 m in elevation. After adding control points, the regional network adjustment accuracy of the DSM reaches 2.71 m in plane and 2.91 m in elevation. Under controlled conditions, it can meet the elevation accuracy requirements of the 1:10,000 scale in mountainous and alpine areas. In addition to reducing the total length of transmission lines, the approach also reduces the number of towers, tower materials, foundation steel, foundation concrete and operation and maintenance costs. Moreover, the adoption of RS and GIS in route selection solves the problems of outdated data, long working periods, difficulty in collecting data and poor environmental protection found in conventional survey methods, which greatly shortens the survey and design time. Therefore, RS technology can be used as a powerful means of controlling project cost.

Fig. 16.3 DSM matching error


Figure 16.4 shows the convergence of the GA optimization considering the comprehensive cost. It can be seen that the fitness value changes rapidly in the first 20 generations of the transmission line route optimization, with the population continually searching for better individuals for the next generation; the algorithm then converges after about 30 generations, and the fitness value changes little after convergence. Based on the previous work platform, the feasibility-study recommended path scheme is developed, and the line path is optimized through high-resolution satellite images. The optimization process includes: refining the recommended transmission line path scheme; extracting features and landforms such as isolated houses and forest areas and adjusting the positions of corner towers to avoid them reasonably; real-time high-speed cross-section extraction and rough ranking to improve the accuracy of path establishment; fast statistics of economic indicators, such as reduced path length, house-crossing volume and forest length; and analysis of the distribution of superimposed ice zones and seismic intensity zones to support differentiated design. Table 16.1 lists the optimization results.

Fig. 16.4 Convergence of GA optimization considering comprehensive cost

Table 16.1 Optimization result

Optimized line   Optimized length (km)   Rotation angle number   Floor space (km2)   Forest region length (km)
L_1              66.79                   34                      6821.39             23
L_2              53.97                   61                      6734.42             49
L_3              40.4                    42                      6590.7              23
Total            161.16                  137                     20,146.51           95


After optimization, the recommended feasible path length is about 161.16 km, crossing railways 4 times, expressways 6 times, primary and secondary roads 25 times, 500 kV power lines 10 times, 220 kV power lines 11 times and rivers 8 times. The optimized path can reasonably bypass the avoidance zones and is planned along low-cost areas, which shows that the designed GA model is reasonable, feasible and robust. The GA constructed in this study can effectively bypass avoidance areas such as residential areas, disaster points, nature reserves, scenic spots and ecologically sensitive areas when optimizing UHV transmission line selection. It can provide accurate and rich basic data for the electrical and techno-economic design specialties, so that these specialties no longer have to rely on rough estimates and can truly take the data as the benchmark, which greatly improves the accuracy and level of the initial design of UHV transmission line selection.

16.4 Conclusions In this paper, the representative stereo mapping satellites among domestic RS satellites are studied, and their mapping accuracy under uncontrolled and controlled conditions are analyzed respectively. Combined with the requirements of mapping scale in each stage of power line selection in relevant specifications, the stage applicability of multi-source domestic satellite data accuracy in UHV transmission line selection strategy is discussed. After adding control points, the adjustment accuracy of DSM’s regional network reaches 2.71 m in plane and 2.91 m in elevation. Under controlled conditions, it can meet the elevation accuracy requirements of 1:10,000 scale in mountainous areas and alpine areas. After optimization, the recommended feasible path length is about 161.16 km. The GA constructed in this study can effectively bypass the avoidance areas such as residential areas, disaster points, nature reserves, scenic spots and ecologically sensitive areas when optimizing UHV transmission line selection. The accuracy and level of initial design of UHV transmission line selection are greatly improved. It can help overhead line planning professionals to make decisions and reduce their workload. This can not only improve the work efficiency, but also save the cost of preliminary survey, which is worth popularizing and applying.


Chapter 17

Research on the Evaluation of the Teaching Process of Public Physical Education in Universities Based on Markov Model Yilin Li

Abstract Teaching evaluation is a systematic and scientific process. Although this process will involve different evaluated subjects and evaluation contents, its implementation procedures have universal regularity. No matter what kind of evaluation is carried out, some basic steps must be followed. Markov random process, that is, the probability of state transition only depends on the previous state, so we can try to evaluate the whole teaching process by Markov mathematical model. In order to evaluate PE (Physical Education) class more accurately and realize the goal of college PE, an index system for evaluating the teaching process of college public PE class was established based on Markov model. Markov chain analysis is used to consider the students’ original state, and the students’ original grades are divided into the same grades under the same standard, and the state space is determined. On this basis, the probability of one-step migration is calculated and a one-step migration matrix is constructed. Finally, the limit vector of Markov chain is analyzed and compared by using its stability and ease of use. According to the characteristics of Markov chain one-step transfer matrix, the changed information can be accurately extracted from the transfer matrix without too many assumptions. The model is concise, reasonable and practical, which can provide a basis for teachers’ teaching quality evaluation. Keywords Markov model · Public physical · Physical education · Teaching process

Y. Li (B) School of Physical Education and Health, Huaihua University, Huaihua 418000, Hunan, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_17


17.1 Introduction

The evaluation of the public PE (Physical Education) class teaching process is a process of systematically measuring students and teachers and judging their values, strengths and weaknesses against the teaching objectives, so that teaching can be improved. At present, the developmental evaluation of PE teachers in China is based on the domestic PE teacher evaluation system, but there is no unified standard for the evaluation of PE teachers in universities, and most university PE teachers still use traditional evaluation methods [1]. Therefore, only by following the basic principles of evaluation can the function of evaluation be brought into play reasonably and the established teaching objectives finally be realized. Teaching evaluation is a systematic and scientific process. Although it involves different subjects and evaluation contents, its implementation procedures have universal regularity, and no matter what kind of evaluation is carried out, some basic steps must be followed. In the process of teaching evaluation, the basic level of different students differs, so it is impossible to evaluate a teacher's teaching effect simply from the students' performance in one exam [2, 3].

At present, there are two main teaching evaluation methods. The first is scoring based on classroom attendance by the supervision team or other teachers, together with the students' evaluation of teachers; this kind of scoring cannot rule out subjective human factors and is not objective [4]. The second is evaluation based on students' test scores; this method does not consider the differences in students' starting levels, so its conclusions are unfair [5]. A Markov random process is one in which the probability of a state transition depends only on the previous state. Therefore, we can try to evaluate the whole teaching process with a Markov mathematical model and eliminate the influence of differences in learning foundations and of past results on the current evaluation [6, 7].

Many college students' sports achievements are determined by technical skills. Schools pay too much attention to students' physical fitness and technical skills, but ignore other non-intellectual factors in the teaching objectives, such as sports attitude, cooperative spirit and social development. The evaluation of PE is then carried out only for the sake of evaluation, and the ultimate goal of PE is equated with the evaluation result, so the evaluation goal is one-sided [8]. With the continuous development of the college PE curriculum, the current evaluation methods can no longer meet the needs of its development. In order to evaluate PE class more accurately and realize the goal of college PE, an index system for evaluating the teaching process of the college public PE class was established based on the Markov model.


17.2 Research Method 17.2.1 Establishment of Evaluation Index System The principle of teaching evaluation refers to the working rules or standards that must be followed in teaching evaluation. Its purpose is to judge the teaching quality and teaching effect in PE class, help PE teachers find teaching problems, improve PE teaching methods, adjust PE teaching contents, and let students know their own learning situation, find out the shortcomings, improve them, and better improve the learning effect. When evaluating the teaching process of public PE class, we should adhere to the correct orientation, and reflect the corresponding teaching purpose in the evaluation of all aspects of teaching, not only to evaluate the overall design of PE teachers’ classroom teaching, but also to evaluate the completion of students’ innovative spirit and practical ability, as well as their participation, enthusiasm and creativity. Through evaluation, students’ physical and mental health and teachers’ own quality are promoted in an all-round way [9]. The scientific principle is one of the most emphasized principles by educational evaluation scholars. We should mainly pay attention to the following points: First, we should establish a scientific evaluation standard and index system. The evaluation criteria should be practical and reasonable, and the evaluation index system should closely follow the evaluation objectives. Second, make full use of the new achievements of modern science to make information collection more comprehensive and accurate, and information processing more precise and meticulous. Third, qualitative analysis and quantitative analysis are organically combined. These two analysis methods have their own advantages, and combining them is an effective way to improve the scientific evaluation. Principle of comprehensiveness. The comprehensiveness of the teaching evaluation target of PE class in universities means that the evaluation content should not only check students’ cognitive ability, perceptual ability, imitation ability and sports ability, but also check their emotions, attitudes and values. For the teaching teachers, they mainly check the language organization ability, sports demonstration ability, understanding of teaching content, teaching execution ability, classroom adaptability and teaching attitude. Dynamic principle. Only by combining formative evaluation and diagnostic evaluation can we make up for the defects of summative evaluation. The main function of diagnostic evaluation is to provide students with the starting point of teaching and the corresponding teaching methods by evaluating the learning preparation of specific textbooks; Formative assessment can enable students to find problems in PE teaching activities in time and get relevant feedback information, thus promoting the further development of PE teaching. In the evaluation of public PE teaching process, it mainly includes the evaluation of students’ PE learning achievement and PE teaching quality [10]. Through the comprehensive evaluation of the teaching quality of public PE courses in universities,


Fig. 17.1 Evaluation system of public PE class teaching process

some suggestions are put forward to improve the teaching quality of public PE courses in universities. Based on the guiding ideology of “body-oriented, health first” put forward by the national PE curriculum guiding outline, paying attention to the integration of in-class and out-of-class education, the common development of culture and sports, on this basis, combined with previous research results, a set of evaluation system of public PE course teaching process is constructed, as shown in Fig. 17.1. Weight is the importance of an index in the evaluation index system, that is, the influence of the change of this factor on the evaluation results when other factors remain unchanged [11]. In this study, some experts and professors engaged in PE were investigated by questionnaire, and the weight coefficient of each index was determined by AHP.
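As an illustration of how AHP turns pairwise expert judgements into index weights, the sketch below computes the principal-eigenvector weights and the consistency ratio for a small comparison matrix. The matrix values are invented for demonstration and are not the questionnaire results used in this study.

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three evaluation indexes
# (Saaty 1-9 scale); a_ij says how much more important index i is than index j.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# The principal eigenvector of A gives the AHP weights.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)            # index of the largest eigenvalue
w = np.abs(eigvecs[:, k].real)
weights = w / w.sum()                  # normalise so the weights sum to 1

# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI.
n = A.shape[0]
lambda_max = eigvals.real[k]
CI = (lambda_max - n) / (n - 1)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]   # standard random index table
CR = CI / RI if RI > 0 else 0.0

print("weights:", np.round(weights, 3))
print("consistency ratio:", round(CR, 3))   # CR < 0.1 is usually considered acceptable
```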

17.2.2 Markov Model Markov model is a theory that describes the state of a system and how to transfer between the states. Through the construction of initial state probability distribution and state transition probability matrix, the change of system state can be predicted according to the model calibration results of specific systems, and the purpose of predicting the future can be achieved. And state transition refers to the transition from one state to another in the development process of the system. For example,


from congestion to normal, from normal to unblocked, from unblocked to congested, and so on. A Markov process is a stochastic process with the Markov property (no aftereffect): given the state of the system at time t_n, the conditional distribution of the state at any time t > t_n does not depend on the states before t_n. That is, the value of X(t_{n+1}) is related only to the value of X(t_n) and has nothing to do with the values of X(t_1), X(t_2), ..., X(t_{n-1}). Formally, let {X_t, t ∈ T} be a random process with state space S. For any times t_1 < t_2 < ... < t_n (n ≥ 3, t_i ∈ T) and states x_i ∈ S, if the conditional distribution function of X(t_n) given X(t_i) = x_i, i = 1, 2, ..., n − 1, equals the conditional distribution of X(t_n) given only X(t_{n-1}) = x_{n-1}, then {X_t, t ∈ T} is called a Markov process; see formula (17.1):

$$P\{X(t_n) \le x_n \mid X(t_{n-1}) = x_{n-1}, X(t_{n-2}) = x_{n-2}, \ldots, X(t_1) = x_1\} = P\{X(t_n) \le x_n \mid X(t_{n-1}) = x_{n-1}\} \tag{17.1}$$

The state transition probability is the probability of moving from one state to another. If {X(n), n ∈ T} is a Markov chain, then for any integer k (1 ≤ k ≤ n), the probability that the system is in state S_j at time m + k, given that it is in state S_i at time m, is given by formula (17.2):

$$P_{ij}^{(k)} = P\{X(m + k) = S_j \mid X(m) = S_i\} \tag{17.2}$$

The sum of the elements in any row of the transition matrix is always equal to 1:

$$\sum_{j=1}^{n} P_{ij} = 1 \tag{17.3}$$
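The sketch below shows how a one-step transition matrix of this kind can be estimated from paired grade observations and how the row-sum property of Eq. (17.3) can be checked. The sample grade pairs are invented for illustration.

```python
import numpy as np

GRADES = ["A", "B", "C", "D", "E"]
IDX = {g: i for i, g in enumerate(GRADES)}

# Hypothetical (first-exam grade, second-exam grade) pairs for one class.
pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B"), ("C", "C"),
         ("C", "B"), ("C", "D"), ("D", "E"), ("E", "E"), ("E", "A")]

# Count transitions i -> j, then normalise each row to get probabilities.
counts = np.zeros((5, 5))
for g1, g2 in pairs:
    counts[IDX[g1], IDX[g2]] += 1
row_totals = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)

print(np.round(P, 2))
# Every row with at least one observation sums to 1, as required by Eq. (17.3).
print("row sums:", P.sum(axis=1))
```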

The factors that make up the PE teaching process not only include teachers and students, but also involve the choice of venues, equipment and teaching materials. These factors play different roles in the teaching process and form the overall function. PE teaching is a teaching process to meet students’ psychological needs, to treat students as a complete person, and to be a kind of education in line with human nature. Therefore, we should give full play to the characteristics and advantages of PE teachers, fully stimulate students’ thinking and cultivate their innovative ability in PE teaching. In the analysis of various factors of professional skills, students have relatively high recognition of teachers’ knowledge of sports safety and ability to deal with sports injuries; The recognition that teachers should have the ability to execute and judge, and that teachers can plan sports venues and manage class students is relatively low.


College PE teaching activities are a complete and systematic process, which includes both teachers’ teaching and students’ learning. The evaluation contents and standards should cover both teachers’ teaching guidance and students’ classroom learning practice. The evaluation should not only pay attention to teachers’ teaching, but also pay attention to students’ learning; We should not only evaluate the results of PE class’s teaching activities, but also evaluate the process of PE teaching activities. Not only evaluate students’ cognitive development, but also evaluate the development of non-cognitive factors [12]. Many foreign educational evaluation experts hold that the subject of evaluation should be diversified, and all the people related to the evaluation object should play a role in the evaluation process. Through the participation of various personnel in the evaluation process, the evaluation objectives can be more scientific and comprehensive, the evaluation methods can be more diverse, and the evaluation information can be more systematic, reliable and timely. On this basis, this project intends to use Markov chain method, taking the initial state of students as the starting point, to classify the initial results uniformly, determine the state space, calculate the one-step transfer probability, build a one-step transfer matrix, and calculate the maximum error vector based on the stability and ease of use of Markov chain, and make a comparative judgment.

17.3 An Example of Evaluation of the Public PE Class Teaching Process

The methods, principles and results of evaluating teaching are the main contents of teacher evaluation. In the current evaluation system of university teaching, the evaluation subjects are mainly university teaching quality supervision institutions and retired teachers. In constructing a PE teaching quality evaluation system in universities, some common and traditional evaluation contents should be maintained. At present, colleges and universities in China generally apply the principle of "equal treatment" when setting up a teaching quality evaluation system: all subjects and all classes adopt the same system, the same method, the same table and the same set of grading standards. However, because of the uniqueness of the PE curriculum, this model inevitably has great limitations and cannot really play its guiding role for the course. In this paper, the Markov chain method is applied to the evaluation of the PE teaching process in colleges and universities. In general, the correspondence between two consecutive numerical states (such as two test scores) is used to build the transition probability matrix, so as to judge whether the appraisee meets the expected state. The results of 30 students each in Class A and Class B of the 2021 undergraduate PE major were selected as the research object. Scores are quantified on a 100-point scale and divided into five grades: A (90–100 points), B (80–89 points), C (70–79 points), D (60–69 points) and E (59 points and below).


In fact, if a course has a large number of students, one can randomly select the grades X_1, X_2, ..., X_n of some candidates and use point estimation to obtain estimates of μ and σ:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2} \tag{17.4}$$
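A small sketch of this point estimation and the resulting standard scores, using made-up raw exam scores:

```python
import math

# Hypothetical raw exam scores sampled from one course.
scores = [52, 61, 68, 74, 75, 79, 83, 88, 90, 95]

n = len(scores)
mu_hat = sum(scores) / n                                            # mean estimate, Eq. (17.4)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in scores) / n)   # spread estimate, Eq. (17.4)

# Standard scores, used to put the two exams on a common scale before grading.
standardised = [(x - mu_hat) / sigma_hat for x in scores]
print(round(mu_hat, 2), round(sigma_hat, 2))
print([round(z, 2) for z in standardised])
```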

The application of the Markov chain to the evaluation of teaching effect is based on the results of two exams. Let X_i1, X_i2 be the standard scores obtained by the i-th student in the two exams of course K, where μ and σ are the sample mean and sample standard deviation of the corresponding exam of course K. By analyzing how the standardized grades of the students in Class A and Class B change, we obtain the grade transitions shown in Tables 17.1 and 17.2.

Table 17.1 The grade transitions of Class A students after standardization

Number of transitions i → j | Grade A | Grade B | Grade C | Grade D | Grade E | Total
Grade A |  2 | 2 | 0 | 0 | 0 |  4
Grade B |  3 | 2 | 0 | 0 | 0 |  5
Grade C |  3 | 2 | 7 | 3 | 0 | 15
Grade D |  0 | 1 | 0 | 0 | 3 |  4
Grade E |  1 | 0 | 0 | 0 | 1 |  2
Total   |  9 | 7 | 7 | 3 | 4 | 30

Table 17.2 The grade transitions of Class B students after standardization

Number of transitions i → j | Grade A | Grade B | Grade C | Grade D | Grade E | Total
Grade A |  2 | 1 |  0 | 0 | 0 |  3
Grade B |  0 | 1 |  4 | 0 | 0 |  5
Grade C |  0 | 2 | 10 | 1 | 0 | 13
Grade D |  0 | 0 |  0 | 3 | 0 |  3
Grade E |  1 | 0 |  0 | 4 | 1 |  6
Total   |  3 | 4 | 14 | 8 | 1 | 30

From the two tables, the one-step transfer matrices of Class A and Class B are as follows:

$$P_A = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 3/5 & 2/5 & 0 & 0 & 0 \\ 1/5 & 2/15 & 7/15 & 1/5 & 0 \\ 0 & 1/4 & 0 & 0 & 3/4 \\ 1/2 & 0 & 0 & 0 & 1/2 \end{bmatrix} \tag{17.5}$$

$$P_B = \begin{bmatrix} 2/3 & 1/3 & 0 & 0 & 0 \\ 0 & 1/5 & 4/5 & 0 & 0 \\ 0 & 2/13 & 10/13 & 1/13 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 1/6 & 0 & 0 & 2/3 & 1/6 \end{bmatrix} \tag{17.6}$$

Thus, the efficiency of Class A and Class B is: E A = 45/104, E B = −108/107. It can be seen that there is still E A > E B after eliminating the test paper factor, that is, the two evaluation methods have reached the same conclusion, which once again verifies the application effect of process assessment. This is a more reasonable evaluation method that fully considers the basic differences and the changes of progress/retrogression. Therefore, it can be said that the teaching evaluation based on Markov model is based on the learning process and evaluates the teaching effect from the perspective of development and change. At the same time, it also double verifies that the effect of adopting research-oriented teaching mode is indeed better than that of traditional indoctrination teaching. This evaluation method can be used to evaluate the teaching effect of both the same discipline and different disciplines. The Markov chain used in this project not only considers the importance of the construction of transfer matrix, but also gives full play to the characteristics of Markov chain’s one-step transfer matrix.
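The transition matrices of Eqs. (17.5) and (17.6) follow directly from the count tables by row normalisation, as the sketch below shows. The simple progress score at the end is only an illustrative way of comparing the two classes; it is not the E_A / E_B efficiency formula, which is not given in this excerpt.

```python
import numpy as np

# Transition counts from Tables 17.1 and 17.2 (rows: first exam, columns: second exam).
counts_A = np.array([[2, 2, 0, 0, 0],
                     [3, 2, 0, 0, 0],
                     [3, 2, 7, 3, 0],
                     [0, 1, 0, 0, 3],
                     [1, 0, 0, 0, 1]], dtype=float)
counts_B = np.array([[2, 1,  0, 0, 0],
                     [0, 1,  4, 0, 0],
                     [0, 2, 10, 1, 0],
                     [0, 0,  0, 3, 0],
                     [1, 0,  0, 4, 1]], dtype=float)

def transition_matrix(counts):
    # Divide each row by its total to obtain one-step transition probabilities.
    return counts / counts.sum(axis=1, keepdims=True)

P_A = transition_matrix(counts_A)   # matches Eq. (17.5)
P_B = transition_matrix(counts_B)   # matches Eq. (17.6)

def progress_score(counts):
    # Illustrative score only: +1 for each grade improved, -1 for each grade dropped,
    # averaged over the class. Grade index 0 = A (best), 4 = E (worst).
    n_students = counts.sum()
    score = sum(counts[i, j] * (i - j) for i in range(5) for j in range(5))
    return score / n_students

print(np.round(P_A, 3))
print(np.round(P_B, 3))
print("Class A progress score:", round(progress_score(counts_A), 3))
print("Class B progress score:", round(progress_score(counts_B), 3))
```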

17.4 Conclusion

With the continuous development of the college PE curriculum, the current evaluation methods can no longer meet the needs of its development. In order to evaluate PE class more accurately and realize the goal of college PE, an index system for evaluating the teaching process of the college public PE class was established based on the Markov model. Teaching evaluation based on the Markov model is a more reasonable evaluation method grounded in the learning process, and it evaluates the teaching effect from the perspective of development and change. At the same time, the results also confirm the effectiveness of inquiry-oriented learning compared with traditional indoctrination teaching. The model is concise, reasonable and practical, and can provide a basis for evaluating teachers' teaching quality.


References 1. Z. Chen, B. Zhao, Gansu public physical education class teaching status and development countermeasures, vol. 1. Longdong University (2015), p. 4 2. J. Meng, Y. Zhang, G. Li et al., Probe into the teaching reform of public physical education class under the network teaching platform-taking Taiyuan University of Technology as an example. Fighting Wushu Sci. 12(12), 3 (2015) 3. H. Lin, S. Li, Research on innovative teaching mode of public physical education class in colleges and universities. Curriculum Educ. Res. Res. Learn. Method Teach. Method (12), 1 (2018) 4. K. Zhang, Empirical study on the process evaluation of college students’ public physical education class teaching-taking badminton class as an example. Popular Lit. Art Acad. Ed. (11), 1 (2018) 5. L. Yan, Evaluation and analysis of physical education class teaching environment in colleges and universities. Contemp. Sports Sci. Technol. 12(5), 13–15 (2022) 6. H. Chen, Research on the construction of teaching quality evaluation model of public physical education class in colleges and universities. Sports Fashion (4), 2 (2018) 7. Q. Dong, C. Wang, The application of Markov chain in the evaluation of higher mathematics teaching effect. Pract. Underst. Math. 48(8), 7 (2018) 8. A. Yang, Y. Yang, S. Zhou et al., Fusion recommendation algorithm based on hidden Markov model. Comput. Modernization (9), 6 (2015) 9. W. Lei, Distance teaching method of e-commerce course based on Markov model. Microcomput. Appl. 38(5), 174–177 (2022) 10. J. Zhang, Z. Guan, The hierarchical connotation of “Koch model” in the evaluation of ideological and political teaching of public physical education courses in colleges and universities. J. Dalian Inst. Educ. 39(1), 3 (2023) 11. L. Liu, Analysis of the reform and innovation of table tennis public physical education class teaching in colleges and universities. Sports Time Space (2018) 12. Z. Lei, L. Niu, S. Song et al., Performance evaluation method based on maximum entropy Markov model. Control Theor. Appl. 34(3), 8 (2017)

Part II

Artificial Intelligence and Deep Learning Application

Chapter 18

Simulation Design of Matching Model Between Action and Music Tempo Characteristics Based on Artificial Intelligence Algorithm Leizhi Yu, Yaping Tang, and Yuanling Ouyang

Abstract At present, improving the matching between music and dance and the authenticity of generated dance is the focus of related research. Traditional methods cannot align beats or compute the correlation coefficients between dance movements and music. Driven by artificial intelligence, this article puts forward an audio feature extraction algorithm based on rhythm features, constructs a matching model between dance movements and rhythm features, and demonstrates the effectiveness of the algorithm through simulation analysis. Experiments show that the accuracy of the artificial intelligence-based matching model between dance movements and beat features can reach 96.31%, and the matching synchronization between dance movements and music can reach 95.88%. The model constructed in this article fully reflects the synchronization of music and dance movements, and the matching quality between the model's movements and the music is good.

Keywords Dance actions · Music · Tempo feature matching

L. Yu · Y. Tang (B) College of Music and Dance, Hunan University of Humanities, Science and Technology, Loudi 417000, China e-mail: [email protected] Y. Ouyang Changsha Human Resources Public Service Center, Changsha 410000, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_18


18.1 Introduction

On the stage of music and dance performance, dancers can perfectly combine the tempo of their movements with the tempo of the music, expressing the amplitude and intensity of movements and steps through the intensity of the beat [1]. When a performer plays a musical work, the movements speed up or slow down with the change of the music's tempo and the strength of the beat. Music has a positive effect on dance [2]. In addition to rendering the atmosphere, music allows the dancers' emotions and personalities to be released with the ups and downs of the music, thus making the dance more expressive. The matching of dance and music has strict requirements and is generally designed by professional choreographers. Choreography is a challenging job that requires great talent [3]. In fact, when choreographing for new music, one does not create an entirely new dance sequence but reuses many dance sequences that correspond to similar music. By reusing excellent dance sequences, the choreography becomes more flexible and efficient. Nevertheless, it is quite time-consuming for choreographers to match dance and music manually [4]. Therefore, letting the computer synthesize music-driven dance automatically has practical value for dance arrangement and entertainment [5].

Music and dance are two different data modalities, and how to bridge the semantic gap between them is a very difficult problem [6]. In order to better ensure the performance effect of a dance, attention should be paid to the tacit cooperation between dance and music, so that dance actions and music tempo achieve consistency in time and space, creating both visual and psychological beauty [7, 8]. When choreographing, animators draw key frames by hand according to the given music and interpolate them by computer to generate dance animation [9]. In addition, the current music-dance action matching technology cannot determine the tempo of dance actions from the changing form of the music [10, 11]. In this case, how to effectively optimize the matching between dance technical movements and music has become a key issue in this field, attracting extensive attention from many related experts and scholars.

Action and music are two completely different time-series signals. In order to match them properly, it is necessary to establish an action-music feature matching model [12]. Such a model can integrate simple actions with music signals, establish a matching relationship that conforms to human cognition, and support the storage, retrieval and editing of music and actions. In fact, when hearing a song, people tend to sway their bodies rhythmically with the beat of the music, and the stronger the musical emotion, the larger the body swing. Therefore, this article makes full use of the features that music and action share as time series: tempo and intensity. Driven by artificial intelligence, an audio feature extraction algorithm based on rhythm features is proposed, and a matching model between dance movements and music rhythm features is constructed. In audio segmentation, the differences between different audio features should be considered hierarchically, so as to divide different audio signals.


18.2 Methodology

18.2.1 Basic Technology of Artificial Intelligence

The first step of audio retrieval is to establish a database, extract features from the audio data, and cluster the data by features [13]. Audio retrieval mainly adopts the query-by-example method: the user selects a query example through the query interface, sets the attribute values, and then submits the query. The system extracts features from the example selected by the user, and the retrieval engine matches the feature vectors against the clustering parameter sets. Audio segmentation refers to using the extracted audio features to split the original audio stream wherever the features change abruptly, so as to obtain audio fragments. Identifying and classifying the segmented audio fragments and obtaining their semantics is the task of audio classification [14]. For convenience of description, the music sequence is uniformly recorded as M, and the m music fragments obtained by segmentation are recorded as:

$$M_1, M_2, M_3, \ldots, M_m \tag{18.1}$$

The music characteristics extracted from the music clip M_i are recorded as follows:

$$\text{MusicFeature}(f) = \begin{bmatrix} F_R^{Music}(f) \\ F_1^{Music}(f) \end{bmatrix}, \quad f \in M_i \tag{18.2}$$

where f is a frame of the music segment M_i, and F_R^{Music}(f) and F_1^{Music}(f) are the tempo and intensity characteristics of the music fragment respectively. Because a MIDI score can be regarded as a collection of note instructions, the duration and position of each note can be obtained easily. Based on this property, the position of each quarter note in the MIDI score is regarded as a tempo point of the music; this is the tempo feature extraction algorithm. The transition frame interpolation algorithm interpolates between the final action of action segment M and the transition action of action segment N, so that the interpolated action retains the characteristics of the final action of M while transitioning smoothly to N. The transition action of segment N is the action segment adjacent to N (immediately preceding it) in the original action sequence. In order to obtain the transitional action, action segment matching is carried out for the partially overlapping music segments.
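As a sketch of the tempo-point idea, the code below scans a note list (onset and duration expressed in beats) and keeps the onsets that fall on quarter-note positions. It works on a plain list of tuples rather than a real MIDI parser, and the note data are invented for illustration.

```python
# Hypothetical score data: (onset_in_beats, duration_in_beats) for each note.
notes = [(0.0, 1.0), (1.0, 0.5), (1.5, 0.5), (2.0, 1.0),
         (3.0, 2.0), (5.25, 0.75), (6.0, 1.0)]

BEATS_PER_QUARTER = 1.0   # with a quarter-note pulse, tempo points sit on integer beats
TOLERANCE = 1e-6

def quarter_note_tempo_points(note_list):
    """Return the onsets that coincide with quarter-note positions."""
    points = []
    for onset, _duration in note_list:
        remainder = onset % BEATS_PER_QUARTER
        if remainder < TOLERANCE or BEATS_PER_QUARTER - remainder < TOLERANCE:
            points.append(onset)
    return sorted(set(points))

print(quarter_note_tempo_points(notes))   # -> [0.0, 1.0, 2.0, 3.0, 6.0]
```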


18.2.2 Construction of the Matching Model Between Dance Actions and Music Tempo Characteristics

It can be said that, by matching movements and music tempos with the computer, the various specifications required in the database can be made more reasonable. The information function is the prerequisite for its expression, and sequence and tempo are the final forms of expression. In audio segmentation, the differences between different audio features should be considered at different levels so as to divide different audio signals. To achieve layered segmentation of audio signal streams, the key is to find features or feature combinations that can clearly distinguish different types of audio signals, and then divide the continuous audio signal stream into predetermined audio examples by checking whether the differences between features exceed a certain threshold. The framework of the matching model between dance actions and musical tempo features is shown in Fig. 18.1.

Fig. 18.1 Model framework for matching dance actions and tempo characteristics

Action features extract the characteristics that action and music have in common: tempo and intensity. For the Greek dance used in this article, the movement tempo is mainly reflected by the lifting of the feet, so it is extracted based on the extrema of the vertical displacement of the feet. Assuming that the time-domain signal of the audio file is x(l), the window function is w(m), and the audio signal of the n-th frame after windowing is x_n(m), we have:

$$x_n(m) = w(m)\,x(n + m), \quad 0 \le m \le N - 1,\ n = 0, 1T, 2T, \ldots \tag{18.3}$$

$$w(m) = \begin{cases} 1, & m = 0, 1, \ldots, N - 1 \\ 0, & \text{otherwise} \end{cases} \tag{18.4}$$

Among them, N is the frame length and T is the frame shift. The short-term energy of the n-th frame is defined as:

$$E_n = \sum_{m=0}^{N-1} x_n^2(m) \tag{18.5}$$
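A compact sketch of Eqs. (18.3)-(18.5): the signal is cut into frames of length N with hop T using the rectangular window, and the short-term energy of each frame is computed. The synthetic sine-burst signal is only for demonstration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into frames x_n(m) = w(m) * x(n + m) with a rectangular window (Eqs. 18.3-18.4)."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append(x[start:start + frame_len])
    return np.array(frames)

def short_term_energy(frames):
    """E_n = sum_m x_n(m)^2 for each frame (Eq. 18.5)."""
    return np.sum(frames ** 2, axis=1)

# Demo: a quiet passage followed by a louder passage of the same tone.
sr = 8000
t = np.arange(2 * sr) / sr
x = np.concatenate([0.05 * np.sin(2 * np.pi * 220 * t[:sr]),
                    0.8 * np.sin(2 * np.pi * 220 * t[:sr])])

frames = frame_signal(x, frame_len=400, hop=200)   # 50 ms frames, 25 ms hop at 8 kHz
energy = short_term_energy(frames)
print(energy[:3].round(3), "...", energy[-3:].round(3))   # energy jumps when the loud part starts
```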

wherein, N represents the quantity of sampling points in the audio frame, and x n (m) represents the value of the mth sampling point in the nth frame of the audio signal. For each action-music piece combination, the algorithm is used to calculate the matching degree score of the combination. When calculating the audio distance, the distance is first calculated according to the feature vector in each dimension of audio, and then the total distance between audio is calculated by weighted Euclidean distance. Based on the obtained objective function of dance action matching optimization, this article combines ant colony theory to optimize the objective function. Specifically, the pheromone volatilization factors of all music-dance action segments in music choreography are adaptively adjusted according to the pheromone concentration.
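The sketch below illustrates the two ingredients mentioned in this paragraph: a weighted Euclidean distance over per-dimension audio feature vectors, and an adaptive update of the pheromone evaporation factor driven by the pheromone concentration. The weights, feature vectors and adaptation rule are assumptions for illustration, not the chapter's exact settings.

```python
import numpy as np

def weighted_euclidean(a, b, weights):
    """Total distance between two audio clips from their per-dimension feature distances."""
    a, b, weights = map(np.asarray, (a, b, weights))
    return float(np.sqrt(np.sum(weights * (a - b) ** 2)))

# Hypothetical feature vectors (e.g. tempo, intensity, a spectral statistic) and weights.
clip_a = [120.0, 0.62, 0.31]
clip_b = [126.0, 0.55, 0.40]
w      = [0.5, 0.3, 0.2]
print(round(weighted_euclidean(clip_a, clip_b, w), 4))

def adapt_evaporation(pheromone, rho_min=0.1, rho_max=0.9):
    """Assumed adaptive rule: segments with high pheromone concentration evaporate more slowly."""
    p = np.asarray(pheromone, dtype=float)
    norm = (p - p.min()) / (p.max() - p.min() + 1e-12)   # scale concentrations to [0, 1]
    return rho_max - (rho_max - rho_min) * norm          # high concentration -> low evaporation

pheromone = [0.2, 1.5, 0.7, 3.0]
rho = adapt_evaporation(pheromone)
new_pheromone = (1 - rho) * np.asarray(pheromone)        # standard evaporation step
print(rho.round(2), new_pheromone.round(2))
```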

18.3 Result Analysis and Discussion

In order to prove the overall effectiveness of the proposed optimization method for matching dance actions and music, simulation is needed. This section establishes an action database from dance action data, uses the algorithm proposed in this article to synthesize dances, and evaluates the algorithm by the quality of the synthesized dances. An error experiment is carried out, and the errors of the different methods are shown in Fig. 18.2; it can be seen that the error of this method is low. In order to analyze the cohesion of dance action synthesis more objectively, matching synchronization is introduced as a measurement index. To evaluate the performance of this method, it and two other methods are used to optimize the matching between dance actions and music tempo characteristics. The matching synchronization of the different methods is compared, and the results are described in Fig. 18.3. The synchronization of matching dance actions and tempo characteristics using this method is the best, which verifies the effectiveness of the algorithm. With the algorithm in this article, music segments can be deformed to map action characteristic points to the corresponding music characteristic points, so that the matching degree between actions and music segments is improved.


Fig. 18.2 Error comparison chart

Fig. 18.3 Synchronization comparison result

Because of the non-stationarity of audio features, the volume and pauses sometimes change greatly within the same piece of music, singing or dialogue, and even completely different jumps may occur, which interferes with segmentation. However, the segmentation effect of this method is very good on the whole. In order to further verify the rationality of the proposed matching algorithm, this article compares an existing matched dance action-music relationship with the dance-music relationship


Fig. 18.4 Comparison chart of matching degree of different methods

synthesized in this article. In the following, the matching and optimization experiments of dance actions and music tempo characteristics are carried out by using the literature method and this method respectively. The matching comparison results of different methods are shown in Fig. 18.4. It can be concluded that, compared with the other two literature methods, the matching effect of dance actions and tempo characteristics using this method is better. In this article, based on the common characteristic tempo and intensity, the correlation between music fragments and action fragments is analyzed, and the action fragments are connected by interpolation, and finally the dance actions matching the target audio are synthesized. The following experiments are carried out, and the matching accuracy of several different methods is shown in Fig. 18.5. It can be seen that the matching accuracy of this method is higher than that of the two comparative methods. Generally speaking, the matching model of dance actions and tempo characteristics driven by artificial intelligence can reach 96.31% accuracy, and the matching synchronization of dance actions and music can reach 95.88%. It has certain advantages and correct rate, and achieves the expected effect of model construction.


Fig. 18.5 Comparison of matching accuracy

18.4 Conclusions Reasonable matching between dance technical movements and music can improve the fit between stage and music. When matching dance movements with music, we should get several short movement-music segment combinations, align the beat prediction positions obtained in each step, and calculate the correlation coefficient between dance movements and music segments. In this article, driven by artificial intelligence, an audio feature extraction algorithm based on rhythm features is proposed, and a matching model between dance movements and music rhythm features is constructed. The model obtains the action-music map through precalculation, and uses the algorithm to search the best matching path in the rhythm feature space of action and music, and then edits and optimizes the resulting animation according to this best path. The final simulation results show that the matching model of action and music rhythm characteristics driven by artificial intelligence can reach 96.31% accuracy, and the matching synchronization of dance action and music can reach 95.88%. It has certain advantages and correct rate, and achieves the expected effect of model construction. The matching model of action and music can improve the overall efficiency of choreography and music system in terms of action and music, and also make the integration between dance and music simpler, more convenient and faster. Acknowledgements This work is a phased achievement of the Outstanding Youth Project “Ecological Research on the Inheritance of Meishan Sacrificial Music” (Grant No.22B0849), a scientific research project of the Hunan Provincial Department of Education.


References 1. M. Doostan, B.H. Chowdhury, Power distribution system fault cause analysis by using association rule mining. Electr. Power Syst. Res. 152(11), 140–147 (2017) 2. B. Xu, X. Yin, D. Wu et al., An analytic method for power system fault diagnosis employing topology description. Energies 12(9), 1770 (2019) 3. C. Yu, L. Qi, J. Sun et al., Fault diagnosis technology for ship electrical power system. Energies 15(4), 1287 (2022) 4. G.B. Costa, J.S. Damiani, G. Marchesan et al., A multi-agent approach to distribution system fault section estimation in smart grid environment. Electr. Power Syst. Res. 204(3) (2022) 5. J. Zhang, H. Xu et al., Online identification of power system equivalent inertia constant. IEEE Trans. Ind. Electron. 64(10), 8098–8107 (2017) 6. M.Y. Zargar, U.D. Mufti, S.A. Lone, Adaptive predictive control of a small capacity SMES unit for improved frequency control of a wind-diesel power system. IET Renew. Power Gener. 11(14), 1832–1840 (2017) 7. L. Li, J. Wang, C. Cai et al., Phase-detection-based metal objects and pick-up coils detection scheme without malfunction in wireless power transfer system. IET Electr. Power Appl. 14(11), 2222–2230 (2020) 8. S.K. Gharghan, S.S. Fakhrulddin, A. Al-Naji et al., Energy-efficient elderly fall detection system based on power reduction and wireless power transfer. Sensors 19(20), 4452 (2019) 9. J. Yang, W.A. Zhang, F. Guo, Distributed Kalman-like filtering and bad data detection in the large-scale power system. IEEE Trans. Industr. Inf. 18(8) (2022) 10. G. Rovatsos, X. Jiang, A.D. Domínguez-García et al., Statistical power system line outage detection under transient dynamics. IEEE Trans. Signal Process. 65(11), 2787–2797 (2017) 11. K. Hwang, J. Cho et al., Ferrite position identification system operating with wireless power transfer for intelligent train position detection. IEEE Trans. Intell. Transp. Syst. 20(1), 374–382 (2019) 12. M. Ghasemi-Varnamkhasti, S.S. Mohtasebi, M. Siadat et al., Discriminatory power assessment of the sensor array of an electronic nose system for the detection of non alcoholic beer aging. Czech J. Food Sci. 30(3), 236–240 (2018) 13. Z. Wei, W. Liu, C. Zang et al., Multiagent system-based integrated solution for topology identification and state estimation. IEEE Trans. Industr. Inf. 13(2), 714–724 (2017) 14. V.H.C. Pinheiro, M.C. dos Santos, F.S.M. do Desterro, Nuclear power plant accident identification system with “don’t know” response capability: novel deep learning-based approaches. Ann. Nucl. Energy 137(9), 107111 (2019)

Chapter 19

Design and Optimization of Frequency Identification Algorithm for Monomelody Musical Instruments Based on Artificial Intelligence Technology Wenxiao Wang and Sanjun Yao

Abstract Audio signal analysis and classification is an important part of intelligent audio system. Driven by big data, it is undoubtedly an inevitable and important research topic to analyze and classify audio signals accurately and quickly from a huge audio database. From the analysis of musical complexity and computational complexity, in order to realize the rapid real-time contour of melody, the method based on saliency is more suitable. Driven by artificial intelligence technology, a frequency identification algorithm of single-tone musical instruments based on convolutional neural network is proposed. The frequency spectrum of each classified piano music is trained as the input image of CNN, and the classification of piano music is indirectly realized through image recognition. The simulation results show that the improved frequency identification algorithm of single-tone musical instrument has higher recognition accuracy, and the training result with note spectrum as training sample is better than spectrum, and the classification accuracy is improved by 9.2%, which shows the effectiveness of this method. Keywords Artificial intelligence · Audio recognition · Convolutional neural network · Audio signal

19.1 Introduction With the continuous growth of Internet and computer technology, music data has exploded. The traditional music recognition method can no longer meet the needs of current music retrieval [1]. Classification and recognition of musical instruments in music signals has become a hot topic in the field of music classification and recognition. Timbre refers to the characteristics of sound, and the difference between W. Wang · S. Yao (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_19


everyone’s voice and the sound produced by various musical instruments is caused by the difference of timbre [2]. Audio signal is the information carrier of regular sound waves with voice, music and sound effects, in which voice and music are the main contents of audio signal. The process of music classification conforms to the general process of pattern recognition application, so the process of music classification can be designed with the idea of pattern recognition [3]. Traditional speech features usually represent signals in frequency domain or cepstrum domain, and these features assume that the statistical characteristics of signals are stable in a short time frame, that is, the signals have short-term stationarity [4]. However, these features are not suitable for describing non-stationary audio signals. Feature extraction refers to finding the expression form of the original signal and extracting the data form that can represent the original signal. Unlike text analysis, which is characterized by keywords, the features in audio data are auditory features extracted from audio, such as tone and pitch [5]. In matlab, we can use the fast Fourier transform (FFT) to draw a spectrum diagram to describe the musical instrument audio, and extract the maximum amplitude and fundamental frequency to compare and analyze the loudness, pitch and timbre of different musical instruments, so as to identify different musical instruments [6]. Melody usually refers to an organized and rhythmic sequence of several musical notes formed by artistic conception, which is characterized by a certain pitch, duration and volume, and is carried out by a single voice with logical factors. If a specific audio clip or a specific shot is found in an audio or video file, it can’t be completed by text retrieval method at all, and it must be retrieved by targeted audio or video retrieval method [7]. Traditional music retrieval based on manually annotated texts needs a lot of manpower and time, which is far from meeting the needs of current development [8]. How to further improve the efficiency and accuracy of music classification through intelligent algorithms has become the focus of current research. Babaee et al. proposed a frequency feature identification method of digital single melody musical instruments based on wavelet packet transform, and adopted spectrum sensing algorithm to construct an audio signal acquisition model. Through this model, the audio information is finely extracted, which improves the ability of frequency detection and feature extraction of digital single melody musical instruments. Xu et al. evaluated the spectrum characteristics and the performance of aggregation strategy in musical instrument recognition, and established a combined model based on shorttime amplitude spectrum and Ghost VLAD, which improved the accuracy of musical instrument audio recognition [9]. Driven by AI technology, this paper proposes a frequency identification algorithm for single melody musical instruments based on CNN. The frequency spectrum of piano music of each classification is trained as the input image of CNN, and the classification of piano music is indirectly realized through image recognition.


19.2 Methodology 19.2.1 Overall Structure of Audio Features Through the in-depth understanding of audio signals, according to the characteristics of different audio signals and the ultimate classification purpose, through the analysis of audio signals in time domain and frequency domain, select the model with high classification efficiency to train and establish the classifier model. Both voice and music signals belong to unsteady signals, but they can be considered to be steady in a short time range [10]. In the short-term analysis of audio signals, the audio signal segments are usually framed first, and the framing length should not be too short or too long. Too short a frame length (less than 10 ms) will lead to too little information in a frame signal, which is meaningless for subsequent analysis. Too long a frame length (more than 30 ms) will lead to the failure to regard the signal as a steady state and affect the extraction of signal features [11]. On the basis of shortterm characteristics, the preprocessed audio signal is windowed and cut into audio frames to extract frame features. Audio segment, also known as audio example, is composed of multiple audio segments, and its analysis is based on the audio segments. According to the short-term stationary characteristics of audio signals, the different characteristics of frame signals are statistically analyzed to describe this segment, and the characteristics obtained by statistical analysis are segment characteristics. Audio is a random process characterized by certain time statistical characteristics. Therefore, the selection of audio classifier, like features, should be able to describe and reflect the time statistical characteristics of audio signals well. At present, the commonly used audio classification methods mainly include threshold-based classification method and statistical model-based classification method. Threshold-based classification method is not universal, because when it is applied to different classifications, the threshold is uncertain and needs to be changed constantly. If the application changes, we must find a new threshold through experiments. In the audio classification system, each part is closely connected and indispensable [12]. Only by preprocessing the audio signal according to the flow and then extracting the tilt feature can we ensure the effectiveness of the feature. If the order is reversed, it will lose its meaning, so it can be said that preprocessing is the basis of the subsequent part. The audio recognition framework of a single melody musical instrument is shown in Fig. 19.1. Audio segment is the direct unit of audio, and audio tone is the component unit of audio segment. Statistic the characteristics of audio frame and analyze and calculate its segment characteristics as the input of classifier [13]. The classifier receives different kinds of audio segments as the input data of the classifier, trains the input data to get the model parameters, and improves the classification parameters through repeated training and analysis, and then summarizes and analyzes the constructed model through tests. Audio signal feature set is a feature set composed of feature items extracted from audio signals, and the construction of audio signal feature set is the basis for the normal operation of audio classification system. Audio segments


Fig. 19.1 Audio recognition framework of single melody musical instrument

with a certain length of time used to extract the characteristics of audio signals are the basic objects of audio classification research and may contain certain semantic components. The selection of audio features is the key of the whole system. Only by selecting audio features that can accurately describe audio signals can audio be correctly classified.
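A sketch of the frame-to-segment statistics described here: frame-level features are pooled into segment-level features (per-feature mean and standard deviation), and that pooled vector is what the classifier receives. The frame features are random stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level features for one audio segment:
# 120 frames x 4 features (e.g. energy, zero-crossing rate, centroid, rolloff).
frame_features = rng.normal(size=(120, 4))

def segment_features(frames):
    """Pool frame features into one segment descriptor: per-feature mean and std."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

seg = segment_features(frame_features)
print(seg.shape)   # (8,) -> this pooled vector is the input to the classifier
```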

19.2.2 Frequency Identification Algorithm of Musical Instruments in Single Melody Music

Before the audio data are classified, features are first extracted from the original audio. The key to audio data classification is therefore feature analysis, and the selection of audio features is strict: they should fully represent both the time-domain and the frequency-domain characteristics of the audio. In order to reduce the impact of the environment on the features, they are also required to be general and robust [14]. First, the extracted features are fed into the input unit; a data stream is then sent from the input unit to the hidden unit, and another data stream from the hidden unit to the output unit. For a hidden unit, let x_t denote the input at step t; the activation value of the current unit is:

$$s_t = f\left(U x_t + W s_{t-1}\right) \tag{19.1}$$

Here f is the activation function, and ReLU is used in this paper. The output at step t is computed by the Softmax layer. The value i_t of the input gate controls how much of the input at the current time step is allowed to enter the memory cell, and is computed as:

$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i\right) \tag{19.2}$$
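A minimal numpy sketch of the recurrent update in Eq. (19.1) together with the input gate of Eq. (19.2); the dimensions and the random weights are placeholders, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 8, 16

# Placeholder parameters for the recurrent activation (Eq. 19.1) and the input gate (Eq. 19.2).
U   = rng.normal(scale=0.1, size=(d_hid, d_in))
W   = rng.normal(scale=0.1, size=(d_hid, d_hid))
Wxi = rng.normal(scale=0.1, size=(d_hid, d_in))
Whi = rng.normal(scale=0.1, size=(d_hid, d_hid))
Wci = rng.normal(scale=0.1, size=(d_hid, d_hid))
b_i = np.zeros(d_hid)

relu    = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def step(x_t, s_prev, h_prev, c_prev):
    s_t = relu(U @ x_t + W @ s_prev)                              # Eq. (19.1)
    i_t = sigmoid(Wxi @ x_t + Whi @ h_prev + Wci @ c_prev + b_i)  # Eq. (19.2)
    return s_t, i_t

x_t = rng.normal(size=d_in)
s, i = step(x_t, np.zeros(d_hid), np.zeros(d_hid), np.zeros(d_hid))
print(s.shape, i.min().round(3), i.max().round(3))   # gate values lie in (0, 1)
```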

Among them, W_{xi}, W_{hi}, W_{ci} are the connection weights related to the input gate, and b_i is the bias term. Framing is performed to divide the time-domain discrete signal into overlapping frames:

$$X_{STFT}(k, n) = \sum_{m=0}^{N-1} x(n - m)\, w(m)\, e^{-j 2 k \pi m / N} \tag{19.3}$$
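A short sketch of the framed transform in Eq. (19.3) using a Hamming window; the test tone is synthetic.

```python
import numpy as np

def stft_frames(x, frame_len=1024, hop=256):
    """Windowed FFT of overlapping frames, one column per frame position (cf. Eq. 19.3)."""
    w = np.hamming(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.stack([np.fft.rfft(w * x[s:s + frame_len]) for s in starts], axis=1)

sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)          # 1 s test tone at 440 Hz

X = stft_frames(x)
peak_bin = int(np.abs(X[:, 0]).argmax())
print(X.shape, "peak near", round(peak_bin * sr / 1024, 1), "Hz")   # close to 440 Hz
```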

Among them, k is the frequency index, n is the center of the FFT window, and w(m) is the Hamming window. The strength X_{STFT}(k, n) is mapped to a twelve-dimensional vector, with each dimension representing the intensity of one semitone pitch class. The mapping formula is:

$$p(k) = \left\lfloor 12 \log_2\!\left(\frac{k \cdot f_{sr}}{N \cdot f_{ref}}\right) \right\rfloor \bmod 12 \tag{19.4}$$

Among them, f_{ref} is the reference frequency and f_{sr} is the sampling rate. The magnitudes of the frequency bins belonging to each pitch class are accumulated to obtain the pitch class profile component of each time segment:

$$PCP(p) = \sum_{k:\, p(k) = p} |X(k)|^2, \quad p = 0, 1, 2, \ldots, 11 \tag{19.5}$$
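Building on the previous sketch, the code below maps FFT bins to semitone classes as in Eq. (19.4) and accumulates their energy into a pitch class profile as in Eq. (19.5). The reference frequency and the rounding convention are assumptions for illustration.

```python
import numpy as np

def pitch_class_profile(mag, sr, n_fft, f_ref=261.63):
    """PCP(p) = sum of |X(k)|^2 over bins whose pitch class p(k) equals p (Eqs. 19.4-19.5)."""
    pcp = np.zeros(12)
    for k in range(1, len(mag)):                    # skip the DC bin
        freq = k * sr / n_fft                       # frequency of bin k
        p = int(round(12 * np.log2(freq / f_ref))) % 12
        pcp[p] += mag[k] ** 2
    return pcp / (pcp.sum() + 1e-12)                # normalised profile

# Demo: the spectrum of a 440 Hz (A4) tone should concentrate in pitch class 9 (A), with C = 0.
sr, n_fft = 22050, 4096
t = np.arange(n_fft) / sr
tone = np.hamming(n_fft) * np.sin(2 * np.pi * 440.0 * t)
mag = np.abs(np.fft.rfft(tone))
print(pitch_class_profile(mag, sr, n_fft).round(3))
```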

The pitch class profile represents the tonal content by a twelve-dimensional vector, which reflects the relative strength of the notes of the chromatic scale within each of the twelve semitone classes. When an audio stream switches between different kinds of audio, the differences between the corresponding features are large. An audio signal changes with time, and so do its characteristics; over a short interval, however, it can be regarded as approximately constant, which is the short-term property of audio. The audio signal is a non-stationary random process that changes with time. To analyze it with traditional methods, that is, assuming that the audio signal changes very slowly within a short time, the signal is divided into short segments for analysis; such a segment is the audio frame, the smallest unit of audio signal analysis. Segment features are obtained by statistical analysis of frame features, so the selection of frame features affects all subsequent steps; good frame features both characterize the audio category well and remain stable.


19.3 Result Analysis and Discussion Because of the non-stationarity of audio features, sometimes in the same music, singing and dialogue, the volume and pause change greatly, and even there may be completely different jumps, which will interfere with the segmentation, but the overall segmentation effect is very ideal, so feature extraction based on CNN can achieve satisfactory results in the segmentation of continuous audio streams [15]. In the case of a large number of unknown input data features, it is not appropriate to specify a unified static structure for different types of action sequence data with different lengths, and a lot of attempts must be made to obtain a suitable network structure. Prepare data and set parameters, that is, establish original feature data set, construct known class data set and unknown class data set, and set parameters such as k value. The fusion invariant features after dimensionality reduction can retain the key information of music style features, which can not only significantly improve the recognition performance, but also improve the recognition efficiency. Figure 19.2 shows the comparison of response times of different frequency identification algorithms. An adaptive operator is added to CNN’s local search strategy, which makes the local search scope decrease with the iterative algorithm, and then makes the local search more targeted. Compared with the high-level statistical features, using the bottom features to construct the frequency identification model of single melody musical instruments can get higher accuracy. The selection of pre-emphasis coefficient directly affects the effect of preprocessing, because the experimental data are preprocessed before audio signals are classified, and then the model construction is affected. The selection of pre-emphasis coefficient has a great influence on the experimental results and the classification accuracy of the whole classification

Fig. 19.2 Response time of different algorithms


Pre-emphasis boosts the high-frequency components of the signal, which are otherwise attenuated, and reduces the influence of sharp noise. During training, the same batch of samples is randomly perturbed in successive iterations, so its output changes. To guide the direction of model parameter optimization, the outputs obtained from the same input should be kept as consistent as possible, that is, the predicted class probabilities should be as close as possible. The efficiency of the frequency identification models of single-melody musical instruments built with different methods is tested on the MATLAB platform, and identification efficiency is evaluated by running time. The computation times for different feature dimensionality reductions are listed in Table 19.1. If an energy threshold is set, the proportion of frames below this threshold within an audio segment can be calculated. The frequency identification accuracy of single-melody musical instruments obtained with this algorithm and with the traditional FFT algorithm is listed in Table 19.2. The music library contains music sequences of different lengths: computing the sequence lengths shows that some sequences are very long while others are very short. Because the main purpose of the model design is to generate music with a recognizable musical style, and because, as explained in the previous section, genre style can be distinguished by the strength of the performance, the final music of different genres exhibits different dynamics. The music classification accuracy of the different algorithms as the number of samples varies is shown in Fig. 19.3. As can be seen from Fig. 19.3, the musical instrument frequency identification accuracy of the proposed algorithm is higher than that of the traditional FFT algorithm.

Table 19.1 Dimension reduction time of the frequency identification model for single-melody musical instruments

Musical instrument type | Training sample FFT | Training sample CNN | Test sample FFT | Test sample CNN
Saxophone | 7.49 | 6.38 | 8.45 | 6.87
Violin | 9.22 | 5.89 | 6.91 | 5.55
Erhu | 8.13 | 6.66 | 7.78 | 5.88

Table 19.2 Accuracy of the frequency identification model for single-melody musical instruments

Musical instrument type | Training sample FFT (%) | Training sample CNN (%) | Test sample FFT (%) | Test sample CNN (%)
Saxophone | 88.48 | 94.55 | 83.47 | 93.66
Violin | 84.67 | 94.35 | 86.28 | 96.75
Erhu | 86.22 | 96.79 | 86.88 | 96.34


Fig. 19.3 Accuracy of frequency identification of musical instruments with different algorithms

In this paper, unlabeled data are added to the training set, so the algorithm performs better than when labeled data are used alone, which improves recognition accuracy. With this improvement, the CNN parameters converge faster and the final classification accuracy of the model is 9.2% higher than that of the traditional algorithm. The method obtains ideal frequency recognition results for single-melody musical instruments, with recognition accuracy higher than that of other music feature recognition methods. Because audio is a random process, its characteristics exhibit certain temporal statistics, and the proposed audio classification method can fully characterize the temporal statistical characteristics of audio data.
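The chapter does not spell out how the unlabeled data are folded into training; one common semi-supervised scheme consistent with the description is pseudo-labeling (self-training), sketched below. The LogisticRegression classifier is only a lightweight stand-in for the CNN, and the threshold, round count and toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    """Repeatedly train on the labeled pool, then move confidently
    predicted unlabeled samples (with their pseudo-labels) into it."""
    X_pool, y_pool, X_rest = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        if len(X_rest) == 0:
            break
        clf.fit(X_pool, y_pool)
        proba = clf.predict_proba(X_rest)
        keep = proba.max(axis=1) >= threshold          # trust only confident predictions
        if not keep.any():
            break
        pseudo = clf.classes_[proba[keep].argmax(axis=1)]
        X_pool = np.vstack([X_pool, X_rest[keep]])
        y_pool = np.concatenate([y_pool, pseudo])
        X_rest = X_rest[~keep]
    return clf.fit(X_pool, y_pool)

# Toy demonstration with random 2-D features and two classes.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(40, 2)) + np.repeat([[0, 0], [3, 3]], 20, axis=0)
y_lab = np.repeat([0, 1], 20)
centers = np.array([[0, 0], [3, 3]])
X_unlab = rng.normal(size=(200, 2)) + centers[rng.integers(0, 2, size=200)]
model = self_training(X_lab, y_lab, X_unlab)
print(model.score(X_lab, y_lab))
```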

19.4 Conclusion

The process of music classification follows the general process of pattern recognition, so it can be designed with the ideas of pattern recognition. Traditional speech features usually represent signals in the frequency or cepstrum domain. These features assume that the statistical characteristics of the signal are stable within a short time frame, that is, that the signal is short-term stationary. In this paper, a CNN-based frequency identification algorithm for monophonic musical instruments is proposed. The frequency spectrum of each classified piano piece is used as the input image of the CNN, and the classification of piano music is realized indirectly through image recognition. The algorithm is implemented in MATLAB according to its principle. In this paper, by adding unlabeled data to the training set, the performance of


the algorithm is better than when labeled data are used alone, which improves recognition accuracy. With this improvement, the CNN parameters converge faster and the final classification accuracy of the model is higher. Different instruments differ in pitch, timbre and loudness. For note feature extraction, the harmonic (frequency-doubling) interference caused by fast rhythms still needs to be resolved. In the design of the CNN, the influence of network structure and loss function design on classification performance needs further study.
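The conclusion describes treating each spectrum as an image and classifying it with a CNN. A minimal PyTorch sketch of such a classifier is given below; the layer sizes, the 128 × 128 spectrogram resolution and the three instrument classes are illustrative assumptions rather than the configuration reported in the chapter, which was implemented in MATLAB.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Small CNN that classifies a one-channel spectrogram image."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 64), nn.ReLU(),   # 32 channels of 32x32 maps
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# A batch of four fake 128x128 spectrograms, three classes (e.g. saxophone/violin/erhu).
model = SpectrogramCNN(n_classes=3)
logits = model(torch.randn(4, 1, 128, 128))
print(logits.shape)   # torch.Size([4, 3])
```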

References 1. A. Baro, P. Riba, J. Calvo-Zaragoza et al., From optical music recognition to handwritten music recognition: a baseline. Pattern Recogn. Lett. 123(5), 1–8 (2019) 2. Y. Terchi, S. Bouguezel, Key-dependent audio fingerprinting technique based on a quantization minimum-distance hash extractor in the DWT domain. Electron. Lett. 54(11), 720–722 (2018) 3. R. Smiljanic, S. Keerstock, K. Meemann et al., Face masks and speaking style affect audiovisual word recognition and memory of native and non-native speech. J. Acoust. Soc. Am. 149(6), 4013–4023 (2021) 4. Mustaqeem, S. Kwon, A CNN-assisted enhanced audio signal processing for speech emotion recognition. Sensors 20(1), 183 (2019) 5. X. Chang, W. Skarbek, Multi-modal residual perceptron network for audio-video emotion recognition. Sensors 21(16), 5452 (2021) 6. S. Chandrakala, S.L. Jayalakshmi, Environmental audio scene and sound event recognition for autonomous surveillance. ACM Comput. Surv. 52(3), 1–34 (2019) 7. J. Xu, Z. Liu, J. Jiang et al., High performance robust audio event recognition system based on FPGA platform. Cogn. Syst. Res. 50(8), 196–205 (2018) 8. E. Babaee, N.B. Anuar, A. Wahab et al., An overview of audio event detection methods from feature extraction to classification. Appl. Artif. Intell. 31(7–10), 661–714 (2017) 9. Q. Xu, C. Zhang, B. Sun, Emotion recognition model based on the Dempster-Shafer evidence theory. J. Electron. Imaging 29(2), 1 (2020) 10. T. Dimitrova-Grekow, A. Klis, M. Igras-Cybulska, Speech emotion recognition based on voice fundamental frequency. Arch. Acoust. 44(2), 277–286 (2019) 11. H. Mukherjee, S.M. Obaidullah, K.C. Santosh et al., Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal. Int. J. Speech Technol. 21(4), 753–760 (2018) 12. Z. Huang, Study on the role of computer aided audio recognition in music conductor. Boletin Tecnico/Tech. Bull. 55(16), 476–483 (2017) 13. Y. Dong, X. Yang, X. Zhao et al., Bidirectional convolutional recurrent sparse network (BCRSN): an efficient model for music emotion recognition. IEEE Trans. Multimedia 21(12), 3150–3163 (2019) 14. J. Kocinski, E. Ozimek, Logatome and sentence recognition related to acoustic parameters of enclosures. Arch. Acoust. 42(3), 385–394 (2017) 15. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Proc. 11(7), 884–891 (2017)

Chapter 20

Design of Intelligent Evaluation Algorithm for Matching Degree of Music Words and Songs Based on Grey Clustering Yipeng Li and Sanjun Yao

Abstract The content of music, especially its melody, is the first information in people's memory, while the text description attached to a musical work is the second information. At present, music signal recognition methods have some defects, such as large error, slow speed and poor robustness to noise. A digital music signal denoising algorithm based on an improved deep belief network (DBN) is proposed. The grey clustering algorithm is applied to the intelligent evaluation of the matching degree between music words and songs, the digital music signal is effectively separated from the noise signal, and the digital music signal obtained after robust principal component analysis is then effectively denoised by the fast Fourier transform (FFT) algorithm. The results show that compared with the traditional wavelet transform, the grey clustering algorithm has obvious advantages in the later stage of operation, and the error is reduced by 32.64%. The grey clustering algorithm can greatly improve the recognition effect and recognition speed of digital music signals, supports online recognition of digital music signals, and has broad application prospects.

Keywords Grey clustering · Word-song matching · Music signal · Digital music

20.1 Introduction

With the growth of digital music technology, more and more people are accustomed to using the Internet to obtain music information for entertainment, study and business. This trend puts forward high requirements for music retrieval technology. At present, there are many websites that provide music sales and sharing services, and the music retrieval user interface adopted by them is nothing more than browsing the music list


under a certain manual classification and literally searching the textual description of music based on words [1]. Because most people like listening to music, there are many kinds of digital music, and everyone prefers different types. If the types of digital music signals are classified and identified in advance, listeners can choose the digital music they want to listen to from the signal labels [2]. This can greatly improve the management of digital music, so the classification and identification of digital music signals has become an important research direction in the field of artificial intelligence [3]. At first, digital music signals were identified manually. Manual identification involves too much subjective judgment: it not only has low efficiency but also produces large identification errors, and it cannot keep up with the development speed of modern digital music [4]. Heck et al. selected a number of professionals and non-professionals, and the comprehensive evaluation is the weighted sum of the average evaluations of both groups [5]. Zhang et al. adopted evaluation rules from music theory that consider leaps and intervals to evaluate music works [6], an approach that is effective for creating works of a specific music genre. Ge et al. used a neural network to construct a matching and evaluation model for music lyrics [7]. At present, there are digital music signal identification methods based on artificial intelligence, such as linear discriminant analysis, artificial neural networks and support vector machines [8]. These methods obtain better identification results than manual methods, but in practical applications the accuracy of linear discriminant analysis is low and cannot reflect the changing trend of digital music signals; the recognition efficiency of support vector machines is low; and artificial neural networks are not robust to noise interference [9]. In this paper, a digital music signal denoising algorithm based on the grey clustering algorithm is proposed. The grey clustering algorithm is applied to the intelligent evaluation of the matching degree between music words and songs, the digital music signal is effectively separated from the noise signal, and the digital music signal obtained after robust principal component analysis is then effectively denoised by the FFT algorithm.

20.2 Methodology 20.2.1 Representation and Extraction of Melody Features In the matching of music lyrics and songs, two important links are the source of feature data and the choice of classifier. In the source of feature data, in order to avoid the problem that single feature data can’t express all music melodies, it is necessary to explore multi-feature data extraction. The main task of the melody extraction function module is to do a series of signal processing in time domain and frequency domain on the input audio, extract melody features from it, including the frequency and rhythm of each note that constitutes the melody, and finally convert it


into the note sequence required by the matching algorithm. The melody extraction module analyzes the hummed audio melody and converts it into a note sequence [10]. The melody representation used in the system is similar to many existing studies: starting from pitch change and rhythm, each note is described by its pitch change and duration. Traditional single-feature data can classify music types, the instruments used and similar content well, but its accuracy is low when distinguishing music melodies. This is because different kinds of features have different sensitivities to emotion, so a variety of features must be combined when classifying music melodies. In general, correlation functions are used to measure the similarity of two signals in the time domain, and they can be divided into the cross-correlation function and the autocorrelation function. The cross-correlation function mainly studies the correlation between two signals. If the two signals are completely different and independent, the cross-correlation function is close to zero; if the waveforms of the two signals are the same, the cross-correlation function has peaks at the leading and lagging positions, and the similarity between the two signals can be calculated accordingly. Although there are many ways to score different kinds of music with different emotions, they can all be expressed in the frequency domain. Because the time-domain signal at a given moment contains different overlapping components, it is not convenient to extract them directly, so the signal needs to be converted into a frequency-domain signal. A frequency-domain signal separates the overlapping components of the time-domain signal by frequency and also gives the magnitude of their amplitudes. The process of music feature recognition and melody extraction is shown in Fig. 20.1.

Fig. 20.1 Music feature recognition and melody extraction

Using the Fourier transform, time-domain signals can be converted into frequency-domain signals. For a piece of music, rhythm is very important. Lyrics and melody usually agree in rhythm, for example in the tone of pronunciation; short and fast melody fragments tend to correspond to lyrics that can be pronounced quickly rather than to complicated lyrics; and under the same melody segment the corresponding lyrics are similar enough, that is, they are consistent with the melody in rhythm. Different digital music signals have different feature vectors, and the identification model of digital music signals mainly describes the mapping between feature vectors and digital music signal types. For each part, the mean and variance of all feature vectors are calculated and concatenated to obtain the audio features of the whole sentence.
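As an illustration of the correlation-based similarity measure described in this section, the sketch below compares two note sequences with a normalized cross-correlation; the pitch sequences are made-up examples, not data from the chapter.

```python
import numpy as np

def normalized_xcorr(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cross-correlation of two zero-mean, unit-norm sequences.

    A peak near 1 at some lag means the two melodies are almost identical
    up to a time shift; values near 0 indicate independent signals.
    """
    a = (a - a.mean()) / (np.linalg.norm(a - a.mean()) + 1e-12)
    b = (b - b.mean()) / (np.linalg.norm(b - b.mean()) + 1e-12)
    return np.correlate(a, b, mode="full")

# Two hypothetical pitch sequences (MIDI note numbers), one a rotated copy.
melody = np.array([60, 62, 64, 65, 67, 69, 71, 72], dtype=float)
hummed = np.roll(melody, 2)
c = normalized_xcorr(melody, hummed)
print("max similarity:", round(float(c.max()), 3),
      "at lag:", int(c.argmax()) - (len(hummed) - 1))
```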

20.2.2 Digital Music Signal Denoising Algorithm

The signal-to-noise ratio (SNR), as a criterion for evaluating noise reduction performance, is widely used to assess the noise reduction of various images and music. Pursuing a high SNR alone in digital music easily leads to a poor auditory effect, so SNR and audio quality perception are used together to evaluate the noise reduction effects of the three algorithms on digital music signals [11]. Feature vectors that describe the signal type are extracted from the denoised digital music signal and, together with the signal type, form the learning samples. Finally, the learning samples are trained with the least-squares support vector machine of the DBN algorithm to construct a digital music signal identification model. The short-term average energy E(i) of the music signal in the i-th frame can be obtained by one of the following three formulas:

E(i) = \sum_{n=1}^{N} |X_i(n)|    (20.1)

E(i) = \sum_{n=1}^{N} X_i^2(n)    (20.2)

E(i) = \sum_{n=1}^{N} \log X_i^2(n)    (20.3)

Here N is the frame length and X_i(n) is the amplitude of the audio signal at point n of frame i. The spectrogram itself covers the spectrum of all sound signals and is a dynamic spectrum. The fast Fourier transform of each frame is

X(n, k) = \sum_{m=0}^{N-1} X_n(m) e^{-j 2\pi k m / N}    (20.4)

where X_n(m) is the n-th frame of the framed audio signal and 0 \le k \le N - 1. Then |X(n, k)| is the short-term amplitude spectrum estimate of frame n, and the spectral energy density function p(n, k) is

p(n, k) = |X(n, k)|^2    (20.5)
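A small NumPy sketch of Eqs. (20.1)–(20.5) follows: it frames a signal, computes the three short-term energy variants and the per-frame power spectrum. The frame length, hop size and the random test signal are illustrative assumptions.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

x = np.random.randn(16000)                    # placeholder audio signal
frames = frame_signal(x)

# Eqs. (20.1)-(20.3): three short-term energy definitions per frame.
E_abs = np.sum(np.abs(frames), axis=1)
E_sq  = np.sum(frames ** 2, axis=1)
E_log = np.sum(np.log(frames ** 2 + 1e-12), axis=1)   # small offset avoids log(0)

# Eq. (20.4): FFT of each frame; Eq. (20.5): spectral energy density.
X = np.fft.fft(frames, axis=1)
p = np.abs(X) ** 2

print(E_abs.shape, E_sq.shape, E_log.shape, p.shape)
```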

Audio quality perception is a comprehensive evaluation parameter obtained by combining the masking threshold and distortion threshold of the signal through human ear subjective consciousness perception and artificial neural network method. The evaluation result of audio quality perception is usually negative. The larger the value, the smaller the difference between the digital music signal and the original digital music signal after noise reduction, that is, the better the digital music quality after noise reduction.

20.3 Result Analysis and Discussion

The DBN provides a hierarchical representation of the audio data, and the weights of each network layer actually encode some components of the audio: the more layers, the more information the network contains. In the one-sided continuous matching algorithm, although the strings differ by insertions and deletions, one-sided continuous matching can still be satisfied to different degrees. Listening experiments show that when the total number of matched notes of two pieces of music is the same, the longer the continuous matching length on one side, the more similar the music. The DBN processes the original audio input layer by layer, gradually recognizing the whole audio from individual features. The traditional wavelet transform (WT) algorithm is selected as the comparison object, and the experimental results are shown in Tables 20.1 and 20.2. Compared with the traditional WT, the word-song matching accuracy of this method is obviously higher. The scatter plot of the predicted value against the actual value for the traditional WT algorithm is shown in Fig. 20.2.

Table 20.1 Matching accuracy of digital music words and songs with the proposed algorithm

Sample size | Accuracy of word-song matching (%)
15 | 98.85
30 | 98.33
45 | 97.89
60 | 97.11
75 | 96.24
90 | 95.56
105 | 94.89

Table 20.2 Matching accuracy of the traditional WT algorithm for digital music lyrics

Sample size | Accuracy of word-song matching (%)
15 | 97.89
30 | 96.75
45 | 95.79
60 | 94.55
75 | 93.69
90 | 91.44
105 | 88.85

Fig. 20.2 Scatter plot of actual value and predicted value of traditional WT

The scatter plot of the predicted value against the actual value for the word-song matching model based on the grey clustering algorithm is shown in Fig. 20.3. It can be seen that the word-song matching model based on the grey clustering algorithm is better than the traditional WT in both accuracy and efficiency. The grey clustering algorithm reaches the same recognition rate at a lower training cost than the traditional WT algorithm, and obtains the highest recognition rate with the least loss. When the number of iterations increases to a certain extent, the classification error gradually approaches a fixed value; therefore, to meet the required classification error accuracy, the number of iterations should be kept as small as possible to reduce the training time. The automatically learned deep features can be used for recognition and classification, with local features as a supplement, which gives the overall model a better recognition rate. After feature extraction, the dual-channel audio features are fused by weighting, and the fusion results are classified by softmax. A comparison of the algorithms' MAE is shown in Fig. 20.4. It can be seen that compared with the traditional WT, the grey clustering algorithm has obvious advantages in the later stage of operation, and the error is reduced by


Fig. 20.3 Scatter plot of actual value and predicted value of grey clustering algorithm

Fig. 20.4 Comparison of algorithm MAE

32.64%. For digital music signals with different kinds of noise, the signal-to-noise ratio and audio quality perception achieved by the proposed method are significantly higher than those of the traditional Fourier transform algorithm and the time-domain algorithm. This shows that after noise reduction the digital music signal contains the least noise and has the highest quality, which verifies the noise reduction performance of the proposed algorithm.
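The two scalar measures quoted above, SNR and the mean absolute error (MAE) plotted in Fig. 20.4, can be computed as in the sketch below; the clean and denoised signals here are random placeholders rather than the chapter's data.

```python
import numpy as np

def snr_db(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Signal-to-noise ratio (dB) of a denoised signal relative to the clean one."""
    noise = clean - denoised
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))

def mae(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Mean absolute error between the clean and denoised signals."""
    return float(np.mean(np.abs(clean - denoised)))

clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
denoised = clean + 0.01 * np.random.randn(1000)   # stand-in for an algorithm's output
print(f"SNR = {snr_db(clean, denoised):.2f} dB, MAE = {mae(clean, denoised):.4f}")
```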


20.4 Conclusion With the development of digital music technology, more and more people are accustomed to using the Internet to obtain music information. This trend puts forward high requirements for music retrieval technology. For a piece of music, rhythm is very important. Among them, lyrics and melodies are often rhythmic, such as the tone of pronunciation; Or for short and fast melody fragments, it often corresponds to lyrics that can be pronounced shorter, rather than complicated lyrics; Under the same melody segment, the corresponding lyrics are similar enough, that is, they are consistent with the lyrics in rhythm. A digital music signal denoising algorithm based on grey clustering algorithm is proposed. The grey clustering algorithm is applied to the intelligent evaluation of the matching degree between music words and songs, and the digital music signal is effectively separated from the noise signal, and then the digital music signal after robust principal component analysis is effectively denoised by FFT algorithm. The algorithm model of music lyrics matching comprehensively considers the emotional and rhythmic characteristics between music lyrics, which can make a more accurate and intelligent evaluation of music works.

References 1. B. Ma, J. Teng, H. Zhu et al., Three-dimensional wind measurement based on ultrasonic sensor array and multiple signal classification. Sensors 20(2), 523 (2020) 2. H. Zhou, X. Feng, Z. Dong et al., Application of denoising CNN for noise suppression and weak signal extraction of lunar penetrating radar data. Remote Sens. 13(4), 779 (2021) 3. M. Van Segbroeck, A.T. Knoll, P. Levitt et al., MUPET-mouse ultrasonic profile extraction: a signal processing tool for rapid and unsupervised analysis of ultrasonic vocalizations. Neuron 94(3), 465–485 (2017) 4. D. Zhang, X. Zhuo, H. Peng et al., A method for traveltime extraction from thermoacoustic signal in range verification based on neural network algorithm. Med. Phys. 46(6) (2019) 5. M. Heck, M. Hobiger, A.V. Herwijnen et al., Localization of seismic events produced by avalanches using multiple signal classification. Geophys. J. Int. 216(1), 201–217 (2019) 6. K. Zhang, Music style classification algorithm based on music feature extraction and deep neural network. Wirel. Commun. Mob. Comput. (4), 1–7 (2021) 7. M. Ge, Y. Tian, Y. Ge, Optimization of computer aided design system for music automatic classification based on feature analysis. Comput. Aided Des. Appl. 19(3), 153–163 (2021) 8. Z. Zhang, X. Liu, Y. Ma et al., Signal photon extraction method for weak beam data of ICESat-2 using information provided by strong beam data in mountainous areas. Remote Sens. 13(5), 863 (2021) 9. D. Chaudhary, N.P. Singh, S. Singh, Development of music emotion classification system using convolution neural network. Int. J. Speech Technol. 24(3) (2021) 10. Y. Zhao, Research on resonance state of music signal based on filter model. Boletin Tecnico/ Tech. Bull. 55(15), 206–210 (2017) 11. J. Zhang, Music feature extraction and classification algorithm based on deep learning. Sci. Program. 2021(2), 1–9 (2021)

Chapter 21

Construction of Evaluation Model for Singing Pronunciation Quality Based on Artificial Intelligence Algorithms Yaping Tang, Yunfei Gao, and Yuanling Ouyang

Abstract The objective evaluation of the quality of singing pronunciation refers to the classification and comprehensive evaluation of the singer's technical indicators during singing, including sound quality, timbre, pitch strength, voice and breathing stability. The singing voice is an objective material existence with different physical properties in timbre, sound quality, sound intensity and pitch. Existing research has shown that there are obvious differences in frequency and energy between high-quality and low-quality singing voices in the frequency spectrum. This paper attempts to construct an evaluation model of singing pronunciation quality based on an artificial intelligence algorithm. Traditional subjective evaluation is greatly influenced by the appraiser: it is directly related to the appraiser's professional quality, aesthetic standards, evaluation experience, and current mood and mental state, and easily leads to unstable evaluation results. The proposed model simplifies the candidate set of the speech recognition network and improves the accuracy of pronunciation quality evaluation. Simulation results show that the candidate set of this model is obviously superior to the other two methods in pronunciation error detection. This shows that simplifying the recognition network candidate set by reducing the error of the phoneme model can improve its accuracy in identifying variant sounds.

Keywords Artificial intelligence algorithm · Singing pronunciation · Quality evaluation model



21.1 Introduction The pronunciation and communication of people in daily life are not limited by time and conditions, and are conducted within a relatively narrow range, with no higher requirements for timbre, volume, and range. The sound source of singing is also vocal cord vibration, and the resonance sites also include the oral cavity, nasopharynx, laryngopharynx, and chest cavity. However, the vocalization of singing is a mixture of true and false sounds obtained with the support of breath, the natural opening of the throat cavity, and the full expansion of the nasopharynx cavity. This mixed true and false sound changes the resonance state of the true sound, reduces the resonance components of the oral cavity, increases the resonance components of the nasopharyngeal cavity and the head cavity, and maintains a high resonance position. Therefore, there is a significant difference between the mixed true and false sound and the true sound when speaking [1]. The pronunciation of singing requires not only clear articulation and beautiful timbre, but also the expression of thousands of viewers, which requires a relatively wide range of voice, flexible volume, and rich artistic expression methods. Objective evaluation of the quality of singing pronunciation refers to the classification and comprehensive evaluation of the singing technical indicators of the singer during the performance, including sound quality, timbre, tone strength, sound, and respiratory stability [2, 3]. The combination of the above two core concepts is the use of AI methods to objectively and scientifically evaluate the quality of singing sound. AI technology refers to the use of computers and related software to simulate advanced human thinking activities, such as decision-making, problem solving, and learning. Therefore, it is also known as artificial intelligence or machine intelligence. The objective evaluation of singing pronunciation quality is a new exploration using AI algorithm to objectively evaluate singing pronunciation quality. By increasing the logarithmic posterior probability distribution distance of the phoneme model, the discrimination error of the speech to be tested is reduced [4]. In this paper, the posterior probability expectation difference of singing pronunciation is defined as the perception of variant pronunciation by the phoneme model. Firstly, the learner’s pronunciation is forcibly aligned according to the learning content, and the phoneme segmentation result and the corresponding logarithmic likelihood probability of each phoneme are obtained. Then calculate the logarithmic likelihood probability of each phoneme’s pronunciation ambiguity model corresponding to the duration of each phoneme’s pronunciation, and synthesize it to obtain the segment length normalized similarity proportional logarithm of the phoneme. Using singing pronunciation perception as a measure of the accuracy of phoneme models in discriminating aberrant speech sounds under the AI algorithm will provide a relatively objective evaluation reference for traditional subjective evaluation models. This evaluation method is greatly influenced by the evaluator’s main body, and is directly related to the evaluator’s professional accomplishment, aesthetic standards, evaluation experience, and immediate emotions and mental state, which can easily lead to unstable evaluation results [5]. Simplify the candidate set of speech


recognition network recognition models to improve the accuracy of pronunciation quality evaluation.

21.2 Two Evaluation Systems Based on Artificial Intelligence Algorithm 21.2.1 Objective Evaluation Based on the Extraction of Evaluation Parameters of Singing Voice Singing pronunciation is based on language pronunciation. If you leave the foundation of language pronunciation and unilaterally pursue vocal skills and beautiful voice, you will not be able to achieve the expected results, and the songs you sing will not be able to perfectly express the meaning of the songs, so it will not be vivid and touching [6]. The algorithm complexity of vocal music evaluation method based on feature comparison is low, and its scoring result is close to manual scoring, which is more in line with people’s subjective feelings. By analyzing the waveform of the singer’s voice and showing the singer’s shortcomings in an intuitive way, the present situation of multimedia vocal music teaching can be improved. Therefore, the pronunciation training of singing must be embodied on the basis of language pronunciation to make singing pronunciation more artistic and scientific. Adding background noise to the training data to match the noisy characteristics of the test data; In order to solve the influence of channel noise and eliminate the speaker’s individual vocal tract characteristics, the cepstrum mean variance normalization technique based on AI algorithm is adopted. MAP adaptation is made by reading data in the same area as the pronunciation to be tested, which solves the problem of low accuracy of dialect accent and pronunciation; MAP adaptation is done with natural spoken data to solve the problem of arbitrary speaking style [7]. The scoring system architecture of objective evaluation based on the extraction of singing voice evaluation parameters is shown in Fig. 21.1. In addition to evaluating the pronunciation quality of phoneme segments, the evaluation of text-independent pronunciation quality should also evaluate high-level language information such as topic relevance, grammatical rationality and fluency according to the recognition results, so the accuracy of evaluation depends largely on the accuracy of speech recognition results. The recording environment, equipment, sampling standard, singing method and voice type of the extracted object involved in the feature parameter extraction are not clearly explained [8]. The concept of “standard sound data” is not clear. If you learn to master some basic methods of singing, such as “raising your voice position and relaxing your throat to speak, it will reduce the pressure on your throat from unnecessary tension, make your voice bright and pleasant, and spread farther, which will last longer than shouting with a natural voice.”


Fig. 21.1 Scoring system architecture
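Section 21.2.1 mentions cepstral mean and variance normalization to suppress channel effects and speaker-specific vocal tract characteristics; a minimal per-utterance NumPy sketch is shown below, with a random matrix standing in for real cepstral features.

```python
import numpy as np

def cmvn(features: np.ndarray) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization.

    features: (n_frames, n_coeffs) matrix of cepstral coefficients.
    Subtracting the mean removes stationary channel effects; dividing by
    the standard deviation equalizes the dynamic range across speakers.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True) + 1e-8
    return (features - mean) / std

mfcc = np.random.randn(200, 13) * 3.0 + 5.0   # placeholder cepstral features
normalized = cmvn(mfcc)
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))
```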

21.2.2 An Objective Evaluation Mechanism Based on Subjective Evaluation Criteria Quantification Traditional vocal music scoring methods usually use linear classification mathematical analytic expressions, removing the highest and lowest scores from multiple scores, and ultimately taking the average score as the actual score. First, the input speech is passed through a large word list continuous speech recognition engine to obtain recognition results. The confusion network marks several words or phoneme sequences with the highest probability corresponding to each speech segment. Then, using the recognition results to confuse the network to calculate scoring features, including fluency features, pronunciation accuracy features, topic relevance features, and so on; Then, these features are selected and fused. He questioned the traditional scoring method and believed that the traditional vocal music scoring method ignored the non-linear relationship between various evaluation indicators and singing effects, with strong subjectivity, and could not well and truly reflect the singer’s level [9]. The first natural aspect of this process is singing under natural conditions without skills. At the initial stage of learning, when exposed to the expansion of resonance, support of breath, and training in skills such as enunciation and enunciation, one may


begin to feel very uncomfortable and difficult to sing, and may experience problems such as these, which may make one feel very unnatural. Common scoring features include a posteriori probability of frame normalization, speech speed, and duration scores. Among them, a posteriori probability of frame normalization is currently recognized as the most acceptable evaluation index that can reflect the pronunciation standard of candidates; The characteristics of speech speed and duration reflect the fluency of pronunciation. The weight value of the evaluation index is determined by using the analytic hierarchy process in turn, and the standardized fuzzy sub vectors of the comprehensive evaluation are obtained by using the synthesis operation of the fuzzy matrix. The specific score of the evaluated object is calculated based on the hierarchical score of the normalized sub vectors, and the speech feature sequence that changes over time is obtained. In the decoder module, acoustic models and language models are loaded into the search space formed by the pronunciation dictionary for speech feature sequences. When pronunciation is mutated, the acoustic feature vectors of the same phoneme significantly change compared to standard speech [10, 11]. This change in acoustic characteristics is reflected in the change in the logarithmic posterior probability distribution of phoneme models under different speech sample sets. The best word string or word map containing multiple candidate recognition results is obtained through Viterbi search; Finally, in the post processing module, the word graph obtains the recognition result confusion network through node splitting and merging [12, 13].

21.3 Evaluation Model of Singing Pronunciation Quality There is a significant difference between singing speech recognition and pronunciation quality evaluation: speech recognition needs to include nonstandard pronunciation, so training acoustic model with standard pronunciation and nonstandard pronunciation can make training and testing more compatible. Through the refinement and quantification of voice evaluation standards, an objective evaluation mechanism is established by using AI technology, thus improving the current situation that vocal music is completely graded by raters according to subjective attitudes. Strictly speaking, this is not an objective evaluation study in the pure sense, but an “objective improvement of subjective evaluation mechanism”. Thereby effectively improving the recognition performance; The task of pronunciation quality evaluation needs to strictly distinguish standard pronunciation from non-standard pronunciation, so people only use standard pronunciation for acoustic modeling. Evaluation process of singing pronunciation quality based on AI algorithm. Pronunciation quality evaluation is mainly divided into two parts. One part uses perception to select models in the perception measurement matrix; The other part uses the model candidate set to evaluate the pronunciation quality and sing the pronunciation quality evaluation model, as shown in Fig. 21.2.


Fig. 21.2 Singing pronunciation quality evaluation model

Since the logarithmic posterior probability of the phoneme model is a random variable under different sets of speech samples, statistical features are usually used to describe it. The evaluation project's settings for "stage image," "melody mastery," "song emotional expression," and other aspects related to musical feeling and performance content go beyond previous single-focus research that only examined pronunciation skills, and are more in line with the actual requirements of singing evaluation. Compared with traditional evaluation methods, more experimental samples should also be provided to improve the reliability of the research results. These issues urgently need to be resolved in research on objective evaluation mechanisms based on the quantification of subjective evaluation criteria. Therefore, this article uses the singing pronunciation quality evaluation model to calculate the difference between the logarithmic posterior probability expectations of standard speech and variant speech. The phoneme-level pronunciation standardization measure, i.e. the frame-normalized phoneme posterior probability, is computed as in Eq. (21.1):

MR(\theta_j, O_n) = \frac{1}{T_n} \ln p(\theta_j \mid O_n)    (21.1)

where \theta_j is the phoneme model, O_n the observation sequence of the n-th phoneme segment and T_n its duration in frames. The pronunciation standardization score is the average of the phoneme metrics, as shown in Eq. (21.2):

MR = \frac{1}{N} \sum_{n=1}^{N} MR(\theta_j, O_n)    (21.2)

The speech rate is calculated as in Eq. (21.3):

ROS = \frac{1}{N} \sum_{n=1}^{N} T_n    (21.3)
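A small sketch of Eqs. (21.1)–(21.3) follows. The per-frame log posterior probabilities and the phoneme segment lengths are made-up inputs standing in for the output of a forced alignment.

```python
import numpy as np

def phoneme_score(frame_log_post: np.ndarray) -> float:
    """Eq. (21.1): duration-normalized sum of per-frame log posteriors."""
    return float(frame_log_post.sum() / len(frame_log_post))

# Hypothetical alignment: per-frame log posteriors for three phoneme segments.
segments = [np.log(np.random.uniform(0.5, 1.0, size=t)) for t in (12, 8, 20)]

scores = np.array([phoneme_score(seg) for seg in segments])
MR = scores.mean()                              # Eq. (21.2): utterance-level score
ROS = np.mean([len(seg) for seg in segments])   # Eq. (21.3): mean segment duration
print(f"phoneme scores: {scores.round(3)}, MR = {MR:.3f}, ROS = {ROS:.1f} frames")
```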

It can be seen that manually annotated data only participates in the training of scoring models with only a few parameters, and its role is not fully exploited. In the process of evaluating the quality of singing pronunciation, this paper uses AI algorithm to decode the phoneme set on an unconstrained speech recognition network to obtain the probability of acoustic feature vectors. In the extraction of singing sound samples, the recording environment, recording equipment, and storage methods are not described, which makes the research conclusions based on this data less reliable. Some studies lack data such as the singer’s gender, age, singing style, voice part, and range, or necessary information such as the number of samples used, and the type of voice samples, during the development of the evaluated voice samples. It can be seen that the modeling method proposed in this article is significantly different from the traditional modeling method for pronunciation quality assessment. Firstly, both manual text segmentation and reading aloud are important bases for optimizing acoustic models; Secondly, standard pronunciation, non-standard pronunciation, and even erroneous pronunciation can all participate in the optimization of acoustic models; At the same time, the algorithm does not require manual grading to the phoneme level, but only requires a text level score. The model constructed in this paper selects models based on perceived degree in the perceptual metric matrix, and the model selection results are used to construct an unconstrained recognition network. This can reduce the discrimination error of the phoneme model for variant pronunciation.

21.4 Analysis of Experimental Results The phoneme set in each category is regarded as the confusing pronunciation set corresponding to each phoneme in the set. For example, in Division 2, the phoneme set in vowel class is regarded as the pronunciation confusing set of phonemes in vowel class, and the phoneme set in consonant class is regarded as the pronunciation confusing set of phonemes in consonant class. Specifically, the methods of data acquisition, data analysis, experimental simulation and subjective and objective comparative analysis are adopted, and the outstanding feature is that a large


number of instruments and related software are used for data acquisition and analysis experiments. Modern scientific empirical methods based on AI algorithms are gradually being accepted and understood in vocal music circles. At present, the general research methods combine subjectivity with objectivity and theory with experiment. Table 21.1 lists the performance comparison of pronunciation quality evaluation methods based on different recognition model candidate sets. From the table, it can be seen that the average number of candidate phoneme models for each phoneme in the unconstrained speech recognition network is 4.5, which is slightly lower than the average number of phoneme model sets in reference [5]. However, the average for the proposed model is the highest among the three recognition models, reaching 10.5. Figures 21.3 and 21.4 show the detection results for two typical pronunciation errors in Chinese obtained by pronunciation quality evaluation methods based on different recognition model candidate sets. It can be seen that the model candidate set in this paper is obviously superior to the other two methods in pronunciation error detection. This shows that simplifying the recognition network candidate set by reducing the error of the phoneme model can improve its accuracy in distinguishing abnormal pronunciation. The three research approaches discussed above are fundamentally different, the main difference being their research basis. Research on the objective evaluation mechanism based on the extraction of singing voice evaluation parameters rests on data collection and analysis of the measured and compared sounds. The relevant software and hardware devices used in the research, as well as the measured sound samples, are used to evaluate the consistency of the two groups of scores, which is determined by the degree of correlation between them. The method can be flexibly adjusted within the standard framework according to the actual situation in different AI scenarios, combines general skills with specialized skills, adapts to the scene, and application examples demonstrate the high availability of the standard. Therefore, correlation can be used not only to test the credibility of manual scoring, but also to evaluate the performance of machine scoring, which has a decisive impact on subsequent research steps and results.

Table 21.1 Performance comparison of pronunciation quality evaluation methods for different candidate sets of recognition models

Recognition model | PPV mean | Mean number of phoneme model sets | Correlation with manual scoring
Literature [5] model | 0.412 | 4.5 | 0.745
Literature [8] model | 0.856 | 7.6 | 0.912
Model in this paper | 1.024 | 10.5 | 1.123


Fig. 21.3 Comparison of vowel and vowel error detection in different recognition models

Fig. 21.4 Comparison of tone error detection for different recognition models

21.5 Conclusions

The study of singing voice quality evaluation based on artificial intelligence technology is a relatively objective line of research, and its results do not deny traditional subjective evaluation but provide objective evaluation and verification of it. The extraction and analysis of evaluation parameters covers formant, range, fundamental frequency, average energy, frequency error, range error and other parameters. We ask singers


to sing like talking, or to speak like singing, which contains the dialectical relationship between talking and singing. This is because the musicality of language and melody also have an interactive relationship. When we speak like singing, we require the tonal expression of language to be as rhythmic as singing. This paper constructs an evaluation model of singing pronunciation quality based on artificial intelligence algorithm. This model uses phoneme model perception to select the corresponding candidate set of recognition model for each phoneme, thus reducing the discrimination error of phoneme model to abnormal speech and improving the accuracy of phoneme model pronunciation quality evaluation. Experiments show that the candidate set of this model is obviously superior to the other two methods in pronunciation error detection. This shows that by reducing the error of phoneme model to simplify the identification of network candidate set, the accuracy of phoneme model in distinguishing abnormal pronunciation can be improved.

References 1. H. Zhu, G. Huang, Pronunciation quality evaluation model for English reading speech based on acoustic model adaption and support vector regression. J. Guilin Univ. Electron. Technol. 68(19), 48–59 (2022) 2. L. Yang, H. Du, B. Ding, Voice quality evaluation of singing art based on 1DCNN model. Math. Probl. Eng. 20(39), 58–69 (2022) 3. H. Nakata, L. Shockey, The effect of singing on improving syllabic pronunciation—vowel epenthesis in Japanese, vol. 67, no. 10 (2022), pp. 45–72 4. F. Baills, Y. Zhang, Y. Cheng et al., Listening to songs and singing benefitted initial stages of second language pronunciation but not recall of word meaning. Lang. Learn. 65(20), 55–64 (2021) 5. G.H. Lee, T.W. Kim, H. Bae et al., N-singer: a non-autoregressive Korean singing voice synthesis system for pronunciation enhancement, vol. 47, no. 11 (2021), pp. 36–59 6. E. Demirel, S. Ahlback, S. Dixon, Computational pronunciation analysis in sung utterances, vol. 37, no. 10. arXiv e-prints (2021), pp. 17–55 7. J. Wang, Exploration on the artistic value and realization path of the nationalization of Bel Canto singing. Adv. High. Educ. 3(3), 112–201 (2019) 8. F. Chen, R. Huang, C. Cui et al., SingGAN: generative adversarial network for high-fidelity singing voice generation, vol. 45, no. 18 (Hindawi Limited, 2021), pp. 42–62 9. H. Nakata, L. Shockey, The effect of singing on improving syllabic pronunciation—vowel epenthesis in Japanese. Lang. Learn. 39(20), 15–44 (2021) 10. N. Takahashi, M. Kumar, Singh et al., Hierarchical diffusion models for singing voice neural vocoder. Adv. High. Educ. 33(7), 14–38 (2022) 11. Z. Wang, Q. Wu, Research on automatic evaluation method of Mandarin Chinese pronunciation based on 5G network and FPGA. Microprocess. Microsyst. 80(3), 103534–103547 (2021) 12. Z. Tong, Research on the model of music sight-singing guidance system based on artificial intelligence. Complexity 20(18), 31–52 (2021) 13. E. Wei, Intonation characteristics of singing based on artificial intelligence technology and its application in song-on-demand scoring system, vol. 18, no. 3 (Hindawi Limited, 2021), pp. 15–28

Chapter 22

Design and Optimization of Intelligent Composition Algorithm Based on Artificial Intelligence Yuxi Chen and Chunqiu Wang

Abstract In the current stage of music composition, the importance of computer composition has been gradually highlighted and people have paid attention to it. The appearance of algorithmic composition provides a brand-new method for music creation, makes music creation more diversified and popular, and provides another possibility for people to communicate with music. Artificial intelligence (AI) algorithm is to control the generation of note sequences through some strategy, and then form musical melodies, and finally get a complete score. This method requires a lot of musical knowledge rules. This article proposes a music feature extraction model based on improved genetic algorithm (GA), which provides technical support for computer intelligent composition. In the stage of composing, the given music is encoded in a certain way, and the encoded music is called chromosome. The crossover and mutation operators in GA are used to “evolve” the music, and the fitness function is used to measure the evolution result, and so on until the final satisfactory solution is found. The test results show that the model improves the local part without affecting other components, and has high scalability, and the time and space complexity of the algorithm is good, which proves that the algorithm in this article is feasible and effective. Keywords Intelligent composition · Music feature extraction · Artificial intelligence · Genetic algorithm



22.1 Introduction

In the current stage of music composition, the importance of computer composition has gradually been highlighted, and people have paid increasing attention to it [1]. Such a system will not only produce musically strong segments, making music more diversified, but can also imitate a composer and automatically generate ideal music. The innovation of computer music composition is based on computer composition skills combined with AI algorithms to form an automatic composition system that meets the final needs of users [2]. An AI algorithm controls the generation of the note sequence through some strategy, composes the melody, and finally produces a complete score; this method requires a large amount of musical knowledge rules [3]. GA is a search algorithm based on the ideas of natural selection and survival of the fittest [4]. In the early 1950s, some biologists tried to simulate biological systems with computers, which gave rise to the basic idea of GA [5]. Neville points out that GA guides learning and determines the direction of search through simple genetic operations and natural selection (survival of the fittest) on a set of coded representations [6]. Laleh argues that GA mainly tries to use a formalized process to reduce the degree of human intervention by composers in music creation and to realize innovative computer composition [7]. In this article, an intelligent composition model based on an improved GA is proposed. In the composition stage, the given music is first encoded in a certain way; the encoded music is called a chromosome, and the crossover, mutation and other GA operators are used to "evolve" the music.

22.2 Model and Algorithm Design 22.2.1 Music Feature Extraction Due to the complexity of music information itself, there will be inaccurate and nonstandard phenomena when encoding music. The effect of GA composition depends to a large extent on the performance of the system for music knowledge. The more fully the system expresses music, the more accurate the work will be [8]. In the AI composition process, the interaction between machine and machine is the most frequent, and also the largest amount of computation. Autonomous interaction of machines not only occurs in the part of deconstruction, classification and reconstruction, but also between smaller units or larger modules [9]. Therefore, although people play a major role in human–computer interaction, the autonomous interaction ability of machines cannot be ignored. Due to the complexity and abstraction of the music structure, many current systems still have many shortcomings in the design of reasoning structure, so many current systems are only used to produce relatively short music fragments. The structural scale of the segment is generally not too long, but the form is complete and unified, which can reflect the distinct music image.


Fig. 22.1 GA stage of computer intelligent composition

In the main melody, this appears as the interval difference between the pitch of the ending note of each phrase in the passage and the tonic of the mode. The GA stage of computer intelligent composition is shown in Fig. 22.1. To express music information more accurately, the system encodes music with the interval-difference method, that is, the pitch feature is recorded as the interval difference from the previous note. This encoding better expresses the characteristics of the melody line, is more suitable for GA operations on music, and avoids the large pitch changes within a phrase that arise after GA processing when pitch values are encoded directly. For chromosome k with fitness f_k, its selection probability s_k is calculated as follows:

s_k = \frac{f_k}{\sum_{i=1}^{pop\_size} f_i}    (22.1)

Then the fitness values of all chromosomes in the population are summed:

F = \sum_{i=1}^{pop\_size} f_i    (22.2)

For each chromosome, the selection probability c_k is calculated:

c_k = \frac{f_k}{F}    (22.3)

For each chromosome, the cumulative probability t_k is calculated:

t_k = \sum_{i=1}^{k} c_i    (22.4)
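Equations (22.1)–(22.4) describe standard fitness-proportionate (roulette-wheel) selection; a short sketch under that reading is given below, with a made-up fitness vector.

```python
import numpy as np

def roulette_select(fitness: np.ndarray, n_select: int, rng=None) -> np.ndarray:
    """Fitness-proportionate selection following Eqs. (22.1)-(22.4).

    Selection probabilities are fitness / total fitness; their cumulative
    sums define the 'wheel' against which random numbers are spun.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = fitness / fitness.sum()      # Eqs. (22.1)-(22.3)
    cum = np.cumsum(probs)               # Eq. (22.4)
    spins = rng.random(n_select)
    return np.searchsorted(cum, spins)   # index of the first cumulative value >= spin

fitness = np.array([3.0, 1.0, 6.0, 2.0])  # hypothetical chromosome fitnesses
print(roulette_select(fitness, n_select=6))
```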

In the stage of encoding, an automatic encoder is used to extract features from music content. Generally speaking, in this process, vocabulary can be classified and changing features can be known through classification [10]. The hidden layer of the automatic encoder will produce a starting value. This value is then fed forward to the decoded part to reshape the generated music content. It is often used in nested automatic encoders. In the main theme, it is expressed as the interval difference between the pitch of the ending notes of each phrase in the paragraph and the debugging tonic. Therefore, using music segments as the operating objects of GA can generate music segments with complete musical thoughts. In order to be able to produce the music pieces needed by users according to their emotional factors. The system gives each paragraph a quantity of fuzzy features, which reflect the musical style of the paragraph in a fuzzy percentage.

22.2.2 Intelligent Composition Algorithm

The emotion of a piece of music depends largely on its rhythm: a fast pace is easily associated with passion and excitement, while a slow pace is relaxing. Therefore, after careful consideration, a brand-new rhythm generation method is proposed, which effectively avoids the huge running cost that an iterative procedure like the one used for pitch would incur for note lengths, while ensuring the quality of the final work. For the existing populations, the related genetic operations are needed [11] in order to promote the emergence of a new generation, continuously drive the evolution and development of the population, and let it make continuous progress through successive genetic operations, thereby further improving the quality of the population. In practice, this method has gradually attracted attention. When the user specifies a certain emotion, a rhythm is randomly selected from the corresponding category in the rhythm library and then matched with the pitch obtained by the GA, which yields the output music. This rhythm library can be extracted manually from various classic songs. Of course, it is best to introduce appropriate variation in the algorithm to achieve diversity. This scheme can be regarded as a compromise between efficiency and effectiveness. To realize the "evolution" of the genetic algorithm, it is necessary to ensure that better genes have a greater chance to survive and produce offspring, that is, to be "selected". Based on this idea, chromosomes with large fitness values should possibly be selected from the population many times, while chromosomes with small fitness values may lose the opportunity to be selected [12]. If the rhythm is simply extracted from a few pieces of music and then stiffly inserted into the generated music, it


Therefore, after repeated weighing, it is suggested to extract rhythms from a larger music library and, at the same time, vary them randomly within a controlled range to reduce repetition. The selection operation is the stage in which individuals with strong vitality are chosen from the group to produce a new population: the GA uses selection operators to keep the best and eliminate the worst individuals. If the exact value of a piece's feature description is higher, its fitness is lower and its chance of being selected is smaller. Information entropy is used to evaluate the amount of information in each track of the MIDI file:

H(p(x)) = -\sum p(x) \log_2 p(x)    (22.5)

where p(x) is a probability density function. With the input sample X = (x_1, x_2, x_3, ..., x_n), the connection weights w_{ij} and the thresholds \theta_j, the input s_j of each intermediate-layer unit is calculated as:

s_j = \sum_{i=1}^{n} w_{ij} x_i - \theta_j, \quad j = 1, 2, 3, \ldots, p    (22.6)

Then the output b_j of each middle-layer unit is calculated:

b_j = f(s_j) = \frac{1}{1 - e^{-s_j}}, \quad j = 1, 2, 3, \ldots, p    (22.7)

The global error of the network is calculated as:

E = \frac{1}{2}\sum_{i=1}^{q} (y_i - c_i)^2    (22.8)

The changing intensity of the rhythm can be expressed through the length and intensity of the notes in a piece of music:

Rhy = \sum_{i=1}^{n-1} \frac{|I_{i+1} - I_i|}{D_i}    (22.9)
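To make the selection step above concrete, the following sketch computes the information entropy of a track's pitch distribution, in the spirit of Eq. (22.5), and performs fitness-proportional (roulette-wheel) selection so that chromosomes with larger fitness can be picked several times. Function names and the choice of roulette-wheel selection are illustrative assumptions.

```python
import math
import random
from collections import Counter

def track_entropy(pitches):
    """Information entropy of a track's pitch distribution, cf. Eq. (22.5)."""
    counts = Counter(pitches)
    total = len(pitches)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def roulette_select(population, fitnesses, k):
    """Fitness-proportional selection: fitter chromosomes may be chosen several times."""
    total = sum(fitnesses)
    probs = [f / total for f in fitnesses]
    return random.choices(population, weights=probs, k=k)

# usage
pitches = [60, 60, 62, 64, 64, 64, 67]
print(round(track_entropy(pitches), 3))
parents = roulette_select(["seg_a", "seg_b", "seg_c"], [0.9, 0.5, 0.1], k=2)
```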

In the mutation stage, the characteristics of the original model must be combined closely, and the model is then mutated in two forms, horizontal and vertical. Because the evolution stage follows the principle of model music, the characteristics and coherence of the music must be considered. Crossover is relatively easy to carry out during the evolution of the model: the GA performs crossover with a relatively small probability, combining two parent individuals to form offspring for the next generation. To preserve the continuity of the music, single-point crossover is adopted. Because a paragraph is a structural combination of several phrases, the corresponding phrases of the two paragraphs are crossed pairwise, uniformly and independently, so that the musical structure between phrases is preserved. Carried out in this way, the crossover operation keeps the characteristics of the music itself intact while enhancing the diversity of the music segments, so note-level crossover is adopted in the evolution stage, as sketched below. For the music works that have been presented, users can evaluate them by themselves and make choices according to their own standards.
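The following sketch shows one way to implement the phrase-wise single-point crossover just described: corresponding phrases of two parent paragraphs are crossed in pairs so the paragraph structure survives. It is a minimal illustration under the assumption that phrases are plain pitch lists with at least two notes each.

```python
import random

def phrase_crossover(parent_a, parent_b):
    """Single-point crossover applied phrase by phrase.

    parent_a / parent_b: lists of phrases (each phrase a list of notes).
    Corresponding phrases are crossed in pairs so the paragraph structure is preserved.
    Assumes each phrase contains at least two notes.
    """
    child_a, child_b = [], []
    for pa, pb in zip(parent_a, parent_b):
        point = random.randint(1, min(len(pa), len(pb)) - 1)   # single crossover point
        child_a.append(pa[:point] + pb[point:])
        child_b.append(pb[:point] + pa[point:])
    return child_a, child_b

# usage: two 2-phrase paragraphs
a = [[60, 62, 64, 65], [67, 69, 71, 72]]
b = [[55, 57, 59, 60], [62, 64, 65, 67]]
child_a, child_b = phrase_crossover(a, b)
```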

22.3 Result Analysis and Discussion

In traditional composition, the composer encodes musical information in a score; when a performer takes the finished piece and plays it, the encoding is decoded. In this decoding stage the score plays a role, but it cannot completely record the whole process by which the composer encoded the musical information. If the computer GA is used to create music, this gap disappears: the computer makes the composer's encoding easier and records the composer's information in full detail. When composing with a GA, how to realize musical creativity becomes a question every creator has to consider, and the evaluation of the finished works is an important way to judge their quality. To verify the performance of the intelligent composition model based on the improved GA proposed in this article, this section compares different models by experiment; the composition model is simulated on the Matlab platform. Figure 22.2 shows the influence of the number of iterations on the loss value. The results show that the larger the number of iterations, the more often the weight parameters are learned and adjusted, which improves the accuracy of the model to a certain extent.

Fig. 22.2 Effect of iteration number on loss value

The computer can also decode automatically, converting the digital signal into an analog signal. When a composer works with the computer GA, the computer receives the composition signal, immediately converts the composer's score information into sound, and lets the composer perceive the music in time through headphones or speakers. The computer can present these music signals directly in the sequencer's software interface. With these indicators displayed, composition becomes a dynamic process: composers can see the effect of their music and its shortcomings in time, re-encode the music to perfect it, and complete the creation within this dynamic operation process. The error results of different music feature extraction algorithms are shown in Fig. 22.3; the feature extraction algorithm in this article has the smallest error and the better performance. After the mode and beat are selected and adjusted, the user can choose the beat denominator, pitch, tempo and instrument with which the music is played. If a music segment in the result satisfies the user, it can be stored in the system's music segment library; the characteristic percentage values of this segment are the percentages the user initially entered, and the exact value of its feature description is zero.

Fig. 22.3 Error results of different music feature extraction algorithms

Fig. 22.4 Comparison of algorithm recall results

The comparison of recall results is shown in Fig. 22.4. After all the phrases in a paragraph have been crossed, the exact value of the feature description of the newly generated sub-paragraph is calculated from the exact values of its parents' feature descriptions and the crossover points of the phrases. Because the interval difference between the beginning and end of a parent phrase is inherited completely by one of the sub-segments, an appropriate weight can be assigned when calculating the exact value of the sub-segment's feature description. The feature extraction accuracy of the different algorithms obtained in the experiments is shown in Fig. 22.5; the accuracy of this model is higher. The results verify the validity and correctness of the model proposed in this article, which can provide technical support for optimizing computer intelligent composition algorithms. Describing the fitness function of algorithmic composition with the stylized characteristics of music is a brand-new idea, and users can produce the music fragments they need according to their own emotional factors. It is feasible for the GA to create music with music pieces as the unit: through reasonable coding, a systematic music-section library and constraints on the genetic operations, music sections with a complete musical structure can be generated, and the more accurately the musical knowledge is expressed, the more accurate and harmonious the generated music becomes.


Fig. 22.5 Comparison of feature extraction accuracy of different algorithms

22.4 Conclusions

Possible values are selected from each parameter in turn to form a series of parameter values, which can be varied by reversing or playing the current series retrograde. An intelligent composition model based on an improved genetic algorithm is proposed: a fitness function measures the result of each evolution step, and the process continues until a satisfactory solution is found. The consistency between frames in the final generated sequence is good. The results show that the music feature extraction accuracy of this model is high, verifying the validity and correctness of the model proposed in this paper, which can provide technical support for optimizing computer intelligent composition algorithms. Through reasonable coding, a systematic music fragment library and constraints on the genetic operations, music fragments with a complete musical structure can be generated; the more accurately the musical knowledge is expressed, the more accurate and harmonious the generated music becomes. With the support of big-data platforms, and as the overall level of AI technology and users' requirements for intelligent composition systems rise, the interactive relationship will not only make breakthroughs in the independent interaction of machines but also develop in other aspects of human-computer interaction.


References

1. J. Bobrysheva, S. Zapechnikov, On the key composition for post-quantum group messaging and file exchange. Procedia Comput. Sci. 190(5), 102–106 (2021)
2. K. Hastuti, A. Azhari, A. Musdholifah et al., Rule-based and genetic algorithm for automatic gamelan music composition. Int. Rev. Model. Simul. 10(3), 202 (2017)
3. R.D. Prisco, G. Zaccagnino, R. Zaccagnino, EvoComposer: an evolutionary algorithm for 4-voice music compositions. Evol. Comput. 28(2), 1–42 (2019)
4. S. Lee, Y.J. Yoon, S. Kang et al., Enhanced performance of MUSIC algorithm using spatial interpolation in automotive FMCW radar systems. IEICE Trans. Commun. 101(1), 163–175 (2017)
5. M. Rhoades, Exploring the nexus of holography and holophony in visual music composition. Leonardo Music J. 30(1), 64–70 (2020)
6. U. Neville, S.N. Foley, Reasoning about firewall policies through refinement and composition. J. Comput. Secur. 26(2), 207–254 (2018)
7. T. Laleh, J. Paquet, S. Mokhov et al., Constraint verification failure recovery in web service composition. Future Gener. Comput. Syst. 89(12), 387–401 (2018)
8. K.L. Hagan, Textural composition: aesthetics, techniques, and spatialization for high-density loudspeaker arrays. Comput. Music J. 41(1), 34–45 (2017)
9. X. Xu, Z. Liu, Z. Wang et al., S-ABC: a paradigm of service domain-oriented artificial bee colony algorithms for service selection and composition. Future Gener. Comput. Syst. 68(3), 304–319 (2017)
10. T. Louge, M.H. Karray, B. Archimede et al., Semantic web services composition in the astrophysics domain: issues and solutions. Future Gener. Comput. Syst. 90(1), 185–197 (2018)
11. P. Ombredanne, Free and open source software license compliance: tools for software composition analysis. Computer 53(10), 105–109 (2020)
12. Y. Lei, J. Zhang, Service composition based on multi-agent in the cooperative game. Future Gener. Comput. Syst. 68(3), 128–135 (2017)

Chapter 23

Design of Computer-Aided Music Generation Model Based on Artificial Intelligence Algorithm

Wenyi Peng, Yaping Tang, and Yuanling Ouyang

Abstract With the increasing demand for spiritual and cultural life in modern society and the deepening of cultural system reform, computer-aided music generation has become an increasingly important topic. This paper studies a computer-aided music generation model based on an AI (artificial intelligence) algorithm. Starting from the LSTM (long short-term memory) network, the computer-aided music generation network is redesigned, and a music-style sub-network is proposed on this basis: a separate sub-network is applied to each type of musical style, and each sub-network analyzes music of a different style, so that several tasks are completed in parallel. Experiments show that the proposed algorithm generalizes well and can generate songs automatically. In terms of the quality of the generated samples, comparison with existing generation algorithms shows that the samples are reliable.

Keywords Artificial intelligence · Computer-aided · Music · Music generation · LSTM

W. Peng · Y. Tang (B): College of Music and Dance, Hunan University of Humanities, Science and Technology, Loudi 417000, China. e-mail: [email protected]
Y. Ouyang: Changsha Human Resources Public Service Center, Changsha 410000, China
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_23

23.1 Introduction

With the increasing demand for spiritual and cultural life in modern society and the deepening of cultural system reform, computer-aided music generation has become an increasingly important topic [1]. From each parameter, possible values can be selected in a certain order to form a group of parameter values, which can change with the current order or its reversal [2]. AI (artificial intelligence) music design is a research topic involving music creation, music technology, AI, automatic accompaniment and so on.


The appearance of these new arrangement forms has played a positive role in releasing people's musical imagination, benefiting musicians and promoting changes in the music industry. Although music is an art, it is strongly computable, and the beauty of mathematics lies behind musical patterns: the repetition of melody, modulation, intervals, rhythm, the arrangement and combination of pitches in harmony, parallelism in musical form, and so on can all be described and modelled by algorithms [3]. Literature [4] uses a hidden Markov chain to realize a real-time music generation system with manual tuning. Literature [5] uses LSTM (long short-term memory) to learn musical knowledge; considering the coordination of musical structure, the final model can improvise a piece of music. Literature [6] applies the neural language model Transformer to the generation of piano music. This model is composed entirely of attention units; compared with RNN (recurrent neural network) methods such as LSTM, it has better long-range dependence modelling ability and can better learn correlations at different scales in a sequence. Literature [7] carries out rule-based AI programming: the rules of melody and harmony generation are based on the combination rules of notes and chords, and the AI component of the system determines the applicability of the melody and chord generation rules and their weights. With the outstanding achievements of AI algorithms in image recognition, video detection, natural language processing, speech processing and other fields, and their wide application in related areas, they have been studied in depth [8–10]. Automatic composition is research that tries to use a formal process to minimize human involvement in computer music creation. This paper studies a computer-aided music generation model based on an AI algorithm and designs and implements a new model through a variation of the LSTM network. With this model, music of different genres and styles can be generated at the same time, which makes a certain contribution to computer-aided music generation.

23.2 Research Method

23.2.1 Data Preprocessing

Before training on the music data, the data in the music library need simple preprocessing, including quantifying the data, eliminating data that do not meet the requirements, and normalizing the data formats. An automatic arranging algorithm based on deep learning is usually a learning system built with the theory and technology of neural sequence modelling for a given application scenario. It needs to be guided by a certain amount of knowledge of the music field, but it demands far less expert knowledge than a knowledge-base expert system.
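A minimal sketch of such preprocessing is shown below: it reads a MIDI file and keeps only the notes that meet a simple requirement (here, falling inside the piano range). The use of the pretty_midi package is an assumption for illustration; the chapter does not name its MIDI toolkit.

```python
import pretty_midi   # assumed MIDI parsing package; the chapter does not name one

def load_note_sequence(path, min_pitch=21, max_pitch=108):
    """Read a MIDI file and return a time-ordered list of (pitch, start, duration).

    Notes outside the piano range are discarded, a simple example of eliminating
    data that do not meet the requirements before training.
    """
    pm = pretty_midi.PrettyMIDI(path)
    notes = []
    for inst in pm.instruments:
        if inst.is_drum:
            continue
        for n in inst.notes:
            if min_pitch <= n.pitch <= max_pitch:
                notes.append((n.pitch, n.start, n.end - n.start))
    return sorted(notes, key=lambda n: n[1])

# usage (hypothetical file name)
# sequence = load_note_sequence("example.mid")
```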


Fig. 23.1 Binary vector representation

General music usually consists of a melody and its harmonic progression. In a melody, at most one note sounds at any moment, and a period of silence is called a rest. A musical work composed of only one main melodic line is the most basic form, such as an unaccompanied vocal cantata. In polyphonic music, by contrast, several melodic parts with independent meaning are combined at the same time as the music unfolds, forming rich and varied textures; such multi-part music with independent melodic meaning is called polyphony. Harmony arrangement then has to let the high, middle and low voices each present a distinctive melody while keeping the work harmonious and fluent as a whole, which usually requires the arranger to master music theory and have experience. Each MIDI event takes the time represented by the Delta-Time described above as its starting time, and each track block contains a number of MIDI events. Ordinary MIDI events represent the sequence information and control information of the notes in a MIDI track; they can be roughly divided into three categories (notes, controllers and system information) and share a unified "category + parameters" structure. Compared with processing scanned images of scores or sound files, symbolic data are relatively convenient to process [11]: most of the information can be converted accurately into text or numeric formats that general statistical or information-retrieval tools can operate on. A simple symbolic representation is therefore adopted in which the state information of the symbols is encoded as vectors; specifically, this paper encodes the note state with a binary vector method, depicted in Fig. 23.1. Encoding the state information in binary vectors makes the model easier to train. To improve the accuracy and effectiveness of the model, the optimal features must be selected for feature extraction. In this music library, however, some MIDI music is synthesized by software while other pieces come from live performances. MIDI is good at expressing note information such as pitch and strength, and it can also carry control signals such as volume and vibrato. The neural network used for music generation may have a conventional layered structure, but more often it does not: it may contain a convolutional structure, a recurrent structure, or an autoencoder structure. When extracting features from the music signal, it must be considered that silent segments contain no note features, so the silent segments should be removed from the signal first to reduce their interference with feature extraction.


Music segments and silent segments can be distinguished by calculating short-time energy values. After windowing and framing the music signal, the signal of the k-th frame is x_k(n), and its short-time energy E_k is calculated as:

E_k = \sum_{n=0}^{N-1} x_k^2(n)    (23.1)

where N is the frame length.
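The short sketch below computes this frame energy and drops low-energy (silent) frames. The frame length, hop length and threshold ratio are illustrative assumptions rather than the chapter's settings.

```python
import numpy as np

def short_time_energy(signal, frame_length=1024, hop_length=512):
    """Per-frame energy E_k = sum_n x_k(n)^2, cf. Eq. (23.1)."""
    energies = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        energies.append(float(np.sum(frame ** 2)))
    return np.array(energies)

def drop_silent_frames(signal, frame_length=1024, hop_length=512, threshold_ratio=0.05):
    """Keep only frames whose energy exceeds a fraction of the maximum frame energy."""
    e = short_time_energy(signal, frame_length, hop_length)
    keep = e > threshold_ratio * e.max()
    frames = [signal[i * hop_length:i * hop_length + frame_length]
              for i, k in enumerate(keep) if k]
    return np.concatenate(frames) if frames else signal[:0]
```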

The difference between music segments and silent segments is reflected in their energy: the energy of music segments is larger than that of silent segments, so the short-time energy can be used to identify and discard the silent parts. The key to controlling the local contour preference of the melody with the method in this paper is to define contour control labels that are expressive yet easy for users to construct. For any melody sample x^{(n)}, let c^{(n)} be a discrete numerical sequence with c_i ∈ {1/m, 2/m, ..., 1}, 1 ≤ i ≤ n, m ≤ 88. The contour control label sequence is defined as:

c^{(n)*} = \arg\min_{c^{(n)}} \sum_{i}^{n-1} \max\left(1 - \frac{c_{i+1} + c_i}{x_{i+1} - x_i + o},\ 0\right)    (23.2)

where o is a very small positive real constant that prevents the denominator from being zero. In general, for a given discrete sequence, the same c^{(n)*} can be inferred from several different melody sequences x^{(n)}, so the original melody sample cannot be recovered directly from c^{(n)*}.

23.2.2 Implementation of AI Algorithm

In recent years, with the appearance of AI technologies such as deep learning and their wide application in real life, they have been studied deeply in theory and technology, bringing the multi-dimensional integration of modern technology and music into a new era. Computer-aided music generation based on an AI algorithm uses the deep learning methods of AI to let the computer learn the rules contained in music data, so that the computer itself can compose music; in this way the user's manual intervention in computer music creation is minimized. In terms of compositional technique and creative power, formal tension can be embodied through deformation, variation, tonality and form; from the perspective of perceptual thinking, a motif can build artistic tension through extension, diffusion, fracture, recombination and reproduction.


In practical applications, deep learning based on AI is a key technology; compared with other methods it has the advantages of self-learning, strong memory and fast optimization. LSTM is a special kind of RNN that effectively alleviates the instability of RNN training and handles long sequences well. The forget gate, input gate and output gate all use the Sigmoid activation function, while the candidate memory cell usually uses tanh. Music is formed by ordering and combining notes in a time series, which matches the sequence-processing characteristics of LSTM. In this study an LSTM network is used to learn and train on the time-series note data, and finally music with a repetitive melodic structure is fitted [12]. Python is chosen for model development mainly because its syntax is simple and easy to use and it offers many packages for data analysis and processing, as well as packages and libraries for music processing. Based on the LSTM network, the computer-aided music generation network is redesigned, and a music-style sub-network is proposed on this basis: a separate sub-network is used for each type of musical style, and each sub-network analyzes music of a different style, so that several tasks are completed in parallel. The structure of the network is shown in Fig. 23.2. The number of neurons differs from layer to layer, because the number of layers and of cells per layer strongly influences the network; the input layer has 128 neurons, and the numbers of neurons in the other hidden layers, together with the other detailed parameter choices, are explained in the following sections. The network is a multi-task network that can analyze music of different genres.

Fig. 23.2 Network structure

The prediction of each note in a melody sample sequence is regarded as a multi-class classification problem, so the loss of each note can be measured with the cross entropy of the model's prediction, and the reconstruction loss of the whole sequence is obtained by summation, formalized as:

l_x = -E\left[\frac{1}{n}\sum_{i}^{n} X_t^T \log \hat{X}_t\right]    (23.3)

In this formula, X_t^T is the one-hot vector sequence of the training sample, and the cross-entropy loss of a single note is obtained by taking its inner product with the logarithm of the probability vector \hat{X}_t that the model outputs after Softmax normalization. On this basis the constructed generation network is trained until a better network is obtained, which then generates a predetermined number of synthetic note signals. The input layer of the note prediction network model takes n notes at a time, and the first input note sequence can be n consecutive notes selected at random from the training set:

X = (note_{l+1}, note_{l+2}, \ldots, note_{l+n})    (23.4)
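The windowing of Eq. (23.4) can be implemented with a simple sliding-window routine, as in the sketch below; the window size n=4 in the usage example is an arbitrary illustration.

```python
import numpy as np

def make_training_pairs(notes, n=32):
    """Build (input window, next note) pairs: X = (note_{l+1}, ..., note_{l+n}), y = note_{l+n+1}."""
    X, y = [], []
    for l in range(len(notes) - n):
        X.append(notes[l:l + n])
        y.append(notes[l + n])
    return np.array(X), np.array(y)

# usage: a toy pitch sequence
notes = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69]
X, y = make_training_pairs(notes, n=4)
print(X.shape, y.shape)   # (6, 4) (6,)
```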

During training, a common problem when the data set is not very large is over-fitting: generalization becomes poor, and the accuracy of the model suffers. The Dropout method is therefore used between the hidden layers to avoid over-fitting and further improve accuracy; during a given learning step the dropped nodes do not participate and their weights are not updated. Adding Dropout between hidden layers can be written as:

r = mask * f(Wv + b)    (23.5)

Here mask is a binary vector that obeys a Bernoulli distribution: with probability p an entry is 1, and otherwise it is 0.
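As a sketch of how the note-prediction network, Dropout and the cross-entropy loss of Eq. (23.3) fit together, the following Keras code builds a stacked LSTM model with a 128-unit first layer. The layer sizes, dropout rate and use of the Keras API are assumptions for illustration (the chapter mentions TensorFlow but does not give its exact architecture); inputs must be reshaped to (samples, window, 1) and labels one-hot encoded before training.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def build_note_model(window=32, vocab_size=128, dropout_rate=0.3):
    """Stacked LSTM note-prediction network with Dropout between hidden layers."""
    model = Sequential([
        LSTM(128, input_shape=(window, 1), return_sequences=True),
        Dropout(dropout_rate),                      # r = mask * f(Wv + b), cf. Eq. (23.5)
        LSTM(128),
        Dropout(dropout_rate),
        Dense(vocab_size, activation="softmax"),    # one unit per one-hot note symbol
    ])
    # cross-entropy reconstruction loss in the spirit of Eq. (23.3)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_note_model()
model.summary()
```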

23.3 Experimental Analysis

The materials used in this paper are classical piano MIDI files, covering various forms of classical music. The music clips are sent by the sequencer over MIDI to a digital piano and converted into sound. Five music genres are selected for analysis; each genre has 180 training pieces and 20 test pieces, that is, 200 pieces per genre. An adaptive artificial neural network with five layers is built with TensorFlow; the input layer is a fully connected layer that receives the input sequence. By arranging the 300 notes produced by 300 predictions in order, a generated note sequence is obtained; a piano is assigned as the instrument of this sequence, it is matched with a piano sound source, and a piano piece in audio format is finally obtained. On this basis, several LSTM-based music generation network models are built and their convergence is examined through loss curves. Figure 23.3 shows one model, with its different runs, and how its loss changes over a training period.

Fig. 23.3 Experimental data of learning rate curves of different models

This project proposes an improved algorithm based on LSTM that has stronger generalization performance and achieves a higher music production rate. The results show that the designed neural network predicts and generates notes well; the generated notes are of high quality and meet the design goal. Because it is difficult to establish objective criteria for the quality of the generated music, an anonymous listening evaluation was carried out with 15 student volunteers. All models, including the model in this paper, generated four groups of melody samples of 64 notes each; ten samples were randomly selected from each group to form four participating sample groups, which were submitted anonymously to each volunteer for comparison. Each volunteer then chose the group he or she considered best and the group considered worst; the results are summarized in Fig. 23.4. The model in this paper received the best evaluation from seven people, who all felt that its samples sounded more thematic than random note jumps and that the melodies were more beautiful and harmonious than those of the other groups. The test results show that the sample quality of this method is significantly better than that of the contrast models. The model provides a more flexible and effective approach to music creation: the quality of the samples shows that it can produce satisfactory music, and their credibility shows that, in theory, a large number of music samples can be obtained.
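The rendering step described at the start of this section, turning the 300 predicted notes into a piano piece, could be implemented roughly as follows. The pretty_midi package, the fixed note length and the file name are assumptions for illustration only.

```python
import pretty_midi

def notes_to_midi(pitches, path="generated.mid", note_length=0.4, velocity=90):
    """Render a predicted pitch sequence as a piano MIDI file."""
    pm = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)   # program 0 = Acoustic Grand Piano
    t = 0.0
    for p in pitches:
        piano.notes.append(pretty_midi.Note(velocity=velocity, pitch=int(p),
                                            start=t, end=t + note_length))
        t += note_length
    pm.instruments.append(piano)
    pm.write(path)

# usage: write the generated note sequence of the experiment to disk
# notes_to_midi(generated_pitches, "sample_01.mid")
```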


Fig. 23.4 Evaluation and comparison

23.4 Conclusion

Although music is an art, it is strongly computable, and the beauty of mathematics lies behind musical patterns. The repetition of melody, modulation, intervals, rhythm, the arrangement and combination of pitches, and the parallelism of musical forms can all be described and simulated by mathematical methods. This paper studies computer-aided music generation based on an AI algorithm and designs and implements a new model through a variation of the LSTM network. With this model, music of different genres and styles can be generated at the same time, which makes a certain contribution to computer-aided music generation. The results show that the proposed approach has better generalization ability and completes the music generation task well. As for the quality of the generated samples, comparison with existing non-interactive generation methods through blind human listening shows that the model can generate satisfactory music, and music-theoretic analysis of a large number of generated samples confirms their reliability.


References

1. D. Bisharad, R.H. Laskar, Music genre recognition using convolutional recurrent neural network architecture. Expert. Syst. 36(4), 13 (2019)
2. J. Zhang, Music feature extraction and classification algorithm based on deep learning. Sci. Program. 2021(2), 1–9 (2021)
3. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Proc. 11(7), 884–891 (2017)
4. Y. Matsumoto, R. Harakawa, T. Ogawa et al., Music video recommendation based on link prediction considering local and global structures of a network. IEEE Access 99, 1 (2019)
5. L. Xin, E. Cai, Y. Tian et al., An improved electroencephalogram feature extraction algorithm and its application in emotion recognition. J. Biomed. Eng. 34(4), 510–517 (2017)
6. D.G. Bhalke, B. Rajesh, D.S. Bormane, Automatic genre classification using fractional Fourier transform based Mel frequency cepstral coefficient and timbral features. Arch. Acoust. 42(2), 213–222 (2017)
7. J.M. Ren, M.J. Wu, J.S.R. Jang, Automatic music mood classification based on timbre and modulation features. IEEE Trans. Affect. Comput. 6(3), 236–246 (2015)
8. P. Hoffmann, B. Kostek, Bass enhancement settings in portable devices based on music genre recognition. J. Audio Eng. Soc. 63(12), 980–989 (2016)
9. B. Mathias, B. Tillmann, C. Palmer, Sensory, cognitive, and sensorimotor learning effects in recognition memory for music. J. Cogn. Neurosci. 28(8), 1 (2016)
10. Z. Yu, Research on multimodal music emotion recognition method based on image sequence. Sci. Program. 2021(12) (2021)
11. J.C. Wang, Y.S. Lee, Y.H. Chin et al., Hierarchical Dirichlet process mixture model for music emotion recognition. IEEE Trans. Affect. Comput. 6(3), 261–271 (2015)

Chapter 24

Construction of Electronic Music Classification Model Based on Machine Learning and Deep Learning Algorithm

Yipeng Li and Sanjun Yao

Abstract The identification of electronic music (EM) signals is a pattern recognition problem. Current EM signal identification methods suffer from defects such as large error, slow speed and poor robustness to noise. To address these problems, based on the idea of deep learning (DL) and the structural characteristics of the NN, this paper designs a music genre classification model that takes the spectrogram as input, providing a new approach to audio classification and recognition. Finally, the model is used in classification simulation experiments on various EM signals. The experiments show that, compared with other existing EM classification models, the EM classification model proposed in this project effectively shortens the construction time of the EM classifier, accurately identifies various types of EM, and effectively improves classification accuracy.

Keywords Machine learning · Deep learning · Electronic music · Classification model

Y. Li · S. Yao (B): College of Music and Dance, Huaihua University, Huaihua, China. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_24

24.1 Introduction

Digital music has become increasingly popular, and the major music companies have shifted their product focus to digital albums; physical albums such as tapes and CDs are ever harder to find [1]. Different music has different styles, accompanying instruments and other components, and music with different hierarchical structures and characteristics can be grouped into different genres. Because most people like listening to music, there are many kinds of electronic music (EM), and everyone prefers different types [2]. If the types of EM signals are classified and identified in advance, listeners can choose the EM they want to hear from the signal labels, which can greatly improve the management level of EM, so the classification and identification of EM signals has become an important research direction in the field of artificial intelligence [3].


EM is music made with electronic musical instruments and related technologies; the instruments exchange music data through digital interfaces, synthesizers, sequencers and computers [4]. Computer audio-visual information processing has been studied in growing depth, and artificial intelligence technology can enable computers to understand music. DL can be applied to build feature recognition modules, perform adaptive feature fusion on EM, and adjust the fusion with an adaptive mechanism so that the features are fused efficiently [5]. The fused feature factors are then taken up by a NN, a distributed structure is introduced for multi-layer perceptual classification, and the special frequency characteristics of EM are used to construct an EM classification model. A single EM feature provides only limited information, which makes it hard to describe the specific content of EM accurately and to classify it correctly. EM features include short-term energy features, time-domain features, frequency-domain features and many others, and through them the detailed content of EM can be described [6]. Artificial-intelligence-based EM signal identification methods subsequently appeared and obtained better results than manual methods, but in practice they still have shortcomings: the accuracy of EM signal identification by linear discriminant analysis is low, because it assumes a fixed linear relationship between the feature vectors and the EM signal types and therefore cannot reflect the signals faithfully [7]. Overall, applying traditional machine learning (ML) methods to music genre classification requires manual feature design and relies on professional knowledge and experience in audio signal analysis; the steps are cumbersome, and accuracy improvements hit a bottleneck. DL methods, such as the NNs popular in recent years, can provide new ideas for music genre classification modelling [8]. In this paper, the classification and recognition of music genres is taken as the research direction: one-dimensional audio files are processed by the short-time Fourier transform, the Mel transform and the constant-Q transform to generate spectrograms and related data, and a convolutional neural network automatically learns and extracts acoustic features such as rhythm, pitch and chords from these images to build a music genre classification model.

24.2 Constructing the EM Classification Model

24.2.1 Overall Structural Design

The adaptive multi-feature fusion of EM is essentially a screening process. When the background rhythm frequency of the EM changes drastically, the features of the electronic sound effects change continuously, and the feature results obtained from a single fusion pass are too uniform to be used for classification [9]. At present, EM features are typically modelled and analysed with a single feature, and the amount of information a single feature can provide is limited, making it difficult to describe the type of EM fully.


Fig. 24.1 System hardware structure diagram

This article therefore extracts multiple features for EM classification. First, the EM signals are collected; because EM signals are continuous, they are divided into frames so that the classification features can be extracted more effectively [10]. The hardware part of the system consists of an audio acquisition module, an audio processing module, a storage module and a power module; the overall hardware structure is shown in Fig. 24.1. The acquisition interface passes the received signal to the audio processing module, where it undergoes analog-to-digital conversion and amplification and is saved to the storage module; at the same time it is transmitted to a computer through an audio device interface and recognized by the software as EM. When there is a differential gradient in the EM, it is compensated according to the size of the gradient; where the electro-acoustic features are not significant, an adaptive berth method is used to supplement them. The supplementary features are recorded at each stage so that the EM features are expressed and integrated at multiple levels. The high and low sound fields exhibit different fusion characteristics, and collecting and extracting the features of the EM effects at different levels ensures that the fusion is adequate. Considering the functional requirements of the system, TI's TLV320AIC23 chip (AIC23) is selected; it supports MIC and LINE IN input modes and provides programmable gain adjustment for audio input and output. The programmable gain lets users control and adjust the audio signals and modify parameters online. The interaction between the user and the programmable gain function is realized with a coded switch: users set the switch to the appropriate position according to their needs, so that the selected gain ratio is more accurate and meets more precise control requirements.

24.2.2 Multi-Layer Perceptual Feature Classification Processing

For multi-layer perceptual feature classification, an NN multilayer perceptron (MLP) divides the classification process into three layers: the input layer, the classification layer (one or more layers) and the output layer. The network classification framework of the NN consists of neurons that take up the feature fusion factors and solve linear classification problems that a single-layer perceptron cannot; the method can represent both multiple features and multiple classification paths. On this basis a cepstral coefficient based on the characteristics of human hearing is used: after taking the logarithm of the amplitude spectrum, the coefficients can be divided into several frequency bands. Because of their strong discriminability and correlation, a Fourier transform and a power shift are applied to remove the correlation between noise and amplitude before analysis. This article provides the basic architecture of the EM classifier. As can be seen from Fig. 24.2, when constructing the EM classification model, various types of original EM data are first collected and denoised; the denoised EM is divided into frames and subjected to endpoint detection to obtain effective EM signals. This reduces the computational load and improves the classification speed. The factors assigned to the classification layer are classified there several times and allocated to different neurons according to their characteristics. Assuming each neuron accepts only one feature factor, the output layer calls the weights of the neurons, but what it outputs are the feature factors the neurons carry. The NN multilayer perceptron is used to classify identical feature factors: when the number of imported neurons equals the number of output neurons, the output feature factors are separated by the multilayer perceptron. Each neuron in the classification layer is an independent individual with its own connection path, which effectively eliminates bidirectional classification of the feature factors.
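A minimal sketch of such an MLP classifier over fused feature vectors is given below, using scikit-learn as a stand-in implementation. The hidden-layer sizes and train/test split are illustrative assumptions and not the chapter's configuration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_mlp_classifier(X, y, hidden=(64, 32)):
    """Input layer -> hidden classification layers -> output layer over fused EM features.

    X: one fused feature vector per clip, y: genre labels.
    """
    model = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=hidden,
                                        activation="relu", max_iter=500))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model.fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)
```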


Fig. 24.2 Classification framework of EM model

24.3 Construction of the EM Classification Model Based on ML and DL Algorithms

24.3.1 NN Algorithm Model for ML Optimization

This section describes the EM signal identification method based on the ML algorithm. Neural networks are based on the principle of empirical risk minimization, while the support vector machine is based on the principle of structural risk minimization, and the learning effect of the NN is clearly lower than that of the support vector machine; the support vector machine is therefore described in detail below. The least squares support vector machine is a recently popular ML algorithm with faster learning and better performance than NNs, so it is chosen to establish the EM signal identification model. Let the training sample set of EM signal identification features and signal types be {(x_i, y_i)}, i = 1, 2, ..., n, with x_i ∈ R^n and y_i ∈ R, where x_i and y_i are the identification features and the type of the EM signal, respectively. The decision function is shown in Eq. (24.1):

f(x) = \omega^T \varphi(x) + b    (24.1)

Equation (24.1) is transformed and solved as shown in Eq. (24.2):

\min \ \frac{1}{2}\|\omega\|^2 + \frac{1}{2}\gamma \sum_{i=1}^{n} \varsigma_i^2    (24.2)

s.t. \ y_i - \omega^T \varphi(x) + b = e_i    (24.3)

where γ is the regularization parameter of the least squares support vector machine. Because solving Eq. (24.3) directly is complicated, its equivalent Lagrangian form is established, as shown in Eq. (24.4):

L(\omega, b, \xi, a) = \frac{1}{2}\omega^T\omega + \frac{1}{2}\gamma\sum_{i=1}^{n}\xi_i^2 + \sum_{i=1}^{n} a_i\left(\omega^T \varphi(x_i) - b + \xi_i - y_i\right)    (24.4)

According to optimization theory, the kernel function is chosen as shown in Eq. (24.5):

k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)    (24.5)

where σ is the width of the radial basis kernel.
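scikit-learn does not ship a least squares SVM, so the sketch below uses the standard SVC with an RBF kernel as a stand-in to show how a kernel classifier of the form of Eq. (24.5) would be trained on the EM features; note that sklearn's gamma corresponds to 1/(2σ²), and the hyperparameter values are illustrative assumptions.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_rbf_svm(X, y, gamma="scale", C=10.0):
    """RBF-kernel SVM: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2)), cf. Eq. (24.5)."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=C))
    model.fit(X, y)
    return model
```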

After digital filtering, a sliding Hamming window is used for windowing and framing so that the audio characteristics remain stable within each frame. Framing uses overlapping frames, the overlap being the frame shift, so that the transition between frames is smooth and continuity is maintained. Once smooth transitions are ensured and the truncation effect is reduced, the endpoint detection step follows. Endpoint detection is key to EM signal identification and strongly influences the subsequent feature extraction: it finds the start and end points of a single tone in noisy audio, suppresses the noise of the silent segments, and reduces the amount of data, the computation and the processing time. After the classifier is determined, the tones of the training set are fed into it; each time the 60-dimensional feature vector of a tone is input, the probability of each tone computed by the hidden layer and the output layer is obtained.

24 Construction of Electronic Music Classification Model Based …

245

The probabilities lie between 0 and 1, and the output is the tone with the maximum value. It is compared with the note corresponding to the input MFCC feature to decide whether they are the same, and the final result is output, completing the EM signal recognition.
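One plausible reading of the 60-dimensional feature vector is 20 MFCCs stacked with their first and second deltas, as sketched below with librosa; both the toolkit choice and the 20+20+20 split are assumptions, since the chapter does not state how the vector is composed.

```python
import numpy as np
import librosa   # assumed audio front end; the chapter does not name its toolkit

def mfcc_60(path, sr=22050, n_mfcc=20):
    """One 60-dimensional feature vector per tone: 20 MFCCs plus first and second deltas,
    averaged over the frames of the clip."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2]).mean(axis=1)   # shape (60,)
```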

24.3.2 Analysis of Experimental Results

The following experiments were conducted to verify the soundness of the intelligent EM classification model design based on reasonable weight allocation. The experiments cover 10 kinds of music, namely Blues, Classical, Country, Disco, Hip-hop, Jazz, Metal, Pop, Reggae and Rock, with 100 pieces of each kind, from which music features were extracted to obtain music fragments. The test mainly targets EM with excessive modulation: the designed DL EM signal recognition system is used to measure the decoding time of the audio files, and a traditional EM signal recognition system is tested under the same conditions for comparison. The number of samples of each type of EM is shown in Table 24.1. Of the 10 categories, the traditional classification had the lowest accuracy of 100, while the new classification had the lowest accuracy of 300; the maximum accuracy of the conventional classification is 20, while that of this classification is 200. On this basis a fuzzy-clustering-based method is introduced, which effectively improves the classification accuracy of fuzzy clustering and ensures the accuracy of its results. Music of different periods is classified differently; rock songs, which are classified with high accuracy, are selected as the research object to check whether the classification accuracy drops under different training periods. The results are shown in Fig. 24.3.

Table 24.1 Ten sample distribution of EM categories

Number of EM types   EM name             Sample size
1                    Country music       300
2                    Jazz                100
3                    Rhythm and blues    100
4                    Popular bel canto   200
5                    HIP-HOP music       50
6                    Folk rhyme          300
7                    Rock and roll       100
8                    Film music          20
9                    World music         200
10                   Gospel song         100


Fig. 24.3 Classification accuracy corresponding to different time periods

As shown in Fig. 24.3, the classification accuracy for rock music peaks at around 8 s and then gradually decreases over time. This indicates that there is no fixed position in a piece that separates high from low classification accuracy; rather, every moment in these pieces contributes to the classification information, so the classification is only correct at certain times. Here, the classification output rate refers to the proportion of data the classifier can output, which indirectly reflects the EM method: data that have not undergone multi-feature classification cannot be output, and data that cannot be classified correctly are isolated and cannot be output either, as can be seen in Fig. 24.4.

Fig. 24.4 Comparison results of classification output rate


The accuracy of EM signal identification with the ML algorithm is higher than that of the support vector machine and the BP NN: the error rate of EM signal identification is reduced, the problem that current identification results are unsatisfactory is solved, and the superiority of this method's identification results is demonstrated. The established NN model, using distributed parallel information processing and relying on the relationships among a large number of internal nodes, achieves fast processing.

24.4 Conclusion

The DL-based EM signal identification system plays a very important role in the development of EM: it can assist professional grading examinations and is also suitable for non-professionals learning music. Under the same conditions, identification simulation experiments were carried out against the classical method; the accuracy of EM signal identification with the ML algorithm is well above what practical applications require, and the identification error is lower than that of the classical method. Considering the dynamic characteristics of music, the traditional classification method is improved and an intelligent EM classification model based on reasonable weight distribution is proposed: the dynamic parameters of the model are obtained with a weight-distribution steganalysis algorithm, the music is then modelled, and the resulting vectors are used as the sequence model from which the classification results are obtained. Comparative tests show that the system overcomes the shortcomings of the traditional identification system and shortens the decoding time for heavily transferred files, making it suitable for real-life applications, convenient for people from all walks of life to learn about and understand EM, and a contribution to the development of EM.

References

1. P. Wang, E. Fan, P. Wang, Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recogn. Lett. 34(12), 40 (2021)
2. J. Hu, H. Zhang, Recognition of classroom student state features based on deep learning algorithms and machine learning. J. Intell. Fuzzy Syst. 40(2), 51 (2021)
3. Q. Liu, L. Sun, A. Kornhauser et al., Road roughness acquisition and classification using improved restricted Boltzmann machine deep learning algorithm. Sens. Rev. 60(8), 44 (2019)
4. X. Xia, J. Yan, Construction of music teaching evaluation model based on weighted Naïve Bayes. Sci. Program. 52(2), 27 (2021)
5. Y. Xu, Q. Li, Music classification and detection of location factors of feature words in complex noise environment. Complexity 38(9), 29 (2021)
6. Y. Zheng, The use of deep learning algorithm and digital media art in all-media intelligent electronic music system. PLoS One 66(5), 37 (2020)


7. H. Chen, G. Ren, X. Guo et al., Construction and validation of a management model for intraoperatively acquired pressure injury based on PISETI theory. Asian J. Surg. 63(3), 13 (2023)
8. T. Liu, Electronic music classification model based on multi-feature fusion and NN. Mod. Electron. Tech. 73(9), 46 (2018)
9. D.T. Muhamediyeva, N.A. Egamberdiyev, Algorithm and the program of construction of the fuzzy logical model, in 2019 International Conference on Information Science and Communications Technologies (ICISCT), vol. 53, no. 8 (2019), p. 11
10. W. Shi, F. Shuang, Research on music emotion classification based on lyrics and audio, in 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), vol. 63, no. 8 (IEEE, 2018), p. 33

Chapter 25

Design of Piano Automatic Accompaniment System Based on Artificial Intelligence Algorithm

Xinwen Zhang and Chun Liu

Abstract Research on computer artificial intelligence (AI) music composition has already achieved some results, and computer automatic accompaniment has emerged; its greatest advantage is that the accompaniment task is completed instantly while respecting the main theme of the song. This paper designs a piano automatic accompaniment system based on an AI algorithm. A bidirectional recurrent neural network (RNN) built from long short-term memory (LSTM) units is chosen to solve the harmony arrangement problem, and an attention mechanism is introduced so that the output of the neural network depends not only on the input at the current moment but also on the inputs at other moments. Training with this mechanism makes the model's output conform better to music theory and therefore come closer to the nature of music. The results show that, compared with the average network score of the system in the literature, the average network score of the designed system is higher by 3.028 points, which fully shows that the designed system arranges and mixes music accompaniment better.

Keywords Artificial intelligence · Piano automatic accompaniment · LSTM

X. Zhang · C. Liu (B): College of Music and Dance, Huaihua University, Huaihua 418008, Hunan, China. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_25

25.1 Introduction

A piano automatic accompaniment system is built on specific mechanical and electronic devices and plays the piano automatically from music data. Such systems play an important role in modern musical art and everyday cultural life. Accompaniment arrangement is one of the key links in music creation, and if it is done poorly it directly affects how the music is expressed. Arranging an accompaniment means adding a certain enhancement effect to the music according to its own melody and basic professional knowledge of music theory [1, 2].


At present, research on computer artificial intelligence (AI) music composition has achieved some results, and computer automatic accompaniment has emerged, whose greatest advantage is that the accompaniment task can be completed instantly while respecting the main theme of the song. Algorithmic composition is an attempt to use computers and a formal procedure to minimize the role of human beings in composition, and the artistic style of the music it produces is influenced by how the rules are set. Algorithmic composition extends certain pattern-based composition concepts onto the computer; even before computers appeared, both oriental and western music had a certain formal foundation [3]. Literature [4] describes a melody creation system developed with recurrent neural network technology. Literature [5] proposes an unsupervised dimensionality reduction method, and the experimental results show that it extracts music features more effectively than previous methods. Literature [6] uses a hidden Markov chain to realize a real-time music generation system with manual tuning. Literature [7] uses a support vector machine (SVM) classifier to recognize notes, with an accuracy as high as 98.2%. Research on piano automatic accompaniment belongs to the field of automatic harmonization of a melody; it is a research branch of algorithmic composition, namely the study of multi-voice algorithmic composition systems. In this paper, a piano automatic accompaniment system based on an AI algorithm is designed. The main notes and harmony components of each segment are extracted by multiple-fundamental-frequency estimation and used as the features and labels for training the neural network, so that it gains the ability to arrange harmony. A bidirectional recurrent neural network (RNN) based on long short-term memory (LSTM) is selected, and an attention mechanism is introduced on top of the encoding-decoding model. After training, a monophonic melody is fed into the model to obtain a harmony sequence, and the note onset information is then used to synthesize the music.

25.2 Research Method

25.2.1 Design of the Piano Automatic Accompaniment Algorithm

The quality of a musical work is ultimately judged by people, and different people have different musical preferences, accomplishments and aesthetic standards. This differs from evaluating the solution of a mathematical problem: there are many different "solutions" for configuring the harmony or piano accompaniment of a song, including different harmony sequence strategies [8], the use of different stylized chords, and different ways of organizing the multi-voice texture. Through auditory evaluation, creators can at any time adjust the various creative techniques used in their works so that the final work meets their expectations.


Apart from the melody part, the other parts mainly set off and enrich the melody. Polyphonic music, by contrast, consists of several melodic parts with independent meaning that are combined simultaneously as the music moves, forming rich and varied textures; multi-part music whose parts have independent melodic meaning is called polyphony. Timbre is the attribute that distinguishes two tones that are otherwise the same; different instruments have different timbres, and people easily tell instruments apart by these differences. In music processing and analysis, channel messages are very important parameters besides the public (system-wide) information. A channel message is generally aimed at a single MIDI channel, of which there are 16 in total; because channel messages actually carry a great deal of content, they are further divided into voice messages and mode messages. A MIDI event represents the message content of MIDI music in a short event, composed as shown in Formula (25.1):

MIDI event = ⟨delta-time⟩⟨MIDI message⟩    (25.1)
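The delta-time plus message structure of Eq. (25.1) can be inspected directly from a MIDI file, for example with the mido package as sketched below; the choice of mido and of reading only the first track are assumptions for illustration.

```python
import mido   # assumed MIDI I/O package for illustration

def list_note_events(path):
    """Walk one track and collect each event as <delta-time><MIDI message>, cf. Eq. (25.1)."""
    mid = mido.MidiFile(path)
    events = []
    for msg in mid.tracks[0]:
        # msg.time is the delta time in ticks since the previous event on this track
        if msg.type in ("note_on", "note_off"):
            events.append((msg.time, msg.type, msg.note, msg.velocity))
    return events
```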

Multi-fundamental-frequency estimation of piano music is more difficult than that of other polyphonic music [9]; the main difficulties are loss of the fundamental frequency, harmonic overlap and inharmonicity. Loss of the fundamental frequency means that, in the spectrum of a piano note, the fundamental frequency component is very weak or even completely absent compared with the harmonic components, which mostly occurs for notes in the bass region. The research object of this paper is piano music, and the piano is a semi-harmonic instrument: the fundamental frequency and the harmonic frequencies of a note are not in a perfect integer-multiple relationship but show some deviation. For semi-harmonic instruments, the relationship between the fundamental frequency and the harmonic frequencies is:

f_k = k · f_0 · √(1 + β(k² + 1))    (25.2)

where β represents the inharmonicity factor, f_0 the fundamental frequency, k the harmonic index, and f_k the frequency of the k-th harmonic.
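As a quick numerical sketch of Formula (25.2) as reconstructed above, the following computes the first few partials of a bass note; the inharmonicity factor value is purely illustrative:

    import math

    def partial_frequency(f0, k, beta):
        # Formula (25.2): f_k = k * f0 * sqrt(1 + beta * (k**2 + 1))
        return k * f0 * math.sqrt(1 + beta * (k ** 2 + 1))

    f0, beta = 55.0, 4e-4   # A1 and a hypothetical inharmonicity factor
    for k in range(1, 6):
        print(k, round(partial_frequency(f0, k, beta), 2))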

Deep learning saves the step of constructing features by hand: it can learn appropriate and effective features from large amounts of very complex data and use those features effectively in practical tasks. This advantage has made deep learning widely used in many fields and has driven its development. Simply put, deep learning can automatically mine information with hidden special relationships from large amounts of data. The purpose of the harmony arrangement system studied in this paper is to arrange harmony for the melody, that is, to add a harmonic effect to each single tone in the melody. Because the processed data are sequence data, this paper chooses a bidirectional RNN with LSTM units to solve the harmony arrangement problem.

On this basis, a neural network model based on attention is proposed. Training with this mechanism makes the model output conform better to music theory and thus come closer to the nature of music [10, 11]. The encoding-decoding model is selected as the network model; the bidirectional LSTM-based RNN is used in the encoding part, and the attention mechanism is introduced. The structure of the attention-based bidirectional LSTM encoding-decoding model is shown in Fig. 25.1.

Fig. 25.1 Coding-decoding model structure

Firstly, the output of the previous moment is combined with the hidden state of the encoding part and passed through a Dense layer (a fully connected layer) with output dimension 1. Then the outputs of all Dense layers are activated by the softmax function to obtain the attention probability distribution. The attention mechanism does not require the encoder to compress all inputs into a single fixed-length intermediate vector; the inputs are instead converted into a sequence of vectors, and when decoding, a subset closely related to the current output is selected from this sequence [12]. The output at a certain moment is computed as shown in Formula (25.3):

y^(i) = g(y^(i−1), s^(i), c^(i))    (25.3)

where g is a nonlinear function.
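A minimal NumPy sketch of the attention step described above; the dimensions, the reduction of the Dense layer to a single weight vector, and the random values are illustrative assumptions rather than details taken from the paper:

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    T, H = 16, 64                          # melody length and hidden size (hypothetical)
    enc_states = np.random.randn(T, H)     # bidirectional-LSTM encoder states
    s_prev = np.random.randn(H)            # decoder hidden state at the previous step
    w, b = np.random.randn(2 * H), 0.0     # the Dense layer with output dimension 1

    # score each encoder state against the decoder state, then softmax -> attention weights
    scores = np.array([w @ np.concatenate([s_prev, h]) + b for h in enc_states])
    alpha = softmax(scores)                # attention probability distribution
    c = alpha @ enc_states                 # context vector c^(i) used in Formula (25.3)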

The combination of bidirectional LSTM and attention allows the model to learn the importance of each melody note within the whole sequence, in both the forward and the backward direction, which is in line with human intuition. The classifier selected in this paper is a linear Softmax classifier, and its calculation formula is shown in Formula (25.4):

f(x_i, W, b) = W x_i + b    (25.4)

where W is the parameter matrix and the dimension of the bias vector b equals the number of categories. What the classification layer needs to train are this parameter matrix and bias vector. The loss function of the Softmax classifier is the cross-entropy loss, so the loss of the i-th sample computed by the Softmax classifier is shown in Formula (25.5):

L_i = −f_(y_i) + log Σ_j e^(f_j)    (25.5)
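A small NumPy sketch of Formulas (25.4) and (25.5); the class count, feature dimension and random values are illustrative assumptions:

    import numpy as np

    def softmax_cross_entropy(f, y):
        # f: raw scores W x_i + b for one sample; y: index of the true class
        # Formula (25.5): L_i = -f_y + log(sum_j exp(f_j)), with a max shift for stability
        f = f - np.max(f)
        return -f[y] + np.log(np.sum(np.exp(f)))

    W = np.random.randn(3, 4)      # 3 hypothetical classes, feature dimension 4
    b = np.zeros(3)
    x_i = np.random.randn(4)
    scores = W @ x_i + b           # Formula (25.4)
    print(softmax_cross_entropy(scores, y=1))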

In the encoding-decoding model with attention, each output takes the information of the whole input sequence into account through the assigned weights. This matters for harmony arrangement, because harmony has not only a vertical structure, in which several tones sound at the same time, but also a horizontal structure, in which successive tones are connected; the harmony output at a given moment is therefore related to the melody tones before and after it, and the distribution of nearby tones must be considered when arranging harmony for a tone in the melody. This paper encourages the appearance of completely consonant intervals and suppresses dissonant intervals. Assuming the note a_(t−1) = C is detected at the previous moment, the completely consonant intervals of the tone C are [C, F, G] and the dissonant intervals are [D, b2, B], and the following reward is constructed:

R_m5(s_(1:t), a_t) = { 0.5 if a_t ∈ [C, F, G];  −0.8 if a_t ∈ [D, b2, B] }    (25.6)
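A toy sketch of the reward in Formula (25.6); the behaviour for tones outside both listed sets is an assumption (treated as neutral here), since the paper only specifies the two cases:

    def reward_m5(a_t, consonant=("C", "F", "G"), dissonant=("D", "b2", "B")):
        # Formula (25.6): reward consonant tones, penalize dissonant ones,
        # relative to the previously detected note C
        if a_t in consonant:
            return 0.5
        if a_t in dissonant:
            return -0.8
        return 0.0  # assumption: tones in neither set receive a neutral reward

    print(reward_m5("F"), reward_m5("B"), reward_m5("A"))   # 0.5 -0.8 0.0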

Because the output harmony consists of multiple tones that are not mutually exclusive, the loss function of the network is the sigmoid (binary) cross-entropy over the 88 piano keys:

L(ŷ, y) = − Σ_(j=1)^(88) [ y_j log(sigmoid(ŷ_j)) + (1 − y_j) log(1 − sigmoid(ŷ_j)) ]    (25.7)

where y_j represents the j-th element of the target vector and ŷ_j represents the j-th element of the predicted output vector.
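A minimal NumPy sketch of the loss in Formula (25.7); the epsilon safeguard and the example target chord are illustrative additions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def multilabel_loss(y_hat, y):
        # Formula (25.7): binary cross-entropy summed over the 88 piano keys
        p = sigmoid(y_hat)
        eps = 1e-12                      # numerical safeguard, not part of the formula
        return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    y = np.zeros(88)
    y[[10, 14, 17]] = 1                  # an illustrative target chord: three active keys
    y_hat = np.random.randn(88)          # raw network outputs (logits)
    print(multilabel_loss(y_hat, y))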

25.2.2 System Structure Design

The temperament filter bank is designed according to twelve-tone equal temperament, and the distribution of signal energy in each frame can be obtained through this unit. The energy distribution data obtained by the temperament filter bank are one of the key bases for

detecting the starting points of notes. The function of the designed software modules is to process the data obtained by the hardware unit and, on this basis, realize the mixed arrangement of the music accompaniment and score the arrangement results over the network; the software therefore mainly consists of the accompaniment component extraction module, the accompaniment mixed arrangement module and the network scoring module. Note-level judgment needs to determine the specific position in the standard audio from the information obtained during performance positioning, obtain the notes to be played from the standard performance information, and compare them with the pitch information of each frame from multi-pitch detection, so as to obtain the playing accuracy of the notes in the current frame. Based on an AI algorithm, this paper proposes an automatic piano accompaniment system: the input of the system is a single-melody song, and the output is that song together with a piano accompaniment of a certain style provided by the machine. Based on previous research, this project makes a preliminary study of the single-beat sound-group structure and analyzes it by simulation and patterning, so as to obtain an independent single-beat sound-group structure. The overall structure is shown in Fig. 25.2. Automatic piano arrangement includes two steps: first, the neural network model is trained on the note sequence data set, and after several rounds of learning a good note prediction network model is obtained; second, the note prediction network model is used to process a group of notes and match them with the piano sound source, so as to obtain a complete musical work. Dropout is used at each layer to enhance the generalization performance of the network and avoid over-fitting. The network input layer receives a note sequence of fixed length, the softmax function calculates the likelihood probability of each output unit, and the note category with the highest probability is output as the classification result. In music creation, a group of timbres is generated by the timbre prediction network model and compared with piano timbres to obtain the final set of timbres. The AI algorithm feeds the output of the interpretation layer into multiple music style analysis networks, so as to achieve simultaneous learning across multiple interpretation layers. When analyzing music styles, because there are many kinds of music, multiple sub-network units need to be designed in order to learn different styles well. In each sub-network, a four-layer bidirectional LSTM learns its parameters in a similar way, with forward and backward propagation and parameter adjustment. Based on this model, the interpretation-layer method is adopted to reduce the learning time, and the multi-task method is adopted to improve the learning efficiency.
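As a rough illustration of the kind of note prediction network described above (stacked bidirectional LSTM layers with dropout and a softmax output), a hypothetical Keras sketch is given below; the 88-note vocabulary, sequence length, layer sizes and dropout rate are assumptions, not values reported in the paper:

    import tensorflow as tf  # assumed available

    vocab, seq_len = 88, 32  # hypothetical note vocabulary size and input sequence length
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len, vocab)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
        tf.keras.layers.Dropout(0.3),      # dropout at each level against over-fitting
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(vocab, activation="softmax"),  # probability of each note category
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()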

Fig. 25.2 Overall framework of piano automatic accompaniment system

25.3 Result Analysis

In order to verify the performance difference between the designed system and the existing system, a system performance test is designed, and the system is tested according to the built test environment and the prepared test parameters. Because of the particularity of art, the traditional comparative evaluation method is not suitable for evaluating artistic results. Many scholars have studied evaluation methods; a recognized and effective method is to put the machine accompaniment and the manual accompaniment together, let listeners distinguish them by audition, and evaluate the effect of the machine accompaniment through the error rate of the distinction. This paper therefore adopts this evaluation method. To ensure the accuracy of the test conclusion, two groups of music accompaniment components are selected from the database, the mixed-arrangement style of the music accompaniment is set to be consistent, the mixed-arrangement experiment is carried out, and the system performance is reflected by the network score.

Fig. 25.3 Network scoring

A total of 20 pairs of music pieces participated in the evaluation experiment, and 30 evaluators (including 12 music professionals and 18 non-professional music lovers) aged between 20 and 45 were invited to take part in the audition. The mixed-arrangement results output by the system of ref. [4] and by the system designed in this paper were uploaded to the cloud server, and 10 top professional musicians were asked to evaluate them (full score is 10), giving the network scoring data shown in Fig. 25.3. By comparison, the average network score of the designed system is 3.028 points higher than that of the ref. [4] system, which shows that the mixed-arrangement effect of the designed system's accompaniment is better. Considering the change of the evaluators' psychological state during evaluation, and in order to make the evaluation result valid, 10 pieces were randomly selected from the 20 pieces with automatic accompaniment and played in random order; the subjects were asked whether each played piece used manual or machine accompaniment, and the numbers of answers were counted, as shown in Fig. 25.4. The average error rate for machine accompaniment is 45.972% and that for manual accompaniment is 50.287%. These data are generally satisfactory: although there is still a gap between machine accompaniment and manual accompaniment, it is very small, so it can be said that it is not easy for

Fig. 25.4 Statistics of experimental results

people to distinguish between manual accompaniment and machine accompaniment generated by the AI algorithm.

25.4 Conclusion

The piano automatic accompaniment system plays an important role in modern musical art and cultural life. As one of the key links in music creation, accompaniment arrangement directly affects the expressive effect of the music if it is not done well. In this paper, a piano automatic accompaniment system based on an AI algorithm is designed. The main notes and harmony components of each segment are extracted by multi-fundamental-frequency estimation and used as the features and labels for training the neural network, so that it acquires the ability to arrange harmony. A bidirectional RNN based on LSTM is selected, and an attention mechanism is introduced on the basis of the encoding-decoding model. The research results show that the average network score of the designed system is 3.028 points higher than that of the literature system, which shows that the mixed-arrangement effect of the designed system's accompaniment is better.

Acknowledgements This research is supported by the “5G Era—Intelligent Platform for Artificial Intelligence Piano Teaching” project of the Hunan University Students’ Innovation and Entrepreneurship Training Program in 2020 (Project No.: 3534).

References

1. C. Ju, H. Ding, B. Hu, A hybrid strategy improved whale optimization algorithm for web service composition. Comput. J. 66(3), 3 (2021)
2. C.K. Ting, C.L. Wu, C.H. Liu, A novel automatic composition system using evolutionary algorithm and phrase imitation. IEEE Syst. J. 11(3), 1284–1295 (2017)
3. F.M. Toyama, W.V. Dijk, Additive composition formulation of the iterative Grover algorithm. Can. J. Phys. 97(7), 777–785 (2019)
4. Y. Zhao, Research and design of automatic scoring algorithm for English composition based on machine learning. Sci. Program. 21(Pt.14), 1–10 (2021)
5. D. Herremans, C.H. Chuan, E. Chew, A functional taxonomy of music generation systems. ACM Comput. Surv. 50(5), 69.1–69.30 (2017)
6. R.H. Liang, O. Ming, Impromptu conductor: a virtual reality system for music generation based on supervised learning. Displays 15(3), 257–266 (2015)
7. J. Grekow, T. Dimitrova-Grekow, Monophonic music generation with a given emotion using conditional variational autoencoder. IEEE Access 9(99), 1 (2021)
8. J. Nika, M. Chemillier, G. Assayag, ImproteK: introducing scenarios into human-computer music improvisation. Comput. Entertain. 14(2), 1–27 (2017)
9. M. Dua, R. Yadav, D. Mamgai et al., An improved RNN-LSTM based novel approach for sheet music generation. Proced. Comput. Sci. 171, 465–474 (2020)
10. Y. Qi, Y. Liu, Q. Sun, Music-driven dance generation. IEEE Access 7(99), 166540–166550 (2019)
11. I. Goienetxea, I. Mendialdua, I. Rodríguez et al., Statistics-based music generation approach considering both rhythm and melody coherence. IEEE Access 7(17), 32 (2019)
12. H. Zhu, Q. Liu, N.J. Yuan et al., Pop music generation. ACM Trans. Knowl. Discov. Data 147(9), 396 (2020)

Chapter 26

The Importance of the Application of Intelligent Management System to Laboratory Management in Colleges and Universities

Xu Feijian

Abstract The laboratory is an important place for college teaching practice and personnel training. Using modern information technology to build a new type of intelligent laboratory management system can effectively solve the existing safety and management problems of university laboratories. At present, technology and management ideas are constantly being updated and intelligent management systems are developing, which provides new opportunities and conditions for optimizing the management of university laboratories; at the same time, the traditional laboratory management mode faces challenges. On this basis, this paper studies the intelligent management system of university laboratories. Firstly, the composition principle of the intelligent management system is analyzed, and the advantages and workflow of the intelligent laboratory management system are discussed. Finally, the application of the intelligent management system in university laboratory management is discussed, so as to improve the level of university laboratory management. Keywords Intelligent management system · University laboratory · Laboratory management · Information platform · Laboratory teaching model

X. Feijian (B) Guangdong University of Science and Technology, Guangdong 523083, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_26


26.1 Introduction

In the process of laboratory construction in colleges and universities, if the laboratories are to reach the level of serving external scientific research while meeting the needs of teachers and students, the previous management concepts need to change, effective teaching standards and requirements need to be put forward, and innovative ideas need to be fully reflected in the intelligent management system, so as to promote the sharing of all laboratory resources and build open laboratories [1]. In laboratory management, besides strengthening the management of personnel and equipment, a new laboratory management model needs to be built, and higher requirements are placed on the management system, which must therefore be intelligent. The following sections elaborate on the relevant content to promote a clear understanding of the laboratory intelligent management system.

26.2 Composition of Intelligent Management System

The intelligent laboratory management system aims to provide colleges and universities with an information platform covering the routine business management of the whole school's laboratories, such as laboratory information and experimental rooms, so as to make laboratory work more convenient and faster. The specific functions are shown in Table 26.1, and the system composition is shown in Fig. 26.1. The intelligent management system contains three modules in total: communication, control and information reading. The three modules are combined with each other and each plays its own role, so that the management system can give full play to its function.

Table 26.1 Functions of intelligent laboratory management system

Management of laboratory: laboratory routine information management, laboratory room information database management, laboratory work log registration and inquiry, laboratory safety management
Planning and summary: management of the laboratory's yearly and semester work plans and work summaries
Laboratory information statistics: laboratory task statistics, laboratory statistics, laboratory operating funds statistics
Website information management: website information resource management, website dynamic news management and laboratory home page information settings

Fig. 26.1 Integrated laboratory management system of colleges and universities

(1) Communication module

The most important role of the communication module is to process and analyze data information and upload the processed information to the control center, so that the control center can make more scientific judgments and assign test benches to students [2]. The control center mainly uses the information given by the communication module as instructions, and the modules adopt a unified protocol so that long-distance transmission can be completed effectively. The wireless transceiver is also very important in this process: a single-chip microcomputer checks the messages received by the transceiver, so as to avoid data conflicts that would lead to system disorder [3]. The transmission mode differs between links. For example, cable transmission is mainly adopted between the experimental platform terminal and the control center, which can ensure the

effective realization of serial communication and provides more stable anti-interference ability during transmission.

(2) Control module

The control module should make reasonable use of the single-chip microcomputer so that its functions can be realized effectively. This module mainly ensures that every task in the system can be displayed on the LCD screen; the displayed content mainly includes student information, experiment time and so on. Through this module it is also possible to control the locking device on the experimental bench and manage the connection and disconnection of the power supply, so as to support the intelligent operation of the laboratory [4]. In this process the port of the single-chip microcomputer does not need to be expanded outward. The relay in the system needs to take the strength of the power supply fully into account and control the circuit effectively. If the power supply of an experimental bench is automatically cut off, the upper computer needs to be notified so that the time adjustment can be completed effectively.

(3) Information reading module

This module mainly provides information conversion and radio-frequency analysis. Given the situation in university laboratories, the radio-frequency analysis needs to use automatic card-search technology, which can be connected with all the single-chip microcomputers. The detection range is continuously and automatically expanded to ensure that the received information can be fully transmitted to the subsequent links, so that information is received properly, as shown in Fig. 26.2.

Fig. 26.2 Schematic diagram of the information reading module

26.3 Advantages and Workflow Analysis of the Laboratory Intelligent Management System

(1) Advantages

In college laboratory management, the use of an intelligent management system has very obvious advantages. Information technology allows laboratories to be managed scientifically and promotes improvements in work efficiency and management level, providing an effective basis for decision makers and raising the level of laboratory management. The advantages of laboratory intelligent management systems in colleges and universities are shown in Table 26.2.

Table 26.2 Advantages of laboratory intelligent management system

Scientific management mode: the intelligent management system standardizes management work and optimizes repetitive tasks; competent departments can also query relevant information over the Internet, which also helps complete teaching evaluation and hardware facility management
Improving teaching methods: the intelligent management system is designed around project-driven teaching; teaching projects can be created on the client side and transmitted to the controller using radio-frequency technology, and students can consult teaching projects and technical documents through the human-machine interface, saving time so that they can effectively complete skill operation training
Laboratory connotation construction: in the process of laboratory connotation construction, more qualified talents can be trained to provide services for society; students should continuously strengthen their theoretical learning, connect it with practice according to the task and content of the subject, and improve their management awareness, which is part of the professional quality that highly skilled talents need; by combining scientific research and teaching, theory can be applied to practical teaching

(2) The working process

Intelligent management begins as soon as students enter the laboratory. To enter the lab, a student swipes the campus one-card; the card reader sends the information in the student card to the single-chip microcomputer, which stores the obtained information and transmits the required information to the system, recording the specific time at which the card was used. Once this is done, the communication module uploads the student number to the control center by wireless transmission [5]. After receiving the student information, the corresponding single-chip microcomputer completes the corresponding reading work, and the student information is transmitted. When the transmission is completed, the management software will accurately judge the duration of

the student's operation according to the specific situation of the experimental bench. If the student has begun the experiment, the activity information is added to the system; if the student leaves the system, it prompts to turn off the power. If all the test benches in the laboratory are already in use, the control center displays this information to other students on the screen. If there is an empty test bench, the control center sends a command to the access control system and the test bench, opens the laboratory door and connects the power supply in the laboratory. The intelligent management system also reflects the principle of humanization: the terminal single-chip microcomputer of each experimental bench has a timing function [6]. If the power supply of an experimental bench has been connected continuously for 2.5 h, the control center will detect this, automatically disconnect the power supply and transmit the information to the upper computer; after it is received, all the information on the experimental bench is automatically deleted. If something unexpected happens to the student, the experimental time can be extended manually through the control center to ensure that the experiment can be completed effectively.
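A toy sketch of the 2.5-hour power rule described above; the class, field names and the handling of manual extensions are purely illustrative assumptions, not part of the system described in this chapter:

    from dataclasses import dataclass

    POWER_LIMIT_HOURS = 2.5  # continuous-power limit mentioned in the text

    @dataclass
    class Bench:
        bench_id: int
        powered_hours: float = 0.0
        extension_hours: float = 0.0   # set manually via the control center

    def should_cut_power(bench: Bench) -> bool:
        # the control center disconnects power once the limit (plus any manual
        # extension) is exceeded and then notifies the upper computer
        return bench.powered_hours >= POWER_LIMIT_HOURS + bench.extension_hours

    bench = Bench(bench_id=3, powered_hours=2.6)
    print(should_cut_power(bench))   # True -> disconnect and report upward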

26.4 Application of Intelligent Management System in Laboratory Management in Colleges and Universities

Judging from the past situation of laboratory management systems in colleges and universities, there are still obvious shortcomings in many respects, and the emergence and application of intelligent management systems can effectively improve this situation, promote their effective integration into laboratory management and give full play to their value. In the whole process, the key points need to be grasped so that an intelligent laboratory management system can be created for the benefit of teachers and students in colleges and universities.

(1) Establish a complete online teaching center

To provide students with a more convenient teaching path, modern network technology should be applied reasonably and a complete online teaching center built, which can play a good auxiliary role in teaching. Students can query which experimental benches are available in the laboratory through the information center, freely choose the experimental time, make reasonable use of large-scale network resources, and obtain content related to the experimental course [7]. For example, the whole experimental process can be viewed on the Internet before the experiment is carried out, the specific steps can be analyzed, and tips for the key experimental steps can be given. When the experiment is over, reflection questions are raised according to the content of the experiment, so as to check for and fill in

gaps and omissions and to test the learning effect of students. With a relatively sound network information teaching center, teachers can not only gain a comprehensive grasp of students' learning situation, but systematic design and management can also significantly improve teaching efficiency. With the technical support provided by the network information teaching center, the reform of experimental courses becomes noticeably more intelligent and convenient; teachers gain a certain understanding of every student's experimental operation, students' experimental ability improves significantly, and a connection is built between experiments and future scientific research work [8].

(2) Establish an open laboratory management system

To meet the needs of intelligent management, a new management platform needs to be established. Compared with the traditional laboratory management system it is clearly more informatized and intelligent, and all details can be placed in the system, making operation more flexible. For example, the intelligent card-swiping system can record the information of all teachers and students entering and leaving the laboratory, and the monitoring system can record their behavior in the laboratory. Using intelligent software, all experimental processes can be controlled and experimental instruments can be booked; controlling the terminals of the experimental equipment requires that the platform information be accurate. Because there is obvious uncertainty in students' experiments, the time and arrangement of experiments are relatively complicated, and sometimes the results of an experiment must be observed in the middle of the night [9]. Therefore, the laboratory intelligent management system should allow teachers and students to enter the laboratory freely at any time of day, so that they can arrange and use the experimental site according to their own experimental progress, while also standardizing the instruments and equipment used in the laboratory.

(3) Cultivation of scientific research and creative thinking

Under intelligent management, college laboratories can be used by teachers and students all day [10]. In this way, the scientific research projects studied by students are not limited by time, experimental resources are available at any time, and students are given more and sufficient time; the progress of an experiment is no longer delayed by conflicts between courses and the laboratory's opening hours. If it is difficult for students to carry out an experiment effectively during the day because of schoolwork, they can carry out the corresponding experimental research at night [11]; when students cannot use the laboratory on weekdays, experiments can be carried out on weekends, and teachers no longer need to be on hand for students to use the various pieces of experimental equipment in the laboratory [12]. The effective application of intelligent systems in college laboratory management can bring many benefits to teachers and students. For example, students can use experimental sites more flexibly and advance their own experimental plans smoothly, the probability of publishing scientific research results is also significantly improved,

and open experimental sites can promote the improvement of students' scientific research ability and experimental operation ability [13]. Compared with the past, more scientific research results can be produced in the same time, which to a certain extent raises the average level of scientific researchers in China and promotes the effective completion of more scientific research projects.

(4) Cultivate top innovative talents

The open platform built on the laboratory intelligent management system can, to a certain extent, support the reform of experimental teaching, so that all instruments and equipment can be applied to undergraduate teaching and new experiments can be added to reform the experimental teaching system, allowing undergraduates to explore various discipline directions [14]. Students come into contact with advanced instruments and equipment so that relevant experiments can be carried out better. By opening laboratories more widely and supporting students in scientific research and innovative experiments, the shortage of experimental teaching resources can be effectively alleviated. First, strengthen innovative experimental training: through high-level reform of experimental teaching methods and content, scientific research results are updated and incorporated into experimental teaching, enriching teaching methods and content and integrating research with teaching [15]. Through large-scale innovative experimental training, discipline leaders are encouraged to serve as responsible professors and work with experimental staff to guide students, giving students the opportunity to receive guidance from high-level researchers [16]. Second, fully open the scientific research bases: universities can open the research bases of the whole college to students to support their participation in scientific research activities [17], and can also allow students to choose research topics according to their hobbies and interests, so that they receive support from innovative experimental projects. All students are encouraged to join experimental and scientific research teams earlier, so as to receive scientific research training, cultivate their interest in research and promote their scientific and technological innovation [18]. Finally, set up high-quality experiments: more advanced equipment can be introduced into the professional experimental curriculum in college laboratories, and through experimental operation students can master high-precision processing equipment and technology, so that they can better complete instrument testing.

(5) Process system integration

At present, the smart laboratory that integrates data query, statistics and management is a further deepening and upgrading of the digital laboratory. In practical applications, the system uses an integrated solution of website + management terminal + mobile terminal + Internet of Things to realize unified management and scheduling of all laboratory resources, fully perceive the physical environment of the laboratory, and intelligently recognize teachers' and students' work and learning

experiment processes and equipment usage status, so as to make teaching and scientific research activities intelligent. It can also provide big-data decision support and scientific analysis for laboratory management, experimental teaching, instrument and equipment management and laboratory safety, and realize intelligent, safe, open and efficient operation of the laboratory.

26.5 Conclusion

In a word, if the curriculum reform of colleges and universities is to be carried out well, college leaders should recognize the significance of intelligent laboratory management systems for improving the scientific research capability of their schools, and use advanced technical means and management concepts to establish a scientific and intelligent laboratory management system, which can promote the effective use of experimental sites. Using intelligent management systems to strengthen laboratory management in colleges and universities can promote the deepening reform of laboratory teaching, fully develop the experimental operation skills of college students, and provide stronger support for scientific research achievements. From the above, it can be seen that the laboratory intelligent management system can improve students' scientific research and innovation ability and thus provide a stronger guarantee for scientific research work; to a certain extent it can greatly accelerate the progress of research, so that it can be promoted and applied in various universities.

References

1. T. Pengpeng, The role of intelligent management system application on the importance of laboratory management in colleges and universities. Sci. Technol. Style 26, 2 (2020)
2. C. Yue, Brief discussion on the application of intelligent management system in laboratory management in colleges and universities. Inform. Record. Mater. 21(9), 3 (2020)
3. S. Baohong, Brief analysis of the application of intelligent management system in laboratory management in colleges and universities. Sci. Inform. (2020)
4. H. Chen, Discussion on the application of intelligent management system in laboratory management in colleges and universities. Sci. Technol. Econ. Market 8, 2 (2021)
5. Z. Yanyan, Research on the application of intelligent management system in laboratory management in colleges and universities. Sci. Technol. Inform. 20(6), 3 (2022)
6. L. Miao, Practical application of intelligent management system in laboratory management in colleges and universities. Wireless Internet Technol. (2021)
7. G. Xiaoling, Application of information technology in the intelligent management system of college laboratories. Inform. Record. 22(12), 3 (2021)
8. F. Bin, Discussion on the design of intelligent management system for college laboratories based on internet of things technology. Sci. Educ. J. Electr. Ed. (2020)
9. S. Sin, Design and implementation of intelligent management system for campus training tools. Res. Implem. Mod. Vocat. Educ. 48, 158–159 (2021)
10. M. Lui, Z. Jun, C. Yung, Laboratory intelligent management system. J. Tonghua Teach. College 12, 61–65 (2019)

11. W. Yu, The analysis and design of the intelligent management system of the shared training room in higher vocational colleges. Wi-fi 15, 49–50 (2021)
12. V. Sanchez-Anguix, K.-M. Chao, P. Novais et al., Social and intelligent applications for future cities: current advances. Fut. Gener. Comput. Syst. 114, 181–184 (2021)
13. Z. Fang, W. Jie, W. Faquan, The laboratory intelligent management system using NB-IoT and artificial intelligence technology. Math. Probl. Eng. 2022, 42 (2022)
14. G. Prabakaran, D. Vaithiyanathan, M. Ganesan, FPGA based intelligent embedded system for predicting the productivity using fuzzy logic. Sustain. Comput. Inform. Syst. 35, 100749 (2022)
15. Z. Zhang, Z. Zhu, J. Zhang et al., Construction of intelligent integrated model framework for the workshop manufacturing system via digital twin. Int. J. Adv. Manuf. Technol. 118(9–10), 1–14 (2021)
16. H. Wang, Application of project management in laboratory work of colleges and universities. Adv. Higher Educ. 6(17), 10027 (2022)
17. N. Calvo, D. Rodeiro-Pazos, M.J. Rodríguez-Gulías et al., What knowledge management approach do entrepreneurial universities need? Inform. Syst. 57, 85 (2019)
18. C. Qinwei, Q. Shunli, H. Jian, Contradiction and mechanism analysis of science and technology input-output: evidence from key universities in China. Socio-Econ. Plan. Sci. 79, 101144 (2022)

Chapter 27

Design of Defect Detection Algorithm for Printed Packaging Products Based on Computer Vision

Shubao Zhou

Abstract The high-speed operation of the printing press, the operator's subjective or objective mistakes, and possible unstable factors in the printing environment may all make the printed matter unqualified. Manual inspection involves great uncertainty and relies on the subjective judgment of workers, so it is difficult to ensure the accuracy and consistency of printing. Detection technology based on computer vision collects an image of the object with a camera and uses a computer to analyze and process the image information to complete quality inspection. In order to use computer technology instead of manual defect detection of printed packaging, this paper proposes a computer vision-based defect detection algorithm for printed packaging that combines image registration, automatic selection of the registration area and defect detection, and verifies the effectiveness of the method through experiments. The results show that, compared with the traditional wavelet transform algorithm, the printing and packaging defect detection algorithm in this paper has obvious advantages in the later stage of operation, with the error reduced by 32.55%. The algorithm also has higher accuracy in detecting printing and packaging defects, 22.48% higher than the comparison algorithm, and the recall rate is increased by 19.56%, so it can locate the edge contours of printed packaging images more accurately. Keywords Computer vision · Printing and packaging · Defect detection

27.1 Introduction In the existing printing industry, the printing equipment is constantly updated, the printing production speed is faster and faster, and higher requirements are put forward for the printing quality. In order to achieve better printing quality, new technologies S. Zhou (B) Shanghai Publishing and Printing College, Shanghai 200093, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_27


and diversified processing techniques are constantly introduced, and automatic and rapid printing production is basically realized. However, there are still many problems in image quality detection [1]. The high-speed operation of the printing press, the operator’s supervisor’s or objective mistakes, and the possible unstable factors in the printing environment may make the printed matter unqualified, resulting in quality problems such as scratching and color distortion. This seriously affects the printing production efficiency, and makes the printing factory suffer huge economic losses [2]. In the packaging and printing industry, inspection technology is an important product inspection technology and an important means to ensure the printing quality of enterprises. However, under people’s subjective judgment, the quality of printed packaging products will be affected to some extent, which will be detrimental to the development of the printing industry [3]. In order to ensure the quality of printed matter, besides controlling the whole printing process, it also needs to do a good job of quality inspection after printing, strictly control the quality of printed matter, and remove unqualified products in time [4]. It is possible to use computer vision detection technology for on-line detection, but to truly realize on-line detection, it is necessary to solve the problem of multi-channel synchronization of video image acquisition and processing and stereo measurement. At present, printing enterprises generally adopt manual methods, and operators usually use strongpoint to observe the quality of printed images. However, when the printing speed is unstable and the printing plates are aligned, it is almost impossible to observe the quality changes of printed images in time. In addition, working under the strobe light for a long time will do great harm to the operator’s eyesight [1]. In order to reduce the waste products and defective products produced in the printing process and improve the printing quality and efficiency, it is necessary to monitor and check the quality of printed matter in real time, find the printing problems in real time, adjust the printing operation in time and reduce the losses. It needs to innovate and reform the defect detection technology of printed matter [5]. Song et al. [6] proposed an image registration-based defect detection method for food packaging printing, which improved the reliability and stability of the system to some extent. Zhang et al. [7] realized on-line color detection of printed images based on machine vision, and combined with standard template creation, image registration and color space conversion to overcome the problem of unstable quality of manual detection. Wang et al. [8] designed an image quality detection method for unmarked printing by using HALCON programming, including image registration, relevant area selection, defect detection and other steps. This article discusses the use of computer vision instead of manual inspection of printing and packaging products, and then judges whether printed products are qualified or not.

27.2 Methodology

27.2.1 Printing Packaging Defect Detection

The computer compares and analyzes the image of the printed matter to be detected with the standard template image and obtains quality information about the printed matter, thus prompting the press operator or directly controlling the printing press. Computer vision inspection technology uses high-speed cameras and image sensors to collect images of the tested objects and then completes vision- and image-related inspection through image processing. With the development of computer technology and image processing technology, a large amount of information can be acquired in a short time, and automatic processing and rapid control of this information can be realized, so the requirements of on-line inspection of printed and packaged products can be met better [9]. In an automatic measurement and control system, real-time video is used to send the video information of the detected live image into the computer monitoring system, so as to realize real-time detection of inaccessible parts, dangerous places and distant scenes. Ideally, the captured image is complete and clear and can be used directly to detect defects. In practice, however, the captured image contains a lot of complex and redundant information, so targeted image preprocessing methods must be used to eliminate information unrelated to defect detection and refine the image features to be detected. Every image collected by the machine vision inspection platform contains a great deal of image information; the information that needs to be inspected should be retained and the redundant information discarded, so as to improve the processing speed and timeliness of the detection algorithm. Image information with strong correlation needs to be processed by color space conversion, image filtering, positioning and registration, image segmentation and feature extraction. In order to establish the correspondence between the template image and the captured image, image registration can be realized with a specific registration algorithm; image registration uses a registration criterion to search for the best similarity [10]. A good noise reduction effect can be achieved by combining various filters, but some unavoidable noise still remains in subsequent image processing. Based on clear and complete image information, features are pre-selected and judged, and the correlation between the standard image and the image to be measured can be found by taking locally salient features as small templates, so as to realize fast small-template positioning between the two images. Using visualization technology, the original numerical or tabular information can be displayed graphically, including visual transformation of the video image of the detected object, data from the operation process and weak changes in sensor signals. By using real-time graphics and image technology, the real-time simulation

of instruments and meters can be completed, and the model can be adjusted appropriately through man-machine interaction, which can quickly reflect the control process or the detection results.
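A minimal OpenCV sketch of the template-comparison idea outlined in this section, assuming a registered test image and a standard template of the same size; the file names, threshold and minimum defect area are illustrative assumptions:

    import cv2

    template = cv2.imread("standard_template.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
    test = cv2.imread("captured_print.png", cv2.IMREAD_GRAYSCALE)         # hypothetical

    diff = cv2.absdiff(test, template)                    # pixel-wise gray difference
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    # morphological opening removes isolated noise pixels left after registration
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    defects = [stats[i] for i in range(1, n) if stats[i][cv2.CC_STAT_AREA] > 20]
    print("defect regions:", len(defects))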

27.2.2 Printing Defect Detection Algorithm

In terms of the system's inspection principle, the system focuses the image of the product to be tested onto a scanning camera and then converts the image into electrical signals that the computer can recognize. After collecting and processing these signals, the system transmits them to the computer, where image processing software processes, analyzes and judges the data. In this process, the position, power and lighting mode of the light source, as well as the quality of the camera's optical system, have an important impact on the clarity of the image. The basic requirements for image acquisition are good linearity, low noise, high resolution and fast conversion speed. In the image data acquisition system, because the image sensor is a photosensitive device, the intensity of the illumination directly affects the quality of the scanned image, so the requirements for the lighting devices are generally very high [11]. Using the image processing software of the system, feature extraction, data coding and image segmentation can be completed and image noise can be eliminated, so that image quality is improved. In general, image edge detection based on wavelet transform can be used to detect defects. Considering the limited scanning range of a CCD camera, multiple scanning heads need to be installed for planar detection, and defects in the actual three-dimensional appearance of the printed packaging also need to be detected; therefore, a stereo vision method is adopted to inspect the appearance defects of the printed package surface in a non-contact manner [12]. In the hardware of the on-line printed matter defect detection system, factors such as rotation and translation occur during printed matter transport and image acquisition because of mechanical vibration, so that the printed image to be detected does not match the standard printed image in space. For the system to run accurately, the position of the image must be adjusted and the image to be detected must be matched spatially with the template image so that they refer to the same spatial coordinate system. The flow of the printing defect identification algorithm is shown in Fig. 27.1.

Fig. 27.1 Algorithm flow of printing and packaging defect recognition

On different occasions, at different times or under different weather conditions, the natural light intensity fluctuates over a wide range, which greatly reduces imaging consistency and may directly reduce the recognition rate. To solve this problem, the consistency of the imaging environment must be ensured: in a sealed environment a fixed light source can be used to guarantee stable light intensity, and where sealing is not possible a high-intensity stroboscopic device can be used to shield the influence of natural light and improve consistency. According to the principle and structure of system detection, the system function can be realized by designing the system detection process [13]. A scanning camera is used to collect images of standard, defect-free printed packaging products, and

these images are made into templates as samples. By comparing the standard template with the collected image, the defects of the image can be found. In this process, the image needs to be loaded into a device-independent color gamut space and then converted to the working color space, and the gray-level difference between the collected image and the standard image is compared. The schematic diagram of corrosion (erosion) and expansion (dilation) in printing defect detection is shown in Fig. 27.2.

Fig. 27.2 Schematic diagram of corrosion and expansion of printing packaging defect detection

In the image processing module of the system, image input, processing and output must be completed, and the detection result information required by the system must be transmitted to the logic control unit. The logic control unit coordinates and controls each module of the system, scans the state of each control part and responds to the actions of different parts of the system. In the practical application of image registration, because of mechanical vibration and other factors during image acquisition, the collected images may be disturbed by noise of varying degrees, and the two images may not be exactly the same but only basically the same. Compared with registration algorithms based on gray-level information, feature point-based registration algorithms calculate the image transformation parameters by establishing the spatial relationship between the feature points of the images to be registered, which greatly reduces the influence of gray-level changes on the registration result and provides strong robustness and adaptability. Let the gray value range of the original printed packaging image f(x, y) be (g_min, g_max) and choose a suitable threshold T such that:

g_min ≤ T ≤ g_max    (27.1)

Image segmentation with a single threshold can be expressed as:

g(x, y) = { 1, if f(x, y) ≥ T;  0, if f(x, y) < T }    (27.2)

where g(x, y) is the binarized image. The object can easily be separated from the background through binarization, and the key to binarizing the printed packaging image with gray range (g_min, g_max) is the reasonable selection of the threshold T. Edge detection for the continuous image consists of finding the local maximum of the gradient at each edge point and the corresponding gradient direction. The gradient of the image along a direction r at angle θ is defined as:

∂f/∂r = (∂f/∂x)·(∂x/∂r) + (∂f/∂y)·(∂y/∂r) = f_x cos θ + f_y sin θ    (27.3)

The condition for ∂f/∂r to reach its maximum value is:

∂(∂f/∂r)/∂θ = 0    (27.4)

Differentiating Formula (27.3) with respect to θ gives:

−f_x sin θ + f_y cos θ = 0    (27.5)

The following can then be obtained:

θ_g = arctan(f_y / f_x) or θ_g + π    (27.6)

Then the gradient modulus is:

g = (∂f/∂r)_max = √(f_x² + f_y²)    (27.7)

The gradient operator has the properties of isotropy and displacement invariance and is suitable for edge detection, while the direction of the gray-level change is:

θ_g = arctan(f_y / f_x)    (27.8)
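A small OpenCV sketch corresponding to the thresholding of Formula (27.2) and the gradient quantities of Formulas (27.7) and (27.8); the file name, threshold value and the statistical edge criterion are illustrative assumptions:

    import cv2
    import numpy as np

    img = cv2.imread("package_sample.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file

    # Formula (27.2): single-threshold binarization with a chosen T
    T = 128
    _, binary = cv2.threshold(img, T, 1, cv2.THRESH_BINARY)

    # Formulas (27.7)-(27.8): gradient modulus and direction via Sobel derivatives
    fx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    fy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    g = np.sqrt(fx ** 2 + fy ** 2)             # gradient modulus
    theta = np.arctan2(fy, fx)                 # gradient direction
    edges = (g > g.mean() + 3 * g.std()).astype(np.uint8)  # simple edge/defect mask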

When testing the quality of printed matter, the image to be tested is compared with the template image, and the system must ensure that the image to be tested is not judged as an unqualified product because of large changes in pixel gray values caused by brightness changes; the lighting device is therefore required to have stable brightness and high illumination intensity. If the color difference of the image exceeds the threshold, the printed packaging product to be inspected can be judged as defective [14]. However, in order to detect slight defects in the products, the neighborhood pixels around candidate defect positions in the image need to be compared, so edge detection is used to locate image defects.

27.3 Result Analysis and Discussion

Feature-based image registration is a registration method that uses features that can be clearly distinguished in the standard image as parameters. The main features are color, texture, contour, three-dimensional shape, concavity and convexity, and so on. A printed image contains many background pixels, so it is not necessary to compute and match all pixels; if only salient feature points of the target image are matched, the amount of matching computation is reduced and the image processing speed is improved. However, noise directly affects feature extraction and matching, and the location of the feature points affects the matching measurement. If the high-frequency part of the image information cannot be removed well, more high-frequency components remain and high-frequency noise degrades image quality, while the low-frequency part of the image information can be filtered. The comparison of the mean absolute error (MAE) of the algorithms is shown in Fig. 27.3.

Fig. 27.3 Comparison of algorithm MAE

It can be seen that, compared with the traditional wavelet transform algorithm, the printing and packaging defect detection algorithm in this article has obvious advantages in the later stage of operation, and the error is reduced by 32.55%. The image of any point in space on the left and right image planes must lie on the epipolar line determined by that point and the two projection centers; if the camera orientations are the same and the two projection centers are on the same horizontal line, the epipolar lines lie on the same horizontal line of the left and right images. Contrast enhancement concerns the image content: taking the image histogram as an example, it represents the gray-level distribution of an image well, and if the gray levels are redistributed uniformly, that is, the contrast is adjusted, gray information concentrated in a certain range can be spread out evenly. In Gaussian filtering, the value of a pixel is calculated as a weighted average of its neighboring pixels: a template is applied to each pixel, and the weighted average gray value replaces the original central pixel of the template. The accuracy and recall of the algorithms for printing and packaging defect detection are compared in Figs. 27.4 and 27.5.
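A one-line OpenCV illustration of the Gaussian filtering described above; the kernel size, sigma and file name are illustrative:

    import cv2

    img = cv2.imread("captured_print.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
    # each output pixel is the Gaussian-weighted average of its neighborhood,
    # which suppresses high-frequency noise before registration and differencing
    smoothed = cv2.GaussianBlur(img, (5, 5), 1.0)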


Fig. 27.4 Comparison of accuracy rate of printing and packaging defect detection

Fig. 27.5 Comparison of recall rate of printing and packaging defect detection

The use of machine vision and image processing technology does not cause any damage to food packaging and avoids many of the drawbacks of manual inspection.
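A minimal sketch of the template-comparison judgment described above, assuming the test image has already been registered to the template; the gray-difference threshold and the minimum defect area stand in for the configurable detection precision and are not values from this chapter.

```python
import cv2

def judge_print_quality(template_gray, test_gray, diff_thresh=40, min_defect_area=25):
    """Compare a registered test image with the standard template and judge the
    product by the size of connected defect regions (thresholds are illustrative)."""
    # Absolute gray-level difference between template and test image.
    diff = cv2.absdiff(template_gray, test_gray)

    # Binarize: pixels whose difference exceeds the threshold are defect candidates.
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Connected-component analysis; stats[:, cv2.CC_STAT_AREA] holds region areas.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    defect_areas = [stats[i, cv2.CC_STAT_AREA] for i in range(1, num)]  # skip background

    # Unqualified if any connected defect area exceeds the set detection precision.
    return "unqualified" if any(a > min_defect_area for a in defect_areas) else "qualified"
```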


27.4 Conclusions

In the packaging and printing industry, defect inspection is an important product inspection technology and an important means for enterprises to ensure printing quality. It is possible to use computer vision detection technology for online detection, but to realize online detection it is necessary to solve the multi-channel synchronization problem of video image acquisition and processing and stereo measurement. Defect detection of printed matter is the key link in the automatic detection of printed matter quality. However, due to the tensile deformation of the printed matter itself, the jitter of the printed matter and the noise of the light source in the printing process, there are certain errors in the differential image after image matching, which can easily cause misjudgment in system detection. This paper discusses using computer vision instead of manual inspection of printed packaging products to judge whether the printed products are qualified. Compared with the traditional wavelet transform algorithm, the printing and packaging defect detection algorithm in this paper has obvious advantages in the later stage of operation, and the error is reduced by 32.55%. This algorithm has a high detection accuracy for printing and packaging defects, which is 22.48% higher than the control algorithm, and the recall rate is 19.56% higher. The detection method and system can identify some difficult-to-identify defects well and make up for the shortcomings of traditional methods. Using image processing methods to detect defects can improve the automation and intelligence level of the packaging, printing and other industries to a certain extent, and reduce labor intensity.

References

1. C.-H. Liu, N. Jeyaprakash, C.-H. Yang, Material characterization and defect detection of additively manufactured ceramic teeth using non-destructive techniques. Ceram. Int. 47, 7017–7031 (2021)
2. F. Li, Q. Xi, Research and implementation of a fabric printing detection system based on a field programmable gate array and deep neural network. Text. Res. J. 92, 1060–1078 (2022)
3. A. Dey, G. Reddy, Toroidal condensates by semiflexible polymer chains: Insights into nucleation, growth and packing defects. J. Phys. Chem. B 121, 9291–9301 (2017)
4. F. Gao, F. Ma, J. Wang, J. Sun, E. Yang, H. Zhou, Semi-supervised generative adversarial nets with multiple generators for SAR image recognition. Sensors 18, 2706 (2018)
5. Z. Jia, T. Zhang, X. Cao, Design and implementation of defect detection device for food packaging based on machine vision. Food Mach. 34(7), 4 (2018)
6. L. Song, J. Xu, Y. Yang, Q. Guo, H. Yang, Research and application of package defects detection algorithm based on improved GM. J. Appl. Opt. 40, 644–651 (2019)
7. M. Chen, X. Wang, H. Luo, Y. Geng, W. Liu, Learning to focus: cascaded feature matching network for few-shot image recognition. Sci. China Inform. Sci. 64, 192105 (2021)
8. M. Hayat, S.H. Khan, M. Bennamoun, Empowering simple binary classifiers for image set based face recognition. Int. J. Comput. Vis. 123, 479–498 (2017)
9. W. Wang, X. He, H. Li, Y. Huang, Y. Zhong, A fast online detection algorithm for packaging coil. Pack. Eng. 39, 146–152 (2018)


10. T. Wang, X. Wang, X. Cao, J. Zeng, Z. Jia, X. Li, E. Yao, Optimization of inner packaging defect detection light source based on machine vision. Pack. Eng. 40, 203–211 (2019)
11. S. Wang, L. Lu, H. Yang, Application of convolutional neural network in printing defect detection. Pack. Eng. 40, 203–211 (2019)
12. Y. Pan, A. Braun, I. Brilakis, A. Borrmann, Enriching geometric digital twins of buildings with small objects by fusing laser scanning and AI-based image recognition. Autom. Constr. 140, 104375 (2022)
13. J. Qin, C. Wang, X. Ran, S. Yang, B. Chen, A robust framework combined saliency detection and image recognition for garbage classification. Waste Manag. 140, 193–203 (2022)
14. J. Liang, F. Xu, S. Yu, A multi-scale semantic attention representation for multi-label image recognition with graph networks. Neurocomputing 491, 14–23 (2022)

Chapter 28

Leap Motion Gesture Information Collection and Gesture Interaction System Construction

Yuan Wang

Abstract Gesture interaction has very important and far-reaching research significance. It completes the judgment and recognition of gestures through data collection and feature extraction of the original gestures, so that the computer can understand human gesture behavior and finally achieve natural virtual interaction. Since gestures are an important means of human–computer interaction, improving the gesture recognition rate is an important current research direction. In this paper, a hierarchical correction method for Leap Motion (LM) gesture interaction is proposed. This method analyzes the recognition errors of LM by comparing thresholds in real time, and corrects the hand position with a hierarchical correction algorithm to solve the unstable recognition phenomenon in the process of human-hand interaction. Experimental analysis shows that, compared with the traditional CNN model, the improved CNN has obvious advantages in the later stage of operation, the error is reduced by 36.55%, and the interactive content recognition accuracy is over 90%, which fully proves that this method can improve the recognition accuracy of LM and enhance the user experience. The results show that the proposed method has a high recognition rate. The method is applied to the gesture interaction system to realize the interaction between natural gestures and virtual objects, which increases the interactive fun and improves the interactive experience.

Keywords Gesture recognition · Man–machine interaction · Information acquisition

Y. Wang (B) Wuhan Institute of Design and Sciences, Wuhan 430205, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_28


28.1 Introduction

The development of information technology promotes the change of human society, and the birth of every technology affects people’s life and production to a certain extent [1]. The arrival of the Internet era has further promoted the spread of information technology and accelerated the progress of science and technology. Since the digital technology revolution, human–computer interaction technology has developed rapidly and has been widely used in various fields [2]. In this context, because somatosensory interaction technology does not need a complex controller, users can directly interact with the surrounding environment and equipment through physical movements, and it has gradually become the most popular interaction technology [3]. The main purpose of human–computer interaction is to allow users to freely control equipment and communicate with it through some simple operations. In fact, ordinary computers store data in binary form, which is a one-dimensional storage space. Human–computer interaction therefore has to connect the multidimensional information channels of human beings with the one-dimensional storage of computers, so the core topic of human–computer interaction is the interface [4]. Gesture interaction has very important and far-reaching research significance. It completes the judgment and recognition of gestures through data collection and feature extraction of the original gestures, so that the computer can understand human gesture behavior and finally achieve natural virtual interaction [5]. With the continuous updating of technology, the methods of collecting gestures are more diverse and the methods of recognizing gestures are more efficient. As a key component in the process of human–computer interaction, people play a major role in the interaction, and gestures, as one of the tools for the human body to communicate with the outside world, are the most flexible and informative body language. People can convey different meanings through different gestures [6]. The traditional interactive method based on data gloves requires users to wear data gloves and obtains the gesture information of human hands through the sensors on the gloves. Although this method has accurate data recognition and fast recognition speed, data gloves often affect the user’s operation experience, and the equipment price is relatively high [7]. Developing new human–computer interaction technology can not only bring a better human–computer interaction experience, but also greatly improve input and output efficiency. As one of the new technologies of human–computer interaction, dynamic gesture directly uses the user’s hand as the input device of the machine. The communication and interaction between man and machine no longer need to go through an intermediary, and users can interact with and control the surrounding computers only by defining an appropriate dynamic gesture [8]. Dynamic gesture recognition technology based on computer vision or depth information improves the efficiency and friendliness of human–computer interaction technology, makes intuitive, convenient and natural human–computer interaction possible, and becomes a key technology to promote the rapid development of human–computer interaction


[9]. With the rapid development of multimedia technology, the traditional human– computer interaction tools have become increasingly difficult to meet people’s interaction needs. People want to communicate with computers simply by gestures, which makes this kind of interaction more convenient and fast. In this paper, a hierarchical correction method of LM gesture interaction is proposed. This method analyzes the recognition errors of LM by comparing the thresholds in real time, and corrects the hand position by using a hierarchical correction algorithm to solve the unstable recognition phenomenon in the process of human-hand interaction.

28.2 Methodology

28.2.1 Collection and Processing of Gesture Data

One of the difficulties of dynamic gesture recognition is how to define the beginning and end of an independent gesture. Based on the analysis of dynamic gestures, this article proposes a method based on gesture motion speed and direction change rate to solve this problem. Because the area in which LM can sense human hands is limited, it is necessary to ensure that the moving fingers are always within the detection range of the somatosensory controller during gesture acquisition. However, because there is no limiting marker, there is still the possibility that the finger will move beyond the detection area and gesture information will be lost in the process of gesture collection. In the process of dynamic gesture collection, at every gesture sampling point it must first be judged whether the finger is within LM’s field of view. In order to calculate the rotation angle of a gesture, it is necessary to select a unit vector that can represent the direction of the hand from the geometric data of the hand as the characteristic data of rotation [10]. Data such as the metacarpal direction vector or the phalanx direction vector can be selected. Considering the user’s operability, when the user rotates his hand it almost always rotates around the axis perpendicular to the palm, and the angle between the metacarpal direction of the palm and the rotation axis is close to 90°, so the metacarpal rotation is more obvious and it is easier to identify the rotation action. Therefore, the metacarpal direction vector is selected as the characteristic data of the hand direction, and the rotation angle is calculated from the metacarpal direction vector. When the user is moving his hand, there may be an unconscious rotation whose angle exceeds the threshold, in which case the result is wrongly recognized as a rotation gesture and needs to be ignored [11]. Similarly, when rotating, the position of the hand may be moved unconsciously, and if the displacement exceeds the threshold, the movement is mistakenly recognized as a moving gesture. The collected dynamic gesture data consist of the coordinates of a series of finger sampling points during the gesture movement, so the original data need to be processed in the next experiment; at the same time, it is necessary to save the collected original data for the convenience of future research on dynamic gestures collected by LM. In this paper, the way to save dynamic gesture data is to store


the original data of each dynamic gesture as a text file. When the multi-dynamic gesture data is processed in the future, the original data can be obtained from the text file again for processing. The split hand data are stored in the array in sequence, following the requirements of each data structure, and the data are reorganized in sequence, and then the coordinates are converted, packaged and sent to Unity3D, so that the original frame data can be replaced by the split and reorganized data.
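The following sketch shows one way the displacement and rotation-angle thresholds described above could be applied to two consecutive LM samples; the variable names and threshold values are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

# Illustrative thresholds; real values would be tuned for the LM device.
MOVE_THRESH = 8.0      # palm displacement (mm) per classification window
ROTATE_THRESH = 12.0   # metacarpal-direction change (degrees) per window

def classify_frame_pair(prev_palm, cur_palm, prev_metacarpal_dir, cur_metacarpal_dir):
    """Distinguish a move gesture from a rotate gesture between two sampled frames.
    Inputs are 3-D vectors taken from the LM frame data (names are illustrative)."""
    # Palm displacement between the two samples.
    displacement = np.linalg.norm(np.asarray(cur_palm) - np.asarray(prev_palm))

    # Rotation angle between the two metacarpal direction (unit) vectors.
    a = np.asarray(prev_metacarpal_dir) / np.linalg.norm(prev_metacarpal_dir)
    b = np.asarray(cur_metacarpal_dir) / np.linalg.norm(cur_metacarpal_dir)
    angle = np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))

    # Ignore unintentional rotation while moving, and unintentional movement
    # while rotating, by requiring one measure to dominate the other.
    if angle > ROTATE_THRESH and displacement <= MOVE_THRESH:
        return "rotate"
    if displacement > MOVE_THRESH and angle <= ROTATE_THRESH:
        return "move"
    return "none"
```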

28.2.2 Gesture Recognition Model of Interactive System

As an important part of computer development, human–computer interaction has been developing with the development of computers. The main purpose of its development is to specifically understand computer users, accurately predict the reasons why they interact with computers, and realize efficient feedback interaction according to the obtained information, so as to make users interact with computers in real time comfortably, quickly and safely, and achieve users’ expected goals. Real-time and accuracy of gesture acquisition is one of the important factors to improve the accuracy of dynamic gesture recognition [12]. LM uses binocular infrared cameras to greatly improve the positioning accuracy of dynamic gestures and to reduce the finger coupling problem caused by the overlap between fingers. In addition, filtering images by infrared camera can effectively reduce the influence of background noise on dynamic gesture acquisition. A complete dynamic gesture can be regarded as a combination of a series of hands at different moments and postures, and the postures of hands at each moment can be expressed by a series of hand information, such as palm coordinates, palm speed, fingertip speed, finger length, width and knuckle angle. When LM captures a dynamic gesture, it will return a series of continuous frames, which contain the original posture information of the hand. The CNN model of gesture recognition is shown in Fig. 28.1. Before dynamic gesture recognition, it is necessary to carry out a series of preprocessing on the collected data to convert the gesture data into data with the same space size and format. At the last layer of the output data, LM uses CNN to perform multilayer convolution filtering on the image, which further reduces the noise in the data.

Fig. 28.1 CNN model of gesture recognition


Fig. 28.2 Convolution operation process of gesture recognition

Therefore, LM can provide users with efficient and real-time gesture data more stably and accurately. This provides a stable and reliable data guarantee for dynamic gesture recognition. The convolution operation process of gesture recognition is shown in Fig. 28.2. One of the difficulties in dynamic gesture data collection is how to determine the beginning and end of dynamic gestures. When LM captures a hand or other hand-like target, it will continuously output gesture information in the form of frames regardless of whether the target moves or not. In LM-based gesture interaction, the interaction information is implicit, which brings users a variety of interaction methods and environments, which can not only help users complete the operation close to reality in the virtual background, but also achieve the goals and tasks different from reality. Firstly, the collection S containing M gestures is obtained. Each image can be converted into a vector of N dimension. Put these M vectors into a set S: S = {┌1 , ┌2 , ┌3 , . . . , ┌n }

(28.1)

After obtaining the gesture vector set, calculate the average image: ϕn = ┌n − ψ

(28.2)

M orthogonal unit vectors u k are found, and this kind of unit vectors are used to describe ϕn distribution. Find the eigenvalue λk . When u k is determined, λk value is minimum:

286

Y. Wang

λk =

M 1 ∑  T 2 u ϕn M n=1 k

(28.3)
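Eqs. (28.1)–(28.3) can be read as a PCA-style analysis of the gesture vectors. The sketch below is one NumPy realization under the assumption that ψ is the mean image and that the u_k are obtained by eigen-decomposition of the covariance of the deviations; it is not the chapter's own code.

```python
import numpy as np

def gesture_eigenvectors(images):
    """images is an (M, N) array, each row a gesture image flattened to an
    N-dimensional vector Gamma_n (Eqs. 28.1-28.3 as a small sketch)."""
    gamma = np.asarray(images, dtype=np.float64)      # set S of M vectors
    psi = gamma.mean(axis=0)                          # average image psi
    phi = gamma - psi                                 # deviations phi_n = Gamma_n - psi

    # lambda_k = (1/M) * sum_n (u_k^T phi_n)^2 is the variance of the data along
    # u_k, so the u_k are eigenvectors of the covariance of the deviations.
    m = gamma.shape[0]
    cov_small = phi @ phi.T / m                       # M x M trick keeps it cheap
    lam, v = np.linalg.eigh(cov_small)                # eigenvalues in ascending order
    u = phi.T @ v                                     # map back to N-dimensional space
    u /= np.linalg.norm(u, axis=0, keepdims=True)     # orthonormal unit vectors u_k
    return lam[::-1], u[:, ::-1]                      # largest eigenvalues first
```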

Everyone’s hand bone structure is different. Therefore, the collected dynamic gestures have problems such as inconsistent shape, range and size. Even two identical gestures from the same person can sometimes differ greatly because of interference during collection. Therefore, it is necessary to standardize the collected dynamic gesture samples to reduce the noise interference between data. Assuming that the input and output functions of the gesture image feature information are expressed as R and R′ respectively, the discrete form of the bilateral filter for the gesture image feature information is:

$$
R'[k, j] = \sum_{m=-p}^{p} \sum_{n=-p}^{p} B[m, n, k, j]\, R[k - m, j - n] \tag{28.4}
$$

where p determines the pixel neighborhood of the gesture image feature information, m and n index the offsets within that neighborhood, and B[m, n, k, j] is the Gaussian kernel function of the gesture image feature information, whose expression is:

$$
B[m, n, k, j] = \exp\!\left( -\frac{m^2 + n^2}{2\sigma^2} - \frac{\left( R[k - m, j - n] - R(k, j) \right)^2}{2\sigma_\xi^2} \right) \tag{28.5}
$$

where σ represents the scale parameter of the gesture image feature information. The above formula is used to smooth the feature information of gesture images in both the geometric and photometric domains, eliminating the influence of noise while keeping the feature details of gesture images.
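A direct (unoptimized) NumPy rendering of Eqs. (28.4)–(28.5); the weight normalization in the last line is a common convention that the formulas above leave implicit, and the parameter values are illustrative.

```python
import numpy as np

def bilateral_filter(R, p=2, sigma_s=2.0, sigma_xi=25.0):
    """Direct implementation of Eqs. (28.4)-(28.5) on a 2-D array R.
    sigma_s plays the role of the geometric scale parameter (sigma in the text)
    and sigma_xi the photometric one; the weights are normalized here."""
    R = R.astype(np.float64)
    H, W = R.shape
    out = np.zeros_like(R)
    padded = np.pad(R, p, mode="edge")
    m, n = np.mgrid[-p:p + 1, -p:p + 1]                     # neighbourhood offsets
    spatial = np.exp(-(m ** 2 + n ** 2) / (2.0 * sigma_s ** 2))
    for k in range(H):
        for j in range(W):
            win = padded[k:k + 2 * p + 1, j:j + 2 * p + 1]  # R[k-m, j-n] window
            # Gaussian kernel B[m, n, k, j]: spatial term times photometric term.
            B = spatial * np.exp(-(win - R[k, j]) ** 2 / (2.0 * sigma_xi ** 2))
            out[k, j] = np.sum(B * win) / np.sum(B)         # Eq. (28.4), normalized
    return out
```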

28.3 Result Analysis and Discussion

In order to enhance the interaction between human and computer, non-cooperative somatosensory interaction technology is the most popular direction in the development and application of somatosensory technology. The operator does not need to touch the computer, but only needs to perform some actions in front of it; these actions are recorded while the various parts of the human body are tracked [13]. In interactive design, people are both operators and experiencers, and they occupy the core position in the interactive process. No matter what form of interaction, it should focus on people’s participation. In the process of gesture interaction, LM can track multiple targets at the same time, with high recognition accuracy, and can be used with computers and VR devices. In the process of gesture


interaction based on LM, it has the characteristics of continuity, multi-dimension and implication. In the process of gesture interaction, the interactive information based on three-dimensional space is obtained through LM. Among them, the information of each frame contains the spatial position and characteristic information of gestures or tools. The application will also provide users with three-dimensional visual image feedback to bring users a multi-dimensional interactive experience. Support vector machine (SVM) shows many unique advantages in the process of pattern recognition with small sample size, nonlinearity and high dimension, and can be well popularized and applied to other machine learning problems. SVM has the advantages of low generalization error, low calculation cost and easy interpretation of results, and is more suitable for the classification of numerical and normative data. The SVM algorithm is selected as the comparison object, and the experimental results are shown in Tables 28.1 and 28.2. When the number of test samples begins to increase, the accuracy of gesture recognition of both methods shows a downward trend. However, compared to SVM, the improved CNN gesture recognition accuracy is significantly higher. A good interactive gesture should not only be recognizable by efficient, optimized detection code on the device, but its actions should also be easy for the device to detect and recognize. This can not only improve the accuracy of device recognition, but also improve the effectiveness and usability of the program. A centralized analysis of the collected gesture data is conducted, recording the corresponding gesture movements and number of users for each function, and analyzing whether the movements made by the experimental personnel are smooth and coherent, and whether most of the experimental personnel are consistent.

Table 28.1 LM gesture recognition accuracy of the improved CNN

Sample size    Gesture recognition accuracy (%)
15             98.69
30             98.16
45             97.77
60             97.35
75             96.78
90             96.26
105            95.68

Table 28.2 LM gesture recognition accuracy of the SVM algorithm

Sample size    Gesture recognition accuracy (%)
15             96.85
30             96.42
45             93.11
60             90.76
75             88.42
90             85.21
105            80.67


The results of precision testing using the SVM algorithm are shown in Fig. 28.3, and the results using the improved CNN are shown in Fig. 28.4. It can be seen that the gesture recognition model based on the improved CNN is better than SVM in both accuracy and efficiency. The automatically learned deep features can be used well for identification and classification, and, supplemented by local features, the overall model achieves a better recognition rate.

Fig. 28.3 Accuracy test results of SVM algorithm

Fig. 28.4 Improved CNN accuracy test results


Fig. 28.5 Comparison of algorithm MAE

According to the generated error, the loss function value is calculated and the gradients of all weights and offsets in the network are obtained; all weights and offsets of the neural network are then adjusted until the gap between the final output and the ideal target value disappears or nearly disappears. The comparison of MAE between this algorithm and the traditional CNN is shown in Fig. 28.5. It can be seen that, compared with the traditional CNN model, the improved CNN has obvious advantages in the later stage of operation, and the error is reduced by 36.55%. Compared with other loss functions, the cross-entropy loss function gives higher model accuracy and lower model loss. The optimized data structure is applied to the LM system for testing, which significantly improves the speed and accuracy of gesture capture, greatly relieves the pressure on network transmission bandwidth, realizes real-time capture of large-scale gestures, and meets the needs of large-scale manual product assembly.
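The training loop described above (loss computed from the output error, gradients of all weights and offsets, repeated adjustment) might look like the following PyTorch sketch with the cross-entropy loss; the stand-in network, the 32 × 32 input size and the random data are assumptions for illustration only, not the chapter's implementation.

```python
import torch
import torch.nn as nn

# A tiny stand-in CNN; the improved CNN from this chapter would replace it.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(8 * 16 * 16, 10),
)
criterion = nn.CrossEntropyLoss()          # cross-entropy loss discussed above
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Dummy batch standing in for preprocessed 32x32 LM gesture maps and labels.
gestures = torch.randn(16, 1, 32, 32)
labels = torch.randint(0, 10, (16,))

for step in range(100):                    # iterate until the error nearly disappears
    optimizer.zero_grad()
    logits = model(gestures)               # forward pass
    loss = criterion(logits, labels)       # loss value from the generated error
    loss.backward()                        # gradients of all weights and offsets
    optimizer.step()                       # adjust the weights and offsets
```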

28.4 Conclusion

Since the digital technology revolution, human–computer interaction technology has developed rapidly and has been widely used in various fields. In this context, because the somatosensory interaction technology does not need complex controllers, users can interact with the environment and equipment around them directly through physical movements, and it has gradually become the most popular interaction technology. Compared with other gesture acquisition devices, LM devices are more miniaturized and specialized, which provides a guarantee for fast and accurate dynamic gesture recognition methods and improves the gesture interaction experience in virtual scenes. In this paper, the recognition error of LM is analyzed by real-time


comparison threshold, and the position of human hand is corrected by hierarchical correction algorithm to solve the unstable recognition phenomenon in the process of human hand interaction. Compared with traditional CNN model, the improved CNN has obvious advantages in the later stage of operation, and the error is reduced by 36.55%. Compared with other loss functions, the cross entropy loss function has higher model accuracy and less model loss. The optimized data structure is applied to LM system for testing, which significantly improves the speed and accuracy of gesture capture, greatly relieves the pressure of network transmission bandwidth, realizes real-time capture of large-scale gestures, and meets the needs of large-scale manual product assembly. Using a unified joint motion threshold to judge the recognition error of LM will affect the accuracy of joint correction. The next step is to get the joint motion thresholds of different levels of joint angles according to the hierarchical structural characteristics of human hands.

Acknowledgements Application of Gesture Interaction Technology in Public Art Design in Smart City (B2022422).

References

1. I.U. Rehman, S. Ullah, D. Khan, S. Khalid, A. Alam, G. Jabeen, I. Rabbi, H.U. Rahman, N. Ali, M. Azher, Fingertip gestures recognition using leap motion and camera for interaction with virtual environment. Electronics 9, 1986 (2020)
2. B. Yang, X. Xia, S. Wang, L. Ye, Development of flight simulation system based on leap motion controller. Proced. Comput. Sci. 183, 794–800 (2021)
3. K. Lee, S. Lee, J. Lee, Interactive character animation by learning multi-objective control. ACM Trans. Graph. 37, 1–10 (2018)
4. D. Bachmann, F. Weichert, G. Rinkenauer, Review of three-dimensional human-computer interaction with focus on the leap motion controller. Sensors 18, 2194 (2018)
5. R. Fonk, S. Schneeweiss, U. Simon, L. Engelhardt, Hand motion capture from a 3D leap motion controller for a musculoskeletal dynamic simulation. Sensors 21, 1199 (2021)
6. Q. Wang, W. Jiao, P. Wang, Y. Zhang, Digital twin for human-robot interactive welding and welder behavior analysis. IEEE/CAA J. Autom. Sin. 8, 334–343 (2020)
7. M. Cohen, D. Regazzoni, C. Vrubel, A 3D virtual sketching system using NURBS surfaces and leap motion controller. Comput. Aided Des. Appl. 17, 167 (2019)
8. C. Mizera, T. Delrieu, V. Weistroffer, C. Andriot, A. Decatoire, J.-P. Gazeau, Evaluation of hand-tracking systems in teleoperation and virtual dexterous manipulation. IEEE Sens. J. 20, 1642–1655 (2019)
9. K.E. Raheb, M. Stergiou, A. Katifori, Y. Ioannidis, Dance interactive learning systems: a study on interaction workflow and teaching approaches. ACM Comput. Surv. 52, 1–37 (2019)
10. I. Cortés-Pérez, N. Zagalaz-Anula, D. Montoro-Cárdenas, R. Lomas-Vega, E. Obrero-Gaitán, M.C. Osuna-Pérez, Leap motion controller video game-based therapy for upper extremity motor recovery in patients with central nervous system diseases. A systematic review with meta-analysis. Sensors 21, 2065 (2021)


11. H. Li, L. Wu, H. Wang, C. Han, W. Quan, J. Zhao, Hand gesture recognition enhancement based on spatial fuzzy matching in leap motion. IEEE Trans. Ind. Inform. 16, 1885–1894 (2019)
12. G. Ponraj, H. Ren, Sensor fusion of leap motion controller and flex sensors using Kalman filter for human finger tracking. IEEE Sens. J. 18, 2042–2049 (2018)
13. A. Ganguly, G. Rashidi, K. Mombaur, Comparison of the performance of the leap motion Controller™ with a standard marker-based motion capture system. Sensors 21, 1750 (2021)

Chapter 29

Optimization of Moving Object Tracking Algorithm Based on Computer Vision and Vision Sensor

Gongchao Liu

Abstract Machine vision uses the collected image information to understand and analyze the real world, and has been applied in more and more scenes. Many factors will have a substantial impact on the performance and efficiency of the algorithm in the process of target movement. In this paper, an improved convolutional neural network (CNN) is proposed to optimize the design of moving target tracking algorithm. The targets with the same characteristics in each frame of image are correlated to obtain the information of the target’s position, speed, acceleration and so on. Then the complete driving trajectory of the target in the field of view is obtained. The simulation results show that compared with the traditional support vector machine (SVM) algorithm, the improved CNN has obvious advantages in the later stage of operation, and the error is reduced by 33.57%. The moving target tracking model based on improved CNN algorithm has more advantages than SVM algorithm in both monitoring accuracy and error. Using the tracking algorithm, the moving target can be tracked stably in real time, thus providing a reliable data source for the deep analysis of target behavior and higher-level visual tasks in the later period according to the actual application requirements.

Keywords Computer vision · Convolutional neural network · Sensor · Target following

G. Liu (B) Network Information Center (NIC), Wuhan Qingchuan University, Wuhan 420001, Hubei, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_29

29.1 Introduction

Vision is the most powerful way of human perception. It is usually used to observe things and the environment to obtain relevant information, especially when it is impossible to make direct contact with objects, it can be used to interact with the surrounding environment [1]. With the rapid improvement of computer computing


ability and the continuous progress of image acquisition technology, people try to use computers to replace the human eye and brain, intelligently analyze and process the image or video information captured by the camera, and achieve further operations [2]. Computer vision is to use image sensors to collect and obtain images or image sequences, and use relevant technologies for processing and analysis, such as image processing, artificial intelligence, etc., to achieve the purpose of describing and interpreting the three-dimensional world [3]. Computer vision has become an important discipline, covering computer graphics, pattern recognition, neural network and other technologies. The basic idea of target tracking is to detect, extract and recognize the moving target according to the correlation of video information in space and time, and track the target using the extracted representative target features. At the same time, according to the actual application requirements, obtain the motion information such as the centroid coordinates, size, speed or motion track of the target from the tracking process, Provide a reliable data source for in-depth analysis of target behavior and higher-level visual tasks in the later stage [4, 5]. The appearance of computer vision technology has further extended the development of human visual system, especially with the help of various sensor technologies, enabling people to track the moving object in real time, so as to accurately grasp the specific shape attributes of the object [6]. Real-time observation and analysis of massive video data is a very difficult task. Even the post-video query is extremely time-consuming and difficult to obtain all important information. The tracking of moving targets is the basis of subsequent processing processes such as target recognition and classification, target feature extraction, and the results directly affect the understanding and description of target behavior, as well as the difficulty and accuracy of higher-level processing processes such as reasoning and decision-making [7]. In the process of target tracking, there will also be various interferences. The most common interferences are the occlusion of obstacles and the color change of complex background [8]. In practical applications, due to the different speed of tracking targets, the difference of shooting scenes and the difference of the number of tracking targets, the applicable detection and tracking algorithms are also different, and the performance requirements of the algorithms are also different [9]. Fu [10] believes that target detection is widely used in tracking, behavior recognition and anomaly detection, and subsequent algorithms including target detection usually only use foreground pixels instead of the whole picture, so the accuracy of target extraction will greatly affect the performance of the algorithm. Liu [11] believes that in the process of visual tracking of moving objects, the purpose of tracking is to obtain the position, speed, acceleration, trajectory and other information of moving objects in the scene, which is equivalent to the later stage in the visual process. Daryasafar et al. [12] pointed out that in complex scenes, the image information collected by video surveillance is fuzzy and the background noise is large, which requires the auxiliary detection support of video multi-frame image information. 
This paper proposes to use improved CNN to optimize the design of moving target tracking algorithm, associate the targets with the same characteristics in each frame of the image, get the target’s position, speed, acceleration and other information, and then get the target’s complete travel path in the field of view.


29.2 Methodology

29.2.1 Computer Vision and Moving Object Detection

Moving object detection refers to the process of detecting changing areas in video image sequences and extracting moving objects from the background. Moving target detection is the key link of moving target tracking, and it is the bottom layer of the whole moving target tracking system. Accurate extraction of the moving target area is the basis of subsequent advanced applications, such as target behavior analysis. As the most basic task in most machine vision applications, target detection is to segment the moving foreground target from the video or picture sequence, and extract the contour and color features of the target. Target detection is widely used in tracking, behavior recognition and anomaly detection. For interference factors such as illumination changes and target shadows in video images, target detection methods must also effectively eliminate their effects. In order to obtain this information, after obtaining and preprocessing the image sequence, the moving target is detected directly from the image sequence. After the moving target is detected, it is extracted and identified to judge whether to track it. Finally, the target is tracked and the relevant motion information of the moving target is obtained. In the field of machine vision, dynamic target detection is the process of separating the real-world target from its background environment, and extracting the target from a single image in the collected image sequence. In order to obtain the continuous characteristics of the target in the video surveillance system, it is necessary to detect and track the target in real time. Then the characteristics of specific target objects are extracted and analyzed, so as to provide comprehensive and accurate target characteristics for behavior classification and recognition. In the binocular vision system, in order to obtain the depth information of a point in space, binocular camera calibration should first be carried out, that is, the corresponding relationship between image pixels and the three-dimensional scene should be established. Camera calibration in binocular vision refers to the process of solving the camera projection matrix. The projection matrix is determined by the internal and external parameters of the camera. The camera parameters are calculated by using the constraint relationship between multiple three-dimensional feature points on the calibration board and their projection points on the two-dimensional plane image.
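For the calibration step just described, OpenCV's calibrateCamera jointly refines the linear (projection) parameters and the distortion coefficients from checkerboard views. The sketch below calibrates a single camera of the rig; the board size, square length and file names are illustrative assumptions, and a binocular system would additionally calibrate the stereo extrinsics.

```python
import cv2
import numpy as np

pattern = (9, 6)                 # inner corners of the calibration board (assumed)
square = 0.025                   # square size in metres (assumed)

# 3-D coordinates of the board corners in the board's own coordinate frame.
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points, image_size = [], [], None
for path in ["calib_01.png", "calib_02.png"]:            # illustrative file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        image_size = gray.shape[::-1]

# Solve for intrinsics, distortion and per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points,
                                                 image_size, None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix:\n", K)
```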

29.2.2 Moving Target Tracking Algorithm

Moving target detection and tracking is to process continuous image sequences through visual programs, detect and extract moving targets from background images, and use image segmentation algorithm to separate interested targets, and then use tracking algorithm to track and locate the selected targets, so as to obtain accurate position, speed and other information of the targets. In the process of target tracking,

296

G. Liu

if we want to successfully build a model of CNN’s target tracking, we need to choose a suitable model and build a reasonable network structure. Because CNN uses the idea of visual receptive field, features can be made invariant through convolution and pooling, such as scale invariance, deformation invariance and translation invariance. In CNN, some edge features can be obtained by pre-training feature learning, which can be used as parameters of the convolution filter, and then pooled to improve the translation invariance of the model. As a method of deep learning, CNN is widely used in various fields, such as moving target tracking. Moving target tracking refers to extracting the features of the target in the video sequence and determining the position of the target by using an appropriate matching algorithm [13]. Therefore, choosing an appropriate feature extraction method is necessary to effectively distinguish the target from the background. Because the influence of camera distortion is not considered in the initial value calculation, the error of the calculated linear parameters is relatively large; when the distortion is then calculated from these initial values, the error of the distortion parameters obtained in the first calculation is also relatively large. Therefore, to improve the accuracy of the internal and external camera parameters, it is necessary to recalculate the linear parameters using the calculated distortion coefficients, and then recalculate the distortion coefficients using the new linear parameters. This is repeated over many iterations until the values of the linear parameters and the distortion coefficients converge. The feature extraction model of moving targets based on improved CNN is shown in Fig. 29.1.

Fig. 29.1 Feature extraction model of moving target


Let X_i^k denote the sum of the inputs of neuron i in layer k and Y_i^k its output. The weight from neuron j in layer k − 1 to neuron i in layer k is W_ij; then the following functional relationships hold:

$$
Y_i^k = f\left( X_i^k \right) \tag{29.1}
$$

$$
X_i^k = \sum_{j=1}^{n+1} W_{ij}\, Y_j^{k-1} \tag{29.2}
$$

Usually f is taken as the asymmetric Sigmoid function:

$$
f\left( X_i^k \right) = \frac{1}{1 + \exp\left( -X_i^k \right)} \tag{29.3}
$$

Assuming that the output layer is the m-th layer, the actual output of the i-th neuron in the output layer is Y_i^m. Let the corresponding motion signal be Y_i, and define the error function e as:

$$
e = \frac{1}{2} \sum_i \left( Y_i^m - Y_i \right)^2 \tag{29.4}
$$
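A tiny NumPy sketch of Eqs. (29.1)–(29.4), treating the (n + 1)-th input of Eq. (29.2) as a constant bias term; the layer sizes and random weights are illustrative only.

```python
import numpy as np

def sigmoid(x):
    # Eq. (29.3): f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(W, y_prev):
    # Eq. (29.2): X_i^k = sum_j W_ij * Y_j^(k-1); the appended 1.0 is the bias input.
    x = W @ np.append(y_prev, 1.0)
    # Eq. (29.1): Y_i^k = f(X_i^k)
    return sigmoid(x)

def error(y_out, y_target):
    # Eq. (29.4): e = 1/2 * sum_i (Y_i^m - Y_i)^2
    return 0.5 * np.sum((y_out - y_target) ** 2)

# Two-layer example with random weights (shapes are illustrative).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3 + 1)), rng.normal(size=(2, 4 + 1))
y0 = np.array([0.2, 0.7, 0.1])                 # input features
y1 = layer_forward(W1, y0)
y2 = layer_forward(W2, y1)
print(error(y2, np.array([1.0, 0.0])))         # error against the motion signal
```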

After calculating the response values of the motion features, the matched semantic features can be updated by the filters. When occlusion is detected, the size of the search area needs to be changed, and the method of setting the search area used without occlusion can no longer be adopted. If the overlap between blocks is not considered, then when a given image block contains both foreground pixels and background pixels, erroneous classification at the pixel level is inevitable [14]. In a simple scene the environment changes slowly, so a single Gaussian model can be used when establishing the background model; for environments with complex image content, a mixture of Gaussian distributions must be fitted when establishing the model. The background in the real environment is usually dynamic, and the statistics of a single pixel are multimodal, so the behavior of such a pixel must be described by a combination of multiple Gaussian distributions. In complex-environment modeling, the model is updated according to each newly acquired image. Finally, all the pixels in the current image are compared with the established background model: points with higher similarity are classified as background, otherwise they are classified as targets.
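The mixture-of-Gaussians background model described above is available in OpenCV as the MOG2 background subtractor; the sketch below is a hedged usage example with illustrative parameter values and video source, not this chapter's implementation.

```python
import cv2

# Mixture-of-Gaussians background model, matching the multi-modal
# per-pixel description above; parameter values are illustrative.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("surveillance.avi")        # illustrative video source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Pixels close to the learned Gaussian mixture are background (0), the rest
    # are marked as moving foreground (255); the model is updated every frame.
    fg_mask = subtractor.apply(frame)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
cap.release()
```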


29.3 Result Analysis and Discussions

There are many background components in the tracking rectangle, and the background information may change continuously during the tracking process, becoming redundant information [15]. However, the filter used in the convolution process of the CNN structure in this paper can extract some edge features. After pooling, there are translation-invariant edge features at the junction of the target and the background, which greatly reduces the redundant information caused by background changes. Taking the tracking accuracy of the moving target as the test index, the traditional SVM algorithm is selected as the comparison object, and the experimental results are shown in Tables 29.1 and 29.2. When the number of test samples begins to increase, the accuracy of target detection and tracking of both methods shows a downward trend. However, compared with the traditional SVM algorithm, the accuracy of moving target detection and tracking based on the improved CNN is obviously higher. The task of video target tracking is transformed into inferring the spatial state of the tracked target from observation data under the particle filter framework, and solving the state-space parameter variables of a time-varying dynamic system. The state-inference framework includes a dynamic model and an observation model: the dynamic model represents the evolution of the system state over the time series, while the observation model describes the noisy observations of the system state made at discrete time points.
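The particle-filter state-inference framework mentioned above (a dynamic model plus a noisy observation model) can be sketched as a single predict/weight/resample step; the random-walk motion model, Gaussian likelihood and noise levels below are illustrative assumptions rather than the chapter's actual tracker.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement,
                         motion_std=3.0, obs_std=5.0, rng=np.random.default_rng()):
    """One predict/update cycle. particles: (P, 2) candidate target positions;
    measurement: the observed (x, y) of the target in the current frame."""
    # Dynamic model: propagate each particle with process noise (random walk).
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Observation model: weight particles by the Gaussian likelihood of the
    # noisy measurement given each candidate state.
    d2 = np.sum((particles - measurement) ** 2, axis=1)
    weights = weights * np.exp(-d2 / (2.0 * obs_std ** 2)) + 1e-12
    weights /= weights.sum()

    # Resample to concentrate particles on high-likelihood states.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))

    estimate = particles.mean(axis=0)       # inferred target position
    return particles, weights, estimate
```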

Table 29.1 Moving target tracking accuracy of the improved CNN

Sample size    Target detection and tracking accuracy (%)
15             98.85
30             98.75
45             97.85
60             97.77
75             96.64
90             96.45
105            96.03

Table 29.2 Tracking accuracy of the moving target based on the SVM algorithm

Sample size    Target detection and tracking accuracy (%)
15             95.47
30             94.36
45             94.77
60             93.21
75             91.35
90             90.64
105            89.7


Fig. 29.2 Comparison of algorithm MAE

The feature points extracted for moving target tracking must be stable, that is, their positions must not change greatly because of a slight change in image quality. These feature points also need to be scale- and rotation-invariant, so that the same target object can be detected in different scenes, and they must therefore be extracted at multiple scales. The comparison of the algorithm MAE is shown in Fig. 29.2. It can be seen that, compared with the traditional SVM algorithm, the improved CNN has obvious advantages in the later stage of operation, and the error is reduced by 33.57%. Based on the moving foreground target detected in the video or image sequence, the same moving object is tracked continuously by using a certain feature of the target combined with an efficient search algorithm, and information such as the moving state and trajectory of the target is obtained. After the reference model is established, each newly obtained frame is compared with the model, and pixels or regions that differ from the model are marked as foreground; at the same time, the reference model is updated using the regions marked as background. If the pixel value of a point in an image follows a Gaussian distribution for a period of time, the probability distribution of the pixel can be defined by a Gaussian probability density function. When most of the image content is continuously classified as foreground for a period of time, the model's reinitialization process is triggered. The scatter plot of the accuracy test using the traditional SVM algorithm is shown in Fig. 29.3, and the scatter plot of the testing accuracy using the improved CNN is shown in Fig. 29.4. It can be seen that the moving target tracking model based on the improved CNN algorithm has advantages over the SVM algorithm in both monitoring accuracy and error. Although the two target markers were occluded for a short time during the assessment process, stereo tracking of the moving target through binocular vision can still accurately track the target markers during the period of mutual occlusion.


Fig. 29.3 Accuracy detection results of SVM algorithm

Fig. 29.4 Accuracy detection results of improved CNN algorithm

When the two target marker points separate again, the correct marker points can still be found among those predicted by binocular stereo tracking of the moving target. The assessment results also show that the marker points finally tracked by this algorithm are correct and that the approach can be generalized. Compared with the traditional target tracking algorithm, even when the number of target marker points is very large and mutual occlusion and similar phenomena occur, it can still overcome the defects of the traditional algorithm and make up for its tracking errors.


29.4 Conclusion

Computer vision uses image sensors to collect and obtain images or image sequences, and uses related technologies to process and analyze them, so as to achieve the purpose of describing and explaining the three-dimensional world. In a complex environment, how to realize a robust, accurate and fast detection and tracking algorithm is the direction all researchers are currently pursuing. In this paper, the improved CNN is proposed to optimize the design of the moving target tracking algorithm: the targets with the same characteristics in each frame of the image are correlated to obtain the information of the target's position, speed and acceleration, and then the complete driving trajectory of the target in the field of view is obtained. The results show that, compared with the traditional SVM algorithm, the improved CNN has obvious advantages in the later stage of operation, and the error is reduced by 33.57%. The moving target tracking model based on the improved CNN algorithm has more advantages than the SVM algorithm in both monitoring accuracy and error. This tracking and updating mechanism solves the problems of interference with target tracking and changes of the target itself, and has obvious advantages in the accuracy and success rate of the tracking effect. However, the tracking system in this paper only studies the tracking of a single target in detail, whereas in practice it is often necessary to track multiple targets, so how to use CNN to track multiple targets is a new direction for future research and learning.

References

1. K. Liu, S. Wei, Z. Chen, B. Jia, G. Chen, H. Ling, C. Sheaff, E. Blasch, A real-time high-performance computation architecture for multiple moving target tracking based on wide-area motion imagery via cloud and graphic processing units. Sensors 17, 356 (2017)
2. H. Ahmadi, F. Viani, R. Bouallegue, An accurate prediction method for moving target localization and tracking in wireless sensor networks. Ad Hoc Netw. 70, 14–22 (2017)
3. X. Gong, Z. Le, H. Wang, Y. Wu, Study on the moving target tracking based on vision DSP. Sensors 20, 6494 (2020)
4. M. Rabah, A. Rohan, S.A. Mohamed, S.-H. Kim, Autonomous moving target-tracking for a UAV quadcopter based on fuzzy-PI. IEEE Access 7, 38407–38419 (2019)
5. Y. Feng, S. Zhao, G. Yu, Research on moving target tracking algorithm based on computer vision in complex scene. Revista de la Facultad de Ingenieria 32, 784–790 (2017)
6. X. Lu, Z. Jia, X. Hu, W. Wang, Double position sensitive detectors (PSDs) based measurement system of trajectory tracking of a moving target. Eng. Comput. 34, 781–799 (2017)
7. Z. Li, X. Chen, Z. Zhao, Design of standoff cooperative target-tracking guidance laws for autonomous unmanned aerial vehicles. Math. Probl. Eng. 2021, 1–14 (2021)
8. M. Liu, D. Zhang, S. Zhang, Q. Zhang, Node depth adjustment based target tracking in UWSNs using improved harmony search. Sensors 17, 2807 (2017)
9. J. Li, J. Wang, W. Liu, Moving target detection and tracking algorithm based on context information. IEEE Access 7, 70966–70974 (2019)
10. R. Fu, Research on tracking algorithm of moving target based on computer vision. Boletin Tecnico/Tech. Bull. 55, 425–434 (2017)
11. J. Liu, Video moving target detection and tracking based on hybrid algorithm. Revista de la Facultad de Ingenieria 32, 55–64 (2017)


12. N. Daryasafar, R. Sadeghzadeh, M. Naser-Moghadasi, A technique for multitarget tracking in synthetic aperture radar spotlight imaging mode based on promoted PHD filtering approach. Radio Sci. 52, 248–258 (2017)
13. P. Anup, K. Rushikesh, W.E. Dixon, Target tracking in the presence of intermittent measurements via motion model learning. IEEE Trans. Robot. 14, 1–15 (2018)
14. M. Anvaripour, M. Saif, M. Ahmadi, A novel approach to reliable sensor selection and target tracking in sensor networks. IEEE Trans. Industr. Inf. 16, 171–182 (2020)
15. Q. Ge, Z. Wei, T. Cheng, S. Chen, X. Wang, Flexible fusion structure-based performance optimization learning for multisensor target tracking. Sensors 17, 1045 (2017)

Chapter 30

Simulation Experiment of DPCM Compression System for High Resolution Multi-spectral Remote Sensing Images

Wei Du, Yi Wu, Maojie Tian, Wei Hu, and Zhidong Li

Abstract Compared with natural images, remote sensing images are rich in texture information and have poor spatial correlation, so it is difficult to obtain high compression ratio only by removing spatial redundancy in the spectrum. In order not to lose the original information obtained by detectors, remote sensing data usually require lossless compression. In this paper, the simulation experiment of DPCM compression system for high resolution (HR) multispectral remote sensing image is carried out, and a new algorithm for HR multispectral remote sensing image compression using wavelet transformation (WT) and piecewise DPCM hybrid coding is introduced. Firstly, the algorithm uses DPCM method to remove the inter-spectral redundancy of HR multi-spectral remote sensing images, and then uses WT to encode the residual image after removing the inter-spectral redundancy. This algorithm can not only realize near lossless compression of HR multispectral remote sensing images, but also realize lossy compression with different accuracy conveniently. The simulation results show that the entropy value of the image is reduced after the compression of this algorithm. The algorithm in this paper solves the problems existing in calculating the basis vector by covariance estimation and feature decomposition, and it only needs a few bits, but the calculation is very effective. It shows that the algorithm in this paper can achieve ideal compression effect.

Keywords High resolution · DPCM · Multi-spectral remote sensing images

W. Du · Y. Wu · M. Tian (B) · W. Hu · Z. Li State Grid Electric Space Technology Company Limited, Beijing 102209, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_30

30.1 Introduction

High resolution (HR) imaging spectrum technology is a new generation of remote sensing technology, which can obtain the spatial characteristics of ground targets, and at the same time, it can obtain rich spectral information, and can qualitatively


and quantitatively analyze and identify the measured objects [1]. HR multispectral remote sensing imagery is derived from multispectral remote sensing technology, such as the imaging spectrometers used on meteorological satellites. Compared with ordinary images, HR multispectral remote sensing images carry an additional spectral dimension, and the amount of data is large, far exceeding the capacity of data transmission and storage equipment, so it must be compressed. Compared with natural images, remote sensing images are rich in texture information and have poor spatial correlation, so it is difficult to obtain a high compression ratio only by removing spatial redundancy within each band [2, 3]. At the same time, today’s remote sensing applications generally require a large dynamic range of instruments, generally using 12-bit quantization, which puts great pressure on real-time transmission and recording, and limits the further application and development of imaging spectrometers. After decades of development, image compression technology has made considerable progress and matured, and corresponding image compression standards have gradually formed. However, many image compression algorithms and standards are aimed at 2D images, and their effect on compressing HR multi-spectral remote sensing images is not obvious. Literature [4] adopts three-dimensional discrete cosine transform (DCT) to compress HR multispectral remote sensing images. The main disadvantage of the DCT transform is that the efficiency of spectral band decorrelation is lower than that of the K–L transform. Literature [5] uses adaptive DPCM to decorrelate between bands, and then uses Variable Length Coding (VLC) to compress the residual error image, thus realizing lossless compression of AVIRIS images. Literature [6] proposed a lossless compression algorithm for HR multispectral remote sensing images with three-dimensional adaptive prediction, and achieved ideal compression results. As a new type of remote sensing data source, HR multi-spectral remote sensing images have been applied in more and more fields, and various methods for HR multi-spectral remote sensing image compression are still being proposed. In order not to lose the original information obtained by the detector, remote sensing data usually require lossless compression. At the same time, in order to facilitate real-time application, the compression method is required to have a small amount of calculation and low hardware requirements. According to information theory, the reason why digital images can be compressed is that there is correlation between pixels and there is information redundancy. In this paper, the simulation experiment of the DPCM compression system for HR multispectral remote sensing images is carried out, and a new algorithm for HR multispectral remote sensing image compression using wavelet transformation (WT) and piecewise DPCM hybrid coding is introduced. The purpose of this paper is to use wavelet coding technology to compress airborne HR multispectral remote sensing images quickly and near losslessly, so as to try to solve many of the problems mentioned above.


30.2 Research Method

30.2.1 WT and Bit Plane Coding

HR multispectral remote sensing images usually contain hundreds of bands, which makes them different from ordinary images. The characteristics of HR multispectral remote sensing images themselves, such as high data dimension and large data volume, give their compression its own particularity, make their processing special and make previously successful 2D image compression methods no longer directly applicable. For the human visual system, image space is the most natural and intuitive expression. For each fixed wavelength, a photograph of the ground scene is obtained. The image can be compressed because there is redundancy between image pixels, that is, there is correlation. Therefore, before studying the compression method of HR multispectral remote sensing images, it is necessary to analyze their correlation. For HR multispectral remote sensing images, the correlation mainly has two aspects: spatial correlation and inter-spectral correlation [7, 8]. With the continuous development and perfection of remote sensing technology, remote sensors that obtain large amounts of remote sensing data are constantly emerging and developing in the direction of HR multi-spectrum. This raises a question: how to transmit such a large amount of data from the space platform to the ground. The telemetry system is required to transmit as much remote sensing information as possible within the limited channel capacity. Lossless compression is mainly used, but when the amount of information is too large for lossless compression to meet the channel requirements, lossy compression with low distortion is also considered [9]. If only some special areas are of interest, classified compression can be used to reduce the burden of communication and storage. A lossless compression method mainly includes two parts: first, remove the correlation of the image data, for example with DPCM or orthogonal transforms, to reduce the entropy of the image data; for multiband remote sensing images, both spectral correlation and spatial correlation should be removed. Secondly, Huffman coding, Rice coding and arithmetic coding are often used to encode the decorrelated results. WT is a transform with local characteristics in both the time domain and the frequency domain, which is very beneficial to signal compression. Wavelet analysis has become a practical signal analysis tool; one of its main advantages is the ability to provide local analysis and refinement [10]. In WT, the family of basis functions is obtained by translating and dilating a base wavelet ψ(t), or mother wavelet, that is:

$$
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left( \frac{t - b}{a} \right), \quad a > 0 \tag{30.1}
$$

Here, the scale parameter a and the translation parameter b are both real numbers.


WT inherits and develops the localization idea of the short-time Fourier transform [11]. By using a variable time–frequency window, the signal can be analyzed at multiple scales, so that any detail of the processed object can be focused on, which resolves the contradiction between time-domain and frequency-domain localization in the Fourier transform. Let ψ̂(ω), the Fourier transform of the mother wavelet, be continuous at ω = 0; then from the admissibility condition we can conclude that:

∫_{−∞}^{∞} ψ(x) dx = 0    (30.2)

We can see that the mother wavelet must be an oscillating waveform that takes both positive and negative values and whose mean value in the time domain is zero. Although the area of the time–frequency window in WT is constant, its time width and bandwidth change. This effect of the scale parameter is shown in Fig. 30.1, and this multiresolution characteristic of WT is exactly what we hope an analysis method has when analyzing non-stationary signals. The redundancy of an image signal mainly includes spatial redundancy, statistical redundancy and human visual redundancy, while multispectral remote sensing images add inter-spectral redundancy. This redundancy comes from two aspects. First, although there are many spectral bands, they are all responses to the same target features, so the spatial structure of the image is the same across bands. Second, the spectral curve has a large number of smooth regions, and if adjacent bands of the multispectral sensor sample such a smooth region, the corresponding bands are highly correlated.

Fig. 30.1 Basis function and time–frequency resolution of WT
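To make the multiresolution idea concrete, the sketch below decomposes a single spectral band with a 2-D wavelet transform. It is only an illustration: the PyWavelets package, the biorthogonal wavelet and the decomposition level are assumptions of this sketch and are not specified in the chapter, whose own programs were written in C/C++.

```python
# A minimal sketch of a single-band 2-D wavelet decomposition, assuming the
# third-party PyWavelets (pywt) and NumPy packages; the wavelet and level are
# illustrative choices, not values from the chapter.
import numpy as np
import pywt

def wavelet_decompose(band: np.ndarray, wavelet: str = "bior4.4", level: int = 3):
    """Decompose one spectral band into an approximation and detail subbands."""
    coeffs = pywt.wavedec2(band, wavelet=wavelet, level=level)
    return coeffs  # [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]

# Example: a synthetic 12-bit band of size 512 x 512.
band = np.random.randint(0, 4096, size=(512, 512)).astype(np.float64)
coeffs = wavelet_decompose(band)
print("approximation shape:", coeffs[0].shape)
```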


Generally speaking, compared with 2D images, HR multispectral remote sensing image data has not only spatial correlation but also inter-spectral correlation. Lossless and near-lossless compression of HR multispectral remote sensing images is realized by removing this redundancy that is unique to them. The core problem of WT-based image compression is how to quantize the coefficient matrix and how to encode the quantized result. The basic idea of bit-plane-based embedded coding is to sort the transformed coefficients according to their contribution to the quality of the restored image and to code the bit planes of the coefficients one by one from the most significant to the least significant. Bit-plane coding adopts successive-approximation quantization, which determines the significant values against a series of decreasing thresholds; each threshold is 1/2 of the previous one, and the largest threshold is generally derived from the floor of the maximum wavelet coefficient magnitude, that is, the largest integer not greater than that value. In most cases a power of two, 2^n, is taken, where n is determined by the coefficient with the largest amplitude. Due to the invariance of the Euclidean norm under an orthogonal transformation, we have:

MSE = (1/N) Σ_i Σ_j (x_{i,j} − x̂_{i,j})² = (1/N) Σ_i Σ_j (c_{i,j} − ĉ_{i,j})²    (30.3)

where i and j index the rows and columns of the image, x_{i,j} and x̂_{i,j} are the pixels of the original and decoded images, c_{i,j} and ĉ_{i,j} are the transform coefficients before and after quantization, and N is the number of pixels (image height times width). The binary representations of all wavelet coefficients are sorted and the most important bits are transmitted first, to ensure that the decoded image is as close as possible to the original, and the coding process can be interrupted or ended at any time according to the preset compression ratio and image quality requirements.
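As a small illustration of the successive-approximation threshold schedule described above, the sketch below scans wavelet coefficients bit plane by bit plane. It is a simplified sketch only (no sorting of coefficients and no entropy coding of the significance map), not the chapter's actual coder.

```python
# Hedged sketch of successive-approximation bit-plane coding of wavelet
# coefficients: the initial threshold is 2**n, where n comes from the largest
# coefficient magnitude, and each pass halves the threshold.
import numpy as np

def bitplane_passes(coeffs: np.ndarray, num_passes: int = 8):
    """Yield (threshold, significance_map) pairs from high to low bit planes."""
    n = int(np.floor(np.log2(np.abs(coeffs).max() + 1e-12)))
    threshold = 2 ** n
    for _ in range(num_passes):
        yield threshold, (np.abs(coeffs) >= threshold)
        threshold //= 2          # each threshold is half of the previous one
        if threshold == 0:
            break

coeffs = np.array([[37.0, -5.2], [12.8, 0.6]])
for t, sig in bitplane_passes(coeffs, num_passes=4):
    print("threshold", t, "significant coefficients", int(sig.sum()))
```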

30.2.2 Segmentation DPCM Compression Algorithm

The difficulty of lossless compression lies in its low compression rate. To improve efficiency as much as possible, the correlation of the information in all parts of the system should be exploited as fully as possible, on the premise that the amount of information is not damaged, so that decorrelation can then be done well. DPCM is a mature and effective method of lossless decorrelation. It preserves detail, gray-level performance and background information, and is simple to implement. It uses previous image data to predict the value of the current point and replaces the current point with the error between the predicted value and the actual value, which has a small amplitude and a highly peaked probability distribution, so as to remove the correlation between source samples and reduce the zero-order entropy of the image [12].


Fig. 30.2 Working principle of DPCM system

Each spectral band of an HR multispectral remote sensing image corresponds to the same ground scene, so there is correlation between bands, but its strength varies. If a band that is strongly correlated with the current band is used for prediction, the prediction will be accurate; otherwise, the prediction effect will be poor. The decorrelation method here is predictive coding according to the DPCM principle: the output of the DPCM predictive encoder is the difference between the current actual value and the predicted value. The working principle of the DPCM system is shown in Fig. 30.2. DPCM uses the correlation of the source signal to find a difference value that reflects the characteristics of the signal changes and encodes it. According to the correlation principle, the amplitude range of this difference must be smaller than that of the original signal; therefore, for the same quantization error, the number of quantization levels can be reduced, that is, the coding rate can be compressed. Difference coding is generally realized by prediction. Prediction means that when part of a redundant (correlated) signal is known, the rest can be inferred and estimated; specifically, if the state of a signal before a certain time is known, its future value can be estimated. From information theory, the information entropy of the N-th symbol of a sequence { f_i }, i = 1, 2, ..., N, each symbol taking J values, satisfies:

log₂ J ≥ H(f_N | f_{N−1}) ≥ H(f_N | f_{N−1} f_{N−2}) ≥ ⋯    (30.4)

If some symbols f_L (L < N) before the symbol f_N are known and used to guess f_N, then the more that is known, the easier f_N is to guess. Being easy to guess means that the uncertainty of the source is reduced, that is, its information entropy is small. How well it can be guessed depends on the probability distribution and correlation of the source. This is the theoretical basis of predictive coding. Because the similarity between the spectral bands of a set of HR multispectral remote sensing images varies from band to band, it is not appropriate to use a single predictor for the whole image sequence. A simple and effective segmented DPCM method for removing inter-spectral redundancy is given below. The algorithm first uses DPCM to remove the inter-spectral redundancy of the HR multispectral remote sensing images and then uses WT to encode the residual image after the inter-spectral redundancy has been removed.


This algorithm can not only realize near-lossless compression of HR multispectral remote sensing images but also conveniently realize lossy compression at different accuracies. The basis for grouping the HR multispectral remote sensing image sequence is as follows; a similarity coefficient of two adjacent bands is defined as:

cor(n) = (1/(MN)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} x_{n,i,j} · x_{n−1,i,j},  n = 1, 2, ..., L − 1    (30.5)

In the algorithm, a threshold T_cor is set in advance. If cor(n) > T_cor, the images x_n and x_{n−1} are considered strongly similar, and x_{n−1} can be used to predict x_n; that is, x_n and x_{n−1} are placed in the same sequence segment. Otherwise they are placed in different sequence segments. The advantage of normalizing the image data in advance is that, after the image sequence is segmented, the predictor coefficients of each subsequence can be calculated directly; it is not necessary to gather statistics for each subsequence separately to calculate its own predictor coefficients, which greatly saves coding time.
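The sketch below illustrates the band grouping of (30.5) and the inter-band DPCM residual. The normalization assumption, the threshold value and the unit predictor are illustrative choices of this sketch; the chapter does not give them explicitly, and its own implementation was in C/C++.

```python
# Hedged sketch of the segmented DPCM idea: group adjacent bands by the
# similarity coefficient cor(n) of Eq. (30.5), then predict each band from its
# predecessor inside a segment and keep the residual.
import numpy as np

def similarity(prev_band: np.ndarray, band: np.ndarray) -> float:
    """cor(n): mean of the elementwise product of two (normalized) bands."""
    return float(np.mean(prev_band * band))

def segmented_dpcm(bands: np.ndarray, t_cor: float = 0.9):
    """bands: array of shape (L, M, N), assumed already normalized."""
    segments, residuals = [[0]], [bands[0].copy()]   # first band is coded as-is
    for n in range(1, len(bands)):
        if similarity(bands[n - 1], bands[n]) > t_cor:
            segments[-1].append(n)                   # same segment: predict from previous band
            residuals.append(bands[n] - bands[n - 1])
        else:
            segments.append([n])                     # weak similarity: start a new segment
            residuals.append(bands[n].copy())
    return segments, np.stack(residuals)
```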

30.3 Simulation Experiment Analysis

The experimental image is a frame from the data acquired by a practical modular multispectral imaging spectrometer during a flight. The image size is 2098 × 2098 pixels, and the gray level is quantized with 12 bits. All the programs in this experiment were written in C/C++ in the Visual C++ environment. The entropy of an image is a measure of its complexity. After the image is preprocessed and predicted, a prediction residual image is obtained. For the residual image, the more accurate the prediction (a large number of values near 0), the smaller the entropy of the image, which is more conducive to obtaining a large compression ratio. To verify the performance of our compression algorithm, we compare it with JPEG-LS, the best lossless compression standard for 2D images. Figure 30.3 shows the entropy values of the original image without entropy coding and of the residual image after prediction. As can be seen from Fig. 30.3, the entropy of the image is reduced after compression, and the lower the entropy of the residual image, the higher the compression ratio, so the entropy of the residual image should be reduced as much as possible. Although the JPEG-LS algorithm also reduces the entropy, its performance is not as good as that of this algorithm.
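The entropy referred to here is the zero-order entropy of the pixel (or residual) histogram, the kind of quantity plotted in Fig. 30.3; a minimal sketch of its computation is given below, with a toy horizontal predictor standing in for the actual prediction stage.

```python
# Hedged sketch: zero-order entropy (bits/pixel) of an image or residual image,
# computed from its value histogram.  The horizontal-difference predictor is a
# toy stand-in for the chapter's actual prediction stage.
import numpy as np

def zero_order_entropy(img: np.ndarray) -> float:
    values, counts = np.unique(img.astype(np.int64), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

original = np.random.randint(0, 4096, size=(256, 256))
residual = original - np.roll(original, 1, axis=1)
print(zero_order_entropy(original), zero_order_entropy(residual))
```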


Fig. 30.3 Entropy values of original image and compressed residual image

Table 30.1 shows the compression performance of the selected image after compression by this algorithm and by the JPEG-LS algorithm, and Fig. 30.4 is a statistical chart of the compression performance. The experimental results show that the JPEG-LS algorithm achieves a compression effect similar to that of this algorithm. However, in the JPEG-LS approach the computation of the autocorrelation covariance matrix of the images is huge and needs a large amount of storage, and the computation of eigenvalues and eigenvectors is also relatively expensive. The proposed algorithm avoids the problems of computing the basis vectors through covariance estimation and eigendecomposition, and its computation is effective while using only a few bits. Therefore, in terms of complexity, this algorithm is much lower than the JPEG-LS method, which shows that the algorithm in this paper can achieve the desired compression effect. In further work on the algorithm, only some representative image data could be selected as training samples to speed up convergence, so that the algorithm can better meet real-time requirements and obtain better compression performance.

Table 30.1 Compression performance

Algorithm   CR       PSNR (dB)   MSE       Encoding time (ms)
JPEG-LS     34.469   36.636      0.00944   2464.85
Ours        33.277   37.145      0.00943   2057.81


Fig. 30.4 Statistical chart of compression performance

30.4 Conclusions

HR multispectral remote sensing images are produced by multispectral remote sensing technology, such as the imaging spectrometers carried by meteorological satellites. Compared with ordinary images, an HR multispectral remote sensing image carries an additional spectral dimension, and its data volume is large, far exceeding the capacity of data transmission and storage equipment, so it must be compressed. In this paper, a simulation experiment of a DPCM compression system for HR multispectral remote sensing images is carried out, and a new compression algorithm using WT and piecewise DPCM hybrid coding is introduced. The simulation results show that the entropy of the image is reduced after compression with this algorithm. The algorithm avoids the problems of computing the basis vectors through covariance estimation and eigendecomposition, needs only a few bits, and is computationally effective, which shows that it can achieve the desired compression effect. With the development of information technology, the requirements on image compression technology are becoming higher and higher, so the work in this paper can contribute to the development of HR multispectral remote sensing image compression technology.


References

1. Z. Chi, Research on satellite remote sensing image fusion algorithm based on compression perception theory. J. Comput. Methods Sci. Eng. 21, 341–356 (2021)
2. C. Shi, L. Wang, J. Zhang, F. Miao, P. He, Remote sensing image compression based on direction lifting-based block transform with content-driven quadtree coding adaptively. Remote Sens. 10, 999 (2018)
3. Y. Zhang, H. Cao, H. Jiang, B. Li, Mutual information-based context template modeling for bitplane coding in remote sensing image compression. J. Appl. Remote Sens. 10, 025011 (2016)
4. X. Huang, G. Ye, H. Chai, O. Xie, Compression and encryption for remote sensing image using chaotic system. Secur. Commun. Netw. 8, 3659–3666 (2015)
5. C. Shi, L. Wang, H. Qu, Design of a remote sensing image hierarchical compression and transmission system for visualization. J. Harbin Eng. Univ. 39, 8 (2018)
6. N. Zhang, S. Feng, J. Pu, Near-lossless compression of multispectral remote sensing images with CCSDS dynamic rate control. Opt. Precis. Eng. 23, 8 (2015)
7. P. Wang, X. Chen, Y. Zhan, K. Xu, Low complexity lossy compression of multi-spectral remote sensing images based on block KLT. Infrared Technol. 40, 7 (2018)
8. L. Wang, S. Zhang, X. Li, X. Shao, Evaluation of stripe noise in multi-spectral remote sensing images by phase consistency. Infrared Laser Eng. 44, 3731 (2015)
9. K. Ding, X. Yang, S. Su, Y. Liu, Perceptual hash authentication algorithm for multispectral remote sensing images with band features. J. Image Graph. 23, 11 (2018)
10. Z. Wu, P. Liu, Y. Wu, A comprehensive method of multi-spectral remote sensing application requirements. J. Appl. Sci. 35, 658–666 (2017)
11. Q. Zhao, X. Liu, X. Zhao, Y. Li, Multispectral remote sensing image segmentation based on variable FCM algorithm. J. Electr. Inform. 40, 157–165 (2018)
12. J. Qi, Z. Ren, J. Zhao, J. Zhu, Comparison and analysis of two shallow water depth inversion models of multi-spectral remote sensing images. Oceanogr. Res. 38, 50–58 (2020)

Part III

Big Data Application in Robotics

Chapter 31

Construction of Music Classification and Detection Model Based on Big Data Analysis and Genetic Algorithm Lingjian Tang and Sanjun Yao

Abstract How to discover and recommend the most valuable songs based on users' actual needs is an urgent problem in the current music search and recommendation field. On this basis, this project uses a combination of big data analysis and a genetic algorithm to study a new music recognition method. Firstly, feature extraction is performed on the music signal to obtain feature vectors that comprehensively reflect the sound, rhythm, emotion and other information of the music; on this basis, the established neural network model is used to classify different tracks and obtain the segmentation probability of each track; finally, the test tracks are classified according to these probabilities and the test accuracy is calculated. Experiments show that the proposed algorithm has high recognition accuracy and can achieve unsupervised extraction of multidimensional information, laying the foundation for personalized music search.

Keywords Big data analysis · Genetic algorithm · Music classification · Detection model

31.1 Introduction

With the continuous improvement of people's living standards, people enjoy life in a variety of ways, among which listening to music has become an important pastime [1]. However, in the process of music collection, due to interference from adverse factors in the environment, music includes some information that is harmful to sound quality, collectively called noise [2]. Music can subtly influence people's hearts and cultivate people's sentiments. Beautiful and quiet music can soothe people's hearts, and lyrical music can express people's love for a happy life


and their longing for a better life in the future [3]. Therefore music, an ancient and time-honored art form, has accompanied almost the whole course of human development and has always occupied an indispensable position in people's lives [4]. With the development of information and multimedia technology, digital music is widely used in various media, such as radio broadcasting and digital storage (optical disc, network). As a result, efficiently retrieving and managing the music that users are interested in from a large amount of music has become a focus of research and development in recent years. To meet the needs of users, music information retrieval (MIR), a new research field in multimedia, has developed rapidly and received widespread attention [5]. Music classification plays a very important role in music signal retrieval, because many users may only be interested in specific types of music, and a music classification system can perform music retrieval based on their interests [6]. In addition, dividing music into different types facilitates efficient management and rapid retrieval, so in recent years music classification has received widespread attention and developed rapidly; how to improve its accuracy and efficiency has become the focus of research in this field [7].

Text-based music retrieval finds a matching music list from text entered by the user. The search content is mostly music tags composed of basic information such as singer names, song names, lyrics and album names. There are two ways to obtain music tags: audio mining and text mining. Audio mining predicts music labels, such as emotion or instrument type, from the audio content [8]. Under the same conditions, the proposed model is compared with other music classification and detection models, and the comparison results demonstrate its effectiveness and superiority. For music classification, deep learning networks can yield better classification results, and in recent years it has been studied whether combining big data analysis with other classifiers can raise classification accuracy to a new level. Based on the existing GMM-HMM network, the DNN-HMM network was proposed and applied to speech learning and training with good results. However, there are currently few algorithms combining big data analysis with other classifiers in the field of music classification, so this paper proposes an improved genetic algorithm to meet the needs of music classification.


31.2 Construction of Music Classification and Detection Model

31.2.1 Related Technologies and Implementation Methods of Music Retrieval

Music retrieval, as the name implies, refers to the retrieval of music data; generally speaking, it means finding the desired music data through a music search engine. Music retrieval is an important research topic in current research on multimedia search engines [9]. It includes three main aspects: the organization and management of music data resources; the understanding of the form of music queries; and the matching and ranking between music data resources and user queries. Music data can be represented in many forms, including music scores, audio, lyrics, tunes, categories, reviews, and so on. These data constitute the data set for music retrieval. Today's popular music retrieval algorithms are based on these attributes and content information, using the intrinsic or extrinsic genes of music to build a music retrieval model.

Music exists in every part of the world, even in the most remote tribes [10]. Music develops with the development of human society, but it is time-sensitive. In ancient times, due to the backwardness of science and technology and the limitations of human understanding, music could only be passed on by word of mouth, and mistakes inevitably crept in; over time, the original flavor of the music could not be maintained. Therefore, human ancestors invented various notation symbols for recording music and used them to form music scores, so that later generations could play music with the same or similar meaning as the original creator intended. There are many issues to be addressed in the research of content-based music retrieval systems, such as music segment segmentation, feature selection for music segments, retrieval and recognition of queried humming segments, similarity calculation between humming segments and music segments, recognition and filtering of noisy data, and accurate extraction of tones from polyphonic music signals, as shown in Fig. 31.1. Generally speaking, content-based music retrieval mainly extracts perception-related features, including loudness, pitch and related coefficients; a representative system is Jang's SuperMBox. In recent years, thanks to the development of data storage technology and the continuous improvement of computer data processing power, it has become possible to process high-dimensional audio digital signals effectively, and more and more researchers are committed to this. Low robustness to noise increases the error rate of existing music classification and detection methods, which makes it difficult to meet practical application requirements.


Fig. 31.1 Content based music retrieval processing flow

31.2.2 Construction of Music Classification Model

Feature extraction is a key part of music classification. Its main function is to extract feature vectors representing musical characteristics from the music, that is, to extract recognizable components from the music signal. Music signal features should represent music attributes as well as possible and be both invariant and discriminative. Tone refers to the human ear's subjective perception of the pitch of a sound, which is proportional to the frequency of the sound; timbre refers to people's perception of the quality of a sound. All frequencies in a musical signal except the lowest one are collectively referred to as overtones, and the combination of overtones creates a specific timbre. Rhythm is defined as the degree of regularity of the music, usually represented by the number of beats per unit time. Tone, timbre and rhythm are the human ear's subjective impressions of music signals, and they can be abstracted into feature vectors that represent them. Most of the abstracted timbre features are short-term features, while most of the abstracted tone and rhythm features are long-term features; in addition to being extracted directly, long-term features can also be composed of short-term features. The music classification system is mainly composed of three modules, that is, three stages: music signal preprocessing, feature extraction, and classification according to the extracted features. Preprocessing is a


necessary stage of music signal classification, and its main purpose is to facilitate feature extraction in the next stage. Feature extraction concerns how to accurately describe music signals with a set of parameters; the selection of music features determines the performance of the recognition system to some extent, and good features can improve the accuracy and speed of music signal classification. The classifier is the core of the whole classification system, and its performance directly determines the performance of the whole system. User reviews often contain richer and deeper information related to the music. If this information can be accurately mined, it can greatly enrich the types of music tags; however, due to the diversity of comments, it is difficult to filter out the noise completely and guarantee the accuracy of the obtained information, so the extracted information is used as fuzzy labels. The music classification method proposed in this article is mainly divided into the parts shown in Fig. 31.2. The purpose of dictionary learning is to obtain a dictionary suitable for word segmentation of the music corpus and to improve the accuracy of word segmentation. The basic information of the music, user comments, user tags and other information are used in building the dictionary, and the user tags need to be analyzed before

Fig. 31.2 Music classification model framework


being added to the dictionary. Tag extraction selects music-related tags from a large number of candidate keywords; these tags can be used as an index for music retrieval and can broaden the scope of retrieval.

31.3 Analysis of Music Classification and Detection Construction Based on Big Data Analysis and Genetic Algorithm

31.3.1 Model Simulation Based on Big Data Analysis and Genetic Algorithm

The process of adaptive multi-feature fusion of music is actually a screening process. When the background rhythm frequency of the music changes dramatically, the sound-effect characteristics change continuously, and the feature results obtained by a single multi-feature fusion are too limited to be used for classification. After adaptive multi-feature fusion, the characteristic sound effect of the music changes obviously, but as the music progresses the features also change, so the tracked melody must be updated in real time throughout the feature fusion process. However, when the audio repeats regularly, or when two features appear at the same time, wrong features may be fused, and with the continuous accumulation of such errors the audio feature fusion eventually fails. This paper selects short-term energy spectrum features for music classification and detection modeling. Let the sampling frequencies of the music be f_i; the spectral signal variance of the music is calculated as shown in Eq. (31.1):

sp = Σ_{i=1}^{N} (f_i − f̄_i)² p(f_i) / Σ_{n=1}^{N} p(f_i)    (31.1)

The short-term energy spectrum feature of the music is calculated as shown in Eq. (31.2):

ff = Σ_{i=1}^{N} (f_i − f̄_i)² p(f_i) / (sp³ Σ_{n=1}^{N} p(f_i))    (31.2)
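A minimal sketch of the two statistics in (31.1) and (31.2) is given below. The interpretation of p(f_i) as a normalized magnitude spectrum and of f̄ as the mean frequency are assumptions of this sketch rather than definitions given in the chapter.

```python
# Hedged sketch of the spectral-variance and short-term energy spectrum
# statistics of Eqs. (31.1) and (31.2).  Treating p(f_i) as a normalized
# magnitude spectrum is an assumption of this sketch.
import numpy as np

def spectral_features(frame: np.ndarray, sr: int):
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spectrum / (spectrum.sum() + 1e-12)                 # p(f_i)
    f_mean = np.sum(freqs * p)                              # assumed meaning of f-bar
    sp = np.sum((freqs - f_mean) ** 2 * p) / p.sum()        # Eq. (31.1)
    ff = np.sum((freqs - f_mean) ** 2 * p) / (sp ** 3 * p.sum() + 1e-12)  # Eq. (31.2)
    return sp, ff

frame = np.random.randn(512)
print(spectral_features(frame, sr=22050))
```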

The main goal of the improved genetic algorithm in this paper is to find optimal solutions to a wide range of problems while excluding human participation as much as possible. The overall goal is therefore to use the improved genetic algorithm to tune the weights and the architecture of the neural network at the same time with as little manual intervention as possible. It is very important to reduce manual


intervention, because the main purpose of this paper is to construct an algorithm that can automatically adjust every aspect of the neural network, so any dependence on the user's professional knowledge is considered unacceptable. Unlike other genetic algorithms that require the user to set the crossover weight in advance, the improved genetic algorithm only sets the value of W to 0.5 at the beginning and then adjusts W in each training round. In the first round, the direction of change of W (increase or decrease) is selected randomly. If the crossover offspring with the highest fitness found in this generation is better than that of the previous generation, the search region is evidently suitable, so W continues to change in the same direction. Conversely, if the fitness of the best member of the new generation is not as high as that of the previous generation, W is changed in the other direction to return to a more suitable region. Generally, the energy of the high-frequency part of a music signal is low, so it is necessary to enhance the high-frequency part of the signal, which is known as pre-emphasis. Pre-emphasis is implemented by filtering, with the mathematical expression:

y(n) = x(n) − μx(n − 1)

(31.3)

Here y(n) is the output signal after pre-emphasis, x(n) is the input signal, and μ is a pre-emphasis factor close to 1. Short-term energy spectrum features are extracted from the noise-free music signal and preprocessed as shown in Eq. (31.4):

ff_i′ = (max − ff_i) / (max − min)    (31.4)

where max and min are the maximum and minimum values of the short-term energy spectrum features, respectively. At present, emotional semantics has been widely applied to image processing in multimedia technology, but its study in music is still largely blank. First, through the analysis and processing of text semantics, a multi-dimensional semantic domain is constructed to express the emotional color of the text. Songs and texts are mapped into the same high-dimensional emotional semantic space, the similarity between the two is measured and evaluated, and effective processing of non-descriptive text is achieved. Although this adds another parameter that the user must set, it is much easier to understand than the crossover weight W described above; its value depends entirely on whether the user prefers speed, convenience or overall accuracy. This is a compromise solution, and as in all computational problems, the new parameter allows users to adapt the algorithm to their particular needs. For this study the value a = 0.8 is used, and a comparison of results for other values is not considered for the time being.
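A minimal sketch of the pre-emphasis filter of (31.3) and the min–max preprocessing of (31.4) is given below; the value μ = 0.97 follows the experiment described later in this chapter, while the frame length is an illustrative choice.

```python
# Hedged sketch of the pre-emphasis filter y(n) = x(n) - mu*x(n-1) of
# Eq. (31.3) and the min-max preprocessing of Eq. (31.4).
import numpy as np

def pre_emphasis(x: np.ndarray, mu: float = 0.97) -> np.ndarray:
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - mu * x[:-1]
    return y

def minmax_preprocess(ff: np.ndarray) -> np.ndarray:
    """Map each short-term energy spectrum feature as in Eq. (31.4)."""
    return (ff.max() - ff) / (ff.max() - ff.min() + 1e-12)

signal = np.random.randn(22050)          # one second of audio at 22,050 Hz
emphasized = pre_emphasis(signal)
features = minmax_preprocess(np.abs(emphasized[:512]))
```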


The fitness function set by this method still pays more attention to the accuracy of the network, but it also considers the network size, to prevent the genetic algorithm from simply selecting a fully connected large network unless absolutely necessary.
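The chapter does not give the fitness function or the W-update rule in closed form; the sketch below is one plausible reading of the description above, with the trade-off parameter reused as a = 0.8 and the step size chosen purely for illustration.

```python
# Hedged sketch of the two GA ingredients described above: a fitness that
# rewards accuracy but penalizes network size, and the self-adjusting crossover
# weight W that starts at 0.5.  The exact formulas are assumptions of this sketch.
import random

def fitness(accuracy: float, n_connections: int, max_connections: int, a: float = 0.8) -> float:
    size_penalty = n_connections / max_connections
    return a * accuracy - (1.0 - a) * size_penalty

def update_crossover_weight(w: float, best_now: float, best_prev: float,
                            direction: int, step: float = 0.05):
    """Keep moving W in the same direction while fitness improves, else reverse."""
    if best_now <= best_prev:
        direction = -direction
    w = min(1.0, max(0.0, w + direction * step))
    return w, direction

w, direction = 0.5, random.choice([-1, 1])   # initial W and a random first direction
w, direction = update_crossover_weight(w, best_now=0.81, best_prev=0.78, direction=direction)
```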

31.3.2 Analysis of Experimental Results

In order to test the effect of music classification and detection in complex noise environments, two reference models are chosen on the same test platform: a music classification and detection model based on a BP neural network, which does not use denoising, and a music classification and detection model based on KNN, which uses denoising but builds the classifier with KNN. The performance of each model is analyzed in terms of music classification and detection accuracy and detection efficiency in complex noise environments. A piece of music in GTZAN is 30 s long, the sampling rate is 22,050 Hz, and the number of sampling bits is 16. In this paper, with a frame length of N_f = 512, a frame shift of N_0 = 256, a frame duration of 512/22,050 × 1000 ≈ 23 ms and a frame-shift ratio of 0.5, each piece of music can theoretically be divided into about 30/23 × 1000 × 2 ≈ 2608 frames; in the actual simulation, 2584 frames are obtained. The pre-emphasis factor μ in the pre-emphasis stage is 0.97. Taking a rock track as an example, the first-dimensional MPC parameters over 15 s and the 12-dimensional MPC parameters over 100 frames are shown in Fig. 31.3.

Fig. 31.3 MPC coefficient of the first dimension of music signal


Fig. 31.4 The first dimensional MPC coefficients of 10 types of music signals

From Fig. 31.3 it can be seen that the MPC curve is not smooth and has many abrupt changes, indicating that the MPC features extracted from different frames vary greatly and can therefore reflect the attributes of the music well; this is an advantage of MPC features. Figure 31.4 shows a comparison of the first-dimensional MPC coefficients of 10 types of music signals; to make the comparison clear, a segment of equal length is taken. As can be seen from Fig. 31.4, the magnitudes and trends of the MPC features of the 10 kinds of music signals are completely different, so MPC features can characterize and distinguish the 10 kinds of music signals well. The music classification and detection time of this model is less than that of the big data technique, which shows that by eliminating noise its interference with music classification and detection is well suppressed and the detection efficiency is improved.

31.4 Conclusion

In this situation, traditional song retrieval methods based on feature-word matching can no longer meet users' requirements. Classifying and detecting music is a crucial technology in music search, but existing music classification and detection models are affected by noise and cannot achieve satisfactory recognition results, leading to a high recognition error rate. On this basis, a music classification and detection method that takes background noise into account is proposed, together with a new music classification method that combines multiple features. The experimental results demonstrate that this method can effectively identify different types of music and achieves good results. By


simulating the classification of specific categories of music, it can be found that the classification accuracy is consistently low for certain categories. Future work can start from these categories and find ways to classify them correctly, thereby improving the accuracy of the entire music classification system.

References

1. Y. Xu, Q. Li, Music classification and detection of location factors of feature words in complex noise environment. Complexity 34(8), 11 (2021)
2. T. Gong, Deep belief network-based multifeature fusion music classification algorithm and simulation. Complexity 36(10), 22 (2021)
3. D.U. Feifei, An electronic music intelligent classification model based on weight reasonable distribution. Mod. Electr. Tech. 19(11), 37 (2018)
4. H. Chen, J. Chen, Recurrent neural networks based wireless network intrusion detection and classification model construction and optimization. J. Electr. Inform. Technol. 50(8), 38 (2019)
5. T. Liu, Electronic music classification model based on multi-feature fusion and neural network. Mod. Electr. Tech. 47(12), 12 (2018)
6. S.S. Pandey, R. Mishra, P. Ramesh et al., Automatic music classification with genetic algorithm. Curr. Sci. 11(3), 21 (2019)
7. A. Ad, B. Da, C. Ea et al., A new technique for ECG signal classification: genetic algorithm wavelet kernel extreme learning machine. Optik 18(4), 16 (2019)
8. A. Mendes, M. Coelho, R.F. Neto, A music classification model based on metric learning applied to MP3 audio files. Exp. Syst. Appl. 50(8), 33 (2019)
9. Y. Sun, B. Xue, M. Zhang et al., Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans. Cybern. 15(1), 29 (2020)
10. X. Xia, J. Yan, Construction of music teaching evaluation model based on weighted naive Bayes. Sci. Program. 55(6), 33 (2021)

Chapter 32

Simulation of Electronic Music Signal Identification Model Based on Big Data Algorithm Sanjun Yao and Hongru Ji

Abstract Music is the art of expressing human emotions with sound; it can convey people's inner emotional changes and complex feelings. Driven by big data, this article proposes an electronic music signal recognition model based on the FCM algorithm. The results show that the algorithm detects electronic music signals with high accuracy and intelligence and with a short detection time. Because anyone can synthesize electronic music, the amount of electronic music signal data is exploding, which makes it difficult for people to choose their favorite electronic music. Electronic music signal detection can help people choose electronic music quickly and accurately, so it has very important research significance. The big data algorithm can greatly improve the recognition effect and speed for electronic music signals and supports online recognition of electronic music signals.

Keywords Big data · Music signal · Identification model

32.1 Introduction

With the continuous maturation of computer technology, signal processing capabilities have improved and have been successfully applied in the field of music generation, resulting in a large quantity of electronic music [1]. With the rapid development of music signal coding and storage technology, digital music, digital broadcasting and multimedia have become widespread, presenting a huge market prospect [2]. The goal of using computer technology to study music is to use computers to simulate all human musical behaviors, such as music creation, music recognition and music retrieval [3]. Because most people like listening to music,


there are many kinds of electronic music, and everyone likes different types [4]. If the types of electronic music signals are classified and identified in advance, listeners can choose the electronic music they want to listen to from the signal labels [5]. Music is understood differently from different angles: some studies treat it as an abstract art form, while in synesthesia research hearing and vision are considered very close in their underlying nature and psychological characteristics, so the two art forms are often combined to achieve better results [5]. At present, because of its own characteristics, research on music classification requires not only research on classifiers but also research on feature selection methods in this field [6]. Because anyone can synthesize electronic music, the amount of electronic music signal data is exploding, which makes it difficult for people to choose their favorite electronic music. Music classification also has its own characteristics and specific requirements for feature extraction, feature selection and classifier performance [7]. Electronic music signal detection can help people choose electronic music quickly and accurately, so it has very important research significance [8]. In this article, driven by big data, an electronic music signal recognition model based on the FCM algorithm is proposed; the music signal is denoised in order to extract the notes accurately, and the model's superiority and universality are verified.

32.2 Music Signal Feature Recognition

The timbre feature is an electronic music factor with short-term characteristics. In order to extract it accurately, the number of signal frames must be determined: the initial audio signal is divided into frames of equal length, with a certain overlap between adjacent frames, and each frame is windowed to make the signal smoother [9]. To obtain an accurate and concise description of the audio, the audio must be analyzed and classification-relevant information extracted. The features of electronic music should be extracted according to the specific application; electronic music is characterized by timbre, fundamental frequency and rhythm, so the timbre and rhythm features are extracted here. The framework and flow of the music signal recognition system are shown in Fig. 32.1. The music signal is mixed with noise during acquisition, and in order to extract the notes accurately this noise must be removed, which requires denoising the music signal [10]. Among these interferences, the 50 Hz power-frequency interference is the most obvious [11]. To filter this noise, amplitude accumulation can be used: taking 20 ms as an accumulation unit and accumulating the amplitude values not only removes noise but also reduces the amount of data and increases the calculation speed. In the music signal recognition problem, P(W|X) is difficult to calculate directly, and it can be converted into the following form using the Bayes formula:


Fig. 32.1 Framework and stage of music signal recognition system

Ŵ = argmax_W P(X|W) P(W) / P(X) = argmax_W P(X|W) P(W)    (32.1)

In the formula, P(X ) represents the prior probability of the feature vector. P(X |W ) represents the probability of using a given word sequence to generate the corresponding acoustic feature vector. P(W ) is the prior probability of the word sequence. The transfer function H (z) of the established mathematical model can be expressed as: H (z) = U (z)V (z)R(z)

(32.2)

Here U(z) represents the excitation signal: when the music signal is voiced, U(z) is the z-transform of a triangular pulse train, and when the music signal is unvoiced, U(z) is the z-transform of random noise. V(z) is the transfer function of the channel, and R(z) is a first-order high-pass filter. The channel's transfer function V(z) is an all-pole model:

V(z) = 1 / (Σ_{i=0}^{p} a_i z^{−i})    (32.3)


In the formula, p is the model order, typically in the range 8–12, and each pair of poles corresponds to a resonance peak; a_i are the parameters of the channel model.
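A minimal sketch of the front-end steps described in this section, framing with overlap, windowing, and 20 ms amplitude accumulation for denoising, is given below; the Hamming window and the specific frame sizes are assumptions of the sketch.

```python
# Hedged sketch of the front end described above: split the signal into
# overlapping frames, apply a window, and accumulate amplitudes over 20 ms
# units to suppress noise and reduce the data volume.
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])

def amplitude_accumulation(x: np.ndarray, sr: int = 22050, unit_ms: int = 20) -> np.ndarray:
    unit = int(sr * unit_ms / 1000)
    n_units = len(x) // unit
    return np.abs(x[:n_units * unit]).reshape(n_units, unit).sum(axis=1)

signal = np.random.randn(22050)
frames = frame_signal(signal)
envelope = amplitude_accumulation(signal)
print(frames.shape, envelope.shape)
```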

32.3 Result Analysis and Discussion

The analysis and extraction of music features play a vital role in automatic music recognition: the soundness of the extracted objects, the feasibility of the extraction methods and the accuracy of the extracted results directly affect the recognition performance [12]. Firstly, the music is divided into several bars according to the pattern of strong and weak notes, and the feature information of the bars is extracted; then the music is divided into several segments according to the similarity of adjacent bars, and the feature information of the segments is extracted. The processing time of music signal recognition with different methods is compared in Fig. 32.2. As can be seen from the figure, the processing time of the DBN algorithm increases with the amount of music feature information and is long overall; compared with the DBN algorithm, music signal recognition based on the improved FCM has an obvious advantage in processing time. To extract the melody features of music and form a searchable melody outline, the format of the music file used as the data source for melody extraction must first be determined. Different electronic music signals have different feature vectors, and the identification model of electronic music signals is mainly used to describe the mapping between feature vectors and electronic music signal types. The convergence index is used to compare the maximum-clique structure mining results of FCM and DBN, and the convergence comparison of the two methods is shown in Fig. 32.3.

Fig. 32.2 Time-consuming identification of music signals by different methods


Fig. 32.3 Comparison results of convergence

The results show that the improved FCM obtains more reasonable, feasible and scientific music signal feature identification and classification results than the DBN algorithm, and that using FCM to extract electronic music signals gives good optimization behavior and rapid convergence. The classification of music differs across training periods; rock music, which has the highest classification accuracy in this article, is selected as the experimental object to test whether the correct classification rate is affected as the training period increases. When searching, the pitch features can first be searched by string matching, and hit results are then obtained by exact matching and fuzzy matching according to the specific retrieval requirements. The similarity and correlation of the quantified melody features of the hits, note duration and intensity, are then calculated, and the hits are sorted by these results and returned to the user. The key here is an efficient fuzzy matching algorithm, and at the same time the effective threshold and weight of each feature must be determined reasonably, because these parameters differ for different retrieval input modes; for the melody outline obtained from humming, for example, the weight of pitch should be greater than that of note duration. Because the collection environment of electronic music signals is complex, usually indoors, the obtained signals contain a certain amount of noise due to echo, signal refraction, people walking around and the skill of the technicians, and this noise interferes with the identification results. Figure 32.4 shows the errors of different algorithms on the training set, and Fig. 32.5 shows the errors on the test set.


Fig. 32.4 Error of different algorithms on training set

Fig. 32.5 Error of different algorithms on test set

Compared with high-level statistical features, the electronic music signal classification model based on low-level features achieves higher accuracy. Music is always composed of strong and weak beats; this alternation is not arranged randomly, but a bar is organized according to a certain pattern of strength and is then repeated on this basis. The strength pattern of a piece of music is fixed at the stage of writing the score: once the score of a piece is created, its strength pattern is fixed. With the improvement made in this article, the FCM parameters converge faster and the final model classification accuracy is higher. This method obtains an ideal electronic music signal feature recognition result, and its recognition accuracy is higher than that of other music signal recognition methods; the accuracy of music signal identification with different methods is shown in Fig. 32.6.


Fig. 32.6 Accuracy of music signal identification using different methods

The algorithm in this article makes reasonable use of the natural sparsity of audio signals, so that the enhancement effect is not affected by the added noise and has good robustness; the accuracy of music signal identification with this method is over 95%. The algorithm also has the advantages of avoiding the over-fitting or under-fitting of a fixed-structure neural network, using the error information of the samples to improve learning accuracy during the adjustment process, and producing a smaller network structure.
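The chapter builds on FCM but does not spell out the underlying update rules; for orientation, a minimal sketch of standard fuzzy c-means is given below. The adaptive local-search operator of the improved FCM mentioned in the conclusion is not specified in the text and is not reproduced here.

```python
# Hedged sketch of standard fuzzy c-means (FCM): alternate membership and
# center updates until the memberships stop changing.  The improved FCM used
# in the chapter adds an adaptive local-search operator not reproduced here.
import numpy as np

def fcm(X: np.ndarray, c: int = 3, m: float = 2.0, iters: int = 100, tol: float = 1e-5):
    n = len(X)
    U = np.random.dirichlet(np.ones(c), size=n)          # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U_new = 1.0 / (dist ** (2 / (m - 1)) *
                       np.sum(dist ** (-2 / (m - 1)), axis=1, keepdims=True))
        if np.linalg.norm(U_new - U) < tol:
            U = U_new
            break
        U = U_new
    return centers, U

features = np.random.randn(200, 12)                      # e.g. 12-dimensional MPC-like features
centers, memberships = fcm(features, c=4)
```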

32.4 Conclusion

Because most people like listening to music, there are many kinds of electronic music, and everyone likes different types. If the types of electronic music signals are classified and identified in advance, listeners can choose the electronic music they want to listen to from the signal tags. If pattern classification is introduced in full, it will inevitably affect the classification effect. Driven by big data, this article proposes an electronic music signal recognition model based on the FCM algorithm and verifies its superiority and universality. The algorithm makes reasonable use of the sparsity of natural audio signals, so the enhancement effect is unaffected by added noise and has good robustness, and the accuracy of music signal recognition with this method is over 95%. An adaptive operator is added to the local search strategy of the improved FCM algorithm, which narrows the local search scope as the algorithm iterates and thus makes the local search more targeted. Compared with high-level statistical features,


the classification model of electronic music signals based on low-level features can obtain higher accuracy. Automatic music classification methods based on fractal dimension are still too few, and the visual design of music themes still requires an inconvenient conversion between the two music libraries; these issues are the focus of future research.

References

1. N. Reljin, D. Pokrajac, Music performers classification by using multifractal features: a case study. Arch. Acoust. 42(2), 223–233 (2017)
2. A. Rosner, B. Kostek, Automatic music genre classification based on musical instrument track separation. J. Intell. Inform. Syst. 2(3), 1–22 (2017)
3. Y. Costa, L.S. Oliveira, C.N. Silla, An evaluation of convolutional neural networks for music classification using spectrograms. Appl. Soft Comput. 52, 28–38 (2017)
4. B. Sun, Using machine learning algorithm to describe the connection between the types and characteristics of music signal. Complexity 2021, 1–10 (2021)
5. A. Dorochowicz, B. Kostek, A study of music features derived from audio recording examples: a quantitative analysis. Arch. Acoust. 43(3), 505–516 (2018)
6. S.J. Lunde, P. Vuust, E.A. Garza-Villarreal et al., Music's objective classification improves quality of music-induced analgesia studies: reply. Pain 160(6) (2019)
7. J.S. Martin-Saavedra, S. Saade-Lemus, Music's objective classification improves quality of music-induced analgesia studies. Pain 160(6), 1482–1483 (2019)
8. J. Zhang, Music feature extraction and classification algorithm based on deep learning. Sci. Program. 2021(2), 1–9 (2021)
9. K. Zhang, Music style classification algorithm based on music feature extraction and deep neural network. Wireless Commun. Mobile Comput. 2021(4), 1–7 (2021)
10. C. Chen, Q. Li, A multimodal music emotion classification method based on multifeature combined network classifier. Math. Probl. Eng. 2020, 1–11 (2020)
11. K. Jathal, Real-time timbre classification for tabletop hand drumming. Comput. Music J. 41(2), 38–51 (2017)
12. S. Wang, Robust audio content classification using hybrid-based SMD and entropy-based VAD. Entropy 22(2), 183 (2020)

Chapter 33

Design of Interactive Teaching Music Intelligent System Based on AI and Big Data Analysis Chun Liu and Shanshan Li

Abstract In current music teaching activities, artificial intelligence (AI) and big data analysis technology are increasingly needed to assist teaching, so as to cultivate students' independent inquiry and research abilities. This article discusses the related technologies of artificial intelligence and big data analysis. On this basis, taking music as an example and combining information technology with theoretical ideas, an interactive teaching music intelligent system is constructed with the RBF algorithm and discussed in detail. The results show that the interactive music intelligent teaching system constructed in this article plays a good auxiliary role in students' learning of musical knowledge and provides a better educational means. It overcomes some shortcomings of traditional music education, improves the quality and efficiency of music education to a certain extent, and has practical value and positive significance for research on music education.

Keywords Artificial intelligence · Music · Interactive teaching · Teaching intelligent system

33.1 Introduction

Since the birth of AI, countries have increased their investment in it. As a new discipline, AI mainly researches, develops and extends the methods and theories of human intelligence [1]. At present, the growth of AI mainly focuses on big data


research, cross-media co-processing, man–machine integration, intelligent system integration and autonomous intelligent systems [2]. The trend toward dedicated hardware platforms is more and more obvious, and the growth of AI has entered a new stage [3]. Contemporary music education is gradually developing towards intelligent and online forms. At the same time, because AI is closely related to hearing, it has unique advantages in music education [4]. The personalized guidance of students realized by AI and big data technology gives reciprocal teaching broad application prospects in higher education [5]. The rapid growth of AI provides brand-new ideas for education reform; in particular, network-based learning constructs a model of teacher–student interaction with the support of information technology, which transcends the limitations of time and space and enriches instructional activities as a whole [6]. Li points out that more and more college students are immersed in all kinds of online information and ignore classroom knowledge, and that students' interest in courses or teaching methods directly affects the quality of school teaching [7]. Liu found that an interactive teaching music intelligent system provides better technical support and learning concepts for music learners [8]. Li believes that students can become the main body of teaching and that mutual learning can stimulate their enthusiasm and awareness in learning music [9]. In the modern environment, advanced instructional methods must be introduced so that instructional reform can adapt to it. Based on the above, this article discusses the related technologies of AI and big data analysis; on this basis, taking the music discipline as an example, it combines information technology with theoretical ideas and adopts the RBF algorithm to build a reciprocal teaching music intelligent system.

33.2 Methodology 33.2.1 Relevant Theoretical and Technical Basis The growth of music art is also accompanied by the breakthrough of material civilization. Under the background of unprecedented prosperity of art and culture, people’s demand for music learning has also greatly increased [10]. AI represents the forefront of information technology to a certain extent, and the organic combination of AI and computer-aided teaching will greatly improve the teaching level. At present, with the continuous improvement of educational information infrastructure, various tools, resources and computer systems are distributed in all aspects of educational activities, and the channels for data collection are increasing, and the amount of educational data is increasing. A large amount of data generated by education and management makes it an educational resource for large-scale data. It distinguishes large-scale educational data from traditional educational data. AI and big data have brought new opportunities for the growth of online education, making online education gradually move towards personalized education.


Advanced technology not only brings many benefits, but also affects how students learn in the classroom [11]. Introducing AI and big data analysis into the traditional music education system helps resolve the choice of instructional content and methods, meets students' individualized needs, and improves teaching quality and learning efficiency. AI simply does not work without data: data and AI are like bullets and weapons, and data is the basis on which AI operates. As a network-based auxiliary teaching mode, the music education system uses the various resources of the platform and adopts a separated, modular structure.

33.2.2 Construction of the Reciprocal Teaching Music Intelligent System

Intelligent learning systems have long relied on individualized learning modes, which offer many advantages in students' learning and individualized guidance. In the reciprocal teaching music intelligent system constructed in this article, students can select, judge and process a large amount of knowledge with the help of intelligent computers, so that the learning content is more targeted and the music learning effect improves. Although some personality characteristics are shared, students differ greatly in learning foundation, learning habits and learning preferences. Following the standard of a common learner model, and in order to reflect students' learning characteristics more truly and objectively, a learner model can be constructed from four aspects: students' basic information, learning preference, cognitive level and learning behavior. The domain module is an important part of the reciprocal teaching music intelligent system. It stores the curriculum knowledge, can automatically generate questions, and can judge students' answers and provide the solution process. The domain module usually covers two aspects: the knowledge in the textbook, and the knowledge needed to apply the textbook theories to answer questions. A neural network (NN) is a computational model with a large number of nodes that can classify and generalize to data it was not trained on; it is trained with a gradient-descent mechanism. The specific structure is shown in Fig. 33.1. The system can be realized with a neural network in TensorFlow: tensors represent the data, a computational graph builds the NN, a session executes the node operations of the graph, and the connection weights are optimized to obtain the model. The second layer is the membership function, whose mathematical expression is:

$$\mu_{ij}(x_i) = \exp\!\left(-\frac{(x_i - c_{ij})^2}{\sigma_j^2}\right), \quad i = 1, 2, \ldots, r; \; j = 1, 2, \ldots, u \qquad (33.1)$$


Fig. 33.1 Structure diagram of RBF network

The third layer describes the fuzzy rules. The mathematical calculation of the output of the j-th rule is as follows:

$$\phi_j = \exp\!\left(-\sum_{i=1}^{r}\frac{(x_i - c_{ij})^2}{\sigma_j^2}\right) = \exp\!\left(-\frac{\|X - C_j\|^2}{\sigma_j^2}\right), \quad j = 1, 2, \ldots, u \qquad (33.2)$$

where $C_j = \left(c_{1j}, c_{2j}, c_{3j}, \ldots, c_{rj}\right)$ represents the center of the j-th RBF unit. The output is mainly based on the TS fuzzy model in the RBF algorithm and is given by:

$$y(x) = \frac{\sum_{i=1}^{u}\exp\!\left(-\frac{\|x - c_i\|^2}{\sigma_i^2}\right)\left(a_{i0} + a_{i1}x_1 + \cdots + a_{ir}x_r\right)}{\sum_{i=1}^{u}\exp\!\left(-\frac{\|x - c_i\|^2}{\sigma_i^2}\right)} \qquad (33.3)$$
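To make the forward computation in Eqs. (33.1)-(33.3) concrete, the following is a minimal NumPy sketch of the RBF/TS-fuzzy output; it is an illustrative reconstruction rather than the authors' TensorFlow implementation, and the layer sizes and parameter values are placeholders:

```python
import numpy as np

def rbf_ts_output(x, centers, sigmas, A):
    """TS-fuzzy RBF output following Eqs. (33.1)-(33.3).

    x       : input vector, shape (r,)
    centers : RBF centers c_j, shape (u, r)
    sigmas  : widths sigma_j, shape (u,)
    A       : consequent coefficients (a_j0, a_j1, ..., a_jr), shape (u, r + 1)
    """
    # Eq. (33.2): rule activations phi_j = exp(-||x - c_j||^2 / sigma_j^2)
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / sigmas ** 2)
    # Rule consequents a_j0 + a_j1*x_1 + ... + a_jr*x_r
    consequents = A[:, 0] + A[:, 1:] @ x
    # Eq. (33.3): normalized weighted sum of the consequents
    return float(np.sum(phi * consequents) / np.sum(phi))

# Toy usage: r = 3 inputs, u = 4 rules, random placeholder parameters
rng = np.random.default_rng(0)
x = rng.normal(size=3)
centers = rng.normal(size=(4, 3))
sigmas = np.ones(4)
A = rng.normal(size=(4, 4))
print(rbf_ts_output(x, centers, sigmas, A))
```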

ω_k denotes the connection corresponding to the k-th rule. The optimizer iteratively minimizes the objective function. The functional diagram of the interactive music education intelligent system is shown in Fig. 33.2. Any kind of course is inseparable from the guidance of educational ideas and the support of teaching design, and the same is true for online courses. Traditional


Fig. 33.2 Functional diagram of interactive music education intelligent system

subjects are now taught with the support of network technology, and the learning objectives are determined accordingly. In course construction, the content design of online courses is completed under the guidance of educational ideas and a teaching design model. In the reciprocal teaching music intelligent system, the instructional content is divided into three levels: unit level, concept level and entity level, which express the relationships between instructional contents more flexibly. The concept level refers to the general framework of basic principles within the instructional content, also called knowledge points; knowledge points make the connections between instructional contents explicit. The system links knowledge points with topics, which can be divided into single-knowledge-point topics of the same difficulty, same-knowledge-point topics of different difficulty, and topics covering multiple knowledge points. From whether students answer problems correctly, the model estimates their mastery of each knowledge point, which in turn allows the relationships between knowledge points to be analyzed and a real, effective knowledge map to be built inside the reciprocal teaching music intelligent system, as sketched below. The interface module provides learners with a convenient operating environment in visual and auditory form and handles the information exchange between learners and the system. For a teaching system, the interface design affects not only the operability of the system; more importantly, the input and output modes strongly affect how comprehensible the presented questions are.
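The chapter does not give a concrete schema for this hierarchy, but as a rough illustration the three content levels, the tagging of topics with knowledge points and difficulty, and a simple mastery update could be represented as follows (the update rule is an assumption for illustration only):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class KnowledgePoint:            # concept level: a basic principle (knowledge point)
    name: str
    unit: str                    # the unit-level chapter it belongs to

@dataclass
class Topic:                     # entity level: a concrete exercise item
    text: str
    knowledge_points: List[KnowledgePoint]
    difficulty: int              # e.g. 1 (easy) to 5 (hard)

@dataclass
class LearnerRecord:
    student_id: str
    mastery: Dict[str, float] = field(default_factory=dict)   # knowledge point -> mastery

    def update(self, topic: Topic, correct: bool, lr: float = 0.3) -> None:
        """Illustrative mastery update after the student answers one topic."""
        for kp in topic.knowledge_points:
            current = self.mastery.get(kp.name, 0.5)
            target = 1.0 if correct else 0.0
            self.mastery[kp.name] = current + lr * (target - current)

# Toy usage
kp = KnowledgePoint(name="interval recognition", unit="basic music theory")
topic = Topic(text="Identify the interval C-E", knowledge_points=[kp], difficulty=2)
student = LearnerRecord(student_id="s001")
student.update(topic, correct=True)
print(student.mastery)
```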


Teachers can use the system platform to publish homework requirements, grade and comment on the homework students submit, and summarize the final results; they can also use questionnaires to better guide students through the music courses and improve their musical literacy. The intelligent interface is the medium through which the system communicates and exchanges information with students, so the information and knowledge it conveys should be familiar and applicable to them. At present, the man–machine interface module mainly combines language and graphics, which students can easily understand and which makes communication more convenient.

33.3 Result Analysis and Discussion

Given the important role NN plays in AI and its achievements in the field in recent years, this article introduces NN technology into the teaching system and explores its application prospects in big data analysis and in the reciprocal teaching music intelligent system. Based on AI and big data technologies, a reciprocal teaching music intelligent system is built, and this section carries out an experimental analysis to verify the robustness and computing efficiency of the platform. In the experiments, the data are presented as matrices, and the weights are usually multi-dimensional matrices; a locally optimal solution of the weights must therefore be found and the required data further conditioned. The cross-entropy loss function is well suited to evaluating the difference between the actual output of the current training step and the expected output; the N-dimensional output is normalized into a probability vector, and the component with the largest probability is taken as the predicted value needed by the system. This article compares the errors of the FCA algorithm, the K-means algorithm and the proposed algorithm. After repeated experiments, the MAE (mean absolute error) results are compared in Fig. 33.3; under the same conditions, the proposed algorithm clearly reduces the MAE. In the reciprocal teaching music intelligent system, each type of user has its own permissions. The system administrator has the highest authority, is responsible for the operation of the whole teaching system and the design of the teaching environment, and can operate on all courses offered in the system. The accuracy of the FCA algorithm, the K-means algorithm and the proposed algorithm on the test set is shown in Fig. 33.4. The system platform constructed in this article is simple to use yet powerful: whatever configuration the administrator performs takes only 20–30 min, after which a music teacher can manage the course, communicate with students and comment on their homework. While reducing the burden on teachers, this also maximally enriches the instructional


Fig. 33.3 MAE test results of different algorithms

Fig. 33.4 Accuracy results of different algorithms on test sets


Fig. 33.5 Stability test results of the system

content. In the research and design of the reciprocal teaching music intelligent system, this article also considers its usability for school teachers and for staff of the academic affairs office. Teachers can check the classroom situation in real time and adjust their content or instructional methods to find approaches that better match students' reality and raise their interest; academic affairs administrators can evaluate teachers on the basis of the parameters reported by the system, reducing the bias in classroom evaluation caused by experts' pre-announced class visits. The system is oriented to the teachers and students of the whole school. After testing, the stability of the system is shown in Fig. 33.5. The tests in this section show that the MAE of the proposed algorithm is about 4.26, the system stability reaches 95.957%, and the accuracy of the algorithm is at a high level, so it can provide solid technical support for the interactive music education intelligent system.
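For reference, the MAE reported above and the cross-entropy/arg-max step described earlier can be computed as in the following sketch (the numeric values are illustrative, not the experimental data):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, as compared in Fig. 33.3."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def cross_entropy(probs, label):
    """Cross-entropy loss for a single N-dimensional probability vector."""
    return float(-np.log(probs[label] + 1e-12))

probs = np.array([0.1, 0.7, 0.2])        # softmax output of the network (illustrative)
print(int(np.argmax(probs)))             # predicted class = largest-probability component
print(cross_entropy(probs, label=1))
print(mae([3.0, 4.5, 5.0], [3.5, 4.0, 4.8]))
```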

33.4 Conclusions

With the support of big data, personalization and other technologies, the application of AI in education has diversified alongside the groundbreaking growth of major AI technologies. AI mainly relies on NN


to play its role. Given the important role NN plays in AI and its achievements in the field in recent years, this article introduces NN technology into the reciprocal teaching music intelligent system to serve teaching. Taking music as an example and addressing the limitations of traditional music education, the article combines information technology with theoretical ideas and uses the RBF algorithm to build a reciprocal teaching music intelligent system that meets the development needs of music courses and provides a certain guarantee for the development and application of online courses. The results show that the reciprocal teaching music intelligent system constructed in this article plays a good auxiliary role in students' learning of music knowledge. It overcomes some shortcomings of traditional music education, makes "teaching interaction behavior" possible, and helps diversify teaching evaluation and education in universities.

References

1. X. Du, Application of deep learning and artificial intelligence algorithm in multimedia music education. J. Intell. Fuzzy Syst. 38(2), 1–11 (2020)
2. G. Iliaki, A. Velentzas, E. Michailidi et al., Exploring the music: a teaching-learning sequence about sound in authentic settings. Res. Sci. Technol. Educ. 37(2), 218–238 (2019)
3. R. White, Authentic learning in senior secondary music pedagogy: an examination of teaching practice in high-achieving school music programmes. Br. J. Music Educ. 38(2), 1–13 (2020)
4. S.J. Gibson, Shifting from offline to online collaborative music-making, teaching and learning: perceptions of Ethno artistic mentors. Music. Educ. Res. 23(2), 1–16 (2021)
5. L. Zheng, Y. Zhu, H. Yu, Ideological and political theory teaching model based on artificial intelligence and improved machine learning algorithms. J. Intell. Fuzzy Syst. 2021(1), 1–10 (2021)
6. J. Qian, Research on artificial intelligence technology of virtual reality teaching method in digital media art creation. J. Int. Technol. 2022(1), 23 (2022)
7. M. Li, An immersive context teaching method for college English based on artificial intelligence and machine learning in virtual reality technology. Mob. Inf. Syst. 2021(2), 1–7 (2021)
8. T. Liu, Z. Gao, H. Guan, Educational information system optimization for artificial intelligence teaching strategies. Complexity 2021(3), 1–13 (2021)
9. Z. Li, H. Wang, The effectiveness of physical education teaching in college based on artificial intelligence methods. J. Intell. Fuzzy Syst. 40(4), 1–11 (2020)
10. B. Yi, D. Mandal, English teaching practice based on artificial intelligence technology. J. Intell. Fuzzy Syst. 37(1), 1–11 (2019)
11. D. Feng, Research on the cultivation of students' comprehensive ability in aerobics teaching based on network big data platform. Rev. Fac. Ing. 32(9), 310–316 (2017)

Chapter 34

Research on Music Database Construction Combining Big Data Analysis and Machine Learning Algorithm Sanjun Yao and Yipeng Li

Abstract With the growing popularity of music databases, traditional classification based on text information such as track title and artist can no longer meet users' needs. Driven by big data, this article applies a machine learning algorithm to the construction of a music database and proposes an intelligent music recommendation method based on the FCM (fuzzy c-means) algorithm. The method uses FCM to classify songs and intelligently recommend similar songs to users. The results show that the recall and precision of the proposed FCM-based method reach 94.98% and 95.76%, respectively, indicating that it is reliable, realizes a dynamic classification of songs, and replaces the traditional mode in which similar websites recommend songs to users through leaderboards. This article aims to promote the construction of digital music databases and make the analysis and retrieval of massive music resources fast and accurate.

Keywords Big data · Fuzzy c-means algorithm · Machine learning · Music database

34.1 Introduction

The development of the Internet has made the creation and dissemination of information ever more convenient, and information grows faster and faster; people have entered an era of information overload [1]. Faced with massive amounts of information, how to help users find what interests them and filter out irrelevant information is an important topic worthy of study [2]. At the same time, with the growth of the Internet and the arrival of big data, the traditional music industry has been


greatly impacted, and the digitization of original music has brought great convenience to people's lives. There are more and more music websites, online song resources are increasingly abundant, and listening to songs on demand over the network has become an important part of people's entertainment [3]. Making full use of existing resources to establish a distinctive digital music database is therefore a sound approach [4]. With the development of Internet technology, people can obtain ever more data and information. To a certain extent this brings convenience to life and work, but it also challenges data providers and consumers, and users' demands keep growing [5]. Online music services now have a huge number of users, a wide range of uses and a high frequency of use, and they show sustained, rapid growth [6]. In this environment, establishing a new digital music database is accompanied by both challenges and opportunities, arising from within and from market supervision. As the scale of personalized recommendation systems increases and the number of users and items grows sharply, the real-time and accuracy requirements of music recommendation have become increasingly difficult to meet [7]. Music websites usually present popular songs to users in the form of charts [8]. In a music database that integrates big data analysis and machine learning algorithms, the retrieval of music information should satisfy both network retrieval and user needs and should support reading guidance, browsing and search [6]. The retrieved information differs greatly in structure, content and scale and can be integrated into the Internet for playback, display and publication. Research on digital music metadata and database construction is highly specialized, with much cross-institutional and cross-regional cooperation [9]. The organization, retrieval, application, transmission and sharing of digital music projects are closely combined. In earlier music database construction, the information resources developed were relatively simple, lacking depth and breadth, and the value of many resources was not fully exploited [10]. The basic idea of fuzzy set theory is to extend the membership relationship of classical sets so that the degree to which an element belongs to a set is no longer restricted to the two values 0 and 1 but can take any value in the unit interval [0, 1], thus quantitatively representing fuzzy objects. In this article, a machine learning algorithm is applied to the construction of a big-data-driven music database, and an intelligent music recommendation method based on the FCM algorithm is proposed; the method uses FCM to classify songs and intelligently recommend similar songs to users.


34.2 Methodology

34.2.1 Big Data Analysis and Machine Learning Algorithms

In China, digital music database websites are a new industry, and it is necessary to understand and tap this market [11]. The music database is a field full of vitality and potential: online music services have a large number of users, a wide range of uses and a high frequency of use, and they show sustained, rapid growth. Developing music information resources and building a music database means making a reasonable selection, scientific integration and deep processing of music information resources, on the basis of extensive collection and according to the nature and tasks of the database itself. These resources can then be continuously turned into network and retrieval resources, facilitating users' communication, absorption and utilization, while supporting in-depth research into new ideas and viewpoints in the art field. The music database in this article adopts a B/S architecture. Applications based on the B/S architecture need no special download or configuration; a browser on the computer is enough. The browser client handles view display and the application server handles transaction processing, which greatly reduces the burden on the client and keeps it from becoming overly complicated. The system network topology is shown in Fig. 34.1. Clustering merges scattered data according to similarity, whereas classification disperses data into predefined classes. The most common clustering approach uses Euclidean distance as the similarity measure: points are assigned to the nearest cluster center, new centers are computed from the points in each class, and the points are clustered again until the centers no longer change, at which point clustering is complete [12]. The goal of FCM is to group objects according to their degree of difference, with the various attribute indicators of the objects as the basis for grouping. In this article songs are clustered; songs have many attributes, which can be annotated from shallow to deep. The FCM-based recommendation divides users into several clusters according to their usage habits and ratings, and the basic idea is to generate recommendations for a target user from the evaluations of target items by neighboring users. Clustering attributes can be classified as categorical variables, ordinal variables and binary variables: categorical variables take qualitative values that represent mutually exclusive categories or attributes, ordinal variables take values that vary within a certain ordered range, and a binary variable takes only the two values 0 and 1. The music intelligent recommendation method based on the FCM algorithm first standardizes the data in the user-item matrix. Second, FCM is used to build a similarity matrix of items according to their attribute characteristics. Then, the fuzzy equivalence relation matrix of the items is obtained by calculating


Fig. 34.1 System network topology diagram

the transitive closure of the similarity matrix. Next, the fuzzy classification of the items is obtained by setting thresholds that indicate the degree to which items belong to each fuzzy cluster. Finally, the fuzzy classification is used to obtain groups of items similar to those the user is interested in, so as to realize the recommendation.
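The following is a compact sketch of the standard FCM iteration that underlies this pipeline (a generic implementation of the usual fuzzy c-means updates; the feature matrix and parameters are illustrative, not the paper's data set):

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means: returns membership matrix U (n x c) and centers (c x d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U_new = 1.0 / np.sum((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1)), axis=2)
        if np.linalg.norm(U_new - U) < tol:
            U = U_new
            break
        U = U_new
    return U, centers

# Toy song-feature matrix (rows = songs, columns = normalized attribute scores in [0, 1])
X = np.random.default_rng(1).random((50, 4))
U, centers = fcm(X, c=3)
print(U.argmax(axis=1)[:10])    # hard cluster assignment of the first ten songs
```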

34.2.2 Construction of the Music Database and Recommendation System

In this article, the back end of the music database recommendation system is developed with the Django framework. Django is an open-source Web framework favored by many Web developers; it is written in Python, is one of Python's representative frameworks, and is widely used. The framework is powerful and integrates many modules, such as an ORM, which greatly eases project development. Because different users may like different types of music in the same emotional state, and the same user may like different types of music in different emotional states, in order to associate users, emotions


and music types, that is, to mine the music types that a user likes in different emotional states, it is necessary to build, for each user, the association between emotions and music types by combining the user's Weibo social media posts with their listening records. Different users have different preferences among musical emotions: some prefer something fresh and warm, while others prefer something slightly sad. The user's preference therefore has to be calculated for each musical emotion and, likewise, for each musical style. This article mainly discusses the recommendation list for registered users. TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. The term frequency is calculated as:

$$\mathrm{TF} = \frac{\text{number of times the word appears in the text}}{\text{total number of words in the text}} \qquad (34.1)$$

To calculate the inverse document frequency, a corpus is needed to model the language environment:

$$\mathrm{IDF} = \log\!\left(\frac{\text{total number of documents in the corpus}}{\text{number of documents containing the word} + 1}\right) \qquad (34.2)$$

The TF-IDF weight is then:

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF} \qquad (34.3)$$
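Equations (34.1)-(34.3) can be computed directly, for example as in this small sketch (the toy corpus is purely illustrative):

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one document, following Eqs. (34.1)-(34.3)."""
    counts = Counter(doc_tokens)
    tf = counts[term] / len(doc_tokens)                       # Eq. (34.1)
    n_containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / (n_containing + 1))          # Eq. (34.2)
    return tf * idf                                           # Eq. (34.3)

corpus = [["sad", "piano", "ballad"], ["happy", "dance", "pop"], ["sad", "strings"]]
print(tf_idf("sad", corpus[0], corpus))
```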

The hardware structure framework of the music database recommendation system is shown in Fig. 34.2. According to the attributes and relations of the music ontology, and combined with the axiom set of the domain ontology, predicate logic is used in this article to design the axioms of the music ontology, mainly covering functional, inverse-functional, transitive, symmetric and reflexive attributes; functionality means that the same argument maps to a unique value, and inverse functionality is the opposite. In this algorithm, the songs in a cluster that the user plays most often are selected and added to the candidate set for ranking, which produces the recommended songs. An increase in empty clusters may therefore increase the number of songs that need to be ranked, while songs assigned to empty-set cluster centers are unlikely to be recommended, so the number of cluster centers K should be chosen with a large Q value and a small empty-set rate. Compared with computing user similarity over the scores of the whole item space, the computation of this algorithm is carried out on a relatively small and concentrated data set, which preserves the integrity of the data while greatly reducing the amount of computation. The proposed method defines the popularity of songs using the users' tags, as shown in the following formula:


Fig. 34.2 Frame diagram of recommended hardware structure of music database

$$p(f, i) = \frac{|N(i) \cap U(f)|}{|N(i)|} \qquad (34.4)$$

where N(i) denotes the set of users who like song i and U(f) denotes the set of users who have used tag f. From this formula, the popular songs under each tag are computed, and when a new user enters the system these popular songs are recommended, which effectively solves the cold-start problem. The traditional CF (collaborative filtering) algorithm needs to find the K nearest neighbors of each song, which emphasizes what songs have in common and ignores the characteristics of each individual song. This article improves on that shortcoming: highly recommendable songs are clustered into one category, so the nearest neighbors of a song are the other songs in the same cluster. The algorithm can also measure the similarity between users from the similarity between their music preference concepts, so even if users share no commonly preferred music, or the overlap is very small, neighbors can still be found through the preference concepts. In addition, traditional CF has no advantage on sparse data, and its drawback is that it cannot discover users' latent interests: the recommended items are only similar to previous ones and lack diversity. In this article, the attribute indexes of the clustering objects are determined first, then the data matrix is built according to the data model and the


difference degree of the matrix is analyzed, so that the distances between objects can be computed conveniently. After the difference analysis, the matrix is transformed into a fuzzy similarity matrix and a dynamic clustering diagram is obtained, dividing the songs into several categories for intelligent recommendation to users. This method can solve the problems of cold start and sparse data.
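A minimal sketch of the tag-based cold-start strategy built on Eq. (34.4) might look like the following (the data structures and the scoring of a song by its best-matching tag are illustrative assumptions, not the paper's exact procedure):

```python
def tag_popularity(song, tag, likes_by_song, users_by_tag):
    """Eq. (34.4): p(f, i) = |N(i) ∩ U(f)| / |N(i)|."""
    n_i = likes_by_song.get(song, set())
    u_f = users_by_tag.get(tag, set())
    return len(n_i & u_f) / len(n_i) if n_i else 0.0

def cold_start_recommend(selected_tags, likes_by_song, users_by_tag, top_k=10):
    """Recommend popular songs under the tags a new user picked at registration."""
    scores = {
        song: max(tag_popularity(song, t, likes_by_song, users_by_tag) for t in selected_tags)
        for song in likes_by_song
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data
likes_by_song = {"song_a": {"u1", "u2", "u3"}, "song_b": {"u2"}}
users_by_tag = {"sad": {"u1", "u3"}, "pop": {"u2"}}
print(cold_start_recommend(["sad"], likes_by_song, users_by_tag, top_k=2))
```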

34.3 Result Analysis and Discussion

An intelligent song recommendation system helps users find music more easily, and intelligent recommendation has clear advantages over traditional music charts. To verify the feasibility of the proposed method, this section validates the algorithm through experimental comparison. The experiment maps the classification information of music onto the constructed model and then infers each user's personal music preference from their complete listening behavior. Accordingly, part of the existing classification information and the users' recorded listening behavior is taken from the first data source, and part of the song lists with user annotations on music is extracted from the second data source as research data. To recommend music to users more accurately, several important parameters must be chosen according to the specific data set and application environment; this part mainly analyzes how the length of the recommendation list and the number of time slices affect the recommendation. The songs in this article have different attribute indexes, such as sadness and rhythm: after manual scoring the values are labeled and then normalized to the range between 0 and 1. When a new user joins the system there is a cold-start problem because no user data exists, so the new user is asked to select at least three interest tags, and the system recommends popular songs close to the selected tags. First, recall and precision were selected as indexes and several experiments were carried out. Figure 34.3 shows the precision results of several algorithms, and Fig. 34.4 compares their recall. The recall and precision of the proposed algorithm reach 94.98% and 95.76%, respectively, which shows that the method is reliable. With the FCM-based algorithm, the time required for real-time recommendation is significantly less than with the traditional recommendation algorithm, especially when the number of clusters is relatively large, where the efficiency differs by several times; when the number of clusters is large, each category contains few items and the results can be computed quickly, so the FCM-based algorithm greatly improves recommendation efficiency. The RMSE value compares the predicted user preference with the actual user preference: it uses the deviation between the predicted value of the user's preference


Fig. 34.3 Comparison results of precision of the algorithm

Fig. 34.4 Comparison of recall of the algorithm


Fig. 34.5 RMSE comparison results of several algorithms

and the real preference value to measure the accuracy of the algorithm's predictions. Figure 34.5 shows the RMSE comparison of several algorithms: the smaller the RMSE, the smaller the prediction error, the more accurate the algorithm and the better the recommendation, and it can be concluded that the proposed method predicts more accurately. In this section, three indexes were selected: recall, precision and algorithm error, and several experiments were conducted for each. The experimental results show that the error of the proposed FCM algorithm is low, the recall reaches 94.98% and the precision reaches 95.76%, which shows that the method is reliable. The method analyzes the categories in a user's existing song library and then performs seed screening, which brings in more songs from the same cluster to recommend to the user.
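The three indexes used in this section can be computed as follows (generic definitions applied to illustrative arrays, not the experimental data):

```python
import numpy as np

def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant set."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)

def rmse(y_true, y_pred):
    """Root mean square error between predicted and real preference values."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

print(precision_recall(["s1", "s2", "s3"], ["s1", "s3", "s4"]))
print(rmse([4.0, 3.5, 5.0], [3.8, 3.9, 4.6]))
```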

34.4 Conclusions

Massive song resources bring great convenience to music lovers but also pose new problems and challenges. There are now many kinds of digital music databases, differing in who builds them, how they are searched and whom they serve. The structure of a music database and the design of its retrieval methods are the key to developing music information network resources, so it is necessary to develop characteristic music databases and design novel retrieval methods to promote the growth of music


education. Only by constantly strengthening the development and construction of music information network resources can we adapt to the changing demands of networked and international music information. In this article, a new music database is constructed by combining big data analysis and a machine learning algorithm, and a music intelligent recommendation method based on the FCM algorithm is proposed; the music database system adopts a B/S architecture. Finally, three indexes are selected: recall, precision and algorithm error, and several experiments are carried out for each. The results show that the error of the FCM algorithm is low, the recall reaches 94.98% and the precision reaches 95.76%, which shows that the proposed method is feasible and reliable. Establishing a digital music database with digital collection and multi-dimensional retrieval functions opens the way to digitizing music cultural resources and plays a long-term role in the research, development and sharing of music culture.

References

1. Z. Zali, M. Ohrnberger, F. Scherbaum et al., Volcanic tremor extraction and earthquake detection using music information retrieval algorithms. Seismol. Res. Lett. 2021(6), 92 (2021)
2. S.T. Kim, J.H. Oh, Music intelligence: granular data and prediction of top ten hit songs. Decis. Support Syst. 145(2), 113535 (2021)
3. A. Dm, A. Xl, A. Qd et al., Humming-query and reinforcement-learning based modeling approach for personalized music recommendation. Proc. Comput. Sci. 176, 2154–2163 (2020)
4. M. Srinivasa, S.G. Koolagudi, Content-based music information retrieval (CB-MIR) and its applications toward the music industry: a review. ACM Comput. Surv. (CSUR) 51(3), 1–46 (2018)
5. M. Mueller, A. Arzt, S. Balke et al., Cross-modal music retrieval and applications: an overview of key methodologies. IEEE Sig. Process. Mag. 36(1), 52–62 (2018)
6. C.Y. Wang, Y.C. Wang, S. Chou, A context and emotion aware system for personalized music recommendation. J. Int. Technol. 19(3), 765–779 (2018)
7. A. Xambó, A. Lerch, J. Freeman, Music information retrieval in live coding: a theoretical framework. Comput. Music. J. 42(4), 9–25 (2019)
8. D. Sun, Using factor decomposition machine learning method to music recommendation. Complexity 2021(3), 1–10 (2021)
9. M. Schedl, C. Bauer, An analysis of global and regional mainstreaminess for personalized music recommender systems. J. Mob. Multimed. 14(1), 95–112 (2018)
10. J. Shi, Music recommendation algorithm based on multidimensional time-series model analysis. Complexity 2021(1), 1–11 (2021)
11. X. Du, Application of deep learning and artificial intelligence algorithm in multimedia music teaching. J. Intell. Fuzzy Syst. 38(2), 1–11 (2020)
12. Y. Huang, W.J. Huang, X.L. Xiang et al., An empirical study of personalized advertising recommendation based on DBSCAN clustering of Sina Weibo user-generated content. Proc. Comput. Sci. 183(8), 303–310 (2021)

Chapter 35

Construction and Optimization Design of Pop Music Database Based on Big Data Technology Chunqiu Wang and Xucheng Geng

Abstract With the rapid development of pop music, the digitalization of music has greatly promoted the innovation and sustainable development of traditional music. In the big data environment, music communication presents many new features, and the mode of communication has shifted from a single form to diverse forms. This paper studies the construction and optimization of a pop music database based on big data technology: the database is built on a B/S architecture, and for the query optimization problem of the database system an improved GA (genetic algorithm) is adopted. The simulation results show that the response times of the different database systems differ little, that the improved GA responds faster than the traditional GA, and that its advantage becomes more obvious as the number of iterations increases. The improved GA therefore performs considerably better than the traditional GA and can improve the query performance of the pop music database.

Keywords Big data · Pop music · Database


35.1 Introduction

The advent of the era of big data has greatly changed human society and brought all-round innovation to our lives, including to communication technologies and models. The music industry, which depends heavily on communication technology, is among the first fields to be deeply affected by the communication technology revolution of the big data era. Database technology has penetrated all walks of life, and the digitalization of music has gradually become the mainstream development trend [1, 2]. Large numbers of music resources need to be searched, classified, understood and analyzed, which drives the development and application of music information retrieval technology and attracts more and more researchers from various fields [3, 4]. The rapid development of pop music has promoted the innovation and sustainable development of traditional music to a great extent, but it has also brought a sharp increase in the quantity, and unevenness in the quality, of pop music resources, making the management of digital music resources the focus of music digitalization [5]. With the progress of computer and information technology, a series of technical studies and projects on music database systems have been carried out at home and abroad since the 1990s, and many music universities in China are actively involved in building such systems. International research on music resources has made considerable progress, but domestic research still lags behind, and work on information standardization, digital copyright protection, and the integration, sharing and service of music resources is not yet deep enough [6, 7]. Big data is an important strategic resource serving scientific and technological innovation in the knowledge age and an important premise and foundation of scientific and technological information services; it is characterized by large volume, many data types, fast processing speed and low value density [8]. The development of pop music has moved from simple uniformity to today's personalized recommendation: users obtain content related to their own interests from system recommendations and also discover content they are likely to enjoy through sharing with other users. The construction of a pop music database will integrate the information resources of pop music works, transform fragmented paper and audio-visual resources into a new generation of information resources, and serve network users accurately. For the query optimization problem of the pop music database system, this paper adopts an improved GA (genetic algorithm).


35.2 Research Method

35.2.1 Overall Design of the Pop Music Database

Music is not only a work of art but also a cultural consumer product that is widely disseminated and consumed by audiences at different levels; as a special form of commodity it has a huge market demand. Digitization alone, however, is not datafication: to datafy an object is to put it in a quantified form that can be tabulated and analyzed. In the era of big data, merely digitizing music content is far from sufficient for effective music dissemination; a form that is more convenient to manage and compute, namely datafication, is needed. With the spread of personal mobile devices, the way pop music is communicated must change according to the requirements of the audience: relevant databases are established, different music resources are sorted through data analysis, and music art is disseminated scientifically and effectively. Audiences can obtain the information they like through simple retrieval, receiving convenient, targeted and efficient services, and the rich, complete content gives them an all-round, integrated audio-visual experience. Building a pop music database will expand the audience and the communication space of pop music culture: culture develops through innovation, innovates through development and is promoted through communication. The traditional music search mode is text-based, and its defects are obvious; searching by audio content with big data technology can effectively compensate for the inherent shortcomings of text search, since retrieval by audio content analyzes the audio itself and accurately identifies and indexes the melody, which requires considerable technical sophistication [9]. Against the background of big data, music communication has many new features, and the mode of communication has changed from single to diversified; in a huge, open virtual world, all kinds of roles may change. Digitizing music works with a knowledge ontology is an effective method, but a cloud storage model alone is not enough for effective digital music dissemination: a more open platform and mechanism is needed that allows all participants in the communication chain to legally share their works, products and services under the protection of copyright law. The construction of the pop music database integrates music resources and realizes their digitalization and informatization. All kinds of paper, audio and video resources are digitized and unified to standardize the preprocessing of database resources. Once the database is built, pop music resources can be stored on a cloud server and searched and used at any time; traditional paper scores, CDs, cassettes and fragmented Internet materials are transformed into a systematic, digital, easy-to-find and easy-to-maintain cloud database, which saves costs, preserves the material permanently and facilitates the continuous expansion and updating of the data resources.


Fig. 35.1 Hierarchical framework of pop music database system

The pop music database adopts the current mainstream B/S architecture and is developed with advanced technologies such as website clustering, data collection and reporting, statistical data analysis, business intelligence, and data exchange and sharing [10]. To ensure good scalability and maintainability, the system follows a layered design and is divided into four levels: the data application layer, the data storage layer, the data processing layer and the data access layer. The system is implemented in Java with Tomcat as the server. Figure 35.1 shows the hierarchical structure of the pop music database. On this basis, an object-oriented distributed storage management method is adopted to realize the storage management. The data persistence layer captures and processes the data and returns the results to the presentation layer. The data layer is the bottom layer of the whole architecture and mainly handles data processing for the system, operating the database and the Lucene index; using the Lucene index, data can be obtained quickly, which greatly improves the query efficiency of the database and speeds up retrieval. To meet the overall functional and performance requirements, the pop music database management system is designed, developed and deployed with centralized data storage, unified management and a uniform Web-based service for information submission and access. The whole site uses dynamic technology, and the homepage of websites at all levels uses


Fig. 35.2 System network architecture

a cache. The whole system is built on the idea of a website cluster: first, the website cluster platform is organized according to the classification of pop music. Figure 35.2 shows the system network architecture. The system mainly includes a data acquisition server, a database server group, a Web application server group, system security equipment, network equipment, a data backup system and a portal system. Each server cluster is strictly divided into service roles, and deploying server clusters improves the overall service performance, availability and scalability. The database server provides reliable, high-performance data storage and access through a dual-machine hot-standby configuration.

35.2.2 Database Query Optimization

With the development of modern computer network technology, databases are becoming increasingly networked. The objects that the pop music database operates on and processes are mainly large amounts of data scattered across different locations, which must be stored and queried, so improving the efficiency of data queries is a key issue in the study of the pop music database [11]. Although the query optimization technology of the pop music database is quite complex, the demand for resource sharing


by units, organizations and individuals distributed in different geographical locations has promoted its development, and this application demand has made it a hot research topic. It is therefore necessary to study the query optimization of the pop music database, which is of great significance for improving work efficiency and solving practical problems. For music to transcend the limitations of time and space and be disseminated, it must be recorded and processed, which requires recording studios, performance organizations and data processors; these have played a decisive role in the popularization of digital music. In addition, cloud computing providers must offer public or private digital music cloud services, including storage, search, uploading, sharing and even trading, and in this process consumers of digital music have become the most important link in the whole communication chain. GA has unique advantages in solving NP-type problems, multi-dimensional objective functions and nonlinear optimization problems, and given these properties of pop music database query optimization it is particularly effective to apply GA to such problems. In the traditional GA, however, the crossover and mutation operators remain fixed throughout the execution of the algorithm, and setting the mutation operator too small is unfavorable for generating new individuals. To address these shortcomings, this paper introduces GA together with artificial immune theory into database optimization, exploiting the good search characteristics of GA and the ability of immune theory to find optimal solutions efficiently; the query efficiency of the database is optimized through genetic coding, genetic operators and the construction of immune vaccines. The implementation of the improved GA is almost the same as that of the traditional GA and goes through the following stages: designing the coding scheme, formulating the selection strategy, choosing the genetic operators, designing the cost function, designing the fitness function and determining the stopping condition [12]. In pop music database queries, a binary tree is generally used to represent the query execution plan, and the join relationships of the query are reflected in the shape of the tree. For complicated query execution plans a binary coding scheme does not work well, so the binary tree is traversed in postorder, with 0 used to mark non-leaf nodes and the table numbers involved in the query attached to the leaf nodes. The pop music database is deployed in a computer network environment; network models generally fall into two categories, the point-to-point model and the broadcast model. To estimate the communication cost roughly, two assumptions are made in advance: first, the attribute values of a relation are evenly distributed across its tuples; second, the values of different attributes of a relation are assigned independently of one another. For the width W_T of a relational table T, the following holds:

$$W_T = \sum_{i=1}^{n} w_i \qquad (35.1)$$

359

where n represents the number of attributes in the relational table T , and T represents the i-th attribute in the relational table T . In GA, fitness value is an important reference factor to determine the quality of chromosomes in the population. In the pop music database system, the cost of a query plan includes the cost of network transmission and the cost of query, and the fitness function should be considered together. Because the query plan in this paper corresponds to a nonlinear model, and GA is not good at dealing with nonlinear models. In order to solve this problem, this paper adds a penalty factor to the query plan of the connection tree, and the new fitness formula is as follows: Fitness(i) = punish + cos t( j )

(35.2)

here cos t( j ) represents the query cost of the subtree with j node as the root node in this chromosome. In each iteration of this paper, we need to rely on GA to get the optimal solution of the current number of times, but the above actions are repeated in the next generation population, so the concept of immune vaccine is constructed. The probability p(x, j) of antibody j in the subset is: F(xi , j ) k=1 F(x i , j )

p(x, j ) = s

(35.3)

The database set X is evenly divided into S sets, and each set corresponds to a gene block. When the i-th gene block is selected, a random number c is generated in [0, 1], and the j-th gene value in the gene block i is: k(i ) =

i −1+c s

(35.4)

35.3 Simulation Experiment In this paper, the simulation experiment is carried out on a computer. The configuration hardware environment of this computer: processor: Pentium (R) dual-core 2.60 GHz; Memory: 2.0 GB; Hard disk: 120G; Software environment: operating system: Windows 7; Simulation platform: MATLAB. The population size is 30, the largest genetic algebra is 60, and then the crossover rate is 0.8, and the mutation rate is 0.1. In order to compare the differences of different database systems, experiments were carried out in three popular music database clusters. The experimental data is imitation pop music data. When the network topology is generated in advance, the communication time on all paths is set by random method, and the communication time on each node is also

360

C. Wang and X. Geng

Fig. 35.3 Convergence performance of the algorithm

set effectively in milliseconds. For the case of network topology with 50 nodes, the convergence performance of the algorithm is shown in Fig. 35.3. It can be seen that the convergence speed of the algorithm is good at first, and the multi-ant colony algorithm gets the optimal solution in the 26th generation after a short search stagnation in the 6th, 12th and 20th generations. In this experiment, the crossover probability and mutation probability are set to 0.7 and 0.1 for the traditional GA, and (0.8, 0.5, 0.3) and (0.2, 0.15, 0.1) for the improved GA. The population size is set to 90, and the number of iterations is set to (100, 200, 300, 400, 500) respectively. Then run the algorithms respectively, and the results in three databases are shown in Fig. 35.4. The experimental results show that the response time of different database systems is not much different, and the response time of improved GA is faster than that of traditional GA, and the efficiency of improved GA is more obvious with the increase of iteration times. The experimental results show that the improved GA has greatly improved the performance compared with the traditional GA, and can improve the query effect of pop music database.

35 Construction and Optimization Design of Pop Music Database Based …

361

Fig. 35.4 Comparison of experimental performance between two algorithms

35.4 Conclusion As a music industry that is highly dependent on communication technology, it is one of the first areas that are deeply influenced by the communication technology revolution in the era of big data. The application of database technology has penetrated into all walks of life, and the digitalization of music has gradually become the mainstream development trend. Based on big data technology, this paper adopts B/S architecture to establish pop music database. This system mainly includes data acquisition server, database server group, WEB application server group, system security equipment, network equipment, data backup system and portal system. Aiming at the query optimization problem of pop music database system, this paper adopts improved GA to optimize it. The simulation results show that the improved GA has greatly improved the performance compared with the traditional GA, and can improve the query effect of pop music database.

References 1. J. Liu, Construction and update of national basic geographic information database. Bull. Surv. Mapp. 10, 1–3 (2015) 2. X. Zhao, J.H. Tianshenghu, Investigation and analysis of characteristic database construction of public libraries in Yunnan. Libr. Sci. Res. 7, 5 (2015) 3. Q. Zhou, X. Li, Design of China music database system based on recommendation technology. Comput. Technol. Dev. 25(7), 4 (2015)

362

C. Wang and X. Geng

4. J. Liu, Design of China music database system based on project collaborative filtering algorithm. Microcomput. Appl. 2, 4 (2019) 5. H. Wang, L. Liu, W. Ji et al., Construction of Mongolian music resource database. J. Shaanxi Normal Univ. Nat. Sci. Ed. 049(005), 109–116 (2021) 6. X. Shao, Q. Yu, Music cross-modal retrieval based on canonical correlation. Comput. Technol. Dev. 25(7), 6 (2015) 7. W.A. Borgy, S. Bala, T. David, Research in mobile database query optimization and processing. Mob. Inf. Syst. 1(4), 225–252 (2015) 8. E. Azhir, N.J. Navimipour, M. Hosseinzadeh et al., Query optimization mechanisms in the cloud environments: a systematic study. Int. J. Commun. Syst. 32(8), e3940 (2019) 9. J. Li, X. Xia, X. Liu et al., Probabilistic group nearest neighbor query optimization based on classification using ELM. Neurocomputing 277(14), 21–28 (2017) 10. M. Joshi, P.R. Srivastava, Query optimization. Int. J. Intell. Inf. Technol. 9(1), 40–55 (2015) 11. J.F. Naughton, Technical perspective: broadening and deepening query optimization yet still making progress. SIGMOD Rec. 45(1), 23–23 (2016) 12. S. Madden, A. Cheung et al., Sloth: being lazy is a virtue (when issuing database queries). ACM Trans. Database Syst. 41(2), 8.1 (2016)

Chapter 36

The Application of Big Data in the Construction of Modern Vocal Education Score Database Hongru Ji and Sanjun Yao

Abstract Modern vocal music is not only different from traditional vocal music in China, but also different from “Bel Canto” in the west. It is a new vocal music form developed under the specific cultural background of our country, mainly by inheriting our own traditions and drawing lessons from the excellent achievements of Bel Canto, which is absolutely different from western Bel Canto and is accepted and loved by China people to distinguish traditional vocal music. For the vocal music works of today’s era, the commonly used literature music score is the vocal accompaniment music score, which plays a very key role in the vocal pedagogy in colleges and universities. China has entered the “Internet era”. In the Internet environment, Big data (BD) technology is the key to build database. The massive data generated on the Internet requires scientific and reasonable collection and sorting, and there should be sufficient storage space, so as to query and modify the data, timely and accurate tracking. It is particularly important to strengthen the construction of music score library in vocal music teaching, especially in the BD environment where information technology is becoming more and more mature. Keywords Big data · Modern vocal music · Teaching music score · Database construction

36.1 Introduction

From the original 1G network to the current 5G network, the development process has taken only about ten years, which shows how much importance China attaches to the construction and development of information technology [1]. Music score, as an important form of recording music, brings music to life on the page through the comprehensive application of various forms of music notation [2]. The types and


forms of music scores have become more diverse, and scores are increasingly published in the form of music collections [3]. The construction of the China Vocal Music Database (hereinafter the "Database") will integrate the information resources of China's vocal music works, transform fragmented paper and audio-visual resources into a new generation of information resources, serve vocal pedagogy and scientific research in music colleges precisely, promote the prosperity and development of Chinese vocal music, and accelerate the integration and innovative development of information technology with vocal music education and teaching [4]. The application of these new network-based teaching technologies and products has gradually produced a variety of new modes and methods for both teaching and learning [5]. Focusing on the problems and difficulties in current music education in normal universities, this paper considers how current Internet technology can be used to build a teaching repertoire database for vocal pedagogy in normal universities [6]. "Internet + education" means that all content related to teaching is combined with the Internet [7]. The research goal of this article is to construct a modern vocal music teaching score database. Section 36.2 introduces the structure and construction method of the score database; Sect. 36.3 introduces the association rule mining algorithm, constructs a database using this algorithm, and compares it with two other methods. The results indicate that the proposed method has a high success rate and is a high-performance solution.

36.2 Basic Content of Music Score Database Construction

36.2.1 Construction of Music Score Database

In our daily study and life, familiar databases include the well-known CNKI and the Wanfang Data platform, which are themselves high-quality database module platforms [8]. There is an inherent connection between databases and music scores. In a broad sense, a database organizes data according to a given structure and set of rules so that the data can be stored and managed; once the database is defined, it can handle basic management tasks such as querying and storage and, with the help of computer networks, enable convenient resource sharing. Guided by this idea, databases can be regarded as modules that integrate different types of resources [9]. Whether the content is website data or technical data, after collecting the information the computer can call up and integrate it according to the user's needs, plan and control the content and elements in these modules, and modify and adjust them as required [10]. The main existing types of music database are full-text retrieval and reading of music journals, the academic thesis databases of conservatories of music, databases of foreign vocal works, and full-text reading of vocal music scores in libraries. In general, however, there is still a large gap between the level of Chinese music database construction and that of the West, and there are still


Fig. 36.1 System database structure diagram

some problems in every respect, with a handful of music and art colleges taking the lead; the overall database structure of the system is shown in Fig. 36.1. Data information is much simpler to collect than to process: collection only requires a large-capacity database that guarantees the storage and security of the data and gathers information according to the relevant procedures. Processing, by contrast, involves a whole chain of information retrieval on the network. For example, the database first uses the various browsers as the information input interface and completes the interaction between the user and the network through the instructions issued by the user; to locate the relevant data accurately, the information is passed to the website behind the browser, which processes it according to its own standards and presents the result to the user. BD technology can therefore process all the information in the database intelligently and accurately, ensuring that the database is genuinely usable and that its value is fully exploited. In terms of advantages, China's cultural resources are very strong: as a cultural power with a long history, it has accumulated important literary resources. In future work we need to make better use of these resources and strengthen cooperation mechanisms through the technological advantages of the BD era, saving time and cost while building cooperation.


36.2.2 The Concrete Method of Music Score Database Construction

In the Internet environment, people collect and demand ever more information, and the accuracy and value of data keep rising, so data management must be improved continuously to become efficient and precise and to let BD technology play a larger role in database construction. Traditional music and art college libraries rely mainly on preserving paper vocal scores and audio-visual materials, which demands a great deal of space and resource maintenance. Once the database is built, China's vocal music resources can be stored on a cloud server and searched and used at any time: traditional paper scores, CDs, cassettes and fragmented Internet data are converted into a systematic, digital, easy-to-find and easy-to-maintain cloud database, which saves costs, preserves the material permanently and makes it easy to keep expanding and updating the data resources. The vocal music teaching repertoire database may be called the "teaching repertoire database" for short. The original intention of building it is to store the various vocal teaching songs systematically, indexed by difficulty, theme, style, performance form, performers and so on, according to the teaching needs of colleges and universities and in combination with teaching practice. For learners, a systematically built teaching music library provides a growth ladder that can be used both in class and in artistic practice: as long as the network is available, they can call up works in the library at any time and practice while listening. In practice, however, the people who build and operate databases sometimes lack awareness and understanding, so the database is not built completely; operators' limited understanding leads to one-sided and conservative use of BD technology, and ambiguity about the various values produced during data processing restricts its rational development, makes problems hard to spot in actual use and thus hinders the technology. For example, a website will provide the corresponding information about a work after the relevant items are selected from the search results. When searching for scores we must also respect the original author's copyright, download only where copyright permits, convert the files into encrypted form and turn them into final database entries in the form of physical or electronic books; such works can then be organized into columns, which is more convenient for users. Database construction also involves processing the works themselves: once the collected works are settled, paper and electronic books must be processed and converted into electronic scores. Paper scores can be scanned page by page and converted into PDF files; for e-books, the information can first be extracted with an electronic file reader and the content then saved separately with software to produce the electronic score.
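The cataloguing fields mentioned above (difficulty, theme, style, performance form, performers) can be captured in a simple record structure. The sketch below is only an illustration of such a record and a keyword filter; the field names, the in-memory list and the difficulty scale are assumptions, not part of the chapter's actual system.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ScoreRecord:
    """One vocal-pedagogy score entry in the teaching repertoire database."""
    title: str
    difficulty: int          # assumed scale: 1 (easy) .. 5 (hard)
    theme: str
    style: str
    performance_form: str    # solo, duet, chorus, ...
    performers: List[str] = field(default_factory=list)
    file_path: str = ""      # location of the scanned PDF or electronic score

def filter_scores(records, style=None, max_difficulty=None):
    """Return records matching an optional style and difficulty ceiling."""
    result = []
    for r in records:
        if style is not None and r.style != style:
            continue
        if max_difficulty is not None and r.difficulty > max_difficulty:
            continue
        result.append(r)
    return result

# Example: pick beginner-friendly art songs from a toy catalogue.
catalogue = [
    ScoreRecord("Example Song A", 2, "homeland", "art song", "solo", ["soprano"]),
    ScoreRecord("Example Song B", 4, "folk", "modern vocal", "duet", ["tenor", "soprano"]),
]
print(filter_scores(catalogue, style="art song", max_difficulty=3))
```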


At present, mainstream score-editing software on the market supports one-key transposition during production and can export different kinds of score information for a work, which makes data entry and use easier. BD technology plays an important role in database construction: it can effectively reduce the cost of entering data and greatly improve the efficiency of information processing. Improving the understanding of BD technology is therefore the basis for ensuring that the database makes full use of it during construction. Both database operation and maintenance staff and users should treat BD technology correctly and be keenly aware of how to tap its potential and value on the basis of existing technology, so as to better meet people's needs for data information.

36.3 Application of BD in the Construction of Music Score Database for Modern Vocal Pedagogy

36.3.1 Improved Parallel Association Rule Mining Algorithm Based on Vocal Music Education Data

In the era of "Internet plus", the education system has undergone profound changes, and vocal music education is no exception: learners can obtain guidance and correction from teachers at any time for non-standard sounds or improper handling of vocal works. In data processing, a global search capability can rescue data mining models that are trapped in local optima, enabling them to find the global optimum and improving the capacity for data analysis and processing. At the same time, the method proposed in this article is continuously refined while it optimizes, and so achieves better results in practical applications. The cosine similarity algorithm is the most common similarity measure. Its core idea is to treat a user's understanding of a teaching resource as an n-dimensional vector and to take the cosine of the angle between two such vectors as the similarity between the corresponding teaching resources, as shown in Formula (36.1):

sim(i, j) = \cos(\bar{i}, \bar{j}) = \frac{\bar{i} \cdot \bar{j}}{\|\bar{i}\| \, \|\bar{j}\|}   (36.1)

The numerator is the inner product of the two teaching resource vectors, and the denominator is the product of their norms. The improved (adjusted) cosine similarity subtracts, from each score in the rating matrix, the average of that user's evaluations of the teaching resources; that is, it corrects each score for the user's overall rating behavior. If a user's evaluations of other teaching resources are generally high, more is subtracted so that the score becomes more objective, and vice versa. The calculation


formula is shown in (36.2):

sim(i, j) = \frac{\sum_{c \in U_{ij}} (R_{c,i} - \bar{R}_c)(R_{c,j} - \bar{R}_c)}{\sqrt{\sum_{c \in U_i} (R_{c,i} - \bar{R}_c)^2} \, \sqrt{\sum_{c \in U_j} (R_{c,j} - \bar{R}_c)^2}}   (36.2)
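A minimal NumPy sketch of Eqs. (36.1) and (36.2) follows. The toy rating matrix, the convention that missing ratings are stored as zero, and the simplification of computing the denominator over the common raters only are all assumptions for illustration, not part of the chapter's system.

```python
import numpy as np

def cosine_similarity(i_vec, j_vec):
    """Eq. (36.1): cosine of the angle between two resource vectors."""
    return float(i_vec @ j_vec / (np.linalg.norm(i_vec) * np.linalg.norm(j_vec)))

def adjusted_cosine_similarity(R, i, j):
    """Eq. (36.2), simplified: subtract each user's mean rating before comparing items i and j.

    R is a users x items rating matrix with 0 meaning 'not rated'.
    """
    users = np.where((R[:, i] > 0) & (R[:, j] > 0))[0]   # users who rated both items
    if users.size == 0:
        return 0.0
    user_means = np.array([R[u][R[u] > 0].mean() for u in users])
    di = R[users, i] - user_means
    dj = R[users, j] - user_means
    denom = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
    return float((di * dj).sum() / denom) if denom > 0 else 0.0

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [1, 1, 5]], dtype=float)
print(cosine_similarity(R[:, 0], R[:, 1]))
print(adjusted_cosine_similarity(R, 0, 1))
```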

In this paper, the linear ranking method is used to adjust the fitness of the individuals in the population: the adjusted objective function values are sorted in descending order, and each individual is assigned a fitness value according to its position in the sorted population:

f^*(x_i) = 2 - sp + 2 \times (sp - 1) \times \frac{p(x_i) - 1}{S - 1}   (36.3)

where f^*(x_i) (i = 1, 2, \ldots, S) is the adjusted fitness of individual x_i, S is the number of individuals in the population, sp is the selective pressure, and p(x_i) is the ranking position of the individual's fitness value in the population.
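A small sketch of the linear-ranking fitness assignment of Eq. (36.3) is given below. The example objective values and the selective pressure sp = 1.5 are assumed for illustration only.

```python
def linear_ranking_fitness(objective_values, sp=1.5):
    """Assign fitness by rank according to Eq. (36.3).

    Objective values are sorted in descending order; p(x_i) is the 1-based
    position of individual x_i in that ordering, and S is the population size.
    """
    S = len(objective_values)
    order = sorted(range(S), key=lambda k: objective_values[k], reverse=True)
    ranks = {idx: pos + 1 for pos, idx in enumerate(order)}   # p(x_i)
    return [2 - sp + 2 * (sp - 1) * (ranks[i] - 1) / (S - 1) for i in range(S)]

print(linear_ranking_fitness([3.2, 7.5, 1.1, 4.8]))
```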

36.3.2 Analysis of Experimental Results

In order to test the optimization performance of the three algorithms accurately, this paper runs the test problems in low, medium and high dimensions; the results are shown in Table 36.1, which lists the optimal value, the worst value, the average optimal value and the standard deviation obtained by the three algorithms over 30 runs on the four test functions.

Table 36.1 Test results of four multi-peak and multi-extreme optimization functions (best, worst and average optimal values and standard deviations of PSO, GA and BDO on f1-f4 over 30 runs)

As shown in Figs. 36.2, 36.3 and 36.4, for multimodal and multi-extremum functions BDO has stronger optimization ability than PSO and GA in low, medium and high dimensions for the same number of generations, and it can escape the local sub-optima of multimodal functions with greater probability, thus converging to the global optimal solution. The table shows that the worst value and the average optimal value of the BDO algorithm are significantly better than those of PSO and GA. For the medium- and high-dimensional test problems, the search performance of all three algorithms is weak,


Fig. 36.2 Comparison of algorithm convergence curves (1)

Fig. 36.3 Comparison of algorithm convergence curves (2)

and the average optimal value success rate is almost zero. Nevertheless, the comparison values still rank the optimization performance of the three algorithms as BDO ahead of PSO and GA. The simulation results show that for high-dimensional optimization functions the BDO algorithm is significantly superior to the other algorithms in stability, convergence and diversity; for the combinatorial optimization problem CVRP, the experimental results show that the BDO algorithm has a higher search success rate than the GA and PSO algorithms and is the superior solution method.


Fig. 36.4 Comparison of algorithm convergence curves (3)

Network resources have their own advantages: they are not limited by time and space and are convenient for online sharing, exchange and cooperation, and users such as teachers and students can share resources simply by clicking a link. In academic exchange and cooperation on vocal music in China, schools can make effective use of database resources, fill each other's gaps, exchange what each needs and benefit mutually, thereby improving the efficiency of communication. At the same time, through such reciprocal exchange, schools can jointly establish a more scientific and reasonable Chinese vocal music resource system with abundant resources and smooth channels, and build a convenient and efficient information exchange channel that realizes music resource sharing on a wider scale.

36.4 Conclusion

BD technicians are one of the basic guarantees of database construction. As the main implementers, we should be deeply aware of the role of BD technology in database construction and understand its value. However, the new mode of vocal pedagogy under the background of "internet plus" still has many problems, such as the lack of face-to-face communication, voice distortion caused by recording online vocal lessons, and the relatively chaotic management of online resources. How to solve these problems, how to combine the advantages of the new and the traditional vocal pedagogy modes under the "internet plus" background, and how to improve the efficiency of vocal pedagogy are the questions we should focus on now. Promoting the integration and innovative development of information technology and music education and teaching


is the requirement of the times and the mission and task of today's music educators. The improved parallel collaborative filtering algorithm is implemented by improving the co-occurrence matrix in the light of the characteristics of online education data, and the various algorithms are simulated and analyzed.

References

1. E. Nakamura, K. Yoshii, Musical rhythm transcription based on bayesian piece-specific score models capturing repetitions. Inf. Sci. 57(1), 2 (2021)
2. H. Yang, Confucian expression of Chinese modernity appeal: philosophical construction of the first generation of modern neo-confucianism. J. Cent. South Univ. Soc. Sci. 47(8), 4 (2018)
3. Y. Zhang, Enlightenment on vocal music classroom teaching from the perspective of neuroscience. NeuroQuantology 16(6), 13 (2018)
4. Y. Liu, Exploration on the application of the internet and the network multimedia in the vocal pedagogy. Basic Clin. Pharmacol. Toxicol. 12(1), 5 (2019)
5. X. Zhao, Application of situational cognition theory in teaching of vocal music performance. NeuroQuantology 16(6), 13 (2018)
6. F.C. Thiesen, R. Kopiez, C. Reuter et al., A snippet in a snippet: development of the Matryoshka principle for the construction of very short musical stimuli (plinks). Music. Sci. 24(4), 5 (2020)
7. L.I. Desen, The application of big data analysis in teaching quality evaluation. Mod. Inf. Technol. 66(9), 37 (2019)
8. H.H. Sun, On construction and application of teaching resource database for higher vocational education of telecommunication engineering under the background of "Internet+." J. Hubei Open Vocat. Coll. 55(10), 46 (2019)
9. D. Wei, Application of intelligent algorithm big data analysis in the construction of smart campus. Wireless Internet Technol. 77(9), 47 (2019)
10. L. Fu, Research on the reform and innovation of vocal music teaching in colleges. Reg. Educ. Res. Rev. 37(2), 10 (2020)

Chapter 37

Application of Big Data Analysis Technology in Music Style Recognition and Classification

Haiqing Wu and Feng Wu

Abstract In the network era, traditional audio-visual archive management is being squeezed by the wealth of online information and risks being marginalized. To improve the effect of style recognition, this article applies big data analysis technology to music style recognition and classification, proposes a music feature recognition method based on a Generative Adversarial Network (GAN), and uses the designed network to learn music styles and to identify and classify music of different genres. The results show that the classification accuracy of the proposed algorithm is better than that of the LSTM algorithm. The model combines the feature extraction and classification stages of the recognition task so that information flows between them more effectively.

Keywords Big data · Music classification · Style recognition

37.1 Introduction

Using the melodic characteristics of music can help computers identify its style, and data mining technology is one way to carry out this work [1]. The work involves representing the melody, extracting melodic features and classification techniques [2]. Many factors make it very difficult to extract music features, which keeps the efficiency of music genre classification and recognition low [3]. To improve the management efficiency of music audio-visual archive resources, data technology should be applied to modernize archive resource management with information technology. At present, characteristics such as pitch, timbre and loudness are usually used [4, 5]. The diversity of music makes style an important feature for Internet music retrieval, and most people are used to choosing the music they might like according to their favorite style [6]. Driven


by big data, the management of music audio-visual files has gradually changed, and the disadvantages of traditional management mode have gradually emerged [7]. The key to the classification of music styles is the feature extraction of music information. In this article, big data analysis technology is applied to music style recognition and classification, and a music feature recognition method based on GAN is proposed.

37.2 Methodology

37.2.1 Digital Management of Music Audio and Video Driven by Big Data

The advent of the network era has brought new opportunities for managing music audio-visual archive resources, and in this situation it is particularly important to strengthen the preservation of materials. Extended to audio-visual archives, archival work driven by big data can manage all files, including text, pictures, music and video, in a new way so that they play a greater role. Archive resource management driven by big data has three characteristics [8]. First, the amount of data is large: audio and video are the main recording formats for communication, and driven by big data their number keeps growing, from traditional text to pictures, videos and web pages. Second, the data change quickly, which makes items harder to record. In this situation it is particularly important to apply information technology to the management of audio-visual files, which not only preserves audio-visual materials better but also promotes the effective development of the industry [9]. Among these characteristics, the sound quality of the audio reflects the texture of the music signal [10].

37.2.2 Music Style Recognition and Classification Algorithm

If recognition is attempted by manually analyzing music characteristics and extracting digital audio features, collecting samples takes a great deal of experimental time, and a lack of domain knowledge may bias the feature selection, leading to many misclassified music styles [11]. Neural networks have strong learning, self-adjustment and nonlinear mapping capabilities and have been widely used in text feature extraction, pattern recognition, industrial process control and so on. The GAN discriminator model for music style identification and classification is shown in Fig. 37.1. It is assumed that all music style sample data are generated by the same latent generative


Fig. 37.1 GAN discriminator model

model as a bridge, and the labels of unlabeled data are the missing parameters in the potential model, and the missing parameters are usually solved by maximum likelihood estimation. Assume that the input size of l layer is C l × H l × W l tensor, C l is the quantity of input channels of l layer, and the size of a single convolution kernel is C l ×h l ×wl . Then, corresponding to the convolution layer with C l+1 hidden neurons, the output of the corresponding position is as follows: ⎛ yd,i l , j l = σ ⎝

C  w h   l

l

cl =0



l

pd,cl ,i, j × xcl ,i l +i, j l + j + bd ⎠

(37.1)

i=0 j=0

where d indexes the neurons of layer l, and i^l and j^l give the location information. The constraints are:

0 \le i^l \le H^l - h^l + 1   (37.2)

0 \le j^l \le W^l - w^l + 1   (37.3)

where p denotes the convolution kernel parameters, b the bias term of the convolution, and \sigma(\cdot) the activation function. Before training on the music data, the data in the music library need to be lightly preprocessed: quantifying the data, removing data that do not meet the requirements, and normalizing data formats [12]. In this model it is mainly necessary to separate the tracks, extract musical features and vectorize the data. Because the track header file describes the category of the instrument, the piano track is easy to separate. Sound quality is an important attribute of sound and one of its basic features: if two sounds have exactly the same pitch and volume but different sound quality, they still sound very different. Different sound qualities are produced by different types of sound sources, just as different voices and musical instruments differ.
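To make Eq. (37.1) concrete, the sketch below computes the output of one convolution unit at a single spatial position with NumPy. The tensor sizes and random values are assumptions, and the activation \sigma(\cdot) is taken to be a sigmoid, which the chapter does not specify.

```python
import numpy as np

def conv_output_at(x, p, b, i0, j0):
    """Eq. (37.1): response of one hidden neuron at position (i0, j0).

    x : input tensor of shape (C_l, H_l, W_l)
    p : kernel of shape (C_l, h_l, w_l) for this neuron
    b : bias of this neuron
    """
    C_l, h_l, w_l = p.shape
    patch = x[:, i0:i0 + h_l, j0:j0 + w_l]          # x_{c, i0+i, j0+j}
    z = float((p * patch).sum() + b)                 # triple sum plus bias
    return 1.0 / (1.0 + np.exp(-z))                  # sigma(.) assumed to be a sigmoid

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))     # C_l = 3, H_l = 8, W_l = 8 (assumed)
p = rng.normal(size=(3, 3, 3))     # one 3x3 kernel spanning all channels
print(conv_output_at(x, p, b=0.1, i0=2, j0=4))
```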


The independent batch normalization is used to replace the joint normalization of each characteristic element, and the equation is as follows:

\hat{X}^{(k)} = \frac{x_i^{(k)} - E[x^{(k)}]}{\sqrt{\mathrm{var}[x^{(k)}]}}   (37.4)

The k-th dimension of the input information is denoted x^{(k)}, its expectation E[x^{(k)}], and its variance \mathrm{var}[x^{(k)}]. Adding the parameters \lambda^{(k)} and \beta^{(k)} for the k-th dimension of the input gives the following equation:

y^{(k)} = \lambda^{(k)} \hat{X}^{(k)} + \beta^{(k)}   (37.5)

Here \lambda^{(k)} is taken equal to \mathrm{var}[x^{(k)}], i.e., both represent the variance. The music feature information can then be obtained from

L = J(w, e) - \sum_{i=1}^{N} a_i \left[ w^T \phi(x_i) + b + e_i + y_i \right]   (37.6)

where J(w, e) refers to the repeated pixel points of the motion position; x_i and y_i are the music style feature vectors of the i-th Gaussian unit; a_i is the standard action configuration sequence; and \phi(x_i) is the melody feature distribution function. Estimating the probability that feature i falls in the Gaussian unit obtained by dimensionality reduction gives

r(i) = \frac{w_i \, p_i(v_i)}{\sum_{i=1}^{k} w_i \, p_i(v_i)}   (37.7)

where p_i is the probability density assigned to the i-th Gaussian unit and w_i is its mixture weight. The theoretical basis of supervised classification is building a statistical discriminant-function model for the practical problem and realizing the classifier by training on representative sample sets. Supervised classification therefore involves two steps: first the classifier is trained with the provided labeled samples, and then the unlabeled feature vectors are assigned to one of the classes using the samples provided by the training set. A supervised method does not need class labels to be added manually after classification, whereas an unsupervised method does, which is an important advantage of supervised over unsupervised classification. Once the music signal has been expressed as a matrix of feature vectors, a pattern must be found in the feature space that maps a feature vector matrix to a music style class; in the music style classification system, this is the job of the classifier.
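As an illustration of Eq. (37.7), the sketch below computes the normalized weight w_i p_i(v_i) of each Gaussian unit for a reduced feature value. The means, variances and mixture weights are made-up numbers, and a plain Gaussian density stands in for p_i(\cdot); none of these values come from the chapter.

```python
import numpy as np

def gaussian_pdf(v, mean, var):
    """Univariate Gaussian density, used here as the unit density p_i(.)."""
    return np.exp(-(v - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def responsibilities(v, weights, means, variances):
    """Eq. (37.7): r(i) = w_i * p_i(v) / sum_k w_k * p_k(v) for a feature value v."""
    weighted = np.array([w * gaussian_pdf(v, m, s)
                         for w, m, s in zip(weights, means, variances)])
    return weighted / weighted.sum()

# Three Gaussian units with assumed parameters.
print(responsibilities(v=0.4,
                       weights=[0.5, 0.3, 0.2],
                       means=[0.0, 0.5, 1.0],
                       variances=[0.2, 0.1, 0.3]))
```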


37.3 Result Analysis and Discussion

According to the principles of human vocalization, the voice is shaped by glottal excitation and mouth-nose radiation, and the high-frequency part above 800 Hz falls off at about 6 dB/octave. The rectangular window is commonly used in digital signal processing; in fact, applying a rectangular window to a signal is equivalent to applying no window function at all. Characteristics of music such as timbre or loudness can only be extracted over 10-40 ms frames, and such features describe only that time period, not the relationship between the musical characteristics of adjacent periods. When a large number of input data features are unknown, it is not appropriate to impose a single static structure on action sequence data of different types and lengths, and many attempts are needed to obtain a suitable network structure and satisfactory analysis results. Figure 37.2 shows the running-time comparison of the different algorithms. An adaptive operator is added to the local search strategy of the GAN so that the local search range shrinks as the iterations proceed, making the local search more targeted. Compared with high-level statistical features, the music style classification model based on low-level features achieves higher accuracy. With the improvements in this article, the GAN parameters converge faster and the final classification accuracy of the model is higher. This method obtains

Fig. 37.2 Response time of different algorithms


ideal music style feature recognition results, and its recognition accuracy is higher than that of other music feature recognition methods. The essence of machine learning is to approximate a real problem with a model, and there is always a gap between the real model and the approximation; risk is used to estimate the accumulation of this gap. In classification, given the training samples, the difference between the classifier's predictions on the test samples and the actual situation is called the empirical risk. Within one batch of training iterations, the output for the same batch of samples will change if they are randomly processed; to guide the direction of parameter optimization, the outputs for the same input should be as similar as possible, that is, the predicted class probabilities should be as close as possible. On the Matlab platform, the efficiency of the different music style classification models is tested, with recognition efficiency measured by running time. The running times for the different feature dimensionality reductions are given in Table 37.1, and the classification accuracy of the GAN and Long Short-Term Memory (LSTM) models is given in Table 37.2; the final music of different genres also differs in its strengths and weaknesses. The music classification accuracy of the different algorithms as the number of samples varies is shown in Fig. 37.3. As can be seen from Fig. 37.3, the music style classification accuracy of the proposed algorithm is superior to that of the LSTM algorithm. By adding unlabeled data to the training set, the algorithm performs better than when using labeled data alone, improving recognition accuracy. The fused invariant features obtained after dimensionality reduction retain the key information of the music style features, which not only significantly improves recognition performance but also improves recognition efficiency.

Table 37.1 Dimension reduction time of music style feature classification model

Music type | Training sample (LSTM) | Training sample (GAN) | Test sample (LSTM) | Test sample (GAN)
Classical music | 7.58 | 6.29 | 8.35 | 5.85
Pop music | 9.17 | 5.72 | 6.88 | 4.57
National music | 8.69 | 6.59 | 7.64 | 4.95

Table 37.2 Correct rate of music style feature classification model

Music type | Training sample LSTM (%) | Training sample GAN (%) | Test sample LSTM (%) | Test sample GAN (%)
Classical music | 87.65 | 93.46 | 82.53 | 93.51
Pop music | 83.62 | 94.21 | 85.22 | 96.55
National music | 86.37 | 95.88 | 85.78 | 96.24


Fig. 37.3 Music style classification accuracy of different algorithms

37.4 Conclusion

Driven by big data, managing audio-visual archives in an all-round way is an issue that every archives manager must pay attention to, so that archival work can better cope with the impact of networking, improve the efficiency with which audio-visual archives are used and better serve society. The key to classifying music styles is the feature extraction of music information. In this article, big data analysis technology is applied to music style identification and classification, and a music feature identification method based on GAN is proposed; the designed network is used to learn music styles and to identify and classify music of different genres. In future work, researchers can improve the recognition rate by collecting more tracks of each style, making the extracted features more complete and coherent and improving the recognition results.

References

1. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Proc. 11(7), 884-891 (2017)
2. Y. Dong, X. Yang, X. Zhao et al., Bidirectional convolutional recurrent sparse network (BCRSN): an efficient model for music emotion recognition. IEEE Trans. Multimedia 21(12), 3150-3163 (2019)
3. M. Mueller, A. Arzt, S. Balke et al., Cross-modal music retrieval and applications: an overview of key methodologies. IEEE Signal Process. Mag. 36(1), 52-62 (2018)
4. J. Kocinski, E. Ozimek, Logatome and sentence recognition related to acoustic parameters of enclosures. Arch. Acoust. 42(3), 385-394 (2017)
5. R. Jenke, A. Peer, M. Buss, Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5(3), 327-339 (2017)
6. X. Peng, X. Gao, Y. Zhang et al., An adaptive feature learning model for sequential radar high resolution range profile recognition. Sensors 17(7), 1675 (2017)
7. B. Ma, J. Teng, H. Zhu et al., Three-dimensional wind measurement based on ultrasonic sensor array and multiple signal classification. Sensors 20(2), 523 (2020)
8. M. Ge, Y. Tian, Y. Ge, Optimization of computer aided design system for music automatic classification based on feature analysis. Comput. Aided Des. Appl. 19(3), 153-163 (2021)
9. J. Zhang, Music feature extraction and classification algorithm based on deep learning. Sci. Program. 2021(2), 1-9 (2021)
10. D. Chaudhary, N.P. Singh, S. Singh, Development of music emotion classification system using convolution neural network. Int. J. Speech Technol. 2021(3), 24 (2021)
11. A. Miskiewicz, T. Rosciszewska, J. Majer, Perception of environmental sounds: recognition-detection gaps. J. Acoust. Soc. Am. 141(5), 3694-3694 (2017)
12. A. Baro, P. Riba, J. Calvo-Zaragoza et al., From optical music recognition to handwritten music recognition: a baseline. Pattern Recogn. Lett. 123(5), 1-8 (2019)

Chapter 38

Construction of Piano Music Recognition System Based on Big Data Algorithm

Qianmei Kuang and Chun Liu

Abstract Beat recognition of piano music has been a research hotspot in the field of music sound recognition in recent years. It is a new technology that uses audio techniques to identify scores, staff notation and audio files from the melody. Because the standards for musical sound, score and staff lines together constitute the beat system of a piece, complete audio can be recognized automatically and the score generated automatically. This paper constructs a piano music recognition system based on a big data algorithm. The system extracts the characteristics of piano music, classifies timbre files according to those characteristics, and judges whether a timbre file has been adjusted. Comparing the recognition results with the real results reveals cases where the recognition accuracy of the system is low. The experimental tests show that the timbre recognition accuracy of the proposed system is close to 90%, about 30% higher than the initial accuracy. The era of big data has thus created favorable conditions for diversifying piano playing and singing and has supplied the technical means to make modern, intelligent piano playing and singing a reality.

Keywords Big data algorithm · Piano playing · Music recognition system

38.1 Introduction

The process of piano music recognition is to extract the relevant features of the audio signal captured by the computer and to process the parts of interest; its purpose is to convert the captured audio signal into the corresponding notes, rhythm and related textual information. In short, low-level music recognition identifies every note, the melodic rhythm, the musical style and even the chords of a piece of music. According to the characteristics of the sound waves, audio can be divided into regular and irregular audio, and the former can be further divided into sound effects, voice and music.


Music beat recognition is a research hotspot in the field of music recognition in recent years. It is a new technology that uses audio technology to identify music scores, staff and audio files according to melody. Because the music sound, music score and sound line standard constitute the music beat system, the complete audio can be automatically recognized and the music score can be automatically generated [1]. In the field of note segmentation, the method based on short-term energy and zero-crossing rate has universal applicability, but the setting of adaptive threshold is a difficulty. As an artistic tool to express feelings and produce resonance, music itself contains complex contents and levels, which cannot be directly analyzed. In the aspect of fundamental frequency extraction, the fundamental frequency of tone samples can be detected by short-time autocorrelation method, cepstrum method and short-time average amplitude difference method, but in the case of multi-note recognition, a note contains several tilt sample signals, and the traditional basic method cannot meet the accuracy requirements of note recognition [2]. In music performance, the relationship between notes, frequency, beat, audio and coding is usually expressed in a modular form by language. Therefore, based on the standard of note coding accuracy and pitch extraction accuracy, the traditional music recognition algorithm uses the peak-valley characteristic of a single domain to extract the pitch of the tilt sample, and the peak-valley characteristic at the pitch is not prominent enough, which is easy to cause misjudgment [3, 4]. This paper constructs a piano playing music recognition system based on the big data algorithm. The system extracts the features of the piano music timbre, classifies the timbre files according to the features, and determines whether the timbre files have been adjusted. Comparing the recognition results with the real results, the recognition accuracy of the system is low. In this paper, the piano music is extracted and measured, the characteristic parameters of piano timbre are improved, and the timbre is classified to judge whether it has been tampered with. By comparing the authenticity of playing music, it is found that the correct rate of piano music recognition by this method is low.

38.2 Basic Characteristics of Piano Music and Music Signal Preprocessing

38.2.1 Overview of Piano Music Characteristics

In daily life human beings hear many kinds of sound, such as noise and speech. The research object of this paper is instrumental performance; the range is an attribute of an instrument and generally refers to the span from the lowest to the highest note the instrument can produce. Taking the piano as an example, this paper mines and identifies the timbre characteristics of the piano and finally obtains the recognition result. By listening to the music, the hand movements can be adjusted accordingly, so as to realize the overall effect of piano


Fig. 38.1 Characteristic system diagram of piano music

music performance, and the piano playing skills training before piano performance can train the player’s hearing [5]. The system needs the piano melody information when identifying the piano timbre, and the piano melody information can be used as the data source of the computer, through which various audio files can be obtained. The piano timbre processing sub-module is mainly used to process the piano sound. Because the piano timbre requirements are high, different timbre technologies need to be adopted when processing it to make the piano timbre more beautiful. Repeated practice by the performer can make the hearing receive music sounds with different timbres and feelings, so as to evaluate and adjust the piano playing skills, so that the hearing can play its monitoring role in coordinating piano playing [6]. There are four types of piano music, namely, musical structure, piano music style, emotional connotation and piano timbre. Through the basic characteristics of music, we can reflect the complex characteristics, then show the overall characteristics, and finally show the musical structure, express the artistic style and set off the emotional connotation. The characteristic system diagram of music is shown in Fig. 38.1. The performance of a piano work is not only finished by completing the corresponding finger movements and performance skills, but also by grasping the style and melody of the work as a whole. In this process, hearing is a monitor, and good hearing can ensure that the performer can keep his hands and ears consistent in piano performance [7]. The timbre is the sensory characteristic of music. The music played by different instruments contains overtones of different frequencies.

38.2.2 Piano Music Signal Preprocessing

The arrival of the era of big data has made piano playing and singing more colorful, giving learners a broader platform and enabling them to obtain rich piano playing and


singing resources through the big data platform. Because the piano timbre requirements are high, different timbre technologies need to be adopted when processing it to make the piano timbre more beautiful. Repeated practice by the performer can make the hearing receive music sounds with different timbres and feelings, so as to evaluate and adjust the piano playing skills, so that the hearing can play its monitoring role in coordinating piano playing. This chapter will carry out further research on piano music signal preprocessing.

38.2.2.1 Sampling and Quantization

In the process of processing music signals, any operation flow may cause errors in the experimental results, and it is impossible to completely eliminate some errors, but the errors caused by sampling and quantization can be minimized. Piano performance skill training on the basis of improving the player’s musical hearing, the player’s grasp of piano timbre is more detailed. Piano timbre is an important carrier of music performance, and the complicated and changeable piano timbre needs the player to accurately grasp it [8]. Give full consideration to the piano timbre recognition and electronic synthesis system module, which includes multiple functions, in order to realize the operation between the modules and the overall system function in the piano timbre recognition and electronic synthesis system. In order to convert analog signal into digital signal, it needs to be quantized in amplitude. We can have a deeper understanding of the influence of the contact surface, strength and speed of touching strings on the piano timbre. When performing music works, we can adjust the piano playing skills by changing and mastering the piano timbre, so as to play the most perfect piano timbre [9].
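A minimal sketch of the amplitude quantization step described above follows, assuming 16-bit uniform quantization of a signal normalized to [-1, 1]; the sampling rate and the synthetic sine tone are illustrative only and are not taken from the chapter.

```python
import numpy as np

def quantize(signal, bits=16):
    """Uniformly quantize a [-1, 1] signal to signed integer codes."""
    levels = 2 ** (bits - 1) - 1                  # e.g. 32767 for 16-bit audio
    return np.round(np.clip(signal, -1.0, 1.0) * levels).astype(np.int32)

fs = 44_100                                       # assumed sampling rate (Hz)
t = np.arange(0, 0.01, 1.0 / fs)
tone = 0.8 * np.sin(2 * np.pi * 440.0 * t)        # a 440 Hz test tone
codes = quantize(tone)
print(codes[:8], codes.dtype)
```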

38.2.2.2 Pre-emphasis

When explaining the knowledge of piano playing and singing, we should, on the one hand, give full play to the demonstration and guidance role of the demonstrator. In the era of big data it is particularly important to expand the forms of piano playing and singing and to improve learners' participation; timbre recognition is realized through the analysis of functional modules and reasonable programming. Such videos must have the distinctive characteristics of piano playing and singing and the unique charm of music, so that people can take pleasure in them and feel the influence and charm of musical culture. The low-frequency part of the music signal fluctuates in the range of -20 to 20, and the spectral amplitude of the low-frequency part is clearly higher than that of the high-frequency part; this is improved after pre-emphasis [10]. The experimental results show that pre-emphasizing the music signal effectively reduces the frequency offset caused by low-frequency components. The era of big data has created favorable conditions for the diversified transformation of piano playing and


singing modes, and also supported the technical means of piano playing and singing, making modern and intelligent piano playing and singing a reality [11].
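Returning to the signal-processing step above: the chapter does not give the pre-emphasis filter itself. A common first-order form, y[n] = x[n] - a*x[n-1] with a = 0.97, is sketched below purely as an assumed illustration of boosting the high-frequency part relative to the dominant low-frequency part.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

fs = 44_100
t = np.arange(0, 0.05, 1.0 / fs)
x = np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 5000 * t)  # strong low tone + weak high tone
y = pre_emphasis(x)

freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
lo, hi = np.argmin(np.abs(freqs - 100)), np.argmin(np.abs(freqs - 5000))
for name, sig in (("raw", x), ("pre-emphasised", y)):
    spec = np.abs(np.fft.rfft(sig))
    print(name, "high/low magnitude ratio:", spec[hi] / spec[lo])
```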

38.2.2.3 Wave Filtering

Because the noise contained in a piano recording affects the detection accuracy of the music recognition algorithm, the filter designed for the recognition stage must minimize the error caused by noise without changing the signal waveform; both fixed and adaptive filters can be used. The piano can play all kinds of music and opera scores and has a very wide range of uses, even standing in for a band. With the help of big data, professors are freed from heavy explanation tasks; for example, demonstration videos of piano playing and singing can be found so that people can compare and study different materials by themselves, find the differences and similarities, and pick up rich playing and singing skills. Learners also gain more opportunities for personalized piano playing and singing. The spectrum of the piano is analyzed, and finally the digital piano timbre is identified and played back synchronously; the above steps are repeated for any unrecognized piano timbre until identification is complete, which realizes the design of the piano timbre identification and electronic synthesis system. If we try to weaken this delay, the signal-to-noise ratio of the filtered music signal is reduced. Generally speaking, an output signal-to-noise ratio greater than 15 is considered moderate, so a correspondingly larger value with output signal-to-noise ratio above 15 is adopted.
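As a toy illustration of the fixed-filter option and the SNR > 15 criterion mentioned above, the sketch below applies a short moving-average filter to a noisy tone and estimates the output signal-to-noise ratio. The filter length, noise level and the linear (rather than dB) SNR convention are assumptions, not choices made in the chapter.

```python
import numpy as np

def moving_average(x, taps=5):
    """A simple fixed FIR filter: average of the last `taps` samples."""
    kernel = np.ones(taps) / taps
    return np.convolve(x, kernel, mode="same")

def snr(clean, processed):
    """Linear signal-to-noise ratio of a processed signal against the clean reference."""
    noise = processed - clean
    return float(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(1)
fs = 8_000
t = np.arange(0, 0.25, 1.0 / fs)
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.2 * rng.normal(size=t.size)

filtered = moving_average(noisy, taps=5)
print("SNR before filtering:", round(snr(clean, noisy), 1))
print("SNR after filtering:", round(snr(clean, filtered), 1))
```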

38.3 System Design

Most piano courses need a score-playback function: in combination with the course, scores are played and demonstrated, because when learning a piano score it must be played and demonstrated so that learners can understand it more deeply at multiple levels. Piano playing and singing supported by big data helps break the constraints of the traditional model, meets diversified needs and promotes the development of innovation ability. In addition, by considering the characteristics of different people, the playing and singing strategy can be optimized and a sound structure for piano playing and singing can be built. The piano music recognition system based on a big data algorithm designed in this paper is mainly composed of the following parts: training set, preprocessing, voice mining, and so on. The specific system architecture is shown in Fig. 38.2. If the Fourier transform of the discrete piano signal y(n) is denoted Y(j\omega), then:


Fig. 38.2 Piano playing music recognition system

Y(j\omega) = S_i(j\omega) \cdot F_i(j\omega)   (38.1)

where F_i represents the sine function of the piano timbre and S_i the characteristic value of the piano timbre. The expression for piano timbre feature extraction is

S_i(j\omega) = A_i \frac{a_i}{a_i^2 + (\omega - \omega_i)^2}   (38.2)

where i represents the scale of the piano, \omega_i the octave of the i-th note, A_i the amplitude, and a_i the width of the adjusted waveform. On the premise of preserving the timing, the vibration energy of the piano must be distributed accurately; the function used is

f(x) = \frac{x^2}{e^x}   (38.3)
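The sketch below simply evaluates the single-note timbre component of Eq. (38.2) and the envelope of Eq. (38.3) for assumed parameter values, to show their shapes; the amplitude, width and note frequency are illustrative and are not taken from the chapter.

```python
import numpy as np

def timbre_component(omega, A_i, a_i, omega_i):
    """Eq. (38.2): S_i(j*omega) = A_i * a_i / (a_i**2 + (omega - omega_i)**2)."""
    return A_i * a_i / (a_i ** 2 + (omega - omega_i) ** 2)

def envelope(x):
    """Eq. (38.3): f(x) = x**2 / e**x, used to distribute vibration energy over time."""
    return x ** 2 / np.exp(x)

omega = np.linspace(0, 2000, 5)
print(timbre_component(omega, A_i=1.0, a_i=50.0, omega_i=880.0))
print(envelope(np.linspace(0, 6, 7)))   # rises, peaks at x = 2, then decays
```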


Table 38.1 Test results of piano timbre

Function | Use case analysis | Trial results | Timbre test result
Open the piano score | Open music scores in different formats | Only the current opening mode is supported | Qualified
Add piano attachment | Add dots to the new piano score | Prompts that the dot is not added | Qualified
Save piano music | Modify the file name when saving | Enters the saved state | Qualified
Delete note | Delete notes at the end of the piano | Recombine | Qualified

Here x represents time and e the amplitude; the envelope function paves the way for the electronic synthesis stage of piano music recognition. The specific performance techniques of a piano piece serve the connotation and emotion of the work, so the content and emotion of the work are the baton of the performance. Music recognition and speech recognition are related, so techniques can be borrowed between them, but they also differ. Since piano timbre recognition is mainly concerned with the test results, the timbre test results are shown in Table 38.1. From the opening and creation of piano scores and the analysis of score formats and timbre test results, the score-opening function only supports the normal current opening mode. The similarity between the two fields is that segmenting music and noise in music recognition resembles segmenting voiced and noise segments in speech recognition, so the method can be borrowed: both can use the short-time average energy method to separate the sound segments from the noise segments. Only notes in the correct format can be added to the piano score, and the sound and speed of piano playback are fixed.

38.4 System Test and Result Analysis When the piano notes are added, deleted and saved according to the above rules, the final piano timbre results are all qualified, which shows that the piano timbre test results are good. On this basis, the performance of the piano timbre recognition and electronic synthesis system is tested by the system in Ref. [4], the system in Ref. [6] and the system in this paper, and the experimental comparison is shown in Fig. 38.3. It can be seen from Fig. 38.3 that there is no noise interference. When the frequency is 1150 and 1550 Hz, the amplitude is the lowest. In the methods of literature [4] and literature [6], the frequency and amplitude are also relatively average. With the change of frequency, the variation of piano timbre amplitude is almost the same. It can be seen that the interference with noise is basically the same.



Fig. 38.3 Performance test results of piano timbre recognition of three systems

On the basis of the analysis of the piano timbre recognition performance, in order to test the accuracy of the piano timbre recognition and electronic synthesis system, a piano song is selected for the experiment. The experiment is compared with the system in document [4], the system in document [6] and the system in this paper. The experiment is designed for 5 min, and the experimental comparison is shown in Fig. 38.4. According to Fig. 38.4, when the playing time of piano music is 22 min, the correct rate of timbre recognition is 88%, when the system time in document [4] is 22 min, the correct rate of piano timbre recognition is 38%, and when the system time in document [6] is 22 min, the correct rate of corresponding timbre recognition is 53%. When the playing time is almost over, the correct rate of timbre recognition in this system is nearly 90%, It is 30% higher than the initial accuracy. It can be seen that the accuracy of piano timbre recognition in this system is higher, and the higher the accuracy of timbre recognition, the more perfect the system design.

38.5 Conclusions In music performance, the relationship between notes, frequency, beat, audio and coding is usually expressed in a modular form by language. It can’t meet the requirements of note segmentation accuracy and fundamental frequency extraction accuracy for piano music note recognition. The piano playing and singing mode advocated in the era of big data is based on information and intelligence, which provides a platform for people to learn freely and independently. To really promote the implementation



Fig. 38.4 Accuracy of piano timbre recognition of three systems

of this model, there must be a prerequisite, that is, putting learners in the main position. The traditional music recognition algorithm uses the peak-valley characteristics of a single domain to extract the fundamental frequency of tilted samples, and the peak-valley characteristics are not prominent enough at the fundamental frequency, which is easy to cause misjudgment. Therefore, the short-term energy method is used to realize the extraction of music segments. Finally, the short-term energy difference method is proposed to detect the starting and ending points of music, and the music segments are judged twice to complete the time detection of notes. Experiments show that the improved short-term energy difference method can effectively realize note segmentation. In the specific piano performance, this paper analyzes the piano playing music recognition based on big data algorithm, and concludes that the player’s playing quality is closely related to the musical expression, especially the use and mastery of playing skills, which can provide solid technical support for the actual performance.

References 1. S. Tata, SongRecommended: Music Recognition System with Fine-Grained Song Reviews, vol. 45(20) (Virtual Sheet Music Inc., 2010), pp. 33–57 2. T. Hatta, A. Ejiri, Learning effects of piano playing on tactile recognition of sequential stimuli. Neuropsychologia 27(11), 1345–1356 (2021) 3. C. Shuo, The construction of internet plus piano intelligent network teaching system model. J. Intell. Fuzzy Syst. Appl. Eng. Technol. 37(5), 22–46 (2022)



4. S. Nishino, K. Ookura, M. Tokumaru et al., A evaluation of playing piano in the virtual reality aided piano educational system. IEICE Tech. Rep. Educ. Technol. 99(11), 135–142 (2022) 5. F. Simonetta, Music interpretation analysis. A multimodal approach to score-informed resynthesis of piano recordings. Bull. Counc. Res. Music. Educ. 15(5), 20–41 (2022) 6. S. Snell, Preservice and in-service music educators’ perceptions of functional piano skills. Bull. Counc. Res. Music. Educ. 25(228), 59–71 (2021) 7. Srisainatyalayam, R. Ashwin Kumar Playing Piano song Munbe Vaa, in Srisaintyalayam Nanganallur His Music School, vol. 45(17) (Virtual Sheet Music Inc., 2022), pp. 24–33 8. G. Donizetti, G. Puccini, G. Rossini et al., Masters of the Italian Art Song: Word-by-Word and Poetic Translations of the Complete Songs for Voice and Piano, vol. 36(14) (2022), pp. 19–49 9. X.U. Lan, On the skill training of piano playing and singing. J. Hubei Correspond. Univ. 36(10), 19–42 (2021) 10. P. Toyan, K. Shoji, J. Miyamichit, Music symbol recognition of printed piano scores with touching symbols. J. Inst. Image Inf. Telev. Eng. 64(8), 1267–1272 (2021) 11. J.J. Jiao, The design and implementation of piano timbre recognition and electronic synthesis system. J. Jingdezhen Univ. 15(3), 16–19 (2022) 12. S. Sujith, K. Jithin, V. Appu, Musical instrument recognition system with dynamic feature selection method. IJERT Int. J. Eng. Res. Technol. 16(10), 16–23 (2021) 13. T.T. Yin, Learning playing technique from piano accompaniment of art songs: taking linden as an example. J. Sichuan Coll. Educ. 35(20), 21–35 (2021)

Chapter 39

Design of Music Recommendation Algorithm Based on Big Data Analysis and Cloud Computing Yanxin Bai and Chun Liu

Abstract Oriented toward practical application, this paper designs a music recommendation algorithm based on BD (big data) analysis and cloud computing and proposes a hybrid music recommendation algorithm. Specifically, the model is divided into two layers, and the predicted output of the first-layer model is used as a new feature for training the second layer. In the first layer, combination features are mined automatically by the k-means algorithm and used as input to a logistic regression model that can be trained quickly. In the second layer, Canopy clustering is trained on the first-layer predictions as features, and the generated recommendation list is finally provided to users. The results show that the hybrid music recommendation algorithm is more accurate than the traditional user-based CF (collaborative filtering) recommendation algorithm, with accuracy improved by 0.157. Comparative experiments verify that the music recommendation algorithm based on BD analysis and cloud computing proposed in this paper outperforms other song recommendation algorithms in accuracy. Keywords Big data · Cloud computing · Music recommendation

39.1 Introduction The ubiquitous network has changed the whole human life style, and traditional behaviors can be almost mapped in the network, such as online shopping, social networking, online music, online reading, news and various searches. With the convenience of the Internet, people are immersed in it, so the data of various Internet platforms also increase, which leads to the problem of reduced information utilization, that is, information overload [1, 2]. Y. Bai · C. Liu (B) College of Music and Dance, Huaihua University, Huaihua 418008, Hunan, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_39


The music resources on the Internet are extremely rich. While enjoying the leisure time brought by music, users of the various music platforms also face the trouble that there are too many kinds of music and it is difficult to express their emotions. The developers of music platforms have gradually noticed these psychological needs of users [3]. BD (big data) and cloud computing make it possible to understand a person's preferences by analyzing his behavior, and even to infer his personality, occupation and family situation [4, 5]. Literature [6] introduces clustering technology into the recommendation algorithm: by clustering similar users, the search range for nearest-neighbor users is reduced and the response speed of the recommendation system is improved. Aiming at the problem that a single recommendation algorithm is not universal, a hybrid recommendation algorithm based on CF (collaborative filtering) and video genes is proposed in Ref. [7]. Literature [8] combines an algorithm based on a critical region model with matrix decomposition in the CF algorithm. Under the background of BD and cloud computing, traditional recommendation algorithms, calculation methods and storage modes cannot meet the exponential growth of data and the personalized needs of users [9, 10]. Scholars at home and abroad pay close attention to music recommendation, and existing products are also seeking changes in the BD and cloud computing environment. Therefore, oriented toward practical application, this paper combines BD analysis technology with personalized music recommendation. On this basis, the K-means clustering algorithm is used to cluster users with similar interests, and music recommendation for different types of users is realized.

39.2 Research Method 39.2.1 User Behavior and User Portrait Modeling Different from the traditional recommendation of songs through leaderboards, personalized music recommendation service is more humanized and intelligent. It does not require users to actively provide accurate music preferences. We can know what types of music users are interested in by analyzing users’ favorite music and the amount of music downloaded, and studying historical browsing and playback records. However, most of the recommendations are based on the user’s on-demand history to summarize the types of music that users are interested in, and the recommended works often have a high similarity in genre, author, lyrics and other aspects, without considering that when the user’s mood changes due to various factors, the required music will also change in rhythm and style.


Nowadays, as a tool for users to quickly extract information, the recommendation system has attracted more and more attention. Before recommendation systems were widely studied and applied, search systems were regarded as the effective tool for helping users obtain accurate information. The core of the Hadoop platform is HDFS and MapReduce: the distributed architecture of HDFS solves the storage problem of large-scale data well, while MapReduce uses the divide-and-conquer idea to split large-scale data into several small-scale pieces and then process them. A music recommendation algorithm is mainly divided into two functional stages [11]: an input-information preprocessing stage and a song recommendation stage. In the preprocessing stage, the recommendation system needs to continuously collect and filter the behavior data generated by users when listening to songs. Generally speaking, there are three kinds of music recommendation algorithms: song content-based recommendation, CF recommendation and hybrid recommendation [12]. When making music recommendations, the songs that users listen to often change with their mood; at the same time, recommendation results tend to focus on the historical data chosen by users and cannot provide users with novel songs. The main application field of user portraits is the recommendation system: the modeling information of the user portrait provides a large data basis for recommendation algorithms that mine users' potential preferences. In the data set used in this paper, user behaviors include browsing, collecting, adding to the shopping cart and buying. For users who have no history, the CF algorithm and Canopy clustering can be used to calculate user similarity and find nearby users for recommendation. It is necessary to model user behavior and analyze valuable business information such as behavior habits and interest preferences through portraits. The user portrait model is constructed from the user's basic information, commodity information, access information, behavior preferences, etc., which amounts to labeling users. Figure 39.1 shows the main structure of the user portrait. For feature extraction of musical works, this paper needs an audio feature extraction method with low cost, low computation and a simple process. The eigenvalues of a musical work under the current pitch effect are defined as $p_i(x_f)$, i = 1, 2, ..., n, and its pitch is calculated as shown in (39.1):

$f(x_f) = \max(p_i)$  (39.1)

Fig. 39.1 The main structure of user portrait


Among them, $f(x_f)$ is the pitch feature vector of the current musical work. The pitch of felling is 7.4. Music recommendation differs from movie recommendation and business recommendation. First, users rarely rate a piece of music even after listening to it; second, people often listen to music while cleaning or running rather than purely enjoying it, which leads them to play a lot of music they are not particularly interested in. The playing frequency of a song v that user u has listened to is calculated as shown in (39.2):

$f_{u,v} = \dfrac{p_{u,v}}{p_u}$  (39.2)

where $p_{u,v}$ is the number of times that user u plays music v, and $p_u$ is the total number of times that user u listens to music. Unlike film and television platforms, where users explicitly rate the movies or TV series they watch, a music platform collects user behavior implicitly and computes users' ratings of songs according to the weight of each behavior. Most user actions are concentrated on playing songs, and the number of times a user plays a song indirectly reflects how much the user likes it. The scoring formula is designed as shown in (39.3):

$\mathrm{rating} = 1.2 - e^{-0.55x - 0.1}$  (39.3)

x represents the number of times the user has listened to the song completely. If a user listens to a song for the first time in a recent period, but the completeness of listening to the song is less than 70%, it is considered that the user does not like the song.
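As a rough illustration of how Eqs. (39.2) and (39.3) can be turned into implicit ratings, the following Python sketch derives play frequencies and play-count scores from a raw listening log. It is not the authors' code: the data layout, the handling of the 70% completion threshold, and the exact exponent grouping in (39.3) are assumptions based on the text above.

```python
# Minimal sketch (not the authors' code): deriving implicit ratings from play
# counts, following Eqs. (39.2) and (39.3) as read above. The 70% completion
# handling is an illustrative assumption.
from collections import defaultdict
import math

def play_frequencies(plays):
    """plays: list of (user, song) events; returns f_{u,v} = p_{u,v} / p_u."""
    per_user = defaultdict(int)          # p_u: total plays per user
    per_pair = defaultdict(int)          # p_{u,v}: plays of song v by user u
    for user, song in plays:
        per_user[user] += 1
        per_pair[(user, song)] += 1
    return {(u, v): c / per_user[u] for (u, v), c in per_pair.items()}

def implicit_rating(complete_plays, completion_ratio=1.0):
    """Score a (user, song) pair from the number x of complete listens,
    rating = 1.2 - exp(-0.55 * x - 0.1); plays under 70% completion are
    treated as a dislike signal and scored 0 here (an assumption)."""
    if completion_ratio < 0.7:
        return 0.0
    x = complete_plays
    return 1.2 - math.exp(-0.55 * x - 0.1)

if __name__ == "__main__":
    events = [("u1", "s1"), ("u1", "s1"), ("u1", "s2"), ("u2", "s1")]
    print(play_frequencies(events))      # {('u1','s1'): 0.67, ('u1','s2'): 0.33, ('u2','s1'): 1.0}
    print(round(implicit_rating(3), 3))  # 1.2 - e^{-1.75} ≈ 1.026
```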

39.2.2 Implementation of Music Recommendation Algorithm The implementation of a recommendation system must rely on BD and cloud computing rather than on a single traditional algorithm. Processing the collected data on a scalable platform and obtaining a reliable model is the essence of BD processing in a recommendation system. A hybrid cloud combines public and private clouds: enterprises typically place less critical information and services in the public cloud while keeping control of the services and data that are vital to the enterprise in the private cloud. The key points of content-based music recommendation are as follows: first, the interest direction of each user must be made clear, technical means or software data are used to calculate the user's preferences, and the rhythm, main melody and chorus of each song in the data are taken as the characteristic points of interest to form a database, that is, a feature vector. Therefore, this paper proposes a hybrid-model recommendation system based on combining algorithms. The problem can be converted into a classification problem of purchase estimation. The main idea is that by combining different algorithm models, the advantages of each model can be exploited effectively, which gives the method strong robustness and versatility. Therefore, this paper proposes the hybrid music recommendation algorithm shown in Fig. 39.2.

Fig. 39.2 Framework of hybrid music recommendation algorithm

After the above process, user data are divided into different classes, and each cluster contains several users with similar interests and behaviors. Here, the stacking (superposition integration) method is applied to model fusion. On this basis, a compound feature extraction method based on K-means is proposed and applied to the rapid training of the logistic regression model. In the second layer, Canopy clustering is trained on the predicted values of the first layer as features, and the generated recommendation list is finally provided to users. User data clustering aims to group users with the same interest behavior so that they show the same interest characteristics within the same category. This paper studies the clustering of user data with the K-Means++ algorithm and proposes a new clustering algorithm based on K-means. The inter-cluster distance L is defined as the sum of the distances from all cluster centers to the overall data center, as shown in (39.4):

$L = \sum_{i=1}^{K} \| c_i - m \|$  (39.4)

where $c_i$ is the center of result cluster i, m is the center of all user data, and K is the number of clusters.


The intra-cluster distance D is defined as the sum of the distances from all points in each result cluster to that cluster's center, as shown in (39.5):

$D = \sum_{i=1}^{K} \sum_{x \in C_i} \| x - c_i \|$  (39.5)

x represents a point in the result cluster. In this paper, according to the obtained user behavior information, the clustered users are scored and their ratings predicted by the improved CF algorithm combined with the k-means algorithm; the predicted results are ranked as a TOP-N list, which is then recommended to users. With clustering, users in the same category choose roughly similar items, which reduces the sparsity of the rating matrix. Therefore, the data can first be coarsely clustered with Canopy clustering to set the hyperparameter k for the K-Means clustering algorithm, after which K-Means performs further detailed clustering with the obtained k. The traditional K-Means clustering algorithm is then run within each Canopy, and similarity is not computed between objects that do not belong to the same Canopy. The advantage of combining Canopy and K-Means is that the number of Canopies obtained in the first step can be used as the k value, which overcomes the shortcomings of the traditional K-Means clustering algorithm to some extent and reduces the blindness of choosing k.
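A minimal sketch of this two-stage clustering is given below. It is an illustration rather than the paper's implementation: the distance threshold T2, the seed and the synthetic data are assumptions, and only the tight Canopy threshold is kept because the sketch only needs the canopy count to set k. It also shows how the inter-cluster distance L of (39.4) and the intra-cluster distance D of (39.5) could be computed for the resulting clusters.

```python
# Minimal sketch: Canopy pass to estimate k, then K-Means refinement.
import numpy as np
from sklearn.cluster import KMeans

def canopy_centers(X, t2, seed=0):
    """Simplified Canopy pass: only the tight threshold T2 is used, since we
    only need the number of canopies; full Canopy also keeps a loose T1 for
    soft assignment of points to multiple canopies."""
    rng = np.random.default_rng(seed)
    remaining = list(range(len(X)))
    centers = []
    while remaining:
        idx = remaining[rng.integers(len(remaining))]
        centers.append(X[idx])
        d = np.linalg.norm(X[remaining] - X[idx], axis=1)
        remaining = [i for i, dist in zip(remaining, d) if dist > t2]
    return np.array(centers)

def canopy_kmeans(X, t2=3.5):
    k = len(canopy_centers(X, t2))                 # number of canopies becomes k
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km, k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in ([0, 0], [6, 6], [0, 6])])
    km, k = canopy_kmeans(X)
    # inter-cluster distance L (Eq. 39.4) and intra-cluster distance D (Eq. 39.5)
    m = X.mean(axis=0)
    L = np.sum(np.linalg.norm(km.cluster_centers_ - m, axis=1))
    D = sum(np.linalg.norm(X[km.labels_ == i] - c, axis=1).sum()
            for i, c in enumerate(km.cluster_centers_))
    print(k, round(L, 2), round(D, 2))
```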

39.3 Experimental Analysis The experimental data comes from the Million Song Dataset (MSD) data set, which contains the behavior records of 10,000 users on 100,000 music works and the related metadata of users and music. In order to make a better experimental comparison, it is randomly divided into training set and test set according to the ratio of 8: 2. The hardware environment of the experiment in this paper is: Intel (R) Core (TM) i5-6300HQ CPU @ 2.3 GHz, with 4G memory; The software environment is Windows10/64-bit operating system, and the development language is Python programming language. The weight value of the algorithm for calculating the similarity between two users is an adjustable parameter. In the experiment, based on the step size of 0.1, the adjustment range is between [0, 1]. In addition, in k-means user clustering, it is necessary to adjust the size of the parameter to select the number of cluster centers and get the most suitable number of clusters. After the above experimental process, according to the recommended results, the parameters of user behavior weight w and clustering center k are adjusted, and the accuracy under different parameters is obtained (Fig. 39.3).


Fig. 39.3 Parameter comparison accuracy chart

The behavior weight is adjusted in increments of 0.1, and the values listed in this paper are reference values between 0.7 and 0.9. The number of clustering centers k is adjusted in increments of 5. The training set is used as the input of the song recommendation algorithm: the recommendation program trains the model on the input data and then recommends songs to users. After the recommended songs are obtained, the test set is used to verify their accuracy. Figure 39.4 shows the accuracy, recall and coverage of the different algorithms. As can be seen from the chart, the song-based CF recommendation algorithm is lower than the user-based CF recommendation algorithm in accuracy, recall and coverage. Compared with the traditional CF recommendation method, the method proposed in this paper greatly improves accuracy, with an increase of 0.157.
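A minimal sketch of how the accuracy, recall and coverage reported in Fig. 39.4 could be computed for top-N recommendation lists follows; the data structures (per-user sets of recommended and actually listened songs) and the catalog size are illustrative assumptions, not the paper's evaluation code.

```python
# Minimal sketch: precision/accuracy, recall and coverage for top-N lists.
def evaluate_topn(recommended, ground_truth, catalog_size):
    """recommended, ground_truth: dicts mapping user id -> set of song ids."""
    hits = sum(len(recommended[u] & ground_truth.get(u, set())) for u in recommended)
    rec_total = sum(len(s) for s in recommended.values())
    truth_total = sum(len(s) for s in ground_truth.values())
    precision = hits / rec_total if rec_total else 0.0
    recall = hits / truth_total if truth_total else 0.0
    covered = set().union(*recommended.values()) if recommended else set()
    coverage = len(covered) / catalog_size
    return precision, recall, coverage

recommended = {"u1": {"s1", "s2"}, "u2": {"s3"}}
ground_truth = {"u1": {"s2"}, "u2": {"s4"}}
print(evaluate_topn(recommended, ground_truth, catalog_size=100_000))
```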


Fig. 39.4 Comparison of experimental results of algorithm

39.4 Conclusion The music resources on the Internet are extremely rich. While enjoying the leisure time brought by music, users of various music platforms are also faced with the troubles of too many kinds of music and difficulty in expressing their emotions. The developers of music platforms have gradually noticed the psychological needs of users. Therefore, based on practical application, this paper combines BD analysis technology with personalized music recommendation. On this basis, K-means clustering algorithm is used to classify users with similar interests, so as to recommend products to different types of users. Experiments show that the mixed music recommendation method proposed in this paper is superior to the traditional CF recommendation method in accuracy. The accuracy rate increased by 0.157.

References 1. Q. Song, Design of music recommendation system based on interest emotion algorithm. Microcomput. Appl. 38(1), 82–84 (2022) 2. Q. Zhou, X. Li, Design of China music database system based on recommendation technology. Comput. Technol. Dev. 25(7), 4 (2015) 3. L. Jia, Design of China music database system based on project collaborative filtering algorithm. Microcomput. Appl. 2, 4 (2019)


4. T. Lee, Apriori mining algorithm design for music program classification. Mod. Electron. Technol. 42(19), 5 (2019) 5. Z. Wang, Speech and background music separation algorithm and its system design. Autom. Technol. Appl. 44(11), 363 (2019) 6. Y. Li, B. Li, J. Wang, A new music personalized recommendation algorithm based on LDAMURE model. J. Jilin Univ. Sci. Ed. 055(002), 371–375 (2017) 7. X. Ye, M. Wang, Research on music personalized recommendation algorithm TFPMF. J. Syst. Simul. 31(7), 11 (2019) 8. C.Y. Tsai, B.H. Lai, A location-item-time sequential pattern mining algorithm for route recommendation. Knowl. Based Syst. 73(8), 97–110 (2015) 9. Q.Y. Hu, Z.L. Zhao, C.D. Wang et al., An item orientated recommendation algorithm from the multi-view perspective. Neurocomputing 269(20), 261–272 (2017) 10. X. Liu, An improved clustering-based collaborative filtering recommendation algorithm. Clust. Comput. 20(2), 1281–1288 (2017) 11. L. Cui, W. Huang, Y. Qiao et al., A novel context-aware recommendation algorithm with two-level SVD in social networks. Futur. Gener. Comput. Syst. 86(10), 1459–1470 (2017) 12. D. Yao, X. Deng, A course teacher recommendation algorithm based on improved latent factor model and PersonalRank. IEEE Access 2(99), 1 (2021)

Chapter 40

Simulation of Sports Damage Assessment Model Based on Big Data Analysis Xiaodong Li and Zujun Song

Abstract According to periodic exercise training, competitive athletes can continuously improve their physical fitness, project skills, competition psychology and other aspects. If the athletes can’t recover to their original physical condition quickly in the competition, it will cause great damage to their previous training results, which will have a great impact on the improvement of their competitive ability. To this end, this project plans to carry out a simulation study on sports injury evaluation under the big data environment. This project intends to obtain the optimal factors of sports injuries from the perspective of statistics by multi-level linear regression method and multi-level principal component analysis, and then effectively predict the optimal factors of sports injuries, so as to achieve the goal of big data analysis of athletes’ sports injury rehabilitation. The results show that the prediction error of vulnerable body parts in strenuous exercise with this model is lower than that with traditional models. Thus, it is verified that the online sports injury evaluation method using big data can judge specific injuries. Keywords Big data · Sports injury · Assessment

40.1 Introduction Since the formation of human society, people have been doing physical fitness activities, but injuries have always threatened the health of sports participants. Biomechanics research can help people find the risk factors of sports injury and clarify the injury mechanism. According to periodic exercise training, competitive athletes can continuously improve their physical fitness, project skills, competition psychology and other aspects [1]. If the athletes can’t recover to their original physical condition quickly in the competition, it will cause great damage to their previous training results, which will have a great impact on the improvement of their competitive X. Li (B) · Z. Song Biomedicine and Health Industry Research Center, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_40


ability. This phenomenon has caused great obstacles for athletes to maintain and improve their competitive level, and made them retire early [2, 3]. Therefore, the treatment and evaluation of sports injuries has become an important sport. Risk is ubiquitous and exists in a certain probability, but if the nature of risk is deeply analyzed and studied, some methods can be used to reduce or avoid the losses caused by risk. With the development of sports, there are more and more kinds of sports, and people are doing all kinds of sports all the time. With the continuous improvement of data technology, data models have been applied in sports injury diagnosis, sports injury judgment and evaluation and other fields for diagnosis and injury degree evaluation. Literature [4] shows that the core stability of male tennis players is better than that of women, and the flexibility of each joint and muscle group of women is better. It is suggested that foam shaft and static traction should be used to improve flexibility and flexibility, and strength training should be used to enhance muscle strength, eliminate asymmetry and improve stability and control ability. Literature [5] uses the least square support vector machine to model the relationship between high-intensity track and field training and sports injury. Literature [6] proposes a predictive modeling method for vulnerable parts of the body during strenuous exercise based on mutual information principal component analysis feature selection. Firstly, the model extracts the influence characteristics of the easily damaged body parts during strenuous exercise. Traditionally, it is generally realized by using risk identification theory. Based on this, a large number of digital operations are used to clearly identify the risk factors in specific parts, so as to control the severity of injuries caused by sports [7, 8]. This topic plans to use big data analysis technology to make up for the shortcomings of traditional training methods in a sense. The injury model established in this paper has good reference value for preventing sports injuries and ensuring the life safety of athletes.

40.2 Research Method 40.2.1 Establishment of Risk Factors of Sports Injury A risk factor refers to the causes and conditions that give rise to or increase the chance of a risk accident or expand the scope of loss. The more conditions that constitute risk factors, the greater the possibility of loss and the more serious the loss will be. The operating platform of risk management is a system, so risk identification should be carried out within a system [9]. Every system has different factors and different characteristics; many risk factors affect the system, and the degree and effect of their influence differ.


The occurrence of these risk factors may make athletes face the risk of injury in all stages and links of training and competition, but the probability and influence of various injury risk factors are different. Therefore, it is more important to effectively identify and evaluate various injury risk factors to prevent sports injuries. The establishment of risk factors is the premise and foundation of risk management, and the scientific establishment determines the accuracy of risk assessment to a great extent [10]. In order to ensure the accuracy of initial analysis, scientific and systematic research should be carried out to comprehensively classify risk factors. In order to fully understand and deeply understand the factors of sports injury in floor exercise, based on the statistics of athletes’ questionnaire data, this study designed an expert questionnaire of sports injury in floor exercise, listing 30 sports injury risk factors related to floor exercise, which were evaluated by experts according to their importance, and the data were statistically analyzed by using the factor analysis function in SPSS22. 0. Through factor analysis, five risk factors affecting the sports injury of floor exercise are obtained, which are named as: factor 1: sports technology, factor 2: sports physiology, factor 3: sports psychology, factor 4: sports training and factor 5: external environment. There must be a specific external environment for the occurrence of sports injury risk. Therefore, analyzing the risk sources of external factors is another angle to study the risk sources of sports injuries in the special teaching activities of volleyball in a university. In this study, a questionnaire was designed to identify the risk sources, and the feasibility of the external risk sources was tested by expert investigation. After investigation, the external risk sources of sports injuries in sports volleyball special teaching activities were obtained, as shown in Fig. 40.1. Volleyball teaching activities are generally carried out in certain venues and venues, and the conditions of venues and venues have certain influence on students, including lighting, noise, ventilation, equipment layout and other factors. Equipment problems are mainly manifested in the quality of volleyball and the tightness of the net. These will often lead to sports injuries in volleyball teaching.

Fig. 40.1 External factors and risk sources of dynamic sports injury


Teacher factors are mainly reflected in teachers’ teaching level, training level and training experience. Teacher factor is the most direct factor that causes students’ sports injury risk. The risk of social support is mainly reflected in the fact that the family members, friends, spectators, journalists and leaders who pay attention to the students’ participation in the competition have too high expectations of the students, resulting in the students’ excessive psychological pressure, which leads to the risk of sports injury.

40.2.2 Construction of Sports Injury Assessment Model Sports injury refers to the injuries suffered by athletes in the process of sports, which is different from the injuries in daily life and work, and it is produced with the rise of sports [11]. Its occurrence has a great relationship with sports events, special technical characteristics, training means and methods, sports environment and sports equipment, etc. It mainly occurs in the sports system (bones, muscles and joints), and may also be accompanied by vascular and nerve injuries. The cause of sports injury is not single, but the result of multiple factors. In the construction environment, identify the hazard source and the hazard factor, so that it has certain uncertainty, thus ensuring the safety and reliability of the construction process. Due to the characteristics of competitive events and the changes of competition rules, the difficulty coefficient of many events of athletes gradually increases, which makes the risk of physical injury of athletes gradually increase [12], which not only causes great waste of time, but also has a great impact on the authenticity of the results because of repeated calculations. Based on big data, this project makes an indepth study on the rehabilitation of sports injuries. From the perspective of statistics, this paper uses multiple linear regression method to carry out multiple regression on the factors affecting the rehabilitation of sports injuries, and uses multiple regression method to carry out multiple regression on the factors of sports injury rehabilitation. Risk identification is the process of decomposing extremely complex things that cause risks into relatively simple and easily recognized basic units or factors through investigation and other methods, and putting forward the main risk factors that may have great influence. This is a relatively economical, practical and effective work. Before evaluating sports injuries, it is necessary to determine the risk sources of injuries. Take volleyball as an example, its risk sources include: cushion, pass, spike, serve, block and move, and the relationship between each sub-item and the risk sources of injury can be shown in Fig. 40.2. Using big data technology, injuries in sports training can be effectively classified and predicted. The regression function of body vulnerable parts in strenuous exercise is calculated according to the risk probability of the parts injured in strenuous exercise, the relationship between the injured parts and the exercise angle, the exercise environment and many other variables, and the prediction model of body vulnerable parts in strenuous exercise is established by using the calculation results, and the model is verified.


Fig. 40.2 Determination principle of sports injury risk source

Multivariate linear regression analysis. In the case of linear separability, a prediction model of easily injured body parts in strenuous exercise is established based on multivariate linear regression:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_j X_j + \cdots + \beta_k X_k + \varepsilon$  (40.1)

In the formula, k is the number of relationship variables between the injury site and the motion angle, $\beta_j$ are the regression coefficients, and ε is the random error remaining after the influence of the k independent variables on Y. Assuming that y(x) represents the difference in athletes' sports injuries across events in high-intensity track and field training and corresponds to the injury level of each event, the classifier of the main factors of athletes' injuries is constructed with the following formula:

$\max Z_n = p_i x_i \ast \dfrac{M_{\varphi n}^{in} - p_{ij} \ast \delta}{y(x) \ast \gamma}$  (40.2)

Among them, $p_i x_i$ represents the injury probability of the waist, knee, elbow joint, thigh, ankle joint and wrist in high-intensity track and field training, $p_{ij}$ represents the probability of strain, sprain, abrasion and chronic strain in high-intensity track and field training, γ represents the amount of exercise in high-intensity track and field training, and δ indicates that athletes have suffered injuries of different degrees and types during recent training. The damage assessment model can be initially established by determining the damage grade value with a BP neural network and building the damage hidden layer. To ensure the normal operation of the BPNN (BP neural network) sports injury assessment model, the free parameters generated in the operation process must be iterated, with the specific formulas as follows:


$\dfrac{\partial(n)}{\partial s_{vu}(n)} = \sum_{u=1}^{n} s_v(n) R_u(x)$  (40.3)

$s_{vu}(n+1) = s_{vu}(n) - \eta \dfrac{\partial(n)}{\partial v_u}$  (40.4)

where $\partial(n)/\partial s_{vu}(n)$ is the iterative update of the free parameter; n is the value of the current input variable, and n + 1 is the value of that variable after the iteration is completed; η is the overall learning rate of the model.
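To make the iterative update of (40.4) concrete, the sketch below trains the linear injury model of (40.1) with plain gradient-descent updates of the form beta(n+1) = beta(n) − η · gradient. It is a simplified stand-in, not the BP network used in the paper: the synthetic features (load, joint angle, fatigue), the learning rate and the squared-error objective are assumptions.

```python
# Minimal sketch under stated assumptions: gradient-descent training of the
# linear injury model (40.1), with updates in the spirit of Eq. (40.4).
import numpy as np

def fit_linear_gd(X, y, eta=0.05, epochs=1000):
    """X: (n_samples, k) matrix of injury-related variables; y: injury score."""
    n, k = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])         # prepend bias column for beta_0
    beta = np.zeros(k + 1)
    for _ in range(epochs):
        grad = Xb.T @ (Xb @ beta - y) / n        # gradient of mean squared error
        beta -= eta * grad                       # beta(n+1) = beta(n) - eta * grad
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                    # e.g. load, joint angle, fatigue
true_beta = np.array([0.5, 1.0, -0.7, 0.3])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)
print(np.round(fit_linear_gd(X, y), 2))          # recovers approximately [0.5, 1.0, -0.7, 0.3]
```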

40.3 Simulation Analysis The subjects are professional sports students from all over China, among whom 60% are national first-class sports students. A total of 1250 respondents, including 640 men and 610 women. The contestants are all young people aged 16–25. Based on statistical analysis, this study uses HLM analysis software to analyze the test results. Using this method, the risk factors of sports injury are evaluated, and the risk scores of injuries caused by high-level athletes in different competitive events are ranked by the list ranking method. Using this model and the traditional model respectively, the prediction experiments of vulnerable body parts in strenuous exercise are carried out, and the prediction errors of vulnerable body parts in strenuous exercise are compared between the two models. The comparison results are shown in Fig. 40.3. As shown in Fig. 40.3, the prediction error of vulnerable body parts in strenuous exercise with this model is lower than that with traditional model. This is mainly because the model in this paper first obtains the internal relationship between the sports injury attribute and the movement amplitude of the body parts, and establishes the prediction model of the easily injured body parts in strenuous exercise through the calculation results, which makes the prediction error of the easily injured body parts in this model in strenuous exercise low. On this basis, based on the accuracy of sports injury assessment, big data processing technology and big data processing technology, a sports injury assessment method is established. Figure 40.4 shows the comparison of the accuracy of the two models in evaluating sports injuries. The calculation of an example shows that the prediction accuracy of this method is higher than that of the conventional prediction method. Experiments show that the sports injury evaluation model based on big data and network technology can objectively and accurately evaluate sports injuries, and can effectively solve the problems existing in traditional evaluation methods.


Fig. 40.3 Prediction error comparison

Various measures and methods can be taken to prevent the injury risk of track and field athletes, so as to eliminate or reduce the possibility of injury risk factors of athletes, or reduce the losses caused by these injury risk factors. In the process of sports injury, athletes should take corresponding protective measures according to their own characteristics. Coaches and athletes usually have two strategies when occupational safety accidents occur: avoiding danger and reducing danger. When the weather is cold, you should do more preparatory activities, fully stretch and warm up, and pay attention to keeping warm. When the site is slippery, you should pay more attention to taking anti-skid measures to reduce the risk of injury. Excessive local body load will also increase the probability of injury. Therefore, coaches should control the amount of exercise training and the tolerance of local organizations when making exercise training plans to reduce the risk of injury. Strengthen the risk awareness education of sports injuries. Make a portfolio of athletes’ sports injury risk events, study the quantitative probability of sports injury risk factors, and get the correlation between sports injury and sports performance, so as to increase the attention of relevant personnel to sports injury. Strengthen the investigation and understanding of athletes’ physiology, psychology, training, technology and daily routine, and prevent sports injuries through communication. Through study and training, coaches teach athletes to master the necessary methods of dealing with sports injuries and first aid.


Fig. 40.4 Comparison results of model estimation accuracy

40.4 Conclusion With the development of data technology, data models have been applied to sports injury diagnosis, judgment and evaluation. In view of this, this project uses big data analysis methods to construct a sports injury evaluation model based on big data. The research results provide an important theoretical basis for the prevention and treatment of sports injuries and the protection of athletes' personal safety. The results show that the prediction error of this model for vulnerable body parts is lower than that of the conventional model. Moreover, the prediction accuracy curve of the model in this study is always higher than that of the conventional model, indicating that the prediction accuracy of this research model is higher. Therefore, it can be confirmed that sports injuries can be qualitatively and quantitatively evaluated by using big data and network technology.


References 1. L. Chen, Vision-based sports injury assessment method. Mod. Electron. Technol. 41(4), 4 (2018) 2. S. Yang, Research on sports injury assessment model based on big data network. Mod. Electron. Technol. 41(6), 4 (2018) 3. S. Li, D. Ma, S. Li, Sports injury assessment model based on big data. Mod. Electron. Technol. 41(6), 4 (2018) 4. Jiebo, Evaluation model of sports injury degree based on complex network model. Mod. Electron. Technol. 041(006), 165–168 (2018) 5. W. Yao, J. Ding, Imaging evaluation of ankle joint injury caused by excessive exercise. Chin. J. Radiol. 54(3), 4 (2020) 6. L. Chen, Visual-based sports injury assessment method network first. Mod. Electron. Technol. (2018) 7. L. Shao, Simulation of sports injury risk assessment based on association rules. Mod. Electron. Technol. 41(10), 4 (2018) 8. H. Yuan, J. Xu, J. Zeng et al., Screening, intervention and evaluation of anterior cruciate ligament injury under the framework of prevention sequence model. China Tissue Eng. Res. 26(17), 7 (2022) 9. C. Emery, N. Wang, L. Fu, Neuromuscular training reduces sports injuries. China Rehabil. 35(12), 659–659 (2020) 10. S.L. Zuckerman et al., Functional and structural traumatic brain injury in equestrian sports: a review of the literature. World Neurosurg. (2015) 11. Z.Y. Kerr, C.R. Dawn, T.P. Dompier et al., The first decade of web-based sports injury surveillance (2004–2005 through 2013–2014): methods of the national collegiate athletic association injury surveillance program and high school reporting information online. J. Athl. Train. (2018). https://doi.org/10.4085/1062-6050-143-17 12. K. Coxe, K. Hamilton, H.H. Harvey, J. Xiang, M.R. Ramirez, J. Yang, Consistency and variation in school-level youth sports traumatic brain injury policy content. J. Adolesc. Health 62(3), 255–264 (2018)

Chapter 41

Construction of Purchase Intention Model of Digital Music Products Based on Data Mining Algorithm Wenjie Hu, Chunqiu Wang, and Xucheng Geng

Abstract Consumers have generated a lot of behavioral data on the music platform, which makes it possible to analyze consumers’ purchase intentions and consumption habits. How to quickly and effectively locate the corresponding consumers from the vast sea of people and establish a more targeted marketing plan is an important link that music resource platforms have to consider in the next competition and development. In this paper, driven by big data, combined with KNN algorithm, the related product purchase data of users in the past online consumption are mined, the purchase intention model of digital music products is constructed, the purchase behavior of users’ digital music products is predicted, and then the precise marketing is implemented. The simulation results show that this algorithm is more accurate in predicting consumers’ purchasing behavior of online music platform, which is 25.67% higher than the traditional ID3 algorithm. The powerful information retrieval and recommendation function of recommendation system provides consumers with more comprehensive and personalized information, which enables consumers to evaluate the product function, performance and price rationality more deeply and accurately, thus reducing the perceived deviation between different brands. Keywords Data mining · Digital music products · Consumers · Purchase intention · Precision marketing

W. Hu · C. Wang (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] X. Geng Finance Office of Huaihua University, Huaihua, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_41


41.1 Introduction Music, as an art that reflects human emotions, has conquered people with extraordinary attraction since its emergence and won the favor of the public. Big data is gradually changing the way people live, work and think, leading people into a new era of resource integration [1]. Due to the promotion of internet technology, the scale of digital music market presents a high-speed development model, and at the same time, the infringement of new digital music products gradually breaks out. Internet technology is introduced into the music industry, and musicians use the Internet to spread music and get rid of the technical restrictions of traditional dealers [2]. The initial “destructiveness” of new technology leads to the “crisis” of the traditional mode and trading structure of the music industry, but it also brings new opportunities to reshape the operating structure of the music industry [3]. Through the Internet, enterprises can know all kinds of demand information in the market in time and accurately, thus solving the problem of information asymmetry in the traditional market [4]. Judging from the development trend of the music industry, it is urgent for the industry to achieve innovation in two fields. First, it is necessary to design and improve the legal framework to protect the copyright of digital music [5]. Secondly, participants in the music industry must reconsider the business model in order to achieve profitability and remain competitive. With the help of online purchase behavior analysis, marketers can show digital music products to users more attractively and stimulate users’ desire to buy [6]. The emergence of big data has brought about great changes in technology around the world, and people have had a great impact on their lifestyles and the way of cognitive thinking in the world [7]. In the consumer market, when choosing a music channel, consumers generally consider not only a single factor such as price or quality, but also its cost performance. Although there is a certain difference between pirated music and genuine music in quality, consumers are more inclined to choose pirated music with low price compared with its far-reaching cost [8]. How to quickly and effectively locate the corresponding consumers from the vast sea of people and establish a more targeted marketing plan is an important link that the music resource platform has to consider in the next competition and development [9]. In the past, identifying and analyzing customers’ purchase behavior of financial products basically used the form of questionnaire survey to investigate customers’ behavior intentions, but the accuracy was relatively low and the workload was heavy [10]. In this paper, the key factors affecting the innovation of digital music business model are mined. Driven by big data and combined with KNN algorithm, the purchase data of related products of previous users during online consumption are mined, and the purchase intention model of digital music products is constructed.


41.2 Methodology 41.2.1 Features of Digital Music Products Music, as an art to express human emotions, is widely spread with time and space. The popularity of the Internet provides a broader and more efficient communication platform for music communication. At the same time, modern multimedia technology provides a peer-to-peer multimedia platform for everyone. The integration of network and music makes music generate more powerful [11]. According to the requirements of online marketing business and the characteristics of data mining, the online purchase behavior analysis system mainly studies and discusses data preprocessing, data statistical analysis, data model mining and mining results display and comparison. In the traditional music resource platform consumer shopping mode, because it is easy to transmit information, consumers can browse the latest consumer product information and just stay at home, which provides great advantages for shopping convenience. Usually, online marketing data includes customer information, transaction information, product/service information, supplier information and security information. The amount of these data is very large and will continue to grow every day, and most of them have the characteristics of diversification, redundancy and incompleteness in statistical methods. In view of this feature, the online purchase behavior analysis system must preprocess the data to make the mining accuracy higher. Music, as a kind of information bearing human emotions, can only be stored and reproduced in kind in the traditional music era, so the growth of music is largely limited by this [12]. In today’s internet age, the growth of science and technology and the internet all over the world make information interconnected and resources interoperable, and music is thus transformed into a mode that can be listened to at any time when there is a network, without being limited by other factors. Usually, when evaluating and forecasting high-dimensional sample data, it is necessary to select features through statistical analysis first, and get the influence law of system attributes on the system to help people make more accurate mining and forecasting. The main data of online purchase behavior analysis system comes from WEB sites, and the behavior is based on the websites of major music resource platforms. After data preprocessing and statistical analysis, the formatted data is used for pattern discovery, association rule analysis, sequential pattern discovery, access path and classification cluster analysis.

41.2.2 Digital Music Product Purchase Prediction Model Consumers consume virtual goods through the network virtual environment, so as to obtain psychological and spiritual enjoyment. For intangible cultural consumer goods such as digital music, its virtual value should be the most concerned aspect for enterprises and consumers. Driven by big data, more and more enterprises provide


or use elastic cloud services to help complete the storage and analysis of massive data. Since its birth, the Internet has spread all over our lives with lightning speed, and it has played a decisive role in the spread of digital music. Since its launch, a piece of music has been endowed with infinite possibilities by the powerful functions of the Internet, and it can be downloaded and listened to indefinitely without being limited by physical carriers and time and space [13]. In view of the unique nature of the online market, information such as customers’ visits, consumption and feedback can be well counted and stored, so it is more convenient to mine online consumer data and more efficient to apply it. Looking at the growth of music, the symbol of its development and progress is to expand to a wider range, so that more and more people can get in touch with and enjoy music. This trend is also illustrated from music tapes, CDs, walkman and now popular mobile clients. Whether you like music or not, the degree of it also has a great influence on the purchase intention of digital music products. The affinity of music mainly includes the affinity of theme, theme and image. Consumers tend to be more inclined to buy products that can get more customer value, so that they can be satisfied to the greatest extent. This paper constructs and implements a purchase intention model of digital music products based on KNN algorithm, and studies the extraction of feature items in digital music products and the distance calculation of feature items in vector space. The system block diagram is shown in Fig. 41.1. The results of data mining can not be directly used as the terminal of data mining model. When the results are applied, the application effect feedback should be carried

Fig. 41.1 Purchase intention model of digital music products
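A minimal sketch of a KNN-based purchase-intention classifier is given below. It is an illustration under stated assumptions rather than the paper's system: the behavior features (browse, favorite, add-to-cart, play counts), the tiny toy sample and k = 3 are all hypothetical choices for demonstration.

```python
# Minimal sketch: KNN purchase-intention classification over user-behavior features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# rows: one user-product pair; columns: browse, favorite, add-to-cart, plays
X = np.array([[5, 1, 0, 3], [12, 3, 1, 9], [1, 0, 0, 0], [8, 2, 1, 4], [2, 0, 0, 1]])
y = np.array([0, 1, 0, 1, 0])            # 1 = purchased the digital music product

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)

new_user = np.array([[10, 2, 1, 6]])
print(model.predict(new_user), model.predict_proba(new_user))
```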


out to continuously adjust the data mining algorithm, so as to obtain a more optimized data mining algorithm and maximize the mining effect. User portraits need a lot of user data. Before data collection, it is necessary to list the types of data to be collected, and then divide different types into different modules, and each module has a code to record the corresponding data. When a user enters a website, all the user’s behaviors will be recorded to form a log of the website, thus forming a data set for building a portrait of the user. The construction process of user portrait of music digital products is shown in Fig. 41.2. Because data mining needs standard data to operate, it is necessary to sort out the data that has been screened for the first time, and only after unifying the standards can a data warehouse be formed for data mining operations. Then, the data warehouse is deeply mined by using data mining technology to obtain more important information or association rules. Then, these mining results are reasonably explained and evaluated to get the data mining results. Fishbein model can be specifically described by the following formula:

Fig. 41.2 User portrait construction process


$A_0 = \sum_{i=1}^{N} b_i e_i$  (41.1)

where $A_0$ is the user's attitude towards a specific digital music product; $b_i$ is the strength of the user's belief about attribute i of the product; $e_i$ is the evaluation of attribute i; and N is the number of salient beliefs related to this digital music product. Support is the probability that the set (X, Y) appears in the total set I:

$\mathrm{support}(X \to Y) = \dfrac{P(X \cup Y)}{P(I)} = \dfrac{P(X, Y)}{P(I)}$  (41.2)

Confidence indicates the probability of deducing Y from the association rule given that event X occurs:

$\mathrm{confidence}(X \to Y) = P(Y \mid X) = \dfrac{P(X \cup Y)}{P(X)} = \dfrac{P(X, Y)}{P(X)}$  (41.3)

Lift (the degree of promotion) is the ratio of the probability of Y given X to the overall probability of Y:

$\mathrm{lift}(X \to Y) = \dfrac{P(Y \mid X)}{P(Y)}$  (41.4)
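The three rule metrics of (41.2)-(41.4) can be computed directly from a transaction list, as in the short sketch below; the transactions and item labels are illustrative, and the rule X → Y is evaluated over sets of behavior or item identifiers.

```python
# Minimal sketch: support, confidence and lift (Eqs. 41.2-41.4) for a rule X -> Y.
def rule_metrics(transactions, x, y):
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)        # transactions containing X
    n_y = sum(1 for t in transactions if y <= t)        # transactions containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= t) # transactions containing both
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift

transactions = [{"pop", "single"}, {"pop", "album"}, {"jazz"}, {"pop", "single"}]
print(rule_metrics(transactions, x={"pop"}, y={"single"}))   # (0.5, 0.667, 1.333)
```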

The rapid progress of digital technology has laid a solid foundation for the emergence and growth of online music, and its powerful storage function enables music lovers to enjoy their favorite songs anytime and anywhere. The interactive communication of the Internet is different from the mass media we used before. It can complete the interactive behavior process only by relying on its own platform, without the help of other social public systems. It provides a completely equal platform for the public to enjoy music. When online consumers visit a shopping website, online merchants summarize the statistical information of users by corresponding reasonable statistical tools to form a large user information database.

41.3 Result Analysis and Discussion Digital music, like the internet, has virtual characteristics: consumers cannot judge its quality by observing its appearance but only through externally perceived information, which affects their purchase intention [14]. After years of development, the digital, virtual three-dimensional space created by the internet has largely integrated global resources and penetrated all aspects of social life. To avoid losing important information by simply excluding special data, quantitative data with large differences in distribution intervals are discretized, as shown in Fig. 41.3.

Fig. 41.3 Data outlier removal processing
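A minimal sketch of this preprocessing step, under assumed column names, is shown below: outliers are clipped with the interquartile-range rule and the clipped values are discretized into equal-frequency bins. This illustrates the kind of processing shown in Fig. 41.3, not the paper's exact procedure.

```python
# Minimal sketch: IQR-based clipping plus equal-frequency discretization.
import pandas as pd

def clip_and_bin(df, column, n_bins=5):
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    clipped = df[column].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    # equal-frequency bins keep skewed play/purchase counts comparable
    return pd.qcut(clipped, q=n_bins, labels=False, duplicates="drop")

df = pd.DataFrame({"play_count": [1, 2, 2, 3, 5, 8, 13, 400]})
df["play_count_bin"] = clip_and_bin(df, "play_count")
print(df)
```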

The information overload of the big data era can be addressed by information technology itself. The most important method is to establish a recommendation system that provides the most suitable information to users at the right time and place. The powerful information retrieval and recommendation functions of a recommendation system provide consumers with more comprehensive and personalized information, enabling them to evaluate product function, performance and price rationality more deeply and accurately, thus reducing the perceived deviation between different brands. The recall and accuracy of the algorithm in predicting online music platform consumers' purchase behavior are compared in Figs. 41.4 and 41.5. The test results show that the algorithm is more accurate in predicting consumers' buying behavior on the online music platform, 25.67% higher than the traditional decision tree (ID3) algorithm. Building on the stages of precision marketing, all the conditions and indicators that originally defined customer characteristics are re-examined, the target market is further screened along multiple dimensions, the functions of systematic target-market analysis and exploration are brought into full play, the marketing potential and target market scale of the various products are analyzed and grasped in a timely manner, and the relevant marketing strategies are adjusted promptly according to the analysis results.


Fig. 41.4 Comparison of prediction recall

Fig. 41.5 Comparison of prediction accuracy

41.4 Conclusion Digital music industry not only contains deep-rooted traditional problems, but also contains new problems arising from the innovation of business model. Some problems cannot be solved by self-discipline, which is a supplement to the research on the growth of music industry. Due to the promotion of Internet technology, the scale of digital music market presents a high-speed development model. At the same time, the infringement of new digital music products gradually breaks out. In this paper, driven by big data, combined with KNN algorithm, the purchase data of related products


in the past online consumption of users are mined, and the purchase intention model of digital music products is constructed. The results show that the algorithm is more accurate in predicting consumers’ buying behavior of online music platform, which is 25.67% higher than the traditional ID3 algorithm. The powerful information retrieval and recommendation function of recommendation system provides consumers with more comprehensive and personalized information, which enables consumers to evaluate the product function, performance and price rationality more deeply and accurately. In the future, it is necessary to apply the cyclic steps of adjusting the data mining algorithm in the data processing model of related products, continuously optimize the data mining algorithm, and gradually eliminate the impurity data in the database to make the data mining results more reliable.

References 1. Y. Zhao, J. Qi, J. Qian, Research on innovation of agricultural products marketing model in the digital economy era. Trade Exhibit. Econ. 2022(13), 3 (2022) 2. X. Shi, Research on innovation of agricultural products marketing model under the background of digital economy. Inf. Syst. Eng. 2022(2), 4 (2022) 3. T. Cheng, Analysis of export strategy of music and cultural products based on SWOT model. Comp. Study Cult. Innov. 6(12), 4 (2022) 4. X. Ding, Q. Gan, S. Bahrami, A systematic survey of data mining and big data in human behavior analysis: current datasets and models. Trans. Emerg. Telecommun. Technol. 2022(9), 33 (2022) 5. H.B. Jamie, B. Filippo, P. Carmen et al., Grocery store interventions to change food purchasing behaviors: a systematic review of randomized controlled trials. Am. J. Clin. Nutr. 107(6), 1004–1016 (2018) 6. I.H. Sarker, A. Colman, M.A. Kabir et al., Individualized time-series segmentation for mining mobile phone user behavior. Comput. J. 61(3), 349–368 (2018) 7. X. Xu, S. Zeng, Y. He, The influence of e-services on customer online purchasing behavior toward remanufactured products. Int. J. Prod. Econ. 187(5), 113–125 (2017) 8. Y.C. Wu, G. Lin, J.R. Lin, The interaction between service employees and customers toward the increase of purchase intention: an evidence from service industry. J. Sci. Ind. Res. 78(1), 26–30 (2019) 9. M. Attari, B. Ejlaly, H. Heidarpour et al., Application of data mining techniques for the investigation of factors affecting transportation enterprises. IEEE Trans. Intell. Transp. Syst. 2021(99), 1–16 (2021) 10. Q. Sun, X. Huang, Z. Liu, Tourists’ digital footprint: prediction method of tourism consumption decision preference. Comput. J. 2022(6), 6 (2022) 11. H. Liu, Big data precision marketing and consumer behavior analysis based on fuzzy clustering and PCA model. J. Intell. Fuzzy Syst. 40(2), 1–11 (2020) 12. J. Liang, Research on innovation of marketing model of digital publishing products. Publ. Sci. 25(3), 4 (2017) 13. C. Ma, J. Wang, B. Wang et al., Personalized recommendation helps digital marketing of operators. Telecommun. Sci. 34(1), 7 (2018) 14. J. Tian, Z. Wang, Z. Wang, Research on the innovation of new retail marketing model of agricultural products under the background of digital economy. Commer. Econ. Res. 2022(15), 4 (2022)

Chapter 42

English Learning Analysis and Individualized Teaching Strategies Based on Big Data Technology Yang Wu

Abstract With the advancement of information technology, various industries have experienced rapid development, but the education sector has lagged behind in embracing information-driven progress. The majority of educational institutions continue to rely on traditional teaching models, where teachers assess and make instructional decisions based on their personal experiences, akin to the blind men feeling the elephant. Consequently, the need to leverage data technology for learning analysis, implement personalized teaching approaches in Teaching English to Speakers of Other Languages (TESOL), and acknowledge students’ individual differences has become a pressing concern in TESOL research. This paper presents an overview of the development background and significance of data mining technology, the process of data mining, and the mining tools employed in this study. Furthermore, it introduces the creation of a personalized online platform and relevant theories. Subsequently, a controlled experiment was conducted, utilizing students’ online learning behaviors. The experimental group consisted of Grade 7 (1) students, while Grade 7 (2) students formed the control group. Through comparative analysis, it was established that after personalized learning facilitated by a WeChat network platform, the average scores of the experimental group surpassed those of the control group. Notably, significant changes in learning attitudes were observed as well. Initially, only 20 students expressed interest in English learning, whereas during the experiment, this number increased to 39 students demonstrating an interest in English learning. Keywords Learning analysis · Personalized teaching · Big data technology · English teaching strategy

Y. Wu (B) School of Foreign Languages, Huaihua University, Hunan 418000, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_42


42.1 Introduction The study addresses the problem of English learning analysis and proposes the use of big data technology to develop individualized teaching strategies. Traditional teaching methods often struggle to meet the diverse learning needs of English language learners, prompting the need for a new approach. By leveraging big data technology, the researchers aim to collect and analyze learner data in a comprehensive manner, leading to the creation of personalized and effective teaching strategies. Analyzing learner data and tailoring teaching methods have significant implications for enhancing English language acquisition. By analyzing learner data, educators gain valuable insights into individual students’ strengths, weaknesses, and learning preferences. This information enables them to customize teaching approaches that are better aligned with students’ specific needs, ultimately improving their language learning outcomes. Individualized teaching methods have the potential to create a more engaging and motivating learning environment, fostering greater language proficiency and overall student success. English learning analysis presents challenges that require innovative approaches. Traditional methods often rely on standardized assessments that provide limited insights into individual learning patterns and preferences. Additionally, the large amount of data generated in English language learning contexts poses challenges in terms of data collection, management, and analysis. Existing methods may fall short in addressing these complexities, underscoring the importance of leveraging big data technology to overcome limitations. The study utilizes big data technology to facilitate English learning analysis and develop individualized teaching strategies. By harnessing the power of big data, researchers aim to obtain a comprehensive understanding of learners’ language abilities, progress, and needs. This technology allows for the collection, storage, and analysis of large volumes of learner data, including language proficiency assessments, learning behaviors, engagement levels, and socio-cultural factors. Through the use of advanced algorithms and machine learning techniques, this data can be processed to identify meaningful patterns and trends, informing the development of personalized teaching strategies. The statistical and computational methods employed in the study ensure the appropriate analysis of the data. Data mining techniques are used to extract valuable insights from large-scale learner datasets, uncovering hidden patterns and relationships. Machine learning algorithms enable the creation of predictive models that can forecast learners’ future performance and guide decision-making in designing individualized teaching approaches. Statistical modeling techniques are applied to analyze the relationships between different learner variables, providing a deeper understanding of the factors influencing English language acquisition. These methods ensure the rigor and validity of the analysis, supporting evidence-based decisionmaking to enhance English language learning outcomes. This is precisely the value offered by learning analysis technology [1, 2]. Both domestically and internationally, research on “learning analysis technology” has been rapidly progressing. Birjali, for


instance, emphasized the significance of e-learning as a crucial educational method that greatly impacts the learning process and determines its relevance. To explore new learning methods and strategies, Birjali proposed an adaptive e-learning model based on DT technology [3]. In a similar vein, Su presented an innovative approach to optimize online education data mining using DT evaluation technology. This method includes an auxiliary module in traditional online course evaluation technology. It measures learning effectiveness by calculating the learners’ exercise completion for each lesson, thereby formulating a corresponding teaching plan. DT evaluation technology is employed to determine the optimal amount of online education based on the characteristics and needs of the subjects being evaluated [4]. Additionally, Yin proposed a grid clustering algorithm based on discrete Morse theory. By integrating data mining techniques with discrete Morse theory, this algorithm achieves efficient cluster analysis. Optimizing the grid clustering process using critical points from discrete Morse theory leads to the best clustering outcomes. Physical education (PE) teachers can utilize this teaching method to prioritize students’ physical health and enhance the teaching effectiveness. While these studies have shed light on the importance and urgency of learning analysis technology in large-scale education data applications, they have yet to establish a clear framework for learning analysis technology in the context of DT or design a research trajectory for applying learning analysis technology to educational data. Therefore, building upon existing DT analysis methods, this project aims to focus on the extensive data generated in online teaching as the research subject and establish a comprehensive learning analysis technology system for educational applications.

42.2 Introduction of Educational Big Data and Related Technologies 42.2.1 Big Data for Education DT, which stands for Big Data Technology, refers to a data set that cannot be captured, managed, and processed by conventional software tools. It is characterized by its massive volume, diverse nature, high velocity, and inherent value [6, 7]. In the context of education, Educational DT encompasses all the digital, textual, visual, audio, and video data generated throughout the educational process. This includes students’ basic information, academic test results, activity records, learning behaviors, as well as teachers’ profiles, instructional designs, and curriculum plans. These continuously accumulating and contextually relevant datasets hold significant analytical value for education and teaching. The application of DT in education is poised to bring profound changes to educational concepts, teaching methodologies, and evaluation systems [8, 9].


42.2.2 Education Data Mining

Within the realm of data mining, various technologies exist; this paper primarily focuses on association rule analysis.

(1) Association rule analysis

Association rules are specific rules inherent in a data set. Through data mining techniques, association rules can be discovered within the data set; they represent a critical component of data mining and are widely considered one of the most significant models used in this field [10, 11]. An association rule can be defined as follows: assume an implication of the form X → Y, where N = {k_1, k_2, ..., k_j} is the transaction data set and X and Y are two subsets of N. To measure X → Y, confidence and support are used. Confidence is the probability that one transaction appears at the same time as another; it is the main index for mining the correlation between data transactions. Support is the frequency of an item set in the whole data set; it expresses how common a rule is in the data set and thus determines the degree of correlation of the rule. Support and confidence can be expressed as [13]:

support(X → Y) = T(X ∪ Y) / N_S    (42.1)

confidence(X → Y) = T(X ∪ Y) / T(X)    (42.2)

where T(·) is the number of transactions containing the given item set and N_S is the total number of transactions in the data set N.
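To make Eqs. (42.1) and (42.2) concrete, the following minimal Python sketch computes support and confidence for a candidate rule over a small transaction set; the transactions and the rule X → Y are hypothetical learning-behavior examples, not data from this study.

```python
from typing import FrozenSet, List

def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions containing every item in `itemset`: T(itemset) / N_S."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions: List[FrozenSet[str]], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Confidence of the rule X -> Y, i.e. T(X ∪ Y) / T(X)."""
    t_xy = sum(1 for t in transactions if (x | y) <= t)
    t_x = sum(1 for t in transactions if x <= t)
    return t_xy / t_x if t_x else 0.0

# Hypothetical learning-behavior transactions (one set of events per learner session)
transactions = [
    frozenset({"watched_video", "did_exercise", "passed_quiz"}),
    frozenset({"watched_video", "passed_quiz"}),
    frozenset({"did_exercise"}),
    frozenset({"watched_video", "did_exercise"}),
]
x, y = frozenset({"watched_video"}), frozenset({"passed_quiz"})
print(support(transactions, x | y))    # support(X -> Y) = 2/4 = 0.5
print(confidence(transactions, x, y))  # confidence(X -> Y) = 2/3 ≈ 0.67
```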

42.2.3 The Relationship Between English Teaching and Big Data The connection between TESOL and DT manifests in the following aspects: DT enables TESOL to gather and sift through information effectively, make scientific predictions and judgments, and provide personalized education and guidance. Data, on its own, lacks practical significance. Its purpose lies in extracting valuable information through the processing and analysis of vast datasets, transforming raw data into meaningful insights. However, the utilization of data also poses certain challenges to TESOL, such as the potential dissemination of negative ideas and the compromise of students’ privacy. Educators should leverage DT to collect and filter valuable information, shield against the influence of irrelevant and harmful content, and prioritize the education of mainstream values for students and users, ensuring that prevailing ideas and public opinions occupy a central position.


42.3 Application of Data Mining Technology in Learning Analysis 42.3.1 Learning Analysis Techniques Currently, the field of learning needs technology relies on data mining, utilizing abundant educational data to analyze learners’ learning behaviors. The primary focus of learning needs technology is to examine data at various levels, including learners’ information, learning processes, learning environments, and their interrelationships. Building upon educational data mining techniques and incorporating educational data technology, this field harmoniously integrates various models and methods. This integration aims to ensure the reliability and comprehensiveness of the learning needs assessment process and outcomes [14, 15].

42.3.2 Construction of Learning Analysis Technology Model

The learning needs model within the DT environment encompasses seven interconnected stages: data acquisition, data storage, data cleaning, data integration, data analysis, data visualization, and operationalization. Each stage can be further broken down into a series of technologies and processes. Figure 42.1 depicts the relationships among these seven stages in the learning needs model.

Fig. 42.1 Learning model (a data loop connecting the stages: data collection and acquisition, storage, cleaning, integration, analysis, (visual) representation, and action)

In terms of testing methods, this study focuses on correlation analysis. Correlation refers to the relationship between two variables. According to the scatter plot of the data distribution, two variables a and b can show three types of correlation: positive correlation, negative correlation, and no correlation. The correlation coefficient is calculated as:

w(a, b) = cov(a, b) / (θ_a θ_b) = F[(a − γ_a)(b − γ_b)] / (θ_a θ_b)    (42.3)

where γ_a and γ_b are the means of a and b, θ_a and θ_b are their standard deviations, and F[·] denotes the expectation.
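As a minimal illustration of Eq. (42.3), the sketch below computes the correlation coefficient between two learner variables from paired observations; the sample values are hypothetical.

```python
import math

def correlation(a, b):
    """Correlation coefficient w(a, b) = cov(a, b) / (theta_a * theta_b), as in Eq. (42.3)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n
    theta_a = math.sqrt(sum((ai - mean_a) ** 2 for ai in a) / n)
    theta_b = math.sqrt(sum((bi - mean_b) ** 2 for bi in b) / n)
    return cov / (theta_a * theta_b)

# Hypothetical paired data: weekly minutes on the platform vs. quiz score
minutes = [30, 45, 60, 20, 75]
scores  = [62, 70, 85, 55, 90]
print(round(correlation(minutes, scores), 3))  # close to +1, i.e. positive correlation
```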

42.4 Application of Data Mining Technology in Personalized English Teaching 42.4.1 Teaching Design Using the learning needs process model developed in this study, the data generated by students during their online learning activities is collected and analyzed. The analyzed data is then provided as feedback to both teachers and students. Teachers can utilize this feedback to intervene promptly when students encounter learning difficulties, assist them in modifying ineffective learning behaviors, and identify teaching challenges in a timely manner. Meanwhile, students can adapt their learning approach based on the received information.

42.4.2 Use Learning Analysis Technology to Analyze Personalized Data

On the network platform, students generate several kinds of personalized data: personality characteristic information, learning process information, and interaction information. On this basis, social network analysis is applied to examine how this information is distributed across the online learning process, yielding each user's activity level on the discussion board (BBS) and the pattern of information interaction between users.
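One simple way to obtain such activity measures is sketched below: a hypothetical interaction log from the platform's discussion board is counted to give each user's overall activity (degree) and the number of messages they initiate.

```python
from collections import Counter

# Hypothetical interaction log from the course discussion board: (sender, receiver)
interactions = [
    ("stu01", "stu02"), ("stu02", "stu01"), ("stu03", "stu01"),
    ("stu01", "stu04"), ("stu02", "stu04"), ("stu01", "stu02"),
]

activity = Counter()    # total interactions a user takes part in (degree)
out_posts = Counter()   # messages a user initiates
for sender, receiver in interactions:
    activity[sender] += 1
    activity[receiver] += 1
    out_posts[sender] += 1

# Rank users by overall activity to see who drives information exchange
for user, degree in activity.most_common():
    print(user, degree, out_posts[user])
```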

42.4.3 Personalized Data Feedback Utilizing the learning needs process model proposed in this paper, the data gathered from students’ engagement with the online learning platform is collected and analyzed. Subsequently, the analyzed data is provided as feedback to both teachers and students. Teachers can effectively intervene in students’ learning challenges


based on the feedback data, offering guidance to rectify unproductive learning behaviors and promptly identify teaching issues. Simultaneously, students adjust their learning approach based on the received information.

42.5 English Teaching Practice Constructivism theory emphasizes that students are active participants in the learning process, with teachers assuming the role of guides rather than sole authorities on knowledge. In TESOL, the goal extends beyond mere memorization of words, phrases, and sentence patterns; it involves students learning to apply this linguistic knowledge under the guidance of their teachers. By integrating new knowledge with their existing understanding, students construct new knowledge. However, the objectivity of the newly constructed knowledge requires collaboration, communication, and discussion among students. To validate the efficacy of the learning needs model proposed in this paper, this study adopts the WeChat network platform, leverages learning needs technology to its fullest extent, reconstructs the classroom teaching approach, and realizes intelligent TESOL in the era of data technology.

42.5.1 Research Object Selection and Analysis

In this study, two Grade 7 classes in a middle school were selected for the experiment: Grade 7 (1) served as the experimental class and Grade 7 (2) as the control class. Both classes are taught by the same English teacher, and the students' final grades are used for the comparative analysis. The specific differences are shown in Table 42.1.

Table 42.1 English level of students in the experimental class and control class before practice (the last three columns give the number of students in each score band)

Class                        | Num. | Average score | Max score | Min score | ≥ 100 | 100–75 | ≤ 75
Experimental class (Grade 7) | 45   | 88.5          | 115       | 40        | 15    | 23     | 7
Control class (Grade 7)      | 46   | 89.3          | 119       | 52        | 16    | 25     | 4

As depicted in Table 42.1, prior to implementing the WeChat network platform to support TESOL, the experimental class and the control class achieved similar outcomes, with the control class performing slightly better overall. In middle school, English classes have predominantly been confined to the classroom, so the traditional teaching approach struggles to accommodate individual differences among students. Based on English teachers' instructional experience, it was observed that some students, despite possessing initial English-speaking abilities, struggled to maintain focus in class and grew disinterested in English. Thus, in this survey, there was minimal disparity in English proficiency levels between the selected classes.

Table 42.2 T-test of independent samples of the final scores of Class (1) and Class (2) in Grade 7 (n = 45, mean ± SD)

Final grade | Experimental class (Grade 7) | Control class (Grade 7) | T     | P
            | 94 ± 18.3624                 | 91 ± 20.3812            | 0.421 | 0.530

Note: * p < 0.05, ** p < 0.01, *** p < 0.001

42.5.2 Analysis of the Effect of Personalized Learning Supported by WeChat Network Platform From Table 42.2, we can see that the average score of English level of Class 7 (1) is higher than that of Class 7 (2) after personalized learning with the support of WeChat network platform this semester. Through independent sample t test, it is concluded that T = 0.421 and P = 0.530 > 0.05, so it can be known that there is no significant difference between experimental class and control class in final scores. As can be seen from Fig. 42.2, after this experiment, students’ general attitude towards English learning has been improved, especially their interest in English is greatly increased, their enthusiasm for English learning is high, and they have more confidence in English learning. It also shows that with the help of the platform, students have developed a greater interest in and showed great love for English courses.
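For reference, the comparison in Table 42.2 corresponds to a standard independent-samples t-test. The sketch below reproduces the procedure on synthetic score samples drawn to match the reported class means and standard deviations (the raw per-student scores are not available here), using scipy.stats.ttest_ind as a stand-in for the test applied in the study.

```python
import numpy as np
from scipy import stats

# Synthetic final-score samples standing in for the two classes (the paper reports
# 94 ± 18.36 for the experimental class and 91 ± 20.38 for the control class).
rng = np.random.default_rng(0)
experimental = rng.normal(loc=94, scale=18.36, size=45)
control = rng.normal(loc=91, scale=20.38, size=46)

t_stat, p_value = stats.ttest_ind(experimental, control)  # independent-samples t-test
print(f"T = {t_stat:.3f}, P = {p_value:.3f}")
# P > 0.05 indicates no statistically significant difference in final scores.
```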

42.5.3 Discussion of Practical Results The use of WeChat and other Internet platforms for personalized English learning can not only help students better grasp English knowledge, but also effectively improve the teaching level of middle school English teachers, improve teaching quality, and enrich teaching evaluation methods. Through the comparison practice, it is proved that the evaluation result of the experimental class of Grade 7 (1) after the WeChat network platform assisted personalized learning is higher than the average score of the control class of Grade 7 (2). There are also great changes in learning attitude. There are only 20 students who are interested in English learning before practice, and 39 students who are interested in English learning during practice. In addition, students can also have an objective understanding of their own learning situation through the learning process data recorded by the platform and the analysis results


Fig. 42.2 Comparison of the change of students' attitude towards learning before and after practicing (bar chart of the number of students, pre-practice vs. after practice, for four learning-attitude items: taking the initiative to talk to the teacher about problems, building up confidence in learning English, taking the initiative to complete English tasks, and liking English study)

of the data. At the same time, students can also find their own shortcomings in learning according to the feedback of personalized data, so as to adjust their learning style.

42.6 Conclusion In the realm of DT, this study establishes a learning needs Tech model and explores how to effectively enhance English classroom teaching through the utilization of the WeChat network platform, thereby achieving the objective of “smart” teaching. By integrating learning needs Tech with the core educational practices and leveraging data to monitor and forecast students’ learning processes, the aim is to promptly identify potential learning challenges and hidden issues pertaining to teaching and learning. Subsequently, teaching decisions can be adjusted, and the instructional process can be optimized to facilitate intelligent teaching. Emphasizing the significance of data-driven approaches, educational reform is pursued, enabling highly intelligent teaching that genuinely caters to individual needs and promotes students’ cognitive development. Acknowledgements This work was supported by the Outstanding Youth Scientific Research Project of Hunan Provincial Education Department (No. 22B0768), the Teaching Reform Project of Hunan Province (HNJG-2021-0930).


References

1. R. Cobos, F. Jurado, A. Blazquez-Herranz, A content analysis system that supports sentiment analysis for subjectivity and polarity detection in online courses. Rev. Iberoam. Tecnol. Aprendiz. PP(99), 1 (2019)
2. J. Marantika, T.G. Ratumanan, E. Kissiya, The implementation of high school local content learning in Babar Island. Sci. Nat. 2(1), 057–065 (2019)
3. M. Birjali, A. Beni-Hssane, M. Erritali, A novel adaptive e-learning model based on big data by using competence-based knowledge and social learner activities. Appl. Soft Comput. 69(1), 14–32 (2018)
4. G. Su, Analysis of optimization method for online education data mining based on big data assessment technology. Int. J. Contin. Eng. Educ. Life-Long Learn. 29(4), 321–335 (2019)
5. Z. Yin, Outlier data mining model for sports data analysis based on machine learning. J. Intell. Fuzzy Syst. Appl. Eng. Technol. 40(2), 2733–2742 (2021)
6. G. Zhai, Y. Yang, H. Wang et al., Multi-attention fusion modeling for sentiment analysis of educational big data. Big Data Min. Anal. 3(4), 311–319 (2020)
7. W.S. Du, Multi-attention fusion modeling for sentiment analysis of educational big data. Big Data Min. Anal. 3(4), 311–319 (2020)
8. C. Li, W. Leite, W. Xing, Building socially responsible conversational agents using big data to support online learning: a case with Algebra Nation. Br. J. Educ. Technol. 53(4), 776–803 (2022)
9. W. Zhou, X. Xia, Z. Zhang et al., Association rules mining between service demands and remanufacturing services. Artif. Intell. Eng. Des. Anal. Manuf. 35(2), 1–11 (2020)
10. D. Brodic, A. Amelio, Association rule mining for the usability of CAPTCHA interfaces: a new study of multimedia systems. Multimed. Syst. 24(2), 1–20 (2018)
11. K. Li, L. Liu, F. Wang et al., Impact factors analysis on the probability characterized effects of time of use demand response tariffs using association rule mining method. Energy Convers. Manag. 197, 111891.1–111891.15 (2019)
12. R. Somyanonthanakul, T. Theeramunkong, Characterization of interestingness measures using correlation analysis and association rule mining. IEICE Trans. Inf. Syst. E103.D(4), 779–788 (2020)
13. H. Sang, Analysis and research of psychological education based on data mining technology. Secur. Commun. Netw. 2021(7), 1–8 (2021)
14. J. Zhu, C. Gong, S. Zhang et al., Foundation study on wireless big data: concept, mining, learning and practices. China Commun. 15(12), 1–15 (2018)
15. D. Kerzic, A. Aristovnik, N. Tomazevic et al., Assessing the impact of students' activities in e-courses on learning outcomes: a data mining approach. Interact. Technol. Smart Educ. (2019)

Chapter 43

Research on Distributed Storage and Efficient Distribution Technology of High Resolution Optical Remote Sensing Data Guozhu Yang, Wei Du, Wei Hu, Chao Gao, Enhui Wei, and Bangbo Zhao

Abstract With the increasing demand for high-resolution remote sensing images, many countries and companies attach great importance to the development of highresolution remote sensing satellites. The traditional data management mode is file system mode, which has shortcomings such as low reading and writing rates and slow transmission rates. With the development of database technology, data sharing is gradually improved, data redundancy is reduced, and data consistency and integrity are improved. Aiming at the problem of massive high-resolution optical remote sensing data integration in the scene of multi-satellite data center, this paper designs a task parallel processing mode based on cat swarm algorithm, which distributes independent processing tasks to cluster nodes to calculate their own execution, which has strong independence between nodes, higher reliability and easy to expand gradually. The results show that the algorithm effectively improves the storage efficiency of remote sensing images and ensures the integrity of data, which not only achieves the purpose of screening data but also improves the reading efficiency. The research results can meet the needs of efficient storage and management of massive images, and have good feasibility and scalability. Keywords Remote sensing data · Distributed storage · Cat swarm algorithm · Parallel processing

G. Yang · W. Du · W. Hu (B) · C. Gao · E. Wei · B. Zhao State Grid Electric Power Space Technology Co., Ltd., Beijing 102209, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_43


43.1 Introduction With the more and more progress of human science and technology and the deeper exploration of the earth, these activities make it more necessary for human beings to make rational use of the earth’s resources. Remote sensing technology, as an important technology for remote detection of the earth, has developed vigorously. Remote sensing information is playing an important role in promoting the development of human society in various fields, and its position in economic, political, military, cultural and many other fields is becoming more and more important [1]. The everincreasing remote sensing data are catalogued, stored and managed by different data centers due to institutional constraints. These data centers mainly include large-scale comprehensive data centers of satellite data production institutions, specialized data departments of universities and scientific research institutions, international data integration organizations and commercial organizations [2]. A series of multi-angle sensors with different temporal and spatial resolutions were launched, resulting in multi-platform, multi-sensor, multi-band and multi-temporal resolutions of remote sensing data, the volume of data has been exploding, and the types and structures of data have become more and more complicated [3]. Remote sensing data is a true description of everything on the surface, which records the transformation of the surface by human beings, and many human activities can be traced in remote sensing images. At present, it has become a trend to use big data to promote economic transformation and upgrading, improve social governance, and enhance government services and management capabilities. Satellite high-resolution optical remote sensing data is a hierarchical digital image matrix and a multi-dimensional data type with spatial characteristics. The amount of remote sensing data obtained from aerospace, aviation, adjacent space and other remote sensing platforms has increased dramatically, and remote sensing data has obvious big data characteristics [4]. Massive remote sensing data, complex data model, multi-scale data processing, high concurrent data access and massive data throughput make data storage and management technology become increasingly important [5]. At present, the efficient data storage and management technology of remote sensing research can not meet the above-mentioned increasing data storage and calculation needs in terms of data storage redundancy, horizontal node expansion, data load balancing and hardware cost [6]. Aiming at the problem of massive highresolution optical remote sensing data integration in the scene of multi-satellite data centers, this paper studies the data organization and integration technology, and realizes a distributed data integration and service system, which provides data support for collaborative processing of multi-data centers and multi-source remote sensing data and product production.


43.2 Related Work Pan et al. proposed a distributed storage strategy for remote sensing images based on the non-relational database MongoDB, which made the storage of remote sensing images have better time efficiency in data warehousing and concurrent access [7]. However, the technology of non-relational database such as MongoDB is not mature, unstable and difficult to maintain. Bi et al. proposed a storage model of distributed high-resolution optical remote sensing data in cloud environment [8]. This model combines the characteristics of HDFS and HBase to realize distributed remote sensing image storage management based on Hadoop. But this model does not support the distributed storage of remote sensing raw data and related vector data. Liu et al. proposed a fast indexing mechanism of image pyramid model based on linear quadtree technology [9]. This mechanism can be stored and displayed at different resolutions according to users’ needs, but the storage performance of massive image data is low. Weipeng et al. put forward a distributed Key-Value storage model of remote sensing data based on image block organization, and combined with the open source distributed file system HDFS, the distributed efficient storage and spatial area retrieval of image data were realized, which effectively solved the shortcomings that the traditional relational database technology could not meet the performance requirements of storing and managing massive data and the efficiency and scalability of storing massive data in a single node were insufficient [10]. Sánchez-Ruiz and others proposed a hierarchical data organization method based on warp and weft grids [11]. This method is a new image segmentation method with fifteen specific segmentation intervals, which solves the above-mentioned precision loss and the resolution of each level corresponds to the basic scale of the national map. It is a better solution to discretize image data. Chen et al. put forward the shared memory parallel mode of hybrid classification algorithm, which is helpful for the conversion from serial algorithm to parallel algorithm [12]. Yan et al. designed and implemented a highperformance classification system based on algorithm parallelism with the help of MPI parallel programming mode [13]. Considering the bottleneck of parallel processing, this paper designs a task parallel processing mode based on cat swarm algorithm, which distributes independent processing tasks to cluster nodes to calculate their respective execution, which has strong independence between nodes, higher reliability and easy to expand incrementally. In this method, the cats are divided into two working modes. In the search mode, by copying their own positions, and then applying mutation operators to the duplicates, the search around their own positions is deepened and the problem-solving performance is effectively improved.


43.3 Methodology Because of the large capacity, large quantity and unstructured characteristics of image data, it is complicated to design data access strategy based on image data itself, which is not easy to reflect the frequency characteristics of data access. Metadata is a reflection of the characteristics of image data, which can be used to establish a metadata access model, and then determine the storage strategy of image data through this model [14]. Image data clustering management software provides distributed cluster resource management and operation monitoring functions, which is convenient for users to manage clusters in confidential environment. The management process of distributed cluster is relatively complicated, and the management difficulty of cluster can be reduced through cluster management tools. The original high-resolution optical remote sensing data come from various sources, which are unprocessed image data with different time resolutions, different spatial resolutions and different spectral resolutions shot by various types of high-resolution remote sensing satellites, unmanned aerial vehicles and other aerial remote sensing sensors. Generally speaking, multi-source heterogeneous data integration is the effective integration and management of spatial data from multi-source satellites and sensors with different temporal and spatial ranges and resolutions in physics, logic and semantics. In the process of integration, it is necessary to fully consider not only the different storage structures of spatio-temporal data, but also the differences in the accuracy of geospatial features expressed by their organizational standards. Remote sensing information product data includes basic geographic data and basic spatial data. Basic geographic data includes basic topographic map elements such as water system, roads and residential areas, and thematic vector information data, such as land use type, forest land resources, river distribution, etc., extracted from remote sensing images, which are in the form of vector points, lines and planes and associated with a large number of thematic attribute data. The architecture of high-resolution optical remote sensing data processing platform is shown in Fig. 43.1. If it is assumed that the uplink and downlink transmission rate of the task Tui to the site is fixed at ri , and when the number of tasks Si transmitted to the internal server of the remote sensing platform is Dui , the transmission delay can be expressed as: Tui = Dui /ri

(43.1)

If the task S_i is executed at the mobile terminal, the total system delay from sending the offloading request to the completion of task execution can be expressed as:

TC_i = T_ti + t_bi + min_j Pc_i,j + t_i    (43.2)

T_ti denotes the transmission delay, t_bi the waiting delay caused by insufficient bandwidth, Pc_i,j the queuing delay, and t_i the delay incurred by the remote sensing platform when executing the task.

Fig. 43.1 High-resolution optical remote sensing data processing platform

If the task S_i executes related commands in the remote sensing platform, its execution delay can be expressed as:

TC_i,j = T_ti + t_bi + Pc_i,j + t_i    (43.3)
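As a small worked example of Eqs. (43.1)–(43.3), the sketch below evaluates the delay terms for one task; the data volume, link rate, and queuing delays are hypothetical values chosen only for illustration.

```python
def transmission_delay(d_ui: float, r_i: float) -> float:
    """Eq. (43.1): T_ui = D_ui / r_i."""
    return d_ui / r_i

def total_delay(t_t: float, t_b: float, queue_delays: list, t_exec: float) -> float:
    """Eq. (43.2): TC_i = T_ti + t_bi + min_j Pc_i,j + t_i."""
    return t_t + t_b + min(queue_delays) + t_exec

# Hypothetical task: 240 MB of image data over a 40 MB/s link
t_t = transmission_delay(240.0, 40.0)               # 6.0 s
print(total_delay(t_t, 0.5, [1.2, 0.8, 2.0], 3.0))  # 6.0 + 0.5 + 0.8 + 3.0 = 10.3 s
```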

The cat swarm algorithm is a non-numerical optimization method based on cat behavior patterns and swarm intelligence. In this paper, a task-parallel processing mode based on the cat swarm algorithm is designed: independent processing tasks are distributed to cluster nodes, each of which performs its own computation, giving strong independence between nodes, higher reliability, and easy incremental expansion. Every cat has its own current velocity, recorded as:

V_i = {V_i^1, V_i^2, ..., V_i^l}    (43.4)

Let X_best(t) denote the best position experienced by the cat swarm so far, i.e., the position of the cat with the best fitness. Each cat updates its velocity according to:

V_k^d(t + 1) = V_k^d(t) + c · rand · (X_best^d(t) − X_k^d(t))    (43.5)

where V_k^d(t + 1) is the velocity component of the d-th feature of the k-th cat after the update; l is the number of features of a solution; X_best^d(t) is the d-th feature component of the best solution found so far; X_k^d(t) is the d-th feature component of the current position of the k-th cat; c is a constant; and rand is a random number in (0, 1). Each cat updates its position according to:

X_k^d(t + 1) = X_k^d(t) + V_k^d(t + 1)    (43.6)

where X_k^d(t + 1) represents the d-th feature component of the k-th cat after the position update. The seeking (search) mode simulates a cat surveying its surroundings to find its next location: the cat makes copies of itself, applies a random disturbance to each copy within its neighborhood, and then, according to the fitness function, selects the copy with the highest fitness as the position to move to. The position update of a copy is:

X_k^d(t + 1) = X_k^d(t) + X_k^d(t) · (srd · (2 · rand − 1))    (43.7)

where sr d = 0.2, that is, the range of eigenvalue change on each cat is controlled within 0.2, which is equivalent to searching in its own neighborhood. The system supports the online browsing of original image data and processed image data through the generation of high-resolution optical remote sensing number thumbnails and the construction of standard tile service. The system supports online query of original image data and processed image data according to resolution, year, administrative division, spatial location, place name and address, and batch download of image data under the confidential network environment. Virtual server inherits the computing power of physical server, monitors the current storage situation and running status of virtual storage nodes in real time, and supports large-flow concurrent access, which greatly improves the utilization rate of hardware resources.
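To make the two working modes concrete, the following minimal sketch implements the per-feature updates of Eqs. (43.5)–(43.7); the fitness function, problem dimensionality, constant c, and number of copies are illustrative assumptions, not the settings used in the experiments.

```python
import random

C, SRD, COPIES = 2.0, 0.2, 5  # assumed constants; the paper fixes srd = 0.2

def tracing_update(x, v, x_best):
    """Tracing mode: Eqs. (43.5) and (43.6) applied per feature."""
    v_new = [vd + C * random.random() * (bd - xd) for xd, vd, bd in zip(x, v, x_best)]
    x_new = [xd + vd for xd, vd in zip(x, v_new)]
    return x_new, v_new

def seeking_update(x, fitness):
    """Seeking mode: perturb copies within ±SRD of each feature, keep the fittest copy."""
    copies = [[xd + xd * SRD * (2 * random.random() - 1) for xd in x] for _ in range(COPIES)]
    return max(copies, key=fitness)

# Hypothetical 3-dimensional placement problem: minimise the distance to an ideal node loading
fitness = lambda pos: -sum((p - t) ** 2 for p, t in zip(pos, (0.3, 0.5, 0.2)))
x, v, x_best = [0.9, 0.1, 0.4], [0.0, 0.0, 0.0], [0.4, 0.45, 0.25]
x, v = tracing_update(x, v, x_best)   # move toward the current best solution
x = seeking_update(x, fitness)        # refine by searching the local neighborhood
print([round(p, 3) for p in x])
```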

43.4 Result Analysis and Discussion Inside the virtual server, the data is managed by combining virtual storage with direct addressing. Among them, the storage of remote sensing original image data and primary remote sensing product data only needs to select a spherical grid division standard according to the satellite shooting width, only logically divide the original data, and the data itself does not do any processing, and then determine the row and column numbers of the data according to the latitude and longitude of the image center. When the experiment is trained on the data set, the initial learning rate is 0.001, the learning rate changes in multistep, the step value is set to 28,000 and 56,000, the gamma is set to 0.1, and the maximum number of iterations is set to 800. The network performance is tested every five trainings. Figure 43.2 shows the influence of distributed storage and efficient distribution of remote sensing platform on evenly distributed data sets. Figure 43.3 shows the impact of distributed storage and efficient distribution on real data sets. Because the node algorithm is simplified and compressed sensing is used to process the transmitted data, the data transmission volume of this scheme is greatly reduced. Before executing the operation flow, all operations must be verified by the local control system of each execution module, and then implemented and executed.
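To illustrate the logical grid-division step, the following sketch maps an image-centre latitude/longitude to grid row and column numbers; the simple equal-angle cell size used here is an assumption for illustration, not the spherical grid standard actually adopted by the platform.

```python
import math

def grid_cell(lat: float, lon: float, cell_deg: float = 1.0):
    """Return (row, col) of the grid cell containing an image centre.

    Rows count down from +90° latitude; columns count east from -180° longitude.
    """
    row = int(math.floor((90.0 - lat) / cell_deg))
    col = int(math.floor((lon + 180.0) / cell_deg))
    return row, col

print(grid_cell(39.9, 116.4))        # 1-degree cells  -> (50, 296)
print(grid_cell(39.9, 116.4, 0.5))   # finer 0.5° grid -> (100, 592)
```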


Fig. 43.2 Influence of distributed storage and efficient distribution on evenly distributed data sets

Fig. 43.3 Influence of distributed storage and efficient distribution on real data sets

The most important step in the integration process is to completely or partially transform the organization standard and organization meaning of multi-source heterogeneous data based on a unified data model, so that the integrated data set has a global unified data model outside. Image service publishing supports single-band and multi-band original result image data and processing result image data, as well as data service management and batch management and online visualization of data services under specified conditions. Physically dispersed remote sensing platforms form a whole logically, and


Fig. 43.4 Accuracy comparison of the algorithm

they will be interconnected through a wider bandwidth network. It is necessary to have an integrated dispatching platform to manage and dynamically dispatch remote sensing platforms in different geographical locations. Figure 43.4 shows the accuracy comparison of the algorithm. It can be seen that with the increase of the number of experiments, the accuracy of this algorithm is stable at about 95%, and the real-time wavelength tends to be stable. Comparing the accuracy of this algorithm with that of the traditional algorithm, the accuracy of the task parallel processing mode based on cat swarm algorithm is significantly higher than that of the traditional PSO algorithm. In single-machine mode, when the server fails, the system will collapse and cannot provide services to the outside world normally. In distributed mode, the system has high reliability. When the data node fails, the system will make adaptive remedies, which generally will not lead to the complete collapse of the system. After self-repair, the system can continue to operate normally to ensure normal external services. The safety comparison results of different schemes are shown in Fig. 43.5. According to the data in Fig. 43.5, this method is safe and has certain practical application value. The algorithm divides the cats into two working modes. In the search mode, by copying their own positions, and then applying mutation operators to the duplicates, the search around their own positions is deepened and the problemsolving performance is effectively improved. In the tracking mode, the current position of the cat is constantly updated by using the position of the optimal solution, so that the solution is constantly approaching the direction of the optimal solution and finally reaches the global optimal solution.


Fig. 43.5 Security comparison of different schemes

43.5 Conclusions Satellite high-resolution optical remote sensing data is a hierarchical digital image matrix and a multi-dimensional data type with spatial characteristics. With the rapid development of modern remote sensing technology, the spatial resolution, temporal resolution, spectral resolution and radiation resolution of remote sensing data are constantly improving, and the data types are also increasing. Aiming at the problem of massive high-resolution optical remote sensing data integration in multi-satellite data center scene, this paper designs a task parallel processing mode based on cat swarm algorithm, which distributes independent processing tasks to cluster nodes and calculates their respective execution. In the tracking mode, the current position of the cat is constantly updated by using the position of the optimal solution, so that the solution is constantly approaching the direction of the optimal solution and finally reaches the global optimal solution. The results show that with the increase of the number of experiments, the accuracy of this algorithm is stable at about 95%, and the real-time wavelength tends to be stable. Comparing the accuracy of this algorithm with that of the traditional algorithm, the accuracy of the task parallel processing mode based on cat swarm algorithm is significantly higher than that of the traditional PSO algorithm. The application of this algorithm can obviously improve the efficiency of remote sensing image sharing, improve the ability of remote sensing image spatio-temporal data management and service, and play a positive role in promoting the application of remote sensing big data. Acknowledgements Technology project of State Grid Electric Power Space Technology Co., Ltd. “Application research on power grid operation and maintenance, disaster prevention and reduction based on high resolution and SAR satellite data processing technology, No. 5100/2022-44004B”.


References

1. K. Toride, Y. Sawada, K. Aida, T. Koike, Toward high-resolution soil moisture monitoring by combining active-passive microwave and optical vegetation remote sensing products with land surface model. Sensors 19, 3924 (2019)
2. S. Luo, C. Song, K. Liu, L. Ke, R. Ma, An effective low-cost remote sensing approach to reconstruct the long-term and dense time series of area and storage variations for large lakes. Sensors 19, 4247 (2019)
3. J. Rabbi, N. Ray, M. Schubert, S. Chowdhury, D. Chao, Small-object detection in remote sensing images with end-to-end edge-enhanced GAN and object detector network. Remote Sens. 12, 1432 (2020)
4. F. Wang, Y. Chen, Z. Li, G. Fang, Y. Li, X. Wang, X. Zhang, P.M. Kayumba, Developing a long short-term memory (LSTM)-based model for reconstructing terrestrial water storage variations from 1982 to 2016 in the Tarim River Basin, Northwest China. Remote Sens. 13, 889 (2021)
5. R. Roni, P. Jia, An optimal population modeling approach using geographically weighted regression based on high-resolution remote sensing data: a case study in Dhaka City, Bangladesh. Remote Sens. 12, 1184 (2020)
6. S.-H. Lee, K.-J. Han, K. Lee, K.-J. Lee, K.-Y. Oh, M.-J. Lee, Classification of landscape affected by deforestation using high-resolution remote sensing data and deep-learning techniques. Remote Sens. 12, 3372 (2020)
7. X. Pan, J. Zhao, A central-point-enhanced convolutional neural network for high-resolution remote-sensing image classification. Int. J. Remote Sens. 38, 6554–6581 (2017)
8. F. Bi, J. Chen, Y. Zhuang, M. Bian, Q. Zhang, A decision mixture model-based method for inshore ship detection using high-resolution remote sensing images. Sensors 17, 1470 (2017)
9. M. Liu, C. Xiong, J. Pan, T. Wang, J. Shi, N. Wang, High-resolution reconstruction of the maximum snow water equivalent based on remote sensing data in a mountainous area. Remote Sens. 12, 460 (2020)
10. J. Weipeng, T. Dongxue, C. Guangsheng, L. Yiyuan, Research on improved method of storage and query of large-scale remote sensing images. J. Database Manag. (JDM) 29, 1–16 (2018)
11. S. Sánchez-Ruiz, F. Maselli, M. Chiesi, L. Fibbi, B. Martínez, M. Campos-Taberner, F.J. García-Haro, M.A. Gilabert, Remote sensing and bio-geochemical modeling of forest carbon storage in Spain. Remote Sens. 12, 1356 (2020)
12. Y. Chen, J.P. Guerschman, Z. Cheng, L. Guo, Remote sensing for vegetation monitoring in carbon capture storage regions: a review. Appl. Energy 240, 312–326 (2019)
13. S. Yan, L. Jing, H. Wang, A new individual tree species recognition method based on a convolutional neural network and high-spatial resolution remote sensing imagery. Remote Sens. 13, 479 (2021)
14. B. Wang, C. Waters, S. Orgill, J. Gray, A. Cowie, A. Clark, D. Li Liu, High resolution mapping of soil organic carbon stocks using remote sensing variables in the semi-arid rangelands of eastern Australia. Sci. Total Environ. 630, 367–378 (2018)

Chapter 44

Research and Application of Digital Modeling Technology Based on Multi-source Data of Power Grid Junfeng Qiao, Hai Yu, Zhimin He, Lianteng Shen, and Xiaodong Du

Abstract During the production and operation process, power enterprises generate rich power data, which is distributed and stored in multiple power application information systems. Due to past management reasons, there are significant differences in the description rules of data among different power application information systems, and due to the different data structures of various business processes, their storage structures and media also vary greatly. For example, for the operation and maintenance data generated during the operation and maintenance work, mainly photo and video data, this data must be stored in a large file storage server. The power consumption information sales data, which is composed of discrete numerical data, is stored in the relational database. In response to the above issues, this article proposes a method for constructing a power multi-source data model, which unifies the modeling of heterogeneous power data from multiple sources to meet the digital construction and service needs of power business. For file system data, build a file retrieval directory, store it in a relational data table, and establish an association model with relational data to achieve data connectivity of business meaning. This model can provide data services for power application scenarios, and power business application managers can directly call the model interface to meet the needs of power applications. Keywords Multi source electric data · Digital models · Business models · Power applications · Data analysis J. Qiao (B) · H. Yu · Z. He State Grid Laboratory of Power Cyber-Security Protection and Monitoring Technology, State Grid Smart Grid Research Institute Co., Ltd., Nanjing, China e-mail: [email protected] H. Yu e-mail: [email protected] L. Shen China Electric Power Research Institute Co., Ltd., Beijing, China X. Du State Grid Hebei Electric Power Co., Ltd., Shijiazhuang, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_44


44.1 Introduction The research on the construction of power models mainly involves modeling the operational trend of electricity and the development trend of electricity loads, simulating their data curves based on historical data to predict the future trend of this business. In the work of power marketing management, the consumption of power customers is the most concerned data in the power marketing [1]. The power marketing management hopes to comprehensively master the power consumption and payment of all users. After mastering the complete electricity consumption data of users, conduct analysis on their electricity habits, identify their preferred consumption characteristics and needs, and then develop corresponding consumption service strategies [2]. In order to understand the electricity consumption characteristics of power users, it is also necessary to analyze their electricity consumption patterns. Based on the curve data of the electricity consumption information collection system, a mining model for analyzing users’ electricity consumption and electricity consumption fluctuation patterns is constructed. The input of this model is the user’s electricity consumption information collection data, such as voltage curve data, current curve data, power curve data, and daily frozen recorded electricity quantity [3]. The output of this model is the classification of users, such as industry, schedule, and electricity usage habits on weekdays and weekends. In addition to electricity consumption data, electricity data also comes from many business information systems, and their data structures and types also have significant differences and different characteristics [4]. At present, in order to improve the power business, it is usually necessary to integrate and analyze data from multiple information systems of multiple departments, which is difficult to achieve relying on data from a single business. However, different business information systems rely on different standards for data definition, and there are also differences in data storage methods and structures [5]. How to fuse these multi-source heterogeneous data, and the success or failure of the fusion determines whether the power business can achieve in-depth application. In response to this issue, this article proposes a digital modeling method for power multi-source heterogeneous data.

44.2 Related Work The market entities in the power industry are usually composed of a small number of power supply entities, which is due to the need for standardized infrastructure for power supply services [6]. If there are too many power supply entities, resulting in inconsistent standards for infrastructure construction and maintenance, it will bring huge challenges to the operation of power supply services. Therefore, in practical applications in the power industry, data models and business models must be unified and widely applicable. This is also why electricity, as a commodity, is often regarded as a public service commodity, and electricity is always transmitted and used in


real-time without effective storage and storage media and means [7]. Therefore, the power business application model must have real-time performance to ensure realtime data and application in order to serve the implementation and development of power business. Since 2018, the intelligence and self-healing of power operation have increasingly become the principles and goals of power construction in various countries around the world [8]. In the field of transmission and substation, based on real-time measurement data obtained and collected by global satellite positioning systems and sensing devices, all equipment, facilities, and business processes in the entire business process of transmission and substation are simulated and abstracted, and an intelligent analysis model of the power grid is constructed to meet the needs of power grid construction, planning, production, and management operation. In the power equipment management business, traditional power equipment management analyzes the status of power equipment and the impact of power equipment on power dispatch and power supply services, identifying potential hidden dangers and defects in equipment management. The power equipment management model constructed mainly extracts the basic archive data of equipment from the power equipment management system and the operation status data of power equipment in the power dispatch management system. These data sources are existing information systems, and the data is easy to obtain [9]. The model construction and maintenance methods are also relatively mature. However, with the continuous development of three-dimensional technology and the continuous updating of intelligent sensing devices, the traditional collection and transmission of numerical data can no longer meet the current diverse data acquisition needs [10]. The trend of data isomerization is obvious, and image and video data have richer and more vivid description and expression abilities than traditional numerical data. Moreover, the continuous development of artificial intelligence algorithms and technologies provides more powerful mining and analysis algorithms for the processing and analysis of heterogeneous image and video data. Currently, device management analysis models based on artificial intelligence have been widely applied in multiple fields such as power production and operation maintenance, and corresponding power grid analysis models also need to be extensively modified and improved. The traditional power grid equipment analysis model lacks more vivid line trend and terrain spatial data, and the collection of geographic and spatial survey data is usually coordinated by satellite systems and geological departments. Therefore, it is necessary to upgrade and optimize the current power analysis model to meet the richer application needs of power data. According to the analysis of the power company, using real-time data from the digital twin model to inspect the defects and problems of power equipment and facilities can reduce the operating cost of this business by 1–1.5%. Therefore, the application of digital twin technology in power has great positive significance. Compared to existing power simulation model technologies, the data foundation of this method is real-time collection of operating status data of power equipment and facilities, which can perform pattern inference and state perception on real power grids. 
Based on the above domestic research status, this article will utilize the real-time data foundation of the power grid digital twin model to analyze the methods of power operation monitoring and resource scheduling, and develop corresponding software systems.

444

J. Qiao et al.

44.3 Research on Digital Modeling Technology Based on Multi-source Data of Power Grid The model construction for power grid business applications mainly starts from the perspective of application requirements, considers the characteristics of power grid business application requirements, and covers all functions involved in power grid business applications. The model construction must comprehensively consider all analysis scenarios that the model may be applicable to, and incorporate all data used in the analysis scenarios into the data structure of the model. Then design the corresponding data access interface to adapt to the data access and access program interfaces of all power grid business information systems. At the same time, due to the dynamic development and increase of power grid business, the designed power digital application model must also have scalability to cope with the future development and dynamic changes of power business.

44.3.1 Power Grid Multi-source Data Perception and Extraction

The perception of multi-source grid data first has to overcome the technical difficulties of data collection and acquisition. Collection is carried out mainly by power sensing and acquisition devices. Data perception then obtains the structural information needed for rule discovery from the database, such as field names, field annotations and table association relationships. When a requester issues a perception request for multi-source grid data, it first requests the metadata of the corresponding data file from the metadata server and then reads the data content itself. While the requested content is being loaded, the script files and images that the request target depends on must also be loaded, so the requester in turn requests the metadata and data of these script and image files from the metadata server and the data server; only after all of these operations complete is the data fully loaded. Each grid data request therefore initiates two separate tasks, one to the metadata server and one to the data server. Analysis of this request process shows that a data request contains many access links pointing to other script and image files, and these linked items are accessed together with the requested data. Based on this observation, if the distributed data management system knows the associated access pattern between data items in advance, the metadata server can prefetch the metadata of the linked script files and unstructured files when it responds to a metadata request and return them to the requester in a single reply. After receiving the target data and its associated items, the requester caches them in its local cache space, and subsequent access requests for the linked scripts and data files


Fig. 44.1 Access correlation in data aware sequences

can be processed locally on the client without crossing the network to the data server again. This article uses a metadata prefetching mechanism to complete the loading of all associated data with a single metadata operation, which reduces the number of requests in the system by about 50% compared with the original access process. Metadata prefetching therefore shortens the access path for associated data, lowers the response delay of data requests, and reduces the number of requests the data server has to process. Figure 44.1 illustrates the characteristics of a prefetching mechanism based on access correlation. Most existing prefetching mechanisms explore file access correlation and rest on the assumption that if two data units were frequently accessed together in the past, they will also be accessed together with high probability in the future. The core of the approach is to search the historical access records for data units that are frequently accessed together: if data unit A and data unit B frequently appear in the same access sequences, an access correlation is considered to exist between them. When the data server responds to a request for unit A, it also pushes the content of unit B into the requester's local cache, accelerating subsequent access to B and improving overall data access and perception performance.
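As an illustration of this access-correlation idea, the following Python sketch maintains co-access counts and returns correlated metadata together with a requested item. The class, method names, file names and the co-access threshold are hypothetical and are not part of the system described in this chapter.

```python
from collections import defaultdict

class CorrelationPrefetcher:
    """Toy metadata server that prefetches metadata of correlated data units.

    If units A and B are frequently requested together, a request for A also
    returns B's metadata so the client can serve later accesses from its cache.
    """

    def __init__(self, threshold=3):
        self.co_access = defaultdict(int)   # (a, b) -> co-access count
        self.metadata = {}                  # unit id -> metadata record
        self.threshold = threshold

    def record_session(self, unit_ids):
        """Update co-access statistics from one observed access sequence."""
        for i, a in enumerate(unit_ids):
            for b in unit_ids[i + 1:]:
                self.co_access[(a, b)] += 1
                self.co_access[(b, a)] += 1

    def fetch(self, unit_id):
        """Return the requested metadata plus prefetched correlated metadata."""
        result = {unit_id: self.metadata.get(unit_id)}
        for (a, b), count in self.co_access.items():
            if a == unit_id and count >= self.threshold and b in self.metadata:
                result[b] = self.metadata[b]   # prefetched entry
        return result

# Hypothetical usage: a script and an image observed together with a data file.
server = CorrelationPrefetcher()
server.metadata = {"line_model.dat": {"size": 2048},
                   "tower.png": {"size": 512},
                   "style.js": {"size": 64}}
for _ in range(3):
    server.record_session(["line_model.dat", "tower.png", "style.js"])
reply = server.fetch("line_model.dat")  # also contains tower.png and style.js metadata
```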

44.3.2 Digital Modeling of Multi-source Data in Power Grids The digital modeling of multi-source data in the power grid provides a solid data base support for the design of the energy supply and normal monitoring and analysis system. Based on a digital power business analysis model, the construction of an energy supply and demand monitoring system is promoted from five aspects: domestic energy supply and demand situation, energy supply risk analysis, etc., laying the model foundation for the arrangement and deployment of power supply guarantee work for power companies. Taking the most common load forecasting demand in the power industry as an example, higher requirements have been put forward for digital modeling of multi-source data in the power industry. The prediction model must be visualized and easy to operate. The prediction model can be directly dragged and constructed through visualization, reducing the difficulty of analysis work such as load forecasting and new energy output forecasting. Build a universal prediction model library, promote the normalization of workflow management and full cycle management of prediction models, promote the improvement of modeling efficiency of common prediction models in the power grid, and enhance


the level of prediction management on the supply and demand sides of the power grid. As shown in Fig. 44.2, this article proposes a panoramic digital modeling method based on multi-source power grid data. First, a survey is conducted over the data range and data boundaries required by the power business; all boundaries involved in the data are identified and defined as explicit data selection criteria. Next, three-dimensional information is overlaid: 3D scanning information of power equipment and facilities is extracted, mainly the intuitive description files of substations and the 3D description files of individual devices after secondary processing, and the interface design of the resulting 3D digital model is then standardized to achieve model fusion. Finally, convolutional neural network algorithms are used to predict and classify data related to unknown power equipment models, and the surface character data of equipment or facilities is extracted with OCR technology and integrated into the model to support intelligent identification. The resulting digital model of multi-source power data can not only present the basic archive information of power equipment and facilities visually, but also support in-depth business mining and analysis based on the internal data relationships. Users of the model can derive the relationship logic of power equipment from the relationship logic and archive data displayed or contained in the model, and so further expand business application analysis.

Fig. 44.2 Panoramic digital modeling based on multi-source power data


44.4 Application of Digital Model Based on Multi-source Data of Power Grid

The digital model built on multi-source power grid data must not only meet the current business analysis needs of the grid but also explore solutions and suggestions for the business challenges the grid already faces. At the same time, model construction has to consider the development trend of the future grid, prepare contingency plans for potential business and management problems, and even reveal in advance the root causes of emerging problems and their future development and growth patterns. The development and upgrading of the future power system face enormous challenges, but the overall trend is towards green and low-carbon, clean and safe, efficient and intelligent, friendly, interactive, open and reliable operation. The rapid development of photovoltaic and nuclear energy will shift generation from coal- and petroleum-based plants to a distributed, new-energy-oriented production and supply chain. Users' electricity consumption habits are likewise evolving from rigid demand towards flexible demand. AC/DC hybrid grids, local DC grids and microgrids are constantly changing the existing form of the power grid, posing major challenges to its survival and upgrading. Digital power grid models will be challenged and iterated in all of these business scenarios, ultimately forming a dynamic and scalable model library that is continuously refined to provide sustainable model services for the power business.

44.4.1 Electricity Compliance Inspection Based on Digital Power Model

The user-transformer relationship refers to the power supply relationship between users and the transformers that serve them. Because of errors in file entry and maintenance in the electricity information collection system, the recorded connection between a user and a transformer may be wrong. This pilot applies the power digital model in the power user data acquisition system. Approximately 1500 electricity meters from 22 high-loss station areas were selected for analysis, and 5 users were identified as abnormal; on-site investigation confirmed that the connection relationship of 2 of them was indeed incorrect. Analyzing the 96 points collected per day at a 15-min sampling interval determines the change trend and the user-transformer relationship more accurately. Against the load characteristic model library, users whose fluctuations are clearly inconsistent with their station area can be found, raising the suspicion of an abnormal user-transformer relationship. At the same time, by analyzing the correlation of line loss curves with adjacent station areas, one user's incorrect connection relationship was identified. When installing an electricity


Fig. 44.3 Application of electric power digital model in electricity compliance analysis

meter, the output wiring of one meter was connected to another user's wiring, confusing the two users' measurements. Identifying such crossed meters and users by data analysis alone is very difficult. As shown in Fig. 44.3, this article extracts customers' historical electricity consumption data recorded by the meter (voltage, current, power, consumption, etc.) together with the inconsistent work-order data reflected in customer complaints. A curve similarity algorithm computes the similarity between the first 48 points of the daily curve and its second half to determine the usage characteristics the curve represents, and then checks whether these characteristics match the customer's actual consumption behaviour, providing a basis for judging whether an abnormality exists. Based on the classification of typical residential loads in the load characteristic model library, the consumption curves of 10 typical users are given, and each customer is compared with these 10 typical curves to characterise their residential consumption. The customer's actual usage is then rechecked, and a mismatch with the actual situation indicates a possible abnormal connection. For example, if an on-site investigation shows that a user is a household of more than 6 people while their consumption curve resembles that of a vacant house or another user type, the user can be listed as a suspected abnormal-connection customer.
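The curve-matching step can be sketched as follows. The similarity measure (Pearson correlation) and the 0.6 threshold are illustrative assumptions, since the chapter does not specify the exact curve similarity algorithm or its parameters.

```python
import numpy as np

def pearson_similarity(a, b):
    """Pearson correlation between two equal-length load curves."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.corrcoef(a, b)[0, 1])

def classify_load_profile(user_curve, typical_profiles):
    """Match a 96-point daily curve against typical residential profiles.

    typical_profiles: dict name -> 96-point curve (the '10 typical users').
    Returns the best-matching profile name and its similarity score.
    """
    scores = {name: pearson_similarity(user_curve, curve)
              for name, curve in typical_profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def flag_suspect_connection(user_curve, declared_type, typical_profiles, min_sim=0.6):
    """Flag a user whose measured curve does not match the declared usage type."""
    best, sim = classify_load_profile(user_curve, typical_profiles)
    return best != declared_type or sim < min_sim
```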

44.4.2 Application of Power Equipment State Feature Analysis Based on Digital Power Model This article selects a power equipment management analysis model to conduct equipment status feature analysis. Collected volume fraction data, ledger information,


and defect information of dissolved gas in oil collected by all sensors in a regional power grid over one year. First, data features are extracted, including the type and number of oil chromatography monitoring sensors; the transformer's power supply company, installation location and collection date; the volume fractions of dissolved H2 (hydrogen), CO (carbon monoxide), CO2 (carbon dioxide), CH4 (methane), C2H6 (ethane), C2H4 (ethylene) and C2H2 (acetylene) in the oil chromatography data; and the total hydrocarbon volume fraction. The transformer information includes the type, manufacturer and number of the oil chromatography sensor installed on it, as well as the transformer's power supply company, installation location, voltage level and operating time. The defect information of power equipment includes the power supply company, substation, station type, longitude and latitude, pollution level, voltage level, commissioning date, wiring mode, transformer oil manufacturer and the specific defect description. The prototype system was verified against the fault analysis report of a 220 kV transformer in a substation within the region. The volume fractions of ethylene and total hydrocarbons in this transformer increased significantly around March 16 and May 25, 2021, and the fault was determined to be high-temperature overheating. The online monitoring data of the transformer from February 22, 2021 to June 14, 2021 were selected, the gas production rate was calculated, and trend charts of gas volume fraction and gas production rate were drawn. Around March 15 and again around May 25, the volume fractions of both ethylene and total hydrocarbons exceeded their warning values, and their gas production rates exceeded the warning values as well. If the 150 μL/L attention value given by the national standard is used instead, the system gives no prompt on March 15 or May 25, i.e. these faults would be missed. In summary, when the digital model studied in this article is applied to the early warning system, the warning results are basically consistent with the actual results and are more accurate than traditional warning values, with an accuracy improvement of about 3%.
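A minimal sketch of the gas-production-rate check described above is given below. The sample values and the rate limit are placeholders; real attention values come from the applicable national standard, not from this sketch.

```python
import numpy as np

def gas_production_rate(volume_fraction, days):
    """Absolute gas production rate, in (uL/L) per day, between consecutive samples."""
    v = np.asarray(volume_fraction, float)
    t = np.asarray(days, float)
    return np.diff(v) / np.diff(t)

def check_warnings(volume_fraction, days, fraction_limit, rate_limit):
    """Return indices of samples whose value or production rate exceeds the limits."""
    v = np.asarray(volume_fraction, float)
    rate = gas_production_rate(volume_fraction, days)
    over_value = set(np.where(v > fraction_limit)[0])
    over_rate = {i + 1 for i in np.where(np.abs(rate) > rate_limit)[0]}
    return sorted(over_value | over_rate)

# Illustrative total-hydrocarbon readings and thresholds only.
alarms = check_warnings([60, 90, 180, 210], [0, 30, 82, 90],
                        fraction_limit=150.0, rate_limit=10.0)
```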

44.5 Application Results This article proposes a digital modeling method based on power multi-source data, which integrates multi-source data across power business processes and constructs a unified data fusion interface standard. The design of data interfaces relies on the acquisition of professional knowledge in various stages of power production practice. Therefore, the first step is to extract and understand the description rules of data in each stage of power production. Integrate heterogeneous data from various business


processes into a unified data platform through semantic analysis and data parsing. Then, existing artificial intelligence algorithms and data analysis mining tools are utilized to transform power analysis requirements into digital power data models, providing model support for power management personnel and departments. Finally, the power model is applied to specific scenarios for power load management analysis and power equipment state feature extraction to verify its applicability and perform iterative optimization to improve the usability of the power data model. This power data model has undergone practical testing, continuous optimization, and adaptive transformation, and can be widely applied in multiple links of power production and operation, serving the practical and application needs of the power industry. Acknowledgements This work was supported by State Grid Corporation of China’s Science and Technology Project (5108-202218280A-2-401-XG) which is ‘Research and Application of Basic Common Service Technology for Digital Twins in Power Grid’.

References 1. W. Wang, H. Huang, Application practice of digital based audit modeling for power grid overhaul engineering. China Intern. Audit 7, 2 (2020) 2. R.R. Ma, Digital transformation platform for low code enabling power industry based on visual modeling technology. Yunnan Electr. Power Technol. 50(3), 4 (2022) 3. Y.H. Shen, Application practice of digital based audit modeling for power grid overhaul engineering. Encycl. Forum Electron. Mag. (014), 1770–1771 (2000) 4. W. Guo, Y. Liu, Q. Yan et al., Research on the digital acquisition method of power grid based on 3D laser scanning technology. Electron. Des. Eng. 29(12), 5 (2021) 5. K.Q. Lin, X.X. Ma, Research and application of digital construction of power grid based on 3D panorama. Electron. Prod. 9, 4 (2020) 6. Y.Q. Liu, Y.F. Song, J.Z. Liang et al., Integrated digital standard support for intelligent manufacturing of power equipment. Southern Power Grid Technol. 16(12), 8 (2022) 7. Z. Cai, Research on the 3D modeling method of power facilities under the construction of digital power grid. Technol. Mark. 27(12), 2 (2020) 8. C. Chen, X.F. Song, P.X. Dong et al., Deepening the application of substation logical model transfer based on digital twins. Mod. Electr. Power 40(1), 9 (2023) 9. S.Y. Zhang, J.F. Yan et al., Modeling and implementation method of knowledge model for power grid regulation decision. Chin. J. Electr. Eng. 014, 042 (2022) 10. S.F. Zhang, A new power information system security model and its evaluation method. Digit. Users 37(026), 47–51 (2013)

Chapter 45

Construction of Music Popular Trend Prediction Model Based on Big Data Analysis Yitong Zhou and Chunqiu Wang

Abstract At present, there are mainly music apps, music websites and video music on the Internet. Users can upload original songs and download their favorite songs, which brings convenience to music-loving users. In today’s era of big data, music listeners will determine the popular trend of music. Listeners listen to, download, collect and share music on many music platforms, and pay attention to, comment on, forward and like music on major social networks, video websites, post bars and forums. These behaviors reflect listeners’ preferences for music. Therefore, this paper analyzes the trend of music popularity based on big data, and constructs a prediction model in this paper to further analyze it. The actual development trend of the predicted data is relatively stable. If the fluctuation is small, the prediction accuracy of the algorithm is relatively high. On the contrary, if the fluctuation of the data is large or the data has no long-term development trend, it is easy to cause a relatively large prediction error, and the greater the prediction distance, the greater the error. Keywords Big data analysis · Music popular trend · Prediction model

45.1 Introduction The pop trend of music can be expressed according to the current pop artists, so the prediction of the pop trend of music is also the prediction of which music artists will become the pop artists in the future for a period of time. To judge whether an artist is a pop artist, you can judge according to the number of music auditions of the artist in the recent period [1]. At present, the music forms of the Internet mainly include music app, music website, video music, etc. Users can upload original songs and download favorite songs, which brings convenience to users who like music. But at Y. Zhou · C. Wang (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_45


the same time, a large amount of music data has been generated on the Internet, and the traditional storage methods and computing algorithms cannot meet the growing demand for data storage and computing [2]. For fan users in the “fan economy”, the prediction results of music popularity trends are more important. These users can use the prediction results to master the song popularity of their favorite artists for a period of time in the future, and set a technical benchmark for their “fan behavior activities” [3]. In today’s era of big data, music audiences will determine the trend of music popularity. Listeners listen to, download, collect and share music on many music platforms, and pay attention to, comment on, forward and like music on major social networks, video websites, post bars and forums. These behaviors reflect listeners’ preferences for music. According to the statistics report of China Industry Information Network, in recent years, the traditional music industry has declined, and Internet music has developed rapidly. The rapid growth of Internet music paying users has become the core of promoting the development of the music industry [4]. Music big data is formed by using huge music library resources and user behavior. Through accurate big data analysis, the trend of music can be effectively predicted, and the trend of music popularity can be truly determined by the aggregation of audience preferences. At present, data analysis is an emerging research field for analyzing and processing data. In this paper, using big data to analyze a large number of Internet music data for analysis and research is one of the most concerned research points of Internet enterprises [5, 6]. Through the user’s historical behavior data, predict the number of artists playing in the next stage, excavate the artists who will become the focus of attention, and accurately control the future trend of music.

45.2 Description and Preprocessing of Music Data 45.2.1 Data Description Music popularity refers to the fact that some people in society pursue a certain musical behavior in a certain period of time driven by a certain psychological need, which leads to the spread of a certain style of musical behavior in a certain social scope, and forms different degrees of social popularity and social group fanaticism. Firstly, the target value to be predicted is determined. In the prediction of music pop trends, it can be divided into long-term range prediction, short-term range prediction and short-term prediction from the perspective of time [7]. According to these three times, predict the broadcast volume in the next month, the broadcast volume in the next week and the broadcast volume in the next day. The traditional music platform has gradually disappeared with the arrival of the era of artificial intelligence. Under the background of big data era, data analysis is widely used in various research fields. The application of data analysis should first analyze the data in detail, then process it, and finally get the results after data processing. This paper uses the historical music


Table 45.1 Statistics of total data of dataset

Group          Singer   Song     Music users   User records
First group    58       11,258   374,514       1,554,122
Second group   87       28,745   584,789       2,015,257
Third group    150      58,748   865,988       4,012,541

data from May to September as the basic experimental data. Statistics for all data sets are shown in Table 45.1. Traditional storage and computing methods are not a good choice for processing such a large amount of Internet music data. To predict the playback volume in the next month, the selected features include those extracted from playback behaviour and download behaviour; the experimental results show that the download behaviour does not help prediction. An artificial neural network model has also been used to predict the trend of music popularity, but its construction speed needs improvement, and its heavy demands on the experimental environment make it time-consuming while the prediction effect remains average [8]. To predict the broadcast volume in the next week, the total broadcast volume of each single week is selected as the input, both on its own and after adding statistical feature values; the experiments show that adding the statistical features significantly improves model performance. Moreover, big data analysis involves a large number of parameters, so parameter settings have a great impact on the experimental results and the workload is heavy.
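Such statistical features can, for example, be generated as rolling statistics over weekly play totals. The sketch below assumes a simple (artist_id, date, plays) table and a four-week window, neither of which is specified in the paper.

```python
import pandas as pd

def weekly_features(daily_plays: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily play counts to weekly totals and add statistical features.

    daily_plays is assumed to have columns ['artist_id', 'date', 'plays'].
    """
    df = daily_plays.copy()
    df["date"] = pd.to_datetime(df["date"])
    weekly = (df.groupby(["artist_id", pd.Grouper(key="date", freq="W")])["plays"]
                .sum().rename("weekly_plays").reset_index())
    grp = weekly.groupby("artist_id")["weekly_plays"]
    # Statistical features over the previous four weeks (window size is illustrative).
    weekly["mean_4w"] = grp.transform(lambda s: s.rolling(4, min_periods=1).mean())
    weekly["std_4w"] = grp.transform(lambda s: s.rolling(4, min_periods=1).std())
    weekly["max_4w"] = grp.transform(lambda s: s.rolling(4, min_periods=1).max())
    return weekly
```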

45.2.2 Data Analysis Data quality analysis mainly needs to check whether there is dirty data in the original data set. Dirty data refers to data that is not in accordance with normal logic and requirements, and cannot be directly analyzed. The main task of data quality analysis is to dig out dirty data [9, 10]. In the process of predicting the total audition of artists’ songs, it is necessary to select the factors that affect the total audition of artists’ songs, which are not only reasonable but also representative. Table 45.2 is a song-singer information table, which contains a large number of information records of songs and singers. The data in the table mainly includes the information of the operated songs and the information of the singers to which the songs belong. The attributes contained in the table include the song number, the singer account corresponding to the song, the release time of the song, the initial number of songs played, the language of the song, and the gender of the singer. This paper makes statistical analysis on the data, draws the curve distribution shape of each singer, observes and analyzes the law and the overall trend. If the actual development trend of the predicted data is relatively stable and the fluctuation


Table 45.2 Song-singer information table

Column name       Type     Explanation                                          Example
artist_id         String   The singer account corresponding to the song        0c80008b0
song_init_plays   String   Initial number of plays of the song                  15,242
language          String   Coded as numbers 1, 2, 3, ...; 5 kinds in total
gender            String   The numbers 1, 2 and 3 encode the singer's gender    2

is small, the accuracy of the prediction of the algorithm is relatively high. On the contrary, if the fluctuation of the data is relatively large or the data has no longterm development trend, it is easy to cause relatively large prediction error, and the larger the prediction distance, the greater the error [11]. Some important properties of the data set as a whole can be identified through statistics such as broadcast volume, download volume and collection volume, which is of great reference value for subsequent data analysis.

45.3 Construction and Analysis of Numerical Prediction Model

45.3.1 Prediction Model of Music Pop Trend

Since the traditional music trend prediction model only uses data close to the prediction target time period, this paper proposes to cluster artists and to combine the fuzzy granulation method with a support vector machine to build a new model. Experiments show that the changing trend and range of the broadcast volume over the next few days can be predicted accurately. By analyzing and mastering the development laws of the data, and deducing and calculating according to those laws, we can predict its future development trend to some extent. According to the characteristics of big data, this paper analyzes and predicts from the user's point of view: first, on the user side, it counts the general characteristics of users' listening behaviour on the data set, such as the maximum, minimum and median; the users' listening time, song language and other characteristics are then clustered to describe each user's singer preference, time preference and language preference. Let the actual number of plays of singer i on day j be T_{i,j}, the singer collection be S, and the predicted number of plays of singer i on day j be P_{i,j}. The normalized deviation σ_i between the predicted and actual plays of singer i is

\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(\frac{P_{i,j}-T_{i,j}}{T_{i,j}}\right)^2} \qquad (45.1)


The weight φ_i of singer i is defined as the square root of the sum of the actual playback volume of singer i over the N days:

\varphi_i = \sqrt{\sum_{j=1}^{N} T_{i,j}} \qquad (45.2)

The evaluation index of the prediction result is

F = \sum_{i \in S} (1 - \sigma_i)\,\varphi_i \qquad (45.3)

F is obtained by summing the evaluation values of all singers, where each singer's evaluation value is the product of (1 − σ_i) and the weight φ_i. The prediction model of the music pop trend addresses stationary time series; for non-stationary series, the original series must be differenced one or more times to form a new series before the model can be applied, which is exactly the approach of the ARIMA model. The system also provides statistics and management of artists' song audition volume: its main function is to display, with charts and tables, the daily auditions, downloads and collections of each artist's music and of each song, and to let users delete unnecessary statistics. If there is no real data in the predicted time period, only the line chart of the predicted data is displayed. This module shows the number of artists' song auditions predicted by quadratic exponential smoothing, the ARIMA model, the neural network model and other methods, and the predicted results of these algorithms can be freely combined in a chosen proportion so that the combined results can be viewed.
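Under the reconstruction of Eqs. (45.1)-(45.3) given above, the evaluation index F can be computed directly. The sketch below assumes the play counts are supplied as plain arrays with non-zero actual values and is not tied to any particular data pipeline.

```python
import numpy as np

def evaluate_f(actual: dict, predicted: dict) -> float:
    """Compute the evaluation index F of Eqs. (45.1)-(45.3).

    actual, predicted: artist_id -> array of daily play counts over the same N days.
    Actual counts T_{i,j} are assumed to be non-zero.
    """
    f = 0.0
    for artist, t in actual.items():
        t = np.asarray(t, float)
        p = np.asarray(predicted[artist], float)
        sigma = np.sqrt(np.mean(((p - t) / t) ** 2))  # normalized deviation, Eq. (45.1)
        phi = np.sqrt(t.sum())                         # artist weight, Eq. (45.2)
        f += (1.0 - sigma) * phi                       # contribution to F, Eq. (45.3)
    return f
```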

45.3.2 Prediction and Analysis of Music Trend Based on Big Data Analysis The pop trend of music can be expressed according to the current pop artists, so the prediction of the pop trend of music is to predict which music artists will become pop artists in the future for a period of time. The judge of whether an artist is a pop artist can be based on the number of music auditions of the artist in the recent period. From the perspective of short-term development, the predicted broadcast volume in the next week corresponds to the weekly list in the music platform. In order to eliminate the contingency of the prediction task and verify the generalization ability of feature selection, support vector machine is used to build the prediction model, which also achieves high correlation and small error, but there is still a large gap in predicting the monthly broadcast volume. This chapter removes the download


behavior, and compares the results obtained with a single week's broadcast volume against the results obtained after adding statistical features, in order to find the optimal feature attributes. Among the three prediction tasks determined in this paper, the monthly task predicts the total broadcast volume over the next month, the weekly task predicts the total over the next week, and the daily task predicts the broadcast volume of the next day; the question raised in succession predicts the change trend and range of the broadcast volume over the next five days, and accurate per-day forecasts for the next five days or more are left for future work. Figure 45.1 shows the comparison between the original data and the predicted values. Two points stand out in the predicted monthly broadcast volume: values close to the average are predicted well, with predicted values close to, and largely coinciding with, the actual values; and the monthly broadcast volume differs greatly between artists, since the gap between popular and ordinary singers grows geometrically after accumulation, which makes prediction difficult. Figure 45.2 shows the comparison of the two feature selections for week4. The features in feature1 and feature2 are all single-week broadcast volumes, while statistical features are added in feature3. The line chart shows that after adding the statistical features the error decreases by an order of magnitude and the model performance improves significantly; statistical features are therefore added and the best feature combination selected. In the horizontal comparison, n is varied from 2 to 4, while in the vertical comparison, hidden is tested from 1 to 25. Figure 45.3 shows that the optimal values of week3 are relatively close: with hidden = 14 in week1 the error is 0.0005, and with hidden = 15 in week2 the error is 0.0056. On the whole, week3 gives the best result and its curve is relatively flat, indicating that week3 is least sensitive to the network structure and gives the smoothest feature selection.
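As one concrete realisation of the support vector machine prediction used in these experiments, the sketch below fits an SVR to lagged weekly play totals. The number of lags, kernel and hyperparameters are illustrative choices, not the settings of the paper.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_weekly_svr(history: np.ndarray, n_lags: int = 4):
    """Fit an SVR that predicts next week's plays from the previous n_lags weeks.

    history: 1-D array of weekly play totals for one artist (length > n_lags).
    """
    X = np.array([history[i:i + n_lags] for i in range(len(history) - n_lags)])
    y = history[n_lags:]
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X, y)
    return model

def forecast_next_week(model, recent_weeks):
    """Forecast the coming week from the latest n_lags observed weekly totals."""
    return float(model.predict(np.asarray(recent_weeks, float).reshape(1, -1))[0])
```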

Fig. 45.1 Forecast results of broadcast volume in January


Fig. 45.2 Comparison of results of different input characteristics

Fig. 45.3 Performance optimization comparison of predicted weekly broadcast volume

Through the above data analysis, we can find that from the relative error of the broadcast volume, we can’t accurately predict the explosive value, which shows that the broadcast volume of super popular singers in reality is not an order of magnitude compared with ordinary singers. The forecast of monthly broadcast volume will widen this gap, and the daily broadcast volume has certain contingency, but the difference between weekly broadcast volume and singer’s broadcast volume will not expand much, and it will average the daily broadcast volume, eliminating contingency. Calculate the total broadcast volume, total download volume and total collection volume of each user for six months. Secondly, the records of the same songs played by the same users are merged, and for these users, the daily play volume of each “user-singer” is calculated. On the premise of excluding the interference of external events, the monthly average, weekly average and daily average of each


artist’s broadcast volume are calculated and coded, and the average of these coding processes can be used as a prediction of the trend of each artist’s broadcast volume.
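The per-user and per-artist aggregation described above maps naturally onto grouped operations. The following sketch assumes a play log with columns user_id, artist_id, song_id and date, which is an illustrative schema rather than the exact table used in the paper.

```python
import pandas as pd

def user_artist_daily_plays(log: pd.DataFrame) -> pd.DataFrame:
    """Merge duplicate play records into per-day play counts for each user-artist pair."""
    log = log.copy()
    log["date"] = pd.to_datetime(log["date"])
    return (log.groupby(["user_id", "artist_id", "date"])
               .size().rename("plays").reset_index())

def artist_play_averages(daily: pd.DataFrame) -> pd.DataFrame:
    """Daily, weekly and monthly average plays per artist over the study period."""
    per_day = daily.groupby(["artist_id", "date"])["plays"].sum().reset_index()
    out = per_day.groupby("artist_id")["plays"].mean().rename("daily_avg").to_frame()
    week = per_day["date"].dt.to_period("W")
    month = per_day["date"].dt.to_period("M")
    out["weekly_avg"] = (per_day.groupby(["artist_id", week])["plays"].sum()
                                .groupby(level="artist_id").mean())
    out["monthly_avg"] = (per_day.groupby(["artist_id", month])["plays"].sum()
                                 .groupby(level="artist_id").mean())
    return out.reset_index()
```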

45.4 Conclusions In the context of big data, the prediction of music trend is not ideal in the experiment of periodic superposition of the time series of all artists’ playing volume. The periodicity of music data is very weak and the data volume is large, which leads to certain errors in the prediction of the two-month playback volume of each singer. There are still other unexpected events that will lead to unpredictable broadcast volume of some singers. In order to get good experimental results, periodic superposition is not considered in the experiment process of predicting the playback volume of all artists. The actual development trend of the predicted data is relatively stable. If the fluctuation is small, the prediction accuracy of the algorithm is relatively high. On the contrary, if the fluctuation of the data is relatively large or the data has no longterm development trend, it is easy to cause relatively large prediction error, and the larger the prediction distance, the greater the error. The model is also insensitive to its own parameters, which can ensure that the parameters change within a certain range, and the prediction ability remains at a high level. Through statistical analysis of the user’s behavior records of listening, downloading and collecting songs generated by the electronic music platform, and at the same time designing and implementing a music trend prediction model based on big data analysis, the staff who do not understand the prediction algorithm can also find out which artists have the highest amount of music listening in the next stage. These artists represent the music trend in the future for a period of time.

References 1. X. Liu, China’s music market industry ushers in a booming trend. China’s Foreign Trade: English Version 58(5), 3–18 (2022) 2. T. Feng, The inheritance of national music in college music education under the multicultural background. Foreign Language Version: Educ. Sci. 63(3), 5–19 (2021) 3. S. Dai, Z. Zheng, G. Xia, Music Style Transf. Issues: A Position Pap. 56(18), 31–42 (2021) 4. Flatif, Interior design of music schoo1s in Jakarta, in IOP Conference Series: Earth and Environmental Science, vol. 729, no. 1 (2021), pp. 012059–012066 5. J. Guo, Research on innovation and development trend of traditional music teaching in colleges and universities. J. Hubei Open Vocational College 55(14), 16–36 (2020) 6. A. Roper, The birth of German music printing. Early Music 36(12), 19–32 (2022) 7. A. Carot, C. Hoene, H. Busse et al., Results of the fast-music project—five contributions to the domain of distributed music. IEEE Access 63(99), 19–48 (2020) 8. C. Pinkney, S. Robinson-Edwards, Gangs, music and the mediatisation of crime: expressions, violations and validations. Safer Communities 55(17), 20–64 (2022) 9. J.R. Chen, The impact of different genres of music on teenagers. Int. J. Psychol. Stud. 10(20), 21–36 (2020)


10. P. Sanitnarathorn, An analysis of music fan towards music streaming purchase intention of Thailand’s music industry. J. Educ. Train. Stud. 6(3), 78–91 (2022) 11. J. Ren, Pop music trend and image analysis based on big data technology. Comput. Intell. Neurosci. 18(12), 20–41 (2021)

Part IV

Deep Learning and Neural Network

Chapter 46

Construction of Personalized Music Emotion Classification Model Based on BP Neural Network Algorithm Siyu Yan, Chunqiu Wang, and Xucheng Geng

Abstract Music plays an important role in our daily life. Emotion is the essential feature of music. The real purpose of people creating music is actually to convey some inner feelings. In reality, the classification of musical emotions is based on several discrete emotional categories, and given these discrete emotional categories, in some cases, it may not be completely and properly related to people’s emotions. In recent years, artificial neural network is a distributed parallel information processing system. Its characteristics of self-adaptation, self-organization and self-learning make it especially suitable for the classification problem in audio recognition, which provides a new way to solve such a complex pattern classification problem. In this paper, based on the BPNN (BP Neural Network) algorithm, personalized music emotion classification is studied, and a music emotion classification model is constructed. Finally, through experimental analysis, we can get that the training error decreases with the increase of iteration times and reaches the minimum at the end of training. The trained BPNN model is used to predict the emotional categories of songs, and the overall classification accuracy of the samples is 87% by modifying the predicted values. Keywords BP neural network algorithm · Personalized musical emotion · Classification model

46.1 Introduction Personalized music emotion analysis is a research direction of artificial intelligence. Some people raised the related issues of music emotion analysis when artificial intelligence was just started in the 1950s. Music plays an important role in our daily S. Yan · C. Wang (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] X. Geng Finance Office of Huaihua University, Huaihua University, Huaihua, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_46


life. Emotion is the essential feature of music. The real purpose of people to create music is to convey some kind of emotion in their hearts. The system will collect the emotional labels that all users attach to the songs as a data set. The system background uses a specific algorithm model to automatically classify all tagged songs and all untagged songs [1]. Music also has an obvious imprint on the impact of human groups and even nations. The cultures and ways of thinking of countries and nations in the world vary greatly, and culture determines music. And music also affects culture in turn, and then the way of thinking and thinking [2, 3]. Playing a soothing music after a day of physical and mental exhaustion at work will make people calm and release their tense body. Therefore, it can be said that music can play a very positive role in personal growth and personal life. Artificial neural network is a distributed parallel information processing system. Its adaptive, self-organizing and self-learning characteristics make it particularly suitable for classification problems in audio recognition, providing a new way to solve a complex pattern classification problem such as audio classification and recognition [4]. In reality, music emotion classification is based on a number of discrete emotion categories, and given these discrete emotion categories, in some cases, may not be completely and properly associated with people’s emotions. Because the current classification of emotions is based on people’s emotions in the cognitive field. For example, a loud piece of music is associated with emotions of intensity, tension, anger and joy, while music with low loudness is considered to be associated with emotions of tenderness, sadness, solemnity and fear. Fast sound speed is generally regarded as active and exciting, while slow sound speed is often regarded as grand and solemn. However, in the field of music emotion retrieval, when it is necessary to find emotions similar to a certain melody, only the features of the known melody can be extracted, and then the same emotion retrieval can be performed. Therefore, it is necessary to build a computable music emotion classifier [5]. Moreover, even based on discrete emotion category classification, the number of specific emotion categories is also a problem. Based on this, this paper extracts the five feature vectors of pitch, time value, strength, number of notes, and speed that represent the melody characteristics of music based on BPNN algorithm, and constructs the emotion classification model based on BPNN, and classifies music emotion, which has achieved good results.

46.2 Analysis and Extraction of Music Emotional Eigenvalues 46.2.1 MIDI Audio File Analysis The principle of MIDI audio file is to sample the analog waveform signal sent by sound source, then convert the waveform signal into digital signal, and then encode and store the digital signal. MIDI audio files have the characteristics of good sound quality and can be compatible with all kinds of playing software, but its ratio of


volume to sound quality is not satisfactory. The music played when the operating system starts up is in MIDI format, which is often used to store voice [6]. Once the format of the audio file is determined, the file format and music theory knowledge can be analyzed to find the eigenvectors of music emotion; the function used to label the timbre class of the current music is

h(x_i) = \begin{cases} 1, & x_i \in C_1 \\ 2, & x_i \in C_2 \end{cases} \qquad (46.1)

Even if the same piece of music is played at different tempos, it gives people different feelings: a fast tempo feels happy, while a slow tempo feels depressing. The MIDI format has two advantages over waveform-based audio formats. One is that it records not the waveform but the exact playing sequence, and the beat characteristics of each song are fixed; therefore, when extracting the tempo feature from a MIDI file, the corresponding meta event information has to be read from the file content [7, 8]. In music analysis it is then no longer necessary to study the waveform, and the features can be extracted directly. The other is that MIDI files are very small, so they are faster to read and transmit.

46.2.2 Main Track Recognition

The in-depth discussion of MIDI music files above shows that the emotional activity of music is dynamically expressed through the poles of loud and soft, tension and relaxation, excitement and calm [9]. From the perspective of music theory, every piece of music has a main melody and an accompaniment: when a large band plays, the main content of the music is carried by the lead part, while the accompaniment joins the performance intermittently as needed. Music is composed of a series of notes, and each note has attributes such as pitch, intensity and duration, but musical emotion does not change with a single note and must be considered as a whole. Having decided to analyze the continuity of pitch and the discontinuity of notes, we also need a quantification of the difference between notes taken from music theory, because notes differ in pitch and the pitch differences between different note pairs are themselves different. The differences between adjacent pitches are shown in Table 46.1. By finding the rules of emotion recognition, cognitive discrimination formulas and rules are established: changes between loud and soft are reflected in the intensity elements, changes between excitement and calm in the tempo elements, and changes between tension and relaxation in the pitch and interval elements in space and the rhythm elements in time [10]. New music samples can then be judged and their emotional classification determined.


Table 46.1 Difference between adjacent pitches expressed by intervals

Difference between adjacent pitches   Pitch interval                                 Example
Distance 0 halftones                  Pure degree                                    12, 20, 35
Distance 1 halftone                   Minor second degree / One degree increase      31, 68 / 1#1
Distance 2 halftones                  Minor third degree / Second degree increase    13, 20, 40, 57, 66 / #25
Distance 3 halftones                  Greater third degree / Minus four degrees      25, 34, 60 / 1#2

46.2.3 Feature Extraction of MIDI Audio Files From the format analysis, there are also speed, the number of notes and syllables that can be extracted from a message, in which speed refers to the speed of music, which is divided into slow, fast, medium, fast and slow. In the program, we define the following two objects, and through the variables of these two objects, we can solve the later musical emotion feature vector [11]. Emotional information in music signals should be reflected by emotional characteristic parameters, and emotional changes should be reflected by differences in characteristic parameters. Therefore, the emotional recognition of music must extract the corresponding feature parameters to reflect the emotional characteristics of music, and we can get five feature vectors in total: ➀ Pitch pitch The physical characteristic of the sound corresponding to the pitch is the vibration frequency, and the height is determined by the vibration frequency of the pronunciation body, and the two are in direct proportion. The frequency vibration times are more, the pitch is lower, and vice versa. The higher the vibration frequency, the higher the sound, otherwise, the lower the sound. In MIDI files, the low pitch is mainly reflected in the selection of note values and the distribution of note values from 0 to 127. ➁ Duration duration The length of a note or rest also refers to the actual time occupied by a note, indicating the pronunciation duration of the note. In MIDI files, it is mainly determined by the delta-time between two MIDI events: note on and note off. ➂ Intensity intensity Choose 200 pieces of music with different styles from MIDI music library, which can cover all emotions. The size of sound is determined by the amplitude of sound wave, which is also called sound potential. Ask colleagues, family members, friends and


other 30 people to listen, and classify each piece of music according to the feelings in most people’s minds. ➃ Note Number of notes The number of notes contained in the whole piece of music, find out a total of 101 pieces of music that are complete and can express emotional types, among which 41 pieces belong to cheerful emotional types, and 60 pieces belong to lyric emotional types. ➄ Velocity speed Speed refers to the speed of music beat, which is divided into slow, fast, medium, fast and slow. Even if the same piece of music uses different beat speeds, it will give people different emotional experiences and beat fast, which will make them feel happy; If the beat is slow, it will be depressing. By analyzing these characteristics, we can obtain the emotional connotation information of music, but many characteristic elements do not have the same important position, and the emotional connotation is only shaped and expressed by a few characteristic elements that occupy a more important position [12].
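A simplified extraction of these five features from a MIDI file can be written with the mido library as follows. It ignores the main-track selection of Sect. 46.2.2 and treats all tracks equally, so it is only a sketch of the idea; the dictionary keys are illustrative names.

```python
import mido

def midi_melody_features(path: str):
    """Extract average pitch, average duration (ticks), average intensity (velocity),
    note count and tempo (BPM) from a MIDI file."""
    mid = mido.MidiFile(path)
    pitches, velocities, durations = [], [], []
    tempo_bpm = 120.0                      # MIDI default if no set_tempo event
    note_on_time = {}
    for track in mid.tracks:
        tick = 0
        for msg in track:
            tick += msg.time               # delta time in ticks
            if msg.type == "set_tempo":
                tempo_bpm = mido.tempo2bpm(msg.tempo)
            elif msg.type == "note_on" and msg.velocity > 0:
                note_on_time[(msg.channel, msg.note)] = tick
                pitches.append(msg.note)
                velocities.append(msg.velocity)
            elif msg.type in ("note_off", "note_on") and (msg.channel, msg.note) in note_on_time:
                durations.append(tick - note_on_time.pop((msg.channel, msg.note)))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {
        "pitch": mean(pitches),
        "duration": mean(durations),
        "intensity": mean(velocities),
        "note_count": len(pitches),
        "tempo_bpm": tempo_bpm,
    }
```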

46.3 Construction of Music Emotion Classification Model Based on BP Neural Network All kinds of things have their own ways of expression, such as literature with words, graphics with shapes, dance with body language, etc. Similarly, an important part of recognizing music emotion is to find out the characteristics of music. To recognize things, we need to analyze the characteristics of things. For example, to recognize the shape of a figure, we need to analyze the edges, corners, and faces of the figure. To recognize the type of literature, we need to analyze the wording, syntax, language, theme, and so on of literature. The BP network classifier inputs five dimensional vectors of emotional features, which are pitch, length, timbre, speed and strength. The number of nodes in the input layer is 5, and the output is the music’s emotion. According to Hevner’s emotion model, the music’s emotion is divided into 8 types: happy, sad, lyric, passionate, angry, calm, resolute and quiet. The contribution rate reflects the importance of the first feature to music classification. Finally, BP neural network is used to establish music emotion classification model, as shown in Fig. 46.1. BPNN algorithm puts forward an effective algorithm for adjusting the connection weight of hidden layer. The music signal is processed in frames, and then the endpoint of music is detected, which is convenient for the subsequent music feature extraction operation. Generally, the neural network adopts two-layer structure, namely, the input layer and the output layer. The information between the two layers is transmitted forward, and there is a hidden layer between the two layers. The hidden layer accepts


Fig. 46.1 Music emotion classification model

the results of the input layer, transforms them as required, and passes the transformed results to the output layer as the output of the whole neural network. Let the energy of the music signal x(n) be E_n; its calculation formula is

E_n = \sum_{m=-\infty}^{\infty} x(m) \cdot \omega \qquad (46.2)

where ω stands for the window function. A certain frame x_t(m) of the music can be described as

X_t(m) = x_t(m, 1),\; x_t(m, 2) \qquad (46.3)

After smoothing X_t(m) with the wavelet transform, the processed music is

\hat{X}_t(m) = \hat{x}_t(m, 1),\; \hat{x}_t(m, 2) \qquad (46.4)
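For reference, the short-time energy of Eq. (46.2) is usually computed frame by frame with an explicit window. The sketch below uses a Hamming window and the conventional squared form, whereas the printed equation omits the square and the window argument; frame length and hop size are illustrative.

```python
import numpy as np

def short_time_energy(x, frame_len=1024, hop=512):
    """Short-time energy of a music signal, one value per frame (cf. Eq. (46.2))."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.array([])
    w = np.hamming(frame_len)              # plays the role of the window function omega
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.empty(n_frames)
    for k in range(n_frames):
        frame = x[k * hop:k * hop + frame_len]
        energy[k] = np.sum((frame * w) ** 2)
    return energy
```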

In the design of this paper, five emotional feature vectors are the inputs of the neural network, and three types are the outputs of the neural network.
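A small stand-in for the BP classifier described in this section can be built with a one-hidden-layer perceptron trained by back-propagation. The feature values and labels below are placeholders, and scikit-learn's MLPClassifier is used in place of a hand-written BP network; hidden-layer size and other settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: one row per piece with the five features (pitch, duration, intensity,
#    note count, tempo); y: emotion labels. All values are placeholders.
X = np.array([[64, 240, 80, 350, 128],
              [55, 480, 45, 180, 72],
              [70, 120, 95, 420, 140],
              [60, 360, 50, 200, 80]], dtype=float)
y = np.array(["happy", "sad", "passionate", "calm"])

# One hidden layer trained with back-propagation, standing in for the BP network.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                  max_iter=2000, random_state=0),
)
clf.fit(X, y)
print(clf.predict([[62, 300, 85, 330, 120]]))
```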


46.4 Analysis of Experimental Results In this paper, the model trained by BPNN algorithm is used to extract features from the data. Through the study of neural network and the study of actual music emotion analysis, the neural network modeling based on learning algorithm is determined. We regard the output of the first full-connection layer of the personalized music emotion classification model based on BPNN algorithm as the characteristic value of the data, totaling 274. The 274 eigenvalues are used as the input of the RAk model to conduct personalized emotion classification test on music. In the processed data set, the first four users are selected according to the number of music from the largest to the smallest, and their music emotion classification effect is tested. The music digitization of the four users and the number of users concerned are shown in Table 46.2. The emotional attributes of a song may change, and the extraction of acoustic characteristic parameters should follow the short-term stability of audio signals. Therefore, the basic object of the experiment is based on fragments and frames. Take out the music segment that best represents the music emotion from each music. Each music segment is 28 s long, and convert these files into 27.33 kHz, 18-bit, mono wav format. This step is completed manually. The training error obtained after BPNN training is shown in Fig. 46.2. It can be seen from the experimental results in Fig. 46.2 that the training error decreases with the increase of the number of iterations and reaches the minimum at the end of training. The trained BPNN model is used to predict the emotional categories of songs, and the overall classification accuracy of the samples is 87% by modifying the predicted values. Figure 46.3 shows the predicted results of some inspection categories and their comparison with the real results. From the experimental results in Fig. 46.3, it can be seen that, on the whole, the predicted classification results of music emotion classification are not much different from the actual results, and there will be large errors in individual e-commerce. Since the predicted outliers account for a small proportion of the overall data set, it will not have a great impact on the prediction accuracy of the model. This chapter mainly verifies the effectiveness of the music emotion classification model proposed in this paper through simulation experiments. Network neologisms are changeable, and the coverage of emotional corpus is limited, which limits the learning ability of neural network to a certain extent, and misjudges the music mood Table 46.2 Selected users

User     Number of music   Number of primary users   Number of secondary users
User 1   124               4                         7
User 2   138               2                         15
User 3   115               2                         5
User 4   106               3                         10


Fig. 46.2 BPNN error training

Fig. 46.3 Comparison between BPNN prediction results and actual results

of this song. The BPNN model trained in this paper is used to predict the song category. Through the correction of the predicted value, it is proved that the model in this paper has important application value for the classification of music style because of the limited corpus collected from the network.


46.5 Conclusions By finding the law of emotion recognition, cognitive discrimination formulas and rules are established. Among them, the changes of strength and weakness are reflected in strength, the changes of excitement and calmness are reflected in speed, while the changes of tension and relaxation are reflected in pitch and interval in space and rhythm in time. In this paper, a music emotion classification model based on BPNN algorithm is constructed, and the main track and the accompaniment track are distinguished from each other in terms of pitch continuity and note discontinuity. The author thinks that the algorithm can be redesigned by other discrimination methods or by establishing a new model to make the algorithm more accurate and adapt to more music styles and music types. Finally, through experimental analysis, it can be seen that the training error decreases with the increase of iteration times, and reaches the minimum at the end of training. The trained BPNN model is used to predict the emotional categories of songs, and the overall classification accuracy of the samples is 87% by modifying the predicted values. In addition, if we can learn from the method of fuzzy mathematics, it should be more in line with the actual situation to describe music emotion with membership function, but it also brings difficulties in modeling, which needs further study and discussion by future generations.


Chapter 47

Music Main Melody Recognition Algorithm Based on BP Neural Network Model

Peng Tongxin and Chaozhi Cheng
College of Music and Dance, Huaihua University, Huaihua, China

Abstract Multimedia information data has multiplied, and the traditional retrieval of music by manual text annotation can no longer meet the demand. More and more users are accustomed to obtaining music information for entertainment, study and business through the network, which puts forward higher requirements for music information retrieval (MIR). Based on this, this paper proposes a music main melody recognition algorithm based on the back propagation neural network (BPNN) model. Through feature fusion, the model trained on a huge data set achieves a high music classification effect. The original pitch accuracy of the proposed method differs little from that of the comparison method, which indicates that the error rate of the proposed method is low. When dealing with vocal music, music style classification needs to extract the main melody first. Compared with music style classification that refers to the semantics of lyrics, pure melody style classification does not need artificial semantic labeling of data sets, so the classification function for music style can be obtained more simply and quickly.

Keywords BP neural network · Music information retrieval · Main melody recognition

47.1 Introduction

Music, as a common language for human beings to communicate their emotions, can directly reach the depths of people's hearts, stimulate emotional resonance and produce pleasant aesthetics; it has an artistic charm that other artistic behaviors cannot replace. Under the wave of Internet technology, computer technology makes the creation of digital music more convenient and its spread more rapid [1]. In the huge music data environment, there is no way to simply


rely on manual management as before. Music, as one of the most traditional forms of entertainment, shows its incomparable charm and energy with the help of the Internet and has become an indispensable part of people's lives [2]. More and more users are accustomed to obtaining music information for entertainment, study and business through the network, which puts higher demands on MIR. Main melody extraction is a hot and difficult task in the field of MIR [3]. Considering that music signals and speech signals are both sound signals, the analysis of sound signals is generally similar, requiring signal preprocessing, feature extraction and model recognition [4]. Therefore, this article proposes to use a speech signal recognition framework to recognize the music signal.

Main melody extraction technology refers to automatically identifying the pitch sequence of the melody in polyphonic music through algorithms [5]. It is a hot research topic in the field of music information retrieval, plays an extremely important role in digital music recommendation algorithms and music style classification algorithms, and is also a research difficulty in the field of MIR [6]. In order to find a piece of music, users must accurately know the title, author, performer or other related text descriptions; otherwise they can only spend a lot of time browsing all the pieces in a music category one by one [7]. Therefore, according to the characteristics of music retrieval, it is necessary to develop a convenient and practical man-machine interface to realize content-based music retrieval on the Internet [8]. When dealing with vocal music, music style classification needs to extract the main melody first. Compared with music style classification with reference to the semantics of lyrics, pure melody style classification does not need artificial semantic labeling of data sets, so the classification function for music style can be obtained more simply and quickly [9]. Based on this, this article puts forward a main melody recognition algorithm based on the BPNN model. Through feature fusion, the model trained on huge data sets achieves a high music classification effect.

47.2 Methodology

47.2.1 Digital Processing of Acoustic and Speech Signals

Frequency and amplitude are two basic characteristics of waves, and sound waves are no exception. The frequency of a sound is related to its pitch, and the amplitude is related to its loudness. A complex tone generally contains not just one harmonic but several, which are named according to their multiples of the fundamental frequency. In a sound wave, the energy of each harmonic sometimes varies greatly, and only some harmonics are obvious, while the rest are suppressed or even disappear. Sound length describes the duration of a single sound, which is positively related to the vibration duration of the sound source. Timbre describes the quality of sound, which refers

47.2 Methodology 47.2.1 Digital Processing of Acoustic and Speech Signals Frequency and amplitude are two basic characteristics of waves, and so are sound waves. The frequency of sound is related to the pitch of sound, and the amplitude is related to the loudness of sound. In a complex tone, there is generally not only one harmonic, but several harmonics. According to the difference with the multiple of pitch frequency, it is called subharmonic, subharmonic, etc. In a sound wave, the energy of each harmonic sometimes varies greatly, and only some harmonics are obvious, while the rest will be suppressed or even disappeared. The sound length describes the duration of a single sound, which is positively related to the vibration duration of the sound source. Timbre describes the quality of sound, which refers


to the distinctive characteristics of different sounds at the acoustic level, mainly determined by the proportion of overtones [10]. Loudness describes the intensity of sound, which is positively related to the vibration amplitude of the sound source.

When using modern computers to process digital signals, it is impossible to measure and calculate infinitely long signals. Moreover, most meaningful speech signals are time-varying, and common digital signal processing methods cannot directly process such time-varying signals. However, it is generally believed that a speech signal is stationary, or can be approximately regarded as stationary, over a short period of time, which is called short-time stationarity. Feature parameter extraction is a key step in the construction of a recognition system. As the input of the matching model, its quality is directly related to the recognition accuracy of the system. Research on the model of the musical tone signal is the key to obtaining a high recognition rate in a musical tone recognition system. Music is an artistic expression formed by the combination of sounds through a series of regular changes of strength and speed. From the point of view of audio signal processing, a music signal is a mixed audio signal, and the melody is considered to come from a single sound source, which is usually the human voice or a leading instrument [11].

The purpose of windowing digital signals is to obtain short-time digital signals. A speech signal covered by a short-time window is called a frame, and its length is called the frame length. The short-time window moves smoothly along the speech signal, and the distance of each movement is called the frame shift. The choice of frame length and frame shift is sometimes related to the analysis method to be used in the next step, and it will have an impact on the analysis results. In the time direction, notes with different frequencies are arranged in sequence with rhythm as a reference to form a melody. Melody is the most important element of music. In a continuous melody, the curve formed by the pitch sequence is usually called the melody line. The audio signal classification method is shown in Fig. 47.1.

The precision of analog-to-digital conversion depends on the number of bits of the analog-to-digital converter, and the operational efficiency of the system is related to the number of bits of the converter and the sampling frequency. The larger the number of bits and the higher the sampling frequency, the larger the amount of data after the music is digitized, which increases the computational load of the system and reduces its efficiency. Starting with the analysis of the unique properties of musical sound, and combining various recognition models from speech recognition, a system suitable for musical tone recognition is analyzed and designed. Based on the physical characteristics and musicality of musical sound, the main frequency value and note name of the musical tone signal are identified, completing the design of the whole identification system. Windowing the signal is equivalent to truncating the source signal, and according to the theory of digital signal processing, signal truncation causes spectrum energy leakage, which inevitably leads to aliasing and then causes errors in the analysis results [12]. In addition, a short-time window function is used to multiply the digital signal in the time domain.


Fig. 47.1 Audio signal classification method
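The framing and windowing procedure described above can be sketched in a few lines. The following is only an illustration of the idea of frame length, frame shift and windowing; the Hamming window, the parameter values and the test tone are assumptions made for the example, not settings taken from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=1024, frame_shift=512, window=np.hamming):
    """Return an array of windowed frames of shape (num_frames, frame_len)."""
    win = window(frame_len)
    num_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([
        x[i * frame_shift : i * frame_shift + frame_len] * win
        for i in range(num_frames)
    ])
    return frames

fs = 16000                                  # assumed sampling rate
t = np.arange(fs) / fs
test_tone = np.sin(2 * np.pi * 440 * t)     # 1-s test tone instead of real music
print(frame_signal(test_tone).shape)        # -> (num_frames, 1024)
```

Each windowed frame can then be passed to a spectral analysis or feature-extraction step, exactly as the short-time stationarity argument in the text suggests.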

47.2.2 Music Main Melody Recognition Algorithm

From the perspective of the frequency domain, a single note has a stationary period, which is notably different from the situation in speech recognition. The main lobe width is the frequency range occupied by the main lobe in the spectrum of the window function [13]. The width of the main lobe is related to the frequency resolution of the window function. The narrower the main lobe, the smaller the frequency range it covers, and the easier it is to distinguish signals that are close in frequency. Conversely, the wider the main lobe, the greater the possibility that the main lobes of two signals with close frequencies overlap. The attenuation speed of the sidelobes refers to the speed at which the amplitude of each sidelobe decreases as the frequency distance from the main lobe increases. When there is strong noise in the signal to be analyzed, it is often hoped that the sidelobes of the noise attenuate rapidly, to minimize the influence of the noise on the main signal.

In order to speed up the convergence of the network, an adaptive learning rate method is adopted to improve the BPNN:

\Delta X = lr \cdot \frac{\partial E}{\partial X}    (47.1)

\Delta X(k + 1) = mc \cdot \Delta X(k) + lr \cdot mc \cdot \frac{\partial E}{\partial X}    (47.2)

where lr is the learning rate and mc is the momentum factor. After constructing the structure of the BPNN, the sample set given to train the network contains input vectors and the expected output information. In this article, through the analysis of the musical and physical characteristics of the music signal, after signal preprocessing, an appropriate feature parameter extraction algorithm is adopted to extract the feature parameters of the music signal. Suppose there are l samples randomly and independently drawn from an unknown probability distribution to form a training sample set:

\{(x_i, y_i),\ i = 1, 2, 3, \ldots, l\}, \quad x_i \in \mathbb{R}^d    (47.3)

Among them, y_i ∈ {+1, −1} is the category identifier of the two types of samples. If x_i belongs to the first category, the output value is positive. Construct a function that correctly divides as many samples as possible and maximizes the classification interval:

\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i    (47.4)

\text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i    (47.5)

\xi_i \ge 0, \quad i = 1, 2, 3, \ldots, l    (47.6)

In the formula, C is the penalty parameter; the larger its value, the greater the penalty for classification errors. According to the signal analysis method used, speech signal analysis can be divided into time-domain analysis and transform-domain analysis [14]. The time-domain waveform is the most intuitive representation of a speech signal and the first one people come into contact with. Time-domain analysis of speech signals has the advantages of simple operation, clarity and clear physical meaning, but it cannot reflect the essence of the speech signal. A speech signal is formed by the modulation of the vocal cords at different frequencies, and important perceptual characteristics are reflected in its power spectrum, so transform-domain analysis is more effective.

In order to reconcile pitch estimation and voice detection, this article adopts a joint detection method and additionally sets up an auxiliary network module for song detection, connecting the two tasks in the main network so that they can share modules. The auxiliary network is not built from new modules; instead, the existing main network is retained and a branch is added specifically for song detection [15]. For musical signals, the short-time zero-crossing rate can also be used for judging the endpoints of musical signals. The zero-crossing rate is a good short-term characteristic, defined as the number of times the signal crosses the zero level per unit time. It changes slowly in silent sections and sharply in musical sections, so we can judge which range the signal is in accordingly.

As a key link in music recognition, feature extraction is sometimes called front-end processing. Its essence is to measure the distance between characteristic parameters so as to find the inherent nature of the music. In musical tone recognition, the commonly used characteristic parameters are those that can reflect the envelope of the short-time spectrum. Through model training, the inherent laws within the musical data are obtained, that is, a well-trained template is obtained. For the training process, which determines the recognition rate of the whole system, it is necessary to meet the needs of training with a large amount of data and an efficient learning speed.
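To make the adaptive-learning-rate update of Eqs. (47.1)-(47.2) concrete, the following minimal sketch applies the momentum-style increment to a toy quadratic error surface. The growth/shrink heuristic for the learning rate, the momentum value 0.9 and the test function are all assumptions chosen only to exercise the update rule; they are not the authors' settings.

```python
import numpy as np

def train_step(w, delta_prev, grad, lr, mc=0.9):
    # Weight increment in the spirit of Eq. (47.2): previous increment scaled
    # by the momentum factor mc plus the current gradient scaled by lr * mc.
    delta = mc * delta_prev + lr * mc * grad
    return w - delta, delta

def adapt_lr(lr, err, err_prev, grow=1.05, shrink=0.7):
    # Simple adaptive rule (an assumption): grow lr while the error falls,
    # shrink it when the error rises.
    return lr * grow if err < err_prev else lr * shrink

# Toy quadratic error surface E(w) = ||w||^2, just to exercise the update.
w, delta, lr, err_prev = np.array([2.0, -3.0]), np.zeros(2), 0.1, np.inf
for _ in range(300):
    grad = 2.0 * w                      # gradient of E at the current w
    w, delta = train_step(w, delta, grad, lr)
    err = float(w @ w)
    lr = adapt_lr(lr, err, err_prev)
    err_prev = err
print(err_prev)                         # error should end up well below its initial value of 13
```

In a real BPNN the same update would be applied to each weight matrix, with the gradient supplied by back propagation rather than by the toy quadratic used here.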

47.3 Result Analysis and Discussion

Signal sparsity is widely used in sound and image compression, wireless communication and so on. Intuitively, it means that the energy of the signal is relatively large at relatively few moments or parts, and the main information of the signal is contained in these parts; at other moments or parts the value is zero or the amplitude is very small, and even if they are ignored, the lost information is negligible or has only a small impact on the signal. The performance of this method is compared with that of a Deep Belief Network (DBN). The result of pitch estimation in the presence of fundamental frequency or harmonic offset is shown in Fig. 47.2. From Fig. 47.2, it can be seen that the accuracy of pitch estimation decreases as the detuning increases. Compared with the traditional method, the proposed BPNN method obtains a slightly higher original pitch accuracy. Multi-tone music signals are formed by the superposition of the sound waveforms generated by all instruments in the recording, and often these instruments are played at the same time.

Fig. 47.2 Fundamental frequency or harmonic offset


Fig. 47.3 Recall of speech frames of different classifiers

Model training is particularly important in music recognition, and it requires users to provide a large quantity of original data. Therefore, before the research on musical tone recognition is started, a musical tone database should be established. The database includes recordings of piano syllables in different environments; each file must be distinct from the others and should reflect the basic situation of actual music data in a balanced way, otherwise the training effect is unlikely to reach a satisfactory level. The test results for the recall of speech frames of different classifiers are shown in Fig. 47.3.

The BPNN has continuous-time nonlinear dynamics, large-scale parallel distributed processing, global network behavior, high robustness, and learning and association ability. Moreover, like general nonlinear dynamic systems, it is unpredictable, unbalanced, attractive, dissipative, irreversible, difficult to analyze, self-adaptive and widely connected. Therefore, in essence, the BPNN is a nonlinear, very large-scale, continuous-time adaptive information processing system. Figure 47.4 shows the original pitch accuracy of this method on different databases. When the learning rate is too large, the weight correction of the system may not be adjusted optimally and the system becomes unstable; when the learning rate is too small, the training cycle becomes long and learning too slow, which affects the system's performance. The errors of different algorithms are shown in Fig. 47.5. The original pitch accuracy of the proposed method differs little from that of the comparison methods. The BPNN shows great flexibility and adaptability in processing nonlinear signals and systems whose data cannot be described by rules and formulas.


Fig. 47.4 The original pitch accuracy of this method on different databases

Fig. 47.5 Error situation of different algorithms

47.4 Conclusions

With the help of the Internet, music shows its unparalleled charm and energy and has become an indispensable part of people's lives. More and more users are accustomed to obtaining music information for entertainment, study and business through the Internet, which puts higher demands on MIR. A main melody recognition algorithm based on the BPNN model is proposed. Through feature fusion, the model trained on a huge data set achieves a high music classification effect. Based on the fact that the vocal cords and vocal tract change only gradually when human beings pronounce vowel syllables, this algorithm puts forward the hypothesis that the cross-correlation


value of harmonic energy of adjacent speech frames is large when vowel syllables are pronounced, and verifies this hypothesis by using a large quantity of corpora. BPNN shows great flexibility and adaptability when dealing with nonlinear signals and systems whose data cannot be described by rules and formulas. Compared with previous algorithms, this algorithm is simpler to implement, and avoids complex learning algorithms and mathematical models. It can also be applied to blind multispeaker separation, speech noise reduction and other fields.

References

1. T. Ziemer, P. Kiattipadungkul, T. Karuchit, Music recommendation based on acoustic features from the recording studio. J. Acoust. Soc. Am. 148(4), 2701–2701 (2020)
2. A. Xambó, A. Lerch, J. Freeman, Music information retrieval in live coding: a theoretical framework. Comput. Music J. 42(4), 9–25 (2019)
3. M. Srinivasa, S.G. Koolagudi, Content-Based Music Information Retrieval (CB-MIR) and its applications toward the music industry: a review. ACM Comput. Surv. (CSUR) 51(3), 1–46 (2018)
4. S. Panwar, P. Rad, K. Choo et al., Are you emotional or depressed? Learning about your emotional state from your music using machine learning. J. Supercomputing 75(6), 2986–3009 (2019)
5. N. Kroher, J.-M. Díaz-Báez, Audio-based melody categorization: exploring signal representations and evaluation strategies. Comput. Music J. 41(4), 64–82 (2018)
6. X. Wang, Research on the improved method of fundamental frequency extraction for music automatic recognition of piano music. J. Intell. Fuzzy Syst. 35(3), 1–7 (2018)
7. H.B. Lima, C. Santos, B.S. Meiguins, A survey of music visualization techniques. ACM Comput. Surv. 54(7), 1–29 (2021)
8. M. Mueller, A. Arzt, S. Balke et al., Cross-modal music retrieval and applications: an overview of key methodologies. IEEE Signal Process. Mag. 36(1), 52–62 (2018)
9. B. Kostek, Music information retrieval—the impact of technology, crowdsourcing, big data, and the cloud in art. J. Acoust. Soc. Am. 146(4), 2946–2946 (2019)
10. H. Nordström, P. Laukka, The time course of emotion recognition in speech and music. J. Acoust. Soc. Am. 145(5), 3058–3074 (2019)
11. J. Kocinski, E. Ozimek, Logatome and sentence recognition related to acoustic parameters of enclosures. Arch. Acoust. 42(3), 385–394 (2017)
12. A. Baro, P. Riba, J. Calvo-Zaragoza et al., From optical music recognition to handwritten music recognition: a baseline. Pattern Recogn. Lett. 123(5), 1–8 (2019)
13. Y. Dong, X. Yang, X. Zhao et al., Bidirectional Convolutional Recurrent Sparse Network (BCRSN): an efficient model for music emotion recognition. IEEE Trans. Multimedia 21(12), 3150–3163 (2019)
14. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Proc. 11(7), 884–891 (2017)
15. G. Yu, Emotion monitoring for preschool children based on face recognition and emotion recognition algorithms. Complexity 2021(5), 1–12 (2021)

Chapter 48

Design of Piano Score Difficulty Level Recognition Algorithm Based on Artificial Neural Network Model

Zhaoheng Chen and Chun Liu
College of Music and Dance, Huaihua University, Huaihua 418008, Hunan, China

Abstract At present, the difficulty classification of piano music scores is mainly done manually, which is inefficient. In order to solve the problems that it is difficult to assign difficulty grade labels to massive numbers of music scores manually and that existing score classification methods depend heavily on human subjectivity, a piano audio signal recognition algorithm based on an artificial neural network model is proposed. This method extracts music features that can be used for recognition, expands the previous recognition feature set, and then realizes machine-based output of audio feature results and computer-based recognition of piano score difficulty level. Finally, simulation experiments verify the feasibility and effectiveness of the proposed algorithm. The simulation results show that the recognition accuracy of the algorithm can reach 95.17%. The overall performance of this method achieves the expected results and meets certain practical requirements, laying a foundation for future research on piano score difficulty level recognition.

Keywords Artificial neural network · Piano score · Difficulty level recognition

48.1 Introduction

With the continuous development of the Internet and computer technology, music data has exploded; tens of thousands of piano music score resources can be purchased from the Internet, and many music websites even provide free download services [1]. At the same time, with the continuous improvement of living standards, people have higher requirements for their own cultivation, and more and more people begin to learn musical instruments to improve their musical literacy [2]. Although there are a large quantity of piano score resources on the Internet, their difficulty levels


are different. For amateur and junior instrument learners, it is difficult to effectively find a score that matches their learning difficulty due to the lack of professional knowledge and guidance [3]. For professional piano learners, although there is a fixed set of advanced teaching materials in the learning process, practicing the same music for a long time is too monotonous and boring, which is not conducive to making personalized learning plans for individuals to increase learners’ learning enthusiasm and improve learning efficiency [4]. With the improvement of computer performance, audio processing technology has also made great progress. Piano audio signal mainly includes pitch and overtone [5]. Among them, high audio frequency is determined by pitch. The recognition of piano audio signal is mainly to detect the pitch period. At present, the classification and recognition of musical instruments in music signals has become a hot topic in the field of music classification and recognition, and scholars and experts have done a lot of research on music retrieval [6]. Although many studies have focused on music category recognition and music emotion classification, there are few studies on music score difficulty level recognition. At present, the difficulty classification of piano score is mainly done by manual method. For a large quantity of digital music scores, it will be a time-consuming and labor-intensive project to judge the difficulty level manually. In addition, it is difficult to grasp the difference between each difficulty level stably and reliably by artificial subjective judgment, especially for multi-category problems [7]. In order to provide the difficulty grade label for the massive digital piano scores shared in the network, and at the same time avoid consuming a lot of human working time and avoiding the inconsistency of subjective judgment of the difficulty grade, it will be an effective strategy to design an algorithm that can automatically identify the difficulty grade of the score according to the relevant theories of machine learning and pattern recognition [8]. In recent years, ANN technology has been used in the research of audio signal separation, which has become an increasingly popular topic in the field of audio signal processing and promoted the development of sound source separation based on ANN technology. ANN has the ability of learning, memory, induction and information extraction similar to that of human brain [9]. In the development of ANN, various nervous systems of living things have been simulated and described, and various network models have emerged [10]. Providing difficulty grade labels for piano music scores existing in the network will greatly improve the efficiency of finding suitable difficulty grade music scores and improve the user experience of music websites. In this article, in order to solve the problems that it is difficult to assign a large quantity of piano score difficulty grade labels manually, the existing classification methods of piano score difficulty are highly dependent on people, and the existing identification algorithms of piano score difficulty grade are difficult to meet the application requirements, an identification algorithm of piano score difficulty grade is designed based on ANN model, which can effectively identify and classify piano score difficulty grade.


48.2 Methodology

48.2.1 Piano Audio Signal Recognition

Judging the difficulty level of a piano music score is a relatively complicated task. At present, the difficulty level of a piano score still largely depends on the subjective judgment of professionals, and it is difficult to define completely objective standards and criteria for judging score difficulty. Judging the difficulty level of the massive number of digital music scores on the network one by one would be a time-consuming and labor-intensive project [11]. At the same time, there are many factors that affect human subjective judgment, and it is difficult for subjective perception to always accurately grasp the difference between difficulty levels. Therefore, it is of practical significance to construct an algorithm for identifying the difficulty level of piano music scores. In order to solve the above problems, this article proposes a piano audio signal recognition algorithm based on an ANN model.

On a modern piano, sound is produced when the player presses the keys and the felt hammer strikes the strings through mechanical conduction, so the vibration of the strings is excited by the felt hammer. Scales are important on the piano, and the key to playing scales is to make all the notes sound even; if the player's touch dynamics vary, the intended timbre and melody may not be produced. On the piano, the nearest black or white keys sharing the same note name are an octave apart, which is the pitch distance between two notes of different pitch. Starting with the analysis of the unique properties of musical sound, and combining them with an ANN model, this article analyzes and designs a model suitable for piano score difficulty level recognition. Based on the physical characteristics and musicality of the piano, the main frequency value and note name of the piano music signal are identified, and the design of the whole piano score difficulty level identification model is completed [12]. In the algorithm, the signal first passes through a high-pass filter to remove 50 Hz mains hum, and is then processed in parallel. Based on the ANN model, efficient and accurate short-time Fourier analysis is selected as the core idea to extract the pitch, trigger time, pronunciation duration and end time of the audio signal.
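As a rough illustration of the short-time Fourier analysis mentioned above, the sketch below computes an STFT of a toy recording, takes the peak frequency bin of each frame as a coarse pitch estimate, and uses a simple energy threshold as a stand-in for the note trigger time. Window size, threshold and test signal are arbitrary assumptions; this is not the authors' pipeline.

```python
import numpy as np
from scipy import signal

fs = 44100
t = np.arange(int(0.5 * fs)) / fs
note = np.concatenate([np.zeros(4000), np.sin(2 * np.pi * 440 * t)])  # silence, then A4

f, frame_times, Z = signal.stft(note, fs=fs, nperseg=2048, noverlap=1024)
mag = np.abs(Z)

pitch_per_frame = f[np.argmax(mag, axis=0)]           # dominant frequency per frame (coarse)
energy = mag.sum(axis=0)
onset_idx = int(np.argmax(energy > 0.1 * energy.max()))  # first frame above the threshold
print("onset (s):", frame_times[onset_idx],
      "pitch (Hz):", pitch_per_frame[onset_idx + 2])   # roughly 440 Hz, limited by bin width
```

A production system would refine both estimates (e.g., by parabolic interpolation of the spectral peak and a more robust onset detector), but the sketch shows where pitch, trigger time and duration information come from.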

48.2.2 Algorithm Design of Piano Score Difficulty Level Recognition

Difficulty level identification of a piano score means assigning a difficulty level label to a specific score, so it is usually treated as a pattern classification problem. Difficulty-related features are defined and extracted from the symbolic score, and difficulty level identification is realized in the original or a transformed feature space based on optimization criteria such as structural risk minimization or empirical risk minimization [13]. In the coordinates of music, the horizontal axis is the alternation of rhythm and the vertical axis is the level of pitch; in the flow of music, it is very important to correctly hear and discriminate pitch levels. From the perspective of signal processing, piano music itself is a combination of different frequency components at different times. Even when the same song is played by different instruments, it sounds very different and has very distinct characteristics in both the time domain and the frequency domain. The piano is a stringed instrument, and its musical sound is produced by the hammers striking the strings when the keys are pressed; on the physical level, this process can be described by the lateral free vibration of strings.

Because difficulty identification of piano scores is a relatively new research problem, existing symbolic music features are rarely used directly for difficulty level identification. Therefore, this article first defines eight features closely related to score difficulty and takes these eight difficulty-related features together with semantic features as the feature space. Then, a feature selection algorithm is used to assign weights according to the importance of the features, and the 10 features with the largest selection values are used as the feature space for subsequent difficulty level recognition. After analyzing the natural distribution relationship between features and difficulty levels with scatter plots, this article considers a nonlinear classification algorithm more suitable for recognizing score difficulty levels. At the same time, in order to meet the requirements of accurate classification, good robustness to noise and outliers, strong generalization ability in practical applications, and independence from the number of categories, an ANN algorithm is proposed to identify the difficulty level of piano scores, avoiding the shortcoming that traditional algorithms cannot use the prior knowledge in the training data. The framework of the piano score difficulty level recognition algorithm based on the ANN is shown in Fig. 48.1.

The analysis and extraction of piano music features play a vital role in automatic identification of piano score difficulty level. The soundness of the extracted objects, the feasibility of the extraction method and the accuracy of the extracted results directly affect the automatic identification effect. For this article, the extracted results further affect the analysis of audio signals in .wav format and their subsequent discrimination, comparison and evaluation. An ANN can learn audio data efficiently and quickly, extracting specific audio features, and has representation learning ability. In this article, various types of piano music spectra are used as input images of the ANN, and piano music is classified through image recognition. Firstly, an ANN classification model is constructed with the TensorFlow framework; then the ANN is trained under different conditions with note spectra as input images, and the classification accuracy of the ANN trained on the two kinds of input samples is compared. For the piano score difficulty level recognition algorithm, the Delta learning rule is used to train the recognition network, which transforms the model learning problem into a model optimization problem, and gradient descent is used to complete the learning.
Fig. 48.1 Frame diagram of piano score difficulty level recognition algorithm based on ANN

Let the time-domain signal of the music waveform be x(n), and let the i-th frame of the music signal obtained after framing with the window function w(n) be y_i(n). Then y_i(n) satisfies the following formula:

y_i(n) = w(n) \cdot x((i - 1) \cdot inc + n), \quad 1 \le n \le L, \ 1 \le i \le fn    (48.1)

where w(n) is the window function, y_i(n) is the i-th windowed frame, L is the frame length, inc is the frame shift length, and fn is the total number of frames after framing. The short-term average energy of any frame y_i(n) of the music signal is calculated as:

E(i) = \sum_{n=0}^{L-1} y_i^2(n)    (48.2)
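A small sketch of Eqs. (48.1)-(48.2) follows: the signal is windowed into frames of length L with frame shift inc, and the short-term energy of each frame is summed. The Hamming window, the parameter values and the test tone are illustrative assumptions.

```python
import numpy as np

def short_term_energy(x, L=512, inc=256):
    w = np.hamming(L)
    fn = 1 + (len(x) - L) // inc                 # total number of frames
    E = np.empty(fn)
    for i in range(fn):                          # i runs over the frames of Eq. (48.1)
        y_i = w * x[i * inc : i * inc + L]       # windowed frame, Eq. (48.1)
        E[i] = np.sum(y_i ** 2)                  # short-term energy, Eq. (48.2)
    return E

x = np.sin(2 * np.pi * 220 * np.arange(8000) / 8000)   # toy 1-s tone at an 8 kHz rate
print(short_term_energy(x)[:5])
```

The resulting energy curve is one of the per-frame descriptors that can enter the feature vector discussed next.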

After feature extraction, some features take larger values while others take smaller values, sometimes differing by more than two orders of magnitude. In order to avoid features with large values dominating the overall classification, this article uses the Min–Max normalization method to normalize the feature vector to [0, 1]. The formula is as follows:

x^* = \frac{x - \min}{\max - \min}    (48.3)

Among them, min and max represent the minimum and maximum values of a feature, respectively, and x^* represents the feature value after normalization. For about 70% of the note data frames, the first maximum peak point corresponds to the pitch period after three-level clipping and calculation of the autocorrelation function. A small number of signals are disturbed by harmonic (frequency-doubled) components due to the influence of resonance peaks, which shifts the maximum peak point. In order to solve the problem of maximum peak deviation, the peak ratio is evaluated by shifting the data frame, and the correct peak point is obtained. Using time n as the abscissa and ω as the ordinate, the two-dimensional image can clearly express the pitch at each moment and also yields the corresponding information on the trigger, duration and release of each note. This representation reflects the dynamic spectral characteristics of audio signals, and its time and frequency resolution are determined by the characteristics of the window function used.

In this article, difficulty identification of music scores is treated as a pattern classification problem: difficulty-related features are defined and extracted from the symbolic score, and the difficulty level is recognized using the classification idea. From supervised learning on the training data, a projection matrix that increases the difficulty discrimination of piano scores is obtained, giving a new distance measure. Using this measure to improve the decision function, the ANN algorithm model is established. Finally, the optimal combination of parameters is obtained by the algorithm, and the classification model is established to realize the identification of piano score difficulty level.
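For completeness, the Min-Max scaling of Eq. (48.3) can be written in a couple of lines; the sample matrix below is made up, and the scaling is applied column-wise so that every feature ends up in [0, 1].

```python
import numpy as np

def min_max_normalize(X):
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)                  # Eq. (48.3), applied per feature

X = np.array([[3.0, 200.0], [5.0, 50.0], [4.0, 125.0]])
print(min_max_normalize(X))                      # each column now spans [0, 1]
```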

48.3 Result Analysis and Discussion

This section uses MIDI-format score files containing score information such as pitch, beat, time, chord, speed and channel as the experimental data source. Even if the fingering, expression and other annotation information in the score is lost, the information contained is enough to identify the difficulty level of a digital piano score. Moreover, MIDI files are small and easy to obtain, and many music websites provide free download services. In the experiment, the note-length feature is extracted according to the time increments in the MIDI file, and the length of each note is quantized according to a preset threshold, thus extracting the note-length feature of the whole piece. Firstly, the algorithm is trained; the training process mainly includes three steps: (1) the music files are divided into two parts, a training set and a test set, and the audio data are divided accordingly to obtain spectrum feature maps; (2) the spectrum features of the training samples are compressed and fed to the neural network model for training, so as to obtain the weights of the neural network; (3) finally, the trained neural network model is used to classify the test set. The training results of the algorithm are shown in Fig. 48.2.

Fig. 48.2 Algorithm training results

In this article, before difficulty identification, the collected MIDI scores are quantized, the extracted feature data are normalized, and preprocessing to address data imbalance is carried out. In single-melody piano music, pitch is determined by the fundamental frequency, while the harmonics affect timbre. In previous studies, the waveforms near the first five peaks in the spectrum and the related amplitude matching relationships were often ignored, but these directly affect the timbre of musical instruments, so they deserve full attention. In order to verify the effectiveness and feasibility of the piano score difficulty level identification algorithm in this article, and to make the experimental results more realistic, accurate and objective, RMSE is selected as the evaluation index of the difficulty level identification and classification model. Different algorithms are used to carry out experiments on the sample set, and the results shown in Fig. 48.3 are obtained.

Fig. 48.3 Comparison of calculation errors of different algorithms

It can be seen from the comparative experiments of several different algorithms that the error rate of this algorithm is the lowest. This shows that the piano score difficulty level identification algorithm in this article has certain advantages compared with the other two algorithms. In order to further verify the broad practicability and generalization performance of the newly proposed features and the subsequent classification algorithm, and to match the classification of score difficulty levels in practical applications, this section collects data sets as experimental data sources and carries out preprocessing. After that, the effectiveness of the new features proposed in this article is verified by algorithm comparison experiments. The short-time autocorrelation algorithm and the short-time amplitude difference algorithm are compared with the algorithm in this article, and the experimental results are shown in Fig. 48.4.

Fig. 48.4 Comparison of response time of different algorithms


It can be seen that, compared with the other two algorithms, this algorithm achieves satisfactory results in a short time. Adding note features to the frequency spectrum greatly improves the recognition accuracy for piano audio signals, and the experimental results further show that the ANN-based piano music classification method proposed in this article is feasible and effective. When this algorithm is used to recognize the difficulty level of a piano score and the rhythm of the music is slow, that is, fewer than 2 notes per second, the recognition accuracy of both the traditional autocorrelation method and this method is relatively high; when the rhythm accelerates and the number of notes per second is in the range of 2–4, the recognition accuracy of the traditional method tends to decline, but the difficulty level recognition algorithm in this article still maintains a high recognition accuracy. In order to directly verify the practicability of using this method to identify the difficulty level of piano scores, this article compares three different identification methods in experiments, obtaining the trend chart of recognition accuracy shown in Fig. 48.5. In this article, difficulty level identification of piano scores is regarded as a classification problem, and an automatic and accurate algorithm is designed. On the data set, the performance of the classical algorithms and the proposed algorithm is analyzed and compared, which shows that it is feasible and reliable to identify piano score difficulty levels with the ANN-based identification algorithm.

Fig. 48.5 Comparison of recognition accuracy of different algorithms


48.4 Conclusions

The determination of piano score difficulty level is a relatively new research field. Aiming at the problem that the difficulty classification of piano music scores is mainly done manually and is inefficient, a piano audio signal recognition algorithm based on an artificial neural network model is proposed. Different from traditional work that treats score difficulty as a regression problem, this article models it directly as a classification problem based on an artificial neural network and gives a corresponding piano music classification method. The simulation results show that the recognition accuracy of the proposed piano score difficulty level recognition algorithm can reach 95.17%, with small error and high efficiency. This further verifies the feasibility and efficiency of the algorithm and shows that it is feasible and reliable to identify the difficulty level of piano music scores using the neural network model in this article. By automatically assigning difficulty level labels to the massive number of digital music scores on the network, this algorithm will greatly improve the user experience of music websites and promote the spread and development of music.

References

1. B. Sun, Using machine learning algorithm to describe the connection between the types and characteristics of music signal. Complexity 2021, 1–10 (2021)
2. C. Ren, Y. Jing, D. Zha et al., Spoken word recognition in noise in Mandarin-speaking pediatric cochlear implant users. Int. J. Pediatr. Otorhinolaryngol. 113, 124–130 (2018)
3. M. Mueller, A. Arzt, S. Balke et al., Cross-modal music retrieval and applications: an overview of key methodologies. IEEE Signal Process. Mag. 36(1), 52–62 (2018)
4. Y. Dong, X. Yang, X. Zhao et al., Bidirectional Convolutional Recurrent Sparse Network (BCRSN): an efficient model for music emotion recognition. IEEE Trans. Multimedia 21(12), 3150–3163 (2019)
5. H. Nordström, P. Laukka, The time course of emotion recognition in speech and music. J. Acoust. Soc. Am. 145(5), 3058–3074 (2019)
6. Z. Tao, X. Zhou, Z. Xu et al., Finger-vein recognition using bidirectional feature extraction and transfer learning. Math. Probl. Eng. 2021(1), 1–11 (2021)
7. S.B. Thanga, S.I. Shatheesh, Aging facial recognition for feature extraction using adaptive fully recurrent deep neural learning. Comput. J. 7, 7 (2022)
8. P. Shi, Q. Qi, Y. Qin et al., Intersecting machining feature localization and recognition via single shot multibox detector. IEEE Trans. Industr. Inf. 17(5), 3292–3302 (2021)
9. R. Jenke, A. Peer, M. Buss, Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5(3), 327–339 (2017)
10. J. Singha, R.H. Laskar, Hand gesture recognition using two-level speed normalization, feature selection and classifier fusion. Multimedia Syst. 23(4) (2017)
11. Y.H. Chin, Y.Z. Hsieh, M.C. Su et al., Music emotion recognition using PSO-based fuzzy hyper-rectangular composite neural networks. IET Signal Proc. 11(7), 884–891 (2017)
12. X. Wang, Research on the improved method of fundamental frequency extraction for music automatic recognition of piano music. J. Intell. Fuzzy Syst. 35(3), 1–7 (2018)
13. G. Yu, Emotion monitoring for preschool children based on face recognition and emotion recognition algorithms. Complexity 2021(5), 1–12 (2021)

Chapter 49

Design and Optimization of Improved Recognition Algorithm for Piano Music Based on BP Neural Network

Zhaoheng Chen and Chun Liu
College of Music and Dance, Huaihua University, Huaihua 418008, Hunan, China

Abstract Based on the theory of speech recognition and the characteristics of music recognition, this study analyzes, recognizes and processes musical sounds. The Artificial Neural Network (ANN) is a distributed parallel information processing system that can be trained to identify composite input patterns without selecting specific speech parameters. An improved back propagation neural network (BPNN) model is proposed and applied to piano music recognition, searching for the maximum of the autocorrelation function on a small scale to adapt to fast-paced music. The simulation results show that the improved method solves problems encountered when applying neural networks to music recognition while exploiting the advantages of neural network pattern recognition, reducing the manual knowledge extraction and processing stage of traditional recognition methods. To some extent, this algorithm avoids the missed detections, false detections and recognition errors that traditional algorithms produce on fast-paced music, and can significantly improve recognition accuracy.

Keywords BP neural network · Music recognition · Accuracy · Music segmentation

49.1 Introduction

Music recognition is essentially content-based audio recognition and processing, which is highly complex [1]. If a computer can automatically identify the music being played and automatically produce the corresponding score, the efficiency of the creator will be greatly improved and creative inspiration will be stimulated, changing the current unfavorable situation in which creative efficiency is low and inspiration even suffers because of the inconvenience of writing the score from the


manuscript [2]. A piano music signal is composed of the fundamental tone and overtones, and it is the fundamental that determines the pitch, so detection of the pitch period is the key to piano note recognition [3]. Considering that music signals and speech signals are both sound signals, the analysis of sound signals is generally similar, requiring signal preprocessing, feature extraction and model recognition [4]. Automatic score reading can not only free the composer from the complicated professional work of reading scores, but also help beginners without expert guidance to supervise their own learning through electronic score reading, thus improving the effectiveness and efficiency of learning [5].

The ANN is a distributed parallel information processing system that is well suited to the classification problem in piano music recognition. It can be trained to recognize composite input patterns without selecting special speech parameters, and the auditory model can also be integrated into the network model [6]. With the development of multimedia technology, how to effectively manage the corresponding audio information and obtain useful audio clips simply and directly has become an urgent need. In order to solve this problem and improve the efficiency of audio data indexing mechanisms, people began to classify audio deliberately, taking the points where some characteristic of the audio changes as segmentation endpoints and then classifying the different audio segments with classification technology [7].

Researchers hold different views on the relationship between music and speech. One view is that computer hearing is aimed only at digitized sound and music, and speech is another independent field. Another view is that music recognition is a branch of speech recognition. In all respects, the theory and practice of speech recognition are more developed than those of music recognition. Therefore, we can use the related theories of speech recognition, combined with the characteristics of music recognition, to study, identify and process musical sounds. Cazau et al. proposed applying three-level center clipping and autocorrelation processing twice, but this method increases the amount of calculation and is not suitable for application scenarios requiring fast computation [8]. In this article, an improved BPNN model is proposed and applied to piano music recognition.

49.2 Methodology

49.2.1 The Main Task of Music Recognition

The task of music recognition based on piano performance is to automatically generate the corresponding score from the input piano performance audio, that is, to convert the original audio information into symbolic score information through a series of signal and data processing steps, so as to realize the function of music recognition [9]. There are many high-energy peaks in a piece of musical audio, and high-energy frames are usually signs of these peaks and of the starting points of notes. Selecting an appropriate threshold can screen


out a quantity of high-energy frames, and then determine a quantity of audio signal feature points. The basic characteristics of a single tone include its pitch, loudness and timbre. The extraction of basic features of single tone is the basis of music recognition, and single tone pitch detection technology is the key in the foundation. The tone of all the notes in the spectrogram shows dark longitudinal stripes. Therefore, the method of note segmentation based on feature point detection can be used, and the method of combining time domain, frequency domain and highest energy can not only simplify the calculation process, but also effectively and accurately segment notes [10]. Identifying polyphony is mainly to distinguish and extract sounds that have multiple melodies at the same time, with pitch estimation as the main method. The technical approaches of polyphony identification can also be divided into signal processing, statistical processing and model-based, and the specific implementation process can be divided into iterative estimation and joint estimation [11]. Among them, iterative estimation is to extract the pitch of polyphony one by one, lock the most prominent tone in a certain frame and suppress the information of other tones related to it; Joint estimation is based on the possibility of estimating a set of pitches at the same time, and the confidence of multi-tone needs to be represented by a confidence function based on a set of single-tone estimation. With the data of note segmentation, the work of pitch tracking is simplified and more effective. The second-level note segmentation is based on the results of pitch tracking, aiming to further segment the Legato notes that have not been segmented in the first level according to the characteristics of significant pitch changes, that is, different pitch frequencies. In this way, the start and end times of each independent note are obtained by the two-level note segmentation, and the pitch frequency value of each note is obtained by pitch tracking. There are many ways to obtain and process piano performance information, which can be an online recorded music or recorded by yourself. The main purpose of processing is to extract the characteristic signal expressing the music melody from the audio file. The work of this article is oriented to the signal processing of piano performance in music recognition, that is, the audio signal of piano performance is processed to obtain the melody characteristic information, and the melody information is expressed as a reasonable intermediate format, which can be directly or after transformation used in different music recognition systems to construct the spectrum table.
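The feature-point idea described above can be sketched with short-term energy and the zero-crossing rate: frames whose energy rises above a threshold are marked as candidate note onsets, and the zero-crossing rate separates silent from sounding frames. The thresholds, frame sizes and test signal below are assumptions, not the authors' settings.

```python
import numpy as np

def frame_features(x, L=1024, inc=512):
    fn = 1 + (len(x) - L) // inc
    energy, zcr = np.empty(fn), np.empty(fn)
    for i in range(fn):
        frame = x[i * inc : i * inc + L]
        energy[i] = np.sum(frame ** 2)                           # short-term energy
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)    # zero-crossing rate
    return energy, zcr

def candidate_onsets(energy, ratio=0.3):
    thresh = ratio * energy.max()
    rising = (energy[1:] >= thresh) & (energy[:-1] < thresh)     # threshold crossings
    return np.flatnonzero(rising) + 1                            # frame indices of onsets

fs = 16000
sig = np.concatenate([np.zeros(fs // 2),
                      np.sin(2 * np.pi * 440 * np.arange(fs) / fs)])  # silence, then a tone
e, z = frame_features(sig)
print(candidate_onsets(e))     # frame index where the tone begins
```

In a full system these candidate points would be refined by the pitch-based second-level segmentation described in the text before the note list is written out.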

49.2.2 Improvement of BPNN in Piano Music Recognition

The training and learning algorithm of the network applies a series of input vectors and gradually adjusts the weights of the network through a predetermined algorithm until the desired goal is reached. The learning stage of the BP algorithm consists of forward propagation of the signal and backward propagation of the error. A musical signal is also a typical time-varying signal, and a music segment contains many single notes with different fundamental frequencies.


Fig. 49.1 BPNN model

However, as far as the pronunciation of a single-tone note is concerned, from the onset of the note to its disappearance the pitch and overtones are completely determined and the frequency components remain unchanged, while the amplitude gradually decreases. That is to say, from the frequency-domain point of view a single-tone note is a typical stationary time-invariant signal, and its pitch frequency determines the actual pitch of the note, that is, its most essential characteristic. The BP algorithm is a gradient descent algorithm, and gradient descent suffers from multiple local extrema. Moreover, owing to the randomness of the algorithm parameters, the system may fall into a local minimum or a stationary point, or oscillate between points during the learning process. Therefore, the choice of parameters is very important in the BPNN implementation stage. The BPNN model for piano music recognition proposed in this article is shown in Fig. 49.1. After the system error is obtained, it is compared with the allowable error range. If the errors of all output units meet the conditions, network learning is finished; otherwise back propagation is carried out, the output errors and error gradients of the hidden neurons are calculated in turn, and the connection weights between neurons are adjusted, completing one round of network learning. A MIDI event represents a short message contained in MIDI music, and its composition is shown in Formula (49.1):

MIDI event = ⟨delta-time⟩⟨MIDI message⟩    (49.1)
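As a concrete illustration of the forward and backward passes described above, the following NumPy sketch trains a single-hidden-layer BPNN with sigmoid activations until the output error falls within an allowable range; the layer sizes, learning rate and error tolerance are illustrative assumptions and not the configuration of this article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpnn(X, Y, hidden=20, lr=0.1, tol=1e-3, max_epochs=5000, seed=0):
    """Minimal BP learning loop: forward propagation, error back-propagation, weight adjustment."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, Y.shape[1]))
    for _ in range(max_epochs):
        H = sigmoid(X @ W1)              # forward propagation of the signal
        O = sigmoid(H @ W2)
        err = Y - O
        if np.mean(err ** 2) < tol:      # stop once the system error meets the allowed range
            break
        delta_o = err * O * (1 - O)      # backward propagation of the error
        delta_h = (delta_o @ W2.T) * H * (1 - H)
        W2 += lr * H.T @ delta_o         # adjust connection weights between neurons
        W1 += lr * X.T @ delta_h
    return W1, W2
```

A call such as `train_bpnn(note_features, one_hot_labels)` would return the weight matrices of a trained recognition network of this kind.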

The research object of this article is piano music, and the piano is a semi-harmonic instrument, that is, there is not a perfect multiple relationship between the fundamental frequency and the harmonic frequencies of notes, and there will be some deviation. For semi-harmonic instruments, the relationship between the fundamental frequency and the harmonic frequencies is as follows:

f_k = k · f_0 · √(1 + β(k² + 1))    (49.2)

where β represents the inharmonicity (disharmony) factor, f_0 the fundamental frequency, k the harmonic index, and f_k the frequency of the k-th harmonic. Before extracting feature parameters, pre-emphasis is usually realized by a first-order digital filter whose transfer function is:

H(z) = 1 − αz⁻¹    (49.3)

Among them, α is the pre-emphasis factor, which is generally taken as a decimal close to 1; in this article it is taken as 0.95. The pre-emphasis process transforms the signal as follows:

x′(n) = x(n + 1) − α·x(n)    (49.4)

Here x′(n) is the pre-emphasized signal and x(n) is the original piano music signal. The short-time average amplitude difference function F_n(k) of the piano music signal s(m) is defined as:

F_n(k) = Σ_{m=0}^{N−k−1} |S_n(m + k) − S_n(m)|    (49.5)

In the formula, N is the window length applied to the piano music signal s(m), and S_n(m) denotes the signal obtained by windowing and truncating s(m), defined as:

S_n(m) = s(m)·w(n − m)    (49.6)
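The pre-emphasis and short-time average amplitude difference computations of (49.3)–(49.6) can be sketched as follows; the Hamming window and the search range for the pitch period (roughly the piano range A0–C8) are assumptions made for illustration.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """x'(n) = x(n + 1) - alpha * x(n), cf. (49.4)."""
    return x[1:] - alpha * x[:-1]

def amdf(frame, k_max):
    """Short-time average amplitude difference F_n(k) of a windowed frame, cf. (49.5)."""
    N = len(frame)
    return np.array([np.sum(np.abs(frame[k:] - frame[:N - k])) for k in range(k_max)])

def pitch_period(frame, fs, f_lo=27.5, f_hi=4186.0):
    """Estimate the pitch period as the lag minimising the AMDF within the piano range."""
    windowed = frame * np.hamming(len(frame))          # S_n(m) = s(m) w(n - m), cf. (49.6)
    F = amdf(windowed, k_max=len(frame) // 2)
    k_lo = max(1, int(fs / f_hi))
    k_hi = min(int(fs / f_lo), len(F))
    lag = k_lo + int(np.argmin(F[k_lo:k_hi]))
    return lag / fs                                     # the reciprocal gives the pitch frequency
```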

Piano music signal has a fixed dimension after passing through the time warping network, and the speech of any duration can be converted into the same dimension, which is beneficial to ANN music recognition. Phonetic state is used to describe the state of each pronunciation organ during pronunciation, and each state can represent different levels of phonetic units. Music recognition is a series of states and transitions. Based on this idea, a regular network is used to transform the speech feature parameter sequence into a state transition matrix. Piano music recognition adopts a hybrid cascade neural network recognition system. In this model, the first stage is a regular network, and then this set of feature vectors is fed into the next stage BPNN to complete piano music recognition.


49.3 Result Analysis and Discussion In this article, the speech feature parameter sequence is transformed into a state transition matrix through a regular network. The dimension of the state transition matrix is fixed while still reflecting the time-varying characteristics of the signal, which solves the dynamic pattern recognition problem of the neural network and realizes neural-network-based recognition of isolated piano notes. By transforming the dynamically changing music feature parameter sequence into a sequence of fixed dimension, the ANN can apply its classification and input–output mapping ability to identify music. For musical signals, the short-time average zero-crossing rate can be used as a reference for determining the endpoints. The zero-crossing rate is the number of times the signal waveform crosses the zero level per unit time. Generally, the waveform changes slowly in silent sections, while in musical sections the signal changes sharply because of the string vibration, so voiced and silent sections can be distinguished. Signal sparsity is widely exploited in sound and image compression, wireless communication and so on. The result of pitch estimation in the presence of fundamental-frequency or harmonic offset is shown in Fig. 49.2. It can be seen from Fig. 49.2 that the accuracy of pitch estimation decreases as the degree of detuning increases, and that the two methods have similar sensitivity to fundamental-frequency and harmonic-frequency offsets. Compared with the traditional method, the proposed BPNN method obtains a slightly higher raw pitch accuracy.

Fig. 49.2 Fundamental or harmonic offset
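The zero-crossing-rate endpoint detection described above can be sketched as follows; the frame size and the two thresholds are illustrative assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Number of times the waveform crosses the zero level within one frame."""
    return np.sum(np.abs(np.diff(np.sign(frame)))) / 2

def musical_frames(signal, frame_len=1024, hop=512, zcr_thresh=10, energy_thresh=1e-4):
    """Flag each frame as musical (True) or silent (False) using ZCR and short-time energy."""
    flags = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        is_musical = zero_crossing_rate(frame) > zcr_thresh or np.mean(frame ** 2) > energy_thresh
        flags.append(is_musical)
    return flags
```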


Fig. 49.3 Original pitch accuracy of this method on different databases

One of the key operations of automatic music score recognition is audio structuring. The higher the structuring level, the finer and more accurate the segment units are, and thus the higher the accuracy of piano music recognition. Because the frequency content of single-tone notes is very rich, directly Fourier-transforming a single tone yields many non-zero components in the frequency domain due to the high-frequency overtones, so it is difficult to find the pitch directly. Therefore, the single-tone signal can first be low-pass filtered to remove the high frequencies, and the remaining low-frequency components can then be Fourier transformed. Figure 49.3 shows the original pitch accuracy of this method on different databases. When the learning rate is too large, the weight corrections may not settle at their best values and the system becomes unstable; when the learning rate is too small, the training cycle becomes long and learning becomes too slow, which limits the system's ability. Firstly, the above data sets are unified into WAV format (mono, 22.05 kHz sampling rate, 16-bit quantization), and then the pitch is calculated. Model training is particularly important in music recognition and requires a large amount of original data. Therefore, before the research on music recognition is started, a music sound database should be established. The database includes recordings of piano notes made in different environments; each recording is a single file and must be distinct from the other files. The experimental results for the recall of audio frames with different classifiers are shown in Fig. 49.4. BPNN offers continuous-time nonlinear dynamics, large-scale parallel distributed processing, a global network function, high robustness, and learning and association ability. Moreover, like general nonlinear dynamic systems, it is unpredictable, non-equilibrium, attractive, dissipative, irreversible, self-adaptive and widely connected.


Fig. 49.4 Recall of audio frames of different classifiers

In the process of feature analysis and processing, the deconvolution neural network model not only retains the convolution operation and down-sampling operation of the convolution neural network, but also improves on the original basis. Its parameter updating process is based on the back propagation of the reconstruction error of the input data, so there is no need for labeled data. The essence of BPNN learning is the adjustment of weights, that is, in the stage of learning by using the training sample set, the connection weights between neurons are adjusted according to the error between the actual output and the expected output, so that the output of the network is constantly close to the expected output. The error of different algorithms is shown in Fig. 49.5.

Fig. 49.5 Error situation of different algorithms


BPNN shows great flexibility and adaptability to the processing of nonlinear signals and those systems whose data cannot be described by rules and formulas. The improved method solves the problems encountered in the application of neural network to music recognition, and at the same time, it takes advantage of various advantages of neural network pattern recognition, reducing the stage of artificially extracting processing knowledge in traditional recognition methods, improving the learning speed of the network and reducing the quantity of operations.

49.4 Conclusions A piano music signal is composed of a fundamental tone and overtones, and it is the fundamental that determines the pitch, so detection of the pitch period is the key to piano note recognition. The focus of music recognition is how to combine existing technology with the specific characteristics of musical sounds and, in doing so, derive improved methods that are better suited to them. In this article, the working principle of the neural network is analyzed, and an efficient neural network is proposed and constructed for piano music signals. The performance of the resulting recognition system is compared and analyzed with respect to the number of hidden neurons and the choice of characteristic parameters. Experimental analysis proves the correctness and feasibility of the music recognition system and shows an improved recognition rate for musical sounds. The improved method solves the problems encountered when applying neural networks to music recognition and, at the same time, exploits the advantages of neural-network pattern recognition, reducing the manual knowledge-extraction stage of traditional recognition methods. Considering that differences in ambient sound and in the way players touch the keys affect the piano timbre and thus the recognition accuracy, ambient noise reduction, double-key audio separation and the training of adaptive thresholds are directions for future study.

References 1. Q. Wu, Research on emotion recognition method of weightlifters based on a non-negative matrix decomposition algorithm. Int. J. Biometrics 13(3), 229 (2021) 2. K. Manohar, E. Logashanmugam, Hybrid deep learning with optimal feature selection for speech emotion recognition using improved meta-heuristic algorithm. Knowl.-Based Syst. 21, 246 (2022) 3. H. Yu, Online teaching quality evaluation based on emotion recognition and improved AprioriTid algorithm. J. Intelli. Fuzzy Syst. 40(5), 1–11 (2020) 4. M. Zhang, L. Zhang, Cross-cultural O2O English teaching based on AI emotion recognition and neural network algorithm. J. Intelli. Fuzzy Syst. 11, 1–12 (2020) 5. C. Ju, H. Ding, B. Hu, A hybrid strategy improved whale optimization algorithm for web service composition. Comput. J. 3, 3 (2021)


6. C.K. Ting, C.L. Wu, C.H. Liu, A novel automatic composition system using evolutionary algorithm and phrase imitation. IEEE Syst. J. 11(3), 1284–1295 (2017) 7. F.M. Toyama, W.V. Dijk, Additive composition formulation of the iterative Grover algorithm. Can. J. Phys. 7, 97 (2019) 8. D. Cazau, Y. Wang, O. Adam et al., Calibration of a two-state pitch-wise HMM method for note segmentation in automatic music transcription systems. Chem. Eur. J. 18(28), 8795–8799 (2017) 9. I. Goienetxea, I. Mendialdua, I. Rodríguez et al., Statistics-based music generation approach considering both rhythm and melody coherence. IEEE Access 7(17), 32 (2019) 10. H. Zhu, Q. Liu, N.J. Yuan et al., Pop music generation. ACM Trans. Knowl. Discov. Data (TKDD) 9, 396 (2020) 11. H. Felouat, S. Oukid-Khouas, Graph matching approach and generalized median graph for automatic labeling of cortical sulci with parallel and distributed algorithms. Cogn. Syst. Res. 54(5), 62–73 (2019)

Chapter 50

Design of Piano Music Type Recognition Algorithm Based on Convolutional Neural Network Yuche Liu and Chun Liu

Abstract Due to the growth of modern science and technology in speech recognition technology, music recognition technology has begun to attract people’s attention and become a research hotspot in the industry. In this article, taking piano music types as the research object, combined with previous research results, a new piano music type recognition algorithm based on CNN (Convolutional Neural Network) is designed. Based on the feature extraction of piano music type, the algorithm model realizes the output of piano music type feature results based on machine, as well as the recognition, classification and image display of piano music type by computer. In order to verify its effectiveness, this article carries out simulation experiments. The results show that the piano music type recognition algorithm based on CNN has high accuracy and small error, and the accuracy can reach over 95%, which is about 9.87% higher than the traditional method. The algorithm can extract the effective feature representation of piano music in different classification tasks, and achieve good classification effect, which provides certain technical support for piano music type recognition. Keywords Convolutional neural network · Piano music types · Recognition

50.1 Introduction Music, as a common language for human beings to communicate their emotions, can directly reach the depths of people’s hearts, stimulate emotional resonance and produce pleasant aesthetics, which has irreplaceable artistic charm for other artistic behaviors [1]. At the same time, it has a long history of development and produced many popular works of art. With the continuous growth of Internet and computer technology, the piano music data has exploded [2]. The traditional music recognition Y. Liu · C. Liu (B) College of Music and Dance, Huaihua University, Huaihua 418008, Hunan, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_50


method can't meet the current demand of piano music retrieval. In fact, research on music recognition began as early as the 1970s, and the recognition rate has always been the biggest obstacle to its development [3, 4]. The main task of piano music type recognition is to obtain relevant information about the piano music content through audio signal processing and feature extraction, and then use it for comparison, classification and even automatic music transcription [5]. The computer identification of piano music types studied in this article combines the knowledge and techniques of computer multimedia technology, signal processing and pattern recognition with piano music theory, simulates the human cognitive analysis of music with computers, analyzes piano music, and evaluates and classifies piano music types [6]. In order to better describe the types of piano music, it is necessary to extract more type-related features from piano music so as to expand the feature space, prepare for the subsequent type identification using classification algorithms, and improve the accuracy of piano music type identification. NN (Neural network) is an efficient algorithm that imitates the neural structure of the human brain [7]. In many respects it is closer to the way people process information and can simulate people's ability to think in images [8]. Perceptron NN, linear NN, BP networks, self-organizing networks and feedback networks are representative examples [9]. CNN is also a kind of NN. In recent years, CNN has been widely used in image recognition. Because of its superior learning and feature-extraction ability, CNN has been used for feature extraction and learning in many fields. The convolution layer is equivalent to feature extraction in CNN: each neuron in the convolution layer only needs to receive information from a local two-dimensional feature map of the previous level and only extracts features of that local map [10]. A convolution layer has multiple feature maps, and each feature map has different characteristics. Although statistical models are dominant in speech recognition, NN still has strong development potential by virtue of its strong classification ability and input–output mapping ability [11]. Based on this, this article takes piano music types as the research object and puts forward a piano music type identification method based on CNN. It is hoped that this method can improve the recognition accuracy and reliability of piano music types.

50.2 Methodology 50.2.1 Related Technical Basis At present, the computer analysis technology of piano music classification has made great progress, but there are also some defects that need to be improved [12]. These include: (1) The segmentation of piano music fragments affects the genre classification of piano music, and the segmentation technology of piano music fragments is itself a difficult problem at present. (2) The classification accuracy is not high. (3) The classification operation takes a long time. Therefore, it is necessary to further


study and improve the computer analysis technology of piano music classification to improve its practicability [13]. In this article, seven piano music features that can be used for recognition are considered: timbre, pitch, duration, amplitude, mode/tonality, speed and playing length, and their computer realization is also discussed. In particular, the article focuses on the fundamental-frequency extraction algorithm, the extraction and representation of pitch duration, piano timbre and its reconstruction, and the extraction of mode/tonality characteristics. Recognition of the piano audio signal mainly means detecting the pitch period. In this article, labeled piano music and a CNN are used to train a classification model that predicts the style type of piano music. The transfer function adopted by the CNN differs from that of a general NN; it is a nonlinear function similar to logsig and tansig. For the piano music type recognition system, this article chooses the continuously differentiable sigmoid function, which has threshold-like characteristics, as the transfer function of the system neurons. Its expression is:

f(x) = 1 / (1 + e^(−x))    (50.1)

The CNN network structure in this article shares the basic framework with the traditional AlexNet.
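As a rough sketch of such an AlexNet-style network, written with the TFLearn toolkit used for the experiments in Sect. 50.3, the fragment below stacks convolution, pooling and fully connected layers for four-class classification of 128 × 128 spectrogram slices; the number of filters and the layer sizes are illustrative assumptions rather than the exact configuration of this article.

```python
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# 128 x 128 single-channel spectrogram slices, four target music types
net = input_data(shape=[None, 128, 128, 1])
net = conv_2d(net, 32, 5, activation='relu')
net = max_pool_2d(net, 2)
net = conv_2d(net, 64, 3, activation='relu')
net = max_pool_2d(net, 2)
net = conv_2d(net, 64, 3, activation='relu')
net = max_pool_2d(net, 2)
net = fully_connected(net, 256, activation='sigmoid')   # sigmoid transfer function, cf. (50.1)
net = dropout(net, 0.4)   # TFLearn's argument is a keep probability; the paper's "0.4" may use either convention
net = fully_connected(net, 4, activation='softmax')
net = regression(net, optimizer='adam', learning_rate=0.001, loss='categorical_crossentropy')
model = tflearn.DNN(net)
# model.fit(X_train, Y_train, n_epoch=20, validation_set=(X_val, Y_val), batch_size=64)
```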

50.2.2 Design of Piano Music Type Recognition Algorithm Piano music style reflects the overall basic characteristics of a piano work and is the basis of piano music appreciation, analysis and research. Accurately describing a specific piece of piano music involves a wide range of content, including at least rhythm, melody, harmony and timbre [14]. In fact, the audible overtones have a major influence on the timbre of a sound: the reason different musical instruments have their own special timbre is the difference in the intensities of the overtones contained in the sounds they emit. Rhythm and pitch are more appropriate characteristics for piano music. It should be noted that absolute pitch cannot be used as a musical feature, because the same piece of music can be transposed; similarly, absolute rhythm and speed cannot be regarded as musical features, because the tempo at which the same music is played can also change. Because the algorithm is designed specifically for piano music type recognition, the recognition of piano timbre should be emphasized before other features are recognized, so as to better meet the needs of the task. The piano music feature system and type identification framework are shown in Fig. 50.1.


Fig. 50.1 Piano music feature system and type identification frame diagram

Through training, the NN classifier can associate an unknown pattern with a known one, thereby achieving classification. In this article, a corresponding and more intuitive representation of the fundamental-frequency and duration characteristics is proposed. Using time n as the abscissa (rather than the ordinate), the two-dimensional image clearly expresses the pitch at each moment and also yields the onset, duration and release information of each note. This representation reflects the dynamic spectral characteristics of the audio signal, and its time and frequency resolution are determined by the window function used. Convolution with the window merges any two spectral peaks whose frequency difference is less than the passband width b into a single peak. Since, for a given window shape, the passband width is inversely proportional to the window length, a long window should be used when high frequency resolution is required. The convolution is expressed as:

s(t) = x(t) ∗ w(t) = Σ_{τ=−∞}^{+∞} x(τ)·w(t − τ)    (50.2)

where x(t) is the input feature and w(t) is the feature mapping (kernel). When processing two-dimensional matrix data, it can be expressed as:

s(i, j) = Σ_{m=0}^{M} Σ_{n=0}^{N} w_{m,n} · x_{i+m, j+n} + w_b    (50.3)
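A direct NumPy transcription of the two-dimensional case (50.3), written as the cross-correlation form conventionally used in CNN implementations, is given below; the kernel and bias in any call would be learned parameters, shown here only as placeholders.

```python
import numpy as np

def conv2d(x, w, w_b=0.0):
    """s(i, j) = sum_m sum_n w[m, n] * x[i + m, j + n] + w_b (valid padding, stride 1)."""
    M, N = w.shape
    H, W = x.shape
    out = np.zeros((H - M + 1, W - N + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * x[i:i + M, j:j + N]) + w_b
    return out
```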

Because the input layer of the training network is complex, considering that the model should not be too complex, after many model trainings, the hidden layer finally adopts two-layer structure, each layer has 20 neurons. The usual NN with adjacent layer connection can be equivalent to a series-connected biological loop model, and


the non-adjacent layer connection NN designed in this article can be equivalent to a series/parallel-connected biological loop model. The control unit of the model hands over the pitch prediction value and data to the parallel processing unit for processing, and the parallel units respectively get their own pitch estimates, which are handed over to the control unit for final pitch judgment based on the principle of “obeying the majority” and used as the next prediction value.

50.3 Result Analysis and Discussion In order to achieve better experimental results, the experiments in this section are carried out on a Windows + Python system, with TensorFlow as the experimental platform and TFLearn as the model library. Firstly, the training learning rate of the CNN is set to 0.001, and the CNN is trained on spectrum samples and note-spectrum samples. Because the network takes a long time to converge when all layers use Dropout, the Dropout value is set to 0.4 at the fully connected layer in this experiment. Figure 50.2 shows the error of the algorithm. The experimental data in this article consist of 800 pieces divided into training and test sets, all taken from piano music on the KuGou music platform. There are four types of music: Blues, Classical, Jazz and Pop, each with 200 songs, and the music file format is mp3. However, these audio files are usually large, so they are not suitable for direct audio superposition as input; the piano music audio therefore needs to be sliced and divided. In this article, each segmented spectrum segment is 128 × 128 pixels, which represents 2.56 s of audio signal.

Fig. 50.2 Algorithm error


After dividing the different types of audio, about 15,000 fragment samples are obtained. Each small piece is named according to the title and serial number of the song, and the pieces of different songs are stored in different folders. These samples are divided into three parts: 5800 training samples, 4600 validation samples and 4600 test samples. Precision (accuracy) and recall are important indexes for testing the performance of the algorithm. Precision is calculated as follows:

Precision = Σ_{u∈U} |R(u) ∩ T(u)| / Σ_{u∈U} |R(u)|    (50.4)

The calculation method of the recall rate is shown in the following formula:

Recall = Σ_{u∈U} |R(u) ∩ T(u)| / Σ_{u∈U} |T(u)|    (50.5)
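Both indexes can be computed directly from the predicted label sets R(u) and the ground-truth sets T(u); the following small sketch uses hypothetical example sets.

```python
def precision_recall(R, T):
    """R and T map each sample u to its set of predicted and true labels, cf. (50.4)-(50.5)."""
    hits = sum(len(R[u] & T[u]) for u in R)
    precision = hits / sum(len(R[u]) for u in R)
    recall = hits / sum(len(T[u]) for u in T)
    return precision, recall

# hypothetical example: two audio fragments
R = {"clip1": {"Jazz"}, "clip2": {"Pop"}}
T = {"clip1": {"Jazz"}, "clip2": {"Blues"}}
print(precision_recall(R, T))   # (0.5, 0.5)
```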

In this article, the accuracy and recall are selected to test the effectiveness of the proposed algorithm, and at the same time, in order to reflect the superiority of the piano music type recognition algorithm in this article, a comparative experiment is carried out with the other two algorithms. The comparison of the accuracy of the algorithm obtained by the experiment is shown in Fig. 50.3, and the comparison of the recall of the algorithm is shown in Fig. 50.4. Experiments show that the CNN classification model based on the combination of piano music spectrum characteristics and note characteristics training proposed

Fig. 50.3 Comparison of the accuracy of the algorithm


Fig. 50.4 Comparison of recall rates of algorithms

in this article is feasible; through experiments on different piano music types, it is found that the proposed piano music type recognition algorithm can greatly improve the recognition accuracy of piano music types. Piano music style reflects the overall basic characteristics of piano music works, and is the basis of piano music appreciation, analysis and research. In this article, the multi-dimensional note spectrum is generated by open source audio processing software. This tool is widely used in the field of acoustic processing and has super compatibility. It can cut and draw sound files in various formats. The processed audio removes all kinds of noises that may interfere with the subsequent sound source separation to a certain extent, making the piano music pure without any accompaniment or noise. This also provides convenience for subsequent slicing, audio superposition, feature extraction and type recognition. In order to verify the feasibility of the proposed piano music type recognition algorithm based on CNN, the piano music data set A and piano music data set B are taken as experimental samples again, and the recognition accuracy of the piano music type recognition algorithm is tested several times. Figure 50.5 shows the curve of the recognition accuracy of the piano music type recognition algorithm based on CNN on different data sets. It can be seen that the piano music type recognition algorithm proposed in this article can achieve high recognition accuracy on data set A and data set B, which further verifies the effectiveness of the algorithm. The results of many experiments in this section show that the recognition accuracy of the piano music type recognition algorithm based on CNN proposed in this article


Fig. 50.5 The change of recognition accuracy of the algorithm on different data sets

is high, the error is small, and the accuracy can reach over 95%, which is about 9.87% higher than that of the traditional method.

50.4 Conclusions More and more research results on the transition from speech recognition to piano music recognition are constantly emerging, which greatly promotes the growth of music recognition. This article takes piano music genre as the research object, and designs a new piano music genre recognition algorithm based on CNN, combining with previous research results. According to the piano music theory and the physical properties of sound, the algorithm extracts the piano music features that can be used for recognition, and expands the previous recognition feature set, thus identifying the piano music types. Finally, in order to verify the feasibility of the proposed piano music type recognition algorithm, this article takes piano music data set A and piano music data set B as experimental samples, and makes several experiments on the recognition accuracy of the piano music type recognition algorithm based on CNN. Finally, the simulation results show that the piano music type recognition algorithm based on CNN proposed in this article has high recognition accuracy and small error, and the accuracy can reach over 95%, which is about 9.87% higher than the traditional method. In addition, the algorithm in this article has high flexibility, which can provide some reference for different types of music recognition and has certain reference value for subsequent research. In the design of CNN, the influence


of network structure and loss function design on classification performance will be studied in depth in the future to improve the feasibility and effectiveness of the fusion algorithm, and more classification methods will be explored for recognizing the emotion of music.

References 1. B. Sun, Using machine learning algorithm to describe the connection between the types and characteristics of music signal. Complexity 2021, 1–10 (2021) 2. M. Kacper, A case study of a pianist preparing a musical performance. Psychol. Music 17(2), 95–109 (2016) 3. R. Bianco, G. Novembre, P.E. Keller et al., Syntax in action has priority over movement selection in piano playing: an ERP study. J. Cogn. Neurosci. 28(1), 41–54 (2015) 4. P.D. Julie, Bass-line melodies and form in four piano and chamber works by Clara WieckSchumann. Music Theory Spectr. 2, 133–154 (2016) 5. L. Huang, Research on the cultivation of the indoor piano performance skills from the perspective of psychology. Basic Clin. Pharmacol. Toxicol. 2019(2), 125 (2019) 6. M. Liu, J. Huang, Piano playing teaching system based on artificial intelligence—design and research. J. Intell. Fuzzy Syst. 40(1), 1–9 (2020) 7. H. Nordström, P. Laukka, The time course of emotion recognition in speech and music. J. Acoust. Soc. Am. 145(5), 3058–3074 (2019) 8. T. Furukawa, M. Ohyama, Study on playing piano of teacher in kindergarten. Neural Regen. Res. 47(23), 1555–1558 (2010) 9. Y. He, Romantic piano playing skills analysis. Contemp. Music 9, 3 (2020) 10. R. Jenke, A. Peer, M. Buss, Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5(3), 327–339 (2017) 11. M. Weinstein-Reiman, Printing piano pedagogy: experimental psychology and Marie Jaëll’s theory of touch. Ninet.-Century Music Rev. 1–27 (2020) 12. N. Oikawa, S. Tsubota, T. Chikenji et al., Wrist positioning and muscle activities in the wrist extensor and flexor during piano playing. Hong Kong J. Occup. Ther. 21(1), 41–46 (2011) 13. J. Singha, R.H. Laskar, Hand gesture recognition using two-level speed normalization, feature selection and classifier fusion. Multimed. Syst. 23(4) (2017) 14. M.D. Singer, The piano and the couch: music and psyche. By Margret Elson [book review]. Med. Probl. Perform. Art. 34(2), 120–121 (2019)

Chapter 51

Application of Emotion Recognition Technology Based on Support Vector Machine Algorithm in Interactive Music Visualization System Feng Wu and Haiqing Wu

Abstract Music visualization is a kind of non-subjective interpretation and judgment of music expression with vision as the core, and it is a presentation technology for understanding, analyzing and comparing the expressive force and internal structure of music. The identification of musical emotion is a process of automatically associating music with certain emotion according to the characteristics of music under given reasoning rules. In this paper, the application of emotion recognition technology based on SVM (Support Vector Machine) algorithm in interactive music visualization system is studied. The relationship between music and emotion is studied by SVM, and this relationship is modeled. GA (genetic algorithm) is used to reduce the dimension of data features, and the features with the most emotion distinguishing ability are selected to participate in modeling, so as to further optimize the SVM classification model. The research results show that the average recognition rate of 92.4% is achieved by using SVM classification algorithm as emotion recognition model. It is proved that the SVM music emotion recognition scheme based on GA optimization is effective. Keywords Support vector machine · Emotional recognition · Music visualization

51.1 Introduction In the field of design, information visualization mainly focuses on information transmission and visual display. Music visualization is a branch of information visualization [1]. Music visualization is a kind of non-subjective interpretation and judgment of music expression with vision as the core, and it is a presentation technology for understanding, analyzing and comparing the expressive force and internal structure F. Wu · H. Wu (B) College of Music and Dance, Huaihua University, Huaihua, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_51


of music. With the development of computer multimedia technology, visualization technology and graphics technology, we can use computers to accurately reproduce colorful three-dimensional objects in the real world, and give full play to our creative thinking to simulate and transform the real world through human–computer interaction. This is the most fashionable virtual reality technology at present. Music visualization is a new visualization technology. Through feature extraction of music signals, with the help of three-dimensional animation, virtual reality and other technologies, its corresponding characteristics such as waveform, pitch, loudness, duration, timbre, rhythm, melody, harmony and emotion are mapped to corresponding musical scores, two-dimensional/three-dimensional graphics, scenes, roles, animations and other forms of expression, thus forming “visual music” [2]. In the continuous description of emotion, the emotional state is generally corresponding to a point in a multi-dimensional space. Generally speaking, there are three annotation methods for musical emotion, namely, single label, multi-label and fuzzy semantic multi-label [3]. Literature [4] extracted the sustained pronunciation time, fundamental frequency, energy and formant as feature vectors, and achieved about 80% recognition rate in a small experimental data set. Music emotion recognition algorithm based on SVM (Support Vector Machine) mentioned in Ref. [5]. Literature [6] extracted three characteristics of music: intensity, timbre and rhythm, and used the method of Gaussian mixture model and Thayer’s emotional classification model to establish a musical emotional model. Gaussian mixture model is used for each musical emotion category. As soon as music visualization was put forward, its wide application cited people’s great interest. Now, in many TV programs, you can see the music visual effect displayed on the big screen with rectangular blocks. Although it is only a simple transformation such as flame pattern at present, its strong visual impact effect has been deeply rooted in people’s hearts. Combined with the development of the visual industry, it can be said that the use of vision as a carrier to convey a lot of information in the visual industry is to transform various forms of information into visual forms for communication, that is, the visual design of information [7]. In this paper, SVM is used to study the relationship between music and emotion, and this relationship is modeled, so as to automatically identify musical emotions. The feature dimensionality reduction method based on GA (genetic algorithm) is studied, and the classification model is trained by selecting key features from the extracted emotional features, so as to further optimize SVM and improve the recognition performance of music emotion recognition system. It lays a foundation for the application direction of emotion recognition technology in interactive music visualization system. Music emotion recognition has great research and application value. Music emotion recognition is a technology that extracts different emotion features after analyzing and processing human music signals by computer, and then constructs a classifier to classify emotions. Through music emotion recognition technology, the computer can automatically identify the speaker’s emotional state from music signals, thus realizing natural human–computer interaction.


51.2 Research Method 51.2.1 Emotion Recognition Based on SVM Algorithm While using dynamic graphics to gain attention, immersive experience plays an important role in maintaining attention for a long time. Music visualization is a branch of information visualization. The creative purpose of music visualization may be artistic expression or simple information visualization. The creative technique is to use visual form as the means of expression, with the help of multimedia platform as the carrier, to display music information through visual graphics, images, movies and other forms, so as to help viewers and listeners better appreciate, invest and analyze the information contained in music. Under the background of visual industry, new forms of information visualization and new demands for information design are gradually emerging. The research on the application of dynamic design of music visualization in mobile terminals caters to the needs of these new times from an important angle. The identification of musical emotion is a process of automatically associating music with certain emotion according to the characteristics of music under given reasoning rules. From a mathematical point of view, emotion recognition is a mapping process that maps the musical features of dimension to the emotional space of dimension. The process of emotion recognition in this paper is as follows: firstly, according to the basic theory of mathematical model and music psychology, the mathematical meanings of input vector and output vector of mathematical module are abstracted; secondly, according to the required input vector, relevant feature information is extracted from MIDI files, and each component of the input vector is calculated. Finally, through a lot of data training, many parameters in the mathematical model are gradually adjusted, and finally an efficient and accurate emotional calibration module is realized. Emotion recognition is the key technology to realize human–computer interaction, and its purpose is to give computers the ability to recognize users’ emotions [8, 9]. Emotion recognition based on facial expression, voice, posture and physiological signals has been widely studied in recent years [10]. SVM studies the law of machine learning in the case of small samples. At present, SVM has been one of the important research methods of machine learning and pattern recognition. SVM proposes to transform the low-dimensional feature space into the high-dimensional feature space nonlinearly, and construct a linear discriminant function in the high-dimensional feature space to realize the nonlinear discriminant problem in the original space. This not only solves the complexity problem of the algorithm, but also has good generalization ability. When training the SVM model in each node, the feature dimension is too high, which will lead to the feature matching is too complicated and prone to over-fitting, which not only takes a long time to model, but also has low recognition accuracy. Therefore, a better classification model can be trained by optimizing and screening the extracted emotional features. GA is an adaptive global optimal solution search


method, which can automatically screen out the features with the most emotion distinguishing ability to obtain the optimal classification results under the recognition model. Feature dimension reduction based on GA is to select some most effective features from a group of features, and a better model can be constructed without changing the values of the selected features [11]. Therefore, this paper proposes a SVM music emotion recognition model based on GA optimization. Before training the SVM classification model, the feature dimension of the data is reduced by using GA, and the features with the most emotion distinguishing ability are selected to participate in the modeling, so as to further optimize the SVM classification model and improve the recognition performance of the music emotion recognition system. The flow of emotion recognition system is shown in Fig. 51.1. The spectrum centroid is the gradient center of STFT (Short-time Fourier Transform) amplitude spectrum, which can be regarded as the frequency component where the spectrum centroid is located, and it is one of the important measures to describe the brightness of sound in music. This characteristic quantity will change with the intensity of music sound, and the higher the spectrum centroid means the brighter the sound [12]. The definition of spectrum centroid SC is shown in Formula (51.1):

Fig. 51.1 Emotion recognition system flow

SC = ( Σ_{n=1}^{N} n · A_i^n ) / ( Σ_{n=1}^{N} A_i^n )    (51.1)

where A_i^n is the amplitude of the n-th frequency bin of the STFT amplitude spectrum of frame i, and N is the number of STFT frequency bins. It is assumed that all the training sample data can be fitted linearly without error, with the distance from each sample point to the hyperplane bounded by ε as a penalty condition. Because the function f is unknown, we can only use the linear regression function f(x) = w·x + b to fit the collected samples, which gives:

y_i − (w·x_i + b) ≤ ε,  (w·x_i + b) − y_i ≤ ε,  i = 1, 2, …, n    (51.2)

The SVM classification algorithm transforms the search for the separating hyperplane into the optimization problem:

min ‖w‖²/2   s.t.  y_i(w·x_i + b) ≥ 1    (51.3)

The corresponding prediction function is:

f(x) = sgn( Σ_{i=1}^{n} a_i y_i (x_i·x) + b )    (51.4)
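A minimal scikit-learn sketch of this SVM classification step, evaluated with the ten-fold cross-validation used later in Sect. 51.3, is shown below; the feature and label files, the RBF kernel and its parameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: (n_samples, 72) music feature matrix, y: emotion labels -- hypothetical files
X = np.load("music_features.npy")
y = np.load("emotion_labels.npy")

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=10)
print("mean recognition rate: %.3f" % scores.mean())
```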

Because different features have different abilities to distinguish emotions, and because an overly high feature dimension easily leads to over-fitting, the key features must be screened out of the input features in order to shorten the modeling time and improve the precision of the model. In this article, a proportional selection operator based on relative fitness is used, that is, the probability of an individual being selected is directly proportional to its fitness value. The sum F of the fitness values of all individuals in the population is calculated as:

F = Σ_{i=1}^{N} f(i)    (51.5)

where N is the population size and f(i) is the fitness value of the i-th individual. The relative fitness P_i of the i-th individual is then calculated as:

P_i = f(i) / F    (51.6)


Finally, the roulette operation is simulated for multiple rounds of selection. The larger the Pi , the larger the area occupied by the individual in the roulette, so the greater the probability and times of being selected.
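A sketch of this fitness-proportional (roulette-wheel) selection, applied to a population of binary feature-selection masks, is given below; the population size, mask length and the placeholder fitness values are illustrative, and in the actual scheme the fitness would be the recognition accuracy of the SVM trained on the selected feature subset.

```python
import numpy as np

def roulette_select(population, fitness, n_rounds, rng):
    """Select individuals with probability P_i = f(i) / F, where F = sum of all fitness values."""
    F = float(np.sum(fitness))            # cf. (51.5)
    P = np.asarray(fitness) / F           # cf. (51.6)
    idx = rng.choice(len(population), size=n_rounds, p=P)
    return [population[i] for i in idx]

rng = np.random.default_rng(0)
population = [rng.integers(0, 2, size=72) for _ in range(30)]   # binary masks over 72 features
fitness = rng.random(30)                                        # placeholder fitness values
parents = roulette_select(population, fitness, n_rounds=30, rng=rng)
```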

51.2.2 Realization of Interactive Music Visualization System The intercommunication of human visual and auditory senses is an important condition for music visual creation. The study of paintings that embody the sense of music provides a theoretical basis for the aesthetic characteristics of music visual design, which has certain enlightening significance. There are two ways to design music visualization: one is to extract audio data or music theory knowledge for visual interpretation and translation; second, designers or artists present personal interpretation and secondary creation from emotional or intuitive feelings. Different from the mechanical translation by computer, the creator subjectively and visually transforms the emotional semantics of music, which has certain freedom and individual differences. For example, the changes in the size and position of visual elements such as graphics and characters, as well as the relationship between points, lines, surfaces and blanks, supplemented by the changes in the contrast between light and dark in colors, can make the visual effect of the layout present fresh vitality, strengthen the visual excitement of the viewer and guide the visual process of the viewer. In order to create a sense of immersion in music visualization, it is necessary to present a virtual environment in order to give the audience a high-level experience, which requires the combination of image system and projection system. Music visualization is inseparable from emotion, and images directly touch human perception, so images become the core element of immersion. The persistence of images enables visual works to maintain their real-time nature when users interact, and creates a virtual environment by means of computer system for processing and calculation. The realization forms and solutions of music visualization can be described as brilliant. Music visualization has a wide range of applications, such as music analysis, information retrieval, performance analysis, music teaching, music cognition, emotional expression and game entertainment. This system is designed as a realtime interactive music visualization system based on emotional recognition, and an optional virtual character is used to express emotional actions. This paper hopes to allow users to have an immersive interactive experience with virtual characters. In this way, users can experience the illusion caused by the realistic interaction with the virtual scene of three-dimensional entity size, and allow users and observers to experience the mandatory immersion caused by the real interaction between performers and virtual characters. The structure of interactive music visualization system is shown in Fig. 51.2. The interactive music visualization system established in this paper can be used for music analysis and music visualization. The real-time voice and MIDI input of live performances are processed and analyzed, and finally mapped to virtual characters. And show the music in the form of vision. The graphics in the visual picture are


Fig. 51.2 Interactive music visualization system structure

histogram, dot, square and triangle. The volume can be adjusted according to the needs of the experiment, and the color, size, density, image ambiguity and movement speed of the graphics in the visual picture can be adjusted. The graphic shape can be dynamically scaled and vibrated with the rhythm of music. The core method is to call Web Audio API to obtain the frequency, waveform and other data from sound sources, and play the audio and animation of visual music in the webpage developed by HTMLS. Then, using canvas to draw graphics, according to the value of 8-bit array, the frequency and amplitude of each frequency domain component control the related actions and sizes of graphics, and each graphic is abstracted into an object, and the parameters of each graphic corresponding to each frame are adjusted in the same proportion to complete audio animation. Then set the parameters of the visual picture and play the music visual animation in the browser.

51.3 Result Analysis In this experiment, a public music multi-label library is used as the music emotion feature library, which is a music feature library containing 593 music fragments, 72-dimensional music feature vectors and 6-dimensional emotion vectors. GA is used to reduce the dimension of 72-dimensional features. When training samples, the classification model is established by 10 times cross-validation. After training and calculating the error, the error gap of each test is not big, so the classification algorithm of SVM is effective in the field of music emotion modeling. The number of iterations of GA is set to 100 and the initial population size is set to 30. The SVM classification model based on GA optimization is used to identify


Fig. 51.3 Evolution curve of population fitness function

musical emotion, and the evolution curve of fitness function of population in the first identification experiment is shown in Fig. 51.3, which intuitively shows the process that GA generates better and better approximate solution through iterative evolution until the optimal solution is solved. After establishing the emotion recognition model, the comparison results of the recognition rate of each kind of emotion are shown in Fig. 51.4. For 593 multi-tag public libraries, SVM classification algorithm is used as the emotion recognition model, and the average recognition rate is 92.4%. It is proved that the SVM music emotion recognition scheme based on GA optimization is effective. Music visual design has a variety of ways and technical means, and the fields involved are also different. Under the function of synaesthesia, the combination of music and vision has both functionality and aesthetics, which has a wide application prospect. Therefore, if we can combine visual art with healing art, neuroscience and music, create an art exhibition with immersive experience, set up a device for real-time interaction by capturing participants’ facial emotions in the exhibition hall, and create an independent space for participants through light, shadow, music and images, participants can be freed from the pressure and depression of modern life.


Fig. 51.4 Comparison of recognition rate of emotion

51.4 Conclusion Music visualization is a new visualization technology. In the visual industry, the use of vision as a carrier to convey a large amount of information is to transform various forms of information into visual forms for communication, that is, the visual design of information. In this paper, the application of emotion recognition technology based on SVM algorithm in interactive music visualization system is studied. Using SVM to study the relationship between music and emotion, and modeling this relationship, so as to automatically identify music emotion. Based on the feature dimension reduction method of GA, SVM is optimized to improve the recognition performance of music emotion recognition system. The results show that the average recognition rate of 92.4% is achieved by using SVM classification algorithm as emotion recognition model. It is proved that the SVM music emotion recognition scheme based on GA optimization is effective. This paper studies music emotion recognition, but due to the limitation of research time and research level, there are still many problems that need to be further improved in the future. At present, some achievements have been made in the research of music emotion recognition algorithm, but there is still a certain distance from practical application. It is of great significance to study a high-precision and high-efficiency recognition algorithm.


References 1. S. Jin, J. Qin, Research on intelligent design of music visualization based on computer image style transfer. Packag. Eng. 41(16), 6 (2020) 2. W. Zhao, N. Sun, X. Yang et al., Research and application of visual education of oriental music based on knowledge map. Comput. Eng. Sci. 40(1), 7 (2018) 3. Y. Chen, Y. Xu, Audiovisual synaesthesia and Kandinsky’s music visualization. Art Obs. 2, 5 (2022) 4. Q. Wu, Research on emotion recognition method of weightlifters based on a non-negative matrix decomposition algorithm. Int. J. Biometr. 13(3), 229 (2021) 5. S. Hamsa, Y. Iraqi, I. Shahin et al., An enhanced emotion recognition algorithm using pitch correlogram, deep sparse matrix representation and random forest classifier. IEEE Access 99, 1 (2021) 6. T. Chakraborti, A. Chatterjee, A. Halder et al., Automated emotion recognition employing a novel modified binary quantum-behaved gravitational search algorithm with differential mutation. Expert. Syst. 32(4), 522–530 (2015) 7. K. Manohar, E. Logashanmugam, Hybrid deep learning with optimal feature selection for speech emotion recognition using improved meta-heuristic algorithm. Knowl.-Based Syst. 21, 246 (2022) 8. H. Yu, Online teaching quality evaluation based on emotion recognition and improved AprioriTid algorithm. J. Intell. Fuzzy Syst. 40(5), 1–11 (2020) 9. M. Zhang, L. Zhang, Cross-cultural O2O English teaching based on AI emotion recognition and neural network algorithm. J. Intell. Fuzzy Syst. 11, 1–12 (2020) 10. C. Katsimerou, I. Heynderickx, J.A. Redi, Predicting mood from punctual emotion annotations on videos. IEEE Trans. Affect. Comput. 6(99), 1 (2015) 11. S. Mitsuyoshi, F. Ren, Emotion recognition. J. Inst. Electr. Eng. Jpn. 125(10), 641–644 (2007) 12. W.L. Zheng, B.L. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans. Auton. Ment. Dev. 7(3), 1 (2015)

Chapter 52

Research on Image Classification Algorithm of Film and Television Animation Based on Generative Adversarial Network Li Yang

Abstract The strong influence and appeal of TV media and Internet on the audience has greatly strengthened the influence of animation art on politics, economy, society and culture. Image classification is one of the most common tasks in computer vision, which is mainly to extract effective feature information from the original digital image and give the input image a label category. The semi-supervised image classification algorithm can solve the problem of insufficient classification performance under the premise of limited labeled samples. In this article, a movie animation image classification algorithm based on Generative Adversarial Network (GAN) is proposed, which transforms the feature tracking problem of multi-frame movie images into the classification problem of foreground and background of image targets, thus completing the task of classifying, locating, detecting and segmenting movie animation images. The results show that the proposed method can quickly extract the target, and has a fast operation speed, which is enough to realize real-time target tracking. When some labels of training samples are wrong, the classification accuracy of this algorithm will not change much, that is, the fault-tolerant ability of the network will be improved. Keywords Film and television animation · Image classification · Generative Adversarial Network

52.1 Introduction With the expansion of production scale and the deepening of industrial cooperation, the reusability and sharing of material resources pose a very serious problem for film and television animation enterprises [1]. The strong influence and appeal of TV media L. Yang (B) Department of Media and Art Design, Guangzhou Huali College, Guangzhou 511325, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_52


and Internet on the audience has greatly strengthened the influence of animation art on politics, economy, society and culture. Film and television images are the basis of moving images, and the tracking of moving objects in film and television images is an important link in post-production. The difficulty lies in separating film and television images from complex backgrounds in real time and accurately [2]. Deep learning has made a breakthrough in the field of natural image recognition, especially in the recognition and classification of specific objects such as faces and license plates [3]. However, during the development of the algorithm, the shortcomings such as poor generalization ability and large amount of data needed for training are gradually exposed. Image classification is one of the important research hotspots in computer vision and pattern recognition. According to whether the training samples are labeled or not, classification algorithms can be divided into supervised learning and unsupervised learning [4]. The acquisition cost of labels in supervised learning is high, while the classification performance of unsupervised learning is low. Therefore, semi-supervised learning, which can obtain good classification performance with only a few labels, has become a research hotspot in classification algorithms. Image classification is one of the most important tasks in computer vision, and semi-supervised image classification algorithm can solve the problem of insufficient classification performance on the premise of limited labeled samples [5]. Information data is transmitted all the time, and data in the form of pictures and videos can reflect the content of information more intuitively, comprehensively and quickly than text [6]. Due to the lack of effective semantic description of multimedia data, various systems cannot share information. Therefore, a lot of relevant information cannot be retrieved. In order to obtain multimedia information effectively, information retrieval based on semantics must be realized. Semantic-based multimedia retrieval technology can not only improve the accuracy of information retrieval, but also improve the degree of information sharing and retrieval intelligence [7]. For labeled data, it is impossible to completely classify them correctly, so it is impossible to guarantee that the predicted values of the model are accurate for all unlabeled data, which may make the performance of the model worse than that of fully supervised learning [8]. In this article, a classification algorithm of film and television animation images based on GAN is proposed, which transforms the feature tracking problem of multi-frame film and television images into the classification problem of image target foreground and background, thus completing the task of classification, location, detection and segmentation of film and television animation images. By adding unlabeled data to the training set, the performance of the algorithm is better than that of using labeled data alone.


52.2 Methodology

52.2.1 Image Characteristics of Contemporary Film and Television Animation

Animation art is one of the art categories that can best reflect the emotions of subject and object. Its unlimited range of subject matter, highly hypothetical content and strong spiritual rendering, coupled with the pre-production design characteristics of animation art, give the image characteristics of animation a unique expressive advantage in meeting viewers' psychology and arousing emotional resonance. The formation of film and television animation features is neither merely a matter of basic points, lines, surfaces and colors, or of the composition of single or multiple shots, nor a simple choice of content and theme. It is a historical and dynamically changing process, which is an important reason for the diversity of contemporary animation image features [9].

The authenticity of the output data means that the information of the data generated by the generator is used to the maximum extent in GAN, so that the logical closed loop of the model is more complete. The predicted label of the output data serves the image classification task. The model can also be used for tasks such as image segmentation and text generation; if it is used for image segmentation, the predicted label output here should become the coordinates of the segmentation region. Corresponding to different tasks, the final output of the discriminator can be changed according to the situation.

In the form of film and television animation images, whether in role design or scene design, it is not just a simple process of virtualization, expression and reproduction of animation forms. The form involved is actually a complex whole, which involves not only the unique laws of film and television animation art as an independent art, but also the influence of external factors such as the times, science and technology, culture and commercial operation, and its role and limitations as a link in the industrial chain of the modern economy [10]. Digital movie images also have advantages over ordinary traditional movies. Digital technology avoids the repeated copying of traditional movies from the original shot material to the release copy: a film can easily be finalized with one click, greatly shortening the release time. Its playback is lossless; even if it is shown repeatedly, the quality of sound and picture is not affected, avoiding the scratches on pictures and soundtracks that traditional films suffer after multiple screenings [11].

As a new sub-direction of machine learning, deep learning has made breakthroughs in many difficult tasks. However, its disadvantage is that the training process needs a large amount of labeled data to obtain good results. Accordingly, tasks can be divided into supervised learning and unsupervised learning according to whether the samples carry labels. Supervised learning uses a large number of labeled data to learn and mine the inherent laws of the data, thus helping humans identify and estimate the attributes of new data.


52.2.2 Video Animation Image Classification Algorithm Based on GAN

Starting from the overall distribution characteristics of the samples, the semi-supervised clustering hypothesis can be adopted: within the same cluster, the sample category labels are assumed to be the same, so that the decision boundary can be constrained by the sample distribution density learned from unlabeled samples [12]. Starting from the local characteristics of the samples, the semi-supervised manifold hypothesis can be adopted: based on the hypothesis that samples lying in a small local neighborhood on the same low-dimensional manifold have a high probability of sharing similar labels, a large number of unlabeled samples are used to fill the sample space and improve the learning of local neighborhood relations. After sufficient training, the generator and the discriminator can reach a Nash equilibrium through their mutual game. At this point the loss function reaches a stable value: the loss of the generator stabilizes, the generator can generate realistic image data according to a specified label, and the loss of the discriminator also stabilizes. The influence of unlabeled samples on the decision boundary is shown in Fig. 52.1.

GAN is an adversarial network model, which draws on the idea of mutual competition and progress between two players. The network model consists of two parts: a generator and a discriminator. The generator tries to produce fake data that the discriminator cannot distinguish from real data, while the discriminator tries to improve its discriminating ability so as to identify whether the input data is real or forged by the generator. The model of the GAN discriminator is shown in Fig. 52.2.

Fig. 52.1 Schematic diagram of the influence of unlabeled samples on decision boundaries


Fig. 52.2 GAN discriminator model
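The chapter describes the discriminator only at a block-diagram level (Fig. 52.2). The following is a minimal PyTorch sketch, not the authors' implementation, of a discriminator of the kind described in the text: strided convolutions instead of pooling, no fully connected layer, and two outputs (a real/fake score and class predictions). The input size of 3 × 64 × 64 and the count of 10 classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemiSupervisedDiscriminator(nn.Module):
    """Illustrative discriminator: strided convolutions replace pooling,
    no fully connected layer; two heads return a real/fake score and
    per-class logits, as the semi-supervised GAN description suggests."""
    def __init__(self, in_ch=3, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.adv_head = nn.Conv2d(256, 1, 8)           # real/fake score
        self.cls_head = nn.Conv2d(256, n_classes, 8)   # class prediction

    def forward(self, x):                     # x: (B, 3, 64, 64)
        h = self.features(x)                  # (B, 256, 8, 8)
        real_fake = self.adv_head(h).flatten(1)      # (B, 1)
        class_logits = self.cls_head(h).flatten(1)   # (B, n_classes)
        return real_fake, class_logits

disc = SemiSupervisedDiscriminator()
score, logits = disc(torch.randn(4, 3, 64, 64))
print(score.shape, logits.shape)
```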

It is assumed that all sample data are generated by the same potential generation model. Based on this assumption, the unlabeled sample data can be associated with the learning target, with the parameters of the potential generation model serving as a bridge; the labels of the unlabeled data are the missing parameters of the potential model, and these missing parameters are usually solved by maximum likelihood estimation. The key to this approach lies in the assumptions of the generative model, and different methods correspond to different assumptions. Assume that the input of layer l is a C^l × H^l × W^l tensor, where C^l is the number of input channels of layer l, and the size of a single convolution kernel is C^l × h^l × w^l. Then, for the convolution layer with C^{l+1} hidden neurons, the output at the corresponding position is as follows:

y_{d,i^l,j^l} = σ( Σ_{c^l=0}^{C^l} Σ_{i=0}^{h^l} Σ_{j=0}^{w^l} p_{d,c^l,i,j} · x_{c^l, i^l+i, j^l+j} + b_d )    (52.1)

where d is the index of the hidden neuron in layer l + 1, and i^l and j^l represent the location information, subject to the constraints of Eqs. (52.2) and (52.3):

0 ≤ i^l ≤ H^l − h^l + 1    (52.2)

0 ≤ j^l ≤ W^l − w^l + 1    (52.3)

where p is the convolution kernel parameter, b is the bias parameter of the convolution, and σ(·) is the activation function. The model is first trained with labeled data; the unlabeled training data is then fed into the model and predicted, the most probable predicted value is selected, and this value is defined as the pseudo-label of the sample and treated as its true label. Let the gray value range of the original film and television animation image f(x, y) be (g_min, g_max), and choose a suitable threshold T such that:


g_min ≤ T ≤ g_max    (52.4)

Image segmentation with a single threshold can be expressed as:

g(x, y) = { 1, if f(x, y) ≥ T; 0, if f(x, y) < T }    (52.5)

g(x, y) is the binarized image. The object can easily be separated from the background through binarization, and the key to binarizing the film and television animation image is a reasonable choice of the threshold T. In the generator, deconvolution is used and upsampling replaces the pooling layer; in the discriminator, strided convolution replaces the pooling layer, and the fully connected layer is removed to give the model higher stability.
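Eqs. (52.4)–(52.5) can be applied directly to a grayscale frame. The NumPy sketch below is illustrative only; the synthetic frame and the threshold value are arbitrary stand-ins.

```python
import numpy as np

def binarize(frame, T):
    """Single-threshold segmentation of Eq. (52.5): foreground where
    f(x, y) >= T, background otherwise."""
    return (frame >= T).astype(np.uint8)

# Example with a synthetic grayscale frame and an illustrative threshold
frame = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
T = 128                      # must satisfy g_min <= T <= g_max (Eq. 52.4)
mask = binarize(frame, T)
print(mask.mean())           # fraction of pixels assigned to the foreground
```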

52.3 Result Analysis and Discussion

When neural networks in deep learning are used to classify images, it is no longer necessary to describe the features of the input images manually; the neural network extracts features by training on the data set. Moreover, the features extracted by the neural network carry higher semantic meaning, which is far more efficient and effective than manual extraction. Semi-supervised learning labels only a small part of the data set; using the model to identify and mine feature information can quickly improve its generalization ability and at the same time overcome the shortcomings of both supervised and unsupervised learning. In the training process of a specific network, as the number of iterations increases, the network model often fits the training set well with a small training error, but fits the test set poorly, which leads to a large test error. The discriminator produces five outputs: a true-or-false judgment for generated data, a category prediction for generated data, a true-or-false judgment for real labeled data, a category prediction for real labeled data, and a true-or-false judgment for real unlabeled data [13]. In the experiments, it is found that the classifier's category prediction for generated data reduces the classification accuracy of the model; this is probably because, at the beginning of training, the generator has not yet learned the category characteristics of the different classes, so the image data generated according to a specified label is not close to the real images. The classification accuracy of different algorithms is shown in Fig. 52.3.

As can be seen from Fig. 52.3, the image classification accuracy of the proposed algorithm is more advantageous than that of the traditional algorithms. In this article, by adding unlabeled data to the training set, the performance of the algorithm is better than when using labeled data alone, thus improving the recognition accuracy. In the process of training the same batch over several iterations, the output of the same batch of samples will change if they are randomly processed. In order to guide the direction of model parameter optimization, it is hoped that the outputs produced from the same input will be as similar as possible, that is, the predicted probability of belonging to a certain category should be as close as possible across predictions.


Fig. 52.3 Classification accuracy of different algorithms

The purpose of the generator in GAN is to generate data with the same distribution as the samples. A problem that often occurs in GAN is that a single sample generated by the generator can be recognized by the discriminator: in the learning process the generator drifts towards single-point features, the generated data is fed to the discriminator, and because of gradient descent and other reasons the discriminator cannot separate identical inputs, which leads to mode collapse. Feature matching can effectively solve the mode collapse problem. Feature matching specifies a new target for the generator to prevent the instability caused by the generator being over-trained against the current discriminator. The new goal is not to maximize the output of the discriminator, but to make the feature distribution of the data generated by the generator match that of the real data. Based on this, a second-order difference built from the mean difference and the variance difference can be used to bring the two distributions closer and make training more stable. Figure 52.4 shows the MAE comparison of different algorithms on the test set, and Fig. 52.5 shows the response-time comparison of different algorithms. On the test set, the MAE of this algorithm is obviously improved compared with the traditional CNN and SVM algorithms, and the response time of the algorithm is obviously reduced. In this paper, an adaptive operator is added to the local search strategy of the algorithm, so that the local search range decreases as the algorithm iterates, making the local search more targeted. This method obtains ideal classification results for movie and television animation images, and the recognition accuracy is higher than that of other image classification methods. The semi-supervised learning of GAN is mainly realized by the discriminator, which considers not only the probability that the input sample belongs to the real samples, but also the probability assigned to the label category of labeled input samples.

Fig. 52.4 Error of different algorithms on the training set

Fig. 52.5 Response time of different algorithms
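The second-order feature-matching objective described above (matching the means and variances of real and generated features) can be sketched as follows. The feature arrays are assumed to come from an intermediate discriminator layer; the random data here is only a stand-in, and this is not the authors' exact loss.

```python
import numpy as np

def feature_matching_loss(feat_real, feat_fake):
    """Second-order feature matching: squared difference of the feature
    means plus squared difference of the feature variances, averaged
    over feature dimensions. Inputs have shape (batch, feature_dim)."""
    mean_gap = np.mean(feat_real, axis=0) - np.mean(feat_fake, axis=0)
    var_gap = np.var(feat_real, axis=0) - np.var(feat_fake, axis=0)
    return np.mean(mean_gap ** 2) + np.mean(var_gap ** 2)

# Example with random stand-in features from an intermediate layer
real = np.random.randn(64, 256)
fake = np.random.randn(64, 256) * 1.5 + 0.3
print(feature_matching_loss(real, fake))
```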

52.4 Conclusion

Movie animation images are the basis of moving images, and the tracking of moving objects in movie images is an important link in post-production; the difficulty lies in separating movie images from complex backgrounds accurately and in real time. Information data is transmitted all the time, and data in the form of pictures and videos can reflect the content of information more intuitively, comprehensively and quickly than text. Due to the lack of effective semantic description of multimedia data, various systems cannot share information, so a lot of relevant information cannot be retrieved. In this article, a classification algorithm for film and television animation images based on GAN is proposed, which transforms the feature tracking problem of multi-frame film and television images into the classification problem of image target foreground and background, thus completing the tasks of classification, location, detection and segmentation of film and television animation images. An adaptive operator is added to the local search strategy of the algorithm, so that the local search range decreases as the algorithm iterates, making the local search more targeted. This method obtains ideal classification results for film and television animation images, and the recognition accuracy is higher than that of other image classification methods. Existing semi-supervised classification algorithms mainly expand the output categories of the discriminator and establish the entropy difference between real samples and generated samples to construct the classification objective function. In follow-up studies, integrating GAN with transfer learning and some traditional machine learning algorithms can be considered to realize new forms of semi-supervised classification.

Acknowledgements One of the stage achievements of the animation specialty project of the 2020 Guangdong quality engineering project, characteristic specialty No. 84 (Guangzhou Huali College).

References 1. Z. Lv, G. Li, Y. Chen, J. Atli Benediktsson, Novel multi-scale filter profile-based framework for VHR remote sensing image classification. Remote Sens. 11, 2153 (2019) 2. X. Liu, J.L. Song, S.H. Wang, J.W. Zhao, Y.Q. Chen, Learning to diagnose cirrhosis with liver capsule guided ultrasound image classification. Sensors 17, 149 (2017) 3. P. Tang, X. Wang, B. Feng, W. Liu, Learning multi-instance deep discriminative patterns for image classification. IEEE Trans. Image Process. 26, 3385–3396 (2016) 4. D.K. Jain, S.B. Dubey, R.K. Choubey, A. Sinhal, S.K. Arjaria, A. Jain, H. Wang, An approach for hyperspectral image classification by optimizing SVM using self organizing map. J. Comput. Sci. 25, 252–259 (2018)


5. R. Goldblatt, M.F. Stuhlmacher, B. Tellman, N. Clinton, G. Hanson, M. Georgescu, C. Wang, F. Serrano-Candela, A.K. Khandelwal, W.-H. Cheng, Using Landsat and nighttime lights for supervised pixel-based image classification of urban land cover. Remote Sens. Environ. 205, 253–275 (2018) 6. K. Chen, Q. Wang, Y. Ma, Cervical optical coherence tomography image classification based on contrastive self-supervised texture learning. Med. Phys. 49, 3638–3653 (2022) 7. T. Han, L. Zhang, S. Jia, Bin similarity-based domain adaptation for fine-grained image classification. Int. J. Intell. Syst. 37, 2319–2334 (2022) 8. J.E. Arco, A. Ortiz, J. Ramírez, Y.-D. Zhang, J.M. Górriz, Tiled sparse coding in eigenspaces for image classification. Int. J. Neural Syst. 32, 2250007 (2022) 9. M. Wang, Y. Wan, Z. Ye, X. Lai, Remote sensing image classification based on the optimal support vector machine and modified binary coded ant colony optimization algorithm. Inf. Sci. 402, 50–68 (2017) 10. Q. Yu, J. Wang, S. Zhang, Y. Gong, J. Zhao, Combining local and global hypotheses in deep neural network for multi-label image classification. Neurocomputing 235, 38–45 (2017) 11. S. Liu, L. Li, Y. Peng, G. Qiu, T. Lei, Improved sparse representation method for image classification. IET Comput. Vis. 11, 319–330 (2017) 12. Y. Dong, J. Feng, L. Liang, L. Zheng, Q. Wu, Multiscale sampling based texture image classification. IEEE Signal Process. Lett. 24, 614–618 (2017) 13. L. Shu, K. McIsaac, G.R. Osinski, R. Francis, Unsupervised feature learning for autonomous rock image classification. Comput. Geosci. 106, 10–17 (2017)

Chapter 53

Study on Monitoring Forest Disturbance During Power Grid Construction Based on BJ-3 Satellite Image

Zijian Zhang, Peng Li, and Xiaobin Zheng

Abstract The construction of a power grid project will inevitably cause certain damage and disturbance to forest resources. Timely and accurate knowledge of the forest distribution and disturbance information within the construction scope is of great significance for the development of the power industry and for ecological environment protection. Therefore, BJ-3 satellite remote sensing images are used as the data source, the construction area of the Chenguantun-Xiaoyingzi Subgrid Project is taken as an example, and a forest disturbance monitoring method based on very high-resolution satellite remote sensing images during the construction period of power grid projects is proposed. The results show that the accuracy of the forest disturbance monitoring results is more than 90%. During the period from August 19 to September 25, the area of forest disturbed by construction decreased by 470.7 m². The accuracy of the monitoring results of this method meets the business requirements and can provide a technical reference for ecological environment protection monitoring during the construction period of power grid projects in China.

Keywords Power grid project · Forest disturbance monitoring · High-resolution image · Object-oriented classification · BJ-3 satellite images

Z. Zhang (B) State Grid Jibei Electric Power Co., Ltd., Beijing 100054, China e-mail: [email protected]
P. Li Jibei Electric Power Research Institute, State Grid Jibei Electric Power Co., Ltd., North China Electric Power Research Institute Co., Ltd., Beijing 100045, China
X. Zheng State Grid Jibei Electric Power Co., Ltd., Engineering Management Company, Beijing 100070, China


53.1 Introduction The forest known as the “lung of the earth” has many values such as water conservation, windbreak and sand fixation, and soil and water conservation [1–3]. The stable maintenance of the forest ecosystem is of great significance to the protection of ecological environment and the improvement of living environment [4, 5]. With the rapid development of China’s economy and the increase in social electricity consumption in recent years, the number and scale of power transmission and transformation projects have been expanding [6]. The construction process of power transmission project will inevitably cause certain damage and interference to forest resources. Timely and accurate extraction of forest distribution information, monitoring of forest changes, mastering of forest distribution status before construction, forest changes during construction, and forest recovery information after construction are helpful to ecological environment protection management and forest ecosystem stability maintenance during power transmission and transformation project construction. The construction area of power transmission and transformation project has many stations, long lines, complex construction periods, and mostly passes through the sensitive and fragile areas of ecological environment with complex terrain and undulating terrain [7]. Therefore, it is of high technical difficulty to carry out forest spatial distribution information extraction and interference monitoring during the construction period of power transmission and transformation projects. The traditional methods of extracting forest information are inefficient and costly. Remote sensing information technology has become an important means of obtaining forest information due to its real-time and fast advantages [8–10]. In recent years, medium and low-resolution satellite images represented by Landsat, Sentinel and MODIS, and high-resolution satellite images such as Resource Series, WorldView and IKONOS have been increasingly used for forest information extraction and disturbance monitoring, and good experimental results have been achieved [11– 14]. However, these studies are mainly based on medium and high-resolution remote sensing images, and there are few studies on forest monitoring and extraction methods for sub-meter resolution remote sensing images, and there is still a lack of technical methods for large-scale operational applications in areas with diverse topography and complex forest structure. The spectral information of forest, crop and grassland on remote sensing image is relatively close and easy to be confused [15]. Therefore, the key and difficult point of forest distribution information extraction and interference monitoring is how to use remote sensing images to effectively distinguish forests from crops and grasslands. In summary, based on the domestic civil high spatial resolution BJ-3 satellite remote sensing image, the construction area of “Chenguantun-Xiaoyingzi Subgrid Project” in Qinhuangdao City, Hebei Province is taken as an example, and according to the phenological characteristics and business needs of the study area, the applicable remote sensing image data source is selected. This study proposes a forest disturbance monitoring method based on very high-resolution satellite remote sensing images

during the construction period of power transmission and transformation projects. It is of great practical significance for the State Grid Corporation to carry out the business of environmental protection management, green construction supervision, construction plan verification and geological disaster prevention and control in the construction of power transmission and transformation projects.

53.2 Study Area and Data The Chenguantun-Xiaoyingzi 220 kV power transmission and transformation project with a total length of 44 km in Lulong County, Qinhuangdao City, Hebei Province was selected as the study area. The terrain in this area is undulating, with an altitude range of 347–1742 m and a slope range of 0–63°. There are various types of surface coverage in the construction area, including water system, construction land, cultivated land, grassland, and forest land. The construction of the station, tower foundation, stretch field, crossing construction area and transportation road will permanently or temporarily occupy and change the type of ground objects during the construction process. Therefore, vegetation restoration should be carried out on the temporarily occupied land at the end of the construction. The BJ-3 satellite image is used as the data source of this study. It includes the panchromatic band with spatial resolution of 0.3 m and blue band, green band, red band, red edge band, near-infrared band with spatial resolution of 1.2 m. The wavelength range of each band of the satellite is: blue band 450–520 nm, green band 530–590 nm, red band 620–690 nm, red edge band 700–750 nm, near-infrared band 770–880 nm, and panchromatic band 450–800 nm, and the satellite revisit period is 2–3 days. The image imaging time used in this study is August 19, 2022 and September 25, 2022. The image has been preprocessed by geometric correction, orthorectification, radiation correction, band fusion and image enhancement. The forest distribution status data is the second-class survey data of forest resources in Lulong County. Two high-resolution remote sensing images are used as reference to carry out projection transformation, geometric correction and visual editing update on the forest second survey vector sub-compartment data to verify the accuracy of forest distribution remote sensing monitoring results. The power transmission and transformation data including longitude and latitude coordinate information such as engineering station, tower base point, tension field, crossing construction area and line direction are provided by State Grid Jibei Electric Power Company, Engineering Management Company.


53.3 Study Method

53.3.1 High-Resolution Image Segmentation

Compared with low-resolution satellite remote sensing images, high-resolution satellite remote sensing images contain more texture information and allow higher forest recognition. Because of the difference between the spatial resolution of the image and the classification scale of forest objects, pixel-scale classification suffers from the "same object, different spectra" problem and produces the "salt and pepper" phenomenon [16], resulting in poor classification accuracy that cannot meet practical application requirements. For these reasons, the object-oriented classification method is used to extract forest information in this study.

In object-oriented classification, the image must be segmented first. At present, multi-scale segmentation methods are generally used for image segmentation [17], but in this study the multi-scale segmentation method has the following problems: forest objects are over-segmented and the segmentation edges are irregular, which is not conducive to classification and application, and segmentation of high-resolution images takes a long time. To solve these problems, this study proposes a two-step segmentation method: the image is first segmented with a chessboard segmentation, and multi-scale segmentation is then performed, which overcomes the shortcomings of direct multi-scale segmentation. The chessboard segmentation method follows the principle of matrix blocking, which divides a large matrix into multiple sub-matrices of the same size [18]. Let the pixel in the mth row and nth column of the image be (m, n). In this study, (1, 1), (1, 4), (1, 7), (4, 1), (4, 4) and (4, 7) are used as starting points and 10 × 10 sub-matrices are used, so the image is chessboard-segmented six times, once for each starting point. Finally, the multi-scale segmentation method is applied to the integrated result of the multiple chessboard segmentations to complete the image segmentation. The mean-shift algorithm, an iterative clustering process based on non-parametric probability density gradient estimation, has good anti-noise ability and is suitable for parallel processing.
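As a rough illustration of the chessboard step only, the sketch below tiles an image with the six starting points and the 10 × 10 sub-matrix size quoted above. The later multi-scale segmentation and integration (done in eCognition in the paper) are not reproduced; the image size is an assumption.

```python
import numpy as np

def chessboard_labels(height, width, tile=10, offset=(1, 1)):
    """Assign every pixel to a tile of a 10 x 10 chessboard whose grid is
    anchored at the (1-based) starting pixel `offset`; returns a label image."""
    rows = (np.arange(height) - (offset[0] - 1)) // tile
    cols = (np.arange(width) - (offset[1] - 1)) // tile
    rows -= rows.min()
    cols -= cols.min()
    return rows[:, None] * (cols.max() + 1) + cols[None, :]

# Six shifted chessboard segmentations, as listed in Sect. 53.3.1
offsets = [(1, 1), (1, 4), (1, 7), (4, 1), (4, 4), (4, 7)]
segmentations = [chessboard_labels(2000, 3000, 10, o) for o in offsets]
print(len(segmentations), segmentations[0].shape)
```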

53.3.2 Feature Selection

The vegetation index is the most common classification feature applied to forest information extraction [19]. Because the vegetation index differs little between forest, cultivated land and grassland in remote sensing images, forests cannot be distinguished effectively by the vegetation index alone. It is found that tree shadows influence high-resolution images, so the brightness value of forest objects is generally lower than that of other vegetation objects; therefore, the brightness characteristics of the image can be used to further distinguish forest land from other vegetation. Due to the shadow areas caused by terrain in mountainous regions, the brightness values of the image in terrain-shadow areas are generally low, which interferes with the extraction of forest land information. The texture features of forest objects on high-resolution images are clearly distinguishable from those of other vegetation objects, so the texture features of the image are used to further eliminate the interference of terrain shadows. Based on the above analysis, a combination of vegetation index, brightness and texture features is used to extract forest information in this study. Finally, through repeated testing on the sample data collected in the field survey, three classification features are selected: the normalized difference vegetation index (NDVI), the HSI-transform brightness value (Intensity) and the near-infrared band contrast (Contrast), where the contrast is computed from the gray-level co-occurrence matrix in the 0° direction. These features are shown in Fig. 53.1, where a1 and b1 are the NDVI in August and September respectively, a2 and b2 are the Intensity in August and September respectively, and a3 and b3 are the Contrast in August and September respectively.
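A minimal sketch of two of the three features is given below, assuming the reflectance band arrays are already loaded; the GLCM contrast feature is omitted here and would normally be computed with an image-processing library. The band data is synthetic.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized difference vegetation index from NIR and red bands."""
    return (nir - red) / (nir + red + eps)

def hsi_intensity(red, green, blue):
    """Brightness (I) component of the HSI transform: the band mean."""
    return (red + green + blue) / 3.0

# Example with random stand-in bands of a BJ-3-like scene
shape = (512, 512)
blue, green, red, nir = (np.random.rand(*shape) for _ in range(4))
features = np.stack([ndvi(nir, red), hsi_intensity(red, green, blue)])
print(features.shape)   # (2, 512, 512)
```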

53.3.3 Forest Information Extraction and Interference Monitoring The threshold method is used to extract forest distribution and interference information. NDVI threshold is divided into the minimum value of forest NDVI in nontopographic shadow area (N1 ) and the minimum value of forest NDVI in topographic shadow area (N2 ). The intensity threshold is the maximum value of forest plot Intensity (I1 ). The contrast threshold is the average value of the maximum and minimum values of Contrast in forest plots (C1 ). The specific process of forest information extraction and interference monitoring is shown in Fig. 53.2. Some non-forest areas are extracted by Intensity features. The area where the NDVI value in the remaining area is greater than or equal to the minimum NDVI value N1 in the non-topographic shadow area is the forest area. The remaining part of the area is divided into forest areas if the NDVI value is greater than or equal to the minimum NDVI value N2 in the terrain shadow area and the Contrast value does not exceed the C1 threshold, and the remaining areas are non-forest areas. Based on the results of forest distribution extraction from two images, the results of forest disturbance change are analyzed. In this case, the N1 values of the images in August and September were 0.68 and 0.56, the N2 values were 0.62 and 0.42, the I1 values were 0.0060 and 0.0051, and the C1 values were 500 and 622, respectively. After the forest area division is completed, producer and user accuracy are used as evaluation indicators to verify the results [20, 21]. Producer accuracy refers to the ratio of the correct classification part of the whole image to the total number of real references, which mainly reflects the omission error. User accuracy refers to the ratio of the correctly classified part to the total number of the entire image divided into classes, mainly reflecting the misclassification error.
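The decision rules of Fig. 53.2 can be sketched as below, assuming precomputed per-pixel (or per-object) feature maps and using the August thresholds quoted above purely for illustration; this is a reading of the described rules, not the authors' production workflow.

```python
import numpy as np

def classify_forest(ndvi, intensity, contrast, N1, N2, I1, C1):
    """Threshold rules of Fig. 53.2: bright (high-Intensity) objects are
    non-forest; NDVI >= N1 marks forest outside terrain shadow; inside
    shadow, forest additionally requires NDVI >= N2 and Contrast <= C1."""
    candidate = intensity <= I1                        # drop bright non-forest
    sunlit_forest = candidate & (ndvi >= N1)
    shadow_forest = candidate & (ndvi < N1) & (ndvi >= N2) & (contrast <= C1)
    return sunlit_forest | shadow_forest               # boolean forest mask

# Thresholds reported for the August 19 image
N1, N2, I1, C1 = 0.68, 0.62, 0.0060, 500
ndvi_map = np.random.rand(256, 256)
intensity_map = np.random.rand(256, 256) * 0.01
contrast_map = np.random.rand(256, 256) * 1000
forest = classify_forest(ndvi_map, intensity_map, contrast_map, N1, N2, I1, C1)
print(forest.mean())
```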


Fig. 53.1 Three classification feature maps used in this study

53.4 Result Analysis

Based on the multi-scale segmentation and the two-step segmentation method in the eCognition software, specific image segmentation experiments were carried out. The results show that the two-step segmentation method can significantly shorten the segmentation time, and its segmentation effect is better. As shown in Fig. 53.3a, direct multi-scale segmentation over-segments forest objects and produces irregular segmentation edges. As shown in Fig. 53.3b, the two-step segmentation method can reduce the over-segmentation of forests, and the forest objects are relatively complete, which is conducive to the extraction of forest information.


Fig. 53.2 Technical flowchart

Based on the second-class survey data of forest resources in Lulong County, the accuracy of the forest distribution information extracted from the two remote sensing images was verified. The producer accuracies of the August and September images are 91.5% and 93.6% respectively, and the user accuracies are 90.8% and 92.1% respectively, so the monitoring accuracy meets the business requirements (Table 53.1). As shown in Fig. 53.4a, b, according to the requirements of environmental protection and soil and water conservation supervision during the construction of power transmission and transformation projects, the ecological environment impact area is within 1 km of the power transmission and transformation project station. As shown in Fig. 53.4a1, b1, the tower foundation construction area, the traction field, the crossing construction area and the temporary construction road are the areas directly affected by construction disturbance. As shown in region Z1 of Fig. 53.4, within the area affected by construction disturbance, the types of ground objects are changed permanently or temporarily because land is occupied for construction. According to the two periods of remote sensing monitoring, the forest coverage area within the influence range of construction disturbance in the line area was 6261.2 m² on August 19 and 5790.5 m² on September 25: construction disturbance reduced the forest area by 470.7 m², and there was no new planting or restoration of forest during this period.
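The producer and user accuracies reported in Table 53.1 follow the definitions at the end of Sect. 53.3.3. Below is a minimal sketch for a single (forest) class, with synthetic masks standing in for the survey reference and the extraction result.

```python
import numpy as np

def producer_user_accuracy(reference, predicted):
    """Producer accuracy = correctly classified forest / reference forest
    (reflects omission); user accuracy = correctly classified forest /
    predicted forest (reflects commission). Both masks are boolean arrays."""
    correct = np.logical_and(reference, predicted).sum()
    producer = correct / max(reference.sum(), 1)
    user = correct / max(predicted.sum(), 1)
    return producer, user

reference = np.random.rand(500, 500) > 0.4   # stand-in survey sub-compartments
predicted = np.random.rand(500, 500) > 0.4   # stand-in extraction result
print(producer_user_accuracy(reference, predicted))
```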


Fig. 53.3 Comparison of image segmentation effects of different methods: a multi-scale segmentation, b two-step segmentation

Table 53.1 Validation of forest distribution extraction accuracy from two remote sensing images

Phase | Producer accuracy (%) | User accuracy (%)
8.19  | 91.5                  | 90.8
9.25  | 93.6                  | 92.1


Fig. 53.4 Forest disturbance monitoring results: a August 19, b September 25

53.5 Conclusion

Based on high-resolution BJ-3 satellite remote sensing images, an object-oriented threshold classification method is used in this study. Using three features, vegetation index, brightness and gray-level co-occurrence matrix contrast, the forest distribution and the construction interference information of the power transmission and transformation project in the study area are monitored. The forest disturbance monitoring method based on very high-resolution satellite remote sensing images proposed in this study achieves good accuracy during the construction period of power transmission and transformation projects, and can be further applied to power transmission and transformation project construction, environmental protection management, green construction supervision, construction plan verification and geological disaster prevention and control.

In addition, although the forest disturbance monitoring method proposed in this study has achieved good application results in this study area, it has not been verified in other parts of the country; in particular, its adaptability to southern regions needs to be further verified, improved and revised. In the next study, to improve the monitoring accuracy and the fine distinction of single tree species, the tree height information calculated from stereo pair data and measured elevation data can be considered as a classification feature.

Acknowledgements Fund projects: State Grid Jibei Electric Power Co., Ltd. technology projects (52018K220011).

References 1. S. Liu, X. Niu, B. Wang, Q. Song, Y. Tao, An ecological benefit assessment of the grain for green project in Shaanxi Province. Acta Ecol. Sin. 38, 5759–5770 (2018) 2. J.S. Fraser, L.S.P. Knapp, B. Graham, M.A. Jenkins, J. Kabrick, M. Saunders, M. Spetich, S. Shifley, Carbon dynamics in old-growth forests of the Central Hardwoods Region, USA. For. Ecol. Manage. 537, 120958 (2023) 3. D.J. McNeil, G. Fisher, C.J. Fiss, A.J. Elmore, M.C. Fitzpatrick, J.W. Atkins, J. Cohen, J.L. Larkin, Using aerial LiDAR to assess regional availability of potential habitat for a conservation dependent forest bird. For. Ecol. Manage. 540, 121002 (2023) 4. L.N. Jiang, J. Ma, J.K. Liu, Spatial distribution of soil physicochemical properties under different vegetation restoration measures in Mu Us Sand Land. Bull. Soil Water Conserv. 42, 1–7 (2022) 5. T. De Marzo, M. Pratzer, M. Baumann, N.I. Gasparri, F. Pötzschner, T. Kuemmerle, Linking disturbance history to current forest structure to assess the impact of disturbances in tropical dry forests. For. Ecol. Manage. 539, 120989 (2023) 6. S. Wu, X.M. Zhai, X. Cheng, Environmental risk identification and evaluation of the whole process of power transmission and transformation project construction. Sci. Technol. Manag. Res. 40, 76–84 (2020) 7. J. Wu, X.C. Bai, R. Li, Influence of environmental water and soil conservation constraints on site selection of power transmission and transformation projects and solutions. Soil Water Conserv. China 3 (2020) 8. T. Leichtle, M. Kühnl, A. Droin, C. Beck, M. Hiete, H. Taubenböck, Quantifying urban heat exposure at fine scale-modeling outdoor and indoor temperatures using citizen science and VHR remote sensing. Urban Clim. 49, 101522 (2023) 9. I. Palmroos, V. Norros, S. Keski-Saari, J. Mäyrä, T. Tanhuanpää, S. Kivinen, J. Pykälä, P. Kullberg, T. Kumpula, P. Vihervaara, Remote sensing in mapping biodiversity—a case study of epiphytic lichen communities. For. Ecol. Manage. 538, 120993 (2023) 10. S.E. Zhang, G.T. Nwaila, J.E. Bourdeau, Y. Ghorbani, E.J.M. Carranza, Deriving big geochemical data from high-resolution remote sensing data via machine learning: application to a tailing storage facility in the Witwatersrand goldfields. Artif. Intell. Geosci. 4, 9–21 (2023) 11. Z.H. Xu, J. Liu, K.Y. Yu, T. Liu, C.H. Gong, M.Y. Tang, W.J. Xie, Z.L. Li, Construction of vegetation shadow index (SVI) and application effects in four remote sensing images. Spectrosc. Spectr. Anal. 33, 3359–3365 (2013) 12. S.S. Lin, J.S. Zhong, X. He, Extraction method of urban forest land information based on spectral information. For. Resour. Manage. 96 (2021) 13. W.T. Li, Forest Vegetation Classification Using High-Resolution Remote Sensing Image (Beijing Forestry University, 2016) 14. K. Liao, Y. Li, B. Zou, D. Li, D. Lu, Examining the role of UAV lidar data in improving tree volume calculation accuracy. Remote Sens. 14, 4410 (2022) 15. J.M. Hu, Z.Y. Dong, X.Z. Yang, High-resolution remote sensing image information extraction method based on object-oriented. Geospat. Inf. (2021)


16. Y.T. Liu, Z.Y. Li, H.K. Li, Southern citrus woodland extraction method based on phenological and texture features. Sci. Surv. Mapp. 46, 83–93 (2021) 17. H.X. Wang, H.J. Jin, J.L. Wang, Remote sensing image multi-scale segmentation optimization method guided by k-means clustering. Acta Geod. Cartogr. Sin. 44, 526–532 (2015) 18. F. Haiquan, J. Yunzhong, Y. Yuntao, C. Yin, River extraction from high-resolution satellite images combining deep learning and multiple chessboard segmentation. Beijing Da Xue Xue Bao 55, 692–698 (2019) 19. J. Liu, X.H. Dang, L.L. Chen, Inversion of vegetation coverage in the source region of the Yellow River based on MODIS data. Geospat. Inf. 19 (2021) 20. J. Ye, F.X. Meng, W.M. Bai, Comparative study on classification of high-resolution remote sensing images in Zhoukou urban area under ‘Si tong’ condition. J. Geo-Inf. Sci. 22, 2088–2097 (2020) 21. C. Huang, Z. Xu, C. Zhang, H. Li, Q. Liu, Z. Yang, Extraction of rice planting structure in tropical region based on Sentinel-1 temporal features integration. Trans. Chin. Soc. Agric. Eng. 36, 177–184 (2020)

Chapter 54

Transformative Advances in English Learning: Harnessing Neural Network-Based Speech Recognition for Proficient Communication

Tianshi Ge and Yang Wu

Abstract The popularity of neural networks in oral English learning has revolutionized the learning experience, with speech recognition as a key technology. By leveraging neural networks, speech recognition facilitates convenient and intuitive oral English practice, enhancing comprehension of human speech. Learners utilize oral practice applications to improve pronunciation and intonation, benefiting from neural networks’ capabilities. These applications identify errors, providing tailored feedback for skill refinement. Neural networks also extend beyond speech recognition, encompassing vocabulary, grammar, and listening activities. Comparative evaluations demonstrate neural networks’ superiority, achieving impressive performance rates up to 96.76%, surpassing traditional circulating neural networks’ 76.67%. These results underscore their effectiveness and motivate further advancements. As neural network and speech recognition technologies progress, their transformative impact on oral English learning becomes evident, facilitating proficient communication. Learners benefit from efficient methods to enhance speaking skills, unlocking new possibilities and empowering language learners worldwide. Keywords English learning · Neural network · Speech recognition · Computer-aided language learning system

54.1 Introduction Oral English proficiency holds significant importance in modern society, as individuals aspire to communicate confidently in spoken English. Speech recognition technology serves as a valuable tool for enhancing oral English learning, aiding learners in mastering pronunciation and fluency. The widespread utilization of neural T. Ge · Y. Wu (B) School of Foreign Languages, Huaihua University, Hunan 418000, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. A. Tsihrintzis et al. (eds.), Advances in Computational Vision and Robotics, Learning and Analytics in Intelligent Systems 33, https://doi.org/10.1007/978-3-031-38651-0_54


network technology in the field of speech recognition has yielded notable advancements. However, certain challenges persist within existing neural network models, such as model complexity and limited training data. This paper aims to address these issues by investigating techniques to improve model performance and generalization through the utilization of various data augmentation methods. Additionally, the paper will explore the implementation of incremental learning strategies to provide enhanced tools and support for oral English learning. Prior research has proposed different approaches in the realm of online English translation assistance systems. For instance, He and Zhang [1] designed an online English translation assistant system based on the STM32 microprocessor [1]. However, this system exhibits limitations in terms of functional unit pass rates and translation speed. Similarly, Xiong [2] introduced an online English translation assistance system utilizing rules and statistics, which demonstrated low success rates in pronunciation and multi-language translation [2]. Another approach presented by Xiong Q incorporated an online translation framework that leverages online learning to adjust the system instantly. This allows for the production of high-quality texts by learning from users’ post-editing, thereby improving translation efficiency and minimizing repetitive errors [3]. In light of these advancements and limitations, the potential applications of neural networks remain promising. Future studies can further explore advanced technologies and application scenarios to provide intelligent support and guidance for oral English learning. By doing so, we can unlock new possibilities for transformative educational experiences, enabling individuals to enhance their oral English skills more effectively.

54.2 Related Research

54.2.1 Introduction of Neural Network

A neural network, also known as an Artificial Neural Network (ANN), is a dynamic nonlinear system consisting of interconnected neurons. It possesses remarkable abilities such as learning, memory, knowledge induction, and information feature extraction, akin to the human brain. The structural representation of neurons is illustrated in Fig. 54.1. The input/output relationship of a neural network can be mathematically expressed as follows:

u_i = Σ_{j=1}^{N} ω_{ji} x_j − θ_i    (54.1)

y_i = f(u_i)    (54.2)


Fig. 54.1 Neuron model

where ω_{ji} represents the connection weight between neuron i and neuron j; u_i represents the activation value of neuron i, which corresponds to the state of the neuron; y_i represents the output of neuron i; θ_i represents the threshold of neuron i; and f is the excitation function used in the neural network. There are various options available for the excitation function, with the sigmoid function being a commonly used choice. The sigmoid function, also known as the S-shaped function, is strictly monotonically increasing and can be expressed as follows:

y = f(x) = 1 / (1 + e^{−λx})    (54.3)

Here the parameter λ is called the gain of the sigmoid function; the function is differentiable, and its value varies continuously from 0 to 1.

The neural network architecture used for speech recognition in this paper is a recurrent neural network (RNN). RNNs are particularly well-suited for sequential data processing tasks like speech recognition due to their ability to capture temporal dependencies. Unlike feedforward neural networks that process input data in a single forward pass, recurrent neural networks have recurrent connections that allow information to be stored and propagated through time. This property makes RNNs effective for modeling sequential data, where previous inputs can influence the processing of current inputs. In the context of speech recognition, RNNs excel at capturing the temporal dynamics of spoken language. They can effectively handle variable-length input sequences, such as audio samples, by processing them sequentially, taking into account the contextual information from previous time steps. The recurrent nature of the network allows it to capture long-term dependencies in the speech signal, such as the relationships between phonemes and the influence of previous phonemes on current phonetic representations. This makes RNNs well-suited for tasks like speech recognition, where understanding the context and temporal dependencies is crucial.
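Eqs. (54.1)–(54.3) can be written out directly. The NumPy sketch below is illustrative only; the example weights and threshold are arbitrary.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    """S-shaped excitation function of Eq. (54.3); lam is the gain λ."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def neuron_output(x, w, theta, lam=1.0):
    """Single-neuron response following Eqs. (54.1)-(54.2).

    x     : input vector (N,)
    w     : connection weights ω_ji feeding this neuron (N,)
    theta : threshold θ_i of the neuron
    """
    u = np.dot(w, x) - theta          # activation value u_i (Eq. 54.1)
    return sigmoid(u, lam)            # neuron output y_i   (Eq. 54.2)

# Example: three inputs driving one neuron
x = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -0.3, 0.8])
print(neuron_output(x, w, theta=0.1))
```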


To enhance the capabilities of the basic RNN model, variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are often employed. These variants address the issue of vanishing gradients and enable the network to learn and retain information over longer sequences. Additionally, in some cases, convolutional neural networks (CNNs) may be used in conjunction with RNNs to extract local features from spectrograms or other time-frequency representations of the audio signal. The combination of CNNs and RNNs leverages the strengths of both architectures, allowing for efficient feature extraction and capturing temporal dependencies simultaneously. In summary, the use of recurrent neural networks, along with variations like LSTM and GRU, is well-suited for speech recognition tasks due to their ability to capture temporal dependencies and model sequential data effectively. The inclusion of convolutional neural networks in the architecture can further enhance the performance by extracting relevant local features from the audio signal.

54.2.2 Introduction of Speech Recognition in Oral English Learning

In oral English learning, speech recognition technology can be used to identify learners' pronunciation and intonation, thus helping learners correct their pronunciation errors. In addition, speech recognition technology can be used to automatically evaluate the correctness of learners' pronunciation. At present, with the development of neural network technology, more and more research uses neural network models for speech recognition, especially in the field of oral English learning. Neural network models have good performance and generalization ability, which helps realize more accurate and efficient speech recognition. In a word, the study of speech recognition in oral English learning covers many disciplines and is widely applied in speech signal processing, machine learning, computer-aided language learning and other fields. Through continuous research and exploration, a more advanced and efficient speech recognition system can be developed in the future to provide a better learning environment and learning support for English learners. The specific process is shown in Fig. 54.2.

Fig. 54.2 The specific flow chart of speech recognition in spoken English learning

54.3 Simulation and Experiment of Speech Recognition in Oral English Learning Based on Neural Network

54.3.1 Introduction

The neural network is mainly tested and verified through computer simulation. The simulation process includes data preparation, model construction, parameter adjustment and performance evaluation. First of all, voice data needs to be prepared, including recording files of English words, phrases and sentences. These files need to be converted into digital signals for computer processing. At the same time, the corresponding text annotations are prepared as data for training and verification. Secondly, it is necessary to build a speech recognition model based on a deep neural network. Commonly used models are the convolutional neural network (CNN) and the traditional recurrent neural network (RNN); the performance and recognition rate differ according to the speech signal processing and the model architecture. Then, the model parameters need to be adjusted, including selecting the best network architecture, optimization algorithm and training data. It is necessary to constantly try different parameter combinations and compare different performance indicators to achieve the best performance of the model. Finally, the performance of the model is evaluated to determine its application effect in oral English learning. This can include performance indicators such as accuracy, running time and F1 value, and the results of the generated speech corrections can be analyzed and compared.
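The chapter does not give its preprocessing code. The sketch below, assuming the librosa library, shows one possible way to turn a recording into the 5-frame, 24-coefficient MFCC input vector used in Sect. 54.3.3; the sampling rate and the frame-selection strategy are assumptions for illustration.

```python
import librosa
import numpy as np

def utterance_features(wav_path, n_mfcc=24, n_frames=5, sr=16000):
    """Turn one recording into a 5 x 24 = 120-dimensional input vector
    (illustrative preprocessing only, not the authors' pipeline)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (24, T)
    # Normalize the time axis to a fixed number of frames by even sampling.
    idx = np.linspace(0, mfcc.shape[1] - 1, n_frames).astype(int)
    return mfcc[:, idx].T.flatten()                           # (120,)
```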

54.3.2 Evaluation Metrics for Assessing Speech Recognition Performance in Oral English Learning In the realm of assessing the prowess of speech recognition models, a tapestry of evaluation metrics weaves a comprehensive narrative. Drawing inspiration from a constellation of scholars, including [4–14], this assessment unfolds through the prism of accuracy, precision, recall, and word error rate (WER).


Accuracy emerges as a beacon of correctness, quantifying the ratio of correctly identified words against the speech’s total word count. Precision, conversely, delves into the realm of predictive finesse, revealing the ratio of accurately recognized words to the total recognized pool. Amidst this evaluation, recall extends its purview to capture the model’s acumen in identifying relevant words within the context, as a ratio of correctly recognized words to the total reference words. Echoing comprehensiveness, the word error rate (WER) resonates as a quintessential metric, encapsulating both accuracy and fluency within its fold. This constellation of evaluation metrics converges to illuminate the performance panorama of the speech recognition model, traversing the landscape of accuracy, precision, recall, and WER. This multidimensional assessment crystallizes into a holistic understanding of the model’s efficacy, unfurling its intricate tapestry within the realm of oral English learning.
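The word error rate mentioned above is normally computed with an edit-distance dynamic program; the following is a minimal, self-contained sketch with an illustrative example.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard edit-distance dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("she sells sea shells", "she sell sea shells"))  # 0.25
```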

54.3.3 Experimental Results and Analysis

The steps of the traditional recurrent neural network for speech recognition are as follows: (A) Construct a neural network with a single hidden layer. Because the pronunciation of each digit is normalized to 5 frames and each frame uses 24-order MFCC parameters, the input layer has 120 neurons; the output layer has 4 neurons, and a 4-bit binary code is used to represent the ten digits from 0 to 9. (B) Train the neural network with different combinations of training parameters, such as the number of hidden layer neurons, and record the running time and accuracy. (C) Input the characteristic parameters of the regularized spoken English pronunciation into the neural network for recognition. The experimental results are shown in Table 54.1.

Table 54.1 Experimental results of the traditional recurrent neural network

Number of hidden layer units | Accuracy/% | Running time/s
10   | 66.67 | 3.281
50   | 76.67 | 3.533
60   | 76.67 | 4.251
200  | 66.67 | 10.584
1000 | 63.33 | 98.004

Table 54.2 Experimental results of the self-encoding (AE) neural network

Total number of hidden layer units | Accuracy/% | Running time/s
40   | 60    | 3.703
80   | 63.33 | 4.858
200  | 86.67 | 13.324
400  | 90    | 33.719
500  | 96.67 | 43.561
1000 | 93.33 | 151.288

The steps of speech recognition for oral English learning using a self-encoding neural network are as follows: (A) A neural network with two hidden layers was constructed using a sparse automatic encoding method. This method first learns a hidden representation from the input and trains it to obtain the parameters of the first hidden layer.

(B) Use the output value of step (A) as the input of the second-layer network, and use the same method to train the parameters of the second hidden layer. (C) Use the output value of step (B) as the input of the softmax multi-classifier, and train the parameters of the softmax classifier with the labels of the original data. (D) The training process optimizes the error function of the whole network, which consists of the two hidden layers and the final classifier. The partial derivatives with respect to each parameter, including the connection weights and offsets, are calculated and used to update their values. The parameters obtained from this step serve as the initial values for the entire network; the L-BFGS optimization algorithm is then employed to iteratively search for the optimal network parameters, concluding the network training. (E) Input the regularized characteristic parameters into the trained neural network for English oral speech recognition; the results are shown in Table 54.2.

From Tables 54.1 and 54.2, it can be seen that the recognition rate of the self-encoding neural network reaches a maximum of 96.67%, while the recognition rate of the traditional recurrent neural network is only 76.67%. Therefore, the recognition performance of the self-encoding neural network is much better than that of the traditional recurrent neural network. In addition, since the self-encoding neural network has two hidden layers, the numbers of neurons in its hidden layers can be combined in multiple ways; the experimental results show that the best recognition result is obtained when the first and second hidden layers contain 400 and 100 neurons, respectively. We also found that, in English speech recognition, the accuracy does not keep increasing with the number of hidden neurons: once the number of neurons reaches a certain size, the accuracy no longer improves and even decreases, while the running time of the system keeps growing as the number of neurons increases. The convolutional neural network (CNN) and the traditional recurrent neural network (RNN) models are compared, and the performance and recognition rate of the different models are compared according to different speech signal processing and model architectures. Then, the model parameters need to be adjusted, including selecting the best network architecture, optimization algorithm and training data.


architecture, optimization algorithm and training data. It is necessary to constantly try different parameter combinations and compare different performance indicators to achieve the best performance of the model. To select the performance evaluation and recognition rate of the optimal model, and determine the application effect in oral English learning as shown in Figs. 54.3 and 54.4. From Figs. 54.3 and 54.4, it can be seen that the performance evaluation and identification rate of the frames in the neural network of this network are compared with those in the conventional ring neural network. Compared with traditional ring networks, it has 76.67% classification ability, higher classification ability, and higher classification ability. Its classification ability and higher classification ability are 7.6 and 96.76%, respectively. The results show that compared with traditional recursive neural networks, this method significantly improves. The reason for choosing neural Fig. 54.3 Performance and recognition rate of traditional recurrent neural network

Fig. 54.4 Performance and recognition rate of neural network


The reason for choosing the self-encoding neural network is that it provides the strongest support for spoken language, achieves high performance in speech recognition, and performs well in practice.
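The parameter-adjustment procedure described above, repeatedly trying different hidden-layer combinations and keeping the one with the best recognition rate on held-out data, could be organized along the following lines. The candidate sizes, the recognition_rate and select_architecture helpers, and the reuse of the train_stacked_network function from the earlier sketch are assumptions made purely for illustration, not the authors' experimental setup.

import torch

def recognition_rate(model, x_val, y_val):
    # Fraction of validation utterances whose predicted class matches the label.
    with torch.no_grad():
        pred = model(x_val).argmax(dim=1)
    return (pred == y_val).float().mean().item()

def select_architecture(x_tr, y_tr, x_val, y_val, n_classes,
                        candidates=((400, 100), (200, 100), (100, 100))):
    # Try each candidate pair of hidden-layer sizes and keep the one with the
    # highest recognition rate on the held-out validation data.
    best_sizes, best_rate = None, -1.0
    for sizes in candidates:
        model = train_stacked_network(x_tr, y_tr, n_classes, sizes=sizes)
        rate = recognition_rate(model, x_val, y_val)
        if rate > best_rate:
            best_sizes, best_rate = sizes, rate
    return best_sizes, best_rate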

54.4 The Role of Neural Network-Based Speech Recognition in Shaping the Future of Oral English Learning

Neural network-based speech recognition plays a crucial role in shaping the future of oral English learning. This advanced technology revolutionizes the way learners engage with spoken English, providing them with efficient and intuitive learning experiences. By harnessing the power of neural networks, learners can enhance their pronunciation, intonation, and overall oral communication skills.

One key aspect where neural network-based speech recognition excels is in providing learners with real-time feedback and guidance. Through various oral practice applications, learners can engage in pronunciation exercises and receive immediate feedback on their performance. Neural networks are employed to detect pronunciation errors, analyze speech patterns, and offer tailored suggestions for improvement. This personalized approach not only enhances learners’ self-awareness of their speech production but also accelerates their progress in acquiring accurate pronunciation.

Moreover, neural network-based speech recognition extends its impact beyond pronunciation practice. It can be utilized in vocabulary learning, grammar practice, and listening comprehension activities. For instance, learners can use speech recognition technology to interact with language learning applications that offer vocabulary quizzes or grammar exercises. By speaking out words or sentences, learners receive instant evaluation and assistance, making their language learning more interactive and engaging.
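As a rough illustration of the feedback loop described above, a pronunciation-practice application might align the recognizer's transcript against the target sentence and flag mismatched words. The recognize callable stands in for whatever speech-recognition backend is used and, like the pronunciation_feedback helper itself, is a hypothetical name introduced only for this sketch.

import difflib
from typing import Callable, List

def pronunciation_feedback(audio: bytes, target: str,
                           recognize: Callable[[bytes], str]) -> List[str]:
    # Transcribe the learner's utterance, align it word-by-word against the
    # target sentence, and report the words that did not match.
    heard = recognize(audio).lower().split()
    expected = target.lower().split()
    feedback = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=expected, b=heard).get_opcodes():
        if tag == "equal":
            continue
        for word in expected[i1:i2]:
            heard_as = " ".join(heard[j1:j2]) or "(nothing)"
            feedback.append(f"Check the pronunciation of '{word}'; it was recognized as {heard_as}.")
    return feedback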

54.5 Conclusion

The future of oral English learning heavily relies on the continuous development and application of neural network-based speech recognition technology. As researchers and developers explore new methods to enhance the performance and generalization abilities of neural network models, the potential for transformative advancements in oral English learning becomes increasingly evident. With improved model complexity and enriched training data, neural networks can offer more accurate and comprehensive feedback, enabling learners to refine their oral skills to a higher degree of proficiency.

The integration of neural network-based speech recognition in oral English learning holds great promise for the future.


By leveraging this technology, learners can benefit from personalized feedback, enhanced interactive learning experiences, and accelerated language acquisition. As research continues to advance and new applications emerge, neural network-based speech recognition will undoubtedly shape the future of oral English learning, empowering learners worldwide with the necessary tools and support to become proficient and confident communicators.

Acknowledgements This work was supported by the Outstanding Youth Scientific Research Project of Hunan Provincial Education Department (No. 22B0768) and the Teaching Reform Project of Hunan Province (HNJG-2021-0930).
