Remote Sensing Intelligent Interpretation for Geology: From Perspective of Geological Exploration 9819989965, 9789819989966

This book presents the theories and methods for geology intelligent interpretation based on deep learning and remote sen


English Pages 246 [240] Year 2024


Table of contents :
Preface
Contents
1 Geological Remote Sensing: An Overview
1.1 Description of Geological Remote Sensing
1.1.1 Concept and Principle of Geological Remote Sensing
1.1.2 Key Remote Sensing Techniques for Geological Research
1.1.3 Key Features for the Interpretation of Geological Remote Sensing
1.1.4 Technical System of Remote Sensing Technology for Geological Research
1.1.5 Applications of Geological Remote Sensing
1.2 Research Advance in Geological Remote Sensing
1.2.1 Payload: Developing from Single Optical Sensor to New Multimodal Sensors
1.2.2 Data Processing: Developing from “Rough” to “Precision”
1.2.3 Geological Application: Developing from Human–Computer Interaction to All-Factor Intelligent Interpretation
1.3 Intelligent Interpretation of Geological Remote Sensing
1.3.1 Technical System of Intelligent Interpretation of Remote Sensing for Geology
1.3.2 Advantages of Intelligent Interpretation of Geological Remote Sensing
1.4 Challenges and Future Directions of Geological Remote Sensing
1.4.1 Gaps and Challenges in Current State of Geological Remote Sensing
1.4.2 Research Directions of Geological Remote Sensing
References
2 Geological Remote Sensing Dataset Construction for Multi-level Tasks
2.1 Pixel-Level Dataset
2.1.1 Study Area
2.1.2 Data Sources
2.1.3 Data Preprocessing
2.1.4 Dataset Construction
2.2 Scene-Level Dataset
2.2.1 Study Area
2.2.2 Data Sources
2.2.3 Data Preprocessing
2.2.4 Dataset Construction
2.3 Semantic Segmentation-Level Dataset
2.3.1 Study Area
2.3.2 Data Sources
2.3.3 Data Preprocessing
2.3.4 Dataset Construction
2.4 Semantic Segmentation-Level Dataset Based on Multisource Data
2.4.1 Study Area
2.4.2 Data Sources
2.4.3 Dataset Construction
2.5 Prior Knowledge-Assisted Dataset
2.5.1 Study Area
2.5.2 Data Sources
2.5.3 Data Preprocessing
2.5.4 Dataset Construction
2.6 Transfer Learning Dataset
2.6.1 Study Area
2.6.2 Data Sources
2.6.3 Data Preprocessing
2.6.4 Dataset Construction
2.7 Transfer Learning Dataset for Prior Knowledge-Assisted Study
2.7.1 Study Area
2.7.2 Data Sources
2.7.3 Data Preprocessing
2.7.4 Dataset Construction
References
3 Lithological Classification Based on Large-Scale Pixel Neighborhood and VGGnet-Based Transfer Learning
3.1 Introduction
3.2 Methods
3.2.1 Construction of Model
3.2.2 VGG16 Convolutional Neural Network Model
3.2.3 VGG16 Transfer Learning Model
3.2.4 Accuracy Evaluation
3.3 Results
3.3.1 Experimental Environment and Setup
3.3.2 Full Image Prediction
3.3.3 Visual Evaluation of Prediction Results
3.3.4 Quantitative Accuracy Evaluation
3.4 Conclusion
References
4 Lithological Remote Sensing Scene Classification Based on Multi-view Data
4.1 Introduction
4.1.1 Research Background and Significance
4.1.2 Research Status
4.1.3 Research Objectives and Main Research Contents
4.2 Methods
4.2.1 Lithologic Scene Classification Based on Multi-view Remote Sensing Data Fusion
4.2.2 Accuracy Evaluation
4.3 Results and Discussion
4.3.1 Experimental Setup and Hyperparameter Optimization
4.3.2 Experimental Result
4.3.3 Discussions
4.4 Conclusion
References
5 Geological Lithology Semantic Segmentation Based on Deep Learning Method
5.1 Introduction
5.2 Methods
5.2.1 The Utilized Algorithms
5.2.2 Evaluation Metrics
5.3 Results
5.3.1 Experiment Setup
5.3.2 Model Performance
5.3.3 Visual Assessment
5.4 Discussions
5.5 Conclusion
References
6 Remote Sensing Lithology Intelligent Segmentation Based on Multi-source Data
6.1 Introduction
6.1.1 Research Background and Meaning
6.1.2 Research Status
6.1.3 Research Objectives and Research Content
6.2 Methods
6.2.1 Remote Sensing Lithology Semantic Segmentation Method Based on Adaptive Fusion of Multi-source Data
6.2.2 Remote Sensing Lithology Semantic Segmentation Method Based on Prior Knowledge
6.3 Results and Discussion
6.3.1 Evaluation of Test Set Accuracy of Multi-modal Data Adaptive Fusion Method
6.3.2 Test Set Accuracy Evaluation Using Methods that Incorporate Prior Knowledge
6.4 Conclusion
References
7 Prior Knowledge-Based Intelligent Model for Lithology Classification
7.1 Introduction
7.2 Methods
7.2.1 Lithological Scene Classification Based on Prior Knowledge and Improved Dense Connected Networks
7.2.2 Improved Dense Connectivity Network
7.2.3 Edge Enhancement
7.2.4 Experimental Setup and Environment
7.2.5 Evaluating Metrics
7.3 Results and Discussions
7.3.1 Comparative Experiment of 3EFFSA Model
7.3.2 Comparative Experiment of Prior-3EFFSA Model
7.3.3 Discussion
7.4 Conclusion
References
8 Multi-view Lithology Remote Sensing Scene Classification Based on Transfer Learning
8.1 Introduction
8.2 Methods
8.2.1 Lithologic Scene Classification Based on Transfer Learning
8.2.2 Lithologic Scene Classification Based on Multi-view Remote Sensing Data Fusion
8.2.3 Accuracy Evaluation
8.3 Results and Discussion
8.3.1 Experimental Setup and Hyperparameter Optimization
8.3.2 Experimental Result
8.4 Conclusion
References
9 Lithological Scene Classification Based on Model Migration and Fine-Tuning Strategy
9.1 Introduction
9.1.1 Overview of Transfer Learning
9.1.2 Deep Transfer Learning
9.2 Methods
9.2.1 Left Side as the Source Domain, and Right Side as the Target Domain
9.2.2 Right Side as the Source Domain, and Left Side as the Target Domain
9.3 Results and Analysis
9.3.1 Experimental Results and Analysis of Transfer Learning Model
9.4 Experimental Results and Analysis of Transfer Learning Based on Small Samples
9.5 Conclusion
References
10 Hyperspectral Remote Sensing Inversion of Mineral Abundance Based on Sparse Unmixing Method
10.1 Introduction
10.2 Methods
10.2.1 LMM
10.2.2 SUnSAL
10.2.3 SUnSAL-TV
10.2.4 MUA
10.3 Experimental Results and Discussion
10.3.1 Spectral Library
10.3.2 Simulation Datasets
10.3.3 Real Datasets
10.3.4 Evaluation Criterion
10.3.5 Experimental Analysis
10.4 Conclusion
References
11 Concluding Remarks
References

Weitao Chen Xianju Li Xuwen Qin Lizhe Wang

Remote Sensing Intelligent Interpretation for Geology From Perspective of Geological Exploration


Weitao Chen, School of Computer Science, China University of Geosciences, Wuhan, China

Xianju Li, School of Computer Science, China University of Geosciences, Wuhan, China

Xuwen Qin, China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing, China

Lizhe Wang, School of Computer Science, China University of Geosciences, Wuhan, China

ISBN 978-981-99-8996-6
ISBN 978-981-99-8997-3 (eBook)
https://doi.org/10.1007/978-981-99-8997-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Paper in this product is recyclable.

Preface

Geological background identification through remote sensing plays a crucial role in understanding the Earth’s composition and structure. The significance of accurate geological background mapping transcends mere academic interest, as it holds profound implications for various sectors, including mineral exploration, environmental management, and infrastructure development. Despite the longstanding use of ground-based and remote sensing techniques in this field, persistent challenges such as time-intensive processes, susceptibility to human error, and inadequate automation capacity have underscored the need for advanced methodologies, based on next-generation artificial intelligence technologies, that can improve efficiency and precision.

To fill the gap at the intersection of geological remote sensing and next-generation artificial intelligence technologies, this book constructs multi-type geological remote sensing datasets for the multi-level research system of “pixel-based classification → scene classification → semantic segmentation” and focuses on intelligent interpretation of geological remote sensing, especially using prior knowledge and transfer learning. The details are as follows: (1) geological remote sensing dataset construction for multi-level tasks; (2) lithological classification based on large-scale pixel neighborhood and transfer learning; (3) lithological scene classification based on multi-view data; (4) lithology semantic segmentation based on deep learning method; (5) lithology intelligent segmentation based on multi-source data; (6) prior knowledge-based intelligent model for lithology classification; (7) multi-view lithology remote sensing scene classification based on transfer learning; (8) lithological scene classification based on model migration and fine-tuning strategy; and (9) hyperspectral remote sensing inversion of mineral abundance based on sparse unmixing method.
Chapter 1 was written by Weitao Chen and Ruizhen Wang, with the assistance of Xianju Li and Yue Zhou. Chapter 2 was written by Weitao Chen, Xianju Li, Xuwen Qin, and Lizhe Wang, with the assistance of Yue Zhou, Hao Zhou, Zhiyuan Sui, Fasen Li, and Kang Hu. Chapter 3 was written by Xianju Li and Xuwen Qin, with the assistance of Hao Zhou and Shuqi Fan. Chapter 4 was written by Weitao Chen and Xianju Li, with the assistance of Hao Zhou and Shuqi Fan. Chapter 5 was written by Xianju Li and Lizhe Wang, with the assistance of Zhiyuan Sui and Ruizhen Wang. Chapter 6 was written by Weitao Chen, Xuwen Qin, and Fasen Li, with the assistance of Yue Zhou. Chapter 7 was written by Xianju Li, with the assistance of Kang Hu, Yilong Li, and Zhiyuan Sui. Chapter 8 was written by Weitao Chen and Xianju Li, with the assistance of Hao Zhou and Shuqi Fan. Chapter 9 was written by Xianju Li, with the assistance of Kang Hu, Yilong Li, and Zhiyuan Sui. Chapter 10 was written by Taowei Wang and Xuwen Qin, with the assistance of Ruizhen Wang. Chapter 11 was written by Weitao Chen and Xianju Li, with the assistance of Ruizhen Wang. The experiments of the book were designed by Weitao Chen, Xianju Li, Xuwen Qin, and Lizhe Wang and completed by Hao Zhou, Zhiyuan Sui, Fasen Li, Kang Hu, and Taowei Wang. The work of the whole book was completed by Weitao Chen, Xuwen Qin, Lizhe Wang, and Xianju Li, with the assistance of Ruizhen Wang.

This book was jointly supported by the Fundamental Research Funds for the Natural Science Foundation of China (No. U21A2013, U1803117, 42071430), the Key Research and Development Program of Hubei Province, China (No. 2021BID009), the Natural Resources Research Project of China’s Hubei Province (No. ZRZY2021KJO4), the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB2022ZR02), and the Fundamental Research Funds for the Central Universities.

The book is intended for senior undergraduate, postgraduate, and Ph.D. students who are interested in geology, remote sensing, and artificial intelligence. It can also be used as a reference book for researchers, practitioners, and policymakers to guide research on key fundamental geological issues, mineral exploration, and management.

Wuhan, China

Weitao Chen Xianju Li Xuwen Qin Lizhe Wang


Chapter 1

Geological Remote Sensing: An Overview

Abstract Geological remote sensing is an important component of modern geological surveys. This chapter offers a comprehensive overview of the principles and applications of remote sensing techniques in geological research and highlights recent breakthroughs in advanced payloads on platforms, intelligent data processing methodologies, and applications of geological remote sensing. Furthermore, it underscores the challenges associated with data interpretation, data limitations, and operational constraints, and indicates future research directions for tackling these issues.

1.1 Description of Geological Remote Sensing

1.1.1 Concept and Principle of Geological Remote Sensing

Geological remote sensing is an interdisciplinary field that uses remote sensing technology to study the geological characteristics of the Earth (Bishop et al., 2018; Gupta, 2017). Using a variety of platforms and instruments, including satellites, aerial remote sensing, ground surveying instruments, and geological survey equipment, it obtains information about the Earth’s surface and subsurface that helps scientists understand geological structures, rock types, and mineral resources.

The key principle of geological remote sensing is based on the electromagnetic spectrum, which encompasses a range of frequencies of electromagnetic radiation, including visible light, infrared, and microwave radiation (Bedell et al., 2009). Different geological materials interact with specific wavelengths in unique ways, so geological features can be identified and differentiated by their spectral signatures. With this information, geologists then use image processing techniques, including image enhancement, classification, and feature extraction, to extract valuable information about the Earth’s surface and to identify geological structures, minerals, and other relevant features.
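As a rough illustration of how spectral signatures can separate materials, the following Python sketch compares a pixel spectrum with reference spectra using the spectral angle, a widely used similarity measure. It is not a method taken from this book. The four-band reflectance values and the names `material_a` and `material_b` are hypothetical placeholders chosen only to make the example run.

```python
import math


def spectral_angle(a, b):
    """Return the angle (radians) between two spectra treated as vectors.

    A small angle means the spectra have a similar shape, regardless of
    overall brightness. Zero-valued spectra are not handled here.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def best_match(pixel, library):
    """Return the name of the library spectrum with the smallest angle."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Hypothetical reflectance values in four bands (illustration only).
REFERENCE_SPECTRA = {
    "material_a": [0.10, 0.15, 0.40, 0.55],
    "material_b": [0.30, 0.32, 0.31, 0.29],
}

pixel = [0.12, 0.17, 0.44, 0.60]
print(best_match(pixel, REFERENCE_SPECTRA))
```

Because the angle ignores vector length, a pixel that is simply brighter or darker than a reference spectrum can still match it, which is one reason this kind of measure is often preferred over a plain Euclidean distance when illumination varies.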

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_1


1.1.2 Key Remote Sensing Techniques for Geological Research

Remote sensing techniques used in geological studies mainly include multispectral, hyperspectral, radar, thermal infrared and LiDAR remote sensing. Their principles are described as follows:

(1) Multispectral remote sensing uses sensors to capture surface reflection or radiation data in multiple wavelength ranges. Different rocks and minerals exhibit different spectral characteristics at different wavelengths. For example, iron ore has specific spectral signatures in the visible and infrared ranges that can be used to detect iron mineral deposits (Van der Meer et al., 2012).
(2) Hyperspectral remote sensing is a further development of multispectral remote sensing that provides more spectral channels and finer spectral resolution. This allows scientists to identify different minerals and rock types more accurately (Van der Meer et al., 2012).
(3) Radar remote sensing uses microwave radiation, which can penetrate clouds and vegetation, to obtain more information on the surface and subsurface. This is useful for studying geological formations and subsurface geological features. Radar remote sensing can also detect surface deformation and is used to monitor seismic activity and geological disasters (Schmullius & Evans, 1997).
(4) Thermal infrared remote sensing detects the temperature variations associated with different materials and processes and thereby identifies thermal anomalies, such as hydrothermal activity and volcanic eruptions, which are essential for geological research and hazard assessment (Ninomiya & Fu, 2019).
(5) Light detection and ranging (LiDAR) technology uses laser pulses to measure distances to the ground surface, generating highly accurate 3D representations of the terrain and surface features (Burton et al., 2011). It can be used to create high-resolution digital elevation models (DEMs), analyze surface morphology, and identify geological structures (Chen et al., 2015).
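
To make the step from LiDAR points to a DEM concrete, the sketch below averages point elevations into square grid cells. It is a deliberately minimal stand-in: operational DEM generation usually also separates ground from non-ground returns and interpolates empty cells, neither of which is done here. The function `grid_points_to_dem`, the cell size and the sample points are assumptions made for this example.

```python
from collections import defaultdict


def grid_points_to_dem(points, cell_size):
    """Average point elevations into square cells.

    points: iterable of (x, y, z) tuples in the same linear unit.
    Returns a dict mapping (column, row) cell indices to mean elevation.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += z
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical (x, y, z) returns in metres, made up for illustration only.
points = [(0.2, 0.3, 10.0), (0.8, 0.6, 12.0), (1.4, 0.1, 20.0), (1.9, 1.7, 30.0)]
dem = grid_points_to_dem(points, cell_size=1.0)

print(dem)
```

Cells that receive no points are simply absent from the result, which keeps the gap visible instead of hiding it behind an invented elevation.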

1.1.3 Key Features for the Interpretation of Geological Remote Sensing

Geological remote sensing relies mainly on extracting features that represent geological information, including the composition, structure, and dynamics of the earth’s surface and subsurface. The key features that can be extracted using remote sensing include:

(1) DEM features: Radar remote sensing can provide accurate topographic data, which are essential for creating DEMs and representing terrain characteristics. This information is a key feature for identifying landforms, geological structures, and surface processes (Drury, 1986).


(2) Spectral features: Spectral features include the specific reflectance patterns of different minerals or rock types, which can be detected through multispectral or hyperspectral remote sensing (Hecker et al., 2019).
(3) Point cloud features: Point cloud data can provide a detailed representation of the 3D structure of the earth’s surface, aiding the identification of geological features such as cliffs, valleys, and other terrain elements. They can be used to study the finer details of landforms and geological structures, giving a much more comprehensive understanding of the earth’s surface (Dewez et al., 2016).
(4) Temperature anomalies: Variations in temperature may indicate the presence of certain geological phenomena, such as hydrothermal activity or subsurface structures (van der Meer et al., 2014).
(5) Radar imaging features: Radar backscatter signals can reveal surface roughness, subsurface structures, and material properties, including those of features beneath the earth’s surface such as bedrock and sediment layers (Dierking, 1999).
(6) Time-series variation information: Time-series variation information can represent ground surface deformation, making it possible to monitor ground subsidence, tectonic movements, and other geological processes (Zhou et al., 2016).
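
As a small illustration of time-series variation information, the sketch below fits a straight line to a displacement series at a single ground point and reports its slope as a deformation rate. An ordinary least-squares fit is only one simple way to summarize such a series; the function `linear_rate` and the sample values are hypothetical.

```python
def linear_rate(times, values):
    """Least-squares slope of values against times (value units per time unit)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical vertical displacement (mm) of one ground point,
# with time in years since the first acquisition.
years = [0.0, 0.5, 1.0, 1.5, 2.0]
displacement_mm = [0.0, -5.0, -10.0, -15.0, -20.0]

rate = linear_rate(years, displacement_mm)
print(rate)  # negative values mean the point is moving downward
```

A single rate hides acceleration and seasonal behaviour, so in practice it would be read together with the full series.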

1.1.4 Technical System of Remote Sensing Technology for Geological Research

The technical system of remote sensing for lithological research (Prost, 2013) includes the following components:

(1) Image processing: Preprocessing techniques such as radiometric, geometric, and atmospheric correction, image fusion, and filtering improve the quality and interpretability of remote sensing images and support more accurate lithological analysis.
(2) Image interpretation: Remote sensing geological interpretation is the process of extracting geological information from remote sensing images. It mainly covers lithological, stratigraphic, and tectonic interpretation. Approaches include visual interpretation and human–computer interactive interpretation, among others. Remote sensing geological interpretation has progressed from manual to semi-automatic interpretation and is moving toward fully intelligent interpretation. Intelligent interpretation of geological remote sensing images refers to the use of artificial intelligence to interpret the geological information in remote sensing images automatically.
(3) Image analysis: Experts can use spectral signature analysis and advanced classification algorithms to identify and map lithological units, mineral compositions and geological structures from their characteristic spectral signatures, allowing different rock types and geological formations to be classified.
(4) Visualization: By analyzing remote sensing data with geospatial software tools, geological information can be visualized and its spatial distribution analyzed for further applications.

1.1.5 Applications of Geological Remote Sensing

With remote sensing technology, geologists can gather the comprehensive and accurate data needed for informed decisions in fields such as mineral exploration, environmental management, and natural hazard mitigation (Gupta, 2017). Remote sensing-based prospecting operates at two scales: first, at the scale of the metallogenic zone, satellite remote sensing data, especially the hyperspectral satellite data acquired in recent years, are used to optimize the prospecting area; second, within key exploration areas, aerial or unmanned aerial remote sensing data are used to define prospecting targets. Basic research should be brought to bear on innovating geological remote sensing interpretation theory and on macro-scale interpretation techniques and methods, with a focus on the distinctive spatial, temporal and spectral responses of multimodal remote sensing to the gathering and dispersal of land masses. Thematic geological remote sensing mapping should serve as the starting point for deepening the understanding of the overall framework of key metallogenic zones and key oil- and gas-bearing basins. It is urgent to solve the key basic geological problems of rocks, stratigraphy and tectonics that arise during the gathering and dispersal of land masses, and to elucidate key elements of large-scale mineralization such as deep material sources, geological processes and fluid transport.
The main application areas are described in more detail as follows:

(1) Identification of geological structures: Remote sensing can help geologists identify geological features such as rock formations, faults, folds, and sedimentary structures and map their spatial distribution efficiently over large areas.
(2) Mineral exploration and mapping: Different minerals exhibit characteristic spectral signatures that can be detected using remote sensing techniques. These signatures can be used to locate potential mineral resources and to plan exploration activities more efficiently.
(3) Terrain analysis and mapping: Remote sensing data such as DEMs and LiDAR point clouds provide detailed topographic information that is crucial for terrain analysis and mapping. Geologists can create accurate, high-resolution maps of the earth’s surface and study terrain characteristics and landforms in detail.
(4) Environmental impact assessment: Time-series remote sensing can be used to monitor and assess the impact of human activities such as mining on the local environment by tracking changes in vegetation cover and soil composition over time.
(5) Natural hazard assessment: Similarly, time-series remote sensing can be used to analyze changes in surface features and deformation patterns in order to assess and monitor natural hazards, including landslides, earthquakes, and volcanic eruptions. With this information, geologists can better understand the processes leading to these hazards and develop early warning systems to mitigate their potential impact.
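
Tracking changes in vegetation cover between two acquisition dates can be sketched with a vegetation index. The example below uses the normalized difference vegetation index (NDVI), a widely used index that this chapter does not itself name, and flags pixels whose NDVI dropped by more than a threshold. The threshold of 0.2, the reflectance pairs and the function names are assumptions chosen for illustration.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def vegetation_loss_mask(before, after, threshold=0.2):
    """Flag pixels whose NDVI fell by more than `threshold` between two dates.

    before, after: lists of (red, nir) reflectance pairs for the same pixels.
    """
    if len(before) != len(after):
        raise ValueError("both dates must cover the same pixels")
    return [ndvi(*b) - ndvi(*a) > threshold for b, a in zip(before, after)]


# Hypothetical (red, near-infrared) reflectance pairs for three pixels.
before = [(0.05, 0.45), (0.06, 0.40), (0.20, 0.25)]
after = [(0.05, 0.44), (0.25, 0.28), (0.21, 0.25)]

mask = vegetation_loss_mask(before, after)
print(mask)
```

Only the second pixel changes from a strongly vegetated response to a bare-surface response, so it is the only one flagged. A real assessment would also need the two images to be co-registered and radiometrically comparable.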

1.2 Research Advance in Geological Remote Sensing

Geological remote sensing is a young discipline that has developed rapidly alongside remote sensing technology and the continuous expansion of the applications and service scope of geological work. Its development has gone through roughly three stages. In the first stage, from 1962 to 1972, Skylab of the United States used airborne infrared scanners and cameras to conduct remote sensing-based geological surveys (Oleary & Pohn, 1975). In this stage, sensor types were limited, imaging technology was crude, and spatial resolution was low, so the data could not yet meet the requirements of large-scale, fine geological mapping (Morrison, 1974). In the second stage, from 1972 to the early 1980s, the United States launched the Landsat 1–3 satellites. Geologists used image data from Landsat, Skylab and three levels of aerial platforms to explore and evaluate the performance of remote sensing data in geological studies, and carried out basic theoretical research on geological remote sensing, field measurement of spectral curves, and applied research (Rowan, 1975). Driven by military and civilian demand, over more than 60 years of development the world’s major space nations have continuously improved their top-level planning and observation systems for high-resolution remote sensing, accumulated talent and technology, and developed key payload technologies such as cameras, sensors, imaging, and refrigeration (Cracknell, 2018). These breakthroughs have made the industrial chain of remote sensing satellite development, data acquisition and application relatively complete, and they allow the combined benefits of multi-satellite networking, multi-orbit coordination, fast revisiting, and high positioning accuracy to be realized. The third stage is the rapid development of satellite remote sensing technology. Over the past 40 years, breakthroughs have been made in key technologies such as high-resolution large-scale visible light, infrared, hyperspectral, SAR, high-precision dynamic imaging, and high-orbit imaging, forming satellite remote sensing systems for land, meteorology, and ocean (Chen et al., 2022; Li, 2021).


In the future, geological remote sensing is expected to focus on high-precision advanced payloads and on ultra-light, ultra-small, on-orbit intelligent approaches, building periodic monitoring capabilities for the globe and for key areas.

1.2.1 Payload: Developing from Single Optical Sensor to New Multimodal Sensors

In the 1970s, the early U.S. Landsat satellites were mainly low-resolution optical satellites. Over the past 60 years, civilian scientific research and military applications have greatly promoted the development of remote sensing satellite payloads, which have evolved from single optical payloads to high-resolution optical remote sensing payloads (including very high-resolution visible light camera technology, very high-resolution optical remote sensing imaging technology, combined visible light/infrared/hyperspectral remote sensing technology, and high-orbit high-resolution satellite remote sensing technology) and high-resolution microwave remote sensing payloads (including multi-polarization phased-array SAR antenna technology and high-precision multi-mode SAR imaging technology) (Won et al., 2014; Yao, 2017). Current payloads are characterized by the coexistence of high, medium and low resolutions, multi-element detection, and stable data quality (Zhou et al., 2017). Typical satellites have reached operational, serial development and show continuous improvement in spatial resolution, shorter revisit cycles, and stronger satellite networking capabilities. Payloads are becoming more lightweight, economical, efficient, and flexible, making it possible to “see” all types of ground objects clearly (Sun et al., 2021).

1.2.2 Data Processing: Developing from “Rough” to “Precision”

For large optical and microwave satellites, perturbation-induced vibration and in-motion imaging affect resolution and imaging quality, which determine whether the satellite can “see clearly”, while common technologies such as high-precision positioning and the high-speed processing and transmission of massive data determine whether targets can be “accurately located” and the data “used well”. Many countries are therefore promoting the development of satellite data processing technology (Ma et al., 2015). For example, Sentinel-1, launched by the European Space Agency, is a constellation of two satellites, Sentinel-1A and Sentinel-1B. They share the same orbital plane with an orbital phase difference of 180° and can provide independent operational capability for continuous radar mapping of the earth. China has also produced engineering achievements in high-speed image data processing, storage and transmission technology, high-precision target positioning technology, in-motion imaging technology for large remote sensing satellites, and perturbation and vibration suppression technology for high-resolution remote sensing satellites (Chen et al., 2023; Lü et al., 2011; Xu et al., 2020; Zhou et al., 2022), largely completing the development of data processing from “rough” to “precision”.

1.2.3 Geological Application: Developing from Human–Computer Interaction to All-Factor Intelligent Interpretation

In the 1970s, because of its low spatial resolution, the U.S. Landsat series was mainly used for remote sensing surveys of regional geological backgrounds, such as lithology and fault structures, and for mine environment monitoring. The applications were relatively simple, and interpretation was mainly visual (Zhao et al., 2022). In recent years, the rapid development of space-terrestrial integrated earth observation systems and intelligent computing technology has created opportunities for the advancement and transformation of remote sensing technology. After the digital signal processing era of the 1960s to the 1980s, built around statistical mathematical models, and the quantitative remote sensing era from the 1990s to the present, marked by the physical quantification of remote sensing information, remote sensing information technology is now gradually entering an era of intelligent interpretation characterized by the intelligent analysis of big data. The elements interpreted have also expanded from relatively few thematic elements to all geological elements, such as surface water, rock and soil, geological hazards, soil erosion, and the mining environment (Brandmeier & Chen, 2019; Ghamisi et al., 2020; Isikdogan et al., 2017; Mohan et al., 2021; Wan et al., 2022).

Intelligent information extraction is an inevitable requirement of current remote sensing data processing. Multi-source, heterogeneous, massive remote sensing data place higher demands not only on computing power but also on the data processing methods themselves (Ghamisi et al., 2019). Traditional processing methods cannot deliver the accuracy and efficiency needed for large volumes of remote sensing data, and intelligent information extraction methods have emerged to meet the growing demand for geological remote sensing. In recent years, deep learning network models have been continuously improved and breakthroughs have been made in geological information extraction (Shirmard et al., 2022a, 2022b). In many tasks, such as the identification of rock and soil bodies and geological disasters, reported accuracy has exceeded that of manual identification (Brandmeier & Chen, 2019; Mohan et al., 2021).


1.3 Intelligent Interpretation of Geological Remote Sensing

Intelligent interpretation of geological remote sensing applies advanced data analysis techniques, including machine learning and deep learning image processing algorithms, to extract geological information from remote sensing data automatically and to make predictions. It classifies, identifies and extracts the geological information in remote sensing images, thereby automating their interpretation. Various supervised and unsupervised machine learning methods, including support vector machines (SVM), random forest (RF), and artificial neural networks (ANN), have been applied to the classification and interpretation of geological features; they enable automated feature extraction and pattern recognition for identifying specific geological structures and materials (Cracknell & Reading, 2014; Radford et al., 2018; Yu et al., 2012). In recent years, deep learning techniques, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been increasingly used in geological remote sensing (dos Santos et al., 2021; Sang et al., 2020). Deep learning methods can process large-scale datasets effectively and extract high-level features and patterns, achieving better identification and classification performance.

1.3.1 Technical System of Intelligent Interpretation of Remote Sensing for Geology

The technical system of intelligent remote sensing interpretation for geology includes the following procedures (Han et al., 2023):

(1) Data acquisition and preprocessing: First, remote sensing data relevant to the geological research target are obtained from various sources, such as satellite optical, hyperspectral and SAR images, aerial photographs, or LiDAR data, and are processed to correct for atmospheric effects, sensor noise, and other artifacts. Calibration is essential to ensure the accuracy and consistency of the data and enables reliable analysis of geological information.
(2) Image enhancement and multi-source data fusion: Image enhancement techniques improve the quality of the remote sensing data and can reduce the impact of limited samples. Fusing multiple data sources, such as optical and radar imagery, exploits the complementary strengths of different sensors and gives a more comprehensive understanding of geological features.


(3) Manual analysis of remote sensing images: Manual analysis of remote sensing images generates more labels, which help the intelligent models identify specific mineralogical compositions from surface features.
(4) Intelligent modelling: This step employs machine learning and deep learning algorithms to automate the interpretation of complex geological data, extracting features from remote sensing images and training models to recognize patterns and classify lithologies. Feature extraction involves identifying and extracting features such as rock types, landforms, vegetation, and water bodies. It can draw on spectral, spatial and temporal characteristics, such as spectral properties, texture, shape and the spatial relationships of pixels. Commonly used algorithms include SVM, RF, and CNN. These algorithms classify geological features from the extracted features and continue to learn, improving the accuracy of interpretation.
(5) Spatial analysis and GIS integration: Integrating remote sensing data with geographic information systems (GIS) supports better spatial analysis. GIS tools can be used to analyze spatial relationships between geological units, create vector geological maps, and generate valuable input for decision-making by intelligent models.
(6) Spectral unmixing and mineral mapping: If hyperspectral remote sensing is used, spectral unmixing techniques can identify and map different minerals from their spectral signatures and extract detailed information about mineral compositions and distributions, supporting accurate geological mapping and mineral resource assessment.
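
The idea behind spectral unmixing in step (6) can be shown with the simplest possible case: a pixel modelled as a linear mixture of exactly two endmember spectra. The sketch below assumes that linear mixing model and solves for the fraction of the first endmember by least squares. Real hyperspectral unmixing involves many endmembers and further constraints; the function `unmix_two_endmembers` and the spectra are hypothetical.

```python
def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Least-squares fraction f of endmember_a, assuming the pixel equals
    f * endmember_a + (1 - f) * endmember_b; the result is clipped to [0, 1]."""
    if not (len(pixel) == len(endmember_a) == len(endmember_b)):
        raise ValueError("all spectra must have the same number of bands")
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must differ")
    numer = sum((p - b) * d for p, b, d in zip(pixel, endmember_b, diff))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical three-band endmember spectra, made up for illustration only.
mineral_a = [0.10, 0.30, 0.50]
mineral_b = [0.40, 0.40, 0.20]

# Build a synthetic pixel that is 25 % mineral_a and 75 % mineral_b.
pixel = [0.25 * a + 0.75 * b for a, b in zip(mineral_a, mineral_b)]

fraction_a = unmix_two_endmembers(pixel, mineral_a, mineral_b)
print(round(fraction_a, 3))
```

Clipping keeps the fraction physically meaningful when noise pushes the unconstrained solution slightly outside the range from 0 to 1.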

1.3.2 Advantages of Intelligent Interpretation of Geological Remote Sensing

Intelligent interpretation of geological remote sensing data offers several advantages (Shirmard et al., 2022b). First, it enables the automatic analysis of complex geological features and reduces the need for time-consuming manual work. By using machine learning and deep learning algorithms, geologists can process large volumes of remote sensing data efficiently, which improves the efficiency of data analysis and interpretation. In addition, intelligent interpretation techniques can significantly enhance the precision of geological feature extraction and classification by identifying subtle patterns and relationships in geological remote sensing data that may not be discernible through traditional interpretation methods. This gives geologists valuable information for making informed decisions in geological applications, including mineral exploration, environmental management, and geological hazard assessment.

1.4 Challenges and Future Directions of Geological Remote Sensing

1.4.1 Gaps and Challenges in Current State of Geological Remote Sensing

Although intelligent interpretation of geological remote sensing has advanced geological research significantly, several challenges remain. They include:

(1) Lack of ground truth data: Ground surveys are still a necessary step in geological studies for validating remote sensing-based research. However, acquiring and validating ground truth data can be difficult and expensive, particularly in remote or inaccessible areas, such as regions with high elevation, steep slopes, or dense vegetation cover (Fan et al., 2017).
(2) Uncertainty in interpretation: Uncertainty affects the accuracy and reliability of the derived geological information. Contributing factors include limitations in data resolution, variability in spectral signatures, and the complexity of geological features. Atmospheric interference, sensor noise, and data acquisition errors add further uncertainty. Moreover, the subjective nature of image interpretation and the reliance on expert judgment introduce additional uncertainty, particularly for complex geological structures and heterogeneous terrain (Bond, 2015; Zhu & Zhuang, 2010).
(3) Data integration and fusion: The integration of image data from different remote sensing platforms and sensors, and their fusion with other geospatial data sources, have not yet been achieved satisfactorily because of spatial and temporal misalignment and the heterogeneity of the data sources (Dalla Mura et al., 2015).
(4) Insufficient accuracy: The accuracy of advanced data processing techniques, such as artificial intelligence, machine learning, and deep learning algorithms, for automated feature extraction and geological interpretation still does not meet the demands of practical applications. More robust and efficient algorithms tailored to geological remote sensing are essential for improving data analysis and interpretation.
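
Validation against ground truth, and the question of whether accuracy is sufficient, are usually expressed with simple agreement statistics. The sketch below computes overall accuracy and Cohen's kappa for a set of field-checked sites. These two measures are a conventional choice and not one prescribed by this chapter; the class labels, the sample data and the function `accuracy_and_kappa` are hypothetical.

```python
from collections import Counter


def accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's kappa) for paired class labels."""
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("need equally long, non-empty label lists")
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance from the class frequencies alone.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined with a single class
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical field-checked lithology labels and model predictions.
reference = ["granite", "granite", "basalt", "basalt", "shale", "shale"]
predicted = ["granite", "basalt", "basalt", "basalt", "shale", "granite"]

overall_accuracy, kappa = accuracy_and_kappa(reference, predicted)
print(round(overall_accuracy, 3), round(kappa, 3))
```

Kappa discounts the agreement that would arise by chance, which matters when a few lithological classes dominate the study area. With few ground truth sites, both figures carry wide uncertainty.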


1.4.2 Research Directions of Geological Remote Sensing

To improve geological remote sensing and address the challenges above, future directions and opportunities may include the following:

(1) Developing advanced methods that improve data integration and fusion is important. Designing and combining different network layers and structures makes it possible to integrate diverse data features more effectively (Schmitt & Zhu, 2016).
(2) Transfer learning is a machine learning technique that uses knowledge gained in one domain to solve a related problem in another. It is a potential approach when the target dataset is small or when training a deep learning model from scratch is not feasible because of computational constraints or limited data availability (Pires de Lima & Marfurt, 2019). Transfer learning can help to improve the performance of a model on a new task.
(3) Prior knowledge, such as key geological concepts, features, processes, modelling parameters and existing geological maps, may assist the effective interpretation and analysis of remote sensing data. Integrating prior knowledge into the interpretation process through knowledge graphs or GIS tools can support accurate conclusions about geological formations and geological processes (Sun et al., 2022).

References

Bedell, R., Crósta, A. P., & Grunsky, E. (2009). Remote sensing and spectral geology. Society of Economic Geologists.
Bishop, C., Rivard, B., de Souza Filho, C., & Van Der Meer, F. (2018). Geological remote sensing. International Journal of Applied Earth Observation and Geoinformation, 64, 267–274.
Bond, C. E. (2015). Uncertainty in structural interpretation: Lessons to be learnt. Journal of Structural Geology, 74, 185–200.
Brandmeier, M., & Chen, Y. (2019). Lithological classification using multi-sensor data and convolutional neural networks. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 42, 55–59.
Burton, D., Dunlap, D. B., Wood, L., & Flaig, P. P. (2011). Lidar intensity as a remote sensor of rock properties. Journal of Sedimentary Research, 81(5), 339–347.
Chen, J., Di, X., Xu, R., Qi, H., Cong, L., Zhang, K., Xing, Z., He, X., Lei, W., & Zhang, S. (2023). A remote sensing data transmission strategy based on the combination of satellite-ground link and GEO relay under dynamic topology. Future Generation Computer Systems, 145, 337–353.
Chen, N., Ni, N., Kapp, P., Chen, J., Xiao, A., & Li, H. (2015). Structural analysis of the Hero Range in the Qaidam Basin, northwestern China, using integrated UAV, terrestrial LiDAR, Landsat 8, and 3-D seismic data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(9), 4581–4591.
Chen, W., Li, X., & Wang, L. (2022). Multimodal remote sensing science and technology. In Remote sensing intelligent interpretation for mine geological environment: From land use and land cover perspective (pp. 7–32). Springer Nature Singapore.


Cracknell, A. P. (2018). The development of remote sensing in the last 40 years. International Journal of Remote Sensing, 39(23), 8387–8427.
Cracknell, M. J., & Reading, A. M. (2014). Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Computers and Geosciences, 63, 22–33.
Dalla Mura, M., Prasad, S., Pacifici, F., Gamba, P., Chanussot, J., & Benediktsson, J. A. (2015). Challenges and opportunities of multimodality and data fusion in remote sensing. Proceedings of the IEEE, 103(9), 1585–1601.
Dewez, T. J., Girardeau-Montaut, D., Allanic, C., & Rohmer, J. (2016). Facets: A CloudCompare plugin to extract geological planes from unstructured 3D point clouds. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 41, 799–804.
Dierking, W. (1999). Quantitative roughness characterization of geological surfaces and implications for radar signature analysis. IEEE Transactions on Geoscience and Remote Sensing, 37(5), 2397–2412.
dos Santos, D. T., Roisenberg, M., & dos Santos Nascimento, M. (2021). Deep recurrent neural networks approach to sedimentary facies classification using well logs. IEEE Geoscience and Remote Sensing Letters, 19, 1–5.
Drury, S. A. (1986). Remote sensing of geological structure in temperate agricultural terrains. Geological Magazine, 123(2), 113–121.
Fan, X., Xu, Q., Scaringi, G., Dai, L., Li, W., Dong, X., Zhu, X., Pei, X., Dai, K., & Havenith, H. B. (2017). Failure mechanism and kinematics of the deadly June 24th 2017 Xinmo landslide, Maoxian, Sichuan, China. Landslides, 14, 2129–2146.
Ghamisi, P., Li, H., Jackisch, R., Rasti, B., & Gloaguen, R. (2020). Remote sensing and deep learning for sustainable mining. In IGARSS 2020–2020 IEEE international geoscience and remote sensing symposium (pp. 3739–3742).
Ghamisi, P., Rasti, B., Yokoya, N., Wang, Q., Hofle, B., Bruzzone, L., Bovolo, F., Chi, M., Anders, K., Gloaguen, R., & Atkinson, P. M. (2019). Multisource and multitemporal data fusion in remote sensing: A comprehensive review of the state of the art. IEEE Geoscience and Remote Sensing Magazine, 7(1), 6–39.
Gupta, R. P. (2017). Remote sensing geology. Springer.
Han, W., Zhang, X., Wang, Y., Wang, L., Huang, X., Li, J., Wang, S., Chen, W., Li, X., Feng, R., & Fan, R. (2023). A survey of machine learning and deep learning in remote sensing of geological environment: Challenges, advances, and opportunities. ISPRS Journal of Photogrammetry and Remote Sensing, 202, 87–113.
Hecker, C., van Ruitenbeek, F. J., van der Werff, H. M., Bakker, W. H., Hewson, R. D., & van der Meer, F. D. (2019). Spectral absorption feature analysis for finding ore: A tutorial on using the method in geological remote sensing. IEEE Geoscience and Remote Sensing Magazine, 7(2), 51–71.
Isikdogan, F., Bovik, A. C., & Passalacqua, P. (2017). Surface water mapping by deep learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(11), 4909–4918.
Li, J. (2021). Fundamentals of satellite remote sensing technology. Satellite Remote Sensing Technologies, 1–26.
Lü, X., Cheng, C., Gong, J., & Guan, L. (2011). Review of data storage and management technologies for massive remote sensing data. Science China Technological Sciences, 54, 3220–3232.
Ma, Y., Wu, H., Wang, L., Huang, B., Ranjan, R., Zomaya, A., & Jie, W. (2015). Remote sensing big data computing: Challenges and opportunities. Future Generation Computer Systems, 51, 47–60.
Mohan, A., Singh, A. K., Kumar, B., & Dwivedi, R. (2021). Review on remote sensing methods for landslide detection using machine and deep learning. Transactions on Emerging Telecommunications Technologies, 32(7), e3998.


Morrison, R. B. (1974). Applications of Skylab EREP photographs to mapping of landforms and environmental geology in the Great Plains and Midwest. USGS.
Ninomiya, Y., & Fu, B. (2019). Thermal infrared multispectral remote sensing of lithology and mineralogy based on spectral properties of materials. Ore Geology Reviews, 108, 54–72.
Oleary, D. W., & Pohn, H. A. (1975). A photogeologic comparison of Skylab and LANDSAT images of southwestern Nevada and southeastern California. USGS.
Pires de Lima, R., & Marfurt, K. (2019). Convolutional neural network for remote-sensing scene classification: Transfer learning analysis. Remote Sensing, 12(1), 86.
Prost, G. L. (2013). Remote sensing for geoscientists: Image analysis and integration. CRC Press.
Radford, D. D., Cracknell, M. J., Roach, M. J., & Cumming, G. V. (2018). Geological mapping in Western Tasmania using radar and random forests. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(9), 3075–3087.
Rowan, L. C. (1975). Application of satellites to geologic exploration: Recent experiments in two spectral regions, the visible and near-infrared and the thermal-infrared, confirm the value of satellite observations for geologic exploration of the earth. American Scientist, 63(4), 393–403.
Sang, X., Xue, L., Ran, X., Li, X., Liu, J., & Liu, Z. (2020). Intelligent high-resolution geological mapping based on SLIC-CNN. ISPRS International Journal of Geo-Information, 9(2), 99.
Schmitt, M., & Zhu, X. (2016). Data fusion and remote sensing: An ever-growing relationship. IEEE Geoscience and Remote Sensing Magazine, 4(4), 6–23.
Schmullius, C. C., & Evans, D. L. (1997). Review article synthetic aperture radar (SAR) frequency and polarization requirements for applications in ecology, geology, hydrology, and oceanography: A tabular status quo after SIR-C/X-SAR. International Journal of Remote Sensing, 18(13), 2713–2722.
Shirmard, H., Farahbakhsh, E., Heidari, E., Beiranvand Pour, A., Pradhan, B., Müller, D., & Chandra, R. (2022a). A comparative study of convolutional neural networks and conventional machine learning models for lithological mapping using remote sensing data. Remote Sensing, 14(4), 819.
Shirmard, H., Farahbakhsh, E., Müller, R. D., & Chandra, R. (2022b). A review of machine learning in processing remote sensing data for mineral exploration. Remote Sensing of Environment, 268, 112750.
Sun, S., Dustdar, S., Ranjan, R., Morgan, G., Dong, Y., & Wang, L. (2022). Remote sensing image interpretation with semantic graph-based methods: A survey. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15, 4544–4558.
Sun, Z., Wang, X., Wang, Z., Yang, L., Xie, Y., & Huang, Y. (2021). UAVs as remote sensing platforms in plant ecology: Review of applications and challenges. Journal of Plant Ecology, 14(6), 1003–1023.
Van der Meer, F. D., Van der Werff, H. M., Van Ruitenbeek, F. J., Hecker, C. A., Bakker, W. H., Noomen, M. F., Van Der Meijde, M., Carranza, E. J. M., De Smeth, J. B., & Woldai, T. (2012). Multi-and hyperspectral geologic remote sensing: A review. International Journal of Applied Earth Observation and Geoinformation, 14(1), 112–128.
Van der Meer, F., Hecker, C., van Ruitenbeek, F., van der Werff, H., de Wijkerslooth, C., & Wechsler, C. (2014). Geologic remote sensing for geothermal exploration: A review. International Journal of Applied Earth Observation and Geoinformation, 33, 255–269.
Wan, L., Li, S., Chen, Y., He, Z., & Shi, Y. (2022). Application of deep learning in land use classification for soil erosion using remote sensing. Frontiers in Earth Science, 10, 849531.
Won, Y.-J., Yoon, J.-C., & Kim, J.-H. (2014). SAR payload technology for next generation satellite. Aerospace Engineering and Technology, 13(2), 131–141.
Xu, C., Du, X., Yan, Z., & Fan, X. (2020). ScienceEarth: A big data platform for remote sensing data processing. Remote Sensing, 12(4), 607.
Yao, Y. (2017). Analysis of platform and payload integrated design technology for optical remote sensing satellites. In 3rd international symposium of space optical instruments and applications (pp. 9–22). Springer International Publishing.


Yu, L., Porwal, A., Holden, E.-J., & Dentith, M. C. (2012). Towards automatic lithological classification from remote sensing data using support vector machines. Computers and Geosciences, 45, 229–239. Zhao, L., Niu, R., Li, B., Chen, T., & Wang, Y. (2022). Application of improved instance segmentation algorithm based on VoVNet-v2 in open-pit mines remote sensing pre-survey. Remote Sensing, 14(11), 2626. Zhou, Q.-B., Yu, Q.-Y., Jia, L., Wu, W.-B., & Tang, H.-J. (2017). Perspective of Chinese GF-1 high-resolution satellite data in agricultural remote sensing monitoring. Journal of Integrative Agriculture, 16(2), 242–251. Zhou, W., Li, S., Zhou, Z., & Chang, X. (2016). Remote sensing of deformation of a high concretefaced rockfill dam using InSAR: A study of the Shuibuya dam, China. Remote Sensing, 8(3), 255. Zhou, W., Zhang, K., Wu, S., Tan, S., & Wu, Z. (2022). Distributed cooperative control for vibration suppression of a flexible satellite. Aerospace Science and Technology, 128, 107750. Zhu, L., & Zhuang, Z. (2010). Framework system and research flow of uncertainty in 3D geological structure models. Mining Science and Technology (China), 20(2), 306–311.

Chapter 2

Geological Remote Sensing Dataset Construction for Multi-level Tasks

Abstract This chapter introduces the lithology datasets prepared for the intelligent interpretation methods presented in the following chapters. For each dataset, the study area, the remote sensing data sources, the preprocessing approach, and an overview of the resulting dataset are described.

2.1 Pixel-Level Dataset

2.1.1 Study Area

The study area (Fig. 2.1) is located on the northeastern border of China and is largely covered by vegetation. The remote sensing imagery covers a ground area of more than 441.87 km². Manual interpretation identified five types of rock mass (metamorphic rock, sedimentary rock, Quaternary system, granite and extrusive rock) as well as water. The remote sensing geological interpretation data were generated and saved as vector files in ArcMap 10.7.

2.1.2 Data Sources

ZY-3 image data were used in this study. ZY-3 is China's first civilian stereoscopic surveying and mapping satellite. It is used to produce 1:50,000 surveying and mapping products of national basic geographic information, to revise and update maps at scales of 1:25,000 and larger, and to carry out land resources survey and monitoring. ZY-3 carries four cameras: a nadir-viewing (face-facing) camera with 2.1 m ground resolution, a forward-looking and a rear-looking camera with 3.5 m resolution each, and a 5.8 m multispectral camera (Li, 2012). The parameters of ZY-3 are listed in Table 2.1.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_2

Fig. 2.1 Remote sensing image data of the study area

Table 2.1 Technical indicators of data from ZY-3 satellite

| Payload | Band number | Spatial resolution (m) | Spectrum range (μm) | Width (km) | Revisit time (days) |
|---|---|---|---|---|---|
| Face facing camera | – | 2.1 | 0.50–0.80 | 52 | 5 |
| Forward looking camera | – | 3.5 | 0.50–0.80 | 52 | 5 |
| Rear-view camera | – | 3.5 | 0.50–0.80 | 52 | 5 |
| Orthoscopic multispectral camera | 1 | 5.8 | 0.45–0.52 | 52 | 5 |
| | 2 | | 0.52–0.59 | | |
| | 3 | | 0.63–0.69 | | |
| | 4 | | 0.77–0.89 | | |

2.1.3 Data Preprocessing

The original imagery, with a pixel size of 2.1 m, was compared experimentally with versions resampled to 5 m and 10 m. The 10 m resolution reduces the data volume while preserving comparably good experimental accuracy. Therefore, ArcMap 10.7 was first used to resample the original 2.1 m image to 10 m. Since the original image contains four bands, B (blue), G (green), R (red) and N (near infrared), it was exported as true-color (R, G, B bands) and false-color (N, G, R bands) image data.

2.1.4 Dataset Construction

The expert manual interpretation of the research area was used as the label data for the final evaluation of the algorithm (Fig. 2.2), so the accuracy of the samples directly affects the accuracy of the interpretation results produced by the deep learning model. Remote sensing lithology classification is the process of assigning each pixel in an image to a specific class, that is, designating category labels for pixels with several attributes and converting the other attributes of the pixel units into rock type attributes (Du et al., 2012). According to the characteristics of remote sensing lithology classification, classification methods can be divided into three categories: classification based on spectral features, classification based on spatial features, and composite classification based on multi-source information (Zhang et al., 2015).

Classification based on spatial features needs an appropriate scale to reflect spatial aggregation characteristics. A larger pixel neighborhood was therefore used here, and the training dataset was built from patches of 48 × 48 pixels. Using the remote sensing geological interpretation data, several sample points were selected at random within each type of target area. Each sample point was taken as the center of a patch, a 48 × 48 scene was cut from the original image data and saved as a "jpg" picture, and patches of the same type were saved in the same folder. Points close to the edge of the image produce patches that cross the image boundary, and these patches were discarded. Finally, a dataset containing 27,046 samples in each of the true-color and false-color band combinations was constructed (Table 2.2).
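The patch extraction described above can be illustrated with a minimal sketch. It assumes the image is already loaded as a NumPy array and that sample points are given as (row, column) pixel positions; the function name `crop_patches` and the synthetic image are hypothetical stand-ins, not the tooling used in this study.

```python
import numpy as np

def crop_patches(image, centers, size=48):
    """Cut size x size patches centred on (row, col) points.

    Patches that would cross the image boundary are skipped, mirroring
    the elimination of edge samples described in the text.
    """
    half = size // 2
    height, width = image.shape[:2]
    patches = []
    for row, col in centers:
        top, left = row - half, col - half
        bottom, right = top + size, left + size
        if top < 0 or left < 0 or bottom > height or right > width:
            continue  # patch crosses the boundary, discard it
        patches.append(image[top:bottom, left:right])
    return patches

# Synthetic stand-in for a three-band image (assumption for illustration only).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(200, 300, 3), dtype=np.uint8)
centers = [(24, 24), (10, 150), (100, 290), (176, 276), (100, 150)]
patches = crop_patches(image, centers)
```

In this toy case two of the five points lie too close to the edge, so three patches remain. Saving each patch into a per-class folder would follow the same loop.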

Fig. 2.2 Interpreted data for the study area (legend: metamorphic, granite, sedimentary, Quaternary system, water, extrusive)

Table 2.2 The number of samples of various classes of rock masses in the dataset

| Rock mass class | Metamorphic rock | Sedimentary rock | Quaternary system | Granite | Extrusive rock | Water | Total |
|---|---|---|---|---|---|---|---|
| Quantity (images) | 4708 | 4584 | 4531 | 4145 | 4369 | 4709 | 27,046 |

2.2 Scene-Level Dataset

In this study, we mainly discuss the application prospects of multi-view remote sensing data in lithology classification. Data from the ZY-3 and GF-3 satellites were selected for multi-view data fusion research and used to construct a multi-view lithology remote sensing dataset. When scene images are cropped from remote sensing imagery, tiles in boundary regions mix the feature information of several lithology types and would otherwise have to be eliminated. To address this problem, we propose a method for specifying the label of each cropped scene image, which improves the quality of the dataset.

2.2.1 Study Area

In order to fully explore the application prospects of multi-view remote sensing satellite data in lithology classification, this study selected the Yabuli-Weihe town area in Heilongjiang Province as the study area. The study area is located in the central and southern part of Heilongjiang Province, southeast of Shangzhi City, with a total area of 546 km². It lies in hilly and low mountainous terrain, with widespread vegetation and few lithologic outcrops.

2.2.2 Data Sources

The types of remote sensing data available in the study area include optical remote sensing satellite data, SAR data and DEM data. The optical data and DEM data were obtained from the ZiYuan-3 (ZY-3) satellite, the SAR data were obtained from the Gaofen-3 (GF-3) satellite, and the corresponding preprocessing operations were carried out for each.

ZY-3 image data were used to extract and generate the DEM data. Remote sensing image data show a certain degree of geometric distortion as a result of the imaging process. The extracted DEM was therefore used as the reference for selecting ground control points and orthorectifying the multi-view remote sensing data sources.

The low spatial resolution of the ZY-3 multispectral image makes ground objects less clear, so it is harder to distinguish lithology types accurately during classification. Image data fusion is therefore needed to improve the spatial resolution of the spectral data and to obtain imagery rich in both spatial and spectral information. The Gram-Schmidt method was used for image fusion. After fusion, the ZY-3 data consist of four bands, of which only the true-color band combination was used in this study.

Radar data first need to be converted into Single Look Complex (SLC) or Multi Look Complex (MLC) form, after which the polarization data can be extracted by preprocessing. In this study, the SLC data of the GF-3 satellite were processed by data conversion, multi-look processing, polarization matrix conversion (C2 matrix), refined Lee polarimetric filtering, H/A/α polarization decomposition, geocoding and other operations to obtain HH-HV dual-polarization data, and the image was resampled to 2 m (Li et al., 2022).

The original full-frame optical data of the study area are shown in Fig. 2.3, the DEM data in Fig. 2.4, and the SAR data in Fig. 2.5. Detailed information on the preprocessed data of the different views is given in Table 2.3.

2.2.3 Data Preprocessing

Due to inherent issues with the imagery, the frame of the obtained SAR image was incomplete. So that all views cover the same geographical area, the optical remote sensing data and the DEM data were cropped to obtain the final multi-view data for the research area. The cropped extent of each view is marked by a red box in Figs. 2.3, 2.4 and 2.5.

Fig. 2.3 Optical data in the study area, true color (the red rectangular box is the cropped area)

Fig. 2.4 DEM data of the study area (the red rectangle box is the cropped area)

Fig. 2.5 SAR data in the study area (the red rectangular box is the cropped area)

Table 2.3 Details of different view data in the study area

| Data | Number of columns | Number of rows | Pixel size (m) | Number of bands |
|---|---|---|---|---|
| ZY-3 optical data | 14,952 | 9138 | 2 | 3 |
| ZY-3 DEM data | 7476 | 4569 | 4 | 1 |
| GF-3 SAR data | 14,952 | 9138 | 2 | 2 |

2.2.4 Dataset Construction

Using the geographic coordinate information of the data, image blocks at the same geographic position were cropped from the different images so that the scene images of the different views are spatially aligned. The lithology type of each scene image was specified from the manually interpreted lithology labels, yielding a lithology scene classification dataset of multi-view remote sensing data that fills a gap in current lithology datasets.

For the multi-view remote sensing data, the large-format images were clipped from left to right and from top to bottom with non-overlapping windows, as shown in Fig. 2.6. To ensure that the cropped scene images are spatially aligned, the crop size was set to 128 × 128 for optical and SAR data and 64 × 64 for DEM data. With pixel sizes of 2 m and 4 m respectively (Table 2.3), both window sizes cover the same 256 m × 256 m ground area. At the same time, the label vector data were cropped over the corresponding geographical region and used to assign labels to the scene images.

Fig. 2.6 The way to crop the lithology scene map

The proportion of lithology categories in the cropped label vector file was used to specify the label of each scene image. Usually, the lithology category that occupies the largest area in the scene image is assigned as its label, because that region accounts for the largest number of pixels. However, this approach has a problem. If a scene image is cropped at the intersection of several categories, the lithology categories may be distributed fairly evenly within it. If the label is then assigned by the largest-area principle, the model extracts features of several lithologies during training, yet all of these features are attributed to the single labeled lithology. As shown in Fig. 2.7, loose deposits, diorite rocks and granitic rocks are all present in the scene image. Loose deposits account for the largest area, but granitic rocks also account for a relatively large area. If the label is set directly to loose deposits, the feature information of the granitic rocks will interfere with model learning.

To address the problem that boundary regions would otherwise have to be eliminated because of this interference from multiple lithology types, we propose a method for specifying the label of a scene image after cropping. The method calculates the area of each lithology in the scene image and checks whether the difference between the largest area and the second-largest area exceeds half of the total area. When the class with the largest area meets this condition, it is designated as the label of the scene image. Otherwise the scene image is removed. The condition is given by Eq. 2.1, where $S_{\text{Max}}$ is the largest class area, $S_{\text{Second largest}}$ is the second-largest class area, and $S_{\text{The total}}$ is the total area of the scene image.

$$S_{\text{Max}} - S_{\text{Second largest}} > 0.5 \times S_{\text{The total}} \tag{2.1}$$
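The labeling rule of Eq. 2.1 can be sketched in a few lines. The sketch assumes the label vector data have been rasterized to an integer class map, so that areas are approximated by pixel counts; the study itself works with vector areas, and the function names `assign_scene_label` and `tile_labels` are hypothetical.

```python
import numpy as np

def assign_scene_label(label_tile, margin=0.5):
    """Apply Eq. 2.1 to one tile of class ids.

    Returns the dominant class if (largest - second largest) exceeds
    margin * total area, otherwise None (the tile is removed).
    """
    classes, counts = np.unique(label_tile, return_counts=True)
    order = np.argsort(counts)[::-1]  # class indices, largest area first
    largest = counts[order[0]]
    second = counts[order[1]] if len(counts) > 1 else 0
    if largest - second > margin * label_tile.size:
        return int(classes[order[0]])
    return None

def tile_labels(label_raster, tile=128):
    """Scan non-overlapping windows left to right, top to bottom."""
    rows, cols = label_raster.shape
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            yield r, c, assign_scene_label(label_raster[r:r + tile, c:c + tile])

# Toy label raster with three tiles (class ids are arbitrary placeholders).
raster = np.empty((128, 384), dtype=np.int32)
raster[:, :128] = 1       # tile 0: a single class
raster[:, 128:192] = 2    # tile 1: two classes, half each
raster[:, 192:256] = 3
raster[:, 256:359] = 2    # tile 2: about 80% class 2
raster[:, 359:] = 3
results = list(tile_labels(raster))
```

The evenly mixed middle tile is rejected, while the single-class tile and the tile dominated by class 2 keep their labels. For the DEM view the same windows would be read at 64 × 64 pixels, given its 4 m pixel size.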

Fig. 2.7 The cropped label vector image (legend: diorite, loose accumulation, granite)

In the study area, 7761 scene images were generated when labels were specified by the largest area alone. With the method proposed here, scene images that do not meet the condition are deleted, leaving 7051 usable scene images. The data volume is somewhat reduced, but the quality of the dataset improves.

Data imbalance means that some categories have fewer samples than others. If such data are used directly, the model becomes biased toward the characteristics of the large categories, resulting in low classification performance on categories with fewer samples (Li et al., 2019b). At present, data-based methods and algorithm-based methods are the main ways of dealing with data imbalance. The former obtain more balanced datasets through sampling techniques, such as downsampling classes with more data and augmenting classes with less data. The two sampling techniques can also be combined to compensate for the shortcomings of either one. The latter modify the existing learning algorithm to suit the classification of imbalanced data. This requires a deep understanding of the learning algorithm and the research goal, a clear understanding of why misclassification occurs with imbalanced data, and a corresponding modification and design of the algorithm model. In this chapter, we use the first, data-level approach to deal with imbalanced samples.

The distribution of lithology shows regional aggregation, and one or two kinds of lithology may be concentrated over a large area. The region contains a large number of loose deposit and granitic rock scene images, while the number of slate and schist scene images is insufficient, so the dataset is clearly imbalanced. Random sampling was used to downsample the classes with many samples by extracting only part of their data. Horizontal flipping, vertical flipping, rotation by an angle and similar operations were applied to the original samples of the small classes for data augmentation. The data were then divided into training, validation and test sets at a ratio of 6:2:2. After these operations, the details of the multi-view dataset built in the research area are shown in Table 2.4.
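The downsampling and 6:2:2 split described above can be sketched as follows. This is only an illustration under stated assumptions: samples are represented by integer indices, the helper names `downsample` and `split_622` are hypothetical, and the rounding convention is a guess, so small classes may differ by a sample or so from the counts reported in Table 2.4. Augmentation is not covered here.

```python
import numpy as np

def downsample(indices, keep, rng):
    """Randomly keep at most `keep` samples of an over-represented class."""
    indices = np.asarray(indices)
    if len(indices) <= keep:
        return indices
    return rng.choice(indices, size=keep, replace=False)

def split_622(indices, rng):
    """Shuffle and split into training, validation and test parts (6:2:2)."""
    shuffled = rng.permutation(indices)
    n_train = round(0.6 * len(shuffled))  # rounding rule is an assumption
    n_val = round(0.2 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

rng = np.random.default_rng(0)
# 1742 loose-deposit scene images reduced to 1000, as in Table 2.4.
kept = downsample(np.arange(1742), 1000, rng)
train, val, test = split_622(kept, rng)
```

For this class the sketch gives 600, 200 and 200 samples, matching the loose deposits row of Table 2.4. Splitting each class separately keeps the class proportions the same in all three subsets.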

Table 2.4 Statistics of the dataset in the study area

| Category | Raw data | Training set: primitive | Training set: data enhancement | Training set: total | Validation set | Test set | Total | Mode of operation |
|---|---|---|---|---|---|---|---|---|
| Loose deposits | 1742 | 600 | 0 | 600 | 200 | 200 | 1000 | Choose 1000 at random |
| Slate | 81 | 48 | 240 | 288 | 17 | 16 | 321 | The training set is enhanced by a factor of 5 |
| Schist | 224 | 134 | 268 | 402 | 45 | 45 | 492 | The training set is enhanced by a factor of 2 |
| Granitic rock | 4247 | 900 | 0 | 900 | 300 | 300 | 1500 | Choose 1500 at random |
| Diorite rock | 757 | 454 | 0 | 454 | 152 | 151 | 757 | – |
| Total | 7051 | | | | | | 4070 | |

2.3 Semantic Segmentation-Level Dataset

2.3.1 Study Area

The study area is located in Suiyang Town, Heilongjiang Province, China (Fig. 2.8). The forest coverage rate of the whole town is higher than 60%. The average altitude is about 800 m and the annual average temperature is 14.7 °C. The frost-free period is 271–279 days, and the annual average precipitation is about 1260 mm. Suiyang Town is abundant in various resources including non-coal mines, coal mines, forests, tourism, and water.

2.3.2 Data Sources

The remote sensing image used in this study was obtained by the ZY-3 satellite. The image of the study area is 1929 × 1792 pixels and has 4 bands. The original remote sensing image was preprocessed to obtain an image with a pixel resolution of 2.1 m. Then, in order to aggregate more information, the image was resampled to a resolution of 10 m. The lithologies and other substrates in the image were divided into 9 categories, whose names and percentages are shown in Table 2.5.

Fig. 2.8 Remote sensing image of the study area

Table 2.5 Categories and percentage of lithologies and other substrates in dataset

| Classification | Percentage (%) |
|---|---|
| Andesite | 21.82 |
| Slate | 4.30 |
| Quaternary loose deposits | 14.62 |
| Granite_diorite | 48.89 |
| Conglomerate | 1.37 |
| Rhyolite | 0.86 |
| Schist | 3.49 |
| Water | 0.75 |
| Basalt | 3.90 |

2.3.3 Data Preprocessing

We cropped the image and its label using two strategies, partially overlapping cropping and random cropping, and obtained 6 datasets by combining different bands during cropping. The cropping center points are shown in Figs. 2.9 and 2.10. The 6 datasets are named [3,2,1] part, [4,3,2] part, [1,2,3,4] part, [3,2,1] random, [4,3,2] random and [1,2,3,4] random. The crop size is 256 × 256 in this study. With the partially overlapping strategy, a total of 56 images of 256 × 256 pixels were obtained, with the different band combinations applied during cropping. Figure 2.11 illustrates the cropping. For random cropping, we randomly selected 1000 points on the image. Because the 256 × 256 crops at some of these points contained no-data areas, those points were discarded, leaving 771 images.
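One way to lay out partially overlapping crops is sketched below. The exact layout used in this study is not stated, so the even spacing and the helper name `tile_starts` are assumptions; the sketch only shows that a 1929 × 1792 image can be covered by 56 windows of 256 × 256 pixels when windows overlap along the longer side. Random cropping is not covered here.

```python
import math

def tile_starts(length, tile):
    """Evenly spaced start offsets so that windows of width `tile`
    cover [0, length) with the fewest windows; neighbours overlap
    whenever `length` is not a multiple of `tile`."""
    if length < tile:
        raise ValueError("image side is shorter than the tile size")
    n = math.ceil(length / tile)
    if n == 1:
        return [0]
    return [round(i * (length - tile) / (n - 1)) for i in range(n)]

# Image sides taken from the text; which side is rows or columns is not assumed.
starts_long = tile_starts(1929, 256)   # 8 windows, overlapping
starts_short = tile_starts(1792, 256)  # 7 windows, exactly adjacent
windows = [(a, b) for a in starts_long for b in starts_short]
```

Under these assumptions the longer side needs eight windows spaced 239 pixels apart, the shorter side needs seven adjacent windows, and the product is 56 crops.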

Fig. 2.9 Partially overlapped cropping center point


Fig. 2.10 Randomly cropping center point

Fig. 2.11 Partial overlapping cropping block diagram

2.3.4 Dataset Construction

The 56 images obtained by partially overlapping cropping were divided into a training set (45 images) and a test set (11 images) at a ratio of 8:2. Data augmentation was then performed on the 45 training images and their labels to obtain 270 images and 270 labels. The augmentation operations were left-right mirroring, up-down mirroring, and rotation by 180°, 90° and 270°, so each original image yields six images including itself. The 270 images were then split into the final training set (216 images) and validation set (54 images), again at a ratio of 8:2. The resulting dataset has a training, validation and test ratio of approximately 19:5:1.

For the random cropping strategy, 771 images of 256 × 256 pixels were obtained after cropping. This dataset was split into a training set (617 images) and a test set (154 images) at a ratio of 8:2. The training set was then divided into the final training set (494 images) and validation set (123 images) at a ratio of 8:2. The resulting dataset therefore has a training, validation and test ratio of 16:4:5. The distribution of the training, validation and test sets is given in Table 2.6, and Fig. 2.12 shows example images and their labels from the constructed dataset.

Table 2.6 The constructed dataset

| Dataset | Train set | Validation set | Test set | Total |
|---|---|---|---|---|
| Partially cropped dataset | 45 | 0 | 11 | 56 |
| | 80% | 0% | 20% | 100% |
| Partially cropped dataset after data augmentation | 216 | 54 | 11 | 281 |
| | 76% | 20% | 4% | 100% |
| Randomly cropped dataset | 494 | 123 | 154 | 771 |
| | 64% | 16% | 20% | 100% |
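The six-fold augmentation of the partially cropped training set can be sketched with NumPy array operations. The function name `augment_pair` and the small random tiles are hypothetical stand-ins; the point is that the image and its label must receive the same geometric transform.

```python
import numpy as np

def augment_pair(image, label):
    """Return the original pair plus five transformed copies:
    left-right mirror, up-down mirror, and rotations by 90, 180, 270 degrees."""
    ops = [
        lambda a: a,               # original
        np.fliplr,                 # mirror left and right
        np.flipud,                 # mirror up and down
        lambda a: np.rot90(a, 1),  # rotate 90 degrees
        lambda a: np.rot90(a, 2),  # rotate 180 degrees
        lambda a: np.rot90(a, 3),  # rotate 270 degrees
    ]
    return [(op(image), op(label)) for op in ops]

rng = np.random.default_rng(0)
# 45 small stand-in tiles (16 x 16 rather than 256 x 256 to keep the demo light).
pairs = [
    (rng.integers(0, 256, (16, 16, 3), dtype=np.uint8),
     rng.integers(0, 9, (16, 16), dtype=np.uint8))
    for _ in range(45)
]
augmented = [item for image, label in pairs for item in augment_pair(image, label)]
```

With 45 input pairs the list holds 270 pairs, which is the count reported above before the 8:2 split into 216 training and 54 validation images.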


Fig. 2.12 The display of dataset

2.4 Semantic Segmentation-Level Dataset Based on Multisource Data

2.4.1 Study Area

The study area is located in the Jingerquan area of Hami, Xinjiang, adjacent to Jiuquan City, Gansu Province. The area has complex terrain, an arid climate, scarce water, and mostly saline-alkaline soil. It is rich in mineral resources, including coal, copper, iron ore, oil, and natural gas. The map covers the coordinate range 95° 45'–96° 00' E, 42° 30'–42° 40' N. After cropping, the total area is about 380 km². The geographic location is shown in Fig. 2.13.

Fig. 2.13 Geographical location map of the study area

Tectonically, the study area lies in the eastern section of the Kanggur fault zone in the North Tianshan orogenic belt. Magmatic activity is frequent in the area and volcanic rock strata are widespread. Strong regional metamorphism, migmatization and ductile shearing have produced mixed transition zones between rock masses. These zones make lithology difficult to identify and complicate the task of semantic segmentation of lithology.

Table 2.7 Main parameters of the GF-6 satellite

| Camera | 2 m panchromatic/8 m multispectral high-resolution camera (PMS) | 16 m multispectral medium-resolution wide-format camera (WFV) |
|---|---|---|
| Spectral range | 0.45–0.90 μm | 0.40–0.89 μm |
| Spatial resolution | Panchromatic: 2 m; multispectral: 8 m | Multispectral: 16 m |
| Width | ≥ 90 km | ≥ 800 km |

2.4.2 Data Sources

2.4.2.1 Gaofen-6 (GF-6) Satellite

The Gaofen-6 (GF-6) satellite was successfully launched on June 2, 2018. It is a low-orbit optical remote sensing satellite characterized by high resolution, high quality and efficient imaging. It provides 2 m panchromatic/8 m multispectral observation (swath width 90 km) and 16 m multispectral observation (swath width 800 km). The satellite is equipped with a domestic 8-band CMOS (Complementary Metal Oxide Semiconductor) detector, which can effectively capture the distinctive "red edge" spectral characteristics of crops. It provides remote sensing data in support of agriculture, forestry resource monitoring and investigation, disaster prevention and reduction, and similar applications, and is of strategic significance for China's ecological civilization construction and rural revitalization strategy. Specific parameters are shown in Table 2.7. The Gaofen-6 data selected in this article were acquired in 2019. After radiometric calibration, orthorectification, and image fusion, the image contains four bands (red, green, blue, and near-infrared), as shown in Fig. 2.14.

2.4.2.2 Gaofen-3 (GF-3) Satellite

The Gaofen-3 satellite was successfully launched on August 10, 2016. It is the world's highest-resolution C-band multi-polarization synthetic aperture radar satellite, with a best resolution of 1 m. It has 12 imaging modes, which can be expanded to more than 20. It can acquire single-polarization, dual-polarization and full-polarization data, and provides remote sensing imagery with resolutions of 1–500 m and swath widths of 10–650 km. The image data offer high spatial and temporal resolution and can be acquired in all weather conditions and at any time of day. The satellite mainly serves the ocean, disaster reduction, water conservancy, meteorology and other sectors, and has been widely used in marine survey and development, land resources and environment monitoring, and disaster risk prediction and assessment. The main parameters are shown in Table 2.8. The Gaofen-3 data used in this article are single-polarization data acquired in UFS imaging mode. The polarization is HH and the image is an L1A-level product.

The image obtained after complex data conversion, multi-look processing, adaptive filtering and geocoding is shown in Fig. 2.15.

Fig. 2.14 Gaofen-6 processed image of the study area (red-green-blue band combination)

Table 2.8 Main parameters of GF-3 satellite

| Imaging mode | Imaging mode name | Resolution/m | Width/km | Polarization mode |
|---|---|---|---|---|
| Sliding spotlight (SL) | | 1 | 10 | Single polarization |
| Strip imaging mode | UFS | 3 | 30 | Single polarization |
| | FSI | 5 | 50 | Dual polarization |
| | FSII | 10 | 100 | Dual polarization |
| | SS | 25 | 130 | Dual polarization |
| | QPSI | 8 | 30 | Full polarization |
| | QPSII | 25 | 40 | Full polarization |
| Scan imaging mode | NSC | 50 | 300 | Dual polarization |
| | WSC | 100 | 500 | Dual polarization |
| | GLO | 500 | 650 | Dual polarization |
| Wave imaging mode (WAV) | | 10 | 5 | Full polarization |
| Extended angle of incidence (EXT) | Low angle of incidence | 25 | 130 | Dual polarization |
| | High angle of incidence | 25 | 80 | Dual polarization |

Fig. 2.15 HH polarization processed image of Gaofen-3 in the study area

2.4.2.3 Advanced Land Observing Satellite (ALOS)

The Advanced Land Observing Satellite (ALOS) was launched in 2006 and stopped operating in 2011, but about 6.5 million archived scenes remain available to users (Tadono et al., 2016). The global high-resolution land observation data obtained by ALOS are widely used in surveying and mapping, disaster monitoring, terrestrial environment monitoring and resource exploration. ALOS carries three sensors. Only the panchromatic remote sensing stereo mapping instrument (PRISM), which is relevant to this article, is introduced here. Its data are mainly used to build high-precision digital elevation models. The main parameters are shown in Table 2.9. The DEM data used in this article were produced from ALOS images (Tadono et al., 2016) and have a spatial resolution of 30 m. The cropped image of the study area is shown in Fig. 2.16.

Table 2.9 Main parameters of ALOS-PRISM

| Parameter | Value |
|---|---|
| Sensor | Panchromatic remote sensing stereo mapping instrument |
| Band | 1 band, panchromatic |
| Wavelength | 0.52–0.77 μm |
| Observing scope | Sub-satellite point imaging, front-view imaging, rear-view imaging |
| Spatial resolution | 2.5 m (sub-satellite point imaging) |
| Width | 70 km (sub-satellite point imaging), 35 km (joint imaging) |

Fig. 2.16 DEM image of the study area

2.4.3 Dataset Construction

2.4.3.1 Optical Data Processing Flow

Radiometric calibration: The original digital numbers are converted into top-of-atmosphere reflectance, which removes the error introduced by the sensor itself and yields the true reflectance value. In ENVI, this can be achieved by Radiometric Correction → Radiometric Calibration → Apply FLAASH Settings.

Orthorectification: Several control points are selected on the image, and the DEM data acquired for the corresponding extent are used to correct the image point displacement caused by ground elevation, the sensor and other factors. Orthorectification is a type of geometric correction. In ENVI, this can be achieved by Toolbox → Geometric Correction → Orthorectification → RPC Orthorectification Workflow.

Image fusion: Panchromatic images have higher spatial resolution, but their spectral information is limited and they cannot display the color of ground objects. Multispectral images are rich in spectral information but have low spatial resolution. Image fusion combines the multispectral and panchromatic images so that the result retains the color of the multispectral image and the texture of the panchromatic image. In ENVI, this can be achieved by Toolbox → Image Sharpening → Gram-Schmidt Pan Sharpening.

2.4.3.2 SAR Data Processing Flow

The data obtained in this article are HH-polarization Single Look Complex (SLC) data, which require the following processing before use. The processing tool used is the domestic PIE-SAR software.

Complex data conversion: SAR data record the radar echo information of one band and are generally stored as complex numbers. Complex data conversion turns the SAR data into intensity or amplitude data that reflect the backscatter strength of ground objects, from which ground object information can be extracted.

Multi-look processing: To improve the visual quality of SAR images, the SLC data are averaged in the azimuth and range directions to obtain multi-looked intensity data. This effectively suppresses speckle noise and improves the radiometric resolution, but it reduces the spatial resolution.

Adaptive filtering: The SAR imaging system has a special imaging mechanism and inherent coherent speckle noise (multiplicative), which seriously affects the interpretation of SAR images. Adaptive filtering is a spatial filtering approach that slides a window over the image and computes the filtered value of the pixel at the center of each window in turn. Typical algorithms include the Lee, Frost and Kuan algorithms. This study uses enhanced Lee filtering to reduce speckle noise while preserving the texture information of the SAR images.

Geocoding: The existing DEM data for the study area are used to perform geocoded terrain correction (GTC) on the SAR images. This ensures spatial alignment when building the dataset (conversion from the slant-range coordinate system to the geographic coordinate system) and also corrects geometric distortion in the SAR images.
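The first three steps can be illustrated with a small NumPy sketch. It is not the PIE-SAR workflow: the filter shown is the basic Lee filter rather than the enhanced Lee filter used in this study, the function names are hypothetical, and the input is synthetic speckle rather than GF-3 data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def slc_to_intensity(slc):
    """Complex data conversion: intensity is the squared magnitude."""
    return np.abs(slc) ** 2

def multilook(intensity, az_looks, rg_looks):
    """Average blocks of az_looks x rg_looks pixels (azimuth x range)."""
    rows = (intensity.shape[0] // az_looks) * az_looks
    cols = (intensity.shape[1] // rg_looks) * rg_looks
    blocks = intensity[:rows, :cols].reshape(
        rows // az_looks, az_looks, cols // rg_looks, rg_looks)
    return blocks.mean(axis=(1, 3))

def lee_filter(intensity, window=5, looks=1.0):
    """Basic Lee filter for multiplicative speckle (sketch, not enhanced Lee)."""
    pad = window // 2
    padded = np.pad(intensity, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    noise = 1.0 / looks  # speckle variance coefficient for L-look intensity
    # Estimated variance of the underlying, speckle-free signal.
    signal_var = np.maximum((local_var - noise * local_mean ** 2) / (1.0 + noise), 0.0)
    denom = signal_var + noise * local_mean ** 2
    weight = np.divide(signal_var, denom, out=np.zeros_like(denom), where=denom > 0)
    return local_mean + weight * (intensity - local_mean)

rng = np.random.default_rng(0)
# Synthetic fully developed speckle over a homogeneous area (assumption).
slc = (rng.standard_normal((100, 60)) + 1j * rng.standard_normal((100, 60))) / np.sqrt(2)
intensity = slc_to_intensity(slc)
ml = multilook(intensity, az_looks=2, rg_looks=3)
filtered = lee_filter(ml, window=5, looks=6)
```

The multi-looked image is smaller than the input, which is the loss of spatial resolution mentioned above, and the filtered image varies less than the multi-looked one over this homogeneous test area. Geocoding needs orbit and DEM information and is left to dedicated software.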

2.4.3.3 DEM Data Processing Process

Geological rock masses generally present a large-scale continuous distribution, so the spatial distribution and occurrence of rock masses can be used to infer lithology within a certain range. This is also an effective means of identifying lithology on a large scale. Therefore, this study uses ArcGIS 10.7 to extract features such as slope, aspect, and hillshade to enrich the features available to the model. Slope extraction: Slope measures the rate of elevation change from one pixel to another in the DEM data and can be calculated by Eq. 2.2.

$$\mathrm{Slope} = \arctan\sqrt{\mathrm{Slope}_x^{2} + \mathrm{Slope}_y^{2}} \tag{2.2}$$

where Slope_x and Slope_y represent the horizontal and vertical gradients at the pixel, calculated using the Sobel operator. The slope extraction results of the study area are shown in Fig. 2.17. Slope aspect extraction: Aspect is the orientation of the terrain slope, that is, the direction in which the value changes fastest from a pixel to its neighbors. It can be calculated by Eq. 2.3.

Fig. 2.17 Slope map of the study area


Fig. 2.18 Slope aspect map of the study area

$$A = \arctan\left(\frac{\mathrm{Slope}_x}{\mathrm{Slope}_y}\right) \tag{2.3}$$

where A represents the slope aspect of the pixel, and Slope_x and Slope_y represent the horizontal and vertical gradients of the pixel calculated using the Sobel operator. The results of slope aspect extraction in the study area are shown in Fig. 2.18. Hillshade extraction: Hillshading calculates a hypothetical surface brightness value for each pixel from the DEM data, given illumination parameters such as the sun's height. Overlaying the hillshade on the original image can improve the visual effect. It can be obtained from Eq. 2.4.

$$H = 255.0 \times \big(\cos(Z)\cos(S) + \sin(Z)\sin(S)\cos(A_1 - A_2)\big) \tag{2.4}$$

If the calculation result is less than 0, it is set to 0. Here H represents the hillshade value, Z the sun's zenith angle, S the slope at a given point, A_1 the azimuth of the sun's rays, and A_2 the slope aspect at that point; all angles are in radians. The results of hillshade extraction in the study area are shown in Fig. 2.19.
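Equations 2.2 to 2.4 can be combined into a short plain-Python sketch. It is not the ArcGIS implementation, and several choices are assumptions made for illustration: the function names are hypothetical, cells are square, the Sobel sums are divided by 8 × cell size to give a rise-over-run gradient, border cells reuse the nearest valid elevation, `atan2` replaces the plain arctangent of Eq. 2.3 so that a zero vertical gradient does not cause a division by zero, and the default sun angles are arbitrary.

```python
import math


def sobel_gradients(dem, cell_size):
    """Horizontal (gx) and vertical (gy) gradients from the 3x3 Sobel kernels."""
    rows, cols = len(dem), len(dem[0])

    def z(r, c):
        # Replicate edge values so border cells have a full neighborhood.
        return dem[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    gx = [[0.0] * cols for _ in range(rows)]
    gy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            east = z(r - 1, c + 1) + 2 * z(r, c + 1) + z(r + 1, c + 1)
            west = z(r - 1, c - 1) + 2 * z(r, c - 1) + z(r + 1, c - 1)
            south = z(r + 1, c - 1) + 2 * z(r + 1, c) + z(r + 1, c + 1)
            north = z(r - 1, c - 1) + 2 * z(r - 1, c) + z(r - 1, c + 1)
            gx[r][c] = (east - west) / (8 * cell_size)
            gy[r][c] = (south - north) / (8 * cell_size)
    return gx, gy


def terrain_features(dem, cell_size, zenith_deg=45.0, azimuth_deg=315.0):
    """Return (slope, aspect, hillshade) grids; angles are in radians."""
    gx, gy = sobel_gradients(dem, cell_size)
    zen = math.radians(zenith_deg)
    az = math.radians(azimuth_deg)
    slope, aspect, shade = [], [], []
    for row_x, row_y in zip(gx, gy):
        s_row = [math.atan(math.hypot(x, y)) for x, y in zip(row_x, row_y)]  # Eq. 2.2
        a_row = [math.atan2(x, y) for x, y in zip(row_x, row_y)]             # Eq. 2.3
        h_row = [max(0.0, 255.0 * (math.cos(zen) * math.cos(s)
                                   + math.sin(zen) * math.sin(s) * math.cos(az - a)))
                 for s, a in zip(s_row, a_row)]                               # Eq. 2.4
        slope.append(s_row)
        aspect.append(a_row)
        shade.append(h_row)
    return slope, aspect, shade
```

On a perfectly flat DEM the slope is 0 everywhere, so Eq. 2.4 reduces to 255 × cos(Z), which is a convenient sanity check for any implementation.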


Fig. 2.19 Hill shade map of the study area

2.4.3.4 Lithological Classification System of the Study Area

The 1:50,000 label data used in this study come from the salty water spring (K46E009024) geological map, drawn by the Xi’an Geological Survey Center of the China Geological Survey after field measurements (Xie et al., 2022). This map was rated as an “Excellent Map” in 2021. The 1:200,000 label data, the Wutongwoziquan sheet (K4618), come from the national 1:200,000 digital geological spatial database (Li et al., 2019a), which is based on the only field-measured national 1:200,000 regional geological survey results in China. Data that have been measured, verified, and accepted in the field in compliance with the relevant technical regulations and standards help ensure the quality of the dataset and support the research in this study. The information that can be obtained from remote sensing images is limited, whereas the geological data measured in the field are classified in great detail. In addition, the classification standards for the categories are inconsistent between scales. Therefore, this study first merges the existing categories using geological knowledge and an overview of the study area. For the 1:50,000 scale, the lithological components of the study area are divided into 9 main categories according to their main components (for example, biotite plagioclase granite is assigned to granite), namely Quaternary, granite porphyry, granite, schist, hornfels, sandstone, tuff, diorite, and granulite. Table 2.10 gives the details of the 1:50,000 remote sensing lithology classification system. Similarly, for the 1:200,000 scale, the lithological elements of the study area are divided into 7 categories, namely Dananhu Formation, Quaternary, Gandun Formation, Kawabulak Group, granite, gabbro, and quartz porphyry. Table 2.11 gives the details of the 1:200,000 remote sensing lithology classification system of the study area. In addition, to verify the accuracy of dividing lithology directly from the 1:200,000 geological map, and to pave the way for the later use of prior knowledge, this study uses the overlay analysis function of ArcGIS 10.7 to overlay and compare the 1:200,000 and 1:50,000 geological maps and obtain the coincidence rate of the two scales. Because the 1:200,000 geological map groups lithologies into rock formations, a formation and a lithology are considered to correspond when the formation contains that type of rock. For example, the Dananhu Formation in the 1:200,000 geological map corresponds to the tuff in the 1:50,000 geological map. The calculation shows that approximately 42.16% of the pixels correspond, and the visualized results are shown in Fig. 2.20.
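The coincidence rate described above can be sketched as a per-pixel comparison of two rasterized label maps. This is an illustrative stand-in for the ArcGIS overlay analysis, not the workflow actually used: the function name and the `contains` mapping (coarse formation code to the set of fine lithology codes it is assumed to contain) are hypothetical, and both maps are assumed to be already aligned on the same grid.

```python
def coincidence_rate(fine_labels, coarse_labels, contains):
    """Fraction of pixels whose 1:50,000 class is contained in the 1:200,000 unit.

    fine_labels, coarse_labels: aligned 2-D lists of integer class codes.
    contains: dict mapping a coarse code to the set of fine codes that the
              coarse unit is considered to include (hypothetical input).
    """
    matched = 0
    total = 0
    for fine_row, coarse_row in zip(fine_labels, coarse_labels):
        for fine_code, coarse_code in zip(fine_row, coarse_row):
            total += 1
            if fine_code in contains.get(coarse_code, ()):
                matched += 1
    return matched / total


# Toy example using the codes of Tables 2.10 and 2.11:
# Dananhu Formation (0) contains tuff (6); granite (4) matches granite (2);
# Quaternary (1) matches Quaternary (0).
contains = {0: {6}, 4: {2}, 1: {0}}
fine = [[6, 6], [2, 0]]
coarse = [[0, 0], [4, 2]]
rate = coincidence_rate(fine, coarse, contains)
```

In the toy example three of the four pixels correspond, so `rate` is 0.75; the last pixel is Quaternary at 1:50,000 but falls in the Gandun Formation (code 2) at 1:200,000, which the mapping does not list.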

2.4.3.5 Dataset Generation

The obtained shapefile vector data are first reclassified into merged categories according to the classification system described above and then converted into GeoTIFF raster data to serve as labels, as shown in Figs. 2.21 and 2.22. Second, to facilitate patch cropping, Python code is used to crop and align the label map and the study area image so that both have a size of 10,240 × 9216 pixels. Cropping with a patch size of 256 × 256 divides the entire research area exactly into 40 rows and 36 columns, for a total of 1440 images. The cropped images retain their geographic coordinate information to achieve spatial alignment and to facilitate comparison of the interpretation results. A sample image of the dataset is shown in Fig. 2.23. In a single image, different lithologies show a high degree of similarity, but when the three types of data are compared, some similar lithologies can be roughly separated. This shows that multisource remote sensing data complement each other and that their joint use can improve interpretation accuracy. Finally, the dataset of the study area was randomly divided into a training set, a validation set, and a test set at a ratio of 6:2:2. To keep the distribution of each category uniform across the three sets and to ensure the generality of the experimental results, manual adjustments are also required during the division. Fig. 2.24 shows the distribution of the training, validation, and test sets throughout the study area. As shown in Table 2.12, the distribution of each category in the study area and in the three sets is basically the same, ensuring that the training, validation, and test data are evenly distributed throughout the study area. In addition, it can be seen that schist, granite, and the Quaternary


Table 2.10 The 1:50,000 remote sensing lithology classification system

Code | Classification | Category description
0 | Quaternary | Mainly composed of alluvial deposits, chemically deposited Glauber’s salt, rock salt, and salinized sand
1 | Granite porphyry | A hypabyssal intrusive rock, fully crystalline, with a porphyritic texture and a cryptocrystalline matrix; variegated, hard, and dense. The composition of the phenocrysts is the same as that of the matrix, and their content is generally 15–20%. It is mainly composed of quartz and alkali feldspar with a small amount of plagioclase, biotite, etc. Unlike the corresponding granite, its porphyritic texture indicates that it is hypabyssal
2 | Granite | A plutonic intrusive rock, commonly with granitic texture, porphyritic-like texture, and massive structure. It is generally gray-white or flesh-red and has a SiO2 content of more than 66%. The main mineral is quartz, followed by potassium feldspar and acidic plagioclase. The secondary minerals include biotite, hornblende, etc. Its most distinctive feature is that alkali feldspar is more abundant than plagioclase. It can be further named according to its dark minerals, such as biotite granite and hornblende granite
3 | Schist | Obtained by merging schist and gneiss, both of which are mainly composed of feldspar, quartz, mica, etc. Schist has a schistose structure, with more quartz than feldspar, mainly flaky and columnar minerals, and granular minerals arranged in a continuous direction. Gneiss has a gneissic or banded structure, is deeply metamorphosed, has more quartz than feldspar, is dominated by granular minerals, and has flaky minerals in a discontinuous directional arrangement
4 | Hornfels | Also known as horny shale, a general term for medium- to high-temperature thermal contact metamorphic rocks with fine-grained metamorphic texture and massive structure. Generally dark in color, dense, and hard. It is composed of feldspar, mica, amphibole, pyroxene, quartz, and other minerals
5 | Sandstone | Composed of sandstone, conglomerate, etc.; there are few categories, and all are sedimentary rocks. The two are distinguished mainly by the size of the clastic particles in the rock. Rock with clastic particles larger than 2 mm is called conglomerate, and its clasts are mainly rock fragments; rock with particles smaller than 2 mm is called sandstone, in which clastic particles make up more than 50% and consist mainly of quartz, feldspar, rock debris, mica, etc.
6 | Tuff | An extrusive volcaniclastic rock with tuffaceous texture and massive structure. It is composed of volcanic ash, including crystal, glass, and rock debris. Most of the debris is smaller than 2 mm, with various colors and a loose, porous appearance
7 | Diorite | An intermediate plutonic intrusive rock. It is dark gray and fully crystalline, mainly composed of plagioclase and hornblende with a small amount of pyroxene and biotite. According to the quartz content and dark mineral types, it can be further divided into quartz diorite, pyroxene diorite, etc.
8 | Granulite | Obtained by merging granulite and leptite. It has a granular crystalline texture and a massive structure. It is mainly composed of feldspar and quartz, and its dark minerals include mica, hornblende, and diopside. It can be further divided according to mineral combination and content; when the content of flaky and columnar minerals is less than 10%, it is called leptite

Table 2.11 The 1:200,000 remote sensing lithology classification system in this study

Code | Classification | Category description
0 | Dananhu Formation | Mainly composed of tuff, volcanic breccia, dacite porphyry, and fine rock
1 | Quaternary | Same as in Table 2.10
2 | Gandun Formation | Composed of epimetamorphic siliceous rock, argillaceous rock, tuffaceous sandstone, quartz sandstone, etc. In some areas, it is quartz schist, actinolite schist, quartz phyllite, etc.
3 | Kawabulak Group | Mainly composed of epimetamorphic carbonate rock and siliceous carbonate rock. The lower part is dolomite and siliceous marble, and the middle part is quartz schist and quartzite. The upper part is mainly marble and quartz schist, intercalated with calcareous sandy conglomerate and quartz sandstone
4 | Granite | Same as in Table 2.10
5 | Gabbro | Basic plutonic intrusive rock, gray-black, with gabbroic texture and massive structure. Mainly composed of clinopyroxene and basic plagioclase, with a small amount of amphibole, biotite, etc.
6 | Quartz porphyry | Hypabyssal rock with porphyritic texture and massive structure, often red or gray. The phenocrysts are mainly quartz with intact crystal form, together with orthoclase. The matrix is felsic, fine-grained, and cryptocrystalline

account for relatively large proportions of the study area, reaching 24.24%, 23.44%, and 19.24% respectively, while the proportions of sandstone, granite porphyry, and hornfels are much lower, at only 2.99%, 2.65%, and 1.28% respectively. Moreover, the ratio of the most abundant category to the least abundant category in the study area is about 19:1, which shows that the categories across the entire study area are highly imbalanced.
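The patch arithmetic and the 6:2:2 split described in this section can be checked with a short sketch. The function names are hypothetical and the split is purely random; the manual adjustment that the study applies to balance the category distribution is not reproduced here.

```python
import random


def tile_origins(n_rows, n_cols, tile=256):
    """Top-left (row, col) of every non-overlapping tile that fits in the image."""
    return [(r, c)
            for r in range(0, n_rows - tile + 1, tile)
            for c in range(0, n_cols - tile + 1, tile)]


def split_622(items, seed=0):
    """Randomly split items into training/validation/test sets at 6:2:2."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a repeatable split
    n_train = round(len(items) * 0.6)
    n_val = round(len(items) * 0.2)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


origins = tile_origins(10240, 9216)        # 40 x 36 tiles of 256 x 256
train, val, test = split_622(origins)
```

With the image size given above, `origins` holds 1440 tiles, and the split yields 864, 288, and 288 tiles, which matches the set sizes listed in Table 2.12.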


Fig. 2.20 Correspondence between two types of scale lithology in the study area

2.5 Prior Knowledge-Assisted Dataset

2.5.1 Study Area

The research area of this dataset is located in the southeastern part of Hubei Province, in Qichun County and Meichuan Town, at a longitude of 115° 23′–115° 36′ and a latitude of 30° 7′–30° 18′, covering an area of 436 km².

2.5.2 Data Sources

In this study, a multimodal lithology remote sensing dataset was constructed using data from the GF-6 (Kang et al., 2021), GF-3 (Zhang, 2017), ZY-3 (Wang et al., 2014), and ALOS (Shimada et al., 2010) satellites. Information about the platforms can be found in Sect. 2.4.1.


Fig. 2.21 1:50,000 scale label of study area

2.5.3 Data Preprocessing

First, the data were preprocessed to eliminate the influence of terrain, which requires high-precision DEM data for orthorectification of the remote sensing data. Using the orthorectification model in ENVI 5.3, with the DEM data extracted in the previous step as the reference image file, ground control points were selected to orthorectify the multi-view remote sensing data sources. Image fusion was then carried out using the Gram-Schmidt orthogonalization method, which avoids the excessive concentration of information seen with PCA, is not limited by the number of bands, and enhances spatial information well. The final fused image has 4 bands and is resampled to a resolution of 2 m. The 4-band optical image was then stacked with the SAR image, the DEM image (Figs. 2.25, 2.26, 2.27, 2.28 and 2.29), and the slope, aspect, and hillshade images extracted from the DEM to obtain a 9-band image. The SAR data were subjected to single-polarization processing. This study uses SLC data from the GF-3 satellite and the PIE-SAR software. The processing mainly includes radiometric calibration, interference processing, Doppler correction, and other steps to improve data quality; filtering with different algorithms to reduce noise and clutter and improve image quality; adjusting the brightness, contrast, tone, and other parameters of the image to improve its visual effect; and extracting target information from the single-polarization SAR images with algorithms such as image segmentation and object detection, resulting in HH single-polarization data. The image is resampled to 2 m using ENVI 5.3 software (Cumming & Bennett, 1979).

Fig. 2.22 1:200,000 scale label of study area
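The 9-band composition described above (4 optical bands plus SAR, DEM, slope, aspect, and hillshade) can be illustrated with a minimal sketch. The function name is hypothetical, bands are represented as nested lists for brevity, and all layers are assumed to have already been resampled to the same 2 m grid; the actual stacking in this study was done with remote sensing software.

```python
def stack_bands(optical, sar, dem, slope, aspect, hillshade):
    """Stack a 4-band optical image with five single-band layers.

    optical: list of 2-D bands (e.g. R, G, B, NIR); the other arguments are
    single 2-D bands. All layers must share the same height and width.
    """
    bands = list(optical) + [sar, dem, slope, aspect, hillshade]
    shape = (len(bands[0]), len(bands[0][0]))
    for index, band in enumerate(bands):
        if (len(band), len(band[0])) != shape:
            raise ValueError(f"band {index} does not match shape {shape}")
    return bands


def blank(rows, cols):
    """Helper for the example: a band filled with zeros."""
    return [[0.0] * cols for _ in range(rows)]


stacked = stack_bands([blank(2, 3) for _ in range(4)],
                      blank(2, 3), blank(2, 3), blank(2, 3),
                      blank(2, 3), blank(2, 3))
```

The shape check is the important part of the sketch: stacking only makes sense once every source has been orthorectified and resampled to a common grid, as described in this section.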

2.5.4 Dataset Construction

2.5.4.1 Label Merge

The labels in Tables 2.13 and 2.14 are obtained from a 1:50,000 scale geological map and 1:250,000 scale geological maps. The 1:50,000 scale geological map is the thematic geological map of the Qichun area (Xu et al., 2022), and the 1:250,000 scale geological maps cover Taihu County and Wuhan City (Zuo et al., 2018). The original data of the three geological maps are all in MapGIS format. The MapGIS files must first be converted to Shapefile vector files with a conversion tool, after which the label operations are performed in ArcGIS. After a geological map is converted into a Shapefile vector file, map patches with the same lithology are combined in the ArcGIS attribute table to obtain the third-level label. Then, lithologies with roughly the same name in the third-level label, lithologies with similar distribution, and small-area lithologies in direct structural contact on the geological map are combined and renamed to obtain the second-level label. Finally, according to the requirements of this study, the first-level label is further synthesized

Fig. 2.23 1:200,000 scale label of study area


Fig. 2.24 Dataset distribution diagram (black represents the training set, gray represents the validation set, and white represents the test set)

Table 2.12 Statistical percentage of each category (%)

Category | Research area | Training set | Validation set | Test set
Number | 1400 | 864 | 288 | 288
Quaternary | 19.24 | 18.90 | 21.20 | 18.24
Granite porphyry | 2.65 | 2.77 | 2.42 | 2.51
Granite | 23.44 | 23.43 | 21.27 | 25.55
Schist | 24.24 | 24.68 | 23.23 | 24.00
Hornfels | 1.28 | 1.72 | 0.78 | 0.63
Sandstone | 2.99 | 2.72 | 3.95 | 2.85
Tuff | 13.12 | 13.62 | 12.44 | 12.35
Diorite | 6.97 | 6.69 | 7.91 | 6.83
Granulite | 6.07 | 5.51 | 6.81 | 7.03

Fig. 2.25 Optical data of the research area: 4-band (RGBN)

Fig. 2.26 10 m DEM data of the study area


Fig. 2.27 30 m DEM data of the study area

Fig. 2.28 SAR data of the study area


Fig. 2.29 Dataset A category display (where a is schist, b is quaternary system, c is conglomerate, d is metamorphic rock combination, e is Wengmen metamorphic complex, f is diorite, g is leptite, h is water, i is pyroxenite, and j is granite)

into similar categories; for example, schist and leptite are further merged into metamorphic rock. The specific merge rules are shown in Tables 2.13 and 2.14. The second-level system of the 1:50,000 geological map labels and the first-level system of the 1:250,000 geological map labels are used in the experiment, because the model proposed in this study requires the number of lithologic label categories of the 1:50,000 geological map to be greater than that of the 1:250,000 geological map. The 1:50,000 geological map label system plays an auxiliary role and corresponds to the 1:250,000 geological map label system.

2.5.4.2 Crop the Image and Specify the Label

Table 2.13 1:50,000 geological map label merge rules

Level 3 | Level 2 | Level 1 | Merge rules
Metaclastic schist matrix; Greenschist matrix | Schist | Metamorphic rock | The metaclastic schist matrix and greenschist matrix both have the same composition as schist. The components of the Quanshui Aoyan formation are all metamorphic rocks, such as metamorphic rhyolite and metamorphic basalt. Metabasic rocks, monzogranite mylonite, granodiorite gneiss, and metamorphic supracrustal rocks are all classified on the geological map as the Wengmen metamorphic complex. The dominant lithology of the Chenjia’ao formation is leptite. The schist, metamorphic rock assemblage, Wengmen metamorphic complex, and leptite are all metamorphic rocks
Quanshui Aoyan formation | Metamorphic rock assemblage | Metamorphic rock | As above
Metabasic rock block; Metamorphic supracrustal rocks; Monzogranite mylonite; Granodiorite gneiss | Wengmen metamorphic complex | Metamorphic rock | As above
Chenjia’ao formation | Leptite | Metamorphic rock | As above
Quaternary residual deposits; Quaternary slope deposits; Quaternary alluvial deposits; Quaternary lacustrine deposits | Quaternary | Quaternary | They are all products of the Quaternary system
Gonganzhai formation alluvial-proluvial deposits | Conglomerate | Conglomerate | The dominant lithology of the alluvial-proluvial deposits of the Gonganzhai formation is conglomerate
Porphyritic quartz monzodiorite; Epidiorite; Metaquartz diorite; Diorite block; Quartz monzodiorite; Quartz diorite | Diorite | Diorite | They are all types of diorite
Perennial freshwater reservoir; Qihe river | Water | Water | They are all water
Metagabbro; Metapyroxenite; Pyroxenite block; Pyroxenite block; Gabbro block; Quartz rock block; Marble block | Pyroxenite | Pyroxenite | Metagabbro, metapyroxenite, pyroxenite blocks, and gabbro blocks are mainly composed of pyroxene, while the quartzite and marble blocks are in direct structural contact with pyroxenite and are rare
Porphyritic granodiorite; Biotite adamellite; Granodiorite; Granodiorite block; Gneissic biotite monzogranite; Gneissic potassium feldspar granite | Granite | Granite | Biotite monzogranite, gneissic biotite monzogranite, and gneissic potassium feldspar granite are all types of granite. Granodiorite is a transitional lithology from granite to diorite, but its structure is also similar to that of granite

Based on the geographic coordinate information of the data, image blocks at the same geographic position are cut from the multi-view data to achieve spatial alignment of the scene maps from the different data sources, and the lithology type of each scene map is specified from manually interpreted lithology labels. In this way, a lithology scene classification dataset of multi-view remote sensing data is built to fill the gap in current lithology datasets. On the fused multimodal remote sensing data, the large-format image is cropped from left to right and from top to bottom with non-overlapping windows, as shown in Fig. 2.6. The crop size is set to 256 × 256. Each cropped scene map retains its projection coordinates, and the geographic coordinate information of each scene map can be used to achieve spatial alignment. At the same time, the label vector data are cropped to the corresponding geographic area and used to assign a label to the scene map. The label of a scene is determined by the proportions of the lithology categories in the cropped label vector file. In this study, the category with the largest area is selected as the label. A lithology scene usually contains several lithologies and mineral compositions whose areas differ, so selecting the category with the largest area better reflects the main lithology or mineral composition of the scene, which improves the accuracy and robustness of the classification. In addition, the largest category usually represents the dominant lithology or mineral composition of the scene; in geological exploration, for example, the detection of a large area of coal or ore often means that the area has important mineral resources. Selecting the largest category as the label can therefore also help geological explorers determine the characteristics and potential resources of the scene more quickly.
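The largest-area labeling rule described above can be sketched on a rasterized label patch, where counting pixels per class approximates the area proportions. This is an illustration only: the study computes proportions from the cropped label vector file, and the function name and toy patch are hypothetical.

```python
from collections import Counter


def scene_label(label_patch):
    """Return (dominant class code, its area fraction) for one label patch.

    label_patch: 2-D list of integer class codes covering one cropped scene.
    Ties are resolved in favor of the class encountered first.
    """
    counts = Counter(code for row in label_patch for code in row)
    label, n_pixels = counts.most_common(1)[0]
    return label, n_pixels / sum(counts.values())


# Toy 3 x 3 patch: class 2 covers 6 of 9 pixels, so it becomes the scene label.
patch = [[2, 2, 0],
         [2, 0, 2],
         [2, 2, 5]]
label, fraction = scene_label(patch)
```

Keeping the area fraction alongside the label is a design choice of this sketch; it makes it easy to inspect or filter scenes whose dominant class covers only a small share of the patch.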


Table 2.14 1:250,000 geological map label merge rules

Level 3 | Level 2 | Level 1 | Merge rules
Water | Water | Water |
Potassium feldspar granitic gneiss; Monzonite granitic gneiss; Biotite granitic gneiss | Gneiss | Metamorphic rock | Potassium feldspar granitic gneiss, monzonite granitic gneiss, and biotite granitic gneiss are all gneiss. The dominant lithology of the Daxinwu formation, Hutaishi formation, Puhe formation, Liupingyan formation, Dengying formation, C (rock) formation of the Dabie mountain group, Yaolinghe formation, Doushantuo formation, Qijiaoshan formation, and Wudang formation is schist in each case, and schist, gneiss, and quartzite are all metamorphic rocks
Daxinwu formation | Daxinwu formation | Metamorphic rock | As above
Hutaishi formation | Hutaishi formation | Metamorphic rock | As above
Puhe formation | Puhe formation | Metamorphic rock | As above
Liuping formation | Liuping formation | Metamorphic rock | As above
Dengying formation | Dengying formation | Metamorphic rock | As above
C (rock) formation of the Dabie mountain group | C (rock) formation of the Dabie mountain group | Metamorphic rock | As above
Yaolinghe formation | Yaolinghe formation | Metamorphic rock | As above
Doushantuo formation | Doushantuo formation | Metamorphic rock | As above
Qijiaoshan formation | Qijiaoshan formation | Metamorphic rock | As above
Wudang rock group | Wudang rock group | Metamorphic rock | As above
Quartz vein | Quartzite | Metamorphic rock | As above
Diorite; Quartz diorite | Diorite | Diorite | They are all types of diorite
Pyroxenite; Pyroxenite; Hornblende rock; Gabbro; Peridotite | Pyroxenite | Pyroxenite | The composition is similar to that of pyroxenite, the geographic distribution is similar to that of pyroxenite, or the rock is in direct structural contact with pyroxenite
Gneissic fine-grained porphyritic biotite monzogranite; Fine-grained biotite monzogranite; Porphyritic monzogranite; Porphyritic granodiorite; Gneissic monzogranite; Granodiorite; Adamellite; Quartz monzonite | Granite | Granite | Gneissic fine-grained porphyritic biotite monzogranite, fine-grained biotite monzogranite, porphyritic monzogranite, gneissic monzogranite, and monzogranite are all types of granite, and monzogranite is also one of the granite-like rocks. Granodiorite is a transitional rock from granite to diorite, and its structure is still granitic
Pleistocene system; Holocene system | Quaternary system | Quaternary system | They all belong to the Quaternary system
Gonganzhai formation | Gonganzhai formation | Conglomerate | The dominant lithology of the Gonganzhai formation is conglomerate

2.5.4.3 Brief Information of the Dataset

After the dataset cropping is completed, the dataset is divided into a training set, a validation set, and a test set at a ratio of 6:2:2. The dataset of the research area has a class imbalance problem, meaning that some classes have far fewer samples than others. When traditional classification methods are applied directly to imbalanced data, the skew in quantity between the majority and minority classes makes the model favor the majority classes and ignore the minority classes, resulting in low classification accuracy for the minority classes (Li et al., 2019b). In this study, class imbalance is handled by augmenting the classes with less data. The specific numbers of samples and the data enhancement methods are shown in Tables 2.15 and 2.16. Table 2.14 shows the large-scale dataset produced from the 1:50,000 geological map lithologic labels and high-resolution images. Table 2.15 shows the small-scale dataset produced from the 1:250,000 geological map lithology labels and low-resolution image data, and Table 2.16 shows the source domain dataset for transfer learning. Datasets A and B are displayed in Figs. 2.29 and 2.30.
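Augmenting a minority class by an integer factor can be sketched as follows. The specific transformations used in this study are not stated in this section, so 90° rotations are an assumption made purely for illustration, as are the function names; a factor of 4 means each training patch is kept and three rotated copies are added.

```python
def rotate90(band):
    """Rotate one square 2-D band clockwise by 90 degrees."""
    return [list(column) for column in zip(*band[::-1])]


def augment(patches, factor=4):
    """Expand a list of multi-band patches to `factor` times its size.

    Each patch is a list of 2-D bands. The original is kept, and
    factor - 1 successive 90-degree rotations are appended (assumed
    augmentation; the scene label of a patch is unchanged by rotation).
    """
    if factor not in (1, 2, 3, 4):
        raise ValueError("rotation-based augmentation supports factors 1 to 4")
    augmented = []
    for patch in patches:
        current = patch
        for _ in range(factor):
            augmented.append(current)
            current = [rotate90(band) for band in current]
    return augmented


# A class with 13 training patches grows to 52 when enhanced by a factor of 4.
minority = [[[[1, 2], [3, 4]]] for _ in range(13)]
enhanced = augment(minority, factor=4)
```

Only the training set is augmented in this scheme; the validation and test sets keep their original samples so that evaluation is not affected by the synthetic copies.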


Table 2.15 Study area dataset A (1:50,000 scale dataset)

Category | Raw data | Training set (original) | Training set (data enhancement) | Training set (total) | Validation set | Test set | Total | Processing mode
Schist | 23 | 13 | 39 | 52 | 6 | 4 | 62 | The training set was enhanced by a factor of 4
Quaternary system | 258 | 154 | 0 | 154 | 53 | 51 | 258 | No processing
Conglomerate | 125 | 75 | 0 | 75 | 25 | 25 | 125 | No processing
Quanshui Aoyan formation | 45 | 27 | 81 | 108 | 9 | 9 | 126 | The training set was enhanced by a factor of 4
Wengmen metamorphic complex | 46 | 27 | 81 | 108 | 10 | 9 | 127 | The training set was enhanced by a factor of 4
Diorite | 57 | 34 | 102 | 136 | 12 | 11 | 159 | The training set was enhanced by a factor of 4
Chenjia’ao formation | 34 | 20 | 60 | 80 | 8 | 6 | 94 | The training set was enhanced by a factor of 4
Water | 34 | 20 | 0 | 20 | 8 | 6 | 34 | No processing
Pyroxenite | 28 | 16 | 48 | 64 | 7 | 5 | 76 | The training set was enhanced by a factor of 4
Granite | 302 | 181 | 0 | 181 | 61 | 60 | 302 | No processing
Total | 952 | | | | | | 1363 |


Table 2.16 Dataset B of the study area (1:250,000 scale dataset)

Category | Raw data | Training set (original) | Training set (data enhancement) | Training set (total) | Validation set | Test set | Total | Processing mode
Metamorphic rock | 475 | 285 | 0 | 285 | 95 | 95 | 475 | No processing
Quaternary system | 205 | 123 | 0 | 123 | 41 | 41 | 205 | No processing
Conglomerate | 135 | 81 | 0 | 81 | 27 | 27 | 135 | No processing
Diorite | 107 | 64 | 64 | 128 | 22 | 21 | 171 | The training set was enhanced by a factor of 2
Water | 29 | 17 | 0 | 17 | 7 | 5 | 34 | No processing
Gabbro | 52 | 31 | 93 | 124 | 11 | 10 | 145 | The training set was enhanced by a factor of 4
Granite | 267 | 160 | 0 | 160 | 54 | 53 | 267 | No processing
Total | 952 | | | | | | 1363 |

2.6 Transfer Learning Dataset

2.6.1 Study Area

To fully explore the application prospects of multi-view remote sensing satellite data in lithology classification, this study selected two typical regions as research areas: the Yabli-Weihe Town area and the Suiyang Town area, both in Heilongjiang Province. The former is called research area A in this chapter, and the latter research area B. The geographical locations of the study areas are shown in Fig. 2.31. Research area A is located in the south-central part of Heilongjiang Province, southeast of Shangzhi City, with a total area of 546 km². It lies in a hilly, low mountainous area with a temperate continental monsoon climate, widely distributed vegetation, and little lithologic outcrop. Research area B is located in the eastern part of Heilongjiang Province, north of Dongning City, Mudanjiang City, with a total area of 194 km², abundant rainfall, and a high forest coverage rate. It has many types of natural resources, including non-coal mines, coal mines, forests, tourism, and water resources. The dataset constructed in study area B is used to study the generalization ability of the lithologic scene classification model based on multi-view data fusion and to evaluate the performance of the lithologic scene classification model based on transfer learning.

Fig. 2.30 Dataset B category display (where a is water, b is metamorphic rock, c is diorite, d is pyroxenite, e is granite, f is Quaternary system, g is conglomerate)

Fig. 2.31 Geographical location of the study area


2.6.2 Data Sources

The remote sensing data available in study area A include optical remote sensing satellite data, SAR data, and DEM data. The data available in study area B include SPOT5 and ZY-3 satellite data. Because the ZY-3 scene was acquired in March, when study area B was covered by snow, its optical data differ greatly from those of study area A, which is not conducive to the later transfer learning. Therefore, the optical remote sensing data acquired by SPOT5 were used in place of the ZY-3 optical data, while the DEM data were still derived from the ZY-3 data.

2.6.3 Data Preprocessing

Image data from the ZY-3 satellite were used to extract and generate DEM data, which then served as the reference image file for selecting ground control points to orthorectify the multi-view remote sensing data sources. The optical data of study area B are shown in Fig. 2.32, and the DEM data are shown in Fig. 2.33.

Fig. 2.32 Optical data in study area B (true color)

Fig. 2.33 DEM data of study area B

Table 2.17 Details of different view data in study area B

Data | Number of columns | Number of rows | Pixel size (m) | Number of bands
SPOT5 optical data | 6713 | 7245 | 2 | 3
ZY-3 DEM data | 3357 | 3632 | 4 | 1

The detailed information of different view data in study area B after preprocessing is shown in Table 2.17.
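As a quick sanity check on Table 2.17, the ground footprint of each view can be estimated as the pixel count multiplied by the pixel size. The sketch below is only illustrative: the function name `footprint_km` is hypothetical, and the calculation assumes square pixels and ignores any no-data border.

```python
def footprint_km(n_columns, n_rows, pixel_size_m):
    """Return (width_km, height_km) of a raster, assuming square pixels."""
    return (n_columns * pixel_size_m / 1000.0, n_rows * pixel_size_m / 1000.0)

# Values taken from Table 2.17.
spot5 = footprint_km(6713, 7245, 2)    # SPOT5 optical data
zy3_dem = footprint_km(3357, 3632, 4)  # ZY-3 DEM data

print(spot5)    # (13.426, 14.49)
print(zy3_dem)  # (13.428, 14.528)
```

Under these assumptions both views cover roughly 13.4 km by 14.5 km, so the two rasters agree in extent despite their different pixel sizes.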

2.6.4 Dataset Construction

The dataset was constructed following the process described in Sect. 2.2.4. After the same series of operations, the details of the multi-view dataset built for study area B are shown in Table 2.18.


Table 2.18 Study area B dataset

| Category | Raw data | Training set: original | Training set: data enhancement | Training set: total | Validation set | Test set | Total | Mode of operation |
|---|---|---|---|---|---|---|---|---|
| Andesite | 64 | 38 | 152 | 190 | 13 | 13 | 216 | The training set is enhanced by a factor of 5 |
| Slate | 130 | 78 | 156 | 234 | 26 | 26 | 286 | The training set is enhanced by a factor of 2 |
| Schist | 151 | 90 | 180 | 270 | 31 | 30 | 331 | The training set is enhanced by a factor of 2 |
| Basalt | 180 | 108 | 108 | 216 | 36 | 36 | 288 | The training set is enhanced by a factor of 1 |
| Quaternary loose deposits | 320 | 192 | 0 | 192 | 64 | 64 | 320 | – |
| Granite-diorite | 1521 | 360 | 0 | 360 | 120 | 120 | 600 | Choose 600 at random |
| Total | 2366 | | | | | | 2041 | |

2.7 Transfer Learning Dataset for Prior Knowledge-Assisted Study

2.7.1 Study Area

Refer to Sect. 2.5.1.


2.7.2 Data Sources

Refer to Sect. 2.5.2.

2.7.3 Data Preprocessing

Refer to Sect. 2.5.3.

2.7.4 Dataset Construction

2.7.4.1 Prior Knowledge of Geological Maps

The 1:250,000-scale geological map covers all of Hubei Province, whereas the 1:50,000-scale geological map covers only a small part of it. The lithology labels of the 1:250,000-scale map correspond closely to those of the 1:50,000-scale map, so the 1:250,000-scale map can be used when predicting areas that the 1:50,000-scale map leaves blank. The 1:250,000-scale geological map is therefore useful and reliable knowledge for the ultimate goal of this study, that is, a priori knowledge.

2.7.4.2 Geological Map Data Preprocessing

The study area includes 1:50,000-scale and 1:250,000-scale geological maps. The 1:50,000-scale geological map is the thematic geological map of the Shechun area, and the 1:250,000-scale geological maps cover Taihu County and Wuhan City. The raw data of the three geological maps are in MapGIS format and need to be converted into Shapefile vector files with conversion tools. Each rock formation was named according to its main lithology, and lithological patches of the same name were then merged.

The dataset construction process in this chapter is consistent with the earlier process for building the multimodal remote sensing lithology scene classification dataset, which involves merging labels, cropping images and specifying labels. After cropping, the dataset was divided into a training set, a validation set and a test set at a ratio of 6:2:2.

The dataset of the study area suffers from class imbalance, meaning that some classes have far fewer samples than others. When traditional classification methods are applied directly to unbalanced data, the skew in quantity between the majority and minority classes makes the model favor the majority classes and ignore the minority classes, resulting in low classification accuracy for the minority classes. In this study, the classes with fewer samples were therefore enhanced to deal with the class imbalance. The number of samples in each dataset and the data enhancement applied are shown in Tables 2.19 and 2.20.

Table 2.19 Dataset A1 of research area

| Category | Raw data | Training set: original | Training set: data enhancement | Training set: total | Validation set | Test set | Total | Processing mode |
|---|---|---|---|---|---|---|---|---|
| Schist | 50 | 30 | 0 | 30 | 10 | 10 | 50 | No processing |
| Quaternary system | 285 | 171 | 0 | 171 | 57 | 57 | 285 | No processing |
| Conglomerate | 194 | 116 | 0 | 116 | 39 | 39 | 194 | No processing |
| The spring water Ao formation | 40 | 24 | 0 | 24 | 8 | 8 | 40 | No processing |
| Wengmen metamorphic complex | 11 | 6 | 18 | 24 | 2 | 3 | 29 | The training set was enhanced by a factor of 4 |
| Chenjiaosite formation | 14 | 8 | 24 | 32 | 3 | 3 | 38 | The training set was enhanced by a factor of 4 |
| Water | 57 | 34 | 0 | 34 | 11 | 12 | 57 | No processing |
| Pyroxenite | 17 | 10 | 30 | 40 | 3 | 4 | 47 | The training set was enhanced by a factor of 4 |
| Granite | 184 | 110 | 0 | 110 | 37 | 37 | 184 | No processing |
| Total | 852 | | | | | | 924 | |
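The 6:2:2 split and the enhancement of minority classes can be sketched in plain Python. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, the integer rounding rule is one possible choice, and successive 90-degree rotations stand in for whatever enhancement operations were applied, which the text does not specify.

```python
import random

def split_622(samples, seed=0):
    """Shuffle and split samples into train/validation/test at roughly 6:2:2."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10          # integer arithmetic avoids float rounding
    n_val = (len(items) - n_train) // 2     # the test set takes any odd remainder
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def rotate90(tile):
    """Rotate a 2-D list-of-lists tile by 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]

def augment_training_set(train_tiles, copies):
    """Return the original tiles plus `copies` rotated variants of each tile."""
    out = list(train_tiles)
    for tile in train_tiles:
        variant = tile
        for _ in range(copies):
            variant = rotate90(variant)
            out.append(variant)
    return out

# An 11-sample class splits into 6 / 2 / 3 under this rounding rule.
train, val, test = split_622(range(11))
print(len(train), len(val), len(test))  # 6 2 3
```

Only the training subset is passed to `augment_training_set`, so the validation and test sets keep their original samples. With `copies=3`, six training tiles become 24, which matches the pattern of the "enhanced by a factor of 4" rows in Table 2.19.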


Table 2.20 Dataset A2 of research area

| Category | Raw data | Training set: original | Training set: data enhancement | Training set: total | Validation set | Test set | Total | Processing mode |
|---|---|---|---|---|---|---|---|---|
| Schist | 24 | 14 | 42 | 56 | 5 | 5 | 66 | The training set was enhanced by a factor of 4 |
| Quaternary system | 155 | 93 | 0 | 93 | 31 | 31 | 155 | No processing |
| Metamorphic rock | 54 | 32 | 0 | 32 | 11 | 11 | 54 | No processing |
| Wengmen metamorphic complex | 74 | 44 | 0 | 44 | 15 | 15 | 74 | No processing |
| Diorite | 109 | 65 | 0 | 65 | 22 | 22 | 109 | No processing |
| Leptite | 66 | 39 | 0 | 39 | 13 | 14 | 66 | No processing |
| Water | 8 | 4 | 12 | 16 | 2 | 2 | 20 | The training set was enhanced by a factor of 4 |
| Gabbro | 62 | 37 | 0 | 37 | 12 | 13 | 62 | No processing |
| Granite | 269 | 161 | 0 | 161 | 54 | 54 | 269 | No processing |
| Total | 821 | | | | | | 875 | |

References

Cumming, I., & Bennett, J. (1979). Digital processing of Seasat SAR data. In ICASSP '79. IEEE international conference on acoustics, speech, and signal processing, Washington, DC, USA (pp. 710–718).
Du, P., Tan, K., & Xia, J. (2012). Hyperspectral remote sensing image classification and application of support vector machine. Science Press (in Chinese).
Kang, Y., Meng, Q., Liu, M., Zou, Y., & Wang, X. (2021). Crop classification based on red edge features analysis of GF-6 WFV data. Sensors, 2021(21), 4328 (in Chinese).
Li, C., Wang, X., He, C., Wu, X., Kong, Z., & Li, X. (2019a). National 1:200000 digital geological map (public version) spatial database. Chinese Geology, 46(S1), 1–10 (in Chinese).
Li, D. (2012). China’s first civilian three-line array stereoscopic mapping satellite: Ziyu-3 mapping satellite. Journal of Surveying and Mapping, 41(03), 317–322 (in Chinese).
Li, F., Li, X., Chen, W., Dong, Y., Li, Y., & Wang, L. (2022). Automatic lithology classification based on deep features using dual polarization SAR images. Earth Science, 47(11), 4267–4279 (in Chinese).
Li, Y., Chai, Y., Hu, Y., & Yi, H. (2019b). Review of classification methods for imbalanced data. Control and Decision, 34(4), 673–688 (in Chinese).
Shimada, M., Tadono, T., & Rosenqvist, A. (2010). Advanced land observing satellite (ALOS) and monitoring global environmental change. Proceedings of the IEEE, 98(5), 780–799.
Tadono, T., Nagai, H., & Ishida, H. (2016). Generation of the 30 M-mesh global digital surface model by ALOS PRISM. In International archives of the photogrammetry, remote sensing & spatial information sciences.
Wang, T., et al. (2014). Geometric accuracy validation for ZY-3 satellite imagery. IEEE Geoscience and Remote Sensing Letters, 11(6), 1168–1171.
Xie, X., Li, M., Hei, H., Cheng, G., Gao, X., Zha, X., Cook, & Huang, Y. (2022). China geological survey: Xinjiang saltwater spring sheet (K46E009024) 1:50000 Geological map database. Geological Science Data Publishing System (in Chinese).
Xu, D., Peng, L., Deng, X., Wang, J., Liu, H., Liu, C., Tian, Y., Jin, W., Zhang, W., Xu, Y., Liu, H., Jin, X. D., Niu, Z., Wei, Y., & Tan, M. (2022). China geological survey: A 1:50000 geological map database for the Qichun area in Hubei Province (within the maps H50E011006, H50E011007, H50E012006, and H50E012007). Geological Science Data Publishing System (in Chinese).
Zhang, Q. (2017). System design and key technologies of the GF-3 satellite. Acta Geodaetica Et Cartographica Sinica, 46(3), 269–277 (in Chinese).
Zhang, Y., Qin, Q., Chen, L., Wang, N., & Zhao, S. (2015). Research progress of hyperspectral remote sensing for rock and mineral identification. Optics and Precision Engineering, 23(08), 2407–2418 (in Chinese).
Zuo, Q., Ye, T., Feng, Y., Ge, Z., & Wang, Y. (2018). Construction of structural map spatial database (V1) in 1:250000 segments of land in China. Development and Research Center of the China Geological Survey [Creation Institution], 2006. National Geological Data Center [Dissemination Institution] (in Chinese).

Chapter 3

Lithological Classification Based on Large-Scale Pixel Neighborhood and VGGnet-Based Transfer Learning

Abstract Remote sensing technology can provide a powerful technical reference for geological prospecting. It is an effective way to quickly characterize the engineering geological conditions of areas where field investigation is limited by poor traffic conditions, and it can guide local engineering projects. The intelligent extraction of rock mass currently faces problems such as insufficient samples, difficulty of intelligent interpretation and difficulty in selecting an appropriate scale. To address them, this study proposes a methodology that uses large-scale pixel neighborhood data, based on the spectral and spatial characteristics of different lithologies in remote sensing images, together with a VGG16 convolutional neural network pre-trained on ImageNet. Starting from the pre-trained model and its initialization parameters, the model was fine-tuned with the constructed remote sensing data of the rock mass. The parameters of the network model were adjusted iteratively until the optimal rock mass classification model was obtained. The experimental results show that the overall classification accuracy in the 441 km² study area reached 85%, which effectively improves the accuracy and efficiency of remote sensing-based rock mass interpretation.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_3

3.1 Introduction

Machine learning and deep learning have been shown to greatly improve the effectiveness of geological work, and many related studies have been published. Machine learning algorithms for lithology identification developed rapidly from the initial unsupervised clustering algorithms (Zhang & Mo, 2007) and unsupervised self-organized competitive neural networks (Cai, 2015) to the subsequent supervised random forest algorithm (Kang & Lu, 2020), BP neural network (Chen et al., 2018) and convolutional neural network (Chen et al., 2019), with continuing gains in model performance. Traditional machine learning methods have some inherent shortcomings. Neural network methods are prone to overfitting and slow convergence (Chen et al., 2019). The support vector machine depends heavily on the selection of the kernel function and penalty coefficient. Unsupervised learning has difficulty controlling the classification categories and requires large samples to ensure performance (Cai, 2015).

In recent years, the development of deep learning theory has overcome the problems of shallow learning models and hand-crafted feature extraction, such as low efficiency and accuracy. In image classification, the convolutional neural network (CNN) was proposed by imitating human visual information processing and can imitate the abstraction process by which humans understand images. It has strong feature expression ability and can extract abstract, high-order features. Compared with earlier remote sensing image analysis methods, which require complex hand-crafted feature extraction, a CNN only needs to be trained with labeled image samples to obtain good classification results automatically (Ouyang, 2017). However, it is difficult to obtain enough satellite image data over a long period, and labeled high-quality remote sensing images are even scarcer. Sample data have therefore become a difficult problem in the remote sensing classification of rock mass. With only limited labeled data, a CNN may overfit significantly.

Transfer learning transfers the knowledge and methods acquired in one field to another, making maximum use of knowledge in the source field to assist learning in a target task for which only limited data are available (Shi et al., 2016). To some extent, it can solve the problem of small sample sizes in model training. Experiments show that transfer learning with fine-tuning is feasible for remote sensing image interpretation (Zhou et al., 2020).

3.2 Methods

3.2.1 Construction of Model

When models are trained with remote sensing image data, training samples are often insufficient, which leads to weak feature extraction ability and low prediction accuracy in the final deep learning model. Transfer learning was therefore adopted in this study. Its purpose is to transfer the knowledge contained in existing prior sample data and use what has already been learned to help the model learn from new data. In this study, the VGG16 convolutional neural network was combined with transfer learning: the model learned from the ImageNet dataset was fine-tuned with rock mass samples to obtain the final rock mass classification model (Li et al., 2022). Remote sensing geological interpretation data and remote sensing images of the study area were used to construct the training and validation dataset for this experiment.

Fig. 3.1 Algorithm flow chart (elements shown: true/false color data, 48 by 48 neighborhood samples, labels, ImageNet dataset, pre-trained VGG model, transfer learning, model fine-tuning, VGG transfer learning model, class probability)

The samples were all 48 × 48 neighborhood image data. In the model construction stage, the VGG16 transfer learning model was built by combining the ImageNet dataset and the rock mass dataset, and the model with the best validation accuracy was retained. Finally, the trained model was used to predict the neighborhood data cut from the whole image, and the predicted labels were evaluated against the remote sensing geological interpretation data. The algorithm flow is shown in Fig. 3.1.

3.2.2 VGG16 Convolutional Neural Network Model

The VGG convolutional neural network model (Simonyan & Zisserman, 2014) was proposed by the University of Oxford in 2014. Among the VGG variants, VGG16 performs well in image classification and object detection tasks, so it was chosen for the classification task in this study. The network uses consecutive 3 × 3 convolution kernels instead of larger kernels and increases the depth of the network so that features are learned progressively. VGG16 repeatedly stacks convolutional layers and pooling layers, which together act as a feature extraction process on the input image. Stacking multiple convolution and pooling layers gives the network a larger receptive field while reducing the number of parameters, and the ReLU activation function turns what would otherwise be a single linear transformation into a nonlinear one, which enhances the learning ability (He et al., 2019). The samples are classified through the fully connected layers and the output layer, and the softmax activation function gives the probability that the current sample belongs to each class (Zhang et al., 2019).
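Two points in this description lend themselves to a small worked example: the trade-off between stacked 3 × 3 kernels and one larger kernel, and the softmax output. The sketch below uses hypothetical helper names, ignores bias terms, assumes stride-1 convolutions with equal input and output channel counts, and picks 64 channels purely as an example width.

```python
import math

def stacked_receptive_field(n_layers, kernel=3):
    """Receptive field of n stride-1 convolutions with the same kernel size."""
    return 1 + n_layers * (kernel - 1)

def conv_weights(n_layers, kernel, channels):
    """Weight count (biases ignored) for n conv layers with equal in/out channels."""
    return n_layers * kernel * kernel * channels * channels

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three stacked 3 x 3 layers see the same 7 x 7 window as one 7 x 7 layer...
print(stacked_receptive_field(3))                       # 7
# ...with fewer weights.
print(conv_weights(3, 3, 64), conv_weights(1, 7, 64))   # 110592 200704
print(softmax([2.0, 1.0, 0.1]))
```

The class with the largest raw score always receives the largest probability, so the predicted label is simply the index of the maximum softmax output.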


3.2.3 VGG16 Transfer Learning Model

The ImageNet project is a large visual database for research on visual object recognition software. It contains about 15 million high-resolution images covering more than 22,000 categories. Each image is manually screened and labeled, and the dataset is widely used in deep learning for images. High-resolution remote sensing images share many similarities with the optical high-definition images in ImageNet in terms of lines, textures, colors, spatial layout and structures. Because of this similarity, the low-to-high-level feature extraction learned on ImageNet is also applicable to remote sensing image data (Zhou et al., 2020).

Transfer learning is a machine learning method that applies knowledge learned in other fields to the target research area, so that only a small amount of data and training time is needed to support a new task. Because remote sensing data of rock mass are scarce, transfer learning can both make up for the insufficient number of rock and soil mass classification samples and reduce the training time of the model.

VGG16 has a deep network structure that extracts and refines image features layer by layer. The shallow convolutional layers extract general image features, whereas the deeper layers handle task-specific information. In the transfer learning process, the model pre-trained on ImageNet was used with the weights of its lower layers retained, and only the parameters of the upper layers were retrained and fine-tuned with data from the target domain. During fine-tuning, the original top layer was removed and a new output layer with a softmax function was added for classification. Figure 3.1 shows the transfer learning process of the model. The weights trained on ImageNet were transferred and fine-tuned using the rock-soil mass dataset.

The VGG16 model was designed for conventional classification of three-band natural images, so its input images are in standard RGB format. The image data in the study area have 4 bands (R, G, B and infrared). Two datasets were therefore constructed from different band combinations: true color (G, B, R bands) and false color (N, G, R bands). The datasets of the two band combinations were trained independently, and the model with the best validation accuracy was saved.
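The fine-tuning setup described above can be sketched in Keras, the framework the study reports using. This is a minimal sketch under stated assumptions, not the authors' code: the function name `build_vgg16_transfer`, the choice to unfreeze only the `block5` layers, the single softmax head on flattened features and the default of six classes (the legend classes of Fig. 3.2) are illustrative choices. The 48 × 48 × 3 input, the Adam optimizer and the learning rate of 0.0001 follow the text.

```python
import tensorflow as tf

def build_vgg16_transfer(n_classes=6, input_shape=(48, 48, 3), weights="imagenet"):
    """VGG16 backbone with a new softmax head for neighborhood classification."""
    # Drop the original ImageNet classifier (include_top=False).
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape)

    # Assumption: keep the lower blocks frozen and fine-tune only block5.
    for layer in base.layers:
        layer.trainable = layer.name.startswith("block5")

    # New output head: flattened features followed by a softmax layer.
    x = tf.keras.layers.Flatten()(base.output)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])
    return model
```

Calling `build_vgg16_transfer()` downloads the ImageNet weights on first use, while passing `weights=None` builds the same architecture with random initialization, which is convenient for a quick shape check. One model would be built per band combination, keeping the checkpoint with the best validation accuracy.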

3.2.4 Accuracy Evaluation

In machine learning, the confusion matrix can be used to visually evaluate the performance of supervised learning algorithms. It is a square matrix of size n_classes × n_classes, where n_classes is the number of classes. Based on the confusion matrix, the Precision, Recall and overall accuracy (OA) of the experimental results can be calculated to analyze the performance of the model. Precision is the number of correctly classified samples of a class divided by the total number of samples that the model predicts to belong to that class. Recall is the number of correctly classified samples of a class divided by the total number of samples whose true label is that class. OA is the proportion of correctly classified samples among all samples.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3.1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3.2}$$

$$\text{OA} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3.3}$$

where TP is the number of samples whose prediction and label are both positive; FP is the number of samples predicted as positive but labeled as negative; FN is the number of samples predicted as negative but labeled as positive; and TN is the number of samples whose prediction and label are both negative. In addition, the F1_score is used to evaluate the classification accuracy, and the ratio of the number of correctly classified pixels to the total number of pixels is used as the overall accuracy. Both the single-class accuracy and the overall accuracy are reported as average values. Generally, the closer the F1_score is to 1, the better the model. The F1_score is defined as:

$$\text{F1\_score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}$$
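As a worked illustration of Eqs. (3.1) to (3.4), the metrics can be computed from a multi-class confusion matrix by treating one class at a time as the positive class. The sketch below uses hypothetical function names and a made-up 3-class matrix, and it assumes rows are true labels and columns are predicted labels. For the multi-class case, OA is taken as the diagonal sum divided by the total, which is the "correctly classified pixels over all pixels" reading given in the text.

```python
def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true label differs
    fn = sum(cm[k]) - tp                       # true k, predicted otherwise
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def overall_accuracy(cm):
    """Share of all samples that lie on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Toy 3-class confusion matrix (made-up numbers, not results from this study).
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 9]]
print(per_class_metrics(cm, 0))
print(overall_accuracy(cm))
```

For class 0 in this toy matrix, TP = 8, FP = 2 and FN = 2, so precision, recall and F1 all come out at about 0.8, and OA is 23/30.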

3.3 Results

3.3.1 Experimental Environment and Setup

The deep learning framework used in this method was Keras with the TensorFlow backend. The operating system was 64-bit Ubuntu 16.04 LTS with Python 3.6. The hardware platform on which the network was trained had an NVIDIA GeForce RTX 2080 Ti GPU, an Intel Xeon E5-2620 CPU, 125 GB of memory and a 512 GB solid-state drive. In this experiment, the number of iterations was set to 500. The batch size is the number of images placed on the GPU for each training step; it is limited by the number of network parameters and the GPU memory, so the batch size was set to 1024. The Adam algorithm was used as the optimizer during fine-tuning, and the learning rate was set to 0.0001.

3.3.2 Full Image Prediction

Each 48 × 48 neighborhood was cropped from the image, and its class label was taken as the label of the center pixel of the neighborhood. At the boundary of the study area, a complete neighborhood centered on the target pixel cannot be cropped, so 24 pixels were padded on each of the upper, lower, left and right boundaries using mirror padding. The trained VGG16 transfer learning model was then used to predict the neighborhood data of every pixel in the image, assigning a class label to each pixel. This achieves pixel-level classification and yields a label image with the same resolution as the original image.
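The padding and cropping step can be sketched on a single-band image stored as a list of lists. The helper names are hypothetical, and the mirroring convention is an assumption: the text only says "mirror filling", and this sketch reflects about the border pixel without repeating it. Because 48 is even, the target pixel sits at offset (24, 24) of its window rather than at an exact center.

```python
def reflect_index(i, n):
    """Map an out-of-range index back into [0, n) by mirroring at the border."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i

def mirror_pad(image, pad):
    """Pad a 2-D list-of-lists image on all four sides (requires pad < height and width)."""
    h, w = len(image), len(image[0])
    return [[image[reflect_index(r, h)][reflect_index(c, w)]
             for c in range(-pad, w + pad)]
            for r in range(-pad, h + pad)]

def neighborhood(padded, row, col, size=48):
    """Window for original pixel (row, col); assumes the image was padded by size // 2."""
    return [line[col:col + size] for line in padded[row:row + size]]

# Toy single-band image, 50 x 50, with a unique value per pixel.
image = [[r * 50 + c for c in range(50)] for r in range(50)]
padded = mirror_pad(image, pad=24)
corner = neighborhood(padded, 0, 0)     # neighborhood of the top-left pixel
print(len(corner), len(corner[0]))      # 48 48
print(corner[24][24] == image[0][0])    # True
```

Looping `neighborhood` over every (row, col) of the original image produces one window per pixel, which is what allows the predicted label image to keep the original resolution.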

3.3.3 Visual Evaluation of Prediction Results

The prediction and classification results for rock and soil mass using the true color and false color band combinations are shown in Fig. 3.2. The predictions recover the corresponding rock mass categories, but some salt-and-pepper noise is present, and some misclassification occurs along the boundary between “sedimentary rock” and “extrusive rock”.

3.3.4 Quantitative Accuracy Evaluation

The confusion matrices of the predictions are shown in Fig. 3.3. The single-class classification accuracy of the two band combinations is above 84% and 85%, respectively, and the best single-class accuracy of the false color band combination reaches 97%. Both combinations misclassify some “extrusive rock” as “sedimentary rock”. Remote sensing lithology identification works well in areas without vegetation cover, but it is difficult to achieve good recognition in areas with dense vegetation cover (Wang et al., 2019). A large part of the study area is covered by vegetation, which interferes with the identification of rock mass. In addition, the vegetation in adjacent areas is similar and produces similar spectral information, which also lowers the classification accuracy (Table 3.1).


Fig. 3.2 Prediction results for the full map of the study area. Left: true color band prediction; right: false color band combination (legend: Metamorphic, Sedimentary, Extrusive, Quaternary system, Water, Granite)

Fig. 3.3 Normalized confusion matrices (real label versus predicted label). Left: true color band prediction; right: false color band combination

Table 3.1 Accuracy evaluation (average of 5 experiments)

| Evaluation index | Precision | Recall | F1 | OA |
|---|---|---|---|---|
| True color | 0.839 | 0.872 | 0.845 | 0.879 |
| False color | 0.867 | 0.918 | 0.889 | 0.916 |


3.4 Conclusion

Remote sensing technology provides an efficient method for large-scale lithology identification and a feasible solution for geological resource exploration and national geographic survey. The intelligent extraction and classification method for rock and soil mass proposed in this study, based on large-scale pixel neighborhoods and VGG transfer learning, has the following innovations.

Classification of rock and soil mass by transfer learning. In the remote sensing classification of rock mass, relevant data are difficult to obtain, so sufficient training samples are not available and there is no public dataset for research. This study therefore used transfer learning from the ImageNet natural image dataset, which gives better classification accuracy and generalization ability than training on the target region data alone, and provides an effective solution for rock and soil mass classification networks based on small samples.

Intelligent interpretation. Because of the vegetation cover, manual interpretation by experts is very difficult. The intelligent interpretation method for rock and soil mass proposed in this study, based on large-scale pixel neighborhoods and the VGG16 transfer learning model, was verified by experiments and can greatly improve interpretation speed and classification accuracy.

Large-scale neighborhood data. Rock masses have aggregated spatial characteristics, which makes them difficult to identify from pixel-scale neighborhood data. Large-scale neighborhood data were therefore used to preserve the spatial characteristics of the rock mass and effectively improve the classification accuracy.

Considering existing methods, the authors believe that the following aspects are worthy of further research. Predictions from both band combinations achieve high accuracy, and a comparison shows that the two are complementary: in the true color prediction, some “granite” is misclassified as “extrusive rock” but is classified correctly in the false color prediction, whereas the parts of the “granite” region that are misclassified in the false color prediction are classified correctly in the true color prediction. In future work, the data of the two band combinations could be combined for feature fusion to improve the overall prediction. For large-scale neighborhood data, there is no specific numerical index for choosing a suitable neighborhood size, so further research on neighborhood size could also be considered.

References

Cai, Z. (2015). Application of self-organized competitive neural network in logging data interpretation of sandstone type uranium mine. East China University of Technology (in Chinese).
Chen, G., Liang, S., Wang, J., & Sui, S. (2019). Application of convolutional neural network in lithology identification. Well Logging Technology, 43(02), 129–134 (in Chinese).
Chen, K., Li, J., Huang, C., Chen, W., Wang, G., & Liu, Y. (2018). Application of BP neural network in potassium-rich brine. Advance in Earth Science, 33(06), 614–622 (in Chinese).
He, S., Ren, L., & Tian, X. (2019). Rolling bearing fault diagnosis based on convolutional neural network. Ordnance Automation, 38(03), 42–44 (in Chinese).
Kang, Q., & Lu, L. (2020). Application of random forest algorithm in logging lithology classification. World Geology, 39(02), 398–405 (in Chinese).
Li, F., Li, X., Chen, W., Dong, Y., Li, Y., & Wang, L. (2022). Automatic lithology classification based on deep features using dual polarization SAR images. Earth Science, 47(11), 4267–4279 (in Chinese).
Ouyang, Y. (2017). Remote sensing image scene classification based on convolutional neural networks. Hunan University (in Chinese).
Shi, X., Fang, X., Zhang, D., & Guo, Z. (2016). Image classification based on deep learning hybrid model transfer learning. Journal of System Simulation, 28(01), 167–173 (in Chinese).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Wang, J., Ye, F., Qiu, J., Meng, S., & Zhang. (2019). Study on lithology identification and classification by remote sensing. World Nuclear Geology, 37(01), 10–22 (in Chinese).
Zhang, H., Liu, J., Zhao, X., Hu, X., & Li, H. (2019). Study on AI-assisted diagnosis and classification of acute lymphoblastic leukemia blood cell microscopic images based on VGG16. Chinese Medical Equipment, 34(07), 1–4 (in Chinese).
Zhang, T., & Mo, X. (2007). Complex lithology identification based on crossplot and fuzzy clustering algorithm. Journal of Jilin University (Earth Science Edition), (S1), 109–113 (in Chinese).
Zhou, L., Chen, L., Liu, J., Zuo, X., Ge, Q., & Chen, X. (2020). Research on scene classification of high-resolution remote sensing images based on transfer learning. Journal of Henan University (Natural Science Edition), 50(04), 443–450 (in Chinese).

Chapter 4

Lithological Remote Sensing Scene Classification Based on Multi-view Data

Abstract Lithology classification is an important branch of remote sensing of the geological environment. Deep learning methods have strong feature extraction ability and have been widely used in geological remote sensing classification. However, in vegetated areas, the features of remote sensing lithology images are complex, and it is difficult to extract the key feature information of lithology effectively. To solve these problems, this study carries out deep learning-based research at the two core levels of data and model. To eliminate the boundary effects caused by interfering feature information from multiple lithology types, a method of specifying the label of the scene image after cropping is proposed, which can effectively eliminate the multi-level cropping at the boundary position. This work provides data support for subsequent model training. To improve the ability of the model to extract key lithology information, a lithologic scene classification network model based on enhanced feature fusion and channel attention (EFFCA) is proposed using a densely connected network and a channel attention mechanism. Then, based on feature-level and data-level fusion strategies, EFFCA is used to construct a multi-view data fusion lithological scene classification model. Experiments on self-constructed multi-view lithology datasets show that the proposed model performs better than VGG16, DenseNet121 and other models. The results of this study can provide theoretical and methodological support for scene classification of lithological remote sensing data and have scientific significance and application value.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_4

4.1 Introduction

4.1.1 Research Background and Significance

The accumulation of a large amount of remote sensing earth observation data provides a solid data basis for contemporary remote sensing lithology identification (Han, 2016; Wang, 2020). As an important branch of geological remote sensing, the classification and identification of remote sensing lithology data is of great significance for geological environment analysis and plays an important role in resource and environment investigation, engineering geological environment investigation, engineering site selection and other fields (Fu et al., 2017; Wang, 2020). Unlike conventional remote sensing image classification, lithology classification needs to be based on high-level abstract semantic features. Within its distribution area, a lithology differs obviously or slightly from other lithologies, thus forming a “lithology feature image block” in that area. Therefore, lithology classification cannot be carried out pixel by pixel and needs to be carried out at the scene scale.

Figure 4.1 compares the spatial characteristics of granite at different scales. The characteristics of granite at the “pixel” scale are shown in Fig. 4.1a. The image reflects only local information about the surface, has no clear physical meaning and cannot describe high-level semantic information. The characteristics of granite at the “scene” scale are shown in Fig. 4.1b. A ribbon texture structure is visible in the image, and the distribution direction of the lithology can also be observed. The units in local blocks of the image correspond more accurately to the high-level semantic category of the lithology.

A lithologic stratigraphic unit undergoes a long evolution process. Because of differences in the degree of weathering and in surface cover, the spectral information of the rock shows a certain variability. Differences in the composition of lithologic units lead to differences in the characteristics of lithologic data at different scales in remote sensing images, and lithologic data can be classified by capturing these different combinations of characteristics (Ma & Li, 2008).

(a) Pixel scale

(b) Scene scale

Fig. 4.1 Comparison of spatial characteristics of granite at different scales (Li et al., 2019a, 2019b)

4.1 Introduction


4.1.2 Research Status

Feature extraction is the basis for understanding and analyzing remote sensing data (Dong & Zhang, 2019), and different feature extraction methods yield information at different levels of representation. Scene-scale feature extraction methods mainly include low-level visual features, mid-level features and deep abstract semantic features. Low-level and mid-level features are generally referred to as hand-crafted features, because specific derivations and transformations have to be constructed manually to extract them. Accordingly, lithology classification methods can be divided into those based on hand-crafted features and those based on abstract semantic features.

4.1.2.1

Classification Techniques Based on Hand-Crafted Features

There has been a large number of studies on lithology identification based on remote sensing technology. In the 1980s, Hunt et al. carried out the study of mineral reflectance spectra. In this process, the causes of mineral spectra were analyzed. By analyzing the characteristics of rock spectra, Gaffey concluded that the absorption spectra of lithologic minerals could be used for their identification (Gaffey, 1986; Hunt & Ashley, 1979). Spectral features and texture features are the two most basic features of two remote sensing images. The former is the essential feature to distinguish different ground objects, while the latter one is the spatial relationship of the grayscale image, so it does not depend on the color and brightness of the object, and can reflect the spatial arrangement pattern of the image grayscale (Yang et al., 2009). The traditional lithology classification algorithms for spectral data are mainly based on spectral similarity and spectral characteristics (Ni & Wub, 2019; Vignesh & Kiran, 2020; Zhao et al., 2016): The method based on spectral similarity is to achieve the classification of lithology by constructing a spectral similarity measure, such as spectral angle mapping and spectral information difference (Wang et al., 2021). However, it focuses more on the overall waveform characteristics of the spectral data, while ignoring some details, resulting in the information not being fully utilized. The spectral characteristic method usually defines several absorption characteristics, including absorption depth, absorption area, absorption position, absorption width, etc., to identify different lithologies. These methods generally only extract the shallow features of each pixel, and do not consider the deep semantic features of the pixel. Cardoso-Fernandes et al. (2021) used reflectance spectroscopy to analyze lithium minerals and constructed a spectral library of reflectance size and absorption depth of lithium samples. 
These traditional feature extraction methods based on spectral characteristics of remote sensing data can only roughly extract lithology information, and cannot effectively obtain detailed information. Perez et al. mainly combined the principal component analysis method and wavelet texture features to extract spectral and texture information from remote sensing data, and used support vector machine


for classification (Perez et al., 2011). Xie et al. (2017) achieved the identification of damaged buildings by calculating the texture and spectral characteristic parameters of buildings. Jia et al. (2017) achieved the identification of bridges through the shape and texture information with high reliability. And PeSaresi et al. proposed the feature method based on texture or shadow (Pesaresi & Gerhardinger, 2011; Pesaresi et al., 2008). Huang and Zhang (2011) proposed the feature extraction method based on mathematical morphological transformation. Huang et al. (2003) proposed the calculation of image texture data based on the number variation function in geostatistics, and used the spatial texture variation characteristics in remote sensing image data to identify lithology (Li, 2004). Pan et al. (2009) proposed a multifractal model of lithology by using ETM image data of different lithology, topographic structure map and lithology component map, and found the correlation between multifractal spectrum of lithology and geological structure. The methods proposed by the above scholars mainly achieve the task of extracting image feature information through complex and diverse mathematical derivation and transformation, which can reflect more specific attribute information in highresolution images. Compared with the deep semantic features in the later stage, due to the integration of manual intervention, they have strong interpretability (Dong & Zhang, 2019). However, the dimensions of manually constructed feature attributes to describe image attribute information are very limited. When dealing with the huge amount of high-resolution image data, only part of the data can be extracted.

4.1.2.2

Classification Techniques Based on Abstract Semantic Features

Artificial features have strong interpretability (Dong & Zhang, 2019), but their representation ability is not strong when facing large data and complex tasks. Deep feature extraction is generally implemented based on deep learning methods. Deep learning mainly generates more abstract and richer feature data by stacking multiple nonlinear processing units, which can achieve automatic feature extraction and greatly improve work efficiency. Zhang et al. (2016) improved DAE and proposed the Stacked Denoising Auto Encoder (SDAE) to carry out the classification task of remote sensing images. Li et al. (2019a, 2019b, 2021) put forward the DBNs method for land cover classification and made a fine classification of large-area surface features. Convolution neural network (CNN) model is a simulation of the human brain structure of multi-layer stacked type neural network structure. After training and learning through the use of labeled data, it can directly recognize from the original image visual patterns (Bengio et al., 2012). Comparing with manual feature extraction methods, CNN model only needs to provide enough labeled data to obtain good classification results (Ouyang, 2017). With the improvement of technology, the resolution of remote sensing data has gradually become higher, which makes it possible to contain richer spatial and spectral information. Therefore, many scholars have used hyperspectral remote sensing technology to excavate or identify the physical and chemical characteristics of


lithology, and it has been widely used in the fields of lithology geological exploration, classification and mapping (Seid & Suryanarayana, 2021; Wu et al., 2020; Ye et al., 2020). Liu et al. (2021) used thermal infrared hyperspectral data and deep convolutional neural network to classify lithology. Ye et al. (2020) used hyperspectral data from the Gaofen-5 satellite combining with deep learning for classification. Liu et al. (2020) used the projection maximum average deviation criterion and the idea of extreme learning machine and transfer learning for lithology identification of well logging data. In addition, some scholars have introduced the method of multi-source data fusion, which is to combine different types of data information from the same target (He et al., 2010; Wang & Cheng, 2008), such as high spatial resolution optical remote sensing satellite data, SAR data, DEM data, etc. This method can give full play to the characteristic characterization advantages of data in different lithology categories and improve the classification performance of the model. High spatial resolution optical remote sensing satellite data can provide spatial and spectral data of the ground surface by using different spectral characteristics of the ground. SAR data has certain surface penetration, which can effectively reflect the difference of surface morphology and roughness, and can effectively extract texture feature information (He & Wang, 1990). DEM data contains plane and elevation information of ground objects, which can be used to describe the spatial distribution of geomorphological forms. By integrating DEM with spatial and texture features, classification accuracy can be improved (Jakob et al., 2015; Othman & Gloaguen, 2014). Seid and Suryanarayana (2021) fused DEM data and multi-spectral optical data to extract lithologic features by using deep learning networks. Wang et al. 
(2021) made comprehensive use of high-resolution, multispectral and hyperspectral data, combining with spectral angle matching, minimum noise separation transformation, typical band combination and other methods to enhance remote sensing information, and achieved the rapid lithology classification in plateau relief landform region. Wang et al. (2020) combined optical remote sensing data and geochemical data, including the physical and chemical properties of lithologic units, built a hybrid model of data fusion, and used machine learning to extract characteristic data for lithologic classification. Pal et al. (2020) used a variety of remote sensing data fusion and classifier integration methods to achieve pixel-by-pixel division. The abovementioned scholars mainly used the remote sensing data obtained by using different imaging methods for fusion to achieve the improvement of lithology classification performance. The attention mechanism, which mainly simulates the way humans understand and perceive images, can also enhance the ability of deep learning network models to extract key features of targets. Commonly used attention mechanisms include spatial attention mechanism and channel attention mechanism (Wang & Fan, 2021). The former assigns weights to the feature map in space, and assigns greater weights to the locations where important information is located. The latter specifies the weight information of each channel data, and usually assigns greater weights to channel data that are more important to the result to highlight its role. Tong et al. (2020) carried out scene classification of remote sensing data based on the improved model of DenseNet121 (Huang et al., 2017). Tian et al. (2021) proposed a multi-scale feature


fusion remote sensing scene classification model by improving ResNet network. They both used the channel attention module to improve the model’s ability to focus on key information and their models’ performances were greatly improved. Chen et al. (2022) included the global context space attention module in their model to extract context information in the complete scene graph and improve the performance of model classification.

4.1.3 Research Objectives and Main Research Contents

At present, traditional classification methods based on hand-crafted features have insufficient representation ability and struggle to extract effective information. Classification methods based on deep learning have improved greatly by virtue of their feature extraction ability and are widely used in remote sensing classification. However, current lithology classification focuses mostly on exposed outcrop areas, whereas in vegetated areas the lithology information is occluded and becomes “weak information”. Owing to the imaging mechanism of remote sensing data, the phenomena of “the same object with different spectra” and “different objects with the same spectrum” are more pronounced within the same area, so it is difficult to extract the key feature information of lithology effectively from a single data source. Traditional networks also have difficulty capturing the correlation between features at different scales, which lowers prediction performance. To address these problems, this study works on two core aspects: the data and the deep learning model. When data are cut from remote sensing images, image blocks at lithological boundaries contain interfering information from several lithology types and need to be eliminated. To address this, the Yabuli-Weihe Town area of Heilongjiang Province is taken as the study area and, based on ZY-3 and Gaofen-3 satellite data, a method of assigning the scene label after cropping is proposed to construct a multi-view remote sensing lithology scene classification dataset (Sect. 2.2). The method effectively eliminates the image blocks clipped at boundary positions that mix the feature information of multiple lithology types, and improves the quality of the dataset.
Different from conventional remote sensing image classification, lithology classification belongs to high-level abstract semantic features. From the perspective of lithology itself, there are obvious or slight differences in the distribution of one kind of lithology with other lithologies, thus forming a “lithology feature image block” in a region. Therefore, the lithology classification cannot be carried out in the way of a single pixel, and it needs to be carried out on the scene scale. In the vegetation-covered area, most of the image data received by the sensor are vegetation information, and the influence of clouds, topography and sensor anomalies makes it very difficult to extract lithology information from remote sensing images. However, vegetation cannot change the development characteristics of valleys and mountains, and it is


necessary to distinguish different lithologies by combining their texture with the extension or contour of the mountains. Therefore, it is generally difficult to extract the characteristics of lithology from optical data alone. Differences in the composition of lithological units lead to differences in the characteristics of lithology data at different scales in remote sensing images, and lithology can be classified by capturing these different combinations of characteristics. In addition, because of data redundancy, some data contribute little to the classification result and may even interfere with the model. The conventional approach of deepening the network by stacking layers therefore struggles to capture the association between features at different scales, to eliminate the interference of redundant information, and to capture the key feature information of lithology, so model performance is hard to improve. To address the difficulty of extracting key feature information for lithology remote sensing classification in vegetated areas, this study proposes a lithology scene classification model based on multi-view data fusion. First, multi-scale features are extracted and fused using a densely connected network with enhanced feature fusion, and a channel attention mechanism is added to assign weights to different channels so that the model focuses better on key feature information. The result is a lithology scene classification network based on enhanced feature fusion and channel attention (EFFCA).
Then, following the data-level and feature-level fusion strategies, EFFCA is used to construct multi-view data fusion lithology scene classification models, which make comprehensive use of multi-source data to form a more complete description of the research target and extract more comprehensive key characteristics of lithology. Finally, the constructed dataset is used to evaluate the two classification models.

4.2 Methods

4.2.1 Lithologic Scene Classification Based on Multi-view Remote Sensing Data Fusion

Few studies have used domestic satellite data for the collaborative processing of multi-view remote sensing data in lithology classification. Multi-view data can describe the features of lithology more completely from multiple dimensions, and this complementary information in space or time can be combined to produce a consistent interpretation of the lithology target area. High-resolution optical remote sensing data contain rich spectral information and produce clearer images, which better distinguish the texture and structure of ground object types and are also easier for the human eye to interpret. SAR data has good penetration, which can reduce the interference of


clouds and fog in the imaging process, and make up for the imaging problem of optical image data in poor weather situations. DEM data can more intuitively describe the spatial distribution of geomorphology, which is a very important environmental factor in the process of lithology classification. In decision-level fusion, data from different views are not strongly correlated in the processing process, and are usually used to solve problems such as matching between targets (Wu, 2021). Therefore, based on multi-view data and according to the fusion strategy of data-level and feature-level, this study proposes a lithology scene classification model based on multi-view data-level fusion and a lithology scene classification model based on multi-view feature-level fusion.

4.2.1.1

Lithologic Scene Classification Model Based on Multi-view Data Level Fusion

Data-level fusion, also known as pixel-level fusion, fuses the original or pre-processed remote sensing image data. The fused data retain the complete information of the original data, which benefits subsequent analysis and processing. Because their imaging mechanisms differ, remote sensing data from different views have different advantages and characteristics. Data-level fusion combines the advantages of the different data types, exploits the complementarity of the images, and improves data quality. A lithologic scene classification model based on multi-view data-level fusion was constructed with the data-level fusion strategy and the EFFCA network as the backbone. Data-level fusion retains the original information of the multi-view data sources to the greatest extent, including a large amount of spectral, texture, shadow and terrain information that is crucial to lithology classification, and the EFFCA network then uses its strong feature extraction ability to mine the key feature information related to lithology. The structure of the network is shown in Fig. 4.2. The input end has three branches, which take scene data from the three views after spatial alignment. First, the optical data and the SAR data each pass through a 7 × 7 convolutional layer and a 3 × 3 max pooling layer, and the DEM data pass through a 1 × 1 convolutional layer and a 2 × 2 max pooling layer. The resulting feature maps are concatenated along the channel direction and fed into the EFFCA backbone network for feature extraction and classification, which outputs the predicted labels.
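The channel-direction concatenation step can be illustrated with a minimal pure-Python sketch. Feature maps are nested lists indexed as [channel][row][column]. The function names (`concat_channels`, `constant_map`), the channel counts and the 8 × 8 size are illustrative assumptions rather than the configuration used in this chapter. The size check mirrors the requirement that the three views be spatially aligned before fusion.

```python
def shape(fmap):
    """Return (channels, height, width) of a nested-list feature map."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])


def concat_channels(*fmaps):
    """Concatenate feature maps along the channel axis after checking H and W agree."""
    _, h, w = shape(fmaps[0])
    fused = []
    for fmap in fmaps:
        _, fh, fw = shape(fmap)
        if (fh, fw) != (h, w):
            raise ValueError("feature maps must share the same height and width")
        fused.extend(fmap)
    return fused


def constant_map(channels, height, width, value):
    """Build a toy feature map filled with one value (stand-in for real features)."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Toy stem outputs for the three views (channel counts are hypothetical).
optical = constant_map(4, 8, 8, 0.1)
sar = constant_map(4, 8, 8, 0.2)
dem = constant_map(2, 8, 8, 0.3)

fused = concat_channels(optical, sar, dem)
print(shape(fused))
```

In this sketch the fused tensor simply stacks the channels of the three views, so the backbone that follows sees all views at every spatial position.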

4.2.1.2

Lithologic Scene Classification Model Based on Multi-view Feature Level Fusion

Feature-level fusion of remote sensing images is an important research direction of feature-level data fusion. Generally, the data after feature extraction of remote

Fig. 4.2 Lithology scene classification model based on multi-view data level fusion (diagram labels: optical, SAR and DEM scene data; convolutional blocks + max pooling; data concatenation; attention dense blocks; fusion within dense blocks; cross-block fusion; down-sampling; global average pooling; fully connected layer; classifier; label prediction)


sensing data from different views are used for fusion. This process will select the original data and eliminate some invalid data, and the data may be more representative to the final results. Remote sensing images usually contain features such as texture and edge, and the expression form of the same feature may be different for remote sensing data from different views. In the process of feature level fusion, the model will increase the dimension of learning the same object, which makes the discrimination of the object more accurate and easier. Generally, it is necessary to use independent modules for feature extraction of different data, so the structure and parameters of the model are multiplied compared with data-level fusion. This model structure can design different models or methods for different modal data, making the extracted feature data more effective, thereby improving the effect of model classification. Another lithologic scene classification model based on multi-view feature level fusion was also constructed by using the feature level fusion strategy and EFFCA network as the backbone network. Remote sensing images with different views may contain different expressions of the same target feature. After the feature level fusion, the model can learn more dimensional information of the target feature, which makes the target identification more accurate and improves the performance of the model. The complete structure of the network is shown in Fig. 4.3. The model structure is also composed of three branches, corresponding to the remote sensing data of three kinds of views respectively. In order to ensure that each data source corresponds to the same target position, the three kinds of input view data also need to be spatially aligned. The three branches of the model use EFFCA network for feature extraction, and the different branch models are trained independently in which the parameters are not shared. 
Taking the optical remote sensing data branch as an example, the input scene data first pass through a 7 × 7 convolution layer and a 3 × 3 max pooling layer, and are then fed into the EFFCA network, which outputs the feature information of that view. The DEM and SAR branches perform the same operations. Finally, the feature information of the three branches is flattened and concatenated, and passed through the fully connected layer and the classifier layer to output the prediction result.
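The final fusion step of the feature-level model (flatten, concatenate, fully connected layer, classifier) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the branch outputs, the fixed weights and the use of softmax as the classifier layer are hypothetical, and the three EFFCA branches are replaced by hard-coded toy features.

```python
import math


def flatten(fmap):
    """Flatten a [channel][row][column] feature map into a 1-D list."""
    return [v for channel in fmap for row in channel for v in row]


def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax, used here as a stand-in classifier layer."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical branch outputs: 2 channels of 1 x 1 features per view.
optical_feat = [[[0.5]], [[1.0]]]
dem_feat = [[[0.2]], [[0.1]]]
sar_feat = [[[0.3]], [[0.4]]]

# Flatten each branch and concatenate the vectors (feature-level fusion).
fused = flatten(optical_feat) + flatten(dem_feat) + flatten(sar_feat)

num_classes = 3
# Hypothetical fixed weights; a trained model would learn these.
weights = [[0.1 * (k + 1)] * len(fused) for k in range(num_classes)]
bias = [0.0] * num_classes

probs = softmax(linear(fused, weights, bias))
predicted = probs.index(max(probs))
print(len(fused), predicted)
```

Unlike data-level fusion, the views meet only after each branch has produced its own feature vector, which is why the number of parameters grows with the number of branches.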

Fig. 4.3 Lithologic scene classification model based on multi-view feature level fusion (diagram labels: optical, DEM and SAR scene data, each followed by convolution + pooling and an EFFCA model; flattening and concatenation; fully connected layer; classifier; label prediction)

4.2.1.3

Lithologic Scene Classification Network Based on Enhanced Feature Fusion and Channel Attention

The fusion method of global features and local features can add rich shallow information such as texture details to the abstract feature information extracted by the deeper network, and improve the feature expression ability of the model. In order to deal with the problem of feature redundancy, it is also necessary to improve the ability of the model to focus on key feature information. EFFCA network adopts the enhanced dense connection network model as the backbone, and maintains the original structure design of alternating combination of 4 dense blocks and 3 transition layers. The SE channel attention module is added before each dense block, thus constituting the attention dense block. Before the feature data is input into the dense block, the channel attention module is used to assign larger weights to the key feature information useful in the final classification result to enhance its feature information, while a smaller weight is assigned to the invalid information to suppress its interference on the classification results. The process of EFFCA network for remote sensing scene data classification is as follows: Firstly, the remote sensing image data is subjected to 7 × 7 convolution and 3 × 3 Max pooling operation, and the obtained feature information is input into the first attention dense block. After the attention-dense block, a transition layer consisting of a 1 × 1 convolutional layer and a 2 × 2 average pooling layer is used to reduce the size and the number of channels of the feature map, and input to the next level of attention-dense block. Starting from the second attention dense block, its input data contains the output feature maps of all the previous attention dense blocks. These feature maps are downsampled twice or three times according to the number of spanning dense blocks, and the processed feature maps are concatenated in the channel direction. 
The output of the fourth attention-dense block will pass through the fully connected layer and the classifier layer to obtain the prediction result of the model.

Global Feature and Local Feature Fusion Method

DenseNet (Huang et al., 2017) is a high-performance neural network model that followed VGGNet and ResNet. By directly concatenating multiple features through skip connections to form a new combined feature, it can extract a large amount of feature information from the input data with a small number of model parameters. Reusing feature data across layers effectively alleviates the vanishing gradient problem during training and accelerates training. DenseNet is mainly composed of dense blocks and transition layers. Within each dense block a densely connected structure is used: the output feature maps of the different convolutional layers are concatenated directly, which requires the feature maps to have the same size. To reduce the feature map gradually as data propagate through the network, a transition layer is introduced between the dense blocks to reduce the


size of the feature map and to control its number of channels, which reduces the amount of model data and the risk of overfitting. In a deep learning network, the output of one convolutional layer is the input of the next. As the network deepens, the feature map gradually shrinks and the receptive field of the convolutional layers grows. Shallow dense blocks have a smaller receptive field and can extract local feature information of lithology, whereas deep dense blocks are conducive to extracting global feature information. Because the traditional densely connected network fuses feature data only inside a dense block, it cannot capture the feature connections between different dense blocks well. Shortcut connections between dense blocks help fuse global and local features and further strengthen the ability of the network to extract abstract features at different scales. Based on the traditional densely connected network, this study proposes a densely connected network with enhanced feature fusion by adding shortcut connections between the output feature maps of the dense blocks. Suppose that the network model consists of several dense blocks, that $B_i$ is the feature map output by the $i$th dense block, and that $B(\cdot)$ represents the set of nonlinear operations of the convolutional layers in a dense block. The input of the $i$th dense block is composed of the outputs of the 0th to the $(i-1)$th dense blocks, and the connection structure satisfies Eq. 4.1.

$$B_i = B\left(\left[B_0, B_1, \ldots, B_{i-1}\right]\right) \tag{4.1}$$

DenseNet maintains the same feature map size inside each dense block, and uses a transition layer to downsample between adjacent dense blocks by using a 1 × 1 convolution layer and a 2 × 2 average pooling layer, halving H and W, and reducing the number of channels according to compression ratio. Data fusion across dense blocks expands the data in the direction of the channel by concatenating the outputs of different dense blocks. Therefore, when performing the fusion operation across dense blocks, it may be necessary to conduct two or three down-sampling operations according to the number of dense blocks crossed to ensure that the size of the feature map in the process of stitching is consistent. In order to preserve more comprehensive data information during the fusion process, this down-sampling process does not require compression of the number of channels.
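The size matching required by cross-block fusion can be sketched in pure Python. The sketch below assumes 2 × 2 average pooling with stride 2 for every down-sampling step and omits the 1 × 1 convolution of the regular transition layer for brevity; the function names and the toy channel counts are hypothetical. The most recent block output is pooled once, and each earlier output is pooled one more time per dense block crossed, with the channel count left unchanged.

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 on a [channel][row][column] map."""
    pooled = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        pooled.append([
            [
                (channel[r][c] + channel[r][c + 1]
                 + channel[r + 1][c] + channel[r + 1][c + 1]) / 4.0
                for c in range(0, w - 1, 2)
            ]
            for r in range(0, h - 1, 2)
        ])
    return pooled


def downsample(fmap, times):
    """Apply 2x2 average pooling `times` times; channels are not compressed."""
    for _ in range(times):
        fmap = avg_pool_2x2(fmap)
    return fmap


def shape(fmap):
    """Return (channels, height, width) of a nested-list feature map."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])


def cross_block_input(block_outputs):
    """Concatenate B_0 .. B_{i-1} along the channel axis as input of block i.

    block_outputs[k] is B_k. The latest output is one resolution step above
    the target and older outputs are further away, so B_k is pooled
    (len(block_outputs) - k) times before concatenation.
    """
    n = len(block_outputs)
    fused = []
    for k, fmap in enumerate(block_outputs):
        fused.extend(downsample(fmap, n - k))
    return fused


def constant_map(channels, height, width, value):
    """Build a toy feature map filled with one value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Toy outputs of three successive dense blocks (sizes halve block by block).
b0 = constant_map(2, 16, 16, 1.0)
b1 = constant_map(3, 8, 8, 2.0)
b2 = constant_map(4, 4, 4, 3.0)

next_input = cross_block_input([b0, b1, b2])
print(shape(next_input))
```

Here `b0` crosses two dense blocks and is therefore pooled three times, which corresponds to the "two or three" down-sampling operations mentioned above.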

Extraction of Key Features from Lithology Data

The fusion of global and local features adds rich texture details and other shallow information to the abstract feature information extracted by the deeper network, and improves the feature expression ability of the model. In lithology


classification, data redundancy and related problems cause part of the data to interfere with the final experimental results, so extracting the key feature information is very important for improving the accuracy of lithology classification. To deal with feature redundancy, the ability of the model to focus on key feature information also needs to be improved. The SENet model (Hu et al., 2018) proposes the SE (Squeeze-and-Excitation) module, which pays more attention to channel information. Therefore, this study introduces the channel attention module into the densely connected network with enhanced feature fusion. The SE module takes a feature map with C channels as input and first performs the Squeeze operation, as shown in Eq. 4.2:

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \tag{4.2}$$

where $u_c$ is the $c$th channel of the feature map with C channels input to the SE module, and $F_{sq}(\cdot)$ represents the squeeze (compression) operation, which is implemented mainly by global average pooling. The feature map of each channel is compressed into a single value, giving a one-dimensional vector in which each value corresponds to one channel feature map. H and W are the height and width of the feature map, C is the number of channels, and i and j index the element position in the feature map. Next, the Excitation operation is performed: two fully connected layers process the vector $z_c$ to extract the dependencies between channels and thus obtain the weights of the feature maps of the different channels, as shown in Eq. 4.3. First, $W_1$ is multiplied with the output of the squeeze step. $W_1$ holds the coefficients of the first fully connected layer and has dimension $C/r \times C$, where the hyperparameter $r$ controls the number of channels in the intermediate result and reduces the number of model parameters. Since $z_c$ is a $1 \times 1 \times C$ vector, the output of the first fully connected layer is $1 \times 1 \times C/r$. It is followed by a ReLU activation layer, and $W_2$ is multiplied with the activation output. $W_2$ holds the coefficients of the second fully connected layer and has dimension $C \times C/r$, so the result is again a $1 \times 1 \times C$ vector. Finally, the Sigmoid function yields the final result $s_c$, which can be regarded as the weight of the feature map of each channel. Of the two fully connected layers, the first reduces the dimension and hence the number of model parameters, and the second restores the dimension so that $s_c$ matches the number of channels of the original input feature map $u_c$.

$$s_c = F_{ex}(z_c, W) = \sigma(g(z_c, W)) = \sigma(W_2\, \delta(W_1 z_c)) \tag{4.3}$$


where $F_{ex}(\cdot)$ represents the excitation operation, σ represents the Sigmoid function, δ represents the ReLU activation function, and $W_1$ and $W_2$ are the weights of the two fully connected layers. Finally, the channel reweighting (scale) operation is carried out: the result of the “excitation” operation $s_c$ is multiplied with the original feature map $u_c$. The specific implementation is shown in Eq. 4.4.

$$\tilde{X} = F_{scale}(u_c, s_c) = s_c\, u_c \tag{4.4}$$

where $F_{scale}(\cdot)$ is the scale operation that redistributes the weights of the feature channels, $u_c$ is the original feature map, and $s_c$ is the weight coefficient of the channel. The operation is implemented by channel-wise multiplication, that is, the feature map on each channel is multiplied by a single coefficient. The SE channel attention mechanism therefore includes two main steps: “compression” and “excitation”. In the “compression” step, a statistic of each channel feature map is obtained through global average pooling. In the “excitation” step, the relationships between channels are captured while the channel weights are computed, and the Sigmoid function constrains each weight coefficient to lie between 0 and 1. Finally, the channel weight coefficients are multiplied with the original input feature map to update the information in the feature map. During network training, the channel weights are adjusted to strengthen the feature information that is more effective for classification, which improves the model's ability to extract key features.
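The three steps in Eqs. 4.2 to 4.4 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the network code used in this chapter: the function names, the toy weights W1 and W2, and the choice C = 2, r = 2 are made up for the example, and bias terms are omitted as in Eq. 4.3.

```python
import math

def squeeze(u):
    """Eq. 4.2: global average pooling. u is a list of C feature maps (H x W lists)."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0])) for fmap in u]

def matvec(W, v):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def excite(z, W1, W2):
    """Eq. 4.3: s = sigmoid(W2 * relu(W1 * z)); W1 is (C/r x C), W2 is (C x C/r)."""
    hidden = [max(0.0, h) for h in matvec(W1, z)]            # ReLU, length C/r
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(W2, hidden)]  # Sigmoid, length C

def scale(u, s):
    """Eq. 4.4: multiply every element of channel c by its weight s[c]."""
    return [[[s_c * x for x in row] for row in fmap] for fmap, s_c in zip(u, s)]

def se_block(u, W1, W2):
    return scale(u, excite(squeeze(u), W1, W2))

# Toy input (assumed values): C = 2 channels of size 2 x 2, reduction r = 2.
u = [[[1.0, 3.0], [5.0, 7.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
W1 = [[0.5, 0.5]]        # shape (C/r) x C = 1 x 2
W2 = [[0.0], [1.0]]      # shape C x (C/r) = 2 x 1

z = squeeze(u)           # one descriptor per channel
s = excite(z, W1, W2)    # one weight in (0, 1) per channel
x_tilde = se_block(u, W1, W2)
print(z, s)
```

In a trained network W1 and W2 are learned, so the weights s adapt to whichever channels help classification most; here they are fixed only to make the arithmetic easy to follow.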

4.2.2 Accuracy Evaluation

Based on five repetitions of the experiment, the confusion matrix of the test set was obtained and various accuracy metrics were calculated. The mean and standard deviation of each metric were used to quantify the performance differences between classification models. In addition, ablation experiments were used to verify the effect of the two improved modules in the EFFCA network and to analyze how different combinations of view data affect classification performance. Based on the constructed model, scene-level prediction and accuracy evaluation were performed for the study area. OA, Kappa coefficient and F1_score were used as evaluation metrics. For the calculation of OA and F1_score, refer to Sect. 3.2.4. The Kappa coefficient is a statistical measure of inter-rater agreement for qualitative (categorical) items and is calculated by:

$$Kappa = \frac{p_o - p_e}{1 - p_e} \tag{4.5}$$

where $p_e$ is the probability of chance agreement and $p_o$ is the observed agreement.
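As a worked illustration of Eq. 4.5, the sketch below computes OA and Kappa from a confusion matrix in plain Python. The function names and the 2 × 2 matrix are hypothetical; rows are taken to be true classes and columns predicted classes, and chance agreement is estimated from the row and column totals, which is the usual convention.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Eq. 4.5: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)                                   # observed agreement
    row_tot = [sum(cm[i]) for i in range(n)]                     # true-class totals
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = [[40, 10],
      [20, 30]]
print(overall_accuracy(cm), kappa(cm))  # p_o = 0.7, p_e = 0.5, Kappa is about 0.4
```

The example shows why Kappa is lower than OA: half of the 70% observed agreement would be expected by chance for these class totals.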


4.3 Results and Discussion

4.3.1 Experimental Setup and Hyperparameter Optimization

The dataset of the corresponding region was randomly divided into a training set, a validation set and a test set in the ratio 6:2:2. To obtain reliable experimental results, five repeated experiments were conducted with each parameter setting, and the mean and standard deviation of the results were taken. The experimental environment is shown in Table 4.1. Different parameter choices during model training affect the results. To reduce this effect, the parameters of all related experiments were set according to Table 4.2. Because of the limited memory capacity of the graphics card, the data can only be fed to the classification network in batches during training, and the number of images per training step is controlled by batchSize. The initial learning rate was set to 0.001 and a learning rate decay strategy was introduced, that is, when the number of training epochs reached a set value, the learning rate was adjusted. In this chapter, when training reached 0.5 and 0.75 of the total number of epochs, the learning rate was decreased to 0.1 times its original value. For other parameters involved in the experiment, the default values of the experimental framework were used.

Table 4.1 Experimental environment configuration

Experimental environment                             Specific configuration
Hardware environment    CPU                          2*E5-2620V4
                        GPU                          2*GeForce RTX 2080Ti
                        Memory                       64GB
Software environment    Operating system             CentOS 7.6.1810 (Core)
                        Deep learning framework      PyTorch 1.8.1
                        GPU driver version           CUDA 10.2/cuDNN 7.6.5
                        Programming language         Python 3.8

Table 4.2 Experimental parameter settings

Parameter name                    Parameter value
Batch size                        60
Epoch                             150
Initial learning rate             0.001
Weight attenuation coefficient    0.0001
Optimizer                         Adam
Loss function                     Cross entropy loss function
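The data split and the learning rate schedule described in this subsection can be sketched as follows. This is an illustration only: the function names and the random seed are hypothetical, the milestones are rounded down to whole epochs, and the decay is assumed to compound (0.001, then 0.0001, then 0.00001), which is one reading of "0.1 times the original value".

```python
import random

def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split sample indices into train/validation/test in the ratio 6:2:2."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # seed is an assumption, for repeatability
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def learning_rate(epoch, total_epochs=150, base_lr=0.001, gamma=0.1):
    """Step decay at 0.5 and 0.75 of the total epochs (assumed to compound)."""
    milestones = [int(0.5 * total_epochs), int(0.75 * total_epochs)]  # 75 and 112
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
print([learning_rate(e) for e in (0, 74, 75, 112, 149)])
```

Repeating the split and training with five different seeds, then averaging, would correspond to the "mean ± standard deviation" values reported in the tables of this section.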


4.3.2 Experimental Result

4.3.2.1 Overall Accuracy Evaluation

The EFFCA model was evaluated on the lithology dataset constructed in the study area, and AlexNet, VGG16, ResNet101 and DenseNet121 were evaluated on the same dataset for comparison. Since these baseline models were designed for three-band data of natural scenes, the experiments in this subsection used only the optical data in the dataset. The results are shown in Table 4.3, which lists the OA, Kappa coefficient and F1_score of the different network models on the lithology scene dataset of the study area. The proposed EFFCA has the best classification performance. Among the conventional networks, DenseNet121 and ResNet101 are clearly superior to AlexNet and VGG16, probably because shortcut connections, such as those in dense blocks, enhance the transmission of features and allow lithologic characteristics to be extracted more fully. Compared with the DenseNet121 model, the EFFCA model improves the three indicators by 1.15%, 1.5% and 1.21%, respectively. This is probably because the enhanced feature fusion mechanism fuses more effective information and the attention mechanism improves the model's ability to attend to key feature information. Together they improve the model's ability to extract lithologic semantic information, and thus its classification accuracy.

Table 4.3 Classification accuracy index

Methods        OA (%)          Kappa (%)       F1_score (%)
AlexNet        61.91 ± 1.52    44.49 ± 2.08    38.38 ± 5.18
VGG16          69.86 ± 3.38    57.53 ± 5.10    58.12 ± 6.23
ResNet101      71.54 ± 1.86    58.94 ± 2.89    60.48 ± 3.17
DenseNet121    77.89 ± 2.31    68.44 ± 2.91    69.72 ± 3.12
EFFCA          79.04 ± 1.7     69.94 ± 2.41    70.93 ± 2.14

Using the same lithology dataset, two lithology scene classification models with multi-view data fusion were trained and evaluated, and the classification accuracies are shown in Table 4.4. Since optical data contains more abundant information, optical data was taken as the baseline for comparison, with the EFFCA model using only the optical data in the dataset. The feature-level fusion and data-level fusion classification models were evaluated using three kinds of view data. Table 4.4 shows that the models built with data fusion improve on all three evaluation indicators, indicating that fusing different types of low-level features and deep semantic features contributes to lithology classification. Compared with the feature-level fusion method, the results of data-level fusion in OA, Kappa and F1_score increased by 1.49%, 2.06% and 2.11%, respectively, which may be because fusion at the shallow (data) level better preserves the integrity of the different types of data sources and thereby improves classification performance.

Table 4.4 Model classification accuracy indexes of different fusion strategies

Data source type                OA (%)          Kappa (%)       F1_score (%)
EFFCA (optical data source)     79.04 ± 1.7     69.94 ± 2.41    70.93 ± 2.14
EFFCA-feature level fusion      82.11 ± 0.69    74.54 ± 0.94    74.90 ± 2.06
EFFCA-data level fusion         83.6 ± 0.44     76.6 ± 0.66     77.01 ± 1.35
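The improvements quoted in this subsection are differences between the mean values in Tables 4.3 and 4.4, expressed in percentage points. The short check below reproduces them; the dictionary keys and the helper name are hypothetical, while the numbers are copied from the two tables.

```python
# Mean (OA, Kappa, F1_score) in percent, copied from Tables 4.3 and 4.4.
results = {
    "DenseNet121": (77.89, 68.44, 69.72),
    "EFFCA": (79.04, 69.94, 70.93),
    "feature_level": (82.11, 74.54, 74.90),
    "data_level": (83.6, 76.6, 77.01),
}

def gain(model, baseline):
    """Difference in percentage points for OA, Kappa and F1_score."""
    return tuple(round(a - b, 2) for a, b in zip(results[model], results[baseline]))

print(gain("EFFCA", "DenseNet121"))         # EFFCA over DenseNet121
print(gain("data_level", "feature_level"))  # data-level over feature-level fusion
print(gain("feature_level", "EFFCA"))       # feature-level fusion over optical only
```

Because the standard deviations of several models are of the same order as these gains, the differences between neighbouring rows are best read together with the ± values in the tables.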

4.3.2.2 Single Class Accuracy Evaluation

The confusion matrices of the different models on the dataset in the study area are shown in Fig. 4.4. Figure 4.4a is the confusion matrix of the EFFCA model using only the optical data source. Except for the serious misclassification of slate, the accuracy of the other four lithologies is higher than 70%. The probability of “slate” being misclassified as “granitic rock” is 43.8%. In addition, “schist” and “diorite” are also misclassified as “granitic rock” with relatively high probabilities of 13.3% and 13.2%, respectively. The confusion matrix of the feature-level fusion classification model is shown in Fig. 4.4b. The accuracy of every lithology exceeds 68%; the classification accuracies of “slate” and “schist” are 68.8% and 68.9%, and their probabilities of being misclassified as “granitic rock” are 31.2% and 22.2%, respectively. The confusion matrix of the data-level fusion classification model is shown in Fig. 4.4c. The accuracy of every lithology also exceeds 68%; “slate” has the lowest classification accuracy, 68.8%, and its probability of being misclassified as “granitic rock” is 31.2%. Across all models, there is a high probability that other lithologies are misclassified as “granitic rock”. A possible reason is that, although measures were taken to suppress the influence of data imbalance, “granitic rock” samples still occupy a larger proportion of the dataset, which affects the experimental results. For “slate” and “diorite”, which have the lowest classification accuracy with a single data source, the two data fusion network models greatly improve performance; in particular, the classification accuracy of “slate” improves by more than 12%, bringing the classification accuracy of all lithology types above 68%.
This shows that data fusion can exploit the additional sample data, improve network accuracy and reduce the impact of the shortcomings of a single data source.
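The per-class accuracies and misclassification probabilities discussed here are the entries of a row-normalized confusion matrix. The sketch below shows the normalization in plain Python. The three-class count matrix is hypothetical (it is not the data behind Fig. 4.4) and the function names are illustrative.

```python
def normalize_rows(cm):
    """Divide each row (true class) by its total so that every row sums to 1."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out

def per_class_accuracy(cm):
    """Diagonal of the row-normalized matrix: fraction of each true class predicted correctly."""
    return [row[i] for i, row in enumerate(normalize_rows(cm))]

# Hypothetical counts; rows = true class, columns = predicted class.
classes = ["slate", "schist", "granitic rock"]
cm = [[9, 0, 7],
      [0, 12, 3],
      [1, 1, 18]]

norm = normalize_rows(cm)
acc = per_class_accuracy(cm)
for name, a, row in zip(classes, acc, norm):
    print(f"{name}: accuracy {a:.1%}, misclassified as granitic rock {row[2]:.1%}")
```

In this toy matrix, 7 of 16 slate samples land in the granitic rock column, which gives a misclassification probability of 43.75% for that row; an off-diagonal entry of a normalized confusion matrix is read in exactly this way.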

Fig. 4.4 Normalized confusion matrix of classification results of different models: (a) optical data source; (b) feature level fusion; (c) data level fusion. Axes: real label and predicted label; classes: loose accumulation, slate, schist, granite, diorite


In the feature-level fusion model, however, the classification accuracy of “schist” decreases by 11.1% compared with the single data source, from 80% to 68.9%, and its probabilities of being misclassified as “granitic rock” and “diorite” increase by 8.9% and 4.5%, respectively. The network model based on data-level fusion does not show this decline in “schist” accuracy. A possible reason is that, after deep semantic features are extracted by the separate network branches, the information from the three views becomes highly similar, whereas low-level data can still be used to distinguish among these lithology classes, which keeps the accuracy stable.

4.3.2.3 Visual Evaluation of the Whole Area

Figure 4.5 shows the results of full-map prediction in the study area using the different models. The different types of data were cropped from top to bottom and from left to right to meet the size requirements of the dataset construction, and each predicted label was assigned as the lithology category of the corresponding region. Since the dataset was constructed from data of this same region and some of the data took part in model training, the accuracy of this result map is not evaluated; only the visual effect is assessed. Figure 4.5a is the label map obtained by manual interpretation, which serves as the real label data. Figure 4.5b shows the result map classified by the EFFCA model, whose prediction used only optical data. The distribution areas of the different lithologies are displayed reasonably well, but the boundaries of schist, granitic rock and diorite are prone to misclassification, producing a serious salt-and-pepper phenomenon. Figure 4.5c, d show the prediction results of the feature-level fusion model and the data-level fusion model, respectively. Their predictions also show the distribution areas of the different lithologies well; the salt-and-pepper phenomenon is still present, but compared with the EFFCA result the misclassification in some regions is reduced. Comparing the area of black box (1) in the figure shows that prediction with data fusion is better than prediction with optical data only, with fewer salt-and-pepper pixel blocks caused by misclassification. Comparing the small black box areas in (c) and (d) shows that data-level fusion better displays the continuous distribution of loose deposits, so its classification performance is visually better. The distribution of SAR data in the study area is shown in Fig. 2.5. The SAR data are incomplete in the upper right corner, and the values read from this part after cropping during full-map prediction are 0. Comparing the area of black box (2) in the figure shows that both kinds of data fusion produce obvious misclassification where the SAR data are incomplete. As can be seen from Fig. 4.5c, with feature-level fusion the lithology there is misclassified as loose deposits, while with data-level fusion, as shown in Fig. 4.5d, it is misclassified as slate. A lithology scene classification model based on multi-view data fusion therefore needs all of its input data to meet the required conditions at prediction time; otherwise it is difficult to guarantee the performance of the model.

Fig. 4.5 Prediction graphs of different models in the study area: (a) manual label interpretation in the study area; (b) EFFCA regional forecast chart
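The full-map prediction procedure (crop tiles from top to bottom and left to right, classify each tile, assign the label to the tile's region) can be sketched as below. Everything here is a simplified assumption: the image has a single band, tiles do not overlap, edge remainders are left unlabeled, and `toy_classify` is a hypothetical stand-in for the trained model. The `is_empty` check illustrates how all-zero tiles, such as the incomplete SAR corner, could be flagged before prediction.

```python
def tile_origins(height, width, tile):
    """Yield upper-left corners of non-overlapping tiles, top to bottom, left to right."""
    for r in range(0, height - tile + 1, tile):
        for c in range(0, width - tile + 1, tile):
            yield r, c

def is_empty(patch):
    """True if every value in the patch is 0 (for example, missing SAR coverage)."""
    return all(v == 0 for row in patch for v in row)

def predict_map(image, tile, classify):
    """Assign the label predicted for each tile to every pixel of that tile."""
    h, w = len(image), len(image[0])
    label_map = [[None] * w for _ in range(h)]
    for r, c in tile_origins(h, w, tile):
        patch = [row[c:c + tile] for row in image[r:r + tile]]
        label = classify(patch)
        for rr in range(r, r + tile):
            for cc in range(c, c + tile):
                label_map[rr][cc] = label
    return label_map

def toy_classify(patch):
    """Hypothetical classifier: flags empty tiles, otherwise returns the maximum value."""
    return "no data" if is_empty(patch) else max(max(row) for row in patch)

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [5, 5, 9, 9],
         [5, 5, 9, 9]]
origins = list(tile_origins(4, 4, 2))
label_map = predict_map(image, 2, toy_classify)
print(origins)
print(label_map)
```

Flagging empty tiles instead of classifying them is one simple way to avoid the systematic misclassification that the fusion models show where one view is missing.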

4.3.3 Discussions

4.3.3.1 Ablation Experiment of EFFCA Model

In order to verify the role of the two improved modules in the EFFCA network, ablation experiments were conducted using the optical data in the study area, and the results are shown in Table 4.5. The EFFCA network without enhanced feature fusion already achieves higher results than DenseNet121, ResNet101 and the other models. On this basis, the OA, Kappa and F1_score of the EFFCA network with enhanced feature fusion increase by 0.73%, 1.17% and 0.7%, respectively. This shows that the feature maps of different dense blocks can effectively fuse feature information from shallow, middle and deep layers, and that this information is a good indicator for lithology differentiation, which improves the classification performance of the model. In addition, the channel attention mechanism also has a positive impact on classification accuracy, with OA and Kappa increasing by 0.33% and 0.52%, respectively. It may be that the channel attention mechanism enables the model to focus on the information that has a greater impact on the results and to suppress features that are not effective for lithology classification, thus improving model performance.

Table 4.5 Ablation experiments of EFFCA networks

Model           No channel attention    No enhanced feature fusion    EFFCA
OA (%)          78.71 ± 0.80            78.31 ± 0.68                  79.04 ± 1.70
Kappa (%)       69.42 ± 1.36            68.77 ± 1.07                  69.94 ± 2.41
F1_score (%)    70.92 ± 1.56            70.23 ± 2.57                  70.93 ± 2.14

Table 4.6 Comparative experiments of different attention modules

Model               OA (%)          Kappa (%)       F1_score (%)
DenseNet121         77.89 ± 2.31    68.44 ± 2.91    69.72 ± 3.12
DenseNet121+CBAM    73.57 ± 1.16    61.79 ± 1.26    62.10 ± 2.25
DenseNet121+ECA     78.44 ± 0.89    68.90 ± 1.65    70.13 ± 1.61
DenseNet121+SE      79.04 ± 1.7     69.94 ± 2.41    70.93 ± 2.14

4.3.3.2 Comparative Experiments of Different Attention Mechanisms

In order to verify the advantage of using the SE attention module in the EFFCA model, comparative experiments with several attention mechanisms were carried out on the optical data in the study area. Three attention mechanisms, CBAM, ECA and SE, were each added to the DenseNet121 model, and the results are shown in Table 4.6. Compared with the original network, the models with ECA and with SE both improve performance, and the improvement with SE is larger, whereas adding CBAM lowers all three metrics. This indicates that SE has certain advantages as the attention mechanism of the model.

4.3.3.3 Ablation Experiments of Two Data Fusion Models

In order to analyze the effect of different combinations of view data on classification performance, ablation experiments on the two data fusion models were carried out using the datasets in the study area. Since optical data has the largest number of channels and contains richer lithology information, optical data was used as the main data and DEM and SAR data as auxiliary classification data, and the results of multi-view fusion were compared with those using only optical data. Table 4.7 shows the results of the ablation comparison experiment for feature-level data fusion. The “optical + DEM” and “optical + SAR” fusion methods both improve the classification accuracy of the model compared with optical data alone, and the “optical + SAR” combination achieves the higher OA and Kappa of the two. It may be that adding DEM or SAR data increases the dimensionality of the information that the model extracts for the same lithology, which improves classification accuracy. SAR contains dual-polarization data in two channels, and different polarization modes are sensitive to different information when identifying ground objects, which makes the information in SAR richer, so its combination effect is better than that of the single-channel DEM. The three-view data fusion model improves further on this basis: OA, Kappa and F1_score increase by 2.28%, 3.38% and 2.38%, respectively, over “optical + SAR”. This indicates that the classification performance of the multi-view fusion model is positively correlated with the number of types of data sources. It is possible that increasing the types of data sources increases the dimensionality and information content of the data, so that after feature extraction and fusion the useful feature information is more abundant, which favors the extraction of key lithology features. Table 4.8 shows the results of the ablation comparison experiment for data-level data fusion. As can be seen from the table, the “optical + DEM” and “optical + SAR” fusion modes improve the classification performance of the model compared with a single data source. Among them, the combination of “optical + DEM” increases OA, Kappa and F1_score by 1.66%, 2.88% and 1.99%, respectively, and the combination of “optical + SAR” increases them by 1.94%, 2.88% and 3.41%, respectively. This shows that DEM data can improve the classification accuracy of the model through its expression of geomorphology, and that the rich texture information in SAR data also helps in classifying different lithologies.

Table 4.7 Multi-view data fusion ablation experiment (feature level fusion)

Data source type    OA (%)          Kappa (%)       F1_score (%)
Optics              79.04 ± 1.7     69.94 ± 2.41    70.93 ± 2.14
Optics+DEM          79.24 ± 0.88    70.43 ± 1.30    72.85 ± 1.76
Optics+SAR          79.83 ± 0.80    71.16 ± 1.10    72.52 ± 2.43
DEM+SAR             68.96 ± 1.47    54.33 ± 1.76    53.85 ± 1.62
Optics+DEM+SAR      82.11 ± 0.69    74.54 ± 0.94    74.90 ± 2.06


Table 4.8 Multi-view data fusion ablation experiment (data level fusion)

Data source type    OA (%)          Kappa (%)       F1_score (%)
Optics              79.04 ± 1.7     69.94 ± 2.41    70.93 ± 2.14
Optics+DEM          82.11 ± 0.70    74.44 ± 0.96    74.47 ± 1.43
Optics+SAR          82.39 ± 1.50    74.86 ± 2.09    75.89 ± 2.55
DEM+SAR             71.99 ± 0.97    59.45 ± 1.57    61.25 ± 3.44
Optics+DEM+SAR      83.6 ± 0.44     76.66 ± 0.66    77.01 ± 1.35

After fusing the data of the three views, the model improves further over the “optical + SAR” combination, and the three indicators increase by 1.21%, 1.8% and 1.12%, respectively. It may be that each data type has different advantages in classifying different lithology types, and these advantages are combined in the process of multi-view fusion, further improving the classification effect. At the same time, the classification effect of the “DEM + SAR” combination is lower than that of optical data alone, probably because these data contain little information, making it difficult for the model to extract key feature information. This indicates that taking optical data as the main data in lithology classification and using SAR and DEM data as supplementary data is the better scheme.
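The difference between the two fusion strategies compared in Tables 4.7 and 4.8 can be illustrated with a toy sketch. It is not the EFFCA architecture: `toy_features` is a hypothetical stand-in for a network branch (it just averages each channel), and the three optical channels are an assumption for the example. The single DEM channel and the two SAR polarization channels follow the text.

```python
def data_level_fusion(optical, dem, sar):
    """Stack the channels of all views into one input for a single network."""
    return optical + dem + sar  # concatenation along the channel axis

def toy_features(channels):
    """Hypothetical branch: one feature per channel (its mean value)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]

def feature_level_fusion(optical, dem, sar, extract=toy_features):
    """Run one branch per view, then concatenate the extracted feature vectors."""
    return extract(optical) + extract(dem) + extract(sar)

# Toy 2 x 2 patches: 3 optical channels (assumed), 1 DEM channel, 2 SAR channels.
optical = [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[3, 3], [3, 3]]]
dem = [[[10, 10], [10, 10]]]
sar = [[[0, 2], [2, 0]], [[4, 4], [4, 4]]]

stacked = data_level_fusion(optical, dem, sar)     # 6-channel input patch
fused = feature_level_fusion(optical, dem, sar)    # 6-element feature vector
print(len(stacked), fused)
```

In data-level fusion the network sees the raw values of every view side by side from the first layer, whereas in feature-level fusion each view is summarized by its own branch before the views are combined, which is the distinction the ablation results are probing.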

4.4 Conclusion

In view of the difficulty of extracting key features in deep-learning-based remote sensing lithology classification in covered areas, this study conducted research from two aspects, feature extraction and model transfer, and constructed a lithologic scene classification model based on multi-view data fusion. A densely connected network with enhanced feature fusion was used to extract and fuse multi-scale data features, and a channel attention mechanism was added to assign weights to different channels, improving the model's ability to focus on key feature information. Multi-view data was then used to provide a more complete feature description of the target, and two different data fusion strategies were used to construct models to extract feature information. Experiments show that the enhanced feature fusion and attention mechanisms added to the model both contribute to improved model performance. After the introduction of multi-view data, the classification accuracy of the model improved further, and both kinds of data fusion achieve better performance than the conventional networks. The OA, Kappa and F1_score of the EFFCA-feature level fusion model on the test set were 82.11 ± 0.69%, 74.54 ± 0.94% and 74.90 ± 2.06%, respectively. The OA, Kappa and F1_score of the EFFCA-data level fusion model on the test set were 83.6 ± 0.44%, 76.6 ± 0.66% and 77.01 ± 1.35%, respectively. Compared with feature-level fusion, the results of data-level fusion in OA, Kappa and F1_score increased by 1.49%, 2.06% and 2.11%, respectively, indicating that fusion at the shallow (data) level better retains the integrity of the different types of data sources. These data better distinguish different lithology types and thus improve classification performance. The performance of the model is positively correlated with the number of data sources fused.

The lithology classification of covered areas is a challenging research direction. This paper fuses multi-view remote sensing data and improves the classification model, addressing current lithology classification from the two aspects of data and model, but shortcomings remain that need further research:

(1) The lithology classification model based on feature-level data fusion proposed in this paper uses the same model structure in each branch, but the amount of data in the DEM and SAR branches is relatively small, and whether the branch model structure is too complex was not studied in depth. For branches with less data, the number of dense blocks could be reduced to lower the risk of overfitting.

(2) Constructing a multi-scene remote sensing dataset. The interpretation of lithology differs among polar environments, high-altitude exposed areas and alpine frozen soil areas. To make large-scale regional interpretation possible, a large amount of regional geological data can be used to construct more diverse and representative lithology datasets, enriching the quantity and quality of lithology samples in the dataset.

(3) Deep learning models are mainly data-driven, and their performance is affected by the distribution and quantity of sample data. Traditional sample augmentation relies mainly on geometric operations such as image rotation and flipping. In the future, information related to the imaging mechanism of remote sensing image data can be added to the sample generation process to improve the quality and representativeness of the images.

References

Bengio, Y., Courville, A., & Vincent, P. (2012). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Cardoso-Fernandes, J., Silva, J., Perrotta, M. M., et al. (2021). Interpretation of the reflectance spectra of lithium (Li) minerals and pegmatites: A case study for mineralogical and lithological identification in the Fregeneda-Almendra Area. Remote Sensing, 13(18), 3688.
Chen, W., Ouyang, S., Tong, W., et al. (2022). GCSANet: A global context spatial attention deep learning network for remote sensing scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 1150–1162.
Dong, Y., & Zhang, Q. (2019). Deep semantic feature extraction of high-resolution remote sensing images based on CNN. Remote Sensing Technology and Application, 34(1), 1–11 (in Chinese).
Fu, G.-M., Yan, J.-Y., Zhang, K., et al. (2017). Current status and progress of lithology identification technology. Progress in Geophysics, 32(1), 26–40.


Gaffey, S. J. (1986). Spectral reflectance of carbonate minerals in the visible and near infrared (0.35–2.55 microns); calcite, aragonite, and dolomite. American Mineralogist, 71(1–2), 151–162.
Han, X. (2016). Research on collaborative image lithology enhancement and extraction method in metamorphic rock area. Hangzhou Normal University (in Chinese).
He, D.-C., & Wang, L. (1990). Recognition of lithological units in airborne SAR images using new texture features. Remote Sensing, 11(12), 2337–2344.
He, H., Yang, X., Li, Y., et al. (2010). Multi-source data fusion technique and its application in geological and mineral survey. Journal of Earth Science and Environment, 32(1), 44–47.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7132–7141).
Huang, G., Liu, Z., Van Der Maaten, L., et al. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708).
Huang, X., & Zhang, L. (2011). A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery. Photogrammetric Engineering and Remote Sensing, 77(7), 721–732.
Huang, Y., Li, P., & Li, Z. (2003). Application of image texture based on geostatistics in lithology classification. Remote Sensing for Land and Resources, 15(4), 45–49 (in Chinese).
Hunt, G. R., & Ashley, R. P. (1979). Spectra of altered rocks in the visible and near infrared. Economic Geology, 74(7), 1613–1629.
Jakob, S., Bühler, B., Gloaguen, R., et al. (2015). Remote sensing based improvement of the geological map of the Neoproterozoic Ras Gharib segment in the Eastern Desert (NE–Egypt) using texture features. Journal of African Earth Sciences, 111, 138–147.
Jia, H., Liu, L., Wei, B., Zhang, M., Wu, Y., & Zhang, H. (2017). Automatic extraction of bridges with shape and texture characteristics using high resolution SAR images. Bulletin of Surveying and Mapping, 12, 82.
Li, M., Tang, Z., Tong, W., et al. (2021). A multi-level output-based DBN model for fine classification of complex geo-environments area using ziyuan-3 TMS imagery. Sensors, 21(6), 2089.
Li, P. (2004). Lithology classification using ASTER image and geostatistical texture. Journal of Mineralogy and Petrology, (3), 117–121 (in Chinese).
Li, X., Tang, Z., Chen, W., et al. (2019). Multimodal and multi-model deep fusion for fine classification of regional complex landscape areas using ZiYuan-3 imagery. Remote Sensing, 11(22), 2716.
Li, X., Wu, C., Chen, W., et al. (2019). Remote sensing intelligent interpretation technology of military geological body. Science Press, 29 (in Chinese).
Liu, H., Wu, K., Xu, H., et al. (2021). Lithology classification using TASI thermal infrared hyperspectral data with convolutional neural networks. Remote Sensing, 13(16), 3117.
Liu, H., Wu, Y., Cao, Y., et al. (2020). Well logging based lithology identification model establishment under data drift: A transfer learning method. Sensors, 20(13), 3643.
Ma, D., & Li, P. (2008). Lithology classification with multi-scale image texture. Acta Petrologica Sinica, 24(6), 1425–1430 (in Chinese).
Ni, L., & Wu, H. (2019). Mineral identification and classification by combining use of hyperspectral VNIR/SWIR and multispectral TIR remotely sensed data. In IGARSS 2019—2019 IEEE international geoscience and remote sensing symposium (pp. 3317–3320).
Othman, A. A., & Gloaguen, R. (2014). Improving lithological mapping by SVM classification of spectral and morphological features: The discovery of a new chromite body in the Mawat ophiolite complex (Kurdistan, NE Iraq). Remote Sensing, 6(8), 6867–6896.
Ouyang, Y. (2017). Remote sensing image scene classification based on convolutional neural network. Hunan University (in Chinese).
Pal, M., Rasmussen, T., & Porwal, A. (2020). Optimized lithological mapping from multispectral and hyperspectral remote sensing images using fused multi-classifiers. Remote Sensing, 12(1), 177.


Pan, W., Ni, G., & Li, H. (2009). Study on multifractal characteristics of rocks based on topographic structure-lithology component decomposition of remote sensing images. Earth Science Frontiers, 16(6), 248–256 (in Chinese).
Perez, C. A., Estévez, P., Vera, P. A., et al. (2011). Ore grade estimation by feature selection and voting using boundary detection in digital image analysis. International Journal of Mineral Processing, 101(1), 28–36.
Pesaresi, M., & Gerhardinger, A. (2011). Improved textural built-up presence index for automatic recognition of human settlements in arid regions with scattered vegetation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 4(1), 16–26.
Pesaresi, M., Gerhardinger, A., & Kayitakire, F. (2008). A robust built-up area presence index by anisotropic rotation-invariant textural measure. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 1(3), 180–192.
Seid, A., & Suryanarayana, T. (2021). Identification of lithology and structures in Serdo, Afar, Ethiopia using remote sensing and Gis techniques. International Journal of Geoinformatics and Geological Science, 8(1), 27–41.
Tian, T., Li, L., Chen, W., et al. (2021). SEMSDNet: A multiscale dense network with attention for remote sensing scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 5501–5514.
Tong, W., Chen, W., Han, W., et al. (2020). Channel-attention-based DenseNet network for remote sensing image scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 4121–4132.
Vignesh, K. M., & Kiran, Y. (2020). Comparative analysis of mineral mapping for hyperspectral and multispectral imagery. Arabian Journal of Geosciences, 13(4), 1–12.
Wang, J. (2020). Research on lithology identification technology of aerial hyperspectral remote sensing based on machine learning. Beijing Institute of Geology, Nuclear Industry (in Chinese).
Wang, S., Fan, S., Pei, Q., et al. (2021). Application of multispectral and hyperspectral remote sensing lithology interpretation in the investigation of Sichuan-Tibet railway: A case study of Yongba area in Nujiang Valley, Southeast Tibet. Journal of Engineering Geology, 29(2), 445–453 (in Chinese).
Wang, W., & Cheng, Q. (2008). Mapping mineral potential by combining multi-scale and multisource geo-information. In IGARSS 2008 - 2008 IEEE international geoscience and remote sensing symposium (pp. II-1321–II-1324).
Wang, X., & Fan, Y. (2021). Hyperspectral image classification based on improved DenseNet and spatial-spectral attention mechanism. Laser and Optoelectronics Progress, 59(2), 0210014 (in Chinese).
Wang, Z., Zuo, R., & Jing, L. (2020). Fusion of geochemical and remote-sensing data for lithological mapping using random forest metric learning. Mathematical Geosciences, 53(6), 1125–1145.
Wu, C., Li, X., Chen, W., et al. (2020). A review of geological applications of high-spatial-resolution remote sensing data. Journal of Circuits, Systems and Computers, 29(6), 2030006.
Wu, Y. (2021). Research and design of multimodal fusion sensing technology. University of Electronic Science and Technology of China (in Chinese).
Xie, J., Li, Y., Li, H., & Wu, X. (2017). Recognition of damage buildings in hollow village based on texture feature of gray level co-occurrence matrix. Bulletin of Surveying and Mapping, 12, 90.
Yang, L., Hu, L., Luo, T., et al. (2009). Analysis of several commonly used remote sensing image feature extraction technology. China High-Tech Enterprises, (1), 131–132 (in Chinese).
Ye, B., Tian, S., Cheng, Q., et al. (2020). Application of lithological mapping based on advanced hyperspectral imager (AHSI) imagery onboard Gaofen-5 (GF-5) satellite. Remote Sensing, 12(23), 3990.
Zhang, Y., Chen, Z., Zhang, F., et al. (2016). Remote sensing image classification based on stacked denoising autoencoder. Computer Application, 36(A02), 171–174 (in Chinese).
Zhao, H., Zhang, L., Zhao, X., et al. (2016). A new method of mineral absorption feature extraction from vegetation covered area. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016, 5437–5440.

Chapter 5

Geological Lithology Semantic Segmentation Based on Deep Learning Method

Abstract Remote sensing data have been widely used in geological research. Geological tasks such as lithological and soil type mapping can be carried out by applying classification approaches to remote sensing images. Deep learning has achieved remarkable results in processing remote sensing data, as it usually yields better accuracy than traditional methods. This chapter proposes deep learning models for semantic segmentation of remote sensing images for geological mapping: Mobilenet-based U-Net, Mobilenet-based PSPNet and SegNet. The experiments were conducted on a randomly cropped dataset and a partially overlapped cropped dataset built for the study site of Suiyang Town, China. The results indicate that the pixel accuracies of the three models are all higher than 90%, and about 95% in most cases, on the randomly cropped dataset, and about 60% on the partially overlapped cropped dataset. The Mobilenet-based PSPNet shows the best performance on the partially overlapped cropped dataset. The proposed models address the low accuracy of traditional methods and achieve better performance for geological mapping tasks.

5.1 Introduction

Remote sensing based geological mapping is an important task of modern geological survey. Its efficiency and accuracy significantly affect follow-up ground investigations in fields such as mineral geology, hydrogeology and geological disasters (Ji et al., 2021; Pour et al., 2018). Lithology classification is a key step in remote sensing geological mapping (Yan et al., 2015). Traditional lithological classification methods often require field surveys, which are difficult to conduct in study areas with complex topography and climate. In addition, differences in the professional skill of mapping technicians can affect the quality of mapping to some extent (Ji et al., 2021). Over the past few years, a number of lithology classification methods based on machine learning (ML) algorithms have been proposed. Cracknell and Reading compared the performance of five machine learning algorithms in lithology classification and found that random forest (RF) was the best choice for classification with

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_5


multi-dimensional data sources (Cracknell & Reading, 2014). Harris and Grunsky (2015) evaluated an RF-based lithologic mapping method with different training strategies. Othman and Gloaguen (2017) and Kuhn et al. (2018) also studied the application of machine learning to lithology classification. However, ML-based lithology classification has seen little further improvement because of the inherent limitations of pixel-based ML techniques. Deep learning (DL), with its strong feature extraction ability, allows the accuracy of remote sensing image classification to be significantly improved (Liu, 2019). In recent years, with the rapid development of computer technology and artificial intelligence, DL methods represented by convolutional neural networks (CNN) have achieved great breakthroughs. The Fully Convolutional Network (FCN) proposed by Long et al. (2015) pioneered the application of fully convolutional neural networks to semantic segmentation of image data, followed by U-Net (Ronneberger et al., 2015), SegNet (Badrinarayanan et al., 2017) and PSPNet (Zhao et al., 2017). These classical models have been used for semantic segmentation of remote sensing images and show good performance. In this chapter, three deep learning models, namely Mobilenet-based U-Net, Mobilenet-based PSPNet and SegNet, were constructed and compared for semantic segmentation of lithology and other ground surface substrates from remote sensing images. The models were trained on the rock and soil type dataset built by our laboratory from remote sensing images of Suiyang Town, China. The flowchart of the study is shown in Fig. 5.1. This study contributes a novel framework for DL-based geological mapping and provides a more accurate lithologic classification map of this region to support the decision-making of local stakeholders.

5.2 Methods

5.2.1 The Utilized Algorithms

5.2.1.1 Mobilenet-Based U-Net

U-Net's feature extraction network is a VGG16 model with a large number of parameters. To reduce the number of parameters in the network, Mobilenet was used to replace VGG16. The Mobilenet series has been widely used in recent years and is a representative family of lightweight networks (Gao et al., 2021). The structure of the Mobilenet-based U-Net is shown in Fig. 5.2.
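The saving in parameters can be illustrated with a short calculation. The sketch below compares the number of weights in a standard convolution with that in a depthwise separable convolution, which is the building block of Mobilenet. The layer size (3 × 3 kernel, 256 input and 256 output channels) is a hypothetical value chosen only for illustration and is not taken from the networks used in this chapter.

```python
def standard_conv_params(k, c_in, c_out):
    """Number of weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Number of weights in a k x k depthwise convolution followed by
    a 1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, depthwise separable: {sep}, ratio: {std / sep:.2f}")
```

For a 3 × 3 kernel the ratio approaches 9 as the number of output channels grows, which is why replacing a VGG-style encoder with Mobilenet reduces the model size substantially.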

5.2.1.2 SegNet

SegNet is an image segmentation model that consists of an encoder, a corresponding decoder, and a final pixel-by-pixel classification layer.


Fig. 5.1 Flow chart of the study

The encoder is composed of 4 downsampling layers and the decoder of 4 upsampling layers, which makes the network symmetrical (Badrinarayanan et al., 2017). The SegNet model structure is shown in Fig. 5.3.
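In the original SegNet of Badrinarayanan et al. (2017), each decoder stage upsamples by reusing the max-pooling indices stored by the matching encoder stage. The chapter does not state whether the implementation used here keeps this detail, so the following NumPy sketch should be read only as an illustration of that published mechanism on a single 2-D feature map. The function names and the 4 × 4 example array are hypothetical.

```python
import numpy as np


def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling on a 2-D array.

    Returns the pooled values and, for each window, the flat index of the
    maximum in the input array x.
    """
    h, w = x.shape
    ph, pw = h // size, w // size
    # Rearrange so that every size x size window becomes one row of length size*size.
    windows = (x[:ph * size, :pw * size]
               .reshape(ph, size, pw, size)
               .transpose(0, 2, 1, 3)
               .reshape(ph, pw, size * size))
    local = windows.argmax(axis=-1)
    pooled = windows.max(axis=-1)
    # Convert the position inside each window to a flat index into x.
    rows = np.arange(ph)[:, None] * size + local // size
    cols = np.arange(pw)[None, :] * size + local % size
    return pooled, rows * w + cols


def max_unpool(pooled, indices, out_shape):
    """Place each pooled value back at its recorded position; all other cells are zero."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out


# Hypothetical 4 x 4 feature map.
x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [1, 1, 4, 0]], dtype=float)
pooled, indices = max_pool_with_indices(x)
restored = max_unpool(pooled, indices, x.shape)
print(pooled)
print(restored)
```

The unpooled map is sparse; in SegNet it is then densified by the convolution layers of the decoder.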


Fig. 5.2 Structure of Mobilenet-based U-Net

Fig. 5.3 Structure of SegNet

5.2.1.3 Mobilenet-Based PSPNet

PSPNet is an improved semantic segmentation network based on FCN. Its core idea is to introduce more global context information into the segmentation layer when discriminating small local targets. This information reduces the probability of misrecognition (Liu et al., 2020). The feature extraction part of PSPNet was originally a CNN network. We replaced this CNN network with Mobilenet to reduce the number of parameters. The structure of the Mobilenet-based PSPNet used here is shown in Fig. 5.4.
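The way PSPNet injects global context can be sketched with its pyramid pooling module. The NumPy example below follows the bin sizes 1, 2, 3 and 6 described by Zhao et al. (2017). It is a simplified illustration under stated assumptions: the 1 × 1 convolutions applied to each branch are omitted, nearest-neighbour upsampling is used in place of bilinear interpolation, and the feature map size (8 channels, 12 × 12) is hypothetical.

```python
import numpy as np


def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map into a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.zeros((c, bins, bins), dtype=feat.dtype)
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins   # ceiling of (i + 1) * h / bins
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(grid, h, w):
    """Nearest-neighbour upsampling of a (C, b, b) grid to (C, H, W)."""
    b = grid.shape[1]
    rows = (np.arange(h) * b) // h
    cols = (np.arange(w) * b) // w
    return grid[:, rows][:, :, cols]


def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with context maps pooled at several scales."""
    _, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)


# Hypothetical feature map: 8 channels, 12 x 12 pixels.
rng = np.random.default_rng(0)
feat = rng.random((8, 12, 12))
fused = pyramid_pooling(feat)
print(fused.shape)
```

The branch with a single bin carries the global average of every channel to each pixel, which is the "more global information" that helps to disambiguate small local targets.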


Fig. 5.4 Structure of Mobilenet-based PSPNet

5.2.2 Evaluation Metrics

Pixel accuracy (PA), Kappa, F1-score, Recall and mean intersection over union (MIoU) were used as the evaluation metrics of the semantic segmentation task in this study. PA is the ratio of the number of correctly classified pixels, summed over all categories, to the total number of pixels. IoU quantifies the overlap between the target mask and the prediction mask. It is the ratio of the number of pixels in the intersection of the target mask and the prediction mask to the number of pixels in their union. The IoU of a class can be calculated by Eq. 5.1:

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{5.1}$$

where TP, FP and FN are the numbers of true positive, false positive and false negative pixels of that class.

MIoU is the average of the IoU over all classes. Recall is the proportion of the pixels of a class that are correctly predicted as that class. The Kappa coefficient is used for consistency testing and measures classification accuracy after accounting for agreement expected by chance. The F1-score was originally a statistical measure of the accuracy of a binary classification model. It takes into account both the precision and the recall of the model, and it can be extended to multi-class problems.
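These definitions can all be computed from a single confusion matrix. The sketch below is one minimal way to do so. It assumes macro averaging over classes for Recall and F1, which the chapter does not specify, and it assumes that every class occurs in both the labels and the predictions so that no denominator is zero. The 2 × 2 confusion matrix is a hypothetical example, not data from this study.

```python
import numpy as np


def segmentation_metrics(cm):
    """Compute PA, per-class IoU, MIoU, Recall, F1 and Kappa.

    cm[i, j] is the number of pixels whose true class is i and whose
    predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp      # belonging to the class but predicted as another
    pa = tp.sum() / total
    iou = tp / (tp + fp + fn)     # Eq. 5.1, evaluated per class
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (pa - pe) / (1 - pe)
    return {
        "PA": pa,
        "IoU": iou,
        "MIoU": iou.mean(),
        "Recall": recall.mean(),  # macro average (assumption)
        "F1": f1.mean(),          # macro average (assumption)
        "Kappa": kappa,
    }


# Hypothetical two-class confusion matrix with 100 pixels.
cm = [[50, 10],
      [5, 35]]
metrics = segmentation_metrics(cm)
for name, value in metrics.items():
    print(name, value)
```

Because MIoU, Recall and F1 are averaged over classes here, a rare class that is poorly predicted lowers them much more than it lowers PA.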


5.3 Results

5.3.1 Experiment Setup

The experiment was conducted on Windows 10. The graphics card is an NVIDIA GeForce GTX1050 (2 GB), and the running memory is 8 GB. The DL framework used is Tensorflow. The relevant hardware and software information is shown in Table 5.1. Model training adopted the idea of transfer learning and was divided into two stages: freezing training followed by thawing (unfreezing) training. The training parameters are shown in Table 5.2. Other settings include the following. “Checkpoint” was used to set the details of saving the weights, and “period” specified how many epochs elapse between saves of the weight file. “reduce_lr” was used to set the learning rate reduction method. “Early_stopping” was used to stop training early: training stops automatically if “val_loss” does not decrease for a certain number of epochs, which indicates that the model has been well trained. This operation can also prevent overfitting.

Table 5.1 Related configuration
Hardware and environment                          Version
Running memory 8G, NVIDIA GeForce GTX1050 (2G)    Cuda = 10.1, cudnn = 7.6.5
Deep learning framework                           Tensorflow-gpu = 2.2.0
IDE                                               Pycharm2020
Python                                            Python = 3.7
System environment                                Windows10

Table 5.2 Parameter information
Parameter        Value
Epochs           Freeze training for 30 epochs; unfreeze training continues from the end of freeze training to the 70th epoch
Batch_size       4
Learning rate    Freeze training: 0.001; unfreeze training: 0.0001
Loss             Cross entropy loss function
Optimizer        Adam
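The two-stage schedule of Table 5.2 and the early stopping rule described above can be summarised in a few lines of plain Python. This is a framework-independent sketch, not the training script used in this study. The epoch counts and learning rates come from Table 5.2, while the patience value, the function names and the validation losses are hypothetical, since the chapter only states that training stops after "a certain number of epochs" without improvement.

```python
def learning_rate(epoch, freeze_epochs=30, freeze_lr=1e-3, unfreeze_lr=1e-4):
    """Learning rate for a 0-based epoch index under the schedule of Table 5.2."""
    return freeze_lr if epoch < freeze_epochs else unfreeze_lr


def backbone_frozen(epoch, freeze_epochs=30):
    """True while the pretrained feature extraction network is kept fixed."""
    return epoch < freeze_epochs


def early_stop_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs (patience is hypothetical).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, used only to exercise the rule.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.73]
stop_at = early_stop_epoch(losses, patience=3)
print("stops at epoch", stop_at)
print("lr at epochs 29 and 30:", learning_rate(29), learning_rate(30))
```

A small patience ends training sooner but risks stopping during a temporary plateau, which matters for curves that fluctuate strongly early in training.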


5.3.2 Model Performance

5.3.2.1 Training Situation

We use the accuracy and loss curves of the training set and the validation set to describe the training of the models (Figs. 5.5, 5.6 and 5.7). When the validation loss stops declining, the model is considered to be well trained, and the early stopping module ends training at that point. The number of epochs at which training stopped therefore differs between datasets, as for the dataset with bands [1,2,3,4] in the U-Net model. Training stops early on a dataset because the model has learned enough information from that dataset. Here train_acc and val_acc refer to the accuracy on the training set and the validation set, respectively. Ideally, the accuracy increases during training and then levels off once the model is well trained, while the loss keeps decreasing and then also levels off. Large fluctuations in the loss and accuracy indicate that there may be problems in training, which may be caused by poor-quality samples or unsuitable hyperparameters. For example, the curves of SegNet first fluctuate greatly and then become stable. The reason may be that some hyperparameter settings of the freezing training stage are inappropriate for this model.

5.3.2.2 Model Evaluation

The experimental results are shown in Table 5.3. The PA of the three models on the randomly cropped dataset is higher than 90%. On the partially overlapped cropped dataset, the PA values of the three models are all higher than 55% on average, and some scenarios exceed 60%. On the same dataset, the PA of SegNet is about 1–2% higher than that of the Mobilenet-based U-Net, while the F1-score, Kappa, MIoU and Recall of SegNet are sometimes higher and sometimes lower than those of the Mobilenet-based U-Net. On the partially overlapped cropped dataset, the PA of the Mobilenet-based PSPNet is higher than 60% and even close to 70%, and its F1-score, Kappa, MIoU and Recall are higher than those of the other two models. In addition, the classification accuracy of the Mobilenet-based PSPNet is better than that of the other two models on the 4-band dataset.

5.3.3 Visual Assessment

The following two sub-sections show part of the prediction maps for the partially overlapped cropped dataset and for the randomly cropped dataset.


Fig. 5.5 Training process of Mobilenet-based U-Net

5.3.3.1 Prediction on Partially Overlapped Cropped Dataset

It can be seen from Figs. 5.8, 5.9 and 5.10 that, on the partially overlapped cropped dataset, some categories have higher error rates than others. Categories that occupy a higher proportion of the labels, such as Granite diorite, Andesite and Quaternary loose deposits, have higher classification accuracy. Categories that occupy a lower proportion of the labels, such as Slate and water, have lower classification accuracy. Overall, there is a big gap between the predicted images and the actual labels.


Fig. 5.6 Training process of SegNet

5.3.3.2 Prediction on Randomly Cropped Dataset

It can be seen from Figs. 5.11, 5.12 and 5.13 that the images predicted by the models on the randomly cropped dataset have high accuracy and differ little from the actual labels. Each category was well predicted.

Fig. 5.7 Training process of Mobilenet-based PSPNet

5.4 Discussions

There is a significant difference in model performance between the datasets obtained by the two cropping strategies. On the partially overlapped cropped dataset, the PA of the three models ranges from about 55% to nearly 70%. However, for the randomly cropped dataset, the PA of all models exceeded 90%. There could be two reasons for this phenomenon: (1) The number of images in the training set differs. The training set of the randomly cropped dataset has 494 samples, while that of the partially overlapped cropped dataset has only 216 samples even with data augmentation. (2) The proportions of overlapped area differ. There is less overlap in the partially overlapped cropped dataset. In remote sensing images with an imbalanced proportion of categories, some categories with a small proportion in the training set exist in only a few samples. Therefore, the prediction accuracy of these categories can be lower. In the randomly cropped dataset, it can be seen from the cropping center point diagram above that there are large overlaps. As

Table 5.3 Evaluation metrics of proposed DL models
Model                    Dataset             PA              F1              Kappa           MIoU            Recall
Mobilenet-based U-Net    [3,2,1] part        60.44 ± 1.04    25.43 ± 0.88    61.41 ± 1.00    16.79 ± 0.65    25.73 ± 0.99
Mobilenet-based U-Net    [4,3,2] part        59.33 ± 0.72    31.46 ± 3.65    60.90 ± 0.60    19.7 ± 1.69     34.09 ± 5.77
Mobilenet-based U-Net    [1,2,3,4] part      56.55 ± 1.83    24.99 ± 2.71    57.89 ± 1.51    15.12 ± 1.15    28.51 ± 5.03
Mobilenet-based U-Net    [3,2,1] random      95.22 ± 0.22    91.10 ± 0.44    95.22 ± 0.22    84.77 ± 0.69    92.64 ± 0.48
Mobilenet-based U-Net    [4,3,2] random      95.69 ± 0.15    92.08 ± 0.25    95.68 ± 0.16    87.47 ± 2.47    93.35 ± 0.13
Mobilenet-based U-Net    [1,2,3,4] random    92.53 ± 0.47    87.22 ± 0.81    92.54 ± 0.47    78.70 ± 1.19    89.96 ± 0.64
SegNet                   [3,2,1] part        62.61 ± 1.08    26.51 ± 1.95    61.83 ± 2.45    17.67 ± 1.12    26.25 ± 2.80
SegNet                   [4,3,2] part        61.77 ± 1.14    28.21 ± 0.77    63.20 ± 0.99    18.33 ± 0.66    30.71 ± 1.26
SegNet                   [1,2,3,4] part      56.94 ± 8.97    22.84 ± 1.80    54.44 ± 11.27   14.97 ± 2.81    21.29 ± 2.32
SegNet                   [3,2,1] random      96.59 ± 0.05    92.32 ± 0.09    96.59 ± 0.05    87.20 ± 0.14    93.66 ± 0.13
SegNet                   [4,3,2] random      96.68 ± 0.13    92.75 ± 0.26    96.68 ± 0.13    87.80 ± 0.44    93.99 ± 0.26
SegNet                   [1,2,3,4] random    94.76 ± 1.08    88.46 ± 2.21    94.76 ± 1.08    81.31 ± 3.28    91.35 ± 1.44
Mobilenet-based PSPNet   [3,2,1] part        67.54 ± 1.70    33.75 ± 1.41    66.49 ± 1.65    20.42 ± 1.02    39.15 ± 2.43
Mobilenet-based PSPNet   [4,3,2] part        61.98 ± 1.40    28.53 ± 2.11    61.36 ± 1.50    18.34 ± 0.94    28.13 ± 1.85
Mobilenet-based PSPNet   [1,2,3,4] part      66.68 ± 1.25    32.14 ± 2.06    65.27 ± 0.72    20.64 ± 2.32    35.06 ± 4.38
Mobilenet-based PSPNet   [3,2,1] random      95.89 ± 0.29    92.23 ± 0.45    95.89 ± 0.29    86.66 ± 0.70    92.56 ± 0.59
Mobilenet-based PSPNet   [4,3,2] random      96.44 ± 0.31    93.17 ± 0.52    96.44 ± 0.31    88.07 ± 0.88    93.35 ± 0.76
Mobilenet-based PSPNet   [1,2,3,4] random    95.43 ± 0.22    91.65 ± 0.50    95.43 ± 0.22    85.74 ± 0.72    91.64 ± 0.30


Fig. 5.8 Prediction on partially overlapped cropped dataset by Mobilenet-based U-Net

Fig. 5.9 Prediction on partially overlapped cropped dataset by SegNet


Fig. 5.10 Prediction on partially overlapped cropped dataset by Mobilenet-based PSPNet

Fig. 5.11 Prediction on randomly cropped dataset by Mobilenet-based U-Net


Fig. 5.12 Prediction on randomly cropped dataset by SegNet

a result, some of the images in the training set overlap with some of the images in the test set, so the prediction accuracy is very high. Comparing the accuracy of the models, SegNet is about 1–2% higher than the Mobilenet-based U-Net on the same dataset. On the partially overlapped cropped dataset, the accuracy of the Mobilenet-based PSPNet is higher than that of the other two models. On the randomly cropped dataset, the accuracy of the Mobilenet-based PSPNet is higher than that of the Mobilenet-based U-Net, but it is sometimes higher and sometimes lower than that of SegNet. The comparative analysis of the results for datasets with different band combinations indicates that, except for the Mobilenet-based PSPNet, the PA of the models on the 4-band dataset was lower than that on the 3-band datasets. For these two models, the dataset with band combination [4,3,2] generally has a higher PA than the other 3-band dataset. The Mobilenet-based PSPNet behaves differently. Its accuracy on the 4-band dataset is higher than that of the other two models, and its PA for the [4,3,2] combination on the partially overlapped cropped dataset is lower than for the other two band combinations. It can be seen that the Mobilenet-based PSPNet is better than the other two models for this classification task with 4 channels. Our datasets are all obtained by cropping 256 × 256 sub-images from the original image. The cropping size affects the number of categories contained in each sub-image and thus affects the training of the model. For this study, the


Fig. 5.13 Prediction on randomly cropped dataset by Mobilenet-based PSPNet

cropping size can only ensure that there are at least 3 categories in each sub-image. A dataset that contains as many categories as possible at an appropriate cropping size is more conducive to model training. Although the experimental results of the three models are acceptable, the models still have a high error rate on the smaller (partially overlapped cropped) dataset. To tackle this issue, future research will focus on constructing larger semantic segmentation datasets with multi-source remote sensing data and on optimizing the models.
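The effect of crop overlap on the test accuracy discussed above can be examined with a small experiment. The sketch below generates 256 × 256 crops either at random positions or on a regular grid, splits them into training and test sets, and measures the fraction of test pixels that also appear in at least one training crop. The image size (2048 × 2048), the number of random crops, the grid stride and the test ratio are hypothetical values, since the chapter does not list them, and the function names are illustrative only.

```python
import numpy as np


def random_crops(height, width, size, n, rng):
    """Top-left corners of n randomly placed size x size crops."""
    rows = rng.integers(0, height - size + 1, n)
    cols = rng.integers(0, width - size + 1, n)
    return list(zip(rows.tolist(), cols.tolist()))


def grid_crops(height, width, size, stride):
    """Top-left corners of crops on a regular grid; stride < size gives partial overlap."""
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]


def split(crops, rng, test_ratio=0.2):
    """Randomly split a list of crops into training and test subsets."""
    order = rng.permutation(len(crops))
    n_test = max(1, int(len(crops) * test_ratio))
    test = [crops[i] for i in order[:n_test]]
    train = [crops[i] for i in order[n_test:]]
    return train, test


def leaked_fraction(train, test, height, width, size):
    """Fraction of test-crop pixels that are also covered by a training crop."""
    seen = np.zeros((height, width), dtype=bool)
    for r, c in train:
        seen[r:r + size, c:c + size] = True
    covered = sum(int(seen[r:r + size, c:c + size].sum()) for r, c in test)
    return covered / (len(test) * size * size)


# Hypothetical settings, chosen only for illustration.
H = W = 2048
SIZE = 256
rng = np.random.default_rng(0)

train_r, test_r = split(random_crops(H, W, SIZE, 300, rng), rng)
leak_random = leaked_fraction(train_r, test_r, H, W, SIZE)

train_g, test_g = split(grid_crops(H, W, SIZE, stride=192), rng)
leak_grid = leaked_fraction(train_g, test_g, H, W, SIZE)

print(f"random crops: {leak_random:.2f} of test pixels seen in training")
print(f"grid crops, stride 192: {leak_grid:.2f} of test pixels seen in training")
```

Setting the stride equal to the crop size removes all shared pixels between crops, which gives a leakage-free baseline against which the two cropping strategies can be compared.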

5.5 Conclusion

Lithological classification with remote sensing images is an important part of remote sensing based geological survey. The classification accuracy directly affects the subsequent geological work, so the choice of a proper classification method is very important. In this study, we developed three DL models to classify lithology and other ground surface substrates by using ZY-3 remote sensing images. The experimental results show that the DL models are satisfactory in this task.


The accuracy on the randomly cropped dataset reaches about 95%, and the accuracy on the partially overlapped cropped dataset is about 60%. This study provides a method for ground surface substrate classification from remote sensing images that can be used in further research.

References

Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495.
Cracknell, M. J., & Reading, A. M. (2014). Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Computers and Geosciences, 63, 22–33.
Gao, S., Zhao, Q., Qi, X., & Cheng, M. (2021). Research on the improved image classification method of MobileNet. CAAI Transactions on Intelligent Systems, 16(1), 11–20 (in Chinese).
Harris, J. R., & Grunsky, E. C. (2015). Predictive lithological mapping of Canada’s North using random forest classification applied to geophysical and geochemical data. Computers and Geosciences, 80, 9–25.
Ji, Q., Wang, W., Liu, Z., Zhu, M., & Yuan, C. (2021). A machine learning-based lithologic mapping method. Journal of Geomechanics, 27(3), 339–349 (in Chinese).
Kuhn, S., Cracknell, M. J., & Reading, A. M. (2018). Lithologic mapping using random forests applied to geophysical and remote-sensing data: A demonstration study from the Eastern Goldfields of Australia. Geophysics, 83(4), B183–B193.
Liu, H. (2019). Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Computer and Information Technology, 27(5), 12–15 (in Chinese).
Liu, Z., Liao, F., & Zhao, T. (2020). Remote sensing image urban built-up area extraction and optimization method based on PSPNet. Remote Sensing for Land and Resources, 32(4), 84–89.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440).
Othman, A. A., & Gloaguen, R. (2017). Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different machine learning algorithms in the Kurdistan Region, NE Iraq. Journal of Asian Earth Sciences, 146, 90–102.
Pour, A., Hashim, M., Park, Y., & Hong, J. (2018). Mapping alteration mineral zones and lithological units in Antarctic regions using spectral bands of ASTER remote sensing data. Geocarto International, 33(12), 1281–1306.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention – MICCAI 2015: 18th international conference, Munich, Germany, October 5–9, 2015, proceedings, Part III 18 (pp. 234–241). Springer International Publishing.
Yan, Y., Chen, Y., Meng, Y., & Li, Z. (2015). Application of remote sensing technique in the geologic mapping of Daheishan. Northwestern Geology, 48(2), 231–237 (in Chinese).
Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881–2890).

Chapter 6

Remote Sensing Lithology Intelligent Segmentation Based on Multi-source Data

Abstract Due to complex geological and physicochemical processes, rocks show spectral variability and diversity, especially in areas with developed vegetation and mountain shadows. In addition, multi-source remote sensing data differ in the information they preserve. This chapter focuses on the problem that traditional models and single-source remote sensing data have difficulty extracting geological features effectively. A remote sensing lithology semantic segmentation method based on adaptive fusion of multimodal data is proposed. To address the interference of redundant information caused by direct fusion of multimodal data, and to exploit the high resolution of optical data, a step-by-step fusion method was adopted in which SAR data and DEM data were combined with the optical data separately. The channel attention mechanism was used to learn the weights of the optical data relative to the other types of data, and the learned weights were applied to each type of data. In addition, to distinguish the importance of various features, multiple attention was used to explore the connections between space and channels and to enhance the model’s ability to extract key feature information. A remote sensing lithology semantic segmentation method based on prior knowledge embedding was also established. The existing small-scale geologic map was taken as prior knowledge and used as a label for an additional semantic segmentation task, further mining the deep information hidden between lithologies and enhancing the lithologic feature extraction and generalization capabilities of the model. Experimental comparisons were conducted with various popular models on the lithology segmentation dataset constructed in this book, and the results demonstrated the superiority of the proposed method.

6.1 Introduction

Rocks are an important component of the earth’s crust and mantle, and are also an important matrix type in the earth’s shallow surface. The identification of lithology is one of the important contents of geological survey work. Using multi-source remote sensing data to describe the characteristic differences of different lithologies, combined with deep learning methods, to achieve rapid and accurate classification

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_6


of lithology is one of the important directions of remote sensing-based geological surveys in recent years. This chapter mainly describes the background and significance of the topic selection of remote sensing lithology classification, combines the current research methods and research status, analyzes some existing problems, and proposes corresponding solutions.

6.1.1 Research Background and Meaning

In geological research, field geological surveys can usually obtain accurate data. However, in areas with inconvenient transportation, such as the Gobi and plateaus, field working conditions are difficult and the work is inefficient. Remote sensing technology can quickly obtain large-scale, information-rich earth observation data in a short period of time, which greatly reduces field workload (Yang, 2019). It also has important applications in civil engineering fields such as basic surveying and mapping, environmental detection, engineering survey, urban planning and geological survey. Therefore, combining remote sensing images with existing geological survey data to infer stratigraphic lithology has become one of the important means of remote sensing geological survey. Remote sensing lithology identification refers to the process of determining the lithology type and its spatial distribution range by analyzing the spectral and spatial characteristics of images (Yang, 2019). The classification and identification of lithology is one of the important tasks in remote sensing geological research. By identifying and delineating the lithology type and spatial distribution of specific areas, it can provide target area prediction for mineral prospecting (Fu et al., 2017). Surface rocks evolved under complex and long geological processes. The mineral composition of different rocks may be similar and the degree of homogenization significant, so the differences in spectral characteristics between types are not obvious. In addition, differences in weathering, surface cover, topography and other factors lead not only to great differences and diversity in the spectra of a lithology, but also to great differences in texture characteristics and geometric properties at different image scales (Zhang et al., 2017). The characteristic information extracted from a single remote sensing data source is limited to the surface and cannot meet the needs of complex applications in resource exploration such as basic geological research (Yu et al., 2022). Especially in areas with complex geological conditions and well-developed vegetation, lithology is “weak information” that is easily obscured, so the distinguishable features that reflect different rocks are insufficient or impossible to extract, which makes geological surveys based on remote sensing technology difficult. Figure 6.1 shows optical images, Synthetic Aperture Radar (SAR) images and DEM images. It can be seen that different lithologies appear differently in different data sources. For example, the Quaternary and granite have similar tones in

6.1 Introduction

119

optical images, but their tones are clearly different in the SAR and DEM images. Multisource remote sensing data record rich attribute information of ground objects from different dimensions, and fusing different types of data can effectively improve the accuracy of classification results (Fu et al., 2017; Wang & Cheng, 2008; Yu et al., 2022). For example, the texture characteristics and spatial distribution of rocks can be combined to infer the rock type directly, and because the distribution of geological elements is generally continuous and spatially autocorrelated, the inference can be extended indirectly to adjacent areas.

Fig. 6.1 Comparison of reflections of different data on lithology

Multispectral images have rich spectral information but insufficient spatial detail. The geometric and texture features of high-resolution remote sensing images are clearer and better reflect the characteristics of vegetation, rocks, soil, water bodies, geological structures and other geological objects. Fusing the two with remote sensing image fusion technology can enhance lithological edge information and facilitate the extraction of lithological features (Maggiori et al., 2016; Tian et al., 2019). SAR imaging does not depend on illumination and offers all-day, all-weather acquisition as well as high resolution. Different SAR wavelengths and polarization modes respond differently to rocks with different particle sizes and degrees of weathering (Li et al., 2022). SAR therefore has unique advantages in geological surveys and ore prospecting, especially in applications such as mapping fault structures and volcanic distribution and detecting various metal veins (Othman & Gloaguen, 2014). A DEM records the plane coordinates and elevation of an area, from which useful information such as slope and surface roughness can be extracted. Integrating DEM, spatial, texture and other features improves feature expression and classification (Jakob et al., 2015; Othman & Gloaguen, 2014). In addition to spectral characteristics, the minerals contained in different rocks can cause geophysical or geochemical anomalies because of their own properties. These features can be used as a basis for classifying rocks and minerals, which compensates for the fact that optical image features are easily obscured by other ground objects. Multi-source data fusion combines images obtained by two or more different sensors, explores their connections and hidden information, exploits the complementary strengths of the different data sources, reduces the uncertainty of individual image types and the redundancy of data, and thus provides technical support for accurate lithology classification.

In the early stages of remote sensing geological research, researchers generally interpreted remote sensing images on the basis of a large amount of field geological data, professional knowledge and experience, forming a mature system of methods (Zhang et al., 2019). However, this kind of interpretation is inefficient, and its results are easily affected by personal experience (Yang, 2019). With the development of earth observation technology, data of multiple types and at large scale can be obtained easily, which lays the data foundation for remote sensing lithology identification (Wang, 2020). However, traditional remote sensing image interpretation methods cannot process this massive amount of data efficiently.
How to combine big data technology and remote sensing technology effectively has therefore become an important research direction (Song et al., 2019). Deep learning (DL) has been widely used in remote sensing image interpretation because of its high precision and efficiency, and on ordinary optical remote sensing images it has achieved far better results than traditional methods. As lithology identification is an important part of remote sensing geological research, applying deep learning to remote sensing image processing is a current research hotspot in geology (Wang, 2008; Zhang et al., 2018). Multi-source data expand the feature space of lithology and facilitate the extraction of information on different lithologies, which makes them an effective means of remote sensing lithology interpretation. Therefore, building on the National Natural Science Foundation project “Fine-scale classification of surface coverage information in open-pit mining areas based on high-resolution remote sensing”, this study combines deep learning and multisource remote sensing data fusion to carry out intelligent interpretation research on remote sensing lithology classification.

6.1.2 Research Status

Rocks are composed of a variety of minerals, and the spectral characteristics of a rock are the combined expression of the electromagnetic absorption and reflection of its minerals (Wang, 2020). In remote sensing images, rocks often show specific spectral and spatial distribution characteristics. Therefore, extracting lithology feature information is the basis for lithology classification of remote sensing images (Dong & Zhang, 2019). Previous studies generally designed various algorithms to obtain implicit feature information from remote sensing images. According to the semantic information that the features can express, remote sensing features can be divided into three categories: low-level features based on visual understanding, mid-level features based on attribute expression, and high-level features based on conceptual semantics. The first two are generally obtained by constructing algorithmic models that extract features from images, and can be collectively referred to as artificial (hand-crafted) features. Accordingly, remote sensing lithology classification can be divided into two major categories: classification based on artificial features and classification based on abstract semantic features. In addition, from the perspective of data utilization, multi-source remote sensing data fusion is also one of the important approaches to remote sensing lithology classification.

6.1.2.1 Traditional Classification Methods Based on Artificial Features

Mineral spectra are an important basis for identifying lithology. In the 1970s, Hunt’s team conducted a large number of studies and summarized the spectra of various minerals (Hunt, 1970, 1971). Kruse et al. (1992) were the first to treat pixel spectra as vectors and identify ground objects by calculating the angle between them. Clark and Swayze (1995) fitted mineral spectral characteristics by improving the least squares method. In addition, many scholars in China have carried out extensive research on lithology identification (Gan et al., 2000; Wang et al., 2007; Zhao et al., 2004). In general, classification methods based on artificial features can be divided into three main categories: the spectral index method, the spectral matching method and the mixed spectral decomposition method. The spectral index method uses statistical methods to combine and analyze the measured reflectance of various rock types in different bands in order to derive indicative spectral indexes. For example, the band ratio method can enhance the differences between spectra and extract lithology information to identify rock types (Chen et al., 2016). Mao et al. (2014) analyzed the reflection spectrum characteristics of coal and proposed the Normalized Difference Coal Index (NDCI) to extract the distribution range of coal bodies and distinguish lignite from bituminous coal. Song et al. (2019) used an SVC HR1024 spectrometer to measure the visible-near infrared spectra of burned and unburned gangue, analyzed their spectral characteristics, and finally constructed a spectral index (Normalized Difference
Gangue Index, NDGI) based on the visible light band to identify them. Mao et al. (2018) analyzed spectral characteristics from visible-near infrared spectrum measurements, constructed a ratio index, a difference index and a normalized index, and predicted the SiO2 content in Anshan iron ore with an error of only 3.57%. The spectral matching method classifies and identifies rocks and minerals by establishing a rock or mineral spectral database from known reflection or emission spectral curves and calculating the similarity between the target spectrum and the reference spectra with a matching algorithm. Related methods include spectral distance matching, spectral angle matching and spectral correlation matching. Tong et al. (2016) reduced the dimensionality of hyperspectral remote sensing data of a gold-copper mine with principal component analysis, and then used the spectral angle matching method to match and identify the target spectrum. Dong et al. (2020) used spectral feature enhanced matching together with characteristic parameters for fine mineral identification, with an average accuracy higher than 90%. A pixel of a remote sensing image may contain a variety of ground objects whose spectra interact with each other, which reduces recognition accuracy. The mixed spectral decomposition method can determine the proportions of the spectral components of different ground objects within the same pixel. The most commonly used mathematical method is constrained least squares, in which the residual of a linear or nonlinear mixing model is minimized under the constraints that the abundances are non-negative and sum to 1 (Liu et al., 2021a, 2021b). The research results of Liu et al.
(2011) show that when the spectral measurement conditions of the rock and mineral endmembers are the same, bulk endmember spectra can effectively unmix the composition and content of bulk rocks, whereas particle spectra perform poorly. Mixed pixel decomposition is widely used in land use mapping, vegetation coverage extraction and other fields, but there are few studies on lithology extraction (Yu, 2017). The traditional remote sensing lithology classification methods above generally use mathematical modeling to extract attribute features, and feature modeling requires a large amount of manual work. Therefore, when faced with massive high-resolution remote sensing image data, these methods may have limited ability to describe image characteristics compared with deep semantic features, owing to high inter-class similarity, poor separability, insufficient use of effective information, and low efficiency.
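
As a concrete illustration of the spectral angle matching idea summarized above, the following minimal sketch treats each spectrum as a vector and assigns a pixel to the library entry with the smallest angle. The function names, the two-entry library and all reflectance values are hypothetical and only serve to show the computation; they are not taken from the cited studies.

```python
import math


def spectral_angle(pixel, reference):
    """Return the angle (in radians) between two spectra treated as vectors."""
    if len(pixel) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0.0 or norm_r == 0.0:
        raise ValueError("spectra must not be all zeros")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


def match_spectrum(pixel, library):
    """Return the library key whose reference spectrum has the smallest angle."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Hypothetical reference spectra (reflectance in four bands).
library = {
    "granite": [0.30, 0.40, 0.50, 0.45],
    "basalt": [0.10, 0.10, 0.12, 0.11],
}
# A brighter pixel with the same spectral shape as the "granite" entry.
pixel = [0.60, 0.80, 1.00, 0.90]
best = match_spectrum(pixel, library)
```

Because the angle ignores vector length, the brighter pixel still matches the hypothetical granite entry, which is why this measure is relatively insensitive to overall brightness differences.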

6.1.2.2 Intelligent Classification Method Based on Abstract Features

In recent years, with the rapid development of deep learning theory and the rapid growth of computing power, deep learning models have come to offer higher efficiency and accuracy than classification based on manually extracted features. In remote sensing image classification, typical deep learning models such as deep autoencoders (Hinton & Salakhutdinov, 2006; O’shea & Hoydis, 2017), deep belief networks (Hinton et al., 2006) and convolutional neural networks (Tian et al., 2021) have been widely used. Among them, convolutional
neural networks (CNN) have excellent image feature extraction capabilities. Typical models include AlexNet (Krizhevsky et al., 2017), GoogLeNet (Szegedy et al., 2015), VGGNet (Simonyan & Zisserman, 2014), ResNet (He et al., 2016) and DenseNet (Huang et al., 2017). In essence, a CNN extracts local features of the image by sliding convolution kernels over it. The model extracts low-level features such as color, shape and texture in its shallow layers, and combines them in deeper layers to obtain more abstract high-level semantic features. The convolution operation makes CNNs well suited to two-dimensional images. They have good translation and scaling invariance, so they have great advantages in image classification, detection, segmentation and other tasks, and have achieved excellent results. In semantic segmentation based on deep learning, the Fully Convolutional Network (FCN) is a milestone. It replaces the fully connected output layer of the model with a deconvolution layer that performs upsampling, which produces segmentation results at the original image size and greatly improves classification accuracy (Long et al., 2015). After FCN, many semantic segmentation networks have been proposed, including U-Net (Ronneberger et al., 2015), SegNet (Badrinarayanan et al., 2017), PSPNet (Zhao et al., 2017) and DeConvNet (Noh et al., 2015). Among them, U-Net uses skip connections to concatenate high-level and low-level features, which retains more edge detail and improves the classification of semantic details at object boundaries. Many subsequent semantic segmentation models build on the feature fusion idea of U-Net, such as RefineNet (Lin et al., 2017), EncNet (Zhang, 2018), and the DeepLab series of networks (Chen et al., 2014, 2017a, 2017b, 2018).
Many scholars in the field of remote sensing have achieved great success in segmentation tasks by introducing these models and adapting them (Cui et al., 2020; Kemker et al., 2018; Liu et al., 2022; Shang et al., 2020; Yuan et al., 2021; Zhang et al., 2020). In geology, hyperspectral and multispectral remote sensing data contain rich spectral information on ground objects. Many scholars have combined remote sensing imaging technology with lithological spectral characteristics and applied the analysis to fields such as geological exploration, lithological classification and mapping (Pal et al., 2019; Rezaei et al., 2020; Sekandari et al., 2022; Wu et al., 2020; Ye et al., 2020). Liu et al. (2021a, 2021b) designed a deep convolutional neural network to extract lithological features from thermal infrared hyperspectral data for classification. SAR has a special imaging mechanism. Compared with optical images, it can capture information such as the dielectric constant of ground objects, and it is widely used for classifying surface environmental elements (Wang et al., 2015; Yang et al., 2010). In recent years, some scholars have begun to apply machine learning algorithms to lithological classification of SAR images. Xie et al. (2015) performed polarization decomposition on fully polarimetric SAR data and classified lithology with the support vector machine algorithm. Wang et al. (2018a, 2018b) used polarization decomposition to extract features from polarimetric SAR images, and then used autoencoders for classification. Li et al. (2022) constructed a SAR lithology classification dataset by processing dual-polarization SAR images, and compared multiresolution results with SAR image classification based on small neighborhoods.
The results showed that SAR images at different scales have different advantages, and that combining multiple scales for classification can effectively improve accuracy. In summary, applying machine learning algorithms to remote sensing data processing to make image processing and application more efficient has good development prospects and important research significance for remote sensing based geological research. However, deep learning models essentially fit data and require a large amount of labeled data. The methods themselves also lack interpretability, which prevents them from being widely used in some security-sensitive tasks.

6.1.2.3 Classification Method Based on Multi-source Data Fusion

Data fusion methods have broad application prospects. In recent years, many scholars have carried out lithology classification based on multi-source data fusion and machine learning algorithms (Han et al., 2022; Seid et al., 2021; Shebl & Csámer, 2021). Wang et al. (2021) used multi-source data fusion to process Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) remote sensing data together with geochemical data, retaining the geochemical characteristics and texture structure of lithological units, and used the random forest method to identify lithology. Pal et al. (2020) used a classifier ensemble method to identify lithology in hyperspectral and multispectral images and produced lithology maps from remote sensing images. Their results showed that this method can effectively improve the accuracy of lithology mapping. Yu et al. (2022) combined remote sensing and airborne geophysical data and used the random forest method to extract and classify lithology. Their results show that joint analysis of multi-source data can significantly improve the accuracy of lithology classification. These studies show that diversified information sources can describe lithology effectively, and that collaborative processing benefits lithology information extraction and improves classification accuracy. However, each fusion level has limitations. Data-level fusion retains a large amount of information, but the data volume is large and the sensors differ from one another, which makes it difficult to analyze the data effectively. Feature-level fusion yields a small amount of feature data that is easy to compute, but its information loss is significant compared with data-level fusion. Decision-level fusion can effectively synthesize the processing results of different sensors, but it is difficult to implement, and the final decision-making criteria are difficult to determine.
To sum up, each multi-source data fusion method has its own advantages and disadvantages. In practical applications, the most suitable method should be selected according to the requirements of the task.
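
The trade-off between fusion levels can be made concrete with a small sketch for a single pixel. It contrasts data-level fusion (stacking raw values from every source) with decision-level fusion (combining the class probabilities produced separately for each source). All names, pixel values, class probabilities and source weights below are hypothetical; in particular, the weighted sum is only one possible decision rule, chosen here for illustration. Feature-level fusion would follow the same stacking pattern as the first function, but on extracted features instead of raw values.

```python
def data_level_fusion(optical, sar, dem):
    """Stack the raw per-pixel values of all sources into one input vector."""
    return list(optical) + list(sar) + list(dem)


def decision_level_fusion(probabilities, source_weights):
    """Combine per-source class probabilities by a weighted sum.

    Returns the winning class and the fused score of every class.
    """
    classes = next(iter(probabilities.values())).keys()
    fused = {
        c: sum(source_weights[s] * probabilities[s][c] for s in probabilities)
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Hypothetical values for one pixel: 4 optical bands, 2 SAR channels, 1 elevation.
stacked = data_level_fusion([0.21, 0.35, 0.40, 0.52], [-12.3, -18.9], [1530.0])

# Hypothetical class probabilities from three independent per-source classifiers.
probabilities = {
    "optical": {"granite": 0.55, "quaternary": 0.45},
    "sar": {"granite": 0.30, "quaternary": 0.70},
    "dem": {"granite": 0.40, "quaternary": 0.60},
}
source_weights = {"optical": 0.5, "sar": 0.3, "dem": 0.2}  # assumed, not learned
label, fused = decision_level_fusion(probabilities, source_weights)
```

The stacked vector mixes quantities with very different units and ranges, which reflects the sensor differences mentioned above, while the decision-level result depends directly on the chosen source weights, which reflects the difficulty of fixing the decision criteria.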

6.1.3 Research Objectives and Research Content

6.1.3.1 Research Objectives

To address the difficulty of extracting lithology information when interpreting lithology from remote sensing images, this study uses multi-modal data, including optical, SAR and DEM images, and proposes an adaptive fusion strategy for them. The strategy fuses the optical data with the other data progressively, so that the raw inputs do not interfere with one another at the outset. It also weights the features of different channels and positions with an attention mechanism, suppresses the interference of redundant information, and enhances the interaction between features at different levels through multilayer feature fusion. In addition, a semantic segmentation method that embeds prior geoscience knowledge is proposed. It preprocesses the existing geological data, uses an atrous (dilated) spatial pyramid pooling structure to extract multi-scale information, and generates binary boundary labels to guide the model in learning boundaries. The study aims to break through the key technologies of “acquisition of effective remote sensing lithology characteristics, multi-modal data fusion, and introduction of prior geoscience knowledge” for intelligent remote sensing interpretation of lithology, in order to improve accuracy and generalization ability.

6.1.3.2 Research Content

With the development of remote sensing technology, the volume of high-resolution remote sensing data continues to increase. Processing such massive data urgently requires methods that are automatic, efficient and accurate. Because of differences in imaging principles and technical means, data from different sensors cannot be fused and processed directly. To achieve intelligent lithology classification based on multi-source data, this study proposes a solution that adaptively fuses multi-source remote sensing data and uses prior knowledge to guide the network in extracting features, and it designs and optimizes deep learning models to mine the hidden connections between lithologies from remote sensing image data (Fig. 6.2), providing a feasible solution that meets the practical requirements of high accuracy and generalization ability. The specific research contents are as follows.

Fig. 6.2 Overall technology roadmap

(1) Multimodal remote sensing data adaptive fusion method

Unlike other tasks in remote sensing, lithology belongs to high-level abstract semantics and often cannot be described effectively by a single data source. Furthermore, given the spatial autocorrelation in the distribution of geological covariates, lithology classification also relies heavily on contextual information. When multi-source remote sensing data are used to describe lithology collaboratively, direct fusion cannot make full use of the data because of the imaging differences between the sources. In addition, noise in one source affects the others, so that effective information cannot be fully extracted and the potential connections between the data cannot be well revealed, which may even lead to poor prediction results. Because multi-source data are complex, direct fusion also easily introduces redundant information, resulting in the loss of useful information. This method adopts an adaptive fusion strategy that exploits the strengths of optical data (rich spectral information, high resolution, ease of visual interpretation, and similarity to natural images) to extract useful information and achieve complementarity among the multiple data sources.

(2) Remote sensing lithology semantic segmentation method with prior geoscience knowledge

The minerals that make up rocks undergo various chemical and physical changes during formation, and are highly homogenized and diversified. As a result, rocks often show mixed spectral characteristics, with small spectral differences between classes, while differences in texture and color tone within a class increase instead. Remote sensing images show complex class boundaries, and texture and spectral features are easily obscured or destroyed by vegetation cover, terrain and other geological conditions, which increases the difficulty of intelligent lithology identification from remote sensing. In this case, different lithologies cannot be distinguished from remote sensing images alone, and the model cannot extract distinguishable features.
This study therefore embeds prior knowledge by preprocessing existing small-scale geological maps, extracting multi-scale information, and generating labels that guide the model in learning boundaries. This further exploits the potential differences hidden between lithologies and enhances the feature extraction and generalization abilities of the model.

6.2 Methods

6.2.1 Remote Sensing Lithology Semantic Segmentation Method Based on Adaptive Fusion of Multi-source Data

Remote sensing images record the electromagnetic waves reflected by ground objects. Because of the shielding effect of vegetation, terrain and other factors, lithology information is easily destroyed or obscured. In addition, different lithologies may show similar colors in remote sensing images, which also restricts the extraction of lithology information. A single remote sensing data source contains limited information. To solve this problem, multi-source data can be analyzed jointly by combining multi-sensor data, terrain data, geophysical exploration data, geochemical data and field survey data to describe lithological information from multiple aspects. Using multi-source data fusion technology to fuse data with different resolutions,
different spectral characteristics, and multiple sources can provide rich information and data support for lithological feature extraction. From the perspective of remote sensing geology, optical images provide high-resolution, multi-spectral information on rock and mineral characteristics such as texture, shape and color, with good visual quality, which greatly helps the extraction of lithological characteristics. SAR images are sensitive to physical properties such as surface roughness and shape, and can reflect topography and landforms. However, they are affected by noise and are difficult to interpret. Topographic analysis can be performed on DEM data. For example, terrain relief and slope can be used in geological structure analysis to identify fault zones and mountain uplifts, which greatly helps geological exploration research. However, in areas with complex geological structures, DEM data have to be used together with other data to improve accuracy. In summary, although multi-source remote sensing data provide multi-dimensional and multi-scale information for describing lithology, direct fusion performs poorly because of the “multisource heterogeneous” nature of the data, information redundancy, mutual interference and other factors. This chapter therefore proposes a remote sensing lithology semantic segmentation method based on adaptive fusion of multi-source data. First, the optical data are combined with the SAR data and with the DEM data through a channel attention mechanism, which learns the importance of the optical data relative to each of the other data types. Then a multi-dimensional attention mechanism is used to extract deep features across spatial positions and channels and to strengthen information interaction, thereby suppressing redundant information and improving the fusion result.

6.2.1.1 Multimodal Data Adaptive Fusion Method

Optical images have high spectral resolution. After pre-processing, band combination can retain texture information, enrich tones, and enhance the visual quality of images, so that interpretation signs such as mineral tones, alteration, and texture features can be readily identified visually. SAR images record the backscattering characteristics of ground objects under radar illumination. Owing to this imaging mechanism, SAR is sensitive to the dielectric constant and surface roughness of rocks, which appear as texture in the image, and imaging is not limited by illumination. Compared with visible and infrared light, radar waves can penetrate vegetation and soil to a certain extent, depending on the wavelength. However, the spatial resolution of SAR images is lower than that of visible band images, the scattering echoes are correlated, the information is redundant, and the signal-to-noise ratio is low, which makes visual interpretation difficult. DEM data describe the landforms, and slope, shadow and other information can be extracted from them to help determine the spatial distribution of rocks, but the original image carries little directly visible information.

Therefore, this study takes advantage of the high resolution and rich spectral information of optical images and adopts a progressive fusion method that integrates the SAR and DEM images separately, guiding the model to better extract features, as shown in Fig. 6.3.

Fig. 6.3 Adaptive fusion method

The fusion consists of three steps. First, the Gaofen-6 image and the SAR image are stacked along the channel dimension, and the feature residual module is used to learn and optimize the output feature map. The multi-attention module then learns the channel importance of the feature map and outputs a vector of weight coefficients. Splitting this vector along the channel dimension gives the weights of the two types of data, denoted ω1 and ω2. In the same way, the weight vectors of the Gaofen-6 image and the DEM image are learned and denoted ω3 and ω4, respectively. Second, the feature weighting module combines the two weights learned for the Gaofen-6 image in the two passes above using Eq. 6.1:

$$\omega = \alpha\,\omega_1 + (1 - \alpha)\,\omega_3 \tag{6.1}$$

where α is a learnable parameter with an initial value of 0.5. Finally, the multi-attention module uses residual fusion to mitigate problems such as network degradation and gradient problems, strengthen the information interaction between channels and spatial positions, suppress redundant information, and improve classification accuracy.
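
The weight splitting and the combination in Eq. 6.1 can be sketched as follows. This is only an illustration under stated assumptions: the optical channels are assumed to come first in each stacked input, the band counts (four optical bands, two SAR channels, one DEM channel) and all weight values are hypothetical, and α is held fixed at its initial value of 0.5 instead of being learned during training.

```python
def split_weights(weights, n_optical):
    """Split a channel-weight vector into its optical part and the remaining part."""
    return weights[:n_optical], weights[n_optical:]


def fuse_optical_weights(w1, w3, alpha=0.5):
    """Eq. 6.1: omega = alpha * omega1 + (1 - alpha) * omega3, element by element."""
    if len(w1) != len(w3):
        raise ValueError("both optical weight vectors must have the same length")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(w1, w3)]


# Hypothetical channel weights produced by the two attention passes.
weights_optical_sar = [0.8, 0.6, 0.7, 0.9, 0.4, 0.3]  # 4 optical + 2 SAR channels
weights_optical_dem = [0.6, 0.8, 0.5, 0.7, 0.2]       # 4 optical + 1 DEM channel

omega1, omega2 = split_weights(weights_optical_sar, 4)
omega3, omega4 = split_weights(weights_optical_dem, 4)
omega = fuse_optical_weights(omega1, omega3, alpha=0.5)
```

With α = 0.5 the fused optical weight is the plain average of the two passes; a learned α would shift the balance towards whichever pairing proves more informative.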

Channel Attention

When a convolutional neural network extracts features, H × W × C convolution kernels are generally applied to the multiple channels of the input data, so that the features of each channel are extracted separately and then summed to obtain a feature map, which enhances the information exchange between channels. Multi-channel feature maps are obtained by using multiple convolution kernels. The channel attention mechanism learns the importance of each feature channel, assigns more weight to the channels that contain key features for the task at hand, and weakens redundant information.

Fig. 6.4 ECA channel attention module

The channel attention mechanism is now widely used in deep learning models and effectively improves their performance. However, most channel attention mechanisms introduce additional parameters, which inevitably increases the amount of computation. The Efficient Channel Attention (ECA) model (Wang, 2020) is based on the observation that avoiding dimensionality reduction while ensuring appropriate interaction between channels preserves performance without increasing the complexity of the model. The ECA module used here includes three operations: global average and maximum pooling, local one-dimensional convolution, and feature weighting, as shown in Fig. 6.4. Assume that the size of the input feature map is H × W × C. First, global average pooling is used to obtain a 1 × 1 × C vector, each component of which is the average of all elements in the corresponding channel. The benefit of global average pooling is that it retains more background information from the original feature map. Second, global maximum pooling also yields a 1 × 1 × C vector, each component of which is the maximum of all elements in the corresponding channel. The advantage of maximum pooling is that it is more sensitive to image edges and texture features. Finally, the two vectors are added. This operation can be expressed by Eq. 6.2:

$$O_{1\times 1\times C} = \frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W} F_{H\times W\times C}(i,j) + \mathrm{MaxPool}_{H\times W}\left(F_{H\times W\times C}\right) \tag{6.2}$$

where $F_{H\times W\times C}$ represents the feature map with C channels, height H and width W, and i and j are the pixel coordinates in the feature map.
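
A minimal sketch of the pooling step in Eq. 6.2 is given below. It assumes, purely for illustration, that the feature map is stored channel-first as nested Python lists (C channels, each an H × W grid); the function name and the example values are hypothetical.

```python
def pooled_descriptor(feature_map):
    """Eq. 6.2: per-channel global average plus per-channel global maximum.

    feature_map is a list of C channels, each a list of H rows with W values.
    Returns a list of C numbers, one descriptor component per channel.
    """
    descriptor = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        average = sum(values) / len(values)  # global average pooling
        maximum = max(values)                # global maximum pooling
        descriptor.append(average + maximum)
    return descriptor


# Hypothetical feature map with C = 2 channels of size 2 x 2.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
descriptor = pooled_descriptor(feature_map)
```

In the second channel a single strong response dominates the maximum but barely moves the average, which illustrates why the two pooled statistics carry complementary information.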

Reducing the channel dimension has a negative impact on the learning of attention information. Capturing the dependencies among all channels with a fully connected layer is also unnecessary and increases the number of parameters, which reduces efficiency. Therefore, the second operation uses a local one-dimensional convolution (1 × 1 × K) to capture the interaction between adjacent channels, which avoids reducing the channel dimension when learning channel attention. The number of parameters introduced in this way is at most C × K (and only K when the kernel weights are shared by all channels, as in a standard one-dimensional convolution), so the calculation is more efficient. The convolution can be expressed by Eq. 6.3:

$$\omega = \sigma\left(\mathrm{cov}_{1\times 1\times K}(C)\right) \tag{6.3}$$

where ω represents the learned weight, $\mathrm{cov}_{1 \times 1 \times K}$ represents a one-dimensional convolution with a kernel size of K, C represents the input channel feature vector, and σ represents the activation function. The Sigmoid function is generally used to squash the weight coefficients into the range between 0 and 1, which avoids destroying the features. Generally speaking, large convolution kernels can capture long-range dependencies, while small kernels are suited to capturing short-range interactions. The setting of K affects the size of the receptive field, and tuning it manually wastes computing resources. Therefore, to adaptively select K for the input feature map and extract features over different ranges, an adaptive convolution kernel size is used here, calculated by Eq. 6.4:

$$k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd} \tag{6.4}$$

where $|X|_{odd}$ represents the odd number closest to X, and γ and b are the parameters of the nonlinear mapping $C = \varphi(k) = 2^{\gamma k - b}$ that relates C and K; this chapter sets γ to 2 and b to 1. According to the formula, the ψ function assigns a larger K, and hence longer-range interaction, to inputs with more channels, and a smaller K to inputs with fewer channels. Finally, the feature weighting operation expands the learned weight coefficients and multiplies them with the original feature map, which is equivalent to scaling the features of each channel so that more attention is paid to the key channel features.
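As a rough illustration of Eqs. 6.3 and 6.4, the sketch below computes the adaptive kernel size and applies a one-dimensional convolution plus Sigmoid across a channel descriptor. Several points are assumptions rather than the chapter's implementation: the "closest odd number" is obtained by truncating and moving even values up by one (one common rounding convention), the convolution is zero padded and uses one shared kernel, and the function names are hypothetical.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Kernel size from Eq. 6.4 with gamma = 2 and b = 1 as defaults."""
    t = int(abs((math.log2(channels) + b) / gamma))
    # Assumed rounding rule: truncate, then move even values up to the next odd.
    return t if t % 2 == 1 else t + 1


def channel_weights(descriptor, kernel):
    """Eq. 6.3: 1D convolution over neighbouring channels followed by Sigmoid."""
    k = len(kernel)              # expected to be odd
    pad = k // 2
    padded = [0.0] * pad + list(descriptor) + [0.0] * pad
    weights = []
    for c in range(len(descriptor)):
        response = sum(kernel[n] * padded[c + n] for n in range(k))
        weights.append(1.0 / (1.0 + math.exp(-response)))
    return weights


for c in (64, 256, 512):
    print(c, adaptive_kernel_size(c))  # 64 -> 3, 256 -> 5, 512 -> 5
```

Each resulting weight lies between 0 and 1 and would multiply the corresponding channel of the original feature map in the feature weighting step.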

Multidimensional Attention

When traditional methods are used to calculate channel attention, global pooling is required to reduce the feature maps to a 1 × 1 × C vector. This vector is used to calculate the weight coefficients, which uniformly weight the original feature maps. However, global pooling leads to a large loss of spatial detail, and attention calculated from this vector cannot capture the dependence between the channel dimension and the spatial dimensions. Although some attention models such as CBAM (Woo et al., 2018) and BAM (Park et al., 2018) try to connect the channel and spatial dimensions, they all connect the two in parallel or in series, so the two calculations are in essence independent. The Triplet Attention mechanism (Misra et al., 2021) achieves interaction between dimensions through rotation operations and residual transformation, thereby establishing dependencies, and has a smaller number of parameters. The triple attention module contains three branches, each of which encodes information and establishes connections between two dimensions, as shown in Fig. 6.5.

Fig. 6.5 Multiple attention module

Assuming that the input feature map size is H × W × C, triple attention is calculated as follows. The first branch captures the cross-dimensional interaction between H and C. The W and C axes of the input feature map are first exchanged to obtain an H × C × W feature map, and channel pooling (Channel Pool) is then performed along the channel axis to obtain an H × C × 2 feature map. After that, a K × K convolution learns the attention weight coefficients and reduces the number of channels to 1. After the batch normalization layer, the Sigmoid function maps the weight coefficients to between 0 and 1. The rotated features are weighted by element-wise multiplication, and the W and C axes are then exchanged back to obtain a weighted feature map with the size of the original feature map. Channel pooling refers to concatenating, along the channel dimension, the feature maps obtained by average pooling and maximum pooling along the channel axis, which can be expressed by Eq. 6.5:

$$Z\text{-}\mathrm{Pool}(x) = \mathrm{Concat}\left[\mathrm{MaxPool}(x), \mathrm{AvgPool}(x)\right] \tag{6.5}$$
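The axis exchange and channel pooling of the first branch can be sketched as follows. This is a minimal plain-Python illustration, assuming a nested-list feature map indexed as `[H][W][C]`; the helper names `swap_last_two_axes` and `z_pool` are hypothetical.

```python
def swap_last_two_axes(x):
    """Exchange the last two axes: [H][W][C] -> [H][C][W]."""
    return [[list(column) for column in zip(*plane)] for plane in x]


def z_pool(x):
    """Eq. 6.5: reduce the last axis to two values, [max, average]."""
    return [[[max(vec), sum(vec) / len(vec)] for vec in row] for row in x]


# Toy input with H = 1, W = 2, C = 3 (hypothetical values).
toy = [[[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]]

rotated = swap_last_two_axes(toy)  # shape 1 x 3 x 2, i.e. H x C x W
pooled = z_pool(rotated)           # shape 1 x 3 x 2, i.e. H x C x 2
print(pooled)                      # [[[4.0, 2.5], [5.0, 3.5], [6.0, 4.5]]]
```

After the exchange, the last axis holds W, so pooling over it leaves an H × C × 2 map in which each position summarizes the interaction between one row and one channel.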

The second branch captures the cross-dimensional interaction between W and C. Similarly, the C and H axes of the input feature map are first exchanged to obtain a C × W × H feature map, and channel pooling (Channel Pool) is then performed along the channel axis to obtain a C × W × 2 feature map. After the K × K convolution layer and the batch normalization layer, the Sigmoid function calculates the weight map, which is multiplied with the rotated C × W × H feature map. The axes are then exchanged back so that the result has the same shape as the original feature map. The third branch captures the spatial dependence between H and W in the manner of spatial attention. Channel pooling is performed directly on the original feature map along the channel axis to obtain an H × W × 2 feature map. This map passes through the convolution layer and the batch normalization layer in sequence and is mapped by the activation function, after which the residual transformation is applied and the original feature map is weighted. Finally, the feature maps of the three branches are averaged to obtain the cross-dimensional interaction feature map. The entire calculation process of the triple attention mechanism can be expressed by Eq. 6.6:

$$y = \frac{1}{3}\left(\hat{x}_1\,\sigma\left(\psi_1\left(\hat{x}_1^{*}\right)\right) + \hat{x}_2\,\sigma\left(\psi_2\left(\hat{x}_2^{*}\right)\right) + x\,\sigma\left(\psi_3\left(\hat{x}_3^{*}\right)\right)\right) \tag{6.6}$$

where $\hat{x}$ represents the feature map after rotation, $\hat{x}^{*}$ represents the feature map after channel pooling, ψ represents the convolution layer with a kernel size of K followed by the batch normalization layer, and σ represents the Sigmoid activation function. In addition, triple attention adds only a small number of parameters when modeling cross-dimensional dependencies: the three branches together introduce on the order of 3 × 2 × K² convolution weights, which saves computational resources while maintaining performance.

Feature Residual Module

For lithology classification, a broad distribution range contains hidden regional background information. This background information can effectively reduce the impact of “same object, different spectra” and “same spectrum, different objects” in rocks and minerals, and improve identification accuracy. A convolutional neural network slides a K × K convolution kernel over the image, and the size of K determines the range of elements participating in each convolution operation. Through successive convolution and pooling operations, the receptive field gradually becomes larger and more abstract contextual information can be obtained. Therefore, this chapter adds a feature residual module that optimizes features in the early fusion stage. It includes two branches, as shown in Fig. 6.6. The first branch uses a 1 × 1 convolution to initially fuse the two types of concatenated data, mainly to enhance the information interaction between channels and adjust the number of channels. The second branch applies a 3 × 3 convolution, batch normalization, and an activation function to the original feature map, and then applies a 3 × 3 convolution again to obtain a feature map of the same size as that of the first branch. The two feature maps are added and activated with the ReLU function to output the final fused feature. The features learned by the second branch can be used to optimize the results of the 1 × 1 convolution, which is equivalent to using larger-scale, abstract semantic features to perform residual fusion optimization on the underlying features containing detailed information.

Fig. 6.6 Feature residual module

6.2.1.2 Experimental Design

Evaluation Metrics

Evaluation metrics are an important reference for assessing the performance of semantic segmentation models. Currently, there are two main families of evaluation criteria for semantic segmentation models: one based on pixel accuracy (PA) and the other based on IOU. The experiments in this chapter use OA, F1_score, and MIOU to evaluate the accuracy of the experimental results; their calculation is described in Sects. 3.2.4 and 5.2.2.
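For orientation, the three metrics can be computed from a confusion matrix as sketched below. The exact definitions used in this book are given in Sects. 3.2.4 and 5.2.2 and are not reproduced here, so the formulas are the commonly used ones and rest on assumptions: rows are reference classes, columns are predicted classes, and F1_score and MIOU are unweighted means over classes. The function name and the example matrix are hypothetical.

```python
def segmentation_metrics(confusion):
    """Compute OA, macro F1 and mean IOU from a square confusion matrix.

    confusion[r][c] counts pixels of reference class r predicted as class c
    (an assumed orientation).
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))
    ious, f1s = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        # A class that never occurs and is never predicted contributes 0 here.
        ious.append(tp / union if union else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if union else 0.0)
    return {
        "OA": correct / total,
        "F1_score": sum(f1s) / num_classes,
        "MIOU": sum(ious) / num_classes,
    }


# Hypothetical two-class confusion matrix.
example = [[3, 1],
           [1, 5]]
print(segmentation_metrics(example))
```

In this toy case 8 of 10 pixels are correct, so OA is 0.8, while the per-class IOUs are 3/5 and 5/7.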

Experimental Environment and Parameter Settings

(1) Experimental software and hardware environment

The experiments in this chapter are based on the 64-bit CentOS Linux 7.9 operating system and use the PyTorch deep learning framework to build the experimental models. The specific software and hardware environment is shown in Table 6.1.

(2) Experimental parameter settings

When training the models, a fixed set of hyperparameters is used in this chapter to facilitate comparative experiments, as shown in Table 6.2.


Table 6.1 Experimental software and hardware environment

| Experimental software and hardware configuration | Specific parameters |
| --- | --- |
| CPU | Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20 GHz |
| GPU | NVIDIA GeForce RTX 2080Ti |
| Memory | 64 GB |
| Operating system | CentOS Linux release 7.9.2009 (Core) |
| Python version | 3.7.11 |
| PyTorch version | 1.9.1 |
| CUDA version | 12.0 |

Table 6.2 Experimental parameter settings (without prior knowledge)

| Parameter name | Parameter value |
| --- | --- |
| Input data | RGBN+DEM+SAR |
| Epoch | 100 |
| Learning rate | 0.0001 |
| Weight decay coefficient | 0.0001 |
| Optimizer | Adam |
| Loss function | Cross entropy loss function |
| Batch size | 8 |

6.2.2 Remote Sensing Lithology Semantic Segmentation Method Based on Prior Knowledge

Deep learning models can automatically learn the feature representation of the input data and map it directly to the output, without manually designed feature extractors or other intermediate steps. This learning method is called end-to-end learning. Although the end-to-end approach makes models easy to build and use and delivers impressive results, the data distributions and nonlinear transformations in the high-dimensional space of a model’s intermediate layers are very complex, which makes it difficult to explain the basis of the model’s decisions intuitively, and the generalization ability in some application areas is poor. To improve the generalization ability of deep learning models, researchers in many fields now try to integrate domain knowledge into the models, which also makes the outputs easier to understand and interpret. For example, in geology, existing geological data such as rock types, regional background information, geochemistry, and geophysical anomaly information are converted into feature representations that describe geological systems and geological processes and improve the interpretability of the model. In addition, many geological problems have obvious constraints, such as the spatial distribution of rock masses and the dependence, symbiosis, and transformation of minerals, which can be used to limit the learning space of the model, making the learning process more consistent with geological laws.

With the development of database technology and the establishment and release of geological spatial databases, small-scale geological data have become easier to obtain. However, large-scale geological data are in short supply because of the heavy workload, long production cycle, and high consumption of human and material resources. Small-scale geological data cover a wide spatial range and contain fewer details of geological information. In contrast, large-scale geological data result from in-depth study of a certain area or geological issue and contain more detailed information. To meet different application needs, it is necessary to obtain geological data with as much detail and as high accuracy as possible. Existing small-scale geological data provide a preliminary identification and interpretation of the geological units in a region and can guide the identification, detailed description, and labeling of geological units at large scale, which is also an important means of carrying out large-scale geological surveys. Having obtained the 1:200,000 and 1:50,000 geological maps of the study area, this chapter preprocesses the 1:200,000 geological data as prior knowledge and adapts it to the input of the deep learning model, in order to improve the feature learning ability of the model and the accuracy and generalization of its results.

6.2.2.1 Methods to Integrate Prior Knowledge

Prior knowledge refers to the preliminary understanding of things, based on observation, existing empirical data, domain knowledge, objective laws, and so on, that is available before a task starts and that guides subsequent work. Prior knowledge plays a vital role in human cognition of the world. It is stored in the huge neural network of the human brain and is constantly updated and optimized over time. When new things are encountered, analogy and induction are used to summarize the similarities and patterns between different things. In remote sensing geological research, historical records and results of regional geological studies are important basic data for further research and can save considerable human and material resources. For example, existing geological data record the lithology types, rock distribution characteristics, geological structures, structural evolution, and other information for an area at a certain scale. When remote sensing images are interpreted, the geological map and the remote sensing images are overlaid and compared, so that the distribution of lithology and landforms can be identified quickly and the accuracy and efficiency of analysis improved. Owing to geological, physical, and chemical conditions, minerals exhibit dependent, symbiotic, and mutually exclusive relationships. For example, in sedimentary rocks, sandstone and mudstone often coexist, and sandstone may contain clay minerals. Such information can support remote sensing geological interpretation.


In machine learning, the attention mechanism can be seen as an application of prior knowledge; it is essentially a feature selection mechanism. The learning of the model is constrained by prior knowledge of the related task, which strengthens the attention paid to certain features and improves the accuracy of the model. In remote sensing image classification tasks, prior knowledge such as the spectral, shape, and texture characteristics of the objects of interest is used to adjust the model’s attention to different areas, thereby improving recognition accuracy. Integrating prior knowledge into deep learning models can improve their robustness and generalization ability, which is of great significance for the geological interpretation of remote sensing images and is an important development direction for deep learning-based geological interpretation. This section summarizes two ways of combining prior knowledge with deep learning models for geological interpretation: feature extraction and fusion, and multi-task learning.

Feature Extraction and Fusion

Geological data come in many types and forms, including text and charts. Feature engineering is therefore needed to convert these data into digital form for use by deep learning models. For example, rock types, mineral compositions, and lithological boundaries are extracted from geological maps; resistivity and magnetization anomalies are extracted from geophysical data; mineralization anomalies are extracted from geochemical data; and elevation and slope are extracted from terrain data. After extraction, the features need to be filtered and reduced in dimension based on expert knowledge to reduce model complexity and computational overhead. For geological map data, image processing techniques such as edge detection, texture enhancement, filtering, and denoising can also be applied to improve image quality and help the model extract more features of geological bodies. In addition, the data can be augmented based on prior geoscience knowledge. For example, image data can be rotated, translated, or perturbed with simulated noise, or transformed from the spatial domain to the frequency domain, to obtain more data samples. For geophysical exploration data, the Laplace transform can be used, and the Hough transform and other methods can be used to extract linear and circular features according to physical laws, so as to better identify geological bodies and enhance the robustness and generalization ability of the model. After the various geological data have been digitized, they can be fused using deep learning methods. The fusion methods include data-level fusion, feature-level fusion, and decision-level fusion, as mentioned above.


Multi-task Learning

In remote sensing geological interpretation, remote sensing data, digitized geological maps, geophysical and geochemical data, core data, and so on jointly describe the regional geological background. When the data include remote sensing data and other auxiliary geological data, directly concatenating them for lithology classification may cause information loss and confusion. Multi-task learning is a machine learning technique that improves efficiency by learning multiple tasks in a single model, where the tasks share underlying features such as shape and texture. For remote sensing lithology classification, multi-task learning can combine multi-source data for comprehensive analysis, treat prior knowledge as an additional task, and use shared features to perform lithology classification, topography and landform classification, and so on. These tasks differ, but they are also related. For example, in geological evolution, terrain has a certain feedback effect on the formation of lithology: in steep mountainous terrain, rock types such as granite and porphyry are more likely to be exposed at the surface, while sedimentary rocks such as sandstone and mudstone are more likely to be covered beneath the surface. Therefore, a terrain classification task can assist lithology classification and improve classification accuracy. In addition, multi-task learning needs to consider the correlation between tasks when designing the loss functions that optimize the learning of different tasks. It is usually necessary to adjust the weights of the loss functions of different tasks to reflect the importance of each task and improve the generalization ability of the model.
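The weighting of task losses described above can be illustrated with a very small sketch. The task names, loss values, and weights below are hypothetical placeholders chosen only to show the mechanics; the chapter does not specify them.

```python
def multitask_loss(task_losses, task_weights):
    """Weighted sum of per-task losses; both arguments are dicts keyed by task name."""
    if task_losses.keys() != task_weights.keys():
        raise ValueError("every task needs exactly one loss and one weight")
    return sum(task_weights[name] * task_losses[name] for name in task_losses)


# Hypothetical losses from one training step: a main task and an auxiliary task.
losses = {"lithology": 1.20, "boundary": 0.40}
weights = {"lithology": 1.0, "boundary": 0.5}
total = multitask_loss(losses, weights)
print(total)  # approximately 1.4
```

Lowering the weight of the auxiliary task reduces its influence on the shared features, which is one simple way to express the relative importance of tasks.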

6.2.2.2 Model Design

In remote sensing geology, small-scale geological maps such as 1:200,000 maps give a rough description of the geology of the whole region. They have been verified by field surveys and are important reference materials for further geological surveys. The formation lithology contour information they contain is valuable for remote sensing lithology semantic segmentation tasks. This study improves the FCN model so that it can effectively use the 1:200,000 label data and improve the accuracy and generalization ability of lithology semantic segmentation results. Specifically, three modules are designed: a multi-scale boundary information extraction module, a boundary label generation module, and a shallow feature encoding module, as shown in Fig. 6.7. The feature extraction module of VGG (Simonyan & Zisserman, 2014) has 5 main stages, each of which performs convolution, batch normalization, pooling, and other operations. The shallow part of the feature extraction network retains rich boundary information, from which more advanced features such as shapes can be produced through transformation and combination. Therefore, the encoding module encodes these features to guide the network to learn the underlying spatial information. This process is an additional two-class semantic segmentation task whose purpose is to learn the boundary information of the 1:200,000 labels. In addition, the ASPP structure is used to extract the multi-scale boundary information of the 1:200,000 labels, which is integrated into the model to improve semantic segmentation performance.

Fig. 6.7 Prior knowledge fusion multi-task network

The Sobel edge detection operator and the Laplacian operator are also used in the above model. Sobel is a linear operator that detects edges by calculating the image gradient. It computes an approximate gradient of the image grayscale in the horizontal and vertical directions. It is simple and efficient to calculate, but it cannot detect edges in other directions. Assuming that the image to be processed is I, the derivatives in the two directions are calculated as follows:

$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I \tag{6.7}$$

where $G_x$ is the horizontal change of the image and $G_y$ is the vertical change of the image. Combining the horizontal and vertical gradients, the approximate gradient magnitude of a pixel can be calculated by Eq. 6.8:

$$G = \sqrt{G_x^2 + G_y^2} \tag{6.8}$$

To improve efficiency, this is usually replaced by Eq. 6.9:

$$|G| = |G_x| + |G_y| \tag{6.9}$$
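Eqs. 6.7 to 6.9 can be tried on a small grayscale image with the plain-Python sketch below. It is an illustration under assumptions: the image is a nested list, only the interior (valid) positions are filtered, the templates are applied as sliding-window sums without flipping (which does not affect the magnitude), and the function names are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel):
    """Slide a 3 x 3 template over the image and return the valid responses."""
    height, width = len(image), len(image[0])
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(3) for b in range(3))
             for j in range(width - 2)]
            for i in range(height - 2)]


def sobel_magnitude(image, fast=False):
    """Gradient magnitude by Eq. 6.8, or by the cheaper Eq. 6.9 when fast=True."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    if fast:
        return [[abs(x) + abs(y) for x, y in zip(rx, ry)] for rx, ry in zip(gx, gy)]
    return [[math.hypot(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(gx, gy)]


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 1, 1] for _ in range(4)]
print(sobel_magnitude(step))             # [[4.0, 4.0], [4.0, 4.0]]
print(sobel_magnitude(step, fast=True))  # [[4, 4], [4, 4]]
```

Because the edge is purely vertical, the vertical gradient is zero and both formulas give the same magnitude; they differ when both components are non-zero.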


After the gradient map of the image is calculated, a threshold can be applied to obtain the image edges. The Laplacian operator detects edges by performing second-order differentiation on the image. It is rotation invariant and can detect more comprehensive edge information. It can be approximated by the following convolution template:

$$G = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \tag{6.10}$$

The response of this operator is the same in the four directions (up, down, left, and right); that is, it has no directionality. The Laplacian operator is based on the second-order derivative, and a boundary can be judged by whether the derivative is equal to zero. Because a zero derivative is only a necessary condition for an extreme value in practice, an appropriate threshold must be specified. When the absolute value of the response at a point exceeds the specified threshold, the point can be taken as a boundary point.
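The thresholding rule just described can be sketched with the template of Eq. 6.10 applied to a small label map. This is a simplified plain-Python illustration, not the chapter's pipeline: the map is a nested list of class indices, only interior positions are evaluated, and the threshold value passed in the example is an arbitrary illustrative choice.

```python
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def laplacian_boundary(label_map, threshold):
    """Mark interior pixels whose absolute Laplacian response exceeds the threshold."""
    height, width = len(label_map), len(label_map[0])
    boundary = []
    for i in range(height - 2):
        row = []
        for j in range(width - 2):
            response = sum(LAPLACIAN[a][b] * label_map[i + a][j + b]
                           for a in range(3) for b in range(3))
            row.append(1 if abs(response) > threshold else 0)
        boundary.append(row)
    return boundary


# Two units side by side: class 1 on the left, class 2 on the right.
labels = [[1, 1, 1, 2, 2] for _ in range(3)]
print(laplacian_boundary(labels, threshold=0.5))  # [[0, 1, 1]]
```

The response is zero inside the homogeneous unit and non-zero on both sides of the contact, so the pixels adjacent to the boundary are marked with 1.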

Multi-scale Information Extraction

In the lithology classification task, lithology is generally distributed continuously over large areas, so the boundaries between lithologies are usually a key feature for distinguishing them. Geological maps provide the approximate location and regularity of the distribution of geological units over a large area, and multi-scale boundary information provides continuous distribution information at different scales. Therefore, this chapter uses the ASPP structure to extract multi-scale boundary information and improve the feature extraction ability of the model, as shown in Fig. 6.8. The ASPP structure contains 5 parallel branches. Four of them use dilated (atrous) convolution with dilation rates of 1, 3, 6, and 12 to extract multi-scale lithology distribution information. The fifth branch uses the Sobel operator to calculate the gradients in the X and Y directions, takes their absolute values, and adds them together. Finally, the five feature maps are added, so that the Sobel branch refines the learned multi-scale boundary information, and the result is integrated into the model.
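To give a feel for how the dilation rates change the spatial extent covered by each branch, the sketch below evaluates the standard relation between kernel size, dilation rate, and effective kernel extent. The 3 × 3 kernel size is an assumption for illustration; the text does not state the kernel size of the ASPP branches.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


rates = [1, 3, 6, 12]
spans = [effective_kernel_size(3, r) for r in rates]  # assumed 3 x 3 kernels
print(spans)  # [3, 7, 13, 25]
```

Under this assumption the four branches see neighbourhoods of 3, 7, 13, and 25 pixels per side with the same number of weights, which is how the structure gathers boundary context at several scales.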

Fig. 6.8 Multi-scale boundary information extraction

Boundary Label Generation

From an image perspective, the 1:200,000 geological map can only be used as a label, and its pixel values represent only gray values. They have no one-to-one correspondence with rock categories and are therefore not directly meaningful for lithology classification. However, small-scale geological maps provide overall lithology distribution information. Although this information is relatively simplified or weak, it can still be used as prior knowledge to assist fine-grained semantic segmentation. Therefore, this chapter generates a binary boundary label map from the 1:200,000 geological map. First, the Laplacian operator is applied to the label map in multiple steps to detect edges; then a 1 × 1 convolution adjusts the number of channels and fuses the information; finally, a PyTorch API function is used to truncate the values at zero, and a threshold of 0.1 is applied. This process converts the result into a binary image in which boundary positions are 1 and the background is 0, as shown in Fig. 6.9. In addition, because boundary pixels are far fewer than background pixels, boundary learning suffers from a serious class imbalance problem. Boundary learning can therefore adopt strategies that alleviate class imbalance, such as a weighted binary cross-entropy loss function or the Dice loss function. The weighted cross-entropy loss function is given in Eq. 6.11:

$$L = -\sum_{i=1}^{n} \omega_i\, y_i \log(p_i), \quad \omega_i = \frac{N - N_i}{N} \tag{6.11}$$


Fig. 6.9 Boundary label generation

where $\omega_i$ represents the weight of class i, N represents the total number of pixels, $N_i$ represents the number of pixels of class i, $p_i$ represents the predicted probability of class i, and $y_i$ represents the label of class i. In this way, when a category is rare, its weight is increased, allowing the model to pay more attention to that category and improve recognition accuracy. In summary, this module obtains binary boundaries by detecting edges and treats the learning of boundary information as an additional learning task. During the training phase, underlying features can be shared and the model can process multiple tasks simultaneously, thereby improving segmentation accuracy.
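The class weights and loss of Eq. 6.11 can be sketched in plain Python as follows. Two points are assumptions beyond the equation: the labels are given as class indices (so only the term of the true class survives the sum over classes), and the per-pixel losses are averaged over all pixels. The function names and toy values are hypothetical.

```python
import math


def class_weights(labels, num_classes):
    """Weights from Eq. 6.11: (N - N_i) / N for each class i."""
    total = len(labels)
    return [(total - labels.count(c)) / total for c in range(num_classes)]


def weighted_cross_entropy(probs, labels, weights):
    """Mean weighted cross entropy; probs[p][c] is the predicted probability of class c."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Four pixels: three background (0) and one boundary (1).
labels = [0, 0, 0, 1]
weights = class_weights(labels, num_classes=2)
print(weights)  # [0.25, 0.75]

probs = [[0.5, 0.5]] * 4  # an uninformative prediction
print(weighted_cross_entropy(probs, labels, weights))
```

The rare boundary class receives three times the weight of the background class, so an error on a boundary pixel contributes three times as much to the loss.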

Shallow Feature Encoding

In deep learning models, the shallow layers of a network generally have fewer parameters and smaller receptive fields and have undergone little pooling or similar operations, so they better retain the original boundary details of the image. Therefore, this chapter encodes and outputs the second-stage features extracted by the VGG (Simonyan & Zisserman, 2014) model to enhance the feature representation and calculate the boundary loss. As shown in Fig. 6.10, the feature map passes through a 3 × 3 convolution layer, a batch normalization layer, and a ReLU activation function, after which a 1 × 1 convolution adjusts the number of output channels and the loss is calculated.


Fig. 6.10 Shallow feature encoding module

Table 6.3 Experimental parameter settings (with prior knowledge)

| Parameter name | Parameter value |
| --- | --- |
| Input data | RGBN+DEM+SAR+1:200,000 label |
| Epoch | 100 |
| Learning rate | 0.0001 |
| Weight decay coefficient | 0.0001 |
| Optimizer | Adam |
| Loss function | Cross entropy loss function |
| Batch size | 8 |

6.2.2.3 Experimental Design

Evaluation Metrics

For the specific calculation methods, refer to the “Evaluation Metrics” part of Sect. 6.2.1.2.

Experimental Environment and Parameter Settings

(1) Experimental software and hardware environment

For details, see Table 6.1.

(2) Experimental parameter settings

When training the model, a fixed set of hyperparameters is used in this chapter to facilitate comparative experiments, as shown in Table 6.3.


6.3 Results and Discussion

6.3.1 Evaluation of Test Set Accuracy of Multi-modal Data Adaptive Fusion Method

6.3.1.1 Overall Accuracy Evaluation

To compare model performance and the effectiveness of data fusion, the classic network models used in this chapter include FCN (Long et al., 2015), U-Net (Ronneberger et al., 2015), PSPNet (Zhao et al., 2017), DeepLabV3+ (Chen et al., 2018), Swin-Unet (Cao et al., 2022), and FTUnetFormer (Wang et al., 2022), and the feature extraction network is uniformly based on VGG (Simonyan & Zisserman, 2014). Model performance was verified by combining the data fusion module (DFM) with each classic semantic segmentation model. OA, F1_score, and MIOU are used as evaluation metrics. Each group conducts 5 repeated experiments, removes the highest and lowest values, and takes the mean and standard deviation of the remaining three results as the final result. As shown in Table 6.4, the segmentation performance of FCN, U-Net, and FTUnetFormer is markedly better than that of the other models, with MIOUs reaching 45.14 ± 0.21%, 42.04 ± 0.50%, and 43.14 ± 0.10%, respectively. This is due to the use of feature fusion in these model structures. Since lithology is high-level semantic information, its identification relies on feature fusion. The high-level features of the model distinguish the differences between lithologies, while the low-level features capture differences in texture within a lithology. Combining the two expresses rock characteristics more comprehensively and thereby achieves higher recognition accuracy. Swin-Unet is based on U-Net and uses the Swin Transformer for encoding and decoding. Although it extracts features with sliding windows as a CNN does, its fixed block size and number of layers make it difficult to extract rock features at different scales. The method may therefore not be accurate enough, and its large number of parameters can lead to overfitting.

In addition, spatial information is more important when remote sensing images are used to classify lithology, and the Swin Transformer model does not adequately extract detailed spatial information. In a horizontal comparison, the data fusion module proposed in this chapter performs well on most models. Among them, the MIOU of FCN, U-Net, and FTUnetFormer increased by approximately 2.14%, 1.26%, and 1.21%, respectively. This shows that progressive data fusion and attention mechanisms can effectively improve the lithological feature extraction capability of the model and thereby improve recognition accuracy.
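The reporting protocol described above (five runs, drop the highest and lowest, summarize the remaining three) can be sketched as follows. The run values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form is used.

```python
import statistics


def trimmed_summary(scores):
    """Drop the highest and lowest of five scores; return mean and standard deviation."""
    if len(scores) != 5:
        raise ValueError("the protocol described in the text uses exactly five runs")
    kept = sorted(scores)[1:-1]
    # Sample standard deviation (an assumption; see the note above).
    return statistics.mean(kept), statistics.stdev(kept)


# Hypothetical MIOU values (%) from five repeated runs.
runs = [44.9, 45.1, 45.3, 45.6, 44.2]
mean, std = trimmed_summary(runs)
print(f"{mean:.2f} ± {std:.2f}")  # 45.10 ± 0.20
```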

Table 6.4 Comparison of overall performances of data fusion methods (%, bold means the best score)

| Model/indicator | OA (Data level) | OA (DFM) | F1_score (Data level) | F1_score (DFM) | MIOU (Data level) | MIOU (DFM) |
| --- | --- | --- | --- | --- | --- | --- |
| FCN | 67.29 ± 0.52 | 67.35 ± 0.43 | 58.78 ± 0.29 | 60.99 ± 0.29 | 43.00 ± 0.19 | 45.14 ± 0.21 |
| U-Net | 64.89 ± 0.58 | 65.72 ± 0.84 | 56.89 ± 0.24 | 57.76 ± 0.61 | 40.78 ± 0.26 | 42.04 ± 0.50 |
| PSPNet | 61.6 ± 0.74 | 62.35 ± 0.97 | 56.02 ± 1.22 | 55.91 ± 0.80 | 40.4 ± 0.70 | 40.41 ± 0.21 |
| DeepLabV3+ | 59.84 ± 1.01 | 59.81 ± 0.83 | 52.36 ± 1.53 | 51.61 ± 1.56 | 35.91 ± 1.42 | 35.89 ± 1.43 |
| Swin-Unet | 63.05 ± 1.67 | 63.65 ± 0.43 | 51.93 ± 1.19 | 52.77 ± 0.58 | 37.13 ± 1.27 | 37.53 ± 0.05 |
| FTUnetFormer | 66.50 ± 0.49 | 65.64 ± 0.65 | 57.64 ± 0.46 | 59.03 ± 0.11 | 41.93 ± 0.07 | 43.14 ± 0.10 |

6.3.1.2 Single Class Accuracy Evaluation

To examine the performance of the models on the dataset and analyze the classification results for each category, this chapter selects the three models with better overall performance, FCN, FTUnetFormer, and U-Net, and lists the IOU results of all categories in Table 6.5. As can be seen from the table, the module proposed in this chapter markedly improves the categories with small samples. For example, the IOU of granite porphyry in the three models improved by 9.04%, 8.04%, and 3.14%, respectively; the IOU of hornfels increased by 2.34%, 7.87%, and 2.89%, respectively; and the IOU of sandstone increased by 7.53%, 1.07%, and 0.91%, respectively. For lithologies with many samples, such as schist and tuff, the segmentation results are slightly improved, while the results for Quaternary, granite, and other categories change little. Therefore, the overall segmentation results are improved. Although data-level fusion retains a large number of original features, it also introduces redundant information. For lithology categories with few samples, the features are inherently sparse, and redundant information masks or weakens these sparse features, making it difficult to extract useful lithology features. The module proposed in this chapter uses the FRM module to refine the fusion of the two types of data, obtains detailed information at different scales in a multi-scale manner, and reduces the impact of redundant information. In addition, the channel attention mechanism increases the weight of important channels and improves the efficiency of feature extraction. Finally, cross-dimensional interactive attention strengthens the connection between spatial features and channel features to further enhance the diversity of features.
Table 6.5 Comparison of IOU results of FCN, FTUnetFormer and U-Net with data-level fusion and with DFM, by category (%)

Category          FCN                         FTUnetFormer                U-Net
                  Data level    DFM           Data level    DFM           Data level    DFM
Quaternary        56.34 ± 0.59  53.60 ± 0.26  51.52 ± 2.19  51.32 ± 1.72  55.10 ± 0.72  55.52 ± 0.96
Granite porphyry  46.10 ± 1.71  55.14 ± 1.79  49.74 ± 4.62  57.78 ± 1.22  42.12 ± 5.26  45.26 ± 4.91
Granite           51.12 ± 1.62  50.63 ± 1.55  50.39 ± 1.14  46.66 ± 0.53  48.55 ± 0.81  52.16 ± 0.72
Schist            54.40 ± 0.93  54.71 ± 0.40  53.63 ± 0.33  51.25 ± 1.77  52.90 ± 1.10  53.70 ± 2.01
Hornfels          25.93 ± 2.41  28.27 ± 2.49  19.28 ± 4.73  27.15 ± 3.44  22.93 ± 4.07  25.82 ± 3.32
Sandstone         28.05 ± 4.85  35.58 ± 1.49  30.09 ± 0.57  31.16 ± 2.61  28.58 ± 3.03  29.49 ± 4.26
Tuff              59.62 ± 2.13  60.95 ± 0.45  55.66 ± 2.73  60.20 ± 1.62  58.59 ± 0.72  57.69 ± 1.64
Diorite           21.84 ± 3.13  20.68 ± 1.76  21.83 ± 1.39  23.19 ± 4.22  22.39 ± 2.82  18.56 ± 3.31
Granulite         45.32 ± 0.62  46.70 ± 0.97  44.23 ± 1.33  39.65 ± 1.12  35.90 ± 1.94  40.14 ± 3.80

6.3 Results and Discussion


To analyze how the lithologies are classified, this chapter provides the confusion matrices of FCN, FTUnetFormer, U-Net and other models, as shown in Fig. 6.11. The three confusion matrices (a), (b), and (c) show that granite porphyry is easily misclassified as tuff, hornfels as schist, sandstone as granite porphyry or tuff, and diorite as granite. After the data fusion module is applied, FCN’s recognition accuracy for schist, hornfels, sandstone and metamorphic granite increases by 11.1%, 4.2%, 15.9% and 9.8% respectively. FTUnetFormer’s recognition accuracy for granite porphyry and schist increases by 15.1% and 5.4% respectively. U-Net’s recognition accuracy for granite porphyry, hornfels, and sandstone increases by 5.9%, 20.5%, and 9.4% respectively.
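The indices used in this section (OA, per-category IOU, MIOU and F1_score) can all be derived from a confusion matrix such as those in Fig. 6.11. The following sketch shows one common way to do so. The function name `segmentation_metrics` and the toy matrix are hypothetical, and macro-averaging F1_score over categories is an assumption, since the averaging convention is not restated here.

```python
def segmentation_metrics(confusion):
    """Compute OA, per-class IOU/F1/recall, MIOU and macro F1.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly assigned to c
        union = tp + fp + fn
        iou = tp / union if union else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if union else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        per_class.append({"iou": iou, "f1": f1, "recall": recall})

    correct = sum(confusion[c][c] for c in range(n))
    return {
        "oa": correct / total if total else 0.0,
        "miou": sum(m["iou"] for m in per_class) / n,
        "f1": sum(m["f1"] for m in per_class) / n,  # macro average (assumption)
        "per_class": per_class,
    }


# Toy two-class confusion matrix (made-up counts, not from the chapter).
metrics = segmentation_metrics([[8, 2], [1, 9]])
print(metrics["oa"], metrics["miou"], metrics["f1"])
```

Because IOU divides by the union of predicted and true pixels, a category with few samples is penalized heavily by every false positive and false negative, which is why the per-category IOU is informative for minority lithologies.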

6.3.1.3 Ablation Experiment

To study the contribution of each DFM module to data fusion, ablation experiments were conducted on DFM using the best-performing FCN model. The ECA module is tightly coupled with the rest of DFM and cannot be ablated, so the feature residual module (FRM) and the triple attention module (TA) were each removed in turn and the results compared. As can be seen from Table 6.6, using the FRM module increases MIOU by 0.50% and F1_score by 0.55%. This is because multi-scale feature refinement extracts more detailed feature information for distinguishing different categories. Using the TA module increases MIOU by 0.81% and F1_score by 0.82%. This shows that cross-dimensional interaction better captures the spatial dependence between pixels and the semantic dependence between channels, improving the semantic expressiveness of the features and the segmentation accuracy. To show the role of each module directly, the IOU results of each category are given in Table 6.7. The table shows that the FRM and TA modules make the model pay more attention to minority categories. For example, the IOU of granite porphyry increases by 4.78% and 6.22% respectively, and the IOU of hornfels increases by 3.70% and 2.31% respectively. IOU reflects the classification of minority categories well, and the overall results also illustrate the superiority of the model. Therefore, information interaction across multi-scale features, channels, and spatial dimensions is very important for lithology classification. As mentioned above, multi-source remote sensing data provide complementary information for remote sensing lithology identification. This chapter therefore also conducts ablation experiments on the data sources to show the role of multi-source data.
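The gains quoted above are simple differences between the full DFM and each ablated variant. The short sketch below recomputes them from the mean values in Table 6.6; the helper name `module_gain` is hypothetical and the standard deviations are ignored.

```python
# Mean scores (%) copied from Table 6.6.
table_6_6 = {
    "Remove FRM": {"OA": 67.78, "F1_score": 60.44, "MIOU": 44.64},
    "Remove TA": {"OA": 67.39, "F1_score": 60.17, "MIOU": 44.33},
    "DFM": {"OA": 67.35, "F1_score": 60.99, "MIOU": 45.14},
}


def module_gain(table, full="DFM"):
    """Return full-model score minus ablated score for every index."""
    gains = {}
    for name, scores in table.items():
        if name == full:
            continue
        gains[name] = {
            index: round(table[full][index] - value, 2)
            for index, value in scores.items()
        }
    return gains


for variant, gain in module_gain(table_6_6).items():
    print(variant, gain)
```

With the values as printed in Table 6.6, the MIOU and F1_score differences reproduce the figures quoted in the text, while the same subtraction on the OA column gives small negative values.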
As shown in Table 6.8, the model using the “optical + DEM” data combination method has better recognition results than using only optical data. OA, F1_score, and MIOU increased by 1.95%, 3.12%, and 2.57% respectively. The accuracy improvement using the “optical + SAR” method is not obvious. This may be due to the fact that SAR images are single-band, have limited information and


Fig. 6.11 Confusion matrices

Table 6.6 Comparison of the DFM model and DFM models without either the FRM module or the TA module (%)

Model         OA            F1_score      MIOU
Remove FRM    67.78 ± 0.54  60.44 ± 0.71  44.64 ± 0.47
Remove TA     67.39 ± 0.73  60.17 ± 0.59  44.33 ± 0.74
DFM           67.35 ± 0.43  60.99 ± 0.29  45.14 ± 0.21


Table 6.7 Comparison of IOU results of each category in ablation experiments (%)

Category          Remove FRM    Remove TA     DFM
Quaternary        56.39 ± 0.51  56.30 ± 1.68  53.60 ± 0.26
Granite porphyry  50.36 ± 2.95  48.92 ± 4.03  55.14 ± 1.79
Granite           51.39 ± 0.85  50.22 ± 2.25  51.63 ± 0.55
Schist            54.39 ± 1.25  53.25 ± 1.32  54.71 ± 0.40
Hornfels          24.57 ± 4.89  25.96 ± 1.48  28.27 ± 2.49
Sandstone         36.28 ± 1.35  31.03 ± 5.02  35.58 ± 1.49
Tuff              60.92 ± 0.16  61.02 ± 1.63  60.95 ± 0.45
Diorite           23.16 ± 5.03  23.68 ± 4.32  20.68 ± 1.76
Granulite         44.03 ± 2.84  46.61 ± 0.89  47.70 ± 0.97

have a lot of noise. The “DEM+SAR” combination performs worst: it lacks the four bands of the optical images, and the data carry too little information for the model to extract effective features. When the three types of data are combined, the MIOU improves slightly, indicating that the SAR data are also useful. To illustrate what multi-source remote sensing data contribute to identifying particular lithologies, the IOU results of all categories are given in Table 6.9. Compared with using only optical images, adding DEM data improves the IOU of every category except granite porphyry and granite. In the DEM, granite porphyry and other lithologies occupy similar topographic settings, which may confuse the model. When SAR data are added, the accuracy of granite porphyry improves. This is because SAR pixel values reflect the scattering intensity of ground objects, and the scattering intensity of rocks is closely related to their physical properties. For example, granite porphyry and the Quaternary system can show different scattering characteristics because of factors such as porosity and water content. Granite porphyry appears bright white on SAR images and can be well distinguished from the surrounding lithology.

Table 6.8 Comparison of multi-source data ablation experiment results (%)

Data source       OA            F1_score      MIOU
Optics            64.13 ± 1.73  55.79 ± 0.78  40.19 ± 0.93
Optics+DEM        66.08 ± 0.51  58.91 ± 0.19  42.76 ± 0.06
Optics+SAR        63.50 ± 0.63  55.92 ± 0.63  40.34 ± 0.80
DEM+SAR           40.42 ± 0.49  13.17 ± 0.24  20.91 ± 0.52
Optics+DEM+SAR    67.29 ± 0.52  58.78 ± 0.29  43.00 ± 0.19


Table 6.9 Comparison of IOU results in multi-source data ablation experiments (%)

Category          Optics        Optics + DEM  Optics + SAR  DEM + SAR     Optics + DEM + SAR
Quaternary        52.43 ± 0.59  55.95 ± 1.43  51.02 ± 0.65  31.13 ± 0.55  56.34 ± 0.59
Granite porphyry  49.21 ± 0.83  38.47 ± 2.88  51.42 ± 3.28   6.03 ± 3.37  46.10 ± 1.71
Granite           48.78 ± 3.39  45.76 ± 1.48  46.79 ± 0.95  29.57 ± 0.85  51.12 ± 1.62
Schist            52.17 ± 2.24  53.19 ± 0.59  50.48 ± 0.41  29.89 ± 0.60  54.40 ± 0.93
Hornfels          19.48 ± 7.17  26.22 ± 2.78  15.82 ± 0.93   1.28 ± 1.80  25.93 ± 2.41
Sandstone         30.88 ± 5.17  33.01 ± 1.29  27.29 ± 1.63   0.00 ± 0.00  28.05 ± 4.85
Tuff              55.23 ± 2.43  57.99 ± 0.62  57.47 ± 1.23  20.58 ± 3.13  59.62 ± 2.13
Diorite           19.44 ± 1.79  26.20 ± 3.34  22.36 ± 3.13   0.00 ± 0.00  21.84 ± 3.13
Granulite         34.05 ± 3.38  45.01 ± 2.66  40.41 ± 3.48   0.00 ± 0.00  45.32 ± 0.62

Therefore, multi-source remote sensing data is of great significance for lithology classification. By providing multi-faceted information, the accuracy and reliability of lithology identification can be improved.

6.3.1.4 District-Wide Visual Evaluation

In addition, this chapter visualizes the classification results on the test set, as shown in Fig. 6.12. Panels (a), (b), and (c) show that the classic models miss some lithologies, such as tuff, schist and sandstone, whereas the data fusion module recovers some of these categories and also mitigates misclassification to a certain extent. In panels (d), (e), and (f), among others, the model recognizes Quaternary boundaries poorly. This is probably because terrain, vegetation and other factors blur the boundaries, which can also be seen in the original image: gully erosion and accumulation in the Quaternary system make the boundary very complicated. According to the statistics in Chap. 2, Table 2.12, hornfels accounts for only 1.28% of the entire study area in this dataset, so the model pays more attention to categories with more samples and the classification results in (g) are poor. However, comparing the results of all models with the original images shows that, although the predictions differ considerably from the ground-truth labels, they agree well with the original images visually. The labels may therefore need to be checked and adjusted at a later stage, which is also of great significance for applying deep learning models to remote sensing lithology interpretation.


Fig. 6.12 Test set visualization results chart

Finally, this chapter gives the full-region interpretation results based on the best-performing FCN model, as shown in Fig. 6.13. Since most of the data are used for training, the results are for illustration and simple analysis only. The figure contains a large number of fragments, which result from the mixed transition zones between geological elements and from complex lithological boundaries that are difficult to determine.


Fig. 6.13 Schematic diagram of FCN+DFM whole-area interpretation

6.3.2 Test Set Accuracy Evaluation Using Methods that Incorporate Prior Knowledge

6.3.2.1 Overall Accuracy Evaluation

The 1:200,000 geological map is digitized into a grayscale image, which can serve as both data and labels. As an experimental comparison, this chapter therefore first inputs the geological map data and the multi-source remote sensing data into the model directly through channel stacking. Second, an additional branch learns the features of the geological map data and integrates them into the backbone network by feature fusion (FCN-DB). Finally, the method designed in this chapter, based on multi-scale boundary information fusion and prior knowledge embedding through a boundary binary classification task, is used for training (FCN-PK). For each group, we conduct five repeated experiments, remove the highest and the lowest values, and take the mean and standard deviation of the remaining three results as the final result, as shown in Table 6.10. Comparing Table 6.4 with Table 6.10 shows that after the 1:200,000 lithology labels are integrated into the model as prior knowledge, the OA, F1_score, and MIOU of all models improve. Among them, the indicators of


Table 6.10 Comparison of prior knowledge fusion results (%)

Model           OA            F1_score      MIOU
FCN             68.97 ± 0.64  61.68 ± 0.13  45.51 ± 0.19
U-Net           66.03 ± 0.78  58.82 ± 0.57  43.03 ± 0.43
PSPNet          65.58 ± 0.36  56.18 ± 0.42  41.30 ± 0.24
DeepLabV3+      61.88 ± 0.48  54.20 ± 0.60  38.55 ± 0.36
Swin-Unet       62.47 ± 0.80  52.81 ± 0.49  37.55 ± 0.11
FTUnetFormer    66.33 ± 0.47  60.77 ± 0.24  44.75 ± 0.09
FCN-DB          68.35 ± 0.65  61.27 ± 0.43  45.55 ± 0.19
FCN-PK          68.71 ± 0.93  62.65 ± 0.19  46.62 ± 0.34

FCN increased by 1.68%, 2.9%, and 2.51% respectively, the indicators of U-Net increased by 0.39%, 1.93%, and 2.25% respectively, and the indicators of FTUnetFormer increased by 0.61%, 3.13%, and 2.82% respectively. This shows that using prior labels as data increases the diversity of the training data, helps the model grasp the distribution patterns of lithology, and avoids interference from excessive noise and irrelevant information, thus improving recognition accuracy. In contrast, when an additional branch is used to extract features from the prior labels, the accuracy does not improve significantly, and some indicators actually decrease. Although the prior label contains larger-scale lithology distribution information, separate feature extraction networks allow no information interaction between the remote sensing image and the prior label. The two inputs are fully independent, and the extra branch adds parameters, so the model becomes more complex, overfits more easily, and generalizes worse. The model in this chapter uses the ASPP structure to obtain multi-scale boundary information, which introduces only a small number of parameters. In addition, convolutions with different strides learn global features and local detail features, and finally a binary image is generated to guide the network to learn more spatial boundary information. The prior knowledge embedding method in this chapter further improves accuracy compared with direct fusion.
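The repeated-experiment protocol described in Sect. 6.3.2.1 (five runs per group, the highest and lowest values removed, mean and standard deviation of the remaining three) can be sketched as follows. The function name `trimmed_summary` and the example scores are hypothetical, and because the text does not say whether the population or the sample standard deviation is reported, the choice is exposed as a parameter.

```python
import statistics


def trimmed_summary(scores, sample=False):
    """Drop the single highest and lowest score, then summarize the rest.

    scores: results of repeated runs (five in the chapter's protocol).
    sample: if True use the sample standard deviation, otherwise the
            population standard deviation (which one the chapter uses
            is not stated, so this is an assumption).
    Returns (mean, standard deviation) of the retained scores.
    """
    if len(scores) < 4:
        raise ValueError("need at least four runs to trim two and keep two")
    kept = sorted(scores)[1:-1]  # remove the minimum and the maximum
    spread = statistics.stdev(kept) if sample else statistics.pstdev(kept)
    return statistics.mean(kept), spread


# Made-up MIOU values from five hypothetical runs.
runs = [68.0, 69.5, 68.5, 70.0, 67.0]
mean, std = trimmed_summary(runs)
print(f"{mean:.2f} ± {std:.2f}")
```

Trimming the extremes makes the reported mean less sensitive to a single unusually good or bad run, at the cost of estimating the spread from only three values.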

6.3.2.2 Single Class Accuracy Evaluation

To explore the impact of prior knowledge on the classification of each lithology, the IOUs of all categories are listed for the well-performing models FCN, FCN-DB, FTUnetFormer and FCN-PK, as shown in Table 6.11. Comparing Table 6.11 with Table 6.5 shows that the models’ recognition of tuff, metamorphic rock and other categories improves significantly. Basic boundary contours exist between tuff and granite, and between metamorphic granulite and Quaternary. After the model captures this information, it can distinguish the


Table 6.11 Comparison of IOU results of FCN, FCN-DB, FTUnetFormer and FCN-PK by category (%)

Category          FCN           FCN-DB        FTUnetFormer  FCN-PK
Quaternary        57.60 ± 0.80  57.86 ± 0.21  51.63 ± 0.62  55.97 ± 0.72
Granite porphyry  45.00 ± 3.93  50.42 ± 3.90  53.16 ± 5.34  52.33 ± 3.71
Granite           52.56 ± 0.54  51.82 ± 0.30  49.10 ± 1.01  52.71 ± 1.57
Schist            55.47 ± 2.07  54.15 ± 2.11  54.26 ± 1.53  55.14 ± 2.06
Hornfels          33.25 ± 0.52  26.36 ± 1.04  34.86 ± 0.75  31.22 ± 2.97
Sandstone         29.43 ± 2.13  29.87 ± 0.83  31.40 ± 3.51  36.30 ± 1.77
Tuff              62.12 ± 0.71  60.89 ± 1.12  58.66 ± 1.35  62.18 ± 0.42
Diorite           24.46 ± 3.84  27.67 ± 3.05  21.08 ± 3.66  28.34 ± 3.77
Granulite         51.67 ± 2.26  50.89 ± 1.39  48.57 ± 1.61  48.45 ± 2.54

categories. The fusion method in this chapter focuses on categories with small distribution ranges without greatly reducing the recognition accuracy of widely distributed categories. This shows that multi-scale methods can capture broad spatial information to identify widely distributed lithologies while also capturing local detail to identify sparsely distributed ones. In addition, the additional boundary task forces the model to learn boundary information that can be shared in its shallower layers, thereby improving the generalization performance of the model. To analyze how the lithologies are classified, this chapter gives the confusion matrices of the four better-performing models, as shown in Fig. 6.14. Comparing Fig. 6.14 with Fig. 6.11 shows that when prior knowledge is incorporated, the recognition accuracy of the FCN and FTUnetFormer models for porphyry, hornfels, granulite, etc. improves, but the recognition accuracy for granite porphyry decreases. This may be due to the simple label boundaries at small scales. The information at different scales may be inconsistent, and the lithology and stratigraphy may vary at small scales, making fine boundaries easy to misclassify.

6.3.2.3 Ablation Experiment

In order to explore the impact of each module on the fusion method, this chapter conducted ablation experiments on the boundary information extraction module and boundary label generation module. The results are shown in Table 6.12. As can be seen from Table 6.12, when multi-scale boundary information is extracted and integrated into the model, OA increases by 0.89%, F1_score increases by 0.14%, and MIOU increases by 0.29%. This is because the use of multi-scale boundary information can help the model extract lithological context information,


Fig. 6.14 Confusion matrix

Table 6.12 Comparison of the FCN-PK model and models without either the boundary extraction module or the boundary label generation module (%)

Model                       OA            F1_score      MIOU
Remove boundary extraction  67.82 ± 0.41  62.51 ± 0.48  46.33 ± 0.56
Remove label generation     68.13 ± 0.42  61.61 ± 1.10  45.68 ± 0.84
FCN-PK                      68.71 ± 0.93  62.65 ± 0.19  46.62 ± 0.34

help the model understand the distribution of lithology, and improve the recognition effect. When the additional boundary learning task is added after the prior labels are processed, OA, F1_score, and MIOU increase by 0.58%, 1.04%, and 0.94% respectively (Table 6.12). This shows that the additional boundary task can guide the model to learn multi-scale boundary information and extract lithology category features at different distribution scales, thereby improving recognition accuracy. To show the effect of the two modules on each category, the IOU results of all categories in the ablation experiment are given in Table 6.13.
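One simple way to derive a binary boundary image from a rasterized prior label map is to mark every pixel that has a 4-neighbour of a different category. The sketch below illustrates this idea only; the chapter does not specify its boundary label generation rule, so the 4-neighbour criterion and the function name `boundary_map` are assumptions.

```python
def boundary_map(labels):
    """Return a binary map marking pixels on a category boundary.

    labels: 2D list of integer category codes (e.g. a digitized
            geological map). A pixel is set to 1 when at least one of
            its up/down/left/right neighbours has a different code.
    """
    height, width = len(labels), len(labels[0])
    boundary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and labels[ny][nx] != labels[y][x]:
                    boundary[y][x] = 1
                    break
    return boundary


# Hypothetical 3 x 3 label patch with three categories.
patch = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 3, 3],
]
for row in boundary_map(patch):
    print(row)
```

A map produced this way could serve as the target of an auxiliary boundary-versus-interior classification task alongside the main segmentation loss.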


Table 6.13 Comparison of IOU results of the FCN-PK model and models without either the boundary extraction module or the boundary label generation module, by category (%)

Category          Remove boundary extraction  Remove label generation  FCN-PK
Quaternary        54.99 ± 0.88                54.42 ± 2.13             55.97 ± 0.72
Granite porphyry  49.22 ± 4.10                52.96 ± 3.68             52.33 ± 3.71
Granite           51.71 ± 0.90                51.78 ± 1.96             52.71 ± 1.57
Schist            54.18 ± 0.89                55.04 ± 1.39             55.14 ± 2.06
Hornfels          34.39 ± 3.37                29.65 ± 2.98             31.22 ± 2.97
Sandstone         34.34 ± 1.03                29.35 ± 3.25             36.30 ± 1.77
Tuff              60.41 ± 1.03                61.39 ± 0.95             62.18 ± 0.42
Diorite           27.02 ± 2.45                28.37 ± 2.57             28.34 ± 3.77
Granulite         50.72 ± 1.59                48.18 ± 2.26             48.45 ± 2.54

As can be seen from Table 6.13, the multi-scale boundary information extraction module captures the distribution information of categories such as granite porphyry and tuff, while boundary learning greatly improves the accuracy of categories such as hornfels and granite.

6.3.2.4 District-Wide Visual Evaluation

In addition, this chapter visualizes the classification results on the test set in Fig. 6.15. Panels (a), (d), (e), and (f) show that the model identifies the basic boundary contours of different lithologies and that the method in this chapter is more consistent with the remote sensing images. Panels (b) and (c) show that granite, tuff, and diorite are easily misclassified and that their boundaries are unclear. This may be because their spectra are very similar: all three are igneous rocks with similar physical and chemical properties, and their distributions may be interleaved and mixed, which can easily confuse the model. In addition, panel (g) shows that after prior knowledge is added, the model’s recognition of schist improves further compared with the classic model. Finally, this chapter also presents the full-region interpretation results of the FCN-PK model, as shown in Fig. 6.16. Compared with Fig. 6.13, adding prior knowledge introduces certain boundary constraints. For example, a prior boundary exists between granite porphyry and tuff, so the model distinguishes the two well and the accuracy improves significantly.


Fig. 6.15 Test set visualization results


Fig. 6.16 Schematic diagram of FCN-PK whole area interpretation

6.4 Conclusion

Lithology information is difficult to extract during lithology interpretation based on remote sensing images: the descriptive power of a single data source is limited, while multi-source data contain redundant information that prevents key features from being extracted effectively. To address these problems, this study adopts two strategies, adaptive fusion of multi-modal data and embedding of prior geoscience knowledge, to advance the key technologies of “acquisition of key features of remote sensing lithology, multi-modal data fusion, and introduction of prior geoscience knowledge” for intelligent interpretation of remote sensing lithology. The main results achieved are as follows:


(1) An adaptive fusion method for multi-modal remote sensing data is proposed. Because multi-source remote sensing data are heterogeneous and diverse, directly integrating data from different sensors easily introduces redundant information that makes effective information difficult to extract. To address this problem, and building on the high resolution and rich spectral information of optical data, this study adopts a step-by-step fusion method that fuses optical data with SAR data and with DEM data separately. The method learns the contribution of optical data to the other data types and weights it with a channel attention mechanism. Finally, a triple attention mechanism mines implicit information between the spatial and channel dimensions, allowing the model to focus on effective features and reducing the interference of redundant information. Experiments show that optical data have obvious advantages over the other data and can serve as a bridge for multi-source data fusion, and that the adaptive data fusion module performs well on most models.

(2) A semantic segmentation method based on prior knowledge embedding is established. Rocks form through complex geological, physical and chemical processes, and different rocks can exhibit similar spectra. In addition, topographic relief and surface cover affect the reflectance spectrum of the surface, making it difficult to distinguish lithologies from spectral information alone. Field surveys and other methods are often needed to improve the accuracy of lithology identification. Small-scale geological maps are the basis and prerequisite for producing new large-scale geological maps and can provide important guidance and support for them. Therefore, this study preprocesses the existing geological data, extracts the available information and integrates it into the deep learning model to help the model mine the differences between lithologies and improve the accuracy of boundary recognition, thereby improving the accuracy of semantic segmentation and the generalization ability of the model. Experiments show that prior knowledge can improve the accuracy of lithology identification.

References Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoderdecoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495. Cao, H., Wang, Y., & Chen, J. (2022). Swin-unet: Unet-like pure transformer for medical image segmentation. In Computer vision—ECCV 2022 workshops, Tel Aviv, Israel, October 23–27, Proceedings, Part III, 2023 (pp. 205–218). Springer Nature Switzerland. Chen, L., Papandreou, G., & Kokkinos, I. (2017a). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848. Chen, L., Papandreou. G., & Kokkinos, I. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062.

160

6 Remote Sensing Lithology Intelligent Segmentation Based …

Chen, L., Papandreou, G., & Schroff, F. (2017b). Rethinking atrous convolution for semantic image segmentation. arXiv Preprint arXiv:1706.05587. Chen, L., Zhu, Y., & Papandreou, G. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV) (pp. 801–818). Chen, S., Yu, Y., Yang, J., Wang, N., & Meng, H. (2016). Extraction of lithology information from ASTER remote sensing data based on measured spectral index method. Journal of Jilin University (Earth Science Edition), 46(03), 938–944 (in Chinese). Clark, R., & Swayze, G. (1995). Automated spectral analysis: Mapping minerals, amorphous materials, environmental materials, vegetation, water, ice and snow, and other materials: The USGS Tricorder algorithm. Lunar and Planetary Science Conference. Cui, B., Chen, X., & Lu, Y. (2020). Semantic segmentation of remote sensing images using transfer learning and deep convolutional neural network with dense connection. IEEE Access, 8, 116744– 116755. Dong, X., Gan, F., Li, N., Yan, B., Zhang, L., Zhao, J., Yu, J., Liu, R., & Ma, Y. (2020). Fine identification of minerals in Gaofen-5 hyperspectral images. Journal of Remote Sensing, 24(04), 454–464 (in Chinese). Dong, Y., & Zhang, Q. (2019). A review of research on deep semantic feature extraction of highresolution remote sensing images based on CNN. Remote Sensing Technology and Applications, 34(01), 1–11 (in Chinese). Fu, G., Yan, J., & Zhang, K. (2017). Current status and progress of lithology identification technology. Progress in Geophysics, 32(1), 26–40. Gan, F., Wang, R., & Jiang, S. (2000). Imaging spectral remote sensing rock ore identification technology and its application based on complete spectral shape characteristics. Geological Science, (03), 376–384 (in Chinese). Han, W., Li, J., & Wang, S. (2022). Geological remote sensing interpretation using deep learning feature and an adaptive multisource data fusion network. 
IEEE Transactions on Geoscience and Remote Sensing, 60, 1–14. He, K., Zhang, X., & Ren, S. (2016). Identity mappings in deep residual networks. In Computer vision—ECCV 2016: 14th European conference, Amsterdam, The Netherlands, October 11–14, proceedings, Part IV 14 (pp. 630–645). Springer International Publishing. Hinton, G., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554. Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. Huang, G., Liu, Z., & Van Der Maaten, L. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708) Hunt, G. (1970). Visible and near-infrared spectra of minerals and rocks: I silicate minerals. Modern Geology, 1, 283–300. Hunt, G. (1971). Visible and near-infrared spectra of minerals and rocks: IV. Sulphides and sulphates. Modern Geology, 3, 1–14. Jakob, S., Bühler, B., & Gloaguen, R. (2015). Remote sensing based improvement of the geological map of the Neoproterozoic Ras Gharib segment in the Eastern Desert (NE–Egypt) using texture features. Journal of African Earth Sciences, 111, 138–147. Kemker, R., Salvaggio, C., & Kanan, C. (2018). Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS Journal of Photogrammetry and Remote Sensing, 145, 60–77. Krizhevsky, A., Sutskever, I., & Hinton, G. (2017). Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. Kruse, F., Lefkoff, A., & Boardman, J. (1992). The spectral image processing system (SIPS) software for integrated analysis of AVIRIS data. In Summaries of the 4th annual JPL airborne geoscience workshop. JPL Pub.

References

161

Li, F., Li, X., Chen, W., Dong, Y., Li, Y., & Wang, L. (2022). Automatic classification of lithology of dual-polarization SAR remote sensing images based on depth features. Earth Science, 47(11), 4267–4279 (in Chinese). Lin, G., Milan, A., & Shen, C. (2017). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1925–1934). Liu, H., Wu, K., & Xu, H. (2021a). Lithology classification using TASI thermal infrared hyperspectral data with convolutional neural networks. Remote Sensing, 13(16), 3117. Liu, S., Wang, D., Mao, Y., Song, L., Ding, R., & Liu, H. (2021b). Rock ore spectrum intelligent sensing technology and research progress in smart mines. Metal Mining, (07), 1–15 (in Chinese). Liu, S., Zhuo, J., Wu, L., & Xu, Z. (2011). Rock thermal infrared spectrum unmixing and mineral content inversion. Science and Technology Herald, 29(35), 24–27 (in Chinese). Liu, Z., Li, J., & Song, R. (2022). Edge guided context aggregation network for semantic segmentation of remote sensing imagery. Remote Sensing, 14(6), 1353. Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. (pp. 3431–3440). Maggiori, E., Tarabalka, Y., & Charpiat, G. (2016). Convolutional neural networks for large-scale remote-sensing image classification. IEEE Transactions on geoscience and remote sensing, 55(2), 645–657. Mao, Y., Ma, B., & Liu, S. (2014). Study and validation of a remote sensing model for coal extraction based on reflectance spectrum features. Canadian Journal of Remote Sensing, 40(5), 327–335. Mao, Y., Wang, D., Wang, Y., & Liu, S. (2018). Research on the determination method of BIF magnetic rate based on visible light-near infrared spectroscopy. Spectroscopy and Spectral Analysis, 38(03), 765–770 (in Chinese). 
Misra, D., Nalamada, T., & Arasanipalai, A. (2021). Rotate to attend: Convolutional triplet attention module. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 3139–3148). Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision (pp. 1520–1528). O’shea, T., & Hoydis, J. (2017). An introduction to deep learning for the physical layer. IEEE Transactions on Cognitive Communications and Networking, 3(4), 563–575. Othman, A., & Gloaguen, R. (2014). Improving lithological mapping by SVM classification of spectral and morphological features: The discovery of a new chromite body in the Mawat ophiolite complex (Kurdistan, NE Iraq). Remote Sensing, 6(8), 6867–6896. Pal, M., Rasmussen, T., & Abdolmaleki, M. (2019). Multiple multi-spectral remote sensing data fusion and integration for geological mapping. In 2019 10th workshop on hyperspectral imaging and signal processing: Evolution in remote sensing (WHISPERS) (pp. 1–5). IEEE. Pal, M., Rasmussen, T., & Porwal, A. (2020). Optimized lithological mapping from multispectral and hyperspectral remote sensing images using fused multi-classifiers. Remote Sensing, 12(1), 177. Park, J., Woo, S., & Lee, J. (2018). Bam: Bottleneck attention module. arXiv preprint arXiv:1807. 06514. Rezaei, A., Hassani, H., & Moarefvand, P. (2020). Lithological mapping in Sangan region in Northeast Iran using ASTER satellite data and image processing methods. Geology, Ecology, and Landscapes, 4(1), 59–70. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention—MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, proceedings, Part III 18 (pp. 234– 241). Springer International Publishing. Seid, A., & Suryanarayana, T. (2021). 
Identification of lithology and structures in Serdo, Afar, Ethiopia using remote sensing and Gis techniques. International Journal of Geoinformatics and Geological Science, 8(1), 27–41.

162

6 Remote Sensing Lithology Intelligent Segmentation Based …

Sekandari, M., Masoumi, I., & Pour, A. (2022). ASTER and WorldView-3 satellite data for mapping lithology and alteration minerals associated with Pb-Zn mineralization. Geocarto International, 37(6), 1782–1812. Shang, R., Zhang, J., & Jiao, L. (2020). Multi-scale adaptive feature fusion network for semantic segmentation in remote sensing images. Remote Sensing, 12(5), 872. Shebl, A., & Csámer, Á. (2021). Stacked vector multi-source lithologic classification utilizing machine learning algorithms: Data potentiality and dimensionality monitoring. Remote Sensing Applications: Society and Environment, 24, 100643. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Song, J., Gao, S., & Zhu, Y. (2019). A survey of remote sensing image classification based on CNNs. Big Earth Data, 3(3), 232–254. Szegedy, C., Liu, W., & Jia, Y.(2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–9). Tian, Q., Yu, C., & Pan, W. (2019). Evaluation of Gaofen-2 satellite image fusion method for geological applications. Science, Technology and Engineering, 19(29), 207–212 (in Chinese). Tian, T., Li, L., & Chen, W. (2021). SEMSDNet: A multiscale dense network with attention for remote sensing scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 5501–5514. Tong, J., Du, H., Zhu, F., Liu, Y., Liu, X., & Liu, S. (2016). Remote sensing detection method of gold and copper mines integrating principal component analysis and spectral angle matching. Metal Mining, (11), 119–123 (in Chinese). Wang, D., Liu, S., Mao, Y., Wang, Y., & Li, T. (2018). Thermal infrared spectroscopic analysis method of SiO_2 content in Anshan iron ore. Spectroscopy and Spectral Analysis, 38(07), 2101–2106 (in Chinese). Wang, J. (2020). 
Research on airborne hyperspectral remote sensing lithology identification technology based on machine learning. Beijing Institute of Geology of Nuclear Industry (in Chinese). Wang, L., Li, R., & Zhang, C. (2022). UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 2022(190), 196–214. Wang, R. (2008). Strategic thinking on the development of remote sensing geological technology. Remote Sensing of Land and Resources, (01), 1–12+42 (in Chinese). Wang, R., Yang, S., & Yan, B. (2007). Review of imaging spectral mineral identification methods and identification models. Remote Sensing of Land and Resources, (01), 1–9 (in Chinese). Wang, W., & Cheng, Q. (2008). Mapping mineral potential by combining multi-scale and multisource geo-information. In IGARSS 2008—2008 IEEE international geoscience and remote sensing symposium (Vol. 2, pp. II-1321–II-1324). IEEE. Wang, W., Ren, X., & Zhang, Y. (2018b). Deep learning based lithology classification using dualfrequency Pol-SAR data. Applied Sciences, 8(9), 1513. Wang, X., Chen, E., Li Z., Yao J., & Zhao, L. (2015). Multi-temporal dual-polarization synthetic aperture radar interferometry land cover classification method. Journal of Surveying and Mapping, 44(05), 533–540 (in Chinese). Wang, Z., Zuo, R., & Jing, L. (2021). Fusion of geochemical and remote-sensing data for lithological mapping using random forest metric learning. Mathematical Geosciences, 2021(53), 1125– 1145. Woo, S., Park, J., & Lee, J. (2018). Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV) (pp. 3–19). Wu, C., Li, X., & Chen, W. (2020). A review of geological applications of high-spatial-resolution remote sensing data. Journal of Circuits, Systems and Computers, 29(06), 2030006. Xie, M., Zhang, Q., & Chen, S. (2015). 
A lithological classification method from fully polarimetric SAR data using Cloude-Pottier decomposition and SVM. AOPC 2015: Optical and optoelectronic sensing and imaging technology. SPIE, 9674, 34–41.

References

163

Yang, T., Gong, H., Li, X., & Zhao, W. (2010). Application of imaging radar remote sensing geological disasters. Journal of Natural Disasters, 19(05), 42–48 (in Chinese). Yang, Y. (2019). Research on lithology classification of multi-source remote sensing data supported by machine learning. Chengdu University of Technology (in Chinese). Ye, B., Tian, S., & Cheng, Q. (2020). Application of lithological mapping based on advanced hyperspectral imager (AHSI) imagery onboard Gaofen-5 (GF-5) satellite. Remote Sensing, 12(23), 3990. Yu, C., Sun, J., Zhang, D., Zhang, Y., & Hu, Y. (2022). Lithology classification method based on multi-source remote sensing and airborne geophysical prospecting data. Geological Bulletin, 41(Z1), 210–217 (in Chinese). Yu, L. (2017). Research on remote sensing lithology information extraction based on linear mixed spectral model. Chengdu University of Technology (in Chinese). Yuan, X., Shi, J., & Gu, L. (2021). A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Systems with Applications, 169, 114417. Zhang, B. (2018). Remote sensing big data era and intelligent information extraction. Journal of Wuhan University (Information Science Edition), 43(12), 1861–1871 (in Chinese). Zhang, C., Yu, J., Hao, L., & Wang, S. (2017). Remote sensing lithology identification method based on multi-scale texture and multi-spectral images. Geological Science and Technology Information, 36(04), 236–243 (in Chinese). Zhang, H., Dana, K., & Shi, J. (2018). Context encoding for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7151–7160). Zhang, J., Lin, S., & Ding, L. (2020). Multi-scale context aggregation for semantic segmentation of remote sensing images. Remote Sensing, 12(4), 701. Zhang, W., Li, Y., Zhang, T., Gui, L., & Zhou, C. (2019). 
Remote sensing interpretation of landslide geological hazards in high vegetation coverage areas based on disaster sensitivity analysis. Safety and Environmental Engineering, 26(03), 28–35 (in Chinese). Zhao, H., Shi, J., & Qi, X. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881–2890). Zhao, J., Yang, S., & Chen, H. (2004). Lithology identification method of remote sensing images based on fractal texture. Remote Sensing Information, (02), 2–4 (in Chinese).

Chapter 7

Prior Knowledge-Based Intelligent Model for Lithology Classification

Abstract Vegetation coverage weakens lithology information and increases inter-class similarity, making it difficult to extract the key features needed for lithology classification. To address these issues, this study proposes a lithology scene classification model based on prior knowledge and an improved densely connected network. The improved densely connected network is built in four steps: (1) extracting edge information with an edge detection operator to enhance the extraction of detailed features such as lithological texture and edges; (2) extracting multi-scale data features and fusing them with a densely connected network with enhanced feature fusion; (3) adding a random mixed attention mechanism that efficiently combines channel and spatial attention, capturing channel dependencies and pixel-level spatial relationships and improving the model’s ability to focus on key feature information; and (4) using label smoothing to balance the classification accuracy across categories. A dual-branch network is then built on the improved densely connected network: the main branch is constructed from dataset A, and the auxiliary branch introduces the prior knowledge of the 1:250,000-scale lithology classification from dataset B. An unsupervised loss is built from the label association prior between the two datasets, and the supervised and unsupervised losses of the two branches are adaptively fused to form the final lithology scene classification model. A classification experiment was conducted using dataset A and dataset B. The results show that the proposed model can effectively classify the lithology of the covered area and outperforms other classic scene classification models. In particular, the addition of prior knowledge significantly improved the accuracy.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_7

7.1 Introduction

Lithology classification currently faces several problems. First, existing studies mainly focus on bare outcrop areas, whereas in vegetated areas the lithology information is obscured by the vegetation cover and becomes “weak information”. Second, the lithology category is high-level semantic information, which makes the classification process challenging. Third, owing to the imaging mechanism of remote sensing data, the phenomena of “same object with different spectra” and “different objects with the same spectrum” have a more pronounced influence.

Classification methods based on manual feature extraction place high demands on experts, who must select the classification features and perform the classification using their professional knowledge. Such methods cannot achieve large-scale automatic classification and make rapid lithology interpretation over large areas difficult. Deep learning can automatically mine abstract and complex deep features from shallow feature data without relying on domain knowledge. However, deep learning models applied to lithology classification still cannot extract the key lithology features efficiently. The lithology category is a high-level abstract semantic feature that is distributed in the remote sensing image as aggregated “image blocks”, so it is difficult to extract lithology feature information with pixel-based methods.

Some scholars have also fused multiple data sources, but they mainly combine optical and DEM data only, or combine optical data with geochemical and other related knowledge; studies that combine radar, optical, and DEM data are fewer. Adding data from more sources forms a more comprehensive description of the target and thus provides a more comprehensive data basis for extracting key feature information, which is of great significance for lithology classification.
Edge detection operators allow the weak information at lithologic edges to be extracted more effectively, making the classification more accurate. An attention mechanism highlights the data that are useful for the classification result, and enhanced feature fusion extracts and fuses multi-scale data so that detailed texture information supplements the subsequent abstract features and improves classification accuracy. Adding prior knowledge helps the algorithm make better use of the information it already has, thus reducing the number of samples it needs to learn from. This is especially important for tasks where data are scarce or costly, and it also helps the model understand the task better. For example, in an image classification task, if a particular object is known to often have a particular color or shape, this knowledge can be supplied to the model, which improves its accuracy, helps it converge faster, reduces model complexity, and helps avoid overfitting.

In deep learning based classification of remote sensing lithology scenes in covered areas, the lithology cannot be observed directly, which makes key features difficult to extract. To address this difficulty, this chapter proposes a lithology scene classification model based on prior knowledge and an improved densely connected network. First, edge information is extracted by an edge detection operator to enhance the extraction of detailed features such as lithological texture and edges. A densely connected network with enhanced feature fusion is then used to extract and fuse multi-scale data features. A random hybrid attention mechanism is added to combine channel and spatial attention efficiently, capture channel dependencies and pixel-level spatial relationships, and improve the model’s ability to focus on key feature information. Finally, label smoothing is used to balance the classification accuracy of the different classes. A two-branch network is constructed on the improved densely connected network: the main branch is constructed from dataset A, and the auxiliary branch introduces the prior knowledge of the 1:250,000-scale lithology classification from dataset B. An unsupervised loss is constructed from the label association prior between the two datasets. The supervised and unsupervised losses of the two branches are adaptively fused to form the lithologic scene classification model based on prior knowledge and an improved densely connected network.

7.2 Methods

7.2.1 Lithological Scene Classification Based on Prior Knowledge and Improved Dense Connected Networks

To better extract the key lithology features in the covered area, the 1:250,000 geological map is embedded in the model as prior knowledge; the lithology features extracted from it compensate for the deficiencies of the features extracted from the 1:50,000 geological map. The prior knowledge-based model takes the improved densely connected network as its backbone and keeps the original network structure. At the input end, the two branches of the network extract the key lithologic features of the large-scale dataset and the small-scale dataset. Prediction result 1, obtained by the network for the small-scale dataset, is compared with its true label 1 to obtain supervised loss 1. Prediction result 2, obtained for the large-scale dataset, is compared with its true label 2 to obtain supervised loss 2. Based on the mapping between the lithology of the 1:50,000 and 1:250,000 geological maps, the 10 label types of the large-scale dataset are merged into the 7 label types of the small-scale dataset to obtain prediction label 3, which is then compared with prediction label 1 of the small-scale dataset to obtain the unsupervised loss. Finally, the three losses are weighted and fused into the final loss, and backpropagation updates the parameters so that the model’s predictions move closer to the true values. The network structure is shown in Fig. 7.1.
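The fusion of the three losses can be illustrated with a minimal NumPy sketch. The chapter does not give the mapping between the 10 and 7 label types, the form of the unsupervised loss, or the adaptive weighting rule, so the table `FINE_TO_COARSE`, the KL-divergence consistency term, and the fixed `weights` below are hypothetical placeholders rather than the authors’ implementation.

```python
import numpy as np

# Hypothetical mapping from the 10 large-scale labels to the 7 small-scale labels.
FINE_TO_COARSE = np.array([0, 0, 1, 2, 2, 3, 4, 5, 5, 6])

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer class labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def merge_fine_to_coarse(fine_probs, mapping=FINE_TO_COARSE, n_coarse=7):
    """Sum the probabilities of fine classes that share one coarse class."""
    merge = np.zeros((len(mapping), n_coarse))
    merge[np.arange(len(mapping)), mapping] = 1.0
    return fine_probs @ merge

def fused_loss(coarse_logits, coarse_labels, fine_logits, fine_labels,
               weights=(1.0, 1.0, 0.5)):
    """Weighted sum of two supervised losses and one consistency loss."""
    p_coarse = softmax(coarse_logits)          # prediction result 1 (7 classes)
    p_fine = softmax(fine_logits)              # prediction result 2 (10 classes)
    p_merged = merge_fine_to_coarse(p_fine)    # prediction label 3 (7 classes)
    sup1 = cross_entropy(p_coarse, coarse_labels)
    sup2 = cross_entropy(p_fine, fine_labels)
    # Assumed unsupervised term: KL divergence between the merged and direct predictions.
    kl = p_merged * (np.log(p_merged + 1e-12) - np.log(p_coarse + 1e-12))
    unsup = float(kl.sum(axis=1).mean())
    w1, w2, w3 = weights                       # fixed here; adaptive in the chapter
    return w1 * sup1 + w2 * sup2 + w3 * unsup, (sup1, sup2, unsup)
```

The unsupervised term needs no labels: it only asks the merged 10-class prediction and the direct 7-class prediction to agree, which is how the label association prior can constrain both branches.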

Fig. 7.1 Structural schematic diagram of lithologic scene classification model based on prior knowledge embedding

7.2.2 Improved Dense Connectivity Network

The edge enhancement operator suppresses noise and improves image quality through Gaussian filtering and non-maximum suppression, which makes the model pay more attention to detailed features such as lithological edges and texture. The densely connected network with enhanced feature fusion extracts more multi-scale information and adds shallow-layer information, rich in texture detail, to the subsequent more abstract features, which improves the feature expression ability of the model. To deal with feature redundancy, the model’s ability to focus on key feature information also needs to be improved. Therefore, this study introduces a random hybrid attention module into the densely connected network with enhanced feature fusion and proposes a lithology classification model based on edge enhancement, enhanced feature fusion, and shuffle attention (Edge Enhancement, Enhanced Feature Fusion and Shuffle Attention, 3EFFSA).

The 3EFFSA network adopts the enhanced densely connected network as the backbone and keeps the original design of 4 dense blocks alternating with 3 transition layers. The SE channel attention module is added in front of each dense block to form an attention-dense block. Before the feature data enter the dense block, the channel attention module assigns larger weights to the key feature information that affects the final classification result, enhancing it, and smaller weights to invalid information, suppressing its interference with the classification result. The 3EFFSA network classifies remote sensing scene data as follows. First, the remote sensing image passes through a 7 × 7 convolution and a 3 × 3 max pooling operation and then through the edge enhancement operation, and the resulting feature information is input to the first attention-dense block.
After each attention-dense block, a transition layer consisting of a 1 × 1 convolution layer and a 2 × 2 average pooling layer reduces the size and the number of channels of the feature map before it is passed to the next attention-dense block. Starting with the second attention-dense block, the input data contain the output feature maps of all previous attention-dense blocks. These feature maps are downsampled two or three times, depending on the number of dense blocks they span, and the feature maps of the same size are then concatenated along the channel dimension. The output of the fourth attention-dense block passes through global average pooling and a fully connected layer; finally, the cross-entropy loss function in the classifier layer is corrected by label smoothing to obtain the prediction result of the model.
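The label smoothing correction of the cross-entropy loss can be sketched as follows. The chapter gives neither the smoothing formula nor its coefficient, so this sketch assumes the common formulation in which a fraction `eps` of the target probability is spread uniformly over all classes; `eps=0.1` is an illustrative value only.

```python
import numpy as np

def smooth_labels(labels, n_classes, eps=0.1):
    """Turn integer labels into smoothed one-hot targets."""
    one_hot = np.eye(n_classes)[labels]
    return (1.0 - eps) * one_hot + eps / n_classes

def smoothed_cross_entropy(logits, labels, eps=0.1):
    """Cross-entropy between smoothed targets and softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    targets = smooth_labels(labels, logits.shape[1], eps)
    return float(-(targets * log_probs).sum(axis=1).mean())
```

Because the target for the true class is slightly below 1, the model is penalized for becoming overconfident on well-represented classes, which is one way the per-class accuracies can be evened out.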

7.2.3 Edge Enhancement

Edge enhancement, also known as “image sharpening”, is an image processing method that highlights the boundary information between different objects in an image and increases the contrast of image details. There are two processing approaches: (1) spatial domain processing, in which differential operators highlight large changes in image brightness (or gray level); and (2) frequency domain processing, in which a high-pass filter retains the high-frequency components, filters out the low-frequency components, and thereby strengthens the edge features in the image. The enhanced result makes image edges clear and visually easier to interpret.

In practical image segmentation, usually only first and second derivatives are used, because derivative information of third order and above rarely has practical value. The second derivative can also indicate the type of gray-level transition. In some cases, such as images with uniform gray-level change, the boundary may not be found using only the first derivative, and the second derivative can then provide very useful information. The second derivative is sensitive to noise; the remedy is to smooth and filter the image first to remove part of the noise and then detect the edges. Because algorithms that use second-derivative information are based on zero-crossing detection, they yield relatively few edge points, which benefits subsequent processing and recognition. First-order differential operators include the Roberts, Sobel, and Prewitt operators (Chaple et al., 2015) and the Canny operator (Canny, 1986); second-order differential operators include the Laplacian operator (Wang, 2007), the LoG operator (Ulupinar & Medioni, 1990), and so on.
Roberts operator: Edge localization is accurate, but the operator is sensitive to noise, so it is suited to images with clear edges and little noise. The Roberts edge detection operator finds edges with a local difference operator, and the edges it produces are not very smooth. Because the Roberts operator usually produces a wide response in the region near an edge, the detected edge image often needs to be thinned, which limits the edge localization accuracy in practice.

Sobel operator: There are two Sobel kernels, one that detects horizontal edges and one that detects vertical edges. Another form is the isotropic Sobel operator, which, compared with the common Sobel operator, has more accurate position weighting coefficients and yields the same gradient amplitude when detecting edges in different directions. Owing to the particularity of building images, the contours of this type of image can be processed without calculating the gradient direction, so the program does not provide an isotropic Sobel processing method. Because the Sobel operator is a filter operator, edge extraction with it can use fast convolution functions; it is simple and effective and is therefore widely used. Its drawback is that it does not strictly separate the subject of the image from the background; in other words, the Sobel operator does not process the image based on its gray levels. Because it does not strictly simulate the physiological characteristics of human vision, the extracted image contours are sometimes unsatisfactory.

Prewitt operator: This operator suppresses noise by pixel averaging. However, pixel averaging is equivalent to low-pass filtering the image, so the Prewitt operator localizes edges less accurately than the Roberts operator.

Laplacian operator: The Laplacian operator is a second-order differential operator that can be used when only the position of the edge points matters and the surrounding gray-level differences do not. For a step edge, the second derivative has a zero crossing at the edge point and takes opposite signs at the pixels on the two sides of the edge point.

Canny operator: This operator performs better than the preceding ones but is more involved to implement. It is a multi-stage optimized operator that combines filtering, enhancement, and detection. Before processing, it smooths the image with a Gaussian filter to remove noise, and it then uses finite differences of the first-order partial derivatives to calculate the gradient amplitude and direction.
During processing, the Canny operator also applies non-maximum suppression, and finally it uses two thresholds to connect the edges. In this study, the Canny operator is selected as the edge enhancement operator because it performs well in edge detection and can detect finer and weaker edges, which is very important for extracting detailed features such as rock texture and rock edges in rock images. Moreover, its low false detection rate reduces false detections and helps ensure the accuracy and reliability of the extracted rock features. Its Gaussian filtering and non-maximum suppression also suppress noise and improve image quality, which is very important for extracting texture and detail features in rock images. The Canny operator has several adjustable parameters, such as the standard deviation of the Gaussian filter and the thresholds, so it can be tuned to the specific situation to extract more realistic rock features. The Canny operator finds edge points in the following steps:

(1) Smooth the image with a Gaussian filter. Edge detection is susceptible to noise, so denoising is usually required beforehand. Gaussian filtering is usually used to remove the noise; for example, a Gaussian filter with a 5 × 5 kernel is given by Eq. 7.1:

$$
\mathit{filter} = \frac{1}{273}
\begin{bmatrix}
1 & 4 & 7 & 4 & 1 \\
4 & 16 & 26 & 16 & 4 \\
7 & 26 & 41 & 26 & 7 \\
4 & 16 & 26 & 16 & 4 \\
1 & 4 & 7 & 4 & 1
\end{bmatrix}
\tag{7.1}
$$
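As a minimal sketch of step (1), the kernel of Eq. 7.1 can be applied with plain NumPy. The helper name `filter2d` and the choice of replicating border pixels are assumptions made for this illustration; the chapter does not state how image borders are handled.

```python
import numpy as np

# 5 x 5 Gaussian kernel of Eq. 7.1; the integer weights sum to 273.
KERNEL = np.array([[1, 4, 7, 4, 1],
                   [4, 16, 26, 16, 4],
                   [7, 26, 41, 26, 7],
                   [4, 16, 26, 16, 4],
                   [1, 4, 7, 4, 1]], dtype=float) / 273.0

def filter2d(image, kernel):
    """Correlate a 2-D image with an odd-sized kernel, replicating border pixels."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Shifted copy of the image weighted by one kernel coefficient.
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def gaussian_smooth(image):
    """Step (1) of the Canny procedure: suppress noise before differentiation."""
    return filter2d(image, KERNEL)
```

Since the weights sum to 1 after division by 273, smoothing leaves the mean brightness of flat regions unchanged and only spreads isolated noise spikes over their neighbourhood.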

(2) Calculate the gradient amplitude and direction using finite differences of the first-order partial derivatives. The image gradient is computed with the Sobel operator: the convolution templates are first applied in the x and y directions, and the gradient amplitude and direction are then calculated, as shown in Eqs. 7.2–7.5:

$$
d_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
\tag{7.2}
$$

$$
d_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\tag{7.3}
$$

$$
S = \sqrt{d_x^2 + d_y^2}
\tag{7.4}
$$

$$
\theta = \arctan\frac{d_y}{d_x}
\tag{7.5}
$$
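Step (2) can be sketched in NumPy as follows. The function names are illustrative, border pixels are replicated by assumption, and `np.arctan2` is used in place of the plain arctangent of Eq. 7.5 so that pixels with a zero horizontal response do not cause a division by zero.

```python
import numpy as np

DY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)   # Eq. 7.2
DX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # Eq. 7.3

def filter2d(image, kernel):
    """Correlate a 2-D image with an odd-sized kernel, replicating border pixels."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def sobel_gradient(image):
    """Return gradient amplitude S (Eq. 7.4) and direction in degrees, folded to [0, 180)."""
    gx = filter2d(image, DX)
    gy = filter2d(image, DY)
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0
    return magnitude, direction

def quantize_direction(direction):
    """Round each direction to the nearest of 0, 45, 90, and 135 degrees."""
    return (np.round(direction / 45.0).astype(int) % 4) * 45
```

Folding the direction to [0, 180) reflects that non-maximum suppression only needs the axis of the gradient, not its sign, which is why four direction classes are enough.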

The gradient direction is always perpendicular to the boundary and is classified into four categories: vertical, horizontal, and the two diagonals (i.e., 0°, 45°, 90°, and 135°).

(3) Apply non-maximum suppression to the gradient amplitude. For each pixel, non-maximum suppression filters out non-edge pixels and sharpens blurred boundaries. The process keeps the maximum of the gradient intensity at each pixel and filters out the other values:
(a) The gradient direction of the pixel is approximated by one of the values 0, 45, 90, 135, 180, 225, 270, and 315, which correspond to the up, down, left, and right directions and the 45-degree diagonals.
(b) The gradient intensity of the pixel is compared with that of the pixels in the positive and negative directions of its gradient. If the gradient intensity of the pixel is the largest, it is retained; otherwise it is suppressed (deleted, i.e., set to 0).

(4) Detect and connect edges with the double-threshold algorithm. Many noise points remain in the image after non-maximum suppression. The Canny algorithm therefore uses a technique called double thresholding: an upper threshold and a lower threshold are set (usually specified manually, as in OpenCV). If a pixel value is larger than the upper threshold, the pixel is considered a boundary (a strong boundary, or strong edge); if it is smaller than the lower threshold, it is considered not to be a boundary; and values between the two thresholds are considered candidates (weak boundaries, or weak edges).
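Steps (3) and (4) can be sketched as follows. The text above only states that weak edges are candidates; the rule used here, keeping a weak pixel when it touches an already accepted edge pixel in its 8-neighbourhood, is the usual hysteresis step and is an assumption about how the edges are connected. The direction input is expected to be already quantized to 0, 45, 90, or 135 degrees, with rows increasing downwards.

```python
import numpy as np

# Neighbour offsets (row, column) along each quantized gradient direction.
OFFSETS = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}

def non_max_suppression(magnitude, direction_q):
    """Keep a pixel only if it is not smaller than both neighbours along its gradient."""
    h, w = magnitude.shape
    padded = np.pad(magnitude, 1, mode="constant")
    out = np.zeros_like(magnitude)
    for r in range(h):
        for c in range(w):
            dr, dc = OFFSETS[int(direction_q[r, c])]
            ahead = padded[r + 1 + dr, c + 1 + dc]
            behind = padded[r + 1 - dr, c + 1 - dc]
            if magnitude[r, c] >= ahead and magnitude[r, c] >= behind:
                out[r, c] = magnitude[r, c]
    return out

def double_threshold(nms, low, high):
    """Accept strong pixels, then grow the edge set through connected weak pixels."""
    strong = nms > high
    weak = (nms >= low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:
        padded = np.pad(edges, 1, mode="constant")
        near = np.zeros_like(edges)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    # True where the neighbour at offset (dr, dc) is an accepted edge.
                    near |= padded[1 + dr:1 + dr + edges.shape[0],
                                   1 + dc:1 + dc + edges.shape[1]]
        grown = edges | (weak & near)
        changed = bool((grown != edges).any())
        edges = grown
    return edges
```

Ties are kept by the `>=` comparison, so a two-pixel-wide plateau survives suppression; a stricter comparison on one side would thin it to a single pixel.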

7.2.3.1 Dense Connected Network for Enhanced Feature Fusion

Densely connected convolutional neural networks (DenseNet) (Zhu & Newsam, 2017) are an efficient neural network model that followed VGGNet and ResNet. They adopt a new algorithmic idea to address the problem that a network becomes harder to train as it deepens and its number of parameters increases. The densely connected network uses a new connection mode, feature reuse, to maximize the transmission of information. In this network, each layer is connected to the other layers in a special way: the outputs of all preceding layers serve as the input features of a layer, and the layer itself adds only a small number of features. Because there are many dense connections between the layers of the network, it is called a densely connected convolutional neural network (Fig. 7.2).

Fig. 7.2 Dense block diagram

The advantage of densely connected convolutional neural networks is that they avoid learning redundant features, which reduces the burden on the network model. The network concatenates multiple features directly through skip connections to form a new combined feature. Therefore, with only a small number of convolution kernels, densely connected networks can obtain a large number of features, which greatly reduces the number of parameters and makes the whole network easier to extend. They also reuse low-level features and effectively alleviate problems such as vanishing gradients during training, thus shortening the training time.

The densely connected network consists of four dense blocks and three transition layers, which are stacked alternately to form the backbone of the network. Each dense block adopts a densely connected structure in which the input of layer $i$ is the concatenation of the outputs of layers 0 to $i-1$. Figure 7.2 shows the case $i = 4$, where $x_0$ to $x_3$ represent the output feature maps of the four convolution layers. These short connections across different feature maps pass the effective features extracted by earlier layers to the subsequent layers, which realizes feature reuse. The output of layer $i$ is given by Eq. 7.6:

$$
x_i = H(x_0, x_1, \ldots, x_{i-1})
\tag{7.6}
$$

where $x_0, x_1, \ldots, x_{i-1}$ are the outputs of convolution layers 0 to $i-1$; $H$ represents the composite of Batch Normalization (BN), ReLU, and convolution operations; and $x_i$ is the output of the $i$-th convolution layer. If the feature map of the input layer has $k_0$ channels, the number of channels of the concatenated feature map that enters the $i$-th convolution layer is denoted $k_i$. Because the feature maps are stacked, $k_i$ satisfies Eq. 7.7:

$$
k_i = k(i-1) + k_0
\tag{7.7}
$$

where $k$ is the growth rate of the feature maps, which controls the width of the network. The transition layer reduces the size and the number of channels of the feature map. Because the output feature maps of different convolution layers are concatenated directly within each dense block, their sizes must be identical, so the feature map size can be reduced only between dense blocks. Introducing a transition layer between two dense blocks reduces the data volume of the model and helps avoid overfitting.

This study proposes a densely connected network with enhanced feature fusion, which builds on the traditional densely connected network and adds short connections between the output feature maps of the dense blocks. Let $B_i$ denote the output feature of the $i$-th dense block and $B(\cdot)$ the set of nonlinear operations of the convolution layers in a dense block. The input of the $i$-th dense block consists of the outputs of dense blocks 0 through $i-1$. The network structure is shown in Fig. 7.3, and the connection structure satisfies Eq. 7.8:

$$
B_i = B(B_0, B_1, \ldots, B_{i-1})
\tag{7.8}
$$
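The channel bookkeeping of Eqs. 7.6 and 7.7 inside one dense block can be checked with a toy NumPy sketch. The composite operation H is reduced here to a ReLU followed by a 1 × 1 convolution with random weights (Batch Normalization is omitted), and the sizes in the usage note are arbitrary; both are simplifications for illustration, not the configuration used in the chapter.

```python
import numpy as np

def dense_block(x0, growth_rate, n_layers, rng):
    """Toy dense block on (channels, height, width) arrays.

    Returns the concatenation of all feature maps and the number of
    channels that entered each layer (k_i in Eq. 7.7).
    """
    features = [x0]                                   # x_0
    input_channels = []
    for _ in range(n_layers):
        stacked = np.concatenate(features, axis=0)    # [x_0, ..., x_{i-1}]
        input_channels.append(stacked.shape[0])
        # Simplified H: ReLU, then a 1 x 1 convolution producing `growth_rate` maps.
        weights = rng.normal(size=(growth_rate, stacked.shape[0]))
        x_i = np.einsum("oc,chw->ohw", weights, np.maximum(stacked, 0.0))
        features.append(x_i)                          # Eq. 7.6
    return np.concatenate(features, axis=0), input_channels
```

For example, with 16 input channels, a growth rate of 12, and 4 layers, the layers receive 16, 28, 40, and 52 channels, and the block outputs 64 channels, so the width grows linearly rather than doubling at every layer.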

Fig. 7.3 Enhanced feature fusion dense block structure map

In DenseNet, the size of the feature maps inside each dense block is constant. Adjacent dense blocks are connected by a transition layer, consisting of a 1 × 1 convolutional layer and a 2 × 2 average pooling layer, which halves the height and width of the feature map and reduces the number of channels according to the chosen compression ratio. This reduces the computational complexity of the model and also helps to prevent overfitting and improve generalization. In cross-block fusion, the outputs of different blocks are concatenated along the channel dimension. Depending on the number of dense blocks crossed, two or three downsampling operations may therefore be needed so that the feature maps have the same size when they are concatenated. To retain the low-level information as completely as possible, these downsampling operations do not compress the number of channels. A conventional deep learning network usually consists of a sequence of convolutional layers, each taking the output of the previous layer as its input. As the network becomes deeper, the feature map shrinks and the receptive field of the convolutional layers grows. Short connections between different dense blocks allow global and local features to be fused, which enhances the ability of the network to extract abstract features at different scales.
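The channel and size bookkeeping described above can be sketched as follows. This is only an illustration: the block sizes (6, 12, 24, 16), growth rate 32, 64 stem channels, 56 × 56 stem output and compression ratio 0.5 are the commonly used DenseNet121 settings and are assumptions here, since the text does not list them, and all function names are hypothetical.

```python
def dense_block_channels(k0, growth_rate, num_layers):
    """Channel count at each layer of one dense block (Eq. 7.7) and the block output."""
    # k_i = k * (i - 1) + k0 for i = 1 .. num_layers
    per_layer = [growth_rate * (i - 1) + k0 for i in range(1, num_layers + 1)]
    # After the last layer, all num_layers outputs are concatenated with the input.
    out_channels = k0 + growth_rate * num_layers
    return per_layer, out_channels


def transition(channels, size, compression=0.5):
    """1x1 convolution compresses the channels; 2x2 average pooling halves the size."""
    return int(channels * compression), size // 2


def downsamplings_needed(src_block, dst_block):
    """Number of 2x downsamplings needed to match the output of src_block
    to the feature-map size used inside dst_block (one halving per block crossed)."""
    return dst_block - src_block


def trace_densenet(block_config=(6, 12, 24, 16), growth_rate=32,
                   init_channels=64, init_size=56, compression=0.5):
    """Return (block index, output channels, feature-map size) for every dense block."""
    channels, size = init_channels, init_size
    trace = []
    for idx, num_layers in enumerate(block_config):
        _, channels = dense_block_channels(channels, growth_rate, num_layers)
        trace.append((idx, channels, size))
        if idx < len(block_config) - 1:  # no transition after the last block
            channels, size = transition(channels, size, compression)
    return trace


if __name__ == "__main__":
    for idx, channels, size in trace_densenet():
        print(f"block {idx}: {channels} channels at {size}x{size}")
    print("downsamplings from block 0 to block 3:", downsamplings_needed(0, 3))
```

Under these assumed settings, fusing the output of the first dense block into the third or fourth block crosses two or three blocks, which matches the two or three downsampling operations mentioned in the text.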

7.2.3.2 Random Mixed Attention Module

Deep convolutional neural networks extract image features mainly through convolutional kernels, and each kernel covers only a limited region of the image at a time. The receptive field of a convolution can be enlarged by changing the size of the kernel, so that more feature information is fused. Convolution can be regarded as the fusion of data along two dimensions, space and channel. Attention mechanisms currently fall into two main categories, spatial attention and channel attention, which aim to capture pixel-level pairwise relationships and inter-channel dependencies, respectively. Using both mechanisms together can give better results but inevitably increases the computational cost of the model. The design of SANet (Fan & Ling, 2017) combines group convolution (to reduce computation), a spatial attention mechanism (implemented with GroupNorm, GN), a channel attention mechanism (similar to SENet) and the ChannelShuffle operation of ShuffleNetV2 (to fuse information between groups). The input tensor is first divided into g groups, each of which is processed by an SA Unit. Inside the SA Unit, the spatial attention branch (shown in blue) uses GroupNorm to obtain information in the spatial dimension, and the channel attention branch (shown in red) is implemented in a way similar to SE. The SA Unit merges the information within a group by concatenation. Finally, the channel random mixing (shuffle) operation rearranges the groups so that information flows between them.
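The channel random mixing step at the end of the module can be illustrated with a minimal sketch. It works on a plain list of channel labels rather than on a tensor, which is an assumption made to keep the example self-contained; the function name is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels from different groups (ShuffleNet-style channel shuffle).

    The channel list is viewed as a (groups x channels_per_group) grid,
    transposed, and flattened again, so that consecutive output channels
    come from different groups.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + j]
            for j in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    # Two groups of three channels: a0 a1 a2 | b0 b1 b2
    print(channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2))
```

After the shuffle, each neighbourhood of channels contains members of every group, which is how information processed separately inside the groups is mixed before the next layer.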

7.2.3.3 Label Smoothing

In a multi-class classification problem, the target is usually a one-hot vector in which the position of the correct class is 1 and all other positions are 0. This differs from binary classification, which has only two possible classes, and from multi-label classification, in which a single sample can have several correct classes and every object present in the image must be detected. Label smoothing perturbs the target vector by a small amount ε: instead of a target of 1 for the correct class and 0 for the others, the target for the correct class is reduced to about 1 − ε and the remaining mass ε is spread over the classes. The cross-entropy loss with label smoothing is given by Eq. 7.9:

$$\text{cross-entropy loss} = (1 - \varepsilon)\, ce(i) + \varepsilon \sum_{j=1}^{N} \frac{ce(j)}{N} \tag{7.9}$$

where ce(x) is the standard cross-entropy loss of class x (i.e. −log(p(x))), ε is a small positive number, i is the correct class, and N is the number of classes. Intuitively, label smoothing keeps the predicted probability of the correct class from moving too far from those of the other classes. It therefore acts as a regularization technique and as a way to counter model overconfidence.
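Eq. 7.9 can be checked numerically with a short sketch. The value ε = 0.1 and the example probabilities are assumptions chosen for illustration, not values reported in the text, and the function names are hypothetical.

```python
import math


def smoothed_cross_entropy(probs, correct, eps=0.1):
    """Cross-entropy with label smoothing, following Eq. 7.9.

    probs   : predicted class probabilities (positive, summing to 1)
    correct : index i of the correct class
    eps     : smoothing amount (assumed value, not given in the text)
    """
    n = len(probs)
    ce = [-math.log(p) for p in probs]          # ce(j) = -log p(j)
    return (1.0 - eps) * ce[correct] + eps * sum(ce) / n


def smoothed_targets(n_classes, correct, eps=0.1):
    """Equivalent soft target vector: eps / N everywhere, plus 1 - eps on the correct class."""
    targets = [eps / n_classes] * n_classes
    targets[correct] += 1.0 - eps
    return targets


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]
    loss = smoothed_cross_entropy(probs, correct=0)
    targets = smoothed_targets(3, correct=0)
    # The same loss written as cross-entropy against the soft targets.
    soft = -sum(t * math.log(p) for t, p in zip(targets, probs))
    print(loss, soft)
```

Written this way, the loss of Eq. 7.9 is an ordinary cross-entropy against a softened target vector, which is why confident predictions on the correct class are penalized slightly.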

7.2.3.4 Comparison Model Introduction

AlexNet (Krizhevsky et al., 2012) was proposed by Krizhevsky et al. and won the ImageNet image classification competition in 2012. Its structure, shown in Fig. 7.4, is an eight-layer neural network consisting of five convolutional layers and three fully connected layers. By using ReLU activation functions, local response normalization and Dropout, it successfully addressed large-scale image recognition problems.

Fig. 7.4 AlexNet network structure diagram

VGGNet (Simonyan & Zisserman, 2014) was proposed by Simonyan and Zisserman in 2014. Compared with AlexNet, this model uses smaller convolutional kernels and a deeper network structure, and it achieved excellent results on large-scale image recognition problems, especially in the ImageNet image recognition challenge. Figure 7.5 shows the structure of the VGG network model.

Fig. 7.5 VGGNet network structure diagram

ResNet (He et al., 2016) is a deep convolutional neural network model proposed by He et al. in 2015. It addresses the vanishing-gradient problem in deep neural networks by introducing residual connections, which allows the network to be deeper, and it achieved very good results in the ImageNet image recognition challenge. The residual module is shown in Fig. 7.6. Each residual block contains two convolutional layers and a shortcut connection that adds the input of the block directly to its output, so that the layers learn a residual. Residual modules effectively alleviate the vanishing gradients caused by increasing network depth, so ResNet models can reach depths of 50 and 101 layers while still classifying well.

Fig. 7.6 Residual module diagram (He et al., 2016)

7.2.4 Experimental Setup and Environment

In the experiments, the dataset was randomly divided into a training set, a validation set and a test set at a ratio of 6:2:2. To obtain reliable results, each experiment was repeated five times, and the mean and standard deviation of the results were taken.
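The splitting and reporting procedure can be sketched as below. The random seed, the use of the sample (rather than population) standard deviation and the function names are assumptions; the text only states the 6:2:2 ratio and the five repetitions.

```python
import random
import statistics


def split_622(items, seed=0):
    """Randomly split items into training, validation and test sets at 6:2:2."""
    items = list(items)
    random.Random(seed).shuffle(items)      # seeded for reproducibility
    n_train = len(items) * 6 // 10
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]           # remainder goes to the test set
    return train, val, test


def summarize(runs):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)


if __name__ == "__main__":
    train, val, test = split_622(range(100))
    print(len(train), len(val), len(test))
    # Hypothetical accuracies from five repeated runs, for illustration only.
    mean, std = summarize([70.1, 70.4, 69.8, 70.3, 70.0])
    print(f"{mean:.2f} ± {std:.2f}")
```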

7.2.4.1 Experimental Environment

The experimental environment is shown in Table 7.1.

Table 7.1 Experimental environment configuration

| Experimental environment | Item | Specific configuration |
|---|---|---|
| Hardware environment | CPU | 2 × E5-2620V4 |
| Hardware environment | GPU | 2 × GeForce RTX 2080Ti |
| Hardware environment | Internal memory | 64 GB |
| Software environment | Operating system | CentOS 7.6.1810 (Core) |
| Software environment | Deep learning framework | PyTorch 1.8.1 |
| Software environment | GPU driver version | CUDA 10.2 / cuDNN 7.6.5 |
| Software environment | Programming language | Python 3.8 |

7.2.4.2 Parameter Settings

To limit the influence of hyperparameters on the experimental results, all experiments in this chapter use the parameter values given in Table 7.2. Because the memory of the graphics card is limited, the data can only be fed to the classification network in batches during training, and the number of images in each batch is controlled by the Batch Size parameter. The initial learning rate was 0.001, and a learning-rate decay strategy was used, in which the learning rate is adjusted when training reaches a set number of epochs. In this chapter, when training reached 0.5 and 0.75 of the total number of epochs, the learning rate was reduced to 0.1 times its original value. For the other parameters involved in the experiments, the default values of the framework were used.
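The decay strategy can be written as a small step schedule. One reading of the text is assumed here, namely that the learning rate is multiplied by 0.1 at each of the two milestones (the behaviour of a multi-step scheduler); the function name is hypothetical, while the 100 epochs and the initial rate of 0.001 come from Table 7.2.

```python
def learning_rate(epoch, total_epochs=100, base_lr=1e-3,
                  factor=0.1, fractions=(0.5, 0.75)):
    """Step-decayed learning rate for a zero-based epoch index.

    Assumption: the rate is multiplied by `factor` once for every milestone
    (fraction of the total epochs) that has already been reached.
    """
    lr = base_lr
    for fraction in fractions:
        if epoch >= int(fraction * total_epochs):
            lr *= factor
    return lr


if __name__ == "__main__":
    for epoch in (0, 49, 50, 74, 75, 99):
        print(epoch, learning_rate(epoch))
```

If the text instead means that the rate drops to 0.1 times the initial value only once, the second milestone would leave the rate unchanged; the sketch would then need a single fraction.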

7.2.5 Evaluating Metrics

This study mainly uses overall accuracy (OA), the Kappa coefficient and F1_score as evaluation metrics. For the specific calculation methods, please refer to Sects. 3.2.4 and 4.2.2.
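For readers without the earlier sections at hand, the three metrics can be computed from a confusion matrix as in the sketch below. It uses the standard definitions of OA and Cohen's Kappa and assumes a macro-averaged F1_score; the averaging actually used in Sects. 3.2.4 and 4.2.2 is not restated here, so that choice is an assumption.

```python
def classification_metrics(cm):
    """OA, Cohen's Kappa and macro-averaged F1 from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    row_sums = [sum(cm[i]) for i in range(n)]                       # TP + FN
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # TP + FP

    oa = correct / total
    # Expected agreement by chance (undefined Kappa if this equals 1).
    pe = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)

    f1_per_class = []
    for i in range(n):
        denom = row_sums[i] + col_sums[i]        # 2*TP + FP + FN
        f1_per_class.append(2.0 * cm[i][i] / denom if denom else 0.0)
    macro_f1 = sum(f1_per_class) / n
    return oa, kappa, macro_f1


if __name__ == "__main__":
    # Small hypothetical two-class confusion matrix.
    print(classification_metrics([[8, 2], [1, 9]]))
```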

Table 7.2 Experimental parameter settings

| Parameter name | Parameter value |
|---|---|
| Batch size | 60 |
| Training rounds (epoch) | 100 |
| Initial learning rate | 0.001 |
| Weight attenuation coefficient | 0.0001 |
| Optimizer | Adam |
| Loss function | Cross entropy loss function |

7.3 Results and Discussions

7.3.1 Comparative Experiment of 3EFFSA Model

The 3EFFSA model proposed in this chapter was evaluated on the lithologic dataset constructed with the secondary classification system in the study area and compared with AlexNet, VGG16, ResNet50 and DenseNet121 on the same dataset. Because the baseline models were designed for three-band natural-scene images, the experiments in this section use only the optical data in the dataset. The results of the networks are compared in Table 7.3. Table 7.3 shows the OA, Kappa coefficient and F1_score of the different network models on the lithology scene dataset produced by the first-level label classification system in the study area. The comparison shows that the 3EFFSA model proposed in this chapter has the best classification performance. Among the conventional networks, DenseNet121 is clearly superior to ResNet50, AlexNet and VGG16, probably because the short connections in the dense blocks enhance feature propagation and allow lithologic feature information to be extracted more fully. Compared with DenseNet121, the F1_score, Kappa coefficient and OA of the 3EFFSA model are higher by 8.36, 7.02 and 5.53 percentage points, respectively. A likely explanation is that edge enhancement strengthens the weak information at lithology boundaries, which makes classification at the boundaries more accurate, while the enhanced feature fusion mechanism fuses more useful information. The attention mechanism improves the model's ability to focus on key feature information and to extract lithologic semantic information, and thus improves the classification accuracy of the model. The confusion matrices are shown in Figs. 7.7 and 7.8. Except for “conglomerate”, the results for all categories either remain unchanged or improve significantly, which shows the effectiveness of the model proposed in this study. “Schist”, “metamorphic rock assemblage” and “pyroxenite” occur in scattered, narrow bodies, which indicates that thin and weak edges are detected by the edge enhancement in the model. In addition, the “Quaternary system” is widely distributed and rich in detail and texture, and these features usually appear at small scales in the image.
By introducing the shuffle operation and an attention mechanism in the spatial dimension, random mixed attention can better capture these small-scale features, thus improving the classification performance. As shown in Fig. 7.8, except for “Wengmen metamorphic complex” and “shallow grained rock”, the accuracy of every lithology class is above 46.15%. “Wengmen metamorphic complex” is seriously misclassified as “granite”, with a probability of 53.33%. “Schist”, “diorite” and “pyroxenite” are also misclassified as “granite” with non-negligible probabilities of 23.08%, 19.05% and 23.08%, respectively. The reason is that the “granite” category has too many samples: although data augmentation was applied to deal with class imbalance, “granite” is still more abundant than the other categories.

Table 7.3 Classification accuracies of 3EFFSA and compared models

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| AlexNet | 36.37 ± 2.25 | 36.88 ± 1.44 | 47.67 ± 1.05 |
| VGG16 | 39.79 ± 0.72 | 39.53 ± 2.53 | 49.95 ± 1.99 |
| ResNet50 | 34.66 ± 2.57 | 28.57 ± 2.09 | 37.57 ± 1.36 |
| DenseNet121 | 53.78 ± 0.58 | 59.33 ± 0.46 | 67.32 ± 0.38 |
| 3EFFSA (ours) | 62.14 ± 0.35 | 66.35 ± 0.14 | 72.85 ± 0.18 |


Fig. 7.7 Normalized confusion matrix of DenseNet121 results

Fig. 7.8 Normalized confusion matrix of 3EFFSA (ours) results

7.3.2 Comparative Experiment of Prior-3EFFSA Model

The prior-knowledge-embedded model proposed in this chapter was evaluated on the lithology dataset constructed in study area A and compared, on the same dataset, with the model proposed in Sect. 8.2. The results are shown in Table 7.4.

Table 7.4 Classification accuracies of 3EFFSA and prior knowledge embedded 3EFFSA

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| 3EFFSA | 62.14 ± 0.35 | 66.35 ± 0.14 | 72.85 ± 0.18 |
| Prior-3EFFSA | 64.73 ± 1.56 | 67.89 ± 0.15 | 74.05 ± 0.19 |

Table 7.4 shows the OA, Kappa coefficient and F1_score of the model proposed in Chap. 4 on lithology scene dataset A, and those of the prior-knowledge-embedded model of this chapter on lithology scene datasets A and B. The comparison shows that the Prior-3EFFSA model proposed in this chapter improves the classification to some extent: F1_score, Kappa and OA increase by 2.59, 1.54 and 1.20 percentage points, respectively. In this model, the network is also trained on the dataset built from prior knowledge, and its predictions are compared with the true labels to obtain a loss. Replacing the single loss with a weighted loss in this way introduces prior knowledge that guides training and thus improves the classification accuracy of the model.

The confusion matrices before and after the addition of prior knowledge are shown in Fig. 7.9. Figure 7.9a is the confusion matrix of the experimental results before the addition of prior knowledge. Apart from the serious misclassification of “Wengmen metamorphic complex” and “shallow grained rock”, the accuracy of the remaining eight lithologies exceeds 46.15%. The probability of “Wengmen metamorphic complex” being misclassified as “granite” is 53.33%; “schist”, “diorite” and “pyroxenite” are also frequently misclassified as “granite”, with probabilities of 23.08%, 19.05% and 23.08%, respectively. The confusion matrix after adding prior knowledge is shown in Fig. 7.9b, where the accuracy of every lithology exceeds 38.46%. The classification accuracies of “Wengmen metamorphic complex” and “shallow grained rock” are 40.00% and 42.86%, which are 20.00 and 14.29 percentage points higher than before adding prior knowledge. The results of both models show that several lithologies have a high probability of being misclassified as “granite”. A possible reason is that, although measures were taken to suppress the impact of data imbalance, “granite” still accounts for a large proportion of the samples in the dataset and continues to affect the experimental results. After adding prior knowledge, the accuracies of “Quaternary”, “diorite”, “water” and “pyroxenite” decreased by 8.34, 9.53, 8.34 and 7.69 percentage points, respectively. A possible reason is that the lithology categories of the 1:250,000 geological map are less detailed, so that similar classes in the 1:50,000 geological map were labeled as one class in the 1:250,000 geological map, for example water and Quaternary, Quaternary and conglomerate, diorite and granite, and gabbro and granite. Another possible reason is that, after deep semantic features were extracted by the different branches of the network, the information of the two categories became very similar.

Fig. 7.9 Normalized confusion matrix of classification results a before and b after adding prior knowledge
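The weighted-loss idea of Sect. 7.3.2 can be illustrated with a deliberately simple sketch. The chapter does not give the weighting formula, so the convex combination below, the weight `alpha`, the example probabilities and the function names are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def cross_entropy(probs, label):
    """Standard cross-entropy of one sample: -log of the probability of its label."""
    return -math.log(probs[label])


def prior_weighted_loss(probs_a, label_a, probs_b, prior_label_b, alpha=0.3):
    """Hypothetical weighted loss combining two supervision signals.

    probs_a / label_a       : prediction and true label on dataset A
    probs_b / prior_label_b : prediction and prior-knowledge label on dataset B
    alpha                   : assumed weight of the prior-knowledge term (0..1)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    loss_a = cross_entropy(probs_a, label_a)
    loss_b = cross_entropy(probs_b, prior_label_b)
    return (1.0 - alpha) * loss_a + alpha * loss_b


if __name__ == "__main__":
    # Illustrative probabilities only; not results from the chapter.
    print(prior_weighted_loss([0.6, 0.3, 0.1], 0, [0.2, 0.8], 1))
```

With `alpha = 0` the expression reduces to the single loss on dataset A, which is the baseline that the prior-knowledge term is meant to correct.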

7.3.3 Discussion

7.3.3.1 Ablation Experiment of 3EFFSA Model

To analyze the role of the different modules in the 3EFFSA network, their experimental results were compared on the multi-modal dataset of the study area. Table 7.5 shows the classification performance of the different modules of the 3EFFSA network on the dataset built under the second-level label classification system. EE is edge enhancement, SA is random mixed attention, and LS is label smoothing.

Table 7.5 Ablation experiments of EFFCA networks

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| 3EFFSA (ours) | 62.14 ± 0.35 | 66.35 ± 0.14 | 72.85 ± 0.18 |
| DenseNet121+SA+LS | 59.54 ± 2.48 | 64.15 ± 0.33 | 71.12 ± 0.19 |
| DenseNet121+EE+LS | 57.22 ± 1.30 | 62.64 ± 0.89 | 70.03 ± 0.56 |
| DenseNet121+EE+SA | 58.23 ± 0.36 | 63.59 ± 1.11 | 70.79 ± 0.99 |
| DenseNet121+EE | 56.90 ± 1.02 | 61.67 ± 0.38 | 69.25 ± 0.29 |
| DenseNet121+SA | 56.24 ± 2.37 | 61.88 ± 1.68 | 69.31 ± 1.33 |
| DenseNet121+LS | 55.64 ± 2.25 | 61.30 ± 0.56 | 68.95 ± 0.68 |
| DenseNet121 | 53.78 ± 0.58 | 59.33 ± 0.46 | 67.32 ± 0.38 |

The table shows that adding any of the EE, SA and LS modules improves the results. Edge enhancement detects edges and contours in rock images, which helps to extract features such as texture and shape and thus improves the accuracy of lithology classification. The random mixed attention mechanism extracts the important features of rock samples and reduces the influence of redundant and irrelevant features; it adaptively selects different features of the rock samples and combines them into a global feature representation, which gives more accurate classification results. Label smoothing modifies the cross-entropy loss function, which reduces overfitting and improves the generalization ability of the model. It also makes the label distribution smoother, which reduces the influence of label noise or uncertainty in the training data on the model and improves the accuracy of lithology classification. Combining the modules in pairs improves the accuracy further. The combination of the Canny operator and the random mixed attention mechanism improves edge detection: the Canny operator extracts edge information from the image but is sometimes disturbed by noise and redundant information, which makes the detected edges inaccurate, whereas the random mixed attention mechanism extracts the important information of the input feature map and reduces the influence of noise and redundant information. Combining the Canny operator with label smoothing can improve image segmentation and classification. Specifically, the output of the Canny operator can be used as the input of label smoothing and then processed further through label smoothing to reduce the discontinuity between labels and improve the robustness of the model.
Because random mixed attention (Shuffle Attention) extracts the important information of the input feature map and reduces the influence of noise and redundant information, the effect of label smoothing becomes more pronounced, which gives smoother segmentation results. When the three modules are used together, the improvement is largest: relative to DenseNet121, F1_score, Kappa and OA increase by 8.36, 7.02 and 5.53 percentage points, respectively, which shows that the advantages of the three modules are complementary.


Table 7.6 Ablation experiments of 3EFFSA Networks (10 types)

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| DenseNet121 | 53.78 ± 0.58 | 59.33 ± 0.46 | 67.32 ± 0.38 |
| DenseNet121+CBAM | 50.24 ± 1.83 | 53.30 ± 1.44 | 62.54 ± 1.33 |
| DenseNet121+ECA | 58.42 ± 0.68 | 63.06 ± 0.49 | 70.29 ± 0.43 |
| DenseNet121+SE | 59.31 ± 2.48 | 64.97 ± 0.33 | 71.66 ± 0.19 |
| DenseNet121+SA | 62.41 ± 2.74 | 66.20 ± 1.07 | 72.75 ± 0.94 |

7.3.3.2 Comparative Experiments of Different Attention Mechanisms

Table 7.6 shows that, compared with the original network, adding ECA, SE or SA improves the accuracy of the model to some extent, whereas adding CBAM lowers all three metrics. The improvement is most pronounced with SA, which indicates that SA has certain advantages as the attention mechanism of the model.

7.4 Conclusion

In this chapter, a lithologic scene classification model based on an improved densely connected network was proposed, and the influence of prior knowledge embedding on its classification performance was then studied. The model was evaluated on dataset A, constructed with the two-level classification system of the label merging rules proposed in Chap. 3, and the results indicate the excellent performance of the model proposed in this chapter. With edge enhancement, enhanced feature fusion, the random mixed attention mechanism and label smoothing introduced into the densely connected network, the model is more accurate than the conventional network models, and the ablation experiments show that all four mechanisms contribute to the performance improvement. On the basis of this classification model, the chapter further studied the ability to extract key lithology features. Through a loss correction strategy, a lithology scene classification model that embeds prior knowledge into the improved densely connected network was constructed. In this model, the predictions obtained on dataset B, built from prior-knowledge labels, are compared with the true labels during training, and the resulting loss is used to correct the loss obtained by the improved densely connected network on dataset A proposed in Chap. 8.2. The resulting model is more accurate than the single-loss network model.

References

Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), 679–698.
Chaple, G. N., Daruwala, R. D., & Gofane, M. S. (2015). Comparisons of Robert, Prewitt, Sobel operator based edge detection methods for real time uses on FPGA. In 2015 International Conference on Technologies for Sustainable Development (ICTSD) (pp. 1–4).
Fan, H., & Ling, H. (2017). Sanet: Structure-aware network for visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 42–49).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv Preprint arXiv:1409.1556.
Ulupinar, F., & Medioni, G. (1990). Refining edges detected by a LoG operator. Computer Vision, Graphics, and Image Processing, 51(3), 275–298.
Wang, X. (2007). Laplacian operator-based edge detectors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 886–890.
Zhu, Y., & Newsam, S. (2017). Densenet for dense flow. In 2017 IEEE International Conference on Image Processing (ICIP) (pp. 790–794).

Chapter 8

Multi-view Lithology Remote Sensing Scene Classification Based on Transfer Learning

Abstract In vegetated areas, lithology image features are complex and the lithology types present differ from region to region, which makes accurate cross-regional lithology classification difficult. To address the difficulty that conventional models have in identifying new lithologies in cross-regional prediction, this chapter applies transfer learning to the lithology scene classification model based on multi-view data fusion and studies its transferability. A transfer learning method based on multi-view data fusion is proposed, which can identify new lithology types across regions and improves the generalization ability of the model.

8.1 Introduction

Lithostratigraphic units undergo a long process of evolution. Because rocks differ in their degree of weathering and in surface cover, their spectral information shows a certain variability. Differences in the composition of lithologic units lead to differences in the characteristics of lithologic data at different scales in remote sensing images, and lithology can be classified by capturing these combinations of characteristics (Ma & Li, 2008). Classification methods based on deep learning are now widely used in remote sensing classification because of their excellent feature extraction ability. Tong et al. (2020) carried out scene classification of remote sensing data with an improved DenseNet121 model (Huang et al., 2017). Tian et al. (2021) proposed a multi-scale feature fusion model for remote sensing scene classification by improving the ResNet network, and used a channel attention module to improve the model's ability to focus on key information. The performance of these models was greatly improved. Chen et al. (2022) used a global context spatial attention module to extract context information from the complete scene image and improve classification performance.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_8


Because lithologies form under different conditions and in different environments, their distribution differs between regions. For example, one study area may contain more lithology types than another. In this case, it is difficult to identify new lithology types with the conventional model parameter transfer method, which greatly limits the application scenarios of classification models. How to improve the generalization ability of lithology classification models effectively is therefore an urgent problem. Deep learning based on transferred models has been widely used in different fields in recent years. Yosinski et al. (2014) randomly divided the ImageNet dataset (Deng et al., 2009) into two groups to train a model and evaluate its transferability. Their experiments showed that transfer performance degrades as the source and target tasks become less similar, but that transferring model parameters still gives better results than random initialization. Oquab et al. (2014) pre-trained a model on the ImageNet dataset and fine-tuned it in the target domain with a small amount of data, achieving good results and showing that convolutional neural networks can still improve classification accuracy through transfer learning between two datasets with different statistical characteristics and tasks. The distribution of lithology categories varies strongly between regions, and lithology categories that do not exist in the source domain may occur in the target domain. To address the difficulty of identifying new lithologies when a conventional model is used for cross-regional prediction, and to improve the generalization ability of the model, this study takes the Suiyang Town area of Heilongjiang as the study area. Based on remote sensing data from ZY3 and SPOT-5, a method of assigning labels to the scene images after cropping is proposed to construct a multi-view remote sensing lithology scene classification dataset. On the basis of the lithology scene classification model based on multi-view data fusion, a model parameter transfer strategy is used to transfer the knowledge learned in the source domain to the new study area, and the target domain data are used for fine-tuning so that lithology categories absent from the source domain can be identified.

8.2 Methods

The distribution of lithology categories varies strongly between regions, and lithology categories that do not exist in the source domain may occur in the target domain. To address the difficulty of identifying new lithologies when a conventional model is used for cross-regional prediction, we studied the transferability of the lithology scene classification model based on multi-view data fusion and put forward a transfer learning method based on multi-view data fusion. Transfer learning based on feature-level fusion and on data-level fusion can identify new lithology types across regions and improve the generalization ability of the model.


8.2.1 Lithologic Scene Classification Based on Transfer Learning

8.2.1.1 Transfer Learning Based on Multi-view Data Level Fusion

In the lithology scene classification model based on multi-view data-level fusion, the data of each view are first processed with convolution and pooling operations so that the resulting feature maps have the same size, and the feature maps are then concatenated along the channel dimension. The number of channels after fusion depends on the types and band numbers of the input data, and it affects the parameters of the subsequent enhanced feature fusion and channel attention (EFFCA) network model. In theory, the best approach is to transfer the model parameters trained on all three kinds of view data directly. However, once the enhanced feature fusion structure is added, the numbers of channels of the model and of its convolutional kernels change with the types of input data, which makes parameter transfer difficult. As only optical and DEM data were obtained in study area B, a transfer learning strategy based on data-level fusion is proposed in this section. Its structure is shown in Fig. 8.1.

Fig. 8.1 Transfer learning based on multi-view data level fusion. [Diagram: the optical and DEM data of the study area A training set are used to train the classification model based on multi-view data level fusion; the optical and DEM data of the study area B training set are used for fine-tuning; the optical and DEM data of the study area B test set are used to test and evaluate.]

The model is implemented as follows. First, the optical and DEM data of study area A are used to train the classification model based on multi-view data-level fusion. Next, the classifier required by the target domain task is constructed to replace the original classifier of the model. Finally, the data of study area B are used to fine-tune the model, and the best model is selected.
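The classifier-replacement step can be sketched without any deep learning framework by treating the trained model as a dictionary of named parameters. The parameter names, the `classifier.` prefix, the feature dimension and the class counts below are all hypothetical; fine-tuning itself is not shown.

```python
import random


def replace_classifier(source_params, feature_dim, num_target_classes,
                       prefix="classifier.", seed=0):
    """Keep all backbone parameters and attach a freshly initialized classifier.

    source_params      : dict mapping parameter names to flat lists of floats
                         (a stand-in for a trained model's parameters)
    feature_dim        : length of the feature vector fed to the classifier
    num_target_classes : number of lithology classes in the target domain
    """
    rng = random.Random(seed)
    # Copy every parameter that does not belong to the old classifier.
    target = {name: list(values) for name, values in source_params.items()
              if not name.startswith(prefix)}
    # New classifier sized for the target classes, randomly initialized.
    target[prefix + "weight"] = [rng.gauss(0.0, 0.01)
                                 for _ in range(num_target_classes * feature_dim)]
    target[prefix + "bias"] = [0.0] * num_target_classes
    return target


if __name__ == "__main__":
    # Hypothetical source model: 4-dimensional features, 10 source classes.
    source = {
        "features.conv.weight": [0.1, 0.2, 0.3],
        "classifier.weight": [0.0] * (10 * 4),
        "classifier.bias": [0.0] * 10,
    }
    target = replace_classifier(source, feature_dim=4, num_target_classes=13)
    print({name: len(values) for name, values in target.items()})
```

Because only the classifier is resized, the number of target classes may differ from that of the source domain, which is what allows lithology categories absent from the source domain to be predicted after fine-tuning.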

8.2.1.2 Transfer Learning Based on Multi-view Feature Level Fusion

In the feature-level fusion model, the different types of data are processed by independent branches that extract feature information. During transfer learning, the parameters of the branch for the same data type can therefore be transferred: data of the same type are more similar, so the target domain can make use of more source domain knowledge after the transfer. This section accordingly introduces a transfer learning model based on feature-level fusion, whose structure is shown in Fig. 8.2.

[Figure: flowchart. The optical, DEM and SAR data of the study area A training set train the three-branch classification model based on multi-view feature-level fusion; the optical and DEM branch parameters are migrated to a two-branch feature-level fusion model, which is fine-tuned on the optical and DEM data of the study area B training set and then tested and evaluated on the optical and DEM data of the study area B test set.]

Fig. 8.2 Transfer learning based on feature level fusion


The model is implemented as follows. First, the optical, DEM and SAR data of study area A were used to train the multi-view feature-level fusion classification model. A two-branch feature-level fusion classification model was then constructed according to the multi-view data types available in study area B, and the parameters of the optical and DEM branches of the pre-trained model were used to initialize the corresponding branches of the new model. The new model was fine-tuned on the training set of study area B (Fig. 8.2). Finally, the model was evaluated on the test set.
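The branch-wise initialization described above can be sketched with plain dictionaries standing in for model parameter tables. This is a hedged illustration only: the parameter names, the `optical.`/`dem.`/`sar.` prefixes and the helper `transfer_branches` are hypothetical, and the toy lists stand in for real weight tensors.

```python
def transfer_branches(source_state, target_state, branches=("optical", "dem")):
    """Copy parameters of the shared branches from the source to the target model.

    Parameters outside the listed branches (fusion layers, classifier) keep the
    target model's own initialization and are learned during fine-tuning.
    """
    prefixes = tuple(f"{branch}." for branch in branches)
    updated = dict(target_state)
    copied = []
    for name, value in source_state.items():
        if name.startswith(prefixes) and name in updated:
            updated[name] = value
            copied.append(name)
    return updated, sorted(copied)


# Three-branch source model (study area A); values are toy placeholders.
source_state = {
    "optical.conv.weight": [0.1, 0.2],
    "dem.conv.weight": [0.3],
    "sar.conv.weight": [0.4],
    "fusion.fc.weight": [0.5, 0.6, 0.7],
    "classifier.weight": [0.8],
}
# Two-branch target model (study area B), freshly initialized.
target_state = {
    "optical.conv.weight": [0.0, 0.0],
    "dem.conv.weight": [0.0],
    "fusion.fc.weight": [0.0, 0.0],
    "classifier.weight": [0.0, 0.0],
}

new_state, copied = transfer_branches(source_state, target_state)
print(copied)
```

Because each view has its own branch, dropping the SAR branch leaves the optical and DEM parameters reusable without any change of shape; only the fusion and classifier layers have to be relearned.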

8.2.2 Lithologic Scene Classification Based on Multi-view Remote Sensing Data Fusion

8.2.2.1 Classification Based on Multi-view Data Level Fusion

The lithologic scene classification model based on multi-view data-level fusion was constructed with the data-level fusion strategy and the EFFCA network as the backbone. Data-level fusion retains the original information in the multi-view data sources to the greatest extent, including a large amount of lithologic spectral, texture, shadow and terrain information that is crucial to lithology classification. The EFFCA network has a strong feature extraction ability and can mine the key feature information related to lithology. The structure of the network is shown in Sect. 4.2.1.1, Fig. 4.2.

8.2.2.2 Classification Based on Multi-view Feature Level Fusion

The lithologic scene classification model based on multi-view feature-level fusion was constructed with the feature-level fusion strategy and the EFFCA network as the backbone. Remote sensing images from different views may contain different expressions of the same target feature. After feature-level fusion, the model can learn the target feature along more dimensions, which makes target identification more accurate and improves the performance of the model. The structure of the network is shown in Sect. 4.2.1.2, Fig. 4.3.

8.2.2.3 Classification Based on Enhanced Feature Fusion and Channel Attention

Fusing global and local features adds rich shallow information, such as texture details, to the abstract feature information extracted by the deeper layers of the network and improves the feature expression ability of the model. To deal with feature redundancy, the model must also be better able to focus on key feature information. The EFFCA network adopts the enhanced densely connected network as its backbone and keeps the original design of four dense blocks alternating with three transition layers. An SE channel attention module is added before each dense block, forming an attention dense block. Before the feature data enter the dense block, the channel attention module assigns larger weights to the key feature information that is useful for the final classification, enhancing it, and smaller weights to invalid information, suppressing its interference with the classification results.

The EFFCA network classifies remote sensing scene data as follows. First, the remote sensing image data pass through a 7 × 7 convolution and a 3 × 3 max pooling operation, and the resulting features are fed into the first attention dense block. After each attention dense block, a transition layer consisting of a 1 × 1 convolutional layer and a 2 × 2 average pooling layer reduces the size and the number of channels of the feature map, which is then passed to the next attention dense block. From the second attention dense block onward, the input contains the output feature maps of all previous attention dense blocks. These feature maps are downsampled two or three times, according to the number of dense blocks they span, and the processed feature maps are concatenated along the channel dimension. The output of the fourth attention dense block passes through the fully connected layer and the classifier layer to produce the prediction of the model.

Global Feature and Local Feature Fusion Method

DenseNet (Huang et al., 2017) is another high-performance neural network model that followed VGGNet and ResNet. By directly concatenating multiple features through skip connections to form a new combined feature, it can extract a large amount of feature information from the input data with a small number of model parameters. Reusing feature data across layers effectively alleviates the vanishing gradient problem during training and accelerates training. DenseNet is mainly composed of dense blocks and transition layers. Within each dense block, a densely connected structure is used: the output feature maps of different convolutional layers are concatenated directly, which requires the feature maps to have the same size. To reduce the feature map gradually as data pass through the network, a transition layer is introduced between dense blocks to shrink the feature map and control its number of channels, which reduces the model size and the risk of overfitting.

In a deep learning network, the output of one convolutional layer is the input of the next. As the network deepens, the feature map becomes smaller and the receptive field of the convolutional layers becomes larger. The shallow dense blocks, with their smaller receptive fields, extract local feature information of lithology, whereas the deep dense blocks are better suited to extracting global feature information. Because the traditional densely connected network can only fuse feature data inside a dense block, it cannot capture the feature connections between different dense blocks well. Shortcut connections between different dense blocks help fuse global and local features and further strengthen the ability of the network to extract abstract feature data at different scales. Based on the traditional densely connected network, we proposed a densely connected network with enhanced feature fusion by adding shortcut connections between the output feature maps of dense blocks.

DenseNet keeps the feature map size constant inside each dense block and uses a transition layer, consisting of a 1 × 1 convolutional layer and a 2 × 2 average pooling layer, to downsample between adjacent dense blocks, halving H and W and reducing the number of channels according to the compression ratio. Data fusion across dense blocks expands the data along the channel dimension by concatenating the outputs of different dense blocks. Therefore, when fusing across dense blocks, two or three downsampling operations may be needed, according to the number of dense blocks crossed, to ensure that the feature maps being concatenated have the same size. To preserve more complete information during fusion, this downsampling does not compress the number of channels.
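The size bookkeeping behind the cross-block shortcuts can be worked through numerically. The sketch below is an illustration under assumptions rather than the EFFCA implementation: it assumes the standard DenseNet stem (stride-2 7 × 7 convolution with padding 3, stride-2 3 × 3 max pooling with padding 1), a compression ratio of 0.5, and hypothetical dense block output widths; all function names are introduced here for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def block_output_sizes(input_size, num_blocks=4):
    """Spatial size (H = W) of the output of each dense block."""
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # 7 x 7 convolution
    size = conv_out(size, kernel=3, stride=2, padding=1)        # 3 x 3 max pooling
    sizes = []
    for _ in range(num_blocks):
        sizes.append(size)  # size is unchanged inside a dense block
        size //= 2          # the 2 x 2 average pooling in the transition halves H and W
    return sizes


def downsamplings_needed(src_block, dst_block):
    """Number of 2x downsamplings to match block src's output to block dst's input."""
    if not 1 <= src_block < dst_block:
        raise ValueError("blocks are 1-indexed and src must precede dst")
    return dst_block - src_block


def fused_input_channels(block_out_channels, dst_block, compression=0.5):
    """Channels entering block dst (1-indexed, dst >= 2).

    The previous block arrives through the transition layer (compressed);
    earlier blocks arrive through shortcuts without channel compression.
    """
    transition = int(block_out_channels[dst_block - 2] * compression)
    shortcuts = sum(block_out_channels[: dst_block - 2])
    return transition + shortcuts


sizes = block_output_sizes(224)
hypothetical_widths = [256, 512, 1280]  # outputs of dense blocks 1 to 3 (assumed)
print(sizes)
print(downsamplings_needed(1, 3), downsamplings_needed(1, 4))
print(fused_input_channels(hypothetical_widths, 4))
```

Under these assumptions a 224 × 224 input gives dense block outputs of 56, 28, 14 and 7 pixels per side, so the output of the first block must be halved twice to join the input of the third block and three times to join the input of the fourth.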

Extraction of Key Features from Lithology Data

Fusing global and local features adds rich texture details and other shallow information to the abstract feature information extracted by the deeper layers of the network and improves the feature expression ability of the model. In lithology classification, data redundancy and similar problems mean that part of the data interferes with the final results, so extracting the feature information that plays a key role is very important for improving classification accuracy. To deal with feature redundancy, the model must also be better able to focus on key feature information. The SENet model (Hu et al., 2018) proposes the SE (Squeeze-and-Excitation) module, which pays more attention to channel information. This chapter therefore introduces the channel attention module into the densely connected network with enhanced feature fusion.

The SE channel attention mechanism consists of two steps, "squeeze" and "excitation". During network training, the weights of different channels are adjusted to strengthen the feature information that is more effective for classification, which improves the ability of the model to extract key features. In the squeeze step, a statistic of each channel feature map is obtained through global average pooling. In the excitation step, the relationships between channels are captured and the channel weights are obtained; the weight coefficients are mapped to values between 0 and 1 by the Sigmoid function. Finally, the channel weight coefficients are multiplied with the original input feature map to update the information in the feature map.
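The squeeze, excitation and rescaling steps can be traced on a tiny feature map. The sketch below uses only the Python standard library and is not the chapter's implementation: the two fully connected layers with a ReLU in between follow the general SE design of Hu et al. (2018), while the channel count, reduction to two hidden units, random weights and all function names are assumptions made for illustration.

```python
import math
import random


def squeeze(feature_map):
    """Global average pooling: one statistic per channel of a [C][H][W] map."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(x, weight, bias):
    """Fully connected layer: weight is [out][in], bias is [out]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def excitation(stats, w1, b1, w2, b2):
    """Two fully connected layers: ReLU, then Sigmoid to map weights into (0, 1)."""
    hidden = [max(0.0, h) for h in linear(stats, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in linear(hidden, w2, b2)]


def se_block(feature_map, w1, b1, w2, b2):
    """Rescale every channel of the input by its learned attention weight."""
    weights = excitation(squeeze(feature_map), w1, b1, w2, b2)
    scaled = [
        [[weights[c] * value for value in row] for row in channel]
        for c, channel in enumerate(feature_map)
    ]
    return scaled, weights


rng = random.Random(0)
channels, hidden_units = 4, 2  # toy sizes, chosen for readability
feature_map = [[[rng.random() for _ in range(3)] for _ in range(3)] for _ in range(channels)]
w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(hidden_units)]
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(channels)]
b1, b2 = [0.0] * hidden_units, [0.0] * channels

scaled, weights = se_block(feature_map, w1, b1, w2, b2)
print([round(w, 3) for w in weights])
```

In a trained network the weights `w1` and `w2` are learned, so channels that help classification receive weights close to 1 and redundant channels are suppressed toward 0.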


8.2.3 Accuracy Evaluation

Each experiment was repeated five times; the confusion matrix of the test set was obtained and various accuracy metrics were calculated. The mean and standard deviation of the metrics were used to quantify the performance differences between classification models. In addition, ablation experiments were used to verify the effect of the two improved modules in the EFFCA network and to analyze how fusing different combinations of view data affects classification performance. Based on the constructed model, scene-level prediction and accuracy evaluation were performed for the study area. OA, the Kappa coefficient and F1_score were used as evaluation metrics; their calculation is described in Sects. 3.2.4 and 5.2.2.
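As a reminder of how these metrics follow from a confusion matrix, a minimal sketch is given below. It uses the common textbook definitions; the exact formulas of Sects. 3.2.4 and 5.2.2 are not reproduced here, so the macro averaging of F1_score, the use of the sample standard deviation, and the convention that rows are real labels are assumptions. The matrix and the run values are made-up numbers.

```python
import statistics


def overall_accuracy(cm):
    """Share of correctly classified samples; cm[i][j] = real class i, predicted j."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores (averaging mode is an assumption)."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / n


def summarize(values):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)


toy_cm = [[8, 2], [1, 9]]  # made-up two-class confusion matrix
print(overall_accuracy(toy_cm), kappa(toy_cm), macro_f1(toy_cm))
print(summarize([65.0, 66.0, 67.0, 66.5, 65.5]))  # made-up OA values of five runs
```

Reporting the mean and standard deviation over the five runs, as the tables of this chapter do, makes it possible to judge whether a difference between two models exceeds the run-to-run variation.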

8.3 Results and Discussion

8.3.1 Experimental Setup and Hyperparameter Optimization

The dataset of each region was randomly divided into training, validation and test sets at a ratio of 6:2:2. To obtain reliable results, each parameter setting was run five times, and the mean and standard deviation of the results were taken. The experimental environment is shown in Table 8.1. Different parameter choices during training affect the results; to reduce this effect, the parameters of all related experiments were set according to Table 8.2. Owing to the limited memory of the graphics cards, the data can only be fed to the classification network in batches during training, and the number of images processed at a time is controlled by the batch size. The initial learning rate was set to 0.001, and a learning rate decay strategy was introduced: when training reaches a set number of epochs, the learning rate is adjusted. In this chapter, when training reached 0.5 and 0.75 of the total number of epochs, the learning rate was set to 0.1 times the original. For the other parameters involved in the experiments, the default values of the framework were used.

Table 8.1 Experimental environment configuration

Experimental environment                          Specific configuration
Hardware environment   CPU                        2*E5-2620V4
                       GPU                        2*GeForce RTX 2080Ti
                       Memory                     64 GB
Software environment   Operating system           CentOS 7.6.1810 (Core)
                       Deep learning framework    PyTorch 1.8.1
                       GPU driver version         CUDA 10.2/CUDNN 7.6.5
                       Programming language       Python 3.8

Table 8.2 Experimental parameter settings

Parameter name               Parameter value
Batch size                   60
Epochs                       150
Initial learning rate        0.001
Weight decay coefficient     0.0001
Optimizer                    Adam
Loss function                Cross entropy loss function
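The learning rate decay described above amounts to a step schedule, which can be sketched as follows. The text does not say whether the second reduction is applied to the already reduced rate or to the initial rate, so the sketch exposes both readings through a hypothetical `cumulative` flag; zero-based epoch counting and rounding the second milestone down (0.75 × 150 = 112.5 to 112) are likewise assumptions.

```python
def learning_rate(epoch, total_epochs=150, initial_lr=0.001, factor=0.1,
                  fractions=(0.5, 0.75), cumulative=True):
    """Step schedule: reduce the rate once training passes each milestone.

    cumulative=True  multiplies by `factor` at every milestone (0.001, 1e-4, 1e-5).
    cumulative=False keeps the rate at factor * initial_lr after the first milestone.
    """
    milestones = [int(total_epochs * fraction) for fraction in fractions]
    passed = sum(epoch >= milestone for milestone in milestones)
    if cumulative:
        return initial_lr * factor ** passed
    return initial_lr * (factor if passed else 1.0)


for epoch in (0, 74, 75, 111, 112, 149):
    print(epoch, learning_rate(epoch), learning_rate(epoch, cumulative=False))
```

The cumulative reading corresponds to the behaviour of PyTorch's `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.1`; whether the authors used that scheduler is not stated.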

8.3.2 Experimental Result

8.3.2.1 Overall Accuracy Evaluation

With the classification model based on data-level fusion, the parameters of the model trained on fused optical and DEM data were transferred to study area B. Table 8.3 shows the experimental results after transfer. Fine-tuning the transferred model parameters on the target region improves performance compared with training directly on the target data: OA, Kappa and F1_score increase by 1.11%, 0.87% and 0.75%, respectively. This shows that data-level fusion combined with model parameter transfer preserves a certain generalization ability of the model, and that the lithology-related knowledge learned in the source domain can be applied to the target domain.

Table 8.3 Accuracy of the parameter transfer model based on multi-view data-level fusion

Model                  OA (%)          Kappa (%)       F1_score (%)
Direct training        65.74 ± 1.59    53.7 ± 1.91     61.13 ± 2.24
Parameter migration    66.85 ± 1.50    54.57 ± 1.72    61.88 ± 1.41

Based on the best model trained with feature-level fusion, the model parameters were migrated, and the generalization ability of the model under different data conditions was explored on the dataset constructed for study area B. The evaluation metrics are shown in Table 8.4. After the transfer and fine-tuning of model parameters, the classification accuracy improves over the directly trained model: OA, Kappa and F1_score increase by 1.94%, 2.24% and 1.78%, respectively. During the transfer, only the parameters of the optical branch and the DEM branch of the source domain model were transferred. This shows that even when one data type is missing, the knowledge learned in the source domain can still be applied well to the new domain.

To further verify the effect of fine-tuning on the target domain data, we froze the network layers before the classifier and trained only the parameters of the classifier layer. The results are shown in Table 8.5. Fine-tuning the model greatly improves classification: compared with the model with frozen parameters, the OA, Kappa and F1_score of the model increase by 3.12%, 10.14% and 10.38%. The improvement in Kappa is the most obvious, indicating that fine-tuning on the target domain data improves the consistency of the model predictions. The lithology data of the two study areas are similar to a certain extent, so the convolutional layers with frozen parameters can act as a feature extractor that uses source domain knowledge to extract features directly from the target domain. Although this achieves reasonable results, the two study areas still differ in two respects. First, two types of lithology were added in the target area. Second, the spectral data sources of the two study areas were different, resulting in differences in spectral information between the two areas. Fine-tuning can therefore reduce the influence of these differences on the model.
To further verify the classification effect of transfer learning with fewer samples, part of the data was extracted from the training set of each lithology in study area B, so that the training, validation and test sets of each lithology together contained 150 samples. The accuracy of the migrated model is shown in Table 8.6. The classification results improve, indicating that the migrated model can also raise classification accuracy when samples are scarce.

Table 8.4 Accuracy of the parameter transfer model based on multi-view feature-level fusion

Model                  OA (%)          Kappa (%)       F1_score (%)
Direct training        66.85 ± 1.68    55.44 ± 2.14    62.99 ± 1.90
Parameter migration    68.79 ± 1.28    57.68 ± 2.03    64.77 ± 2.07

Table 8.5 Comparison of accuracy evaluation of two transfer forms

Model                                      OA (%)          Kappa (%)       F1_score (%)
Freeze the layers before the classifier    61.66 ± 0.61    47.19 ± 0.73    52.87 ± 0.67
Parameter migration                        68.79 ± 1.28    57.68 ± 2.03    64.77 ± 2.07


Table 8.6 Accuracy evaluation of transfer learning in a small number of samples

Model                  OA (%)          Kappa (%)       F1_score (%)
Direct training        64.15 ± 2.18    56.51 ± 2.67    61.52 ± 2.29
Parameter migration    64.78 ± 1.70    57.33 ± 2.13    63.25 ± 2.20

8.3.2.2 Single Class Accuracy Evaluation

Figure 8.3 shows the confusion matrices obtained on the dataset of study area B by direct training and by transferring the parameters of the multi-view data-level fusion model. The classification accuracy of andesite, schist and granite-diorite improves by 15.3%, 3.3% and 6.6%, respectively, and andesite, which shows the largest improvement, is a new lithology category in the target domain. The accuracy of the model on slate and loose sediments declines somewhat, which may be due to the different sources of spectral data in study areas A and B. The data-level fusion model fuses information at a low level, and fusing data in which the same object has different spectra affects the performance of the model.

[Figure: two normalized confusion matrices, (a) parameter migration and (b) direct training. Axes: real label and predicted label. Classes: andesite, slate, schist, basalt, loose accumulation, granite-diorite.]

Fig. 8.3 Confusion matrices that directly train and utilize multi-view data-level fusion model parameter transfer

Figure 8.4 shows the confusion matrices obtained by direct training and by transferring the parameters of the multi-view feature-level fusion model. The overall classification accuracy of the model with transfer learning is higher than 46%, and the classification accuracy of schist and loose deposits increases by 10% and 10.9%, respectively. A possible reason is that these two types of lithology are similar in the source and target domains, so the model can better apply previously learned knowledge to the new study area. Andesite in the target domain does not appear in the source domain, but its classification accuracy also improves by 7.7%, indicating that the model generalizes well and can act as a feature extractor that learns the features of new lithology categories from the target domain. Basalt is also a new lithology category in the target domain. With transfer learning its accuracy decreases by 2.8%, which may be because basalt and granite-diorite in the target domain are distributed in adjacent areas with unclear boundaries and highly similar ground cover, which affected the generalization ability of the model to a certain extent during transfer learning.

[Figure: two normalized confusion matrices, (a) parameter migration and (b) direct training. Axes: real label and predicted label. Classes: andesite, slate, schist, basalt, loose accumulation, granite-diorite.]

Fig. 8.4 Confusion matrices that directly train and utilize multi-view feature-level fusion model parameter transfer

The experiments show that both transfer learning methods improve performance compared with training directly on the target data. A comparison shows that the feature-level fusion model trained on three data sources achieves better results than the data-level fusion model when transferred to a target domain with two data sources, indicating that a model trained in this way transfers better. Even if the multi-view data types in the source and target domains differ, the transferred model can still bring better classification results.
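The per-class figures quoted above are differences between the diagonals of two row-normalized confusion matrices. A minimal sketch of that comparison is given below; the counts are made up for illustration and do not reproduce Figs. 8.3 or 8.4, the helper names are hypothetical, and the convention that rows hold the real labels is an assumption.

```python
def normalize_rows(cm):
    """Divide each row by its sum so that row i describes real class i."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


def per_class_accuracy(cm):
    """Diagonal of the row-normalized matrix: share of each class predicted correctly."""
    return [row[i] for i, row in enumerate(normalize_rows(cm))]


def accuracy_change(cm_direct, cm_transfer, labels):
    """Per-class accuracy of the transferred model minus that of direct training."""
    direct = per_class_accuracy(cm_direct)
    transfer = per_class_accuracy(cm_transfer)
    return {label: t - d for label, d, t in zip(labels, direct, transfer)}


labels = ["andesite", "slate"]     # two of the six classes, for brevity
cm_direct = [[6, 4], [2, 8]]       # made-up counts
cm_transfer = [[7, 3], [3, 7]]     # made-up counts

for label, change in accuracy_change(cm_direct, cm_transfer, labels).items():
    print(f"{label}: {change:+.1%}")
```

A positive value means the transferred model classifies that lithology more accurately than the directly trained one, and a negative value means the opposite.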

8.4 Conclusion

The distribution of lithology categories differs greatly between regions, and lithology categories that do not exist in the source domain may occur in the target domain, which makes it difficult for conventional models to identify new lithology in cross-regional prediction. Based on the idea of transfer learning, this chapter uses a small amount of multi-view data and a fine-tuning strategy to adapt to the lithology classification task in a new study area, and carries out transfer learning of the multi-view lithology classification model so that new lithology categories can be identified. The experiments show that both transfer learning methods improve performance compared with training directly on the target data. After the transfer and fine-tuning of the feature-level fusion model parameters, the classification accuracy improves over the directly trained model, with OA, Kappa and F1_score increasing by 1.94%, 2.24% and 1.78%, respectively. The feature-level fusion model trained on three data sources and transferred to a target domain with two data sources outperforms the data-level fusion model, indicating that a model trained in this way performs better in transfer learning. In addition, with the feature-level fusion strategy the model parameters are more conducive to transfer learning, and the model is easier to extend to new areas.

References

Chen, W., Ouyang, S., Tong, W., et al. (2022). GCSANet: A global context spatial attention deep learning network for remote sensing scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 1150–1162.
Deng, J., Dong, W., Socher, R., et al. (2009). Imagenet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition, 2009, 248–255.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141.
Huang, G., Liu, Z., Van Der Maaten, L., et al. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708).
Ma, D., & Li, P. (2008). Lithology classification with multi-scale image texture. Acta Petrologica Sinica, 24(6), 1425–1430 (in Chinese).
Oquab, M., Bottou, L., Laptev, I., et al. (2014). Learning and transferring mid-level image representations using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1717–1724.
Tian, T., Li, L., Chen, W., et al. (2021). SEMSDNet: A multiscale dense network with attention for remote sensing scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 5501–5514.
Tong, W., Chen, W., Han, W., et al. (2020). Channel-attention-based DenseNet network for remote sensing image scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 4121–4132.
Yosinski, J., Clune, J., Bengio, Y., et al. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27.

Chapter 9

Lithological Scene Classification Based on Model Migration and Fine-Tuning Strategy

Abstract Lithologic categories are distributed very unevenly across regions, so one domain may contain lithologic categories that are absent from another. Conventional models therefore have difficulty identifying such unseen lithology in cross-regional prediction. To tackle this problem, this chapter adopts the idea of transfer learning: an improved densely connected network is developed for the source domain, and the model is fine-tuned with a small amount of data from the target domain to classify the lithology there. A classification experiment was conducted using dataset A1 and dataset A2. The results indicate that the proposed model can identify new lithology types that were not found in the source domain and also improves classification accuracy with only limited samples. OA, F1_score and Kappa were 61.52 ± 0.95%, 55.58 ± 2.58% and 52.18 ± 1.01%, respectively, on the normal test set, and 47.40 ± 0.65%, 49.58 ± 0.41% and 40.41 ± 0.45%, respectively, on the small-sample test set, all superior to direct training.

9.1 Introduction

Good performance in deep learning tasks usually requires a large amount of labeled data for training. In certain fields, however, such as remote sensing lithology classification, obtaining sufficient annotated data can be difficult and expensive. Researchers have therefore begun to consider cross-domain transfer learning methods. This chapter discusses how transfer learning can be used to solve the problem of lithology classification and explores its further application.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_9


9.1.1 Overview of Transfer Learning

Transfer learning improves the performance of a model by applying previously learned knowledge and experience to new problems. In traditional machine learning, a model is usually trained for one specific task and typically requires a large amount of annotated data to perform well. In many practical applications, however, there is not enough annotated data, or the time and resources needed to train a model from scratch are not available. Transfer learning offers an effective way to solve these problems: a neural network learns in the source domain, and the acquired knowledge is applied to the target domain, thereby improving the performance of the network model in the target domain.

A domain is usually denoted as $D$ and can be expressed as $D = \{\mathcal{X}, P(X)\}$, where $\mathcal{X}$ is the feature space of the domain, $P(X)$ is the marginal probability distribution, and $X$ is a set of instances in the feature space: $X = \{x_1, x_2, \ldots, x_n\} \in \mathcal{X}$. A task is denoted as $T$ and can be expressed as $T = \{Y, f(\cdot)\}$, where $Y$ is the label space and $f(\cdot)$ is the prediction function in domain $D$.

Transfer learning has three main components:

(1) Source domain: the domain in which some annotated data are already available and a certain task has been solved by training a model. The data and experience from the source domain can be used to help train models in new domains.
(2) Target domain: the domain in which predictions are to be made. Usually, only a small amount of annotated data is available in the target domain.
(3) Transfer method: the way in which knowledge and experience are transferred from the source domain to the target domain. Different transfer methods can be adopted, and the most suitable one is chosen according to the specific problem and data.

9.1.2 Deep Transfer Learning

Deep transfer learning studies how knowledge from other fields can be harnessed through deep neural networks. With the popularity of deep neural networks in various fields, a large number of deep transfer learning methods have been proposed. Pan and Yang (2009) proposed that, according to the form of the transferred knowledge, transfer learning can be divided into instance-based deep transfer learning, mapping-based deep transfer learning, adversarial deep transfer learning and network-based deep transfer learning.

Instance-based deep transfer learning uses a specific weight adjustment strategy to select some instances from the source domain as a supplement to the target domain training set and assigns appropriate weights to the selected instances. It is based on the assumption that "despite the differences between the two domains, some instances in the source domain can be utilized in the target domain with appropriate weight." TrAdaBoost, proposed by Dai et al. (2007), uses AdaBoost-based technology to filter out source domain instances that are dissimilar to the target domain. The instances in the source domain are reweighted to form a distribution similar to that of the target domain, and the model is then trained with the reweighted source domain instances and the original target domain instances. While maintaining the properties of AdaBoost, this reduces the weighted training error on domains with different distributions. TaskTrAdaBoost, proposed by Yao and Doretto (2010), is a fast algorithm that promotes rapid retraining on new targets. Building on TrAdaBoost, which was designed for classification problems, Li et al. (2017) proposed an enhanced TrAdaBoost to classify sandstone microscopic images from different regions.

Mapping-based deep transfer learning maps instances of the source and target domains to a new data space. In this new data space, instances from both domains are similar and suitable for training a joint deep neural network. Long et al. (2016) proposed the joint maximum mean discrepancy (JMMD) to measure the relationship between joint distributions. JMMD is used to generalize the transfer learning ability of deep neural networks (DNNs) so that they adapt to the data distributions of different domains, which improves on previous work. The Wasserstein distance adopted by Chang et al. (2017) can be used as a new domain distance measure to achieve a better mapping.

Adversarial deep transfer learning introduces adversarial techniques inspired by generative adversarial networks (GANs) (Goodfellow et al., 2014) to find transferable representations applicable to both the source domain and the target domain. It has developed rapidly in recent years because of its good results and strong practicability. Ajakan et al.
(2014) introduced adversarial techniques for domain adaptive transfer learning by using domain adaptive regularization terms in loss functions. Ganin and Lempitsky (2014) proposed an adversarial training method that is suitable for most feedforward neural models by adding a small number of standard layers and a simple new gradient inversion layer. Tzeng et al. (2017) proposed a new GAN loss and a new domain adaptive method combined with discriminant modeling. Long et al. (2017) proposed a stochastic multilinear adversarial network that utilizes multiple feature layers and a classifier layer based on stochastic multilinear adversarial to achieve depth and discriminative adversarial adaptation. Network-based deep transfer learning refers to reusing a part of the network pre-trained by the source domain (including its network structure and connection parameters) as part of the deep neural network used by the target domain. It is based on the assumption that neural networks are similar to the processing mechanisms of the human brain and are an iterative, continuous abstract process. The front layer of the network can be viewed as a feature extractor, and the extracted features are generic. Zhu et al. (2016) simultaneously learned domain adaptive and deep hashing features in DNN. Chang et al. (2017) proposed a new multi-scale convolutional sparse coding method. The method can automatically learn filter banks at different scales and enhance the scale specificity of the learning mode, providing an unsupervised solution for learning transferable fundamentals and fine-tuning for target tasks. Another very notable result (Yosinski et al., 2014) points out the relationship between network

structure and transferability. The results show that some modules may not affect in-domain accuracy but may affect transferability, and they identify which features are transferable in deep networks and which types of networks are more suitable for transfer. In this study, DenseNet is considered to be a better choice for network-based deep transfer learning. With the wide application and study of deep learning models, deep learning based on network migration has been widely used in different fields, especially in remote sensing, where data acquisition is difficult. The network model is usually trained in the source domain and then fine-tuned with data from the target domain, so that it adapts to the characteristics of the target-domain data and completes the target-domain task. Yosinski et al. (2014) conducted experiments on the ImageNet dataset (Deng et al., 2009), randomly dividing it into two groups to train and evaluate the transferability of the network model. The results show that the transfer performance of the model decreases as the similarity between the source task and the target task decreases. Even so, transferring the parameters gives better results than initializing them randomly. In another study, Oquab et al. (2014) took a model pre-trained on the ImageNet dataset, transferred its parameters to a target domain with a small amount of data, and fine-tuned the model to obtain good results. This study shows that convolutional neural networks can still improve classification accuracy through transfer learning on datasets with different statistical characteristics and tasks.
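The pre-train-then-fine-tune procedure described above can be illustrated with a small sketch. It is not the DenseNet model used in this chapter: the two-layer network, the synthetic Gaussian "domains", and the names `train`, `make_domain`, and `freeze_features` are all assumptions made only for illustration. The sketch pre-trains a feature layer on a three-class source domain, then reuses it for a four-class target domain, either with the feature layer frozen or with all layers fine-tuned.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, W1, W2, epochs=200, lr=0.1, freeze_features=False):
    """Full-batch gradient descent on softmax cross-entropy.
    W1 plays the role of the feature extractor, W2 of the classifier."""
    W1, W2 = W1.copy(), W2.copy()
    onehot = np.eye(W2.shape[1])[y]
    for _ in range(epochs):
        H = np.tanh(X @ W1)                      # hidden features
        P = softmax(H @ W2)                      # class probabilities
        G = (P - onehot) / len(X)                # gradient w.r.t. logits
        grad_W2 = H.T @ G
        grad_W1 = X.T @ ((G @ W2.T) * (1.0 - H ** 2))
        W2 -= lr * grad_W2
        if not freeze_features:                  # frozen layers are not updated
            W1 -= lr * grad_W1
    return W1, W2

def accuracy(X, y, W1, W2):
    return float((np.argmax(np.tanh(X @ W1) @ W2, axis=1) == y).mean())

def make_domain(rng, n, n_classes, shift):
    """Hypothetical toy domain: Gaussian clusters in 8 dimensions."""
    y = rng.integers(0, n_classes, size=n)
    centers = 3.0 * np.eye(n_classes, 8) + shift
    return centers[y] + rng.normal(size=(n, 8)), y

rng = np.random.default_rng(0)
Xs, ys = make_domain(rng, 600, 3, shift=0.0)     # source: 3 classes, many samples
Xt, yt = make_domain(rng, 60, 4, shift=0.5)      # target: 4 classes, few samples

# Pre-train on the source domain.
W1_src, _ = train(Xs, ys,
                  rng.normal(scale=0.1, size=(8, 16)),
                  rng.normal(scale=0.1, size=(16, 3)))

# The target has a new class, so the classifier head is re-initialised.
W2_new = rng.normal(scale=0.1, size=(16, 4))
W1_frozen, W2_frozen = train(Xt, yt, W1_src, W2_new, freeze_features=True)
W1_ft, W2_ft = train(Xt, yt, W1_src, W2_new, freeze_features=False)

print("frozen features :", accuracy(Xt, yt, W1_frozen, W2_frozen))
print("full fine-tuning:", accuracy(Xt, yt, W1_ft, W2_ft))
```

Re-initialising the classifier head is one simple way to handle a target class that is absent from the source domain; the accuracies printed here are training accuracies on toy data and say nothing about the lithology experiments.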

9.2 Methods

9.2.1 Left Side as the Source Domain, and Right Side as the Target Domain

We first trained the model on the left area of the study site as the source domain. The data on the right side of the study site, which serves as the target domain, were divided into a training set, a validation set, and a testing set. The training set was used to fine-tune the model trained on the left-area data, and the test set in the right area was then used to evaluate its performance. For the small-sample experiment, a total of 50 samples were selected from each category in the right area and divided into training, validation, and testing sets. The small-sample training set was used to fine-tune the model trained on the left-area data, and the small-sample test set in the right area was used to test and evaluate the model performance (Fig. 9.1).
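The small-sample selection described above can be sketched as follows. The chapter states only that 50 samples per category are split into training, validation, and testing sets; the 30/10/10 split, the random seed, and the function name `small_sample_split` are assumptions for illustration.

```python
import random
from collections import defaultdict

def small_sample_split(labels, per_class=50, n_train=30, n_val=10, seed=0):
    """Pick `per_class` samples from every class and split them.
    The 30/10/10 ratio is an assumed choice, not taken from the chapter."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    split = {"train": [], "val": [], "test": []}
    for lab, idxs in sorted(by_class.items()):
        if len(idxs) < per_class:
            raise ValueError(f"class {lab!r} has fewer than {per_class} samples")
        chosen = rng.sample(idxs, per_class)       # sampling without replacement
        split["train"] += chosen[:n_train]
        split["val"] += chosen[n_train:n_train + n_val]
        split["test"] += chosen[n_train + n_val:]
    return split

# Usage with three hypothetical lithology classes of 80 scenes each.
labels = [name for name in ("granite", "diorite", "quaternary") for _ in range(80)]
split = small_sample_split(labels)
print({k: len(v) for k, v in split.items()})
```

Sampling per class keeps the small dataset balanced, so no category dominates the fine-tuning step.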

Fig. 9.1 Transfer learning model figure 1

9.2.2 Right Side as the Source Domain, and Left Side as the Target Domain

To verify the robustness and generalization ability of the model, the source domain and the target domain were exchanged. The procedure consists of three steps. First, we trained the model on the right area of the study site as the source domain. Then we used the left side of the study site as the target domain to obtain the training, validation, and testing sets, and used the training set to fine-tune the model trained on the right-area data. Finally, we used the test set in the left area to test and evaluate the performance (Fig. 9.2).

9.3 Results and Analysis

9.3.1 Experimental Results and Analysis of Transfer Learning Model

Based on the improved densely connected network, the model parameters were transferred and the generalization ability of the model in different data environments was explored on the dataset constructed in study site A2.

Fig. 9.2 Transfer learning model figure 2

The evaluation metrics of the final experimental results are shown in Table 9.1. Transferring the model parameters and fine-tuning with a small amount of labeled data in the target domain improves the classification accuracy compared with training the model directly in the target domain: the F1 score, Kappa, and OA increase by 4.07, 5.57, and 4.52 percentage points, respectively. Figures 9.3 and 9.4 show the confusion matrices for direct training and for parameter transfer. They show that the classification accuracies of Quaternary, shallow grained, and pyroxenite rocks improved by 19.36%, 15.38%, and 16.66%, respectively, with the transfer learning model. These relatively large gains are possibly due to the similarity of these three types in the source and target domains, which allows the model to apply the previously learned knowledge effectively to the new research area.

Table 9.1 Precision evaluation of parameter migration model

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| Direct training | 51.51 ± 0.70 | 46.61 ± 0.86 | 57.00 ± 1.28 |
| Transfer learning | 55.58 ± 2.58 | 52.18 ± 1.01 | 61.52 ± 0.95 |
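As a reminder of how the three metrics in Table 9.1 relate to a confusion matrix, the sketch below computes OA, Cohen's Kappa, and a macro-averaged F1 score. Whether the chapter averages F1 over classes in this way is not stated, so the macro average is an assumption, and the 2 × 2 matrix is made-up example data rather than a result from this chapter.

```python
import numpy as np

def classification_metrics(cm):
    """OA, Cohen's Kappa and macro F1 from a confusion matrix
    (rows: true classes, columns: predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)

    oa = tp.sum() / total                                   # overall accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)

    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return oa, kappa, f1.mean()                             # macro-averaged F1

# Toy example with two classes.
oa, kappa, f1 = classification_metrics([[8, 2],
                                        [1, 9]])
print(f"OA={oa:.4f}  Kappa={kappa:.4f}  F1={f1:.4f}")
```

Kappa discounts the agreement expected by chance, which is why it can move more than OA when the class distribution is unbalanced.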

Fig. 9.3 Directly trained confusion matrix

The diorite label in the target domain was not available in the source domain, but its classification accuracy also improved by 4.77%, indicating that the transfer learning model generalizes well: the model can serve as a feature extractor and learn the features of new lithological categories from the target domain. To further validate the effectiveness of fine-tuning the model with target-domain data, the network layers before the classifier were frozen on the dataset in study area A2, and only the classifier layer of the model was trained. The final experimental results are shown in Table 9.2. Compared with the frozen-parameter model, fine-tuning improves the F1 score, Kappa, and OA by 5.56, 7.74, and 4.29 percentage points, respectively. The Kappa coefficient shows the largest improvement, indicating that fine-tuning with the target-domain data markedly improved the consistency of the model predictions. There is a certain similarity in the lithology data of the two research areas. The frozen convolutional layers preserve the effective feature representation of the source-domain model in the target domain and keep it from being altered by the small-scale target-domain data, which avoids overfitting and reduces the influence of insufficient training data from the target domain. This benefits the generalization ability and performance of the model in its predictions for the target domain. Although freezing achieves certain results, due to the differences between the two research areas,

Fig. 9.4 Confusion matrix of parameter migration

Table 9.2 Comparison of precision evaluation between two types of migration forms

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| Freeze layers before classifier | 50.02 ± 1.59 | 44.44 ± 0.24 | 57.23 ± 0.72 |

a type of lithology that was not present in the source domain was included in the target categories. Therefore, fine-tuning can to some extent mitigate the impact of this difference on the model. To verify the generalization ability of the model, the source domain and target domain were also exchanged: the target domain of the first experiment was used as the source domain, and the source domain was used as the target domain. If a transfer learning model still performs well when the source and target domains are interchanged, it indicates that the model has good generalization ability and robustness and can be applied effectively to different datasets. In addition, exchanging the source and target domains helps us better understand the mechanism of transfer learning models. By observing the performance of the model on different datasets, it is possible to better understand how it uses source-domain knowledge to assist target-domain learning. The evaluation metrics for the experimental results are shown in Table 9.3. The table shows that the F1 score, Kappa, and OA improve by 7.16, 4.25, and 3.31 percentage points, respectively. This means that when the

Table 9.3 Precision evaluation of parameter migration model (exchanged)

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| Direct training | 56.17 ± 1.88 | 63.60 ± 1.75 | 71.74 ± 1.22 |
| Transfer learning | 63.33 ± 4.30 | 67.85 ± 0.63 | 75.05 ± 0.34 |

source domain and target domain were exchanged, the model proposed in this chapter could still improve classification performance, indicating that the model has good robustness and generalization ability.

9.4 Experimental Results and Analysis of Transfer Learning Based on Small Samples

To further verify the generalization ability of the model, transfer learning was also applied with a smaller sample set. Small samples are easily affected by noise and sample specificity, which in turn affects the generalization ability of the model. By using transfer learning, the generalization ability of the target-domain model can be improved with knowledge from the source domain, making it more adaptable to the characteristics of the target-domain data. On the basis of the A2 dataset, a dataset with fewer samples was constructed for model training: a portion of the data was extracted from each lithology class in study area A2 so that the training, validation, and testing sets for each lithology class totaled 50 images. The accuracy evaluation of the transferred model is shown in Table 9.4. In the small-sample case, the accuracy decreases significantly compared with the previous scenarios, which indicates the importance of sample size: key features are extracted better when samples are sufficient. However, compared with training without transfer, the transfer learning method still improved the classification performance of the model when only a small amount of labeled target-domain data was used for fine-tuning. The improvements in F1 score, Kappa, and OA were 5.08, 4.76, and 4.02 percentage points, respectively. The transferred model carries the lithology classification knowledge learned in the source domain into its predictions for the target domain, which improves classification accuracy even when there are few samples and reflects the generalization ability of the model.

Table 9.4 Precision evaluation of transfer learning with a small number of samples

| Model | F1_score (%) | Kappa (%) | OA (%) |
|---|---|---|---|
| Direct training | 44.50 ± 1.86 | 35.65 ± 2.23 | 43.38 ± 1.97 |
| Transfer learning | 49.58 ± 0.41 | 40.41 ± 0.45 | 47.40 ± 0.65 |

9.5 Conclusion

This chapter uses transfer learning to study the generalization ability of the lithology classification model. The target-domain study site A2 contained lithological types that were not present in the source-domain study site A1. To improve the generalization ability of the model, a classification model trained in the source domain was used, and the knowledge it had learned was transferred to the target domain through model parameter transfer. A small amount of labeled data in the target domain was then used to fine-tune the transferred model, which improved its classification performance in the target domain and made it possible to identify new lithological categories. The experiments show that, in all scenarios tested, transfer learning improves performance compared with training the model directly on the target-domain data.

References

Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., & Marchand, M. (2014). Domain-adversarial neural networks. arXiv preprint arXiv:1412.4446.
Chang, H., Han, J., Zhong, C., Snijders, A., & Mao, J. H. (2017). Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Dai, W., Yang, Q., Xue, G. R., & Yu, Y. (2007). Boosting for transfer learning. In Proceedings of the 24th international conference on machine learning (pp. 193–200).
Deng, J., Dong, W., Socher, R., et al. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255).
Ganin, Y., & Lempitsky, V. (2014). Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 2672–2680.
Li, N., Hao, H., Gu, Q., Wang, D., & Hu, X. (2017). A transfer learning method for automatic identification of sandstone microscopic images. Computers and Geosciences, 103, 111–121.
Long, M., Cao, Z., Wang, J., & Jordan, M. I. (2017). Domain adaptation with randomized multilinear adversarial networks. arXiv preprint arXiv:1705.10667.
Long, M., Wang, J., & Jordan, M. I. (2016). Deep transfer learning with joint adaptation networks. arXiv preprint arXiv:1605.06636.
Oquab, M., Bottou, L., Laptev, I., et al. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1717–1724).
Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Tzeng, E., Hoffman, J., Saenko, K., & Darrell, T. (2017). Adversarial discriminative domain adaptation. In Computer vision and pattern recognition (CVPR) (Vol. 1, p. 4).
Yao, Y., & Doretto, G. (2010). Boosting for transfer learning with multiple sources. In Computer vision and pattern recognition (CVPR) (pp. 1855–1862).
Yosinski, J., Clune, J., Bengio, Y., et al. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27.
Zhu, H., Long, M., Wang, J., & Cao, Y. (2016). Deep hashing network for efficient similarity retrieval. AAAI, 2415–2421.

Chapter 10

Hyperspectral Remote Sensing Inversion of Mineral Abundance Based on Sparse Unmixing Method

Abstract The task of mineral abundance inversion using hyperspectral images is a promising challenge. This study presents several existing state-of-the-art sparse unmixing algorithms that combine spectral and spatial information from hyperspectral images. They show good performance in both simulation and real hyperspectral datasets. In particular, the spectral information and spatial structure information in hyperspectral images can be more fully utilized by introducing the superpixel segmentation algorithm. Taking the Cuprite dataset as an example, which is a real mining dataset, experiments indicate that the sparse unmixing algorithm achieves satisfactory results on this dataset.

10.1 Introduction

Since the 1950s, land observation satellites have been applied to Earth observation missions, which has made hyperspectral remote sensing image processing a popular and rapidly developing research topic. Hyperspectral images capture information in a continuum of spectral bands, usually covering the whole visible and near-infrared range. They contain hundreds or even thousands of contiguous spectral bands and thus provide rich spectral information. However, due to the limited spatial resolution of the sensor and the complexity and diversity of the ground, a single pixel of a hyperspectral image may contain the spectral signatures of several different substances; such a pixel is called a mixed pixel. The presence of mixed pixels is a serious impediment to further exploitation of hyperspectral images. To solve this problem, hyperspectral unmixing (SU) algorithms have been proposed (Bioucas-Dias et al., 2012). SU algorithms aim to extract the spectral signatures of the pure materials in the mixed pixels, called endmembers, and to estimate their corresponding abundances. With SU algorithms, hyperspectral images can be interpreted and analyzed more accurately, which helps researchers better understand the information they contain. The linear mixing model and nonlinear mixing model are two models commonly used

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_10

in hyperspectral unmixing algorithms. The linear mixing model assumes that the mixing of different substances and materials occurs on a macroscopic scale and that the interactions between different endmembers are negligible. Thus, the observed spectral signature of a mixed pixel can be characterized as a linear combination of the endmember signatures weighted by their corresponding fractional abundances. The nonlinear mixing model assumes that the endmembers in the scene are subject to multiple scattering effects, so that the observed spectral response of a mixed pixel is a nonlinear combination of the endmember signatures. Although the linear mixing model cannot completely and accurately describe the imaging process of hyperspectral images in real scenes, it is usually considered an acceptable approximation of light scattering and instrumental mechanisms. Moreover, due to its simplicity, effectiveness, and ease of computation, the linear mixing model has been widely used in hyperspectral unmixing algorithms. Therefore, the hyperspectral unmixing algorithms presented in this book are based on the linear mixing model.

Hyperspectral unmixing algorithms mainly include geometry-based methods, statistics-based methods, and sparse regression-based methods. Geometry-based unmixing algorithms work by geometrically modeling the hyperspectral image dataset. Such algorithms assume that each endmember in a hyperspectral image dataset can be represented as a vertex of a simplex, which is either the minimum-volume simplex that encloses the dataset or the maximum-volume simplex contained in the convex hull of the dataset. A simplex is a basic geometric figure that consists of a set of points and the line segments, triangles, and tetrahedrons that connect those points. By representing each endmember as a vertex of a simplex, a geometry-based unmixing algorithm can treat each mixed pixel as a point inside the simplex, that is, as a convex combination of its vertices whose coefficients are the abundances. This enables geometric modeling of hyperspectral image datasets, through which the unmixing and quantitative analysis of mixed pixels is finally accomplished. Classical geometry-based unmixing algorithms include N-FINDR, vertex component analysis, and simplex growing algorithms. Statistics-based unmixing algorithms take advantage of the statistical properties of hyperspectral images and can simultaneously identify endmembers and their corresponding abundances. Popular statistics-based unmixing algorithms include independent component analysis, nonnegative matrix factorization, and Bayesian methods. Both geometry-based and statistics-based unmixing methods are unsupervised and can extract endmember information directly from raw hyperspectral image data without any prior knowledge. However, these algorithms require the presence of pure pixels in hyperspectral images, which is often difficult to fulfill for hyperspectral data in real scenes due to the limited spatial resolution and the influence of complex environments. As a result, these algorithms may extract endmembers that have no real physical meaning, leading to unsatisfactory unmixing results.

Sparse unmixing algorithms based on sparse regression have received increasing attention as spectral libraries of ground-measured materials have become more widely available. The sparse unmixing algorithm utilizes the spectral library as a priori

information and assumes that the mixed pixels can be approximated by a linear combination of the spectral signatures of several pure materials in the spectral library. This converts the hyperspectral unmixing problem into identifying, from the spectral library, the endmembers present in the image and calculating their corresponding abundance coefficients. Sparse unmixing, as a semi-supervised method, effectively avoids the negative effects of extracting inaccurate endmember information and of incorrectly estimating the number of endmembers in hyperspectral images. In addition, the abundance matrix contains only a small number of nonzero elements because the number of endmembers present in the hyperspectral image is much smaller than the number of spectral signatures in the spectral library. In other words, the sparse unmixing problem can be described as finding the sparsest solution that satisfies the physical constraints on the abundance matrix. Therefore, many unmixing methods are devoted to improving the sparsity of the abundance matrix. For example, Iordache et al. (2011) proposed the sparse unmixing by variable splitting and augmented Lagrangian (SUnSAL) algorithm, which introduces the L1 norm as a sparse regularization term and promotes the sparsity of the abundance matrix by minimizing the sum of the absolute values of its elements.

In addition, taking the spatial correlation of hyperspectral images into account can effectively improve the performance of sparse unmixing algorithms. This spatial correlation is manifested in the fact that pixels in neighboring regions have similar spectral signatures, so the abundance vectors of neighboring pixels should be correlated. Iordache et al. (2012) proposed the SUnSAL-TV algorithm by adding a total variation (TV) regularization term to the SUnSAL algorithm. SUnSAL-TV effectively captures the spatial correlation of hyperspectral images by enforcing the constraint that neighboring pixels have similar abundances. Zhang et al. (2018) proposed a spectral–spatial weighted sparse unmixing (S2WSU) algorithm that explores the spatial and spectral information of hyperspectral images through weighting factors in the sparse regularization term. Specifically, the S2WSU algorithm introduces both a spatial and a spectral weighting factor in the sparse regularization term: the spectral weighting factor promotes the row sparsity of the abundance matrix, while the spatial weighting factor captures the spatial correlation information of the hyperspectral image. Superpixel segmentation can group the pixels of a hyperspectral image (HSI) into multiple shape-adaptive homogeneous regions called superpixels, so that the pixels within the same superpixel are similar to each other. For example, Li et al. (2020) proposed the superpixel-based reweighted low-rank and total variation (SUSRLR-TV) algorithm, which imposes a low-rank structure on each shape-adaptive homogeneous region and uses TV regularization to increase the smoothness of the unmixing result. Borsoi et al. (2018) proposed a fast multiscale spatial regularization unmixing algorithm (MUA), which first performs superpixel segmentation to construct a coarse domain and then explores the spatial context information by using the unmixing results of the coarse domain to generate a multiscale spatial regularization in the original domain. Building on the effectiveness of the MUA algorithm, Zhang et al. (2022) proposed a spectral reweighting and spectral similarity weighting for sparse unmixing algorithm

(SRSSWSU), which captures local spatial information more carefully by utilizing two weighting factors.

10.2 Methods

10.2.1 LMM

In the linear mixing model, each mixed pixel of a hyperspectral image is assumed to be a linear combination of the spectral signatures of several pure materials. For an individual mixed pixel in a hyperspectral image, this can be formalized as Eq. 10.1:

$$\mathbf{y} = \mathbf{S}\mathbf{x} + \mathbf{n} \tag{10.1}$$

where $\mathbf{y} = [y_1, \ldots, y_l]^T \in \mathbb{R}^{l \times 1}$ denotes the column vector of the spectral features of a pixel, $y_i$ denotes the reflectance at the $i$th spectral band, and $l$ denotes the number of spectral bands. $\mathbf{S} \in \mathbb{R}^{l \times q}$ denotes the endmember matrix, whose columns are the spectral signatures of the $q$ endmembers. $\mathbf{x} = [x_1, \ldots, x_q]^T \in \mathbb{R}^{q \times 1}$ denotes the abundance vector, $x_j$ indicates the proportion of the $j$th spectral signature in the mixed pixel, and $\mathbf{n} \in \mathbb{R}^{l \times 1}$ is the noise and modeling error. In addition, the abundance vector $\mathbf{x}$ needs to satisfy two physical constraints: the abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC). These two constraints can be expressed as Eqs. 10.2 and 10.3:

$$x_i \ge 0 \quad (i = 1, \ldots, q) \tag{10.2}$$

$$\mathbf{1}^T \mathbf{x} = 1 \tag{10.3}$$

where $(\cdot)^T$ denotes the transpose of a vector or matrix. These two constraints ensure that the resulting abundance vector $\mathbf{x}$ is physically feasible and reasonable. If we assume that the observed hyperspectral image contains $m$ pixels, the corresponding linear mixing model can be expressed as Eq. 10.4:

$$\mathbf{Y} = \mathbf{S}\mathbf{X} + \mathbf{N} \tag{10.4}$$

where $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_m] \in \mathbb{R}^{l \times m}$ denotes the hyperspectral image containing $m$ pixels and $\mathbf{y}_i \in \mathbb{R}^{l \times 1}$ denotes the measured spectral vector of the $i$th pixel. $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_m] \in \mathbb{R}^{q \times m}$ denotes the abundance matrix, $\mathbf{x}_j \in \mathbb{R}^{q \times 1}$ corresponds to the fractional abundances of the $j$th mixed pixel, and $\mathbf{N} \in \mathbb{R}^{l \times m}$ denotes the model error.
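To make the dimensions and constraints of the linear mixing model concrete, the following sketch simulates a small image according to Eq. 10.4. The endmember matrix is filled with random numbers rather than real spectra, and the sizes, noise level, and variable names are assumptions for illustration. With the endmembers known, an unconstrained least-squares fit (which ignores the ANC and ASC) already recovers the abundances approximately in this well-conditioned toy case.

```python
import numpy as np

rng = np.random.default_rng(0)
l, q, m = 50, 3, 100                      # bands, endmembers, pixels (toy sizes)

S = rng.uniform(0.0, 1.0, size=(l, q))    # hypothetical endmember signatures
X = rng.dirichlet(np.ones(q), size=m).T   # (q, m): each column satisfies ANC and ASC
N = 0.01 * rng.normal(size=(l, m))        # noise / modelling error
Y = S @ X + N                             # Eq. 10.4

# Unconstrained least squares with known endmembers (ANC/ASC not enforced).
X_ls = np.linalg.lstsq(S, Y, rcond=None)[0]

print("Y shape:", Y.shape)
print("min abundance:", X.min(), " column sums:", X.sum(axis=0)[:3])
print("max abs error of least-squares abundances:", np.abs(X_ls - X).max())
```

In sparse unmixing the endmember matrix is replaced by a large spectral library, so the system is no longer this well conditioned and regularization becomes necessary.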

10.2.2 SUnSAL

Sparse unmixing algorithms aim to find the combination of signatures in a large spectral library, which contains hundreds of material spectral signatures, that best explains the observed spectra. Specifically, the observed hyperspectral image can be modeled as

$$\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{N} \tag{10.5}$$

where $\mathbf{A} \in \mathbb{R}^{l \times p}$ denotes the spectral library containing $p$ endmembers and $\mathbf{X} \in \mathbb{R}^{p \times m}$ denotes the abundance matrix corresponding to the spectral library $\mathbf{A}$. Each row of $\mathbf{X}$ corresponds to the contribution of one endmember of the spectral library across the hyperspectral image, and each column represents the fractional abundances of one mixed pixel. Since the number of endmembers involved in a hyperspectral image is much smaller than the number of spectral signatures $p$ in the spectral library, the abundance matrix $\mathbf{X}$ is very sparse, with most of its elements equal to zero. Thus, sparse unmixing can be expressed as an optimization problem: finding a sparse abundance matrix $\mathbf{X}$ that minimizes an error function so that it accurately describes the spectral characteristics of the mixed pixels. Specifically, it can be expressed as Eq. 10.6:

$$\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\|\mathbf{X}\|_0 \quad \text{s.t. } \mathbf{X} \ge 0 \tag{10.6}$$

where $\|\cdot\|_F$ denotes the Frobenius norm, which measures the reconstruction error of the model. $\|\mathbf{X}\|_0$ denotes the L0 norm of the abundance matrix $\mathbf{X}$, which counts the number of nonzero elements in $\mathbf{X}$ and is used to promote its sparsity. $\lambda \ge 0$ denotes a nonnegative regularization parameter, which adjusts the weight of the sparse regularization term in the optimization problem and thus controls the sparsity of $\mathbf{X}$. The constraint $\mathbf{X} \ge 0$ ensures that each element of the abundance matrix is nonnegative. It is also worth noting that the ASC cannot always be satisfied because of the spectral variability that exists in real hyperspectral scenes. Therefore, the ASC is not added to the sparse unmixing model. Although the L0 norm expresses the sparsity of the abundance matrix $\mathbf{X}$, Eq. 10.6 is an NP-hard problem due to its nonconvex and discontinuous character. To solve this problem, Iordache et al. (2011) proposed the SUnSAL sparse unmixing algorithm, which introduces the L1 norm as a convex approximation to the L0 norm. Its objective function can be expressed as

$$\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\|\mathbf{X}\|_{1,1} \quad \text{s.t. } \mathbf{X} \ge 0 \tag{10.7}$$

where $\|\mathbf{X}\|_{1,1} = \sum_{i=1}^{m} \|\mathbf{x}_i\|_1$, and $\mathbf{x}_i$ denotes the $i$th column of the abundance matrix $\mathbf{X}$. The corresponding optimization problem can be solved effectively with the alternating direction method of multipliers (ADMM).
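A minimal ADMM sketch for the nonnegative L1-regularized problem of Eq. 10.7 is given below. It uses a single splitting variable and a fixed penalty parameter, so it is a simplified illustration rather than the reference SUnSAL implementation; the function name `sunsal_admm`, the parameter `rho`, the iteration count, and the random test library (a Gaussian matrix standing in for real spectra so that the toy problem is well conditioned) are all assumptions.

```python
import numpy as np

def sunsal_admm(A, Y, lam=1e-3, rho=1.0, n_iter=500):
    """ADMM for  min_X 0.5*||Y - A X||_F^2 + lam*||X||_{1,1}  s.t. X >= 0,
    using the splitting X = U with scaled dual variable D."""
    p = A.shape[1]
    AtY = A.T @ Y
    inv = np.linalg.inv(A.T @ A + rho * np.eye(p))   # reused in every iteration
    U = np.zeros((p, Y.shape[1]))
    D = np.zeros_like(U)
    for _ in range(n_iter):
        X = inv @ (AtY + rho * (U - D))              # quadratic subproblem
        U = np.maximum(X + D - lam / rho, 0.0)       # soft threshold + nonnegativity
        D = D + X - U                                # dual update
    return U                                         # sparse, nonnegative estimate

# Toy problem: 3 of the 20 library members are actually present.
rng = np.random.default_rng(0)
l, p, m = 50, 20, 30
A = rng.normal(size=(l, p)) / np.sqrt(l)             # stand-in library, not real spectra
X_true = np.zeros((p, m))
X_true[[2, 7, 11]] = rng.dirichlet(np.ones(3), size=m).T
Y = A @ X_true + 1e-3 * rng.normal(size=(l, m))

X_hat = sunsal_admm(A, Y, lam=1e-3)
rel_res = np.linalg.norm(Y - A @ X_hat) / np.linalg.norm(Y)
print("relative residual:", rel_res)
print("max abs abundance error:", np.abs(X_hat - X_true).max())
```

Real spectral libraries are highly coherent, so convergence is typically slower than in this toy setting and the choice of `rho` and `lam` matters more.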

10.2.3 SUnSAL-TV

The SUnSAL algorithm considers only the spectral information of the hyperspectral image, and the high mutual coherence between the endmembers in the spectral library greatly limits its performance. Hyperspectral images usually exhibit strong spatial correlation; therefore, properly utilizing their spatial information can effectively improve the performance of the unmixing algorithm. Iordache et al. (2012) added a total variation (TV) regularization term to the SUnSAL algorithm, which limits strong variations between neighboring pixels so that the abundance vectors of neighboring pixels have similar distributions. The unmixing model of the SUnSAL-TV algorithm can be expressed as Eq. 10.8:

$$\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\|\mathbf{X}\|_{1,1} + \lambda_{TV}\,TV(\mathbf{X}) \quad \text{s.t. } \mathbf{X} \ge 0 \tag{10.8}$$

where $TV(\mathbf{X}) = \sum_{\{i,j\} \in \eta} \|\mathbf{x}_i - \mathbf{x}_j\|_1$, $\eta$ denotes the set of horizontally and vertically neighboring pixel pairs in the hyperspectral image, and $\lambda \ge 0$ and $\lambda_{TV} \ge 0$ denote the parameters of the sparse regularization term and the TV regularization term, respectively. The TV regularization term effectively promotes piecewise smoothness of the abundances. When $\lambda_{TV} = 0$, SUnSAL-TV reduces to SUnSAL. SUnSAL-TV introduces two linear operators $\mathbf{H}_h$ and $\mathbf{H}_v$ to compute the gradient information between the abundance vectors of neighboring pixels. $\mathbf{H}_h: \mathbb{R}^{p \times m} \to \mathbb{R}^{p \times m}$ computes the horizontal gradient of the abundance matrix $\mathbf{X}$. Specifically, $\mathbf{H}_h\mathbf{X} = [\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_m]$, where $\mathbf{d}_i = \mathbf{x}_i - \mathbf{x}_{i_h}$, and $\mathbf{x}_i$ and $\mathbf{x}_{i_h}$ denote the abundance vectors of the pixel $\mathbf{y}_i$ and of its horizontal neighbor $\mathbf{y}_{i_h}$. $\mathbf{H}_v: \mathbb{R}^{p \times m} \to \mathbb{R}^{p \times m}$ denotes the linear operator that computes the vertical gradient of the abundance matrix $\mathbf{X}$: $\mathbf{H}_v\mathbf{X} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$, where $\mathbf{v}_i = \mathbf{x}_i - \mathbf{x}_{i_v}$, and $i$ and $i_v$ denote the indices of the $i$th pixel of the hyperspectral image and of its vertical neighbor, respectively. With these two linear operators, the operator $\mathbf{H}$ used in the TV regularization term can be defined as

$$\mathbf{H}\mathbf{X} = \begin{bmatrix} \mathbf{H}_h\mathbf{X} \\ \mathbf{H}_v\mathbf{X} \end{bmatrix} \tag{10.9}$$

The objective function of the SUnSAL-TV algorithm can then be expressed as

$$\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2 + \lambda\|\mathbf{X}\|_{1,1} + \lambda_{TV}\|\mathbf{H}\mathbf{X}\|_{1,1} \quad \text{s.t. } \mathbf{X} \ge 0 \tag{10.10}$$
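The TV term $\|\mathbf{H}\mathbf{X}\|_{1,1}$ in Eq. 10.10 can be evaluated with simple array differences, as sketched below. The sketch assumes that the columns of the abundance matrix are stored in row-major pixel order and that differences are not wrapped around the image border; both are assumptions of this illustration, and published implementations may treat the boundary differently. The function name `tv_l1` is hypothetical.

```python
import numpy as np

def tv_l1(X, rows, cols):
    """Anisotropic total variation of an abundance matrix X (p x m),
    with m = rows * cols pixels stored in row-major order."""
    cube = X.reshape(X.shape[0], rows, cols)
    dh = cube[:, :, 1:] - cube[:, :, :-1]    # horizontal differences (H_h X)
    dv = cube[:, 1:, :] - cube[:, :-1, :]    # vertical differences (H_v X)
    return np.abs(dh).sum() + np.abs(dv).sum()

# One abundance map on a 4 x 4 grid: left half 0, right half 1.
step = np.zeros((4, 4))
step[:, 2:] = 1.0
tv_step = tv_l1(step.reshape(1, -1), 4, 4)

# A constant map has no variation between neighbours.
tv_flat = tv_l1(np.full((1, 16), 0.3), 4, 4)

# Flipping single pixels of the step map adds several new edges.
noisy = step.copy()
noisy[1, 0] = 1.0
noisy[2, 3] = 0.0
tv_noisy = tv_l1(noisy.reshape(1, -1), 4, 4)

print(tv_step, tv_flat, tv_noisy)
```

The step map has one unit jump per row, so its TV equals the number of rows, whereas isolated outlier pixels raise the TV sharply; this is the behaviour the regularizer penalizes.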

10.2.4 MUA

MUA first applies the SLIC superpixel segmentation algorithm, which segments the hyperspectral image into multiple adaptive homogeneous regions. In this way, the MUA algorithm decomposes the original unmixing problem into two simpler problems: one involves unmixing in the approximate image domain, and the other involves the original image domain. In the approximate image domain, the MUA algorithm first segments the hyperspectral image into $s$ superpixels and then uses the average of the pixels within each superpixel as the columns of the approximate image $\mathbf{Y}_C \in \mathbb{R}^{l \times s}$, which can be expressed as

$$\mathbf{Y}_C(:, j) = \frac{1}{|I_j|} \sum_{q=1}^{|I_j|} \mathbf{Y}\big(:, I_j(q)\big), \quad j = 1, \ldots, s \tag{10.11}$$

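where $\mathbf{Y} \in \mathbb{R}^{l \times m}$ denotes a hyperspectral image containing $m$ pixels, $I_j$ denotes the set of pixel indices belonging to the $j$th superpixel, and $|I_j|$ denotes the number of pixels within the $j$th superpixel. The sparse unmixing model constructed using the approximate image $\mathbf{Y}_C$ can be represented as

$$\min_{\mathbf{X}_C} \frac{1}{2}\|\mathbf{Y}_C - \mathbf{A}\mathbf{X}_C\|_F^2 + \lambda_C\|\mathbf{X}_C\|_{1,1} \quad \text{s.t. } \mathbf{X}_C \ge 0 \tag{10.12}$$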

where $X_C \in \mathbb{R}^{p \times s}$ denotes the abundance matrix of the approximate image $Y_C$, $p$ denotes the number of endmembers in the spectral library $A$, and $\lambda_C \ge 0$ denotes the hyperparameter of the sparse regularization term. Subsequently, $X_C$ is used to constrain the unmixing problem in the original image domain. In the original image domain, the MUA algorithm first maps the abundance matrix $X_C$ from the approximate image domain back to the original image domain, denoted $X_D \in \mathbb{R}^{p \times m}$. This conversion can be represented as

$$X_D(:, I_j(q)) = X_C(:, j), \quad q = 1, \ldots, |I_j|, \quad j = 1, \ldots, s \tag{10.13}$$

The abundance matrix $X_D$ captures the correlation between pixels within a superpixel. Next, a multi-scale spatial regularization term is constructed using $X_D$. Adding it to the unmixing problem in the original image domain gives the unmixing model

$$\min_{X} \; \frac{1}{2}\|Y - AX\|_F^2 + \lambda \|X\|_{1,1} + \frac{\beta}{2}\|X_D - X\|_F^2 \quad \text{s.t. } X \ge 0 \tag{10.14}$$

where $\lambda \ge 0$ and $\beta \ge 0$ denote the hyperparameters of the sparse regularization term and the multi-scale spatial regularization term, respectively, and $X$ denotes the unmixing result to be solved. Because the spectral signatures of mixed pixels within the same superpixel are similar, their corresponding abundance vectors should be similar as well. The MUA algorithm minimizes the multi-scale spatial regularization term $\|X_D - X\|_F^2$ so that the final abundance matrix $X$ stays close to $X_D$. In this way, the correlation information between neighboring pixels obtained in the approximate image domain is transferred to the original image domain. Notably, compared with the TV regularization term, the multi-scale spatial regularization term does not need to explicitly model the dependencies between neighboring pixels, which reduces the computational complexity and convergence time of the unmixing algorithm.
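
The two domain conversions in Eqs. (10.11) and (10.13) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `coarse_image` and `back_project` are hypothetical, the superpixel label vector is assumed to be given (for instance by a SLIC implementation such as the `slic` function in scikit-image), and every label from 0 to s-1 is assumed to occur at least once.

```python
import numpy as np

def coarse_image(Y, labels):
    """Eq. (10.11): average the pixels of each superpixel.

    Y has shape (l, m); labels has shape (m,) with integer values 0..s-1.
    Returns the approximate image Y_C of shape (l, s).
    """
    s = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=s)
    sums = np.zeros((s, Y.shape[0]))
    # Accumulate each pixel spectrum into the row of its superpixel
    np.add.at(sums, labels, Y.T)
    return (sums / counts[:, None]).T

def back_project(X_C, labels):
    """Eq. (10.13): copy each superpixel's abundance vector to all of its pixels."""
    return X_C[:, labels]
```

Because `back_project` is a plain column lookup, building $X_D$ costs far less than evaluating differences between neighboring pixels, which is consistent with the complexity argument above.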

10.3 Experimental Results and Discussion

In this section, a spectral library, two simulated datasets and a real hyperspectral dataset are used in the experiments, which are described as follows.

10.3.1 Spectral Library

A spectral library is an important data resource in the field of remote sensing. It contains the reflectance spectral signatures of different ground materials, such as minerals, soils and vegetation. This study used the United States Geological Survey (USGS) spectral library, which contains a total of 498 spectral signatures of pure materials. Each endmember contains 224 spectral bands, and the spectra cover a wavelength range of 0.4–2.5 µm. In the simulation experiments, the spectral signatures of 240 materials were randomly selected from the USGS spectral library and used to construct the spectral library $A \in \mathbb{R}^{224 \times 240}$, with one signature per column. In addition, to mitigate the impact of the high coherence of the spectral library on subsequent experiments, the angle between any two spectral signatures was at least 4.44°.
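
The angle constraint on the library can be checked with a few lines of NumPy. The sketch below assumes the library is stored with one signature per column; the function name `min_pairwise_angle_deg` is hypothetical, and the procedure actually used to select the 240 signatures is not described in the text.

```python
import numpy as np

def min_pairwise_angle_deg(A):
    """Smallest angle in degrees between any two columns of A (bands x signatures)."""
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    cos = np.clip(A_unit.T @ A_unit, -1.0, 1.0)
    np.fill_diagonal(cos, -1.0)  # ignore the angle of a signature with itself
    return float(np.degrees(np.arccos(cos.max())))
```

A library satisfies the constraint stated above when `min_pairwise_angle_deg(A) >= 4.44`.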

10.3.2 Simulation Datasets

Simulated dataset 1 (SD1): five endmembers were randomly selected from spectral library $A$ to generate SD1. SD1 contains 75 × 75 pixels with 224 spectral bands. SD1 follows the linear mixing model, and the corresponding abundance matrix satisfies the abundance non-negativity constraint (ANC) and the abundance sum-to-one constraint (ASC). Figure 10.1a shows the simulated image of SD1, and Fig. 10.1b–f show the true abundance maps of the five endmembers.

Fig. 10.1 Simulated dataset 1

Simulated dataset 2 (SD2): SD2 was generated by randomly selecting nine endmembers from spectral library $A$. This dataset contains 100 × 100 pixels, each with 224 spectral bands, matching the number of bands in the library. SD2 was also generated with the linear mixing model, and the fractional abundances satisfy the ANC and ASC. Unlike SD1, the abundance maps of SD2 were generated from a Dirichlet distribution centered on a Gaussian random field, which produces locally smooth regions and sharp edges. Figure 10.2 shows the true abundance maps of the nine endmembers of SD2.
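
The sketch below shows one way to generate data that follow the linear mixing model with abundances satisfying the ANC and ASC. It is only an illustration under stated assumptions: the function name `simulate_lmm` is hypothetical, the abundances are drawn independently for each pixel from a flat Dirichlet distribution, and the spatial patterns of SD1 and SD2 (such as the Gaussian random field) are not reproduced.

```python
import numpy as np

def simulate_lmm(A, k, m, rng):
    """Mix k randomly chosen library signatures into m pixels.

    A has shape (l, p). Returns the noise-free image Y of shape (l, m)
    and the abundance matrix X of shape (p, m).
    """
    l, p = A.shape
    active = rng.choice(p, size=k, replace=False)
    X = np.zeros((p, m))
    # Dirichlet samples are non-negative (ANC) and sum to one (ASC)
    X[active, :] = rng.dirichlet(np.ones(k), size=m).T
    return A @ X, X
```

For example, `simulate_lmm(A, 5, 75 * 75, np.random.default_rng(0))` produces an image with the same size and number of active endmembers as SD1, although without its spatial structure.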

Fig. 10.2 Simulated dataset 2

10.3.3 Real Datasets

The well-known Cuprite dataset was acquired by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over the Cuprite mine in Nevada, U.S.A. The Cuprite dataset provides 224 spectral bands covering a wavelength range from 0.4 to 2.5 µm. This high-resolution spectral information gives a rich and detailed description of the materials present in the Cuprite mine, and the dataset is commonly used for a variety of hyperspectral image analysis tasks, including hyperspectral unmixing (estimating each material present in a pixel and its corresponding abundance), classification (assigning each pixel a label denoting the type of material present), and anomaly detection (identifying pixels that differ significantly from surrounding pixels). Figure 10.3 shows a mineral map produced by the USGS in 1995.

During image acquisition, atmospheric water vapor and other gases may introduce interference and noise, resulting in spectral bands with low signal-to-noise ratios in the hyperspectral images. Therefore, to improve the accuracy and reliability of the experimental data, spectral bands 1 to 2, 104 to 113, 148 to 167, and 221 to 224 were removed, and the remaining 188 spectral bands were used for experimental analysis. After cropping the original image, a sub-image of 250 × 191 pixels was used to validate the effectiveness of the unmixing algorithms. Since true abundance maps for the Cuprite dataset are difficult to obtain, this study qualitatively evaluates the performance of the different unmixing algorithms using mineral classification maps generated by the Tricorder 4.3 software as a reference. We compared the unmixing results for three minerals in the Cuprite dataset, Alunite, Buddingtonite and Chalcedony, against the reference maps generated by the Tricorder 4.3 software, which are shown in Fig. 10.4.
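
The band removal described above amounts to a simple index selection. The following sketch, with the hypothetical helper name `cuprite_band_indices`, converts the 1-based band numbers quoted in the text into 0-based array indices and confirms that 188 bands remain.

```python
import numpy as np

def cuprite_band_indices(n_bands=224):
    """Return 0-based indices of the bands kept after removing noisy bands."""
    # 1-based band numbers removed in the text: 1-2, 104-113, 148-167, 221-224
    removed = np.r_[1:3, 104:114, 148:168, 221:225]
    kept = np.setdiff1d(np.arange(1, n_bands + 1), removed)
    return kept - 1

keep = cuprite_band_indices()
print(len(keep))  # number of retained bands
```

Given an image cube stored as an array of shape (bands, rows, cols), the retained bands would then be selected with `cube[keep]`; the storage layout is an assumption of this example.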

Fig. 10.3 USGS map showing the location of different minerals in the Cuprite mining district in Nevada. The map is available online at http://speclab.cr.usgs.gov/cuprite95.tgif.2.2um_map.gif

Fig. 10.4 Abundance maps of the three materials generated by the Tricorder 4.3 software. From left to right: Alunite, Buddingtonite and Chalcedony

10.3.4 Evaluation Criterion

Signal-to-Reconstruction Error (SRE), Sparsity and Root Mean Square Error (RMSE) were used to quantitatively evaluate the performance of the sparse unmixing algorithms. These metrics objectively reflect the performance of the unmixing algorithms in different scenarios and help to clarify the advantages and disadvantages of each algorithm. The SRE is defined as

$$\mathrm{SRE} = 10 \log_{10} \left( \frac{E\left(\|X\|_F^2\right)}{E\left(\|X - \hat{X}\|_F^2\right)} \right) \tag{10.15}$$

where $E(\cdot)$ denotes the expectation, $X$ denotes the true abundance matrix of the hyperspectral image, and $\hat{X}$ denotes the abundance matrix estimated by the unmixing algorithm. The closer the estimated abundance matrix $\hat{X}$ is to the true abundance matrix $X$, the larger the SRE value. Therefore, a larger SRE value indicates better unmixing performance, and vice versa. Sparsity measures how sparse the estimated abundance matrix $\hat{X}$ is and is defined as

$$\mathrm{Sparsity} = \frac{z}{m} \tag{10.16}$$

where $z$ denotes the number of elements in the estimated abundance matrix $\hat{X}$ that have values greater than 0.005, and $m$ denotes the number of pixels in the hyperspectral image. A smaller Sparsity value means that the estimated abundance matrix $\hat{X}$ has fewer non-zero elements. RMSE measures the error between the reconstructed image and the real image and is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{m \times l} \sum_{i=1}^{m} \sum_{j=1}^{l} \left( y_{ij} - \hat{y}_{ij} \right)^2} \tag{10.17}$$

where $y_{ij}$ denotes the value of the $i$th pixel in the $j$th band of the pure hyperspectral image, $\hat{y}_{ij}$ denotes the corresponding value in the reconstructed hyperspectral image, and $l$ denotes the number of spectral bands.
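
The three criteria can be computed with a few lines of NumPy. In this minimal sketch the expectations in the SRE are replaced by sums over a single image, and the function names are hypothetical. The `per_entry` flag is an assumption added here: Eq. (10.16) divides by the number of pixels, but values well below 1, as reported later in the chapter, may indicate that the count was normalized by all entries of the abundance matrix instead.

```python
import numpy as np

def sre_db(X_true, X_hat):
    """Signal-to-reconstruction error in dB (Eq. 10.15, sample version)."""
    return 10.0 * np.log10(np.sum(X_true ** 2) / np.sum((X_true - X_hat) ** 2))

def sparsity(X_hat, threshold=0.005, per_entry=False):
    """Count of abundances above the threshold, normalized as in Eq. (10.16).

    With per_entry=True the count is divided by all p * m entries instead of
    the number of pixels m (an assumption, not stated in the text).
    """
    z = np.count_nonzero(X_hat > threshold)
    denom = X_hat.size if per_entry else X_hat.shape[1]
    return z / denom

def rmse(Y, Y_hat):
    """Root mean square error between two images of equal shape (Eq. 10.17)."""
    return float(np.sqrt(np.mean((Y - Y_hat) ** 2)))
```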

10.3.5 Experimental Analysis

Unmixing was performed on the above datasets using the SUnSAL, SUnSAL-TV, S2WSU, MUA, SUSRLR-TV, and SRSSWSU algorithms. To simulate the effects of the environment during data acquisition, Gaussian noise was added to the two simulated datasets at signal-to-noise ratios (SNR) of 20 dB and 30 dB. The SNR is defined as

$$\mathrm{SNR} = 10 \log_{10} \left( \frac{E\left(\|AX\|^2\right)}{E\left(\|N\|^2\right)} \right) \tag{10.18}$$

where $N$ denotes the noise matrix. Table 10.1 lists the hyperparameter settings of the sparse unmixing algorithms in the simulation experiments. To facilitate visual comparison of the unmixing results, Figs. 10.5, 10.6, 10.7 and 10.8 show the abundance maps estimated by the unmixing algorithms on SD1 and SD2. Table 10.2 shows the SRE results obtained by the sparse unmixing algorithms. Comparing the SRE values shows that the sparse unmixing algorithms that incorporate spatial information achieve significantly higher SRE values than the algorithm that considers only spectral information: SUnSAL yields the lowest SRE in all four settings (for example, 3.55 dB on SD1 at SNR = 20 dB, compared with 7.77 to 21.16 dB for the other algorithms). Tables 10.3 and 10.4 show the Sparsity and RMSE results obtained by the unmixing algorithms, respectively. Figure 10.9 shows the abundance maps of the three minerals obtained by the unmixing algorithms on the Cuprite dataset. As shown in Fig. 10.9, all the unmixing algorithms were able to recognize the distribution locations of the three minerals, which demonstrates the effectiveness of the unmixing algorithms on real hyperspectral images of minerals.

Table 10.1 Hyperparameter settings of the sparse unmixing algorithms

| Algorithm | SD1, SNR = 20 dB | SD1, SNR = 30 dB | SD2, SNR = 20 dB | SD2, SNR = 30 dB |
|---|---|---|---|---|
| SUnSAL | λ = 7e-1 | λ = 1e-1 | λ = 1e-1 | λ = 1e-2 |
| SUnSAL-TV | λ = 5e-2, λ_TV = 5e-2 | λ = 7e-3, λ_TV = 1e-2 | λ = 1e-2, λ_TV = 3e-2 | λ = 5e-3, λ_TV = 7e-3 |
| S2WSU | λ = 1e-1 | λ = 5e-3 | λ = 1e-2 | λ = 1e-2 |
| MUA | λ_C = 1e-3, λ = 1e-2, β = 10 | λ_C = 7e-3, λ = 5e-2, β = 10 | λ_C = 7e-3, λ = 1e-1, β = 8 | λ_C = 3e-3, λ = 3e-2, β = 3 |
| SUSRLR-TV | λ = 1e-2, β = 5e-2 | λ = 5e-2, β = 1 | λ = 5e-2, β = 3e-2 | λ = 5e-2, β = 7e-3 |
| SRSSWSU | λ_C = 1e-2, λ = 1e-2, β = 10 | λ_C = 5e-3, λ = 5e-3, β = 15 | λ_C = 1e-2, λ = 5e-2, β = 10 | λ_C = 1e-2, λ = 7e-2, β = 10 |

Fig. 10.5 Abundance maps of endmember #2 estimated by the unmixing algorithms on SD1 with SNR = 20 dB

Fig. 10.6 Abundance maps of endmember #2 estimated by the unmixing algorithms on SD1 with SNR = 30 dB

Fig. 10.7 Abundance maps of endmember #2 estimated by the unmixing algorithms on SD2 with SNR = 20 dB

Fig. 10.8 Abundance maps of endmember #2 estimated by the unmixing algorithms on SD2 with SNR = 30 dB

Table 10.2 SRE results (dB) of the sparse unmixing algorithms

| Algorithm | SD1, SNR = 20 dB | SD1, SNR = 30 dB | SD2, SNR = 20 dB | SD2, SNR = 30 dB |
|---|---|---|---|---|
| SUnSAL | 3.55 | 8.92 | 3.85 | 10.44 |
| SUnSAL-TV | 9.43 | 13.36 | 11.48 | 18.01 |
| S2WSU | 7.77 | 15.62 | 9.35 | 21.22 |
| MUA | 11.44 | 15.76 | 13.65 | 18.47 |
| SUSRLR-TV | 16.63 | 25.94 | 17.33 | 22.21 |
| SRSSWSU | 21.16 | 32.26 | 19.16 | 21.91 |

Table 10.3 Sparsity results of the sparse unmixing algorithms

| Algorithm | SD1, SNR = 20 dB | SD1, SNR = 30 dB | SD2, SNR = 20 dB | SD2, SNR = 30 dB |
|---|---|---|---|---|
| SUnSAL | 0.0324 | 0.0435 | 0.0419 | 0.0490 |
| SUnSAL-TV | 0.0837 | 0.0405 | 0.0733 | 0.0428 |
| S2WSU | 0.0303 | 0.0256 | 0.0353 | 0.0223 |
| MUA | 0.0484 | 0.0403 | 0.0591 | 0.0483 |
| SUSRLR-TV | 0.0315 | 0.0371 | 0.0612 | 0.0418 |
| SRSSWSU | 0.0225 | 0.0200 | 0.0305 | 0.0317 |

Table 10.4 RMSE results of the sparse unmixing algorithms

| Algorithm | SD1, SNR = 20 dB | SD1, SNR = 30 dB | SD2, SNR = 20 dB | SD2, SNR = 30 dB |
|---|---|---|---|---|
| SUnSAL | 0.0169 | 0.0058 | 0.0133 | 0.0050 |
| SUnSAL-TV | 0.0117 | 0.0039 | 0.0110 | 0.0039 |
| S2WSU | 0.0130 | 0.0042 | 0.0124 | 0.0034 |
| MUA | 0.0101 | 0.0095 | 0.0118 | 0.0078 |
| SUSRLR-TV | 0.0115 | 0.0036 | 0.0112 | 0.0042 |
| SRSSWSU | 0.0080 | 0.0024 | 0.0082 | 0.0038 |
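
The noise model of Eq. (10.18) can be sketched as follows. This is an illustrative helper, not the code used for the experiments: the name `add_gaussian_noise` is hypothetical, the noise is assumed to be white (independent and identically distributed across pixels and bands), and the expectations are replaced by sample means over the image.

```python
import numpy as np

def add_gaussian_noise(Y_clean, snr_db, rng):
    """Add zero-mean white Gaussian noise so that the SNR is about snr_db.

    Y_clean is the noise-free image AX of shape (l, m).
    Returns the noisy image and the noise matrix N.
    """
    signal_power = np.mean(Y_clean ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    N = rng.normal(0.0, np.sqrt(noise_power), size=Y_clean.shape)
    return Y_clean + N, N
```

Calling the function with `snr_db=20` and `snr_db=30` gives the two noise levels used for SD1 and SD2; the realized SNR of a finite image differs slightly from the target because the noise is random.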

Fig. 10.9 Abundance maps of the unmixing algorithm on the Cuprite dataset

10.4 Conclusion

Mineral abundance inversion from hyperspectral images is a promising but challenging task. This chapter presents several existing state-of-the-art sparse unmixing algorithms that combine spectral and spatial information from hyperspectral images. They show good performance on both simulated and real hyperspectral datasets. In particular, the spectral information and spatial structure information in hyperspectral images can be more fully utilized by introducing a superpixel segmentation algorithm. On the Cuprite dataset, which is a real mining dataset, the experiments indicate that the sparse unmixing algorithms achieve satisfactory results. In addition, unmixing algorithms based on nonnegative matrix factorization and on deep learning can be used for the mineral abundance inversion task. These unsupervised algorithms are capable of performing abundance inversion without spectral libraries. It is worth noting that the algorithms discussed here were not designed specifically for the mineral abundance inversion task. Therefore, customizing the unmixing algorithms to the characteristics of mineral areas is expected to achieve superior mineral abundance inversion results.

References

Bioucas-Dias, J. M., Plaza, A., Dobigeon, N., Parente, M., Du, Q., Gader, P., & Chanussot, J. (2012). Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2), 354–379.
Borsoi, R. A., Imbiriba, T., Bermudez, J. C. M., & Richard, C. (2018). A fast multiscale spatial regularization for sparse hyperspectral unmixing. IEEE Geoscience and Remote Sensing Letters, 16(4), 598–602.
Iordache, M. D., Bioucas-Dias, J. M., & Plaza, A. (2011). Sparse unmixing of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 49(6), 2014–2039.
Iordache, M. D., Bioucas-Dias, J. M., & Plaza, A. (2012). Total variation spatial regularization for sparse hyperspectral unmixing. IEEE Transactions on Geoscience and Remote Sensing, 50(11), 4484–4502.
Li, H., Feng, R., Wang, L., Zhong, Y., & Zhang, L. (2020). Superpixel-based reweighted low-rank and total variation sparse unmixing for hyperspectral remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 59(1), 629–647.
Zhang, S., Li, J., Li, H. C., Deng, C., & Plaza, A. (2018). Spectral–spatial weighted sparse regression for hyperspectral image unmixing. IEEE Transactions on Geoscience and Remote Sensing, 56(6), 3265–3276.
Zhang, D., Wang, T., Yang, S., Jia, Y., & Li, F. (2022). Spectral reweighting and spectral similarity weighting for sparse hyperspectral unmixing. IEEE Geoscience and Remote Sensing Letters, 19, 1–5.

Chapter 11

Concluding Remarks

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 W. Chen et al., Remote Sensing Intelligent Interpretation for Geology, https://doi.org/10.1007/978-981-99-8997-3_11

Geological remote sensing is an interdisciplinary field that uses remote sensing technology to study the geological characteristics of the Earth (Han et al., 2023). It draws on a variety of remote sensing technologies, including multispectral, hyperspectral and radar remote sensing, to obtain geological information (Bharati et al., 2019). It is a powerful tool that provides valuable information to geologists, resource explorers and environmental scientists (Li et al., 2021), helping them to better understand the Earth's geological characteristics and processes, promote sustainable resource management and reduce the risk of geological disasters. Geological remote sensing research continues to evolve, leveraging advanced remote sensing technologies to better understand the Earth's geological features and processes (Nicola et al., 2023). It also has wide applications in other fields. For example, in mineral resource exploration it helps to pinpoint mineral deposits, thereby reducing exploration costs. In addition, real-time monitoring with remote sensing data can support early warning of geological disasters such as landslides, volcanic eruptions and earthquakes, thereby reducing potential risks. Continued progress in technology, data processing and applications enables a more comprehensive understanding of the geological characteristics and processes of the Earth. These developments are of great significance for resource exploration, environmental protection, geological hazard monitoring and water resources management, and help to address major challenges in earth science.

Intelligent interpretation of geological remote sensing is a research field full of potential. It combines remote sensing technology and intelligent algorithms to provide new tools and methods for geological research. With the continuous development of technology and the accumulation of remote sensing data, we can expect further progress in this field, providing geologists with more efficient and accurate geological information, advancing geological science and supporting resource management, environmental protection and disaster early warning. Intelligent interpretation of geological remote sensing will play an important role in various fields and bring new opportunities and challenges to earth science.

In this work, the intelligent interpretation of geological remote sensing is studied from five aspects: pixel-level classification, scene-level classification, semantic segmentation, prior knowledge and transfer learning. Chapter 2 describes the dataset construction for each method.

Chapter 3 presents a lithology classification method based on a large-scale pixel neighborhood and VGGNet transfer learning. The study transferred a model pre-trained on the ImageNet natural image dataset, which achieved better classification accuracy and generalization ability than a model trained only on data from the target region, and it provides an effective solution for classifying rock and soil masses from small samples. The intelligent interpretation method based on the large-scale pixel neighborhood and the VGG16 transfer learning model can greatly improve interpretation speed and lithology classification accuracy in vegetation-covered regions. Large-scale neighborhood data preserve the spatial characteristics of the rock mass and effectively improve the classification accuracy.

Chapter 4 addresses lithological remote sensing scene classification based on multi-view data. Because key features are difficult to extract when deep learning is used for remote sensing lithology classification in covered areas, this study approached the problem from two aspects, feature extraction and model transfer. A lithologic scene classification model based on multi-view data fusion was also constructed.
A densely connected network with enhanced feature fusion was used to extract and fuse multi-scale data features, and a channel attention mechanism was added to assign weights to different data channels, improving the model's ability to focus on key feature information (Su et al., 2022). Finally, multi-view data were used to provide a more complete feature description of the target, and two different data fusion strategies were used to construct models for extracting feature information. After the introduction of multi-view data, the classification accuracy of the model improved further, and the performance of the model was positively correlated with the number of fused data sources (Liu et al., 2021).

Chapter 5 addresses geological lithology semantic segmentation based on deep learning. This chapter developed three deep learning (DL) models to classify lithology and other ground surface substrates from ZY-3 remote sensing images. The experimental results showed that the DL models performed satisfactorily in this task: the accuracy on the randomly cropped dataset reached about 95%, and the accuracy on the partially overlapped cropped dataset was about 60%. This study provides a method for ground surface substrate classification from remote sensing images in future research.

Chapter 6 presents an intelligent segmentation method for remote sensing lithology based on multi-source data. This study adopts two strategies: adaptive fusion of multi-modal data and embedding of prior geoscience knowledge (Badrinarayanan et al., 2017). First, an adaptive fusion method for multi-modal remote sensing data was proposed. Building on the high resolution and rich spectral information of optical data, this study adopted a step-by-step fusion method to fuse optical data with SAR and DEM data respectively. It learned the contribution of optical data to the other data types and applied a channel attention mechanism for weighting. Finally, a triple attention mechanism was used to mine implicit information across the spatial and channel dimensions, allowing the model to focus on effective features. Second, a semantic segmentation method based on prior knowledge embedding was established. This study processed existing geological data and integrated them into the deep learning model to help the model mine the differences between lithologies, thereby improving the accuracy of semantic segmentation and the generalization ability of the model.

Chapter 7 addresses lithological scene classification based on prior knowledge. In this chapter, a lithologic scene classification model based on an improved densely connected network was proposed, and then the influence of prior knowledge embedding on classification performance was studied. The model was evaluated on dataset A, constructed with the two-level classification system in the label merging rules proposed in Chap. 2, and it performed well. The improved edge enhancement, enhanced feature fusion, random hybrid attention mechanism and label smoothing introduced into the densely connected network make it more accurate than the traditional network model, and the ablation experiments showed that all four mechanisms contribute to the performance improvement. Building on this classification model, the chapter further studied the ability to extract key lithological features.
Through a loss correction strategy, a lithology scene classification model was constructed that embeds prior knowledge into the improved densely connected network. During training, the predictions on the dataset built from the prior-knowledge labels were compared with the true labels, and the resulting loss was used to correct the loss obtained by the improved densely connected network on the dataset proposed in Chap. 2. This model was more accurate than the single-loss network model.

Chapter 8 addresses lithological scene classification based on transfer learning. Following the idea of transfer learning, this chapter used a small amount of multi-view data and a fine-tuning strategy to adapt the multi-view lithology classification model to a new study area, which enables the identification of new lithology categories (He et al., 2010). The experiments showed that both transfer learning methods improved performance compared with training a model directly on the data. The classification accuracy after transferring and fine-tuning the parameters of the feature-level fusion model improved relative to the directly trained model. When transferring from three data sources to a target domain with two data sources, feature-level fusion outperformed data-level fusion, indicating that models trained with this strategy perform better in transfer learning. In addition, with the feature-level fusion strategy, the model parameters are more conducive to transfer learning, and the model is easier to generalize.

Chapter 9 addresses lithological scene classification based on model migration and a fine-tuning strategy. This chapter used the idea of transfer learning to study the generalization ability of the lithology classification model. To improve the generalization ability of the model, a classification model trained in the source domain was used to transfer the relevant information to the target domain. A small amount of labeled data in the target domain was used to adjust the transferred model, improving its classification performance in the target domain and achieving the goal of identifying new lithological categories. The experiments showed that, in all scenarios, the transfer learning methods improved performance compared with training a model directly on the data.

Chapter 10 addresses hyperspectral remote sensing inversion of mineral abundance based on the sparse unmixing method. Due to the limited spatial resolution of the sensor and the complexity and diversity of the ground surface, a single pixel of a hyperspectral image may contain the spectral signatures of several different substances; such a pixel is called a mixed pixel (Liu, 2023). This study presents several existing state-of-the-art sparse unmixing algorithms that combine spectral and spatial information from hyperspectral images. They show good performance on both simulated and real hyperspectral datasets. In particular, the spectral information and spatial structure information in hyperspectral images can be more fully utilized by introducing a superpixel segmentation algorithm. On the Cuprite dataset, which is a real mining dataset, the experiments indicate that the sparse unmixing algorithms achieve satisfactory results.

References

Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495.
Bharati, R. V., Sharma, S., & Goswami, P. (2019). Remote sensing concepts and application in agriculture. Rashtriya Krishi (English), 14(1).
Han, W., Zhang, X., Wang, Y., Wang, L., Huang, X., Li, J., Wang, S., Chen, W., Li, X., Feng, R., Fan, R., Zhang, X., & Wang, Y. (2023). A survey of machine learning and deep learning in remote sensing of geological environment: Challenges, advances, and opportunities. ISPRS Journal of Photogrammetry and Remote Sensing, 202.
He, H., Yang, X., Li, Y., et al. (2010). Multi-source data fusion technique and its application in geological and mineral survey. Journal of Earth Science and Environment, 32(1), 44–47.
Li, S., Li, C., & Kang, X. (2021). Development status and future prospect of multi-source remote sensing image fusion. Journal of Remote Sensing, 25(01), 148–166 (in Chinese).
Liu, C., Ning, Q., Lei, Y., et al. (2021). Application of improved residual neural network in remote sensing image classification. Science Technology and Engineering, 21(31), 13421–13429 (in Chinese).
Liu, D. (2023). Research on hyperspectral remote sensing image classification method based on convolutional neural network. University of Chinese Academy of Sciences (Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences) (in Chinese).
Nicola, C., Emanuele, I., Veronica, T., et al. (2023). Landslide detection, monitoring and prediction with remote-sensing techniques. Nature Reviews Earth and Environment, 4(1).
Su, S., Zhang, Y., & Zhang, D. (2022). Coupling degree correlation code bad odor detection method based on deep learning. Computer Applications, 42(06), 1702–1707 (in Chinese).