Pattern Recognition and Computer Vision: 6th Chinese Conference, PRCV 2023, Xiamen, China, October 13–15, 2023, Proceedings, Part X (Lecture Notes in Computer Science) 981998548X, 9789819985487

The 13-volume set LNCS 14425-14437 constitutes the refereed proceedings of the 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023, held in Xiamen, China, in October 2023.


English Pages 512 [509] Year 2024



Table of contents:
Preface
Organization
Contents – Part X
Neural Network and Deep Learning III
Dual-Stream Context-Aware Neural Network for Survival Prediction from Whole Slide Images
1 Introduction
2 Method
3 Experiments and Results
4 Conclusion
References
A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction
1 Introduction
2 Related Work
2.1 Correlation-Agnostic Algorithms
2.2 Spatial Correlation Algorithms
2.3 Semantic Correlation Algorithms
3 Methodology
3.1 Definition of Multi-label Image Recognition
3.2 The Framework of SSCI
3.3 Loss Function
4 Experiments
4.1 Evaluation Metrics
4.2 Implementation Details
4.3 Comparison with Other Mainstream Algorithms
4.4 Evaluation of the SSCI Effectiveness
5 Conclusion
References
Hierarchical Spatial-Temporal Network for Skeleton-Based Temporal Action Segmentation
1 Introduction
2 Related Work
2.1 Temporal Action Segmentation
2.2 Skeleton-Based Action Recognition
3 Method
3.1 Network Architecture
3.2 Multi-Branch Transfer Fusion Module
3.3 Multi-Scale Temporal Convolution Module
3.4 Loss Function
4 Experiments
4.1 Setup
4.2 Effect of Hierarchical Model
4.3 Effect of Multiple Modalities
4.4 Effect of Multi-modal Fusion Methods
4.5 Effect of Multi-Scale Temporal Convolution
4.6 Comparison with State-of-the-Art
5 Conclusion
References
Multi-behavior Enhanced Graph Neural Networks for Social Recommendation
1 Introduction
2 Related Work
3 Preliminaries
4 Methodology
4.1 Embedding Layer
4.2 Propagation Layer
4.3 Multi-behavior Integration Layer
4.4 Prediction Layer
4.5 Model Training
5 Experiments
5.1 Experimental Settings
5.2 Performance Comparison (RQ1)
5.3 Ablation Study (RQ2)
5.4 Parameter Analysis (RQ3)
6 Conclusion and Future Work
References
A Complex-Valued Neural Network Based Robust Image Compression
1 Introduction
2 Related Works
2.1 Neural Image Compression
2.2 Adversarial Attack
2.3 Complex-Valued Convolutional Neural Networks
3 Proposed Method
3.1 Overall Framework
3.2 Nonlinear Transform
4 Experiment Results
4.1 Experiment Setup
4.2 Results and Comparison
4.3 Ablation Study
5 Conclusions
References
Binarizing Super-Resolution Neural Network Without Batch Normalization
1 Introduction
2 Related Work
3 Method
3.1 Batch Normalization in SR Models
3.2 Channel-Wise Asymmetric Binarizer for Activations
3.3 Smoothness-Controlled Estimator
4 Experimentation
4.1 Experiment Setup
4.2 Ablation Study
4.3 Visualization
5 Conclusion
References
Infrared and Visible Image Fusion via Test-Time Training
1 Introduction
2 Method
2.1 Overall Framework
2.2 Training and Testing
3 Experiments
3.1 Experiment Configuration
3.2 Performance Comparison on TNO
3.3 Performance Comparison on VIFB
3.4 Ablation Study
4 Conclusion
References
Graph-Based Dependency-Aware Non-Intrusive Load Monitoring
1 Introduction
2 Proposed Method
2.1 Problem Formulation
2.2 Co-occurrence Probability Graph
2.3 Graph Structure Learning
2.4 Graph Attention Neural Network
2.5 Encoder-Decoder Module
3 Numerical Studies and Discussions
3.1 Dataset and Experiment Setup
3.2 Metrics and Comparisons
4 Conclusion
References
Few-Shot Object Detection via Classify-Free RPN
1 Introduction
2 Related Work
2.1 Object Detection
2.2 Few-Shot Learning
2.3 Few-Shot Object Detection
3 Methodology
3.1 Problem Setting
3.2 Analysis of the Base Class Bias Issue in RPN
3.3 Classify-Free RPN
4 Experiments
4.1 Experimental Setup
4.2 Comparison with the State-of-the-Art
4.3 Ablation Study
5 Conclusion
References
IPFR: Identity-Preserving Face Reenactment with Enhanced Domain Adversarial Training and Multi-level Identity Priors
1 Introduction
2 Methods
2.1 Target Motion Encoder and 3D Shape Encoder
2.2 3D Shape-Aware Warping Module
2.3 Identity-Aware Refining Module
2.4 Enhanced Domain Discriminator
2.5 Training
3 Experiment
3.1 Experimental Setup
3.2 Comparisons
3.3 Ablation Study
4 Limitation
5 Conclusion
References
L2MNet: Enhancing Continual Semantic Segmentation with Mask Matching
1 Introduction
2 Related Work
3 Method
3.1 Preliminaries and Revisiting
3.2 Proposed Learn-to-Match Framework
3.3 Training Loss
4 Experiments
4.1 Experimental Setting
4.2 Quantitative Evaluation
4.3 Ablation Study
5 Conclusion
References
Adaptive Channel Pruning for Trainability Protection
1 Introduction
2 Related Work
3 Method
3.1 Method Framework and Motivation
3.2 Channel Similarity Calculation and Trainability Preservation
3.3 Sparse Control and Optimization
4 Experiments
4.1 Experiments Settings and Evaluation Metrics
4.2 Results on ImageNet
4.3 Results on CIFAR-10
4.4 Results on YOLOX-s
4.5 Ablation
5 Conclusion
References
Exploiting Adaptive Crop and Deformable Convolution for Road Damage Detection
1 Introduction
2 Related Work
3 Methods
3.1 Adaptive Image Cropping Based on Vanishing Point Estimation
3.2 Feature Learning with Deformable Convolution
3.3 Diagonal Intersection over Union Loss Function
4 Experiment
4.1 Comparative Analysis of Different Datasets
4.2 Ablation Analysis
5 Conclusion
References
Cascaded-Scoring Tracklet Matching for Multi-object Tracking
1 Introduction
2 Related Work
2.1 Tracking by Detection
2.2 Joint Detection and Tracking
3 Proposed Method
3.1 Cascaded-Scoring Tracklet Matching
3.2 Motion-Guided Based Target Aware
3.3 Appearance-Assisted Feature Warper
4 Experiments
4.1 Experimental Setup
4.2 Ablation Studies
4.3 Comparison with State-of-the-Art Methods
5 Conclusion
References
Boosting Generalization Performance in Person Re-identification
1 Introduction
2 Related Work
2.1 Generalizable Person ReID
2.2 Vision-Language Learning
3 Method
3.1 Review of CLIP
3.2 A Novel Cross-Modal Framework
3.3 Prompt Design Process
3.4 Loss Function
4 Experiments
4.1 Datasets and Evaluation Protocols
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparison with State-of-the-Art Methods
4.5 Other Analysis
5 Conclusion
References
Self-guided Transformer for Video Super-Resolution
1 Introduction
2 Related Work
2.1 Video Super-Resolution
2.2 Vision Transformers
3 Our Method
3.1 Network Overview
3.2 Multi-headed Self-attention Module Based on Offset-Guided Window (OGW-MSA)
3.3 Feature Aggregation (FA)
4 Experiments
4.1 Datasets and Experimental Settings
4.2 Comparisons with State-of-the-Art Methods
4.3 Ablation Study
5 Conclusion
References
SAMP: Sub-task Aware Model Pruning with Layer-Wise Channel Balancing for Person Search
1 Introduction
2 Related Work
3 The Proposed Method
3.1 Framework Overview
3.2 Sub-task Aware Channel Importance Estimation
3.3 Layer-Wise Channel Balancing
3.4 Adaptive OIM Loss for Model Pruning and Finetuning
4 Experimental Results and Analysis
4.1 Dataset and Evaluation Metric
4.2 Implementation Details
4.3 Comparison with the State-of-the-Art Approaches
4.4 Ablation Study
5 Conclusion
References
MKB: Multi-Kernel Bures Metric for Nighttime Aerial Tracking
1 Introduction
2 Methodology
2.1 Kernel Bures Metric
2.2 Multi-Kernel Bures Metric
2.3 Objective Loss
3 Experiments
3.1 Implementation Details
3.2 Evaluation Datasets
3.3 Comparison Results
3.4 Visualization
3.5 Ablation Study
4 Conclusion
References
Deep Arbitrary-Scale Unfolding Network for Color-Guided Depth Map Super-Resolution
1 Introduction
2 The Proposed Method
2.1 Problem Formulation
2.2 Algorithm Unfolding
2.3 Continuous Up-Sampling Fusion (CUSF)
2.4 Loss Function
3 Experimental Results
3.1 Implementation Details
3.2 The Quality Comparison of Different DSR Methods
3.3 Ablation Study
4 Conclusion
References
SSDD-Net: A Lightweight and Efficient Deep Learning Model for Steel Surface Defect Detection
1 Introduction
2 Methods
2.1 LMFE: Light Multiscale Feature Extraction Module
2.2 SEFF: Simple Effective Feature Fusion Network
2.3 SSDD-Net
3 Experiments and Analysis
3.1 Implementation Details
3.2 Evaluation Metrics
3.3 Dataset
3.4 Ablation Studies
3.5 Comparison with Other SOTA Methods
3.6 Comprehensive Performance of SSDD-Net
4 Conclusion
References
Effective Small Ship Detection with Enhanced-YOLOv7
1 Introduction
2 Method
2.1 Small Object-Aware Feature Extraction Module (SOAFE)
2.2 Small Object-Friendly Scale-Insensitive Regression Scheme (SOFSIR)
2.3 Geometric Constraint-Based Non-Maximum Suppression Method (GCNMS)
3 Experiments
3.1 Experimental Settings
3.2 Quantitative Analysis
3.3 Ablation Studies
3.4 Qualitative Analysis
4 Conclusion
References
PiDiNeXt: An Efficient Edge Detector Based on Parallel Pixel Difference Networks
1 Introduction
2 Related Work
2.1 The Development of Deep Learning Based Edge Detection
2.2 Review of Pixel Difference Convolution
3 Method
3.1 PiDiNeXt Architecture
4 PiDiNeXt Reparameterization
5 Experiments
5.1 Datasets and Implementation
5.2 Comparison with the State of the Arts
5.3 Ablation Study
6 Conclusion
References
Transpose and Mask: Simple and Effective Logit-Based Knowledge Distillation for Multi-attribute and Multi-label Classification
1 Introduction
2 Related Work
3 Method
3.1 Preliminaries
3.2 Vanilla KD
3.3 TM-KD
4 Experiments
4.1 Experimental Setting
4.2 Main Results
4.3 Ablation Study
5 Conclusion
References
CCSR-Net: Unfolding Coupled Convolutional Sparse Representation for Multi-focus Image Fusion
1 Introduction
2 Background
3 The Proposed Method
3.1 Model Formulation
3.2 Loss Function
3.3 CCSR-Net-Based Multi-Focus Image Fusion
4 Experiments
4.1 Experimental Settings
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Experiments
5 Conclusion
References
FASONet: A Feature Alignment-Based SAR and Optical Image Fusion Network for Land Use Classification
1 Introduction
2 Related Work
3 Method
3.1 Network Structure
3.2 FAM
3.3 MSEM
3.4 Loss Function
4 Experimental Results and Analysis
4.1 Experimental Setups
4.2 Experiments on WHU-OPT-SAR Dataset
4.3 Ablation Study
5 Conclusion
References
De Novo Design of Target-Specific Ligands Using BERT-Pretrained Transformer
1 Background
2 Materials and Methods
2.1 Network Framework
2.2 Training Setting
3 Results and Discussion
3.1 Drug-Target Affinity Prediction
3.2 Molecule Docking
4 Conclusion
References
CLIP for Lightweight Semantic Segmentation
1 Introduction
2 Related Work
2.1 Lightweight Visual Encoder
2.2 Language-Guided Recognition
3 Proposed Method
3.1 Language-Guided Semantic Segmentation Framework
3.2 Structure of the Fusion Module
3.3 Design of Conv-Former
4 Experiments
4.1 Set up and Implementation
4.2 For Lightweight Visual Backbone
4.3 For Any Visual Backbone
4.4 Ablation Study
5 Conclusion
References
Teacher-Student Cross-Domain Object Detection Model Combining Style Transfer and Adversarial Learning
1 Introduction
2 Related Works
3 Proposed Method
3.1 Fourier Style Transfer
3.2 Augmentation Strategies
3.3 Mutual Learning and Adversarial Learning
4 Experiments
4.1 Implementation Details
4.2 Experimental Settings and Evaluation
4.3 Results and Comparisons
4.4 Ablation Studies
5 Conclusion
References
Computing 2D Skeleton via Generalized Electric Potential
1 Introduction
2 Method Overview
2.1 Symbols
2.2 Overview
3 Skeleton Extraction by the Generalized Coulomb Field
3.1 Generalized Potential Model
3.2 Ridge Extraction of Generalized Electric Potential Surface
4 Skeleton Pruning by the Electric Charge Distribution
4.1 The Electric Charge Distribution on the Shape
4.2 The Implementation of Skeleton Pruning
5 Experimental Results and Performance Analysis
5.1 Parameters Analysis
5.2 Comparisons with Existing Methods
6 Conclusion
References
Illumination Insensitive Monocular Depth Estimation Based on Scene Object Attention and Depth Map Fusion
1 Introduction
2 Related Work
3 Method
3.1 Problem Setup
3.2 Proposed Architecture
3.3 Low Light Image Enhancement
3.4 Scene Object Attention Mechanism
3.5 Weighted Depth Map Fusion
3.6 Pixel Adaptive Convolution
4 Experimental Results and Analysis
4.1 Experimental Data Sets and Evaluation Metrics
4.2 Parameter Setting
4.3 Analysis of Results
4.4 Analysis of Ablation Experiments
4.5 Model Convergence and Execution Efficiency
5 Conclusion
References
A Few-Shot Medical Image Segmentation Network with Boundary Category Correction
1 Introduction
2 Related Work
2.1 Few-Shot Segmentation in Natural Image Field
2.2 Few-Shot Segmentation in Medical Image Field
3 Method
3.1 Overview
3.2 Prior Mask Generation Module
3.3 Multi-scale Feature Fusion Module
3.4 Boundary Category Correction Framework
3.5 Loss Function
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Experiment Results
4.4 Ablation Study
5 Conclusion
References
Repdistiller: Knowledge Distillation Scaled by Re-parameterization for Crowd Counting
1 Introduction
2 Related Work
3 Method
3.1 Model Selection
3.2 Re-parameterization by Neural Architecture Search
3.3 Knowledge Distillation by Gradient Decoupling
4 Experiments
4.1 Model Performance
4.2 Comparisons with Other Compression Algorithms
4.3 Ablation Study
5 Conclusion
References
Multi-depth Fusion Transformer and Batch Piecewise Loss for Visual Sentiment Analysis
1 Introduction
2 Related Work
2.1 Visual Sentiment Analysis
2.2 Vision Transformer
3 Multi-depth Fusion Transformer Framework Design
3.1 Overall Framework Process
3.2 Tokenizer Module
3.3 Multi-depth Fusion Encoder
3.4 Batch Piecewise Loss
4 Experiments
4.1 Comparisons with State-of-the-Art Methods
4.2 Ablation Experiments
4.3 Visualization
5 Conclusions
References
Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation
1 Introduction
2 Background
2.1 Open Vocabulary Semantic Segmentation
2.2 Foundational Models Pretrained on Large-Scale Data
3 Issue Analysis in OVS
3.1 Analysis Settings
3.2 Datasets and Models
3.3 Analysis Results
4 Preliminary Approaches for Bottleneck Issues
4.1 Preliminaries
4.2 Refining Benchmark for Dataset Issue
4.3 Our Framework for Segmentation and Reasoning Issue
5 Experiment Results
5.1 Dataset Refinement
5.2 Our Framework
6 Conclusion, Limitations and Future Work
References
Exploring a Distillation with Embedded Prompts for Object Detection in Adverse Environments
1 Introduction
2 Related Work
2.1 Object Detection
2.2 Enhancement for Detection
2.3 Differentiable Image Processing
3 Method
3.1 Spatial Location Prompt
3.2 Semantic Mask Prompt
3.3 Focal Distillation
3.4 Overall Loss
4 Experiments
4.1 Underwater Scenes
4.2 Haze Scenes
4.3 Ablation Study
References
TEFNet: Target-Aware Enhanced Fusion Network for RGB-T Tracking
1 Introduction
2 Methods
2.1 Overview
2.2 Target-Aware Feature Extraction Network
2.3 Dual-Layer Fusion Module
2.4 Objective Classification and Regression Networks
3 Experiment
3.1 Experimental Details
3.2 Public Dataset Evaluation
3.3 Ablation Experiments and Analysis
4 Conclusion
References
DARN: Crowd Counting Network Guided by Double Attention Refinement
1 Introduction
2 Proposed Method
2.1 Framework Overview
2.2 Attention-Guided Feature Aggregation
2.3 Attention-Guided Multi-stage Refinement Method
2.4 Loss Function
3 Experiment
3.1 Implementation Details
3.2 Results and Analysis
3.3 Ablation Studies
4 Conclusion
References
DFR-ECAPA: Diffusion Feature Refinement for Speaker Verification Based on ECAPA-TDNN
1 Introduction
2 Denoising Diffusion Probabilistic Models
3 Proposed Method
3.1 Diffusion Model Representations
3.2 Dual Network Branch Feature Fusion
3.3 Cross-Domain Speaker Verification
4 Experiments
4.1 Dataset
4.2 Training
4.3 Evaluation
4.4 Result
5 Conclusion
References
Half Aggregation Transformer for Exposure Correction
1 Introduction
2 Related Work
2.1 Exposure Correction
2.2 Vision Transformer
3 Method
3.1 Overall Architecture
3.2 Half Aggregation Multi-head Self-Attention
3.3 Illumination Guidance Mechanism
4 Experiment
4.1 Implementation Details
4.2 Comparisons with State-of-the-Art Methods
4.3 Ablation Study
5 Conclusion
References
Deformable Spatial-Temporal Attention for Lightweight Video Super-Resolution
1 Introduction
2 Related Work
2.1 Single Image Super-Resolution
2.2 Video Super-Resolution
3 Proposed Method
3.1 Network Architecture
3.2 Spatial-Temporal Feature Extraction
3.3 Deformable Spatial-Temporal Attention
4 Experiment
4.1 Experimental Settings
4.2 Quantitative Comparison
4.3 Qualitative Comparison
4.4 Ablation Experiment
5 Conclusion
References
Author Index

LNCS 14434

Qingshan Liu · Hanzi Wang · Zhanyu Ma · Weishi Zheng · Hongbin Zha · Xilin Chen · Liang Wang · Rongrong Ji (Eds.)

Pattern Recognition and Computer Vision 6th Chinese Conference, PRCV 2023 Xiamen, China, October 13–15, 2023 Proceedings, Part X

Lecture Notes in Computer Science

Founding Editors: Gerhard Goos, Juris Hartmanis

Editorial Board Members: Elisa Bertino, Purdue University, West Lafayette, IN, USA; Wen Gao, Peking University, Beijing, China; Bernhard Steffen, TU Dortmund University, Dortmund, Germany; Moti Yung, Columbia University, New York, NY, USA


The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R & D community, the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.

Qingshan Liu · Hanzi Wang · Zhanyu Ma · Weishi Zheng · Hongbin Zha · Xilin Chen · Liang Wang · Rongrong Ji Editors

Pattern Recognition and Computer Vision 6th Chinese Conference, PRCV 2023 Xiamen, China, October 13–15, 2023 Proceedings, Part X

Editors
Qingshan Liu, Nanjing University of Information Science and Technology, Nanjing, China
Hanzi Wang, Xiamen University, Xiamen, China
Zhanyu Ma, Beijing University of Posts and Telecommunications, Beijing, China
Weishi Zheng, Sun Yat-sen University, Guangzhou, China
Hongbin Zha, Peking University, Beijing, China
Xilin Chen, Chinese Academy of Sciences, Beijing, China
Liang Wang, Chinese Academy of Sciences, Beijing, China
Rongrong Ji, Xiamen University, Xiamen, China

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-981-99-8548-7 ISBN 978-981-99-8549-4 (eBook) https://doi.org/10.1007/978-981-99-8549-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.

Preface

Welcome to the proceedings of the Sixth Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2023), held in Xiamen, China. PRCV is formed from the combination of two distinguished conferences: CCPR (Chinese Conference on Pattern Recognition) and CCCV (Chinese Conference on Computer Vision). Both have consistently been the top-tier conference in the fields of pattern recognition and computer vision within China’s academic field. Recognizing the intertwined nature of these disciplines and their overlapping communities, the union into PRCV aims to reinforce the prominence of the Chinese academic sector in these foundational areas of artificial intelligence and enhance academic exchanges. Accordingly, PRCV is jointly sponsored by China’s leading academic institutions: the Chinese Association for Artificial Intelligence (CAAI), the China Computer Federation (CCF), the Chinese Association of Automation (CAA), and the China Society of Image and Graphics (CSIG). PRCV’s mission is to serve as a comprehensive platform for dialogues among researchers from both academia and industry. While its primary focus is to encourage academic exchange, it also places emphasis on fostering ties between academia and industry. With the objective of keeping abreast of leading academic innovations and showcasing the most recent research breakthroughs, pioneering thoughts, and advanced techniques in pattern recognition and computer vision, esteemed international and domestic experts have been invited to present keynote speeches, introducing the most recent developments in these fields. PRCV 2023 was hosted by Xiamen University. From our call for papers, we received 1420 full submissions. Each paper underwent rigorous reviews by at least three experts, either from our dedicated Program Committee or from other qualified researchers in the field. After thorough evaluations, 522 papers were selected for the conference, comprising 32 oral presentations and 490 posters, giving an acceptance rate of 37.46%. The proceedings of PRCV 2023 are proudly published by Springer. Our heartfelt gratitude goes out to our keynote speakers: Zongben Xu from Xi’an Jiaotong University, Yanning Zhang of Northwestern Polytechnical University, Shutao Li of Hunan University, Shi-Min Hu of Tsinghua University, and Tiejun Huang from Peking University. We give sincere appreciation to all the authors of submitted papers, the members of the Program Committee, the reviewers, and the Organizing Committee. Their combined efforts have been instrumental in the success of this conference. A special acknowledgment goes to our sponsors and the organizers of various special forums; their support made the conference a success. We also express our thanks to Springer for taking on the publication and to the staff of Springer Asia for their meticulous coordination efforts.


We hope these proceedings will be both enlightening and enjoyable for all readers. October 2023

Qingshan Liu Hanzi Wang Zhanyu Ma Weishi Zheng Hongbin Zha Xilin Chen Liang Wang Rongrong Ji

Organization

General Chairs
Hongbin Zha, Peking University, China
Xilin Chen, Institute of Computing Technology, Chinese Academy of Sciences, China
Liang Wang, Institute of Automation, Chinese Academy of Sciences, China
Rongrong Ji, Xiamen University, China

Program Chairs
Qingshan Liu, Nanjing University of Information Science and Technology, China
Hanzi Wang, Xiamen University, China
Zhanyu Ma, Beijing University of Posts and Telecommunications, China
Weishi Zheng, Sun Yat-sen University, China

Organizing Committee Chairs
Mingming Cheng, Nankai University, China
Cheng Wang, Xiamen University, China
Yue Gao, Tsinghua University, China
Mingliang Xu, Zhengzhou University, China
Liujuan Cao, Xiamen University, China

Publicity Chairs
Yanyun Qu, Xiamen University, China
Wei Jia, Hefei University of Technology, China

Local Arrangement Chairs
Xiaoshuai Sun, Xiamen University, China
Yan Yan, Xiamen University, China
Longbiao Chen, Xiamen University, China

International Liaison Chairs
Jingyi Yu, ShanghaiTech University, China
Jiwen Lu, Tsinghua University, China

Tutorial Chairs
Xi Li, Zhejiang University, China
Wangmeng Zuo, Harbin Institute of Technology, China
Jie Chen, Peking University, China

Thematic Forum Chairs
Xiaopeng Hong, Harbin Institute of Technology, China
Zhaoxiang Zhang, Institute of Automation, Chinese Academy of Sciences, China
Xinghao Ding, Xiamen University, China

Doctoral Forum Chairs
Shengping Zhang, Harbin Institute of Technology, China
Zhou Zhao, Zhejiang University, China

Publication Chair
Chenglu Wen, Xiamen University, China

Sponsorship Chair
Yiyi Zhou, Xiamen University, China

Exhibition Chairs
Bineng Zhong, Guangxi Normal University, China
Rushi Lan, Guilin University of Electronic Technology, China
Zhiming Luo, Xiamen University, China

Program Committee
Baiying Lei, Shenzhen University, China
Changxin Gao, Huazhong University of Science and Technology, China
Chen Gong, Nanjing University of Science and Technology, China
Chuanxian Ren, Sun Yat-Sen University, China
Dong Liu, University of Science and Technology of China, China
Dong Wang, Dalian University of Technology, China
Haimiao Hu, Beihang University, China
Hang Su, Tsinghua University, China
Hui Yuan, School of Control Science and Engineering, Shandong University, China
Jie Qin, Nanjing University of Aeronautics and Astronautics, China
Jufeng Yang, Nankai University, China
Lifang Wu, Beijing University of Technology, China
Linlin Shen, Shenzhen University, China
Nannan Wang, Xidian University, China
Qianqian Xu, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China
Quan Zhou, Nanjing University of Posts and Telecommunications, China
Si Liu, Beihang University, China
Xi Li, Zhejiang University, China
Xiaojun Wu, Jiangnan University, China
Zhenyu He, Harbin Institute of Technology (Shenzhen), China
Zhonghong Ou, Beijing University of Posts and Telecommunications, China

Contents – Part X

Neural Network and Deep Learning III

Dual-Stream Context-Aware Neural Network for Survival Prediction from Whole Slide Images (p. 3)
  Junxiu Gao, Shan Jin, Ranran Wang, Mingkang Wang, Tong Wang, and Hongming Xu
A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction (p. 15)
  Jing Cheng, Genlin Ji, Qinkai Yang, and Junzhao Hao
Hierarchical Spatial-Temporal Network for Skeleton-Based Temporal Action Segmentation (p. 28)
  Chenwei Tan, Tao Sun, Talas Fu, Yuhan Wang, Minjie Xu, and Shenglan Liu
Multi-behavior Enhanced Graph Neural Networks for Social Recommendation (p. 40)
  Xinglong Wu, Anfeng Huang, Hongwei Yang, Hui He, Yu Tai, and Weizhe Zhang
A Complex-Valued Neural Network Based Robust Image Compression (p. 53)
  Can Luo, Youneng Bao, Wen Tan, Chao Li, Fanyang Meng, and Yongsheng Liang
Binarizing Super-Resolution Neural Network Without Batch Normalization (p. 65)
  Xunchao Li and Fei Chao
Infrared and Visible Image Fusion via Test-Time Training (p. 77)
  Guoqing Zheng, Zhenqi Fu, Xiaopeng Lin, Xueye Chu, Yue Huang, and Xinghao Ding
Graph-Based Dependency-Aware Non-Intrusive Load Monitoring (p. 89)
  Guoqing Zheng, Yuming Hu, Zhenlong Xiao, and Xinghao Ding
Few-Shot Object Detection via Classify-Free RPN (p. 101)
  Songlin Yu, Zhiyu Yang, Shengchuan Zhang, and Liujuan Cao
IPFR: Identity-Preserving Face Reenactment with Enhanced Domain Adversarial Training and Multi-level Identity Priors (p. 113)
  Lei Zhu, Ge Li, Yuanqi Chen, and Thomas H. Li
L2MNet: Enhancing Continual Semantic Segmentation with Mask Matching (p. 125)
  Wenbo Zhang, Bocen Li, and Yifan Wang
Adaptive Channel Pruning for Trainability Protection (p. 137)
  Jiaxin Liu, Dazong Zhang, Wei Liu, Yongming Li, Jun Hu, Shuai Cheng, and Wenxing Yang
Exploiting Adaptive Crop and Deformable Convolution for Road Damage Detection (p. 149)
  Yingduo Bai, Chenhao Fu, Zhaojia Li, Liyang Wang, Li Su, and Na Jiang
Cascaded-Scoring Tracklet Matching for Multi-object Tracking (p. 161)
  Yixian Xie, Hanzi Wang, and Yang Lu
Boosting Generalization Performance in Person Re-identification (p. 174)
  Lidong Cheng, Zhenyu Kuang, Hongyang Zhang, Xinghao Ding, and Yue Huang
Self-guided Transformer for Video Super-Resolution (p. 186)
  Tong Xue, Qianrui Wang, Xinyi Huang, and Dengshi Li
SAMP: Sub-task Aware Model Pruning with Layer-Wise Channel Balancing for Person Search (p. 199)
  Zimeng Wu, Jiaxin Chen, and Yunhong Wang
MKB: Multi-Kernel Bures Metric for Nighttime Aerial Tracking (p. 212)
  Yingjie He, Peipei Kang, Qintai Hu, and Xiaozhao Fang
Deep Arbitrary-Scale Unfolding Network for Color-Guided Depth Map Super-Resolution (p. 225)
  Jialong Zhang, Lijun Zhao, Jinjing Zhang, Bintao Chen, and Anhong Wang
SSDD-Net: A Lightweight and Efficient Deep Learning Model for Steel Surface Defect Detection (p. 237)
  Zhaoguo Li, Xiumei Wei, and Xuesong Jiang
Effective Small Ship Detection with Enhanced-YOLOv7 (p. 249)
  Jun Li, Ning Ding, Chen Gong, Zhong Jin, and Guangyu Li
PiDiNeXt: An Efficient Edge Detector Based on Parallel Pixel Difference Networks (p. 261)
  Yachuan Li, Xavier Soria Poma, Guanlin Li, Chaozhi Yang, Qian Xiao, Yun Bai, and Zongmin Li
Transpose and Mask: Simple and Effective Logit-Based Knowledge Distillation for Multi-attribute and Multi-label Classification (p. 273)
  Yuwei Zhao, Annan Li, Guozhen Peng, and Yunhong Wang
CCSR-Net: Unfolding Coupled Convolutional Sparse Representation for Multi-focus Image Fusion (p. 285)
  Kecheng Zheng, Juan Cheng, and Yu Liu
FASONet: A Feature Alignment-Based SAR and Optical Image Fusion Network for Land Use Classification (p. 298)
  Feng Deng, Meiyu Huang, Wei Bao, Nan Ji, and Xueshuang Xiang
De Novo Design of Target-Specific Ligands Using BERT-Pretrained Transformer (p. 311)
  Yangkun Zheng, Fengqing Lu, Jiajun Zou, Haoyu Hua, Xiaoli Lu, and Xiaoping Min
CLIP for Lightweight Semantic Segmentation (p. 323)
  Ke Jin and Wankou Yang
Teacher-Student Cross-Domain Object Detection Model Combining Style Transfer and Adversarial Learning (p. 334)
  Lijun Wu, Zhe Cao, and Zhicong Chen
Computing 2D Skeleton via Generalized Electric Potential (p. 346)
  Guangzhe Ma, Xiaoshan Wang, Xingchen Liu, Zhiyang Li, and Zhaolin Wan
Illumination Insensitive Monocular Depth Estimation Based on Scene Object Attention and Depth Map Fusion (p. 358)
  Jing Wen, Haojiang Ma, Jie Yang, and Songsong Zhang
A Few-Shot Medical Image Segmentation Network with Boundary Category Correction (p. 371)
  Zeyu Xu, Xibin Jia, Xiong Guo, Luo Wang, and Yiming Zheng
Repdistiller: Knowledge Distillation Scaled by Re-parameterization for Crowd Counting (p. 383)
  Tian Ni, Yuchen Cao, Xiaoyu Liang, and Haoji Hu
Multi-depth Fusion Transformer and Batch Piecewise Loss for Visual Sentiment Analysis (p. 395)
  Haochun Ou, Chunmei Qing, Jinglun Cen, and Xiangmin Xu
Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation (p. 407)
  Xihua Wang, Lei Ji, Kun Yan, Yuchong Sun, and Ruihua Song
Exploring a Distillation with Embedded Prompts for Object Detection in Adverse Environments (p. 420)
  Hao Fu, Long Ma, Jinyuan Liu, Xin Fan, and Risheng Liu
TEFNet: Target-Aware Enhanced Fusion Network for RGB-T Tracking (p. 432)
  Panfeng Chen, Shengrong Gong, Wenhao Ying, Xin Du, and Shan Zhong
DARN: Crowd Counting Network Guided by Double Attention Refinement (p. 444)
  Shuhan Chang, Shan Zhong, Lifan Zhou, Xuanyu Zhou, and Shengrong Gong
DFR-ECAPA: Diffusion Feature Refinement for Speaker Verification Based on ECAPA-TDNN (p. 457)
  Ya Gao, Wei Song, Xiaobing Zhao, and Xiangchun Liu
Half Aggregation Transformer for Exposure Correction (p. 469)
  Ziwen Li, Jinpu Zhang, and Yuehuan Wang
Deformable Spatial-Temporal Attention for Lightweight Video Super-Resolution (p. 482)
  Tong Xue, Xinyi Huang, and Dengshi Li
Author Index (p. 495)

Neural Network and Deep Learning III

Dual-Stream Context-Aware Neural Network for Survival Prediction from Whole Slide Images Junxiu Gao, Shan Jin, Ranran Wang, Mingkang Wang, Tong Wang, and Hongming Xu(B) School of Biomedical Engineering, Faculty of Medicine, Dalian University of Technology, Dalian 116024, China [email protected]

Abstract. Whole slide images (WSI) encompass a wealth of information about the tumor micro-environment, which holds prognostic value for patients' survival. While significant progress has been made in predicting patients' survival risks from WSI, existing studies often overlook the importance of incorporating multi-resolution and multi-scale histological image features, as well as their interactions, in the prediction process. This paper introduces the dual-stream context-aware (DSCA) model, which aims to enhance survival risk prediction by leveraging multi-resolution histological images and multi-scale feature maps, along with their contextual information. The DSCA model comprises three prediction branches: two ResNet50 branches that learn features from multi-resolution images, and one feature fusion branch that aggregates multi-scale features by exploring their interactions. The feature fusion branch of the DSCA model incorporates a mixed attention module, which performs adaptive spatial fusion to enhance the multi-scale feature maps. Subsequently, the self-attention mechanism is developed to learn contextual and interactive information from the enhanced feature maps. The ordinal Cox loss is employed to optimize the model for generating patch-level predictions. Patient-level predictions are obtained by mean-pooling patch-level results. Experimental results conducted on colorectal cancer cohorts demonstrate that the proposed DSCA model achieves significant improvements over state-of-the-art methods in survival prognosis.

Keywords: Survival risk prediction · Whole slide images · Attention mechanism · Multi-scale features · Colorectal Cancer

1 Introduction

Survival analysis focuses on the time span from the beginning to the occurrence of an event, which is of great significance in determining the treatment strategy for cancer patients. Survival analysis from whole slide images (WSI) has gradually evolved from pathologists' subjective estimation to automatic feature extraction and indicative prediction using deep learning [4,6,16]. However,

due to the large scale of histological slides and the high heterogeneity of tumor micro-environment (TME), building an automated survival prediction model via quantifying pathological features within and between tissues from WSI remains a challenging problem. Recently, multiple instance learning (MIL) methods have gained significant popularity in performing survival prognosis from WSI. The MIL method commonly utilizes a feature extractor composed of convolutional neural networks (CNNs) to obtain instance-level features. These features are then aggregated using pooling layers to obtain the WSI-level feature representation. Finally, the MIL model is trained with WSI-level labels. Using the MIL pipeline, Yao et al. [26] proposed a deep attention multiple instance survival learning (DeepAttnMISL) model, which primarily relies on an attention-guided MIL method (termed as DeepMIL) [8] to predict patients’ survival risks. Furthermore, Schirris et al. [18] developed an enhanced DeepMIL method called DeepSMILE by integrating attention-weighted variance pooling across distinct phenotypes and utilizing self-supervised pre-trained ShufflenetV2 as the feature extractor. To capture intricate relationships of instance-level input sequence by using the attention mechanism, Huang et al. [7] integrated feature embedding from pre-trained ResNet18 and position encoding with the Transformer encoder for survival outcome predictions. Jiang et al. [9] proposed an attention-based model (MHAttnSurv) which implemented the MIL method based on the multi-head attention mechanism. The above approaches utilize abundant attention mechanisms and feature extractors to capture meaningful relationships within the WSIs, enabling more accurate predictions of patient survival outcomes. While MIL methods are capable of capturing globally representative WSI features, their performance is often heavily reliant on the feature aggregator, which can sometimes overlook or neglect small pathological features during the feature aggregation process. Besides, due to the large scale of WSI and limitations in computer memory, instance-level features in MIL methods are typically computed via a pre-trained CNN model. However, there is no guarantee that these instance-level features are highly consistent with patient-level survival risks, which ultimately limits the performance of MIL methods. In contrast to the aforementioned MIL methods, patch-based methods directly train the survival prediction model using histological image patches in an end-to-end fashion, allowing the model to learn survival-related histological information effectively. Mobadersany et al. [16] integrated patches from annotated ROIs and genomic data based on convolutional layers and fully connected layers in survival prognosis of Oligodendroglioma patients. Skrede et al. [20] developed a clinical prognostic model (DoMore-v1-CRC) using the MobileNetV2 allied to patches with high tumor content for the good or poor outcome prediction of colorectal cancer patients. Laleh et al. [10] predicted survival outcomes of gastrointestinal cancer patients by using a deep survival model (EE-Surv) that was trained with randomly selected histological image patches. These patchbased methods offer advantages in terms of end-to-end learning, but they do have some limitations. For example, patches at a specific resolution may have

limited information and may not fully represent the entire WSI. Histological images at different resolutions can capture varying levels of lesion tissue information. Consequently, relying solely on patches at a single resolution may not capture the complete histological context present in the WSI, although it is challenging to achieve multi-scale context awareness when simultaneously learning multi-scale features from histological images at different resolutions. To overcome the aforementioned limitations of previous studies, this paper introduces a novel end-to-end survival prediction model called the Dual-stream Context-aware (DSCA) model. The DSCA model is designed to efficiently learn and integrate multi-scale histological features extracted from multi-resolution images. Our contributions can be summarized as: (1) We design a dual-stream neural network that learns histological feature representations from multi-resolution images for survival risk prediction. (2) We adopt weighted histological feature map fusion across branches while improving channel and spatial attentions, and fuse feature representations using a multi-scale co-attention mechanism that incorporates context-aware learning. (3) The ordinal Cox loss is integrated to train our model, which yields further improvements over other comparative models in survival risk prediction.

2 Method

Figure 1 shows the flowchart of our DSCA model. As illustrated in Fig. 1, the DSCA is composed of two ResNet50 branches and one feature fusion branch. The fusion branch comprises eight Sequential Channel Spatial Attention (SCSA) modules, accompanied by weight units dedicated to intermediate features, an Efficient Spatial Feature Fusion (ESFF) module, and a Multi-scale Co-Attention (MSCA) module. The two ResNet50 branches extract features independently and generate predictions in parallel. The fusion branch integrates multi-scale histological image features while incorporating context awareness, and then generates a prediction score. The prediction scores from the three branches are concatenated together to make the multi-resolution patch-level survival risk prediction.

Fig. 1. The schematic diagram of our proposed DSCA model.

Data Pre-processing: Following previous studies [10,21], we divide each WSI into a set of non-overlapping patches of 512×512 pixels at 40× magnification, with background patches containing mainly white pixels excluded. Then, 20× patches (see Fig. 1) are generated by using the 40× patches as their centering regions, which means that neighboring patches at 20× magnification overlap by half. Considering the bias from staining and scanning of tissue samples, we generate stain-normalized patches using the Macenko method [10,17]. The 40× stain-normalized patches are fed into a VGG19 model, which helps to select patches belonging to the categories of tumor (TUM), lymphocytes (LYM), and stroma (STR). Note that our VGG19 model was trained on a public dataset [27] that consists of 9 different tissue types in colorectal cancer slides and provided over 96% accuracy during independent testing. The TUM, LYM and STR patches are selected, as they are considered informative for patients' survival prognosis. After patch selection, 400 to 1500 patches at 40× magnification per WSI are derived.

Dual-Stream Feature Extractor with Multi-scale Feature Fusion: Given an input 40× patch and its corresponding 20× patch, we feed them separately into the two branches, resulting in multi-scale features fed into the fusion branch. To enhance the representative power of the multi-scale feature maps, the SCSA module (see Fig. 1), which explores channel and spatial attention, is proposed. Specifically, given a feature map X_i ∈ R^{C_i×H_i×W_i}, where i ∈ {0,1,2,3} and C_i ∈ {256,512,1024,2048}, the SCSA module adaptively adjusts the weights of different channels and spatial regions in a sequential manner, i.e.,

C_CA = σ(Conv1D(GAP(X_i)) + Conv1D(GMP(X_i)))    (1)

S_SA = σ(Conv(Cat(Conv(Cat(AVG(Y_CA), MAX(Y_CA))), Conv(Y_CA))))    (2)

where GAP and GMP represent global average and global max pooling, AVG and MAX represent average and max pooling, σ denotes the sigmoid operation, and C_CA, S_SA refer to the channel and spatial attentions, respectively. Our channel attention is improved from the efficient channel attention (ECA) proposed in ECA-Net [24] by making use of both global max and average pooling for


encoding global information among channels. The spatial attention extends the attention module proposed in CBAM [25] by adding a convolutional operation branch to extract more details of local regions. The enhanced feature map, i.e., Y_S ∈ R^{C_i×H_i×W_i}, is generated from every SCSA module. Considering the different scales of resolution-specific information, we design a novel feature fusion module which ensures consistency across different stages of feature extraction. Inspired by the multi-task learning mechanism [13], our design is formulated as follows:

Z_i = WU_20X · Y_S^40X + WU_40X · Y_S^20X + Conv(Z_{i−1})    (3)

where WU_20X and WU_40X represent the outputs of the corresponding weight units from the two branches. To ensure the continuous validity of the generated multi-scale features, we additionally implement a residual-style block across multiple stages to achieve skip connections following the weight-based addition operation. This approach ensures that the information flow remains uninterrupted and allows for effective integration of features among adjacent scales. After generating the feature maps Z_i ∈ R^{C_i×H_i×W_i}, they are fed into our ESFF module (see Fig. 1) for the initial spatial feature fusion. Inspired by the adaptively spatial feature fusion (ASFF) mechanism [14], the multi-scale feature maps Z_i are aggregated to generate the feature map Z, which contains both global and local information. The operations are defined as follows:

Z = α^l · Z_{0→l} + β^l · Z_{1→l} + γ^l · Z_{2→l} + λ^l · Z_{3→l}    (4)

where Z_{n→l} denotes that Z_i is resized from level n to level l, and α^l, β^l, γ^l, and λ^l refer to the spatial importance weights of the feature maps from different levels to level l, which are adaptively learned by the network. It is empirically found that setting l to 2 achieves the best performance. The feature map Z is shaped into a 2D-flattened token, i.e., E ∈ R^{1×dim}, through pooling and a multi-layer perceptron sequentially. Meanwhile, as illustrated in Fig. 1, the multi-scale feature maps Z_i, where i ∈ {0,1,2,3}, are mapped into flattened tokens P_i ∈ R^{1×dim} with a unified dimension (e.g., dim = 64) via global average pooling and fully connected operations.

Multi-scale Co-attention with Context-Aware Learning: The Transformer [23] has the advantage of encoding global information over long-range sequences, which is incorporated to model communication and interaction between multi-resolution and multi-scale histological image features in this study. Specifically, the linearly projected features are fed into the scaled dot-product attention block, where the token E ∈ R^{1×dim} is utilized as the query (Q), and every token P_i is utilized as both the key (K) and value (V). As illustrated in Fig. 1, the calculation is performed as:

Attention(Q, K, V) = softmax( Q·W_q (K·W_k)^T / √dim ) · V·W_v    (5)


O_i = Attention_i + FFD(K_i + Attention_i)    (6)

MSCA = Concat(O_1, O_2, ..., O_n) · W_o    (7)

where W_q, W_k, W_v, and W_o are trainable parameters, and n (i.e., n = 4) corresponds to the number of heads. FFD represents the block with linear projection and dropout layers. By incorporating relationship modeling across multi-scale feature representations, the patch-level embedding contains more comprehensive information, which is mapped to a score value via a multi-layer perceptron. Feature representations from the two ResNet50 branches and the feature fusion branch are mapped to three score values, which are finally fed into the output layer for survival risk prediction.

Ordinal Cox Model for Survival Prediction: Our DSCA survival prediction model is optimized by minimizing the ordinal Cox loss [19]. Different from the Cox loss, the ordinal Cox loss treats survival analysis as a regression problem that focuses on the sequence of events. The key point of the ordinal Cox loss lies in the inclusion of a ranking regularization term that utilizes the ordinal survival information among different patients. Given patient i observed with event e_i at time t_i, the ordinal Cox loss in the output layer is formulated as:

l(θ, X_i) = −∑_{i: e_i=1} ( θX_i − log ∑_{j∈R(t_i)} exp(θX_j) ) + λ ∑_{j∈R(t_i)} I(X_i, X_j) · max(0, 1 − r_{i,j})    (8)

where θ indicates the learnable parameters; X_i is the i-th patch-level feature representation (i.e., the concatenated prediction scores of the three branches); I(X_i, X_j) = 1 if the i-th patch belongs to a non-censored patient whose recorded survival time t_i is shorter than that of t_j (i.e., the j-th patch), and I(X_i, X_j) = 0 otherwise; R(t_i) represents the risk set, which includes patients whose events have not occurred by time t_i; r_{i,j} is the ordinal relationship of two patches computed from the ratio of two hazard functions, i.e., r_{i,j} = exp(θ(X_i − X_j)). The regularization parameter λ is set to 2.5 in this study. Note that our end-to-end DSCA model first generates patch-level survival risk predictions, and mean-pooling is then performed on the patch-level predictions from each WSI to generate the patient-level survival risk score.
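To make the loss concrete, the ordinal Cox loss of Eq. (8) can be sketched in PyTorch as below. This is a minimal single-batch sketch written from the formulation above rather than the authors' released code; the tensor layout, the averaging over events and pairs, and the exact handling of the risk set are our assumptions.

```python
import torch

def ordinal_cox_loss(risk, time, event, lam=2.5):
    """Sketch of the ordinal Cox loss in Eq. (8).

    risk  : (N,) float tensor of predicted risk scores theta * X_i
    time  : (N,) float tensor of recorded survival times t_i
    event : (N,) float tensor, 1 if the event was observed (non-censored), else 0
    lam   : weight of the ranking regularization term (2.5 in this paper)
    """
    # Risk set R(t_i): samples whose event has not occurred before t_i.
    in_risk_set = time.unsqueeze(0) >= time.unsqueeze(1)          # (N, N), row i = R(t_i)

    # Negative Cox partial log-likelihood over non-censored samples.
    scores = risk.unsqueeze(0).expand(len(risk), -1)              # (N, N), row i holds all scores
    masked = scores.masked_fill(~in_risk_set, float("-inf"))
    log_cumsum = torch.logsumexp(masked, dim=1)                   # log sum_{j in R(t_i)} exp(risk_j)
    nll = -((risk - log_cumsum) * event).sum() / event.sum().clamp(min=1.0)

    # Ranking term: for comparable pairs (i non-censored, t_i < t_j), penalize
    # hazard ratios r_ij = exp(risk_i - risk_j) that fall below 1.
    comparable = ((time.unsqueeze(1) < time.unsqueeze(0)) & (event.unsqueeze(1) > 0)).float()
    r_ij = torch.exp(risk.unsqueeze(1) - risk.unsqueeze(0))       # (N, N), entry (i, j)
    hinge = torch.clamp(1.0 - r_ij, min=0.0)
    rank = (hinge * comparable).sum() / comparable.sum().clamp(min=1.0)

    return nll + lam * rank
```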

3 Experiments and Results

Dataset: Our experiments were conducted on two distinct cancer cohorts, colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ), derived from The Cancer Genome Atlas (TCGA) project. The TCGA project has collected WSIs and the corresponding clinical information of cancer patients. Note

that some TCGA patient slides with poor quality or artifacts (e.g., pen markers) have been excluded from evaluation. Table 1 lists the demographic and clinical information of the two cohorts used in this study.

Table 1. Demographic and clinical information of study cohorts.
Cohort | No. of Patients | No. of WSIs | No. of Males | No. of Events | Avg OS (months)
TCGA-COAD | 335 | 335 | 180 | 63 | 25.8
TCGA-READ | 142 | 142 | 79 | 25 | 27.1

Experimental Settings: We implement and train our model with PyTorch using two NVIDIA GeForce RTX 3090 GPUs. We empirically set the mini-batch size and learning rate to 64 and 4e-5, respectively. The Adam optimization algorithm is employed for training the proposed model. To ensure the reliability of our findings, we adopt a three-fold cross-validation strategy in which patients are partitioned into mutually exclusive subsets of equal size, with one fold serving as the test data in an iterative manner while the remaining two folds constitute the training and validation datasets; the average result is ultimately reported. To mitigate prediction bias towards WSIs with a large number of histological patches, 100 patches are randomly selected from the three tissue types (TILs, TUM and STR) of each WSI in the training set, following the same proportion of those patches in the corresponding WSI. All patches of the three tissue types are included in the validation and testing sets to ensure consistent prediction results. The concordance index (C-index) widely used in survival analysis [26], i.e., CI = (1/n) ∑_{i∈{δ_i=1|1...N}} ∑_{t_j>t_i} I(X_iθ, X_jθ), is utilized as the evaluation metric.

Comparisons with Other Prognosis Models: In this section, we compare the proposed DSCA model with a range of state-of-the-art models for survival prognosis. Apart from the two-stage MIL methods (i.e., DeepAttnMISL [26], DeepSMILE [18]) and end-to-end methods (i.e., EE-Surv [10], DoMore-v1-CRC [20]) discussed above, we conduct comparative experiments with the following: (1) ResNet50+DeepMIL [8] and ViT+DeepMIL [5,8], both of which apply self-supervised contrastive learning [3] to build feature extractors and attention-based pooling to aggregate the WSI-level representation; (2) MHAttnSurv [9], which applies the multi-head self-attention (MHSA) layer to aggregate different phenotypes for the WSI representation; (3) DeepGraphSurv [12] and Patch-GCN [2], both of which leverage graph convolutional networks (GCN) to model color texture and topology information; (4) Botnet [22], which employs Botnet50 as a replacement for ResNet50 in EE-Surv as the backbone while keeping the remaining settings unchanged. Experimental results are presented in Table 2, where the CRC cohort includes all WSIs of TCGA-COAD and TCGA-READ patients. It is observed in Table 2 that the proposed DSCA-Ordinal Cox model provides the best performance, with

CI values of 0.685 and 0.679 on the COAD and CRC cohorts, respectively. Among the WSI-level MIL methods, ViT+DeepMIL [3,26] and DeepSMILE [18] demonstrate overall good performance, with CI values around 0.62. However, our proposed DSCA model surpasses these WSI-level MIL methods with a CI value more than 6% higher, indicating its superior performance in survival prognosis tasks. Patch-GCN [2] provides CI values of 0.644 and 0.611 on the two cohorts, respectively, which is better than DeepGraphSurv [12]. The end-to-end EE-Surv [10] model achieves the best performance among all existing methods, with CI values of 0.660 and 0.652 on the two cohorts, respectively. This indicates the advantage of end-to-end learning in survival prognosis tasks. Taken together, our proposed DSCA model offers substantial improvements compared with existing survival prediction models. It achieves this by leveraging multi-scale histological image features, capturing their interactions, and incorporating the ordinal Cox loss.

Table 2. The performance of different models on TCGA cohorts.
Methods | CI (TCGA-COAD) | CI (TCGA-CRC)
DeepAttnMISL [26] | 0.589 | 0.608
ResNet50+DeepMIL [3,26] | 0.597 | 0.601
ViT+DeepMIL [3,26] | 0.623 | 0.618
MHAttnSurv [9] | 0.604 | 0.619
DeepSMILE [18] | 0.616 | 0.620
DeepGraphSurv [12] | 0.588 | 0.576
Patch-GCN [2] | 0.644 | 0.611
EE-Surv [10] | 0.660 | 0.652
Botnet [22] | 0.636 | 0.641
DoMore-v1-CRC [20] | 0.617 | 0.622
Proposed DSCA-Cox | 0.673 | 0.672
Proposed DSCA-Ordinal Cox | 0.685 | 0.679
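For reference, the concordance index reported in Table 2 (and defined in the experimental settings above) counts the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times. The following plain NumPy sketch follows this standard definition and is not the authors' exact implementation.

```python
import numpy as np

def concordance_index(risk, time, event):
    """Sketch of the C-index: over comparable pairs (i, j) with an observed
    event for i and t_i < t_j, count pairs whose predicted risks are ordered
    consistently (risk_i > risk_j); ties in risk count as 0.5."""
    risk, time, event = map(np.asarray, (risk, time, event))
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event == 1):
        for j in np.flatnonzero(time > time[i]):
            comparable += 1
            if risk[i] > risk[j]:
                concordant += 1.0
            elif risk[i] == risk[j]:
                concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```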

Comparison with Other Multi-scale Feature Fusion Mechanisms: To verify the effectiveness of our designed multi-scale feature fusion approach, we compare the proposed DSCA model with several representative studies that also leverage multi-scale feature fusion mechanisms. Specifically, DSMIL [11] and MIST [1] use a pyramid concatenation strategy to generate integrated features and then utilize patch-level attention to generate bag-level features. Different feature extractors such as ResNet and Swin Transformer are used in these two studies. Note that these two models were originally designed for classification tasks; we re-implement them by incorporating the Cox loss to make survival predictions for comparison. MS-Trans [15] was developed to achieve multi-scale feature fusion using the ViT architecture. Table 3 lists the comparative performance of the different feature fusion mechanisms. As shown in Table 3, the proposed DSCA model outperforms the comparative methods on both TCGA cohorts. Compared with the concatenation operation used in DSMIL [11] and MIST [1] to combine multi-scale features, MS-Trans [15] excels at exploring the internal relationships within patch sequences. Our model focuses on modeling the sequential relationships of multi-scale features at each convolutional stage and achieves the best performance in our evaluation.

Table 3. Comparative performance of using different feature fusion mechanisms.
Methods | CI (TCGA-COAD) | CI (TCGA-CRC)
DSMIL [11] | 0.645 | 0.650
MIST [1] | 0.655 | 0.657
MS-Trans [15] | 0.662 | 0.661
Proposed DSCA-Cox | 0.673 | 0.672
Proposed DSCA-Ordinal Cox | 0.685 | 0.679

Ablation Studies: Table 4 lists our ablation study results on the COAD and CRC cohorts. We employ two ResNet50 models pre-trained on ImageNet as the backbone, which separately accept input patches at 20× and 40× magnifications to jointly perform survival prognosis. Our proposed SCSA, ESFF, and MSCA modules are then adaptively integrated into the backbone model to improve survival risk prediction. The experimental results demonstrate that including all of our proposed modules significantly improves the CI values compared with the other ablation settings, which indicates the effectiveness of our proposed DSCA model.

Table 4. Ablation experiments.

Models                  CI (TCGA-COAD)  CI (TCGA-CRC)
Backbone                0.665           0.654
Backbone+Weight Unit    0.667           0.660
Backbone+SCSA           0.670           0.662
Backbone+ESFF+MSCA      0.675           0.670
Ours                    0.685           0.679
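For context, the plain Cox loss mentioned above is the negative log partial likelihood; the ordinal variant of [19] adds further terms that are not reproduced here. A minimal PyTorch sketch (our own illustration, ignoring tied event times) is:

```python
import torch

def cox_partial_likelihood_loss(risk, time, event):
    """risk: predicted risk scores, time: follow-up times, event: 1 for an
    observed death and 0 for censoring (all 1-D tensors of equal length)."""
    order = torch.argsort(time, descending=True)      # sort so each risk set is a prefix
    risk, event = risk[order], event[order].float()
    log_risk_set = torch.logcumsumexp(risk, dim=0)    # log of the cumulative risk-set sum
    return -((risk - log_risk_set) * event).sum() / event.sum().clamp(min=1.0)
```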


Visualization Results: Figure 2(a)(b) shows visual prediction results of our DSCA model on two patients' slides. Figure 2(c)(d) shows the Kaplan-Meier (KM) curves of patients in the COAD and CRC cohorts, where the optimal cutoff risk scores are computed using the surv_cutpoint function of the R package survminer. As shown in Fig. 2(c)(d), colorectal cancer patients can be stratified into significantly different survival risk groups based on the risk scores predicted by our proposed DSCA model, with a log-rank p-value of less than 0.0001, which certifies the effective application of our proposed method in survival risk prognosis. Additional visualization results and source code can be accessed from the supplementary material available at: https://github.com/gaogaojx/DSCA.

Fig. 2. Visualization of survival risk scores on WSIs from (a) COAD and (b) READ, and KM survival curves on (c) COAD cohort and (d) CRC cohort based on risk scores predicted by the proposed DSCA model.
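The risk stratification in Fig. 2(c)(d) was produced with R's survminer (surv_cutpoint); a rough Python analogue using the lifelines package, assuming a precomputed cutoff, could look like the sketch below. The function name and its arguments are our own.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

def stratify_and_test(risk, time, event, cutoff):
    """Split patients into high/low-risk groups by a risk-score cutoff, fit a KM
    curve for each group, and compare the two groups with a log-rank test."""
    risk, time, event = map(np.asarray, (risk, time, event))
    high = risk > cutoff
    curves = {}
    for name, mask in [("high risk", high), ("low risk", ~high)]:
        curves[name] = KaplanMeierFitter().fit(time[mask], event_observed=event[mask], label=name)
    result = logrank_test(time[high], time[~high],
                          event_observed_A=event[high], event_observed_B=event[~high])
    return curves, result.p_value
```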

4 Conclusion

In this paper, we propose an end-to-end survival risk prediction model termed DSCA, which fully integrates local contextual information across multi-resolution image patches. Our proposed feature fusion branch aggregates multi-scale feature maps with context awareness using the self-attention mechanism, which enables the quantification of deep relationships within the same region. In addition, we


propose the SCSA and ESFF modules for feature enhancement and aggregation, which help to learn features from local regions more effectively. Evaluations performed on colorectal cancer cohorts demonstrate that our DSCA model provides better performance in survival prognosis than comparative methods and has the potential to be used in distinguishing high and low survival risk of colorectal cancer patients. Acknowledgements. This work was supported by the National Natural Science Foundation of China (Grant No. 82102135), the Natural Science Foundation of Liaoning Province (Grant No. 2022-YGJC-36), and the Fundamental Research Funds for the Central Universities (Grant No. DUT22YG114, DUT21RC(3)038).

References 1. Cai, H., et al.: MIST: multiple instance learning network based on Swin transformer for whole slide image classification of colorectal adenomas. J. Pathol. 259(2), 125– 135 (2023) 2. Chen, R.J., et al.: Whole slide images are 2D point clouds: context-aware survival prediction using patch-based graph convolutional networks. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12908, pp. 339–349. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87237-3 33 3. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp. 1597–1607. PMLR (2020) 4. Coudray, N., et al.: Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24(10), 1559–1567 (2018) 5. Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 6. Geessink, O.G.F., et al.: Computer aided quantification of intratumoral stroma yields an independent prognosticator in rectal cancer. Cell. Oncol. 42(3), 331–341 (2019). https://doi.org/10.1007/s13402-019-00429-z 7. Huang, Z., Chai, H., Wang, R., Wang, H., Yang, Y., Wu, H.: Integration of patch features through self-supervised learning and transformer for survival analysis on whole slide images. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12908, pp. 561–570. Springer, Cham (2021). https://doi.org/10.1007/978-3-03087237-3 54 8. Ilse, M., Tomczak, J., Welling, M.: Attention-based deep multiple instance learning. In: International conference on machine learning, pp. 2127–2136. PMLR (2018) 9. Jiang, S., Suriawinata, A.A., Hassanpour, S.: MHAttnSurv: multi-head attention for survival prediction using whole-slide pathology images. Comput. Biol. Med. 158, 106883 (2023) 10. Laleh, N.G., Echle, A., Muti, H.S., Hewitt, K.J., Schulz, V., Kather, J.N.: Deep learning for interpretable end-to-end survival prediction in gastrointestinal cancer histopathology. In: COMPAY 2021: The Third MICCAI Workshop on Computational Pathology (2021) 11. Li, B., Li, Y., Eliceiri, K.W.: Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328 (2021)


12. Li, R., Yao, J., Zhu, X., Li, Y., Huang, J.: Graph CNN for survival analysis on whole slide pathological images. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-L´ opez, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 174–182. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2 20 13. Liu, S., Johns, E., Davison, A.J.: End-to-end multi-task learning with attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1871–1880 (2019) 14. Liu, S., Huang, D., Wang, Y.: Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516 (2019) 15. Lv, Z., Lin, Y., Yan, R., Wang, Y., Zhang, F.: TransSurv: transformer-based survival analysis model integrating histopathological images and genomic data for colorectal cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2022) 16. Mobadersany, P., et al.: Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Nat. Acad. Sci. 115(13), E2970–E2979 (2018) 17. Roy, S., kumar Jain, A., Lal, S., Kini, J.: A study about color normalization methods for histopathology images. Micron 114, 42–61 (2018) 18. Schirris, Y., Gavves, E., Nederlof, I., Horlings, H.M., Teuwen, J.: DeepSMILE: contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer. Med. Image Anal. 79, 102464 (2022) 19. Shao, W., Wang, T., Huang, Z., Han, Z., Zhang, J., Huang, K.: Weakly supervised deep ordinal cox model for survival prediction from whole-slide pathological images. IEEE Trans. Med. Imaging 40(12), 3739–3747 (2021) 20. Skrede, O.J., et al.: Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet 395(10221), 350–360 (2020) 21. Srinidhi, C.L., Ciga, O., Martel, A.L.: Deep neural network models for computational histopathology: a survey. Med. Image Anal. 67, 101813 (2021) 22. Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A.: Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16519–16529 (2021) 23. Vaswani, A., et al.: Attention is all you need. Advances in neural information processing systems, vol. 30 (2017) 24. Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020) 25. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/9783-030-01234-2 1 26. Yao, J., Zhu, X., Jonnagaddala, J., Hawkins, N., Huang, J.: Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks. Med. Image Anal. 65, 101789 (2020) 27. Zhao, K., et al.: Artificial intelligence quantified tumour-stroma ratio is an independent predictor for overall survival in resectable colorectal cancer. EBioMedicine 61, 103054 (2020)

A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction Jing Cheng, Genlin Ji(B) , Qinkai Yang, and Junzhao Hao School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing, China [email protected]

Abstract. Multi-Label Image Recognition (MLIR) approaches usually exploit label correlations to achieve good performance. Two types of label correlations are principally studied, i.e., spatial and semantic correlations. However, most existing multi-label image recognition algorithms consider semantic correlations and spatial correlations separately, and often require additional information for support. Although some algorithms capture the semantic and spatial correlations of labels simultaneously, they ignore the intrinsic relationship between the two. Specifically, considering only spatial correlations can misidentify some difficult objects in the image; for example, objects of different categories with similar appearance and close spatial distance are mistaken for the same category, and semantic correlations can constrain the errors caused by spatial correlations. In this work, we propose a transformer-based multi-label image recognition algorithm, named Spatial and Semantic Correlation Interaction (SSCI). The transformer is used to model the internal relationship between spatial correlations and semantic correlations to improve the recognition ability of the model for difficult objects. Experiments on the public datasets MS-COCO, VOC2007 and VOC2012 show that the mAP values reach 84.1%, 95.0% and 95.4%, respectively. Compared with other MLIR algorithms, the proposed algorithm can significantly improve the recognition performance of multi-label images.

Keywords: multi-label image recognition · transformer · semantic correlations · spatial correlations

1 Introduction

With the development of network and information technology, people can share large numbers of images on social media at any time through mobile smart devices. Given this flood of visual data, how to mine the useful information it contains is one of the current hot research issues. As a practical and challenging computer vision task, multi-label image recognition aims to identify the objects present in an image and assign the corresponding labels. Compared with single-label image recognition, the key and


challenge of multi-label image recognition is to identify objects of different categories, scales and spatial locations in the image. Multi-label image recognition has many application scenarios in real life, such as scene understanding [13], medical image recognition [1] and human attribute recognition [11]. The space of possible combinations of these labels is huge, which is challenging for deep neural networks. Therefore, it is reasonable to learn the rich information of labels and model their correlations to improve the recognition performance on multi-label images. Existing multi-label image recognition algorithms mainly model either the spatial correlations or the semantic correlations of labels. For modeling spatial correlations, the most commonly used network architecture is the convolution-based attention module. However, CNN-based algorithms have difficulty capturing long-range spatial correlations because the receptive field of the convolution kernel is limited to the kernel size. Chen et al. [3] use the transformer to model the correlations between the features of all spatial locations, so that short- and long-range spatial correlations no longer need to be treated separately. For modeling semantic correlations, previous algorithms [2,4,5] mainly chose graph structures to explicitly define semantic correlations. These algorithms [2,4,17] mostly establish semantic correlations between labels by manually defining graph structures; to define the graph, they also rely on statistics of the training set. In addition, these algorithms draw support from additional information, such as word embeddings from well-trained language models. Most existing algorithms consider semantic correlations and spatial correlations separately, and often require additional information for support. Spatial correlations mainly consider the correlations between object pixels at different spatial positions of the image, but considering only spatial correlations will cause objects of different categories with similar appearance and close spatial positions to be misjudged as the same category, resulting in missed detections of some difficult object categories. Semantic correlations mainly capture the co-occurrence of labels, but their dependence on the statistics of the training set often causes over-fitting, that is, predicting categories that do not exist in the image. To solve the above problems, we propose a multi-label image recognition algorithm, named Spatial and Semantic Correlation Interaction (SSCI), to improve the recognition ability for difficult objects. Different from previous algorithms, we consider both spatial correlations and semantic correlations, fully exploring the intrinsic relationship between them through the transformer structure [14], and our algorithm does not require additional information for support. Specifically, we fully model the spatial correlations by fusing the extracted multi-scale image features, and we capture the semantic correlations of the image by building the Semantic Correlation Module (SCM). The spatial and semantic correlation information obtained above is input into the encoder to model the intrinsic relationship between them. Because features belonging to the same category are bound to have higher correlation, under this constraint the correlation between features of different categories previously misallocated to the same


category can be weakened, so that they can be distinguished. The details of the algorithm are described in Sect. 3. In summary, the main contributions of this paper are as follows: (1) A simple Semantic Correlation Module (SCM) is designed to comprehensively capture the semantic correlations of labels without additional information support; (2) A transformer-based multi-label image recognition algorithm, named Spatial and Semantic Correlation Interaction (SSCI), is proposed to model the internal relationship between spatial correlations and semantic correlations and to improve the recognition ability of the model for difficult objects.

2 Related Work

In recent years, the representative algorithms of multi-label image recognition can be divided into three types, i.e., the correlation-agnostic, semantic correlation and spatial correlation algorithms.

2.1 Correlation-Agnostic Algorithms

Early pioneering work tends to roughly localize multiple objects for recognition [8,10,16]. For example, Wei et al. [16] proposed HCP to generate hundreds of object proposals for each image and aggregate the prediction scores of these proposals to obtain the final prediction. Gao et al. [8] proposed MCAR to minimize the number of candidate regions while keeping the diversity of these regions as high as possible. Inspired by the great success of the Transformer in natural language processing, Dosovitskiy et al. [6] proposed the ViT algorithm, which directly applies the Transformer to non-overlapping image patches to achieve image classification. C-Tran, proposed by Lanchantin et al. [10], only exploits the transformer encoder to explore the complex dependencies between image visual features and labels. However, these algorithms fail to learn the correlations between labels explicitly.

2.2 Spatial Correlation Algorithms

Spatial correlations consider the correlations between object pixels at different spatial locations in multi-label images. They are closely related to image feature description and mainly describe the intra-class dependencies of features. In recent years, several works have captured label correlations between different spatial positions in multi-label images. For example, Zhu et al. [21] developed a spatial regularization network to focus on objectness regions and further learned the label correlations of these regions through self-attention. Guo et al. [9] proposed visual attention consistency to constrain the spatial attention regions and ensure that they remain consistent across different augmentations. Although spatial correlation algorithms have achieved good results,


these CNN-based algorithms cannot capture long-range spatial correlations. Chen et al. [3] proposed the SST algorithm, which applies a transformer to the feature sequence so that the correlations between the features of all spatial locations are considered and established, without having to treat short- and long-range spatial correlations separately. However, considering only the spatial correlations will cause objects of different categories with similar appearance and close spatial distance in the image to be misclassified as the same category.

2.3 Semantic Correlation Algorithms

Semantic correlations in multi-label image recognition can be understood as the co-occurrence relationships of labels, which mainly describe the label dependence between different objects. Some recent works use graph structures to model label correlations. ML-GCN, proposed by Chen et al. [4], constructs a label correlation graph based on conditional probability to help predict the labels of images. SSGRL, proposed by Chen et al. [2], learns label correlations by using recurrent networks to capture the correlations between graph nodes. Ye et al. [18] proposed ADD-GCN to model label correlations from global to local. Chen et al. [5] decomposed the visual representation of an image into a set of label-aware features and proposed P-GCN to encode such features into interdependent image-level prediction scores. However, if only the label co-occurrence relationships are emphasized, some discriminative features in the image may be ignored, resulting in overfitting of the model. Xu et al. [17] adopted two elaborate complementary task decomposition strategies to learn the joint pattern and the category-specific patterns of labels. However, constructing label co-occurrence relationships not only requires additional auxiliary information but also produces overfitting.

3 Methodology

In this section, we elaborate on the proposed multi-label image recognition algorithm, Spatial and Semantic Correlation Interaction (SSCI). The overall framework of the algorithm is shown in Fig. 1. Firstly, we give the definition of multi-label image recognition. Then, we describe each part of the proposed framework in detail. Finally, we give the loss function and the algorithm description of SSCI.

3.1 Definition of Multi-label Image Recognition

Here, we introduce the notations used in the paper. We denote the training set as $D = \{(I^1, y^1), \ldots, (I^N, y^N)\}$, in which $N$ is the number of training samples. Let $I^n$ denote an input image, and let $y^n = \{y_1^n, \ldots, y_C^n\} \in \{0, 1\}^C$ be the label vector of the $n$-th sample, where $C$ is the number of labels. $y_c^n$ is assigned 1 if label $c$ exists in the $n$-th image and 0 if it does not. The goal of multi-label image recognition is to construct a model $f$ and train it with the training set $D$, so that the model can predict the labels $\hat{y}$ corresponding to a new multi-label image $\hat{I}$: $\hat{y} = f(\hat{I})$.


Fig. 1. The overall framework of our proposed SSCI algorithm.

3.2 The Framework of SSCI

Image Feature Extraction: Given an image $I^n$, in theory we can use any CNN-based model to learn its features. In our experiments, following [2,4,5,8,18,21], we use ResNet-101 as the backbone for a fair comparison. High-level features carry rich semantic information but only coarse object locations, whereas low-level features carry less semantic information but more accurate object positions, so we output the feature maps of different layers. This can be formulated as:

$$X_i^n = f_{cnn}(I^n, \theta_{cnn}), \quad i = 1, 2 \qquad (1)$$

where $f_{cnn}$ is the CNN-based backbone and $\theta_{cnn}$ denotes its parameters. $X_1^n \in \mathbb{R}^{d_1 \times h_1 \times w_1}$ is the output of the third layer of ResNet-101, whose dimension $d_1 \times h_1 \times w_1$ is $1024 \times 28 \times 28$, and $X_2^n \in \mathbb{R}^{d_2 \times h_2 \times w_2}$ is the output of the fourth layer, whose dimension is $2048 \times 14 \times 14$.

Multi-scale Feature Fusion: Subsequently, we use different convolution kernels on the latter two layers to obtain feature maps with different receptive field sizes, e.g., $1 \times 1$ and $3 \times 3$ convolutions. By upsampling features of different scales to the same size, we can use a position-wise multiplication operation to perform information fusion and extract common position information. However,


some noise will also be generated in this process. In order to reduce the impact of noise, we downsample the fused feature information and use a position-wise addition operation to superimpose the noise-reduced multi-scale feature information on the deep features extracted by the CNN for information enhancement. This serialized operation can be expressed as:

$$X_{en}^n = D\Big(\bigodot\big(U(R_{dim}(X_1^n, X_2^n))\big)\Big) + R_{dim}(X_2^n) \qquad (2)$$

where $R_{dim}(\cdot)$ represents the channel reduction and scale transformation of the feature map (the transformed channel dimension is set to 512), $U(\cdot)$ represents the upsampling operation, $D(\cdot)$ represents the downsampling operation, and $\bigodot(\cdot)$ represents the position-wise multiplication operation.

Modeling Spatial Correlations: Inspired by the transformer's ability to capture long-term dependencies, we adopt it to model the correlations between features from all pixel positions. Here, we use the standard transformer encoder structure as the transformer unit and regard the enhanced feature $X_{en}^n$ as a sequence of $d$ feature vectors of $h \times w$ dimensions. The sequence is input into the unit, and the spatial correlation between each feature vector and all remaining feature vectors is established, so as to obtain a feature map with comprehensive spatial correlations. Finally, we restore these feature vectors to the original spatial size and regard them as spatial correlation features:

$$X_{spat}^n = R_{re}\big(f_{trans}(R_{hw}(X_{en}^n); \theta_{trans})\big) \qquad (3)$$

where $R_{hw}(\cdot)$ denotes that $X_{en}^n \in \mathbb{R}^{d \times h \times w}$ is flattened to $\mathbb{R}^{d \times (h \times w)}$, $f_{trans}$ denotes the transformer unit with parameters $\theta_{trans}$, $R_{re}(\cdot)$ denotes that the feature vectors are restored to the original spatial size, and $X_{spat}^n \in \mathbb{R}^{d \times h \times w}$ denotes the feature with spatial correlations.

Modeling Semantic Correlations: In order to achieve end-to-end training and avoid the overfitting problem caused by traditional manual graphs, we propose the SCM module to construct semantic correlations. Firstly, we introduce a category-specific semantic learning module to learn the semantic representation of each category $c$. There are different algorithms [2,18] to implement this module; in this paper, we use semantic activation mapping [18]. We first fuse the deep features and the enhanced features to obtain feature maps with richer semantic information. Then, we generate category-specific activation maps through semantic attention and use these maps to convert the transformed feature map into content-aware category representations:

$$V_C^n = M_C^n\big(R_{dim}(X_2^n) + X_{en}^n R_{dim}(X_2^n)\big) \qquad (4)$$

where $m_c^n \in \mathbb{R}^{d \times c}$ from $M_C^n$ is the weight of the $c$-th activation map. Subsequently, we input the semantic representations $V_C^n$ into the multi-head self-attention module and obtain the overall semantic correlation representations


$V_G^n$ by calculating the correlation $A_C^n$ between the semantic representations of all categories. Then we carry out a position-wise multiplication between the global semantic correlation information and the initial category-specific semantic information to further enhance the semantic correlations. We use $V_G^n$ to represent the global label semantic correlations:

$$A_C^n = \mathrm{softmax}\left(\frac{(W_Q^n V_C^n)^{T}(W_K^n V_C^n)}{\sqrt{d}}\right) \qquad (5)$$

$$V_G^n = (A_C^n V_C^n W_C^n)\, V_C^n \qquad (6)$$

where $V_G^n \in \mathbb{R}^{c \times d}$, $c$ represents the number of categories, and $d$ is 512. It is worth mentioning that the SCM is specific to each image, which not only eliminates the complex manual design needed to capture semantic correlations, but also avoids the overfitting problem caused by label correlations pre-established from all images in the training set.
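As a rough illustration of Eqs. (5)-(6), the sketch below computes the correlations between the category representations $V_C^n$ and uses them to re-weight the initial semantics. It is a single-head simplification under our own naming (the paper uses a multi-head self-attention module), so the layer names and shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticCorrelationSketch(nn.Module):
    """Single-head sketch of the SCM attention: class representations attend to
    each other (Eq. 5) and the result re-weights the initial semantics (Eq. 6)."""
    def __init__(self, dim=512):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)   # stands in for W_Q
        self.k = nn.Linear(dim, dim, bias=False)   # stands in for W_K
        self.v = nn.Linear(dim, dim, bias=False)   # stands in for W_C
        self.dim = dim

    def forward(self, vc):                          # vc: (batch, classes, dim)
        attn = torch.matmul(self.q(vc), self.k(vc).transpose(-2, -1))
        attn = F.softmax(attn / self.dim ** 0.5, dim=-1)       # correlation A_C
        vg = torch.matmul(attn, self.v(vc)) * vc               # position-wise enhancement
        return vg
```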

Spatial Correlations and Semantic Correlations Interaction: In order to model the interaction between spatial and semantic correlations, we use the transformer encoder to capture the dependencies between them. Our formulation allows us to input the semantic correlation information and the spatial correlation information jointly into a transformer encoder. Transformer encoders are appropriate because they are order invariant and allow learning any type of dependency between all spatial and semantic information. Specifically, we represent each vector $x_i \in \mathbb{R}^d$ from $X_{spat}^n$, with $i$ ranging from 1 to $h \times w$, and express the semantic correlation information as $V_G^n = [v_1, \ldots, v_c]$, $v_i \in \mathbb{R}^d$. Let $E = \{x_1, \ldots, x_{h \times w}, v_1, \ldots, v_c\}$ be the embedding set that is input to the transformer encoder:

$$E' = f_{trans}(E, \theta_{trans}) \qquad (7)$$

where we denote the final output of the transformer unit after $L$ layers as $E' = \{x_1', \ldots, x_{h \times w}', v_1', \ldots, v_c'\}$. Finally, after modeling the spatial and semantic dependencies through the transformer unit, the classifier performs the final label prediction.

3.3 Loss Function

As shown in Fig. 1, we aggregate all the output vectors to obtain the final prediction score $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_c]$, which yields more reliable predictions:

$$\hat{y} = \lambda_1\, output_1 + \lambda_2\, output_2 + \lambda_3\, output_3, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1 \qquad (8)$$

where $\lambda_1, \lambda_2, \lambda_3$ are the weights of the different outputs. We supervise the final score $\hat{y}$ and train the whole SSCI with the traditional multi-label classification loss:

$$L(y, \hat{y}) = \sum_{c=1}^{C} y_c \log(\sigma(\hat{y}_c)) + (1 - y_c)\log(1 - \sigma(\hat{y}_c)) \qquad (9)$$


where σ(·) is the sigmoid function. The description of the specific training process of SSCI is given in Algorithm 1.

Algorithm 1. Algorithm SSCI
Input: The multi-label image training set $D = \{(I^n, y^n)\}_{n=1}^{N}$, the number of epochs $e$, the image size $i$, the learning rate $lr$, the batch size $b$
Output: The predicted label vector $\hat{y}^n$
1: Initialize the model parameters.
2: for i = 1 to e do
3:   Divide the training set D into batches of size b.
4:   for k = 1 to N/b do
5:     Extract the image features X_1, X_2 by Eq. (1);
6:     Fuse the multi-scale features into X_en by Eq. (2);
7:     Model the spatial correlations X_spat by Eq. (3);
8:     Model the semantic correlations V_G by Eqs. (4), (5) and (6);
9:     Interact the spatial and semantic correlations to obtain E' by Eq. (7);
10:    Compute the final label confidence ŷ by Eq. (8);
11:    Compute the loss L(y, ŷ) by Eq. (9);
12:    Backpropagate L(y, ŷ) and update the parameters;
13:  end for
14: end for
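Steps 10-11 of Algorithm 1 can be sketched as follows; this is our own minimal PyTorch illustration of Eqs. (8)-(9), using the MS-COCO output weights reported in Sect. 4.2 as defaults.

```python
import torch
import torch.nn.functional as F

def ssci_loss(outputs, targets, weights=(0.3, 0.3, 0.4)):
    """Weighted fusion of the three prediction heads (Eq. 8) followed by the
    standard multi-label binary cross-entropy with a sigmoid (Eq. 9).
    `outputs` is a list of three (batch, C) score tensors and `targets` is the
    (batch, C) binary label matrix."""
    y_hat = sum(w * o for w, o in zip(weights, outputs))      # Eq. (8)
    return F.binary_cross_entropy_with_logits(y_hat, targets) # Eq. (9)
```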

4 Experiments

In this section, we first introduce the evaluation metrics and our implementation details. Then, we compare SSCI with other existing mainstream algorithms on three public multi-label image recognition datasets, namely MS-COCO [12], Pascal VOC2007 [7] and Pascal VOC2012 [7]. Finally, we conduct an ablation study and give the corresponding experimental results.

4.1 Evaluation Metrics

In order to compare fairly with other existing algorithms, we follow most current work [4,18,21] and adopt the overall/per-class precision (OP/CP), overall/per-class recall (OR/CR), overall/per-class F1-score (OF1/CF1) and the mean Average Precision (mAP) as evaluation metrics. When measuring precision/recall/F1-score, a label is considered positive if its confidence score is greater than 0.5. Generally, OF1, CF1 and mAP are more important than the other metrics.
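As an illustration of the overall and per-class metrics defined above (mAP is omitted for brevity), a minimal NumPy sketch under our own naming is:

```python
import numpy as np

def multilabel_metrics(scores, labels, thr=0.5):
    """OP/OR/OF1 pool all label decisions together; CP/CR/CF1 average the
    per-class precision/recall first. scores, labels: (N, C) arrays, labels in {0, 1}."""
    pred = (scores > thr).astype(float)
    tp = (pred * labels).sum(axis=0)                    # true positives per class
    op = tp.sum() / max(pred.sum(), 1.0)
    orec = tp.sum() / max(labels.sum(), 1.0)
    of1 = 2 * op * orec / max(op + orec, 1e-12)
    cp = (tp / np.maximum(pred.sum(axis=0), 1.0)).mean()
    cr = (tp / np.maximum(labels.sum(axis=0), 1.0)).mean()
    cf1 = 2 * cp * cr / max(cp + cr, 1e-12)
    return dict(OP=op, OR=orec, OF1=of1, CP=cp, CR=cr, CF1=cf1)
```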


Table 1. The performance comparison on MS-COCO 2014 dataset. The best results are marked as bold.

Imgsize    Algorithm          mAP   CP    CR    CF1   OP    OR    OF1
448 × 448  ResNet101 (GMP)    81.5  84.6  69.1  76.1  85.9  73.0  78.9
           DSDL [19]          81.7  88.1  62.9  73.4  89.6  65.3  75.6
           CPCL [20]          82.8  85.6  71.1  77.6  86.1  74.5  79.9
           ML-GCN [4]         83.0  85.1  72.0  78.0  85.8  75.4  80.3
           F-GCN [15]         83.2  85.4  72.4  78.3  86.0  75.7  80.5
           P-GCN [5]          83.2  84.9  72.7  78.3  85.0  76.4  80.5
           SST [3]            84.2  86.1  72.1  78.5  87.2  75.4  80.8
           ours               84.1  85.3  73.6  79.0  85.8  77.1  81.2
448 × 576  ADD-GCN [18]       85.2  84.7  75.9  80.1  84.9  79.4  82.0
           ours               85.4  87.9  72.2  79.3  89.4  75.9  82.1
576 × 576  SSGRL [2]          83.8  89.9  68.5  76.8  91.3  70.8  79.7
           C-Tran [10]        85.1  86.3  74.3  79.9  87.7  76.5  81.7
           ours               85.5  86.0  75.1  80.2  86.0  78.9  82.3

4.2 Implementation Details

In our experiments, we use a pre-trained ResNet-101 as the backbone of our proposed model and use the same input size to train and evaluate models for a fair comparison. We use 4 attention heads and a 3-layer shared transformer encoder to capture the correlation information; the dimensions of the attention head and feed-forward head are set to 512 and 2048, respectively. The dropout probability in all transformers is set to 0.1. For the MS-COCO dataset and the Pascal VOC datasets, we set the learning rates to 0.03 and 0.05, respectively. Standard stochastic gradient descent with a momentum of 0.9 is chosen as the model optimizer, and the weight decay is set to 10−4. The batch size per GPU is 32. We train our model for 60 epochs in total, and the learning rate is reduced by a factor of 0.1 at epochs 30 and 50. We apply different weight coefficients to the final prediction outputs: for MS-COCO, we set the weights to 0.3, 0.3 and 0.4, and for Pascal VOC, we set them to 0.4, 0.2 and 0.4 to obtain the best performance. The experiments are carried out on a GeForce RTX 3090 GPU.
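The optimization setup described above corresponds roughly to the following PyTorch sketch; the ResNet-101 backbone stands in for the full SSCI model, and the training step itself is omitted.

```python
import torch
import torchvision

# SGD with momentum 0.9 and weight decay 1e-4; lr dropped by 0.1 at epochs 30 and 50.
model = torchvision.models.resnet101(weights="IMAGENET1K_V1")  # placeholder for the SSCI model
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,        # 0.05 for Pascal VOC
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 50], gamma=0.1)

for epoch in range(60):
    # ... one training epoch over mini-batches of size 32 would go here ...
    scheduler.step()
```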

4.3 Comparison with Other Mainstream Algorithms

Performance on the MS-COCO Dataset: Table 1 shows the performance comparison of our model with other mainstream models on MS-COCO. We find that our proposed algorithm outperforms the traditional multi-label recognition algorithms by a considerable margin. Specifically, our algorithm obtains the second-best mAP of 84.1%, only 0.1% lower than that of the best-performing SST. In addition, we also carried out experimental comparisons with SSGRL [2], C-Tran [10] and ADD-GCN [18] at the same input size: our mAP is 1.7% higher than that of SSGRL [2], 0.4% higher than that of C-Tran [10], and slightly (0.2%) higher than that of ADD-GCN [18].


Table 2. The performance comparison on VOC2007 dataset. The best results are marked as bold. Algorithm

aero bike bird boat bottle bus

car

cat

chair cow

table dog

horse motor person plant sheep sofa

ResNet101(GMP) 99.2 97.9 97.0 96.1 76.3

93.5 97.4 97.5 80.2

93.3 81.6

97.0 96.7

SSGRL [2]

99.5 97.1 97.6 97.8 82.6

94.5 96.7 98.1 78.0

97.0 85.6

ADD-GCN [18]

99.8 98.2 97.5 98.3 79.8

93.9 96.5 97.8 80.1

93.3 84.5

ML-GCN [4]

99.5 98.5 98.6 98.1 80.8

94.6 97.2 98.2 82.3

F-GCN [15]

99.5 98.5 98.7 98.2 80.9

P-GCN [5] DSDL [19]

train tv

mAP

95.1

98.8

79.7

95.5

82.3 97.4

92.9 92.3

97.8 98.3

96.4

98.1

84.9

96.5

79.8 98.4

92.8 93.4

98.6 98.6

96.9

98.6

83.2

95.1

82.7 98.3

94.4 93.3

95.7 86.4

98.2 98.4

96.7

99.0

84.7

96.7

84.3 98.9

93.7 94.0

94.8 97.3 98.3 82.5

95.7 86.6

98.2 98.4

96.7

99.0

84.8

96.7

84.4 99.0

93.7 94.1

99.6 98.6 98.4 98.7 81.5

94.8 97.6 98.2 83.1

96.0 87.1

98.3 98.5

96.3

99.1

87.3 95.5

85.4 98.9

93.6 94.3

99.8 98.7 98.4 97.9 81.9

95.4 97.6 98.3 83.3

95.0 88.6 98.0 97.9

95.8

99.0

86.6

95.9

86.4 98.6

94.4 94.4

CPCL [20]

99.6 98.6 98.5 98.8 81.9

95.1 97.8 98.2 83.0

95.5 85.5

97.9

99.0

86.6

97.0

84.9 99.1

94.3 94.4

SST [3] ours

99.8 98.6 98.9 98.4 85.5 99.9 99.1 98.6 98.8 83.9

94.7 97.9 98.6 83.0 96.8 85.7 96.2 98.1 98.0 85.2 96.3 86.0

98.8 98.9 95.7 98.6 98.0 96.5

99.1 99.2

85.4 87.1

96.2 98.3

84.3 99.1 95.0 94.5 87.2 99.2 94.9 95.0

98.4 98.5

Table 3. The performance comparison on VOC2012 dataset. The best results are marked as bold. Algorithm

aero bike bird boat bottle bus

car

cat

chair cow

table dog

horse motor person plant sheep sofa

train tv

mAP

ResNet101(GMP) 99.3 97.3 97.8 96.3 77.9

95.5 97.9 97.3 80.7

92.9 84.2

97.6 98.1

95.8

99.0

83.4

97.5

85.4 97.8

94.7 93.2

DSDL [19]

99.4 95.3 97.6 95.7 83.5

94.8 93.9 98.5 85.7

94.5 83.8

98.4 97.7

95.9

98.5

80.6

95.7

82.3 98.2

93.2 93.2

SSGRL [2]

99.5 95.1 97.4 96.4 85.8

94.5 93.7 98.9 86.7

96.3 84.6

98.9 98.6 96.2

98.7

82.2

98.2

84.2 98.1

93.5 93.9

ADD-GCN [18]

99.7 96.1 98.2 96.4 85.8

95.9 95.0 98.9 86.5

96.8 84.6

98.6 97.5

96.5

98.8

82.5

97.1

84.0 98.7

93.5 94.1

SST [3]

99.7 98.7 98.6 98.9 84.4

96.5 98.3 98.5 83.3

97.8 89.2 98.5 98.6 98.2

99.1

86.9 98.4

86.3 99.6 94.7 95.2

ours

99.9 98.8 98.6 99.3 82.6

96.3 98.6 98.3 85.9 97.3 88.5

98.9

86.2

89.4 99.3

98.7 98.3

97.9

98.9

95.8 95.4

It is worth noting that our model does not require any pre-trained object detection model to generate a large number of proposals for locating object regions, nor does it require any statistical label co-occurrence information to capture semantic dependencies.

Performance on the Pascal VOC Datasets: Table 2 and Table 3 show the performance comparison of our model with other mainstream models on VOC 2007 and VOC 2012. Our algorithm achieves the best mAP of 95.0% and 95.4%, respectively. Note that, in order to make a fair comparison with SSGRL [2] and ADD-GCN [18], we quote the figures given in the recently published literature [3]. It can be seen from the tables that our algorithm performs well on most object categories in the datasets. Especially for difficult objects such as 'chair' and 'sofa', which are easily occluded, our algorithm shows superior recognition results. This also shows that the interaction between spatial correlations and semantic correlations can better identify such difficult objects that are often occluded.

4.4

Evaluation of the SSCI Effectiveness

In this part, we examine the contribution of each component of SSCI. First, we use the standard ResNet101 backbone network as the baseline. Then, we report the experimental results under different combinations of spatial correlations, semantic information and semantic correlations, where the semantic correlations are constructed through the proposed SCM module. From Table 4, we can see that adding the spatial correlation module or the semantic correlation module individually already improves the mAP on each dataset. Then

A Multi-label Image Recognition Algorithm

25

Table 4. Research on the ablation of spatial and semantic interaction. Spa-cor Sem Sem-cor MS-COCO VOC 2007 VOC2012 Baseline ours



ours



ours ours





81.5

92.3

93.3

82.6

94.5

93.8 94.9

83.3

94.8



83.6

94.3

94.9



84.1

95.0

95.4

we interact the spatial correlations with the semantic information, and its results are higher than those using spatial correlations alone but lower than those using semantic correlations alone. This shows that the semantic information plays a certain guiding role in the construction of spatial correlations; however, because the dependencies among these semantic cues are not established, this guidance is not perfect. Finally, we let the semantic correlations and spatial correlations interact, and the experimental results improve further. These results demonstrate the effectiveness of the proposed method.

5 Conclusion

In this work, we propose a multi-label image recognition algorithm named SSCI. SSCI first models the spatial correlations and semantic correlations of the labels respectively, and then maps the spatial correlations and semantic correlations to the same space and inputs them into the transformer to model the correlation between them. Extensive experiments conducted on public benchmarks demonstrate the effectiveness and rationality of our SSCI algorithm.

References 1. Alam, M.U., Baldvinsson, J.R., Wang, Y.: Exploring LRP and Grad-CAM visualization to interpret multi-label-multi-class pathology prediction using chest radiography. In: 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS), pp. 258–263 (2022) 2. Chen, T., Xu, M., Hui, X., Wu, H., Lin, L.: Learning semantic-specific graph representation for multi-label image recognition. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 522–531 (2019)


3. Chen, Z.M., Cui, Q., Zhao, B., Song, R., Zhang, X., Yoshie, O.: SST: spatial and semantic transformers for multi-label image recognition. IEEE Trans. Image Process. 31, 2570–2583 (2022) 4. Chen, Z.M., Wei, X.S., Wang, P., Guo, Y.: Multi-label image recognition with graph convolutional networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5172–5181 (2019) 5. Chen, Z.M., Wei, X.S., Wang, P., Guo, Y.: Learning graph convolutional networks for multi-label recognition and applications. IEEE Trans. Pattern Anal. Mach. Intell. 45(6), 6969–6983 (2023) 6. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2021) 7. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88, 303–338 (2010) 8. Gao, B.B., Zhou, H.Y.: Learning to discover multi-class attentional regions for multi-label image recognition. IEEE Trans. Image Process. 30, 5920–5932 (2021) 9. Guo, H., Zheng, K., Fan, X., Yu, H., Wang, S.: Visual attention consistency under image transforms for multi-label image classification. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 729–739 (2019) 10. Lanchantin, J., Wang, T., Ordonez, V., Qi, Y.: General multi-label image classification with transformers. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16473–16483 (2021) 11. Li, Yining, Huang, Chen, Loy, Chen Change, Tang, Xiaoou: Human attribute recognition by deep hierarchical contexts. In: Leibe, Bastian, Matas, Jiri, Sebe, Nicu, Welling, Max (eds.) ECCV 2016. LNCS, vol. 9910, pp. 684–700. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4 41 12. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, David, Pajdla, Tomas, Schiele, Bernt, Tuytelaars, Tinne (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-31910602-1 48 13. Shao, J., Kang, K., Loy, C.C., Wang, X.: Deeply learned attributes for crowded scene understanding. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4657–4666 (2015) 14. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017) 15. Wang, Y., Xie, Y., Liu, Y., Zhou, K., Li, X.: Fast graph convolution network based multi-label image recognition via cross-modal fusion, pp. 1575–1584. Association for Computing Machinery, New York, NY, USA (2020) 16. Wei, Y., et al.: HCP: a flexible CNN framework for multi-label image classification. IEEE Trans. Pattern Anal. Mach. Intell. 38(9), 1901–1907 (2016) 17. Xu, J., Huang, S., Zhou, F., Huangfu, L., Zeng, D., Liu, B.: Boosting multi-label image classification with complementary parallel self-distillation. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI, pp. 1495–1501 (2022) 18. Ye, Jin, He, Junjun, Peng, Xiaojiang, Wu, Wenhao, Qiao, Yu.: Attentiondriven dynamic graph convolutional network for multi-label image recognition. In: Vedaldi, Andrea, Bischof, Horst, Brox, Thomas, Frahm, Jan-Michael. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 649–665. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58589-1 39


19. Zhou, F., Huang, S., Xing, Y.: Deep semantic dictionary learning for multi-label image classification. ArXiv abs arXiv:2012.12509 (2020) 20. Zhou, F., Huang, S., Liu, B., Yang, D.: Multi-label image classification via category prototype compositional learning. IEEE Trans. Circuits Syst. Video Technol. 32(7), 4513–4525 (2022) 21. Zhu, F., Li, H., Ouyang, W., Yu, N., Wang, X.: Learning spatial regularization with image-level supervisions for multi-label image classification. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2027–2036 (2017). https://doi.org/10.1109/CVPR.2017.219

Hierarchical Spatial-Temporal Network for Skeleton-Based Temporal Action Segmentation Chenwei Tan1 , Tao Sun2(B) , Talas Fu1 , Yuhan Wang3 , Minjie Xu1 , and Shenglan Liu2 1

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China {tcw2000,oyontalas}@mail.dlut.edu.cn, [email protected] 2 School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian 116024, Liaoning, China {dlutst,liusl}@mail.dlut.edu.cn 3 School of Computer Science, Fudan University, Shanghai 200433, China [email protected] Abstract. Skeleton-based Temporal Action Segmentation (TAS) plays an important role in analyzing long videos of motion-centered human actions. Recent approaches perform spatial and temporal information modeling simultaneously in the spatial-temporal topological graph, leading to high computational costs due to the large graph magnitude. Additionally, multi-modal skeleton data has sufficient semantic information, which has not been fully explored. This paper proposes a Hierarchical Spatial-Temporal Network (HSTN) for skeleton-based TAS. In HSTN, the Multi-Branch Transfer Fusion (MBTF) module utilizes a multibranch graph convolution structure with an attention mechanism to capture spatial dependencies in multi-modal skeleton data. In addition, the Multi-Scale Temporal Convolution (MSTC) module aggregates spatial information and performs multi-scale temporal information modeling to capture long-range dependencies. Extensive experiments on two challenging datasets are performed and our proposed method outperforms the State-of-the-Art (SOTA) methods. Keywords: Temporal action segmentation Graph convolution

1

· Multi-modal fusion ·

Introduction

The TAS task has significant practical applications in video surveillance [3], activity analysis [1], and human-computer interaction [13]. It aims to assign frame-level labels to long untrimmed videos. This task is particularly challenging as it requires the precise classification of actions at a fine temporal granularity. Efficiently modeling long sequences of skeleton data in both the spatial and temporal dimensions is essential for skeleton-based TAS. Benjamin et al. [6] proposed MS-GCN to address this challenge. In the initial stage, MS-GCN uses


spatial graph convolutions and dilated temporal convolutions to exploit the spatial configuration of the joints and their long-term temporal dynamics. This approach allows MS-GCN to extract topological information from a spatialtemporal graph. However, skeleton-based TAS usually involves videos with long durations, resulting in the creation of large spatial-temporal graphs formed by skeleton sequences. Processing such large graphs leads to high computational complexity. Besides, MS-GCN adopts an early fusion approach for multi-modal skeleton data, which concatenates different modality skeleton data in the channel dimension. Nevertheless, concatenating multi-modal data without feature extraction is not the best approach for feature fusion [18]. In this paper, we introduce a hierarchical model that effectively captures the spatial-temporal correlations of skeleton sequences while maintaining computational efficiency. In contrast to MS-GCN, our network realizes spatial and temporal modeling through two sequential modules: the Multi-Branch Transfer Fusion (MBTF) module and the Multi-Scale Temporal Convolution (MSTC) module. By using such a hierarchical network structure, we can extract highlevel spatial information more comprehensively in MBTF. This is because it is more effective to model the correlations of joints within the same frame rather than modeling the correlations of all joints across all frames [26]. Afterwards, the MSTC module incorporates multiple dilated convolution layers with different dilation rates in parallel to capture temporal information at varying scales. As the MSTC module no longer needs to model spatial information, we aggregate high-level spatial semantic information to reduce the dimensionality of the feature data, which will help decrease the computational complexity of the model. In the MBTF module, we adopt different approaches for each branch to generate three distinct forms of feature representations: joint, velocity and bone features. To fuse these multi-modal features, we employ the attention mechanism to conduct mid-fusion. Our contributions are three-fold: (1) We introduce a model that is implemented using a novel hierarchical approach. Through this approach, the model can reduce computational complexity with pooling while ensuring discriminative feature extraction. (2) We propose the MBTF module to capture the enriched spatial semantic information from multi-modal skeleton data. Additionally, the MSTC module is proposed to capture temporal information across different scales. (3) Our method outperforms the state-of-the-art methods on two challenging datasets.

2 Related Work

2.1 Temporal Action Segmentation

Previous approaches identify action segments utilizing the sliding window and remove superfluous hypotheses with non-maximum suppression [11,19]. Recent advanced works involve Temporal Convolutional Networks (TCNs) to enlarge the temporal receptive field for modeling long sequences. Lea et al. [14] proposed an encoder-decoder TCN that incorporates pooling and upsampling in the


temporal dimension to capture long-distance dependencies. MS-TCN [5] introduced a cascaded architecture for refining the prediction through multiple stages. Instead of using temporal convolution, Yi et al. [25] attempted to model temporal sequence information using the Transformer architecture, leveraging its ability to model relations within sequence data. To address the over-segmentation problem, Wang et al. [23] and Ishikawa et al. [8] respectively proposed boundary prediction approaches in BCN and ASRF for obtaining boundaries. Wang et al. introduced local barrier pooling to aggregate local predictions using boundaries, while Ishikawa et al. used boundaries to rectify errors within action segments. For the skeleton-based TAS task, Benjamin et al. [6] introduced MS-GCN to address the inadequacy of the TCN model’s initial stage in capturing spatial hierarchies between human joints. 2.2

Skeleton-Based Action Recognition

Early skeleton-based action recognition methods can be classified into two categories: RNN and CNN. RNN approaches [4,9,21] treat the skeleton data as a temporal sequence and leverage the modeling capability of RNNs to extract temporal features for action recognition. CNN-based methods [12,15] transform the skeleton data into grid-like images and directly employ 2D CNNs for highlevel feature extraction. However, these methods have a limitation. They do not effectively learn the spatial relations between skeleton joints. This is due to the lack of utilization of the spatial topologies of skeleton data. Recent studies have focused on utilizing Graph Convolutional Networks (GCNs) to better capture the spatial information of skeleton data. ST-GCN [24], proposed by Yan el al., constructed a spatial-temporal graph and employed graph convolutions to learn the spatial-temporal dependencies of skeleton data. However, the heuristically predefined graph lacks the ability to fully represent dynamic human actions. To address this issue, 2 s-AGCN [20] proposed a novel adaptive GCN that learns the spatial adjacency graph from the data. Further, CTR-GCN [2] introduced a more refined graph learning method that learns not only a shared graph but also channel-specific graphs. Moreover, EfficientGCN [22] suggested an early fusion multi-input branch structure to extract sufficient semantic features and designed an efficient network with the bottleneck structure.

3 Method

3.1 Network Architecture

We propose a network composed of three components in sequence: the MultiBranch Transfer Fusion (MBTF) module, the Multi-Scale Temporal Convolution (MSTC) module, and the refinement module. Figure 2 depicts the overall structure of our proposed Hierarchical Spatial-Temporal Network (HSTN). Firstly, the MBTF utilizes three branches to extract body semantic information from three modalities of skeleton data and fuses the features of each branch using


the attention mechanism. Secondly, the MSTC aggregates features at different temporal scales and captures long-range dependencies. Lastly, the refinement module applies a two-stage TCN [5] to refine the initial prediction for improved continuity. The structure of TCN is illustrated in Fig. 1(b). The MBTF and MSTC together constitute the feature extractor of the network, which models the spatial and temporal information of the skeleton sequences.

Fig. 1. (a) Graph convolution block. δ denotes the BatchNorm. ⊕ denotes the element-wise summation. ⊗ denotes the matrix multiplication. Ai is the i-th normalized adjacency matrix. (b) Single-stage temporal convolution network and Dilated Unit. σ denotes the ReLU activation.

3.2 Multi-Branch Transfer Fusion Module

The Multi-Branch Transfer Fusion (MBTF) module is a frame-level module designed to capture the semantic information of the spatial graph in a single frame. Previous research [20,22] has demonstrated that fusing multi-modal skeleton data can enhance the performance of a model. To incorporate multimodal skeleton data, we utilize Graph Convolution Block (GCB) branches to extract spatial semantic information from different modalities. The structure of the GCB is displayed in Fig. 1(a). Each branch undergoes batch normalization to standardize the data. Since simple concatenation of the output of each branch, as performed in [22], is insufficient for fusing the information of multi-modal features, we introduce the multi-modal squeeze and excitation attention mechanism [10] at the output position of the branches to facilitate mid-level feature fusion. By leveraging the enhanced representation power of the network resulting from the attention mechanism, this module enables the more comprehensive fusion of features from the outputs of different branches. Finally, we employ a GCB to extract high-level features. In this work, we utilize three modalities of skeleton data [22]: (1) joint features, (2) velocity features and (3) bone features. Suppose that the original 3D coordinate set of a skeleton sequence is X = {x ∈ RC×T ×V ×M }, where C, T, V and M denote the coordinate dimensions,


sequence length, number of joints and number of individuals, respectively. In this study, we focus on single-person motion analysis where $M = 1$, hence simplifying the skeleton sequence to $X = \{x \in \mathbb{R}^{C \times T \times V}\}$. In the following sections, we use $x_{t,i} \in \mathbb{R}^C$ to denote the coordinates of joint $i$ at time $t$. The joint features are composed of absolute and relative coordinates, where the relative coordinates describe the position of all joints relative to the central joint of the human body. The relative coordinates, denoted as $r_{t,i}$, are calculated as $r_{t,i} = x_{t,i} - x_{t,c}$, where $c$ is the index of the central joint, which varies depending on how the skeleton data is obtained. The velocity features are concatenated from two sets of motion velocities. The motion velocity, denoted as $z_{t,i}^{(k)}$, is calculated as $z_{t,i}^{(k)} = x_{t+k,i} - x_{t,i}$, where $k$ denotes the frame gap. In this study, the velocity features are concatenated from velocities calculated with frame gaps of 1 and 2. The bone features are composed of the vectors formed by each joint and its first-order neighbor, along with the angle between them. Each bone vector is calculated as $b_{t,i} = x_{t,i} - x_{t,i_{adj}}$, and the angle of each bone is calculated by:

$$a_{t,i,w} = \arccos\left(\frac{b_{t,i,w}}{\lVert b_{t,i} \rVert}\right) \qquad (1)$$

Here, iadj represents the index of the neighboring joint for the i-th joint, while w represents the index of a particular dimension in 3-dimensional coordinates.
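The three modalities can be derived from the raw coordinates roughly as follows; this is our own sketch, in which the central-joint index and the parent-joint (first-order neighbour) table are placeholders that depend on the skeleton layout.

```python
import torch

def skeleton_modalities(x, center=1, adjacency=None):
    """Derive joint, velocity and bone features from raw coordinates x of shape
    (C=3, T, V). `center` and `adjacency` (parent joint per joint) are assumptions."""
    joint = torch.cat([x, x - x[:, :, center:center + 1]], dim=0)        # absolute + relative
    v1 = torch.zeros_like(x); v1[:, :-1] = x[:, 1:] - x[:, :-1]          # frame gap 1
    v2 = torch.zeros_like(x); v2[:, :-2] = x[:, 2:] - x[:, :-2]          # frame gap 2
    velocity = torch.cat([v1, v2], dim=0)
    if adjacency is None:
        adjacency = list(range(x.shape[2]))                              # placeholder parents
    bone = x - x[:, :, adjacency]                                        # joint minus neighbour
    angle = torch.acos(bone / (bone.norm(dim=0, keepdim=True) + 1e-6))   # Eq. (1)
    bone_feat = torch.cat([bone, angle], dim=0)
    return joint, velocity, bone_feat
```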

Fig. 2. The overall architecture of the proposed HSTN. ⊙ denotes the element-wise multiplication. σ denotes the ReLU activation.

3.3 Multi-Scale Temporal Convolution Module

The Multi-Scale Temporal Convolution (MSTC) module is designed to exploit long-term temporal relations at different scales. Processing the complete skeleton sequences leads to significant computational demands. Therefore, we introduce a spatial mean pooling layer to merge spatial information within each frame, thus reducing the computational burden while retaining spatial information. The first layer of this module utilizes convolutions with varying dilation rates to capture temporal patterns across diverse temporal scales. We employ a 1 × 1 convolution for feature fusion. The receptive field is gradually expanded by stacking 9 layers of Dilated Units, where dilation rates increase with layer number. Figure 1(b) depicts the structure of the Dilated Unit. Finally, we apply a fully connected layer to obtain the initial prediction. 3.4

Loss Function

The loss function consists of a frame-level cross-entropy loss $L_{cls}$ for classification and a smoothing loss $L_{smo}$ [5] that penalizes discontinuities in the predictions:

$$L = L_{cls} + \lambda L_{smo} = \frac{1}{T}\sum_{t} -\log\left(\hat{y}_{t,c}\right) + \frac{\lambda}{TC}\sum_{t,c}\left(\log\left(\hat{y}_{t-1,c}\right) - \log\left(\hat{y}_{t,c}\right)\right)^{2} \qquad (2)$$

T is the number of video frames. C is the number of action classes. yˆt,c denotes the network’s predicted probability of label c at time t. λ is the hyperparameter that controls the weight of Lsmo . It should be noted that this model has multiple stages, and the final loss is obtained by combining the losses of all stages.
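A minimal single-stage sketch of Eq. (2) in PyTorch is given below; detaching the previous frame and clamping the log-probability differences follow common MS-TCN practice [5] rather than Eq. (2) literally, and are therefore assumptions.

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits, labels, lam=0.15, tau=4.0):
    """Frame-wise cross-entropy plus a smoothing term on consecutive frames.
    logits: (T, C) frame scores, labels: (T,) class indices."""
    cls = F.cross_entropy(logits, labels)                       # L_cls
    logp = F.log_softmax(logits, dim=1)
    diff = (logp[1:] - logp[:-1].detach()).clamp(min=-tau, max=tau)
    smooth = (diff ** 2).mean()                                  # L_smo
    return cls + lam * smooth
```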

4 Experiments

4.1 Setup

Datasets. Our work evaluates the model’s performance on two benchmark skeleton-based TAS datasets, namely Peking University - Continuous MultiModal Human Action Understanding (PKU-MMD) [16] and Motion-Centered Figure Skating (MCFS) [17]. PKU-MMD is a 3D human action understanding dataset with 1009 videos of 52 distinct actions captured from three camera perspectives on 13 subjects. The videos contain 25 body joints and have a frame rate of 30 frames per second. MCFS is a motion-centered dataset comprising 271 single-person figure skating videos of varying lengths between 2 to 72 s. It utilizes classification criteria at three levels, where a sub-dataset with 22 classes (MCFS22) and another sub-dataset with 130 classes (MCFS130) were used for this study. During training and testing, features with full video frame rates were used for both datasets. To assess model performance, we employed the fixed test/train partition for PKU-MMD and the five-fold cross-validation for MCFS.


Metrics. The effectiveness is assessed using several metrics, including frame-level accuracy, the normalized edit distance score, and F1 scores for segment overlap at thresholds of 10%, 25% and 50%.

Implementation Details. The branch and trunk GCBs of the MBTF module have 6/32 and 96/64 input/output channels, respectively. During training, the loss function employs λ = 0.15, while using the Adam optimizer with a learning rate of 0.0005. For all experiments, the model was trained for 100 epochs with a batch size of 4 on a single NVIDIA GeForce RTX 2080 GPU.

4.2 Effect of Hierarchical Model

In HSTN, we employ a hierarchical approach to model the spatial-temporal information in the skeleton sequences. To validate its effectiveness, we perform a comparative analysis between HSTN and MS-GCN (a non-hierarchical model) at the initial stage. The results are presented in Table 1 and Table 2 below. Benefiting from the hierarchical structure, the MSTC module no longer needs to maintain the spatial dimension of the skeleton data, thus enabling the aggregation of spatial features by the pooling layer. This allows HSTN to require fewer FLOPs and less memory than MS-GCN. Specifically, on the MCFS22 dataset, HSTN demands approximately 1/5 of the FLOPs and 1/2 of the memory required by MS-GCN. Moreover, this hierarchical framework facilitates a more effective capture of spatial dependencies. At the initial stage, both HSTN and MS-GCN employ an equal number of temporal convolution layers. However, despite employing fewer graph convolutions, HSTN achieves higher accuracy, surpassing MS-GCN by 4.1%. Nevertheless, while HSTN falls behind MS-GCN in terms of segment-level evaluation metrics, these deficiencies can be addressed through the subsequent refinement module.

Table 1. Comparison of hierarchical model and non-hierarchical model on MCFS22. * denotes the initial stage.

| Method  | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|---------|------|------|-------|-------|-------|
| MS-GCN* | 74.2 | 17.2 | 22.5  | 19.7  | 15.2  |
| Ours*   | 78.3 | 12.0 | 15.4  | 14.1  | 11.6  |

Table 2. FLOPs, Params and GPU memory cost of MS-GCN and HSTN on MCFS22 (batch size 1).

| Method  | FLOPs | Params | Memory |
|---------|-------|--------|--------|
| MS-GCN* | 17.6G | 255.0K | 6.6G   |
| Ours*   | 3.4G  | 258.8K | 2.8G   |

4.3 Effect of Multiple Modalities

Multi-modal skeleton data is valuable because it provides rich semantic information. In this section, we conduct ablation experiments on models with varying numbers of branches and modalities to assess their impact on performance. The results are shown in Table 3. The findings indicate that the use of multi-modal data can effectively improve the performance of the model. Specifically, the model that employs multi-modal data outperforms the one that uses uni-modal data by 3.7% in accuracy and 3.4% in F1@50, respectively. Among models with the same number of branches, those employing the velocity modality perform better. This suggests that, compared to the static information of joint and bone features, the dynamic information of velocity features can better represent the characteristics of actions. However, for the PKU-MMD dataset, using all modalities does not result in the best performance. On the contrary, the performance even slightly decreases compared to the best-performing 2-branch model. Hence, for the PKU-MMD dataset, we adopt the 2-branch model.

Table 3. Ablation study on the number of branches and modalities on PKU-MMD.

| Input          | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|----------------|------|------|-------|-------|-------|
| Joint          | 68.0 | 69.3 | 68.8  | 63.5  | 50.5  |
| Velocity       | 70.2 | 70.7 | 70.8  | 65.6  | 53.4  |
| Bone           | 69.0 | 69.4 | 69.1  | 63.6  | 50.0  |
| Joint+Velocity | 72.7 | 71.6 | 73.1  | 68.6  | 55.9  |
| Joint+Bone     | 70.4 | 70.1 | 70.1  | 65.1  | 52.4  |
| Velocity+Bone  | 73.9 | 73.5 | 74.0  | 69.3  | 56.8  |
| All            | 73.4 | 72.6 | 73.3  | 68.4  | 56.0  |

4.4 Effect of Multi-modal Fusion Methods

To verify the effectiveness of our proposed MBTF module in feature fusion, we compare its performance against the Multi-Input Branches (MIB) module, which directly concatenates features from different branches. The experimental results on the PKU-MMD dataset using models with different numbers of branches are reported in Table 4. The results indicate that across all metrics, MBTF outperforms MIB when using 1-branch or 2-branch models. In the case of a 1-branch model, the attention structure in MBTF degrades to SE attention [7]. Despite this, MBTF still demonstrates an improvement compared to MIB, as the attention mechanism enhances the model's expressive power. Meanwhile, with a 2-branch model, MBTF shows the best performance improvement compared to MIB, by up to 1.7% and 1.4% in accuracy and F1@50, respectively. This implies that MBTF can more effectively fuse multi-modal data. However, when using a 3-branch model, the performance of the two modules is similar. This indicates that on the PKU-MMD dataset, the attention mechanism in MBTF causes overfitting when there are enough modalities, resulting in a performance decline. Overall, the results show that MBTF can facilitate the fusion of multi-modal data.

Table 4. Comparison of MBTF and MIB on PKU-MMD.

| Input                | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|----------------------|------|------|-------|-------|-------|
| Velocity (MIB)       | 69.3 | 70.1 | 70.4  | 65.2  | 52.4  |
| Velocity (MBTF)      | 70.2 | 70.7 | 70.8  | 65.6  | 53.4  |
| Velocity+Bone (MIB)  | 72.2 | 72.3 | 73.1  | 68.0  | 55.4  |
| Velocity+Bone (MBTF) | 73.9 | 73.5 | 74.0  | 69.3  | 56.8  |
| All (MIB)            | 72.4 | 73.0 | 73.7  | 68.4  | 55.6  |
| All (MBTF)           | 73.4 | 72.6 | 73.3  | 68.4  | 56.0  |

4.5 Effect of Multi-Scale Temporal Convolution

In order to investigate the impact of multi-scale temporal information on model performance, we conduct a comparative experiment between the MSTC module and the Single-Scale Temporal Convolution (SSTC) module. In the MSTC module, temporal convolutions with different dilation rates are used in the first layer. This helps the model capture local information at different temporal scales. In contrast, the SSTC module uses the Dilated Unit in the first layer, while the other parts remain consistent with the MSTC module. In the SSTC module, the dilation rate of the first Dilated Unit is set to 1, which means that this module can only capture information from three adjacent frames in its first layer. The results in Table 5 demonstrate that the use of multi-scale temporal information leads to a comprehensive improvement in the model's performance, with increases of 1% and 1.6% in accuracy and F1@50, respectively.

Table 5. Comparing single-scale and multi-scale temporal convolution on PKU-MMD.

| Method       | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|--------------|------|------|-------|-------|-------|
| single-scale | 72.9 | 71.9 | 73.1  | 68.3  | 55.2  |
| multi-scale  | 73.9 | 73.5 | 74.0  | 69.3  | 56.8  |

4.6 Comparison with State-of-the-Art

In this section, we compare our proposed HSTN with classical TAS methods and the SOTA skeleton-based TAS methods on the PKU-MMD and MCFS datasets. As depicted in Table 6 and Table 7, our model outperforms other methods in all metrics on the two datasets, with improvements of up to 3.8% and 4.2% in edit distance and F1 scores on the MCFS22 dataset. We visualize the predictions as shown in Fig. 4. Due to the large GPU memory consumption of the MS-GCN model on the MCFS dataset, we reduce its batch size to 1. Additionally, as illustrated in Fig. 3, the stacked bar chart shows the improvements in evaluation metrics brought by different stages of the model. It is evident that our method achieves higher accuracy in the initial stage than MS-GCN, indicating that the features extracted from the initial stage of our model are more discriminative. The initial stage contributes to the improvement in accuracy, while the refinement stage refines the initial prediction to enhance continuity. The improvement in continuity is based on the accuracy of the initial prediction.

Table 6. Comparison with the state-of-the-art methods on PKU-MMD, MCFS22. Underlined denotes that the result was obtained using a batch size of 1.

PKU-MMD:

| Method        | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|---------------|------|------|-------|-------|-------|
| MS-TCN [5]    | 67.4 | 68.2 | 68.5  | 63.3  | 49.0  |
| ASFormer [25] | 68.1 | 69.5 | 69.4  | 64.0  | 50.7  |
| MS-GCN [6]    | 72.5 | 71.7 | 71.0  | 65.3  | 52.9  |
| Ours          | 73.9 | 73.5 | 74.0  | 69.3  | 56.8  |

MCFS22:

| Method        | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|---------------|------|------|-------|-------|-------|
| MS-TCN [5]    | 72.6 | 63.0 | 63.4  | 58.4  | 47.5  |
| ASFormer [25] | 70.4 | 63.6 | 64.0  | 59.3  | 47.6  |
| MS-GCN [6]    | 74.1 | 66.4 | 67.3  | 63.3  | 53.9  |
| Ours          | 76.8 | 70.2 | 71.5  | 67.4  | 57.7  |

Table 7. Comparison with the state-of-the-art methods on MCFS130.

| Method        | Acc  | Edit | F1@10 | F1@25 | F1@50 |
|---------------|------|------|-------|-------|-------|
| MS-TCN [5]    | 64.8 | 55.7 | 56.0  | 51.3  | 41.3  |
| ASFormer [25] | 63.7 | 53.5 | 53.9  | 49.2  | 38.2  |
| MS-GCN [6]    | 64.4 | 52.1 | 52.7  | 49.2  | 39.7  |
| Ours          | 68.1 | 60.0 | 61.7  | 57.5  | 48.1  |

Fig. 3. Impact of different stages of MS-GCN and HSTN on metrics on MCFS22.

Fig. 4. The visualization of qualitative results on MCFS22 for different methods.


5 Conclusion

This paper proposes a network with a novel hierarchical structure for skeleton-based TAS, called the Hierarchical Spatial-Temporal Network (HSTN). This network models the spatial-temporal information of skeleton data through two sequential modules. First, a frame-level module is proposed to model spatial information for topological graphs. This module handles multi-modal skeleton data by utilizing a multi-branch graph convolution structure with an attention mechanism. We also propose a temporal module to capture long-range dependencies in the skeleton sequences at different scales. Our proposed method achieves state-of-the-art results on two benchmark datasets.

References

1. Chen, H.T., Chen, H.S., Lee, S.Y.: Physics-based ball tracking in volleyball videos with its applications to set type recognition and action detection. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 1, pp. I–1097. IEEE (2007)
2. Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., Hu, W.: Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13359–13368 (2021)
3. Collins, R.T., Lipton, A.J., Kanade, T.: Introduction to the special section on video surveillance. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 745–746 (2000)
4. Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118 (2015)
5. Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3575–3584 (2019)
6. Filtjens, B., Vanrumste, B., Slaets, P.: Skeleton-based action segmentation with multi-stage spatial-temporal graph convolutional neural networks. IEEE Trans. Emerg. Top. Comput. 1–11 (2022)
7. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
8. Ishikawa, Y., Kasai, S., Aoki, Y., Kataoka, H.: Alleviating over-segmentation errors by detecting action boundaries. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2322–2331 (2021)
9. Jiang, X., Xu, K., Sun, T.: Action recognition scheme based on skeleton representation with DS-LSTM network. IEEE Trans. Circuits Syst. Video Technol. 30(7), 2129–2140 (2019)
10. Joze, H.R.V., Shaban, A., Iuzzolino, M.L., Koishida, K.: MMTM: multimodal transfer module for CNN fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13289–13299 (2020)
11. Karaman, S., Seidenari, L., Del Bimbo, A.: Fast saliency based pooling of fisher encoded dense trajectories. In: ECCV THUMOS Workshop, p. 5 (2014)


12. Kim, T.S., Reiter, A.: Interpretable 3D human action analysis with temporal convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1623–1631. IEEE (2017)
13. Krüger, V., Kragic, D., Ude, A., Geib, C.: The meaning of action: a review on action recognition and mapping. Adv. Robot. 21(13), 1473–1501 (2007)
14. Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 156–165 (2017)
15. Li, C., Zhong, Q., Xie, D., Pu, S.: Skeleton-based action recognition with convolutional neural networks. In: 2017 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 597–600. IEEE (2017)
16. Liu, C., Hu, Y., Li, Y., Song, S., Liu, J.: PKU-MMD: a large scale benchmark for continuous multi-modal human action understanding. arXiv preprint arXiv:1703.07475 (2017)
17. Liu, S., et al.: Temporal segmentation of fine-gained semantic action: a motion-centered figure skating dataset. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2163–2171 (2021)
18. Ramachandram, D., Taylor, G.W.: Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process. Mag. 34(6), 96–108 (2017)
19. Rohrbach, M., Amin, S., Andriluka, M., Schiele, B.: A database for fine grained activity detection of cooking activities. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1194–1201. IEEE (2012)
20. Shi, L., Zhang, Y., Cheng, J., Lu, H.: Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12026–12035 (2019)
21. Song, S., Lan, C., Xing, J., Zeng, W., Liu, J.: An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proceedings of the AAAI Conference on Artificial Intelligence (2017)
22. Song, Y.F., Zhang, Z., Shan, C., Wang, L.: Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1625–1633 (2020)
23. Wang, Z., Gao, Z., Wang, L., Li, Z., Wu, G.: Boundary-aware cascade networks for temporal action segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 34–51. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2_3
24. Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7444–7452 (2018)
25. Yi, F., Wen, H., Jiang, T.: ASFormer: transformer for action segmentation, p. 236 (2021)
26. Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., Zheng, N.: Semantics-guided neural networks for efficient skeleton-based human action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1112–1121 (2020)

Multi-behavior Enhanced Graph Neural Networks for Social Recommendation

Xinglong Wu1, Anfeng Huang1, Hongwei Yang1, Hui He1(B), Yu Tai1, and Weizhe Zhang1,2

1 Harbin Institute of Technology, Harbin 150001, China
{xlwu,1190300425}@stu.hit.edu.cn, {yanghongwei,hehui,taiyu,wzzhang}@hit.edu.cn
2 Pengcheng Laboratory, Shenzhen 518055, China

Abstract. Social recommendation has gained increasing attention by utilizing social relationships among users to alleviate the data sparsity problem in collaborative filtering. Most existing social recommendation approaches treat the preference propagation process in a coarse-grained manner, ignoring the different diffusion patterns targeting the corresponding interaction behaviors. However, this may be inappropriate because of the interplay between multi-behavior and social relations. Therefore, in this paper, we propose a novel framework, MB-Soc, for Multi-Behavior Enhanced Social Recommendation, to model the mutual effect between users' multiple behaviors and social connections. In MB-Soc, we first devise a single behavior-based social diffusion module to depict behavioral trust propagation. Moreover, to support behavior integration, we propose an intent embedding to ensure behavior independence. In addition, we design a Self-Supervised Learning-based behavior integration module to capture the correlations among multiple behaviors. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of our model.

Keywords: Social Recommendation · Multi-behavior Recommendation · Graph Neural Networks · Self-Supervised Learning

1 Introduction

Recommender Systems (RSs) play an essential role in our daily life due to data explosion and information overload, focusing on mining the latent user-item distribution patterns and exploring the user preference diffusion paradigm based on historical interactions. Collaborative Filtering (CF) [19] has become the mainstream recommendation solution: it collaboratively factorizes interaction records between users and items into a latent representation space and then utilizes these latent embeddings to predict potential interactions. However, data sparsity and cold-start problems remain the main challenges in CF scenarios that seriously impede recommendation performance. To address such challenges, social recommendation has garnered significant attention in recent years and has become a mainstream recommendation branch.


Social recommendation utilizes the trust regularization or preference propagation between users provided by Social Networking Services (SNSs) to enhance recommendation performance. The core idea lies in social influence theory, which proposes that users tend to share similar items with their friends through social connections [11]. Based on this theory, social recommendation methods can be generally divided into two categories: (1) relationship regularization-based CF models and (2) trust propagation-based CF models. Specifically, relationship regularization-based methods [8,10,22] utilize the social relationships as auxiliary restrictions to enhance the representations of users. Trust propagation-based methods [4,14,18] explicitly model the preference diffusion process via learning representations obtained from social connections. Although these methods have shown promising results, a prominent drawback is that they model the trust diffusion process only at a coarse-grained level and ignore the microscopic propagation under multi-behavior in social recommendations. We reckon that ignoring the heterogeneous social influence diffusion patterns under multiple behaviors may result in a suboptimal recommendation result. The multi-typed behaviors of users may differ in the quantity and scope of their influence on socially connected users. If we can leverage such differentiated propagation information between multiple behaviors, it is possible to depict the comprehensive trust diffusion process and thus improve the performance of social recommendation. This is also explained in the following example.

Fig. 1. A toy example: Multi-behavior enhanced social RS generates precise recommendations with microscopic behavior calculated. E.g., Andy’s next click would be electronic products or fruits, influenced by Bob or Ellen with click behavior (best viewed in color).

A Motivating Example. We illustrate the abovementioned phenomenon and explicate our motivation with a toy example, as shown in Fig. 1. Each user possesses multiple interaction behaviors and distinct preference types (e.g., Bob prefers electronic products; Cyril prefers sports; Dom prefers official accessories; Ellen prefers fruits), with different behaviors propagating through users individually. Conventional social RSs overlook microscopic multi-typed behaviors, and users' preferences are propagated integrally, blind to the influence of different behaviors. Taking Andy as the ego user, we can see that conventional social RSs cannot differentiate between various social connections, and the recommendation to Andy might be inaccurate since his interaction sequence contains multiple types of items. However, a behavior-aware RS can generate more precise recommendations according to preference diffusion and behavior influence with multi-behavioral information included. Specifically, with behavior information calculated, we can infer that Andy's next potential click would be electronic products or fruit, influenced by Bob and Ellen through the same behavior type—'click'.

As illustrated in the above example, behavior information would promote the preference diffusion process in the trust network and thus improve the performance of social recommendations. However, traditional social recommenders only depict users as integral units without differentiating the trust propagation process with respect to interaction behavior. Thus, the novel multi-behavior social recommendation architecture faces a new challenge that has not been addressed by the current literature, i.e., CH1: 'how to propagate behavioral preferences and trust information among users uniformly?' Moreover, to distinguish different facets of multi-behavior and effectively extract the main characteristics of entities in multi-behavior recommendations, exploring the latent correlation between multiple behaviors is essential. Hence, the second challenge of our work is CH2: 'how to model the heterogeneity in multi-behavior recommendations and dig into their relevance?'

Our Approach and Contributions. Facing the above challenges, we propose a multi-behavior-enhanced social recommendation model to explore the fine-grained trust propagation in social recommendations. Our main contributions are summarized below:

– We highlight the importance of multi-behavior enhancement of social recommendations. To the best of our knowledge, we are the first to explore the multi-behavior social recommendation problem.
– Targeting CH1, we propose a Graph Neural Network (GNN)-based multi-behavior social recommendation architecture, dubbed MB-Soc, to combine the propagation of behavioral feedback and social connections.
– Targeting CH2, we devise a Self-Supervised Learning (SSL)-based multi-behavior integration framework to ensure the relevance and variety of multiple behaviors simultaneously and explore the latent intent of users.
– Extensive experiments conducted on two real-world datasets demonstrate the superiority and effectiveness of our MB-Soc model.

2 Related Work

In this section, we review the related literature in two main categories: (1) social recommendation and (2) multi-behavior recommendation.

Social Recommendation: Social recommendation [1] utilizes social connections as side information to enhance recommendation and alleviate the effect of data sparsity problems. Based on the integration paradigm of social relationships, we generally categorize social recommenders into two main groups: (1) social regularization methods and (2) trust propagation methods.

Social Regularization Methods: Early social RSs regularize the representations of socially connected users (friends or social neighbors) on the basis of CF, considering the social constraints as an auxiliary optimization objective. User representations are socially enhanced with different learning paradigms, including (1) shared representation-based factorization (e.g., TrustMF [22] and LOCABAL [15]); and (2) representation reconstruction (e.g., SoReg [10] and SocialMF [8]).

Trust Propagation Methods: DNNs have prospered owing to their significant representation learning power. Thus, explicit modeling of the trust diffusion process has gained much attention and become the state-of-the-art approach. Different learning paradigms have been absorbed to learn various diffusion properties of trust propagation. We roughly review the related literature in the following three categories: (1) GNN-based methods integrate neighbor embeddings on social graphs (including DiffNet [18] and GraphRec [4]). (2) Multi-Layer Perceptron (MLP)-based methods utilize the DNN architecture to explore the latent relationships in social connections (e.g., DGRec [13]). (3) Recurrent Neural Network (RNN)-based methods utilize the chronological order to dig into the dependency relationships (e.g., ARSE [14]). However, these methods ignore the heterogeneous behavior distribution.

Multi-behavior Recommendation: Multi-behavior recommendation is flourishing in recommendation scenarios, aiming to explore the relevance between different types of interactions. The multi-behavior information is mainly utilized as auxiliary knowledge for different recommendation tasks. For example, GNMR [20] augments CF by exploiting multi-behavior patterns. MB-STR [23] introduces multi-behavior sequences into sequential recommendations. Similarly, KHGT [21] applies it to knowledge-graph-enhanced recommendations. However, integrating behavior information into social RSs remains unexplored. Integrating multi-behavior CF into interdisciplinary scenarios also holds great promise. ECF-S/W [9] demonstrates impressive performance in lung cancer treatment, thereby advancing the application of CF in medical science. Calculating multi-behaviors can further enhance the application of CF methods in practical settings.

3 Preliminaries

We begin by introducing the key notations and represent the typical recommendation scenario with a user set U = {u_1, u_2, ..., u_M} and an item set I = {i_1, i_2, ..., i_N}, where M = |U| and N = |I| denote the numbers of users and items. Moreover, we further define the multi-behavior interaction graph as:

Multi-behavior Interaction Graph G_I. With the awareness of user-item interaction under different behaviors, we construct the multi-behavior interaction graph as G_I = (U, I, E_I), where E_I ⊆ U × I contains a total of B types of behaviors (e.g., add-to-cart, add-to-favorite, buy, etc.). For each e^b_{ui} ∈ E_I, e^b_{ui} = 1 denotes that user u has interacted with item i under behavior type b, where b = 1, 2, ..., B; otherwise, e^b_{ui} = 0 indicates no interaction between u and i under behavior type b.

Social Trust Graph G_S. The social trust graph is constructed to incorporate the trust connection information contained in social relationships: G_S = (U, E_S), where E_S ⊆ U × U denotes the trust connections. For each e^s_{uv} ∈ E_S, e^s_{uv} = 1 denotes that users u and v are social neighbors (friends) in the social graph; otherwise, if e^s_{uv} = 0, they are not socially connected.

Problem Statement: On the basis of the above-defined notations, we give the formal definition of the multi-behavior social recommendation as follows: Given a set of users U, a set of items I, the multi-behavior interaction graph G_I between users and items, and the social trust graph G_S between users, the goal of multi-behavior social recommendation is to predict the probability y_{ui} of the interaction between user u and item i under the behavior of type b.
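As a concrete illustration of these definitions, the following sketch builds the indicator structures of G_I and G_S from raw interaction and friendship lists; treating trust links as undirected is an assumption made here for simplicity.

```python
import numpy as np

def build_graphs(interactions, friendships, M, N, B):
    """Build the multi-behavior interaction tensor and the social adjacency matrix.

    interactions: iterable of (u, i, b) triples with behavior type b in [0, B).
    friendships:  iterable of (u, v) user pairs.
    """
    e_I = np.zeros((B, M, N), dtype=np.int8)   # e^b_{ui} indicators of G_I
    for u, i, b in interactions:
        e_I[b, u, i] = 1

    e_S = np.zeros((M, M), dtype=np.int8)      # e^s_{uv} indicators of G_S
    for u, v in friendships:
        e_S[u, v] = e_S[v, u] = 1              # trust links assumed undirected
    return e_I, e_S
```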

4 Methodology

In this section, we introduce our MB-Soc model that incorporates fine-grained multi-behavior information into social recommendation architecture as shown in Fig. 2. The MB-Soc framework consists of four main parts, i.e., (1) Embedding Layer, (2) Propagation Layer, (3) Multi-Behavior Integration Layer, and (4) Prediction Layer. We provide detailed explanations of each component’s implementation in the following parts.

Fig. 2. Architecture of MB-Soc. Embedding Layer generates behavior and intent embeddings; Propagation Layer performs single-behavior propagation and trust diffusion; Multi-Behavior Integration Layer integrates multi-behavior embeddings of each entity; Prediction Layer predicts pair-wise interaction scores.

4.1 Embedding Layer

The Embedding Layer takes the user and item IDs as input and generates free ID embeddings for further calculation. We represent the ID embeddings of user u and item i under the behavior of type b ∈ {1, ..., B} with e^{(0)}_{u,b} ∈ R^d and e^{(0)}_{i,b} ∈ R^d, respectively, where d is the embedding size. Notably, we represent each behavior type b ∈ {1, ..., B} with a distinct behavior embedding. Moreover, for each user u and item i, we initialize a latent intent embedding to model the integral intention of the entity under different behaviors, which we denote as e_{u,0} and e_{i,0}, respectively. Since it is not our primary focus, we uniformly represent all embeddings with d-dimensional vectors for computational convenience.
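A minimal implementation of this layer could look as follows; packing the B behavior embeddings and the intent embedding of each entity into a single table is an implementation convenience assumed here, not necessarily the authors' layout.

```python
import torch.nn as nn

class MBSocEmbedding(nn.Module):
    """Per-behavior ID embeddings plus one latent intent embedding per entity."""
    def __init__(self, num_users, num_items, num_behaviors, d=64):
        super().__init__()
        # slot 0 stores the intent embedding e_{.,0}; slots 1..B the behavior embeddings
        self.user = nn.Embedding(num_users, (num_behaviors + 1) * d)
        self.item = nn.Embedding(num_items, (num_behaviors + 1) * d)
        self.B, self.d = num_behaviors, d

    def forward(self, u_idx, i_idx):
        eu = self.user(u_idx).view(-1, self.B + 1, self.d)
        ei = self.item(i_idx).view(-1, self.B + 1, self.d)
        return eu, ei      # [:, 0] is the intent, [:, b] the behavior-b embedding
```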

4.2 Propagation Layer

We employ the Propagation Layer for modeling the embedding propagation under a single behavior type b. The propagation process is subdivided into the Behavior Propagation Module and the Trust Diffusion Module, taking charge of user-item preference inference and social influence formulation, respectively.

Behavior Propagation Module. The Behavior Propagation Module is responsible for disseminating the single-behavior embeddings initialized in the Embedding Layer. Regarding the graph-structured interaction data, we naturally resort to GNNs to portray the diffusion process. Specifically, in the Behavior Propagation Module, we mainly depict the intra-behavior preference diffusion phenomenon under user-item interactions within a single behavior type b. To do so, we first split the multi-behavior graph into B single-behavior graphs, namely G^1_I, G^2_I, ..., G^B_I. Formally, for any behavior type b, we define the single-behavior graph as G^b_I = {U, I, E^b_I}, where E^b_I = {e^{b'}_{ui} | e^{b'}_{ui} = 1 ∧ b' = b}. On the single-behavior interaction graph G^b_I, for any adjacent tuplet (u, i), we define the message propagation as:

$$e^{(l+1)}_{u,b} \leftarrow \sum_{i \in N^b_u} \frac{1}{\sqrt{|N^b_u|}\cdot\sqrt{|N^b_i|}}\, e^{(l)}_{i,b}, \qquad e^{(l+1)}_{i,b} \leftarrow \sum_{u \in N^b_i} \frac{1}{\sqrt{|N^b_i|}\cdot\sqrt{|N^b_u|}}\, e^{(l)}_{u,b}, \qquad (1)$$

where N^b_u and N^b_i denote the adjacent neighbors of user u and item i under behavior type b. We employ a symmetric normalization to avoid the imbalance caused by different numbers of neighbors.

Trust Diffusion Module. We further proceed with the trust propagation process in the Trust Diffusion Module to calculate the interplay of social users. To assess the pairwise influence between social users, we design a Graph Attention Network (GAT) for trust diffusion calculation. Specifically, we design a two-layer MLP network to calculate the pairwise attention coefficient:

$$\beta^b_{uv} = W_2 \cdot \sigma\!\left(W_1 \cdot \left(e^{(l)}_{u,b} \,\|\, e^{(l)}_{v,b}\right) + b_1\right) + b_2, \qquad (2)$$

where W_1 ∈ R^{d×2d}, b_1 ∈ R^d, W_2 ∈ R^d, and b_2 ∈ R are trainable parameters. We opt for the sigmoid function for σ(·). We additionally apply a softmax operation on the attention score β^b_{uv} to obtain the formal attention score α^b_{uv} as follows:

$$\alpha^b_{uv} = \frac{\exp\!\left(\beta^b_{uv}\right)}{\sum_{v' \in N^S_u} \exp\!\left(\beta^b_{uv'}\right)}, \qquad (3)$$

where N^S_u denotes the adjacent neighbors of user u on graph G_S. After acquiring the attention coefficient between any social tuplet, we formulate the graph attention diffusion process as:

$$h^{(l+1)}_{u,b} = \sum_{v \in N^S_u} \alpha^b_{uv}\, e^{(l)}_{v,b}. \qquad (4)$$
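The two modules can be sketched with dense matrices as follows. The first function performs one hop of the symmetric-normalized propagation of Eqn. (1); the class implements the pairwise attention of Eqns. (2)–(4) over the social adjacency matrix, and the final averaging anticipates the skip-connection of Eqn. (5) introduced next. Real implementations would use sparse operations and mini-batching; shapes and names here are illustrative.

```python
import torch
import torch.nn as nn

def behavior_propagation(e_user, e_item, adj_b):
    """One hop of Eqn. (1): symmetric-normalized propagation on a single-behavior graph.

    e_user: (M, d), e_item: (N, d), adj_b: (M, N) binary interaction matrix for behavior b.
    """
    d_u = adj_b.sum(1, keepdim=True).clamp(min=1)        # |N_u^b|
    d_i = adj_b.sum(0, keepdim=True).clamp(min=1)        # |N_i^b|
    norm = adj_b / torch.sqrt(d_u * d_i)                  # 1 / sqrt(|N_u^b| |N_i^b|)
    return norm @ e_item, norm.t() @ e_user               # new user / item embeddings

class TrustDiffusion(nn.Module):
    """Eqns. (2)-(4): attention over social neighbors for one behavior type."""
    def __init__(self, d):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid(), nn.Linear(d, 1))

    def forward(self, e_user, social_adj):                # social_adj: (M, M) 0/1
        M = e_user.size(0)
        pair = torch.cat([e_user.unsqueeze(1).expand(M, M, -1),
                          e_user.unsqueeze(0).expand(M, M, -1)], dim=-1)
        beta = self.mlp(pair).squeeze(-1)                  # raw scores beta_{uv}
        beta = beta.masked_fill(social_adj == 0, float('-inf'))
        alpha = torch.softmax(beta, dim=1)                 # Eqn. (3)
        alpha = torch.nan_to_num(alpha)                    # users without friends
        h = alpha @ e_user                                 # Eqn. (4)
        return (h + e_user) / 2                            # mean-pooling skip, Eqn. (5)
```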

All the adjacent social users transmit their social influence attentively with the attention weight α^b_{uv} under the single behavior type b. Furthermore, to overcome the gradient calculation problem, we apply a skip-connection to the social representations and utilize the Mean-Pooling operation on h^{(l+1)}_{u,b} and e^{(l)}_{u,b}:

$$e^{(l+1)}_{u,b} \leftarrow \mathrm{Mean\text{-}Pooling}\!\left(h^{(l+1)}_{u,b},\, e^{(l)}_{u,b}\right). \qquad (5)$$

Multi-hop Propagation. Stacking multiple GNN layers can expand the receptive field [6,17], capturing the correlations in broader scopes. We generalize the 1-hop propagation in Eqn. (1) and the Trust Diffusion Module to multiple hops, and obtain the embeddings of user u (item i) at propagation depth l ∈ {1, 2, ..., L}, i.e., e^{(l)}_{u,b} (e^{(l)}_{i,b}), where L denotes the maximum propagation depth. The overall single-behavior representations e^*_{u,b} and e^*_{i,b} are obtained with mean-pooling, representing the equalized behavior characteristic:

$$e^*_{u,b} = \frac{1}{L+1}\sum_{l=0}^{L} e^{(l)}_{u,b}, \qquad e^*_{i,b} = \frac{1}{L+1}\sum_{l=0}^{L} e^{(l)}_{i,b}.$$

4.3 Multi-behavior Integration Layer

In this section, we integrate entity embeddings under different behaviors. For each entity, different behavior embeddings represent distinct facets of the entity's characteristics. The goal is to explore the distribution patterns of behavior embeddings and the relationships between them. Thus, a natural solution is deploying an SSL architecture to excavate the correlation of different behaviors. Specifically, we resort to Contrastive Learning (CL) for automatic variety representation. We deploy a latent intent embedding to model the integral intention of user behaviors. For notational convenience, we denote by p, q any entities (including users and items). The learning objective is to minimize the distance between the latent intent embedding (e_{p,0}) and the behavior embeddings (e^*_{p,b}) of the same entity and maximize the distances between any different behavior embeddings:

$$L^{cl}_{sim} = -\sum_{b=1}^{B}\sum_{p \in U \cup I} \log\!\left(\frac{\exp\!\left(\mathrm{sim}(e_{p,0},\, e^*_{p,b})/\tau\right)}{\sum_{q \in U \cup I} \exp\!\left(\mathrm{sim}(e_{p,0},\, e^*_{q,b})/\tau\right)}\right), \qquad (6)$$

$$L^{cl}_{diff} = -\sum_{b=1}^{B}\sum_{b' \in [B]\setminus b}\sum_{p \in U \cup I} \log\!\left(\frac{\exp\!\left(\mathrm{sim}(e^*_{p,b},\, e^*_{p,b'})/\tau\right)}{\sum_{q \in U \cup I} \exp\!\left(\mathrm{sim}(e^*_{p,b},\, e^*_{q,b'})/\tau\right)}\right), \qquad (7)$$

where sim(·, ·) : R^d × R^d → R is the discriminator function for similarity calculation, which is implemented by the simple but effective inner product operation; τ is the temperature parameter that controls the speed of convergence. We utilize the term L^{cl}_{sim} as the similarity loss term for contrastive learning, minimizing the distance between the latent intent embedding and the different behavior embeddings. L^{cl}_{sim} ensures that the latent intent embedding can reveal comprehensive aspects of the entity's behaviors. On the other hand, the term L^{cl}_{diff} is utilized to measure the distance between any two behavior embeddings: the smaller L^{cl}_{diff}, the more similar the different behavior embeddings e^*_{p,b} and e^*_{p,b'} are. We aim to model the variety and diversity of different behavior embeddings; thus the inverse of the term L^{cl}_{diff} is included in the total loss.
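A compact PyTorch sketch of the two contrastive terms is shown below for a single behavior pairing; in practice one would loop over behaviors b (and b' ≠ b) and accumulate the per-pair losses, as in Eqns. (6)–(7). The inner product serves as sim(·, ·); averaging instead of summing over entities is a simplification of this sketch.

```python
import torch

def intent_behavior_cl(e_intent, e_behavior, tau=0.1):
    """InfoNCE-style alignment term of Eqn. (6) for one behavior b.

    e_intent:   (P, d) latent intent embeddings of all entities.
    e_behavior: (P, d) behavior-b embeddings of the same entities.
    """
    sim = e_intent @ e_behavior.t() / tau           # inner-product similarities
    pos = torch.diag(sim)                            # sim(e_{p,0}, e*_{p,b})
    return -(pos - torch.logsumexp(sim, dim=1)).mean()

def behavior_diff_cl(e_b, e_b_prime, tau=0.1):
    """Term of Eqn. (7) between two behavior views; its value is subtracted
    in the total loss so that different behaviors stay apart."""
    sim = e_b @ e_b_prime.t() / tau
    pos = torch.diag(sim)
    return -(pos - torch.logsumexp(sim, dim=1)).mean()
```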

4.4 Prediction Layer

After the CL-based Multi-Behavior Integration Layer, the latent intent embedding and the overall behavior are consistent in the latent representation space. Thus, we utilize the latent intent to represent the overall characteristic of each entity. We obtain the complete prediction utilizing a 1-layer linear MLP architecture to endow the model with more expressive ability [7]:

$$\hat{y}_{ui} = W\left(e_{u,0} \,\|\, e_{i,0}\right) + b, \qquad (8)$$

where W ∈ R^{2d} and b ∈ R are the trainable parameters.

4.5 Model Training

We formulate the main learning loss with the Mean Squared Error (MSE) loss, and the contrastive loss term is included to enhance the expressiveness of the latent intent embedding and the diversity of entity characteristics:

$$L_{main} = \frac{1}{N}\sum_{u=0}^{M}\sum_{i=1}^{N}\left(y_{ui} - \hat{y}_{ui}\right)^{2}. \qquad (9)$$

$$L^{cl} = L^{cl}_{sim} - L^{cl}_{diff}. \qquad (10)$$

We additionally include an L2 penalty term to prevent overfitting. Thus, the total objective function is formulated as follows:

$$L = L_{main} + \eta_1 L^{cl} + \eta_2 L_{Reg}, \qquad (11)$$

where η_1 and η_2 are weight decay factors.

We adopt the pre-training & fine-tuning learning strategy to promote the recommendation capacity. Specifically, in the pre-training phase, we randomly initialize the model and utilize the L^{cl} loss as the objective term. In the fine-tuning phase, we initialize the embeddings with the pre-trained values and optimize the model with all loss terms, i.e., L.
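The overall objective can be assembled as in the sketch below, combining Eqns. (9)–(11); the values of eta1 and eta2 are illustrative placeholders, not the tuned settings.

```python
import torch

def total_loss(y_true, y_pred, l_sim, l_diff, params, eta1=0.1, eta2=1e-4):
    """Sketch of Eqns. (9)-(11)."""
    l_main = torch.mean((y_true - y_pred) ** 2)        # MSE term of Eqn. (9)
    l_cl = l_sim - l_diff                               # contrastive term, Eqn. (10)
    l_reg = sum(p.pow(2).sum() for p in params)         # L2 penalty
    return l_main + eta1 * l_cl + eta2 * l_reg          # Eqn. (11)
```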

5 Experiments

We conduct experiments on two real-world datasets to answer the following research questions:

RQ1. How does our MB-Soc model compare with other baseline models?
RQ2. How do the key components of the MB-Soc model impact the performance?
RQ3. How do key hyper-parameters impact the performance of the MB-Soc model?

In the following parts, we first introduce the experimental settings, followed by answers to the above research questions.

5.1 Experimental Settings

Datasets and Evaluation Metrics. To evaluate the performance of our MB-Soc model, we conduct experiments on two real-world datasets, i.e., Ciao1 [5] and Epinions2 [12], which are widely adopted in social recommendations [4]. We adopt the same data processing procedure to make a fair comparison—we split them into multi-behaviors following conventional studies [20,23]: dislike (1 ≤ r < 3), neutral (r = 3), and like (r > 3). We utilize the widely adopted MAE and RMSE for the comparison. Notably, smaller values of MAE and RMSE represent better performance, and prior literature has verified that even minor improvements in MAE and RMSE can significantly improve recommendation quality [15].

Compared Methods and Parameter Settings. To verify the superiority and effectiveness of our model, we compare MB-Soc with 11 baseline models, including state-of-the-art non-socialization CF methods, social regularization methods, and trust propagation methods. NCF [7] and STAR [24] utilize GNNs to collaboratively factorize the interactions. SocialMF [8], SoReg [10], and TrustMF [22] utilize social regularization to enhance CF performance. SAMN [3], EATNN [2], DiffNet [18], SNGCF [17] (socialized NGCF model), DSL [16], and GraphRec [4] explicitly model the social propagation process in RSs. The detailed parameter settings of MB-Soc are shown in Section A.

1 http://www.cse.msu.edu/~tangjili/trust.html
2 http://www.trustlet.org/downloaded_epinions.html

Table 1. Overall Performance Comparison.

| Method   | Ciao 80% RMSE | Ciao 80% MAE | Ciao 60% RMSE | Ciao 60% MAE | Epinions 80% RMSE | Epinions 80% MAE | Epinions 60% RMSE | Epinions 60% MAE |
|----------|---------------|--------------|---------------|--------------|-------------------|------------------|-------------------|------------------|
| NCF      | 1.0617 | 0.8062 | 1.0824 | 0.8251 | 1.1476 | 0.9072 | 1.1645 | 0.9097 |
| STAR     | 1.0295 | 0.7687 | 1.0461 | 0.7932 | 1.0946 | 0.8582 | 1.1183 | 0.8843 |
| SocialMF | 1.0657 | 0.8321 | 1.0714 | 0.8378 | 1.1494 | 0.8730 | 1.1692 | 0.8973 |
| SoReg    | 1.0782 | 0.8593 | 1.0855 | 0.8420 | 1.1576 | 0.8797 | 1.1789 | 0.9184 |
| TrustMF  | 1.0518 | 0.8113 | 1.0678 | 0.8262 | 1.1314 | 0.8642 | 1.1553 | 0.8757 |
| SAMN     | 1.0543 | 0.7890 | 1.0924 | 0.8116 | 1.1366 | 0.8671 | 1.1899 | 0.8995 |
| EATNN    | 1.0313 | 0.7667 | 1.0742 | 0.7973 | 1.1187 | 0.8545 | 1.1385 | 0.8663 |
| DiffNet  | 1.0369 | 0.7723 | 1.0585 | 0.7935 | 1.1095 | 0.8438 | 1.1241 | 0.8534 |
| SNGCF    | 1.0306 | 0.7781 | 1.0445 | 0.7877 | 1.1022 | 0.8650 | 1.1173 | 0.8758 |
| DSL      | 1.0418 | 0.7879 | 1.0775 | 0.8145 | 1.1665 | 0.8952 | 1.2081 | 0.9263 |
| GraphRec | 0.9794 | 0.7387 | 1.0093 | 0.7540 | 1.0631 | 0.8168 | 1.0878 | 0.8441 |
| MB-Soc   | 0.8968 | 0.6848 | 0.9194 | 0.6955 | 1.0438 | 0.8008 | 1.0561 | 0.8122 |

5.2 Performance Comparison (RQ1)

We demonstrate the overall performance comparisons in Table 1. Below are our observations:

• Our MB-Soc model consistently outperforms other state-of-the-art models by a large gap. Specifically, MB-Soc outperforms the best-performing baseline by at least 7.30%, 7.76%, 1.82%, and 2.91% on Ciao 80%, Ciao 60%, Epinions 80%, and Epinions 60%, respectively. This indicates the superiority of our proposed multi-behavior-enhanced social recommendation.
• In general, DNN-based trust-propagation methods outperform socially regularized methods. This verifies the effectiveness of explicitly modeling the embedding propagation of the trust relationships and further confirms the efficacy of the GNN-based Propagation Layer.
• MB-Soc achieves a better performance improvement on Ciao compared with Epinions. This may be because of the denser social connections in Ciao. We argue that MB-Soc would perform even better on a more complicated social graph.

5.3 Ablation Study (RQ2)

We perform ablation studies to explore the separate effect of each single module. We denote our model without the multi-behavior module, the social diffusion module, and both modules as MB-Soc w/o MB, MB-Soc w/o Soc, and MB-Soc w/o MB & Soc, respectively. Due to space limitations, we demonstrate the performance comparisons of the ablation experiments on Ciao 80% in Fig. 3; results on the other datasets exhibit the same conclusion. We can observe that (1) MB-Soc consistently achieves the best performance against the other variants. This verifies the necessity of each component of MB-Soc; a delicate combination of all components yields the best performance. (2) MB-Soc w/o Soc exhibits a significant performance degradation compared with MB-Soc. This verifies the importance of modeling the social propagation. (3) MB-Soc w/o MB exhibits a relatively minor degradation compared with MB-Soc. This may be because the multi-behavior enhancement is focused on the social propagation process. More detailed ablation experiments are shown in Section B.1.

5.4 Parameter Analysis (RQ3)

In this section, we perform analysis experiments w.r.t. the key parameters to ensure the optimal assignment, including L, d, and τ. We demonstrate the results of the analysis experiments on Ciao 60% considering the space limitation. Regarding the propagation depth L of the consecutive GNN and GAT modules, we denote the model with the maximum depth L as MB-Soc-L and demonstrate the results in Fig. 4. Here are our observations: (1) With the increase of L, the performance of MB-Soc-L becomes better when L ≤ 2. This indicates that the model's expressiveness improves with a bigger receptive field as the GNN depth increases. (2) However, MB-Soc-L tends to degrade as L ≥ 2 and exhibits an unstable performance. This may be because the over-smoothing problem influences the expressive capacity of the embeddings.

Fig. 3. Ablation Studies.

Fig. 4. Impact of L.

Fig. 5. Impact of d.

We uniformly set the dimension of all model embeddings to d to save computation cost. To investigate the optimal embedding dimension, we conduct experiments w.r.t. d and denote the variant with a different embedding size as MB-Soc-d, with performance comparisons shown in Fig. 5. Below are our observations: (1) MB-Soc-d exhibits a consistent performance degradation with the increase of d. This might be because MB-Soc-d suffers from overfitting with a large dimension d. Since our model represents multiple behaviors with independent embeddings, a relatively small value of the dimension d is conceivable. (2) Jointly analyzing Fig. 5 with Table 1, we can see that even MB-Soc-256 outperforms most of the baselines. This reveals the effectiveness of our MB-Soc model apart from the influence of different embedding dimensions. Denoting model variants with different temperatures τ as MB-Soc-τ, MB-Soc-τ consistently achieves the best performance with τ = 0.1. Further analyses are shown in Section B.2. Moreover, we denote MB-Soc without the normalization operation in the graph as MB-Soc w/o norm. MB-Soc consistently outperforms the variant without normalization under varying graph depths, verifying the effectiveness of our design of propagation in the graph. More detailed analyses are shown in Section B.3. We release the appendix at https://github.com/WuXinglongHIT/MB-Soc.

6 Conclusion and Future Work

In this paper, we propose a multi-behavior enhanced social recommendation framework, dubbed MB-Soc, which distinguishes the fine-grained multi-behavioral trust diffusion process in social recommendations. The main innovations in MB-Soc are the behavioral trust propagation architecture and the contrastive behavior integration. Specifically, regarding behavioral trust propagation, we propose a heterogeneous graph neural network-based single-behavioral trust diffusion architecture. Moreover, regarding behavior integration, we propose a Self-Supervised Learning-based multi-behavior integration paradigm that aligns with user intention and highlights the differences between behaviors. Finally, extensive experiments demonstrate the superiority of our model. In the future, we will further fine-tune our model to fully realize its potential and investigate the specific efficacy of each behavior.

Acknowledgements. This work was supported in part by the Joint Funds of the National Natural Science Foundation of China (Grant No. U22A2036), the National Key Research and Development Program of China (2020YFB1406902), the Key-Area Research and Development Program of Guangdong Province (2020B0101360001), and the GHfund C (20220203, ghfund202202033706).

References

1. Camacho, L.A.G., et al.: Social network data to alleviate cold-start in recommender system: a systematic review. Inf. Process. Manag. 54(4), 529–544 (2018)
2. Chen, C., Zhang, M., Wang, C., Ma, W., et al.: An efficient adaptive transfer neural network for social-aware recommendation. In: SIGIR, pp. 225–234 (2019)
3. Chen, C., Zhang, M., et al.: Social attentional memory network: modeling aspect- and friend-level differences in recommendation. In: WSDM, pp. 177–185 (2019)
4. Fan, W., et al.: Graph neural networks for social recommendation. In: WWW, pp. 417–426 (2019)
5. Guo, G., Zhang, J., Thalmann, D., Yorke-Smith, N.: ETAF: an extended trust antecedents framework for trust prediction. In: ASONAM, pp. 540–547 (2014)
6. He, X., Deng, K., Wang, X., Li, Y., et al.: LightGCN: simplifying and powering graph convolution network for recommendation. In: SIGIR, pp. 639–648 (2020)
7. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.: Neural collaborative filtering. In: WWW, pp. 173–182 (2017)
8. Jamali, M., Ester, M.: A matrix factorization technique with trust propagation for recommendation in social networks. In: RecSys, pp. 135–142 (2010)
9. Luo, S., Xu, J., et al.: Artificial intelligence-based collaborative filtering method with ensemble learning for personalized lung cancer medicine without genetic sequencing. Pharmacol. Res. 160 (2020)
10. Ma, H., Zhou, D., Liu, C., Lyu, M.R., King, I.: Recommender systems with social regularization. In: WSDM, pp. 287–296 (2011)
11. Marsden, P.V., Friedkin, N.E.: Network studies of social influence. Sociol. Methods Res. 22(1), 127–151 (1993)
12. Massa, P., Avesani, P.: Trust-aware recommender systems. In: RecSys, pp. 17–24 (2007)
13. Song, W., Xiao, Z., Wang, Y., Charlin, L., et al.: Session-based social recommendation via dynamic graph attention networks. In: WSDM, pp. 555–563 (2019)
14. Sun, P., Wu, L., Wang, M.: Attentive recurrent social recommendation. In: SIGIR, pp. 185–194 (2018)
15. Tang, J., Hu, X., Gao, H., Liu, H.: Exploiting local and global social context for recommendation. In: IJCAI, pp. 2712–2718 (2013)
16. Wang, T., Xia, L., Huang, C.: Denoised self-augmented learning for social recommendation. In: IJCAI, pp. 2324–2331 (2023)
17. Wang, X., He, X., Wang, M., Feng, F., Chua, T.: Neural graph collaborative filtering. In: SIGIR, pp. 165–174 (2019)
18. Wu, L., Sun, P., Fu, Y., Hong, R., Wang, X., Wang, M.: A neural influence diffusion model for social recommendation. In: SIGIR, pp. 235–244 (2019)
19. Wu, X., He, H., Yang, H., Tai, Y., Wang, Z., Zhang, W.: PDA-GNN: propagation-depth-aware graph neural networks for recommendation. World Wide Web 26(5), 3585–3606 (2023)
20. Xia, L., Huang, C., Xu, Y., et al.: Multi-behavior enhanced recommendation with cross-interaction collaborative relation modeling. In: ICDE, pp. 1931–1936 (2021)
21. Xia, L., Huang, C., et al.: Knowledge-enhanced hierarchical graph transformer network for multi-behavior recommendation. In: AAAI, pp. 4486–4493 (2021)
22. Yang, B., Lei, Y., Liu, J., Li, W.: Social collaborative filtering by trust. IEEE Trans. Pattern Anal. Mach. Intell. 39(8), 1633–1647 (2017)
23. Yuan, E., Guo, W., He, Z., Guo, H., Liu, C., Tang, R.: Multi-behavior sequential transformer recommender. In: SIGIR, pp. 1642–1652 (2022)
24. Zhang, J., Shi, X., Zhao, S., King, I.: STAR-GCN: stacked and reconstructed graph convolutional networks for recommender systems. In: IJCAI, pp. 4264–4270 (2019)

A Complex-Valued Neural Network Based Robust Image Compression

Can Luo1, Youneng Bao1,2, Wen Tan1,2, Chao Li1,2, Fanyang Meng2, and Yongsheng Liang1(B)

1 Harbin Institute of Technology, Shenzhen, China
{21s152071,ynbao}@stu.hit.edu.cn, [email protected]
2 Peng Cheng Laboratory, Shenzhen, China
[email protected]

Abstract. Recent works on learned image compression (LIC) based on convolutional neural networks (CNNs) have achieved great improvement with superior rate-distortion performance. However, the robustness of LIC has received little investigation. In this paper, we propose a complex-valued learned image compression model based on complex-valued convolutional neural networks (CVCNNs) to enhance its robustness. First, we design a complex-valued neural image compression framework, which realizes compression with complex-valued feature maps. Second, we build a module named modSigmoid to implement a complex-valued nonlinear transform and a split-complex entropy model to compress the complex-valued latent. The experimental results show that the proposed model achieves comparable compression performance with a large drop in parameters. Moreover, we adopt an adversarial attack method to examine robustness, and the proposed model shows better robustness to adversarial inputs compared with its real-valued counterpart.

Keywords: Deep learning · Image compression · Complex-valued convolutional neural networks · Robustness

1 Introduction

Image compression is an essential line of research that compresses images into binary data, reducing the overhead of image storage and transmission. Traditional lossy image compression methods, such as JPEG, JPEG2000, and BPG [4], adopt a hybrid coding framework consisting of transformation, quantization, and entropy coding. On the one hand, these traditional methods are difficult to improve further in compression performance after decades of development. On the other hand, traditional methods are limited by hand-crafted designs and are hard to optimize jointly.

C. Luo and Y. Bao—These authors contributed to the work equally and should be regarded as co-first authors.


Fig. 1. Adversarial reconstructed images (noise level = 1e-4) and computational complexity of Hyper0 and our method.

In recent years, learned image compression models have shown great potential to improve compression performance [11,18]. All the existing models are built upon real-valued convolutional neural networks and commonly consist of modules such as transformation, quantization, and entropy coding. Some of these models even show better compression performance compared with traditional methods. However, these learned models improve compression performance while neglecting robustness. Previous works have investigated the robustness of image compression systems, where imperceptible perturbations of input images can precipitate a significant increase in the bitrate of their compressed latent and severe reconstruction distortion. To improve robustness, [12] proposes a model incorporating an attention module and a factorized entropy model, and [5] applies iterative adversarial fine-tuning to refine pre-trained models. These methods can improve the robustness of the model to a certain extent, but problems such as complex schemes and low flexibility remain. Contributions of this paper are summarized as follows:


i.) We propose a complex-valued image compression model based on CVCNNs. To the best of our knowledge, this is the first work to explore the application of CVCNNs in image compression.
ii.) We build a channel attention module and a split-complex entropy model for image compression based on CVCNNs, which improves rate-distortion performance.
iii.) Extensive experiments show that the proposed compression model achieves better robustness with a negligible performance drop and fewer parameters compared with its real-valued counterpart.

2 Related Works

2.1 Neural Image Compression

The development of learned compression models has spanned several years and includes many related works. In 2017, Ballé et al. [1] proposed an end-to-end optimized image compression framework based on CNNs with GDN nonlinearity embedded in the analysis and synthesis transformations. To solve the problem of quantization non-differentiability and realize end-to-end joint optimization, [1] replaces the true quantization with additive uniform noise. Some works directly pass the gradient through the quantization layer without correction in the backpropagation process. Once the network is trainable, in order to achieve better rate-distortion performance, some works adopt residual blocks, non-local modules, multi-scale networks and invertible structures to further reduce spatial and channel redundancy in the compression models. Furthermore, the entropy model is also a critical module in compression models, estimating the marginal distribution of the latent representation to save more transmission bit rate. Recent studies on entropy models, such as the hyperprior model [2], context model [13] and Gaussian mixture model [6], have greatly improved entropy estimation and rate-distortion performance. To conclude, benefiting from the strong modeling capacity of neural networks, learned compression models show better performance than classical methods, and the gap is further widening. However, few studies have explored model robustness, though it is crucial for practical deployment.

2.2 Adversarial Attack

Fig. 2. Details of the complex convolutional operator.

The adversarial attack is a method to generate adversarial examples, formed by applying imperceptible perturbations to inputs from the dataset so that the perturbed inputs make the model output wrong results; it is widely used to evaluate the robustness of deep learning models. In 2014, Szegedy et al. [17] suggested that machine learning models are vulnerable to adversarial examples and proposed an optimization-based attack method to generate adversarial perturbations. Goodfellow et al. [10] argue that the vulnerability of neural network models to adversarial samples is caused by their linear characteristics. Based on this, they proposed the fast gradient sign method (FGSM) for generating adversarial examples. Recent studies have combined neural image compression (NIC) with the adversarial attack to explore the robustness of NIC models to adversarial distributions. Tong et al. [5] take the reconstructed image as the attack target to generate adversarial examples, and then adopt an iterative adversarial fine-tuning algorithm to refine the pre-trained model. Liu et al. [12] attacked the bit rate of the model, aiming to make the model consume more resources to realize a denial-of-service attack; they propose a model consisting of an attention module and a factorized entropy model to improve robustness. Although these methods can improve the robustness of the model to a certain extent, problems such as complex schemes and low flexibility remain.
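For reference, a one-step FGSM-style attack against a compression model can be sketched as below. The sketch targets reconstruction distortion (in the spirit of [5]); it assumes the model returns the reconstructed image directly, and the step size epsilon is a placeholder.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, epsilon=1e-3):
    """One-step FGSM: perturb the input in the direction that increases the
    reconstruction error, then clip back to the valid image range [0, 1]."""
    x_adv = x.clone().detach().requires_grad_(True)
    x_hat = model(x_adv)                      # assumed to return the reconstruction
    loss = F.mse_loss(x_hat, x)               # distortion-targeted objective
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        return x_adv.clamp(0.0, 1.0)
```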

2.3 Complex-Valued Convolutional Neural Networks

Many works suggest that applying complex values in ANNs can enhance the representational capacity [14] and bring further benefits such as better generalization [16] and noise-robust memory mechanisms [8]. In computer vision, Trabelsi et al. [7] redesigned convolution, batch normalization and nonlinear activation functions in the complex domain, realizing operations such as convolution, activation and batch normalization for complex-valued inputs. Quan et al. [15] use CVCNNs to explore the capabilities of the complex domain in image denoising, and the proposed model has achieved good results in terms of robustness and performance. In addition, complex-valued transformation is closely related to biological visual perception. The human visual cortex contains complex cells [9], which are selective for direction and frequency. As is also known in computer vision, the phase information of an image provides shape, edge, and orientation information, and the phase is sufficient to recover most of the information encoded in the amplitude. The benefits of complex-valued transformation listed above motivated us to use CVCNNs to construct a light and robust LIC model.


Fig. 3. Network architecture of our method (CHyper0). Convolution parameters are denoted as: the number of filters × kernel height × kernel width / down- or upsampling stride, where ↑ indicates upsampling and ↓ downsampling. N = 64 and M = 96 are chosen for the 4 lower λ values, and N = 96 and M = 160 for the 2 higher values. Q denotes quantization, and AE, AD represents arithmetic encoder and decoder.

3 Proposed Method

3.1 Overall Framework

Figure 3 provides a high-level overview of our generalized compression model, named CHyper0-CA, which contains three main improvements over Hyper0 [2]. Firstly, we replace the real convolution layers in the baseline model with complex-valued convolution layers (CConv, Fig. 2). Each input and output of a complex convolution layer contains two feature maps (i.e., the real and imaginary parts) that can be viewed together as one channel, so the number of parameters in the complex convolution layer can be halved [7,15]. Secondly, in the nonlinear transformation part, we use a channel attention module incorporating the proposed modSigmoid to replace the original GDN and reduce the redundancy between complex channels. Finally, in order to fuse the information between the real and imaginary parts of the three-channel complex image generated by the decoder, we concatenate them along the channel dimension and apply a real-valued residual block. In addition, we model the real part and the imaginary part as zero-mean Gaussian distributions with their own standard deviations (i.e., θr and θi), where the standard deviations come from the complex parameters predicted by the hyper network:

θ = hs(Q(ha(y))),  θr = Real(θ),  θi = Imag(θ),    (1)

where ha and hs denote the hyper codec, Real and Imag denote the real and imaginary parts of complex values, respectively, and Q represents the quantization process.
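As a concrete illustration of the complex-valued convolution used throughout the model, the following PyTorch sketch applies the standard complex product rule with two real convolutions; the class name, channel sizes, and the two-tensor representation of the real/imaginary parts are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class CConv2d(nn.Module):
    """Complex-valued convolution: (W_r + iW_i) * (x_r + ix_i).

    A minimal sketch of the CConv layer described in the text; the real and
    imaginary parts of the input are carried as two separate tensors.
    """
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, x_r, x_i):
        # Real part:  W_r * x_r - W_i * x_i
        y_r = self.conv_r(x_r) - self.conv_i(x_i)
        # Imaginary part:  W_r * x_i + W_i * x_r
        y_i = self.conv_r(x_i) + self.conv_i(x_r)
        return y_r, y_i
```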


Fig. 4. Rate-distortion curves of PSNR attack on the Kodak dataset with ε = 0.0001 (top) and ε = 0.001 (bottom). CHyper0 and CJoint are the proposed methods, Hyper0 is [2], Finetuned Hyper0 is [5], Joint is [13].

After designing the structure of our model, it needs to be trained in an end-to-end manner. Similar to most image compression problems, two loss terms, the rate R and the distortion D, need to be optimized:

Loss = R + λ·D = E[−log₂ p(ẑr)] + E[−log₂ p(ẑi)] + E[−log₂ p(ŷr | ẑ)] + E[−log₂ p(ŷi | ẑ)] + λ · E‖x − x̂‖₂²,    (2)

where p(·) denotes the probability of the feature maps at the bottleneck layer, p(·|ẑ) denotes the conditional probability of the latent representations, and λ is a hyperparameter controlling the trade-off between rate and distortion.

Figure 1 shows the complexity of the whole model, including the number of parameters and FLOPs. It can be seen that, compared with Hyper0, our model greatly reduces the number of parameters, although its computational complexity is slightly higher than that of Hyper0.
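For readers implementing the objective, the sketch below computes the split-complex rate-distortion loss of Eq. (2) from per-element likelihoods. The tensor names, the per-pixel normalization of the rate (bits per pixel), and the way likelihoods are obtained (e.g., from CompressAI-style entropy models) are assumptions, not the authors' code.

```python
import torch

def rd_loss(x, x_hat, lik_z_r, lik_z_i, lik_y_r, lik_y_i, lam):
    """Split-complex rate-distortion loss (cf. Eq. (2)).

    lik_* are per-element likelihoods p(z_r), p(z_i), p(y_r|z), p(y_i|z),
    e.g. returned by factorized / conditional Gaussian entropy models.
    """
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = 0.0
    for lik in (lik_z_r, lik_z_i, lik_y_r, lik_y_i):
        rate = rate + torch.sum(-torch.log2(lik)) / num_pixels  # bits per pixel
    distortion = torch.mean((x - x_hat) ** 2)                   # MSE term
    return rate + lam * distortion
```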

3.2 Nonlinear Transform

Most real-valued LIC models apply GDN to realize the nonlinear transform in the main codec. We first try to improve GDN by applying separate GDNs to the real and the imaginary part. We call the improved transformation SGDN (Split-GDN), expressed as:

y = GDN(Real(x)) + i · GDN(Imag(x)),    (3)

where x and y denote the complex-valued input and output. However, we found that SGDN is not well suited to complex feature maps and leads to a performance drop in compression. To tackle this problem, we propose a channel attention module (detailed in Fig. 3) to substitute for the complex GDN. A new complex activation function named modSigmoid is proposed in this module and is defined as:

modSigmoid(h) = Sigmoid(|h| + b) · exp(iθh),    (4)

where h ∈ ℂ, θh is the phase of h, and b ∈ ℝ is a learnable bias. The intuition behind modSigmoid is to normalize the magnitude without destroying the phase information. With this activation, the channel attention module can learn the redundancy along the channel dimension and generate importance weights in the range 0–1 based on the amplitude of the features, so as to improve the representational capacity.

Table 1. The rate-distortion performance of original images and adversarial examples tested on Kodak for different models with quality = 6 and ε = 0.001.

Model                 Original PSNR↑ / Bpp↓   Adversarial PSNR↑ / Bpp↓   PSNR Drop Rate↓
Hyper0 [2]            36.8450 / 0.9561        13.9159 / 1.3731           0.6223
Finetuned Hyper0 [5]  35.8027 / 0.9657        15.6110 / 1.2521           0.5639
CHyper0-CA            36.8229 / 0.9424        23.4368 / 1.2063           0.3635
Joint [13]            36.9540 / 0.8920        9.2346 / 5.7112            0.7499
CJoint                37.0158 / 0.8946        23.8821 / 1.3325           0.3548
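A minimal PyTorch sketch of the modSigmoid activation of Eq. (4) follows; the per-channel bias shape and the small numerical constant are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ModSigmoid(nn.Module):
    """modSigmoid of Eq. (4): squash the magnitude, keep the phase."""
    def __init__(self, channels):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, h_r, h_i):
        mag = torch.sqrt(h_r ** 2 + h_i ** 2 + 1e-12)   # |h|
        gate = torch.sigmoid(mag + self.bias)            # Sigmoid(|h| + b)
        # Rescaling by gate/|h| sets the output magnitude to `gate`
        # while preserving the phase exp(i * theta_h).
        scale = gate / mag
        return h_r * scale, h_i * scale
```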

In the hyper codecs, we use CReLU [7] to introduce nonlinearity on the complex-valued vectors. CReLU applies ReLU to the real part and the imaginary part separately:

CReLU(x) = ReLU(Real(x)) + i · ReLU(Imag(x)).    (5)


Fig. 5. Visualization of original and adversarial reconstructed images for different models with ε = 0.0001 and quality = 4.

4 Experiment Results

4.1 Experiment Setup

Datasets. The CLIC dataset (http://compression.cc/) is used for training, and we augment it by randomly cropping the images into 256×256 patches. To test the performance of our model, we utilize the Kodak dataset (http://r0k.us/graphics/kodak/), which contains 24 images of size 768×512, to evaluate rate-distortion performance and robustness.

Training Setting. We adopt CompressAI [3] to implement our models and to retrain the Hyper0 model, following most of its training settings. All models are trained for 100 epochs with six different λ values.

Adversarial Example Generation. We adopt a white-box attack to generate adversarial samples. When generating adversarial examples, we need to cause the greatest distortion under a limited perturbation; the attack can therefore be realized by optimizing the formula in [5]. In adversarial training we follow the settings in [5]. Specifically, we run 10000 steps with learning rate 1e-3 and different ε to generate adversarial examples for our models and Hyper0.

Evaluation Setting. We measure the quality of reconstructed images with PSNR. In the process of the adversarial attack, we focus on the PSNR drop averaged over the Kodak dataset:

PSNR drop = (PSNR(x, x̂) − PSNR(x_adv, x̂_adv)) / PSNR(x, x̂),    (6)

where the subscript adv denotes the adversarial example and its reconstruction.

To ensure fairness, all models are trained and tested on the same machine (NVIDIA Tesla V100).
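The sketch below outlines one way such a white-box distortion attack can be run against a compression model; the exact objective and projection used in [5] may differ, and the `model` interface (returning a reconstruction) is an assumption.

```python
import torch

def generate_adversarial(model, x, epsilon, steps=10000, lr=1e-3):
    """White-box attack sketch: maximize the reconstruction error of
    model(x + delta) while keeping delta within [-epsilon, epsilon]."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0.0, 1.0)
        x_hat = model(x_adv)                       # reconstructed image
        loss = -torch.mean((x_hat - x_adv) ** 2)   # negative MSE -> maximize distortion
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)        # keep the perturbation bounded
    return (x + delta).detach().clamp(0.0, 1.0)
```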

4.2 Results and Comparison

We adopt the iterative adversarial finetuning algorithm proposed in [5] to update the pre-trained Hyper0, following the same settings except that we train and finetune Hyper0 with PSNR. To verify the effectiveness of the proposed complex-valued modules, we also apply similar improvements to the joint autoregressive model [13] to obtain a complex-valued joint autoregressive model (CJoint). We then use two noise levels (i.e., ε = 0.001 and ε = 0.0001) to evaluate and compare the robustness of our models and their real-valued counterparts.



Figure 4 shows the rate-distortion curves of different models before and after the adversarial attack. As shown in this figure, our models exhibit better robustness across qualities at both noise levels. Table 1 provides the specific values corresponding to Fig. 4. We compare different models under the conditions q = 6 and noise level ε = 0.001. The "Original" column reports the rate-distortion performance before the attack, while the "Adversarial" column reports it after the attack. The "PSNR Drop Rate" column indicates the relative change in distortion caused by the attack, where a smaller value indicates better robustness.

To illustrate the performance of the different models, we visualize the reconstructed images of adversarial examples in Fig. 5(a), where the first column shows the reconstruction of the original clean image by Hyper0, and the last three columns show the reconstructions of adversarial examples by Hyper0, finetuned Hyper0, and CHyper0-CA. Similarly, we visualize the reconstructed images of the joint autoregressive model and the complex-valued joint autoregressive model in Fig. 5(b). To make the comparison fair, the same quality, noise level, and images are chosen; the adversarial examples generated from the same images carry different perturbations for different models.

From Fig. 5, we can conclude that the images reconstructed by the real-valued models, namely Hyper0 and Joint, show severe distortion in PSNR after the attack. For the Joint model in particular, its context entropy model leads to higher vulnerability to slight perturbations, making it less robust than Hyper0 under the same noise level. The adversarially finetuned real-valued model (finetuned Hyper0) improves robustness and visual quality, but evident distortion remains in the adversarial reconstructions. In contrast, the complex-valued models CHyper0 and CJoint maintain relatively good reconstruction quality under the same attack conditions, without exhibiting significant distortion in local regions. This demonstrates that the proposed complex-valued models resist noise attacks more effectively.

4.3 Ablation Study

We refer to CHyper0 with SGDN nonlinearity as CHyper0-SGDN. We train CHyper0-CA, CHyper0-SGDN, and Hyper0 with the same training settings. The rate-distortion performance curves are shown in Fig. 6. CHyper0-CA achieves performance similar to Hyper0, while CHyper0-SGDN shows worse results than the other two models, particularly at high bitrates. This result demonstrates that the CA module removes more redundancy in the complex-valued channels.


Fig. 6. Rate-distortion performance comparison averaged on Kodak dataset for different models.

5 Conclusions

In this paper, we use CVCNNs to refine an existing LIC model and propose a more robust complex-valued image compression model. We introduce a complex activation layer used in a channel attention module to realize the nonlinear transform, and a split-complex entropy model to encode and decode the complex-valued latent representation. After designing our model, we compare its rate-distortion performance with the baseline and adopt a white-box attack to evaluate the robustness of the proposed model and the baseline model. The experimental results show that our complex-valued model has a stronger anti-attack capability than the real-valued network model while achieving similar rate-distortion performance to its real-valued counterpart.

Acknowledgment. This research was supported by the National Natural Science Foundation of China (Grant No. 62031013), the Guangdong Province Key Construction Discipline Scientific Research Capacity Improvement Project (Grant No. 2022ZDJS117), and the project of Peng Cheng Laboratory. The computing resources of Pengcheng Cloudbrain are used in this research.

References

1. Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. arXiv preprint arXiv:1611.01704 (2016)
2. Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018)


3. B´egaint, J., Racap´e, F., Feltman, S., Pushparaja, A.: Compressai: a pytorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029 (2020) 4. Bellard, F.: BPG image format (2014). 1, 2 (2016) 5. Chen, T., Ma, Z.: Towards robust neural image compression: Adversarial attack and model finetuning. arXiv preprint arXiv:2112.08691 (2021) 6. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7939–7948 (2020) 7. Chiheb, T., Bilaniuk, O., Serdyuk, D., et al.: Deep complex networks. In: International Conference on Learning Representations (2017). https://openreview.net/ forum 8. Danihelka, I., Wayne, G., Uria, B., Kalchbrenner, N., Graves, A.: Associative long short-term memory. In: International Conference on Machine Learning, pp. 1986– 1994. PMLR (2016) 9. Dow, B.M.: Functional classes of cells and their laminar distribution in monkey visual cortex. J. Neurophysiol. 37(5), 927–946 (1974) 10. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014) 11. Li, B., Xin, Y., Li, C., Bao, Y., Meng, F., Liang, Y.: Adderic: towards low computation cost image compression. In: ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2030–2034 (2022). https://doi.org/10.1109/ICASSP43922.2022.9747652 12. Liu, K., Wu, D., Wang, Y., Feng, D., Tan, B., Garg, S.: Denial-of-service attacks on learned image compression. arXiv preprint arXiv:2205.13253 (2022) 13. Minnen, D., Ball´e, J., Toderici, G.D.: Joint autoregressive and hierarchical priors for learned image compression. Adv. Neural Inf. Process. Syst. 31 (2018) 14. Nitta, T.: The computational power of complex-valued neuron. In: Kaynak, O., Alpaydin, E., Oja, E., Xu, L. (eds.) Artificial Neural Networks and Neural Information Processing — ICANN/ICONIP 2003, pp. 993–1000. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44989-2 118 15. Quan, Y., Chen, Y., Shao, Y., Teng, H., Xu, Y., Ji, H.: Image denoising using complex-valued deep CNN. Pattern Recogn. 111, 107639 (2021) 16. Singhal, U., Xing, Y., Yu, S.X.: Co-domain symmetry for complex-valued deep learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 681–690 (2022) 17. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013) 18. Yin, S., Li, C., Bao, Y., Liang, Y., Meng, F., Liu, W.: Universal efficient variablerate neural image compression. In: ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2025–2029 (2022). https://doi.org/10.1109/ICASSP43922.2022.9747854 19. Yu, S.: Angular embedding: a robust quadratic criterion. IEEE Trans. Pattern Anal. Mach. Intell. 34(1), 158–173 (2011)

Binarizing Super-Resolution Neural Network Without Batch Normalization

Xunchao Li and Fei Chao(B)

School of Informatics, Xiamen University, Fujian 361005, People's Republic of China
[email protected], [email protected]

Abstract. In this paper, our objective is to propose a model binarization method aimed at addressing the challenges posed by over-parameterized super-resolution (SR) models. Our analysis reveals that binary SR models experience significant performance degradation, primarily attributed to their sensitivity towards weight/activation distributions, particularly when devoid of Batch Normalization (BN) layers. Consequently, we undertake the following endeavors in this study: First, we conduct a comprehensive analysis to examine the impact of BN layers on SR models based on Binary Neural Networks (BNNs). Second, we propose an asymmetric binarizer that can be reparameterized to adaptively adjust the transition point for activation binarization. Third, we introduce a progressive gradient estimator that modifies weight smoothness and controls weight flipping to stabilize the training procedure in the absence of BN layers. Through extensive experiments, we demonstrate that our proposed method exhibits significant performance improvements. For instance, when binarizing EDSR and scaling up input images by a factor of ×4, our approach achieves a PSNR decrease of less than 0.4dB on the Urban100 benchmark.

Keywords: Super-Resolution · Model compression and Acceleration · Binary neural network

1 Introduction

Single image super-resolution (SISR) is a fundamental task in computer vision that aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) input [28, 29, 39]. SISR plays a critical role in enhancing the resolution of photographs taken by users, especially for portable devices with limited storage and computational resources. Significant efforts have been devoted to solving this challenging ill-posed problem over the past few decades. Furthermore, the rapid advancement of convolutional neural networks (CNNs) [7, 11, 27] has led to the development of state-of-the-art SISR models [4, 5, 14, 19]. However, the high memory usage and computational complexity present a significant challenge for portable devices such as smartphones and wearable gadgets. Consequently, there has been considerable attention in the visual community towards learning methods that can compress and accelerate CNN-based SISR models [2, 6, 9, 38].

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-981-99-8549-4_6.


Fig. 1. Influence of BN layers on SR models including EDSR [19] and VDSR [14] (×2 upscaling). We test the performance on two example benchmarks BSD100 [24] and Urban100 [12], and observe (a) better binary SR models with BN layers while (b) better full-precision SR models without BN layers.

One promising approach for achieving lightweight and efficient models is network quantization, which discretizes the full-precision weights and activations [9, 10, 38]. In extreme cases, binary neural networks (BNNs) represent the full-precision variables in 1-bit format, taking on values of either −1 or +1. Compared to the original convolutional operations, BNNs offer substantial reductions in network size (by a factor of 32) and computational speedups (by a factor of 58) through the use of xnor and bit-count logics [26]. Consequently, several BNN-based SR approaches have been proposed in recent years. For instance, Ma et al. [23] introduced the first weight-binarized SR model, while Xin et al. [33] developed a bit accumulation mechanism to approximate full-precision convolutions when binarizing both weights and activations. Jiang et al. [13] subsequently invented a binary training mechanism based on feature distribution to replace batch normalization (BN) layers in SR models. More recently, Li et al. [18] proposed dynamically adjusting the binarization threshold based on the feature's local mean for improved performance. However, a noticeable performance gap exists between BNN-based SR models and their full-precision counterparts.

Most existing methods for binarizing SR models are developed based on studies focused on high-level vision tasks, emphasizing the importance of batch normalization (BN) layers in stabilizing BNN training and enhancing model capacity [3, 22, 26]. This observation is consistent with the results shown in Fig. 1(a), where the performance of BNN-based SR models improves when a BN layer is inserted after binarized convolution. Two main factors contribute to this improvement. First, BN mitigates weight flipping, thereby stabilizing the training process [8, 20, 36]. Second, BN layers promote more uniform per-channel activation distributions, reducing the challenges associated with activation binarization. Regrettably, it is widely acknowledged that removing BN layers enhances the performance of full-precision SR models, as also observed in Fig. 1(b). This discrepancy arises because low-level vision tasks, such as SR, are sensitive to the scale information of images, while BN restricts the flexibility of activation ranges [38]. This conflict motivates us to seek an SR-specific method for improved binarization, rather than simply incorporating BN layers.

Based on the aforementioned analysis, instead of directly introducing additional BN layers in BNN-based SR models, we propose several techniques to stabilize training and preserve network flexibility. To address the challenge of binarizing diverse channel-wise distributions, we employ a channel-wise asymmetric binarizer for activations inspired by AdaBin [30] in our SR model. This binarizer allows reparameterization of additional parameters during the inference phase to maintain the efficiency of BNNs. Furthermore, we conduct a comprehensive analysis of BN layers and discover that BN layers contribute to the stabilization of BNN-based SR model training from a gradient perspective. In order to stabilize training and retain model flexibility, we design a progressive gradient estimator that gradually adjusts weight smoothness and controls the weight flipping ratio. By combining these approaches, we achieve significant performance improvements on various benchmark datasets. For example, our methods achieve a PSNR of 27.91dB on VDSR [14] and 28.41dB on EDSR [19] when performing ×4 upscaling on Set14 [15].

2 Related Work

A binary neural network (BNN) is a low-bit model that represents most full-precision variables in a 1-bit format, enabling efficient matrix multiplication through bit-wise xnor and bit-count logics. While originating from high-level vision tasks such as classification [20, 22, 25, 30, 36] and object detection [32, 34, 35], BNN models oriented towards super-resolution (SR) have also gained attention in recent years. The pioneering work by Ma et al. [23] introduced weight binarization for SR models. Subsequently, Xin et al. [33] proposed a bit-accumulation mechanism (BAM) that accumulates full-precision values from previous layers before binarization, thereby improving the approximation of full-precision convolutions even when both weights and activations are quantized. In [13], a bit training mechanism (BTM) was introduced to effectively reduce the performance gap between BNN models and their full-precision counterparts in the context of SR. Recently, Li et al. [18] presented a local means scheme that preserves more detailed information in feature maps using dynamic thresholds. Additionally, they proposed a gradient approximator to adaptively optimize gradients. These advancements highlight the ongoing exploration of BNN models in the SR domain, leveraging techniques such as weight binarization, bit-accumulation mechanisms, bit training, and dynamic thresholding with gradient approximation.

3 Method

3.1 Batch Normalization in SR Models

In the case where no batch normalization (BN) layers are used, as depicted in Fig. 3, the input X^r in super-resolution (SR) models exhibits diverse per-channel distributions with asymmetry. This differs from models designed for high-level tasks, where BN layers are employed to unify per-channel activation distributions. On the other hand, BN layers contribute to the stability of BNNs, as supported by previous work [21], which analyzes the weight flipping ratio in each layer. Specifically, the weight flipping ratio measures the proportion of weights that change signs from training iteration t to t + 1 and can be formulated as:

R_{t→t+1} = Σ_{w^r ∈ X^r} I(sign(w^r_{t+1}) ≠ sign(w^r_t)) / len(X^r),    (1)


where I(·) returns 1 if the input is true and 0 otherwise, and len(·) returns the number of elements in the input. The weight flipping ratio is computed at the last step of each epoch. Experimental results on EDSR, as shown in the left of Fig. 4(a), reveal that inserting BN layers leads to a smaller weight flipping ratio during training, indicating better stability compared to models without BN layers. To analyze this phenomenon, we consider the weight updating process using stochastic gradient descent, given by:

w^r_{t+1} = w^r_t − η · ∂L/∂w^r_t,    (2)

where η denotes the learning rate. It is evident that the difference in weight flipping with and without BN layers mainly stems from the discrepancy in gradients. To delve deeper into the analysis, the BN layer processes the convolutional output Y^r as follows¹:

Ỹ^r = (Y^r − μ) / √(σ² + ε_BN) · γ + β,    (3)

where μ and σ are statistical parameters computed over an incoming data batch, and γ and β are learnable parameters. Here, ε_BN is a small constant introduced for numerical stability. Hence, for an SR model with BN layers, the gradient ∂L/∂w^r_t can be derived (using the STE) as:

∂L/∂w^r_t ≈ ∂L/∂ỹ^r_t · ∂ỹ^r_t/∂y^r_t · ∂y^r_t/∂w^b_t,    (4)

where ∂ỹ^r_t/∂y^r_t = γ / √(σ² + ε_BN) is an additional term compared to the BN-free case. It is observed that γ / √(σ² + ε_BN) gradually decreases during network training and eventually becomes less than 1, as shown in the right of Fig. 4(a). Consequently, BN layers reduce the weight flipping ratio by diminishing the gradient ∂L/∂w^r_t, thus stabilizing the network training process. However, as discussed in Sect. 1, SR models rely on the scale information of images, and most existing models remove BN layers to achieve better performance. To binarize models without BN layers, we introduce an asymmetric binarizer and a smoothness-controlled estimator to achieve comparable efficacy to BN layers without actually including them.

¹ BN layers are inserted within each channel. For simplicity, we omit the channel-wise index here.
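To make the weight flipping ratio of Eq. (1) concrete, the following sketch computes it between two checkpoints of a latent weight tensor; the tensors are hypothetical placeholders.

```python
import torch

def weight_flipping_ratio(w_prev, w_curr):
    """Eq. (1): fraction of latent weights whose sign changed between
    training steps t and t+1."""
    flipped = torch.sign(w_curr) != torch.sign(w_prev)
    return flipped.float().mean().item()

# Example: compare the latent weights of one binarized conv layer
# at the end of two consecutive epochs (random tensors for illustration).
w_t = torch.randn(64, 64, 3, 3)
w_t1 = w_t + 0.05 * torch.randn_like(w_t)
print(weight_flipping_ratio(w_t, w_t1))
```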


Fig. 2. Our binarization framework. Note that the extra parameter β is fused into a unified vector after the binary convolution.

3.2 Channel-Wise Asymmetric Binarizer for Activations

Fig. 3. Per-channel input distribution without/with BN layers of the first residual block of EDSR [19].

Inspired by Si-BNN [31] and AdaBin [30], we propose a channel-wise asymmetric binarizer that retains more information from activations by applying a channel-wise shifting factor β to replace the input X^r:

X̂^r = sign(X^r − β) + β = X^b + β.    (5)

Thus, the symmetric center in each channel adapts to the introduced shifting factor β. In this paper, we make β learnable for better adaptive fitting. Consequently, the convolutional output can be approximated as:

Y^r ≈ α · (W^b ⊗ X̂^r) = α · (W^b ⊗ (X^b + β)) = α · (W^b ⊗ X^b + W^b ⊗ β),

where α is a channel-wise scaling factor, and "·" denotes channel-wise multiplication. The convolution operation W^b ⊗ X^b can be efficiently implemented using xnor and bit-count logics, which have been observed to provide at least 58× speedups [26]. After training the network, W^b ⊗ β can be pre-computed offline for all inputs, incurring negligible computation cost. The difference between training and inference can be observed in Fig. 2.

In the backward propagation, the straight-through estimator (STE) [26] is adopted to solve the non-differentiable issue of the sign function, leading to the following gradient for any weight w^r ∈ W^r:

∂L/∂w^r = ∂L/∂w^b · ∂w^b/∂w^r ≈ ∂L/∂w^b.    (6)

As for an input x^r ∈ X^r, we consider ApproxSign [22] as an alternative to the sign function to compute the gradient:

∂L/∂x^r = ∂L/∂x^b · ∂x^b/∂x^r = ∂L/∂x^b · ∂ApproxSign(x^r)/∂x^r,    (7)

∂ApproxSign(x^r)/∂x^r = { 2 + 2x^r, if −1 ≤ x^r < 0;  2 − 2x^r, if 0 ≤ x^r < 1;  0, otherwise.    (8)
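A minimal PyTorch sketch of the channel-wise asymmetric binarizer of Eq. (5) with the ApproxSign gradient of Eq. (8); the shapes and the autograd wiring are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

class ApproxSign(torch.autograd.Function):
    """sign(x) in the forward pass, ApproxSign derivative (Eq. (8)) backward."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        grad = torch.zeros_like(x)
        grad = torch.where((x >= -1) & (x < 0), 2 + 2 * x, grad)
        grad = torch.where((x >= 0) & (x < 1), 2 - 2 * x, grad)
        return grad_out * grad

class AsymmetricBinarizer(nn.Module):
    """X_hat = sign(X - beta) + beta, with a learnable per-channel shift beta."""
    def __init__(self, channels):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        return ApproxSign.apply(x - self.beta) + self.beta
```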


Fig. 4. We test EDSR [19] on the Set14 [16] dataset to analyze: (a) the weight flipping ratio with/without BN layers and the gradient from BN layers; (b) the weight flipping ratio and PSNR performance with different values of ε.

3.3 Smoothness-Controlled Estimator

To address the absence of batch normalization (BN) layers, we propose a technique to mitigate their impact by smoothing the magnitude of weights before applying the sign function. This is achieved through a progressive smoothness-controlled estimator, denoted as F(w^r), which is defined as follows:

ŵ^r = F(w^r) = w^r / |w^r|.detach()^τ,    (9)

where |·| computes the absolute value of its input, and detach() prevents the gradient from being backpropagated through this operation. The smoothing factor τ is gradually increased from 0.1 to 1 during training epochs, following the annealing schedule given by Eq. (10):

τ_i = (τ_max − τ_min) / (e − 1) · e^{i/I} + (e · τ_min − τ_max) / (e − 1),    (10)

In this equation, τ_min is set to 0.1 and τ_max is set to 1. The variable I represents the total number of training epochs, and i represents the current training epoch. The binarized weight w^b is obtained by applying the sign function to ŵ^r:

w^b = sign(ŵ^r).    (11)

This progressive smoothness-controlled estimator ensures a more uniform weight distribution in the early stages of training when τ_i is small. As training progresses, the weight magnitudes gradually shift from values close to 0, which are generally considered uncertain, to values closer to 1, which provide more stability during network training. The left figure in Fig. 4(b) illustrates the weight flipping curve over different training epochs, demonstrating a similar trend to the case with BN layers depicted in the left figure of Fig. 4(a). Therefore, our smoothness-controlled estimator serves as a viable alternative to BN layers in super-resolution (SR) models.

To handle the gradient from the smoothness-controlled estimator F, which is |w^r|^{−τ} and significantly amplifies weights close to zero, we slightly modify Eq. (9) as follows:

ŵ^r = F(w^r) = w^r / clamp(|w^r|, min = ε)^τ,    (12)


Table 1. Quantitative evaluation on VDSR [14].

Method          Bit (W/A)  Scale  Set5 [1]      Set14 [16]    BSD100 [24]   Urban100 [12]
                                  PSNR/SSIM     PSNR/SSIM     PSNR/SSIM     PSNR/SSIM
full-precision  32/32      ×2     37.64/0.959   33.22/0.914   31.94/0.896   31.14/0.918
Bicubic         -/-        ×2     33.66/0.930   30.24/0.869   29.56/0.843   26.88/0.840
BAM [33]        1/1        ×2     36.60/0.953   32.41/0.905   31.69/0.893   30.48/0.910
IBTM [13]       1/1        ×2     37.06/0.956   32.72/0.908   31.53/0.889   29.96/0.902
LMBN [18]       1/1        ×2     37.21/0.956   32.85/0.910   31.66/0.892   30.31/0.908
ours            1/1        ×2     37.30/0.957   32.93/0.911   31.71/0.893   30.55/0.911
full-precision  32/32      ×4     31.60/0.888   28.24/0.773   27.33/0.728   25.45/0.764
Bicubic         -/-        ×4     28.42/0.810   26.00/0.703   25.96/0.668   23.14/0.658
BAM [33]        1/1        ×4     30.31/0.860   27.48/0.754   26.87/0.708   24.45/0.720
IBTM [13]       1/1        ×4     30.83/0.873   27.76/0.761   27.03/0.717   24.73/0.736
LMBN [18]       1/1        ×4     30.96/0.875   27.88/0.764   27.09/0.720   24.84/0.742
ours            1/1        ×4     31.08/0.878   27.91/0.764   27.12/0.720   24.92/0.743

where ε is a clipping threshold that prevents the gradient from becoming excessively large. The right figure in Fig. 4(b) illustrates the influence of ε; a value of ε = 1e−8 is used in this study.
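A minimal sketch of the smoothness-controlled estimator, assuming Eq. (12) for the forward transform and the annealing schedule of Eq. (10); the function names and default values mirror the text but are not the authors' code.

```python
import math
import torch

def tau_schedule(i, total_epochs, tau_min=0.1, tau_max=1.0):
    """Exponential annealing of the smoothing factor tau (Eq. (10))."""
    e = math.e
    return (tau_max - tau_min) / (e - 1) * math.exp(i / total_epochs) \
        + (e * tau_min - tau_max) / (e - 1)

def smoothness_controlled_estimator(w, tau, eps=1e-8):
    """Eq. (12): smooth the weight magnitudes before binarization; the
    denominator is detached so the gradient flows only through w."""
    denom = torch.clamp(w.abs(), min=eps).detach() ** tau
    return w / denom  # w_hat; the binarized weight is sign(w_hat) via an STE
```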

4 Experimentation

4.1 Experiment Setup

We extensively trained our models from scratch using the DIV2K dataset, consisting of 800 images [19], over 300 epochs. Subsequently, we assessed the models' performance on four widely recognized benchmark datasets, namely Set5 [1], Set14 [16], BSD100 [24], and Urban100 [12]. Throughout the training process, we maintained a fixed batch size of 16 and employed the Adam optimizer with hyperparameters β1 = 0.99 and β2 = 0.999. The initial learning rate was set to 2 × 10⁻⁴ and was halved at the 200th epoch. Consistent with previous studies on super-resolution using Binary Neural Networks (BNNs) [13, 18, 33], we binarized all weights and activations of the feature extraction modules, while the remaining modules were kept in full-precision mode. To evaluate the effectiveness and versatility of our approach, we conducted comprehensive experiments on two well-established super-resolution models, namely VDSR [14] and EDSR [19]. We generated low-resolution images using the bicubic algorithm and utilized the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) on the luminance channel Y of the YCbCr color space as evaluation metrics to assess performance.

Experiment on VDSR

VDSR [14] is an interpolation-based super-resolution (SR) model that does not utilize Batch Normalization (BN) layers.

Table 2. Quantitative evaluation on EDSR-standard [19].

Method          Bit (W/A)  Scale  Set5 [1]      Set14 [16]    BSD100 [24]   Urban100 [12]
                                  PSNR/SSIM     PSNR/SSIM     PSNR/SSIM     PSNR/SSIM
full-precision  32/32      ×2     38.11/0.960   33.92/0.920   32.32/0.901   32.93/0.935
Bicubic         -/-        ×2     33.66/0.930   30.24/0.869   29.56/0.843   26.88/0.840
IBTM [13]       1/1        ×2     37.80/0.960   33.38/0.916   32.04/0.898   31.49/0.922
LMBN [18]       1/1        ×2     37.73/0.959   33.25/0.915   31.97/0.897   31.33/0.920
ours            1/1        ×2     37.81/0.960   33.38/0.916   32.06/0.898   31.61/0.923
full-precision  32/32      ×4     32.46/0.897   28.80/0.788   27.71/0.742   26.64/0.803
Bicubic         -/-        ×4     28.42/0.810   26.00/0.703   25.96/0.668   23.14/0.658
IBTM [13]       1/1        ×4     31.84/0.890   28.33/0.777   27.42/0.732   25.54/0.769
LMBN [18]       1/1        ×4     31.72/0.888   28.31/0.774   27.39/0.729   25.54/0.767
ours            1/1        ×4     31.89/0.891   28.41/0.777   27.46/0.732   25.71/0.774

In line with the methodologies employed in BAM [33] and BTM [13], we divided the middle 18 convolution layers of VDSR into nine blocks, each incorporating a short connection. Our experimental results, as illustrated in Table 1, demonstrate the substantial superiority of our approach compared to previous works focused on SR. For instance, we achieved a PSNR of 30.55dB on the Urban100 dataset with ×2 upsampling, outperforming LMBN by 0.24dB.

Experiments on EDSR

EDSR is a classical SR model that also does not employ BN layers. Several quantization methods [17, 37] have demonstrated the effectiveness of quantization on a 16-block variant with 64 channels per convolution. However, most binarization methods for SR models [13, 18] evaluate their performance on a standard implementation with 32 convolution blocks and 256 channels per convolution. The latter model has a larger number of parameters, making it more amenable to binarization with a smaller performance gap. Following previous works [13, 18], we applied our methods to the 32-block version and refer to it as "EDSR-standard". The experimental results for EDSR-standard are presented in Table 2. We compared our approach with two state-of-the-art methods, namely IBTM [13] and LMBN [18]. Our approach exhibited the best performance, surpassing these methods by a significant margin in the case of ×4 upsampling.

4.2 Ablation Study

Impact of SCE and AsymmBin

Our proposed method introduces a smoothness-controlled estimator (SCE) as a plug-in module for super-resolution (SR) models. To assess its effectiveness, we conducted ablation experiments on several models, as presented in Table 3. The results clearly demonstrate that incorporating SCE into the models leads to a significant improvement in performance.

Table 3. Ablation of SCE and AsymmBin.

AsymmBin      SCE  Set5 PSNR  Set14 PSNR  BSD100 PSNR  Urban100 PSNR
No            No   37.39      32.94       31.68        30.29
No            Yes  37.48      33.00       31.78        30.64
layer-wise    No   37.44      32.97       31.74        30.50
channel-wise  No   37.44      32.98       31.75        30.52
channel-wise  Yes  37.49      33.02       31.77        30.58

Table 4. Comparison of with and without BN.

Methods       Set5 PSNR  Set14 PSNR  BSD100 PSNR  Urban100 PSNR
w/o BN        37.39      32.93       31.68        30.29
w BN          37.47      32.96       31.72        30.41
w/o BN, ours  37.49      33.02       31.77        30.58

While our experiments indicate that AsymmBin also enhances performance, its effect is not as pronounced as that of SCE. Furthermore, the combination of SCE and AsymmBin achieves even better performance than the model utilizing SCE alone.

Comparison with Additional BN

As depicted in Fig. 1(a), the inclusion of extra batch normalization (BN) layers in binary super-resolution (SR) models can yield a substantial performance boost. To establish our method's superiority, we conducted a comparative experiment between our approach and a model augmented with BN layers following the convolutions. The results of this experiment are presented in Table 4. The first row of the table represents the performance of a BN-free model that replaces our gradient estimator with a straight-through estimator (STE). The second row corresponds to the model incorporating BN layers and utilizing STE. The model with BN layers achieves better performance across the benchmarks. Moreover, our method surpasses the BN-augmented model, particularly on benchmarks characterized by higher complexity, such as BSD100 and Urban100.

Fig. 5. Visualization of ×2 upscaling on three LR images.

4.3 Visualization

We carefully selected one low-resolution (LR) image from each of the Set14, BSD100, and Urban100 benchmarks to visualize and compare the performance of our method with two other conventional binarization methods, namely ReCU and AdaBin. The results are presented in Fig. 5, where the three LR images are arranged in three rows. Our method achieves higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values compared to ReCU and AdaBin. Additionally, the visual quality of our method closely resembles that of the full-precision model, which attains the highest PSNR and SSIM values.

5 Conclusion

In this paper, we conducted a comprehensive analysis of the impact of batch normalization (BN) layers on binary neural network (BNN)-based super-resolution (SR) models. Our findings indicate that while BN layers can reduce the gradient of latent weights in BNNs and improve their performance, they may not be suitable for SR models, as previously suggested in works such as [19, 38]. To address this issue, we proposed a general binarization method that eliminates the reliance on BN layers by employing the channel-wise asymmetric binarizer (AsymmBin) and the smoothness-controlled estimator (SCE). Our method demonstrates superior performance compared to models incorporating BN layers. We evaluated the effectiveness of our method on three classic SR models, including VDSR [14] and EDSR [19], and surpassed the state-of-the-art methods [13, 18, 33].

References 1. Bevilacqua, M., Roumy, A., Guillemot, C., Morel, M.L.A.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: British Machine Vision Conference (BMVC) (2012) 2. Chen, B., et al.: Arm: any-time super-resolution method. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part XIX, pp. 254–270. Springer (2022). https://doi.org/10.1007/978-3-031-19800-7 3. Chen, T., Zhang, Z., Ouyang, X., Liu, Z., Shen, Z., Wang, Z.: “bnn-bn=?”: training binary neural networks without batch normalization. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), pp. 4619–4629 (2021) 4. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 38(2), 295–307 (2015) 5. Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 391–407. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6 25 6. Gao, G., Li, W., Li, J., Wu, F., Lu, H., Yu, Y.: Feature distillation interaction weighting network for lightweight image super-resolution. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), vol. 36, pp. 661–669 (2022) 7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)


8. Helwegen, K., Widdicombe, J., Geiger, L., Liu, Z., Cheng, K.T., Nusselder, R.: Latent weights do not exist: Rethinking binarized neural network optimization. In: Advances in Neural Information Processing Systems (NeurIPS) 32 (2019) 9. Hong, C., Baik, S., Kim, H., Nah, S., Lee, K.M.: Cadyq: content-aware dynamic quantization for image super-resolution. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part VII, pp. 367–383. Springer (2022). https://doi.org/10.1007/978-3-031-20071-7 22 10. Hong, C., Kim, H., Baik, S., Oh, J., Lee, K.M.: Daq: channel-wise distribution-aware quantization for deep image super-resolution networks. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2675–2684 (2022) 11. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708 (2017) 12. Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed selfexemplars. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5197–5206 (2015) 13. Jiang, X., Wang, N., Xin, J., Li, K., Yang, X., Gao, X.: Training binary neural network without batch normalization for image super-resolution. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 1700–1707 (2021) 14. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp. 1646–1654 (2016) 15. Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp. 4681–4690 (2017) 16. Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4681–4690 (2017) 17. Li, H., et al.: PAMS: quantized super-resolution via parameterized max scale. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 564–580. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2 34 18. Li, K., et al.: Local means binary networks for image super-resolution. IEEE Trans. Neural Netw. Learn. Syst. (TNNLS) (2022) 19. Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 136–144 (2017) 20. Lin, M., et al.: Rotated binary neural network. Adv. Neural Inform. Process. Syst. (NeurIPS) 33, 7474–7485 (2020) 21. Liu, Z., Shen, Z., Li, S., Helwegen, K., Huang, D., Cheng, K.T.: How do adam and training strategies help bnns optimization. In: International Conference on Machine Learning (ICML), pp. 6936–6946. PMLR (2021) 22. Liu, Z., et al.: Bi-Real Net: enhancing the performance of 1-Bit CNNs with improved representational capability and advanced training algorithm. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 747–763. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0 44 23. Ma, Y., Xiong, H., Hu, Z., Ma, L.: Efficient super resolution using binarized neural network. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2019) 24. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 416–423 (2001)


25. Qin, H., et al.: Forward and backward information retention for accurate binary neural networks. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), pp. 2250–2259 (2020) 26. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: imagenet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/ 10.1007/978-3-319-46493-0 32 27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 28. Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast examplebased super-resolution. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1920–1927 (2013) 29. Tipping, M., Bishop, C.: Bayesian image super-resolution. In: Advances in Neural Information Processing Systems (NeurIPS) 15 (2002) 30. Tu, Z., Chen, X., Ren, P., Wang, Y.: Adabin: improving binary neural networks with adaptive binary sets. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part XI. pp. 379–395. Springer (2022). https://doi.org/10. 1007/978-3-031-20083-0 23 31. Wang, P., He, X., Cheng, J.: Toward accurate binarized neural networks with sparsity for mobile application. IEEE Trans. Neural Netw. Learn. Syst. (TNNLS) (2022) 32. Wang, Z., Wu, Z., Lu, J., Zhou, J.: Bidet: an efficient binarized object detector. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), pp. 2049–2058 (2020) 33. Xin, J., Wang, N., Jiang, X., Li, J., Huang, H., Gao, X.: Binarized neural network for single image super resolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 91–107. Springer, Cham (2020). https://doi.org/10.1007/9783-030-58548-8 6 34. Xu, S., et al.: Ida-det: an information discrepancy-aware distillation for 1-bit detectors. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part XI, pp. 346–361. Springer (2022). https://doi.org/10.1007/978-3031-20083-0 21 35. Xu, S., Zhao, J., Lu, J., Zhang, B., Han, S., Doermann, D.: Layer-wise searching for 1-bit detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5682–5691 (2021) 36. Xu, Z., et al.: Recu: reviving the dead weights in binary neural networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5198–5208 (2021) 37. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image superresolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2472–2481 (2018) 38. Zhong, Y., et al.: Dynamic dual trainable bounds for ultra-low precision super-resolution networks. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part XVIII, pp. 1–18. Springer (2022). https://doi.org/10. 1007/978-3-031-19797-0 1 39. Zhu, Y., Zhang, Y., Yuille, A.L.: Single image super-resolution using deformable patches. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2917–2924 (2014)

Infrared and Visible Image Fusion via Test-Time Training

Guoqing Zheng1, Zhenqi Fu1, Xiaopeng Lin1, Xueye Chu1, Yue Huang1,2, and Xinghao Ding1,2(B)

1 School of Informatics, Xiamen University, Xiamen, China
[email protected]
2 Institute of Artificial Intelligence, Xiamen University, Xiamen, China

Abstract. Infrared and visible image fusion (IVIF) is a widely used technique in instrument-related fields. It aims at extracting contrast information from the infrared image and texture details from the visible image and combining these two kinds of information into a single image. Most auto-encoder-based methods train the network on natural images, such as MS-COCO, and test the model on IVIF datasets. This kind of method suffers from domain shift issues and cannot generalize well in real-world scenarios. To this end, we propose a self-supervised test-time training (TTT) approach to facilitate learning a better fusion result. Specifically, a new self-supervised loss is developed to evaluate the quality of the fusion result. This loss function directs the network to improve the fusion quality by optimizing model parameters with a small number of iterations in the test time. Besides, instead of manually designing fusion strategies, we leverage a fusion adapter to automatically learn fusion rules. Experimental comparisons on two public IVIF datasets validate that the proposed method outperforms existing methods subjectively and objectively.

Keywords: Infrared image · Image fusion · Domain shift · Test-time training · Deep learning

1 Introduction

Infrared images (IR) contain thermal radiation information, and visible images (VI) have rich texture details. Infrared and visible image fusion (IVIF) aims to combine meaningful and complete information from images captured by visible and infrared sensors. As a result, the generated image contains richer information and is more favorable for subsequent computer vision tasks. IVIF techniques have been widely applied in object tracking and detection, urban security, and vehicle navigation [2,32]. In the past decades, many methods have been developed to fuse IR-VI images. Typical traditional methods are multi-scale transform-based (MST) methods and representation learning-based (RL) approaches. MST methods [25] use a specific transformation model to extract multi-scale features and manually design rules to fuse images. RL approaches include sparse representation (SR) [29],


Fig. 1. (a) Visualization of image features using t-SNE [18]. Image features are generated by ResNet18 pre-trained on ImageNet [3]. These features are clustered in different centers, indicating an apparent discrepancy between the distribution of MS-COCO [11] and IVIF datasets. (b) The examples in MS-COCO. (c) and (d) are examples of IR and VI images in the TNO [24] dataset and VIFB [33] dataset, respectively.

joint sparse representation (JSR) [31], low-rank representation (LRR) [14], and latent low-rank representation (LatLRR) [13]. Similar to MST methods, the fusion rules in RL approaches often require manual design, which may degrade the fusion performance because source images are complex and diverse. Recently, various deep learning-based solutions for IVIF have been presented, such as auto-encoder-based (AE) methods [6–8,26], convolutional neural network-based (CNN) methods [10], generative adversarial network-based (GAN) approaches [5], and transformer-based [23] solutions. Deep learning-based methods leverage the powerful nonlinear fitting ability of deep neural networks to make the fused images have the desired distribution. As a result, deep learning-based methods provide promising results compared with traditional approaches. However, deep learning-based approaches still struggle with domain shift problems, especially AE-based solutions that train the fusion network on MS-COCO and test the model on IR-VI images.


The MS-COCO and IR-VI datasets have different characteristics, and the domain distribution difference between them is significant. For demonstration, we visualize the domain shift in Fig. 1.

Test-time training (TTT) techniques [22] are proposed to deal with the domain shift problem and promote the model performance for each test instance. TTT updates the network parameters before predicting, and the optimization is based on a specific self-supervised loss or task. For example, Liu et al. [15] proposed a TTT strategy that employs an auxiliary network to help the dehazing model better adapt to the domain of interest. In [4], masked auto-encoders were explored to address the one-shot learning problem, and this method improved generalization performance in various vision benchmarks.

In this paper, we propose a self-supervised TTT method that updates model parameters during the testing phase to improve the AE-based methods for IVIF. Concretely, we propose a new self-supervised loss based on a mutual attention mechanism to guide the network optimization in the test time. Moreover, we also propose a fusion adapter to automatically learn fusion rules instead of manually designing fusion strategies. Our contributions can be summarized as follows:

(1) We apply TTT strategies to improve the generalization performance of AE-based IVIF methods. A mutual attention mechanism loss is designed for self-supervised optimization.
(2) We propose a fusion adapter to learn and fuse features from different source images adaptively.
(3) Extensive experiments on two public IVIF datasets demonstrate that with a small number of iterations, the proposed method outperforms the state-of-the-art methods.

2 Method

2.1 Overall Framework

As illustrated in Fig. 2, AE-based algorithms learn feature representations from RGB images in a self-supervised manner. Handcrafted fusion strategies are then adopted at test time to fuse IR-VI images, while both the encoder and the decoder remain fixed. Obviously, AE-based solutions have poor generalization performance because significant domain gaps exist between training and testing data. In contrast, the proposed self-supervised TTT method updates the network parameters for each test sample to generate a better fusion image. Besides, rather than manually designing fusion strategies, we design an adapter to fuse the two images. The whole network, including the encoder, the decoder, and the adapter, is optimized end-to-end during test-time training with few iterations. Note that this paper does not focus on designing sophisticated network structures; the encoder and decoder can be any existing well-designed models.

2.2 Training and Testing

Training-Time Training. Assume that we have a large-scale dataset with training instances X1, X2, ..., Xn drawn i.i.d. from a distribution P.


Fig. 2. Illustration of the difference between the existing AE-based IVIF methods and the proposed TTT method. (a) The AE-based solutions train the fusion network on MS-COCO and (b) test the network on IR-VI images by involving a handcrafted fusion strategy. Instead of performing (b), we propose (c) a self-supervision test-time training loss for updating the model parameters, and use a fusion adapter module to extract and fuse features of source images adaptively.

We aim to train an encoder fθ and a decoder gθ to learn effective feature representations. Specifically, we train the encoder and decoder in a self-supervised manner by minimizing the following commonly used loss function Ltr:

gθ, fθ = argmin_{fθ,gθ} (1/n) Σ_{i=1}^{n} Ltr(Xi, gθ(fθ(Xi))),    (1)

Ltr = Lpixel + LSSIM,    (2)

Lpixel = ‖Xi − X̂i‖²_F,    (3)

LSSIM = 1 − SSIM(Xi, X̂i),    (4)

X̂i = gθ(fθ(Xi)),    (5)


Fig. 3. (a) Overview of the proposed loss function. (b) The high-frequency enhanced module (HFEM). C refers to the concatenation operation, and S denotes the softmax function.

where Lpixel indicates the pixel-level loss and LSSIM denotes the structure similarity loss. ‖·‖_F indicates the Frobenius norm. Xi and X̂i denote the input and reconstructed images, respectively. SSIM [27] represents the structure similarity measurement.

Test-Time Training. Let I refer to the IR-VI paired dataset with test samples (I_ir^1, I_vi^1), ..., (I_ir^m, I_vi^m) drawn i.i.d. from a distribution Q. Since the distribution Q may be significantly different from P, we develop a self-supervised loss function to reduce the domain gap and promote the fusion performance. On each test input (I_ir, I_vi), we perform test-time training to minimize the following loss:

g*θ, h*θ, f*θ = argmin_{gθ,hθ,fθ} LTTT(I_ir, I_vi, I_fu),    (6)

where g*θ, h*θ, f*θ denote the optimized networks that will be used to generate the final result. I_fu is the initial fusion result obtained via the pre-trained encoder fθ, decoder gθ, and an initial adapter hθ:

I_fu = gθ(hθ(fθ(I_ir), fθ(I_vi))),    (7)

To combine meaningful information from the source images, the self-supervised reconstruction loss LTTT is calculated based on enhanced versions of the source images. As shown in Fig. 3, the original IR and VI images are first input to a high-frequency enhanced module (HFEM) to compute enhanced versions I*_ir and I*_vi. Then, we propose a mutual attention mechanism that concatenates I*_ir and I*_vi to generate two attention masks, which can be expressed as:

m1, m2 = Softmax(Concat(I*_ir, I*_vi)),    (8)

These two attention masks force the network to pay more attention to the high-frequency information of the source images, and thus the fusion result will contain rich texture details. With the guidance of the attention masks, our LTTT can be defined as follows:

LTTT(I_ir, I_vi, I_fu) = m1 · ‖I_fu − I*_ir‖²_F + m2 · ‖I_fu − I*_vi‖²_F,    (9)
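The sketch below illustrates one way Eqs. (8)-(9) can be computed in PyTorch. The Laplacian-based HFEM stand-in, the single-channel image shapes, and the element-wise weighting of the squared Frobenius norm are assumptions made for illustration; the actual HFEM in Fig. 3 may differ.

```python
import torch
import torch.nn.functional as F

def high_freq_enhance(img):
    """Hypothetical HFEM: a Laplacian high-pass response added back to the image."""
    kernel = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
    kernel = kernel.view(1, 1, 3, 3).to(img)
    return img + F.conv2d(img, kernel, padding=1)

def ttt_loss(fused, ir, vi):
    """Self-supervised TTT loss: mutual attention masks from the enhanced
    sources weight the reconstruction error toward high-frequency regions."""
    ir_e, vi_e = high_freq_enhance(ir), high_freq_enhance(vi)
    masks = torch.softmax(torch.cat([ir_e, vi_e], dim=1), dim=1)  # m1, m2
    m1, m2 = masks[:, :1], masks[:, 1:]
    return (m1 * (fused - ir_e) ** 2 + m2 * (fused - vi_e) ** 2).sum()
```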

Note that, for each test pair, the gradient-based optimization of Eq. (6) always starts from fθ, gθ, and hθ. Following [4], we discard g*θ, h*θ, f*θ after producing the fused result for each test input (I_ir, I_vi) and reset the weights to fθ, gθ, and hθ.

Adapter. Instead of manually designing fusion rules, the proposed method uses a learnable adapter to fuse the features of the two source images. This benefits our TTT framework, as it allows the network to update its parameters at test time. The calculation of the proposed adapter hθ is defined as follows:

H = Conv(ReLU(Conv(I_co))),    (10)

I_co = Concat(fθ(I_ir), fθ(I_vi)),    (11)

where H is the output feature of the adapter, I_co is the concatenation of the features generated by the encoder, Conv(·) is a 2D convolutional layer, and ReLU(·) is the nonlinear activation function. Our adapter has two convolutional layers and one activation function layer. During the TTT process, the adapter parameters are updated along with the encoder and decoder to further adapt to the data distribution of the test samples.
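The adapter and the per-pair test-time update can be sketched as below. This is a minimal sketch that assumes a pretrained DenseFuse-style encoder/decoder; the channel width and the helper names are assumptions, and `loss_fn` would be the L_TTT of Eq. (9), e.g. the `ttt_loss` sketch above.

```python
import copy
import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Two conv layers with a ReLU in between, fusing the concatenated encoder features
    def __init__(self, feat_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, f_ir, f_vi):
        return self.fuse(torch.cat([f_ir, f_vi], dim=1))  # Eqs. (10)-(11)

def test_time_fuse(encoder, decoder, adapter, loss_fn, i_ir, i_vi, steps=5, lr=1e-4):
    # Copy the pretrained weights so every test pair starts from f_theta, g_theta, h_theta
    # and the adapted weights are discarded afterwards, as described in the paper.
    enc, dec, ada = map(copy.deepcopy, (encoder, decoder, adapter))
    params = list(enc.parameters()) + list(dec.parameters()) + list(ada.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        i_fu = dec(ada(enc(i_ir), enc(i_vi)))   # Eq. (7)
        loss = loss_fn(i_ir, i_vi, i_fu)        # Eq. (6)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return dec(ada(enc(i_ir), enc(i_vi)))
```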

3 Experiments

3.1 Experiment Configuration

Baseline Model. In this work, we focus on the problem of domain shifts in IVIF, especially for AE-based solutions. Therefore, we select several representative AE-based methods as our baseline models. Specifically, we choose DenseFuse [6] as our default baseline model. We also conduct experiments on other AE-based solutions [7,26] to verify the effectiveness of the proposed method in the ablation study.

Training Dataset. During the training-time training step, similar to existing AE-based networks, the encoder and the decoder are pre-trained on the MS-COCO [11] dataset, which consists of more than 80,000 daily RGB images. All images are converted to grayscale and resized to 256 × 256.


Testing Datasets. This study validates the proposed method on two public IVIF datasets. A total of 42 pairs of images are utilized. The TNO test dataset [24] contains 21 infrared and visible image pairs of military scenes. The remaining 21 pairs are from the VIFB [33] dataset with diverse environments, including indoor, outdoor, low-light, and over-exposure scenes.

Implementation Details. During the training-time training process, we use the same number of epochs, batch size, learning rate, and optimizer as in DenseFuse [6]. For the test-time training, Adam is used as the optimizer with a learning rate of 1e-4. We update the network with five iterations for each test sample during test-time training. All experiments are conducted on the PyTorch platform, using a GeForce RTX 3090 GPU with 24 GB memory.

Evaluation Metrics. For quantitative analysis, we adopt five metrics to evaluate our method and the comparative approaches, including mutual information (MI) [19], information entropy (EN) [20], the sum of the correlations of differences (SCD) [1], multiscale structural similarity (MS-SSIM) [28], and visual information fidelity (VIFF) [21].

Comparison Methods. To verify the effectiveness of our proposed method, we compare it with nine existing state-of-the-art algorithms, including two traditional methods, gradient transfer fusion (GTF) [16] and MDLatLRR [9], and seven deep-learning-based methods: DeepFuse [8], DenseFuse [6], NestFuse [7], U2Fusion [30], DDcGAN [17], Res2Fusion [26] and TLCM [12].

3.2 Performance Comparison on TNO

The qualitative results on the TNO dataset are shown in Fig. 4. All methods can fuse the infrared image's thermal radiation information and the visible image's structure information, but our proposed method achieves the best visual quality with sharper texture details. Visually, the details of GTF, MDLatLLR, DeepFuse, U2Fusion, and TLCM are not clear enough, and the texture details of the landing gear are blurred. The fused image of DDcGAN is ambiguous and suffers from distortion. Although DenseFuse, NestFuse, and Res2Fusion can fuse more thermal radiation information, their textures and details are degraded. Our method introduces TTT into the AE-based method, which preserves clear texture information and better balances the two kinds of information of the source images. The quantitative results on the TNO dataset are shown in Table 1. As can be seen, our method achieves promising performance according to the five measurements. Specifically, the comparison between our method and DenseFuse demonstrates that the proposed TTT can significantly improve the fusion performance. Besides, our method achieves the best MS-SSIM and SCD, demonstrating the superiority of the proposed method.


Fig. 4. Qualitative comparison of the proposed method with the state-of-the-art methods on “helicopter” on the TNO dataset. The red and green boxes show the tail and landing gear of the helicopter, respectively. (Color figure online)

Table 1. Comparisons of different methods on the TNO and VIFB dataset. Red indicates the best result, and blue represents the second best result.

Methods      |                  TNO                   |                  VIFB
             | EN     MI      MS-SSIM  SCD    VIFF    | EN     MI      MS-SSIM  SCD    VIFF
GTF          | 6.590  13.181  0.812    0.908  0.595   | 6.545  13.091  0.809    0.802  0.314
DeepFuse     | 6.438  12.876  0.881    1.473  0.681   | 6.694  13.388  0.887    1.312  0.399
DenseFuse    | 6.451  12.902  0.869    1.484  0.686   | 6.982  13.965  0.919    1.497  0.520
MDLatLLR     | 6.489  12.979  0.904    1.489  0.710   | 6.759  13.518  0.913    1.329  –
NestFuse     | 6.930  13.961  0.858    1.578  0.926   | 6.921  13.841  0.893    1.455  0.861
U2Fusion     | 6.468  12.936  0.899    1.459  0.681   | 7.143  14.285  0.907    1.461  0.832
DDcGAN       | 7.479  14.957  0.777    1.450  0.714   | 7.484  14.968  0.789    1.291  –
Res2Fusion   | 6.764  13.529  0.866    1.727  0.818   | 6.851  13.702  0.862    1.372  –
TLCM         | 6.957  13.914  0.917    1.643  0.785   | 6.947  13.895  0.917    1.458  –
Ours         | 6.796  13.593  0.936    1.887  0.783   | 7.046  14.092  0.946    1.692  0.825

3.3 Performance Comparison on VIFB

We further evaluate our method on the VIFB dataset. Qualitative results are shown in Fig. 5. As can be observed, for the brightness of the pedestrian in the red box and the detail of the car in the green box, our method preserves rich thermal radiation information and texture details. In addition, our method can effectively balance the global brightness of the image. The quantitative results on the VIFB dataset are shown in Table 1. The proposed method achieves the best MS-SSIM and SCD scores. Our method can largely improve the performance of DenseFuse with the proposed TTT strategy.


Fig. 5. Qualitative comparison of the proposed method with the state-of-the-art methods on “walking2” on the VIFB dataset. The red and green boxes show the pedestrian and car, respectively. (Color figure online)

Table 2. Ablation studies on the impact of the TTT and adapter on the TNO dataset. The best is marked in bold.

Model       TTT  Adapter  EN     MI      MS-SSIM  SCD    VIFF
DenseFuse   ×    ×        6.571  13.143  0.788    1.592  0.940
            ✓    ×        6.787  13.574  0.929    1.846  0.782
            ✓    ✓        6.797  13.593  0.936    1.887  0.783
NestFuse    ×    ×        6.892  13.785  0.879    1.758  0.924
            ✓    ×        6.796  13.592  0.929    1.849  0.787
            ✓    ✓        6.730  13.459  0.929    1.881  0.786
Res2Fusion  ×    ×        6.764  13.529  0.866    1.727  0.878
            ✓    ×        6.770  13.540  0.924    1.842  0.779
            ✓    ✓        6.790  13.580  0.925    1.849  0.790

3.4 Ablation Study

Fig. 6. Influence of the number of iterations in the TTT stage on the results of IR-VI image fusion. The quality of the fused images can be enhanced by the proposed TTT after a small number of iterations.

The Effectiveness of TTT. The proposed method can be added to most existing AE-based IVIF methods. Here, we choose DenseFuse, NestFuse, and Res2Fusion as the baseline methods to verify the proposed method. During the training-time training phase, MS-COCO is employed to train the fusion model, and the hyperparameter settings are consistent with each original baseline method. The experimental results of different baseline methods with and without TTT are listed in Table 2. From the table, one can observe that TTT greatly improves performance.

The Effectiveness of the Fusion Adapter. To understand the role of the adapter, we replace the adapter with feature averaging, denoted as "without adapter". Test results are reported in Table 2 and indicate that the proposed adapter is better than handcrafted fusion rules. This is because our adapter is learnable and can handle diverse and complex real-world scenarios.

Ablation Studies of the Iteration. We investigate the impact of the number of iterations in the TTT stage. As shown in Fig. 6, the visual quality of the fused image gradually improves as the iterations increase. In our experiments, continuing to increase the number of iterations yields outputs of varying quality, but five updates already achieve promising results. Considering both run-time and performance, we set the default number of iterations to five.

4 Conclusion

In this paper, we focus on developing test-time training methods to improve the generalization performance of AE-based networks. We propose a self-supervision loss, based on a mutual attention mechanism, to drive the updating of the model parameters at test time. Additionally, a fusion adapter module is proposed to adaptively fuse the features of the two source images. Extensive experiments and ablation studies strongly support the proposed method.

Acknowledgements. The work was supported in part by the National Natural Science Foundation of China under Grants 82172033, U19B2031, 61971369, 52105126, 82272071, 62271430, and the Fundamental Research Funds for the Central Universities 20720230104.

References 1. Aslantas, V., Bendes, E.: A new image quality metric for image fusion: the sum of the correlations of differences. Aeu-Inter. J. Electr. Commun. 69(12), 1890–1896 (2015) 2. Das, S., Zhang, Y.: Color night vision for navigation and surveillance. Transp. Res. Rec. 1708(1), 40–46 (2000) 3. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Ieee (2009) 4. Gandelsman, Y., Sun, Y., Chen, X., Efros, A.: Test-time training with masked autoencoders. Adv. Neural. Inf. Process. Syst. 35, 29374–29385 (2022) 5. Gao, Y., Ma, S., Liu, J.: Dcdr-gan: a densely connected disentangled representation generative adversarial network for infrared and visible image fusion. IEEE Trans. Circ. Syst. Video Technol. (2022) 6. Li, H., Wu, X.J.: Densefuse: a fusion approach to infrared and visible images. IEEE Trans. Image Process. 28(5), 2614–2623 (2018) 7. Li, H., Wu, X.J., Durrani, T.: Nestfuse: an infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Trans. Instrum. Meas. 69(12), 9645–9656 (2020) 8. Li, H., Wu, X.J., Kittler, J.: Infrared and visible image fusion using a deep learning framework. In: 2018 24th International Conference On Pattern Recognition (ICPR), pp. 2705–2710. IEEE (2018) 9. Li, H., Wu, X.J., Kittler, J.: Mdlatlrr: a novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 29, 4733–4746 (2020) 10. Li, Q., et al.: A multilevel hybrid transmission network for infrared and visible image fusion. IEEE Trans. Instrum. Meas. 71, 1–14 (2022) 11. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 12. Lin, X., Zhou, G., Tu, X., Huang, Y., Ding, X.: Two-level consistency metric for infrared and visible image fusion. IEEE Trans. Instrum. Meas. 71, 1–13 (2022) 13. Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 171–184 (2012) 14. Liu, G., Lin, Z., Yu, Y.: Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 663–670 (2010)


15. Liu, H., Wu, Z., Li, L., Salehkalaibar, S., Chen, J., Wang, K.: Towards multi-domain single image dehazing via test-time training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5831–5840 (2022) 16. Ma, J., Chen, C., Li, C., Huang, J.: Infrared and visible image fusion via gradient transfer and total variation minimization. Inform. Fusion 31, 100–109 (2016) 17. Ma, J., Xu, H., Jiang, J., Mei, X., Zhang, X.P.: Ddcgan: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans. Image Process. 29, 4980–4995 (2020) 18. Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. J. Mach. Learn. Res. 9(11) (2008) 19. Piella, G.: A general framework for multiresolution image fusion: from pixels to regions. Inform. Fusion 4(4), 259–280 (2003) 20. Roberts, J.W., Van Aardt, J.A., Ahmed, F.B.: Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J. Appl. Remote Sens. 2(1), 023522 (2008) 21. Sheikh, H.R., Bovik, A.C.: Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–444 (2006) 22. Sun, Y., Wang, X., Liu, Z., Miller, J., Efros, A., Hardt, M.: Test-time training with self-supervision for generalization under distribution shifts. In: International Conference on Machine Learning, pp. 9229–9248. PMLR (2020) 23. Tang, W., He, F., Liu, Y.: Ydtr: infrared and visible image fusion via y-shape dynamic transformer. IEEE Trans. Multimedia (2022) 24. Toet, A.: The tno multiband image data collection. Data Brief 15, 249–251 (2017) 25. Vishwakarma, A.: Image fusion using adjustable non-subsampled shearlet transform. IEEE Trans. Instrum. Meas. 68(9), 3367–3378 (2018) 26. Wang, Z., Wu, Y., Wang, J., Xu, J., Shao, W.: Res2fusion: infrared and visible image fusion based on dense res2net and double nonlocal attention models. IEEE Trans. Instrum. Meas. 71, 1–12 (2022) 27. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004) 28. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers 2003, vol. 2, pp. 1398–1402. IEEE (2003) 29. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2008) 30. Xu, H., Ma, J., Jiang, J., Guo, X., Ling, H.: U2fusion: a unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 44(1), 502–518 (2020) 31. Zhang, Q., Fu, Y., Li, H., Zou, J.: Dictionary learning method for joint sparse representation-based image fusion. Opt. Eng. 52(5), 057006–057006 (2013) 32. Zhang, X., Demiris, Y.: Visible and infrared image fusion using deep learning. IEEE Trans. Pattern Anal. Mach. Intell. (2023) 33. Zhang, X., Ye, P., Xiao, G.: Vifb: a visible and infrared image fusion benchmark. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 104–105 (2020)

Graph-Based Dependency-Aware Non-Intrusive Load Monitoring

Guoqing Zheng1, Yuming Hu2, Zhenlong Xiao1,2(B), and Xinghao Ding1,2
1 School of Informatics, Xiamen University, Xiamen, China
[email protected]
2 Institute of Artificial Intelligence, Xiamen University, Xiamen, China

Abstract. Non-intrusive load monitoring (NILM) is able to analyze and predict users' power consumption behaviors, and thus to further improve the power consumption efficiency of the grid. Neural network-based techniques have been developed for NILM. However, the dependencies among multiple appliances working simultaneously were ignored or only implicitly characterized in their disaggregation models. To improve the performance of NILM, we employ a graph structure to explicitly characterize the temporal dependencies among different appliances. Specifically, we consider the prior temporal knowledge of the appliances' working states and construct a weighted adjacency matrix to represent their dependencies. We also introduce hard dependencies for each appliance to prevent the weighted adjacency matrix from becoming too sparse. Furthermore, the non-sequential dependencies among appliances are learned by a graph attention network based on the weighted adjacency matrix. An encoder-decoder architecture based on dilated convolutions is developed for power estimation and state detection at the same time. We demonstrate the proposed model on the UKDALE dataset, where it outperforms several state-of-the-art NILM results. Keywords: Non-intrusive load monitoring · Graph structure learning · Graph neural network · Time series

1 Introduction

With the development of clean energy such as wind power and photovoltaics, power consumption capacity has attracted considerable interest in the literature, where load monitoring is an important technique since it can provide reasonable consumption suggestions to improve power consumption efficiency by analysing and predicting users' power consumption behaviours. Intrusive load monitoring requires a large number of sensors to collect data from various appliances, leading to very expensive hardware costs in real applications. Non-intrusive load monitoring [12] can identify and extract the power consumption of a single appliance from the aggregated total power consumption of the household, showing the advantages of low cost and preservation of users' privacy [1]. As shown in Fig. 1, NILM is a computational approach that achieves power estimation and state detection of individual appliances from a single aggregated meter in a smart grid system.


Fig. 1. The purpose of NILM is to separate the power demand of individual appliances from the aggregated signal and achieve state estimation. (a) The aggregated power signal, (b) the power of individual appliances for power estimation, and (c) the state of individual appliances for state detection.

Note that the energy consumption behaviours of various appliances are often not independent of each other. Many of them even have similar power consumption characteristics. Moreover, the aggregated signal may contain complicated noise, and appliances may be arbitrarily switched ON/OFF. These all bring difficulties to NILM [7]. HMM-based [16,17] and optimization-based [6,18] methods were developed for the NILM problem. However, they strongly rely on domain knowledge, and manual feature extraction may be required [11]. Deep neural networks (DNNs) were applied to learn features for NILM automatically, e.g., convolutional neural networks (CNNs) were studied in [13,21,23,24]. To further investigate the dependencies within the power consumption sequence, long short-term memory networks (LSTMs) [5,19] were employed. Recently, graph neural networks (GNNs) [2,15] have been applied to NILM to capture the correlation information between different states. However, these methods only focus on a single appliance, where each appliance trains a separate DNN model. This approach is computationally expensive and fails to consider the dependencies among multiple appliances. To address the above issue, the multi-task-multi-appliance NILM problem has been considered in [9], where power estimation and state detection are conducted simultaneously for multiple appliances. In [4], causal convolutions are introduced to improve the performance further. However, the dependencies are still only implicitly characterized in the feature space. Due to the superior ability of graph representation learning, graph structure learning has received extensive attention and has been widely used in multiple time series forecasting [20], traffic forecasting [25], anomaly detection [3], etc. GNNs have powerful feature learning abilities based on the graph structure. Therefore,


Fig. 2. Architecture of GRAD-NILM for non-intrusive load monitoring.

we consider the dependencies between appliances working simultaneously as the relationships among graph nodes, and propose the GRAph-based Dependency-aware method for NILM (GRAD-NILM). In particular, we characterize the temporal dependencies between appliances as a directed weighted adjacency matrix, i.e., the co-occurrence probability graph, where the dependencies of appliances are explicitly constructed. Then, a graph attention network (GAT) [22] is exploited to extract the non-sequential dependencies between appliances. The features learned by the GAT are used to generate masks for different appliances to reconstruct the aggregated signals. Based on the shared masks, both power estimation and state detection can be performed simultaneously by the proposed GRAD-NILM. The main contributions of this paper are summarized as follows:
(1) The co-occurrence probability graph is explicitly constructed to learn the temporal dependencies of appliances, and the non-sequential dependencies are learned via GAT, which facilitates NILM for multiple appliances.
(2) Appliances are represented by different masks, and the mask feature spaces are shared between different appliances in the GRAD-NILM model to improve the power disaggregation performance.
(3) Compared with the state-of-the-art multi-task-multi-appliance methods, GRAD-NILM does not need to perform quantile regression processing [9] on the dataset, yet performs better.
The rest of this paper is organized as follows. Section 2 presents the main results, and numerical studies are discussed in Sect. 3. Finally, Sect. 4 concludes this paper.

2 Proposed Method

In the following, we first introduce the problem formulation of the multi-taskmulti-appliances NILM task. Then, we present the proposed GRAD-NILM, which consists of graph structure learning, graph attention network-based representation learning, and the encoder-decoder module for NILM tasks. An overview of the proposed architecture is shown in Fig. 2.

2.1 Problem Formulation

The energy disaggregation in NILM is a blind source separation problem. It aims to estimate the power consumption P_n(t) of an appliance n and detect its states S_n(t) from the total aggregated meter X(t) for n = 1, 2, ..., N, where N indicates the number of appliances. The NILM problem can be formulated as follows:

X(t) = \sum_{n=1}^{N} P_n(t) \cdot S_n(t) + \varepsilon(t),   (1)

where ε(t) is the noise signal at time t, t = 1, 2, ..., T, and T is the total length of the data. By setting a sliding window with a length of L and a step size of β, the original data X ∈ R^{1×T}, P_n ∈ R^{1×T}, and S_n ∈ R^{1×T} are segmented into a total of M windows, where X ∈ R^{M×L} = {x^1, ..., x^M}, P_n ∈ R^{M×L} = {p_n^1, ..., p_n^M} and S_n ∈ R^{M×L} = {s_n^1, ..., s_n^M}. Notably, x^i ∈ R^{1×L}, p_n^i ∈ R^{1×L} and s_n^i ∈ R^{1×L}. For model prediction, given an input x(t), the goal is to calculate the power signals p̂(t) = {p̂_1(t), ..., p̂_N(t)} and the corresponding states ŝ(t) = {ŝ_1(t), ..., ŝ_N(t)}.
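As a small illustration of the segmentation step, the NumPy sketch below slices a length-T sequence into M overlapping windows of length L with step β; the function name and array layout are assumptions made for illustration.

```python
import numpy as np

def segment(series, window=128, step=32):
    """Slice a 1-by-T (or 1D) sequence into M overlapping windows of length L with step beta."""
    T = series.shape[-1]
    starts = range(0, T - window + 1, step)
    return np.stack([series[..., s:s + window] for s in starts], axis=0)  # shape (M, ..., L)
```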

2.2 Co-occurrence Probability Graph

Notations. A directed graph without self-loop edges is defined as G = (V, E), where V is the node set of G and E is the edge set. A ∈ R^{N×N} denotes the weighted adjacency matrix of G, and the (i, j)-th entry A_ij indicates the dependency between node i and node j. The size N of the adjacency matrix equals the number of appliances, and every node in G corresponds to a single appliance. The prior knowledge of the appliances' working states s = {s_1(t), ..., s_N(t)} is applied to construct the co-occurrence probability graph that characterizes the temporal dependencies of the appliances. The co-occurrence probability graph is defined by a weighted adjacency matrix Θ* ∈ R^{N×N} with entries

\theta_{ij}^{*} = \begin{cases} \dfrac{\Gamma_i^{e} - \Gamma_j^{b}}{L}, & i \neq j, \\ 0, & i = j, \end{cases}   (2)

where L is the length of the time-series window, Γ_i^e ∈ R^{1×L} represents the end of the working time of the i-th appliance, and Γ_j^b ∈ R^{1×L} is the beginning time of the j-th appliance. Since the co-occurrence probability graph generated by Eq. (2) is sparse, the hard dependency between appliances i and j is also considered, and we introduce a new adjacency matrix Θ' ∈ R^{N×N} with entries

\theta_{ij}' = \begin{cases} 1, & \text{if } \Gamma_i^{e} > 0 \text{ and } i \neq j, \\ 0, & \text{otherwise}. \end{cases}   (3)

Then, the ultimate co-occurrence probability graph Θ̂ ∈ R^{N×N} can be obtained from the weighted adjacency matrices as

\hat{\Theta} = R\left(Q(\Theta') + Q(\Theta^{*})\right),   (4)


where Q(·) is the quantile threshold function used to filter out abnormal instantaneous impulse responses, and R(r) = sigmoid(r − 1) is a rescaling function. The co-occurrence probability graph Θ̂ characterizes the relationships between co-working appliances. Such a representation of dependencies facilitates graph neural network-based NILM.
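A minimal sketch of how the co-occurrence probability graph of Eqs. (2)–(4) could be built from one window of ON/OFF states is given below. Interpreting Γ_i^e and Γ_j^b as the last/first ON indices within the window, and realizing Q(·) as clipping at a quantile, are assumptions for illustration only.

```python
import numpy as np

def cooccurrence_graph(states, q=0.95):
    """states: (N, L) binary ON/OFF matrix for one window.
    Builds Theta* (Eq. 2), Theta' (Eq. 3) and Theta_hat (Eq. 4)."""
    N, L = states.shape
    end = np.array([np.max(np.nonzero(s)[0]) + 1 if s.any() else 0 for s in states])    # Gamma^e
    begin = np.array([np.min(np.nonzero(s)[0]) if s.any() else 0 for s in states])      # Gamma^b

    theta_star = np.zeros((N, N))
    theta_hard = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:
                theta_star[i, j] = (end[i] - begin[j]) / L       # Eq. (2)
                theta_hard[i, j] = 1.0 if end[i] > 0 else 0.0    # Eq. (3)

    def Q(m):  # clip abnormal impulses at the q-quantile (stand-in for the paper's Q)
        return np.clip(m, None, np.quantile(m, q))

    return 1.0 / (1.0 + np.exp(-(Q(theta_hard) + Q(theta_star) - 1.0)))  # Eq. (4): R(r) = sigmoid(r - 1)
```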

2.3 Graph Structure Learning

As shown in Fig. 2, the adjacency matrix A can be obtained by sampling over the learned graph Θ, where the sampling function is a differentiable function that outputs discrete values between 0 and 1. Matrix A can be randomly parameterized by a Bernoulli distribution, that is, each entry A_ij is independent and A_ij ∼ Ber(Θ_ij). However, the gradient of Bernoulli sampling may make backpropagation-based end-to-end training infeasible. Hence, the Gumbel reparameterization strategy [14] is further employed by defining the parametrization of A as

A_{ij} = \sigma\!\left(\frac{\log\!\left(\dfrac{\Theta_{ij}}{1-\Theta_{ij}}\right) + \left(g_{ij}^{1} - g_{ij}^{2}\right)}{\tau}\right),   (5)

where σ(·) is the sigmoid function, g_ij^1, g_ij^2 ∼ Gumbel(0, 1) for all i, j, and τ is a temperature coefficient. When τ → 0, A_ij = 1 holds with probability Θ_ij, and A_ij = 0 otherwise. To calculate the graph Θ, we employ a feature extractor consisting of 1D convolution layers to generate a node feature matrix H ∈ R^{N×D}, H = {h_1, ..., h_N}, where h_i denotes the feature vector of appliance i with feature dimension D. Then, a predictor composed of two fully connected layers is applied to generate the link probability Θ_ij from the pair (h_i, h_j).
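The Gumbel reparameterization of Eq. (5) can be written in a few lines of PyTorch; the temperature value below is an assumed default, not the paper's setting.

```python
import torch

def gumbel_sample_adjacency(theta, tau=0.5, eps=1e-10):
    """Differentiable relaxation of A_ij ~ Ber(Theta_ij), Eq. (5)."""
    g1 = -torch.log(-torch.log(torch.rand_like(theta) + eps) + eps)  # Gumbel(0, 1)
    g2 = -torch.log(-torch.log(torch.rand_like(theta) + eps) + eps)
    logits = torch.log(theta + eps) - torch.log(1.0 - theta + eps)   # log(Theta / (1 - Theta))
    return torch.sigmoid((logits + g1 - g2) / tau)
```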

2.4 Graph Attention Neural Network

Driven by graph neural networks and attention mechanisms, GAT can assign different weights to adjacent nodes and is thus more effective for directed graph learning. In this section, GAT-based representation learning is built to extract the dependencies between different appliances. In this paper, the attention score between appliances v_i and v_j in GAT is computed by

\alpha_{ij} = \frac{\exp\!\left(\phi\!\left(a^{T}[W h_i \,\|\, W h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i}\exp\!\left(\phi\!\left(a^{T}[W h_i \,\|\, W h_k]\right)\right)},   (6)

where hi ∈ H is the feature vector of appliance vi , || is the concatenation operation, and φ(·) is the LeakyReLU nonlinear function. Ni indicates the neighborhood set of appliance vi in graph G, W is a linear weight matrix, and a is the parameter of the fully connected layer to calculate the dependency between appliances vi and vj .


The output feature h_i' can be obtained by computing the node-level average of the K-head GAT, i.e.,

h_i' = \varphi\!\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\right),   (7)

where ϕ(·) is the ReLU nonlinear function and W^k is the k-th weight matrix. h_i' aggregates the feature information of neighboring nodes through GAT. As a result, the operation of Eq. (7) is performed on each node in the graph to obtain the graph-level feature representation H'. Such non-sequential dependencies of different appliances contribute to the model's generalization for NILM.
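A dense multi-head GAT layer matching Eqs. (6)–(7) can be sketched as follows; the dense adjacency-matrix formulation and the layer sizes are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGAT(nn.Module):
    """Minimal dense multi-head GAT over the sampled adjacency A, with node-level head averaging."""
    def __init__(self, in_dim, out_dim, heads=2):
        super().__init__()
        self.W = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False) for _ in range(heads)])
        self.a = nn.ModuleList([nn.Linear(2 * out_dim, 1, bias=False) for _ in range(heads)])

    def forward(self, h, adj):                       # h: (N, D), adj: (N, N) with values in [0, 1]
        n = h.size(0)
        outs = []
        for W, a in zip(self.W, self.a):
            wh = W(h)                                # (N, F)
            pair = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                              wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(a(pair).squeeze(-1))    # phi(a^T [W h_i || W h_j])
            e = e.masked_fill(adj <= 0, -1e9)        # restrict attention to the neighbourhood N_i
            alpha = torch.softmax(e, dim=-1)         # Eq. (6)
            outs.append(alpha @ wh)                  # sum_j alpha_ij W h_j
        return torch.relu(torch.stack(outs).mean(0))  # Eq. (7): head average + ReLU
```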

2.5 Encoder-Decoder Module

In order to embed the original signal into the feature space, we use an encoder to generate the feature F ∈ R^{L×D} from the aggregated signal x^i ∈ R^{1×L}. The sequential dependencies of different appliances are characterized by enlarging the receptive field using dilated convolution layers in the encoder. The encoder is expressed as

F = \phi\!\left(x^{i} \cdot E^{\top}\right),   (8)

where E ∈ R^{D×1} is the dilated convolution parameter of the encoder, (·)^⊤ is the transpose operation, and φ(·) is the PReLU nonlinear function. A mask generator composed of 1D convolution layers is used to estimate the N masks m_i ∈ R^{L×D}, i = 1, ..., N, whose main purpose is to extract each appliance from the mixed feature using the GAT output H'. By applying the masks m_i to the mixture representation F, the reconstructed representation for each appliance z_i ∈ R^{L×D} is defined as

z_i = F \odot m_i,   (9)

where ⊙ denotes element-wise multiplication. z_i can be decoded using two different parameter-sharing decoders as

\hat{s}_i = z_i \cdot \Omega,   (10)

\hat{p}_i = z_i \cdot \Lambda,   (11)

where Λ, Ω ∈ R^{N×L} are the deconvolution filters of the decoders based on dilated convolution layers, p̂^i = {p̂_1^i, ..., p̂_N^i}, and ŝ^i = {ŝ_1^i, ..., ŝ_N^i}. Therefore, p̂^i ∈ R^{N×L} and ŝ^i ∈ R^{N×L} are the final results of power estimation and state detection from the input aggregated power signal x^i, respectively.
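The encoder-mask-decoder pipeline of Eqs. (8)–(11) could look roughly like the sketch below. The kernel sizes, dilation rates, and the way the GAT features are broadcast over time are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MaskedEncoderDecoder(nn.Module):
    """Sketch: dilated-conv encoder, per-appliance masks from GAT features, two decoders."""
    def __init__(self, n_app=5, feat_dim=15):
        super().__init__()
        self.encoder = nn.Sequential(                                   # Eq. (8): x -> F
            nn.Conv1d(1, feat_dim, 3, padding=2, dilation=2), nn.PReLU())
        self.mask_gen = nn.Conv1d(n_app * feat_dim, n_app * feat_dim, 1)
        self.state_dec = nn.Conv1d(feat_dim, 1, 3, padding=2, dilation=2)  # Eq. (10)
        self.power_dec = nn.Conv1d(feat_dim, 1, 3, padding=2, dilation=2)  # Eq. (11)
        self.n_app, self.feat_dim = n_app, feat_dim

    def forward(self, x, gat_feat):
        # x: (B, 1, L) aggregated signal; gat_feat: (B, N, D) appliance features from the GAT
        f_mix = self.encoder(x)                                          # (B, D, L)
        m = self.mask_gen(gat_feat.reshape(x.size(0), -1, 1).expand(-1, -1, x.size(-1)))
        m = torch.sigmoid(m).view(x.size(0), self.n_app, self.feat_dim, -1)
        z = f_mix.unsqueeze(1) * m                                       # Eq. (9): z_i = F ⊙ m_i
        z = z.flatten(0, 1)                                              # (B*N, D, L)
        s_hat = torch.sigmoid(self.state_dec(z)).view(x.size(0), self.n_app, -1)
        p_hat = self.power_dec(z).view(x.size(0), self.n_app, -1)
        return p_hat, s_hat
```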

3 Numerical Studies and Discussions

3.1 Dataset and Experiment Setup

Dataset. The proposed method is evaluated on the public UKDALE [10] dataset. The UKDALE dataset comprises sub-metered power consumption readings with a resolution of 1/6 Hz and aggregate power consumption readings with a resolution of 1 Hz, gathered from five residential buildings in the UK.


In this experiment, we use the data of House 1 from January to March 2015 and select the kettle (KT), fridge (FRZ), dishwasher (DW), washing machine (WM) and microwave (MW) for disaggregation. The data preprocessing steps follow the configuration in [9]; the total aggregated data is obtained by adding the sum of the selected appliances to several other additional appliances.

Experimental Setup. We set the total duration T from January to March 2015, the input sequence length of the sliding window L = 128, and the sliding step β = 32 for the UKDALE dataset. In graph representation learning, the number of graph nodes N = 5 corresponds to the dependencies among the five appliances, the feature dimension of each node is D = 15, and the number of attention heads is K = 2 for GAT. We train the model for 80 epochs, use Adam as the optimizer, and set the initial learning rate to 1e-3. The loss function for GRAD-NILM is defined by

L = L_{MSE}(p_i, \hat{p}_i) + \lambda \cdot L_{CE}(s_i, \hat{s}_i) + \mu \cdot L_{CE}(\Theta, \hat{\Theta}),   (12)

L_{MSE}(p_i, \hat{p}_i) = \frac{1}{LN}\sum_{n=1}^{N}\sum_{l=1}^{L}\left(p_n^{i}(l) - \hat{p}_n^{i}(l)\right)^2,   (13)

L_{CE}(s_i, \hat{s}_i) = -\frac{1}{LN}\sum_{n=1}^{N}\sum_{l=1}^{L} s_n^{i}(l) \cdot \log\frac{\exp\!\left(\hat{s}_n^{i}(l)\right)}{\sum_{2}\exp\!\left(\hat{s}_n^{i}(l)\right)},   (14)

L_{CE}(\Theta, \hat{\Theta}) = -\sum_{n=1}^{N \times N}\left(\Theta_n \log \hat{\Theta}_n + (1 - \Theta_n)\log(1 - \hat{\Theta}_n)\right),   (15)

where λ = 0.1, μ = 0.1, and each appliance has states k ∈ {0, 1}, meaning the ON and OFF states, over which the denominator of Eq. (14) is summed.
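A minimal sketch of the multi-task objective of Eqs. (12)–(15), assuming two-way state logits and a predicted graph with values in (0, 1):

```python
import torch
import torch.nn.functional as F

def grad_nilm_loss(p, p_hat, s, s_logits, theta, theta_hat, lam=0.1, mu=0.1):
    """p, p_hat: (B, N, L) powers; s: (B, N, L) 0/1 states; s_logits: (B, N, L, 2)
    per-state logits; theta, theta_hat: (N, N) graph label and predicted graph."""
    l_power = F.mse_loss(p_hat, p)                                              # Eq. (13)
    l_state = F.cross_entropy(s_logits.reshape(-1, 2), s.long().reshape(-1))    # Eq. (14)
    l_graph = F.binary_cross_entropy(theta_hat.clamp(1e-6, 1 - 1e-6), theta)    # Eq. (15)
    return l_power + lam * l_state + mu * l_graph                               # Eq. (12)
```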

Comparison Methods. We compare our model with several state-of-the-art methods that were also developed for multi-task-multi-appliance NILM disaggregation on the same dataset:
(1) 1D-CNN. The model based on the CNN architecture presented in [8] consists of a four-stage CNN and three MLP layers.
(2) Unet-NILM [9]. A one-dimensional CNN based on the U-Net architecture, which considers dependencies among multiple appliances.
(3) Conv-NILM-Net [4]. A fully convolutional framework for end-to-end NILM; it is a causal model for multi-appliance source separation.


Table 1. Comparison with the state-of-the-art methods on the UKDALE dataset. ↑ stands for the higher the better, while ↓ stands for the lower the better. Best results are shown in bold.

Metric      Model             WM     DW     KT    FRZ    MW     Average
MAE ↓       1D-CNN            14.34  9.45   7.73  7.31   11.94  9.55
            Unet-NILM         13.01  10.90  6.21  11.94  5.05   9.42
            Conv-NILM-Net     10.08  2.58   5.51  6.11   5.96   6.05
            GRAD-NILM (Ours)  9.99   3.10   1.59  5.73   2.74   4.63
NDE ↓       1D-CNN            0.08   0.08   0.20  0.11   0.51   0.20
            Unet-NILM         0.07   0.06   0.15  0.23   0.23   0.15
            Conv-NILM-Net     0.05   0.02   0.16  0.10   0.36   0.09
            GRAD-NILM (Ours)  0.06   0.04   0.03  0.10   0.14   0.07
EAC ↑       1D-CNN            0.86   0.89   0.84  0.90   0.65   0.83
            Unet-NILM         0.87   0.87   0.87  0.84   0.80   0.85
            Conv-NILM-Net     0.90   0.97   0.88  0.92   0.76   0.89
            GRAD-NILM (Ours)  0.90   0.96   0.97  0.92   0.89   0.93
F1-Score ↑  1D-CNN            0.90   0.83   0.92  0.94   0.70   0.86
            Unet-NILM         0.90   0.76   0.94  0.91   0.87   0.88
            Conv-NILM-Net     0.90   0.97   0.83  0.94   0.78   0.88
            GRAD-NILM (Ours)  0.90   0.81   0.98  0.95   0.91   0.91

3.2 Metrics and Comparisons

Metrics. In this work, we evaluate the effectiveness of the proposed model quantitatively using both regression and classification metrics. Three standard metrics are used for regression evaluation: the mean absolute error (MAE), estimated accuracy (EAC), and normalized disaggregation error (NDE). MAE shows how well models perform power consumption estimation in NILM and is calculated as

MAE = \frac{1}{LN}\sum_{n=1}^{N}\sum_{l=1}^{L}\left|p_n^{i}(l) - \hat{p}_n^{i}(l)\right|,   (16)

EAC is a common metric for evaluating disaggregated power and gives the total estimation accuracy, defined as

EAC = 1 - \frac{\sum_{l=1}^{L}\sum_{n=1}^{N}\left|p_n^{i}(l) - \hat{p}_n^{i}(l)\right|}{2\sum_{l=1}^{L}\sum_{n=1}^{N} p_n^{i}(l)},   (17)

The NDE metric quantifies the normalized error between the predicted value and the ground truth by calculating the squared difference, defined as

NDE = \frac{\sum_{l=1}^{L}\sum_{n=1}^{N}\left(p_n^{i}(l) - \hat{p}_n^{i}(l)\right)^2}{\sum_{l=1}^{L}\sum_{n=1}^{N} p_n^{i}(l)^2},   (18)
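For reference, the regression metrics of Eqs. (16)–(18) can be computed directly from the power matrices; a short NumPy sketch (helper name assumed) is:

```python
import numpy as np

def nilm_regression_metrics(p, p_hat):
    """p, p_hat: arrays of shape (N, L) with ground-truth and estimated power."""
    mae = np.mean(np.abs(p - p_hat))                              # Eq. (16)
    eac = 1.0 - np.sum(np.abs(p - p_hat)) / (2.0 * np.sum(p))     # Eq. (17)
    nde = np.sum((p - p_hat) ** 2) / np.sum(p ** 2)               # Eq. (18)
    return mae, eac, nde
```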


Fig. 3. Examples of disaggregation results on the UKDALE dataset.

Fig. 4. Visualization of the learned graph Θ.

Since state detection is a label classification task, we use the F1-Score for evaluation. The F1-Score averages over all labels and is defined as

F1 = \frac{1}{N}\sum_{i=1}^{N}\frac{2 \cdot TP^{i}}{2 \cdot TP^{i} + FP^{i} + FN^{i}},   (19)

where TP^i is the number of true positives, FP^i the number of false positives and FN^i the number of false negatives. A high F1-Score indicates high performance on the classification task.

Comparisons. Table 1 presents the results of the comparison between GRAD-NILM and previous works. As observed, GRAD-NILM outperforms the other methods on the average metrics and has the best performance in most cases. Figure 4 shows the learned structure Θ for six different cases from the test set, i.e., with 0, 1, 2, 3, 4 and 5 appliances working simultaneously.


Our method is able to construct the co-occurrence probability graph to explicitly represent the temporal dependencies of appliances working together, which facilitates the GAT in further learning the non-sequential dependencies between different appliances for better power disaggregation and state estimation. As shown in Fig. 3, our method shows much better regression fitting performance for energy disaggregation, especially on KT and MW.

Ablation Study. To further investigate the effectiveness of the co-occurrence probability graph generation and the graph representation learning, we conducted ablation experiments. In Table 2, configurations A, B, C and D use different methods to generate the graph labels. We can see that the graph labels generated by combining Θ* and Θ' are more effective, where Θ* captures the temporal dependencies of different appliances based on relative time relationships, and Θ' takes the hard dependencies into account. Configurations D, E, F and G show that an appropriate number of heads benefits GAT learning: a smaller number of heads deteriorates the performance of feature extraction, while a larger number of heads may lead to over-fitting.

Table 2. Ablation study on the UKDALE dataset. ↑ stands for the higher the better, while ↓ stands for the lower the better. Best results are shown in bold.

Config  w/ Θ*  w/ Θ'  K-Head  MAE ↓  NDE ↓  EAC ↑  F1-Score ↑
A       ×      ×      2       5.57   0.10   0.91   0.91
B       ×      ✓      2       4.78   0.08   0.93   0.91
C       ✓      ×      2       5.03   0.09   0.92   0.90
D       ✓      ✓      2       4.63   0.03   0.93   0.91
E       ✓      ✓      1       4.98   0.08   0.92   0.90
F       ✓      ✓      4       4.84   0.08   0.92   0.90
G       ✓      ✓      8       5.01   0.08   0.92   0.90

4 Conclusion

In this paper, we propose the GRAD-NILM by utilizing a graph structure to represent the dependencies between multiple appliances and employing an encoderdecoder architecture for power disaggregation. The co-occurrence probability graph can explicitly model the temporal dependencies of the appliances, and the non-sequential dependencies are further characterized by GAT. Such dependencies of different appliances would help to improve the performance of multi-taskmulti-appliance NILM based on the encoder-decoder architecture. Comparative and ablation experiments are presented to demonstrate the efficiency of the proposed GRAD-NILM model.


Acknowledgements. The work was supported in part by the National Natural Science Foundation of China under Grant 62271430, 82172033, U19B2031, 61971369, 52105126, 82272071, and the Fundamental Research Funds for the Central Universities 20720230104.

References 1. Angelis, G.F., Timplalexis, C., Krinidis, S., Ioannidis, D., Tzovaras, D.: NILM applications: literature review of learning approaches, recent developments and challenges. Energy Buildings 261, 111951 (2022) 2. Athanasoulias, S., Sykiotis, S., Kaselimi, M., Protopapadakis, E., Ipiotis, N.: A first approach using graph neural networks on non-intrusive-load-monitoring. In: Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, pp. 601–607 (2022) 3. Dai, E., Chen, J.: Graph-augmented normalizing flows for anomaly detection of multiple time series. arXiv preprint arXiv:2202.07857 (2022) 4. Decock, J., Kaddah, R., Read, J., et al.: Conv-NILM-net, a causal and multiappliance model for energy source separation. arXiv preprint arXiv:2208.02173 (2022) ´ Nieto, R.: Recurrent 5. de Diego-Ot´ on, L., Fuentes-Jimenez, D., Hern´ andez, A., LSTM architecture for appliance identification in non-intrusive load monitoring. In: 2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pp. 1–6. IEEE (2021) 6. Elhamifar, E., Sastry, S.: Energy disaggregation via learning powerlets and sparse coding. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015) 7. Faustine, A., Mvungi, N.H., Kaijage, S., Michael, K.: A survey on non-intrusive load monitoring methodies and techniques for energy disaggregation problem. arXiv preprint arXiv:1703.00785 (2017) 8. Faustine, A., Pereira, L.: Multi-label learning for appliance recognition in NILM using fryze-current decomposition and convolutional neural network. Energies 13(16), 4154 (2020) 9. Faustine, A., Pereira, L., Bousbiat, H., Kulkarni, S.: UNet-NILM: a deep neural network for multi-tasks appliances state detection and power estimation in nilm. In: Proceedings of the 5th International Workshop on Non-Intrusive Load Monitoring, pp. 84–88 (2020) 10. Figueiredo, M., Ribeiro, B., de Almeida, A.: Electrical signal source separation via nonnegative tensor factorization using on site measurements in a smart home. IEEE Trans. Instrum. Meas. 63(2), 364–373 (2013) 11. Gopinath, R., Kumar, M., Joshua, C.P.C., Srinivas, K.: Energy management using non-intrusive load monitoring techniques-state-of-the-art and future research directions. Sustain. Urban Areas 62, 102411 (2020) 12. Hart, G.W.: Nonintrusive appliance load monitoring. Proc. IEEE 80(12), 1870– 1891 (1992) 13. He, J., et al.: MSDC: exploiting multi-state power consumption in non-intrusive load monitoring based on a dual-CNN model. arXiv preprint arXiv:2302.05565 (2023) 14. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144 (2016)


15. Jiao, X., Chen, G., Liu, J.: A non-intrusive load monitoring model based on graph neural networks. In: 2023 IEEE 2nd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), pp. 245–250. IEEE (2023) 16. Kim, H., Marwah, M., Arlitt, M., Lyon, G., Han, J.: Unsupervised disaggregation of low frequency power measurements. In: Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 747–758. SIAM (2011) 17. Kolter, J.Z., Jaakkola, T.: Approximate inference in additive factorial HMMs with application to energy disaggregation. In: Artificial intelligence and statistics, pp. 1472–1482. PMLR (2012) 18. Lin, Y.H., Tsai, M.S., Chen, C.S.: Applications of fuzzy classification with fuzzy c-means clustering and optimization strategies for load identification in NILM systems. In: 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011), pp. 859–866. IEEE (2011) 19. Mauch, L., Yang, B.: A new approach for supervised power disaggregation by using a deep recurrent LSTM network. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 63–67. IEEE (2015) 20. Shang, C., Chen, J., Bi, J.: Discrete graph structure learning for forecasting multiple time series. arXiv preprint arXiv:2101.06861 (2021) 21. Shin, C., Joo, S., Yim, J., Lee, H., Moon, T., Rhee, W.: Subtask gated networks for non-intrusive load monitoring. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1150–1157 (2019) 22. Veliˇckovi´c, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017) 23. Yu, M., Wang, B., Lu, L., Bao, Z., Qi, D.: Non-intrusive adaptive load identification based on siamese network. IEEE Access 10, 11564–11573 (2022) 24. Zhang, C., Zhong, M., Wang, Z., Goddard, N., Sutton, C.: Sequence-to-point learning with neural networks for non-intrusive load monitoring. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018) 25. Zhang, Q., Chang, J., Meng, G., Xiang, S., Pan, C.: Spatio-temporal graph structure learning for traffic forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1177–1185 (2020)

Few-Shot Object Detection via Classify-Free RPN

Songlin Yu, Zhiyu Yang, Shengchuan Zhang(B), and Liujuan Cao
School of Informatics, Xiamen University, Fujian 361005, People's Republic of China
{31520211154004,yangzhiyu}@stu.xmu.edu.cn, {zsc 2016,caoliujuan}@xmu.edu.cn

Abstract. The research community has shown great interest in few-shot object detection, which focuses on detecting novel objects with only a small number of annotated examples. Most of the works are based on the Faster R-CNN framework. However, due to the absence of annotated data for novel instances, models are prone to base class bias, which can result in misclassifying novel instances as background or base instances. Our analysis reveals that although the RPN is class-agnostic in form, the binary classification loss possesses class-awareness capabilities, which can lead to the base class bias issue. Therefore, we propose a simple yet effective classify-free RPN. We replace the binary classification loss of the RPN with Smooth L1 loss and adjust the ratio of positive and negative samples for computing the loss. This avoids treating anchors matched with novel instances as negative samples in loss calculation, thereby mitigating the base class bias issue. Without any additional computational cost or parameters, our method achieves significant improvements compared to other methods on the PASCAL VOC and MS-COCO benchmarks, establishing state-of-the-art performance.

Keywords: Object detection · Few-shot object detection · RPN

1 Introduction

In recent years, with the rapid growth of data volume and computational power, fully supervised deep neural networks have made significant advances in tasks such as image classification [13], object detection [3,10,22,23], and instance segmentation [12]. However, fully supervised learning requires a large amount of labeled training data, and obtaining labeled data is often an expensive and time-consuming process. For certain tasks, it may be challenging to acquire a sufficient amount of labeled data. Human learning of new knowledge typically occurs through cognitive processes such as perception, thinking, reasoning, and experience, and it often requires only a small number of examples to be completed successfully. To address these challenges, the paradigm of few-shot learning [6] provides an effective method that achieves good performance with a limited number of labeled samples. A plethora of research has emerged to address the problem of few-shot object detection (FSOD) [4,8,11,15,18,20,29].


These works can be mainly categorized into two major approaches: meta-learning based methods [4,11,29] and transfer-learning based methods [8,15,20]. Meta-learning based approaches aim to learn from the knowledge gained from few-shot learning tasks to adapt to new tasks. This approach involves designing a meta-learning model that can quickly learn from a small number of samples and adapt to novel classes. Transfer-learning based approaches aim to improve model performance in FSOD by leveraging the knowledge learned by a pre-trained model on large-scale base datasets. This approach involves using the weights of a model pre-trained on base datasets as initial parameters and fine-tuning them on the novel datasets, allowing the model to better adapt to the novel classes. Our work is based on the transfer-learning paradigm. Faster R-CNN [23] is a classic object detection algorithm. The introduction of the Region Proposal Network (RPN) allows the network to automatically generate candidate boxes, eliminating the need for manually designed box-generation steps in traditional methods. As a typical two-stage object detector, Faster R-CNN has been widely applied in the field of FSOD due to the strong generalization ability of its RPN. However, due to the lack of novel class annotations during the base pre-training phase, the network may exhibit a bias towards the base classes, treating the novel classes as background during base pre-training. This directly leads to the difficulty of detecting novel class instances during the novel fine-tuning phase. To alleviate the issue of base class bias in the RPN, we propose replacing the binary classification loss of the RPN with a regression loss and adjusting the ratio of positive and negative samples for loss computation. Experimental results demonstrate that the proposed classify-free RPN achieves favorable performance. Our contributions are summarized as follows:
• We re-evaluate the issue of base class bias in FSOD from the perspective of the RPN and find that the binary classification loss of the RPN possesses class-awareness capabilities, which leads to a severe base class bias during the base pre-training phase.
• This paper introduces the classify-free RPN, which replaces the binary classification loss of the RPN with a regression loss and adjusts the ratio of positive and negative samples for loss computation. This approach alleviates the problem of the RPN treating novel class instances as background.
• Comprehensive experimental results on PASCAL VOC and MS-COCO show that our approach outperforms the state of the art on FSOD tasks.

2 Related Work

2.1 Object Detection

Object detection [3,10,22,23] is a fundamental task in computer vision. Current algorithms for object detection can be divided into three categories: two-stage detectors, one-stage detectors and query-based detectors. Two-stage detectors [1,23] first propose a set of Regions of Interest (RoIs) and then predict the class probabilities and bounding box offsets for each RoI.


Unlike two-stage detectors, one-stage detectors [21] do not require proposing RoIs. Instead, they directly predict class probabilities and bounding box offsets on the CNN features. DETR [3] pioneered the concept of query-based detectors. Query-based detectors directly employ cross-attention between queries and image features to decode class probabilities and bounding boxes. Building upon DETR, numerous improvements have emerged [30], resulting in significant progress in terms of convergence speed and accuracy.

2.2 Few-Shot Learning

Few-shot learning (FSL) [7,9,19,25,27] is a machine learning task that aims to learn novel classes using an extremely limited number of training samples. According to the techniques of FSL methods, existing approaches can be divided into the following three categories:
1. Metric-based methods. Matching Network [27] utilizes class prototypes and attention mechanisms for one-shot learning. Prototypical Network [25] models the similarity between classes by introducing prototype vectors and transforms the classification task into computing the distances between samples and prototype vectors.
2. Optimization-based methods. MAML [7] optimizes the initial parameters of a model to enable it to quickly converge to good parameters on a small support set of samples. MixupDG [19] transforms the model selection problem into an optimization problem.
3. Parameter-generation based methods. DFSL [9] introduces an attention-based classification weight generator that can infer the weight vectors of novel classes.

2.3 Few-Shot Object Detection

FSOD [4,8,15,20,28] aims to identify and localize novel objects with only a limited number of samples provided. Based on the model architecture, existing methods can be categorized into the following two classes:
1. Meta-learning based methods. FR-FSDet [15] enhances the discriminative power for novel classes by reweighting the features. Attention RPN [4] applies a spatial attention module on the feature map produced by the RPN. This module can adaptively learn which regions are more important and enhance the feature representation of those regions.
2. Transfer-learning based methods. TFA [28] freezes most of the network parameters and utilizes a small number of samples to fine-tune the classification and regression heads of the detector for learning new classes. DeFRCN [20] extends Faster R-CNN by introducing a gradient decoupled layer for multi-stage decoupling and a prototypical calibration block for multi-task decoupling. DCFS [8] tackles the problem from the perspective of missing labels.
Building upon DCFS, we observe a significant base class bias issue in the binary classification head of the RPN: it tends to incorrectly treat novel instances as background. To address this, we propose a classify-free RPN, which mitigates the base class bias problem in the RPN.

3 Methodology

In this section, we first introduce the problem formulation of FSOD. Then, based on DCFS, we conduct a series of preliminary experiments to validate our hypothesis: the RPN has class-awareness and suffers from a base class bias issue during the base pre-training stage, treating novel instances as background. Finally, we provide a detailed explanation of the proposed classify-free RPN.

3.1 Problem Setting

Following previous works [20,28], we adopt the standard problem formulation of FSOD. Specifically, let the entire training set be denoted as D, and let C represent all the classes. The dataset D is divided into two sets, Dbase and Dnovel, with the corresponding classes divided into Cbase and Cnovel, where D = Dbase ∪ Dnovel, C = Cbase ∪ Cnovel, and Cbase ∩ Cnovel = ∅. Dbase contains a large number of annotated instances for each class, while Dnovel has only K (usually fewer than 10) annotated instances for each class. Each sample is a pair (x, y), where x represents an input image with N objects and y = {(ci, bi), i = 1, ..., N}. Here, ci represents the class of the object, with ci ∈ Cbase ∪ Cnovel, and bi represents the bounding box of the object. Our goal is to train a robust detector Ffinal based on dataset D, which can achieve good performance on the test set Dtest. The classes in Dtest are represented by Ctest, where Ctest ∈ Cbase ∪ Cnovel. Typically, following the conventional transfer learning process, the entire training process consists of two stages. The base pre-training stage involves training an initial model Finit using Dbase to create a base class detector Fbase. Then, in the novel fine-tuning stage, Fbase is fine-tuned using Dnovel (or a combination of a subset of Dbase and Dnovel) to obtain the final model Ffinal. Ffinal is capable of detecting both Cbase and Cnovel simultaneously.

3.2 Analysis of the Base Class Bias Issue in RPN

Table 1. RPN's recall of base class and novel class in base pre-training. Threshold represents different IoU thresholds, while bAR@100 and nAR@100 represent the recall for base classes and novel classes, respectively.

Threshold  0.5   0.6   0.7   0.8   0.9
bAR@100    90.5  86.9  76.5  40.2  3.9
nAR@100    89.5  84.1  71.5  34.6  2.9

Is RPN truly class-agnostic? Unfortunately, our answer is no. During the base pre-training stage, due to the lack of annotated information for novel classes, anchors that match a potential novel class instance will be assigned as negative samples.


Fig. 1. The framework of our approach. The Sampling Strategy and IoU-ness Loss modules provide a detailed demonstration of the improvements made to the RPN.

If such an anchor is involved in the loss computation, the binary classification loss of the RPN may mistakenly classify it as background, resulting in a bias towards the base classes. In other words, the RPN tends to treat novel class instances as background while focusing on detecting base class instances. We compare the recall of the RPN after base pre-training using the PASCAL VOC dataset with the split-1 configuration. Table 1 shows the difference in recall of the RPN for base classes and novel classes at different IoU thresholds. In the experiment, we consider the top 100 proposals generated by the RPN. The results indicate that when the IoU threshold is set to 0.5, there is no significant difference in recall between the base and novel classes. However, as the IoU threshold increases, the recall for the base classes becomes noticeably higher than that for the novel classes. This confirms the presence of base class bias in the RPN.

3.3 Classify-Free RPN

Figure 1 illustrates the overall framework. Our approach primarily focuses on the RPN structure. Specifically, we make modifications in two parts: the sampling strategy and the loss formulation.

Sampling Strategy. We first review the traditional process of RPN label assignment. The RPN initially generates multiple anchors at each position on the feature map. Each anchor typically has multiple scales and aspect ratios to cover objects of different sizes and shapes. Let A = {ai, i = 1 ... N} represent all the anchors, and G = {gi, i = 1 ... M} represent all the ground truth boxes in the image. The RPN calculates the IoU between each anchor and all the ground truth boxes, and assigns one of the following labels:
1. Positive Label: if the anchor's IoU with some ground truth box exceeds a certain threshold (e.g., 0.7).
2. Negative Label: if the anchor's IoU with all ground truth boxes is lower than a certain threshold (e.g., 0.3).
3. Ignore Label: if the anchor's IoU lies between the negative and positive thresholds (e.g., between 0.3 and 0.7).


Anchors are randomly sampled from the positive and negative samples according to a specified ratio for calculating the loss. Typically, the ratio of positive to negative samples is 1:1. Due to the partial label problem (the absence of labels for novel class instances), the current sampling method has an inherent flaw. If a selected anchor has a high IoU with a novel class instance (e.g., greater than 0.7), it will be treated as a negative sample due to the lack of annotations for that particular instance. To alleviate this issue, the ideal solution would be to sample only positive samples for the loss calculation. However, this would significantly reduce the RPN's ability to distinguish the background (refer to Table 5 for details). Therefore, we propose a 9:1 positive-to-negative sampling strategy. Such a strategy greatly reduces the probability of including anchors that match potential novel class instances in the loss calculation, while maintaining the RPN's ability to differentiate the background. The Sampling Strategy section in Fig. 1 provides a detailed demonstration of our improvement.

IoU-ness Loss. The typical RPN loss consists of two components, Lcls and Lreg, which represent the classification loss and regression loss, respectively. Lcls is the binary classification loss and is calculated using the following formula:

L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\left[t_i \log(p_i) + (1 - t_i)\log(1 - p_i)\right]   (1)

4 Experiments

In this section, we first introduce the experimental setup in Sect. 4.1, then compare our method with previous state-of-the-art on multiple benchmarks in Sect. 4.2, and finally provide comprehensive ablation studies in Sect. 4.3.

4.1 Experimental Setup

Datasets. We test our approach on two widely used benchmark datasets in computer vision, PASCAL VOC and MS-COCO. To ensure a fair comparison with previous studies, we use the same data splits and annotations provided by TFA [28]. PASCAL VOC includes 20 categories divided into three splits; each split consists of 15 base classes and 5 novel classes. We randomly select K objects for each novel class, where K can be 1, 2, 3, 5, or 10, and use the VOC2007 test set to evaluate the model's performance. We report the Average Precision (IoU = 0.5) of the novel classes. MS-COCO consists of 80 categories. We select the 20 categories that are present in PASCAL VOC as novel classes, while the remaining 60 categories are used as base classes. We train the model on K (10, 30) instances for each novel class, evaluate on the MS-COCO validation set, and report the Average Precision (IoU = 0.5:0.95) on the novel classes.

Experimental Details. We conduct our experiments using Detectron2 on four NVIDIA RTX 3090 GPUs with CUDA 11.3. Our approach uses the Faster R-CNN [23] framework for object detection, with a ResNet-101 [13] backbone pre-trained on ImageNet [24]. To train our network, we adopt a two-stage transfer-learning method: we first initialize the network with ImageNet-pretrained parameters and train on the base classes, and then fine-tune the network on the K shots of each class. We optimize the network using SGD with a mini-batch size of 16, momentum of 0.9, and weight decay of 5e-5 on 4 GPUs. The learning rate for base pre-training and novel fine-tuning is set to 0.02 and 0.01, respectively.

4.2 Comparison with the State-of-the-Art

PASCAL VOC. Table 2 presents the results of our method on the PASCAL VOC dataset with three different splits, considering different numbers of shots: 1, 2, 3, 5, and 10. It can be observed that our method achieves state-of-the-art performance in most settings, validating the effectiveness of our approach. It is worth noting that there is a significant performance gap among different splits, which can be attributed to the varying label quantities of different categories in


Table 2. FSOD performance (AP50) for novel classes on PASCAL VOC. The top-performing results and the runner-up are colored in red and blue, respectively.

Method/Shots  | Novel Set 1 (1 / 2 / 3 / 5 / 10) | Novel Set 2 (1 / 2 / 3 / 5 / 10) | Novel Set 3 (1 / 2 / 3 / 5 / 10)

Meta Learning
FSRW [15]     | 14.8 / 15.5 / 26.7 / 33.9 / 47.2 | 15.7 / 15.2 / 22.7 / 30.1 / 40.5 | 21.3 / 25.6 / 28.4 / 42.8 / 45.9
MetaDet [29]  | 18.9 / 20.6 / 30.2 / 36.8 / 49.6 | 21.8 / 23.1 / 27.8 / 31.7 / 43.0 | 20.6 / 23.9 / 29.4 / 43.9 / 44.1
DCNet [14]    | 33.9 / 37.4 / 43.7 / 51.1 / 59.6 | 23.2 / 24.8 / 30.6 / 36.7 / 46.6 | 32.3 / 34.9 / 39.7 / 42.6 / 50.7
CME [18]      | 41.5 / 47.5 / 50.4 / 58.2 / 60.9 | 27.2 / 30.2 / 41.4 / 42.5 / 46.8 | 34.3 / 39.6 / 45.1 / 48.3 / 51.5
TIP [17]      | 27.7 / 36.5 / 43.3 / 50.2 / 59.6 | 22.7 / 30.1 / 33.8 / 40.9 / 46.9 | 21.7 / 30.6 / 38.1 / 44.5 / 40.9
FCT [11]      | 38.5 / 49.6 / 53.5 / 59.8 / 64.3 | 25.9 / 34.2 / 40.1 / 44.9 / 47.4 | 34.7 / 43.9 / 49.3 / 53.1 / 56.3

Transfer Learning
TFA [28]      | 39.8 / 36.1 / 44.7 / 55.7 / 56.0 | 23.5 / 26.9 / 34.1 / 35.1 / 39.1 | 30.8 / 34.8 / 42.8 / 49.5 / 49.8
GFSD [5]      | 42.4 / 45.8 / 45.9 / 53.7 / 56.1 | 21.7 / 27.8 / 35.2 / 37.0 / 40.3 | 30.2 / 37.6 / 43.0 / 49.7 / 50.1
FSCE [26]     | 44.2 / 43.8 / 51.4 / 61.9 / 63.4 | 27.3 / 29.5 / 43.5 / 44.2 / 50.2 | 37.2 / 41.9 / 47.5 / 54.6 / 58.5
FADI [2]      | 50.3 / 54.8 / 54.2 / 59.3 / 63.2 | 30.6 / 35.0 / 40.3 / 42.8 / 48.0 | 45.7 / 49.7 / 49.1 / 55.0 / 59.6
DeFRCN [20]   | 53.6 / 57.5 / 61.5 / 64.1 / 60.8 | 30.1 / 38.1 / 47.0 / 53.3 / 47.9 | 48.4 / 50.9 / 52.3 / 54.9 / 57.4
DCFS [8]      | 56.6 / 59.6 / 62.9 / 65.6 / 62.5 | 29.7 / 38.7 / 46.2 / 48.9 / 48.1 | 47.9 / 51.9 / 53.3 / 56.1 / 59.4
Ours          | 54.7 / 59.6 / 62.7 / 66.5 / 66.3 | 33.1 / 42.0 / 48.0 / 51.2 / 52.8 | 49.9 / 52.8 / 55.3 / 59.7 / 61.6

Fig. 2. Visualization of our method and DCFS on MS-COCO validation images. Best viewed in color and zoomed in.

the PASCAL VOC dataset. Our method achieves the best performance under the split-3 setting for all shots, showing a significant improvement over the second best. We speculate that this is because our method is more class-agnostic and is therefore less affected by different class divisions, which strongly confirms the robustness of our approach.

MS-COCO. Table 3 presents the results on the MS-COCO dataset.


Table 3. FSOD performance (AP) for novel classes on MS-COCO. The top-performing results and the runner-up are colored in red and blue, respectively.

Method/Shots        | 10 shot | 30 shot
Meta Learning
FSRW [15]           | 5.6     | 9.1
MetaDet [29]        | 7.1     | 11.3
DCNet [14]          | 12.8    | 18.6
CME [18]            | 15.1    | 16.9
TIP [17]            | 16.3    | 18.3
FCT [11]            | 15.3    | 20.2
Transfer Learning
TFA [28]            | 10.0    | 13.7
GFSD [5]            | 10.5    | 13.8
FSCE [26]           | 11.1    | 15.3
FADI [2]            | 12.2    | 16.1
DeFRCN [20]         | 18.5    | 22.6
DCFS [8]            | 19.5    | 22.7
Ours                | 18.6    | 23.0

Compared to the PASCAL VOC dataset, our method does not show a significant improvement on MS-COCO, but it is still comparable to the SOTA methods. The MS-COCO dataset is known to be more challenging as it contains a larger number of instances per image, and achieving a large-margin performance improvement on such a dataset is indeed difficult. This motivates us to continue exploring and pushing the boundaries in our future research.

Visualization. Figure 2 presents the visual results of our approach. Compared to DCFS, our method exhibits fewer false detections: in the first row, second column, DCFS misclassifies the boy's clothing as a zebra, and in the second row, second column, DCFS misidentifies a stone as an elephant. There are also some bad cases in our approach, such as the second row, third column, where our method misses the smaller bear.

4.3 Ablation Study

Table 4. RPN's recall for base classes and novel classes in the base pre-training stage. Δ represents the difference between bAR and nAR.

Threshold       | 0.5  | 0.6  | 0.7  | 0.8  | 0.9 | avg
DCFS  bAR@100   | 90.5 | 86.9 | 76.5 | 40.2 | 3.9 |
      nAR@100   | 89.5 | 84.1 | 71.5 | 34.6 | 2.9 |
      Δ         | 1.0  | 2.8  | 5.0  | 5.6  | 1.0 | 3.1
Ours  bAR@100   | 89.8 | 87.5 | 75.8 | 40.1 | 3.6 |
      nAR@100   | 89.7 | 86.7 | 74.9 | 38.5 | 3.2 |
      Δ         | 0.1  | 0.8  | 0.9  | 1.6  | 0.4 | 0.8


Table 5. Effectiveness of the pos/neg sample ratio and the IoU-ness loss in RPN. The pos/neg indicates the proportion of positive and negative samples used to compute the loss in RPN.

pos/neg      | loss type | AP50
0.50 : 0.50  | bin-cls   | 65.6
0.50 : 0.50  | IoU-ness  | 66.5
0.75 : 0.25  | IoU-ness  | 66.9
0.90 : 0.10  | IoU-ness  | 67.8
1.00 : 0.00  | IoU-ness  | 60.1

Recall of RPN. We conduct ablation experiments on the PASCAL VOC dataset, using the split-1 configuration, to validate how well our method alleviates the RPN's base class bias. Table 4 shows the recall gap of the RPN between base class instances and novel class instances during the base pre-training stage for both DCFS and our method. Our method achieves a recall comparable to DCFS for base class instances, but significantly improves the recall for novel class instances: at an IoU threshold of 0.8, our method reaches 38.5% recall on novel class instances, while DCFS only reaches 34.6%. The recall difference between base and novel classes is also significantly smaller for our method; averaged over the five threshold settings, our gap is 0.8%, while that of DCFS is 3.1%. This validates the effectiveness of our method.

Effectiveness of Classify-Free RPN. In this section, we again use the split-1 configuration of PASCAL VOC. Table 5 reports the 5-shot novel AP50 under different RPN configurations. Comparing the first and second rows, replacing the binary classification loss of the original RPN with the IoU-ness loss improves AP50 by 0.9%, which validates the effectiveness of our IoU-ness loss. Comparing the second to fourth rows, we observe that AP50 increases as the proportion of positive samples in the loss calculation increases. This supports our insight that negative samples may match potential novel class instances and wrongly push them towards the background. It is worth noting that if only positive samples are included in the loss calculation, the RPN loses its ability to distinguish the background, ultimately leading to a decline in performance.
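The recall numbers of Table 4 amount to checking whether each ground-truth box is covered by one of the top-100 proposals; the sketch below is our own approximation of that protocol (box format, top-k selection, and per-image averaging are assumptions), using torchvision's `box_iou`.

```python
import torch
from torchvision.ops import box_iou

def recall_at_k(proposals, gt_boxes, iou_thresh=0.8, k=100):
    """Fraction of ground-truth boxes matched by at least one of the top-k
    proposals at the given IoU threshold; computed per image and averaged
    separately over base-class and novel-class boxes to obtain bAR/nAR."""
    if gt_boxes.numel() == 0:
        return float('nan')
    ious = box_iou(proposals[:k], gt_boxes)            # (k, num_gt)
    covered = ious.max(dim=0).values >= iou_thresh     # best proposal per GT box
    return covered.float().mean().item()
```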

5 Conclusion

In this paper, we address the issue of base class bias in FSOD, and propose a novel approach to tackle this problem from the perspective of RPN for the first time. Specifically, we begin by replacing the binary classification loss of


the RPN with a smooth L1 loss based on the IoU between anchors and the corresponding ground truths. Additionally, we modify the sampling strategy of the RPN, which reduces the chance of treating anchors that match potential novel instances as negative samples when calculating the loss. Extensive experiments show that our method improves the performance of FSOD. One drawback of our method is that the performance improvement on MS-COCO is not significant. We hope this work provides a new perspective for addressing the base class bias issue in FSOD within the research community.

References

1. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)
2. Cao, Y., Wang, J., Jin, Y., Wu, T., Chen, K., Liu, Z., Lin, D.: Few-shot object detection via association and discrimination. In: Advances in Neural Information Processing Systems, vol. 34, pp. 16570–16581 (2021)
3. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
4. Fan, Q., Zhuo, W., Tang, C.K., Tai, Y.W.: Few-shot object detection with attention-RPN and multi-relation detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4013–4022 (2020)
5. Fan, Z., Ma, Y., Li, Z., Sun, J.: Generalized few-shot object detection without forgetting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4527–4536 (2021)
6. Fink, M.: Object classification from a single example utilizing class relevance metrics. In: Advances in Neural Information Processing Systems, vol. 17 (2004)
7. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126–1135. PMLR (2017)
8. Gao, B.B., et al.: Decoupling classifier for boosting few-shot object detection and instance segmentation. In: Advances in Neural Information Processing Systems (2022)
9. Gidaris, S., Komodakis, N.: Dynamic few-shot visual learning without forgetting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4367–4375 (2018)
10. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
11. Han, G., Ma, J., Huang, S., Chen, L., Chang, S.F.: Few-shot object detection with fully cross-transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5321–5330 (2022)
12. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
14. Hu, H., Bai, S., Li, A., Cui, J., Wang, L.: Dense relation distillation with context-aware aggregation for few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10185–10194 (2021)
15. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8420–8429 (2019)
16. Kim, D., Lin, T.Y., Angelova, A., Kweon, I.S., Kuo, W.: Learning open-world object proposals without learning to classify. IEEE Robot. Autom. Lett. 7(2), 5453–5460 (2022)
17. Li, A., Li, Z.: Transformation invariant few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3094–3102 (2021)
18. Li, B., Yang, B., Liu, C., Liu, F., Ji, R., Ye, Q.: Beyond max-margin: class margin equilibrium for few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7363–7372 (2021)
19. Lu, W., Wang, J., Wang, Y., Ren, K., Chen, Y., Xie, X.: Towards optimization and model selection for domain generalization: a mixup-guided solution. arXiv preprint arXiv:2209.00652 (2022)
20. Qiao, L., Zhao, Y., Li, Z., Qiu, X., Wu, J., Zhang, C.: DeFRCN: decoupled Faster R-CNN for few-shot object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8681–8690 (2021)
21. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
22. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
23. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
24. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)
25. Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
26. Sun, B., Li, B., Cai, S., Yuan, Y., Zhang, C.: FSCE: few-shot object detection via contrastive proposal encoding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7352–7362 (2021)
27. Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
28. Wang, X., Huang, T.E., Darrell, T., Gonzalez, J.E., Yu, F.: Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957 (2020)
29. Wang, Y.X., Ramanan, D., Hebert, M.: Meta-learning to detect rare objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9925–9934 (2019)
30. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)

IPFR: Identity-Preserving Face Reenactment with Enhanced Domain Adversarial Training and Multi-level Identity Priors

Lei Zhu, Ge Li(B), Yuanqi Chen, and Thomas H. Li
School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Beijing, China
[email protected]

Abstract. In the face reenactment task, identity preservation is challenging due to the leakage of the driving identity and the complexity of the source identity. In this paper, we propose an Identity-Preserving Face Reenactment (IPFR) framework with impressive expression and pose transfer. To address the leakage of the driving identity, we develop an enhanced domain discriminator to eliminate the undesirable identity in the generated image. Considering the complexity of the source identity, we inject multi-level source identity priors to keep the identity domain of the generated image close to that of the source. In detail, we first utilize a 3D geometric prior from the 3D morphable face model (3DMM) to control face shape and reduce artifacts caused by occlusion in the motion-field generation module; second, we use an identity texture prior extracted by a face recognition network to supervise the final stage, aiming to bring the identity domain of the generated image close to that of the source. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods on image quality and identity preservation. Ablation studies are also conducted to further validate the effectiveness of our individual components.

Keywords: Enhanced domain adversarial training · 3D shape-aware identity · Attention mechanism

1 Introduction

Face reenactment aims to synthesize realistic images of a source identity with the head poses and facial expressions of a driving face. Such technology has great potential in virtual reality and film production. However, preserving identity is extremely challenging, especially when there is a large discrepancy in identity, head pose, and facial expression between the source and driving faces. The reenacted face should capture the unique facial characteristics and details specific to the source individual while incorporating the desired head poses and expressions from the driving face. Rapid progress has been achieved on face reenactment [15,19,21,24]. A series of works [12,14] mainly decouple facial motion from identity information in


the driving image. A serious problem behind this is that the identity of the generated image can easily be affected by the driving motion. Dominant motion descriptors fall into two categories: 1) 3DMM parameters [5,14], which contain disentangled identity, expression, pose and other attributes; and 2) landmark images [12,15]. Landmarks more easily cause identity mismatch problems, so [12] proposes a landmark transformer to transform driving landmark images to the source identity. Unlike landmarks [12,15], 3DMM motion parameters [5,14] do not contain identity information. However, in the process of injecting 3DMM motion, many methods [2,14,16,21] overfit part of the driving's identity information, leaking the driving identity into the result in order to cater to the motion or to restore occluded areas. [5] uses an image discriminator and [20] uses an identity coefficient loss to enforce that the reenacted face has the same identity as the source, but these methods cannot fundamentally eliminate the leakage of driving identity information.

In this paper, we use domain adversarial training [6] with data augmentation to make the latent vector encoded from the 3DMM motion parameters free of any driving identity. The key design is a domain discriminator with a strong ability to distinguish identity domains. We use the latent vectors encoded from primary and reorganized driving 3DMM motion parameters as training samples and the driving identity domains as labels. In training, we train the discriminator to improve its ability to distinguish identity domains; meanwhile, we introduce the domain loss as an adversarial loss into the optimization of the latent-space encoding network through a negative coefficient. When the domain discriminator is strong enough, a still-large domain loss lets us assume that the latent vector contains almost no driving identity.

Besides the leakage of driving identity, many methods [14,15,21] also fail to preserve the source identity perfectly. Because the human eye is sensitive to identity, after removing the driving identity information we need to perfect the source identity information. In fact, identity information is multi-layered. Common identity representations can be divided into: 1) identity information in disentangled 3D face descriptors; and 2) feature embeddings encoded from the face image. Identity information in disentangled 3D face descriptors [17,22] captures the geometric shape of a 3D face built upon the prior knowledge of the 3DMM [1], while identity features extracted by a face recognition network [3] focus more on texture and are insensitive to the geometric structure [11]. In this paper, to obtain accurate facial geometric-level and texture-level identity simultaneously, we inject multi-level priors at different stages.

Our framework is shown in Fig. 1, and the reenactment can be described in three steps. First, the framework maps the 3DMM descriptor [4] to a latent vector, and we use enhanced domain adversarial training to keep the latent space away from the driving identity domain. Second, it predicts an optical flow under the guidance of the latent vector; in the process of modeling the movement, we inject a 3D face shape prior into this stage via a cross-attention mechanism, which helps to model a more accurate facial shape. Then, we fuse the identity feature prior extracted by the face recognition network with the latent vector to supervise the final stage.
We evaluate our framework qualitatively


and quantitatively on the public VoxCeleb dataset [13]. Experimental results show that the framework generates accurate identity information with impressive expression and pose transfer. Our main contributions can be summarized as follows:

– We propose a novel face reenactment framework that generates identity-preserving results with impressive expression and pose transfer. Our method significantly outperforms state-of-the-art methods on image quality and on preserving identity in both facial shape and texture.
– We utilize domain adversarial training with data augmentation to filter out driving identity information introduced into the generated result by network overfitting.
– We use a shape-guided high-frequency attention block to inject a facial geometric-level identity prior in the warping module, and we inject a texture-level identity feature prior extracted by a face recognition network into the final refining module to preserve the identity.

2 Methods

Let Is be a source image and Id a driving image. We aim to generate an image Ig with the expressions and poses of Id and the other attributes of Is, such as identity, illumination and background. The proposed model is illustrated in Fig. 1, and we discuss each module in detail below.

2.1 Target Motion Encoder and 3D Shape Encoder

The 3DMM provides us with disentangled parameters of face shape, expression and head pose. We use a pre-trained face reconstruction model provided by [4] to regress the 3DMM parameters v = (α, β, p) from a face image, where α ∈ R^80, β ∈ R^64 and p ∈ R^6 describe the identity, expression and pose, respectively. The parameters m^i ≡ {β^i, p^i} clearly express the target motion of frame i, while α ∈ R^80 denotes the geometric structure of the source image. Following the same mechanism as PIRenderer [14], the motion parameters m^i of frame i are replaced by the window of parameters of the continuous frames centered on frame i. Ultimately, m ≡ m^{i−k:i+k} ≡ {β^i, p^i, . . . , β^{i±k}, p^{i±k}} is defined as the target motion descriptor, and the facial shape parameters of the continuous frames are defined as the source facial shape descriptor. The target motion descriptor m and the source facial shape descriptor are then encoded into latent vectors z_m ∈ Z_m and z_id ∈ Z_id by the target motion encoder and the 3D shape encoder, respectively.
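A minimal sketch of how such a windowed motion descriptor could be assembled from per-frame 3DMM coefficients is given below; the window half-size `k`, the frame-clamping at sequence boundaries, and the flattening into a single vector are illustrative assumptions of ours, not the exact configuration used in the paper.

```python
import torch

def build_motion_descriptor(betas, poses, i, k=3):
    """Stack expression (64-d) and pose (6-d) coefficients over a window of
    2k+1 frames centred on frame i; out-of-range indices are clamped.
    betas: (T, 64), poses: (T, 6) regressed by the 3DMM fitting network [4]."""
    T = betas.shape[0]
    idx = torch.arange(i - k, i + k + 1).clamp(0, T - 1)
    window = torch.cat([betas[idx], poses[idx]], dim=-1)   # (2k+1, 70)
    return window.flatten()                                # input to the motion encoder
```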

2.2 3D Shape-Aware Warping Module

We employ a 3D shape-aware warping module Fs to model the motion field. Here, we define these motions as the coordinate offsets Θ and learn them from the source image Is, the motion latent vector z_m and the facial shape latent vector z_id:

Θ = F_s(I_s, z_m, z_id)    (1)


Fig. 1. Overview of our proposed model. The reenactment proceeds in three steps. First, the 3DMM pose and expression descriptor [4] is mapped to a latent vector, and we utilize enhanced domain adversarial training to keep the latent space away from the driving identity domain. Second, an optical flow is predicted under the guidance of the latent vector; meanwhile, we inject a 3D face shape prior into this stage by using a high-frequency cross-attention block, which helps to model a more accurate facial shape. Then, we fuse the identity feature prior extracted by a face recognition network with the latent vector to supervise the final stage.

The module consists of two submodules: an offset-estimating auto-encoder and a shape-guided high-frequency attention block. To model the motion field, we inject the motion latent vector z_m with an AdaIN [9] operation after each convolution layer in the auto-encoder. To repair the 3D geometric shape of high-frequency occluded areas caused by large head movements, we use a cross-attention block with a high-frequency attention mask in the lower decoder layers to inject facial geometric outline information that does not exist in the source image into the high-frequency areas of the feature maps. The facial geometric outline information is the facial shape latent vector z_id derived from the 3D face shape parameters of continuous frames. The attention mechanism helps to reduce the negative impact of occlusion and to maintain identity information at the geometric level. The shape-guided high-frequency attention (SHA) block, which is similar to [18], is shown in Fig. 2. Finally, the warping module outputs the offsets Θ, and we use the warping equation I_w = Θ(I_s) to obtain a preliminary reenactment result.
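The warping step I_w = Θ(I_s) can be realized with a sampling grid, as in the hedged sketch below; treating Θ as a dense per-pixel offset field in normalized coordinates is our assumption about the implementation, not a detail stated in the paper.

```python
import torch
import torch.nn.functional as F

def warp_with_offsets(source, offsets):
    """Warp the source image with per-pixel coordinate offsets Theta.
    source:  (B, C, H, W)
    offsets: (B, 2, H, W), offsets expressed in normalized [-1, 1] coordinates."""
    B, _, H, W = source.shape
    ys = torch.linspace(-1, 1, H, device=source.device)
    xs = torch.linspace(-1, 1, W, device=source.device)
    gy, gx = torch.meshgrid(ys, xs, indexing='ij')
    base = torch.stack((gx, gy), dim=-1).expand(B, H, W, 2)   # identity sampling grid
    grid = base + offsets.permute(0, 2, 3, 1)                 # add the predicted flow
    return F.grid_sample(source, grid, align_corners=True)
```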

2.3 Identity-Aware Refining Module

The warping module is only efficient at spatially transforming the source image; to add source texture and eliminate artifacts, we employ a refining network F_r.


Fig. 2. The framework of the proposed SHA block, which takes the transformed features and 3D shape latent vector as inputs and learns high-frequency source facial shape.

F_r is designed with an encoder-decoder architecture to transform the source image I_s and the warped image I_w into the generated image I_g. However, the network is insensitive to identity texture, so we concatenate the latent vector z_m and the face feature vector f^s_id extracted by a face recognition network [3], and then inject the result with an AdaIN operation after each convolution layer to supplement texture-level identity information:

I_g = F_r(I_s, I_w, z_m, f^s_id)    (2)
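For reference, a generic AdaIN layer of the kind used to inject z_m and f^s_id is sketched below; the hidden sizes and the (1 + γ) parameterization are common conventions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: the latent code predicts per-channel
    scale and shift that modulate instance-normalized feature maps."""
    def __init__(self, latent_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.affine = nn.Linear(latent_dim, 2 * num_channels)

    def forward(self, feat, latent):
        gamma, beta = self.affine(latent).chunk(2, dim=1)   # (B, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(feat) + beta
```

In the refining module, `latent` would be the concatenation of z_m and the ArcFace feature f^s_id described above.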

2.4 Enhanced Domain Discriminator

To ensure that the encoded motion latent space Z_m does not contain the driving's identity information, domain adversarial training is utilized to optimize the target motion encoder. The core idea is to use an adversarial loss to erase the driving identity introduced by network overfitting. Specifically, we design the domain discriminator as a 3-layer multilayer perceptron (MLP). It takes the latent vector z_m as input and infers its identity label as

\hat{K} = D(z_m)    (3)

where D(·) denotes the MLP and \hat{K} denotes the predicted identity label. We optimize the MLP with a cross-entropy loss:

L_d = H(K, \hat{K})    (4)

where H(·) denotes the cross-entropy loss and K denotes the one-hot coding of the driving's identity label, a vector whose length equals the number of identities in the training set.
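A possible form of the 3-layer MLP discriminator and its cross-entropy objective (Eqs. 3-4) is sketched below; the hidden width is an assumption of ours.

```python
import torch
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """3-layer MLP that predicts the driving identity from the motion latent z_m."""
    def __init__(self, latent_dim, num_identities, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_identities),
        )

    def forward(self, z_m):
        return self.mlp(z_m)          # identity logits, scored with cross-entropy (Eq. 4)
```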


It is worth mentioning that, since the 3DMM is fully disentangled, we reorganize the expression and pose parameters as data augmentation to create new adversarial training samples, so that the performance of the domain discriminator and the ability of the target motion encoder to filter the driving's identity information are enhanced at the same time.

2.5 Training

In the training stage, we randomly select a pair of images (I_s, I_d) from the same person's video to perform self-reenactment, while the identities of I_s and I_d can differ at inference time. The loss function of our generation network, L_g, is the same as in [14]; it contains the perceptual losses between the warped image I_w, the generated image I_g and the target image I_d. In particular, we use a hyperparameter λ to introduce the domain loss L_d of Subsect. 2.4 as an adversarial loss L_adv into the generation network, keeping the latent space away from the driving's identity domain, so the final loss function is

L = L_g − λ L_d    (5)

We maximize this adversarial loss while training the target motion encoder, and optimize the domain discriminator alternately. When the data-augmented domain discriminator is powerful enough and the adversarial loss is still large, we assume that the latent space contains almost no identity information of the driving. Here we set λ = 0.2. The change of the identity domain is illustrated in Fig. 3, where Id_d and Id_s denote the centers of the driving and source identity domains, respectively.
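One way to organize the alternating updates implied by Eq. (5) is sketched below; `encode_motion`, `render`, and `reconstruction_loss` are hypothetical placeholders standing in for the corresponding parts of the generator, not functions defined in the paper.

```python
def training_step(batch, generator, discriminator, g_opt, d_opt, ce, lam=0.2):
    """Alternate: (1) update the discriminator on identity labels,
    (2) update the generator with L = L_g - lambda * L_d (Eq. 5)."""
    # --- discriminator update: learn to recognise the driving identity ---
    z_m = generator.encode_motion(batch['driving_3dmm']).detach()
    d_loss = ce(discriminator(z_m), batch['identity_label'])
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- generator update: reconstruction loss minus the weighted domain loss ---
    z_m = generator.encode_motion(batch['driving_3dmm'])
    output = generator.render(batch['source_image'], z_m)
    g_loss = generator.reconstruction_loss(output, batch['driving_image']) \
             - lam * ce(discriminator(z_m), batch['identity_label'])
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()   # d_opt.zero_grad() clears any
    return d_loss.item(), g_loss.item()                  # stray discriminator gradients
```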

Fig. 3. Illustration of the identity domain change, EDD denotes the enhanced domain discriminator.

3 Experiment

3.1 Experimental Setup

In this subsection, we first introduce the dataset used for model training and the details of the training process. Then, we will proceed to explain the evaluation metrics used during testing.


Dataset and Training Details. We use the VoxCeleb dataset for training and testing. Following the pre-processing described in [15], a total of 17193 training videos and 514 testing videos with frame lengths varying from 64 to 1024 are obtained at a resolution of 256. Our model is trained in two stages: the warping module with its affiliated networks is pre-trained for 100 epochs, and then the whole network is trained for 100 epochs. It is worth mentioning that the enhanced domain discriminator is trained from start to finish, because identity leakage caused by overfitting happens in both stages. All our experiments are conducted on four RTX 3090 GPUs with a batch size of 20. The ADAM [10] optimizer is used for training with an initial learning rate of 1 × 10^-4, decreased to 1 × 10^-5 after 150 epochs.

Evaluation Metrics. We evaluate our model using the following metrics. Learned Perceptual Image Patch Similarity (LPIPS) [23] estimates the reconstruction error by computing the perceptual distance between the generated and target images. Fréchet Inception Distance (FID) [7] estimates the difference between the distributions of generated and real images. In particular, we calculate the cosine similarity of identity features (CSIM) extracted by the pretrained face recognition model ArcFace [3], which evaluates the degree of identity preservation. Moreover, we use the Average Expression Distance (AED) and Average Pose Distance (APD) to evaluate how well head poses and expressions are imitated, following [14].
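CSIM reduces to a mean cosine similarity between face-recognition embeddings; a minimal sketch, assuming ArcFace embeddings have already been extracted for the generated and the reference (source or driving) faces:

```python
import torch
import torch.nn.functional as F

def csim(embeddings_a, embeddings_b):
    """Mean cosine similarity between two batches of face-recognition
    embeddings, e.g. ArcFace features of generated vs. source faces."""
    a = F.normalize(embeddings_a, dim=-1)
    b = F.normalize(embeddings_b, dim=-1)
    return (a * b).sum(dim=-1).mean().item()
```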

3.2 Comparisons

We evaluate the performance of our model in two reenactment tasks: (1) same-identity reenactment, where the source and driving images come from the same person, and (2) cross-identity reenactment, where the source and driving images come from two different persons. We compare our approach with the following methods: FOMM [15], PIRenderer [14], DaGAN [8]. The evaluation results are summarized in Table 1, and we analyze the reasons behind each method's results.

Same-Identity Reenactment. The quantitative results of the same-identity reenactment task are shown in the left part of Table 1. Our method achieves the best results on CSIM, FID and LPIPS and impressive results on AED and APD. This shows that our method better preserves the authenticity of identity information and generates more realistic images while still transferring the motion successfully.

Cross-Identity Reenactment. In practice, cross-identity reenactment has a wider range of applications and is more difficult. The quantitative results are in the right part of Table 1. Our approach is much better than the others at maintaining identity information, which can be inferred from the highest CSIM with the source image and the lowest CSIM with the driving image (CSIM(dr)). In addition, our framework produces more realistic results, reflected in a better FID. In


order to show the effect of our model intuitively, we provide qualitative results in Fig. 4. Compared with other methods, we generate identity-preserving results with an accurate facial shape. Results Analysis. Allow us to elucidate the rationale behind the supremacy of our proposed approach in contrast to several other state-of-the-art methods. Firstly, we conduct a thorough analysis of the limitations inherent in various state-of-the-art techniques. Subsequently, we expound on how our method effectively mitigates these shortcomings. While FOMM endeavors to estimate motion fields through sparse key points, it might not excel in tasks demanding quantitative reenactment due to pronounced dissimilarities in facial structure between the source and driving images. Additionally, facial expressions are not wellimitated. PIRenderer is good at transferring expressions, but when the identity of the source and driving images is quite different, the generated image’s identity remains poor, and the driving identity information leaks into the result. This is because PIRenderer injects the mapped driving motion latent into the generation process, which can overfit driving identity information during training. Our method addresses these challenges comprehensively. To begin, we employ an enhanced domain discriminator to counteract the leakage of driving identity information, distinguishing us from PIRenderer. Furthermore, we employ a shape-guided high-frequency attention block to rectify the incompleteness in source map face geometry caused by head movement, resulting in a more accurate facial structure than that achieved by all the contrasting methods. Additionally, we incorporate the identity feature vector obtained from a face recognition network into the final generation stage, facilitating the amalgamation of transposed

Fig. 4. Qualitative comparisons of cross-reenactment on the VoxCeleb dataset.


expressions and human-perceptible identity information. The confluence of these advancements leads to visually credible outcomes.

Table 1. Quantitative comparisons on the VoxCeleb dataset.

Method      | Self-reenactment: CSIM ↑ / FID ↓ / LPIPS ↓ / AED ↓ / APD ↓ | Cross-reenactment: CSIM ↑ / CSIM(dr) ↓ / FID ↓ / AED ↓ / APD ↓
FOMM        | 0.821 / 13.12 / 0.112 / 0.192 / 0.048 | 0.502 / 0.257 / 42.41 / 0.281 / 0.069
PIRenderer  | 0.827 / 12.49 / 0.115 / 0.173 / 0.041 | 0.535 / 0.261 / 36.25 / 0.268 / 0.060
DaGAN       | 0.829 / 11.56 / 0.113 / 0.175 / 0.039 | 0.496 / 0.259 / 49.37 / 0.279 / 0.066
Ours        | 0.834 / 10.23 / 0.106 / 0.175 / 0.041 | 0.609 / 0.228 / 32.14 / 0.270 / 0.060

Comparison of Algorithm Complexity. We train our framework on the VoxCeleb dataset for 200 epochs using four RTX 3090 GPUs. The batch size is set to 20 in our experiments (consistent with the baselines), and the overall training time is about 3 days. For the computational complexity, we measure the FLOPs, model parameters, and forward time per image; the results are shown in Table 2. They show that our method is almost optimal in terms of computation, model parameters, and inference time.

Table 2. Comparison of algorithm complexity.

Method      | Flops (G) | Params (M) | Time (ms)
FOMM        | 56.14     | 59.56      | 40.19
PIRenderer  | 65.49     | 22.76      | 24.86
DaGAN       | 64.18     | 46.08      | 56.03
Ours        | 65.50     | 22.54      | 23.28

3.3 Ablation Study

We perform ablation experiments on the testing set to investigate the effectiveness of our method. We add the proposed sub-modules to the base model cumulatively and report quantitative results in Table 3 and qualitative results in Fig. 5. After using the enhanced domain discriminator to filter the driving identity introduced by network overfitting, the generated results (e.g., the fourth column in Fig. 5) clearly contain less driving identity and more source identity. Furthermore, injecting the identity feature into the generation stage fills the results with source texture (e.g., the fifth column in Fig. 5), especially on the lips and chin, although some facial shapes remain inaccurate, particularly face contours that are occluded in the source image. Finally, adding the shape-guided high-frequency attention block refines the facial geometric shape, and the shape of the final generated face is clearly more accurate (e.g., the sixth column in Fig. 5).


Fig. 5. Qualitative results of the ablation study; red arrows point to areas that are still not accurate enough. (Color figure online)

Table 3. Quantitative ablation results of cross-reenactment on VoxCeleb. The enhanced domain discriminator (EDD), ArcFace identity feature (IDAda), and shape-guided high-frequency attention (SHA) are added to the base model in turn.

Method    | CSIM ↑ | CSIM(dr) ↓ | FID ↓ | AED ↓ | APD ↓
Base      | 0.535  | 0.263      | 36.25 | 0.268 | 0.060
+ EDD     | 0.601  | 0.233      | 32.51 | 0.271 | 0.061
+ IDAda   | 0.603  | 0.231      | 32.81 | 0.269 | 0.062
+ SHA     | 0.609  | 0.228      | 32.14 | 0.270 | 0.060

4 Limitation

Although our method surpasses several state-of-the-art approaches, it does not perform optimally on more extreme task scenarios, such as when there is a huge discrepancy between the source and driver images in terms of pose and expression, or when there is severe occlusion in the source face. These limitations will be further explored in our future work.

5 Conclusion

In this paper, we propose a novel identity-preserving face reenactment framework with impressive pose and expression transfer. We use a domain adversarial loss to keep the motion latent vector away from the identity domain of the driving face. Meanwhile, we utilize 3D geometric-level and texture-level identity priors in two stages to stay close to the source identity domain. Experimental results show the effectiveness of our proposed framework.


Acknowledgement. This work is supported by the Shenzhen Fundamental Research Program (GXWD20201231165807007-20200806163656003) and the National Natural Science Foundation of China (No. 62172021). We thank all reviewers for their valuable comments.

References

1. Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: SIGGRAPH (1999)
2. Burkov, E., Pasechnik, I., Grigorev, A., Lempitsky, V.: Neural head reenactment with latent pose descriptors. In: CVPR (2020)
3. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: CVPR (2019)
4. Deng, Y., Yang, J., Xu, S., Chen, D., Jia, Y., Tong, X.: Accurate 3D face reconstruction with weakly-supervised learning: from single image to image set. Cornell University (2019)
5. Doukas, M.C., Zafeiriou, S., Sharmanska, V.: HeadGAN: one-shot neural head synthesis and editing. In: ICCV (2021)
6. Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(1), 2030–2063 (2016)
7. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
8. Hong, F.T., Zhang, L., Shen, L., Xu, D.: Depth-aware generative adversarial network for talking head video generation. In: CVPR (2022)
9. Huang, X., Belongie, S.: Arbitrary style transfer in real-time with adaptive instance normalization. In: ICCV (2017)
10. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
11. Liu, J., et al.: Identity preserving generative adversarial network for cross-domain person re-identification. IEEE Access 7, 114021–114032 (2019)
12. Liu, J., et al.: Li-Net: large-pose identity-preserving face reenactment network. In: 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE (2021)
13. Nagrani, A., Chung, J.S., Zisserman, A.: VoxCeleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017)
14. Ren, Y., Li, G., Chen, Y., Li, T.H., Liu, S.: PIRenderer: controllable portrait image generation via semantic neural rendering. In: ICCV (2021)
15. Siarohin, A., Lathuilière, S., Tulyakov, S., Ricci, E., Sebe, N.: First order motion model for image animation. In: NeurIPS (2019)
16. Siarohin, A., Woodford, O.J., Ren, J., Chai, M., Tulyakov, S.: Motion representations for articulated animation. In: CVPR (2021)
17. Wang, Y., et al.: HifiFace: 3D shape and semantic prior guided high fidelity face swapping. IJCAI (2021)
18. Xu, M., Chen, Y., Liu, S., Li, T.H., Li, G.: Structure-transformed texture-enhanced network for person image synthesis. In: ICCV (2021)
19. Yang, K., Chen, K., Guo, D., Zhang, S.H., Guo, Y.C., Zhang, W.: Face2Face ρ: real-time high-resolution one-shot face reenactment. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13673, pp. 55–71. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19778-9_4
20. Yao, G., Yuan, Y., Shao, T., Zhou, K.: Mesh guided one-shot face reenactment using graph convolutional networks. In: ACMMM (2020)
21. Yin, F., et al.: StyleHEAT: one-shot high-resolution editable talking face generation via pre-trained StyleGAN. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13677, pp. 85–101. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19790-1_6
22. Zhang, H., Ren, Y., Chen, Y., Li, G., Li, T.H.: Exploiting multiple guidance from 3DMM for face reenactment. In: AAAI Workshop (2023)
23. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)
24. Zheng, M., Karanam, S., Chen, T., Radke, R.J., Wu, Z.: HifiHead: one-shot high fidelity neural head synthesis with 3D control. In: IJCAI (2022)

L2MNet: Enhancing Continual Semantic Segmentation with Mask Matching

Wenbo Zhang, Bocen Li, and Yifan Wang(B)
Dalian University of Technology, Dalian 116024, China
[email protected]

Abstract. Continual semantic segmentation (CSS) aims to continuously learn a semantic segmentation model that incorporates new categories while avoiding forgetting the previously seen categories. However, CSS faces a significant challenge known as weight shift, which leads the network to mistakenly predict masks as belonging to new categories instead of their actual categories. To mitigate this phenomenon, we propose a novel module named the mask matching module, which transfers the pixel-level prediction task into a mask-level feature matching task by computing the similarity between mask features and prototypes. Further, we introduce a new paradigm and a network called the Learn-to-Match (L2M) Net, which alleviates weight shift and gains remarkable improvements on long settings by leveraging mask-level feature matching. Our method can be easily integrated into various network architectures without extra memory or data cost. Experiments conducted on the Pascal-VOC 2012 and ADE20K datasets demonstrate that, particularly on long settings, which are the most challenging for CSS, our method achieves a remarkable 10.6% improvement in terms of all mean Intersection over Union (mIoU) and establishes a new state-of-the-art performance in the demanding CSS settings.

Keywords: Semantic segmentation · Continual semantic segmentation · Class-incremental semantic segmentation

1 Introduction

Semantic segmentation is an important task in computer vision, which aims to assign the corresponding semantic category to every pixel in an image. This makes semantic segmentation widely applicable in many practical applications, such as autonomous driving, medical image analysis, and smart homes. However, in the real world, the assumption of independent and identically distributed (i.i.d.) data [7] does not hold for many settings. As a consequence, deep neural networks are prone to catastrophic forgetting [15,33]: in other words, the acquisition of new knowledge significantly lowers performance on previously learned knowledge. In recent years, there has been a growing focus on addressing this issue within the framework of CSS [11,26,27].


Beyond that, recent works [12,27] show that weight shift plays an important role in catastrophic forgetting: the network exhibits a higher tendency to incorrectly classify an object as a newly learned category rather than its actual category. Some methods [12,27] establish relationships among classes or normalize the classifier to address this problem. While they have made significant strides in CSS, they remain unsatisfactory on long settings due to a lack of comprehensive understanding of the relationships between different categories.

Most CSS methods adopt a paradigm that involves feature extraction and pixel-level mask prediction, which we call learn-to-predict. However, this paradigm is prone to weight shift caused by training the classifier, leading to potential issues in model stability. In this paper, we introduce a novel paradigm and network called the Learn-to-Match (L2M) Net to address the issue of weight shift in CSS. Our proposed method transfers the pixel-level prediction task into a mask-level feature matching task, which aims to learn the relationships between features and different categories. Specifically, we replace the classifier with a novel module called the Mask Matching module, which calculates the similarity between mask features and category prototypes. By leveraging this module, our method goes beyond CSS approaches that directly predict mask categories and enables the network to develop a more comprehensive understanding of the relationships between features and different categories. Experimental results demonstrate the effectiveness of our learn-to-match paradigm in mitigating the weight shift phenomenon and achieving remarkable performance gains. Specifically, our approach achieves a significant improvement of 10.6% in terms of overall mean Intersection over Union (mIoU) on the longer setups, demonstrating its superiority in handling continual learning scenarios.

The main contributions can be summarized as follows: (1) We propose an efficient CSS paradigm and a network called L2MNet. This paradigm treats CSS as a mask-level feature matching task. Compared with pixel-level prediction methods, it effectively alleviates weight shift on previously learned categories while simultaneously improving the performance on newly introduced categories. (2) We propose a mask matching module, which aims to learn the relationships between mask-level features and prototypes. It can be easily combined with various CSS approaches. (3) We experimentally show that our method significantly improves recent CSS approaches on several CSS settings and sets a new state-of-the-art under long settings.

2 Related Work

Semantic Segmentation. With the development of deep learning techniques, semantic segmentation methods based on CNNs and transformers have achieved impressive results. Encoder-decoder structures [10,37], multi-scale pooling [5], larger kernels [36] and multi-branch designs [31] have become the most common structures for extracting


and fusing feature information. Thanks to the global context modeling ability of transformers [34], several works [6,16,39] design variants of self-attention mechanisms and achieve great success.

Continual Learning. Continual learning studies the problem of learning from an infinite stream of data, with the goal of gradually extending acquired knowledge while avoiding forgetting. According to the division of incremental batches and the availability of task identities, typical continual learning scenarios can be divided into task-incremental learning, class-incremental learning and domain-incremental learning [7,18,35]. Our work studies class-incremental learning for semantic segmentation. Rehearsal-based methods [4,8,20,28,29] prevent catastrophic forgetting by replaying a subset of exemplars stored in limited memory: iCaRL [28] builds class exemplar sets and trains the model with a self-distillation loss, and SER [20] selects which experiences to store for better distribution matching. Regularization-based methods [21–23,40] introduce an extra regularization term into the loss function; PODNet [13] applies an efficient spatial-based distillation loss. Model-structure-based methods [30,32,42] change or expand the model structure for parameter isolation or task-specific training.

Continual Semantic Segmentation. Continual semantic segmentation is a sub-problem of continual learning, extending existing continual learning methods [23] to semantic segmentation, and most methods are built upon continual learning techniques. PLOP [11] proposes a multi-scale pooling distillation scheme, Local POD, to preserve spatial information. [26] presents latent representation shaping techniques to prevent forgetting. REMINDER [27] focuses on weight shift and uses semantic similarity between classes as a prior. RCIL [41] dynamically expands the network with a representation compensation module and proposes a new distillation mechanism. Some methods [3,12,19,24,38] use exemplar memory or extra models for current-step learning; [19,24] generate extra samples from the class space of past learning steps.

3 Method

3.1 Preliminaries and Revisiting

Preliminaries. CSS aims to train the model f over t = 0 . . . T steps, where t indexes the times at which new labels are provided. Let D = {X, Y} denote the training dataset, where X and Y are images and labels, respectively. At step t, given a model f_{t−1} trained continually on {D_0, D_1, ..., D_{t−1}} with classes {C_0, C_1, ..., C_{t−1}}, the model is supposed to learn to discriminate all seen classes C_{0:t} when only D_t with the newly added classes C_t is accessible. Notably, when training on D_t, the labels Y_t only contain the current classes C_t, which means all other classes (i.e., old classes C_{0:t−1} and future classes C_{t+1:T}) are labeled as the background class c_b. Hence, pixels labeled as background gradually become foreground classes during future incremental steps.


Fig. 1. Overview of the proposed L2M framework. At step t, we first obtain prototypes with the base method. Then, given the input image, it is sent to the previous-step model f_{t−1} for pseudo-labeling. Meanwhile, the current model f_t extracts features and predicts masks through the mask head. Then, we obtain mask features from the above with a cross-product and compute the similarity between mask features and prototypes. Finally, we compute the cross-product between the similarity and the predicted masks for the final output.

3.2 Proposed Learn-to-Match Framework

Figure 1 illustrates the framework of our proposed method. The input image is processed by a feature extractor, composed of a ResNet-101 and an ASPP module. The extracted features are then fed into our mask matching module, which combines them with prototypes to produce the final segmentation results. At step t, the frozen model f_{t−1} generates pseudo labels of the old classes, which are combined with the current-step labels. We train the model f_t using a cross-entropy loss with the combined labels and a distillation loss between the old model f_{t−1} and the current model f_t.

Prototype Computing. Prototypes (i.e., class centroids) are vectors representing each category that appears in the dataset [26], and they form the latent prototypical feature space. To generate the prototypes, a simple solution is to mine the features extracted by the feature extractor with pixel-level labels through clustering. Mined features are computed with a running average updated at each training batch. Given a batch of extracted features F ∈ R^{B×H×W×C}, we compute an in-batch average of F_i, where i = 1, ..., BHW. The prototype of class c can be written as

p_c = \frac{\sum_{i=1}^{BHW} F_i \, 1[y_i = c]}{|\{i : y_i = c\}|}    (1)

where 1[y_i = c] = 1 if the label of pixel i is c, and 0 otherwise, | · | denotes cardinality, and p_c ∈ R^{1×C}. Significantly, we compute prototypes with any CSS


method, called the base method, such as PLOP or REMINDER. Specifically, at step t, we first train a model with the base method and obtain prototypes as in Eq. 1, and then train the L2M model with these prototypes. To obtain accurate prototypes, we compute them at the end of each step of the base method and only update a prototype when the ground truth labels of the corresponding category are available; that is, once we obtain the prototype of a category c, we freeze it and do not update it in later steps.

Mask Matching Module. To avoid the weight shift of the classifier, we propose the mask matching module to replace the classifier. Specifically, given the extracted feature F ∈ R^{H×W×C}, we first predict n masks through a mask head consisting of MLP layers, written as g_mask(F) ∈ R^{n×H×W}. Each mask is regarded as an assembly of semantic information. Then, we obtain mask features m̂ from each mask and the extracted features with a cross-product:

\hat{m} = F ⊗ g_{mask}(F)    (2)

m ˆ i · pc , ||m ˆ i || · ||pc ||

exp si,c s˜i,c = C . j=0 exp si,j

(3) (4)

Then, our method applies similarity to predicted n masks for segmentation preˆ which is formulated as diction O, ˆ = Sof tmax(S˜ ⊗ gmask (F)). O

(5)

Via mask matching module, the model captures the class similarity S˜ of mask ˆ features and embeds it into outputs O. 3.3

Training Loss

Pseudo Labels. As mentioned above, the pixels labeled as background at current step may be labeled as foreground classes at previous steps or future steps, which leads to semantic background shift. Thus, we utilize pseudo-labeling strategy [11] to alleviate this problem. In our case, we use old model to generate pseudo-labels. Formally, at step t, given the old model ft−1 , we can get old

130

W. Zhang et al.

ˆ t−1 and combine pseudo-labels and true labels as Y˜ t , which is classes outputs O formulated as ⎧ t t 1, if Yi,c = 0 and c = argmaxYi,c ⎪ , ⎪ b ⎪ c ∈Ct ⎨ t t ˆt  , = 1 and c = argmax O Y˜i,c = 1, if Yi,c (6) i,c b ⎪ c ∈C0:t−1 ⎪ ⎪ ⎩ 0, otherwise. t to train current Then, we use the cross-entropy loss from combined labels Y˜i,c step model ft : HW λ   ˜t t Yi,c log Oi,c LCE = − (7) HW i=1 c∈C0:t

Knowledge Distillation. As has been used in recent works [11,27], we use knowledge distillation to alleviate catastrophic forgetting. The feature distillation loss is formulated as Lkd =

L 

||Θ(flt (X t )) − Θ(flt−1 (X t ))||2 ,

(8)

l=1

where l is the l-th layer of the network, || · || is the L2 distance the two sets of features and Θ(·) is a function of summarizing spatial statistics of features. We select Local POD [11] as summarizing strategy Θ(·). The final loss function is the summation of cross-entropy loss and distillation loss: (9) L = Lkd + LCE

4 4.1

Experiments Experimental Setting

Dataset. We follow the experimental setting of [27] and evaluate the performance of our proposed framework by standard image semantic segmentation dataset: Pascal-VOC 2012 [14] and ADE20k [43]. Pascal-VOC 2012 contains 20 foreground classes and background class. Its training and testing sets contain 10,582 and 1,449 images, respectively. ADE20k has 150 foreground classes, 20,210 training images, and 2,000 testing images. CSS Protocols. In continual semantic segmentation, the training process consists of multiple steps. There are two different experimental settings: disjoint and overlapped. At step t, in the former, training samples only contain old classes C0:t−1 and current classes Ct , while, in the latter, samples can contain future classes Ct+1:T . Thus, we believe that the latter is more realistic and challenging and evaluate on the overlapped setup only.

L2MNet: Enhancing Continual Semantic Segmentation with Mask Matching

131

For the Pascal-VOC 2012 dataset, we perform settings as follows: 15-5 setting with 2 steps (train 15 classes at step 0 and train 5 classes in the next step), 15-1 setting with 6 steps (train 15 classes at step 0 and train 1 class sequentially in later steps), 10-1 setting with 11 steps (train 10 classes at step 0 and train 1 class sequentially in later steps). For ADE20k, we perform settings 100-50, 100-10 and 100-5, similarly. Generally, longer setup with more steps is more challenging. Evaluation Metrics. We use four mean Intersection-over-Union (mIoU) evaluation metrics. We compute mIoU at step 0 and other steps, which measures model’s rigidity and plasticity respectively. Besides, we also compute the last step model’s mIoU for all seen classes. Lastly, we compute the average of mIoU [11] at each step to reflect the performance over the entire continual learning process. Baselines. We benchmark our method against the latest state-of-the-arts CSS methods including LwF [23], ILT [25], MiB [2], SDR [26], RCIL [41], PLOP [11], REMINDER [27]. For a fair comparison, all models do not allow the rehearsal memory where a limited quantity of previous classes samples can be rehearsed. Finally, we train a model with mixed training dataset including all classes, called Joint, which may constitute an upper bound for CSS methods. Implementation Details. We select PLOP and REMINDER as base method respectively to compute prototypes in our experiments. L2MNet uses a DeeplabV3 [5] architecture with a ResNet-101 [17] backbone pretrained on ImageNet [9] for all experiments. The output stride of Deeplab-v3 is set to 16. We apply the in-place activated batch normalization [1] as normalization layer. And we apply the same augmentation strategy as PLOP, e.g., horizontal flip and random crop. For all settings, the batch size is set to 24. And we optimize the network using SGD with an initial learning rate of 0.01 and a momentum value of 0.9. Also, we set the learning rate schedule following [11,27]. The number of mask head n is set to 100 for Pascal-VOC 2012 and 300 for ADE20k. Details of pseudo-label strategy and knowledge distillation follow the setting of PLOP. 4.2

Quantitative Evaluation

Table 1 shows quantitative experiments on VOC 15-5, 15-1 and 10-1. Our method’s results are written as {·} + Ours, where {·} denotes we select which method as base method. On 15-5, our method based on PLOP is on par with PLOP, as the performance of mask matching module is slightly worse than traditional classifier in non-continual semantic segmentation (i.e. step 0). Our method based on REMINDER is improved (+1.15%). On other settings, our method shows significant improvements on both old and new classes, whether based on PLOP or REMINDER. Especially, on the most challenging 10-1 setting, our method outperforms base method and recent SOTA approach (RCIL) a lot. The average mIoU shows that our method is more robust at each step. Table 2 shows

132

W. Zhang et al.

Table 1. Results on Pascal-VOC 2012 in mIoU (%). †: excerpted from [27]. Other results comes from re-implementation. method

15-5(2 steps)

15-1(6 steps)

10-1(11 steps)

1–15

16–20 all

avg

1–15

16–20 all

avg

1–15

16–20 all

avg

LwF† [23]

60.8

36.6

55.0



6.0

3.9

5.5



8.0

2.0

4.8



ILT† [25]

67.8

40.6

61.3



9.6

7.8

9.2



7.2

3.7

5.5



MiB† [2]

76.4

49.4

70.0



38.0

13.5

32.2



20.0

20.1

20.1



SDR† [26]

76.3

50.2

70.1



47.3

14.7

39.5



32.4

17.1

25.1



RCIL† [41]

78.8

52.0

72.4



70.6

23.7

59.4



55.4

15.1

34.3



PLOP [11]

76.03 51.19 70.12 75.18

66.38

22.94

56.04

67.18

38.60

15.81

27.75

50.55

PLOP+Ours

76.57 49.81 70.20 75.20

70.64

18.41

58.20

67.30

63.35

21.18

43.27

56.21

REMINDER [27]

75.63 48.90 69.27 74.96

67.22

27.53

57.77

68.32

38.93

9.68

25.00

50.63

REMINDER+Ours 76.50 50.97 70.42 75.29 72.97 29.67 62.66 69.86 64.40 23.34 44.85 57.96 Joint†

78.2

78.0

78.2



79.8

72.6

78.2



79.8

72.6

78.2



Table 2. Results on ADE20k in mIoU (%). †: excerpted from [27]. Other results comes from re-implementation. method

100-50(2 steps)

100-10(6 steps)

100-5(11 steps)

1–100 101–150all

avg

1–100 101–150all

avg

1–100 101–150all

ILT† [25]

18.3

14.8

17.0



0.1

2.9

1.1



0.1

1.3

0.5



MiB† [2]

40.7

17.7

32.8



38.3

11.3

29.2



36.0

5.6

25.9



RCIL† [41]

42.3

18.8

34.5



39.3

17.6

32.1



38.5

11.5

29.6



PLOP [11]

41.65 13.29

32.26 37.46 39.41 11.88

30.29 36.48 36.15 7.14

26.55 34.35

PLOP+Ours

42.65 19.27

34.91 38.78 41.18 16.74

33.09 37.54 40.27 13.61

31.44 36.24

REMINDER [27]

41.82 21.92

35.2338.9438.43 20.55

32.51 37.26 36.35 14.77

29.21 35.43

REMINDER+Ours42.8019.63

35.13 38.89 41.4316.66

33.2337.5840.5013.78

31.6536.35

Joint†

38.9

38.9

38.9

44.3

28.2



44.3

28.2



44.3

28.2

avg



Table 3. Performance on the Pascal-VOC 15-1 setting when using different predicting and distillation strategy. mask head distillation 1–15 16–20 all  

avg

70.22 18.7

57.95 67.19



50.59 6.08

39.99 54.31



70.64 18.41 58.20 67.30

results on the ADE20k 100-50, 100-10 and 100-5 settings. On 100-50 setting, our method outperforms REMINDER by 0.98% on old classes. On the long settings 100-10 (6 steps) and 100-5 (11 steps), our method outperforms on 1-100, all and avg metrics. Figure 2 shows the predictions for PLOP, REMINDER and our method with them on VOC 15-1 setting across time. All of them perform well at step 0. As the steps proceed, PLOP and REMINDER forget old knowledge gradually. When new categories train (green) and tv monitor (blue) are added at step 4 and step 5, we can observe severe weight shift on PLOP and REMINDER, some pixels are

L2MNet: Enhancing Continual Semantic Segmentation with Mask Matching

133

Table 4. Performance on the Pascal-VOC 15-1 (top) and the ADE20k 100-5 (bottom) settings with different number of mask head n. Dataset

n

1–15 16–20 1–100 101–150 all

Pascal-VOC 50 69.22 22.84 – 100 70.64 18.41 – 200 69.35 21.94 – ADE20k

100 – 300 –

– –

– – —

38.58 9.67 40.5 13.78

avg

58.18 67.33 58.2 67.3 58.06 67.46 29.01 34.14 31.65 36.35

Fig. 2. Visualization of (a) PLOP, (b) PLOP+Ours, (c) REMINDER, (d) REMINDER+Ours predictions across steps in VOC 15-1. The classes after step 0 are potted plant, sheep, sofa, train and tv monitor respectively.

wrongly predicted as new classes. On the other hand, our method keeps much more stable and avoid that the model generates strong bias toward new classes. 4.3

Ablation Study

Effectiveness of Mask Matching Module. Here, we analyze the effect of mask matching module. Table 3 compares the mIoU of each ablation case on VOC 15-1 scenario based on PLOP. Firstly, based on the distillation of the feature extractor, we discuss whether the output logits of the mask head need to be distilled too. Although the mask head without distillation performs well on new classes, the one through distillation has better results on overall classes. Besides, when the mask head is removed, we compute the similarity between prototypes and per-pixel feature embedding from feature extractor. We obverse

134

W. Zhang et al.

that the performance decreases a lot without mask head, which means mask head can greatly utilize feature information and mask-level similarity can lead to more stable learning. Number of Mask Head. Table 4 shows the influence of the number n of mask head. We observe that the number of mask head has little effect on VOC 15-1 setting. 100 is enough for Pascal-VOC (20 classes), however, the accuracy of model on ADE20k (150 classes) decreases a lot when n = 100. We need to increase the number of mask head for datasets including more categories.

5

Conclusion

In this paper, we propose a new paradigm that leverages the power of mask matching in continual semantic segmentation. Our proposed mask matching module transfers a pixel-level prediction task into a mask-level feature matching task by computing the similarity between mask features and prototypes. Comparing with pixel-level prediction methods, our method can handle with weight shift and catastrophic forgetting better. Experimental results show that our method can achieve better balance of learning new knowledge and keeping old ones, particularly on long settings. Acknowledgement. The paper is supported in part by the National Natural Science Foundation of China (62006036), and Fundamental Research Funds for Central Universities (DUT22LAB124, DUT22QN228).

References 1. Bulo, S.R., Porzi, L., Kontschieder, P.: In-place activated batchnorm for memoryoptimized training of DNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5639–5647 (2018) 2. Cermelli, F., Mancini, M., Bulo, S.R., Ricci, E., Caputo, B.: Modeling the background for incremental learning in semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9233– 9242 (2020) 3. Cha, S., Yoo, Y., Moon, T., et al.: SSUL: semantic segmentation with unknown label for exemplar-based class-incremental learning. Adv. Neural. Inf. Process. Syst. 34, 10919–10930 (2021) 4. Chaudhry, A., et al.: Continual learning with tiny episodic memories (2019) 5. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017) 6. Cheng, B., Schwing, A., Kirillov, A.: Per-pixel classification is not all you need for semantic segmentation. Adv. Neural. Inf. Process. Syst. 34, 17864–17875 (2021) 7. De Lange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021) 8. De Lange, M., Tuytelaars, T.: Continual prototype evolution: learning online from non-stationary data streams. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8250–8259 (2021)

L2MNet: Enhancing Continual Semantic Segmentation with Mask Matching

135

9. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) 10. Ding, H., Jiang, X., Shuai, B., Liu, A.Q., Wang, G.: Semantic segmentation with context encoding and multi-path decoding. IEEE Trans. Image Process. 29, 3520– 3533 (2020) 11. Douillard, A., Chen, Y., Dapogny, A., Cord, M.: PLOP: learning without forgetting for continual semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4040–4050 (2021) 12. Douillard, A., Chen, Y., Dapogny, A., Cord, M.: Tackling catastrophic forgetting and background shift in continual semantic segmentation. arXiv preprint arXiv:2106.15287 (2021) 13. Douillard, A., Cord, M., Ollion, C., Robert, T., Valle, E.: PODNet: pooled outputs distillation for small-tasks incremental learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part XX. LNCS, vol. 12365, pp. 86–102. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5 6 14. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88, 303–338 (2010) 15. French, R.M.: Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3(4), 128–135 (1999) 16. Fu, J., et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3146– 3154 (2019) 17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 18. Hsu, Y.C., Liu, Y.C., Ramasamy, A., Kira, Z.: Re-evaluating continual learning scenarios: a categorization and case for strong baselines. arXiv preprint arXiv:1810.12488 (2018) 19. Huang, Z., et al.: Half-real half-fake distillation for class-incremental semantic segmentation. arXiv preprint arXiv:2104.00875 (2021) 20. Isele, D., Cosgun, A.: Selective experience replay for lifelong learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018) 21. Jung, H., Ju, J., Jung, M., Kim, J.: Less-forgetting learning in deep neural networks. arXiv preprint arXiv:1607.00122 (2016) 22. Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Nat. Acad. Sci. 114(13), 3521–3526 (2017) 23. Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2935–2947 (2017) 24. Maracani, A., Michieli, U., Toldo, M., Zanuttigh, P.: Recall: replay-based continual learning in semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7026–7035 (2021) 25. Michieli, U., Zanuttigh, P.: Incremental learning techniques for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0 (2019) 26. Michieli, U., Zanuttigh, P.: Continual semantic segmentation via repulsionattraction of sparse and disentangled latent representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1114– 1124 (2021)

136

W. Zhang et al.

27. Phan, M.H., Phung, S.L., Tran-Thanh, L., Bouzerdoum, A., et al.: Class similarity weighted knowledge distillation for continual semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16866–16875 (2022) 28. Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: ICARL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017) 29. Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., Wayne, G.: Experience replay for continual learning. Adv. Neural Inf. Process. Syst. 32 (2019) 30. Singh, P., Mazumder, P., Rai, P., Namboodiri, V.P.: Rectification-based knowledge retention for continual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15282–15291 (2021) 31. Sun, K., et al.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019) 32. Tao, X., Hong, X., Chang, X., Dong, S., Wei, X., Gong, Y.: Few-shot classincremental learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12183–12192 (2020) 33. Thrun, S.: Lifelong learning algorithms. Learn. Learn 8, 181–209 (1998) 34. Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017) 35. Van de Ven, G.M., Tolias, A.S.: Three scenarios for continual learning. arXiv preprint arXiv:1904.07734 (2019) 36. Wu, Y., et al.: Large scale incremental learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 374–382 (2019) 37. Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: Learning a discriminative feature network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1857–1866 (2018) 38. Yu, L., Liu, X., Van de Weijer, J.: Self-training for class-incremental semantic segmentation. IEEE Trans. Neural Netw. Learning Syst. 34, 9116–9127 (2022) 39. Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part VI. LNCS, vol. 12351, pp. 173–190. Springer, Cham (2020). https://doi.org/ 10.1007/978-3-030-58539-6 11 40. Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: International Conference on Machine Learning, pp. 3987–3995. PMLR (2017) 41. Zhang, C.B., Xiao, J.W., Liu, X., Chen, Y.C., Cheng, M.M.: Representation compensation networks for continual semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7053– 7064 (2022) 42. Zhang, C., Song, N., Lin, G., Zheng, Y., Pan, P., Xu, Y.: Few-shot incremental learning with continually evolved classifiers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12455–12464 (2021) 43. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 633–641 (2017)

Adaptive Channel Pruning for Trainability Protection Jiaxin Liu1,2 , Dazong Zhang4 , Wei Liu1,2(B) , Yongming Li3 , Jun Hu2 , Shuai Cheng2 , and Wenxing Yang1,2 1

School of Computer Science and Engineering, Northeastern University, Liaoning 110167, China 2 Neusoft Reach Automotive Technology Company, Liaoning 110179, China [email protected] 3 College of Science, Liaoning University of Technology, Liaoing 121001, China 4 BYD Auto Industry Company Limited, Shenzhen 518118, China

Abstract. Pruning is a widely used method for compressing neural networks, reducing their computational requirements by removing unimportant connections. However, many existing pruning methods prune pretrained models by using the same pruning rate for each layer, neglecting the protection of model trainability and damaging accuracy. Additionally, the number of redundant parameters per layer in complex models varies, necessitating adjustment of the pruning rate according to model structure and training data. To overcome these issues, we propose a trainability-preserving adaptive channel pruning method that prunes during training. Our approach utilizes a model weight-based similarity calculation module to eliminate unnecessary channels while protecting model trainability and correcting output feature maps. An adaptive sparsity control module assigns pruning rates for each layer according to a preset target and aids network training. We performed experiments on CIFAR-10 and Imagenet classification datasets using networks of various structures. Our technique outperformed comparison methods at different pruning rates. Additionally, we confirmed the effectiveness of our technique on the object detection datasets VOC and COCO. Keywords: Convolutional neural networks · Trainability preservation · Model compression · Pruning

1

Introduction

Deep convolutional neural networks (CNNs) are widely used in the field of computer vision, such as classification [6], object detection [4] and segmentation [16]. However, it is challenging to deploy many models on mobile devices due to the This work is supported by the National Natural Science Foundation of China under Grant U22A2043, and the Unveiling the list of hanging (science and technology research special) of Liaoning province under Grant 2022JH1/10400030. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 137–148, 2024. https://doi.org/10.1007/978-981-99-8549-4_12

138

J. Liu et al.

models’ high computational complexity and huge number of parameters, which place a heavy burden on the device’s processing speed and memory. To address this issue, several model compression methods have been proposed. Pruning is one of the crucial directions. Pruning compresses the number of parameters and computational cost of a model by removing unimportant connections in the model, and it can be divided into two categories according to the granularity of pruning: unstructured and structured pruning. Unstructured pruning methods prune the convolution kernel irregularly sparse [5] and need to be applied on hardware with special acceleration libraries to be accelerated. The pruned model of structured pruning method can be directly applied to mainstream hardware devices [9], where the channel pruning method is a mainstream structured pruning method. Channel pruning methods can be divided into two types: static pruning and dynamic pruning, according to the method of filter importance calculation. The traditional static pruning methods have two main issues. The first is that the importance evaluation only considers the inherent characteristics of the trained model without taking into account the trainability of the pruned model, making it difficult to recover accuracy even after fine-tuning. Second, most methods use a global uniform pruning rate, but there is no fixed pruning rate applicable to each layer [1]. Dynamic pruning methods improve performance by introducing auxiliary modules that calculate the relevant channel salience score for each batch of input data and dynamically prune the unimportant parts of the network; however, they must save the entire network during inference and rely on activating different parts of the network to reduce computation without reducing memory consumption. Moreover, these methods introduce the auxiliary module only to calculate the salience score, ignoring its function of assisting network training. To address the problem presented above, we propose an adaptive static pruning method based on the similarity of convolutional kernels, motivated by the model trainability preservation methods [17,19]. Our method removes the redundant channels during training while maintaining the trainability of the original network and adaptively allocating the pruning rate of each layer. The main contribution of this paper consists of three aspects: 1. We design a similarity calculation module based on convolutional kernel weights to determine the pruned channel during training and maintain the trainability of the network. 2. We design an adaptive sparse control module that makes the discretization process of the saliency scores during training differentiable and assists the training of the network. The sparse control module interacts with the pruning rate regularization term in the loss function to adaptively assign the pruning rate of each layer. 3. The extensive experiments demonstrate that our method performs well on different tasks and different structured network models. Compared with existing pruning methods, the accuracy degradation of our method is lower for similar or larger pruning rates, demonstrating the effectiveness of our method.

Adaptive Channel Pruning for Trainability Protection

2

139

Related Work

Pruning is a common model compression method, which can be divided into unstructured and structured pruning according to the granularity of pruning. Weight pruning is a common unstructured pruning technique that removes unimportant individual weights from the convolution kernel [5]. This causes the convolution kernel of the model to have irregular sparsity, making it difficult to apply the pruned model to general-purpose hardware or BLAS (Basic Linear Algebra Subprograms) libraries. The structure of the model after structured pruning is still regular and can avoid the application limitation of weight pruning. Typical structured pruning methods include kernel pruning [11], channel pruning [2,8,9,12,15], and layer pruning. Besides, in order to balance the computational efficiency and the performance of the pruned model, [21,22] proposed the N:M pruning method to retain up to N of the consecutive M weight parameters, and this sparse structure can achieve the acceleration effect on NVIDIA A100 GPUs. Channel pruning can be divided into two types: static pruning and dynamic pruning. Static pruning methods generate a fixed compressed model for all input data. Among them, some static pruning methods evaluate the importance of the channel based on the inherent properties of the trained model. Li et al. [9] used the 1 -norm of the convolutional kernel weight parameter as the importance evaluation index and prunes the convolutional kernels with small 1 -norm values. FPGM [8] proposed a similarity calculation of convolutional kernels based on Euclidean distance to remove convolutional kernels with high similarity to other convolutional kernels. Hrank [12] concluded that the higher the rank of the feature map, the more information it contains, and confirmed that the rank of the feature map produced by the same convolution kernel is constant across all inputs. There are also some static pruning methods that identify unimportant convolutional kernels during training and output the pruned model after training. AutoPruner [14] used fully connected layers to predict the saliency score of the feature maps output from each convolutional layer and then used the sigmoid function to discretize the saliency score step by step. Rachwan et al. [15] proposed an efficient method to find the optimal submodel at the early stage of training based on gradient flow preservation. Dynamic pruning differs from static pruning in that the complete network structure is kept, and the retained fraction of the network is dynamically modified during inference based on each input. FBS [3] proposed the squeezeexcitation module to predict the significance scores of channels, skipping the lower-scoring parts that contribute less to the results. FTWT [2] proposed a method to decouple pruning loss and target loss, utilizing an additional network to predict a fixed percentage of pruned channels for each layer. Since the dynamic pruning preserves the complete model structure, the compressed model after pruning will have higher representational power compared to the static pruning but also have a larger memory footprint.

140

J. Liu et al.

Fig. 1. Framework diagram of the method. For forward propagation, the saliency score of each channel in the output dimension is determined based on the similarity of the convolutional layer weights and then activated by the hard sigmoid function. The saliency score of the output after activation is utilized for the refining of the output feature map and the calculation of the pruning rate and weight orthogonalization loss.

3 3.1

Method Method Framework and Motivation

Our method consists of two main components: the channel similarity calculation and trainability preservation module, and the sparsity control module. The overall structure and flow of the model are shown in Fig. 1. The design of the channel similarity calculation and trainability preservation modules is inspired by the methods [17,19]. The propagation of error signals and the network’s convergence are both aided by the network’s Jacobian’s singular values being near to 1 [17]. For the fully-connected layer, the way it is implemented is that the gram matrix of weights is the identity matrix, as shown in Eq. (1): y = W x,   ||y|| = yT y = xT W T W x = ||x|| , if f.W T W = I,

(1)

I is the identity matrix. TPP [19] applies its extension to the convolutional layer. TPP computes the gram matrix of the convolutional layer weights W. The elements of the positions in the gram matrix associated with the pruned channels are added to the loss function to converge to zero, as shown in Eq. (2): L=

L  l=1

 2  T ||F , mj = 0 if j ∈ Sl , else 1, ||W l W T l  1 − mm

Sl stands for pruned channel.

(2)

Adaptive Channel Pruning for Trainability Protection

141

Our paper designs a gram matrix-based channel similarity calculation module that is able to adaptively allocate the pruning rate of each layer according to the pruning target. To maintain the network’s trainability, regularization constraints are applied on the associated parts of the pruned and retained channels in the loss function based on the salience score. The sparse control module activates and discretizes the channel similarity scores. During forward propagation, the salience scores are reweighted on the output feature maps to gradually reduce the transmission of information from channels with lower salience scores, and the gradient returned during backward propagation aids with model training. After the training is finished, the channels corresponding to the 0-valued channels are removed, and the pruned model is generated. 3.2

Channel Similarity Calculation and Trainability Preservation

Fig. 2. Structure diagram of channel similarity calculation and trainability protection module.

The structure of the channel similarity calculation and trainability preservation module is shown in Fig. 2. This module is attached to the original network during training and removed when training is completed. When calculating the similarity between channels in the Lth convolutional layer of the model, the  weights W l are first resized into vectors in the output dimension to obtain W l , where Cout , Cin , and k correspond to the number of output channels, the number of input channels, and the size of the convolutional kernel, respectively. Then, the gram weight matrix is calculated, with each position of the matrix corresponding to the cosine similarity of the associated two channel weights. The gram

142

J. Liu et al.

matrix is normalized by row using softmax in order to compare the similarity difference between the parameters of each channel. The similarity of the channel to other channels is thus represented by the sum of the elements in each row, excluding the primary diagonal element. The larger the major diagonal element represents, the less similar the related channel is to other channels, and therefore the greater the possibility that it should be kept. Therefore, we take the major diagonal element of the gram matrix as the channel’s salience score, M c , and the whole process is shown in Eq. (3): 

T

M c = diag(sof tmax(W l W l )).

(3)

To preserve the trainability of the model during pruning, we also introduce regularization constraints on some elements of the gram matrix in the loss function. However, unlike TPP, we constrain the elements associated with both the pruned and preserved channels to better enable the gram matrix of the pruned network to converge to the identity matrix. In the gram matrix, we constrain the row and column elements related to the pruned channels as well as the diagonal elements related to the reserved channels. Another difference is that we do not explicitly constrain the elements of the gram matrix associated with the pruned channels to zero or one, which is still a too strong constraint. Instead, we gradually raise the constraint during training depending on the saliency score of each channel and the preset threshold parameter mth , which may assist network training and improve the performance of the pruned network. As shown in Eq. (4) and Eq. (5): L1 =

L  l=1



T

2

||(1 − diag(W l W l )  M c ||F , mci = mci, if mci > mth ,

(4)

else mci = 0,

L2 =

L  l=1



T

||W l W l  (1 − M c T M c )||F , mc i = mc i, if mc i < 1 − mth , 2

(5)

else mc i = 1. 3.3

Sparse Control and Optimization

To effectively utilize the filtering and trainability protection capabilities of the channel similarity calculation module, it is essential to gradually discretize the salience score in a differentiable manner during the training process. In this paper, we employ the hard sigmoid function, as illustrated in Eq. (6): σ(βMc ) = min(max((βMc ) + 0.5, 0), 1),

(6)

where β is a scale parameter specific to each channel, which is adaptively adjusted based on the activation output value. When β → ∞ , the hard sigmoid function converges to the binarized sign function. To avoid the unstable data distribution

Adaptive Channel Pruning for Trainability Protection

143

during the early stages of training, which results in some of the salience scores being activated to 0 or 1, we set the scale parameter β to 1 during initialization for all channels. Meanwhile, we normalize the salience score M c to a normal distribution with a mean of 0 and a standard deviation of 0.25 to limit the distribution range of M c . As the training progresses, the difference in salience scores of different channels gradually expands and stabilizes, and the activation output of M c gradually discretes to 0 or 1 after the modification of the scale parameter β. Where β is updated as shown in Eq. (7): β = min(1 + αδ, 100),

(7)

α and δ are the parameters that control the growth rate and the growth amount of β, respectively. where α is a constant value and δ is updated adaptively every T iterations according to the saliency score of each channel, as specified by Eq. (8): δ = δ + 1 − | σ(βMc ) − 0.5|.

(8)

To be able to prune the network based on the compression target and meet the computational complexity requirements of the model for various practical applications while ensuring the model’s trainability. We add a simple but effective regularization term to the loss function, as shown in Eq. (9): L= LOri + λ(||

Fp Fo

− p||2 + τ (L1 + L2 ))

(9)

The first term is the loss function of the original task, and the second term is the difference between the compression rate of the current model and the target compression rate and the parameter constraint loss of trainability protection. Fp , Fo , and p denote the computational complexity of the pruned model, the total computational complexity of the model, and the preset compression target, respectively. FLOPs are used as the metric to measure the computation of the model. τ and λ are balance parameters, where τ is a constant and λ is calculated as shown in Eq. (10):    F  (10) λ = 100 ×  Fpo − p

4 4.1

Experiments Experiments Settings and Evaluation Metrics

We conducted experiments on the classification task and the object detection task, respectively, to evaluate the effectiveness of our proposed pruning method for model compression. The CIFAR-10 and Imagenet datasets are used with the VGG and ResNet series networks for the classification task model. On the VOC and COCO datasets, the target detection task was tested utilizing YOLOX-s networks. The threshold mth used in the regularization loss calculation is set to

144

J. Liu et al.

0.9. The scale parameters β used in the Hard sigmoid activation function are all set to 1 at initialization, its growth rate α is initialized to 0.0005, and the growth amount δ is initialized to 0. δ is updated every T iterations. We evaluated the pruning rate using the percentage drop in FLOPs after pruning and the accuracy rate using the top-1 accuracy. Under similar pruning rates, we compare the accuracy declines of various methods. 4.2

Results on Imagenet

Table 1 shows the pruning results for ResN-18, ResN-34, and ResN-50 achieved by our method and some comparison methods on the ImageNet dataset. Our method achieves a lesser accuracy decline at higher pruning rates for the ResN18 and ResN-34 pruning results. In particular, when the pruning rate reaches 51.68% on ResN-18 and 52.50% on ResN-34, the top-1 accuracy only drops by 1.15% and 1.13%, respectively. Our method demonstrates superior accuracy retention even at a pruning rate of 57.5% on ResN-50. When the pruning rate reaches 76.3%, its performance is comparable to BCNet. The comparison results show that our method preserves more rich features in the preserved channels than typical pruning methods that just consider the parameters of the trained model and partially adaptively search for the best structure. Figure 3 depicts the output features of the first convolutional layer of the ResNet-18 network’s first residual module before and after pruning, and it can be observed that the features recovered from the maintained channels are more unique. Table 1. The results of pruning ResNet on Imagenet Model

Method

Dynamic Baseline Top-1 Acc.(%) Pruned Top-1 Acc.(%) Top-1 Gap(%) FLOPs↓(%)

ResNet-18 SFP [7] ✘ FPGM [8] ✘ PFP [10] ✘ DSA [20] ✘ ABCPruner [13] ✘ Ours ✘ FBS [3] ✔ FTWT [2] ✔ Ours ✘

70.28 70.28 69.74 69.72 69.66 69.75 70.71 69.76 69.75

67.10 68.41 65.65 68.61 67.28 69.11 68.71 67.49 68.60

3.18 1.87 4.09 1.11 2.38 0.64 2.54 2.27 1.15

41.80 41.80 43.10 40.00 43.55 43.80 49.50 51.56 51.68

✘ ✘ ✔ ✘ ✔ ✘

73.92 73.92 73.30 73.38 73.30 73.38

71.83 72.54 72.17 72.62 71.71 72.25

2.09 1.38 1.13 0.76 1.59 1.13

41.80 41.10 47.42 49.20 52.24 52.50

ResNet-50 FPGM [8] ✘ ABCPruner [13] ✘ Ours ✘ Hrank [12] ✘ BCNet [18] ✘ Ours ✘

76.15 76.01 76.01 76.15 77.5 76.01

74.83 73.52 74.82 71.98 75.2 73.70

1.32 2.49 1.19 4.17 2.3 2.31

53.50 56.01 57.50 62.10 76.00 76.30

ResNet-34 SFP [7] FPGM [8] FTWT [2] Ours FTWT [2] Ours

Adaptive Channel Pruning for Trainability Protection

145

Fig. 3. The feature maps of the first convolutional layer of ResNet-18’s first residual block The red box represents the channel that was preserved after pruning.

4.3

Results on Cifar-10

Table 2 presents pruning results for our proposed method and comparison methods on CIFAR-10 using VGG-16 and ResNet-56. Applying our method to VGG16 at a 34.56% pruning rate resulted in a 0.2% increase in top-1 accuracy, which consistently outperformed other methods at higher pruning rates. Our approach achieves fewer top-1 accuracy drops for ResNet-56 network at similar pruning rates. For ResNet-56 network pruning by 67.80%, the accuracy drop is only 0.41%. Table 2. The results of pruning ResNet on CIFAR-10 Model

Method

Dynamic Baseline Top-1 Acc.(%) Pruned Top-1 Acc.(%) Top-1 Gap(%) FLOPs↓(%)

VGG-16

PEFC [8] Ours FPGM [8] Hrank [12] FTWT [2] Ours Hrank [12] FTWT [2] Ours

✘ ✘ ✘ ✘ ✔ ✘ ✘ ✔ ✘

93.25 93.25 93.58 93.96 93.82 93.25 93.96 93.82 93.25

93.40 93.45 93.23 93.43 93.73 93.18 91.23 93.19 92.74

−0.15 −0.20 0.35 0.53 0.09 0.07 2.73 0.63 0.51

34.20 34.56 35.90 53.50 56.00 54.00 76.50 73.00 77.87

ResNet-56 SFP [7] FPGM [8] Hrank [12] DSA [20] Ours FTWT [2] Ours

✘ ✘ ✘ ✘ ✘ ✔ ✘

93.59 93.59 93.26 93.12 93.05 93.66 93.05

92.26 93.49 93.17 92.91 93.00 92.63 92.64

1.33 0.10 0.09 0.21 0.05 1.03 0.41

52.60 52.60 50.00 52.20 54.20 66.00 67.80

4.4

Results on YOLOX-s

To validate our proposed method on other vision tasks, we evaluated its efficacy on the lightweight object detection model YOLOX-s using the VOC and COCO datasets. Our experiments, as detailed in Table 3, showed that even after pruning, our method achieved high mAP values, underscoring its effectiveness.

146

J. Liu et al. Table 3. The results of pruning YOLOX-S [4]

Dataset Baseline mAP0.5:0.95 Pruned mAP0.5:0.95 mAP0.5:0.95 Gap FLOPs↓(%) VOC

59.07

58.28

0.79

41.20

COCO

39.60

37.96

1.64

39.80

4.5

Ablation

To validate our channel saliency score calculation method, we employed several comparison approaches and evaluated them on the ImageNet dataset using the ResNet-18 neural network. In the first variant, the inverse of the saliency score is taken by subtracting the original score from 1. In the second variant, the saliency score is set as a trainable parameter. Initially, it is randomly generated and then trained parallel to the network. Results in Table 4 showed that our proposed method had the least reduction in top-1 accuracy, while the Reverse method showed the most significant reduction. These results verified the effectiveness of our designed method in calculating the channel saliency score. Table 4. Results of ablation study for the channel salience score calculation methods. Method

Baseline Top-1 Acc.(%) Pruned Top-1 Acc.(%) Top-1 Gap(%) FLOPs↓(%)

Ours

69.75

69.11

0.64

43.80

Reverse

69.75

67.86

1.89

43.60

Random 69.75

68.31

1.44

43.70

To evaluate the efficiency of our strategy, which constrains elements in the gram matrix associated with pruned and reserved channels using salience score guidance, we devised three comparison methods and ran experiments on ImageNet using the ResNet-18 network. The first approach constrained only the elements associated with pruned channels in the gram matrix using saliency score guidance without restricting diagonal elements corresponding to the retained channels. The second method removed saliency score guidance and directly constrained related elements to 0 or 1, according to the original settings. The third approach disregarded saliency score guidance while only constraining pruned channel elements in the gram matrix. Experimental results are presented in Table 5, which demonstrates the effectiveness of our proposed salience score bootstrap method and the joint constraint of pruned and reserved channel elements in protecting network trainability.

Adaptive Channel Pruning for Trainability Protection

147

Table 5. Results of ablation study for the corresponding parameter constraint methods for the retained and pruned channels in the gram matrix. “Prune” refers to constraints on elements related to pruned channels; “Retain” refers to constraints on elements related to retained channels; and the “Salience Score” metric uses salience scores to bootstrap element constraints. Method

Baseline Top-1 Acc.(%) Pruned Top-1 Acc.(%) Top-1 Gap(%) FLOPs↓(%)

Prune + Retain 69.75 + Salience Score (Ours)

69.11

0.64

43.80

Prune score

68.80

0.95

43.60

+

Salience 69.75

Prune + Retain

69.75

68.72

1.03

43.50

Prune

69.75

68.50

1.25

43.50

5

Conclusion

In this paper, we propose an adaptive channel pruning method that preserves trainability. Our approach utilizes a similarity calculation module that identifies redundant output channels during training while maintaining model trainability. Additionally, the sparsity control module assigns pruning rates adaptively based on the overall compression target. Our experimental results demonstrate that our method is effective for various visual tasks and various structural networks, achieving higher accuracy than existing pruning methods at similar pruning rates.

References 1. Chen, J., Zhu, Z., Li, C., Zhao, Y.: Self-adaptive network pruning. In: International Conference on Neural Information Processing, pp. 175–186 (2019) 2. Elkerdawy, S., Elhoushi, M., Zhang, H., Ray, N.: Fire together wire together: a dynamic pruning approach with self-supervised mask prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12454–12463 (2022) 3. Gao, X., Zhao, Y., Dudziak, L  ., Mullins, R., Xu, C.Z.: Dynamic channel pruning: feature boosting and suppression. arXiv preprint: arXiv:1810.05331 (2018) 4. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: exceeding yolo series in 2021. arXiv preprint: arXiv:2107.08430 (2021) 5. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, vol. 28 (2015) 6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770–778 (2016) 7. He, Y., Kang, G., Dong, X., Fu, Y., Yang, Y.: Soft filter pruning for accelerating deep convolutional neural networks. IJCAI, 2234–2240 (2018)

148

J. Liu et al.

8. He, Y., Liu, P., Wang, Z., Hu, Z., Yang, Y.: Filter pruning via geometric median for deep convolutional neural networks acceleration. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4335–4344 (2019) 9. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, P.H.: Pruning filters for efficient ConvNets. In: International Conference on Learning Representations (2017) 10. Liebenwein, L., Baykal, C., Lang, H., Feldman, D., Rus, D.: Provable filter pruning for efficient neural networks. ICLR (2020) 11. Lin, M., Cao, L., Zhang, Y., Shao, L., Lin, C.W., Ji, R.: Pruning networks with cross-layer ranking & k-reciprocal nearest filters. IEEE Trans. Neural Netw. Learn. Syst. (2022) 12. Lin, M., et al.: HRank: filter pruning using high-rank feature map. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1526–1535 (2020) 13. Lin, M., Ji, R., Zhang, Y., Zhang, B., Wu, Y., Tian, Y.: Channel pruning via automatic structure search. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pp. 673–679. International Joint Conferences on Artificial Intelligence Organization (2020). main track 14. Luo, J.H., Wu, J.: AutoPruner: an end-to-end trainable filter pruning method for efficient deep model inference. Pattern Recogn. 107, 107461 (2020) 15. Rachwan, J., Z¨ ugner, D., Charpentier, B., Geisler, S., Ayle, M., G¨ unnemann, S.: Winning the lottery ahead of time: efficient early network pruning. In: International Conference on Machine Learning, pp. 18293–18309 (2022) 16. Romera, E., Alvarez, J.M., Bergasa, L.M., Arroyo, R.: ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 19(1), 263–272 (2018) 17. Saxe, A., McClelland, J., Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In: International Conference on Learning Represenatations (2014) 18. Su, X., et al.: Searching for network width with bilaterally coupled network. IEEE Trans. Pattern Anal. Mach. Intell. 45(7), 8936–8953 (2023) 19. Wang, H., Fu, Y.: Trainability preserving neural pruning. In: The Eleventh International Conference on Learning Representations (2023) 20. Ning, X., Zhao, T., Li, W., Lei, P., Wang, Yu., Yang, H.: DSA: more efficient budgeted pruning via differentiable sparsity allocation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 592–607. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8 35 21. Zhang, Y., et al.: Learning best combination for efficient n:M sparsity. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022) 22. Zhou, A., et al.: Learning n: M fine-grained structured sparse neural networks from scratch. ICLR (2021)

Exploiting Adaptive Crop and Deformable Convolution for Road Damage Detection Yingduo Bai, Chenhao Fu, Zhaojia Li, Liyang Wang, Li Su, and Na Jiang(B) Capital Normal University, Beijing 100048, China [email protected]

Abstract. Road damage detection (RDD) based on computer vision plays an important role in road maintenance. Unlike conventional object detection, it is very challenging due to the irregular shape distribution and high similarity with the background. To address this issue, we propose a novel road damage detection algorithm from the perspective of optimizing data and enhancing feature learning. It consists of adaptive cropping, feature learning with deformable convolution, and a diagonal intersection over union loss function (XIOU). Adaptive cropping uses vanishing point estimation (VPE) to obtain the pavement reference position, and then effectively removes the redundant information of interference detection by cutting the raw image above the reference position. The feature learning module introduces deformable convolution to adjust the receptive field of road damage with irregular shape distribution, which will help enhance feature differentiation. The designed diagonal IOU loss function (XIOU) optimizes the road damage location by weighted calculation of the intersection and comparison between the predicted proposal and the groundtruth. Compared with existing methods, the proposed algorithm is more suitable for road damage detection task and has achieved excellent performance on authoritative RDD and CNRDD datasets. Keywords: Road Damage Detection Diagonal IOU Loss Function

1

· Adaptive Image Cropping ·

Introduction

Road damage detection (RDD) is crucial for road maintenance, which can effectively ensure pavement safety. Due to the high similarity between road damages and the road background, as well as the irregular distribution of pixels that can represent damages, this task is very challenging. Different from early manual methods which is time-consuming and laborious, RDD based on deep learning can quickly and automatically analyze the collected images. Many algorithms that consider sensors or computer vision have been proposed and significantly improve the accuracy and efficiency of detection. Although such methods often c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 149–160, 2024. https://doi.org/10.1007/978-981-99-8549-4_13

150

Y. Bai et al.

exhibit high precision, the equipment used for data collection is expensive, limiting their widespread application. In contrast, methods that utilize RGB images captured by onboard cameras for detection are more efficient and cost-effective for the RDD task. These methods employ deep convolutional neural networks to extract damage features from the data collected by vehicle-mounted cameras. Damage detection methods based on deep convolutional neural networks mainly include semantic segmentation methods and object detection methods. Semantic segmentation methods typically use an encoder-decoder structure to classify each pixel to obtain accurate damage areas, such as FCN [1], Unet [2], etc. However, the success of deep learning models in semantic segmentation comes at the cost of a heavy computational burden. Object detection methods are not only cost-effective but also simple and efficient because they do not require pixel-wise predictions. Instead, they only need to predict the position of a bounding box that encompasses the damage. Additionally, the cost of dataset annotation for semantic segmentation is much higher than that for object detection. Therefore, most research on road damage detection algorithms is based on object detection algorithms. Although conventional object detection methods have achieved high accuracy and performance in their fields, they cannot be directly applied to RDD tasks due to the differences between road damages and conventional target.

Fig. 1. Comparison between object detection and road damage detection. (a) comes from the COCO dataset, and (b) comes from the RDD2020 dataset.

As shown in Fig. 1. Conventional detection targets typically provide clear and highly informative visual data within the annotated box, with a significant proportion of pixels dedicated to the target. However, detecting road damages presents considerable challenges due to their slender and elongated shapes, which occupy only a fraction of the annotated box pixels. As a result of these intricacies, conventional object detection methods often yield unsatisfactory outcomes when employed for the RDD task. Currently, most research on RDD task [3–8] uses conventional object detection algorithms [9,10] as the baseline for model adjustment or ensemble learning. Although ensemble learning can indeed improve accuracy, it also brings high computational costs. Additionally, existing methods have not taken into account the difficulties caused by the differences between damages and conventional targets. Firstly, due to the irregular shape distribution of damages, the network cannot accurately extract the features of damages. Secondly, the data

Exploiting Adaptive Crop and Deformable Convolution for RDD

151

is collected from onboard cameras, resulting in varying perspectives and a large rate of background. This information seriously interferes with the prediction performance of the network. Another crucial point is that the proportion of pixels occupied by damages within the annotated bounding boxes is very sparse. If a generic localization loss is used for supervised learning, it cannot achieve satisfactory supervision results. To address these problems, our study proposes targeted algorithmic solutions that focus on road damage detection. We mainly make the following contributions: 1. Propose an adaptive image cropping data preprocessing method. It extracts the vanishing point in the images and then adaptively crops the raw image, which can effectively alleviate the interference of redundant information (sky, buildings, etc.) on RDD. 2. Improve feature learning module with deformable convolution(DCN). It exploits DCN to enhance feature differentiation, which can adapt to the irregular pixel distribution in road damages by adjusting receptive fields. 3. Design a diagonal intersection over union (XIOU) loss function. It calculates the Gaussian weighted distance between pixels in the prediction box and the groundtruth diagonal to achieve the intersection and union ratio, which can optimize the location and size of the prediction box.

2

Related Work

Deep learning methods can automatically extract various features of damage. In practice, object detection methods are divided into two categories: onestage algorithms and two-stage algorithms. Two-stage algorithms first generate a series of candidate bounding boxes as samples, and then classify the samples through convolutional neural networks. Typical representatives of these algorithms include Faster R-CNN [9], Cascade R-CNN [11], etc. Hascoet et al. [3] used Faster-RCNN for damage detection, improved detection performance using label smoothing techniques, and demonstrated efforts to deploy models on local road networks. Vishwakarma et al. [4] introduced a tuning strategy of FasterRCNN with ResNet and Feature Pyramid Network (FPN) backbone networks of different depths. In addition, they compared it with a single-stage YOLOv5 [10] model with a Cross Stage Partial network (CSPNet) backbone network. Pei et al. [5] applied Cascaded R-CNN to damage detection and proposed a consistency filtering mechanism (CFM) for self-supervised methods to fully utilize available unlabeled data. However, one-stage algorithms treat target detection as a regression task, directly regress bounding boxes and predict multiple positions of categories throughout the entire image to obtain more comprehensive information, such as SSD [12], YOLO series [13,14], EfficientDet [15], etc. Zhang et al. [6] used YOLOv4 as the underlying network for damage detection and proposed the use of Generative Adversarial Networks (GAN) for data augmentation. The impact

152

Y. Bai et al.

of data augmentation, transfer learning, and optimized anchors and combinations were evaluated. Mandal et al. [7] used YOLOv4 for damage detection and studied the effects of various backbone networks. Hu et al. [16] used YOLOv5 to detect cracks and compared the model sizes, detection speeds, and accuracy of four YOLOv5 versions. Guo et al. [8] proposed to replace the backbone network of the original YOLOv5s model with a lightweight MobileNetv3 network to reduce model parameters and GFLOPs, and optimize the model using coordinate attention. Zhang et al. [17] used edge detection combined with attention mechanism for damage detection and studied the differences in effect on different datasets. Although the above damage detection algorithms have achieved good results, there are still some areas for improvement. The above methods only adopt traditional data augmentation strategies such as scaling, rotation, and random erasing. In fact, one of the simplest and most effective data preprocessing methods is to crop out the parts of the image that are not related to road damage detection. Zhang et al. [17] used fixed threshold clipping of images on their newly proposed CNRDD dataset. If image clipping is performed using the above method on datasets with multiple different perspectives, such as RDD, a large number of training samples will be lost. In addition, due to the sparse pixels taken up by damage in the annotated boxes, loss functions such as IOU, DIOU, CIOU, etc. cannot provide effective supervision. Therefore, this paper proposes a location loss function for damage detection tasks that can better reflect the learning effect of the network and improve detection performance.

3

Methods

Fig. 2. Outline of our proposed method.

Exploiting Adaptive Crop and Deformable Convolution for RDD

153

As shown in Fig. 2, the road damage detection method designed in this paper consists of three parts. In the first part, a vanishing point detection method is used to detect the vanishing point of the road in the image during the data preprocessing stage, and the image is cropped based on the vanishing point. In the second part, a C-DCNV module is introduced on the basis of the YOLOv5 model for feature extraction, which improves the effectiveness of extracting damage features. In the third part, a diagonal loss function that better fits the characteristics of road damage is used for network supervision during the supervised learning stage. Each part will be explained in detail below. 3.1

Adaptive Image Cropping Based on Vanishing Point Estimation

As mentioned in the previous text, the dataset collected using non-professional vehicle-mounted cameras contains images with various perspectives. If the original images are directly input into the neural network, a large amount of redundant information is simultaneously input, which increases the difficulty of training. Cropping the images can greatly avoid this problem. However, determining at what height to start cropping is a key issue. Therefore, this paper proposes an adaptive threshold cropping algorithm. Specifically, the Canny edge detection algorithm is used to generate the edge map of the image. The Hough transform is then used to screen out lines that meet the angle requirements from all edges in the edge map, and the intersection of the lines is calculated to obtain the vanishing point of the road in the image. The most suitable image cropping threshold can be obtained based on the height of the vanishing point. The cropping effect is shown in the Fig. 3.

Fig. 3. Visual result of adaptive image cropping based on VPE. (a) shows four raw images from the RDD2020 dataset, and (b) demonstrates their cropped images.
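The cropping step described above can be sketched with standard OpenCV primitives. The snippet below is only an illustration of the idea, assuming Canny/Hough thresholds and the accepted angle range are tuned per dataset; it is not the authors' exact implementation.

```python
import cv2
import numpy as np

def estimate_crop_height(image, angle_range=(20, 70)):
    """Estimate a cropping height from the road vanishing point via
    Canny edges + probabilistic Hough lines (illustrative thresholds)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=10)
    if lines is None:
        return 0  # no usable lines: fall back to no cropping
    # keep oblique lines whose angle suggests a road edge or lane marking
    kept = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if angle_range[0] < angle < angle_range[1]:
            kept.append((float(x1), float(y1), float(x2), float(y2)))
    # intersect line pairs; the median intersection height approximates
    # the vanishing point, which becomes the cropping threshold
    ys = []
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            y = _intersection_y(kept[i], kept[j])
            if y is not None:
                ys.append(y)
    return int(max(np.median(ys), 0)) if ys else 0

def _intersection_y(l1, l2):
    """y-coordinate of the intersection of two segments treated as infinite lines."""
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-6:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    return y1 + t * (y2 - y1)

# usage: drop everything above the estimated vanishing point
# cropped = image[estimate_crop_height(image):, :]
```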

3.2 Feature Learning with Deformable Convolution

Convolution is widely used in processing data types such as images and speech, and it is a core component of convolutional neural networks. The convolution operation multiplies the kernel weights with each local region of the input image and sums the results to obtain a new output feature map. Here, "local region" refers to a rectangular window of the same size as the convolution kernel, which slides from left to right and from top to bottom over the input image. Although convolution is an important and efficient image processing operation, the fixed shape and size of the kernel prevent it from adapting to changes of object features in different scenarios. In the road damage detection task, damages have elongated shapes, so feature extraction over fixed-size rectangular windows often fails to capture enough effective information, resulting in poor detection results. Deformable convolution offers a ready-made solution to this problem.

Fig. 4. Comparison of receptive fields between regular convolution and deformable convolution in extracting damage features. (a) refers to the receptive field of regular convolution, and (b) refers to the receptive field of deformable convolution.

As shown in Fig. 4, deformable convolution improves the flexibility of the convolution kernel by introducing an additional deformation branch that learns deformation parameters. The shape and position of the sampling locations can be adaptively adjusted according to the input image, enabling better recognition and segmentation of objects in complex scenes. We introduce deformable convolution into the feature extraction module of the YOLOv5 base model, naming the resulting module C-DCNV. Our experiments show that adding the C-DCNV module to the last two stages of the feature extraction module greatly improves detection accuracy.
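The exact layer arrangement of C-DCNV is not spelled out here, so the block below is only a minimal sketch of how a deformable-convolution stage can be assembled with torchvision: a small branch predicts per-location offsets that feed DeformConv2d.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """Illustrative deformable-convolution block (not the paper's exact C-DCNV).
    A small conv branch predicts (dx, dy) offsets for every kernel position,
    letting the main convolution sample the input off the regular grid."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.dconv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        offsets = self.offset(x)  # (B, 2*k*k, H, W)
        return self.act(self.bn(self.dconv(x, offsets)))

# quick shape check on a feature map from a late backbone stage
y = DeformableBlock(256, 256)(torch.randn(1, 256, 20, 20))
print(y.shape)  # torch.Size([1, 256, 20, 20])
```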

3.3 Diagonal Intersection over Union Loss Function

The commonly used localization loss functions in object detection are based on IoU, such as IoU Loss, GIoU Loss, DIoU Loss, and CIoU Loss. These loss functions consider factors such as the overlap or the distance between the predicted box and the ground-truth box in different ways to balance the precision and stability of predictions. However, these loss functions are not effective in road damage detection, because road damage occupies a very small proportion of the pixels within the detection box, unlike conventional objects. Therefore, this paper proposes a diagonal loss function, XIOU, for road damage detection. XIOU is formulated as follows:

XIOU = \frac{\sum_{(x,y) \in P \cap G} H(F(x,y))}{\sum_{(x,y) \in G} H(F(x,y))}    (1)

F(x,y) = \min\{ d[(x,y), L_1],\ d[(x,y), L_2] \}    (2)

where G represents the set of all pixels contained in the ground-truth box, whose mathematical form is \{(x, y) \mid x_{min} \le x \le x_{max},\ y_{min} \le y \le y_{max}\}, and P represents the set of all pixels contained in the prediction box; (x, y) denotes the coordinates of a pixel in G or P. F(x, y) is the response function that computes the basic distance from a pixel to the diagonals of G, where L_1 and L_2 are the diagonals of G and d(\cdot) is the perpendicular distance from a point to a diagonal. Because bounding boxes vary in size, the basic distance is normalized using the standard Gaussian function H(\cdot). The normalized and weighted result is then used to calculate the diagonal intersection over union according to the definition in Eq. 1. The formula is visualized in Fig. 5.
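A minimal NumPy sketch of how Eqs. (1)–(2) can be evaluated on the pixel grid of the ground-truth box follows. The Gaussian scale used for H(·) is an assumption, since the exact standardization is not specified in the text.

```python
import numpy as np

def xiou(pred_box, gt_box):
    """Sketch of the diagonal IoU in Eqs. (1)-(2). Boxes are (x1, y1, x2, y2)."""
    gx1, gy1, gx2, gy2 = gt_box
    xs, ys = np.meshgrid(np.arange(gx1, gx2 + 1), np.arange(gy1, gy2 + 1))

    # F(x, y): distance from each pixel of G to the nearer diagonal of G (Eq. 2)
    d1 = _point_to_line(xs, ys, (gx1, gy1), (gx2, gy2))
    d2 = _point_to_line(xs, ys, (gx1, gy2), (gx2, gy1))
    f = np.minimum(d1, d2)

    # H(F): Gaussian weighting of the basic distance (scale is an assumption)
    sigma = 0.5 * max(gx2 - gx1, gy2 - gy1) + 1e-6
    h = np.exp(-0.5 * (f / sigma) ** 2)

    # Eq. (1): weight covered by P inside G, normalized by the total weight of G
    px1, py1, px2, py2 = pred_box
    in_pred = (xs >= px1) & (xs <= px2) & (ys >= py1) & (ys <= py2)
    return h[in_pred].sum() / h.sum()

def _point_to_line(xs, ys, p, q):
    (x1, y1), (x2, y2) = p, q
    num = np.abs((y2 - y1) * xs - (x2 - x1) * ys + x2 * y1 - y2 * x1)
    return num / (np.hypot(x2 - x1, y2 - y1) + 1e-6)
```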

Fig. 5. Visualization of XIOU. The red portion in the image represents the weighted distribution used by XIOU. (Best viewed in color) (Color figure online)

As shown in Fig. 5, the green box represents the annotated bounding box, while the blue and purple boxes are two deliberately constructed prediction boxes. The two prediction boxes have equal IoU with the annotated box and the same distance from its center point. Under a traditional IoU-based loss, their loss values are approximately equal, or slightly higher for the blue box due to the difference in aspect ratio. In fact, however, the blue box covers more of the damaged area. When the proposed XIOU is used for loss calculation, the weight covered by the blue box is much greater than that covered by the purple box. Therefore, by combining XIOU with other IoU-based losses for road damage detection, we improve the detection performance.

4 Experiment

The experiments in this paper were conducted using one Tesla A100. The models were trained for 200 epochs on the RDD2020 dataset and CNRDD dataset, respectively. The experimental hyperparameter details are as follows.

Table 1. Implementation details of training.

Param        Value    Param           Value
lr_i         0.01     lr_f            0.1
momentum     0.937    weight decay    0.0005
box loss     0.05     cls loss        0.3
obj loss     0.7      scale           0.9
mosaic       1        mixup           0.1
batch size   64       epoch           200

In Table 1, lr_i and lr_f represent the learning rates for different training stages. Box loss, cls loss, and obj loss represent the weights for the localization loss, classification loss, and confidence loss, respectively. Mosaic and mixup augmentation are enabled during training.

4.1 Comparative Analysis of Different Datasets

The RDD2020 dataset is currently the most important public dataset for road damage detection. It provides two test sets without ground truth and one training set containing four types of road damage labels (longitudinal cracks, transverse cracks, alligator cracks, and potholes). The CNRDD dataset is a Chinese road damage dataset released in 2022; its sample density is higher and its classification is more detailed. We conducted experimental analyses of various state-of-the-art algorithms on both the RDD2020 and CNRDD datasets. The experimental results are shown in Table 2 and Table 3.

Table 2. Performance comparison on RDD2020 dataset.

Methods           Test1 Score   Test2 Score
titan mu [7]      0.581         0.575
RICS [18]         0.565         0.547
AIRS-CSR [6]      0.554         0.541
CS17 [3]          0.541         0.543
AFSLF [17]        0.614         0.608
MN-YOLOv5 [8]     0.618         0.609
Ours              0.621         0.626

Table 3. Performance comparison on CNRDD dataset.

Methods            mAP50   F1-Score
YOLO v5 [10]       24.8    32.25
Faster R-CNN [9]   23.8    24.1
fcos [1]           18.4    21.41
Dongjuns [19]      25.2    22.67
AFSLF [17]         26.6    34.73
Ours               32.7    37.1

In Table 2, Test1 Score and Test2 Score denote the F1-Scores on the two test sets of RDD2020. All comparative data come from the evaluation system of the Global Road Damage Detection Challenge (GRDDC) based on RDD2020. As shown in Table 2, our proposed method achieves a higher Test1 Score and Test2 Score, by 0.8% and 1.2% respectively, compared to MN-YOLOv5. This demonstrates that the XIOU loss introduced in this paper effectively improves the network's ability to learn damage detection under a similar strategy. In Table 3, the results for YOLO v5 [10], Faster R-CNN [9], fcos [1], and Dongjuns [19] were obtained by retraining with open-source code on the CNRDD dataset; each value is the average of 10 repeated runs. Comparing the results in Table 3, the proposed method achieves a significant performance improvement on the CNRDD dataset. After introducing the Adaptive Cropping module, the C-DCNV module, and the XIOU loss, our method improves mAP50 by 6.1% over the previous state-of-the-art. Furthermore, compared to the original YOLO v5, there is a substantial increase of 7.9% in mAP50 and 4.8% in F1-Score.

4.2 Ablation Analysis

In order to analyze the effects of adaptive threshold cropping, C-DCNV module, and using XIOU loss for supervised learning on road damage detection, we also conducted ablation studies on the CNRDD test set and the simulated RDD2020 test set. The simulated test set consists of 20% randomly selected data from the RDD2020 training set. YOLO v5 [10] serves as the baseline for the proposed method. Improved components are sequentially added to the baseline, and the experimental results are shown in Table 4.

Table 4. Evaluation of the effectiveness of each component.

                                 CNRDD               RDD2020
                                 mAP50   F1-Score    mAP50   F1-Score
Baseline                         24.8    32.2        54.8    57.3
+Adaptive Crop                   27.2    34.5        55.7    59.6
+Adaptive Crop+C-DCNV            30.4    35.6        57.4    62.2
+Adaptive Crop+C-DCNV+XIOU       32.7    37.1        58.1    63.1

Compared to the baseline in Table 4, Our proposed method exhibits significant improvements on the RDD2020 and CNRDD datasets. Particularly, on the CNRDD dataset, we observe an increase of 7.9% in mAP50 and 4.9% in F1-Score. The table reveals that all three components of our proposed method contribute to enhancing the results to a certain extent. Specifically, the introduction of the C-DCNV module yields the greatest improvement. This highlights the effectiveness of deformable convolutions in accurately extracting features from irregularly shaped damages.

Fig. 6. Comparison between our proposed method and other methods.


Figure 6 shows, from left to right, the annotated training samples, the prediction results obtained by training the original YOLOv5 directly, and the prediction results obtained by our proposed method. It can be observed that our method significantly improves the detection of road damage.

5 Conclusion

This paper proposes a novel road damage detection algorithm from the perspectives of data preprocessing, feature learning, and loss design. It first introduces VPE to adaptively crop the raw image, reducing the interference of redundant background information on road damage detection. Secondly, it introduces the DCN to dynamically adjust the receptive field, which enables more accurate extraction of damage features through learnable offset parameters. In addition, a new loss function named XIOU is designed to optimize the position and size of the prediction box. On the challenging CNRDD and RDD datasets, the proposed algorithm achieves significant performance improvements. Building on this, we will further study the generalization ability and efficiency of road damage detection, aiming at algorithms that can be applied to practical road maintenance as soon as possible.

Acknowledgments. This work was supported by the National Natural Science Foundation of China under Grant No. 62002247 and the general project numbered KM202110028009 of the Beijing Municipal Education Commission.

References 1. Yang, X., Li, H., Yu, Y., Luo, X., Huang, T., Yang, X.: Automatic pixel-level crack detection and measurement using fully convolutional network. Comput. Civ. Infrastruct. Eng. 33, 1090–1109 (2018) 2. Yu, G., Dong, J., Wang, Y., Zhou, X.: RUC-Net: a residual-Unet-based convolutional neural network for pixel-level pavement crack segmentation. Sensors 23, 53 (2023) 3. Hascoet, T., Zhang, Y., Persch, A., Takashima, R., Takiguchi, T., Ariki, Y.: FasterRCNN monitoring of road damages: competition and deployment. In: Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020, pp. 5545–5552 (2020) 4. Vishwakarma, R., Vennelakanti, R.: CNN model tuning for global road damage detection. In: Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020, pp. 5609–5615 (2020) 5. Pei, Z., Lin, R., Zhang, X., Shen, H., Tang, J., Yang, Y.: CFM: a consistency filtering mechanism for road damage detection. In: Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020, pp. 5584–5591 (2020) 6. Zhang, X., Xia, X., Li, N., Lin, M., Song, J., Ding, N.: Exploring the tricks for road damage detection with a one-stage detector. In: Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020, pp. 5616–5621 (2020)


7. Mandal, V., Mussah, A.R., Adu-Gyamfifi, Y.: Deep learning frameworks for pavement distress classifification: a comparative analysis. In: Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020, pp. 5577–5583 (2020) 8. Guo, G., Zhang, Z.: Road damage detection algorithm for improved YOLOv5. Sci. Rep. 12, 15523 (2022) 9. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2017) 10. Available online: https://github.com/ultralytics/yolov5. Accessed 5 Mar 2023 11. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018, pp. 6154–6162 (2018) 12. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0 2 13. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv: arXiv:2004.10934 (2020) 14. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv: arXiv:2207.02696 (2022) 15. Tan, M., Pang, R., Le, Q.V.: EffificientDet: scalable and effificient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020, pp. 10778–10787 (2020) 16. Hu, G.X., Hu, B.L., Yang, Z., Huang, L., Li, P.: Pavement crack detection method based on deep learning models. Wirel. Commun. Mob. Comput. 2021, 5573590 (2021) 17. Zhang, H., et al.: A new road damage detection baseline with attention learning. Appl. Sci. 12, 7594 (2022) 18. Naddaf-Sh, S., Naddaf-Sh, M.M., Zargarzadeh, H., Kashanipour, A.R.: An efficient and scalable deep learning approach for road damage detection. arXiv: arXiv:2011.09577 (2020) 19. Jeong, D.: Road damage detection using YOLO with smartphone images. In: Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020, pp. 5559–5562 (2020)

Cascaded-Scoring Tracklet Matching for Multi-object Tracking Yixian Xie, Hanzi Wang, and Yang Lu(B) Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen, China [email protected], {hanzi.wang,luyang}@xmu.edu.cn

Abstract. Multi-object tracking (MOT) aims at locating the object of interest in a successive video sequence and associating the same moving object frame by frame. Most existing approaches to MOT lack the integration of both motion and appearance information, which limits the effectiveness of tracklet association. The conventional approaches for tracklet association often struggle when dealing with scenarios involving multiple objects with indistinguishable appearances and irregular motions, leading to suboptimal performance. In this paper, we introduce an appearance-assisted feature warper (AFW) module and a motion-guided based target aware (MTA) module to efficiently utilize the appearance and motion information. Additionally, we introduce a cascaded-scoring tracklet matching (CSTM) strategy that seamlessly integrates the two modules, combining appearance features with motion information. Our proposed online MOT tracker is called CSTMTrack. Through extensive quantitative and qualitative results, we demonstrate that our tracker achieves efficient and favorable performance compared to several other state-of-the-art trackers on the MOTChallenge benchmark.

Keywords: Multiple Object Tracking · Tracklet Association · Detection · Cascaded Matching

1 Introduction

Multi-object tracking (MOT) tries to predict the tracklet of multiple specific objects frame by frame in a video sequence. It plays an essential role in numerous computer vision applications, such as autonomous vehicles [34], visual surveillance [22], video analysis [10], and more. As MOT involves perceiving the surrounding environment, applying it to autonomous driving can provide greater safety guarantees for the vehicle in terms of safe operation. In the domain of autonomous driving, the goal of MOT, whether in 3D or 2D, is to empower autonomous vehicles with the ability to precisely track objects in their surroundings, providing vital assistance in vehicle perception and decision-making processes. Although 3D MOT methods [19,31] offer superior capabilities for environment perception in autonomous driving scenarios, they also necessitate c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 161–173, 2024. https://doi.org/10.1007/978-981-99-8549-4_14


higher computing resources and sophisticated sensor data fusion technologies. Hence, our focus is on exploring 2D MOT [3,4], which encounters numerous challenges, including occlusion in densely populated scenes, objects with similar appearances, geometric variations, irregular motion patterns, and more. Currently, most state-of-the-art MOT approaches are largely categorized into two paradigms that are tracking by detection and joint detection and tracking. The tracking by detection paradigm [2,8,9,24] breaks down the MOT problem into two distinct subtasks: object detection and tracklet association. Benefiting from the enhancement of object detection techniques, this paradigm is favored by most researchers due to its ability to achieve good tracking performance. Nevertheless, the tracking by detection paradigm treats object detection and tracklet association as separate tasks, which leads to a long inference time. As multi-task learning has made significant strides, the joint detection and tracking paradigm [7,11,14,30] has garnered more attention. Joint detection and tracking trackers have been proposed to integrate object detection and tracklet association tasks within a unified tracking network by sharing most of the common parameters and computations. Several approaches applying the joint detection and tracking paradigm achieve an outstanding balance between speed and accuracy. However, we argue that there are two primary limitations when employing the joint detection and tracking paradigm for MOT: (1) excessive competition among different modules and (2) variations in the scopes of concern for different subtasks, whether they are focused on local details or global relationships. Even though several refinements have been made in recent approaches to address the aforementioned issues, the practical tracking accuracy induced by this paradigm is still limited. To avoid this issue, we reconsider the tracking by detection paradigm, which was once popular and underappreciated. For the recent tracking by detection research, the Observation-Centric Simple Online and Realtime Tracking (OC-SORT) [3] is a motion-based SORT [2] tracker, which only exploits the motion feature to dispose of tracklet association. However, this approach has limitations when dealing with low frame rate video tracking tasks, especially when handling indistinguishable appearances and irregular motions. In this work, we present a simple yet powerful online tracking-by-detection tracker, named CSTMTrack, built upon OC-SORT tracker. Our proposed CSTMTrack tackles this issue by implementing a cascadedscoring tracklet matching (CSTM) strategy, incorporating an appearanceassisted feature warper (AFW) module and a motion-guided target-aware (MTA) module, to effectively track multiple objects in challenging scenarios such as background clutter, nonlinear motions, and geometric transformations. The AFW module establishes storage to save the cost vector of similarity matching between ID embeddings of detection bounding boxes and tracking targets, which can be used to aid inference on tracking offsets and better suppress background to enhance object discrimination in complex scenarios. In the MTA module, a novel IoU strategy is proposed, named EM-IoU, which enlarges the receptive field of detection bounding boxes when computing the intersection over union


between two detection bounding boxes. Moreover, the MTA module adds a mask branch to the EM-IoU when the score of the candidate bounding box is low. This module is capable of deeply alleviating the challenges encountered when tracking in irregular motion scenarios.

Fig. 1. The framework of our CSTMTrack. It consists of a detection and an embedding network. The input video frame is fed into the detection network to generate multiple bounding boxes, which are predicted to obtain the corresponding bounding box for the next frame. Meanwhile, the embedding network produces candidate embeddings. Ultimately, through the cascaded-scoring tracklet matching strategy, the MTA and AFW modules facilitate the association between detected bounding boxes and tracklets.

Our contributions are summarized as follows:
– We introduce a novel CSTM strategy that combines ID embedding and motion information using a cascaded matching approach, enabling more robust associations between tracklets.
– We propose two lightweight and plug-and-play modules to obtain more accurate associations in tracking multiple objects with indistinguishable appearances and irregular motions: the AFW and MTA modules.
– Experiments show that our CSTMTrack achieves efficient and favorable performance against state-of-the-art trackers on the MOT17 and MOT20 benchmarks.

2 Related Work

2.1 Tracking by Detection

The tracking by detection paradigm [2,9,24,32] disentangles the MOT problem into two separate subtasks: The detection task first estimates the location of candidate bounding boxes in each frame by a detector, and then the tracklet association task uses feature information of motion or appearance to associate detection bounding boxes and tracklets. Numerous state-of-the-art multiple object trackers (such as [2,8,24]) have primarily focused on improving tracking


accuracy rather than inference speed. For example, SORT [2] and DeepSORT [24] are typical tracking-by-detection methods, and DeepSORT is an improvement on SORT. Both approaches use the Kalman filter to predict and update the motion of objects and the Hungarian algorithm to solve the assignment problem between objects. STRN [26] introduced a similarity learning network that establishes connections between tracklets and objects by encoding various spatio-temporal relationships. Although its tracking results demonstrate top-tier performance, the model's complexity and computational demands leave room for improvement. The UMA [28] tracker uses the Siamese network from single object tracking to obtain appearance features and motion information, and then performs online matching and association. The advantage of two-step trackers is that they can design the most appropriate model for each phase. However, the two-step approach can significantly reduce the inference speed of multi-object tracking, because it employs separate, computationally expensive cues.

2.2 Joint Detection and Tracking

The tracking by detection method results in slow inference speed due to separating object detection and feature extraction. In addition, the development of multi-task learning has been improved in deep learning. Therefore, the method of joint detection and tracking has attracted more and more attention, which has been proposed to integrate the detection and Re-ID task into a unified network by sharing most of the common parameters and computation. For example, FairMOT [33] not only uses a one-stage method to balance tracking speed and accuracy but also uses a multi-layer feature aggregation to alleviate a problem, which is detection and ID embedding requiring different feature dimensions. CenterTrack [35] localizes objects by tracking-conditioned detection and uses the detection results of the previous frame in the object motion estimation of the current frame. RMOT [25] uses language expressions as semantic cues to guide prediction of multi-object tracking. CSTrack [9] based on the one-stage method proposes a cross-correlation network and scale-aware attention network to improve the cooperative learning ability of detection and Re-ID. Although the inference speed of the joint detection and tracking method is substantially higher than the tracking by detection method, its tracking accuracy is correspondingly lower. This is because the learned ID embedding is not optimal, which leads to a large number of ID switches. The key to solving this problem is to balance the contradiction between detection and Re-ID.

3 Proposed Method

In this section, we present a concrete introduction to our CSTMTrack, whose architecture is illustrated in Fig. 1. We first describe in detail the cascadedscoring tracklet matching (CSTM) strategy that uses a cascading approach to combine appearance and motion information to tracklet association. Then,


the motion-guided based target aware (MTA) and appearance-assisted feature warper (AFW) modules are detailed afterward; they mainly address tracking scenarios with irregular motions and indistinguishable appearances.

3.1 Cascaded-Scoring Tracklet Matching

The cascaded matching strategy [5,9,24,32] is a widely employed approach in the field of multi-object tracking (MOT), wherein samples that are easier to associate correctly are matched first, followed by the matching of more challenging samples. In order to tackle the challenge of indistinguishable appearances, we propose a cascaded-scoring tracklet matching (CSTM) strategy that incorporates ID embedding and motion prediction information in two steps to optimize tracklet matching, instead of relying solely on cross-frame consistency measurement for tracklet association. As illustrated in Fig. 1, the input video frames X_t and X_{t+1} are processed through the detection and embedding networks, respectively. The frame X_t, once passed through the detection network, yields several bounding boxes denoted by D_t^i. These bounding boxes are then projected to predict their corresponding bounding boxes P_{t+1}^i for the subsequent frame. Concurrently, the embedding network generates candidate embeddings represented by E_{t+1}^{id}. Finally, our CSTM strategy further refines the outputs from the motion-guided target aware (MTA) and appearance-assisted feature warper (AFW) modules to derive the final detection bounding boxes and tracklets. In our tracker, we integrate appearance information using the CSTM strategy. Additionally, we utilize the MTA module to calculate a motion cost matrix. To effectively fuse appearance and motion information, we combine two suitable metric matrices, namely the IoU (Intersection over Union) matrix and the cosine matrix. The total cost matrix C_t is computed as a weighted sum of the appearance cost matrix A_t and the motion cost matrix M_t:

C_t = \alpha A_t + (1 - \alpha) M_t,    (1)

where α denotes the weight of the appearance cost matrix. In particular, considering that appearance features may be unreliable for crowded or blurry objects, we decrease the weight factor α when the detection confidence is low in order to maintain a correct feature vector. Consequently, in our CSTM strategy, we fuse the ID embedding and motion prediction information in two stages. Initially, we prioritize the matching of detection bounding boxes and tracklets with a higher weight on the appearance cost matrix A_t when the IoU scores fall below a certain threshold; this helps mitigate the association error caused by irregular motion of the target. Subsequently, we fuse the ID embedding similarity matrix with a lower weight into the motion cost matrix M_t.
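A minimal sketch of the weighted fusion in Eq. (1) followed by the usual Hungarian assignment is shown below. The confidence-dependent down-weighting of α is simplified to a single threshold, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_and_match(app_cost, motion_cost, det_scores,
                   alpha=0.5, low_conf=0.6, alpha_low=0.2, max_cost=0.9):
    """Fuse appearance and motion cost matrices (rows: detections, cols: tracklets)
    as C = alpha * A + (1 - alpha) * M, lowering alpha for low-confidence detections,
    then solve the assignment with the Hungarian algorithm."""
    a = np.where(det_scores[:, None] < low_conf, alpha_low, alpha)
    cost = a * app_cost + (1.0 - a) * motion_cost  # Eq. (1)

    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]

# toy example: 3 detections vs. 2 tracklets
A = np.random.rand(3, 2)   # appearance (cosine-distance) costs in [0, 1]
M = np.random.rand(3, 2)   # motion (1 - IoU) costs in [0, 1]
print(fuse_and_match(A, M, det_scores=np.array([0.9, 0.5, 0.8])))
```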

3.2 Motion-Guided Based Target Aware

When the tracked object exhibits irregular motion, the prediction error of the Kalman filter tends to increase. To mitigate this issue, we introduce the Motion-Guided Target Aware (MTA) module, which incorporates a novel IoU strategy and a straightforward mask branch. The novel IoU strategy, referred to as the Expanding Motion-Guided IoU (EM-IoU) strategy, is employed to expand the detection bounding boxes and enhance their overlap with the target's actual position. For the EM-IoU strategy, we expand the receptive field of detection bounding boxes by a factor σ while keeping the same center position and scale ratio for IoU matching. The strategy enlarges the matching space between detection and tracklet to better measure geometric consistency [3,24,32,33]. Let o = (x_1, y_1, x_2, y_2) denote the top-left and bottom-right coordinates of an original detection bounding box. We then use the expanded coordinates õ = (x_1 − σ(x_2 − x_1)/2, y_1 − σ(y_2 − y_1)/2, x_2 + σ(x_2 − x_1)/2, y_2 + σ(y_2 − y_1)/2) of the target box to compute the IoU between two detection bounding boxes. Furthermore, we incorporate a mask branch to determine the admissibility of detections for tracklet association. Whenever a detection bounding box score falls below a specific threshold, it is considered an unreliable detection, and the mask branch uses a binary variable matrix to down-weight the final association score for such low-scoring detections. In conclusion, the MTA module effectively addresses the challenges encountered during tracking in scenarios involving irregular motion patterns.
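The expansion step of EM-IoU can be sketched as follows. Whether both boxes or only the detections are expanded, the value of σ, and the score threshold of the mask branch are assumptions made for illustration.

```python
def expand_box(box, sigma):
    """Enlarge a (x1, y1, x2, y2) box by a factor sigma around its center,
    keeping the same center and aspect ratio, as in the EM-IoU strategy."""
    x1, y1, x2, y2 = box
    dw, dh = sigma * (x2 - x1) / 2.0, sigma * (y2 - y1) / 2.0
    return (x1 - dw, y1 - dh, x2 + dw, y2 + dh)

def em_iou(det, trk, sigma=0.3, det_score=1.0, score_thr=0.5):
    """IoU between expanded boxes; low-scoring detections are zeroed out,
    mimicking the mask branch in simplified form."""
    a, b = expand_box(det, sigma), expand_box(trk, sigma)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    iou = inter / (area(a) + area(b) - inter + 1e-9)
    return iou if det_score >= score_thr else 0.0

print(em_iou((100, 100, 140, 220), (110, 105, 150, 230)))
```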

3.3 Appearance-Assisted Feature Warper

The joint detection and tracking paradigm has two major limitations. One of the limitations is that different components require different parameters: In the joint detection and tracking trackers, the information required by different components is obtained through shared feature vector parameters, which overlooks the inherent differences between detection and association tasks. In this way, it leads to excessive competition among different components. The other limitation is that different tasks require different scopes of concern: The joint detection and tracking trackers typically add a parallel branch after the backbone network of the detector for obtaining identity (ID) embedding information [13,15,23]. While these frameworks effectively mitigate the computational complexity, it disregards an essential difference between tracking and re-identification tasks. The difference is that the tracking task requires matching samples in the local domain, whereas the re-identification task requires searching and matching in the global domain. To avoid these issues, we adopt the tracking paradigm of tracking by detection, which divides the detection and extraction of the ID embedding into two separate models for higher performance. When the tracked objects with irregular motion, the Kalman filter predictions tend to have higher uncertainty regarding the object’s location. This uncertainty


This leads to a wider spread of probability mass in the state space and a smaller peak in the observed likelihood. To enhance object tracking in such scenarios, we introduce the Appearance-Assisted Feature Warper (AFW) module, which incorporates an embedding network to compute appearance features for tracking multiple objects. To strike a balance between speed and accuracy while reducing the computational complexity of the embedding network, we choose to train it offline to achieve simplicity and good discrimination. The structure of this embedding network is illustrated in Table 1. The objective of the AFW module is to utilize the ID embedding features E_{t+1}^{id} as appearance clues to warp and propagate the predicted tracking offset O^{id}. We compute a cost vector matrix to measure the similarity matching between different ID embeddings. For each tracklet k, we maintain a gallery E_k = \{e_k^{(i)}\}_{i=1}^{N} consisting of N ID embedding descriptors. The similarity matching value in the appearance space between the i-th tracklet and the j-th detection is computed as follows:

A_t(i, j) = \min\{\, 1 - e_j^{T} e_k^{(i)} \mid e_k^{(i)} \in E_i \,\},    (2)
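Eq. (2) amounts to taking, for each tracklet, the smallest cosine distance between a detection embedding and the tracklet's stored gallery. A minimal sketch, assuming L2-normalized embeddings:

```python
import numpy as np

def appearance_cost(det_embs, galleries):
    """Build A_t from Eq. (2). det_embs is a (D, d) array of normalized detection
    embeddings; galleries is a list of (N_i, d) arrays, one gallery per tracklet.
    A_t[i, j] = min over the i-th gallery of (1 - e_j . e_k)."""
    cost = np.ones((len(galleries), det_embs.shape[0]), dtype=np.float32)
    for i, gal in enumerate(galleries):
        if len(gal):
            cost[i] = (1.0 - gal @ det_embs.T).min(axis=0)
    return cost
```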


We use the AFW module to facilitate the inference of tracking offsets by combining appearance information with motion features. This integration enables us to effectively suppress background interference and enhance the ability to distinguish objects in complex scenarios.

Table 1. Overview of the ID embedding network architecture.

Name               Input/Stride   Output
Conv 1             3 × 3/1        32 × 128 × 64
Conv 2             3 × 3/1        32 × 128 × 64
Max Pool           3 × 3/2        32 × 64 × 32
Residual 1         3 × 3/1        32 × 64 × 32
Residual 2         3 × 3/1        32 × 64 × 32
Residual 3         3 × 3/2        64 × 32 × 16
Residual 4         3 × 3/1        64 × 32 × 16
Residual 5         3 × 3/2        128 × 16 × 8
Residual 6         3 × 3/1        128 × 16 × 8
Dense              –              128
L2 normalization   –              128

4 Experiments

4.1 Experimental Setup

Datasets. We choose the MOT17 and MOT20 datasets from the MOTChallenge benchmark for our experiments. They are widely used for evaluating multi-object tracking algorithms. The MOT17 dataset focuses on pedestrian tracking and involves various challenging scenarios, such as occlusion caused by crowds, blur caused by camera motion, and motion prediction difficulties caused by nonlinear motion. The MOT20 dataset is oriented towards extremely dense crowd scenarios, with an average of 246 pedestrians per frame, and is an extension of the MOT17 dataset. Evaluation Metrics. To evaluate and compare MOT approaches, we adopt the commonly used CLEAR metrics, including Multiple-Object Tracking Accuracy (MOTA), False Negative (FN), False Positive (FP), Mostly Tracked (MT), Mostly Lost (ML), and IDF1. Among them, MOTA is the most widely used metric, as it jointly accounts for FP, IDSW, and FN. Implementation Details. To conduct a fair comparison, we use the experimental setup of the OC-SORT tracker. In terms of data, we train our tracker with the CrowdHuman dataset and half of the MOT17 training set, and we use the other half of the MOT17 training set as the validation set. When testing on the MOT17 test set, we add the Cityperson and ETHZ datasets for a new round of training to obtain better tracking performance. To realize our cascaded-scoring tracklet matching, we conduct a grid search [1] on the training set to find the optimal combination of weight settings for motion and appearance features in the two stages. Following the OC-SORT tracker, we use linear interpolation to boost the online output.
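For reference, the MOTA score used above follows the standard CLEAR definition (this is the conventional formula, not something introduced by this tracker), summed over all frames t:

MOTA = 1 - \frac{\sum_t (FN_t + FP_t + IDSW_t)}{\sum_t GT_t},

where GT_t is the number of ground-truth objects in frame t.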

4.2 Ablation Studies

To perform ablation studies optimally, we train our tracker using the CrowdHuman dataset and half of the MOT17 training set, and the other half of the MOT17 training set is used to evaluate the effectiveness of the AFW and MTA modules. Influence of the AFW Module. Under the assumption that other parameters remain unchanged, we introduce the AFW module to the baseline tracker in order to assess its effectiveness. The results, as depicted in Table 2, demonstrate notable improvements in terms of MOTA and FN metrics with the inclusion of the AFW module in CSTMTrack. Particularly, MOTA shows a significant increase from 74.7% to 75.1%. Additionally, FN experiences a slight reduction from 3.14 to 3.06. Overall, the AFW module delivers decent performance, although there is a slight decrease in IDF1 and a minor increase in IDs. From the results presented above, it is evident that the AFW module effectively mitigates background interference and reduces instances of missed detection. Furthermore, it enhances the capability to discern objects within complex scenarios.


Influence of the MTA Module. As shown in Table 2, it is evident that the MTA module plays a crucial role in enhancing the performance of our CSTMTrack. The inclusion of the MTA module in our tracker results in higher MOTA and IDF1 scores compared to the baseline tracker. In particular, MOTA increases from 74.7% to 75.1%, and IDF1 improves from 76.5% to 77.6%. Additionally, both the number of IDs and FN decrease to some extent. These results clearly demonstrate that the MTA module effectively addresses concerns such as trajectory loss and ID switching, particularly in complex scenarios.

Table 2. Ablation study on the MOT17 val set to evaluate the proposed AFW and MTA modules. The best results are shown in bold.

AFW   MTA   MOTA↑   IDF1↑   IDs↓   FN(10^4)↓
            74.7    76.5    846    3.14
 ✓          75.1    75.8    864    3.06
       ✓    75.0    77.6    795    3.09
 ✓     ✓    75.1    77.5    780    3.09

With the AFW and MTA modules, our CSTMTrack improves MOTA by 0.4% and IDF1 by 1.0% and decreases IDs and FN compared with the baseline tracker on the MOT17 val set. The results affirm that both the AFW and MTA modules markedly enhance tracking performance.

Table 3. Comparison with state-of-the-art methods on the MOT17 test set with private detections. The best results are shown in bold.

Tracker             MOTA↑   IDF1↑   AssR↑   FN(10^4)↓
FairMOT [33]        73.7    72.3    63.6    11.7
GRTU [20]           74.9    75.0    65.8    10.8
TransTrk [17]       75.2    63.5    57.1    8.64
CSTrack [9]         74.9    72.6    63.2    11.4
QDTrack [12]        68.7    66.3    57.2    14.66
ReMOT [27]          77.0    72.0    61.7    9.36
PermaTr [18]        73.8    68.9    59.8    11.5
OC-SORT [3]         78.0    77.5    67.5    10.8
CSTMTrack (Ours)    79.7    77.3    67.7    8.49

Table 4. Comparison with state-of-the-art methods on the MOT20 test set with private detections. The best results are shown in bold.

Tracker             MOTA↑   IDF1↑   AssR↑   FN(10^4)↓
FairMOT [33]        61.8    67.3    60.7    8.89
GSDT [21]           67.1    67.5    52.7    13.5
TransTrk [17]       65.0    59.4    51.9    15.0
CSTrack [9]         66.6    68.6    57.6    14.4
RelationT [29]      67.2    70.5    66.1    10.5
MAA [16]            73.9    71.2    61.1    10.9
OC-SORT [3]         75.5    75.9    67.5    10.8
CSTMTrack (Ours)    75.9    73.9    65.9    10.3

4.3 Comparison with State-of-the-Art Methods

We conducted a comprehensive comparison of our proposed CSTMTrack with several state-of-the-art trackers on the MOTChallenge benchmark. It is important to note that all experimental results for these trackers utilize private detection. In order to ensure a fair comparison, we adopted equivalent experimental settings and training strategies as the baseline tracker OC-SORT. These include using the same detection network (e.g., YOLOX [6]), applying preprocessing techniques on input images, and setting the score threshold for the detection component, among other factors. In order to achieve improved tracking performance, we incorporated the Cityperson and ETHZ datasets into our training process when evaluating the MOT17 and MOT20 test sets. The results, as shown in Table 3, highlight the superior performance of our CSTMTrack compared to several state-of-the-art trackers in terms of MOTA, AssR, and FN. Notably, our tracker exhibits a significant improvement in MOTA, surpassing the OC-SORT tracker by a relative gain of 1.7%. While our CSTMTrack may not achieve the best results in terms of IDF1 and AssR (as shown in Table 4), it consistently outperforms other trackers in terms of MOTA and FN scores. This also implies the effectiveness of our cascaded-scoring tracklet matching (CSTM) strategy for tracklet association. In addition to the tracking performance discussed above, we are also focused on the tracking speed of the algorithm. As different methods might have varying runtime environments, it is challenging to directly compare them. Therefore, we only conduct a rough comparison with the speed of OC-SORT. Our CSTMTracke achieves speeds of 355.1 Hz on MOT17 and 89.6 Hz on MOT20, while the OC-SORT tracker achieves only 29.0 Hz and 18.7 Hz, respectively. In general, experimental results tested on MOT17 and MOT20 show that our CSTM strategy improves both MOTA and FN metrics, illustrating the effectiveness of our tracker in tracking complex scenes. Moreover, from the comparison results of speed, our method successfully strikes a balance between speed and precision.

5 Conclusion

In this paper, we first analyze the pros and cons of two tracking paradigms, tracking by detection and joint detection and tracking. Then, we propose a simple yet robust online tracking by detection tracker, called CSTMTrack, which addresses the challenges of MOT with irregular motions and indistinguishable appearances through the cascaded-scoring tracklet matching strategy. Our proposed motion-guided based target aware and appearance-assisted feature warper modules are lightweight and can be simply plugged into other trackers to advance tracking performance. Moreover, this approach can be effectively applied to the scenario of autonomous driving. Experimental results demonstrate the effectiveness and efficiency of our CSTMTrack tracker, which achieves state-of-the-art performance on the MOT17 and MOT20 benchmarks. Acknowledgements. This work was supported by the National Natural Science Foundation of China under Grant U21A20514, 62002302, by the FuXiaQuan National Independent Innovation Demonstration Zone Collaborative Innovation Platform Project under Grant 3502ZCQXT2022008, and by the China Fundamental Research Funds for the Central Universities under Grants 20720230038.

References 1. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(2) (2012) 2. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: ICIP, pp. 3464–3468 (2016) 3. Cao, J., Pang, J., Weng, X., Khirodkar, R., Kitani, K.: Observation-centric sort: rethinking sort for robust multi-object tracking. In: CVPR, pp. 9686–9696 (2023) 4. Cheng, Y., Li, L., Xu, Y., Li, X., Yang, Z., Wang, W., Yang, Y.: Segment and track anything. arXiv preprint arXiv:2305.06558 (2023) 5. Gao, P., Ma, Y., Yuan, R., Xiao, L., Wang, F.: Learning cascaded siamese networks for high performance visual tracking. In: ICIP, pp. 3078–3082 (2019) 6. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021) 7. Guo, S., Wang, J., Wang, X., Tao, D.: Online multiple object tracking with crosstask synergy. In: CVPR, pp. 8136–8145 (2021) 8. Huang, K., Chu, J., Qin, P.: Two-stage object tracking based on similarity measurement for fused features of positive and negative samples. In: PRCV, pp. 621–632 (2022) 9. Liang, C., Zhang, Z., Zhou, X., Li, B., Zhu, S., Hu, W.: Rethinking the competition between detection and reid in multiobject tracking. IEEE Trans. Image Process. 31, 3182–3196 (2022) 10. Liu, Y., Yang, W., Yu, H., Feng, L., Kong, Y., Liu, S.: Background suppressed and motion enhanced network for weakly supervised video anomaly detection. In: PRCV, pp. 678–690 (2022) 11. Pang, B., Li, Y., Zhang, Y., Li, M., Lu, C.: Tubetk: adopting tubes to track multiobject in a one-step training model. In: CVPR, pp. 6308–6318 (2020)


12. Pang, J., et al.: Quasi-dense similarity learning for multiple object tracking. In: CVPR, pp. 164–173 (2021) 13. Peng, J., et al.: Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 145–161. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8 9 14. Saleh, F., Aliakbarian, S., Rezatofighi, H., Salzmann, M., Gould, S.: Probabilistic tracklet scoring and inpainting for multiple object tracking. In: CVPR, pp. 14329– 14339 (2021) 15. Sommer, L., Kr¨ uger, W.: Usage of vehicle re-identification models for improved persistent multiple object tracking in wide area motion imagery. In: ICIP, pp. 331–335 (2022) 16. Stadler, D., Beyerer, J.: Modelling ambiguous assignments for multi-person tracking in crowds. In: WACV, pp. 133–142 (2022) 17. Sun, P., et al.: Transtrack: multiple object tracking with transformer. arXiv preprint arXiv:2012.15460 (2020) 18. Tokmakov, P., Li, J., Burgard, W., Gaidon, A.: Learning to track with object permanence. In: ICCV, pp. 10860–10869 (2021) 19. Wang, L., Hui, L., Xie, J.: Facilitating 3D object tracking in point clouds with image semantics and geometry. In: PRCV, pp. 589–601 (2021) 20. Wang, S., Sheng, H., Zhang, Y., Wu, Y., Xiong, Z.: A general recurrent tracking framework without real data. In: ICCV, pp. 13219–13228 (2021) 21. Wang, Y., Kitani, K., Weng, X.: Joint object detection and multi-object tracking with graph neural networks. In: ICRA, pp. 13708–13715 (2021) 22. Wang, Y., Li, C., Tang, J.: Learning soft-consistent correlation filters for RGB-T object tracking. In: PRCV, pp. 295–306 (2018) 23. Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/ 978-3-030-58621-8 7 24. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: ICIP, pp. 3645–3649 (2017) 25. Wu, D., Han, W., Wang, T., Dong, X., Zhang, X., Shen, J.: Referring multi-object tracking. In: CVPR, pp. 14633–14642 (2023) 26. Xu, J., Cao, Y., Zhang, Z., Hu, H.: Spatial-temporal relation networks for multiobject tracking. In: ICCV, pp. 3988–3998 (2019) 27. Yang, F., Chang, X., Sakti, S., Wu, Y., Nakamura, S.: Remot: a model-agnostic refinement for multiple object tracking. Image Vis. Comput. 106, 104091 (2021) 28. Yin, J., Wang, W., Meng, Q., Yang, R., Shen, J.: A unified object motion and affinity model for online multi-object tracking. In: CVPR, pp. 6768–6777 (2020) 29. Yu, E., Li, Z., Han, S., Wang, H.: Relationtrack: relation-aware multiple object tracking with decoupled representation. IEEE Trans. Multimedia (2022) 30. Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: MOTR: end-to-end multiple-object tracking with transformer. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13687, pp. 659–675. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0 38 31. Zhang, M., Pan, Z., Feng, J., Zhou, J.: 3D multi-object detection and tracking with sparse stationary lidar. In: PRCV, pp. 16–28 (2021)


32. Zhang, Y., et al.: Bytetrack: multi-object tracking by associating every detection box. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13682, pp. 1–21. Springer, Cham (2022). https://doi.org/ 10.1007/978-3-031-20047-2 1 33. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vision 129, 3069–3087 (2021) 34. Zhao, K., et al.: Driver behavior decision making based on multi-action deep Q network in dynamic traffic scenes. In: PRCV, pp. 174–186 (2022) 35. Zhou, X., Koltun, V., Kr¨ ahenb¨ uhl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8 28

Boosting Generalization Performance in Person Re-identification Lidong Cheng1,2 , Zhenyu Kuang1 , Hongyang Zhang1 , Xinghao Ding1(B) , and Yue Huang1 1

2

Lab of Smart Data and Signal Processing, Xiamen University, Xiamen 361001, China [email protected] Institute of Artificial Intelligence, Xiamen University, Xiamen 361001, China

Abstract. Generalizable person re-identification (ReID) has gained significant attention in recent years as it poses greater challenges in recognizing individuals across different domains and unseen scenarios. Existing methods are typically limited to a single visual modality, making it challenging to capture rich semantic information across different domains. Recently, pre-trained vision-language models like CLIP have shown promising performances in various tasks by linking visual representations with their corresponding text descriptions. This enables them to capture diverse high-level semantics from the accompanying text and obtain transferable features. However, the adoption of CLIP has been hindered in person ReID due to the labels being typically index-based rather than descriptive texts. To address this limitation, we propose a novel Cross-modal framework wIth Conditional Prompt (CICP) framework based on CLIP involving the Description Prompt Module (DPM) that pre-trains a set of prompts to tackle the lack of textual information in person ReID. In addition, we further propose the Prompt Generalization Module (PGM) incorporates a lightweight network that generates a conditional token for each image. This module shifts the focus from being limited to a class set to being specific to each input instance, thereby enhancing domain generalization capability for the entire task. Through extensive experiments, we show that our proposed method outperforms state-of-the-art (SOTA) approaches on popular benchmark datasets. Keywords: Person re-identification Vision-language model

1

· Domain generalization ·

Introduction

Person ReID is a popular research area in computer vision. It focuses on identifying the same individual from a vast collection of gallery images given a query The work was supported in part by the National Natural Science Foundation of China under Grant 82172033, U19B2031, 61971369, 52105126, 82272071, 62271430, and the Fundamental Research Funds for the Central Universities 20720230104. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 174–185, 2024. https://doi.org/10.1007/978-981-99-8549-4_15

Boosting Generalization Performance in Person ReID

175

image. Recently, there has been a growing interest in generalizable person ReID, driven by its significance of research and practical implications. This task studies the ability of a trained person ReID model to generalize to unseen scenarios and utilizes direct cross-dataset evaluation [5,23] for performance assessment. Most ReID methods employ ResNet [4] or ViT-based [3] models to extract visual features. However, they only rely on visual modality and ignore the rich semantic information. It has been proven by rigorous theory [6] that multi-modal learning is better than single modality. Recently, there has been an increasing interest in vision-language learning approaches such as CLIP [18] and ALIGN [7], which establish connections between visual representations and their corresponding high-level language descriptions. These methods not only leverage larger datasets for training but also modify the pre-training task by aligning visual features with their textual counterparts. Therefore, the image encoder is able to capture diverse high-level semantics from the accompanying text and acquire transferable features, which possess a universal and generalized representation. This results in better performance for domain generalization tasks. Specifically, in an image classification task, utilizing specific text labels paired with a prompt (e.g. “A photo of a...”) can produce text descriptions. By employing distance metrics between image features and text features, the model determines the similarity or dissimilarity between the image and each category’s textual representations. Subsequently, it classifies the image by associating it with the most relevant or similar text description. Building upon this idea, CoCoOp [28] introduces a lightweight visual network that is designed to generate an input-conditional token for each image. The additional network achieves further performance improvements. Text labels are necessary for CLIP and CoCoOp to create text descriptions for downstream tasks. Nevertheless, in many person ReID tasks, the labels consist of indexes rather than descriptive words for the images [12]. This has led to a limited adoption of the vision-language model in person ReID. To tackle this issue, We propose a novel CICP framework based on CLIP, involving the DPM, which leverages a set of pre-trained prompts to generate text descriptions for person identities. Instead of relying solely on visual features, the model can now leverage the generated text descriptions to enhance its understanding of person identities. In addition, we further propose the PGM inspired by the ideas of [28]. This module incorporates an additional lightweight network that generates an input-conditional token for each image, making the prompt to be conditioned on each input instance. The proposed DPM can achieve better generalization by shifting its focus from specific class sets to individual input instance. Our approach exploits joint visual and language representation learning rather than relying solely on visual information. This results in the model tapping into high-level semantic information and acquiring domain-agnostic feature representations. Notably, our method achieves superior performance in domain generalization tasks. In summary, this paper makes the following contributions:

176

L. Cheng et al.

– To the best of our knowledge, we are the first to apply CLIP to the generalizable person ReID task. – We propose the CICP framework, which fully utilizes the cross-modal capabilities of CLIP. By pre-training a set of prompts in DPM to generate text descriptions and imposing constraints on the image encoder during the training phase. – We propose the PGM, which conditions the prompt for each image and significantly enhances generalization capabilities. – Through direct cross-dataset evaluations on three popular pedestrian datasets, we demonstrate that CICP outperforms the SOTA methods.

2 2.1

Related Work Generalizable Person ReID

Generalizable person re-identification was first conducted in [5,23], where a direct cross-dataset evaluation was proposed as the benchmark for testing algorithms. In recent years, with the progress of deep learning, this task has attracted increasing attention. Most methods mainly adopted either a metalearning pipeline or exploit domain-specific heuristics. Song et al. [20] proposed a domain-invariant mapping network following the meta-learning [2] pipeline. Jia et al. [8] utilized a combination of instance and feature normalization to mitigate style and content variances across datasets. Nevertheless, these methods were focused on a single visual modality, which limits their potential for breakthroughs. In a novel way, we have innovatively applied CLIP to generalizable person ReID. Benefiting from a set of pre-trained prompts in DPM and proposed PGM, our method has surpassed SOTA on various datasets. 2.2

Vision-Language Learning

Compared to supervised pre-training on ImageNet, vision-language pre-training (VLP) has demonstrated substantial performance improvements in various downstream tasks by emphasizing the alignment between image and language. Notable practices in this domain include CLIP [18] and ALIGN [7], which employed image and text encoders in conjunction with two directional InfoNCE losses computed between their outputs during training. Building upon the foundations of CLIP, several studies [10,11] were proposed incorporating a wider range of learning tasks, such as image-to-text matching and masked image/text modeling. Prompt or adapter-based tuning, inspired by recent advancements in NLP, has gained popularity in the field of vision. For instance, CoOp [29] introduced the concept of incorporating a learnable prompt for image classification. CoCoOp [28] explored a lightweight visual network that generates conditional tokens and utilizes learnable context vectors for each image.

Boosting Generalization Performance in Person ReID

177

Moreover, researchers have explored diverse downstream tasks for the application of CLIP. DenseCLIP [19] and MaskCLIP [27] utilized CLIP for per-pixel prediction in segmentation. EI-CLIP [16] and CLIP4CirDemo [1] employed CLIP for tackling retrieval problems. These CLIP-based methods made significant breakthroughs in their respective tasks. While CLIP-ReID [12] initially applies CLIP to single-domain image ReID and achieves promising results, it falls short in terms of generalization to unseen domains. We take it as our baseline and aim to make improvements upon it. Note that this paper focuses on the more practical and meaningful task of generalizable person ReID. To the best of our knowledge, no existing studies have specifically addressed this task based on CLIP.

3 Method

In this section, we first provide a brief introduction to the CLIP method in Sect. 3.1. Then, we present the overall framework of our proposed CICP in Sect. 3.2. Next, we discuss the design process of the prompt, which includes the Description Prompt Module (DPM) as well as the Prompt Generalization Module (PGM), in Sect. 3.3. Finally, we introduce the loss functions used in the pre-training and training stages in Sect. 3.4.

3.1 Review of CLIP

We first briefly introduce the CLIP architecture, which consists of two main parts: the image encoder and the text encoder. The image encoder extracts features from the image and can be designed using a CNN such as ResNet-50 [4] or a ViT like ViT-B/16 [3]. Both of them can effectively summarize an image into a feature vector within the cross-modal embedding space. In addition, the text encoder is used to extract features from textual descriptions. Let V_i and T_i represent the embeddings of an image and its corresponding text description in a cross-modal embedding space, where i ∈ {1, 2, ..., N} is the index within a batch. The contrastive loss for image-to-text can be written as follows:

L_{it}(i) = -\log \frac{\exp(\mathrm{sim}(V_i, T_i)/\tau)}{\sum_{j=1}^{N} \exp(\mathrm{sim}(V_i, T_j)/\tau)}    (1)

Then, the contrastive loss for text-to-image is defined as:

L_{ti}(i) = -\log \frac{\exp(\mathrm{sim}(V_i, T_i)/\tau)}{\sum_{j=1}^{N} \exp(\mathrm{sim}(V_j, T_i)/\tau)}    (2)

where sim(·, ·) represents the cosine similarity and τ is a learned temperature parameter in Eq. 1 and Eq. 2. Generally, CLIP transforms labels into text descriptions and generates embedding features Vi and Ti . However, for generalizable person ReID tasks, there are no specific text labels available, which makes it challenging to directly apply the CLIP method.
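For concreteness, the following is a minimal PyTorch-style sketch of the symmetric contrastive losses in Eq. 1 and Eq. 2. The function name, tensor shapes, and the fixed temperature value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_losses(image_feats, text_feats, temperature=0.07):
    """Symmetric image-to-text / text-to-image contrastive losses (Eqs. 1-2).

    image_feats, text_feats: (N, D) batches of paired embeddings V_i, T_i.
    """
    V = F.normalize(image_feats, dim=-1)
    T = F.normalize(text_feats, dim=-1)
    logits = V @ T.t() / temperature                 # sim(V_i, T_j) / tau
    targets = torch.arange(V.size(0), device=V.device)
    loss_it = F.cross_entropy(logits, targets)       # image-to-text, Eq. (1)
    loss_ti = F.cross_entropy(logits.t(), targets)   # text-to-image, Eq. (2)
    return loss_it, loss_ti
```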

3.2 A Novel Cross-Modal Framework

Figure 1 illustrates the proposed CICP framework. The DPM tackles the lack of textual information in person ReID by pre-training a set of prompts. The PGM enhances the generalization capability of the prompts by incorporating an additional lightweight network that generates an input-conditional token for each image. This allows the prompt to be conditioned on each input instance, making it more adaptable to different scenarios. The proposed CICP framework comprises a text encoder and an image encoder, using the widely adopted Transformer and ViT architectures as their respective backbones. For the image encoder, we utilize ViT-B/16, which consists of 12 transformer layers with a hidden size of 768 dimensions. To align with the output of the text encoder, we apply a linear projection to reduce the dimension of the image feature vector from 768 to 512.

Fig. 1. Overview of our proposed CICP framework for person ReID. (a) In pre-training phase, we first input the person images to the image encoder and obtain corresponding image feature. Then, we pass it through the PGM to get a condition token ω which is combined with DPM and fed into the text encoder. (b) In training phase, we exploit the optimized prompt and only optimize the image encoder. The red pentagram represents the optimized parts during the pre-training or training phase. (Color figure online)


In the pre-training phase, as shown in Fig. 1(a), we keep the parameters of the text encoder and image encoder fixed and focus on optimizing the prompt in DPM and the lightweight network in PGM. Firstly, we input the pedestrian image to the image encoder and obtain the feature vector. Then, we pass it through the lightweight network to generate an input-conditional token. The token is combined with the prompt and fed into the text encoder, producing the text features. The model parameters are updated by calculating the supervised contrastive losses L_it and L_ti. In the training phase, as illustrated in Fig. 1(b), we fix the parameters of the prompt and text encoder and only optimize the parameters of the image encoder. We feed the optimized prompt in DPM into the text encoder to obtain the corresponding text features and pass the image into the image encoder to obtain image features. Then, the similarity between the text features and image features is employed to compute the image-to-text cross-entropy loss L_itce. Finally, the image encoder is optimized through back-propagation.

Prompt Design Process

In this section, we provide a detailed explanation of the prompt design process, which includes the proposed DPM and PGM.

Description Prompt Module. Firstly, we introduce DPM with class-specific prompts to learn text descriptions that are independent for each class. Specifically, the prompt is designed in the following form:

\text{Prompt} = \text{Template}([s]_1 [s]_2 \ldots [s]_M, \text{person})    (3)

where each [s]_m (m ∈ 1, ..., M) represents a learnable context token with dimensions equal to the word embedding size of CLIP (i.e., 512). The hyperparameter M denotes the number of context tokens. By passing the prompt through the text encoder, we obtain class-specific context vectors for each class.

Prompt Generalization Module. Furthermore, to enhance the generalization performance of the prompt, we propose PGM. Specifically, we input the image feature into the lightweight network to obtain the corresponding input-conditional token. The token is then combined with the class-specific prompt to obtain an instance-conditional prompt. Let L_θ(·) denote the lightweight network parameterized by θ; each instance-conditional context token is obtained as [s(x)]_m = [s]_m + ω, where [s]_m is the learnable context token mentioned earlier and ω = L_θ(x) is the instance-specific representation generated by L_θ(·) for the input instance x. The instance-conditional prompt can be expressed as:

\text{Prompt} = \text{Template}([s(x)]_1 [s(x)]_2 \ldots [s(x)]_M, \text{person})    (4)

During the pre-training process, both the context tokens [s(x)]m and the parameters of Lθ (·) are updated.
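A minimal PyTorch-style sketch of this instance-conditional prompt construction is shown below. The module name, the default of M = 4 tokens, and the bottleneck width of the lightweight network (a CoCoOp-style Linear-ReLU-Linear net) are assumptions for illustration; the paper does not specify these details.

```python
import torch
import torch.nn as nn

class InstanceConditionalPrompt(nn.Module):
    """Sketch of DPM + PGM: M learnable context tokens [s]_1..[s]_M shifted by an
    instance-conditional token omega = L_theta(x) computed from the image feature."""
    def __init__(self, num_tokens=4, embed_dim=512, feat_dim=512):
        super().__init__()
        self.context = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)  # [s]_m
        self.meta_net = nn.Sequential(                  # lightweight network L_theta (assumed shape)
            nn.Linear(feat_dim, feat_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 16, embed_dim),
        )

    def forward(self, image_feat):                      # image_feat: (B, feat_dim)
        omega = self.meta_net(image_feat)               # (B, embed_dim)
        # [s(x)]_m = [s]_m + omega, broadcast over the M context tokens
        return self.context.unsqueeze(0) + omega.unsqueeze(1)   # (B, M, embed_dim)
```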

3.4 Loss Function

We employ different loss functions during the pre-training and training phases. In the pre-training phase, similar to CLIP, we utilize the supervised contrastive loss to optimize the instance-conditional prompt. Since all images of an identity share the same text description, we replace T_i with T_{y_i} in Eq. 1 and Eq. 2. The loss function is defined as:

L_{it}(i) = -\log \frac{\exp(\mathrm{sim}(V_i, T_{y_i}(x))/\tau)}{\sum_{j=1}^{N} \exp(\mathrm{sim}(V_i, T_{y_j}(x))/\tau)}    (5)

where T_{y_i}(x) represents the embedding of the instance-conditional prompt consisting of [s(x)]_m in the cross-modal embedding space. Furthermore, in the case of L_ti, it is possible for different images in a batch to belong to the same person, resulting in T_{y_i} having more than one positive example. The specific form of the loss function is defined as follows:

L_{ti}(y_i) = -\frac{1}{|H(y_i)|} \sum_{h \in H(y_i)} \log \frac{\exp(\mathrm{sim}(V_h, T_{y_i}(x))/\tau)}{\sum_{j=1}^{N} \exp(\mathrm{sim}(V_j, T_{y_i}(x))/\tau)}    (6)

where H(y_i) = {h ∈ 1...N : y_h = y_i} represents the set of indices of all positive examples for T_{y_i} in the batch, and |·| denotes the cardinality of the set. By minimizing the loss, the gradients are back-propagated through the text encoder to optimize [s(x)]_1 [s(x)]_2 ··· [s(x)]_M. Afterward, we obtain an optimized set of prompts, which are then stored in the model as part of its parameters. In the training phase, following mainstream person ReID methods, we apply the ID loss L_id and cross-entropy loss L_ce with label smoothing to optimize the image encoder. We denote by p_k the ID prediction of class k, and by K the number of identities. The ID loss is formulated as:

L_{id} = \sum_{k=1}^{K} -q_k \log(p_k)    (7)

For the optimized prompts obtained during the pre-training stage, we incorporate them into the cross-entropy loss. The specific formulation is as follows:

L_{itce}(i) = \sum_{k=1}^{K} -q_k \log \frac{\exp(\mathrm{sim}(V_i, T_{y_k}))}{\sum_{y_j=1}^{K} \exp(\mathrm{sim}(V_i, T_{y_j}))}    (8)

The losses used in the pre-training and training phases can be summarized as follows:

L = \begin{cases} L_{it} + L_{ti}, & \text{pre-training} \\ L_{id} + L_{itce}, & \text{training} \end{cases}    (9)
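The pre-training term with multiple positives (Eq. 6) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes labels are integer identity indices into the prompt embeddings and applies the temperature in both numerator and denominator.

```python
import torch
import torch.nn.functional as F

def text_to_image_loss(image_feats, text_feats, labels, temperature=0.07):
    """Sketch of Eq. (6): each identity prompt T_y is pulled towards all images of
    identity y in the batch (the positive set H(y)), against all N images."""
    V = F.normalize(image_feats, dim=-1)             # (N, D) image embeddings V_j
    T = F.normalize(text_feats, dim=-1)              # (C, D) one prompt embedding per identity
    logits = T @ V.t() / temperature                 # (C, N): sim(V_j, T_y) / tau
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = 0.0
    ids = labels.unique()
    for y in ids:                                    # assumption: labels index rows of text_feats
        positives = (labels == y).nonzero(as_tuple=True)[0]   # H(y)
        loss = loss - log_prob[y, positives].mean()           # average over |H(y)| positives
    return loss / ids.numel()
```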

4 Experiments

4.1 Datasets and Evaluation Protocols

Our experiments are conducted on three large person ReID datasets: Market1501 [25], DukeMTMC-reID [26], and MSMT17 [21]. Table 1 provides a detailed summary of the dataset information. The cross-dataset evaluation is conducted using these datasets, where the models are trained on the training subset of one dataset (except for MSMT17, where all images are used for training following [22,24]) and evaluated on the test subset of another dataset. We employ the cumulative matching characteristics (CMC) at Rank-1 (R-1) and the mean Average Precision (mAP) as performance evaluation metrics. All evaluations strictly follow the single-query evaluation protocol.

Table 1. Statistics of person datasets

Datasets             | IDs   | Images  | Cameras
Market1501 [25]      | 1,501 | 32,668  | 6
DukeMTMC-reID [26]   | 1,812 | 36,441  | 8
MSMT17 [21]          | 4,101 | 126,441 | 15

4.2 Implementation Details

We utilize the image encoder (ViT-B/16) and text encoder from CLIP as the backbone of our approach. The Adam method and a warmup learning strategy are employed to optimize the model. In the pre-training phase, we initialize the learning rate to 3.5 × 10^-4 with cosine learning rate decay and resize all images to 256 × 128. The batch size is set to 64, and no data augmentation is used. In the training phase, we employ common data augmentation techniques, including random cropping, horizontal flipping, and padding. The maximum number of training epochs is set to 60. Following [12], we gradually increase the learning rate from 5 × 10^-7 to 5 × 10^-6 over 10 epochs using a linear growth strategy. Subsequently, at the 30th and 50th epochs, the learning rate is reduced by a factor of 0.1.
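For reference, the warmup-and-step schedule of the training phase can be written as a small helper. This is an illustrative sketch, not code from the paper; the argument defaults simply restate the reported hyperparameters.

```python
def training_phase_lr(epoch, base_lr=5e-6, warmup_from=5e-7, warmup_epochs=10,
                      milestones=(30, 50), gamma=0.1):
    """Linear warmup from 5e-7 to 5e-6 over the first 10 epochs, then decay by 0.1
    at epochs 30 and 50 (the training-phase schedule described above)."""
    if epoch < warmup_epochs:
        return warmup_from + (base_lr - warmup_from) * epoch / warmup_epochs
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```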

4.3 Ablation Study

In order to demonstrate the effectiveness of our proposed DPM and PGM, we conduct ablation studies in the context of direct cross-dataset evaluation between the Market-1501 and DukeMTMC-reID datasets. The results are presented in Table 2. We apply CLIP-ReID [12] to the generalizable person ReID task and use it as the baseline. By comparing the baseline with the DPM, which utilizes a set of pre-trained prompts, we find that DPM can effectively compensate for the lack of textual labels in the person ReID task. The performance improvement is 21.9% and 11.7% in R-1 under the Market→Duke and Duke→Market settings, respectively. Furthermore, the results show that the addition of PGM leads to significant improvements in generalization. Specifically, we observe an increase of 10.6% and 10.3% in R-1 accuracy, as well as an improvement of 11.5% and 4.9% in mAP, under the Market→Duke and Duke→Market settings, respectively. This demonstrates that PGM enables better generalization by optimizing the prompt for each input instance.

Table 2. Ablation study

Method   | Market→Duke (R-1 / R-5 / R-10 / mAP) | Duke→Market (R-1 / R-5 / R-10 / mAP)
baseline | 34.6 / 49.6 / 57.0 / 18.9            | 51.9 / 68.7 / 74.9 / 24.0
+DPM     | 56.5 / 69.2 / 74.5 / 34.8            | 63.6 / 79.1 / 84.4 / 38.1
+PGM     | 67.1 / 78.5 / 82.5 / 46.3            | 73.9 / 86.3 / 89.8 / 43.0

Table 3. Comparisons against state-of-the-art methods

Method            | Venue    | Market→Duke (R-1 / R-5 / R-10 / mAP) | Duke→Market (R-1 / R-5 / R-10 / mAP) | MSMT17→Duke (R-1 / R-5 / R-10 / mAP)
OSNet-IBN [30]    | ICCV'19  | 48.5 / 62.3 / 67.4 / 26.7            | 57.7 / 73.7 / 80.0 / 26.1            | 67.4 / 80.0 / 83.3 / 45.6
QAConv [13]       | ECCV'20  | 48.8 / -    / -    / 28.7            | 58.6 / -    / -    / 27.6            | 69.4 / -    / -    / 52.6
SNR [9]           | CVPR'20  | 55.1 / -    / -    / 33.6            | 66.7 / -    / -    / 33.9            | 69.2 / -    / -    / 49.9
MixStyle [32]     | ICLR'21  | 47.5 / 62.0 / 67.1 / 27.3            | 58.2 / 74.9 / 80.9 / 29.0            | 68.0 / 80.6 / 85.1 / 50.2
OSNet-AIN [31]    | TPAMI'21 | 52.4 / 66.1 / 71.2 / 30.5            | 61.0 / 77.0 / 82.5 / 30.6            | 71.1 / 83.3 / 86.4 / 52.7
TransMatcher [14] | NIPS'21  | 55.4 / 69.0 / 73.5 / 33.4            | 67.3 / 81.5 / 86.3 / 34.5            | -    / -    / -    / -
MetaBIN [2]       | CVPR'21  | 55.2 / 69.0 / 74.4 / 33.1            | 69.2 / 83.1 / 87.8 / 35.9            | -    / -    / -    / -
MDA [17]          | CVPR'22  | 56.7 / -    / -    / 34.4            | 70.3 / 85.2 / 89.6 / 38.0            | 71.7 / -    / -    / 52.4
QAConv-GS [15]    | CVPR'22  | 59.8 / 72.0 / 76.8 / 37.0            | 73.0 / 85.9 / 89.5 / 39.3            | 71.5 / 84.2 / 88.1 / 54.5
CICP              | Ours     | 67.1 / 78.5 / 82.5 / 46.3            | 73.9 / 86.3 / 89.8 / 43.0            | 74.4 / 84.2 / 87.7 / 56.1

4.4 Comparison with State-of-the-Art Methods

We conducted a comprehensive comparison between our proposed CICP and SOTA methods in generalizable person re-identification. The results of direct cross-dataset evaluation on three benchmark datasets are presented in Table 3. The comparison clearly demonstrates that CICP achieves significant improvements over the previous SOTA methods. Specifically, when training on the Market dataset and using the Duke dataset as the target dataset, we observe a remarkable improvement of 7.3% in R-1 accuracy and 9.3% in mAP. With Duke→Market, the improvements are 0.9% in R-1 and 3.7% in mAP. Similarly, when training on the MSMT17 dataset and evaluating on the Duke dataset, we achieve a 2.7% improvement in R-1 accuracy and a 1.6% improvement in mAP. It is important to note that all the results presented here are obtained without employing any post-processing methods such as re-ranking.

4.5 Other Analysis

We evaluate the influence of the hyperparameter M on the performance of our method on the Duke and Market datasets. As illustrated in Fig. 2, it is evident that the optimal performance is achieved when M is set to 4. We observe that setting M to 1 does not sufficiently capture the textual descriptions, while a higher value of M (M = 10) results in excessive and unnecessary descriptions, leading to a significant drop in performance.

Fig. 2. Effect of hyperparameter M

5 Conclusion

In this paper, we propose a novel CICP framework that leverages textual information and designs effective prompts to enhance model performance across different domains. The main contributions include the Description Prompt Module (DPM) and the Prompt Generalization Module (PGM). Extensive experiments have shown that our approach outperforms previous SOTA methods. Future work may explore further improvements in network design and additional strategies for handling unseen domains.

References

1. Baldrati, A., Bertini, M., Uricchio, T., Del Bimbo, A.: Effective conditioned and composed image retrieval combining clip-based features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21466–21474 (2022)


2. Choi, S., Kim, T., Jeong, M., Park, H., Kim, C.: Meta batch-instance normalization for generalizable person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3425–3435 (2021) 3. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 5. Hu, Y., Yi, D., Liao, S., Lei, Z., Li, S.Z.: Cross dataset person re-identification. In: Jawahar, C.V., Shan, S. (eds.) ACCV 2014. LNCS, vol. 9010, pp. 650–664. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16634-6 47 6. Huang, Y., Du, C., Xue, Z., Chen, X., Zhao, H., Huang, L.: What makes multimodal learning better than single (provably). Adv. Neural. Inf. Process. Syst. 34, 10944–10956 (2021) 7. Jia, C., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: International Conference on Machine Learning (2021) 8. Jia, J., Ruan, Q., Hospedales, T.M.: Frustratingly easy person re-identification: generalizing person re-id in practice. arXiv preprint arXiv:1905.03422 (2019) 9. Jin, X., Lan, C., Zeng, W., Chen, Z., Zhang, L.: Style normalization and restitution for generalizable person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3143–3152 (2020) 10. Kim, W., Son, B., Kim, I.: ViLT: vision-and-language transformer without convolution or region supervision. In: International Conference on Machine Learning, pp. 5583–5594. PMLR (2021) 11. Li, J., Li, D., Xiong, C., Hoi, S.: BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning, pp. 12888–12900. PMLR (2022) 12. Li, S., Sun, L., Li, Q.: CLIP-ReID: exploiting vision-language model for image reidentification without concrete text labels. arXiv preprint arXiv:2211.13977 (2022) 13. Liao, S., Shao, L.: Interpretable and generalizable person re-identification with query-adaptive convolution and temporal lifting. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 456–474. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8 27 14. Liao, S., Shao, L.: Transmatcher: deep image matching through transformers for generalizable person re-identification. Adv. Neural. Inf. Process. Syst. 34, 1992– 2003 (2021) 15. Liao, S., Shao, L.: Graph sampling based deep metric learning for generalizable person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7359–7368 (2022) 16. Ma, H., et al.: EI-CLIP: entity-aware interventional contrastive learning for ecommerce cross-modal retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18051–18061 (2022) 17. Ni, H., Song, J., Luo, X., Zheng, F., Li, W., Shen, H.T.: Meta distribution alignment for generalizable person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2487–2496 (2022) 18. Radford, A., et al.: Learning transferable visual models from natural language supervision. Cornell University - arXiv (2021) 19. Rao, Y., et al.: Denseclip: language-guided dense prediction with context-aware prompting. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18082–18091 (2022)


20. Song, J., Yang, Y., Song, Y.Z., Xiang, T., Hospedales, T.M.: Generalizable person re-identification by domain-invariant mapping network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 719–728 (2019) 21. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 79–88 (2018) 22. Yang, Q., Yu, H.X., Wu, A., Zheng, W.S.: Patch-based discriminative feature learning for unsupervised person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3633–3642 (2019) 23. Yi, D., Lei, Z., Liao, S., Li, S.Z.: Deep metric learning for person re-identification. In: 2014 22nd International Conference on Pattern Recognition, pp. 34–39. IEEE (2014) 24. Yu, H.X., Zheng, W.S., Wu, A., Guo, X., Gong, S., Lai, J.H.: Unsupervised person re-identification by soft multilabel learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2148–2157 (2019) 25. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person reidentification: a benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1116–1124 (2015) 26. Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3754–3762 (2017) 27. Zhou, C., Loy, C.C., Dai, B.: Denseclip: extract free dense labels from clip. arXiv preprint arXiv:2112.01071 (2021) 28. Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Conditional prompt learning for visionlanguage models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16816–16825 (2022) 29. Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Learning to prompt for vision-language models. Int. J. Comput. Vision 130(9), 2337–2348 (2022) 30. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Omni-scale feature learning for person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3702–3712 (2019) 31. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Learning generalisable omni-scale representations for person re-identification. IEEE Trans. Pattern Anal. Mach. Intell. 44(9), 5056–5069 (2021) 32. Zhou, K., Yang, Y., Qiao, Y., Xiang, T.: Domain generalization with mixstyle. arXiv preprint arXiv:2104.02008 (2021)

Self-guided Transformer for Video Super-Resolution

Tong Xue, Qianrui Wang, Xinyi Huang, and Dengshi Li(B)

School of Artificial Intelligence, Jianghan University, Wuhan 430056, China
[email protected]

Abstract. The challenge of video super-resolution (VSR) is to leverage the long-range spatial-temporal correlation between low-resolution (LR) frames to generate high-resolution (HR) video frames. However, CNN-based video super-resolution approaches show limitations in modeling long-range dependencies and non-local self-similarity. In this paper, for further spatio-temporal learning, we propose a novel self-guided transformer for video super-resolution (SGTVSR). In this framework, we customize a multi-headed self-attention based on offset-guided window (OGW-MSA). For each query element on a low-resolution reference frame, the OGW-MSA enjoys offset guidance to globally sample highly relevant key elements throughout the video. In addition, we propose a feature aggregation module that aggregates the favorable spatial information of adjacent frame features at different scales to improve the video reconstruction quality. Comprehensive experiments show that our proposed self-guided transformer for video super-resolution outperforms state-of-the-art (SOTA) methods on several public datasets and produces visually pleasing results.

Keywords: Video super-resolution · Self-guided transformer · Multi-headed self-attention based on offset-guided window · Feature aggregation

1 Introduction

Video super-resolution (VSR) aims to recover high-resolution (HR) video frames from their low-resolution (LR) counterparts. Due to the increasing demand for high-quality video content in various application areas, such as surveillance, autonomous driving, and telemedicine, this technology is of great value in many practical applications. VSR differs dramatically from single-image super-resolution (SISR), which relies only on intra-frame spatial correlation to recover the spatial resolution of a single image. In contrast, VSR reconstructs HR frames from several consecutive adjacent LR frames. The challenge of VSR is to exploit the spatio-temporal coherence between LR frames for video frame reconstruction.


Current CNN-based VSR methods are mainly divided into two types. The first type [5,17] aligns LR frames through motion estimation and motion compensation, and its performance heavily depends on the accuracy of motion estimation and compensation. The other type [18] is based on 3D convolution, but it only explores local correlations during feature fusion and ignores non-local similarities between and within frames. Recently, recurrent neural networks (RNNs) have also been commonly used for VSR tasks to extract long-range temporal information [1,14]. However, their performance depends heavily on the number of input frames, which is unfriendly for videos lacking long-range temporal information.

In recent years, the advent of the transformer has provided an alternative solution to alleviate the limitations of the aforementioned approaches. It can effectively capture the long-range spatio-temporal coherence of low-resolution frames by performing multi-headed self-attention over the input low-resolution frames. However, there are several major problems with using existing transformers directly for video super-resolution. On the one hand, the computational cost grows quadratically in the spatio-temporal dimension and is computationally prohibitive when using a standard global transformer [2]. Moreover, the standard global transformer uses the entire image features as key elements, leading self-attention to focus on redundant key elements, which may cause non-convergence of the model. On the other hand, the local window-based transformer [11] computes self-attention within a window, which restricts the perceptual field. When fast motion is present in low-resolution videos, the local window-based transformer [11] ignores key elements in the spatio-temporal neighborhood that have high similarity and sharper scenes. The main reason for these problems is that previous transformers lack the guidance of motion information when computing self-attention.

To alleviate the above problems, in this paper we propose a Self-Guided transformer for Video Super-Resolution (SGTVSR) framework. This framework constructs a robust pyramidal skeleton to achieve video super-resolution. In SGTVSR we formulate the multi-headed self-attention module for offset-guided windows (OGW-MSA). For each query element on the reference frame, the offset-guided OGW-MSA shifts the unaligned and spatially sparse key elements in the video frames to regions with high correlation to the query element for sampling. These sampled key elements provide highly relevant image prior information, which is critical for video super-resolution. SGTVSR differs from previous visual transformers in that it neither samples redundant key elements nor suffers from a limited field of perception. It ensures linear spatial complexity in the transformer self-attention module while sampling highly relevant key elements, yielding an efficient attention pattern. We also propose the feature aggregation module (FA), which adaptively aggregates the spatial features of adjacent frames at different scales to reduce the loss of spatial information caused by downsampling and thus improve the video reconstruction quality. In summary, our contributions are as follows:


1. We propose a new video super-resolution method, SGTVSR. To our knowledge, this is one of the first works to successfully introduce the transformer into the video super-resolution task.
2. We propose a novel multi-headed self-attention mechanism, OGW-MSA.
3. We design a feature aggregation (FA) module that enhances the spatial details of intermediate frame features by aggregating the relevant features between adjacent frames. The goal of this module is to reduce the loss of spatial information introduced by downsampling operations and to improve the quality of video reconstruction.

2 Related Work

2.1 Video Super-Resolution

The spatio-temporal correlation between LR frames plays an important role in the VSR task. Previous VSR methods can be divided into explicit and implicit motion compensation methods. Most early methods [5,17] utilize explicit motion-based approaches, which use optical flow to explicitly align frames. However, the estimated optical flow may be inaccurate. Meanwhile, implicit motion compensation approaches [13,15] introduce deformable convolution to implicitly align temporal features and achieve impressive performance. Recurrent neural network-based VSR methods use hidden states to convey relevant information from previous frames. The most representative ones, BasicVSR [1] and IconVSR [1], fuse past and future bidirectional hidden states for reconstruction and achieve significant improvements. They try to make full use of the information of the whole sequence and update the hidden states simultaneously through the weights of the reconstruction network. However, due to vanishing gradients, this mechanism loses its long-term modeling capability to some extent.

2.2 Vision Transformers

The transformer is a dominant architecture in natural language processing (NLP) and has shown excellent performance in a variety of tasks. In recent years, the vision transformer family has also received increasing attention in computer vision. Among them, Vision Transformer (ViT) [2] solves image classification problems by computing attention between global image blocks, and it has good long-term modeling capability compared with convolutional neural networks (CNNs). Meanwhile, Liu et al. [11] proposed a new transformer-based backbone for visual tasks, the Swin Transformer. This approach reduces the computational complexity by limiting the computation of attention to a local scale. However, the direct use of previous global or local transformers for video super-resolution may lead to high computational cost as well as a restricted perceptual domain. To address the above issues, we propose the self-guided transformer for video super-resolution.


Fig. 1. Architecture of SGTVSR. (a) SGTVSR consists of Encoder, Feature Aggregation module (FA) and Decoder. SGTVSR is mainly built from several SGABs. (b) SGAB consists of normalization layer, OGW-MSA, and MLP layer. (c) FA enhances the spatial information of intermediate frames by aggregating locally relevant feature blocks from two adjacent frames of the encoder.

3 Our Method

In this section, we first outline the approach in Sect. 3.1. We introduce the multi-headed self-attention based on offset-guided window (OGW-MSA) module in Sect. 3.2. Our feature aggregation module is explained in Sect. 3.3.

3.1 Network Overview

Our proposed SGTVSR is a U-Net-style network. Figure 1(a) depicts its basic structure, which mainly consists of an encoder, a feature aggregation module, and a decoder. The input is three consecutive LR frames {I_t^LR}_{t=1}^3 of size H×W×3, and the output of our VSR model is three consecutive HR frames {I_t^HR}_{t=1}^3 of size 4H×4W×3, where t denotes the timestamp of the frame, H and W denote the height and width of the frame, and 3 is the number of channels. Before entering the encoder, we extract the features of the LR frames with a feature extraction module, which produces features {F_t^LR}_{t=1}^3 through a residual network consisting of a convolutional layer with a kernel size of 3×3 and 2 residual blocks. In the encoder stage, the input {F_t^LR}_{t=1}^3 generates hierarchical features through two self-guided attention blocks (SGAB) and patch merging layers.


The patch merging layer consists of a 4 × 4 convolutional layer with stride 2, which downsamples the feature map and doubles the number of feature map channels, resulting in a hierarchical design. This model design is inspired by [11]. Between the encoder and decoder we add two SGABs as a bottleneck to prevent the network from failing to converge. The SGAB output in the encoder is downsampled by a stride-2 convolutional layer and used as the input to the next encoder stage. Thus, the feature map size of the i-th encoder layer is 2^i·C × H/2^i × W/2^i. Following the design rules of U-Net, we customize a symmetric decoder, which is also composed of two SGABs and patch expanding layers. The patch expanding layer consists of a 2 × 2 deconvolution layer that upsamples the feature map. To further enhance the final reconstructed HR frames, we add a feature aggregation module between the encoder and decoder to enhance the spatial details of the reconstructed video frames while mitigating the information loss caused by model downsampling. After the decoder, we use a pixel-shuffle layer to reconstruct the HR frames. The bicubic-upsampled {I_t^LR}_{t=1}^3 are added to the enlarged feature maps in the form of a global residual connection, yielding the final high-resolution video frames {I_t^HR}_{t=1}^3.
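A minimal PyTorch-style sketch of the two resampling layers described above (a 4×4 stride-2 convolution that doubles the channels, and a 2×2 transposed convolution that halves them). The padding choice is an assumption not stated in the text.

```python
import torch.nn as nn

class PatchMerging(nn.Module):
    """Downsampling between encoder stages: strided conv that halves the spatial
    size and doubles the channel count (a sketch of the layer described above)."""
    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels * 2, kernel_size=4, stride=2, padding=1)

    def forward(self, x):            # (B, C, H, W) -> (B, 2C, H/2, W/2)
        return self.reduce(x)

class PatchExpanding(nn.Module):
    """Upsampling between decoder stages: 2x2 transposed conv that doubles the
    spatial size and halves the channel count."""
    def __init__(self, channels):
        super().__init__()
        self.expand = nn.ConvTranspose2d(channels, channels // 2, kernel_size=2, stride=2)

    def forward(self, x):            # (B, C, H, W) -> (B, C/2, 2H, 2W)
        return self.expand(x)
```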

3.2 Multi-headed Self-attention Module Based on Offset-Guided Window (OGW-MSA)

Figure 1(b) depicts the basic unit of SGTVSR, i.e., the Self-Guided Attention Block (SGAB). The SGAB is composed of a normalization layer (LN), the multi-headed self-attention based on offset-guided window (OGW-MSA), and an MLP (multi-layer perceptron). In this section, we mainly introduce OGW-MSA. Using the offsets generated by convolution as a guide when computing multi-head self-attention, key elements with high similarity are sampled from all spatio-temporal neighborhood frames. The structure of OGW-MSA is shown in Fig. 2. For illustration, the input LR frame feature F_t^LR is taken as the reference frame feature and F_{t+1}^LR as the spatio-temporal neighborhood frame feature. The patch embedding in the middle of the reference frame F_t^LR becomes the query element q_{i,j}^t. OGW-MSA aims to model long-range spatio-temporal correlations and capture non-local self-similarity. For this purpose, OGW-MSA needs to sample patches from the spatio-temporal neighborhood of F_t^LR whose scenes are highly similar to q_{i,j}^t and clearer, as key elements. Inspired by [13], the offset can adaptively change the perceptual field size to find local features with high similarity. This idea can be transferred to multi-headed self-attention to find key elements k_{i+Δx,j+Δy}^l with high similarity and clarity to q_{i,j}^t under the guidance of offsets. We denote this as:

Φ_{i,j}^t = \{ k_{i+Δx,\,j+Δy}^{l} \ \big|\ |l − t| ≤ r \}    (1)

(Δx, Δy) = Θ(F_t^{LR}, F_{t+1}^{LR})(i, j)    (2)


where Φ_{i,j}^t denotes the set of sampled key elements, r denotes the temporal radius of the adjacent frames, and (Δx, Δy) denotes the motion offset from the reference frame F_t^LR to the adjacent frame F_{t+1}^LR. The constraint |l − t| ≤ r means that the sampled patches cannot exceed the temporal radius of the adjacent frames. Θ denotes the offset generation network, and (i, j) is the initial position of the query element and the key element. OGW-MSA can be formulated as:

\mathrm{OGW\text{-}MSA}(q_{i,j}^t, Φ_{i,j}^t) = \sum_{m=1}^{M} W_m \sum_{k ∈ Φ_{i,j}^t} A_m(q_{i,j}^t, k)\, W'_m k    (3)

Fig. 2. Illustration of our multi-headed self-attention module based on offset-guided windows (OGW-MSA). The offset generation network captures local features by a 5 × 5 depth-wise convolution, followed by GELU activation and a 1 × 1 convolution to obtain the 2D offsets. In adjacent frames, OGW-MSA globally samples the key elements that are highly correlated with the query element (the number of query patches in the figure is 1; in practice there are more query elements and key elements).

For convenience, k and k_{i+Δx,j+Δy}^l are used interchangeably here. M is the number of attention heads, W_m and W'_m denote learnable parameters of the network, and A_m(q_{i,j}^t, k) denotes the self-attention of the m-th head, expressed as:

A_m(q_{i,j}^t, k) = \mathrm{softmax}_{k ∈ Φ_{i,j}^t}\!\left( \frac{q_{i,j}^t k^{T}}{\sqrt{d}} \right) v    (4)

where v denotes the value in self-attention and d = C/M denotes the dimensionality of each head. The computational cost of OGW-MSA is similar to the corresponding part in Swin Transformer [11]; the only additional overhead comes from the sub-network used to generate the offsets. The computational costs of Global-MSA and OGW-MSA are:


O(\mathrm{Global\text{-}MSA}) = 4(HW)C^2 + 2(HW)^2 C    (5)

O(\mathrm{OGW\text{-}MSA}) = 2(HW)C^2 + 2N^2(HW)C + NC    (6)

where N is the number of sampled patches and NC is the additional computation of the offset-generation sub-network. In contrast to global self-attention, our OGW-MSA scales linearly with the size of the input image. Compared with [11], we do not need to compute self-attention for each local window and have a larger and more flexible perceptual field. OGW-MSA can change the size of the perceptual field depending on the number of offsets, which provides more flexibility and robustness.
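To make the sampling-then-attending pattern concrete, below is a simplified single-head PyTorch sketch: a small conv net (depthwise 5×5 → GELU → 1×1, as in Fig. 2) predicts N offsets per query position, keys and values are bilinearly sampled from the neighbouring frame at the shifted positions, and a scaled dot-product attention is computed over the N samples. The module name, the bilinear sampling, and N = 4 are assumptions; this is an illustrative re-implementation, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetGuidedAttention(nn.Module):
    """Simplified single-head offset-guided attention between a reference frame
    and one neighbouring frame."""
    def __init__(self, channels, num_samples=4):
        super().__init__()
        self.n = num_samples
        self.offset_net = nn.Sequential(                    # depthwise 5x5 -> GELU -> 1x1
            nn.Conv2d(channels * 2, channels * 2, 5, padding=2, groups=channels * 2),
            nn.GELU(),
            nn.Conv2d(channels * 2, 2 * num_samples, 1),    # (dx, dy) for each of N samples
        )
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.to_v = nn.Conv2d(channels, channels, 1)

    def forward(self, ref, nbr):                            # both (B, C, H, W)
        b, c, h, w = ref.shape
        offsets = self.offset_net(torch.cat([ref, nbr], 1)).view(b, self.n, 2, h, w)
        ys, xs = torch.meshgrid(torch.arange(h, device=ref.device, dtype=ref.dtype),
                                torch.arange(w, device=ref.device, dtype=ref.dtype),
                                indexing="ij")
        q = self.to_q(ref)
        k_full, v_full = self.to_k(nbr), self.to_v(nbr)
        keys, values = [], []
        for i in range(self.n):                             # sample N key/value sets from the neighbour
            gx = (xs + offsets[:, i, 0]) / max(w - 1, 1) * 2 - 1
            gy = (ys + offsets[:, i, 1]) / max(h - 1, 1) * 2 - 1
            grid = torch.stack([gx, gy], dim=-1)            # (B, H, W, 2), normalized coordinates
            keys.append(F.grid_sample(k_full, grid, align_corners=True))
            values.append(F.grid_sample(v_full, grid, align_corners=True))
        k = torch.stack(keys, dim=1)                        # (B, N, C, H, W)
        v = torch.stack(values, dim=1)
        attn = (q.unsqueeze(1) * k).sum(2) / c ** 0.5       # per-position scores over N samples
        attn = attn.softmax(dim=1)
        return (attn.unsqueeze(2) * v).sum(1)               # (B, C, H, W)
```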

3.3 Feature Aggregation (FA)

To improve the spatial detail of reconstructed video frames and reduce the loss of spatial information caused by downsampling, we introduce a feature aggregation module between the SGTVSR encoder and decoder. This multi-scale aggregation strategy can better capture the non-local correspondence between video features of different resolutions and help the OGW-MSA computation. The feature aggregation module is shown in Fig. 1(c). Specifically, the output of the SGAB in the encoder and the output of the patch expanding layer in the decoder are used as the inputs for feature aggregation at each scale. The core idea is to divide the adjacent frame features into two sets of spatial feature blocks [f_{t−1}^LR] and [f_{t+1}^LR], select the S most relevant spatial feature blocks in the adjacent frames by cosine similarity, and adaptively aggregate these S blocks using the Aggregation Unit (AU) proposed by [8]. Taking [f_{t+1}^LR] and the intermediate-frame local feature f_t^LR as an example:

\cos([f_{t+1}^{LR}], f_t^{LR}) = \frac{[f_{t+1}^{LR}]}{\|[f_{t+1}^{LR}]\|} \cdot \frac{f_t^{LR}}{\|f_t^{LR}\|}    (7)

\bar{f}_{t+1}^{LR} = \mathrm{AU}(f_{t+1,1}^{LR}, f_{t+1,2}^{LR}, \ldots, f_{t+1,S}^{LR})    (8)

where f_{t+1,1}^{LR}, f_{t+1,2}^{LR}, ..., f_{t+1,S}^{LR} denote the S most relevant spatial feature blocks of the adjacent frame F_{t+1}^LR, cos([f_{t+1}^{LR}], f_t^{LR}) represents the cosine similarity, and \bar{f}_{t+1}^{LR} is the aggregated spatially enhanced feature block; \bar{f}_{t−1}^{LR} is obtained in the same way. Subsequently, to implement the adaptive aggregation strategy, we assign adaptive weights to these spatially enhanced feature blocks. The final spatially enhanced intermediate-frame local feature f_t^{LR*} is obtained by weighting and summing:

W_1 = \mathrm{Conv}(F_t^{LR}, F_{t−1}^{LR}), \quad W_2 = \mathrm{Conv}(F_t^{LR}, F_{t+1}^{LR})    (9)

f_t^{LR*} = \bar{f}_{t−1,s}^{LR} \odot W_1 + \bar{f}_{t+1,s}^{LR} \odot W_2    (10)


where W_1 and W_2 are the spatially adaptive weights of \bar{f}_{t−1,s}^{LR} and \bar{f}_{t+1,s}^{LR}, respectively, which are obtained from F_t^LR together with F_{t−1}^LR and F_{t+1}^LR through convolution layers. The spatially enhanced intermediate frame F_t^{LR*} is obtained by performing this local feature aggregation several times.
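The block-wise selection step can be sketched as follows. For simplicity the learned aggregation unit (AU) and the convolutional weights W_1/W_2 are replaced here by a softmax over cosine similarities, so this only illustrates the top-S selection and weighted-sum idea, not the exact module.

```python
import torch
import torch.nn.functional as F

def aggregate_neighbour_blocks(center, neighbour, num_relevant=4):
    """Pick the S most cosine-similar blocks of a neighbouring frame for one
    centre-frame feature block and average them with similarity-derived weights.

    center: (D,) one local feature block; neighbour: (M, D) candidate blocks.
    """
    sims = F.cosine_similarity(neighbour, center.unsqueeze(0), dim=1)    # (M,)
    topk = sims.topk(min(num_relevant, neighbour.size(0)))               # S most relevant blocks
    weights = topk.values.softmax(dim=0)                                 # adaptive weights (simplified)
    return (weights.unsqueeze(1) * neighbour[topk.indices]).sum(dim=0)   # aggregated block
```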

4 Experiments

4.1 Datasets and Experimental Settings

We train on the most commonly used VSR datasets, Vimeo-90K [17] and REDS [12]. REDS [12] was published in the NTIRE19 challenge [12]. It contains a total of 300 video sequences, of which 240 are used for training, 30 for validation, and 30 for testing. Each sequence contains 100 frames with a resolution of 720 × 1280. To create the test set, we select four sequences from the original dataset, called REDS4 [12], and use the remaining 266 sequences from the training and validation sets as the training set. Vimeo-90K [17] contains 64,612 sequences for training and 7,824 sequences for testing; each sequence contains 7 frames with a resolution of 448 × 256. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [16] are used as evaluation metrics. The model is trained on 2 V100 GPUs for 200 epochs. SGTVSR divides the video features into 64 × 64 patches as input, with a batch size of 4. We train the model with Gaussian-blurred LR images downsampled by a scale factor of ×4. Our model is trained with the Adam optimizer [6], with β1 = 0.9 and β2 = 0.99. The initial learning rate is set to 1 × 10^-4. The 2D filters are all of size 3 × 3, and the basic channel number C is 64. We use the Charbonnier penalty function [7] as the loss function, defined as:

L = \sqrt{\| I_{GT} − I_{HR} \|^2 + ε^2}

where ε is set to 1 × 10^-3.
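A one-line sketch of this penalty in PyTorch, with the mean taken over pixels as an assumed reduction:

```python
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier penalty L = sqrt((I_HR - I_GT)^2 + eps^2), averaged over pixels."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```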

Table 1. Quantitative comparison (PSNR/SSIM) on the REDS4 [12] dataset for 4× video super-resolution. The results are tested on RGB channels. Red indicates the best and blue indicates the second best performance. Frame indicates the number of input frames required to perform an inference, and "r" indicates to adopt the recurrent structure.

Method       | Frame | Clip 000     | Clip 011     | Clip 015     | Clip 020     | Average
Bicubic      | 1     | 24.55/0.6489 | 26.06/0.7261 | 28.52/0.8034 | 25.41/0.7386 | 26.14/0.7292
TOFlow [17]  | 7     | 26.52/0.7540 | 27.80/0.7858 | 30.67/0.8609 | 26.92/0.7953 | 27.98/0.7990
EDVR [15]    | 5     | 27.78/0.8156 | 31.60/0.8779 | 33.71/0.9161 | 29.74/0.8809 | 30.71/0.8726
MuCAN [8]    | 5     | 27.99/0.8219 | 27.99/0.8219 | 33.90/0.9170 | 29.78/0.8811 | 30.88/0.8750
BasicVSR [1] | r     | 28.39/0.8429 | 32.46/0.8975 | 34.22/0.9237 | 30.60/0.8996 | 31.42/0.8909
IconVSR [1]  | r     | 28.55/0.8478 | 32.89/0.9024 | 34.54/0.9270 | 30.80/0.9033 | 31.67/0.8948
TTVSR [10]   | r     | 28.82/0.8566 | 33.47/0.9100 | 35.01/0.9325 | 31.17/0.9094 | 32.12/0.9021
SGTVSR       | r     | 29.06/0.8604 | 33.72/0.9163 | 35.23/0.9359 | 31.42/0.9131 | 32.36/0.9064

4.2 Comparisons with State-of-the-Art Methods

Quantitative Evaluation. To evaluate the performance, we compare the proposed SGTVSR with 13 VSR methods. We evaluate our model on four test sets: Vid4 [9], UDM10 [18], Vimeo-90K-T [17], and REDS4 [12]. For a fair comparison, we use the same downsampling operation for all HR frame reconstruction parts in the tests. As shown in Table 1, our SGTVSR is compared with other SOTA methods on the very challenging VSR dataset REDS4 [12]. The table shows that our SGTVSR achieves a PSNR of 32.36 dB and outperforms the best-performing TTVSR [10] by 0.24 dB. Both SGTVSR and TTVSR [10] are transformer-based video super-resolution models, which indicates that our SGTVSR is better at reconstructing HR frames using long-term temporal correlation in videos than [10]. To further validate the generalization ability of SGTVSR, we train SGTVSR on the Vimeo-90K dataset [17] and evaluate the results on the Vid4 [9], UDM10 [18], and Vimeo-90K-T [17] datasets, respectively. As shown in Table 2, the PSNR reaches 28.63 dB, 40.63 dB, and 38.26 dB on the Vid4 [9], UDM10 [18], and Vimeo-90K-T [17] test sets, respectively, outperforming all other SOTA methods. When evaluated on the Vimeo-90K-T [17] dataset, with 7 frames in each test sequence, SGTVSR shows quite good visual results, indicating that SGTVSR is able to model the information in long sequences well.

Table 2. Quantitative comparison (PSNR/SSIM) on the Vid4 [9], UDM10 [18] and Vimeo-90K-T [17] datasets for 4× video super-resolution. Red indicates the best and blue indicates the second best performance.

Method        | Vid4         | UDM10        | Vimeo-90K-T
Bicubic       | 21.80/0.5246 | 28.47/0.8253 | 31.30/0.8687
TOFlow [17]   | 25.85/0.7659 | 36.26/0.9438 | 34.62/0.9212
FRVSR [14]    | 26.69/0.8103 | 37.03/0.9537 | 35.64/0.9319
DUF [5]       | 27.38/0.8329 | 38.48/0.9605 | 36.87/0.9447
RBPN [4]      | 27.17/0.8205 | 38.66/0.9596 | 37.20/0.9458
RLSP [3]      | 27.48/0.8388 | 38.48/0.9606 | 36.49/0.9403
EDVR [15]     | 27.85/0.8503 | 39.40/0.9663 | 37.33/0.9484
TDAN [13]     | 26.86/0.8140 | 38.19/0.9586 | 36.31/0.9376
PFNL [18]     | 27.16/0.8355 | 38.74/0.9627 | 36.14/0.9363
MuCAN [8]     | -            | -            | 37.32/0.9465
BasicVSR [1]  | 27.96/0.8553 | 39.96/0.9694 | 37.53/0.9498
IconVSR [1]   | 28.04/0.8570 | 40.03/0.9694 | 37.84/0.9524
TTVSR [10]    | 28.40/0.8643 | 40.41/0.9712 | 37.92/0.9526
SGTVSR (Ours) | 28.63/0.8691 | 40.63/0.9735 | 38.26/0.9541

Param(M) values listed in the source for the compared methods: 5.1, 5.8, 12.2, 20.6, 17.2, 13.6, 6.3, 8.7, 6.8, 9.8 (the per-method assignment of this column is not recoverable from the extracted table).


Fig. 3. Visual results on the testset for 4× scaling factor.

Qualitative Comparison. To further compare the visual quality of the different methods, we show in Fig. 3 the visual results produced by SGTVSR and other SOTA methods on different test sets. It can be observed that SGTVSR shows a significant improvement in visual quality, especially for areas with detailed textures. For example, in the second row of Fig. 3, SGTVSR can recover more texture details from the tall buildings in the city.

4.3 Ablation Study

In this section, we perform an ablation study on the REDS4 dataset. The "Baseline" model was derived by directly removing the proposed FA and OGW-MSA modules from our SGTVSR. Break-Down Ablation. We first perform a decomposition ablation to investigate the impact of each module on performance. We directly use convolutional layers instead of OGW-MSA as our "Baseline" model. The results are shown in Table 3. The Baseline model yields 29.76 dB; improvements of 1.22 dB and 1.48 dB are achieved after applying FA and OGW-MSA, respectively. The model gain is 2.60 dB when both OGW-MSA and FA are used, and the results show the effectiveness of FA and OGW-MSA.


Ablation of Different Self-attention Mechanisms. We compare our OGW-MSA with other self-attention mechanisms in Table 4. The Baseline model yields 29.76 dB while costing 6.1 M parameters. When global MSA [2] is used, the video super-resolution performance drops by 1.52 dB while the cost rises to 72.34 M. The main reason is that global MSA samples too many redundant key elements and requires a large amount of computational and memory resources. When W-MSA [11] is used, the cost increases by 2.02 M and the model gains only 0.66 dB, a limited improvement because W-MSA [11] only calculates self-attention within a specific local window. When using OGW-MSA, the model only increases the cost by 3.17 M while yielding an improvement of 1.48 dB. This evidence suggests that OGW-MSA is more effective than W-MSA [11] for video super-resolution.

Table 3. Results of ablation studies of SGTVSR modules on the REDS4 [12] dataset. FA: Feature Aggregation, OGW-MSA: Multi-Headed Self-Attention Based on the Offset-Guided Window Module.

Baseline | FA | OGW-MSA | PSNR  | SSIM
✓        |    |         | 29.76 | 0.8342
✓        | ✓  |         | 30.98 | 0.8749
✓        |    | ✓       | 31.24 | 0.8838
✓        | ✓  | ✓       | 32.36 | 0.9064

Table 4. Ablation study of using different self-attention mechanisms.

Method    | Baseline | Global MSA | W-MSA  | OGW-MSA
PSNR      | 29.76    | 28.24      | 30.42  | 31.24
SSIM      | 0.8342   | 0.8124     | 0.8654 | 0.8838
Params(M) | 6.1      | 72.34      | 8.12   | 9.27

5 Conclusion

In this paper, we propose SGTVSR, a novel transformer-based video super-resolution method. In SGTVSR, we customize a novel multi-headed self-attention mechanism, OGW-MSA. Guided by the motion offsets, OGW-MSA samples spatially sparse but highly correlated key elements from similar and well-defined scene blocks in the corresponding video frames, thus enlarging the spatio-temporal perceptual domain of the self-attention mechanism in the transformer. In addition, we propose a feature aggregation module (FA) to aggregate the favorable spatial information of the features of adjacent frames at different scales, thereby improving the spatial details of the reconstructed video frames and reducing the loss of spatial information caused by downsampling. Comprehensive experiments show that our SGTVSR outperforms SOTA methods and produces good visual effects in video super-resolution.


References 1. Chan, K.C., Wang, X., Yu, K., Dong, C., Loy, C.C.: BasicVSR: the search for essential components in video super-resolution and beyond. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4947– 4956 (2021) 2. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 3. Fuoli, D., Gu, S., Timofte, R.: Efficient video super-resolution through recurrent latent space propagation. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3476–3485. IEEE (2019) 4. Haris, M., Shakhnarovich, G., Ukita, N.: Recurrent back-projection network for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3897–3906 (2019) 5. Jo, Y., Oh, S.W., Kang, J., Kim, S.J.: Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3224– 3232 (2018) 6. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) 7. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep laplacian pyramid networks for fast and accurate super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632 (2017) 8. Li, W., Tao, X., Guo, T., Qi, L., Lu, J., Jia, J.: MuCAN: multi-correspondence aggregation network for video super-resolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 335–351. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2 20 9. Liu, C., Sun, D.: On Bayesian adaptive video super resolution. IEEE Trans. Pattern Anal. Mach. Intell. 36(2), 346–360 (2013) 10. Liu, C., Yang, H., Fu, J., Qian, X.: Learning trajectory-aware transformer for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5687–5696 (2022) 11. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) 12. Nah, S., et al.: NTIRE 2019 challenge on video deblurring and super-resolution: dataset and study. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019) 13. Tian, Y., Zhang, Y., Fu, Y., Xu, C.: TDAN: temporally-deformable alignment network for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3360–3369 (2020) 14. Vemulapalli, R., Brown, M., Sajjadi, S.M.M.: Frame-recurrent video superresolution, US Patent 10,783,611 (2020) 15. Wang, X., Chan, K.C., Yu, K., Dong, C., Change Loy, C.: EDVR: video restoration with enhanced deformable convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019) 16. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)


17. Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. Int. J. Comput. Vision 127, 1106–1125 (2019) 18. Yi, P., Wang, Z., Jiang, K., Jiang, J., Ma, J.: Progressive fusion video superresolution network via exploiting non-local spatio-temporal correlations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3106–3115 (2019)

SAMP: Sub-task Aware Model Pruning with Layer-Wise Channel Balancing for Person Search

Zimeng Wu1,2, Jiaxin Chen1,2(B), and Yunhong Wang1,2

1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
2 School of Computer Science and Engineering, Beihang University, Beijing, China
{zimengwu,jiaxinchen,yhwang}@buaa.edu.cn

Abstract. The deep convolutional neural network (CNN) has recently become the prevailing framework for person search. Nevertheless, these approaches suffer from high computational cost, raising the necessity of compressing deep models for applicability on resource-restrained platforms. Despite the promising performance achieved in boosting efficiency for general vision tasks, current model compression methods are not specifically designed for person search, thus leaving much room for improvement. In this paper, we make the first attempt at investigating model pruning for person search, and propose a novel loss-based channel pruning approach, namely Sub-task Aware Model Pruning with Layer-wise Channel Balancing (SAMP). It firstly develops a Sub-task aware Channel Importance (SaCI) estimation to deal with the inconsistent sub-tasks, i.e. person detection and re-identification, of person search. Subsequently, a Layer-wise Channel Balancing (LCB) mechanism is employed to progressively assign a minimal number of channels to be preserved for each layer, thus avoiding over-pruning. Finally, an Adaptive OIM (AdaOIM) loss is presented for pruning and post-training via dynamically refining the degraded class-wise prototype features by leveraging the ones from the full model. Experiments on CUHK-SYSU and PRW demonstrate the effectiveness of our method, by comparing with the state-of-the-art channel pruning approaches.

Keywords: Person search · Model compression · Channel pruning · Channel importance estimation

1 Introduction

Given a query person, person search aims to locate and identify the specific target from a gallery of uncropped scene images [16,26], which has a wide range of applications in practice. In the past decade, deep neural networks have become the most prevailing framework for person search and have substantially promoted the accuracy [4,6,7]. Nevertheless, the high computational cost of deep models significantly degrades the inference efficiency, severely impeding their application in various real scenarios, especially when they are deployed on resource-restrained platforms such as unmanned vehicles and mobile devices.

A few efforts have been devoted to improving the inference efficiency of deep models for person search, most of which concentrate on employing light-weight backbone networks such as MobileNet [13]. However, they heavily rely on hand-crafted network architectures, which are not flexible, especially when encountering distinct types of hardware. Recently, structured model pruning [5,19] has emerged as a promising way to accelerate the inference of deep neural networks by estimating the channel importance based on the learning loss and trimming the less important channels. As the scale or the floating point operations (FLOPs) of the pruned model can be controlled by a predefined pruning ratio, it is flexible and friendly for deployment, and has therefore been extensively studied in the computer vision community [5,18].

Nevertheless, current structured pruning approaches are mostly designed for general fundamental computer vision tasks such as classification and detection. There are several limitations when straightforwardly applying them to the person search task without specific refinement. Firstly, representative deep models for person search consist of multiple sub-task heads, i.e. pedestrian detection and person re-identification, with a shared backbone (e.g. ResNet-50). As detection focuses on exploring person commonness while re-identification relies on person uniqueness, the two sub-tasks are intrinsically inconsistent, making the importance of a particular channel divergent for different sub-tasks. Existing works coarsely estimate the importance based on the overall loss and ignore the distinctness between sub-tasks, thus incurring estimation bias. Secondly, current pruning methods are empirically prone to overly prune a few layers while leaving redundancy in the rest, seriously deteriorating the representation capacity. Thirdly, the online instance matching (OIM) loss is widely used in person search; it dynamically extracts a class-wise prototype feature for each person identity to construct negative/positive pairs for representation learning. However, when used in model pruning, the discriminativeness of the prototype features degrades as the model is progressively trimmed, leading to a drop in performance. The above limitations of existing approaches leave much room for improvement.

To address these issues, we propose a novel channel pruning approach, namely Sub-task Aware Model Pruning with Layer-wise Channel Balancing (SAMP). Specifically, a Sub-task aware Channel Importance (SaCI) estimation method is firstly developed to deal with the inconsistent sub-tasks, via separately computing the channel importance for detection and re-identification followed by a hybrid integration. The Layer-wise Channel Balancing (LCB) mechanism is successively employed to progressively assign a minimal number of channels to be preserved for each layer, thus avoiding over-pruning. Finally, an Adaptive OIM (AdaOIM) loss is presented for pruning and fine-tuning (post-training) via dynamically refining the degraded class-wise prototype features by leveraging the ones from the full model. The main contribution of this paper is summarized in three-fold:


1) To the best of our knowledge, we make the first attempt at investigating model pruning for person search, and specifically design a novel channel pruning approach for person search, dubbed Sub-task Aware Model Pruning with Layer-wise Channel Balancing (SAMP).
2) We develop the Sub-task aware Channel Importance (SaCI), the Layer-wise Channel Balancing (LCB) mechanism, and an Adaptive OIM (AdaOIM) loss to deal with the sub-task inconsistency, over-pruning, and prototype feature degradation of the OIM loss, respectively.
3) We extensively evaluate the performance of our method on two widely used benchmarks for person search, and achieve state-of-the-art performance.

2 Related Work

End-to-End Person Search. Person search encompasses two sub-tasks: person detection and re-identification (ReID). Xiao et al. [24] integrate these sub-tasks into a unified network and employ the OIM loss. Li et al. [15] propose SeqNet, which refines the bounding boxes for pedestrian detection with a Faster-RCNN detection head before ReID. Yan et al. [27] tackle person search in a one-step, two-stage framework, and improve efficiency by removing the RPN and the anchor-box generation process. Li et al. [14] investigate unsupervised domain adaptation for person search. Jaffe et al. [12] propose to filter areas where pedestrians are unlikely to appear. Despite the improved accuracy, most of these approaches depend on a complex backbone with high computational cost, thus lacking efficiency for real applications.

Model Compression. Model compression aims to decrease the computation cost and accelerate the inference of a given deep model while maintaining accuracy [3]. Current approaches can be roughly divided into efficient network design [10], model pruning [18], quantization [8], knowledge distillation [20], and low-rank decomposition [30]. Due to its promising performance and flexibility, we concentrate on model pruning to improve the efficiency of person search in this paper. Basically, model pruning involves three stages: pre-training, pruning, and finetuning [1]. Many efforts have been devoted to pruning for general classification and object detection tasks, proposing distinct pruning principles at different pruning granularities. With regard to pruning granularity, current approaches can be categorized as unstructured pruning [9,11] and structured pruning [5,19], depending on whether the set of weights to be pruned conforms to a regular structure. Structured pruning, especially channel pruning, is extensively studied as it is hardware-friendly for practical deployment [5,18]. As for pruning principles, prevailing techniques are primarily based on importance criteria or parameter optimization. Approaches based on importance criteria estimate the contribution of prunable units to the performance, and trim those with relatively low importance. These criteria span from


Fig. 1. Illustration on the framework of the proposed SAMP method.

metrics like absolute value [5] and loss function [18] to customized measurements such as discrimination [17] and layer correlation [28]. Methods based on parameter optimization mainly leverage the network optimization process to autonomously learn unit importance. GReg [22] facilitates weight divergence via evolving regularization coefficients. Slim [29] employs switchable BN layers to acquire varied parameter quantities during training. DNCP [33] presents a learning method to predict channel retention probabilities. However, most existing approaches are designed for general purposes and neglect to develop effective model compression methods specifically for person search. Recently, a few works have made some attempts. Li et al. [13] adopt MobileNet as the backbone for person detection to accelerate inference. Xie et al. [25] present a model compression pipeline for person ReID by combining pruning and knowledge-distillation-based fine-tuning. Nevertheless, they either directly apply off-the-shelf techniques or focus on only one of the sub-tasks (e.g., detection or ReID), which is not fully optimized for the complete person search task, thus leaving much room for improvement.

3 The Proposed Method

3.1 Framework Overview

Given a pre-trained full model, as Fig. 1 shows, our approach follows the same framework as depicted in [18], which consists of three successive steps: coupled channels grouping, pruning channels based on channel importance estimation, and fine-tuning the pruned model.

Coupled Channels Grouping. Typical convolutional neural networks for person search contain coupled channels, especially in the residual blocks, the feature


pyramid networks, and the ROI alignment block. To deal with the coupled structures, we adopt the layer grouping method [18], dividing channels into different groups, where channels within the same group share the pruning mask.

Pruning Based on Channel Importance Estimation. We employ the Fisher information [18] to estimate the importance s_i of the i-th channel by applying a Taylor expansion to the change of the loss L:

s_i = L(m - e_i) - L(m) \approx -e_i^T \nabla_m L + \frac{1}{2} e_i^T (\nabla_m^2 L) e_i = -g_i + \frac{1}{2} H_{ii}. \quad (1)

Here, m is the channel mask vector and e_i is the one-hot vector whose i-th element is 1. g is the gradient of L w.r.t. m and H is the Hessian matrix. Channel pruning is conducted by an element-wise product between the feature map A and m, i.e., \tilde{A} = A \odot m. Assuming that the model is convergent, i.e., g = \nabla_m L \approx 0, given a batch of size N, we can deduce the following formula:

g_i = \frac{\partial L}{\partial m_i} = \frac{1}{N} \sum_{n=1}^{N} \frac{\partial L_n}{\partial m_i} \approx 0, \quad (2)

where L_n is the loss w.r.t. the n-th sample. According to the Fisher information [18], the second-order derivative can be approximately formulated as the square of the first-order derivative:

H_{ii} = \frac{\partial^2 L}{\partial m_i^2} \approx -\frac{\partial^2}{\partial m_i^2} E[\log p(y|x)] = E\left[\left(-\frac{\partial \log p(y|x)}{\partial m_i}\right)^2\right] \approx \frac{1}{N} \sum_{n=1}^{N} \left(\frac{\partial L_n}{\partial m_i}\right)^2. \quad (3)

By extending m \in R^c to \tilde{m} \in R^{c \times h \times w}, where c, h, w represent the channel size, height and width of the feature map respectively, the extended channel pruning of the n-th sample can be written as \tilde{A}_n = A_n \odot \tilde{m}. The corresponding gradient becomes \nabla_{\tilde{m}} L_n = A_n \odot \nabla_{\tilde{A}_n} L_n. Based on Eq. (1)–Eq. (3), and applying the grouping operation of coupled channels, s_i is estimated as

s_i \propto \sum_{n=1}^{N} \left( \sum_{x \in \mathcal{X}} \frac{\partial L_n}{\partial m_x} \right)^2, \quad (4)

where \mathcal{X} is the index set of the group that the i-th channel belongs to. Besides the importance, we also consider the computation cost by memory-based normalization as depicted in [18], i.e., s_i := s_i / \Delta M, where \Delta M = n \times h \times w approximates the related FLOPs. The channel with the least s_i is trimmed.

Finetuning. After pruning, the trimmed model is finetuned on the training data, generating the final compressed model for practical deployment.
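To make the accumulation above concrete, the following is a minimal sketch of how the grouped, squared-gradient importance of Eq. (4) and its FLOPs normalization could be computed with automatic differentiation. This is not the authors' implementation: the gate tensors `mask_params`, the group index map `groups` and `flops_per_group` are hypothetical names, and the batch loss is treated as a single L_n for brevity.

import torch

def accumulate_channel_importance(model, data_loader, loss_fn, mask_params, groups, flops_per_group):
    # Accumulate s_i proportional to sum_n (sum_{x in group i} dL_n/dm_x)^2 as in Eq. (4),
    # then apply the memory-based normalization s_i := s_i / delta_M.
    # `mask_params` are all-ones gate tensors multiplied onto the feature maps;
    # `groups[i]` indexes the coupled channels of group i in the concatenated gate vector.
    importance = torch.zeros(len(groups))
    for images, targets in data_loader:
        loss = loss_fn(model(images), targets)              # treated as one L_n per batch
        grads = torch.autograd.grad(loss, mask_params)      # dL_n / dm for every gate
        flat = torch.cat([g.flatten() for g in grads])
        for i, idx in enumerate(groups):                    # sum over coupled channels, then square
            importance[i] += flat[idx].sum() ** 2
    return importance / flops_per_group                     # s_i := s_i / delta_M

In practice the accumulated scores would be refreshed periodically during pruning, and the group with the smallest normalized score is the one to trim next.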

3.2 Sub-task Aware Channel Importance Estimation

As depicted in Sect. 3.1, the channel importance si is determined by the loss function L, which is usually the sum of the pedestrian detection loss Ldet and


ReID loss L_{reid} in person search. However, they usually have inconsistent targets, as the former focuses on person commonness while the latter relies on person uniqueness, raising challenges for importance estimation. For instance, gradients from the two sub-tasks may be significantly high but numerically opposite. In this case, the channel importance is high for either of the two sub-tasks, but becomes low for the overall task. To address this issue, we propose a novel sub-task aware channel importance estimation (SaCI) method, which ensembles the relative significance of the sub-tasks. Concretely, we first compute the channel importance s_{det} and s_{reid} for the detection and ReID sub-tasks by performing Eq. (4) on L_{det} and L_{reid}, respectively. SaCI then calculates the overall importance as

s_{new} = (1 + \beta^2) \cdot \frac{s_{det} \cdot s_{reid}}{\beta^2 \cdot s_{det} + s_{reid}}, \quad (5)

where \beta is the balancing weight. Some layers (i.e., the sub-task heads) are used only by one sub-task, making their importance to the other sub-task 0, which renders Eq. (5) ineffective. In this case, SaCI utilizes a rectification mechanism. Without loss of generality, assuming that channel C is only used for ReID, i.e., s_{det}(C) = 0, SaCI modifies it as s_{det}(C) := \max_j \{ s_{det}(j) / F_j \} \times F_C, where F_j = h_k \times w_k \times h_o \times w_o \times c_o is the FLOPs of the j-th channel. Such rectification avoids zeros and is not sensitive to the feature map size, as the maximal sub-task channel importance over all active channels is normalized by the FLOPs of the channel. Moreover, the magnitudes of channel importance for distinct sub-tasks are often on different scales. Therefore, we additionally adopt a scale normalization strategy. Observing the long-tail distribution of channel importance, we select the top \alpha\% largest channel importance values for normalization. Specifically, the channel importance for the ReID sub-task is normalized as s_{reid} := s_{reid} \cdot \frac{s_{det, top-\alpha\%}}{s_{reid, top-\alpha\%}}, and the final channel importance becomes s_{new} = (1 + \beta^2) \cdot \frac{s_{det} \cdot s_{reid}}{\beta^2 \cdot s_{det} + s_{reid}}.
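As an illustration of how Eq. (5), the rectification and the top-α% scale normalization fit together, here is a small sketch. It assumes the top-α% statistic is the mean of the largest α fraction of scores and that importances and per-channel FLOPs are given as 1-D tensors; the function names and the default α are illustrative, not the paper's code.

import torch

def saci_importance(s_det, s_reid, flops, alpha=0.01, beta=0.5):
    # Rectification: a channel unused by one sub-task (importance 0) receives
    # max_j{ s(j) / F_j } * F_C instead of 0, so Eq. (5) stays well defined.
    def rectify(s):
        active = s > 0
        fill = (s[active] / flops[active]).max() * flops
        return torch.where(active, s, fill)
    s_det, s_reid = rectify(s_det), rectify(s_reid)

    # Scale normalization with the top-alpha fraction of each sub-task's scores.
    k = max(1, int(alpha * s_det.numel()))
    top_mean = lambda s: torch.topk(s, k).values.mean()
    s_reid = s_reid * (top_mean(s_det) / top_mean(s_reid))

    # Eq. (5): F-beta style fusion of the two sub-task importances.
    return (1 + beta ** 2) * s_det * s_reid / (beta ** 2 * s_det + s_reid)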

3.3 Layer-Wise Channel Balancing

To address the over-pruning problem, layer-wise channel balancing (LCB) stops pruning layers that fall below given thresholds, where a threshold is defined as the ratio of the minimal number of untrimmed channels to the number of channels of that layer in the full model before pruning. Specifically, we adopt different thresholds for the backbone network and the non-backbone layers, denoted by T_B and T_{NB}, respectively. Since the backbone network is usually more important, we initialize T_B to a large value, and consistently decrease T_{NB} according to the size S of the pruned model. Concretely, the thresholds change per iteration step as below, which is also shown in Fig. 2(a):

\begin{cases} T_B := T_B, & \text{if some non-backbone layers are above } T_{NB},\\ T_B := T_B - 0.05, & \text{if no non-backbone layers are above } T_{NB}, \\ T_{NB} := \sqrt{S}/2. \end{cases} \quad (6)


Fig. 2. (a) The change of thresholds used in LCB w.r.t. the pruning ratio. (b) Illustration on the proposed AdaOIM loss.

As shown in Fig. 2(a), the thresholds are progressively decreased without sharp drops, thus avoiding pruning layers down to extremely few channels; a minimal sketch of this schedule is given below.
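The following is a small sketch of the threshold update of Eq. (6), assuming `size_ratio` is the current FLOPs (size) ratio S of the pruned model and `nonbackbone_kept_ratios` holds the kept-channel ratios of the non-backbone layers; the names are hypothetical.

def update_lcb_thresholds(t_b, t_nb, size_ratio, nonbackbone_kept_ratios):
    # Eq. (6): T_B stays fixed while some non-backbone layer is still above T_NB,
    # otherwise it is lowered by 0.05; T_NB follows the pruned-model size S.
    if not any(r > t_nb for r in nonbackbone_kept_ratios):
        t_b -= 0.05
    t_nb = size_ratio ** 0.5 / 2.0
    return t_b, t_nb

A layer whose kept-channel ratio has reached its threshold would simply be excluded from further pruning.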

3.4 Adaptive OIM Loss for Model Pruning and Finetuning

The OIM loss [24] is a widely used component of the ReID loss, which builds a prototype feature for each identity and formulates the probability of an input feature x belonging to the i-th identity as

p_i = \frac{\exp(v_i^T \cdot x / \tau)}{\sum_{j \in LUT} \exp(v_j^T \cdot x / \tau) + \sum_{k \in CQ} \exp(u_k^T \cdot x / \tau)}, \quad (7)

where v_i is the prototype feature for the i-th identity, and \tau is a temperature hyperparameter. Usually, {v_i} guides the learning of discriminative features and is updated online based on x during pruning; it becomes degraded, however, since the representation capability of the pruned model decreases as more channels are trimmed. To deal with this issue, we propose an adaptive OIM (AdaOIM) loss. As illustrated in Fig. 2(b), AdaOIM leverages the prototypes generated by the pre-trained model {v_{pre,i}} to update {v_i}:

v_i = \gamma \cdot v_{pre,i} + (1 - \gamma) \cdot v_{prune,i}, \quad (8)

where v_{prune,i} is the prototype obtained from the pruned model, and \gamma is a blending weight. According to the principle of loss-based methods, the estimation of channel importance depends on the convergence assumption. If the pre-trained prototypes were employed in the very early stage, the pruned model would quickly be pushed away from the convergence assumption. To avoid this, in the pruning stage we only use {v_{prune,i}} to update {v_i} in the early stage, and progressively introduce {v_{pre,i}}; specifically, we initialize \gamma as 0 and gradually increase it to 1. In the finetuning stage, the situation is the opposite: we completely use the pre-trained prototypes {v_{pre,i}} in the early stage of finetuning, and release this constraint to speed up the convergence of training in the late stage. Concretely, we initialize \gamma in Eq. (8) as 1, and set it to 0 in the late stage.
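The blending of Eq. (8) can be sketched as follows on top of a standard OIM lookup table; the momentum constant and helper names are assumptions rather than the authors' implementation.

import torch
import torch.nn.functional as F

def adaoim_prototypes(lut_prune, lut_pre, feats, labels, gamma, momentum=0.5):
    # Online OIM-style update of the pruned-model prototypes (assumed momentum rule),
    # followed by the AdaOIM blend with the frozen prototypes of the full model.
    with torch.no_grad():
        for f, y in zip(feats, labels):
            lut_prune[y] = F.normalize(momentum * lut_prune[y] + (1 - momentum) * f, dim=0)
        # Eq. (8): v_i = gamma * v_pre,i + (1 - gamma) * v_prune,i
        blended = gamma * lut_pre + (1 - gamma) * lut_prune
    return F.normalize(blended, dim=1)

# During pruning, gamma is ramped from 0 toward 1 as FLOPs are removed; during
# finetuning it starts at 1 and is set to 0 in the late stage (Sect. 3.4).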

4 Experimental Results and Analysis

4.1 Dataset and Evaluation Metric

We conduct experiments on two widely used benchmarks for person search. CUHK-SYSU [23] contains 18,184 images from 8,432 distinct person identities. We adopt the standard split with 11,206 images of 5,532 identities in the training set, and 2,900 query persons as well as 6,978 gallery images in the test set. PRW [32] includes 11,816 images with 43,110 annotated bounding boxes from 932 person identities. 5,704 images with 482 identities are used for training, and 2,057 query persons as well as 6,112 gallery images are utilized for testing.

Evaluation Metric. We evaluate the performance by comparing the widely used mean Average Precision (mAP) [31] and the top-1 (Top-1) and top-5 (Top-5) accuracies [21] of deep person search models compressed to the same FLOPs budgets (e.g., 25% and 10% of the FLOPs of the full model).

4.2 Implementation Details

We select SeqNet [15] and AlignPS [27] with a ResNet-50 backbone as the representatives of the two-stage and one-stage person search frameworks, respectively. Regarding the hyperparameters, α and β in Eq. (5) are set to fixed values of 1 and 0.5 with SeqNet, respectively. T_B in Eq. (6) is initialized as 0.9 and 0.8 for SeqNet and AlignPS, respectively. γ in Eq. (8) is initialized as 0, and is dynamically increased by 0.1 (from 0.2) for every 5% decrease of FLOPs once the FLOPs ratio is lower than 80%. In the pre-training stage, we follow the original training settings. Specifically, for SeqNet, the batch size is set to 4; the learning rate is initialized as 0.0024, linearly warmed up at epoch 1, and decreased by 10% at epoch 16 within 20 epochs in total. For AlignPS, the batch size is set to 2; the learning rate is initialized as 0.0005, linearly warmed up at epoch 1, and decreased by 10% at epochs 16 and 22 within 24 epochs in total. For both models, we use stochastic gradient descent with a momentum of 0.9 and weight decay as the optimizer. In the pruning stage, the original training settings are mostly reused, except that the batch size is set to 1; moreover, the initial learning rate is 0.0001, the warm-up strategy is cancelled, and pruning is performed every 10 training steps. In the finetuning stage, the initial learning rate is set to 10% of the initial learning rate of the full model, and the rest of the training settings follow the original ones. All experiments are implemented based on MMDetection [2] on an NVIDIA GeForce RTX 3090 Ti GPU.

4.3 Comparison with the State-of-the-Art Approaches

We compare our method with the representative state-of-the-art channel pruning approaches, including Slim [29] based on parameter optimization, GroupFisher [18] based on loss criterion, and DepGraph [5] based on the absolute value criterion and parameter optimization.


Table 1. Comparison of mAP (%), Top-1/5 accuracies (%) and FLOPs by using various model pruning approaches based on SeqNet. '-' indicates that DepGraph doesn't converge in pruning, thus failing to generate the final results. Best in bold.

Method           | CUHK-SYSU mAP | Top-1 | Top-5 | PRW mAP | Top-1 | Top-5 | FLOPs
Full model       | 93.39         | 94.07 | 90.00 | 46.10   | 82.45 | 91.30 | 100%
Slim [29]        | 82.00         | 83.14 | 92.21 | 32.94   | 75.26 | 87.26 | 25%
GroupFisher [18] | 91.55         | 92.52 | 97.21 | 39.98   | 78.95 | 89.26 | 25%
DepGraph [5]     | 43.58         | 46.48 | 60.32 | 7.17    | 35.66 | 51.92 | 25%
SAMP (Ours)      | 91.90         | 92.66 | 97.45 | 40.48   | 79.10 | 89.35 | 25%
Slim [29]        | 74.61         | 76.14 | 87.83 | 27.37   | 72.29 | 84.25 | 10%
GroupFisher [18] | 90.05         | 90.69 | 96.66 | 35.87   | 77.25 | 88.14 | 10%
DepGraph [5]     | 6.99          | 7.06  | 12.71 | -       | -     | -     | 10%
SAMP (Ours)      | 90.84         | 91.24 | 96.90 | 37.43   | 77.35 | 88.72 | 10%

Table 2. Comparison of mAP (%), Top-1/5 accuracies (%) and FLOPs by using various model pruning approaches based on AlignPS. Best in bold.

Method           | CUHK-SYSU mAP | Top-1 | Top-5 | PRW mAP | Top-1 | Top-5 | FLOPs
Full model       | 92.56         | 93.07 | 97.76 | 45.03   | 81.28 | 90.86 | 100%
Slim [29]        | 58.03         | 58.17 | 77.31 | 5.47    | 39.82 | 55.23 | 25%
GroupFisher [18] | 92.23         | 92.72 | 97.28 | 41.68   | 81.14 | 90.57 | 25%
SAMP (Ours)      | 92.36         | 92.45 | 97.55 | 42.43   | 81.14 | 90.28 | 25%
Slim [29]        | 50.94         | 50.86 | 71.07 | 3.64    | 32.62 | 46.86 | 10%
GroupFisher [18] | 89.12         | 88.90 | 96.45 | 36.87   | 78.17 | 88.77 | 10%
SAMP (Ours)      | 91.18         | 91.52 | 97.28 | 37.60   | 78.61 | 88.92 | 10%

Since our work is the first to investigate model pruning for efficient person search, we re-implement the compared methods and evaluate their performance. We mainly evaluate on the two-stage person search framework, i.e., SeqNet. As displayed in Table 1, GroupFisher reaches the highest accuracy among the compared methods at the same FLOPs. Our method is established upon the framework of GroupFisher and further consistently improves its performance. Particularly, when trimming the model to extremely low FLOPs (i.e., 10%), the proposed SAMP method obtains 0.79% and 1.56% improvements in mAP on the CUHK-SYSU and PRW datasets, respectively, by dealing with the inconsistency of sub-tasks in estimating channel importance via SaCI, and mitigating over-pruning via the layer-wise channel balancing mechanism. It is worth noting that both Slim and DepGraph fail to generalize well to person search. As for Slim, it is prone to generate redundant features when updating


the memory bank of the OIM loss, thus significantly degrading the overall performance. As for DepGraph, it imposes strong sparsity restrictions on all convolution layers, incurring severe over-pruning within the backbone, and thus resulting in a sharp drop in accuracy, or even misconvergence, when adopting a low pruning ratio (e.g., 10%).

Table 3. Ablation study of mAP, Top-1/5 accuracies and FLOPs on the main components of the proposed method based on SeqNet on the CUHK-SYSU dataset.

Method      | mAP   | Top-1 | Top-5 | FLOPs
Baseline    | 91.55 | 92.52 | 97.21 | 25%
+SaCI       | 91.66 | 92.03 | 97.45 | 25%
+LCB        | 91.76 | 92.55 | 97.31 | 25%
+AdaOIM     | 91.83 | 92.52 | 97.48 | 25%
Full method | 91.90 | 92.66 | 97.45 | 25%
Baseline    | 90.05 | 90.69 | 96.66 | 10%
+SaCI       | 90.28 | 90.69 | 96.72 | 10%
+LCB        | 90.80 | 91.52 | 96.83 | 10%
+AdaOIM     | 90.80 | 91.59 | 96.59 | 10%
Full method | 90.84 | 91.24 | 96.90 | 10%

On Generalizability to Distinct Frameworks for Person Search. Besides the two-stage framework for person search such as SeqNet, we further evaluate the scalability of the proposed SAMP method on AlignPS, the representative of the alternative one-stage framework. Compared to the two-stage framework, AlignPS utilizes a shared backbone and a unified task head for detection and re-identification, making it more challenging to determine which channels are critical for the two sub-tasks in order to maintain the overall accuracy. As a consequence, as shown in Table 2, the performance of Slim deteriorates remarkably, and DepGraph fails to converge and is therefore not reported. In contrast, our method performs steadily, achieving 2.06% and 0.73% improvements in mAP with 10% FLOPs on the CUHK-SYSU and PRW datasets, respectively, compared to the other model pruning approaches.

4.4 Ablation Study

We further evaluate the effectiveness of the main components of our method, including the SaCI estimation, the LCB mechanism, and the AdaOIM loss. As our method is based on GroupFisher, it is selected as the 'Baseline' for comparison. As summarized in Table 3, all proposed components boost the overall performance at both 25% and 10% FLOPs, and their combination generally reaches the best performance in most cases. It is worth noting that despite only being


evaluated on person search, the proposed SaCI is generally applicable to multi-modal and multi-task problems. LCB can also be extended to a broad range of tasks, as the over-pruning issue prevails in classification and detection.

5 Conclusion

In this paper, we investigate model pruning for efficient person search, and design a novel approach dubbed as Sub-task Aware Model Pruning with layer-wise channel balancing (SAMP). A sub-task aware channel importance estimation is devised to address the inconsistency between the detection and re-identification sub-tasks. A layer-wise channel balancing mechanism is developed to mitigate over-pruning. Finally, an adaptive OIM loss is presented to address the prototype degradation issue. Experiments on two widely used benchmarks clearly validate the effectiveness of the proposed approach, reaching 0.79% and 1.56% improvements in mAP when using SeqNet on CUHK-SYSU and PRW, respectively. Acknowledgements. This work is supported by the National Key R&D Program of China (2021ZD0110503), the National Natural Science Foundation of China (62202034), the Research Program of State Key Laboratory of Virtual Reality Technology and Systems, and the grant No. KZ46009501.

References 1. Chang, X., Li, Y., Oymak, S., et al.: Provable benefits of overparameterization in model compression: from double descent to pruning neural networks. In: AAAI Conference on Artificial Intelligence, pp. 6974–6983 (2021) 2. Chen, K., Wang, J., Pang, J., et al.: MMDetection: open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019) 3. Deng, L., Li, G., Han, S., et al.: Model compression and hardware acceleration for neural networks: a comprehensive survey. Proc. IEEE 108(4), 485–532 (2020) 4. Doering, A., Chen, D., Zhang, S., et al.: PoseTrack21: a dataset for person search, multi-object tracking and multi-person pose tracking. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20963–20972 (2022) 5. Fang, G., Ma, X., Song, M., et al.: Depgraph: towards any structural pruning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16091– 16101 (2023) 6. Feng, D., Yang, J., Wei, Y., et al.: An efficient person search method using spatiotemporal features for surveillance videos. Appl. Sci. 12(15), 7670 (2022) 7. Gao, C., Cai, G., Jiang, X., et al.: Conditional feature learning based transformer for text-based person search. IEEE Trans. Image Process. 31, 6097–6108 (2022) 8. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural network with pruning, trained quantization and huffman coding. In: International Conference on Learning Representations (2016) 9. Han, S., Pool, J., Tran, J., et al.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135– 1143 (2015)


10. Howard, A.G., Zhu, M., Chen, B., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 11. Hu, H., Peng, R., Tai, Y.W., et al.: Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250 (2016) 12. Jaffe, L., Zakhor, A.: Gallery filter network for person search. In: IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1684–1693 (2023) 13. Li, J., Liang, F., Li, Y., et al.: Fast person search pipeline. In: IEEE International Conference on Multimedia and Expo, pp. 1114–1119 (2019) 14. Li, J., Yan, Y., Wang, G., et al.: Domain adaptive person search. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13674, pp. 302–318. Springer, Cham (2022). https://doi.org/10.1007/978-3031-19781-9 18 15. Li, Z., Miao, D.: Sequential end-to-end network for efficient person search. In: AAAI Conference on Artificial Intelligence, pp. 2011–2019 (2021) 16. Lin, X., Ren, P., Xiao, Y., et al.: Person search challenges and solutions: a survey. In: International Joint Conference on Artificial Intelligence, pp. 4500–4507 (2021) 17. Liu, J., Zhuang, B., Zhuang, Z., et al.: Discrimination-aware network pruning for deep model compression. IEEE Trans. Pattern Anal. Mach. Intell. 44(8), 4035– 4051 (2021) 18. Liu, L., Zhang, S., Kuang, Z., et al.: Group fisher pruning for practical network compression. In: International Conference on Machine Learning, pp. 7021–7032 (2021) 19. Meng, F., Cheng, H., Li, K., et al.: Pruning filter in filter. In: Advances in Neural Information Processing Systems, pp. 17629–17640 (2020) 20. Mirzadeh, S.I., Farajtabar, M., Li, A., et al.: Improved knowledge distillation via teacher assistant. In: AAAI Conference on Artificial Intelligence, pp. 5191–5198 (2020) 21. Russakovsky, O., Deng, J., Su, H., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115, 211–252 (2015) 22. Wang, H., Qin, C., Zhang, Y., et al.: Neural pruning via growing regularization. In: International Conference on Learning Representations (2021) 23. Xiao, T., Li, S., Wang, B., et al.: End-to-end deep learning for person search. arXiv preprint arXiv:1604.01850 (2016) 24. Xiao, T., Li, S., Wang, B., et al.: Joint detection and identification feature learning for person search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3415–3424 (2017) 25. Xie, H., Jiang, W., Luo, H., et al.: Model compression via pruning and knowledge distillation for person re-identification. J. Ambient. Intell. Humaniz. Comput. 12, 2149–2161 (2021) 26. Xu, Y., Ma, B., Huang, R., et al.: Person search in a scene by jointly modeling people commonness and person uniqueness. In: ACM International Conference on Multimedia, pp. 937–940 (2014) 27. Yan, Y., Li, J., Qin, J., et al.: Anchor-free person search. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7690–7699 (2021) 28. Yeom, S.K., Seegerer, P., Lapuschkin, S., et al.: Pruning by explaining: a novel criterion for deep neural network pruning. Pattern Recogn. 115, 107899 (2021) 29. Yu, J., Yang, L., Xu, N., et al.: Slimmable neural networks. In: International Conference on Learning Representations (2018)


30. Yu, X., Liu, T., Wang, X., et al.: On compressing deep models by low rank and sparse decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7370–7379 (2017) 31. Zheng, L., Shen, L., Tian, L., et al.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision, pp. 1116–1124 (2015) 32. Zheng, L., Zhang, H., Sun, S., et al.: Person re-identification in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1367–1376 (2017) 33. Zheng, Y.J., Chen, S.B., Ding, C.H., et al.: Model compression based on differentiable network channel pruning. IEEE Trans. Neural Netw. Learn. Syst. (2022)

MKB: Multi-Kernel Bures Metric for Nighttime Aerial Tracking

Yingjie He, Peipei Kang, Qintai Hu(B), and Xiaozhao Fang

Guangdong University of Technology, Guangzhou 510006, China
[email protected]

Abstract. In recent years, many advanced visual object tracking algorithms have achieved significant performance improvements in daytime scenes. However, when these algorithms are applied at night on unmanned aerial vehicles, they often exhibit poor tracking accuracy due to the domain shift. Therefore, designing an effective and stable domain adaptation tracking framework is in great demand. Besides, owing to the complexity of tracking scenes and the large number of samples, it is difficult to accurately characterize the feature differences; moreover, the method should avoid excessive manual parameters. To solve the above three challenges, this study proposes a metric-based domain alignment framework for nighttime object tracking of unmanned aerial vehicles. It achieves transfer learning from daytime to nighttime by introducing a multi-kernel Bures metric (MKB). MKB quantifies the distribution distance by calculating the difference between the two domain covariance operators in the latent space. MKB also uses a linear combination of a series of kernel functions to accommodate complex samples of different scenes and categories, enhancing the ability of the metric to represent various sequences in the latent space. Besides, we introduce an alternating update mechanism to optimize the weights of the kernels, avoiding manual parameters. Experimental results demonstrate that the proposed method has significant advantages in improving tracking accuracy and robustness at nighttime.

Keywords: Domain Adaptation · Cross-domain Metric · Kernel Bures · Visual Tracking · Aerial Tracking

This work was supported in part by the National Natural Science Foundation of China under Grant No. 62237001, 62202107, and 62176065, in part by the Guangdong Provincial National Science Foundation under Grant No. 2021A1515012017, in part by the Science and Technology Planning Project of Guangdong Province, China, under Grant No. 2019B110210002, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2021B1515120010, and in part by the Huangpu International Sci&Tech Cooperation Foundation of Guangzhou under Grant No. 2021GH12.

1 Introduction

Recently, the applications of unmanned aerial vehicles (UAV) have become increasingly widespread in many fields [1,15], such as autonomous flight, search


and rescue, automatic tracking, intelligent monitoring, and security surveillance. With the support of large-scale datasets, increasingly sophisticated AI models and frameworks make UAVs more intelligent, efficient, and accurate. Particularly for the object tracking task, researchers have constructed many inspiring models [18]. Existing airborne object tracking methods are designed with novel and ingenious structures [2,5], but the datasets used by these models are mostly collected under good lighting conditions. When these models are used in nighttime scenes, the existing tracking methods perform poorly because of the domain shift between the training set and the task, which seriously hinders the application of these algorithms [19]. Generally speaking, when these tracking models are applied to different domains, problems such as distribution shift or domain differences may arise, so we need to reduce the differences between the source domain and the target domain, enabling the model to better adapt to the target domain.

To solve the domain shift problem, a large-scale labeled nighttime dataset could be used to retrain the backbone. However, creating such a large labeled dataset requires a huge amount of manual labor, making it challenging to execute. A more feasible approach is to employ a large-scale labeled daytime dataset together with a small-scale unlabeled nighttime dataset, and to integrate unsupervised domain adaptation (UDA) techniques into the training of existing tracking models [20]. Such a strategy effectively reduces the domain shift between datasets and brings improved tracking performance in nighttime scenarios. Ye et al. [20] proposed unsupervised domain adaptation for nighttime aerial tracking (UDAT), which uses a domain adversarial network to align the feature distributions between the daytime and nighttime domains. However, adversarial training is unstable; especially when there are drastic changes in tracking scenes, or when the target is blurry or occluded, the target may be lost or tracked incorrectly. Metric-based adaptation methods can reduce the difference between domains by defining a well-designed domain discrepancy function, which facilitates more accurate transfer. The Wasserstein distance can accurately measure the difference between two distributions from a geometric perspective, as the latent geometric information is encoded in the cost function. Recently, Zhang et al. [22] extended the Wasserstein distance to Reproducing Kernel Hilbert Space (RKHS) and introduced the corresponding kernel Bures distance. As the tracking task involves complex and diverse scenes and samples, such metrics are limited in accurately characterizing these complex and irregular samples. Moreover, the hyperparameters used during training can determine the performance of the network, and the massive scale of daytime data involved in training tracking networks makes parameter tuning time-consuming and laborious.

Overall, this task still faces the following three challenges: (1) Constructing a stable and accurate domain adaptation method to reduce the feature differences between night and day. (2) Accurately characterizing domain differences in


complex scenes and with large-scale samples. (3) Learning hyperparameters adaptively.

To address the above challenges, we propose a metric-based domain alignment nighttime object tracking method, which reduces domain differences by defining a well-designed domain discrepancy function to achieve stable and accurate transfer and to enhance the stability of tracking at nighttime. We also introduce a multi-kernel mechanism for the Bures metric, which uses a linear combination of a series of kernel functions to accommodate samples of different scenes and categories, thereby enhancing the ability of the metric to represent various sequences in the latent space. Finally, we further propose an automatic parameter updating strategy to optimize the linear combination of kernels, allowing the algorithm to automatically obtain the kernel combination and an accurate feature representation. The main contributions are as follows:

– We propose a metric-based domain alignment nighttime object tracking framework. This framework utilizes a metric function to enhance the accuracy and robustness of UAV nighttime tracking.
– We propose a novel multi-kernel Bures metric along with a parameter auto-update strategy. This distance measure effectively characterizes complex samples and accurately measures domain differences.
– We conduct a series of comparative and visualization experiments, which validate the effectiveness of the proposed method.

2 Methodology

This framework is based on UDA. The source domain is a large-scale dataset with good lighting conditions and annotations, while the target domain is a small-scale dataset with poor lighting conditions and no annotations. As shown in Fig. 1, data from both domains are fed into the shared-parameter backbone to obtain template and search features. The framework can be divided into two parallel processes. (1) The alignment process takes the search features as input to a domain alignment layer and calculates the corresponding multi-kernel Bures loss to reduce the domain difference and improve tracking performance in nighttime drone scenarios. (2) The tracking process takes both the template and search patch features as input to the tracking head module and calculates the classification and regression loss to capture as much mutual information as possible, thus improving overall tracking performance. The multi-kernel Bures metric (MKB) is based on the kernel Bures metric, so we first introduce the kernel Bures metric.

2.1 Kernel Bures Metric

The kernel Bures metric is defined in a reproducing kernel Hilbert space (RKHS), so we first recall the concept of an RKHS. Let \mathcal{X} be a non-empty set and \mathcal{H} be an RKHS defined on \mathcal{X}. Let k be a reproducing kernel defined on \mathcal{H}, satisfying the following conditions: (1) \forall x \in \mathcal{X}, k(\cdot, x) \in \mathcal{H}; (2) \forall x \in \mathcal{X}, f \in \mathcal{H}, \langle f, k(\cdot, x) \rangle_{\mathcal{H}} = f(x).

Fig. 1. The proposed framework. The left side shows the input data from the source and target domains. The source domain dataset consists of well-lit tracking data, while the target domain dataset consists of nighttime tracking data captured from a drone perspective. After feature extraction by the backbone network, the features are input to the domain feature alignment layer and the tracker. Then, both domain features are mapped into the latent space by a set of kernel functions (φ_ξ, φ_ζ), and the corresponding estimated covariance operators R̂_μμ and R̂_νν are calculated. Finally, the multi-kernel Bures loss and the classification-regression loss are calculated, respectively.

Also, let ψ be the corresponding feature map, i.e., ψ(x) = k(\cdot, x). Then, \forall x \in \mathcal{X} and \forall y \in \mathcal{X}, it holds that \langle ψ(x), ψ(y) \rangle_{\mathcal{H}} = k(x, y). The kernel Bures metric is derived from the kernel Wasserstein distance, which measures the similarity between two probability distributions by computing the transport distance from one distribution to the other in RKHS [22]. The kernel Wasserstein distance is defined as

d^2_{KW}(ψ_{\#}μ, ψ_{\#}ν) = \inf_{γ \in Π(ψ_{\#}μ \times ψ_{\#}ν)} \int_{\mathcal{H} \times \mathcal{H}} c^2(x, y) \, dγ(x, y), \quad (1)

where μ and ν are probability measures on \mathcal{X}, ψ_{\#}μ and ψ_{\#}ν are the pushforward measures of μ and ν, Π is the set of joint probability measures on \mathcal{H} \times \mathcal{H}, γ is a joint probability measure in Π, and c(\cdot, \cdot) is the 2-norm on the RKHS, so c^2(x, y) = \|ψ(x) − ψ(y)\|^2_{\mathcal{H}} = \langle ψ(x), ψ(x) \rangle_{\mathcal{H}} + \langle ψ(y), ψ(y) \rangle_{\mathcal{H}} − 2\langle ψ(x), ψ(y) \rangle_{\mathcal{H}} = k(x, x) + k(y, y) − 2k(x, y). The kernel Wasserstein distance computes the minimal transport distance between two distributions on \mathcal{H}, and it has wide applications in computer vision. Particularly, if we consider that the implicit feature maps are injective and the mean values of the distributions are equal, then the kernel Wasserstein distance reduces to the kernel Bures distance:

d^2_{KB}(R_{μμ}, R_{νν}) = \mathrm{Tr}\big(R_{μμ} + R_{νν} − 2 (R_{μμ}^{1/2} R_{νν} R_{μμ}^{1/2})^{1/2}\big), \quad (2)


where R represents a covariance operator on \mathcal{H}, defined as

R_{μν} = E_{X∼μ, Y∼ν}\big[(ψ(X) − m_μ) ⊗ (ψ(Y) − m_ν)\big], \quad (3)

where ⊗ denotes the tensor product, and m_μ and m_ν are the corresponding means. It is worth noting that d_{KB} measures the dispersion between distributions through the two covariance operators, so we can calculate d_{KB} between the day and night features and optimize this distance through neural networks, thereby reducing the distribution differences between the day and night features, i.e., reducing the domain gap.

2.2 Multi-Kernel Bures Metric

The Bures distance in single-kernel mode expresses a single aspect and is suitable for small-scale datasets. However, due to the complexity and diversity of tracking scenarios and the large size of the data, the performance of the single-kernel Bures distance is generally poor. Therefore, we extend the kernel Bures distance to a multi-kernel distance to more accurately characterize such complex data and make the distance suitable for large-scale datasets, thus improving its accuracy. Let X = \{x_1, x_2, ..., x_m\} and Y = \{y_1, y_2, ..., y_n\} represent the samples of the source domain and target domain, respectively, where m and n are the numbers of samples in the source and target domains. Since a multi-kernel mechanism is used, we need a set of source mappings φ_ξ(x_i) = \{ξ_p φ_p(x_i)\}_{p=1}^{r} and a set of target-domain mappings φ_ζ(y_i) = \{ζ_p φ_p(y_i)\}_{p=1}^{r}, where r is the number of mappings, and ξ and ζ are weight parameters to be learned, defined as ξ = [ξ_1, ξ_2, ..., ξ_r]^T and ζ = [ζ_1, ζ_2, ..., ζ_r]^T. Let K_{XY} be the kernel matrix, whose elements are defined as

(K_{XY})_{ij} = \sum_{p=1}^{r} k_p(x_i, y_j) = \sum_{p=1}^{r} ξ_p ζ_p φ_p(x_i) φ_p(y_j). \quad (4)

Similarly, the elements of the kernel matrices K_{XX} and K_{YY} are defined as

(K_{XX})_{ij} = \sum_{p=1}^{r} ξ_p^2 φ_p(x_i) φ_p(x_j), \quad (5)

(K_{YY})_{ij} = \sum_{p=1}^{r} ζ_p^2 φ_p(y_i) φ_p(y_j). \quad (6)

Let the matrix Φ = [\sum_{p=1}^{r} φ_p(x_1), ..., \sum_{p=1}^{r} φ_p(x_m)] and the matrix Ψ = [\sum_{p=1}^{r} φ_p(y_1), ..., \sum_{p=1}^{r} φ_p(y_n)]. Moreover, let the centering matrices be C_m = I_{m \times m} − \frac{1}{m} 1_m 1_m^T and C_n = I_{n \times n} − \frac{1}{n} 1_n 1_n^T, where 1 is a column vector of ones. Then, the empirical multi-kernel Bures metric is defined as

d^2_{MKB}(\hat{R}_{μμ}, \hat{R}_{νν}) = \frac{1}{m}\mathrm{Tr}(K_{XX} C_m) + \frac{1}{n}\mathrm{Tr}(K_{YY} C_n) − \frac{2}{\sqrt{mn}}\|C_m K_{XY} C_n\|_*, \quad (7)


where \hat{R}_{μμ} and \hat{R}_{νν} are estimated covariance operators, defined as \hat{R}_{μμ} = \frac{1}{m} Φ C_m Φ^T and \hat{R}_{νν} = \frac{1}{n} Ψ C_n Ψ^T, respectively. \|\cdot\|_* denotes the nuclear norm; specifically, \|B\|_* = \mathrm{Tr}(Σ), where Σ is a semi-positive definite diagonal matrix whose diagonal elements are the singular values of B. With the extension to multiple kernels, manually setting the weight parameters ξ and ζ would seriously hinder the usability of this distance measure, so we introduce an alternating update mechanism to simultaneously identify the kernel combination and weights [16]. The following problem needs to be solved:

\min_{ξ, ζ} \sum_{p=1}^{r} \frac{1}{m} ξ_p^2 a_p + \frac{1}{n} ζ_p^2 b_p − \frac{2}{\sqrt{mn}} ξ_p c_p ζ_p, \quad s.t.\ ξ^T 1 = 1,\ ζ^T 1 = 1, \quad (8)

where a_p = \mathrm{Tr}((K_{XX})_p C_m), b_p = \mathrm{Tr}((K_{YY})_p C_n), and c_p = \|C_m (K_{XY})_p C_n\|_*, and the subscript p denotes the p-th kernel matrix. This can be further organized into the following problem:

\min_{ξ, ζ} ξ^T Z_1 ξ + ζ^T Z_2 ζ + ξ^T Z_3 ζ, \quad s.t.\ ξ^T 1 = 1,\ ζ^T 1 = 1, \quad (9)

where Z_1 = \frac{1}{m}\mathrm{diag}(a_1, ..., a_r), Z_2 = \frac{1}{n}\mathrm{diag}(b_1, ..., b_r), and Z_3 = \frac{-2}{\sqrt{mn}}\mathrm{diag}(c_1, ..., c_r). The above problem can be solved by the following steps:

1) Update ξ with ζ fixed:

\min_{ξ} ξ^T Z_1 ξ + ζ^T Z_3 ξ, \quad s.t.\ ξ^T 1 = 1. \quad (10)

2) Update ζ with ξ fixed:

\min_{ζ} ζ^T Z_2 ζ + ξ^T Z_3 ζ, \quad s.t.\ ζ^T 1 = 1. \quad (11)

The problem (10) and (11) are convex quadratic programming problems (QP) that can be solved directly by using a QP solver [4]. The multi-kernel Bures distance is suitable for tracking datasets, as the linear combination of multiple kernel functions allows various sample features to be represented by different kernel functions with different weights, enhancing the accuracy of large-scale datasets such as object tracking. The weights of the linear combination can be solved using the above optimal problem, greatly facilitating the use of multi-kernel Bures metrics. 2.3

Objective Loss

Classification and regression loss are only calculated in the source domain, as the model is based on unsupervised adaptation. In order to ensure tracking

218

Y. He et al.

performance, we use an existing SiamCAR [6] as a tracker head, which calculates the classification loss, regression loss, and the distance between the center of the ground-truth bounding box and the predicted bounding box. The sum of these three losses is denoted as LT racker . Multi-kernel Bures loss aligns the search features from different domains to ensure consistency in the generated features. Specifically, we denote the search patches from the source data set as Ts = [ts1 , ..., tsm ] and the search patches from the target domain data set as Tt = [tt1 , ..., ttn ]. Through the backbone network and domain align layer G, we can obtain the source and target domain search features as G(Ts ) and G(Tt ), respectively. Next, we substitute G(Ts ) and G(Tt ) into Eq. (7) for X and Y, and calculate the dM KB as the multi-kernel Bures loss LM KB . So, the total loss of our proposed framework is given by Ltotal = LT racker + λLM KB ,

(12)

where λ is a weight parameter.

3 3.1

Experiments Implementation Details

Our method is implemented on the Ubuntu system with Tesla V100 32 GB GPU for training. We use pre-trained ResNet-50 as the backbone network and pretrained SiamCAR as the tracking head. And we use a ResNet-18 network as a domain alignment layer, with an output feature of 128 dimensions. We adopt the Adam optimizer with an initial learning rate of 1.5 × 10−3 and λ set as 10−5 . The entire training process lasts for 20 epochs and takes about 12 h. To ensure the fairness of the experiment, we use pre-training datasets, ImageNet VID [12] and GOT-10K [8] as the source domain, and evaluate models with one-pass evaluation (OPE) in official pre-trained parameters. A total of 17 SOTA methods are compared, i.e., Dsiam [7], LUDT [15], SiameseFC [1], SiamDW-FC [23], SiamAPN [5], SE-SiamFC [14], Ocean [24], SiamFC++ [17], SiamAPN [5], UpdateNet [21], D3S [11], SiamRPN++ [9], DaSiamRPN [25], HiFT [2], SiamCAR [6], SiamBAN [3], and UDAT-CAR [20]. Besides, our method is marked as MKB-CAR. 3.2

Evaluation Datasets

NAT2021 [20] is a nighttime scene dataset based on the perspective of UAV. One of the test sets is the NAT2021-test, which contains 180 video segments. The other is NAT2021-test-L, which contains 23 video segments, where the suffix “L” indicates that these are long videos. UAVDark70 [10] is a benchmark for visual object tracking under challenging low light and nighttime conditions, specifically designed for UAV scenes. The dataset consists of 70 video sequences.

MKB: Multi-Kernel Bures Metric for Nighttime Aerial Tracking

219

Fig. 2. Comparison of the success rate, normalized precision, and precision plots on the NAT2021-test dataset.

3.3

Comparison Results

We evaluated our proposed MKB-CAR method on the NAT2021-test dataset. The comparison results are shown in Fig. 2. Our method achieved an normalized precision of 61.4%, which is 1.49% higher than the previous best method. It is worth noting that MKB-CAR not only outperforms other methods in terms of average accuracy, but also achieves 1.04% and 1.34% higher success rate and precision rate than UDAT-CAR, respectively. Experimental results show that the multi-kernel Bures metric can improve the accuracy of nighttime object tracking by narrowing the feature distribution gap between day and night.

Fig. 3. Comparison of the success rate, normalized precision, and precision plots on the NAT2021-test dataset.

We conducted experiments on the NAT-test-L dataset as shown in Fig. 3. This dataset is mainly used to test the robustness and long-term performance of trackers in tracking nighttime objects. Due to factors such as complex scene changes, occlusion, and object motion pattern changes, tracking nighttime objects has become more challenging. Our method outperforms the secondplace UDAT-CAR (0.445) by an absolute advantage of 4.8% in normalized precision. Meanwhile, MKB-CAR ranks first with significant advantages in all three

220

Y. He et al.

evaluation. The experimental results indicate that MKB-CAR improves nighttime recognition by narrowing the gap, and also maintains stable and robust tracking performance.

Fig. 4. Comparison of the success rate, normalized precision, and precision plots on the UAVDark70 dataset.

We tested our method on the UAVDark70 dataset as shown in Fig. 4. It can be used to test the algorithm’s generalization performance, which refers to its ability to perform on unseen data. Our method is 1.2% higher than UDAT-CAR in success rate. This indicates that our domain alignment technique is better than GAN, because the distance we proposed has a more accurate description of domain distribution differences.

Fig. 5. Heatmap visualization. The first column represents the ground truth. The following columns correspond to SiamCAR, UDAT-CAR, and MKB-CAR, respectively.

3.4

Visualization

We utilized Grad-CAM [13] to generate heatmaps for Siam-CAR, UDAT-CAR, and MKB-CAR, as illustrated in Fig. 5. Comparing the heatmaps, it is evident that MKB-CAR demonstrates superior performance in accurately predicting and discerning the positioning and contours of objects.

MKB: Multi-Kernel Bures Metric for Nighttime Aerial Tracking

221

Fig. 6. The T-SNE visualization. Subfigure (a) shows the features before domain alignment, while subfigure (b) shows the features after alignment learning.

We also used T-SNE algorithm to visualize the search features. As shown in Fig. 6, the figure shows the feature visualization of SiamCAR (left) and MKBCAR (right). Blue points represent source domain samples, and orange points represent target domain samples. In SiamCAR, it can be clearly seen that there is a significant domain gap between the orange and blue points, and the source domain and target domain are respectively located on both sides of the domain gap. So there is domain shift in the night-time tracking task. In MKB-CAR, it can be clearly seen that the multi-kernel Bures distance narrows the feature distribution of source domain samples and target domain samples. 3.5

Ablation Study

Table 1. The ablation results on NAT2021-test. Source represents the training on the source domain dataset, KB represents kernel Bures distance, and MKB represents multi-kernel Bures distance. Source KB MKB Prec. Norm.Prec Succ.   

 

0.467 0.588

0.650

0.482 0.609

0.676

0.485 0.614

0.680

Table 1 shows the performance of the tracker improved with the addition of a domain adaptation module (KB or MKB) compared with using only the source domain data, indicating that domain adaptation can enhance night-time tracking performance. With the introduction of the multi-kernel mechanism and parameter update mechanism in MKB, the normalized precision and success rate were improved by 0.8% and 0.6%, respectively. This indicates that MKB is better suited for processing large-scale datasets and accurately characterizing the distribution of these datasets in latend space, thereby more accurately aligning the distributions.

222

Y. He et al.

Table 2. Tracker speed comparison. Comparison table of tracker speed test on three datasets NAT-test, NAT-test-L and UAVDark70. “fps” is frames per second. Method

NAT2021-test NAT2021-test-L UAVDark70

SiamCAR

30.93 fps

31.01 fps

29.79 fps

MKB-CAR 31.17 fps

30.96 fps

30.98 fps

To verify that our proposed framework does not increase the inference time of the tracker when the domain alignment layer is not loaded, we conducted speed tests. Table 2 shows that SiamCAR and proposed MKB-CAR on the NAT2021test, NAT2021-test-L, and UAVDark70. The experimental results demonstrate that the proposed framework maintains the tracking speed of the tracker.

4

Conclusion

This study proposes multi-kernel Bures metric for nighttime aerial tracking. The framework achieves transfer learning from daytime to nighttime scenarios by introducing multi-kernel Bures metric, which quantifies the distribution distance by computing the difference between covariance operators of the two domains in the latent space. MKB also utilizes a linear combination of a series of kernel functions to adapt to complex samples from different scenes and categories, enhancing the metric’s ability to represent various sequences in the latent space. Additionally, an alternate updating mechanism is introduced to optimize the weights of the kernels, avoiding excessive manual parameter tuning. Experimental results validate the effectiveness and advancement of this approach. The study also provides new insights for future research on UAV nighttime tracking.

References 1. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fullyconvolutional siamese networks for object tracking. In: Hua, G., J´egou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-48881-3 56 2. Cao, Z., Fu, C., Ye, J., Li, B., Li, Y.: HiFT: hierarchical feature transformer for aerial tracking. In: IEEE International Conference on Computer Vision (ICCV) (2021) 3. Chen, Z., Zhong, B., Li, G., Zhang, S., Ji, R.: Siamese box adaptive network for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 4. Diamond, S., Boyd, S.: CVXPY: a python-embedded modeling language for convex optimization. J. Mach. Learn. Res. (JMLR) 17(1), 2909–2913 (2016) 5. Fu, C., Cao, Z., Li, Y., Ye, J., Feng, C.: Siamese anchor proposal network for high-speed aerial tracking. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE (2021)

MKB: Multi-Kernel Bures Metric for Nighttime Aerial Tracking

223

6. Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: Siamcar: siamese fully convolutional classification and regression for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 7. Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic siamese network for visual object tracking. In: IEEE International Conference on Computer Vision (ICCV) (2017) 8. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 43(5), 1562–1577 (2021) 9. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of siamese visual tracking with very deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 10. Li, B., Fu, C., Ding, F., Ye, J., Lin, F.: Adtrack: target-aware dual filter learning for real-time anti-dark UAV tracking. In: IEEE International Conference on Robotics and Automation (ICRA) (2021) 11. Lukezic, A., Matas, J., Kristan, M.: D3S-a discriminative single shot segmentation tracker. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 12. Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115, 211–252 (2015) 13. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: GradCAM: visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision (ICCV) (2017) 14. Sosnovik, I., Moskalev, A., Smeulders, A.W.: Scale equivariance improves siamese tracking. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021) 15. Wang, N., Zhou, W., Song, Y., Ma, C., Liu, W., Li, H.: Unsupervised deep representation learning for real-time tracking. Int. J. Comput. Vis. (IJCV) 129, 400–418 (2021) 16. Wang, R., Lu, J., Lu, Y., Nie, F., Li, X.: Discrete and parameter-free multiple kernel k-means. IEEE Trans. Image Process. (TIP) 31, 2796–2808 (2022) 17. Xu, Y., Wang, Z., Li, Z., Yuan, Y., Yu, G.: SiamFC++: towards robust and accurate visual tracking with target estimation guidelines. In: AAAI Conference on Artificial Intelligence (AAAI) (2020) 18. Yang, L., Liu, R., Zhang, D.D., Zhang, L.: Deep location-specific tracking. In: ACM International Conference on Multimedia (MM) (2017) 19. Ye, J., Fu, C., Cao, Z., An, S., Zheng, G.Z., Li, B.: Tracker meets night: a transformer enhancer for UAV tracking. IEEE Robot. Autom. Lett. (RA-L) 7, 3866– 3873 (2022) 20. Ye, J., Fu, C., Zheng, G., Paudel, D.P., Chen, G.: Unsupervised domain adaptation for nighttime aerial tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022) 21. Zhang, L., Gonzalez-Garcia, A., Weijer, J.v.d., Danelljan, M., Khan, F.S.: Learning the model update for siamese trackers. In: IEEE International Conference on Computer Vision (ICCV) (2019) 22. Zhang, Z., Wang, M., Nehorai, A.: Optimal transport in reproducing kernel hilbert spaces: theory and applications. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 42(7), 1741–1754 (2019) 23. Zhang, Z., Peng, H.: Deeper and wider siamese networks for real-time visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

224

Y. He et al.

24. Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 771–787. Springer, Cham (2020). https://doi.org/10.1007/978-3030-58589-1 46 25. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware siamese networks for visual object tracking. In: European Conference on Computer Vision (ECCV) (2018)

Deep Arbitrary-Scale Unfolding Network for Color-Guided Depth Map Super-Resolution Jialong Zhang1 , Lijun Zhao1(B) , Jinjing Zhang2 , Bintao Chen1 , and Anhong Wang1 1

Institute of Digital Media and Communication, Taiyuan University of Science and Technology, Taiyuan 030024, China [email protected] 2 Data Science and Technology, North University of China, Jiancaoping District, Taiyuan 030051, China Abstract. Although color-guided Depth map Super-Resolution (DSR) task has made great progress with the help of deep learning, this task still suffers from some issues: 1) many DSR networks are short of good interpretability; 2) most of the popular DSR methods cannot achieve arbitrary-scale up-sampling for practical applications; 3) dual-modality gaps between color image and depth map may give rise to texture-copying problem. As for these problems, we build a new joint optimization model for two tasks of high-low frequency decomposition and arbitrary-scale DSR. According to alternatively-iterative update formulas of the solution for these two tasks, the proposed model is unfolded as Deep ArbitraryScale Unfolding Network (DASU-Net). In the DASU-Net, we propose a Continuous Up-Sampling Fusion (CUSF) module to address two problems of arbitraryscale feature up-sampling and dual-modality inconsistency during color-depth feature fusion. A large number of experiments have demonstrated that the proposed DASU-Net achieves more significant reconstruction results as compared with several state-of-the-art methods. Keywords: High-low frequency decomposition · Depth map super-resolution · Arbitrary-scale up-sampling · Deep explainable network

1 Introduction At present, color-guided Depth map Super-Resolution (DSR) has attracted more and more people’s attention, which comes from two reasons: 1) affordable color camera can easily capture high-quality and High-Resolution (HR) color images; 2) similar boundary information from color image can enhance discontinuously-geometric boundaries of depth map. However, their inconsistent structure and details may lead to texturecopying problem in the DSR task [7]. In order to solve these problems, traditional methods in the early stage have achieved excellent results. For example, image filteringbased DSR methods use adaptive weights from color-range domain and depth-range domain to perform local filtering on depth map [1, 3, 6]. In markov random field-based methods [2], color images are used to obtain the weight of the smoothness item to measure the relevance of the neighboring pixels, so as to regularize HR depth map reconstruction. Although traditional methods have good interpretability, the performances c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 225–236, 2024. https://doi.org/10.1007/978-981-99-8549-4_19

226

J. Zhang et al.

Fig. 1. The diagram of the proposed DASU-Net.

of these methods need to be improved due to the limitations of hand-made filters and unlearnable weighting rules. Because deep learning-based methods have strong feature representation capabilities by supervised learning with the large-scale training dataset, their performances are often better than those of traditional methods. For instance, Hui et al. proposed a hierarchical DSR method to fuse multi-scale color features and depth features for depth map reconstruction [4]. Kim et al. designed an explicitly-sparse and spatially-variant convolutional kernels to deal with differential spatial change of details and structures of depth and color images, which effectively improved the performance of DSR tasks [5]. Zhong et al. proposed an attention-based hierarchical multi-modal fusion network, in which multi-modal attention model effectively selected consistent structure information between color image and depth map [12]. Unlike the above black-box-based networks, Zhou et al. embedded the structural prior of optimized model into the deep learning network and achieved better image Super-Resolution (SR) results [13]. Although the above methods have achieved good results in fixed factor up-sampling, they cannot meet the arbitrary-scale up-sampling requirement of practical applications. Wang et al. achieved continuous arbitrary-scale up-sampling by using coordinate query method, and embedded spatial geometric information into DSR model to aggregate single modal features, but they ignored the importance of feature fusion and model interpretability [9]. Therefore, to simultaneously alleviate three problems of network interpretability, arbitrary-scale DSR and texture-copying, we build a novel joint optimization model and unfold it as Deep Arbitrary-Scale Unfolding Network (DASU-Net). In summary, our contributions are listed below: 1) we propose a novel joint optimization model based on High-Low Frequency (H-LF) decomposition and arbitraryscale DSR, which is unfolded as DASU-Net to solve the problems of black-box network, continuous super-resolution and texture-copying in DSR task; 2) we reasonably transform each iterative operator of optimization model into three sub-networks, including Depth map reconstruction sub-Network (DNet), High-frequency map reconstruction sub-Network (HNet) and Low-frequency map reconstruction sub-Network (LNet); 3) to address dual-modality inconsistency during color-depth fusion and to achieve continuous arbitrary-scale up-sampling, we propose a Continuous Up-Sampling Fusion (CUSF) module based on feature enhancement and multiple strategies fusion.

DASU-Net: Deep Arbitrary-Scale Unfolding Network

227

The rest of this paper is organized as follows. In Section 2, the proposed method is described. Experimental results are presented in Section 3. In Section 4, conclusion is discussed.

2 The Proposed Method 2.1 Problem Formulation The degradation mathematical models of HR Low Frequency (LF) map L ∈ R1×M ×N , HR High Frequency (HF) map H ∈ R1×M ×N and HR depth map D ∈ R1×M ×N can be linearly expressed as: D = A K D+n , l

d

d

d

Ll = Alf Klf L + nlf ,

(1)

Hl = Ahf Khf H + nhf ,

where the observed Dl ∈ R1×M/m×N/m , Ll ∈ R1×M/m×N/m and Hl ∈ R1×M/m×N/m are obtained by down-sampling D, L and H m times. n(·) , A(·) and K(·) denote additive noise, down-sampling operators and blurring kernel. In order to filter out the texture-copying artifacts caused by color-depth information inconsistency in DSR task, combining Eq.(1), we propose a new joint optimization model for two tasks of H-LF decomposition about depth map and arbitrary-scale DSR, which can be mathematically written as: argmin L,H,D

1 1 2 2 Dl − Ad Kd D2 + αΦ1 (D) + η D − L − H2 2 2

1 1 2 2 β1 Ll − Alf Klf L2 + β2 L − D ∗ gl 2 + β3 Φ2 (L) 2 2 1 1 2 2 + γ1 Hl − Ahf Khf H2 + γ2 H − D ∗ gh 2 + γ3 Φ3 (H). 2 2

(2)

+

Here  · 22 is the square of l2 -norms. Φ1 (·), Φ2 (·) and Φ3 (·) are three implicit regularization terms. In addition, α, η, β1 , β2 , β3 , γ1 , γ2 and γ3 are trade-off parameters. gh and gl denote high-pass filter and low-pass filter, while ∗ is convolution operation. The terms unrelated to LF features L can be ignored in Eq.(2), and then we use proximal gradient descent algorithm to obtain the iterative formula about L: ⎧ (n) (n−1) (n−1) − δ1 ∇f1 (L )), ⎨ L = proxΦ2 (L (n−1) T (n−1) ) = β1 (Alf Klf ) (Ll − Alf Klf L ) ∇f1 (L (3) ⎩ (n−1) (n−1) (n−1) (n−1) (n−1) + β2 (L

−D

∗ gl ) + η(D

−L

−H

),

where proxΦ2 (·) is the proximal operator with regard to non-differentiable term Φ2 , ∇ represents the gradient operator, and δ1 is the updating step size. (·)T represents transpose operation. In the same way, the HF features of H can be iteratively updated as follows: ⎧ (n) (n−1) (n−1) − δ2 ∇f2 (H )), ⎨ H = proxΦ3 (H (n−1) T (n−1) ∇f2 (H ) = γ1 (Ahf Khf ) (Hl − Ahf Khf H ) (4) ⎩ (n−1) (n−1) (n−1) (n) (n−1) + γ2 (H

−D

∗ gh ) + η(D

−L

−H

),

228

J. Zhang et al.

where proxΦ3 (·) represents the proximal operator with regard to non-differentiable term Φ3 , and δ2 is the updating step size. The updated HF and LF features are used as prior conditions to assist in deducing depth features D: ⎧ (n) (n−1) (n−1) D = proxΦ1 (D − δ3 ∇f3 (D )), ⎪ ⎪ ⎪ (n−1) T (n−1) ⎪ ∇f3 (D ) = (Ad Kd ) (Dl − Ad Kd D ) ⎨ ⎪ ⎪ ⎪ ⎪ ⎩

+ β2 (gl )

T

+ γ2 (gh ) + η(D

T

(n)

∗ (L

∗ (H

(n−1)

−D

(n)

−D

(n)

−L

(n−1)

∗ gl )

(n−1)

−H

(n)

(5)

∗ gh )

),

where proxΦ1 (·) represents the proximal operator with regard to non-differentiable term Φ1 , and δ3 is the updating step size. 2.2

Algorithm Unfolding

We unfold the optimization model Eq.(3), Eq.(4) and Eq.(5) into Low-frequency map reconstruction sub-Network (LNet), High-frequency map reconstruction sub-Network (HNet) and Depth map reconstruction sub-Network (DNet) to build an explicable DASU-Net for solving the problems of DSR. The overall architecture of our proposed DASU-Net is shown in Fig. 1, which follows the iterative update of the optimization model. DASU-Net inputs Dl , Ll , Hl and color map C ∈ R3×M ×N , while it outputs the SR depth map DSR ∈ R1×M ×N , the SR LF map LSR ∈ R1×M ×N and the SR HF map HSR ∈ R1×M ×N . Following the literature of [9], DASU-Net up-samples to the full-resolution in two steps. Specifically, the feature map is up-sampled through CUSF module to intermediate resolution and target resolution before and after Stage 1. In addition, CUSF module also shoulders the responsibility of multi-modal fusion. In DASU-Net, the initialization module plays a role in expanding the number of channels and preliminary feature extraction, including one 1 × 1 convolutional layer and two Residual Blocks (ResBlock). In the n-th stage, the proposed network receives D(n−1) , L(n−1) , H (n−1) , while it outputs D(n) , L(n) and H (n) . Each stage includes three subnetworks: LNet, HNet and DNet, as shown in Fig. 2, and the updating order is LNet ⇒ HNet ⇒ DNet. To prevent information loss, three bypasses with Bicubic interpolation are used to assist LR image reconstruction. In order to better understand the principle of DASU-Net, the next step is to analyze the n-th stage of the optimization model in Eq.(3), Eq.(4) and Eq.(5), and unfold them into sub-networks. Low-Frequency Map Reconstruction Sub-Network (LNet). In order to obtain LF feature L(n) in the n-the stage through LNet, we use convolutional modules to approximate some operators in Eq.(3). Alf Klf is learned by two ResBlocks and one downsampling operation. It should be noted that the up-sampling and down-sampling of the proposed method are achieved by grid sampler [9], which is the key to achieve arbitrary-scale sampling. (Alf Klf )T is the inverse operation of Alf Klf , so it is approximated by two ResBlocks and one up-sampling operation. We use convolutional kernels of 3×3 and 5×5 sizes to predict gl . We propose a Multi-Scale Regular Module (MSRM) to approximate the proximal operator proxΦ2 (·), which includes one 7 × 7, one 5 × 5,

DASU-Net: Deep Arbitrary-Scale Unfolding Network

229

Fig. 2. The structures of LNet (a), HNet (b), DNet (c), and MSRM (d).

three 3 × 3, four 1 × 1 convolution kernels and two ResBlocks to extract multi-scale features. In addition, MSRM has three fusion strategies to mix different scales features, as shown in Fig. 2 (d). Based on the above analysis, as shown in Fig. 2 (a), Eq.(3) can be unfolded as LNet: ⎧ L(n) = β RB(U p(RB(L − RB(Down(RB(L(n−1) )))))), 1 l a ⎪ ⎪ ⎪ L(n) = β2 (L(n−1) − Conv1 ([D(n−1) ∗ gl3 , D(n−1) ∗ gl5 ])), ⎪ ⎨ b (n) (n−1) (n−1) (n−1) Lc = η(D −L −H ), (6) ⎪ ⎪ (n) (n) (n−1) (n) (n) ⎪ L = L − δ (L + L + L ), 1 ⎪ a c b ⎩ d (n)

L

(n)

= M SRM (Ld ),

where RB, U p, Down, [·], Conv1 , gl3 and gl5 represent ResBlock, grid up-sampler, grid down-sampler, the concatenation along channel dimension, 1×1 convolution, convolutional kernels of 3 × 3 and 5 × 5 sizes respectively. gl3 and gl5 jointly compose gl . Conv1 plays a role in changing the number of channels. Additionally, convolutional operation ∗ is represented by Conv in Fig. 2. How-Frequency Map Reconstruction Sub-Network (HNet). The optimized model in Eq.(4) is unfolded to obtain HNet for predicting HF features, as shown in Fig. 2 (b), and the unfolding process of Eq.(4) is similar to LNet. HNet can be written as:

230

J. Zhang et al.

⎧ H (n) = γ RB(U p(RB(H − RB(Down(RB(H (n−1) )))))), 1 l a ⎪ ⎪ ⎪ (n) (n−1) (n−1) (n−1) ⎪ H = γ (H − Conv ∗ gh3 , D ∗ gh5 ])), 2 1 ([D ⎨ b H

(n)

= η(D

(n−1)

(n)

H

(n)

= M SRM (Hd

−L

−H

(n−1)

(7)

),

c ⎪ ⎪ (n) (n) (n−1) (n) (n) ⎪ Hd = H − δ2 (Ha + Hb + Hc ), ⎪ ⎩ (n)

),

where MSRM is used to approximate proximal operator proxΦ3 (·) in H (n) . gh3 and gh5 jointly compose gh , and they are predicted by convolutional kernels of 3 × 3 and 5 × 5 sizes respectively. Depth Map Reconstruction Sub-Network (DNet). As shown in Fig. 2 (c), after LNet and HNet, we can obtain LF prior L(n) and HF prior H (n) , which is a key condition for reconstructing high-quality HR depth maps. Similar to the unfolding strategy in LNet, Eq.(5) can be unfolded into: ⎧ (n) (n−1) Da = RB(U p(RB(Dl − RB(Down(RB(D )))))), ⎪ ⎪ ⎪ (n) (n) (n−1) (n−1) ⎪ Db1 = L − Conv1 ([D ∗ gl3 , D ∗ gl5 ]), ⎪ ⎪ ⎪ ⎪ D(n) = β2 Conv1 ([(gl3 )T ∗ D(n) , (gl5 )T ∗ D(n) ]), ⎪ b2 b1 b1 ⎪ ⎪ ⎨ D(n) = H (n) − Conv ([D(n−1) ∗ g , D(n−1) ∗ g ]), 1

c1

h3

h5

(n) (n) (n) T T ⎪ Dc2 = γ2 Conv1 ([(gh3 ) ∗ Dc1 , (gh5 ) ∗ Dc1 ]), ⎪ ⎪ ⎪ (n) (n−1) (n) (n) ⎪ Dd = η(D −L −H ), ⎪ ⎪ ⎪ ⎪ (n) (n) (n) (n) (n−1) (n) ⎪ D = H − δ (D + D 3 ⎪ e a b2 + Dc2 + Dd ), ⎩

D

(n)

(n)

= M SRM (De

(8)

),

where MSRM is used to approximate proximal operator proxΦ1 (·). (gl )T and (gh )T are the 180◦ rotation of gl and gh . 2.3

Continuous Up-Sampling Fusion (CUSF)

As shown in Fig. 3 (a), CUSF module is designed to achieve two functionalities including continuous arbitrary-scale up-sampling and multi-modal fusion. Note that we use the same up-sampler as done in [9]. For simplicity, in CUSF, the specific feature related to color map is described as guidance feature, while the target feature is extracted from LF feature, HF feature and depth feature. Specifically, the proposed Feature Enhancement and Multiple Strategies Fusion (FEMSF) component is used to perform multimodal fusion, as shown in Fig. 3 (b). Finally, three convolution layers and two ResBlocks are serialized to merge different features. As shown in Fig. 3 (b), we first introduce Dual Affinity Matrix (DAM) [10] to filter impurities in guidance features G to obtain the enhanced guidance feature E. In multiple strategies fusion, the spatial features of E are first condensed to 1 × 1 tensor through global max-pooling and global average- pooling to get global information, and then the global information are concatenated in the channel dimension and processed by two convolutional layers. Afterwards, the softmax and split function utilizes

DASU-Net: Deep Arbitrary-Scale Unfolding Network

231

Fig. 3. The structures of CUSF (a) and FEMSF (b).

global information to assign weights to each fusion strategy. The assigned weight in an element-wise way multiply E and then its output in an element-wise way is added with G to obtain the modulated guidance features Gi , i = 1, 2, 3. Then three different fusion strategies for T and Gi are performed including element-wise addition, element-wise multiplication and element-wise maximization operations. Element-wise maximization highlights the most prominent information between different modalities, which is beneficial for boundary information enhancement. Element-wise multiplication represents “AND” operation, that is, when both modal signals are strong, a strong signal will be outputted. A weak signal from another side will weaken the output information, which has a strong effect on removing textures. Element-wise addition is used to supplement the information loss problem caused by the above two fusion strategies. Three fusion strategies are concatenated along the channel dimension, and then the output is adaptively weighted with the convolutional layer. 2.4 Loss Function The L1 norm is utilized to regularize the DSR , LSR and HSR Nprediction learning of the proposed DASU-Net, which can be written as Loss = N1 i=1 DSR,i − DGT,i 1 + N N ξ1 N1 i=1 LSR,i − LGT,i 1 + ξ2 N1 i=1 HSR,i − HGT,i 1 . Here, i denote the i-th pixel of the image, the total pixel number of the image is N . DGT , LGT and HGT are Ground Truth (GT) depth map, GT HF map and GT LF map. ξ1 and ξ2 are used to make trade-off between the reconstruction of various sub-networks.

232

J. Zhang et al.

3 Experimental Results 3.1

Implementation Details

The proposed DASU-Net is implemented by using the Pytorch framework. It is run on the GPU of NVIDIA GeForce RTX 3090. The proposed method is optimized by the popular Adam optimizer. During the training phase, we randomly crop image to 256 × 256 size of NYU-v2 RGB-D dataset as the GT depth map DGT , then we use Bicubic interpolation to degrade the DGT by m ⊆ (1, 16) times as the LR depth map Dl . In the first 200 epochs, the fixed 8× model is trained, while in the last 200 epochs, the arbitrary-scale DSR model is trained. The initial learning rate is 0.0001 and learning rate decays with a factor of 0.2 every 60 epochs. The default iteration number of the proposed DASU-Net is n = 2. GT depth map DGT is used to generate GT HF map HGT through 4-neighborhood Laplacian operator, and then we can get GT LF map LGT = DGT − HGT . Similarly, Ll = Dl − Hl . 3.2

The Quality Comparison of Different DSR Methods

We train the proposed method by using the first 1000 pairs RGB-D images in the NYUv2 dataset, and test it by using the last 499 pairs RGB-D images from NYU-v2 dataset, 30 pairs RGB-D images from Middlebury dataset, and 6 pairs RGB-D images from Lu dataset to evaluate the performance of the proposed DASU-Net. Structural Similarity (SSIM), Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE) are used to assess the performance of the proposed DASU-Net at arbitrary-scale upsampling. A lot of excellent methods are used to compare with our methods, including GF [3], DMSG [4], FDKN [5], DKN [5], DCTNet [11], AHMF [12], DJFR [7], MADUNet [13], JIIF [8] and GeoDSR [9]. In some tables, black and underlined fonts indicate the best performance and the second best performance respectively. As shown in Table 1, our DASU-Net has a significant advantage at both 8× and 16× up-sampling, except for ranking fourth at 4× up-sampling on the NYU-v2 dataset. The Middlebury and Lu RGB-D datasets are used to validate generalization ability. Testing on the Middlebury and Lu RGB-D datasets, proposed DASU-Net ranks first in terms of RMSE values, which further indicating that our proposed method is stateof-the-art. We provide the average performance of the 4×, 8× and 16× up-sampling in the last three columns of Table 1. At three scale factors, the average RMSE values of DASU-Net are decreased by 0.05, 0.10 and 0.29 compared to the second ranked GeoDSR [9] respectively. In order to further demonstrate the progressiveness of our method in arbitrary-scale DSR, we show in Table 2 the RMSE values up-sampling on non-integer factors for three datasets. The proposed DASU-Net achieves state-of-the-art performance in all up-sampling factors, which indicates that our method has extremely strong generalization ability. And the parameter number of GeoDSR [9] is three times higher than ours. It is worth noting that 17.05× is the up-sampling factor outside of our training setting (1 − 16×). In order to intuitively demonstrate the advantages of our method, we show the 1012nd visualization result and error map of the NYU-v2 dataset in 8× up-sampling, as shown in Fig. 4. Due to the limited feature mapping capabilities of traditional methods

DASU-Net: Deep Arbitrary-Scale Unfolding Network

233

Fig. 4. Comparison of 8× up-sampling visualization, error map results and color image of 1012nd image of the NYU-v2 RGB-D dataset. The brighter the area in the error map, the greater the pixel error.

Fig. 5. Comparison of 3.75× up-sampling visualization, error map results and color image of 1-st image of the Lu RGB-D dataset. The brighter the area in the error map, the greater the pixel error.

234

J. Zhang et al.

Table 1. Objective performance comparison of different DSR approaches on integer factors on NYU-v2, Middlebury and Lu RGB-D dataset in term of average RMSE (The lower, the better). Intp and Cont indicate whether networks have interpretability and continuous up-sampling function. Methods Bicubic DJFR [7] DMSG [4] FDKN [5] DKN [5] DCTNet [11] JIIF [8] AHMF [12] MADUNet [13] GeoDSR-small [9] GeoDSR [9] DASU-Net

Intp Cont NYU-v2 4× 8×

16×

Middlebury Lu 4× 8× 16× 4×



Average 16× 4× 8×

16×

           

11.58 9.46 9.17 6.96 6.51 5.84 5.27 5.64 6.23 5.10 4.86 4.82

2.28 1.68 1.88 1.08 1.23 1.10 1.09 1.07 1.15 1.04 1.04 1.01

4.54 3.96 4.17 2.10 2.16 1.85 1.73 1.66 1.74 1.62 1.59 1.39

7.38 6.75 7.22 5.05 5.11 4.39 4.16 3.71 3.86 4.11 3.92 3.46

8.44 7.28 7.17 5.50 5.29 4.80 4.24 4.16 4.44 4.13 3.96 3.77

           

4.28 2.80 3.02 1.86 1.62 1.59 1.37 1.40 1.51 1.48 1.42 1.44

7.14 5.33 5.38 3.58 3.26 3.16 2.76 2.89 3.02 2.73 2.62 2.56

3.98 3.24 3.45 2.17 2.12 2.05 1.82 1.63 1.69 1.73 1.68 1.63

6.37 5.62 6.28 4.50 4.24 4.19 3.31 3.14 3.23 3.19 3.10 3.05

2.42 1.65 2.30 0.82 0.96 0.88 0.85 0.88 0.90 0.82 0.81 0.70

2.99 2.04 2.40 1.25 1.27 1.19 1.10 1.11 1.18 1.11 1.09 1.04

5.22 4.18 4.33 2.62 2.51 2.35 2.10 2.06 2.15 2.02 1.96 1.86

Table 2. Objective performance comparison of different DSR approaches on non-integer factors on NYU-v2, Middlebury and Lu RGB-D dataset in term of average RMSE (The lower, the better) and Parameters (Paras, 1M = 106 ). Methods

Paras(M) NYU-v2 Middlebury Lu Average 14.60× 17.05× 14.60× 17.05× 14.60× 17.05× 14.60× 17.05×

Bicubic GeoDSR [9] 5.52 DASU-Net 1.70

10.90 4.56 4.51

11.93 5.18 5.12

6.00 2.77 2.75

6.52 3.17 3.16

7.20 3.54 3.21

7.96 4.24 3.88

8.03 3.62 3.49

8.80 4.19 4.05

such as GF [3], the reconstructed depth map exhibits excessively surface smooth, and its error map are brightest. Although DMSG [4] and DJFR [7] rely on deep learning, simple structures of their network result in structure distortion of the reconstructed results. GeoDSR [9] uses a similar to the ’AND’ operation to fuse color-depth features, which has a good effect on texture removal, but it will inevitably weaken the reconstruction of boundary information, resulting in significant pixel errors around the reconstructed boundaries. We also show the results of 3.75× up-sampling from Lu RGB-D dataset in Fig. 5. From this figure, it can be found that the reconstruction results of ours are closer to GT image, since the proposed DASU-Net inherit the natural structure of optimized model. 3.3

Ablation Study

In this section, we train various variants on the NYU-v2 RGB-D dataset and use the Lu RGB-D dataset as the validation dataset to verify the effectiveness of the pro-

DASU-Net: Deep Arbitrary-Scale Unfolding Network

235

posed method. To demonstrate the effectiveness of CUSF module, we sequentially perform grid up-sampling, concatenation of channel dimensions, and several ResBlocks to replace CUSF, and the results are shown in Table 3. In addition, MSRM is replaced by several ResBlocks to learn proximal operators, and its performance is shown in Table 3. Table 3. The ablation study of DASU-Net in the Lu RGB-D dataset (The lower the RMSE, the better, and the higher the PSNR and SSIM, the better). CUSF MSRM RMSE PSNR SSIM 14.60× 17.05× 14.60× 17.05× 14.60× 17.05×   

  

3.50 3.24 3.21

4.08 3.89 3.88

36.49 36.69 37.98

35.31 35.59 36.51

0.9819 0.9829 0.9837

0.9784 0.9792 0.9800

We test two up-sampling factors in terms of PSNR and SSIM metrics, one within the training distribution (14.60×) and one outside the training distribution (17.05×). When both CUSF and MSRM exist in the proposed network, the network performance reaches its better level. If any module is replaced by ResBlock, the performances significantly decrease, especially for CUSF module. From the above results, it can conclude that our proposed CUSF and MSRM play different roles in performance improvement.

4 Conclusion To alleviate network interpretability, texture-copying and arbitrary-scale superresolution problems in DSR task, we propose an optimization model, and unfold it into DASU-Net for arbitrary-scale DSR. In addition, with the help of CUSF module, DASU-Net can effectively aggregate dual-modality features and adapt to arbitrary-scale up-sampling for DSR task. Specifically, CUSF module first effectively narrows the gap between the dual-modality, and a multiple strategy fusion component is introduced to effectively highlight and complement the advantages and disadvantages of different fusion strategies. We have demonstrated the rationality of our method, and numerous experiments have shown that the depth map reconstruction performance of our method is significantly improved for arbitrary-scale DSR task. Acknowledgements. This work was supported by National Natural Science Foundation of China Youth Science Foundation Project (No.62202323), Fundamental Research Program of Shanxi Province (No.202103021223284), Taiyuan University of Science and Technology Scientific Research Initial Funding (No.20192023, No.20192055), Graduate Education Innovation Project of Taiyuan University of Science and Technology in 2022 (SY2022027), National Natural Science Foundation of China (No.62072325).

236

J. Zhang et al.

References 1. Barron, J.T., Poole, B.: The fast bilateral solver. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision ?C ECCV 2016. pp. 617C632. Springer International Publishing, Cham (2016) 2. Diebel, J., Thrun, S.: An application of markov random fields to range sensing. In: Proceedings of the 18th International Conference on Neural Information Processing Systems. p. 291298. NIPS05, MIT Press, Cambridge, MA, USA (2005) 3. He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(6), 1397 C1409 (2013). https://doi.org/10.1109/TPAMI.2012.213 4. Hui, T.W., Loy, C.C., Tang, X.: Depth map super-resolution by deep multi-scale guidance. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision C ECCV 2016. pp. 353C369. Springer International Publishing, Cham (2016) 5. Kim, B., Ponce, J., Ham, B.: Deformable kernel networks for joint image filtering. International Journal of Computer Vision 129(2), 579 C 600 (2021), https://doi.org/10.1007/ s11263-020-01386-z 6. Kopf, J., Cohen, M.F., Lischinski, D., Uyttendaele, M.: Joint bilateral upsampling. In: ACM SIGGRAPH 2007 Papers. SIGGRAPH 07, Association for Computing Machinery, New York, NY, USA (2007) 7. Li, Y., Huang, J.B., Ahuja, N., Yang, M.H.: Joint image filtering with deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence 41(8), 1909C1923 (2019) 8. Tang, J., Chen, X., Zeng, G.: Joint implicit image function for guided depth super-resolution. In: Proceedings of the 29th ACM International Conference on Multi-media. ACM (oct 2021). DOI: https://doi.org/10.1145/3474085.3475584 9. Wang, X., Chen, X., Ni, B., Tong, Z., Wang, H.: Learning continuous depth representation via geometric spatial aggregator (2022), https://doi.org/10.48550/arXiv.2212.03499 10. Zhang, Z., Zheng, H., Hong, R., Xu, M., Yan, S., Wang, M.: Deep color consistent network for low-light image enhancement. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1889C1898 (2022). https://doi.org/10.1109/CVPR52688. 2022.00194 11. Zhao, Z., Zhang, J., Xu, S., Lin, Z., Pfister, H.: Discrete cosine transform network for guided depth map super-resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5687C5697 (2022). DOI: https://doi.org/10.1109/CVPR52688. 2022.00561 12. Zhong, Z., Liu, X., Jiang, J., Zhao, D., Chen, Z., Ji, X.: High-resolution depth maps imaging via attention-based hierarchical multi-modal fusion. IEEE Transactions on Image Processing 31, 648C663 (2022). DOI: https://doi.org/10.1109/TIP.2021.3131041 13. Zhou, M., Yan, K., Pan, J., Ren, W., Xie, Q., Cao, X.: Memory-augmented Deep Unfolding Network for Guided Image Super-resolution. arXiv e-prints arXiv:2203.04960 (Feb 2022)

SSDD-Net: A Lightweight and Efficient Deep Learning Model for Steel Surface Defect Detection Zhaoguo Li1,2,3 , Xiumei Wei1,2,3 , and Xuesong Jiang1,2,3,4(B) 1

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China [email protected] 2 Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China 3 Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China 4 State Key Laboratory of High-end Server & Storage Technology, Jinan, China Abstract. Industrial defect detection is a hot topic in the computer vision field. At the same time, it is hard work because of the complex features and various categories of industrial defects. To solve the above problem, this paper introduces a lightweight and efficient deep learning model (SSDD-Net) for steel surface defect detection. At the same time, in order to improve the efficiency of model training and inference in the XPU distributed computing environment, parallel computing is introduced in this paper. First, a light multiscale feature extraction module (LMFE) is designed to enhance the model’s ability to extract features. The LMFE module employs three branches with different receptive fields to extract multiscale features. Second, a simple effective feature fusion network (SEFF) is introduced to be the neck network of the SSDD-Net to achieve efficient feature fusion. Extensive experiments are conducted on a steel surface defect detection dataset, NEU-DET, to verify the effectiveness of the designed modules and proposed model. And the experimental results demonstrate that the designed modules are effective. Compared with other SOTA object detection models, the proposed model obtains optimal performance (73.73% in [email protected]) while keeping a small number of parameters (3.79M). Keywords: deep learning extraction · feature fusion

1

· steel surface defect detection · feature

Introduction

Steel has wide use in industry, so the surface quality inspection of steel is important work. However, the surface defect of steel has complex features and various categories. Using manual quality inspection, it is difficult to find out the steel c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 237–248, 2024. https://doi.org/10.1007/978-981-99-8549-4_20

238

Z. Li et al.

surface defects and distinguish the defect categories. Furthermore, manual quality inspection would waste a lot of manpower and be very expensive. Studying a method to replace the manual quality inspection is a crucial and important problem. In recent years, deep learning has been developed rapidly [1]. The object detection models based on deep learning appear endless [2]. They are divided into two kinds: two-stage detectors and one-stage detectors. The well-known twostage detectors include R-CNN [3], Fast R-CNN [4], and Faster R-CNN [5]. And the classical one-stage detectors include SSD [6], RetinaNet [7], and the YOLO series [8–13]. Because of their high accuracy, deep learning methods are widely used in the detection of steel surface defects. Wang et al. [14] proposed a fewshot defect detection framework for Steel Surface Defect Detection. Hatab et al. [15] used YOLO network to achieve steel surface defect detection. In [16], Wang et al. proposed a real-time steel surface defect detection technology based on the YOLO-v5 detection network. In [17], a YOLOv4 defect detection algorithm based on weighted fusion was proposed. It uses a GAN network to generate a mask map in real time, and performs weighted fusion of the feature maps in YOLOv4. Metal surface defect detection has always been an important branch of target detection. Deng et al. [18] used the Cascade-YOLOv4 (C-YOLOv4) network model to achieve iron surface crack detection. Kou et al. [19] developed an end-to-end detection model based on YOLO-V3 to detect steel surface defect. However, due to the complex multi-scale characteristics of steel surface defects, it is difficult for the existing steel surface defect detection models to achieve good detection results. Moreover, the above models often do not handle the multi-scale feature fusion problem of steel surface defects well. Aiming at the detection of steel surface defects, this paper proposes a lightweight and efficient deep learning model (SSDD-Net). The steel surface defects are multiscale. To enhance the multiscale feature extraction ability of the model, this paper designs a light multiscale feature extraction module (LMFE). The LMFE uses three different branches with multiscale convolution kernels to extract multiscale features. To achieve effective feature fusion, this paper designs a simple effective feature fusion network (SEFF). The SEFF network causes the middle feature map to interact with the top and bottom feature maps, which achieves the feature fusion of stronger location information and stronger semantic information and benefits the detection performance of the model. The main contributions of this paper are as follows: – A lightweight and efficient deep learning model (SSDD-Net) is proposed to detect steel surface defects. Extensive experiments demonstrate that the proposed model is superior to other SOTA object detectors in the detection of steel surface defects. – A light multiscale feature extraction module (LMFE) with three different branches is designed. The LMFE module employs multiscale receptive fields to enhance the model’s ability to extract multiscale features. – A simple effective feature fusion network (SEFF) is introduced to be the neck network of the proposed model and achieve efficient feature fusion operation.

SSDD-Net

239

Several experiments prove that the SEFF network has better performance than other well-known neck networks in the task of steel surface defect detection. The rest of this paper is organized as follows. The proposed method and modules are presented in Sect. 2. Datasets used in the experiments, performance evaluation metrics, and extensive experimental results are provided in Sect. 3. Finally, Sect. 4 concludes the paper.

2

Methods

In this section, the proposed SSDD-Net is introduced in detail. The overview diagram of the SSDD-Net is shown in Fig. 1. The SSDD-Net is composed of three parts: the backbone network, the neck network, and the head network. The backbone network is used to extract features and pass the obtained multiple feature maps (P3, P4, and P5) to the neck network. The neck network is responsible for fusing features. And the head network outputs the predicted values, which include locations, confidence, and classes.

Fig. 1. The overview diagram of the proposed SSDD-Net. Conv2d is the convolutional layer. P represents the feature map.

2.1

LMFE: Light Multiscale Feature Extraction Module

Steel surface defects have multiscale sizes, the average sizes of each defect category in the steel surface defect detection dataset (NEU-DET [20]) are shown

240

Z. Li et al.

in Table 1. To enhance the model’s ability to extract multiscale features, this paper designs a light multiscale feature extraction module (LMFE) as shown in Fig. 2. The LMFE has three different branches, and the receptive fields of the three branches are multiscale. The first branch and second branch first use a 1 × 1 Conv module to reduce channel’s amount. Then, the first branch uses three cascade 3 × 3 depthwise convolutional layer [21] to extract features. The purpose of the residual connection operation [22] is to enhance the memory capability of the LMFE module for the features of the front layers. Unlike the first branch, the second branch uses three cascade 5 × 5 depthwise convolutional layer to extract features. The third branch only has a 1 × 1 Conv module. So, the outputs of the three branches are multiscale features. Then, the outputs of the three branches are spliced together. The designed LMFE module employs three multiscale branches to extract features, which can effectively enhance the model’s ability to extract multiscale features. Table 1. The average size of each category in the NEU-DET dataset. crazing patches inclusion pitted_surface rolled-in_scale scratches Average width 125

56

29

127

73

61

Average height

83

85

173

78

115

75

Fig. 2. The structure diagram of the designed LMFE module. Conv is made up of the convolutional layer, the BatchNorm layer, and the SiLU activation function. DWConv includes a depthwise convolutional layer, a BatchNorm layer, and the SiLU activation function.

2.2

SEFF: Simple Effective Feature Fusion Network

The stronger semantic information is beneficial for predicting categories of defects, and the powerful location information is proper for predicting where

SSDD-Net

241

the defects are. Effectively fusing the stronger semantic information and location information is helpful to improve the detection performance of the model. To achieve effective feature fusion in steel defect detection, this paper builds a novel feature fusion method, as shown in Fig. 3, namely the SEFF network. Among the feature maps generated in the backbone network, the larger feature map has stronger location information, and the smaller feature map has more semantic information. So, the middle feature map is their compromise, it has stronger location information than the smaller feature map and more semantic information than the larger feature map. The SEFF network uses the middle feature maps as a medium for large feature maps and small feature maps to transmit information. The middle feature map P4 is sent to the upper branch and the lower branch for feature fusion, respectively. The SEFF uses P4 to transmit the stronger location information to the powerful semantic information of the upper branch and the stronger semantic information to the powerful location information of the lower branch. The SEFF achieves the feature fusion of stronger location information and stronger semantic information in both the upper and lower branches. In addition, information is transferred top-to-middle and bottom-to-middle in the vertical direction. The top-to-middle path passes the stronger semantic information to the middle feature map, and the bottomto-middle path passes the stronger location information to the middle feature map. So, the vertical direction also achieves the feature fusion of stronger location information and stronger semantic information. At the same time, P4 is sent to the vertical direction of the SEFF for feature fusion. The overview of the SEFF network is presented as follows: P 6 = Concat(P 5, Downsample(P 4))

(1)

P 8 = Concat(U psample(P 4), P 3)

(2)

P 7 = Concat(U psample(P 6), P 4, Downsample(P 8))

(3)

where the U psample operation represents a 1 × 1 Conv and a nearest neighbor interpolation operation, and the Downsample operation uses a 3 × 3 Conv with a stride of 2.

Fig. 3. The overview diagram of the designed SEFF network.

242

Z. Li et al.

2.3

SSDD-Net

This paper employs the designed LMFE module and the SEFF network to build a lightweight and efficient deep learning model (SSDD-Net) for steel surface defect detection. The structure diagram of the SSDD-Net is shown in Fig. 4. First, the backbone network is composed of the Conv module with a stride of 2, the designed LMFE module, and the SPPF module of the YOLOv5 model [12]. The Conv module with a stride of 2 is responsible for the downsampling operation. The LMFE module is used to extract features. And the function of the SPPF module is to fuse multiscale information. Second, the neck network is the designed SEFF, which can achieve effective feature fusion. Finally, the YOLO head is used as the head network of the SSDD-Net.

Fig. 4. The structure of the proposed SSDD-Net. Conv includes convolutional layer, BatchNorm layer, and SiLU activation function. Conv2d represents the convolutional layer.

3 3.1

Experiments and Analysis Implementation Details

All experiments are conducted on an NVIDIA A100 GPU, the Python version is 3.8.15, and the PyTorch version is 1.12.1. Before being fed into the model, the images are resized to 640 × 640. All models are trained for 500 epochs in total. The optimizer is SGD, and the learning rate is set to 0.01. The momentum is set to 0.937, and the weight decay is 0.0005. The training batch size is set to 16, and the test batch size is set to 32. Moreover, the proposed model accelerates computational efficiency through parallel computing. 3.2

Evaluation Metrics

To verify the effectiveness of the designed modules and the proposed SSDD-Net, this paper uses mAP and the number of parameters as evaluation metrics. The specific equations are shown as follows: P =

TP TP + FP

(4)

SSDD-Net

R=  AP =

0

1

243

TP TP + FN

(5)

P (R)dR × 100%

(6)

mAP =

N

i=1

AP

(7) N where T P is true positive, which means the model predicts a defect and the prediction is correct. F P represents false positive, which means that the model predicts a defect but the prediction is incorrect. F N stands for false negative, which means that the defects are not detected. AP denotes the detection performance of the model for one category, and mAP represents the detection performance of the model for all categories. 3.3

Dataset

To verify the effectiveness of the proposed modules and the model, this study uses the steel surface defect detection dataset, NEU-DET [20], to conduct experiments. Some examples of steel surface defects are shown in Fig. 5. There are 6 categories in the dataset, namely ‘crazing’, ‘patches’, ‘inclusion’, ‘pitted_surface’, ‘rolled-in_scale’, ‘scratches’. The dataset has 1800 images in total. We divided the dataset, and 1448 images are in the training dataset and 352 images are in the test dataset.

Fig. 5. Some examples of the NEU-DET dataset. (a)–(f) is ‘crazing’, ‘patches’, ‘inclusion’, ‘pitted_surface’, ‘rolled-in_scale’, ‘scratches’, respectively.

3.4

Ablation Studies

To verify the effectiveness of the designed LMFE module and SEFF network, several ablation experiments are conducted on the NEU-DET dataset. The experiments are set up in two groups. First, the STDD-Net with no neck is trained on the dataset, which is called Model A. Second, the designed SEFF network is integrated into Model A to be the neck network of Model A. The model is called Model B. The experimental results are shown in Table 2. From Table 2, it can be seen that the Model A obtains 71.06% in [email protected]. This is enough

244

Z. Li et al.

to prove the effectiveness of the designed LMFE module because the Model A is composed of LMFE modules. And the effectiveness of the LMFE module is due to its multiscale receptive fields. Furthermore, we can see from Table 2, the recall increases 2.87%, the [email protected] increases 2.67%, and the [email protected]:0.95 increases 0.87% after integrating the designed SEFF network into Model A. Therefore, the designed SEFF network is effective as the neck network of the STDD-Net to conduct feature fusion. Table 2. The ablation experiments on the NEU-DET dataset. Model

Model Structure

A

SSDD-Net with no neck 68.31

B (SSDD-Net) A + SEFF

Recall (%)

[email protected] (%) [email protected]:0.95 (%) 71.06

37.37

71.18(+2.87) 73.73(+2.67)

38.24(+0.87)

To further verify the effectiveness of the SEFF network, we set up three groups of experiments on the NEU-DET dataset. The three models are SSDDNet with FPN [23], SSDD-Net with PANet [24], and SSDD-Net with the proposed SEFF, respectively. The experimental results are shown in Table 3, and it can be seen that the SEFF network obtains the best performance. Therefore, the proposed SEFF network is more superior than other neck networks. Table 3. The comparison experiments between our SEFF network and other feature fusion networks. Model Model Structure

Recall (%) [email protected] (%) [email protected]:0.95 (%)

A

SSDD-Net with FPN

68.62

73.05

37.34

B

SSDD-Net with PANet 67.56

72.39

37.88

C

SSDD-Net with SEFF

73.73

38.24

3.5

71.18

Comparison with Other SOTA Methods

Extensive comparison experiments are conducted on the NEU-DET dataset [20] to verify the effectiveness of the proposed SSDD-Net. The chosen models to be compared include lightweight models, the YOLOv3-tiny [11], and the YOLOv5s [12], and heavy models, YOLOv3 [11], YOLOv3-spp, YOLOv5m [12], and YOLOv5l [12]. The experimental results are shown in Table 4 and Fig. 6. From Table 4, it can be seen that our SSDD-Net obtains optimal performance while keeping the smallest number of parameters. It achieves 73.73% [email protected] and keeps 3.79M parameters. Compared with the other SOTA models, the proposed SSDD-Net is lighter and stronger. Although compared with the YOLOv5l,

SSDD-Net

245

the SSDD-Net still obtains better performance. At the same time, the parameters of the SSDD-Net are much smaller than that of YOLOv5l. In conclusion, the proposed SSDD-Net is a lightweight and efficient deep learning model for steel surface defect detection. Table 4. The comparison experiments with other models on the NEU-DET dataset.

Model

Neck

YOLOv3

FPN

Param (M) Recall (%) [email protected] (%) [email protected]:0.95 (%) 61.55

67.13

68.37

34.67

YOLOv3-tiny FPN

8.68

62.82

53.3

22.39

YOLOv3-spp FPN

62.6

65.32

69.94

36.1

YOLOv5s

PANet 7.04

69.24

70

36.41

YOLOv5m

PANet 20.89

67.35

70.93

36.99

YOLOv5l

PANet 46.17

66.79

71.91

37.65

SSDD-Net

SEFF 3.79

71.18

73.73

38.24

Fig. 6. The [email protected] comparison of our SSDD-Net and other SOTA methods.

3.6

Comprehensive Performance of SSDD-Net

The [email protected] curve of our SSDD-Net is shown in Fig. 7. It can be clearly seen that accuracy increases as the epoch increases. Some detection results of the SSDD-Net are shown in Fig. 8, and we can find that the SSDD-Net has good

246

Z. Li et al.

detection performance for the steel surface defects. The reason is that the SSDDNet employs the designed LMFE module to enhance its multiscale feature extraction ability and uses the proposed SEFF as the neck network to achieve effective feature fusion.

Fig. 7. The [email protected] curve of our SSDD-Net.

Fig. 8. Some detection results of the proposed SSDD-Net.

SSDD-Net

4

247

Conclusion

This paper proposes a lightweight and efficient deep learning model for steel surface defect detection. First, to enhance the model’s ability to extract multiscale features, this paper designs a light multiscale feature extraction module (LMFE), which employs three branches with different receptive fields to extract multiscale features. Second, this paper introduces the simple effective feature fusion network (SEFF) to achieve effective feature fusion. Extensive experiments are conducted on the NEU-DET dataset to verify the effectiveness of the designed modules and the proposed SSDD-Net. And the experimental results show that the designed modules are effective, and the proposed SSDD-Net obtains optimal performance while keeping the smallest number of parameters. In the future, we will further optimize the model’s structure to make it have better performance. Acknowledgements. This work was supported by the project ZR2022LZH017 supported by Shandong Provincial Natural Science Foundation.

References 1. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) 2. Zou, Z., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: a survey. arXiv preprint arXiv:1905.05055 (2019) 3. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) 4. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) 5. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015) 6. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2 7. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) 8. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 9. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) 10. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOV4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) 11. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

248

Z. Li et al.

12. Ultralytics: YOLOv5 v6.2. https://github.com/ultralytics/yolov5. Accessed 17 Aug 2022 13. Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: YOLOv7: trainable bag-offreebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022) 14. Wang, H., Li, Z., Wang, H.: Few-shot steel surface defect detection. IEEE Trans. Instrum. Meas. 71, 1–12 (2021) 15. Hatab, M., Malekmohamadi, H., Amira, A.: Surface defect detection using YOLO network. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2020. AISC, vol. 1250, pp. 505–515. Springer, Cham (2021). https://doi.org/10.1007/978-3-03055180-3_37 16. Wang, L., Liu, X., Ma, J., Su, W., Li, H.: Real-time steel surface defect detection with improved multi-scale YOLO-v5. Processes 11(5), 1357 (2023) 17. Wang, C., Xu, J., Liang, X., Yin, D.: Metal surface defect detection based on weighted fusion. In: 2020 International Conference on Virtual Reality and Visualization (ICVRV), pp. 179–184. IEEE (2020) 18. Deng, H., Cheng, J., Liu, T., Cheng, B., Sun, Z.: Research on iron surface crack detection algorithm based on improved YOLOv4 network. J. Phys: Conf. Ser. 1631, 012081 (2020) 19. Kou, X., Liu, S., Cheng, K., Qian, Y.: Development of a YOLO-v3-based model for detecting defects on steel strip surface. Measurement 182, 109454 (2021) 20. He, Y., Song, K., Meng, Q., Yan, Y.: An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 69(4), 1493–1504 (2019) 21. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 23. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017) 24. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)

Effective Small Ship Detection with Enhanced-YOLOv7 Jun Li1 , Ning Ding2 , Chen Gong1 , Zhong Jin1 , and Guangyu Li1(B) 1 Key Laboratory of Intelligent Perception and Systems for High -Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China {jun li,chen.gong,zhongjin,guangyu.li2017}@njust.edu.cn 2 National Ocean Technology Center, Tianjin, China

Abstract. Small ship detection is widely used in marine environment monitoring, military applications and so on, and it has gained increasing attentions both in industry and academia. In this paper, we propose an effective small ship detection algorithm with enhanced-YOLOv7. Specifically, to reduce the feature loss of small ships and the impact of marine environment, we firstly design a small object-aware feature extraction module by considering both small-scale receptive fields and multi-branch residual structures. In addition, we propose a small object-friendly scaleinsensitive regression scheme, to strengthen the contributions of both bounding box distance and difficult samples on regression loss as well as further increase learning efficiency of small ship detection. Moreover, based on the formulated penalty model, we design a geometric constraintbased Non-Maximum Suppression (NMS) method, to effectively decrease small ship detection omission rate. Finally, extensive experiments are implemented, and corresponding results confirm the effectiveness of the proposed algorithm.

Keywords: Small Ship Detection

1

· Improved YOLOv7 Network · Efficiency

Introduction

The field of deep learning-based object detection algorithms has undergone rapid development [2,4,7,9,11,12,15,17], leading to significant advancements in ship detection. This progress has positioned deep learning-based ship detection as a critical pillar in the realm of intelligent maritime applications [14]. These applications encompass various domains, such as maritime rescue and autonomous sailing. The majority of deep learning-based ship detectors primarily focus on conventionally sized ships. However, there has been relatively limited research conducted on small ships. Due to their characteristic of exhibiting limited appearance information, the task of detecting small ships becomes more challenging. Supported by the National Science Fund of China under Grant 62006119. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 249–260, 2024. https://doi.org/10.1007/978-981-99-8549-4_21

250

J. Li et al.

Consequently, small ship detection remains an unresolved and demanding problem in the field. To tackle this problem, a series of works have been proposed. Bai [1] proposes MTGAN, a framework comprising a generator and a discriminator. The generator is responsible for up-sampling low-resolution images, the discriminator assesses super-resolution image patches. The two components engage in a competitive process to obtain details of small objects. Deng [3] proposes an extended feature pyramid network with an extra high-resolution pyramid level specialized for small object detection, which uses deconvolution to obtain features specifically for small object. Rabbi [10] applies EESRGAN to alleviate the lack of high-frequency edge information of small objects in the reconstruction of low-resolution images, improving the detection performance of small objects. Feature pyramid network (FPN) [6] integrates the feature information of objects of different scales through the top-down horizontal connection structure, detects objects of corresponding scales on the feature map of different depths, thus improving the detection ability of small objects. PANet [8] adds a bottom-up structure to strengthen feature fusion, and enriches the feature hierarchy through a bidirectional path. The feature layer further improves the fusion of small objects. QueryDet [19] designs a cascade sparse query strategy to use the location information on the low-resolution feature map to guide the efficient detection of small objects on the high-resolution feature map. EfficientDet [15] proposes BiFPN, which uses bidirectional path and weight feature fusion to enhance the importance of small object features in fusion. However, the limited appearance information of small objects often leads to the loss of their semantic details during multi-scale feature processing. This, in turn, makes small ship detection a challenging task. Another contributing factor is the discrepancy between the receptive field scale of conventional anchor-based object detectors and the scale sensitivity of the Intersection over Union (IOU) metric. To address the issues, we propose an effective small ship detection algorithm with enhanced-YOLOv7. Specifically, we firstly propose a small object-aware feature extraction module that mitigates feature loss during multi-scale processing and reduces the impact of marine background noise by incorporating small-scale receptive fields and multi-branch residual structures. Additionally, to address the challenges associated with scale sensitivity in sample allocation and learning efficiency for small ships, We design a small object-friendly scale-insensitive regression scheme. This scheme enhances the contributions of bounding box distance and difficult samples. Furthermore, we construct a geometric constraintbased NMS method. Based on the formulated penalty model, the method reduce the influence of area factors, leading to a decrease in the omission rate of small ships. Our main contributions are as follows: – We design a small object-aware feature extraction module, to effectively reduce the feature loss of small ships during multi-scale processing and alleviate the impact of marine background noise by incorporating small-scale receptive fields and multi-branch residual structures.

Effective Small Ship Detection with Enhanced-YOLOv7

251

– In order to overcome the difficulties of sample allocation and low learning efficiency of small ships, we propose a small object-friendly scale-insensitive regression scheme to strengthen the contributions of both bounding box distance and difficult samples. – So as to further decrease the omission rate of small ships, we construct a geometric constraint-based NMS method by considering the formulated penalty model.

2

Method

Fig. 1. The pipeline of an effective small ship detection algorithm with enhancedYOLOv7. X denotes the small object-aware feature extraction module, including a detailed receptive field-based spatial pyramid pooling block represented by X1 and a high-resolution feature extraction residual block represented by X2. Y denotes the geometric constraint-based NMS method.

To address the challenges in detecting small ships, which arise due to their occupation of fewer pixels and the lack of appearance information, we propose an effective small ship detection algorithm with enhanced-YOLOv7. The structure is depicted in Fig. 1 and comprises several components: a backbone network for initial feature extraction, a small object-aware feature extraction residual module, a feature pyramid network for extracting enhanced features at multiple scales, a geometric constraint-based non-maximum suppression (NMS) method, and detection heads that generate the final predictions. Our proposed modules, namely the small object-friendly scale-insensitive regression scheme and the geometric constraint-based NMS method, enhance the accuracy of small ship detection through gradient calculations and backpropagation during training.

252

J. Li et al.

Fig. 2. The overview of a small object-aware feature extraction residual module.

2.1

Small Object-Aware Feature Extraction Module (SOAFE)

During multi-scale feature processing, small ships occupy few pixels, so their features are easily lost. Moreover, the complex marine environment poses challenges, as the features of small ships are prone to contamination by background noise, making them difficult to discern. To address these issues, we design a small object-aware feature extraction residual module (SOAFE). This module incorporates a detailed receptive field-based spatial pyramid pooling block and a high-resolution feature extraction residual block, as illustrated in Fig. 2.

Detailed Receptive Field-Based Spatial Pyramid Pooling Block. The block (Fig. 2a) is positioned within the deep, low-resolution feature map. It operates on features through two branches: one conducts a convolution operation to adjust the channels, while the other deviates from the original parallel structure. The latter branch incorporates three consecutive maximum pooling layers with smaller N × N kernel sizes and leverages the detailed receptive field to capture local feature information specific to small ships. This approach mitigates the loss of feature information caused by inconsistent receptive field sizes. Simultaneously, the cascading structure overcomes the limitation of disconnected parallel branches by integrating the feature information acquired from all pooling layers, which facilitates better preservation and acquisition of small ship feature information. Consequently, the block effectively addresses the issue of feature loss during multi-scale feature processing, thereby bolstering the model's capacity to learn distinctive features associated with small ships.

High-Resolution Feature Extraction Residual Block. The block shown in Fig. 2b is situated within the shallow, high-resolution feature map. This positioning is deliberate, as the feature map offers more comprehensive and influential feature information pertaining to small ships. The block operates on features using two branches: one conducts a standard convolution operation to adjust the channels, while the other consists of a cascade of three residual blocks.


This multi-branch structure effectively extracts the features specific to small ships. Furthermore, the integration of feature information from multiple residual blocks enriches the semantic information obtained. Consequently, the block prioritizes the extraction of small ship features and mitigates the negative impact of complex marine backgrounds, thus exhibiting robustness.
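To make the cascaded pooling design concrete, the following is a minimal PyTorch sketch of such a block. It is our own illustrative reconstruction, not the authors' implementation: the class name and channel split are assumptions, and the default kernel size uses the N = 5 reported in the experimental settings.

```python
import torch
import torch.nn as nn

class DetailedRFSPP(nn.Module):
    """Sketch of a detailed receptive field-based spatial pyramid pooling block:
    a 1x1 convolution adjusts channels, three cascaded small-kernel max-pooling
    layers gradually enlarge the receptive field, and all intermediate features
    are concatenated and fused."""

    def __init__(self, in_ch, out_ch, n=5):
        super().__init__()
        hidden = out_ch // 2
        self.reduce = nn.Conv2d(in_ch, hidden, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=n, stride=1, padding=n // 2)
        self.fuse = nn.Conv2d(hidden * 4, out_ch, kernel_size=1)

    def forward(self, x):
        x = self.reduce(x)
        p1 = self.pool(x)    # first small-kernel pooling
        p2 = self.pool(p1)   # cascading integrates information from every level
        p3 = self.pool(p2)
        return self.fuse(torch.cat([x, p1, p2, p3], dim=1))
```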

2.2 Small Object-Friendly Scale-Insensitive Regression Scheme (SOFSIR)

In anchor-based object detectors, a commonly utilized regression scheme is based on the Intersection over Union (IOU) metric. Nevertheless, the IOU metric is scale sensitive and highly susceptible to positional deviations for small objects, which has two main drawbacks: (i) Firstly, the network relies on the IOU metric as the threshold for allocating positive and negative samples during feature learning. However, the characteristics of small ships, such as their limited pixel representation, often lead to conflicting labels. Consequently, the features of positive and negative samples for small ships become indistinguishable, impeding network convergence. (ii) Secondly, small ships pose challenges as difficult training samples due to their limited pixel coverage and insufficient appearance information. Nevertheless, the impact of geometric factors, including the IOU metric, is magnified during backpropagation. Consequently, there is an increased penalty imposed on small ships, constraining their gradient gain and impeding efficient learning for this category of objects.

To mitigate issue (i), considering that the Wasserstein distance is scale invariant, we model the two bounding boxes $b_i$ and $b_{gt}$ as two-dimensional Gaussian distributions $\mathcal{N}_i$ and $\mathcal{N}_{gt}$, respectively, and use the Wasserstein distance between the two Gaussian distributions to represent the similarity between the two bounding boxes $b_i$ and $b_{gt}$ [18], which is given by:

$$WD\left(\mathcal{N}_i, \mathcal{N}_{gt}\right)=\left\|\left(\left[x_i, y_i, \frac{w_i}{2}, \frac{h_i}{2}\right]^{T},\left[x_{gt}, y_{gt}, \frac{w_{gt}}{2}, \frac{h_{gt}}{2}\right]^{T}\right)\right\|_2^2, \tag{1}$$

where $x$ and $y$ denote the horizontal and vertical coordinates of the two center points, $w$ and $h$ denote the widths and heights of the two bounding boxes, and $\|\cdot\|_2$ is the Euclidean norm. We employ exponential normalization to conduct the regression calculation, which is defined as follows:

$$L_{WD\text{-}based} = 1 - \exp\left(-\frac{\sqrt{WD\left(\mathcal{N}_i, \mathcal{N}_{gt}\right)}}{C}\right), \tag{2}$$

where $C$ is a constant set to the average size of the objects in the dataset. By adopting this approach, we enhance scale insensitivity and ensure accurate label allocation, thereby facilitating network convergence.

To mitigate issue (ii), i.e., to reduce the contribution of geometric factors and simple samples to the loss function, we first construct the loss function


of the two-layer attention mechanism $L_{DA\text{-}based}$ based on the distance attention $DA_{IoU}$, which is defined as:

$$L_{DA\text{-}based} = DA_{IoU} \cdot L_{IoU}, \tag{3}$$

where $L_{IoU}$ denotes the loss function based on the IOU metric, and $DA_{IoU}$ denotes the distance attention, which is defined as follows:

$$DA_{IoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{*}}\right), \tag{4}$$

where $x$ and $y$ denote the horizontal and vertical coordinates of the two center points, and $W_g$ and $H_g$ represent the width and height of the smallest enclosing box that covers both boxes. The superscript $*$ indicates that the corresponding term does not participate in the gradient calculation of backpropagation. The distance attention $DA_{IoU}$ focuses on distance factors [16], enhances the gradient gain of small ships, and relieves the punishment of $L_{IoU}$ on small ships. At the same time, $L_{IoU}$ is used to limit the $DA_{IoU}$ of high-quality bounding boxes, which reduces the contribution of simple samples. Moreover, the monotonic focusing mechanism [7] is used to focus more on difficult samples, especially small ships. The formula is as follows:

$$L_{Focal\text{-}DA\text{-}based} = \left(\frac{L_{IoU}^{*}}{\overline{L}_{IoU}}\right)^{\gamma} L_{DA\text{-}based}, \quad \gamma > 0, \tag{5}$$

where $\overline{L}_{IoU}$ denotes the normalizing factor, which is the mean value of $L_{IoU}$. The dynamic update of $\overline{L}_{IoU}$ keeps the gradient gain at a high level, alleviating the issue of slow convergence during the latter stages of training. This approach focuses on both distance factors and difficult samples, ultimately enhancing the learning efficiency for small ships. Finally, the formula for the small object-friendly scale-insensitive regression scheme (SOFSIR) is defined as:

$$L_{WDFDA} = 1 - \alpha \cdot L_{WD\text{-}based} - \beta \cdot L_{Focal\text{-}DA\text{-}based}, \tag{6}$$

where $\alpha$ and $\beta$ represent the proportions of the two terms $L_{WD\text{-}based}$ and $L_{Focal\text{-}DA\text{-}based}$ in SOFSIR, respectively.
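A minimal sketch of the Wasserstein-distance term of Eqs. (1)-(2) is given below, assuming boxes in (x, y, w, h) center form. The distance-attention and focal terms of Eqs. (3)-(5) and the weighting of Eq. (6) are omitted, and the function name is ours, not the authors'.

```python
import torch

def wd_based_loss(pred, target, C=1.0):
    """Sketch of Eqs. (1)-(2): pred and target are (N, 4) tensors of (x, y, w, h) boxes.
    Each box is modeled as a 2-D Gaussian with centre (x, y) and half-extents (w/2, h/2);
    the (squared) Wasserstein distance is exponentially normalized so the loss is
    insensitive to absolute scale."""
    g_pred = torch.cat([pred[:, :2], pred[:, 2:] / 2], dim=1)
    g_tgt = torch.cat([target[:, :2], target[:, 2:] / 2], dim=1)
    wd2 = ((g_pred - g_tgt) ** 2).sum(dim=1)            # Eq. (1)
    nwd = torch.exp(-torch.sqrt(wd2 + 1e-7) / C)        # normalized similarity
    return 1.0 - nwd                                     # L_WD-based, Eq. (2)
```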

2.3 Geometric Constraint-Based Non-Maximum Suppression Method (GCNMS)

Prior to generating the results, the non-maximum suppression (NMS) method is employed to eliminate redundant objects and retain only the local maximum object by comparing the IOU metric with a predefined threshold value. However, due to the limited pixel representation of small ships, even a slight positional change can lead to significant errors in the IOU metric, resulting in inadvertent deletions. Consequently, the omission rate of small ships tends to increase.


To address these challenges, a geometric constraint-based NMS method (GCNMS) is constructed. This method redesigns the threshold by incorporating a formulated penalty model, thus reducing the reliance on the IOU metric and enhancing the recall rate of small ships. Firstly, we introduce the Euclidean distance between the two bounding boxes, thereby diminishing the influence of area factors and rendering the metric more suitable for small ships with limited pixel coverage. Secondly, to mitigate the issue of aspect ratio disparity hindering the fitting of small ships, we introduce penalty terms for the height $h$ and width $w$. These terms measure the differences in height and width between the two bounding boxes [20], ultimately minimizing the disparity and improving the model's ability to accurately fit small ships. The GCNMS formula is as follows:

$$s_i = \begin{cases} 0, & GC_{metric}(M, B_i) \geq \varepsilon \\ s_i, & GC_{metric}(M, B_i) < \varepsilon \end{cases}, \tag{7}$$

where $s_i$ represents the detection score of the $i$-th bounding box, $B_i$ represents the $i$-th bounding box, $M$ represents the bounding box with the highest detection score, and $\varepsilon$ represents the threshold value. The metric $GC_{metric}(M, B_i)$ is defined as follows:

$$GC_{metric}(M, B_i) = IoU(M, B_i) - \frac{\rho^2(m, b_i)}{c^2} - \frac{\rho^2(w, w_i)}{C_w^2} - \frac{\rho^2(h, h_i)}{C_h^2}, \tag{8}$$

where m and bi represent the center point of the bounding box with the highest detection score and the i-th bounding box, respectively. ρ (·) represents the Euclidean distance. c represents the area of the smallest enclosing box that covers both boxes. Cw and Ch are the width and height of the smallest enclosing box. We enhance the significance of distance factors while diminishing the influence of area factors, thereby creating a measurement approach that is better suited for evaluating the similarity between small ships with fewer pixels. This adjustment effectively reduces the omission rate associated with small ships.
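The following sketch illustrates Eqs. (7)-(8) in plain Python. It assumes boxes in (x, y, w, h) center form, treats c as the enclosing-box area as stated above, and uses the paper's reported threshold ε = 0.4 as a default; it is an illustration, not the authors' code.

```python
def box_iou(a, b):
    # Standard IoU of two (x, y, w, h) centre-form boxes.
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter + 1e-9)

def gc_metric(m, b, eps=1e-9):
    """Eq. (8): IoU penalized by centre distance and by separate width/height
    differences, each normalized with the smallest enclosing box."""
    x1 = min(m[0] - m[2] / 2, b[0] - b[2] / 2)
    y1 = min(m[1] - m[3] / 2, b[1] - b[3] / 2)
    x2 = max(m[0] + m[2] / 2, b[0] + b[2] / 2)
    y2 = max(m[1] + m[3] / 2, b[1] + b[3] / 2)
    cw, ch = x2 - x1, y2 - y1
    c = cw * ch                      # the text defines c as the enclosing-box area
    center = (m[0] - b[0]) ** 2 + (m[1] - b[1]) ** 2
    return (box_iou(m, b)
            - center / (c ** 2 + eps)
            - (m[2] - b[2]) ** 2 / (cw ** 2 + eps)
            - (m[3] - b[3]) ** 2 / (ch ** 2 + eps))

def gcnms_suppress(best_box, boxes, scores, thresh=0.4):
    """Eq. (7): zero the score of any box whose GC metric with the current
    highest-scoring box reaches the threshold; other boxes keep their scores."""
    for i, b in enumerate(boxes):
        if gc_metric(best_box, b) >= thresh:
            scores[i] = 0.0
    return scores
```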

3 Experiments

3.1 Experimental Settings

Datasets. We evaluate the performance of our proposed algorithm on two datasets. (i) SeaShips [14]: a widely recognized large-scale maritime surveillance dataset comprising precisely annotated visual images. This dataset consists of 6 classes, with 7,000 publicly available images. We divide the dataset into training, validation, and test sets in a ratio of 1:1:2. (ii) Pascal VOC2007: a renowned object detection dataset. We extract all ship objects from this dataset, resulting in a subset of 549 images containing only boat labels. The dataset is partitioned into training, validation, and test sets at a ratio of 9:1:1.

Table 1. Detection results on SeaShips.

Model                        AP     AP0.5  AP0.75  APs    ARs
SSD300(VGG16) [9]            0.588  0.935  0.673   0.013  0.114
SSD300(MobileNetv2)          0.494  0.891  0.491   0.004  0.057
Faster-RCNN(ResNet50) [12]   0.591  0.949  0.658   0.031  0.050
Faster-RCNN(VGG16)           0.588  0.946  0.650   0.019  0.036
CenterNet [4]                0.697  0.961  0.824   0.138  0.150
YOLOv3 [11]                  0.572  0.941  0.631   0.114  0.143
YOLOv4 [2]                   0.506  0.921  0.506   0.072  0.114
YOLOv5                       0.649  0.952  0.762   0.101  0.179
YOLOv7 [17]                  0.675  0.960  0.794   0.085  0.150
Ours                         0.711  0.962  0.836   0.178  0.232

Implementation Details. All experiments are conducted on an NVIDIA RTX 2080Ti GPU with CUDA 10.0, cuDNN 7.5.1, and PyTorch 1.2.0. All models are trained for 300 epochs with a batch size of 4 and an initial learning rate of 1e-2, which is reduced to a minimum of 1e-4 by a cosine annealing schedule. We use the SGD optimizer with momentum 0.937 and weight decay 5e-4. YOLOv7 is the base network of the proposed algorithm. We set N = 5, C = 1.0, γ = 0.5, α = 0.8, β = 0.2, and ε = 0.4. To demonstrate the efficacy of the proposed algorithm, we conduct experimental comparisons with other classical object detectors on SeaShips and Pascal VOC2007.

Evaluation Indicators. We adopt the evaluation indicators of the COCO dataset, including AP, AP0.5, AP0.75, APs, and ARs. AP is the mean AP over the IoU thresholds {0.5, 0.55, ..., 0.95}; AP0.5 and AP0.75 are the APs at IoU thresholds of 0.5 and 0.75, respectively. APs and ARs evaluate the precision and recall for small-scale objects (smaller than 32 × 32 pixels), respectively.
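For convenience, the reported settings can be collected into a single configuration. The dictionary below only restates the values given above; the key names are ours, not from any released code.

```python
# Illustrative training configuration assembled from the settings reported above.
config = {
    "gpu": "NVIDIA RTX 2080Ti",
    "epochs": 300,
    "batch_size": 4,
    "optimizer": "SGD",
    "momentum": 0.937,
    "weight_decay": 5e-4,
    "lr_init": 1e-2,            # cosine-annealed down to lr_min
    "lr_min": 1e-4,
    "pooling_kernel_N": 5,      # N in the detailed receptive-field SPP block
    "wd_constant_C": 1.0,       # C in Eq. (2)
    "focal_gamma": 0.5,         # gamma in Eq. (5)
    "alpha": 0.8,
    "beta": 0.2,                # weights in Eq. (6)
    "nms_threshold_eps": 0.4,   # epsilon in Eq. (7)
}
```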

3.2 Quantitative Analysis

Table 1 presents the performance of various algorithms on SeaShips; our proposed algorithm significantly outperforms the baseline and achieves the best performance. Specifically, our algorithm improves APs and ARs by 9.3% and 8.2% over the original network, respectively. In comparison to SSD, which utilizes a multi-scale approach for feature processing, our algorithm with SOAFE reduces the feature loss of small ships by considering detailed receptive fields and multi-branch structures, achieving an impressive increase of 16.5% and 17.4% in APs. Furthermore, SOFSIR focuses on both distance factors and difficult samples: compared with the well-known two-stage detector Faster-RCNN, our algorithm shows notable gains of 14.7% and 15.9% in APs, along with 18.2% and 19.6% improvements in ARs. Moreover, in comparison to the NMS-based YOLO series networks, GCNMS effectively diminishes the influence of


Table 2. Detection results on Pascal VOC2007.

Model                        AP     AP0.5  AP0.75  APs    ARs
SSD300(VGG16) [9]            0.267  0.603  0.238   0.005  0.150
SSD300(MobileNetv2)          0.199  0.515  0.104   0.025  0.100
Faster-RCNN(ResNet50) [12]   0.284  0.567  0.221   0.202  0.200
Faster-RCNN(VGG16)           0.311  0.628  0.273   0.151  0.150
CenterNet [4]                0.280  0.192  0.602   0.151  0.150
RetinaNet [7]                0.373  0.656  0.398   0.151  0.150
EfficientDet [15]            0.318  0.609  0.304   0.177  0.200
YOLOv3 [11]                  0.272  0.642  0.202   0.101  0.100
YOLOv4 [2]                   0.258  0.636  0.231   0.101  0.100
YOLOv5                       0.325  0.710  0.284   0.202  0.200
YOLOv7 [17]                  0.361  0.753  0.330   0.119  0.250
Ours                         0.409  0.779  0.426   0.210  0.400

Table 3. Ablation experimental results of each module on SeaShips.

Model              AP     AP0.5  AP0.75  APs    ARs
Baseline (YOLOv7)  0.675  0.960  0.794   0.085  0.150
+SOAFE             0.671  0.960  0.795   0.116  0.200
+SOAFE+SOFSIR      0.716  0.961  0.843   0.145  0.200
+SOAFE+GCNMS       0.678  0.960  0.803   0.150  0.179
Ours               0.711  0.962  0.836   0.178  0.232

area factors; our algorithm outperforms the best-performing YOLOv5 by 5.3% in terms of ARs. Furthermore, when compared to CenterNet, an anchor-free detector without NMS, our algorithm achieves an 8.2% improvement in ARs. To verify the universality of the proposed algorithm, the same comparative experiments are conducted on Pascal VOC2007. As shown in Table 2, our algorithm also attains the best performance. Specifically, our algorithm reaches 21.0% and 40.0% in APs and ARs, which is 9.1% and 15.0% higher than the original YOLOv7. When compared to EfficientDet with its BiFPN integration, our algorithm, aided by SOAFE, addresses the issue of information loss for small ships, resulting in a 3.3% improvement in APs. Moreover, in comparison to RetinaNet, which utilizes focal loss to prioritize difficult samples, our algorithm demonstrates remarkable enhancements of 5.9% and 25.0% in APs and ARs, respectively. This improvement stems from SOFSIR's consideration of not only difficult samples but also distance factors. Furthermore, when compared to the anchor-free detector CenterNet, GCNMS significantly enhances our algorithm's performance by 20% and 25% in terms of ARs, respectively. The experimental results conclusively demonstrate the effectiveness of the proposed algorithm in the field of small ship detection.

Table 4. Ablation study on the penalty terms in NMS on SeaShips.

Model                 AP     AP0.5  AP0.75  APs    ARs
NMS-based             0.675  0.960  0.794   0.085  0.150
GIOU-based NMS [13]   0.678  0.961  0.803   0.157  0.200
DIOU-based NMS [21]   0.676  0.962  0.796   0.101  0.150
CIOU-based NMS [21]   0.678  0.960  0.799   0.150  0.207
SIOU-based NMS [5]    0.673  0.961  0.792   0.122  0.150
Ours                  0.677  0.960  0.797   0.161  0.229

3.3 Ablation Studies

Ablation experiments are conducted on SeaShips to validate the effectiveness of each module in our proposed algorithm. As shown in Table 3, the results demonstrate the positive impact of each module. SOAFE successfully mitigates feature loss for small ships, resulting in a 3.1% improvement in APs and a 5.0% improvement in ARs. Similarly, SOFSIR addresses the scale sensitivity problem, enhancing the learning efficiency for small ships and leading to a 2.9% increase in APs, thus improving the overall accuracy. Lastly, adding GCNMS on top of the aforementioned modules effectively reduces the omission rate of small ships, resulting in a 3.3% increase in APs and a 3.2% increase in ARs. The experimental results validate that the proposed modules improve the detection performance of small ships to varying degrees.

We also perform ablation experiments on the penalty terms of the NMS method. As shown in Table 4, GIOU-based NMS incorporates a penalty based on the area of the enclosing box, but geometry still dominates. DIOU-based NMS introduces the Euclidean distance between the two bounding boxes, but it lacks sufficient consideration of distance factors. CIOU-based NMS further includes the aspect ratio, but the difference in aspect ratio poses challenges in accurately fitting small ships; SIOU-based NMS encounters the same issue. In contrast, GCNMS considers distance factors and separately penalizes the differences in height and width, enabling a better fit for the shape of small ships. Our algorithm reaches 16.1% and 22.9% in APs and ARs, which is 7.6% and 7.9% higher than the original NMS. Experimental results show that the proposed penalty model in GCNMS is effective for small ship detection.

3.4 Qualitative Analysis

Figure 3 shows the performance of our proposed algorithm compared to other classical algorithms in detecting small ships. The results indicate that CenterNet, YOLOv4, and YOLOv7 tend to miss small ships, while YOLOv5 produces biased bounding boxes. In contrast, our algorithm enhances the feature extraction capability and learning efficiency for small ships, effectively addressing the issue of missed detections. It can be concluded that our algorithm outperforms the others in accurately detecting small ships.


Fig. 3. Qualitative comparison of different algorithms on Seaships.

4 Conclusion

In this paper, we have proposed an effective small ship detection algorithm with enhanced-YOLOv7. Specifically, a small object-aware feature extraction module has been designed, which effectively addresses the challenge of feature loss during multi-scale processing and accounts for the impact of the marine environment by considering both small-scale receptive fields and multi-branch residual structures. Additionally, we have proposed a small object-friendly scale-insensitive regression scheme, which tackles the difficulties of sample allocation and low learning efficiency by strengthening the contributions of both bounding box distance and difficult samples. Furthermore, we have constructed a geometric constraint-based non-maximum suppression method, which significantly reduces the omission rate of small ships by incorporating the formulated penalty model. Experimental results demonstrate the effectiveness of our proposed algorithm. In the future, we would like to further improve the effectiveness of small ship detection in extreme marine environments.

References

1. Bai, Y., Zhang, Y., Ding, M., Ghanem, B.: SOD-MTGAN: small object detection via multi-task generative adversarial network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 210–226. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_13
2. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
3. Deng, C., Wang, M., Liu, L., Liu, Y., Jiang, Y.: Extended feature pyramid network for small object detection. IEEE Trans. Multimedia 24, 1968–1979 (2021)


4. Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578 (2019)
5. Gevorgyan, Z.: SIoU loss: more powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740 (2022)
6. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
7. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
8. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)
9. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
10. Rabbi, J., Ray, N., Schubert, M., Chowdhury, S., Chao, D.: Small-object detection in remote sensing images with end-to-end edge-enhanced GAN and object detector network. Remote Sens. 12(9), 1432 (2020)
11. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
12. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
13. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019)
14. Shao, Z., Wu, W., Wang, Z., Du, W., Li, C.: SeaShips: a large-scale precisely annotated dataset for ship detection. IEEE Trans. Multimedia 20(10), 2593–2604 (2018)
15. Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
16. Tong, Z., Chen, Y., Xu, Z., Yu, R.: Wise-IoU: bounding box regression loss with dynamic focusing mechanism. arXiv preprint arXiv:2301.10051 (2023)
17. Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022)
18. Wang, J., Xu, C., Yang, W., Yu, L.: A normalized Gaussian Wasserstein distance for tiny object detection. arXiv preprint arXiv:2110.13389 (2021)
19. Yang, C., Huang, Z., Wang, N.: QueryDet: cascaded sparse query for accelerating high-resolution small object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13668–13677 (2022)
20. Zhang, Y.F., Ren, W., Zhang, Z., Jia, Z., Wang, L., Tan, T.: Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 506, 146–157 (2022)
21. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12993–13000 (2020)

PiDiNeXt: An Efficient Edge Detector Based on Parallel Pixel Difference Networks

Yachuan Li1, Xavier Soria Poma2,3, Guanlin Li1, Chaozhi Yang1, Qian Xiao1, Yun Bai1, and Zongmin Li1(B)

1 China University of Petroleum (East China), Qingdao, Shandong 266500, China
{liyachuan,liguanlin,xiaoqian,baiyun}@s.upc.edu.cn, [email protected], [email protected]
2 ESPOCH Polytechnic University, Data Science Research Group, Riobamba, Ecuador
3 National University of Chimborazo, Riobamba 060110, Ecuador
[email protected]

Abstract. The Pixel Difference Network (PiDiNet) is well known for its success in edge detection. By combining traditional operators with deep learning, PiDiNet achieves competitive results with fewer parameters. However, the complex and inefficient choice of traditional edge detection operators hinders PiDiNet's further development. Therefore, we propose a novel lightweight edge detector called PiDiNeXt, which combines traditional edge detection operators with a deep learning-based model in parallel to solve the operator choice problem and further enrich the features. Experiments on the BSDS500 and BIPED datasets demonstrate that PiDiNeXt outperforms PiDiNet in terms of accuracy. Moreover, we employ the reparameterization technique to avoid the extra computational cost caused by the multi-branch construction. This enables PiDiNeXt to achieve an inference speed of 80 FPS, comparable to that of PiDiNet. Furthermore, the lightweight version of PiDiNeXt can achieve an inference speed of over 200 FPS, meeting the needs of most real-time applications. The source code is available at https://github.com/Li-yachuan/PiDiNeXt.

Keywords: Pixel difference · Edge detection · Reparameterization

1 Introduction

Edges, which carry vital semantic information and suppress texture noise, are one of the most fundamental components of images. Therefore, edge detection plays a crucial role in many higher-level computer vision tasks such as salient object detection [35], semantic segmentation [32], and depth map prediction [23], to name a few.
Edge detection has long been studied; researchers investigated it as early as the last century [4, 22]. Early classical methods usually rely on the first- or second-order derivatives of images to detect edges, such as Sobel [24], Prewitt [19], and Canny [3]. Later, learning-based methods [9, 11] further improve the accuracy of edge detection by utilizing various gradient information [16, 17, 30]. These methods make full use of the local cues of images to detect edges and achieve initial


success. However, some edges and textures appear very similar in terms of local features and can only be distinguished with global semantic information. The lack of global information limits the performance of traditional edge detection operators. Deep learning-based edge detectors [5, 12, 13, 18, 31] have achieved disruptive results in recent years by obtaining richer global information through end-to-end learning, and traditional edge detection operators have become dated and are gradually being forgotten by researchers. However, deep learning-based edge detectors are far from perfect. One of the biggest problems is the lack of inductive bias: the features required for edge detection must be statistically induced from large-scale datasets. As a result, edge detectors are forced to rely on large pretrained backbones, which are memory and energy consuming.

To avoid the dependence on a large pretrained backbone, Su et al. [26] combine deep learning-based edge detectors with traditional edge detection operators and propose the pixel difference network (PiDiNet), which surpasses the recorded result of human perception with a much lighter network by modeling the relationship of local features. PiDiNet has achieved great success, but the choice of edge detection operators becomes an issue. To ensure inference speed, PiDiNet is designed as a single-branch, VGG-style network; therefore, each layer of the model can only choose one kind of operator, which makes the choice of operators a pain point in PiDiNet.

To address this issue, we propose a simple, lightweight yet effective edge detector named PiDiNeXt. We extend PiDiNet to a multi-branch structure, where each branch uses a different traditional edge detection operator to introduce inductive biases into the model. All operators are integrated into the model in parallel, which effectively avoids the complex operator choice problem and further enriches the features. We also introduce reparameterization technology [7], which converts PiDiNeXt into a single-branch VGG-style model during inference, thereby ensuring the inference speed of PiDiNeXt.

2 Related Work

2.1 The Development of Deep Learning Based Edge Detection

Holistically-nested Edge Detection (HED) [31] introduces the deep supervision mechanism to edge detection and learns multi-scale predictions holistically. HED employs a pretrained VGG16 as backbone and achieved the best performance at the time; numerous works [12, 13] follow this setup to allow a fair comparison. In recent years, with the emergence of higher-performance feature extraction models, the backbone of edge detection has better options: DexiNed [25] uses a densely connected module, UAED [36] chooses EfficientNet as backbone, and EDTER [20] introduces the vision Transformer model. Their accuracy is significantly better than that of the VGG-based methods. At the same time, researchers try to design lightweight architectures to achieve efficient edge detection [28, 29], but there remains a large gap in detection accuracy compared to general methods. In the field of edge detection, it seems to be a consensus that model accuracy is positively correlated with model size and that a pretrained model is essential. This is mainly due to the lack of manually designed inductive bias, so large models are needed to learn the distribution


of data from large-scale datasets. Learning how to achieve higher performance with a more lightweight network is a constant pursuit of edge detection researchers.

2.2 Review of Pixel Difference Convolution

To eliminate the dependence on large models and pretrained backbones, Su et al. [26] propose pixel difference convolution, in which a deep learning-based model is combined with traditional edge detection operators to explicitly incorporate inductive bias into the model. Specifically, pixel difference convolution has three forms, namely pixel difference convolution based on central differences (CPDC), pixel difference convolution based on angular differences (APDC), and pixel difference convolution based on radial differences (PRDC), which capture, respectively, the difference between the current feature and surrounding features, the difference between surrounding features, and the difference between surrounding features and more distant features.

The backbone of PiDiNet [26] consists of 16 convolutional layers. In order to ensure inference speed, the whole network is a single-branch model, so each convolutional layer only employs one kind of pixel difference convolution or vanilla convolution to extract features. As a result, PiDiNet has a total of 4^16 alternative structures, and experimentally choosing the best of them seems to be an impossible task: in PiDiNet [32], only 14 structures are tried, and the experiments conducted are clearly not sufficient. Thus, it is important to explore the possibility of avoiding the need to select different operators in PiDiNet.
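As an illustration of how a pixel difference convolution can be expressed with ordinary convolutions, the sketch below implements the central-difference variant (CPDC); the angular and radial variants follow the same pattern with different pixel pairings. This is a hedged reconstruction, not the official PiDiNet code.

```python
import torch.nn as nn
import torch.nn.functional as F

class CPDC(nn.Module):
    """Central pixel-difference convolution: sum_i (x_i - x_center) * w_i.
    It equals a vanilla 3x3 convolution minus a 1x1 convolution whose weight
    is the spatial sum of the 3x3 kernel (acting on the centre pixel)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        y_vanilla = self.conv(x)                                   # sum_i x_i * w_i
        w_center = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # 1x1 kernel: sum of taps
        y_center = F.conv2d(x, w_center)                           # x_center * sum_i w_i
        return y_vanilla - y_center
```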

3 Method

PiDiNeXt has different architectures for training and inference; we first present the architecture for training and then show how PiDiNeXt can be converted to a single-branch VGG-style model by reparameterization during inference.

3.1 PiDiNeXt Architecture

As shown in Fig. 1, PiDiNeXt employs 16 pixel difference blocks as the backbone, which are evenly divided into four stages by three 2 × 2 pooling layers. The output features of each stage are collected and called multi-scale features. After being enhanced by compact dilation convolution based modules (CDCM) and compact spatial attention modules (CSAM), respectively, the multi-scale features are used to obtain side edges by dimension reduction and resizing. The side edges are transformed into the final edge by a 1 × 1 convolution. Both the side edges and the final edge are constrained by the supervision information. CDCM and CSAM are two compact feature enhancement modules that enhance multi-scale features through parallel dilation convolution and attention, as shown in Fig. 2(A) and (B). They were presented in PiDiNet [26] and we keep this setup. Following PiDiNet, PiDiNeXt is divided into three versions, PiDiNeXt, PiDiNeXt-small, and PiDiNeXt-tiny, with the number of channels set to 60, 30, and 20, respectively. For further inference speed improvement, the CDCM and CSAM modules

Fig. 1. PiDiNeXt architecture.

can be removed to obtain the corresponding lightweight versions, named PiDiNeXt-L, PiDiNeXt-small-L, and PiDiNeXt-tiny-L. The six versions of PiDiNeXt are in one-to-one correspondence with those of PiDiNet.

The main innovation of PiDiNeXt is in the backbone. The backbone block of PiDiNet is a residual structure that incorporates a single feature operator, as shown in Fig. 2(C). To avoid the choice of feature operators and further enrich the features, we propose a Parallel Pixel Difference Module (PPDM), in which multiple feature operators are integrated into the deep learning-based module in parallel, and a 1 × 1 convolution is utilized to select and fuse the features. The enhanced feature is fused with the original feature through the residual structure to ensure a stable backpropagation gradient. The structure of PPDM is shown in Fig. 2(D). To speed up inference, normalization layers are omitted in PiDiNeXt; therefore, to keep the gradient stable, the multiple parallel features are fused by averaging. Experimental results show that this parallel structure can significantly improve the performance of the model.
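A minimal PyTorch sketch of a PPDM block as described above is shown below. The branch modules, the exact placement of the activation, and the fusion details are assumptions on our part; the branches would typically be a vanilla convolution plus the CPDC, APDC, and PRDC operators.

```python
import torch
import torch.nn as nn

class PPDM(nn.Module):
    """Sketch of the Parallel Pixel Difference Module used during training:
    several operator branches run in parallel, their outputs are averaged
    (no normalization layers), fused by a 1x1 convolution, and added back
    through a residual connection."""

    def __init__(self, channels, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)  # e.g. [vanilla conv, CPDC, APDC, PRDC]
        self.act = nn.ReLU(inplace=True)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Average the parallel features to keep gradient magnitudes stable.
        y = torch.stack([branch(x) for branch in self.branches], dim=0).mean(dim=0)
        return x + self.fuse(self.act(y))
```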

4 PiDiNeXt Reparameterization

PPDM improves network performance, but the parallel structure increases the computational cost and inference latency, which greatly reduces the efficiency of the model. To overcome this drawback, PPDM is converted into a vanilla convolutional layer with the help of the reparameterization technique [7, 26], making PiDiNeXt almost as efficient as PiDiNet. The process of reparameterization can be expressed by the following equations:


Fig. 2. Detail of PiDiNeXt. PDC in (C) means pixel difference convolution, which is one of the four operators mentioned in Sect. 2.2. Conv in (D) means vanilla convolution.

$$
\begin{aligned}
Y &= \mathrm{AVE}\big(\mathrm{Conv}(X),\,\mathrm{CPDC}(X),\,\mathrm{APDC}(X),\,\mathrm{PRDC}(X)\big)\\
&= \mathrm{AVE}\big(x_1 w_1^{v} + x_2 w_2^{v} + x_3 w_3^{v} + \cdots,\\
&\qquad (x_1 - x_5)\,w_1^{c} + (x_2 - x_5)\,w_2^{c} + (x_3 - x_5)\,w_3^{c} + \cdots,\\
&\qquad (x_1 - x_2)\,w_1^{a} + (x_2 - x_3)\,w_2^{a} + (x_3 - x_6)\,w_3^{a} + \cdots,\\
&\qquad (x_1' - x_1)\,w_1^{r'} + (x_2' - x_2)\,w_2^{r'} + (x_3' - x_3)\,w_3^{r'} + \cdots\big)\\
&= \mathrm{AVE}\big(x_1 w_1^{v} + x_2 w_2^{v} + x_3 w_3^{v} + \cdots,\\
&\qquad x_1 w_1^{c} + \cdots + x_4 w_4^{c} + x_5\Big(-\sum_{i\in\{1,2,3,4,6,7,8,9\}} w_i^{c}\Big) + \cdots,\\
&\qquad x_1 (w_1^{a} - w_4^{a}) + x_2 (w_2^{a} - w_1^{a}) + x_3 (w_3^{a} - w_2^{a}) + \cdots,\\
&\qquad x_1' w_1^{r'} + x_1 (-w_1^{r'}) + x_2' w_2^{r'} + x_2 (-w_2^{r'}) + \cdots\big)\\
&= x_1'\,\mathrm{AVE}(w_1^{r'}) + x_2'\,\mathrm{AVE}(w_2^{r'}) + x_3'\,\mathrm{AVE}(w_3^{r'}) + \cdots\\
&\quad + x_1\,\mathrm{AVE}\big(w_1^{v} + w_1^{c} + w_1^{a} - w_4^{a} - w_1^{r'}\big) + \cdots
 + x_5\,\mathrm{AVE}\Big(w_5^{v} - \sum_{i\in\{1,2,3,4,6,7,8,9\}} w_i^{c}\Big) + \cdots
\end{aligned}
\tag{1}
$$


where $X$ and $Y$ denote the input and output feature maps, the superscripts $v$, $c$, $a$, and $r$ denote vanilla convolution, CPDC, APDC, and PRDC, respectively, and AVE denotes the average of the four kinds of features. Through reparameterization, the complex feature operations in PPDM are converted into operations on the convolution kernels $W$. The convolution kernels remain fixed during inference, allowing us to precompute this conversion and perform only a vanilla convolution, which ensures the efficiency of our model. Since PPDM contains PRDC, which is converted to a 5 × 5 vanilla convolution during inference, PPDM as a whole is also converted to a 5 × 5 vanilla convolution during inference.
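The kernel-folding step can be sketched as follows for the CPDC branch, together with the final averaging into a single 5 × 5 kernel. Only the CPDC case is shown; APDC and PRDC need their own folding rules, biases are ignored, and the function names are ours. The sketch assumes that a 3 × 3 kernel zero-padded to 5 × 5 and applied with padding 2 reproduces the original 3 × 3 convolution with padding 1, so averaging the folded kernels reproduces the averaged multi-branch output.

```python
import torch
import torch.nn.functional as F

def fold_cpdc(w):
    """Fold a 3x3 CPDC kernel into an equivalent vanilla 3x3 kernel:
    sum_i (x_i - x_c) w_i equals a vanilla conv whose centre tap is w_c - sum_i w_i."""
    w_v = w.clone()
    w_v[:, :, 1, 1] = w[:, :, 1, 1] - w.sum(dim=(2, 3))
    return w_v

def merge_branches(kernels_5x5, kernels_3x3):
    """Average all folded branch kernels into one 5x5 kernel for inference.
    3x3 equivalents (vanilla/CPDC/APDC) are zero-padded to 5x5 first, so a single
    vanilla 5x5 convolution reproduces the averaged multi-branch output."""
    padded = [F.pad(k, (1, 1, 1, 1)) for k in kernels_3x3]
    return torch.stack(list(kernels_5x5) + padded, dim=0).mean(dim=0)
```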

5 Experiments

5.1 Datasets and Implementation

Experimental Datasets. The performance of our method is validated on BSDS500 [1] and BIPED [18] and compared with previous state-of-the-art methods. BSDS500 includes 200 training images, 100 validation images, and 200 test images; four to nine annotated maps are associated with each image. Our models are trained and validated using the 300 training and validation images and tested on the test images. BIPED contains 250 carefully annotated high-resolution Barcelona street view images, with 200 images for training and validation and 50 images for testing. For a fair comparison, we use the same data augmentation as RCF [14] for BSDS500 and the same as DexiNed [25] for BIPED.

Parameter Settings. Our method is implemented with the PyTorch library. All experiments are conducted on a single 2080 Ti GPU with 11 GB memory. The threshold γ is set to 0.3 for BSDS500; γ is not used for BIPED, since its ground truth consists of binary annotations. λ is set to 1.1 for both BSDS500 and BIPED. The Adam optimizer is adopted. A standard non-maximum suppression (NMS) is performed to produce the final edge maps. Other parameters that are not specified are consistent with PiDiNet [26].

Performance Metrics. During evaluation, F-measures at both the Optimal Dataset Scale (ODS) and the Optimal Image Scale (OIS) are reported for all datasets. Since efficiency is one of the main focuses of this paper, all models are compared based on evaluations of single-scale images. More detailed information on the performance metrics can be found in previous works [14, 26].

5.2 Comparison with the State of the Arts

BSDS500 Dataset: We compare our method with prior edge detectors, including both traditional ones and recently proposed CNN-based ones, as summarized in Table 1 and Fig. 3. The ODS of PiDiNeXt is 0.2% higher than that of PiDiNet, reaching 0.809. Additionally, PiDiNeXt maintains an inference speed of 80 FPS, surpassing the great

Table 1. The performance of edge detectors on the BSDS500 dataset. Param. means the number of parameters. ‡ indicates speeds from our implementations on an NVIDIA RTX 2080 Ti GPU. † indicates cited GPU speeds.

Method                    Pub. 'year'  ODS             OIS             Param. (M)    FPS
Human* (reference)        -            0.803           0.803           -             -
Classic
Canny [3]                 PAMI'86      0.611           0.676           -             28
Pb [15]                   PAMI'04      0.672           0.695           -             -
SCG [21]                  NeurIPS'12   0.739           0.758           -             -
SE [8]                    PAMI'14      0.746           0.767           -             12.5
OEF [10]                  CVPR'15      0.746           0.770           -             2/3
General deep learning
HFL [2]                   ICCV'15      0.767           0.788           -             5/6†
CEDN [34]                 CVPR'16      0.788           0.804           -             10†
HED [31]                  IJCV'17      0.788           0.807           14.6          78‡
CED [27]                  CVPR'17      0.794           0.811           -             -
LPCB [6]                  ECCV'18      0.808           0.824           -             30†
RCF [14]                  PAMI'19      0.806           0.823           14.8          67‡
BDCN [12]                 PAMI'22      0.820           0.838           16.3          47‡
FCL [33]                  NN'22        0.815           0.834           16.5          -
Lightweight
TIN1 [28]                 ICIP'20      0.749           0.772           0.08          -
TIN2 [28]                 ICIP'20      0.772           0.795           0.24          -
FINED3-Inf [29]           ICME'21      0.788           0.804           1.08          124†
FINED3-Train [29]         ICME'21      0.790           0.808           1.43          99†
PiDiNet [26]              ICCV'21      0.807           0.823           0.71          92‡
PiDiNeXt                  (Ours)       0.809 (0.002↑)  0.824 (0.001↑)  0.78 (0.07↑)  80‡ (12↓)
PiDiNet-L [26]            ICCV'21      0.800           0.815           0.61          128‡
PiDiNeXt-L                (Ours)       0.801 (0.001↑)  0.819 (0.004↑)  0.68 (0.07↑)  116‡ (12↓)
PiDiNet-small [26]        ICCV'21      0.798           0.814           0.18          148‡
PiDiNeXt-small            (Ours)       0.801 (0.003↑)  0.816 (0.002↑)  0.22 (0.04↑)  139‡ (9↓)
PiDiNet-small-L [26]      ICCV'21      0.793           0.809           0.16          212‡
PiDiNeXt-small-L          (Ours)       0.796 (0.003↑)  0.813 (0.004↑)  0.20 (0.04↑)  202‡ (10↓)
PiDiNet-Tiny [26]         ICCV'21      0.789           0.806           0.08          152‡
PiDiNeXt-Tiny             (Ours)       0.795 (0.006↑)  0.811 (0.005↑)  0.11 (0.03↑)  144‡ (8↓)
PiDiNet-Tiny-L [26]       ICCV'21      0.787           0.804           0.07          215‡
PiDiNeXt-Tiny-L           (Ours)       0.790 (0.003↑)  0.806 (0.002↑)  0.10 (0.03↑)  207‡ (8↓)

majority of VGG-based methods (such as RCF and LPCB) in both accuracy and speed. PiDiNeXt achieves state-of-the-art results among the lightweight models. Furthermore, PiDiNeXt uses the same setup as PiDiNet and does not rely on ImageNet pretraining; the whole model is trained from random initialization. Compared to PiDiNet, PiDiNeXt has slightly more parameters and is slightly slower, because PPDM is converted into a 5 × 5 convolution while CPDC and APDC in PiDiNet can be converted into 3 × 3 convolutions. Even so, the fastest version, PiDiNeXt-Tiny-L, still achieves comparable prediction performance at more than 200 FPS, further demonstrating the effectiveness of our method. Another interesting observation is that the accuracy improvement brought by PiDiNeXt becomes smaller as the model gets larger.


PiDiNeXt-L improves ODS by 0.1% over PiDiNet-L and PiDiNeXt by 0.2% over PiDiNet, which is clearly smaller than the improvement achieved by the lighter models. This is because PiDiNeXt indirectly increases the size of the model while introducing the multi-branch structure. When the model is already large but trained with limited data, further increasing the model size may cause over-fitting, which is not conducive to improving the performance of the model. Su et al. [26] verify the existence of such over-fitting through ablation experiments, which supports our conjecture. We also show some qualitative results in Fig. 4: compared with the results of PiDiNet, our results achieve sharper edges while containing fewer irrelevant cues.

Fig. 3. Precision-Recall curves of our models compared with PiDiNet versions on BSDS500 dataset.

BIPED Dataset: The comparison results on the BIPED dataset are shown in Table 2. The situation is similar to that on BSDS500: the accuracy of every version of PiDiNeXt improves to varying degrees over the corresponding PiDiNet, and the ODS of the lightest version, PiDiNeXt-Tiny-L, is 0.5% higher than that of PiDiNet-Tiny-L. Two characteristics of the BIPED results are worth noting. Firstly, because the images in the BIPED dataset are 1280 × 720, which is much larger than the


Fig. 4. Example results of our method and other works on BSDS500 dataset, including RCF [14], CED [27], BDCN [12] and PiDiNet [26]

images of BSDS500, the inference speed of the model is significantly reduced; even the most lightweight PiDiNeXt-Tiny-L only reaches 52 FPS. Secondly, the accuracy improvement is larger for the lightweight versions of PiDiNeXt, which is consistent with the behaviour on the BSDS500 dataset. This illustrates that the over-fitting problem caused by the lack of large pre-trained models and insufficient training data is widespread.

Table 2. Comparison with PiDiNet [26] on the BIPED dataset. Param. means the number of parameters. ‡ indicates speeds from our implementations on a single 2080 Ti GPU. (In the original table, red marks improved and green decreased performance.)

Method                   ODS             OIS             Param. (M)    FPS
RCF [14]                 0.886           0.892           14.8          -
BDCN [12]                0.894           0.899           16.3          -
DexiNed-f [25]           0.895           0.900           35.3          -
PiDiNet [26]             0.891           0.896           0.71          24‡
PiDiNeXt (Ours)          0.893 (0.002↑)  0.900 (0.004↑)  0.78 (0.07↑)  20‡ (4↓)
PiDiNet-L [26]           0.889           0.896           0.61          32‡
PiDiNeXt-L (Ours)        0.890 (0.001↑)  0.899 (0.003↑)  0.68 (0.07↑)  29‡ (3↓)
PiDiNet-small [26]       0.886           0.893           0.18          37‡
PiDiNeXt-small (Ours)    0.887 (0.001↑)  0.896 (0.003↑)  0.22 (0.04↑)  35‡ (2↓)
PiDiNet-small-L [26]     0.880           0.889           0.16          53‡
PiDiNeXt-small-L (Ours)  0.884 (0.004↑)  0.895 (0.006↑)  0.20 (0.04↑)  50‡ (3↓)
PiDiNet-Tiny [26]        0.882           0.889           0.08          38‡
PiDiNeXt-Tiny (Ours)     0.884 (0.002↑)  0.893 (0.004↑)  0.11 (0.03↑)  36‡ (2↓)
PiDiNet-Tiny-L [26]      0.875           0.884           0.07          54‡
PiDiNeXt-Tiny-L (Ours)   0.880 (0.005↑)  0.890 (0.006↑)  0.10 (0.03↑)  52‡ (2↓)

5.3 Ablation Study

Table 3. Ablation study on the BSDS500 dataset. Param. means the number of parameters in millions. ‡ indicates speeds from our implementations on a single 2080 Ti GPU.

Method            ODS    OIS    Param. (M)  FPS
PiDiNet           0.804  0.820  0.71        92
PiDiNeXt-         0.806  0.822  0.71        92
PiDiNet [v × 16]  0.789  0.804  0.78        80
PiDiNeXt          0.809  0.824  0.78        80

6 Conclusion In conclusion, the contribution in this paper is two-fold: Firstly, we propose a multibranch structure Parallel Pixel Difference Module (PPDM), which combines traditional edge detection operators and deep learning-based model in parallel. It solves the issue of feature choice in PiDiNet and further enrich features. Secondly, We introduce reparameterization into PiDiNeXt to prevent the extra computational cost caused by the multi-branch construction of PPDM. As a result, PiDiNeXt keeps the inference efficiency similar to PiDiNet, while the accuracy of multiple versions is improved across the board. Limitation and Outlook. Due to the limited data and training without pre-trained model, the overfitting problem still exists in PiDiNeXt, which limits the upper bound of PiDiNeXt. Since the parallel structure of multiple branches does not affect the speed

PiDiNeXt

271

during inference, it is reasonable to try to incorporate more traditional edge detection operators into PiDiNeXt. Acknowledgements. This work is partly supported by National key r&d program (Grant no. 2019YFF0301800), National Natural Science Foundation of China (Grant no. 61379106), the Shandong Provincial Natural Science Foundation (Grant nos. ZR2013FM036, ZR2015FM011).

References 1. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 898–916 (2010) 2. Bertasius, G., Shi, J., Torresani, L.: High-for-low and low-for-high: efficient boundary detection from deep object features and its applications to high-level vision. In: 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, 7–13 December 2015, pp. 504–512. IEEE Computer Society (2015) 3. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986) 4. Davis, L.S.: A survey of edge detection techniques. Comput. Graph. Image Process. 4(3), 248–270 (1975) 5. Deng, R., Liu, S.: Deep structural contour detection. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 304–312 (2020) 6. Deng, R., Shen, C., Liu, S., Wang, H., Liu, X.: Learning to predict crisp boundaries. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 570–586. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1 35 7. Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J.: RepVGG: making VGG-style ConvNets great again. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733–13742 (2021) 8. Doll´ar, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. 37(8), 1558–1570 (2014) 9. Doll´ar, P., Zitnick, C.: Fast edge detection using structured forests. arXiv Computer Vision and Pattern Recognition, June 2014 10. Hallman, S., Fowlkes, C.C.: Oriented edge forests for boundary detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1732–1740 (2015) 11. Hallman, S., Fowlkes, C.: Oriented edge forests for boundary detection. Cornell University - arXiv, December 2014 12. He, J., Zhang, S., Yang, M., Shan, Y., Huang, T.: BDCN: bi-directional cascade network for perceptual edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 44(1), 100–113 (2022) 13. Liu, L., Zhao, L., Long, Y., Kuang, G., Fieguth, P.: Extended local binary patterns for texture classification. Image Vis. Comput. 30(2), 86–99 (2012) 14. Liu, Y., et al.: Richer convolutional features for edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 41(08), 1939–1946 (2019) 15. Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 530– 549 (2004) 16. Martin, D., Fowlkes, C., Malik, J.: Learning to detect natural image boundaries using brightness and texture. In: Neural Information Processing Systems, January 2002 17. Martin, D., Fowlkes, C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 26, 530–549 (2004)

272

Y. Li et al.

18. Poma, X.S., Riba, E., Sappa, A.: Dense extreme inception network: towards a robust CNN model for edge detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1923–1932 (2020) 19. Prewitt, J.M., et al.: Object enhancement and extraction. Picture Process. Psychopictorics 10(1), 15–19 (1970) 20. Pu, M., Huang, Y., Liu, Y., Guan, Q., Ling, H.: EDTER: edge detection with transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1402–1412 (2022) 21. Ren, X., Bo, L.: Discriminatively trained sparse code gradients for contour detection. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 584–592 (2012) 22. Sharifi, M., Fathy, M., Mahmoudi, M.T.: A classified and comparative study of edge detection algorithms. In: Proceedings of the International Conference on Information Technology: Coding and Computing, pp. 117–120. IEEE (2002) 23. Shengjie, Z., Garrick, B., Xiaoming, L.: The edge of depth: explicit constraints between segmentation and depth. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13116–13125 (2020) 24. Sobel, I., Feldman, G., et al.: A 3 × 3 isotropic gradient operator for image processing. Presented at the Stanford Artificial Intelligence Project (SAIL), pp. 271–272 (1968) 25. Soria, X., Sappa, A., Humanante, P., Akbarinia, A.: Dense extreme inception network for edge detection. Pattern Recogn. 139, 109461 (2023) 26. Su, Z., et al.: Pixel difference networks for efficient edge detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117–5127 (2021) 27. Wang, Y., Zhao, X., Huang, K.: Deep crisp boundaries. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 28. Wibisono, J.K., Hang, H.M.: Traditional method inspired deep neural network for edge detection. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 678– 682. IEEE (2020) 29. Wibisono, J.K., Hang, H.M.: FINED: fast inference network for edge detection. In: 2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE Computer Society (2021) 30. XiaoFeng, R., Bo, L.: Discriminatively trained sparse code gradients for contour detection. In: Neural Information Processing Systems, December 2012 31. Xie, S., Tu, Z.: Holistically-nested edge detection. Int. J. Comput. Vis. 125(1), 3–18 (2017) 32. Xu, J., Xiong, Z., Bhattacharyya, S.P.: PIDNet: a real-time semantic segmentation network inspired from PID controller. arXiv preprint arXiv:2206.02066 (2022) 33. Xuan, W., Huang, S., Liu, J., Du, B.: FCL-Net: towards accurate edge detection via fine-scale corrective learning. Neural Netw. 145, 248–259 (2022) 34. Yang, J., Price, B., Cohen, S., Lee, H., Yang, M.H.: Object contour detection with a fully convolutional encoder-decoder network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 193–202 (2016) 35. Zhao, J.X., Liu, J.J., Fan, D.P., Cao, Y., Yang, J., Cheng, M.M.: EGNet: edge guidance network for salient object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8779–8788 (2019) 36. Zhou, C., Huang, Y., Pu, M., Guan, Q., Huang, L., Ling, H.: The treasure beneath multiple annotations: an uncertainty-aware edge detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15507–15517 (2023)

Transpose and Mask: Simple and Effective Logit-Based Knowledge Distillation for Multi-attribute and Multi-label Classification Yuwei Zhao, Annan Li(B) , Guozhen Peng, and Yunhong Wang State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China {yuweizhao,liannan,guozhen peng,yhwang}@buaa.edu.cn Abstract. Knowledge distillation (KD) improves a student network by transferring knowledge from a teacher network. Although KD has been extensively studied in single-labeled image classification, it is not well explored under the scope of multi-attribute and multi-label classification. We observe that the logit-based KD method for the single-label scene utilizes information from multiple classes in a single sample, but we find such logits are less informative in the multi-label scene. To address this challenge in the multi-label scene, we design a Transpose method to extract information from multiple samples in a batch instead of a single sample. We further note that certain classes may lack positive samples in a batch, which can negatively impact the training process. To address this issue, we design another strategy, the Mask, to prevent the influence of negative samples. To conclude, we propose Transpose and Mask Knowledge Distillation (TM-KD), a simple and effective logit-based KD framework for multi-attribute and multi-label classification. The effectiveness of TM-KD is confirmed by experiments on multiple tasks and datasets, including pedestrian attribute recognition (PETA, PETA-zs, PA100k), clothing attribute recognition (Clothing Attributes Dataset), and multi-label classification (MS COCO), showing impressive and consistent performance gains. Keywords: Knowledge distillation multi-label classification

1

· multi-attribute classification ·

Introduction

Recent deep learning research has revealed that increasing the model capacity properly often leads to improved performance [6,7,10]. Nevertheless, the larger model usually comes with potential drawbacks, such as long training time, increased inference latency, and high GPU memory consumption. To address this issue, knowledge distillation (KD) [11,21] is often employed, which utilizes a strong teacher network to transfer knowledge to a relatively c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 273–284, 2024. https://doi.org/10.1007/978-981-99-8549-4_23

274

Y. Zhao et al.

Fig. 1. The different training objectives in the single-label scene and the multi-label scene. Due to the property of the softmax function, logits in the single-label scene sum up to 1, while logits in the multi-label scene don’t have such property.

weaker student network. KD has already been extensively studied in the singlelabeled image classification task, where each sample has only one label. However, it has not been well investigated in the multi-attribute and multi-label classification (we use the multi-label scene to refer to them in this paper), where each sample may have multiple labels. In this paper, we focus on exploring logit-based Knowledge Distillation (KD) in the context of the multi-label scene. The logit-based KD method stands out due to its simplicity in both idea and implementation, its independence from the backbone model structure, as well as its relatively low computational overhead compared to other KD methods [9]. In this method, the term logits refers to the outputs of the neural network’s final layer, which will then be fed into a softmax function. However, for the following reasons, we do not directly employ the logit-based KD method [11] (which we call vanilla KD below) commonly used in single-label image classification. As shown in Fig. 1, in the single-label scene, a softmax function is applied to the logits to generate predictions in terms of probabilities. As the sum of prediction for all classes equal 1, the logits of different classes in a single sample become highly correlated, and therefore, contain interdependent information. This entanglement of logits has also been noted by Decoupled KD [28]. However, in the context of the multi-label scene, the logits of different classes are used to calculate loss with their own labels and do not explicitly interact with the logits of other classes. Such lack of interaction between logits of different classes weaken the information contained in the relation of logits of different classes. Since the vanilla KD distills exactly the relation of logits of different classes within a sample, directly employing the vanilla KD in the multi-label scene leads to a reduced amount of information.


Fig. 2. In the single-label scene, the logit of the same dog can be affected by the presence of a person, as shown in (a) and (b). In contrast, in the multi-label scene, the logit of the same dog is more stable and less influenced by the other classes, as shown in (c) and (d).

On the contrary, when evaluating logits from multiple samples, logits of the same class across samples are more comparable in the multi-label scene than in the single-label scene. To demonstrate this idea, we present an example in Fig. 2, where two input images contain exactly the same dog, the only difference being a person in the second image. In the single-label scene, the softmax function enforces Σ_{i=1}^{C} p_i = 1, so the rise of the person class forces a decrease of the dog class. In the multi-label scene, however, such a decrease is much less pronounced because the logit of the person does not explicitly influence the logit of the dog. In summary, if we look at the logits of the same class in different samples, their values are not comparable in the single-label scene but are comparable in the multi-label scene. Inspired by these two observations, we propose to distill knowledge from logits of the same class across different samples, rather than from different classes within the same sample. We refer to this strategy as the Transpose. We also note that the multi-label scene typically exhibits class imbalance: most classes or attributes have fewer (often far fewer) positive samples than negative samples. Therefore, under our transpose strategy there are more negative samples than in vanilla KD [11]. We assume that the relative relation among negative samples is less informative: from two positive samples the network can learn salient information (for example, that the cat in the i-th image is more salient than the one in the j-th image), but no such information exists between two negative samples (they simply contain no cat).
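To make the softmax coupling illustrated in Fig. 2 concrete, here is a tiny numeric illustration; the logits are invented for illustration only and are not taken from the paper.

import torch
import torch.nn.functional as F

# Invented logits in the order [dog, person, background].
z_without_person = torch.tensor([3.0, 0.0, 0.0])
z_with_person = torch.tensor([3.0, 3.0, 0.0])   # same dog logit, but a person is now present

# Single-label scene: softmax couples the classes, so the dog probability drops (about 0.91 -> 0.49).
print(F.softmax(z_without_person, dim=0)[0], F.softmax(z_with_person, dim=0)[0])

# Multi-label scene: per-class sigmoid leaves the dog prediction unchanged (about 0.95 in both cases).
print(torch.sigmoid(z_without_person)[0], torch.sigmoid(z_with_person)[0])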


Fig. 3. Illustration of our proposed Transpose and Mask Knowledge Distillation (TM-KD): logits are first transposed, and positions corresponding to negative samples predicted by the teacher are then masked with 0.

We thus propose a second strategy: before distillation, every position whose teacher logit is negative (i.e., a negative sample as predicted by the teacher) is filled with zero. We refer to this as the Mask strategy. As illustrated in Fig. 3, based on the above analysis we propose Transpose and Mask Knowledge Distillation (TM-KD), a simple but effective logit-based knowledge distillation method. We validate our method on three tasks and five datasets, where TM-KD outperforms both the vanilla student network and the student network trained with vanilla logit-based KD.

2 Related Work

Knowledge Distillation. Knowledge distillation (KD), proposed by Hinton et al. [11], aims to utilize a strong teacher network to train a better student network. KD methods can be roughly divided into logit-based [11,13,15,26] and feature-based [2,21]. KD was originally proposed for single-label image classification, but recent studies also show its effectiveness in other tasks such as object detection [4,29], semantic segmentation [19,24], graph neural networks [8,25], anomaly detection [3], and some low-level tasks [23]. Multi-attribute and Multi-label Learning. Due to limited space, we only list representative methods for these tasks. Label2Label [14] proposes a language modeling framework for clothing attribute recognition and pedestrian attribute recognition. JLAC [22] exploits a graph neural network on top of a convolutional neural network to obtain better results for pedestrian attribute recognition.


Query2Label [17] proposes a simple transformer for better modeling multi-label classification. Note that our work does not target the state-of-the-art performance of each dataset and is thus orthogonal to these works. KD for the Multi-label Scene. So far, KD for the multi-label scene is not well explored. Liu et al. [20] leverage extra information from weakly-supervised detection for KD in the multi-label scene. Zhang et al. [27] propose a feature-based method that exploits class activation maps. In contrast, our work focuses on better logit-based KD and uses no auxiliary model such as an object detector.

3 Method

3.1 Preliminaries

A training batch with B samples and C classes in the multi-label scene can be described as D = {(x_i, y_i), i = 1, 2, ..., B}, where x_i is the i-th image in the batch and y_i ∈ {0, 1}^C is a binary label vector of length C for the i-th sample. We use y_{ij} to denote the j-th attribute label of the i-th sample, with y_{ij} = 1 for a positive sample and y_{ij} = 0 for a negative one. A classification network f predicts a vector z_i ∈ R^C for the i-th sample, and z ∈ R^{B×C} is called the logits. In the multi-label scene, each logit is fed into a sigmoid function separately and the binary cross-entropy (BCE) loss is computed. The above process can be formally defined as:

z_i = f(x_i)    (1)

p_{ij} = \frac{1}{1 + e^{-z_{ij}}}    (2)

L_{BCE} = -\frac{1}{BC} \sum_{i=1}^{B} \sum_{j=1}^{C} \left[ y_{ij} \log(p_{ij}) + (1 - y_{ij}) \log(1 - p_{ij}) \right]    (3)
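As a reference, the following minimal PyTorch-style sketch computes Eqs. (2) and (3); the tensor names z (logits) and y (labels) are placeholders, not symbols from any released code.

import torch
import torch.nn.functional as F

# z: [B, C] logits predicted by the classification network f; y: [B, C] binary labels.
p = torch.sigmoid(z)                                  # Eq. (2)
loss_bce = F.binary_cross_entropy(p, y.float())       # Eq. (3): mean over all B*C entries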

If KD is applied during training, the final loss can be represented as:

L = L_{BCE} + \lambda L_{KD}    (4)

where λ is a hyperparameter balancing the BCE loss and the KD loss. Below we show the different designs of L_{KD} in vanilla KD and in our TM-KD.

3.2 Vanilla KD

Apart from the student network f^s, KD uses a stronger teacher network f^t, trained beforehand, to help the student network. Their logits are denoted z^s ∈ R^{B×C} and z^t ∈ R^{B×C}.


Directly adopting vanilla KD from single-label classification, the Kullback-Leibler (KL) divergence is used to minimize the discrepancy between the class probability distributions of the teacher and the student within the same sample:

p_i^t = softmax(z_i^t / \tau), \quad p_i^s = softmax(z_i^s / \tau), \quad i = 1, 2, ..., B    (5)

L_{KD} = \frac{1}{B} \sum_{i=1}^{B} KL(p_i^t, p_i^s)    (6)

where τ is a temperature hyperparameter that adjusts the smoothness of the two probability distributions.

Algorithm 1: PyTorch-style pseudocode for vanilla KD and TM-KD.

T = 1
KLdiv = nn.KLDivLoss(reduction="batchmean")

def KD_from_logit(logit_stu, logit_tea):
    log_prob = F.log_softmax(logit_stu / T, dim=1)
    prob = F.softmax(logit_tea / T, dim=1)
    loss = KLdiv(log_prob, prob)
    return loss

def vanilla_KD(logit_stu, logit_tea):
    # logit_stu.shape: [B, C]
    return KD_from_logit(logit_stu, logit_tea)

def TM_KD(logit_stu, logit_tea):
    # logit_stu.shape: [B, C]
    # The body below is reconstructed from the Transpose and Mask description above,
    # since the original listing is truncated in the source.
    mask = (logit_tea > 0).float()                  # Mask: zero out teacher-predicted negatives
    logit_stu = (logit_stu * mask).transpose(0, 1)  # Transpose: distill per class, across samples
    logit_tea = (logit_tea * mask).transpose(0, 1)
    return KD_from_logit(logit_stu, logit_tea)

\frac{\partial L_{total}}{\partial O_i(x)} =
\begin{cases}
\alpha \, \frac{\partial L_{soft}}{\partial O_i(x)}, & \text{if } \frac{\partial L_{soft}}{\partial O_i(x)} > \frac{\partial L_{hard}}{\partial O_i(x)} \\
(1 - \alpha) \, \frac{\partial L_{hard}}{\partial O_i(x)}, & \text{if } \frac{\partial L_{soft}}{\partial O_i(x)} < \frac{\partial L_{hard}}{\partial O_i(x)}
\end{cases}    (8)

Equation (8) means that in each iteration, when we update the weighting parameters, we use either the supervision of the teacher model or the supervision of the ground truth, but not both. In this way the entanglement between the two losses is decoupled so that training converges quickly. After knowledge distillation, the final step is to merge all re-parameterization branches into a single one. Since the operators are specially designed for re-parameterization, they can be merged into a single 3 × 3 convolutional kernel without any computation overhead. Please refer to [8] for details of the merging process.
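A minimal PyTorch-style sketch of the per-position gradient routing implied by Eq. (8) is given below; the names student, soft_loss, hard_loss, teacher_out, target, and alpha are illustrative assumptions rather than code from the paper.

import torch

O = student(x)   # student outputs for the current batch

# Gradients of the soft (teacher) and hard (ground-truth) losses w.r.t. the outputs,
# computed on detached copies so they can be compared element-wise.
O_soft = O.detach().requires_grad_(True)
grad_soft = torch.autograd.grad(soft_loss(O_soft, teacher_out), O_soft)[0]
O_hard = O.detach().requires_grad_(True)
grad_hard = torch.autograd.grad(hard_loss(O_hard, target), O_hard)[0]

# Route each output position to exactly one supervision signal, following Eq. (8).
use_soft = grad_soft > grad_hard
grad = torch.where(use_soft, alpha * grad_soft, (1 - alpha) * grad_hard)

# Backpropagate the decoupled gradient through the student.
O.backward(grad)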


Fig. 3. The gradient divergence effect of L_soft and L_hard in knowledge distillation. We measure the gradient similarity of the second block of CSRNet and BL on the ShanghaiTech Part B dataset. The gradient similarity is computed as (1/2)[cos(grad_soft, grad_hard) + 1], which lies in the range [0, 1]. Intensive changes in gradient similarity are observed during training.
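For reference, the similarity measure in the caption can be computed as follows; grad_soft and grad_hard are assumed to be the two gradient tensors of the same block.

import torch.nn.functional as F

sim = 0.5 * (F.cosine_similarity(grad_soft.flatten(), grad_hard.flatten(), dim=0) + 1.0)  # lies in [0, 1]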

4 Experiments

We choose two crowd counting datasets for our experiments: ShanghaiTech [25] and UCF-QNRF [11]. ShanghaiTech is one of the largest crowd counting datasets, composed of 1,198 images with 330,165 annotations; it is divided into two parts according to different density distributions. UCF-QNRF is a recently released benchmark for crowd counting consisting of 1,535 challenging images with about 1.25 million annotations; its images cover a wide variety of scenes and contain highly diverse viewpoints, densities, and lighting conditions. As mentioned above, we choose CSRNet and BL as the teacher models, and the corresponding networks with a reduced number of channels as the student models.

4.1 Model Performance

Figure 4 shows the crowd density maps produced by different methods. The comparison shows that Repdistiller produces a clear density map without depending on a large network capacity. Table 2 lists the experimental results on the ShanghaiTech and UCF-QNRF datasets. Efficiency gains can be obtained simply by reducing the number of channels, but the performance of the 1/4BL and 1/4CSRNet models without knowledge transfer is severely degraded. With the proposed Repdistiller, the lightweight models recover most of this performance. For 1/4CSRNet, Repdistiller reduces the MAE and MSE by 18.7 and 35.4, respectively, on ShanghaiTech Part_A.


For 1/4BL, these two reductions are 26.4 and 44.5. This intuitively demonstrates that Repdistiller provides good performance compensation. In fact, for both 1/4BL+Repdistiller and 1/4CSRNet+Repdistiller, the MAE and MSE are very close to those of the corresponding original teacher networks; for 1/4BL+Repdistiller, the gap on ShanghaiTech Part_B is only 0.4 and 0.7. We attribute this to the following reasons: (1) Repdistiller compensates for the large capacity gap between the teacher and the student, realizing a fuller transfer of knowledge; (2) the distilled student model has fewer parameters, which helps alleviate overfitting in crowd counting.

Table 2. Results on the ShanghaiTech and UCF-QNRF datasets

Model                    | ShanghaiTech Part_A | ShanghaiTech Part_B | UCF-QNRF
                         | MAE      MSE        | MAE      MSE        | MAE      MSE
BL                       | 62.8     101.8      | 7.7      12.7       | 88.7     154.8
1/4BL                    | 89.6     146.9      | 12.3     19.8       | 135.7    157.4
1/4BL+Repdistiller       | 63.2     102.4      | 8.1      13.4       | 90.8     157.1
CSRNet                   | 68.2     115.0      | 10.6     16.0       | 109.0    194.0
1/4CSRNet                | 90.3     153.7      | 17.9     21.2       | 147.2    259.5
1/4CSRNet+Repdistiller   | 71.6     118.3      | 11.2     17.4       | 115.7    208.2

Fig. 4. Qualitative examples from the UCF-QNRF dataset. From left to right: the original images, the density maps estimated by MCNN [25], by CSRNet, by the method proposed in this paper, and the ground-truth density maps.

4.2 Comparisons with Other Compression Algorithms

Repdistiller is a knowledge distillation method. To verify its superiority as a compression technique, we also compare Repdistiller with several representative compression algorithms (BRECQ [15], QAT [12], L1Filter [14], PTD [21], FitNet [23], AT [24], and GID [6]). We compress CSRNet with each of these methods and test the performance of the compressed models on ShanghaiTech Part_A.


Table 3. Performance comparison of different compression algorithms on ShanghaiTech Part_A

        | Initial Network    | Quantization  | Pruning          | Knowledge Distillation
        | CSRNet  1/4CSRNet  | BRECQ  QAT    | L1Filter  PTD    | FitNet  AT     GID    Ours
MAE     | 68.2    90.3       | 82.5   75.4   | 84.3      78.4   | 87.4    75.6   73.2   71.6
MSE     | 115.0   153.7      | 138.9  135.2  | 145.4     129.4  | 141.9   128.4  123.3  118.3

It can be seen from Table 3 that our method obtains the lowest MAE (71.6) and MSE (118.3), which illustrates the clear advantage of Repdistiller in compressing crowd counting models. In our opinion, the main reason is that Repdistiller is tailored to crowd counting, which enables the student model to gain more knowledge from the teacher during the training phase.

4.3 Ablation Study

Table 4. Ablation experiment on 1/4CSRNet (dataset: ShanghaiTech Part_A)

Model          | MAE   | MSE
1/4CSRNet      | 90.3  | 153.7
RepVGG         | 80.6  | 138.2
DBB            | 75.5  | 130.7
NAS            | 72.9  | 123.4
Repdistiller   | 71.6  | 118.3

We further conduct experiments to evaluate the effect of the different modules of Repdistiller. The ablation experiments are performed on 1/4CSRNet, and the results are shown in Table 4. When the VGG backbone of 1/4CSRNet is replaced by a structurally re-parameterized form (RepVGG or DBB), significant performance gains are obtained; with DBB, the MAE and MSE are reduced by 14.8 and 23.0. This improvement does not come at the cost of inference speed, which makes it practical to deploy. Moreover, when using the re-parameterized structure searched by Repdistiller (the NAS row), the MAE and MSE are reduced by 17.4 and 30.3 compared with 1/4CSRNet, outperforming the fixed re-parameterized structures (RepVGG and DBB). This shows that a search framework specifically designed for crowd counting can find a more suitable re-parameterization structure. Finally, the full Repdistiller, including the gradient decoupling module, achieves the best performance (MAE 71.6, MSE 118.3).
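The inference-time merging that makes these re-parameterized branches free at deployment reduces to standard convolution-BatchNorm folding; a minimal sketch is given below, with function and tensor names chosen for illustration (they are not from the released code).

import torch
import torch.nn.functional as F

def fuse_conv_bn(conv_weight, bn):
    # Fold BatchNorm statistics into the preceding (bias-free) convolution:
    # W' = W * gamma / sqrt(var + eps),  b' = beta - gamma * mean / sqrt(var + eps)
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std
    fused_weight = conv_weight * scale.reshape(-1, 1, 1, 1)
    fused_bias = bn.bias - bn.running_mean * scale
    return fused_weight, fused_bias

def pad_1x1_to_3x3(kernel_1x1):
    # A 1x1 branch is zero-padded to 3x3 so that the fused kernels and biases of all
    # branches can simply be summed into a single 3x3 convolution.
    return F.pad(kernel_1x1, [1, 1, 1, 1])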


5 Conclusion

In this paper, we propose a novel knowledge distillation framework that incorporates structural re-parameterization for crowd counting. In addition, we decouple the gradients of the re-parameterization branches during training so that each branch focuses on either soft or hard labels. Our method improves the effectiveness of knowledge distillation while maintaining high speed at inference. Extensive experiments on the ShanghaiTech and UCF-QNRF datasets show that Repdistiller preserves model performance while greatly compressing the model, and it achieves the best performance on the crowd counting task among the compared compression methods.

References 1. Cai, H., Zhu, L., Han, S.: ProxylessNAS: direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332 (2018) 2. Chen, Z., Badrinarayanan, V., Lee, C.Y., Rabinovich, A.: GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: Proceedings of the IEEE International Conference on Machine Learning, pp. 1–10 (2018) 3. Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282 (2017) 4. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2016) 5. Chu, H., Tang, J., Hu, H.: Attention guided feature pyramid network for crowd counting. J. Vis. Commun. Image Represent. 80, 103319 (2021) 6. Dai, X., et al.: General instance distillation for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7842–7851 (2021) 7. Ding, X., Zhang, X., Han, J., Ding, G.: Diverse branch block: building a convolution as an inception-like unit. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10886–10895 (2021) 8. Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J.: RepVGG: making VGGstyle convnets great again. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733–13742 (2021) 9. Gao, G., Gao, J., Liu, Q., Wang, Q., Wang, Y.: CNN-based density estimation and crowd counting: a survey. arXiv preprint arXiv:2003.12783 (2020) 10. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015) 11. Idrees, H., et al.: Composition loss for counting, density map estimation and localization in dense crowds. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 544–559. Springer, Cham (2018). https:// doi.org/10.1007/978-3-030-01216-8_33 12. Jacob, B., et al.: Quantization and training of neural networks for efficient integerarithmetic-only inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704–2713 (2018) 13. Lempitsky, V., Zisserman, A.: Learning to count objects in images. In: Advances in Neural Information Processing Systems, vol. 23 (2010)


14. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016) 15. Li, Y., et al.: BRECQ: pushing the limit of post-training quantization by block reconstruction. arXiv preprint arXiv:2102.05426 (2021) 16. Li, Y., Zhang, X., Chen, D.: CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1091–1100 (2018) 17. Liu, Y., Cao, J., Hu, W., Ding, J., Li, L.: Cross-architecture knowledge distillation. In: Proceedings of the Asian Conference on Computer Vision, pp. 3396–3411 (2022) 18. Liu, Y., Cao, J., Li, B., Hu, W., Maybank, S.: Learning to explore distillability and sparsability: a joint framework for model compression. IEEE Trans. Pattern Anal. Mach. Intell. 45(3), 3378–3395 (2023) 19. Ma, Z., Wei, X., Hong, X., Gong, Y.: Bayesian loss for crowd count estimation with point supervision. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6142–6151 (2019) 20. Mirzadeh, S.I., Farajtabar, M., Li, A., Levine, N., Matsukawa, A., Ghasemzadeh, H.: Improved knowledge distillation via teacher assistant. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 5191–5198 (2020) 21. Park, J., No, A.: prune your model before distill it. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision, ECCV 2022. LNCS, vol. 13671, pp. 120–136. Springer, Cham (2022). https://doi.org/10.1007/978-3031-20083-0_8 22. Phuong, M., Lampert, C.H.: Towards understanding knowledge distillation. In: Proceedings of the International Conference on Machine Learning, pp. 1–10 (2019) 23. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets. arXiv preprint arXiv:1412.6550 (2014) 24. Zagoruyko, S., Komodakis, N.: Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928 (2016) 25. Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597 (2016)

Multi-depth Fusion Transformer and Batch Piecewise Loss for Visual Sentiment Analysis

Haochun Ou1, Chunmei Qing1,3(B), Jinglun Cen1, and Xiangmin Xu2,3
1 School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China
[email protected]
2 School of Future Technology, South China University of Technology, Guangzhou 510641, China
3 Pazhou Lab, Guangzhou 510330, China

Abstract. Visual sentiment analysis is an important research direction in the field of affective computing, which is of great significance in user behavior prediction, visual scene construction, etc. In order to address the issue of low efficiency in high-level semantic perception of CNNs, in this paper, the Transformer architecture is introduced into the task of visual sentiment analysis, proposing the Multi-depth Fusion Transformer (MFT) model that allows nesting different architectures. Additionally, addressing the problem of imbalanced data samples and varying learning difficulties in visual sentiment analysis datasets, this paper proposes a batch piecewise loss suitable for visual sentiment binary classification tasks. Experimental results on multiple datasets demonstrate that the proposed method outperforms existing approaches. Visualization experiments comparing CNN and Transformer-based approaches demonstrate the effectiveness of the proposed method in semantic perception. Keywords: Multi-depth Fusion Transformer · Batch Piecewise Loss · Sentiment analysis

1 Introduction

Recently, Transformer-based models have achieved impressive results in visual tasks, surpassing many previous achievements in tasks dominated by CNN architectures. Combining attention mechanisms with CNNs has achieved strong results in visual sentiment analysis and has greatly promoted its development. However, compared with the Transformer, a single convolutional layer in a CNN cannot capture long-distance dependencies, and its receptive field is limited [1, 2]. Using deeper networks to obtain a larger receptive field leads to lower computational and representational efficiency, making it difficult to capture semantic spatial correlation. In addition, the study of Geirhos et al. [3] pointed out that CNN models trained on ImageNet tend to classify images by texture, which differs from human visual habits. A recent Princeton study suggests that the Transformer is better at judging shapes


and more similar to humans in semantic perception mechanism than CNN [4]. Models that use shape for classification are more robust. Under the same training conditions on the same dataset, Transformer is more inclined to classify images by shape and performs better than CNN [4]. Visual sentiment analysis has always been looking for a more suitable learning strategy or classification method. Transformer, which has a more similar perception mechanism to humans, is theoretically more suitable for the visual sentiment analysis task. Therefore, this paper proposes a multi-depth fusion Transformer (MFT) model for the visual sentiment analysis task, which can address the issue of visual attention being selectively influenced by features at different levels. At the same time, a Batch Piecewise Loss (BPLoss) is proposed to solve the problem of imbalanced datasets in visual sentiment tasks. During the training phase, the BPLoss is used to train the MFT model. The main contributions of this paper are summarized as follows: • The Transformer model is introduced into the visual sentiment analysis task, and a suitable framework MFT is proposed according to the characteristics of the visual sentiment analysis task. • A new loss function, BPLoss, is proposed, which considers the imbalance of positive and negative samples in the dataset and adopts different penalty strategies for the loss based on the proportion of samples in the current batch, thereby further improving the performance of the model. • The effectiveness of the proposed MFT model and BPLoss is verified on 7 commonly used datasets, and the proposed method is visually compared with the representative CNN method on the EmotionROI and EMOd datasets to verify the proposed method in the effectiveness of semantic perception.

2 Related Work 2.1 Visual Sentiment Analysis Research indicates that different regions in an image contribute differently to the expression of emotions. The emotional changes experienced by viewers while observing an image are often triggered by specific regions of the image [5–7]. Local region-based methods have achieved impressive performance gains on many visual sentiment datasets. Yang et al. [8] proposed WSCNet, which incorporates image local features into image sentiment analysis by weighting the final output features of the network through a crossspatial pooling module. Similarly, Yang et al. [6] proposed the concept of emotion region, generated emotion score of local semantic information through sub-network, combined global and local information with CNN, and explicitly sensed local regions related to emotion in images only with single label data. Song et al. [9] integrated the concept of visual attention into image sentiment analysis, used a multi-layer neural network to model the significance distribution of each region of the image, and inferred the emotion category of the image in the region with the largest amount of information. Yadav et al. [10], based on NASNet, used the residual attention module to learn important areas related to emotion in images. Wu et al. [11] mentioned that the current work did not consider whether images without obvious local semantic objects are suitable for using local modules, therefore, they used target detection modules to determine whether to

Multi-depth Fusion Transformer and Batch Piecewise Loss

397

use local modules. Based on the work of Faster R-CNN [12], Rao et al. [7] further improved the model performance by combining local regions related to sentiment and features of different levels. Ou et al. [13] proposed MCPNet for object perception at different scales and multi-level emotion representation, introduced multi-scale adaptive context features into visual sentiment analysis tasks, and performed visual comparisons at different scales in the EmotionROI and EMOd. Xu et al. [27] proposed MDAN to disassemble the multi-level information, establishing affective semantic mapping to learn the level-wise discrimination, and classify emotions from both local and global. 2.2 Vision Transformer Although MCPNet introduces the context features that can capture long-distance dependencies, it is only used in the output features of the backbone in the three stages, failing to fundamentally solve the problems of low semantic perception efficiency of CNN high-level and high cost of obtaining long-distance dependencies. The core module of the Transformer is the multi-head self-attention module, which has the long-distance correlation feature to make it learn local correlation information from the shallow layer to the deep layer in the global perspective. At the same time, the multi-head self-attention mechanism ensures that the network can pay attention to various local dependencies. DETR [14], as a target detection framework, was the first model to successfully introduce Transformer into visual tasks. ViT [15] popularized Transformer to many computer vision tasks and achieved SOTA (State-of-the-art) in many fields such as object classification, object detection, semantic segmentation, and so on. Raghu et al. discussed the differences between traditional CNN and Transformer in their research and thought that compared with CNN. ViT can retain more spatial information than ResNet [16]. Therefore, ViT performs better in terms of performance. On the other hand, the proposed method by ST (Spatial Transformer) [17] integrates many successful experiences from CNNs into Transformers at a deeper level. Unlike ViT, which uniformly sizes features from different levels, the ST model structure resembles ResNet.

3 Multi-depth Fusion Transformer Framework Design 3.1 Overall Framework Process The proposed MFT (Multi-Depth Fusion Transformer) architecture is shown in Fig. 1. The whole architecture can be divided into three parts: the Tokenizer module which maps the image data into sequence features, the multi-depth fusion encoder for learning sequence features, and the MLP module for classification. The frame takes the original image I ∈ RH ×W ×3 as the input. Let the batch size of the input in the training stage be n, and I 1 , I 2 , · · · , I n represents the image data of the current batch. A single image is used as an example to describe the execution process of MFT. The Transformer architecture can handle sequential data, i.e., two-dimensional matrices [num_token, token_dim]. Therefore, the image I is first input into the Tokenizer module to convert I into sequence features X p ∈ RN ×d , where N represents the length of the sequence and d represents the number of channels of the features.

398

H. Ou et al.

Fig. 1. Overall architecture of the proposed MFT. Firstly, the input images are tokenized and embedded into sequence features add with position embedding. Next, the multi-depth fusion encoder module employs a sequence of L-Blocks and E-Blocks to encapsulate various architectures for feature extraction purposes. Finally, the batch piecewise loss is proposed dealing with the imbalance of positive and negative samples in visual sentiment classification tasks. Specially, the model has another branch for visualization.

X p = Tokenizer(I)

(1)

X p can be entered directly into the Transformer model. However, since the relative position information between image blocks will be lost after the image is expanded into a sequence. To introduce position information into the sequence feature, MFT adds position-coding to the sequence feature X p , just like most current studies [14, 17, 18]. By adding X p and the learnable matrix Epos ∈ RN ×d representing position information, the sequence feature Z0 of the input encoder can be obtained. Z0 = X p + Epos

(2)

Z0 represents the image sequence feature with location information. Since attention is affected by features at different levels, a multi-depth fusion encoder is used to learn the input Z0 . The multi-depth fusion encoder is composed of multiple blocks, and different depth information is extracted from different blocks to obtain multi-depth fusion features y. 



y = Encoding(Z0 )

(3)

The classifier adopts the MLP module which is commonly used in the Transformer model. The sentiment classification result y of the image I is obtained by passing the inputs representing different depth information y through the MLP (Multi-Layer Perceptron).   y = MLP y (4) 

Multi-depth Fusion Transformer and Batch Piecewise Loss

399

To address the issue of imbalanced data samples and varying learning difficulties, we propose BPLoss (Batch-wise Segmented Loss) in the MFT architecture. In Fig. 1, y1 , y2 , · · · , yn correspond to the classification result of the input images I 1 , I 2 , · · · , I n . The proportion of positive and negative samples in the batch data is obtained from L1 , L2 , · · · , Ln which correspond to the sentiment labels of the input images I 1 , I 2 , · · · , I n . In the training phase, different penalty strategies are adaptively adapted according to the proportion of positive and negative samples in the current batch to improve the performance of the model. 3.2 Tokenizer Module The Transformer module requires the input of the sequence feature, so the input image needs to be converted first. The function of the Tokenizer module is to convert the input image I into sequence feature X p . As shown in Fig. 2, Tokenizer has three steps in (1)

H

W

total. Firstly, the input image I is converted into a feature graph X p ∈ R r × r ×d of a specified size through the linear projection network and then expanded to obtain the (2)

H ×W

×d

, where H ×W is the sequence length. The third step is sequence feature X p ∈ R r2 r2 to add [class]token to the sequence of features according to the backbone.

Fig. 2. An illustration of the Tokenizer module. The input image is linearly projected, then expanded, and finally the [class]token is added to the first position.

In this paper, two linear projection networks are used, namely, a single convolution layer and ResNet50. The step size and convolution kernel size adopted by a single blocks, convolution layer are both r, which is equivalent to dividing the image into H ×W r2 and then converting each image block into features. Therefore, each position of the obtained sequence feature corresponds to the information of an image block. Although CNN has low computational efficiency in high-level semantic information, it is very effective in low-level feature perception. The transformer is good at capturing semantic information in images. The visual sentiment induction of images is affected by both low-level features and high-level semantic features. Using ResNet50 in linear projection networks can be a combination of the advantages of CNN and Transformer. Subsequent experiments will prove that under the same encoder structure, using ResNet50 as a linear projection network is more effective than using a single convolutional layer.

400

H. Ou et al.

3.3 Multi-depth Fusion Encoder The multi-depth fusion encoder is the core of MFT and the main part of sequence feature learning. As shown in Fig. 3, each layer in the multi-depth fusion encoder is divided into two stages, one is L-Block (Learning block) and the other is E-Block (Extraction block). Each Block contains two main sub-modules: MSA (Multi-head self-attention) and MLP. Suppose the multi-depth fusion encoder has H layers of Block, let Zi−1 be the input of the current Block, and the output is Zi , the operation of a Block is as follows.

Fig. 3. Structure diagram of multi-depth fusion encoder.

Zi = Blocki (Zi−1 ), 1 ≤ i ≤ H

(5)

E-Block has one more extraction step than L-Block. For each E-Block, the feature [class]token representing the classification information of the Block is extracted. MFT will get the [class]token feature representing the Block information through the linear projection function f (·) to get the layer_token of the Block. layer_tokeni = f ([class]tokeni )

(6)

yˆ = [layer_tokeni , layer_tokeni+1 , · · · , layer_tokenH ]

(7)

Here, f (·) uses a single fully connected layer, and f (·) of different depths is independent of each other. The depth selection of E-block is further discussed in the experimental section. Finally, MFT merges multiple level features of layer_token to obtain y. Here the size of the feature sent into the classifier by the MFT and its corresponding backbone is the same. Assuming that the size of the feature input to the classifier using the selected Transformer architecture is R1×D , then y∈ R1×D . Assuming there are m E-Blocks and D layer_token∈ R1× m . 



3.4 Batch Piecewise Loss Most of the commonly used datasets of visual sentiment analysis are imbalance distributed. Inspired by Focal Loss [19], we proposed the BPLoss (Batch Piecewise Loss)

Multi-depth Fusion Transformer and Batch Piecewise Loss

401

Fig. 4. α and BPLoss with cneg /cpos change value curve.

and calculated different Loss according to the number of positive and negative samples in each batch in the training stage.  Loss(x) =

(α + β) × L(x), cpos ≥ cneg and α = cneg /cpos cpos < cneg and α = 1 − cpos /cneg (L(x))α ,

(8)

cneg and cpos are the number of negative and positive samples in batch respectively. α is the adaptive weight of BPLoss. When cpos ≥ cneg , let α = cneg /cpos , weaken the loss mainly uses the form α × L(x). And in the case of cpos ≥ cneg , the bias coefficient β is added. The main function of β is to increase the value of cneg /cpos , so as not to penalize loss too much. β = 0.3, here. And in order to prevent extreme cases (for example, cneg and cpos values are very close, resulting in α + β > 1), BPLoss also adopted a partition measure, forcing α + β = 1 when α + β > 1. However, when the batch contains a large number of negative samples, the strategy is to increase the loss of the current batch and the nonlinear calculation of loss. Because L(x) < 1, so we use (L(x))α , and let α = 1 − cpos /cneg . As the sum of cneg and cpos is the size of the current batch and is a constant value n, the value curve of α is shown in Fig. 4(a). In this paper, the cross-entropy loss commonly used in visual sentiment analysis is adopted as L(x), and its value is set to be 0.5. The curve of BPLoss with varying cneg /cpos values is shown in Fig. 4(b). Due to the bias coefficient β = 0.3, the black dot in Fig. 4(b) represents the minimum value of BPLoss (when cneg is 0). The orange dotted line represents the value of BPLoss without penalty, which is L(x). Due to β and partition measures, as shown in Fig. 4(b), when 0.7 < cneg /cpos < 1, α + β = 1, BPLoss = L(x), the value curve of BPLoss will appear a segment parallel to the X-axis.
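A minimal sketch of the batch-wise piecewise weighting described above is given below, assuming a scalar cross-entropy loss for the current batch and binary sentiment labels; the function and variable names are illustrative and not taken from the paper's code.

import torch

def bp_loss(ce_loss, labels, beta=0.3):
    # ce_loss: scalar cross-entropy loss of the current batch; labels: [n] binary labels.
    c_pos = (labels == 1).sum().float()
    c_neg = (labels == 0).sum().float()
    if c_pos >= c_neg:
        alpha = c_neg / c_pos.clamp(min=1.0)
        weight = torch.clamp(alpha + beta, max=1.0)   # partition measure: force alpha + beta <= 1
        return weight * ce_loss                       # weaken the loss when positives dominate
    else:
        alpha = 1.0 - c_pos / c_neg
        return ce_loss ** alpha                       # amplify the loss (L(x) < 1) when negatives dominate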

4 Experiments This section uses three representative Transformer models as backbone, namely ViT [15], Hybrid [15], and ST [17], and pre-trained on ImageNet. When the backbone is ViT or Hybrid, r = 16; when the backbone is ST, r = 4. The encoder structures of ViT and Hybrid are the same, but the Tokenizer is different. ViT uses a single convolutional

402

H. Ou et al.

layer, and Hybrid uses ResNet50. The parameters of ResNet50 used in this paper are consistent with those in the paper [15]. ST and ViT have different encoders but use the same Tokenizer. The datasets and basic settings used in this paper refer to the experimental settings of MCPNet [13]. As a form of data augmentation, images input into the MFT framework are randomly flipped horizontally and cropped to size 224 × 224 or 384 × 384 using center cropping to reduce overfitting. All experiments are modeled in the Pytorch framework and performed on a GeForce GTX1080Ti GPU. 4.1 Comparisons with State-of-the-Art Methods For fair comparison, this paper adopts a 5-fold cross-validation strategy to evaluate current advanced methods, are shown in Table 1 and Table 2. Unlike the other methods, the proposed method is based on the Transformer structure as shown in Table 1 on five commonly used datasets. Compared with the methods based on CNN such as MCPNet, the proposed MFT is 3.1% higher than MCPNet method in the FI dataset and 2.96% higher than MCPNet in the EmotionROI dataset. In ArtPhoto, Abstract and IAPSsubset three small datasets, the performance of the proposed method is better than that of MCPNet. At the same time, we also compares on the two binary small datasets Twitter I, Twitter II in Table 2. The three subsets of Twitter I and the Twitter II also demonstrate the sophistication of this paper’s approach. The three subsets of Twitter I represent different classification difficulties. However, MFT has a further improvement over MCPNet in these three datasets, and the classification accuracy score in Twitter I_5 is more than 90%. Table 1. Classification results of different models on five different datasets Model

Acc FI

EmotionROI

ArtPhoto

Abstract

IAPSsubset

VGG-16 [20]

73.95

72.49

70.48

65.88

87.20

ResNet101 [21]

75.76

73.92

71.08

66.64

88.15

PCNN [22]

73.59

74.06

71.47

70.26

88.65

Rao(b) [23]

79.54

78.99

74.83

71.96

90.53

Zhu [24]

84.26

80.52

75.50

73.88

91.38

AR [6]

86.35

81.26

74.80

76.03

92.39

Rao(c) [7]

87.51

82.94

78.36

77.28

93.66

WSCNet [8]

86.74

82.15

80.38

77.84

94.61

PDANet [26]

87.25

82.27

80.27

78.24

95.18

MCPNet [13]

90.31

85.10

79.24

78.14

94.82

MDAN [27]

91.08

84.62

91.50

83.80

96.37

MFT + BPLoss

93.41

88.06

80.21

80.58

95.75

Multi-depth Fusion Transformer and Batch Piecewise Loss

403

Table 2. Classification results of different models on Twitter I and Twitter II Model

Acc

Twitter II

Twitter I_5

Twitter I_4

Twitter I_3

PCNN [22]

82.54

76.52

76.36

77.68

VGG-16 [20]

83.44

78.67

75.49

71.79

AR [6]

88.65

85.10

81.06

80.48

RADLNet [10]

89.10

83.20

81.30

81.20

GMEI&LRMSI [11]

89.50

86.97

81.65

80.97

MCPNet [13]

89.77

86.57

83.88

81.19

MFT + BPLoss

91.48

87.72

84.92

83.93

4.2 Ablation Experiments The ablation experimental results of the combination of MFT and BPLoss are shown in Table 3. Simultaneous application of MFT and BPLoss has obvious improvement in the three backbones, among which there is a nearly one-point improvement in ViT384, which proves the effectiveness of the method our proposed. Since the goal of this paper is to propose a general method, in addition to the selection of depth, other model parameters are not adjusted according to different frameworks, but unified parameters are used. Therefore, the proposed method will have different effects on different backbones and the BPLoss application has obvious improvement in classification performance in MFT, with an improvement of 0.37% in MFT. Table 3. Module ablation experiment with different backbone on FI Module BPCLoss

Backbone MFT

√a √ √ √



ViT

Hybrid

ST

224

384

224

384

224

384

91.40

91.78

91.59

92.58

92.06

93.04

91.56

91.80

91.79

92.54

91.98

93.01

91.70

92.17

91.99

92.78

92.12

93.15

91.86

92.37

92.19

92.97

92.24

93.22

91.95

92.74

92.35

93.17

92.37

93.41

a indicates the use of BPLoss without β

404

H. Ou et al.

4.3 Visualization Here, MCPNet [13], with the best classification effect in the current visual sentiment analysis task, is selected as the representative model based on the CNN method for comparison. MCPNet is adopted to visualize the multi-cue sentiment features output by the model in the way of a class activation map in the visualization. The visualization method adopted by Transformer is different from that of CNN. The commonly used method is to combine multiple self-attention modules. Here, the Transformer visualization method proposed by Chefer et al. [25] is adopted. The input size of MCPNet is 448 × 448 and that of MFT is 224 × 224. Although the input size of MFT is half that of MCPNet, subsequent visualizations show that MFT performs well even at smaller input sizes. Both MCPNet and MFT can capture long-distance dependencies in images. However, MCPNet only introduced a self-attentional module in the last three stages of Res-Net101, whereas the multi-head self-attentional module is the main module of MFT. Therefore, MFT has stronger global semantic awareness.

Fig. 5. Visual comparisons of GT, MCPNet and MFT.

Capturing the long-distance dependence is helpful to understand the semantic information of the image. As can be seen from Fig. 5, MFT shows excellent semantic awareness in Fig. 5(a-3), Fig. 5(a-4), Fig. 5(b-3), Fig. 5(b-5), etc. In Fig. 5(a-3), two main parts attract viewers: the balloon and the human. Compared with MPCNet, MFT can capture these two areas more clearly. Compared with other contents in Fig. 5(a-4), although the

Multi-depth Fusion Transformer and Batch Piecewise Loss

405

plane at the edge of the image is small, it can directly attract our attention. MPCNet focuses on the aircraft and the entire flight path, which does not fit the label well, but MFT can accurately focus on the aircraft. Figure 5(b-3) shows the scene of injection. In this example, MPCNet does not perform well, but MFT focuses on the needle area, which is similar to our human attention habit.

5 Conclusions In this paper, we discuss the effect of Transformer architecture in visual sentiment analysis and propose MFT to combine different level features under long-distance dependence characteristics for analysis according to task characteristics. Meanwhile, aiming at the problem of unbalanced dataset of visual sentiment analysis, the BPLoss suitable for visual sentiment binary classification task is proposed. In the training stage, BPLoss adopts different punishment strategies according to the proportion of positive and negative samples in the current batch data to improve the performance of the model. Finally, the effectiveness of the proposed method in semantic perception is verified by visual comparison with the representative method based on CNN. Acknowledgement. This work is partially supported by the following grants: National Natural Science Foundation of China (61972163, U1801262), Natural Science Foundation of Guangdong Province (2022A1515011555, 2023A1515012568), Guangdong Provincial Key Laboratory of Human Digital Twin (2022B1212010004) and Pazhou Lab, Guangzhou, 510330, China.

References 1. Wu, B., Xu, C., Dai, X., et al.: Visual transformers: Token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677 (2020) 2. Zheng, S., Lu, J., Zhao, H., et al.: Rethinking semantic segmentation from a sequence-tosequence perspective with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, USA, pp. 6881–6890. IEEE (2021) 3. Geirhos, R., Rubisch, P., Michaelis, C., et al.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2018) 4. Tuli, S., Dasgupta, I., Grant, E., et al.: Are convolutional neural networks or transformers more like human vision? In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 43 (2021) 5. You, Q., Jin, H., Luo, J.: Visual sentiment analysis by attending on local image regions. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 231–237. AAAI Press, USA (2017) 6. Yang, J., She, D., Sun, M., et al.: Visual sentiment prediction based on automatic discovery of affective regions. IEEE Trans. Multimedia 20(9), 2513–2525 (2018) 7. Rao, T., Li, X., Zhang, H., et al.: Multi-level region-based convolutional neural network for image emotion classification. Neurocomputing 333(6), 429–439 (2019) 8. Yang, J., She, D., Lai, Y.K., et al.: Weakly supervised coupled networks for visual sentiment analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7584–7592. IEEE, USA (2018)

406

H. Ou et al.

9. Song, K., Yao, T., Ling, Q., et al.: Boosting image sentiment analysis with visual attention. Neurocomputing 312, 218–228 (2018) 10. Yadav, A., Vishwakarma, D.K.: A deep learning architecture of RA-DLNet for visual sentiment analysis. Multimedia Syst. 26(4), 431–451 (2020) 11. Wu, L., Qi, M., Jian, M., et al.: Visual sentiment analysis by combining global and local information. Neural. Process. Lett. 51(3), 2063–2075 (2020) 12. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017) 13. Ou, H., Qing, C., Xu, X., et al.: Multi-level context pyramid network for visual sentiment analysis. Sensors 21(6), 2136 (2021) 14. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-58452-8_13 15. Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: Proceedings of the International Conference on Learning Representations, ICLR 2021 (2021) 16. Raghu, M., Unterthiner, T., Kornblith, S., et al.: Do vision transformers see like convolutional neural networks? In: Advances in Neural Information Processing Systems. MIT Press (2021) 17. Liu, Z., Lin, Y., Cao, Y., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022. IEEE, USA (2021) 18. Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, USA, pp. 464–468 (2018) 19. Lin, T.Y., Goyal, P., Girshick, R., et al.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988. IEEE, USA (2017) 20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, ICLR, USA (2015) 21. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE, USA (2016) 22. You, Q., Luo, J., Jin, H., et al.: Robust image sentiment analysis using progressively trained and domain transferred deep networks. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 381–388. AAAI, USA (2015) 23. Rao, T., Li, X., Xu, M.: Learning multi-level deep representations for image emotion classification. Neural. Process. Lett. 51(3), 2043–2061 (2020) 24. Zhu, X., Li, L., Zhang, W., et al.: Dependency exploitation: a unified CNN-RNN approach for visual emotion recognition. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Australia, pp. 3595–3601 (2017) 25. Chefer, H., Gur, S., Wolf, L.: Transformer interpretability beyond attention visualization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 782–791. IEEE, USA (2021) 26. 
Zhao, S., Jia, Z., Chen, H., et al.: PDANet: polarity-consistent deep attention network for finegrained visual emotion regression. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 192–201 (2019) 27. Xu, L., Wang, Z., Wu, B., et al.: MDAN: multi-level dependent attention network for visual emotion analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9479–9488 (2022)

Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation Xihua Wang1 , Lei Ji2 , Kun Yan2,3 , Yuchong Sun1 , and Ruihua Song1(B) 1

Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China {xihuaw,ycsun}@ruc.edu.cn, songruihua [email protected] 2 Microsoft Research Asia, Beijing, China [email protected] 3 SKLSDE Lab, Beihang University, Beijing, China [email protected]

Abstract. The open vocabulary segmentation (OVS) task has gained significant attention due to the challenges posed by both segmentation and open vocabulary classification, which involves recognizing arbitrary categories. Recent studies have leveraged pretrained Vision-Language models (VLMs) as a new paradigm for addressing this problem, leading to notable achievements. However, our analysis reveals that these methods are not yet fully satisfactory. In this paper, we empirically analyze the key challenges in four main categories: segmentation, dataset, reasoning and recognition. Surprisingly, we observe that the current research focus in OVS primarily revolves around recognition issues, while others remain relatively unexplored. Motivated by these findings, we propose preliminary approaches to address the top three identified issues by integrating advanced models and making adjustments to existing segmentation models. Experimental results demonstrate the promising performance gains achieved by our proposed methods on the OVS benchmark. Keywords: Open-vocabulary · Segmentation · Vision-language models

1

Introduction

The open vocabulary segmentation (OVS) task involves two challenges: precise image segmentation and accurate recognition of segmented regions with open vocabulary labels. While traditional vision models have made notable advancements in closed-set scenarios for image segmentation, they encounter difficulties in recognizing unrestricted semantics [1,28,36]. Recently, there has been significant development in large-scale vision-language models (VLMs). These models, which are pretrained on massive image-text pairs, demonstrate a remarkable understanding of open semantics and possess strong zero-shot capabilities and robustness for image level tasks [9,22]. Furthermore, recent studies have shown c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14434, pp. 407–419, 2024. https://doi.org/10.1007/978-981-99-8549-4_34

408

X. Wang et al.

Fig. 1. Statistic of the 9 issues. (a) demonstrates the capability of existing model for each issue; (b) presents the frequency of 4 main issues; (c) shows collaborative integration of three foundational models has the potential to address the above issues and represents a promising future research direction.

that these VLMs can effectively transfer their image-level capabilities to regionlevel even pixel-level tasks [23,39]. This makes them efficient and promising solutions for open vocabulary segmentation tasks, as evidenced by their stateof-the-art performances on various benchmarks [14,34]. While VLM-based OVS methods have achieved superior performance on various OVS benchmarks, the upper limit of OVS still remains far from being fully realized, as depicted in Fig. 1. Recent works focus on improving the transfer ability of VLMs to OVS [7,14,30,32,34], but the performance bottleneck of VLMbased OVS methods remains unknown and has not been thoroughly discussed. In this paper, we offer a comprehensive analysis of the performance limitations encountered by the current state-of-the-art models when applied to OVS benchmarks. Furthermore, we present some preliminary attempts for these prominent limitations, to provide research insights for further exploring this problem. At first, we conducted a comprehensive case analysis of the state-of-theart solutions for OVS [14,30,34] and categorized the performance bottlenecks into the following types based on their frequency of occurrence: 1) segmentation issues (including missing elements, wrong mask, and inaccurate boundary), which require further improvement of the quality of image segmentation; 2) dataset issues (including similar visual patterns and concepts), which suggest that some annotations in the dataset are confusing; 3) reasoning issues, which result in non-common sense predictions; 4) recognition issues, which involve mis-classifying a segmentation into wrong object label or the requirement of 3D modeling. We unexpectedly found that the current research focus in open vocabulary segmentation is mainly on recognition issues [14,30,34], which is only ranked fourth in terms of issue occurrence. However, there are three more frequent issues, namely segmentation issues, dataset issues, and reasoning issues. Recently, the progress of foundational vision models (FVMs), such as SAM [12], has provided more imagination space for the entire vision community. We found that combin-

Expanding the Horizons: Exploring Further Steps

409

ing new foundational vision models and making targeted adjustments to current OVS-adapted VL models for segmentation issue can bring stable and promising gains across the entire benchmark. Additionally, with the advancement of large language models (LLMs), their excellent common sense and reasoning abilities have been proven in various language and multi-modality tasks [15,21,26,37]. We found that the common sense reasoning ability of LLMs corresponds well to the reasoning issues. Therefore, we made targeted attempts to address common sense issues by combining LLMs with VLM-based OVS methods. Our main contributions are three-fold: 1) We conduct a comprehensive analysis to show the performance bottlenecks of the current state-of-the-art models on the OVS benchmark. 2) We propose preliminary approaches to address the top 3 issues we found by integrating advanced models (LLMs and FVMs) and adjustments to current segmentation models. 3) Our methods show promising gains on the OVS benchmark.

2 2.1

Background Open Vocabulary Semantic Segmentation

Different from traditional semantic segmentation, OVS poses a challenge by transitioning from closed sets to open sets, requiring models to handle open semantics effectively. Previous studies have explored approaches to facilitate open semantic understanding in traditional models by replacing labels with text embeddings [1,13,28,36]. These methods aimed to minimize reliance on additional annotated data and model design. Recent research has benefited from access to large-scale paired image-text data [9,22,25]. Many methods leverage weak supervision from image-text pairs to enable models to learn both segmentation and open semantic understanding capabilities. These approaches include methods with grounding loss and language supervision [8,31], omni-framework designs [35,41], GroupViT-based methods [16,18,29], and VLM-based methods [20]. Furthermore, an increasing number of recent studies directly utilize off-the-shelf VLMs pre-trained on large-scale image-text pairs [3,4,6,7,10,14,17,30,33,34]. Some works modify the attention weights in CLIP to focus on specific regions of an image [7,34], fine-tune CLIP to adapt to mask-centric images [14], add additional training modules to adapt to OVS [3,4,6,17], or utilize intermediate features from generative models to represent specific image regions [30]. By incorporating adaptive designs with frozen or fine-tuned VLMs and training on a small amount of annotated segmentation data, these models achieve both strong semantic understanding and segmentation capabilities.

The challenge faced by VLM-based OVS methods lies in effectively transferring image-level semantic understanding capabilities to fine-grained and dense segmentation tasks. In our work, we provide an issue-analysis perspective on this challenge. While most VLM-based OVS methods primarily focus on the Recognition Issue, which involves extracting accurate semantic embeddings for each mask area to unleash the capabilities of VLMs, there is insufficient attention given to other critical issues. Specifically, the Segmentation Issue concerns achieving higher segmentation quality, and the Reasoning Issue focuses on ensuring the overall coherence and reasonableness of classification results. These limitations can be considered inherent to VLMs. By addressing these additional issues in our analysis, we aim to enhance the performance of VLM-based OVS methods.

2.2 Foundational Models Pretrained on Large-Scale Data

Foundational VLMs: As discussed above, VLMs [9,22] have demonstrated robust transfer capabilities and outstanding performance across various downstream tasks, including OVS. However, it is worth noting that VLMs may have inherent limitations that prevent them from effectively addressing all the issues raised in our work.

Foundational Language Models (LLMs): LLMs have also gained attention due to their strong generalization and zero-shot abilities in language [26,37] and multimodal tasks [15,21]. However, the exploration of leveraging LLMs to enhance OVS is still limited. Considering the intricate nature of reasoning, we investigate the potential of leveraging existing LLMs' reasoning capabilities to address the Reasoning Issue.

Foundational Vision Models (FVMs): Recently, there has been some exploration of foundational models for computer vision, particularly in the context of image segmentation [12,27]. Community efforts have led to significant advancements in training data, resulting in substantial improvements in the image segmentation capabilities of foundational models [2,11,12,38,42]. Nevertheless, the challenge of effectively combining these powerful FVMs with cross-modal understanding capabilities strong enough for OVS remains unresolved. The remarkable capabilities of foundational models instill confidence in the potential of an approach based on them for future research. Accordingly, this study explores the collaborative integration of these three types of foundational models to address the aforementioned issues, thereby uncovering new possibilities for the OVS framework and pushing the performance boundaries of the OVS benchmark further.

3 Issue Analysis in OVS

3.1 Analysis Settings

Issue Definition. We conducted a preliminary case study on the OVS benchmark to examine the prediction results of state-of-the-art models. The objective of this study was to categorize the erroneous results into distinct and complete types. We identified 9 issue types across the models, which were further merged into 4 major categories: segmentation issues, dataset issues, reasoning issues, and recognition issues. We illustrate the various issue types in detail in Table 1.

Analysis Pipeline. A random subset of each benchmark validation set was sampled for analysis. Given an input image and a state-of-the-art model, a semantic segmentation prediction on the input was generated. Each pixel in the prediction is assigned a label from the open vocabulary, and each predicted label corresponds to a segmentation mask. Among the ground-truth (GT) masks of the input image, we assume that the one with the maximum intersection over union with each predicted mask is its GT mask. Firstly, comparing each predicted mask to its GT mask, we identify any of the 9 specified issues for this predicted mask and increment the corresponding issue count accordingly; multiple issues were allowed to be identified for each prediction. Secondly, if a label exists in the ground truth but is missed in the prediction results, we categorize it as a "missing object issue" and increment the count of missed labels. Through this process, we obtain the issue counts for each state-of-the-art model on the benchmark dataset. To mitigate subjective factors, we invited 10 researchers to participate in this process, and the results are averaged to obtain the final issue statistics.
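For concreteness, the counting procedure can be sketched as follows. This is a minimal illustration under assumed data structures (boolean masks, an external `annotate` judgment supplied by the human annotators), not the authors' exact tooling.

```python
import numpy as np

ISSUE_TYPES = ["missing object", "wrong mask", "inaccurate boundary",
               "similar visual pattern", "similar concepts", "hard to judge",
               "misidentification", "single-view 3D", "missing common sense"]

def iou(a, b):
    # a, b: boolean (H, W) masks
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def count_issues(pred_masks, pred_labels, gt_masks, gt_labels, annotate):
    """Tally the 9 issues for one image; `annotate` is the human judgment
    returning a (possibly empty) list of issue names per prediction."""
    counts = {t: 0 for t in ISSUE_TYPES}
    for m, lbl in zip(pred_masks, pred_labels):
        # the GT mask with maximum IoU is treated as this prediction's GT
        gt_idx = int(np.argmax([iou(m, g) for g in gt_masks]))
        for issue in annotate(m, lbl, gt_masks[gt_idx], gt_labels[gt_idx]):
            counts[issue] += 1          # multiple issues allowed per prediction
    # GT labels absent from the predictions count as "missing object"
    counts["missing object"] += len(set(gt_labels) - set(pred_labels))
    return counts
```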

Table 1. The detailed definition of the 4 categories and 9 issues.

Categories | Issues | Details
Segmentation | missing object | The object is not segmented out separately.
Segmentation | wrong mask | The mask is not a valid object itself, or the region is part of another object.
Segmentation | inaccurate boundary | The segmentation boundary of the object is inaccurate: overlapped but not strictly aligned with the ground-truth region, either too large or too small.
Dataset | similar visual pattern | The predicted object is visually similar to the ground-truth object. It is hard to judge which one is correct, e.g., hook or tower ring.
Dataset | similar concepts | The predicted object and the ground truth are semantically similar to each other, e.g., chair vs. armchair.
Dataset | hard to judge | Given the image, it is hard to judge whether the object is the predicted or the ground-truth label; both make sense.
Recognition | misidentification | The segmentation of the object is correct but it is misidentified as a wrong object.
Recognition | single-view 3D | To recognize the object accurately, extra single-view 3D data is required, such as an extra depth channel.
Reasoning | missing common sense | The object should not appear in the current scene, and the error is caused by a lack of common-sense reasoning capability.

3.2 Datasets and Models

Datasets. We use four widely-used datasets in the OVS benchmark: ADE20k-150 [40], ADE20k-847 [40], Pascal-59 [19], and Pascal-459 [19]. ADE20k-150 and ADE20k-847 are large-scale scene understanding datasets with 20k training images and 2k validation images, consisting of 150/847 labels. Pascal-59 and Pascal-459 comprise 5k training and validation images, with 59/459 labels.

Models. We evaluate recent VLM-based OVS methods that have achieved outstanding performance: SAN [34], OVSeg [14], and ODISE [30]. Each model incorporates a VLM, such as CLIP [22] or a text-to-image generation model [24]. All models were trained on the COCO dataset and evaluated on the ADE and Pascal datasets mentioned above.

3.3 Analysis Results

As shown in Fig. 1, we average the evaluation results of the different models across the datasets. Surprisingly, although most VLM-based OVS methods primarily focus on recognition issues, our statistical analysis reveals that the three most frequently occurring issues are segmentation, dataset, and reasoning issues.

4 Preliminary Approaches for Bottleneck Issues

4.1 Preliminaries

We choose the SAN model [34], which has superior performance, as our base model to verify our initial attempts. Given an image i ∈ R^{H×W×3} and K categories, the SAN model generates Q mask proposals M ∈ R^{Q×H×W} and computes similarity scores for each proposal with respect to the K categories, P ∈ R^{Q×K}. The semantic segmentation result S ∈ R^{K×H×W} can be obtained by S = Pᵀ × M. The standard semantic segmentation evaluation procedure applies the argmax operation to the first (category) dimension of S, yielding a segmentation map in which each pixel is assigned one of the K categories.
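As a concrete illustration of this formulation, the following is a minimal sketch with arbitrary example shapes and random tensors standing in for the SAN outputs:

```python
import torch

Q, K, H, W = 100, 847, 64, 64                    # example sizes, not SAN's actual ones
P = torch.softmax(torch.randn(Q, K), dim=-1)     # proposal-to-category scores, (Q, K)
M = torch.sigmoid(torch.randn(Q, H, W))          # mask proposals, (Q, H, W)

# S = P^T x M: per-pixel category scores, obtained by combining the mask
# proposals weighted by their category scores.
S = torch.einsum('qk,qhw->khw', P, M)            # (K, H, W)

# Standard evaluation: argmax over the category dimension gives the
# per-pixel segmentation map.
seg_map = S.argmax(dim=0)                        # (H, W), values in [0, K)
```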

4.2 Refining Benchmark for Dataset Issue

According to the analysis in Sect. 3, it is evident that some annotations in the dataset may not be appropriate, leading to a significant portion of the model's errors on the benchmark. We specifically focus on one sub-issue of the Dataset Issue, the "similar concepts issue", and propose a preliminary solution for a clearly determined subset of similar concepts, such as the obviously similar cases "bed - beds" and "step, stair - stairs, steps". To create similar label sets, we employed a similarity-based approach by computing the similarity between phrases. Higher similarity scores indicate a closer relationship between phrases, and we grouped them accordingly. Each label l is associated with a similar label set L, with a minimum set size of 1, representing the label itself. To avoid using the same text encoder as the model, we utilized BERT [5] to compute phrase embeddings, and cosine similarity was employed to measure the similarity between phrases. Phrases with a similarity score greater than 80 were grouped together. Subsequently, we manually filtered the sets, removing those with significant semantic differences and retaining only those with highly similar semantics. With these label sets established, given the predicted label l_pred for each pixel and the ground-truth label l_gt, we adjust the predicted label for each pixel as follows:

l_pred = { l_pred, if l_pred ∉ L_gt
           l_gt,   if l_pred ∈ L_gt        (1)
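A minimal sketch of this grouping and relabeling step is given below. It assumes `bert-base-uncased` with mean-pooled token embeddings and a 0.8 cosine-similarity threshold as stand-ins for the paper's exact setup (the score of "80" is assumed to be on a 0–100 scale), and the candidate labels are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel

labels = ["bed", "beds", "step, stair", "stairs, steps", "chair", "armchair"]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(phrase):
    inputs = tok(phrase, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state      # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)               # mean-pool over tokens

emb = torch.nn.functional.normalize(torch.stack([embed(l) for l in labels]), dim=-1)
sim = emb @ emb.T                                      # pairwise cosine similarity

# Similar label set L for each label (size >= 1); manual filtering follows.
similar_sets = {l: {labels[j] for j in range(len(labels)) if sim[i, j] > 0.8}
                for i, l in enumerate(labels)}

def relabel(l_pred, l_gt):
    """Eq. (1): accept the prediction as the GT label if it falls in L_gt."""
    return l_gt if l_pred in similar_sets.get(l_gt, {l_gt}) else l_pred
```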

4.3 Our Framework for Segmentation and Reasoning Issue

Fig. 2. Our framework consists of three components. Given an input image, the VL-seg model generates matrices P and M. 1) The small stuff post-processing module corrects matrix P. 2) The corrected predictions are evaluated by the LLM, which further adjusts matrix S. 3) The adjusted results are passed through SAM to generate the final segmentation output.

In this subsection, we primarily address the prevalent missing object and inaccurate boundary issues within the Segmentation Issue and the missing common sense issue within the Reasoning Issue, and propose a preliminary framework shown in Fig. 2.

Small Stuff Processing for Missing Object Issue. Regarding the missing object issue, our analysis revealed that a significant proportion of this issue is attributed to small stuff, characterized by a relatively small GT mask ratio within the overall image. To tackle this, we implemented a post-processing step on matrix P to amplify the scores of small stuff, thereby enhancing their detectability through softmax operations. Specifically, for each layer p ∈ R^{1×K} in P, the following adjustment is applied: p′ = (1 − λ · r_area^p / Q) · p, where r_area^p represents the ascending-order rank of the area of p's mask among all Q mask candidates, and Q is the number of layers of P.

Large Language Models for Missing Common Sense Issue. Based on our observations, some predicted labels are clearly inconsistent with other labels and significantly deviate from the scene depicted in the image. For example, in an indoor scene with "beds, sofas, carpets, tables, and chairs", the roof of a small wooden house is mistakenly predicted as a "carport". This is a clear common-sense error that humans would not make due to their reasoning and common-sense abilities: when humans see an indoor scene, they would not associate it with a "carport", even though these two stuffs may look similar. These reasoning and understanding abilities between labels are lacking in VLMs but are crucial for this task. Considering the recent advancements in LLMs [37], we introduce an LLM (ChatGPT-turbo in our work) to address this common-sense issue. Given an image i and the S matrix refined in the above Small Stuff Processing step, we apply argmax to obtain the predicted label set. We then input this label set into the LLM using a question-based approach:

"Given a list of stuff labels, these stuffs are from the same picture, but some stuff labels may be wrong, because according to common sense they cannot appear in the same picture with other stuffs at the same time. Now, given this collection of stuff labels, you need to judge whether there is a possible wrong label in it, and if so, please list the wrong labels. Given stuff label list: {stuffs}."

The LLM provides an answer set, which is a set of potentially incorrect labels (the answer could also indicate that there are no errors, i.e., an empty set). We repeat this process three times and take the intersection U of the answer sets for each image i to obtain the set of incorrect labels. We penalize the layers in the S matrix corresponding to the labels in the incorrect set and adjust M accordingly.

Segment-Anything-Model for Inaccurate Boundary Issue. As VLMs are pretrained at the image level and the added segmentation module is usually lightweight, the quality of the generated pixel-level segmentation is limited. Our analysis reveals that this limitation, known as the "inaccurate boundary issue", also accounts for a significant portion of errors. To address this, we introduce the Segment-Anything-Model (SAM) as a segmentation expert to mitigate the problem by refining the VLM-based segmentation results. Given an input image i, we first utilize the adjusted M matrix from the previous two steps to obtain the final segmentation map S by applying argmax, assigning each pixel a label. Subsequently, we employ the SAM model on image i to generate a set of high-quality segmentation masks, denoted as m_i^SAM. We sort m_i^SAM in descending order based on their respective areas. For each SAM segmentation, we determine its label by cropping the corresponding semantic map S and selecting the label with the highest frequency. This process is repeated for subsequent SAM segmentation masks, with labels from smaller segmentation masks replacing previous ones.
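The three post-processing steps can be summarized in the following sketch. Shapes, the λ value, and the helper names are assumptions for illustration, and the LLM query itself is abstracted as an external `ask_llm` callable returning a set of suspicious labels.

```python
import numpy as np

def small_stuff_reweight(P, mask_areas, lam=0.5):
    """p' = (1 - lam * r_area / Q) * p for each proposal layer p in P of shape (Q, K)."""
    Q = P.shape[0]
    ranks = np.argsort(np.argsort(mask_areas))        # ascending area rank, 0-based
    return (1.0 - lam * ranks / Q)[:, None] * P

def llm_label_filter(ask_llm, label_set, repeats=3):
    """Query the LLM several times and keep only labels flagged every time
    (the consensus set U); the corresponding layers of S are then penalized."""
    answers = [set(ask_llm(label_set)) for _ in range(repeats)]
    return set.intersection(*answers)

def sam_refine(seg_map, sam_masks):
    """Relabel each SAM mask with the most frequent label it covers,
    processing masks from largest to smallest so smaller masks overwrite."""
    refined = seg_map.copy()
    for i in sorted(range(len(sam_masks)),
                    key=lambda i: sam_masks[i].sum(), reverse=True):
        m = sam_masks[i]
        if not m.any():
            continue
        labels, counts = np.unique(seg_map[m], return_counts=True)
        refined[m] = labels[np.argmax(counts)]        # majority vote inside the mask
    return refined
```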

5 Experiment Results

5.1 Dataset Refinement

We selected the ADE20k-847 dataset from the four datasets, as it has the largest semantic vocabulary and the greatest semantic disparity from the COCO training data [34]. This dataset was used for a preliminary dataset refinement to investigate how much the performance of the same model differs before and after (partial) annotation correction, which further illustrates the impact of the dataset issue on OVS benchmark performance. We employed the methodology described in Sect. 4 to refine a portion of the labels and showcase some of the corrected results in Table 3. The modified dataset is denoted as ADE20k-847*. We compared the performance of the base model and our model on both ADE20k-847 and ADE20k-847*, as shown in Table 4. Both the base model and the large model, with unchanged weights, achieved performance gains on this simply corrected dataset. This observation suggests that partial annotations in the dataset do have a certain impact on our assessment of the model's capabilities on the OVS benchmark.

Table 2. Comparison of our framework’s base and large version with prior works. † denotes the performance we achieved by reproducing the results with official code. Method

VLM

Training Data

GroupViT rand. init.

ADE-847 PC-459 ADE-150 PC-59

CC12M+YFCC -

-

-

22.4

ODISE

SD-v1.3+CLIP-L COCO

11.1

14.5

29.9

57.3

LSeg+ OpenSeg

ALIGN EN-B7 ALIGN EN-B7

COCO COCO+Narr.

3.8 6.3

7.8 9.0

18.0 21.1

46.5 42.1

SimSeg OVSeg SAN SAN† Ours

CLIP CLIP CLIP CLIP CLIP

ViT-B/16 ViT-B/16 ViT-B/16 ViT-B/16 ViT-B/16

COCO COCO COCO COCO -

7.0 7.1 10.1 10.2 11.1

8.7 11.0 12.6 16.7 17.7

20.5 24.8 27.5 27.5 29.1

47.7 53.3 53.8 54.1 55.9

MaskCLIP SimSeg† OVSeg SAN SAN† Ours

CLIP CLIP CLIP CLIP CLIP CLIP

ViT-L/14 ViT-L/14 ViT-L/14 ViT-L/14 ViT-L/14 ViT-L/14

COCO COCO COCO COCO COCO -

8.2 7.1 9.0 12.4 12.8 14.2

10.0 10.2 12.4 15.7 20.8 22.6

23.7 21.7 29.6 32.1 31.9 33.7

45.9 52.2 55.7 57.7 57.5 59.3

5.2 Our Framework

We validated our method on the four datasets listed in Sect. 3, which are widely used in OVS. For optimal performance, we employed the SAN model [34] as the base model in our framework, specifically as our VL-seg module. Since the SAN model offers two versions based on different sizes of CLIP [22], SAN-base and SAN-large, we conducted experiments with both sizes. The performance of the two models on the OVS benchmark is shown in Table 2. Note that, for a fair comparison, all experiments in Table 2 were conducted using the original dataset settings without any of the corrections mentioned in this work. Our framework achieves performance improvements across all settings. We also conducted ablation experiments on the three sub-modules in our framework, as shown in Table 5. Additionally, we present qualitative cases of our framework in Fig. 3. It is worth noting that our framework is training-free, requiring no additional data for training. Besides, because of the powerful commonsense reasoning abilities of the LLM and the excellent pixel-level vision processing capabilities of SAM, our proposed method achieves significant improvement on the existing benchmark. This further validates the issues discussed in Sect. 3 and demonstrates the potential of combining VLM-based segmentation models, LLMs, and FVMs. We hope that our preliminary exploration will inspire further research on VLM+LLM+FVM models as illustrated in Fig. 1(c).

Table 3. Some cases of ADE20k-847 label correction (partial) results.

label | similar labels
bed | beds
stairs, steps | step, stair
drawing | painting, picture
dvd | dvds; cd; video; cds; videos
ticket counter | ticket office; ticket booth

Table 4. The performance difference of the same model before and after label correction. ADE-847* denotes the corrected ADE-847.

Method | ADE-847 | ADE-847*
SAN-B/16 | 10.1 | -
SAN-B/16† | 10.24 | 10.63
Ours-B/16 | 11.11 | 11.57
SAN-L/14 | 12.4 | -
SAN-L/14† | 12.83 | 13.26
Ours-L/14 | 14.32 | 14.18

Table 5. Ablation experiments on the three sub-modules in our framework.

Method | SSP | LLM | SAM | ADE-847 | PC-459 | ADE-150 | PC-59
SAN-B/16 | | | | 10.1 | 12.6 | 27.5 | 53.8
SAN-B/16† | | | | 10.2 | 16.7 | 27.5 | 54.1
Ours-B/16 | ✓ | | | 10.5 | 16.8 | 27.4 | 54.1
Ours-B/16 | | ✓ | | 10.3 | 16.8 | 27.6 | 54.3
Ours-B/16 | ✓ | ✓ | | 10.5 | 16.9 | 27.6 | 54.3
Ours-B/16 | ✓ | ✓ | ✓ | 11.1 | 17.7 | 29.1 | 55.9
SAN-L/14 | | | | 12.4 | 15.7 | 32.1 | 57.7
SAN-L/14† | | | | 12.8 | 20.8 | 31.9 | 57.5
Ours-L/14 | ✓ | | | 13.1 | 20.9 | 31.7 | 57.3
Ours-L/14 | | ✓ | | 12.9 | 21.4 | 32.0 | 57.7
Ours-L/14 | ✓ | ✓ | | 13.2 | 21.4 | 32.0 | 57.7
Ours-L/14 | ✓ | ✓ | ✓ | 14.2 | 22.6 | 33.7 | 59.3

Fig. 3. Cases for SAN-B/16 and our model.

6 Conclusion, Limitations and Future Work

This paper presents novel perspectives and initial attempts aimed at enhancing semantic understanding in computer vision. This work demonstrates some progress in this field, although it is evident that further analysis and refinement of the proposed solutions hold potential. In particular, the current analysis lacks a direct correlation between issue frequency and the mean Intersection over Union (mIoU) metric, and the conclusions drawn from this analysis may not be applicable to all OVS frameworks, particularly those that are not based on VLMs. Furthermore, the proposed approach can be considered preliminary. While this work has made initial corrections to ADE-847, more comprehensive refinements are reserved for future work and are not the primary focus of this work. Lastly, it is important to note that the combination of LLMs and FVMs does not necessarily have to result in a multi-stage, training-free inference framework.

To push the boundaries of the field further, we propose the following directions for future research. New benchmarks: develop benchmarks to evaluate semantic inclusion and foster open semantic understanding. Unified framework: design elegant integrated structures, combining VLMs, LLMs and FVMs, to accommodate various tasks. Semantic granularity: address the semantic granularity problem for accurate and efficient fine-grained semantic interpretation. Expanding problem scope: incorporate a diverse range of vision problems, including 3D, segmentation-grounding, fine-grained semantic understanding in vision, and free-form semantic understanding instead of open vocabulary.

Acknowledgements. This work was supported by the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (21XNLG28), National Natural Science Foundation of China (No. 62276268) and Kuaishou. We acknowledge the anonymous reviewers for their helpful comments.

References 1. Bucher, M., et al.: Zero-shot semantic segmentation. In: NeurIPS (2019) 2. Cen, J., et al.: Segment anything in 3D with NeRFs (2023) 3. Cha, J., et al.: Learning to generate text-grounded mask for open-world semantic segmentation from only image-text pairs. In: CVPR (2023) 4. Cho, S., et al.: CAT-Seg: cost aggregation for open-vocabulary semantic segmentation. CoRR (2023)


5. Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019) 6. Ding, J., et al.: Decoupling zero-shot semantic segmentation. In: CVPR (2022) 7. Ding, Z., et al.: Open-vocabulary panoptic segmentation with MaskCLIP. arXiv preprint arXiv:2208.08984 (2022) 8. Ghiasi, G., Gu, X., Cui, Y., Lin, T.Y.: Scaling open-vocabulary image segmentation with image-level labels. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision, ECCV 2022. LNCS, vol. 13696, pp. 540–557. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20059-5 31 9. Jia, C., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: ICML (2021) 10. Karazija, L., et al.: Diffusion models for zero-shot open-vocabulary segmentation. CoRR (2023) 11. Ke, L., et al.: Segment anything in high quality (2023) 12. Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023) 13. Li, B., et al.: Language-driven semantic segmentation. In: ICLR (2022) 14. Liang, F., et al.: Open-vocabulary semantic segmentation with mask-adapted clip. In: CVPR (2023) 15. Liu, H., et al.: Visual instruction tuning. arXiv preprint arXiv:2304.08485 (2023) 16. Liu, Q., Wen, Y., Han, J., Xu, C., Xu, H., Liang, X.: Open-world semantic segmentation via contrasting and clustering vision-language embedding. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision, ECCV 2022. LNCS, vol. 13680, pp. 275–292. Springer, Cham (2022). https://doi. org/10.1007/978-3-031-20044-1 16 17. L¨ uddecke, T., et al.: Image segmentation using text and image prompts. In: CVPR (2022) 18. Luo, H., et al.: SegCLIP: patch aggregation with learnable centers for openvocabulary semantic segmentation. In: ICML (2023) 19. Mottaghi, R., et al.: The role of context for object detection and semantic segmentation in the wild. In: CVPR (2014) 20. Mukhoti, J., et al.: Open vocabulary semantic segmentation with patch aligned contrastive learning. In: CVPR (2023) 21. OpenAI: GPT-4 technical report (2023) 22. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021) 23. Rao, Y., et al.: DenseCLIP: language-guided dense prediction with context-aware prompting. In: CVPR (2022) 24. Rombach, R., et al.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022) 25. Schuhmann, C., et al.: LAION-5B: an open large-scale dataset for training next generation image-text models. In: Advances in Neural Information Processing Systems (2022) 26. Touvron, H., et al.: LLaMA: open and efficient foundation language models (2023) 27. Wang, X., Zhang, X., Cao, Y., Wang, W., Shen, C., Huang, T.: SegGPT: segmenting everything in context. arXiv preprint arXiv:2304.03284 (2023) 28. Xian, Y., et al.: Semantic projection network for zero- and few-label semantic segmentation. In: CVPR (2019) 29. Xu, J., et al.: GroupViT: semantic segmentation emerges from text supervision. In: CVPR (2022) 30. Xu, J., et al.: Open-vocabulary panoptic segmentation with text-to-image diffusion models. In: CVPR (2023)


31. Xu, J., et al.: Learning open-vocabulary semantic segmentation models from natural language supervision. In: CVPR (2023) 32. Xu, M., et al.: A simple baseline for open-vocabulary semantic segmentation with pre-trained vision-language model. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision, ECCV 2022. LNCS, vol. 13689, pp. 736–753 . Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19818-2 42 33. Xu, M., et al.: A simple baseline for open-vocabulary semantic segmentation with pre-trained vision-language model. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision, ECCV 2022. LNCS, vol. 13689, pp. 736–753. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19818-2 42 34. Xu, M., et al.: Side adapter network for open-vocabulary semantic segmentation. In: CVPR (2023) 35. Zhang, H., et al.: A simple framework for open-vocabulary segmentation and detection. arXiv preprint arXiv:2303.08131 (2023) 36. Zhao, H., et al.: Open vocabulary scene parsing. In: ICCV (2017) 37. Zhao, W.X., et al.: A survey of large language models (2023) 38. Zhao, X., et al.: Fast segment anything (2023) 39. Zhong, Y., et al.: RegionCLIP: region-based language-image pretraining. In: CVPR (2022) 40. Zhou, B., et al.: Scene parsing through ADE20K dataset. In: CVPR (2017) 41. Zou, X., et al.: Generalized decoding for pixel, image, and language. In: CVPR (2023) 42. Zou, X., et al.: Segment everything everywhere all at once (2023)

Exploring a Distillation with Embedded Prompts for Object Detection in Adverse Environments

Hao Fu1, Long Ma1, Jinyuan Liu4, Xin Fan2,3, and Risheng Liu2,3(B)

1 School of Software Technology, Dalian University of Technology, Dalian, China
2 DUT-RU International School of Information Science and Engineering, Dalian University of Technology, Dalian, China
[email protected]
3 Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, China
4 School of Mechanical Engineering, Dalian University of Technology, Dalian, China

Abstract. Efficient and robust object detection in adverse environments is crucial and challenging for autonomous agents. The current mainstream approach is to use image enhancement or restoration as a means of image preprocessing to reduce the domain shift between adverse and regular scenes. However, these image-level methods cannot guide the model to capture the spatial and semantic information of object instances, resulting in only marginal performance improvements. To overcome this limitation, we explore a Prompts Embedded Distillation framework, called PED. Specifically, a spatial location prompt module is proposed to guide the model to learn easily missed target position information. Considering the correlation between object instances in the scene, a semantic mask prompt module is proposed to constrain the global attention between instances, making each aggregated instance feature more discriminative. Naturally, we propose a teacher model with embedded prompts and finally transfer the knowledge to the original student model through focal distillation. Extensive experimental results demonstrate the effectiveness and flexibility of our approach.

Keywords: Object detection · Distillation · Underwater object detection · Foggy object detection

1 Introduction

In recent years, with the development of deep learning and autonomous mobile agents, object detection has not only achieved satisfactory results [3,11,23] on conventional datasets but has also been widely applied in real-world scenarios, such as autonomous driving cars and autonomous robots performing various tasks. However, most existing detectors currently focus only on conventional scenarios, while ignoring the challenges that arise in common adverse scenarios such as underwater and foggy conditions. These challenges are crucial for achieving robust perception and making correct path planning and decisions for autonomous mobile agents in the real world.

This work is partially supported by the National Natural Science Foundation of China (Nos. U22B2052).

Due to the significant domain shift, noise, color bias, and other degradation factors present in input images, the performance of conventional detectors is often greatly compromised when transferred to adverse scenarios. In underwater and foggy conditions, the challenges of object detection under adverse conditions can be summarized as follows:

– Positional blur. Objects in adverse scenarios are often masked by noise from the background or other instances, which reduces the contrast between objects and the background or between objects themselves, causing blurred edge features and affecting the localization accuracy of subsequent object detection tasks, leading to missed detections.
– Feature degradation. Adverse scenarios involve complex degradation factors such as color distortion in underwater scenarios and low visibility in foggy scenarios. These factors directly damage the texture details of the target, leading to damaged and indistinguishable semantic features, which in turn affect the classification accuracy of subsequent object detection tasks, leading to false positive detections.

To overcome the aforementioned issues, one approach is to use existing image enhancement methods [5,7,14–16,20,21] as a preprocessing step for degraded images to reduce the adverse effects of specific adverse scenarios or degradation factors, thereby reducing the domain shift between the source domain (conventional scenarios) and the target domain (adverse scenarios). However, this introduces complex image restoration networks for object detection, and it is almost impossible for such networks to have restoration effects on all images in all adverse scenarios. Another state-of-the-art approach is to use differentiable image preprocessing [8,18], which introduces differentiable white balance, gamma correction, sharpening, and dehazing operations and enhances input images end-to-end under the guidance of the detection loss. This approach aims to mine more potential information from blurred objects and improve detection performance. However, this image-level approach cannot selectively restore or enhance features in target regions, leading to limited performance improvements for subsequent detection tasks.

To overcome these limitations, we propose an intuitive and efficient approach that converts as much label information as possible into guidance embeddings suitable for the model, to guide the model to mine potential information from instances in adverse scenarios while reducing the interference of background noise. We introduce a teacher model based on guidance embedding learning, called PED, which accurately locates targets and extracts discriminative potential information from them through the designed guidance embeddings. The knowledge learned by the teacher model is then transferred to the student model through focal distillation. To address the second challenge, we propose a semantic mask guidance module.

Since the feature degradation of a single target instance may not support subsequent accurate classification, we aim to establish relationships between target instances through attention mechanisms. By aggregating the features of strongly related instances, we can enhance the distinguishability of the semantic features of individual instances and increase the contrast with the surrounding background. We first use a multi-head self-attention mechanism to capture long-range dependencies between tokens and obtain an attention map. Then, we generate our semantic mask guidance embedding using the label information of object detection. This embedding ensures that the attention of background tokens to all other tokens and the attention of target tokens to background tokens are both 0, constraining global attention interaction to target tokens. This approach enriches the semantic features of each target token after aggregation. Afterwards, we embed the two proposed guidance modules into the detector to obtain our teacher model. Through focal distillation, we transfer the target potential information that the teacher model has mined from the guidance embeddings to the student model, which is beneficial for subsequent object detection tasks.

In summary, our main contributions can be summarized as follows:

– We explore spatial and semantic guidance embeddings for object detection under adverse conditions and propose a teacher network based on guidance embeddings. The knowledge mined from guidance embeddings is transferred to the student model through focal distillation.
– We propose a spatial localization guidance module and a semantic mask guidance module to address the challenges of target position blurring and feature degradation in adverse environments, respectively.
– We conduct extensive quantitative and qualitative experiments on real underwater (URPC) and foggy (RTTS) datasets, demonstrating the effectiveness and flexibility of our approach.

2 Related Work

2.1 Object Detection

Object detection, as a fundamental task in the field of computer vision, has received widespread attention in recent years due to its applications in autonomous mobile agents. Currently, object detection methods can be roughly divided into two categories: two-stage detectors and one-stage detectors. However, the performance of both types of detectors is severely degraded under adverse conditions.

2.2 Enhancement for Detection

A direct method for object detection under adverse conditions is to use environment-specific enhancers [4,5,10,14,17,20,21,25] as a means of preprocessing degraded images, and then train and evaluate the detector on the enhanced images. Although many enhancement methods have achieved promising results in terms of visual quality and quantitative evaluation, only a few of them bring slight improvements to downstream object detection tasks, while most enhancement methods lead to a decrease in detection performance.

In underwater environments, UWNet [21] proposes a compact convolutional neural network that maintains superior performance while saving computational resources. FGAN [7] uses conditional generative adversarial networks to achieve robust enhancement on paired or unpaired datasets and demonstrates its performance improvement on downstream underwater object detection and human pose estimation tasks. USUIR [5] constructs an unsupervised underwater image restoration method by exploiting the homogeneity between the original underwater image and the degraded image. TACL [16] proposes a twin adversarial contrastive learning method that achieves both visual-friendly and task-oriented enhancement.

In foggy environments, GridDehazeNet [19] considers the spatial structure of the image by dividing it into multiple grids and performing multi-scale feature extraction on each grid, thus preserving more image details while dehazing. MSBDN [4] proposes a bidirectional feature propagation strategy based on multi-scale feature extraction, which better utilizes the global and local context information of the hazy image. GCANet [1] proposes an inverse residual structure to prevent gradient vanishing and combines global contextual information with local feature information to improve the quality of image dehazing. FFANet [22] uses attention mechanisms to improve the fusion of multi-scale features, emphasizing information-rich features while suppressing irrelevant features that interfere with image dehazing.

2.3 Differentiable Image Processing

Another popular approach is to design traditional filters as differentiable image processing (DIP) modules, whose hyperparameters are predicted by a small convolutional neural network guided by downstream detection losses, achieving task-specific image enhancement in an end-to-end manner. IA-YOLO [18] proposes a way of concatenating stacked DIP modules. GDIP performs various differentiable image processing operations in parallel and enhances images through a weighted combination of these concurrent operations. MGDIP [8] processes the degraded image multiple times with GDIP to obtain enhanced images more suitable for downstream detection tasks. IA-YOLO and GDIP have demonstrated the superiority of this DIP-based approach over existing domain adaptation methods, such as MS-DAYOLO [6], and multi-task learning methods, such as DSNet, for object detection in adverse scenarios.

3 Method

To endow existing general detectors with the ability to accurately perceive target locations and fully exploit target semantic information in adverse conditions, we propose a spatial location prompt module and a semantic mask prompt module. We also adopt a simplified focal distillation method to transfer knowledge learned by the teacher model to the student model, as shown in Fig. 1. We take RetinaNet as an example to explain our method in detail.

Fig. 1. The overall structure of our proposed PED. It mainly consists of a teacher model embedded with spatial location prompt and semantic mask prompt, and a naive student model. The teacher model transfers spatial and semantic knowledge to the student model through focal distillation.

3.1 Spatial Location Prompt

To address the problem of missing targets caused by feature loss, we propose a spatial location prompt module, called SLP, to guide the feature extraction network to mine the potential information of all targets in space by embedding target position information, even if some targets' features have been heavily degraded. First, we generate a corresponding spatial location map (SLM) for each image based on the label information of the object detection:

SLM(i, j) = { 1, if (i, j) ∈ BBoxes
              0, otherwise                (1)

where BBoxes represents the bounding boxes, and i and j represent the horizontal and vertical coordinates of the feature map, respectively. If the feature point is inside a bounding box, then SLM(i, j) = 1; otherwise it is 0. To embed the SLM as a prompt into the teacher network, guiding it to extract the potential features of all targets in adverse scenarios during training, we propose a simple spatial positioning encoder (SPE), which consists of a convolutional layer with an input channel of 1, an output channel of 512, a stride of 1, and a convolution kernel size of 3 × 3. Using the SPE, we map the SLM with a channel number of 1 to a spatial location prompt embedding with a channel number of 512.

Next, we aggregate the obtained spatial location prompt embedding with the C3 feature layer of the teacher network and transmit it to the designed spatial enhancement module (SEM) for refining the target texture details and position features. Specifically, the SEM consists of two convolutional layers with a stride of 1 and a convolution kernel size of 3 × 3, interconnected with BatchNorm and ReLU activation layers in the middle. The overall structure of the SLP is as follows:

C3′ = SEM(SPE(SLM) + C3)    (2)
P3′ = P3 + C3′    (3)

Finally, the C3′ features, refined based on spatial location information as a prompt, are aggregated with the P3 feature in the feature pyramid of the teacher network and transmitted to the detection head for end-to-end training (Fig. 2).
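A compact sketch of the SLP pipeline (Eqs. 1–3) is given below. The 512-channel C3/P3 shapes and the example box are assumptions for illustration, not the exact RetinaNet configuration.

```python
import torch
import torch.nn as nn

def make_slm(bboxes, h, w):
    """Spatial location map: 1 inside any GT box, 0 elsewhere (Eq. 1)."""
    slm = torch.zeros(1, 1, h, w)
    for x1, y1, x2, y2 in bboxes:          # box coordinates at feature-map resolution
        slm[..., y1:y2, x1:x2] = 1.0
    return slm

spe = nn.Conv2d(1, 512, kernel_size=3, stride=1, padding=1)            # SPE: 1 -> 512
sem = nn.Sequential(                                                    # SEM: two 3x3 convs
    nn.Conv2d(512, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(),
    nn.Conv2d(512, 512, 3, 1, 1),
)

c3 = torch.randn(1, 512, 56, 56)           # teacher C3 features (assumed shape)
p3 = torch.randn(1, 512, 56, 56)           # teacher P3 features (assumed shape)
slm = make_slm([(10, 10, 30, 40)], 56, 56)

c3_refined = sem(spe(slm) + c3)            # Eq. (2)
p3_prompted = p3 + c3_refined              # Eq. (3), fed to the detection head
```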

Fig. 2. The overall architecture of our proposed SMP. It uses the prompt mask to constrain the global self-attention between instance tokens, capturing the relationship between target instances. In the binary mask, white represents the presence of an object, while black represents the background. In the prompt mask, white areas represent values of 0, while black areas represent values of 1.

3.2 Semantic Mask Prompt

In adverse scenarios, the features of target instances are often subject to various complex degradation factors. Therefore, focusing only on the features obtained from the current target instance itself during feature extraction is often insufficient to support accurate regression and classification by the subsequent detection head, resulting in a decrease in detection accuracy. There is often complementary semantic information between multiple instances in the same image, and establishing relationships between multiple instances can be used to improve the performance of object detection.

Here we propose a self-attention module based on a semantic mask prompt, which aims to constrain the attention map in the multi-head self-attention module through the target mask generated from the labels, and to accurately establish the relationships between instances, so that the aggregated instance features carry richer and more discriminative semantic information. Specifically, the way we obtain the target mask from the label information is the same as for the SLM in the previous section. We use an embedding operation to convert the 2D feature map P3 into a 1D token sequence. Then, we use a fully connected layer to map each token to its corresponding Q, K, and V feature vectors, and compute Q × Kᵀ / scale to obtain the attention matrix between tokens, where scale = √dk. Next, we reshape mask ∈ [B, H, W] into mask1 ∈ [B, N] and mask2 ∈ [B, 1, N]. Then, we iterate over mask1: when mask1[i, j] = 0, Mask[i, j, :] = 0, and when mask1[i, j] = 1, Mask[i, j, :] = mask2[i, 0, :]. Finally, we obtain the attention matrix based on the mask prompt by computing Attn × Mask, which constrains the attention of instance tokens to other instance tokens rather than aggregating features from background tokens, thereby reducing interference from background noise and aggregating strong semantic features from other instance tokens for the detection task.
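The masked attention described above reduces to an outer-product mask applied to the token-to-token attention matrix. The sketch below assumes single-head attention with flattened (B, N, C) tokens, and leaves out the attention normalization, which the text does not specify.

```python
import torch

def semantic_mask_attention(q, k, v, mask, d_k):
    """q, k, v: (B, N, C) token features; mask: (B, N) binary target mask."""
    attn = q @ k.transpose(-2, -1) / d_k ** 0.5        # Attn = Q K^T / scale

    # Prompt mask: background rows are zeroed, and instance rows keep only
    # the columns of instance tokens (Mask[i, j, :] = mask2[i, 0, :] when
    # mask1[i, j] = 1, else 0) -- i.e. an outer product of the binary mask.
    mask1 = mask.unsqueeze(-1)                          # (B, N, 1)
    mask2 = mask.unsqueeze(1)                           # (B, 1, N)
    prompt_mask = mask1 * mask2                         # (B, N, N)

    attn = attn * prompt_mask                           # Attn x Mask
    # Normalization (e.g. softmax over valid tokens) is omitted in this sketch.
    return attn @ v                                     # aggregated instance tokens
```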

3.3 Focal Distillation

Considering that both the spatial location prompt module and the semantic mask prompt module use detection label information that is not available during testing, we first embed the two prompt modules into the detector to obtain our teacher model. We then use a simplified version of knowledge distillation [24] to transfer the learned spatial positioning knowledge of targets and the highly discriminative semantic features to the student model. Specifically, for the student model, in order to better utilize the knowledge transferred from the teacher model, we introduce a spatial location adaptation (SLA) layer and a global semantic aggregation (GSA) layer. The SLA contains only one convolution operation with a stride of 1 and a convolution kernel of 3 × 3; its impact on the computation and parameters of the student model is minimal. The GSA consists of a multi-head self-attention module with a depth of 1 and 4 heads. The outputs of the two modules are aggregated with P3 by element-wise addition and fed into the detection head for end-to-end training. The overall structure is very simple and efficient. The focal distillation loss is as follows:

L_slp = α · Σ (SLA(C3^s) · SLM − SLP(C3^t) · SLM)²    (4)
L_smp = β · Σ (GSA(P3^s) · SLM − SMP(P3^t) · SLM)²    (5)

where α and β are hyperparameters, set to 1 and 0.5, respectively. C3^s and P3^s are the feature layers of the student model, and C3^t and P3^t are the corresponding feature layers of the teacher model. Note that during the testing phase, the teacher model is completely removed and only the student model is retained, along with the two simple and efficient adaptive layers specifically designed for the student model.
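Under the reconstruction of Eqs. (4)–(5) above (masked squared differences), the distillation loss can be sketched as follows; the tensor names and the masking of both terms are assumptions for illustration.

```python
import torch

def focal_distill_loss(sla_c3s, slp_c3t, gsa_p3s, smp_p3t, slm, alpha=1.0, beta=0.5):
    """sla_c3s/gsa_p3s: adapted student features; slp_c3t/smp_p3t: the teacher's
    prompt-refined features; slm: (B, 1, H, W) binary spatial location map that
    restricts the loss to ground-truth box regions (the "focal" part)."""
    l_slp = alpha * ((sla_c3s * slm - slp_c3t * slm) ** 2).sum()
    l_smp = beta * ((gsa_p3s * slm - smp_p3t * slm) ** 2).sum()
    return l_slp + l_smp

# toy usage with random features and a random box mask
feat = lambda: torch.randn(2, 512, 56, 56)
slm = (torch.rand(2, 1, 56, 56) > 0.7).float()
loss = focal_distill_loss(feat(), feat(), feat(), feat(), slm)
```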

3.4 Overall Loss

In summary, the total loss used to train the student model is as follows:

L_total = L_cls + L_reg + L_slp + L_smp    (6)

where L_cls and L_reg are the original classification loss and regression loss of the detector, respectively.

4 Experiments

All experiments are conducted using MMDetection [2]. During the training phase, all images are scaled to 448 × 448, with a batch size of 4. An AdamW optimizer with a weight decay of 5 × 10−2 is employed, and models are trained for 12 epochs. The initial learning rate is set to 1 × 10−4 and decreased by a factor of 10 at epochs 8 and 11. We use PyTorch for the experiments and run them on one 4090 GPU. All results are evaluated using the standard COCO Average Precision (AP) metrics [12].

Table 1. Comparing detection accuracy among our PED and other compared methods on the URPC2020 dataset. The best results are bolded.

Method | Holothurian | Echinus | Scallop | Starfish | mAP | AP50 | AP75 | mAR | ARs | ARm | ARl
Baseline | 30.1 | 39.5 | 27.0 | 41.2 | 34.4 | 68.5 | 30.7 | 47.1 | 27.0 | 46.2 | 48.5
Enhancer: UWNet | 30.1 | 40.1 | 27.6 | 41.2 | 34.8 | 68.7 | 30.7 | 47.6 | 27.9 | 46.9 | 48.9
Enhancer: FGAN | 28.2 | 38.5 | 25.5 | 38.2 | 32.6 | 66.0 | 27.7 | 45.2 | 26.1 | 44.6 | 46.3
Enhancer: USUIR | 29.1 | 38.3 | 26.7 | 39.9 | 33.5 | 67.4 | 28.9 | 46.2 | 25.5 | 45.2 | 47.8
Enhancer: TACL | 28.0 | 37.5 | 26.1 | 38.5 | 32.5 | 65.7 | 27.9 | 45.5 | 23.6 | 43.9 | 47.4
Adverse: IA | 28.0 | 38.7 | 25.5 | 39.7 | 33.0 | 66.5 | 28.1 | 45.8 | 23.4 | 44.9 | 47.5
Adverse: GDIP | 28.8 | 38.7 | 25.3 | 40.7 | 33.4 | 67.1 | 28.6 | 46.4 | 25.4 | 46.0 | 47.7
Adverse: MGDIP | 29.8 | 39.0 | 25.5 | 40.4 | 33.7 | 67.3 | 29.8 | 46.6 | 27.8 | 45.7 | 47.9
Ours | 32.1 | 41.8 | 29.7 | 43.6 | 36.8 | 71.4 | 33.1 | 49.7 | 29.1 | 48.9 | 51.0

4.1 Underwater Scenes

In this section, we demonstrate the performance of our proposed method on the real underwater dataset URPC2020.

Implementation Details. The URPC2020 dataset [13] consists of images taken in real underwater environments and includes a total of 6575 degraded images with annotations for four marine organisms: sea cucumbers, sea urchins, starfish, and scallops. For training our model on the URPC2020 dataset, we randomly selected 4602 images as our training set and used the remaining images as our test set. We used the classic RetinaNet detector as our baseline and embedded SLP and SMP into RetinaNet to obtain our teacher model, called PET-RetinaNet, while the student model is RetinaNet. In the compared enhancement-based methods, the enhancement model is cascaded in front of the detector as a means of image restoration.

Comparisons with State-of-the-Arts. Table 1 presents the quantitative results of our experiments. It is evident that our proposed method outperforms other methods in terms of both AP and AR. Specifically, our method achieves the best performance in terms of mAP, AP50, and mAR, with improvements of 2.0%, 2.7%, and 2.1%, respectively, compared to the second-best model. Moreover, our method also shows a significant improvement in detecting small objects, which is a more challenging task: the ARs of our method is 1.2% higher than that of the second-best model (Fig. 3).

Fig. 3. Visual comparison with state-of-the-art methods on the URPC2020 dataset.

4.2 Haze Scenes

In this section, we demonstrate the effectiveness of our method on the real-world foggy dataset RTTS, and also verify the flexibility of our method, which can adapt to various adverse environments.

Implementation Details. The RTTS dataset [9] consists of 4322 images captured under real foggy conditions; it contains various degradation factors of real foggy scenes and is very challenging. It mainly annotates five categories of objects: person, car, bus, bicycle, and motorcycle.

Table 2. Comparing detection accuracy among our PED and other compared methods on the RTTS dataset. The best results are bolded.

Method | Bicycle | Bus | Car | Motorbike | Person | mAP | AP50 | AP75 | mAR | ARs | ARm | ARl
Baseline | 30.4 | 29.0 | 37.6 | 21.8 | 35.8 | 30.9 | 54.5 | 31.8 | 42.2 | 22.3 | 50.3 | 63.5
Enhancer: GridDehaze | 29.7 | 28.8 | 37.6 | 22.0 | 35.8 | 30.8 | 54.3 | 30.1 | 41.6 | 21.1 | 50.6 | 62.9
Enhancer: MSBDN | 30.5 | 29.0 | 37.8 | 22.0 | 35.4 | 31.0 | 53.9 | 31.4 | 41.6 | 21.0 | 50.5 | 62.8
Enhancer: GCANet | 32.6 | 28.1 | 36.3 | 21.2 | 35.0 | 30.7 | 54.3 | 30.5 | 41.2 | 21.0 | 49.5 | 63.0
Enhancer: FFANet | 32.1 | 27.9 | 38.2 | 21.4 | 35.9 | 31.1 | 54.6 | 31.3 | 41.3 | 21.7 | 49.7 | 62.0
Adverse: IA | 28.2 | 26.8 | 37.1 | 19.4 | 35.7 | 29.5 | 52.0 | 28.7 | 41.0 | 20.5 | 49.0 | 62.8
Adverse: GDIP | 28.3 | 27.5 | 37.4 | 19.4 | 34.7 | 29.4 | 52.7 | 29.2 | 40.7 | 20.2 | 49.9 | 60.8
Adverse: MGDIP | 30.5 | 27.0 | 37.0 | 20.5 | 34.7 | 29.9 | 53.2 | 29.9 | 40.7 | 19.7 | 49.1 | 62.8
Ours | 31.0 | 29.7 | 39.1 | 22.8 | 37.3 | 32.0 | 56.6 | 32.5 | 43.0 | 24.0 | 50.8 | 63.5

When using the RTTS dataset, we randomly selected 3889 images as our training set, and the remaining images were used for testing. Similar to the comparative experiments in underwater scenes, we used RetinaNet as our baseline and student model, and embedded the SLP and SMP modules into the RetinaNet model to obtain our teacher model.

Comparisons with State-of-the-Arts. Table 2 provides the relevant quantitative results. It can be seen that our method outperforms previous models in all indicators except for bicycle, especially in the person category. In addition, compared to the second-ranked method, our method improves mAP and mAR by 0.9% and 0.8%, respectively, which also demonstrates the effectiveness of our method in adverse foggy environments.

4.3 Ablation Study

Table 3. An ablation study of two prompt modules (SLP and SMP) and an additional multi-head self-attention module (MHSA).

Model | SLP | MHSA | SMP | mAP | AP50 | AP75 | APs | APm | APl | mAR | ARs | ARm | ARl
M1 | | | | 34.4 | 68.5 | 30.7 | 10.9 | 33.5 | 36.6 | 47.1 | 27.0 | 46.2 | 48.5
M2 | ✓ | | | 36.2 | 70.6 | 32.6 | 12.6 | 34.7 | 38.8 | 49.1 | 28.3 | 48.1 | 50.6
M3 | | ✓ | | 35.7 | 70.4 | 31.5 | 12.0 | 34.7 | 37.9 | 48.4 | 28.8 | 47.9 | 49.4
M4 | | ✓ | ✓ | 36.3 | 71.0 | 32.9 | 12.7 | 35.1 | 38.4 | 49.0 | 27.9 | 48.5 | 50.2
M5 | ✓ | ✓ | ✓ | 36.8 | 71.4 | 33.1 | 11.8 | 35.5 | 39.1 | 49.7 | 29.1 | 48.9 | 51.0

To analyze the effectiveness of each proposed component, we conducted a series of ablation experiments on the underwater dataset URPC. M1 serves as the baseline RetinaNet, and the teacher model is PET-RetinaNet. The specific objective results are shown in Table 3. The results of M2 demonstrate the importance of localization information for object detection under harsh conditions: simply embedding SLP into the teacher model and distilling it improves the final mAP by 1.8%, which also demonstrates the efficiency of SLP. M3 demonstrates that global interaction through attention mechanisms can indeed improve detection performance. Moreover, comparing the results of M3 and M4 shows that constraining this interaction as much as possible to instance tokens can better guide the model to establish interaction relationships between instances, thereby extracting more discriminative semantic features. Compared with the other configurations, our final model achieves the best performance, which also demonstrates the importance of mining the position information of targets and discriminative semantic features for object detection under harsh conditions.

References 1. Chen, D., et al.: Gated context aggregation network for image dehazing and deraining. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1375–1383. IEEE (2019) 2. Chen, K., et al.: MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019) 3. Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., Sun, J.: You only look onelevel feature. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13039–13048 (2021) 4. Dong, H., et al.: Multi-scale boosted dehazing network with dense feature fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2157–2167 (2020) 5. Fu, Z., et al.: Unsupervised underwater image restoration: from a homology perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 643–651 (2022) 6. Hnewa, M., Radha, H.: Multiscale domain adaptive yolo for cross-domain object detection. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 3323–3327. IEEE (2021) 7. Islam, M.J., Xia, Y., Sattar, J.: Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 5(2), 3227–3234 (2020) 8. Kalwar, S., Patel, D., Aanegola, A., Konda, K.R., Garg, S., Krishna, K.M.: GDIP: gated differentiable image processing for object-detection in adverse conditions. arXiv preprint arXiv:2209.14922 (2022) 9. Li, B., et al.: Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 28(1), 492–505 (2018) 10. Lin, R., Liu, J., Liu, R., Fan, X.: Global structure-guided learning framework for underwater image enhancement. Vis. Comput. 1–16 (2021) 11. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) 12. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-31910602-1 48


13. Liu, C., et al.: A dataset and benchmark of underwater object detection for robot picking. In: 2021 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 1–6. IEEE (2021) 14. Liu, J., et al.: Target-aware dual adversarial learning and a multi-scenario multimodality benchmark to fuse infrared and visible for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802–5811 (2022) 15. Liu, J., Wu, G., Luan, J., Jiang, Z., Liu, R., Fan, X.: Holoco: holistic and local contrastive learning network for multi-exposure image fusion. Inf. Fusion 95, 237– 249 (2023) 16. Liu, R., Jiang, Z., Yang, S., Fan, X.: Twin adversarial contrastive learning for underwater image enhancement and beyond. IEEE Trans. Image Process. 31, 4922– 4936 (2022) 17. Liu, R., Li, S., Liu, J., Ma, L., Fan, X., Luo, Z.: Learning Hadamard-productpropagation for image dehazing and beyond. IEEE Trans. Circuits Syst. Video Technol. 31(4), 1366–1379 (2020) 18. Liu, W., Ren, G., Yu, R., Guo, S., Zhu, J., Zhang, L.: Image-adaptive yolo for object detection in adverse weather conditions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 1792–1800 (2022) 19. Liu, X., Ma, Y., Shi, Z., Chen, J.: Griddehazenet: attention-based multi-scale network for image dehazing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7314–7323 (2019) 20. Ma, L., Ma, T., Liu, R., Fan, X., Luo, Z.: Toward fast, flexible, and robust low-light image enhancement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5637–5646 (2022) 21. Naik, A., Swarnakar, A., Mittal, K.: Shallow-UWnet: compressed model for underwater image enhancement (student abstract). In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 15853–15854 (2021) 22. Qin, X., Wang, Z., Bai, Y., Xie, X., Jia, H.: FFA-net: feature fusion attention network for single image dehazing. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11908–11915 (2020) 23. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015) 24. Yang, Z., et al.: Focal and global knowledge distillation for detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4643–4652 (2022) 25. Zhang, Z., Jiang, Z., Liu, J., Fan, X., Liu, R.: Waterflow: heuristic normalizing flow for underwater image enhancement and beyond. ACM MM (2023)

TEFNet: Target-Aware Enhanced Fusion Network for RGB-T Tracking

Panfeng Chen1, Shengrong Gong2(B), Wenhao Ying2, Xin Du3, and Shan Zhong2(B)

1 School of Information Engineering, Huzhou University, Huzhou 313000, China
2 School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou 215500, China
[email protected], [email protected]
3 School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China

Abstract. RGB-T tracking leverages the fusion of visible (RGB) and thermal (T) modalities to achieve more robust object tracking. Existing popular RGB-T trackers often fail to fully leverage background information and complementary information from different modalities. To address these issues, we propose the target-aware enhanced fusion network (TEFNet). TEFNet concatenates the features of the template and search regions from each modality and then utilizes self-attention operations to enhance the single-modality features for the target by discriminating it from the background. Additionally, a background elimination module is introduced to reduce the background regions. To further fuse the complementary information across different modalities, a dual-layer fusion module based on channel attention, self-attention, and bidirectional cross-attention is constructed. This module diminishes the feature information of the inferior modality and amplifies the feature information of the dominant modality, effectively eliminating the adverse effects caused by modality differences. Experimental results on the LasHeR and VTUAV datasets demonstrate that our method outperforms other representative RGB-T tracking approaches, with significant improvements of 6.6% and 7.1% in MPR and MSR on the VTUAV dataset, respectively.

Keywords: RGB-T tracking · Background elimination · Complementary information

1 Introduction

RGB-T tracking combines visible (RGB) and thermal (T) images for object tracking and serves as a cross-modal object tracking technique. The RGB image provides rich color and texture information about the target but is sensitive to lighting conditions. The T image, in contrast, is insensitive to lighting conditions because it reflects the temperature distribution on the surface of the object, but it fails to capture detailed texture information [1], as shown in Fig. 1(a). Therefore, the precision and robustness of visual object tracking can be improved by leveraging the complementary characteristics of both modalities.


Fig. 1. (a) Comparison of RGB and T images. Top row is the RGB, bottom row is the T. (b) Comparison of tracking success and speed on the LasHeR dataset for different methods.

The extraction of single-modal features significantly affects the fusion of the two modalities in RGB-T tracking. Consequently, many works have attempted to extract more accurate single-modal features. An efficient approach is to extract single-modal features for both the template and the search region and then estimate the similarity between them by cross-correlation [3-5]. Guo et al. [4] enhanced the features extracted from the two modalities using channel attention and spatial attention modules and obtained the final results through cross-correlation operations, ensuring both performance and computational efficiency. Zhang et al. [5] designed a complementarity-aware multi-modal feature fusion module to enhance the discriminability of cross-correlation features by reducing the modality differences between single-modal features. While cross-correlation is efficient because it is a linear operation, it also results in the loss of semantic information. To tackle this issue, several studies introduced Transformers as an alternative approach for similarity measurement [6, 7], effectively addressing the problem of information loss. Feng et al. [6] first fused the extracted features of the two modalities by concatenation and then utilized cross-attention to measure the similarity between the template and search regions. Zhu et al. [7] proposed a progressive fusion Transformer to integrate the features of both modalities and employed an encoder-decoder architecture for similarity measurement between the template and search regions. However, feature extraction and similarity measurement in the aforementioned methods are conducted in isolation, resulting in a lack of interaction between the template and search regions during feature extraction. This absence of interaction hinders the representation of discriminative features when the target undergoes significant deformation. Moreover, these approaches do not consider the utilization of background information. To address these issues, we propose TEFNet. Inspired by the work in [8], we concatenate the template and search regions of the two modalities and feed them into an encoder, where self-attention enables interaction between the template and search regions, thereby better extracting discriminative features for the target. Furthermore, we introduce a background elimination module in the encoder to further mitigate the impact of background regions.


In addition to obtaining reliable features, the complementary fusion of the two modalities is also crucial for RGB-T tracking. Some cross-modal fusion methods employed concatenation [9-11], element-wise addition [12, 13], or attention [14, 15] strategies to fuse the features of the two modalities. Mei et al. [14] utilized self-attention and cross-attention to achieve cross-modal global exploration and integrated global information through channel attention; finally, the information of the two modalities was fused through a concatenation operation. Zhang et al. [16] combined the aforementioned strategies and proposed a combined approach using pixel-level, feature-level, and decision-level fusion to further enhance tracking performance. Other methods proposed attribute-based feature decoupling fusion [17-19]. Xiao et al. [19] proposed an attribute-based progressive fusion network that adaptively aggregates all attribute-related fusion features using self-attention and cross-attention mechanisms. However, the aforementioned methods do not consider the impact of modality differences and cannot fully utilize complementary information, thereby hindering the feature representation of specific modalities in the fusion results. To address these issues, we propose a dual-layer fusion module. Firstly, this module uses a channel attention mechanism to compute reliable weights for the features of the different modalities; the weighted features are then summed to obtain an initial fusion result. Subsequently, self-attention and bidirectional cross-attention operations are performed on the initial fusion result and the features of the two modalities, respectively, to enhance the feature representation in the fusion result. The experimental results demonstrate the effectiveness of the proposed method. As shown in Fig. 1(b), the success rate and speed of TEFNet are compared with other RGB-T tracking algorithms on the LasHeR dataset. The main contributions of our work can be summarized as follows:
• We propose an effective cross-modal tracking framework. This framework takes the concatenation of the two modalities as input to an encoder with a background elimination module. Self-attention is employed to accurately match the target and suppress background interference, enhancing awareness of the target.
• We design a dual-layer fusion module, which uses the preliminary fusion result as an intermediary for the interaction between the two modalities. This module effectively mitigates the influence of modality differences and fully leverages complementary information.
• Extensive experimental results demonstrate that our method surpasses representative approaches on the LasHeR [25] and VTUAV [16] datasets.

2 Methods

2.1 Overview

This section provides a detailed description of our proposed TEFNet. It consists of a target-aware feature extraction network, a dual-layer fusion module, and a target classification and regression network. The detailed architecture is depicted in Fig. 2. Firstly, we employ the patch embedding method to extract features from both RGB and T images, which are subsequently used as inputs to the feature extraction network. Secondly, the inputs undergo extraction and interaction using an encoder that incorporates


a background elimination module. Thirdly, a dual-layer fusion module is employed to synergistically merge the extracted features from the search regions. Finally, the target is predicted using a classification and regression network.

Fig. 2. The framework of TEFNet.

2.2 Target-Aware Feature Extraction Network

In the proposed method, we utilize the Vision Transformer (ViT) [20] for feature extraction and similarity measurement between different modalities. Firstly, the template and search images of each modality are cropped into patch embeddings of the same size. These embeddings are concatenated along the same dimension and then fed into the encoder, where self-attention is employed to facilitate intra-modality information interaction. The specific process is as follows. The first frame of the video is selected as the template image, while the subsequent frames are used as search images. By center-cropping around the target, we obtain the template $Z \in \mathbb{R}^{3 \times H_z \times W_z}$ and the search region $X \in \mathbb{R}^{3 \times H_x \times W_x}$. Next, the template image $Z$ and search image $X$ are split into patches. Assuming each patch is of size $P \times P$, the numbers of patches in the template and search regions are $N_z = H_z W_z / P^2$ and $N_x = H_x W_x / P^2$, respectively, and the patch sequences can be written as $Z \in \mathbb{R}^{N_z \times (3P^2)}$ and $X \in \mathbb{R}^{N_x \times (3P^2)}$. Afterward, the patch sequences of the template and the search region are projected into a $D$-dimensional space by a linear transformation, giving $Z_p \in \mathbb{R}^{N_z \times D}$ and $X_p \in \mathbb{R}^{N_x \times D}$, respectively. Through the above operations, we obtain the template tokens of the visible and thermal modalities, denoted as $Z_p^v$ and $Z_p^t$, respectively; similarly, $X_p^v$ and $X_p^t$ denote the search-region tokens of the two modalities. All the tokens are then concatenated into a vector $H$:

$$H = [Z_p^v; X_p^v; Z_p^t; X_p^t] \qquad (1)$$


where the superscripts $v$ and $t$ denote the visible and thermal modalities, respectively. The token sequence $H$ is then combined with position encodings and fed into the Transformer-based encoder, where self-attention enables information interaction. Concatenating $Z_p^v$ with $X_p^v$ (and $Z_p^t$ with $X_p^t$) makes interaction between the template and the search region possible. The similarity estimation performed during this interaction helps extract more accurate target features, since background regions in the search area have low similarity to the template while target regions have high similarity. A background elimination module is further proposed to reduce the adverse effect of background regions on feature learning. The structure of the encoder is illustrated in Fig. 3.
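To make the tokenization step concrete, the following PyTorch sketch shows one plausible way to build the concatenated token sequence $H$ of Eq. (1) from the RGB and thermal template/search crops. The module name `PatchEmbed`, the shared embedding layer across modalities, and the crop sizes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into P x P patches and project them to D dimensions."""
    def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        # A strided convolution is the standard way to implement ViT patch embedding.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, N, D) with N = HW / P^2

patch_embed = PatchEmbed(patch_size=16, embed_dim=768)

B = 2
z_rgb = torch.randn(B, 3, 128, 128)   # visible template crop
x_rgb = torch.randn(B, 3, 256, 256)   # visible search-region crop
z_t   = torch.randn(B, 3, 128, 128)   # thermal template crop
x_t   = torch.randn(B, 3, 256, 256)   # thermal search-region crop

# Eq. (1): H = [Z_p^v; X_p^v; Z_p^t; X_p^t], concatenated along the token dimension.
tokens = torch.cat([patch_embed(z_rgb), patch_embed(x_rgb),
                    patch_embed(z_t),   patch_embed(x_t)], dim=1)
print(tokens.shape)   # (2, 2*(64 + 256), 768)
```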

Fig. 3. Encoder structure with background elimination module.

The output of the single-modal multi-head self-attention (MSA) can be written as:

$$\mathrm{MSA} = \mathrm{Softmax}\!\left(\frac{1}{\sqrt{d_k}}\begin{bmatrix} Q_z K_z^T & Q_z K_x^T \\ Q_x K_z^T & Q_x K_x^T \end{bmatrix}\right)\cdot V = \begin{bmatrix} A_{zz} & A_{zx} \\ A_{xz} & A_{xx} \end{bmatrix}\cdot\begin{bmatrix} V_z \\ V_x \end{bmatrix} \qquad (2)$$

where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, and the subscripts $z$ and $x$ indicate entries belonging to the template and search regions. The blocks $A_{zx}$ and $A_{xz}$ represent the estimated similarity between the template and the search region. After obtaining the output of the multi-head self-attention, the background elimination module removes background tokens according to their attention weights: higher weights indicate a higher likelihood of belonging to the target, while lower weights indicate background regions, as in [8]. We extend this mechanism from the single-modal to the multi-modal setting. Specifically, each modality produces its own set of discarded background tokens, and their intersection gives the background tokens that are discarded in common. The final output of the encoder is


denoted as $R$:

$$R \in \mathbb{R}^{(N_z^v + N_x^v + N_z^t + N_x^t)\times D} \qquad (3)$$
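The snippet below is a minimal sketch of how search-region tokens could be scored and eliminated inside one encoder layer, following the candidate-elimination idea of [8] extended to two modalities as described above. The scoring rule (averaging template-to-search attention over heads and template tokens) and the keep ratio are assumptions; only the intersection of the per-modality discard sets follows the text directly.

```python
import torch

def keep_mask_from_attention(attn, n_z, n_x, keep_ratio=0.7):
    """Rank search-region tokens by the attention they receive from template tokens
    and keep the top fraction; the remainder is flagged as background."""
    # attn: (B, heads, N, N) softmax attention for one modality, N = n_z + n_x.
    tmpl_to_search = attn[:, :, :n_z, n_z:n_z + n_x]   # (B, heads, n_z, n_x)
    score = tmpl_to_search.mean(dim=(1, 2))            # (B, n_x): mean attention per search token
    k = int(keep_ratio * n_x)
    topk = score.topk(k, dim=1).indices
    keep = torch.zeros_like(score, dtype=torch.bool)
    keep[torch.arange(score.shape[0]).unsqueeze(1), topk] = True
    return keep                                        # True = likely target, False = background

B, heads, n_z, n_x = 2, 12, 64, 256
attn_v = torch.rand(B, heads, n_z + n_x, n_z + n_x).softmax(dim=-1)  # visible-modality attention
attn_t = torch.rand(B, heads, n_z + n_x, n_z + n_x).softmax(dim=-1)  # thermal-modality attention

keep_v = keep_mask_from_attention(attn_v, n_z, n_x)
keep_t = keep_mask_from_attention(attn_t, n_z, n_x)

# Only tokens flagged as background by BOTH modalities are discarded,
# i.e. the intersection of the two per-modality discard sets.
discard_common = (~keep_v) & (~keep_t)
print(discard_common.sum(dim=1))   # number of search tokens dropped per sample
```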

After passing through all the encoders, the search-region features of both modalities can be used directly for subsequent processing, without requiring a further similarity measurement against the template.

2.3 Dual-Layer Fusion Module

Different modalities can provide mutually complementary information, so harnessing this information can enhance the understanding of targets or scenes and ultimately improve tracker accuracy [2]. To better utilize this complementary information, we design a dual-layer fusion module comprising two stages. The first stage, called discriminative feature fusion, estimates the weights of the different modality features using a channel attention mechanism, as shown in Fig. 4(a).


Fig. 4. (a) Discriminative feature fusion architecture. (b) Bidirectional cross-enhancement architecture.

Specifically, the features of the search regions for the two modalities, obtained from the feature extraction network, are represented as $R^v \in \mathbb{R}^{N_x^v \times D}$ and $R^t \in \mathbb{R}^{N_x^t \times D}$, respectively. We reshape them into $R_x^v \in \mathbb{R}^{C\times H\times W}$ and $R_x^t \in \mathbb{R}^{C\times H\times W}$. The channel weights are computed by adding $R_x^v$ and $R_x^t$ element-wise, followed by global average pooling (GAP), a fully connected layer (FC), and Softmax. The channel weights for the visible modality are given by Eq. (4):

$$W_x^v = \mathrm{Softmax}\!\left(\mathrm{FC}\!\left(\mathrm{GAP}\!\left(R_x^v + R_x^t\right)\right)\right) \qquad (4)$$

Similarly, the channel weights $W_x^t$ for the thermal-infrared modality are obtained by the same process, with the only difference being the FC layer. The weights $W_x^v$ and $W_x^t$ are multiplied with the original single-modal features, which are then summed to generate the preliminary fusion result $R_x^f$:

$$R_x^f = R_x^v \times W_x^v + R_x^t \times W_x^t \qquad (5)$$
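A minimal sketch of the first-stage discriminative feature fusion (Eqs. (4)-(5)) could look as follows; using one FC head per modality of size $C \times C$ is an assumption, since the paper only states that the two FC layers differ.

```python
import torch
import torch.nn as nn

class DiscriminativeFeatureFusion(nn.Module):
    """Stage-1 fusion: channel-attention weights per modality, then a weighted sum (Eqs. 4-5)."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        # One FC head per modality; the GAP input is shared but the FC weights are not.
        self.fc_v = nn.Linear(channels, channels)
        self.fc_t = nn.Linear(channels, channels)

    def forward(self, r_v, r_t):                      # r_v, r_t: (B, C, H, W)
        pooled = self.gap(r_v + r_t).flatten(1)       # element-wise sum, then GAP -> (B, C)
        w_v = torch.softmax(self.fc_v(pooled), dim=1)[:, :, None, None]   # (B, C, 1, 1)
        w_t = torch.softmax(self.fc_t(pooled), dim=1)[:, :, None, None]
        return r_v * w_v + r_t * w_t                  # preliminary fusion result R_x^f

fusion = DiscriminativeFeatureFusion(channels=768)
r_v = torch.randn(2, 768, 16, 16)
r_t = torch.randn(2, 768, 16, 16)
print(fusion(r_v, r_t).shape)   # torch.Size([2, 768, 16, 16])
```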


It is worth noting that this module can discard the background tokens identified by the intersection operation. The background tokens are assigned a value of zero after passing through the channel attention; consequently, the intersection of these tokens represents the common discarded background tokens, which are set to zero, while the remaining parts are assigned values based on their weights. The second stage is bidirectional cross-enhancement, whose structure is illustrated in Fig. 4(b). First, by enhancing the preliminary fusion result $R_x^f$ through multi-head self-attention, we obtain:

$$\hat{R}_x^f = \mathrm{LN}\!\left(R_x^f + \mathrm{MHA}\!\left(R_x^f\right)\right) \qquad (6)$$
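The sketch below illustrates Eq. (6) together with the bidirectional cross-attention exchange described in the next paragraph (fusion result as query against each modality, and each modality as query against the fusion, with the four outputs summed and passed through an MLP). Sharing a single attention module and LayerNorm across the branches is a simplification for brevity, not necessarily the paper's configuration.

```python
import torch
import torch.nn as nn

class BidirectionalCrossEnhancement(nn.Module):
    """Stage-2 fusion: self-enhance the inputs (Eq. 6), then exchange information through
    four cross-attention passes, sum the results, and refine them with an MLP."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def enhance(self, x):                        # Eq. (6): LN(x + MHA(x))
        return self.ln(x + self.mha(x, x, x)[0])

    def forward(self, r_f, r_v, r_t):            # all inputs: (B, N, D) token sequences
        f, v, t = self.enhance(r_f), self.enhance(r_v), self.enhance(r_t)
        # Fusion result as query against each modality, and each modality as query against the fusion.
        out = (self.cross(f, v, v)[0] + self.cross(f, t, t)[0] +
               self.cross(v, f, f)[0] + self.cross(t, f, f)[0])
        return self.mlp(out)                     # final fusion result

module = BidirectionalCrossEnhancement(dim=768)
r_f = torch.randn(2, 256, 768); r_v = torch.randn(2, 256, 768); r_t = torch.randn(2, 256, 768)
print(module(r_f, r_v, r_t).shape)   # torch.Size([2, 256, 768])
```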

The search-region features of the two modalities, $R_x^v$ and $R_x^t$, undergo the same operation to obtain $\hat{R}_x^v$ and $\hat{R}_x^t$. Bidirectional cross-attention is then used to enhance the representation of the different modalities in the fused features. Specifically, when $\hat{R}_x^f$ is taken as the query $Q$, $\hat{R}_x^v$ and $\hat{R}_x^t$ are regarded as the keys $K$ and values $V$; conversely, when $\hat{R}_x^v$ and $\hat{R}_x^t$ serve as $Q$, $\hat{R}_x^f$ is used as $K$ and $V$. This yields four results, which are summed and fed into a multi-layer perceptron (MLP) for further enhancement to obtain the final fusion result. Instead of directly applying cross-attention between the two modalities, we use the preliminary fusion result as an intermediary for their interaction, which effectively alleviates the suppression of feature representation caused by large modality differences.

2.4 Objective Classification and Regression Networks

After obtaining the final fusion result, the features are reshaped into a 2D format and fed into a fully convolutional network for prediction. The fully convolutional network consists of four stacked Conv-BN-ReLU layers, which predict the target class, local offsets, and normalized bounding-box size. The position with the highest classification score is taken as the target location, and the normalized bounding box is computed there. Finally, the normalized bounding-box coordinates are mapped back to the original image according to the image cropping ratio. During training, the total loss $L$ is a weighted combination of multiple loss functions:

$$L = L_{cls} + \lambda_{iou} L_{iou} + \lambda_{L_1} L_1 \qquad (7)$$

where $L_{cls}$ is the weighted focal loss [21] used for classification, and the $L_1$ loss and generalized IoU loss [22] are used for bounding-box regression. The hyperparameters $\lambda_{iou}$ and $\lambda_{L_1}$ are set to 2 and 5, respectively.
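A minimal sketch of how a loss of the form of Eq. (7) could be assembled is given below. The hand-written GIoU term follows the standard definition of [22]; plain binary cross-entropy stands in for the weighted focal loss of [21], and the box format (x1, y1, x2, y2) is an assumption.

```python
import torch
import torch.nn.functional as F

def giou_loss(pred, target, eps=1e-7):
    """Generalized IoU loss for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # Smallest enclosing box for the GIoU penalty term.
    cx1, cy1 = torch.min(pred[:, 0], target[:, 0]), torch.min(pred[:, 1], target[:, 1])
    cx2, cy2 = torch.max(pred[:, 2], target[:, 2]), torch.max(pred[:, 3], target[:, 3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()

def total_loss(cls_logits, cls_target, box_pred, box_target, lam_iou=2.0, lam_l1=5.0):
    # Plain BCE stands in for the weighted focal loss used in the paper.
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_target)
    return l_cls + lam_iou * giou_loss(box_pred, box_target) + lam_l1 * F.l1_loss(box_pred, box_target)
```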

3 Experiment

3.1 Experimental Details

We implemented our method using the PyTorch framework. Model training was performed on 4 Tesla P100 GPUs, and the inference speed was evaluated on an NVIDIA RTX 3080 GPU. The input template image size was 128 and the search image size was


256. A vanilla ViT-Base [20] model pre-trained with MAE [23] was used as the feature extraction network. We conducted training on the LasHeR and VTUAV datasets. The batch size was set to 16, and the initial learning rate for the backbone was set to $2\times10^{-5}$, while the remaining components used $2\times10^{-4}$. The training process consisted of 100 epochs, with 60,000 image pairs processed per epoch. The learning rate was reduced by a factor of 10 after 80 epochs. The model was trained using the AdamW optimizer [24] with a weight decay of $10^{-4}$.

3.2 Public Dataset Evaluation

3.2.1 Evaluation on the LasHeR Dataset
LasHeR is the largest RGB-T tracking dataset, comprising 1,224 video sequences: 979 sequences in the training set and 245 in the test set, amounting to a total of 730K image pairs. The dataset covers 19 challenges such as fast motion, occlusion, and appearance similarity. To further assess the effectiveness of the proposed model, we compare its performance with eight RGB-T tracking algorithms on the LasHeR dataset, as presented in Table 1. The evaluation metrics are precision rate (PR), normalized precision rate (NPR), and success rate (SR).

Table 1. Comprehensive comparison on the LasHeR dataset. The best two results are shown in red and blue fonts in the original.

| Methods       | Year | PR   | NPR  | SR   | FPS  |
|---------------|------|------|------|------|------|
| mfDiMP [10]   | 2019 | 44.7 | 39.5 | 34.3 | 10.3 |
| CAT [17]      | 2020 | 45.0 | 39.5 | 31.4 | 20   |
| FANet [26]    | 2021 | 44.1 | 38.4 | 30.9 | 19   |
| MANet++ [12]  | 2021 | 46.7 | 40.4 | 31.4 | 25.4 |
| DMCNet [11]   | 2022 | 49   | 43.1 | 35.5 | 2.3  |
| APFNet [19]   | 2022 | 50.0 | 43.9 | 36.2 | 1.3  |
| DRGCNet [15]  | 2023 | 48.3 | 42.3 | 33.8 | 4.9  |
| DMSTM [14]    | 2023 | 54.1 | 50.3 | 40.0 | 27.6 |
| TEFNet (Ours) | 2023 | 63.0 | 59.1 | 50.1 | 25.5 |

The results demonstrate the effectiveness of our algorithm. We selected four video sequences from the LasHeR test set and compared the tracking results with APFNet [19], MANet++ [12], and mfDiMP [10]. The visualized tracking results are shown in Fig. 5. TEFNet excels in accurately tracking and localizing small targets, out-of-view targets, and similar targets due to its effective utilization of background information, which enhances the discriminative nature of target features. Moreover, the bidirectional cross-enhancement fusion module achieves superior fusion results by effectively addressing modality differences. In contrast, the other three methods do not exploit background information. Specifically, APFNet [19] relies on attribute-based techniques, and its tracking accuracy notably diminishes when confronted with video attributes that lie beyond its training domain.


Fig. 5. Comparison of our tracker with the other three trackers on four video sequences.

[Fig. 6 plots: precision plot (maximum precision rate vs. location error threshold) with legend TEFNet [0.758], HMFT [0.692], FSRPN [0.568], mfDiMP [0.567], DAFNet [0.511], ADRNet [0.508]; success plot (maximum success rate vs. overlap threshold) with legend TEFNet [0.649], HMFT [0.578], FSRPN [0.476], mfDiMP [0.471], ADRNet [0.380], DAFNet [0.378].]

Fig. 6. Overall performance on the VTUAV test set. (a) Represents the maximum precision rate curve, (b) Represents the maximum success rate curve.

Evaluation on the VTUAV Dataset
The VTUAV dataset is a large-scale RGB-T dataset specifically designed for unmanned aerial vehicle (UAV) tracking. It comprises 500 video sequences categorized into short-term and long-term tracking scenarios, with an image resolution of 1920 × 1080, and encompasses 10 challenging attributes. We merged the short-term and long-term tracking sets for evaluation, as shown in Fig. 6. TEFNet was compared with five RGB-T tracking algorithms (DAFNet [9], ADRNet [18], FSRPN [27], mfDiMP [10], HMFT [16]). The results indicate that TEFNet achieved a maximum precision of 75.8% and a maximum success rate of 64.9% on the dataset, surpassing HMFT by +6.6% and +7.1%, respectively. We visualize the tracking results of two videos, as shown in Fig. 7. The experimental results demonstrate that our method excels in capturing global information. It can promptly re-identify the target when it goes out of the field of view and reappears


within the view. Additionally, our method showcases stable tracking performance when encountering thermal crossover, effectively leveraging complementary information.

Fig. 7. Visualization of drone tracking. Tracking targets are marked in red, and CLE indicates the center location error.

3.3 Ablation Experiments and Analysis

We pruned the dual-layer fusion module by removing certain components and modifying their structure for comparison with the original method. The first variant, TEFNet_dff, removes the bidirectional cross-enhancement and retains only the discriminative feature fusion module. The second variant, TEFNet_QV, uses the preliminary fusion result as Q and the features of both modalities as K and V in the bidirectional cross-enhancement. The third variant, TEFNet_VQ, swaps the roles, with the preliminary fusion result as K and V and the features of both modalities as Q. The experimental results are shown in Table 2.

Table 2. Test results of different variants on the LasHeR dataset.

| Methods    | PR   | NPR  | SR   | FPS  |
|------------|------|------|------|------|
| TEFNet     | 63.0 | 59.1 | 50.1 | 25.3 |
| TEFNet_dff | 54.5 | 50.8 | 43.2 | 34   |
| TEFNet_QV  | 61.9 | 57.9 | 49.0 | 28.4 |
| TEFNet_VQ  | 62.3 | 58.4 | 49.4 | 28.2 |

The experimental results demonstrate that the inclusion of the bidirectional cross-enhancement leads to significant improvements: an 8.5% increase in PR, an 8.3% increase in NPR, and a 6.9% increase in SR. These findings highlight the effectiveness of our bidirectional cross-enhancement in addressing the challenge of suppressed feature representation caused by significant modality differences. Moreover, the bidirectional cross-attention outperforms unidirectional cross-modal attention, emphasizing the equal importance of modality-specific feature representation and fused feature representation.


4 Conclusion

This paper presents an effective RGB-T tracking framework that utilizes the concatenation of two modalities as input to an encoder equipped with a background elimination module. This approach enables accurate target matching and effective suppression of background interference, thereby enhancing target perception. In the cross-modal fusion stage, direct interaction between modalities is avoided to eliminate the impact of modality differences and fully exploit complementary information. While our method demonstrates impressive performance on the datasets, there are limitations regarding the use of a fixed template without leveraging temporal information and the need for improved inference speed, which affects practical applications. Future directions involve streamlining and optimizing the model structure and parameters and incorporating temporal information by replacing the fixed template.

Acknowledgement. This work was supported by the National Natural Science Foundation of China (61972059, 62376041, 42071438, 62102347), the China Postdoctoral Science Foundation (2021M69236), and the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172021K01).

References 1. Tang, Z., et al.: A survey for deep RGBT tracking. arXiv preprint arXiv:2201.09296 (2022) 2. Zhang, P., Wang, D., Lu, H.: Multi-modal visual tracking: Review and experimental comparison. arXiv preprint arXiv:2012.04176 (2020) 3. Peng, J., Zhao, H., Hu, Z., et al.: Siamese infrared and visible light fusion network for RGB-T tracking. Int. J. Mach. Learn. Cybern. 1–13 (2023) 4. Guo, C., Xiao, L.: High speed and robust RGB-thermal tracking via dual attentive stream siamese network. In: IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, pp. 803–806. IEEE (2022) 5. Zhang, T., Liu, X., Zhang, Q., et al.: SiamCDA: complementarity-and distractor-aware RGBT tracking based on Siamese network. IEEE Trans. Circuits Syst. Video Technol. 32(3), 1403–1417 (2021) 6. Feng, M., Su, J.: Learning reliable modal weight with transformer for robust RGBT tracking. Knowl.-Based Syst..-Based Syst. 249, 108945 (2022) 7. Zhu, Y., Li, C., Wang, X., et al.: RGBT tracking via progressive fusion transformer with dynamically guided learning. arXiv preprint arXiv:2303.14778 (2023) 8. Ye, B., Chang, H., Ma, B., et al.: Joint feature learning and relation modeling for tracking: A one-stream framework. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 544–559. Springer, Cham (2022). https://doi.org/10.1007/9783-030-01216-8_33 9. Gao, Y., Li, C., Zhu, Y., et al.: Deep adaptive fusion network for high performance RGBT tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019) 10. Zhang, L., et al.: Multi-modal fusion for end-to-end RGB-T tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019) 11. Lu, A., Qian, C., Li, C., et al.: Duality-gated mutual condition network for RGBT tracking. IEEE Trans. Neural Netw. Learn. Syst. (2022)


12. Zhang, H., Zhang, L., Zhuo, L., et al.: Object tracking in RGB-T videos using modal-aware attention network and competitive learning. Sensors 20(2), 393 (2020) 13. Cvejic, N., et al.: The effect of pixel-level fusion on object tracking in multi-sensor surveillance video. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7. IEEE (2007) 14. Zhang, F., Peng, H., Yu, L., et al.: Dual-modality space-time memory network for RGBT tracking. IEEE Trans. Instrum. Meas. (2023) 15. Mei, J., Zhou, D., et al.: Differential reinforcement and global collaboration network for RGBT tracking. IEEE Sens. J. 23(7), 7301–7311 (2023) 16. Zhang, P., Zhao, J., Wang, D., et al.: Visible-thermal UAV tracking: a large-scale benchmark and new baseline. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8886–8895 (2022) 17. Li, C., Liu, L., Lu, A., Ji, Q., Tang, J.: Challenge-aware RGBT tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 222–237. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_14 18. Zhang, P., et al.: Learning adaptive attribute-driven representation for real-time RGB-T tracking. Int. J. Comput. VisionComput. Vision 129, 2714–2729 (2021) 19. Xiao, Y., Yang, M., Li, C., et al.: Attribute-based progressive fusion network for RGBT tracking. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2831–2838 (2022) 20. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 21. Law, H., Deng, J.: Cornernet: detecting objects as paired keypoints. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 765–781. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_45 22. Rezatofighi, H., Tsoi, N., Gwak, J., et al.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019) 23. He, K., Chen, X., Xie, S., et al.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000–16009 (2022) 24. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711. 05101 (2017) 25. Li, C., Xue, W., Jia, Y., et al.: LasHeR: a large-scale high-diversity benchmark for RGBT tracking. IEEE Trans. Image Process. 31, 392–404 (2021) 26. Zhu, Y., Li, C., Tang, J., Luo, B.: Quality-aware feature aggregation network for robust RGBT tracking. IEEE Trans. Intell. Veh. 6(1), 121–130 (2020) 27. Kristan, M., Matas, J., Leonardis, A., et al.: The seventh visual object tracking vot2019 challenge results. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)

DARN: Crowd Counting Network Guided by Double Attention Refinement

Shuhan Chang1, Shan Zhong2(B), Lifan Zhou2, Xuanyu Zhou1, and Shengrong Gong1,2(B)

1 Northeast Petroleum University, Daqing 163318, China
2 Changshu Institute of Technology, Suzhou 215500, China
[email protected], [email protected]

Abstract. Although great progress has been made in crowd counting, accurate estimation of crowd numbers in high-density areas and full mitigation of the interference of background noise remain challenging. To address these issues, we propose a method called Double Attention Refinement Guided Counting Network (DARN). DARN introduces an attention-guided feature aggregation module that dynamically fuses features extracted from the Transformer backbone. By adaptively fusing features at different scales, this module can estimate the crowd in high-density areas by restoring the lost fine-grained information. Additionally, we propose a segmentation attention-guided refinement method with multiple stages. In this refinement process, crowd background noise is filtered by introducing segmentation attention maps as masks, resulting in a significant refinement of the foreground features. The introduction of multiple stages can further refine the features by utilizing fine-grained and global information. Extensive experiments were conducted on four challenging crowd counting datasets: ShanghaiTech A, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd. The experimental results validate the effectiveness of the proposed method.

Keywords: Crowd counting · Double attention guided · Multi-stage refinement

1 Introduction

Crowd counting employs machine learning algorithms to automatically analyze crowds in images or videos and provide estimations of their numbers. Crowd counting can effectively prevent potential hazards such as crowding, trampling, and crowd overload, as well as support the analysis of customer behavior and the optimization of resource allocation. Existing crowd counting methods can be divided into three categories: detection-based methods, regression-based methods, and deep learning-based methods.

Supported by the National Natural Science Foundation of China (61972059, 62376041, 42071438, 62102347), the China Postdoctoral Science Foundation (2021M69236), and the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172021K01).


Early crowd counting was conducted by detecting the number of pedestrians. Wu et al. [1] proposed silhouette-oriented feature extraction methods to build body detectors. However, in complex scenes where pedestrians occlude each other, body features cannot be extracted accurately with a manual approach, hindering further application of detection-based methods. To address their shortcomings, subsequent studies proposed regression-based counting methods that establish a mapping from manually extracted body features to image counts. For example, Idrees et al. [2] extracted different features for people viewed from different angles and then weighted the number of people with the information represented by the different features, and Pham et al. [3] used random forests to model the nonlinear mapping between image patches and density maps. Although regression-based methods alleviate the reliance on detectors, they are still limited by hand-crafted features. In recent years, with the development of deep learning, researchers have applied convolutional neural networks to crowd counting, using pseudo density maps as the primary regression target. This approach has been designed from several perspectives, including multi-scale structures [4], attention mechanisms [5], and auxiliary learning from related tasks [6]. Xu et al. [5] designed a multi-scale attention fusion block to cope with crowd scale variation by combining multi-scale information fusion and spatial attention filtering. Sindagi et al. [6] cast crowd counting as auxiliary learning over subtasks, using cascaded CNNs to jointly learn a high-level prior and density map estimation. However, CNNs are not proficient at modeling global context. Therefore, self-attention-based Transformers have recently received increasing attention due to their ability to capture global features. For example, Liang et al. [7] used a pure Transformer model as the core of the network to accomplish the counting task. Chu et al. [8] proposed two Vision Transformer architectures, namely Twins-PCPVT and Twins-SVT, based on the Pyramid Vision Transformer architecture, which have shown excellent performance in image-level classification and dense detection. However, using only a pure Transformer to generate features may produce features that focus mainly on the global context, ignoring the fine-grained local information needed for localization. To address this issue, some studies [9, 10] combined CNNs and Transformers; combining the two models allows more comprehensive feature extraction while retaining the focus on fine-grained local information. To mitigate the negative impact of noisy annotations, Dai et al. [11] proposed a model containing a convolutional head and a Transformer head that supervise each other's noisy regions and collaboratively exploit different types of inductive biases. Different from previous work, we propose a model called the Double Attention Refinement Guided Counting Network, aimed at more accurately estimating crowd numbers in high-density areas and completely alleviating interference from background noise. We combine a Transformer and a CNN to jointly extract features, so both global and fine-grained features can be captured. Afterward, we propose


dual attention, namely, the global attention and the segmentation attention. The global attention can adaptively fuse features with different scales, while the multi-stage refinement method, guided by segmentation attention, facilitates the extraction of foreground region features with enhanced precision, effectively eliminating unwanted background features.

2 Proposed Method

2.1 Framework Overview

The network architecture of our proposed method is shown in Fig. 1. We take Twins-SVT, consisting of four stages, as our backbone. The stages in Twins-SVT are similar in structure and generate feature maps at different scales. At the beginning of each stage, the input image or feature map is converted into fixed-size patches and then flattened into a series of vectors. By using Twins-SVT, both global and local features are extracted from the input image. Following Twins-SVT, the global attention-guided upsampling module (AUM) is constructed to fuse global and local information through attention-guided feature aggregation. Subsequently, the multi-stage refinement module is established to fuse the multi-scale features, where the feature generated in each stage is refined by the corresponding segmentation attention to filter out background noise and highlight foreground features. Specifically, the ground-truth segmentation maps generated from the given point annotations are taken as the supervision information to refine the feature maps of each stage.

2.2 Attention-Guided Feature Aggregation

Although the Transformer is able to extract global features, higher-level feature maps still lack the fine-grained detail needed to be effectively reconstructed by simple upsampling. On the other hand, shallow-level feature maps contain rich detailed information for accurately locating individuals in the crowd. Therefore, we propose an attention-guided feature aggregation module that leverages the advantages of both shallow and deep feature maps by fusing them together. This integration exploits the advantage of shallow feature maps for accurate individual localization while benefiting from the global contextual information provided by deep feature maps. Traditionally, the output feature maps $X \in \mathbb{R}^{C\times H\times W}$ extracted from stages 3 and 4 of the backbone are expanded to 1/8 of the input size and then fused with the feature maps generated by stage 2 using a layer-by-layer stacking method. However, traditional upsampling methods, such as nearest-neighbor interpolation and bilinear interpolation, commonly use fixed interpolation kernels and cannot make full use of the global context in the sub-pixel range; as a result, some important details in the feature map are neglected. We propose an attention-guided upsampling module (AUM) for processing the feature maps $X \in \mathbb{R}^{C\times H\times W}$ output from stages 3 and 4 of the backbone


network. As shown in Fig. 2, the feature map $F \in \mathbb{R}^{C\times H\times W}$ is fed into two branches, namely branch 1 (upper part) and branch 2 (lower part). In branch 1, a global attention pooling operation is performed on the feature map $F$. Specifically, branch 1 consists of two parallel 1×1 convolutions. The feature map $F$ is convolved with $N$ 1×1 filters, and the attention map $F_a \in \mathbb{R}^{HW\times N}$ is generated by the softmax function. At the same time, $F$ is convolved with $C$ 1×1 filters and then reshaped to obtain the feature map $F_1$. Finally, $F_1$ and $F_a$ are matrix-multiplied to produce $N$ semantic aggregation descriptors $S \in \mathbb{R}^{C\times N}$. In branch 2, the feature map $F$ is upsampled by a factor of $k$ with ordinary bilinear interpolation to obtain the feature map $F_k \in \mathbb{R}^{C\times kH\times kW}$ ($k = 2, 4$). The feature map $F_k$ is then convolved with $N$ 1×1 filters, after which we normalize the resulting feature maps along the channel dimension, obtaining $N$-dimensional weight vectors $F_v$ that represent the weight of each channel in the input features. Next, these channel weight vectors $F_v$ are used to weight the semantic descriptors $S$, effectively distributing the global semantic information to each channel and yielding the feature map $F_n$ with global semantic information. Finally, $F_n$ is passed through a 1×1 convolution and fused with the feature map $F_k$ obtained from the ordinary upsampling, thereby recovering the global context during upsampling and generating the globally enhanced feature map $F'$.
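The following PyTorch sketch follows the AUM description above. The number of semantic descriptors, the residual addition of the projected global features onto the bilinearly upsampled map, and the exact softmax axes are assumptions where the text leaves room for interpretation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGuidedUpsample(nn.Module):
    """Sketch of the attention-guided upsampling module (AUM)."""
    def __init__(self, channels, n_descriptors=64, scale=2):
        super().__init__()
        self.scale = scale
        self.attn_conv = nn.Conv2d(channels, n_descriptors, 1)    # branch 1: attention map F_a
        self.feat_conv = nn.Conv2d(channels, channels, 1)         # branch 1: feature projection F_1
        self.weight_conv = nn.Conv2d(channels, n_descriptors, 1)  # branch 2: per-pixel channel weights F_v
        self.out_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Branch 1: global attention pooling -> N semantic descriptors S in R^{C x N}.
        attn = self.attn_conv(x).flatten(2).softmax(dim=-1)        # (B, N, HW)
        feat = self.feat_conv(x).flatten(2)                        # (B, C, HW)
        S = torch.bmm(feat, attn.transpose(1, 2))                   # (B, C, N)
        # Branch 2: bilinear upsampling and descriptor weights normalized over the channel dimension.
        x_up = F.interpolate(x, scale_factor=self.scale, mode='bilinear', align_corners=False)
        weights = self.weight_conv(x_up).flatten(2).softmax(dim=1)  # (B, N, kH*kW)
        # Redistribute the global descriptors to every upsampled position.
        f_n = torch.bmm(S, weights).view(b, c, h * self.scale, w * self.scale)
        return x_up + self.out_conv(f_n)                            # globally enhanced feature map F'

aum = AttentionGuidedUpsample(channels=256, n_descriptors=64, scale=2)
print(aum(torch.randn(2, 256, 16, 16)).shape)   # torch.Size([2, 256, 32, 32])
```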

2.3 Attention-Guided Multi-stage Refinement Method

Attention-Guided Refinement Module. The fusion features $F$, obtained from the attention-guided feature aggregation module, are fed into multiple refinement modules to estimate the density map. The total number of modules is denoted $T$, and each module comprises a multi-scale convolutional block (MCB) and a segmentation-guided attention layer (SAL), as illustrated in Fig. 3. The MCB is composed of three dilated convolution layers and one conventional convolutional layer in parallel. Inspired by the method of Wang et al. [12], we use the segmentation map as an attention map when regressing the density map. This effectively emphasizes the contribution of foreground regions to the density estimate while reducing the influence of background features. To achieve this, we introduce the segmentation-guided attention layer (SAL), a convolutional layer that generates the attention map by estimating the segmentation map. To restrict the output values to the range 0-1, a sigmoid layer is applied to the output of the attention layer, yielding the predicted segmentation attention map $M_{att}$. The structure of the second stage is the same as that of the first stage; its input is the concatenation of the feature map $F^1_{seg}$ and the fusion feature $F$. By performing the same operations as in the first stage, we obtain the refined feature map $F^2_{seg}$.
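A minimal sketch of one refinement stage could look as follows; the dilation rates, the channel widths, and summing the parallel MCB branches (rather than concatenating them) are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class RefinementModule(nn.Module):
    """One refinement stage: a multi-scale convolutional block (MCB) in parallel with a
    segmentation-guided attention layer (SAL) whose sigmoid output masks the MCB features."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # MCB: three dilated convolutions plus one ordinary convolution, applied in parallel.
        self.mcb = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, 3, padding=d, dilation=d) for d in (2, 3, 5)]
            + [nn.Conv2d(in_channels, out_channels, 3, padding=1)]
        )
        self.sal = nn.Conv2d(in_channels, 1, 1)    # predicts the segmentation attention map

    def forward(self, f):
        f_c = sum(conv(f) for conv in self.mcb)           # multi-scale features F_c
        m_att = torch.sigmoid(self.sal(f))                # attention map M_att in [0, 1]
        return f_c * m_att, m_att                         # masked features F_seg and M_att

stage = RefinementModule(in_channels=256, out_channels=256)
f_seg, m_att = stage(torch.randn(2, 256, 32, 32))
print(f_seg.shape, m_att.shape)   # torch.Size([2, 256, 32, 32]) torch.Size([2, 1, 32, 32])
```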


Fig. 1. The framework of Crowd Counting Networks Guided by Double Attention Refinement. First, the crowd images are transformed into vectors by Patch embedding and fed into the Transformer encoder. Next, the output feature maps from the second, third, and fourth stages are aggregated using the attention-guided feature aggregation module. The feature maps are refined using a segmentation attention-guided multistage refinement method, and ultimately, regression is performed using two simple convolutional layers.

Fig. 2. Design details of the proposed attention-guided upsampling module.


Fig. 3. Attention-Guided Refinement Module.

We conducted several ablation experiments to determine the optimal value of $T$. The best performance was achieved when the number of refinement stages was set to 2, so the refinement stage of our proposed network consists of two refinement modules. The fusion features $F$ serve as the input to both the MCB and the segmentation-guided attention layer, producing the feature $F_c$ and the single-channel attention map $M_{att}$, respectively. Afterward, $M_{att}$ is used as a mask and multiplied element-wise with $F_c$ to obtain the feature map $F^1_{seg}$ for the first stage, which can be expressed as:

$$F^1_{seg} = F_c \odot M_{att}, \qquad (1)$$

where $\odot$ denotes element-wise multiplication.

Generation of Segmentation Maps. To train our method, we generate ground-truth segmentation maps $M_{seg}$ for each stage of the refinement process from the given point annotations. $M_{seg}$ serves as the supervision for training the learned attention maps $M_{att}$. The segmentation maps are generated as follows:

$$M_{seg}(x) = M(x) * O_s(x), \qquad (2)$$

$$M(x) = \sum_{i=1}^{N} \delta(x - x_i), \qquad (3)$$

where $N$ represents the number of point annotations in the image and $\delta(\cdot)$ denotes the delta function. $M(x)$ is a binary matrix of the same size as the image, whose values are set to 1 at the point annotations and 0 otherwise. $O_s(x)$ represents an all-ones matrix of size $s \times s$ centered at position $x$. The segmentation map $M_{seg}$ is obtained by convolving $M(x)$ with $O_s(x)$; from the resultant segmentation map, a value of 1 or 0 determines whether a pixel belongs to the foreground or the background. The level of refinement and detail of the generated segmentation map can be adjusted through $s$: a smaller value of $s$ corresponds to more refined segmentation maps, while a larger value yields smoother and more stable


segmentation maps. After conducting the ablation experiments, we determined that the optimal performance is achieved when s is initialized as 17 in the first stage and set to 13 in the second stage. As depicted in Fig. 4, Fig. 4(a) presents the input image, while Figs. 4(b) and 4(c) display the ground truth segmentation maps with s values of 13 and 17, respectively. From the experimental results, it is observed that using different s values in different stages can enhance the quality of the segmentation maps and improve their robustness.
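The ground-truth map generation of Eqs. (2)-(3) can be sketched directly in PyTorch, as below; the (column, row) coordinate convention and the final binarization step are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def make_segmentation_map(points, height, width, s=17):
    """Build a binary foreground map from point annotations by placing an s x s
    all-ones window around every annotated head position (cf. Eqs. 2-3)."""
    m = torch.zeros(1, 1, height, width)
    for x, y in points:                      # points given as (col, row) pixel coordinates
        m[0, 0, int(y), int(x)] = 1.0        # delta map M(x)
    kernel = torch.ones(1, 1, s, s)          # all-ones window O_s
    seg = F.conv2d(m, kernel, padding=s // 2)
    return (seg > 0).float()                 # 1 = foreground, 0 = background

points = [(40.0, 25.0), (100.0, 60.0)]
seg_map = make_segmentation_map(points, height=128, width=128, s=17)
print(seg_map.shape, int(seg_map.sum()))     # area covered by the two windows
```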

Fig. 4. Example input image and its corresponding ground truth segmentation map with different s values.

2.4 Loss Function

The loss function consists of two parts:

$$L = L_{den} + L_{seg}, \qquad (4)$$

where $L_{den}$ is the loss for the supervised density estimation task and $L_{seg}$ is the loss for the supervised segmentation task. To optimize the segmentation attention map at each stage, we define $L_{seg}$ as the cross-entropy loss:

$$L_{seg}(\Theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j,k} H_i(j,k), \qquad (5)$$

where

$$H_i = M_i^{seg}\log\!\left(\hat{M}_i^{seg}\right) + \left(1 - M_i^{seg}\right)\log\!\left(1 - \hat{M}_i^{seg}\right), \qquad (6)$$

and $H_i(j,k)$ denotes an element of the matrix $H_i$; $M_i^{seg}$ is the ground-truth segmentation map and $\hat{M}_i^{seg}$ is the predicted segmentation map. In both refinement stages we provide supervision. For each stage $t$, we employ a stage-specific segmentation loss $L^t_{seg}$, and the overall segmentation loss can be expressed as

$$L_{seg} = \alpha\sum_{t=1}^{2} L^t_{seg} = \alpha L^1_{seg} + \alpha L^2_{seg}, \qquad (7)$$

DARN: Crowd Counting Network Guided by Double Attention Refinement

3 3.1

451

Experiment Implementation Details

Datasets. The proposed method is evaluated on the four largest crowd counting datasets: ShanghaiTech A [13], UCF-QNRF [14], JHU-CROWD++ [15], and NWPU-Crowd [16]. These datasets differ in terms of image resolution, number, crowd density, and color space. Training Details. Random cropping and random horizontal flipping are utilized for data enhancement over all experiments. For the datasets UCF-QNRF, JHU-CROWD++, and NWPU-Crowd, a random crop size of 512 × 512 is adopted. Considering the relative low resolutions in images of ShanghaiTech A, the random crop size for such dataset is set to 256 × 256. The AdamW algorithm [17] is taken to optimize the proposed model, with the learning rate of 10−5 . By operating the grid search experiments, the loss balance parameters α is set to 0.6. All experiments are conducted using PyTorch on a single 16G Tesla P100 GPU. Evaluation Metrics. We used two widely-used metrics to evaluate the performance of counting methods: Mean Absolute Error (MAE) and Mean Squared Error (MSE). The definitions of these metrics are shown below: 1  |yi − yˆi | , N i=1   N 1  2 M SE =  (yi − yˆi ) , N i=1 N

M AE =

(8)

(9)

where N represents the number of test images, and yi and yˆi denote the ground truth and predicted counts for the i-th test image, respectively. 3.2

Results and Analysis

We evaluated our model on the four datasets mentioned above and compared it with nine state-of-the-art methods. The quantitative results of counting accuracy are presented in Table 1. According to the results in Table 1, our DARN exhibits excellent accuracy on all four benchmark datasets. Specifically, on the UCFQNRF dataset, our method achieved a new state-of-the-art performance, surpassing the previously top-performing methods GauNet [21] and CHS-Net [11]. On the challenging large-scale crowd counting dataset JHU-CROWD++, the MAE and MSE of our method reached 54.88 and 230.46, respectively, which are 2.9 and 14.64 lower than the previous optimal method, and the counting accuracy improved by 5% and 6%, respectively. On the NWPU-Crowd dataset, the MSE of our proposed method is also lower than that of the previous state-of-the-art method HDNet [22], and the counting accuracy is improved by 2.9%. The visualization of our DARN is shown in Fig. 5.

452

3.3

S. Chang et al.

Ablation Studies

To assess the effectiveness of the proposed DARN, we conducted extensive ablation experiments on two challenging datasets UCF-QNRF and JHUCROWD++.

Fig. 5. Example input image and its corresponding ground truth segmentation map with different s values.

Table 1. Comparisons with the state of the arts on ShanghaiTech A, UCF-ONRF, JHU-Crowd++, and NWPU. The best performance is shown in bold and the second best is shown in underlined. Method

UCF-QNRF MAE MSE

JHU-CROWD++ NWPU-Crowd ShanghaiTech A MAE MSE MAE MSE MAE MSE

CMTL [6] (2017) CSRNet [4] (2018) CA-Net [18] (2019) NoisyCC [19] (2020) AutoScale [20] (2021) GauNet [21] (2022) HDNet [22] (2022) MSANet [5] (2023) CHS-Net [11] (2023) DARN (Ours)

252.0 – 107.0 85.8 87.5 81.6 83.2 83.39 83.4 79.68

– 85.9 100.1 67.7 65.9 58.2 62.9 57.78 – 54.88

514.0 – 183.0 150.6 147.8 153.7 148.3 145.19 144.9 133.22

– 309.2 314.0 258.5 264.8 245.1 276.2 253.33 – 230.46

– 121.3 – 96.9 94.1 – 76.1 – – 76.2

– 387.8 – 534.2 388.2 – 322.3 – – 312.91

101.3 68.2 62.3 61.9 60.5 54.8 53.4 – 59.3 58.62

152.4 115.0 100.0 99.6 100.4 89.1 89.9 – 97.8 95.90

A. The Effect of Multi-stage Refinement. We conducted comparative experiments by stacking t refinement modules in the network, denoted as Net-t (t = {0, 1, 2, 3}), and the results are presented in Table 2. From the experimental

DARN: Crowd Counting Network Guided by Double Attention Refinement

453

findings, it can be observed that the proposed refinement method progressively improves counting accuracy. However, adding a third refinement module did not yield significant performance improvements. Instead, it increased computational time and even led to a decline in performance. Therefore, to strike a balance between accuracy and efficiency, our network ultimately adopts two refinement modules to accomplish the refinement of density maps. Table 2. MAE, MSE of different configurations of our method on the UCF-QNRF and JHU-CROWD++ datasets. Method UCF-QNRF MAE MSE

JHU-CROWD++ MAE MSE

Net-0

82.86

144.52

62.57

243.31

Net-1

81.16

140.42

59.33

240.83

Net-2

79.68

133.22 54.88 230.46

Net-3

77.79 138.07

57.48

239.3

Table 3. Performance comparison of applying Attention-guided Feature Aggregation Module on UCF-QNRF and JHU-CROWD++ datasets. Method

UCF-QNRF MAE MSE

JHU-CROWD++ MAE MSE

M1

90.53

160.47

64.6

255.18

M2

88.45

152.34

62.4

252.39

M2 (EN )

86.43

150.13

61.54

249.5

M3

85.72

147.66

59.37

240.61

M3 (EN )

82.62

141.1

58.88

235.54

138.4

58.71

236.25

M1 + M2 + M3 82.1 DARN

79.68 133.22 54.88 230.46

Furthermore, Fig. 6 showcases a comparison of density maps overlaid with different numbers of refinement modules. It is evident that as the refinement process proceeds, the quality of the density maps gradually improves, and the counting error diminishes accordingly. The segmentation attention maps effectively filter out background noise from the crowd, and the foreground features are further refined through the multi-stage approach. This substantiates the effectiveness of our proposed multi-stage refinement method.

454

S. Chang et al.

Fig. 6. Attention-Guided Refinement Module.

B. The Effect of Attention-Guided Feature Aggregation Module. To assess the performance of Attention-guided Feature Aggregation Module, we conducted experiments on the features extracted at different stages and whether to use the attention-guided upsampling module, and the results are shown in Table 3. We denoted the output features of the three stages in the backbone as M1 , M2 , and M3 , representing shallow to deep levels. Additionally, we added “EN” after Mk when the attention-guided upsampling module was used in the upsampling process. In the above experiments, we observed that using the attention-guided upsampling module can significantly improve the counting performance. Specifically, on the UCF-QNRF dataset, the MAE and MSE decreased by 3.6% and 4.4%, respectively, after adding the attention-guided upsampling module to feature M3 . In addition, from the feature aggregation experiments we find that the aggregated performance is significantly better than that of the single-feature solution. Overall, the introduction of the attention-guided upsampling module and the strategy of feature aggregation both play a positive role in improving the performance.

4

Conclusion

This paper aims to propose a high-performance crowd counting network to deal with the difficulty of counting dense crowd areas and background clutter interference counting problems in crowd counting. To this end, we introduce double attention, that is, propose a global attention-guided feature aggregation module and a segmentation attention-guided multi-stage refinement method. The former solves the problem that it is easy to lose fine-grained information in the process of crowd feature fusion, which makes it difficult to estimate the number of people in dense areas. The latter effectively filters out the background noise of the crowd and further refines the foreground features. At the same time, a multi-level supervision mechanism is used in the refinement stage to further distill features to recover the lost information. Through experiments on several major datasets, we demonstrate that the proposed double attention refinement guided counting network outperforms the current state-of-the-art on most of the major datasets used.

DARN: Crowd Counting Network Guided by Double Attention Refinement

455

References 1. Wu, B., Nevatia, R.: Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In: Tenth IEEE International Conference on Computer Vision, pp. 90–97. IEEE (2005) 2. Idrees, H., Saleemi, I., Seibert, C., et al.: Multi-source multi-scale counting in extremely dense crowd images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2547–2554. IEEE (2013) 3. Pham, V.Q., Kozakaya, T., Yamaguchi, O., et al.: Count forest: co-voting uncertain number of targets using random forest for crowd density estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3253–3261. IEEE (2015) 4. Li, Y., Zhang, X., Chen, D.: CSRnet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1091–1100. IEEE (2018) 5. Xu, Y., Liang, M., Gong, Z.: A crowd counting method based on multi-scale attention network. In: 2023 3rd International Conference on Neural Networks, Information and Communication Engineering, pp. 591–595. IEEE (2023) 6. Sindagi, V.A., Patel, V.M.: CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. IEEE (2017) 7. Liang, D., Chen, X., Xu, W., et al.: Transcrowd: weakly-supervised crowd counting with transformers. SCIENCE CHINA Inf. Sci. 65(6), 160104 (2022) 8. Chu, X., Tian, Z., Wang, Y., et al.: Twins: revisiting the design of spatial attention in vision transformers. In: Advances in Neural Information Processing SystemSL, vol. 34, pp. 9355–9366 (2021) 9. Lin, H., Ma, Z., Hong, X., et al.: Semi-supervised crowd counting via density agency. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 1416–1426. ACM (2022) 10. Lin, H., Ma, Z., Ji, R., et al.: Boosting crowd counting via multifaceted attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19628–19637. IEEE (2022) 11. Dai, M., Huang, Z., Gao, J., et al.: Cross-head supervision for crowd counting with noisy annotations. In: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023) 12. Wang, Q., Breckon, T.P.: Crowd counting via segmentation guided attention networks and curriculum loss. IEEE Trans. Intell. Transp. Syst. 23(9), 15233–15243 (2022) 13. Zhang, Y., Zhou, D., Chen, S., et al.: Single-image crowd counting via multicolumn convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597. IEEE (2016) 14. Idrees, H., et al.: Composition loss for counting, density map estimation and localization in dense crowds. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 544–559. Springer, Cham (2018). https:// doi.org/10.1007/978-3-030-01216-8 33 15. Sindagi, V.A., Yasarla, R., Patel, V.M.: Jhu-crowd++: large-scale crowd counting dataset and a benchmark method. IEEE Trans. Pattern Anal. Mach. Intell. 44(5), 2594–2609 (2020)

456

S. Chang et al.

16. Wang, Q., Gao, J., Lin, W., et al.: NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Trans. Pattern Anal. Mach. Intell. 43(6), 2141–2149 (2020) 17. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. In: 7th International Conference on Learning Representations. ICLR (2019) 18. Liu, W., Salzmann, M., Fua, P.: Context-aware crowd counting. In: Conference on Computer Vision and Pattern Recognition, pp. 5099–5108. IEEE (2019) 19. Wan, J., Chan, A.: Modeling noisy annotations for crowd counting. In: Advances in Neural Information Processing Systems, vol. 33, pp. 3386–3396 (2020) 20. Xu, C., Liang, D., Xu, Y., et al.: Autoscale: learning to scale for crowd counting. Int. J. Comput. Vision 130(2), 405–434 (2022) 21. Cheng, Z.Q., Dai, Q., Li, H., et al.: Rethinking spatial invariance of convolutional networks for object counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19638–19648. IEEE (2022) 22. Gu, C., Wang, C., Gao, B.B., et al.: HDNet: a hierarchically decoupled network for crowd counting. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2022)

DFR-ECAPA: Diffusion Feature Refinement for Speaker Verification Based on ECAPA-TDNN

Ya Gao1, Wei Song1,2,3(B), Xiaobing Zhao1,2,3, and Xiangchun Liu1

1 School of Information Engineering, Minzu University of China, Beijing 100081, China
2 National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing 100081, China
3 Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, China
[email protected]

Supported by the Graduate Research and Practice Projects of Minzu University of China (SZKY2022085).

Abstract. Diffusion Probabilistic Models have gained significant recognition for their exceptional performance in generative image modeling. However, in the field of speech processing, a large number of diffusion-based studies focus on generative tasks such as speech synthesis and speech conversion, and few studies apply diffusion models to speaker verification. We investigated the integration of the diffusion model with the ECAPA-TDNN model. By constructing a dual-network branch architecture, the network further extracts and refines speaker embeddings under the guidance of the intermediate activations of the pre-trained DDPM. We put forward two methods for fusing the network branch features, both of which demonstrated certain improvements. Furthermore, our proposed model also provides a new solution for semi-supervised cross-domain speaker verification. Experiments on VoxCeleb and CN-Celeb show that DFR-ECAPA outperforms the original ECAPA-TDNN by around 20%.

Keywords: Speaker verification · Diffusion model · Feature fusion

1 Introduction

Speaker verification (SV), as a biometric identification technology, is widely used in various fields such as financial transactions, security protection, and smart home systems due to its advantages of convenient data acquisition and non-contact verification. The advancement of deep learning has led to the emergence of the d-vector [20] model and the x-vector [18] model for speaker verification. These models leverage neural networks to extract speaker embeddings, effectively capturing speaker characteristics. Notably, their performance surpasses that of the traditional i-vector [3] model, marking a significant advancement in speaker verification technology. In recent years, the ResNet architecture, based on 2D-CNNs,

458

Y. Gao et al.

has gained popularity as a way of extracting speaker embeddings, which is commonly referred to as the r-vector [22]model. ECAPA-TDNN [5] is currently recognized as the prevailing model in speaker verification. It incorporates the onedimensional convolutional neural network structure from the x-vector model and employs a frame-utterance feature extract method. To obtain multi-scale features, three 1-Dimensional SE Res2Blocks are employed in ECAPA-TDNN. Whether it is ResNet based model or ECAPA-TDNN, they are commonly trained using a classification objective to optimize the network. Different from the training phase, the testing phase is no longer for the purpose of classification. It only uses the intermediate output as speaker embedding, and measures the similarity between enroll utterance and test utterance through cosine similarity or PLDA algorithm. In order to obtain better results, many architectures are proposed on the basis of ECAPA-TDNN. ECAPA CNN-TDNN [19] introduces 2D convolutional stem as the initial stage of feature processing, and therefore the benefits of TDNN (Time-Delay Neural Network) architecture and 2-D convolution will be integrated in this approach. MFA-TDNN [12] and SKA-TDNN [13] apply attention mechanisms to speaker verification tasks to facilitate context fusion in a data-driven manner and adaptively emphasize important areas for more effective information capture. PCF-ECAPA [23] proposed a progressive channel fusion strategy, which significantly improves the performance of TDNN models by extending the depth and adding branches to extract deep representations of features. To extract speaker personality information more comprehensively, current researchers focus on optimizing from a single path by increasing network depth or integrating multi-scale features. Nevertheless, as the network depth increases and the parameter scale expands, training the model becomes more challenging, and in this situation achieving further performance improvements becomes increasingly difficult. This issue becomes especially pronounced when dealing with speaker verification tasks in short utterances. In the field of image processing, building multi-branch networks horizontally is another way to improve performance, which can achieve feature enhancement and refinement. For example, in Pix2PixHD [21], the global generator network branch and the local enhancer network branch are constructed, which implements the joint training of the generator by adding the features of the two network branches in element-wise. In the event image classification task, [11] uses three pre-trained networks as visual feature extractors for scene, object and human respectively, and fuses the output features of the fully connected layer of the three network branches for classification. Denoising Diffusion Probabilistic Models (DDPM) [8] have attracted researchers for its ability to generate highly realistic and detailed results, which makes them the current preferred option among generation models. It realizes the modeling of data distribution by adding noise and denoising in the forward process and reverse process respectively. DDPM has demonstrated remarkable success in image inpainting and image super-resolution reconstruction [16], sur-


passing the performance of GANs (Generative Adversarial Networks), VAEs (Variational Autoencoders), and other generative models. Furthermore, DDPM has also made significant advancements in text processing [24] and video processing [9]. In the domain of audio processing, DDPM has attained state-of-the-art performance in areas such as speech synthesis [10], speech enhancement [17], and speech conversion [15]. [6] utilizes diffusion probabilistic-based multichannel speech enhancement as front-end to reconstruct the clean signal from the multichannel noisy input, improving far-field speaker verification performance. However, DDPM is not directly applied in speaker verification in this way. Recently, researchers have been exploring the application of DDPM in various domains beyond generation tasks. In [1], the intermediate activations of pretrained DDPM noise learning network are extracted and utilized for training a specifically designed classifier. This approach leads to outstanding semantic segmentation results, indicating that DDPM can also function as representation learners. [4] further explores the applications of the pre-trained DDPM model in classification tasks and proposes a joint diffusion model that can simultaneously achieve excellent generation quality and accurate classification performance. Inspired by these recent advances, we propose Diffusion Feature Refinement based on ECAPA-TDNN (DFR-ECAPA), a dual-branch speaker feature refinement method that combines diffusion models and ECAPA-TDNN. It aims to enhance the utilization of contextual information by leveraging separate network branches. We train the diffusion model by utilizing speech data without speaker labels first. Afterward, the intermediate activations of the pre-trained DDPM will be extracted and integrated with the ECAPA-TDNN for subsequent training. We further explored the two dual-branch featured fusion methods of DFR-ECAPA-ML and DFR-ECAPA-LL, and tested the performance of DFRECAPA under different duration utterances. At the same time, we explore the issue of cross-domain in speaker verification, and apply our model in the specific task of cross-lingual speaker verification. Under the guidance of the diffusion model, DFR-ECAPA performs approximately a 20% improvement compared to the version without the diffusion model. It shows that DFR-ECAPA has the capability to enhance target domain speaker verification accuracy in situations where the label of the target domain speaker is unavailable. To sum up, our paper’s contributions can be summarized as follows: 1. To the best of our knowledge, we are the first to apply a diffusion model directly to the task of speaker verification. 2. We further explored two dual-branch feature fusion approaches, which also has certain reference for other TDNN based models. 3. Experiments demonstrate that our proposed model also provides a new solution for semi-supervised cross-domain speaker verification.

2 Denoising Diffusion Probabilistic Models

In this section we will introduce the working mechanism of Denoising Diffusion Probabilistic Models (DDPM), including the forward diffusion process and the reverse denoising process. In the forward process, Gaussian noise is incrementally added to the original image $x_0$, causing gradual degradation of the image, which can be considered as a Markov chain process. The original data can be restored by constructing a neural network that predicts the noise in the reverse process.

Each step of the forward diffusion process adds noise on top of the previous step. For the image $x_{t-1}$, the forward diffusion process can be expressed as

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big) \tag{1}$$

$\beta_t$ is a predefined hyperparameter that takes a decimal value between 0 and 1. As $t$ increases, $\beta_t$ progressively increases, and $\mathbf{I}$ represents the identity matrix. At each step, a small quantity of Gaussian noise is added to the image. With increasing $t$, the final diffusion result $x_T$ follows a standard Gaussian distribution, serving as the initial state for the reverse process in the generation task. Typically, $t$ is sampled following a specific schedule when training. Based on the above formula, the noisy image $x_t$ at any given time $t$ can be directly obtained from the original data $x_0$, without the need for iterative noise addition:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big) \tag{2}$$

with $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. By employing the reparametrization trick, $x_t$ can be represented by $x_0$ and noise $\epsilon$ as

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}) \tag{3}$$

If the noisy image $x_t$ at time $t$ is given, the image $x_{t-1}$ at the previous moment can be obtained through denoising. By starting from random noise and iteratively sampling, a real sample can be generated. For simplicity, the reverse process is also considered as a Markov chain:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big) \tag{4}$$

In practice, DDPMs typically transform the modeling objectives related to mean and variance mentioned above into the prediction of the noise $\epsilon$. They accomplish this by constructing a noise learning network $\epsilon_\theta(x_t, t)$ that aims to make the predicted noise as similar as possible to the actual noise added during the forward process. This procedure is constrained by utilizing a Mean Squared Error (MSE) loss:

$$L = \mathbb{E}_{x,\, \epsilon \sim \mathcal{N}(0,1),\, t}\big[\, \|\epsilon - \epsilon_\theta(x_t, t)\|_2^2 \,\big] \tag{5}$$
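For concreteness, the following is a minimal PyTorch sketch of the forward noising in Eq. (3) and the noise-prediction objective in Eq. (5). The linear beta schedule and the generic `eps_model` interface are assumptions for illustration, not the exact configuration used in the paper.

```python
import torch
import torch.nn.functional as F

# Assumed linear beta schedule with T = 1000 steps (the paper also uses T = 1000).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)          # \bar{alpha}_t as in Eq. (2)

def q_sample(x0, t, noise):
    """Eq. (3): draw x_t directly from x_0 for a batch of timesteps t."""
    a_bar = alpha_bar.to(x0.device)[t].view(-1, *([1] * (x0.dim() - 1)))
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

def ddpm_loss(eps_model, x0):
    """Eq. (5): MSE between the true noise and the predicted noise eps_theta(x_t, t)."""
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(eps_model(x_t, t), noise)   # eps_model: any network with signature (x_t, t) -> eps
```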

3 Proposed Method

In our study, we introduce a strategy called Diffusion Feature Refinement which builds upon the ECAPA-TDNN model (DFR-ECAPA). This approach involves two main steps. Firstly, the data representations are extracted using the Diffusion model noise learning network. Secondly, these features are fused with the ECAPA-TDNN model for further training. In this section, we provide a separate introduction for each of these aspects.

3.1 Diffusion Model Representations

In [1], a thorough investigation is performed into the image representations acquired by the diffusion model in the task of image segmentation. The noise learning network $\epsilon_\theta(x_t, t)$ utilizes the UNet structure, where the decoder section of the UNet consists of 18 blocks. For the input image to be segmented, noise is added at a specific time $t$, the noisy image is passed through the pre-trained denoising network, the intermediate activations of the decoder are extracted, and a designed multi-layer perceptron (MLP) is further trained on them to obtain the predicted label. Experiments have proved that the diffusion model has a certain ability to capture semantic information.

In this paper, the proposed method employs audio filter-bank features as the target for the diffusion model to add noise to and denoise. The Gaussian noise added during the forward process shares the same shape as the network input. In the reverse process, we adopt the TDNN architecture, which has fewer parameters and is more appropriate for speech-related tasks, as a substitute for the UNet architecture commonly used in image processing. The structure of the noise learning network is illustrated in Fig. 1. The network input $x_t$ represents the filter-bank features with added noise. The main component of the network consists of three SE-Res2Blocks. The size of the convolution kernel, the dilation rate, and the scale dimension follow the configuration of the three layers in ECAPA-TDNN. To ensure that the network output has the same shape as the input for calculating the loss, a TDNN block is added at the end of the network to reshape the features, finally obtaining the output feature $Y \in \mathbb{R}^{F \times T}$, where $F = N \times 80$. When the network learns only the noise and the variance is fixed, the value of $N$ is 1. When both the noise and the variance are learned, $N$ is set to 2. We utilize the output from all SE-Res2Blocks as representations for the diffusion model, and then combine them with the ECAPA-TDNN network to achieve feature refinement.

Fig. 1. Noise learning network in the diffusion model. Three SE-Res2Blocks follow the configuration corresponding to the three layers in ECAPA-TDNN.
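A minimal sketch of such a 1-D convolutional noise-prediction network is given below. The SE-Res2Block internals are simplified to plain dilated Conv1d residual blocks, and the learned timestep embedding is an assumption; only the input/output shapes (80-dimensional filter-banks in, N × 80 channels out) follow the description above.

```python
import torch
import torch.nn as nn

class TDNNNoiseNet(nn.Module):
    """Predicts noise for filter-bank features of shape (B, 80, frames).
    The real model uses three SE-Res2Blocks; plain dilated Conv1d layers stand in here."""
    def __init__(self, n_mels=80, channels=512, n_out=2):    # n_out = 2: noise + variance (N = 2)
        super().__init__()
        self.t_embed = nn.Embedding(1000, channels)          # diffusion-step embedding (assumed)
        self.inp = nn.Conv1d(n_mels, channels, kernel_size=5, padding=2)
        self.blocks = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3, dilation=d, padding=d)
            for d in (2, 3, 4)                               # dilations follow ECAPA-TDNN's three layers
        ])
        self.out = nn.Conv1d(channels, n_out * n_mels, kernel_size=1)

    def forward(self, x_t, t):
        h = self.inp(x_t) + self.t_embed(t).unsqueeze(-1)    # (B, C, frames)
        for blk in self.blocks:
            h = torch.relu(blk(h)) + h                       # residual dilated block; these intermediate
                                                             # activations are reused as diffusion features
        return self.out(h)                                   # (B, n_out * 80, frames)
```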

3.2 Dual Network Branch Feature Fusion

Regarding the fusion of the features of the two network branches, DDPM and ECAPA-TDNN, we explored two different ways: DFR-ECAPA-ML and DFR-ECAPA-LL.

Fig. 2. Network structure of DFR-ECAPA-ML. The right branch is the pre-trained diffusion model. To simplify the illustration, two TDNN blocks are omitted, which are represented by black dotted lines. Use the noisy filter-bank features xt as the input of the pre-trained diffusion model during training, and input the original features x0 during inference.

Multi-Layer Feature Fusion (DFR-ECAPA-ML). The DFR-ECAPA-ML method incorporates concepts inspired by [1], where we consider the representations obtained from the diffusion model as a unified entity and process them uniformly; its structure is shown in Fig. 2. To simplify the illustration, two TDNN blocks in the diffusion model have been omitted. Two attempts were conducted in this case. Firstly, we aim to combine ECAPA-TDNN and the diffusion model as pre-training models, which means the parameters of the MFA (Multi-layer Feature Aggregation), ASP (Attentive Statistic Pooling) and FC layers in ECAPA-TDNN are optimized while the other weights are kept fixed. We denote it as DFR-ECAPA-ML(A). Specifically, we extract the output features from 6 SE-ResBlocks in both ECAPA-TDNN and the diffusion model, and then concatenate them along the frequency dimension. These features are then propagated to the following part and the weights are updated through loss backpropagation. Afterwards, we attempt to involve more parameters to learn the data


representations extracted from the diffusion model, thus only keeping the weights of the neural network in the diffusion model fixed. It is named DFR-ECAPAML(B) to distinguish. In this way, the output features of the 6 SE-ResBlocks are still combined through frequency dimension at the MFA layer. We use the noisy filter-bank features xt as the input of the pre-trained diffusion model during training, and input the original features x0 during inference. Layer2Layer Feature Fusion (DFR-ECAPA-LL). [1] also investigated the data representation capacity of the diffusion model across various time t and different decoder blocks. In terms of different blocks, shallower blocks provide more informative representations for smaller objects, whereas deeper blocks are more informative for larger objects. Because of this, we consider the way of incorporating the features extracted from the pre-trained DDPM into the ECAPA-TDNN model in a gradual and decentralized manner. Consequently, we propose the Layer2Layer Feature Fusion approach, which is illustrated in Fig. 3. In this way, the features extracted from

Fig. 3. Network structure of DFR-ECAPA-LL. The features extracted from each SEResBlock of the pre-trained DDPM will be fused into the same configuration layer in ECAPA-TDNN.

each SE-ResBlock of the pre-trained DDPM will be integrated into the layer with the same configuration in ECAPA-TDNN. Here we still adopt feature fusion along the frequency dimension. To accommodate this fusion, we alter the input features and the input dimension of the first convolution of each SE-ResBlock in ECAPA-TDNN, while the overall input of the SE-ResBlock remains unchanged. Before entering the first convolutional layer, the features obtained from the pre-trained network are merged with the original features. The input dimension of the


first convolutional layer is modified to 2C, while the output dimension is kept as C. In this context, C can be either 1024 or 512, corresponding to its meaning in the original ECAPA-TDNN.
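A hedged sketch of this layer-to-layer fusion step is shown below: the frozen diffusion-branch features are concatenated with the ECAPA-TDNN features along the channel (frequency) dimension, and the widened first convolution maps 2C back to C. The module and argument names are illustrative, and both branches are assumed to produce C-channel features of the same temporal length.

```python
import torch
import torch.nn as nn

class Layer2LayerFusion(nn.Module):
    """First convolution of an SE-ResBlock widened from C to 2C input channels,
    so that frozen diffusion features can be fused with the ECAPA-TDNN features."""
    def __init__(self, C=1024, kernel_size=1):
        super().__init__()
        self.conv = nn.Conv1d(2 * C, C, kernel_size=kernel_size)

    def forward(self, ecapa_feat, ddpm_feat):
        # ecapa_feat, ddpm_feat: (B, C, frames) outputs of the corresponding layers
        fused = torch.cat([ecapa_feat, ddpm_feat], dim=1)   # concatenation along the channel/frequency axis
        return self.conv(fused)                              # mapped back to (B, C, frames)
```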

3.3 Cross-Domain Speaker Verification

Cross-domain mismatch is a significant challenge in the task of speaker verification. It refers to the mismatch between the training data and test data in terms of domain. Cross-domain challenges can arise from various factors, including language variations, background noise, channel disparities, and more. In real-world applications, obtaining speaker labels for the target domain is often challenging. To tackle cross-domain problems, semi-supervised learning methods have emerged as the primary solution: the model is trained on the source domain and unlabeled data from the target domain are used to realize domain adaptation. However, domain transfer is often time-consuming, and the effect is difficult to guarantee without corresponding labels.

The training of the diffusion model does not rely on paired data; only a subset of utterances is needed to train the noise learning network. This characteristic gives the DFR-ECAPA model certain advantages in target-domain feature learning in the absence of target-domain speaker labels. Instead of following the traditional training principle, we begin by utilizing unlabeled data from the target domain, which is relatively straightforward to collect, to train the noise learning network of the DDPM. The intermediate activations obtained from the pre-trained DDPM are fused with the ECAPA-TDNN network, and the labeled source-domain dataset is then used for further training. By learning the distribution characteristics of the target domain through the diffusion model, the target-domain features can be obtained. The source-domain data will extract features based on the pre-trained model, and new parameters will be determined under this guidance to refine and supplement the features. During the process of feature fusion, the dual network branch not only captures features suited to the phonetic characteristics of the target domain, but also leverages the powerful constraints imposed by supervised training of ECAPA-TDNN on the source domain.
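The two-stage recipe can be summarized by the following sketch, in which the data loaders, models and AAM-softmax head are placeholders and `diffusion_loss_fn` is the DDPM objective sketched in Sect. 2; this illustrates the procedure, not the authors' training script.

```python
import torch

def train_dfr_ecapa(ddpm, dfr_model, aam_head, diffusion_loss_fn,
                    target_loader, source_loader, stage1_steps, stage2_epochs):
    # Stage 1: fit the noise learning network on unlabeled target-domain filter-banks.
    opt1 = torch.optim.Adam(ddpm.parameters(), lr=1e-4)
    for _, x0 in zip(range(stage1_steps), target_loader):      # no speaker labels needed
        opt1.zero_grad()
        diffusion_loss_fn(ddpm, x0).backward()
        opt1.step()

    # Stage 2: freeze the diffusion branch and train DFR-ECAPA on labeled source data.
    for p in ddpm.parameters():
        p.requires_grad_(False)
    opt2 = torch.optim.Adam(list(dfr_model.parameters()) + list(aam_head.parameters()),
                            lr=1e-4, weight_decay=2e-5)
    for _ in range(stage2_epochs):
        for x0, spk in source_loader:                           # labeled source-domain utterances
            opt2.zero_grad()
            emb = dfr_model(x0, ddpm)                           # embeddings refined by frozen diffusion features
            logits = aam_head(emb, spk)                         # placeholder AAM-softmax head (margin applied inside)
            torch.nn.functional.cross_entropy(logits, spk).backward()
            opt2.step()
```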

4 Experiments

4.1 Dataset

The VoxCeleb1&2 and CN-Celeb1 datasets were employed in this study, with CN-Celeb1 specifically utilized for cross-domain research purposes.

VoxCeleb is a widely used dataset in research on speaker verification. It consists of utterances from YouTube videos, covering thousands of speakers with diverse ages, genders, and accents. It is composed of two parts, VoxCeleb1 [14] and VoxCeleb2 [2], and the speakers in these datasets are mutually exclusive. To train the network, we utilize the development set of the VoxCeleb2 dataset, which comprises a total of 5,994 speakers and 1,092,009 utterances of varying lengths. During model evaluation, we employ enroll-test pairs formed from the VoxCeleb1 test dataset (VoxCeleb1-O). Although VoxCeleb incorporates various languages, the majority of the utterances are in English.

To investigate the cross-domain issue in speaker verification, we employed the CN-Celeb1 [7] dataset to simulate the scenario of cross-lingual speaker verification. CN-Celeb1 is a Chinese speech dataset that comprises 125,000 utterances from 1,000 Chinese celebrities. Among these, the utterances of 800 speakers were utilized for training purposes, while the remaining ones formed the test files. We utilized the development set of the VoxCeleb2 dataset, which includes speaker labels, along with CN-Celeb1-T (the training set), ignoring its speaker labels, to train the model. For assessing the final performance, we employed CN-Celeb1-E (the evaluation set).

4.2 Training

The method is implemented with the PyTorch framework on one NVIDIA GeForce RTX 3090 GPU. During training, the audio input is randomly cropped to 2 s, and if the length is insufficient, padding is performed. The model takes in 80-dimensional log mel-filterbanks obtained using a 25 ms Hamming window with a 10 ms frame shift. We set the hyperparameter T of the diffusion model to 1000 and adopt the linear noise schedule. The model is trained with the hybrid loss objective; therefore, N in the network is set to 2, so that the network can learn the noise and the variance information at the same time. The Adam optimizer is employed with a learning rate of 1e-4. The batch size is set to 32, and we finally use the DDPM weights pre-trained for 200k steps in the following training.

When we further train the DFR-ECAPA network, we add noise to the filter-bank features using diffusion step t = 50. The noisy features are then passed through the pre-trained DDPM to extract the speaker's diffusion features. The AAM-softmax objective function is applied with m = 0.2, s = 30. The model is trained with a batch size of 200 and optimized using the Adam optimizer with a weight decay of 2e-5. The initial learning rate is set to 1e-4. In the experiments in this paper, C takes the value of 1024.
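The input pipeline described above can be sketched as follows; a 16 kHz sampling rate and the use of torchaudio's MelSpectrogram are assumptions, while the 25 ms Hamming window, 10 ms shift, 80 mel bins and the noising step t = 50 follow the text.

```python
import torch
import torchaudio

fbank = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, win_length=400, hop_length=160,   # 25 ms window, 10 ms shift at 16 kHz
    n_mels=80, window_fn=torch.hamming_window)

def noisy_fbank(waveform, alpha_bar, t=50):
    """80-dim log mel-filterbanks with DDPM noise added at step t, as in Eq. (3).
    alpha_bar is the cumulative-product schedule tensor from the diffusion pre-training."""
    x0 = torch.log(fbank(waveform) + 1e-6)        # (B, 80, frames)
    noise = torch.randn_like(x0)
    a = alpha_bar[t]
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise
```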

4.3 Evaluation

We use the equal error rate (EER) and the minimum Detection Cost Function (minDCF) with $p_{target} = 0.01$ and $C_{miss} = C_{fa} = 1$ to evaluate the final experimental results. The full utterances, as well as 3 s segments and 1.5 s segments, were tested respectively.
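As a reference, the EER can be computed from trial scores as in the following sketch; this is a standard computation, not code from the paper.

```python
import numpy as np

def compute_eer(scores, labels):
    """EER: the operating point where false-acceptance rate equals false-rejection rate.
    scores: similarity scores; labels: 1 for target (same-speaker) trials, 0 otherwise."""
    order = np.argsort(scores)[::-1]              # accept highest-scoring trials first
    labels = np.asarray(labels)[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    fa = np.cumsum(1 - labels) / n_nontarget      # false acceptances among non-target trials
    fr = 1.0 - np.cumsum(labels) / n_target       # false rejections among target trials
    idx = np.argmin(np.abs(fa - fr))              # threshold where the two rates cross
    return (fa[idx] + fr[idx]) / 2.0
```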

4.4 Result

We tested the performance of the proposed DFR-ECAPA in different fusion methods and in cross-lingual speaker verification, and compared it with the original ECAPA-TDNN. Table 1 shows the performance of the corresponding evaluation indicators.

Table 1. DFR-ECAPA performance on VoxCeleb1-O and CN-Celeb1-E. ECAPA-TDNN is our reimplementation. *: supervised training on VoxCeleb2, directly applied to CN-Celeb. †: pretraining of the diffusion model using CN-Celeb without speaker labels.

Model | VoxCeleb1-O EER(%) (Full / 3.0 s / 1.5 s) | VoxCeleb1-O MinDCF (Full / 3.0 s / 1.5 s) | CN-Celeb1-E EER(%) (Full) | CN-Celeb1-E MinDCF (Full)
ECAPA-TDNN | 1.00 / 1.98 / 5.59 | 0.0755 / 0.1349 / 0.3403 | 17.23* | 0.6097*
DDPM-Classifier | 2.34 / \ / \ | 0.1675 / \ / \ | \ | \
DFR-ECAPA-ML(A) | 0.98 / 1.98 / 5.42 | 0.0753 / 0.1387 / 0.3271 | \ | \
DFR-ECAPA-ML(B) | 0.96 / 1.88 / 5.18 | 0.0785 / 0.1352 / 0.3378 | 13.64† | 0.5150†
DFR-ECAPA-LL | 0.97 / 1.90 / 5.38 | 0.0735 / 0.1414 / 0.3200 | \ | \

DDPM-Classifier is a network for which we follow the idea of [1], retaining the 2D architecture of the UNet denoising network and making a simple modification to adapt it to the speaker recognition task. The blocks are set to {5,6,7,8,9,12} and t to 50. Unlike semantic segmentation tasks, we do not need to perform classification at the pixel level, so the method in the original paper of extracting intermediate activations, upsampling, and retaining 8448-dimensional features for each pixel is not applicable. Therefore, we used several convolution, statistical pooling and dropout operations instead. Experiments on VoxCeleb1-O show that this method achieves a certain accuracy, but its effect is far from the ideal level; therefore, we did not test on the other two durations, which are more difficult. However, this proves that the intermediate activations of the diffusion model do capture the speaker's characteristics to a certain extent, and that it is feasible to use the diffusion model as a feature refiner.

First of all, we further refine the DFR-ECAPA-ML feature fusion method into two categories. In DFR-ECAPA-ML(A), we pre-trained both the diffusion model and ECAPA-TDNN, while in DFR-ECAPA-ML(B) all parameters of ECAPA-TDNN are optimized in the second stage. The results show that the effect is better when more parameters are allowed to be optimized under the guidance of the diffusion features, and the shorter the duration, the larger the improvement. The EER of DFR-ECAPA-ML(B) is improved by 4%, 5%, and 7% in the case of full utterances, the 3 s segments, and the 1.5 s segments, respectively. The performance of DFR-ECAPA-LL lies between the above methods. We conjecture that if the extracted diffusion model features are integrated into the ECAPA model too early, it is difficult to judge what is useful information at the shallower layers.

Finally, we conduct experiments on the cross-domain problem, taking the cross-lingual case as an example. The original ECAPA-TDNN model is trained on paired data from VoxCeleb and tested on CN-Celeb. The resulting EER is 17.23% and the MinDCF is 0.6097. However, the ECAPA-TDNN obtained by training with labeled CN-Celeb data can achieve an EER of about 12.49%, which indicates that the domain mismatch problem does exist. Using our proposed DFR-ECAPA-ML(B) model and the corresponding training method, the EER and MinDCF reach 13.64% and 0.515, improvements of 20% and 15.5% respectively, approaching the recognition performance under supervised training. This shows that our model can indeed serve as a solution for semi-supervised cross-domain speaker verification in the absence of target-domain labels.

5 Conclusion

Experiments demonstrate the value of DFR-ECAPA in applying diffusion models directly to speaker verification. The fusion of diffusion model representations indeed refines the features and enhances the accuracy of ECAPA-TDNN to some degree. Furthermore, our model exhibits competitiveness in semi-supervised cross-domain speaker verification, offering a new idea for speaker verification with limited resources. The idea of DFR-ECAPA also has certain exploration value for other TDNN-based models. Currently, DDPMs are increasingly used in tasks other than generation, so it can be expected that there are more ways to apply the ideas of the DDPM algorithm to speaker verification.

References 1. Baranchuk, D., Rubachev, I., Voynov, A., Khrulkov, V., Babenko, A.: Label-efficient semantic segmentation with diffusion models. arXiv preprint arXiv:2112.03126 (2021) 2. Chung, J.S., Nagrani, A., Zisserman, A.: Voxceleb2: deep speaker recognition. arXiv preprint arXiv:1806.05622 (2018) 3. Dehak, N., Kenny, P.J., Dehak, R., Dumouchel, P., Ouellet, P.: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19(4), 788–798 (2010) 4. Deja, K., Trzcinski, T., Tomczak, J.M.: Learning data representations with joint diffusion models. arXiv preprint arXiv:2301.13622 (2023) 5. Desplanques, B., Thienpondt, J., Demuynck, K.: ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification. arXiv preprint arXiv:2005.07143 (2020) 6. Dowerah, S., Serizel, R., Jouvet, D., Mohammadamini, M., Matrouf, D.: Joint optimization of diffusion probabilistic-based multichannel speech enhancement with far-field speaker verification. In: 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 428–435. IEEE (2023) 7. Fan, Y., et al.: Cn-celeb: a challenging Chinese speaker recognition dataset. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7604–7608. IEEE (2020) 8. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840–6851 (2020) 9. Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., Fleet, D.J.: Video diffusion models. arXiv preprint arXiv:2204.03458 (2022)


10. Huang, R., et al.: FastDiff: a fast conditional diffusion model for high-quality speech synthesis. arXiv preprint arXiv:2204.09934 (2022) 11. Li, P., Tang, H., Yu, J., Song, W.: LSTM and multiple CNNs based event image classification. Multimed. Tools Appl. 80, 30743–30760 (2021) 12. Liu, T., Das, R.K., Lee, K.A., Li, H.: MFA: TDNN with multi-scale frequencychannel attention for text-independent speaker verification with short utterances. In: ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7517–7521. IEEE (2022) 13. Mun, S.H., Jung, J.w., Han, M.H., Kim, N.S.: Frequency and multi-scale selective kernel attention for speaker verification. In: 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 548–554. IEEE (2023) 14. Nagrani, A., Chung, J.S., Zisserman, A.: VoxCeleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017) 15. Popov, V., Vovk, I., Gogoryan, V., Sadekova, T., Kudinov, M., Wei, J.: Diffusionbased voice conversion with fast maximum likelihood sampling scheme (2022) 16. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022) 17. Sawata, R., et al.: Diffiner: A versatile diffusion-based generative refiner for speech enhancement (2023) 18. Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., Khudanpur, S.: X-vectors: robust DNN embeddings for speaker recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333. IEEE (2018) 19. Thienpondt, J., Desplanques, B., Demuynck, K.: Integrating frequency translational invariance in TDNNs and frequency positional information in 2d Resnets to enhance speaker verification. arXiv preprint arXiv:2104.02370 (2021) 20. Variani, E., Lei, X., McDermott, E., Moreno, I.L., Gonzalez-Dominguez, J.: Deep neural networks for small footprint text-dependent speaker verification. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4052–4056. IEEE (2014) 21. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: Highresolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018) 22. Zeinali, H., Wang, S., Silnova, A., Matˇejka, P., Plchot, O.: But system description to Voxceleb speaker recognition challenge 2019. arXiv preprint arXiv:1910.12592 (2019) 23. Zhao, Z., Li, Z., Wang, W., Zhang, P.: PCF: ECAPA-TDNN with progressive channel fusion for speaker verification. In: ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023) 24. Zhu, Z.,et al.: Exploring discrete diffusion models for image captioning. arXiv preprint arXiv:2211.11694 (2022)

Half Aggregation Transformer for Exposure Correction

Ziwen Li, Jinpu Zhang, and Yuehuan Wang(B)

National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected]

Abstract. Photos taken under poor illumination conditions often suffer from unsatisfactory visual effects. Recently, Transformer, avoiding the shortcomings of CNN models, has shown impressive performance on various computer vision tasks. However, directly leveraging Transformer for exposure correction is challenging. On the one hand, the global Transformer has an excessive computational burden and fails to preserve local feature details. On the other hand, the local Transformer fails to extract spatially varying light distributions. Both global illumination recovery and local detail enhancement are indispensable for exposure correction. In this paper, we propose a novel Half Aggregation Transformer (HAT) architecture for exposure correction with a key design of Half Aggregation Multi-head Self-Attention (HA-MSA). Specifically, our HA-MSA establishes inter- and intra-window token interactions via window aggregation and window splitting strategies to jointly capture both global dependencies and local contexts in a complementary manner. In addition, we customize an illumination guidance mechanism to explore illumination cues and boost the robustness of the network to handle complex illumination. Extensive experiments demonstrate that our method outperforms the state-of-the-art exposure correction methods qualitatively and quantitatively with cheaper computational costs.

Keywords: Exposure correction · transformer · illumination guidance

1 Introduction

When taking photographs, undesirable lighting conditions in a scene can introduce underexposure or overexposure, which is quite common and annoying. Poor lighting conditions can degrade image quality and lead to unsatisfactory results, so it is necessary to correct the exposure to improve the visual effect of the image. This technique has a wide range of applications, such as night photography and autonomous driving.

Traditional exposure correction methods include histogram equalization [37] and Retinex model-based methods [10,13] that use hand-crafted priors to adjust image brightness and contrast. However, these methods have limited robustness in the face of complex scenes. Due to the powerful learning capability of


Fig. 1. Global heterogeneity and local smoothness exist in natural images. We explore them to design Half Aggregation Self-Attention to obtain global dependencies and local contexts at lower computational cost.

CNN, CNN-based methods have made significant progress for exposure correction. Existing CNN-based solutions either directly learn the pixel-level mapping relationship between degraded images to sharp images [1,12,14,32] or combine Retinex models to predict and adjust the illumination and reflection components [19,28,34], and these approaches have achieved impressive results. However, CNN-based models lack the ability to model long-range dependencies and the flexibility of content adaptation due to the locality constraints and weight sharing of convolution, which limits the performance of networks [9]. Transformer [22] captures global dependencies with its self-attention and avoids the drawbacks of CNN. Recently, after rapid progress in high-level vision tasks [3,36], Transformer has also been applied to low-level vision tasks [5,33]. These Transformer-based methods include the global Transformer [31] and the local Transformer [18,27]. The global Transformer computes self-attention for all tokens to establish long-range dependencies. However, this leads to the fact that its computational complexity is quadratic in spatial resolution, which is unaffordable for high-resolution features. Moreover, the global Transformer lacks the ability to retain local feature details and thus does not recover clear details well [6,11]. To address these issues, local Transformer restricts self-attention within local windows. However, local self-attention fails to extract spatially varying light distributions. The reason is that it prevents token interactions from non-local regions, leading to limited receptive fields and impairing the ability to model long-range dependencies. Even with shifting window strategies, global illumination cannot be modeled effectively [7]. Both global illumination recovery and local detail enhancement are crucial for image quality. In order to capture both global dependencies and local contexts at cheap computational cost, we propose an novel self-attention mechanism for exposure correction. We are inspired by the local smoothing property possessed by global information. From a global perspective, there is global heterogeneity in the illumination distribution of natural images, which makes it necessary to exploit self-attention with global perception capabilities. From a local perspective, the


presence of local smoothness in the global illumination implies similarity in local regions. Figure 1 depicts this phenomenon, where the white paper in the overexposed region and the wall in the underexposed region have different patterns, but the illumination in the wall region are similar. Therefore, when calculating the self-similarity of non-local regions, local regions can share the same token representation to save computational cost. However, local regions sharing tokens will inevitably lose image detail information, so we introduce local window selfattention with low computational cost to capture the local context, thus ensuring complete recovery of global illumination and local details. In this paper, we propose a novel Half Aggregation Transformer (HAT) for exposure correction. The key design of HAT that effectively provides complementary information with global and local properties lies in our Half Aggregation Multi-head Self-Attention (HA-MSA). Specifically, proposed HA-MSA contains Window Aggregation Multi-head Self-Attention (WA-MSA) and Window Split Multi-head Self-Attention (WS-MSA). WA-MSA aggregates individual windows into a token and computes self-attention among all tokens to establish token interactions across non-local regions, thus providing image-level receptive field and focusing on global illumination. WS-MSA restricts the self-attention within local windows, thus focusing on local details. Therefore, the proposed HA-MSA utilizes window aggregation and window splitting strategies to establish inter- and intra-window token interactions, thus learning to jointly extract global dependencies and local contexts in a complementary manner. In addition, to further improve the robustness for handling complex illumination, we customize an illumination guidance mechanism to explore illumination cues. We estimate a residual illumination map as the prior and designed a Gated Illumination Guidance Module (GIGM) to inject the illumination prior to further improve the performance of the network. Our proposed method is lightweight and surpasses state-of-the-art exposure correction methods. The main contributions of this work include – We propose a novel self-attention mechanism called Half Aggregation Multihead Self-Attention (HA-MSA), which can jointly capture global dependencies and local contexts to obtain complementary representations. – We customized an illumination guidance mechanism to explore illumination cues to boost the robustness of the network to handle complex illumination. – Extensive experiments on several datasets demonstrate that our method can outperform state-of-the-art methods with fewer parameters.

2 Related Work

2.1 Exposure Correction

Exposure correction has a long history of research. Traditional methods include histogram-based methods [37] and Retinex-based methods [10,13,17,24]. CNN-based methods achieve impressive performance for exposure correction due to the powerful learning capability of CNNs. Some methods use CNNs to predict illumination and reflection components based on the Retinex model. For instance, RetinexNet [28] and KinD [34] predict and adjust the illumination and reflection of low-light images. RUAS [19] employs Retinex priors to automatically search for efficient architectures. URetinexNet [29] uses deep networks to unfold the optimization process of Retinex. Another type of approach leverages CNNs to learn clear images directly [21,26,30,35]. DRBN [32] recursively decomposes and recomposes bands to improve the fidelity and perceptual quality. ZeroDCE [12] estimates higher-order curves to adjust images in a non-reference manner. ENC [14] designs exposure normalization and compensation to correct multiple exposures. FECNet [15] proposes a Fourier-based network with the interaction of the frequency and spatial domains. However, CNN-based approaches lack the ability to model long-range dependencies and the flexibility of content adaptation, limiting the performance of exposure correction. Unlike previous exposure correction methods, we design a lightweight self-attention mechanism to provide long-range dependency modeling capability and explore the guidance effect of the illumination prior, and these designs yield state-of-the-art performance with fewer parameters.

2.2 Vision Transformer

Transformer [22] was originally proposed for natural language processing. Due to its excellent global modeling capability, it has been widely used in high-level vision tasks with superior performance, such as object detection [3]. In addition, Transformer has also been explored for low-level vision tasks [5,8,18,27,31,33]. Specifically, [31] proposes a signal-to-noise ratio-aware Transformer framework. [33] computes the self-attention along the channel dimension instead of the spatial dimension. SwinIR [18] and Uformer [27] use non-overlapping local windows and restrict self-attention within each window. [8] computes self-attention using rectangular windows and aggregates features of different windows to expand the receptive field. However, the global Transformer-based approach has an excessive computational burden while failing to preserve local feature details. And the local Transformer-based approach limits the receptive field of the network and cannot model long-range dependencies. In this work, we use window aggregation and splitting strategies to effectively and efficiently extract global dependencies and local contexts in a complementary manner.

3 Method

In this section, we first describe the framework of the Half Aggregation Transformer (HAT). Then, we elaborate the Half Aggregation Multi-head Self-Attention (HA-MSA). Finally, we present the customized illumination guidance mechanism.

3.1 Overall Architecture

The overall architecture of the proposed HAT is shown in Fig. 2(a). HAT is a three-layer symmetric U-shaped structure containing an encoder, a bottleneck and a decoder. To avoid the loss of shallow features due to downsampling, we adopt skip connections to integrate the features of the encoder and decoder. Given a degraded sRGB image $I \in \mathbb{R}^{H\times W\times 3}$ as input, we first map the input image $I$ to a low-level feature $X_0 \in \mathbb{R}^{H\times W\times C}$. We then feed the features into the encoder, bottleneck and decoder in sequence, with each stage containing several HATBs. Finally, we remap the feature into the residual image $R$ and add it to the input image $I$ to obtain a high-quality reconstructed image: $\hat{I} = I + R$. To further improve the robustness against complex illumination, we introduce a pre-trained illumination estimator to obtain the illumination prior, and use the Gated Illumination Guidance Module (GIGM) to inject the prior at each stage of the network to modulate the feature maps in a multi-scale manner.

Fig. 2. (a) The architecture of our proposed HAT for exposure correction. (b) Illustration of HATB.

As shown in Fig. 2(b), HATB is the basic module of the HAT, which contains a Half Aggregation Multi-head Self-Attention (HA-MSA) and a Feed-Forward Network (FFN). The calculation of HATB can be expressed as

$$F_{mid} = \text{HA-MSA}(\text{LN}(F_{in})) + F_{in}, \tag{1}$$
$$F_{out} = \text{FFN}(\text{LN}(F_{mid})) + F_{mid}, \tag{2}$$

where $F_{in}$ denotes the input features, and $F_{mid}$ and $F_{out}$ denote the output features of HA-MSA and HATB, respectively.
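A minimal sketch of the HATB computation in Eqs. (1)-(2) follows; HA-MSA is passed in as any attention module operating on the token sequence, and the two-layer GELU FFN is an assumption.

```python
import torch.nn as nn

class HATB(nn.Module):
    """LayerNorm -> HA-MSA -> residual, then LayerNorm -> FFN -> residual (Eqs. (1)-(2))."""
    def __init__(self, dim, attn: nn.Module, ffn_expand=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = attn                              # HA-MSA module: (B, H*W, C) -> (B, H*W, C)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_expand * dim), nn.GELU(), nn.Linear(ffn_expand * dim, dim))

    def forward(self, x):                             # x: (B, H*W, C) token sequence
        x = x + self.attn(self.norm1(x))              # Eq. (1)
        x = x + self.ffn(self.norm2(x))               # Eq. (2)
        return x
```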

3.2 Half Aggregation Multi-head Self-Attention

Most previous exposure correction methods use CNNs, which prefer to extract local features and have limitations in modeling long-range dependencies. It is challenging to directly apply the existing Transformer. On the one hand, the global Transformer has an excessive computational burden and fails to preserve local feature details. On the other hand, the local Transformer limits the receptive field. To overcome the above drawbacks, we propose HA-MSA to jointly capture both global dependencies and local contexts at cheaper computational cost, which is more suitable for exposure correction.


Fig. 3. Illustration of our Half Aggregation Multi-head Self-Attention (HA-MSA), which contains WA-MSA and WS-MSA.

As shown in Fig. 3, our HA-MSA consists of two components: Window Aggregation Multi-head Self-Attention (WA-MSA) and Window Split Multi-head Self-Attention (WS-MSA). Specifically, given a layer-normalized feature $X \in \mathbb{R}^{H\times W\times C}$, we perform a linear projection on $X$ to obtain the query $Q$, key $K$ and value $V$, and then split them along the channel dimension. We obtain two groups of features $Q_{WA}, K_{WA}, V_{WA} \in \mathbb{R}^{H\times W\times \frac{C}{2}}$ and $Q_{WS}, K_{WS}, V_{WS} \in \mathbb{R}^{H\times W\times \frac{C}{2}}$, which are used to calculate WA-MSA and WS-MSA, respectively.

In order to perceive the global illumination efficiently, WA-MSA employs the window aggregation strategy to compute the self-attention across non-local regions. First, we split the features $Q_{WA}, K_{WA}, V_{WA}$ into multiple windows of size $M \times M$. Then, each individual window is averaged into a single independent token. This yields a total of $\frac{HW}{M^2}$ tokens from non-local regions. Subsequently, we compute the self-attention between all these aggregated tokens to provide a global receptive field and establish long-range dependencies. WA-MSA is formulated as

$$A_{WA} = \text{Softmax}\Big(\frac{Q_{WA} K_{WA}^{T}}{\alpha}\Big) V_{WA}, \tag{3}$$

where $Q_{WA}, K_{WA}, V_{WA} \in \mathbb{R}^{\frac{HW}{M^2}\times\frac{C}{2}}$ denote the tensors after reshaping and aggregation, and $\alpha$ denotes the normalization parameter.

To complement the local details, WS-MSA utilizes the window splitting strategy to compute the self-attention $A_{WS}$ within the local window. We split the features $Q_{WS}, K_{WS}, V_{WS}$ into non-overlapping windows of size $M \times M$. Each token only interacts with other tokens within the same window. After completing WA-MSA, we replicate $A_{WA}$ to the original dimensions and then concatenate it with $A_{WS}$ along the channel dimension. As with other multi-head strategies, we divide the features into multiple heads to compute multiple attention maps in parallel and concatenate the output of all heads at the end.

Complexity. The computational complexity of our HA-MSA is $\mathcal{O}(\text{HA-MSA}) = M^2 HWC + \frac{(HW)^2}{M^4} C$, while the computational complexities of Global-MSA and Local-MSA are $\mathcal{O}(\text{Global-MSA}) = 2(HW)^2 C$ and $\mathcal{O}(\text{Local-MSA}) = 2M^2 HWC$. As an example, under the settings of H = W = 256, C = 16, and M = 8, our HA-MSA requires only 0.6% of the computational cost compared to the Global-MSA. Compared to the limited receptive fields of Local-MSA, our HA-MSA is able to provide image-level receptive fields and establish global dependencies. Overall, HA-MSA can effectively and efficiently extract global dependencies and local contexts.

Fig. 4. Illustration of the residual illumination map, which can represent exposure patterns.

Fig. 5. Illustration of our GIGM, which explicitly injects the illumination condition into the Transformer.
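The following is a simplified, single-head sketch of HA-MSA under the assumptions that H and W are divisible by M and that the multi-head split and the α normalization are replaced by a standard scaled dot product; it illustrates the window-aggregation and window-splitting branches rather than reproducing the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HalfAggregationAttention(nn.Module):
    """Single-head sketch of HA-MSA: one channel half uses window-aggregated (global) attention,
    the other half uses attention restricted to each MxM window."""
    def __init__(self, dim, window=8):
        super().__init__()
        self.M = window
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                     # x: (B, H, W, C)
        B, H, W, C = x.shape
        M, h, w = self.M, H // self.M, W // self.M
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        (q_a, q_s), (k_a, k_s), (v_a, v_s) = (t.chunk(2, dim=-1) for t in (q, k, v))

        def win(t):                                           # (B, H, W, c) -> (B, h*w, M*M, c)
            c = t.shape[-1]
            t = t.view(B, h, M, w, M, c).permute(0, 1, 3, 2, 4, 5)
            return t.reshape(B, h * w, M * M, c)

        # WA-MSA: average each MxM window into one token, attend across all windows.
        qa, ka, va = (win(t).mean(dim=2) for t in (q_a, k_a, v_a))        # (B, h*w, C/2)
        attn = F.softmax(qa @ ka.transpose(-2, -1) / qa.shape[-1] ** 0.5, dim=-1)
        out_a = (attn @ va).unsqueeze(2).expand(B, h * w, M * M, C // 2)  # copy back to every pixel

        # WS-MSA: attention restricted to each MxM window.
        qs, ks, vs = (win(t) for t in (q_s, k_s, v_s))                    # (B, h*w, M*M, C/2)
        attn = F.softmax(qs @ ks.transpose(-2, -1) / qs.shape[-1] ** 0.5, dim=-1)
        out_s = attn @ vs

        out = torch.cat([out_a, out_s], dim=-1)                            # (B, h*w, M*M, C)
        out = out.view(B, h, w, M, M, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.proj(out)
```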

3.3 Illumination Guidance Mechanism

Degraded images often suffer from uneven distribution of illumination in spatial locations, and moreover different images or regions may have different patterns of overexposure and underexposure, which poses challenges to exposure correction. To address this problem, we customize an illumination guidance mechanism to explore illumination cues to further improve the robustness of the network to handle complex illumination and reconstruct well-lit images. Illumination Estimation. We observe that the residual illumination map can accurately reflect the illumination variations. Let the illumination maps of the degraded and ground-truth be a and b, where |a − b| can represent the overexposed region, and the underexposed region can be located by |b − a|, as shown in Fig. 4. That is, the exposure patterns and light variations can be distinguished and localized by the relative magnitude of the residual illumination map. Based on this observation, we try to use the residual illumination map as a guiding prior, expecting to perform different nonlinear transformations on regions with different illumination. We use the average of the three channels of the image to represent the light intensity. The illumination estimator uses a compacted UNet. Given a degraded image as input, the target of the illumination estimator is to predict a illumination map close to the reference. The illumination estimator will ignore color information and focus on the illumination distribution and texture structure of the image. We then combine the degraded illumination map to calculate the residual illumination map to provide guidance. Multi-scale Illumination Guidance. We argue that there are discrepancies and mismatches between the image-level information of illumination cues and


Table 1. Quantitative comparisons of our method with the state-of-the-art methods on the MSEC and SICE datasets (PSNR/SSIM). The best performances are highlighted in bold.

Method | MSEC Under | MSEC Over | MSEC Average | SICE Under | SICE Over | SICE Average | Param
CLAHE [37] | 16.77/0.6211 | 14.45/0.5842 | 15.38/0.5990 | 12.69/0.5037 | 10.21/0.4847 | 11.45/0.4942 | /
RetinexNet [28] | 12.13/0.6209 | 10.47/0.5953 | 11.14/0.6048 | 12.94/0.5171 | 12.87/0.5252 | 12.90/0.5212 | 0.84M
ZeroDCE [12] | 14.55/0.5887 | 10.40/0.5142 | 12.06/0.5441 | 16.92/0.6330 | 7.11/0.4292 | 12.02/0.5311 | 0.079M
RUAS [19] | 13.43/0.6807 | 6.39/0.4655 | 9.20/0.5515 | 16.63/0.5589 | 4.54/0.3196 | 10.59/0.4393 | 0.003M
DSLR [16] | 13.14/0.5812 | 20.06/0.6826 | 15.91/0.6219 | 16.83/0.6133 | 7.99/0.4300 | 12.41/0.5217 | 0.39M
MSEC [1] | 20.52/0.8129 | 19.79/0.8156 | 20.08/0.8145 | 19.62/0.6512 | 17.59/0.6560 | 18.58/0.6536 | 7.04M
DRBN [32] | 19.74/0.8290 | 19.37/0.8321 | 19.52/0.8309 | 17.96/0.6767 | 17.33/0.6828 | 17.65/0.6798 | 0.58M
SID [4] | 19.37/0.8103 | 18.83/0.8055 | 19.04/0.8074 | 19.51/0.6635 | 16.79/0.6444 | 18.15/0.6540 | 7.40M
LCDPNet [23] | 22.24/0.8470 | 22.33/0.8630 | 22.29/0.8550 | 16.72/0.6346 | 17.53/0.5443 | 17.13/0.5895 | 0.28M
ENC-DRBN [14] | 22.72/0.8544 | 22.11/0.8521 | 22.35/0.8530 | 21.77/0.7052 | 19.57/0.7267 | 20.67/0.7160 | 0.58M
ENC-SID [14] | 22.59/0.8423 | 22.36/0.8519 | 22.45/0.8481 | 21.30/0.6645 | 19.63/0.6941 | 20.47/0.6793 | 7.45M
Ours | 23.39/0.8633 | 22.22/0.8629 | 22.73/0.8631 | 23.00/0.6942 | 21.16/0.7470 | 22.08/0.7206 | 0.29M

Fig. 6. Visual comparison on the SICE (top) and MSEC (bottom) dataset.

the feature-level information of the network, and directly fusing them leads to suboptimal results. Therefore, we design a Gated Illumination Guidance Module (GIGM), which explicitly injects the illumination prior into the Transformer. GIGM contains two designs, a spatial feature transformation (SFT) and a gating mechanism, see Fig. 5. Inspired by [25], we treat illumination as a condition, predict both scale and shift transformation matrices, and then flexibly modulate the feature maps. The gating mechanism uses a nonlinear activation function to generate weights and multiplies them with the features to control the information flow. Given an input $X$ and a condition $C$, GIGM can be formulated as

$$SFT(X|C) = X \otimes W_{\gamma}(C) \oplus W_{\beta}(C), \tag{4}$$
$$Gate(X, C) = SFT(X|C) \otimes \sigma(W(X)), \tag{5}$$
$$\hat{X} = Gate(X, C) \oplus X, \tag{6}$$

where $\otimes$ and $\oplus$ denote element-wise multiplication and addition, $\sigma$ denotes the GELU activation function, and $W_{\gamma}$ and $W_{\beta}$ denote the depth-wise convolutions for generating the scale $\gamma$ and shift $\beta$. GIGM injects the illumination condition at each stage of the HAT in a multi-scale manner. At each scale, we use an average pooling layer to adjust the resolution of the illumination map.
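A minimal sketch of GIGM following Eqs. (4)-(6) is shown below; the exact layer composition of W_γ, W_β and W (here a 1×1 projection followed by a 3×3 depth-wise convolution) is an assumption, and the condition is the residual illumination map pooled to the feature resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GIGM(nn.Module):
    """Gated Illumination Guidance Module: SFT modulation by the illumination condition,
    a gate on the original features, and a residual connection (Eqs. (4)-(6))."""
    def __init__(self, dim, cond_dim=1):
        super().__init__()
        def dwconv(in_ch):   # assumed layout: 1x1 projection + 3x3 depth-wise convolution
            return nn.Sequential(nn.Conv2d(in_ch, dim, 1),
                                 nn.Conv2d(dim, dim, 3, padding=1, groups=dim))
        self.w_gamma, self.w_beta = dwconv(cond_dim), dwconv(cond_dim)   # scale / shift from the condition
        self.w_gate = dwconv(dim)                                        # W(X) in Eq. (5)

    def forward(self, x, cond):
        # x: (B, dim, H, W) features; cond: (B, 1, H, W) residual illumination map at this scale
        sft = x * self.w_gamma(cond) + self.w_beta(cond)   # Eq. (4)
        gated = sft * F.gelu(self.w_gate(x))               # Eq. (5), sigma = GELU
        return gated + x                                   # Eq. (6)
```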


4 Experiment

4.1 Implementation Details

Our implementation is based on the PyTorch framework and is performed on one NVIDIA TITAN V GPU. We use the Adam optimizer to train the model for 200 epochs with a patch size of 384×384 and a batch size of 4. The initial learning rate is 2e-4 and is gradually reduced to 1e-6 using a cosine annealing strategy. Random cropping, flipping and rotation are used for data augmentation. To avoid running out of memory when processing high-resolution images, we set the window size of HA-MSA to 8 × 8. For the illumination estimator, we use only the L1 loss and then fix the weights. For HAT, we use the same loss function and balance weights as in the previous method [14]. We evaluated the performance of the proposed method on two multi-exposure datasets, MSEC [1] and SICE [2]. The MSEC dataset is a multiple-exposure dataset containing overexposed and underexposed images at five different exposure levels, with 17,675 images for training, 750 for validation, and 5,905 for testing. For the SICE dataset, we treat the middle exposure level as the ground truth and the second and second-to-last exposure levels as the underexposed and overexposed images. The SICE dataset contains 1,000 training images, 24 validation images, and 60 test images. The training and testing partitions of these datasets are consistent with the related papers [1,14].
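A small sketch of the stated optimization setup (Adam at 2e-4 decayed to 1e-6 with cosine annealing over 200 epochs) is given below; the per-epoch stepping of the scheduler is an assumption.

```python
import torch

def build_optimizer(model, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs, eta_min=1e-6)
    return opt, sched   # call sched.step() once per epoch
```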

4.2 Comparisons with State-of-the-Art Methods

Quantitative Comparisons. To verify the superiority of our method, we compare it with state-of-the-art methods including CLAHE [37], RetinexNet [28], ZeroDCE [12], RUAS [19], DSLR [16], MSEC [1], DRBN [32], SID [4], LCDPNet [23], and ENC [14]. We use the common PSNR and SSIM metrics for quantitative evaluation. Table 1 shows the exposure correction results on the MSEC and SICE datasets. Following ENC [14], we average the results for overexposed and underexposed images, respectively. It can be seen that our method achieves the highest PSNR and SSIM with only 0.29M parameters compared to previous methods, which proves the superiority and efficiency of our method.

Visual Comparisons. We also provide visual results for qualitative comparison with the SOTA methods, where the results for the MSEC and SICE datasets are shown in Fig. 6. It can be seen that our method can effectively cope with a variety of illumination conditions and produce pleasing results with suitable brightness, consistent color and fewer artifacts. This evidence demonstrates the consistent superiority of our method.

4.3 Ablation Study

To validate the effectiveness of the proposed method, we conducted ablation studies on the SICE dataset.


Fig. 7. Visual comparison of self-attention mechanisms.

Table 2. Ablation study of self-attention mechanisms.

Setting | PSNR | SSIM | Param
Global-MSA | 20.95 | 0.6775 | 0.507M
Local-MSA | 18.49 | 0.6793 | 0.239M
Swin-MSA | 18.65 | 0.6816 | 0.239M
HA-MSA | 21.56 | 0.7074 | 0.239M

Self-Attention Mechanism. To validate the effectiveness of our HA-MSA, we compared it with different self-attention mechanisms, including Global-MSA [9], Local-MSA [20] and Swin-MSA [20]. For fairness, we use the same U-shaped architecture and remove the position encoding. Due to memory limitations, we downsample the feature maps of the Global-MSA. Table 2 presents the ablation results. Figure 7 displays the corrected results and the feature visualization of the self-attention. In general, PSNR focuses more on the global consistency and SSIM pays more attention to the local details. We can see that the LocalMSA has the lowest PSNR value and unreasonable illumination distribution in the image, which indicates that the Local-MSA ignores the global illumination, which severely degrades the performance of the network and the image quality. Swin-MSA slightly improves performance by shifting the window, but still cannot extract global information effectively. The Global-MSA recovers reasonable illumination, but it has the lowest SSIM value and the image details are blurred, so the Global-MSA lacks the ability to retain local details. In contrast, our HAMSA achieves the highest PSNR and SSIM scores while the generated images and visualized features have consistent luminance distributions and clear texture details due to the ability of HA-MSA to extract global dependencies and local contexts, which proves the rationality and effectiveness of our design. Illumination Guidance Mechanism. We evaluate the effectiveness of the illumination guidance mechanism. The baseline refers to the HAT without the illumination guidance mechanism. The Retinex uses Retinex-based illumination maps instead of residual illumination maps. The w/ concat denotes using concatenation to integrate illumination prior. The w/ SFT uses SFT to inject the illumination prior without the gating mechanism. As depicted in Table 3, our full


Table 3. Ablation study of illumination guidance mechanisms.

Setting | baseline | Retinex | w/ concat | w/ SFT | Ours
PSNR    | 21.56    | 21.53   | 21.66     | 22.01  | 22.08
SSIM    | 0.7074   | 0.7076  | 0.7107    | 0.7131 | 0.7206

Specifically, the comparison between Retinex and Ours demonstrates that the residual illumination map represents non-uniform light variations better than the Retinex-based illumination map. Taking illumination as a condition to modulate the Transformer features is a more appropriate choice than feature concatenation. Moreover, the gating mechanism further improves performance, which demonstrates the effectiveness of the illumination guidance.
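The excerpt does not give the exact layer configuration of this gated, SFT-style injection, so the following is only an illustrative sketch in the spirit of SFT [25] with a sigmoid gate; the module name, channel sizes, and gating formulation are assumptions.

```python
import torch
import torch.nn as nn

class GatedSFT(nn.Module):
    """Illustrative SFT-style modulation of transformer features by an
    illumination prior, with a gating branch. Layer sizes are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict per-pixel scale and shift from the illumination prior.
        self.to_scale = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_shift = nn.Conv2d(channels, channels, 3, padding=1)
        # Gate decides how strongly the illumination prior is injected.
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor, illum: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale(illum), self.to_shift(illum)
        modulated = feat * (1 + scale) + shift          # SFT-style modulation
        g = self.gate(torch.cat([feat, illum], dim=1))  # gating mechanism
        return g * modulated + (1 - g) * feat           # blend with the original feature

feat = torch.rand(1, 32, 64, 64)   # transformer feature map
illum = torch.rand(1, 32, 64, 64)  # illumination prior projected to the same channels
print(GatedSFT(32)(feat, illum).shape)  # torch.Size([1, 32, 64, 64])
```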

5 Conclusion

In this paper, we present a powerful Half Aggregation Transformer model for exposure correction. We design HA-MSA to complementarily capture global dependencies and local context through window aggregation and splitting strategies at low computational cost. In addition, an illumination guidance mechanism is proposed to exploit illumination cues and improve performance. The proposed method is lightweight, and extensive comparisons on several datasets demonstrate that it surpasses state-of-the-art methods.

References

1. Afifi, M., Derpanis, K.G., Ommer, B., Brown, M.S.: Learning multi-scale photo exposure correction. In: CVPR (2021)
2. Cai, J., Gu, S., Zhang, L.: Learning a deep single image contrast enhancer from multi-exposure images. TIP 27, 2049–2062 (2018)
3. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
4. Chen, C., Chen, Q., Xu, J., Koltun, V.: Learning to see in the dark. In: CVPR (2018)
5. Chen, H., et al.: Pre-trained image processing transformer. In: CVPR (2021)
6. Chen, X., Li, H., Li, M., Pan, J.: Learning a sparse transformer network for effective image deraining. In: CVPR (2023)
7. Chen, X., Wang, X., Zhou, J., Dong, C.: Activating more pixels in image super-resolution transformer. In: CVPR (2023)
8. Chen, Z., Zhang, Y., Gu, J., Zhang, Y., Kong, L., Yuan, X.: Cross aggregation transformer for image restoration. In: NeurIPS (2022)
9. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: ICLR (2021)


10. Fu, X., Zeng, D., Huang, Y., Zhang, X.P., Ding, X.: A weighted variational model for simultaneous reflectance and illumination estimation. In: CVPR (2016)
11. Guo, C.L., Yan, Q., Anwar, S., Cong, R., Ren, W., Li, C.: Image dehazing transformer with transmission-aware 3D position embedding. In: CVPR (2022)
12. Guo, C., et al.: Zero-reference deep curve estimation for low-light image enhancement. In: CVPR (2020)
13. Guo, X., Li, Y., Ling, H.: LIME: low-light image enhancement via illumination map estimation. TIP 26, 982–993 (2017)
14. Huang, J., et al.: Exposure normalization and compensation for multiple-exposure correction. In: CVPR (2022)
15. Huang, J., et al.: Deep Fourier-based exposure correction network with spatial-frequency interaction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision. ECCV 2022. LNCS, vol. 13679, pp. 163–180. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19800-7_10
16. Ignatov, A., Kobyshev, N., Timofte, R., Vanhoey, K., Van Gool, L.: DSLR-quality photos on mobile devices with deep convolutional networks. In: ICCV (2017)
17. Li, M., Liu, J., Yang, W., Sun, X., Guo, Z.: Structure-revealing low-light image enhancement via robust Retinex model. TIP 27, 2828–2841 (2018)
18. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: image restoration using Swin transformer. In: ICCVW (2021)
19. Liu, R., Ma, L., Zhang, J., Fan, X., Luo, Z.: Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In: CVPR (2021)
20. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV (2021)
21. Ma, L., Ma, T., Liu, R., Fan, X., Luo, Z.: Toward fast, flexible, and robust low-light image enhancement. In: CVPR (2022)
22. Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
23. Wang, H., Xu, K., Lau, R.W.: Local color distributions prior for image enhancement. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision. ECCV 2022. LNCS, vol. 13678, pp. 343–359. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19797-0_20
24. Wang, S., Zheng, J., Hu, H.M., Li, B.: Naturalness preserved enhancement algorithm for non-uniform illumination images. TIP 22, 3538–3548 (2013)
25. Wang, X., Yu, K., Dong, C., Loy, C.C.: Recovering realistic texture in image super-resolution by deep spatial feature transform. In: CVPR (2018)
26. Wang, Y., Wan, R., Yang, W., Li, H., Chau, L.P., Kot, A.: Low-light image enhancement with normalizing flow. In: AAAI, pp. 2604–2612 (2022)
27. Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: a general U-shaped transformer for image restoration. In: CVPR (2022)
28. Wei, C., Wang, W., Yang, W., Liu, J.: Deep Retinex decomposition for low-light enhancement. In: BMVC (2018)
29. Wu, W., Weng, J., Zhang, P., Wang, X., Yang, W., Jiang, J.: URetinex-Net: Retinex-based deep unfolding network for low-light image enhancement. In: CVPR (2022)
30. Xu, K., Yang, X., Yin, B., Lau, R.W.: Learning to restore low-light images via decomposition-and-enhancement. In: CVPR (2020)
31. Xu, X., Wang, R., Fu, C.W., Jia, J.: SNR-aware low-light image enhancement. In: CVPR (2022)
32. Yang, W., Wang, S., Fang, Y., Wang, Y., Liu, J.: From fidelity to perceptual quality: a semi-supervised approach for low-light image enhancement. In: CVPR (2020)


33. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H.: Restormer: efficient transformer for high-resolution image restoration. In: CVPR (2021)
34. Zhang, Y., Zhang, J., Guo, X.: Kindling the darkness: a practical low-light image enhancer. In: ICME (2019)
35. Zheng, C., Shi, D., Shi, W.: Adaptive unfolding total variation network for low-light image enhancement. In: CVPR, pp. 4439–4448 (2021)
36. Zheng, S., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: CVPR (2021)
37. Zuiderveld, K.: Contrast limited adaptive histogram equalization. Graphics Gems (1994)

Deformable Spatial-Temporal Attention for Lightweight Video Super-Resolution

Tong Xue, Xinyi Huang, and Dengshi Li(B)

School of Artificial Intelligence, Jianghan University, Wuhan 430056, China
[email protected]

Abstract. Video super-resolution (VSR) aims to recover high-resolution video frames from their corresponding low-resolution frames and the adjacent consecutive frames. Although some progress has been made, most existing methods use the spatial-temporal information of only two adjacent reference frames to aid super-resolution reconstruction, which prevents them from achieving satisfactory results. To solve this problem, we propose a deformable spatial-temporal attention (DSTA) module for video super-resolution. The module improves reconstruction by aggregating favorable spatial-temporal information from multiple reference frames into the current frame. To speed up model training, we select only the top-s most relevant feature points in the attention scheme. Experimental results show that our method achieves strong video super-resolution performance with fewer network parameters.

Keywords: Video super-resolution · Deformable spatial-temporal attention · Multiple reference frames

1 Introduction

The goal of video super-resolution (VSR) is to predict realistic high-resolution (HR) video from low-resolution (LR) video. VSR differs significantly from single image super-resolution (SISR): SISR relies only on intra-frame spatial correlation to recover the spatial resolution of a single image, whereas VSR also exploits inter-frame temporal coherence between adjacent frames to recover the spatial resolution of all video frames. The most straightforward VSR approach is to feed consecutive LR frames directly into an SISR model [7,13,27]. Although the spatial information of the video frames is exploited, these methods ignore the temporal correlation between frames. An alternative approach [11,19] estimates the optical flow between adjacent LR frames, uses it to align the LR frames, and then reconstructs the HR frame. However, inaccurate optical flow can introduce unwanted artefacts in the warped frames. Therefore, recent approaches [4,5] avoid artefacts caused by optical flow estimation errors by using deformable convolution [6] to align adjacent frames at the feature level.


Deformable convolution can capture features within a local area by expanding the convolution's receptive field, but a large receptive field imposes a significant computational cost on the model. Currently, recurrent neural networks are a common approach in VSR tasks [3,10]. They use internal memory states to handle long-term dependencies and improve VSR performance by aggregating the spatial information of two adjacent frames. In fact, different reference frames and different feature points contribute differently to the final VSR reconstruction, and aggregating unrelated feature information may cause ghosting and occlusion in the super-resolved video frames. Moreover, recurrent-network-based VSR models require a large number of parameters and a long training time, which hinders deployment in practical applications.

To solve the above problems, this paper proposes Deformable Spatial-Temporal Attention for Lightweight Video Super-Resolution (DSTALVSR). In DSTALVSR, we propose a deformable spatial-temporal attention (DSTA) mechanism. Sampling points are marked on the reference-frame spatial-temporal features, and the reference-frame and current-frame spatial-temporal features are fed into an optical flow estimation network, which produces the offsets between the current frame and each reference frame. The generated offsets and the sampling points are combined so that our deformable spatial-temporal attention mechanism filters out features irrelevant to the current frame, and the spatial information of multiple reference frames is aggregated with the temporal information related to the feature-point positions in the current frame. In this way, the quality of the HR reconstruction of the current frame is improved. To further speed up training and filter out irrelevant feature points, we select only highly relevant deformable sampling points to synthesize the new target values, avoiding model degradation caused by irrelevant features. Before entering the DSTA module, we use a spatial-temporal extraction network with low computational complexity to extract the spatial-temporal features of the video, making DSTALVSR lightweight. As shown in Fig. 1, our model is lightweight and obtains the best performance at the same time.

Our contributions are as follows: (1) We propose a deformable spatial-temporal attention mechanism that aggregates the spatial-temporal contextual information of multiple reference frames into the current frame. (2) We use a lightweight spatial-temporal extraction module to further extract the spatial-temporal features of the video. (3) We conduct extensive experiments on benchmark datasets; the quantitative and qualitative evaluation results show that the proposed DSTALVSR outperforms existing methods while remaining lightweight.

2 Related Work

2.1 Single Image Super-Resolution

SRCNN [7] was the first approach to use deep convolutional networks for the super-resolution task, and it inspired many CNN-based single image super-resolution methods.


Fig. 1. Comparison of various VSR methods on the Vid4 dataset with an upsampling factor of 4. Blue and red dots represent other VSR methods and our proposed method, respectively. (Color figure online)

For example, Kim et al. [12] proposed a residual learning strategy using a multi-layer deep network, which significantly improved super-resolution accuracy. Instead of applying traditional interpolation methods, [18] designed a sub-pixel convolutional network to efficiently upsample the low-resolution input; this operation reduces the computational complexity and enables a real-time network. Networks such as DBPN [8] and RDN [26] were later proposed to further improve SISR performance using high-quality large image datasets.

2.2 Video Super-Resolution

Spatial-temporal context information between LR frames plays an important role in the VSR task. Previous VSR methods can be classified into motion-compensation-based and non-motion-compensation-based methods. Motion-compensation-based methods [11] first use classical algorithms to obtain optical flow and then construct a network for high-resolution reconstruction; [2] integrates these two steps into a unified framework trained in an end-to-end manner. Since accurate estimation of optical flow is challenging, non-motion-compensation-based methods such as [25] use 3D convolution to extract features directly from multiple frames and fuse the spatial information of two adjacent frames to enhance HR video reconstruction; despite their simplicity, these methods are usually computationally expensive and ignore the long-term spatial-temporal information in the video. In order to make better use of the long-term spatial-temporal information in videos, we propose DSTALVSR.


Fig. 2. (a) shows the structure of the model. It consists of two main components: the spatial-temporal feature extraction (STFE) subnet and the feature enhancement and SR reconstruction (FESR) subnet. Given an input LR frame, the STFE subnet first extracts the feature map $F_i^{ST}$ and then feeds it to the FESR subnet, along with the target LR frame $I_t^L$ and its bicubic interpolation, to generate the output $I_t^H$. (b) shows the structure of the RSTB, consisting of several STB modules and convolutional layers. (c) shows the specific structure of the Swin Transformer block (STB).

3 Proposed Method

3.1 Network Architecture

The structure of the proposed network is shown in Fig. 2. Given 2N+1 consecutive LR frames $\{I_1^L, \ldots, I_t^L, \ldots, I_{2N+1}^L\}$, where $I_t^L \in \mathbb{R}^{H \times W \times C}$ is the t-th input frame, the goal of our network is to generate the corresponding HR frames $\{I_1^H, \ldots, I_t^H, \ldots, I_{2N+1}^H\}$, where $I_t^H \in \mathbb{R}^{4H \times 4W \times C}$. Our model consists of two main sub-networks: the spatial-temporal feature extraction network, and the spatial-temporal feature enhancement and super-resolution reconstruction network. Specifically, in the spatial-temporal feature extraction network we first use 3×3 convolutional layers in the feature extraction module to obtain shallow features $\{F_i^L\}_{i=1}^{2N+1}$ from the input 2N+1 LR video frames. Because of the locality of the convolutional layers, these shallow features lack long-range spatial information, which may degrade the quality of the spatial-temporal feature enhancement and SR reconstruction modules; we therefore process them further to obtain the spatial-temporal correlation between video frames. Inspired by the ability of Transformers [1,16] to model long-term dependencies, we use the low-complexity Residual Swin Transformer block (RSTB) [14] proposed by Liang et al. to extract features carrying long-term spatial-temporal information, $\{F_i^{ST}\}_{i=1}^{2N+1}$. To enhance the final super-resolution reconstruction, our proposed deformable spatial-temporal attention module further enhances the spatial-temporal information of the frame feature sequence $\{F_i^{ST}\}_{i=1}^{2N+1}$: the favorable spatial-temporal information of the 2N reference-frame feature sequences $\{F_i^{ST}\}_{i=1, i \neq j}^{2N}$ is adaptively aggregated into the current frame feature $F_j^{ST}$. Finally, upsampling is performed through the pixel-shuffle layer to complete the reconstruction of video frames from low to high resolution.


The DSTA structure is shown in Fig. 3. A detailed description of each module is given in the following sections.

3.2 Spatial-Temporal Feature Extraction

Firstly, one 3×3 convolutional layer is used in the feature extraction module to obtain shallow features $\{F_i^L\}_{i=1}^{2N+1}$ from the input 2N+1 LR video frames. To further capture the long-term spatial-temporal dependencies between video frames, we extract the spatial-temporal features of the video frames with 5 Swin Transformer blocks; the structure of the Swin Transformer block is shown in Fig. 2. The expressions are as follows:

$\{F_i^L\}_{i=1}^{2N+1} = H_{SF}(I_{LR})$   (1)

$\{F_i^{ST}\}_{i=1}^{2N+1} = H_{RSTB}\big(\{F_i^L\}_{i=1}^{2N+1}\big)$   (2)

where $H_{SF}(\cdot)$ denotes the 3×3 convolutional layer used to extract shallow features from the low-resolution video frames, and $H_{RSTB}(\cdot)$ denotes the Residual Swin Transformer block [14].
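A minimal sketch of Eqs. (1)-(2) is given below. The real $H_{RSTB}$ consists of Residual Swin Transformer blocks [14]; here a plain residual convolution block stands in for it, and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class STFE(nn.Module):
    """Sketch of Eqs. (1)-(2): a 3x3 conv for shallow features (H_SF) followed
    by a stand-in for the RSTB stack (H_RSTB)."""
    def __init__(self, channels: int = 64, num_blocks: int = 5):
        super().__init__()
        self.h_sf = nn.Conv2d(3, channels, 3, padding=1)      # Eq. (1)
        self.h_rstb = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.GELU(),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(num_blocks)
        ])

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, C, H, W) -- the 2N+1 LR frames of one clip
        f_l = self.h_sf(frames)                                # shallow features
        f_st = f_l
        for block in self.h_rstb:
            f_st = f_st + block(f_st)                          # residual blocks, Eq. (2)
        return f_st

lr_frames = torch.rand(7, 3, 64, 64)  # 2N+1 = 7 consecutive LR frames
print(STFE()(lr_frames).shape)         # torch.Size([7, 64, 64, 64])
```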

3.3 Deformable Spatial-Temporal Attention

In deformable spatial-temporal attention, the spatial-temporal information in the features $\{F_i^{ST}\}_{i=1, i \neq j}^{2N}$ of the 2N reference frames is fused into the current-frame spatial-temporal feature $F_j^{ST}$. To achieve deformable spatial-temporal attention between $F_i^{ST}$ and $F_j^{ST}$, a set of sampling points is marked on $F_i^{ST}$ and an offset $\Delta p_\theta$ is obtained through the offset generation network: the offset between $F_j^{ST}$ and each $F_i^{ST}$ is estimated by the optical flow network $H_{spynet}(\cdot)$ [17]. $F_j^{ST}$ and $\{F_i^{ST}\}_{i=1, i \neq j}^{2N}$ are then fed into linear projection layers. Following the conventional attention mechanism [21], $F_j^{ST}$ generates the query vector $Q_j(p)$, and $\{F_i^{ST}\}_{i=1, i \neq j}^{2N}$ generate the key vectors $K_i(p_\theta)$ and value vectors $V_i(p_\theta)$:

$\Delta p_\theta = H_{spynet}(F_j^{ST}, F_i^{ST})$   (3)

$Q_j(p) = F_j^{ST} W_q, \quad K_i(p_\theta) = F_i^{ST} W_k, \quad V_i(p_\theta) = F_i^{ST} W_v$   (4)

where $p$ denotes the position of the query vector and $p_\theta$ the position of a deformable sampling point on $F_i^{ST}$. For ease of exposition, $F_i^{ST}$ and $\{F_i^{ST}\}_{i=1, i \neq j}^{2N}$ are used interchangeably in the formulas. $W_q$, $W_k$ and $W_v$ denote linear projection matrices. To reduce the computational complexity, we compute the inner product between $Q_j(p)$ and $K_i(p_\theta)$ as a correlation value and choose the s points with the largest correlation as the deformable sampling points:

$R_S = Q_j(p) \cdot K_i(p_\theta)$   (5)


Fig. 3. Detailed process of the Deformable Spatial-Temporal Attention (DSTA) module. DSTA is mainly divided into two processes: spatial aggregation across multiple frames and adaptive temporal aggregation.

where $R_S$ denotes the correlation value; the larger $R_S$ is, the more relevant the deformable sampling point. We use the softmax function to compute the spatial weights of the s deformable sampling points and thereby spatially aggregate them. The spatially aggregated embedding feature vector is then obtained from the spatial weights of the deformable sampling points and the key vector $K_i(p_\theta)$ as follows:

$\omega_s = \dfrac{\exp\big(Q_j(p) \cdot K_i(p_\theta)\big)}{\sum_{j=1}^{s} \exp\big(Q_j(p) \cdot K_i(p_\theta)\big)}$   (6)

$K_{i \to j}(p_\theta) = \omega_s \cdot K_i(p_\theta)$   (7)

where $\omega_s$ is the spatial weight of the s deformable sampling points and $K_{i \to j}(p_\theta)$ denotes the spatially aggregated key vector. As with $K_{i \to j}(p_\theta)$, $V_{i \to j}(p_\theta)$ is also spatially aggregated. Then, by multiplying the query vector $Q_j(p)$ with the spatially aggregated $K_{i \to j}(p_\theta)$, we obtain the temporal weight $W_{i \to j}(p)$ for each reference frame and aggregate these temporal weights via the softmax function:

$W_{i \to j}(p) = K_{i \to j}(p_\theta) \cdot Q_j(p)$   (8)

$\omega_t = \dfrac{\exp\big(W_{i \to j}(p)\big)}{\sum_{j=1, j \neq i}^{2N} \exp\big(W_{i \to j}(p)\big)}$   (9)

where $\omega_t$ denotes the aggregated temporal attention weight. The spatially aggregated value vector $V_{i \to j}(p_\theta)$ and the aggregated temporal attention weight $\omega_t$ are fused by weighting:

$\omega_j^{st} = \sum_{j=1, j \neq i}^{2N} \omega_t \cdot V_{i \to j}(p)$   (10)

where $\omega_j^{st}$ denotes the spatial-temporal attention weight.


The obtained spatial-temporal attention weight $\omega_j^{st}$ is multiplied by $F_j^{ST}$ and added to $F_j^{ST}$ to obtain the spatial-temporally enhanced feature $F_j^{ST*}$ of the current frame, which is then fed into the HR reconstruction network. After spatial-temporal enhancement of all feature maps, we utilize sub-pixel layers to reconstruct the HR frames. The input LR frames $\{I_1^L, \ldots, I_t^L, \ldots, I_{2N+1}^L\}$, enlarged by bicubic interpolation, are added to the enlarged feature maps as a global residual connection, yielding the final super-resolution output $\{I_1^H, \ldots, I_t^H, \ldots, I_{2N+1}^H\}$.
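The following is a simplified, self-contained sketch of this module. It follows the top-s selection, spatial softmax aggregation, and temporal softmax aggregation of Eqs. (4)-(10), but omits the SPyNet offsets of Eq. (3) (sampling points are taken directly at reference-frame positions) and uses single-head projections over flattened spatial positions; those simplifications and all sizes are assumptions of this sketch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dsta_sketch(f_j, f_refs, w_q, w_k, w_v, s: int = 18):
    """f_j: (P, C) current-frame feature over P spatial positions.
    f_refs: (R, P, C) features of the 2N reference frames."""
    q = f_j @ w_q                       # (P, C)    queries  Q_j(p), Eq. (4)
    k = f_refs @ w_k                    # (R, P, C) keys     K_i(p_theta)
    v = f_refs @ w_v                    # (R, P, C) values   V_i(p_theta)

    # Eq. (5): correlation between each query and every candidate point,
    # keeping only the top-s most relevant sampling points per query.
    corr = torch.einsum('pc,rqc->rpq', q, k)           # (R, P, P)
    top_corr, top_idx = corr.topk(s, dim=-1)           # (R, P, s)

    # Eqs. (6)-(7): spatial softmax weights, then spatially aggregated keys/values.
    w_s = F.softmax(top_corr, dim=-1)                  # (R, P, s)
    gather_k = top_idx.unsqueeze(-1).expand(-1, -1, -1, k.shape[-1])
    k_sel = torch.gather(k.unsqueeze(1).expand(-1, q.shape[0], -1, -1), 2, gather_k)
    v_sel = torch.gather(v.unsqueeze(1).expand(-1, q.shape[0], -1, -1), 2, gather_k)
    k_agg = (w_s.unsqueeze(-1) * k_sel).sum(dim=2)     # (R, P, C)
    v_agg = (w_s.unsqueeze(-1) * v_sel).sum(dim=2)     # (R, P, C)

    # Eqs. (8)-(9): temporal attention weights over the reference frames.
    w_t = F.softmax((k_agg * q).sum(-1), dim=0)        # (R, P)

    # Eq. (10) plus the residual connection that enhances F_j^ST.
    fused = (w_t.unsqueeze(-1) * v_agg).sum(dim=0)     # (P, C)
    return f_j + fused

C, P, R = 32, 16 * 16, 6                 # channels, positions, reference frames
w_q, w_k, w_v = (torch.rand(C, C) for _ in range(3))
out = dsta_sketch(torch.rand(P, C), torch.rand(R, P, C), w_q, w_k, w_v)
print(out.shape)                          # torch.Size([256, 32])
```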

4 Experiments

4.1 Experimental Settings

We conduct our experiments on the most commonly used VSR dataset, Vimeo-90K [23].

Training Dataset. Vimeo-90K [23] contains 64,612 training samples; each sample contains seven consecutive video frames of the same scene, and the size of each frame is 448×256. We evaluate our model on three commonly used test datasets: Vid4 [15], UDM10 [24], and Vimeo-90K-T [23]. In our experiments, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as metrics.

Network Settings. We train our network on four NVIDIA GeForce GTX 2080Ti GPUs with a mini-batch size of three per GPU. Training requires 300,000 iterations. We use Adam as the optimizer with a cosine learning rate decay policy and an initial learning rate of 4e-4. The input images are augmented by random cropping, flipping, and rotation; the crop size is 64×64, corresponding to an output of 256×256. We use SPyNet [17] for motion estimation to obtain the corresponding offsets. The Charbonnier loss is adopted as our loss function:

$L\big(\hat{I}^H, I^H\big) = \sqrt{\big\| \hat{I}^H - I^H \big\|^2 + \epsilon^2}$

where $\epsilon$ is set to 0.0001 in our experiments, $\hat{I}^H$ is the real HR video frame, and $I^H$ is the HR video frame estimated by our model.
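A minimal sketch of the Charbonnier loss as written above follows; it uses the global-norm form of the equation, while per-pixel averaged variants are also common, and the exact reduction used by the authors is not specified in this excerpt.

```python
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-4) -> torch.Tensor:
    """Charbonnier loss: sqrt(||pred - target||^2 + eps^2), a smooth L1 variant.
    eps follows the 0.0001 used in the experiments."""
    return torch.sqrt(((pred - target) ** 2).sum() + eps ** 2)

# Example: loss between an estimated HR frame and its ground truth.
pred = torch.rand(1, 3, 256, 256)
gt = torch.rand(1, 3, 256, 256)
print(charbonnier_loss(pred, gt))
```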

4.2 Quantitative Comparison

To evaluate the performance, we compare the proposed DSTALVSR with several state-of-the-art VSR methods: DUF [11], RBPN [9], EDVR [22], TDAN [20], BasicVSR [3], and IconVSR [3]. For a fair comparison, we use the same downsampling operation to generate the LR frames for all compared methods. We first analyse the quantitative results on the three test datasets. Table 1 gives the PSNR and SSIM values of the different methods on the Vid4, UDM10 and Vimeo-90K-T datasets. As some of these models are not publicly available, we directly use the results reported in their publications. We observe that the proposed model (DSTALVSR) achieves the best performance on all three test datasets. In particular, its PSNR is 0.28 dB higher than that of the second-best method, IconVSR, on the Vid4 dataset.


Table 1. Quantitative comparison (PSNR/SSIM) on the Vid4 [15], UDM10 [24] and Vimeo-90K-T [23] datasets for 4× video super-resolution. Red indicates the best and blue the second-best performance.

Method          | Vid4         | UDM10        | Vimeo-90K-T  | Parameters (M)
Bicubic         | 21.80/0.5246 | 28.47/0.8253 | 31.30/0.8687 | -
RBPN [9]        | 27.17/0.8205 | 38.66/0.9596 | 37.20/0.9458 | 12.2
DUF [11]        | 27.38/0.8329 | 38.48/0.9605 | 36.87/0.9447 | 5.8
EDVR [22]       | 27.85/0.8503 | 39.89/0.9686 | 37.81/0.9523 | 20.6
TDAN [20]       | 26.86/0.8140 | 38.19/0.9586 | 36.31/0.9376 | 16.2
BasicVSR [3]    | 27.96/0.8553 | 39.96/0.9694 | 37.53/0.9498 | 6.3
IconVSR [3]     | 28.04/0.8570 | 40.03/0.9694 | 37.84/0.9524 | 8.7
DSTALVSR (Ours) | 28.32/0.8637 | 40.18/0.9702 | 38.02/0.9536 | 5.7

Fig. 4. Qualitative results for ×4 VSR on the Vid4 dataset.

DSTALVSR also achieves PSNR gains of 0.15 dB and 0.18 dB on the UDM10 and Vimeo-90K-T datasets, respectively. In terms of model size, DSTALVSR reduces the number of parameters by 34.4% compared to the second-best IconVSR. Our model achieves the best results while being lightweight.

4.3 Qualitative Comparison

The visual comparison of our proposed DSTALVSR with other methods is shown in Fig. 4. In the first row, we can clearly see that the visual results of DSTALVSR are almost identical to the real HR video frames. From the visual comparison in the third row, BasicVSR and IconVSR, the strongest previous methods, still fail to reconstruct linear textures that match the visual characteristics of the scene. In contrast, the architectural images reconstructed by the proposed DSTALVSR show rectangular windows similar to the original images, and their textures appear closer to the real HR video frames.

4.4 Ablation Experiment

In this section, we perform an ablation study of the proposed deformable spatial-temporal attention mechanism and investigate the effects of different numbers of deformable sampling points and of aggregating different numbers of frames in this module.

Deformable Spatial-Temporal Attention Mechanism. We conduct decomposition ablation experiments to investigate the effect of the deformable spatial-temporal attention mechanism (DSTA) and to explore whether the RSTB module can effectively extract the spatial-temporal features of the video. Replacing the RSTB module with plain convolutional layers and removing the DSTA module gives our "Base" model. Putting only the RSTB module back yields the "Base+RSTB" model, putting only the DSTA module back yields the "Base+DSTA" model, and the complete DSTALVSR network is the "Base+RSTB+DSTA" model. The results are shown in Table 2.

Table 2. Ablation experiments of Deformable Spatial-Temporal Attention (DSTA) and the Residual Swin Transformer block (RSTB) on the Vid4 dataset.

Method | Base   | Base+RSTB | Base+DSTA | Base+RSTB+DSTA
PSNR   | 26.37  | 26.93     | 27.58     | 28.32
SSIM   | 0.8013 | 0.8166    | 0.8411    | 0.8637

When adding only DSTA without RSTB, the PSNR improves from 26.37 to 27.58, which indicates that the proposed DSTA module is good at aggregating the spatial-temporal context information of multiple video frames. When RSTB is added for spatial-temporal feature extraction, the PSNR further improves from 27.58 to 28.32, which indicates that RSTB can effectively extract the spatial-temporal features of the video and that DSTA and RSTB complement each other to achieve a strong VSR effect.

Influence of Aggregating Different Frame Numbers During Inference. As shown in Table 3, we use different numbers of video frames for adaptive spatial-temporal information aggregation on the Vid4 dataset (33 frames).


Fig. 5. Ablation study results for different deformable sampling points on the Vid4 dataset.

The performance is positively correlated with the number of aggregated frames, which proves that DSTA can effectively aggregate spatial-temporal information and perform long-term modeling. However, the performance gain gradually diminishes once the number of frames exceeds 20: frames at small temporal intervals provide little additional spatial-temporal information because the information of adjacent frames is too similar.

Table 3. Ablation study results with different frame numbers on the Vid4 [15] dataset.

#Frames | 3      | 7      | 15     | 20     | 33
PSNR    | 28.11  | 28.24  | 28.28  | 28.31  | 28.32
SSIM    | 0.8591 | 0.8619 | 0.8629 | 0.8634 | 0.8637

The Effect of Different Deformable Sampling Points. We use different numbers of deformable sampling points in the video spatial-temporal features for spatial-temporal aggregation. As described above, we use the inner product of the query vector and the key vector as the relevance score of the sampled points. As shown in Fig. 5, we achieve the best performance when collecting 18 deformable feature points from the spatial-temporal features of each reference frame; choosing too many deformable sampling points tends to aggregate irrelevant features and degrades performance. This ablation also shows that DSTALVSR requires repeated manual tuning of the number of deformable sampling points to achieve the best spatial-temporal aggregation, which is a limitation of our method. Enabling the model to select the number of deformable sampling points adaptively would require retraining and is left for future work.

5 Conclusion

In this paper, we propose a deformable spatial-temporal attention mechanism for video super-resolution. Different from previous methods, the proposed model extracts the spatial-temporal features of the video with a low-complexity Transformer structure (RSTB). Through the proposed deformable spatial-temporal attention mechanism, the spatial-temporal information of multiple frames is aggregated into a single frame to achieve high-quality super-resolution reconstruction. Experimental results on three datasets show that our model achieves better performance while remaining lightweight.

References

1. Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M., Schmid, C.: ViViT: a video vision transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6836–6846 (2021)
2. Caballero, J., et al.: Real-time video super-resolution with spatio-temporal networks and motion compensation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4778–4787 (2017)
3. Chan, K.C., Wang, X., Yu, K., Dong, C., Loy, C.C.: BasicVSR: the search for essential components in video super-resolution and beyond. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4947–4956 (2021)
4. Chan, K.C., Wang, X., Yu, K., Dong, C., Loy, C.C.: Understanding deformable alignment in video super-resolution. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 973–981 (2021)
5. Chen, J., Tan, X., Shan, C., Liu, S., Chen, Z.: VESR-Net: the winning solution to Youku video enhancement and super-resolution challenge. arXiv e-prints (2020)
6. Dai, J., et al.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)
7. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015)
8. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1664–1673 (2018)
9. Haris, M., Shakhnarovich, G., Ukita, N.: Recurrent back-projection network for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3897–3906 (2019)
10. Isobe, T., Jia, X., Gu, S., Li, S., Wang, S., Tian, Q.: Video super-resolution with recurrent structure-detail network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 645–660. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_38
11. Jo, Y., Oh, S.W., Kang, J., Kim, S.J.: Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3224–3232 (2018)


12. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654 (2016)
13. Li, Z., Yang, J., Liu, Z., Yang, X., Jeon, G., Wu, W.: Feedback network for image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3867–3876 (2019)
14. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: image restoration using Swin transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844 (2021)
15. Liu, C., Sun, D.: On Bayesian adaptive video super resolution. IEEE Trans. Pattern Anal. Mach. Intell. 36(2), 346–360 (2013)
16. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
17. Ranjan, A., Black, M.J.: Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4161–4170 (2017)
18. Sun, Y., Chen, J., Liu, Q., Liu, G.: Learning image compressed sensing with sub-pixel convolutional generative adversarial network. Pattern Recogn. 98, 107051 (2020)
19. Tao, X., Gao, H., Liao, R., Wang, J., Jia, J.: Detail-revealing deep video super-resolution. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4472–4480 (2017)
20. Tian, Y., Zhang, Y., Fu, Y., Xu, C.: TDAN: temporally-deformable alignment network for video super-resolution. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3357–3366 (2020)
21. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
22. Wang, X., Chan, K.C., Yu, K., Dong, C., Change Loy, C.: EDVR: video restoration with enhanced deformable convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
23. Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. Int. J. Comput. Vision 127, 1106–1125 (2019)
24. Yi, P., Wang, Z., Jiang, K., Jiang, J., Ma, J.: Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3106–3115 (2019)
25. Ying, X., Wang, L., Wang, Y., Sheng, W., An, W., Guo, Y.: Deformable 3D convolution for video super-resolution. IEEE Signal Process. Lett. 27, 1500–1504 (2020)
26. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481 (2018)
27. Zhou, Y., Wu, G., Fu, Y., Li, K., Liu, Y.: Cross-MPI: cross-scale stereo for image super-resolution using multiplane images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14842–14851 (2021)

Author Index

B Bai, Yingduo 149 Bai, Yun 261 Bao, Wei 298 Bao, Youneng 53

G Gao, Junxiu 3 Gao, Ya 457 Gong, Chen 249 Gong, Shengrong 432, 444 Guo, Xiong 371

C Cao, Liujuan 101 Cao, Yuchen 383 Cao, Zhe 334 Cen, Jinglun 395 Chang, Shuhan 444 Chao, Fei 65 Chen, Bintao 225 Chen, Jiaxin 199 Chen, Panfeng 432 Chen, Yuanqi 113 Chen, Zhicong 334 Cheng, Jing 15 Cheng, Juan 285 Cheng, Lidong 174 Cheng, Shuai 137 Chu, Xueye 77

H Hao, Junzhao 15 He, Hui 40 He, Yingjie 212 Hu, Haoji 383 Hu, Jun 137 Hu, Qintai 212 Hu, Yuming 89 Hua, Haoyu 311 Huang, Anfeng 40 Huang, Meiyu 298 Huang, Xinyi 186, 482 Huang, Yue 77, 174

D Deng, Feng 298 Ding, Ning 249 Ding, Xinghao 77, 89, 174 Du, Xin 432

F Fan, Xin 420 Fang, Xiaozhao 212 Fu, Chenhao 149 Fu, Hao 420 Fu, Talas 28 Fu, Zhenqi 77

J Ji, Genlin 15 Ji, Lei 407 Ji, Nan 298 Jia, Xibin 371 Jiang, Na 149 Jiang, Xuesong 237 Jin, Ke 323 Jin, Shan 3 Jin, Zhong 249 K Kang, Peipei 212 Kuang, Zhenyu 174 L Li, Annan 273 Li, Bocen 125



Li, Chao 53 Li, Dengshi 186, 482 Li, Ge 113 Li, Guangyu 249 Li, Guanlin 261 Li, Jun 249 Li, Thomas H. 113 Li, Xunchao 65 Li, Yachuan 261 Li, Yongming 137 Li, Zhaoguo 237 Li, Zhaojia 149 Li, Zhiyang 346 Li, Ziwen 469 Li, Zongmin 261 Liang, Xiaoyu 383 Liang, Yongsheng 53 Lin, Xiaopeng 77 Liu, Jiaxin 137 Liu, Jinyuan 420 Liu, Risheng 420 Liu, Shenglan 28 Liu, Wei 137 Liu, Xiangchun 457 Liu, Xingchen 346 Liu, Yu 285 Lu, Fengqing 311 Lu, Xiaoli 311 Lu, Yang 161 Luo, Can 53 M Ma, Guangzhe 346 Ma, Haojiang 358 Ma, Long 420 Meng, Fanyang 53 Min, Xiaoping 311 N Ni, Tian

383

O Ou, Haochun

395

P Peng, Guozhen 273 Poma, Xavier Soria 261

Q Qing, Chunmei 395 S Song, Ruihua 407 Song, Wei 457 Su, Li 149 Sun, Tao 28 Sun, Yuchong 407 T Tai, Yu 40 Tan, Chenwei 28 Tan, Wen 53 W Wan, Zhaolin 346 Wang, Anhong 225 Wang, Hanzi 161 Wang, Liyang 149 Wang, Luo 371 Wang, Mingkang 3 Wang, Qianrui 186 Wang, Ranran 3 Wang, Tong 3 Wang, Xiaoshan 346 Wang, Xihua 407 Wang, Yifan 125 Wang, Yuehuan 469 Wang, Yuhan 28 Wang, Yunhong 199, 273 Wei, Xiumei 237 Wen, Jing 358 Wu, Lijun 334 Wu, Xinglong 40 Wu, Zimeng 199 X Xiang, Xueshuang 298 Xiao, Qian 261 Xiao, Zhenlong 89 Xie, Yixian 161 Xu, Hongming 3 Xu, Minjie 28 Xu, Xiangmin 395 Xu, Zeyu 371 Xue, Tong 186, 482


Y Yan, Kun 407 Yang, Chaozhi 261 Yang, Hongwei 40 Yang, Jie 358 Yang, Qinkai 15 Yang, Wankou 323 Yang, Wenxing 137 Yang, Zhiyu 101 Ying, Wenhao 432 Yu, Songlin 101 Z Zhang, Dazong 137 Zhang, Hongyang 174 Zhang, Jialong 225 Zhang, Jinjing 225


Zhang, Jinpu 469 Zhang, Shengchuan 101 Zhang, Songsong 358 Zhang, Weizhe 40 Zhang, Wenbo 125 Zhao, Lijun 225 Zhao, Xiaobing 457 Zhao, Yuwei 273 Zheng, Guoqing 77, 89 Zheng, Kecheng 285 Zheng, Yangkun 311 Zheng, Yiming 371 Zhong, Shan 432, 444 Zhou, Lifan 444 Zhou, Xuanyu 444 Zhu, Lei 113 Zou, Jiajun 311