Pattern Recognition and Computer Vision: 6th Chinese Conference, PRCV 2023, Xiamen, China, October 13–15, 2023, Proceedings, Part XII (Lecture Notes in Computer Science) 9819985544, 9789819985548

The 13-volume set LNCS 14425-14437 constitutes the refereed proceedings of the 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2023).


English, 540 [535] pages, 2024


Table of contents:
Preface
Organization
Contents – Part XII
Object Detection, Tracking and Identification
OKGR: Occluded Keypoint Generation and Refinement for 3D Object Detection
1 Introduction
2 Related Works
2.1 LiDAR-Based 3D Object Detection
2.2 Object Shape Completion
3 Methodology
3.1 Overview
3.2 Occluded Keypoint Generation
3.3 Occluded Keypoint Refinement
3.4 Loss Function
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Evaluation on KITTI Dataset
4.4 Evaluation on Waymo Open Dataset
4.5 Model Efficiency
4.6 Ablation Studies
5 Conclusion
References
Camouflaged Object Segmentation Based on Fractional Edge Perception
1 Introduction
2 Related Work
3 Interactive Task Learning Network
3.1 Integral and Fractional Edge
3.2 Camouflaged Edge Detection Module
4 Performance Evaluation
4.1 Datasets and Experiment Settings
4.2 Quantitative Evaluation
4.3 Qualitative Evaluation
4.4 Generalization of Edge Detection
5 Conclusion
References
DecTrans: Person Re-identification with Multifaceted Part Features via Decomposed Transformer
1 Introduction
2 Related Work
3 Methodology
3.1 Vision Transformer as Feature Extractor
3.2 Token Decomposition (TD) Layer
3.3 Data Augmentation for TD Layer
3.4 Training and Inference
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Comparisons to State-of-the-arts
4.4 Ablation Study
5 Conclusion
References
AHT: A Novel Aggregation Hyper-transformer for Few-Shot Object Detection
1 Introduction
2 Related Work
2.1 Object Detection
2.2 Hypernetworks
3 Method
3.1 Preliminaries
3.2 Overview
3.3 Dynamic Aggregation Module
3.4 Conditional Adaptation Hypernetworks
3.5 The Classification-Regression Detection Head
4 Experiments
4.1 Experimental Setting
4.2 Comparison Results
4.3 Ablation Study
4.4 Visualization of Our Module
5 Conclusion
References
Feature Refinement from Multiple Perspectives for High Performance Salient Object Detection
1 Introduction
2 Proposed Method
2.1 Overall Architecture
2.2 Attention-Guided Bi-directional Feature Refinement Module
2.3 Serial Atrous Fusion Module
2.4 Upsampling Feature Refinement Module
2.5 Objective Function
3 Experiments
3.1 Experimental Setup
3.2 Comparison with State-of-the-Art Methods
3.3 Ablation Study
4 Conclusion
References
Feature Disentanglement and Adaptive Fusion for Improving Multi-modal Tracking
1 Introduction
2 Related Work
2.1 Multi-modal Tracking
2.2 Transformers Tracking
3 Methodology
3.1 Preliminary
3.2 Our Approach
3.3 Training and Inference
4 Experiments
4.1 Implementation Details
4.2 Comparison with State-of-the-Arts Multi-modal Trackers
4.3 Ablation Study
5 Conclusion
References
Modality Balancing Mechanism for RGB-Infrared Object Detection in Aerial Image
1 Introduction
2 Related Work
2.1 Object Detection in Aerial Images
2.2 RGB-Infrared Object Detection
3 Method
3.1 Overview
3.2 Modality Balancing Mechanism
3.3 Multimodal Feature Hybrid Sampling Module
4 Experiment
4.1 Settings
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Study
5 Conclusion
References
Pacific Oyster Gonad Identification and Grayscale Calculation Based on Unapparent Object Detection
1 Introduction
2 Method
2.1 Compact Pyramid Refinement Module (CPRM)
2.2 Switchable Excitation Model (SEM)
3 Experiments and Analysis of Results
3.1 Establishment of the Datasets
3.2 Experimental Environment and Evaluation Index
3.3 Ablation Experiments
3.4 Comparative Experiments and Analysis of Results
3.5 Visualization Results
3.6 Gray Value Calculation
4 Conclusion
References
Multi-task Self-supervised Few-Shot Detection
1 Introduction
2 Related Work
2.1 Self-supervised Learning
2.2 Few-Shot Object Detection
3 Methodology
3.1 Problem Setting
3.2 Self-supervised Auxiliary Branch
3.3 Multi-Task Learning
4 Experiments
4.1 Implementation Details
4.2 Few-Shot Object Detection Benchmarks
4.3 Ablation Analysis
4.4 Visualization
5 Conclusion
References
CSTrack: A Comprehensive and Concise Vision Transformer Tracker
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 CSBlock
3.3 Prediction Head and Loss
4 Experiment
4.1 Implementation Details
4.2 Comparisons with the State-of-the-Art Trackers
4.3 Ablation Study
4.4 Visualization of Attention Maps
4.5 Visualization of Tracking Performance
5 Conclusion
References
Feature Implicit Enhancement via Super-Resolution for Small Object Detection
1 Introduction
2 Related Works
2.1 General Object Detection
2.2 Small Object Detection Based on Super-Resolution
3 Methods
3.1 Overall Architecture
3.2 Training
4 Experiments and Details
4.1 Dataset and Details
4.2 Ablation Study
4.3 Main Results
5 Conclusion
References
Improved Detection Method for SODL-YOLOv7 Intensive Juvenile Abalone
1 Introduction
2 Methods
2.1 SODL Small Target Detection Network
2.2 ACBAM Attention Module
3 Experimental Results and Analysis
3.1 Experimental Data Preprocessing
3.2 Experimental Environment and Evaluation Index
3.3 Experimental Results and Analysis
4 Conclusion
References
MVP-SEG: Multi-view Prompt Learning for Open-Vocabulary Semantic Segmentation
1 Introduction
2 Related Work
2.1 Vision-Language Models
2.2 Zero-Shot Segmentation
2.3 Prompt Learning
3 Method
3.1 Problem Definition
3.2 MVP-SEG
3.3 MVP-SEG+
4 Experiments
4.1 Datasets
4.2 Evaluation Metrics
4.3 Implementation Details
4.4 Ablation Studies on MVP-SEG
4.5 Comparison with State-of-the-Art
5 Conclusion
References
Context-FPN and Memory Contrastive Learning for Partially Supervised Instance Segmentation
1 Introduction
2 Related Work
3 CCMask
3.1 Overview
3.2 Context-FPN
3.3 Memory Contrastive Learning Head
3.4 Loss Function
4 Experiments
4.1 Experimental Setup
4.2 Experimental Results
4.3 Ablation Study
5 Conclusion
References
A Dynamic Tracking Framework Based on Scene Perception
1 Introduction
2 Related Work
3 Method
3.1 Easy-Hard Dual-Branch Network
3.2 Scene Router
4 Experiments
4.1 Implementation Details
4.2 Comparison with State-of-the-arts
4.3 Ablation Study and Analysis
5 Conclusion
References
HPAN: A Hybrid Pose Attention Network for Person Re-Identification
1 Introduction
2 The Proposed Method
2.1 Local Key Point Features
2.2 Self-Attention
2.3 Hybrid Pose and Global Feature Fusion (HPGFF)
2.4 Loss Function
2.5 Training Strategy
3 Experiments
3.1 Datasets and Evaluation Metrics
3.2 Comparison with SOTA Methods
3.3 Ablation Studies
3.4 Visualization of Attention Maps
4 Conclusion
References
SpectralTracker: Jointly High and Low-Frequency Modeling for Tracking
1 Introduction
2 Related Work
2.1 Visual Tracking
2.2 Frequency Modeling in Visual Transformer
3 Method
3.1 Dual-Spectral Module
3.2 Dual-Spectral for Tracking
3.3 Prediction Head and Total Loss
4 Experiments
4.1 Implementation Details
4.2 State-of-the-Art Comparison
4.3 Ablation Studies
5 Conclusion
References
DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking
1 Introduction
2 Related Works
2.1 Visual Tracking Based on Siamese Network
2.2 Diffusion Model
3 Method
3.1 Architecture
3.2 Training Process
3.3 Inference Process
4 Experiments
4.1 Implementation Details
4.2 Ablation Study
4.3 General Datasets Evaluation
4.4 Attributes Evaluation
4.5 Compatibility Experiment
5 Conclusion
References
Instance-Proxy Loss for Semi-supervised Learning with Coarse Labels
1 Introduction
2 Related Work
3 Method
3.1 Instance-Level Loss
3.2 Proxy-Level Loss
3.3 Instance-Proxy Loss
4 Experiments
4.1 Comparison to SOTA Methods
4.2 Ablation Study
5 Conclusion
References
FAFVTC: A Real-Time Network for Vehicle Tracking and Counting
1 Introduction
2 Related Work
3 Method
3.1 Backbone Network
3.2 Multi-spectral Channel and Spatial Attention (MCSA)
3.3 Data Association
3.4 Vehicle Counting
4 Experiments
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Comparison Experiments
4.4 Ablation Study
5 Conclusion
References
Ped-Mix: Mix Pedestrians for Occluded Person Re-identification
1 Introduction
2 Related Works
2.1 Occluded Person Re-identification
2.2 Data Augmentation and Training Loss
3 Proposed Method
3.1 Ped-Mix
3.2 Non-target Suppression Loss
3.3 Training Procedure
4 Experiment
4.1 Datasets and Evaluation Measures
4.2 Implementation Details
4.3 Ablation Studies
4.4 Comparison with State-of-the-Art Methods
4.5 Visualization
4.6 Why Random Masking
4.7 Results on Holistic Datasets
5 Conclusion
References
Object-Aware Transfer-Based Black-Box Adversarial Attack on Object Detector
1 Introduction
2 Related Work
2.1 Object Detection
2.2 Adversarial Attack
3 Method
3.1 Object-Aware Mechanism
3.2 Assigners and Samplers of Detectors for Attack
3.3 Loss Functions of Detectors for Attack
4 Experiments
4.1 Implementation Details
4.2 Main Results
4.3 Ablation Study
5 Conclusion
References
HTNet: A Hybrid Model Boosted by Triple Self-attention for Crowd Counting
1 Introduction
2 Related Work
2.1 CNN in Crowd Counting
2.2 Self Attention in Crowd Counting
3 Proposed Method
3.1 CNN-Based Backbone
3.2 Global Self-attention Module
3.3 Local and Channel Self-attention Module
3.4 Double Head Module
3.5 Loss Function
4 Experiments
4.1 Ground Truth Generation
4.2 Implementation Details
4.3 Evaluations and Comparisons
4.4 Ablation Studies
5 Conclusion
References
Reliable Boundary Samples-Based Proxy Pairs for Unsupervised Person Re-identification
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Local-Global Consistency Guided Label Refinement (LGCLR)
3.3 Reliable Boundary Samples Based Proxy Pairs (RBSPP)
4 Experiments
4.1 Datasets and Evaluation Protocols
4.2 Implementation Details
4.3 Comparison with State-of-the-Arts
4.4 Ablation Study
4.5 Clustering Quality
5 Conclusion
References
High-Resolution Feature Representation Driven Infrared Small-Dim Object Detection
1 Introduction
2 Methods
2.1 High-Resolution Feature Representation (HRFR)
2.2 Infrared Small-Dim Object Detection (ISDOD)
2.3 Spatial-Frequency Interaction Feature Enhancement (SFIFE)
2.4 Loss Function
3 Experiments
3.1 Baseline Methods
3.2 Datasets and Implementation Details
3.3 Comparison with State-of-the-art Methods
3.4 Ablation Study
4 Conclusion
References
Few-Shot Object Detection Algorithm Based on Adaptive Relation Distillation
1 Introduction
2 Related Work
3 Our Approach
4 Experiment
4.1 Dataset
4.2 Implementation Details
4.3 Comparisons with State-of-the-Art Methods
4.4 Comparisons with DCNet
4.5 Ablation Study
5 Conclusion
References
A Real-Time Safety Detector Based on Re-parameterization Multiscale Feature Fusion for Forklift Driving
1 Introduction
2 Related Works
3 Proposed Methods
3.1 FasterNeXt
3.2 RepGhost
3.3 WIoU
3.4 RMFFDet
4 Experiments
4.1 Datasets
4.2 Experimental Results
4.3 Comparison of Detection Performance
4.4 Ablation Experiment
4.5 Edge Platform Deployments
5 Conclusion
References
RTMDet-R2: An Improved Real-Time Rotated Object Detector
1 Introduction
2 Related Work
2.1 Rotated Object Detector
2.2 Multi-level Features
2.3 Label Assignment
3 Methodology
3.1 Enhanced Path PAFPN
3.2 Task Interaction Decuple Head
3.3 ProbIoU-aware Dynamic Label Assignment
4 Experiments
4.1 Datasets
4.2 Implement Details
4.3 Ablation Studies
4.4 Comparison with State-of-the-Arts
5 Conclusion
References
Boosting Object Detection in Foggy Scenes via Dark Channel Map and Union Training Strategy
1 Introduction
2 Related Work
2.1 Object Detection
2.2 Object Detection in Adverse Scenarios
2.3 Attention Mechanism and Dark Channel Prior
3 Methodology
3.1 Overview of DG-Net
3.2 Dark Channel Map-Guided Feature Fusion Module
3.3 Focal Loss and Self-calibrated Convolutions
3.4 Union Training Strategy
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison with the SOTAs
4.4 Ablation Study
4.5 Efficiency Analysis
5 Conclusion
References
Object Centric Body Part Attention Network for Human-Object Interaction Detection
1 Introduction
2 Related Work
2.1 Human-Object Interaction.
2.2 Part Based Interaction Detection
3 Method
3.1 Overview
3.2 Object Centric Decoder
3.3 Interaction Decoder
3.4 Training and Inference
4 Experiments
4.1 Experimental Settings
4.2 Implementation Details
5 Results
5.1 Comparison to State-of-the-Art
5.2 Ablation Study
6 Conclusion
References
Salient Feature Enhanced Multi-object Tracking with Soft-Sparse Attention in Transformer
1 Introduction
2 Related Works
2.1 Feature Enhancement in Tracking
2.2 Transformers in Tracking
2.3 Attentions Applied in Transformer
3 Method
3.1 Overall Architecture
3.2 Salient Feature Enhancement Module
3.3 Encoder
3.4 Decoder with Soft-Sparse Attention
3.5 Loss Function
4 Experiments
4.1 MOT Benchmarks and Metrics
4.2 Implementation Details
4.3 Benchmark Results
4.4 Ablation Study
4.5 Case Study
5 Conclusion
References
A BiGRU Based Adaptive Gain Estimation for Radar Multi-target Tracking
1 Introduction
2 The MTT-AGE Framework
2.1 Prediction Process
2.2 Data Association
2.3 Survival Likelihood Estimation
2.4 AGE Estimation
3 Experiment
3.1 Training
3.2 Tracking on MIT Trajectory Data
3.3 Tracking on Simulated Scenarios
3.4 Tracking on Real Dataset
3.5 The Verity of AGE Module
4 Conclusion
References
Prompt Based Lifelong Person Re-identification
1 Introduction
2 Related Work
2.1 Person ReID
2.2 Lifelong Learning
2.3 Knowledge Distillation
2.4 Prompt Learning
3 Method
3.1 Problem Definition and Formulation
3.2 Baseline Approach
3.3 Distillation for Vision Transformer
3.4 Prompt Learning
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Performance Evaluation
4.4 Ablation Studies
4.5 Discussion
5 Conclusion
References
Hierarchical Focused Feature Pyramid Network for Small Object Detection
1 Introduction
2 Related Work
2.1 Small Object Detection
2.2 Feature Pyramid Network
2.3 Self-Attention
3 Methodology
3.1 Overview
3.2 Hierarchical Feature Subtraction Module
3.3 Feature Fusion Guidance Attention
4 Experiments
4.1 Datasets
4.2 Experiment Settings
4.3 Comparison Results
4.4 Ablation Study
5 Conclusion
References
JLInst: Boundary-Mask Joint Learning for Instance Segmentation
1 Introduction
2 Related Work
3 Methods
3.1 Overview
3.2 Boundary-Mask Joint Learning Branch
3.3 Learning and Optimization
4 Experiments
4.1 Dataset and Evaluation Metrics
4.2 Experimental Settings
4.3 Comparisons with State-of-the-Art Method
4.4 Ablation Studies
5 Conclusion
References
Boosting One-Stage Multi Object Tracking with Attention Learning
1 Introduction
2 Related Work
3 Methodology
3.1 Discriminability Enhancement Module
3.2 Identity Preserving Module
4 Experiments
4.1 Implementation Details
4.2 Ablation Study
4.3 Comparing with SOTA MOT Methods
5 Conclusion
References
TPNet: Enhancing Weakly Supervised Polyp Frame Detection with Temporal Encoder and Prototype-Based Memory Bank
1 Introduction
2 Method
2.1 Overall Architecture
2.2 Temporal Encoder
2.3 Prototype-Based Memory Bank
2.4 Loss Function
3 Experiment
3.1 Experiment Settings
3.2 Comparison with Other Methods
3.3 Effectiveness of Each Component
3.4 Influence of the Number of Memory Blocks
4 Conclusion
References
Learning Frequency-Based Disentanglement and Filtering for Generalizable Person Re-identification
1 Introduction
2 Methods
2.1 Overview
2.2 Bilateral Frequency Component-Guided Attention
2.3 Fourier Noise Masquerade Filtering
2.4 Loss Function
3 Experiments
3.1 Datasets and Evaluation Protocols
3.2 Implementation Details
3.3 Comparison with State-of-the-art Methods
3.4 Ablation Studies
4 Conclusion
References
Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking with Multimodal ReID
1 Introduction
2 Related Work
2.1 3D MOT
2.2 ReID Model
3 Method
3.1 3D ReID Network Model
3.2 Stereo3DMOT Tracking Method
4 Experiments
4.1 Experimental Results
4.2 Ablation Study
5 Conclusion
References
Emphasizing Boundary-Positioning and Leveraging Multi-scale Feature Fusion for Camouflaged Object Detection
1 Introduction
2 Our Method
2.1 Enhanced Feature Module
2.2 Boundary and Positioning Joint-Guided Feature Fusion Module
2.3 Overall Loss Function
3 Experiments and Results
3.1 Implementation Details
3.2 Datasets and Evaluation Metrics
3.3 Comparison with State-of-the-Art Methods
3.4 Ablation Study
4 Conclusion
References
Author Index

LNCS 14436

Qingshan Liu · Hanzi Wang · Zhanyu Ma · Weishi Zheng · Hongbin Zha · Xilin Chen · Liang Wang · Rongrong Ji (Eds.)

Pattern Recognition and Computer Vision 6th Chinese Conference, PRCV 2023 Xiamen, China, October 13–15, 2023 Proceedings, Part XII

Lecture Notes in Computer Science

Founding Editors: Gerhard Goos, Juris Hartmanis

Editorial Board Members: Elisa Bertino (Purdue University, West Lafayette, IN, USA), Wen Gao (Peking University, Beijing, China), Bernhard Steffen (TU Dortmund University, Dortmund, Germany), Moti Yung (Columbia University, New York, NY, USA)

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R & D community, the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.

Qingshan Liu · Hanzi Wang · Zhanyu Ma · Weishi Zheng · Hongbin Zha · Xilin Chen · Liang Wang · Rongrong Ji Editors

Pattern Recognition and Computer Vision 6th Chinese Conference, PRCV 2023 Xiamen, China, October 13–15, 2023 Proceedings, Part XII

Editors:
Qingshan Liu, Nanjing University of Information Science and Technology, Nanjing, China
Hanzi Wang, Xiamen University, Xiamen, China
Zhanyu Ma, Beijing University of Posts and Telecommunications, Beijing, China
Weishi Zheng, Sun Yat-sen University, Guangzhou, China
Hongbin Zha, Peking University, Beijing, China
Xilin Chen, Chinese Academy of Sciences, Beijing, China
Liang Wang, Chinese Academy of Sciences, Beijing, China
Rongrong Ji, Xiamen University, Xiamen, China

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-981-99-8554-8 ISBN 978-981-99-8555-5 (eBook) https://doi.org/10.1007/978-981-99-8555-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.

Preface

Welcome to the proceedings of the Sixth Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2023), held in Xiamen, China. PRCV is formed from the combination of two distinguished conferences: CCPR (Chinese Conference on Pattern Recognition) and CCCV (Chinese Conference on Computer Vision). Both have consistently been the top-tier conference in the fields of pattern recognition and computer vision within China’s academic field. Recognizing the intertwined nature of these disciplines and their overlapping communities, the union into PRCV aims to reinforce the prominence of the Chinese academic sector in these foundational areas of artificial intelligence and enhance academic exchanges. Accordingly, PRCV is jointly sponsored by China’s leading academic institutions: the Chinese Association for Artificial Intelligence (CAAI), the China Computer Federation (CCF), the Chinese Association of Automation (CAA), and the China Society of Image and Graphics (CSIG). PRCV’s mission is to serve as a comprehensive platform for dialogues among researchers from both academia and industry. While its primary focus is to encourage academic exchange, it also places emphasis on fostering ties between academia and industry. With the objective of keeping abreast of leading academic innovations and showcasing the most recent research breakthroughs, pioneering thoughts, and advanced techniques in pattern recognition and computer vision, esteemed international and domestic experts have been invited to present keynote speeches, introducing the most recent developments in these fields. PRCV 2023 was hosted by Xiamen University. From our call for papers, we received 1420 full submissions. Each paper underwent rigorous reviews by at least three experts, either from our dedicated Program Committee or from other qualified researchers in the field. After thorough evaluations, 522 papers were selected for the conference, comprising 32 oral presentations and 490 posters, giving an acceptance rate of 37.46%. The proceedings of PRCV 2023 are proudly published by Springer. Our heartfelt gratitude goes out to our keynote speakers: Zongben Xu from Xi’an Jiaotong University, Yanning Zhang of Northwestern Polytechnical University, Shutao Li of Hunan University, Shi-Min Hu of Tsinghua University, and Tiejun Huang from Peking University. We give sincere appreciation to all the authors of submitted papers, the members of the Program Committee, the reviewers, and the Organizing Committee. Their combined efforts have been instrumental in the success of this conference. A special acknowledgment goes to our sponsors and the organizers of various special forums; their support made the conference a success. We also express our thanks to Springer for taking on the publication and to the staff of Springer Asia for their meticulous coordination efforts.


We hope these proceedings will be both enlightening and enjoyable for all readers. October 2023

Qingshan Liu Hanzi Wang Zhanyu Ma Weishi Zheng Hongbin Zha Xilin Chen Liang Wang Rongrong Ji

Organization

General Chairs
Hongbin Zha, Peking University, China
Xilin Chen, Institute of Computing Technology, Chinese Academy of Sciences, China
Liang Wang, Institute of Automation, Chinese Academy of Sciences, China
Rongrong Ji, Xiamen University, China

Program Chairs
Qingshan Liu, Nanjing University of Information Science and Technology, China
Hanzi Wang, Xiamen University, China
Zhanyu Ma, Beijing University of Posts and Telecommunications, China
Weishi Zheng, Sun Yat-sen University, China

Organizing Committee Chairs
Mingming Cheng, Nankai University, China
Cheng Wang, Xiamen University, China
Yue Gao, Tsinghua University, China
Mingliang Xu, Zhengzhou University, China
Liujuan Cao, Xiamen University, China

Publicity Chairs
Yanyun Qu, Xiamen University, China
Wei Jia, Hefei University of Technology, China


Local Arrangement Chairs
Xiaoshuai Sun, Xiamen University, China
Yan Yan, Xiamen University, China
Longbiao Chen, Xiamen University, China

International Liaison Chairs
Jingyi Yu, ShanghaiTech University, China
Jiwen Lu, Tsinghua University, China

Tutorial Chairs
Xi Li, Zhejiang University, China
Wangmeng Zuo, Harbin Institute of Technology, China
Jie Chen, Peking University, China

Thematic Forum Chairs
Xiaopeng Hong, Harbin Institute of Technology, China
Zhaoxiang Zhang, Institute of Automation, Chinese Academy of Sciences, China
Xinghao Ding, Xiamen University, China

Doctoral Forum Chairs
Shengping Zhang, Harbin Institute of Technology, China
Zhou Zhao, Zhejiang University, China

Publication Chair
Chenglu Wen, Xiamen University, China

Sponsorship Chair
Yiyi Zhou, Xiamen University, China


Exhibition Chairs
Bineng Zhong, Guangxi Normal University, China
Rushi Lan, Guilin University of Electronic Technology, China
Zhiming Luo, Xiamen University, China

Program Committee
Baiying Lei, Shenzhen University, China
Changxin Gao, Huazhong University of Science and Technology, China
Chen Gong, Nanjing University of Science and Technology, China
Chuanxian Ren, Sun Yat-Sen University, China
Dong Liu, University of Science and Technology of China, China
Dong Wang, Dalian University of Technology, China
Haimiao Hu, Beihang University, China
Hang Su, Tsinghua University, China
Hui Yuan, School of Control Science and Engineering, Shandong University, China
Jie Qin, Nanjing University of Aeronautics and Astronautics, China
Jufeng Yang, Nankai University, China
Lifang Wu, Beijing University of Technology, China
Linlin Shen, Shenzhen University, China
Nannan Wang, Xidian University, China
Qianqian Xu, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China
Quan Zhou, Nanjing University of Posts and Telecommunications, China
Si Liu, Beihang University, China
Xi Li, Zhejiang University, China
Xiaojun Wu, Jiangnan University, China
Zhenyu He, Harbin Institute of Technology (Shenzhen), China
Zhonghong Ou, Beijing University of Posts and Telecommunications, China

Contents – Part XII

Object Detection, Tracking and Identification

OKGR: Occluded Keypoint Generation and Refinement for 3D Object Detection
  Mingqian Ji, Jian Yang, and Shanshan Zhang (p. 3)

Camouflaged Object Segmentation Based on Fractional Edge Perception
  Xia Yuan, Junjie Cui, Zhengyu Liu, Shuting Yang, and Xuejian Zhang (p. 16)

DecTrans: Person Re-identification with Multifaceted Part Features via Decomposed Transformer
  Yan Zhang, Guangyu Gao, Qianxiang Wang, and Jing Ge (p. 29)

AHT: A Novel Aggregation Hyper-transformer for Few-Shot Object Detection
  Lanqing Lai, Yale Yu, Wei Suo, and Peng Wang (p. 43)

Feature Refinement from Multiple Perspectives for High Performance Salient Object Detection
  Xuan Li, Congao Wang, Ding Ma, and Xiangqian Wu (p. 56)

Feature Disentanglement and Adaptive Fusion for Improving Multi-modal Tracking
  Zheng Li, Weibo Cai, Junhao Dong, Jianhuang Lai, and Xiaohua Xie (p. 68)

Modality Balancing Mechanism for RGB-Infrared Object Detection in Aerial Image
  Weibo Cai, Zheng Li, Junhao Dong, Jianhuang Lai, and Xiaohua Xie (p. 81)

Pacific Oyster Gonad Identification and Grayscale Calculation Based on Unapparent Object Detection
  Yifei Chen, Jun Yue, Zhenbo Li, Jianmin Yang, and Weijun Wang (p. 94)

Multi-task Self-supervised Few-Shot Detection
  Guangyong Zhang, Lijuan Duan, Wenjian Wang, Zhi Gong, and Bian Ma (p. 107)

CSTrack: A Comprehensive and Concise Vision Transformer Tracker
  Yao Chen, Shuyan Ding, Jianhui Guo, Chen Yang, and Lunbo Li (p. 120)


Feature Implicit Enhancement via Super-Resolution for Small Object Detection
  Zhehao Xu, Mengyin Liu, Chao Zhu, Fang Zhou, and Xu-Cheng Yin (p. 133)

Improved Detection Method for SODL-YOLOv7 Intensive Juvenile Abalone
  Chengying Liu, Jun Yue, Guangjie Kou, Zhanming Zou, Zhenbo Li, and Changyi Dai (p. 146)

MVP-SEG: Multi-view Prompt Learning for Open-Vocabulary Semantic Segmentation
  Jie Guo, Qimeng Wang, Yan Gao, Xiaolong Jiang, Shaohui Lin, and Baochang Zhang (p. 158)

Context-FPN and Memory Contrastive Learning for Partially Supervised Instance Segmentation
  Zheng Yuan, Weiling Cai, and Chen Zhao (p. 172)

A Dynamic Tracking Framework Based on Scene Perception
  Jinpu Zhang, Ziwen Li, and Yuehuan Wang (p. 185)

HPAN: A Hybrid Pose Attention Network for Person Re-Identification
  Ruohong Huan, Tianya Chen, Ziwei Zhan, Peng Chen, and Ronghua Liang (p. 198)

SpectralTracker: Jointly High and Low-Frequency Modeling for Tracking
  Yimin Rong, Qihua Liang, Ning Li, Zhiyi Mo, and Bineng Zhong (p. 212)

DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking
  Runqing Zhang, Dunbo Cai, Ling Qian, Yujian Du, Huijun Lu, and Yijun Zhang (p. 225)

Instance-Proxy Loss for Semi-supervised Learning with Coarse Labels
  Hongyan Wu, Qinghai Miao, Haiyun Guo, Min Huang, and Jinqiao Wang (p. 238)

FAFVTC: A Real-Time Network for Vehicle Tracking and Counting
  Zhiwen Wang, Kai Wang, and Fei Gao (p. 251)

Ped-Mix: Mix Pedestrians for Occluded Person Re-identification
  Shang Gao, Chenyang Yu, Pingping Zhang, and Huchuan Lu (p. 265)

Object-Aware Transfer-Based Black-Box Adversarial Attack on Object Detector
  Zhuo Leng, Zesen Cheng, Pengxu Wei, and Jie Chen (p. 278)


HTNet: A Hybrid Model Boosted by Triple Self-attention for Crowd Counting
  Yang Li and Baoqun Yin (p. 290)

Reliable Boundary Samples-Based Proxy Pairs for Unsupervised Person Re-identification
  Chang Zou, Zeqi Chen, Yuehu Liu, and Chi Zhang (p. 302)

High-Resolution Feature Representation Driven Infrared Small-Dim Object Detection
  Yuhang Dong, Yingying Wang, Linyu Fan, Xinghao Ding, and Yue Huang (p. 315)

Few-Shot Object Detection Algorithm Based on Adaptive Relation Distillation
  Danting Duan, Wei Zhong, Liang Peng, Shuang Ran, and Fei Hu (p. 328)

A Real-Time Safety Detector Based on Re-parameterization Multiscale Feature Fusion for Forklift Driving
  Linhua Ye, Songhang Chen, Zhiqing Lai, and Meng Guo (p. 340)

RTMDet-R2: An Improved Real-Time Rotated Object Detector
  Haifeng Xiang, Naifeng Jing, Jianfei Jiang, Hongbo Guo, Weiguang Sheng, Zhigang Mao, and Qin Wang (p. 352)

Boosting Object Detection in Foggy Scenes via Dark Channel Map and Union Training Strategy
  Zhanqiang Huo, Sensen Meng, Yingxu Qiao, and Fen Luo (p. 365)

Object Centric Body Part Attention Network for Human-Object Interaction Detection
  Zhuang Liu and Xiaowei Zhang (p. 378)

Salient Feature Enhanced Multi-object Tracking with Soft-Sparse Attention in Transformer
  Caihua Liu, Xu Qu, Xiaoyi Ma, Runze Li, Xu Li, and Sichu Chen (p. 392)

A BiGRU Based Adaptive Gain Estimation for Radar Multi-target Tracking
  Long Liu, Qing Xu, Mengxuan Zhang, Hongbing Ji, and Qiubo Zhao (p. 405)

Prompt Based Lifelong Person Re-identification
  Chengde Yang, Yan Zhang, and Pingyang Dai (p. 418)

Hierarchical Focused Feature Pyramid Network for Small Object Detection
  Siwei Wang, Zhiwei Chen, Haoyang Ding, and Liujuan Cao (p. 432)


JLInst: Boundary-Mask Joint Learning for Instance Segmentation
  Xiaodong Zhao, Junliang Chen, Zepeng Huang, and Linlin Shen (p. 445)

Boosting One-Stage Multi Object Tracking with Attention Learning
  Ying Cui, Hang Zheng, Yueqian Quan, Xiang Pan, Yike Wang, and Junxia Li (p. 458)

TPNet: Enhancing Weakly Supervised Polyp Frame Detection with Temporal Encoder and Prototype-Based Memory Bank
  Jianzhe Gao, Zhiming Luo, Cheng Tian, and Shaozi Li (p. 470)

Learning Frequency-Based Disentanglement and Filtering for Generalizable Person Re-identification
  Pengpeng Song and Jinjia Peng (p. 482)

Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking with Multimodal ReID
  Chen Mao, Chong Tan, Hong Liu, Jingqi Hu, and Min Zheng (p. 495)

Emphasizing Boundary-Positioning and Leveraging Multi-scale Feature Fusion for Camouflaged Object Detection
  Songlin Li, Xiuhong Li, Zhe Li, Boyuan Li, Chenyu Zhou, Fan Chen, Tianchi Qiu, and Zeyu Li (p. 508)

Author Index (p. 521)

Object Detection, Tracking and Identification

OKGR: Occluded Keypoint Generation and Refinement for 3D Object Detection

Mingqian Ji, Jian Yang, and Shanshan Zhang(B)

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
{mingqianji,csjyang,shanshan.zhang}@njust.edu.cn

Abstract. Lidar-based 3D object detectors use point clouds to detect objects in autonomous driving. However, point clouds are sparse and incomplete, which hampers the detectors' learning of shape knowledge and limits 3D detection performance. Previous works improve performance by completing object shapes at the point level or at the representation level (e.g., voxels). The former increases the computational burden, while the latter generalizes poorly to point-based detectors. In this paper, we present an approach, namely Occluded Keypoint Generation and Refinement (OKGR), which improves 3D detection performance by completing object features at the keypoint level. Specifically, Occluded Keypoint Generation (OKG) generates occluded keypoints to densify the raw keypoints and learns the offsets between the generated keypoints and prototypes, while keeping the raw keypoints unchanged. Occluded Keypoint Refinement (OKR) assigns weights to the generated keypoints and applies these weights to the features to obtain high-quality complete features for detection. We apply our approach to two representative detectors, PV-RCNN++ and PDV, and evaluate them on KITTI and the Waymo Open Dataset. The experiments show significant performance improvements. In particular, OKGR applied to PV-RCNN++ achieves improvements of +3.19% and +2.53% AP for Pedestrian and Cyclist, averaged over difficulty levels, on KITTI, and +2.18% and +2.29% mAPH on the Waymo Open Dataset. The supplementary material and code are available at https://github.com/Mingqj/OKGR.

Keywords: 3D Object Detection · Point Clouds · Object Shape Completion

1 Introduction

Lidar-based 3D object detection plays an important role in autonomous driving systems, and a large body of research has been devoted to improving 3D detection performance. Early Lidar-based 3D object detectors [1–7] mainly focus on designing various network structures and extracting features from different representations such as voxels and keypoints. These detectors show impressive speed and accuracy in real-world scenarios, yet they disregard the effect of the properties of point clouds on detection performance. More recently, some works [8,9] analyze the influence of the sparsity and incompleteness of raw point clouds on detection performance, and address this issue by completing either the raw point clouds or the voxels. SIENet [8] uses a completion network (PCN) [10] to complete the raw point clouds within candidate boxes. However, completing the entire raw point clouds increases the computational burden in scenarios with numerous objects. Instead, SPG [9] designs a supervision scheme to complete object voxels, but this only suits voxel-based 3D object detectors and cannot be applied to point-based ones, because the virtual voxels can hardly be converted back into real points. These issues motivate us to design a completion method with less computational burden that can be applied to both voxel-based and point-based detectors.

In this paper, we propose a completion method at the keypoint level. Unlike completing the entire raw point clouds, completing the keypoints, which form only a small part of the raw point clouds, is efficient. Unlike completing voxels, keypoint completion can be used directly in point-based detectors and in voxel-based detectors through voxelization. Specifically, our approach consists of two parts, namely Occluded Keypoint Generation (OKG) and Occluded Keypoint Refinement (OKR). In OKG, we design a Keypoint Densification (KD) module to generate occluded keypoints and a Shape Learning (SL) module to learn shape knowledge; in this way, we obtain dense, complete keypoints. In OKR, we design a Density-and-Distance-based Weight Assignment (DWA) module to assign a weight to each generated keypoint, and these weights are applied to the corresponding features, so that features of high confidence are enhanced and those of low confidence are weakened. Thus, we obtain high-quality features that improve detection performance. We apply our approach to state-of-the-art detectors and achieve significant performance improvements. Figure 1 shows the boxes and features of the baseline and of OKGR: our approach enhances the features in the occluded regions of objects and therefore improves detection for severely occluded objects.

Our contributions are summarized as follows:

1) We design an Occluded Keypoint Generation (OKG) method that generates occluded keypoints via the Keypoint Densification (KD) module and learns shape knowledge via the Shape Learning (SL) module.
2) We present an Occluded Keypoint Refinement (OKR) method that combines density and distance priors to refine the generated occluded keypoints via the Density-and-Distance-based Weight Assignment (DWA) module.
3) We apply our approach to two state-of-the-art detectors, PV-RCNN++ [11] and PDV [12], and evaluate it on two benchmark datasets, KITTI [13] and the Waymo Open Dataset [14]. The results show significant performance improvements.


Fig. 1. The box and feature visualization of baseline and ours. (a) and (b) are the predicted boxes of baseline and ours. Blue boxes are the ground truths and green boxes are the predicted boxes. (c) and (d) are the feature visualization of the baseline and ours in BEV. The darker the color, the higher the feature quality. (Color figure online)

2 Related Works

Since we use LiDAR-based detectors as base detectors and design completion modules on top of them, we review related works on LiDAR-based 3D object detection and object shape completion.

2.1 LiDAR-Based 3D Object Detection

Research on LiDAR-based 3D object detection has developed comprehensively. 3D object detectors are generally divided into categories according to the data representation [15], including point-based, grid-based, and point-voxel-based methods. Point-based methods [1–4,16] detect 3D objects from raw point clouds using point-based backbones. For instance, PointRCNN [1] uses PointNet++ [17] to extract point features and designs a two-stage network to gradually obtain bounding boxes. Grid-based methods [5–7,18–21] divide point clouds into grids by means of voxels or pillars to extract features. Typically, SECOND [6] uses 3D sparse convolution [22] to extract voxel features and designs a direction classifier to assist in predicting bounding boxes. Point-voxel-based methods [11,12,23–25] combine the features of points and voxels using a set abstraction module for detection. For instance, PV-RCNN [23] first uses 3D sparse convolution [22] and farthest point sampling [26] to extract the features of voxels and points, respectively. It then designs a set abstraction module and an RoI feature abstraction module to carry out the voxel-to-keypoint and keypoint-to-grid processes. These detectors achieve good performance, but they overlook the sparsity and incompleteness of the point clouds. In contrast, we consider this issue and make improvements on top of existing 3D object detectors.

2.2 Object Shape Completion

Recently, several approaches [8,9,27–29] that complete object shape in 3D object detection have been proposed to improve detection performance. These approaches can be divided into raw-point-cloud-based [8,27] and voxel-based methods [9,28,29]. For the first type, SSN [27] develops a shape signature network to complete the raw point clouds, and SIENet [8] uses an off-the-shelf model for the same purpose. These approaches help improve detection performance, but the completion process is time-consuming in large scenes. For the second type, SPG [9] predicts shape occupancy to complete object voxels under its supervision scheme. BtcDet [28] designs an auxiliary task that completes object voxels by predicting shape occupancy within the detection framework. Similarly, Sparse2Dense [29] completes object voxels in a constructed distillation framework. These works successfully complete object voxels by predicting shape occupancy, but the method can hardly be transferred to point-based detection. Building on these previous works, we present an effective approach that completes object shape at the keypoint level with little computational cost. Furthermore, our approach can be inserted into both voxel-based and point-based detectors. Substantial experiments demonstrate its effectiveness.

3 Methodology

Based on the above motivation and related work, we propose to complete object shape at the keypoint level via Occluded Keypoint Generation and Refinement (OKGR) to improve detection performance. 3.1

Overview

As illustrated in Fig. 2, our approach based on a typical two-stage detection framework mainly consists of Occluded Keypoint Generation (OKG) and Refinement (OKR). The workflow of pipeline is summarized as follows:

Fig. 2. The overview of our method. Our approach mainly consists of two modules, namely Occluded Keypoint Generation (OKG) and Occluded Keypoint Refinement (OKR).

Occluded Keypoint Generation and Refinement for 3D Object Detection

7

Fig. 3. Occluded Keypoint Generation. It consists of two modules, namely Keypoint Densification (KD) and Shape Learning (SL) module.

First of all, the detector utilizes a 3D backbone to extract features from raw point clouds, while the raw keypoints are generated in the Keypoint Sampling module. Then, the Region Proposal Network (RPN) generates proposals based on these features. Next, the raw keypoints and proposals are sent to OKG to generate the occluded keypoints, and then the generated keypoints are sent to OKR for refinement. After that, the whole keypoints aggregate the features from 3D Backbone to enhance the features in Set Abstraction. Finally, the detector predicts the 3D boxes and classes in RoI Head based on the enhanced features. In detail, OKG is described in Sect. 3.2 and OKR is described in Sect. 3.3. 3.2

Occluded Keypoint Generation

As shown in Fig. 3, the Occluded Keypoint Generation consists of the Keypoint Densification (KD) module and the Shape Learning (SL) module. Specifically, in KD, we first densify keypoints with the aim of filling the keypoints in the occluded regions with less computational complexity. Then, in SL, a recurrent neural network based on predicting offsets is designed to learn the shape of the generated keypoints while retaining the raw keypoints unchanged. Keypoint Densification. Inspired by using a mirror to get complete object shapes offline as a supervision in BtcDet [28] and Sparse2Dense [29], we also use a method of symmetry to generate occluded keypoints. Unlike them, we use the central symmetry to densify keypoints during detection, because it is difficult to predict the object direction accurately in RPN, while the centers are easy to predict. In detail, as shown in Fig. 3, the occluded keypoints Ro can be generated by central symmetry, and they are defined as: Ro = L(Rv , Ca ),

(1)

where L(·) is a linear function used to map coordinates through the ball query. We define the keypoints as Rv and the centers of objects as Ca . In summary, we can obtain dense keypoints via the KD module.

8

M. Ji et al.

Shape Learning. The occluded keypoints generated in the previous section do not have the object shape, and this requires us to design a network to learn the shape of the generated keypoints. Before that, prototypes with complete object shape are necessary, and the detailed generation process is introduced in the supplementary material. For the network, motivated by the recent lightweight point cloud completion network PMPNet/PMPNet++ [30,31], we design the Shape Learning (SL) module to predict the offsets between the generated occluded keypoints and prototypes, while retaining the raw keypoints unchanged. Figure 3 shows the architecture of SL, which consists of two stages. In the first stage, the raw keypoints Rv and generated keypoints Ro are separately fed with path searching information Fg0 [30] obeying a random normal distribution. The outputs are the modified generated keypoints Ro1 , retained raw keypoints Rv and previous path searching information Fg1 . Similarly, they are fed as inputs to the second  stage forfurther modification. Finally, we can get the modified dense keypoints Rv , Ro2 with complete object shape. For the specific network of each stage, as shown in Fig. 4, the SA modules are used to extract features and coordinates of raw keypoints and generated keypoints. The FP modules are used to propagate the global features from SA3 and local features of generated keypoints {Fo1 , Fo2 } to the local coordinates {Po0 , Po1 , Po2 } of generated keypoints (marked in red 4). The  RPA modules can manipulate the path searching  sin Fig. s s of the generated keypoints. It should be noted that, , Fg1 , Fg1 information Fg0 to balance sufficient training and fast inference, we train both two stages but infer only the first stage. The experimental demonstration is in ablation studies in Sect. 4.6.

Fig. 4. One stage of the Shape Learning module. It consists of the Set Abstraction (SA) module [17], the Features Propagation (FP) module [17] and the Recurrent Path Aggregation (PRA) module [30, 31].

3.3

Occluded Keypoint Refinement

Generating new keypoints brings the noise inevitably. Thus, it is necessary to evaluate the generated keypoints. We design the Density-and-Distance-based Weight Assignment (DWA) module to assign the confident weights to generated keypoints. This module consists of density calculation, distance calculation and weight calculation.

Occluded Keypoint Generation and Refinement for 3D Object Detection

9

Density Calculation. The density information has been successfully used in 3D object detection, like PDV [12]. As prior knowledge, it can describe the distribution of point clouds. KDE [32] is a typical method to estimate the probability distribution through a kernel function. In the Density-and-Distance-based Weight Assignment (DWA) module, we first describe the density pw of the generated keypoints by KDE, and it is defined as: 

  N ro − roi 1  ), pw =  K( N w i=1 w

(2)

where K(·) is the kernel function [33]; w is the bandwidth and it is set to 0.2;    N is the number of ro ; i is for counting from 1 to N . For fine filtering, we use the cylinders to cover the average proposals for effective computation instead of the ball. Specifically, we choose the vertical cylinders to address the category of pedestrian and cyclist, and the horizontal cylinders to address the category of  car. Thus, the filtered generated keypoints Ro is defined as:      (3) Ro = roi ||roi − li || < dm , dm

  2 /2, dz > d dx2m + dym m max = 2 , 2 dzm + dmin /2, dzm ≤ dmax

(4) 

where || · || is the 3D distance; we define one element of Ro as ro , and ro is ro under the condition in Eq. (3); li will be described in Eq. (5); dm is the diameter of the cylinder; (dxm , dym , dzm ) are the length, width and height of the mth proposal; dmin and dmax are the smaller and the larger between dxm and dym , respectively. Therefore, we can calculate the density value of each generated keypoint. Distance Calculation. As illustrated in Fig. 5, there is a phenomenon that the density of keypoints inside the object is similar to those of the keypoints outside the object near the proposal box. Therefore, we introduce the distance to address this issue. The distance l between the keypoints and the axis of the cylinder is defined as:

 dzm > dmax E(ro , Ca ), , (5) l=  E(ro , Ca ) · |sin(θ − ϕ)|, dzm ≤ dmax 

where E(·) is the Euclidean distance of the generated keypoints ro and center Ca in bird’s eye view; θ is tilt angle of the average proposals; ϕ is the angle between the line of connecting the keypoints and the center, and the Y-axis. Then, the distance can be calculated by constructing the above geometric relation.

10

M. Ji et al.

(a) Density of the generated keypoints

(b) Geometric relationship

Fig. 5. Density distribution and geometric relation. (a) The density distribution of generated keypoints. The darker the color, the greater the density value. (b) The geometry between distance and angle. The distances of different points with similar density are different.

Weight Calculation. We introduce the above density pw and distance l to construct a linear weight assignment. It is defined as: W = Sp · (μ · pw + (1 − μ) · l−1 ),

(6)

where Sp is the scores of proposals; μ is the weighting factor to balance the density and distance and it is set to 0.7. We set a threshold Tthr to filter the generated keypoints with extremely low confidence, and the value is set to 0.2. 3.4

Loss Function

The detector with OKGR is trained with the loss of region proposal network Lrpn , shape learning Lsl and R-CNN Lrcnn . The whole loss of detector is the summation of Lrpn , Lsl and Lrcnn , and each loss have the equal weight. It is defined as: L = Lrpn + Lsl + Lrcnn ,

(7)

where Lrpn and Lrcnn are the same as baseline. Lsl is defined as: Lsl = Lcd + δLemd ,

(8)

where Lcd is the loss of Chamfer distance [30,31] for minimizing the moving path; Lemd is the loss of Earth Mover’s distance [30,31] for minimizing the distance between generated keypoints and prototypes, and the weighting factor δ is set to 0.001.

4

Experiments

In this section, we will first introduce the datasets and evaluation metrics. Then, we will introduce our implementation details of experiments. After that, we will

Occluded Keypoint Generation and Refinement for 3D Object Detection

11

show the results of experiments on the above datasets. In the end, we will conduct the ablation studies and analyze each component of our methods. 4.1

Datasets and Evaluation Metrics

We evaluate our methods on KITTI [13] and Waymo Open Dataset [14]. The evaluation metric for KITTI is average precision (AP), and the evaluation metric for Waymo Open Dataset is mean average precision (mAP) and 3D mAP weighted by Heading (mAPH). The detailed introduction and division of both datasets are in the supplementary material. 4.2

Implementation Details

For KITTI, we apply an ADAM optimizer with an initial learning rate of 0.01 to train our network on 2 NVIDIA 2080Ti GPUs for 80 epochs with batch size 4, and the training time takes around 13 h on PV-RCNN++ with OKGR and 14 h on PDV with OKGR. For Waymo Open Dataset, we apply an ADAM optimizer with an initial learning rate of 0.01 to train our network on 4 NVIDIA 2080Ti GPUs for 30 epochs with batch size 8, and the training time takes around 34 h on PV-RCNN++ with OKGR and 36 h on PDV with OKGR. The learning rate strategy and proposal refinement strategy are the same as PV-RCNN [23]. During training, data augmentation strategies of 3D object detection are utilized as the same as the baseline [11,12]. Table 1. Comparisons on KITTI validation set. Methods

Car 3D AP R40

Ped. 3D AP R40

Cyc. 3D AP R40

Easy

Mod.

Hard

Easy

Mod.

Hard

Easy

Mod.

Hard

Average Easy

Mod.

Hard

PV-RCNN++ 91.83

84.32

82.25

63.46

56.73

52.27

91.80

73.22

68.62

82.36

71.42

67.71

with ours

92.33 84.75 82.66 67.45 59.67 54.91 94.89 75.51 70.96 85.89 73.31 69.51

Improvement

+0.50 0.43

+0.41 +3.99 +2.94 +2.64 +3.09 +2.29 +2.34 +3.53 +1.89 +1.80

PDV

91.84

82.70

with ours

92.26 85.24 82.96 66.92 58.76 54.11 94.37 75.82 72.00 84.52 73.27 69.69

Improvement

+0.42 +0.38 +0.26 +2.26 +1.00 +1.49 +1.47 +0.03 +0.83 +1.39 +0.47 +0.86

4.3

84.86

64.66

57.76

52.62

92.90

75.79

71.17

83.13

72.80

68.83

Evaluation on KITTI Dataset

As shown in Table 1, it reports the comparison results on KITTI validation set. We re-implement PV-RCNN++ [11] and PDV [12] based on open-source codes and train our approach based on both detectors. The results show that our approach improves the performance of both baseline detectors on all categories and difficulty levels. Specifically, for PV-RCNN++, our approach improves 3D AP with 40 recall positions of average categories of +3.53%, +1.89%, and +1.80% AP on easy, moderate, and hard difficulty levels. For PDV, our approach improves 3D AP with 40 recall positions of average categories of +1.39%, +0.47%, and +0.86% AP on easy, moderate, and hard difficulty levels. Besides,

12

M. Ji et al.

for the pedestrian and cyclist categories, our approach achieves stable improvements of +3.19% and +2.53% AP averaged over difficulty levels on PV-RCNN++, and +1.58% and +0.78% AP on PDV. These results show that our approach improves the 3D detection performance of the baselines on KITTI.

4.4 Evaluation on Waymo Open Dataset

As shown in Table 2, our approach improves both baseline detectors on all categories and on both the LEVEL-1 and LEVEL-2 metrics. Specifically, compared to PV-RCNN++, our approach improves performance by +1.44% LEVEL-1 mAP, +1.52% LEVEL-1 mAPH, +1.48% LEVEL-2 mAP, and +1.59% LEVEL-2 mAPH averaged over categories. Compared to PDV, our approach improves performance by +0.93% LEVEL-1 mAP, +0.76% LEVEL-1 mAPH, +1.04% LEVEL-2 mAP, and +0.94% LEVEL-2 mAPH averaged over categories. In particular, for the pedestrian and cyclist categories, our approach applied to PV-RCNN++ achieves significant improvements of +2.18% and +2.29% mAPH averaged over difficulty levels. These results further demonstrate that our approach is effective.

4.5 Model Efficiency

Table 3 shows the efficiency of OKGR on the KITTI validation set. The evaluation is performed on a single 2080Ti GPU with a batch size of 1. OKGR contains 1.7 million parameters, 4.1 million fewer than PCN [10] used in SIENet [8]. When OKGR is used to complete raw point clouds, it costs about 110 extra milliseconds on the baselines, roughly 80 milliseconds less than PCN. Compared to completing raw point clouds with PCN, OKGR completing the representation only costs an extra 28.7 milliseconds on PV-RCNN++ and 30.3 milliseconds on PDV. These results indicate that OKGR is efficient enough for practical applications.

Table 2. Comparisons on the validation set of the Waymo Open Dataset.

| Methods | Difficulty | Vehicle mAP | Vehicle mAPH | Pedestrian mAP | Pedestrian mAPH | Cyclist mAP | Cyclist mAPH | Avg. mAP | Avg. mAPH |
|---|---|---|---|---|---|---|---|---|---|
| PV-RCNN++ | LEVEL-1 | 77.82 | 77.32 | 77.99 | 71.36 | 71.80 | 70.71 | 75.87 | 73.13 |
| with OKGR | LEVEL-1 | 78.14 | 77.45 | 79.66 | 73.48 | 74.13 | 73.03 | 77.31 | 74.65 |
| Improvement | LEVEL-1 | +0.32 | +0.13 | +1.67 | +2.21 | +2.33 | +2.32 | +1.44 | +1.52 |
| PV-RCNN++ | LEVEL-2 | 69.07 | 68.62 | 69.92 | 63.74 | 69.31 | 68.26 | 68.84 | 66.33 |
| with OKGR | LEVEL-2 | 69.46 | 69.02 | 71.70 | 65.88 | 71.57 | 70.51 | 70.32 | 67.92 |
| Improvement | LEVEL-2 | +0.38 | +0.40 | +1.78 | +2.14 | +2.26 | +2.25 | +1.48 | +1.59 |
| PDV | LEVEL-1 | 76.85 | 76.33 | 74.19 | 65.96 | 68.71 | 67.55 | 73.25 | 69.95 |
| with OKGR | LEVEL-1 | 76.97 | 76.42 | 75.67 | 67.49 | 69.95 | 68.22 | 74.18 | 70.71 |
| Improvement | LEVEL-1 | +0.12 | +0.09 | +1.48 | +1.53 | +1.24 | +0.67 | +0.93 | +0.76 |
| PDV | LEVEL-2 | 69.30 | 68.81 | 65.85 | 58.28 | 66.49 | 65.36 | 67.21 | 64.15 |
| with OKGR | LEVEL-2 | 69.55 | 69.30 | 67.49 | 59.95 | 67.72 | 66.03 | 68.25 | 65.09 |
| Improvement | LEVEL-2 | +0.25 | +0.49 | +1.64 | +1.67 | +1.23 | +0.67 | +1.04 | +0.94 |


Table 3. Inference time and model parameters on the KITTI validation set.

| Methods | Completing Point Clouds | Completing Representation | Inference Time (ms) | Parameters (M) |
|---|---|---|---|---|
| PV-RCNN++ | – | – | 273.4 | 13.0 |
| with PCN [10] | ✓ | – | 463.9 (+190.5) | 18.8 (+5.8) |
| with OKGR | ✓ | – | 385.2 (+111.8) | 14.7 (+1.7) |
| with OKGR | – | ✓ | 302.1 (+28.7) | 14.7 (+1.7) |
| PDV | – | – | 281.4 | 12.9 |
| with PCN [10] | ✓ | – | 479.2 (+197.8) | 18.7 (+5.8) |
| with OKGR | ✓ | – | 394.6 (+113.2) | 14.7 (+1.8) |
| with OKGR | – | ✓ | 311.7 (+30.3) | 14.7 (+1.8) |

4.6 Ablation Studies

In the ablation studies, we conduct experiments based on PV-RCNN++ to evaluate the components of OKGR and to verify the design of the SL module; the results are shown in Table 4 and Table 5, respectively. In Table 4, adopting the KD module densifies the raw keypoints by generating occluded keypoints without shape knowledge, improving 3D detection by +0.51% AP on the moderate level. The SL module, which follows the KD module and is designed to learn shape knowledge, further improves performance by +0.79% AP on the moderate level. Refinement via the DWA module yields high-quality features for detection and adds another +0.59% AP on the moderate level. In total, our approach boosts the baseline by +1.89% AP on the moderate level, which demonstrates the effectiveness of the KD, SL, and DWA modules. In Table 5, we conduct ablation studies on the SL module to verify the rationality of its design. Training and inference on both the first and the second stage yields the best AP on the moderate level, but requires much more inference time. Instead, we train both stages and evaluate only the first stage, which achieves competitive AP with less inference time. Training and evaluating only the first stage, disregarding the previous path-searching information, shows poor performance. In summary, the design of the SL module is reasonable and achieves a balance between detection accuracy and inference speed.

Table 4. Ablation studies of OKGR (3D AP on the moderate level).

| Methods | Car | Ped. | Cyc. | Average |
|---|---|---|---|---|
| Baseline | 84.32 | 56.73 | 73.22 | 71.42 |
| + KD | 84.59 | 57.39 | 73.82 | 71.93 |
| + SL | 84.71 | 58.91 | 74.54 | 72.72 |
| + DWA | 84.75 | 59.67 | 75.51 | 73.31 |

Table 5. Ablation studies of SL.

| Methods | Training Stage 1 | Training Stage 2 | Inference Stage 1 | Inference Stage 2 | Average AP Mod. | Inference Time (ms) |
|---|---|---|---|---|---|---|
| Baseline | – | – | – | – | 71.42 | 273.4 |
| + OKGR | ✓ | ✓ | ✓ | ✓ | 73.47 | 347.7 |
| + OKGR | ✓ | ✓ | ✓ | – | 73.31 | 302.1 |
| + OKGR | ✓ | – | ✓ | – | 72.13 | 303.7 |

5 Conclusion

In this paper, we propose to complete object features for 3D object detection via Occluded Keypoint Generation and Refinement. Concretely, we design Occluded Keypoint Generation to generate points in occluded object regions and learn shape knowledge at the keypoint level. In addition, Occluded Keypoint Refinement is proposed to assign weights to the generated keypoints, and these weights are applied to the features to obtain high-quality features for detection. Extensive experiments demonstrate the effectiveness of our approach.

Acknowledgement. This work is supported by the National Natural Science Foundation of China (Grant No. 62322602, Grant No. 62172225), and the CAAI-Huawei MindSpore Open Fund.

References

1. Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: CVPR (2019)
2. Yang, Z., Sun, Y., Liu, S., Shen, X., Jia, J.: IPOD: intensive point-based object detector for point cloud. arXiv:1812.05276 (2018)
3. Yang, Z., Sun, Y., Liu, S., Shen, X., Jia, J.: STD: sparse-to-dense 3D object detector for point cloud. In: ICCV (2019)
4. Ngiam, J., et al.: StarNet: targeted computation for object detection in point clouds. arXiv:1908.11069 (2019)
5. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR (2018)
6. Yan, Y., Mao, Y., Li, B.: SECOND: sparsely embedded convolutional detection. Sensors (2018)
7. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR (2019)
8. Li, Z., Yao, Y., Quan, Z., Xie, J., Yang, W.: Spatial information enhancement network for 3D object detection from point cloud. PR (2022)
9. Xu, Q., Zhou, Y., Wang, W., Qi, C.R., Anguelov, D.: SPG: unsupervised domain adaptation for 3D object detection via semantic point generation. In: ICCV (2021)
10. Yuan, W., Khot, T., Held, D., Mertz, C., Hebert, M.: PCN: point completion network. In: 3DV (2018)
11. Shi, S., et al.: PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3D object detection. IJCV (2022)
12. Hu, J.S., Kuai, T., Waslander, S.L.: Point density-aware voxels for LiDAR 3D object detection. In: CVPR (2022)
13. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)
14. Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: CVPR (2020)
15. Mao, J., Shi, S., Wang, X., Li, H.: 3D object detection for autonomous driving: a review and new outlooks. arXiv:2206.09474 (2022)
16. Shi, W., Rajkumar, R.: Point-GNN: graph neural network for 3D object detection in a point cloud. In: CVPR (2020)


17. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)
18. Shi, S., Wang, Z., Shi, J., Wang, X., Li, H.: From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. TPAMI (2020)
19. Yin, T., Zhou, X., Krahenbuhl, P.: Center-based 3D object detection. In: CVPR (2021)
20. Mao, J., et al.: Voxel transformer for 3D object detection. In: ICCV (2021)
21. Deng, J., Shi, S., Li, P., Zhou, W., Zhang, Y., Li, H.: Voxel R-CNN: towards high performance voxel-based 3D object detection. In: AAAI (2021)
22. Graham, B., Engelcke, M., Van Der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: CVPR (2018)
23. Shi, S., et al.: PV-RCNN: point-voxel feature set abstraction for 3D object detection. In: CVPR (2020)
24. Li, Z., Wang, F., Wang, N.: LiDAR R-CNN: an efficient and universal 3D object detector. In: CVPR (2021)
25. Mao, J., Niu, M., Bai, H., Liang, X., Xu, H., Xu, C.: Pyramid R-CNN: towards better performance and adaptability for 3D object detection. In: ICCV (2021)
26. Eldar, Y., Lindenbaum, M., Porat, M., Zeevi, Y.Y.: The farthest point strategy for progressive image sampling. TIP (1997)
27. Zhu, X., Ma, Y., Wang, T., Xu, Y., Shi, J., Lin, D.: SSN: shape signature networks for multi-class object detection from point clouds. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 581–597. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2_35
28. Xu, Q., Zhong, Y., Neumann, U.: Behind the curtain: learning occluded shapes for 3D object detection. In: AAAI (2022)
29. Wang, T., Hu, X., Liu, Z., Fu, C.W.: Sparse2Dense: learning to densify 3D features for 3D object detection. In: NeurIPS (2022)
30. Wen, X., et al.: PMP-Net: point cloud completion by learning multi-step point moving paths. In: CVPR (2021)
31. Wen, X., et al.: PMP-Net++: point cloud completion by transformer-enhanced multi-step point moving paths. TPAMI (2022)
32. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. (1956)
33. Baudat, G., Anouar, F.: Generalized discriminant analysis using a kernel approach. Neural Comput. (2000)

Camouflaged Object Segmentation Based on Fractional Edge Perception

Xia Yuan1(B), Junjie Cui1, Zhengyu Liu1, Shuting Yang2, and Xuejian Zhang2

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
[email protected]
2 Institute of Agricultural Economy and Information Technology, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan 750002, China

Abstract. Camouflaged object detection is a challenging task because of the intrinsic similarity between background and foreground. In existing models, many confusing edge features eventually lead to poor predictions. In this paper, we propose an Interactive Task Learning Network, in which an improved fractional-order differential operator is used to calculate the gradient intensity of the image. By calculating the average gradient and spatial frequency of the image, it can adaptively learn the fractional order v and extract features from different directions. Experiments on three camouflaged datasets indicate that the proposed method is effective and progressive.

Keywords: Camouflaged object detection · Fractional edge · Interactive task learning

1 Introduction

Camouflage is a survival skill that creatures have learned over long-term evolution to hide themselves from recognition by changing colors, approximating contrast, mimicking shapes, and so on. Broadly speaking, camouflaged objects also refer to objects that are extremely small, highly similar to the background, or heavily obscured. Camouflaged object detection (COD) is not only of scientific value but is also important for many practical tasks, including medical diagnosis (such as polyp segmentation [1] and lung infection analysis [2]), agriculture (such as pest detection), industry (such as defective part detection), and search and rescue [3]. COD is a very challenging task due to the nature of camouflage, i.e., the high intrinsic similarity between candidate objects and the chaotic background, which makes it difficult even for humans to spot camouflaged objects. To tackle this issue, numerous deep learning-based methods have been proposed, which can be broadly divided into three types. The first is to design more complex networks for detecting camouflaged objects, such as C2FNet [4] and UGTR [5]. The second is to incorporate auxiliary tasks into joint learning, including edge extraction, salient object detection, fixation maps, and camouflaged object ranking. The third is to mimic bio-inspired methods, such as SINet [6].


Object detection models based on edge guidance have achieved better results than ordinary semantic segmentation models. But for camouflaged objects with fuzzy edges and occlusion, existing models usually struggle to identify the complete structure or details of objects, resulting in unsatisfactory predictions. This paper uses both integral and fractional edges to highlight the weak gradient between foreground and background. A fractional-order derivative produces a coarser edge transition than an integral-order derivative, which helps strengthen high-frequency edges and texture details while retaining part of the low-frequency contour information. The workflow of the interactive task learning network (ITLNet) proposed in this paper is shown in Fig. 1. It is composed of two sub-task modules: a camouflaged object edge detection module (CEDM) and a camouflaged object detection module (CODM). Through the mutual feedback and interactive learning of the two sub-tasks, ITLNet obtains rich semantic guidance information and edge details, and meanwhile maximizes the mining of different levels of image features to capture camouflaged objects with accurate edges. To sum up, our main contributions are as follows:

• A plug-and-play lightweight fractional edge perception module is proposed, which improves the performance of camouflaged object segmentation models.
• Experiments show that the proposed module can be used as an independent lightweight end-to-end edge detection network.

Fig. 1. Interactive camouflaged object detection model. CEDM refers to camouflaged object edge detection, CODM refers to camouflaged object detection. Edge map obtained through CEDM can enhance the edge of the prediction map when CODM works.

2 Related Work

In the salient object detection task, Chen et al. [7] proposed a new Contour Loss, which uses the object contour to guide the model to learn more information within the boundary and enhance local salient prediction while retaining accurate object boundaries. BPFINet, proposed by Zhou et al. [8], aggregates high-level semantic features, low-level edge information, and global features step by step through a U-shaped network, and proposes a new loss function that highlights pixels near the boundary to address the imbalance between background and objects. AFNet, proposed by Feng et al. [9], refines the rough prediction map step by step by constructing an attention feedback module and uses additional boundary supervision in the last two decoding stages. They extracted the boundary


details through average pooling and computed a loss against the ground truth as part of the overall loss function to guide network training. In the camouflaged object detection task, Xu et al. [10] put forward an edge guidance module that builds connections between the regional branch and the boundary branch of each side output. ERRNet, proposed by Ji et al. [11], first integrates low-level features to obtain an edge prior, then cross-models them with high-level semantic information and compares potential camouflage regions with their complementary regions by considering neighbor, global, edge, and semantic priors. MGL, proposed by Zhai et al. [12], designs two interactive modules, region-induced graph reasoning and edge-constrained graph reasoning, to guide the training process mutually. BGNet, proposed by Sun et al. [13], uses an edge prior to help recover the object structure step by step in the decoding module to improve camouflaged object detection performance.

3 Interactive Task Learning Network

The texture of a camouflaged object is very close to the background. When searching for camouflaged objects, the human visual system often first discovers the edge positions with the greatest difference from the background. Therefore, edges are important clues for detecting camouflaged objects. This paper designs a lightweight CNN module, CEDM, to generate an edge prediction map, and inserts it into CODM for end-to-end training. To overcome the weak feature extraction ability of the lightweight CNN module, CEDM uses interpretable differential edge detection operators to assist in learning edge features. Common first-order or second-order differential operators cannot detect the edges of camouflaged objects well, as these objects are almost embedded in the background. We employ a fractional-order differential operator to strengthen high-frequency edges and texture details while retaining part of the low-frequency contour information. Therefore, CEDM uses both integral-order and fractional-order differential operators to improve the accuracy of edge prediction.

3.1 Integral and Fractional Edge

Integral Edge: Sobel [14] is a simple and efficient operator that combines the first-order derivative with a Gaussian smoothing function. We generate the horizontal convolution kernel $G_x$ and the vertical convolution kernel $G_y$ based on a 3 × 3 convolution kernel, as shown in Eq. (1):

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad (1)$$

After convolving each point in the image with both kernels, we can calculate the gradient as in Eq. (2), where A is the image:

$$G = \sqrt{(G_x * A)^2 + (G_y * A)^2} \quad (2)$$

To simplify the operation, we change it to Eq. (3):

$$G = |G_x * A| + |G_y * A| \quad (3)$$
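The integral-edge computation can be written as a small convolution, as sketched below in PyTorch; the grayscale input and the padding choice are our assumptions.

```python
import torch
import torch.nn.functional as F

def sobel_gradient(img):
    """Approximate gradient magnitude G = |Gx*A| + |Gy*A| of Eq. (3).

    img: grayscale tensor of shape (B, 1, H, W).
    """
    gx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    gy = torch.tensor([[-1., -2., -1.],
                       [ 0.,  0.,  0.],
                       [ 1.,  2.,  1.]]).view(1, 1, 3, 3)
    grad_x = F.conv2d(img, gx, padding=1)   # horizontal response Gx * A
    grad_y = F.conv2d(img, gy, padding=1)   # vertical response Gy * A
    return grad_x.abs() + grad_y.abs()      # simplified form of Eq. (2)
```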

Fig. 2. Fractional edge. (a) Amplitude-frequency curve. (b) Original image. (c) Object mask. (d) Integral edge (Sobel). (e) Fractional edge (Tiansi).

Fractional Edge: Figure 2(a) shows the amplitude-frequency curves of the second derivative, the first derivative, and the [0, 1] fractional-order derivative. Different orders of differentiation weaken low-frequency signals to different degrees, and their amplification of high-frequency signals also grows nonlinearly to different degrees. The fractional-order derivative between [0, 1] enhances intermediate- and high-frequency signals while retaining the nonlinear information of low-frequency signals.

$$\begin{bmatrix} \frac{v^2-v}{2} & 0 & \frac{v^2-v}{2} & 0 & \frac{v^2-v}{2} \\ 0 & -v & -v & -v & 0 \\ \frac{v^2-v}{2} & -v & 8 & -v & \frac{v^2-v}{2} \\ 0 & -v & -v & -v & 0 \\ \frac{v^2-v}{2} & 0 & \frac{v^2-v}{2} & 0 & \frac{v^2-v}{2} \end{bmatrix} \quad (4)$$

Equation (4) is the fractional-order differential Tiansi operator [22]. It computes the fractional derivative separately in eight directions, all with the same rotation direction. However, the Tiansi operator cannot achieve good results in edge-transition areas and smooth areas. To solve this problem, we propose a method to adaptively calculate the fractional order in [0, 1] based on image content. The fractional order v relates to the gradient and spatial frequency of the image: the larger the average gradient and spatial frequency of a region, the more likely it is an edge that needs to be enhanced. Conversely, when the average gradient and spatial frequency of a region are small, the region is more likely to be smooth, and some of its detailed information should be preserved during the calculation.


Therefore, an adaptive [0, 1]-order fractional operator is proposed by calculating the average gradient and spatial frequency of the image and constructing a mapping function that relates them to the order v. The input image is divided into image blocks of width M and height N. The spatial frequency SF(i, j) and the average gradient G(i, j) at pixel (i, j) of the input image are calculated as in Eqs. (5)–(8):

$$SF(i,j) = \sqrt{CF^2(i,j) + RF^2(i,j)} \quad (5)$$

$$CF(i,j) = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(F(i,j) - F(i,j+1)\big)^2} \quad (6)$$

$$RF(i,j) = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(F(i,j) - F(i+1,j)\big)^2} \quad (7)$$

$$G(i,j) = \frac{1}{(M-1)(N-1)}\sum_{i=1}^{M-1}\sum_{j=1}^{N-1}\sqrt{\frac{\big(F(i,j)-F(i,j+1)\big)^2 + \big(F(i,j)-F(i+1,j)\big)^2}{2}} \quad (8)$$

where CF(i, j) and RF(i, j) are the column frequency and row frequency at pixel (i, j), respectively. After calculating the spatial frequency SF(i, j) and the average gradient G(i, j), the order v is obtained through a mapping function, given in Eqs. (9) and (10):

$$Y = \frac{1}{\pi}\tan(SF)\cdot\tan(G) \quad (9)$$

$$v = (\alpha - \beta)\cdot\frac{\tanh(Y) - \min(Y)}{\max(Y) - \min(Y)} + \beta \quad (10)$$

where α is set to 0.7 and β to 0.5 in Eq. (10); these two parameters limit the range of the fractional-order parameter v. Compared with the integer-order operator shown in Fig. 2, the fractional operator better detects the weak-contrast edges between the camouflaged object and the background, and its output can be used as prior supervision to improve the camouflaged object detection model.
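The adaptive-order computation of Eqs. (5)–(10) can be sketched as follows; the block size, the border handling, and applying the min/max normalization of Eq. (10) over all blocks of one image are our assumptions, since the paper does not specify them.

```python
import numpy as np

def adaptive_orders(image, block=8, alpha=0.7, beta=0.5):
    """Per-block fractional order v following Eqs. (5)-(10).

    image: 2-D grayscale array, assumed to be at least one block in each
    dimension. Returns one order value per block, constrained to [beta, alpha].
    """
    F = image.astype(np.float64)
    H, W = F.shape
    ys = []
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            B = F[r:r + block, c:c + block]
            M, N = B.shape
            cf = np.sqrt(((B[:, :-1] - B[:, 1:]) ** 2).sum() / (M * N))   # Eq. (6)
            rf = np.sqrt(((B[:-1, :] - B[1:, :]) ** 2).sum() / (M * N))   # Eq. (7)
            sf = np.sqrt(cf ** 2 + rf ** 2)                               # Eq. (5)
            dx = B[:-1, :-1] - B[:-1, 1:]
            dy = B[:-1, :-1] - B[1:, :-1]
            g = np.sqrt((dx ** 2 + dy ** 2) / 2).mean()                   # Eq. (8)
            # Eq. (9); tan may be unstable for large SF or G, which the
            # paper does not discuss.
            ys.append(np.tan(sf) * np.tan(g) / np.pi)
    ys = np.tanh(np.array(ys))
    # Eq. (10): normalize and map into [beta, alpha].
    return (alpha - beta) * (ys - ys.min()) / (ys.max() - ys.min() + 1e-8) + beta
```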

3.2 Camouflaged Edge Detection Module

The proposed CEDM consists of two branches: a horizontal branch and a vertical branch. First, the input images are sent into convolutional layers with kernel sizes of 1 × 7 and 7 × 1, extracting low-level features along the horizontal and vertical directions, respectively. Then, a ResBlock reduces the number of channels. The top-level feature map X4 generated by ResNet-50 has rich semantic information, which


is used to help locate camouflaged objects. To this end, we design a selection-enhancement module to integrate features and improve representation ability. The Sobel operator yields information with abundant edge details, while the improved fractional-order Tiansi operator better retains low-frequency texture details. Finally, through channel attention, the predicted edge map guides CODM to detect objects better. The resulting workflow, shown in Fig. 4, is named the interactive task learning network (ITLNet).

Fig. 4. The proposed interactive task learning network (ITLNet).

4 Performance Evaluation

4.1 Datasets and Experiment Settings

We carry out our experiments on three camouflaged datasets: CAMO, CHAMELEON, and COD10K. The evaluation criteria are MAE (mean absolute error), S_α (structure measure), E_φ (enhanced-alignment measure), and F_β^ω (weighted F-measure). All experiments are run on a TITAN RTX. To verify the effectiveness of the proposed CEDM, we select three camouflaged object detection models as the CODM module: SINet, PFNet, and C2FNet. The predicted edge maps generated by CEDM are passed through a sigmoid function and used as weights to guide the training of CODM (the weight matrix is denoted r_edge in the following). Through interactive guidance, a prediction map with precise edges is generated.

4.2 Quantitative Evaluation

Table 1 presents quantitative comparisons of multiple models on three datasets over four evaluation metrics. *+I_CEDM and *+IF_CEDM in Table 1 denote the use of integral edges and integral+fractional edges, respectively. The detection performance of the *+I_CEDM and *+IF_CEDM models proposed in this paper is significantly superior to the SOD and GOD models (the 2nd–6th


rows in Table 1). Compared to the end-to-end COD benchmark models, the multi-task interaction network proposed in this paper performs better. For example, C2FNet+IF_CEDM shows average improvements of 0.58%, 1.22%, 1.01%, and 6.64% over C2FNet on the four evaluation metrics; on the COD10K dataset, the four evaluation metrics of SINet+IF_CEDM increase by 0.52%, 3.23%, 8.89%, and 15.7% over SINet, respectively. According to S_α, the proposed method detects the edges of camouflaged objects more accurately. Finally, introducing the [0, 1]-order fractional differentiation to compensate for the shortcomings of the Sobel operator improves the model on all four evaluation metrics, which demonstrates the effectiveness of fractional edges in this task.

Table 1. Quantitative comparison results.

| Method | COD10K S_α↑ | COD10K E_φ↑ | COD10K F_β^ω↑ | COD10K M↓ | CAMO S_α↑ | CAMO E_φ↑ | CAMO F_β^ω↑ | CAMO M↓ | CHAMELEON S_α↑ | CHAMELEON E_φ↑ | CHAMELEON F_β^ω↑ | CHAMELEON M↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Segment Anything [23] | 0.783 | 0.798 | 0.701 | 0.050 | 0.684 | 0.687 | 0.606 | 0.132 | 0.727 | 0.734 | 0.639 | 0.081 |
| FPN [19] | 0.697 | 0.691 | 0.411 | 0.075 | 0.684 | 0.677 | 0.483 | 0.131 | 0.794 | 0.783 | 0.590 | 0.075 |
| MaskRCNN [15] | 0.613 | 0.748 | 0.402 | 0.080 | 0.574 | 0.715 | 0.430 | 0.151 | 0.643 | 0.778 | 0.518 | 0.099 |
| Unet++ [20] | 0.623 | 0.672 | 0.350 | 0.086 | 0.599 | 0.653 | 0.392 | 0.149 | 0.695 | 0.762 | 0.501 | 0.094 |
| MSRCNN [21] | 0.641 | 0.706 | 0.419 | 0.073 | 0.617 | 0.669 | 0.454 | 0.133 | 0.637 | 0.686 | 0.443 | 0.091 |
| PFANet [16] | 0.636 | 0.618 | 0.286 | 0.128 | 0.659 | 0.622 | 0.391 | 0.172 | 0.679 | 0.648 | 0.378 | 0.144 |
| EGNet [17] | 0.737 | 0.779 | 0.509 | 0.056 | 0.732 | 0.768 | 0.583 | 0.104 | 0.848 | 0.870 | 0.702 | 0.050 |
| SINet [6] | 0.771 | 0.806 | 0.551 | 0.051 | 0.751 | 0.771 | 0.606 | 0.100 | 0.869 | 0.891 | 0.740 | 0.044 |
| SINet+I_CEDM | 0.773 | 0.827 | 0.587 | 0.044 | 0.749 | 0.773 | 0.610 | 0.098 | 0.875 | 0.909 | 0.792 | 0.037 |
| SINet+IF_CEDM | 0.775 | 0.832 | 0.600 | 0.043 | 0.733 | 0.773 | 0.612 | 0.098 | 0.877 | 0.916 | 0.797 | 0.035 |
| PFNet [18] | 0.800 | 0.868 | 0.660 | 0.040 | 0.782 | 0.852 | 0.695 | 0.085 | 0.882 | 0.942 | 0.810 | 0.033 |
| PFNet+I_CEDM | 0.809 | 0.876 | 0.664 | 0.038 | 0.785 | 0.860 | 0.700 | 0.082 | 0.886 | 0.936 | 0.819 | 0.032 |
| PFNet+IF_CEDM | 0.812 | 0.879 | 0.667 | 0.037 | 0.789 | 0.863 | 0.702 | 0.081 | 0.886 | 0.936 | 0.824 | 0.032 |
| C2FNet [4] | 0.813 | 0.809 | 0.686 | 0.036 | 0.796 | 0.854 | 0.719 | 0.080 | 0.888 | 0.935 | 0.828 | 0.032 |
| C2FNet+I_CEDM | 0.815 | 0.896 | 0.690 | 0.034 | 0.799 | 0.859 | 0.722 | 0.077 | 0.893 | 0.944 | 0.832 | 0.030 |
| C2FNet+IF_CEDM | 0.816 | 0.899 | 0.693 | 0.033 | 0.799 | 0.864 | 0.724 | 0.076 | 0.897 | 0.949 | 0.839 | 0.029 |

4.3 Qualitative Evaluation

SINet: SINet is the first benchmark end-to-end camouflaged object detection model. It divides the detection process into two stages, target search and recognition, and its structural design is robust and general. SINet generates two camouflage maps: a rough prediction map C_s and a refined result C_i. We compute C_s = C_s * r_edge + C_s, supplementing structural details so that the network is optimized as much as possible within the target during the recognition phase.
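The residual edge re-weighting shared by the three CODM variants (e.g., C_s = C_s * r_edge + C_s above) can be sketched as follows; resizing the edge logits to the feature resolution is our assumption.

```python
import torch
import torch.nn.functional as F

def edge_enhance(feature, edge_logits):
    """Residual edge re-weighting: feature = feature * r_edge + feature.

    feature:     (B, C, H, W) prediction map or intermediate feature.
    edge_logits: (B, 1, h, w) raw CEDM output, squashed by a sigmoid to
                 obtain the weight matrix r_edge.
    """
    r_edge = torch.sigmoid(
        F.interpolate(edge_logits, size=feature.shape[-2:],
                      mode='bilinear', align_corners=False))
    return feature * r_edge + feature
```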


Figure 5 shows visual examples. Overall, ITLNet (Fig. 5(d) and (e)) is significantly better than SINet. Specifically, whether the image contains a single camouflaged object or multiple ones, SINet is prone to incomplete detection, blurred edges, and misjudgments inside the detected object. ITLNet based on the Sobel edge can focus more on the interior of the object during training and optimization, guided by the interactive learning of rich edge and semantic information, which solves these problems well. However, due to the limitations of the Sobel operator in grayscale-smooth regions, some edge regions of the predicted map remain blurry. We therefore compensate for this defect by introducing the [0, 1]-order fractional differentiation, as shown in Fig. 5(e). The improved model can effectively recognize complete edges in areas where foreground and background are highly confusable, such as the adjacent parts of the two fishes in the second row of Fig. 5.

Fig. 5. Results of detection by adding differential operators. We choose SINet as our comparison network. From left to right: (a) original image, (b) ground truth, (c) SINet, (d) SINet+I_CEDM, (e) SINet+IF_CEDM.

PFNet: PFNet divides the whole detection process into two stages, positioning and focusing. It adopts a U-shaped network architecture to achieve rough positioning and focus refinement of camouflaged objects step by step, generating four side-output prediction maps c_ri (i = 1, 2, 3, 4). Similar to C2FNet as the reference network, in this experiment r_edge and c_ri are multiplied element-wise and then added, i.e., c_ri = c_ri * r_edge + c_ri (i = 1, 2, 3, 4), to obtain a selection-enhanced prediction map with more accurate edges. According to Fig. 6, the PFNet predictions shown in the second, fourth, and fifth rows are affected by background interference from surrounding regions with highly similar colors and texture structures. The camouflaged object in the fifth row is missed in the area overlapping the background, resulting in an incomplete object. We


construct an edge detection module by combining a shallow convolutional network with traditional hard-coded operators, retaining as much contour information as possible, and conduct interactive learning with PFNet to improve detection completeness and accuracy. The results are shown in Fig. 6(d) and (e).

Fig. 6. Results of PFNet. (a) original image, (b) ground truth, (c) PFNet, (d) PFNet+I_CEDM, (e) PFNet+IF_CEDM.

C2FNet: This model fully integrates feature maps at different levels and considers rich global context information, achieving camouflaged object detection through two sets of cross-layer feature integration modules (ACFM) and context-aware modules (DGCM). However, among the five levels of feature maps produced by the feature extraction network, it only integrates mid- to high-level semantic features and completely discards the low-level features, losing the rich structural details they contain and leading to issues such as blurred or erroneous edges in the predicted camouflage map. In this experiment, the r_edge-processed feature maps X43 and X432 used by each DGCM are enhanced and optimized, i.e., X_i = X_i * r_edge + X_i, i = 43, 432. Figure 7 shows some visual examples selected from the COD10K test set. It can be seen that the prediction maps generated by C2FNet perform poorly in the foreground-background gradient-smooth areas mentioned earlier, and parts of the prediction show no obvious contour. The results of the proposed ITLNet (Fig. 7(d) and (e)) are superior to C2FNet in object completeness and edge accuracy.


Fig. 7. Results of C2FNet. (a) original image, (b) ground truth, (c) C2FNet, (d) C2FNet+I_CEDM, (e) C2FNet+IF_CEDM.

4.4 Generalization of Edge Detection

Many object detection tasks, including camouflaged object detection, face the common problem of edge detection accuracy. To further analyze the effectiveness of the proposed CEDM, we separate it out as an end-to-end edge detection network. Figure 8 shows some edge detection results on the COD and SOD test datasets. CEDM achieves good results for both salient and camouflaged objects, so it has broader applications in the field of object segmentation and detection.


Fig. 8. Edge detection results of the proposed CEDM on COD and SOD dataset.

5 Conclusion

In this paper, the problem of blurred edges in camouflaged object detection results is studied in depth, and an edge-perception-based camouflaged object detection model, ITLNet, is proposed. The model captures rich semantic information and low-level details of the original image through two interactively guided task modules, CEDM and CODM, to achieve highly accurate detection results. We use the Sobel operator to compensate for the loss of low-level structural information caused by deepening network layers, use the improved Tiansi operator to handle gradient-smooth areas and weak edge areas, and adaptively generate the parameter v according to the gradient and spatial frequency of the image. After applying CEDM to several camouflage detection models, the experimental results verify the progressive nature of the proposed ITLNet model.

Acknowledgements. This work was supported by the project of “Research and Demonstration of Key Technologies for Smart Planting of Wine Grapes in Ningxia” under grant NKYG-23-02, and by the National Natural Science Foundation of China under grant 12071218.


References

1. Fan, D.-P., et al.: PraNet: parallel reverse attention network for polyp segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12266, pp. 263–273. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59725-2_26
2. Shamim, S., Awan, M.J., Zain, A.M., Naseem, U., Mohammed, M.A., Begonya, G.Z.: Automatic Covid-19 lung infection segmentation through modified UNet model. J. Healthc. Eng. 2022, Article ID 6566982 (2022)
3. Barbosa, A., et al.: Cuttlefish camouflage: the effects of substrate contrast and size in evoking uniform, mottle or disruptive body patterns. Vision. Res. 48(10), 1242–1253 (2008)
4. Chen, G., Liu, S.J., Sun, Y.J., Ji, G.P., Wu, Y.F., Zhou, T.: Camouflaged object detection via context-aware cross-level fusion. IEEE Trans. Circuits Syst. Video Technol. 32(10), 6981–6993 (2022)
5. Yang, F., et al.: Uncertainty-guided transformer reasoning for camouflaged object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4146–4155. IEEE, Virtual (2021)
6. Fan, D.P., Ji, G.P., Sun, G.L., Cheng, M.M., Shen, J.B., Shao, L.: Camouflaged object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2777–2787, Virtual (2020)
7. Chen, Z.X., Zhou, H.J., Lai, J.H., Yang, L.X., Xie, X.H.: Contour-aware loss: boundary-aware learning for salient object segmentation. IEEE Trans. Image Process. 30, 431–443 (2021)
8. Huang, Z., Xiang, T.Z., Chen, H.X., Dai, H.: Scribble-based boundary-aware network for weakly supervised salient object detection in remote sensing images. arXiv preprint arXiv:2202.03501 (2022)
9. Feng, M.Y., Lu, H.C., Ding, E.R.: Attentive feedback network for boundary-aware salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1623–1632. IEEE, Long Beach (2019)
10. Xu, X.Q., Zhu, M.Y., Yu, J.H., Chen, S.H., Hu, X.L., Yang, Y.Q.: Boundary guidance network for camouflage object detection. Image Vis. Comput. 114, 104283 (2021)
11. Ji, G.P., Zhu, L., Zhuge, M.C., Fu, K.R.: Fast camouflaged object detection via edge-based reversible re-calibration network. Pattern Recogn. 123, 108414 (2022)
12. Zhai, Q., Li, X., Yang, F., Chen, C.L.Z., Cheng, H., Fan, D.P.: Mutual graph learning for camouflaged object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12997–13007. IEEE, Virtual (2021)
13. Chen, T.Y., Xiao, J., Hu, X.G., Zhang, G.F., Wang, S.J.: Boundary-guided network for camouflaged object detection. Knowl.-Based Syst. 248, 108901 (2022)
14. Gao, W.S., Zhang, X.G., Yang, L., Liu, H.Z.: An improved Sobel edge detection. In: 3rd International Conference on Computer Science and Information Technology, pp. 67–71. IEEE, Chengdu (2010)
15. He, K.M., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969. IEEE, Venice (2017)
16. Zhao, T., Wu, X.Q.: Pyramid feature attention network for saliency detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3085–3094. IEEE, Long Beach (2019)
17. Zhao, J.X., Liu, J.J., Fan, D.P., Cao, Y., Yang, J.F., Cheng, M.M.: EGNet: edge guidance network for salient object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8779–8788. IEEE, Seoul (2019)
18. Zhang, J.C., Shao, J.B., Chen, J.L., Yang, D.G., Liang, B.G., Liang, R.G.: PFNet: an unsupervised deep network for polarization image fusion. Opt. Lett. 4(6), 1507–1510 (2020)


19. Lv, Y., et al.: Simultaneously localize, segment and rank the camouflaged objects. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11591–11601. IEEE, Virtual (2021)
20. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: a nested U-Net architecture for medical image segmentation. In: Stoyanov, D., et al. (eds.) DLMIA/ML-CDS-2018. LNCS, vol. 11045, pp. 3–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_1
21. Huang, Z., Huang, L., Gong, Y., Huang, C., Wang, X.: Mask scoring R-CNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6409–6418. IEEE, Long Beach (2019)
22. Yang, Z.Z., Zhou, J.L., Huang, M., Yan, Y.X.: Fractional differentiation for edge extraction. Comput. Eng. Appl. 43(35), 15–18 (2007)
23. Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)

DecTrans: Person Re-identification with Multifaceted Part Features via Decomposed Transformer

Yan Zhang, Guangyu Gao(B), Qianxiang Wang, and Jing Ge

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
[email protected]

Abstract. Utilizing part-level features provides a more detailed representation, leading to improved results in person re-identification (ReID). Yet existing works either use external tasks like pose estimation or struggle to define part features, which limits the model's learning capability. In this work, we propose the Decomposed Transformer (DecTrans), a transformer-based person ReID framework which exploits multifaceted part features. In particular, DecTrans extracts local features using the Vision Transformer (ViT) and then maps them into latent parts through a novel Token Decomposition (TD) layer. In the TD layer, soft clustering of ViT tokens forms clusters, and each token is decomposed into components based on its similarity to all cluster centroids. Token components referencing the same cluster are then regrouped to produce part features, thereby retaining more feature details. To ensure that tokens from different pedestrians but referring to the same part are sufficiently clustered together, we propose to remove id information from tokens before clustering. Besides, we also propose a simple yet efficient data augmentation named Image Graying, which has been experimentally validated when used in conjunction with the TD layer. DecTrans achieves remarkable performance, e.g., mAP and Rank1 of 70.8% & 87.1% on MSMT17 and 61.6% & 67.7% on Occluded-Duke, significantly outperforming state-of-the-art methods.

Keywords: Person ReID · Vision Transformer

1 Introduction

Person re-identification (ReID) verifies pedestrians' identities captured by non-overlapping cameras. Some approaches adopt global features characterizing the entire body [34,37]. However, this representation lacks fine-grained information and has limited discriminative power for similar-looking pedestrians with different IDs, especially in occluded scenarios. Alternatively, part-to-part matching-based ReID methods, such as [18,22], have shown superior performance.


Fig. 1. Illustration of part feature responses. (a) Part features by uniform partitioning. (b) Part features based on pose estimation. (c) Latent part features by our DecTrans.

These methods aim to estimate critical semantic parts and extract well-aligned part features. Some [7,18,25] utilize body-part features from external tasks like pose estimation. However, the reliability of such tasks is uncertain, and the features might not adapt well to ReID due to cross-task differences. Others [22,26] extract part features directly on uniform grids of pedestrian images. Additionally, Sun et al. [22] refine part pooling to assign outliers to the nearest part, enhancing within-part consistency. However, due to diverse poses and non-rigid body deformations, these models struggle with accurate part alignment. Recently, transformers have made strides in person ReID; for example, TransReID [11] achieves notable results by randomly rearranging and grouping patches within each pedestrian image. This haphazard grouping lacks a specific meaning for each patch group and does not ensure group alignment for matching purposes. Based on these insights, we propose applying clustering to guide transformer tokens into a series of latent parts, as depicted in Fig. 1. Specifically, we employ soft k-means clustering [8] on images within each training batch to ensure gradient backpropagation. To enhance cluster centroid stability, we integrate the dynamic routing mechanism [21]. Through soft clustering, we aim for tokens corresponding to the same latent part across pedestrians to cluster around centroids with consistent semantics, despite their distinct IDs. Therefore, we refine each token by stripping its ID information, creating an id-irrelevant token. This involves subtracting its projection in the direction of the ID vector. We integrate id-irrelevant token refinement, soft clustering with dynamic routing, and cluster-based token decomposition into a unified Token Decomposition (TD) layer. This layer more explicitly guides tokens towards relevant latent parts with multifaceted representations. Finally, we construct the Decomposed Transformer (DecTrans), comprising a transformer token extractor, the TD layer, parts-based classification heads, and a global classification head featuring the common [class] token, as shown in Fig. 2. Our main contributions can be summarized as follows: i) We introduce a plug-in Token Decomposition (TD) layer rooted in soft clustering. This layer decomposes local features (tokens) into groups of feature components aligned with latent parts, facilitated by dynamic routing. ii) We introduce the notion of id-irrelevant tokens within the TD layer. This involves removing ID information, or splitting each token into two vectors for clustering: an id vector and a


purely latent part vector. iii) DecTrans achieves state-of-the-art performance on prevalent ReID datasets by synergizing the TD layer with a simple data augmentation technique called Image Graying.

2 Related Work

To tackle challenging issues in person ReID, e.g., background interference, partial occlusion, viewpoint change, and pose change, previous works focus on extracting discriminative features. Some works capture contextual information from a global perspective [34,37], while others attain fine-grained features from local image parts [7,18,22,25,26]. For example, inspired by the spatial structure of the human body, some works [22,26] use horizontal partitioning to obtain fine-grained parts. However, a prerequisite assumption is that the body parts of pedestrians are relatively aligned; otherwise, spatial semantic misalignment arises. To achieve part alignment, external cues are harnessed, e.g., pose estimation [7,18] and human parsing [25], but this significantly increases model complexity and harms efficiency. The Transformer [24] was originally proposed for natural language processing and can effectively process sequence data. Later, many works [9,14] have also proved its effectiveness in computer vision. The success of the Vision Transformer (ViT) [5] proves that a pure transformer-based architecture can effectively perform image classification. Hence, many research efforts have explored transformer-based models for various vision tasks [1,17,27,39]. In person re-identification, [11] utilizes a pure transformer with side-information embedding and a Jigsaw patch module to learn reliable feature representations; [38] adds a “partially marked” learnable vector to learn discriminative features and integrates part arrangement into self-attention; [16] proposes a novel end-to-end Part-Aware Transformer (PAT) to discover diverse discriminative human parts with a set of learnable part prototypes; [31] uses a Transformer to aggregate multi-scale features of pedestrian images.

3 Methodology

3.1 Vision Transformer as Feature Extractor

Following TransReID [11], the image $I \in \mathbb{R}^{H \times W \times C}$ is divided into $N-1$ patches $\{I_p^i \mid i = 1, 2, \ldots, N-1\}$. These patches undergo flattening and linear projection, yielding $N-1$ patch embeddings, together with an extra learnable [class] token that serves as the global pedestrian representation. The position and side-information embeddings [11] are appended to the patch embeddings to form $N$ initial tokens, i.e., the [class] token $x_1^0$ and $N-1$ patch tokens $\{x_i^0 \mid i = 2, 3, \ldots, N\}$. These initial tokens traverse $l-1$ transformer layers, resulting in encoded tokens $x^{l-1} = \{x_i^{l-1} \mid i = 1, 2, \ldots, N\}$, where $x_1^{l-1}$ denotes the encoded [class] token.
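A minimal sketch of this tokenization is given below, using the patch size 16 and sliding-window stride 12 reported in Sect. 4.2; the linear projection and [class] token are created inline only for illustration, and the position and side-information embeddings are omitted.

```python
import torch
import torch.nn.functional as F

def tokenize(img, patch=16, stride=12, dim=768):
    """Overlapping-patch tokenization with a prepended [class] token.

    img: (B, C, H, W) pedestrian image batch.
    Returns tokens of shape (B, N, dim), where index 0 is the [class] token.
    """
    B = img.shape[0]
    # (B, C*patch*patch, N-1): overlapping patches via a sliding window.
    patches = F.unfold(img, kernel_size=patch, stride=stride).transpose(1, 2)
    proj = torch.nn.Linear(patches.shape[-1], dim)   # learnable in practice
    cls_token = torch.zeros(B, 1, dim)               # learnable in practice
    tokens = torch.cat([cls_token, proj(patches)], dim=1)
    # Position and side-information embeddings would be added here.
    return tokens
```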


Fig. 2. Overall architecture of the proposed DecTrans. We first extract Vision Transformer (ViT) tokens as local features, and then design the Token Decomposition (TD) layer to generate part features via soft clustering with dynamic routing on these tokens.

3.2 Token Decomposition (TD) Layer

Soft Clustering with Dynamic Routing. Soft k-means clustering starts with an initial estimation of K centroids, which can be manually defined or randomly selected, and iteratively updates them through similarity calculation. To stabilize the cluster centroids, we adopt dynamic routing [21], which, facilitated by a trainable projection matrix, ensures consistent and stable feature mapping. For each encoded token $x_i^{l-1}$ (abbr. $x_i$), we project it onto K prediction vectors $\hat{x}_{k|i}$ using the kth cluster-oriented trainable projection matrix $W_k$. The centroid $c_k$ is a weighted sum over $\hat{x}_{k|i}$ from all the input tokens: $c_k = \sum_i s_{k|i}\,\hat{x}_{k|i}$, $\hat{x}_{k|i} = W_k x_i$, where $s_{k|i}$ is the scaled similarity to the kth centroid, serving as the coupling coefficient. The coupling coefficients for $x_i$ across clusters sum to 1 and are calculated using iterative dynamic routing:

$$s_{k|i}^{t+1} = \mathrm{softmax}\Big(W_k x_i \cdot \frac{c_k^t}{\|c_k^t\|}\Big), \qquad c_k^{t+1} = \frac{\sum_{i=1}^{N} s_{k|i}^t\,\hat{x}_{k|i}}{\sum_{i=1}^{N} s_{k|i}^t} \quad (1)$$

where $c_k^{t+1}$ represents the kth centroid vector in the $(t+1)$th iteration. We propose a novel Token Decomposition (TD) layer to direct local features toward latent parts, as outlined in Algorithm 1. Centroids start as the mean of token components, $c_k^0 = \frac{1}{N-1}\sum_{i=2}^{N} W_k x_i$, and we iteratively update the centroids and coupling coefficients. This involves the cosine similarity between centroids and token projections onto those centroids via the projection matrix. Rather than using the cluster centroids as part features, we decompose each input token into K components according to its similarity to each centroid, and group the components referring to the same centroid together.


Fig. 3. Illustration of id-irrelevant tokens generation.

Finally, the [class] token and the decomposed components $\{r_k(i)\,x_i\}_{i=2}^{N}$ are concatenated into $Z_k$, which goes through the transformer to generate the part feature.

Id-irrelevant Tokens. The tokens of an image share the same id information, whereas we expect to cluster them into different latent parts, each with a specific semantic. For different images (ids), we expect the cluster centroids with the same semantics to be close to each other. See the illustration in Fig. 3, where one color denotes one id while different shapes represent different semantics.

Algorithm 1. Token Decomposition Layer
Input: Image tokens X = {x_i} ∈ R^{N×C}, number of clusters K, projection matrices W ∈ R^{K×C×C}.
Output: Component groups Z_k ∈ R^{N×C}, k = 1, ..., K.
1: for i = 2 to N do
2:   x̂_i = x_i − µ(x_i^T x̄) x̄/||x̄||,  x̂_i = normalize(x̂_i)
3: end for
4: for k = 1 to K do
5:   Centroid initialization: c_k^0 = 1/(N−1) Σ_{i=2}^{N} W_k x̂_i
6: end for
7: for t = 1 to T do
8:   r_k^t = softmax(W_k x̂_i · u_k^{t−1}/||u_k^{t−1}||)
9:   u_k^t = Σ_{i=2}^{N} r_k^t(i) W_k x̂_i / Σ_{i=2}^{N} r_k^t(i)
10: end for
11: for k = 1 to K do
12:   Z_k = [x_1, {r_k(i) x_i}_{i=2}^{N}]
13: end for
14: return Z = {Z_k}
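A minimal PyTorch sketch of Algorithm 1 is given below; the default values of the removing ratio µ and the number of routing iterations T, as well as the random initialization of the projection matrices, are our assumptions rather than settings reported here.

```python
import torch
import torch.nn.functional as F

class TokenDecomposition(torch.nn.Module):
    """Id-irrelevant refinement, soft clustering with dynamic routing,
    and cluster-based token decomposition (Algorithm 1)."""

    def __init__(self, dim, num_clusters=4, iters=3, mu=1.0):
        super().__init__()
        self.K, self.T, self.mu = num_clusters, iters, mu
        # One trainable projection matrix W_k per cluster.
        self.W = torch.nn.Parameter(torch.randn(num_clusters, dim, dim) * dim ** -0.5)

    def forward(self, tokens):            # tokens: (B, N, C); tokens[:, 0] is [class]
        cls, x = tokens[:, :1], tokens[:, 1:]
        # Id-irrelevant refinement (Eq. (2) and Algorithm 1, line 2).
        id_vec = x.mean(dim=1, keepdim=True)
        dot = (x * id_vec).sum(dim=-1, keepdim=True)              # x_i^T x_bar
        x_hat = x - self.mu * dot * id_vec / id_vec.norm(dim=-1, keepdim=True)
        x_hat = F.normalize(x_hat, dim=-1)
        # Project every token with every W_k: (B, K, N-1, C).
        xw = torch.einsum('kcd,bnd->bknc', self.W, x_hat)
        u = xw.mean(dim=2)                                        # centroid init c_k^0
        for _ in range(self.T):                                   # dynamic routing
            r = torch.einsum('bknc,bkc->bkn', xw, F.normalize(u, dim=-1))
            r = r.softmax(dim=1)                                  # couple across clusters
            u = (r.unsqueeze(-1) * xw).sum(dim=2) / r.sum(dim=2, keepdim=True).clamp_min(1e-6)
        # Decompose: weight the original tokens by r_k(i) and prepend [class].
        parts = r.unsqueeze(-1) * x.unsqueeze(1)                  # (B, K, N-1, C)
        return [torch.cat([cls, parts[:, k]], dim=1) for k in range(self.K)]
```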

We propose the concept of the id-irrelevant token, i.e., removing id information from tokens, and then effectively clustering the refined feature vectors. In particular, the most intuitive choice for the id vector is to use the [class] token. However, the global [class] token focuses on representing id information for salient regions of the body rather than for all tokens. Therefore, the average

34

Y. Zhang et al.

of all tokens except the [class] token, which is closer to the axis, is adopted as the id vector $\bar{x}$ to be removed from the other tokens:

$$\bar{x} = \frac{1}{N-1}\sum_{i=2}^{N} x_i. \quad (2)$$

Then, the id-irrelevant token is expressed as $\hat{x}_i = x_i - \mu\,(x_i^{T}\bar{x})\,\frac{\bar{x}}{\|\bar{x}\|}$, $i = 2, \ldots, N$, where $\mu \in [0, 1]$ represents the removing ratio. As shown in Fig. 3, after the refinement by removing id information, tokens with the same semantics (shapes) are closer, to whatever id (color) they have.

3.3 Data Augmentation for TD Layer

We empirically find that most wrongly retrieved samples have colors similar to the query, as shown in Fig. 4(a). Thus, we propose a data augmentation method called Image Graying to alleviate the impact of color information. As shown in Fig. 4(b), the wrongly retrieved samples are sorted to lower ranks with Image Graying. We perform the Image Graying operation with a probability of 0.1, where the gray value is calculated as Gray = 0.299R + 0.587G + 0.114B. In addition, we also use a data augmentation method called CutMix [30], which replaces part of an image with the same part of another image; the final label is weighted according to the cropping ratio, as shown in Fig. 4(c).

Fig. 4. Data augmentation operations. (a) Results before Image Graying. (b) Results after Image Graying. (c) Principle of CutMix operation.
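The two augmentations can be sketched as follows; the uniform sampling of the CutMix area ratio and the use of one-hot (or soft) label vectors are our assumptions.

```python
import random
import torch

def image_graying(img, p=0.1):
    """With probability p, replace RGB with Gray = 0.299R + 0.587G + 0.114B,
    replicated over the three channels. img: (3, H, W) tensor in RGB order."""
    if random.random() < p:
        gray = 0.299 * img[0] + 0.587 * img[1] + 0.114 * img[2]
        img = gray.expand(3, -1, -1).clone()
    return img

def cutmix(img_a, img_b, label_a, label_b):
    """Paste a random rectangle of img_b into img_a and weight the one-hot
    labels by the uncropped area ratio."""
    _, H, W = img_a.shape
    lam = random.random()                               # area ratio to keep
    cut_h, cut_w = int(H * (1 - lam) ** 0.5), int(W * (1 - lam) ** 0.5)
    y, x = random.randint(0, H - cut_h), random.randint(0, W - cut_w)
    mixed = img_a.clone()
    mixed[:, y:y + cut_h, x:x + cut_w] = img_b[:, y:y + cut_h, x:x + cut_w]
    lam_adj = 1 - (cut_h * cut_w) / (H * W)             # actual kept ratio
    return mixed, lam_adj * label_a + (1 - lam_adj) * label_b
```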

3.4 Training and Inference

We optimize the whole network by constructing an ID loss and a triplet loss on both the global feature and the latent part features. The ID loss $L_{ID}$ is the cross-entropy loss without label smoothing:

$$L_{ID} = -\sum_{i=1}^{N} y_i \log p_i \quad (3)$$


where $y_i$ is the ground truth, $p_i$ is the predicted probability, and $N$ is the number of classes. For a set of triplets $(a, p, n)$, the triplet loss $L_T$ with soft margin is

$$L_T = \log\big(1 + \exp(\|f_a - f_p\|_2^2 - \|f_a - f_n\|_2^2)\big). \quad (4)$$

Finally, DecTrans outputs both the global feature $f_g$ and $K$ local features referring to the $K$ latent parts, $\{f_l^1, f_l^2, \ldots, f_l^K\}$, and the overall loss is

$$L = L_{ID}(f_g) + L_T(f_g) + \frac{1}{K}\sum_{j=1}^{K}\big(L_{ID}(f_l^j) + L_T(f_l^j)\big) \quad (5)$$

In inference, we concatenate the global and part features from two images and calculate their cosine similarity for ranking.
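The overall objective of Eq. (5) can be sketched as follows; the in-batch triplet mining that produces the (anchor, positive, negative) indices is assumed and not shown.

```python
import torch
import torch.nn.functional as F

def id_loss(logits, labels):
    """Cross-entropy ID loss of Eq. (3), without label smoothing."""
    return F.cross_entropy(logits, labels)

def triplet_loss_soft(anchor, positive, negative):
    """Soft-margin triplet loss of Eq. (4): log(1 + exp(d(a,p) - d(a,n)))."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.softplus(d_ap - d_an).mean()

def overall_loss(global_logits, part_logits, global_feat, part_feats,
                 labels, triplets):
    """Eq. (5): global losses plus the average over the K part branches.
    `triplets` is an (anchor, positive, negative) index triple mined in-batch."""
    a, p, n = triplets
    loss = id_loss(global_logits, labels) + \
           triplet_loss_soft(global_feat[a], global_feat[p], global_feat[n])
    for lg, lf in zip(part_logits, part_feats):      # K latent parts
        loss = loss + (id_loss(lg, labels) +
                       triplet_loss_soft(lf[a], lf[p], lf[n])) / len(part_feats)
    return loss
```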

4 Experiments

4.1 Datasets and Evaluation Metrics

We validate the performance of DecTrans on typical person ReID datasets, including Market-1501 [35], DukeMTMC-reID [20], MSMT17 [29], and Occluded-Duke [18]. Occluded-Duke is obtained by re-splitting the training, query, and gallery sets of DukeMTMC-reID such that all query images contain occlusion. For the evaluation metrics, we use the Cumulative Matching Characteristic (CMC) at Rank1 and the mean Average Precision (mAP) on all datasets.

4.2 Implementation Details

Pedestrian images are resized to 256 × 128. We use data augmentation methods including random horizontal flip, padding, random crop, and random erasing, together with the proposed Image Graying and CutMix. For the transformer input, we split each image into patches of size 16 with a sliding-window stride of 12. The batch size is set to 64, with four images per ID. In DecTrans, we use the same pre-trained model as TransReID for a fair comparison. The SGD optimizer is employed with a momentum of 0.9 and a weight decay of 1e−4. The learning rate is initialized to 0.08, and cosine annealing is used as the learning rate schedule. All experiments are performed on two NVIDIA 2080Ti GPUs using the PyTorch toolbox and APEX with FP16 training.

4.3 Comparisons to State-of-the-arts

Table 1. Comparison with the state-of-the-art methods, based on CNN (upper part) and ViT-B/16 (lower part), in the general person ReID task.

| Model | MSMT17 mAP | MSMT17 Rank1 | Market1501 mAP | Market1501 Rank1 | DukeMTMC mAP | DukeMTMC Rank1 |
|---|---|---|---|---|---|---|
| OSNet (ICCV19) [37] | 52.9 | 78.7 | 84.9 | 94.8 | 73.5 | 88.6 |
| ABDNet (ICCV19) [3] | 60.8 | 82.3 | 88.3 | 95.6 | 78.6 | 89 |
| PGFA (ICCV19) [18] | – | – | 76.8 | 91.2 | 65.5 | 82.6 |
| CBN (ECCV20) [40] | 42.9 | 72.8 | 77.3 | 91.3 | 67.3 | 82.5 |
| RGA-SC (CVPR20) [34] | 57.5 | 80.3 | 88.4 | 96.1 | – | – |
| SAN (AAAI20) [13] | 55.7 | 79.2 | 88.0 | 96.1 | 75.7 | 87.9 |
| SCSN (CVPR20) [4] | 58.5 | 83.8 | 88.5 | 95.7 | 79 | 91 |
| HOReID (CVPR20) [25] | – | – | 84.9 | 94.2 | 75.6 | 86.9 |
| PAT (ICCV21) [16] | – | – | 88.0 | 95.4 | 78.2 | 88.8 |
| ISE (CVPR2022) [33] | 51.0 | 76.8 | 87.8 | 95.6 | – | – |
| PNL+BDB (CVPR2022) [6] | 53.4 | 79.0 | 88.4 | 95.4 | 79.0 | 89.2 |
| TransReID (ICCV21) [11] | 67.4 | 85.3 | 88.9 | 95.2 | 82 | 90.7 |
| DecTrans | 70.8 | 87.1 | 90.9 | 96.0 | 83.5 | 90.9 |

We compare DecTrans with current state-of-the-art methods in Table 1. We can see that the proposed DecTrans achieves the best performance on all these typical person ReID datasets. In particular, compared to TransReID, the previous best-performing transformer-based method, DecTrans gains large performance improvements, i.e., 3.4% in mAP and 1.8% in Rank1 on the large-scale MSMT17 dataset, and 2.0% and 0.8% on the smaller Market1501. These results also show the great potential of our method on large-scale datasets, as DecTrans involves trainable projection matrices in dynamic routing, which are more adequately trained on larger datasets. As shown in Table 2, on the occluded dataset Occluded-Duke, DecTrans achieves an improvement of 2.4% in mAP and 1.3% in Rank1, demonstrating the robustness of DecTrans against the interference of pedestrian occlusion.

4.4 Ablation Study

Analysis of Components in DecTrans. We experiment on the largest dataset, MSMT17, to verify the effects of the TD layer and the data augmentation methods, as shown in Table 3. For the baseline, we adopt a pre-trained ViT model


Table 2. Comparison with the state-of-the-art methods, based on CNN (upper part) and ViT-B/16 (lower part), on the Occluded-Duke dataset.

| Method | Rank1 | Rank5 | Rank10 | mAP |
|---|---|---|---|---|
| HACNN (CVPR18) [15] | 34.4 | 51.9 | 59.4 | 26.0 |
| DSR (CVPR18) [10] | 40.8 | 58.2 | 65.2 | 30.4 |
| PCB (ECCV18) [22] | 42.6 | 57.1 | 62.9 | 33.7 |
| PGFA (ICCV19) [18] | 51.4 | 68.6 | 74.9 | 37.3 |
| RandErasing (AAAI20) [36] | 40.5 | 59.6 | 66.8 | 30.0 |
| HOReID (CVPR20) [25] | 55.1 | – | – | 43.8 |
| SORN (TCSVT20) [32] | 57.6 | 73.7 | 79.0 | 46.3 |
| SGSFA (ACML20) [19] | 62.3 | 77.3 | 82.7 | 47.4 |
| MoS (AAAI21) [12] | 61.0 | – | – | 49.2 |
| OAMN (ICCV21) [2] | 62.6 | 77.5 | – | 46.1 |
| MHSA (TNNLS22) [23] | 59.7 | 74.3 | 79.5 | 44.8 |
| BPBreID (CVPR23) [28] | 66.7 | – | – | 54.1 |
| PAT (CVPR21) [16] | 64.5 | – | – | 53.6 |
| TransReID (ICCV21) [11] | 66.4 | – | – | 59.2 |
| DecTrans | 67.7 | 81.7 | 86.4 | 61.6 |

the same as TransReID, also using viewpoint and camera ID as side information, and use the [class] token as the final representation of the image. Observing the results, CutMix leads to a 0.3% enhancement in mAP. Image Graying brings improvements of 0.7% in Rank1, 0.5% in Rank5, and 0.6% in Rank10. We then combine CutMix and Image Graying with the TD layer individually. Notably, both data augmentation techniques contribute positively to the extraction of intricate features facilitated by the TD layer. Lastly, we compare the TD layer with the JPM layer [11] under the same data augmentation. The results reveal the TD layer's superiority, achieving a higher mAP of 70.8% compared to JPM's 70.0%.

Analysis of Token Decomposition (TD) Layer. The TD layer in DecTrans includes id-irrelevant token processing, soft k-means clustering with dynamic routing, and regrouped token components. We further verify the validity of the TD layer with TD w/o id-irr. (the TD layer with the originally encoded tokens rather than the id-irrelevant tokens), TD w/o decomp. (the TD layer without token decomposition), and clustering with different cluster numbers K. As shown in Table 4, when tokens are not refined through the removal of id information (TD w/o id-irr.), there is a notable decrease in performance. This decline underscores the challenge of disentangling a significant number of highly correlated tokens. Similarly, utilizing the centroid vector alone

38

Y. Zhang et al. Table 3. Ablation study results of DecTrans.

Methods

Rank1 Rank5 Rank10 mAP

baseline

83.8

92.0

94.1

65.6

+CutMix +Gray +TD

83.8 84.5 85.4

91.8 92.5 92.4

94.1 94.7 94.2

65.9 65.6 67.3

+TD +CutMix +TD +Gray +JPM +Gray +CutMix +TD +Gray +CutMix

85.5 85.9 86.8 87.1

92.8 93.0 93.5 93.5

94.6 94.9 95.3 95.4

68.8 68.2 70.0 70.8

as the representation of the latent part (TD w/o decomp.) also leads to performance degradation due to the reduced information within centroids, thereby affirming the efficacy of the decomposition process. Additionally, we explore various cluster numbers, detailed in the lower part in Table 4. The results show that the cluster number K = 4 yields the best performance. Visualisation and Analysis. To clearly demonstrate the effects of the idirrelevant tokens, Fig. 5 depicts the statistical distribution of the cosine similarity of tokens to the vector representing id information, i.e., x (the average of all tokens), before refining (left part) and after refining (right part). The are two peaks in each sub-figure, which represent the centroids of background and foreground tokens respectively. The higher peak shows that most tokens are very similar before refining (around 0.8 on the left), and spread out (about 0.6 on the right) after the id-irrelevant refining. The “spread out” means, most of the Table 4. Ablation study results of the TD layer.

Methods

Rank1 Rank5 Rank10 mAP

TD w/o id-irr TD w/o decomp TD

86.4 86.5 87.1

93.5 93.3 93.5

95.3 95.2 95.4

69.9 70.0 70.8

TD (#clusters=2) TD (#clusters=3) TD (#clusters=4) TD (#clusters=5) TD (#clusters=6)

86.4 86.4 87.1 87.0 86.7

93.5 93.4 93.5 93.5 93.6

95.3 95.2 95.4 95.2 95.3

69.5 69.8 70.8 70.5 70.4

DecTrans for Person Re-identification

39

refined tokens are far away from the vector of id information, and these refined tokens are more conducive to clustering.

Fig. 5. Distribution of the cosine similarity of tokens to the id vector (x). The left and right sub-figures are similar distributions before and after the id-irrelevant refinement.

Fig. 6. Visualisation of attention maps about the latent part represented by groups of decomposed token components. (a) and (c) is the input pedestrian images, and (b) and (d) is the attention maps of latent parts on these images.

As shown in Fig. 6, the proposed DecTrans successfully attends to multiple semantically meaningful parts on pedestrians. Furthermore, the kth cluster centroid for different input images attends to the part with similar semantics (Fig. 6(b) vs. (d)). Meanwhile, our proposed DecTrans can adapt to the input image and localize to flexible, irregular, and more semantic body parts compared to the uniform partition and pose estimation methods.

5

Conclusion

In this paper, we propose to apply soft clustering to force transformer tokens to be mapped into a series of latent parts for part-to-part matching based on person ReID. We develop the DecTrans that maps ViT tokens into latent parts each with the same semantic using a novel Token Decomposition (TD) layer. Besides,

40

Y. Zhang et al.

we also devise a data augmentation method called Image Graying, which has been experimentally proven effective when used with our TD layer. Extensive experiments validate the effectiveness and superiority of DecTrans over lots of state-of-the-art methods. In the future, we will continue to explore more effective ways of clustering applied in ReID models for better performance. Acknowledgment. This work was supported by the National Natural Science Foundation of China (No. 61972036).

References 1. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: Endto-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8 13 2. Chen, P., Liu, W., Dai, P., et al.: Occlude them all: occlusion-aware attention network for occluded person re-id. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 11833–11842 (2021) 3. Chen, T., Ding, S., Xie, J., et al.: ABD-Net: attentive but diverse person reidentification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 8351–8361 (2019) 4. Chen, X., Fu, C., Zhao, Y., et al.: Salience-guided cascaded suppression network for person re-identification. In: Proceedings of the IEEE CVPR, pp. 3300–3310 (2020) 5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale (2021) 6. Fu, D., Chen, D., Yang, H., et al.: Large-scale pre-training for person reidentification with noisy labels. In: Proceedings of the IEEE CVPR, pp. 2476–2486 (2022) 7. Gao, S., Wang, J., Lu, H., Liu, Z.: Pose-guided visible part matching for occluded person reid. In: Proceedings of the IEEE CVPR, pp. 11744–11752 (2020) 8. Ghosh, S., Dubey, S.K.: Comparative analysis of k-means and fuzzy c-means algorithms. Int. J. Adv. Comput. Sci. Appl. 4(4) (2013) 9. Han, K., Wang, Y., Chen, H., et al.: A survey on vision transformer. arXiv e-prints, pp. arXiv-2012 (2020) 10. He, L., Liang, J., Li, H., Sun, Z.: Deep spatial feature reconstruction for partial person re-identification: alignment-free approach. In: Proceedings of the IEEE CVPR, pp. 7073–7082 (2018) 11. He, S., Luo, H., Wang, P., et al.: TransReID: transformer-based object reidentification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 15013–15022 (2021) 12. Jia, M., Cheng, X., Zhai, Y., et al.: Matching on sets: conquer occluded person reidentification without alignment. In: Proceedings of the AAAI, vol. 35, pp. 1673– 1681 (2021) 13. Jin, X., Lan, C., Zeng, W., et al.: Semantics-aligned representation learning for person re-identification. In: Proceedings of the AAAI, vol. 34, pp. 11173–11180 (2020) 14. Khan, S., Naseer, M., Hayat, M., et al.: Transformers in vision: a survey. arXiv preprint arXiv:2101.01169 (2021)

DecTrans for Person Re-identification

41

15. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person reidentification. In: Proceedings of the IEEE CVPR, pp. 2285–2294 (2018) 16. Li, Y., He, J., Zhang, T., et al.: Diverse part discovery: occluded person reidentification with part-aware transformer. In: Proceedings of the IEEE CVPR, pp. 2898–2907 (2021) 17. Liu, Z., Lin, Y., Cao, Y., et al.: Swin transformer: hierarchical vision transformer using shifted windows. arXiv:2103.14030 (2021) 18. Miao, J., Wu, Y., Liu, P., et al.: Pose-guided feature alignment for occluded person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 542–551 (2019) 19. Ren, X., Zhang, D., Bao, X.: Semantic-guided shared feature alignment for occluded person re-identification. In: Proceedings of the ACML, pp. 17–32 (2020) 20. Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: Hua, G., J´egou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 17–35. Springer, Cham (2016). https://doi.org/ 10.1007/978-3-319-48881-3 2 21. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Proceedings of the Neural Information Processing Systems, pp. 3859–3869 (2017) 22. Sun, Y., Zheng, L., Yang, Y., et al.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the ECCV, pp. 480–496 (2018) 23. Tan, H., Liu, X., Yin, B., Li, X.: MHSA-Net: multihead self-attention network for occluded person re-identification (2022) 24. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Proceedings of the Neural Information Processing Systems, pp. 5998–6008 (2017) 25. Wang, G., Yang, S., Liu, H., et al.: High-order information matters: learning relation and topology for occluded person re-identification. In: Proceedings of the IEEE CVPR, pp. 6449–6458 (2020) 26. Wang, G., Yuan, Y., Chen, X., et al.: Learning discriminative features with multiple granularities for person re-identification. In: Proceedings of the ACM MM, pp. 274–282 (2018) 27. Wang, W., Xie, E., Li, X., et al.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 568–578 (2021) 28. Wang, Z., Zhu, F., Tang, S., et al.: Feature erasing and diffusion network for occluded person re-identification. In: Proceedings of the IEEE CVPR, pp. 4754– 4763 (2022) 29. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE CVPR, pp. 79–88 (2018) 30. Yun, S., Han, D., Oh, S.J., et al.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE International Conference on Computer Vision (2019) 31. Zhang, G., Zhang, P., Qi, J., Lu, H.: HAT: hierarchical aggregation transformers for person re-identification. In: Proceedings of the ACM MM, pp. 516–525 (2021) 32. Zhang, X., Yan, Y., Xue, J.H., et al.: Semantic-aware occlusion-robust network for occluded person re-identification. IEEE TCSVT 31(7), 2764–2778 (2020) 33. Zhang, X., Li, D., Wang, Z., et al.: Implicit sample extension for unsupervised person re-identification. In: Proceedings of the IEEE CVPR, pp. 7369–7378 (2022) 34. Zhang, Z., Lan, C., et al.: Relation-aware global attention for person reidentification. In: Proceedings of the IEEE CVPR, pp. 3186–3195 (2020)

42

Y. Zhang et al.

35. Zheng, L., Shen, L., Tian, L., et al.: Scalable person re-identification: a benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1116–1124 (2015) 36. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI, vol. 34, pp. 13001–13008 (2020) 37. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Omni-scale feature learning for person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3702–3712 (2019) 38. Zhu, K., Guo, H., Zhang, S., et al.: AAformer: auto-aligned transformer for person re-identification. arXiv:2104.00921 (2021) 39. Zhu, X., Su, W., Lu, L., et al.: Deformable DETR: deformable transformers for end-to-end object detection (2021) 40. Zhuang, Z., et al.: Rethinking the distribution gap of person re-identification with camera-based batch normalization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 140–157. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2 9

AHT: A Novel Aggregation Hyper-transformer for Few-Shot Object Detection Lanqing Lai1,2 , Yale Yu1,2 , Wei Suo1,2 , and Peng Wang1,2(B) 1

School of Computer Science, Northwestern Polytechnical University, Xi’an, China [email protected] 2 Ningbo Institute, Northwestern Polytechnical University, Xi’an, China

Abstract. Few-shot object detection aims to detect novel objects with few annotated examples and this task has been extensively investigated by meta-learning-based paradigm. However, most of the previous approaches suffer from: 1) Most of the previous methods only perform two-branch interaction in the detection head which lacks the interaction of low-level semantic features. 2) Traditional method is difficult to capture the fine-grained differences between categories due to fixed weights. To alleviate these issues, we proposed a simple yet effective method, named Aggregation Hyper-Transformer (AHT) framework, which can generate corresponding weights into the primary network by hypernetworks mechanism. In particular, we design a novel Dynamic Aggregation Module and a Conditional Adaptation Hypernetworks, which apply the aggregated category vectors as conditions to dynamically generates classspecific parameters. Benefiting from the above two modules, our method significantly exceeds the previous meta-learning methods and provides new insights for my community. Keywords: Few-shot object detection

1

· Hypernetwork · Meta-learning

Introduction

Object detection is one of the important topics in the field of computer vision, which aims to predict the coordinates and corresponding categories of each instance in the image. There have been significant advancements in object detection techniques [21,22], with deep learning-based approaches achieving remarkable results. However, traditional object detection methods require large amounts of labeled data for training, which are expensive and time-consuming to collect. Therefore, Few-shot object detection(FSOD) task [9,26,31] is proposed and tries to address this difficulty by enabling the detector with very limited training data. In particular, there are mainly two kinds of approaches to address such challenges: transfer-learning-based and meta-learning-based methods [26]. For transfer-learning-based framework [20,23,26,30], newly added detection head c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 43–55, 2024. https://doi.org/10.1007/978-981-99-8555-5_4

44

L. Lai et al.

dimensions are directly fine-tuned on the novel class through the generalization knowledge derived from training on the base classes. It is extremely sensitive to the sample in the fine-tuning stage which requires carefully designing the frozen portion of the model [30]. More importantly, this paradigm suffers from significantly negative generalization effects, such as catastrophic forgetting for base classes [23,33]. On the contrary, the meta-learning-based methods [9,28,31,33] refer to simulating few-shot tasks through a two-branch structure formed by sampling support sets in base classes. With the process of learning to learn, the two-branch framework has a stronger generalization ability in the novel class, compared with the traditional transfer-learning-based framework.

Fig. 1. Illustrations of hypernetwork for few shot object detection.

Although the aforementioned meta-learning-based methods have achieved pretty-well performance, it still restricted by the following limitations: 1) Most meta-learning-based methods [5,16,31] only perform two-branch interaction in the detection head, resulting in the misalignment of low-level semantic features, which significantly affects the final object detection performance. 2) [8] exploits the deep cross-branch interactions by fully Feature Pyramid Cross Transformer. Nevertheless, the updated method based on fixed weights is difficult to capture the fine-grained differences between categories during the interactions of query and support branch [2,14,19]. To solve the above problems, we propose a new Aggregation HyperTransformer (AHT) approach, which can effectively provide multi-level finegrained two-branch interactions and dynamically generate class-specific primary network weights with a feed-forward meta-learning manner. Concretely, we first build a Dynamic Aggregation Module (DAM). By deriving the global dependency of each instance’s local parts in the support sets, the inter-image prominent features are adaptively generated into aggregated weight. On the other hand, as shown in Fig. 1, we propose a novel Conditional Adaptation Hypernetworks (CAH) module for FSOD. Here, the hypernetworks refers to using

AHT: A Novel Aggregation Hyper-Transformer

45

another network to predict the weights of a primary network. Inspired by previous Hypernetwork-based methods [34], our CAH uses the aggregated weight vectors as the conditions of hypernetwork and dynamically generates class-specific parameters. Benefitting from the above two modules, our method provides finegrained two-branch fusion and effectively improves performance. In summary, the contributions of this work are as follows: 1) We propose a novel few-shot object detection method AHT that firstly incorporates corresponding weights into the meta-learning-based method by hypernetworks. Different from previous interactions in the detection head, the AHT can achieve deeper and fine-grained cross-branch fusion. 2) We design two new modules, including Dynamic Aggregation Module and Conditional Adaptation Hypernetworks. They are utilized to extract inter-image prominent features and generate class-specific parameters. 3) Extensive experiments show that, without bells and whistles, AHT exhibits competitive performance compared to state-of-the-art methods, revealing the effectiveness of our approach.

2 2.1

Related Work Object Detection

Object Detection. Object detection is a critical task in computer vision that involves identifying and localizing objects. Early methods of object detection can be divided into two general approaches: one-stage and two-stage methods. Onestage detectors [15,21] directly predict the bounding boxes and the corresponding class labels on the extracted features. On the contrary, the two-stage detectors [22,24] need to generate region proposals through the Region Proposal Network (RPN) network, and then predict the detection results based on the proposals. However, these detectors are trained with sufficient labeled data and are not designed for data-scarce scenarios. Few-Shot Object Detection. Few-shot object detection aims to recognize novel class objects using a few labeled examples. Existing works mostly focus on two paradigms: transfer-learning and meta-learning paradigms. Specifically, the transfer-learning-based approaches [23,26,30] first train a basic detector with large base samples. Then, the detection head of the pre-trained models would be finetuned to transform the acquired knowledge to the target few-shot field. However, due to serious negative transfer and catastrophic forgetting, the metalearning-based approaches [9,28,31] are proposed, which apply the learning-tolearn (i.e., meta-learning) to effectively adapt to new categories and learn to distinguish similar classes. Inspired by previous works, in this paper, we propose a novel Aggregation Hyper-Transformer (AHT) approach, which can improve the generalization ability of meta-learning-based method. 2.2

Hypernetworks

Hypernetworks [6] are neural networks that are utilized to generate the parameters of another neural network (called a main network). Hypernetworks can not

46

L. Lai et al.

only ensure the flexibility of the network but also reduce the amount of parameters of the network. For example, to improve the generalization ability, Ha et al. [6] use hypernetwork to replace the convolution layers for image recognition tasks. Hyperseg [18] uses hypernetwork to generate the parameters of the decoder in semantic segmentation. On the other hand, to improve the performance of fewshot tasks, TAFE-Net [27] generates task aware feature embeddings for few shot learning. Sylph [32] predict the parameters of classification head by hypernetworks. Hypertransformer [34] use transformer to generate a CNN model directly from few-shot samples. Inspired by these hypernetwork-based frameworks, we use support features as the conditions of hypernetworks to dynamically generate class-specific parameters which use for multi-level cross-branch fusion.

3 3.1

Method Preliminaries

Following the previous works [5,16,20,26], the few-shot detection task involves a base set of classes, denoted as Cb , with a plentiful of annotated data, represented by Db . Additionally, there is a novel set of classes, labeled as Cn (Cn ∩Cb =∅), with no overlap with Cb , and only N instances available for each category, denoted by Dn . The goal is to optimize a few-shot detection model that can detect objects rq from both base and novel classes in testing. Here the rq following the metalearning paradigm [8,9,16]: argmax Pθ (rq |Iq , Iˆs ), θ

(1)

where the probability Pθ (·) is calculated by few-shot detection model parameters θ, using a query image Iq and N support instances (N -shot) of each category Iˆs as input data. Meanwhile, the whole learning procedure is organized into a twostage paradigm (i.e., the meta-knowledge is collected across Db and finetunes it on Dn .). 3.2

Overview

Inspired by the feed-forward meta-learning manner [5,8,16], we propose a simple yet novel approach, called Aggregation Hyper-Transformer(AHT), to further enhance the capabilities of transformer-based detectors in few-shot scenarios. The fundamental insight driving AHT is that despite the traditional metalearning paradigm having shown auspicious results, the interaction between the query and support branch only occurs within the detection head, resulting in the misalignment of low-dimensional features, which significantly affects the final object detection performance [8]. Recently, [8] exploits the deep cross-branch interactions by fully Feature Pyramid Cross Transformer. However, the method is fixed weights which has difficulty capturing fine-grained differences between categories [14,19], as well

AHT: A Novel Aggregation Hyper-Transformer

47

as the prominent features of a set of support images Iˆs cannot be effectively extracted due to coarse-grained averaging [8]. Therefore, as shown in Fig. 2, our model performs dynamic aggregation on the supported image Iˆs to enhance the aggregated weight and adaptively generates class-specific weights with hypernetwork strategy. Formally, features of the query and support branch would be fed into a feature pyramid cross transformer for instance-level cross-attention. In particular, we design the Dynamic Aggregation Module(DAM), it takes the set of support features as input for extracting inter-image prominent information when aggregating the support features into aggregated weight. On the other hand, we propose Conditional Adaptation Hypernetworks(CAH) which use the aggregated weight vectors as conditions to generate class-specific weights. By introducing the Dynamic Aggregation Module and Conditional Adaptation Hypernetworks, our model provides multi-level and higher-quality two-branch interactions. Finally, we use an RPN-based head and ROI Align to generate proposal features by contrastive training strategy [5], stimulating the detection head to accurately match the real category in the query image.

Fig. 2. The architecture of the Aggregation Hyper-Transformer (AHT) approach for few shot object detection.

3.3

Dynamic Aggregation Module

The Dynamic Aggregation Module (DAM) can generate aggregated weight vectors as the conditions of hypernetwork while preserving each category’s notable information from N instances. It receives input from the support branch, denoted

48

L. Lai et al.

as fs ∈ RN ×Ci ×Hs ×Ws . Similar to a vanilla transformer, we map the input support sets fs to K, V , while mapping a group of learned embeddings Hi ∈ RCi ×Ci to Q. After the weighted combination of attention mechanism, features of each instance in the support branch are initially aggregated by: QK T fsh = softmax( √ )V, dk

(2)

where the dk is scaling factor. It is worth noting that features fq ∈ R1×Ci ×Hq ×Wq from the query branch in which the input sizes are different from the initial aggregated feature fsh . To ensure that the useless information in the support features is suppressed so that the two-branch interacts effectively, we execute the global representation of fsh and then derive the global dependency of each instance’s local parts to acquire the aggregation weights wsh ∈ RN ×Ci ×Hs ×Ws , which represent the proportion of the useful information carried in each support image. It is defined as: wsh =

GAP (fsh ) fsh , ||GAP (fsh )|| ||fsh ||

(3)

Here, the GAP represents the global average pooling operation. The support sets features fsh are finally aggregated into conditional weights Ac ∈ R1×Ci ×Hs ×Ws as the initial conditions of the hypernetwork: Ac =

N 1  h w · fh . N i=1 s,i s,i

(4)

 where denotes the sum operation of vectors. Benefitting from the above operations, DAM highlights the salient information when aggregating the support features into conditional vectors. 3.4

Conditional Adaptation Hypernetworks

Hypernetwork can generate a set of weights with respect to the original classspecific conditions, which would be used to interact with the query features in which after being encoded by the multi-head attention module.  W u = dropout(Ac · HW , piu ), (5) B u = dropout(Ac · HB , pib ). where piu and pib are ratio [11], dim(HW ) = dim(HB ) ∈ RCi ×Ci are learned weights. Furthermore, W b and B b are directly generated by the linear projection of Ac . Based on this set of weights, a dense and fully-connected hypernetwork architecture is formed for the interaction with the query branch. fqh = B u · (W u · fq + W b) + B b.

(6)

AHT: A Novel Aggregation Hyper-Transformer

49

As shown in Fig. 2, fqh and fs will be used as the input of the joint feature extraction and interaction in the next Aggregate Hyper-Transformer block. After N = 3 stages of blocks, we gather Aggregated features that fully interact with the salient semantic features of the support branch. Additionally, we introduce query-perspective information to acquire Coupled features in the support branch. 3.5

The Classification-Regression Detection Head

After multi-level interactions between query and support branch, we follow [22] to adopt RPN for query and support branch respectively to generate more precise candidate proposals. Then,r ROI Align is used to extract the initial RoI r r r query features fqr ∈ RPq ×Cr ×Hq ×Wq and support features fsr ∈ R1×Cs ×Hs ×Ws for each proposal. To achieve high-level semantic interaction, we directly send fqr and fsr to Feature Pyramid Cross Transformer [8] which conducts instancelevel cross-attention. Then, we use Multi-Relation Matching [5] to enforce the final similarity learning which employs a contrastive training strategy. Finally, a binary classification and box regression layer are used for final detection. Table 1. Experimental results on VOC dataset. Method/ Shot

Novel Set 1 1

2

Novel Set 2

3

5

10

1

2

3

Novel Set 3 5

10

5

10

TFA w/fc [26]

36.8 29.1

43.6

55.7

57.0

18.2 29.0 33.4 35.5 39.0

27.7 33.6 42.5

1

2

3

48.7

50.2

TFA w/cos [26]

39.8 36.1

44.7

55.7

56.0

23.5 26.9 34.1 35.1 39.1

30.8 34.8 42.8

49.5

49.8

Meta-DETR [33] 40.6 51.4

58.0

59.2

63.6

37.0 36.6 43.7 49.1 54.6

41.6 45.9 52.7

58.9

60.6

FSOD-UP [29]

43.8 47.8

50.3

55.4

61.7

31.2 30.5 41.2 42.2 48.3

35.5 39.7 43.9

50.6

53.5

TTP-PRD [12]

54.6 –

56.5



61.4

37.9 –

42.8 –



51.2

DeFRCN [20]

53.6 57.5

61.5

64.1

60.8

30.1 38.1 47.0 53.3 47.9

48.4 50.9 52.3

54.9

57.4

CoCo-RCNN [17] 33.5 44.2

50.2

57.5

63.3

25.3 31.0 39.6 43.8 50.1

24.8 36.9 42.8

50.8

57.7

FCT [8]

49.9 57.1

57.9

63.2

67.1

27.6 34.5 43.7 49.2 51.2

39.5 54.7 52.3

57.0

58.7

LVC [10]

54.5 53.2

58.8

63.2

65.7

32.8 29.2 50.7 49.8 50.6

48.4 52.7 55.0

59.6

59.6

ours

53.9 64.9 62.0 68.2 69.0 27.1 33.9 41.5 45.7 51.4 40.9 50.4 53.3 63.2 64.0

4 4.1

44.4 –

47.6

46.6

Experiments Experimental Setting

Dataset. We follow the previous works [9,26] and evaluate our approach on PASCAL VOC [4] and MS COCO [13] datasets, utilizing the same data splits provided by [26] for a fair comparison. For PASCAL VOC, our model is trained on the trainval sets of VOC 2007 and VOC 2012, and tested on VOC 2007 test set. We have three random split groups, where each group contains 15 base classes and 5 novel classes. Each novel class has K = 1, 2, 3, 5, 10 annotated instances for few-shot training. We report AP50 of novel classes (nAP50) on PASCAL VOC 07 test set. For MS COCO, we utilize a 5k subset of the COCO2014 validation

50

L. Lai et al.

set for evaluation and the remaining COCO2014 training and validation set for training. The 20 classes overlapped with PASCAL VOC are selected as novel classes with K = 10, 30 shots, the remaining 60 classes are selected as base classes. We report COCO-style mAP, mAP50, mAP75 on validation set. Implementation Details. Our model based on Pyramid Vision Transformer (PVT) [25]. Our model is trained with AdamW optimizer over 4T A100 GPUs and the batch size is set to 8. The initial learning rate of 0.0002, weight decay of 0.0001. For PASCAL VOC dataset, the total number of training iterations is 10,000 and the learning rate divided by 10 after 7,500 and 10,000 iterations. For MS COCO dataset, the total number of training iterations is 110,000 and the learning rate divided by 10 after 85,000 and 100,000 iterations. 4.2

Comparison Results

PASCAL VOC. We present the results of our method on PASCAL VOC in Table 1. It can be observed that as the number of annotation instances increases, the performance of our model gradually improves. More importantly, the experimental results show that our method achieves significant improvement compared with the previous works. For example, for Novel Set 3, our approach surpasses the previously best method by 4.2% and 5.1% in 5-shot and 10-shot scenarios. In Novel Set 1, our approach surpasses the previously best method by 1.0% and 1.6% in 5-shot and 10-shot scenarios. Table 2. Experimental results on MS COCO dataset. Method/ Shot

10 shot 30 shot AP AP50 AP75 AP AP50 AP75

MPSR [30]

9.8

17.9

9.7

14.1 25.4

14.2

TFA w/fc [26] TFA w/cos [26] FSOD-UP [29] FSCE [23] FADI [1] Meta Faster R-CNN [7]

10.0 10.0 11.0 11.9 12.2 12.7

19.2 19.1 – – 22.7 25.7

9.2 9.3 10.7 10.5 11.9 10.8

13.4 13.7 15.6 16.4 16.1 16.6

24.7 24.9 – – 29.1 31.8

13.2 13.4 15.7 16.2 15.8 15.8

ours

14.0 24.7

13.5

17.5 29.3

17.9

MS COCO. We also evaluate our method on MS COCO dataset with the standard COCO metrics. As shown in Table 2, our method outperforms other methods in most evaluation metrics. Particularly, our model achieves 1.3% and 0.7% improvement in 10-shot and 30-shot AP respectively, which proves the effectiveness of our approach.

AHT: A Novel Aggregation Hyper-Transformer

51

Table 3. Ablation study of different modules. Method Shot DAM CAH 1 2

4.3

Avg. 3

5

10

(a)

53.9 60.3 59.8 68.3 65.7 61.6

(b) 

53.6 63.8 64.3 66.3 67.6 63.1

(c)



55.9 63.2 62.7 66.1 66.4 62.9

(d) 



54.1 64.9 62.0 68.2 69.0 63.6

Ablation Study

To verify the effectiveness of our model architecture, we conduct comprehensive ablation studies on the Novel Set 1 of Pascal VOC dataset. For fair comparison, the baseline detector based on FCT [8], which is the first vision transformer based detection model for few-shot object detection and also a robust meta-learning method that considers multi-level feature interactions. The architecture consists of a stack of cross transformer layers, which perform dual feature aggregation for both query and support branch. Ablation study results are shown in Table 3. Impact of Dynamic Aggregation Module (DAM). The dynamic aggregation module highlights the salient information by aggregating the support features into conditions. As shown in Table 3(b), when only using the dynamic aggregation module, the average performance significantly improves by 1.5%. This indicates that the dynamic aggregation module is beneficial to highlight meaningful and salient information. Effectiveness of Hypernetwork. Hypernetwork can generate class-specific parameters that will be used to interact with the query features. In Table 3(c), we find that the hypernetwork module gains 1.3% points on the average performance. Besides, The refined hyperparameters enable fine-grained interactions between the query and support branch, which improves few-shot detection performance under lower-shot settings. Specifically, the improvements are particularly significant under 1-shot (+2.0% mAP), 2-shot (+2.9% mAP), and 3-shot (+2.9% mAP) scenarios. Table 4. Ablation study on each component in our model using different variants. variant

1 shot 2 shot 3 shot 5 shot 10 shot

(a) linear

44.3

55.3

56.1

60.8

61.4

(b) (c) (d) (e) (f)

47.3 53.2 53.8 53.2 47.1

58.4 60.6 60.9 59.8 57.7

57.2 59.3 59.4 58.5 56.4

64.1 65.2 65.4 64.6 63.8

64.2 66.1 66.9 67.2 62.7

54.1

64.9

62.0

68.2

69.0

learned CLS intra-support w/o bernoulli detection hpy w/o cross

(g) ours

52

L. Lai et al.

Alternative Model Settings 1) Different aggregation manners. In Table 4, we explore several comparative experiments about the different model settings. In the rows of (a)-(b), we utilize the basic MLP method and prepend a learnable token [cls] (similarity to ViT [3]) to directly aggregate the features of support sets. The results show the effectiveness of our current hypernetwork scheme because more useless information being suppressed during the interactions of the two-branch. 2) Details of hypernetwork. In the DAM module, we use the average global pooling operation to highlight the salient information when aggregating the support features. Similarly, we explore prominent information of intra-image through simply global pooling (i.e., Table 4(c)). We can observe that it has no obvious helpful effect on extracting the salient information of the support sets. Possibly, this is due to the fact that global pooling cannot directly eliminate the impact of intra-image useless information. In addition, Table 4(d) shows that smoothing hypernetwork weights distribution can slightly improve the performance of the model. 3) Location and input of hypernetwork. As shown in Table 4(e), hypernetwork weight generated based on coupled features have a negative impact on proposal features. Besides, as shown in Table 4(f), if the input of the hypernetwork does not introduce query-perceptual information, which means both the query and support branch adopt self-attention mechanism, the performance of the model will be significantly negatively affected. It effectively demonstrates the value of multi-level and high-quality interactions between the two-branch.

Fig. 3. Visualization results of the intermediate feature maps.

AHT: A Novel Aggregation Hyper-Transformer

4.4

53

Visualization of Our Module

To analyze the behavior of our module, we visualize the feature maps output by AHT as a heatmap. As shown in Fig. 3, we find that our AHT enhances the aggregation of the query and support branch, highlighting information similar to class-specific support features in the query image.

5

Conclusion

We propose a novel and effective meta-learning framework AHT, which integrates hypernetwork to generate corresponding weights for the first time. We extensively experiment with different Hypernetwork strategies, in which Dynamic Aggregation Module is proposed to highlight the salient feature in support sets when generating aggregated weight. Furthermore, we build Conditional Adaptation Hypernetworks that dynamically generates class-specific weights interacting with the query branch. Despite its simplicity, Exhaustive experiments validate the effectiveness of our method which outperformed the previous meta-learning methods on the current benchmarks. Acknowledgements. This work was supported by National Key R&D Program of China (No. 2020AAA0106900), the National Natural Science Foundation of China (No. U19B2037), Shaanxi Provincial Key R&D Program (No. 2021KWZ-03), and Natural Science Basic Research Program of Shaanxi (No. 2021JCW-03).

References 1. Cao, Y., et al.: Few-shot object detection via association and discrimination. In: Advances in Neural Information Processing Systems, vol. 34, pp. 16570–16581 (2021) 2. Chauhan, V.K., Zhou, J., Lu, P., Molaei, S., Clifton, D.A.: A brief review of hypernetworks in deep learning. arXiv preprint arXiv:2306.06955 (2023) 3. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 4. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010) 5. Fan, Q., Zhuo, W., Tang, C.K., Tai, Y.W.: Few-shot object detection with attention-RPN and multi-relation detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4013–4022 (2020) 6. Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016) 7. Han, G., Huang, S., Ma, J., He, Y., Chang, S.F.: Meta faster R-CNN: towards accurate few-shot object detection with attentive feature alignment. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 780–789 (2022) 8. Han, G., Ma, J., Huang, S., Chen, L., Chang, S.F.: Few-shot object detection with fully cross-transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5321–5330 (2022) 9. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8420–8429 (2019)

54

L. Lai et al.

10. Kaul, P., Xie, W., Zisserman, A.: Label, verify, correct: a simple few shot object detection method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14237–14247 (2022) 11. Li, B., et al.: Dropkey for vision transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22700–22709 (2023) 12. Lin, S., Zeng, X., Yan, S., Zhao, R.: Three-stage training pipeline with patch random drop for few-shot object detection. In: Proceedings of the Asian Conference on Computer Vision, pp. 1027–1043 (2022) 13. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 14. Littwin, G., Wolf, L.: Deep meta functionals for shape representation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1824– 1833 (2019) 15. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0 2 16. Lu, X., et al.: Breaking immutable: Information-coupled prototype elaboration for few-shot object detection. arXiv preprint arXiv:2211.14782 (2022) 17. Ma, J., Han, G., Huang, S., Yang, Y., Chang, S.F.: Few-shot end-to-end object detection via constantly concentrated encoding across heads. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13686, pp. 57–73. Springer, Cham (2022). https://doi.org/10.1007/978-3-03119809-0 4 18. Nirkin, Y., Wolf, L., Hassner, T.: HyperSeg: patch-wise hypernetwork for realtime semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4061–4070 (2021) 19. Peng, H., Du, H., Yu, H., Li, Q., Liao, J., Fu, J.: Cream of the crop: distilling prioritized paths for one-shot neural architecture search. In: Advances in Neural Information Processing Systems, vol. 33, pp. 17955–17964 (2020) 20. Qiao, L., Zhao, Y., Li, Z., Qiu, X., Wu, J., Zhang, C.: DeFRCN: decoupled faster RCNN for few-shot object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8681–8690 (2021) 21. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 22. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015) 23. Sun, B., Li, B., Cai, S., Yuan, Y., Zhang, C.: FSCE: few-shot object detection via contrastive proposal encoding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7352–7362 (2021) 24. Wang, K., Zhang, L.: Reconcile prediction consistency for balanced object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3631–3640 (2021) 25. Wang, W., et al.: PVT v2: improved baselines with pyramid vision transformer. Comput. Vis. Media 8(3), 415–424 (2022) 26. Wang, X., Huang, T.E., Darrell, T., Gonzalez, J.E., Yu, F.: Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957 (2020)

AHT: A Novel Aggregation Hyper-Transformer

55

27. Wang, X., Yu, F., Wang, R., Darrell, T., Gonzalez, J.E.: TAFE-Net: task-aware feature embeddings for low shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1831–1840 (2019) 28. Wang, Y.X., Ramanan, D., Hebert, M.: Meta-learning to detect rare objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9925–9934 (2019) 29. Wu, A., Han, Y., Zhu, L., Yang, Y.: Universal-prototype enhancing for few-shot object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9567–9576 (2021) 30. Wu, J., Liu, S., Huang, D., Wang, Y.: Multi-scale positive sample refinement for few-shot object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 456–472. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58517-4 27 31. Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., Lin, L.: Meta R-CNN: towards general solver for instance-level low-shot learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9577–9586 (2019) 32. Yin, L., Perez-Rua, J.M., Liang, K.J.: Sylph: a hypernetwork framework for incremental few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9035–9045 (2022) 33. Zhang, G., Luo, Z., Cui, K., Lu, S., Xing, E.P.: Meta-DETR: image-level fewshot detection with inter-class correlation exploitation. IEEE Trans. Pattern Anal. Mach. Intell. (2022) 34. Zhmoginov, A., Sandler, M., Vladymyrov, M.: HyperTransformer: model generation for supervised and semi-supervised few-shot learning. In: International Conference on Machine Learning, pp. 27075–27098. PMLR (2022)

Feature Refinement from Multiple Perspectives for High Performance Salient Object Detection Xuan Li, Congao Wang, Ding Ma, and Xiangqian Wu(B) Faculty of Computing, Harbin Institute of Technology, Harbin, China [email protected], [email protected], [email protected], [email protected] Abstract. Recently, deep-learning based salient object detection methods have gained great progress. However, there still exists some problems such as inefficient multi-level feature fusion, unstable multi-scale contextaware feature extraction, detail loss caused by upsampling and unbalanced distribution. To efficiently fuse multi-level features, we propose an attention-guided bi-directional feature refinement module (ABFRM) including top-down and bottom-up processes, which applies different attention-based feature fusion strategies for different directional processes. To obtain stable multi-scale contextual features, we design a serial atrous fusion module (SAFM), which uses serial atrous convolutional layers with small dilation rates. To reduce detail loss caused by upsampling with a large factor, we devise an upsampling feature refinement module (UFRM), which utilizes the combination of deconvolution and bilinear interpolation. To address unbalanced distribution from both foreground and background perspectives, we propose a novel hybrid loss, which contains Intersection-over-Union (IoU) and background boundary (BGB) losses. Comprehensive experiments on five benchmark datasets demonstrate that our proposed method outperforms 13 state-of-theart approaches under four evaluation metrics. The code is available at https://github.com/xuanli01/PRCV210. Keywords: Salient object detection · Attention-guided bi-directional feature refinement · Serial atrous fusion · Upsampling feature refinement · Background boundary loss

1

Introduction

Salient object detection (SOD), which aims to detect and segment the most attractive objects in an image, is an efficient pre-processing procedure of many computer vision tasks. This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0106502, in part by the Natural Science Foundation of China under Grant 62073105, in part by the Natural Science Foundation of Heilongjiang Province of China under Grant ZD2022F002, and in part by the Heilongjiang Touyan Innovation Team Program. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 56–67, 2024. https://doi.org/10.1007/978-981-99-8555-5_5

Feature Refinement from Multiple Perspectives for SOD

57

Fig. 1. Challenges in SOD. (a) is the ground truth. (b) is our method. In (c), we replace the ABFRM in our method with a BiFPN layer [15]. In (d), we replace the SAFM in our method with ASPP module [2]. In (e), we replace the UFRM with bilinear interpolation. In (f), we replace our loss with the combination of BCE and IoU losses.

Although great progress has been made recently, there are still four big challenges in SOD. First, low-level features contain rich details but large amount of noises. High-level features contain accurate location information but lack enough details. To incorporate fine details and precise location at each level, researchers design many multi-level feature fusion algorithms [5,10,15,19,21] which can be divided into unidirectional and bi-directional methods. Compared with unidirectional methods [5,19], bi-directional methods [10,15,21], which contain both top-down and bottom-up processes, use both higher and lower level features to refine the features of each level. Top-down feature fusion is expected to be the process of utilizing accurate location information of high-level features to filter the low-level features. Bottom-up feature fusion is expected to be the process of utilizing rich detail information of low-level features to refine the high-level features. Most current bi-directional methods adopt the same fusion strategy in the top-down and bottom-up processes without fully considering the purposes of different directional processes, which may introduce noises and coarse boundaries (e.g., BiFPN [15] in Fig. 1(c)). Second, models are difficult to capture the visual context of scale-varying objects under the limited receptive field. To alleviate this issue, atrous spatial pyramid pooling (ASPP) module [2] or its variants are applied to capture multi-scale contextual information through parallel atrous convolutional layers with different dilation rates. However, a convolutional layer with a large dilation rate is hard to extract stable features (e.g., a failure case of ASPP module [2] to detect objects of different scales in Fig. 1(d)), which is caused by the weakness of association relationships among sampling points [26]. Third, many methods [17,18,25] directly apply interpolation algorithms to upsample the output of the final layer by a large factor to obtain the final saliency prediction maps, which is so rough for salient objects with elaborate structures that the details may be lost (see Fig. 1(e)). Fourth, unbalanced distribution in SOD refers to the phenomenon that images are usually dominated by the backgrounds. A model is biased to predict pixels to be background, if all pixels are treated equally by its loss function, e.g., Binary cross entropy (BCE) loss. To address the unbalanced distribution, some works [11,28] utilize Intersection-over-Union (IoU) loss or its variants to assist BCE loss by giving more focused on the foreground, which is incomplete without handling the background. In Fig. 1(f), some salient regions predicted by these methods may still have low confidence scores due to the bias of background.

58

X. Li et al.

For the first challenge, to fuse multi-level features efficiently, we propose an attention-guided bi-directional feature refinement module (ABFRM) which contains both top-down and bottom-up processes corresponding to top-down location refinement module (TLRM) and bottom-up detail refinement module (BDRM), respectively. For the purpose of the top-down process, TLRM utilizes high-level features to generate spatial attention to filter noises in the low-level features. To achieve the goal of the bottom-up process, BDRM uses the lowlevel features to generate channel attention to refine high-level features with the rich detail information. For the second challenge, we devise a serial atrous fusion module (SAFM). For the sake of stable multi-scale context-aware feature extraction, SAFM adopts multiple serial atrous convolutional layers with small dilation rates, which have almost the same receptive field as an atrous convolutional layer with a large dilation rate. Features extracted from atrous convolutional layers of different depths correspond to contextual information of different scales. For the third challenge, we propose an upsampling feature refinement module (UFRM), which aims to enhance the details lost during the process of upsampling with a large factor by an integration of deconvolution and bilinear interpolation. For the fourth challenge, we are inspired by the powerful unbalanced distribution handling capability of boundary loss [6] to design a background boundary (BGB) loss, which modifies boundary loss [6] to only regularize the background. Then, we combine BGB loss with IoU loss, forming a hybrid loss to settle the unbalanced distribution from both foreground and background perspectives.

Fig. 2. Overall architecture of our proposed method. Red and black connectors indicate supervision and information flow, respectively (better viewed in color). (Color figure online)

2 2.1

Proposed Method Overall Architecture

The overall architecture is shown in Fig. 2. Firstly, we utilize ResNet-50 [4] as the backbone to extract the multi-level features from the input images. For ease of statement, i ∈ {1, 2, 3, 4, 5} indicates the i-th level from bottom to top.

Feature Refinement from Multiple Perspectives for SOD

59

We denote the backbone features as F = {Fi }5i=1 . To reduce computation, we discard F1 from stage-1 due to its large spatial size. Then, we fuse multi-level backbone features by feeding them into the ABFRM. The output features are denoted by R = {Ri }5i=2 . After that, we feed R into the saliency predictor (SP) which is comprised of SAFM and UFRM to obtain and further refine multi-scale contextual features. We upsample the output of UFRM by a factor of 2 via a bilinear interpolation operation to get the final prediction S1 . Details of each component are discussed below.

Fig. 3. Illustration of the proposed TLRM. Red and black connectors indicate supervision and information flow, respectively (better viewed in color). (Color figure online)

2.2

Attention-Guided Bi-directional Feature Refinement Module

The ABFRM, which is composed of TLRM and BDRM, aims to efficiently aggregate multi-level backbone features to obtain feature representation with fine detail and accurate location information. In practice, we first pass backbone features through the TLRM to obtain location refined features which are denoted by B = {Bi }4i=2 . Then, the location refined features are fed into the BDRM to get the detail refined features. Top-Down Location Refinement Module. This module aims to introduce precise location information from high-level backbone features Fi+1 to low-level backbone features Fi to highlight salient object regions and reduce the influence of complex background regions, which is shown in Fig. 3. Specifically, we first pass Fi+1 through a 3 × 3 convolutional layer and two group convolutions with two convolutional layers in each group to increase receptive field. Then, we obtain spatial attention Ai via a softmax function. After that, we upsample the attention map by a factor of 2 and use it to weight the features generated by feeding Fi into a 3 × 3 convolutional layer. At last, a 3 × 3 convolutional layer and a residual connection are applied to acquire the final output of the TLRM.

60

X. Li et al.

Fig. 4. Illustration of the proposed BDRM (better viewed in color).

Fig. 5. Illustration of our proposed SAFM (better viewed in color).

Bottom-Up Detail Refinement Module. This module (see Fig. 4) aims to use rich detail information from low-level TLRM features Bi−1 to refine the highlevel TLRM features Bi . Specifically, we first dowsample Bi−1 via a maxpooling operation with the kernel size of 2 and stride of 2. Two 3 × 3 convolutional layers are applied for transition of the Bi and dowsampled Bi−1 , respectively. We utilize a cross-channel concatenation operation to merge these two transitioned features. Then, we adopt global average pooling (GAP), two 1 × 1 convolutional layers and a softmax function to get channel attention. After that, we weight the merged transitioned features with the channel attention via element-wise multiplication. At last, a 3 × 3 convolutional layer and a residual connection are applied to gain the final output of the BDRM. 2.3

Serial Atrous Fusion Module

Modeling multi-scale visual context is of vital importance to SOD, for the reason that salient objects have large variations in scale, shape and position. To achieve this, some methods apply ASPP-like modules, which use parallel atrous convolutional layers with different dilation rates to gather multi-scale context cues. Nevertheless, the information under a kernel with a large dilation rate is lack of steady relevance due to the sparsity of the kernel, leading to unstable feature extraction.

Feature Refinement from Multiple Perspectives for SOD

61

In this paper, we design a SAFM (see Fig. 5), which utilizes three serial atrous convolutional layers with a dilation rate of 2 to capture multi-scale context cues without large dilation rates. In SAFM, neurons can gain larger receptive field, as the serial atrous convolutional layers go deeper. Specifically, we first integrate the low-level ABFRM features Ri with the upsampled high-level SAFM features Xi+1 through an element-wise addition operation to aggregate multilevel features. Then, we use three serial 3 × 3 atrous convolutional layers with the dilation rate of 2 to extract multi-scale contextual features. After that, we concatenate the input of the first atrous convolutional layer and outputs of three atrous convolutional layers along the channel dimension to fuse the multi-scale contextual features. At last, a 3 × 3 convolutional layer is adopted to enhance the fused features and a residual connection is applied to get the final output Xi of the SAFM.

Fig. 6. Illustration of the proposed UFRM (better viewed in color).

2.4

Upsampling Feature Refinement Module

To acquire the final prediction, some methods [17,18,25] directly use interpolation algorithms to upsample the output of the final layer by a factor of 4. However, upsampling with a large factor via a simple interpolation algorithm (e.g., bilinear interpolation) may result in losing details. To resolve this issue, we propose a UFRM (see Fig. 6) to enhance the details lost during the process of upsampling with a large factor. The UFRM contains two upsampling branches with different methods. For one branch, we apply bilinear interpolation to upsample the lowest-level SAFM features X2 by a factor of 2. For the other branch, we first upsample X2 via a 4 × 4 deconvolution with a stride of 2. Then, we apply two 3×3 convolutional layers to obtain large receptive field. After that, we combine features containing context cues of different scales via cross-channel concatenation and adopt a 3×3 convolutional layer to enhance the merged features. To better recover fine details, we combine the output of the two branches via an element-wise addition operation, which can attain complementation in methods. At last, a 3 × 3 convolutional layer is applied to get the final output of the UFRM. The ultimate prediction is obtained using bilinear interpolation to upsample the UFRM output by a factor of 2, not 4.

2.5 Objective Function

To mitigate the impact of the unbalanced distribution in SOD, most existing methods apply the IoU loss, which optimizes holistic regions rather than single pixels. However, the IoU loss pays more attention to the foreground. An intuitive idea is that the background should also be involved in handling the unbalanced distribution to improve performance. In this paper, we design a BGB loss, which constrains only the background by removing the foreground regularization of the boundary loss [6]. The boundary loss [6] takes the form of a distance metric on the space of contours instead of regions. To comprehensively address the unbalanced distribution, we combine the IoU loss with the BGB loss L^i_{bgb} to form a hybrid loss L^i_h, which achieves complementarity both in focus (i.e., foreground and background) and in regularization (i.e., regions and a distance metric on the space of contours). Concretely, for the BGB loss, each pixel in the background is assigned a weight D(p) that measures the distance between its position p and the boundary. Pixels farther from the boundary between the foreground FG and the background BG correspond to a larger D(p). We take the minimum distance between p ∈ BG and q ∈ FG as D(p), which is calculated by

D(p) = \min_{q \in FG} \| p - q \|_2,   (1)

where the L2 distance is used to measure the distance between p and q. The BGB loss is obtained by a weighted average over the background pixels, which is expressed as

L^i_{bgb} = \frac{1}{\sum_{p \in BG} 1} \sum_{p \in BG} S_i(p) \odot D(p),   (2)

where {S_i}_{i=2}^{5} are obtained by passing {X_i}_{i=2}^{5} through a 3 × 3 convolutional layer and a bilinear interpolation operation, and ⊙ denotes the Hadamard product. Finally, the hybrid loss is defined as

L^i_h = L_{iou}(S_i, GT) + L^i_{bgb},   (3)

where GT is the ground truth. The IoU loss L_{iou} and the BCE loss L_{bce} have the same expressions as in [28]. To obtain better spatial attention for localization in the TLRM, we propose to explicitly guide the learning of the spatial attention with pseudo-labels. In practice, we generate the pseudo-labels by dilating GT, which makes the spatial attention focus more on the location of salient objects and less on details. The BCE loss is applied for this constraint:

L^i_{sa} = L_{bce}(A_i, \mathrm{MaxPool}(GT)),   (4)

where the dilation operation is implemented by a max-pooling operation with a kernel size of 25 and a stride of 1. The final loss L combines L^i_h and L^i_{sa}:

L = \sum_{i=2}^{5} \frac{1}{2^{i-1}} L^i_h + \frac{1}{32} \sum_{i=1}^{4} L^i_{sa}.   (5)
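To make the BGB term concrete, here is a minimal sketch, assuming the distance map D(p) is computed with SciPy's Euclidean distance transform and the loss is averaged over background pixels as in Eq. (2); the function name is ours.

```python
import torch
from scipy.ndimage import distance_transform_edt

def bgb_loss(pred, gt):
    """Sketch of the background-guided BGB loss of Eq. (2): predictions on background
    pixels are averaged with weights given by their distance to the foreground."""
    # pred, gt: (B, 1, H, W); gt is a binary saliency mask
    losses = []
    for s, g in zip(pred, gt):
        bg = (g[0] < 0.5).cpu().numpy()
        # D(p): for every background pixel, distance to the nearest foreground pixel (Eq. (1))
        dist = distance_transform_edt(bg)
        d = torch.as_tensor(dist, dtype=s.dtype, device=s.device)
        mask = torch.as_tensor(bg, dtype=s.dtype, device=s.device)
        # average of S_i(p) * D(p) over background pixels
        losses.append((s[0] * d * mask).sum() / mask.sum().clamp(min=1))
    return torch.stack(losses).mean()
```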

Table 1. Quantitative comparison. Bold and italic indicate the best and second-best performance, respectively. For each dataset, the four columns are Fβ↑, Fω↑, M↓ and Eγ↑.

Method     | DUT-OMRON           | DUTS-TE             | ECSSD               | HKU-IS              | PASCAL-S
F3Net20    | .766 .747 .053 .876 | .840 .835 .035 .918 | .925 .912 .033 .946 | .910 .900 .028 .958 | .835 .816 .061 .895
ITSD20     | .756 .750 .061 .867 | .804 .823 .041 .898 | .899 .894 .031 .953 | .895 .910 .034 .932 | .785 .812 .066 .863
MINet20    | .755 .738 .056 .873 | .828 .825 .037 .917 | .924 .911 .033 .953 | .909 .897 .029 .960 | .829 .809 .064 .898
LDF20      | .773 .752 .052 .881 | .855 .845 .034 .929 | .930 .915 .034 .951 | .914 .904 .028 .960 | .843 .822 .060 .905
GateNet20  | .746 .729 .055 .868 | .807 .809 .040 .903 | .916 .894 .040 .943 | .899 .880 .033 .953 | .819 .797 .067 .884
MSFNet21   | .778 .757 .050 .876 | .856 .841 .034 .931 | .929 .916 .033 .954 | .914 .903 .027 .959 | .843 .822 .061 .901
DCNet21    | .774 .760 .051 .885 | .843 .840 .035 .923 | .934 .920 .031 .957 | .914 .905 .027 .962 | .837 .820 .062 .902
VST21      | .756 .755 .058 .872 | .817 .828 .037 .916 | .920 .910 .033 .957 | .900 .897 .029 .960 | .829 .816 .061 .902
PFSNet21   | .774 .756 .055 .883 | .846 .842 .036 .922 | .932 .920 .031 .953 | .919 .910 .026 .962 | .837 .819 .063 .895
ICON22     | .772 .761 .057 .879 | .838 .836 .037 .919 | .928 .918 .032 .954 | .910 .902 .029 .958 | .833 .818 .064 .893
RCSBNet22  | .778 .752 .049 .870 | .856 .839 .035 .920 | .927 .916 .034 .948 | .923 .909 .027 .959 | .848 .826 .059 .907
EDN22      | .785 .770 .049 .885 | .851 .845 .035 .928 | .932 .918 .032 .955 | .919 .908 .026 .962 | .847 .827 .062 .904
DPNet22    | .778 .767 .049 .882 | .861 .870 .028 .933 | .926 .918 .031 .947 | .924 .921 .023 .968 | .843 .835 .054 .898
Ours       | .798 .781 .049 .889 | .872 .860 .032 .936 | .940 .925 .031 .956 | .929 .918 .024 .964 | .855 .834 .060 .910

3 Experiments

3.1 Experimental Setup

Datasets. The performance of our method is evaluated on five benchmark datasets: DUTS-TE [16], DUT-OMRON [23], ECSSD [22], HKU-IS [7] and PASCAL-S [8]. Following previous methods, we choose DUTS-TR [16] as the training dataset.

Evaluation Metrics. To quantitatively validate the proposed model, we adopt four evaluation metrics: mean F-measure (Fβ) [1], weighted F-measure (Fω) [12], Mean Absolute Error (M) [14] and E-measure (Eγ) [3].

Implementation Details. DUTS-TR [16] is used to train our network. ResNet-50 [4], pre-trained on ImageNet, is used as the backbone network. Horizontal flipping, random cropping and multi-scale input images are applied to augment the data. We adopt warm-up and linear decay strategies with a maximum learning rate of 0.005 for the backbone network and 0.05 for the other parts. Our network is trained end-to-end using the stochastic gradient descent (SGD) optimizer; momentum and weight decay are set to 0.9 and 0.0005, respectively. We train the model for 48 epochs with a mini-batch size of 32. During testing, each image is resized to 352 × 352 to predict the saliency map without any post-processing.
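A minimal sketch of this optimizer configuration is given below; the warm-up length and the parameter-group naming are assumptions, since they are not specified above.

```python
import torch

def build_optimizer(model, max_lr_backbone=0.005, max_lr_other=0.05,
                    warmup_epochs=5, total_epochs=48):
    """Sketch of the SGD setup described above: separate learning rates for the
    backbone and the remaining parts, with warm-up followed by linear decay."""
    backbone, other = [], []
    for name, p in model.named_parameters():
        (backbone if name.startswith('backbone') else other).append(p)
    optimizer = torch.optim.SGD(
        [{'params': backbone, 'lr': max_lr_backbone},
         {'params': other, 'lr': max_lr_other}],
        momentum=0.9, weight_decay=5e-4)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                      # linear warm-up to the maximum rate
            return (epoch + 1) / warmup_epochs
        return max(0.0, (total_epochs - epoch) / (total_epochs - warmup_epochs))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```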

Fig. 7. Visual comparison of the proposed method with other state-of-the-art SOD models.

Table 2. Ablation study on different losses. Each row corresponds to a different combination of the BCE, IoU and BGB losses; the columns give Fβ↑, Fω↑, M↓ and Eγ↑ on DUT-OMRON and DUTS-TE.

DUT-OMRON           | DUTS-TE
.733 .713 .059 .859 | .799 .802 .041 .897
.752 .740 .059 .868 | .824 .824 .039 .914
.748 .718 .057 .866 | .820 .809 .040 .907
.755 .738 .057 .870 | .828 .824 .039 .915
.757 .740 .058 .871 | .832 .823 .038 .920
.764 .742 .056 .867 | .842 .829 .038 .922

3.2 Comparison with State-of-the-Art Methods

We compare the proposed model with 13 state-of-the-art saliency detection methods: F3Net [17], ITSD [27], MINet [13], LDF [18], GateNet [26], MSFNet [25], DCNet [20], VST [9], PFSNet [11], ICON [28], RCSBNet [5], EDN [19] and DPNet [21]. VST [9] takes T2T-ViT [24] as the backbone, while the other methods take ResNet-50 [4] as the backbone. For fair comparison, the saliency maps provided by these methods are evaluated with a unified evaluation code.

Quantitative Comparison. The comparison results of the 13 methods under the four evaluation metrics are shown in Table 1. Our proposed method performs well on multiple datasets, especially DUT-OMRON and ECSSD. It is worth noting that we achieve the best Fβ value on all datasets.

Visual Comparison. Figure 7 shows saliency maps produced by our approach and other SOTA models. Our method handles various challenging scenarios, including cluttered backgrounds (1st row), large objects (2nd row), small objects (5th row), inverted reflection in water (3rd row), low contrast (4th row) and multiple objects (4th, 5th and 6th rows). Compared with the other competitors, the saliency maps generated by our approach are noticeably clearer and more accurate.

Table 3. Ablation study on different architectures. Components are added progressively; #Params indicates the number of parameters of the model. Columns per dataset are Fβ↑, Fω↑, M↓ and Eγ↑.

Model         | DUT-OMRON           | DUTS-TE             | #Params (M)
Baseline      | .764 .742 .056 .867 | .842 .829 .038 .922 | 24.05
+ UFRM        | .768 .751 .055 .873 | .845 .834 .037 .924 | 24.20
+ SA          | .771 .755 .056 .878 | .856 .845 .035 .930 | 24.59
+ TLRM        | .777 .757 .052 .879 | .860 .846 .035 .930 | 24.59
+ BDRM        | .789 .772 .052 .886 | .863 .849 .035 .931 | 25.06
+ SAFM (full) | .798 .781 .049 .889 | .872 .860 .032 .936 | 25.95

Table 4. Quantitative comparison of our ABFRM with other bi-directional multi-level feature fusion approaches [10,15], obtained by replacing the ABFRM in our full model with a BiFPN layer [15] or the ASFF module [10]. #Params indicates the number of parameters of the model.

Model | DUT-OMRON           | DUTS-TE             | #Params (M)
ASFF  | .784 .766 .051 .883 | .865 .852 .033 .933 | 25.49
BiFPN | .788 .770 .051 .882 | .867 .853 .033 .932 | 25.16
Ours  | .798 .781 .049 .889 | .872 .860 .032 .936 | 25.95

3.3 Ablation Study

In this section, we investigate the importance of each component of our model on the DUT-OMRON and DUTS-TE datasets. To ablate the SAFM, we replace it with a variant from which the serial atrous convolutional layers, cross-channel concatenation and residual connection are removed. The ABFRM is decomposed into TLRM and BDRM, and the UFRM is replaced by bilinear interpolation.

Impact of Loss Functions. Table 2 studies the impact of different loss functions, including the BCE, IoU and BGB losses. The comparison results show that the BGB loss improves the performance of the model regardless of how it is combined with the BCE and IoU losses. The combination of the IoU and BGB losses achieves the best result, which indicates the effectiveness of our hybrid loss.

Impact of Each Module. We progressively add UFRM, TLRM without the constraint of L^i_{sa} (SA), TLRM, BDRM and SAFM, and report the results in Table 3. The model is trained using the IoU and BGB losses.

Table 5. Quantitative comparison of our SAFM with the ASPP module [2], obtained by replacing the SAFM in our full model with the ASPP module [2]. #Params indicates the number of parameters of the model.

Model | DUT-OMRON           | DUTS-TE             | #Params (M)
ASPP  | .791 .773 .049 .886 | .869 .856 .032 .935 | 25.95
Ours  | .798 .781 .049 .889 | .872 .860 .032 .936 | 25.95

The model is boosted by each module and obtains the best result with the combination of all components, which shows that every component is necessary for the proposed method. Moreover, we compare the ABFRM with other bi-directional multi-level feature fusion approaches (i.e., a BiFPN layer [15] and the ASFF module [10]) in Table 4. The results show that our ABFRM achieves higher accuracy with only a small increase in the number of parameters. In Table 5, we compare the SAFM with the ASPP module [2] to verify the effectiveness of our multi-scale contextual feature extraction. The results show that our SAFM performs better with the same number of parameters.

4 Conclusion

In this paper, we address four SOD problems: inefficient multi-level feature integration, unstable multi-scale contextual feature acquisition, detail loss caused by upsampling with a large factor, and unbalanced distribution. First, we propose an ABFRM, which applies the TLRM for top-down feature fusion and the BDRM for bottom-up feature fusion, to efficiently aggregate multi-level features. Second, we design a SAFM to acquire stable multi-scale contextual features. Third, we devise a UFRM to supplement the details lost by upsampling with a large factor. Fourth, we propose a new hybrid loss to address the unbalanced distribution from both the foreground and background perspectives. Comprehensive experiments on five public benchmark datasets demonstrate the superiority of our model over state-of-the-art approaches.

References

1. Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: CVPR, pp. 1597–1604 (2009)
2. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE TPAMI 40(4), 834–848 (2017)
3. Fan, D., Gong, C., Cao, Y., Ren, B., Cheng, M., Borji, A.: Enhanced-alignment measure for binary foreground map evaluation. In: IJCAI, pp. 698–704 (2018)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
5. Ke, Y.Y., Tsubono, T.: Recursive contour-saliency blending network for accurate salient object detection. In: WACV, pp. 2940–2950 (2022)
6. Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B.: Boundary loss for highly unbalanced segmentation. In: MIDL, pp. 285–296 (2019)
7. Li, G., Yu, Y.: Visual saliency based on multiscale deep features. In: CVPR, pp. 5455–5463 (2015)
8. Li, Y., Hou, X., Koch, C., Rehg, J.M., Yuille, A.L.: The secrets of salient object segmentation. In: CVPR, pp. 280–287 (2014)
9. Liu, N., Zhang, N., Wan, K., Shao, L., Han, J.: Visual saliency transformer. In: ICCV, pp. 4722–4732 (2021)
10. Liu, S., Huang, D., Wang, Y.: Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516 (2019)
11. Ma, M., Xia, C., Li, J.: Pyramidal feature shrinking for salient object detection. In: AAAI, vol. 35, pp. 2311–2318 (2021)
12. Margolin, R., Zelnik-Manor, L., Tal, A.: How to evaluate foreground maps? In: CVPR, pp. 248–255 (2014)
13. Pang, Y., Zhao, X., Zhang, L., Lu, H.: Multi-scale interactive network for salient object detection. In: CVPR, pp. 9413–9422 (2020)
14. Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: CVPR, pp. 733–740 (2012)
15. Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: CVPR, pp. 10781–10790 (2020)
16. Wang, L., et al.: Learning to detect salient objects with image-level supervision. In: CVPR, pp. 136–145 (2017)
17. Wei, J., Wang, S., Huang, Q.: F3Net: fusion, feedback and focus for salient object detection. In: AAAI, vol. 34, pp. 12321–12328 (2020)
18. Wei, J., Wang, S., Wu, Z., Su, C., Huang, Q., Tian, Q.: Label decoupling framework for salient object detection. In: CVPR, pp. 13025–13034 (2020)
19. Wu, Y.H., Liu, Y., Zhang, L., Cheng, M.M., Ren, B.: EDN: salient object detection via extremely-downsampled network. IEEE TIP 31, 3125–3136 (2022)
20. Wu, Z., Su, L., Huang, Q.: Decomposition and completion network for salient object detection. IEEE TIP 30, 6226–6239 (2021)
21. Wu, Z., Li, S., Chen, C., Qin, H., Hao, A.: Salient object detection via dynamic scale routing. IEEE TIP (2022)
22. Yan, Q., Xu, L., Shi, J., Jia, J.: Hierarchical saliency detection. In: CVPR, pp. 1155–1162 (2013)
23. Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: CVPR, pp. 3166–3173 (2013)
24. Yuan, L., et al.: Tokens-to-token ViT: training vision transformers from scratch on ImageNet. In: ICCV, pp. 558–567 (2021)
25. Zhang, M., Liu, T., Piao, Y., Yao, S., Lu, H.: Auto-MSFNet: search multi-scale fusion network for salient object detection. In: ACMMM, pp. 667–676 (2021)
26. Zhao, X., Pang, Y., Zhang, L., Lu, H., Zhang, L.: Suppress and balance: a simple gated network for salient object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 35–51. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_3
27. Zhou, H., Xie, X., Lai, J.H., Chen, Z., Yang, L.: Interactive two-stream decoder for accurate and fast saliency detection. In: CVPR, pp. 9141–9150 (2020)
28. Zhuge, M., Fan, D.P., Liu, N., Zhang, D., Xu, D., Shao, L.: Salient object detection via integrity learning. IEEE TPAMI (2022)

Feature Disentanglement and Adaptive Fusion for Improving Multi-modal Tracking

Zheng Li1,2,3, Weibo Cai1,2,3, Junhao Dong1,2,3, Jianhuang Lai1,2,3, and Xiaohua Xie1,2,3(B)

1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
{lizh525,caiwb6,dongjh8}@mail2.sysu.edu.cn, {stsljh,xiexiaoh6}@mail.sysu.edu.cn
2 Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
3 Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China

Abstract. Multi-modal tracking has increasingly gained attention due to its superior accuracy and robustness in complex scenarios. The primary challenges in this field lie in effectively extracting and fusing multi-modal data that inherently contain gaps. To address the above issues, we propose a novel regularized single-stream multi-modal tracking framework, drawing inspiration from the perspective of disentanglement. Specifically, taking into account the similarities and differences intrinsic in multi-modal data, we design a modality-specific weights sharing feature extraction module to extract well-disentangled multi-modal features. To emphasize feature-level specificity across different modal features, we propose a cross-modal deformable attention mechanism for the adaptive and efficient integration of multi-modal features. Through extensive experiments on three multi-modal tracking benchmarks, including RGB+Thermal infrared and RGB+Depth, we demonstrate that our method significantly outperforms existing multi-modal tracking algorithms. Code is available at https://github.com/ccccwb/Multimodal-Detection-and-Tracking-UAV.

Keywords: Visual object tracking · Multi-modal tracking · Multi-modal Fusion · Cross-modal vision transformer

1 Introduction

Visual object tracking (VOT) [1,4,5,9,12,29] is appealing due to extensive applications like video surveillance, autonomous driving, and augmented reality. In recent years, numerous cutting-edge RGB trackers and datasets have been developed. However, the performance of RGB tracking tends to decrease in conditions such as rain, fog, and low light due to the limitations of visible sensors. Therefore, many recent works pay attention to multi-modal tracking tasks, mainly RGB+Thermal infrared (RGB-T) and RGB+Depth (RGB-D) tracking [22,30]. These approaches leverage either thermal infrared or depth images to provide additional temperature or depth information about the object, thereby enhancing the tracker's robustness and enabling all-weather tracking.

Fig. 1. Comparison of feature extraction strategies. The top three rows display the RGB, thermal infrared, and fused features, respectively. The final row illustrates the distribution relationships between modalities, with the x-axis representing sample values and the y-axis representing probabilities. The single-stream strategy suffers from a feature entanglement problem, whereas the dual-stream strategy exhibits an over-disentanglement phenomenon. In contrast, our method can disentangle multi-modal features well, providing more discriminative information for accurate localization.

Previous trackers mainly use two feature extraction strategies. One is the single-stream strategy, which extracts multi-modal features simultaneously [10,41], i.e., using identical network parameters. This data-friendly strategy

neglects the heterogeneous properties of each modality, leading to a severe feature entanglement problem. The second column of Fig. 1 shows that this strategy focuses on common patterns but fails to capture the target object completely, yielding inadequate fused features for the target object. The other is the dual-stream strategy, which extracts modality-specific features [28,36] but might overlook the collaborative cues between modalities, resulting in an over-disentanglement phenomenon. The third column of Fig. 1 shows that this strategy emphasizes specific modal information but overloads the fused feature with much object-irrelevant information. Neither strategy fully exploits the potential of multiple modalities, reducing the performance of multi-modal trackers.

From the perspective of disentanglement, we propose a novel regularized single-stream multi-modal tracking framework to efficiently disentangle and fully fuse the features of different modalities for multi-modal tracking. Specifically, we design our regularized single-stream multi-modal tracking framework based on a one-stream one-stage transformer tracker [4,24]. Firstly, we propose a modality-specific weights sharing feature extraction module for appropriately disentangling multi-modal features and preserving homogeneous properties. This method not only extracts well-disentangled features but also saves a large number of parameters. In addition, to facilitate faster feature extraction, we extend the asymmetric attention mechanism [4] to multi-modal tracking tasks. Furthermore, we propose an object-aware cross-modal deformable attention module to integrate features from different modalities. This adaptive module emphasizes feature-level specificity more efficiently and effectively than existing methods [8,19,25,34,36], which is especially vital for the fusion of well-disentangled features. As shown in the fourth column of Fig. 1, our method can appropriately disentangle multi-modal features and take full advantage of the well-disentangled features, greatly improving the performance of the tracker. Extensive experiments on several multi-modal tracking benchmarks show that our proposed method sets a new state-of-the-art performance while achieving a real-time running speed of 36 FPS on an NVIDIA RTX 3090 GPU.

The main contributions are summarized as follows:
– From the perspective of disentanglement, we systematically analyze the importance of appropriately disentangling multi-modal features and propose a regularized single-stream multi-modal tracking framework.
– We propose a modality-specific weights sharing feature extraction module and an object-aware cross-modal deformable attention module to extract and fuse well-disentangled features.
– Extensive experiments demonstrate that our proposed method attains state-of-the-art performance across several multi-modal tracking benchmarks, including RGBT234, LasHeR, and DepthTrack.

2 Related Work

2.1 Multi-modal Tracking

Based on MDNet [21], Wang et al. [25] presented a novel tracking framework to diffuse instance patterns across RGB-T data in the spatial-temporal domain. Zhang et al. [36] designed a complementarity- and distractor-aware RGB-T tracker based on the Siamese network [12]. Using DiMP [1] as a baseline, Gao et al. [8] proposed a dual-fused modality-aware tracker to learn informative and discriminative representations. Zhang et al. [35] jointly modeled appearance and motion cues using the ECO tracker [5] and a motion estimator. Different from the above methods, Zhao et al. [37] integrate RGB features with bird's-eye-view representations to better explore cross-modality 3D geometry for RGB-D tracking.

From the perspective of disentanglement, we propose a regularized single-stream multi-modal tracking framework. A similar work is MANet++ [18], which proposed a multi-adapter network to jointly perform modality-shared, modality-specific, and instance-aware target representation learning. In contrast to the intricate network structure of MANet++, our method emphasizes the extraction and fusion of well-disentangled features in a simple but effective manner. This is achieved through a modality-specific weights sharing feature extraction module, augmented with a cross-modal deformable attention feature fusion module.

2.2 Transformers Tracking

Recently, several transformer-based VOT methods have demonstrated excellent performance. Transformer trackers fall into two categories: CNN-Transformer-based trackers and fully-Transformer-based trackers [24]. The former category is mainly characterized by the use of two Siamese-like identical network pipelines. For instance, Yan et al. [29], building upon DETR [2], propose an encoder-decoder transformer tracking architecture. Chen et al. [3] present a Transformer tracking method based on Siamese-like feature extraction and an attention-based fusion mechanism. Feng et al. [7] design a cross-modal model with shallow fusion and weight optimization based on [3]. The latter category exclusively employs transformers for feature extraction and integration. For example, Ye et al. [32] propose a one-stream one-stage tracking framework with a candidate early elimination module, while Cui et al. [4] design a similar synchronous modeling scheme with an asymmetric attention module. Drawing on [32], Zhu et al. [11] develop a visual prompt multi-modal tracking method, which learns modal-relevant prompts to adapt a frozen pre-trained model to multi-modal tracking tasks. In this work, based on [4], we design a regularized single-stream multi-modal tracking framework to explore the importance of well-disentangled feature extraction in multi-modal tracking tasks.


Fig. 2. Our proposed regularized single-stream multi-modal tracking framework. First, well-disentangled features are extracted using modality-specific weights sharing module with the asymmetric attention mechanism. Subsequently, multi-modal features are fused by applying a cross-modal fusion module. Lastly, the target object is localized employing a pyramidal corner head.

3 Methodology

3.1 Preliminary

Given a video with an initial target bounding box B_0, VOT aims to estimate the location of the target in the subsequent frames X_i. The tracking model can be expressed as B_i = F(X_i; X_0, B_0), where in multi-modal tracking X_i = (X_i^v, X_i^a), with v and a denoting the RGB modality and the other modality, respectively. Given the simplicity and effectiveness of MixFormer [4], we adopt it as the foundation model for our proposed method. Specifically, the model input X_i^v contains the search region, the template, and the online template patch. First, the input is mapped into non-overlapping patch embeddings using a convolutional layer with kernel size and stride of 16. Additionally, two different positional embeddings are added to the search and template patch embeddings, respectively. Subsequently, the concatenated search and template tokens H_0^v are fed to an L-layer asymmetric encoder:

H_{l-1}^v = \mathrm{AsymAttention}_l(\mathrm{LN}^v(H_{l-1}^v)) + H_{l-1}^v, \quad l = 1, 2, \ldots, L   (1)
H_l^v = \mathrm{MLP}_l(\mathrm{LN}^v(H_{l-1}^v)) + H_{l-1}^v, \quad l = 1, 2, \ldots, L   (2)

where AsymAttention represents the asymmetric mixed attention between the search and template tokens. Finally, the search tokens S_0^v are split from H_L^v, and the predicted result is obtained through a pyramidal corner head.

3.2 Our Approach

Overall Architecture. From the perspective of disentanglement, we propose a regularized single-stream multi-modal tracking framework, which is composed of

two submodules. First, a modality-specific weights sharing module is introduced to extract well-disentangled features. Second, we propose an object-aware cross-modal feature fusion module, aimed at emphasizing the feature-level specificity of different modal features. The overall architecture is illustrated in Fig. 2.

Modality-Specific Weights Sharing. Based on the foundation model in Sect. 3.1, we implement our regularized single-stream multi-modal tracking framework. Given the inputs X_i = (X_i^v, X_i^a), tokens H_0 = (H_0^v, H_0^a) are derived using an identical embedding layer and positional embeddings. The rest of the forward propagation can then be formulated as:

H_{l-1} = \mathrm{CMAsymAttn}_l(\mathrm{LN}^v(H_{l-1}^v), \mathrm{LN}^a(H_{l-1}^a)), \quad l = 1, 2, \ldots, L   (3)
H_l = \mathrm{MLP}_l(\mathrm{LN}^v(H_{l-1}^v), \mathrm{LN}^a(H_{l-1}^a)), \quad l = 1, 2, \ldots, L   (4)

where CMAsymAttn denotes the cross-modal asymmetric attention mechanism. As shown in Fig. 2, this mechanism allows each search feature to query all template features and itself, while each template feature only queries itself. In this module, only the layer normalization has a modality-specific counterpart, while the attention mechanism and the multi-layer perceptron share the same parameters. This strategy greatly reduces the computation of the Vision Transformer (ViT) [6] backbone, while facilitating implicit cross-modal fusion for the search-frame features. An experimental analysis of the importance of modality-specific layer normalization is presented in Sect. 4.3.

Object-Aware Cross-Modal Encoder. To emphasize the feature-level specificity of well-disentangled features, we design an object-aware cross-modal fusion module. Current feature fusion methods [19,25,28,34,36] neglect the misalignment problem in multi-modal data; consequently, they rely on implicitly learning alignment relations from a large volume of data. This impedes the full utilization of well-disentangled features, limiting further performance gains. Drawing on the success of the deformable attention mechanism [27,40], we propose an object-aware cross-modal fusion module to address the misalignment problem in a highly efficient way. Given the output search features S_0 = (S_0^v, S_0^a) from the multi-modal backbone, we use an adjusting layer (1 × 1 Conv and GroupNorm [26]) to change the number of channels as necessary. The fusion process in the L'-layer encoder can then be expressed as:

S_{l-1} = \mathrm{CMDAttn}_l(\mathrm{LN}(S_{l-1})), \quad l = 1, 2, \ldots, L'   (5)
S_l = \mathrm{MLP}_l(\mathrm{LN}(S_{l-1})), \quad l = 1, 2, \ldots, L'   (6)

where CMDAttn denotes the cross-modal deformable attention mechanism. In each block, we keep the modality-specific weights sharing strategy mentioned above. As a result, our fusion module produces highly discriminative fused features, which aids in accurately localizing the target object. Finally, the fused features S_{L'} are used to predict the target state via a pyramidal corner head.
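To make the parameter-sharing pattern concrete, here is a minimal PyTorch sketch of one encoder block with per-modality LayerNorms but shared attention and MLP weights. It is an illustration only: the class name is ours, and the cross-modal asymmetric attention of CMAsymAttn is replaced by plain per-modality self-attention for brevity.

```python
import torch
import torch.nn as nn

class ModalitySharedBlock(nn.Module):
    """Sketch of a modality-specific weights-sharing encoder block: the attention and
    MLP weights are shared across modalities, only the LayerNorms are duplicated."""
    def __init__(self, dim=768, n_heads=12, mlp_ratio=4):
        super().__init__()
        self.norm1_v, self.norm1_a = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm2_v, self.norm2_a = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)   # shared weights
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))           # shared weights

    def forward(self, h_v, h_a):
        # h_v, h_a: (B, N, dim) token sequences of the RGB and auxiliary modalities
        x_v, x_a = self.norm1_v(h_v), self.norm1_a(h_a)
        h_v = h_v + self.attn(x_v, x_v, x_v, need_weights=False)[0]
        h_a = h_a + self.attn(x_a, x_a, x_a, need_weights=False)[0]
        h_v = h_v + self.mlp(self.norm2_v(h_v))
        h_a = h_a + self.mlp(self.norm2_a(h_a))
        return h_v, h_a
```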

Fig. 3. Illustration of the proposed cross-modal deformable attention. We employ the query feature to compute the sampling offsets and attention weights for each reference point, followed by sampling and aggregating across all features. Finally, the aggregated feature is reintroduced into the respective modal feature.

Cross-Modal Deformable Attention. In contrast to other fusion methods that focus on fixed spatial locations [19,25,34–36], our cross-modal deformable attention mechanism only attends to a small set of key sampling points around a reference point in each modality. This mechanism emphasizes the feature-level specificity of different modal features, resulting in more discriminative fused features. The detailed design is depicted in Fig. 3.

Consider multi-modal features {x^m}_{m=1}^{M}, where x^m ∈ R^{C×H×W}. For convenience, we let x = Concat({x^m}_{m=1}^{M}) ∈ R^{MC×H×W} represent the concatenation of the multi-modal features, z_q ∈ R^{MC} be a query element from x, and p_q be the corresponding reference point. The deformable attention module for each query z_q can then be expressed as:

\mathrm{CMDAttn}_q(z_q, p_q, \{x^m\}) = \sum_{h=1}^{H} W_h \Big[ \sum_{m=1}^{M} \sum_{k=1}^{K} A_{hmkq} \cdot W'_h x^m(p_q + \Delta p_{hmkq}) \Big],   (7)

where W_h ∈ R^{C×C'} and W'_h ∈ R^{C'×C} are learnable weights of the h-th attention head, and Δp_{hmkq} and A_{hmkq} are the sampling offset and attention weight of the k-th sampling point. Δp_{hmkq} ∈ R^2, and bilinear interpolation is used for feature sampling. A_{hmkq} lies in the range [0, 1] and is normalized by \sum_{k=1}^{K} A_{hmkq} = 1. Compared to standard attention fusion strategies [10], our method adaptively focuses on the region of interest in each modality, avoiding attention to potentially irrelevant global information. Additionally, our approach greatly reduces the computational load by assigning only a small fixed number of keys to each query z_q.
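The sketch below shows one way Eq. (7) could be realised in PyTorch with a single attention head and bilinear sampling via grid_sample. The class and argument names are ours, offsets are assumed to be predicted in normalised coordinates, and the multi-head projection details are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalDeformableAttention(nn.Module):
    """Single-head sketch of Eq. (7): every query attends to K sampled points in each
    of the M modal feature maps; offsets and weights are predicted from the query."""
    def __init__(self, dim, n_points=4, n_modalities=2):
        super().__init__()
        self.k, self.m = n_points, n_modalities
        self.offset_net = nn.Linear(n_modalities * dim, n_modalities * n_points * 2)
        self.weight_net = nn.Linear(n_modalities * dim, n_modalities * n_points)
        self.value_proj = nn.Linear(dim, dim)   # plays the role of W'
        self.out_proj = nn.Linear(dim, dim)     # plays the role of W

    def forward(self, z_q, p_q, feats):
        # z_q: (B, N, M*C) concatenated query tokens; p_q: (B, N, 2) reference points in [0, 1]
        # feats: list of M feature maps, each of shape (B, C, H, W)
        B, N, _ = z_q.shape
        offsets = self.offset_net(z_q).view(B, N, self.m, self.k, 2)           # Δp (normalised coords)
        weights = self.weight_net(z_q).view(B, N, self.m, self.k).softmax(-1)  # A, sums to 1 over K

        out = 0.0
        for m, x in enumerate(feats):
            v = self.value_proj(x.flatten(2).transpose(1, 2))             # (B, HW, C), W' x
            v = v.transpose(1, 2).reshape(B, -1, x.shape[-2], x.shape[-1])
            loc = p_q[:, :, None, :] + offsets[:, :, m]                   # (B, N, K, 2)
            grid = loc * 2.0 - 1.0                                        # to [-1, 1] for grid_sample
            sampled = F.grid_sample(v, grid, mode='bilinear',
                                    padding_mode='zeros', align_corners=False)  # (B, C, N, K)
            out = out + (sampled * weights[:, :, m].unsqueeze(1)).sum(-1)       # Σ_k A · W' x(p+Δp)
        return self.out_proj(out.transpose(1, 2))                         # output projection, (B, N, C)
```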

3.3 Training and Inference

Training. In line with commonly used training configurations [4,29,32], we train the model using AdamW [17] with a weight decay of 10^{-4}. For data augmentation, we use horizontal flipping and brightness jittering. JET colormaps are applied to the other modality [31]. We initialize the model with the pre-trained model from [4]. The base learning rate starts at 4 × 10^{-4} and decreases to 4 × 10^{-5}. The overall loss function combines the ℓ1 and CIoU [38] losses between the predicted bounding box B_i and the target bounding box \hat{B}_i as follows:

L_{loc} = \lambda_1 L_1(B_i, \hat{B}_i) + \lambda_{ciou} L_{ciou}(B_i, \hat{B}_i),   (8)

where λ_1 = 5 and λ_{ciou} = 2 are the respective weighting factors of the two losses, determining the contribution of each loss component to the total loss. Online template updates play an important role in VOT, capturing temporal information and handling object deformation. Similar to [4], we adopt a score prediction module to estimate the confidence score of the current predicted bounding box. This module is first pre-trained in the RGB modality, and its loss is computed using the standard cross-entropy loss.

Inference. During the inference phase, taking into account the computational load of multi-modal tracking, we utilize only a single online template. The online template is updated only when the specified update interval is reached, at which point the sample with the highest confidence score is selected. In our work, we maintain a constant update interval of 200.
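As an illustration of Eq. (8), the following hedged sketch combines an ℓ1 term with torchvision's CIoU loss using the stated weights; the box format and the use of torchvision's implementation in place of the paper's own are assumptions.

```python
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss

def localization_loss(pred_boxes, gt_boxes, w_l1=5.0, w_ciou=2.0):
    """Sketch of Eq. (8): weighted sum of an l1 term and a CIoU term.
    Boxes are assumed to be in (x1, y1, x2, y2) format, normalised to [0, 1]."""
    l1 = F.l1_loss(pred_boxes, gt_boxes, reduction='mean')
    ciou = complete_box_iou_loss(pred_boxes, gt_boxes, reduction='mean')
    return w_l1 * l1 + w_ciou * ciou
```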

4 Experiments

4.1 Implementation Details

Basic Setting. Our trackers are implemented in Python using PyTorch, trained on 4 NVIDIA A100 GPUs. The regularized single-stream backbone employs the ViT-Base model [6] and the cross-modal fusion module utilizes a 2-layer encoder. Template images are 128×128 pixels, twice the target bounding box, while search images are 288 × 288 pixels, covering 4.5 times the target area. Each GPU holds 16 image pairs, resulting in a total batch size of 64. We conduct training for 150 and 100 epochs for RGB-T and RGB-D tracking, respectively, and train the score prediction module for 50 and 30 epochs. Datasets and Metrics. In our work, we conduct comparative experiments on two RGBT datasets and one RGB-D dataset, namely LasHeR [14], RGBT234 [13], and DepthTrack [30]. LasHeR is a large-scale benchmark for RGB-T tracking, notable for its high alignment accuracy. It includes 979 training videos and 245 testing videos. In contrast, RGBT234 includes 234 videos with weakly aligned pairs of visible and thermal infrared images. For RGB-T tracking, we use

Fig. 4. Overall performance on the LasHeR dataset. Precision plot: Ours 68.9, ViPT 64.8, Mixformer 54.4, TransT 52.5, APFNet 49.6, DMCNet 48.6, MANet++ 46.2, mfDiMP 44.4, FANet 43.6. Success plot: Ours 55.3, ViPT 52.7, Mixformer 42.7, TransT 39.7, APFNet 36.4, DMCNet 35.6, mfDiMP 34.5, MANet++ 31.5, FANet 31.0.

Table 1. Overall performance on the RGBT234 dataset. The best three results are highlighted in red, blue, and green fonts.

Metric | mfDiMP [34] | SiamCDA [36] | MIRNet [10] | DMCNet [19] | ProTrack [31] | Li et al. [15] | ViPT [11] | Ours
MPR(↑) | 0.646 | 0.760 | 0.816 | 0.839 | 0.795 | 0.846 | 0.835 | 0.882
MSR(↑) | 0.428 | 0.569 | 0.589 | 0.593 | 0.599 | 0.613 | 0.617 | 0.664

Table 2. Overall performance on the DepthTrack dataset. The best three results are highlighted in red, blue, and green fonts.

Metric     | CA3DMS [16] | DAL [23] | DeT [30] | SPT [39] | ProTrack [31] | ViPT [11] | DMTracker [8] | Ours
F-score(↑) | 0.223 | 0.429 | 0.532 | 0.538 | 0.578 | 0.594 | 0.608 | 0.620
Re(↑)      | 0.228 | 0.369 | 0.506 | 0.549 | 0.573 | 0.596 | 0.597 | 0.615
Pr(↑)      | 0.218 | 0.512 | 0.560 | 0.527 | 0.583 | 0.592 | 0.619 | 0.625

the training split of LasHeR to train our network. The performance is evaluated using the maximum precision rate (MPR) and maximum success rate (MSR). DepthTrack is a large-scale RGB-D tracking benchmark, which comprises 150 training and 50 test videos. It employs precision (Pr), recall (Re), and F-score [20] to assess the accuracy and robustness of target localization. For RGB-D tracking, we train our model using the training set from DepthTrack and evaluate its performance on the test set of the same benchmark.

4.2 Comparison with State-of-the-Arts Multi-modal Trackers

To demonstrate the effectiveness of our method, we conduct extensive experiments comparing it with state-of-the-art trackers on three multi-modal tracking benchmarks. The overall tracking performance on the RGB-T benchmarks is shown in Fig. 4 and Table 1. Our method exhibits superior performance over the other state-of-the-art trackers across all metrics on both RGB-T benchmarks. Specifically, on the LasHeR benchmark our method achieves 68.9%/55.3% in MPR/MSR, and on the RGBT234 benchmark it achieves 88.2%/66.4% in MPR/MSR. As presented in Table 2, our method also surpasses all previous SOTA trackers on the RGB-D benchmark and obtains the highest F-score of 62.0%.

Feature Disentanglement and Adaptive Fusion Precision plot

Success plot

80

Overlap Precision [%]

Distance Precision [%]

100

60 40 [88.2] Ours

20

[86.0] Ours-concat

80 60 40 20

[78.6] Mixformer-pixel fusion

0

0

10 20 30 40 Location error threshold [pixels]

77

[66.4] Ours [65.1] Ours-concat [59.7] Mixformer-pixel fusion

50

0 0.0

0.2

0.4 0.6 Overlap threshold

0.8

1.0

Fig. 5. Comparison of different features fusion methods on the RGBT234 dataset.

of 62.0%. The excited performance and significant promotion demonstrate the effectiveness of our proposed multi-modal tracking framework. 4.3

Ablation Study

To verify the effectiveness and give a thorough analysis of our proposed approach, we perform a detailed ablation study on the RGBT234 dataset. Modality-Specific Layer Normalization. To demonstrate the effectiveness of our proposed feature extraction strategy, we performed comparative tests on the single-stream strategy, dual-stream strategy, and our method. The results are shown in Table 3, it demonstrates the importance of modality-specific layer normalization in multi-modal transformer tracking, and supports the claim of MSCLIP [33]: transformers can support learning across multiple modalities and allow knowledge sharing. Different Feature Fusion Methods. To highlight the performance of our proposed feature fusion module, we juxtapose our method with the feature concatenation fusion method, which incorporates three layers of ConvBN. For more comprehensive comparision, we include a pixel-level fusion method [31] in our comparison. We plot the performance in Fig. 5, which demonstrates that our proposed method is vital to emphasize the specificity of different modalities. Single/Dual-Modal Analysis. To underscore the benefits of multi-modal data in tracking, we devise two additional experiments using the large version of MixFormer: MixFormer+RGB and MixFormer+T. The former utilizes the pretrained model from [4], while the latter is fine-tuned using the LasHeR dataset. As demonstrated in Table 4, our method can enhance multi-modal tracking performance without the need to specifically focus on the characteristics of thermal infrared or depth images.

Table 3. Comparison of different feature extraction strategies on the RGBT234 dataset.

Metric | Single-stream | Dual-stream | Ours
MPR(↑) | 0.859 | 0.851 | 0.882
MSR(↑) | 0.640 | 0.642 | 0.664

Table 4. Comparison of single and dual modal data on the RGBT234 dataset.

Metric | Mixformer+RGB | Mixformer+T | Ours
MPR(↑) | 0.762 | 0.765 | 0.882
MSR(↑) | 0.580 | 0.563 | 0.664

5 Conclusion

We propose a regularized single-stream tracking framework for multi-modal tracking from the perspective of disentanglement. With the aid of the two proposed submodules, our method is capable of extracting well-disentangled multi-modal features and subsequently fusing them by emphasizing feature-level specificity. Empirical evaluations demonstrate that our method offers notable improvements over existing trackers in the realm of multi-modal tracking.

Acknowledgments. This project is in part supported by the Key-Area Research and Development Program of Guangzhou (202206030003), and the National Natural Science Foundation of China (U22A2095, 62072482). We would like to thank Qi Chen and Jintang Bian for insightful discussions.

References

1. Bhat, G., Danelljan, M., Van Gool, L., Timofte, R.: Learning discriminative model prediction for tracking. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6181–6190 (2019)
2. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
3. Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer tracking. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8122–8131 (2021)
4. Cui, Y., Jiang, C., Wu, G., Wang, L.: Mixformer: end-to-end tracking with iterative mixed attention (2023)
5. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: efficient convolution operators for tracking. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6931–6939 (2017)
6. Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
7. Feng, M., Su, J.: Learning reliable modal weight with transformer for robust RGBT tracking. Knowl.-Based Syst. 249, 108945 (2022)
8. Gao, S., Yang, J., Li, Z., Zheng, F., Leonardis, A., Song, J.: Learning dual-fused modality-aware representations for RGBD tracking. In: Computer Vision – ECCV 2022 Workshops, pp. 478–494. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-25085-9_27
9. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37, 583–596 (2015)
10. Hou, R., Ren, T., Wu, G.: MIRNet: a robust RGBT tracking jointly with multi-modal interaction and refinement. In: 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (2022)
11. Jiawen, Z., Simiao, L., Xin, C., Wang, D., Lu, H.: Visual prompt multi-modal tracking. In: CVPR (2023)
12. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of siamese visual tracking with very deep networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4277–4286 (2019)
13. Li, C., Liang, X., Lu, Y., Zhao, N., Tang, J.: RGB-T object tracking: benchmark and baseline. Pattern Recogn. 96, 106977 (2019)
14. Li, C.: LasHeR: a large-scale high-diversity benchmark for RGBT tracking. IEEE Trans. Image Process. 31, 392–404 (2022)
15. Li, F., Zha, Y., Zhang, L., Zhang, P., Chen, L.: Information lossless multi-modal image generation for RGB-T tracking. In: Yu, S., et al. (eds.) Pattern Recognition and Computer Vision, pp. 671–683. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-18916-6_53
16. Liu, Y., Jing, X.Y., Nie, J., Gao, H., Liu, J., Jiang, G.P.: Context-aware three-dimensional mean-shift with occlusion handling for robust object tracking in RGBD videos. IEEE Trans. Multimedia 21, 664–677 (2019)
17. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019)
18. Lu, A., Li, C., Yan, Y., Tang, J., Luo, B.: RGBT tracking via multi-adapter network with hierarchical divergence loss. IEEE Trans. Image Process. 30, 5613–5625 (2021)
19. Lu, A., Qian, C., Li, C., Tang, J., Wang, L.: Duality-gated mutual condition network for RGBT tracking. IEEE Trans. Neural Netw. Learn. Syst. (2022)
20. Lukežič, A., Zajc, L.V., Vojíř, T., Matas, J., Kristan, M.: Performance evaluation methodology for long-term single-object tracking. IEEE Trans. Cybern. 51(12), 6305–6318 (2021)
21. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
22. Pengyu, Z., Zhao, J., Wang, D., Lu, H., Ruan, X.: Visible-thermal UAV tracking: a large-scale benchmark and new baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2022)
23. Qian, Y., Yan, S., Lukežič, A., Kristan, M., Kämäräinen, J.K., Matas, J.: DAL: a deep depth-aware long-term tracker. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7825–7832 (2021)
24. Thangavel, J., Kokul, T., Ramanan, A., Fernando, S.: Transformers in single object tracking: an experimental survey. arXiv preprint arXiv:2302.11867 (2023)
25. Wang, C., et al.: Cross-modal pattern-propagation for RGB-T tracking. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7062–7071 (2020)
26. Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
27. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4784–4793 (2022)
28. Xiao, Y., Yang, M., Li, C., Liu, L., Tang, J.: Attribute-based progressive fusion network for RGBT tracking. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, pp. 2831–2838 (2022)
29. Yan, B., Peng, H., Fu, J., Wang, D., Lu, H.: Learning spatio-temporal transformer for visual tracking. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10428–10437 (2021)
30. Yan, S., et al.: DepthTrack: unveiling the power of RGBD tracking. In: ICCV, pp. 10705–10713 (2021)
31. Yang, J., Li, Z., Zheng, F., Leonardis, A., Song, J.: Prompting for multi-modal tracking. In: Proceedings of the 30th ACM International Conference on Multimedia, MM 2022, pp. 3492–3500. Association for Computing Machinery (2022)
32. Ye, B., Chang, H., Ma, B., Shan, S., Chen, X.: Joint feature learning and relation modeling for tracking: a one-stream framework. In: Brostow, G., Cisse, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022, pp. 341–357. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2_20
33. You, H., et al.: Learning visual representation from modality-shared contrastive language-image pre-training. In: Avidan, S., Brostow, G., Cisse, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022, pp. 69–87. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-19812-0_5
34. Zhang, L., Danelljan, M., Gonzalez-Garcia, A., van de Weijer, J., Khan, F.S.: Multi-modal fusion for end-to-end RGB-T tracking. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 2252–2261 (2019)
35. Zhang, P., Zhao, J., Bo, C., Wang, D., Lu, H., Yang, X.: Jointly modeling motion and appearance cues for robust RGB-T tracking. IEEE Trans. Image Process. 30, 3335–3347 (2021)
36. Zhang, T., Liu, X., Zhang, Q., Han, J.: SiamCDA: complementarity- and distractor-aware RGB-T tracking based on siamese network. IEEE Trans. Circuits Syst. Video Technol. 32(3), 1403–1417 (2022)
37. Zhao, H., Chen, J., Wang, L., Lu, H.: ARKitTrack: a new diverse dataset for tracking using mobile RGB-D data. In: CVPR (2023)
38. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 12993–13000 (2020)
39. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: RGBD1K: a large-scale dataset and benchmark for RGB-D object tracking. In: Proceedings of the AAAI Conference on Artificial Intelligence (2023)
40. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (2021)
41. Zhu, Y., Li, C., Tang, J., Luo, B.: Quality-aware feature aggregation network for robust RGBT tracking. IEEE Trans. Intell. Veh. 6(1), 121–130 (2021)

Modality Balancing Mechanism for RGB-Infrared Object Detection in Aerial Image

Weibo Cai1,2,3, Zheng Li1,2,3, Junhao Dong1,2,3, Jianhuang Lai1,2,3, and Xiaohua Xie1,2,3(B)

1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
{caiwb6,lizh525,dongjh8}@mail2.sysu.edu.cn, {stsljh,xiexiaoh6}@mail.sysu.edu.cn
2 Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
3 Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China

Abstract. RGB-Infrared object detection in aerial images has gained significant attention due to its effectiveness in mitigating the challenges posed by illumination restrictions. Existing methods often focus heavily on enhancing the fusion of the two modalities while ignoring the optimization imbalance caused by inherent differences between modalities. In this work, we observe that there is an inconsistency between the two modalities during joint training, and this hampers the model's performance. Inspired by these findings, we argue that the focus of RGB-Infrared detection should be shifted to the optimization of the two modalities, and we further propose a Modality Balancing Mechanism (MBM) for training the detection model. To be specific, we initially introduce an auxiliary detection head to inspect the training process of both modalities. Subsequently, the learning rates of the two backbones are dynamically adjusted using the Scaled Gaussian Function (SGF). Furthermore, the Multi-modal Feature Hybrid Sampling Module (MHSM) is introduced to augment the representation by combining complementary features extracted from both modalities. Benefiting from the design of the proposed mechanism, experimental results on DroneVehicle and LLVIP demonstrate that our approach achieves state-of-the-art performance. The code is available at https://github.com/ccccwb/Multimodal-Detection-and-Tracking-UAV.

Keywords: RGB-Infrared object detection · Aerial image · Modality balancing mechanism · Multi-modal feature hybrid sampling

1 Introduction

Object detection in aerial images [4,6] plays a vital role in the computer vision field, with various applications such as traffic control [13], video monitoring [14], and


structural health monitoring [3]. Current studies for detecting objects in aerial images primarily use visible images, which perform well in favorable lighting and weather conditions. However, these methods are sensitive to poor lighting, limiting their effectiveness in low-light situations. To address this limitation, infrared images have gained popularity in recent years owing to their thermal sensing properties [12,20]. This advancement has boosted the development of RGB-IR detection through visible-infrared paired aerial images.

Fig. 1. Toy experiments illustrating the optimization imbalance. (a) Loss curve and performance of two independently trained detectors. (b) Loss curve and performance of two jointly trained detectors. In this case, the gradient from the detection heads of the two modalities is not propagated back to the backbone.

Current RGB-Infrared object detection methods [1,8,18,20,25,26] typically follow a two-step pipeline. Initially, separate backbones are employed to extract features from two modalities. These features are then combined using fusion modules, facilitating the classification and regression of bounding boxes. However, these methods encounter challenges in fusing the features from the two modalities due to disparities in data characteristics and variations in imaging positions. To address this issue, researchers have proposed methods such as MBNet [27] and TSFADet [24], which introduce carefully designed fusion modules to mitigate illumination imbalance and misalignment challenges in feature fusion. Nevertheless, it is noteworthy that these methods primarily emphasize feature fusion and do not comprehensively address the optimization imbalance arising from inherent modality differences [7,11,15,21]. In this paper, we conduct a few toy experiments to empirically show the optimization imbalance in multi-modal training. Figure 1(a) and Fig. 1(b) illustrate the model, loss curve, and performance during independent training and joint training, respectively. By comparing the loss curves under these two conditions,

it can be observed that the optimization of the visible modality is inhibited to a certain extent. Comparing the performance of the model in the two cases, it is evident that the performance of the visible modality is significantly impacted. These findings demonstrate that the optimization imbalance in the multi-modal training process hampers the model's ability to effectively utilize and integrate information from both modalities.

To alleviate the optimization imbalance, we propose a simple yet effective strategy, the Modality Balancing Mechanism (MBM). Specifically, we introduce an auxiliary detection head to assess the training process of both modalities. Then the learning rates of the two backbones are adaptively updated by the Scaled Gaussian Function (SGF). Furthermore, the Multi-modal Feature Hybrid Sampling Module (MHSM) is introduced to enhance the feature representation by aggregating complementary features from both the visible and infrared modalities. It is worth emphasizing that the proposed method is plug-and-play and can be seamlessly integrated into existing detection frameworks. Extensive experimental results on DroneVehicle [20] and LLVIP [12] show that our method achieves new state-of-the-art performance.

The main contributions of the paper are as follows:
– We show the optimization imbalance between the two modalities during joint training and then propose a new plug-and-play method, which can be seamlessly integrated into existing detection frameworks to alleviate the optimization imbalance.
– We propose a Modality Balancing Mechanism (MBM) to adaptively adjust the learning rates of the two modalities and a Multispectral Feature Hybrid Sampling Module (MHSM) to effectively aggregate information from both modalities.
– We validate the effectiveness of our method on DroneVehicle and LLVIP, where it outperforms other state-of-the-art models. Our method can also be combined with existing detectors to consistently improve performance at a low cost.

2 Related Work

2.1 Object Detection in Aerial Images

Current object detection techniques can be broadly classified into two categories: two-stage [19] and one-stage [16] methods. Two-stage methods first generate candidate regions and then predict the category and refine the bounding box for each candidate region. In contrast, one-stage methods integrate recognition and detection into a single deep neural network. Although these methods achieve promising performance in natural scenes, they are not adept at handling multi-angle objects, such as vehicles, in aerial imagery. To address this issue, oriented bounding box (OBB) based detection methods have emerged, which leverage rotation modules to predict the orientation of the object. By directly or indirectly introducing angles into the region

proposal network, the two-stage detectors generate candidate boxes with orientation [5,23]. One-stage OBB detectors usually output the oriented bounding box with direction directly, using feature refinement and step-wise regression [9]. However, methods relying solely on visible images are limited in adapting to light changes and diverse weather conditions. Therefore, recent works have incorporated infrared images to address this challenge.

2.2 RGB-Infrared Object Detection

The introduction of infrared images provides rich and diverse information for detection, compensating for the limitations of visible light in low-light and adverse weather conditions. Most RGB-Infrared object detection methods first utilize separate backbones to extract features from the two modalities; fusion modules are then designed to combine the features for detection. According to the phase of feature fusion, these methods can be classified into three categories: early fusion, mid fusion, and late fusion [28]. Mid fusion, occurring at the feature level, enables the integration of shared features between the RGB and infrared modalities, making it the most extensively explored approach [1,8,17,18,22]. CIAN [25] emphasizes the effectiveness of the cross-modal interactive attention mechanism in capturing complementary information and improving the accuracy of multispectral pedestrian detection. AR-CNN [26] effectively learns from weakly aligned multimodal data in the context of multispectral pedestrian detection. MBNet [27] proposed a difference fusion module and an illumination-aware module to realize the adaptive alignment and fusion of features. UA-CMDet [20] incorporates uncertainty-aware learning to handle the challenges of cross-modal data captured by a drone. TSFADet [24] achieves feature alignment by learning geometric transformations within the network. While these methods improve the effectiveness of fusion, existing joint training strategies fail to fully exploit the advantages offered by all modalities, leading to an imbalance in backbone optimization [7,15]. In this work, we aim to address this problem through adaptive control of the optimization of each modality.

3 Method

In this section, we introduce the overall framework (§3.1) and elaborate on the proposed modality balancing mechanism (§3.2) and multi-modal feature hybrid sampling module (§3.3).

3.1 Overview

Given RGB-Infrared image pairs {(I_i^{vis}, I_i^{ir})}_{i=1}^{N}, our goal is to train a multi-modal detection model by exploring the consensus of the RGB and IR modalities. Figure 2 gives the overall framework of the proposed method, which can be seamlessly integrated into existing detection methods. It consists of a modality balancing mechanism and a multi-modal feature hybrid sampling module.


Fig. 2. Overview of the proposed method. The dotted lines in the figure connect operations that exist during training only.

of visible f vis ∈ RC×H×W and infrared f ir ∈ RC×H×W are first extracted by two independent feature encoders, where C denotes the number of channels, H and W denote the height and width, respectively. With these features, we inspect the training status by auxiliary detection head and dynamically adjust the learning rate of two modalities. Then, two types of features are adaptively aggregated by spatial location and modalities attention. Finally, following the previous detection methods [23,24], the total loss function is defined as follows: L = Lcls + Lreg + Lrpn + λLaux ,

(1)

where Lcls , Lreg , and Lrpn is commonly used classification loss, regression, and rpn loss, respectively. Laux serves as an extra loss to monitor the optimization of the two modalities. λ is a hyperparameter to balance different losses. 3.2

Modality Balancing Mechanism

Imbalanced optimization of backbones leads to the suppression of informative features. To alleviate the influence of imbalanced optimization, we propose a Modality Balancing Mechanism (MBM) to adaptively control the learning rate of two modalities. Firstly, we introduce an auxiliary detection head to inspect the optimization relationship between the two modalities. This is achieved by calculating the loss of the corresponding modality through the auxiliary head. Subsequently, utilizing these two losses, we employ a Scaled Gaussian Function (SGF) to compute the update ratio. Finally, the learning rate of the two modalities is dynamically adjusted throughout the training cycles. Optimization Inspection. To obtain the optimization status of each modality in the joint training, an auxiliary detection head is introduced. The architecture of the auxiliary detection head is the same as the ordinary detector, which contains a projection linear layer Fproj , a classification branch Fcls , and a regression branch Freg . Taking the visible feature f vis as an example, the classification logits cvis and regression offsets tvis of the auxiliary head can be expressed as: cˆvis = Fcls (Fproj (f vis )), tˆvis = Freg (Fproj (f vis )).

(2)

86

W. Cai et al.

The weights of the auxiliary detection head are trained by standard detection losses as follows: Lvis aux =

N N 1  1  vis vis Lcls (ˆ cvis , c ) + Lreg (tˆvis i i i , ti ), N i=1 N i=1

(3)

and tvis denote the where N represents the size of a mini-batch, while cvis i i th classification and regression ground truth of the i sample, respectively. The terms Lcls and Lreg correspond to the cross-entropy loss and Smooth L1 loss, respectively. The operation of the IR modal is the same as the RGB modal, ir and the total auxiliary loss is Laux = Lvis aux + Laux . The auxiliary detection head is alternate optimized during the training phase. Its loss can reflect the optimization status of backbones in each training iteration. Scaled Gaussian Function. To achieve balanced optimization between the two modalities, we employ an inhibitory coefficient that reduces the optimization rate of the more optimized modality. The optimization level of a given modality can be assessed by examining its associated auxiliary loss. A lower loss value signifies a more optimized backbone for the modality under consideration. As a result, the optimization disparity between two distinct modalities can be quantified by their corresponding auxiliary loss. To accomplish this, the Scaled Gaussian Function (SGF) is employed. (Laux −μ)2 1 SGF (Laux ) = A √ e − 2σ2 , σ 2π

(4)

where A serves as a scaling factor, which ensures that the sampling value at the mean μ is equal to the current learning rate. The mean value μ and standard deviation σ of the SGF are defined as follow: ir μ = M ax(Lvis aux , Laux ),

σ=α

ir M ax(Lvis aux , Laux ) , ir M in(Lvis aux , Laux )

(5)

where α is a hyperparameter to ensure the stability of training. Assigning the bigger auxiliary loss to μ ensures that the learning rate of the less optimized modality remains constant while suppressing the learning rate of the more optimized modality. Therefore, the learning rates modulated by SGF are obtained: η vis = SGF (Lvis aux ),

η ir = SGF (Lir aux ).

(6)

Parameter Updating. We further improve the Stochastic Gradient Descent (SGD) to balance the optimization of two backbones. With the modulated learning rate, the update of parameters is as follows: vis θt+1 = θtvis − η vis ∇θvis L,

ir θt+1 = θtir − η ir ∇θir L,

(7)

where t signifies the iteration number, η denotes the modulated learning rate, and L(·) represents the loss function. θvis , θir are the visible and infrared backbones parameter, respectively.

Modality Balancing Mechanism

87

Fig. 3. Multi-modal feature hybrid sampling module.

3.3

Multimodal Feature Hybrid Sampling Module

To integrate and align the information from both features more effectively, we introduce a hybrid sampling method based on multi-modal characteristics to address the issue of weakly misaligned features. Figure 3 presents a schematic representation of the proposed module. Given infrared feature f ir , the combined feature can be repvisible feature f vis and vis f ir ) ∈ R2C×H×W , with concatenation along channel resented by Q = (f dimension. Firstly, for an arbitrary coordinate (x, y), we take its combined feature vector q = Q(x, y) ∈ R2C×1×1 as the query to generate sampling bias coordinates for the features of visible and infrared modalities by two distinct fully connected layers: u {(x, y)}u = Fof f set (q),

(8)

where {(x, y)} ∈ RM represents the M offsets relative to query point (x, y). Fof f set denotes the fully connected layers. Note that this operation is employed on both RGB and IR modalities. For simplicity, we denote u as the two modalities. With the offsets, the sampled coordinates can be obtained as follows:  xu = xu + x u . (9) p = y u = y u + y The sampling weight is also determined linearly by the query q. u (q). wu = Fweight

(10)

Secondly, we aggregate these features on sampling coordinates to produce fused features. The aggregation includes self-aggregation and cross-aggregation. Taking the visible modality as an example, the offset {(x, y)}vis is averagely divided into two groups, and the sampling coordinates of self-aggregation and cross-aggregation are calculated, as follows: vis vis {pvis . self , pcross } ∈ p

(11)

88

W. Cai et al.

For the visible modality, self-aggregation collects the features of the visible modality at the sampling coordinates, while cross-aggregation collects the features of the infrared modality at the sampling coordinates. Subsequently, the fused features of the two parts are obtained as follows: ir ir vis f˜vis = M LP (wvis f vis (pvis self ) + w f (pcross )),

(12)

vis vis ir f (pcross )). f˜ir = M LP (wir f ir (pvis self ) + w

(13)

The two features after sampling are combined to obtain a more comprehensive feature representation, enabling the model to better capture information from different perspectives of the modalities and improving the interaction between them. With those two features, the fused features can be represented by f f used = CBR(f˜vis f˜ir ) ∈ RC×H×W . Finally, the two features are reduced in dimension by CBR (Conv, BN, ReLu) and inputted into the detection head to get the final results, including scores and oriented bounding boxes.

4 4.1

Experiment Settings

Datasets. To evaluate the efficacy of the proposed method, we evaluate two RGB-IR datasets captured from aerial photography perspectives: 1) The DroneVehicle dataset [20] is a comprehensive collection of drone-based data specifically designed for RGB-Infrared vehicle detection. It contains a total of 17,890 training samples and 1,469 validation samples. All images have a resolution of 640 × 512 pixels. It includes five categories: car, truck, bus, van, and freight car. 2) The LLVIP dataset [12] focuses on pedestrian detection and primarily consists of scenes captured in low-light environments. In comparison to the DroneVehicle dataset, the LLVIP dataset is more spatially aligned. It comprises 15,488 pairs of RGB-infrared images, with 12,025 pairs allocated for training and the remaining 3,463 pairs utilized for testing. Metrics. The evaluation metric used is Mean Average Precision (mAP), which calculates the average value of Average Precision (AP) across different categories. In the DroneVehicle dataset, the IoU threshold is set at 0.5, while in the LLVIP dataset, the IoU threshold ranges from 0.50 to 0.95 with a step size of 0.05. Implementation Details. Our method relies on the widely-used detection toolbox MMDetection [2]. During the training phase, we adopt the same hyperparameter settings as the original Oriented R-CNN [23] and utilize ResNet-50 [10] as the backbone network. To train our proposed method, we run 36 epochs with an initial learning rate of 0.005 and a batch size of 8. The weight decay and momentum values are set to 0.0001 and 0.9, respectively. The training process for both datasets is conducted on a single NVIDIA A100 GPU.

Modality Balancing Mechanism

89

Table 1. Comparison of performances with state-of-the-art methods on DroneVehicle dataset. Modality

Detectors

bus

truck

van

freight car mAP

RGB

Faster R-CNN(OBB) [19] 67.88

66.98

38.59

23.20

26.31

44.59

RetinaNet(OBB) [16]

79.51

70.54

35.77

26.89

35.34

47.61

S2ANet [9]

90.05

89.00

52.69

45.90

55.36

66.60

Oriented R-CNN [23]

88.25

89.00

61.47

47.16

49.19

67.01

Faster R-CNN(OBB) [19] 88.58

67.86

37.20

33.70

33.80

53.25

RetinaNet(OBB) [16]

90.03

85.09

40.64

37.11

36.29

57.83

S2ANet [9]

90.11

89.62

59.09

48.29

52.90

68.00

Oriented R-CNN [23]

90.13

89.65

60.00

49.89

53.96

68.72

89.98

88.90

62.47

59.59

60.22

70.23

AR-CNN(OBB) [26]

90.08

89.38

64.82

51.51

62.12

71.58

UA-CMDet [20]

87.51

87.08

60.70

37.95

46.80

64.01

TSFADet [24]

90.01

89.70

69.15

55.19

65.45

73.90

Ours

90.38 90.17 69.53 63.04 66.46

IR

RGB + IR CIAN(OBB) [25]

car

75.92

Fig. 4. Qualitative results of the proposed method. Scene 1 is a low-light scene, and Scene 2 is a dark scene. Red boxes are incorrect bounding boxes. (Color figure online)

4.2

Comparison with State-of-the-Art Methods

We compare our method with multiple state-of-the-art methods on the DroneVehicle and LLVIP. The results are shown in Table 1 and Table 2. On the aerial imagery dataset DroneVehicle, our approach achieved stateof-the-art performance, with 75.92% mAP. Oriented R-CNN provides the best results among those single modality methods. However, compared with a single method, those multispectral methods generally achieve better performance. Among that multi-modal method, TSFADet obtains 73.90% mAP through fine characteristic fusion methods. Compared with TSFADet, the proposed method improves by 2.02% mAP. This improvement is primarily attributed to the fact that previous methods focused solely on feature fusion methods, neglecting the problem of unbalanced optimization between visible modality and infrared modality. Consequently, they failed to obtain a more effective feature representation from the backbone before fusion. As shown in Fig. 4 is the visualizations results of the proposed method. It can be seen that our method can provide more accurate detection in various scenarios compared to the baseline.

90

W. Cai et al.

Table 2. Comparison of performances with Table 3. The ablation study of the state-of-the-art methods on LLVIP dataset. proposed modules. Modality Detectors

mAP

RGB+IR UA-CMDet [20] 62.2

Methods

mAP

Baseline

71.74

CFT [17]

63.6

Baseline + MBM

75.20

AMSF-Net [1]

64.5

Baseline + MHSM

75.02

DCMNet [22]

61.5

Ours

65.1

Baseline + MBM+ MHSM 75.92

Fig. 5. The generalization experiment.

Fig. 6. Hyperparameter α experiment.

In the pedestrian detection dataset LLVIP, our approach also delivers competitive results. We compared with four state-of-the-art methods, namely UACMDet [20], CFT [17], AMSF-Net [1], and DCMNet [22]. The results are shown in the Table 2. Our method achieves the result of 65.1% mAP, better than other multi-modal methods. Unlike the DroneVehicle dataset, the pixel-level discrepancy in the visible-infrared images of the LLVIP dataset is smaller, resulting in a reduced modality imbalance phenomenon. Nevertheless, our method still achieves competitive performance. 4.3

Ablation Study

We conducted comprehensive ablation experiments to demonstrate the effectiveness of our method. Baseline Comparison. We conduct comprehensive ablation experiments on the two proposed modules, and the corresponding experimental results are presented in Table 3. To ensure a fair comparison, Oriented R-CNN is used as the baseline. The baseline achieved 71.74% mAP. On this basis, we added the MBM and MHSM modules, resulting in an increase of 3.46% and 3.18% in mAP, respectively. These results demonstrate that the proposed method enables the model to capture more effective information and improve its performance.

Modality Balancing Mechanism

91

Generality of the Proposed Method. We integrate our method into different popular detectors: Faster R-CNN [19], Roi-Transformer [5], Oriented RCNN [23], Retina Net [16] and S2ANet [9]. The results are provided in Fig. 5. Compared with the baseline, integrating our method into Faster R-CNN, RoiTransformer, Oriented R-CNN, Retina Net and S2ANet obtains 5.67%,5.50%, 4.18%, 3.63% and 4.28% mAP improvements, respectively. The experimental results validate the effectiveness and generality of our method. Impacts of the Value of α in MBM. The α in Eq. 5 is used to adjust the standard deviation of SGF. We explore the impacts of the different settings of α. The results are shown in Fig. 6. Finally, we select α = 0.1.

5

Conclusion

In this work, we present the optimization imbalance between two modalities during joint training. Inspired by these findings, we propose a Modality Balancing Mechanism (MBM) to inspect the training process and adaptively updated the learning rate of both modalities. Furthermore, an effective fusion module is introduced to enhance feature representation by aggregating complementary features from both modalities. Experimental results on DroneVehicle and LLVIP demonstrate that our approach achieves state-of-the-art performance. Acknowledgments. This project is in part supported by the Key-Area Research and Development Program of Guangzhou (202206030003), and the National Natural Science Foundation of China (U22A2095, 62072482). We would like to thank Qi Chen for insight discussion.

References 1. Bao, W., Huang, M., Hu, J., Xiang, X.: Attention-guided multi-modal and multiscale fusion for multispectral pedestrian detection. In: Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, 4–7 November 2022, Proceedings, Part I, pp. 382–393. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-18907-4 30 2. Chen, K., et al.: Mmdetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019) 3. Chen, Q., Huang, Y., Sun, H., Huang, W.: Pavement crack detection using hessian structure propagation. Adv. Eng. Inf. 49, 101303 (2021) 4. Cheng, G., Yuan, X., Yao, X., Yan, K., Zeng, Q., Han, J.: Towards large-scale small object detection: survey and benchmarks. arXiv preprint arXiv:2207.14096 (2022) 5. Ding, J., Xue, N., Long, Y., Xia, G.S., Lu, Q.: Learning ROI transformer for oriented object detection in aerial images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2849–2858 (2019) 6. Ding, J.: Object detection in aerial images: a large-scale benchmark and challenges. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 7778–7796 (2021)

92

W. Cai et al.

7. Du, C., et al.: On uni-modal feature learning in supervised multi-modal learning. arXiv preprint arXiv:2305.01233 (2023) 8. Fu, H., et al.: LRAF-Net: long-range attention fusion network for visible-infrared object detection. IEEE Trans. Neural Netw. Learn. Syst. (2023) 9. Han, J., Ding, J., Li, J., Xia, G.S.: Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2021) 10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 11. Huang, Y., Lin, J., Zhou, C., Yang, H., Huang, L.: Modality competition: what makes joint training of multi-modal network fail in deep learning?(provably). In: International Conference on Machine Learning, pp. 9226–9259. PMLR (2022) 12. Jia, X., Zhu, C., Li, M., Tang, W., Zhou, W.: LLVIP: a visible-infrared paired dataset for low-light vision. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3496–3504 (2021) 13. Kim, K., Kim, S., Shchur, D.: A UAS-based work zone safety monitoring system by integrating internal traffic control plan (ITCP) and automated object detection in game engine environment. Autom. Constr. 128, 103736 (2021) 14. Li, S., Liu, Y., Zhao, Q., Feng, Z.: Learning residue-aware correlation filters and refining scale for real-time UAV tracking. Pattern Recogn. 127, 108614 (2022) 15. Liang, P.P., Zadeh, A., Morency, L.P.: Foundations and recent trends in multimodal machine learning: principles, challenges, and open questions. arXiv preprint arXiv:2209.03430 (2022) 16. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) 17. Qingyun, F., Dapeng, H., Zhaokui, W.: Cross-modality fusion transformer for multispectral object detection. arXiv preprint arXiv:2111.00273 (2021) 18. Qingyun, F., Zhaokui, W.: Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery. Pattern Recogn. 130, 108786 (2022) 19. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015) 20. Sun, Y., Cao, B., Zhu, P., Hu, Q.: Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning. IEEE Trans. Circuits Syst. Video Technol. 32(10), 6700–6713 (2022) 21. Wu, J., Liang, Y., Akbari, H., Wang, Z., Yu, C., et al.: Scaling multimodal pretraining via cross-modality gradient harmonization. Adv. Neural. Inf. Process. Syst. 35, 36161–36173 (2022) 22. Xie, J., et al.: Learning a dynamic cross-modal network for multispectral pedestrian detection. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 4043–4052 (2022) 23. Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3520–3529 (2021) 24. Yuan, M., Wang, Y., Wei, X.: Translation, scale and rotation: cross-modal alignment meets RGB-infrared vehicle detection. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part IX, pp. 509–525. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-03120077-9 30 25. Zhang, L., et al.: Cross-modality interactive attention network for multispectral pedestrian detection. Inf. Fusion 50, 20–29 (2019)

Modality Balancing Mechanism

93

26. Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., Liu, Z.: Weakly aligned cross-modal learning for multispectral pedestrian detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5127–5137 (2019) 27. Zhou, K., Chen, L., Cao, X.: Improving multispectral pedestrian detection by addressing modality imbalance problems. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12363, pp. 787–803. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58523-5 46 28. Zhou, T., Fan, D.P., Cheng, M.M., Shen, J., Shao, L.: RGB-D salient object detection: a survey. Comput. Visual Media 7, 37–69 (2021)

Pacific Oyster Gonad Identification and Grayscale Calculation Based on Unapparent Object Detection Yifei Chen1 , Jun Yue1(B) , Zhenbo Li2 , Jianmin Yang1 , and Weijun Wang1 1 Ludong University, Yantai 264025, China

[email protected] 2 China Agricultural University, Beijing 100091, China

Abstract. The plumpness of the Pacific oyster gonad, the reproductive organs of both male and female oysters which are buried within the flesh of the oyster in the shell, has important implications for the quality and breeding of subsequent parents. At present, only the conventional method of breaking their shells allows for the observation and study of the interior tissues of Pacific oysters. In this paper, the gonad of Pacific oyster was observed by small animal Magnetic Resonance Imaging (MRI), and a multi-effective feature fusion network algorithm R-SINet was proposed for the detection of unapparent target, in Nuclear Magnetic Resonance (NMR) images, which can effectively solve the problem that the gonads of Pacific oysters are difficult to identify from the background images. In addition, the gray histogram of the segmented gonad region was calculated, and it was found that the female and male had differences in gray value. The sex of oyster was nondestructively detecting by this task. Firstly, established the Oyster gonad datasets; secondly, a compact pyramid refinement module that combines with high-level semantic features and low-level semantic features was proposed, designed a lightweight decoder to improve the accuracy of feature fusion; thirdly, a switchable excitation model capable of adaptive recalibration is proposed to obtain an attention map. Experimental results on the Oyster gonad datasets demonstrate the effectiveness of the method. Comparing R-SINet’s experimental findings to those of popular algorithm models, such as the benchmark algorithm SINet_v2, revealed promising results. Keywords: Pacific oyster gonad · Unapparent object detection · Gray value calculation · R-SINet

1 Introduction Pacific oysters are very popular in aquaculture industry because of their large size, short breeding cycle, and high efficiency. The selection of Pacific oysters with mature and plump gonads for parental breeding is the key to quality and yield improvement. In addition, distinguishing between male and female individuals based on grayscale differences is of great significance for subsequent selection of specific gender oyster © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 94–106, 2024. https://doi.org/10.1007/978-981-99-8555-5_8

Pacific Oyster Gonad Identification and Grayscale Calculation

95

individuals for breeding. With the rapid development of small animal imaging technology and convolutional neural network, we can now use a small animal imaging system to obtain Pacific oyster Magnetic Resonance Imaging (MRI) images which can clearly and intuitively observe the gonad part of oysters without the harm to live Pacific oysters; it can solve the problem of high similarity between organs and tissues and insignificant color differences in MRI images when segmenting the gonads by detecting unapparent objects in MRI images with relatively complex backgrounds, which is important for improving the integrity and accuracy of segmenting the gonads of Pacific oysters. Small animal MRI is a branch of magnetic resonance imaging, which is becoming an important tool for studying the internal structure of small animals [1]. In 2001, the first clinical study of canine intervertebral disc disease was conducted using MRI technology in China. In 2019, Zhang et al. [2] applied small animal MRI systems to study Alzheimer’s disease (AD), providing multimodal imaging techniques to help diagnose early AD. In 2022, Hang et al. [3] used 7.0T small animal MRI equipment to noninvasively observe brain injury in a rat model of classic heat stroke. Small animal MRI techniques started earlier abroad. In 1990, Button et al. used 0.35T small animal MRI to observe the growth and morphology of tumors in mice. B Webster [4] wrote a practical small animal MRI manual in 2010. In 2021, S Gilchrist et al. [5] designed a gating unit for synchronized control of the small animal heart and respiration using small animal MRI. While in 2022, Liu et al. [6] conducted a multimodal animal MRI study for memory generalization in mice. The above-mentioned studies show that specialized small animal MRI techniques provide technical support for the study of small animal organ tissues. However, nowadays, small animal MRI techniques mainly focus on the detection and study of terrestrial animals, there are few studies in the field of marine organisms, especially in shellfish organisms. In recent years, with the iterations and advances in computer vision, unapparent object detection identification techniques have developed rapidly. In 2020, Fan et al. [7] first proposed a camouflaged object detection technique, designed the SINet network architecture aimed at identifying small target objects in complex backgrounds. In 2021, Lv et al. [8] proposed a hierarchical localization of target regions, introducing reverse attention [9] to capture more details of the spatial structure and designed the LSR algorithm model; Zhai et al. [10] built an edge-shrinkage graph inference module to guide the learning of feature representations of camouflaged objects; Fan et al. [11] proposed the SINet_v2 network architecture with optimization improvements based on SINet. In 2022, Jia et al. [12] performed both amplification and repetition operations for camouflage object segmentation to achieve accurate localization of camouflage objects by iteration and target object amplification and proposed the SegMaR algorithm model. However, when these existing detection and segmentation algorithms for unapparent object detection are used to train MRI grayscale images, the local feature extractions in all of these algorithms are coarse and the global feature fusion is ignored. 
This results in inadequate local feature extractions and loss of global information in the segmented grayscale images, which in turn cause incomplete segmentation targets and unclear boundaries, and the overall level of the evaluation index of segmentation decreases.

96

Y. Chen et al.

In summary, to address the lack of small animal MRI technology in shellfish aquatic applications and the inability of existing unapparent object detection algorithms to accurately segment grayscale images, our main contributions are five-fold: – We propose a new network algorithm (R-SINet), which can effectively enhance the integrity of feature extraction in gray scale images of Pacific oysters. – A Compact Pyramid Refinement Module (CPRM) is proposed to integrate adjacent semantic features, and a lightweight decoder is designed to improve the accuracy of feature fusion. – A Switchable Excitation Model (SEM) is proposed, by automatically choosing and changing activation operators based on the channel demands of different network layers, the model can achieve multi-effect feature fusion for unapparent object detection and improve segmentation accuracy. – Extensive experimental results on self-built Oyster gonad datasets demonstrate the effectiveness of our R-SINet over other state-of-the-art methods. – The gray value difference of male and female gonads was obtained by gray histogram, which laid a foundation for subsequent nondestructive detection of Pacific oyster sex.

2 Method The technology roadmap of this paper is shown in Fig. 1.

Fig. 1. Technology roadmap.

Pacific Oyster Gonad Identification and Grayscale Calculation

97

The network framework diagram of this paper is shown in Fig. 2, which consists of two phases.

Fig. 2. R-SINet network architecture diagram.

The scope search stage mainly carries out feature extraction of grayscale images to obtain the position and scope of the target object. The scope recognition stage uses the group inversion attention module (GRA) [11] to obtain more detailed edge information and then switches the global adaptive channels through the switchable excitation model (SEM) to obtain more accurate global feature information, and finally completes the detection and segmentation of the target object in the grayscale images. 2.1 Compact Pyramid Refinement Module (CPRM) The common pyramid models [13, 14] at this stage have problems such as large computation, large memory consumption, and slow inference speed. To solve such problems, this paper adds a lightweight feature pyramid module after effectively fusing neighboring features, deeply fuses high-level and low-level features, and proposes a Compact Pyramid Refinement Module (CPRM), which improves the efficiency while ensuring accuracy. First, use Res2Net50 [15] network for image I ∈ RW×H×3 extracting features fk (k ∈ {1, 2, 3, 4, 5}), obtaining 5 features with a resolution of fk = 2Hk + 2Wk of features and then expand the perceptual field by the texture enhancement module (TEM). The features to be selected are obtained from the TEM fk . . After that, the neighboring features are aggregated by using the Neighbor Connection Decoder (NCD) to keep the semantic information consistent within the same layer and semantically consistent across layers. Since low-level features have larger resolutions consume more computational resources

98

Y. Chen et al.

and contribute less to the performance improvement, we only use f3 , f4 , and f5 as the  u , u ∈ {1, 2, 3}, and each feature image is represented feature images fknc = FNC fk ; WNC by the following equation: ⎧ f5nc = f5 ⎪ ⎪ ⎨   1 f4nc = f4 ⊗ g δ↑2 f5 ; WNC (1)       ⎪ ⎪ ⎩ f nc = f  ⊗ g δ 2 f nc ; W 2 ⊗ g δ 2 f  ; W 3 3



↑ 4

3

NC

↑ 4

NC



u where g ·; WNC denotes the 3 × 3 convolution operation after normalization by batch processing, and δ↑2 (·) denotes the operation of sampling twice on the features to be selected, to ensure the shape matching between features. Using ⊗, the corresponding elements are multiplied one by one to reduce the gap between adjacent features. The compact pyramid refinement uses the idea of depth direction separable convolution [16]. The specific equation is shown below:

⎧ ⎪ ⎪ ⎪ ⎨

f1 = Conv1×1 (f ) di = Conv i = 2, 4, 8

3×3 (f1 ), di ⎪ f2 = ReLU BN f2 , i = 2, 4, 8 ⎪ ⎪ ⎩ f3 = Conv1×1 (f2 ) + f f2di

(2)

where f denotes the input of the feature image, Convn×n denotes the input for the n × n the convolution operation, di denotes the dilation rates. In this way, the lightweight decoder with feature pyramid refinement proposed in this paper can aggregate multi-level features from top to bottom and achieve efficient feature capture at all levels. The structure diagram of the compact pyramid refinement module is shown in Fig. 3.

Fig. 3. Compact Pyramid Refinement Module.

Pacific Oyster Gonad Identification and Grayscale Calculation

99

2.2 Switchable Excitation Model (SEM) In the scope recognition stage, the features between the levels are fused under the premise of ensuring the computational rate, and perform the image inversion operation, as shown in the following equation: ⎧ 

⎨ ¬ σ δ 4 (Ck+1 ) , E , k = 5  ↓ r1k = (3) ⎩ ¬ σ δ 2 (Ck+1 ) , E , k ∈ {3, 4} ↑ where ¬ denotes the inverse operation, which is performed on the matrix E performs the inverse operation. The matrix E represents a matrix with all elements 1. δ↓4 and δ↑2 denote down-sampling 4 times and up-sampling 2 times, respectively. By grouping inversion attention, more attention is paid to the local feature information of the target edges, but still lacks attention based on the global scope. In this paper, we propose a switchable excitation module that automatically decides to select and integrate attention operators to compute attention graphs. The Switchable Excitation Model (SEM) proposed in this paper is added before the iterative refinement operation, so that it trains the Sigmoid values of each channel and obtains the corresponding weights for each channel, and finally gives more attention to the channels with larger weights while suppressing the channels with smaller weight values. The structure of the switchable excitation model is shown in Fig. 4. SEM improves the excitation module of the attention model, which consists of two sub-modules, the decision module and the switching module. The feature map of the current network layer is defined as x ∈ RC×H ×W , and the procedure for x calculating the attention value of is as follows: The GAP(·) the global average pooling extracts the global feature information from the squeeze module can be formulated as follows: m = GAP(x)

(4)

where m ∈ RC×1×1 represents the global information embedding, m as the input to the decision module and the switching module. To use the information aggregated in the squeeze operation to determine the importance of different operations, this paper adds to Eq. F(·) that aims to fully capture the decision information from channel dependencies. This paper is designed to use a simple gating mechanism with Sigmoid activation: w = σ (F(m)) = σ (Wd m)

(5)

where F(·) denotes the decision function of the fully connected network, Wd ∈ RN×C denotes the weight of the fully connected network. Based on the decision vector w, in the switching module, define EO to denote a set of excitation operators and set its size to N = 3, , using the fully connected network (FC) [17], the convolutional neural network (CNN) [18] and the instance augmentation (IE) [19] as alternate excitation operators, and proposing the w the computational attentional feature map of v ∈ RC×1×1 to adjust the proportion of each excitation operator in the

100

Y. Chen et al.

switching module and combine the results of each operator in the form of dot product to obtain the final attentional feature map v which is formulated as follows:   v = σ vfc wfc  σ (vcnn wcnn )  σ (vie wie ) (6)

Fig. 4. Switchable Excitation Model.

3 Experiments and Analysis of Results 3.1 Establishment of the Datasets To balance the variability of the growth of Pacific oysters in winter and summer seasons, a total of 300 Pacific oysters of similar shape and individual size were randomly selected in December 2021, June 2022 and February 2023, respectively, under the same culture environment. MRI images were acquired using 7.0T high field strength small animal magnetic resonance imaging system equipment, as shown in Fig. 5. The main technical specifications of the equipment: 7.0T magnet, aperture width of 20 cm, 660 mT/m gradient intensity, 7 groups of high-order uniform field coils, gradient power supply of 500 V/300 A, and the highest image pixel resolution of 10 μm. The main parameter settings: the longitudinal slice length was set to 2 mm, the number of slices per Pacific oyster was 20, and the echo time of transverse (T2) relaxation (TE) was set to 30 ms.

Pacific Oyster Gonad Identification and Grayscale Calculation

101

Fig. 5. Photographs taken with NMR equipment.

A total of 4000 Pacific oyster MRI images were obtained after screening, and the gonadal boundaries of the original images were labeled with labelme software to build the Oyster gonad datasets. The annotated images were randomly divided into training and test sets in the ratio of 7:1, of which 3500 were used for training the segmentation model and 500 were used for the tested model. An example map of partial Oyster gonad datasets annotation is shown in Fig. 6.

Fig. 6. Example of partial Oyster gonad datasets annotation.

102

Y. Chen et al.

3.2 Experimental Environment and Evaluation Index The hardware and software parameters used in this study are configured as shown in Table 1. Table 1. Software and hardware parameters configuration. Software and hardware environment

Configuration

Small animal MRI system

Bruker BioSpec 70/20 USR

Processor

Inter(R) Core(TM) i9-9820X

Graphics Processor

NVIDIA Corporation GP100GL

Graphics processor computing platform

CUDA 10.2, cuDNN 7.4

Compile the program

Pycharm, Anaconda

Frame

Pytorch

Programming Languages

Python 3.6

The input image size of the experiments in this paper is 352 × 352, the epoch size is 50 during training, and the batch size is 8. The specific hyperparameter settings of the algorithm model in this paper are shown in Table 2. The training is performed using Adam optimizer [20], and the whole training process takes about 75 min. Table 2. R-SINet algorithm hyperparameter settings. Parameters

Numerical value

Input size

352 × 352

Learning rate

0.0001

Batch size

8

epoch

50

Number of iterations

3

Optimizer

Adam

The model evaluation metrics include S-measure (Sα) [21], enhanced-matching evaluation metrics E-measure (E) [22], weighted F-measure (ωF) [23] and Mean absolute error (MAE) [24].

Pacific Oyster Gonad Identification and Grayscale Calculation

103

3.3 Ablation Experiments The ablation experiments were tested in the Oyster gonad datasets using the same hyperparameters, and the test results are shown in Table 3. The experimental results prove that the addition of the two models has a positive effect on the results of the algorithm, which performs well in all four evaluation indexes, and shows the best results after incorporating the two models into the overall framework at the same time, which verifies the effectiveness of the two models. Table 3. Impact of two models proposed in this paper on the algorithm.

No. 1 No. 2

SINet_v2 √

CPRM







No. 3



No. 4

SEM

√ √



Sα ↑

E ↑

ωF ↑

MAE ↓

0.866

0.910

0.865

0.037

0.885

0.912

0.866

0.030

0.867

0.923

0.880

0.028

0.889

0.929

0.882

0.020

3.4 Comparative Experiments and Analysis of Results To verify the performance level of the proposed R-SINet, it was tested with the SINet [7], LSR [8], SINet_v2 [11] and SegMaR [12] in the Oyster gonad datasets and the same running environment, respectively, and the quantitative evaluation results of the experimental comparison are shown in Table 4. The values of the proposed R-SINet algorithm are better than the results of the four unapparent object detection segmentation algorithms of the comparison experiments. Table 4. Evaluation results of different algorithms in the four evaluation indexes. Algorithm

Sα ↑

E ↑

ωF ↑

MAE ↓

SINet (2020)

0.663

0.888

0.476

0.068

LSR (2021)

0.658

0.917

0.745

0.056

SINet_v2 (2021)

0.865

0.910

0.863

0.038

SegMaR (2022)

0.653

0.918

0.474

0.059

R-SINet (Ours)

0.889

0.929

0.882

0.020

104

Y. Chen et al.

3.5 Visualization Results Some of the visualized segmentation results are shown in Fig. 7. This figure shows that the contour structure of the segmented gonads of Pacific oysters obtained by the method of this paper is closer to the true value map, and the gonad edge is clearer than the other four algorithms, which confirms the effectiveness of the method.

Fig. 7. Partial visualization of gonadal segmentation results of Pacific oyster compared.

3.6 Gray Value Calculation Based on the method of Pacific Oyster gonad recognition proposed in this paper, the partitioned gonad part is mapped to the original image to obtain the grayscale image of the region. In the form of gray histogram, 50 of the 500 test images are randomly selected and the gray value is calculated. As can be seen from Fig. 8, there are two peaks in the gray histogram of the female and male of the oyster, and the distance between the peaks on the horizontal axis is large, which proves that there are obvious differences in the gray values of the gonads of the female and male oysters. The gray images of small animal NMR can be used to determine the sex of Pacific oysters.

Pacific Oyster Gonad Identification and Grayscale Calculation

105

Fig. 8. Gray histogram curve of male and female oysters.

4 Conclusion In this paper, based on the magnetic resonance imaging of the Pacific oyster, we identified and segmented the Pacific oyster gonads by the method of unapparent object detection, and proposed the R-SINet algorithm to improve the accuracy of oyster gonad segmentation. The model obtains the optimal test results in four evaluation metrics, which proves the effectiveness and robustness of the method. In addition, we intuitively found that there were large differences in gray values between the gonads of female and male oysters by using gray histogram, which provided a new technique and means for the subsequent selection of oyster sex.

References 1. Yang, L.: Introduction to the management of hospital Bruker BioSpec94/30 USR type small animal MRI research equipment. China Equip. Eng. 51–52 (2022) 2. Zhang, Z.-N., Zheng, Y., Wang, X.-M.: Application of 7.0T small animal MRI to study the progress of Alzheimer’s disease. China Med. Imaging Technol. 930–933 (2019) 3. Hang, K.-B., Su, W.-W., Huang, J., Bao, G.-J., Liu, W.-H., Li, S.-P.: 7.0T small animal MR instrumentation to observe brain injury in a rat model of classic pyrexia. China Med. Imaging Technol. 38, 481–485 (2022) 4. Webster, B.: Handbook of small animal MRI. Aust. Veterinary J. 88, 407 (2010) 5. Gilchrist, S., et al.: A simple, open and extensible gating control unit for cardiac and respiratory synchronisation control in small animal MRI and demonstration of its robust performance in steady-state maintained CINE-MRI. Magn. Reson. Imaging 81, 1–9 (2021) 6. Liu, W.-L., et al.: Enhanced medial prefrontal cortex and hippocampal activity improves memory generalization in APP/PS1 Mice: a multimodal animal MRI study. Front. Cell. Neurosci. 16, 848967 (2022) 7. Fan, D.-P., Ji, G.-P., Sun, G.-L., Cheng, M.-M., Shen, J.-B., Shao, L.: Camouflaged object detection. In: CVPR, pp. 2774–2784 (2020) 8. Lv, Y.-Q., et al.: Simultaneously localize, segment and rank the camouflaged objects. In: CVPR, pp. 11591–11601 (2021) 9. Chen, S., Tan, X., Wang, B., Hu, X.: Reverse attention for salient object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 236– 252. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_15

106

Y. Chen et al.

10. Zhai, Q., Li, X., Yang, F., Chen, C.-L.-Z., Cheng, H., Fan, D.-P.: Mutual graph learning for camouflaged object detection. In: CVPR, pp. 12997–13007 (2021) 11. Fan, D.-P., Ji, G.-P., Cheng, M.-M., Shao, L.: Concealed object detection. IEEE Trans. Pattern Anal. Mach. Intell. 44, 6024–6042 (2021) 12. Jia, Q., Yao, S.-L., Liu, Y., Fan, X., Liu, R.-S., Luo, Z.-X.: Segment, magnify and reiterate: detecting camouflaged objects the hard way. In: CVPR, pp. 4713–4722 (2022) 13. Fu, K., Fan, D.-P., Ji, G.-P., Zhao, Q.-J., Shen, J.-B., Zhu C.: Siamese network for RGB-D salient object detection and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2021) 14. Li, Chongyi, Cong, Runmin, Piao, Yongri, Xu, Qianqian, Loy, Chen Change: RGB-D salient object detection with cross-modality modulation and selection. In: Vedaldi, Andrea, Bischof, Horst, Brox, Thomas, Frahm, Jan-Michael. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 225– 241. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_14 15. Gao, S.-H., Cheng, M.-M., Zhao, K., Zhang, X.-Y., Yang, M.-H., Torr, P.: Res2Net: a new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 43, 652–662 (2019) 16. Howard, A.-G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 17. Li, J.-H., Li, B., Xu, J.-Z., Xiong, R.-Q., Gao, W.: Fully connected network-based intra prediction for image coding. IEEE Trans. Image Process. 27, 3236–3247 (2018) 18. Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B.: Recent advances in convolutional neural networks. Pattern Recogn. 77, 354–377 (2018) 19. Liang, S., Huang, Z., Liang, M., Yang, H.: Instance enhancement batch normalization: an adaptive regulator of batch noise. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 4819–4827 (2020) 20. Kingma, D.-P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representation (2015) 21. Fan, D.-P., Cheng, M.-M., Liu, Y., Li, T., Borji, A.: Structure-measure: a new way to evaluate foreground maps. In: ICCV, pp. 4548–4557 (2017) 22. Fan, D.-P., Gong, C., Cao, Y., Ren, B., Cheng, M.-M., Borji, A.: Enhanced alignment measure for binary foreground map evaluation. In: IJCAI (2018) 23. Ran, M., Lihi, Z.-M., Ayellet, T.: How to evaluate foreground maps? In: IEEE CVPR, pp. 248– 255 (2014) 24. Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: CVPR, pp. 733–740 (2012)

Multi-task Self-supervised Few-Shot Detection Guangyong Zhang1,2,3 , Lijuan Duan1,2,3(B) , Wenjian Wang1,2,3 , Zhi Gong1,2,3 , and Bian Ma1,2,3 1

3

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China {zhangguangyong,wangwj,gongzhi97,mabian}@emails.bjut.edu.cn 2 Beijing Key Laboratory of Trusted Computing, Beijing 100124, China National Engineering Laboratory for Critical Technologies of Information Security Classified Protection, Beijing 100124, China [email protected]

Abstract. Few-shot object detection involves detecting novel objects with only a few training samples. But very few samples are difficult to cover the bias of the new class in the deep model. To address the issue, we use self-supervision to expand the coverage of samples to provide more observation angles for new classes. In this paper, we propose a multitask approach that combines self-supervision with few-shot learning to exploit the complementarity of these two domains. Specifically, our selfsupervision as an auxiliary task to improve the detection performance of the main task of few-shot learning. Moreover, in order to make selfsupervision more suitable for few-shot object detection, we introduce the denoising module to expand the positive and negative samples and the team module for precise positioning. The denoising module expands the positive and negative samples and accelerate model convergence using contrastive denoising training methods. The team module utilizes location constraints for precise localization to improve the accuracy of object detection. Our experimental results demonstrate the effectiveness of our method on the Few-shot object detection task on the PASCAL VOC and COCO datasets, achieving promising results. Our results highlight the potential of combining self-supervision with few-shot learning to improve the performance of object detection models in scenarios where annotated data is limited. Keywords: Few-shot object detection End-to-End Detector

1

· Self-supervised learning ·

Introduction

Object detection [8,15,19–22] is a fundamental task in computer vision that involves identifying and localizing objects within an image or video. However, Supported in part by National Natural Science Foundation of China under grant 62176009, in part supported by Ant Group. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 107–119, 2024. https://doi.org/10.1007/978-981-99-8555-5_9

108

G. Zhang et al.

traditional approaches to object detection rely heavily on large amounts of labeled data for training. This labeling process is both time-consuming and resource-intensive, which can be a significant challenge in scenarios where new objects need to be detected with limited training samples, such as rare cases and rare animals. To address this issue, few-shot object detection has emerged as a promising approach that aims to detect novel objects with only a few training examples. Few-shot object detection [7,11,17,23,24,29] has the potential to significantly reduce the amount of labeled data required for training, thereby easing the burden of the labeling process. Despite significant advancements in few-shot object detection concerning the scarcity of labeled samples, effectively addressing the bias towards covering novel classes still poses a challenge in this field. At the same time, self-supervision [9,12] holds substantial promise in effectively mitigating the issue of limited samples within the realm of few-shot object detection, thereby furnishing it with a wider range of informative and varied data. By harnessing the capabilities of self-supervised learning, we can acquire more efficient object representations, consequently enhancing the accuracy and robustness of object detection. In this paper, we propose a novel approach to few-shot object detection by combining self-supervision with DETR variants [2,16,25,33] in a multi-task manner. Our method leverages the self-supervised branch to predict embeddings of a separate self-supervised image encoder on object regions. Similar to the backbone, the self-supervised image encoder learns transformation-invariant embeddings, which are distilled into the detector’s embeddings. While self-supervision is an auxiliary task, our primary goal is to improve the detection performance of few-shot object detection. While self-supervised learning has shown the capability to extract diverse features by leveraging inter-sample correlations, enhancing the model’s ability to generalize in low-data scenarios, further adaptations are needed to optimize self-supervision for few-shot object detection. To address this issue, we introduce a denoising module and a team module similar to Team-DETR [18] in the self-supervised branch. The denoising module employs a contrastive denoising training by adding both positive and negative samples, which helps the model avoid duplicate outputs of the same target samples with the same ground truth. The team module groups queries at the decoder side and provides guiding queries in terms of scale and spatial position. Effectively strengthening the constraint of the self-supervised branch on the model and improves detection performance. The main contributions are summarized as follows: • We combine self-supervised learning with few-shot object detection in a multitask manner. Self-supervision as an auxiliary task mainly improves the performance of the main task of few-shot object detection. • We only introduced a denoising module and a team module in the selfsupervised branch. Improved the constraint ability of the self-supervised branch on the model while makes it more compatible with the main branch and accelerating its convergence speed.

Multi-task Self-supervised Few-Shot Detection

109

• Thorough experimental results show that the proposed self-supervised auxiliary tasks are substantially effective for accurate few-shot object detection. Our method achieves significant performance improvements in both PASCAL VOC [6] and COCO [14] datasets.

2 2.1

Related Work Self-supervised Learning

Self-supervised Learning (SSL) [4,27,31] has shown great potential in generating powerful representations, and has even outperformed supervised methods on challenging vision benchmarks. These learned representations have been shown to transfer well to object detection tasks. Recent research has proposed several methods that combine SSL with DETR, a popular object detection framework. One such method is UP-DETR [5], which pretrains DETR in a self-supervised manner by detecting and reconstructing random patches from the input image. Another method, DETReg [1], incorporates region priors from unsupervised region proposal algorithms to provide weak supervision for pretraining. Our research is focused on leveraging self-supervision as an auxiliary task to improve the detection performance of the main task. We aim to further advance the field of object detection by exploring the potential of self-supervised learning in conjunction with existing methods. 2.2

Few-Shot Object Detection

Recently, there has been extensive research focused on DETR and its variants. However, these methods often suffer from a significant performance drop when applied to few-shot object detection scenarios. To address this limitation, MetaDETR [32] was proposed, which incorporates meta-learning into the DETR framework. This allows for image-level detection by effectively leveraging the correlation among various support classes. Building upon this work, we introduce self-supervised learning to further improve the detection performance. By leveraging self-supervision as an auxiliary task, we aim to improve the performance of object detection, particularly in few-shot scenarios. Our research contributes to the ongoing effort to advance the field of object detection by exploring the potential of combining self-supervised learning and meta-learning with existing methods such as DETR.

3

Methodology

In this section, we first detail the setup for the FSOD problem. Then, we introduce multi-task self-supervised few-shot object detection method, as shown in Fig. 1. Our method takes Meta-DETR as the baseline. We combine selfsupervision with DETR variants in a multi-task manner. Self-supervised branch

110

G. Zhang et al.

aims to predict the embeddings of a separate self-supervised image encoder evaluated on object regions. In order to solve the problem of Meta-DETR convergence speed and improved the constraint ability of the self-supervised branch on the model. We only introduced a denoising module and a team module in the self-supervised branch.

label

Feature Extractor

box emb

Query Features

Query Image

Transformer Noise

Aggregation Module

Feature Extractor

Support Images

positive

Weight Sharing

Positive Noise

Negative Noise

GT

Self-Supervise Module

negative

Few-Shot Detection Results

Fig. 1. Overview of our proposed method. Self-supervised auxiliary tasks are shared with the main task in feature extractor and transformer weights.

3.1

Problem Setting

As in previous work [7,24,34,35], we use the standard problem setting for FSOD in our paper. Specifically, the training data set consists of a base class set Dbase = {xbase , y base } with a large number of samples and a novel class set Dnovel = {xnovel , y novel } with only a few samples, where x and y represent training samples and labels, respectively. The number of samples for each class in the novel class set is K, thus constructing the k-shot problem. The classes in the base class set are C base , the classes in the novelclass set are C novel , and the classes in the two sets are disjoint, that is, C base C novel = ∅. We use an effective way to exploit the training set is to mimic the few-shot learning setting via episode based training. In each training iteration, an episode is formed by randomly selecting C classes from the training set with K labelled samples from each of the C classes to act as the sample set S = {(xi , yi )}m j=1 )(m = K × C), as well as a fraction of the remainder of those C classes samples to serve as the query set Q = {(xj , yj )}nj=1 .This sample/query set split is designed to simulate the support/test set that will be encountered at test time. A model trained from sample/query set can be further fine-tuned using the support set, if desired. In this work we adopt such an episode-based training strategy.

Multi-task Self-supervised Few-Shot Detection

3.2

111

Self-supervised Auxiliary Branch

Self-supervised task include their region localization and embedding components. Keeping the main task unchanged during pre-training, we only include self-supervised auxiliary tasks during fine-tuning. At a high level, we operates by predicting object localizations that match those from an unsupervised region proposal generator, while simultaneously aligning the corresponding feature embeddings with embeddings from backbone(RestNet-101)instead of a self-supervised image encoder like DETReg, see Fig. 1.

lable Feature Extractor

Transformer Noise

bbox

emb

Query Image positive

negative

Fig. 2. Self-supervised Auxiliary Branch

Region localization takes a set of M boxes b_1, ..., b_M output by an unsupervised region proposal method and optimizes a loss that minimizes the difference between the detector box predictions and these M boxes. The loss involves matching the predicted boxes to these M boxes; the box selection policy is Top-K, similar to DETReg. There are three prediction heads: f_box, which outputs predicted bounding boxes; f_cat, which predicts whether a box is object or background; and f_emb, which reconstructs the object embedding descriptor using the backbone. Their outputs are \hat{b}_i = f_box(v_i), \hat{z}_i = f_emb(v_i), and \hat{p}_i = f_cat(v_i). We adopt a pairwise matching loss L_match(y_i, \hat{y}_{\sigma(i)}) to search for a bipartite matching between y and \hat{y} with the lowest cost:

\hat{\sigma} = \arg\min_{\sigma \in \Sigma_N} \sum_{i}^{N} L_{match}(y_i, \hat{y}_{\sigma(i)})    (1)

where \hat{\sigma}(i) denotes the optimal assignment between predictions and targets. Since the matching also needs to consider both classification and localization, the matching loss is formalized as:

L_{match}(y_i, \hat{y}_{\sigma(i)}) = L_{cls}(c_i, \hat{c}_{\sigma(i)}) + L_{box}(b_i, \hat{b}_{\sigma(i)})    (2)

We define the self-supervised loss as:

L_{ssl}(y, \hat{y}) = \lambda_f L_{class}(c_i, \hat{p}_{\hat{\sigma}(i)}) + \lambda_b L_{box}(b_i, \hat{b}_{\hat{\sigma}(i)}) + \lambda_e L_{emb}(z_i, \hat{z}_{\hat{\sigma}(i)})    (3)
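As a rough illustration of the bipartite matching in Eqs. (1)-(2), the sketch below uses the Hungarian algorithm from SciPy; the cost terms are simplified stand-ins (negative class probability plus an L1 box distance), and all names and shapes are assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_boxes, pred_probs, gt_boxes, gt_labels):
    """Find the assignment sigma that minimizes the summed matching cost.
    pred_boxes: (N, 4), pred_probs: (N, num_classes),
    gt_boxes: (M, 4), gt_labels: (M,) integer class ids.
    Returns (pred_indices, gt_indices) of the optimal one-to-one matching."""
    # Classification cost: more confidence on the GT class -> lower cost
    # (a simplified stand-in for L_cls).
    cost_cls = -pred_probs[:, gt_labels]                                     # (N, M)
    # Localization cost: L1 distance between boxes (stand-in for L_box).
    cost_box = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (N, M)
    cost = cost_cls + cost_box
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx
```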

To learn a strong object embedding, we encode each box region b_i via a separate encoder network and obtain embeddings z_i that are used as targets for the embeddings \hat{z}_j (see the blue arrows in Fig. 2). The difference from DETReg is that we do not use SwAV [3] but the backbone to ensure the invariance of the image-transformation embedding. We argue that SwAV gives the model additional guidance beyond guaranteeing invariance. We introduce an additional MLP f_emb that predicts the object embedding \hat{z}_j from the corresponding DETR query embedding \hat{v}_j. The loss is the L1 loss between \hat{z}_j and z_i:

L_{emb}(z_i, \hat{z}_j) = \| z_i - \hat{z}_j \|_1    (4)

where L_class is the class loss, which can be implemented via cross-entropy loss or focal loss, and L_box is based on the L1 loss and the Generalized Intersection over Union (GIoU) loss.

The Denoising Module, using Contrastive DeNoising (CDN), enables the self-supervised branch to accelerate convergence. We generate two types of CDN queries: positive queries and negative queries. Each CDN group has a set of positive queries and negative queries. The reconstruction losses are L1 and GIoU losses for box regression and focal loss for classification. The loss to classify negative samples as background is also a focal loss. We denote this loss as L_dn.

The Team Module improves queries on the decoder side. The original queries are divided into groups, each responsible for predicting objects within a specific scale range. \hat{B}_i is the prediction box of the i-th query q_i. When the distance between the center point of \hat{B}_i and \hat{A}_i exceeds the threshold η, a penalty is imposed on \hat{B}_i. σ is the number of boxes to be penalized. The loss function is expressed as:

L_{pos} = \frac{1}{\sigma} \sum_{i=0}^{N-1} \| \hat{B}_i^{\{x,y\}} - A_i^{\{x,y\}} \|_2 \cdot \mathbb{1}_{\{\| \hat{B}_i^{\{x,y\}} - A_i^{\{x,y\}} \|_2 > \eta\}}    (5)
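The following is a small PyTorch-style sketch of the team-module penalty in Eq. (5); the default threshold value and the tensor layout are assumptions.

```python
import torch

def team_penalty(pred_centers, anchor_centers, eta=0.5):
    """Eq. (5) sketch: penalize queries whose predicted box center drifts farther
    than eta from its assigned group/anchor center A_i.
    pred_centers, anchor_centers: (N, 2) tensors of (x, y) coordinates."""
    dist = torch.norm(pred_centers - anchor_centers, dim=-1)   # ||B_i - A_i||_2
    penalized = dist > eta                                     # indicator 1{dist > eta}
    num_penalized = penalized.sum().clamp(min=1)               # sigma in Eq. (5)
    return (dist * penalized).sum() / num_penalized
```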

3.3 Multi-Task Learning

Multi-task learning trains the tasks together to overcome the shortage of annotated data. It provides each task with an inductive bias that triggers a regularization effect between one another. We combine Meta-DETR with a self-supervised branch in a multi-task manner (see Fig. 1). Here we briefly introduce the loss function of the Meta-DETR main task. Similar to Deformable DETR, L_meta(y, \hat{y}) is applied to every layer of the transformer decoder:

L_{meta}(y, \hat{y}) = \sum_{i=1}^{N} [ L_{cls}(c_i, \hat{c}_{\sigma(i)}) + L_{box}(b_i, \hat{b}_{\sigma(i)}) ]    (6)

The total loss in the training stage is:

L_{total} = L_{meta} + \lambda_{ssl} (L_{ssl} + L_{dn} + L_{pos})    (7)

Experiments show that the best results are obtained when the value of λssl is 0.17, which indicates that the self-supervised branch only plays an adjustment role for the main task.
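As a minimal sketch of how the weighted losses of Eqs. (3) and (7) might be combined in code (the individual terms are assumed to be computed elsewhere; only the value 0.17 for λssl comes from the text above, and the remaining default weights are placeholders):

```python
def self_supervised_loss(l_class, l_box, l_emb, lf=1.0, lb=1.0, le=1.0):
    """Eq. (3): weighted sum of the class, box and embedding terms.
    The weights lf, lb, le are hyperparameters (placeholder values here)."""
    return lf * l_class + lb * l_box + le * l_emb

def total_loss(l_meta, l_ssl, l_dn, l_pos, lambda_ssl=0.17):
    """Eq. (7): the auxiliary losses are scaled by lambda_ssl (0.17 gave the best
    results in the ablation), so the self-supervised branch only adjusts the main task."""
    return l_meta + lambda_ssl * (l_ssl + l_dn + l_pos)
```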

4 Experiments

In this section, we first describe the experimental details and then perform extensive experiments on the PASCAL VOC [6] and COCO [14] benchmarks. For fairness, we strictly adhere to the construction and evaluation protocol for FSOD data. Finally, we provide ablation analysis and visualizations.

4.1 Implementation Details

Our method is based on Meta-DETR [17], which uses Deformable DETR [22] with ResNet-101, and we strictly keep the parameters of Meta-DETR unchanged in all our experiments. We use the same data split as [24] to evaluate our method for fair comparison. When using the PASCAL VOC and COCO datasets for evaluation, the latent knowledge is set to 64 × 256 and 128 × 256, namely N=64, m=256 and N=128, m=256, respectively. All our experiments are run on one 3090 GPU, using the AdamW optimizer with an initial learning rate of 2 × 10^-4, a weight decay of 1 × 10^-4, and a batch size of 32. In the base training stage, we train the model for 50 epochs for both PASCAL VOC and MS COCO. The learning rate is decayed by a factor of 0.1 at the 45th epoch. In the few-shot fine-tuning stage, the same settings are applied to fine-tune the model until convergence.
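A hedged sketch of the optimizer and schedule just described, assuming a PyTorch model; the helper name and the single parameter group are illustrative only, not the authors' training script.

```python
import torch

def build_optimizer_and_scheduler(model, epochs=50, decay_epoch=45):
    """AdamW with lr 2e-4 and weight decay 1e-4, with a 10x lr drop near the end
    of base training, as in the implementation details above."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[decay_epoch], gamma=0.1)
    return optimizer, scheduler
```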

4.2 Few-Shot Object Detection Benchmarks

Results on PASCAL VOC. There are 20 classes in the PASCAL VOC dataset, divided into a base class set with 15 classes and a novel class set with 5 classes. As in existing work, we use three different splits of the base and novel class sets. Each class in the base class set has a large number of samples, while each class in the novel class set has only k samples; in the experiments k = 1, 2, 3, 5, 10, and these k samples are randomly selected from the class. In the second stage, we extract k samples for each base class in the base class set and fine-tune the model together with the novel class set. The experimental results are shown in Table 1. As shown in Table 1, our method improves accuracy over existing works when samples are scarce. This shows that self-supervision can compensate for the sample deficiency in few-shot object detection and improve detection performance.

Results on COCO. The COCO dataset is a more challenging object detection dataset, which contains 80 classes, including the 20 classes of PASCAL VOC. We adopt the 20 shared classes as novel classes and the remaining 60 classes as base classes. Following the baseline, we choose k = 10, 30 for comparative experiments. We use train2017 for training and perform evaluations on val2017. Standard MS COCO evaluation metrics are adopted. Results are averaged over 5 randomly sampled support datasets. The experimental results are shown in Table 2.


Table 1. FSOD performance (novel AP50 (%)) on three splits of the PASCAL VOC dataset. * indicates the result of the baseline, which is the first model to use Deformable DETR for few-shot detection.

Method/Shot       | Class Split 1 (1 / 2 / 3 / 5 / 10)  | Class Split 2 (1 / 2 / 3 / 5 / 10)  | Class Split 3 (1 / 2 / 3 / 5 / 10)
Meta-YOLO [11]    | 14.8 / 15.5 / 26.7 / 33.9 / 47.2    | 15.7 / 15.2 / 22.7 / 30.1 / 40.5    | 21.3 / 25.6 / 28.4 / 42.8 / 45.9
MetaDet [26]      | 18.9 / 20.6 / 30.2 / 36.8 / 49.6    | 21.8 / 23.1 / 27.8 / 31.7 / 43.0    | 20.6 / 23.9 / 29.4 / 43.9 / 44.1
Meta R-CNN [30]   | 19.9 / 25.5 / 35.0 / 45.7 / 51.5    | 10.4 / 19.4 / 29.6 / 34.8 / 45.4    | 14.3 / 18.2 / 27.5 / 41.2 / 48.1
TFA w/fc [24]     | 36.8 / 29.1 / 43.6 / 55.7 / 57.0    | 18.2 / 29.0 / 33.4 / 35.5 / 39.0    | 27.7 / 33.6 / 42.5 / 48.7 / 50.2
TFA w/cos [24]    | 39.8 / 36.1 / 44.7 / 55.7 / 56.0    | 23.5 / 26.9 / 34.1 / 35.1 / 39.1    | 30.8 / 34.8 / 42.8 / 49.5 / 49.8
MPSR [28]         | 41.7 / 43.1 / 51.4 / 55.2 / 61.8    | 24.4 / – / 39.2 / 39.9 / 47.8       | 35.6 / – / 42.3 / 48.0 / 49.7
SRR-FSD [35]      | 47.8 / 50.5 / 51.3 / 55.2 / 56.8    | 32.5 / 35.3 / 39.1 / 40.8 / 43.8    | 40.1 / 41.5 / 44.3 / 46.9 / 46.4
FSCE [23]         | 44.2 / 43.8 / 51.4 / 61.9 / 63.4    | 27.3 / 29.5 / 43.5 / 44.2 / 50.2    | 37.2 / 41.9 / 47.5 / 54.6 / 58.5
Meta-DETR* [32]   | 40.6 / 51.4 / 58.0 / 59.2 / 63.6    | 37.0 / 36.6 / 43.7 / 49.1 / 54.6    | 41.6 / 45.9 / 52.7 / 58.9 / 60.6
Ours              | 42.1 / 52.2 / 57.7 / 59.5 / 64.3    | 37.3 / 37.1 / 43.1 / 49.5 / 55.1    | 42.4 / 46.0 / 54.3 / 58.3 / 59.8

Table 2. FSOD performance (novel AP (%)) on the COCO dataset. * indicates the result of the baseline, which is the first model to use Deformable DETR for few-shot detection.

Method/Shot        | 10-shot: AP / AP50 / AP75 | 30-shot: AP / AP50 / AP75
FRCN-ft-full [30]  | 5.5 / 10.0 / 5.5          | 7.4 / 13.1 / 7.4
Meta-YOLO [11]     | 5.6 / 12.3 / 4.6          | 9.1 / 19.0 / 7.6
Meta R-CNN [30]    | 8.7 / 19.1 / 6.6          | 12.4 / 25.3 / 10.8
SRR-FSD [35]       | 11.3 / 23.0 / 9.8         | 14.7 / 29.2 / 13.5
CME [13]           | 15.4 / 24.6 / 16.4        | 16.9 / 28.0 / 17.8
DCNet [10]         | 12.8 / 23.4 / 11.2        | 18.6 / 32.6 / 17.5
FSCE [23]          | 11.1 / – / 9.8            | 15.3 / – / 14.2
Meta-DETR* [32]    | 19.0 / 30.5 / 19.7        | 22.2 / 35.0 / 22.8
Ours               | 19.3 / 31.3 / 19.8        | 22.3 / 36.0 / 23.0

The COCO dataset has many more categories than the VOC dataset, but Table 2 shows that our method is still effective in improving detection accuracy, and it performs especially well compared with other region-based methods under the AP50 metric.

4.3 Ablation Analysis

In this section, we separately explore the influence of the self-supervised module and the denoising module on the experimental results. All ablation experiments in this section, like most existing works, are based on Novel Set 1 of PASCAL VOC.


The Effect of the Self-supervised Module. We conduct ablation experiments to validate the effectiveness of the self-supervised module, without the denoising module and the team module.

Table 3. The effect of the self-supervised module.

Model/Shot                 | 1    | 2    | 3    | 5    | 10
Baseline                   | 40.6 | 51.4 | 58.0 | 59.2 | 63.6
+ self-supervised module   | 41.7 | 51.9 | 57.0 | 59.4 | 63.9
Ours                       | 42.1 | 52.2 | 57.2 | 59.5 | 64.3

Table 3 shows the comparison results of the experiments. Taking 1-shot and 2-shot as examples, the accuracy is improved by 1.1% and 0.5% when using the self-supervised module compared to not using it. This shows that the self-supervised module is effective in improving the performance of the model.

The Weight of the Self-supervised Branch. In our method, the weight of the self-supervised branch is a very important hyperparameter. When the weight is small, the self-supervised branch is less effective. With a large weight, the self-supervised branch has an excessive impact on the model, making it focus more on the accuracy of the auxiliary tasks and hurting few-shot object detection performance.

Table 4. The weight of the self-supervised branch. Only the weight of the self-supervised branch differs here; all other settings are exactly the same.

λssl/Shot | 1    | 2    | 3    | 5    | 10
0.1       | 40.0 | 52.0 | 56.8 | 58.6 | 63.9
0.17      | 42.1 | 52.2 | 57.2 | 59.5 | 64.3
0.3       | 38.2 | 52.1 | 56.5 | 58.9 | 64.3
0.5       | 41.9 | 51.2 | 55.8 | 58.0 | 63.9
0.7       | 42.1 | 49.7 | 56.4 | 58.1 | 62.5

Table 4 shows the impact of different weights of the self-supervised branch on model performance. We can see that when the weight of the self-supervised branch is 0.17, the overall accuracy is the best.


Fig. 3. Visualization of our 10-shot object detection on the boat class of the COCO dataset.

4.4 Visualization

As shown in Fig. 3, we take the 10-shot setting on the COCO dataset as an example and visualize detections of the boat class from the novel class set. In Fig. 4, we visualize the results for the 1-shot setting of split 1 of the VOC dataset. “bus”, “cow”, “bird”, and “motorbike” are categories in the novel class set, and the rest are in the base class set. We can see that our method can detect more objects from the novel classes.

Fig. 4. Visualization of Our Model on the Novel Class Set: “Bus”, “Cow”, “Bird” and “Motorbike”.

5 Conclusion

In this paper, we propose a multi-task-based self-supervised few-shot object detection model. We combine self-supervision with DETR in a multi-task manner for few-shot object detection. The self-supervised branch is used as an auxiliary task, and few-shot object detection is the main task; we mainly focus on the detection performance of the main task. Our motivation is to use self-supervision to compensate for the limited number of samples in few-shot object detection. At the same time, we introduce a denoising module and a team module to make the framework more compatible with few-shot object detection. Experimental results show that our model achieves better performance compared to other networks. We hope that our proposed method can be helpful for improving the accuracy of DETR as an end-to-end object detector in few-shot object detection.

References 1. Bar, A., et al.: Detreg: unsupervised pretraining with region priors for object detection. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022) 2. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: Endto-End object detection with transformers, pp. 213–229 (2020) 3. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. Le Centre pour la Communication Scientifique Directe - HAL - Universit´e Paris Descartes (2020) 4. Chen, T., Kornblith, S., Norouzi, M., Hinton, G. A simple framework for contrastive learning of visual representations. Cornell University (2020) 5. Dai, Z., Cai, B., Lin, Y., Chen, J.. Up-detr: unsupervised pre-training for object detection with transformers. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021) 6. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010) 7. Fan, Q., Zhuo, W., Tang, C.K., Tai, Y.W.: Few-shot object detection with attention-rpn and multi-relation detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4013–4022 (2020) 8. Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) 9. Grill, J.-B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. Le Centre pour la Communication Scientifique Directe - HAL - Diderot (2020) 10. Hu, H., Bai, S., Li, A., Cui, J., Wang, L.: Dense relation distillation with contextaware aggregation for few-shot object detection. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021) 11. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T. Few-shot object detection via feature reweighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8420–8429 (2019) 12. Kolesnikov, A., Zhai, X., Beyer, L.: Revisiting self-supervised visual representation learning. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)


13. Li, B., Yang, B., Liu, C., Liu, F., Ji, R., Ye, Q.: Beyond max-margin: class margin equilibrium for few-shot object detection. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021) 14. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 15. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0 2 16. Meng, D., et al.: Conditional DETR for fast training convergence. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021) 17. Qiao, L., Zhao, Y., Li, Z., Qiu, X., Wu, J., Zhang, C.: Defrcn: decoupled faster rcnn for few-shot object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8681–8690 (2021) 18. Qiu, T., Zhou, L., Xu, W., Cheng, L., Feng, Z., Song, M.: Team-DETR: guide queries as a professional team in detection transformers (2023) 19. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 20. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) 21. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018) 22. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015) 23. Sun, B., Li, B., Cai, S., Yuan, Y., Zhang, C.: FSCE: few-shot object detection via contrastive proposal encoding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7352–7362 (2021) 24. Wang, X., Huang, T.E., Darrell, T., Gonzalez, J.E., Yu, F.: Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957 (2020) 25. Wang, Y., Zhang, X., Yang, T., Sun, J.: Anchor detr: query design for transformerbased detector. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2567–2575 (2022) 26. Wang, Y. X., Ramanan, D., Hebert, M.: Meta-learning to detect rare objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9925–9934 (2019) 27. Wei, F., Gao, Y., Wu, Z., Qiu, J., Lin, S.: Aligning pretraining for detection via object-level contrastive learning. Cornell University (2021) 28. Wu, J., Liu, S., Huang, D., Wang, Y.: Multi-scale positive sample refinement for few-shot object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 456–472. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58517-4 27 29. Xiao, Y., Marlet, R.: IEEE Trans. Pattern Anal. Mach. Intell. (2022) 30. Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., Lin, L.: Meta r-cnn: towards general solver for instance-level low-shot learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9577–9586 (2019) 31. Yang, C., Wu, Z., Zhou, B., Lin, S.: Instance localization for self-supervised detection pretraining. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)


32. Zhang, G., Luo, Z., Cui, K., Lu, S., Xing, E.P.: Meta-DETR: image-level fewshot detection with inter-class correlation exploitation. IEEE Trans. Pattern Anal. Mach. Intell. 45, 12832–12843 (2022) 33. Zhang, H., et al.: Dino: Detr with improved denoising anchor boxes for end-to-end object detection (2022) 34. Zhang, W., Wang, Y.-X.: Hallucination improves few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13008–13017 (2021) 35. Zhu, C., Chen, F., Ahmed, U., Shen, Z., Savvides, M.: Semantic relation reasoning for shot-stable few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8782–8791 (2021)

CSTrack: A Comprehensive and Concise Vision Transformer Tracker

Yao Chen1, Shuyan Ding2(B), Jianhui Guo1, Chen Yang1, and Lunbo Li1

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
{chenyao1019,guojianhui,kogenta,lunboli}@njust.edu.cn
2 School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
[email protected]

Abstract. The attention mechanism has been widely applied in visual tracking tasks due to its remarkable ability to capture global dependencies. However, there are two issues in previous attention-based methods: redundancy of template information and inappropriate utilization of information streams. In this work, we propose a spatial positioning attention mechanism that addresses these issues by selective template feature enhancement and elimination of redundant information streams, respectively, significantly improving tracking accuracy and speed. Furthermore, previous trackers fail to focus on channels containing crucial target information within the template features and search region features. To tackle this, we introduce a channel focus attention mechanism to perform channel weight rescaling, which allows the tracker to concentrate on those target-related channels, improving its localization capability. Extensive experiments on four well-known datasets, GOT-10k, LaSOT, TrackingNet, and TNL2K, show that CSTrack outperforms all previous state-of-the-art trackers, running at over 70 FPS. Keywords: Visual tracking

· Vision Transformer · One-stream tracker

1 Introduction

Visual object tracking (VOT) is a fundamental task in computer vision, which aims to estimate the position of the target in the subsequent frames based only on its position in the first frame. Due to its balance of accuracy and speed, the Siamese framework has become the dominant tracking framework, represented by [1,16,25,29]. However, traditional Siamese-based trackers suffer from two notable drawbacks. First, the fusion of template features and search region features in traditional Siamese trackers relies on CNN-based methods, which inevitably confront the long-range dependency dilemma. This limitation hinders their ability to capture the necessary contextual information for accurate tracking. Second, two-stream trackers separate the processes of feature extraction and feature fusion, resulting in information loss and impacting the overall tracking

performance. To address these issues, recent works [5,9,11,24,30,31] introduce attention mechanisms to VOT tasks, effectively getting rid of the long-range dependency dilemma and bringing about significant improvements in tracking performance. Furthermore, one-stream trackers like SimTrack [4] are proposed to mitigate information loss. These trackers perform feature extraction and feature fusion of template features and search region features simultaneously, thereby enhancing their ability to retain critical information for accurate tracking.

In VOT tasks, the template encompasses crucial information about the target, like its size and color. However, previous trackers blindly enhance template features, resulting in template information redundancy that hinders target identification and localization while also impacting tracking speed. Additionally, using the information stream from the search region to the template introduces useless background information from the search region into the template. This degrades the quality of template features and hampers the localization of the tracking target within the search region. As a result, we propose a spatial positioning attention mechanism to tackle the aforementioned issues. Concretely, we design the Gate module to control the template feature enhancement. By selectively applying template feature enhancement at specific feature fusion layers, we reduce template information redundancy and speed up tracking. Simultaneously, we remove the information stream from the search region to the template. This prevents interference from irrelevant background information in the search region, ultimately improving the localization capability of our tracker.

Moreover, as highlighted in SiamRPN++ [16], different feature channels represent distinct information. However, previous trackers treat all feature channels equally, which limits their ability to effectively identify and localize the target. In this work, we introduce a channel focus attention mechanism to perform feature channel weight rescaling, which enables the tracker to concentrate on target information-rich feature channels, improving tracking performance. Furthermore, we propose a one-stream tracker named CSTrack, which excels in handling complex tracking scenarios like background clutter.

Our contribution can be summarized as follows.
(1) We propose a spatial positioning attention mechanism (SPA) to address the template feature information redundancy and the inappropriate use of information stream, improving the localization capability and tracking speed.
(2) We introduce a channel focus attention mechanism (CFA) that allows our tracker to concentrate on channels that contain critical target information.
(3) We design a one-stream tracker CSTrack, which incorporates both spatial and channel dimensions to locate the tracking target, significantly improving the tracking accuracy while maintaining a high frame rate of 70 FPS.
(4) Extensive experimental results on well-known datasets including GOT-10k, LaSOT, TNL2K, and TrackingNet demonstrate that CSTrack surpasses all previous state-of-the-art trackers.

2 Related Work

Fig. 1. The architecture of our tracker CSTrack.

Previous Trackers. The preliminary Siamese-based trackers are two-stream trackers, in which the feature extraction and feature fusion of the template and

search region are divided into two steps. SiamFC [1] is the first to apply the Siamese framework to VOT tasks, laying the foundation for subsequent research, such as [2,6,10,16,25,29,33]. Due to the short-range property of CNNs, some works, like [5,9,15,22,24,26,30,31], introduce attention mechanisms into VOT tasks, breaking the long-range dependency dilemma. However, in these trackers, the template and search region are fed into the feature fusion module after feature extraction, leading to information loss and limiting tracking performance. To address the limitations of two-stream trackers, Chen et al. propose a one-stream tracker, SimTrack [4], which integrates feature extraction and feature fusion into a single module, significantly improving tracking performance. However, previous trackers blindly enhance template features and use an inappropriate information stream, which restricts the localization capability.

Channel Attention Mechanism. Distinct feature channels represent different types of information, such as category and color. Consequently, it is crucial to assign varying levels of importance to these channels. In light of this, SENet [13] conducts feature channel weight rescaling through Squeeze (S) and Excitation (E) operations. SKNet [17] integrates channel information at different scales, taking into account the importance of each channel in a comprehensive manner. However, previous trackers treat all feature channels equally and cannot concentrate on the channels that are relevant to the target. To overcome this


limitation, we employ a channel focus attention mechanism that realizes channel weight rescaling, which makes the tracker able to focus on the important feature channels, improving the anti-interference capability.

3 Method

3.1 Overview

As depicted in Fig. 1, our CSTrack adopts the prevailing one-stream architecture, where the feature extraction and fusion of the template image and the search region image are performed simultaneously. During tracking, the template image t ∈ R^{H_t × W_t × 3} and the search region image s ∈ R^{H_s × W_s × 3} are split into patches of size P × P and flattened, obtaining the patch sequences t_p ∈ R^{N_t × (P^2 · 3)} and s_p ∈ R^{N_s × (P^2 · 3)}, where N_t = H_t W_t / P^2 and N_s = H_s W_s / P^2. Next, the sequences t_p and s_p are sent into a linear projection layer to obtain the template feature embedding E_t ∈ R^{N_t × d} and the search image feature embedding E_s ∈ R^{N_s × d}, where d denotes the feature embedding dimension. The feature tokens E_t and E_s are concatenated into a feature sequence with a length of N_t + N_s, which is subsequently fed into the CSBlock module for feature integration to produce the target localization feature E_p ∈ R^{N_s × d}. Finally, the feature E_p is sent into the prediction head to obtain the predicted position and size of the target. In brief, the tracking process can be formulated as follows:

[t_p ; s_p] = SplitAndConcat(t, s),
[E_t ; E_s] = LinearProjection([t_p ; s_p]),
E_p = CSBlock([E_t ; E_s]),
[x, y, w, h] = PredictionHead(E_p),    (1)

where [ ; ] denotes the feature concatenation operation.
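The following PyTorch sketch mirrors the one-stream pipeline of Eq. (1) under several assumptions: the linear projection is realized as a strided convolution over patches, and a stock TransformerEncoderLayer stands in for the CSBlock stack; the class name and defaults are hypothetical.

```python
import torch
import torch.nn as nn

class OneStreamSketch(nn.Module):
    """Minimal sketch of Eq. (1): patchify template and search images, embed them
    with a shared linear projection, and fuse the concatenated token sequence."""
    def __init__(self, patch=16, dim=768, depth=12):
        super().__init__()
        # A Conv2d with kernel = stride = patch size acts as the linear projection
        # applied to flattened P x P x 3 patches.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
            for _ in range(depth)
        )  # stand-in for the CSBlock stack

    def forward(self, template, search):
        et = self.proj(template).flatten(2).transpose(1, 2)   # (B, Nt, d)
        es = self.proj(search).flatten(2).transpose(1, 2)     # (B, Ns, d)
        tokens = torch.cat([et, es], dim=1)                   # length Nt + Ns
        for blk in self.blocks:
            tokens = blk(tokens)
        return tokens[:, et.shape[1]:]                        # search tokens -> prediction head

tracker = OneStreamSketch()
out = tracker(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 256, 768]): 256 search-region tokens of dimension 768
```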

3.2 CSBlock

As illustrated in Fig. 2, the CSBlock module comprises three key components: spatial positioning attention (SPA), channel focus attention (CFA), and an MLP module. The SPA module is applied to capture the spatial global feature dependencies and obtain the target information. The CFA module conducts feature channel weight rescaling to highlight channels containing critical target-related information. Lastly, the MLP module is utilized to enhance the representation capability of both template features and search region features. The operation of the i-th layer of the CSBlock module can be expressed as follows:

[E_t^{i'} ; E_s^{i'}] = [E_t^{i} ; E_s^{i}] + CFA(LN([E_t^{i} ; E_s^{i}])),
[E_t^{i''} ; E_s^{i''}] = [E_t^{i'} ; E_s^{i'}] + SPA(LN([E_t^{i'} ; E_s^{i'}])),
[E_t^{i+1} ; E_s^{i+1}] = [E_t^{i''} ; E_s^{i''}] + MLP(LN([E_t^{i''} ; E_s^{i''}])),    (2)

where LN denotes the layer normalization operation.
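A minimal PyTorch-style sketch of one CSBlock layer as written in Eq. (2); the CFA and SPA modules are assumed to be given as callables, and the MLP expansion ratio is an assumption.

```python
import torch.nn as nn

class CSBlockSketch(nn.Module):
    """One layer of Eq. (2): channel attention, then spatial positioning attention,
    then an MLP, each with a pre-norm residual connection."""
    def __init__(self, dim, cfa, spa):
        super().__init__()
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.cfa, self.spa = cfa, spa   # CFA / SPA modules are assumed to be provided
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens, layer_idx):
        # tokens = [E_t ; E_s] concatenated along the sequence dimension.
        tokens = tokens + self.cfa(self.norm1(tokens))
        # SPA is assumed to enhance the template only on odd layers (the Gate in
        # Eq. (3)) and to never let search-region information flow back into it.
        tokens = tokens + self.spa(self.norm2(tokens), layer_idx)
        tokens = tokens + self.mlp(self.norm3(tokens))
        return tokens
```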

Fig. 2. The architecture of our CSBlock.

Spatial Positioning Attention. In VOT tasks, the attention mechanism plays a vital role in template feature enhancement, search region feature enhancement, and template-search region feature fusion. These tasks involve four distinct information streams: template feature enhancement (t → t), search region feature enhancement (s → s), and the fusion of template features and search region features (t → s, s → t). While previous trackers using four information streams achieve significant improvements in tracking performance, they also exhibit two notable drawbacks. Firstly, these trackers perform template feature enhancement with the information stream t → t at each layer. However, experimental results indicate that excessive template feature enhancement leads to template information redundancy, thereby degrading their quality. Since template information serves as the only ground truth in tracking, its quality directly influences tracking performance. Moreover, excessive template feature enhancement also slows down the tracking speed. Secondly, due to the presence of massive background information in the search region, the information stream s → t introduces unwanted background information into the template features, blurring the target information and hindering accurate tracking. As displayed in Fig. 2, we propose the spatial positioning attention mechanism to address the aforementioned issues. Our approach differs from previous


methods in that we selectively perform template feature enhancement (t → t) only at specific feature layers. Through experiments and analysis, we find that template features in layer 0 directly represent the target information. Therefore, we only perform template feature enhancement in odd layers to preserve the integrity of the layer-0 template features, prevent redundancy of template information, and improve both tracking accuracy and speed. The Gate module is employed to control the template feature enhancement process. Furthermore, we remove the information stream s → t. This modification aims to reduce the interference caused by background information in the search region. In other words, the template features only carry out feature enhancement operations.

The execution process of the SPA module is illustrated in Fig. 2. Initially, the template features E_t and the search region features E_s undergo a linear projection layer, obtaining the corresponding feature tokens q, k, and v. For the template features, the feature tokens k_t and v_t are transmitted to the search region branch for information fusion (t → s). Subsequently, the Gate module determines whether template feature enhancement (t → t) should be performed based on specific conditions. As for the search region features, the feature tokens k_s and v_s are concatenated with k_t and v_t, respectively, to conduct search region feature enhancement (s → s) and information fusion (t → s). Ultimately, the template features E_t' and the search region features E_s' are concatenated and passed to the subsequent module. In summary, the execution process of the SPA module of the i-th CSBlock is defined as follows (i starts from 0):

q_t, k_t, v_t = LinearProjection(E_t)
q_s, k_s, v_s = LinearProjection(E_s)
Gate(E_t, i) = Attention(q_t, k_t, v_t)  if i is odd;  E_t  if i is even
E_t' = Gate(E_t, i)
E_s' = Attention(q_s, [k_t ; k_s], [v_t ; v_s])    (3)

Channel Focus Attention. As highlighted in SiamRPN++ [16], different feature channels represent distinct target information, such as category and shape. Effectively utilizing the channel information in template and search region features can improve the localization capability of the tracker. In contrast to previous trackers that ignore feature channel information, inspired by SENet [13], we propose a channel focus attention to realize channel weight rescaling, focusing on the channels that represent the target information. Specifically, as depicted in Fig. 2, the features undergo a pooling operation along the spatial dimension. Subsequently, the weight of each channel is obtained through a sigmoid activation. The input features then receive their channel attention assignment in the weight rescaling module. Owing to the CFA module, critical channels can be focused on and tracking performance is significantly improved.
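As an illustration of the channel focus attention described above, the SE-style sketch below pools over the token dimension, produces sigmoid channel weights, and rescales the input; the reduction ratio and the module name are assumptions rather than the paper's exact design.

```python
import torch.nn as nn

class ChannelFocusAttentionSketch(nn.Module):
    """Pool features over the spatial/token dimension, squeeze-and-excite the
    channel dimension, and rescale the input channels by the learned weights."""
    def __init__(self, dim, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, tokens):                   # tokens: (B, N, d)
        pooled = tokens.mean(dim=1)              # (B, d): pooling along the token axis
        weights = self.fc(pooled).unsqueeze(1)   # (B, 1, d) channel weights in (0, 1)
        return tokens * weights                  # weight rescaling
```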

3.3 Prediction Head and Loss

We employ the prediction head module to estimate the position and size of the target. The feature token E_p is transformed into a 2D feature map and passed through three separate convolution branches, which are responsible for center classification, size regression, and offset regression, respectively. Specifically, the center classification branch estimates the position of the center of the tracking target, the size regression branch predicts the width and height of the target, and the offset regression branch compensates for discretization errors. Finally, the outputs of these branches are combined to derive the final predicted position and size of the target. In the training stage, we employ the weighted focal loss to supervise the classification branch, while the L1 loss and the generalized IoU loss [23] are used for the two regression branches, respectively. In short, the loss function is formulated as follows:

Loss = λ_center L_focal + λ_iou L_iou + λ_L1 L_1    (4)

where λ_center, λ_iou and λ_L1 are trade-off weights to balance joint optimization.
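A hedged sketch of a prediction head with the three convolution branches described above; the channel widths and the token-to-map reshape are assumptions, not the paper's exact head.

```python
import torch.nn as nn

class PredictionHeadSketch(nn.Module):
    """Three convolution branches over the reshaped search-region feature map:
    a center classification map, a size (w, h) map and an offset map."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(nn.Conv2d(dim, hidden, 3, padding=1), nn.ReLU(inplace=True),
                                 nn.Conv2d(hidden, out_ch, 1))
        self.center = branch(1)   # where the target center lies
        self.size = branch(2)     # predicted width and height
        self.offset = branch(2)   # sub-pixel offset to undo discretization error

    def forward(self, search_tokens, map_size=16):
        b, n, d = search_tokens.shape            # n == map_size * map_size assumed
        fmap = search_tokens.transpose(1, 2).reshape(b, d, map_size, map_size)
        return self.center(fmap), self.size(fmap), self.offset(fmap)
```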

4 Experiment

4.1 Implementation Details

Model. In our CSTrack, the size of the template image is 128 × 128 and the size of the search region image is 256 × 256. ViT-Base [7] is adopted as the backbone network, which has 12 layers. We initialize the network parameters with the MAE [12] pre-trained model in order to converge quickly. The patch size P × P is 16 × 16, and the feature embedding dimension d is 768.

Training. The training sets of the GOT-10k [14], LaSOT [8], TrackingNet [21], and COCO [18] datasets are used for model training. For the GOT-10k test set, we follow the default protocol and train the model using only the GOT-10k training set. Horizontal flip and brightness jittering are employed for data augmentation. We train our tracker using 2 RTX 3090 GPUs, with each GPU hosting 32 image pairs. The whole tracker is optimized with the AdamW [19] optimizer with the weight decay set to 10^-4. The initial learning rate of the backbone network is 4 × 10^-5 and that of the remaining parameters is 4 × 10^-4. The whole training process consists of 300 epochs with 60k image pairs per epoch. The learning rate decays by a factor of 10 after 240 epochs. For the GOT-10k dataset, we train for only 100 epochs, with the learning rate decaying by a factor of 10 after 80 epochs.

4.2 Comparisons with the State-of-the-Art Trackers

GOT-10k. GOT-10k [14] is a large-scale dataset including more than 10,000 video sequences, with 180 test sequences. For the GOT-10k test set, we strictly follow the default protocol to train our model using only the GOT-10k training


set. As illustrated in Table 1, owing to the selective template enhancement strategy, the tracking performance of our CSTrack significantly outperforms that of SimTrack [4] (AO: 1.0%, SR0.5: 1.1%, and SR0.75: 1.7%).

TrackingNet. TrackingNet [21] is a large-scale short-term dataset containing a large number of video sequences in the wild, of which 511 are test video sequences. As shown in Table 1, CSTrack achieves AUC (82.8%), P_Norm (87.4%), and P (81.2%), significantly exceeding the tracker CSWinTT [24] that utilizes the information stream s → t, proving the effectiveness of our method.

Table 1. State-of-the-art comparisons on TrackingNet [21], TNL2K [27], GOT-10k [14] and LaSOT [8]. The best two results are shown in red and green fonts. Tracker

Source

GOT-10k [14] TrackingNet [21] AO SR0.5 SR0.75 AUC PN orm P

TNL2K [27] LaSOT [8] AUC P AUC PN orm P

SiamFC [1] SiamRPN++ [16] DiMP [2] KYS [3] PrDiMP [6] Ocean [33] SiamFC++ [29] STMTrack [10] AutoMatch [32] DTT [31] TrDiMP [26] STARK [30] TransT [5] SiamPW [25] CNNInMo [11] SBT [28] UTT [20] TransInMo [11] SLTrack [15] CIA [22] SparseTT [9] CSWinTT [24] SimTrack [4]

ECCVW2016 CVPR2019 ICCV2019 ECCV2020 CVPR2020 ECCV2020 AAAI2020 CVPR2021 ICCV2021 ICCV2021 CVPR2021 ICCV2021 CVPR2021 CVPR2022 IJCAI2022 CVPR2022 CVPR2022 IJCAI2022 ECCV2022 ECCV2022 IJCAI2022 CVPR2022 ECCV2022

34.8 – 61.1 63.6 63.4 61.1 56.9 64.2 65.2 63.4 67.1 68.0 67.1 64.4 – 66.4 67.2 – 67.5 67.9 69.3 69.4 69.8

53.3 69.4 68.7 68.8 70.4 – 70.5 76.7 72.6 78.9 73.1 – 80.3 – – – 77.0 – – 75.1 79.5 79.5 –

29.5 41.3 44.7 44.9 47.0 38.4 38.6 – 47.2 – – 52.5 50.7 – 42.2 – – 51.5 – 50.9 – – 53.7

28.6 41.2 43.4 43.5 45.9 37.7 36.9 – 43.5 – – – 51.7 – 41.9 – – 52.6 – – – – 52.6

33.6 49.6 56.9 55.4 59.8 56.0 54.4 60.6 58.3 60.1 63.9 66.0 64.9 55.8 53.9 65.9 64.6 65.3 66.4 66.2 66.0 66.2 66.2

CSTrack

Ours

81.2

53.3

52.5

66.8 75.9

35.3 – 71.7 75.1 73.8 72.1 69.5 73.7 76.6 74.9 77.7 77.7 76.8 76.7 – 77.3 76.3 – 78.8 79.0 79.1 78.9 78.8

70.5 79.7

9.8 – 49.2 51.5 54.3 47.3 47.9 57.5 54.3 51.4 58.3 62.3 60.9 50.9 – 59.2 60.5 – 58.7 60.3 63.8 65.4 66.0

57.1 73.3 74.0 74.0 75.8 – 75.4 80.3 76.0 79.6 78.4 81.3 81.4 – 72.1 – 79.7 81.6 78.1 79.2 81.7 81.9 81.5

66.3 80.0 80.1 80.0 81.6 – 80.0 85.1 – 85.0 83.3 86.1 86.7 – – – – – 83.1 84.5 86.6 86.7 86.0

67.1

82.8 87.4

42.0 56.9 65.0 63.3 68.8 65.1 62.3 69.3 67.5 – 73.0 75.5 73.9 – 61.6 – 74.6 73.5 – 74.8 75.2 76.1

33.9 49.1 56.7 – 60.8 56.6 54.7 63.3 59.9 – 61.4 70.8 69.0 57.0 53.9 70.0 67.2 69.9 – 69.6 70.1 70.9 – 71.6

TNL2K. TNL2K [27] is a large-scale evaluation dataset comprising 700 test video sequences with various challenges like occlusion and deformation. As illustrated in Table 1, our CSTrack obtains AUC (53.3%) and P (52.5%). Since targets in TNL2K are often small, SimTrack [4] is able to track small targets better by using two template feature maps, the original-size template feature and the center-cropped template feature. However, our CSTrack utilizes the same size feature map in all layers. This is good for the tracking of large-size targets but


not suitable for the identification and localization of small targets. This is a research direction for our future work.

LaSOT. LaSOT [8] is a large-scale long-term tracking dataset consisting of 280 long test sequences with an average of 2500 frames. As presented in Table 1, our CSTrack achieves AUC (66.8%), P_Norm (75.9%), and P (71.6%), demonstrating the effectiveness of the SPA module and the CFA module.

4.3 Ablation Study

Params, MACs and Speed. As shown in Table 2, the Params and MACs of our tracker are 93.5M and 29.1G, respectively. Notably, the tracking speed of our tracker is 70.6 FPS, which far exceeds the requirement for real-time tracking (over 35 FPS), demonstrating the help of the SPA module in tracking speed. Table 2. Params, MACs and Speed of our Table 4. The performance of different designed tracker CSTrack. usage of CFA on GOT-10k [14]. Tracker

Params (M) MACs (G) Speed (FPS)

CSTrack 93.5

29.1

70.6

# E t

Table 3. The performance of different template feature enhancement strategies on the GOT-10k [14] dataset.

Layer s

✓ ✓ ✓



69.2

78.8

65.4

2

✓ ✓ ✓



70.1 80.2

66.9

3

✓ ✓ –



69.0

78.5

65.2

4

✓ – ✓



69.8

79.9

66.4

AO

1 All

69.3 79.1

65.2

68.4

2 None

67.7 77.0

62.2

74.5

3 0-th layer

66.7 76.0

61.6

73.4

4 Odd layers

69.9 79.6

66.0

74.3

5 Even layers

69.8 79.5

66.4

73.0

6 Every 3 layers 69.0 78.0

65.3

78.2

1



71.2

2



76.9

3



8 Second half

SR0.5 SR0.75

1

# Layer

7 First half

AO

odd even

SR0.5 SR0.75 FPS

69.5 79.5 68.5 78.2

66.4 64.7

Table 5. The performance of SPA and CFA on GOT-10k [14] dataset. # Module

AO

SR0.5 SR0.75



69.3

79.1

65.2



69.9

79.6

66.0



70.1 80.2

66.9

SPA CFA

SPA Module. We explore the effect of performing template enhancement at different layers on tracking performance. As shown in results #1 and #2 in Table 3, omitting the template enhancement operation significantly degrades the tracking performance. This indicates that the interaction between template feature elements is crucial for robust tracking, highlighting the necessity of template enhancement. Results #2 and #3 indicate that the layer 0 template features contain essential original information about the target, and those information should be retained. Next, results #4, #5, and #6 demonstrate that performing

CSTrack

129

spaced template enhancements can prevent template information redundancy and speed up tracking. Finally, results #7 and #8 reveal that template feature enhancement in the earlier layers yields greater benefits in improving performance compared to the later layers. This can be attributed to the fact that the features in the earlier layers directly represent template information. By enhancing these features, the information stream t → s can transmit discriminative template features to the search region branch for effective information fusion. CFA Module. We investigate the effect of the CFA module on performance from two aspects. Firstly, we explore the usage of the CFA module at different positions. Experiments #1, #2, and #3 in Table 4 correspond to three usage positions: all layers, odd layers, and even layers, respectively. Since the template enhancement operation is performed only at the odd layer, using the CFA module only at the odd layer highlights the important channels and significantly enhances the subsequent feature enhancement. At the same time, utilizing the CFA module in all layers results in channel information redundancy, which hinders target localization. Furthermore, we explore the usage of the CFA module on different features. Results #2 and #4 demonstrate that both template features and search region features need to focus on the target-related feature channel, which facilitates target discrimination.

Search Region

Original Attention

Our SPA

Search Region

Original Attention

Our SPA

Fig. 3. Visualization of attention maps of the original attention mechanism and our proposed SPA mechanism. The red box denotes the tracking target. (Color figure online)

Module Ablation. We conduct experiments to investigate the effect of the SPA module and the CFA module on tracking performance. Results #1 and #2 in Table 5 demonstrate the effectiveness of the SPA module in addressing the issue of template information redundancy. By selectively enhancing template features and removing the information stream s → t, the quality of template features is improved, leading to enhanced tracking performance. Furthermore, in experiment #3, we applied the CFA module to our tracker. The CFA module

130

Y. Chen et al.

allows our tracker to focus on important feature channels, which significantly enhances the localization ability of the tracker. This improvement in localization ability further contributes to the overall enhancement of tracking performance. 4.4

Visualization of Attention Maps

To intuitively demonstrate the advantages of our proposed SPA mechanism, we visualize attention maps of its and the original attention mechanism. As depicted in Fig. 3, our SPA mechanism accurately distinguishes the tracking target from similar objects, whereas the original attention mechanism falls short in this regard. This improvement can be attributed to our selective template feature enhancement strategy, which effectively addresses the issue of template information redundancy. Meanwhile, the information stream s → t is removed to ensure the quality of the template features. In summary, our SPA mechanism exhibits enhanced robustness compared to the original attention mechanism. 4.5

Visualization of Tracking Performance

There are always many tracking challenges in real-world tracking scenarios, e.g., motion blur and scale variations. Therefore, we visualize the tracking performance of our CSTrack and the state-of-the-art trackers CSWinTT [24] and SparseTT [9] in typical tracking challenges. In our approach, the useful information in template features is retained, improving the template quality. In addition, CSTrack can focus on those target-related channels, which are beneficial for target localization. As displayed in Fig. 4, our CSTrack is still able to robustly track the target in complex tracking challenges.

Fig. 4. Visualization of the tracking performance of our proposed CSTrack and stateof-the-art trackers CSWinTT [24] and SparseTT [9].

5

Conclusion

In this work, we propose a spatial positioning attention mechanism (SPA) to effectively address two key issues that existed in previous tracking methods: the

CSTrack

131

redundancy of template information and the inappropriate use of information stream. The SPA module not only significantly improves tracking performance, but also accelerates tracking speed. In addition, we introduce a channel focus attention mechanism (CFA) that allows the tracker to focus on the channels containing critical target information in the template features and search region features, enhancing the target discrimination capability of the tracker. Further, we design a one-stream tracker CSTrack, which can cope well with complex tracking challenges. Extensive experiments demonstrate that our CSTrack noticeably outperforms previous state-of-the-art trackers, running at over 70 FPS. Acknowledgements. This work was supported in part by National Natural Science Foundation of China under the Grants 61872187 and 62072246, in part by Natural Science Foundation of Jiangsu Province under the Grant BK20201306.

References 1. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fullyconvolutional Siamese networks for object tracking. In: Hua, G., J´egou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-48881-3 56 2. Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: ICCV, pp. 6182–6191 (2019) 3. Bhat, G., Danelljan, M., Van Gool, L., Timofte, R.: Know your surroundings: exploiting scene information for object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 205–221. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1 13 4. Chen, B., Li, P., Bai, L., Qiao, L., Shen, Q., Li, B.: Backbone is all your need: a simplified architecture for visual object tracking. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. LNCS, vol. 13682, pp. 375–392. Springer, Cham (2022). https://doi. org/10.1007/978-3-031-20047-2 22 5. Chen, X., Yan, B., Zhu, J.: Transformer tracking. In: CVPR, pp. 8126–8135 (2021) 6. Danelljan, M., Gool, L.V., Timofte, R.: Probabilistic regression for visual tracking. In: CVPR, pp. 7183–7192 (2020) 7. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: ICLR (2021) 8. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR, pp. 5374–5383 (2019) 9. Fu, Z., Fu, Z., Liu, Q., Cai, W., Wang, Y.: SparseTT: visual tracking with sparse transformers. In: IJCAI (2022) 10. Fu, Z., Liu, Q., Fu, Z., Wang, Y.: STMTrack: template-free visual tracking with space-time memory networks. In: CVPR, pp. 13774–13783 (2021) 11. Guo, M., Zhang, Z., Fan, H., Jing, L., Lyu, Y., Li, B.: Learning target-aware representation for visual tracking via informative interactions. In: IJCAI (2022) 12. He, K., Chen, X., Xie, S., Li, Y., Doll´ ar, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: CVPR, pp. 16000–16009 (2022) 13. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR, pp. 7132– 7141 (2018)

132

Y. Chen et al.

14. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. In: TPAMI, pp. 1562–1577 (2019) 15. Kim, M., Lee, S., Ok, J., Han, B., Cho, M.: Towards sequence-level training for visual tracking. In: ECCV, pp. 534–551 (2022) 16. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR, pp. 4282–4291 (2019) 17. Li, X., Wang, W., Hu, X., Yang, J.: Selective kernel networks. In: CVPR, pp. 510–519 (2019) 18. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 19. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2018) 20. Ma, F., et al.: Unified transformer tracker for object tracking. In: CVPR, pp. 8781– 8790 (2022) 21. M¨ uller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a largescale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5 19 22. Pi, Z., Wan, W., Sun, C., Gao, C., Sang, N., Li, C.: Hierarchical feature embedding for visual tracking. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022, ECCV 2022. LNCS, vol. 13682, pp. 428– 445. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2 25 23. Rezatofighi, H., Tsoi, N., Gwak, J.: Generalized intersection over union: a metric and a loss for bounding box regression. In: CVPR, pp. 658–666 (2019) 24. Song, Z., Yu, J., Chen, Y.P.P., Yang, W.: Transformer tracking with cyclic shifting window attention. In: CVPR, pp. 8791–8800 (2022) 25. Tang, F.: Ranking-based Siamese visual tracking. In: CVPR, pp. 8741–8750 (2022) 26. Wang, N., Zhou, W., Wang, J., Li, H.: Transformer meets tracker: exploiting temporal context for robust visual tracking. In: CVPR, pp. 1571–1580 (2021) 27. Wang, X., et al.: Towards more flexible and accurate object tracking with natural language: algorithms and benchmark. In: CVPR, pp. 13763–13773 (2021) 28. Xie, F.: Correlation-aware deep tracking. In: CVPR, pp. 8751–8760 (2022) 29. Xu, Y., Wang, Z., Li, Z., Yuan, Y.: SiamFC++: towards robust and accurate visual tracking with target estimation guidelines. In: AAAI, pp. 12549–12556 (2020) 30. Yan, B., Peng, H., Fu, J., Wang, D., Lu, H.: Learning spatio-temporal transformer for visual tracking. In: ICCV, pp. 10448–10457 (2021) 31. Yu, B., Tang, M., Zheng, L., Zhu, G., Wang, J., Feng, H.: High-performance discriminative tracking with transformers. In: ICCV, pp. 9856–9865 (2021) 32. Zhang, Z., Liu, Y., Wang, X., Li, B., Hu, W.: Learn to match: automatic matching network design for visual tracking. In: ICCV, pp. 13339–13348 (2021) 33. Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 771–787. Springer, Cham (2020). https://doi.org/10.1007/978-3030-58589-1 46

Feature Implicit Enhancement via Super-Resolution for Small Object Detection Zhehao Xu, Mengyin Liu, Chao Zhu(B) , Fang Zhou, and Xu-Cheng Yin University of Science and Technology Beijing, Beijing 100083, China [email protected], [email protected], {chaozhu,xuchengyin}@ustb.edu.cn, [email protected]

Abstract. In recent years, object detection has made significant strides due to advancements in deep convolutional neural networks. However, the detection performance for small objects remains challenging. The visual information of small objects is easily confused with the background and even more likely to get lost in a series of downsampling operations due to the limited number of pixels, resulting in poor representations. In this paper, we propose a novel approach namely Feature Implicit Enhancement via Super-Resolution (FIESR) to learn more robust feature representations for small object detection. Our FIESR consists of two detection branches and requires two steps of training. Firstly, the detector learns the relationship between low-resolution and corresponding original high-resolution images to enhance the representations of small objects by minimizing a super-resolution loss between the two branches. Secondly, the detector is fine-tuned on original resolution images to fit extremely large objects. Additionally, our FIESR could be applied to various popular detectors such as Faster-RCNN, RetinaNet, FCOS, and DyHead. Our FIESR achieves competitive results on COCO dataset and is proved effective and flexible by extensive experiments. Keywords: Small Object Detection

1

· Super Resolution · COCO

Introduction

Small Object Detection (SOD), as a sub-field of generic object detection, which concentrates on detecting those objects with small size, is of great theoretical and practical significance in various scenarios. Although significant progress has been made in object detection in recent years, the performance of detecting small objects is often much lower than that of normal objects. One of the most important reasons is that as the scale of an object decreases, its appearance tends to become blurrier, making it easier to be confused with the background. Moreover, the visual information of small objects is more likely to get lost in a series of downsampling operations due to the limited number of pixels, resulting in poor representations. Take the famous COCO [1] dataset as an example, the c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 133–145, 2024. https://doi.org/10.1007/978-981-99-8555-5_11

134

Z. Xu et al.

Fig. 1. Comparisons of our FIESR and FCOS [2]. FCOS only detects large objects; our FIESR can detect small objects in step I, but might ignore some extremely large objetcs; after fine-tuning in step II, it can detect both small and large objects.

objects occupying an area less than or equal to 322 pixels come to the “Small” category. As shown in Fig. 1, small objects usually occupy only a small part of the image. Such poor-quality appearance provide insufficient information for a detector. The most direct and effective approach to improve the performance of SOD is to use high-resolution images or feature maps. Thus, combined with superresolution task, object detection task can utilize more visual information of small objects. Current super-resolution based approaches can be divided into imagelevel and feature-level styles. Some works [3–5] use a super-resolution network to preprocess the input images and some works [6–8] use a GAN [9] to reconstruct the RoI region of images or feature maps. However, these methods either introduce a large amount of computation from extra parameters, or cannot be applied to one-stage detectors without RoI operation. There are two key observations: 1) Ref. [10,11] show that the performance of the detector can be further improved under the original structure and parameters with some training strategies. 2) The structure of the backbone and neck is similar in both two-stage and one-stage detectors, which is mostly agnostic to whether to use RoI operation. Inspired by these two observations, we propose a new small object detection method for feature implicit enhancement based on super-resolution (FIESR), which can be easily applied to both two-stage and one-stage detectors without introducing redundant computation and parameters during inference. Our method consists of two detection branches (A and B which is used only for training) and a super-resolution module (SRM) for the neck of branch B. The purpose of our FIESR is to facilitate the detector in learning more

FIESR

135

robust feature representations by minimizing the distance between the superresolution feature maps generated by SRM and the high-resolution feature maps generated by branch A. The bottleneck of super-resolution module lies in generating features for small objects. To generate superior super-resolution features, the detector allocates more attention to small objects. Our method undergoes a two-step training process. In the step I, we utilize a pre-trained detector fed with original high-resolution images to supervise the training of final detector fed with low-resolution images. As shown in Fig. 1, although the detector is trained to successfully detect small objects, it might struggle to detect extremely large objects. To address this limitation, we employ the step II, where final detector focus on fitting these extremely large objects on original high-resolution images, while the capability of detecting small objects is mostly maintained. Our contributions could be summarized as follows: – We propose a novel detection method based on super-resolution, FIESR, to improve SOD performance. Our method utilizes high-resolution feature maps as supervision, enabling detectors to capture the differences in feature representations at high and low resolutions by minimizing a super-resolution loss. It significantly enhances the feature representations specifically for small objects. – Our FIESR improves the detector performance without introducing extra parameters and computational overhead during inference. Moreover, it is generic and can be easily applied to various popular detectors. – On the popular dataset COCO [1] for generic object detection, we verify the effectiveness of our FIESR on various detection architectures, including anchor-based one-stage, anchor-free one-stage and two-stage detectors.

2 Related Works

2.1 General Object Detection

Object detection can be mainly divided into two streams: two-stage and one-stage detection. On the one hand, two-stage object detection methods initially generate roughly localized regions of interest (RoIs) using a region proposal network; these RoIs are then refined and classified more accurately by a detection head. Faster RCNN [12] is a classic two-stage detector. On the other hand, one-stage object detection methods directly predict the class and location of objects from the feature maps. Within the one-stage stream, there are anchor-based and anchor-free detection methods. RetinaNet [13], a one-stage anchor-based detector, addresses the class imbalance problem by introducing a focal loss. ATSS [14] improves detection performance by optimizing the sampling strategy on top of RetinaNet. DyHead [15] introduces scale-aware, spatial-aware and task-aware attention to improve the detection head. FCOS [2], a mainstream one-stage anchor-free detector, predicts the width and height of objects from the feature maps without predefined anchor hyper-parameters. Although these detection methods are implemented in different ways, they all use similar backbone and neck structures such as ResNet [16] and FPN [17].

2.2 Small Object Detection Based on Super-Resolution

Small object detection is a challenging computer vision task due to the lack of available information on small objects. There are four dominant research directions for small object detection: 1) data augmentation; 2) scale-aware methods; 3) context modeling; 4) super-resolution. In the following, we mainly introduce detection methods based on super-resolution. Super-resolution based small object detection is mainly divided into image-level and feature-level approaches. The most straightforward image-level approach is to use a super-resolution model to generate high-resolution images. Hu et al. [4] apply bilinear interpolation to generate high-resolution images. Ref. [5] uses the detector as the discriminator and the super-resolution module as the generator to jointly train a GAN [9] and the detector. There are two major problems with these methods. On the one hand, it takes a lot of time and computation for the super-resolution model to generate high-resolution images. On the other hand, the super-resolution model may reconstruct non-object image parts that are not important for detection. Bai et al. [6] proposed SOD-MTGAN, an end-to-end model, which resolves these two problems to a certain extent by applying the super-resolution module to the RoIs instead of the whole image; however, it does not take the context information of the RoIs into account. Feature-level approaches were proposed because features contain contextual information after extraction through the network layers. Perceptual GAN [7] is a notable approach focusing on the features of RoIs. Noh et al. [8] addressed the problem by applying dilated convolutions to match the relative receptive fields of high-resolution and low-resolution feature maps. However, these methods are designed for two-stage detectors and cannot be readily applied to one-stage detectors. Unlike these methods, our method assists the neck and backbone in learning and enhancing small object features by incorporating a super-resolution module during training, and the detector performs inference without the super-resolution module.

3 Methods

Most detectors use FPN [17] or its variants to exploit multi-scale semantic information. When an image I ∈ R^{3×H×W} is input to a detector, it generates multi-scale feature maps P = {P_l ∈ R^{C×H'×W'}}, where l indicates the pyramid level and (H', W') is typically (H/2^l, W/2^l) in a standard FPN implementation. Based on these multi-scale features, detection heads perform classification prediction, bounding box height and width regression, and offset regression, respectively. According to [10,11], FPN alone may not fully extract the information present in the image for the detection task, especially the information of small objects. To address this issue, we propose FIESR, illustrated in Fig. 2, which utilizes super-resolution to assist FPN in extracting more information from small objects, thereby generating improved feature representations.


Fig. 2. The pipeline of the proposed small object detection method (FIESR). Our FIESR consists of two detection branches and an SRM. ScaleLoss not only separates the foreground and background, but also enables the detector to allocate more attention to the information of small objects. During inference, FIESR only uses branch A. The baseline can be any mainstream detector (e.g., Faster-RCNN [12], RetinaNet [13], or FCOS [2]).

In the following sections, we introduce our method in detail.

3.1 Overall Architecture

As shown in Fig. 2, our method consists of two detection branches (A and B) and a super-resolution module (SRM). The original high-resolution (HR) image is input to branch A and the low-resolution (LR) image is input to branch B. By minimizing the distance between the super-resolution feature maps and the high-resolution feature maps generated by the two branches, our framework enables the detector to learn more robust feature representations. Given an HR image I^hr ∈ R^{3×H×W}, we first downsample it to get the corresponding LR image I^lr ∈ R^{3×(H/2)×(W/2)}. Then, high-resolution feature maps P^hr = {P_l^hr ∈ R^{C×H'×W'}} are generated in branch A, and low-resolution feature maps P^lr = {P_l^lr ∈ R^{C×(H'/2)×(W'/2)}} are generated in branch B. Next, P^lr is fed to the SRM to recover the super-resolution feature maps P^sr = {P_l^sr ∈ R^{C×H'×W'}}. The SRM provides the detector with more detailed information during training by reconstructing super-resolution feature maps. Following [18], the SRM is a residual bilinear module, as shown in Fig. 3.
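For concreteness, a minimal PyTorch sketch of such a residual bilinear SRM is given below. The paper does not specify the internal layers, so the two 3×3 convolutions and the channel width are assumptions; only the bilinear-upsampling-plus-residual structure follows the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SRM(nn.Module):
    """Super-resolution module sketch: bilinear upsampling plus a learned residual (cf. Fig. 3).
    The 3x3 convolution stack is an assumption, not a detail from the paper."""
    def __init__(self, channels=256):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, p_lr):
        # Upsample the low-resolution FPN feature by 2x with bilinear interpolation.
        up = F.interpolate(p_lr, scale_factor=2, mode="bilinear", align_corners=False)
        # Residual connection: the module only has to learn the missing high-frequency details.
        return up + self.refine(up)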


Fig. 3. Illustration of super-resolution module (SRM).

The classic super-resolution task needs to restore the details of the entire image, so a loss that treats all pixels equally is often used. However, in the object detection task, foreground objects are of greater importance. Additionally, large objects cover more pixels, resulting in their dominance over small objects in the loss calculation, which can hinder the recovery of small object features. To effectively utilize the super-resolution task for assisting the detection task, we adopt a ScaleLoss, denoted as L_fea, that balances the loss between foreground and background as well as between large and small objects. L_fea is defined as:

L_{fea}(P^{hr}, P^{sr}) = \alpha \sum_{c=1}^{C}\sum_{h=1}^{H'}\sum_{w=1}^{W'} M_{h,w} S_{h,w} (P^{sr}_{c,h,w} - P^{hr}_{c,h,w})^2 + \beta \sum_{c=1}^{C}\sum_{h=1}^{H'}\sum_{w=1}^{W'} (1 - M_{h,w}) S_{h,w} (P^{sr}_{c,h,w} - P^{hr}_{c,h,w})^2   (1)

M_{h,w} = \begin{cases} 1, & (h, w) \in \text{foreground} \\ 0, & (h, w) \in \text{background} \end{cases}   (2)

where α and β are the weights of the foreground and background terms, and h, w are the horizontal and vertical coordinates of the feature maps. M ∈ {0, 1}^{H×W} is a binary mask that separates foreground from background. S denotes the scale mask, which is set according to the area occupied by the object: the larger the area that (h, w) belongs to, the smaller the value at that position. S is defined as:

S_{h,w} = \begin{cases} \frac{1}{H_r W_r}, & \text{if } (h, w) \in r \\ \frac{1}{\sum_{h=1}^{H}\sum_{w=1}^{W} (1 - M_{h,w})}, & \text{otherwise} \end{cases}   (3)

where H_r and W_r denote the height and width of the ground-truth box r.
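A sketch of the ScaleLoss in Eqs. (1)-(3) could look as follows; make_masks, which rasterizes M and S from ground-truth boxes in feature-map coordinates, is a hypothetical helper rather than part of the paper.

import torch

def make_masks(boxes, h, w):
    """Hypothetical helper: rasterize the binary mask M (Eq. 2) and the scale mask S (Eq. 3)
    from ground-truth boxes given as integer (x1, y1, x2, y2) in feature-map coordinates."""
    fg = torch.zeros(h, w)
    scale = torch.zeros(h, w)
    for x1, y1, x2, y2 in boxes:
        area = max((y2 - y1) * (x2 - x1), 1)
        fg[y1:y2, x1:x2] = 1.0
        scale[y1:y2, x1:x2] = 1.0 / area            # larger boxes get smaller per-pixel weights
    bg_weight = 1.0 / (1.0 - fg).sum().clamp(min=1.0)
    scale = fg * scale + (1.0 - fg) * bg_weight     # background pixels share one uniform weight
    return fg, scale

def scale_loss(p_hr, p_sr, fg_mask, scale_mask, alpha, beta):
    """ScaleLoss of Eq. (1) for one FPN level.
    p_hr, p_sr:          (C, H', W') high-resolution and super-resolution feature maps.
    fg_mask, scale_mask: (H', W') masks M and S."""
    sq_err = (p_sr - p_hr).pow(2)
    fg_term = (fg_mask * scale_mask).unsqueeze(0) * sq_err
    bg_term = ((1.0 - fg_mask) * scale_mask).unsqueeze(0) * sq_err
    return alpha * fg_term.sum() + beta * bg_term.sum()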

3.2 Training

This section outlines the training process of our method, which involves two steps, as shown in Algorithm 1.

Step I: The super-resolution task is employed to assist the detector in learning to implicitly extract large-scale features of small objects. We utilize a pre-trained detector D_p in branch A to supervise the training of the final detector D in branch B from scratch. In this step, branch A serves as a label generator exclusively for the SRM, and its parameters are not updated. The training loss in this step, denoted as L_1, is:

L_1(P^{lr}, P^{hr}, P^{sr}) = L_{det}(P^{lr}) + \lambda L_{fea}(P^{hr}, P^{sr})   (4)

where λ is a hyper-parameter that balances the two losses and L_det(P^lr) is the detection loss with P^lr as the input of the detection head. In this step, D learns more robust feature representations for small objects.

Algorithm 1. FIESR Algorithm Pipeline
Inputs: HR image I, pre-trained detector D_p and initialized detector D
Outputs: Trained detector D
Step I:
  Branch A uses the parameters of D_p and is frozen; branch B uses the parameters of D
  for i in range(iterations) do
    (1) Downsample the HR image to get the LR image
    (2) Extract (P^lr, P^hr, P^sr)
    (3) Train D by minimizing L_1(P^lr, P^hr, P^sr) as shown in Eq. 4
  end for
Step II:
  Branches A and B both use the shared parameters of D
  for i in range(iterations) do
    (1) Downsample the HR image to get the LR image
    (2) Extract (P^hr, P^sr)
    (3) Train D by minimizing L_2(P^hr, P^sr) as shown in Eq. 5
  end for
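The following sketch mirrors Algorithm 1 under several assumptions: extract_fpn, det_loss and fea_loss are hypothetical helpers standing in for the detector's FPN forward pass, its detection loss, and the ScaleLoss of Eq. (1) with mask construction; the optimizer settings follow Sect. 4.1.

import torch
import torch.nn.functional as F

def train_fiesr(pretrained_det, det, srm, loader, det_loss, fea_loss,
                lam=1.0, gamma=0.4, epochs_per_step=12):
    """Two-step FIESR training (sketch of Algorithm 1)."""
    params = list(det.parameters()) + list(srm.parameters())
    opt = torch.optim.SGD(params, lr=0.01, momentum=0.9, weight_decay=1e-4)

    # Step I: frozen branch A (pre-trained detector on HR images) supervises the SRM output of branch B.
    pretrained_det.eval()
    for _ in range(epochs_per_step):
        for img_hr, targets in loader:
            img_lr = F.interpolate(img_hr, scale_factor=0.5, mode="bilinear")
            with torch.no_grad():
                p_hr = pretrained_det.extract_fpn(img_hr)          # branch A, frozen
            p_lr = det.extract_fpn(img_lr)                         # branch B
            p_sr = [srm(p) for p in p_lr]
            loss = det_loss(det, p_lr, targets) + lam * sum(
                fea_loss(h, s, targets) for h, s in zip(p_hr, p_sr))
            opt.zero_grad(); loss.backward(); opt.step()

    # Step II: branches A and B share the parameters of D; the head is trained on HR images (Eq. 5).
    for _ in range(epochs_per_step):
        for img_hr, targets in loader:
            img_lr = F.interpolate(img_hr, scale_factor=0.5, mode="bilinear")
            p_hr = det.extract_fpn(img_hr)                         # branch A (shared weights)
            p_lr = det.extract_fpn(img_lr)                         # branch B (same detector)
            p_sr = [srm(p) for p in p_lr]
            # Detach p_hr in the feature loss so its gradient reaches branch B only through the SRM.
            loss = det_loss(det, p_hr, targets) + gamma * sum(
                fea_loss(h.detach(), s, targets) for h, s in zip(p_hr, p_sr))
            opt.zero_grad(); loss.backward(); opt.step()
    return det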

Step II: The purpose of this step is to adapt the final detector to large objects. To preserve the ability learned in step I, branches A and B both use the shared parameters of D. Thus, different from step I, the detection head is now trained on the original high-resolution (HR) images in branch A. Note that we only allow the gradient of the SRM to propagate to branch B to ensure the forward learning of features. We denote the training loss as L_2:

L_2(P^{hr}, P^{sr}) = L_{det}(P^{hr}) + \gamma L_{fea}(P^{hr}, P^{sr})   (5)

where γ is a hyper-parameter that balances the two losses and L_det(P^hr) is the detection loss with P^hr as the input of the detection head.


Inference: Once training is done, the inference procedure is the same as that of a normal detector. Since the detector has learned to generate better representations of small objects through the two-step training, detection is performed directly on the original high-resolution image without the SRM. This makes inference more efficient than other super-resolution based methods, as our method does not require additional computational cost for super-resolution processing.

Table 1. Ablation study on the generalization of our method when applied to popular object detection methods. '†' represents multi-scale training. '♦' means that the baseline is first trained for 12 epochs on the LR images and then fine-tuned for 12 epochs on the HR images without branch B.

Method             Epochs    AP    APS   APM   APL
Faster-RCNN [12]   24        38.4  21.5  42.1  50.3
Faster-RCNN♦       12 + 12   38.7  23.0  42.4  49.5
+FIESR             12 + 12   39.4  24.0  42.8  49.7
RetinaNet [13]     24        37.4  20.0  40.7  49.7
RetinaNet♦         12 + 12   38.1  21.5  42.4  49.8
+FIESR             12 + 12   38.7  22.7  42.7  50.1
DyHead [15]        24        43.3  25.8  47.2  57.0
DyHead♦            12 + 12   44.0  28.0  47.9  56.7
+FIESR             12 + 12   44.4  28.2  48.2  57.0
FCOS† [2]          24        38.5  21.9  42.8  48.6
FCOS♦†             12 + 12   39.0  23.4  42.8  50.2
+FIESR             12 + 12   39.4  24.3  43.2  49.6

4 Experiments and Details

4.1 Dataset and Details

We evaluate our method on the popular COCO [1] object detection dataset, which contains 80 object classes. We use the 120K train images for training and the 5K val images for testing in all experiments. Average Precision is adopted as the evaluation metric, i.e., mAP, APS, APM and APL, where the last three measure performance on objects of different scales. We conduct experiments on different detection frameworks, including two-stage models [12,19], anchor-based one-stage models [13,15], and anchor-free one-stage models [2].


All experiments are conducted with the mmdetection framework in PyTorch [20], and the pre-trained detector D_p is obtained from the official mmdetection site. All detectors are trained in two steps, and each step trains for 12 epochs with the SGD optimizer, where the momentum is 0.9 and the weight decay is 0.0001. There are four hyper-parameters in the total training objectives; we set α = 0.0005, β = 0.00025, λ = 1.0, γ = 0.4 for all two-stage models, and α = 0.001, β = 0.0005, λ = 1.0, γ = 0.4 for all one-stage models. Other settings are consistent with the baselines.
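The loss weights and optimizer settings above can be summarized as follows (an illustrative Python dictionary, not mmdetection config syntax):

# Illustrative collection of the training hyper-parameters listed above.
FIESR_HPARAMS = {
    "optimizer": {"type": "SGD", "momentum": 0.9, "weight_decay": 1e-4},
    "epochs": {"step_I": 12, "step_II": 12},
    "two_stage": {"alpha": 0.0005, "beta": 0.00025, "lambda": 1.0, "gamma": 0.4},
    "one_stage": {"alpha": 0.001, "beta": 0.0005, "lambda": 1.0, "gamma": 0.4},
}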

4.2 Ablation Study

We conduct a series of ablation studies to demonstrate the effectiveness and efficiency of our method.

Table 2. Ablation study on the compatibility of our method when applying to different backbones on FCOS. DCN represents deformable convolution network [21]. ResNet-101 has more layers than ResNet-50.

Method     Backbone        Epochs    AP    APS   APM   APL
FCOS [2]   ResNet-50-DCN   24        42.5  25.1  46.3  55.9
FCOS♦      ResNet-50-DCN   12 + 12   43.2  26.4  46.4  55.9
+FIESR     ResNet-50-DCN   12 + 12   44.2  28.5  47.6  57.3
FCOS [2]   ResNet-101      24        39.4  22.4  42.9  51.8
FCOS♦      ResNet-101      12 + 12   40.6  24.2  44.7  52.0
+FIESR     ResNet-101      12 + 12   41.4  26.5  45.1  52.0

Table 3. Ablation study on different super-resolution losses and level outputs from FPN [17]. We use FCOS [2] with ResNet-50 as the baseline. The five columns of numbers correspond to the five evaluated combinations of loss function (L1Loss or ScaleLoss) and FPN levels (FPN1-FPN3) used as SRM input.

AP    38.3  38.2  38.3  38.9  38.9
APS   22.6  22.3  23.9  23.7  24.0
APM   42.0  41.9  41.5  42.6  42.3
APL   49.0  48.7  48.7  49.5  49.7


Generalization on Existing Object Detectors: We evaluate the generalization ability of our method by applying it to popular object detectors, namely Faster-RCNN [12], RetinaNet [13], FCOS [2] and DyHead [15]. These methods represent a wide variety of object detection frameworks (e.g., two-stage vs. one-stage, anchor-based vs. anchor-free). Since our method is trained for 12 epochs on LR images in step I, we set up a comparative experiment in which the detector is trained for 12 epochs on LR images and fine-tuned for 12 epochs on HR images without our method. As shown in Table 1, our method outperforms the original baselines by 1.0∼1.3 AP and 2.4∼2.7 APS. Although the baseline we set up is stronger than the original baseline, our method still outperforms it by 0.4∼0.7 AP and 0.2∼1.2 APS. Even with multi-scale training, our method achieves better results.

Cooperation with Different Backbones: We evaluate the compatibility of our method with different backbones. As shown in Table 2, we apply our method to FCOS [2] using ResNet-50-DCN [21] and ResNet-101 as backbones. Our method outperforms the original baseline by 1.7 AP and 3.4 APS with the ResNet-50-DCN backbone and by 2.0 AP and 4.1 APS with the ResNet-101 backbone.

Comparison with Different Super-Resolution Losses and Level Outputs from FPN: We compare our method with different super-resolution losses (L1Loss or ScaleLoss for Eq. 1) and different level outputs from FPN. We use the largest three levels of output features of FPN (1, 2, 3) as the SRM input (1 is the highest-resolution feature map, 2 the second, and so on). As shown in Table 3, when we use L1Loss, the SRM cannot help the detector improve performance because of the imbalance between foreground and background and between large and small objects. Moreover, adding different levels of feature maps from FPN to the SRM improves the performance of the detector on objects of different scales, which matches the structure of FPN. When all three levels of feature maps are used as the SRM input, the comprehensive performance of the detector is the best.

Comparison with Different Fine-Tuning Strategies: We compare whether to use branch B in step II for fine-tuning. We experiment on FCOS [2] with ResNet-50 and ResNet-50-DCN.

Table 4. Ablation study on different fine-tuning strategies.

Method               Backbone        Epochs    AP    APS   APM   APL
FCOS [2]♦            ResNet-50       12 + 12   38.3  22.6  42.0  49.0
+FIESR/wo branch B   ResNet-50       12 + 12   38.8  22.8  42.6  49.7
+FIESR               ResNet-50       12 + 12   38.9  24.0  42.3  49.7
FCOS [2]♦            ResNet-50-DCN   12 + 12   43.2  26.4  46.4  55.9
+FIESR/wo branch B   ResNet-50-DCN   12 + 12   43.9  27.9  47.4  57.7
+FIESR               ResNet-50-DCN   12 + 12   44.2  28.5  47.6  57.3


Table 5. Comparison with other object detectors using ResNet-50 on the COCO minival set.

Method                 Epochs    AP    APS   APM   APL
Cascade-RCNN [19]      20        41.0  22.7  44.4  54.3
FCOS [2]               12        38.7  22.9  42.5  50.1
RepPoints [22]         24        38.6  22.5  42.2  50.4
ATSS [14]              12        39.3  24.3  43.3  51.3
BorderDet [23]         12        41.4  23.6  45.1  54.6
Deformable DETR [24]   36        43.8  26.4  47.1  58.0
DyHead [15]            12        42.6  26.1  46.8  56.0
FIESR                  12 + 12   44.4  28.2  48.2  57.0

As shown in Table 4, step II with branch B achieves better performance than step II without branch B, particularly for small objects, which validates the effectiveness of our FIESR.

4.3 Main Results

To verify the effectiveness of the proposed method, we compare it with other classic detection methods [2,14,15,19,22-24] in the literature. For this comparison, we choose DyHead [15] with ResNet-50 as the baseline and apply our method to it, to make a fair comparison with the other methods. As shown in Table 5, all detectors use the same backbone (ResNet-50), and our method achieves 44.4 mAP and in particular 28.2 APS for SOD, which is the best performance.

5 Conclusion

In this paper, we have proposed a novel detection framework based on super-resolution, FIESR, to improve the performance of small object detection without introducing additional computational cost during inference. FIESR can be easily applied to both two-stage and one-stage detectors. By minimizing a super-resolution loss between the original high-resolution and super-resolution features, FIESR enables the detector to learn more robust feature representations, particularly for small objects. As future work, our method could be further improved in the following aspects: how to train the super-resolution module more efficiently, and how to design new modules on the detection head to assist the learning of the detector.

Acknowledgement. This work was supported by the National Key Research and Development Program of China (No. 2022ZD0119200), the National Natural Science Foundation of China (No. 62072032 and 62076024), and the National Science Fund for Distinguished Young Scholars (No. 62125601).


References
1. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision - ECCV 2014, pp. 740-755. Springer International Publishing, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
2. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9627-9636 (2019)
3. Haris, M., Shakhnarovich, G., Ukita, N.: Task-driven super resolution: object detection in low-resolution images. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) ICONIP 2021. CCIS, vol. 1516, pp. 387-395. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92307-5_45
4. Hu, P., Ramanan, D.: Finding tiny faces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 951-959 (2017)
5. Rabbi, J., Ray, N., Schubert, M., Chowdhury, S., Chao, D.: Small-object detection in remote sensing images with end-to-end edge-enhanced GAN and object detector network. Remote Sens. 12(9), 1432 (2020)
6. Bai, Y., Zhang, Y., Ding, M., Ghanem, B.: SOD-MTGAN: small object detection via multi-task generative adversarial network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 210-226. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_13
7. Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., Yan, S.: Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1222-1230 (2017)
8. Noh, J., Bae, W., Lee, W., Seo, J., Kim, G.: Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9725-9734 (2019)
9. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672-2680. Curran Associates, Inc. (2014). https://www.papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
10. Ni, Z., Yang, F., Wen, S., Zhang, G.: Dual relation knowledge distillation for object detection. arXiv preprint arXiv:2302.05637 (2023)
11. Guo, J., et al.: Distilling object detectors via decoupled features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2154-2164 (2021)
12. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015). https://www.proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
13. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999-3007 (2017). https://doi.org/10.1109/ICCV.2017.324
14. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9759-9768 (2020)


15. Dai, X., et al.: Dynamic head: unifying object detection heads with attentions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7373-7382 (2021)
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778 (2016). https://doi.org/10.1109/CVPR.2016.90
17. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936-944 (2017). https://doi.org/10.1109/CVPR.2017.106
18. Cui, Z., et al.: Exploring resolution and degradation clues as self-supervised signal for low quality object detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022, Part IX. LNCS, vol. 13669, pp. 473-491. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20077-9_28
19. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154-6162 (2018)
20. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
21. Dai, J., et al.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764-773 (2017)
22. Yang, Z., Liu, S., Hu, H., Wang, L., Lin, S.: RepPoints: point set representation for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9657-9666 (2019)
23. Qiu, H., Ma, Y., Li, Z., Liu, S., Sun, J.: BorderDet: border feature for dense object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 549-564. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_32
24. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)

Improved Detection Method for SODL-YOLOv7 Intensive Juvenile Abalone

Chengying Liu1, Jun Yue1(B), Guangjie Kou1, Zhanming Zou2, Zhenbo Li3, and Changyi Dai1

1 Ludong University, Yantai 264003, China

[email protected]

2 Weihai Ever Green Ocean Technology Co., Ltd., Weihai 264300, China
3 China Agricultural University, Beijing 100091, China

Abstract. Achieving rapid and accurate detection of juvenile abalone is a prerequisite for estimating their number, density and size. Juvenile abalone are densely distributed during breeding, and the intra-class occlusion between individuals is severe; microorganisms in the water also form inter-class occlusion of juvenile abalone, resulting in incomplete detection information. Effective detection methods for juvenile abalone are lacking. To address these problems, this paper proposes the SODL-YOLOv7 juvenile abalone detection method based on a newly established JAD (Juvenile Abalone Detection) dataset. First, the SODL backbone network for dense small target detection is proposed, which improves the attention to small targets by incorporating atrous (dilated) convolution kernels and pooling kernels with different sampling rates in the atrous convolution and pooling layers. Then, the ACBAM (Adaptive Convolutional Block Attention Module) is established, applying an adaptive pooling layer in the channel and spatial attention modules so that the network pays more attention to the occluded regions of juvenile abalone and further improves the detection effect. Finally, the proposed method was tested on the JAD dataset: the AP (average precision) reached 99.4%, an increase of 4.1% over the benchmark method YOLOv7, 9.2% over the instant-teaching method, and 2.2% over the TOOD method, verifying the effectiveness of the method.

Keywords: abalone detection · occlusion detection · intra-class occlusion · inter-class occlusion · YOLOv7

1 Introduction

The nutritional value of abalone is extremely high and the market demand is large. However, due to the slow natural growth rate, wild harvesting cannot meet this demand, and artificial breeding is the main mode of abalone production. Juvenile abalone are small in size and large in number, and their clustering leads to serious occlusion, making it difficult to accurately monitor their numbers through manual observation. However, with the development of convolutional neural network technology, accurate and efficient juvenile abalone detection has become possible, effectively avoiding the limitations of manual detection.


Traditional visual object detection techniques rely on hand-designed feature extractors and shallow classifiers. Zhang et al. [1] extracted gray-image features to detect vehicles accurately and rapidly. Chen et al. [2] used the Histogram of Oriented Gradients (HOG) for flexible and accurate detection. Kumar et al. [3] demonstrated the feasibility of using machine learning and SIFT for disaster-area identification. Arreola et al. [4] developed a UAV target recognition system based on a Haar cascade classifier. Malepati et al. [5] improved facial image detection using Haar-like features. Kovač et al. [6] proposed a finger vein recognition algorithm based on adaptive Gabor filters and SIFT/SURF. However, these techniques only extract shallow features, leading to low accuracy and high false detection rates, which limits their application in abalone detection.

Deep learning techniques have provided a new solution for detecting juvenile abalone, significantly improving the accuracy of detecting small target objects. Han et al. [7] introduced a two-branch siamese network to calculate the similarity between image regions for detection, enabling the detection of small target objects with limited training samples. Other researchers have proposed methods such as QueryDet [8], DroneYOLOv5 [9], Swin Transformer with YOLOv5 [10], slice-assisted hyper-inference [11], and multiscale feature fusion [12] to improve small target detection. However, detecting juvenile abalone remains challenging due to their dense distribution, variable shape, and the issues of occlusion and interference, necessitating more effective detection methods.

This paper compares and analyzes abalone detection methods in intensive farming. Ye et al. [13] proposed an improved Faster R-CNN algorithm using VGG16 for feature extraction, achieving good detection results; however, VGG16's computational cost hampers real-time detection. Peng et al. [14] introduced the Piecewise Focal Loss (PFL) function for sample balancing, obtaining 94% mean average precision (mAP), but occlusion requires further exploration. Teh et al. [15] proposed an enhanced Mask R-CNN model with 97.48% accuracy, but its computational complexity is unsuitable for dense abalone detection. The YOLOv7 [16] target detection algorithm has relatively high detection accuracy and is suitable for small target detection.

In this paper, we improve the YOLOv7 model and propose the SODL-YOLOv7 method for abalone detection during the nursery period, based on the problems in the above literature and the characteristics of the juvenile abalone detection task. The components of this research include: (1) creating the JAD dataset; (2) embedding the SODL small target detection network in the YOLOv7 backbone network to solve the problems of low detection rate and high false detection rate; (3) introducing the improved ACBAM attention module to solve the intra-class and inter-class occlusion problems.

2 Methods

The technical route of this study is shown in Fig. 1. In this paper, we propose the SODL-YOLOv7 juvenile abalone detection method based on YOLOv7 [16]; the network structure is shown in Fig. 2, where the gray underlined parts indicate the methods proposed in this paper.


Fig. 1. Technology Road Map.

Fig. 2. SODL-YOLOv7 network structure.

2.1 SODL Small Target Detection Network

In YOLOv7, the backbone network consists of CSPDarknet53 [17], a deep convolutional neural network with 53 convolutional layers. To address the limitations of this network in detecting small targets like juvenile abalone, we propose modifying the YOLOv7 backbone by incorporating the SODL small target detection network.


The SODL network replaces the traditional convolutional operation in YOLOv7 and offers several advantages. Firstly, it enhances the model's ability to capture details and features of small targets, improving detection accuracy and recall. Secondly, the SODL network utilizes atrous convolution to better capture the features of juvenile abalone at different scales together with contextual information in the images, which helps handle the high variability in the size and shape of juvenile abalone and reduces missed and false detections. Additionally, the SODL network improves the model's understanding of the positions of juvenile abalone in the image and the relationships between them, leading to enhanced detection accuracy. The structure of the SODL small target detection network is depicted in Fig. 3.

Fig. 3. SODL small target detection network structure.

The specific implementation process of the SODL small target detection network is as follows. First, a 640 × 640 × 3 feature map is input to the network. Next, the input feature map is convolved using atrous (dilation) rates of 6, 12, and 18 to obtain multiple feature maps at different scales, as shown in Eq. (1):

f_i = Conv_{3×3}(x, r_i)   (1)

where Conv_{3×3} denotes the 3 × 3 convolution operation, x is the input feature map, and r_i denotes the different dilation rates. Then, the input feature map is pooled using a global pooling layer to obtain a global contextual feature: f_4 = Pool_global(x). Next, these feature maps at different scales are concatenated to obtain the multiscale feature map: f_out = Concat(f_1, f_2, f_3, f_4). Finally, the multiscale feature map is convolved with a 1 × 1 convolution layer to obtain the output feature map: y = Conv_{1×1}(f_out), where Conv_{1×1} denotes the 1 × 1 convolution operation and y is the output feature map.
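A minimal PyTorch sketch of the multi-branch structure described above (parallel atrous convolutions with rates 6, 12 and 18, a global-pooling branch, concatenation, and a 1×1 fusion convolution) is shown below; the output channel width is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SODLBlock(nn.Module):
    """Parallel atrous convolutions plus a global-pooling branch, fused by concatenation
    and a 1x1 convolution, following the description of the SODL network."""
    def __init__(self, in_ch=3, out_ch=64, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates
        ])
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global context: f4 = Pool_global(x)
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]          # f1, f2, f3 at dilation rates 6/12/18
        g = F.interpolate(self.global_pool(x), size=(h, w), mode="bilinear", align_corners=False)
        f_out = torch.cat(feats + [g], dim=1)                    # f_out = Concat(f1, f2, f3, f4)
        return self.fuse(f_out)                                  # y = Conv_1x1(f_out)

# Example with the 640 x 640 x 3 input stated above:
# y = SODLBlock()(torch.randn(1, 3, 640, 640))   # -> (1, 64, 640, 640)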

2.2 ACBAM Attention Module

This paper replaces the CBS modules in YOLOv7 with an improved CBAM attention mechanism, as shown in Fig. 4. The CBAM attention mechanism is improved by adding an adaptive pooling layer, yielding the proposed ACBAM attention mechanism, as shown in Fig. 5. The adaptive pooling layer allows different degrees of weighted pooling for each spatial location and each channel, thus enhancing the model's focus on the target. Applying ACBAM to juvenile abalone detection improves the model's ability to understand juvenile abalone images by computing attention weights in the channel and spatial dimensions, making the network pay more attention to the important parts of occluded objects and reducing the influence of the occluded parts on the results, thus achieving more accurate juvenile abalone detection.

Fig. 4. CBAM structure.

Fig. 5. Improved ACBAM structure.

First, the ACBAM module performs a dimensionality reduction operation on the input feature map M using an adaptive pooling layer to obtain the global average z over the spatial dimensions of each channel, i.e.:

z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} M_{i,j}   (2)

where H and W denote the height and width, respectively. Then a pooling operation is performed on M using global average pooling to obtain the feature information of each channel. Next, a small fully connected neural network compresses M into a one-dimensional vector y, which is fed into a second fully connected layer; the result is Sigmoid-activated and replicated to the number of channels of M. Multiplying the Sigmoid-activated vector with the corresponding elements of M yields the weighted channel feature map M'. The specific formula is:

M' = \sigma(W_v \cdot \text{Sigmoid}(W_g \cdot M)) \cdot M   (3)

where M is the channel feature map, W_g is the weight of the first fully connected layer, W_v is the weight of the second fully connected layer, and Sigmoid is the Sigmoid activation function.

Next, the ACBAM module takes the weighted channel feature map M' as its input and uses a 3 × 3 convolutional layer to extract the channel information of the feature map, obtaining the spatial feature map N'. An adaptive pooling operation is performed on N' to obtain a compact descriptor, and N' is then passed through a convolution layer to obtain a weight vector s measuring importance in the spatial dimension; s is Sigmoid-activated and replicated to the size of N'. Multiplying this Sigmoid-activated vector with the corresponding elements of N' yields the weighted spatial feature map N''. The weighted channel feature map M' and the weighted spatial feature map N'' are combined as the final output feature map. The specific equation is:

N'' = \sigma(W_s \cdot \text{Sigmoid}(\text{Conv}(W_p \cdot [N'; M']))) \cdot M'   (4)

where W_p is the weight of the first convolution layer, W_s is the weight of the second fully connected layer, and Sigmoid is the Sigmoid activation function.

ACBAM improves the network's ability to capture juvenile abalone features, enhancing detection accuracy and recall. It adjusts channel and spatial feature importance for key parts such as the head, eyes, and antennae, and enhances head feature representation, mitigating occlusion effects and promoting robust feature learning. This improves model generalization, enabling better detection under various breeding conditions and densities.
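Because Eqs. (2)-(4) compress several operations, the sketch below follows the standard CBAM layout (channel attention followed by spatial attention) with adaptive average pooling as the added ingredient; the reduction ratio, the use of max pooling in the spatial branch, and the 7×7 kernel are assumptions rather than details taken from the paper.

import torch
import torch.nn as nn

class ACBAM(nn.Module):
    """CBAM-style attention with adaptive pooling: channel weights first, then spatial weights."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.channel_mlp = nn.Sequential(                 # stands in for the two weight matrices W_g, W_v
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, m):
        b, c, _, _ = m.shape
        # Channel attention: adaptive global average pooling gives z (Eq. 2);
        # the MLP plus sigmoid gives per-channel weights.
        z = nn.functional.adaptive_avg_pool2d(m, 1).view(b, c)
        ch_w = torch.sigmoid(self.channel_mlp(z)).view(b, c, 1, 1)
        m_prime = m * ch_w                                # weighted channel feature map M'
        # Spatial attention: pool across channels, convolve, sigmoid.
        avg_map = m_prime.mean(dim=1, keepdim=True)
        max_map, _ = m_prime.max(dim=1, keepdim=True)
        sp_w = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return m_prime * sp_w                             # weighted spatial feature map N''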

3 Experimental Results and Analysis

3.1 Experimental Data Preprocessing

In order to better address the dense distribution of juvenile abalone, we first visited the Rongcheng marine farm to study the growth environment of juvenile abalone and the style of their attachment substrate, so as to ensure the accuracy of the experiment; the culture environment is shown in Fig. 6.

Fig. 6. Environment of the farm.


In this paper, a JAD dataset was established based on the growth environment of juvenile abalone, and the image acquisition environment is shown in Fig. 7. This includes the necessary camera bracket for the high-definition camera, tile type attachment base, and juvenile abalone. The camera used was a 4K camera with the Reel Vision USB4KHDR01 for filming. In order to simulate the light conditions in the real environment, the photos in the dataset were taken at the same time to ensure constant light intensity, and the tile type attachment base underneath the juvenile abalone was used to imitate the natural growing environment of the juvenile abalone.

Fig. 7. Image acquisition environment.

The camera was placed at a distance of about 10.5 cm from the tiles, and a total of 2016 images were captured. The final number of images was 2000 after screening, which were then randomly divided into training and test sets according to the ratio of 8:1. The environment was consistent at the time of filming, as shown in Fig. 8, and there was a slight overlap between juvenile abalone. The dataset has a resolution of 3840*2140 pixels and is stored in “jpg” format. When the dataset is fed into the network, all images are scaled to 640 × 640 pixels, reducing training time and memory usage.

Fig. 8. Part of the juvenile abalone pictures.

The LabelImg labeling tool was used to label the 2000 images and generate .xml annotation files containing the juvenile abalone IDs and coordinate information. The final accuracy is then obtained by evaluating, on the test set, the weights trained on the training set. Some of the annotated images are shown in Fig. 9.


Fig. 9. Part of the labeled images.

3.2 Experimental Environment and Evaluation Index

Table 1. Software and hardware configuration.

Experimental environment                 Configuration
Operating System                         CentOS 7
Processors                               Intel(R) Xeon(R) Silver 4210
Graphics Processors                      NVIDIA Corporation GP100GL
Graphics Processor Computing Platforms   CUDA 11.0, cuDNN 8.1
Framework                                PyTorch
Compile Program                          PyCharm, Anaconda
Programming Language                     Python 3.7

The hardware and software configurations used in this paper are shown in Table 1. The model was trained under CentOS 7 with an NVIDIA Corporation GP100GL GPU and an Intel(R) Xeon(R) Silver 4210 CPU. The input image size is 640 × 640, the number of epochs is 100, and the batch size is set to 32. The model parameters are configured as shown in Table 2. The experiments are conducted on the JAD dataset, with the YOLOv7 network as the basic framework and an initial learning rate of 0.001. All model training was conducted on the training set of the JAD dataset, and all model testing was conducted on its test set.

Table 2. Model parameters.

Parameter Name      Configuration
Input Size          640 × 640
Learning Rate       0.001
Label Smoothing     0.01
Number of Samples   2000
Categories          1
Epochs              100
Batch Size          32

3.3 Experimental Results and Analysis

The loss of the training process is shown in Fig. 10. From the figure, we can see that the network fluctuates considerably during the first 5-10 epochs of training; by around epoch 50 it has converged, and both the training and validation sets show good performance. Figure 11 shows the AP50 and AP75 values: AP50 reaches 99.4% and AP75 reaches 93.6%, both of which represent high accuracy.

Fig. 10. Loss of training process.

Fig. 11. Network model AP50 value and AP75 value.


In order to verify the effectiveness of the proposed network, the juvenile abalone detection results of the proposed SODL-YOLOv7 model were compared with those of the YOLOv7 model [16], the instant-teaching model [18], and the TOOD model [19] on the JAD dataset; the comparison results are shown in Table 3.

Table 3. Comparison of the performance of different models.

Models                   AP50(%)   Increase(%)   AP75(%)   Increase(%)
instant-teaching Model   90.2      9.2           80.3      13.3
TOOD Model               97.2      2.2           82.1      11.5
YOLOv7                   95.3      4.1           90.4      3.2
SODL-YOLOv7 Model        99.4      --            93.6      --

As seen in Table 3, the SODL-YOLOv7 model improves AP50 by 4.1% and AP75 by 3.2% relative to the YOLOv7 model; compared with the instant-teaching model, it improves AP50 by 9.2% and AP75 by 13.3%; and compared with the TOOD model, it improves AP50 by 2.2% and AP75 by 11.5%. The experimental results show that the detection results of the proposed SODL-YOLOv7 model are better than those of the YOLOv7, instant-teaching and TOOD models when the juvenile abalone to be identified have similar color and texture, are densely packed, and partially occlude one another. The visualization results of the target detection are shown in Fig. 12.

Fig. 12. Visualization results.


4 Conclusion

This paper designs a SODL-YOLOv7 network framework, based on the YOLOv7 algorithm, to help overcome the detection problems of juvenile abalone and to realize the detection of densely distributed juvenile abalone. The JAD nursery abalone dataset was first constructed, with a total of 2000 images. The SODL small target detection network and the ACBAM attention module were then added to improve detection accuracy and efficiency. The SODL small target detection network improves the detection accuracy of small targets by augmenting the YOLOv7 backbone network with multiscale atrous convolutions; by adding an adaptive pooling layer, the ACBAM attention module dynamically adjusts the pooling according to the input feature maps, enabling the network to better adapt to feature maps of different sizes and to focus more on the target region, which helps the network better process and identify occluded objects, thus improving the accuracy and efficiency of detection. Finally, the model achieves optimal results on the JAD dataset. The proposed solution can effectively improve the accuracy and efficiency of juvenile abalone detection and achieve good practical application. To lay the technical foundation for subsequent operations such as abalone baiting estimation, abalone population estimation, and abalone density estimation, the model trained with this algorithm can be deployed on an embedded device with a camera to perform real-time detection of abalone.

Acknowledgements. This work is supported by the Shandong Province Key R&D Program Project (2022TZXD005), the Shandong Province Science and Technology SMEs Innovation Capacity Enhancement Project (2021TSGC1003), and the Yantai City Key R&D Program Project (2022XCZX079).

References
1. Zhang, L., Wang, J., An, Z.: Vehicle recognition algorithm based on Haar-like features and improved Adaboost classifier. J. Ambient Intell. Human. Comput. 14(2), 807-815 (2023). https://doi.org/10.1007/s12652-021-03332-4
2. Chen, G.H., Ni, J., Chen, Z., et al.: Detection of highway pavement damage based on a CNN using grayscale and HOG features. Sensors 22(7), 2455 (2022)
3. Kumar, V.S., Alemran, A., Gupta, S.K.: Extraction of SIFT features for identifying disaster hit areas using machine learning techniques. In: International Conference on Knowledge Engineering and Communication Systems (ICKES), Chickballapur, India, pp. 1-5 (2022)
4. Arreola, L., Gudiño, G., Flores, G.: Object recognition and tracking using Haar-like feature cascade classifiers: application to a quad-rotor UAV. In: 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), vol. 1, pp. 45-50. IEEE (2022)
5. Malepati, H.R., Selvam, P.: Extraction of human facial behavioural expressions using Haar-like features compared with Haar cascade classifier. In: AIP Conference Proceedings, vol. 2655, no. 1. AIP Publishing (2023)
6. Kovač, I., Marak, P.: Finger vein recognition: utilization of adaptive Gabor filters in the enhancement stage combined with SIFT/SURF-based feature extraction. SIViP 17(3), 635-641 (2023). https://doi.org/10.1007/s11760-022-02270-8
7. Han, G., Ma, J., Huang, S., et al.: Few-shot object detection with fully cross-transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5321-5330 (2022)


8. Yang, C., Huang, Z., Wang, N.: QueryDet: cascaded sparse query for accelerating high-resolution small object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13668-13677 (2022)
9. Li, K., Ou, O., Liu, G.B., Yu, Z.F., et al.: Target detection algorithm of remote sensing image based on improved YOLOv5. Comput. Eng. Appl. 59(9), 207-214 (2023)
10. Gong, H., Mu, T., Li, Q., et al.: Swin-Transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images. Remote Sensing 14(12), 2861 (2022)
11. Akyon, F.C., Altinuc, S.O., Temizel, A.: Slicing aided hyper inference and fine-tuning for small object detection. In: 2022 IEEE International Conference on Image Processing (ICIP), pp. 966-970. IEEE (2022)
12. Zeng, N., Wu, P., Wang, Z., et al.: A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Measur. 71, 1-14 (2022)
13. Ye, M., Li, J.: Abalone counting based on improved Faster R-CNN. In: 2022 2nd International Conference on Bioinformatics and Intelligent Computing, pp. 206-210 (2022)
14. Peng, F., Miao, Z., Li, F., et al.: S-FPN: a shortcut feature pyramid network for sea cucumber detection in underwater images. Expert Syst. Appl. 182, 115306 (2021)
15. Hong, K.T., Abdullah, S.-S., Hasan, M.K., et al.: Underwater fish detection and counting using mask regional convolutional neural network. Water 14(2), 222 (2022)
16. Wang, C.Y., Bochkovskiy, A., Liao, H.-M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022)
17. Wang, C.Y., Liao, H.-M., Wu, Y.H., et al.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390-391 (2020)
18. Zhou, Q., Yu, C., Wang, Z., et al.: Instant-teaching: an end-to-end semi-supervised object detection framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4081-4090 (2021)
19. Feng, C., Zhong, Y., Gao, Y., et al.: TOOD: task-aligned one-stage object detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3490-3499. IEEE Computer Society (2021)

MVP-SEG: Multi-view Prompt Learning for Open-Vocabulary Semantic Segmentation

Jie Guo1, Qimeng Wang2, Yan Gao2, Xiaolong Jiang2, Shaohui Lin3,5, and Baochang Zhang1,4(B)

1 Hangzhou Research Institute, School of General Engineering, Beihang University, Beijing, China
[email protected]
2 Xiaohongshu Inc., Beijing, China
3 East China Normal University, Shanghai, China
4 Nanchang Institute of Technology, Nanchang, China
5 KLATASDS-MOE, Shanghai, China

Abstract. CLIP (Contrastive Language-Image Pretraining) is well developed for open-vocabulary zero-shot image-level recognition, while its applications in pixel-level tasks are less investigated, and most efforts directly adopt CLIP features without deliberate adaptation. In this work, we first demonstrate the necessity of image-to-pixel CLIP feature adaptation, then provide Multi-View Prompt learning (MVP-SEG) as an effective solution to achieve image-to-pixel adaptation and to solve open-vocabulary semantic segmentation. Concretely, MVP-SEG deliberately learns multiple prompts trained by our Orthogonal Constraint Loss (OCLoss), by which each prompt is supervised to exploit CLIP features on different object parts, and the collaborative segmentation masks generated by all prompts promote better segmentation. Moreover, MVP-SEG introduces Global Prompt Refining (GPR) to further eliminate class-wise segmentation noise. Experiments show that the multi-view prompts learned from seen categories generalize strongly to unseen categories, and MVP-SEG+, which adds a knowledge transfer stage, significantly outperforms previous methods on several benchmarks. Moreover, qualitative results justify that MVP-SEG does lead to better focus on different local parts.

Keywords: CLIP · Semantic segmentation · Open vocabulary

J. Guo and Q. Wang—Equal contribution.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-981-99-8555-5_13.


Fig. 1. In the first row, the carefully designed prompts effectively focus on the corresponding object parts and demonstrate superior localization compared to the left-most picture, which is the result of a single handcrafted prompt. The second row shows that using multi-view learnable prompts yields comparable results to the carefully designed prompts. Importantly, multi-view learnable prompts are flexible and can adapt to a wide range of objects without being limited to specific concepts like “head” or “tail”.

1 Introduction

Open-vocabulary zero-shot semantic segmentation [2,9] localizes objects from arbitrary classes, either seen or unseen during training, with pixel-level masks. Compared to traditional segmentors working under a closed-vocabulary setting [6,25], open-vocabulary methods find wider applications in image editing [14], view synthesis [18], and surveillance [23], while requiring zero-shot capability to be integrated into the system. Large-scale visual-language pre-training models such as CLIP [19] and ALIGN [10] are widely used to incorporate zero-shot capability into visual systems. CLIP, for example, trains on a large dataset of image-text pairs to embed semantic concepts into its parameters, serving as a knowledge base for downstream tasks. MaskCLIP [25] pioneers the use of CLIP for open-vocabulary segmentation at the pixel level. It directly applies pretrained CLIP text features as a semantic classifier for pixel-level classification, without additional modifications. However, we believe that adapting CLIP from image to pixel can further improve segmentation performance. In Fig. 1, the original CLIP prompt features (leftmost in the first row) primarily focus on the most distinctive parts of objects. This occurs because CLIP is pre-trained with an image-level contrastive loss, which may lead to incomplete and partial segmentation. We use prompt learning to adapt CLIP features from image to pixel while preserving zero-shot capability. Prompt learning [26] fine-tunes CLIP for downstream tasks by adjusting prompt parameters, avoiding costly training. To improve image-to-pixel adaptation for segmentation, we propose using multiple prompts inspired by part-based representation [22]. As shown in the first row of Fig. 1, the carefully designed prompts capture different object parts, resulting in better coverage when the results of these prompts are combined.


On top of this, by switching the carefully designed prompts to learnable ones, we obtain segmentation results comparable to the carefully designed prompts without relying on human prior knowledge. We propose MVP-SEG: Multi-View Prompt learning for open-vocabulary semantic segmentation. MVP-SEG utilizes multi-view learnable prompts that focus on different object parts for image-to-pixel CLIP adaptation and improves segmentation performance. To ensure prompt diversity, we introduce the Orthogonal Constraint Loss (OCLoss) to enforce orthogonality among the prompts. Additionally, the Global Prompt Refining (GPR) module reduces class-specific noise for improved segmentation. Extensive experiments validate the effectiveness of MVP-SEG. In all, we make three main contributions:
– We demonstrate that image-to-pixel adaptation is important for exploiting CLIP's zero-shot capability in open-vocabulary semantic segmentation, and the proposed MVP-SEG successfully yields such adaptation with favorable performance gains.
– We design the OCLoss to build multi-view learnable prompts attending to different object parts so that, collaboratively, they yield accurate and complete segmentation. We also introduce GPR to eliminate class-wise segmentation noise.
– We conduct extensive experiments on three major benchmarks. MVP-SEG+, which combines MVP-SEG with the commonly used knowledge transfer stage, reports state-of-the-art (SOTA) performance on all benchmarks and even surpasses fully supervised counterparts on the Pascal VOC and Pascal Context datasets.

2 Related Work

2.1 Vision-Language Models

Vision-language (VL) models pre-trained with web-scale image-text pairs infuse open-world knowledge into aligned textual and visual features, upon which zero-shot visual recognition [24] and generation [20] have bloomed in recent years. Recognition-purpose VL models can be categorized as two-stream [10,19] or one-stream [12], depending on whether multimodal inputs are processed separately or all together. Among them, CLIP [19] is the most popular in downstream applications; it is contrastively trained on 400 million image-text pairs. ALIGN [10] adopts an even larger dataset with 1.8 billion pairs, but with considerable noise. For zero-shot generation, BLIP-v2 [11] has shown outstanding quality in text-guided image generation. In this work, we build upon the zero-shot capability of CLIP to realize open-vocabulary semantic segmentation.

2.2 Zero-Shot Segmentation

Zero-shot segmentation performs pixel-level classification covering classes unseen during training. SPNet [21] and ZS3Net [2] are representatives of the discriminative [1] and generative [2,9] lines of work: the former projects visual embeddings towards semantic embeddings, while the latter generates pixel-level features for unseen classes from semantic embeddings. Following ZS3Net, a number of generative methods have been proposed [9] to bridge the object-to-pixel feature gap while considering structural consistency and uncertainty. Besides, self-training is widely adopted in zero-shot segmentation [2] to boost performance.

2.3 Prompt Learning

The idea of prompting originates from NLP (Natural Language Processing) and has been exploited for transferring VL-pretrained models to downstream tasks with templates such as "a photo of a [CLS]". Via prompting, one can avoid tuning huge VL models and instead use them as a fixed knowledge base from which only task-related information is elicited. Nonetheless, finding the optimal prompt by hand is not trivial, so prompt learning [26] was proposed to automate this process with limited labeled data. CoOp [26] introduces continuous prompt learning, in which a set of continuous vectors is optimized end-to-end via downstream supervision [13]; CoOp applies learnable prompts to the text encoder of CLIP to replace sub-optimal hand-crafted templates.


Fig. 2. The overview of MVP-SEG and MVP-SEG+. The MVP-SEG+ consists of MVP-SEG and a knowledge transfer stage. In MVP-SEG, we train multi-view prompts by Classification Loss, Segmentation Loss and Orthogonal Constraint Loss. The CLIP text and image encoders are fixed. In the knowledge transfer stage, the segmentation backbone is trained by the pseudo labels generated by MVP-SEG.

3 Method

In this section, we first give a brief introduction to the open-vocabulary semantic segmentation task, then we introduce our proposed MVP-SEG and MVP-SEG+ in detail.

3.1 Problem Definition

Semantic segmentation performs pixel-level classification to localize objects from different classes in the input image. Under the traditional closed-vocabulary setting, target classes form a finite set, and all classes are present in both the training and test sets. Open-vocabulary semantic segmentation aims to generalize beyond the finite base classes and segment objects from novel classes unseen during training. In order to quantitatively evaluate segmentation performance under this open-vocabulary setting, object classes in existing segmentation datasets are divided into seen and unseen subsets [25]. Specifically, models are trained on labeled seen classes and evaluated on both seen and unseen classes. Additionally, we also visualize segmentation results on arbitrary rare classes, as in Fig. 1 in the Appendix, to further validate our open-vocabulary capability.

3.2 MVP-SEG

The architecture of MVP-SEG is illustrated in Fig. 2. Note that MVP-SEG learns multiple prompts by training only on seen classes.

Multi-view Prompt Learning. In this stage, we adopt a two-stream network architecture with separate image and text encoders. We modify a fixed CLIP image encoder as our vision encoder. Specifically, the original CLIP vision encoder applies a transformer-style attention pooling layer AttnPool to the last feature map X to obtain the representation vector of the input image. X is projected into Q, K, V by three linear layers Proj_q, Proj_k and Proj_v respectively, and AttnPool is performed on Q, K, V to get the final representation vector:

\text{AttnPool}(Q, K, V) = \text{Proj}_c\Big(\sum_i \text{softmax}\Big(\frac{\bar{q} k_i^T}{T}\Big) v_i\Big)   (1)

where \bar{q} is the spatially averaged Q, k_i and v_i are the features of K and V at spatial location i, and T is a constant scaling factor. Following MaskCLIP [25], we remove the query and key embedding layers Proj_q and Proj_k, then apply Proj_v and Proj_c on the feature map X to obtain the final image feature map F:

F = Proj_c(Proj_v(X))   (2)

In practice, Proj_c and Proj_v are implemented as 1 × 1 convolution layers. For the text encoder, we use a fixed CLIP text encoder to ensure feature alignment with the visual counterpart. For each class, we combine the class name with learnable prompts to form the inputs of the text encoder. We first initialize k + 1 prompts P = {p_0, p_1, p_2, ..., p_k}, where p_0 is the global classification prompt and p_1, p_2, ..., p_k are the k segmentation prompts. Each prompt p_i contains 32 tokens, i.e., p_i ∈ R^{32×512}. For class c, we concatenate each prompt with the class name as:

s_c^i = CONCAT(p_i, W(cls_c))   (3)


where W denotes the word embedding function that maps cls_c to word embedding vectors. We feed each sentence s_c^i to the CLIP text encoder and obtain the text representation vector t_c^i, where t_c^0 and t_c^i (i > 0) denote the global classification vector and the segmentation vectors of class c, respectively. Each representation vector t_c^i is used as a classifier to perform pixel-level classification on the feature map F, producing the mask map m_c^i, where m_c^0 is the global classification mask and m_c^1, m_c^2, ..., m_c^k are the segmentation masks. The fused segmentation result m^f is computed by summing all segmentation masks as:

m^f = \mathrm{softmax}\big(\tau_1 \sum_{i=1}^{k} m^i\big)   (4)

where τ_1 is a learnable scalar parameter.
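To make Eqs. (3)-(4) concrete, the following PyTorch-style sketch shows how per-class prompt sentences could be built, encoded into per-prompt classifiers, and fused into m^f. The tensor shapes, the cosine-similarity classification, the `text_encoder` callable and the function name are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def multi_view_masks(feat_map, class_word_emb, prompts, text_encoder, tau1):
    # feat_map:       (D, H, W) image feature map F from Eq. (2)
    # class_word_emb: (C, L, D) word embeddings W(cls_c) of the C class names
    # prompts:        list of k+1 tensors of shape (32, D); prompts[0] is the global
    #                 classification prompt, prompts[1:] are the segmentation prompts
    # text_encoder:   frozen CLIP text encoder, abstracted as a callable mapping a
    #                 token-embedding sequence to a single (D,) text vector
    D, H, W = feat_map.shape
    C = class_word_emb.shape[0]
    per_prompt_masks = []
    for p in prompts:
        t_c = [text_encoder(torch.cat([p, class_word_emb[c]], dim=0))  # Eq. (3)
               for c in range(C)]
        t = F.normalize(torch.stack(t_c), dim=-1)          # (C, D) classifiers t_c^i
        f = F.normalize(feat_map.flatten(1), dim=0)        # (D, H*W)
        per_prompt_masks.append((t @ f).view(C, H, W))     # pixel-level logits m_c^i
    m0 = per_prompt_masks[0]                               # global classification mask m^0
    seg = torch.stack(per_prompt_masks[1:])                # (k, C, H, W)
    m_fused = torch.softmax(tau1 * seg.sum(dim=0), dim=0)  # Eq. (4): softmax over classes
    return m0, m_fused
```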

In order to ensure that different prompts yield masks attending to varied object parts and that collectively they provide better segmentation, we design the Orthogonal Constraint Loss (OCLoss) to supervise prompt learning. For each segmentation prompt p_i ∈ R^{32×512}, we first average over the token dimension to get its average vector \bar{p}_i ∈ R^{1×512}, and then apply the OCLoss, which is formulated as:

L_{OC} = \sum_{i=1}^{k} \sum_{j=i+1}^{k} \frac{|\bar{p}_i \cdot \bar{p}_j|}{\|\bar{p}_i\| \|\bar{p}_j\|}   (5)

Global-Prompt Refinement. To incorporate CLIP's strong capability in zero-shot image classification into segmentation, we introduce the Global-Prompt Refinement (GPR) module, which uses the global classification prompt to obtain image classification scores and refine the segmentation mask by eliminating class-wise noise. The image-level classification score g_c is obtained by weighting and summing m_c^0 as:

g_c = \mathrm{sigmoid}\Big(\frac{m_c^0 \cdot \mathrm{softmax}(\gamma m_c^0)}{\tau_2}\Big)   (6)

where τ_2 is a learnable scalar parameter and γ is a constant scale factor. The global classification loss is:

L_{cls} = -\sum_c y_c \log(g_c)   (7)

where y_c equals 1 if category c exists in the image and 0 otherwise. The classification prompt tends to focus on the most class-discriminative object parts for better classification; more visualizations can be found in Fig. 3 in the Appendix. The final segmentation mask m_c is obtained by multiplying the fused segmentation mask with the global classification score:

m_c = m_c^f \, g_c   (8)
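A minimal sketch of the GPR computation in Eqs. (6)-(8), assuming (C, H, W) mask tensors and a multi-hot label vector; the default values for γ and τ_2 are placeholders, not the learned/reported ones.

```python
import torch

def global_prompt_refinement(m0, m_fused, y, gamma=10.0, tau2=1.0):
    # m0:      (C, H, W) global classification mask from the classification prompt
    # m_fused: (C, H, W) fused segmentation mask m^f from Eq. (4)
    # y:       (C,) multi-hot vector, 1 if class c is present in the image
    C = m0.shape[0]
    m0_flat = m0.flatten(1)                                              # (C, H*W)
    # Eq. (6): spatially weight m^0_c by its own softmax, sum, scale, squash.
    g = torch.sigmoid((m0_flat * torch.softmax(gamma * m0_flat, dim=1)).sum(1) / tau2)
    # Eq. (7): classification loss over classes present in the image.
    cls_loss = -(y * torch.log(g + 1e-6)).sum()
    # Eq. (8): refine the fused masks with the image-level scores.
    m_refined = m_fused * g.view(C, 1, 1)
    return m_refined, cls_loss
```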

The segmentation loss is formulated as:

L_{seg} = -\sum^{H} \sum^{W} \sum_{c=1}^{C} m_c^* \log(m_c)   (9)


where m_c^* is the ground-truth mask of category c, C is the number of segmentation classes, and H and W denote the height and width of the input image, respectively. The overall loss function is obtained by summing the above losses:

L = \lambda_1 L_{seg} + \lambda_2 L_{cls} + \lambda_3 L_{OC}   (10)
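The OCLoss of Eq. (5) and the combined objective of Eq. (10) can be sketched as follows; the prompt tensor shape and the helper names are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def orthogonal_constraint_loss(prompts):
    """Eq. (5): penalize pairwise cosine similarity between token-averaged
    segmentation prompts so that each prompt attends to a different part.
    prompts: (k, 32, 512) tensor of learnable segmentation prompt vectors."""
    p_bar = F.normalize(prompts.mean(dim=1), dim=-1)   # (k, 512) averaged, unit-norm
    cos = p_bar @ p_bar.t()                            # pairwise cosine similarities
    return torch.triu(cos.abs(), diagonal=1).sum()     # sum of |cos(p_i, p_j)| for i < j

def total_loss(seg_loss, cls_loss, oc_loss, lambdas=(1.0, 3.0, 100.0)):
    """Eq. (10) with the weights reported in the implementation details."""
    l1, l2, l3 = lambdas
    return l1 * seg_loss + l2 * cls_loss + l3 * oc_loss
```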

Note that in the prompt learning stage the image encoder and text encoder are fixed; only the prompt vectors P and the temperatures τ_1, τ_2 are learnable, so that the image-text feature alignment within CLIP is preserved. In addition, our experiments also demonstrate that prompts trained on limited seen categories generalize well to unseen categories.

3.3 MVP-SEG+

We term the entire framework in Fig. 2 MVP-SEG+, consisting of MVP-SEG and a knowledge transfer stage.

Knowledge Transfer. The prompts learned in MVP-SEG are used to generate pseudo labels that transfer zero-shot knowledge from CLIP down to a segmentation network. Following MaskCLIP+ [25], the transfer stage contains two steps: pseudo-label training and self-training. In the pseudo-label training step, images are fed into MVP-SEG to generate pseudo labels for the segmentation model, and the classifier of the segmentation model is replaced by the multi-view prompts of MVP-SEG to preserve the open-vocabulary capability. In the self-training step, the segmentation model in MVP-SEG+ trains itself with self-generated pseudo labels. For more details, please refer to MaskCLIP+ [25].
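As an illustration of the pseudo-label step, the sketch below shows how frozen MVP-SEG predictions could be turned into per-pixel labels for the segmentation backbone. The function names and the optional confidence threshold are hypothetical and not part of the paper's procedure.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(mvp_seg, images, ignore_index=255, conf_thresh=0.0):
    """Run the frozen MVP-SEG model on unlabeled images and take the per-pixel
    argmax of the refined masks m_c as supervision for the segmentation model."""
    logits = mvp_seg(images)                       # (B, C, H, W) refined masks
    probs = torch.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)                # (B, H, W) confidence and class index
    labels[conf < conf_thresh] = ignore_index      # optionally drop uncertain pixels
    return labels
```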

4 Experiments

To evaluate the efficacy of MVP-SEG and MVP-SEG+, we conduct comprehensive experiments on three widely adopted benchmarks. In this section, we first analyze the performance of MVP-SEG and present its ablation study. Then, we compare MVP-SEG+ with SOTA zero-shot segmentation methods to show the effectiveness of the proposed method in adapting CLIP to pixel-level tasks. Finally, we show the open-vocabulary ability of our method on unlabeled web images.

4.1 Datasets

Three semantic segmentation benchmarks are used in our experiments. PASCAL VOC 2012 [7] contains 1426 training images with 20 object classes and 1 background class. PASCAL Context [16] annotates the PASCAL VOC 2010 data with segmentation masks of 10,103 images covering 520 classes, among which 59 common classes are used as foregrounds. COCO Stuff [3] is an extension of the COCO dataset in which 164,000 images covering 171 classes are annotated with segmentation masks. We follow the common zero-shot experimental setups implemented in [25]. For PASCAL VOC, we ignore the background class, and potted plant, sheep, sofa, train, tv monitor are selected as the 5 unseen classes while the others are used as seen. For PASCAL Context, the background is not ignored and cow, motorbike, sofa, cat, boat, fence, bird, tv monitor, keyboard, aeroplane are unseen. For COCO Stuff, frisbee, skateboard, cardboard, carrot, scissors, suitcase, giraffe, cow, road, wall concrete, tree, grass, river, clouds, playing field are used as unseen classes.

4.2 Evaluation Metrics

We utilize the mean intersection-over-union (mIoU) on seen and unseen classes to evaluate semantic segmentation performance. Additionally, we report the harmonic mean of the seen and unseen mIoUs (hIoU).
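For reference, a minimal sketch of the hIoU computation (the harmonic mean of the seen and unseen mIoU):

```python
def harmonic_iou(miou_seen: float, miou_unseen: float) -> float:
    """hIoU: harmonic mean of the mIoU on seen and unseen classes."""
    if miou_seen + miou_unseen == 0:
        return 0.0
    return 2 * miou_seen * miou_unseen / (miou_seen + miou_unseen)

# e.g. harmonic_iou(10.0, 12.2) ~= 11.0, the baseline hIoU on COCO Stuff in Table 2
```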

4.3 Implementation Details

We adopt MMSegmentation (https://github.com/open-mmlab/mmsegmentation) as our codebase. In the prompt learning stage, we use the encoders from CLIP-ResNet-50 (https://github.com/openai/CLIP) to extract image and text features. We use 1 global classification prompt and 3 multi-view segmentation prompts unless otherwise stated. All learnable prompts are implemented using unified context [26] with 32 context tokens. The weighting parameters λ_1, λ_2, λ_3 for L_seg, L_cls and L_OC are empirically set to 1, 3 and 100, respectively. The softmax scale factor γ is set to 10. We use the SGD optimizer to optimize the learnable prompts with a learning rate of 2e-4 and a weight decay of 5e-4, and adopt a linear warmup strategy with 1k warmup iterations and a warmup ratio of 1e-3. The prompt learning stage takes 8k iterations with batch size 8. All experiments are conducted on 4 A100 GPUs. In the knowledge transfer stage, we use the same settings as MaskCLIP [25] for a fair comparison. We choose DeepLabv2 [4] as the segmentation backbone for PASCAL VOC and COCO Stuff, and DeepLabv3+ for PASCAL Context. The training schedule is set to 20k/40k/80k iterations for PASCAL VOC/PASCAL Context/COCO Stuff. The first 1/10 of the training iterations adopt MVP-SEG-guided learning and the rest adopt self-training.

Fig. 3. Comparison of segmentation results between MVP-SEG+ and MaskCLIP+ [25]. The first row is the input image, the second row is the prediction result of MaskCLIP+ and the third row is our result.

Table 1. Cross-dataset generalization test experiments. Columns represent different test sets, and rows represent the dataset used to learn prompts. Baseline indicates the default prompts in MaskCLIP.

mIoU(U)              Pascal VOC (20)  Pascal Context (59)  COCO Stuff (171)
baseline             40.2             26.9                 12.2
Pascal VOC (20)      45.9             29.8                 13.9
Pascal Context (59)  47.8             35.7                 19.5
COCO Stuff (171)     50.0             38.7                 19.9

4.4 Ablation Studies on MVP-SEG

Comparison with Baseline. We adopt MaskCLIP [25] as our baseline, which feeds handcrafted prompts built from the 85 templates described in [8] into the CLIP text encoder. We compare segmentation performance with MaskCLIP on the unseen classes to demonstrate the zero-shot ability of MVP-SEG. As shown in Table 2, MVP-SEG achieves significant improvements over the baseline on the unseen classes (+8.4% on COCO Stuff, +12.7% on PASCAL VOC) using only 4 learned multi-view prompts (1 global classification prompt and 3 multi-view segmentation prompts). This result shows that multi-view learnable prompts effectively improve CLIP's performance on pixel-level tasks.

Table 2. Ablation study of MVP-SEG on the COCO Stuff and PASCAL VOC datasets; we adopt MaskCLIP [25] as the baseline model for comparison. Note that the number of multiple prompts without GPR is 3.

Method           OCLoss  GPR | COCO Stuff: mIoU(U) mIoU(S) mIoU hIoU | PASCAL VOC: mIoU(U) mIoU(S) mIoU hIoU
Baseline         ✗       ✗   | 12.2  10.0  10.2  11.0               | 40.2  41.9  41.5  41.1
Single Prompt    ✗       ✗   | 15.8  15.1  15.2  15.5               | 41.8  46.9  45.6  44.2
Multiple Prompt  ✗       ✗   | 16.4  16.0  16.0  16.2               | 45.8  48.4  47.7  47.0
Multiple Prompt  ✓       ✗   | 19.9  16.1  16.4  17.9               | 45.9  51.1  49.8  48.3
Multiple Prompt  ✓       ✓   | 20.6  17.7  18.0  19.2               | 52.9  53.4  53.2  53.1


Study on Segmentation Prompts. As shown in Table 2, by replacing the 85 manual prompts (first row) with 1 learnable prompt (row 2), we obtain a 3.6% performance improvement on unseen classes on the COCO Stuff dataset and a 1.6% improvement on PASCAL VOC. This result reveals the efficacy of learnable prompts for adapting CLIP features to segmentation. Increasing the number of learnable prompts is also beneficial. After deploying the OCLoss (row 4), the mIoU on unseen classes of COCO Stuff further increases from 16.4% to 19.9%, indicating that our multi-view prompt learning method is effective. We further test the optimal number of prompts with experiments on the COCO Stuff dataset. As depicted in Fig. 4 in the Appendix, the performance on unseen classes (using OCLoss) increases with the number of prompts and peaks at 3 prompts, so our default number of multi-view segmentation prompts is 3. We also visualize the text embeddings under different learnable prompts in MVP-SEG with t-SNE [15]. As illustrated in Fig. 5 in the Appendix, the text embeddings of different prompts are scattered in the feature space, which shows that the learned multi-view segmentation prompts are diverse. Moreover, the visualization in Fig. 1 shows that the learned prompts contain reasonable high-level semantic information. More importantly, the segmentation results using different prompts tend to be complementary, so combining them yields more accurate and complete masks. For more visualizations please refer to Fig. 6 in the Appendix.

To further verify the advantages of multi-view learnable prompts, we also compare the segmentation performance of our prompts with manually designed ones. We select a set of animal-related classes from COCO Stuff, including cat, dog, horse, cow, bear and giraffe, for comparison. As stated in Sect. 1, we carefully handpick prompts related to body parts (“the leg of {}”, “the head of {}”, and “the tail of {}”) for animal-related segmentation. As shown in Table 1 in the Appendix, the manual prompts outperform the MaskCLIP baseline by 1.0% mIoU, indicating the feasibility of adopting multiple handcrafted prompts for segmentation. More importantly, our method outperforms both MaskCLIP and the handcrafted prompts, showing the superiority of MVP-SEG.

The Influence of GPR. We use the GPR module to further refine segmentation masks with the image-level classification capability of CLIP. Row 5 of Table 2 shows that adding GPR comprehensively improves performance on both seen and unseen classes: the hIoU on COCO Stuff improves from 17.9% to 19.2%, and on PASCAL VOC it increases from 48.3% to 53.1%. Visualization results also show that GPR can improve segmentation by filtering out masks of false-positive classes (please refer to Fig. 2 in the Appendix).


Table 3. Comparison with the SOTA methods. Depending on whether the unseen classes are visible during training, previous methods can be divided into inductive methods (top part) and transductive methods (middle part). Our method belongs to the latter. MVP-SEG+ and Fully Sup indicate the DeepLab models trained with MVP-SEG pseudo labels and with fully-supervised annotations, respectively.

Method          ST | COCO Stuff: mIoU(U) mIoU(S) mIoU hIoU | Pascal VOC: mIoU(U) mIoU(S) mIoU hIoU | Pascal Context: mIoU(U) mIoU(S) mIoU hIoU
SPNet [21]      ✗  |  8.7  35.2  32.8  14.0               | 15.6  78.0  63.2  26.1               |  –     –     –     –
ZS3Net [2]      ✗  |  9.5  34.7  33.3  15.0               | 17.7  77.3  61.6  28.7               | 12.7  20.8  19.4  15.8
CaGNet [9]      ✗  | 12.2  35.5  33.5  18.2               | 26.6  78.4  65.5  39.7               | 18.5  24.8  23.2  21.2
SIGN [5]        ✗  | 15.5  32.3   –    20.9               | 28.9  75.4   –    41.7               |  –     –     –     –
Joint [1]       ✗  |  –     –     –     –                 | 32.5  77.7   –    45.9               | 14.9  33.0   –    20.5
ZegFormer [6]   ✗  | 33.2  36.6   –    34.8               | 63.6  86.4   –    73.3               |  –     –     –     –
SPNet [21]      ✓  | 26.9  34.6  34.0  30.3               | 25.8  77.8  64.8  38.8               |  –     –     –     –
ZS3Net [2]      ✓  | 10.6  34.9  33.7  16.2               | 21.2  78.0  63.0  33.3               | 20.7  27.0  26.0  23.4
CaGNet [9]      ✓  | 13.4  35.5  33.7  19.5               | 30.3  78.6  65.8  43.7               |  –     –     –     –
SIGN [5]        ✓  | 17.5  31.9   –    22.6               | 41.3  83.5   –    55.3               |  –     –     –     –
STRICT [17]     ✗  | 30.3  35.3  34.9  32.6               | 35.6  82.7  70.9  49.8               |  –     –     –     –
MaskCLIP+ [25]  ✗  | 54.7   –    39.6  45.0               | 86.1   –    88.1  87.4               | 66.7   –    48.1  53.3
MVP-SEG+           | 55.8  38.3  39.9  45.5               | 87.4  89.0  88.6  88.2               | 67.5  44.9  48.7  54.0
Fully Sup [25]     |  –     –    39.9   –                 |  –     –    88.2   –                 |  –     –    48.2   –

The Generalization of Multi-view Learnable Prompts. The results in Table 2 show that the learned prompts not only improve performance on seen classes but also significantly boost it on unseen classes. To further study the generalization ability of the learned prompts, we evaluate them by training prompts on one dataset and testing on the others. As shown in Table 1, prompts trained on other datasets outperform the baseline by a large margin.

4.5 Comparison with State-of-the-Art

From Table 3, we can conclude: 1) Our method effectively improves the performance on novel classes: MVP-SEG+ improves over the previous SOTA method by 1.1%, 1.3% and 0.8% unseen-class mIoU on COCO Stuff, Pascal VOC and Pascal Context, respectively. 2) Our method consistently outperforms the previous SOTA on all datasets (+0.3%, +0.5% and +0.6% mIoU). 3) Our method effectively adapts the powerful CLIP to pixel-level tasks, so that the performance of MVP-SEG+ is competitive with, or even surpasses, its fully supervised counterparts; to the best of our knowledge, this is the first time such a result has been achieved. A visual comparison with MaskCLIP+, one of the SOTA methods, is illustrated in Fig. 3: our method obtains more complete masks, and false-positive classes are filtered out. Furthermore, we also illustrate the performance of our method on web-crawled images that contain rare object classes not visible in any training set. As shown in Fig. 1 in the Appendix, MVP-SEG can correctly attend to rare objects such as Iron Man and Captain America, and MVP-SEG+ segments them with favorable accuracy.

5 Conclusion

In this work, we propose multi-view prompt learning (MVP-SEG) to address open-vocabulary semantic segmentation with a pre-trained CLIP. We first demonstrate the efficacy of adapting the pretrained CLIP model from image level to pixel level for open-vocabulary segmentation, and then introduce multi-view prompt learning, inspired by part-based representation, to achieve this adaptation. The multi-view learnable prompts are optimized with our Orthogonal Constraint Loss (OCLoss) to ensure the part-wise attention of each prompt, so that collaboratively they yield better segmentation results. In addition, we design Global-Prompt Refinement (GPR), which adopts a global learnable prompt to remove class-wise noise and refine segmentation masks. MVP-SEG+ reports SOTA open-vocabulary segmentation performance on all three widely used benchmarks and achieves superior results to its fully-supervised counterparts on the PASCAL VOC and PASCAL Context datasets.

Acknowledgement. This research was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. D24F020011, Beijing Natural Science Foundation L223024, the National Natural Science Foundation of China under Grant 62076016, the National Key Research and Development Program of China (Grant No. 2023YFC3300029) and “One Thousand Plan” projects in Jiangxi Province Jxsg2023102268. This work was also supported by the National Natural Science Foundation of China (No. 62102151), the CCF-Tencent Open Research Fund, the Open Research Fund of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, Ministry of Education (KLATASDS2305), and the Fundamental Research Funds for the Central Universities.

References 1. Baek, D., Oh, Y., Ham, B.: Exploiting a joint embedding space for generalized zero-shot semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9536–9545 (2021) 2. Bucher, M., Vu, T.H., Cord, M., P´erez, P.: Zero-shot semantic segmentation. In: Advances in Neural Information Processing Systems, vol. 32 (2019) 3. Caesar, H., Uijlings, J., Ferrari, V.: COCO-Stuff: thing and stuff classes in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1209–1218 (2018) 4. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2 49 5. Cheng, J., Nandi, S., Natarajan, P., Abd-Almageed, W.: SIGN: spatial-information incorporated generative network for generalized zero-shot semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9556–9566 (2021) 6. Ding, J., Xue, N., Xia, G.S., Dai, D.: Decoupling zero-shot semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11583–11592 (2022)

170

J. Guo et al.

7. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2012 (VOC 2012) Results (2012). http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html 8. Gu, X., Lin, T.Y., Kuo, W., Cui, Y.: Open-vocabulary object detection via vision and language knowledge distillation. arXiv preprint arXiv:2104.13921 (2021) 9. Gu, Z., Zhou, S., Niu, L., Zhao, Z., Zhang, L.: Context-aware feature generation for zero-shot semantic segmentation. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1921–1929 (2020) 10. Jia, C., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: International Conference on Machine Learning, pp. 4904–4916. PMLR (2021) 11. Li, J., Li, D., Savarese, S., Hoi, S.: BLIP-2: bootstrapping language-image pretraining with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023) 12. Li, L.H., Yatskar, M., Yin, D., Hsieh, C.J., Chang, K.W.: VisualBERT: a simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557 (2019) 13. Li, X.L., Liang, P.: Prefix-tuning: optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190 (2021) 14. Liu, X., et al.: Open-Edit: open-domain image manipulation with open-vocabulary instructions. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part XI 16, vol. 12356, pp. 89–106. Springer, Cham (2020). https:// doi.org/10.1007/978-3-030-58621-8 6 15. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008) 16. Mottaghi, R., et al.: The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898 (2014) 17. Pastore, G., Cermelli, F., Xian, Y., Mancini, M., Akata, Z., Caputo, B.: A closer look at self-training for zero-label semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2693– 2702 (2021) 18. Qian, R., Li, Y., Xu, Z., Yang, M.H., Belongie, S., Cui, Y.: Multimodal openvocabulary video classification via pre-trained vision and language models. arXiv preprint arXiv:2207.07646 (2022) 19. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021) 20. Ramesh, A., et al.: Zero-shot text-to-image generation. In: International Conference on Machine Learning, pp. 8821–8831. PMLR (2021) 21. Xian, Y., Choudhury, S., He, Y., Schiele, B., Akata, Z.: Semantic projection network for zero-and few-label semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8256– 8265 (2019) 22. Yu, F., Liu, K., Zhang, Y., Zhu, C., Xu, K.: PartNet: a recursive part decomposition network for fine-grained and hierarchical shape segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9491–9500 (2019)

MVP-SEG

171

23. Yu, L., Qian, Y., Liu, W., Hauptmann, A.G.: Argus++: robust real-time activity detection for unconstrained video streams with overlapping cube proposals. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 112–121 (2022) 24. Yue, Z., Wang, T., Sun, Q., Hua, X.S., Zhang, H.: Counterfactual zero-shot and open-set visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15404–15414 (2021) 25. Zhou, C., Loy, C.C., Dai, B.: Extract free dense labels from clip. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022, Proceedings, Part XXVIII, vol. 13688, pp. 696–712. Springer, Cham (2022). https://doi. org/10.1007/978-3-031-19815-1 40 26. Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Learning to prompt for vision-language models. Int. J. Comput. Vision 130(9), 2337–2348 (2022)

Context-FPN and Memory Contrastive Learning for Partially Supervised Instance Segmentation

Zheng Yuan, Weiling Cai(B), and Chen Zhao

Nanjing Normal University, Nanjing 210097, China
[email protected]

Abstract. Partially supervised instance segmentation aims to segment objects of both limited seen categories and novel unseen categories (without annotated masks), thereby eliminating the expensive demand for mask annotation of new categories. Existing works mainly follow the detect-then-segment pipeline and explore how to provide more discriminative regions of interest for the class-agnostic mask head, but these methods do not perform well when faced with complex scenes. In this work, we propose a novel method, named CCMask, that combines a Context Feature Pyramid Network (Context-FPN) and a Memory Contrastive Learning Head (MCL Head) to achieve effective class-agnostic mask segmentation. Specifically, we introduce Context-FPN to obtain context-rich feature maps via a context extraction module, which benefits the subsequent task heads. In the MCL Head, we employ foreground/background query memory queues to store queries from recent training batches, which helps the MCL Head learn the general concepts of foreground and background. These strategies collectively contribute to improving the discrimination between foreground and background. Exhaustive experiments on the COCO dataset demonstrate that our method achieves state-of-the-art results.

Keywords: Partially supervised instance segmentation · Contrastive learning · Feature Pyramid Network

1 Introduction

Instance segmentation is a fundamental task in computer vision, with wide-ranging applications in various domains such as human pose estimation, autonomous driving, and remote sensing image analysis. Current methods have achieved impressive results in this task by relying on abundant pixel-level annotated data. However, the high cost of mask annotation for new categories has confined instance segmentation to a narrow range of classes, hindering its further development. In contrast, bounding box annotation is easier to obtain and less expensive. As a result, recent research has proposed the task of partially supervised instance segmentation to address this issue. In this task, all object categories are divided into two sets: seen categories and unseen categories. Seen categories have instance mask annotations, while unseen categories only have bounding box annotations. The goal of this task is to take advantage of the partial supervision to improve the mask prediction performance of the segmentation model on unseen categories.

Fig. 1. Visualization results of OPMask [1], ContrastMask [18] and our CCMask when an image contains multiple overlapping instances. The results show that our CCMask performs more precise segmentation on both seen and unseen objects.

As the pioneering approach for this task, MaskX R-CNN [10] designs a weight transfer function that transfers the parameters of each object category from the bounding box head to the partially supervised mask head. However, current mainstream methods utilize a class-agnostic mask head to separate foreground and background within each region of interest (RoI), instead of predicting a mask per category. Therefore, the key to addressing this problem is to provide the class-agnostic mask head with RoIs that have more distinctive foreground and background. ContrastMask [18] introduces an extra contrastive learning head to tackle this issue. In ContrastMask, queries and keys are first generated from foreground/background regions by using the annotated masks of seen categories and pseudo masks of unseen categories as region priors. Specifically, in each mini-batch, it obtains a foreground/background query by averaging the features of all foreground/background regions, and meanwhile generates keys by randomly sampling from these regions. Then, it applies a contrastive loss to pull the query and keys from the foreground together and push them apart from the background keys, and vice versa. Compared to other methods, ContrastMask achieves competitive segmentation performance by fully leveraging the information from both seen and unseen categories. However, as shown in Fig. 1, when an image contains multiple overlapping instances, ContrastMask often gets confused: 1) for large instances, it habitually produces defective masks, particularly for unseen categories; 2) it has difficulty in distinguishing foreground from background as well as differentiating between different instances. The reasons are that the RoIs lack contextual information from larger receptive fields, and that the queries in contrastive learning hardly represent the characteristics of the foreground/background because they all come from the current batch.

To address these problems, in CCMask, we design a Context Feature Pyramid Network (Context-FPN) and a Memory Contrastive Learning Head to collaboratively improve the learning capability of the class-agnostic mask segmentation model. In Context-FPN, we capture rich contextual information from different large receptive fields by using a module consisting of multi-path dilated convolutional layers with different dilation rates, named the context extraction module (CEM). Then, we employ Content-Aware ReAssembly of FEatures (CARAFE) [16] instead of traditional upsampling operators to reduce information loss during the top-down pathway of the FPN [12]. By employing Context-FPN, we aggregate more discriminative features, facilitating the learning of the following task heads. In the MCL Head, we maintain two query memory queues (QMQ) to store queries from the foreground and the background regions: the queries from the latest batch are enqueued, and the oldest are dequeued. The size of the queue is independent of the batch size, allowing it to be large. This enhances the diversity and representativeness of the queries and enables our class-agnostic segmentation model to better learn the general concepts of foreground and background. We perform comprehensive experiments to evaluate the effectiveness of our CCMask, and the results demonstrate that our approach achieves state-of-the-art performance on the COCO dataset [13].

In summary, the main contributions of our CCMask are the following:

– We design a context feature pyramid network (Context-FPN), which captures abundant contextual information from various large receptive fields and compensates for the intrinsic defect in the feature upsampling of FPN. It produces more reliable feature representations for the MCL Head and significantly improves the segmentation performance of our CCMask on large foreground objects.
– We maintain two query memory queues (QMQ) in our memory contrastive learning head (MCL Head), which store the foreground/background queries from recent batches. This enhances the ability of the MCL Head to separate foreground from background.
– CCMask significantly surpasses all previous SOTA partially supervised instance segmentation methods. Concretely, our method achieves 39.5 mAP for mask segmentation in the nonvoc → voc setting with ResNet-50 [8] as the backbone. Furthermore, we conduct extensive ablation experiments to analyze the impact of each of our contributions, and the results prove that each of them is effective.

2 Related Work

Contrastive Learning. By introducing contrastive learning, various computer vision tasks such as image generation, image classification, semantic segmentation, and instance segmentation can be improved. Park et al. [15] propose patch-wise contrastive learning to maximize the mutual information between the same locations of the input and generated images. By incorporating both pixel-pixel and pixel-region contrastive calculations, Wang et al. [17] fully exploit the semantic similarity between annotated pixels in the task of semantic segmentation. Yang et al. [20] propose a contrastive learning-based few-shot framework that integrates contrastive learning into both the pretraining and meta-training stages to improve few-shot image classification. C2AM [19] proposes cross-image foreground-background contrastive learning for class-agnostic activation map generation using unlabeled image data.

Feature Pyramid. The feature pyramid is a method used to deal with multi-scale problems and is widely applied in object detection, instance segmentation and other fields. FPN [12] builds a feature pyramid through lateral connections and a top-down pathway. PAFPN in PANet [14] improves its performance by adding an additional bottom-up pathway to FPN. NAS-FPN [5] uses neural architecture search to discover a new and powerful feature pyramid structure. Based on FPN, FaPN [11] incorporates a feature alignment module and a feature selection module to generate multi-scale features for dense image prediction. A2-FPN [9] improves multi-scale feature learning through attention-guided feature aggregation. Our Context-FPN aims to obtain contextual information from larger receptive fields.

Partially Supervised Instance Segmentation. Instance segmentation is one of the research hotspots in computer vision, aiming to segment all object instances in an image. Mask R-CNN [7] extends Faster R-CNN [6] with a fully convolutional branch for generating segmentation masks. Due to its excellent performance and extensibility, Mask R-CNN has become an important benchmark model. Because instance segmentation requires extensive mask annotations, the partially supervised setting has been proposed, in which seen categories have both box and mask annotations while unseen categories only have box supervision. MaskX R-CNN [10] addresses this problem by learning a weight transfer function that maps bounding box weights to mask weights. ShapeProp [22] learns salient regions from the bounding box head and propagates them into an intermediate shape representation, which provides a more accurate shape prior. CPMask [4] captures the underlying commonalities, including shape commonalities from seen categories and appearance commonalities from all categories, and generalizes them to unseen categories. OPMask [1] uses an object mask prior from the bounding box head to help the class-agnostic mask head focus on the foreground in each RoI. ContrastMask [18] achieves promising performance by integrating a contrastive learning module that learns on both seen and unseen categories.

3 CCMask

In this section, we first introduce the overall framework of CCMask. Then, we elaborate the design details of the context feature pyramid network and the memory contrastive learning head. Finally, we explain the loss function of our segmentation model.

Fig. 2. The whole architecture of CCMask.

3.1 Overview

As illustrated in Fig. 2, our CCMask is built upon the architecture of Mask R-CNN [7]: it replaces the original FPN [12] with Context-FPN and adds an extra MCL Head. Context-FPN provides more discriminative features for the subsequent heads. The MCL Head takes an RoI feature map as input and outputs an enhanced feature map. Finally, we mix the RoI feature map, the Class Activation Map (CAM) [21] and the enhanced feature map from the MCL Head as the input of the class-agnostic mask head, which helps the mask head predict the segmentation map. Next, we depict each component of our model.

3.2 Context-FPN

FPN [12] is a classic framework designed to learn multi-scale feature representations. However, intrinsic defects in feature extraction and fusion inhibit FPN from aggregating more discriminative features. To tackle these limitations, our Context-FPN introduces an additional context extraction module to extract more contextual information and employs CARAFE [16] as the upsampling operator to mitigate the information loss caused by traditional upsampling during the feature fusion stage. These two approaches effectively address the shortcomings of FPN and enable it to generate more discriminative feature maps. The framework of our Context-FPN is illustrated in Fig. 3.

Fig. 3. The structure of our Context-FPN.

Context Extraction Module. The CEM simply contains several convolutional layers, and its input is the output of the last layer in the bottom-up pathway of FPN. Specifically, as shown in Fig. 3, the CEM consists of multi-path dilated convolutional layers with different dilation rates: 3, 6, 12, 18, and 24 in our study. These convolutional layers capture contextual information over various large receptive fields. Moreover, in order to enhance the perception of foreground object boundaries, these dilated convolution layers also incorporate deformable convolutions. In addition, to effectively fuse multi-scale information, we employ dense connections within the CEM, where the output of each dilated convolutional layer is concatenated with the input feature maps and fed into the next dilated convolutional layer. A sketch of this densely connected multi-dilation design is given at the end of this subsection.

Content-Aware ReAssembly of FEatures. CARAFE is an effective upsampling operator. At each location, CARAFE first predicts multiple reassembly kernels in a content-aware manner, and then reassembles the features inside a predefined nearby region via a weighted combination. Feature upsampling is then accomplished by rearranging the generated features as a spatial block. Compared to traditional bilinear interpolation, CARAFE can aggregate more contextual information within a large receptive field. CARAFE also outperforms deconvolution, as it dynamically generates reassembly kernels based on local content rather than applying the same kernel across the entire image.
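The following PyTorch sketch illustrates a densely connected multi-dilation module in the spirit of the CEM described above. The channel sizes and class name are assumptions, and the deformable convolutions used in the paper are omitted for brevity.

```python
import torch
import torch.nn as nn

class DenseDilatedContext(nn.Module):
    """Multi-path dilated 3x3 convolutions (rates 3, 6, 12, 18, 24) with dense
    connections: each path sees the input concatenated with all previous outputs."""
    def __init__(self, in_channels, mid_channels=256, rates=(3, 6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList()
        channels = in_channels
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, mid_channels, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(mid_channels),
                nn.ReLU(inplace=True),
            ))
            channels += mid_channels              # dense connection grows the input width
        self.project = nn.Conv2d(channels, in_channels, 1)   # fuse back to in_channels

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return self.project(torch.cat(feats, dim=1))

# e.g. ctx = DenseDilatedContext(2048)(c5)  # applied to the last bottom-up feature map
```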

3.3 Memory Contrastive Learning Head

The inclusion of contrastive learning in the segmentation model aims to better separate foreground from background. However, in previous methods, the number of queries used for contrastive learning was limited by the batch size, and due to hardware constraints the number of queries per training batch is small. To address this issue, we propose the Memory Contrastive Learning Head, which can better learn the general concepts of foreground and background by reusing the queries from the immediately preceding batches. As illustrated in Fig. 4, compared to the CL Head of ContrastMask, our MCL Head only adds two additional queues to store foreground queries and background queries, respectively. Nevertheless, the experimental results demonstrate that this significantly improves the segmentation performance of the model.

Fig. 4. The flowchart of our MCL Head, which includes an additional QMQ compared to the CL Head of ContrastMask.

Query Memory Queue. The introduction of the QMQ decouples the number of queries from the batch size. The queue size can be much larger than the batch size and can be flexibly set as a hyper-parameter. The queries in the queue are continuously updated: queries from the current mini-batch are enqueued and those from the oldest mini-batch are dequeued. Removing the queries from the oldest mini-batch is also beneficial, as they are the most outdated.

Memorial Queries Pixel-Region Contrastive Loss. The core design philosophy of our memorial queries pixel-region contrastive loss is to involve the queries from both the QMQ and the current batch in contrastive learning, thereby facilitating the learning of the MCL Head. In each mini-batch, we utilize the ground-truth masks of seen categories and the CAM [21] of unseen categories as region priors to indicate the foreground and background separation. We obtain a foreground query, denoted q^+, by averaging the features of all foreground regions. Similarly, we obtain q^- for the background regions. Then, we perform random sampling within these regions to obtain a set of foreground keys and a set of background keys, denoted K^+ and K^-. We put the queries from the current batch into the QMQ to replace the oldest queries. The length of the QMQ is N, and the number of q^+/q^- in a batch is n. The memorial queries pixel-region contrastive loss consists of two symmetrical formulations for foreground and background, and is defined as follows:

L_{con} = L^{q^+}_{K^+,K^-} + L^{q^-}_{K^-,K^+}   (1)

L^{q^+}_{K^+,K^-} = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{|K^+|}\sum_{k^+\in K^+}\log\frac{\exp(\mathrm{sim}(q_i^+,k^+)/\tau)}{\exp(\mathrm{sim}(q_i^+,k^+)/\tau)+\sum_{k^-\in K^-}\exp(\mathrm{sim}(q_i^+,k^-)/\tau)}   (2)

We describe the foreground loss in detail in Eq. (2), where sim(·, ·) denotes the cosine similarity and τ is a temperature hyper-parameter. Similarly, we can obtain the loss function for the background, L^{q^-}_{K^-,K^+}.
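To make the queue-augmented loss concrete, here is a hedged PyTorch sketch of a FIFO query memory and the InfoNCE-style term of Eq. (2). The class and function names, the default queue length and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

class QueryMemoryQueue:
    """FIFO queue of past foreground (or background) queries; its length N is
    independent of the batch size (120 in the paper's experiments)."""
    def __init__(self, dim, length=120):
        self.queue = torch.zeros(0, dim)
        self.length = length

    def update(self, new_queries):                 # (n, dim) queries from the current batch
        self.queue = torch.cat([new_queries.detach(), self.queue], dim=0)[: self.length]

def pixel_region_contrast(queries, pos_keys, neg_keys, tau=0.07):
    """Eq. (2): InfoNCE over all N memorized queries, positive keys K+ and
    negative keys K-, with cosine similarity as sim(.,.)."""
    q = F.normalize(queries, dim=-1)               # (N, D)
    kp = F.normalize(pos_keys, dim=-1)             # (|K+|, D)
    kn = F.normalize(neg_keys, dim=-1)             # (|K-|, D)
    pos = torch.exp(q @ kp.t() / tau)              # (N, |K+|)
    neg = torch.exp(q @ kn.t() / tau).sum(dim=1, keepdim=True)   # (N, 1)
    return -torch.log(pos / (pos + neg)).mean()    # averages over i and k+

# L_con (Eq. (1)) = pixel_region_contrast(fg_queue, fg_keys, bg_keys)
#                 + pixel_region_contrast(bg_queue, bg_keys, fg_keys)
```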

3.4 Loss Function

The overall loss function of our CCMask is as follows:

L = L_{box} + L_{mask} + \lambda L_{con}   (3)

where the box detection loss L_{box} and the mask loss L_{mask} are inherited from Mask R-CNN, the contrastive loss L_{con} is introduced in Eq. (1), and λ is a weighting parameter.

4 Experiments

4.1 Experimental Setup

We conduct our experiments on the COCO dataset [13]. To train under the partially supervised setting, we split the 80 COCO classes into voc and nonvoc category subsets, where the “voc” categories are the 20 classes of the PASCAL VOC dataset [3] and “nonvoc” includes the remaining 60 classes. Each time, we select one subset as the seen categories and the other as the unseen categories. We train our model on COCO-train2017 and test on COCO-val2017. We adopt the typical instance segmentation metrics to evaluate our model, including mAP, AP50, AP75, APS, APM and APL. These metrics are calculated on the unseen categories.


Implementation Details. We implement all experiments based on MMDetection [2]. All experiments are conducted with a batch size of 8 over 4 RTX 3090 GPUs for 12 epochs. We use ResNet [8] as the backbone. The size of the QMQ is 120. We linearly warm up the weight λ of L_{con} (Eq. (3)) from 0.25 to 1.0. Besides, the hyper-parameters of CARAFE [16] follow the default setting.
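A minimal sketch of the linear warm-up of λ mentioned above; the per-step granularity of the schedule is an assumption.

```python
def contrastive_weight(step: int, total_steps: int, start: float = 0.25, end: float = 1.0) -> float:
    """Linearly warm up the weight lambda on L_con from 0.25 to 1.0 over training."""
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * t
```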

4.2 Experimental Results

In this subsection, we compare our CCMask to other recent methods for partially supervised instance segmentation.

Quantitative Results. The quantitative results under the partially supervised setting are shown in Table 1. When using ResNet-50 as the backbone, our CCMask achieves new state-of-the-art performance of 39.5/34.7 mAP in the nonvoc → voc and voc → nonvoc settings, and it even outperforms previous models that use ResNet-101 [8] as the backbone. Our method also achieves excellent performance with ResNet-101 as the backbone, outperforming OPMask [1] (the previous state-of-the-art method) in both the nonvoc → voc and voc → nonvoc settings.

Qualitative Results. To further demonstrate the contributions of the two improvement strategies, we visualize the segmentation results of our model in various configurations: with and without Context-FPN, and with and without the MCL Head. As shown in Fig. 5, the results indicate that our two strategies improve the segmentation performance of the model on both seen and unseen classes from different aspects.

4.3 Ablation Study

We conduct ablation experiments to verify the effects of the main components of CCMask. The backbone network is ResNet-50. All experiments are conducted in the nonvoc → voc setting, and the results are evaluated on the unseen categories.

Effectiveness of Context-FPN. The results are shown in Table 2. By using the CEM to capture contextual information from different large receptive fields, the performance increases by 1.9 mAP compared with “CCMask w/o CEM”. In addition, we perform another experiment, “CCMask w/o CARAFE”, and the results also demonstrate the effectiveness of CARAFE.

Effectiveness of MCL Head. We next validate the design of our MCL Head. As shown in Table 2, the performance of CCMask significantly degrades by 5.2 mAP when the MCL Head is removed. We then evaluate a variant, “CCMask w/o QMQ”, which only uses queries from the current batch; it obtains 37.0 mAP. This evidences the effectiveness of our MCL Head and the necessity of leveraging more queries during contrastive learning.


Table 1. Quantitative comparisons on the COCO dataset. “nonvoc → voc” means that the nonvoc categories are seen while the voc categories are unseen, and vice versa for “voc → nonvoc”.

Backbone: ResNet-50 [8]
Method             | nonvoc → voc: mAP  AP50  AP75  APS  APM  APL | voc → nonvoc: mAP  AP50  AP75  APS  APM  APL
Mask R-CNN [7]     | 23.9  42.9  23.5  11.6  24.3  33.7           | 19.2  36.4  18.4  11.5  23.3  24.4
MaskX R-CNN [10]   | 28.9  52.2  28.6  12.1  29.0  40.6           | 23.7  43.1  23.5  12.4  27.6  32.9
CPMask [4]         |  –     –     –     –     –     –             | 28.8  46.1  30.6  12.4  33.1  43.4
ShapeProp [22]     | 34.4  59.6  35.2  13.5  32.9  48.6           | 30.4  51.2  31.8  14.3  34.2  44.7
OPMask [1]         | 36.5  62.5  37.4  17.3  34.8  49.8           | 31.9  52.2  33.7  16.3  35.2  46.5
ContrastMask [18]  | 35.1  60.8  35.7  17.2  34.7  47.7           | 30.9  50.3  32.9  15.2  34.6  44.3
CCMask             | 39.5  64.0  41.7  20.3  39.3  51.7           | 34.7  55.7  36.8  18.3  38.7  50.3

Backbone: ResNet-101 [8]
Method             | nonvoc → voc: mAP  AP50  AP75  APS  APM  APL | voc → nonvoc: mAP  AP50  AP75  APS  APM  APL
Mask R-CNN [7]     | 24.7  43.5  24.9  11.4  25.7  35.1           | 18.5  34.8  18.1  11.3  23.4  21.7
MaskX R-CNN [10]   | 29.5  52.4  29.7  13.4  30.2  41.0           | 23.8  42.9  23.5  12.7  28.1  33.5
CPMask [4]         | 36.8  60.5  38.6  17.6  37.1  51.5           | 34.0  53.7  36.5  18.5  38.9  47.4
ShapeProp [22]     | 35.5  60.5  36.7  15.6  33.8  50.3           | 31.9  52.1  33.7  14.2  35.9  46.5
OPMask [1]         | 37.1  62.5  38.4  16.9  36.0  50.5           | 33.2  53.5  35.2  17.2  37.1  46.9
ContrastMask [18]  | 36.6  62.2  37.7  17.5  36.5  50.1           | 32.4  52.1  34.8  15.2  36.7  47.3
CCMask             | 41.5  66.4  43.6  22.1  41.8  54.2           | 36.4  57.0  38.3  19.8  41.0  52.6


Fig. 5. Qualitative results on the COCO dataset in the nonvoc → voc setting. The results show that the MCL Head can improve the model's ability to distinguish foreground and background, as well as different instances, and Context-FPN can improve the model's segmentation performance on large objects.

Table 2. Ablation on the impact of each component.

Method                | nonvoc → voc: mAP  AP50  AP75  APS  APM  APL
CCMask                | 39.5  64.0  41.7  20.3  39.5  51.7
CCMask w/o CEM        | 37.6  61.6  39.5  19.4  37.8  49.4
CCMask w/o CARAFE     | 39.2  63.8  41.4  18.9  39.1  51.4
CCMask w/o MCL Head   | 34.3  60.7  33.9  17.3  33.7  47.1
CCMask w/o QMQ        | 37.0  60.9  38.6  19.1  36.4  48.7

5 Conclusion

We propose a novel approach, CCMask, for partially supervised instance segmentation. CCMask first utilizes Context-FPN to obtain feature maps with rich contextual information from different receptive fields. Then, it employs the MCL Head to enhance the discrimination between foreground and background within RoIs. Benefiting from these two effective strategies, our model achieves state-of-the-art results on the COCO dataset under the partially supervised setting.

References 1. Biertimpel, D., Shkodrani, S., Baslamisli, A.S., Baka, N.: Prior to segment: foreground cues for weakly annotated classes in partially supervised instance segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2824–2833 (2021) 2. Chen, K., Wang, J., Pang, J., et al.: MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019) 3. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88, 303–338 (2010) 4. Fan, Q., Ke, L., Pei, W., Tang, C.-K., Tai, Y.-W.: Commonality-parsing network across shape and appearance for partially supervised instance segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part VIII 16. LNCS, vol. 12353, pp. 379–396. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58598-3 23 5. Ghiasi, G., Lin, T.Y., Le, Q.V.: NAS-FPN: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7036–7045 (2019) 6. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) 7. He, K., Gkioxari, G., Doll´ ar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017) 8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 9. Hu, M., Li, Y., Fang, L., Wang, S.: A2-FPN: attention aggregation based feature pyramid network for instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15343–15352 (2021) 10. Hu, R., Doll´ ar, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4233–4241 (2018) 11. Huang, S., Lu, Z., Cheng, R., He, C.: FAPN: feature-aligned pyramid network for dense image prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 864–873 (2021) 12. Lin, T.Y., Doll´ ar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017) 13. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014, Proceedings, Part V 13, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3319-10602-1 48 14. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)


15. Park, T., Efros, A.A., Zhang, R., Zhu, J.Y.: Contrastive learning for unpaired image-to-image translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23– 28 August 2020, Proceedings, Part IX 16. LNCS, vol. 12354, pp. 319–345. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7 19 16. Wang, J., Chen, K., Xu, R., Liu, Z., Loy, C.C., Lin, D.: CARAFE: content-aware reassembly of features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3007–3016 (2019) 17. Wang, W., Zhou, T., Yu, F., Dai, J., Konukoglu, E., Van Gool, L.: Exploring crossimage pixel contrast for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7303–7313 (2021) 18. Wang, X., Zhao, K., Zhang, R., Ding, S., Wang, Y., Shen, W.: ContrastMask: contrastive learning to segment every thing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11604–11613 (2022) 19. Xie, J., Xiang, J., Chen, J., Hou, X., Zhao, X., Shen, L.: Contrastive learning of class-agnostic activation map for weakly supervised object localization and semantic segmentation. arXiv preprint arXiv:2203.13505 (2022) 20. Yang, Z., Wang, J., Zhu, Y.: Few-shot classification with contrastive learning. In: Computer Vision - ECCV 2022–17th European Conference, Tel Aviv, Israel, 23– 27 October 2022, Proceedings, Part XX. LNCS, vol. 13680, pp. 293–309. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20044-1 17 21. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016) 22. Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10307–10316 (2020)

A Dynamic Tracking Framework Based on Scene Perception

Jinpu Zhang, Ziwen Li, and Yuehuan Wang(B)

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
[email protected]

Abstract. While recent large models have greatly improved tracking performance, not all scenes require a large and complex network. Dynamic networks can adapt the architecture to different inputs, leading to notable accuracy and computational efficiency. However, existing dynamic architectures and decision mechanisms designed for classification are not applicable to the tracking task. This paper proposes a dynamic tracking framework based on scene perception, named DynamicTrack. We classify tracking scenes into easy and hard categories, and propose a dynamic architecture with an easy-hard dual-branch to handle different scenes respectively. Unlike previous works in classification that selectively prune a subset of the backbone, complete execution of the entire backbone is necessary for tracking. Hence, we maintain two complete transformer backbones for the dual branches and vary the number of input tokens to achieve modeling at different granularities. Then, we propose a scene router that automatically selects the optimal branch for each input frame. The router directly assesses the scene complexity of features extracted by the easy branch for decision-making, without relying on the tracking head output. This enhances decision efficiency during dynamic inference. Moreover, we introduce two techniques that benefit DynamicTrack optimization, namely, the Gumbel-Softmax trick and cross-branch transmission (CBT). The former increases the stochasticity of decisions and prevents mode collapse into trivial solutions. The latter establishes information transmission between the two branches, facilitating discriminative power and learning efficiency. Extensive experiments on four benchmarks demonstrate that the proposed DynamicTrack achieves SOTA performance and accuracy-speed trade-offs.

Keywords: Object tracking · Dynamic network · Scene router

1 Introduction

Visual object tracking is a fundamental task in computer vision. It aims to estimate the position and shape of a given target in a video sequence. The continuous and arbitrary changes in targets and scenes pose challenges to learning an effective and efficient tracking model. Current mainstream trackers, such as Siamese-based methods [1,9,16,34] and transformer-based methods [4,6,29,30], have achieved remarkable success. However, these trackers use a fixed feed-forward structure, i.e., a static network, to generalize to all scenes. A static network requires the same computational cost for all inputs during inference, potentially leading to redundancy in processing easy scenes and insufficiency in hard scenes. This static inference paradigm limits the model's representation power, efficiency, and interpretability.

Biological vision research [14,22] suggests that humans can rapidly locate targets in the left “easy” images, while needing more time to search for targets in the right “hard” images, as shown in Fig. 1. Easy images usually refer to scenes with singular composition, simple texture and strong target-to-background contrast. Hard images have the opposite characteristics and may be accompanied by distractors. The human brain processes images of different complexities at varied speeds. This property motivates a dynamic tracking network, which adaptively selects the inference architecture based on scene complexity.

Fig. 1. Examples of “easy” and “hard” images for the object tracking task.

Dynamic networks have been extensively studied in the classification task. A common paradigm is to automatically select a subset of a multi-stage backbone using early exiting [23] or layer skipping [24,27] mechanisms. However, the tracking task depends on multi-stage features to model targets of arbitrary categories and scales, and discarding a portion of the features would lead to a significant performance degradation [5,16]. Thus, it is necessary to completely execute the entire backbone, and existing dynamic architectures are not applicable to the tracking task. Furthermore, dynamic classification networks usually add a head network, typically a fully connected classifier, at intermediate layers for adaptive decision-making [12,23], whereas a tracking head includes matching, classification and regression subnetworks, so incorporating a tracking head for decisions brings large computational redundancy.

To address the above issues, we propose a dynamic tracking framework based on scene perception, named DynamicTrack. First, we design a dynamic architecture with an easy-hard dual-branch to handle different scenes respectively. The dual branch contains two complete transformer trackers with the same structure but different numbers of input tokens. Representing the image with fewer tokens is sufficient for many easy scenes and enjoys high computational efficiency, while increasing the number of input tokens enables a more fine-grained representation, which can adapt to hard scenes but incurs a higher computational burden. To balance accuracy and efficiency, we design a scene router that adaptively selects the optimal inference route for each input frame. The router directly assesses the scene complexity of the features extracted by the easy branch to determine whether to continue with the easy branch or switch to the hard branch. This process avoids using the tracking head and significantly improves decision efficiency during dynamic inference. Finally, we introduce two techniques advantageous to DynamicTrack optimization. 1) We employ the Gumbel-Softmax trick [15] to increase the stochasticity of decisions, which prevents mode collapse into trivial solutions, i.e., always selecting the more accurate hard branch. 2) We design a cross-branch transmission (CBT) module to transmit the large receptive field context from the easy branch to the hard branch, facilitating the discriminative power and learning efficiency of the hard branch.

The main contributions of this work are as follows:

– We propose a dynamic tracking framework based on scene perception, named DynamicTrack. It customizes an easy-hard dual-branch network to handle different scenes respectively.
– We propose a scene router to perceive the complexity of tracking scenes and achieve adaptive decision-making. Moreover, we introduce the Gumbel-Softmax trick and a CBT module to facilitate optimization.
– Extensive experiments on four benchmarks prove SOTA performance and accuracy-speed trade-offs of our method.

Fig. 2. A comparison of AO and speed of SOTA trackers on the GOT-10k test set. Our DynamicTrack achieves SOTA performance and accuracy-speed trade-offs (74% AO, 77 FPS).

2

Related Work

Visual Object Tracking. In the early development of VOT, DCF-based methods [2,7,11] are dominant trackers due to their favorable ability in modeling target appearance variation. With the development of deep learning, SiamFC [1]

188

J. Zhang et al.

formulates tracking as a similarity matching problem by training on large-scale image pairs. Later, numerous improvements have been made, including backbone design [16,33], scale regression [9,16,34] and online update mechanism [10,31]. Recently, transformer is introduced to VOT. Some works [4,26,29] embed transformers into the two-stream Siamese pipeline, and others [3,6,30] adopt a onestream pipeline by the attention mechanism. The above works all adopt the static inference paradigm, which limits the model’s representation power, efficiency, and interpretability. In contrast, the proposed DynamicTrack customizes the dynamic inference architecture for easy and hard scenes respectively, yielding a unified and efficient dynamic tracker. Dynamic Network Dynamic networks can adapt structures or parameters to the input during inference. Branchynet [23] introduces the early exiting strategy, allowing easy samples to be output at shallow exits without executing deeper layers. MSDNet [12] adopts a multi-scale architecture and dense connections to improve the joint optimization of multiple classifiers. SkipNet [27] and ConvAIG [24] propose a layer skipping strategy. The network depth can be adapted on the fly by skipping the calculation of intermediate layers. DVT [28] cascades multiple transformers with increasing numbers of tokens and activates them sequentially to achieve dynamic inference. However, these methods are all designed specifically for the classification task and are not applicable to object tracking. DynamicDet [18] proposes a dynamic architecture for object detection, including two identical detectors and a router. Our method shares a similar spirit. Differently, we customize two trackers with different number of input tokens, motivated by the redundancy of existing models in numerous easy scenes. Moreover, our decision mechanism and optimization strategies are more suitable for the tracking task and can flexibly generalize multiple tracking benchmarks.

3

Method

This section presents the proposed DynamicTrack. As shown in Fig. 3(a), DynamicTrack consists of an easy-hard dual-branch network and a scene router. The easy-hard dual-branch network cascades two VIT-based trackers OSTrack [30] with increasing number of tokens of the template and search region. The template and search image pairs are first split into a small number of tokens and fed into the easy branch encoder. Then the router determines the inference route based on the output search region tokens. If the scene is classified as “Easy”, the router selects the tracker head of the easy branch for prediction. Otherwise the router switches to the hard branch, where the original image pairs are split into more tokens and processed by the encoder with CBT and the corresponding tracker head. CBT transmits the priors of the easy branch to the hard branch to facilitate learning. In the following section, we depict the proposed easy-hard dual-branch network and the scene router in detail.

A Dynamic Tracking Framework Based on Scene Perception

3.1

189

Easy-Hard Dual-Branch Network

Token Representation. The input is a pair of template z ∈ R3×Hz ×Wz and search region x ∈ R3×Hx ×Wx . They are first split and flattened into patch 2 2 sequences zp ∈ RNz ×(3·P ) and xp ∈ RNx ×(3·P ) , where P is the patch size 2 2 and Nz = Hz Wz /P , Nx = Hx Wx /P are the number of patches. A linear projection maps zp and xp to token embeddings. Learnable position embeddings are then added to the token embeddings. Finally, the template tokens Z and search tokens X are concatenated as H 0 ∈ RN ×D , where N = Nz + Nx is the number of tokens and D is the token dimension. The concatenated H 0 are fed into the encoder with L layers.

Fig. 3. (a) An Overview of our DynamicTrack. The template and search image pairs are first split into a small number of tokens and fed into the easy branch encoder. Then the router determines the inference route based on the output search region tokens. Easy images are directly output through the tracker head, while hard images are split into more tokens and enter the hard branch. (b) The structure of the encoder with CBT in the hard branch.

For the easy branch, we set the patch size Peasy = 32 and denote the input tokens as H 0easy ∈ RNeasy ×D . For the hard branch, we set the patch size Phard = 16 and denote the input tokens as H 0hard ∈ RNhard ×D , Nhard = 4·Neasy . Since the computational cost grows quadratically with respect to the token number, the easy branch is highly efficient in processing easy images. While the hard branch represents the input as more tokens, achieving a more fine-grained representation of hard images. Encoder with Cross-Branch Transmission. Since the two branches have the same structure and training objective, the hard branch can use the tokens and relations in the easy branch as priors to improve learning efficiency. Moreover, the easy branch has a larger receptive field, which can provide global context to facilitate discriminative power. Thus we propose the cross-branch transmission (CBT) module to transmit these priors from easy branch to hard branch.

190

J. Zhang et al.

The structure of encoder with CBT is shown in Fig. 3(b). For token transmission, we leverage the tokens H L easy output by the final encoder layer in the easy branch as it has the most discriminative power. We use a MLP and upsampling l to align H L easy with the input tokens H hard in each layer of the hard branch, then add them by element, H lf use = fl (H leasy ) + H lhard ,

l ∈ {1, ..., L − 1}

(1)

where fl : RNeasy ×D → RNhard ×D denotes the MLP and upsampling operations. Here we upsample the tokens of the template and the search region separately. Given the fused tokens H lf use , query Ql , key K l and value V l are generated to calculate the attention. The multi-level attention maps A1easy , ..., AL easy in the easy branch are also transmitted to the hard branch, which can provide shallow and deep relation priors. Specifically, we first concatenate them and obtain the auxiliary attention map Aaux using MLP and upsampling. Then the attention module utilizes both its own tokens and Aaux simultaneously, Attention(H lf use ) = Sof tmax(Alf use + rl (Aaux ))V l , √ Alf use = Ql (K l ) / D, Aaux = Concat(A1easy , ..., AL easy )

(2)

where Alf use ∈ RNhard ×Nhard , Aaux ∈ RL×Neasy ×Neasy and rl : RL×Neasy ×Neasy → RNhard ×Nhard denotes the MLP and upsampling operations. Note that we split Aaux into [Azz , Azx , Axz , Axx ] for upsampling separately, where Azx is the relation between the template and the search region and the rest are similar. This way prevents different kinds of relations affecting each other during upsampling. Finally, a feed-forward network (FFN) are followed to obtain the output tokens H l+1 hard for the next layer. 3.2

Scene Router

Not all tracking scenes require a fine-grained hard branch, and the inference architecture should depend on the input image. Therefore, we design an adaptive router based on scene perception, that is, a simple yet effective decision-maker for the dynamic tracker. As shown in Fig. 3(a), the scene router receives the search region tokens X L easy of the final encoder layer in the easy branch. The search region contains current tracking scene information, including composition, texture and target-to-background contrast. We first pool X L easy along the spatial dimension to obtain a scene vector. Then a MLP maps this scene vector into a two-dimensional decision vector s corresponding to the probabilities of selecting easy and hard branches, respectively. The computational burden of our router can be neglected as it involves only two nonlinear transformations on the vector. Moreover, the scene router directly evaluates the tokens. If the hard branch is selected, the tracker head of the easy branch is skipped for efficient inference.

A Dynamic Tracking Framework Based on Scene Perception

191

Gumbel-Softmax Optimization. We employ the IOU metric, a general evaluation criterion in tracking, as the supervision for optimizing the router. An intuitive attempt is to directly calculate the cross-entropy loss between the IOU and the decision vector s. However, this hard optimization tends to mode collapse, i.e., the router will always select the most accurate hard branch as it yields a higher IOU. To mitigate this problem, we use the Gumbel-Softmax [15] trick to introduce stochasticity by adding noise to s. Concretely, given a distribution with (two) class probabilities s = {s1 , s2 }, gumbel sampling can be written as, arg max (log sk + gk ) k∈{1,2}

(3)

where gk is noise sample drawn from Gumbel distribution. The Gumbel-Softmax defines a continuous, differentiable approximation by replacing the argmax with a softmax, y = Sof tmax((log sk + gk )/τ )

(4)

where τ is the temperature of the softmax. The training loss of the router is the cross-entropy between the Gumbel-Softmax vector y = {y1 , y2 } and IOU, Lrouter = −IOU · y1 − (1 − IOU ) · y2

(5)

By incorporating the Gumbel-Softmax trick, we can perform probabilistic sampling based on the decision scores. Consequently, when the decision scores of the easy and hard branches are close, the router has a probability to select the easy branch rather than exhibiting a consistent bias towards the hard branch.

4 4.1

Experiments Implementation Details

Network Architecture. The implementation of DynamicTrack is developed by two OSTrack [30] trackers. We remove the candidate elimination module to maintain a consistent number of tokens in each encoder layer, enabling the utilization of CBT. The tracker head consists of a classification branch, a local offset branch and a box size branch. The template and search region resolution are 128 × 128 and 256 × 256 respectively. The temperature τ in Eq. 4 is set to 1. The speed is measured on a single 3080Ti GPU. Training Procedure. The training procedure comprises two steps. In step 1, we train the easy-hard dual-branch network. The training loss is the summation of the losses of the two branches, where the loss calculation of each branch follows OSTrack [30]. In step 2, we freeze the parameters of the dual branch and only utilize Eq. 5 to train the scene router. Both stages are trained with AdamW on LaSOT [8], GOT-10k [13], TrackingNet [21] and COCO [17]. The training set in step 1 follows OSTrack. The step2 training takes 5 epochs, with each containing 60000 pairs. The batch size is 4 and the learning rate is 4 × 10−6 .

192

J. Zhang et al.

Table 1. SOTA comparisons on four tracking benchmarks. The top three results are highlight with red, green and blue fonts, respectively. Method

LaSOT AUC Pnorm P

TrackingNet AUC Pnorm P

UAV123 AUC P

SiamFC [1]

33.6

42

33.9 57.1

66.3

ECO [7]

32.4

33.8

30.1 55.4

61.8

49.2 52.5

74.1 31.6 30.9

11.1

SiamRPN++ [16]

49.6

56.9

49.1 73.3

80

69.4 64.2

84

32.5

DiMP [2]

56.9

65

56.7 74

80.1

68.7 64.2

84.9 61.1 71.7

49.2

SiamRCNN [25]

64.8

72.2



81.2

85.4

80



59.7

Ocean [34]

56

65.1

56.6 69.2

79.4

68.7 62.1

82.3 61.1 72.1

47.3

AutoMatch [32]

58.3



59.9 76



72.6 64.4

83.8 65.2 76.6

54.3

TrDiMP [26]

63.9



61.4 78.4

83.3

73.1 67

87.6 67.1 77.7

58.3

TransT [4]

64.9

73.8

69

81.4

86.7

80.3 68.1

87.6 67.1 76.8

60.9

STARK [29]

67.1

77



82

86.9



88.2 68.8 78.1

64.1

53.3 49.4



69.2

GOT-10k AO SR50 SR75

72.5 34.8 35.3 51.7 61.6 64.9 72.8

9.8

MixFormer-22k [6] 69.2

78.7

74.7 83.1

88.1

81.6 70.4

91.8 70.7 80

67.8

UTT [19]

64.6

-

67.2 79.7



77





67.2 76.3

60.5

SimTrack-B [3]

69.3

78.5



82.3

86.5



69.8

89.6 68.6 78.9

62.4

OSTrack-256 [30]

69.1

78.7

75.2 83.1

87.8

82

68.3

71

80.4

68.2

DynamicTrack

70

78.9

76.2 83.8

88

82.6 69.9

89.1 74

83.6

71.3

4.2

Comparison with State-of-the-arts

We compare our DynamicTrack with recent SOTA trackers on four tracking benchmarks. GOT-10k. GOT-10k [13] test set contains 180 videos covering a wide range of common challenges in tracking. Following the official requirements, we only use the GOT-10k training set to train our models. We report the average overlap (AO) and success rate (SR50 , SR75 ) in Table 1. DynamicTrack outperforms other SOTA one-stream trackers OSTrack-256 [30] and MixFormer-22k [6] by 3% and 3.3% in AO, respectively. The SR75 of DynamicTrack reaches 71.3%, outperforming OSTrack-25 by 3.1%. This proves the excellent discriminative power and localization accuracy of our method in various scenes. Moreover, Fig. 2 illustrates the performance and speed comparison. Our method achieves a good balance between accuracy and inference speed. LaSOT. LaSOT [8] is a challenging large-scale long-term tracking benchmark, containing 280 videos for testing. Methods are ranked by AUC, normalized precision (Pnorm ) and precision (P). As reported in Table 1, DynamicTrack achieves the best performance in terms of AUC, normalization precision and precision. DynamicTrack performs slightly better than MixFormer-22k [6], getting 0.8% AUC improvement. Besides, the inference speed of DynamicTrack (77FPS) is 2× faster than MixFormer-22k (30FPS).

A Dynamic Tracking Framework Based on Scene Perception

193

TrackingNet. TrackingNet [21] contains 511 testing sequences, which covers diverse target classes. Table 1 shows DynamicTrack gets the best AUC of 83.8%, surpassing OSTrack-256 [30] and MixFormer-22k [6] by 0.7%. UAV123. UAV123 [20] is a large-scale aerial tracking benchmark involving 123 challenging sequences with more than 112K frames. Table 1 shows our method achieves competitive results (rank at second) compared with the previous SOTA trackers. The AUC of DynamicTrack is slightly lower than that of MixFormer22k [6], i.e., 69.9 vs 70.4. We believe that the variations of targets and scenes are drastic in aerial tracking conditions. Thus MixFormer with an online update mechanism is more adaptable. Table 2. The analysis of the dynamic architecture and decision mechanism. The reported GFLOPs are the average GFLOPs on the corresponding dataset. GOT-10k LaSOT AO GFLOPs FPS AUC GFLOPs FPS

4.3

1 Base

71.1 29

74

29

74

2 Easy branch

65.4 7.4

102 65.4

7.4

102

3 Hard branch w/o CBT 71.6 36

60

69.3

36

60

4 Hard branch

73.4 37.1

51

70.7 37.1

51

5 Router w/o gumbel

73.5 36.5

53

70.3

35.6

56

6 Router

74

77

70

25.2

80

27

68.7

Ablation Study and Analysis

Table 2 analyzes the effect of each component in the proposed DynamicTrack. Dynamic Architecture. Our baseline ( 1 ) is the original OSTrack without the candidate elimination module. The easy branch ( 2 ) has excellent efficiency, with only 1/4 GFLOPs of 1 . Moreover, using easy branch for all inputs also yields good performance, i.e., 65.4 on GOT-10k and 65.4 on LaSOT, outperforming most Siamese-based trackers in Table 1. This proves that the easy branch is sufficient in processing many simple scenes. The hard branch ( 4 ) achieves better performance at the cost of computation and speed, thus adapting to complex scenes. It improves 8% AO on GOT-10k and 5.3% AUC on LaSOT with 5× GFLOPs compared with easy branch ( 4 vs 2 ). When the scene router ( 6 ) is introduced for adaptive decision, DynamicTrack can achieve an accuracy-speed trade-off. Compared with baseline ( 1 ), the performance gains are 2.9% and 1.3%, and the GFLOPs are reduced by 6.9% and 13.1% on GOT-10k and LaSOT, respectively. Moreover, we find that DynamicTrack performs even higher AO than only using hard branch on GOT-10k ( 6 vs 4 ). This can potentially be attributed to that over-processing in certain simple scenes is detrimental.

194

J. Zhang et al.

Gumbel-Softmax Trick. The router w/o Gumbel-Softmax always selects the hard branch, whose accuracy, GFLOPs and FPS are similar to only using hard branch ( 5 vs 4 ). While the router w/ Gumbel-Softmax can prevent this trivial solution and effectively improve inference efficiency, reducing 26% and 29.2% GFLOPs on GOT-10k and LaSOT respectively ( 6 vs 5 ). This proves the necessity of Gumbel-Softmax for adaptive decision-making. CBT. Hard branch w/o CBT is optimized from scratch without any priors, resulting in slight improvements compared to the baseline. ( 3 vs 1 ). The introduction of CBT can significantly improve performance ( 4 vs 3 ). This proves that the information of easy branch is also valid for hard branch.

Fig. 4. Visualization of heatmaps for easy and hard images. 1st col: easy images, 2nd col: easy branch’s heatmaps on easy images, 3rd col: hard images, 4th col: easy branch’s heatmaps on hard images, 5th col: hard branch’s heatmaps on hard images. The numbers indicate the 0-dim of the decision vectors, and a smaller one indicates a harder scene.

Visualization of Scene Perception. Figure 4 visualizes the heatmaps of our method on easy and hard images, respectively. The easy images (1st col) usually contain singular composition, clean background and strong target-to-background contrast. Our scene router selects the easy branch to process these scenes and the heatmaps (2nd col) can effectively focus on targets. While hard images (3rd col) have the opposite characteristics and may be accompanied with distractors. The easy branch confuses these targets (4th col) and the router will select the hard branch to represent them accurately (5th col).

5

Conclusion

This paper proposes a dynamic tracking framework based on scene perception, DynamicTrack. We first design an easy-hard dual-branch network to support

A Dynamic Tracking Framework Based on Scene Perception

195

dynamic inference. It customizes the inference architecture for different scenes by varying the number of tokens in the transformer backbone. Then we propose a scene router to perceive the complexity of scenes and determine the inference route. Furthermore, we introduce two tricks to facilitate DynamicTrack optimization. Extensive experiments demonstrate the SOTA performance and the accuracy-speed trade-off of our method.

References 1. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fullyconvolutional Siamese networks for object tracking. In: Hua, G., J´egou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-48881-3 56 2. Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6182–6191 (2019) 3. Chen, B., et al.: Backbone is all your need: a simplified architecture for visual object tracking. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Proceedings of the European Conference on Computer Vision. LNCS, vol. 13682, pp. 375–392. Springer, Cham (2022). https://doi.org/10.1007/978-3031-20047-2 22 4. Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8126–8135 (2021) 5. Cheng, S., et al.: Learning to filter: Siamese relation network for robust tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4421–4431 (2021) 6. Cui, Y., Jiang, C., Wang, L., Wu, G.: MixFormer: end-to-end tracking with iterative mixed attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 13608–13618 (2022) 7. Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646 (2017) 8. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5374–5383 (2019) 9. Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6269–6277 (2020) 10. Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic Siamese network for visual object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1763–1771 (2017) 11. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583– 596 (2014) 12. Huang, G., Liu, S., van der Maaten, L., Weinberger, K.Q.: Multi-scale dense networks for resource efficient image classification. In: International Conference on Learning Representations (2018)

196

J. Zhang et al.

13. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. (2019) 14. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160(1), 106 (1962) 15. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144 (2016) 16. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4282–4291 (2019) 17. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 18. Lin, Z., Wang, Y., Zhang, J., Chu, X.: DynamicDet: a unified dynamic architecture for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6282–6291 (2023) 19. Ma, F., et al.: Unified transformer tracker for object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8781–8790 (2022) 20. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-31946448-0 27 21. M¨ uller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a largescale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5 19 22. Murata, A., Gallese, V., Luppino, G., Kaseda, M., Sakata, H.: Selectivity for the shape, size, and orientation of objects for grasping in neurons of monkey parietal area AIP. J. Neurophysiol. 83(5), 2580–2601 (2000) 23. Teerapittayanon, S., McDanel, B., Kung, H.T.: BranchyNet: fast inference via early exiting from deep neural networks. In: International Conference on Pattern Recognition, pp. 2464–2469. IEEE (2016) 24. Veit, A., Belongie, S.: Convolutional networks with adaptive inference graphs. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 3–18. Springer, Cham (2018). https://doi.org/10.1007/978-3-03001246-5 1 25. Voigtlaender, P., Luiten, J., Torr, P.H., Leibe, B.: Siam R-CNN: visual tracking by re-detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6578–6588 (2020) 26. Wang, N., Zhou, W., Wang, J., Li, H.: Transformer meets tracker: exploiting temporal context for robust visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1571–1580 (2021) 27. Wang, X., Yu, F., Dou, Z.-Y., Darrell, T., Gonzalez, J.E.: SkipNet: learning dynamic routing in convolutional networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 420–436. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8 25 28. Wang, Y., Huang, R., Song, S., Huang, Z., Huang, G.: Not all images are worth 16 × 16 words: dynamic transformers for efficient image recognition. Adv. Neural. Inf. Process. Syst. 34, 11960–11973 (2021)

A Dynamic Tracking Framework Based on Scene Perception

197

29. Yan, B., Peng, H., Fu, J., Wang, D., Lu, H.: Learning spatio-temporal transformer for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 10448–10457 (2021) 30. Ye, B., Chang, H., Ma, B., Shan, S., Chen, X.: Joint feature learning and relation modeling for tracking: a one-stream framework. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Proceedings of the European Conference on Computer Vision. LNCS, vol. 13682, pp. 341–357. Springer, Cham (2022). https:// doi.org/10.1007/978-3-031-20047-2 20 31. Zhang, L., Gonzalez-Garcia, A., Weijer, J.V.D., Danelljan, M., Khan, F.S.: Learning the model update for Siamese trackers. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4010–4019 (2019) 32. Zhang, Z., Liu, Y., Wang, X., Li, B., Hu, W.: Learn to match: automatic matching network design for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 13339–13348 (2021) 33. Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4591–4600 (2019) 34. Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 771–787. Springer, Cham (2020). https://doi.org/10.1007/978-3030-58589-1 46

HPAN: A Hybrid Pose Attention Network for Person Re-Identification Ruohong Huan(B)

, Tianya Chen, Ziwei Zhan, Peng Chen, and Ronghua Liang

College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China [email protected]

Abstract. To address the difficulty in expressing the correlation between different local features extracted by the current person re-identification feature extraction methods, and the challenge of effectively integrating local features with global features, a Hybrid Pose Attention Network (HPAN) for person re-identification is proposed. In HPAN, the high-resolution network HRNet-W32 serves as the backbone for person re-identification and pose estimation, extracting global features and local key point heatmaps of the human images, and then generating local key point features. Self-attention is used to extract the correlation between each local key point feature, generating local pose features. Furthermore, a Hybrid Pose and Global Feature Fusion (HPGFF) module is adopted to fuse the global features and local pose features, creating integrated features. To evaluate, we conduct experiments on five publicly available datasets, and HPAN has all achieved competitive or state-of-the-art results. Keywords: Re-identification · Pose Estimation · Local Feature · Feature Correlation · Fusion

1 Introduction Person Re-Identification (Re-ID) is a significant field and research hotspot in computer vision. The goal of person Re-ID is to automatically locate all pedestrian images of a given query object across multiple non-overlapping cameras. In practical applications, combining person Re-ID with techniques like pedestrian detection and pedestrian tracking to form a person Re-ID system can be applied in fields such as video surveillance, criminal investigation, unmanned supermarkets, and others. In recent years, due to the development of deep learning and the enhancement of computer hardware, person Re-ID has made significant progress [1–4]. Particularly in feature extraction, deep neural networks often extract discriminative features [5–8]. Regarding the features extracted by neural networks, we can divide them into global features and local features according to the size of the region. Global features can represent the holistic attributes of pedestrians with good invariance, but are easily affected by the background. For similar appearances, global features are also difficult to distinguish subtle differences. Therefore, it is challenging to use them alone in person Re-ID tasks. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 198–211, 2024. https://doi.org/10.1007/978-981-99-8555-5_16

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

199

Local features are extracted from specific regions in the image and contain rich finegrained information. Many person Re-ID methods are based on local features to identify pedestrian identities [9–13]. For example, in PCB [14], Sun et al. horizontally sliced the whole pedestrian picture and then extracted features from each slice separately. In DSR [15], He et al. divided the entire pedestrian picture into lots of equal-sized blocks, using the set distance between the blocks to identify pedestrian identities. In VPM [16], Sun et al. performed self-supervised learning on the whole image and part of the image after being divided into blocks, learning local features in the image through self-comparison. In person Re-ID methods, it is a common method to use pose estimation networks to extract local information for expressing pedestrians [8, 17–21]. The HORe-ID [18] model proposed by Wang et al. used a graph structure and adaptive graph convolution layer to learn local and edge features of pedestrians. The PGFA [17] model based on the pose model had two branches, one generated key point information and heatmaps and the other branch extracted global features. The PFD [19] method used ViT [22] and matching mechanism to extract and separate pose and patch features. Despite these methods having extracted local features of pedestrians, the correlation among these local features is minor, and they are insensitive to large-scale action changes and scaling [23]. In recent years, person Re-ID methods based on transformers have placed more emphasis on the correlation among local features. In ViT [22], Dosovitskiy et al. equally divided the entire image into lots of patches of the same size, and used multi-head attention to get the correlation between the patches. Although local features can represent fine-grained information, a large number of unordered, unrelated local features lack unified global guidance. Therefore, many current studies often combine local features with global features [24–27]. For example, in transformer-based methods, PFD [19] and TransReID [28] computed loss for the collections of global and local features separately, achieving better results than ViT [22] that only used a ‘cls’ token. In PGFA [17], Miao et al. combined global features and pose-guided features by concatenation for identity prediction. Although the above methods have realized the fusion of local features and global features, some of them calculate losses for global features and local features separately [19, 28], some provide additional clues for extracting local features through global features, and some just do simple calculations on global features and local features [13, 27]. There still lacks an effective mechanism to integrate the coarse-grained global features and the fine-grained local features, so that the fused features have both sufficient detail information of local features and global guidance of global features. To address the difficulty in expressing the correlation between each local feature in the local feature extraction methods of current person Re-ID, and the challenge of effectively fusing local features and global features, this paper proposes a Hybrid Pose Attention Network (HPAN) for person Re-ID. The main contributions include: (1) the high-resolution network HRNet-W32 serves as the backbone in HPAN for person Re-ID and pose estimation to extract global features of the human image and local key point heatmaps, and then generate local key point features. 
(2) Self-attention mechanism is used to extract the correlation between each local key point feature and generate local pose features. (3) Hybrid Pose and Global

200

R. Huan et al.

Feature Fusion (HPGFF) module is proposed to fuse global features and local pose features to generate the fused features. (4) For evaluation, we test on five publicly available datasets, Market-1501, DukeMTMC, MSMT17, CUHK03, and Occ-duke. The experimental results demonstrate our HPAN can achieve competitive or State-Of-The-Art (SOTA) performance.

2 The Proposed Method The network structure of HPAN is shown in Fig. 1, which includes two sub-networks, a pose estimation sub-network and a person Re-ID sub-network. Both of these subnetworks adopt HRNet-W32 as the backbone network. First, we input the preprocessed pedestrian images into the two sub-networks. The pose estimation sub-network generates S pose key point heatmaps {H 1 , H2 , . . . , Hs }, and the person Re-ID sub-network generates the global feature map Fg of the person image. The global feature map Fg and the S key point heatmaps {H1 , H2 , . . . , Hs } are multiplied element-wise, yielding S local key point feature maps Fp1 , Fp2 , . . . , FpS . After passing through the self-attention module, we obtain a local pose feature map Fhp . The global feature map Fg and the local pose feature map Fhp are fused through the HPGFF module, resulting in the final feature map Fh . Then, Global Average Pooling (GAP) is implemented, and a feature vector is obtained as the input for similarity measurement.

Fig. 1. The general workflow of HPAN

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

201

2.1 Local Key Point Features For the pose estimation sub-network, we output the high-resolution feature map, and a 1 × 1 convolution is performed to obtain a key point heatmap with 13 channels. For the person Re-ID sub-network, we generate the corresponding feature maps {X128 , X256 , X512 , X1024 } through the bottleneck corresponding to the input channel number. These feature maps are then upsampled so that the corresponding W and H of each feature map are equal, and these feature maps are concatenated to obtain the final global feature map Fg with 1920 channels. We perform a 1 × 1 convolution on the global feature map and reduce its dimensionality to get a feature map with 256 channels, which is then point-multiplied with the heatmap to obtain the 13 local key point features   Fp1 , Fp2 , . . . , Fp13 . 2.2 Self-Attention After obtaining the local feature maps of different key points, each feature map reflects the feature information of the corresponding key point. We use self-attention to associate and aggregate information from each key point. For  the feature maps that contain each key point, we concatenate them to obtain concat Fp1 , Fp2 , . . . , Fp13 , and use a 1 × 1 convolution Convp to generate the feature map Fp containing each key point, as shown in Eq. (1),    (1) Fp = Convp concat Fp1 , Fp2 , . . . , Fp13 concat is the concatenation of feature maps in the Here, Convp is a 1 × 1 convolution,  channel dimension, and Fp1 , Fp2 , . . . , Fp13 is the collection of each key point feature map. Fp does not contain the association between key points. In order to capture the correlation between each key point, we input Fp into two 1×1 convolution layers Conv Q and Conv K to obtain two new feature maps, and then transform them into matrices Q and K of size N × C and C × N . Here, N represents all pixels in each channel, as shown in Eq. (2), N =H ×W

(2)

Here, H and W represent the height and width of the feature map, C represents the number of channels of the feature map. The relation mapping M is generated by Q multiplied by K and normalized by SoftMax, as follows, M = SoftMax(QK)

(3)

A point (i, j) in M represents the correlation between the i-th pixel and the j-th pixel. We input Fp into another 1 × 1 convolution Conv V to generate a feature map and re-transform it into a matrix V of size N × C, then multiply it with M to integrate the relation mapping into the original V . Then we get the feature map, and perform residual connection with Fp . Through a 1 × 1 convolution Convhp , we obtain the final local pose feature map Fhp , as shown in Eq. (4),   (4) Fhp = Convhp VM + Fp

202

R. Huan et al.

2.3 Hybrid Pose and Global Feature Fusion (HPGFF) Based on the concept of the Asymmetric Fusion Non-local Block (AFNB) [29], the HPGFF module, as shown in Fig. 2, includes two inputs, a global feature map Fg and a local pose feature map Fhp . The sizes of these two feature maps are C × H × W , where C is the number of channels in the feature map, and H and W are the length and width of the feature map, respectively. First, we transform Fhp and Fg into three nonlinear mapping features, as shown in Eq. (5),     (5) Fq = Wq · Fhp , Fk  = PPM k Wk · Fg , Fv  = PPM v Wv · Fg where Wq , Wk , Wv are three 1 × 1 convolutions. PPM is pyramid pooling [30] used to reduce the computational overhead of the module itself. The obtained Fq ∈ RC 





×(H ×W ) ,

Fk  ∈ RC ×S , Fv  ∈ RS×C , where C  is the number of channels after the Wq convolution,  S is the pixel of pyramid pooling, C is the number of channels after Wk , Wv convolutions, numerically equal to H × W . Then we calculate the relationship of pixels in Fq and Fk  by matrix operation, normalize the relationship matrix M by SoftMax, and convert the pixels in the relationship matrix M between 0 and 1, as shown in Eq. (6),   (6) M = SoftMax FkT · Fq The obtained M ∈ S × (H × W ), then we multiply it by Fv  that has been through pyramid pooling to get Fc , as shown in Eq. (7), Fc = M · Fv 

(7)

Each pixel in Fc reflects the weight of the corresponding Fhp in Fg . These weights are selected from all pixels in Fg . The final output result or fusion feature Fh obtained by convolution Convh is shown in Eq. (8),    (8) Fh = Convh concat Fc , Fhp Fh is taken as the final output feature map, and after global average pooling, a feature vector is obtained as the input for similarity measurement. 2.4 Loss Function The task of person Re-ID is decomposed into two tasks, person classification and person similarity measurement. We use the cross-entropy function and triplet loss function based on hard sample mining respectively as the loss functions for these two tasks. The loss function of the overall task is the sum of these two loss functions, as shown in Eq. (9), L = LcrossEntropy + Ltri

(9)

Here, LcrossEntropy represents the cross-entropy loss function, and Ltri represents the triplet loss function.

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

203

Fig. 2. Structure diagram of the HPGFF module

2.5 Training Strategy We resize all images to a resolution of 256 × 128 and apply random horizontal flipping and random erasing for data augmentation in the training set. The backbone network, HRNet-W32, is pretrained on ImageNet before further training. Each training batch size is 64, where we randomly select 16 pedestrian identities, each with 4 images. For the optimizer, we use Adam to optimize our network, with the weight decay parameter set to 0.0005. During training, we set the epoch to 600, with an initial learning rate of 0.0008. The learning rate lr for each epoch is shown in Eq. (10) [21],

(10)

3 Experiments 3.1 Datasets and Evaluation Metrics Datasets. We conduct experiments on five publicly available datasets, Market-1501 [31], DukeMTMC [32], MSMT17 [33], CUHK03 [34], and Occluded-DukeMTMC (Occ-Duke) [17]. The information about these datasets in shown in Table 1. Evaluation Metrics. We utilize Rank-1 and mean Average Precision (mAP) for fair comparison. All experiments are performed in the single query setting.

204

R. Huan et al. Table 1. Information about datasets

Dataset

#ID

#cam

#image

Training

Query

Gallery

Market-1501

1,501

6

32,668

DukeMTMC

1,404

8

36,441

12,936

2,228

19,732

16,522

2,228

17,661

MSMT17

4,101

15

126,441

32,621

11,659

82,161

CUHK03

1,467

2

14,097

7,365

1,400

5,332

Occ-Duke

1,404

8

36,441

15,618

2,210

17,661

3.2 Comparison with SOTA Methods Comparisons on Unoccluded Datasets. We evaluate our HPAN model across four unoccluded datasets, Market-1501, DukeMTMC, MSMT17, and CUHK03, and compare with various SOTA methods in Table 2. From Table 2, we can observe that the method we proposed achieves competitive results. Specifically, for the Market-1501 dataset, our Rank-1 is only about 0.1% lower than the SAN method, but our mAP is 2.5% higher, which is currently the highest. Compared with the PFD method which is also based on pose estimation, our method improves Rank-1 by 0.5% and mAP by 0.9%. Our method achieves the SOTA performance in terms of Rank-1 and mAP on the DukeMTMC dataset, with 91.8% and 84.5% respectively. Meanwhile, our Rank-1 surpasses the PFD method by 1.2%, and mAP is improved by 2.3%. Our method also achieves good performance on MSMT17 and CUHK03 datasets. Our method achieves 84.7% Rank-1 and 64.1% mAP on MSMT17. Although the mAP is 1% and 3.3% lower than PFD and TransReID methods respectively, our Rank-1 achieves a competitive result, improving by 2% compared to PFD. Our method achieves 85.6% Rank-1 and 84.1% mAP on CUHK03. For the Rank-1, our model is only 1.2% lower than the SCSN method, but our mAP is 0.1% higher than it, which is currently the highest. Comparisons on Occluded Dataset. We also evaluate our HPAN model on occluded dataset Occ-Duke. As shown in Table 3, we categorize the baselines into two categories, CNN-based methods and Transformer-based methods. We can observe that our method achieves a significant improvement on the Occ-Duke dataset, achieving 72.9% Rank-1 and 62.0% mAP. Compared with the PFD method, our method significantly improves Rank-1 by 5.2% and mAP by 1.9%. It can be seen that although our method is designed for holistic person Re-ID tasks, it still achieves competitive results on the occluded pedestrian dataset, which proves the robustness of our proposed method.

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

205

Table 2. Performance comparison on the holistic Re-ID datasets Market-1501, DukeMTMC, MSMT17 and CUHK03. *represents the second best current result, and the best result is shown in bold. Method

Market-1501

DukeMTMC

MSMT17

Rank-1

mAP

Rank-1

mAP

Rank-1

PCB [14]

92.3

77.4

81.8

66.1

DSR [15]

83.6

64.3













BOT [11]

94.1

85.7

86.4

76.4









VPM [16]

93.0

80.8

83.6

72.6









SAN [12]

96.1

88.0

87.9

75.7

79.2

55.7





OSNet [9]









78.7

52.9





DG-Net [35]









77.2

52.3





CBN [10]









72.8

42.9





Circle [1]

94.2

84.9





76.3







RGA-SC [21]









80.3

57.5

81.1

77.4

BDB [36]













79.4

76.7

DSA-reID [37]













78.9

75.2

SCSN [38]













86.8

84.0*

ISP [39]













76.5

74.1

ADC + 2O-IB [26]













80.6

79.3

MVPM [4]

91.4

80.5

83.4

70.0

71.3

46.3





SFT [24]

93.4

82.7

86.9

73.2

73.6

47.6





CAMA [25] 94.7

84.5

85.8

72.9









IANet [13]

94.4

83.1

87.1

73.4

75.5

46.8





FED [20]

95.0

86.3

89.4

78.0









PGFA [17]

91.2

76.8

82.6

65.5









HORe-ID [18]

94.2

84.9

86.9

75.6









PFD [19]

95.5

89.6*

90.6

82.2*

82.7





PAT [3]

95.4

88.0

88.8

78.2









DeiT [2]

94.4

86.6

89.3

78.9

81.9

61.4





ViT [22]

94.7

86.8

88.8

79.3

81.8

61.0





CUHK03 mAP –

65.1*

Rank-1 –

mAP –

– (continued)

206

R. Huan et al. Table 2. (continued)

Method

Market-1501

DukeMTMC

MSMT17

CUHK03

Rank-1

mAP

Rank-1

mAP

Rank-1

mAP

TransReID [28]

95.2

88.9

90.7*

82.0

85.3

67.4

HPAN (ours)

96.0*

90.5

91.8

84.5

84.7*

64.1

Rank-1 – 85.6*

mAP – 84.1

Table 3. Performance comparison on the occluded Re-ID dataset Occluded-DukeMTMC. *represents the second best current result, and the best result is show in bold. Method (CNN-based)

Rank-1

mAP

Method (Transformer-based)

Rank-1

mAP

PCB [14]

42.6

33.7

PAT [3]

64.5

53.6

RE [27]

40.5

30.0

DeiT [2]

60.6

53.1

FD-GAN [5]

40.8



ViT [22]

60.5

53.1

DSR [15]

40.8

30.4

TransRe-ID [28]

66.4

59.2

PGFA [17]

51.4

37.3

PVPM [6]

47.0

37.7

ISP [39]

62.8

52.3

HORe-ID [18]

55.1

43.8

MoS [7]

61.0

49.2

OAMN [8]

62.6

46.1

PFD [19]

67.7

60.1*

FED [20]

68.1*

56.4

HPAN (ours)

72.9

62.0

3.3 Ablation Studies Ablation experiments are conducted to fully verify the effectiveness of the modules in HPAN, and the results are shown in Table 4. The methods for the ablation experiments include, M1(only the global feature map Fg is used), M2 (neither self-attention nor HPGFF are used. The obtained local key point feature map undergoes 1 × 1 convolution to obtain the local pose feature map, and the local pose feature map is fused with the global feature map using a residual connection), M3 (only self-attention is used. The local pose feature map is obtained using self-attention, and fused with the global feature map Fg using a residual connection), and M4 (HPAN).

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

207

Table 4. Ablation Studies on Market-1501, DukeMTMC and CUHK03 Method

Market-1501

DukeMTMC

CUHK03

Rank-1

mAP

Rank-1

mAP

Rank-1

mAP

M1

95.4

88.5

89.4

81.2

84.2

81.8

M2

95.5

89.1

90.0

82.3

84.3

82.2

M3

95.7

89.9

91.0

82.9

84.5

82.7

M4

96.0

90.5

91.8

84.5

85.6

84.1

(M3-M1)

0.3

1.4

1.6

1.7

0.3

0.9

(M3-M2)

0.2

0.8

1.0

0.6

0.2

0.5

(M4-M3)

0.3

0.6

0.8

1.6

1.1

1.4

(M4-M1)

0.6

2.0

2.4

3.3

1.4

2.3

As can be observed from Table 4, we find compared with M1 and M2, the method with self-attention (M3) improves both mAP and Rank-1. This demonstrates that using self-attention can extract the correlation between key points and learn the correlation of local features of each key point, which results in better performance. Compared with the method of only using the self-attention (M3), the method using both the self-attention and HPGFF (M4) shows an extra increase in mAP and Rank-1. It demonstrates the effectiveness of the HPGFF module. By the HPGFF, fusing the global features with local pose features, the global features can guide the local features, resulting in a final feature that is more discriminative. In short, the ablation experiments on the three datasets quantitatively demonstrate the effectiveness of the self-attention and the HPGFF module. 3.4 Visualization of Attention Maps To further analyze the difference between the global feature map and the fused feature map, we visualize the attention maps of the features of our model, using the GramCAM tool to identify areas that the network considers important. Figure 3 shows the comparison of attention maps of global features extracted by the backbone network HRNet-W32 and the fused features obtained by HPAN. In the subfigures of Fig. 3, the left image is the original image, the middle is the global feature attention map, and the right is the fused feature attention map obtained by HPAN. From the figure, we can see that in addition to global features, HPAN can also focus on discriminative local human key point features.

208

R. Huan et al.

(a)

(b)

(c)

(d) Fig. 3. Visualization of attention map

4 Conclusion This paper presents a person Re-ID method HPAN, based on hybrid pose attention. The self-attention within it can mine and generate local pose features from the associations among extracted local key point features. The fusion module HPGFF combines the extracted global features with the local pose features to generate more distinguishable and robust features. Our HPAN model shows superior performance on five popular datasets, surpassing SOTA methods. Acknowledgement. This work is supported by the National Natural Science Foundation of China (grant number 62276237, 62036009, U1909203), Zhejiang Province Basic Public Welfare Research Program Project (grant number LTGY23F020006), and Zhejiang Provincial Natural Science Foundation of China (grant number LDT23F0202, LDT23F02021F02).

References 1. Sun, Y., et al.: Circle loss: a unified perspective of pair similarity optimization. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 2. Training data-efficient image transformers & distillation through attention. https://procee dings.mlr.press/v139/touvron21a. Accessed 16 June 2023 3. Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., Wu, F.: Diverse part discovery: occluded person re-identification with part-aware transformer. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

209

4. Sun, H., Chen, Z., Yan, S., Xu, L.: MVP Matching: a maximum-value perfect matching for mining hard samples, with application to person re-identification. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2019) 5. FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification. https:// proceedings.neurips.cc/paper/2018/hash/c5ab0bc60ac7929182aadd08703f1ec6-Abstract. html. Accessed 16 June 2023 6. Gao, S., Wang, J., Lu, H., Liu, Z.: Pose-guided visible part matching for occluded person ReID. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 7. Matching on Sets: Conquer Occluded Person Re-identification Without Alignment | Proceedings of the AAAI Conference on Artificial Intelligence. https://ojs.aaai.org/index.php/AAAI/ article/view/16260. Accessed 16 June 2023 8. Chen, P., et al.: Occlude Them all: occlusion-aware attention network for occluded person re-ID. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2021) 9. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Omni-scale feature learning for person reidentification. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2019) 10. Rethinking the Distribution Gap of Person Re-identification with Camera-Based Batch Normalization | SpringerLink. https://link.springer.com/chapter/https://doi.org/10.1007/978-3030-58610-2_9. Accessed 16 June 2023 11. Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019) 12. Jin, X., Lan, C., Zeng, W., Wei, G., Chen, Z.: Semantics-aligned representation learning for person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11173–11180 (2020). https://doi.org/10.1609/aaai.v34i07.6775 13. Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: Interaction-and-aggregation network for person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019) 14. Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline). Presented at the Proceedings of the European Conference on Computer Vision (ECCV) (2018) 15. He, L., Liang, J., Li, H., Sun, Z.: Deep spatial feature reconstruction for partial person reidentification: alignment-free approach. Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018) 16. Sun, Y., Xu, Q., Li, Y., Zhang, C., Li, Y., Wang, S., Sun, J.: Perceive where to focus: learning visibility-aware part-level features for partial person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019) 17. Miao, J., Wu, Y., Liu, P., Ding, Y., Yang, Y.: Pose-guided feature alignment for occluded person re-identification. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2019) 18. Wang, G., et al.: High-order information matters: learning relation and topology for occluded person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 19. 
Wang, T., Liu, H., Song, P., Guo, T., Shi, W.: Pose-guided feature disentangling for occluded person re-identification based on transformer. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 2540–2549 (2022). https://doi.org/10.1609/aaai.v36i3. 20155

210

R. Huan et al.

20. Wang, Z., Zhu, F., Tang, S., Zhao, R., He, L., Song, J.: Feature erasing and diffusion network for occluded person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022) 21. Zhang, Z., Lan, C., Zeng, W., Jin, X., Chen, Z.: Relation-aware global attention for person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 22. Dosovitskiy, A., et al.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. http://arxiv.org/abs/2010.11929 (2021). https://doi.org/10.48550/arXiv.2010.11929 23. Zhang, X., et al.: AlignedReID: Surpassing Human-Level Performance in Person ReIdentification. http://arxiv.org/abs/1711.08184 (2018). https://doi.org/10.48550/arXiv.1711. 08184 24. Luo, C., Chen, Y., Wang, N., Zhang, Z.: Spectral feature transformation for person reidentification. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2019) 25. Yang, W., Huang, H., Zhang, Z., Chen, X., Huang, K., Zhang, S.: Towards rich feature discovery with class activation maps augmentation for person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019) 26. Zhang, A., Gao, Y., Niu, Y., Liu, W., Zhou, Y.: Coarse-to-fine person re-identification with auxiliary-domain classification and second-order information bottleneck. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021) 27. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13001–13008 (2020). https://doi.org/10.1609/aaai.v34i07.7000 28. He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: TransReID: transformer-based object re-identification. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2021) 29. Zhu, Z., Xu, M., Bai, S., Huang, T., Bai, X.: Asymmetric non-local neural networks for semantic segmentation. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2019) 30. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017) 31. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. Presented at the Proceedings of the IEEE International Conference on Computer Vision (2015) 32. Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. Presented at the Proceedings of the IEEE International Conference on Computer Vision (2017) 33. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer GAN to bridge domain gap for person re-identification. Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018) 34. Li, W., Zhao, R., Xiao, T., Wang, X.: DeepReID: deep filter pairing neural network for person re-identification. Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014) 35. Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., Kautz, J.: Joint discriminative and generative learning for person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019) 36. 
Dai, Z., Chen, M., Gu, X., Zhu, S., Tan, P.: Batch DropBlock network for person reidentification and beyond. Presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision (2019)

HPAN: A Hybrid Pose Attention Network for Person Re-Identification

211

37. Zhang, Z., Lan, C., Zeng, W., Chen, Z.: Densely semantically aligned person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019) 38. Chen, X., et al.: Salience-guided cascaded suppression network for person re-identification. Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 39. Zhu, K., Guo, H., Liu, Z., Tang, M., Wang, J.: Identity-guided human semantic parsing for person re-identification. In: Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M. (eds.) Computer Vision – ECCV 2020. pp. 346–363. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_21

SpectralTracker: Jointly High and Low-Frequency Modeling for Tracking

Yimin Rong1,2, Qihua Liang1,2(B), Ning Li1,2, Zhiyi Mo1,2,3, and Bineng Zhong1,2

1 Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
2 Guangxi Key Laboratory of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China
[email protected]
3 Guangxi Key Laboratory of Machine Vision and Intelligent Control, Wuzhou University, Wuzhou 543002, China

Abstract. Recently, a considerable number of top-performing Transformer-based trackers have been proposed. However, most of them mainly exploit low-frequency information from a spatial-spectral analysis perspective, limiting their performance in complicated scenes. To address this problem, we propose a spectral tracker that explores how to jointly capture high- and low-frequency information for robust tracking. Specifically, we design a novel dual-spectral information extraction and aggregation module (DSM), consisting of a high-frequency branch and a low-frequency branch, to effectively capture and combine the complementary frequency information of a Transformer. Firstly, we divide local attention windows in the high-frequency branch to focus on more fine-grained high-frequency information. Then, in the low-frequency branch, we apply AvgPooling, which has a low-pass effect, to amplify the Transformer's low-frequency information. Furthermore, we design a shared MLP strategy that polarizes the two branches toward high- and low-frequency attention. Finally, we utilize an MLP to complementarily fuse the high- and low-frequency information for frequency-domain modeling. Comprehensive experiments on five tracking benchmarks (i.e., GOT-10k, TrackingNet, LaSOT, UAV123 and TNL2K) show that our spectral tracker achieves better performance than state-of-the-art trackers.

Keywords: Object Tracking · Vision Transformer · Spectral Domain · Single-stage Backbone

1 Introduction

Fig. 1. Comparison of the attention weight maps estimated by our proposed SpectralTracker and the existing state-of-the-art Transformer-based visual tracker OSTrack [7].

Visual tracking aims to estimate the future state of a target object in sequential video frames, given its initial state. This is a highly challenging task in computer vision due to factors such as heavy occlusion, abrupt changes, interference from similar objects and large deformation, among others. Although many trackers [1–6] have been proposed in the past few years and have achieved high performance on existing benchmark datasets, these trackers almost invariably ignore the role of spectral-domain information in the tracking process. Traditional image processing approaches mainly include two types: spatial and spectral. Coincidentally, most studies focus on exploiting spatial-domain information to extract effective feature representations at the spatial level for object tracking. In earlier years, the classic Siamese-based trackers [1–4,8] extracted only spatial-level features to model the entire architecture. With the introduction of the Transformer from NLP to the visual domain, Transformer-based trackers [5,6,9,10] have led tracker modeling in a new direction. However, most trackers still focus only on spatial-domain modeling to interpret the Transformer. Recently, a growing body of research has indicated that attention mechanisms are more adept at capturing low-frequency signals but are slightly less effective at capturing high-frequency signals. It is mentioned in [11,12] that the global shapes and structures of the scene or object in an image represent the low-frequency signal; conversely, local edges and textures fall within the realm of high-frequency signals. Although the interpretation in two recent Transformer-based works [7,13] is still based on the spatial domain, spectral-domain-related signals exist in their feature structures. However, these signals merely represent ambiguous frequency-domain information that is difficult to control. This ambiguity can be attributed to the co-existence of high- and low-frequency modeling within the Transformer, without considering the strength of its spectral modeling capability.

Based on the above analysis, we propose a Transformer-based spectral tracker, namely SpectralTracker, which utilizes explicit high- and low-frequency signals of the spectral domain to model the whole tracking process. Specifically, the Dual-Spectral Module (DSM) is the core component of our method. It comprises two seemingly independent but interconnected spectral modeling branches that capture and model the high- and low-frequency feature information between the target template and the search region. In addition, we build our SpectralTracker backbone with Patch Embedding and stacked DSM blocks, and finally supplement it with a simple convolution-based prediction head to form our entire tracking framework. By adopting this design, we can extract and aggregate frequency-domain features from both the target template and the search region, which effectively resolves the following problems in the tracking process. Firstly, the Transformer model tends to exhibit a weaker high-frequency modeling capability, which results in difficulties in capturing local details among neighboring regions and prevents the full utilization of its global high-frequency modeling ability. Secondly, filling all layers of the model with low-frequency information degrades high-frequency information, such as local textures, and weakens the overall modeling capability of the Transformer. A comparison of the estimated attention weight maps is shown in Fig. 1.

Our main contributions are three-fold: (1) We propose a novel Dual-Spectral information extraction and aggregation Module (DSM), which efficiently extracts the high- and low-frequency signals of the spectral domain simultaneously in its high- and low-frequency branches. (2) We propose a simple tracking backbone network with the Dual-Spectral Module (DSM) as its core component to model spectral features. Furthermore, we design a shared MLP strategy to enhance attention to high- and low-frequency information at the same position. (3) Our tracker achieves comparable results to state-of-the-art trackers on multiple tracking benchmarks.

2 Related Work

2.1 Visual Tracking

Numerous Siamese trackers [1–4,8,14,15] use AlexNet or ResNet as a shared-weight backbone to extract the features of the template and the search region, and a correlation operation is applied in these trackers as an important aggregation module. In recent years, the Transformer has been widely used in the field of computer vision, resulting in the emergence of several excellent Transformer-based trackers. TransT [5], STARK [6], TREG [9], TrDiMP [10], TrTr [16], MixFormer [13] and OSTrack [7] take innovative advantage of the Transformer and achieve impressive performance. However, the theoretical support of both Siamese trackers and Transformer trackers is based on spatial-domain modeling, which invariably focuses on the spatial information in the image and at most adds temporal-domain information between sequences to improve the tracking effect, while the frequency-domain information is always ignored. In particular, the one-stream compact trackers MixFormer and OSTrack perform fuzzy frequency modeling, resulting in the loss of global high- and local low-frequency information.

2.2 Frequency Modeling in Visual Transformer

As pointed out in [17], ViT [18] and its variants are more proficient at capturing global low-frequency spectral-domain information, such as shapes and lines, than local high-frequency information, such as object edges and textures, within a scene. IFormer [12] expresses the same view: ViT's self-attention is a global operation for information exchange between patch tokens, which is better at capturing global information. However, this does not mean that ViT cannot model local high-frequency information, as demonstrated by LITv2 [11], which captures fine-grained local high-frequency information by combining the corresponding means with ViT. CvT [19] is a variant of ViT that incorporates convolutional operations, which are good at capturing local high-frequency spectral information. Therefore, we propose a Dual-Spectral Module that utilizes a combination of operations, such as attention window division and AvgPooling, to simultaneously model high- and low-frequency feature extraction and aggregation in a compact architecture. Our proposed SpectralTracker is a simple, compact, and efficient spectral-based tracker capable of capturing both local and global spectral information.

3 Method

In this section, we describe our compact, dual-spectral tracking framework, named SpectralTracker. We first elaborate on the Dual-Spectral information extraction and aggregation module in Sect. 3.1, and then introduce the whole SpectralTracker in detail in Sect. 3.2 and Sect. 3.3.

Fig. 2. Dual-Spectral Module (DSM) is the core component of our SpectralTracker, which consists of a high-frequency branch, a low-frequency branch and a shared MLP.

3.1 Dual-Spectral Module

Being the core design of our SpectralTracker, the Dual-Spectral Module (DSM) is designed to extract high- and low-frequency features of token sequences in parallel and to simultaneously model the spectral information between them. In contrast to the traditional attention mechanism [20], DSM performs mutually independent attention operations to model high- and low-frequency features on the concatenated token sequences. This dual-spectral mechanism is implemented by placing a high-frequency branch and a low-frequency branch in the module. As shown in Fig. 2, the attention-based dual-spectral mechanism is realized as follows.

High-Frequency Branch. We are given a concatenated token sequence $T$, obtained by concatenating the template token sequence $T_z \in \mathbb{R}^{N_z \times N}$ and the search token sequence $T_x \in \mathbb{R}^{N_x \times N}$, where $N_z$ and $N_x$ are the numbers of patches of the template and the search region, respectively, and $N$ is the dimension of the high-dimensional feature projection. To model local high-frequency information, we first reshape and divide the token sequence $T$ into local attention windows of size $1 \times 1$, and then map the token sequence onto multiple feature spaces to generate the queries, keys and values. Formally, we generate them by a linear projection, reshape them to $\mathbb{R}^{3 \times h \times (N_z + N_x) \times N'}$, and divide them into $Q_{hf}, K_{hf}, V_{hf} \in \mathbb{R}^{h \times (N_z + N_x) \times N'}$, where $Q_{hf}, K_{hf}, V_{hf}$ are the queries, keys and values of the high-frequency branch, $h$ is the number of heads and $N' = N/3h$. The local window division is reflected in $\mathbb{R}^{3 \times h \times (N_z/n + N_x/n) \times N' \times n}$, where $n$ is the number of local attention windows. The formulation of our high-frequency branch is as follows:

$Q_{hf}, K_{hf}, V_{hf} = \mathrm{Div}(\mathrm{FC}(T))$,  (1)

$\mathrm{Atten}_{hf} = \mathrm{SoftMax}\left(\dfrac{Q_{hf} \cdot K_{hf}^{T}}{\sqrt{d_h}}\right) V_{hf}$,  (2)

$\mathrm{Atten}_{hf} = \mathrm{Atten}_{hf} + \mathrm{MLP}_{shared}(\mathrm{LN}(\mathrm{Atten}_{hf}))$,  (3)

where $(\cdot)$ denotes the scaled dot product and $d_h$ is the dimensionality of $K_{hf}$. $Q_{hf}$, $K_{hf}$ and $V_{hf}$ all contain the position encodings of their queries, keys and values. Position encoding is likewise included in the low-frequency branch described below.

Low-Frequency Branch. The low-frequency information in the module is modeled as follows. The branch's input is still $T$, which represents the information of the target template and the search region, and the queries $Q_{lf}$ are still computed by feeding $T$ into a linear layer. Since the average pooling operation is a low-pass filter, and to better exploit the attention mechanism's ability to capture low-frequency signals, we process the input token sequence $T$ with an AvgPooling operation to generate low-frequency keys $K_{lf}$ and values $V_{lf}$ that are distinct from those of the other branch. In this way, the interference of high-frequency signals is weakened. As a result, the branch not only focuses on modeling low-frequency information but also reduces the computational complexity of attention. For the computation of the keys $K_{lf}$ and values $V_{lf}$, we first split the input token sequence $T$ into the sequences belonging to the target template ($T_z$) and the search region ($T_x$). We then embed the two token sequences using AvgPooling and concatenate the resulting outputs. Finally, we feed the concatenated output into a linear layer:

$Q_{lf} = \mathrm{FC}(T)$,  (4)


Fig. 3. The overall architecture of our SpectralTracker consists of a backbone based on DSM and a prediction head, which is an end-to-end tracking architecture.

$T_{lo} = \mathrm{Concat}[\mathrm{AvgPool}(T_z), \mathrm{AvgPool}(T_x)]$,  (5)

$K_{lf}, V_{lf} = \mathrm{Div}(\mathrm{FC}(T_{lo}))$,  (6)

where $K_{lf}, V_{lf} \in \mathbb{R}^{h \times (P_z + P_x) \times N'}$, and $P_z$ and $P_x$ are the resolutions of the target template and search region token sequences after the AvgPooling operation. Similarly, the queries, keys and values also contain their position encodings. Besides, $\mathrm{Atten}_{lf}$ is calculated in the same way as $\mathrm{Atten}_{hf}$. It is worth noting that the inputs of the first DSM and of the subsequent DSMs in the backbone are different, as illustrated in Fig. 2.

Shared MLP in Both Branches. Throughout the module, the parallel high- and low-frequency branches are unconnected in both input and output, so they are entirely independent. If a DSM with two fully independent branches were applied to the whole tracking architecture, it would be difficult to attend to the high- and low-frequency signals simultaneously, as demonstrated in the ablation experiments below. In point cloud processing [21], a shared MLP transforms and extracts features for every point simultaneously. Inspired by this, we design a shared MLP in both branches, so that the high- and low-frequency information at the same location of the two branches is transformed and extracted synchronously instead of being treated as two completely independent streams. This provides a more expressive frequency-domain feature for the complementary fusion performed by the MLP at the end of the backbone.
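To make the dual-branch computation above concrete, the following PyTorch sketch implements one DSM block under our reading of Eqs. (1)–(6). The head count, pooling stride, and the simplification of the 1 × 1 window division to plain token-wise attention are illustrative assumptions rather than the authors' exact configuration.

```python
# Simplified Dual-Spectral Module (DSM) sketch; shapes and hyper-parameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def attention(q, k, v):
    # Scaled dot-product attention of Eq. (2): SoftMax(Q K^T / sqrt(d)) V
    return F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1) @ v


class DSM(nn.Module):
    def __init__(self, dim=256, num_heads=4, pool_stride=2):
        super().__init__()
        self.h = num_heads
        self.qkv_hf = nn.Linear(dim, dim * 3)   # Eq. (1): one FC, then Div into Q/K/V
        self.q_lf = nn.Linear(dim, dim)         # Eq. (4)
        self.kv_lf = nn.Linear(dim, dim * 2)    # Eq. (6)
        self.pool = nn.AvgPool1d(pool_stride, stride=pool_stride)  # low-pass filter of Eq. (5)
        self.norm = nn.LayerNorm(dim)
        # The same MLP is applied in both branches, Eq. (3).
        self.shared_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def _heads(self, x):
        b, n, c = x.shape
        return x.view(b, n, self.h, c // self.h).transpose(1, 2)

    def _merge(self, x):
        b, h, n, d = x.shape
        return x.transpose(1, 2).reshape(b, n, h * d)

    def _branch(self, q, k, v):
        out = self._merge(attention(self._heads(q), self._heads(k), self._heads(v)))
        return out + self.shared_mlp(self.norm(out))

    def forward(self, t_hf, t_lf, n_z):
        # t_hf / t_lf: token sequences of the two branches (both equal to T at the first DSM);
        # n_z: number of template tokens, used to split T before AvgPooling.
        q, k, v = self.qkv_hf(t_hf).chunk(3, dim=-1)
        out_hf = self._branch(q, k, v)                         # high-frequency features
        t_z, t_x = t_lf[:, :n_z], t_lf[:, n_z:]
        t_lo = torch.cat([self.pool(t_z.transpose(1, 2)).transpose(1, 2),
                          self.pool(t_x.transpose(1, 2)).transpose(1, 2)], dim=1)  # Eq. (5)
        k_lf, v_lf = self.kv_lf(t_lo).chunk(2, dim=-1)
        out_lf = self._branch(self.q_lf(t_lf), k_lf, v_lf)     # low-frequency features
        # The two outputs are propagated independently; the backbone concatenates them after
        # the last DSM and fuses them with an MLP (not shown here).
        return out_hf, out_lf
```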

3.2 Dual-Spectral for Tracking

Overall Architecture. The overall architecture is depicted in Fig. 3. The core idea of SpectralTracker is to explore how to jointly capture high- and low-frequency information for robust tracking. SpectralTracker consists of a DSM-based backbone and a prediction head based on a fully convolutional network.


In contrast to the recent single-stream trackers with spatial-domain modeling, we model the whole tracking process from the spectral-domain perspective.

DSM-Based Backbone. To extract and aggregate target features in the spectral domain, we propose a single-stage backbone built by stacking $n$ DSMs. This backbone models the entire tracking process by leveraging high- and low-frequency information in the spectral domain. Inspired by the input pre-processing in [18], given a pair of images, namely the template image patch and the search region patch, we first reshape and flatten them into 2D patch embeddings $t_z$ and $t_x$. Then, due to the mechanical requirements of the Transformer in the dual-spectral branches, we project $t_z$ and $t_x$ into an $N$-dimensional abstract space by a learnable linear projection with parameter $E$. Finally, we add trainable position embeddings $P_z$ and $P_x$ after the projection of the token sequences is completed. The relevant equations are:

$T_z = [t_z E] + P_z = [(\tau_z^1 E + \rho_z^1), \cdots, (\tau_z^{N_z} E + \rho_z^{N_z})]$,  (7)

$T_x = [t_x E] + P_x = [(\tau_x^1 E + \rho_x^1), \cdots, (\tau_x^{N_x} E + \rho_x^{N_x})]$,  (8)

where the sizes of the outputs $T_z$ and $T_x$ are $N_z \times N$ and $N_x \times N$, respectively, and the token sequences $T_z$ and $T_x$ are concatenated as $T$, as mentioned in Sect. 3.1. The token sequence $T$ is fed into the first DSM's high- and low-frequency branches to generate the corresponding spectral-domain features $T_{hf}^1, T_{lf}^1 \in \mathbb{R}^{(N_z + N_x) \times N}$. These features are fed into the high- and low-frequency branches of the subsequent $n-1$ DSMs in parallel and independently. At the final DSM, the obtained dual-frequency features $T_{hf}^n$ and $T_{lf}^n$ are concatenated, and an MLP layer is applied to transform and merge the high- and low-frequency feature information. Finally, the spectral-domain feature of the entire backbone is output and fed into the prediction head.
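As a sketch of the token preparation in Eqs. (7)–(8), the snippet below realizes the linear projection $E$ with a strided convolution and adds learnable position embeddings. The patch size, embedding dimension, and input resolutions are illustrative assumptions, not values stated by the paper.

```python
# Patch embedding and token preparation sketch for Eqs. (7)-(8).
import torch
import torch.nn as nn


class PatchTokenizer(nn.Module):
    def __init__(self, dim=256, patch=16, z_size=128, x_size=256):
        super().__init__()
        # A strided convolution is one common way to realize the per-patch linear projection E.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_z = nn.Parameter(torch.zeros(1, (z_size // patch) ** 2, dim))  # P_z
        self.pos_x = nn.Parameter(torch.zeros(1, (x_size // patch) ** 2, dim))  # P_x

    def forward(self, z_img, x_img):
        # z_img: (B, 3, z_size, z_size) template patch; x_img: (B, 3, x_size, x_size) search patch
        t_z = self.proj(z_img).flatten(2).transpose(1, 2) + self.pos_z   # Eq. (7)
        t_x = self.proj(x_img).flatten(2).transpose(1, 2) + self.pos_x   # Eq. (8)
        return torch.cat([t_z, t_x], dim=1)                              # concatenated T
```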

3.3 Prediction Head and Total Loss

Similar to [7], we estimate the bounding box of the tracked object directly by a fully convolution-based prediction head and compute the classification map, local bias, and bounding-box size using several stacked Conv-BN-ReLU layers. For the loss calculation, we employ the Gaussian-weighted focal loss (referred to as $L_{cls}$) for the classification task. For the localization task, we adopt the L1 loss, and we use the generalized IoU loss (denoted as $L_G$) for bounding-box regression. Finally, our model is trained in an end-to-end fashion, and the total loss function is

$L_{total} = \lambda_1 L_{cls} + \lambda_2 L_1 + \lambda_3 L_G$,  (9)

where $\lambda_1 = 1$, $\lambda_2 = 5$ and $\lambda_3 = 2$ are the weights that balance the contributions of each loss term.
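A minimal sketch of a Conv-BN-ReLU prediction head and the loss combination of Eq. (9) is given below. The channel widths are assumptions, and the focal/L1/GIoU implementations are left to the training code, so the three loss terms are passed in as precomputed values.

```python
# Sketch of the convolutional prediction head and the total loss of Eq. (9).
import torch.nn as nn


def conv_bn_relu(c_in, c_out):
    # One Conv-BN-ReLU unit; the paper stacks several of these in the prediction head.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))


class PredictionHead(nn.Module):
    # Fully convolutional head predicting a classification (center) map, a local-offset map,
    # and a box-size map; channel widths are illustrative.
    def __init__(self, dim=256):
        super().__init__()
        self.tower = nn.Sequential(conv_bn_relu(dim, dim), conv_bn_relu(dim, dim // 2))
        self.cls = nn.Conv2d(dim // 2, 1, 1)     # classification map
        self.offset = nn.Conv2d(dim // 2, 2, 1)  # local bias
        self.size = nn.Conv2d(dim // 2, 2, 1)    # bounding-box size

    def forward(self, feat):
        x = self.tower(feat)
        return self.cls(x), self.offset(x), self.size(x)


def total_loss(l_cls, l_l1, l_giou, w=(1.0, 5.0, 2.0)):
    # Eq. (9): L_total = lambda1*L_cls + lambda2*L1 + lambda3*L_G with lambda = (1, 5, 2).
    return w[0] * l_cls + w[1] * l_l1 + w[2] * l_giou
```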

4 Experiments

4.1 Implementation Details

We adopt the training splits of COCO [22], LaSOT [23], GOT-10k [24] and TrackingNet [25] as the training dataset. Our tracker is implemented using Python 3.7 and PyTorch 1.9.0. SpectralTracker training is conducted on 2 Tesla V100 GPUs with a mini-batch size of 32 per GPU, resulting in a total batch size of 64. We optimize the network using AdamW [28] with an initial learning rate of 4 × 10−5 for the DSM-based backbone, 4 × 10−4 for the rest, and a weight decay of 1 × 10−4. For full-data training, we set the total number of training epochs to 300, sample 60,000 image pairs per epoch, and reduce the learning rate by a factor of 10 after 240 epochs. For GOT-10k-only training, we use 100 epochs and adjust the learning rate after 80 epochs.
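The reported optimization schedule could be set up as follows; the attribute names `model.backbone` and `model.head` are hypothetical, introduced only for illustration.

```python
# Optimizer sketch matching the reported schedule: AdamW, a 10x smaller learning rate
# for the backbone, weight decay 1e-4, and a 10x LR drop late in training.
import torch


def build_optimizer(model):
    params = [
        {"params": model.backbone.parameters(), "lr": 4e-5},
        {"params": model.head.parameters(), "lr": 4e-4},
    ]
    optimizer = torch.optim.AdamW(params, weight_decay=1e-4)
    # Drop the learning rate by a factor of 10 after epoch 240 (full-data setting).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[240], gamma=0.1)
    return optimizer, scheduler
```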

4.2 State-of-the-Art Comparison

To verify the validity of the proposed model, we compare it with state-of-the-art trackers on five different benchmarks.

Table 1. Performance comparisons with state-of-the-art trackers on the test sets of GOT-10k [24], TrackingNet [25] and LaSOT [23]. The best two results are shown in red and blue fonts.

| Method | Source | GOT-10k AO | GOT-10k SR0.5 | GOT-10k SR0.75 | TrackingNet AUC | TrackingNet Pnorm | TrackingNet P | LaSOT AUC | LaSOT Pnorm | LaSOT P |
|---|---|---|---|---|---|---|---|---|---|---|
| SiamRPN++ [3] | CVPR19 | 51.7 | 61.6 | 32.5 | 73.3 | 80.0 | 69.4 | 49.6 | 56.9 | 49.1 |
| DiMP [26] | ICCV19 | 61.1 | 71.7 | 49.2 | 74.0 | 80.1 | 68.7 | 56.9 | 65.0 | 56.7 |
| Ocean [27] | ECCV20 | 61.1 | 72.1 | 47.3 | – | – | – | 56.0 | 65.1 | 56.6 |
| TrDiMP [10] | CVPR21 | 67.1 | 77.7 | 58.3 | 78.4 | 83.3 | 73.1 | 63.9 | – | 61.4 |
| TransT [5] | CVPR21 | 67.1 | 76.8 | 60.9 | 81.4 | 86.7 | 80.3 | 64.9 | 73.8 | 69.0 |
| AutoMatch [28] | ICCV21 | 65.2 | 76.6 | 54.3 | 76.0 | – | 72.6 | 58.3 | – | 59.9 |
| STARK [6] | ICCV21 | 68.8 | 78.1 | 64.1 | 82.0 | 86.9 | – | 67.1 | 77.0 | – |
| KeepTrack [29] | ICCV21 | 67.1 | 77.2 | 70.2 | – | – | – | – | – | – |
| UTT [30] | CVPR22 | 67.2 | 76.3 | 60.5 | 79.7 | – | 77.0 | 64.6 | – | 67.2 |
| CSWinTT [31] | CVPR22 | 69.4 | 78.9 | 65.4 | 81.9 | 86.7 | 79.5 | 66.2 | 75.2 | 70.9 |
| AiATrack [32] | ECCV22 | 69.6 | 80.0 | 63.2 | 82.7 | 87.8 | 80.4 | 69.0 | 79.4 | 73.8 |
| MixFormer-22k [13] | CVPR22 | 70.7 | 80.0 | 67.8 | 83.1 | 88.1 | 81.6 | 69.2 | 78.7 | 74.7 |
| SimTrack-B/16 [33] | ECCV22 | 69.4 | 78.0 | 64.3 | 82.5 | 87.0 | 80.4 | 69.6 | 78.6 | 74.1 |
| OSTrack-256 [7] | ECCV22 | 71.0 | 80.4 | 68.2 | 83.1 | 87.8 | 82.0 | 69.1 | 78.7 | 75.2 |
| SpectralTracker | Ours | 73.5 | 82.9 | 69.8 | 83.4 | 88.0 | 82.1 | 68.9 | 78.2 | 74.5 |

Table 2. Comparison with the state-of-the-art on UAV123 [34] and TNL2K [35] in terms of success score (AUC). The best two results are shown in red and blue fonts, respectively.

| Method | UAV123 [34] | TNL2K [35] |
|---|---|---|
| SiamBAN [4] | 63.1 | 41.0 |
| STMTracker [36] | 64.7 | – |
| ToMP-50 [37] | 66.9 | – |
| TransT [5] | 68.1 | 50.7 |
| STARK [6] | 68.2 | – |
| MixFormer-1k [13] | 68.7 | – |
| TransInMo∗ [38] | 69.0 | 52.0 |
| OSTrack-256 [7] | 68.3 | 54.3 |
| SimTrack-B/16 [33] | 69.8 | 54.8 |
| SpectralTracker (Ours) | 70.2 | 55.8 |

GOT-10k. GOT-10k [24] is a large-scale dataset containing over 10,000 videos of real-world moving objects. Notably, GOT-10k proposes a protocol requiring trackers to be trained only with its training set. We also follow this protocol to train our model and submit the results to the official platform for evaluation. As shown in Table 1, we compare our tracker's average overlap (AO) and success rates (SR0.5 and SR0.75) with those of other state-of-the-art trackers and achieve a new state-of-the-art performance.

TrackingNet. TrackingNet [25] is a recently released large-scale short-term tracking dataset that provides 511 video sequences in its test set. Our evaluation results on the TrackingNet test set are shown in Table 1. Our proposed method obtains a success score (AUC) of 83.4%, a normalized precision score (Pnorm) of 88.0%, and a precision score (P) of 82.1%.

UAV123. UAV123 [34] is a large-scale aerial tracking benchmark consisting of 123 challenging sequences with more than 112K frames captured from low-altitude UAVs. As shown in Table 2, our SpectralTracker achieves the best performance.

LaSOT. LaSOT [23] is a large-scale benchmark with high-quality annotations, consisting of 1,400 challenging sequences with an average video length of over 2,500 frames. Compared with the top-performing trackers in Table 1, our model performs well on these challenging videos.

TNL2K. We also evaluate our tracker on the TNL2K [35] benchmark, which includes 700 video sequences. The results are shown in Table 2. It can be observed that SpectralTracker performs the best among all compared trackers.

4.3 Ablation Studies

In order to explore the role of each component of the proposed tracker, we carry out comprehensive ablation studies on the GOT-10k dataset. First, we discuss the use of the high- and low-frequency branches, then explore the effectiveness of the shared MLP. Finally, we discuss the number of DSMs that make up the backbone. The comparison results are shown in Table 3.

High and Low-Frequency Branches. Comparing the experimental results of high-frequency modeling (experiment 1) and low-frequency modeling (experiment 2), it is apparent that the former outperforms the latter significantly. The reason for this difference lies in the ability of the high-frequency branch to capture local details of the target object, while the low-frequency branch captures the overall shape or structure of the scene or target. However, comparing the results of experiments 1, 2 and 6 shows that the joint modeling of high-frequency and low-frequency components yields better results than either of the two alone. The joint modeling approach significantly alleviates the limitations of single-frequency modeling and spatial-domain modeling.

Table 3. Ablation studies on the variants of our tracker on the GOT-10k benchmark [24].

| # | High-Frequency Branch | Low-Frequency Branch | Shared MLP | Number of DSMs | AO | SR0.5 | SR0.75 |
|---|---|---|---|---|---|---|---|
| 1 | ✓ | | | | 0.708 | 0.801 | 0.662 |
| 2 | | ✓ | | | 0.587 | 0.686 | 0.464 |
| 3 | ✓ | ✓ | | | 0.701 | 0.801 | 0.644 |
| 4 | ✓ | ✓ | ✓ | 8 | 0.703 | 0.802 | 0.651 |
| 5 | ✓ | ✓ | ✓ | 10 | 0.715 | 0.808 | 0.669 |
| 6 | ✓ | ✓ | ✓ | 12 | 0.735 | 0.829 | 0.698 |

Shared MLP. Compared with the strong result of experiment 1, experiment 3, which introduces the low-frequency branch, leads to a decrease in performance; this shows that fully independent high- and low-frequency branches cannot further improve tracking performance. Observing experiments 1, 3 and 6, introducing the shared MLP strategy significantly improves the performance of dual-spectral tracking. With this strategy, it is easier to ensure that the two independent dual-spectral branches model the same positions, which further improves the result.

Number of DSMs. Across experiments 4, 5 and 6, the overall performance of the tracker gradually improves as the number of DSMs increases. However, as more DSMs are used in the backbone, training and testing become slower, the model becomes larger and more cumbersome, and the improvement becomes relatively less pronounced. Therefore, we do not sacrifice other performance factors to pursue marginally better results and limit the number of DSMs to 12.

5 Conclusion

In this work, we proposed a novel, simple, Transformer-based tracking framework, which fully explores the joint capture of high- and low-frequency signals of the spectral domain for robust tracking. We designed a Dual-Spectral Module (DSM) that enables the extraction and aggregation of information in the frequency domain and consists only of a high-frequency and a low-frequency branch. Furthermore, we introduced a shared MLP to guide the dual-spectral branches to intensify high- and low-frequency attention in a more polarized manner at the same location. Extensive experiments on five visual tracking benchmarks demonstrate that our SpectralTracker obtains state-of-the-art performance.


Acknowledgment. This work was supported by the Project of Guangxi Science and Technology (No. 2022GXNSFDA035079), the National Natural Science Foundation of China (No. 61972167 and U21A20474), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Collaborative Innovation Center of Multisource Information Integration and Intelligent Processing, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Research Project of Guangxi Normal University (No. 2022TD002).

References

1. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
2. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR (2018)
3. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR (2019)
4. Chen, Z., Zhong, B., Li, G., Zhang, S., Ji, R.: Siamese box adaptive network for visual tracking. In: CVPR (2020)
5. Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer tracking. In: CVPR (2021)
6. Yan, B., Peng, H., Fu, J., Wang, D., Lu, H.: Learning spatio-temporal transformer for visual tracking. In: ICCV, pp. 10428–10437 (2021)
7. Ye, B., Chang, H., Ma, B., Shan, S.: Joint feature learning and relation modeling for tracking: a one-stream framework. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13682. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2_20
8. Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: CVPR (2020)
9. Cui, Y., Jiang, C., Wang, L., Wu, G.: Target transformed regression for accurate tracking. arXiv Computer Vision and Pattern Recognition (2021)
10. Wang, N., Zhou, W., Wang, J., Li, H.: Transformer meets tracker: exploiting temporal context for robust visual tracking. In: CVPR (2021)
11. Pan, Z., Cai, J., Zhuang, B.: Fast vision transformers with HiLo attention. CoRR (2022)
12. Si, C., Yu, W., Zhou, P., Zhou, Y., Wang, X., Yan, S.: Inception transformer. CoRR (2022)
13. Cui, Y., Jiang, C., Wang, L., Wu, G.: MixFormer: end-to-end tracking with iterative mixed attention. In: CVPR, pp. 13598–13608. IEEE (2022)
14. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7
15. Zheng, Y., Zhong, B., Liang, Q., Tang, Z., Ji, R., Li, X.: Leveraging local and global cues for visual tracking via parallel interaction network. IEEE Trans. Circuits Syst. Video Technol. 33(4), 1671–1683 (2022)
16. Zhao, M., Okada, K., Inaba, M.: TrTr: visual tracking with transformer. arXiv preprint arXiv:2105.03817 (2021)
17. Park, N., Kim, S.: How do vision transformers work? In: ICLR (2022)
18. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: ICLR (2021)
19. Wu, H., et al.: CvT: introducing convolutions to vision transformers. In: ICCV, pp. 22–31. IEEE (2021)
20. Vaswani, A., et al.: Attention is all you need. In: Neural Information Processing Systems (2017)
21. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)
22. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
23. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR (2019)
24. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1562–1577 (2021)
25. Müller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_19
26. Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: ICCV (2019)
27. Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 771–787. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_46
28. Zhang, Z., Liu, Y., Wang, X., Li, B., Hu, W.: Learn to match: automatic matching network design for visual tracking. In: ICCV (2021)
29. Mayer, C., Danelljan, M., Paudel, D.P., Gool, L.V.: Learning target candidate association to keep track of what not to track. In: ICCV, pp. 13424–13434. IEEE (2021)
30. Ma, F., et al.: Unified transformer tracker for object tracking. In: CVPR (2022)
31. Song, Z., Yu, J., Chen, Y.P., Yang, W.: Transformer tracking with cyclic shifting window attention. In: CVPR (2022)
32. Gao, S., Zhou, C., Ma, C., Wang, X., Yuan, J.: AiATrack: attention in attention for transformer visual tracking. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. LNCS, vol. 13682, pp. 146–164. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2_9
33. Chen, B., et al.: Backbone is all your need: a simplified architecture for visual object tracking. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV. LNCS, vol. 12356. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-58621-8_6
34. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27
35. Wang, X., et al.: Towards more flexible and accurate object tracking with natural language: algorithms and benchmark. In: CVPR (2021)
36. Fu, Z., Liu, Q., Fu, Z., Wang, Y.: STMTrack: template-free visual tracking with space-time memory networks. In: CVPR (2021)
37. Mayer, C., et al.: Transforming model prediction for tracking. In: CVPR (2022)
38. Guo, M., et al.: Learning target-aware representation for visual tracking via informative interactions. In: Raedt, L.D. (ed.) IJCAI (2022)

DiffusionTracker: Targets Denoising Based on Diffusion Model for Visual Tracking

Runqing Zhang, Dunbo Cai, Ling Qian, Yujian Du(B), Huijun Lu, and Yijun Zhang

China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215153, China
zhangrunqing [email protected], [email protected]

Abstract. The problem of background clutter (BC) is caused by distractors in the background that resemble the target's appearance, thereby reducing the precision of visual trackers. We consider these similar distractors as noise and formulate the visual tracking problem as a denoising task. We propose a target denoising method based on a diffusion model for visual tracking, referred to as DiffusionTracker, which introduces the diffusion model to distinguish between targets and noise (distractors). Specifically, we introduce a reverse diffusion model to eliminate noisy distractors from the proposal candidates generated by the Siamese tracking backbone. To handle the difficulty that distractors do not strictly conform to a Gaussian distribution, we incorporate Spatial-Temporal Weighting (STW) to integrate spatial correlation and noise-decay time information, mitigating the impact of the noise distribution on denoising effectiveness. Experimental results demonstrate the effectiveness of the proposed method: DiffusionTracker achieves a precision of 64.0% and a success rate of 63.8% on the BC sequences of the LaSOT test dataset, representing improvements of 11.7% and 10.2%, respectively, over state-of-the-art trackers. Furthermore, our proposed method can be seamlessly integrated as a plug-and-play module with cutting-edge tracking algorithms, significantly improving the success rate of the tracking task in background clutter scenarios.

Keywords: Visual Tracking · Diffusion Model · Siamese Network · Deep Learning · Denoising

1 Introduction

Visual tracking is one of the fundamental tasks in computer vision, widely used in civil and military fields [8,30]. In real-world applications, there are often distractors around the target that resemble it in appearance, which is known as the background clutter (BC) problem, as shown in Fig. 1. This problem causes the tracker to fail to distinguish between the target and the distractors (Fig. 2).


Fig. 1. The distractors (yellow) in (a) bear-17, (b) bicycle-7, (c) bird-2, and (d) bird-15 from the LaSOT dataset. (Color figure online)

Fig. 2. The framework of our DiffusionTracker.

Current state-of-the-art trackers have improved the extraction of feature embeddings by employing deeper network structures, attention mechanisms, and search area constraints [4,15,28], which rely on target appearance information to enhance tracking performance. However, these methods overlook the potential spatial probability distribution relationship between distractors and the target, leading to difficulties in distinguishing background objects that are similar to the target and resulting in tracking loss.

This paper proposes a method that addresses the tracking task by converting it into a denoising problem. Our approach utilizes a diffusion model to remove distractors, which are treated as noise, in the background that share a similar appearance with the target. Specifically, we first construct a tracking pipeline to generate candidates that contain targets and distractors, each assigned a confidence score. Then, we train the diffusion model to explore the potential probability distribution of the distractors and remove them during the backward diffusion process. By leveraging the spatial probability distribution relationship between the distractors and the target, together with a Spatial-Temporal Weighting (STW) strategy, our method achieves more accurate target tracking in the presence of background clutter. Our contributions are summarized as follows:

1. The proposed DiffusionTracker introduces reverse diffusion to gradually eliminate noisy candidates for distractor discrimination, which improves the tracking success rate in scenarios containing background clutter.
2. Based on spatial-temporal weighting, the proposed tracker utilizes spatial distribution information and noise-decay time to enhance the confidence estimation of distractors, enabling noise samples that do not follow a Gaussian distribution to also benefit from reverse diffusion.
3. The proposed method achieves superior performance on four public datasets: LaSOT, GOT-10k, TrackingNet, and OTB-2015. Moreover, the proposed method can be combined with cutting-edge tracking algorithms as a module to improve tracking performance in scenes with background clutter.

2 Related Works

This section discusses the algorithms that are closely related to this paper, including visual tracking based on the Siamese network and the diffusion model.

2.1 Visual Tracking Based on Siamese Network

Siamese networks and self-correlation networks are two popular approaches in visual tracking that have gained significant attention due to their effectiveness and unique characteristics. Siamese networks revolutionized the tracking task by employing end-to-end networks that extract adaptive cross-correlation features to discriminate between the target and the background. SiamRPN [16], a notable method in this category, improves tracking efficiency by introducing anchor-based bounding-box regression instead of multi-scale estimation. Building upon SiamRPN, SiamRPN++ [15] enhances feature extraction by using ResNet as the backbone network, resulting in more robust feature maps. Another approach, target-aware deep tracking (TADT) [18], incorporates attention mechanisms based on Siamese networks to select effective feature channels. Building on the Siamese network, Transformer-based methods such as Transformer tracking (TransT) [4] utilize self-attention layers instead of cross-correlation layers, effectively incorporating semantic information such as target posture and spatial self-structure. This technique significantly enhances feature extraction by capturing rich semantic information. Despite the enhancement of feature embeddings that describe the appearance of objects, these tracking algorithms have not fully utilized the distribution information of similar distractors and the target. As a result, the success rate of tracking in background clutter scenarios still requires further improvement.

2.2 Diffusion Model

Diffusion models progressively add noise to image data and then learn to recover the original data. These models add noise in a smooth manner and estimate the score function to guide the denoising process. Diffusion models have been extensively utilized in various image processing tasks, such as image super-resolution, semantic segmentation, and anomaly detection [1,3,17,20]. For image-resolution tasks, methods like Super-Resolution via Repeated Refinement (SR3) [23] and the Cascaded Diffusion Model (CDM) [12] employ diffusion models to achieve high-quality results in super-resolution and inpainting. Some approaches [21,24] utilize pre-trained autoencoders to shift the diffusion process to the latent space for efficient training. In semantic segmentation, generative pre-training with diffusion models has proven effective in leveraging learned representations for accurate label utilization; methods like Decoder Denoising Pretraining (DDeP) [2] and ODISE [27] have demonstrated promising results in label-efficient semantic segmentation. Diffusion models also excel in anomaly detection by modeling normal data and reconstructing healthy approximations: AnoDDPM [26] and DDPM-CD [10] utilize diffusion models to detect anomalies in input images, outperforming adversarial-training-based alternatives. While diffusion models have found widespread applications, their utilization in detection or tracking tasks remains challenging. In this context, our paper represents the first attempt to apply a diffusion model to a tracking task, serving as an academic exploration of generative algorithms.

Fig. 3. The reverse diffusion process for time t during inference.

3 Method

In this section, we introduce our DiffusionTracker, which distinguishes targets from distractors. We describe the architecture, the training process, and the inference process in Sect. 3.1, Sect. 3.2, and Sect. 3.3, respectively.

3.1 Architecture

The network architecture encompasses a primary tracking framework network and a reverse diffusion model with Spatial-Temporal Weighting.


Tracking Framework. The tracking framework network, which is based on a Siamese network, is utilized for extracting image features and generating multiple proposal bounding boxes; it consists of input heads, feature embedding, and prediction heads.

Fig. 4. Some examples of the datasets for experiments.

The Siamese network contains two input heads: the template branch for the template image patch $Z$ and another head for the search region patch $X$. The template patch and the search region patch are reshaped to $3 \times 128 \times 128$ and $3 \times 256 \times 256$, respectively. For feature embedding extraction, we modify the backbone network, utilizing a modified version of ResNet50 [11] with the convolution stride of the down-sampling layer in the fourth stage set to 1 to enhance feature resolution for the tracking task. The output of the fourth stage of ResNet50 is taken as the feature outputs $f_Z$ and $f_X$ of the two branches. In order to capture the correlation between the template feature map $f_Z$ and the search region feature map $f_X$, a self-attention layer is employed to generate a feature map $f_S$ that encapsulates semantic information. The prediction head comprises two branches that predict 1024 coordinates: a classification branch for confidence scores and a bounding-box regression branch. Each branch consists of a multi-layer perceptron (MLP) with three linear layers and a ReLU activation function.

Reverse Diffusion Model. The reverse diffusion model $f_\theta$ with learnable parameters $\theta$ is employed to discern candidate boxes resembling the target object from proposal bounding boxes containing noise, as shown in Fig. 3. The distractors are removed, while the valid boxes are retained.

Feature Encoder. To avoid extracting high-level features at each step of the reverse diffusion process, the feature encoding network shares parameters with the tracking framework network. Furthermore, during the tracking of each frame, the image features $f_S$ are extracted in a single pass. In the reverse diffusion process, noisy boxes $z_T$ are generated by Gaussian random sampling based on the previous frame's target position, where $T$ is the total number of reverse diffusion steps. Corresponding features are then extracted from the image features for each bounding box.

Tracking Decoder. In the reverse diffusion model, the tracking decoder is used to find the noise $\varepsilon_0$ among the noisy boxes $z_t$, where $t \in [1, 2, \ldots, T]$ is the reverse step. The tracking decoder separates the boxes from the encoder into valid boxes and noise; the noise boxes are removed, and the valid boxes are retained, as shown in Fig. 3. Similar to Sparse R-CNN, there are regression and classification heads in the tracking decoder. The output features obtained from the feature encoder are fed into the detection head of the decoder, yielding regression and classification results for the boxes.

Spatial-Temporal Weighting. Considering that real distractors may not strictly follow a Gaussian distribution, we introduce the spatial-temporal weighting (STW) strategy. We use the proposal boxes from the tracking framework as candidates, and the boxes from the reverse diffusion model serve as weighted reference samples for noise removal. Since the reverse diffusion model and the proposal boxes do not strictly correspond on a one-to-one basis, we design two weighting methods based on the spatial and temporal aspects. Firstly, in terms of spatial weighting, we calculate the Intersection over Union (IoU) between a predicted noisy box $z_t$ from the reverse diffusion model and a proposal box $B$ to measure their correlation; a higher IoU indicates a higher likelihood of the proposal box being noise. Secondly, in terms of temporal weighting, during the reverse diffusion process, boxes that disappear earlier (corresponding to smaller $t$) are more likely to be distractors.
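The sketch below illustrates one way the spatial and temporal terms could be combined to down-weight proposals that overlap boxes judged to be noise. The paper does not give the exact weighting formula, so the multiplicative combination and the linear decay over $t$ are our assumptions.

```python
# Illustrative sketch of Spatial-Temporal Weighting (STW): a proposal overlapping a box
# flagged as noise is penalized (spatial term), and boxes discarded earlier in the reverse
# process (smaller t) contribute a larger penalty (temporal term).
import torch
from torchvision.ops import box_iou


def stw_scores(proposals, proposal_scores, noise_boxes, noise_steps, total_steps):
    # proposals: (N, 4) boxes from the tracking framework, proposal_scores: (N,)
    # noise_boxes: (M, 4) boxes flagged as noise, noise_steps: (M,) step t of their removal
    if noise_boxes.numel() == 0:
        return proposal_scores
    iou = box_iou(proposals, noise_boxes)                  # (N, M) spatial correlation
    time_w = 1.0 - noise_steps.float() / total_steps       # earlier removal -> larger weight
    penalty, _ = (iou * time_w.unsqueeze(0)).max(dim=1)    # strongest noise evidence per proposal
    return proposal_scores * (1.0 - penalty)               # down-weight likely distractors
```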

3.2 Training Process

Given a group of noisy boxes $z_t = [b_0, b_1, \ldots, b_n, \ldots, b_N]$, $n \in [1, 2, \ldots, N]$, we utilize the reverse process to distinguish the distractors (noise) $\varepsilon_0$ from the target ($z_t - \varepsilon_0$). Here $b_n$ is the coordinate vector of a bounding box, including the center position, width, and height. The purpose of the training process is to learn a network $f_\theta$ that estimates $\varepsilon_0$ for each step $t$. We model the denoising process as the reverse diffusion process. Given a group of noisy boxes $z_t$ and the input image $X$, the objective function can be represented as

$L_{train} = \frac{1}{2} \| f_\theta(z_t, t, X) - \varepsilon_0 \|^2$,  (1)

where $\hat{\varepsilon}_0 = f_\theta(z_t, t, X)$ is the predicted value of $\varepsilon_0$. Both $z_t$ and $\varepsilon_0$ are unknown for each step $t$; we obtain these samples during the forward process of the diffusion model. Given an input image $X$ with the target's bounding box $z_0$ obeying the distribution $q(z_0)$, the noisy box groups $z_1, z_2, \ldots, z_t, \ldots, z_T$ are obtained by cumulatively adding Gaussian noise $\varepsilon_0 \sim \mathcal{N}(0, I)$:

$z_t = \sqrt{\alpha_t}\, z_{t-1} + \sqrt{1 - \alpha_t}\, \varepsilon_0$,  (2)

where the variance of the Gaussian distribution is $\beta_t \in (0, 1)$ and $\alpha_t = 1 - \beta_t$. According to the linear properties of the Gaussian distribution, $z_T$ can be represented by $z_0$, $\varepsilon_0$, and $\{\alpha_t\}$:

$z_T = \bar{\alpha}_T^{\frac{1}{2}} z_0 + (1 - \bar{\alpha}_T)^{\frac{1}{2}} \varepsilon_0$,  (3)

where $\bar{\alpha}_T = \prod_{t=0}^{T} \alpha_t$. During the forward process, the $\{z_t\}$ from each step and $\varepsilon_0$ are recorded as the input and the label in Eq. (1).
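A minimal sketch of the forward noising process of Eqs. (2)–(3) and the training objective of Eq. (1) is shown below; the linear beta schedule and the box parameterization are illustrative assumptions.

```python
# Forward noising (Eqs. (2)-(3)) and the noise-prediction training objective (Eq. (1)).
import torch


def make_alpha_bar(total_steps=1000, beta_start=1e-4, beta_end=0.02):
    betas = torch.linspace(beta_start, beta_end, total_steps)
    alphas = 1.0 - betas
    return betas, alphas, torch.cumprod(alphas, dim=0)   # beta_t, alpha_t, alpha-bar_t


def q_sample(z0, t, alpha_bar, eps):
    # Eq. (3): z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = alpha_bar.to(z0.device)
    a = alpha_bar[t].view(-1, 1, 1)
    return a.sqrt() * z0 + (1.0 - a).sqrt() * eps


def training_step(model, z0, image, alpha_bar):
    # z0: (B, N, 4) boxes built around the ground-truth target; model predicts the noise.
    b = z0.size(0)
    t = torch.randint(0, alpha_bar.numel(), (b,), device=z0.device)
    eps = torch.randn_like(z0)
    zt = q_sample(z0, t, alpha_bar, eps)
    eps_hat = model(zt, t, image)                        # f_theta(z_t, t, X)
    return 0.5 * ((eps_hat - eps) ** 2).mean()           # Eq. (1)
```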

3.3 Inference Process

During the reverse diffusion process, the noisy boxes $z_T$ are restored to $z_0$ step by step. According to Bayes' rule, the distribution of $z_{t-1}$ can be represented as

$q(z_{t-1} \mid z_t, z_0) = q(z_t \mid z_{t-1}, z_0)\, \dfrac{q(z_{t-1} \mid z_0)}{q(z_t \mid z_0)}$,  (4)

where $q(z_t \mid z_{t-1}, z_0)$, $q(z_{t-1} \mid z_0)$, and $q(z_t \mid z_0)$ can be calculated as

$q(z_{t-1} \mid z_0) = \sqrt{\bar{\alpha}_{t-1}}\, z_0 + \sqrt{1 - \bar{\alpha}_{t-1}}\, \varepsilon_0 \sim \mathcal{N}(\sqrt{\bar{\alpha}_{t-1}}\, z_0,\; 1 - \bar{\alpha}_{t-1})$,  (5)

$q(z_t \mid z_0) = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon_0 \sim \mathcal{N}(\sqrt{\bar{\alpha}_t}\, z_0,\; 1 - \bar{\alpha}_t)$,  (6)

$q(z_t \mid z_{t-1}, z_0) = \sqrt{\alpha_t}\, z_{t-1} + \sqrt{1 - \alpha_t}\, \varepsilon_0 \sim \mathcal{N}(\sqrt{\alpha_t}\, z_{t-1},\; 1 - \alpha_t)$.  (7)

Substituting Eqs. (5), (6), and (7) into Eq. (4), we obtain

$q(z_{t-1} \mid z_t, z_0) \propto \exp\!\Big(-\tfrac{1}{2}\Big(\big(\tfrac{\alpha_t}{\beta_t} + \tfrac{1}{1-\bar{\alpha}_{t-1}}\big) z_{t-1}^2 - 2\big(\tfrac{\sqrt{\alpha_t}}{\beta_t} z_t + \tfrac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} z_0\big) z_{t-1} + C(z_t, z_0)\Big)\Big)$,  (8)

where $C(z_t, z_0)$ is the set of terms in Eq. (8) that are not related to $z_{t-1}$. According to the form of the Gaussian probability density function, Eq. (8) can be balanced by completing the square, and the mean $\mu_{z_t}$ and variance $\sigma_{z_t}^2$ can be calculated as

$\mu_{z_t} = \dfrac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, z_t + \dfrac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, z_0$,  (9)

$\sigma_{z_t}^2 = \dfrac{1}{\frac{\alpha_t}{\beta_t} + \frac{1}{1 - \bar{\alpha}_{t-1}}}$.  (10)

By substituting Eq. (2) into Eq. (9), we obtain

$\mu_{z_t} = \dfrac{1}{\sqrt{\alpha_t}}\Big(z_t - \dfrac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \hat{\varepsilon}_0\Big)$.  (11)

During the inference process, $\hat{\varepsilon}_0$ is predicted by the diffusion network $f_\theta$.
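One reverse step using the posterior statistics of Eqs. (9)–(11) could look as follows. The schedule tensors come from the training sketch above, and the model signature is assumed.

```python
# Sketch of a single reverse diffusion step using Eqs. (10)-(11).
import torch


@torch.no_grad()
def p_sample(model, zt, t, image, betas, alphas, alpha_bar):
    eps_hat = model(zt, torch.full((zt.size(0),), t, device=zt.device), image)
    a_t, ab_t = alphas[t], alpha_bar[t]
    # Eq. (11): posterior mean expressed through the predicted noise.
    mean = (zt - betas[t] / (1.0 - ab_t).sqrt() * eps_hat) / a_t.sqrt()
    if t == 0:
        return mean
    # Eq. (10): posterior variance, using alpha-bar_{t-1}.
    ab_prev = alpha_bar[t - 1]
    var = 1.0 / (a_t / betas[t] + 1.0 / (1.0 - ab_prev))
    return mean + var.sqrt() * torch.randn_like(zt)
```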

Table 1. Ablation study on the LaSOT dataset.

| Method | Diffusion model | STW | Success – All (%) | Success – only BC (%) |
|---|---|---|---|---|
| Baseline | | | 64.9 | 57.9 |
| DiffusionTracker- | ✓ | | 63.7 | 59.3 |
| DiffusionTracker | ✓ | ✓ | 67.3 | 63.8 |

4 Experiments

To validate the effectiveness of the proposed method, we carry out extensive experiments comparing with state-of-the-art methods on four challenging datasets, including LaSOT [9], GOT-10k [14], TrackingNet [19], and OTB-2015 [25], as shown in Fig. 4. In this section, we first provide the implementation details of the proposed method in Sect. 4.1. Then, the ablation study, the general dataset evaluation on the four datasets, the tracking attribute evaluation, and the compatibility experiment are presented in Sect. 4.2, Sect. 4.3, Sect. 4.4, and Sect. 4.5, respectively.

4.1 Implementation Details

The training sets of TrackingNet, LaSOT, and GOT-10k are used to train the proposed tracking method. The training samples consist of image pairs from the same sequence with common data augmentation techniques, such as translation and brightness correction. The backbone branch is initialized with ResNet50, pre-trained on the ImageNet dataset [22]. The proposed model is trained using AdamW, with the learning rate of the backbone set to 1e−5 and the learning rate of the other branches set to 1e−4. The training process and subsequent experiments were conducted on a computer equipped with V100 GPUs.

4.2 Ablation Study

The proposed tracking method takes advantage of both reverse diffusion and STW. We perform an ablation study of these two modules on the LaSOT test dataset. The experimental results are listed in Table 1. The baseline method TransT [4] is based on the self-attention module. The advantages of the proposed method are obvious in terms of both position precision and success rate. Compared with the baseline method TransT [4], the proposed method achieves higher precision and success rate by utilizing reverse diffusion and STW. The utilization of the reverse diffusion model enables effective differentiation of distractors. However, the distractors do not strictly adhere to a Gaussian distribution, which limits the effectiveness of the diffusion model. By incorporating the STW module, our approach employs spatial-temporal confidence weighting, allowing non-Gaussian background objects to benefit from the guidance of the diffusion model. Through the integration of the diffusion model and the STW module, our method achieves more accurate target detection, resulting in an improved overlap rate between the detection box and the ground truth.

4.3 General Datasets Evaluation

We compare the proposed method against the state-of-the-art tracking methods, including TransT [4], SiamFC++ [28], SiamMask [13], JCAT [29], MixFormer [5], TADT [18], ASRCF [6], ECO [7], and SiamRPN++ [15]. The overall performance (%) on LaSOT [9] and other datasets is shown in Fig. 5 and Table 2.

Fig. 5. Quantitative ablation study on LaSOT test dataset.

In the LaSOT test dataset, the occurrence of background clutter presents a similar phenomenon to the presence of distractors in the background. As shown in Fig. 5, the proposed method outperforms the baseline with a precision of 66.5% and a success rate of 67.3%. This improvement can be attributed to the effective enhancement of the tracker's discriminative capability achieved through the reverse diffusion module, leading to an enhanced success rate in tracking.

Table 2. General dataset evaluation on OTB-2015, GOT-10k, and TrackingNet.

| Method | OTB-2015 APE | OTB-2015 AOR | GOT-10k AO | GOT-10k SR0.5 | TrackingNet Precision | TrackingNet NormPrecision | TrackingNet Success |
|---|---|---|---|---|---|---|---|
| SiamFC++ [28] | 91.5 | 68.3 | 59.5 | 69.5 | 75.4 | 70.4 | 80.0 |
| SiamRPN++ [15] | 90.3 | 69.6 | 51.7 | 61.5 | 73.3 | 69.4 | 80.0 |
| TADT [18] | 78.4 | 65.0 | 36.7 | 38.9 | 54.0 | 67.1 | 59.3 |
| JCAT [29] | 87.0 | 63.5 | 66.6 | 76.3 | 74.6 | 68.8 | 78.3 |
| ECO [7] | – | 69.1 | 31.6 | 30.9 | 56.1 | 48.9 | 62.1 |
| ASRCF [6] | 92.2 | 69.2 | 31.3 | 31.7 | 30.2 | 15.3 | 27.3 |
| TransT [4] | – | 69.4 | 72.3 | 82.4 | 81.4 | 86.7 | 80.3 |
| SiamMask [13] | – | – | 51.4 | 58.7 | 66.4 | 77.8 | 72.5 |
| Ours | 92.4 | 70.4 | 72.5 | 82.7 | 81.6 | 87.2 | 80.9 |


Other datasets, namely GOT-10k [14], TrackingNet, and OTB-2015, are also employed for comparison in Table 2. Specifically, for GOT-10k, AO denotes the average overlap rate and SR0.5 denotes the success rate, where 0.5 is the overlap threshold of the success rate. Leveraging spatial-temporal and distractor-distribution information, the proposed method effectively extracts more precise features for challenging targets, yielding superior performance with 81.6% precision on TrackingNet and 92.4% APE on OTB-2015. The proposed method is capable of utilizing high-resolution image information as input for the diffusion model, assisting the reverse diffusion process in differentiating similar background objects and selecting accurate target candidate regions. Our method consistently achieves better results than state-of-the-art tracking methods in terms of APE and AOR.
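For reference, the two GOT-10k metrics mentioned above can be computed from per-frame overlaps as in this small helper.

```python
# AO is the average per-frame overlap (IoU); SR@0.5 is the fraction of frames whose
# overlap exceeds the 0.5 threshold.
def ao_and_sr(per_frame_ious, threshold=0.5):
    n = len(per_frame_ious)
    ao = sum(per_frame_ious) / n
    sr = sum(1 for iou in per_frame_ious if iou > threshold) / n
    return ao, sr
```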

Fig. 6. Quantitative experiment on the LaSOT test dataset.

4.4 Attributes Evaluation

We compare our method with state-of-the-art tracking methods on the LaSOT test dataset under different attributes, including Partial Occlusion (PO), Background Clutter (BC), Out-of-View (OV), Fast Motion (FM), Full Occlusion (FO), Scale Variation (SV), Rotation (RO), and Deformation (DEF). Figure 6 illustrates the superior performance of our method compared to the baseline approach across these challenging attributes. Moreover, Fig. 7 allows a qualitative comparison between the proposed method and the baseline algorithm TransT [4] on sequences containing background clutter from the LaSOT test dataset. Benefiting from the reverse diffusion and STW modules, our proposed method achieves a tracking success rate of 63.8% and a tracking precision of 64.0%.

4.5 Compatibility Experiment

We combine the proposed method as a pluggable module with state-of-the-art tracking algorithms. In this way, we can differentiate targets from distractors in the proposal candidates provided by cutting-edge tracking algorithms. Our experiments are conducted on the sequences of the LaSOT test dataset that carry the BC attribute. As shown in Table 3 and Fig. 7, the proposed method can be effectively integrated with tracking algorithms that adopt the proposal-candidate strategy, leading to an improvement of over 4% in tracking success rate in background clutter scenarios.
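The plug-and-play use described here could be wired up roughly as follows; `base_tracker.propose`, `reverse_diffusion`, and `stw_scores` (the weighting sketch shown earlier) are assumed interfaces introduced for illustration, not APIs defined by the paper.

```python
# Attaching the denoiser to an existing tracker that exposes proposal candidates.
def track_frame(base_tracker, reverse_diffusion, frame, template):
    boxes, scores, feat = base_tracker.propose(frame, template)       # proposal candidates
    noise_boxes, noise_steps, total_steps = reverse_diffusion(feat, boxes)
    reweighted = stw_scores(boxes, scores, noise_boxes, noise_steps, total_steps)
    best = reweighted.argmax()                                        # keep the most target-like box
    return boxes[best]
```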

Fig. 7. Qualitative comparison on sequences containing background clutter from the LaSOT test dataset.

Table 3. Compatibility experiment on the sequences containing the BC problem in the LaSOT test dataset.

| Method | Precision (%) | Success (%) | Precision (+ours) (%) | Success (+ours) (%) |
|---|---|---|---|---|
| SiamFC++ [28] | 46.7 | 48.0 | 48.1 ↑ | 52.9 ↑ |
| SiamRPN++ [15] | 42.6 | 44.9 | 45.7 ↑ | 49.3 ↑ |
| ASRCF [6] | 31.5 | 32.6 | 35.2 ↑ | 37.8 ↑ |

5 Conclusion

This paper proposed a visual tracking method for the background clutter problem, utilizing a Siamese network and a diffusion model. Firstly, reverse diffusion is employed, treating the process of distinguishing the target from background distractors as a denoising process. Secondly, the spatial-temporal weighting module is introduced to support the evaluation of background distractors that do not conform to a Gaussian distribution, determining whether they are noise. According to the experimental results, the proposed method achieves a high success rate while preserving high positional precision for visual tracking.


In the future, to further mitigate the impact of the probability distribution of distractors on the proposed method, we will combine feedback reinforcement learning techniques to model the actual distribution of distractors and incorporate it into the inference process of reverse diffusion. This enhancement could further improve the tracking success rate. Acknowledgment. This work was fully supported by Applied Basic Research Foundation of China Mobile (NO. R23100TM and R23103H0).

References

1. Batzolis, G., Stanczuk, J., Schönlieb, C.B., Etmann, C.: Conditional image generation with score-based diffusion models. In: CVPR, pp. 1–10 (2021)
2. Brempong, E.A., Kornblith, S., Chen, T., Parmar, N., Minderer, M., Norouzi, M.: Denoising pretraining for semantic segmentation. In: CVPR, pp. 4175–4186 (2022)
3. Chen, S., Sun, P., Song, Y., Luo, P.: DiffusionDet: diffusion model for object detection. In: CVPR, pp. 1–10 (2023)
4. Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer tracking. In: CVPR, pp. 1–11 (2021)
5. Cui, Y., Jiang, C., Wang, L., Wu, G.: MixFormer: end-to-end tracking with iterative mixed attention. In: CVPR, pp. 13608–13618 (2022)
6. Dai, K., Wang, D., Lu, H., Sun, C., Li, J.: Visual tracking via adaptive spatially-regularized correlation filters. In: CVPR, pp. 4670–4679 (2019)
7. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M., et al.: ECO: efficient convolution operators for tracking. In: CVPR, pp. 3–14 (2017)
8. Dou, Y., Li, T., Li, L., Zhang, Y., Li, Z.: Tracking the research on ten emerging digital technologies in the AECO industry. J. Constr. Eng. Manag. 149(3), 03123003 (2023)
9. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR, pp. 5374–5383 (2019)
10. Gedara Chaminda Bandara, W., Gopalakrishnan Nair, N., Patel, V.M.: Remote sensing change detection (segmentation) using denoising diffusion probabilistic models. In: CVPR, pp. arXiv-2206 (2022)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
12. Ho, J., Saharia, C., Chan, W., Fleet, D.J., Norouzi, M., Salimans, T.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
13. Hu, W., Wang, Q., Zhang, L., Bertinetto, L., Torr, P.H.: SiamMask: a framework for fast online object tracking and segmentation. PAMI 45(3), 3072–3089 (2023)
14. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. In: PAMI, p. 1 (2019)
15. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR, pp. 4282–4291 (2019)
16. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR, pp. 8971–8980 (2018)
17. Li, H., et al.: SRDiff: single image super-resolution with diffusion probabilistic models. Neurocomputing 479, 47–59 (2022)
18. Li, X., Ma, C., Wu, B., He, Z., Yang, M.H.: Target-aware deep tracking. In: CVPR, pp. 1369–1378 (2019)
19. Müller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_19
20. Özbey, M., et al.: Unsupervised medical image translation with adversarial diffusion models. In: CVPR, pp. 1–10 (2022)
21. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR, pp. 10684–10695 (2022)
22. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)
23. Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J., Norouzi, M.: Image super-resolution via iterative refinement. In: PAMI (2022)
24. Vahdat, A., Kreis, K., Kautz, J.: Score-based generative modeling in latent space. In: NIPS, vol. 34, pp. 11287–11302 (2021)
25. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. PAMI 37(9), 1–1 (2015)
26. Wyatt, J., Leach, A., Schmon, S.M., Willcocks, C.G.: AnoDDPM: anomaly detection with denoising diffusion probabilistic models using simplex noise. In: CVPR, pp. 650–656 (2022)
27. Xu, J., Liu, S., Vahdat, A., Byeon, W., Wang, X., De Mello, S.: Open-vocabulary panoptic segmentation with text-to-image diffusion models. In: CVPR, pp. 2955–2966 (2023)
28. Xu, Y., Wang, Z., Li, Z., Yuan, Y., Yu, G.: SiamFC++: towards robust and accurate visual tracking with target estimation guidelines. In: AAAI, pp. 12549–12556 (2020)
29. Yang, Y., Gu, X.: Joint correlation and attention based feature fusion network for accurate visual tracking. IEEE Trans. Image Process. 32, 1705–1715 (2023)
30. Zhang, J., Yang, X., Wang, W., Guan, J., Ding, L., Lee, V.C.: Automated guided vehicles and autonomous mobile robots for recognition and tracking in civil engineering. Autom. Constr. 146, 104699 (2023)

Instance-Proxy Loss for Semi-supervised Learning with Coarse Labels

Hongyan Wu1, Qinghai Miao1, Haiyun Guo1,2(B), Min Huang1, and Jinqiao Wang1,2,3,4

1 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China  [email protected], {miaoqh,huangm}@ucas.ac.cn
2 Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100864, China  {haiyun.guo,jqwang}@nlpr.ia.ac.cn
3 Peng Cheng Laboratory, Shenzhen 518066, China
4 Wuhan AI Research, Wuhan 430073, China

Abstract. Objects are often organized in a hierarchy where coarse-grained categories comprise subordinate fine-grained classes. Compared with fine-grained labels, coarse-grained labels are much more affordable to obtain. Coarse-grained labels can boost semi-supervised learning (SSL) by offering extra regularization on the feature space of finer-grained recognition, yet they are ignored by most SSL works. An intuitive way to utilize coarse labels for SSL is to impose an extra coarse-grained categorization constraint, but this causes confusion between fine-grained categories belonging to the same coarse-grained category and is therefore sub-optimal for SSL. In this paper, we present an instance-proxy loss (IPL) to boost the separability of the fine-grained classes within the same coarse-grained class, while keeping the intra-class feature space of coarse-grained classes compact. Specifically, IPL includes an instance-level loss and a proxy-level loss to impose constraints on both instance-to-instance and instance-to-proxy relations. Our approach outperforms the state-of-the-art methods on three benchmark datasets, showing significant improvement with a small proportion of fine-grained labels, e.g., it brings a 10.14% accuracy improvement on CUB-200-2011 with 15% of labeled data.

Keywords: Semi-supervised Learning · Fine-grained Visual Categorization · Instance-level loss · Proxy-level loss

1 Introduction

Recently, the impressive progress of deep learning highly depends on large-scale labeled datasets. However, labeling can be a time-consuming task requiring


expert knowledge, especially in the fine-grained visual classification (FGVC) task, which aims at distinguishing subordinate categories of some general classes such as birds [23], cars [11,26] and aircraft [16]. FGVC has attracted much research attention due to its many applications, yet it is challenging to collect datasets with sufficient annotations in many practical FGVC problems. Semi-supervised learning (SSL) is a compelling technique to reduce the dependence on labeled data by utilizing a limited number of labeled data and large amounts of unlabeled data. Recent years have witnessed some progress in dealing with the label scarcity issue of FGVC within the SSL framework.

Objects are inherently organized into a hierarchical structure in which fine categories are grouped into coarse categories. Coarse-grained labels are an effective alternative for reducing annotation cost, as they can be easily obtained from non-experts, but only relatively few works [8,21] exploit coarse-grained labels in SSL. This paper focuses on the recently proposed task setting [21] where the coarse-grained labels are available for training while only a small proportion of the fine-grained labels can be used.

An intuitive way to leverage coarse-grained labels is to train two separate classifiers for the fine level and the coarse level. Although the coarse-level supervision does help distinguish similar fine-grained classes from different coarse-grained classes (Fig. 1 (a)-top), it reduces the intra-class variance and squeezes the feature space within the same coarse-grained class, eliminating the discrimination between fine-grained classes (Fig. 1 (a)-bottom). This contradiction also arises in unsupervised learning with coarse-grained labels. ANCOR [3] alleviates it by presenting an angular normalization that maximizes the synergy between the contrastive InfoNCE loss and the coarse-grained classification loss. However, ANCOR pushes the query away from its corresponding coarse-grained proxy when imposing the constraint on the positive pair, which may diminish the effectiveness of the synergy. (In contrastive loss, a query and a key form a positive pair if they are data-augmented versions of the same image, and the key is then a positive key; otherwise they form a negative pair and the key is a negative key.)

In this paper, we aim to mitigate this contradiction in semi-supervised learning with coarse-grained labels by introducing the instance-proxy loss (IPL), which takes the strengths of both instance-level and proxy-level losses and corrects the defect of the angular normalization. It not only keeps the compactness within coarse-grained classes (Fig. 1 (b)-top), but also facilitates the separability of fine-grained classes belonging to the same coarse-grained class (Fig. 1 (b)-bottom). Our proposed IPL incorporates an instance-level loss and a proxy-level loss to impose constraints on both instance-to-instance and instance-to-proxy relations. The instance-level loss introduces a coarse-grained constraint into the supervised contrastive loss [10]. (Supervised contrastive loss treats a representation of an image as the query; positive keys are representations belonging to the same class as the query, and negative keys are from different classes. A positive pair means the query and key come from the same class, and a negative pair is the opposite.) The instance-level loss restricts positive keys and negative keys to the same coarse-grained category and increases the distance between negative pairs, providing distinction between


Fig. 1. The tSNE visualization of learned representations. We train two classifiers supervised by 15% fine-grained labels and 100% coarse-grained labels. Stars are proxies for coarse-grained classes (weights of the coarse-level classifier). In the top images, points of different colors denote features from different coarse-grained classes; in the bottom images, points of different colors denote features of different fine-grained classes. (a) CE: trained with cross-entropy (CE) losses for the fine level and the coarse level; (a)-top depicts the feature distribution of all categories, and (a)-bottom shows the feature distribution of coarse-grained class 11. (b) CE+IPL (Ours): trained with the two CE losses and IPL; (b)-top depicts the feature distribution of all categories, and (b)-bottom shows the feature distribution of coarse-grained class 11. Our method not only keeps the compactness within coarse-grained classes (top), but also boosts the separability of fine-grained classes belonging to the same coarse-grained class (bottom). (Color figure online)

fine-grained classes within the same coarse-grained class. Meanwhile, to retain the compact feature space of the coarse-grained class, it still pulls the query close to its coarse-grained proxy (the coarse-level classifier weights). This improves ANCOR by removing the angular normalization of the positive pair, and we further apply it to the semi-supervised scenario to obtain the instance-level loss. The proxy-level loss is proposed to further explore the under-utilized instance-to-proxy relations of unlabeled data. It encourages the query to be close to the proxy of its fine-grained class (the weights of the fine-level classifier) and far away from those of different fine-grained classes, which further ensures the separation of fine-grained classes.

To summarize, our contributions are as follows: (1) We present IPL to improve the performance of SSL with coarse labels by facilitating the separability of fine-grained classes belonging to the same coarse-grained class, while maintaining the compactness within each coarse-grained class. Specifically, IPL is comprised of an instance-level loss and a proxy-level loss, leveraging both instance-to-instance and instance-to-proxy relations to enable a more separable feature space of fine-grained classes under the coarse supervision loss. (2) Our method achieves state-of-the-art (SOTA) performance on three benchmark datasets, CIFAR100 [12], CUB-200-2011 [23], and Semi-iNat [20], showing significant improvement with a small proportion of labeled data.


2 Related Work

Fine-Grained Feature Learning with Coarse-Grained Labels. Recently, researchers have delved into weakly supervised learning with coarse-grained labels. [28] gives a theoretical guarantee for learning fine-grained representations with an instance loss when coarse-grained labels are available, but neglects the contradiction between the instance loss and the coarse-grained classification objective. ANCOR alleviates the problem by introducing an angular normalization module into the contrastive InfoNCE loss. Our instance-level loss is inspired by ANCOR, but we improve it and apply it to the task of SSL with coarse-grained labels and a certain proportion of fine-grained labels. Besides, we also propose a proxy-level loss to exploit the relations between instances and proxies to further facilitate the task, which has been considered by relatively few works. Specifically, HIERMATCH [8] combines SSL methods with a label hierarchy exploration method [4] that trains a set of independent classifiers for different class hierarchies on disentangled features; each classifier uses its own feature as well as the features from finer classifiers to predict, while the gradient is stopped from flowing to the finer level. The hierarchical supervised loss (HL) [21] utilizes the unlabeled data with its coarse-level labels through a CE loss, obtaining the prediction of each coarse-grained class by summing the probabilities of its subordinate fine-grained classes. Both HL and HIERMATCH tackle the task by solving two separate problems, i.e., SSL and the utilization of coarse-grained labels. Differently, we focus on how to combine SSL and coarse-grained labels, and propose to embed the coarse-grained constraint into the SSL framework.

Semi-supervised Learning. Recent advances in self-supervised and semi-supervised learning have driven newly proposed methods that combine them [15,17,24,25,31]. Among them, Self-Tuning [25] is the most related work. It presents a supervised contrastive loss named PGC to mitigate the confirmation bias induced by the model's overfitting to false pseudo-labels. Our work differs from Self-Tuning as follows: (1) We target a different task, SSL with coarse-grained labels and a certain proportion of fine-grained labels. (2) The PGC loss uses a fixed number of negative keys from each different fine-grained category, so it can only be applied to datasets with a small number of classes, since the size of its negative key set grows with the class number and leads to GPU out-of-memory errors. In contrast, our instance-level loss restricts the negative keys to the same coarse-grained class and always uses a constant number of negative keys, which is applicable to large-scale datasets.

3 Method

The framework of SSL with the proposed IPL is illustrated in Fig. 2. It consists of: (1) an encoder B : I → q ∈ R^d, followed by global average pooling; (2) a momentum encoder E : I → k ∈ R^d, which is similar to B but is updated with an exponential moving average of the weights of B; (3) a coarse-grained linear classifier C : q → p_c ∈ R^C with W_c denoting its weights; (4) a fine-grained class


Fig. 2. An overview of our method. The image is augmented twice and fed into the model B and its momentum-updated counterpart E to generate q and k, respectively. q is fed into the coarse classifier C to compute the coarse-level CE loss L_coarse, and into the fine classifier G to compute L_fine for labeled data or to generate the fine pseudo-label ŷ for unlabeled data. k is stored in the positive-key queue of its fine-grained class y (or ŷ) and in the negative-key queues of the other fine-grained classes within the same coarse-grained class as y (or ŷ). IPL consists of L_in and L_pro: L_in uses q, k, Q_pos^{y/ŷ}, Q_neg^{y/ŷ} as well as the weights W_c of C, and L_pro utilizes q and the weights W_f of G. Their combination helps learn a feature space in which different subordinate fine-grained classes are separable and the interiors of coarse-grained classes are compact.

linear classifier G : q → p_f ∈ R^F with weights W_f; (5) a set of queues for the positive keys, {Q_pos^i}_{i=1}^F, where Q_pos^i ∈ R^{d×P} holds P positive keys for each fine-grained class i; (6) a queue set for the negative keys, {Q_neg^i}_{i=1}^F, where Q_neg^i ∈ R^{d×K} holds K negative keys, storing the keys from the other fine-grained classes within the same coarse-grained class as i; (7) the supervised CE losses L_coarse and L_fine; (8) our IPL, which is described below.

Here we describe the training process. For clarity, the process focuses on an unlabeled image I^u with coarse-grained label y_c^u. I^u is augmented twice to generate I_q^u and I_k^u, which are fed into the corresponding encoders to extract q^u = B(I_q^u) and k^u = E(I_k^u). By feeding q^u into the fine-level classifier G, the fine-grained pseudo-label ŷ = argmax_f G(q^u) is obtained. The positive keys comprise k^u and the keys in Q_pos^ŷ, and the negative keys come from Q_neg^ŷ. For the losses, the supervised CE loss for the coarse level, L_coarse, is computed from p^u = C(q^u) and y_c^u, and IPL is applied to the positive keys {k+}, negative keys {k−}, q^u, the positive proxy W_f^ŷ (the ŷ-th row of W_f), and the negative proxies {W_f^−} (the other rows of W_f). For a labeled image I with fine-grained label y and coarse-grained label y_c, we simply replace ŷ with y to compute L_coarse and IPL, and we compute L_fine from p_f = G(q) and y, which is only available for labeled data. Note that these queues are iteratively and progressively updated by replacing the oldest keys with the newly generated ones.
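The per-image bookkeeping described above can be sketched roughly as follows. This is a simplified, single-sample view with hypothetical helper names (e.g., update_queue); batching, data augmentation, negative-queue updates for the sibling fine-grained classes, and the optimizer step are omitted, so it is only an illustration of the flow, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_queue(queue, key):
    """Replace the oldest entry of a (d, N) key queue with the newest key."""
    return torch.cat([key.unsqueeze(1), queue[:, :-1]], dim=1)

def train_step(img_q, img_k, y_coarse, B, E, C, G, pos_queues, y_fine=None):
    q = B(img_q)                                   # query embedding from the encoder
    with torch.no_grad():
        k = E(img_k)                               # key embedding from the momentum encoder

    logits_coarse = C(q)                           # coarse-level CE loss L_coarse
    loss = F.cross_entropy(logits_coarse.unsqueeze(0), torch.tensor([y_coarse]))

    logits_fine = G(q)
    if y_fine is None:                             # unlabeled image: take the fine pseudo-label
        y_fine = int(logits_fine.argmax())
    else:                                          # labeled image: add the fine-level CE term L_fine
        loss = loss + F.cross_entropy(logits_fine.unsqueeze(0), torch.tensor([y_fine]))

    # IPL would be added here from q, k, the class-y queues, W_c and W_f (Sects. 3.1-3.3)
    pos_queues[y_fine] = update_queue(pos_queues[y_fine], k)
    return loss
```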


3.1 Instance-Level Loss

We first review the ANCOR method, then improve it, and introduce it into SSL to attain our instance-level loss.

Revisit of ANCOR. ANCOR is presented to induce synergy between the coarse-level supervised CE loss and the contrastive InfoNCE loss. The coarse-grained supervision loss forces the feature embedding q (query) to shift towards the proxy of its coarse-grained class y_c, represented by W_c^{y_c}, and all features of class y_c will collapse to W_c^{y_c}/\|W_c^{y_c}\|, which conflicts with the contrastive loss attempting to push features in y_c away from each other. To mitigate this contradiction, ANCOR defines an angular normalization, which is applied in the InfoNCE loss. It can be formulated as:

L_{acon} = -\log \frac{\exp(\hat{A}(q, k^{+}, y_c)/\tau)}{\exp(\hat{A}(q, k^{+}, y_c)/\tau) + \sum_{j}\exp(\hat{A}(q, k_{j}^{-}, y_c)/\tau)}   (1)

where \hat{A}(x, z, y) = A(x, y) \cdot A(z, y) and A(x, y) = \frac{x - W_c^{y}}{\|x - W_c^{y}\|}. Here x and W_c^{y} are normalized, x = x/\|x\| and W_c^{y} = W_c^{y}/\|W_c^{y}\| (all features below are normalized). A is the angular normalization module, q is a query representation, k^{+} is a positive key, k_j^{-} is a negative key, and \tau is a temperature hyper-parameter.

Improvement of ANCOR. By expanding the formula of \hat{A}, we give some insights into how ANCOR works and improve it. \hat{A} can be rewritten as:

\hat{A}(x, z, y) = \frac{x \cdot z - x \cdot W_c^{y} - W_c^{y} \cdot z + 1}{\|x - W_c^{y}\| \cdot \|z - W_c^{y}\|}

Then, we find that L_{acon} minimizes \hat{A}(q, k_j^{-}, y_c), meaning that q \cdot k_j^{-} is minimized and q \cdot W_c^{y_c} is maximized at the same time. The core of ANCOR is to pull the query towards its coarse-grained proxy while pushing negative pairs away from each other. However, ANCOR can still be improved. Since it also maximizes \hat{A}(q, k^{+}, y_c), q \cdot k^{+} is maximized and q \cdot W_c^{y_c} is minimized at the same time. This means the constraint imposed on the positive pair can push the query away from its coarse-grained proxy, which hinders maintaining a compact feature space within the coarse-grained class. Therefore, we discard the angular normalization of the positive pair. We also remove the denominator \|q - W_c^{y}\| \cdot \|k_j^{-} - W_c^{y}\| in the angular normalization of the negative pair to balance the scales of negative pairs and the positive pair. The loss then becomes:

L_{incon} = -\log \frac{\exp(q \cdot k^{+}/\tau)}{\exp(q \cdot k^{+}/\tau) + \sum_{j}\exp(IN(q, k_{j}^{-}, y_c)/\tau)}   (2)

where IN(x, z, y) = x \cdot z - x \cdot W_c^{y} - W_c^{y} \cdot z + 1   (3)

Instance-Level Loss. We apply the IN operation in Eq. (3) to the SSL framework to obtain our instance-level loss:

L_{in} = -\frac{1}{P+1}\sum_{p=0}^{P}\log \frac{\exp(q \cdot k_{p}^{+}/\tau)}{Pos + Neg}   (4)


where Pos = \sum_{i=0}^{P}\exp(q \cdot k_{i}^{+}/\tau) and Neg = \sum_{j=1}^{K}\exp(IN(q, k_{j}^{-}, y_c)/\tau). k_0^{+} is the differently-augmented view of q, and k_1^{+}, k_2^{+}, \cdots, k_P^{+} are positive keys in the fine-grained class y. k_j^{-} are negative keys belonging to other fine-grained classes in the coarse-grained class y_c. Our instance-level loss pulls the query closer to keys from the same fine-grained class and pushes it away from negative keys from the same coarse-grained class, which boosts the discrimination between different fine-grained classes within the same coarse-grained class. Meanwhile, IN keeps the compactness within the same coarse-grained class by pulling the query towards its coarse-grained proxy.
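The following PyTorch-style sketch shows how Eq. (4) can be computed from a query, its positive and negative keys, and the coarse-proxy row of the coarse classifier. It is a minimal illustration under our own tensor-shape assumptions (the names q, pos_keys, neg_keys and w_c are hypothetical), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def instance_level_loss(q, pos_keys, neg_keys, w_c, tau=0.2):
    """Sketch of Eq. (4).

    q:        (d,)     query embedding
    pos_keys: (P+1, d) positive keys (k_0 is the other augmented view of q)
    neg_keys: (K, d)   negative keys from the same coarse-grained class
    w_c:      (d,)     coarse-proxy row W_c^{y_c}
    """
    q = F.normalize(q, dim=0)
    pos_keys = F.normalize(pos_keys, dim=1)
    neg_keys = F.normalize(neg_keys, dim=1)
    w_c = F.normalize(w_c, dim=0)

    pos_logits = pos_keys @ q / tau                         # (P+1,) similarities to positives
    # IN(q, k^-, y_c) = q.k^- - q.W_c - W_c.k^- + 1, cf. Eq. (3)
    in_vals = neg_keys @ q - q @ w_c - neg_keys @ w_c + 1.0
    neg_logits = in_vals / tau                              # (K,)

    denom = pos_logits.exp().sum() + neg_logits.exp().sum()
    # average of -log( exp(q.k_p/tau) / (Pos + Neg) ) over the P+1 positives
    return -(pos_logits - torch.log(denom)).mean()
```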

3.2 Proxy-Level Loss

We further present a proxy-level loss to leverage the relationship between instances and proxies. The CE loss is effective at encouraging the similarity between data and its positive proxy (the fine-grained proxy the data belongs to), but in SSL it is easily misled by false pseudo-labels, making the network overfit to these incorrect predictions. In our case, if we simply used the CE loss to exploit the instance-to-proxy relations, the fine-grained proxies would suffer badly from these false pseudo-labels, because they are supervised only by the CE loss, whereas the rest of the model can alleviate the reliance on the CE loss through the supervision of L_in. Therefore, our proxy-level loss stops the gradient of the CE loss from flowing into these fine-grained proxies in the implementation. It can be written as:

L_{pro} = -\log \frac{\exp(q \cdot W_f^{y}/\tau)}{\exp(q \cdot W_f^{y}/\tau) + \sum_{j=1}^{F-1}\exp(q \cdot W_f^{j}/\tau)}

where W_f^{F} = W_f^{y}. W_f^{y} is the positive proxy, and the W_f^{j} are the negative proxies (the proxies of fine-grained classes other than y). Different from L_in, it pulls q towards its corresponding fine-grained proxy and pushes q away from the other proxies.
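As a rough illustration with hypothetical names, the proxy-level term reduces to a softmax cross-entropy of the query over the fine-grained proxies; the stop-gradient of the fine-level CE loss into the proxy rows mentioned above is bookkeeping outside this function and is omitted here.

```python
import torch
import torch.nn.functional as F

def proxy_level_loss(q, w_f, y, tau=0.2):
    """Sketch of the proxy-level loss L_pro.

    q:   (d,)    query embedding
    w_f: (F, d)  fine-classifier weights; each row is a fine-grained proxy
    y:   int     fine-grained label (or pseudo-label) assigned to q
    """
    q = F.normalize(q, dim=0)
    proxies = F.normalize(w_f, dim=1)
    logits = proxies @ q / tau                    # similarity of q to every fine-grained proxy
    # pull q towards its own proxy, push it away from the others
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([y], device=logits.device))
```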

3.3 Instance-Proxy Loss

Naively combining L_in and L_pro (i.e., L_in + L_pro) results in slow convergence or sub-optimal solutions when a conflict between the supervised contrastive loss and the CE loss occurs, according to [7]. Therefore, we combine our losses following PaCo [7], which adjusts their intensities and prevents the conflict. Our IPL is:

L_{ipl} = \sum_{p=0}^{P+1} -w_p \log \frac{\exp(q \cdot z_p/\tau)}{Pos_2 + Neg_2}, \quad Pos_2 = \sum_{i=0}^{P+1} \exp(q \cdot z_i/\tau), \quad Neg_2 = \sum_{j=1}^{K+F-1} \exp(T(q, z_j, y_c)/\tau)

w_p = \begin{cases} \alpha, & p \in [0, P] \\ 1.0, & p = P+1 \end{cases}, \quad z_p = \begin{cases} k_p^{+}, & p \in [0, P] \\ W_f^{p}, & p = P+1 \end{cases}, \quad T(q, z_j, y_c) = \begin{cases} IN(q, k_j^{-}, y_c), & j \in [1, K] \\ q \cdot W_f^{j}, & j \in [K+1, K+F-1] \end{cases}   (5)


Concretely, \alpha is a hyper-parameter in (0, 1) that trades off the contributions of L_in and L_pro. k_p^{+} and k_j^{-} are the positive keys and negative keys. W_f^{p} (i.e., W_f^{y}) and W_f^{j} are the positive and negative fine-grained proxies. L_ipl is scaled by 1/\sum_{p=0}^{P+1} w_p in the implementation. Our final loss is L = L_ipl + L_coarse + L_fine, where L_fine is only available for labeled data.
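A minimal sketch of the combined IPL of Eq. (5) is given below; the names are hypothetical and the PaCo-style weighting is reduced to its essentials, so this is an illustration under our assumptions rather than the authors' code. In training it would be added to L_coarse (and to L_fine for labeled images) as in the final loss above.

```python
import torch
import torch.nn.functional as F

def instance_proxy_loss(q, pos_keys, neg_keys, w_c, w_f, y, alpha=0.5, tau=0.2):
    """Sketch of Eq. (5).

    pos_keys: (P+1, d) positive keys, neg_keys: (K, d) negative keys,
    w_c: (d,) coarse proxy, w_f: (F, d) fine-classifier weights, y: fine label/pseudo-label.
    """
    q = F.normalize(q, dim=0)
    pos_keys, neg_keys = F.normalize(pos_keys, dim=1), F.normalize(neg_keys, dim=1)
    w_c, w_f = F.normalize(w_c, dim=0), F.normalize(w_f, dim=1)

    # targets z_p: the P+1 positive keys plus the positive fine-grained proxy W_f^y
    z_pos = torch.cat([pos_keys, w_f[y].unsqueeze(0)], dim=0)          # (P+2, d)
    pos_logits = z_pos @ q / tau
    w_p = torch.full((z_pos.size(0),), alpha)
    w_p[-1] = 1.0                                                      # alpha for keys, 1.0 for the proxy

    # contrast terms T: IN(.) for same-coarse negative keys, dot product for negative proxies
    in_vals = neg_keys @ q - q @ w_c - neg_keys @ w_c + 1.0
    neg_proxies = torch.cat([w_f[:y], w_f[y + 1:]], dim=0)             # all fine proxies except y
    neg_logits = torch.cat([in_vals, neg_proxies @ q]) / tau

    denom = pos_logits.exp().sum() + neg_logits.exp().sum()
    return (w_p * (torch.log(denom) - pos_logits)).sum() / w_p.sum()   # scaled by 1/sum(w_p)
```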

Table 1. Statistics of datasets. The number of train images includes that of labeled and unlabeled train images.

Dataset                  CUB-200-2011   CIFAR100      Semi-iNat
# Coarse classes         37             20            8
# Fine classes           200            100           810
# Labeled train images   900/1.8K/3K    400/1K/2.5K   9.7K
# Train images           6K             50K           101K
# Test images            6K             10K           16.2K

Table 2. Error rates (%) ↓ on CIFAR100 with 400, 2500 and 10000 labeled data.

Type        Method                 400     2.5k    10k
Baselines   Fine−                  82.56   57.94   35.41
            Fine-Coarse            44.49   33.48   25.16
            Fine+                  19.30   19.30   19.30
SSL         Π-model [13]           –       57.25   37.88
            Pseudo-Labeling [14]   –       57.38   36.21
            Mean Teacher [22]      –       53.91   35.83
            MixMatch [2]           67.61   39.94   28.31
            UDA [27]               59.28   33.13   24.50
            ReMixMatch [1]         44.28   27.43   23.03
            FixMatch [18]          49.95   28.64   23.18
            SimPLE [9]             –       –       21.89
            BAM-UDA [15]           40.30   –       21.70
            FlexMatch [30]         39.94   26.49   21.90
Ours                               35.51   26.26   21.61

Table 3. Classification accuracy (%) ↑ on CUB-200-2011 (ResNet-50 pre-trained).

Type         Method                     15%     30%     50%
Baselines    Fine−                      45.25   59.68   70.12
             Fine-Coarse                63.00   71.28   77.16
             Fine+                      82.02   82.02   82.02
SSL          Π-model [13]               45.20   56.20   64.07
             Pseudo-Labeling [14]       45.33   62.02   72.30
             Mean Teacher [22]          53.26   66.66   74.37
             UDA [27]                   46.90   61.16   71.86
             FixMatch [18]              44.06   63.54   75.96
             SimCLRv2 [6]               45.74   62.70   71.01
             Self-Tuning [25]           64.17   75.13   80.22
SSL+Coarse   Resnet50+HL [21]           64.45   71.51   77.05
             Pseudo-Labeling+HL [21]    63.76   72.23   77.55
Ours                                    74.31   77.09   81.12

Datasets. We perform our experiments on three datasets: (1) CUB-2002011 [23]. Similar to self-tuning [25], we make the labeled proportion of training data range from 15% to 50%. The coarse-grained labels are obtained from [5]. (2) CIFAR100 [12]. Since it is a benchmark SSL datasets, we conduct experiments on it to compare with SSL methods. (3) Semi-iNat [20]. It is collected by

246

H. Wu et al.

the semi-supervised challenge at the FGVC8 workshop [20]. There are 7 levels in Semi-iNat, we primarily conduct our experiments on Phylum level to compare with SOTA methods. And extensive exploration of other levels is presented in supplementary material. Our Table 1 illustrates some details of CUB-2002011, CIFAR100, and Semi-iNat. Besides, experiments on combination with selfsupervised method are also conducted in supplementary material. Implementation Details. We use ResNet50 and Wide ResNet28-8 (WRN-288) [29] (only for CIFAR100 to compare with SOTA methods) as our backbones, and the dimension of the output of encoder B are d = 2048 and d = 512, respectively. We use cosine-annealing with base learning rate of 0.001 and weight decay of 0.0001 to train ResNet50. For WRN-28-8, the learning rate and weight decay are 0.03 and 0.001. Empirically, the temperature τ is set as 0.2 for CIFAR100 and CUB-200-2011, and 0.6 for Semi-iNat. The positive queue size P is 32, and the negative queue size K is 1000 for CIFAR100, and 250 for CUB-200-2011 and Semi-iNat. The α is set 0.5 on CIFAR100, and 0.01 on the other. Baselines. Here, we give some baselines used in all experiments. (1) Fine+. The backbone network B along with a fine-level classifier(G) is trained with all labeled data from fine level. It can be a upper-bound of our method. (2) Fine−. It is similar to Fine+, except that it is only trained with a subset of labeled data in fine level, which is a lower-bound of our method. (3) Fine-Coarse. It consists of the backbone network B, a fine classifier(G) and a coarse classifier(C). The labels of data is comprised of a subset of fine-grained labels, and all labels of coarse-grained classes. Table 4. Classification accuracy (%) ↑ on Semi- Table 5. Ablation studies on CIFAR100 with 400 labels in fineiNat (ResNet-50 pre-trained on ImageNet). level. (ResNet-50 pre-trained) Type

Method

Acc

Baselines

Fine−

42.65

Type

Method

Acc

Fine-Coarse

44.52

Baselines

Fine−

36.50 61.82

SSL

Fine+

86.00

Fine-Coarse

Pseudo-Labeling [14]

40.40

Fine+

Self-Training [19]

42.40

MoCo [19]

41.70

Lin w/o IN

MoCo + Self-Training [19]

42.60

Lin w/o IN w A 66.40

FixMatch [18]

44.10

Instance-level Lin w/o IN +

84.55 65.35 64.92

Lin

68.70

46.60

Proxy-level

Lpro

66.14

Pseudo-Labeling+HL [21]

44.90

Combination

Lin + Lpro

63.84

FixMatch+HL [21]

47.90

Lipl

69.13

Self-Training+HL [21]

44.80

SSL+Coarse HL [21]

MoCo + Self-Training+HL [21] 45.80 Ours

4.1

49.11

Comparison to SOTA Methods

CIFAR100. CIFAR100 is a popular benchmark dataset for SSL methods, and we conduct experiments on it to verify the superiority of our approach. The results on


CIFAR100 are illustrated in Table 2. As can be seen from Table 2, our method outperforms all the SSL methods across the label settings, especially with 400 labels, where we surpass the best SSL method by 4.43%. We also observe that our method gains much more improvement when the proportion of labeled data is smaller.

CUB-200-2011. CUB-200-2011 is a popular dataset in FGVC, and the experimental results are shown in Table 3. They show consistent advantages of our method over Self-Tuning (the best SSL method) across various label proportions, with an obvious boost of 10.14% at the 15% label proportion. It is noteworthy that the negative key set used by Self-Tuning is twenty-five times the size of ours, suggesting that even much larger and more diverse negative keys are not sufficient to close the performance gap brought by our method through leveraging coarse-grained labels. Compared with the SSL+Coarse methods, our approach also performs much better.

Semi-iNat. Semi-iNat is a newly proposed dataset for FGVC tasks in the SSL setting, which allows us to compare with SOTA methods that also use coarse labels to promote SSL ('SSL+Coarse' in Table 4). The results are reported in Table 4. Our method achieves about 4.59% higher accuracy than 'Fine-Coarse' and outperforms all other SSL and SSL+Coarse methods, demonstrating that our method can also be applied to large-scale datasets.

4.2 Ablation Study

We conduct ablation studies on the effect of our losses (Table 5).

The Effectiveness of the Instance-Level Loss. For the instance-level loss, we compare several variants: (1) L_in w/o IN +: we remove the IN operation from L_in in Eq. (4) and use negative keys from all other fine-grained categories, so the Neg term in Eq. (4) becomes Neg = \sum_{j}\exp(q \cdot k_j^{-}/\tau) over those keys; for a fair comparison, we take 10 keys from each fine-grained category to obtain 990 negative keys (L_in uses 1000 negative keys). (2) L_in w/o IN: this is L_in in Eq. (4) without the IN operation, i.e., Neg = \sum_{j=1}^{K}\exp(q \cdot k_j^{-}/\tau). (3) L_in w/o IN w A: we include it to demonstrate the improvement of our IN-based loss over the angular normalization A-based loss:

L_{ain} = -\frac{1}{P+1}\sum_{p=0}^{P}\log\frac{\exp(\hat{A}(q, k_p^{+}, y_c)/\tau)}{Pos_3 + Neg_3}, \quad Pos_3 = \sum_{i=0}^{P}\exp(\hat{A}(q, k_i^{+}, y_c)/\tau), \quad Neg_3 = \sum_{j=1}^{K}\exp(\hat{A}(q, k_j^{-}, y_c)/\tau)

We draw the following conclusions on CIFAR100 for the instance level: (1) Without IN, the performance of L_in w/o IN + is better than L_in w/o IN. This may be because, compared with using negative keys within the same coarse class (L_in w/o IN), using negative keys from all other fine-grained classes (L_in w/o IN +) weakens the constraint that disperses elements of the same coarse-grained class in the feature space. (2) L_in w/o IN w A and L_in bring performance improvements over L_in w/o IN, proving that IN and A can mitigate the strong


intensity of negative keys belonging to the same coarse-grained class, and pull the query towards its coarse proxy to hold the compact feature space of coarse-grained classes. (3) L_in surpasses L_in w/o IN w A by 2.3%, showing that our IN is more effective in pulling the query close to its coarse-grained proxy.

The Effectiveness of IPL. There are several findings for the rest: (1) L_pro and L_in boost the accuracy over the 'Fine-Coarse' method by exploiting the instance-to-proxy and instance-to-instance relations, respectively. (2) IPL further improves the results by providing synergy between L_pro and L_in, as demonstrated by IPL achieving a much higher accuracy than L_in + L_pro. (3) L_ipl boosts the performance of the 'Fine-Coarse' method by a large margin, which verifies that our method leverages the coarse-grained labels effectively by making the fine-grained classes within the same coarse-grained class separable and keeping the intra-class feature space at the coarse level compact.

Hyper-Parameter Analysis of α. We search α in [0.005, 0.9]; α is 0.5 on CIFAR100 and 0.01 on CUB-200-2011 and Semi-iNat, with a pretrained ResNet50 as the backbone. When α is (0.005, 0.01, 0.05, 0.1, 0.5, 0.9), the accuracy (%) is: CIFAR100 with 400 labels (68.15, 67.64, 68.78, 69.02, 69.13, 68.29), CUB-200-2011 with a 15% label rate (73.73, 74.31, 73.43, 72.80, 70.95, 70.81), and Semi-iNat (48.77, 49.11, 48.14, 48.48, 47.48, 47.69). The results suggest that the accuracy fluctuates within a relatively small range as α changes, and it always stays above the baseline 'Fine-Coarse' in Table 2, Table 3 and Table 4.

5 Conclusion

Coarse-grained labels can mitigate the reliance on fine-grained labels, which are expensive to obtain. However, introducing the coarse-level supervision results in the suppression of intra-class diversity and the inseparability of subordinate fine-grained classes. To address this challenge, this paper proposes IPL to discriminate the different fine-grained classes within the same coarse-grained class while retaining the compactness of coarse-grained classes. IPL consists of an instance-level loss and a proxy-level loss that exploit the rich relationships between instance and instance, as well as between instance and proxy. Experiments on CIFAR100, CUB-200-2011 and Semi-iNat demonstrate our superiority over both SSL and 'SSL+Coarse' methods.

Acknowledgement. This work was supported by National Key R & D Program of China under Grant No.2021ZD0110400, National Natural Science Foundation of China (No.62276260, 62002356, 62271485, 62076235) and Zhejiang Lab (No.2021KH0AB07).


References 1. Berthelot, D., et al.: Remixmatch: semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint arXiv:1911.09785 (2019) 2. Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.A.: Mixmatch: a holistic approach to semi-supervised learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019) 3. Bukchin, G., et al.: Fine-grained angular contrastive learning with coarse labels. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8730–8740 (2021) 4. Chang, D., Pang, K., Zheng, Y., Ma, Z., Song, Y.Z., Guo, J.: Your “flamingo” is my “bird”: fine-grained, or not. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11476–11485 (2021) 5. Chen, T., Wu, W., Gao, Y., Dong, L., Luo, X., Lin, L.: Fine-grained representation learning and recognition by exploiting hierarchical semantic embedding. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 2023–2031 (2018) 6. Chen, T., Kornblith, S., Swersky, K., Norouzi, M., Hinton, G.: Big self-supervised models are strong semi-supervised learners. In: NeurIPS (2020) 7. Cui, J., Zhong, Z., Liu, S., Yu, B., Jia, J.: Parametric contrastive learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 715–724 (2021) 8. Garg, A., Bagga, S., Singh, Y., Anand, S.: Hiermatch: leveraging label hierarchies for improving semi-supervised learning. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1015–1024 (2022) 9. Hu, Z., Yang, Z., Hu, X., Nevatia, R.: Simple: similar pseudo label exploitation for semi-supervised classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15099–15108 (2021) 10. Khosla, P., et al.: Supervised contrastive learning. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 18661–18673. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/ d89a66c7c80a29b1bdbab0f2a1a94af8-Paper.pdf 11. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for finegrained categorization. In: 2013 IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013). https://doi.org/10.1109/ICCVW.2013.77 12. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) 13. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: ICLR (2017) 14. Lee, D.H., et al.: Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML, vol. 3, p. 896 (2013) 15. Loh, C., et al.: On the importance of calibration in semi-supervised learning. arXiv abs/2210.04783 (2022) 16. Maji, S., Kannala, J., Rahtu, E., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. Technical report (2013) 17. Mugnai, D., Pernici, F., Turchini, F., Del Bimbo, A.: Fine-grained adversarial semisupervised learning. ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 18(1s), 1–19 (2022)


18. Sohn, K., et al.: Fixmatch: simplifying semi-supervised learning with consistency and confidence. In: Advances in Neural Information Processing Systems, vol. 33, pp. 596–608 (2020) 19. Su, J.C., Cheng, Z., Maji, S.: A realistic evaluation of semi-supervised learning for fine-grained classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12966–12975 (2021) 20. Su, J.C., Maji, S.: The semi-supervised inaturalist challenge at the fgvc8 workshop (2021) 21. Su, J.C., Maji, S.: Semi-supervised learning with taxonomic labels. In: British Machine Vision Conference (2021) 22. Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: NeurIPS (2017) 23. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD birds-200-2011 dataset (2011) 24. Wang, W., Lin, L., Fan, Z., Liu, J.: Semi-supervised learning for mars imagery classification and segmentation. ACM Trans. Multimed. Comput. Commun. Appl. 19(4), 1–23 (2023) 25. Wang, X., Gao, J., Long, M., Wang, J.: Self-tuning for data-efficient deep learning. In: International Conference on Machine Learning, pp. 10738–10748. PMLR (2021) 26. Wu, H., Guo, H., Miao, Q., Huang, M., Wang, J.: Graph neural networks based multi-granularity feature representation learning for fine-grained visual categorizaor J´ onsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 230–242. tion. In: Þ´ Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0 20 27. Xie, Q., Dai, Z., Hovy, E., Luong, T., Le, Q.: Unsupervised data augmentation for consistency training. In: Advances in Neural Information Processing Systems, vol. 33, pp. 6256–6268 (2020) 28. Xu, Y., Qian, Q., Li, H., Jin, R., Hu, J.: Weakly supervised representation learning with coarse labels. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10593–10601 (2021) 29. Zagoruyko, S., Komodakis, N.: Wide residual networks. CoRR abs/1605.07146 (2016) 30. Zhang, B., et al.: Flexmatch: boosting semi-supervised learning with curriculum pseudo labeling. In: Advances in Neural Information Processing Systems, vol. 34, pp. 18408–18419 (2021) 31. Zhao, J., Liu, X., Zhao, W.: Balanced and accurate pseudo-labels for semisupervised image classification. ACM Trans. Multimed. Comput. Commun. Appl. 18(3s), 1–18 (2022)

FAFVTC: A Real-Time Network for Vehicle Tracking and Counting

Zhiwen Wang1, Kai Wang2, and Fei Gao1(B)

1 Zhejiang University of Technology, Hangzhou 310023, China  [email protected]
2 Zhejiang Institute of Metrology, Hangzhou 310018, China

Abstract. In a complex traffic environment, the detection and association of moving objects can easily lead to tracking errors. This work proposes a novel attention mechanism called MCSA, which integrates multi-spectral attention and spatial attention. Additionally, a fast and anchor-free real-time vehicle tracking and counting model named FAFVTC is constructed. MCSA is used for extracting the features of moving objects, while FAFVTC is able to better detect and associate these objects. The effectiveness of the FAFVTC method is verified on the UA-DETRAC dataset. FAFVTC outperforms existing techniques with a 1.3 improvement in the PR-MOTA metric and a 2.16 improvement in the MOTA metric. The average tracking speed achieved is 27.9 FPS. The experimental results demonstrate that the proposed approach enables fast and accurate vehicle tracking and counting.

Keywords: Vehicle tracking · Attention mechanism · Vehicle counting

1 Introduction

In recent years, the development of advanced computer vision technology has made it possible to track and count vehicles efficiently and accurately. Multi-object tracking (MOT) is a fundamental task in computer vision. As one of the key technologies within intelligent transportation systems (ITS), vehicle tracking provides essential information for various applications such as traffic flow estimation, vehicle monitoring, and road condition analysis. However, in practical applications, vehicle tracking and analysis based on traffic surveillance videos is not a simple task due to factors such as occlusions, lighting variations, weather changes, and low video resolutions. Traditional MOT algorithms include KCF [1], MDP, JPDA, and the Particle Filter. However, these methods suffer from significant prediction errors and poor robustness against occlusions and similar motion interference. Recent research in this field has primarily focused on Joint Detection and Tracking (JDT); prominent examples are FairMOT [2] and FAFMOTS [3], which aim to reduce processing time and achieve real-time tracking. This work proposes a single-stage multi-object detection, tracking, and counting method called FAFVTC. It combines MCSA attention for object


identification, localization, and feature extraction, while also replacing the data association method. FAFVTC demonstrates excellent performance on datasets including congested scenes, achieving high accuracy and real-time capabilities. As a result, its deployment in ITS holds great potential. In summary, the contributions of this work are as follows: (1) The vehicle tracking and counting framework FAFVTC was proposed, which combines multi-object detection, data association, and vehicle counting modules within a unified framework. (2) The MCSA attention mechanism was proposed, which integrates multispectral channel attention and spatial attention. It addresses the deficiency of feature information in existing attention methods and fully utilizes lowfrequency component information. (3) A vehicle counting method was proposed which was validated on the UA-DETRAC dataset and real-world surveillance videos. The experiments demonstrate that the proposed FAFVTC enables the rapid and accurate completion of vehicle tracking and counting tasks.

2

Related Work

Vehicle Detection and Tracking: Existing MOT methods can be categorized into different research directions, such as Tracking-by-Detection (TBD) and Joint Detection and Tracking (JDT). The TBD paradigm separates object detection and tracking, which prevents the attainment of globally optimal results; SORT [4], DeepSORT [5], ByteTrack [6], and TADAM [7] are examples. Li et al. [8] represent data association as a graph optimization problem. STURE [9] utilizes a motion prediction network that incorporates temporal features for dynamic position prediction. The JDT paradigm integrates moving-object detection and tracking into a unified framework. FairMOT [2], CenterTrack [10], and TraDeS [11] are all based on the CenterNet architecture and utilize anchor-free detection methods for object detection. FAFMOTS [3], based on FairMOT, integrates the idea of instance segmentation. FAMNet [12] enhances feature representation, affinity models, and multi-dimensional assignment, optimizing these three components jointly. LGMTracker [13] solves the vehicle tracking problem solely from a motion perspective.
Attention Mechanism: The attention mechanism enhances the model's ability to focus on relevant information and selectively attend to specific parts of the input data. It can be categorized into channel-wise, spatial-wise, and hybrid attention mechanisms. SENet [14] enhances the model's representation capability by dynamically adjusting the weights of each channel. CBAM [15], based on SENet, combines channel attention and spatial attention along two independent dimensions. LCT [16] discovers a negative correlation between global context and attention values and models it using linear transformations. GCT [17] captures global background information from input images to improve the accuracy of model predictions. FCANet [18] dynamically focuses on different


frequency channels to enhance the representation of important frequency information. CFCANet [19] improves the distribution of DCT frequency components in FCANet. EANet [20] implicitly considers the correlation between different samples.

Relu SA

Fig. 1. The architecture of MCSA-DLA-34.

Vehicle Counting: Vehicle counting based on surveillance videos typically involves three components: vehicle detection, tracking, and trajectory processing. The challenges in it lie in the presence of motion blur, vehicle occlusion, and target scale variations. Amato et al. [21] proposed a real-time method for assessing vehicle numbers on highways or in parking lots. Zhang et al. [22] designed a traffic surveillance system based on Mask R-CNN, including modules for vehicle counting, vehicle type detection, and vehicle speed estimation. Gomaa et al. [23] introduced a method based on YOLOv2 and feature point motion analysis. Ciampi et al. [24] presented an approach for handling overlapping regions in multi-camera setups to estimate vehicle numbers in parking lots. Xu et al. [25] proposed a CityCam-to-Edge collaborative learning framework.

3 Method

3.1 Backbone Network

The DLA-34 is obtained by incorporating DLA [26] into ResNet-34 and replacing the regular convolutions in the DLA upsampling modules with Deformable

254

Z. Wang et al.

Convolution (DCN). The DCN can better adapt to the shape and position changes of moving objects, thereby improving the accuracy of motion object recognition and localization. Incorporating the MCSA attention into DLA-34, we propose MCSA-DLA-34. MCSA-DLA-34 is employed as the backbone network to enhance the feature extraction capability of the network. The structure of MCSA-DLA-34 is illustrated in Fig. 1.

Fig. 2. The structure of the MCSA mechanism.

If the input image has a size of W_img × H_img × 3, the MCSA-DLA-34 outputs a feature map of shape C × W × H, where W = W_img/S and H = H_img/S. Here, C and S are hyperparameters: C represents the number of feature channels, and S represents the stride.

3.2 Multi-spectral Channel and Spatial Attention (MCSA)

The FAFVTC incorporates the MCSA-DLA-34 as its backbone network, which integrates the MCSA attention. The predict heads of FAFVTC also utilize the MCSA attention. MCSA integrates the multi-spectral channel attention and spatial attention. It dynamically focuses on different frequency channels and spatial information based on their importance, effectively enhancing the representation of important information. The structure of MCSA is illustrated in Fig. 2. In FCANet, the MCA divides the input image X along the channel dimension into n channel blocks and assigns a 2D DCT frequency component index to each channel block. Although the approach of allocating frequency components by groups, as in MCA, introduces information from different frequency component indices, each channel block only utilizes one frequency component index, neglecting the other frequency component indices within the block and utilizing only a partial portion of the low-frequency information. Considering that low-frequency components contain vital information, such as basic object structure, it is necessary to fully exploit the complete information of low-frequency components to enhance features. Inspired by CFCANet,


we can assign the corresponding 2D DCT frequency component indices to the entire feature map. To fully leverage the complete information of the low-frequency components, in MCSA the input X ∈ R^{C×H×W} is not divided into channel blocks. Instead, it is directly multiplied by n 2D DCT frequency component indices, where C represents the number of channels, and H and W represent the height and width of the feature map, respectively. Freq^i denotes the result of the element-wise multiplication between the i-th 2D DCT frequency component index and the input X. Freq^i can be calculated using Eq. (1):

Freq^i = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} X_{:,h,w}\,\cos\!\left(\frac{\pi h}{H}\left(u_i+\frac{1}{2}\right)\right)\cos\!\left(\frac{\pi w}{W}\left(v_i+\frac{1}{2}\right)\right)   (1)


where [u_i, v_i] corresponds to the i-th 2D DCT frequency component index of X, with i ∈ {0, 1, . . . , n−1}, h ∈ {0, 1, . . . , H−1}, and w ∈ {0, 1, . . . , W−1}. Using Eq. (2), we vertically concatenate all Freq^i values to obtain a C × n matrix; taking the maximum value of each column of the matrix yields the most prominent feature value for each channel, resulting in a multi-spectral vector Freq:

Freq = MF_max(cat(Freq^0, Freq^1, . . . , Freq^{n-1}))   (2)

where MF_max denotes taking the maximum over each column of the matrix. After obtaining the multi-spectral vector Freq, a fully connected layer is applied to obtain the channel attention:

MC_att = sigmoid(FC(Freq))   (3)


To capture the spatial relationships of features, spatial attention is incorporated after the channel attention and complements it. The entire MCSA mechanism can be represented by Eq. (4):

MCSA_att = sigmoid( f^{7×7}([AvgPool(MC_attF); MaxPool(MC_attF)]) )   (4)

where f^{7×7} represents a convolution with a 7 × 7 filter, and MC_attF represents the output features of the multi-spectral channel attention.
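A compact PyTorch sketch of this two-stage attention is given below. It follows Eqs. (1)-(4) under our own simplifying assumptions: the n DCT frequency components are passed in as a precomputed basis tensor, and the module name MCSA and its arguments are hypothetical; it is not the authors' released code.

```python
import torch
import torch.nn as nn

class MCSA(nn.Module):
    """Sketch of multi-spectral channel attention followed by spatial attention."""
    def __init__(self, channels, dct_basis, reduction=16):
        super().__init__()
        # dct_basis: (n, H, W) precomputed 2D DCT components for the chosen [u_i, v_i] indices
        self.register_buffer("dct_basis", dct_basis)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):                            # x: (B, C, H, W)
        # Eq. (1): project the whole feature map onto each frequency component
        freq = torch.einsum("bchw,nhw->bcn", x, self.dct_basis)
        freq = freq.max(dim=2).values                # Eq. (2): strongest frequency response per channel
        ca = self.fc(freq).view(x.size(0), -1, 1, 1) # Eq. (3): channel attention weights
        x = x * ca
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)              # Eq. (4): spatial attention
```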

3.3 Data Association

FAFVTC achieves satisfactory detection results and the Kalman filter exhibits high prediction accuracy, which can replace Re-ID for long-term association between moving objects, thereby achieving improved tracking speed. Inspired by ByteTrack, FAFVTC adopts a layered online data association approach. Figure 3 illustrates the data association process. By detecting video frames, detection boxes D and their confidence scores ScoreD are obtained. Based on the given high-score box threshold τhigh and low-score box threshold τlow , the boxes are classified into high-score boxes and low-score boxes. If ScoreD is higher than


τhigh , it is classified as a high-score box. If ScoreD is higher than τlow but lower than τhigh , it is classified as a low-score box. All high-score bounding boxes detected in the first frame are initialized as tracks. The Kalman filter is used to predict the new positions of the tracks in the next frame. Starting from the second frame, a matching process is conducted between high-score bounding boxes and existing tracks based on the IoU distance threshold ScoreIoU . If the IoU distance is less than ScoreIoU , the high-score bounding box is considered unmatched and is initialized as a new track. After matching the high-score bounding boxes, a similar matching process is performed between low-score bounding boxes and unmatched tracks based on the ScoreIoU value. Unmatched tracks are retained for 30 frames.
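The two-stage matching described above can be sketched as follows. This is simplified Python with a greedy IoU matcher standing in for the assignment step, a plain box update standing in for the Kalman filter, and hypothetical data structures (dicts with "box", "score", "lost" keys); thresholds follow the values named in the text.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match(tracks, dets, iou_thresh):
    """Greedy IoU matching; returns matches and the unmatched tracks/detections."""
    unmatched_t, unmatched_d, matches = list(tracks), list(dets), []
    for d in dets:
        best, best_iou = None, iou_thresh
        for t in unmatched_t:
            v = iou(t["box"], d["box"])
            if v > best_iou:
                best, best_iou = t, v
        if best is not None:
            matches.append((best, d))
            unmatched_t.remove(best)
            unmatched_d.remove(d)
    return matches, unmatched_t, unmatched_d

def associate(tracks, detections, t_high=0.6, t_low=0.1, iou_thresh=0.2):
    """One frame of the layered association: high-score boxes first, then low-score boxes."""
    high = [d for d in detections if d["score"] >= t_high]
    low = [d for d in detections if t_low <= d["score"] < t_high]
    # (Kalman-filter prediction of each track's box would happen here.)
    m1, un_t, un_high = match(tracks, high, iou_thresh)
    m2, un_t2, _ = match(un_t, low, iou_thresh)
    for t, d in m1 + m2:
        t["box"] = d["box"]                          # update matched tracks with the new box
    new_tracks = [{"box": d["box"], "lost": 0} for d in un_high]
    for t in un_t2:
        t["lost"] = t.get("lost", 0) + 1             # unmatched tracks kept for up to 30 frames
    return [t for t in tracks if t.get("lost", 0) <= 30] + new_tracks
```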


Fig. 3. The schematic diagram of data association.

3.4 Vehicle Counting

To meet the requirements of high counting accuracy and real-time performance, this work adopts a method that combines virtual detection lines with object tracking. As shown in Fig. 4, a single virtual detection line L is placed on the lane. The tracking algorithm extracts the motion trajectories T of the vehicle centers; when a vehicle's motion trajectory T intersects the detection line L, the total vehicle count is incremented by 1. A motion trajectory T starts when the moving object enters the detection area and ends when it leaves the detection area. If the detection line is set at a distant location, the counting accuracy may be compromised due to poor detection of small targets; therefore, the detection line should be placed where vehicle features are captured best, to enhance the reliability of vehicle counting. To evaluate the counting accuracy, the Matching Accuracy (MA) is used as a metric, defined in Eq. (5):

MA = right / (right + error)   (5)
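A minimal sketch of the line-crossing test is shown below. The function names are hypothetical, the detection line and trajectory points are taken in image coordinates, and a plain orientation test is used for the segment intersection; it illustrates the counting rule described above rather than the exact implementation.

```python
def _side(p, a, b):
    """Sign of the cross product: which side of line a-b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed_line(prev_pt, cur_pt, line_a, line_b):
    """True if the segment prev_pt -> cur_pt crosses the virtual detection line a-b."""
    s1, s2 = _side(prev_pt, line_a, line_b), _side(cur_pt, line_a, line_b)
    s3, s4 = _side(line_a, prev_pt, cur_pt), _side(line_b, prev_pt, cur_pt)
    return (s1 * s2 < 0) and (s3 * s4 < 0)

def count_vehicles(trajectories, line_a, line_b):
    """Count trajectories (lists of center points) that cross the detection line."""
    total = 0
    for traj in trajectories:
        for prev_pt, cur_pt in zip(traj, traj[1:]):
            if crossed_line(prev_pt, cur_pt, line_a, line_b):
                total += 1
                break                      # count each trajectory at most once
    return total
```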


where right represents the number of correctly detected vehicles, and error represents the number of vehicles that were either incorrectly detected or missed.

To capture high-quality images of well-tracked moving objects within a trajectory, this work introduces an Image Quality Quantification Score (IQS). The vehicle image with the highest IQS within a trajectory is preserved. The IQS for a vehicle image c_0 is calculated using Eqs. (6)-(8):

Q_1 = \sqrt{\frac{\frac{1}{w_0 h_0}\sum_{a=1}^{w_0}\sum_{b=2}^{h_0}(G_{a,b}-G_{a,b-1})^2 + \frac{1}{w_0 h_0}\sum_{a=2}^{w_0}\sum_{b=1}^{h_0}(G_{a,b}-G_{a-1,b})^2}{\frac{1}{w_0 h_0}\sum_{a=1}^{w_0}\sum_{b=2}^{h_0}(255)^2 + \frac{1}{w_0 h_0}\sum_{a=2}^{w_0}\sum_{b=1}^{h_0}(255)^2}}   (6)

Q_2 = (w_0 \times h_0)/(w_1 \times h_1 + w_0 \times h_0)   (7)

IQS = (Q_1 + Q_2 + 2\,Q_3)/4   (8)

Fig. 4. The schematic diagram of vehicle counting.

where G_{a,b} denotes the grayscale value of the pixel in the b-th row and a-th column of c_0. Q_1 represents the normalized spatial frequency of c_0, Q_1 ∈ [0, 1]. Q_2 represents the proportional relationship in pixel size between c_0 and the first vehicle image c_1 in the trajectory T, Q_2 ∈ (0, 1), where w_i and h_i denote the width and height of the bounding box of vehicle image c_i. Q_3 represents the confidence score of the vehicle detection box, Q_3 ∈ [0, 1].
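The scoring can be sketched in NumPy as follows; the function name is hypothetical, and the grayscale crop, the first crop of the trajectory, and the detector confidence are assumed to be available. The gradient normalization uses the actual number of difference terms, which is a close approximation of the w_0 h_0 factors in Eq. (6).

```python
import numpy as np

def iqs_score(gray_crop, first_crop_shape, det_conf):
    """Sketch of Eqs. (6)-(8): spatial frequency + relative size + detection confidence."""
    g = gray_crop.astype(np.float64)
    h0, w0 = g.shape
    # Q1: gradient energy of the crop, normalized by its maximum possible value (255 per step)
    dx = np.diff(g, axis=1) ** 2
    dy = np.diff(g, axis=0) ** 2
    q1 = np.sqrt((dx.sum() + dy.sum()) / (255.0 ** 2 * (dx.size + dy.size)))
    # Q2: size of this crop relative to the first crop of the same trajectory
    h1, w1 = first_crop_shape
    q2 = (w0 * h0) / (w1 * h1 + w0 * h0)
    # Q3: detector confidence, weighted twice in the final average
    return (q1 + q2 + 2.0 * det_conf) / 4.0
```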

4 Experiments

4.1 Datasets and Metrics

The UA-DETRAC dataset [27] consists of video sequences captured from various urban scenes and different traffic surveillance cameras, comprising a total of 100 video sequences. During the experimental process, 60 sequences were used for training, while 40 sequences were used for testing. In order to adapt to the


original data input format of FairMOT, the data from the UA-DETRAC dataset were converted to the annotation format of the MOTChallenge dataset. The performance evaluation on the dataset utilized metrics integrated with the precision-recall (PR) curve. The evaluation was conducted using the official tool, DETRAC-toolkit-test-trk. Standard tracking metrics including PR-MOTA and PR-MOTP are reported; PR-MOTA combines precision, recall, and MOTA to provide a comprehensive assessment of tracking performance.

4.2 Implementation Details

The training phase was conducted on two NVIDIA GeForce RTX 3090 GPUs. The input image size was adjusted to 1088×608 pixels. The model was initialized using pre-trained parameters from the COCO dataset. The FAFVTC model was trained for 35 epochs using the Lamb optimizer. With an initial learning rate of 1e−4, a batch size of 32 was set, and the learning rate was reduced to 1e-5 after 20 epochs. Data augmentation techniques such as random rotation, color jittering, and random scaling were employed during training to reduce overfitting. In the MCSA-DLA-34 backbone network, the parameter C was set to 256, parameter S was set to 4, and the output feature map size was 256 × 272 × 152. For the data association method, the τhigh was set to 0.6, the τlow was set to 0.1, and the ScoreIoU was set to 0.2. Table 1. The PR-MOTA metric results on the UA-DETRAC test set.

4.3

Method

PR-MOTA ↑ PR-MOTP ↑ PR-MT ↑ PR-ML ↓ PR-IDs ↓

CEM

5.1

35.2

3.0%

35.3%

267

H2 T

12.4

35.7

14.8%

19.4%

852

CMOT

12.6

36.1

16.1%

18.6%

285

GOG

14.2

37.0

3.0%

35.3%

3335

IOUT

16.1

37.0

13.9%

19.9%

2308

V-IOU

17.7

36.4

17.4%

18.8%

364

FAMNet [12]

19.8

36.7

17.1%

18.2%

617

FairMOT [2]

20.5

30.3

14.1%

17.2%

226

FAFVTC(ours) 21.8

24.9

19.5%

23.3%

118

Comparison Experiments

In the comparison experiments, separate experiments were conducted to evaluate the performance of vehicle detection and tracking as well as vehicle counting for FAFVTC. The performance of different models on the UA-DETRAC dataset was compared. For the experiments, FairMOT [2] and FAFVTC utilized their respective detection outputs. From Table 1, it can be seen that the FAFVTC method achieved higher PR-MOTA metric compared to FAMNet [12] and FairMOT,

FAFVTC: A Real-Time Network for Vehicle Tracking and Counting

259

indicating superior tracking accuracy of FAFVTC over FAMNet and FairMOT. The reason behind this lies in the fact that FAMNet and FairMOT exhibit lower feature extraction and data association capabilities compared to FAFVTC. FAFVTC outperforms FairMOT with a 1.3 improvement in the PR-MOTA metric, demonstrating superior multi-object detection and tracking performance. The runtime speed of FAFVTC is 27.9 FPS, making it suitable for real-time applications. Figure 5 visualizes the detection and tracking results of FAFVTC. The experiments demonstrate the accurate tracking of vehicles achieved by FAFVTC. We achieved the best results in the PR-MOTA metric but slightly lagged in the PR-MOTP metric. This is due to performance trade-offs. Our tracking method prioritizes tracking consistency and maintaining accurate track identities to effectively handle occluded and congested scenes. This might come at the cost of precise object spatial localization. Experiments show that our tracking method minimizes identity switches and ensures consistent trajectories, avoiding unstable and fragmented tracking paths, which is beneficial for tracking and counting tasks.

Fig. 5. Visualization of partial results on the UA-DETRAC test set. Frame1, Frame2, and Frame3 refer to video frames with a 10-frame interval. Ignored Region refers to the ignored areas annotated by the official guidelines.


Table 2. The experimental results of vehicle counting on the UA-DETRAC test set. Right indicates correct counting, while Error represents incorrect counting.

Video  Right  Error  MA        Video  Right  Error  MA        Video  Right  Error  MA
39031  41     0      100.00%   40742  25     2      92.59%    40852  24     0      100.00%
39051  29     1      96.67%    40743  50     1      98.04%    40853  11     1      91.67%
39211  17     0      100.00%   40761  23     0      100.00%   40854  40     0      100.00%
39271  45     0      100.00%   40762  35     1      94.59%    40855  23     0      100.00%
39311  33     4      89.19%    40763  9      0      100.00%   40863  18     0      100.00%
39361  53     1      98.15%    40771  34     0      100.00%   40864  47     1      97.92%
39371  30     1      96.77%    40772  23     2      92.00%    40891  25     0      100.00%
39401  79     5      94.05%    40773  16     0      100.00%   40892  22     0      100.00%
39501  19     0      100.00%   40774  6      0      100.00%   40901  14     0      100.00%
39511  13     0      100.00%   40775  23     1      95.83%    40902  73     2      97.33%
40701  38     0      100.00%   40792  8      0      100.00%   40903  36     0      100.00%
40711  20     0      100.00%   40793  45     3      93.75%    40904  11     0      100.00%
40712  53     0      100.00%   40851  16     0      100.00%   40905  23     1      95.83%
40714  27     0      100.00%
Average  1177  28  97.68%

Table 3. The results of the vehicle counting experiments in real-world scenarios.

Video  Right  Error  MA
1001   39     0      100.00%
1002   29     1      96.67%
2001   27     0      100.00%
2002   51     3      94.44%
2003   45     1      97.83%
3001   54     1      98.18%
3002   36     1      97.30%
4001   37     2      94.87%
4002   26     1      96.30%
Average  344  10  97.18%

Table 4. The results of the ablation experiments, where AM indicates the type of attention mechanism used.

Method    AM    Re-ID  BYTE  MOTA ↑  IDF1 ↑  MT ↑  ML ↓  FP ↓   FN ↓    IDs ↓
Baseline  ×     ✓      ×     78.66   83.41   1680  108   18764  124313  1163
Method1   MCSA  ✓      ×     80.17   83.80   1742  90    18960  113500  1528
Method2   MCA   ×      ✓     80.35   84.96   1740  95    20555  111861  359
Method3   MCSA  ×      ✓     80.82   85.66   1797  88    23508  105689  396
Method4   MCSA  ✓      ✓     81.01   84.89   1795  86    22060  105412  885

Furthermore, this work conducted experiments to evaluate the vehicle counting performance of FAFVTC. From Table 2, it can be observed that FAFVTC achieved an average counting accuracy of 97.68% on the UA-DETRAC test set across various scenarios. Following the difficulty categorization by Wen et al. [27], in simple scenarios such as MVI 39361 and MVI 40712, where vehicles are minimally occluded, the counting accuracy is relatively high. However, in challenging scenarios such as MVI 39311 and MVI 40762, where vehicles are close together, mutual occlusion occurs, and nighttime videos have low brightness, the counting accuracy decreases.

In this work, experiments were also conducted to evaluate the vehicle counting performance of FAFVTC in real-world scenarios. The videos were captured in cities such as Lishui, Wenzhou, Wuxi, and Shenzhen, encompassing both daytime and nighttime scenes. From Table 3, it can be observed that the FAFVTC method achieves an average counting accuracy of 97.18% in real-world scenarios. Figure 6 visualizes the counting results of FAFVTC in real-world scenarios. The experiments demonstrate the accurate vehicle counting capability of the proposed FAFVTC in real-world scenarios.

Fig. 6. The visualizations of experimental results in real-world scenarios. Frame1, Frame2, and Frame3 refer to video frames with a temporal interval of 10 frames.

Table 5. Comparison of inference speeds.

Method                     Re-ID  FPS ↑
Baseline + MCSA + BYTE     ✓      26.55
Baseline + MCSA + BYTE     ×      27.86

4.4 Ablation Study

This work conducted an ablation study on the vehicle detection and tracking performance of FAFVTC under different constraints. In the ablation experiments, FairMOT was used as the baseline model, DLA-34 served as the backbone network, and the Adam optimizer was used for training.


Table 4 and Table 5 present the impact of different components on the overall performance, with the MOT metrics used to evaluate the detection and tracking performance. The experiments demonstrate the effectiveness of the proposed improvements. Compared to the baseline (FairMOT), Method 3 (FAFVTC) achieved a 2.16 increase in the MOTA metric. The MCSA-DLA-34 backbone network had a significant impact on model accuracy, as Method 1 showed a 1.51 improvement in MOTA compared to the baseline. MCSA outperformed MCA, as Method 3 achieved a 0.47 increase in MOTA compared to Method 2. The ReID branch had a minimal impact on model performance, as Method 4 showed only a marginal improvement in MOTA compared to Method 3. The BYTE data association method played a crucial role in improving model accuracy, as Method 4 achieved a 0.84 increase in MOTA compared to Method 1. Removing the Re-ID branch and replacing the original DeepSORT-like data association algorithm with BYTE improved the model’s inference speed without compromising inference accuracy.

5 Conclusion

In complex traffic environments, the misdetection and occlusion of moving objects can easily lead to tracking errors. In this work, we investigate the problem of efficient vehicle object tracking and counting and propose the MCSA attention mechanism along with the MCSA-DLA-34 backbone network, which effectively utilizes frequency and spatial domain information. Building upon MCSA-DLA-34, we introduce the FAFVTC framework for vehicle tracking and counting. By replacing the data association method and removing the Re-ID branch, we improve the model's inference speed without compromising accuracy. Experimental results on the UA-DETRAC dataset and surveillance videos captured in real-world scenarios demonstrate the effectiveness of our approach. The proposed FAFVTC method offers a solution for fast and accurate online vehicle tracking and counting, with potential applications in real-world traffic surveillance. In the future, we will design a lighter and faster model to further enhance the accuracy and efficiency of online vehicle tracking and counting.

Acknowledgment. This work is supported by the Zhejiang Provincial Science and Technology Planning Key Project of China under Grant Nos. 2021C03129 and 2021C01194, and the Eagle Plan of the Zhejiang Provincial Administration for Market Regulation under Grant No. CY2022339.

References 1. Henriques, J.F., et al.: High-speed tracking with kernelized correlation filters. TPAMI 37(3), 583–596 (2014) 2. Zhang, Y., et al.: Fairmot: on the fairness of detection and re-identification in multiple object tracking. IJCV 129(11), 3069–3087 (2021)


3. Li, S., et al.: FAFMOTS: a fast and anchor free method for online joint multi-object tracking and segmentation. In: ISMARW, pp. 465–470 (2022) 4. Bewley, A., et al.: Simple online and realtime tracking. In: ICIP, pp. 3464–3468 (2016) 5. Wojke, N., et al.: Simple online and realtime tracking with a deep association metric. In: ICIP, pp. 3645–3649 (2017) 6. Zhang, Y., et al.: Bytetrack: multi-object tracking by associating every detection box. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13682, pp. 1–21. Springer, Cham (2022). https://doi.org/ 10.1007/978-3-031-20047-2 1 7. Guo, S., et al.: Online multiple object tracking with cross-task synergy. In: CVPR, pp. 8136–8145 (2021) 8. Li, W., et al.: Simultaneous multi-person tracking and activity recognition based on cohesive cluster search. CVIU 214, 103301, 1–13 (2022) 9. Wang, H., et al.: STURE: spatial-temporal mutual representation learning for robust data association in online multi-object tracking. CVIU 220, 1–10 (2022) 10. Zhou, X., Koltun, V., Kr¨ ahenb¨ uhl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8 28 11. Wu, J., et al.: Track to detect and segment: an online multi-object tracker. In: CVPR, pp. 12352–12361 (2021) 12. Chu, P., et al.: Famnet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: ICCV, pp. 6172–6181 (2019) 13. Wang, G., et al.: Track without appearance: Learn box and tracklet embedding with local and global motion patterns for vehicle tracking. In: ICCV, pp. 9876– 9886 (2021) 14. Hu, J., et al.: Squeeze-and-excitation networks. In: CVPR, pp. 7132–7141 (2018) 15. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/9783-030-01234-2 1 16. Ruan, D., et al.: Linear context transform block. In: AAAI, vol. 34, no. 4, pp. 5553–5560 (2020) 17. Ruan, D., et al.: Gaussian context transformer. In: CVPR, pp. 15129–15138 (2021) 18. Qin, Z., et al.: Fcanet: frequency channel attention networks. In: ICCV, pp. 783– 792 (2021) 19. Su, B., et al.: CFCAnet: a complete frequency channel attention network for SAR image scene classification. In: IEEE J-STARS, vol. 14, pp. 11750–11763 (2021) 20. Guo, M.H., et al.: Beyond self-attention: external attention using two linear layers for visual tasks. TPAMI 45(5), 5436–5447 (2022) 21. Amato, G., et al.: Counting vehicles with deep learning in onboard UAV imagery. In: ISCC, pp. 1–6 (2019) 22. Zhang, B., et al.: A traffic surveillance system for obtaining comprehensive information of the passing vehicles based on instance segmentation. TITS 22(11), 7040– 7055 (2021) 23. Gomaa, A., et al.: Faster CNN-based vehicle detection and counting strategy for fixed camera scenes. MTA 81(18), 25443–25471 (2022)


24. Ciampi, L., et al.: Multi-camera vehicle counting using edge-AI. ESWA 207, 117929, 1–9 (2022) 25. Xu, H., et al.: Efficient CityCam-to-edge cooperative learning for vehicle counting in ITS. TITS 23(9), 16600–16611 (2022) 26. Yu, F., et al.: Deep layer aggregation. In: CVPR, pp. 2403–2412 (2018) 27. Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. CVIU 193, 102907, 1–9 (2020)

Ped-Mix: Mix Pedestrians for Occluded Person Re-identification

Shang Gao1,2, Chenyang Yu1, Pingping Zhang1, and Huchuan Lu1,2(B)

1 Dalian University of Technology, Dalian, China
[email protected], {zhpp,lhchuan}@dlut.edu.cn
2 NingBo Institute of Dalian University of Technology, Ningbo, China

Abstract. Occluded person re-identification is a very challenging task due to the interference of occluding objects. Most existing approaches concentrate on modifying the network architecture to facilitate the extraction of more distinctive local features or to render the network less sensitive to occlusions. However, such approaches easily fail when encountering previously unseen occlusions or when other humans act as occluders, due to the limited occlusion variance in the training set. In this paper, we propose a data augmentation method that blends the target pedestrian with other pedestrians to simulate non-target pedestrian occlusion. Furthermore, we propose a non-target suppression (NTS) loss to reduce the information flow from the occluded region to the final embedding, where the occluded region can be easily obtained from the augmentation. Experimental results demonstrate that this simple augmentation technique yields significant performance improvements in the task of occluded person re-identification.

Keywords: Occluded person re-identification · Data Augmentation · Non-Target Suppression Loss

1 Introduction

Person re-identification (ReID) is the task of finding and matching the same person across multiple non-overlapping cameras. This task has received a lot of attention in recent years because of its wide applications in surveillance systems. The performance of ReID has been significantly improved thanks to the abundant training data provided by large-scale benchmarks [18,25,30] and the powerful representation capabilities of deep learning techniques [6,10,22]. However, the occlusion problem caused by various obstacles such as vehicles, road signs, or other pedestrians is still a challenging issue and widely occurs in real application scenarios. Many network structures that incorporate attention mechanisms have been proposed [16,17,33], while some methods use human parsing and keypoint estimation to align different human parts [7,19,21,23]. However, such methods are limited by the fact that the current datasets for ReID usually exhibit limited variance in occlusion, which reduces the robustness when encountering new occlusion types. Occlusion can be divided into two types [24]: non-pedestrian occlusion and non-target pedestrian occlusion. As shown in Fig. 1, the situation of occlusion is very complex and varied.

Fig. 1. Illustration of various occlusion scenarios and different data augmentation methods: (a) Non-pedestrian occlusion, (b) Non-target pedestrian occlusion, (c) Random erasing [32], (d) Cutout [5], (e) NPO [24], (f) The proposed Ped-Mix.

Data augmentation is an effective way to enlarge the training data and enhance the model's robustness, and it can complement the above methods. Most of the data augmentation methods used in occluded person ReID can be regarded as CutMix [28] family methods, which randomly select a region from the input image and then replace the selected region with another image patch. As shown in Fig. 1, random erasing [32] substitutes the original region with random values; Cutout [5] sets the replacement area to zero; while NPO [24] occludes the selected region with a random pre-cropped background patch. Among these augmentation methods, most only account for the case of non-pedestrian occlusion, and only the method proposed in [24] considers the case of non-target pedestrian occlusion. It simulates multi-pedestrian images by diffusing characteristics of non-target pedestrians to the original features. This method cannot guarantee the rationality of the diffusion, resulting in a very limited performance improvement, and it requires extra computational overhead. In this paper, to address the occlusion problem caused by non-target pedestrians, we propose a novel data augmentation approach, Ped-Mix. It can be easily applied to various network structures. In occluded person ReID, it is generally assumed that the pedestrian closest to the center of the bounding box is the target pedestrian, while any other pedestrians in the same image are considered as occlusion or interference. Based on this assumption, we design our data augmentation method. In the training phase, an image undergoes the following three operations: 1) randomly select another sample within the batch; 2) randomly shift the selected image and crop out the overlapping part with its original area; 3) randomly mask the cropped patch and replace the overlapping regions of the target image with the cropped patch. This method enables the simulation of realistic occlusion between pedestrians during the training process, while the mask operation prevents the model from only extracting the most obvious visible parts. The experiments show that using this simple data augmentation method can significantly boost performance.


Furthermore, the features of the target person should not obtain any information from the occluded area. Therefore, we attempt to use attention rollout [1] to track down the information propagated from the input layer to the embeddings in the higher layers, and propose a non-target suppression loss to constrain the attention to be as small as possible in the artificially occluded region. The main contributions of the paper are summarized as follows:

– A straightforward yet highly effective data augmentation technique, Ped-Mix, is proposed to tackle the problem of insufficient diversity of occlusion samples for occluded person ReID.
– A non-target suppression loss is proposed to reduce the information flow from the occluded region.
– Extensive experiments on two publicly occluded benchmarks demonstrate the superiority of our method.

2 Related Works

2.1 Occluded Person Re-identification

The occlusion problem has two main difficulties. Firstly, the features of the occluded region will interfere with the target pedestrian features and destroy the complete pedestrian features. Secondly, occlusion will cause the deficiency of local features of the target pedestrian, which will lead to misalignment in matching. To address the aforementioned problems, part-to-part matching strategies [7,18,19,21,33] tackle occluded scenarios by exploiting alignment relations among body parts according to the similarity of local spatial features across query and gallery images. These methods can be roughly summarized into two streams: 1) external cues like human parsing [2,19] or pose estimation [7,18,21] are leveraged to align parts of bodies and determine whether the local parts are visible. Somers et al. [19] propose to learn multiple features for particular body parts with a global-identity local-triplet loss. Wang et al. [21] combine patch and key-point information to enhance local patch features. 2) identity priors are leveraged to generate the part regions and are used as pseudo-labels to train the ReID network [33]. Other approaches [14,27] are based on feature reconstruction, which generate the features of the occluded region from the other parts or from neighboring samples. Hou et al. [14] locate occluded human parts by keypoints and propose Region Feature Completion (RFC) to recover the semantics of occluded regions in feature space. The above methods are limited by the fact that the current datasets for person re-identification usually exhibit limited variance in occlusion, which reduces the robustness of the trained networks when encountering new occlusion types.

2.2 Data Augmentation and Training Loss

Many methods [24,26,29] adopt data augmentation strategies to expand the diversity of occlusion, as well as to guide the attention network to focus more on the pure target person than the occlusion. Zhuo et al. [34] design an occlusion simulator that uses random patches from the background as artificial occlusion to cover the full-body person image. Zhao et al. [29] add occlusion on the input image with an easy-to-hard strategy together with an adversarial suppression loss, making the network more robust to occlusion by gradually learning harder occlusion instead of the hardest occlusion or random occlusion directly. Xia et al. [26] swap the original region with background regions together with a learnable attention disturbance mask to divert attention away from actual occlusions during testing. Wang et al. [24] simulate multi-pedestrian images by diffusing characteristics of non-target pedestrians to the original features.

3 Proposed Method

In this section, we introduce the proposed data augmentation method Ped-Mix and non-target suppression loss in detail. The architecture is depicted in Fig. 2. We build our feature extractor based on the transformer-based image classification model ViT [6]. We first describe the detailed procedure of Ped-Mix. Following this, we introduce non-target suppression loss based on the augmentation region. Finally, the implementation of our training procedure is introduced.

Fig. 2. Overview of the training procedure.

In the occluded person re-identification task, the pedestrian closest to the center of the bounding box is usually considered as the target pedestrian, while other pedestrians appearing in the same image can be considered as interference. Ped-Mix is designed according to this assumption. The overall process is shown in Fig. 3.

3.1 Ped-Mix

Fig. 3. Flowchart of the proposed Ped-Mix.

Let x ∈ R^{W×H×C} and y denote a training image and its label, respectively. The goal of Ped-Mix is to generate a new training sample x̃ by combining two training samples x_A and x_B, where x_B needs to shift away from the center of the bounding box. Firstly, we need to determine the amount and direction of the shift, where the amount of the offset corresponds to the size of the occlusion area O_c, and the direction determines the position of the occlusion. We initialize r_o, which is the ratio of the occlusion size to the original image size. The ratio r_w = w/W is sampled from the distribution U(r_o, 1.0), where w is the width of the occlusion region. Then, we have r_h = r_o / r_w, and we can get h = H × r_h and w = W × r_w, where h is the height of the selected region. In situations where there is occlusion between pedestrians, the individual located at the upper portion of the image is identified as the target pedestrian. Thus, we have two options for the shift direction, namely the bottom left and bottom right of x_A. After randomly choosing a direction, we can get the exact position of the overlapping region on x_A and x_B, respectively. The augmented sample x̃ is obtained by cropping the overlapping patch of x_B and pasting it to the corresponding region of x_A. Through experiments we found that this full-patch occlusion method may lead the model to only extract the most obvious visible parts, thus reducing its generalization ability; we show this in Sect. 4.6. To address this issue, we randomly occlude certain patches as in [9]. Finally, the model can be trained with its original loss function using the generated training sample (x̃, y).
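As a concrete illustration of the procedure above, the following PyTorch sketch generates the occluded batch; it follows the description in this section (area ratio r_o, a bottom-left or bottom-right shift, and patch-wise random masking), but the function name, tensor layout, and the exact crop geometry are our own simplifications rather than the authors' implementation.

```python
import torch

def ped_mix(batch, r_o=0.5, patch=12, mask_ratio=0.5):
    """Occlude each image with a shifted, randomly masked crop of another
    pedestrian image from the same batch.  batch: (B, C, H, W) float tensor.
    Returns (augmented batch, occlusion mask of shape (B, 1, H, W))."""
    B, C, H, W = batch.shape
    x_b = batch[torch.randperm(B, device=batch.device)]   # 1) pick another sample
    out, occ = batch.clone(), torch.zeros(B, 1, H, W, device=batch.device)
    for i in range(B):
        r_w = torch.empty(1).uniform_(r_o, 1.0).item()     # r_w ~ U(r_o, 1.0)
        r_h = r_o / r_w                                    # keep the area ratio at r_o
        h, w = max(int(H * r_h), 1), max(int(W * r_w), 1)
        # 2) shift x_B towards the bottom-left or bottom-right corner of x_A.
        if torch.rand(1).item() < 0.5:                     # bottom-left overlap
            ya, xa, crop = H - h, 0, x_b[i, :, :h, W - w:]
        else:                                              # bottom-right overlap
            ya, xa, crop = H - h, W - w, x_b[i, :, :h, :w]
        # 3) randomly mask patches of the crop before pasting it onto x_A.
        gh, gw = (h + patch - 1) // patch, (w + patch - 1) // patch
        keep = (torch.rand(gh, gw, device=batch.device) > mask_ratio).float()
        keep = keep.repeat_interleave(patch, 0)[:h].repeat_interleave(patch, 1)[:, :w]
        region = out[i, :, ya:ya + h, xa:xa + w]
        out[i, :, ya:ya + h, xa:xa + w] = keep * crop + (1 - keep) * region
        occ[i, 0, ya:ya + h, xa:xa + w] = keep
    return out, occ
```

The returned occlusion mask marks the pasted non-target pixels and can be pooled to the patch-token grid to serve as the binary mask M in the non-target suppression loss introduced in the next section.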

3.2 Non-Target Suppression Loss

The attention of each Transformer block reflects how information interacts between features. It can be formulated as:

A = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{c/N_h}}\right),   (1)

where Q and K are the queries and keys of the tokens, and N_h is the number of heads in multi-head attention. The mixing of information increases across layers. Attention rollout [1] is proposed to approximately track the flow of information. The attention rollout from the l_j-th layer to the l_i-th layer can be obtained by recursively multiplying the attention weights as:

\tilde{A}(l_i) = \begin{cases} \bar{A}(l_i)\,\tilde{A}(l_{i-1}), & \text{if } i > j; \\ \bar{A}(l_i), & \text{if } i = j, \end{cases}   (2)

where j = 0 for the input attention and \bar{A}(l_i) is the normalized raw attention with residual:

\bar{A}(l_i) = \mathrm{norm}(I + A(l_i)),   (3)

where A(l_i) is the attention of the l_i-th layer defined in Eq. 1. We wish the attention rollout corresponding to the occlusion mask to be as small as possible. To this end, a non-target suppression loss is designed, which can be formulated as:

L_{nts} = \left\| \tilde{A}_{cls}(L) \cdot M \right\|_2,   (4)

where M is the binary mask of the exchanged region of the augmented input, and \tilde{A}_{cls}(L) indicates the attention rollout from each location to the class token.
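To make Eqs. (2)–(4) concrete, the sketch below computes a head-averaged attention rollout and the masked norm on the class-token row; the tensor shapes, the head averaging, and the function names are our assumptions, not the authors' code.

```python
import torch

def attention_rollout(attn_layers):
    """attn_layers: list of per-block attention maps of shape (B, heads, N, N).
    Returns the rollout of shape (B, N, N), following Eqs. (2) and (3)."""
    rollout = None
    for attn in attn_layers:
        a = attn.mean(dim=1)                                  # average over heads
        a = a + torch.eye(a.size(-1), device=a.device)        # add residual: I + A(l_i)
        a = a / a.sum(dim=-1, keepdim=True)                   # row-normalize
        rollout = a if rollout is None else a @ rollout       # recursive product
    return rollout

def nts_loss(attn_layers, occ_mask):
    """Non-target suppression loss of Eq. (4).
    occ_mask: (B, N-1) binary mask over patch tokens (1 = artificially occluded)."""
    rollout = attention_rollout(attn_layers)
    cls_to_patches = rollout[:, 0, 1:]                        # rollout into the class token
    return (cls_to_patches * occ_mask).norm(dim=-1).mean()
```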

3.3 Training Procedure

In this section, we utilize the double batch setting as in [29], and train the holistic batch and the occluded batch separately as in [26]. The architecture is depicted in Fig. 2. Given a batch of images X ∈ R^{B×C×H×W}, where B is the batch size and C is the number of input channels, we first augment each sample within the batch by Ped-Mix and send them into the ViT backbone. Finally, the class token features F^h = [f_1^h, f_2^h, ..., f_B^h] and F^o = [f_1^o, f_2^o, ..., f_B^o] of the last layer are chosen as the embeddings, where F^h and F^o are the features of holistic samples and occluded samples, respectively. Besides, the attention matrices of each layer are also output from ViT. We optimize the network by adopting the cross-entropy loss as the identity loss L_{id} and the soft-margin triplet loss [13] L_{tri} as the metric loss:

L_{id} = -\frac{1}{B}\sum_{i=1}^{B} \log \frac{e^{(W^{y_i})^{T} f_i}}{\sum_{j=1}^{C} e^{(W^{j})^{T} f_i}},   (5)

L_{tri} = \frac{1}{B}\sum_{i=1}^{B} \ln\!\left(1 + e^{\,\|f_i - f_i^{p}\|_2 - \|f_i - f_i^{n}\|_2}\right),   (6)

where W represents the weight of the classifier, C is the number of classes, and y_i is the identity label for the i-th sample. The f^p and f^n in the triplet loss refer to the hardest positive and negative features within the batch. As only the class token is used as the embedding, we only focus on the information flow towards the class token. The final loss is formulated as:

L_{reid} = L_{id} + L_{tri}, \quad L = L_{reid} + L_{nts}.   (7)
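For reference, a compact PyTorch rendering of Eqs. (5)–(7) is given below; it assumes the classifier logits are W^T f and uses batch-hard mining for the triplet terms, with L_nts computed as in the previous sketch.

```python
import torch
import torch.nn.functional as F

def reid_loss(feats, logits, labels):
    """Identity loss (Eq. 5) plus batch-hard soft-margin triplet loss (Eq. 6).
    feats: (B, D) class-token embeddings, logits: (B, C), labels: (B,)."""
    l_id = F.cross_entropy(logits, labels)
    dist = torch.cdist(feats, feats)                       # (B, B) Euclidean distances
    same = labels[:, None].eq(labels[None, :])
    d_pos = dist.masked_fill(~same, float('-inf')).max(dim=1).values   # hardest positive
    d_neg = dist.masked_fill(same, float('inf')).min(dim=1).values     # hardest negative
    l_tri = F.softplus(d_pos - d_neg).mean()               # ln(1 + exp(d_p - d_n))
    return l_id + l_tri

# Final objective (Eq. 7), with the nts_loss term from the previous sketch:
# loss = reid_loss(feats, logits, labels) + nts_loss(attn_layers, occ_mask)
```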

4 Experiment

4.1 Datasets and Evaluation Measures

To evaluate the effectiveness of the proposed Ped-Mix, we conduct experiments on four benchmarks: Market-1501 [30], DukeMTMC-reID [31], Occluded-Duke [18], and Occluded-REID [34].

Market-1501 [30] contains 32,668 labeled images of 1,501 identities observed from 6 cameras. The training set contains 12,936 images of 751 identities. Few images in this dataset are occluded.

DukeMTMC-reID [31] consists of 16,522 training images of 702 persons, 2,228 queries of 702 persons, and 17,661 gallery images of 702 persons from 8 cameras.

Occluded-Duke [18] is a large-scale dataset collected from DukeMTMC-reID [31] for occluded person re-identification, which is by far the largest occluded ReID dataset. The training set consists of 15,618 images of 702 persons. The testing set contains 2,210 images of 519 persons as the query and 17,661 images of 1,110 persons as the gallery.

Occluded-REID [34] was captured by mobile cameras on campus and includes 2,000 annotated images belonging to 200 identities. Each person has 5 full-body person images and 5 occluded person images with various occlusions. Due to the absence of a prescribed training/test split, the model is trained on Market-1501 [30], and all the images are adopted for testing.

Evaluation Protocols. We report the Cumulated Matching Characteristics (CMC) [8] and mean Average Precision (mAP) [30] for the proposed approach. All experiments are conducted in the single query mode.

4.2 Implementation Details

Unless otherwise specified, all images are resized to 256×128. The batch size is set to 64 with 4 images per ID. We adopt ViT-B [6] pre-trained on ImageNet [3] as our backbone. The SGD optimizer is employed with a momentum of 0.9 and a weight decay of 1e−4. The learning rate is initialized as 0.004 with cosine learning rate decay. The training images are augmented with random horizontal flipping, padding and random cropping for all experiments. Random erasing [32] is used for the baseline, and is removed when using Ped-Mix. r_o is set to 0.5. The patch size and mask ratio of the random masking are set to 12 and 0.5, respectively. See the supplementary material for the analysis of the relevant parameters.
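The settings above translate into a standard torchvision/PyTorch pipeline; the sketch below mirrors them (the padding size of 10 pixels and the schedule length are illustrative choices, not values stated in the paper).

```python
import torch
import torchvision.transforms as T

train_transform = T.Compose([
    T.Resize((256, 128)),
    T.RandomHorizontalFlip(p=0.5),
    T.Pad(10),                      # padding size is an assumption
    T.RandomCrop((256, 128)),
    T.ToTensor(),
])

def build_optimizer(model, num_epochs):
    # SGD with momentum 0.9, weight decay 1e-4, initial lr 0.004, cosine decay.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.004,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
    return optimizer, scheduler
```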

4.3 Ablation Studies

In this section, we implement the ablation studies on the Occluded-Duke dataset to analyze the influence of each module of the proposed method. In our study, the baseline method adopts ViT as the backbone and is trained with the original softmax loss and triplet loss without any artificial occlusion.

Table 1. Ablation study of each proposed module on the Occluded-Duke dataset.

Methods                      Rank-1  Rank-5  Rank-10  mAP
baseline                     59.7    75.3    80.7     49.8
RE [32]                      61.3    77.7    81.8     53.5
NPO [24]                     63.9    79.4    84.7     54.2
APD [26]                     66.2    81.6    86.3     57.7
Ped-Mix                      65.3    80.9    85.3     56.4
Ped-Mix + Lnts               66.3    81.7    86.4     57.2
Ped-Mix + NPO [24]           67.0    82.4    86.2     57.3
Ped-Mix + NPO [24] + Lnts    68.8    83.1    87.1     58.1

From the results, we can observe that training with images occluded by Ped-Mix significantly improves the model performance: Rank-1 and mAP increase by +5.6% and +6.6%, respectively, over the baseline. Besides, with the assistance of L_{nts}, the performance further increases from 65.3% to 66.3% in Rank-1 and from 56.4% to 57.2% in mAP. Furthermore, as Ped-Mix aims to simulate non-target pedestrian occlusion, we combine it with NPO [24] and further improve Rank-1 and mAP from 65.3% to 67.0% and from 56.4% to 57.3%, respectively. Finally, by combining all the components, our model achieves 68.8% in Rank-1 and 58.1% in mAP. It is noteworthy that APD [26] is the latest augmentation method for occluded person ReID. We compare our method with it under the same setting and find that our method achieves comparable results.

4.4 Comparison with State-of-the-Art Methods

We compare Ped-Mix with existing state-of-the-art (SOTA) methods on two occluded datasets, and the results are shown in Table 2. Here we replace the identity loss with the Arcface loss [4] as in [20,26]. The compared methods can be divided into CNN-based and Transformer-based methods. As can be seen from Table 2, transformer-based methods outperform the CNN-based methods by a large margin. On the most challenging Occluded-Duke dataset, our proposed method Ped-Mix achieves 70.8% in Rank-1 and 59.2% in mAP, significantly improving the Rank-1 accuracy by 4.0% and mAP by 6.6% over the CNN-based SOTA method BPBreID [19], which is a part-based method with high-dimensional features at test time. Compared with the transformer-based SOTA method FED [24], which also considers non-target pedestrian occlusion but with complex structures, our method achieves +2.7% in Rank-1 and +2.8% in mAP. PFD [23] achieves the best result on mAP, which mainly benefits from its part-based model and matching with the visibility score. In contrast, our method achieves comparable results with a single global feature. Furthermore, with a small-stride sliding-window setting, the proposed method can further achieve a higher performance of 73.1% in Rank-1 and 61.9% in mAP. On the Occluded-REID dataset, our method also consistently outperforms current SOTAs. Specifically, it achieves 87.1% in Rank-1 and 81.4% in mAP. Note that, in this transfer setting, the methods using data augmentation significantly outperform the other methods when facing unseen occlusions, which supports our point.

Table 2. Comparison with state-of-the-art methods on Occluded-Duke and Occluded-REID. ∗ indicates that the backbone has a sliding-window setting and a smaller stride. † indicates that the backbone is HrNet [22].

Methods             Occluded-Duke        Occluded-REID
                    Rank-1    mAP        Rank-1    mAP
DSR [11]            40.8      30.4       72.8      62.8
Ad-Occluded [15]    44.5      32.2       –         –
PVPM [7]            47.0      37.7       66.8      59.5
HOReID [21]         55.1      43.8       80.3      70.2
ISP [33]†           62.8      52.3       –         –
BPBreID [19]        66.8      52.6       76.9      68.6
PAT [17]            64.5      53.6       81.6      72.1
TransReID [12]      64.2      55.7       –         –
PFD [23]            67.7      60.1       79.8      81.3
FED [24]            68.1      56.4       86.3      79.3
Ours                70.8      59.2       87.1      81.4
TransReID∗ [12]     66.4      59.2       –         –
PFD∗ [23]           69.5      61.8       81.5      83.0
DPM∗ [20]           71.4      61.8       85.5      79.7
Ours∗               73.1      61.9       88.5      83.5

4.5 Visualization

We demonstrate how our method overcomes the occlusion constraint by providing several examples of person image ranking; Fig. 4 shows the experimental results. We can observe that our method can overcome both non-pedestrian occlusions and non-target pedestrian occlusion and identify images of the same pedestrian correctly (highlighted by green boxes). In comparison, the baseline network is obviously very sensitive to occlusions.


Fig. 4. Ranking lists for the baseline and our proposed method. The green and red boxes highlight positive and negative matches. The image without a bounding box is the query. (Color figure online)

4.6 Why Random Masking

The full-patch occlusion method may lead the model to only extract the most obvious visible parts, thus reducing its generalization ability. We demonstrate this by comparing our method with the Cut-Mix variant on the Occluded-REID dataset, where Cut-Mix means conducting Ped-Mix without random masking. The results are shown in Table 3. It can be seen that our method outperforms Cut-Mix in this transfer setting, demonstrating that our method has better generalization performance.

Table 3. Comparison with Cut-Mix.

Method    Occluded-REID
          Rank-1    mAP
Cut-Mix   87.5      81.3
Ours      88.5      83.5

Table 4. Results on holistic datasets.

Method    Market1501        DukeReID
          Rank-1    mAP     Rank-1    mAP
Baseline  94.7      87.4    89.2      79.7
Ours      94.8      87.5    89.7      79.7

4.7 Results on Holistic Datasets

Though our method is specifically designed for occlusion situations, we also conduct experiments on holistic ReID datasets to verify its robustness. The results are shown in Table 4; our method does not weaken the performance on the holistic datasets, demonstrating its robustness to various data.

5 Conclusion

In this paper, a straightforward yet highly effective data augmentation technique, Ped-Mix, is introduced to tackle the problem of insufficient diversity of occlusion samples for the occluded person re-identification task, especially for non-target pedestrian occlusion. In addition, a non-target suppression loss is proposed to reduce the activation of the occluded region. Extensive experiments on two publicly occluded benchmarks demonstrate the superiority of our method.

References 1. Abnar, S., Zuidema, W.: Quantifying attention flow in transformers. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, January 2020. https://doi.org/10.18653/v1/2020.acl-main.385 2. Cheng, X., Jia, M., Wang, Q., Zhang, J.: More is better: multi-source dynamic parsing attention for occluded person re-identification. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 6840–6849 (2022) 3. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) 4. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019) 5. DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017) 6. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arxiv 2020. arXiv preprint arXiv:2010.11929 (2010) 7. Gao, S., Wang, J., Lu, H., Liu, Z.: Pose-guided visible part matching for occluded person ReID. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11744–11752 (2020) 8. Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: Proc. IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), vol. 3, pp. 1–7 (2007) 9. He, K., Chen, X., Xie, S., Li, Y., Dollar, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022. https://doi.org/10.1109/cvpr52688. 2022.01553 10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 11. He, L., Liang, J., Li, H., Sun, Z.: Deep spatial feature reconstruction for partial person re-identification: alignment-free approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7073–7082 (2018) 12. He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: TransReID: transformerbased object re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15013–15022 (2021) 13. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person reidentification. arXiv preprint arXiv:1703.07737 (2017) 14. Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: Feature completion for occluded person re-identification. IEEE Trans. Pattern Anal. Mach. Intell. 44(9), 4894–4912 (2021) 15. Huang, H., Li, D., Zhang, Z., Chen, X., Huang, K.: Adversarially occluded samples for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5098–5107 (2018)


16. Jia, M., Cheng, X., Lu, S., Zhang, J.: Learning disentangled representation implicitly via transformer for occluded person re-identification. IEEE Trans. Multimedia (2022) 17. Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., Wu, F.: Diverse part discovery: occluded person re-identification with part-aware transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2898–2907 (2021) 18. Miao, J., Wu, Y., Liu, P., Ding, Y., Yang, Y.: Pose-guided feature alignment for occluded person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 542–551 (2019) 19. Somers, V., De Vleeschouwer, C., Alahi, A.: Body part-based representation learning for occluded person re-identification. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1613–1623 (2023) 20. Tan, L., Dai, P., Ji, R., Wu, Y.: Dynamic prototype mask for occluded person re-identification. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 531–540 (2022) 21. Wang, G., et al.: High-order information matters: learning relation and topology for occluded person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6449–6458 (2020) 22. Wang, J., et al.: Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2020) 23. Wang, T., Liu, H., Song, P., Guo, T., Shi, W.: Pose-guided feature disentangling for occluded person re-identification based on transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 2540–2549 (2022) 24. Wang, Z., Zhu, F., Tang, S., Zhao, R., He, L., Song, J.: Feature erasing and diffusion network for occluded person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4754–4763 (2022) 25. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 79–88 (2018) 26. Xia, J., Tan, L., Dai, P., Zhao, M., Wu, Y., Ji, R.: Attention disturbance and dual-path constraint network for occluded person re-identification. arXiv preprint arXiv:2303.10976 (2023) 27. Ye, Y., Zhou, H., Yu, J., Hu, Q., Yang, W.: Dynamic feature pruning and consolidation for occluded person re-identification. arXiv preprint arXiv:2211.14742 (2022) 28. Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019) 29. Zhao, C., Lv, X., Dou, S., Zhang, S., Wu, J., Wang, L.: Incremental generative occlusion adversarial suppression network for person ReID. IEEE Trans. Image Process. 30, 4212–4224 (2021) 30. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person reidentification: a benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1116–1124 (2015) 31. Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3754–3762 (2017) 32. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 
13001–13008 (2020)


33. Zhu, K., Guo, H., Liu, Z., Tang, M., Wang, J.: Identity-guided human semantic parsing for person re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part III. LNCS, vol. 12348, pp. 346–363. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8 21 34. Zhuo, J., Chen, Z., Lai, J., Wang, G.: Occluded person re-identification. In: 2018 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2018)

Object-Aware Transfer-Based Black-Box Adversarial Attack on Object Detector

Zhuo Leng1, Zesen Cheng1, Pengxu Wei2, and Jie Chen1,3(B)

1 School of Electronic and Computer Engineering, Peking University, Shenzhen, China
[email protected]
2 Sun Yat-Sen University, Guangzhou, China
3 Peng Cheng Laboratory, Shenzhen, China

Abstract. Deep neural networks have been demonstrated to be vulnerable to adversarial noise from attacks. Compared with white-box attacks, black-box attacks fool deep neural networks into yielding erroneous predictions without knowing the model parameters. Black-box attacks include query-based attacks and transfer-based attacks; the former rely on querying the model while the latter rely only on the transferability of adversarial examples, which is more challenging. Existing transfer-based black-box adversarial attack methods focus on the image classification task. In particular, we empirically verify that those methods struggle to balance the attack on objects with different classes and sizes, and thus they perform poorly when attacking object detectors. In this work, we propose an Object-Aware mechanism to address this issue. It includes Object-Wise Gradient (OWG) calculation to balance the attack on multiple objects and a Domain-Division Map (DDM) to weigh the attack in size. Incorporating our method with seminal baselines (e.g., I-FGSM, MI-FGSM), we achieve superior attack performance on multiple object detectors (e.g., Faster R-CNN, DETR, SSD), which justifies the effectiveness and generality of our method.

Keywords: Black-box attack · Adversarial Attack · Object Detection

1 Introduction

Deep neural networks (DNN) are challenged by their vulnerability to adversarial examples [8,21]. By adding small and human-imperceptible noise to legitimate examples, adversarial examples can make a model output the inaccurate predictions that the attacker desires. Numerous attack methods have been proposed in recent years [6,8,11,14]. Adversarial examples have an intriguing property of transferability, where adversarial examples crafted on the current model can also fool other unknown models [23]. This characteristic makes adversarial samples feasible for transfer-based black-box attacks without any knowledge of the targeted model.


Fig. 1. Motivation and Contribution of our work. The vertical axis represents mAP drop rate (%), which represents the performance of the attack. (a) shows the imbalanced attack of the baseline (dash lines). Our improvements make the attack more balanced on different classes (solid lines). (b) shows that, with our improvements, attack performances are relatively more balanced on objects of different sizes. Both (a) and (b) show that we achieve a higher mAP drop rate than the baseline.

Recently, various methods have been proposed to enhance the transferability of black-box attacks [6,7,10,14,20,22,30]. However, existing methods are mostly designed for the image classification task, neglecting another significant task: object detection. Thus, we raise a question: can we build a transfer-based black-box attack for the object detection task? We empirically find that existing image-classification attackers perform unsatisfactorily when directly applied to object detectors because they cannot balance the attacks on multiple objects of different classes and sizes. As shown in Fig. 1(a), objects of different categories vary in vulnerability to the same attack: some categories are more vulnerable, e.g., bed, couch; others are more robust, e.g., bird, toaster. Figure 1(b) shows that, in the black-box setting, smaller objects are more likely to be attacked. To address the aforementioned problems, we propose an Object-Aware attack mechanism that can balance the attack on multiple targets. Specifically, we utilize an Object-Wise Gradient (OWG) calculation to balance the attack on multiple objects and a Domain-Division Map (DDM) to weigh the attack in size. Moreover, we improve the design of the loss function and assigner to accommodate the different requirements between training the network and conducting the attack. It is worth noting that our method does not require any knowledge of the model parameters, nor does it depend on querying [1,5,18] or on producing noticeable adversarial patches [10,28] on the image. Our work has achieved significant improvements in both black-box and white-box attacks. Our main contributions can be summarized as follows:

– We have proposed an Object-Aware adversarial attack method for detectors to balance the attack on multiple objects with different categories and sizes.


– We have taken into account the different challenges between training the network and attacking the network, and have enhanced the design of the loss function and the assigner to achieve better attack performance.
– Experimental results have demonstrated that our method is effective on multiple attack strategies and significantly improves their performance against various object detectors.

2 Related Work

2.1 Object Detection

Object detection is a fundamental computer vision task that localizes and classifies objects in images, and it is widely used in many fields, e.g., automatic driving and security systems. In the past decade, seminal deep learning-based object detection models have been proposed, e.g., single-stage methods like SSD [19] and RetinaNet [15], two-stage methods like Faster R-CNN [24], and Transformer [26] based models, e.g., DETR [2]. These methods greatly improve the performance of detectors. Recently, many multi-modal models [12,13,17] have been proposed to handle various vision-language problems, including open-set or open-vocabulary object detection, e.g., GLIP [13], Grounding DINO [17]. In this work, we perform the attacks only on pure vision models.

2.2 Adversarial Attack

Adversarial attack was first introduced for the image classification task in [8]. For a classifier f, let x be the original input tensor of the image, and let y and f(x; θ) be the ground-truth label and the predicted label with parameters θ, with y = f(x; θ) when the image is correctly predicted. Let J(x, y; θ) denote the loss function and x_{adv} be the adversarial image that the attacker should find. It should satisfy f(x; θ) ≠ f(x_{adv}; θ) under the constraint that the p-norm distance is within ε, that is, ‖x − x_{adv}‖_p < ε, and here we focus on p = ∞ as previous works did.

Fast Gradient Sign Method (FGSM) is the most basic method to generate adversarial examples by perturbing the image in one step towards the gradient direction:

x_{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y; \theta)).   (1)

The perturbation does not exceed the L∞ norm distance since the sign(·) function allows it to be constrained within ε.

Iterative Fast Gradient Sign Method (I-FGSM) breaks the single step of FGSM into multiple smaller ones to better maximize the loss function, and it achieves better performance in white-box attacks. Let α be the step size of each iteration; I-FGSM runs T iterations of FGSM with a Clamp(·) function to constrain the perturbation within the ε-ball:

x_{t+1}^{adv} = \mathrm{Clamp}_{x-\epsilon}^{x+\epsilon}\left\{ x_{t}^{adv} + \alpha \cdot \mathrm{sign}\left(\nabla_x J(x_{t}^{adv}, y; \theta)\right) \right\}.   (2)

Similar to I-FGSM, PGD also performs iterative FGSM, but starts from random points in a uniform distribution.

Momentum Iterative Fast Gradient Sign Method (MI-FGSM) integrates momentum into I-FGSM and achieves higher attack performance in both white-box and black-box settings. Initializing x_0 by x and g_0 by 0, and letting μ be the decay factor of g_t, the momentum g_{t+1} is calculated as:

g_{t+1} = \mu \cdot g_{t} + \frac{\nabla_x J(x_{t}^{adv}, y; \theta)}{\left\| \nabla_x J(x_{t}^{adv}, y; \theta) \right\|_1}.   (3)

Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) adopts the Nesterov accelerated gradient method into MI-FGSM. It accumulates the gradient after adding momentum to the current data point for faster convergence and higher transferability. Its only difference from MI-FGSM is that NI-FGSM substitutes x_{t}^{nes} for x_{t}^{adv} to calculate the gradient:

x_{t}^{nes} = x_{t}^{adv} + \alpha \cdot \mu \cdot g_{t}.   (4)
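As a reference implementation of the updates in Eqs. (2) and (3), the following PyTorch sketch performs MI-FGSM on a batch of images in [0, 1]; the step sizes are illustrative defaults, and `model`/`loss_fn` stand in for any differentiable model and loss.

```python
import torch

def mi_fgsm(model, loss_fn, x, y, eps=16/255, alpha=2/255, steps=10, mu=1.0):
    """MI-FGSM: iterative signed-gradient steps with an L1-normalized momentum.
    x: (B, C, H, W) images in [0, 1]; y: labels. Setting mu=0 recovers I-FGSM."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Eq. (3): accumulate the L1-normalized gradient with momentum.
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        # Eq. (2): signed step, then clamp back into the eps-ball and valid range.
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```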

Other Methods. Related studies can be divided into three categories: (a) more advanced optimization techniques for gradient-based methods [6,14], (b) ensemble-model attacks [6,10,20], and (c) input transformations [7,14,22,30], etc. We focus on the first category and have introduced the most important works above. Other gradient-based methods like VT [27] are mainly improved based on these methods. In addition, model ensemble methods and input transformations such as DIM [30], TIM [7], SIM [14], etc. can also improve the transferability of black-box attacks. In this work, we focus on an optimization-based method.

3 Method

In this section, we describe our approach in detail. In Sect. 3.1, we introduce our Object-Aware mechanism for adversarial attacks and explain why it is effective. In Sects. 3.2 and 3.3, we discuss the designs of the assigner and the loss functions of detectors when generating adversarial examples and explain their differences from the designs used for training the networks.

3.1 Object-Aware Mechanism

Our Object-Aware mechanism focuses on solving the problem of multi-target attacks. We emphasize that the biggest challenge is balancing the attacks on different objects with various categories and sizes. Correspondingly, our method consists of two components: one is the Object-Wise Gradient (OWG) calculation, and the other is the Domain-Division Map (DDM) for specific perturbations.


Fig. 2. The mechanism of Domain-Division Map. White pixels indicate 1 and are to be optimized to attack the special object. Black pixels are masked.

Object-Wise Gradient (OWG) Calculation. We calculate the gradient for each object in the image using an object-wise loss. Unlike [4,29], we do not sum the total loss weighted by the number of the corresponding classes, but directly obtain the gradient by backward propagation for each object. In detail, given an image x with N objects {e_1, e_2, ..., e_N}, the goal is to obtain the adversarial image x_{adv} within an ε-ball. For each object e_i, the assigner will arrange m_i (one or more) proposals to predict it, denoted as {p_{i,1}, p_{i,2}, ..., p_{i,m_i}}, and the loss for e_i can be formalized as:

L_i = \frac{1}{m_i} \sum_{j=1}^{m_i} L(e_i, p_{i,j}),   (5)

where L(·) is the loss function of the detector, including the classification loss and the regression loss. For each object e_i, we obtain the corresponding L_i with the above equation. Then, instead of calculating the sum of losses, we perform backward propagation for each L_i to calculate the Object-Wise Gradient of the image, ∇e_i. The gradients do not need to be normalized by the 1-norm distance since the sign(·) function, which will be executed in the optimization step, projects them onto a 1-spherical space.

Domain-Division Map (DDM). We further propose a Domain-Division Map to solve the imbalanced attack on different sizes and achieve better performance. DDM segments the image into parts using only the ground-truth bounding box of each object. To attack each object, we update the pixels according to the Domain-Division Map. As shown in Fig. 2, the boxes divide the pixels into three kinds: background (no object covering), independent area (one specific object covering), and intersection area (at least two objects covering). For each object, we do not update the pixels of the independent areas of other objects. DDM increases the weight of attacks on large objects, but interestingly, this does not exacerbate the imbalance of white-box attacks on objects of different sizes. The DDM is denoted as M ∈ {0, 1}^{N×H×W}, where N denotes the number of objects in the image, H denotes the height of the image, and W denotes the width of the image. First, we annotate the background pixels that are not covered by any object bounding box. Then, for each object, we set M_{i,j,k} = 1 if the pixel at location (j, k) is in the box area of object e_i or in the background area marked in the last step. Otherwise, M_{i,j,k} = 0. The process can be formalized as follows:

M_{i,j,k} = \begin{cases} 1, & (j, k) \in bg \ \text{or} \ (j, k) \in e_i, \\ 0, & \text{otherwise}, \end{cases}   (6)

where "bg" denotes the background area. With the Object-Wise Gradient and the Domain-Division Map, we can use any gradient-based attack method, including I-FGSM and MI-FGSM, to generate adversarial examples. We present the complete Object-Aware attack using I-FGSM in Algorithm 1.

Algorithm 1: Object-Aware Attack on Detectors with I-FGSM
Input: Image x, N objects {e_1, e_2, ..., e_N}; corresponding predictions of each object e_i ({p_{i,1}, p_{i,2}, ..., p_{i,m_i}}); Domain-Division Map M; iteration number T, step size α, and constraint factor ε of I-FGSM.
Output: Image after adversarial attack x_{adv}
  Initialize x_{adv} = x
  for iter = 0 to T − 1 do
      perturb = 0
      for e_i in {e_1, e_2, ..., e_N} do
          Calculate the loss L_i in Eq. 5
          Clear the existing gradient
          Backpropagate the gradient of L_i
          Compute the gradient of each object ∇e_i
          // Update perturb as in I-FGSM:
          perturb = perturb + M_i · α · sign(∇e_i)
      end
      x_{adv} = Clamp_{x−ε}^{x+ε}{x_{adv} + perturb}
  end
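A minimal PyTorch sketch of Eq. (6) and of the object-aware I-FGSM loop in Algorithm 1 is given below; `per_object_losses` is a hypothetical callable returning one scalar loss per ground-truth object (Eq. 5), and the code is an illustration of the mechanism rather than the authors' implementation.

```python
import torch

def domain_division_map(boxes, height, width):
    """Eq. (6): M has shape (N, H, W); M[i] is 1 on object i's box and on the background."""
    n = boxes.shape[0]
    in_box = torch.zeros(n, height, width)
    for i, (x1, y1, x2, y2) in enumerate(boxes.round().long().tolist()):
        in_box[i, y1:y2, x1:x2] = 1.0
    background = (in_box.sum(dim=0) == 0).float()      # pixels covered by no object
    return torch.clamp(in_box + background, max=1.0)

def object_aware_ifgsm(per_object_losses, x, boxes, eps=20.0, alpha=5.0, steps=20):
    """Algorithm 1: object-wise gradients, each masked by its Domain-Division Map."""
    ddm = domain_division_map(boxes, x.shape[-2], x.shape[-1]).to(x.device)
    x_adv = x.clone().detach()
    for _ in range(steps):
        perturb = torch.zeros_like(x)
        x_adv.requires_grad_(True)
        losses = per_object_losses(x_adv)              # one scalar loss per object (Eq. 5)
        for i, loss_i in enumerate(losses):
            grad_i = torch.autograd.grad(loss_i, x_adv, retain_graph=True)[0]
            perturb = perturb + ddm[i] * alpha * grad_i.sign()
        x_adv = x_adv.detach() + perturb
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project into the eps-ball
    return x_adv
```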

Assigners and Samplers of Detectors for Attack

In this work, we assign more positive boxes with a low threshold of IoU and meanwhile, we use a sampler with a higher ratio of positive boxes to get more positive examples. This may be counter-intuitive, because it will assign many background regions as positive predictions, thus leading to low precision. However, since the purpose of the attack is to maximize the loss, the false positive samples will not lead to much degradation of the attack performance. Figure 3 visually shows the motivation.

284

Z. Leng et al.

Fig. 3. Although some prediction boxes have a small IoU with ground truth, they should be regarded as positive samples to benefit the attack, just like the partially framed child in the picture who retains the strong feature of “person”.

3.3

Loss Functions of Detectors for Attack

Different from the training process, attackers should pay more attention to easierpredicted objects, as the easier it is to be predicted, the harder to be attacked. In this work, we replace the focal loss function of RetianNet [15] with crossentropy loss. We confirmed that, when attacking the whole image without our proposed Object-Aware mechanism, the cross-entropy loss is much better than focal loss, as we can show in Table 4(c). We attribute this to the fact that focal loss decreases the loss-weight of easily predicted samples and increases the lossweight of difficultly predicted samples. It works when training the network for better convergence, but is the exact opposite of what is needed when attacking the network. But the difference is much smaller when equipped with our ObjectAware mechanism, which is also presented in Table 4(c).

4 4.1

Experiments Implementation Details

For detectors, we implement our approach based on mmdetection [3]. We chose 5 models for our experiments: SSD300, SSD512, RetinaNet, Faster-RCNN, and DETR. SSD300 and SSD512 share the same architecture and the backbone of vgg16 [25], but they have the different size of neck and head, and input sizes of the image (The height and width of the SSD300 input image are 300, and the latter is 512). We choose RetinaNet [9] because it uses focal loss to train the detector, and it’s also a single-stage detector, just like SSD. We choose Faster-RCNN and DETR as they are the most classic two-stage detectors and transformer-based detectors. The RetinaNet, Fatser-RCNN, and DETR share the same backbone, Resnet50, and to align with SSD512, we resize the input image to 512 pixels both in height and width for the three models. All models are trained on COCO [16] in 12 epochs except DETR, trained for 150 epochs. For attackers, we select five optimization-based methods: FGSM, I-FGSM, PGD, MI-FGSM, and NI-FGSM, which have been formulated in Sect. 2.2. We set


ε = 20 to constrain the perturbation. For iterative methods, we set the iteration number T = 20 and the step size α = 5, and we reset α = 1 after 15 iterations. For MI-FGSM and NI-FGSM, we set the decay factor μ = 1.

Table 1. mAP drop rate compared with the baseline across models and attackers. "RCNN" denotes "Faster R-CNN".

Source | Attacker | DETR | RCNN | RetinaNet | SSD300 | SSD512
– | (clean) | 32.4 | 30.5 | 28.9 | 25.5 | 29.5
– | (noisy) | 31.6 | 29.7 | 28.2 | 25.0 | 28.4
DETR | I-FGSM | 17.0 | 25.4 | 23.4 | 23.7 | 26.4
DETR | I-FGSM+ours | 16.3 (↓ 2.2%) | 25.3 (↓ 0.3%) | 23.4 (↑ 0.0%) | 23.7 (↑ 0.0%) | 26.4 (↑ 0.0%)
RCNN | I-FGSM | 24.3 | 14.9 | 19.3 | 23.2 | 25.6
RCNN | I-FGSM+ours | 21.0 (↓ 10.2%) | 9.1 (↓ 19.0%) | 14.5 (↓ 16.6%) | 22.2 (↓ 3.9%) | 24.4 (↓ 4.1%)
RetinaNet | I-FGSM | 23.3 | 20.0 | 13.7 | 23.2 | 25.6
RetinaNet | I-FGSM+ours | 23.1 (↓ 0.6%) | 20.1 (↑ 0.3%) | 12.9 (↓ 2.8%) | 22.8 (↓ 1.6%) | 25.3 (↓ 1.0%)
SSD300 | I-FGSM | 27.5 | 26.1 | 25.0 | 9.0 | 21.9
SSD300 | I-FGSM+ours | 24.9 (↓ 8.0%) | 23.4 (↓ 8.9%) | 22.3 (↓ 9.3%) | 2.9 (↓ 23.9%) | 16.9 (↓ 16.9%)
SSD512 | I-FGSM | 27.7 | 26.2 | 25.2 | 20.9 | 7.7
SSD512 | I-FGSM+ours | 26.0 (↓ 5.2%) | 24.4 (↓ 5.9%) | 23.0 (↓ 7.6%) | 18.8 (↓ 8.2%) | 2.9 (↓ 16.3%)
DETR | MI-FGSM | 5.9 | 15.8 | 14.2 | 18.2 | 18.6
DETR | MI-FGSM+ours | 4.1 (↓ 5.6%) | 14.4 (↓ 4.6%) | 13.0 (↓ 4.2%) | 17.6 (↓ 2.4%) | 17.6 (↓ 3.4%)
RCNN | MI-FGSM | 14.7 | 8.0 | 11.4 | 18.1 | 18.6
RCNN | MI-FGSM+ours | 8.2 (↓ 20.1%) | 3.4 (↓ 15.1%) | 5.5 (↓ 20.4%) | 15.0 (↓ 12.2%) | 14.8 (↓ 12.9%)
RetinaNet | MI-FGSM | 15.3 | 13.4 | 8.7 | 18.4 | 19.0
RetinaNet | MI-FGSM+ours | 9.8 (↓ 17.0%) | 8.0 (↓ 17.7%) | 3.3 (↓ 18.7%) | 15.9 (↓ 9.8%) | 15.9 (↓ 10.5%)
SSD300 | MI-FGSM | 21.7 | 21.2 | 20.5 | 7.4 | 17.2
SSD300 | MI-FGSM+ours | 17.2 (↓ 13.9%) | 17.1 (↓ 13.4%) | 16.3 (↓ 14.5%) | 2.1 (↓ 20.8%) | 11.1 (↓ 20.7%)
SSD512 | MI-FGSM | 21.8 | 20.9 | 20.2 | 16.4 | 4.9
SSD512 | MI-FGSM+ours | 18.5 (↓ 10.2%) | 17.8 (↓ 10.2%) | 16.9 (↓ 11.4%) | 12.3 (↓ 16.1%) | 1.2 (↓ 12.5%)

For evaluation, mean Average Precision (mAP) is commonly used to measure the performance of detectors, so we use the mAP drop rate (%) to assess the attack performance. We generate 25 sets of adversarial examples across the combinations of the five models and five attackers on the COCO val2017 set and evaluate each set of adversarial examples with each model. The mAP drop rate can be formalized as

\text{mAP}_{\text{drop rate}}(\%) = \left(1 - \frac{\text{mAP}_{adv}}{\text{mAP}_{clean}}\right) \times 100\%, \qquad (7)

where mAP_{adv} is the mean Average Precision on adversarial images and mAP_{clean} is the mean Average Precision on clean images.

4.2 Main Results

Part of the main results of attack performance are shown in Table 1. The values indicate mAP on adversarial examples, and the percentages beside them represent the differ-


Table 2. mAP drop rate (%) of different attackers averaged on the five source models (DETR, Faster R-CNN, RetinaNet, SSD300, SSD512).

Settings | FGSM | I-FGSM | PGD | MI-FGSM | NI-FGSM
White-box baseline | 44.08 | 57.98 | 53.61 | 75.97 | 76.01
White-box baseline+ours | 53.88 | 70.80 | 66.49 | 90.50 | 90.56
Black-box baseline | 37.00 | 17.34 | 19.39 | 39.12 | 39.03
Black-box baseline+ours | 41.78 | 22.75 | 24.02 | 51.39 | 51.38

Table 3. mAP drop rate (%) of different source models averaged on the five attackers (FGSM, I-FGSM, PGD, MI-FGSM, NI-FGSM). "RCNN" denotes Faster R-CNN.

Settings | DETR | RCNN | RetinaNet | SSD300 | SSD512
White-box baseline | 60.06 | 56.30 | 55.60 | 60.72 | 69.86
White-box baseline+ours | 64.18 | 72.59 | 66.74 | 81.76 | 83.59
Black-box baseline | 28.27 | 31.91 | 31.85 | 26.55 | 23.87
Black-box baseline+ours | 30.28 | 43.57 | 39.29 | 37.84 | 31.98

ence of mAP drop rate between our method and the baseline. As we can see, our approach is effective in the vast majority of cases, in both the white-box and black-box settings. Further, we analyze the effect of different optimization methods on black-box and white-box attacks and our improvement to them. The averaged results are given in Table 2. As shown in the table, our method reaches a higher average mAP drop rate than the baseline by 9–15% in the white-box attack and 5–12% in the black-box attack, regardless of the attacker. Our method brings large improvements for MI-FGSM and NI-FGSM, and achieves state-of-the-art attack performance when using them as attackers in both white-box and black-box settings. Similarly, we analyze the improvement of our attack on different source models. The averaged results are given in Table 3. Again, except for DETR, our method benefits most source models greatly, with 10–21% additional mAP drop rate in the white-box setting and 8–17% in the black-box setting compared with the baseline.

4.3 Ablation Study

Object-Wise Gradient and Domain-Division Map. We analyze the results per class and show that our improvements balance the attack across different categories. As Fig. 1(a) shows, attackers equipped with our Object-Aware mechanism cause more balanced damage across classes. Figure 1(b) shows that we further improve the balance of attack performance across object sizes, with a higher increase of mAP drop rate on larger objects. Table 4(a) reports the ablation results on OWG and DDM, justifying their respective contributions.


Assigner for More Positive Examples. The results in Table 4(b) show the ablation study of assigners. As we can see, appropriately decreasing the IoU threshold and increasing the proportion of positive samples are both beneficial. A possible explanation is given in Fig. 3.

Focal Loss vs. CE Loss. We study the impact of focal loss and cross-entropy (CE) loss on the attack. The experiments are carried out on RetinaNet, and we find that CE loss performs much better, as the results in Table 4(c) show. We have stated our hypothesis in Sect. 3.3.

Table 4. Ablation studies of our design choices. The mAP drop rates (%) of (a) and (b) are calculated by averaging the attack performance over five detectors, i.e., DETR, Faster R-CNN, RetinaNet, SSD300, and SSD512.

(a) Ablations of OWG and DDM.

Attacker | OWG | DDM | White-box | Black-box
MI-FGSM | – | – | 75.97 | 39.12
MI-FGSM | ✓ | – | 80.47 | 44.89
MI-FGSM | ✓ | ✓ | 90.50 | 51.39
NI-FGSM | – | – | 76.01 | 39.03
NI-FGSM | ✓ | – | 80.44 | 45.02
NI-FGSM | ✓ | ✓ | 90.56 | 52.38

(b) Ablations of the assigner and sampler.

Attacker | IoU thr. | Positive ratio | White-box | Black-box
MI-FGSM | 0.5 | 0.25 | 83.46 | 44.23
MI-FGSM | 0.2 | 0.25 | 89.05 | 48.33
MI-FGSM | 0.2 | 0.5 | 90.50 | 51.39

(c) Ablations of CE loss and focal loss on RetinaNet.

Detector | Attacker | Loss | White-box | Black-box
RetinaNet | MI-FGSM | Focal loss | 69.90 | 43.07
RetinaNet | MI-FGSM | CE loss | 82.33 | 48.79
RetinaNet | MI-FGSM+ours | Focal loss | 85.67 | 54.31
RetinaNet | MI-FGSM+ours | CE loss | 88.58 | 56.82

5 Conclusion

In this paper, we propose an Object-Aware mechanism to balance the adversarial attack over multiple objects of different categories and sizes, and to perform transfer-based black-box attacks on detectors. Our method decouples the attack on each target in the image through the Object-Wise Gradient and weighs the attack for objects of different sizes with a Domain-Division Map. We also improve the design of the assigners and losses to attack more effectively. Extensive experiments on different models and attack strategies show that our method achieves substantial improvements in both white-box and black-box attack settings.


Acknowledgements. This work was supported in part by the National Key R&D Program of China (No. 2022ZD0118201), Natural Science Foundation of China (No. 61972217, 32071459, 62176249, 62006133, 62271465).

References 1. Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248 (2017) 2. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: Endto-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8 13 3. Chen, K., et al.: MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019) 4. Chen, P.C., Kung, B.H., Chen, J.C.: Class-aware robust adversarial training for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10420–10429 (2021) 5. Dong, Y., et al.: Benchmarking adversarial robustness on image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 321–331 (2020) 6. Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., Li, J.: Boosting adversarial attacks with momentum. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9185–9193 (2018) 7. Dong, Y., Pang, T., Su, H., Zhu, J.: Evading defenses to transferable adversarial examples by translation-invariant attacks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4312–4321 (2019) 8. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014) 9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 10. Huang, H., Wang, Y., Chen, Z., Tang, Z., Zhang, W., Ma, K.K.: RPattack: refined patch attack on general object detectors. In: IEEE International Conference on Multimedia and Expo, pp. 1–6. IEEE (2021) 11. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: Artificial Intelligence Safety and Security, pp. 99–112. Chapman and Hall/CRC (2018) 12. Li, H., et al.: TG-VQA: ternary game of video question answering. arXiv preprint arXiv:2305.10049 (2023) 13. Li, L.H., et al.: Grounded language-image pre-training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10965– 10975 (2022) 14. Lin, J., Song, C., He, K., Wang, L., Hopcroft, J.E.: Nesterov accelerated gradient and scale invariance for adversarial attacks. arXiv preprint arXiv:1908.06281 (2019) 15. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2980–2988 (2017)


16. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 17. Liu, S., et al.: Grounding dino: marrying dino with grounded pre-training for openset object detection. arXiv preprint arXiv:2303.05499 (2023) 18. Liu, S., et al.: Efficient universal shuffle attack for visual object tracking. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2739– 2743. IEEE (2022) 19. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0 2 20. Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016) 21. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017) 22. Naseer, M., Khan, S., Hayat, M., Khan, F.S., Porikli, F.: A self-supervised approach for adversarial robustness. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 262–271 (2020) 23. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519 (2017) 24. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015) 25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 26. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017) 27. Wang, X., He, K.: Enhancing the transferability of adversarial attacks through variance tuning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1924–1933 (2021) 28. Wu, S., Dai, T., Xia, S.T.: Dpattack: diffused patch attacks against universal object detection. arXiv preprint arXiv:2010.11679 (2020) 29. Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.: Adversarial examples for semantic segmentation and object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1369–1378 (2017) 30. Xie, C., et al.: Improving transferability of adversarial examples with input diversity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2730–2739 (2019)

HTNet: A Hybrid Model Boosted by Triple Self-attention for Crowd Counting Yang Li and Baoqun Yin(B) University of Science and Technology of China, Hefei, China [email protected], [email protected]

Abstract. The swift development of convolutional neural network (CNN) has enabled significant headway in crowd counting research. However, the fixed-size convolutional kernels of traditional methods make it difficult to handle problems such as drastic scale change and complex background interference. In this regard, we propose a hybrid crowd counting model to tackle existing challenges. Firstly, we leverage a global self-attention module (GAM) after CNN backbone to capture wider contextual information. Secondly, due to the gradual recovery of the feature map size in the decoding stage, the local self-attention module (LAM) is employed to reduce computational complexity. With this design, the model can fuse features from global and local perspectives to better cope with scale change. Additionally, to establish the interdependence between spatial and channel dimensions, we further design a novel channel selfattention module (CAM) and combine it with LAM. Finally, we construct a simple yet useful double head module that outputs a foreground segmentation map in addition to the intermediate density map, which are then multiplied together in a pixel-wise style to suppress background interference. The experimental results on several benchmark datasets demonstrate that our method achieves remarkable improvement.

Keywords: Crowd Counting · Deep Learning · Self-Attention · Hybrid Model

1 Introduction

Crowd counting is a typical computer vision task with numerous real-world applications across various domains, including public safety, event management, and regional planning. Along with its prevalence, some related domains, such as cell counting, traffic flow analysis, and wildlife monitoring, can also gain valuable reference and inspiration from it.
This work was supported in part by the National Natural Science Foundation of China under Grant 62133013 and in part by the Chinese Association for Artificial Intelligence (CAAI)-Huawei MindSpore Open Fund.


Recently, a mainstream process for crowd counting involves developing a network architecture that regresses a density map from an input image and obtains the corresponding number of people by integrating the entire density map. As a classic neural network architecture, CNN has been widely applied in computer vision tasks due to its simplicity and effectiveness. Consequently, most previous works on crowd counting chose CNN to build models [1–4]. However, CNN is designed based on fixed-size convolutional kernels, which lead to challenges for such counting models in dealing with two scenarios: 1) difficulty in adapting to drastic variations in crowd scale caused by perspective distortion; 2) confusion between dense crowds and complex backgrounds with similar textures. Especially for the latter, it is hard even for human experts to accurately distinguish foreground from background if only local image information is considered. Therefore, the key to solving the above problems lies in breaking through the limitations of the local receptive field and enabling the network to understand the input image from a global perspective.

The self-attention mechanism [5] is capable of modeling long-range dependencies and has been used to handle a variety of tasks, such as image classification [6,7], object detection [8], and semantic segmentation [9]. Currently, some crowd counting methods also employ the self-attention mechanism to construct network modules. Building upon these cutting-edge works, we further explore the specific application of the self-attention mechanism to address the problems mentioned before. Considering that CNN excels at capturing local details and that injecting convolutional inductive bias into the early stage of ViT can improve convergence speed and training stability [10], we propose a hybrid architecture that combines the strengths of CNN and self-attention mechanisms, which enables the model to comprehensively understand both the local and global information in crowd images. To capture the interdependence between spatial and channel dimensions, we specifically introduce a novel module based on the channel self-attention mechanism and integrate it with the spatial part. Additionally, we devise a simple but reliable double head module that outputs an intermediate density map and a head segmentation map, where the latter serves as a guide for the former to focus on the regions of interest. In short, the main contributions of this paper are summarized as follows:

1. We propose a hybrid crowd counting model boosted by global, local, and channel self-attention mechanisms, employed to acquire wider context and establish dependencies between spatial and channel dimensions, respectively.
2. Extensive experiments on four crowd datasets, namely ShanghaiTech A, UCF-QNRF, JHU-Crowd++, and NWPU-Crowd, are conducted to validate the solid advancements achieved by our method.

2 Related Work

The primary objective of crowd counting is to estimate the number of people in a given input image, while the density estimation task further provides additional information about the spatial distribution of the crowd, which is currently


gaining prominence in crowd counting research. The total number of people can be obtained by summing up all the density values in the map.

2.1 CNN in Crowd Counting

Thanks to the rapid development of CNNs, diverse network architectures have been proposed to address various problems in the field of crowd counting. MCNN [1] and Switch-CNN [11] employ branches with different kernel sizes to handle crowds at varying scales. CSRNet [2] and SPN [12] utilize dilated convolutions to enlarge the receptive field, capturing contextual information over a broader range. MBTTBF [3] and SASNet [4] leverage the hierarchical structure of the backbone network to acquire and fuse multi-scale information. To suppress background noise, ASNet [13] and CFANet [14] introduce auxiliary segmentation tasks through attention mechanisms. In addition, PGCNet [15] and PFDNet [16] integrate perspective analysis into their methods, expanding the research scope of crowd counting.

2.2 Self-Attention in Crowd Counting

Despite the notable achievements of CNN in processing vision tasks, their performance is intrinsically limited due to the difficulty in capturing long-range dependencies. In contrast, Transformers, based on self-attention mechanisms, excel in modeling global context information. As a result, researchers have migrated the self-attention mechanism to the domain of computer vision. For instance, DETR [8] uses a Transformer-based encoder-decoder structure for object detection. ViT [6] divides an image into a sequence of patches and feeds them as tokens into a standard Transformer encoder for image classification tasks. Swin Transformer [7] improves upon ViT [6] by introducing a shifted windowing scheme, which performs self-attention computation within local window regions and allows cross-window connection. SETR [9] replaces the CNN encoder with a Transformer in semantic segmentation task to carry out sequence-to-sequence prediction task. Analogously, researchers start to incorporate self-attention mechanisms into counting networks. CrowdFormer [17] develops an Overlap Patching Transformer Block to explore global dependencies. MAN [18] proposes a Learnable Region Attention to dynamically assign exclusive attention region for each feature position. CUT [19] adopts Twins-PCPVT [20] as the backbone and constructs a Segmentation As Attention Module to obtain fine-grained features with rich semantic information. Inspired by these seminal works, this paper further extends the application of self-attention mechanisms in the field of crowd counting. Briefly, we design triple self-attention modules from global, local, and channel perspectives to boost the accuracy of our counting network.

3 Proposed Method

In this section, we first introduce a hybrid model based on triple self-attention mechanisms, which consist of global, local and channel attention modules. Then


we present a simple double head module and corresponding loss function. The architecture of HTNet is shown in Fig. 1.

Fig. 1. The architecture of HTNet, which is made up of a CNN backbone, triple self-attention modules, and a simple but effective double head module.

To begin with, we utilize the first 13 layers of VGG-16BN [21] as the backbone to extract multi-level local features from the input image. Then a global self-attention module (GAM) follows these features to gain a larger range of context. During the decoding stage, as the feature map resolution is gradually recovered, a local self-attention module (LAM) is adopted to reduce the computational cost. Meanwhile, we also integrate a channel self-attention module (CAM) into the LAM to capture the interaction between spatial and channel dimensions. Finally, a double head module simultaneously outputs the intermediate density estimation and the foreground segmentation map, which are multiplied together to obtain the final crowd density map.

3.1 CNN-Based Backbone

According to [10], the introduction of convolutional inductive biases in the early stages can improve optimization stability and convergence speed. Thus, we utilize the first 13 layers of VGG-16BN [21] as the stem to extract local detail features. Two feature maps with sizes of 1/8 and 1/16 of the input are used to construct the FPN [22] structure, which allows better utilization of multi-scale information.
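As a rough illustration only, the stem could be sliced out of torchvision's VGG-16BN as below; the exact cut points are our own assumptions for obtaining the 1/8- and 1/16-resolution maps, not the authors' implementation (newer torchvision versions use the `weights=` argument instead of `pretrained=True`).

```python
# Sketch of the CNN stem: keep the VGG-16BN conv layers and tap two intermediate
# resolutions. The slicing indices below are hypothetical.
import torch.nn as nn
from torchvision.models import vgg16_bn

class VGGStem(nn.Module):
    def __init__(self):
        super().__init__()
        features = vgg16_bn(pretrained=True).features
        self.stage_1_8 = features[:33]     # assumed cut giving a 1/8-resolution map
        self.stage_1_16 = features[33:43]  # assumed cut giving a 1/16-resolution map

    def forward(self, x):
        f8 = self.stage_1_8(x)      # (B, C1, H/8, W/8)
        f16 = self.stage_1_16(f8)   # (B, C2, H/16, W/16)
        return f8, f16
```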


Fig. 2. The detail of the global, local, and channel self-attention modules: (a) GAM, (b) LAM, (c) CAM.

3.2 Global Self-attention Module

Analyzing input images from a global perspective is highly beneficial for the network to accommodate variations in head size, while a limited field of view may even make it challenging for human experts to accurately distinguish dense crowds from complex backgrounds. Hence, after extracting local features from the convolutional stem, we place a global self-attention module (GAM), which expands the receptive field from local to global, as depicted in Fig. 2(a). The GAM is composed of a standard multi-head self-attention (MSA) and a multi-layer perceptron (MLP). A conditional positional encoding scheme [23] is implemented using depth-wise separable convolutions with zero-padding. To reduce computational complexity, we adopt an average pooling layer with a stride of 2 to perform the patch embedding projection. Then the patches are flattened into a 2D sequence x and fed into the GAM, in which the calculation process can be formulated as follows:

\hat{z} = \mathrm{MSA}(\mathrm{LN}(x)) + x, \qquad z = \mathrm{MLP}(\mathrm{LN}(\hat{z})) + \hat{z}, \qquad (1)

where \hat{z} and z denote the output features of the MSA and MLP, respectively, and LN stands for Layer Norm.
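A minimal sketch of a GAM-style block implementing Eq. (1) is shown below (our own naming; the conditional positional encoding and the stride-2 average-pooling patch embedding described above are omitted here).

```python
# Sketch: pre-LN multi-head self-attention followed by a pre-LN MLP, with residuals.
import torch.nn as nn

class GlobalAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x):                                        # x: (B, N, C) patch sequence
        h = self.norm1(x)
        z_hat = self.attn(h, h, h, need_weights=False)[0] + x    # z^ = MSA(LN(x)) + x
        z = self.mlp(self.norm2(z_hat)) + z_hat                  # z  = MLP(LN(z^)) + z^
        return z
```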

3.3 Local and Channel Self-attention Module

As HTNet is based on the FPN architecture, we upsample the deep-level feature maps to restore resolution and add them to the shallow feature maps. The upsampling operation is accomplished through the Warp function introduced from FAM [24]. Then self-attention computation is performed on the fused feature maps. To establish dependencies between spatial and channel dimensions,


we construct spatial and channel self-attention modules separately, which are combined in series.

Local Self-attention Module. With the gradual restoration of the feature map resolution, we adopt a local self-attention module (LAM) to lower the computation cost, as shown in Fig. 2(b). Firstly, we utilize a Window Partition operation to divide the feature maps into several windows. Subsequently, a Masked MSA is applied to restrict the self-attention computation within each window. After that, we use convolutional layers and a Sigmoid function to build an attention gate, which can selectively emphasize informative features while suppressing less useful ones. Lastly, the sequence is reshaped back into a feature map through the Shape Reverse operation. In addition, we follow the Swin Transformer [7] and apply a Cyclic Shift to the feature maps before feeding them into the next Masked MSA layer, facilitating interactions between adjacent windows. It is noteworthy that the succeeding MLP excludes LN, as LN performs point-wise normalization and weakens the enhancement or suppression effect of the LAM on features.

Channel Self-attention Module. The structure of the channel self-attention module (CAM) is depicted in Fig. 2(c). The CAM first normalizes the feature maps along the channel dimension and then projects them into queries and keys independently. Before mapping the feature map into keys, we conduct dimension reduction to make the representations of the keys more compact and reduce computational complexity. Then, for a given query and all N keys, we obtain N dot-product similarities, which serve as input to a fully connected layer to generate an attention score. Finally, the original channel feature corresponding to the query is multiplied by this score, which plays a dynamic emphasis or suppression role. Notably, since the CAM operates on channel-wise feature maps, Instance Normalization (IN) is employed instead of the regular LN, and the mapping process is implemented using depth-wise separable convolutions.

3.4 Double Head Module

We construct a concise double head module (DHM) to mitigate background interference, which generates an intermediate density map and a foreground segmentation map. An element-wise product is then applied to these two maps to output the final density estimation map. The module structure is shown in Fig. 1. Each branch in the DHM independently comprises two consecutive 3 × 3 convolutional layers and a subsequent 1 × 1 convolutional layer. The last layer of the density regression branch employs ReLU as its activation function, whereas the segmentation branch uses a Sigmoid. The remaining activation functions within the DHM are GELU.
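A minimal re-implementation sketch of the DHM is given below (channel widths are assumptions): two parallel branches of two 3×3 convolutions and a 1×1 convolution, ReLU for the density output, Sigmoid for the segmentation output, GELU elsewhere, and an element-wise product to form the final density map.

```python
# Sketch of the double head module (our own minimal version, not the authors' code).
import torch.nn as nn

def _branch(in_ch, mid_ch, out_act):
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.GELU(),
        nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.GELU(),
        nn.Conv2d(mid_ch, 1, 1), out_act)

class DoubleHead(nn.Module):
    def __init__(self, in_ch=256, mid_ch=128):
        super().__init__()
        self.density_head = _branch(in_ch, mid_ch, nn.ReLU())
        self.segment_head = _branch(in_ch, mid_ch, nn.Sigmoid())

    def forward(self, feat):
        den = self.density_head(feat)   # intermediate density map
        seg = self.segment_head(feat)   # foreground probability map
        return den * seg, seg           # final density map and segmentation output
```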

3.5 Loss Function

Corresponding to the regression and segmentation tasks in the DHM, we implement a loss function comprising two components, each of which supervises one of the above-mentioned maps.

For the density regression head, we utilize the Euclidean loss to measure the pixel-wise prediction error. Let X_i be the i-th image in the training batch, D(X_i) the ground truth density map corresponding to X_i, and \hat{D}(X_i; \Theta) the crowd density map estimated by the model with parameters \Theta. The density map estimation loss L_{den} is given by:

L_{den} = \frac{1}{B} \sum_{i=1}^{B} \left\| \hat{D}(X_i; \Theta) - D(X_i) \right\|_2^2, \qquad (2)

where B is the size of the training batch.

With regard to the segmentation task, we employ the Focal Loss [25], denoted FL. Let M(X_i) denote the ground truth foreground-background mask for the input image X_i, and \hat{M}(X_i; \Theta) the classification probability map output by the network. FL and the segmentation loss can be written as:

FL(\hat{y}, y) = -y(1-\hat{y})^{\gamma} \log \hat{y} - (1-y)\hat{y}^{\gamma} \log(1-\hat{y}), \qquad (3)

L_{seg} = \frac{1}{B} \sum_{i=1}^{B} FL\left( \hat{M}(X_i; \Theta), M(X_i) \right), \qquad (4)

where y stands for the actual label (with a value of 1 for the foreground and 0 for the background), while \hat{y} is the predicted probability of the corresponding pixel belonging to the foreground. \gamma is a hyperparameter that is set to 2 in the experiments. The final loss function is a weighted combination of the above two terms:

L_{final} = L_{den} + \lambda L_{seg}, \qquad (5)

where \lambda is a hyperparameter to balance the contributions of L_{den} and L_{seg}.
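A compact PyTorch sketch of the objective in Eqs. (2)-(5) is shown below; the tensor names and the value of lambda are placeholders rather than the authors' settings, and the segmentation prediction is assumed to already be a probability map.

```python
# Sketch: density MSE (Eq. 2) + focal segmentation loss (Eqs. 3-4), combined as Eq. (5).
def focal_loss(pred, target, gamma=2.0, eps=1e-6):
    pred = pred.clamp(eps, 1.0 - eps)
    return -(target * (1 - pred) ** gamma * pred.log()
             + (1 - target) * pred ** gamma * (1 - pred).log()).mean()

def total_loss(den_pred, den_gt, seg_pred, seg_gt, lam=0.1):   # lam is a placeholder
    l_den = ((den_pred - den_gt) ** 2).flatten(1).sum(dim=1).mean()  # Eq. (2), per image
    l_seg = focal_loss(seg_pred, seg_gt)                             # Eqs. (3)-(4)
    return l_den + lam * l_seg                                       # Eq. (5)
```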

4 Experiments

In the experimental section, we first introduce the ground truth generation procedures and implementation details. Subsequently, we report the experimental results of HTNet on ShanghaiTech A [1], UCF-QNRF [26], JHU-Crowd++ [27], and NWPU-Crowd [28]. Finally, we conduct complete ablation studies to analyze the contributions of each proposed module.

4.1 Ground Truth Generation

The ground truth density map D(x) is generated by convolving the head annotation map with a normalized Gaussian kernel with a fixed standard deviation of 4. As for the segmentation task, given its auxiliary role in our method, we simply regard the non-zero regions in D(x) as foreground and the rest as background, thus generating the ground truth foreground-background mask.
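The procedure can be sketched as follows (array shapes and helper name are assumptions): place a unit impulse at every annotated head location, smooth with a Gaussian of sigma 4, and take the non-zero region of the density map as foreground.

```python
# Sketch of ground-truth generation for the density and segmentation targets.
import numpy as np
from scipy.ndimage import gaussian_filter

def make_targets(points, H, W, sigma=4.0):
    annot = np.zeros((H, W), dtype=np.float32)
    for x, y in points:                        # head annotations in pixel coordinates
        annot[int(y), int(x)] += 1.0
    density = gaussian_filter(annot, sigma)    # ground-truth density map D(x)
    mask = (density > 0).astype(np.float32)    # foreground-background mask for the seg head
    return density, mask
```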

4.2 Implementation Details

Our HTNet employs VGG-16BN [21] pretrained on ImageNet [29] as the backbone, and the remaining convolutional and attention layers are initialized using a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. During training, the crop size is 256 × 256 for all datasets. The model is optimized by AdamW [30] with a learning rate of 1e−5. Horizontal flipping, Gaussian noise, and color jitter are adopted for data augmentation.

4.3 Evaluations and Comparisons

To quantitatively evaluate HTNet, we conduct extensive experiments on four benchmark datasets and compare its performance with state-of-the-art methods. Mean Absolute Error (MAE) and Mean Square Error (MSE) are used as evaluation metrics. The experimental results are summarized in Tables 1 and 2, where the optimal performances are indicated by bold numbers. Table 1. Comparison with mainstream methods on ShanghaiTech A and UCF-QNRF.

Method | Venue | ShanghaiTech A MAE | ShanghaiTech A MSE | UCF-QNRF MAE | UCF-QNRF MSE
MCNN [1] | CVPR 16 | 110.2 | 173.2 | 277.0 | 426.0
CSRNet [2] | CVPR 18 | 68.2 | 115.0 | – | –
BL [31] | ICCV 19 | 62.8 | 101.8 | 88.7 | 154.8
DM-Count [32] | NIPS 20 | 59.7 | 95.7 | 85.6 | 148.3
ASNet [13] | CVPR 20 | 57.8 | 90.1 | 91.6 | 159.7
BM-Count [33] | IJCAI 21 | 57.3 | 90.7 | 81.2 | 138.6
S3 [34] | IJCAI 21 | 57.0 | 96.0 | 80.6 | 139.8
CLTR [35] | ECCV 22 | 56.9 | 95.2 | 85.8 | 141.3
ChfL [36] | CVPR 22 | 57.5 | 94.3 | 80.3 | 137.6
GauNet [37] | CVPR 22 | 54.8 | 89.1 | 81.6 | 153.7
MAN [18] | CVPR 22 | 56.8 | 90.3 | 77.3 | 131.5
HTNet | – | 52.5 | 87.9 | 77.1 | 137.4

ShanghaiTech Part A [1] is comprised of 482 images sourced from the internet, which are divided into 300 training images and 182 testing images. The dataset contains a total of 241,677 annotated points, with the number of annotations per image varying between 33 and 3139. We report the best performance achieved by HTNet on ShanghaiTech A in Table 1, which reduces the MAE and MSE of the second-ranked GauNet [37] from 54.8 to 52.5 and from 89.1 to 87.9.


UCF-QNRF [26] contains 1,535 high-resolution images with approximately 1.25 million annotated points. The dataset features an exceptional variety of crowd densities, with the lowest and highest counts being 49 and 12,865, respectively. The training set is composed of 1,201 images, while the remaining 334 images are reserved for testing. As shown in Table 1, we achieve the best MAE and the second best MSE on UCF-QNRF. Table 2. Comparison with mainstream methods on JHU-Crowd++ and NWPU-Crowd.

Method | Venue | JHU-Crowd++ MAE | JHU-Crowd++ MSE | NWPU-Crowd MAE | NWPU-Crowd MSE
MCNN [1] | CVPR 16 | 188.9 | 483.4 | 232.5 | 714.6
CSRNet [2] | CVPR 18 | 85.9 | 309.2 | 121.3 | 387.8
BL [31] | ICCV 19 | 75.0 | 299.9 | 105.4 | 454.2
DM-Count [32] | NIPS 20 | – | – | 88.6 | 388.6
BM-Count [33] | IJCAI 21 | 61.5 | 263 | 83.4 | 358.4
P2PNet [38] | ICCV 21 | – | – | 77.44 | 362
S3 [34] | IJCAI 21 | 59.4 | 244.0 | 83.5 | 346.9
CLTR [35] | ECCV 22 | 59.5 | 240.6 | 74.3 | 333.8
GauNet [37] | CVPR 22 | 58.2 | 245.1 | – | –
ChfL [36] | CVPR 22 | 57.0 | 235.7 | 76.8 | 343.0
HTNet | – | 61.6 | 261.5 | 69.3 | 295.0

JHU-Crowd++ [27] is a massive dataset with 4,372 images encompassing varied weather conditions and density levels. The head annotations per image range from 0 to 25,791. Among the images, 2,772 are designated for training, 500 form the validation set, and the remaining 1,600 are for testing. According to Table 2, the performance of HTNet on JHU-Crowd++ is slightly worse than that of the state-of-the-art methods, but still acceptable. NWPU-Crowd [28] consists of 5,109 high-resolution images with over 2.13 million annotated points, of which 3,109 are assigned to the training set, 500 are used for validation, and the remaining 1,500 form the testing set. From Table 2, it is evident that HTNet performs best in terms of MAE and MSE, with performance gains of 6.7% and 11.6%, respectively.

4.4 Ablation Studies

We perform comprehensive ablation experiments on ShanghaiTech A to validate the effectiveness of each proposed module, including GAM, LAM, CAM,


and DHM. The quantitative results are provided in Table 3. Row 2 represents a baseline model, which eliminates all the aforementioned modules. Rows 3 to 9 show the performance when different combinations of modules are included. Table 3. Ablation studies for each module in HTNet on ShanghaiTech A GAM LAM CAM DHM MAE MSE √ √

√ √

√ √



















√ √



√ √







63.32 104.05 57.51 91.40 56.93 90.89 55.34 91.90 55.39 93.06 55.17 95.78 55.56 93.38 53.42 92.17 53.94 93.36 52.47 87.90

Ablation for GAM. The removal of GAM leads to severe decrease in MSE, highlighting the critical role of global priors in guiding the model to comprehend the input image. Ablation for LAM. Eliminating LAM refers to replacing the W-MSA in Fig. 2(b) with an identity connection. In this case, both MAE and MSE show a certain degree of deterioration, suggesting that the local information also contributes to boosting the model. Ablation for CAM. When we ablate CAM, the remaining modules lose the ability to capture the spatial-channel dependency. Hence, the model exhibits suboptimal performance. Ablation for DHM. In the scenario where we discard the segmentation task of DHM, resulting in a plain density regression module, we observe the worst MAE. This indicates that relying solely on a density regression task is insufficient to leverage the extracted rich features.

5 Conclusion

In this paper, we propose a crowd counting model, HTNet, which consists of a CNN stem for capturing detailed information and three modules based on global,


local, and channel self-attention mechanisms respectively. With this design, the model can overcome the restriction of a limited receptive field, analyze the input image from a global perspective, and capture the interaction between spatial and channel dimensions. Additionally, we introduce a double head module to further alleviate confusion caused by complex backgrounds with textures similar to those of dense crowds. Experimental results on benchmark datasets demonstrate that our proposed approach achieves superior performance.

References 1. Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: CVPR, pp. 589–597 (2016) 2. Li, Y., Zhang, X., Chen, D.: CSRnet: dilated convolutional neural networks for understanding the highly congested scenes. In: CVPR, pp. 1091–1100 (2018) 3. Sindagi, V., Patel, V.: Multi-level bottom-top and top-bottom feature fusion for crowd counting. In: ICCV, pp. 1002–1012 (2019) 4. Song, Q., et al.: To choose or to fuse? Scale selection for crowd counting. In: AAAI, pp. 2576–2583 (2021) 5. Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017) 6. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR (2021) 7. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV, pp. 9992–10002 (2021) 8. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: Endto-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8 13 9. Zheng, S., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: CVPR, pp. 6877–6886 (2021) 10. Xiao, T., Singh, M., Mintun, E., Darrell, T., Dollar, P., Girshick, R.: Early convolutions help transformers see better. In: NeurIPS, pp. 30392–30400 (2021) 11. Sam, D.B., Surya, S., Babu, R.V.: Switching convolutional neural network for crowd counting. In: CVPR, pp. 4031–4039 (2017) 12. Chen, X., Bin, Y., Sang, N., Gao, C.: Scale pyramid network for crowd counting. In: WACV, pp. 1941–1950 (2019) 13. Jiang, X., et al.: Attention scaling for crowd counting. In: CVPR, pp. 4705–4714 (2020) 14. Rong, L., Li, C.: Coarse- and fine-grained attention network with backgroundaware loss for crowd density map estimation. In: WACV, pp. 3674–3683 (2021) 15. Yan, Z., et al.: Perspective-guided convolution networks for crowd counting. In: ICCV, pp. 952–961 (2019) 16. Yan, Z., Zhang, R., Zhang, H., Zhang, Q., Zuo, W.: Crowd counting via perspectiveguided fractional-dilation convolution. IEEE Trans. Multimedia, pp. 2633–2647 (2022) 17. Yang, S., Guo, W., Ren, Y.: Crowdformer: an overlap patching vision transformer for top-down crowd counting. In: IJCAI, pp. 1545–1551 (2022) 18. Lin, H., Ma, Z., Ji, R., Wang, Y., Hong, X.: Boosting crowd counting via multifaceted attention. In: CVPR, pp. 19596–19605 (2022)


19. Qian, Y., Zhang, L., Hong, X., Donovan, C., Arandjelovic, O.: Segmentation assisted u-shaped multi-scale transformer for crowd counting. In: BMVC (2022) 20. Chu, X., et al.: Twins: Revisiting the design of spatial attention in vision transformers. In: NeurIPS, pp. 9355–9366 (2021) 21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 22. Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, pp. 936–944 (2017) 23. Chu, X., et al.: Conditional positional encodings for vision transformers. arXiv preprint arXiv:2102.10882 (2021) 24. Li, X., et al.: Semantic flow for fast and accurate scene parsing. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 775–793. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8 45 25. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: ICCV, pp. 2999–3007 (2017) 26. Idrees, H., et al.: Composition loss for counting, density map estimation and localization in dense crowds. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 544–559. Springer, Cham (2018). https:// doi.org/10.1007/978-3-030-01216-8 33 27. Sindagi, V.A., Yasarla, R., Patel, V.M.: Jhu-crowd++: large-scale crowd counting dataset and a benchmark method. Technical report (2020) 28. Wang, Q., Gao, J., Lin, W., Li, X.: NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Trans. Pattern Anal. Mach. Intell. 2141– 2149 (2021) 29. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009) 30. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019) 31. Ma, Z., Wei, X., Hong, X., Gong, Y.: Bayesian loss for crowd count estimation with point supervision. In: ICCV, pp. 6141–6150 (2019) 32. Wang, B., Liu, H., Samaras, D., Nguyen, M.H.: Distribution matching for crowd counting. In: NeurIPS, pp. 1595–1607 (2020) 33. Liu, H., Zhao, Q., Ma, Y., Dai, F.: Bipartite matching for crowd counting with point supervision. In: IJCAI, pp. 860–866 (2021) 34. Lin, H., et al.: Direct measure matching for crowd counting. In: IJCAI, pp. 837–844 (2021) 35. Liang, D., Xu, W., Bai, X.: An end-to-end transformer model for crowd localization. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13661, pp. 38–54. Springer, Cham (2022). https://doi.org/10. 1007/978-3-031-19769-7 3 36. Shu, W., Wan, J., Tan, K.C., Kwong, S., Chan, A.B.: Crowd counting in the frequency domain. In: CVPR, pp. 19586–19595 (2022) 37. Cheng, Z.Q., Dai, Q., Li, H., Song, J., Wu, X., Hauptmann, A.G.: Rethinking spatial invariance of convolutional networks for object counting. In: CVPR, pp. 19606–19616 (2022) 38. Song, Q., et al.: Rethinking counting and localization in crowds: a purely pointbased framework. In: ICCV, pp. 3345–3354 (2021)

Reliable Boundary Samples-Based Proxy Pairs for Unsupervised Person Re-identification Chang Zou1 , Zeqi Chen2 , Yuehu Liu2(B) , and Chi Zhang2 1

School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China [email protected] 2 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China [email protected], [email protected], [email protected]

Abstract. Contrastive learning methods based on the memory bank have shown promising results for unsupervised person re-identification. However, most methods maintain a uni-proxy for each cluster in the memory that only describes the average information but cannot represent the intra-class variation. As a result, contrastive learning based on the uni-proxy cannot effectively guide the model to reduce the variation. To address this issue, we maintain a proxy pair for each cluster updated by the least similar boundary sample pair, since they concretely reveal the intra-class variation of the cluster. Through the contrastive loss, the proxy updated based on one boundary sample generates strong pulls to the other one and its surrounding samples due to the low similarity, and the two proxies collaboratively form strong bidirectional pulls to effectively reduce intra-class variance. To mitigate the impact of boundary samples being noisy, we further propose local-global consistency guided label refinement, which utilizes local fine-grained cues to select reliable samples with high overlap in the local and global feature neighborhoods. Comprehensive experiments on Market-1501 and MSMT17 demonstrate that the proposed method surpasses state-of-the-art approaches. Keywords: Person re-identification · Contrastive learning · Proxy pairs

1 Introduction

Unsupervised person re-identification (Re-ID) searches for images of a target person across non-overlapping cameras without annotations [23]. Currently, most state-of-the-art methods [1,6,16,17] utilize a memory bank to maintain cluster proxies according to the pseudo labels generated by clustering algorithms, then employ contrastive learning [12] based on the proxies to train the model. The
This work is supported by the Key R&D Plan of Shaanxi Province (Program No. 2023-YBGY-029).


contrastive loss is designed to pull a sample closer to the proxy of its cluster while pushing it away from proxies of other clusters. The effectiveness of the loss heavily relies on the quality of the proxies. Therefore, several works have been conducted to study this issue. SpCL [8] maintains instance features in the memory and uses the cluster centroids as proxies. CCL [6] directly stores clusterlevel proxies to keep the updating consistency. HCM [16] integrates both cluster features and instance features as proxies. The commonality of these methods is using a uni-proxy as the cluster-level representation for each cluster. However, due to various factors such as pose, viewpoint, illumination, occlusion, etc., large intra-class variation usually exists. The conventional uni-proxy only describes the average information and fails to capture the intra-class variation. Consequently, as shown in Fig. 1a, a uni-proxy can only generate a unidirectional pull to the samples in the cluster through contrastive loss, but not a targeted pull that brings samples with large distances closer to each other and therefore cannot effectively reduce intra-class variation. There are also some methods [1,5,17] maintaining multiple camera-aware proxies at the cluster level to reduce intra-class variation caused by changing viewpoints, but they rely on camera labels and cannot mitigate the variation caused by other factors. To solve the problem, we propose a

Fig. 1. An illustration showing that a uni-proxy only generates weak unidirectional pulls to samples, while discrepant proxy pairs based on boundary samples generate strong bidirectional pulls to make the samples close to each other.

contrastive learning method that maintains a proxy pair for each cluster based on reliable boundary samples. The boundary sample pair with the lowest similarity reveals the current largest intra-class variation, thus a proxy pair based on them can accurately represent the existing variance. Moreover, each proxy can produce a strong pull on samples around the other boundary sample due to their low similarity, thus strong bidirectional pulls are collaboratively formed to decrease intra-class variation like Fig. 1b. Nevertheless, considering that the generated pseudo labels are not all correct while boundary samples are most likely to be noisy and to significantly degrade performance [5], we propose to select reliable samples in clusters based on the neighborhood consistency of local


and global features. Our key idea is that the label noise caused by global appearance similarity can be corrected by the fine-grained visual cues included in the local features. We calculate the consistency of a sample by the overlap between the k-nearest neighbors of its local feature and the samples sharing its global label. Based on the consistency of each sample, we introduce a dynamically changing threshold to filter reliable samples with high consistency, which effectively mitigates the impact of label noise. The main contributions of our work are as follows:

- We propose boundary samples-based proxy pairs that can explicitly represent the intra-class variation and effectively reduce the variation through the generated strong bidirectional pulls in a cluster.
- We propose a local-global consistency guided pseudo label refinement method, which utilizes the finer-grained information of local features to filter reliable samples with high-confidence global pseudo labels.

2 Related Work

Unsupervised Person Re-identification. Current unsupervised approaches can be generally categorized into unsupervised domain adaptive (UDA) methods [7,19] and purely unsupervised learning (USL) methods [1,5,6]. UDA methods transfer knowledge from labeled source datasets to unlabeled target datasets. USL methods directly learn features on unlabeled target datasets. Our method belongs to the more challenging USL setting. Recently, significant progress has been made in clustering-based USL methods. SpCL [8] gradually selects reliable clusters and proposes a unified contrastive loss based on the cluster centroids and outliers. CCL [6] introduces a novel cluster contrastive learning framework that stores and updates a cluster-level proxy. HCPDP [4] proposes a hybrid dynamic local-global clustering contrast and probability distillation framework. However, these methods maintain only a uni-proxy at the cluster level and fail to represent intra-class variation. Therefore, CAP [17] forms multiple camera-aware proxies within a cluster to alleviate the camera domain gaps. MRCN [19] utilizes multiple instance proxies to mitigate the influence of noisy samples. Unlike these methods, we propose to maintain a discrepant proxy pair that explicitly represents the intra-class variation based on reliable boundary samples and reduce the variation by strong bidirectional pulls. Noise Reduction of Pseudo Labels. Recently, reducing label noise has become a research focus in unsupervised person Re-ID. MMT [7] utilizes mutual learning by leveraging predictions from a mean teacher network to refine pseudo labels. RLCC [21] estimates pseudo-label similarity over consecutive training generations and proposes temporal propagation and ensemble. DCCT [6] proposes parameter-differentiated dual clustering to select consistent samples. PPLR [5] considers the complementary relationship between global and local features to generate soft pseudo-labels. SECRET [10] utilizes the consistency


between local and global label spaces for reliable sample selection. In contrast to these methods, we utilize fine-grained clues from local features to exclude unreliable samples in the global label space based on the consistency between local and global feature neighborhoods.

Fig. 2. The illustration of our framework. In the clustering step, we calculate local-global consistency to select reliable samples. In the training step, we update the proxy pair of each identity based on the two least similar boundary samples.

3 Method

3.1 Overview

Let D = \{x_i\}_{i=1}^{N_D} denote the unlabeled dataset, where x_i is the i-th image and N_D is the number of images. F_m = \{F_i\}_{i=1}^{N_D} denotes the feature maps extracted by the encoder f_\theta, and F_v = \{f_i\}_{i=1}^{N_D} denotes the features obtained by pooling the feature maps. f_q denotes the feature vector of a query instance. Currently, most USL methods [1,6,8] adopt a two-step alternating strategy of clustering and training. They first obtain a labeled dataset D' = \{(x_i, y_i)\}_{i=1}^{N'_D} through DBSCAN clustering [15], where y_i \in \{1, 2, \ldots, c\} denotes the pseudo label of the i-th image, c denotes the number of clusters, and N'_D is the number of images after discarding outliers. Then the memory mechanism and contrastive learning are applied to D'. Most methods [3,6] initialize the memory bank M with the average feature of the samples in each cluster as its cluster-level uni-proxy, which plays the role of a non-parametric classifier in the InfoNCE loss function [12]. The InfoNCE loss can be summarized as follows:

L = -\log \frac{\exp(f_q \cdot p_{+} / \tau)}{\sum_{i=1}^{c} \exp(f_q \cdot p_i / \tau)}, \qquad (1)


where f_q is the query instance feature, p_i is the proxy of the i-th cluster in the memory bank M, p_{+} is the proxy of the cluster to which the query belongs, \tau is a temperature factor, and c is the number of clusters. Since both f_q and p_i are L2-normalized, the cosine similarity f_q \cdot p_i is used as their similarity score. During back-propagation, the memory bank is updated as follows:

p_{+} \leftarrow \mu \cdot p_{+} + (1 - \mu) \cdot q_{mean}, \qquad (2)

where \mu denotes a momentum factor and q_{mean} is the average feature of the instances in the mini-batch sharing the same pseudo label as p_{+}. We set the above cluster uni-proxy-based contrastive learning framework as our baseline. As shown in Fig. 2, this paper proposes a novel contrastive learning framework based on proxy pairs obtained from reliable boundary samples, which overcomes the limitation of the uni-proxy failing to represent intra-class variation. To ensure the reliability of the boundary samples, we propose local-global consistency guided label refinement, which utilizes local fine-grained information to filter samples with reliable global pseudo labels.
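A minimal PyTorch sketch of this uni-proxy baseline (Eqs. (1)-(2)) is given below; the function names and the momentum value are our own assumptions.

```python
# Sketch: cluster-memory InfoNCE loss and momentum proxy update of the baseline.
import torch
import torch.nn.functional as F

def infonce_loss(f_q, memory, labels, tau=0.05):
    # f_q: (B, d) L2-normalized query features; memory: (c, d) proxies; labels: (B,)
    logits = f_q @ memory.t() / tau          # cosine similarities scaled by 1/tau
    return F.cross_entropy(logits, labels)   # equivalent to Eq. (1)

@torch.no_grad()
def update_memory(memory, f_q, labels, mu=0.9):   # mu is an assumed momentum value
    for y in labels.unique():
        q_mean = f_q[labels == y].mean(dim=0)
        memory[y] = mu * memory[y] + (1 - mu) * q_mean   # Eq. (2)
        memory[y] = F.normalize(memory[y], dim=0)
```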

3.2 Local-Global Consistency Guided Label Refinement (LGCLR)

The global features of different persons often show high similarity due to being captured under the same camera view or wearing similar clothes. In this case, local visual cues such as bags, patterns on clothes, and shoes are the key factors for distinguishing similar persons. Therefore, we propose local-global consistency guided label refinement, which defines the overlap between the k-nearest neighbors of a local feature and the samples sharing its global label as local-global consistency, and selects samples with high consistency. We extract the feature map F_i \in \mathbb{R}^{C \times H \times W} of an image x_i with the encoder f_\theta, where C, H, W denote the sizes of the channel, height, and width. Then the global feature f_i^{g} is obtained by Generalized-Mean (GEM) pooling [14] over F_i, while two local features f_i^{up}, f_i^{dw} are obtained by dividing the feature map horizontally into two uniform regions \mathbb{R}^{C \times \frac{H}{2} \times W}. We calculate the k-reciprocal Jaccard distance [24] matrices D^{g}, D^{up}, and D^{dw} of the training set based on the global and local feature sets f^{g}, f^{up}, and f^{dw}, respectively, where k is set to 30 and the re-ranking technique is applied. Then the pseudo labels are generated by applying DBSCAN on the global distance matrix D^{g} only. For a global feature f_i^{g} of the sample x_i, we obtain the cluster it belongs to and its global label y_i. We denote the sample set of the cluster as I(f_i^{g}) and the number of its samples as n_i. The top-n_i nearest neighbors of x_i according to the distance matrices D^{up} and D^{dw} are denoted as I(f_i^{up}) and I(f_i^{dw}). We propose the following metrics to measure the consistency of the global and the two local feature neighborhoods:

C_{up}(f_i^{g}) = \frac{|I(f_i^{g}) \cap I(f_i^{up})|}{|I(f_i^{g})|} \in [0, 1], \qquad C_{dw}(f_i^{g}) = \frac{|I(f_i^{g}) \cap I(f_i^{dw})|}{|I(f_i^{g})|} \in [0, 1], \qquad (3)


where |\cdot| denotes the number of samples in the set. Larger C_{up}(f_i^{g}) and C_{dw}(f_i^{g}) indicate higher consistency and reliability for f_i^{g}, i.e., even from a local perspective that contains more detailed information, samples with the same global pseudo label are still most likely to be clustered into the same category. We calculate the final consistency C(f_i^{g}) as the mean of C_{up}(f_i^{g}) and C_{dw}(f_i^{g}). After obtaining the local-global consistency of each sample, we introduce a threshold \alpha \in [0, 1] to select reliable samples. Specifically, we retain samples with consistency C > \alpha, while the remaining samples are regarded as outliers. The consistency of the samples increases with training, so more and more samples are considered reliable and used to enhance feature learning.
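The consistency measure of Eq. (3) and the threshold-based selection can be sketched as follows; the helper name is ours, and the distance matrices are assumed to be the precomputed k-reciprocal Jaccard distances described above.

```python
# Sketch: local-global consistency per sample, then threshold filtering.
import numpy as np

def local_global_consistency(labels, dist_up, dist_dw):
    # labels: (N,) global pseudo labels (-1 for outliers); dist_*: (N, N) local distances
    N = len(labels)
    C = np.zeros(N)
    for i in range(N):
        if labels[i] < 0:                               # DBSCAN outliers are skipped
            continue
        cluster = np.where(labels == labels[i])[0]      # I(f_i^g)
        n_i = len(cluster)
        nn_up = np.argsort(dist_up[i])[:n_i]            # I(f_i^up)
        nn_dw = np.argsort(dist_dw[i])[:n_i]            # I(f_i^dw)
        c_up = len(np.intersect1d(cluster, nn_up)) / n_i
        c_dw = len(np.intersect1d(cluster, nn_dw)) / n_i
        C[i] = 0.5 * (c_up + c_dw)
    return C

# Samples with C > alpha keep their pseudo label; the rest are treated as outliers.
```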

3.3 Reliable Boundary Samples Based Proxy Pairs (RBSPP)

Considering that the uni-proxy merely represents the average information within a cluster and fails to capture the intra-class variation, whereas the least similar boundary samples accurately reflect the largest intra-class variance, we form discrepant proxy pairs based on reliable boundary samples after LGCLR to effectively represent and reduce the intra-class variance.

Memory Initialization. Assuming that c' clusters remain after LGCLR, we initialize the proxy pair (p_j^{1}, p_j^{2}) of the j-th cluster with the cluster centroid c_j at the beginning of each epoch:

c_j = \frac{1}{|H_j|} \sum_{f_i^{g} \in H_j} f_i^{g}, \qquad (4)

where H_j denotes the set of global features in the j-th cluster. Note that part features are only used for label refinement. Hence, the memory bank M has a size of \mathbb{R}^{2c' \times d}, with d representing the feature dimension.

Memory Update. Following previous works, we randomly sample K instances for each of P person identities to form a mini-batch. For the i-th identity, we select the boundary sample pair (f_b^{g_1}, f_b^{g_2}) with the lowest cosine similarity among the K instances. Considering that using only the boundary sample pair to update the proxies ignores the information of the other samples in the cluster, which may lead to biased training, we fuse each boundary sample with the average feature of the remaining K − 2 samples before updating:

f_b^{w_t} = \beta f_b^{g_t} + (1 - \beta) \frac{1}{K-2} \sum_{j=1}^{K-2} f_j^{g}, \qquad (5)

  where t = {1, 2} and β is the fusion weight. Then fbw1 , fbw2 are used to update   the proxy pair p1i , p2i of the i -th identity. To avoid the drastic changes in optimization direction caused by significant updates of proxies, we update each proxy as Eq. 2 by replacing qmean with corresponding fbwt with higher similarity to the proxy.

308

C. Zou et al.

Contrastive Loss. With the proxy pair of each cluster, we form the contrastive loss for any query instance as follows:   2 exp fqg · p+ 1 j /τ  , log   LRBSP P = − (6) c 2 j=1 exp fqg · pj /τ i=1

i

where pji is the j -th proxy of the i -th cluster. p+ j shares the same label with the global query feature fqg and is the j -th proxy for that cluster. Updating a proxy based on a boundary sample brings a decrease in the similarity between a proxy and the samples around another boundary sample. Therefore, the contrastive loss can produce larger gradients, resulting in a stronger bidirectional pull to effectively reduce intra-class variation.

4 4.1

Experiments Datasets and Evaluation Protocols

The evaluations are conducted on the Market-1501 [22] dataset and the MSMT17 [18] dataset. Market-1501 is gathered from the Tsinghua University campus using 6 cameras and has 32,668 images belonging to 1,501 unique person identities. A training set of 12,936 images from 751 identities and a test set of 19,732 images from 750 identities make up the dataset. MSMT17 is a larger, more difficult dataset with 126,441 images and 4,101 person identities captured by 15 cameras. The MSMT17 training set comprises 32,621 images from 1,041 identities, while the test set comprises 93,820 images from 3,060 identities. In our experiments, we assessed performance using the top-1, top-5, and top-10 accuracies of the Cumulative Matching Characteristics (CMC), as well as the mean Average Precision (mAP). 4.2

Implementation Details

We adopt ResNet50 [9] as the backbone. All layers after layer-4 are removed and the GEM pooling [14] layer is added. The input image size is 320×128. The maximum distance in DBSCAN is set to 0.45 for Market-1501 and 0.7 for MSMT17. The consistency threshold α is set to the value that ranks the 1% position. Considering that label noise is most severe during the early training stage, we only perform label refinement in the first 20 epochs to ensure an adequate number of training samples. The weight β in Eq. 5 is set to 0.7 and 0.6 for Market-1501 and MSMT17, respectively. A mini-batch has 256 samples comprising 16 IDs and 16 images for each ID. The temperature hyper-parameter τ in the contrastive loss (Eq. 6) is set to 0.05. We use an Adam optimizer with weight decay of 5 ×10−4 . The initial learning rate is set to 3.5 ×10−5 and divided by 10 every 20 epochs. We train the model for 60 epochs on both datasets.

Reliable Boundary Samples-Based Proxy Pairs

4.3

309

Comparison with State-of-the-Arts

We initially compare RBSPP to state-of-the-art methods on Market-1501 and MSMT17. From Table 1, We can see that our approach performs substantially better than the prior approaches without any labels. We achieve mAP/top-1 of 85.4%/94.0% and 38.6%/67.3% on Market-1501 and MSMT17. The mAP of our method substantially surpasses the state-of-the-art method CCL [6] by 1.3% and 5.0% on Market-1501 and MSMT17. In comparison to the methods using camera labels, our method without any camera information outperforms the mAP of IICS [20], CAP [17], ICE [1], and PPLR [5] on Market1501, and surpasses the mAP of IICS [20] and CAP [17] on MSMT17. Moreover, under the supervised condition, our method greatly exceeds the mAP/top-1 of ICE [1] by 11.1%/6.8% on MSMT17, which shows our potential on large datasets. It is also worth noting that our method demonstrates comparable performance to the widely recognized supervised method ADBNet [2]. Table 1. Comparison with state-of-the-art methods on Market-1501 and MSMT17. Bold is used to mark the best results of unsupervised methods without any labels. Method

Market-1501

MSMT17

mAP

top-1

top-5

top-10

mAP

top-1

top-5

top-10

Unsupervised methods with camera labels IICS [20]

CVPR’21

72.9

89.5

95.2

97.0

26.9

56.4

68.8

73.4

AP [17]

AAAI’21

79.2

91.4

96.3

97.7

36.9

67.4

78.0

81.4

ICE [1]

ICCV’21

82.3

93.8

97.6

98.4

38.9

70.2

80.5

84.4

PPLR [5]

CVPR’22

84.4

94.3

97.8

98.6

42.2

73.3

83.5

86.5

Unsupervised methods without any labels SpCL [8]

NeurIPS’20

73.1

88.1

95.1

97.0

19.1

42.3

55.6

61.2

ICE [1]

ICCV’21

79.5

92.0

97.0

98.1

29.8

59.0

71.7

77.0

MCRN [19]

AAAI’22

80.8

92.5

-

-

31.2

63.6

-

-

SECRET [10]

AAAI’22

81.0

92.6

-

-

31.3

60.4

-

-

PPLR [5]

CVPR’22

81.5

92.8

97.1

98.1

31.4

61.1

73.4

77.8

HDCPD [4]

TIP’22

81.7

92.4

97.4

98.1

24.6

50.2

61.4

65.7

CCL [6]

ACCV’22

84.2

93.4

97.6

98.3

33.6

63.3

73.3

78.0

MPC [11]

CVIU’23

79.9

91.1

96.1

97.4

-

-

-

-

RMCL [13]

KBS’23

81.7

93.0

97.6

98.4

32.5

62.3

73.6

78.0

RBSPP

This paper

85.4

94.0

97.6

98.3

38.6

67.3

77.2

81.2

Supervised methods ABD-Net [2]

ICCV’19

88.3

95.6

-

-

60.8

82.3

90.6

-

ICE (w/ ground truth) [1]

ICCV’21

86.6

95.1

98.3

98.9

50.4

76.4

86.6

90.0

RBSPP (w/ ground truth)

This paper

88.9

95.5

98.4

99.0

61.5

83.2

91.5

93.7

4.4

Ablation Study

To evaluate the effectiveness of each proposed component, we undertake extensive experiments on Market-1501 and MSMT17 in this subsection. We define the framework introduced in Sect. 3.1 that uses the uni-proxy as our baseline.

310

C. Zou et al. Table 2. Ablation studies on proposed components of our method. Method

Market-1501 MSMT17

Baseline Baseline+LGCLR Baseline+BSPP RBSPP

81.1 83.1 84.7 85.4

mAP top-1 mAP top-1 92.5 93.0 93.5 94.0

35.4 35.9 36.9 38.6

65.6 66.4 65.8 67.3

Table 3. Comparison with different strategies for LGCLR and RBSPP. Market-1501 mAP top-1 RBSPP 85.4 94.0 84.7 93.4 RBSPP-up RBSPP-down 84.2 93.0 82.5 92.0 RBSPP-sep Method

MSMT17 mAP top-1 38.6 67.3 37.0 66.1 35.6 63.8 37.7 66.2

(a) Alternative definitions of consistency

Market-1501 mAP top-1 RBSPP 85.4 94.0 81.1 92.5 Baseline RBSPP-BF 85.0 93.8 RBSPP-RF 82.5 92.6 Method

MSMT17 mAP top-1 38.6 67.3 35.4 65.6 27.4 52.7 35.9 66.6

(b) Different features for proxy updating

Effectiveness of LGCLR. We add local-global consistency guided label refinement (LGCLR) to the baseline and BSPP (i.e. RBSPP without label refinement), respectively. As shown in Table 2, LGCLR boosts the mAP/top-1 of baseline by 2.0%/0.5% and 0.5%/0.8% on Market-1501 and MSMT17. In addition, compared to BSPP, RBSPP (BSPP + LGCLR) improves mAP/top-1 by 0.7%/0.5% and 1.7%/1.5% on the two datasets, respectively. This demonstrates that the proposed LGCLR can effectively mitigate the impact of label noise in the early training stage by utilizing local detail cues as complementation for global information to select reliable samples. We also explore alternative consistency definitions to validate our assumption on it. In Table 3(a), RBSPP-up/RBSPPdown regards the consistency of the up/down part as the consistency of a sample. RBSPP-sep denotes that two thresholds are set for up-global and down-global consistency separately. According to the experimental results, RBSPP aggregating the two local-global consistency achieves the optimum, presumably because an individual part loses information contained in the other part and instead easily confuses two samples that are not similar from the global view. Effectiveness of BSPP. As shown in Table 2, the proposed boundary samplesbased proxy pairs (BSPP) improve mAP/top-1 by 3.6%/1.0% and 1.5%/0.2% on Market-1501 and MSMT17 compared to the uni-proxy-based baseline. Moreover, the RBSPP including LGCLR significantly outperforms the baseline by 4.3%/1.5% and 3.2%/1.7% on the two datasets, which shows the effectiveness of BSPP. We also study different strategies to get the proxy pair in Table 3(b). RBSPP-BF directly updates the proxy pair by the boundary sample pair without fusing the remaining features. RBSPP-RF updates the proxy with a ran-

Reliable Boundary Samples-Based Proxy Pairs

311

domly selected sample pair. We can see that RBSPP outperforms RBSPP-BF on both datasets by 0.4% and 9.2% in mAP, which demonstrates that ignoring the remaining samples does result in biased training, especially on MSMT17. Compared to RBSPP-RF, RBSPP improved mAP/top-1 by 3.1%/1.4% and 2.7%/0.7% on the two datasets, which proves the boundary samples and remaining samples are both important for cluster representation.

Fig. 3. Analysis of threshold proportion.

Fig. 4. Analysis of fusion weight β.

The Consistency Threshold Proportion for LGCLR. Considering that the consistency of samples increases as training and a fixed threshold is not appropriate, we first calculate the consistency of each sample and average it within each cluster. Then the mean values are ranked and we identify the value at a certain proportion position as the threshold. We analyze the effect of different proportions on LGCLR. As shown in Fig. 3, our method is sensitive to proportion. A smaller proportion causes the filtered samples still contain a large number of noisy samples and leads to performance degradation, while a larger one causes many correctly labeled samples to be also regarded as outliers, resulting in fewer data available for training. Therefore, we set the scale value to 1.0%. The Fusion Weight of Boundary Samples for BSPP. Fig. 4 shows the effect of the fusion weight β on BSPP. We can see that β = 0.6 and β = 0.7 achieve optimal accuracy for Market-1501 and MSMT17. We speculate that larger β may lead to noisy samples dominating the updating of the proxies and missing the information contained in other reliable samples, which further causes the wrong training direction. In contrast, smaller β allows the mean feature of internal similar samples to play a leading role and decrease the discrepancy of the proxy pair. Consequently, the proxy pair cannot accurately represent the intra-class variation and cannot generate strong bidirectional pulls to reduce the variation.

312

4.5

C. Zou et al.

Clustering Quality

As shown in Fig. 5, the intra-variation of all classes is significantly reduced in our method compared to the baseline. For several classes difficult to distinguish or already mixed in the baseline, our method succeeds in separating them, which demonstrates that our method learns more discriminative features.

Fig. 5. The t-SNE visualization of 20 random classes in Market-1501 between baseline and our method. Different colors represent different IDs.

5

Conclusion

In this paper, we propose a contrastive learning framework maintaining discrepant proxy pairs for each cluster based on reliable boundary samples, which accurately represent the intra-class variation and generate strong bidirectional pulls to effectively reduce intra-class variance. To ensure the reliability of boundary samples, we also propose local-global consistency guided label refinement that utilizes local visual cues to exclude unreliable noisy samples caused by global similarity. Comprehensive experiments have shown that our framework outperforms prior state-of-art methods on Market-1501 and MSMT17 datasets. In further work, we will investigate using inter-class relationships to further improve cluster representation.

Reliable Boundary Samples-Based Proxy Pairs

313

References 1. Chen, H., Lagadec, B., Bremond, F.: Ice: Inter-instance contrastive encoding for unsupervised person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14960–14969 (2021) 2. Chen, T., et al.: Abd-net: attentive but diverse person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8351–8361 (2019) 3. Chen, Z., Cui, Z., Zhang, C., Zhou, J., Liu, Y.: Dual clustering co-teaching with consistent sample mining for unsupervised person re-identification. IEEE Trans. Circuits Syst. Video Technol. (2023) 4. Cheng, D., Zhou, J., Wang, N., Gao, X.: Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE Trans. Image Process. 31, 3334– 3346 (2022). https://doi.org/10.1109/TIP.2022.3169693 5. Cho, Y., Kim, W.J., Hong, S., Yoon, S.E.: Part-based pseudo label refinement for unsupervised person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7308–7318 (2022) 6. Dai, Z., Wang, G., Yuan, W., Zhu, S., Tan, P.: Cluster contrast for unsupervised person re-identification, pp. 1142–1160 (2022) 7. Ge, Y., Chen, D., Li, H.: Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv preprint arXiv:2001.01526 (2020) 8. Ge, Y., Zhu, F., Chen, D., Zhao, R.: Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural. Inf. Process. Syst. 33, 11309–11321 (2020) 9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 10. He, T., Shen, L., Guo, Y., Ding, G., Guo, Z.: Secret: Self-consistent pseudo label refinement for unsupervised domain adaptive person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 879–887 (2022) 11. Li, X., Li, Q., Liang, F., Wang, W.: Multi-granularity pseudo-label collaboration for unsupervised person re-identification. Comput. Vis. Image Underst. 227, 103616, Jananuary 2023. https://doi.org/10.1016/j.cviu.2022.103616 12. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018) 13. Pang, Z., Wang, C., Wang, J., Zhao, L.: Reliability modeling and contrastive learning for unsupervised person re-identification. Knowl.-Based Syst. 263, 110263 (2023). https://doi.org/10.1016/j.knosys.2023.110263 14. Radenovi´c, F., Tolias, G., Chum, O.: Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 41(7), 1655–1668 (2018) 15. Ram, A., Jalal, S., Jalal, A.S., Kumar, M.: A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. 3(6), 1–4 (2010) 16. Si, T., He, F., Zhang, Z., Duan, Y.: Hybrid contrastive learning for unsupervised person re-identification. IEEE Trans. Multimed., 1 (2022). https://doi.org/ 10.1109/TMM.2022.3174414 17. Wang, M., Lai, B., Huang, J., Gong, X., Hua, X.S.: Camera-aware proxies for unsupervised person re-identification. In: AAAI, vol. 2, p. 4 (2021)

314

C. Zou et al.

18. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer gan to bridge domain gap for person re-identification. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 79–88. IEEE, Salt Lake City, UT, USA, June 2018. https://doi.org/10.1109/CVPR.2018.00016 19. Wu, Y., Huang, T., Yao, H., Zhang, C., Shao, Y., Han, C., Gao, C., Sang, N.: Multi-centroid representation network for domain adaptive person re-id. AAAI 36(3), 2750–2758, June 2022. https://doi.org/10.1609/aaai.v36i3.20178 20. Xuan, S., Zhang, S.: Intra-inter camera similarity for unsupervised person reidentification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11926–11935 (2021) 21. Zhang, X., Ge, Y., Qiao, Y., Li, H.: Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3436– 3445 (2021) 22. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person reidentification: a benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1116–1124 (2015) 23. Zheng, L., Yang, Y., Hauptmann, A.G.: Person re-identification: past, present and future. arXiv preprint arXiv:1610.02984 (2016) 24. Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with kreciprocal encoding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1318–1327 (2017)

High-Resolution Feature Representation Driven Infrared Small-Dim Object Detection Yuhang Dong1 , Yingying Wang2 , Linyu Fan1 , Xinghao Ding1,2(B) , and Yue Huang1,2 1

2

School of Informatics, Xiamen University, Xiamen 361001, China Institute of Artificial Intelligence, Xiamen University, Xiamen 361001, China [email protected]

Abstract. Infrared small-dim object detection is a challenging task due to the small size, weak features, lack of prominent structural information, and vulnerability to background interference. During the process of deep learning-based feature extraction, as the number of layers increases, the size of the feature map decreases, resulting in a reduction in resolution for small object features. This reduction negatively affects the network’s ability to capture fine-grained details and compromises the detection efficiency. Besides, the infrared objects can be easily overwhelmed by strong background interference, which further diminishes the original faint representations. To solve these issues, we proposes a high-resolution feature representation driven network for infrared small-dim object detection (HRFRD-Net). This network comprises three key components: HighResolution Feature Representation Branch (HRFR), Infrared Small-Dim Object Detection Branch (ISDOD), and Spatial-Frequency Interaction Feature Enhancement Module (SFIFE). The HRFR branch employs implicit neural representation to super-resolve the infrared small objects in a self-supervised learning scheme. To effectively detect the small-scale objects, ISDOD leverages the shared encoder from HRFR to construct high-resolution and high-quality representation of infrared small objects in a resolution-free manner. To address the issue of dim objects, SFIFE incorporates a global-local mixed receptive field via the features interaction in spatial-frequency dual domains, which significantly improves the accuracy of infrared dim object detection. Experiments conducted on the MSISTD and MDvsFA datasets demonstrate the effectiveness of our approach, especially in complex scenarios where the objects are heavily obscured by the background and background interference closely resembles the objects.

Y. Dong and Y. Wang—Contribute equally to this work.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-981-99-8555-5 25. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 315–327, 2024. https://doi.org/10.1007/978-981-99-8555-5_25

316

Y. Dong et al. Keywords: Infrared small-dim object detection · High-Resolution Feature Representation · Implicit neural representation · Spatial-frequency interaction feature enhancement

1

Introduction

Infrared sensors are widely utilized in military applications such as missile tracking [7], maritime surveillance, and early-warning systems, benefiting from their notable features of strong cloud penetration, anti-interference, and blind spot detection. However, due to the long imaging distance, the infrared objects generally appear to be very small, sometimes even as a single pixel in infrared images. Moreover, the infrared radiation energy diminishes significantly over distance, leading to extremely dim objects that are frequently overwhelmed by heavy noise and cluttered backgrounds. In addition, the lack of shape, color, and texture information further exacerbates the difficulty in detection of infrared objects [16], leading to a higher risk of false alarms or missed detections. High quality infrared small-dim object detection, also known as infrared small-dim object segmentation, plays a crucial role in various scenarios. Traditional methods for infrared small-dim object detection can be broadly classified into three categories: filter-based methods, human vision system-based methods, and low-rank-based methods. Filter-based methods are suitable only for single and uniform scenes [1], while human vision system-based methods work well for objects with relatively high brightness and distinct differences from the surrounding background [17]. The low-rank-based methods are applicable to complex and rapidly changing backgrounds, but suffer from slower computational speed in practice [21]. Although traditional model-driven methods have already achieved relatively good results, they still face challenges such as complex parameters setting, low accuracy, low robustness, and heavy reliance on prior knowledge of specific scenarios and manual hyperparameters tuning. With the rapid development of deep learning techniques, deep learning-based methods have received considerable attention in recent years. In contrast to traditional methods, deep learning approaches are able to automatically extract object features, leading to superior representation of the infrared small-dim objects and enabling more accurate detection with less false alarms. Dai et al. [5] introduced a novel approach that combines discriminative networks with conventional model-driven methods and utilizes the local contrast prior to improve the detection of infrared small objects. MDvsFA-cGan was introduced to combine the generative adversarial network with a generator divided into miss detection and false alarm subtasks [15], successfully balancing miss detection (MD) and false alarm (FA) in infrared small object detection. DNA-Net was put forward to achieve the progressive interaction between high-level and low-level features [9], avoiding feature loss in small infrared objects to a certain extent. Additionally, attention-guided pyramid context networks (AGPC) was introduced by Zhang et al. [20] to establish global associations between semantics and enhance the feature representation of the infrared objects. Although existing deep learning methods perform well in some cases, they still share a limitation. During

HRFRD-Net

317

the feature extraction process, as the network depth increases, the feature map size decreases, resulting in a loss of resolution. The loss of resolution adversely affects the capture of fine-grained details. This issue is particularly noticeable in infrared small-dim object detection tasks, where the structure of the object is poorly represented and easily overwhelmed by the background, resulting in challenging detection of the infrared objects. To address the above issues, we propose a high-resolution feature representation driven network for infrared small-dim object detection (HRFRD-Net). This network comprises three key components: (a) High-Resolution Feature Representation Branch (HRFR), (b) Infrared Small-Dim Object Detection Branch (ISDOD), and (c) Spatial-Frequency Interaction Feature Enhancement Module (SFIFE). In recent years, Implicit Neural Representation (INR) has demonstrated its efficacy in representing signals in the continuous domain, delivering favorable outcomes in various tasks such as image super-resolution [2] and 3D reconstruction [8,11]. To effectively tackle small-scale objects, we integrate INR into our HRFR branch and super-resolve the infrared small objects in a self-supervised learning strategy. By leveraging the super-resolution guidance from HRFR to ISDOD branch through a shared encoder, the ISDOD branch can extract high-resolution and high-quality representations of infrared small objects, thereby achieving more accurate object detection. To address the issues of object dimness and vulnerability to background interference, we utilize the SFIFE module. This module improves the detection of infrared small-dim objects by facilitating a global-local mixed receptive field approach through the interaction of spatial-frequency dual domains [12]. The interaction facilitates feature enhancement for the dim objects while simultaneously enhancing the contrast information between the objects and their corresponding backgrounds. In summary, our contributions in this paper are as follows: • We propose a high-resolution feature representation driven network for infrared small-dim object detection (HRFRD-Net), which enables to construct high-resolution and high-quality representation of infrared small objects via implicit neural representation in a self-supervised learning scheme. • To address the issues of object dimness and vulnerability to background interference, we introduce a feature enhancement module for infrared small-dim object detection. This module integrates global and local mixed receptive fields through spatial-frequency dual domain interaction, leading to significant improvements in detection accuracy. • Extensive experiments conducted on two infrared datasets demonstrate the superiority of our proposed method over state-of-the-art detection methods.

2

Methods

The overall framework of our proposed network HRFRD-Net is presented in Fig. 1, which comprises three key components: High-Resolution Feature Representation Branch (HRFR), Infrared Small-Dim Object Detection Branch

318

Y. Dong et al.

(ISDOD), and Spatial-Frequency Interaction Feature Enhancement Module (SFIFE). Our training strategy involves two phases. In the first phase, we train the HRFR branch via self-supervised learning to acquire high-resolution feature representations for infrared small-dim objects. In the second phase, we freeze the HRFR encoder and share it with the ISDOD branch. By leveraging the high-resolution feature representation and incorporating the SFIFE feature enhancement module, the accuracy of infrared object detection can be significantly improved.

Fig. 1. The overall framework of our proposed network.

2.1

High-Resolution Feature Representation (HRFR)

Infrared objects are generally small in size and lack prominent structural information, posing challenges in objects detection. To overcome this limitation, we introduce implicit neural representations into the HRFR branch during the initial training phase to achieve high-resolution feature representation of infrared small objects. We generate pairs of infrared samples by applying four times downsampling on the original infrared image using bicubic interpolation, resulting in the creation of low-resolution and high-resolution pairs. Then, we train the HRFR branch in a self-supervised learning approach. The super-resolution encoder Esr is utilized to extract the deep features of the low-resolution infrared image Ilow and obtain the latent codes z ∗ . With the arbitrary query coordinates (x, y)q in the continuous domain, the nearest latent codes z ∗ from (x, y)q , and the distance between (x, y)q and coordinates v ∗ of the latent codes z ∗ are fed

HRFRD-Net

319

into the implicit neural representation MLP decoder Dθ together to generate the high-resolution result Ihigh . This approach enables the representation of small objects in a high-resolution and high-quality manner without being restricted by grid resolution [13]. The formulas are expressed as below

Ihigh

z ∗ = Esr (Ilow ),   = Dθ z ∗ , (x, y)q − v ∗ .

(1)

The high-resolution result Ihigh is represented on a 2D continuous feature map M ∈ RH×W ×C . To enrich the information on feature map M , we employ feature unfolding to expand it   (2) Munf old = Concat {Ml,m }l,m∈{−1,0,1} , where each latent code in Mf old is obtained by the concatenation of the 3 × 3 neighboring latent codes Ml,m . 2.2

Infrared Small-Dim Object Detection (ISDOD)

In the second training phase, we freeze the super-resolution encoder Esr in HRFR branch and share it with the ISDOD branch, which enables the ISDOD branch to extract high-resolution and high-quality representations of infrared small objects M ∗ = Esr (I),

(3)

where I denotes the input infrared image, Esr represents the super-resolution encoder in the HRFR branch, which is shared with the ISDOD branch. This sharing allows for the generation of high-resolution feature representation M ∗ . To further improve the accuracy of objects detection, we introduce SFIFE module into the ISDOD branch. The details will be illustrated as follows. 2.3

Spatial-Frequency Interaction Feature Enhancement (SFIFE)

While the high-resolution feature representation can mitigate small objects issue and improve the detection accuracy, the inherent weaknesses and susceptibility to background interference still hinder the performance of object detection. To solve these issues, we introduce SFIFE module to enhance the infrared dim object representation. As well known, the spectral convolution theorem in Fourier theory demonstrates that modifying a point in the spectral domain exerts a global influence on all the input features [3]. Inspired by this, we employ a SFIFE module based on fast Fourier convolution which incorporates an image-wide receptive field and integrates global and local mixed receptive fields through spatial-frequency dual domain interaction [12]. This interaction can not only facilitate the feature enhancement for the dim objects but also simultaneously enhance the contrast information between the objects and their corresponding

320

Y. Dong et al.

Fig. 2. Spatial-Frequency Interaction Feature Enhancement Module.

background. We first split the high-resolution feature representation M ∗ into the global Mg and local Ml parts along the feature channel dimension Mg , Ml = split(M ∗ ).

(4)

As shown in Fig. 2, the feature enhancement module consists of four branches: local-to-local, local-to-global, global-to-local and global-to-global to achieve the mixed receptive fields through the interaction on spatial-frequency dual domain. Both local-to-local (f l→l ) and global-to-local branches (f g→l ) capture the local features via a 3 × 3 convolution, the local-to-global branch (f l→g ) leverages a non-local attention mechanism to explore the global dependency for each query pixel with its surrounding parts, and the global-to-global branch (f g→g ) utilizes Fourier transform to widen the receptive field and capture long-range context. The procedures can be described as follows f g→g (Mg ) = ST (Mg ) , f g→l (Mg ) = Convg→l 3×3 (Mg ) ,   l→g f (Ml ) = N L Convl→g 1×1 (Ml ) ,

(5)

f l→l (Ml ) = Convl→l 3×3 (Ml ) , where N L denotes non-local attention mechanism, ST is the spectral transformation which utilize Fourier transform to enlarge the receptive field to the full resolution of input feature map in an efficient way. We first adopt 2D Fast Fourier Transform (FFT) to transform the spatial features into frequency domain with both real and imaginary parts of the signal. Subsequently, a 3 × 3 convolution and leakyrelu [19] activation operation O is applied before converting it back to

HRFRD-Net

321

the spatial domain via the inverse Fourier transform R (Mg ) , I (Mg ) = F (Mg ) , R (Mg ) = O (R (Mg )) , I (Mg ) = O (I (Mg )) ,

(6)

Mst = F −1 (R(Mg ), I(Mg )), where F and F −1 denotes the Fourier transformation and inverse Fourier transformation respectively. R and I means the real and imaginary components of the signal. By summing up the global-to-global and local-to-global branches, we obtain the global feature, and in the same way we obtain the local features from the other two branches Fg = f g→g (Mg ) + f l→g (Ml ) = Mst + f l→g (Ml ) , Fl = f g→l (Mg ) + f l→l (Ml ) ,

(7)

where Fg and Fl denote global and local features respectively. Afterwards, they are concatenated to generate the enhanced feature F F = Cat (Fg , Fl ) .

(8)

After that, the enhanced feature F is passed through the detection head DH to generate the final detection result R R = DH(F ). 2.4

(9)

Loss Function

There are three loss functions in the proposed approach. Phase 1. During the first phase, we train the HRFR branch to super-resolve the infrared small objects via self-supervised learning. Super-Resolution Loss. We adopt L1 loss, which measures the average absolute difference between the object y and predicted values f (x), where n denotes the total pixels. The loss is computed as below n |f (x) − y| . (10) Lsr = i=1 n Phase 2. During the second phase, we share HRFR branch fixed encoder to the ISDOD branch to obtain the high-resolution feature representation. Another two loss functions are utilized in this phase to ensure the accurate detection. BCE Loss. The detection loss is calculated using Binary CrossEntropy (BCE) due to its sharper performance and faster convergence compared to the blurry results obtained from Mean Squared Error. The loss is defined as follows Lbce = −(ylog(p(x)) + (1 − y)log(1 − p(x))).

(11)

322

Y. Dong et al.

Dice Loss. Since the BCE Loss does not take into account the similarity between the predicted image and the ground truth, we introduce a similarity loss using the Dice coefficient. A higher Dice coefficient indicates a higher level of similarity. The loss is defined as follows 2·

Ldice = 1 − N

i=1

N

yi yˆi N

i=1

yi +

i=1

yˆi

.

(12)

Total Loss. The overall loss in phase 2 is computed by Ldetect = λbce Lbce + λdice Ldice ,

(13)

where λbce and λdice are set as 0.2 and 0.8 respectively.

3 3.1

Experiments Baseline Methods

To evaluate the effectiveness of our method, we compare it with three traditional methods: ADMD [10], MPCM [6], RLCM [6], and seven deep learning-based methods, including MDvsFA-cGAN [15], YOLOv5 [22], DNANet [9], ACM [4], ALCNet [5], AGPCNet [20] and UIUNet [18]. 3.2

Datasets and Implementation Details

Datasets. We choose MSISTD [14] (multiscene infrared small object dataset) and MDvsFA [15] (miss detection vs. false alarm) as our experimental datasets. MSISTD consists of 1077 images with 1343 instances, while MDvsFA contains 10000 training images and 100 test images. Implementation Details. We implement our network on a PC equipped with a single NVIDIA TITAN GPU using the PyTorch framework. Our training strategy can be divided into two phases. In the first phase, the HRFR branch is trained to obtain the high-resolution feature representation of infrared small objects, which utilizes the Lsr as loss function. In the second phase, the encoder Esr in HRFR branch encoder is frozen and shared to the ISDOD branch. The ISDOD branch is responsible for accurate detection of infrared small-dim objects, which adopts Lbce and Ldice as the loss functions. We try several experimental combinations and finally set the Lbce =0.2 and Ldice =0.8 empirically. During training, we optimize the model parameters using the Adam optimizer for 300 epochs. The initial learning rate is set to 1 × 10−4 , and the batch size is set to 10. We apply a learning rate decay by multiplying 0.5 when reaching 50 epochs.

HRFRD-Net

323

Metrics. To evaluate the performance, we adopt the following evaluation metrics: IOU, nIOU [4], F1-score and AUC. IOU serves as a crucial indicator for object detection, while the normalized Intersection over Union (nIOU) play an complementary role. F1-score provides a comprehensive measurement by combining precision and recall. Moreover, AUC serves as a quantitative metric for evaluating the ROC curve by measuring the area under the ROC curve. 3.3

Comparison with State-of-the-art Methods

Quantitative Comparison. Table 1 compares the performance of our method with several methods on MSISTD and MDvsFA datasets. The optimal results for each respective column are highlighted in red. Our method achieves the best results across almost all metrics on both datastes, with the exception of the F1score which is only 0.01 lower than the ACM [4] method on the MSISTD dataset. Moreover, our approach demonstrates significant improvements in IOU and nIOU, indicating its outstanding advantages in predicting object shape integrity and edge details. Furthermore, our method outperforms other approaches in AUC, suggesting its general validity across various cases. The HRFR branch (phase 1) comprises 1.6M parameters, while the ISDOD branch (phase 2) contains 0.5M parameters. The inference speed is 0.31 s per image. Visual Comparison. We provide detection visualization results in Fig. 3, Fig. 4 and Fig. 5. These results depict three typical scenarios: small objects with ambiguous features in the presence of significant background interference, the Table 1. Quantitative comparison of reference metrics on MSISTD and MDvsFA datasets. Best results are highlighted by red. ↑ indicates that the larger the value, the better the performance, and ↓ indicates that the smaller the value, the better the performance. Method

MSISTD MDvsFA IOU↑ nIOU↑ F1-score↑ AUC↑ IOU↑ nIOU↑ F1-score↑ AUC↑

ADMD MPCM RLCM

0.133 0.133 0.174 0.174 0.262 0.262

0.108 0.141 0.216

0.681 0.725 0.757

0.116 0.116 0.219 0.219 0.214 0.214

0.238 0.303 0.160

0.740 0.780 0.808

MDvsFA-cGAN YOLOv5 AGPCNet ALCNet ACM DNANet UIUNet

0.397 0.424 0.642 0.646 0.637 0.619 0.636

0.490 0.424 0.646 0.646 0.638 0.624 0.635

0.551 0.698 0.819 0.801 0.828 0.799 0.733

0.993 0.901 0.956 0.920 0.988 0.923 0.928

0.353 0.377 0.423 0.362 0.426 0.393 0.314

0.373 0.377 0.434 0.363 0.427 0.390 0.322

0.523 0.470 0.578 0.545 0.582 0.576 0.415

0.921 0.924 0.876 0.903 0.890 0.756 0.919

HRFRD-Net(Ours) 0.668 0.693

0.812

0.995

0.427 0.440

0.593

0.941

324

Y. Dong et al.

scenario where numerous objects in the background possess similar characteristics to the small objects, and the scenario which requires not only simple object detection but also precise segmentation of the small object’s edges. The results show that our method outperforms others in scenes with blurred features and heavy background interference. For instance, as shown in Fig. 3, the object is completely hidden under the cloud layer, which poses a great challenge for feature extraction. Only our method can successfully detect all the objects, which proves the effectiveness of our approach with strong anti-interference capability for small-dim objects detection. 3.4

Ablation Study

To evaluate the effectiveness of each component, we conduct ablation studies on both MSISTD and MDvsFA dataset. Initially, we remove the SFIFE module to assess its impact. Subsequently, we eliminate both the HRFR branch and the shared encoder from it in the ISDOD branch. The performance comparison is presented in Table 2.

Fig. 3. Small objects with ambiguous features in the presence of significant background interference on MSISTD dataset, red boxes indicate correct detection. (Color figure online)

Table 2. Ablation study about two key components on MSISTD and MDvsFA dataset. Config HRFR SFTFE MSISTD MDvsFA IOU↑ nIOU↑ F1-score↑ AUC↑ IOU↑ nIOU↑ F1-score↑ AUC↑ (I) (II)



Ours





0.627 0.651 0.625 0.653

0.745 0.792

0.949 0.989

0.397 0.414 0.374 0.403

0.579 0.539

0.939 0.900



0.668 0.693

0.812

0.995

0.427 0.440

0.593

0.941

HRFRD-Net

325

Fig. 4. Numerous objects in the background possess similar characteristics to the small objects on MSISTD dataset, red boxes indicate correct detection, green boxes indicate incorrect detection. (Color figure online)

Fig. 5. The scenario which requires not only simple object detection but also precise segmentation of the infrared small-dim object’s edges on MSISTD dataset, red boxes (Color figure online) indicate correct detection.

4

Conclusion

In this paper, we propose a novel high-resolution feature representation driven network for infrared small-dim object detection (HRFRD-Net). Our framework is able to preserve high-resolution and high-quality representations of the infrared small objects in a resolution-free manner, thereby achieving accurate objects detection. Additionally, the SFIFE module integrates global and local mixed receptive fields, enhancing the representation of dim objects and improving the contrast between objects and their background. Extensive experiments conducted on the MSISTD and MDvsFA datasets demonstrate the effectiveness of our approach, especially in challenging scenarios where objects are heavily obscured by the background, numerous background objects share similar characteristics with small objects, and the need for both accurate object detection and precise segmentation.

326

Y. Dong et al.

Acknowledgements. The work was supported in part by the National Natural Science Foundation of China under Grant 82172033, U19B2031, 61971369, 52105126, 82272071, 62271430, and the Fundamental Research Funds for the Central Universities 20720230104.

References 1. Bae, T.W., Zhang, F., Kweon, I.S.: Edge directional 2d lms filter for infrared small target detection. Infrared Phys. Technol. 55(1), 137–145 (2012) 2. Chen, Y., Liu, S., Wang, X.: Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8628–8638 (2021) 3. Chi, L., Jiang, B., Mu, Y.: Fast fourier convolution. Adv. Neural. Inf. Process. Syst. 33, 4479–4488 (2020) 4. Dai, Y., Wu, Y., Zhou, F., Barnard, K.: Asymmetric contextual modulation for infrared small target detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 950–959 (2021) 5. Dai, Y., Wu, Y., Zhou, F., Barnard, K.: Attentional local contrast networks for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 59(11), 9813– 9824 (2021) 6. Han, J., Liang, K., Zhou, B., Zhu, X., Zhao, J., Zhao, L.: Infrared small target detection utilizing the multiscale relative local contrast measure. IEEE Geosci. Remote Sens. Lett. 15(4), 612–616 (2018) 7. Huang, S., Liu, Y., He, Y., Zhang, T., Peng, Z.: Structure-adaptive clutter suppression for infrared small target detection: chain-growth filtering. Remote Sens. 12(1), 47 (2019) 8. Jiang, C., Sud, A., Makadia, A., Huang, J., Nießner, M., Funkhouser, T., et al.: Local implicit grid representations for 3d scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6001–6010 (2020) 9. Li, B., et al.: Dense nested attention network for infrared small target detection. IEEE Trans. Image Process. (2022) 10. Moradi, S., Moallem, P., Sabahi, M.F.: Fast and robust small infrared target detection using absolute directional mean difference algorithm. Sig. Process. 177, 107727 (2020) 11. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165– 174 (2019) 12. Sinha, A.K., Moorthi, S.M., Dhar, D.: Nl-ffc: non-local fast fourier convolution for image super resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 467–476 (2022) 13. Sitzmann, V., Martel, J., Bergman, A., Lindell, D., Wetzstein, G.: Implicit neural representations with periodic activation functions. Adv. Neural. Inf. Process. Syst. 33, 7462–7473 (2020) 14. Wang, A., Li, W., Huang, Z., Wu, X., Jie, F., Tao, R.: Prior-guided data augmentation for infrared small target detection. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 15, 10027–10040 (2022) 15. Wang, H., Zhou, L., Wang, L.: Miss detection vs. false alarm: adversarial learning for small object segmentation in infrared images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8509–8518 (2019)

HRFRD-Net

327

16. Wang, X., Peng, Z., Zhang, P., He, Y.: Infrared small target detection via nonnegativity-constrained variational mode decomposition. IEEE Geosci. Remote Sens. Lett. 14(10), 1700–1704 (2017) 17. Wei, Y., You, X., Li, H.: Multiscale patch-based contrast measure for small infrared target detection. Pattern Recogn. 58, 216–226 (2016) 18. Wu, X., Hong, D., Chanussot, J.: Uiu-net: U-net in u-net for infrared small object detection. IEEE Trans. Image Process. 32, 364–376 (2022) 19. Xu, J., Li, Z., Du, B., Zhang, M., Liu, J.: Reluplex made more practical: Leaky relu. In: 2020 IEEE Symposium on Computers and communications (ISCC), pp. 1–7. IEEE (2020) 20. Zhang, T., Cao, S., Pu, T., Peng, Z.: Agpcnet: attention-guided pyramid context networks for infrared small target detection. arXiv preprint arXiv:2111.03580 (2021) 21. Zhu, H., Ni, H., Liu, S., Xu, G., Deng, L.: Tnlrs: target-aware non-local low-rank modeling with saliency filtering regularization for infrared small target detection. IEEE Trans. Image Process. 29, 9546–9558 (2020) 22. Zhu, X., Lyu, S., Wang, X., Zhao, Q.: Tph-yolov5: Improved yolov5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2778–2788 (2021)

Few-Shot Object Detection Algorithm Based on Adaptive Relation Distillation Danting Duan1 , Wei Zhong2 , Liang Peng1 , Shuang Ran1 , and Fei Hu2(B) 1

Key Laboratory of Media Audio and Video (Communication University of China), Ministry of Education, Beijing 100024, China {dantingduan,rans}@cuc.edu.cn 2 State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China {wzhong,hufei}@cuc.edu.cn

Abstract. Deep learning methods have advanced the accuracy and speed of object detection. However, acquiring labeled data is especially challenging in real-world scenarios. As a result, the general-purpose object detection algorithms based on deep learning experience significant performance degradation with limited samples. To tackle the issue of declining detection accuracy in the presence of few-shot data, this paper introduces a few-shot object detection algorithm based on adaptive relation distillation, which improves upon existing algorithms by enhancing the fusion of query and support features. In the proposed method, the adaptive relational distillation module discards the hand-designed and inefficient query information utilization strategy employed in previous algorithms and adaptively fuses the features of support and query images using convolutional networks. To augment the learning capability of the adaptive relational distillation module, we utilize a hybrid attention module in the support branch to emphasize the regions crucial for detecting specific classes of objects. The experimental results demonstrate our proposed algorithm achieves an average accuracy of 47.6% on three data divisions and five sample size settings for the PASCAL VOC dataset, which marks an improvement of 8.3% over DCNet.

Keywords: Object detection Attention mechanism

1

· Few-shot learning · Meta learning ·

Introduction

Object detection is one of the most challenging fundamental tasks in the field of computer vision. It consists of two subtasks, i.e., classification and localization. This work is supported by National Key R&D Program of China under Grant No. 2022YFF0902401, the National Natural Science Foundation of China under Grant 62271455 and the Fundamental Research Funds for the Central Universities under Grant Nos. CUC210C013 and CUC18LG024. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 328–339, 2024. https://doi.org/10.1007/978-981-99-8555-5_26

Few-Shot Object Detection ARDNet

329

Specifically, classification refers to the identification of target categories in an image, while localization is the accurate search of the location of the target category. Currently, object detection has become a research hotspot in security, military, transportation and medical areas [1–3]. In recent years, artificial intelligence technology has developed rapidly, but it is undeniable that today’s artificial intelligence is far from the true human intelligence. Indeed, the success of artificial intelligence relies on the creation of large-scale data sets. However, humans need to see an object only once or very few times to complete the perception of an object, even for a child. Therefore, the few-shot learning problem is likewise an important bottleneck limiting the development of artificial intelligence. Most of the current few-shot learning methods target few-shot image classification tasks. In the few-shot image classification task, a network model is trained on the base class samples containing enough training data for good generalization ability and thus perform well on a new class with only few samples. To train such a network model, the meta-learning methods are widely used [4–6]. The meta-learning based few-shot objective detection network has two inputs, the query image and the support set. The support set needs to be referenced when detecting objects in the query image. Since metalearning was proposed relatively late, there is a great potential for improvement of few-shot object detection methods based on meta-learning. In the existing work, there are relatively few studies on the relationship between support features and query features. Researchers usually directly perform global pooling operations on the support features to adjust the query branches. For example, Meta YOLO [7] uses a reweighting module to transform support images into weighted vectors that perform channel weighting operations on meta-features of query images. Meta R-CNN [8] transforms support images into attention-like vectors in a prediction head reshaping network that performs channel soft attention operations on RoI features. However, the global pooling operation used in the above work tends to lose the local detail information in the support set. As a matter of fact, the local detail information is extremely important for solving the occlusion problem of object detection. The variations in the appearance of objects lead to misclassification between similar categories, while the object occlusion brings incomplete feature representation, leading to misclassification and missing detection. Without sufficient discriminative information, the model will not be able to learn the key features for classification and bounding box prediction. This problem is even more prominent in few-shot object detection where training examples are extremely sparse. At this point, using the local detail information in the support set becomes the key to detection accuracy improvement. To make better use of the detailed information from the support set, Fan et al. [9] used a variety of relationship headers to model different relationships between the query image and the support image. Among them, the global relation head uses a global representation to match images, the local relation head captures pixel-level matching relations, and the block relation head models oneto-many pixel relations. In contrast, a dense relational distillation module is

330

D. Duan et al.

used in DCNet [10] to match query and support features at the pixel level, which effectively improves the accuracy of few-shot detection. To this end, we introduce the adaptive relation distillation module (ARDM) to make full use of the support set, not only limited to the global information or local detail information of the support set. We implement the ARDM by mixed attention and apply it to few-shot object detection, and propose the adaptive relation distillation based detection network (ARDNet). In the ARDNet, given a query image and a few new classes of support images, the query features and support features are first extracted with a shared feature learner, and then fed into the ARDM for fusion of the two sets of features. Finally, the fused features are fed into the subsequent region proposal generation and detection process.

2

Related Work

Meta-learning addresses the shortcomings of traditional neural network models with insufficient generalization performance and poor adaptability to new kinds of tasks. Since the concept of meta-learning was introduced by Schmidhuber [11] and Hinton [12] in 1987, the meta-learning has become a common learning strategy in few-shot learning. The algorithms of meta-learning have also achieved great success in the field of few-shot object detection because of their powerful generalization ability. Since most current meta-learning methods for detection are conditioned on a set of supported examples, they can also be considered as example-based visual search. Recently, Fu et al. [13] proposed Meta SSD, which attaches a meta-learner to the SSD and uses the meta-learner to optimize the initial values of network parameters and the learning rate, while the detector updates its weights under the guidance of the meta-learner. Kang et al. [7] proposed Meta YOLO, which uses a meta-learning approach to learn the meta-features of the base class and introduces a channel attention approach to reweight the importance of features to fit the new class target. Wang et al. [14] proposed MetaDet, which uses metalevel knowledge generated about the model parameters of a new class-specific component. Karlinsky et al. [15] proposed RepMet, which uses a classifier head based on distance metric learning to replace the RoI classifier head in a twostage detection algorithm to extract representative embedding vectors by clustering and calculate the distance between the query and the supported instances for classification. Yan et al. [8] proposed Meta R-CNN, which reweighted RoI features based on the Faster R-CNN approach. Yang et al. [16] proposed NPRepMet, which divides the class representation in RepMet into two modules to learn negative and positive representations, and replaces the embedding vector given a suggestion with a new positive and negative embedding. Fan et al. [9] proposed FSOD, which uses support information in attention RPN to filter out most background boxes and mismatched categories in background boxes. Hu et al. [10] proposed DCNet, containing a dense relation extraction module, which matches queries and support features at the pixel level, and a context-aware aggregation module, which uses three different pooling resolutions to capture

Few-Shot Object Detection ARDNet

331

richer contextual features. Li et al. [17] proposed CME to transform the smallsample detection problem into a small-sample classification problem using a fully connected layer, while introducing class margin loss to interfere with the features of new class instances in an adversarial min-max manner, thus achieving margin equalization. In the meta-learning approach, how to use the support set to adjust the features of the query image is the key to improve the detection accuracy. Although the above works improves the accuracy of few-shot object detection, the relationship between support sets and query images has not been fully investigated. Neither the global features nor the detailed features of the support set can be used to fully exploit the query set. Therefore, in this paper, the query set is utilized adaptively with the help of deep networks to eliminate the hand-designed way of utilizing query features, and satisfactory accuracy is achieved.

3

Our Approach

To extract enough information from the support features, we propose the ARDM using a hybrid attention mechanism, whose structure is shown in Fig. 1. In the query branch of ARDM, a query encoder is used to halve its channel count. At the same time, in the support branch of ARDM, we first perform the averaging operation on the class dimension to change the support features of N ×C ×H ×W into 1 × C × H × W , where N is the number of classes. The support features are then fed into a hybrid attention module that contains both channel and spatial attention. Later, the support encoder is used for dimensionality reduction. The support features are bilinearly interpolated and deflated so that their width and height are the same as the query features. Finally, the channel halved query features and adjusted support features are stitched together in the channel dimension. In this case, both the query encoder and the support encoder are implemented by a single 3 × 3 convolutional layer. The operation of the adaptive relational distillation module ensures that its output size is consistent with the size of the query features. In order to enhance the learning ability of the ARDM, the convolutional block attention module (CBAM) [18] is used in the supporting branch. The CBAM is a simple and effective attention module for feedforward convolutional neural networks, whose structure is shown in Fig. 2. Given an intermediate feature map, the CBAM module sequentially infers the attention map along two independent dimensions (channel and space), and then multiplies the attention map with the input feature map for adaptive feature optimization. Since CBAM is an attention mechanism module that combines space and channel, it can achieve better results than attention mechanisms that focus only on channel or space. We apply the ARDM to the few-shot object detection network DCNet to obtain the ARDNet proposed in this paper, whose structure is shown in Fig. 3. In the ARDNet network, the inputs are query images and support images, and the query features and support features are obtained separately using a shared feature extractor and fed into the adaptive relational distillation module. The


Fig. 1. Adaptive relation distillation module.

Fig. 2. The convolutional block attention module.

The fused features, containing both query information and rich support information, are produced by the adaptive relation distillation module. They are then fed into the RPN to generate candidate region proposals. RoI Align uses the region proposals, the fused features, and three pooling resolutions to obtain RoI features, which are fed into the final classification and localization module.
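To make the data flow described above concrete, the following is a minimal PyTorch-style sketch of the ARDM, written from the description in this section rather than taken from the authors' code. The class names, the compact CBAM stand-in, and the broadcasting of the single averaged support map across the query batch are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCBAM(nn.Module):
    """Compact channel + spatial attention, standing in for CBAM [18]."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * ca
        # Spatial attention from channel-wise average and max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True)[0]], dim=1)))
        return x * sa


class ARDM(nn.Module):
    """Adaptive relation distillation: fuse query and support features."""

    def __init__(self, channels):
        super().__init__()
        self.query_encoder = nn.Conv2d(channels, channels // 2, 3, padding=1)
        self.support_encoder = nn.Conv2d(channels, channels // 2, 3, padding=1)
        self.attention = SimpleCBAM(channels)

    def forward(self, query_feat, support_feat):
        # query_feat: (B, C, Hq, Wq); support_feat: (N, C, Hs, Ws), N = number of classes.
        q = self.query_encoder(query_feat)                  # halve the channel count
        s = support_feat.mean(dim=0, keepdim=True)          # N x C x H x W -> 1 x C x H x W
        s = self.attention(s)                               # channel + spatial attention
        s = self.support_encoder(s)                         # halve the channel count
        s = F.interpolate(s, size=q.shape[-2:], mode="bilinear", align_corners=False)
        s = s.expand(q.size(0), -1, -1, -1)                 # broadcast over the query batch
        return torch.cat([q, s], dim=1)                     # same size as query_feat
```

The point mirrored from the text is that the output has the same channel count and spatial size as the query features, so the module can be inserted into DCNet without changing the downstream RPN and RoI heads.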

4 Experiment

4.1 Dataset

We evaluate our model on the PASCAL VOC dataset [19], which is a common dataset for few-shot object detection. The VOC dataset provides the annotation information required for tasks such as image classification, object detection, and semantic segmentation. The dataset contains 20 class targets and 1 background class. Specifically, the training set contains 10582 images, the validation
set contains 1449 images, and the test set contains 1456 images. VOC 2007 and VOC 2012 are the two most widely used releases; together they cover 4 major classes (vehicle, household, animal and person), 20 subclasses (aeroplane (aero), bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike (mbike), person, pottedplant, sheep, sofa, train, tvmonitor) and 1 background class. For the few-shot object detection task, the 20 classes of the VOC dataset are randomly divided into 15 base classes and 5 novel classes, and the number of instances of each novel class is K = 1, 2, 3, 5, 10. Samples are taken from the combined VOC 2007 and VOC 2012 trainval sets, and the VOC 2007 test set is used for evaluation. The AP50 of the novel classes is used as the evaluation metric (the IoU matching threshold is set to 0.5). Three random class groupings are considered for evaluation; the three novel sets in the standard configuration are (bird, bus, cow, motorbike, sofa), (aeroplane, bottle, cow, horse, sofa), and (boat, cat, motorbike, sheep, sofa).

Fig. 3. The overall framework of our proposed ARDNet.

4.2 Implementation Details

In the meta-learning based K-shot learning task, each training episode is constructed by sampling: 1) a support set $S = \{x_i, y_i\}_{i=1}^{N}$ containing image-mask pairs of different classes, where $x_i \in \mathbb{R}^{h \times w \times 3}$ is the RGB image, $y_i \in \mathbb{R}^{h \times w}$ is the binary mask of the object of class i in the support image, generated from the bounding box annotation, and N is the number of object classes in the training set; 2) a query image q together with the annotations m of the training classes present in the query image. The inputs to the model are the support set and the query image, and the outputs are the detection predictions for the query image. Both training and testing are performed on fixed-scale images: the shorter edge of each query image is scaled to 800 pixels and the longer edge to at most 1333 pixels while maintaining the aspect ratio, and the support images are resized to square images of 256 × 256 pixels. We use ResNet-101 as the feature extractor and RoI Align as the RoI feature extractor. The weights

Table 1. Few-shot object detection performance on the PASCAL VOC dataset (novel-class AP50 for shots 1/2/3/5/10 on each novel set).

| Models/Shots  | Novel Set 1 (1/2/3/5/10) | Novel Set 2 (1/2/3/5/10) | Novel Set 3 (1/2/3/5/10) |
| LSTD          | 8.2/1.0/12.4/29.1/38.5   | 11.4/3.8/5.0/15.7/31.0   | 12.6/8.5/15.0/27.3/36.3  |
| Meta YOLO     | 14.8/15.5/26.7/33.9/47.2 | 15.7/15.2/22.7/30.1/40.5 | 21.3/25.6/28.4/42.8/45.9 |
| FRCN          | 15.2/20.3/44.7/40.1/45.5 | 13.4/20.6/28.6/32.4/38.8 | 19.6/20.8/28.7/42.2/42.1 |
| Meta RCNN     | 19.9/25.5/35.0/45.7/51.5 | 10.4/19.4/29.6/34.8/45.4 | 14.3/18.2/27.5/41.2/48.1 |
| FsDetView     | 24.2/35.3/42.2/49.1/57.4 | 21.6/24.6/31.9/37.0/45.7 | 21.2/30.0/37.2/43.8/49.6 |
| DCNet         | 33.9/37.4/43.7/51.1/59.6 | 23.2/24.8/30.6/36.7/46.6 | 32.3/34.9/39.7/42.6/50.7 |
| DCNet*        | 22.6/42.3/45.9/48.3/55.7 | 15.4/24.5/37.6/37.8/42.4 | 22.3/35.2/36.7/43.4/49.1 |
| ARDNet (ours) | 33.4/50.5/57.2/59.0/65.4 | 28.0/47.6/53.9/56.4/59.4 | 27.9/39.6/41.7/43.7/49.8 |

of the backbone are pre-trained on ImageNet. After the model is trained on the base classes, only the last fully connected layer used for classification is removed and replaced with a new, randomly initialized layer. All parts of the model participate in the second, meta-refinement phase without any freezing. The batch size during training is 8, using a single Tesla M40 24 GB GPU. We use an SGD optimizer with a momentum of 0.9 and a weight decay of 0.0001. For meta-training on PASCAL VOC, the model is trained for 12k, 4k and 2k iterations with learning rates of 0.01, 0.001 and 0.0001, respectively. For meta fine-tuning on PASCAL VOC, the batch size is 4, and the model is trained for 1300, 400 and 300 iterations with learning rates of 0.005, 0.0005 and 0.00005, respectively.
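As a concrete reading of this schedule, the sketch below builds the stated SGD optimizer and a step-wise learning-rate decay in PyTorch; the milestone values are simply the cumulative iteration counts from the text, and the placeholder module stands in for the actual network.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # placeholder module standing in for the detector

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)

# Meta-training schedule from the text: 12k iterations at lr 0.01,
# then 4k at 0.001, then 2k at 0.0001 -> decay by 10x at 12k and 16k (18k total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[12000, 16000], gamma=0.1)

for iteration in range(18000):
    optimizer.zero_grad()
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()
```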

4.3 Comparisons with State-of-the-Art Methods

In Table 1, we compare the method proposed in this paper with previous state-of-the-art methods, where bold red marks the best result and bold blue italics the second-best result. ARDNet denotes the few-shot object detection algorithm based on adaptive relation distillation proposed in this paper, DCNet* is the result obtained by running the baseline model code directly, and DCNet is the result reported in the original paper. As Table 1 shows, ARDNet achieves state-of-the-art results on 11 of the 15 combinations of data division and shot setting, achieves second-best results on the remaining 4, and largely outperforms the baseline model and other previous methods. In particular, the detection accuracy of the proposed method is 13.1% higher than DCNet in the 2-shot setting of novel set 1. This provides convincing evidence that ARDNet is able to capture the local details of the support set and thus overcome the challenges posed by the large differences between test and training samples.


Table 2. New class detection accuracy comparison. Shots Models 1 2 3 5 10

Novel Set 1

Novel Set 2

Novel Set 3

bird bus cow mbike sofa aero bottle cow horse sofa boat cat mbike sheep sofa

DCNet* 12.6 43.1 10.3 44.7

2.3 40.0

ARDNet 17.4 69.7 26.1 45.4

8.4 46.1 25.3 23.6 27.6 17.0 10.5 50.3 50.1 16.7 11.8

DCNet* 26.9 60.7 20.0 54.5 49.3 47.7

3.0

13.7

5.3

15.2 5.5 42.0 42.6

7.1

14.2

11.8 20.6 33.2 14.8 40.2 59.7

27.5 33.6

ARDNet 36.0 78.4 43.9 57.3 36.9 53.2 25.6 45.4 70.8 43.1 15.4 48.5 54.7

42.7 38.8

DCNet* 35.0 61.1 34.8 51.1 47.0 52.2

9.1

34.4 49.9 42.6 18.6 48.7 49.9

28.5 37.9

ARDNet 51.0 77.9 55.0 57.3 44.9 59.7 43.2 40.7 77.2 48.9 19.1 55.3 48.2

37.1 48.6

9.1

DCNet* 35.2 56.7 37.4 57.3 54.8 48.1 10.5 48.6 30.4 51.5 22.6 56.1 55.1 35.6 47.7 ARDNet 50.5 77.6 62.1 57.1 48.0 59.0 43.7 53.7 76.0 49.5 21.7 59.0 51.2 DCNet* 43.5 67.9 50.7 60.9 55.3 51.0

35.3 51.2

48.5 48.8 53.7 31.0 55.9 66.1

43.0 49.6

ARDNet 63.4 83.9 66.3 62.4 51.0 62.5 47.9 58.0 75.5 53.0 27.9 39.6 41.7

43.7 49.8

9.9

Table 3. Comparison of base class detection accuracy in the basic training phase.

| Models | Novel Set 1 | Novel Set 2 | Novel Set 3 |
| DCNet* | 82.3        | 81.8        | 82.7        |
| ARDNet | 81.0        | 81.1        | 81.6        |

4.4 Comparisons with DCNet

In Table 2, we compare the detection accuracy of DCNet* and ARDNet on each novel class for the three PASCAL VOC data divisions, with the best results for each data division and shot setting shown in bold red. ARDNet achieves the largest accuracy increment of 22.3% for the bottle category in novel set 2, and is less effective than DCNet* only for the sofa category in novel set 3, where it is 2.4% lower. A comparison of the average base-class detection accuracies in the basic training phase (meta-training phase) is shown in Table 3, with the best results in bold red. The base training phase uses large-scale base-class training data, and the base-class detection accuracy of ARDNet is slightly lower than that of DCNet* at this stage. This also illustrates a difference between few-shot object detection and general object detection: high detection accuracy on the base-class data does not imply high detection accuracy on the novel-class data. In Table 4, the average detection accuracies of the two models over all classes (both novel and base) in the novel-class fine-tuning phase (meta-testing phase) are compared, with the best results in bold red. Our method not only achieves excellent detection accuracy on novel classes but also achieves the best results when detecting novel and base classes together, which demonstrates its effectiveness. A comparison of the inference speed of the proposed model and DCNet is shown in Table 5. The measurements are taken on the test set and averaged over three runs. It can be seen from Table 5 that the inference time

Table 4. Average detection accuracy of the fine-tuning model over all classes (shots 1/2/3/5/10 for each novel set).

| Models | Novel Set 1 (1/2/3/5/10) | Novel Set 2 (1/2/3/5/10) | Novel Set 3 (1/2/3/5/10) |
| DCNet* | 50.3/62.1/63.1/64.8/68.5 | 48.1/54.1/60.4/61.5/64.6 | 49.1/58.4/60.2/64.0/66.0 |
| ARDNet | 53.5/63.9/66.2/67.9/70.7 | 52.5/63.4/65.2/67.2/69.4 | 55.0/60.5/61.7/63.9/66.3 |

Table 5. Comparison of inference speed of DCNet and ARDNet.

| Models             | DCNet | ARDNet |
| Inference time (s) | 0.208 | 0.212  |
| FPS                | 4.72  | 4.81   |

of ARDNet is comparable with that of DCNet (0.212 s versus 0.208 s), as is the frame rate (4.81 versus 4.72 FPS).

4.5 Ablation Study

To demonstrate that a deep network has the ability to utilize support information automatically, we also design a simpler version of the ARDM, shown in Fig. 4. Specifically, after obtaining the support features and query features, we first halve the number of channels of the query features directly using the query encoder. For the support features, after halving their number of channels using the support encoder, an averaging operation along the class dimension changes the support features from N × C/2 × H × W to 1 × C/2 × H × W, where N is the number of classes. The support features are then bilinearly interpolated so that their width and height match the query features. Finally, the channel-halved query features and the adjusted support features are concatenated along the channel dimension. The average detection accuracies of the baseline model and of the models using the two adaptive relation distillation modules are compared on the three novel-class divisions of PASCAL VOC in Table 6, with the best result in each column marked in bold red. ARDNet- denotes the few-shot object detection network using the simple adaptive relation distillation module. As Table 6 shows, the average accuracy of DCNet* is 2% lower than that of DCNet in AP. In contrast, the accuracy of ARDNet- is approximately the same as DCNet, with only a 0.1% decrease. This shows that the detection network can achieve good accuracy without elaborate, hand-designed support-feature utilization strategies. Notably, the proposed ARDNet obtains the best results on all three data divisions, with an average accuracy improvement of 8.3%, which indicates that adding mixed attention to the adaptive relation distillation module further reduces the difficulty of learning the support set information.
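For comparison with the full module, here is a sketch of this simplified variant in the same illustrative style as before: the encoders halve the channels first, the class-wise average is taken afterwards, and no attention block is used. As with the earlier sketch, the names and the batch broadcasting are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleARDM(nn.Module):
    """Simplified adaptive relation distillation: no CBAM, average after encoding."""

    def __init__(self, channels):
        super().__init__()
        self.query_encoder = nn.Conv2d(channels, channels // 2, 3, padding=1)
        self.support_encoder = nn.Conv2d(channels, channels // 2, 3, padding=1)

    def forward(self, query_feat, support_feat):
        q = self.query_encoder(query_feat)              # B x C/2 x Hq x Wq
        s = self.support_encoder(support_feat)          # N x C/2 x Hs x Ws
        s = s.mean(dim=0, keepdim=True)                 # average over the class dimension
        s = F.interpolate(s, size=q.shape[-2:], mode="bilinear", align_corners=False)
        s = s.expand(q.size(0), -1, -1, -1)             # broadcast over the query batch
        return torch.cat([q, s], dim=1)
```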


Fig. 4. Simple adaptive relation distillation module.

Table 6. The average accuracy AP and accuracy increment Δ for different models on three data divisions of PASCAL VOC.

| Models  | Novel Set 1 AP (Δ) | Novel Set 2 AP (Δ) | Novel Set 3 AP (Δ) | Average AP (Δ) |
| DCNet   | 45.1 (-)           | 32.4 (-)           | 40.4 (-)           | 39.3 (-)       |
| DCNet*  | 43.0 (−2.1)        | 31.5 (−0.9)        | 37.3 (−3.1)        | 37.3 (−2.0)    |
| ARDNet- | 46.6 (+1.5)        | 33.0 (+0.6)        | 38.0 (−2.4)        | 39.2 (−0.1)    |
| ARDNet  | 53.1 (+8.0)        | 49.1 (+16.7)       | 40.5 (+0.1)        | 47.6 (+8.3)    |

Figure 5 shows the feature-map visualizations of test images after passing through the two different relation extraction modules in DCNet and ARDNet. As Fig. 5 shows, the ARDM generates feature maps with less noise and larger activation values in the object region than the dense relation extraction module in DCNet. This indicates that the proposed ARDM is able to focus on the important object details in the query image under the guidance of the support images, and is more effective in modeling the relationship between the query image and the support set.


Fig. 5. The visualizations of the feature map.

5 Conclusion

In this paper, we improve the meta-learning based few-shot object detection algorithm. We first analyze the problems of current meta-learning algorithms that cannot fully utilize the detailed information from the support set and the need to manually design the support set utilization strategy. To address these problems, we propose ARDM, and apply it to few-shot object detection networks by proposing ARDNet. ARDM eliminates the manually designed and inefficient support feature utilization strategies of previous work, and effectively incorporates the rich information from the support set into the query features. Through extensive experiments, we demonstrate that ARDM has the ability to automatically mine effective information from support images, and the few-shot object detection network ARDNet based on adaptive relational distillation shows a significant improvement in detection accuracy compared to the baseline DCNet.

References 1. Diwan, T., Anirudh, G., Tembhurne, J.V.: Object detection using yolo: challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 82(6), 9243–9275 (2023) 2. Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464– 7475 (2023) 3. Zeng, N., Wu, P., Wang, Z., Li, H., Liu, W., Liu, X.: A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas. 71, 1–14 (2022) 4. Vs, V., Poster, D., You, S., Hu, S., Patel, V.M.: Meta-uda: Unsupervised domain adaptive thermal object detection using meta-learning. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1412– 1423 (2022)


5. Su, H., Xiang, L., Hu, A., Xu, Y., Yang, X.: A novel method based on metalearning for bearing fault diagnosis with small sample learning under different working conditions. Mech. Syst. Signal Process. 169, 108765 (2022) 6. Huang, S.F., Lin, C.J., Liu, D.R., Chen, Y.C., Lee, H.y.: Meta-tts: meta-learning for few-shot speaker adaptive text-to-speech. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 1558–1571 (2022) 7. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8420–8429 (2019) 8. Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., Lin, L.: Meta r-cnn: towards general solver for instance-level low-shot learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9577–9586 (2019) 9. Fan, Q., Zhuo, W., Tang, C.K., Tai, Y.W.: Few-shot object detection with attention-rpn and multi-relation detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4013–4022 (2020) 10. Hu, H., Bai, S., Li, A., Cui, J., Wang, L.: Dense relation distillation with contextaware aggregation for few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10185–10194 (2021) 11. Schmidhuber, J.: Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Ph.D. thesis, Technische Universit¨ at M¨ unchen (1987) 12. Hinton, G.E., Plaut, D.C.: Using fast weights to deblur old memories. In: Proceedings of the 9th Annual Conference of the Cognitive Science Society, pp. 177–186 (1987) 13. Fu, K., Zhang, T., Zhang, Y., Yan, M., Chang, Z., Zhang, Z., Sun, X.: Meta-ssd: towards fast adaptation for few-shot object detection with meta-learning. IEEE Access 7, 77597–77606 (2019) 14. Wang, Y.X., Ramanan, D., Hebert, M.: Meta-learning to detect rare objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9925–9934 (2019) 15. Karlinsky, L., et al.: Repmet: representative-based metric learning for classification and few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5197–5206 (2019) 16. Yang, Y., Wei, F., Shi, M., Li, G.: Restoring negative information in few-shot object detection. Adv. Neural. Inf. Process. Syst. 33, 3521–3532 (2020) 17. Li, B., Yang, B., Liu, C., Liu, F., Ji, R., Ye, Q.: Beyond max-margin: Class margin equilibrium for few-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7363–7372 (2021) 18. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: Cbam: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018) 19. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010)

A Real-Time Safety Detector Based on Re-parameterization Multiscale Feature Fusion for Forklift Driving

Linhua Ye1,2, Songhang Chen2,3(B), Zhiqing Lai2, and Meng Guo1

1 College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
2 Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fujian 350108, China
3 Fujian Provincial Key Laboratory of Intelligent Identification and Control of Complex Dynamic System, Quanzhou 362200, China
[email protected]

Abstract. The application of object detection in intelligent logistics has received considerable attention. However, existing detector models face challenges such as high computational costs, slow detection speed, and difficulty in deployment on edge devices with limited computational resources. This paper proposes a novel real-time safety detector model (RMFFDet) based on YOLOv8s for forklift driving. A hardwarefriendly FasterNeXt module is designed to optimize feature extraction and reduce the computational costs in the Backbone. Inspired by the work on RepGhost, a re-parameterization multiscale feature fusion Neck (RMFFNeck) is proposed in this paper. Reconstructing the Neck based on RMFFNeck improves the capture of contextual logistics background feature information while reducing the model parameters. Finally, the Wise-IoU (WIoU) is introduced as a bounding box regression loss combined with a dynamic non-monotonic focusing mechanism to improve the model’s overall performance. Experiments show that RMFFDet achieves a mean Average Precision (mAP) of 95.2% on the KITTI dataset and 92.8% on the self-built Forklift-3k dataset. Compared to YOLOv8s, the model parameters are reduced by 34.5%. On the Jetson Nano edge platform and 640×640 input size, RMFFDet requires only 100.2ms inference time. RMFFDet offers an excellent trade-off between inference speed and detection accuracy. It meets the industrial requirements of logistics scenarios. Keywords: Object detection

· Intelligent logistics · Feature fusion

1 Introduction

(This work was supported by the Science and Technology Program of Quanzhou, China, Grant No. 2023C011R.)

The logistics industry is experiencing rapid growth, and forklifts have become indispensable in various logistics scenarios. While forklifts offer convenience in


industrial manufacturing, they pose significant safety risks. Excessive stacking of cargo obstructs the visibility of forklift operators, limiting their ability to observe pedestrians in front of them. Moreover, the elevated body of the forklift creates a blind spot in the rearview, further compromising safety. Additionally, forklift operators suffer from fatigue, inattention, and potential hazards and irregularities. To address these issues, machine vision-based safety warning systems have emerged as effective solutions. In such systems, the detector model assumes a critical role. Equipping forklifts with RGB-D depth cameras and integrating advanced detector models can enable accurate detection of target locations and distances, thereby effectively preventing accidents. Deep learning-based object detector models, as an essential branch in the field of machine learning [5,22,26], have demonstrated superior performance compared to traditional methods. Object detector models can be classified into two main categories. Firstly, there are region proposal-based two-stage detector models, with Faster R-CNN [17] being the most typical representative. This approach generates candidate bounding boxes and performs classification and regression on these boxes. While two-stage detectors achieve impressive accuracy, their detection speed is often inadequate for real-time applications. In contrast, single-stage detector models, such as the YOLO series [1,14–16], take a regression-based approach to object detection, allowing direct prediction of object categories and precise localization. These detectors have faster detection speed and can meet the requirements of real-time detection. Despite the remarkable performance of one-stage detector models on devices with abundant computational resources, they suffer from slow inference speed and compromised detection accuracy when deployed on edge devices with limited computing power. This issue is particularly critical in complex logistics scenarios where forklift drivers encounter problems such as narrow aisles and random cargo stacking. Thus, the trade-off between detection accuracy and speed has been a significant challenge [13,23,24]. Another challenge arises from the lack of suitable datasets capturing logistics scenes, posing difficulties for research endeavours. The logistics scenarios is characterized by its complexity and variability, including different cargo types, stacking methods, and lighting conditions, making data collection challenging. To meet the requirements of real-time detection and high detection accuracy in complex logistics scenarios, this study addresses the challenges of high computational costs and inferior detection accuracy on edge devices by utilizing the proposed RMFFDet model. The main contributions of this paper are as follows: • A hardware-friendly FasterNeXt module is designed to optimize the Backbone in RMFFDet and reduce the computational costs. • Reconstructing the Neck based on the proposed RMFFNeck makes the model better for obtaining complex logistics background feature information while reducing the model parameters.


• The WIoU is introduced as a bounding box regression loss to improve the computational procedure of the loss function in RMFFDet.

2 Related Works

Real-time detection and high detection accuracy are crucial in practical deployments. The YOLO series offers an excellent balance between speed and accuracy, especially the latest YOLOv8 [19]. Although YOLOv8 performs very well on GPUs, it cannot achieve real-time detection on edge devices because of its hardware requirements. Lightweight models have therefore gained wide attention and application [7,9,21]. GhostNet [8] is a lightweight convolutional neural network that reduces computational cost by using cheap operations to generate additional feature maps. Similarly, ShuffleNetV2 [12] improves efficiency by processing half of the feature channels and reserving the other half for concatenation. Although these concatenation operations are parameter-free, their computational cost on hardware devices is not negligible. To address this issue, structural re-parameterization techniques [4] that achieve feature reuse provide a new perspective. Inspired by lightweight models and structural re-parameterization techniques, this paper proposes the RMFFDet model. It achieves a more efficient design for mobile devices with limited computational resources by reducing the computational cost of the model while maintaining detection performance.

3 Proposed Methods

3.1 FasterNeXt

FasterNet [3] is a novel neural network that achieves fast running speeds and is well suited to hardware implementations. It is built upon the FasterNet Block and uses 1 × 1 convolutions as its basic module. The FasterNeXt module is designed on top of the FasterNet Block to optimize the Backbone in RMFFDet. Figure 1 illustrates the FasterNeXt module, which efficiently uses inverted residual blocks and cross-layer connections to reuse input features. The FasterNet Block consists of a PConv layer and two 1 × 1 Conv layers. The PConv layer employs a straightforward structure to minimize computational redundancy and optimize memory access: it applies regular convolution to a subset of the input channels while leaving the remaining channels unchanged.
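The partial-convolution idea underlying the FasterNet Block is easy to state in code. The sketch below is a simplified illustration of PConv (a regular 3 × 3 convolution applied to the first portion of the channels, identity on the rest), not the FasterNet reference implementation; the class name and the commonly used 1/4 split ratio are assumptions.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Apply a 3x3 convolution to the first 1/div of the channels; keep the rest as-is."""

    def __init__(self, channels, div=4):
        super().__init__()
        self.conv_channels = channels // div
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(
            x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)


# Example: the channel count and spatial size are unchanged.
x = torch.randn(1, 64, 32, 32)
print(PartialConv(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```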

3.2 RepGhost

The RepGhost module [2] is an innovative lightweight network that utilizes structural re-parameterization techniques. It allows efficient feature reuse by providing an alternative to the inefficient concatenation operators used in the Ghost module. Figure 2 visually illustrates the differences between the RepGhost bottleneck and the Ghost bottleneck.


Fig. 1. FasterNeXt module. 1x1cv: 1x1 conv.

In particular, the middle channel of the RepGhost Bottleneck is narrower, while the last channel is coarser compared to its Ghost counterpart. Furthermore, the RepGhost Bottleneck replaces the Ghost Bottleneck’s Concat operation with the more efficient Add operation. During the inference process, the RepGhost Bottleneck performs its calculations with an equivalent simple convolution operation, improving performance.

Fig. 2. Ghost Bottleneck and RepGhost Bottleneck. SBlock: shortcut block, DS: downsample layer, SE: SE block, 1x1cv: 1x1 conv.
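Structural re-parameterization, which RepGhost builds on, ultimately reduces to folding training-time branches into a single convolution for inference. The snippet below shows the standard conv–BN fusion step for one branch as a generic illustration; it is not the RepGhost codebase, and fusing a full RepGhost block would additionally sum the kernels of its parallel branches.

```python
import torch
import torch.nn as nn


def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding convolution for inference."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)        # per output channel
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = torch.zeros(conv.out_channels) if conv.bias is None else conv.bias.data
    fused.bias.data = bn.bias.data + (bias - bn.running_mean) * scale
    return fused


# Sanity check: the fused conv matches conv -> BN in eval mode.
conv, bn = nn.Conv2d(8, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16)
bn.eval()
x = torch.randn(2, 8, 20, 20)
with torch.no_grad():
    print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))  # True
```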

3.3 WIoU

Compared to the CIoU [25] loss function in YOLOv8, this paper applies the Wise-IoU (WIoU) [20] in the bounding box regression loss. The WIoU incorporates a dynamic non-monotonic focusing mechanism to cope with the low quality of the training data labelling. It avoids over-penalizing the model by geometric factors such as distance and aspect ratio. When the prediction frame is highly matched with the target frame, the WIoU improves the model’s generalization ability by mitigating the penalty effect of geometric factors so that the model


suffers less interference during training. The RMFFDet model improves performance under low-quality labelled data by introducing the Wise-IoU v3 loss function, so as to better adapt to real-world logistics scenarios. The loss is defined as:

$$L_{WIoU} = \gamma \cdot \exp\!\left(\frac{(x_p - x_{gt})^2 + (y_p - y_{gt})^2}{W_g^2 + H_g^2}\right)\left(1 - \frac{W_i H_i}{S_u}\right) \tag{1}$$

$$\gamma = \frac{\beta}{\delta\, \alpha^{\beta-\delta}} \tag{2}$$

where β denotes the anomaly degree of the prediction box; a smaller anomaly degree means a higher-quality anchor box. α and δ are hyperparameters. W_g and H_g denote the width and height of the smallest box enclosing the prediction and ground-truth boxes, W_i H_i is the area of their intersection and S_u the area of their union. x_p and y_p are the coordinates of the prediction box, and x_gt and y_gt the coordinates of the ground-truth box.
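The following sketch shows how a WIoU-v3-style regression loss following Eqs. (1)–(2) might look in PyTorch. It is an illustrative simplification rather than the paper's implementation: the anomaly degree β is approximated by the IoU loss normalised by its batch mean, the enclosing-box term is detached, and the α and δ defaults are commonly cited values, not taken from this paper.

```python
import torch


def wiou_v3_loss(pred, target, alpha=1.9, delta=3.0, eps=1e-7):
    """Sketch of a WIoU-v3-style loss. Boxes are (x1, y1, x2, y2), shape (N, 4)."""
    # Intersection and union -> the W_i * H_i / S_u term of Eq. (1).
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    union = area_p + area_t - inter + eps
    iou_loss = 1.0 - inter / union

    # Distance term of Eq. (1): centre distance over the enclosing-box size.
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    enclose = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    dist = ((cp - ct) ** 2).sum(dim=1) / ((enclose ** 2).sum(dim=1) + eps)
    r_wiou = torch.exp(dist.detach())

    # Eq. (2): non-monotonic focusing coefficient; beta approximated by the
    # IoU loss normalised by its (detached) batch mean.
    beta = iou_loss.detach() / (iou_loss.detach().mean() + eps)
    gamma = beta / (delta * alpha ** (beta - delta))

    return (gamma * r_wiou * iou_loss).mean()


pred = torch.tensor([[0., 0., 10., 10.], [5., 5., 15., 15.]])
target = torch.tensor([[1., 1., 10., 10.], [0., 0., 12., 12.]])
print(wiou_v3_loss(pred, target))
```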

3.4 RMFFDet

Figure 3 details the overall architecture of the RMFFDet model. The RMFFDet model comprises five essential components: Input, Backbone, RMFFNeck, Head, and Output. Each part plays a key role in the overall functioning of the model.

Fig. 3. The architecture of the RMFFDet model. a) The Backbone is optimized by the FasterNeXt module. b) The prediction heads use the feature maps from the RepGhost module.

Input. This paper combines Mosaic, MixUp, and traditional methods for online data augmentation to make the model more robust for images in complex logistics scenarios. These enhanced images are averaged to improve the prediction results of the model.


Backbone. The Backbone in RMFFDet is mainly responsible for extracting features from the input image and for gradually enriching the semantic representation through its hierarchical structure. In this paper, the Backbone is optimized with the designed FasterNeXt module, which removes a large amount of computational cost.

RMFFNeck. RMFFNeck is proposed to make better use of the features extracted by the Backbone: the extracted feature maps are reprocessed and used appropriately at different stages. Typical Necks include FPN [10], PANet [11], and BiFPN [18]; they share a similar design built by repeatedly applying upsampling and concatenation. Through structural re-parameterization techniques, RMFFNeck can learn complex convolutional operations during training and compute with equivalent simple convolutional operations during inference, resulting in a lightweight and high-performance model. In addition, RMFFNeck enhances the feature extraction of small targets through a four-scale fusion structure. This design gives RMFFNeck good potential for application in resource-constrained environments.

Head. In the Head, the feature information of each prediction head is taken from the RepGhost module, and the object classification prediction and bounding box regression prediction are decoupled and performed separately.

Output. During inference, Non-Maximum Suppression (NMS) is used to select the best detection results among overlapping bounding boxes. The primary purpose of NMS is to eliminate redundant detections so that only the most accurate and representative bounding boxes are retained in the output image.
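As a pointer to what the NMS step looks like in practice, the snippet below runs torchvision's standard NMS operator on a few dummy boxes; this is generic post-processing rather than RMFFDet-specific code.

```python
import torch
from torchvision.ops import nms

# Dummy detections: boxes are (x1, y1, x2, y2); the first two overlap heavily.
boxes = torch.tensor([[10., 10., 100., 100.],
                      [12., 12., 102., 102.],
                      [200., 200., 260., 260.]])
scores = torch.tensor([0.92, 0.85, 0.70])

keep = nms(boxes, scores, iou_threshold=0.5)   # indices of the boxes to keep
print(boxes[keep])  # the 0.85 box is suppressed by the 0.92 box
```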

4 Experiments

4.1 Datasets

Forklift-3k Dataset. We have collected and produced the Forklift-3k dataset of complex logistics scenarios for safe forklift driving. Figure 4 shows the various types of forklifts. The dataset contains two main target categories, Forklift (4,572 instances) and Person (4,190 instances), in a total of 3,342 images.

KITTI Dataset. KITTI [6] is one of the largest datasets available for autonomous driving. It covers various challenging scenarios, including small distant targets and abundant occlusions, closely resembling the complexities encountered in logistics scenes. This paper uses KITTI to verify the generalization of the RMFFDet model. To fit the engineering scenario, we merge Car, Van, Truck and Tram in the source data into Car, merge Pedestrian and Person_sitting into Person, and remove Misc and DontCare. A total of


Fig. 4. The image samples in the Forklift-3k dataset.

7,481 images and three categories are obtained, including Car (33,261 instances), Person (4,709 instances) and Cyclist (1,627 instances).

4.2 Experimental Results

The experiments are conducted in a computational environment comprising a Windows 11 operating system, an i9-12900K CPU, and an NVIDIA GeForce RTX 4090 GPU with 24 GB of video memory. The main hyperparameters are as follows: training epochs are set to 200 and 300 for the Forklift-3k and KITTI datasets, respectively; the batch size is set to 32; the optimizer is SGD with momentum 0.937. The datasets are divided into training, validation, and test sets in a 6:2:2 ratio. All data augmentation techniques are turned off during the last ten training epochs. Figure 5 illustrates the training progress of YOLOv8s and the proposed RMFFDet on the Forklift-3k dataset. Notably, RMFFDet exhibits a clear improvement over YOLOv8s, which is attributed to the effect of the WIoU and the RMFFNeck. To gain insight into the contributions of FasterNeXt and RepGhost, we perform an analysis using representative samples from the Forklift-3k dataset and visualize the feature maps during detection to elucidate the roles played by FasterNeXt and RepGhost in the model’s feature extraction. As shown in Fig. 6, FasterNeXt primarily extracts low-level feature information, capturing fine-grained details and spatial features from the input, whereas RepGhost specializes in extracting high-level feature information, allowing the model to capture more abstract and semantic representations of the target objects. Figure 7 illustrates the ability of RMFFDet to capture significant feature information of objects within complex logistics scenes; the Grad-CAM visualization further facilitates an understanding of the model’s attention and decision process for various targets.

4.3 Comparison of Detection Performance

Table 1 presents a comprehensive analysis of the performance of various lightweight models on the Forklift-3K and KITTI datasets. As a lightweight


Fig. 5. Training process for YOLOv8 (baseline) and RMFFDet (ours).

Fig. 6. Visualization of the feature map of the FasterNeXt and RepGhost modules.

Fig. 7. The Grad-CAM graph of the proposed RMFFDet model.


Table 1. Performance results of lightweight models on the Forklift-3k and KITTI datasets.

| Dataset     | Model          | Backbone   | mAP@0.5(%) | Parameters(M) | FLOPs(G) |
| Forklift-3k | YOLOv3         | DarkNet    | 83.7       | 9.31          | 23.4     |
| Forklift-3k | YOLOv3-spp     | DarkNet    | 85.1       | 9.57          | 23.6     |
| Forklift-3k | YOLOv3-tiny    | DarkNet    | 81.3       | 8.67          | 13.0     |
| Forklift-3k | YOLOv4         | CSPDarkNet | 84.5       | 9.12          | 20.8     |
| Forklift-3k | YOLOv5s        | CSPDarkNet | 90.7       | 7.02          | 16.0     |
| Forklift-3k | YOLOv5-Lite    | ShuffleNet | 87.0       | 4.38          | 8.8      |
| Forklift-3k | YOLOv7-tiny    | ELAN       | 89.8       | 6.02          | 13.2     |
| Forklift-3k | YOLOv8s        | CSPDarkNet | 91.0       | 11.16         | 28.8     |
| Forklift-3k | RMFFDet (ours) | FasterNeXt | 92.8       | 7.31          | 29.7     |
| KITTI       | YOLOv3         | DarkNet    | 91.8       | 9.31          | 23.4     |
| KITTI       | YOLOv3-spp     | DarkNet    | 92.3       | 9.57          | 23.6     |
| KITTI       | YOLOv3-tiny    | DarkNet    | 84.3       | 8.67          | 13.0     |
| KITTI       | YOLOv4         | CSPDarkNet | 92.6       | 9.12          | 20.8     |
| KITTI       | YOLOv5s        | CSPDarkNet | 93.1       | 7.02          | 16.0     |
| KITTI       | YOLOv5-Lite    | ShuffleNet | 89.9       | 4.38          | 8.8      |
| KITTI       | YOLOv7-tiny    | ELAN       | 90.0       | 6.02          | 13.2     |
| KITTI       | YOLOv8s        | CSPDarkNet | 93.7       | 11.16         | 28.8     |
| KITTI       | RMFFDet (ours) | FasterNeXt | 95.2       | 7.31          | 29.7     |

model, RMFFDet exhibits a notable advantage over the compared models. Compared with the baseline YOLOv8s, RMFFDet improves the mean Average Precision (mAP) by 1.8% and 1.5% on the Forklift-3k and KITTI datasets, respectively, while reducing the model parameters by 34.5%. Figure 8 presents the visualized detection results of different models in logistics scenes, highlighting the challenges encountered, including uneven lighting, heavy occlusion, and small distant targets. The experimental results confirm the robust detection performance and practical applicability of the RMFFDet model in real-world logistics scenarios.

4.4 Ablation Experiment

To demonstrate the effectiveness of each module, ablation experiments are conducted on the KITTI dataset in a consistent experimental setting. As shown in Table 2, the proposed improvement strategies have proven to effectively enhance the detection performance of RMFFDet. Specifically, FasterNeXt plays a crucial role in significantly reducing the computational costs in the Backbone. RMFFNeck provides approximately 2% improvement in mAP through structural reparameterization techniques and efficient multiscale fusion. Moreover, the WIoU enhances detection performance without increasing computational costs.


Fig. 8. Comparison of detection results: (a) YOLOv7-tiny; (b) YOLOv8s; (c) RMFFDet.

Table 2. Ablation experiments on the KITTI dataset.

| Baseline | FasterNeXt | RMFFNeck | WIoU | mAP@0.5(%) | Parameters(M) | FLOPs(G) |
| ✓        |            |          |      | 93.7       | 11.16         | 28.4     |
| ✓        | ✓          |          |      | 93.3       | 9.48          | 24.7     |
| ✓        |            | ✓        |      | 95.6       | 9.24          | 35.4     |
| ✓        |            |          | ✓    | 94.3       | 11.16         | 28.4     |
| ✓        | ✓          | ✓        |      | 94.7       | 7.31          | 29.7     |
| ✓        | ✓          |          | ✓    | 93.9       | 9.48          | 24.7     |
| ✓        | ✓          | ✓        | ✓    | 95.2       | 7.31          | 29.7     |

4.5 Edge Platform Deployments

In this paper, YOLOv8s and RMFFDet are selected for inference acceleration experiments using TensorRT on the Jetson Nano edge platform. Figure 9 shows the experimental results for four different input sizes on 300 images. The application of TensorRT acceleration greatly accelerates the inference speed of the model. The RMFFDet model achieves an inference time of only 100.2ms per image when the image size is 640×640.


Fig. 9. Comparison of inference speed of YOLOv8s and RMFFDet before and after TensorRT acceleration at different input sizes.

5 Conclusion

This paper proposes the RMFFDet model, a novel detector model specifically designed for forklift driving. The FasterNeXt module is designed to improve the computational efficiency in the Backbone, while the proposed RMFFNeck module enhances the feature representation. RMFFDet’s inference speed and detection accuracy are better than YOLOv8 to meet the industrial requirements of logistics scenarios. On the Jetson Nano edge platform, RMFFDet uses TensorRT acceleration to reduce inference time significantly. Furthermore, future research will explore advanced model compression techniques for RMFFDet to facilitate its deployment in practical applications.

References 1. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) 2. Chen, C., Guo, Z., Zeng, H., Xiong, P., Dong, J.: Repghost: a hardware-efficient ghost module via re-parameterization. arXiv preprint arXiv:2211.06088 (2022) 3. Chen, J., Kao, S.h., He, H., Zhuo, W., Wen, S., Lee, C.H., Chan, S.H.G.: Run, don’t walk: Chasing higher flops for faster neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12021– 12031 (2023) 4. Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J.: Repvgg: Making vgg-style convnets great again. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733–13742 (2021) 5. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021) 6. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. IEEE (2012) 7. Guan, L., Jia, L., Xie, Z., Yin, C.: A lightweight framework for obstacle detection in the railway image based on fast region proposal and improved yolo-tiny network. IEEE Trans. Instrum. Meas. 71, 1–16 (2022)


8. Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: Ghostnet: more features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020) 9. Hurtik, P., Molek, V., Hula, J., Vajgl, M., Vlasanek, P., Nejezchleba, T.: Poly-yolo: higher speed, more precise detection and instance segmentation for yolov3. Neural Comput. Appl. 34(10), 8275–8290 (2022) 10. Lin, T.Y., Doll´ ar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017) 11. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 8759–8768 (2018) 12. Ma, N., Zhang, X., Zheng, H.T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European conference on computer vision (ECCV). pp. 116–131 (2018) 13. Mathew, M.P., Mahesh, T.Y.: Leaf-based disease detection in bell pepper plant using yolo v5, pp. 1–7. Signal, Image and Video Processing pp (2022) 14. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 779–788 (2016) 15. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7263–7271 (2017) 16. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018) 17. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015) 18. Tan, M., Pang, R., Le, Q.V.: Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10781–10790 (2020) 19. Terven, J., Cordova-Esparza, D.: A comprehensive review of yolo: From yolov1 to yolov8 and beyond. arXiv preprint arXiv:2304.00501 (2023) 20. Tong, Z., Chen, Y., Xu, Z., Yu, R.: Wise-iou: Bounding box regression loss with dynamic focusing mechanism. arXiv preprint arXiv:2301.10051 (2023) 21. Wang, G., Ding, H., Yang, Z., Li, B., Wang, Y., Bao, L.: Trc-yolo: A real-time detection method for lightweight targets based on mobile devices. IET Comput. Vision 16(2), 126–142 (2022) 22. Wu, W., Guo, L., Gao, H., You, Z., Liu, Y., Chen, Z.: Yolo-slam: A semantic slam system towards dynamic environment with geometric constraint. Neural Computing and Applications pp. 1–16 (2022) 23. Yan, W., Gu, M., Ren, J., Yue, G., Liu, Z., Xu, J., Lin, W.: Collaborative structure and feature learning for multi-view clustering. Information Fusion 98, 101832 (2023) 24. Zhao, C., Shu, X., Yan, X., Zuo, X., Zhu, F.: Rdd-yolo: A modified yolo for detection of steel surface defects. Measurement 214, 112776 (2023) 25. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-iou loss: Faster and better learning for bounding box regression. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 12993–13000 (2020) 26. Zhou, X., Koltun, V., Kr¨ ahenb¨ uhl, P.: Simple multi-dataset detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7571–7580 (2022)

RTMDet-R2: An Improved Real-Time Rotated Object Detector Haifeng Xiang, Naifeng Jing, Jianfei Jiang, Hongbo Guo, Weiguang Sheng, Zhigang Mao, and Qin Wang(B) Shanghai Jiao Tong University, Shanghai, China {xianghf666,hongboguo,qinqinwang}@sjtu.edu.cn

Abstract. Object detection in remote sensing images is challenging due to the absence of visible features and variations in object orientation. Efficient detection of objects in such images can be achieved using rotated object detectors that utilize oriented bounding boxes. However, existing rotated object detectors often struggle to maintain high accuracy while processing high-resolution remote sensing images in real time. In this paper, we present RTMDet-R2, an improved real-time rotated object detector. RTMDet-R2 incorporates an enhanced path PAFPN to effectively fuse multi-level features and employs a task interaction decouple head to alleviate the imbalance between regression and classification tasks. To further enhance performance, we propose the ProbIoU-aware dynamic label assignment strategy, which enables efficient and accurate label assignment during the training. As a result, RTMDet-R2-m and RTMDet-R2-l achieve 79.10% and 79.46% mAP, respectively, on the DOTA 1.0 dataset using single-scale training and testing, outperforming the majority of other rotated object detectors. Moreover, RTMDetR2-s and RTMDet-R2-t achieve 78.43% and 77.27% mAP, respectively, while achieving inference frame rates of 175 and 181 FPS at a resolution of 1024 × 1024 on an RTX 3090 GPU with TensorRT and FP16-precision. Furthermore, RTMDet-R2-t achieves 90.63/97.44% mAP on the HRSC2016 dataset. The code and models are available at https://github.com/Zeba-Xie/RTMDet-R2. Keywords: Remote Sensing Images Object Detection · Rotated Object Detection · Feature Fusion · Label Assignment

1 Introduction

Target detection tasks commonly use horizontal bounding boxes (HBB) to locate and classify objects [7]. However, existing HBB algorithms like the R-CNN series [24, 25] and YOLO series [13, 26, 27] are not directly applicable to object detection in remote sensing images. Challenges arise due to the similarity in shapes and limited visible features of objects in remote sensing images, as well as complex backgrounds and variations, particularly for small and densely-packed objects. The remote sensing perspective further complicates the representation of angle diversity [5].



Fig. 1. Comparison of RTMDet-R2 and other state-of-the-art real-time rotated object detectors with single-scale training and testing on the DOTA 1.0 dataset. (a) Comparison of parameters and accuracy. (b) Comparison of FLOPs and accuracy. (c) Comparison of FPS and accuracy.

To achieve real-time object detection with high detection accuracy in remote sensing images, HBB is transformed into OBB (oriented bounding boxes) by incorporating rotation angles. Several models, including FCOSR [7], PP-YOLOE-R [8], and RTMDet-R [9], have been proposed for this purpose. These models enhance the balance between speed and accuracy by improving the model architecture, training strategies, and parameter scaling. However, these models still have the following several issues. To begin with, both PP-YOLOE-R and RTMDet-R utilize the path aggregation feature pyramid network (PAFPN) [11] for multi-level features fusion, but PAFPN faces challenges in effectively integrating precise localization information from lower levels and semantic information from higher levels in the top-down path. Furthermore, there still exist issues of task misalignment and imbalance. Moreover, the sample label assignment strategies used in these models during training are primarily adapted from HBB algorithms, lacking optimization for OBB. To address these issues, we propose RTMDet-R2: an improved real-time rotated object detector. Our main contributions can be summarized as follows: 1. We introduce the enhanced path PAFPN (EP-PAFPN), which efficiently fuses positional and semantic information from multiple-level features by introducing the enhanced path (EP). 2. We propose the task interaction decuple head (TID-Head), which incorporates a task interaction module (TIM) to enable interaction between the classification and regression tasks, alleviating the imbalance of tasks. 3. We introduce the ProbIoU-aware dynamic label assignment strategy (PA-DLA), which replaces RIoU with ProbIoU and utilizes an IoU truncation strategy to achieve efficient and accurate label assignment. As a result, RTMDet-R2 has achieved state-of-the-art performance in terms of the speed and accuracy trade-off on two datasets, DOTA 1.0 and HRSC2016. Specifically, as shown in Fig. 1., during single-scale training and testing on DOTA 1.0, RTMDet-R2-l and RTMDet-R2-m achieve 79.46% and 79.10% mAP1 , respectively. RTMDet-R2-s and RTMDet-R2-t achieve 78.43% and 77.27% mAP while achieving frame rates of 175 FPS and 181 FPS, respectively, at a resolution of 1024 × 1024. Compared to RTMDet-R, RTMDet-R2-l/m/s/t improves the accuracy by 0.61/0.86/1.50/1.91% mAP. 1 We follow the latest metrics from the DOTA evaluation server, original voc format mAP is now

mAP50.


Furthermore, RTMDet-R2-t achieves 90.63/97.44% mAP on the HRSC2016 dataset. Compared to RTMDet-R, RTMDet-R2-t improves the accuracy by 0.03/0.34% mAP.

2 Related Work 2.1 Rotated Object Detector Rotated Object Detectors (ROD) can be divided into two types: Anchor-based and Anchor-free. Anchor-based RODs can be further categorized as one-stage and two-stage methods. As a two-stage ROD, Oriented R-CNN [1] designs a lightweight and simpler oriented RPN on top of RROI to generate high-quality oriented proposals. ReDet [2] introduces rotation-invariant convolutions in the model and utilizes rotation-invariant RoI alignment to extract features. R3Det [3] is a one-stage ROD that utilizes a feature refinement module for feature reconstruction and alignment. S2ANet [4] generates high-quality OBB anchors using a feature alignment module, encodes the orientation information through the oriented detection module, and produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification scores and localization accuracy. Anchor-free RODs are predominantly based on one-stage methods, often using prior points or key points. FCOSR [7] improves the label assignment strategy based on FCOS to enhance network performance. PP-YOLOE-R [8] introduces rotated task alignment learning (RTAL) and decoupled angle prediction head on top of PP-YOLOE. RTMDet-R [9] employs large-kernel CSP convolutional blocks to capture global context effectively. It also proposes a dynamic soft label assignment strategy to reduce the bias towards high-quality samples in the cost matrix. 2.2 Multi-level Features SSD [28] and MS-CNN [29] utilize multi-level feature maps for prediction but do not aggregate them across different feature levels [12]. To improve the utilization efficiency of high-level low-resolution feature maps with strong semantic information and low-level high-resolution feature maps with weak semantic information, FPN [10] introduces a multi-level features fusion structure with lateral connections and top-down pathways, significantly enhancing model accuracy. PAFPN [11] builds upon the FPN architecture by introducing bottom-up pathways to shorten the information path between lower-level and top-level features. FPG [12], on the other hand, presents Feature Pyramid Grids, which is a deep multi-pathway feature pyramid. It represents the feature scale space as a regular grid of parallel bottom-up pathways and fuses them through multi-directional lateral connections.


2.3 Label Assignment The purpose of label assignment is to differentiate between positive and negative samples [8]. There are two forms of label assignment: based on priors and based on predictions. Based on priors, label assignment distinguishes positive and negative samples using IoU, L2 distance, or predefined rules, such as ATSS [15]. ATSS adaptively calculates the threshold for positive and negative samples based on information like L2 distance and IoU. Based on predictions, label assignment differentiates positive and negative samples based on the model’s predicted outputs, as seen in TAL [14] and SimOTA [13], among others. SimOTA dynamically assigns labels during the training process by computing the cost matrix between the model’s outputs and the ground truth boxes. TAL calculates an alignment metric based on the network’s output scores and the IoU between predicted and ground truth boxes, selecting the top M predictions with the highest alignment metric as positive samples and the rest as negative samples.

Fig. 2. The overall architecture of RTMDet-R2 consists of three main components: the Backbone, EP-PAFPN, and TID-Head. The Backbone is CSPNeXt. EP-PAFPN is composed of the CSPNeXtPAFPN and the enhanced path. TID-Head is composed of the R-SepBNHead from RTMDet-R and the task interaction module. The Reduce Layer is a 1 × 1 convolutional layer, the Output Layer and Downsampling are 3 × 3 convolutional layers, the Top-down Block and Bottomup Block represent the CSP-blocks in RTMDet-R, Concat denotes channel-wise concatenation, and Upsampling refers to upsampling the feature map using the nearest neighbor interpolation with a scale factor of 2. In the TID-Head, the orange and green blocks represent stacked convolutional layers for the classification and regression branches, respectively.

3 Methodology The overall structure of our proposed RTMDet-R2 is illustrated in Fig. 2. In this section, we will provide a detailed description of the introduced modules and strategy.


3.1 Enhanced Path PAFPN

FPN and PAFPN were proposed to make efficient use of multi-scale features. However, PAFPN struggles to effectively integrate precise localization information from lower levels and semantic information from higher levels in the top-down path. To address this issue, we propose the enhanced path to strengthen feature fusion in the top-down path, as shown in Fig. 2. The structure of the enhanced path is straightforward: it takes the feature map from a lower level of the backbone and processes it with a 3 × 3 downsampling convolution and a CSP block. The resulting feature map is then added to the feature map of the current backbone level and serves as the actual input to the top-down pathway of PAFPN. By incorporating the lower-level features at an acceptable computational cost, the enhanced path enables PAFPN to generate more comprehensive fused multi-level feature maps.

3.2 Task Interaction Decouple Head

In object detection, the conflict between classification and regression tasks is a well-known problem [16]. Many detectors use decoupled heads to handle classification and localization [7, 13]. The shared detection head proposed in RTMDet-R also adopts a decoupled form: it shares the head parameters across scales but uses separate batch normalization (BN) layers, reducing the parameter count while maintaining accuracy. Inspired by TSCODE [17], we introduce the proposed TIM on top of the shared decoupled head in RTMDet-R, forming the TID-Head, as shown in Fig. 2. The TIM takes the three feature maps (P3, P4, and P5) of the EP-PAFPN as inputs. The regression branch feature map $G_{loc4}$ is computed by the detail-preserving encoding module [17]:

$$G_{loc4} = P_4 + u(P_5) + DConv(u(P_4) + P_3) \tag{1}$$

where u(·) denotes 2× upsampling and DConv(·) denotes a downsampling convolution. Unlike TSCODE, $G_{cls4}$ is obtained by downsampling $G_{loc4}$ through a 3 × 3 convolutional layer, concatenating it with P5, and then upsampling:

$$G_{cls4} = u(\mathrm{Concat}(DConv(G_{loc4}), P_5)) \tag{2}$$

Finally, Gloc4 and Gcls4 are individually fed into stacked convolutional layers for regression and classification. To fully utilize the multi-level feature maps, P3 and P5 continue to participate in classification and regression tasks as in RTMDet-R. Through TIM, interactions between multi-level features and between classification and regression tasks are achieved, mitigating the imbalance between tasks and improving network performance with relatively low computational cost.
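Read literally, Eqs. (1) and (2) translate into a few lines of feature arithmetic. The sketch below is an illustrative PyTorch rendering under the assumptions that P3, P4, and P5 share the same channel count and dyadic strides and that a 1 × 1 convolution restores the channel count after the concatenation in Eq. (2); the layer names are ours, not those of the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TIM(nn.Module):
    """Task interaction module: build G_loc4 and G_cls4 from P3, P4, P5 (Eqs. (1)-(2))."""

    def __init__(self, channels):
        super().__init__()
        self.dconv_loc = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # DConv in Eq. (1)
        self.dconv_cls = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 3x3 downsampling in Eq. (2)
        self.reduce = nn.Conv2d(2 * channels, channels, 1)                      # assumed 1x1 after Concat

    @staticmethod
    def up(x):
        return F.interpolate(x, scale_factor=2, mode="nearest")

    def forward(self, p3, p4, p5):
        # Eq. (1): G_loc4 = P4 + u(P5) + DConv(u(P4) + P3)
        g_loc4 = p4 + self.up(p5) + self.dconv_loc(self.up(p4) + p3)
        # Eq. (2): G_cls4 = u(Concat(DConv(G_loc4), P5))
        g_cls4 = self.up(self.reduce(torch.cat([self.dconv_cls(g_loc4), p5], dim=1)))
        return g_loc4, g_cls4


p3 = torch.randn(1, 96, 80, 80)
p4 = torch.randn(1, 96, 40, 40)
p5 = torch.randn(1, 96, 20, 20)
loc4, cls4 = TIM(96)(p3, p4, p5)
print(loc4.shape, cls4.shape)  # both at the P4 resolution: (1, 96, 40, 40)
```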


3.3 ProbIoU-aware Dynamic Label Assignment

Label assignment is one of the most crucial components in the training strategy of object detection networks and has a significant impact on the final prediction accuracy. The dynamic label assignment (RTM-DLA) strategy of RTMDet-R is an excellent approach; however, it lacks optimization for OBB tasks. We therefore propose two improvements. Firstly, we introduce ProbIoU [18] to replace RIoU. Secondly, we introduce an IoU truncation strategy. Traditionally, most object detection methods represent the shape and position of objects using bounding boxes. Recent research has explored the use of Gaussian distributions to represent objects in a more flexible and fuzzy manner, an approach adopted by methods such as GWD [19] and KFIoU [20]. Similarly, ProbIoU introduces Gaussian bounding boxes to represent rotated bounding boxes and uses the Bhattacharyya coefficient between two Gaussian distributions to measure their similarity. In PA-DLA, ProbIoU is incorporated into both the cost function and the threshold truncation. The cost function of PA-DLA can be written as:

$$C = \begin{cases} \lambda_1 C_{cls} + \lambda_2 C_{reg} + \lambda_3 C_{center}, & \mathrm{ProbIoU} > T \\ \mathrm{Inf}, & \text{otherwise} \end{cases} \tag{3}$$

$C_{cls}$, $C_{center}$, and $C_{reg}$ denote the classification cost, region prior cost, and regression cost, respectively. The default weights assigned to these costs are $\lambda_1 = 1$, $\lambda_2 = 3$, and $\lambda_3 = 1$. T is the ProbIoU truncation threshold, set to 0.075 by default. The three costs are computed as described below. $C_{cls}$ and $C_{center}$ are consistent with RTM-DLA and are defined as:

$$C_{cls} = \mathrm{CE}(P, Y_{soft}) \times (Y_{soft} - P)^2 \tag{4}$$

$$C_{center} = \alpha^{|x_{pred} - x_{gt}| - \beta} \tag{5}$$

where α and β control the soft center region, with default values α = 10 and β = 3. $Y_{soft}$ is the product of the one-hot label and ProbIoU. The soft classification cost not only reweights the matching costs according to regression quality but also avoids the noisy and unstable matching caused by binary labels [9]. The regression cost $C_{reg}$ is calculated as:

$$C_{reg} = -\log(\mathrm{ProbIoU}) \tag{6}$$

4 Experiments 4.1 Datasets We evaluated our method on the DOTA 1.0 and HRSC 2016 datasets. DOTA1.0 [21] is a large-scale remote sensing dataset for oriented object detection, consisting of 2806 aerial images with diverse scales, orientations, and shapes, encompassing 15 categories including plane, ship, and bridge. For single-scale training and testing, all images are cropped into 1024 × 1024 patches with a gap of 200. HRSC2016 [22] is a challenging ship detection dataset with OBB annotations, which contains 1061 aerial images with size ranges from 300 × 300 to 1500 × 900.

358

H. Xiang et al.

Table 1. The settings for training the RTMDet-R2 series models on the DOTA dataset and HRSC2016 dataset. O, L, W, B, E, S, and A denote Optimizer, Base Learning Rate, Weight Decay, Batch Size, Training Epochs, Input Size, and Augmentation, respectively. Model

Dataset

O

large

DOTA1.0

AdamW 2.5e−4

0.05 × 8 2.375

medium

2.5e−4

0.05 × 8 2.200

small

2.5e−4 × 0.05 × 8 × 1.25 1.25 3.000

tiny

2.5e−4 × 0.05 × 8 × 1.75 1.75 2.325

tiny

L

W

2.0e−3 × 0.05 0.99

HRSC2016

B

8

E

S

A

36

1024 random flip × and rotate 1024

108 800 × 800

4.2 Implement Details The RTMDet-R2 models are implemented based on MMRotate [23]. The hyperparameters of the models are shown in Table 1. Except for the medium and large models, which are trained on two RTX 3090 GPUs, all other models are trained on a single RTX 3090 GPU. The latency of all models is tested using the half-precision floating-point format (FP16) on an RTX 3090 GPU with TensorRT 8.4.3.1 and cuDNN 8.3.2. The inference batch size is set to 1. 4.3 Ablation Studies We conducted a series of experiments on the DOTA 1.0 dataset using the RTMDet-R-t model to evaluate the effectiveness of the proposed method, which were trained and tested in a single-scale manner. Table 2. Design of the detector neck. *indicates that EP will be connected to the highest-level feature map C5 from the backbone. The best results are in bold. Neck

mAP50(%)↑

mAP75(%)↑

mmAP(%)↑

Params(M)↓

FLOPs(G)↓

Latency(ms)↓

PAFPN

75.36

50.64

47.37

4.88

20.45

4.97

EP-PAFPN*

75.66

50.74

47.30

8.57

29.15

5.61

EP-PAFPN

75.91

51.34

47.97

5.77

26.29

5.20

As shown in Table 2, EP-PAFPN effectively improves the network performance through stronger feature fusion capability and acceptable computational cost. Compared to the PAFPN in RTMDet-R, EP-PAFPN shows an improvement of 0.55% in mAP50.

RTMDet-R2: An Improved Real-Time Rotated Object

359

When the EP is connected to the highest-level feature map C5 from the backbone, there is no significant improvement in network accuracy, but there is a noticeable increase in latency. Table 3. Design of the detector head. †indicates the use of semantic context encoding (SCE) in the head. The best results are in bold. Head

mAP50(%)↑

mAP75(%)↑

mmAP(%)↑

Baseline

75.36

50.64

47.37

TID-Head†

75.24

50.62

47.59

TID-Head

75.98

50.90

47.73

As shown in Table 3, TID-Head improves the network performance effectively by enhancing the interaction between the classification and regression tasks. Initially, we introduced the semantic context encoding (SCE) from TSCODE [17] into the Head, but experimental results showed that its impact was not significant. However, our proposed TID-Head improves mAP50 by 0.62%, while reducing the latency to 4.92 ms. Table 4. Comparison of label assign strategies and ProbIoU loss. The best results are in bold. Method

mAP50(%)↑

mAP75(%)↑

mmAP(%)↑

RTM-DLA

75.36

50.64

47.37

ProbIoU loss

76.60

49.33

46.44

PA-DLA

76.75

50.06

47.07

As shown in Table 4, PA-DLA achieves efficient and accurate label assignment by introducing a more suitable object similarity measurement method. Compared to the original DLA, PA-DLA improves mAP50 by 1.39%. We also compared PA-DLA with ProbIoU loss and found that although ProbIoU loss can improve mAP50 to some extent, it results in significant decreases in mAP75 and mmAP, possibly due to the fuzzy Gaussian representation [30]. While PA-DLA also leads to slight decreases in mAP75 and mmAP, the magnitude of the decrease is much smaller than that caused by ProbIoU loss. As shown in Table 5, we further investigated the impact of the threshold T in PA-DLA on network performance and found that T = 0.075 achieves a good overall performance.

360

H. Xiang et al. Table 5. Selection of ProbIoU threshold T in PA-DLA. The best results are in bold.

T

mAP50(%)↑

mAP75(%)↑

mmAP(%)↑

0.001

76.86

49.10

46.72

0.010

76.58

49.85

47.14

0.025

75.48

50.43

46.71

0.075

76.75

50.06

47.07

0.100

76.81

49.80

47.06

Our step-by-step improvements are shown in Table 6. The joint optimization of network architecture and training strategy in RTMDet-R2 results in a significant improvement of 1.91% in mAP50 compared to the baseline. Table 6. Step-by-step improvements from RTMDet-R-t to RTMDet-R2-t on the DOTA1.0 dataset. Model

mAP50(%)↑

mAP75(%)↑

mmAP(%)↑

Params(M)↓

FLOPs(G)↓

Latency(ms)↓

Baseline

75.36

50.64

47.37

4.88

20.45

4.97

+ EP-PAFPN

75.91 (+0.55)

51.34 (+0.70)

47.97 (+0.60)

5.77 (+0.89)

26.29 (+5.84)

5.20 (+0.23)

+ TID-Head

76.15 (+0.24)

51.03 (–0.31)

47.68 (–0.29)

6.19 (+0.42)

27.74 (+1.45)

5.53 (+0.33)

+ PA-DLA

77.27 (+1.12)

51.00 (–0.03)

47.71 (+0.03)

6.19

27.74

5.53

4.4 Comparison with State-of-the-Arts We compared RTMDet-R2 with previous state-of-the-art methods on the DOTA 1.0 dataset, as shown in Table 7. Under single-scale training and testing, RTMDet-R2-m and RTMDet-R2-l achieve 79.10% and 79.46% mAP50, respectively, outperforming almost all previous anchor-free and anchor-based methods. Moreover, RTMDet-R2m and RTMDet-R2-l achieve 55.03% and 56.18% mAP75, surpassing RTMDet-R-m and RTMDet-R-l by 0.56% and 0.97% respectively, indicating that the predicted boxes generated by RTMDet-R2 exhibit higher quality. RTMDet-R2-s and RTMDet-R2-t also achieve a relative improvement of 1.50% and 1.91% in mAP50 compared to RTMDetR-s and RTMDet-R-t, reaching 78.43% and 77.27% respectively. We visualize a portion of the DOTA 1.0 test set results in Fig. 3 (a).

RTMDet-R2: An Improved Real-Time Rotated Object

361

Table 7. Comparison of RTMDet-R2 with previous rotated object detection methods on DOTA 1.0. R50 and X50 denote ResNet-50 and ResNeXt-50 (likewise for R101 and X101). Re50 denotes ReResNet-50, MV2 denotes MobileNet v2 and CRN denotes CSPRepResNet. The bold fonts indicate the best performance. Method

Backbone

mAP50(%)↑

mAP75(%)↑

mmAP(%)↑

ReDet [2]

ReR50

76.25





Mask OBB [31]

R50

74.86





Oriented R-CNN [1]

R101

76.28





S2ANet [4]

R50

74.12





R3Det [3]

R101

73.80





CFA [33]

R101

75.05





ProbIoU [18]

R50

70.04





Oriented RepPoints [32]

Swin-tiny

77.63





FCOSR-s [7]

MV2

74.05





FCOSR-m [7]

X50

77.15





FCOSR-l [7]

X101

77.39





PPYOLOE-R-s [8]

CRN-s

73.82





PPYOLOE-R-m [8]

CRN-m

77.64





PPYOLOE-R-l [8]

CRN-l

78.14





PPYOLOE-R-x [8]

CRN-x

78.28





RTMDet-R-t [9]

CSPNeXt-t

75.36

50.64

RTMDet-R-s [9]

CSPNeXt-s

76.93

50.59

48.16

RTMDet-R-m [9]

CSPNeXt-m

78.24

54.47

50.56

RTMDet-R-l [9]

CSPNeXt-l

78.85

55.21

51.01

RTMDet-R2-t

CSPNeXt-t

77.27

51.00

47.71

RTMDet-R2-s

CSPNeXt-s

78.43

51.66

48.81

RTMDet-R2-m

CSPNeXt-m

79.10

55.03

51.08

RTMDet-R2-l

CSPNeXt-l

79.46

56.18

51.13

Anchor-based

Anchor-free

47.37

As shown in Fig. 1., we compare RTMDet-R2 with previous state-of-the-art real-time RODs on the DOTA 1.0 dataset in terms of parameters, FLOPs, and FPS. RTMDetR2 t/s/m achieves comprehensive improvements in mAP50 compared to RTMDetR s/m/l. Additionally, the parameters and FLOPs of RTMDet-R2 t/s/m are reduced by 30.14/54.72/42.91% and 26.26/49.38/37.05% respectively compared to RTMDet-R s/m/l. The mAP50 of RTMDet-R2-s even surpasses that of PP-YOLOE-R-x, which has

362

H. Xiang et al.

10x the FLOPs. RTMDet-R2 avoids the use of special operations like Deformable Convolution or Rotated RoI Align, making it deployable on various hardware platforms. After FP16 TensorRT deployment on a 3090 GPU, RTMDet-R2 t/s/m/l achieves inference frame rates of 181/175/111/88 FPS at a resolution of 1024 × 1024. Table 8. Comparison with state-of-the-art methods on the HRSC2016 dataset. mAP07 and mAP12 indicate that the results were evaluated under VOC2007 and VOC2012 metrics (%), respectively. We report both results for a fair comparison. The best results are in bold. Method

Input shape

mAP07

mAP12

R3Det [3]

800 × 800

89.26

96.01

GWD [19]

800 × 800

89.85

97.37

CSL [34]

800 × 800

89.62

96.10

S2ANet [4]

512–800

90.17

95.01

ReDet [2]

512–800

90.46

97.63

Oriented RCNN [1]

800–1333

90.50

97.60

RTMDet-R-tiny [9]

800 × 800

90.60

97.10

RTMDet-R2-tiny

800 × 800

90.63

97.44

As shown in Table 8, we compared RTMDet-R2 with previous SOTA methods on the HRSC2016 dataset. Among the models, RTMDet-R2-tiny achieves the highest mAP07, reaching 90.63%. Additionally, RTMDet-R2-tiny achieves 97.44% mAP12, which is 0.34% higher than RTMDet-R-tiny. The visualization of the detection results is shown in Fig. 3 (b).

Fig. 3. The RTMDet-R2-m detection result on DOTA1.0 (a) and HRSC2016 (b). The confidence threshold is set to 0.3 when visualizing these results.

RTMDet-R2: An Improved Real-Time Rotated Object

363

5 Conclusion In this paper, we present RTMDet-R2, an improved real-time rotated object detector. RTMDet-R2 incorporates the enhanced path PAFPN for effective feature map fusion and utilizes the task interaction decouple head to address the imbalance between regression and classification tasks. Additionally, we introduce the ProbIoU-aware dynamic label assignment strategy, which improves training efficiency. The simplicity of the RTMDetR2 structure enables easy and fast deployment. Extensive experiments on the DOTA and HRSC2016 datasets validate the effectiveness of our approach. Future work will focus on exploring stronger and lighter network architectures and optimizing training strategies to further enhance performance and efficiency.

References 1. Xie, X., Cheng, G.: Oriented R-CNN for object detection. In: ICCV, pp. 3520–3529 (2021) 2. Han, J., Ding, J., Xue, N., Xia, G.: ReDet: a rotation-equivariant detector for aerial object detection. In: CVPR, pp. 2786–2795 (2021) 3. Xue, Y., Qing, L., Junchi, Y.: R3det: Refined single-stage detector with feature refinement for rotating object. arXiv preprint arXiv:1908.05612 (2019) 4. Jiaming, H., Jian, D., Jie, L.: Align deep features for oriented object detection. IEEE Trans. Geosci. Aerial 60, 1–11 (2021) 5. Youtian, L., Pengming, F.: IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection. arXiv preprint arXiv: 1912.00969 (2019) 6. Steven, L., Fabrizio, V., Kristian, K.: Dafne: A one-stage anchor-free deep model for oriented object detection. arXiv preprint arXiv:2109.06148 (2021) 7. Zhonghua, L., Biao, H., Zitong, W.: FCOSR: A simple anchor-free rotated detector for aerial object detection. arXiv preprint arXiv:2111.10780 (2021) 8. Wang, X., Wang, G., Dang, Q.: PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector. arXiv preprint arXiv:2211.02386 (2022) 9. Lyu, C., Zhang, W., Huang, H.: RTMDet: An Empirical Study of Designing Real-Time Object Detectors. arXiv preprint arXiv:2212.07784 (2022) 10. Lin, T., Dollár, P., Girshick, R., He, K.: Feature pyramid networks for object detection. In: CVPR, pp. 2117–2125 (2017) 11. Liu, S., Qi, L., Qin, H.: Path aggregation network for instance segmentation. In: CVPR, pp. 8759–8768 (2018) 12. Chen, K., Cao, Y.: Feature pyramid grids. arXiv preprint arXiv:2004.03580 (2020) 13. Zheng G., Songtao L., Jian S.: YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021) 14. Chengjian, F., Yujie, Z., Yu G.: TOOD: Task-aligned one-stage object detection. In: ICCV, pp. 3490–3499 (2021) 15. Shifeng, Z., Cheng, C., Yongqiang, Y.: Bridging the gap between anchor-based and anchorfree detection via adaptive training sample selection. In: CVPR, pp. 9759–9768 (2020) 16. Yue, W., Yinpeng, C., Lu, Y.: Rethinking classification and localization for object detection. In: CVPR, pp. 10186–10195 (2020) 17. Zhuang, J., Qin, Z., Yu, H., Chen, X.: Task-Specific Context Decoupling for Object Detection. arXiv preprint arXiv:2303.01047 (2023) 18. Jeffri, M.L., Luis, F.Z., Lucas, N.K., Claudio, J.: Gaussian bounding boxes and probabilistic intersection-over-union for object detection. arXiv preprintarXiv:2106.06072 (2021)

364

H. Xiang et al.

19. Xue, Y., Junchi, Y., Qi, M.: Rethinking rotated object detection with Gaussian Wasserste in distance loss. In: ICML, pp. 11830–11841 (2021) 20. Xue, Y., Yue, Z., Gefan, Z.: The kfiou loss for rotated object detection. arXiv preprint arXiv: 2201.12558 (2022) 21. GuiSong, X., Xiang, B., Jian, D.: Dota: a large-scale dataset for object detection in aerial images. In: CVPR, pp. 3974–3983 (2018) 22. Zikun, L., Liu, Y., Lubin, W.: A high resolution optical satellite image dataset for ship recognition and some new baselines. In ICPRAM, pp. 324–331 (2017) 23. Zhou, Y., Yang, X., Zhang, G.: Mmrotate: a rotated object detection benchmark using pytorch. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 7331–7334 (2022) 24. Shaoqing R., Kaiming H., Ross G., Jian S.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015) 25. Kaiming, H., Georgia, G., Piotr, D.: Mask R-CNN. In: ICCV, pp. 2980–2988 (2017) 26. Joseph R., Ali F.: Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018) 27. ChienYao, W., Alexey, B.: Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: CVPR, pp. 7464–7475 (2022) 28. Liu, W., Anguelov, D., Erhan, D.: SSD: single shot multibox detector. In: ECCV, pp. 21–37 (2016) 29. Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: ECCV, pp. 354–370 (2016) 30. Yu, Y., Da, F.: Phase-shifting coder: predicting accurate orientation in oriented object detection. In: CVPR, pp. 13354–13363 (2023) 31. Wang, J., Ding, J., Guo, H., Cheng, W., Pan, T.: Mask OBB: a semantic attention-based mask oriented bounding box representation for multi-category object detection in aerial images. Remote Sens. 11(24), 2930 (2019) 32. Li, W., Chen, Y., Hu, K., Zhu, J.: Oriented reppoints for aerial object detection. In: CVPR, pp. 1829–1838 (2022) 33. Guo, Z., Liu, C., Zhang, X., Jiao, J.: Beyond bounding-box: convex-hull feature adaptation for oriented and densely packed object detection. In: CVPR, pp. 8792–8801 (2021) 34. Yang, X., Yan, J.: Arbitrary-oriented object detection with circular smooth label. In: ECCV, pp. 677–694 (2020)

Boosting Object Detection in Foggy Scenes via Dark Channel Map and Union Training Strategy Zhanqiang Huo, Sensen Meng, Yingxu Qiao(B) , and Fen Luo School of Software, Henan Polytechnic University, Jiaozuo 454000, China [email protected]

Abstract. Most existing object detection methods in real-world hazy scenarios fail to handle the heterogeneous haze and treat clear images and hazy images as adversarial while ignoring the latent information beneficial in clear images for detection, resulting in sub-optimal performance. To alleviate the above problems, we propose a new dark channel map-guided detection paradigm (DG-Net) in an end-to-end manner and provide an interpretable idea for object detection in hazy scenes from an entirely new perspective. Specifically, we design a unique dark channel map-guided feature fusion (DGFF) module to handle the adverse impact of the heterogeneous haze, which enables the model to focus on potential regions that may contain detection objects adaptively, assign higher weights to these regions, and thus improve the network’s ability to learn and represent the features of hazy images. To more effectively utilize the latent features of clear images, we propose a new simple but effective union training strategy (UTS) that considers the clear images as a complement to the hazy images, which enables the DGFF module to work better. In addition, we introduce Focal loss and Self-calibrated convolutions to enhance the performance of the DG-Net. Extensive experiments show that DG-Net outperforms the state-of-the-art detection methods quantitatively and qualitatively, especially in real-world hazy datasets. Keywords: Object detection · Foggy scenarios Feature fusion · Training strategy

1

· Dark channel map ·

Introduction

Deep learning models based on data-driven have achieved promising performance in various computer vision tasks. However, these models can often perform well only in no-degraded conditions. The captured images in adverse weather will be impaired at different levels [2], causing detection networks designed for ideal conditions to generalize poorly in real-world adverse scenarios, severely limiting This Work Is Supported by the National Science Foundation of China (No. 62273292). Supplementary Material Is Available at https://github.com/ssmemg/DG-Net-SI. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 365–377, 2024. https://doi.org/10.1007/978-981-99-8555-5_29

366

Z. Huo et al.

the further application and development of detection models. Therefore, how to improve the performance of detectors in real-world adverse scenes has attracted more and more attention from academia to industry. In this paper, we take a typical hazy scene as an example and attempt to improve the detection accuracy of the detectors under foggy conditions. Existing methods of object detection in hazy conditions can be roughly divided into two branches: the former is a one-stage detection network [8,9, 15,17,25,26] in an end-to-end manner. The latter is the two-stage detection network [7,10,12,19], which first performs image dehazing as a prerequisite step on the hazy images and then feeds the dehazed images into a pre-trained detection model based on clear images to perform the detection task. Although the twostage methods achieve exciting visual perception, images that satisfy the human visual experience are not definitely beneficial to the detection networks. The defogging models introduce noise and defects that are difficult to visible by the human visual system, further degrading the detection performance of detectors. Recently, some works [8,9,26] have focused on one-stage detection networks. Huang et al. [9] optimizes the model by introducing the joint loss of repair and detection. Although this strategy is logical to the human mind, the dehazing task and the object detection task have different purposes, and there is a potential conflict between them, often leading to unsatisfactory detection results. Some studies [8,26] use generative adversarial networks to generate domain invariant features by domain adaptation strategy, often causing inefficient feature utilization and sub-optimal detection performance. Therefore, how to design a simple and friendly detection model under hazy conditions with high performance and fast inference speed is a challenging task. To this end, we present a one-stage detection framework in haze scenes DGNet based on dark channel map guidance and union training strategy. After the guidance (see Fig. 1), our method focuses more on potential regions (red boxes) that may contain detection objects, achieving better detection performances. We elaborate a dark channel map-guided feature fusion (DGFF) module to improve the feature extraction capability of the network from hazy images. In addition, we propose a simple but effective union training strategy (UTS), which can leverage those features from clear and hazy images. In a nutshell, the main highlights of this work can be summarized as follows: – We design a novel feature fusion guidance module based on a dark channel map called DGFF for detection in hazy conditions, which enables the model to focus more on potential regions that may contain detection objects. To the best of our knowledge, we are the first to directly use the haze density information for detection tasks in hazy conditions. – We introduce a simple but effective union training strategy that achieves friendly detection in hazy scenarios, improving object detection performance. – Extensive experiments on detection benchmarks demonstrate the outstanding performance of our method against state-of-the-art methods.

Boosting Object Detection in Foggy Scenes

367

Fig. 1. Example guided results of the DG-Net on foggy scenarios. (Zoom in to see the details. (Color figure online))

2 2.1

Related Work Object Detection

Object detection models based on data-driven algorithms can be classified into anchor-based and anchor-free. Anchor-based object detection models [20,21] need to set anchors covering as many detection objects as possible. The addition of anchors has improved the detectability of object detectors while introducing redundant parameters. Therefore, some works [3,5,23] have proposed anchor-free methods to solve this obstacle. YOLOX [5] series intelligently introduces the size information from the downsampling process of the previous backbone into the detection heads, thus enabling the localization of the object and significantly reducing the number of parameters in the detector. 2.2

Object Detection in Adverse Scenarios

Currently, object detection algorithms in adverse scenarios can be divided into one-stage and two-stage methods. In one-stage detection algorithms [8,9,15,17, 25,26], due to the lack of large-scale object detection datasets in hazy weather, early work [2] directly uses synthetic degraded datasets to train the detectors, which makes the models perform weakly on real-world degraded images, see Fig. 2(b). Several efforts [8,26] introduce generative adversarial networks to align features between domains, but such methods inevitably discard some beneficial features that improve detection accuracy, as shown in Fig. 2(c). Yang et al. [26] propose a domain adaptation framework that preserves depth information and background details during the feature alignment. A few recent works [9,17,25] combine image restoration and object detection to optimize the network and achieve remarkable performance simultaneously. Wang et al. [25] design a practical and joint detection framework that bridges dehazing and detection tasks via a unified learning architecture. For two-stage detection algorithms [7,10,12,18,19], these methods achieve a significant visual enhancement, which improves the detection performance of

368

Z. Huo et al.

detectors to a certain extent. Still, images that satisfy human visual perception are not definitely friendly to the detectors. Because the evaluation metrics of dehazing do not positively correlate with the final detection performance, the dehazing operation introduces noise that is invisible to the human eye but weakens the performance of the detectors, as shown in Fig. 2(a). Pei et al. [18] indicates that most dehazing methods may reduce detection performance.

Fig. 2. The difference between image dehazing and foggy object detection tasks. (a) Dehazing task. (b), (c) and (d) Object detection task in foggy weather.

2.3

Attention Mechanism and Dark Channel Prior

Attention mechanisms [1] have been introduced to computer vision tasks inspired by the phenomenon that the human visual system can naturally and efficiently discover critical information in complex environments. He et al. [7] found by counting the channel pixel values of haze-free images that in most non-sky areas, most local patches in haze-free outdoor images have very low-intensity values of at least a color channel in the RGB images. For hazy images, the intensity of these dark pixels in this channel is mainly caused by scattered light in the air. Therefore, the dark channel map can directly provide an accurate haze estimate of the hazy images.

3 3.1

Methodology Overview of DG-Net

Figure 3 shows the overview structure of the DG-Net. DG-Net consists of four essential parts: a feature extraction module (Backbone), a dark channel mapguided feature fusion module (DGFF), a feature fusion network (PAFPN), and the detection heads module (Heads). Note that the DGFF module is welldesigned for the detection task under hazy conditions. PAFPN and Heads are reasonable improvements on the advanced object detector YOLOXs [5]. Specifically, given a hazy image Ih , we first introduce the DGFF module to generate a weighted map Fatt . Then, we multiply the feature maps Fb extracted by the Backbone and the weighted map Fatt element by element to obtain the weighted feature maps Fw . Finally, we feed the weighted feature map into the PAFPN and Heads to produce the final prediction.

Boosting Object Detection in Foggy Scenes

369

Fig. 3. Overview structure of the DG-Net. DGFF refers to Dark channel map-Guided Feature Fusion module. SC Conv refers to self-calibrated convolutions.

3.2

Dark Channel Map-Guided Feature Fusion Module

Under the hazy weather, the haze density information in images is essential for the detection task. Previous works have yet to consider it, and a few works [6,26] have only acted on haze density information as a constraint on the enhancement or reconstruction of the subtasks. To this end, we propose a well-designed dark channel map-guided feature fusion module (DGFF) that guides the fusion of multi-scale feature maps, which embeds the prior information related to haze density into the network. Such an approach not only suggests differences in haze density at different spatial locations but also provides clues for the network to find those patches where potential detection objects may exist, significantly improving the feature modeling capability of the detector. Figure 4 shows the pipeline of our DGFF module.

Fig. 4. The pipeline of our DGFF module.

In detail, the hazy image is subjected to global average pooling and dark channel prior pooling operation to obtain two pooled features. The dark channel prior pooling are denoted as Eq. (1). DCP (Ih ) = min (( min Ih c (y)) y∈Ω(x)

c∈{r,g,b}

(1)

370

Z. Huo et al.

where Ω(x) is a local block centered on x; The pooled features are concatenated to get a 2×H×W feature. The concatenate feature enters a 7*7 convolution layer. Finally, a Sigmoid activation function layer to generate a weighted map Fatt . The weighted map Fatt is downsampled to the same resolution as the backbone extracted feature maps Fb to generate the final weighted feature map Fw . These operations are denoted as Eq. (2) and Eq. (3). Fatt = δ(fConv (fCon (GAP (Ih ), DCP (Ih ))))

(2)

Fw = Fatt↓ ⊗ Fb

(3)

where Fatt denotes the generated weighted map; DCP denotes the dark channel prior process; GAP denotes global average pooling; fCon denotes the Concat function; fConv denotes 7*7 convolution layer; δ denotes the Sigmoid activation function layer; Fb denotes the backbone extracted original feature maps; Fw denotes the weighted feature maps; ⊗ denotes the element-wise multiplication; ↓ denotes the downsampled operation. 3.3

Focal Loss and Self-calibrated Convolutions

Focal Loss. Due to the specificity of the hazy scenes, the class imbalance is even more prominent for detection tasks in hazy conditions, as confirmed by the statistical information of the datasets in Table 1. Inspired by RetinaNet [13], the focal loss is introduced in our network to attenuate the impact of class imbalance. The DG-Net loss function can be defined as: Ltotal = LIoU + LCls + LF L

(4)

Here, LIoU , LCls , and LF L denote the regression loss, classification loss, and focal loss, respectively. Table 1. Detailed statistical information on training and testing datasets. Dataset

Images Bicycle Bus

Car

Motorcycle Person Total

VOC-FOG-train [25] 9,578

836

684

2,453

801

13,519 18,293

VOC-FOG-test [25]

2,129

155

156

857

131

3,527

FDD [22]

101

17

17

425

9

269

737

RTTS [11]

4,332

534

1,838 18,413 862

7,950

29,597

4,826

Self-calibrated Convolutions. Most hazy images have a non-uniform distribution of haze, with significant differences in haze density between the foreground and background, and different areas of the same object are affected by the haze at different levels. We mitigate the negative impact of local haze overload on the objects by expanding the receptive fields of the convolution layers. For this purpose, we introduce an improved CNNs structure in the proposed

Boosting Object Detection in Foggy Scenes

371

DG-Net network called the self-calibrated convolutions [16]. It can sense longdistance dependencies between spaces and channels around each spatial patch, thus vastly enlarging the receptive fields of the convolutional layer and improving the network’s representation capabilities. 3.4

Union Training Strategy

There are differences between object detection in hazy conditions and dehazing tasks. The goal of image dehazing is to push the dehazing image as far away from the original hazy image as possible to pull as close as possible to the hazyfree image. This difference between the hazy and clear images in detecting under hazy conditions is not necessarily a confrontation but can even as a complement. The object features in the real-world hazy images are not guaranteed to be closer to the hazy or hazy-free features. Existing approaches use the features from a single dataset or the intersection of clear and hazy datasets to train the detection networks. None of the above strategies can fully utilize clear and hazy features. Inspired by [24], we propose a union training strategy (UTS) for the detection task in hazy conditions to address these issues, as shown in Fig. 2(d). In detail, we use a mixed dataset Xh of hazy synthetic images with 10% clear images as the training dataset. It substantially increases the diversity and richness of features in the training dataset on the one hand and makes our model more robust on the other. More details about the union training strategy(UTS) can be found in Algorithm 1 and supplementary material.

Algorithm 1: Union Training Strategy input : Hybrid datasets Xh with 10% clear images, total epoch N =200. output: Best detection model in hazy conditions P best . 1 2 3 4 5 6 7 8 9 10

Initialize model P 0 , in addition, the backbone of the detection framework with pretrained weights on the COCO[14] dataset; for i ← 0 to N do if i < 100 then Freeze pretrained backbone parameters; Optimize non-freezing parameters of the current model P now by minimizing detection loss; Compare and update with the previous best model P best ; else Unfreeze pretrained backbone parameters; Optimize all parameters of the current model P now by minimizing detection loss; Compare and update with the previous best model P best .

372

4 4.1

Z. Huo et al.

Experiments Datasets

We choose RTTS [11] and Foggy Driving Dataset(FDD) [22] as real-world test datasets. In addition, to facilitate the training and validation of our model, we also introduce a synthetic hazy dataset VOC-FOG [25] based on the VOC [4] dataset. Table 1 shows the statistical details of these datasets. 4.2

Implementation Details

We used the Adam optimizer with an initial rate of 10−3 . The learning rate was adjusted using a cosine annealing and the minimum rate of 10−5 . The batch is set to 16. We empirically set the total number of epochs as 200, with the first 100 epochs frozen for training and the last 100 epochs unfrozen for fine-tuning. See the supplementary material for more experimental details and parameter settings. 4.3

Comparison with the SOTAs

For a fair comparison, we retrained these compared algorithms on the VOC-FOG [25] synthetic dataset for the one-stage methods according to the setting in the original paper. For the two-stage methods, we first defogged using the different dehazing methods. Then we fed the dehazed images into a baseline YOLOXs* [5] detection model, trained on haze-free images of the VOC-FOG [25] dataset. The fifth column of Table 2 shows the mAP metrics of our DG-Net and the current advanced object detection algorithms in hazy on the VOC-FOG [25] test dataset. The comparison reveals that our DG-Net achieves better performance on the synthetic dataset, reaching the upper-middle level of current advanced object detection methods in hazy conditions. We analyze that this may be related to our experimental methods, where we slightly sacrifice the performance of the model on the synthetic dataset to improve the generalization ability in real-world hazy scenarios. Columns 6 and 7 of Table 2 show the results for all compared methods on two real-world datasets. Our method achieves the highest mAP on FDD [22] and RTTS [11] datasets compared to the current SOTA methods, and our DG-Net obtains the best detection performance in real-world scenes, especially on the RTTS [11] dataset, where our method achieves a 4.25% accuracy improvement, significantly ahead of the SOTA methods. For qualitative comparison, our method is visually compared to the existing three best-performing methods YOLOXs [5], DCP-YOLOXs* [7], TogetherNet [25]. Figure 5 shows the detection results of the different methods on the realworld hazy images. As observed, our method DG-Net can detect more object instances compared to the SOTA methods. More detection examples can be seen in the supplementary material.

Boosting Object Detection in Foggy Scenes

4.4

373

Ablation Study

To evaluate the validity of our method, we conducted adequate ablation studies to analyze the impact of different module combinations. Table 3 shows the results of our ablation experiments, and as observed, each module of our approach contributes to improving detection performance. Note that we are surprised the DGFF can work better with a union training strategy. Table 2. Quantitative comparison with state-of-the-art methods on test datasets. The best result in each column is in red, and the second is in blue. Paired denotes whether paired images are required for training network. * indicates that the detection model is trained with clean images of the VOC-FOG dataset. Method

Publication Paired VOG-FOG-test FDD RTTS

YOLOXs [5]

arXiv’21

NO

80.07

YOLOXs* [5]

arXiv’21

NO

72.88

30.07 50.44

DCP-YOLOXs* [7]

TPAMI’11

NO

80.86

29.43 50.81

AOD-YOLOXs* [10] ICCV’17

YES

77.21

31.18 47.51

Semi-YOLOXs* [12] TIP’20

YES

78.56

31.26 50.01

FFA-YOLOXs* [19]

AAAI’20

YES

73.62

26.65 50.48

MS-DAYOLO [8]

ICIP’21

YES

83.42

33.7

DS-Net [9]

TPAMI’21

YES

65.89

29.74 32.71

IA-YOLO [17]

AAAI’22

NO

64.77

18.34 35.66

TogetherNet [25]

PG’22

YES

85.90

34.93 61.55

DG-Net(Ours)

/

NO

79.84

35.52 65.80

31.54 51.23

52.41

Fig. 5. Detection results by different methods on real-world foggy datasets. Zoom in for best view. (Color figure online)

374

Z. Huo et al.

Figure 6 shows that the noticeable improvements in the detection results and heatmaps after adding the DGFF module. We can find that the DG-Net can detect more objects with higher confidence scores after adding the DGFF module. More example results can be shown in Fig. 1 and the supplementary material. Table 3. Ablation experiment results with different module combinations on RTTS dataset. Baseline UTS Focal loss SC-Conv DGFF mAP  

51.23 



61.20 



























62.65 63.64 64.01



65.80

Fig. 6. Ablation studies on our DGFF module.

4.5

Efficiency Analysis

Table 4 shows the results of efficiency analysis for different models. We use a single NVIDIA A100 GPU to test the images with 640 × 640 × 3 resolution. As a result, while the real-time performance of our DG-Net is excellent, the detection performance of our method has also been improved by a wide margin.

Boosting Object Detection in Foggy Scenes

375

Table 4. Efficiency comparison with different models on images of 640 × 640 pixels.

5

Model

Params(M) FPS/Time

YOLOXs [5]

8.94

GFLOPS(G) mAP

68.13/14ms 26.77

51.23

TogetherNet [25] 15.78

32.36/31ms 61.93

61.55

DG-Net(Ours)

58.68/17ms 29.64

65.80

10.02

Conclusion

In this paper, we present a novel dark channel map-guided detection network DG-Net for foggy scenarios. DG-Net exploits the latent information in the dark channel map to enhance the network’s feature extraction and perception capabilities through a well-designed dark channel map-guided feature fusion (DGFF) module. Besides, we solve the detection problem of hazy images from a new perspective by a union training strategy, which effectively improves the richness of the object features and vastly improves the performance of detectors. Furthermore, we enhance DG-Net performance with Focal loss and Self-calibrated convolutions in the network. Note that the DG-Net does not need a corresponding haze-free image as ground truth so that it can be friendly applied to realworld hazy scenarios. In summary, the DG-Net can perform well by only adding a few parameters and has a faster detection speed. Experimental results prove that DG-Net achieves superior detection performance on both real and synthetic datasets. In the future, we intend to design a more effective prior information-guided feature fusion module. Besides, it is also a valuable research direction to ascertain the most suitable ratio of hybrid datasets for object detection tasks in hazy conditions.

References 1. Brauwers, G., Frasincar, F.: A general survey on attention mechanisms in deep learning. IEEE Trans. Knowl. Data Eng. 35(4), 3279–3298 (2023) 2. Cui, Z., Zhu, Y., Gu, L., Qi, G.J., Li, X., Zhang, R., Zhang, Z., Harada, T.: Exploring resolution and degradation clues as self-supervised signal for low quality object detection. In: Computer Vision - ECCV 2022, pp. 473–491. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20077-9 28 3. Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: Centernet: keypoint triplets for object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6568–6577 (2019) 4. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010) 5. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021)

376

Z. Huo et al.

6. Guo, C., Yan, Q., Anwar, S., Cong, R., Ren, W., Li, C.: Image dehazing transformer with transmission-aware 3d position embedding. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5802–5810 (2022) 7. He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011) 8. Hnewa, M., Radha, H.: Multiscale domain adaptive yolo for cross-domain object detection. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 3323–3327 (2021) 9. Huang, S.C., Le, T.H., Jaw, D.W.: Dsnet: joint semantic learning for object detection in inclement weather conditions. IEEE Trans. Pattern Anal. Mach. Intell. 43(8), 2623–2633 (2021) 10. Li, B., Peng, X., Wang, Z., Xu, J., Feng, D.: Aod-net: all-in-one dehazing network. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 4780– 4788 (2017) 11. Li, B., et al.: Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 28(1), 492–505 (2019) 12. Li, L., Dong, Y., Ren, W., Pan, J., Gao, C., Sang, N., Yang, M.H.: Semi-supervised image dehazing. IEEE Trans. Image Process. 29, 2766–2779 (2020) 13. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017) 14. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll´ ar, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision ECCV 2014, pp. 740–755. Springer, Cham (2014) 15. Liu, H., Jin, F., Zeng, H., Pu, H., Fan, B.: Image enhancement guided object detection in visually degraded scenes. IEEE Trans. Neural Networks Learn. Syst., 1–14 (2023) 16. Liu, J.J., Hou, Q., Cheng, M.M., Wang, C., Feng, J.: Improving convolutional networks with self-calibrated convolutions. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10093–10102 (2020) 17. Liu, W., Ren, G., Yu, R., Guo, S., Zhu, J., Zhang, L.: Image-adaptive yolo for object detection in adverse weather conditions. Proceedings of the AAAI Conference on Artificial Intelligence 36, pp. 1792–1800 (2022) 18. Pei, Y., Huang, Y., Zou, Q., Lu, Y., Wang, S.: Does haze removal help cnn-based image classification? In: ECCV 2018, pp. 697–712. Springer, Cham (2018) 19. Qin, X., Wang, Z., Bai, Y., Xie, X., Jia, H.: Ffa-net: feature fusion attention network for single image dehazing. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11908–11915 (2020) 20. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement (2018) 21. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017) 22. Sakaridis, C., Dai, D., Van Gool, L.: Semantic foggy scene understanding with synthetic data. Int. J. Comput. Vision 126(9), 973–992 (2018) 23. Tian, Z., Shen, C., Chen, H., He, T.: Fcos: fully convolutional one-stage object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9626–9635 (2019) 24. Wang, W., Li, B., Gou, Y., Hu, P., Peng, X.: Relationship quantification of image degradations. ArXiv abs/2212.04148 (2022)

Boosting Object Detection in Foggy Scenes

377

25. Wang, Y., et al.: Togethernet: bridging image restoration and object detection together via dynamic enhancement learning. Comput. Graph. Forum 41(7), 465– 476 26. Yang, X., Mi, M.B., Yuan, Y., Wang, X., Tan, R.T.: Object detection in foggy scenes by embedding depth and reconstruction into domain adaptation. In: Computer Vision - ACCV 2022, pp. 303–318. Springer, Cham (2023)

Object Centric Body Part Attention Network for Human-Object Interaction Detection Zhuang Liu and Xiaowei Zhang(B) School of Computer Science and Technology, Qingdao University, Qingdao, China [email protected]

Abstract. The current transformer-based human object interaction (HOI) detection methods have achieved great progress, however, these methods adopt the same structre of decoder to detect human and object, which limits the accuracy of object feature extraction, thereby limiting the accuracy of HOI detection. And due to the distribution differences of multi-granularity features between human and object, the key of HOI detection is object centric interaction with the correlative human body parts. To address this issue, we propose an Object Centric Body Part Attention Network for Human-Object Interaction. First, we introduce a dual-branch decoder for human and object detection named Object Centric Decoder (OCD), where one focuses on quering objects and another pay attention to catch human who interacts with them. Secondly, in order to exploit more fine-grained human body information centered around object, we propose a Body Part Attention (BPA) module to obtain the interactive human body part features for HOI detection. We evaluated our proposed OBPA on the HICO-DET and V-COCO datasets, which significantly outperforms existing counterpart (1.7 mAP on V-COCO, and 0.9 mAP on HICO-DET compared to GEN-VLKT). Code will be available on https://github.com/zhuang1iu/OBPA-NET. Keywords: Human-Object Interaction Decoder · BodyPart Attention

1

· Decoupling Human-Object

Introduction

Human-Object Interaction (HOI) detection is a task of recognizing “a set of interactions” in an image. As a downstream task of object detection [1], HOI detection has received increasing attention in recent years due to its significant application potential. The HOI detection task involves locating the interacting subjects (i.e., humans) and interaction targets (i.e., objects), classifying the interaction labels, and outputting the triplets of humans, objects, and interactions. HOI detection requires a deeper understanding of the semantic information in images to accurately distinguish human activities. Intuitively, human object interaction (HOI) detection first needs to determine the position of human and objects, as well as the class of objects. Then, c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 378–391, 2024. https://doi.org/10.1007/978-981-99-8555-5_30

Object Centric Body Part Attention Network

379

Fig. 1. Architecture comparison of different transformers-based HOI method. Singledecoder method adopts a single decoder to directly detect HOI triplets. Dual-decoder method utilizes separate decoders detect individual objects and interactions. Ours further decouples the instance detection branch and utilizes OCD to learn implicit connections between human and object pairs, then Body Part Attention (BPA) is used to select fine-grained features of human key body parts for interaction detection.

the interaction class is determined by the class information of objects, the overall features of human, and key body parts features. There are two traditional CNN based methods: the two-stage method first using a pre trained object detector to detect human and objects, and then inputting the generated person object pairs into the interactive classifier. One-stage methods propose predicting both the human-object offset vectors and action classes simultaneously by utilizing interaction points between humans and objects. With the recent success of Transformers in object detection, Transformer-based HOI detection methods have been actively developed and have become the main architectural foundation for this task. However, Existing transformer-based methods rely on a single decoder to handle human and object detection tasks, which limits their ability to adapt to different subtasks in multi task learning, resulting in poor object detection performance. Based on this, we propose an object centric dual branch decoder (OCD), where one branch focuses on querying objects while others focus on human in contact with the objects. As shown in Fig. 1, our method divides human and object detection into two related branches, improving detection performance while also implicitly learning the interactive relationships between humans and objects. In addition, in the process of human object interaction (HOI) detection, the features of objects not only affect the determination of interaction categories,

380

Z. Liu and X. Zhang

but also play an important role in the selection of human key body parts. So we propose an Body Part Attention (BPA) module that utilizes a cross attention network to obtain more interactive human body part information with objects for Part-based Interaction Decoder. In addition, we also use a progressive strategy from global to part, first focusing on the information in the background context through the Global- based Interaction Decoder, and then using the Part-based Interaction Decoder to focus on the relationship between key parts of the human body and objects. We evaluated our approach on two widely used benchmarks, V-COCO [21] and HICO-DET [12], and achieved state-of-the-art performance on both of them. Ablation experiments were conducted to validate the effectiveness of the OCD and BPA. Our contributions can be summarized as follows: – We propose the Object Centric Decoder (OCD), which utilizes object features to detect subjects involved in interactions. This approach enables high detection performance while implicitly learning the interaction between objects and humans. – We introduce the Body Part Attention (BPA) module, which flexibly selects key body parts of humans. By leveraging the relationship between object features and human features, this module determines the interactive body parts of humans and models the relationship between fine-grained human body part features and object features. – Our approach has achieved a 1.7mAP gain on V-COCO and a 0.9mAP promotion on HICO-Det compared with the previous state-of-the-art method GEN-VLKT [15].

2 2.1

Related Work Human-Object Interaction.

Two-Stage Methods. Early methods for HOI detection stylishly employed two-stage approaches [9]. In the first stage, object detection methods were used to locate humans and objects. In the second stage, features of humans and objects were extracted and fed into classifiers to predict their interactions. Some early methods emphasized the second stage by introducing models that captured contextual information [26] or structural messages to model the relationship between humans and objects [14]. However, the main challenges of two-stage methods lie in effectively integrating human-object pairs and complex semantic information. Additionally, the efficiency of two-stage methods is constrained by the sequential architecture. One-Stage Methods. A new trend in HOI detection is the adoption of onestage methods, which leverage strong feature representations to perform humanobject pair detection and interaction prediction in parallel. Liao et al. [17] proposed a proposal-free method, PPDM, which utilizes keypoints as the key. These

Object Centric Body Part Attention Network

381

keypoints represent the center points of minimum enclosing bounding boxes that encapsulate the human-object pairs involved in the predicted interaction. Currently, several Conv Transformer based HOI methods deploy the detection transformer [18–20,25] architecture and directly predict HOIs as quintuplets (human, interaction class, object class, human box, object box). These methods share similarities to a large extent but differ in backbone networks or detection heads, such as HOTR [3], HOI Trans [7], and QPIC [10]. Zhang et al. proposed CDN [6], which utilizes an additional transformer decoder to predict interactions based on instance feature tokens. 2.2

Part Based Interaction Detection

Different parts of humans provide more detailed information for HOI detection. Fang et al. [27] utilized pairwise body part attention models to learn the attention on key parts and their correlations. Wan et al. [14] proposed a multi-level relation detection strategy that utilizes human pose cues to capture the global spatial configuration of relationships and serves as an attention mechanism to learn the attention on key part features. Tin++ [28] combines human body pose and body part features to extract deeper visual clues of interactions for learning interactions. These methods indicate that fine-grained features of human body parts are important for HOI detection.

3 3.1

Method Overview

As shown in Fig. 2, our pipeline consists of three main modules: an image feature extraction, an object centric decoder, and an interaction decoder. Following the previous work [6,10,30], for a given input image x ∈ R3×H0 ×W0 , we initially used ResNet-50 to extract image features, and then input them into the transformer encoder along with position encoding. The feature map output by the transformer encoder is fenc ∈ RC×H×W , where C is the number of channels and H,W are the size of the feature map. Subsequently, the feature map fenc is fed into the OCD, which comprises two branches: the first branch produces outobj , responsible for predicting the bounding box and class of the object, while the second branch produces outhum , responsible for predicting the bounding box of the humans. Then we use feed forward networks (FFNs) to obtain human bbox, obj bbox, and obj class. Next, we utilize the feature map fenc alongside the outputs of the OCD as inputs for the Interaction Decoder. By traversing through the global-based interaction decoder (GBID) and the part-based interaction decoder(PBID), we obtain outint . This output is further processed by a feed-forward network to predict the specific interaction class.

382

Z. Liu and X. Zhang

Fig. 2. The overall framework of our proposed method. BPA refer to the Body Part Attention module

3.2

Object Centric Decoder

Previous methods have employed a single Transformer Decoder to simultaneously detect humans and objects, implicitly capturing the interaction between them. However, the object detection performance of this approach is inferior to that of a standalone object detection branch. The category and spatial information of objects play a vital role in determining the interaction categories. Based on this idea, we designed three different branch architectures for human and object detection, as shown in Fig. 3.

Fig. 3. DHOD, HCD and OCD denotes the decoupling human and object decoder module, human centric decoder module and object centric decoder module. DHOD is to directly decouple the human decoder and object decoder modules. HCD is the human centric decoder, where the output of the human decoder is used as the query embedding input for the object decoder. Finally, OCD utilize the object decoder output as query embedding for human decoder, implicit learning of the interactive relationship between the two.

After different experiments, we ultimately adopted the object centric decoder which is a dual-branch architecture. Each branch, denoted as B, consists of L  N layers and takes learnable embeddings Q = qiB i=1 and image features fenc as

Object Centric Body Part Attention Network

383

input, where B ∈ {O, H} represents objects and humans, the learnable embeddings is inherited from DETR and interacts with image features through cross attention. In each layer of the object detection branch, QO is refined through a transformer decoder layer. The output of object detection branch serves as the input for the query of the subsequent layer QO i+1 and also acts as the query for the corresponding layer in the human detection branch. The embedding QH i result of two branches (outobj , outhum ) can be represented as follows: O O outobj (l) = Dec(l) (Q(l−1) , fenc )

(1)

O H outhum = DecH (l) (Q(l) + Q(l−1) , fenc ) (l)

(2)

where Dec(q, kv) denotes a transformer decoder layer and l denotes the num of decoder layers. Finally, we used FFN networks to predict object classification, object bounding boxes, and human bounding boxes. 3.3

Interaction Decoder

Interaction Detector from Global to Part. The determination of many interaction categories requires not only the consideration of the global features of entire image but also the fine-grained features of key body parts. Therefore, we design an interaction decoder, Dec(q, kv), that operates from global to local. We set different learnable query embeddings QIg , QIp for the two decoder processes to focus on both the global features of the image and the important features of the key body parts. First, we use QIg , along with the output fenc from the encoder, as the input for the global-based interaction decoder. After passing through L decoder layers for updating, the final output outIg serves as the query embedding input for the partbased interaction decoder. Subsequently, in the part-based interaction decoder, we utilize the attention mask matrix functionality of the transformer. Only image features containing key body parts are retained for attention computation, while features from other regions are masked out. The attention mask is pre-computed based on the results of body part bounding box detection. Body Part Attention. In a human object interaction relationship, the interaction between various human body parts and objects is different, and it is more important to focus on the interaction between more fine-grained key parts of the human body and objects. Based on this, we utilized the Cross Attention structure to predict key parts with stronger interaction with objects, and designed a Body Part Attention (BPA) module as shown in Fig. 4. We employ a cross-attention layer network to compute the regions of the body that exhibit higher interaction with objects and utilize a feed-forward network to output the importance scores, denoted as Pscore , for the six body parts. Any n (n < 6) greater than k is retained, resulting in the body part with a Pscore generation of the key body part mask Mpart for all individuals in the image.

384

Z. Liu and X. Zhang

Fig. 4. The process of BPA. Where hum bbox, obj bbox,outobj and outhum are the outputs of the OCD, and mparts is the masks of various parts of the human body. fenc is the output of the encoder. Pscore is the interaction score of various parts of the human body, and k is the threshold of key body part.

$$P_{score} = MLP(CrossAttn(out_{obj}, out_{hum})) \quad (3)$$

where $CrossAttn(q, kv)$ is a cross-attention operation and MLP is a multi-layer perceptron composed of several linear layers.

Part-Based Interaction Decoder. After obtaining the key body parts with BPA, we propose a Part-Based Interaction Decoder that focuses only on key body parts and objects through the Transformer's mask mechanism. To obtain the key body part masks for each person, we first utilize pose estimation methods [22,23] to obtain the human body poses. Then, following the approach in HAKE [31], we divide the human body into six body parts: feet, legs, hip, hands, arms, and head. Let the n-th body part box of each person in the picture be $P^n_{box} = [w_1^n, h_1^n, w_2^n, h_2^n]$; the overall body part mask matrix $m^{n(xy)}_{part}$ is then calculated as follows:

$$m^{n(xy)}_{part} = \begin{cases} 1 & h_1^n \le x \cdot (H_0/H) \le h_2^n,\; w_1^n \le y \cdot (W_0/W) \le w_2^n \\ 0 & \text{otherwise} \end{cases} \quad (4)$$

where x and y are the indices of the matrix. The scaling factors $H_0/H$ and $W_0/W$ are used here because the feature map $f_{enc}$ is scaled down from the original image. As shown in Fig. 4, with the output of the object centric decoder we generate the human mask $M_{hum}$ and object mask $M_{obj}$ using the Mask Generator; both have the same size as the feature map $f_{enc}$. Subsequently, we match $M_{hum}$ with $M_{part}$, preserving the overlapping regions between them. Additionally, in cases of pose estimation failure, we employ the Mask Matcher to filter out instances where the shared region's area is zero; in such instances, we directly use $M_{hum}$ as the subsequent attention mask. The key body part mask matrix $M_{part}$ is calculated as:

$$M_{part} = \begin{cases} m^{n(xy)}_{part} \vee M_{hum} & \min(m^{n(xy)}_{part} \vee M_{hum}) = 0,\; P^n_{score} > k \\ M_{hum} & \min(m^{n(xy)}_{part} \vee M_{hum}) = 1,\; P^n_{score} > k \end{cases} \quad (5)$$

where $\min(\cdot)$ returns the minimum value of a matrix. Finally, we perform a logical AND between $M_{part}$ and $M_{obj}$ to obtain the mask used by the Part-Based Interaction Decoder.
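The BPA scoring of Eq. (3) and the mask construction of Eqs. (4)-(5) can be illustrated with the following simplified sketch; the module class, the sigmoid on the scores, the overlap/threshold handling and all shapes are assumptions made for illustration only.

```python
# A simplified PyTorch sketch of body part attention scoring and the key-part mask.
# Names, shapes and the threshold handling are assumptions, not the authors' code.
import torch
import torch.nn as nn

class BodyPartAttention(nn.Module):
    def __init__(self, d_model=256, num_parts=6):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, num_parts))

    def forward(self, out_obj, out_hum):
        # Eq. (3): query with object embeddings, key/value are human embeddings
        attn_out, _ = self.cross_attn(out_obj, out_hum, out_hum)
        return self.mlp(attn_out).sigmoid()          # P_score: (B, N_q, 6)

def part_mask(box, feat_h, feat_w, img_h, img_w):
    """Eq. (4): rasterise one part box [w1, h1, w2, h2] onto the feature-map grid."""
    w1, h1, w2, h2 = box
    ys = torch.arange(feat_h).float() * (img_h / feat_h)
    xs = torch.arange(feat_w).float() * (img_w / feat_w)
    return ((ys[:, None] >= h1) & (ys[:, None] <= h2) &
            (xs[None, :] >= w1) & (xs[None, :] <= w2)).float()

def key_part_mask(part_masks, scores, human_mask, k=0.5):
    """Eq. (5), simplified: keep parts with score > k that overlap the human mask,
    and fall back to the full human mask when no overlap survives."""
    keep = torch.zeros_like(human_mask)
    for m, s in zip(part_masks, scores):
        if s > k and (m * human_mask).sum() > 0:     # non-empty overlap with the person
            keep = torch.maximum(keep, m * human_mask)
    return keep if keep.sum() > 0 else human_mask
```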

3.4 Training and Inference

To train the proposed method, we follow the transformer-based approaches [6,10] introduced above. First, we employ bipartite matching to assign predicted HOI instances to ground-truth instances, and then compute the loss over the matched pairs. Our multi-task loss consists of four components:

$$L = \lambda_1 L_{box} + \lambda_2 L_{iou} + \lambda_3 L_{obj} + \lambda_4 L_{int} \quad (6)$$

where $L_{box}$ and $L_{iou}$ are the $\ell_1$ and GIoU losses applied to both human and object bounding boxes, $L_{obj}$ is a cross-entropy loss for object class prediction, and $L_{int}$ is a focal loss for interaction class prediction. $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are hyperparameters that weigh each loss. Additionally, we incorporate intermediate supervision to enhance representation learning: the same feed-forward network (FFN) is attached to each decoder layer to compute intermediate losses, which are computed in the same way as L.
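A minimal sketch of the weighted combination in Eq. (6); the individual loss terms are assumed to be computed elsewhere after Hungarian matching, and the default weights follow the values reported in Sect. 4.2.

```python
# A hedged sketch of the weighted multi-task loss in Eq. (6). The individual terms
# (L1/GIoU box losses, cross-entropy, focal loss) are placeholders computed elsewhere
# after bipartite matching; only the weighting (1, 2.5, 1, 1) follows the paper.
def hoi_loss(l_box, l_iou, l_obj, l_int, lambdas=(1.0, 2.5, 1.0, 1.0)):
    l1, l2, l3, l4 = lambdas
    return l1 * l_box + l2 * l_iou + l3 * l_obj + l4 * l_int
```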

4 Experiments

4.1 Experimental Settings

Datasets. We adopt two datasets: HICO-DET [12] and V-COCO [21]. V-COCO [21] provides 10,346 images (2,533 for training, 2,867 for validation, and 4,946 for testing) and 16,199 person instances. Each person is labeled with 29 action categories (five of which have no paired object). HICO-DET [12] is a much larger dataset than V-COCO [21]. It includes 47,776 images (38,118 in the train set and 9,658 in the test set), 600 HOI categories over 80 object categories (the same as [16]) and 117 verbs, and provides more than 150k annotated human-object pairs.

Evaluation Metrics. We follow the standard settings in [8] and report mean Average Precision (mAP) for evaluation. A predicted HOI triplet is considered a true positive when both the predicted human and object bounding boxes have IoUs larger than 0.5 with the ground-truth boxes and the HOI category prediction is correct. For V-COCO, we report mAP under two scenarios. For HICO-DET, we report mAP over two evaluation settings (Default and Known Object) with three HOI category subsets: all 600 HOI triplets (Full), 138 HOI triplets with fewer than 10 training samples (Rare), and 462 HOI triplets with 10 or more training samples (Non-Rare).
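The true-positive rule used for mAP evaluation can be expressed directly in code; the dictionary keys and box format below are illustrative assumptions.

```python
# A minimal sketch of the evaluation rule described above: a predicted HOI triplet counts
# as a true positive only if both the human and object boxes reach IoU > 0.5 with the
# ground truth and the interaction category matches. Box format [x1, y1, x2, y2] assumed.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def is_true_positive(pred, gt):
    return (iou(pred["human_box"], gt["human_box"]) > 0.5 and
            iou(pred["object_box"], gt["object_box"]) > 0.5 and
            pred["hoi_category"] == gt["hoi_category"])
```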


4.2 Implementation Details

In our implementation, we utilize ResNet-50 as the first stage of the visual feature extractor, followed by a six-layer transformer encoder. Both the object and human detection decoders consist of six layers, while the global-based interaction decoder and the part-based interaction decoder consist of three layers each. The network parameters are initialized with the weights of DETR [18] pre-trained on the COCO dataset. During training, we set the number of queries to 100 for V-COCO and 64 for HICO-DET, following [6]. We employ the AdamW optimizer with a weight decay of 2e-5. The loss weight coefficients ($\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$) are set to 1, 2.5, 1, and 1, respectively. The model is trained for 90 epochs with a learning rate of 2e-5, which is decreased by a factor of 10 at the 60th epoch.

5 Results

5.1 Comparison to State-of-the-Art

We conducted experiments on the V-COCO and HICO-DET benchmarks to validate the effectiveness of our proposed method.

Table 1. Performance comparison on the V-COCO test set. AP role (S1) and AP role (S2) denote the performance under Scenario 1 and Scenario 2 of V-COCO, respectively.

Method                    Backbone    AP role (S1)   AP role (S2)
CNN-based Methods
  PMFNet [14]             ResNet50    52.0           -
  PD-Net [4]              ResNet50    53.3           -
  AS-Net [2]              ResNet50    53.9           -
  GG-Net [5]              ResNet50    54.7           -
Single-decoder Methods
  HOTR [3]                ResNet50    55.2           64.4
  QPIC [10]               ResNet50    58.8           61.0
Dual-decoder Methods
  FGAHOI [13]             Swin-T      60.5           61.2
  CDN [6]                 ResNet50    61.7           63.8
  GEN-VLKT [15]           ResNet50    62.4           64.5
  MSTR [24]               ResNet50    62.0           65.2
  BPI [29]                ResNet50    63.0           65.1
  GEN-VLKT [15]           ResNet101   63.5           65.9
  CDN [6]                 ResNet101   63.9           65.8
OBPA-Net (ours)           ResNet50    64.1           65.9


As shown in Table 1, OBPA-Net outperforms all state-of-the-art methods on V-COCO. The table indicates that OBPA-Net not only surpasses CNN-based methods but also outperforms Transformer-based methods with both single-decoder and dual-decoder architectures. This demonstrates the importance of decoupling object detection and human detection and explicitly defining the sub-task objectives. Additionally, with the same backbone, OBPA-Net achieves a 1.1% higher mAP than one of the latest transformer-based methods, Body Part Interactiveness [29], which uses a fixed number of key body parts to predict the interactiveness of human-object pairs. This highlights the significance of adaptively selecting body part features and extracting fine-grained human body part features in a more flexible manner (Table 2).

Table 2. Performance comparison on HICO-DET. "Default" means that the average precision (AP) is calculated across all testing images for each HOI class. "Known Object" means that the AP is calculated for an HOI class over the images that contain the object involved in that HOI class.

Method            Backbone   Default (Full / Rare / Non-Rare)   Known Object (Full / Rare / Non-Rare)
PD-Net [4]        Res152     22.37 / 17.61 / 23.79              26.86 / 21.70 / 28.44
AS-Net [2]        Res50      28.87 / 24.25 / 30.25              31.74 / 27.07 / 33.14
GG-Net [5]        Res50      29.17 / 22.13 / 30.84              33.50 / 26.67 / 34.89
HOTR [3]          Res50      25.10 / 17.34 / 27.42              -     / -     / -
QPIC [10]         Res101     29.90 / 23.92 / 31.69              32.38 / 26.06 / 34.27
MSTR [24]         Res50      31.17 / 25.31 / 32.92              34.02 / 28.83 / 35.57
CDN [6]           Res101     32.07 / 27.19 / 33.53              34.79 / 29.48 / 36.38
RLIP-ParSe [31]   Res50      32.84 / 26.85 / 34.63              -     / -     / -
GEN-VLKT [15]     Res50      33.75 / 29.25 / 35.10              36.78 / 32.75 / 37.99
BPI [29]          Res50      35.15 / 33.71 / 35.58              37.56 / 35.87 / 38.06
OBPA-Net (ours)   Res50      34.63 / 32.83 / 35.16              36.78 / 35.38 / 38.04

On the HICO-DET dataset, compared with state-of-the-art one-stage methods, our method is 2.28% mAP higher than DEFR [11] and 1.79% mAP higher than RLIP-ParSe [31] under the Default Full setting. Both single-decoder and dual-decoder methods use a single decoder to detect humans and objects, and our method outperforms them, demonstrating the effectiveness of the decoupling strategy.

5.2 Ablation Study

To assess the effectiveness of OBPA-Net, we perform a series of ablation studies using the V-COCO dataset.


Impact of Object Centric Decoder. We first validate the effectiveness of the object centric decoder. When the detection of humans and objects is handled by the same decoder, its ability to adapt to the different sub-tasks of multi-task learning is limited. We consider three approaches to decoupling the human decoder and the object decoder. The first approach is to directly decouple the human decoder and object decoder modules; this results in many mismatched human-object pairs and a decrease in performance. The second approach is the human centric decoder, where the output of the human decoder is used as the query embedding input for the object decoder. As shown in Table 3, the detection results of this method on AP role (S1) are poor. The performance degradation is mainly caused by predictions whose object label is "nothing"; we speculate that using human features to detect objects conflicts with predicting "nothing", i.e., not detecting an object. Finally, we use the object centric decoder to decouple the human decoder and object decoder, and Table 3 demonstrates its effectiveness.

Table 3. Impact of the object centric decoder. DHO Decoder, HC Decoder and OC Decoder denote the decoupled human and object decoder, the human centric decoder and the object centric decoder, respectively.

Single Decoder   DHO Decoder   HC Decoder   OC Decoder   AP role (S1)   AP role (S2)
✓                -             -            -            63.2           64.5
-                ✓             -            -            63.5           64.7
-                -             ✓            -            62.7           64.6
-                -             -            ✓            64.1           65.9

Impact of BPA. In Table 4, we validate the effectiveness of BPA and of different body part selection strategies. The results indicate that BPA helps predict interaction categories, and that flexibly selecting different numbers of key human body parts is clearly better than selecting a fixed number of key parts.

Impact of Interaction Decoder from Global to Part. In Table 5, we validate the effect of the interaction decoder layers from global to local. Through experiments with different numbers of GBID and PBID layers, we select the configuration with the highest performance.


Table 4. Impact of BPA and interaction decoder from global to part. BPA refers to the Body Part Attention module; "Adaptive" represents the flexible selection of different numbers of key human body parts.

BPA   Num of parts   AP role (S1)   AP role (S2)
-     -              63.4           64.9
✓     3              63.9           65.6
✓     6              63.9           65.5
✓     Adaptive       64.1           65.9

Table 5. Impact of the interaction decoder from global to part. GBID and PBID refer to the global-based interaction decoder and the part-based interaction decoder.

GBID layers   PBID layers   AP role (S1)   AP role (S2)
1             5             63.3           64.9
3             3             64.1           65.9
5             1             63.4           64.7

Conclusion

In this article, we explored the importance of object detection in HOI and proposed a new framework called OBPA-Net. To improve object detection performance, OBPA-Net utilizes the Object Centric Decoder (OCD) to implicitly learn the interactiveness between human-object pairs, and then utilizes the Body Part Attention (BPA) module to extract fine-grained features of key body parts to improve HOI detection. We conducted extensive evaluations on two benchmark datasets, V-COCO and HICO-DET, showing that our model outperforms current state-of-the-art methods. In the future, we will explore more flexible feature selection modules for human body parts in order to exploit more accurate fine-grained features for HOI detection.

References

1. Girshick, R.: Fast R-CNN. Computer Science (2015)
2. Chen, M., et al.: Reformulating HOI detection as adaptive set prediction (2021)
3. Kim, B., et al.: HOTR: end-to-end human-object interaction detection with transformers (2021)
4. Zhong, X., et al.: Polysemy deciphering network for robust human-object interaction detection. Int. J. Comput. Vis. 129(6), 1910-1929 (2021)
5. Zhong, X., et al.: Glance and gaze: inferring action-aware points for one-stage human-object interaction detection (2021)
6. Zhang, A., et al.: Mining the benefits of two-stage and one-stage HOI detection (2021)
7. Zou, C., et al.: End-to-end human object interaction detection with HOI transformer (2021)
8. Gao, C., Zou, Y., Huang, J.B.: iCAN: instance-centric attention network for human-object interaction detection (2018). https://doi.org/10.48550/arXiv.1808.10437
9. Gkioxari, G., et al.: Detecting and recognizing human-object interactions. In: CVPR. IEEE (2018)
10. Tamura, M., Ohashi, H., Yoshinaga, T.: QPIC: query-based pairwise human-object interaction detection with image-wide contextual information. https://doi.org/10.48550/arXiv.2103.05399
11. Jin, Y., et al.: The overlooked classifier in human-object interaction recognition (2021)
12. Chao, Y.W., et al.: Learning to detect human-object interactions. In: WACV. IEEE (2018)
13. Ma, S., et al.: FGAHOI: fine-grained anchors for human-object interaction detection, January 2023
14. Wan, B., et al.: Pose-aware multi-level feature network for human object interaction detection (2019)
15. Liao, Y., et al.: GEN-VLKT: simplify association and enhance interaction understanding for HOI detection (2022)
16. Kim, B., et al.: UnionDet: union-level detector towards real-time human-object interaction detection (2020)
17. Liao, Y., et al.: PPDM: parallel point detection and matching for real-time human-object interaction detection (2019)
18. Carion, N., et al.: End-to-end object detection with transformers (2020)
19. Zhu, X., et al.: Deformable DETR: deformable transformers for end-to-end object detection (2020)
20. Chen, J., Yanai, K.: QAHOI: query-based anchors for human-object interaction detection (2021)
21. Gupta, S., Malik, J.: Visual semantic role labeling (2015). https://doi.org/10.48550/arXiv.1505.04474
22. Li, J., et al.: CrowdPose: efficient crowded scenes pose estimation and a new benchmark (2018)
23. Zhang, Y., et al.: Exploring structure-aware transformer over interaction proposals for human-object interaction detection (2022)
24. Kim, B., et al.: MSTR: multi-scale transformer for end-to-end human-object interaction detection (2022)
25. Zhou, D., et al.: Human-object interaction detection via disentangled transformer (2022)
26. Wang, T., et al.: Deep contextual attention for human-object interaction detection. In: ICCV. IEEE (2019)
27. Fang, H.-S., Cao, J., Tai, Y.-W., Lu, C.: Pairwise body-part attention for recognizing human-object interactions. In: ECCV 2018. LNCS, vol. 11214, pp. 52-68. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_4
28. Li, Y.L., et al.: Transferable interactiveness knowledge for human-object interaction detection. IEEE Trans. Pattern Anal. Mach. Intell. 99, 1 (2021)
29. Wu, X., et al.: Mining cross-person cues for body-part interactiveness learning in HOI detection. In: ECCV (2022)
30. Li, Y.L., et al.: HAKE: human activity knowledge engine (2019)
31. Yuan, H., et al.: RLIP: relational language-image pre-training for human-object interaction detection, September 2022

Salient Feature Enhanced Multi-object Tracking with Soft-Sparse Attention in Transformer

Caihua Liu, Xu Qu, Xiaoyi Ma, Runze Li, Xu Li, and Sichu Chen

The College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Key Laboratory of Intelligent Airport Theory and System, CAAC, Tianjin, China
[email protected]

Abstract. Most existing transformer-based multi-object tracking (MOT) methods use a convolutional neural network (CNN) to extract features and then use a transformer to detect and track objects. However, the feature extraction networks in existing MOT methods do not pay sufficient attention to salient regional features or capture their consecutive contextual information, so potential object areas can be neglected during detection. Moreover, self-attention in the transformer generates extensive redundant attention areas, resulting in a weak correlation between detected and tracked objects during tracking. In this paper, we propose a salient regional feature enhancement module (SFEM) that focuses on salient regional features and enhances the continuity of contextual features; it effectively avoids neglecting potential object areas caused by occlusion and background interference. We further propose soft-sparse attention (SSA) in the transformer to strengthen the correlation between detected and tracked objects; it establishes an exact association between objects to reduce identity switches. Experimental results on the MOT17 and MOT20 datasets show that our model significantly outperforms state-of-the-art methods on the MOTA, IDF1, and IDSw metrics.

Keywords: Multi-Object Tracking · Salient Regional Feature Enhancement · Soft-Sparse Attention · Vision Transformer

1 Introduction

Multi-object tracking (MOT) aims to distinguish each object from the others by assigning an ID to each object and recording their trajectories in continuous image sequences [1,6]. It is widely used in many vision applications, such as visual surveillance [4], autonomous driving [2], and virtual reality [3].

This work was supported by the Scientific Research Project of Tianjin Educational Committee under Grant 2021KJ037.


Compared to the traditional separate tracking-by-detection (TBD) paradigm [6], existing methods tend to integrate the detector and the embedding model into a unified Joint-Detection-Embedding (JDE) paradigm [5]. Benefiting from the long-range dependency modeling and interpretability of the transformer [10], it is widely used in MOT tasks following the JDE paradigm. Most existing transformer-based MOT methods [7,9] are implemented with a CNN backbone to extract features and an encoder-decoder transformer to detect and track objects. However, the feature extraction networks in existing MOT methods do not pay sufficient attention to salient regional features or capture their consecutive contextual information, so some detailed information in the feature map is neglected during detection. Besides, self-attention in the transformer computes each pixel's attention against all pixel values of the input features, which generates extensive redundant attention areas and results in a weak correlation between detection and track queries during tracking.

To solve the aforementioned problems, we propose a salient regional feature enhancement module (SFEM). It applies spatial attention [11] and an adaptive scaling dilated convolution (ASC) to the feature maps extracted from the backbone; the spatial attention focuses on salient regional features, and the ASC is guided by the spatial attention weights to enhance the continuity of contextual information. Besides, we propose soft-sparse attention (SSA) in the transformer, which addresses self-attention's inability to focus on the most relevant information between queries and builds an exact association between queries. To summarize, our contributions are as follows:

– We propose a salient regional feature enhancement module (SFEM) to pay more attention to salient regional features and capture their consecutive contextual information, which allows objects occluded or disturbed by background to be detected accurately and improves MOT accuracy.
– We are the first to propose soft-sparse attention in the transformer, which accurately captures the correlation between detected and tracked objects to reduce ID switches during tracking.
– Extensive experimental results on MOT17 and MOT20 indicate that our model outperforms state-of-the-art methods on several metrics.

2 Related Works

2.1 Feature Enhancement in Tracking

Feature enhancement is designed to further refine the important contextual information in feature maps, enabling accurate detection of objects in complex scenes. Several studies have focused on feature enhancement in tracking. Zhao et al. [15] proposed an algorithm that incorporates spatial and temporal attention to take full advantage of hierarchical convolutional features for tracking. Huang et al. [16] proposed a self-attention-based feature fusion and a classification enhancement structure, which highlight the target


information and assist the anchor-free strategy, respectively. Hu et al. [17] introduced a multi-frequency feature representation method that improves the feature expression ability of highly dynamic targets. The works mentioned above mainly consider fusing features of different scales and patterns or enhancing multi-frequency features, but most of them apply a uniform enhancement to all regions and cannot adaptively select enhancement strategies.

2.2 Transformers in Tracking

The transformer [10] was first applied in natural language processing; it is a deep neural network based on self-attention mechanisms. Thanks to its powerful representation capabilities, researchers have applied it to a variety of computer vision tasks, including object detection, object tracking, and image segmentation. There are also many transformer-based works in the MOT field. TrackFormer [9] and MOTR [7] realize object detection and tracking simultaneously by concatenating the object queries and autoregressive track queries as inputs to the transformer decoder in the next frame. TransCenter [12] and TransTrack [8] only use transformers as feature extractors and iteratively pass track features to learn an aggregated embedding of each object. TransMOT [14] still uses CNNs as object detectors and then learns an affinity matrix with transformers for tracking. These works explore the dominant MOT paradigms with transformers. However, they lack the ability to focus on salient regional features and to establish an exact correlation between detection and track queries.

2.3 Attentions Applied in Transformer

The core of the attention mechanism is to selectively pick out, from a large amount of input, the information that is important for the task at hand. The transformer [10] uses multi-head self-attention to capture richer feature information by attending to different representation subspaces at different locations. SparseTT [13] utilizes traditional sparse attention [20] in the transformer to highlight potential targets in the search area. However, self-attention in most transformer-based MOT methods lacks the ability to focus on the most relevant information between pixel values. In the original sparse attention [20], each pixel value of the attention features is determined only by the k pixel values most similar to it, which neglects some potentially important correlations.

3 Method

3.1 Overall Architecture

The overall architecture of our model is shown in Fig. 1. Multi-frame images are fed into the convolutional neural network (CNN) (e.g. ResNet-50 [18]) to extract features, then features are fed into salient regional feature enhancement


Fig. 1. The overall architecture of our model. At frame t = 1, the decoder transforms N learnable object queries (white) to output embeddings either initializing new autoregressive track queries. On subsequent frames, the decoder processes the joint set of Nobject + Ntrack queries to follow or remove (cyan) existing tracks as well as initialize new tracks (purple).

module to obtain enhanced salient features; the features are then embedded with position information and flattened before being fed into the encoder. For the first frame there are no track queries, so we feed the fixed-length learnable detection queries into the decoder to obtain object predictions, and after post-processing, high-confidence detections are activated as track queries for the next frame. For successive frames, we feed the concatenation of the track queries from the previous frame and the learnable detection queries into the decoder, which in turn produces the track queries for the next frame.

3.2 Salient Feature Enhancement Module

In this section, the salient regional feature enhancement module (SFEM) is described in detail. As shown in Fig. 2, it mainly includes an adaptive scaling dilated convolution (ASC) and spatial attention.

Spatial Attention. The feature maps F extracted from the backbone are the input of our feature enhancement network. We first apply max and average pooling to F, and a 3 × 3 convolution is then applied to the concatenation of the pooled maps to obtain the attention weights W. W has two functions: on the one hand, it is multiplied with F to obtain the output of the spatial attention branch F1; on the other hand, it is used to guide the ASC.

Adaptive Scaling Dilated Convolution. The adaptive scaling dilated convolution (ASC) takes W from the spatial attention to guide the dilated convolution. Specifically, we divide W and F into N small patches; each wi and fi are in one-to-one correspondence since they have the same scale. Next, we perform average pooling and a gating activation function on wi to obtain the dilation rate ri. Since salient regions have greater attention weights, they get a greater


dilation rate. The mathematical description of ri is given in Eq. (1):

$$r_i = \mathrm{round}\big(w_i \cdot \tanh(\ln(1 + e^{\mathrm{Avg}(w_i)}))\big), \quad i = 1, 2, 3, \ldots, N \quad (1)$$

where $w_i$ is the attention weight patch, Avg denotes one-dimensional average pooling, tanh is the activation function, and round rounds $r_i$ to an integer.

Fig. 2. The illustration of SFEM. F is the input feature map tensor, W means the attention weights in spatial attention [11], w and f mean the patches divided by weights and input feature tensor, r indicates dilated rates of convolution. σ means the gating activation function, ⊗ stands for matrix multiplication, ⊕ represents the matrix summation.

Finally, we apply a dilated convolution with rate $r_i$ to each $f_i$ and then reconstruct all the patches as the output of the ASC, $F_2$. The main function of this branch is to enhance the continuity of contextual features in salient regions. After that, we merge $F_1$ and $F_2$ as the output $F'$ of our feature enhancement module; the mathematical description of $F'$ is given in Eq. (2):

$$F' = \mathrm{RC}\big(\mathrm{DC}_{3\times3}(w_i, r_i)\big) + F_1, \quad i = 1, 2, 3, \ldots, N \quad (2)$$

where $F_1$ denotes the output of the spatial attention, DC denotes the dilated convolution operation, and RC denotes the patch reconstruction operation.
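A PyTorch-style sketch of the ASC branch guided by Eq. (1) is given below; applying Eq. (1) to the patch mean so that each patch gets one scalar rate, the shared 3 x 3 kernel, and the patch size are simplifying assumptions, not the authors' implementation.

```python
# A sketch of the SFEM/ASC idea: spatial attention weights W set a per-patch dilation
# rate (Eq. (1)), each feature patch is filtered with that rate, and the result is
# merged with the attention branch F1 (Eq. (2)). All shapes/names are illustrative.
import torch
import torch.nn.functional as F

def adaptive_rate(w_patch):
    # Eq. (1) applied to the patch mean so each patch gets one scalar rate; the
    # clamp to >= 1 is an added assumption to keep the dilation valid.
    avg = w_patch.mean()
    r = torch.round(avg * torch.tanh(torch.log1p(torch.exp(avg))))
    return max(int(r.item()), 1)

def asc_branch(feat, weights, conv_weight, patch=4):
    # feat, weights: (C, H, W); conv_weight: (C, C, 3, 3) shared 3x3 kernel (assumed)
    c, h, w = feat.shape
    out = torch.zeros_like(feat)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            r = adaptive_rate(weights[:, y:y+patch, x:x+patch])
            f = feat[:, y:y+patch, x:x+patch].unsqueeze(0)
            out[:, y:y+patch, x:x+patch] = F.conv2d(f, conv_weight,
                                                    padding=r, dilation=r)[0]
    return out  # F2, reconstructed from patches; F' = F2 + F1, with F1 = weights * feat
```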

3.3 Encoder

The encoder is an important but not essential component of the proposed method; it is responsible for encoding the frame features. As shown in Fig. 3, it is composed of N encoder layers, where each encoder layer takes the output of its previous layer as input. Note that, to give the network a perception of spatial position information, we add a spatial position


embedding to each feature point. Thus, the first encoder layer takes the flattened features extracted by the SFEM together with the spatial position embedding as input. In short, it can be formally written as Eq. (3):

$$\mathrm{encoder}(Z) = \begin{cases} f^{i}_{enc}(Z + P_{enc}), & i = 1 \\ f^{i}_{enc}(Y^{i-1}_{enc}), & 2 \le i \le N \end{cases} \quad (3)$$

where $Z \in \mathbb{R}^{HW \times C}$ represents the flattened frame feature, $P_{enc} \in \mathbb{R}^{HW \times C}$ represents the spatial position encoding, and $f^{i}_{enc}$ represents the i-th encoder layer.

Fig. 3. left): The architecture of Transformer. We indicate the tensor dimensions in squared brackets, ⊕ represents the matrix summation, and C represents the concatenation of queries. right): The illustration of soft-sparse attention. L and C represent the length and dimension of tokens respectively, ⊗ stands for matrix multiplication,  represents the matrix subtraction.

3.4 Decoder with Soft-Sparse Attention

The decoder is an essential component of our proposed model; it is responsible for decoding the query features to generate detections and tracks. Similar to the encoder, the decoder is composed of M layers, as shown in Fig. 3. However, unlike the encoder layers, the input of the decoder contains the encoded features with spatial position embedding and the concatenation of detection queries and track queries. Specifically, we first apply soft-sparse attention (SSA) to the concatenation of detection queries and track queries to capture the correlation between queries. Track queries are fed autoregressively from the previous frame's output embedding of the last decoding layer (before post-processing). Next,


we apply cross-attention between the output of the SSA and the encoded features to build spatial associations between the feature maps and the queries. Formally, this can be written as Eq. (4):

$$\mathrm{decoder}(Q, Y^{N}_{enc}) = \begin{cases} f^{i}_{dec}(Q, Y^{N}_{enc} + P_{enc}), & i = 1 \\ f^{i}_{dec}(Y^{i-1}_{dec}, Y^{N}_{enc} + P_{enc}), & 2 \le i \le M \end{cases} \quad (4)$$

where $Q \in \mathbb{R}^{len(q_d + q_t) \times C}$ represents the concatenation of track queries and detection queries, $Y^{N}_{enc} \in \mathbb{R}^{HW \times C}$ represents the output of the encoder, $P_{enc} \in \mathbb{R}^{HW \times C}$ represents the spatial position encoding, $f^{i}_{dec}$ represents the i-th decoder layer, and $q_d$ and $q_t$ are the fixed-length detection queries and the track queries from the previous frame, respectively.

Soft-Sparse Attention is proposed to strengthen the correlation between detection and track queries, as illustrated in detail on the right of Fig. 3. We first feed Q, K, V to a linear unit and compute the similarity matrix between Q and K as the attention map a, and then apply the soft-sparse method to the attention map. Specifically, we pick the k-th largest value in each row of the attention map, where k is given by Eq. (5); this value is subtracted from each element of the row, followed by an exponential operation and a tanh activation on each element to obtain the final attention map a. Finally, softmax is applied to a and the result is multiplied with V to obtain the output of the attention. The soft-sparse attention can be formally written as Eqs. (5) and (6):

$$k = N_{track} + 10 \times \log_{10} N_{queries} \quad (5)$$

where $N_{track}$ and $N_{queries}$ denote the number of track queries and the number of all queries.

$$output = \mathrm{Softmax}\big(\tanh(e^{a - \mathrm{Topk}(a, k)})\big) \quad (6)$$

where a denotes the attention map, k is obtained from Eq. (5), and Topk(a, k) denotes the k-th largest value in each row of the attention map a.
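A hedged PyTorch sketch of Eqs. (5)-(6) follows; multi-head splitting is omitted and the 1/sqrt(d) scaling of the similarity matrix is an added standard assumption rather than something stated in the paper.

```python
# A sketch of soft-sparse attention: the k-th largest value of each attention row is
# subtracted, exp and tanh reshape the map, and softmax re-normalises it before
# weighting V. Shapes and the scaling factor are assumptions for illustration.
import math
import torch

def soft_sparse_attention(q, k_mat, v, n_track, n_queries):
    # q, k_mat, v: (B, L, C) projected queries/keys/values over detect + track queries
    k = int(n_track + 10 * math.log10(max(n_queries, 2)))      # Eq. (5)
    a = torch.matmul(q, k_mat.transpose(-2, -1)) / math.sqrt(q.size(-1))
    k = min(max(k, 1), a.size(-1))
    topk_val = a.topk(k, dim=-1).values[..., -1:]              # k-th largest per row
    a = torch.tanh(torch.exp(a - topk_val))                    # Eq. (6), before softmax
    a = torch.softmax(a, dim=-1)
    return torch.matmul(a, v)
```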

3.5 Loss Function

Following [9], the final MOT prediction loss is computed over all $N = N_{object} + N_{track}$ output predictions as Eq. (7):

$$L_{MOT}(y, \hat{y}, \pi) = \sum_{i=1}^{N} L_{query}(y, \hat{y}_i, \pi) \quad (7)$$

where $y$ and $\hat{y}$ represent the ground truth and the predictions, respectively, and $\pi$ is the mapping produced by the Hungarian algorithm. $L_{query}$ is defined as Eq. (8):

$$L_{query} = \begin{cases} -\log \hat{p}_i(c_{\pi(i)}) + L_{box}(b_{\pi(i)}, \hat{b}_i), & \text{if } i \in \pi \\ -\log \hat{p}_i(0), & \text{if } i \notin \pi \end{cases} \quad (8)$$


where output embeddings that were not matched via track ID or assignment are not part of the mapping $\pi$ and are assigned to the background class $c_i = 0$. $L_{box}$ is used to compute the bounding box loss, which is a linear combination of an $\ell_1$ distance and a generalized intersection over union (IoU) loss, as in Eq. (9):

$$L_{box} = \lambda_{\ell_1} \left\| b_i - \hat{b}_{\sigma(i)} \right\|_1 + \lambda_{iou} L_{iou}\big(b_i, \hat{b}_{\sigma(i)}\big) \quad (9)$$
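A brief sketch of the box loss in Eq. (9), using torchvision's GIoU helper in place of a hand-written implementation; matched box pairs in (x1, y1, x2, y2) format are assumed.

```python
# A sketch of the bounding-box loss: an L1 term plus a generalised IoU term, weighted
# by lambda coefficients. pred_boxes/gt_boxes are matched (N, 4) pairs (assumption).
import torch
from torchvision.ops import generalized_box_iou

def box_loss(pred_boxes, gt_boxes, lam_l1=1.0, lam_iou=1.0):
    l1 = torch.abs(pred_boxes - gt_boxes).sum(dim=-1).mean()
    giou = torch.diag(generalized_box_iou(pred_boxes, gt_boxes))  # matched pairs only
    return lam_l1 * l1 + lam_iou * (1.0 - giou).mean()
```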

4 Experiments

4.1 MOT Benchmarks and Metrics

Datasets. The tracking results of our model are reported on two MOTChallenge benchmarks, MOT17 [21] and MOT20 [22]. MOT17 contains 7 training sequences and 7 test sequences, with 5,316 and 5,919 frames respectively, in which pedestrians are annotated with bounding boxes. To evaluate tracking robustness independently of detection, three sets of public detections are provided, namely DPM, Faster R-CNN, and SDP. MOT20 [22] targets highly crowded and challenging scenes, with 8,931 frames for training and 4,479 frames for testing.

Metrics. We follow the standard evaluation protocols to evaluate our method. The common essential metrics include Multi-Object Tracking Accuracy (MOTA), Identity F1 Score (IDF1), Mostly Tracked (MT), Mostly Lost (ML), False Positives (FP), False Negatives (FN), and Identity Switches (IDSw).

Implementation Details

We follow up ResNet50 [18] CNN feature extraction and Transformer encoderdecoder architecture presented in Deformable DETR [19]. For the data processing, we follow the Trackformer [9], adopting several data augmentation methods that are random flip and crop. Besides, we restricted the maximum size of the input to 1380 and 800 for the shorter and longer side respectively. We follow Trackformer [9] use the joint of the MOT dataset and CrowdHuman [23] person detection dataset for training, we generate the adjacent training frames t-1 and t by applying random spatial augmentation to a signal image. For the training procedure, we follow the trackformer [9], the backbone and encoder-decoder are trained with individual learning rates of 0.00001 and 0.0001, respectively. All the experiments are conducted on PyTorch with a Tesla V100 GPU. For MOT17 public detections model training, we train it upon [19] pretrained 50 epochs on COCO [26] for a total of 40 epochs with a learning rate drop by a factor of 10 after the first 10 epochs, which takes about 60 h. For the private detections model training, we first train the model for 80 epochs on the CrowdHuman dataset, and then fine-tuned it on MOT17 and MOT20 with reduced learning rates for additional 30 epochs, and the whole training process takes about 6 d on each datasets.

400

4.3

C. Liu et al.

Benchmark Results

MOT17. The MOT17 [21] benchmark is evaluated on a private and public detection setting. The latter allows for a comparison of tracking methods independent of the underlying object detection performance. MOT17 provides three sets of public detections with varying quality. We report the results of the evaluation on public and private detections in Table 1. MOT20. The MOT20 [22] benchmark is evaluated in a private detection setting. MOT20 includes more crowded scenes, severe object occlusion and small objects than MOT17, which brings more challenges for object detecting and tracking. Thus, all methods show lower performance on MOT20 than MOT17. We report the results of the evaluation on private detections in Table 2. Table 1. Comparison of existing multi-object tracking methods evaluated on the MOT17 test set. We report private as well as public detections results and separate between online and offline approaches. We compare the private result with TransformerBased architecture methods. (Best results are shown in bold) MOTA↑ IDF1 ↑ MT ↑ ML↓ FN ↓

FP ↓

67.8

64.7

816

579

160332

18498 3039

74.5

63.9

s

/

112137

28323

3663

TransCenter [12] 73.2

62.2

/

/

123738

23112

4614

Trackformer [9]

74.1

68.0

1113 246

108777

34602

2829

Ours

74.8

68.7

1047

265

101247 33564

2541

methods

IDSw ↓

private Online Centertrack [25] TransTrack [8]

public Offline JCC [27]

51.2

54.5

493

872

247822

25937

1082

FWT [28]

51.3

47.6

505

830

247921

24101

2648

TT [29]

54.9

63.1

575

897

233295

20235

1088

MPNTrack [30]

58.8

61.7

679

788

213594

17413

1185

Lif T [24]

60.5

65.6

637

791

206617 14966 1189

Online FAMNet [31]

4.4

52.0

48.7

450

787

253616

14138

3072

Tracktor++ [1]

56.3

55.1

498

831

235449

8866

1987

GSM [32]

56.4

57.8

523

813

230174

14379

1485

CenterTrack [25] 60.5

55.7

580

777

208577

11599

2540

TrackFormer [9]

62.5

60.7

702

632

174921

32828

3917

Ours

63.2

60.9

698

640

169511 21254

3001

Ablation Study

Model Components. Table 3 shows the impact of integrating different components, we implement an ablation study on the MOT17 public detections. We did separate experiments on our model using SFEM and SSA individually, as well as

MOT with SFEM and Soft-Sparse Attention

401

Table 2. Comparison of multi-object tracking methods evaluated on the MOT20. methods

MOTA↑ IDF1 ↑ MT ↑ ML↓ FN ↓

FP ↓

IDSw ↓

GSDT [33]

67.1

67.5

/

/

135395

31507

3230

TransCenter [12] 58.3

46.8

/

/

174893

35959

4947

TransTrack [8]

64.5

59.2

/

/

151377

28566

3565

TrackFormer [9]

68.6

65.7

666

181

140375

20384 1532

Ours

69.3

66.8

651

189

130375 20684

1386

combining the two components to verify the effectiveness of the modules. Experimental results show that paying more attention to features in salient regions and associating contextual information with continuity can greatly improve the performance, and also show that enhancing the correlation between tracking and detected objects can handle the challenge of ID switch effectively in MOT. Table 3. Ablation study on our proposed components of model Baseline SFEM SSA MOTA↑ IDF1 ↑ IDSw ↓  



 



62.3

58.6

4018

62.8

59.4

3308



62.9

59.8

3128



63.2

60.9

3001

Table 4. Ablation study on our proposed Salient regional Feature Enhancement Module (SFEM) and Soft-Sparse Attention (SSA) in the decoder of transformer. (a) Comparing Merging Multiple dilated rates Convolution(MMC) and Adaptive Scaling dilated Convolution(ASC) in SFEM method MOTA↑ IDF1 ↑ IDSw ↓ × MMC ASC

62.3 62.8 62.9

58.6 59.4 59.8

4018 3308 3128

(b) Comparing different attentions in the decoder of transformer,including self-attention in the baseline, traditional sparse attention and soft-sparse attention. method Self Sparse Soft-sparse

MOTA↑ IDF1 ↑ IDSw ↓ 62.3 62.9 63.0

58.6 59.7 60.3

4018 3262 3102

Different Combinations of Dilated Convolution. Table 4a reports the ablation study on the SFEM without soft-sparse attention, we compare using attention weights guided adaptive scaling dilated convolution(ASC) and merging multiple dilated rates convolution to the feature enhancement. We implement it on

402

C. Liu et al.

the MOT17 public detections, and the result shows that broadening the receptive field of salient regions to enhance contextual feature continuity is better than fusing multiple dilated convolution feature maps. Different Attentions in Transformer. Table 4b reports the ablation study on the attention in transformer without SFEM, we compare using self-attention, traditional sparse-attention, and soft-sparse attention in the decoder. We implement it on the MOT17 public detections, and the result shows that our method is able to reduce ID switches by strengthening the correlation between queries and focusing on potentially important areas. 4.5

Case Study

To illustrate the algorithm performance in real-world scenarios visually, a case study of a complex scene in the dataset is shown in Fig. 4. The left side of the figure shows the undetected target and ID switches due to occlusion in the baseline model. The left side of the figure shows that our method accurately detects the occluded object and assigns identities after occlusion.

Fig. 4. Case study of baseline and our model. The same box color represents the same identity. (baseline) expresses fail to detect object and ID switch (light blue and yellow turn to blue and green) due to occlusion (b) express success to identify object occluded and ID keeping (Color figure online)

5

Conclusion

In this paper, we propose a salient feature enhanced transformer-based MOT method, it utilizes a feature enhancement module during feature extraction and soft-sparse attention in the transformer during detection and tracking, which

MOT with SFEM and Soft-Sparse Attention

403

makes the metrics of MOTA, IDF1, and IDSw on datasets MOT17 and MOT20 significantly improvement than our baseline model. Extensive experiments show that our methods can cope well with objects occluded, interfered by background and some complex scenes. However there are some shortcomings in our model, the query passing in our model is performed frame-by-frame, limiting the efficiency of model learning during training, which is what we are aiming to improve.

References 1. Bergmann, P., Meinhardt, T., Leal-Taix´e, L.: Tracking without bells and whistles. Int. Conf. Comput. Vis. (2019) 2. Zhou, X., Cui, J., Qu, M.: An improved multi-object tracking algorithm for autonomous driving based on DeepSORT. In: ICITE (2022) 3. Wang, X.: Intelligent multi-camera video surveillance: a review. Pattern Recognit. Lett. 34(1), 3–19 (2013) 4. Uchiyama, H., Marchand, E.: Object detection and pose tracking for augmented reality: recent approaches. In: Proc (2012) 5. Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/ 978-3-030-58621-8 7 6. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: Fairmot: on the fairness of detection and re-identification in multiple object tracking. In: IJCV, pp. 1–19 (2021) 7. Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: MOTR: end-to-end multiple-object ttracking with transformer. In: ECCV (2022) 8. Sun, P., et al.: Transtrack: multiple-object tracking with transformer. arXiv preprint arXiv:2012.15460 (2020) 9. Meinhardt, T., Kirillov, A., Leal-Taix´e, L., Feichtenhofer, C.: Trackformer: multiobject tracking with transformers. In: CVPR (2022) 10. Vaswani, A., et al.: Attention is all you need. In: NeurlPS (2017) 11. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015) 12. Xu, Y., Ban, Y., Delorme, G., Gan, C., Rus, D., Alameda-Pineda, X.: TransCenter: transformers with dense representations for multiple-object tracking. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) 13. Fu, Z., Fu, Z., Liu, Q., Cai, W., Wang, Y.: SparseTT: visual tracking with sparse transformers. In: IJCA (2022) 14. Chu, P., Wang, J., You, Q., Ling, H., Liu, Z.: TransMOT: spatial-temporal graph transformer for multiple object tracking. In: WACA (2023) 15. Zhao, D., Zeng, Y.: Dynamic fusion of convolutional features based on spatial and temporal attention for visual tracking. In: IJCNN (2019) 16. Huang, D., Yang, M., Duan, J., Yu, S., Liu, Z.: Siamese network tracking based on feature enhancement. IEEE Access (2023) 17. Bi, F., Sun, J., Lei, M., Wang, Y., Sun X.: Remote sensing target tracking for UAV aerial videos based on multi-frequency feature enhancement. In: IGARSS (2020) 18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

404

C. Liu et al.

19. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: deformable transformers for end-to-end object detection. In: ICLR (2020) 20. Zhao, G., Lin, J., Zhang, Z., Ren, X., Su Q., Sun X.: Explicit sparse transformer: concentrated attention through explicit selection. arXiv preprint arXiv:1912.11637 (2019) 21. Milan, A., Leal-Taix´e, L., Reid, I., Roth, S., Schindler, K.: Mot16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016) 22. Dendorfer, P., et al.: Mot20: A benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003 (2020) 23. Shao, S., et al.: Crowdhuman: a benchmark for detecting human in a crowd. arXiv:1805.00123 (2018) 24. Hornakova, A., Henschel, R., Rosenhahn, B., Swoboda, P.: Lifted disjoint paths with application in multiple object tracking. In: International Conference on Machine Learning (2020) 25. Zhou, X., Koltun, V., Krahenbuhl, P.: Tracking objects as points. In: ECCV, pp. 474–490 (2020) 26. Lin, T., et al.: Microsoft coco: common objects in context. In: ECCV (2014) 27. Keuper, M., Tang, S., Andres, B., Brox, T., Schiele, B.: Motion segmentation and multiple object tracking by correlation co-clustering. IEEE Trans. Pattern Anal. Mach. Intell. (2018) 28. Henschel, R., Leal-Taix´e, L., Cremers, D., Rosenhahn, B.: Improvements to FrankWolfe optimization for multi-detector multi-object tracking. In: CVPR (2017) 29. Zhang, Y., et al.: Long-term tracking with deep tracklet association. IEEE Trans. Image Process. (2020) 30. Braso, G., Leal-Taix´e, L.: Learning a neural solver for multiple object tracking. IEEE Conf. Comput. Vis. Pattern Recogn. (2020) 31. Chu, P., Ling, H.: Famnet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: CVPR (2019) 32. Liu, Q., Chu, Q., Liu, B., Yu, N.: GSM: Graph similarity model for multi-object tracking. Int. Joint Conf. Art. Int. (2020) 33. Wang, Y., Weng, X., Kitani, K.: Joint detection and multi-object tracking with graph neural networks. In: ICRA (2021)

A BiGRU Based Adaptive Gain Estimation for Radar Multi-target Tracking Long Liu1 , Qing Xu1 , Mengxuan Zhang2(B) , Hongbing Ji1 , and Qiubo Zhao1 1 School of Electronic Engineering, Xidian University, Xi’an 710071, China 2 School of Artificial Intelligence, Xidian University, Xi’an 710071, China

[email protected]

Abstract. The currently available multi-target tracking algorithms were developed based on ideal tracking settings, which are unsuitable for actual combat situations. To handle the challenge of unknown measurement noise and low detection probability, this paper presents a novel adaptive gain estimation (AGE) approach with a survival likelihood estimation method. The former utilizes the bidirectional gate recurrent unit (BiGRU) to assign weights to both measurement and prediction in the absence of priori information. The latter leverages a binary classification network to determine the likelihood of target survival. Furthermore, the AGE can be seamlessly integrated with prediction and data association modules to form an end-to-end model known as MTT-AGE (Multi-target Tracking with Adaptive Gain Estimation). The results of MIT trajectory dataset, simulated scenarios and real-world data confirm the efficiency and stability of the MTT-AGE. Furthermore, the ablation experiments are conducted to verify the effectiveness of AGE based on MIT trajectory dataset and simulated scenarios where there is only a single objective. Keywords: multi-target tracking · BiGRU · unknown target · survival likelihood estimation · AGE

1 Introduction In recent years, it has achieved better estimation effect and has developed a large number of novel methods in radar multi-target tracking (MTT) applications. The traditional tracking algorithms are commonly based on data association, such as the global nearest neighbor data association (GNN), the joint probabilistic data association (JPDA) [1], and multiple hypothesis tracking (MHT) [2]. GNN based on Hungarian algorithm can select a measurement using distance. However, it might not work when the cluster is the closest measurement. The JPDA computes the association probability between targets and measurements. The MHT has been established to evaluate the likelihood for the radar tracking systems but it is heavily reliant on the priori information. Meanwhile, these above methods are time consuming, especially when there are quite a bit of clutter and a bunch of measurements. Mahler [3] presented some recast works in the Bayesian filtering paradigm using random finite set (RFS), such as probability hypothesis density (PHD), cardinalized PHD (CPHD) [4], multi-target multi-Bernoulli (MeMBer). Vo © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 Q. Liu et al. (Eds.): PRCV 2023, LNCS 14436, pp. 405–417, 2024. https://doi.org/10.1007/978-981-99-8555-5_32


demonstrated Gaussian mixture PHD (GM-PHD) [5], which is based on Gaussian distributions. Then sequential Monte Carlo PHD (SMC-PHD) was proposed to process the non-Gaussian distributions while it has high computational complexity. The labelled multi-Bernoulli filter (LMB) and generalized LMB (GLMB) [6] can perform track to track association without high signal to noise ratio (SNR). However, those introduced methods are difficult to implement without a priori motion model. Since the measurement is usually cluttered and time-varying, the complex movement will make them inaccurate. To handle the above problems, some algorithms based on deep learning were proposed. Li [7] presented a multi-step track prediction based on long and short-term memory (LSTM), which uses the encoder and decoder architecture to implement a motion model for trendless tracks. It only has the predicting process and lacks measurement updating process. Mehryart used convolution and LSTM (CONVLSTM) [8] to establish motion models, which defines probability density difference (PDD) maps. But it has high computational complexity and does not implement an end-to-end network structure. Steffen Jung proposed memory Kalman filter (MKF) [9], which is mainly free from the Markov model and overcomes the linearity limitation to predict state. It only addresses the problems of motion models based on Kalman filter (KF). An end-to-end network structure was presented in [10] that uses recurrent neural network and LSTM (RNN-LSTM) to solve the problem of data association. However, it will fail when targets are undetected. After that, KalmanNet [11] utilizes the power of deep learning for state estimation which focus only on gain estimation for single targets. All the above algorithms mainly focus on implementing motion models and use KF in the update step. Furthermore, they pay less attention to unknown targets or poor detection probability. To model an end-to-end framework, this paper uses the LSTM [12] to obtain prediction and deep Hungarian algorithm (DHN) [13] to realize data association process. The MTT-AGE is introduced to deal with the problems of unknown measurement noise and low detection probability. AGE analyzes the relationship between prediction and measurement. Through it, the Kalman gain can be calculated without covariance and measurement noise for unknown targets. The survival likelihood estimation uses the characteristic of the whole track to determine whether a track exists or not. This method can handle the problem of low detection probability and miss-detection. Tracks will be remained when its measurements disappear temporarily. This paper is organized as follows. Section 2 gives the motivation and the structure of the MTT-AGE, respectively. The experimental results are presented in Sect. 3. Finally, we draw some conclusions in Sect. 4.

2 The MTT-AGE Framework The MTT-AGE consists of four parts and its structure is shown in Fig. 1. It is undoubtedly that the LSTM network predicts the state xˆ ik of the ith target at time step k, which is based on the previous state xi0:k−1 . At the following step, the state xˆ ik will be amalgamated with measurements zk . The data association module can assign the measurements zik to the xˆ ik at time step k, and the likelihood module of survival estimation then comes next. Survival likelihoods are attached to both the associated and unassociated tracks. If the likelihood


is 1, the unassociated tracks will set prediction xˆ ik as estimation and the associated tracks may access the estimation xik from the AGE module. If this probability is 0, it demonstrates that the tracks vanish. The unassociated measurements are considered as new tracks. Details of the MTT-AGE are clarified as follow.

Fig. 1. The MTT-AGE framework.

2.1 Prediction Process

This module takes its main inspiration from the non-Markovian Chapman-Kolmogorov formulation [9], which employs the previous states from time step 0 to time step k-1. To leverage the information in long-term tracking, it uses the states $x^i_{0:k-1}$ and obtains the current state $\hat{x}^i_k$:

$$p^i_{k|k-1}(\hat{x}^i_k \mid \tilde{z}) = \int f_{k|k-1}(\hat{x}^i_k \mid x^i_{k-1}, \tilde{x})\, p^i_{k-1|k-1}(\hat{x}^i_k \mid \tilde{z})\, dx^i_k \quad (1)$$

$$\tilde{z} = z^i_{0:k-1}, \quad \tilde{x} = x^i_{0:k-1} \quad (2)$$

Here, $\tilde{z}$ denotes the previous measurements from time step 0 to time step k-1, and $p^i_{k|k-1}$ denotes the posterior probability density. $f_{k|k-1}$ is the non-linear transition function at time k-1, which is realized by the encoding and decoding of the LSTM. The module is trained with the mean square error (MSE) loss function:

$$MSE^i = \sum_{k=1}^{n} (x^i_k - \hat{x}^i_k)^2 / n \quad (3)$$
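A PyTorch-style sketch of such an LSTM predictor trained with Eq. (3); the authors implement their model in TensorFlow, and the layer sizes and state dimension here are illustrative assumptions.

```python
# A sketch of the prediction module: an LSTM consumes the past state sequence
# x_{0:k-1} and regresses the next state, trained with the MSE loss of Eq. (3).
import torch
import torch.nn as nn

class StatePredictor(nn.Module):
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, state_dim)

    def forward(self, past_states):             # (B, k, state_dim) = x_{0:k-1}
        out, _ = self.lstm(past_states)
        return self.head(out[:, -1])            # predicted x_hat_k

def mse_loss(x_true, x_pred):                   # Eq. (3)
    return ((x_true - x_pred) ** 2).mean()
```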

2.2 Data Association

The task of this module is to assign the corresponding measurement to each target. Data association amounts to finding an optimal match, and its classical algorithms are reviewed in Sect. 1. In recent years, the deep Hungarian network (DHN) [13] was proposed, which adds appearance features to the matrix $D_k$ and employs the evaluation indicators MOTA and MOTP. However, since radar measurements only provide location information, the


original DHN network cannot be utilized directly. Thus, its distance matrix is replaced with the Mahalanobis distance:

$$D_k = d(x, z) = \sqrt{(x - z)^T \Sigma^{-1} (x - z)} \quad (4)$$

Here, $D_k$ denotes the distance matrix. DHN uses a BiRNN, which can learn not only forward features but also reverse features. Based on this principle, this module uses a BiLSTM to improve the performance of DHN, as shown in Fig. 2.

Fig. 2. Improved DHN algorithm. The flatten vectors of Dk in the row-wise and the column-wise are inputted to two BiLSTM networks. FC layers reshape the outputs which can obtain assignment matrix Ak .
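The Mahalanobis distance matrix of Eq. (4) can be computed as follows; treating the innovation covariance as a known input is a simplification for illustration.

```python
# A small sketch of the Mahalanobis distance matrix of Eq. (4) that replaces the
# appearance-based matrix in the original DHN. States and measurements are position
# vectors; sigma is the covariance matrix (assumed given here).
import numpy as np

def mahalanobis_matrix(states, measurements, sigma):
    # states: (N, d) predictions, measurements: (M, d), sigma: (d, d)
    inv = np.linalg.inv(sigma)
    diff = states[:, None, :] - measurements[None, :, :]          # (N, M, d)
    return np.sqrt(np.einsum('nmd,de,nme->nm', diff, inv, diff))  # D_k, shape (N, M)
```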

2.3 Survival Likelihood Estimation

The purpose of this module is to determine whether targets have disappeared. Our method draws inspiration from RNN-LSTM [10], which employs an RNN to estimate the initiation and termination likelihood of targets. RNN-LSTM needs a survival likelihood threshold because the RNN makes poor judgements when targets are absent. Hence, this module is designed to enhance the performance of survival probability estimation and to deal with the miss-detection problem, as depicted in Fig. 3. To train the binary classification network, unassociated tracks and existence time are taken into account, as are the track's features. The network employs the binary cross-entropy (BCE) loss function:

$$loss = -\sum_{i=1}^{n} \big(\varepsilon_i \log \hat{\varepsilon}_i + (1 - \hat{\varepsilon}_i) \log(1 - \varepsilon_i)\big) \quad (5)$$

Here, εi and εˆ i denote the survival probability and true value, respectively. As shown in Fig. 3, the weights, ωs , ωd , ωh , are utilized to calculate the corresponding probabilities. When the network output is 1, it means that the track still exists. When the network output is 0, it means that the track disappears. This module can reduce the impact of miss-detection because this module employs the three factors to make estimation accurate. Meanwhile, it can also deal with the situations that the target will be obscured or disrupted.


Fig. 3. Survival likelihood estimation network.

2.4 AGE Estimation

This module corrects the state distribution [14]. The related measurement can be used to update the state according to Eqs. (6)-(9):

$$x_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H_k \hat{x}_{k|k-1}) \quad (6)$$

$$x_{k|k} = (I - K_k H_k)\hat{x}_{k|k-1} + K_k z_k \quad (7)$$

$$x^i_k = \Big(I - \frac{H^i_k P^i_{k|k-1} (H^i_k)^T}{H^i_k P^i_{k|k-1} (H^i_k)^T + R^i_k}\Big)\hat{x}^i_k + \frac{H^i_k P^i_{k|k-1} (H^i_k)^T}{H^i_k P^i_{k|k-1} (H^i_k)^T + R^i_k}\, z^i_k \quad (8)$$

$$x^i_k = \omega^i_x \hat{x}^i_k + \omega^i_z z^i_k \quad (9)$$

Here, $R_k$ and $K_k$ denote the noise covariance and the Kalman gain at time k, respectively, $P_{k|k-1}$ is the current covariance, and $H_k$ is the observation matrix. This formulation enhances the interpretability of the network, which performs the computation without prior information. The network combines $\hat{x}^i_k$ with the corresponding measurement to form $x^i_k = [\hat{x}^i_k, z^i_k]^T$. The BiGRU is then applied to obtain the weights $\omega^i_x$ and $\omega^i_z$. As shown in Fig. 4, $x^i_k$ is sent to the BiGRU together with the hidden state $h_{k-1}$ at time step k-1:

$$x^i_k = \sigma(W' \cdot h_k) \quad (10)$$

Fig. 4. AGE network.

Here, $\sigma(\cdot)$ denotes the sigmoid function and $W'$ is the parameter matrix of the fully connected layer. The current estimate $x^i_k$ can then be obtained by Eq. (10).
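A PyTorch-style sketch of the BiGRU-based gain estimation of Eqs. (9)-(10); constraining the two weights to sum to one and the hidden size are illustrative assumptions, and the authors' network is implemented in TensorFlow.

```python
# A sketch of the AGE idea: a BiGRU looks at the stacked prediction/measurement pair
# and a fully connected layer with a sigmoid produces the weight used to blend them.
import torch
import torch.nn as nn

class AdaptiveGain(nn.Module):
    def __init__(self, state_dim=4, hidden=32):
        super().__init__()
        self.bigru = nn.GRU(state_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, x_pred, z_meas):
        # x_pred, z_meas: (B, state_dim); stack them as a length-2 sequence
        seq = torch.stack([x_pred, z_meas], dim=1)
        h, _ = self.bigru(seq)
        w = torch.sigmoid(self.fc(h[:, -1]))          # Eq. (10): sigma(W' * h_k)
        # Eq. (9) with the two weights constrained to sum to one (an assumption here)
        return w * x_pred + (1.0 - w) * z_meas
```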


This module utilizes the loss function given in Eq. (11), which is derived in [11]:

$$MSE^i = \sum_{k=1}^{n} (x^i_k - \tilde{x}^i_k)^2 / n \quad (11)$$

BiGRU’s bidirectional architecture enables it to generalize well to unseen or out-ofdomain data. By learning from both past and future information, the model can grasp the underlying relationships that transcend specific datasets, allowing it to make reliable predictions even on unfamiliar tracking.

3 Experiment

In this section, we conduct experiments on MIT Trajectory data¹, a simulated dataset and Unmanned Aerial Vehicle (UAV) data. The MIT Trajectory dataset consists of 40,453 tracks of vehicles and pedestrians from a parking lot scene. Following the standard way of evaluating radar tracking algorithms, we also design a multi-target tracking simulation experiment to verify the effectiveness of our algorithm. Additionally, we gathered UAV data using our lab's current configuration. For ablation studies, we utilize the MIT Trajectory data and simulated experiments to exemplify the effectiveness of AGE. We use the optimal subpattern assignment (OSPA) metrics [15], containing the overall tracking performance metric, the localization error component and the cardinality error component, where c = 100 and p = 2. To prove the reliability of our research, we compare with KF-DHN, LSTM-KF-DHN, GLMB, GMPHD, RNN-LSTM and KalmanNet for multi-target tracking. Meanwhile, we apply different noise levels to our dataset and different original starting frames in order to prove its robustness. Moreover, in the ablation studies we utilize the mean squared error (MSE) to better quantify the gap between the estimated state and the truth. The LSTM, GRU, BiLSTM and KF are employed to update single-target tracks to reveal the potency of our algorithm. We implement our algorithm in Python using the TensorFlow toolbox, and test it on a computer with a 12th Gen Intel(R) Core(TM) i5-12490F CPU @ 3.00 GHz and a single NVIDIA GTX 2080Ti with 16 G RAM. The implementation details are shown below.

3.1 Training

Since the scene is a parking lot, the tracks have overlapping parts and certain route regulations. However, the direction and speed of targets are irregular. To establish a predictor, this module randomly selected 60% of those tracks as the training set. In this process, the tracks were connected with each other and input to the LSTM network whose loss function is (3). Due to miss-detection, tracks were supplemented via [16], which shows how to implement curve interpolation. The training process of the data association module is explained in [13] and hence omitted here. The survival probability estimation module is a binary classification network whose inputs are the whole tracks and the duration of measurement disappearance. This training process is implemented by (5).

¹ http://www.ee.cuhk.edu.hk/~xgwang/mittrajsinglemulti.html
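For reference, a compact NumPy/SciPy sketch of the OSPA distance with c = 100 and p = 2 is given below; the decomposition into localization and cardinality components follows one common convention and may differ in minor details from the evaluation code used here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=100.0, p=2):
    """OSPA distance [15] between two sets of state vectors.

    X: (m, d) array, Y: (n, d) array.
    Returns (total, localization, cardinality) components.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m, n = X.shape[0], Y.shape[0]
    if m == 0 and n == 0:
        return 0.0, 0.0, 0.0
    if m > n:                                   # keep X as the smaller set
        X, Y, m, n = Y, X, n, m
    if m == 0:                                  # one set empty: pure cardinality error
        return c, 0.0, c

    # Cut-off Euclidean distances and optimal assignment (Hungarian algorithm).
    d = np.minimum(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1), c)
    rows, cols = linear_sum_assignment(d)

    loc_term = np.sum(d[rows, cols] ** p)
    card_term = (c ** p) * (n - m)
    total = ((loc_term + card_term) / n) ** (1.0 / p)
    return total, (loc_term / n) ** (1.0 / p), (card_term / n) ** (1.0 / p)
```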


Figure 5(a) shows the training set for the update module, where the blue points are the corresponding measurements z_k^i with different noise standard deviations and the red points are the predictions x̂_k^i obtained by the prediction module. The green points are the true states x̃_k^i of the tracks. Furthermore, z_k^i and x̂_k^i yield the current state x_k^i via (9).

3.2 Tracking on MIT Trajectory Data

The tracking benefit of MTT-AGE is demonstrated in this section. A frame of the tracking results is displayed in Fig. 5(b). The blue dots correspond to measurements.

(a)

(b)

Fig. 5. (a) training process, (b) frame 11732 tracking outcomes.

Additionally, we contrasted the experiments with measurement noise standard deviations of 0.5 and 1, respectively. In the comparison algorithms, the association algorithm used by the LSTM-KF and the classic Kalman filter is the same as ours. They must also establish a priori information based on the statistical traits of the tracking scenario, and the RFS based methods must do likewise. Thus, their measurement noise is set in view of the aforementioned scenario, and the motion model of the KF is

x_k = F_k x_{k−1} + w_k    (12)

z_k = H_k x_k + v_k    (13)

F_k = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]    (14)

w_k ∼ N(0, diag(σ_w^2, σ_w^2))    (15)

H_k = [[1, 0, 0, 0], [0, 0, 1, 0]]    (16)

v_k ∼ N(0, diag(δ_v^2, δ_v^2))    (17)


Here, σ_w = 0.5 and δ_v = 0.5 are the standard deviations of the process noise w_k and the observation noise v_k, respectively. For GMPHD and GLMB, the target detection probability and the likelihood of survival are both 90%, while the state extraction weight threshold is set to 0.5. On this basis, the results for OSPA_dist (OSPA distance), OSPA_loc (OSPA localization), and OSPA_card (OSPA cardinality) from frame 0 to 2000 are shown in Figs. 6 and 7.
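A NumPy sketch of this constant-velocity model and one Kalman predict/update cycle is given below; the state layout [x, vx, y, vy] and the placement of the process noise on the velocity components are assumptions consistent with (14)-(17).

```python
import numpy as np

# Constant-velocity model of (12)-(17); state layout [x, vx, y, vy] is assumed.
sigma_w, delta_v = 0.5, 0.5

F = np.array([[1, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)              # (14)
H = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)              # (16)
Q = sigma_w ** 2 * np.diag([0.0, 1.0, 0.0, 1.0])       # process noise, placement assumed
R = delta_v ** 2 * np.eye(2)                           # measurement noise of (17)

def kf_step(x, P, z):
    """One Kalman predict/update cycle for the model above."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)                # the Kalman gain that AGE replaces
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```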

Fig. 6. The results of OSPA with δv = 0.5.

Fig. 7. The results of OSPA with δv = 1.

The above figures demonstrate that, for different levels of measurement noise, our performance is on par with that of competing methods. Since OSPA_dist is created by merging OSPA_loc and OSPA_card, the changes of these two components are explored separately hereunder. For OSPA_loc, our method closely resembles the RFS based methods and LSTM-KF due to a very modest estimation error from frame 0 to 250, where there is a single target. The fact that it does not require initialization means that the initial fluctuations are very small. However, our cumulative error rises rapidly as the number of targets increases and the target motion becomes progressively more complex, resulting in fluctuations after frame 250. The positive aspect is that our tracking results are still superior to RNN-LSTM's because of the added features of our tracking system. We restrict the estimation to lie between measurements and predictions in order to lessen the impact of network accuracy, and our errors remain within allowable bounds. For OSPA_card, our algorithm and RNN-LSTM are much smoother than the RFS based methods, because their assessments of survival likelihood alleviate the impact of information lost to missed detections. The results also show that the accuracy of our cardinality estimation is higher than that of the other methods, which is a clear indication that the survival estimation model is effective.


To illustrate the robustness of our algorithm, we randomly select one moment as the starting moment, with δ_v = 0.5. The results are shown in Fig. 8.

Fig. 8. The result of OSPA from frame 10240 to 10640.

Overall, MTT-AGE can achieve robust multi-target tracking in the case of unknown targets. Although our method fluctuates significantly at times, the LSTM-KF fluctuates as well, and a preliminary analysis of these data leads to the conclusion that the general fluctuation is caused by the prediction error. Future work will concentrate on refining the prediction component.

3.3 Tracking on Simulated Scenarios

To further verify the reliability of MTT-AGE, this section examines it through simulated single-target and multi-target tracking experiments. For comparison, 100 Monte Carlo runs were performed for each group of experiments. For the single-target experiments, the initial state of the target is [50, 50, 0, 0]^T, shown in Fig. 9(a). For the multi-target experiments, the initial states of the three targets are [50, 50, 0, 0]^T, [10, −10, 5, −5]^T and [−20, −50, 0, 0]^T, shown in Fig. 9(b). Their motion models are all given by (12) with the parameters in (14) and (15). The sensor observation model is (13) with the parameters in (16) and (17). The target detection probability is 98%.

(a)

(b)

Fig. 9. (a) Simulated single-target trajectory, and (b) multi-target trajectories with δv = 0.5.

414

L. Liu et al.

(a)

(b)

Fig. 10. (a) the results of OSPA of single-target dataset, and (b) multi-target dataset.

Figure 10 reveals that our method is reliable for multi-target tracking and can obtain accurate state estimates in the presence of known measurement noise.

3.4 Tracking on Real Dataset

With the radar terminal software already developed in our laboratory, radar measurement data are gathered by tracking the detection of a UAV. The operator selects the options for controlling the radar in the terminal software, and sets the radar's working mode, horizontal pitch angle, rotation rate, data processing, and other relevant parameters to output target positioning. The terminal software interface shows real-time status feedback for the radar's servo, signal processing, receiving channels, electronic compass, and GPS during the detection process, as shown in Fig. 11.

(a)

(b)

Fig. 11. (a) the radar terminal software, and (b) the trajectory of a UAV.

The terminal software will receive the point cloud data from the radar. Then, it will go through pre-processing, point cloud coalescence, track association, and filter processing. Furthermore, it will display the target’s distance, speed, bearing, and other information in real time on the software interface. Our intention is to more efficiently handle track association and filtering. The following provides an illustration of how well our algorithm performs in real-world settings while tracking UAVs.


Figure 11 displays the UAV's flight route over a specific area in Xi'an along with the outcomes of our radar detection in polar coordinates. Based on the radar's rated error, the path's radius is accurate to roughly ±0.5 m and the servo azimuth and elevation angles to roughly ±0.3°. The flight altitude of the UAV is about 100 m. The results are shown below.

Fig. 12. The result of OSPA of UAVs.

Figure 12 demonstrates the applicability of our technique to UAV tracking, particularly with uncertain radar detection noise. In conclusion, our method is better suited to the case of unknown targets than the other algorithms. In particular, the KF and the RFS based methods need a priori information to establish the motion model and need the noise covariance to obtain the Kalman gain, which the LSTM-KF and KalmanNet also require.

3.5 The Validity of the AGE Module

To ensure the reliability of gain estimation, this section examines the AGE module separately with MIT Trajectory data from frame 0 to 100 and with simulated experiments whose parameters are the same as the trajectory in Fig. 9(a) except for the track time. In this section, we only consider single-target tracking with the KF prediction module. From the MSE results in Fig. 13(a), it can be seen that AGE and the KF have similar errors. Additionally, the LSTM, GRU and BiLSTM show pronounced variations due to incorrect decisions on the allocation of gains at the beginning. To lessen this impact, we intentionally raise the measurement weight ω_z^i when training AGE. However, this causes a large error in AGE at certain moments, such as frame 10. To demonstrate the importance of known measurement noise, we design the following experiments. For the simulated experiments, we adjust the a priori standard deviation of the KF's measurement noise to examine its significance. As seen in Fig. 13(b) and (c), the KF has a significant inaccuracy at the beginning and requires time to converge. Additionally, the convergence rate slows down as δ_v gets larger. Therefore, the KF must first estimate the measurement noise with α−β filtering [17] while tracking an unknown target. Although AGE also fluctuates with the KF, this is mainly caused by the KF prediction, which indicates that AGE is a novel way to solve this type of problem.


(a)

(b)

(c)

Fig. 13. (a) MSE of simulated trajectory with δv = 0.5, (b) The KF with δv = 10, (c) The KF with δv = 20.

4 Conclusion

This paper presents an adaptive gain estimation method. It is an attempt to replace the computation of the Kalman gain with a deep network structure and to learn a new relationship between prediction and measurement. Furthermore, this paper designs the survival likelihood estimation for the problem of low detection probability. AGE also employs an LSTM to obtain the trendless motion states from previous state information, and it establishes an end-to-end tracking method for radar multi-target tracking. The work in this paper centers on the application of AGE, but the accuracy of target tracking is also affected by prediction and association. Future work will focus on using spatio-temporal features [18] and attention mechanisms [19] to improve the accuracy of prediction. We will also enhance the association module by constructing a more sophisticated structure in order to adapt it to high-clutter environments. In addition, the accuracy of the state and survival likelihood estimation of AGE can be improved by adding attention mechanisms.

References

1. Wang, D., Lian, B., Liu, Y., Gao, B.: A cooperative UAV swarm localization algorithm based on probabilistic data association for visual measurement. IEEE Sens. J. (2022)
2. Wu, L., Wang, F., Xu, Y., Jiang, Y., Wang, J.: A parallel implementation of hypothesis-oriented multiple hypothesis tracking. In: 2020 IEEE 23rd International Conference on Information Fusion (FUSION), pp. 1–8 (2020)
3. Mahler, R.: Advances in Statistical Multisource-Multitarget Information Fusion. Artech House, MA (2014)
4. Gao, L., Battistelli, G., Chisci, L., Farina, A.: Fusion-based multidetection multitarget tracking with random finite sets. IEEE Trans. Aero. Elec. Syst. 57(4), 2438–2458 (2021)
5. Shi, K., Shi, Z., Yang, C., He, S., Chen, J., Chen, A.: Road-map aided GM-PHD filter for multivehicle tracking with automotive radar. IEEE Trans. Ind. Inform. 18(1), 97–108 (2022)
6. Park, W.J., Park, C.G.: Multi-target tracking based on Gaussian mixture labeled multi-Bernoulli filter with adaptive gating. In: 2019 First International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics (ICA-SYMP), pp. 226–229 (2021)
7. Li, Q.Y., He, B., Zhang, X.Y.: LSTM-based encoder-decoder multi-step track prediction technique. Air Weapon 28(2), 49–54 (2021)


8. Emambakhsh, E., Bay, A., Vazquez, E.: Convolutional recurrent predictor: implicit representation for multi-target filtering and tracking. IEEE Trans. Signal Process. 67(17), 4545–4555 (2019)
9. Jung, S., Schlangen, I., Charlish, A.: A mnemonic Kalman filter for non-linear systems with extensive temporal dependencies. IEEE Signal Process. Lett. 27, 1005–1009 (2020)
10. Milan, A., Rezatofighi, S.H., Dick, A.: Online multi-target tracking using recurrent neural networks. In: AAAI (2017)
11. Choi, G., Park, J., Shlezinger, N.: Split-KalmanNet: a robust model-based deep learning approach for state estimation. IEEE Trans. Vehicular Technology (2023)
12. Coskun, H., Achilles, F., DiPietro, R., Navab, N., Tombari, F.: Long short-term memory Kalman filters: recurrent neural estimators for pose regularization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5525–5533 (2017)
13. Xu, Y., Ban, Y., Alameda-Pineda, X.: DeepMOT: a differentiable framework for training multiple object trackers. arXiv preprint arXiv:1906.06618 (2019)
14. Xie, B., Dai, S.: A comparative study of extended Kalman filtering and unscented Kalman filtering on Lie group for Stewart platform state estimation. In: 2021 6th International Conference on Control and Robotics Engineering (ICCRE), pp. 145–150 (2021)
15. Schuhmacher, D., Vo, B.-T., Vo, B.-N.: A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process. 56(8), 3447–3457 (2008)
16. Rongli, G., Yan, C.: Summary of spline curve interpolation. In: 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), pp. 1418–1421 (2020)
17. Li, Q., Chen, Z., Shi, W.: A novel state estimation approach for suspension system with time-varying and unknown noise covariance. Actuators 12(2), 70–99 (2023)
18. Huang, X.: Interpretable local flow attention for multi-step traffic flow prediction. Neural Netw. 161, 25–38 (2023)
19. Du, W., Côté, D., Liu, Y.: SAITS: self-attention-based imputation for time series. Expert Syst. Appl. 219, 119619 (2023)

Prompt Based Lifelong Person Re-identification Chengde Yang, Yan Zhang, and Pingyang Dai(B) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, Xiamen 361005, People’s Republic of China [email protected], [email protected]

Abstract. In the real world, training data for person re-identification (ReID) comes in streams and the domain distribution may be inconsistent, which requires the model to incrementally learn new knowledge without forgetting the old knowledge. This problem is known as lifelong person re-identification (LReID). Previous work has focused more on the acquisition of task-irrelevant knowledge and neglected the auxiliary role of task-relevant information in alleviating catastrophic forgetting. To alleviate forgetting and improve generalization ability, we introduce prompts to learn task-relevant information, which can guide the model to perform tasks conditionally. We also propose a special distillation module for the specific vision transformer structure, which further mitigates catastrophic forgetting. In extensive experiments on twelve person re-identification datasets, our method outperforms other state-of-the-art competitors by a margin of 4.7% average mAP in anti-forgetting evaluation and 7.1% average mAP in generalization evaluation.

Keywords: Person re-identification · Lifelong learning · Knowledge distillation · Prompt

1 Introduction

The purpose of Person Re-Identification (Re-ID) is to retrieve the same pedestrian across disjoint camera views. With the development of deep learning in recent years, it has made great progress on multiple large-scale datasets. However, most methods assume that the training data can be accessed all at once or distributed uniformly, which is inconsistent with the realistic scenario. Its training process is mostly limited by fixed datasets, so it has poor generalization performance in real-world streaming data. Lifelong Person Re-ID is proposed to solve the problem of continuous input data stream, which requires the model to learn new knowledge while avoiding catastrophic forgetting of old knowledge. Catastrophic forgetting is the biggest challenge of lifelong learning, and lifelong Person Re-ID is no exception. Since the deep learning model is highly dependent on parameters, the update of the model on the new datasets will lead

Prompt Based Lifelong Person Re-identification

419

to performance degradation on the old ones. Due to storage limitations and privacy protection, maintaining all data for retraining is not feasible. Most current methods use knowledge distillation and sample playback to mitigate catastrophic forgetting, which can effectively promote the learning of task-irrelevant knowledge, but ignore the guiding role of task-relevant information. Extracting robust features is a crucial component of ReID, which has been dominated by CNN-based methods for a long time. These methods suffer from some conspicuous disadvantages, most importantly, CNN-based methods mainly focus on small discriminative regions due to a Gaussian distribution of effective receptive fields [5]. Recently, Vision Transformer (ViT) [6] and Data-efficient image Transformers (DeiT) [7] have shown that pure transformers can be as effective as CNN-based methods on feature extraction for image recognition. With the introduction of multi-head attention modules and the removal of convolution and downsampling operators, transformer-based models are suitable to solve the aforementioned problems in CNN-based ReID for the following reasons. Firstly, compared with CNN models, The multi-head self-attention captures long range dependencies and drives the model to attend diverse human-body parts. Secondly, without downsampling operators, transformer can keep more detailed information. To this end, we propose a new lifelong person ReID method called PLReID. The main contributions of our work are summarized as follows: 1. We propose PLReID, a novel continual learning framework based on prompts for lifelong person Re-ID, providing a new method to tackle lifelong person ReID challenges through learning a prompt pool memory space, which provide specific guidance for different tasks. 2. We find that vision transformer has a natural advantage over CNN in the continual learning setting. We designed a distillation module for its unique structure, and subsequent experiments proved that it can effectively alleviate catastrophic forgetting. 3. Extensive experiments validate the proposed framework significantly outperformed the state-of-the-art methods, and it has quite good anti-forgetting ability and generalization ability.

2 Related Work

2.1 Person ReID

Person ReID has made remarkable progress in a variety of settings. The fully supervised approach aims to learn robust feature representations from labelled data [1,12]. This method is suitable for data with a stable distribution and requires multiple observations of the entire dataset. Later approaches study unsupervised domain adaptation [35,36] and seek unlabeled images to guide the learning process. Domain generalization (DG) is an open-set problem. Lately, the DG ReID task has been explored in [16]. However, these approaches assume that the data will be acquired at once and thus fail to address the challenge of lifelong learning.


Lifelong Person ReID was proposed by AKA [15] to tackle the challenge of long-term visual search. This paradigm continuously consolidates knowledge from distribution-changing data in an incremental manner, which matches real-world applications. AKA constructs learnable knowledge graphs that adaptively accumulate knowledge. GwFReID [14] considers multiple consistency losses to constrain the training process. PKD [33] uses adaptively-chosen patches to pilot the forgetting-resistant distillation. However, current methods ignore the guiding role of domain information in learning different tasks. We introduce prompt learning to learn domain knowledge, so as to provide guidance for learning different tasks, which helps to further alleviate catastrophic forgetting and improve generalization ability.

2.2 Lifelong Learning

Lifelong learning aims to adapt models to new tasks using knowledge learned in the past while maintaining stable performance on the old tasks. Current methods can be divided into three categories: rehearsal-based methods [5,6], architecture-based methods [4] and regularization-based methods [2,3]. Rehearsal-based methods alleviate catastrophic forgetting by recalling stored data of previous tasks [17,18]. However, this approach is costly and has a large demand on storage space. Architecture-based methods aim at having separate components for each task; they usually attend to task-specific subnetworks. However, most of these methods require the task identity to condition the network at test time, and are therefore not applicable to the more realistic class-incremental and task-agnostic settings where the task identity is unknown. Regularization-based methods add regularization terms to limit large changes of model parameters, which has been shown to be one of the most effective ways to mitigate catastrophic forgetting. In our method, we design a special distillation module for the specific vision transformer structure to alleviate forgetting.

2.3 Knowledge Distillation

The goal of knowledge distillation is to transfer knowledge from a large teacher model to a small student model. It has been widely used to compress large-scale pre-trained deep learning models. Knowledge distillation can be summarized into three categories: logit distillation [21,22], feature distillation [13] and relation distillation [10,19], which match the final predictions, intermediate representations and inter-sample relations, respectively. Compared with traditional image-level distillation, Kim et al. [20] proposed patch-level distillation, which is beneficial for learning fine-grained information. Sun et al. [33] use a differentiable patch sampler to select patches, which achieves more efficient patch-based knowledge distillation. Aiming at the special structure of the vision transformer, we design a distillation module combining logit and feature distillation, which is helpful for learning more robust feature representations.

2.4 Prompt Learning

Prompt learning was originally developed for NLP to add additional text to the input so as to make better use of the knowledge of a pre-trained model. The general idea of prompting is to learn a function to modify or assist the input texts or images, such that the language or image model obtains additional information about the task. A number of prompt methods have been proposed recently [8,9], including image-end prompts, language-end prompts and joint language-image prompts. Wang et al. [7] proposed a paradigm to learn prompts independently across domains with a pre-trained model, which can be applied to typical continual learning scenarios. L2P [11] exploits dependent prompting methods for continual learning, which learns to dynamically prompt a pre-trained model to learn tasks sequentially under different task transitions. However, in the training process of these methods, only the prompts are learned and the parameters of the pre-trained model are frozen, so they are extremely dependent on the generalization ability of the pre-trained model and cannot be well adapted to various downstream tasks. In our method, the prompts are trained to learn task-relevant information and to instruct the model to perform tasks conditionally. This can effectively improve the representation of the model and significantly alleviate forgetting.

3 Method

3.1 Problem Definition and Formulation

In terms of LReID, we need to learn S domains in an incremental fashion. Suppose we have a stream of datasets D = {D^1, . . . , D^S}. Each dataset D^s consists of training images D^s_tr and test images D^s_te, where D^s_tr = {(x_i, y_i)}_{i=1}^{N^s} and N^s represents the number of images in dataset D^s_tr. According to the setting of person ReID, the classes of training and testing data are disjoint, namely Y^s_tr ∩ Y^s_te = ∅. At the s-th training step, only D^s_tr is available, which means the data from previous domains can no longer be accessed. In the test phase, the model is evaluated for anti-forgetting ability on the test splits of all seen datasets and for generalization ability on unseen datasets.

3.2 Baseline Approach

We use the graph-based model proposed by AKA [15] as our baseline solution, with its backbone replaced by a vision transformer (ViT). The baseline model consists of three parts, namely a ViT feature extractor h(·; θ) with parameters θ, a classifier f(·; φ) with parameters φ, and a graph-based constraint module g(·; ϕ) with parameters ϕ. Given an input x, it is fed into the feature extractor h and the classifier f to get confidence scores f(h(x; θ); φ). The parameters θ and φ are optimized by a cross-entropy loss,

L_c = − Σ_{(x,y)∈D} y log(σ(f(h(x; θ); φ)))    (1)


The graph-based constraint module g(·; ϕ) uses a stability loss L_s and a plasticity loss L_p to constrain the feature extractor h to generate more robust features. A detailed description can be found in AKA [15]. The total loss function of the baseline method is:

L_base = L_c + λ_s L_s + λ_p L_p    (2)

where λ_s and λ_p are trade-off factors.

3.3 Distillation for Vision Transformer

We use ViT as a feature extractor. Considering its special structure, we design a module combining logit distillation and feature distillation to mitigate catastrophic forgetting. The overall architecture of distillation module can be seen in Fig. 1.

Fig. 1. Overall architecture of our proposed distillation module. The dotted lines show the calculation of the loss function. f_e denotes the embedding layers. T_cls^(t−1) and T_pos^(t−1) denote the class token and position embedding generated by the model trained at the (t−1)-th step.

Logit Distillation. We use logit distillation to prevent large changes between the output logits of the current model and the previous model, which mitigates catastrophic forgetting. The loss function can be described as:

L_dl = − Σ_{x∈D} Σ_{j=1}^{n} σ(f(h(x; θ^(t−1)); φ^(t−1)))_j log σ(f(h(x; θ^(t)); φ^(t)))_j    (3)

where θ^(t−1) and φ^(t−1) are the parameters of the previous step and are frozen in the current step, and n is the number of old classes.


Feature Distillation. In the vision transformer, the class token aggregates the global features, and the position embedding encodes the location information of each token. We found that distilling the class token T_cls and position embedding T_pos generated by the old and new models further mitigates forgetting. We use a mean square error loss to achieve optimization:

L_df = (1/D) Σ_{i=1}^{D} [ (T_cls^(t) − T_cls^(t−1))^2 + (T_pos^(t) − T_pos^(t−1))^2 ]    (4)

where T_cls^(t−1) and T_pos^(t−1) are the embeddings generated by the previous model, and D is the flattened dimension of the class token and position embedding. The total distillation loss function is:

L_d = L_dl + λ_d L_df    (5)

where λ_d is a trade-off factor. The overall loss function is:

L_tol = L_base + λ L_d    (6)
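A PyTorch sketch of the losses in (3)-(6) is given below; treating σ(·) as a softmax and averaging over the batch are assumptions, and the weight values are placeholders rather than the settings used in the experiments.

```python
import torch
import torch.nn.functional as F

def logit_distillation(new_logits, old_logits, n_old):
    """Hedged sketch of L_dl in (3): the frozen previous model's probabilities
    over the n_old old classes supervise the current model's log-probabilities."""
    p_old = F.softmax(old_logits[:, :n_old], dim=1)
    log_p_new = F.log_softmax(new_logits[:, :n_old], dim=1)
    return -(p_old * log_p_new).sum(dim=1).mean()

def feature_distillation(cls_new, cls_old, pos_new, pos_old):
    """Hedged sketch of L_df in (4): MSE between class tokens and position
    embeddings of the current and previous ViT."""
    return F.mse_loss(cls_new, cls_old) + F.mse_loss(pos_new, pos_old)

def total_loss(l_base, l_dl, l_df, lambda_d=1.0, lam=1.0):
    """(5)-(6): L_d = L_dl + lambda_d * L_df and L_tol = L_base + lambda * L_d."""
    return l_base + lam * (l_dl + lambda_d * l_df)
```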

3.4 Prompt Learning

Why Prompt. In the setting of LReID, the data is continuously arriving from different domains, and the task identity is unknown at test time. Previous work has focused more on the acquisition of task-irrelevant knowledge and neglected the auxiliary role of task-relevant information in alleviating catastrophic forgetting. L2P [11] demonstrated the effectiveness of prompts in continual learning tasks, and inspired by it, we introduce prompts into LReID. During the training phase, we expect the prompts to learn task-relevant knowledge, and during the testing phase, the task-relevant prompts guide the model to perform the corresponding task. This promotes the enhancement of both anti-forgetting ability and generalization ability.

Structure Design. We represent the vision transformer (ViT) as f = f_r ∘ f_e, where f_e is the input embedding layer and f_r represents a stack of self-attention layers [11,38]. Given an input 2D image x ∈ R^{H×W×C}, we reshape it into a sequence of flattened 2D patches x_p ∈ R^{L×D}, where L and D are the token length and embedding dimension. The embedding layer f_e projects the patched image to embedding features x_e = f_e(x) ∈ R^{L×D}. Prompts are essentially learnable parameters P_e, which we prepend to the embedding features: x_p = [P_e; x_e]. The extended sequences are fed to the self-attention layers for performing downstream tasks. The structure is shown in Fig. 1.

Prompt Pool and Selection Strategy. Given that task ids are unknowable during the testing phase, we maintain a prompt pool to share knowledge between similar tasks. The prompt pool is defined as P = {P_1, . . . , P_N}, where P_i ∈ R^D


is a single prompt with the same embedding size D as x_e, and N is the total number of prompts. During the training phase, a fixed number T of prompts are learned for each task and added to the prompt pool. During the testing phase, we need to select the prompts that best match the current task to provide task information. Details are shown in Fig. 2. We assume that the class token carries task-relevant information, so we determine which prompts are selected based on their similarity to the class token. Denoting {s_i}_{i=1}^{T} as a subset of T indices from [1, N], the input embedding can be described as x_p = [P_{s1}; . . . ; P_{sT}; x_e], where ; represents concatenation along the token-length dimension. An end-to-end approach is used to train the prompts, which are optimized with the same total loss function constraints as the embedding features.
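A PyTorch sketch of this similarity-based selection is given below; using cosine similarity and performing the selection per image in a batch are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def select_prompts(cls_token, prompt_pool, top_t=5):
    """Pick the T prompts most similar to the class token.

    cls_token:   (B, D) class token of the input images
    prompt_pool: (N, D) learnable prompt pool P
    returns:     (B, T, D) selected prompts
    """
    cls_n = F.normalize(cls_token, dim=-1)          # (B, D)
    pool_n = F.normalize(prompt_pool, dim=-1)       # (N, D)
    sim = cls_n @ pool_n.t()                        # (B, N) cosine similarities
    idx = sim.topk(top_t, dim=1).indices            # (B, T)
    return prompt_pool[idx]

def prepend_prompts(selected, x_embed):
    """x_p = [P_s1; ...; P_sT; x_e] along the token-length dimension."""
    return torch.cat([selected, x_embed], dim=1)    # (B, T+L, D)
```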

Fig. 2. Illustration of the prompt module. Following the practice of L2P [11], we select a group of prompts according to the selection strategy in Sect. 3.4 and prepend them to the input tokens. In the training phase, we use randomly initialized prompts, which are indirectly constrained by the total loss.

4 Experiments

4.1 Datasets and Evaluation Metrics

Dataset. We conduct extensive experiments on the LReID benchmark [15]. It consists of five seen datasets (Market-1501 [39], CUHK-SYSU [40], DukeMTMC-reID [41], MSMT17 V2 [42] and CUHK03 [37]) for anti-forgetting evaluation and seven unseen datasets (VIPeR [23], PRID [24], GRID [25], iLIDS [26], CUHK01 [27], CUHK02 [28] and SenseReID [29]) for generalization evaluation. In order to compare performance under equal conditions, we followed AKA [15] in dataset processing and experimental setting.


Table 1. Dataset statistics of the LReID benchmark [15]. '–' denotes that the dataset is only used for testing.

Type    Dataset               Scale  #train IDs  #test IDs
Seen    Market-1501 [39]      large  500/751     750
        CUHK-SYSU [40]        mid    500/942     2900
        DukeMTMC-reID [41]    large  500/702     1110
        MSMT17 V2 [42]        large  500/1041    3060
        CUHK03 [37]           mid    500/700     700
Unseen  VIPeR [23]            small  –           316
        PRID [24]             small  –           649
        GRID [25]             small  –           126
        i-LIDS [26]           small  –           60
        CUHK01 [27]           small  –           486
        CUHK02 [28]           mid    –           239
        SenseReID [29]        mid    –           1718

To tackle the problem of unbalanced class numbers among datasets, 500 identities were randomly sampled from each seen dataset for training. In the test phase, 3,594 different identities from the seven unseen datasets were merged into a unified test set to evaluate the generalization ability on unseen domains. More detailed statistics for these datasets are provided in Table 1. In order to simulate the real scenario and easily compare the experimental performance with previous work, we used the experimental setup of PKD [33]. There are two training orders: training order-1 represents Market-1501 → CUHK-SYSU → DukeMTMC-reID → MSMT17 V2 → CUHK03, and training order-2 represents DukeMTMC-reID → MSMT17 V2 → Market-1501 → CUHK-SYSU → CUHK03.

Evaluation Metrics. We use mean Average Precision (mAP) and Rank-1 accuracy (R1) to evaluate the performance on each ReID dataset. Moreover, we report average results to evaluate the corresponding overall performance.

4.2 Implementation Details

We utilize a ViT pretrained on ImageNet [34] as the feature extractor. In each training batch, we follow [15] to select 32 identities and randomly sample 4 images for each identity. All images are resized to 256 × 128. The Adam optimizer with learning rate 3.5 × 10^−4 is used. The model is trained for 50 epochs, and the learning rate is decreased by a factor of 0.1 at the 25th and 35th epochs. The retrieval of testing data is based on the Euclidean distance of feature embeddings. For all experiments, we repeat three runs and report the mean performance. The whole


architecture is implemented with PyTorch and trained on a single NVIDIA A100 GPU.

Table 2. Comparison with the state-of-the-art methods on the LReID benchmark [15]. '*' represents the results reported in [33].

(a) Training order-1.
Method       Market      SYSU        Duke        MSMT17      CUHK03      Seen-Avg    Unseen-Avg
             mAP   R1    mAP   R1    mAP   R1    mAP   R1    mAP   R1    mAP   R1    mAP   R1
Finetune     32.7  58.3  58.0  60.6  25.2  43.8   4.5  13.1  41.3  43.4  32.3  43.9  38.4  34.4
SPD [30]     35.6  61.2  61.7  64.0  27.5  47.1   5.2  15.5  42.2  44.3  34.4  46.4  40.4  36.6
LwF [31]     56.3  77.1  72.9  75.1  29.6  46.5   6.0  16.6  36.1  37.5  40.2  50.6  47.2  42.6
CRL [32]     58.0  78.2  72.5  75.1  28.3  45.2   6.0  15.8  37.4  39.8  40.5  50.8  47.8  43.5
AKA [15]     51.2  72.0  47.5  45.1  18.7  33.1  16.4  37.6  27.7  27.6  32.3  43.1  44.3  40.4
AKA [15]*    58.1  77.4  72.5  74.8  28.7  45.2   6.1  16.2  38.7  40.4  40.8  50.8  47.6  42.6
PKD [33]     68.5  85.7  75.6  78.6  33.8  50.4   6.5  17.0  34.1  36.8  43.7  53.7  49.1  45.4
Baseline     66.9  85.4  81.5  82.1  41.7  60.8  15.3  38.2  24.4  23.9  46.0  58.1  54.8  45.7
Ours         69.1  87.1  82.3  83.9  44.0  63.6  17.5  40.6  29.1  29.6  48.4  61.0  56.2  48.4
SingleTrain  76.5  90.4  82.3  84.3  69.2  83.5  37.8  62.6  50.5  52.9  63.3  74.7  –     –

(b) Training order-2.
Method       Duke        MSMT17      Market      SYSU        CUHK03      Seen-Avg    Unseen-Avg
             mAP   R1    mAP   R1    mAP   R1    mAP   R1    mAP   R1    mAP   R1    mAP   R1
Finetune     26.1  45.7   3.3  10.3  29.1  54.1  57.2  60.0  40.3  40.9  31.2  42.2  36.1  32.0
SPD [30]     28.5  48.5   3.7  11.5  32.3  57.4  62.1  65.0  43.0  45.2  33.9  45.5  39.8  36.3
LwF [31]     42.7  61.7   5.1  14.3  34.4  58.6  69.9  73.0  34.1  34.1  37.2  48.4  44.0  40.1
CRL [32]     43.5  63.1   4.8  13.7  35.0  59.8  70.0  72.8  34.5  36.8  37.6  49.2  45.3  41.4
AKA [15]*    42.2  60.1   5.4  15.1  37.2  59.8  71.2  73.9  36.9  37.9  38.6  49.4  46.0  41.7
PKD [33]     58.3  74.1   6.4  17.4  43.2  67.4  74.5  76.9  33.7  34.8  43.2  54.1  48.6  44.1
Baseline     58.7  76.3  16.4  38.2  43.8  67.5  78.7  81.4  22.1  22.9  43.9  57.3  53.2  44.8
Ours         61.4  78.2  17.1  39.5  45.7  69.7  80.7  82.9  26.7  27.1  46.3  59.5  56.1  48.6
SingleTrain  69.2  83.5  37.8  63.6  76.5  90.4  82.3  84.3  50.5  52.9  63.3  74.7  –     –



4.3 Performance Evaluation

We compare our method to six lifelong learning methods that do not rely on exemplar memory: Finetune, SPD [30], LwF [31], CRL [32], AKA [15] and PKD [33]. Finetune denotes fine-tuning the model on new datasets without knowledge distillation. SPD is an advanced feature distillation method, while LwF, CRL and AKA use logit distillation. PKD proposed a novel patch-level distillation method. For convenience, we used the off-the-shelf results of Finetune, SPD, LwF, and CRL reported by PKD. The final result of each method on the LReID benchmark is shown in Table 2. We also report the upper bound for each setting, estimated by SingleTrain, where we trained each dataset individually with our method. Figure 4 is a visualization of the prompt embeddings during the training phase and the selected prompts during the testing phase, indicating that the prompts did learn task-relevant information and instructed the model to perform tasks conditionally during the testing phase.


Anti-forgetting Performance on Seen Datasets. As shown in Table 2, we improve the average mAP of the state-of-the-art method PKD by 4.7% and 5.4% under training order-1 and order-2 on seen datasets, respectively. The performance improvement on average R1 is also significant. We achieved optimal performance on all seen datasets except CUHK03. Unlike other related methods, we use the SingleTrain results as the upper-bound performance, which better measures the performance degradation caused by incremental learning. Table 2 shows that most datasets have small performance gaps, suggesting that our approach effectively mitigates catastrophic forgetting. Figure 3(a) illustrates the mAP and R1 curves on the first seen dataset after each training step. We can observe that our method is more stable and achieves the overall best performance.

Generalization Ability on Unseen Datasets. We improve the average mAP of the state-of-the-art method PKD by 7.1% and 7.5% under training order-1 and order-2 on unseen datasets, respectively. Figure 3(b) depicts the trend of average mAP and R1 on all unseen datasets during the training process. It can be seen that the performance is significantly better than that of alternative methods. This shows that our method has more outstanding generalization ability.

(a) Evolution of anti-forgetting performance on the first seen dataset during training process.

(b) Evolution of generalization ability on unseen datasets during training process.

Fig. 3. Evaluation of performance during training process. The training order of the two figures on the left is order-1, and that of the two figures on the right is order-2.


Fig. 4. Visualization for prompt. (a) shows t-SNE for prompt embedding, which comes from the final prompt pool after the training phase. The red, blue, yellow, green and black dots represent the trained prompts obtained from the corresponding datasets in training order-1, respectively. (b) shows the selected prompts for seen dataset Market during the testing phase. (c) shows the selected prompts for unseen dataset CUHK01 during the testing phase. (Color figure online)

4.4 Ablation Studies

In this section, we conduct experiments under training order-1 to verify the effectiveness of the distillation module and the prompt strategy. In Table 3, the "Baseline" setting is our baseline in Sect. 3.2. "Baseline + Ldl" denotes logit distillation added to the baseline model, and the "Baseline + Ldl + Ldf" setting indicates that we use the entire distillation module. The "Baseline + Ld + prompt" setting is our full method. As shown in Table 3, both logit distillation and feature distillation contribute to performance. Prompt has a significant boost effect on seen datasets, but may be counterproductive on unseen datasets. This is because, due to the limited number of tasks during the training phase, the accumulated prompts in the prompt pool cannot cover the entire task space. We also studied the influence of hyperparameters. For the distillation module, we set the loss weight λd to 2.5 and λ to 1; the settings of λp and λs are the same as those of AKA. We also compared the effect of the number of prompts selected for each task, T, on the performance, and found that T = 5 is a relatively good value.

Table 3. Effectiveness of each module.

Setting                     Seen-Avg       Unseen-Avg
                            mAP    R1      mAP    R1
Baseline                    46.0   58.1    54.8   45.7
Baseline+Ldl                46.8   58.7    55.6   46.9
Baseline+Ldl+Ldf            47.2   59.3    56.4   48.1
Baseline+Ld+prompt (Full)   48.4   61.0    56.2   48.4

4.5 Discussion

Compared to recent works, our advantage is obvious: we take into account the prompting role of task-relevant information, which has a significant effect on performance. However, the limitation is that distillation alone does not guarantee that the old knowledge can be updated effectively. Constructing pseudo feature maps by mapping is a good option, which will be considered in future work.

5 Conclusion

In this paper, we study the lifelong person ReID problem, which is significant for real-world applications but remains under-explored. Catastrophic forgetting is one of the most difficult challenges in LReID. We proposed a prompt-based method (PLReID) to solve this problem, which consists of a prompt module and a ViT-based distillation module. Specifically, the prompts are trained to learn task-relevant information during the training phase, while during the testing phase they are used to instruct the model to perform tasks conditionally. The distillation module is designed according to the special structure of the ViT, which has been shown to further alleviate forgetting. Extensive experiments on the LReID benchmark demonstrate that our method outperforms state-of-the-art methods in both anti-forgetting and generalization evaluations.

Acknowledgements. This work was supported by National Key R&D Program of China (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).

References 1. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person reidentification. In: CVPR, pp. 2285–2294 (2018) 2. Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2935–2947 (2018) 3. Zhao, B., Xiao, X., Gan, G., Zhang, B., Xia, S.: Maintaining discrimination and fairness in class incremental learning. In: CVPR, pp. 13208–13217 (2020) 4. Serr´ a, J., Sur´ıs, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: ICML, pp. 4548–4557 (2018) 5. Rebuffi, S.-A., Kolesnikov, A., Sperl, G., Lampert, C.H.: Incremental classifier and representation learning. In: CVPR, pp. 5533–5542 (2017) 6. Hou, S., Pan, X., Loy, C.C., Wang, Z., Lin, D.: Learning a unified classifier incrementally via rebalancing. In: CVPR, 831–839 (2019) 7. Wang, Y., Huang, Z., et al. S:-prompts learning with pre-trained transformers: an Occam’s Razor for domain incremental learning. In: NeurIPS, pp. 5682–5695 (2022)


8. Bahng, H., Jahanian, A., Sankaranarayanan, S.: Visual prompting: modifying pixel space to adapt pre-trained models. arXiv preprint arXiv:2203.17274 (2022) 9. Huang, T., Chu, J., Wei, F.: Unsupervised prompt learning for vision-language models. arXiv preprint arXiv:2204.03649 (2022) 10. Park, W., Kim, D., Lu, Y., Cho, M.: Relational knowledge distillation. In: CVPR, pp. 3967–3976 (2019) 11. Wang, Z., Zhang, Z., Lee, C.-Y., Zhang, H., et al.: Learning to prompt for continual learning. In: CVPR, pp. 139–149 (2022) 12. He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: Transreid: transformerbased object re-identification. In: ICCV, pp. 15013–15022 (2021) 13. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., et al.: Fitnets: hints for thin deep nets. arXiv preprint arXiv:1412.6550 (2014) 14. Pu, N., Chen, W., Liu, Y., Bakker, E.M., Lew, M.S.: Generalising without forgetting for lifelong person re-identification. In: AAAI, pp. 2889–2897 (2021) 15. Pu, N., Chen, W., Liu, Y., Bakker, E.M., Lew, M.S.: Lifelong person reidentification via adaptive knowledge accumulation. In: CVPR, pp. 7901–7910 (2021) 16. Song, J., Yang, Y., Song, Y.-Z., Xiang, T.: Generalizable person re-identification by domain-invariant mapping network. In: CVPR, pp. 719–728 (2019) 17. Castro, F.M., Mar´ın-Jim´enez, M.J., Guil, N., Schmid, C., Alahari, K.: End-to-end incremental learning. In: ECCV, pp. 233–248 (2018) 18. Lopez-Paz, D., Ranzato, M.A.: Gradient episodic memory for continual learning. In: NeurIPS, pp. 6470–6479 (2017) 19. Liu, Y., Cao, J., Li, B., Yuan, C., et al.: Knowledge distillation via instance relationship graph. In: CVPR, pp. 7096–7104 (2019) 20. Kim, Y., Park, J., Jang, Y., Ali, M., et al.: Distilling global and local logits with densely connected relations. In: ICCV, pp. 6290–6300 (2021) 21. Hinton, G., Vinyals, O., Dean, J., et al.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015) 22. Ba, J., Caruana, R.: Do deep nets really need to be deep? In: NeurIPS, pp. 2654– 2662 (2014) 23. Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 262–275. Springer, Heidelberg (2008). https://doi.org/10. 1007/978-3-540-88682-2 21 24. Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Heyden, A., Kahl, F. (eds.) Image Analysis, pp. 91–102. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3642-21227-7 9 25. Loy, C.C., Xiang, T., Gong, S.: Time-delayed correlation analysis for multi-camera activity understanding. In: IJCV, pp. 106–129 (2010) 26. Zheng, W.-S., et al.: Associating groups of people. In: BMVC, pp. 1–11 (2009) 27. Li, W., Zhao, R., Wang, X.: Human re-identification with transferred metric learning. In: ACCV, pp. 31–44 (2012) 28. Li, W., Wang, X.: Locally aligned feature transforms across views. In: CVPR, pp. 3594–3601 (2013) 29. Zhao, H., Tian, M., et al.: Spindle net: person re-identification with human body region guided feature decomposition and fusion. In: CVPR, pp. 1077–1085 (2017) 30. Tung, F., Mori, G.: Similarity-preserving knowledge distillation. In: ICCV, pp. 1365–1374 (2019)


31. Li, Z., et al.: Learning without forgetting. In: TPAMI, pp. 2935–2947 (2017) 32. Zhao, B., Tang, S., Chen, D., Bilen, H., Zhao, R.: Continual representation learning for biometric identification. In: WACV, pp. 1198–1208 (2021) 33. Sun, Z., Mu, Y.: Patch-based Knowledge distillation for lifelong person reidentification. In: ACM MM, pp. 696–707 (2022) 34. Russakovsky, O., Deng, J., Su, H., Krause, J., et al.: Imagenet large scale visual recognition challenge. In: IJCV, pp. 211–252 (2015) 35. Dai, Y., Liu, J., Sun, Y., Tong, Z., et al.: IDM: an intermediate domain module for domain adaptive person re-id. In: ICCV, pp. 11864–11874 (2021) 36. Ge, Y., et al.: Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv preprint arXiv:2001.01526 (2020) 37. Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: deep filter pairing neural network for person re-identification. In: CVPR, pp. 152–159 (2014) 38. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR (2021) 39. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person reidentification: a benchmark. In: ICCV, pp. 1116–1124 (2015) 40. Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: End-to-end deep learning for person search. arXiv preprint arXiv:1604.01850 (2016) 41. Ristani, E., Solera, F., Zou, R., Cucchiara, R.: Performance measures and a data set for multi-target, multi-camera tracking. In: ECCV, pp. 17–35 (2016) 42. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer gan to bridge domain gap for person re-identification. In: CVPR, pp. 79–88 (2018)

Hierarchical Focused Feature Pyramid Network for Small Object Detection Siwei Wang , Zhiwei Chen , Haoyang Ding , and Liujuan Cao(B) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005 Xiamen, People’s Republic of China [email protected]

Abstract. Small object detection has been a persistently practical and challenging task in the field of computer vision. Advanced detectors often utilize a feature pyramid network (FPN) to fuse the features generated from various receptive fields, which improves the detection of multiscale objects, especially small objects. However, existing FPNs typically employ a naive addition-based fusion strategy, which neglects crucial details that may exist only at specific levels. These details are vital for accurately detecting small objects. In this paper, we propose a novel Hierarchical Focused Feature Pyramid Network (HFFPN) to enhance these details while ensuring the detection performance for objects of other scales. HFFPN consists of two key components: a Hierarchical Feature Subtraction Module (HFSM) and Feature Fusion Guidance Attention (FFGA). HFSM is first designed to selectively amplify the information important to small object detection. FFGA is devised to focus on effective features by utilizing global information and mining small objects' information from high-level features. Combining these two modules contributes greatly to the original FPN. In particular, the proposed HFFPN can be incorporated into most mainstream detectors, such as Faster R-CNN, RetinaNet, FCOS, etc. Extensive experiments on small object datasets demonstrate that HFFPN achieves consistent and significant improvements over the baseline algorithm while surpassing the state-of-the-art methods.

Keywords: Small object detection · Feature pyramid network · Self-attention

1 Introduction

Object detection is a widely studied task that aims to locate and classify the objects of interest. In recent years, object detection has achieved remarkable progress due to the powerful ability of Convolutional Neural Networks (CNNs) and the availability of an enormous amount of data [4]. However, as an important branch of object detection, small object detection has always been a bottleneck for detector performance. Small objects, which typically refer to objects with a pixel


size of less than 1024 (32 × 32) [18], are of great research significance in practical scenarios such as remote sensing detection [1,14], disaster rescue [22,38], and intelligent transportation systems [20,31]. Unfortunately, the features of small objects are extremely limited, making them susceptible to background and noise interference. Moreover, these weak features are likely to be lost during the feature extraction and downsampling process, leading to a noticeable drop in detection performance when dealing with small objects. For example, Faster R-CNN [24] achieves an mAP of 41.0% and 48.1% for medium and large objects on the COCO dataset [18], respectively, but the result for small objects drops significantly to only 21.2%. Therefore, as a task with both theoretical significance and practical demand, effectively enhancing the detection performance on small objects is an urgent and important problem to be solved.

(a) FPN    (b) Expanded FPNs    (c) HFFPN (Ours)

Fig. 1. Pictorial demonstrations of existing feature pyramid networks.

In order to detect objects of various sizes, advanced detectors often adopt a divide-and-conquer approach that utilizes larger receptive fields to detect large objects and smaller ones to detect small objects. This principle is usually reflected in the Feature Pyramid Network (FPN) [16]. As shown in Fig. 1, many studies have noticed the importance of FPN and attempted to fuse low-level and high-level features in a more effective manner to obtain better detection results. Consequently, many FPN variants have been devised to achieve more comprehensive feature fusion [7,19,27]. We collectively refer to them as the Expanded FPNs. However, the fusion strategies of expanded FPNs are generally accomplished by the element-wise addition operation, and the only difference between them is the level of the fused features. In contrast, to extract detailed features that are conducive to small object detection, the element-wise subtraction between the corresponding levels may better obtain edge information [28]. It should be noted


that in the high-level feature layers, the information of small objects is almost submerged in the frequent downsampling process, and subtracting such features cannot extract small object features. Instead, it may lead to the loss of main body features. Therefore, a hierarchical feature fusion strategy is necessary. On the other hand, we notice that the fused features have information of different scales. Using global information can help to guide the refinement of each level of features, thus improving detection performance [15]. Based on the above observations, we propose a novel approach called Hierarchical Focused Feature Pyramid Network (HFFPN). HFFPN mainly consists of two parts: Hierarchical Feature Subtraction Module (HFSM) and Feature Fusion Guidance Attention (FFGA). HFSM leverages the feature subtraction operation to obtain the edge information of objects. To avoid erasure effects on main body information caused by subtraction operations at higher semantic levels, HFSM adopts a hierarchical subtraction strategy. Besides, the proposed FFGA introduces a novel attention mechanism for small object detection by incorporating both self-features and higher-level features in the generation of attention weights. It deviates from the common self-attention methods [12,30], which solely relies on the self-features. The adjacent feature levels often contain richer interaction information, particularly with low-level features assisting high-level features in exploring potential information on small objects. To sum up, our contributions are summarized as follows: – We design a brand-new Hierarchical Feature Subtraction Module (HFSM). It fully utilizes the difference of information between feature layers and helps to improve the performance of small object detection. The hierarchical strategy employed in HFSM further enhances the robustness of the model. – We introduce a Feature Fusion Guidance Attention (FFGA) to utilize the global fused information. The self-attention mechanism used highlights useful information and suppresses noise information by weighting the features of itself, helping to explore potential information of small objects. – Extensive experiments on the DOTA and COCO datasets demonstrate that the proposed HFFPN significantly improves the performance of the baseline algorithm and surpasses the current state-of-the-art detectors.

2 Related Work

2.1 Small Object Detection

With the development of deep learning, extensive research has been carried out on small object detection. There have been numerous attempts to enhance the performance of small object detection from different perspectives, all with the common goal of increasing the exploitable features of small objects. SCRDet [37] achieves a more refined feature fusion network by introducing flexible downsampling strides, allowing for the detection of a broader spectrum of smaller objects with greater precision. R3Det [36] designs a feature refinement module to enhance the detection performance of small objects. Oriented RepPoints [13]


captures features from adjacent objects and background noise for adaptive point learning, which utilizes contextual information to discover small objects.

2.2 Feature Pyramid Network

It is a consensus that the shallow layers are usually rich in detailed information but lack abstract semantic information, while the deeper layers are the opposite due to downsampling. Smaller objects predominantly rely on shallow features and can be more effectively detected by detectors with smaller receptive fields. The Feature Pyramid Network [16] combines deep-layer and shallow-layer features by building a top-down pathway to form a feature pyramid. PAFPN [19] enriches the feature hierarchy by adding a bottom-up path, enhancing deeper features without losing information from the shallow layers. HRFPN [27] utilizes multiple cross-branch convolutions to enhance feature expression. NAS-FPN [7] searches for the optimal combination method for feature fusion in each layer.

2.3 Self-Attention

The attention mechanism exhibits an impressive capability to quickly concentrate on and distinguish objects within a scene, while effectively ignoring irrelevant aspects. Self-attention is a powerful technique in deep learning that allows a model to selectively focus on different parts of the input, effectively capturing dependencies and relationships within it. Spatial self-attention and channel self-attention are two common kinds of self-attention. SENet [12] was the first to propose channel attention: it uses an SE block to gather global information through channel-wise relationships and enhance the representation capacity. CBAM [30] sequentially generates attention feature maps in both channel and spatial dimensions for adaptive feature refinement, resulting in the final feature map. The self-attention mechanism has also shown promising performance in handling small objects. SCRDet [37] utilizes pixel attention and channel attention to highlight small object regions while mitigating the impact of noise interference. CrossNet [14] develops a cross-layer attention module to enhance the detection of small objects by generating more pronounced responses.

3 Methodology

3.1 Overview

In order to fully utilize the information of small objects, we propose a novel feature pyramid network, named HFFPN, as shown in Fig. 2. The detector receives the input image I and sends it to the backbone network for feature extraction. The image feature Ci gradually becomes richer in semantic information during the subsampling process while losing detailed information. Ci is then passed through the proposed Hierarchical Feature Subtraction Module (HFSM) to obtain intermediate feature Mi in a top-down manner. Next, Mi is further


fused through a convolution with a kernel size of 3 to obtain the fused feature Pi. Finally, Pi is sent to the proposed Feature Fusion Guidance Attention (FFGA) to obtain the focused feature Oi, which concentrates on effective information, especially that of small objects. The focused feature Oi is then used by the model to predict the category and location of objects.

Fig. 2. Overview of the proposed HFFPN, which consists of HFSM and FFGA (diagram components: feature subtraction, 3 × 3 convolution, attention mechanism).

3.2

Hierarchical Feature Subtraction Module

The Hierarchical Feature Subtraction Module (HFSM) is designed to enhance the specific details of low-level features in the feature pyramid. Generally, features at the bottom of the pyramid have higher resolution and smaller receptive fields, and contain local information such as edges, textures, and colors, which are crucial for detecting small objects. However, the widely used fusion strategy, i.e., element-wise addition, fails to enhance the local information that is unique to each level. To cope with this, we propose HFSM, which adopts a hierarchical subtraction operation to highlight the local information and thereby alleviate the above problem. The specific process of HFSM is as follows. Firstly, the input image I passes through the backbone network to obtain the image features Ci:

\[
C_i =
\begin{cases}
I, & i = 0,\\
\mathcal{F}(C_{i-1}), & i = 1, \dots, t,
\end{cases}
\tag{1}
\]

where F(·) denotes the convolution block in the backbone and t is the number of feature layers. Secondly, Ci is processed by HFSM to obtain the intermediate feature Mi. The proposed HFSM aims to better extract detailed information from different feature levels. The subtraction operation can capture the differential information between two feature levels, which often includes fine-grained or edge information crucial for detecting small objects. Afterwards, the intermediate


features are further fused through a 3 × 3 convolutional layer. These processes can be represented by the following equations:

\[
M_i =
\begin{cases}
\sigma(C_i), & i = t,\\
\sigma(C_i) \oplus \mathrm{UP}(M_{i+1}), & i = l+1, \dots, t-1,\\
\tfrac{1}{2}\bigl(\sigma(C_i) \oplus \mathrm{UP}(M_{i+1})\bigr) \oplus \bigl|\sigma(C_i) \ominus \mathrm{UP}(M_{i+1})\bigr|, & i = 2, \dots, l,
\end{cases}
\tag{2}
\]

\[
P_i = \mathrm{conv}_{3\times 3}(M_i), \tag{3}
\]

where σ(·) denotes a 1 × 1 convolution, UP(·) represents upsampling with a ratio of 2, ⊕ and ⊖ denote element-wise addition and element-wise subtraction, respectively, |·| indicates taking absolute values, and l is a hyperparameter of the hierarchical strategy.
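To make the hierarchical fusion concrete, the following is a minimal PyTorch sketch of Eqs. (2)–(3); it is not the authors' implementation, and names such as HFSM and lateral_convs, as well as the nearest-neighbor upsampling choice, are our own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HFSM(nn.Module):
    """Minimal sketch of the Hierarchical Feature Subtraction Module (Eqs. 2-3).

    `in_channels` lists the channel widths of the backbone features C_2..C_t
    (high resolution to low resolution); `l` is the level up to which the
    subtraction branch is enabled.
    """

    def __init__(self, in_channels, out_channels, l=2):
        super().__init__()
        self.l = l
        # sigma(.): 1x1 lateral convolutions
        self.lateral_convs = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 convolutions producing P_i from M_i (Eq. 3)
        self.fpn_convs = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, feats):  # feats: [C_2, ..., C_t]
        laterals = [conv(c) for conv, c in zip(self.lateral_convs, feats)]
        mids = [None] * len(laterals)
        mids[-1] = laterals[-1]                      # M_t = sigma(C_t)
        for i in range(len(laterals) - 2, -1, -1):   # top-down pathway
            up = F.interpolate(mids[i + 1], size=laterals[i].shape[-2:], mode="nearest")
            add = laterals[i] + up                   # element-wise addition
            if (i + 2) <= self.l:                    # hierarchical subtraction branch
                mids[i] = 0.5 * add + torch.abs(laterals[i] - up)
            else:
                mids[i] = add
        return [conv(m) for conv, m in zip(self.fpn_convs, mids)]  # P_i (Eq. 3)
```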

3.3 Feature Fusion Guidance Attention

Feature Fusion Guidance Attention (FFGA) is a generalized self-attention mechanism that can effectively focus on useful information, especially small object information. In the feature pyramid, the fused features contain multi-scale information from different levels, and adjacent levels have stronger complementary abilities in feature distribution due to their similar receptive fields. Based on the features between adjacent levels, self-attention is designed to guide the current level of features to focus on useful parts, which can effectively improve the quality of each feature layer and thus improve detection performance. Specifically, the process of FFGA guiding feature focusing can be expressed as Fig. 3.

Fig. 3. Diagram of the FFGA (channel attention and spatial attention branches, each built from average pooling, conv-ReLU and conv-sigmoid layers, followed by dimension expansion and a 1 × 1 convolution).

Firstly, the inputs of FFGA are the current-layer feature Pi ∈ R^(C×H×W) and the upper-layer feature Pi+1 ∈ R^(C×H/2×W/2). These two features are concatenated along the


channel dimension to obtain the guided feature Fg ∈ R^(2C×H×W). Fg is then sequentially fed into the channel attention (CA) and spatial attention (SA) modules, and we obtain the attention feature Fa ∈ R^(2C×H×W). Afterwards, Fa is passed through a 1 × 1 convolution to generate the attention map Wa ∈ R^(1×H×W). This map is multiplied as an attention weight with the current-layer feature Pi to obtain the focused feature Oi ∈ R^(C×H×W) after attention guidance. This process can be represented by the following formulas:

\[
F_g = \mathrm{concat}(P_i, \mathrm{UP}(P_{i+1})), \tag{4}
\]
\[
F_a = \mathrm{SA}(\mathrm{CA}(F_g)), \tag{5}
\]
\[
W_a = \mathrm{conv}_{1\times 1}(F_a), \tag{6}
\]
\[
O_i =
\begin{cases}
P_i \otimes W_a, & i = 2, \dots, t-1,\\
P_i, & i = t,
\end{cases}
\tag{7}
\]

where the composition of channel attention and spatial attention has been detailed in Fig. 3. They have a similar structure that mainly consists of an average pooling layer, a 1 × 1 convolution layer followed by a ReLU activation, and a 1 × 1 convolution layer followed by a sigmoid activation. The input feature generates attention focusing on channel and spatial dimensions in the two modules respectively. After dimension expansion, they are element-wise added to the original feature, allowing the original feature to obtain a different degree of attentional gain in the channel and spatial dimensions.
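As a concrete illustration, below is a minimal PyTorch sketch of FFGA following Eqs. (4)–(7) and the structure sketched in Fig. 3; the class name, the channel reduction ratio, and the exact way the channel/spatial excitations are added back to the feature are our assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFGA(nn.Module):
    """Sketch of Feature Fusion Guidance Attention for one pyramid level."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        c2 = 2 * channels
        # Channel attention: global average pooling -> conv-ReLU -> conv-sigmoid.
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c2, c2 // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c2 // reduction, c2, 1), nn.Sigmoid(),
        )
        # Spatial attention: channel-wise average map -> conv-ReLU -> conv-sigmoid.
        self.sa = nn.Sequential(
            nn.Conv2d(1, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(1, 1, 1), nn.Sigmoid(),
        )
        self.out_conv = nn.Conv2d(c2, 1, kernel_size=1)  # produces W_a (Eq. 6)

    def forward(self, p_i, p_up):
        # Eq. (4): concatenate the current level with the upsampled upper level.
        up = F.interpolate(p_up, size=p_i.shape[-2:], mode="nearest")
        f_g = torch.cat([p_i, up], dim=1)
        # Eq. (5): channel then spatial excitation, each added back to the feature.
        f_c = f_g + self.ca(f_g) * f_g
        f_a = f_c + self.sa(f_c.mean(dim=1, keepdim=True)) * f_c
        # Eqs. (6)-(7): 1x1 conv to a single-channel map that re-weights P_i.
        return p_i * self.out_conv(f_a)
```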

4 Experiments

4.1 Datasets

DOTA [32] is a rotation-based small object dataset in the remote sensing field. It contains 2,806 images with a total of 188,282 instances. The detection targets in DOTA include 15 common categories in remote sensing images, namely Bridge (BR), Harbor (HA), Ship (SH), Plane (PL), Helicopter (HC), Small vehicle (SV), Large vehicle (LV), Baseball diamond (BD), Ground track field (GTF), Tennis court (TC), Basketball court (BC), Soccer-ball field (SBF), Roundabout (RA), Swimming pool (SP), and Storage tank (ST). COCO [18] is the most popular dataset for object detection. Due to its definition of small objects and its specialized evaluation metric mAPs, COCO is commonly used as a well-recognized benchmark for small object detection.

4.2 Experiment Settings

We employed ResNet-50 and ResNet-101 [11] pre-trained on ImageNet [25] as backbone networks. We utilized the SGD algorithm with a momentum of 0.9 and a weight decay of 0.0001 for network optimization. The initial learning rate warms up at a rate of 0.001 per iteration for the first 500 iterations. The training schedule for all experiments was consistent. We trained for 12 epochs on the two datasets, and the learning rate decays at epochs 8 and 11 with a ratio of 0.1. The code for all experiments was built on the MMDetection [2] platform.
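As a concrete illustration of this schedule, the following is a minimal plain-PyTorch sketch (not the authors' MMDetection configuration); the base learning rate and the number of iterations per epoch are placeholders.

```python
import torch

def build_optimizer_and_scheduler(model, base_lr=0.01, iters_per_epoch=1000):
    """Sketch of the reported schedule; base_lr and iters_per_epoch are placeholders."""
    optimizer = torch.optim.SGD(
        model.parameters(), lr=base_lr, momentum=0.9, weight_decay=1e-4
    )
    # Linear warmup over the first 500 iterations.
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=1e-3, total_iters=500
    )
    # Decay by 0.1 at epochs 8 and 11 of a 12-epoch run (stepped per iteration).
    step = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[8 * iters_per_epoch, 11 * iters_per_epoch], gamma=0.1
    )
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup, step], milestones=[500]
    )
    return optimizer, scheduler
```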

4.3 Comparison Results

Results on DOTA. We selected the RoI Transformer [5], a general method for aerial object detection, as the baseline algorithm. Table 1 reports the comparison results on the DOTA test set. With ResNet-50 as the backbone, our method obtains 76.64% mAP50, improving the performance of the baseline by approximately 1% and thereby surpassing the performance of the state-of-the-art algorithms. With ResNet-101 as the backbone, HFFPN also increases the baseline's performance by 0.87% mAP50, achieving the best result on the DOTA dataset. These results fully demonstrate HFFPN's advantages on small object detection and reflect its potential applications. Figure 4 provides a more intuitive visual comparison.

Table 1. Comparison with state-of-the-art methods on DOTA test set. The reported results come from AerialDetection [6] and OBBDetection [33]. ‡ indicates the result of our re-implementation. Note that we only list some classes for better display.

Methods | Backbone | GTF | SV | SH | SBF | HA | SP | mAP50
Single-stage Methods
RSDet [23] | R152-FPN | 68.50 | 70.20 | 73.60 | 64.30 | 66.10 | 69.30 | 74.10
R3Det [36] | R152-FPN | 66.10 | 70.92 | 78.21 | 61.81 | 68.16 | 69.83 | 73.74
S2A-Net [9] | R50-FPN | 71.11 | 78.11 | 87.25 | 60.36 | 65.26 | 69.13 | 74.12
R3Det-DCL [35] | R152-FPN | 69.70 | 76.84 | 87.30 | 63.50 | 68.96 | 68.79 | 75.54
Two-stage Methods
SCRDet [37] | R101-FPN | 68.36 | 68.36 | 72.41 | 65.02 | 66.25 | 68.24 | 72.61
Gliding Vertex [34] | R101-FPN | 77.34 | 73.01 | 86.82 | 59.55 | 72.94 | 70.86 | 75.02
ReDet [10] | ReR50-ReFPN | 74.00 | 78.13 | 88.04 | 61.76 | 72.10 | 68.07 | 76.25
Oriented R-CNN [33] | R101-FPN | 76.92 | 74.27 | 87.52 | 65.51 | 74.36 | 70.15 | 76.28
Anchor-free Methods
PIoU‡ [3] | R50-FPN | 68.90 | 77.58 | 81.57 | 60.47 | 57.68 | 65.12 | 69.68
O2-DNet [29] | H-104 | 61.21 | 71.32 | 78.62 | 60.93 | 58.21 | 66.98 | 71.04
DRN [21] | H-104 | 64.10 | 76.22 | 85.84 | 57.65 | 69.30 | 69.63 | 73.23
CFA [8] | R101-FPN | 67.17 | 79.99 | 84.46 | 54.86 | 73.04 | 70.24 | 75.05
Oriented RepPoints [13] | R101-FPN | 71.76 | 79.95 | 87.33 | 59.15 | 75.23 | 73.75 | 76.52
RoI-Trans.‡ [5] | R50-FPN | 76.65 | 78.40 | 87.55 | 60.12 | 74.89 | 69.70 | 75.70
RoI-Trans.‡ [5] | R101-FPN | 75.75 | 78.11 | 87.46 | 63.80 | 76.05 | 71.35 | 76.02
RoI-Trans. (Ours) | R50-HFFPN | 78.11 | 78.33 | 87.71 | 65.24 | 75.39 | 72.99 | 76.64
RoI-Trans. (Ours) | R101-HFFPN | 79.84 | 77.97 | 87.68 | 65.00 | 76.36 | 71.67 | 76.89

Results on COCO. On the COCO dataset, we applied HFFPN to two-stage [24], one-stage [17], and anchor-free [26] detectors, respectively. Table 2 shows the performance gain brought by HFFPN. Although the overall mAP improvement is not significant due to the small proportion of small objects in the COCO dataset, the consistent and significant increase in the mAPs metric indicates that HFFPN makes detectors more capable of detecting small objects while maintaining their detection capabilities at other object scales.


Table 2. Comparison experiment on COCO. The baseline results come from [2].

Methods | Backbone | mAP | mAP50 | mAP75 | mAPs | mAPm | mAPl
Faster RCNN [24] | R50-FPN | 37.4 | 58.1 | 40.4 | 21.2 | 41.0 | 48.1
Faster RCNN | R50-HFFPN | 37.6 | 58.4 | 40.7 | 21.9 (+0.7) | 40.9 | 48.3
RetinaNet [17] | R50-FPN | 36.5 | 55.4 | 39.1 | 20.4 | 40.3 | 48.1
RetinaNet | R50-HFFPN | 36.6 | 55.7 | 39.1 | 21.2 (+0.8) | 40.3 | 48.0
FCOS [26] | R50-FPN | 36.6 | 56.0 | 38.8 | 21.0 | 40.6 | 47.0
FCOS | R50-HFFPN | 36.6 | 55.9 | 38.7 | 21.7 (+0.7) | 40.2 | 47.2

4.4 Ablation Study


To further verify the advantages and effectiveness of the proposed method, we conduct a series of experiments on the DOTA dataset. The baseline algorithm is RoI Transformer with ResNet-50.

Evaluation for Component Effectiveness. To evaluate the effects of HFSM and FFGA, we carry out several ablation experiments, and the experimental results are shown in Table 3. Without any improvement schemes, the mAP50 of the baseline is 75.70%. The introduction of HFSM and FFGA gradually improves the detection accuracy to 76.24% and 76.64%, respectively. The results indicate that each combination in HFFPN brings improvement to the detector.


Fig. 4. Visualization on DOTA test set. The yellow circles highlight the difference of detection result. We can easily find that HFFPN (second row) can help detect more small objects and achieve higher accuracy in classification and regression. (Color figure online)


Evaluation on Different Settings of l in HFSM. Hierarchical level l, as a hyperparameter in HFSM, determines in which feature layers the operation of feature subtraction is performed. Specifically, the feature subtraction module will be introduced when the level lower than l. Table 4 shows the results under different values of l. When l is 2, the performance of baseline with HFFPN reaches the highest. Assuming we do not employ the hierarchical strategy by setting l equals to 5, where feature subtraction is performed between each level of features, we would observe a significant drop in results. The hierarchical strategy ensures that the subtraction is performed only on detailed features, making it applicable to a wide range of input images and thus enhancing the model’s robustness. Comparison with Other FPNs. Table 5 presents performance of the baseline algorithm with different FPNs. It can be observed that some expanded FPNs do enhance the detector’s performance to some extent, but the improvements are not as significant as those of the proposed HFFPN. Evaluation on Different Detectors. To verify that the proposed HFFPN is a common method for most detectors, experiments were conducted on several different detectors. Table 6 shows the comparison results of these detectors with or without using HFFPN. The experimental results show that the use of HFFPN has led to performance improvements for all detectors, strongly indicating the universality and effectiveness of the proposed method. Table 3. Evaluation on the effectiveness of each component. FS, HS, CA and SA denote feature subtraction, hierarchical strategy, channel attention, and spatial attention, respectively.

Table 4. Results of differnent l.

Baseline FS HS CA SA mAP50

Table 5. Comparison with other FPNs.



75.70





























l

2

3

4

5

mAP50 76.64 76.33 76.07 75.59

76.13 76.24  

76.38 

76.41



76.64

Backbone

mAP50 mAP

R50-FPN [16]

75.70

46.27

R50-PAFPN [19] 76.26

46.56

R50-HRFPN [27] 76.33

46.85

R50-NASFPN [7] 73.71

45.03

R50-HFFPN

76.64

47.07

Table 6. Improvements on DOTA by applying HFFPN to different detectors.

Methods | Backbone | mAP50 | mAP
PIoU [3] | R-50-FPN | 69.68 | 40.05
PIoU | R-50-HFFPN | 70.26 (+0.58) | 40.55 (+0.50)
Gliding Vertex [34] | R-50-FPN | 72.65 | 40.93
Gliding Vertex | R-50-HFFPN | 73.24 (+0.59) | 41.42 (+0.49)
Oriented RCNN [33] | R-50-FPN | 75.72 | 46.78
Oriented RCNN | R-50-HFFPN | 76.14 (+0.42) | 46.85 (+0.07)
RoI-Trans. [5] | R-50-FPN | 75.70 | 46.27
RoI-Trans. | R-50-HFFPN | 76.64 (+0.94) | 47.07 (+0.80)

5 Conclusion

To better utilize the detailed information for small object detection, this paper proposes a hierarchical focused feature pyramid network. It mainly contains a hierarchical feature subtraction module and feature fusion guidance attention. This design overcomes the problem of neglecting edge information that exists in common FPN methods, thus improving the detection ability of small objects without affecting the detection performance of objects at other scales. Comparison and ablation experiments on multiple datasets demonstrate the excellent performance of the proposed method, fully verifying the effectiveness of HFFPN. Acknowledgements. This work was supported by National Key R&D Program of China (No. 2022ZD0118201), the National Science Fund for Distinguished Young Scholars (No.62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).

References 1. Bashir, S.M.A., Wang, Y.: Small object detection in remote sensing images with residual feature aggregation-based super-resolution and object detector network. Remote Sens. 13(9), 1854 (2021) 2. Chen, K., et al.: Mmdetection: open MMLAB detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019) 3. Chen, Z., et al.: PIoU loss: towards accurate oriented object detection in complex environments. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 195–211. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-58558-7 12 4. Cheng, G., Yuan, X., Yao, X., Yan, K., Zeng, Q., Han, J.: Towards large-scale small object detection: survey and benchmarks. arXiv preprint arXiv:2207.14096 (2022)


5. Ding, J., Xue, N., Long, Y., Xia, G.S., Lu, Q.: Learning ROI transformer for oriented object detection in aerial images. In: CVPR, pp. 2849–2858 (2019) 6. Ding, J., et al.: Object detection in aerial images: a large-scale benchmark and challenges. TPAMI 44(11), 7778–7796 (2021) 7. Ghiasi, G., Lin, T.Y., Le, Q.V.: NAS-FPN: learning scalable feature pyramid architecture for object detection. In: CVPR, pp. 7036–7045 (2019) 8. Guo, Z., Liu, C., Zhang, X., Jiao, J., Ji, X., Ye, Q.: Beyond bounding-box: convexhull feature adaptation for oriented and densely packed object detection. In: CVPR, pp. 8792–8801 (2021) 9. Han, J., Ding, J., Li, J., Xia, G.S.: Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2021) 10. Han, J., Ding, J., Xue, N., Xia, G.S.: Redet: a rotation-equivariant detector for aerial object detection. In: CVPR, pp. 2786–2795 (2021) 11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) 12. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR, pp. 7132– 7141 (2018) 13. Li, W., Chen, Y., Hu, K., Zhu, J.: Oriented reppoints for aerial object detection. In: CVPR, pp. 1829–1838 (2022) 14. Li, Y., Huang, Q., Pei, X., Chen, Y., Jiao, L., Shang, R.: Cross-layer attention network for small object detection in remote sensing imagery. IEEE J. Select. Top. Appl. Earth Observ. Remote Sens. 14, 2148–2161 (2020) 15. Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. TPAMI 45(1), 919–931 (2022) 16. Lin, T.Y., Doll´ ar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, pp. 2117–2125 (2017) 17. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017) 18. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 19. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: CVPR, pp. 8759–8768 (2018) 20. Liu, Y., Sun, P., Wergeles, N., Shang, Y.: A survey and performance evaluation of deep learning methods for small object detection. Expert Syst. Appl. 172, 114602 (2021) 21. Pan, X., et al.: Dynamic refinement network for oriented and densely packed object detection. In: CVPR, pp. 11207–11216 (2020) 22. Pi, Y., Nath, N.D., Behzadan, A.H.: Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv. Eng. Inform. 43, 101009 (2020) 23. Qian, W., Yang, X., Peng, S., Yan, J., Guo, Y.: Learning modulated loss for rotated object detection. Proc. AAAI Conf. Artif. Intell. 35(3), 2458–2466 (2021) 24. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. NeurIPS 28 (2015) 25. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015) 26. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: ICCV, pp. 9627–9636 (2019) 27. Wang, J., et al.: Deep high-resolution representation learning for visual recognition. TPAMI 43(10), 3349–3364 (2020)


28. Wang, L., Tong, Z., Ji, B., Wu, G.: TDN: temporal difference networks for efficient action recognition. In: CVPR, pp. 1895–1904 (2021) 29. Wei, H., Zhang, Y., Chang, Z., Li, H., Wang, H., Sun, X.: Oriented objects as pairs of middle lines. ISPRS J. Photogramm. Remote. Sens. 169, 268–279 (2020) 30. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/9783-030-01234-2 1 31. Wu, J., Zhou, C., Zhang, Q., Yang, M., Yuan, J.: Self-mimic learning for small-scale pedestrian detection. In: ACMMM, pp. 2012–2020 (2020) 32. Xia, G.S., et al.: Dota: a large-scale dataset for object detection in aerial images. In: CVPR, pp. 3974–3983 (2018) 33. Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented r-cnn for object detection. In: ICCV, pp. 3520–3529 (2021) 34. Xu, Y., et al.: Gliding vertex on the horizontal bounding box for multi-oriented object detection. TPAMI 43(4), 1452–1459 (2020) 35. Yang, X., Hou, L., Zhou, Y., Wang, W., Yan, J.: Dense label encoding for boundary discontinuity free rotation detection. In: CVPR, pp. 15819–15829 (2021) 36. Yang, X., Yan, J., Feng, Z., He, T.: R3det: refined single-stage detector with feature refinement for rotating object. In: AAAI, vol. 35, pp. 3163–3171 (2021) 37. Yang, X., et al.: Scrdet: towards more robust detection for small, cluttered and rotated objects. In: ICCV, pp. 8232–8241 (2019) 38. Zhang, M., Yue, K., Zhang, J., Li, Y., Gao, X.: Exploring feature compensation and cross-level correlation for infrared small target detection. In: ACMMM, pp. 1857–1865 (2022)

JLInst: Boundary-Mask Joint Learning for Instance Segmentation

Xiaodong Zhao1,2,3, Junliang Chen2,3, Zepeng Huang2,3, and Linlin Shen2,3(B)

1 Department of Information Management, Guangdong Justice Police Vocational College, Guangzhou 510520, China [email protected]
2 Computer Vision Institute, School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China {chenjunliang2016,huangzepeng2021}@email.szu.edu.cn
3 Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China [email protected]

Abstract. Many methods have been proposed to improve instance segmentation performance. However, the masks produced by state-of-the-art segmentation networks are still coarse and do not completely align with the whole object instance. Moreover, we find that better object boundary information can help an instance segmentation network produce more distinct and clear object masks. Therefore, we present a simple yet effective instance segmentation framework, termed JLInst (Boundary-Mask Joint Learning for Instance Segmentation). Our method can jointly exploit object boundary and mask semantic information in the instance segmentation network, and generate more precise mask predictions. Besides, we propose the Adaptive Gaussian Weighted Binary Cross-Entropy Loss (GW loss), to focus more on uncertain examples in pixel-level classification. Experiments show that JLInst achieves improved performance (+3.0% AP) over Mask R-CNN on the COCO test-dev2017 dataset, and outperforms most recent methods in the fair comparison.

Keywords: Instance Segmentation · Boundary Information Enhancement

1 Introduction

Instance segmentation is a fundamental but challenging task in computer vision, which aims to generate a per-pixel mask with a category label for each instance in an image. With the rapid development of deep convolutional networks, many excellent approaches [7,9] have been proposed for instance segmentation. Previous methods are mainly based on object detection networks, which provide box-level localization information.


Fig. 1. Comparison between Mask R-CNN and JLInst. (a) Mask R-CNN. (b) JLInst.

Mask R-CNN [7] is a successful two-stage instance segmentation framework, which first employs a Faster R-CNN [17] detector to detect objects in an image and further predicts a binary mask within each detected bounding box. Other methods built upon Mask R-CNN also achieve good performance. Recently, inspired by the rapid development of one-stage detectors, a number of one-stage instance segmentation frameworks [1,18] have been proposed. However, the quality of the predicted instance masks is still not satisfactory. One of the most important problems is the imprecise segmentation around instance boundaries. Mask R-CNN generates instance masks by performing pixel-level classification via fully convolutional networks (FCN). FCN treats all pixels in the proposal equally and ignores object shape and boundary information, so the predicted instance masks of Mask R-CNN are coarse and not well aligned with the real object boundaries. As shown in Fig. 1, Mask R-CNN easily predicts coarse and indistinct masks, while the boundaries predicted by our JLInst match better with the ground truths. To address this issue, we attempt to exploit instance boundary information to enhance the mask prediction in the instance segmentation network. We notice that boundary semantic information can guide the network to generate more distinct masks that are well aligned with their ground truths. Based on this motivation, we propose the JLInst instance segmentation network. Based on Mask R-CNN, we replace the original mask branch with the proposed boundary-mask joint learning branch, which contains two sub-networks for jointly learning object mask and boundary semantic information. We design a feature enhancement module to strengthen the connection between the mask branch and the boundary branch. At last, the final mask prediction, combining the outputs from the mask and boundary branches, contains more abundant shape and localization information. Moreover, we design the Adaptive Gaussian Weighted Binary Cross-Entropy Loss (GW loss) to focus more on uncertain examples in the pixel-level classification and deeply exploit the semantic information while training the network. We summarize our main contributions as follows.
– We propose the JLInst instance segmentation framework, to jointly learn object boundaries and masks in the instance segmentation network, and generate more precise mask predictions by combining mask and boundary information.


– In order to focus more on uncertain examples, we propose the Adaptive Gaussian Weighted Binary Cross-Entropy Loss to reweight the pixel-wise classification loss.
– JLInst achieves improved performance (+3.0% AP) over Mask R-CNN on COCO test-dev2017. Furthermore, our JLInst achieves state-of-the-art performance of 39.1% AP, which surpasses most instance segmentation frameworks.

2

Related Work

Instance Segmentation. Recent studies on instance segmentation can be divided into two categories: two-stage and one-stage methods. Two-stage methods usually adopt the classical detect-then-segment strategy. The dominant instance segmentation framework is still Mask R-CNN [7], which uses the two-stage detector Faster R-CNN [17] to detect objects in an image and then generates a binary segmentation mask within each detected bounding box. PANet [13] strengthens feature representation through bottom-up path augmentation based on Mask R-CNN. Mask Scoring R-CNN [9] adds an additional mask-IoU head to deal with the misalignment between mask quality and mask score. One-stage methods have been proposed due to the rapid development of one-stage detectors [11,19]. Some instance segmentation networks [1,18] still follow the detect-then-segment pipeline by replacing the detection networks with one-stage detectors. YOLACT [1] produces instance masks by linearly combining a set of prototype masks and per-instance mask coefficients. BlendMask [2] extends this idea by assembling with attention maps. CondInst [18] also achieves remarkable performance with great efficiency.

Boundary Refinement in Segmentation. Some studies focus on post-processing schemes to refine the boundaries of the masks predicted by the segmentation network. Chen et al. [3] propose the fully connected conditional random field (CRF) to capture spatial details and refine the boundaries of predicted masks. Some recent works attempt to deeply exploit boundary information while training the network to enhance the predicted mask. BMask R-CNN [4] introduces an extra branch to strengthen the boundary awareness of mask features. It can generate clearer object masks than previous methods.

3 Methods

3.1 Overview

As shown in Fig. 2, JLInst still adopts the same two-stage procedure as Mask R-CNN [7]. We develop a boundary-mask joint learning branch, which contains the mask branch and boundary branch to jointly learn the mask and boundary information. Besides, we design a feature enhancement module to strengthen the semantic information between these two branches. The final mask output is generated by combining the predictions from two branches. In addition, we design GW loss to help optimize the pixel-wise classification while training the network.


Fig. 2. The overall architecture of the Boundary-Mask Joint Learning for Instance Segmentation Network (JLInst). In the boundary-mask joint learning branch, the dotted arrow denotes 3 × 3 convolution and the solid arrow denotes identity connection unless specified. “×n” denotes a stack of n consecutive convolutional kernels.

3.2

Boundary-Mask Joint Learning Branch

RoI Feature Extraction. We define Rm and Rb as the Region of Interest (RoI) features for the mask branch and the boundary branch, respectively. The output Fm is generated by passing Rm through four consecutive convolutions. However, to preserve better spatial information for boundary prediction, we upsample all the features from P3–P5 to the size of P2 and add them as the input when performing RoIAlign. Then, the feature Rb after RoIAlign is downsampled by the final strided 3 × 3 convolution, and the output feature is denoted as Fb. Fb has the same resolution as Fm and is used for feature fusion.

Mask-Guided Boundary Refinement Module. To further exploit the semantic information and strengthen the connection between the mask branch and the boundary branch, we propose the Mask-guided Boundary Refinement Module (MBR). As shown in Fig. 3, MBR consists of a Fusion Block, an Attention Block and a Residual Block. We aim to use the mask information from the mask branch to help the boundary branch produce more precise boundary predictions.

Fusion Block. The main feature Fb is from the boundary branch and the guide feature Fm is from the mask branch. We concatenate these two feature maps and put them into the 1 × 1 and 3 × 3 convolutional layers. The output feature Gb is created by adding the main feature Fb and the feature maps from the convolutional layers, according to Eq. 1:

\[
G_b = f_{3\times 3}\bigl(f_{1\times 1}(f_{\mathrm{concat}}(F_b, F_m))\bigr) + F_b, \tag{1}
\]

where fn×n denotes a n × n convolutional layer, fconcat means the concat operation. Attention Block. We apply the attention mechanism, i.e. spatial attention and channel attention, to further capture the spatial and channel information in the fused feature Gb .

Fig. 3. The architecture of the Mask-guided Boundary Refinement Module (MBR), consisting of a Fusion Block, an Attention Block (channel and spatial attention built from average pooling, conv-ReLU and conv-sigmoid layers) and a Residual Block.

In the spatial attention block, we first adjust the weight ws following Eq. 2:

\[
w_s = \sigma\bigl(f_{1\times 1}(\delta(f_{1\times 1}(G_b)))\bigr), \tag{2}
\]

where δ is the ReLU function and σ is the sigmoid function. Then we rescale Gb according to Eq. 3:

\[
G_b^{s} = w_s \otimes G_b, \tag{3}
\]

where ⊗ denotes element-wise multiplication. As for the channel attention block, we first obtain the global information wc by using a global average pooling (GAP) layer:

\[
w_c = \mathrm{GAP}(G_b). \tag{4}
\]

Then the new weight w′c of each channel is adjusted by using two 1 × 1 convolutional layers followed by the sigmoid function, according to Eq. 5:

\[
w_c' = \sigma\bigl(f_{1\times 1}(\delta(f_{1\times 1}(w_c)))\bigr). \tag{5}
\]

After that, we rescale the features of each channel by channel-wise multiplication of the weights w′c and Gb. The output after rescaling is denoted as G^c_b and calculated by Eq. 6:

\[
G_b^{c} = w_c' \otimes G_b. \tag{6}
\]

At last, we generate the rescaled output G^attn_b by element-wise addition of the channel and spatial excitations:

\[
G_b^{attn} = G_b^{s} \oplus G_b^{c}, \tag{7}
\]

where ⊕ denotes element-wise summation.

Residual Block. We finally generate the output feature Hb by passing G^attn_b to a residual block (ResBlock), as follows:

\[
H_b = \mathrm{ResBlock}(G_b^{attn}). \tag{8}
\]

The block consists of a 3 × 3 convolutional layer, a 1 × 1 convolutional layer and a 3 × 3 convolutional layer in order. Each convolutional layer is followed by group normalization and ReLU activation.
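For concreteness, the following is a minimal PyTorch sketch of the MBR/BMR structure described by Eqs. (1)–(8); the class name RefinementModule, the bottleneck ratios, the number of GroupNorm groups, and the residual skip around the ResBlock are assumptions, not the authors' exact settings.

```python
import torch
import torch.nn as nn

def conv_gn_relu(in_c, out_c, k):
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, k, padding=k // 2),
        nn.GroupNorm(32, out_c), nn.ReLU(inplace=True),
    )

class RefinementModule(nn.Module):
    """Sketch of MBR/BMR: fuse a main feature with a guide feature, re-weight it
    with spatial and channel attention, then apply a residual block."""

    def __init__(self, channels):
        super().__init__()
        # Fusion block (Eq. 1): concat -> 1x1 conv -> 3x3 conv, plus identity.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Spatial attention (Eqs. 2-3) and channel attention (Eqs. 4-6).
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, 1), nn.Sigmoid(),
        )
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )
        # Residual block (Eq. 8): 3x3 -> 1x1 -> 3x3, each with GN + ReLU.
        self.res = nn.Sequential(
            conv_gn_relu(channels, channels, 3),
            conv_gn_relu(channels, channels, 1),
            conv_gn_relu(channels, channels, 3),
        )

    def forward(self, main, guide):
        g = self.fuse(torch.cat([main, guide], dim=1)) + main       # Eq. 1
        g_attn = self.spatial(g) * g + self.channel(g) * g          # Eqs. 3, 6, 7
        return g_attn + self.res(g_attn)                            # Eq. 8 (with skip)
```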


Fig. 4. Visualization results for analyzing the impact of BCE loss and GW loss in JLInst. (a) The coarse activation map from JLInst with BCE loss at the beginning of training. (b) with BCE loss after training. (c) with GW loss at the beginning of training. (d) with GW loss after training.

Fig. 5. Histogram of activation maps. (a) the histogram calculated from Fig. 4(a). (b) the histogram calculated from Fig. 4(b). (c) the histogram calculated from Fig. 4(c). (d) the histogram calculated from Fig. 4(d).

Boundary-Guided Mask Refinement Module. We integrate the boundary features with the mask features so that boundary information can be used to enrich the mask features and guide more precise mask prediction. The structure of the Boundary-guided Mask Refinement Module (BMR) is the same as MBR. The only difference is that the main feature of BMR is Fm and the guide feature becomes Hb after being processed by the MBR of the boundary branch.

Final Output. To further exploit the semantic information from the mask branch and boundary branch, we generate the final mask prediction of the instance segmentation network by combining the outputs from these two branches. We define Pm and Wm as the mask feature and mask weight from the mask branch, respectively, and define Pb and Wb as the boundary feature and boundary weight from the boundary branch, respectively. The final mask prediction P^f_m is calculated by Eq. 9:

\[
P_m^{f} = (W_m \otimes P_m) \oplus (W_b \otimes P_b). \tag{9}
\]

3.3 Learning and Optimization

Adaptive Gaussian Weighted Binary Cross-Entropy Loss. The segmentation task in Mask R-CNN can be regarded as a binary classification task. Generally, BCE loss is used to train a pixel-level classification, which treats each pixel


equally. However, for the segmentation task, the feature map we finally generate needs to be processed by image thresholding; that is, pixels with values larger than the threshold are classified as foreground, and those smaller than the threshold are classified as background (we set the threshold to 0.5 by default). Hence, we believe that pixels near the threshold have great uncertainty. As shown in Fig. 4(a) and Fig. 4(b), we can notice that JLInst with the BCE loss easily focuses on discriminative regions during training, so the output mask is incomplete and there still exist lots of uncertain pixels in the output activation map. Therefore, we design the Adaptive Gaussian Weighted Binary Cross-Entropy Loss (GW loss) to increase the network's attention to pixels around the threshold and reduce the penalty for pixels approaching 0 or 1. With the supervision of the GW loss, JLInst tends to excite the uncertain pixels and generate exact and well-separated activation maps (Fig. 4(d)). Figure 5 shows the histograms of activation maps learned with the BCE loss or GW loss. We can observe that the number of uncertain pixels (around 0.5) has been substantially reduced by the GW loss. The BCE loss is defined as follows:

\[
L_{bce}(p, y) = -\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\bigr], \tag{10}
\]

where L_bce indicates the pixel-wise binary cross-entropy loss between the predicted foreground likelihood pi ∈ [0,1] and the ground-truth label yi for each pixel i, and N is the number of pixels in the predicted feature map. Then, we use the Gaussian distribution function to calculate the corresponding loss weight according to the pixel value. We aim to place the maximum weight on pixels around 0.5, and reduce the weight for pixels approaching 0 or 1:

\[
W_g(p) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(p-\mu)^2}{2\sigma^2}}, \tag{11}
\]

where μ denotes the mean value and σ denotes the standard deviation. We set μ = 0.5 and σ = 0.5 by default. Then we design our GW loss by reweighting the BCE loss, which can supervise JLInst to produce more distinct mask outputs and better optimise the network during training:

\[
L_{gw}(p, y) = W_g(p) \cdot L_{bce}(p, y). \tag{12}
\]
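The following is a minimal PyTorch sketch of the GW loss as we read Eqs. (10)–(12); applying the Gaussian weight per pixel before averaging, and detaching the weight from the gradient, are our assumptions.

```python
import math
import torch
import torch.nn.functional as F

def gw_loss(pred_logits, target, mu=0.5, sigma=0.5):
    """Adaptive Gaussian Weighted BCE (sketch of Eqs. 10-12).

    pred_logits: raw network outputs; target: binary ground-truth map.
    The Gaussian weight peaks at probability mu (0.5 by default), so uncertain
    pixels contribute more to the loss than confident ones.
    """
    prob = torch.sigmoid(pred_logits)
    # Per-pixel BCE (Eq. 10, before averaging).
    bce = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    # Gaussian weight on the predicted probability (Eq. 11); detached so it acts
    # as a fixed per-pixel scale rather than an extra gradient path (assumption).
    weight = torch.exp(-(prob.detach() - mu) ** 2 / (2 * sigma ** 2)) \
        / math.sqrt(2 * math.pi * sigma ** 2)
    return (weight * bce).mean()  # Eq. 12
```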

Mask Loss. We consider the optimization of the mask branch as a binary classification problem and use our GW loss as the mask loss to optimize the mask learning, where Ym denotes the corresponding ground-truth mask:

\[
L_{mask} = L_{gw}(P_m, Y_m). \tag{13}
\]

Boundary Loss. According to [6], we use dice loss [15] and GW loss to help the boundary branch produce crisp boundary effectively. Dice loss is insensitive


to the number of pixels and can well measure the overlap between boundary predictions and ground truths. It is calculated by Eq. 14:

\[
L_{dice}(p, y) = 1 - \frac{2\sum_{i}^{HW} p_i y_i + \epsilon}{\sum_{i}^{HW} (p_i)^2 + \sum_{i}^{HW} (y_i)^2 + \epsilon}, \tag{14}
\]

where p denotes the prediction and y denotes the corresponding ground truth, H and W are the height and width of the predicted feature map, respectively, and ε is a smooth term to avoid zero division (we set ε = 1 by default). Our boundary loss L_boundary is formulated as follows; the boundary ground truth Yb is generated by applying the Laplacian operator to the mask ground truth Ym:

\[
L_{boundary}(P_b, Y_b) = L_{gw}(P_b, Y_b) + L_{dice}(P_b, Y_b). \tag{15}
\]
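A short PyTorch sketch of the dice term (Eq. 14) and of deriving a boundary target from a binary mask with a Laplacian kernel is given below; the 3 × 3 kernel values and the binarization of the filtered edges are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(prob, target, eps=1.0):
    """Dice loss of Eq. 14 on flattened per-instance maps (shape: N x H x W)."""
    prob, target = prob.flatten(1), target.flatten(1)
    inter = (prob * target).sum(dim=1)
    denom = (prob ** 2).sum(dim=1) + (target ** 2).sum(dim=1)
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

def mask_to_boundary(mask):
    """Derive a binary boundary target from a binary mask with a 3x3 Laplacian."""
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                       device=mask.device).view(1, 1, 3, 3)
    edge = F.conv2d(mask.unsqueeze(1).float(), lap, padding=1)
    return (edge.abs() > 0).float().squeeze(1)
```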

Final Mask Loss. We generate the final mask prediction of JLInst by combining the outputs from the mask and boundary branches. Similar to Sect. 3.3, we use our GW loss as the final mask loss to optimize the final mask learning:

\[
L_{final} = L_{gw}(P_m^{f}, Y_m). \tag{16}
\]

Multi-task Learning. Multi-task learning has been proved effective in many works. In our JLInst, we define the multi-task loss as follows:

\[
L = L_{cls} + L_{box} + L_{mask} + L_{boundary} + L_{final}, \tag{17}
\]

where the classification loss L_cls and regression loss L_box are inherited from Mask R-CNN, and the other loss functions are defined in the previous sections.
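To show how the terms fit together, here is a hypothetical sketch that assembles the multi-task loss of Eq. 17 from the gw_loss, dice_loss and mask_to_boundary sketches above; the dictionary keys and the equal weighting of the terms are assumptions.

```python
import torch

def jlinst_losses(outputs, targets):
    """Hypothetical assembly of the multi-task loss in Eq. 17.

    `outputs` holds the branch logits plus the detector's own classification and
    box-regression losses; `targets` holds the ground-truth instance masks.
    Reuses gw_loss, dice_loss and mask_to_boundary from the sketches above.
    """
    y_mask = targets["mask"].float()
    y_boundary = mask_to_boundary(y_mask)                           # Laplacian boundary GT
    l_mask = gw_loss(outputs["mask_logits"], y_mask)                # Eq. 13
    l_boundary = (gw_loss(outputs["boundary_logits"], y_boundary)
                  + dice_loss(torch.sigmoid(outputs["boundary_logits"]), y_boundary))  # Eq. 15
    l_final = gw_loss(outputs["final_logits"], y_mask)              # Eq. 16
    return outputs["loss_cls"] + outputs["loss_box"] + l_mask + l_boundary + l_final   # Eq. 17
```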

4 Experiments

4.1 Dataset and Evaluation Metrics

Our experiments are conducted on COCO [12] dataset. We train our models with the data in train-2017 split containing around 115k images, and evaluate the performance of ablation studies on val-2017 split with about 5k images. Main results are reported on the test-dev split (20k images without available public annotations). We report all the results in the standard COCO-style average precision (AP).
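For reference, the COCO-style metrics reported here (AP, AP50, AP75, APS, APM, APL) can be computed with the official pycocotools API roughly as follows; the file paths are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholders: ground-truth annotations and the model's segmentation results.
coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("jlinst_segm_results.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, APS, APM, APL
```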


Table 1. Comparison with state-of-the-art methods for instance segmentation on COCO test-dev2017.

Method | Backbone | AP | AP50 | AP75 | APS | APM | APL
BMask R-CNN [4] | ResNet-101-FPN | 37.7 | 59.3 | 40.6 | 16.8 | 39.9 | 54.6
MMask R-CNN [14] | ResNet-101-FPN | 38.0 | 59.3 | 41.3 | 21.2 | 40.2 | 49.3
BoundaryFormer [10] | ResNet-101-FPN | 37.7 | 58.8 | 40.5 | 20.4 | 40.2 | 49.0
CondInst [18] | ResNet-50-FPN | 35.4 | 56.4 | 37.6 | 18.4 | 37.9 | 46.9
BlendMask [2] | ResNet-50-FPN | 34.3 | 55.4 | 36.6 | 14.9 | 36.4 | 48.9
BMask R-CNN [4] | ResNet-50-FPN | 35.9 | 57.0 | 38.6 | 15.8 | 37.6 | 52.2
MMask R-CNN [14] | ResNet-50-FPN | 36.3 | 57.3 | 38.3 | 19.3 | 38.3 | 47.3
BoundaryFormer [10] | ResNet-50-FPN | 36.4 | 57.2 | 39.0 | 19.6 | 38.6 | 47.9
Mask R-CNN [7] | ResNet-50-FPN | 34.6 | 56.5 | 36.6 | 15.4 | 36.3 | 49.7
JLInst (Ours) | ResNet-50-FPN | 37.6 | 58.0 | 40.7 | 20.2 | 39.8 | 48.9
Mask R-CNN [7] | ResNet-101-FPN | 36.2 | 58.6 | 38.4 | 16.4 | 38.4 | 52.1
JLInst (Ours) | ResNet-101-FPN | 39.1 | 60.0 | 42.5 | 20.7 | 41.6 | 51.5

4.2 Experimental Settings

For fair comparisons, we conduct our experiments on the Detectron2 [20] platform in the PyTorch [16] framework. If not specified, all the settings are the default settings in Detectron2. Unless specified, ResNet-50 [8] is used as our default backbone network. The backbone networks are initialized with the weights of models pretrained on ImageNet [5]. Specifically, our network is trained with the stochastic gradient descent (SGD) optimizer for 90K iterations with an initial learning rate of 0.02 and a mini-batch of 16 images. The learning rate is reduced by a factor of 10 at iterations 60K and 80K, respectively. The input images are resized such that the shorter side is 800 pixels and the longer side is less than 1333 pixels. As for the ablation experiments, we adopt 500 pixels for the shorter side (the longer side is less than 700 pixels). Due to memory limitations, the batch size (16 by default) is adjusted with a linearly scaled learning rate. Momentum and weight decay are set to 0.9 and 1e-4, respectively.
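A minimal Detectron2 configuration sketch matching the reported schedule is shown below; the key names are standard Detectron2 solver/input options, while the base Mask R-CNN config file and any JLInst-specific model settings are assumptions.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Start from a standard Mask R-CNN R50-FPN 1x config (JLInst-specific model
# options would be added on top of this; they are not shown here).
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml")
)
cfg.SOLVER.IMS_PER_BATCH = 16          # mini-batch of 16 images
cfg.SOLVER.BASE_LR = 0.02              # initial learning rate
cfg.SOLVER.MAX_ITER = 90000            # 90K iterations
cfg.SOLVER.STEPS = (60000, 80000)      # decay by 10x at 60K and 80K
cfg.SOLVER.MOMENTUM = 0.9
cfg.SOLVER.WEIGHT_DECAY = 1e-4
cfg.INPUT.MIN_SIZE_TRAIN = (800,)      # shorter side 800
cfg.INPUT.MAX_SIZE_TRAIN = 1333        # longer side <= 1333
```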

4.3 Comparisons with State-of-the-Art Methods

We compare JLInst with some state-of-the-art instance segmentation methods. All models are trained with 1× schedule on COCO train2017 and evaluated on COCO test-dev2017. As shown in Table 1, JLInst with ResNet-50-FPN can surpass these methods with the same backbone. The combination of our JLInst and ResNet-101-FPN backbone achieves as high as 39.1% AP, which surpasses the AP of all other competitors.


Table 2. Experimental results on COCO val2017 for the impacts of MBR and BMR.

MBR | BMR | AP | AP50 | AP75
 | | 34.0 | 53.8 | 35.9
✓ | | 34.3 | 53.8 | 36.5
 | ✓ | 34.5 | 54.1 | 36.8
✓ | ✓ | 34.7 | 53.9 | 37.0

Table 3. Experimental results on COCO val2017 for the impacts of each component in MBR/BMR.

FB | AB | RB | AP | AP50 | AP75
 | | | 34.0 | 53.8 | 35.9
✓ | | | 34.2 | 53.8 | 36.4
✓ | ✓ | | 34.4 | 53.8 | 36.8
✓ | ✓ | ✓ | 34.7 | 53.9 | 37.4

Table 4. Experimental results on COCO val2017 for the impacts of different losses.

 | AP | AP50 | AP75
BCE loss | 34.4 | 53.3 | 36.9
GW loss | 34.7 | 53.9 | 37.0

Table 5. Experiment results on COCO val2017 for the impacts of standard deviation σ in GW loss.

σ | AP | AP50 | AP75
0.01 | 33.6 | 53.2 | 36.0
0.1 | 34.3 | 53.4 | 36.8
0.3 | 34.4 | 53.7 | 36.6
0.5 | 34.7 | 53.9 | 37.0
0.7 | 34.6 | 54.0 | 36.9
0.9 | 34.5 | 53.8 | 36.9

4.4 Ablation Studies

The Effectiveness of MBR and BMR. We now analyze whether each feature fusion module of our model is effective for improving segmentation performance. The results are listed in Table 2. We make different combinations of MBR and BMR on the ResNet-50 FPN JLInst baseline. As shown in Table 2, MBR improves the AP from 34.0% to 34.3%. When only BMR is added, the AP is improved from 34.0% to 34.5%. When all the modules are used, the AP is further improved to 34.7%.

The Impacts of Each Component in MBR/BMR. In this section, we analyze whether each component of MBR/BMR is effective for performance improvement. As shown in Table 3, we gradually add the Fusion Block (FB), Attention Block (AB) and Residual Block (RB) on the ResNet-50 FPN JLInst baseline. FB improves the AP from 34.0% to 34.2%. When FB is combined with AB, the AP is further increased from 34.0% to 34.4%. When FB, AB and RB are all used, the AP is further improved to 34.7%. In general, our ablation studies show that FB, AB and RB can effectively improve segmentation performance.


The Effectiveness of GW Loss. In JLInst, we propose the GW loss to supervise the prediction. Compared with the BCE loss, our GW loss can help JLInst focus more on uncertain examples. So in this section, we design this experiment to compare the impact of the BCE loss and our GW loss on the final mask prediction. As shown in Table 4, using the BCE loss yields 34.4% AP. When we replace the BCE loss with the GW loss, the performance of JLInst increases from 34.4% to 34.7%. Therefore, our GW loss is more suitable for JLInst and can bring better performance for instance segmentation.

The Impacts of Parameters in GW Loss. In this section, we analyze the impacts of different standard deviations σ in the GW loss. As shown in Table 5, the performance gradually increases with the growth of σ. But when σ exceeds 0.5, the AP begins to decline. Therefore, we set σ = 0.5 based on the experiments.

Table 6. Experiment results on COCO val2017 for the impacts of proposed losses in JLInst.

Lfinal | Lmask | Lboundary | AP | AP50 | AP75 | APS | APM | APL
✓ | | | 34.0 | 53.9 | 36.1 | 12.8 | 35.7 | 53.9
✓ | ✓ | | 34.3 | 53.9 | 36.5 | 13.4 | 36.2 | 53.4
✓ | | ✓ | 34.4 | 53.8 | 36.6 | 13.3 | 36.2 | 54.3
✓ | ✓ | ✓ | 34.7 | 53.9 | 37.4 | 13.8 | 36.6 | 54.0

The Impacts of Proposed Losses in JLInst. In JLInst, we propose Mask Loss Lmask , Boundary Loss Lboundary and Final mask Loss Lfinal , to jointly learn the mask and boundary information and ensure that network produce more distinct mask prediction. So we design this experiment to analyze whether each loss function is effective to improve the performance of JLInst. The results are shown in Table 6. When only Lfinal is used, the AP of JLInst is 34.0%. When supervised by Lfinal and Lmask , the AP is increased from 34.0% to 34.3%. When supervised by Lfinal and Lboundary , the AP is improved from 34.0% to 34.4%. When all the losses are used, the performance of JLInst is further improved to 34.7%.

5

Conclusion

In this paper, we propose a simple but effective instance segmentation framework, named JLInst, to jointly learn object boundary and mask semantic information and generate more precise mask prediction by combining mask and boundary information. Moreover, we design GW loss to help JLInst focus more on examples with high uncertainty in the pixel-level classification. Experiments


show that our JLInst achieves remarkable improvements on COCO dataset, and outperforms most recent methods in the fair comparison. Acknowledgments. This work was supported by the National Natural Science Foundation of China under Grant 82261138629; Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515010688 and Shenzhen Municipal Science and Technology Innovation Council under Grant JCYJ20220531101412030.

References 1. Bolya, D., Zhou, C., Xiao, F., Lee, Y.J.: Yolact: real-time instance segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9157–9166 (2019) 2. Chen, H., Sun, K., Tian, Z., Shen, C., Huang, Y., Yan, Y.: Blendmask: top-down meets bottom-up for instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8573–8581 (2020) 3. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834– 848 (2017) 4. Cheng, T., Wang, X., Huang, L., Liu, W.: Boundary-preserving mask R-CNN. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 660–676. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058568-6 39 5. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) 6. Deng, R., Shen, C., Liu, S., Wang, H., Liu, X.: Learning to predict crisp boundaries. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 562–578 (2018) 7. He, K., Gkioxari, G., Doll´ ar, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017) 8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 9. Huang, Z., Huang, L., Gong, Y., Huang, C., Wang, X.: Mask scoring r-cnn. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6409–6418 (2019) 10. Lazarow, J., Xu, W., Tu, Z.: Instance segmentation with mask-supervised polygonal boundary transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4382–4391 (2022) 11. Lin, T.Y., Goyal, P., Girshick, R., He, K., Doll´ ar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) 12. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 13. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)


14. Ma, L., Dong, B., Yan, J., Li, X.: Matting enhanced mask r-cnn. In: 2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2021) 15. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE (2016) 16. Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703 (2019) 17. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural. Inf. Process. Syst. 28, 91–99 (2015) 18. Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 8693, pp. 282–298. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058452-8 17 19. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9627–9636 (2019) 20. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https:// github.com/facebookresearch/detectron2

Boosting One-Stage Multi Object Tracking with Attention Learning

Ying Cui1, Hang Zheng1, Yueqian Quan1, Xiang Pan1, Yike Wang2, and Junxia Li3(B)

1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China {cuiying,zhenghang,qyq,panx}@zjut.edu.cn
2 University of California, Berkeley, USA yike [email protected]
3 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China [email protected]

Abstract. One-stage multi-object tracking methods have achieved promising results by showing a great balance between accuracy and speed. However, the internal differences and relationships between detection and re-identification (re-ID) lead to degraded performance. In this work, we propose a one-stage multi-object tracking method with attention boosting, namely AeMOT, which can effectively improve the collaboration and performance of detection and re-ID. Specifically, a discriminability enhancement module is designed to enhance the discriminative feature representations for detection and tracking, and an identity preserving module is properly designed to preserve the semantic alignment of the id-embedding and improve the adaptiveness of object matching under scale variation for re-ID association. Experimental results on challenging benchmarks including MOT17 and MOT20 demonstrate that our proposed method achieves leading performance and outperforms state-of-the-art trackers.

Keywords: AeMOT · multi-object tracking · one-stage · attention mechanism

1 Introduction

Multi-object tracking (MOT) is one of the most foundational and challenging tasks in computer vision, with wide applications in many practical fields such as autonomous driving, human-computer interaction and intelligent surveillance video. MOT aims to track the multiple objects of interest in videos by estimating their trajectories from temporal and spatial information. Different from single-object tracking (SOT) [1–3], multi-object tracking suffers from inter-target occlusions, interactions and ambiguities, which bring big challenges to the task. Accurate and real-time multi-object tracking is still the goal that researchers are constantly pursuing.

This work was supported in part by the National Natural Science Foundation of China (No. 62102364, No. 62272235), and in part by the Natural Science Foundation of Zhejiang Province (LY22F020016).

Fig. 1. Example tracking results of our AeMOT on MOT17 and MOT20 test sets. Different identity is shown in different colored bounding box and trajectory in the past 100 frames. Tracking under challenges such as crowded scenes can be robustly performed by our method.

Existing methods mainly adopt two major paradigms: the tracking-by-detection (TBD) paradigm [4–6] and the joint detection and tracking (JDT) paradigm [7–15]. In the past decades, traditional methods mainly follow the tracking-by-detection strategy, which divides MOT into two models: a detector and an embedding (re-ID) model. They first utilize a detector to obtain bounding boxes of objects in each frame, and then associate the obtained bounding boxes with existing tracklets by matching the predicted location and the extracted identification (ID) embedding of each bounding box across frames. Although they can benefit from the significant recent advances in object detection [16–18] and re-identification (re-ID) [19], these methods still suffer from massive computation costs. In recent years, one-stage joint detection and tracking (JDT) methods that learn detection and association within a single network have demonstrated their ability to achieve a good balance between inference speed and tracking accuracy [7,20,21]. JDT has become a popular trend in multi-object tracking. Nevertheless, an inevitable problem of JDT methods is that sharing the same backbone feature between the detection task and the re-ID task may lead to conflicts in the information required from the feature representations. Although FairMOT [20] tries to tackle the problem by balancing the loss optimization of the two tasks, the negative transfer from learning two different tasks is still not well solved. Encouraged by the advantage of attention mechanisms in deep neural networks and Transformer frameworks [22,23], in this paper we investigate leveraging the attention mechanism to improve the feature representation for the whole tracking pipeline

460

Y. Cui et al.

and perceive the instance-specific information for re-identification. Specifically, a discriminative feature enhancement module is designed via spatial attention and channel attention that leverage the self-attention mechanism to capture the contextual information for better feature representations. Then, an identity preserving module is designed via channel re-calibration to preserve the semantic alignment of id-embedding and improve the adaptiveness of object matching with scale variation. Based on the two designed modules, we propose a new attention enhanced one-stage algorithm for multi-object tracking, termed as AeMOT that follows the joint-dectection-and-tracking approach. Although the introduced techniques are not mostly novel, we have show the proper adoption is important and valuable to MOT task. Extensive experimental results have demonstrated that our AeMOT achieve leading performance and outperforms existing state-of-the-art methods on challenging benchmarks such as MOT17 [24] and MOT20 [25], as shown by visual examples in Fig. 1. The main contributions of our work are as follow: • We propose AeMOT, a one-stage JDT method for multi-object tracking, which deals with the conflict between the detection and re-ID task. • We introduce a discriminative feature enhancement module (DEM) to capture the contextual information for better feature representations and an identity preserving module (IPM) for improving the representation for re-ID task. • Experiments on MOT-17 and MOT-20 demonstrate the superiority of our method compared with other one-stage MOT algorithms.

2 Related Work

Tracking-by-Detection. In the past decade, researchers have usually leveraged advances in object detection and followed the tracking-by-detection (TBD) paradigm to solve the task [4–6]. Traditional methods mainly exploit two separate models: they first utilize an object detection model to localize objects and obtain bounding boxes, and then adopt an association model to link each detected object to a specific trajectory according to the re-identification features of the objects. SORT [26] uses a Kalman filter to track bounding boxes and associates each bounding box with the Hungarian algorithm. STRN [5] proposes a similarity learning framework to encode various spatial-temporal relations between tracks and objects. DeepSORT [6] incorporates appearance features into SORT [26]. POI [27] leverages high-score detections and deep learning-based appearance features for tracking. ByteTrack [4] tracks objects by associating every detection box instead of only the high-score detection boxes. Though these methods achieve good tracking results by using long-term trajectories to recover missed or occluded detections, they still follow the tracking-by-detection paradigm, which limits tracking efficiency in practical applications, and the two separate models make them time-consuming, preventing real-time tracking.


Joint Detection and Tracking. In recent years, joint detection and tracking (JDT) algorithms that learn detection and association within a single network have achieved a good balance between inference speed and tracking accuracy [7–15]. TransTrack [7] leverages the query-key mechanism of the Transformer [22] to establish a new joint detection and tracking framework. It sequentially tracks the objects in the current frame through the association of feature queries from the previous frame. QDTrack [10] uses a dense similarity learning method for tracking. GSDT [12] proposes a joint tracking method using graph neural networks. MeMOT [21] further proposes a long-range spatio-temporal memory algorithm to enhance the ability to link objects over a long time span. CenterTrack [13] proposes a point-based framework based on CenterNet [28] to perform joint detection and tracking. It treats tracking from a local perspective, associating objects only in adjacent frames to simplify the association task. To achieve high levels of both detection and tracking accuracy, FairMOT [20] proposes to treat the detection and association tasks equally. It constructs two parallel branches to respectively detect objects and extract re-ID features. However, since JDT methods share the same backbone feature between the detection task and the re-ID task, they may suffer from conflicts in the information required from the feature representations. In this paper, to tackle this problem, we extend the attention mechanism to the framework of multi-object tracking, and carefully design two types of attention modules to capture rich contextual relationships for the two joint sub-tasks.

3 Methodology

In this section, we present a detailed description of the proposed AeMOT framework. Two newly designed modules, a discriminability enhancement module (DEM) and an identity preserving module (IPM), are incorporated into the DLA-34 backbone and head network of the baseline FairMOT [20] to boost the tracking performance. An overview of the framework is illustrated in Fig. 2.

3.1 Discriminability Enhancement Module

Effective feature representation with sufficient spatial information is crucial for multi-object detection with only a single frame as input. Though recent Transformer-based methods [14,15] that perform better feature extraction and spatio-temporal memory-based methods [21] that store history states have made progress on this problem, they suffer from computation and memory costs caused by their heavy architectures. In this work, we draw on the essential philosophy of the Transformer to address the issue, leveraging self-attention and channel attention to boost the feature representations of a normal backbone that trades off accuracy and efficiency. Encouraged by its success in scene segmentation, we introduce the self-attention-based dual attention network [23] to perform discriminability enhancement with spatial and contextual information.


Fig. 2. Illustration of the proposed AeMOT, with two newly designed modules, the discriminability enhancement module (DEM) and identity preserving module (IPM).

Spatial Attention Enhancement. In order to enable the network to focus on foreground information and suppress background noise, spatial attention is implemented via self-attention to encode long-range spatial and contextual information with local features. The common process of the self-attention mechanism is written as:

SelfAttention(Q, K, V) = softmax(K ⊗ Q^T) ⊗ V    (1)

Let F ∈ R^{C×H×W} denote the feature obtained by the backbone. It is projected into Q, K, V ∈ R^{C×H×W} through three learnable weight matrices, and a spatial attention map F^s_{ji} ∈ R^{N×N} is calculated from K and Q, where N = H × W. It denotes the feature impact of the i-th position on the j-th position: the stronger the correlation between the two positions, the larger the value of F^s_{ji}. Then, a matrix multiplication is performed between the transpose of F^s_{ji} and F to obtain a new feature map, which is reshaped to R^{C×H×W}. The final output F_SA ∈ R^{C×H×W} is obtained by multiplying it with a scale parameter α and applying an element-wise sum with F:

F_SA = α Σ_{i=1}^{N} (F^s_{ji} V) + F_j,    F^s_{ji} = K ⊗ Q^T    (2)

In each position of F_SA, the final feature is a weighted sum of the features at all positions and the original feature. Therefore, the final feature obtains a global contextual view and selectively aggregates information from the spatial attention map, which improves the compactness within the object class.
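For illustration, a minimal PyTorch sketch of this spatial self-attention branch (Eqs. (1)–(2)) is given below; the module name, the use of 1×1-convolution projections, and the tensor shapes are our own assumptions rather than the paper's released code.

```python
import torch
import torch.nn as nn

class SpatialAttentionSketch(nn.Module):
    """Sketch of the spatial branch: an N x N map (N = H*W) re-weights
    positions, and the result is scaled by a learnable alpha and added to F."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions as the three learnable projections (assumed)
        self.q = nn.Conv2d(channels, channels, kernel_size=1)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # scale parameter alpha

    def forward(self, f):                      # f: (B, C, H, W)
        b, c, h, w = f.shape
        n = h * w
        q = self.q(f).view(b, c, n)            # (B, C, N)
        k = self.k(f).view(b, c, n)
        v = self.v(f).view(b, c, n)
        attn = torch.softmax(torch.bmm(k.transpose(1, 2), q), dim=-1)  # (B, N, N)
        out = torch.bmm(v, attn).view(b, c, h, w)   # weighted sum over positions
        return self.alpha * out + f                 # residual sum with F
```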


Channel Attention Enhancement. Since each channel of the features can be regarded as an object-specific response map, channel attention can be leveraged to enhance the feature representation of object-specific information. As shown in Fig. 2, the feature F ∈ R^{C×H×W} obtained by the backbone is first reshaped to R^{C×N}, where N = H × W. The input matrices Q, K, V are then obtained through three learnable weight matrices, and a matrix multiplication between K and the transpose of Q is performed, followed by a softmax layer, to obtain the channel attention map F^x_{ji} ∈ R^{C×C}, which measures the feature impact of the i-th channel on the j-th channel. Next, a matrix multiplication between the transpose of F^x_{ji} and F is performed and the result is reshaped to R^{C×H×W}. It is then multiplied by a scale parameter β and summed with the feature map F in an element-wise manner to obtain the final output F_CA ∈ R^{C×H×W}:

F_CA = β Σ_{i=1}^{C} (F^x_{ji} V) + F_j,    F^x_{ji} = K ⊗ Q^T    (3)

Each channel of F_CA is a weighted sum of the features at all channels and the original features, which boosts feature discriminability with long-range contextual information. Finally, the features obtained from the spatial and channel enhancements are added together to produce the output F_DEM for detection and tracking.

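A corresponding sketch of the channel branch and of the element-wise fusion that yields F_DEM is given below; for brevity we take Q, K, V as the reshaped feature itself, which is an assumption on our part rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttentionSketch(nn.Module):
    """Sketch of Eq. (3): a C x C map re-weights channels, scaled by beta."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))   # scale parameter beta

    def forward(self, f):                          # f: (B, C, H, W)
        b, c, h, w = f.shape
        x = f.view(b, c, -1)                       # reshape to (B, C, N)
        # Q = K = V = x here for brevity (assumption)
        attn = torch.softmax(torch.bmm(x, x.transpose(1, 2)), dim=-1)  # (B, C, C)
        out = torch.bmm(attn, x).view(b, c, h, w)
        return self.beta * out + f

# DEM output: element-wise sum of the two enhanced features (Sect. 3.1)
# f_dem = spatial_branch(f) + channel_branch(f)
```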
3.2 Identity Preserving Module

The contextual spatial information boosted by the proposed DEM mainly promotes detection. However, it has been observed that low-level features contain more object-specific information for re-ID learning. Thus, as a complement, we propose the identity preserving module to re-calibrate channels, capture the object-specific information, and preserve the semantic alignment of the id-embedding for re-ID. We leverage the channel attention mechanism [29] to construct the module. As illustrated in the right part of Fig. 2, it is mainly implemented in three steps. Firstly, the features F_DEM ∈ R^{C×H×W} output by DEM are split into multiple scales with different kernel sizes on each channel. With this operation, we can obtain richer information about the position of the input tensor and process it at multiple scales in a parallel manner. Next, we utilize the SEWeight module [30] to further extract the channel attention of the input feature maps at multiple scales and generate the channel-wise attention weight vectors, which can be represented as:

Z_i = SEWeight(F_DEM_i),  i = 0, 1, ..., S − 1,    (4)

Z_i ∈ R^{C'×1×1} is the attention weight, and each feature map at a different scale, F_DEM_i, has the common channel dimension C' = C/S. Through this operation, the shallow features, which are suitable for the re-ID task, receive more attention. By multiplying the weight with the feature at the corresponding scale F_DEM_i, we can obtain the enhanced feature with better information interaction between local and global channels:

Y_i = F_DEM_i ⊙ Softmax(Z_i),  i = 0, 1, ..., S − 1.    (5)

Finally, they are concatenated to generate the refined feature map for the re-ID head:

F_IPM = Cat([Y_0, Y_1, ..., Y_{S−1}]).    (6)

The identity preserving module preserves the semantic alignment of the id-embedding and improves the elasticity of objects of different scales during the association process, thus improving the adaptiveness of object matching under scale variation. The ablation study demonstrates that it complements DEM in improving the re-ID performance.
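A rough PyTorch sketch of this multi-scale re-calibration (Eqs. (4)–(6)) is given below; the kernel sizes, the number of scales S, and the SEWeight implementation are illustrative assumptions rather than the paper's actual settings.

```python
import torch
import torch.nn as nn

class SEWeight(nn.Module):
    """Squeeze-and-excitation weight vector for one scale (cf. [29, 30])."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, C', H, W)
        return self.fc(x)                          # (B, C', 1, 1)

class IPMSketch(nn.Module):
    """Sketch of Eqs. (4)-(6): split F_DEM into S channel groups, convolve each
    group with a different kernel size, weight each group by Softmax(SEWeight)
    across scales, and concatenate the re-calibrated groups."""
    def __init__(self, channels, scales=4, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.scales = scales
        c = channels // scales                     # C' = C / S
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, c, k, padding=k // 2) for k in kernel_sizes])
        self.se = nn.ModuleList([SEWeight(c) for _ in range(scales)])

    def forward(self, f_dem):                      # f_dem: (B, C, H, W)
        parts = torch.chunk(f_dem, self.scales, dim=1)
        feats = [conv(p) for conv, p in zip(self.convs, parts)]
        weights = torch.stack([se(x) for se, x in zip(self.se, feats)], dim=1)
        weights = torch.softmax(weights, dim=1)    # softmax across the S scales
        out = [feats[i] * weights[:, i] for i in range(self.scales)]
        return torch.cat(out, dim=1)               # F_IPM for the re-ID head
```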

4 Experiments

4.1 Implementation Details

Following the baseline FairMOT, DLA-34 is adopted as our backbone. The input image size is 1088 × 608 × 3, and the output feature map resolution is 272 × 152. The Adam optimizer is utilized to train our model for 20 epochs on an NVIDIA GeForce RTX 3090 GPU. The learning rate is set to 10^{-4}, decayed by a factor of 0.1 at the 10th epoch. The batch size is set to 12.

We use the following official metrics for MOT evaluation: the multi-object tracking accuracy (MOTA), which reflects the average tracking performance; the ratio of correctly identified detections over the average number of ground-truth and computed detections (IDF1), which emphasizes association accuracy; the higher order tracking accuracy (HOTA), which is described as the geometric mean of detection accuracy (DetA) and association accuracy (AssA); the number of identity switches (IDs); the average precision (AP); and the true positive rate (TPR) at a false accept rate of 0.1 for evaluating re-ID features.

For a fair comparison, the training datasets are the same as FairMOT, including ETH [31], CityPersons [32], CalTech [33], CUHK-SYSU [34], PRW [35], MOT17, and CrowdHuman [36]. We evaluate our tracker on two tracking benchmarks: MOT17 [24] and MOT20 [25].

4.2 Ablation Study

Evaluation of Proposed Modules. To evaluate the proposed modules of our framework, we conduct ablation studies on the validation set of MOT17 [24]; the results are shown in Table 1. Benefiting from the spatial and channel attention aggregation via self-attention, the proposed DEM brings a significant improvement in detection. As shown in the second row of Table 1, by incorporating DEM, the tracker outperforms the baseline FairMOT [20] by MOTA (+1.1), IDF1 (+1.6), and AP (+0.9). However, the number of identity switches rises. This is because


Table 1. Evaluation of the proposed modules on the validation set of MOT17. The best results are shown in bold.

No | Baseline | DEM | IPM | MOTA↑ | IDF1↑ | IDs↓ | AP↑
1  | ✓        |     |     | 67.5  | 69.9  | 408  | 79.6
2  | ✓        | ✓   |     | 68.6  | 71.5  | 432  | 80.5
3  | ✓        |     | ✓   | 68.6  | 70.4  | 402  | 79.8
4  | ✓        | ✓   | ✓   | 69.2  | 73.9  | 380  | 80.9

Table 2. Evaluation of training data on the validation set of MOT17. The best results are shown in bold.

Training Data | Method       | MOTA↑ | IDF1↑ | IDs↓ | AP↑  | TPR↑
MOT17         | FairMOT      | 67.5  | 69.9  | 408  | 79.6 | 93.4
MOT17         | AeMOT (ours) | 69.2  | 73.9  | 380  | 80.9 | 94.9
MOT17+MIX     | FairMOT      | 69.1  | 72.8  | 299  | 81.2 | 94.4
MOT17+MIX     | AeMOT (ours) | 70.0  | 74.4  | 282  | 82.8 | 96.0

the enhanced features reduce intra-class differentiation, which can be addressed by IPM. As can be seen in the third row of Table 1, compared with the baseline, IPM achieves performance gains of MOTA (+1.1) and IDF1 (+0.5) and reduces the identity switches, which indicates that the representation of the object in the re-ID head can be enhanced by IPM. By combining the two modules, the framework shows the highest performance gains and surpasses the baseline by MOTA (+1.7), IDF1 (+4.0), IDs (−6.9%), and AP (+1.3), which demonstrates that the two modules complement each other and achieve a good balance between detection and re-ID feature learning. The gain in IDs (−6.9%) shows that DEM also contributes to the re-ID task with the adjustment of IPM.

Evaluation of Training Data. In comparison with the baseline, we follow the same training data composed of multiple datasets. Table 2 shows the results of models trained with different training data, where "MIX" stands for the additional datasets listed in the subsection on Implementation Details. Compared with the baseline, our method outperforms it under both training settings. Moreover, our framework can benefit from larger training data.

4.3 Comparison with SOTA MOT Methods

We compare our method with the state-of-the-art trackers on the test set of MOT17 [24] and MOT20 [25], and the results are shown in Table 3. Our proposed AeMOT ranks first and outperforms the state-of-the-art trackers on the most important metrics. Compared with the baseline FairMOT, our method surpasses it by MOTA(+7.0), IDF1(+2.7), HOTA(+1.1), and IDs(−17.9%) on the MOT20 dataset, and surpasses it by MOTA(+0.2),


Table 3. Evaluation on the test sets of the MOT challenge. Comparisons with the state-of-the-art trackers are shown under the "private detector" method. The best results on each dataset are shown in bold.

MOT17 [24]
Tracker          | Device  | MOTA↑ | IDF1↑ | HOTA↑ | DetA↑ | AssA↑ | IDs↓ | FPS↑
GSDT [12]        | –       | 66.2  | 68.7  | 55.2  | 60.0  | 51.1  | 3318 | 4.9
CTrackerv1 [9]   | –       | 66.6  | 57.4  | 49.0  | 53.6  | 45.2  | 5529 | 6.8
CenterTrack [13] | Titanxp | 67.8  | 64.7  | 52.2  | 53.8  | 51.0  | 3039 | 17.5
QDTrack [10]     | –       | 68.7  | 66.3  | 53.9  | 55.6  | 52.7  | 3378 | 20.3
TraDeS [11]      | 2080Ti  | 69.1  | 63.9  | 52.7  | 55.2  | 50.8  | 3555 | 11.9
MeMOT [21]       | A100    | 72.5  | 69.0  | 56.9  | –     | 55.2  | 2724 | –
MOTR [14]        | V100    | 73.4  | 68.6  | 57.8  | 60.3  | 55.7  | 2439 | –
FairMOT [20]     | 2080Ti  | 73.7  | 72.3  | 59.3  | 60.9  | 58.0  | 3303 | 25.9
AeMOT (Ours)     | 3090    | 73.9  | 73.9  | 60.8  | 61.1  | 60.8  | 2955 | 20.9

MOT20 [25]
Tracker          | Device  | MOTA↑ | IDF1↑ | HOTA↑ | DetA↑ | AssA↑ | IDs↓ | FPS↑
FairMOT [20]     | 2080Ti  | 61.8  | 67.3  | 54.6  | 54.7  | 54.7  | 5243 | 13.2
MeMOT [21]       | A100    | 63.7  | 66.1  | 54.1  | –     | 55.0  | 1938 | –
TransTrack [7]   | V100    | 65.0  | 59.4  | 48.9  | 53.3  | 45.2  | 3608 | 7.2
GSDT [12]        | –       | 67.1  | 67.5  | 53.6  | 54.2  | 54.0  | 3131 | 0.9
Trackformer [15] | –       | 68.6  | 65.7  | 54.7  | 56.7  | 53.0  | 1532 | –
AeMOT (Ours)     | 3090    | 68.8  | 70.0  | 55.8  | 57.1  | 54.7  | 4307 | 10.1

IDF1 (+1.6), HOTA (+1.5), and IDs (−10.5%) on the MOT17 dataset. Compared with the latest Transformer-based and memory-based method MeMOT [21], our AeMOT achieves substantial performance gains of MOTA (+1.4/+5.1), IDF1 (+4.9/+3.9), and HOTA (+3.9/+1.7) on MOT17/MOT20, respectively. It should be noted that MOT20 is a much more challenging benchmark with serious occlusions and crowded scenarios. The larger performance gains on MOT20 further demonstrate that our AeMOT achieves robust performance in dense and more challenging scenes, owing to the substantial discriminability enhancement and identity information preservation of our proposed framework. Compared with existing methods, our tracker also achieves a comparable real-time speed of 20.9 FPS. To further demonstrate the performance, we show some visualization results on the MOT17 and MOT20 test sets in Fig. 3. It can be seen that tracking under challenges such as crowded scenes is robustly performed by our method, which effectively handles large scale changes and maintains the correct identities.


Fig. 3. Visual tracking results of our tracker on the MOT17 and MOT20 test sets. Each identity is shown with a different colored bounding box and trajectory. The illustration shows the change every 100 frames.

5 Conclusion

In this paper, we present a new framework, namely AeMOT, for multiple object tracking. It is designed with a discriminability enhancement module to enhance discriminative feature representations for detection and tracking, and an identity preserving module to further preserve identification features for the re-ID association. Our design significantly boosts the capacity for detection and tracking, alleviating the problems of large target size variation and target overlap in MOT. More importantly, it improves the elasticity of objects of different scales during the association process. The two modules complement each other. Experimental results demonstrate the effectiveness of our method.

References 1. Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: Siamcar: siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6269– 6277 (2020) 2. Guo, D., Shao, Y., Cui, Y., Wang, Z., Zhang, L., Shen, C.: Graph attention tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9543–9552 (2021)


3. Cui, Y., et al.: Joint classification and regression for visual tracking with fully convolutional siamese networks. Int. J. Comput. Vis. 130(2), 550–566 (2022) 4. Zhang, Y., et al.: ByteTrack: multi-object tracking by associating every detection box. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13682, pp. 1–21. Springer, Cham (2022). https://doi.org/ 10.1007/978-3-031-20047-2 1 5. Xu, J., Cao, Y., Zhang, Z., Hu, H.: Spatial-temporal relation networks for multiobject tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3988–3998 (2019) 6. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649. IEEE (2017) 7. Sun, P., et al.: Transtrack: multiple object tracking with transformer. arXiv preprint arXiv:2012.15460 (2020) 8. Xu, Y., et al.: TransCenter: transformers with dense representations for multipleobject tracking. IEEE Trans. Pattern Anal. Mach. Intell. 45(6), 7820–7835 (2023) 9. Peng, J., et al.: Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 145–161. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8 9 10. Pang, J., et al.: Quasi-dense similarity learning for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 164–173 (2021) 11. Wu, J., Cao, J., Song, L., Wang, Y., Yang, M., Yuan, J.: Track to detect and segment: an online multi-object tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12352–12361 (2021) 12. Wang, Y., Kitani, K., Weng, X.: Joint object detection and multi-object tracking with graph neural networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13708–13715. IEEE (2021) 13. Zhou, X., Koltun, V., Kr¨ ahenb¨ uhl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8 28 14. Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: MOTR: end-to-end Multiple-object tracking with tTransformer. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13687, pp. 659–675. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0 38 15. Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: Trackformer: multiobject tracking with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8844–8854 (2022) 16. Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) 17. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018) 18. Li, J., Pan, Z., Liu, Q., Cui, Y., Sun, Y.: Complementarity-aware attention network for salient object detection. IEEE Trans. Cybernet. 52(2), 873–886 (2020) 19. Gu, X., Chang, H., Ma, B., Shan, S.: Motion feature aggregation for video-based person re-identification. IEEE Trans. Image Process. 31, 3908–3919 (2022) 20. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: Fairmot: on the fairness of detection and re-identification in multiple object tracking. Int. J. 
Comput. Vision 129, 3069–3087 (2021)


21. Cai, J., et al.: Memot: multi-object tracking with memory. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8090– 8100 (2022) 22. Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017) 23. Fu, J., et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3146– 3154 (2019) 24. Milan, A., Leal-Taix´e, L., Reid, I., Roth, S., Schindler, K.: Mot16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016) 25. Dendorfer, P., et al.: Mot20: a benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003 (2020) 26. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468. IEEE (2016) 27. Yu, F., Li, W., Li, Q., Liu, Y., Shi, X., Yan, J.: POI: multiple object tracking with high performance detection and appearance feature. In: Hua, G., J´egou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 36–42. Springer, Cham (2016). https://doi.org/ 10.1007/978-3-319-48881-3 3 28. Zhou, X., Wang, D., Kr¨ ahenb¨ uhl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019) 29. Zhang, H., Zu, K., Lu, J., Zou, Y., Meng, D.: Epsanet: an efficient pyramid squeeze attention block on convolutional neural network. In: Proceedings of the Asian Conference on Computer Vision, pp. 1161–1177 (2022) 30. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018) 31. Ess, A., Leibe, B., Schindler, K., Van Gool, L.: A mobile vision system for robust multi-person tracking. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008) 32. Zhang, S., Benenson, R., Schiele, B.: Citypersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3221 (2017) 33. Doll´ ar, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: a benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 304– 311. IEEE (2009) 34. Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: Joint detection and identification feature learning for person search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3415–3424 (2017) 35. Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., Tian, Q.: Person reidentification in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1367–1376 (2017) 36. Shao, S., Zhao, Z., Li, B., Xiao, T., Yu, G., Zhang, X., Sun, J.: Crowdhuman: a benchmark for detecting human in a crowd. arXiv preprint arXiv:1805.00123 (2018)

TPNet: Enhancing Weakly Supervised Polyp Frame Detection with Temporal Encoder and Prototype-Based Memory Bank

Jianzhe Gao, Zhiming Luo(B), Cheng Tian, and Shaozi Li
Department of Artificial Intelligence, Xiamen University, Xiamen, Fujian, China
[email protected]

Abstract. Polyp detection plays a crucial role in the early prevention of colorectal cancer. The availability of large-scale polyp video datasets and video-level annotations has spurred research efforts to formulate polyp detection as a weakly-supervised anomaly detection task, which leverages video-level labeled training data for detecting frame-level polyps. However, few studies have investigated the impact of specific properties of polyp videos, including temporal dynamics, ambiguity, and complex noise. In this work, we propose TPNet, a novel framework for weakly-supervised polyp frame detection that addresses several challenges posed by colonoscopy videos. Specifically, we design a Temporal Encoder that effectively captures the temporal dynamics and intricate patterns within polyp video segments to improve accuracy. Additionally, we introduce a Prototype-based Memory Bank that facilitates the storage and retrieval of significant discriminative information, which enhances sensitivity and robustness in ambiguous and complicated conditions. Experiments conducted on one of the largest and most challenging colonoscopy datasets demonstrate that our proposed TPNet achieves state-of-the-art performance, surpassing the latest cutting-edge method by 6.19% in average precision (AP).

Keywords: Polyp Detection · Weakly Supervised · Temporal Encoder · Prototype-based Memory Bank

1 Introduction

Colorectal cancer (CRC) is one of the most prevalent and lethal forms of cancer globally [11,15,16]. Gastrointestinal endoscopy, a widely employed procedure, plays a pivotal role in the early identification of gastric and colorectal cancers [2]. This procedure involves inserting a flexible tube with a miniature camera into the digestive tract, enabling the identification of precancerous lesions [7]. However, the miss rate of polyp detection remains alarmingly high [1,12,25]. Therefore, there is an urgent need for advanced techniques that can effectively address the


Fig. 1. Illustration of the polyp frame detection task, which aims to detect polyps within colonoscopy videos. Frames are selected from continuous colonoscopy video segments and shown in sequential order: (a) Temporal Dynamics; (b) Ambiguity; (c) Complex Noise.

high miss rate and assist healthcare professionals in accurately detecting and diagnosing polyps during gastrointestinal endoscopy.

With the increasing availability of shared data and the low cost of video-level annotations, Tian et al. [22] formulate polyp frame detection within colonoscopy videos as a weakly-supervised video anomaly detection task. As shown in Fig. 1, given a series of colonoscopy videos, the objective is to learn a model that accurately localizes polyp frames within a video using only video-level annotations. However, directly applying weakly-supervised video anomaly detection methods [5,20,21,23,26,28] to this task has yielded sub-optimal results due to the significant differences between colonoscopy and real-world videos. Therefore, there is a pressing need to develop more robust methods for improving the performance of weakly-supervised polyp frame detection. In the latest research, Tian et al. [22] combine a convolutional transformer-based multiple instance learning method with contrastive learning, achieving 84.55% AP. Nevertheless, it does not fully consider the unique characteristics of colonoscopy videos. Note that polyp videos focus exclusively on the internal conditions of the colon during endoscopic examinations, so they exhibit unique characteristics that set them apart from real-world videos: (1) Temporal Dynamics (as shown in Fig. 1a): polyps present morphological and positional changes during the examination process and intestinal peristalsis. These temporal variations contain valuable information, including dynamic patterns associated with both normal and abnormal conditions; (2) Ambiguity (as shown in Fig. 1b): in certain segments, polyps closely resemble surrounding normal tissues, lacking discernible features that facilitate clear differentiation; (3) Complex Noise (as shown in Fig. 1c): polyp videos are often accompanied by complex noise, which stems from limited recording conditions, artifacts such as water stains and glare, residual food particles, etc.

In this work, to account for the specific attributes inherent in polyp videos, we propose TPNet, a novel weakly-supervised polyp frame detection model that integrates a Temporal Encoder and a Prototype-based Memory Bank. Specifically, the Temporal Encoder is designed based on a Bi-directional LSTM (Long Short-Term Memory) and the self-attention mechanism, capturing and encoding the dynamic temporal information present in polyp video segments. Besides, the Prototype-based Memory Bank facilitates the accumulation and utilization of discriminative information, mitigating the challenges posed by ambiguity and complex noise in polyp video segments. Finally, we conduct experiments on a recent, comprehensive, and challenging colonoscopy dataset to demonstrate the superior performance of our TPNet. To summarize, the main contributions of this study are as follows:

– We propose a novel weakly-supervised polyp frame detection model, TPNet, that combines a Temporal Encoder and a Prototype-based Memory Bank to effectively address the challenges associated with colonoscopy videos, including temporal dynamics, ambiguity, and complex noise.
– We introduce a Temporal Encoder based on a Bi-directional LSTM and the self-attention mechanism to capture the temporal variations and patterns within polyp video segments, enabling a comprehensive understanding of the dynamic changes in polyp appearance and position over time.
– We design a Prototype-based Memory Bank that stores and updates information from polyp video segments, which facilitates better discrimination between normal and abnormal snippets and enhances sensitivity and robustness to intricate features.
– Extensive experiments demonstrate the remarkable capability of our model, outperforming state-of-the-art models with a significant improvement of 6.19% in Average Precision on one of the largest and most challenging colonoscopy video datasets.

2 Method

2.1 Overall Architecture

Fig. 2. The pipeline of our proposed TPNet.

As illustrated in Fig. 2, the input is a video segment v ∈ R^{T×H×W}, where T is the temporal length (i.e., the number of segments) and H and W denote the spatial resolution of each segment. Similar to numerous weakly-supervised video anomaly detection methods [5,20–23,26,28], we extract video features using a pre-trained I3D model [4]:

X_I3D = I3D(v),    (1)

where X_I3D ∈ R^{T×D_I3D} and D_I3D is the feature dimension. Then, the Temporal Encoder module is employed to model the dynamic relationships and interactions among X_I3D:

X_TE = TemporalEncoder(X_I3D),    (2)

where X_TE ∈ R^{T×D} represents the temporally enhanced features that capture the temporal dynamics and dependencies within the video segments. Next, X_TE is passed through the Prototype-based Memory Bank, which enables the temporally enhanced features to interact with each memory block and obtain the memory-enhanced features:

X_Mem = Memory(X_TE),    (3)

where Memory denotes the Prototype-based Memory Bank and X_Mem ∈ R^{T×D} represents the prototype-augmented features. Furthermore, we concatenate X_TE with X_Mem along the feature dimension and pass them through a linear layer to obtain the final temporal scores:

y_T = Linear(Concat(X_TE, X_Mem)),    (4)

where y_T ∈ R^T is the anomaly score of each temporal segment and Linear refers to a linear layer. Since we only have video-level annotations in the training set, i.e., y ∈ {0, 1}, we utilize the average of y_T as the final video score ŷ ∈ [0, 1] for supervision:

ŷ = Mean(y_T).    (5)

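For concreteness, a minimal PyTorch sketch of this overall pipeline (Eqs. (1)–(5)) is given below; the module interfaces and the sigmoid applied to the segment scores are our own assumptions standing in for the components detailed in Sects. 2.2–2.3, and the I3D features are assumed to be pre-extracted.

```python
import torch
import torch.nn as nn

class TPNetSketch(nn.Module):
    """Sketch of the overall pipeline: pre-extracted I3D features ->
    Temporal Encoder -> Prototype-based Memory Bank -> concat -> linear ->
    per-segment scores, averaged into a video-level score."""
    def __init__(self, temporal_encoder, memory_bank, dim=1024):
        super().__init__()
        self.temporal_encoder = temporal_encoder   # Sect. 2.2
        self.memory_bank = memory_bank             # Sect. 2.3
        self.head = nn.Linear(2 * dim, 1)

    def forward(self, x_i3d):                      # x_i3d: (B, T, D_I3D)
        x_te = self.temporal_encoder(x_i3d)        # (B, T, D)   Eq. (2)
        x_mem = self.memory_bank(x_te)             # (B, T, D)   Eq. (3)
        y_t = self.head(torch.cat([x_te, x_mem], dim=-1)).squeeze(-1)  # Eq. (4)
        y_t = torch.sigmoid(y_t)                   # scores in [0, 1] (assumed)
        y_hat = y_t.mean(dim=1)                    # video-level score, Eq. (5)
        return y_hat, y_t
```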
2.2 Temporal Encoder

The polyp video sequences provide continuous observations of the colon, capturing the dynamic temporal changes of polyps over time [8,9,18,22,24,25,27]. For instance, polyps may undergo a left-to-right movement, where their appearance and position gradually shift horizontally across the video frames. To fully leverage this valuable temporal information, we propose a Temporal Encoder that captures the dynamic variations and patterns within polyp video snippets. Specifically, we utilize a Bi-directional LSTM, which allows the model to leverage the preceding segments to understand the temporal dependencies and patterns leading up to the current segment, and also to incorporate information from succeeding segments to anticipate future temporal dynamics. In addition, we use the self-attention mechanism to dynamically attend to the segments that carry more informative and discriminative temporal patterns. Given X_I3D ∈ R^{T×D_I3D}, the features extracted from the pre-trained I3D model, we first employ a Bi-directional LSTM network to facilitate temporal dynamic modeling; the process is defined as:

X_LSTM = Bi-LSTM(Conv1D(X_I3D)),    (6)

where Conv1D indicates a 1D convolutional layer used to extract local features and capture spatial patterns within each video segment, and Bi-LSTM represents a Bi-directional LSTM layer. Furthermore, we apply linear transformations to X_LSTM to derive the query (Q), key (K), and value (V) matrices, followed by global modeling using self-attention:

X_att = Softmax(QK^T / √d) V + X_LSTM,    (7)

where d represents the dimension of the query and key vectors and + denotes the residual operation. The attention weights capture the importance of each element in the sequence with respect to the other elements. Finally, we employ a feed-forward network (FFN) to further process the attended output X_att:

X_TE = FFN(X_att) + X_att,    (8)


where X_TE ∈ R^{T×D} and FFN represents a linear layer followed by the GELU activation function. By incorporating the Temporal Encoder into our model, we enable the consideration of information from both past and future segments in polyp video sequences. This allows for a more comprehensive understanding of the temporal dynamics and patterns present in polyp videos.

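A compact sketch of this encoder (Eqs. (6)–(8)), with assumed hidden sizes and a single attention head for brevity, might look as follows.

```python
import torch
import torch.nn as nn

class TemporalEncoderSketch(nn.Module):
    """Sketch of Eqs. (6)-(8): Conv1D -> Bi-LSTM -> self-attention -> FFN.
    The input/hidden dimensions are illustrative assumptions."""
    def __init__(self, in_dim=2048, dim=1024):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, dim, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

    def forward(self, x):                          # x: (B, T, D_I3D)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)    # local patterns, Eq. (6)
        x_lstm, _ = self.bilstm(x)                          # (B, T, D)
        q, k, v = self.q(x_lstm), self.k(x_lstm), self.v(x_lstm)
        d = q.size(-1)
        attn = torch.softmax(q @ k.transpose(1, 2) / d ** 0.5, dim=-1)
        x_att = attn @ v + x_lstm                           # Eq. (7), residual
        return self.ffn(x_att) + x_att                      # Eq. (8)
```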
2.3 Prototype-Based Memory Bank

Memory banks [6,10,13,17,19] have gained extensive usage in unsupervised video anomaly detection, owing to their ability to store and retrieve crucial information, enabling the learning of latent patterns and regularities from unlabeled data. However, the complex data distribution in weakly-supervised scenarios poses challenges for memory banks to effectively learn and store key information, resulting in limited adoption in weakly-supervised video anomaly detection. Nevertheless, for polyp videos, whose data diversity is reduced compared to real-world data, we can efficiently leverage memory banks to store and retrieve discriminative features, thereby enhancing performance.

The Prototype-based Memory Bank consists of multiple memory blocks, with each memory block storing a prototype that represents features of video segments. Formally, we define the Prototype-based Memory Bank as a matrix M ∈ R^{N×D}, where N represents the number of memory blocks and D denotes the dimension of each prototype. We use a uniform distribution to randomly generate the initial value of each prototype. Given X_TE ∈ R^{T×D}, we first calculate the correlation of X_TE to M:

W = Sigmoid(X_TE · M^T),    (9)

where · denotes matrix multiplication and Sigmoid refers to the sigmoid function. W ∈ R^{T×N} is used to measure the correlation between the input features and the memory bank. According to this relevance weight W, we retrieve the relevant prototypes from the memory bank to obtain the augmented features that incorporate previously stored information:

X_Mem = W · M,    (10)

where XMem ∈ RT ×D represents the memory-augmented features. This operation allows the model to selectively retrieve and integrate relevant information from the memory bank into the input features. The Prototype-based Memory Bank enables the storage and retrieval of prototypes that capture various video segment features. By preserving these prototypes, the model not only learns additional fine-grained features but also captures long-term dependency patterns within polyp video segments, thereby improving sensitivity, accuracy, and robustness in polyp detection.
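A minimal sketch of the memory bank (Eqs. (9)–(10)) is shown below; the uniform initialization range and tensor shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PrototypeMemoryBankSketch(nn.Module):
    """Sketch of Eqs. (9)-(10): N learnable prototypes of dimension D;
    input features are re-expressed as sigmoid-weighted combinations of them."""
    def __init__(self, num_blocks=80, dim=1024):
        super().__init__()
        # prototypes initialised from a uniform distribution (range assumed)
        self.memory = nn.Parameter(torch.empty(num_blocks, dim).uniform_(-1, 1))

    def forward(self, x_te):                       # x_te: (B, T, D)
        w = torch.sigmoid(x_te @ self.memory.t())  # (B, T, N), Eq. (9)
        x_mem = w @ self.memory                    # (B, T, D), Eq. (10)
        return x_mem
```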


2.4 Loss Function

Given the final video score ŷ ∈ [0, 1], we employ a binary cross-entropy loss function to supervise the training:

Loss = − Σ_{i=1}^{B} ( y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ),    (11)

where y ∈ {0, 1} represents the video-level label and B is the batch size.
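A minimal training-step sketch under this video-level supervision, assuming the hypothetical TPNetSketch interface from the earlier sketch, is:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss(reduction='sum')  # sums Eq. (11) over the batch

def training_step(model, optimizer, x_i3d, video_labels):
    """x_i3d: (B, T, D_I3D) pre-extracted features; video_labels: (B,) in {0, 1}."""
    y_hat, _ = model(x_i3d)                  # video-level scores in [0, 1]
    loss = criterion(y_hat, video_labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```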

3 Experiment

3.1 Experiment Settings

Training and Testing Settings: We use a dataset that combines the HyperKvasir [3] dataset and the LDPolypVideo [14] dataset, which in total contains over one million frames and diverse polyps of various sizes and shapes. Furthermore, similar to [22], the training set comprises 61 normal videos without polyps and 102 abnormal videos with polyps, while the testing set consists of 30 normal videos and 60 abnormal videos. Notably, the training set is annotated at the video level, whereas the testing set is annotated at the frame level.

Evaluation Metrics: Following [22], we adopt the frame-level area under the ROC curve (AUC) and average precision (AP) as the evaluation metrics; higher AUC and AP values indicate better performance.

Implementation Details: We adopt the methodology presented in [22], which divides each video into 32 video snippets and employs the pre-trained I3D model [4] to extract 2048-D snippet features from the mixed5c layer. The implementation is conducted on the PyTorch platform and trained using an NVIDIA 3090 GPU. For the training process, we utilize a batch size of 16 and train the model for 200 epochs. The optimization algorithm is Adam, with a learning rate of 1e-4. Furthermore, we perform end-to-end training without applying any data augmentation techniques. The number of memory blocks within the Prototype-based Memory Bank is set to 80, and the dimension D of X_TE and X_Mem is set to 1024.

3.2 Comparison with Other Methods

Quantitative Evaluation: We selected the cutting-edge weakly-supervised video anomaly detection models [5,20,21,23,26,28] from recent years and the latest CTMIL [22], which is specifically designed for polyp frame detection. The results are all reproduced from their respective open-source code under the same setting, using the same features extracted from the pre-trained I3D model for a fair comparison. As depicted in Table 1, the recent cutting-edge weakly-supervised video anomaly detection methods have shown a wide range of outcomes when


Table 1. Comparison with other methods. Results recomputed from released source codes using the same features from I3D in the same setting. Results of CTMIL and ours are the average over 30 runs.

Method       | Year | Feature  | AUC(%) | AP(%)
DeepMIL [20] | 2018 | I3D(RGB) | 89.41  | 68.53
GCN-Ano [28] | 2019 | I3D(RGB) | 92.13  | 75.39
CLAWS [26]   | 2020 | I3D(RGB) | 95.62  | 80.42
AR-Net [23]  | 2020 | I3D(RGB) | 88.59  | 71.58
MIST [5]     | 2021 | I3D(RGB) | 94.53  | 72.85
RTFM [21]    | 2021 | I3D(RGB) | 96.30  | 77.96
CTMIL [22]   | 2022 | I3D(RGB) | 97.18  | 84.55
Ours         | 2023 | I3D(RGB) | 98.97  | 90.74

applied to polyp frame detection, owing to their tailored design for various real-world data distributions. Furthermore, beyond these methods, CTMIL stands out by achieving an impressive AP of 84.55%. This remarkable performance underscores the potential of integrating contrastive learning to enhance the discrimination between polyps and normal tissues. However, our proposed TPNet outperforms CTMIL by a significant margin, achieving an outstanding AP of 90.74%. This substantial improvement attests to the heightened accuracy and robustness of our model. Moreover, this significant margin further reinforces the superiority of our methodology, which incorporates the guidance of temporal dynamics and leverages discriminative information for optimal polyp frame detection.

Qualitative Evaluation: To provide an intuitive demonstration of the effectiveness of our proposed TPNet, we conducted a series of qualitative experiments. In Fig. 3, we have selected several specific segments that exhibit characteristics commonly observed in polyp videos. In Fig. 3a, we demonstrate the temporal changes of a polyp over several segments, highlighting the progressive movement that occurs. The result shows that our proposed TPNet is capable of effectively tracking the temporal dynamics of polyps within videos and utilizing this contextual information about the temporal sequences to accurately identify polyps. In addition, we present snippets where the polyp appears visually similar to the surrounding normal tissues in Fig. 3b, which demonstrates the effectiveness of our approach in distinguishing subtle differences. Moreover, as illustrated in Fig. 3c and Fig. 3d, TPNet demonstrates the capability to identify polyps in the presence of various sources of noise, which further confirms that our proposed method can effectively utilize the stored prototypes for robust polyp detection. Through these qualitative experiments, we highlight the strengths of our proposed TPNet and demonstrate its efficacy in addressing various challenging scenarios encountered in polyp videos. The results showcase the model's ability to


Fig. 3. Qualitative evaluation of several complex scenarios: (a) the polyp's appearance evolving as the camera proceeds; (b) polyps that are hardly distinguishable from the surrounding normal tissue; (c) and (d) interference from water and light.

accurately track the temporal patterns of polyps, effectively differentiate polyps from normal tissues, and handle noise interference. These promising findings could provide valuable insights for future research and development in this field.

3.3 Effectiveness of Each Component

To analyze the role of the Temporal Encoder (TE) and Prototype-based Memory Bank (PMB) within our proposed TPNet, we conduct a comprehensive ablation study. As indicated in Table 2, we can observe that: (1) The TE-only model

Table 2. Ablation studies of our proposed TPNet. Results are the average over 30 runs.

TE | PMB | AUC(%) | AP(%)
–  | –   | 94.41  | 75.53
✓  | –   | 98.13  | 88.56
–  | ✓   | 95.42  | 77.74
✓  | ✓   | 98.97  | 90.74

Fig. 4. Parametric experiments on the number of memory blocks.


(Row 2) achieves impressive performance, with an AUC of 98.13% and an AP of 88.56%, surpassing the state-of-the-art methods significantly. This result confirms that temporal dynamic guidance is essential for polyp frame detection; (2) When combined with the PMB, the performance of both the vanilla model (Row 1) and the TE-only model (Row 3) improves. This finding underscores the importance of the prototypes stored and retrieved by the PMB. In this manner, our model captures the distinctive characteristics of polyps, enhancing its ability to discriminate between polyp features and normal features, which further boosts precision and overall performance; (3) The best results are achieved by the model (Row 4) that combines TE and PMB, which highlights the complementary nature and synergistic effects of these two components in TPNet. By leveraging both temporal dynamic guidance and prototype information, the model is able to better capture the temporal patterns and utilize typical features for more accurate and robust polyp frame detection.

3.4 Influence of the Number of Memory Blocks

We varied the number of memory blocks from 10 to 150 and measured the average precision (AP) as the performance metric. The results are summarized in Fig. 4, from which we can draw the following conclusions: (1) more memory blocks bring greater improvements in the model's performance, which highlights the effectiveness of utilizing a sufficient number of memory blocks for capturing and storing features; (2) the model achieves its peak performance, i.e., 90.74% AP, when the number of memory blocks is set to 80, which provides robust guidance for our parameter settings; (3) as the number of memory blocks continues to increase, the performance starts to decline and eventually reaches a stable state. This observation suggests that setting an excessive number of memory blocks introduces redundancy into the model, resulting in diminishing returns and potential computational inefficiency.

4 Conclusion

In this work, we present TPNet, a novel weakly-supervised polyp frame detection model that effectively addresses the challenges associated with colonoscopy videos. Specifically, we design a Temporal Encoder module that captures global relationships and interactions within polyp videos, effectively modeling the dynamic temporal changes and patterns within polyp video segments. In addition, we propose a Prototype-based Memory Bank, which enables the storage and retrieval of significant discriminative information, enhancing the model's sensitivity and robustness to intricate polyps. Extensive experiments demonstrate that our proposed TPNet achieves state-of-the-art performance in weakly-supervised polyp frame detection.

Acknowledgement. This work is supported by the National Natural Science Foundation of China (No. 62276221), the Natural Science Foundation of Fujian Province of China (No. 2022J01002), and the Science and Technology Plan Project of Xiamen (No. 3502Z20221025).


References 1. Ahn, S.B., Han, D.S., Bae, J.H., Byun, T.J., Kim, J.P., Eun, C.S.: The miss rate for colorectal adenoma determined by quality-adjusted, back-to-back colonoscopies. Gut Liver 6(1), 64 (2012) 2. Ali, S., Dmitrieva, M., Ghatwary, N., Bano, S., Polat, G., Temizel, A., Krenzer, A., Hekalo, A., Guo, Y.B., Matuszewski, B., et al.: Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Med. Image Anal. 70, 102002 (2021) 3. Borgli, H., et al.: Hyperkvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci. Data 7(1), 283 (2020) 4. Carreira, J., Zisserman, A.: Quo vadis, action recognition? a new model and the kinetics dataset. In: CVPR, pp. 6299–6308 (2017) 5. Feng, J.C., Hong, F.T., Zheng, W.S.: Mist: multiple instance self-training framework for video anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14009–14018 (2021) 6. Gong, D., et al.: Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1705–1714 (2019) 7. Itoh, H., Misawa, M., Mori, Y., Kudo, S.E., Oda, M., Mori, K.: Positive-gradientweighted object activation mapping: visual explanation of object detector towards precise colorectal-polyp localisation. Int. J. Comput. Assist. Radiol. Surg. 17(11), 2051–2063 (2022) 8. Ji, G.-P., et al.: Progressively normalized self-attention network for video polyp segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12901, pp. 142–152. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87193-2 14 9. Ji, G.P., et al.: Video polyp segmentation: a deep learning perspective. Mach. Intell. Res. 19, 1–19 (2022). https://doi.org/10.1007/s11633-022-1371-y 10. Kim, Y., Kim, M., Kim, G.: Memorization precedes generation: learning unsupervised GANs with memory networks. arXiv preprint arXiv:1803.01500 (2018) 11. Ladabaum, U., Dominitz, J.A., Kahi, C., Schoen, R.E.: Strategies for colorectal cancer screening. Gastroenterology 158(2), 418–432 (2020) 12. Leufkens, A., Van Oijen, M., Vleggaar, F., Siersema, P.: Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy 44(05), 470– 475 (2012) 13. Liu, Z., Nie, Y., Long, C., Zhang, Q., Li, G.: A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13588–13597 (2021) 14. Ma, Y., Chen, X., Cheng, K., Li, Y., Sun, B.: LDPolypVideo benchmark: a largescale colonoscopy video dataset of diverse polyps. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12905, pp. 387–396. Springer, Cham (2021). https:// doi.org/10.1007/978-3-030-87240-3 37 15. Mathur, P., et al.: Cancer statistics, 2020: report from national cancer registry programme, India. JCO Glob. Oncol. 6, 1063–1075 (2020) 16. Misawa, M., et al.: Development of a computer-aided detection system for colonoscopy and a publicly accessible large colonoscopy video database (with video). Gastrointest. Endosc. 93(4), 960–967 (2021) 17. Park, H., Noh, J., Ham, B.: Learning memory-guided normality for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14372–14381 (2020)


18. Podlasek, J., Heesch, M., Podlasek, R., Kilisi´ nski, W., Filip, R.: Real-time deep learning-based colorectal polyp localization on clinical video footage achievable with a wide array of hardware configurations. Endosc. Int. Open 9(05), E741– E748 (2021) 19. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: International Conference on Machine Learning, pp. 1842–1850. PMLR (2016) 20. Sultani, W., Chen, C., Shah, M.: Real-world anomaly detection in surveillance videos. In: CVPR, pp. 6479–6488 (2018) 21. Tian, Y., Pang, G., Chen, Y., Singh, R., Verjans, J.W., Carneiro, G.: Weaklysupervised video anomaly detection with robust temporal feature magnitude learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4975–4986 (2021) 22. Tian, Y., et al.: Contrastive transformer-based multiple instance learning for weakly supervised polyp frame detection. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. MICCAI 2022. LNCS, vol. 13433, pp. 88–98. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16437-8 9 23. Wan, B., Fang, Y., Xia, X., Mei, J.: Weakly supervised video anomaly detection via center-guided discriminative learning. In: 2020 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2020) 24. Wu, L., Hu, Z., Ji, Y., Luo, P., Zhang, S.: Multi-frame collaboration for effective endoscopic video polyp detection via spatial-temporal feature transformation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12905, pp. 302–312. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87240-3 29 25. Xu, J., Zhao, R., Yu, Y., Zhang, Q., Bian, X., Wang, J., Ge, Z., Qian, D.: Realtime automatic polyp detection in colonoscopy using feature enhancement module and spatiotemporal similarity correlation unit. Biomed. Signal Process. Control 66, 102503 (2021) 26. Zaheer, M.Z., Mahmood, A., Astrid, M., Lee, S.-I.: CLAWS: clustering assisted weakly supervised learning with normalcy suppression for anomalous event detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part XXII. LNCS, vol. 12367, pp. 358–376. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-58542-6 22 27. Zhao, X., et al.: Semi-supervised spatial temporal attention network for video polyp segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. MICCAI 2022. LNCS, vol. 13434, pp. 456–466 Springer, Cham (2022). https://doi.org/ 10.1007/978-3-031-16440-8 44 28. Zhong, J.X., Li, N., Kong, W., Liu, S., Li, T.H., Li, G.: Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1237–1246 (2019)

Learning Frequency-Based Disentanglement and Filtering for Generalizable Person Re-identification

Pengpeng Song1 and Jinjia Peng1,2(B)

1 School of Cyber Security and Computer, Hebei University, Baoding, Hebei, China
[email protected], [email protected]
2 Hebei Machine Vision Engineering Research Center, Baoding, China

Abstract. Domain Generalization (DG) in Person Re-identification (ReID) tackles the task of testing in unseen domains without using target domain data during training. Existing DG ReID methods achieve impressive performance with unified ensemble models or multi-expert hybrid networks. However, as the number of source domains increases, complex relationships between training samples result in domain-invariant characteristics with spurious correlations, impacting further generalization. To address this, we propose a Bilateral Frequency-Aware Network (BFAN) that leverages spectral feature correlation learning for discriminative hybrid features. BFAN includes a Bilateral Frequency Component-guided Attention (BFCA) module to capture semantic information from diverse frequency features and fuse it with spatial features. Additionally, a Fourier Noise Masquerade Filtering (FNMF) module is introduced to suppress non-generalization-supporting components in the frequency domain. Extensive experiments on various datasets demonstrate our method's notably competitive performance.

Keywords: Domain Generalization · Person Re-identification · Frequency Domain Learning

1 Introduction

Person re-identification (ReID) [23,25] aims to match individuals across different camera views or frames. Owing to the maturation of clustering methods [9,17,18], unsupervised person ReID [15] has achieved remarkable development, but applying trained models to unseen domains often leads to performance degradation due to domain gaps [20]. With a substantial amount of unlabeled pedestrian data available, it is crucial to explore ReID models with robust generalization capabilities. Recent research has focused on domain adaptive [25] (DA) and domain generalization [23,26] (DG) ReID. While DA ReID adapts the model using target domain samples, DG ReID tackles the more challenging scenario of unseen target domains without fine-tuning. This paper specifically addresses the practical DG person ReID problem, which is important for real-world applications. (Supported by the Central Government Guides Local Science and Technology Development Fund Projects (236Z0301G); the Hebei Natural Science Foundation (F2022201009); and the Science and Technology Project of Hebei Education Department (QN2023186).)

Fig. 1. Reconstruction of frequency components of person samples: (a) original image; (b) normalized image; (c–d) reconstruction of the high- and low-frequency components of the image after separation; (e–f) reconstructed images with only the amplitude or phase component, obtained by setting the other component to a constant.

Existing DG ReID methods can be divided into two categories. One category [4,26] focuses on ensemble models that learn a shared feature space across different domains, while the other category [5,23] emphasizes correlations between domains and uses multi-expert networks to enhance generalization. Nonetheless, both approaches may encounter two potential issues: (1) The distribution of source domain data in the spatial domain follows the assumption of being independent and identically distributed (i.i.d.), but the spurious correlations caused by the similarity of instance-level features detrimentally impact model generalization. (2) When the training and test samples come from different distributions, the generalization behavior of CNNs is easily disturbed by frequency domain noise [2], which is also reflected in DG ReID models. A recent theoretical study [24] has shown that the sample spectrum influences the model's generalization behavior, and models exhibit preferences for component information at different frequencies during training. Feature learning in the frequency domain can effectively address these issues. As shown in Fig. 1, low-frequency components capture texture structure and image-specific energy distributions, while high-frequency information describes pixel variations between object edges and the background, showing consistency across domains. The implicit high-level semantics of the image amplitude and phase further aid the model in avoiding local optima [2]. Therefore, inspired by previous studies, this paper proposes the BFAN network for multi-source DG ReID in the image frequency domain. It aims to enable the model to learn domain-invariant features without spatial domain bias and leverage complementary information from different frequency components, improving generalization to unseen domains. BFAN applies the fast Fourier transform to obtain frequency representations, separates them into high- and low-frequency components using Bilateral Frequency Component-guided Attention (BFCA), captures long-range dependencies with non-local attention, and employs Fourier Noise Masquerade Filtering (FNMF) to filter out non-generalization-supporting frequency information. Our contributions are summarized as follows:
– A Bilateral Frequency-Aware Network (BFAN) is proposed to learn domain-invariant features based on the image frequency spectrum in the embedding space.
– A Bilateral Frequency Component-guided Attention (BFCA) module is constructed to perceive the semantic information in the high-frequency components and the texture information in the low-frequency components.
– A novel Fourier Noise Masquerade Filtering (FNMF) module is designed to suppress the influence of frequency noise on generalization through filtering operations in the spatial dimension.

2 Methods

2.1 Overview

In this study, the DG ReID problem is addressed by incorporating complementary information from the spatial and frequency domains. The structured training framework, depicted in Fig. 2, aims to improve the model's generalization in unseen domains. Frequency domain information is used as a supplement to spatial domain information for extracting robust domain-invariant representations. The backbone network receives input from multi-source domains with non-overlapping person identity tags. The BFCA module is introduced at the middle layer to acquire complementary frequency domain information, separating features into high-frequency and low-frequency components. To capture original image semantics and texture information, the FNMF module is positioned before the GAP at the end of the trunk. It effectively filters out frequency domain noise, enhancing the model's generalizability and overall performance. In typical DG ReID methods [4,26], $N_s$ source domains $D_S = \{D_S^1, D_S^2, \ldots, D_S^{N_s}\}$ are provided during training. Each $D_S^k$ consists of $N_k$ samples $x_i^k$ with corresponding labels $y_i^k$ in the $k$-th domain. The objective is to train models on $D_S$ that perform well on an unseen domain $D_T$ without further updates using $D_T$ data. In DG ReID, the source domains are assumed to follow independent identical distributions, with disjoint label spaces. The aim is to train a generalizable model using the source data. During testing, the model is directly evaluated on the unseen domain $D_T$. The BFCA module applies the Fourier transform to the embedding space features, separating them into different frequency components using a controllable-sized Euclidean distance mask. Non-local convolutional operations are then applied to the high/low frequency components in the frequency domain, which injects original semantic and texture information into the spatial domain features and helps the model avoid poor local optima.


Fig. 2. The overall architecture of Bilateral Frequency-aware Network (BFAN) which unites multiple source domains for federated training and does not require the target domain to participate in the training process.

On the other hand, the FNMF module performs frequency domain filtering deeper in the network using pooling operations and learns non-shared instance-level masks. It demonstrates the feasibility of frequency domain noise filtering without modifying the original data sources. Detailed descriptions of these modules are provided in the following subsections.

2.2 Bilateral Frequency Component-Guided Attention

Most DG person ReID methods focus on capturing the semantic information embedded in spatial domain features while ignoring the potential complementary information in the frequency domain, which can help the model improve feature recognition. Since the ultimate goal of domain generalized person ReID is to train a model that performs well on unknown domains, learning domain-invariant features is particularly important: previous methods inevitably retain a bias toward the source domains after training, so learning debiased complementary information from the frequency domain components is needed to bridge this domain bias. Inspired by the semantic preservation and non-intuitive generalization of the Fourier frequency components [16], the Bilateral Frequency Component-guided Attention (BFCA) is designed around a Euclidean distance mask. As shown in Fig. 2, given the $i$-th sample $x_i^k \in \mathbb{R}^{C \times H \times W}$ from the $k$-th source domain mini-batch, each channel of the latent feature $x_i^k$ is transformed to the frequency domain using the fast Fourier transform:

$$\mathcal{F}\left(x_i^k\right)(c, u, v) = \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} x_i^k(c, h, w)\, e^{-j2\pi\left(\frac{h}{H}u + \frac{w}{W}v\right)} \qquad (1)$$

where $C$, $H$ and $W$ are the number of channels, height and width of the feature map, respectively, $\mathcal{F}^{-1}(\cdot)$ denotes the inverse Fourier transform, and $\otimes$ indicates the element-wise multiplication of two matrices.


Fig. 3. The FNMF module constructs spectral features using Fourier transform and applies filtering operations to extract transferable and easily generalizable feature information within the spectrum.

The transformed image frequency domain features are denoted as $G_{freq} = \mathcal{F}(x_i^k)$. The high-frequency and low-frequency components are obtained by decomposing the frequency domain features through the Euclidean distance mask:

$$G_h,\, G_l = \mathcal{F}^{-1}\left(G_{freq} \otimes \hat{M}(r)\right) \qquad (2)$$

Here $\hat{M}(\cdot)$ is the Euclidean distance mask construction function; the frequency component mask is built from the hyperparameter radius $r$ to obtain the high-frequency component $G_h \in \mathbb{R}^{C \times H \times W}$ and the low-frequency component $G_l \in \mathbb{R}^{C \times H \times W}$ of the image. The mask is constructed as

$$\hat{M}(i, j) = \begin{cases} 1, & d\left((i, j), (u_c, v_c)\right) \le r \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $d(\cdot)$ refers to the Euclidean distance, $\hat{M}(i, j)$ indexes the position $(i, j)$ in the frequency component feature map, and $(u_c, v_c)$ denotes the center location of the feature map. Then, a weighted sum over all positions in the frequency domain is computed from $G_h$ and $G_l$ using non-local attention [19] to obtain a long-range representation $G_{Fusion}$ covering the whole feature map:

$$G_{Fusion} = \mathrm{iFFT}\left(\varphi\left(G_h\right) + \varphi\left(G_l\right)\right) + G_s \qquad (4)$$

where $\varphi$ denotes the non-local operation and $\mathrm{iFFT}$ denotes the inverse Fourier transform (Fig. 3).
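To make Eqs. (2)–(3) concrete, the sketch below separates a feature map into high- and low-frequency components with a circular Euclidean-distance mask of radius r, assuming a PyTorch implementation. It is a minimal illustration rather than the authors' code; the non-local attention φ and the fusion with the spatial feature of Eq. (4) are left out, and the function and variable names are our own.

```python
import torch

def split_frequency_components(x: torch.Tensor, radius: float):
    """Split a feature map x of shape (B, C, H, W) into high- and
    low-frequency components with a circular mask of the given radius,
    in the spirit of Eqs. (2)-(3)."""
    B, C, H, W = x.shape
    # 2D FFT per channel, shifted so the zero frequency sits at the center.
    freq = torch.fft.fftshift(torch.fft.fft2(x, dim=(-2, -1)), dim=(-2, -1))

    # Euclidean-distance mask M_hat: 1 inside the radius (low frequencies).
    ys = torch.arange(H, device=x.device).view(H, 1) - H / 2
    xs = torch.arange(W, device=x.device).view(1, W) - W / 2
    dist = torch.sqrt(ys ** 2 + xs ** 2)
    low_mask = (dist <= radius).float()

    # Apply the mask and its complement, then invert the transform.
    low = torch.fft.ifft2(
        torch.fft.ifftshift(freq * low_mask, dim=(-2, -1)), dim=(-2, -1)).real        # G_l
    high = torch.fft.ifft2(
        torch.fft.ifftshift(freq * (1 - low_mask), dim=(-2, -1)), dim=(-2, -1)).real  # G_h
    return high, low

# Usage: G_h and G_l would then be passed through non-local attention and
# fused with the spatial feature as in Eq. (4).
feat = torch.randn(2, 64, 32, 16)
G_h, G_l = split_frequency_components(feat, radius=6.0)
```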

2.3 Fourier Noise Masquerade Filtering

Previous studies [16,24] have indicated that frequency characteristics play a significant role in balancing robustness and accuracy. Additionally, ReID models exhibit varying preferences for specific frequency components during the acquisition of intermediate features. In this paper, a Fourier Noise Masquerade Filtering (FNMF) module is developed to improve the transferability of frequency components for DG ReID through a straightforward filtering operation, while simultaneously suppressing the components that hinder cross-domain generalization. In contrast to previous frequency-based approaches implemented in the pixel space, the proposed filtering operation is applied in the latent feature space. Given the latent frequency domain features $X_{freq} \in \mathbb{R}^{H \times (\frac{W}{2}+1) \times 2C}$, the proposed frequency domain noise filtering mechanism is expressed as

$$X'_{freq} = X_{freq} \otimes M_s\left(X_{freq}\right) \qquad (5)$$

where $\otimes$ denotes element-wise multiplication and $M_s(\cdot)$ refers to the attention module that filters noise and learns a spatial mask with resolution $H \times (\frac{W}{2}+1)$. The mask filters out frequency domain components along the channel dimension that do not contribute to generalization, and the resulting filtered frequency domain features are denoted as $X'_{freq}$. For $X_{freq} \in \mathbb{R}^{2C \times H \times (\frac{W}{2}+1)}$, which represents the amplitude and phase parts after the FFT, a $1 \times 1$ convolutional layer followed by Batch Normalization (BN) and ReLU activation first projects $X_{freq}$ into an embedding space for subsequent filtering. After embedding, the information of $X_{freq}$ is aggregated over channels using both average-pooling and max-pooling along the channel axis, generating two frequency descriptors denoted by $X^{avg}_{freq}$ and $X^{max}_{freq}$, respectively. These two descriptors can be viewed as compact representations of $X_{freq}$ in which the information of each frequency component is compressed separately by the pooling operations while the spatial discriminability is preserved. $X^{avg}_{freq}$ is then concatenated with $X^{max}_{freq}$, and a large-kernel $7 \times 7$ convolution layer followed by a sigmoid function learns the spatial mask. Mathematically, this instantiation can be formulated as

$$X'_{freq} = X_{freq} \otimes \sigma\left(\mathrm{Conv}_{7\times7}\left(\mathrm{AvgPool}\left(X_{freq}\right)\right)\right) + X_{freq} \otimes \sigma\left(\mathrm{Conv}_{7\times7}\left(\mathrm{MaxPool}\left(X_{freq}\right)\right)\right) \qquad (6)$$

where $\sigma$ denotes the sigmoid function, $\mathrm{AvgPool}(\cdot)$ and $\mathrm{MaxPool}(\cdot)$ denote the average and max pooling operations, respectively, and $\mathrm{Conv}_{7\times7}(\cdot)$ is a convolution layer with kernel size 7. Although a large kernel is used, the pooled features $\mathrm{AvgPool}(X_{freq})$ and $\mathrm{MaxPool}(X_{freq})$ have only two channels in total after the information squeeze by the pooling operations, so this step remains computationally efficient in practice.
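For concreteness, the following is a minimal PyTorch sketch of the filtering step in Eqs. (5)–(6), assuming the amplitude and phase maps are already stacked along the channel dimension. The module and variable names are ours, and the use of two separate 7×7 branches follows the form of Eq. (6) rather than the concatenation variant mentioned in the text.

```python
import torch
import torch.nn as nn

class FrequencyNoiseFilter(nn.Module):
    """Sketch of Eqs. (5)-(6): channel-wise average/max pooling over the
    frequency features, 7x7 convolutions and a sigmoid produce spatial
    masks that reweight X_freq."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 embedding of the stacked amplitude/phase features (2C channels).
        self.embed = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.BatchNorm2d(2 * channels),
            nn.ReLU(inplace=True),
        )
        # 7x7 conv + sigmoid branches for the two pooled descriptors.
        self.conv_avg = nn.Conv2d(1, 1, kernel_size=7, padding=3)
        self.conv_max = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, x_freq: torch.Tensor) -> torch.Tensor:
        # x_freq: (B, 2C, H, W//2 + 1) amplitude/phase stack after the rFFT.
        emb = self.embed(x_freq)
        avg_desc = emb.mean(dim=1, keepdim=True)          # X_freq^avg
        max_desc = emb.max(dim=1, keepdim=True).values    # X_freq^max
        mask_avg = torch.sigmoid(self.conv_avg(avg_desc))
        mask_max = torch.sigmoid(self.conv_max(max_desc))
        return x_freq * mask_avg + x_freq * mask_max      # X'_freq, Eq. (6)

# Usage with an assumed backbone width of C = 64 channels:
f = FrequencyNoiseFilter(channels=64)
x = torch.randn(2, 128, 16, 9)   # 2C = 128, W//2 + 1 = 9
out = f(x)
```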

2.4 Loss Function

To generalize to the unseen target domain, this paper uses a memory-based contrastive recognition loss, which is non-parametric and suitable for DG ReID. In addition, similar to [5], the triplet loss [7] $L_{tri}$ and the center loss [21] $L_{cent}$ are also used for parameter updates when training the generalization model. As shown in Fig. 2, a separate memory dictionary is maintained for each source domain, and the identity loss is calculated using the learned features and the memory centroids based on the frequency domain, so that the model gradient is updated in a way that better matches the underlying distribution.


Table 1. Summary of all the datasets.

| Datasets | #IDs | #Images | #Cameras |
|---|---|---|---|
| Market1501 (MA) [27] | 1,501 | 32,217 | 6 |
| CUHK02 (C2) [11] | 1,816 | 7,264 | 10 |
| CUHK03 (C3) [12] | 1,467 | 14,096 | 2 |
| CUHK-SYSU (CS) [22] | 11,934 | 34,574 | 1 |
| MSMT17 (MS) [20] | 4,101 | 126,441 | 15 |
| VIPeR [6] | 634 | 1,264 | 2 |
| iLIDs [28] | 300 | 4,515 | 2 |
| GRID [14] | 1,025 | 1,275 | 8 |
| PRID [8] | 749 | 949 | 2 |

Table 2. Evaluation protocols.

| Protocol | Training Sets | Testing Sets |
|---|---|---|
| Protocol-1 | Full-(MA+C2+C3+CS) | PRID, GRID, VIPeR, iLIDs |
| Protocol-2 | MA+CS+MS | C3 |
| Protocol-2 | MA+CS+C3 | MS |
| Protocol-2 | MS+C3+CS | MA |
| Protocol-3 | Full-(MA+CS+MS) | C3 |
| Protocol-3 | Full-(MA+C3+CS) | MS |
| Protocol-3 | Full-(CS+C3+MS) | MA |

Specifically, for the source domain $D_S^i$ with $n_i$ person identities, there are $n_i$ slots in the memory dictionary $M_i^c$, and each slot is initialized to the feature centroid of the corresponding person identity. Then, given the feature $F_{req}(x_i)$ extracted by a BFAN forward pass, the similarity between $F_{req}(x_i)$ and each centroid in the memory is calculated. The contrastive recognition loss aims to classify the feature as the corresponding pedestrian identity:

$$L_M = -\log \frac{\exp\left(M_i^{T} \cdot F_{req}(x_i) / \tau\right)}{\sum_{c=1}^{n_i} \exp\left(M_c^{T} \cdot F_{req}(x_i) / \tau\right)} \qquad (7)$$

where $\tau$ is the temperature factor that controls the scale of the distribution. In each training iteration, the centroid of the corresponding person in the memory is updated using the features extracted by BFAN from the current mini-batch:

$$M_c \leftarrow m \cdot M_c + \frac{1 - m}{|B_c|} \sum_{x_i \in B_c} F_{req}(x_i) \qquad (8)$$

where $m \in [0, 1)$ is the momentum coefficient, set to 0.2, $B_c$ denotes the person samples belonging to the $c$-th identity, and $|B_c|$ denotes the number of person samples of the $c$-th identity in the current mini-batch. The overall training objective is

$$L_{total} = L_M + L_{tri} + L_{cent} \qquad (9)$$
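As an illustration, the memory-based loss of Eq. (7) and the momentum update of Eq. (8) can be sketched in PyTorch as follows. This is a minimal sketch rather than the authors' code; the feature dimension, the number of identities, and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def memory_contrastive_loss(features, labels, memory, tau=0.05):
    """Eq. (7): classify each L2-normalized feature against the per-identity
    centroids stored in `memory` (shape: num_ids x dim)."""
    logits = features @ memory.t() / tau          # (B, num_ids)
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def update_memory(features, labels, memory, momentum=0.2):
    """Eq. (8): momentum update of the centroid of every identity present
    in the current mini-batch."""
    for identity in labels.unique():
        batch_mean = features[labels == identity].mean(dim=0)
        memory[identity] = momentum * memory[identity] + (1 - momentum) * batch_mean

# Usage with hypothetical sizes: 702 identities, 2048-d features.
memory = F.normalize(torch.randn(702, 2048), dim=1)
feats = F.normalize(torch.randn(64, 2048), dim=1)
labels = torch.randint(0, 702, (64,))
loss = memory_contrastive_loss(feats, labels, memory)
update_memory(feats, labels, memory)
```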

3 Experiments

3.1 Datasets and Evaluation Protocols

Datasets. As shown in Table 1, to evaluate the effectiveness of the proposed method, this paper conducts experiments on 9 standard person ReID datasets: Market1501 [27], MSMT17 [20], CUHK02 [11], CUHK03 [12], CUHK-SYSU [22], PRID [8], GRID [14], VIPeR [6], and iLIDs [28]. The evaluation metrics employed in our work are Cumulative Matching Characteristics (CMC) and mean Average Precision (mAP). For simplicity, Market1501 is referred to as MA, MSMT17 as MS, CUHK02 as C2, CUHK03 as C3, and CUHK-SYSU as CS.

Evaluation Protocols. Due to the removal of DukeMTMC-reID, new protocols (Table 2) have been introduced for DG ReID. In Protocol-1, all source domain images are used for training and evaluation follows [23]. Protocol-2 selects one domain for testing and uses the others for training. Protocol-3 uses all source domain images for training. Since the CS person search dataset comprises only one camera, it is excluded from the testing process.

3.2 Implementation Details

The person images are resized to 256 × 128. The backbone of BFAN is ResNet-50, pretrained on ImageNet. The batch size is set to 64, comprising 16 identities with 4 images per identity. Similar to [26], the data augmentation includes color jitter and random erasing. The model is trained for 60 epochs with a warm-up strategy applied in the first 10 epochs. The learning rate is divided by 10 at the 30th and 50th epochs. The optimizer is Adam with a momentum of 0.9 and a weight decay of 0.0005. The initial learning rate is 3.5 × 10−4, and it is decayed using a cosine annealing schedule. All experiments are conducted on two Intel(R) Xeon(R) Silver 4214R CPUs @2.40 GHz and two NVIDIA A4000 GPUs.

Baseline. To establish the baseline, the original non-local attention method [19] is utilized. Note that the baseline neither exploits complementary feature information from frequency domain components nor performs frequency domain filtering.

3.3 Comparison with State-of-the-Art Methods

Generalizable Performance on Small-Scale Datasets. As presented in Table 3, the proposed method is extensively compared with state-of-the-art methods under Protocol-1. Results of other methods that use DukeMTMC-reID in their source domains are reported for reference, but this dataset is excluded from our training sets in the experiments. Despite using fewer source domains, our method achieves the highest performance.

Generalizable Performance on Large-Scale Datasets. As shown in Table 4, the proposed method is compared with other state-of-the-art methods under Protocol-2 and Protocol-3. 'Source' indicates that only the training sets of the source domains are used for training, and 'Full-' denotes that all images in the source domains (i.e., combining training and testing sets) are leveraged at training time. When generalizing to the unseen target domains, namely Market-1501, CUHK03, and MSMT17, the proposed BFAN outperforms the second-best approach by 5.3% on Market1501 and 6.7% on CUHK03 in terms of Rank-1 accuracy, and by 9.7%, 4.5%, and 0.5% in terms of mAP, respectively. Moreover, when incorporating the testing sets of all visible source domains during model training, our BFAN surpasses the second-best approach


Table 3. Performance (%) comparison with the state-of-the-art methods on the smallscale person ReID datasets under Protocol-1. Method

Source

→VIPeR(V) →PRID(P) →GRID(G) →iLIDS(I)

Average

mAP

R1

mAP

R1

mAP

R1

mAP

R1

mAP

R1

60.8

56.5

67.5

62.9

50.9

46.2

81.2

78.0

65.1

60.9

58.0

49.2

60.4

47.3

49.0

39.4

84.0

77.3

62.9

53.3

59.2

49.0

65.7

61.3

47.8

43.3

79.4

75.3

63.0

57.2

60.4

53.9

68.4

60.6

56.6

50.9

83.9

79.3

67.3

61.1

RaMoE [5]

64.6

56.6

67.3

57.7

54.2

46.8

90.2

85.0

69.1

61.5

MetaBIN [4]

66.0

56.2

79.8

72.5

58.1

49.7

85.5

79.7

72.4

64.5

QAConv50 [13]

66.3

57.0

62.2

52.3

57.4

48.6

81.9

75.0

67.0

58.2

68.2

60.8

65.3

55.0

50.5

40.0

74.3

65.0

64.6

55.2

DDAN [3] SNR [10] CBN [29] Person30K [1]

M3 L [26]

Full-(MA+C2 +C3+CS+D)

Full-(MA+C2 +C3+CS)

64.3

55.9

70.8

61.2

57.9

50.2

82.7

74.7

68.9

60.5

META [23]

68.4

61.5

71.1

61.9

60.1

52.4

83.5

79.2

70.9

63.8

Ours

68.7

60.9

71.3 63.2 59.8 56.1

MetaBIN [4]

84.6 81.0 71.1 65.3

Table 4. Performance (%) comparison with the state-of-the-art methods on the largescale person ReID datasets under Protocol-2 and Protocol-3. The model denoted as M3 L* indicates the usage of IBN-Net50 as the backbone architecture. In the absence of this superscript, the ResNet-50 is employed as the backbone. Method

Source

Market-1501 mAP

R1

SNR [10]

34.6

MetaBIN [4] M3 L [26]

Source

CUHK03 mAP

R1

62.7

8.9

57.9

80.0

58.4

79.9

61.5

82.3

QAConv50 [13]

63.1

Ours SNR [10] M3 L [26]

M3 L* [26]

M3 L* [26]

MS+CS+C3

Full-

QAConv50 [13] (MS+CS+C3)

Source

MSMT17 mAP

R1

8.9

6.8

19.9

28.8

28.1

17.8

40.2

20.9

31.9

15.9

36.9

34.2

34.4

83.7

25.4

24.8

67.6

85.3

33.3 34.8

52.4

77.8

17.5

17.1

7.7

22.0

61.2

81.2

32.3

33.8

16.2

36.9

62.4

82.7

Full-

35.7

36.5

66.5

85.0

(MS+CS+MA)

32.9

33.3 (CS+MA+C3) 43.1

MS+CS+MA

MetaBIN [4]

67.2

84.5

43.0

Ours

72.5

87.9

45.5 44.2

CS+MA+C3

16.7

37.5

16.4

45.3

18.3 39.8

Full-

17.4

38.6

17.6

46.6

18.8

41.2

20.6 43.3

by 3.4%, 1.1%, and 2.1% in terms of Rank-1 accuracy, and by 5.3%, 2.5%, and 1.8% in terms of mAP. These results demonstrate that learning complementary information and performing filtering operations in the frequency domain significantly improve the generalization capability of the learned features.

3.4 Ablation Studies

The Effectiveness of BFCA and FNMF. A comprehensive analysis was conducted to evaluate the effectiveness of the proposed BFAN. Each module was individually compared to the baseline model (ResNet + non-local attention), and the performance of each module is shown in Fig. 4. Ablation experiments under Protocol-2 specifically assessed the impact of BFCA and FNMF: "w/o FNMF" excludes the filtering operation and uses only the BFCA module, while "w/o BFCA" employs only the FNMF module. The results in Fig. 4 demonstrate that both the BFCA and FNMF modules outperform the baseline, indicating their contribution to generalization performance.

The Effectiveness of High/Low Frequency Components. Additionally, ablation experiments on the different frequency branches of BFCA were conducted to examine their impact on the DG ReID model's generalization performance, as shown in Fig. 5. "HFC" represents learning complementary information solely from the high-frequency component, while "LFC" represents learning solely from the low-frequency component. The results in Fig. 5 indicate that both high-frequency and low-frequency components contribute to improving the DG ReID model's performance, with the low-frequency components bringing the larger gain. These results suggest that learning texture information in the low-frequency range is easier than learning semantic information in the high-frequency range.

Fig. 4. Following Protocol-2, ablation studies were conducted on the individual components of BFAN to evaluate their impact and significance.

Fig. 5. Ablation studies on different frequency components of BFCA under the Protocol-2.

4 Conclusion

In this paper, a DG person ReID method based on complementary learning of spectral features is proposed. The method injects spectral feature information into the network for domain-invariant feature learning. The proposed framework captures complementary information between the high-frequency and low-frequency components of images in the embedding space, enhancing the transferable frequency domain components while suppressing the frequency domain noise that hinders generalization, which results in more robust and consistent features. Extensive experiments verify the effectiveness of the various components of the proposed framework.

References 1. Bai, Y., et al.: Person30k: a dual-meta generalization network for person reidentification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2123–2132 (2021) 2. Chen, G., Peng, P., Ma, L., Li, J., Du, L., Tian, Y.: Amplitude-phase recombination: rethinking robustness of convolutional neural networks in frequency domain. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 458–467 (2021) 3. Chen, P., et al.: Dual distribution alignment network for generalizable person reidentification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 1054–1062 (2021) 4. Choi, S., Kim, T., Jeong, M., Park, H., Kim, C.: Meta batch-instance normalization for generalizable person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3425–3435 (2021) 5. Dai, Y., Li, X., Liu, J., Tong, Z., Duan, L.Y.: Generalizable person re-identification with relevance-aware mixture of experts. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16145–16154 (2021) 6. Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 262–275. Springer, Heidelberg (2008). https://doi.org/10. 1007/978-3-540-88682-2 21 7. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person reidentification. arXiv preprint arXiv:1703.07737 (2017) 8. Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Heyden, A., Kahl, F. (eds.) SCIA 2011. LNCS, vol. 6688, pp. 91–102. Springer, Heidelberg (2011). https://doi.org/ 10.1007/978-3-642-21227-7 9 9. Jiang, G., Wang, H., Peng, J., Chen, D., Fu, X.: Graph-based multi-view binary learning for image clustering. Neurocomputing 427, 225–237 (2021) 10. Jin, X., Lan, C., Zeng, W., Chen, Z., Zhang, L.: Style normalization and restitution for generalizable person re-identification. In: proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3143–3152 (2020) 11. Li, W., Wang, X.: Locally aligned feature transforms across views. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3594– 3601 (2013)


12. Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 152–159 (2014) 13. Liao, S., Shao, L.: Interpretable and generalizable person re-identification with query-adaptive convolution and temporal lifting. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part XI. LNCS, vol. 12356, pp. 456–474. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8 27 14. Loy, C.C., Xiang, T., Gong, S.: Time-delayed correlation analysis for multi-camera activity understanding. Int. J. Comput. Vision 90, 106–129 (2010) 15. Peng, J., Jiang, G., Wang, H.: Adaptive memorization with group labels for unsupervised person re-identification. IEEE Trans. Circ. Syst. Video Technol. 33, 5802– 5813 (2023) 16. Wang, H., Wu, X., Huang, Z., Xing, E.P.: High-frequency component helps explain the generalization of convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8684– 8694 (2020) 17. Wang, H., Feng, L., Yu, L., Zhang, J.: Multi-view sparsity preserving projection for dimension reduction. Neurocomputing 216, 286–295 (2016) 18. Wang, H., Yao, M., Jiang, G., Mi, Z., Fu, X.: Graph-collaborated auto-encoder hashing for multiview binary clustering. IEEE Trans. Neural Netw. Learn. Syst. (2023) 19. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018) 20. Wei, L., Zhang, S., Gao, W., Tian, Q.: Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 79–88 (2018) 21. Wen, Y., Zhang, K., Li, Z., Qiao, Yu.: A discriminative feature learning approach for deep face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part VII. LNCS, vol. 9911, pp. 499–515. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7 31 22. Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: End-to-end deep learning for person search. arXiv preprint arXiv:1604.01850 2(2), 4 (2016) 23. Xu, B., Liang, J., He, L., Sun, Z.: Mimic Embedding via adaptive aggregation: learning generalizable person re-identification. In: Avidan, S., Brostow, G., Ciss´e, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. ECCV 2022, Part XIV, LNCS, vol. 13674, pp. 372–388. Springer, Cham (2022). https:// doi.org/10.1007/978-3-031-19781-9 22 24. Xu, Z.-Q.J., Zhang, Y., Xiao, Y.: Training behavior of deep neural network in frequency domain. In: Gedeon, T., Wong, K.W., Lee, M. (eds.) ICONIP 2019, Part I. LNCS, vol. 11953, pp. 264–274. Springer, Cham (2019). https://doi.org/ 10.1007/978-3-030-36708-4 22 25. Zhai, Y., et al.: Ad-cluster: augmented discriminative clustering for domain adaptive person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9021–9030 (2020) 26. Zhao, Y., et al.: Learning to generalize unseen domains via memory-based multisource meta-learning for person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6277–6286 (2021) 27. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person reidentification: a benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1116–1124 (2015)


28. Zheng, W.S., Gong, S., Xiang, T.: Associating groups of people. In: British Machine Vision Conference (2009) 29. Zhuang, Z., et al.: Rethinking the distribution gap of person re-identification with camera-based batch normalization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020, Part XII. LNCS, vol. 12357, pp. 140–157. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2 9

Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking with Multimodal ReID
Chen Mao1,2, Chong Tan1, Hong Liu1, Jingqi Hu1,2, and Min Zheng1(B)

1 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
{chen.mao,chong.tan,hong.liu,jingqihu,min.zheng}@mail.sim.ac.cn
2 University of Chinese Academy of Sciences, Beijing 101408, China

Abstract. 3D Multi-Object Tracking (MOT) is a key component in numerous applications, such as autonomous driving and intelligent robotics, playing a crucial role in the perception and decision-making processes of intelligent systems. In this paper, we propose a 3D MOT system based on a cost-effective stereo camera pair, which includes a 3D multimodal re-identification (ReID) model capable of multi-task learning. The ReID model obtains the multimodal features of objects, including RGB and point cloud information. We design data association and trajectory management algorithms. The data association computes an affinity matrix for the object feature embeddings and motion information, while the trajectory management controls the lifecycle of the trajectories. In addition, we create a ReID dataset based on the KITTI Tracking dataset, used for training and validating ReID models. Results demonstrate that our method can achieve accurate object tracking solely with a stereo camera pair, maintaining high reliability even in cases of occlusion and missed detections. Experimental evaluation shows that our approach outperforms competitive results on the KITTI MOT leaderboard. Our code, dataset, and model are available at https://github.com/maomao279/Stereo3DMOT.

Keywords: 3D MOT · ReID · Stereo Vision · Point Cloud

1 Introduction

3D MOT is a crucial task in autonomous driving and robotics, playing an irreplaceable role in the continuous perception of the surrounding environment. It can accurately obtain information such as the volume, angle, and 3D location of the tracked objects. The goal is to identify the same object and continually record its status in consecutive video frames. The tracked object should always keep the same ID during movement, and tracking methods need to remain robust even in cases of occlusion and missed detections. Currently, most existing 3D MOT methods are based on the Tracking-by-Detection (TBD) paradigm, in which a 3D detector [1–3] first identifies the objects in the image, followed by tracking of these objects. Usually, the detection and tracking methods are used together to accomplish the 3D MOT task. Some lightweight 3D MOT methods [4–7], which achieve tracking solely through the movement information of targets, demonstrate superior real-time performance; they save a significant amount of computational resources as they don't require inference through neural networks. However, these methods struggle to effectively address the challenges of object re-identification triggered by occlusion. To further enhance the accuracy of 3D MOT, [8–11] employ a combination of 2D and 3D detections, acquiring multimodal information from RGB images and LiDAR. However, the need to integrate both 2D and 3D detectors imposes a significant computational burden. Other methods [12,13] only utilize a 3D detector and a single camera, improving object re-identification rates by introducing a ReID model to leverage the neural network's capability of extracting appearance features. However, the information provided by a single RGB image is limited, and in scenarios where lighting conditions change rapidly, the features extracted by these methods are not robust. Considering both the cost of equipment and computational complexity, we propose a 3D MOT method based on the TBD paradigm: Stereo3DMOT. Our method requires only a stereo camera pair as the sensing device and can integrate any 3D object detector. For feature extraction of objects during tracking, we design a 3D ReID neural network model: Stereo3D ReID. It integrates RGB appearance features and 3D point cloud features, compensating for the significant influence of lighting and color change on object appearance in RGB images. This multimodal ReID model is realized without the use of LiDAR, effectively ensuring the time-invariance of object features. We design multi-stage data association and trajectory management methods that fully utilize ReID feature embeddings and trajectory motion information. The results show that our method remains robust in scenarios of occlusion and missed detections. Our contributions can be summarized as follows:
• We propose a 3D multimodal ReID neural network model based on a stereo camera pair, which can simultaneously carry out multi-task learning for ReID and disparity matching. It integrates both RGB appearance and 3D point cloud features using only cameras.
• We design data association and trajectory management methods that better reconstruct the motion process of trajectories and remain robust in situations of occlusion and missed detection.
• We create a 3D ReID dataset based on the KITTI Tracking dataset. We submit our results to the KITTI MOT leaderboard and achieve competitive results (see Fig. 1).


Fig. 1. The proposed method is compared with multiple tracking methods on the KITTI MOT leaderboard. Higher and further to the right is better. Our method achieves superior accuracy and robustness.

2 Related Work

2.1 3D MOT

3D MOT tasks aim to track the trajectory of each detected object in real-world 3D space. AB3DMOT [4] proposes a simple real-time 3D MOT system with noteworthy performance. SimpleTrack [5] summarizes current 3D MOT methods into a unified framework by decomposing them into four constituent parts: pre-processing of detections, association, motion model, and life cycle management. These methods provide a baseline, but their tracking accuracy is relatively low. GNN3DMOT [14] proposes a novel feature interaction mechanism by introducing a graph neural network (GNN) instead of obtaining features for each object independently. EagerMOT [15] proposes a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics. DeepFusionMOT [16] proposes a camera-LiDAR fusion-based 3D real-time tracking framework together with a novel deep association mechanism that makes full use of the characteristics of cameras and LiDARs. These methods that combine 2D and 3D detections improve accuracy, but they also introduce a substantial computational load. FANTrack [12] proposes a data-driven approach to online MOT that uses a convolutional neural network (CNN) for data association in a tracking-by-detection framework. TripletTrack [13] condenses triplet embeddings and motion descriptors through a Long Short-Term Memory (LSTM). These methods solely utilize a 3D detector, focusing on extracting features from RGB images. Building on this, we propose a 3D MOT framework and design a more powerful ReID model which combines 2D appearance and 3D point cloud features through stereo vision.

2.2 ReID Model

ReID tasks aim at recognizing the same object from different frames. ReID models can be integrated into MOT methods to improve tracking accuracy. [17] tackles the problem of vehicle re-identification in a network utilizing triplet embeddings. [18] proposes an end-to-end dual-stream hypersphere manifold embedding network with both classification and identification constraints. [19] proposes a novel generative adversarial network to address cross-resolution person ReID, allowing query images with varying resolutions. [20] proposes an unsupervised ReID deep learning approach and an unsupervised tracklet association learning framework. BOT [21] collects and evaluates these effective training tricks in person ReID. AGW [22] designs a powerful baseline and introduces a new evaluation metric (mINP) for person ReID. To meet the increasing application demand for general instance re-identification, FastReID [23] presents a widely used software system in JD AI Research. When people appear in extreme illumination or change clothes, the RGB appearance-based ReID methods tend to fail. To overcome this problem, [24,25] exploit depth information to provide a more invariant body shape regardless of illumination and color change. Inspired by this, our ReID model merges the RGB and point cloud features of objects, which helps to maintain feature stability.

3 Method

Stereo3DMOT is a complete tracking system. In Sect. 3.1, we introduce the proposed Stereo3D ReID network structure and its functions; its output features provide the data source for object modeling. In Sect. 3.2, we detail the entire tracking framework and describe our improvements module by module.

3D ReID Network Model

To conduct the training and testing of the ReID model on tasks related to autonomous driving, we create a ReID dataset based on the KITTI Tracking dataset [26], which includes object images from stereo RGB cameras and dense disparity maps. This ReID dataset only contains the object image region while ignoring the background, which is more conducive to directly training the ReID task. The ReID network model is a crucial part of the tracking method introduced in this paper. The output of the ReID network integrates the RGB appearance features of the object in stereo images and the 3D features of the point cloud. The network architecture is divided into three sections: Backbone Network, Multitask Head, and Losses. The network structure is illustrated in Fig. 2.

Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking

499

Fig. 2. Stereo3D ReID Architecture. First, the input 3D boxes are perspectiveprojected onto the 2D images, and the backbone network extracts RGB image features. Then, the disparity estimation module computes the disparity map, converts it to a point cloud, and the point cloud module extracts three-dimensional features. Ultimately, the RGB features and point cloud features are integrated and outputted.

Backbone Network. Upon receiving the 3D bounding boxes outputted by the 3D object detector, we project the 3D bounding boxes onto the left and right 2D images and feed them into the backbone network. We adopt a lightweight siamese MobileNetv3 [27] structure, where the left and right networks share weights. Embracing the concept of multi-task learning, the backbone network outputs features that are used in both object re-identification and stereo disparity matching tasks, thereby enhancing the overall inference efficiency of the network. Multi-task Head. The features outputted from the backbone network are 2D adaptive average pooled into a tensor of dimension (B, N1 , 1). At the same time, these features are fed into the disparity matching module to extract disparity information. The disparity module draws inspiration from Fast-ACVNet++ [28], and we restructure the feature extraction network to implement multi-task learning. As the foreground object region accounts for a small proportion, the disparity module only needs to calculate the object ROI regions in the left and right images, using dense disparity maps as supervision, which doesn’t introduce excessive computation. Since the object region is cropped, the actual disparity of the object should be the sum of the output disparity map and the horizontal distance between the center points of the object in the left and right images. The computation formula is as follows: df = dout + (ul − ur )

(1)

where df is the actual disparity of the object region, dout is the output of the disparity module, and ul , ur are the horizontal coordinates of the center points of the object in the left and right images, respectively. We transform the object’s

500

C. Mao et al.

disparity map into a 3D point cloud by formula 2. ⎤ ⎡ ⎤ ⎡ ⎤⎡ Xc u fx 0 cx Zc ⎣v ⎦ ⎣ 0 fy cy ⎦ = ⎣ Yc ⎦ 0 0 1 Zc 1

(2)

where fx and fy represent the camera’s focal length in the x and y directions respectively, (cx , cy ) represents the coordinate of the principal point, and (Xc , Yc , Zc ) represents the 3D point under the camera coordinate system. We incorporate the feature extraction portion of pointnet [29], randomly selecting N2 points’ spatial location points (x, y, z) from the point cloud to form a tensor of dimensions [N2 , 3], which is then inputted into the point cloud feature extraction module. The global features of the point cloud are aggregated into a tensor of dimensions [B, N2 , 1]. Finally, the features are concatenated into a tensor of dimensions [B, 2N1 + N2 , 1], and passed through a 1 × 1 convolution layer and fully connected layer for feature mapping and fusion. During the inference process, this feature vector is directly outputted; during the training process, a classification layer needs to be connected to the last layer, supervised by the object IDs from the training set. Losses. The Stereo3D ReID model uses three loss functions: cross entropy loss, triplet loss, and disparity matching loss. Cross entropy loss is a classification loss function that predicts the categories of images in the ReID task, helping the network distinguish between targets and non-targets better. For each object i, where the true ID label is y and the predicted category probability is pi , the cross entropy loss is calculated as follows:  N  qi = ε/N y = i −qi log(pi ) LossC = (3) N −1 y=i qi = 1 − N i=1 where N is the number of categories, and ε is a very small constant used to encourage the model not to over-rely on the training set. The formula uses label smoothing [30] optimization for the value of qi to prevent the model from overfitting during training. The triplet loss function is a type of distance loss, which is used in ReID tasks to model the similarity between images, and measures their similarity by comparing the distances between different image feature vectors, as follows: LossT = max((d(a, p) − d(a, n) + margin), 0)

(4)

where d (a, p) represents the distance between the feature vectors of the object a and the positive instance p, and d (a, n) represents the distance between a and the negative instance n, margin is set to enlarge the distance between the positive and negative instance, ultimately clustering images of the same object in the distance space.

Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking

501

After the disparity dout is output by the disparity matching module, it needs to be input into formula 2 to obtain the final estimated disparity. The difference is then calculated with the real disparity [28], as the following formula: LossD = SmoothL1 (dout + (ul − ur ) − dgt )

(5)

where SmoothL1 is the smooth L1 loss, dgt is the ground-truth disparity map. 3.2

Stereo3DMOT Tracking Method

The Stereo3DMOT tracking method is divided into three parts: object modeling, data association, and trajectory management. The architecture diagram of the method is shown in Fig. 3.

Fig. 3. Overall structure of the proposed Stereo3DMOT framework.

Object Modeling. In addition to the features obtained from Stereo3D ReID, we also employ motion information to model objects. We use 3D Kalman filtering [4] as a constant velocity model to approximate the frame-to-frame displacement of the object. The object’s state in the next frame is predicted based on the coordinates of the object’s center point (x, y, z), 3D size (l, w, h), and heading angle θ. For objects captured for the first time, we model them and create new trajectories. The states of the trajectories are set as unconfirmed, and the ID numbers are sequentially assigned from 0 to positive infinity. Data Association. We propose a method for calculating the degree of correlation between 3D bounding boxes: 3D-EIOU. We find that in 3D scenes, The dimensions of the object’s 3D bounding box are almost unaffected by changes in the object’s pose. 3D-EIOU is more suitable than the conventional IOU method 6 for comparing the similarity of bounding boxes in a 3D scene. It consists of three parts: the volume intersection over the union between 3D bounding

502

C. Mao et al.

boxes(S3D-IOU ), the L2 norm of center point coordinates, and the L1 norm of the 3D bounding box’s length, width, height, and angle. The computation formula is as follows: 3D-Boxd ∩ 3D-Boxt (6) S3D-IOU = 3D-Boxd ∪ 3D-Boxt S3D-EIOU = S3D-IOU − α(Cdx,y,z − Ctx,y,z 2 + Sdl,h,w,ry − Stl,h,w,ry 1 ) (7) where Cdx,y,z and Ctx,y,z are the center point coordinate vectors of the detection bounding box and the predicted tracklet bounding box, respectively. Sdl,h,w,ry and Stl,h,w,ry are the vectors composed of the length, width, height, and angle of the detection and predicted tracklet bounding box, respectively. For each detection in every frame, a two-stage data association with the predicted tracklets is performed. The first stage of data association calculates the cosine similarity of the ReID feature vectors and the weighted value of 3DEIOU between each detection and the confirmed predicted tracklet Tconf irm . All confirmed predicted tracklets and detections in this frame constitute an affinity matrix. The first stage of data association primarily relies on the powerful feature extraction ability of Stereo3D ReID. We utilize the Hungarian algorithm to resolve the highest similarity pairs in 1 the affinity matrix. We screen out the matched detection-tracklet pairs D-Tmatch 1 through thresholds. The unmatched detections Dunmatch and predicted track1 , along with unconfirmed predicted tracklets Tunconf irm , enter the lets Tunmatch second stage of data association. The second stage of data association primarily addresses issues related to occlusion and missed detections, with the affinity matrix calculated by 3D-EIOU. The final matched detection-tracklet pairs 1,2 consist of the successfully matched pairs from the two stages of data D-Tmatch 1,2 and detecassociation, and we output the final unmatched tracklets Tunmatch 1,2 tions Dunmatch . Track Management. For unmatched detections (detections in the first frame are all unmatched), we create new trajectories and model the objects, setting their state as unconfirmed. Once a predicted tracklet successfully matches with a detection, we update the ReID features of the trajectory as a weighted value of the current trajectory features and the detection features and continue this trajectory. If it’s unconfirmed and successfully match three times, it switches to confirmed state. For unmatched tracklets, we propose a method to judge whether the lifespans of trajectories Lt reach the maximum life threshold Lmax . The formula for calculating lifespan is as follows:  Lt = Fmiss + γLdis · Fsurvival (8) where Fmiss is the number of frames since the last successful match, Ldis is the distance between the bounding box’s center point pixel and the image boundary, Fsurvival is the existing frame number, and γ is a constant. We believe that the longer the disappearance and existing time, and the closer to the image boundary when disappeared, the more likely it is to have left the camera’s range. We tend to give confirmed trajectories a larger life threshold.

Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking

503

Table 1. KITTI MOT leaderboard. Comparison with 3D MOT methods from the past three years by using the test set of the KITTI Car Tracking benchmark. The best results are shown in bold. For fair comparison, we mark the methods which use PointRCNN for detection with *. Method ∗

AB3DMOT [4] JRMOT [31] MOTSFusion [32] GNN3DMOT∗ [14] JMODT [9] PC3TMOT∗ [7] EagerMOT [15] DeepFusionMOT∗ [16] PolarMOT [33] TripletTrack [13] StrongFusionMOT∗ [10] BcMODT∗ [11] ∗

Stereo3DMOT(Ours)

4 4.1

Year and Publication

Input Detection

HOTA(%) ↑ IDSW ↓ AssA(%) ↑ AssPr(%) ↑ MOTA(%) ↑ MOTP(%) ↑

2020(IROS) 2020(IROS) 2020(RA-L) 2020(CVPR) 2021(IROS) 2021(TITS) 2021(ICRA) 2022(RA-L) 2022(ECCV) 2022(CVPR) 2022(IEEE Sensors) 2023(Remote Sensing)

3D 2D+3D 2D+3D 2D+3D 2D+3D 3D 2D+3D 2D+3D 3D 3D 2D+3D

69.99 69.61 68.74 —70.73 77.80 74.39 75.46 75.16 73.58 75.65

113 271 415 142 350 225 239 84 462 322 58

69.33 66.89 66.16 —68.76 81.59 74.16 80.06 76.95 74.66 79.84

89.02 88.95 85.49 —88.02 88.75 91.05 89.77 89.27 89.55 89.81

83.61 85.10 84.24 82.40 85.35 88.81 87.82 84.64 85.08 84.32 85.53

85.23 85.28 85.03 84.05 85.37 84.26 85.69 85.02 85.63 86.06 85.07

2D+3D

71.00

381

69.14

88.70

85.48

85.31

—-

3D

77.32

28

81.86

89.61

87.10

85.06

Experiments Experimental Results

We conduct experiments using the KITTI Tracking dataset in this paper. KITTI provides testing standards of performance and a leaderboard for tracking methods. We adhere to the evaluation algorithm used by the KITTI MOT leaderboard. The primary evaluation method, HOTA, associates the metrics for detection, association, and localization into a unified index. We also adopt metrics such as ID Switches (IDSW), Multi-Object Tracking Accuracy (MOTA), and Multi-Object Tracking Precision (MOTP). The results on the KITTI Tracking test set are shown in Table 1. Compared with other methods, our method outperforms most current methods on the HOTA metric, leading among 3D MOT methods on the IDSW and ASSA metrics. It demonstrates that our method has higher robustness and accuracy, and can maintain stability in cases of occlusion or missed detection. Figure 4 presents a visual example comparing our method with AB3DMOT. Our method demonstrates strong robustness when handling complex scenes, such as intersections with object occlusion and disappearance, without any ID switches.

504

C. Mao et al.

Fig. 4. A visual comparison example of bird’s eye view trajectories between our method and AB3DMOT. The red circle in the figure indicates that there is an ID switch in the trajectory. (Color figure online)

4.2

Ablation Study

To explore the impact of different modules on the overall tracking performance, we conduct ablation experiments using 21 training set sequences from the KITTI Tracking dataset. We evaluate based on key indicators such as HOTA, MOTA, and IDSW. We use PointRCNN as the detector, only use Kalman filtering as the motion model and IOU data association (AB3DMOT [4]) as the baseline. We gradually incorporate our approaches from this starting point. As can be seen from Table 2, the methods we proposed significantly improve the accuracy and robustness of tracking. Stereo3D ReID Model. Based on the baseline, we incorporate the Stereo3D ReID model to verify the impact of features extracted by the ReID neural network on tracking performance. We train both the car and pedestrian datasets for 300 epochs. To integrate the ReID model with the trajectory motion model, it is necessary to simultaneously incorporate the multi-stage data association algorithm, using the number of missing frames Fmiss as the criterion for evaluating the life threshold. Data Association. We divide the trajectories into confirmed and unconfirmed states, integrating them into the multi-stage data association. We replace IOU with 3D-EIOU as the association metric, where the thresholds are -0.2 and -0.1 for cars and pedestrians categories respectively. Track Management. The lifecycle of trajectories is calculated by formula 8, and the trajectories are deleted once it reaches the threshold. We set different maximum life thresholds for unconfirmed and confirmed states of the trajectories, which are 3 and 22 respectively.0

Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking

505

Table 2. Ablation study. We conduct experiments on the impact of each method we proposed on the validation set of the KITTI Tracking dataset. ReID means the Stereo3D ReID that we proposed, DAAS means data association and TRMA means trajectory management. The best results are shown in bold. Method

5

Car

Pedestrian

HOTA(%) ↑

MOTA(%) ↑

IDSW ↓

HOTA(%) ↑

MOTA(%) ↑

IDSW ↓

Motion(Baseline)

73.32

80.57

47

44.54

56.09

113

+ReID

76.05

83.25

35

46.10

59.58

98

+ReID+DAAS

76.11

83.53

32

47.71

59.52

78

+ReID+DAAS+TRMA

77.03

84.22

15

49.22

59.02

49

Conclusion

We propose Stereo3DMOT, a stereo vision based 3D MOT method. In this system, Stereo3D ReID is a multi-task learning model that extracts appearance information from stereo RGB images. It utilizes stereo disparity matching to obtain disparity maps and converts them into point clouds. Ultimately, it outputs multimodal features that combine 2D RGB appearance with 3D point clouds. We associate the ReID feature embeddings and motion information of the trajectories, and by designing trajectory management and data association algorithms, we further improve the tracking performance. Experiments show that our method maintains tracking stability even under conditions of occlusion and missed detections. Additionally, we create a dataset for ReID tasks in autonomous driving based on the KITTI dataset. Ultimately, our method achieves competitive results on the KITTI MOT leaderboard.


Emphasizing Boundary-Positioning and Leveraging Multi-scale Feature Fusion for Camouflaged Object Detection

Songlin Li1, Xiuhong Li1(B), Zhe Li2, Boyuan Li1, Chenyu Zhou1, Fan Chen1, Tianchi Qiu1, and Zeyu Li3

1 School of Information Science and Engineering, Xinjiang University, Xinjiang, China
[email protected]
2 Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
3 School of Computer Science and Engineering, Dalian Minzu University, Dalian, China

Abstract. Camouflaged object detection (COD), which has numerous practical applications, aims to identify objects that blend in with their surroundings. COD is a challenging task because of the high similarity between camouflaged objects and their backgrounds. To address this problem, we investigated how humans observe such objects. Humans typically first scan the entire image to obtain an approximate location of the target object; they then examine the differences between the boundary of the target object and its surrounding environment to refine their perception of the object, and this continuous refinement eventually reveals the camouflaged object. Based on this observation, we propose a novel COD method that emphasizes boundary positioning and leverages multi-scale feature fusion. Our model includes two important modules: the Enhanced Feature Module (EFM) and the Boundary and Positioning joint-guided Feature fusion Module (BPFM). The EFM provides multi-scale information and aggregated feature representations, yielding more robust features for the initial positioning of the camouflaged object. In the BPFM, we mimic human observation of camouflaged objects by injecting boundary and positioning information into each level of the backbone features, which work together to progressively refine the target object in blurred regions. We validated the effectiveness of our model on three benchmark datasets (COD10K, CAMO, CHAMELEON), and the results show that the proposed method significantly outperforms existing COD models.

Keywords: Camouflaged object detection · boundary · positioning · multi-scale features · contextual information

Supported by the Xinjiang Natural Science Foundation (No. 2020D01C026).

1 Introduction

Camouflage is a common protective mechanism that exists widely in the natural world. Animals or objects blend into their surroundings by exploiting textures, colors, or natural light, which makes them difficult to detect and effectively renders them invisible, allowing them to evade predators. Because COD can segment target objects from their highly similar backgrounds, it is of great value in many fields, such as medicine (e.g., polyp segmentation [9] and lung infection segmentation [10]), industry (e.g., equipment safety hazard detection), agriculture (e.g., pest detection), the military (e.g., latent target detection), art (e.g., the excavation of ancient wall paintings), and scientific research (e.g., the discovery and protection of rare species). However, COD is a very challenging task due to the nature of camouflage, namely the high intrinsic similarity between candidate objects and chaotic backgrounds, which makes it hard even for humans to spot camouflaged objects.

Researchers have developed various COD models to address this challenge and improve detection performance. In the early years, several traditional COD methods [1,19] segmented camouflaged objects using manually designed features. With the rapid development of deep learning, applying it to COD has become widespread, and existing deep learning-based COD models have achieved remarkable results. SINet [8], proposed by Fan et al., first uses a search module to roughly locate the camouflaged target and then uses a recognition module to segment it finely. The same research group later developed SINet-v2 [6], which has a better decoder and attention mechanism for detecting camouflaged targets. Although these models have improved camouflaged target detection from a local perspective, they still cannot obtain clear boundaries. In COD, the high similarity between camouflaged objects and their surroundings makes the boundary information between the target object and the background particularly important, so the extraction of boundary information remains a key factor. Zhou et al. [29] designed FAPNet, which adds boundary supervision to COD and uses boundary information to supplement details. Sun et al. [22] designed BGNet, which iteratively aggregates features with boundary information for COD. Although these models utilize boundary information, they overlook the impact of positioning information on blurry regions.

To this end, we propose a novel COD method that emphasizes boundary positioning and leverages multi-scale feature fusion. Our model imitates how humans observe imperceptible objects: it first scans the global environment to find the rough position of the camouflaged object, and then observes the differences between the boundary of the target object and its surroundings to refine the target. We propose an Enhanced Feature Module (EFM) to obtain multi-scale features, which uses a series of dilated convolutions to extract and aggregate multi-scale information; it not only enlarges the receptive field but also reduces the number of channels. The features extracted from Res2Net-50 [11] are fed into the EFM to produce stronger and more effective feature information.


Fig. 1. The overall architecture of our model, which uses Res2Net-50 as the backbone network and incorporates five key modules: the EFM, the NCD [6], the EAM [22], the BPFM, and the CAM [22]. The positioning map and the boundary map are marked with dedicated symbols in the figure.

The Boundary and Positioning joint-guided Feature fusion Module (BPFM) is designed to achieve joint-guided learning from the boundary and positioning maps, cross-scale fusion of multi-level features, and detailed boundary delineation between the target and background by combining global contextual information.

The main contributions of this paper are as follows:

– We propose a joint-guided COD method using positioning and boundary maps. The method roughly locates the camouflaged target using the positioning map and then gradually refines the boundary using the boundary map. We evaluate the proposed model on three benchmark datasets; compared with 18 typical deep learning-based models, our model achieves state-of-the-art performance.
– We propose an Enhanced Feature Module (EFM) that uses convolution kernels of different sizes and dilation rates, together with mutual fusion between upper and lower branches, to obtain multi-scale aggregated features.
– We propose a Boundary and Positioning joint-guided Feature fusion Module (BPFM) that injects semantic information from positioning maps and edge information from boundary maps into the feature representation of the backbone network, fuses multi-level features across scales, and refines camouflaged objects by combining rich contextual information.


Fig. 2. The detailed architecture of the proposed Enhanced Feature Module (EFM).

2 Our Method

Figure 1 shows the overall architecture of our proposed model, which consists of two key components: the Enhanced Feature Module (EFM) and the Boundary and Positioning joint-guided Feature fusion Module (BPFM). Specifically, Res2Net-50 is used as the backbone to extract multi-level features denoted as Xi (i = 1, ..., 5). According to [28], low-level features in shallower convolutional layers contain spatial information that can be used to construct object edges, whereas deeper convolutional layers preserve more semantic information related to object positioning. Therefore, the backbone features X3, X4, and X5 are fed into the proposed EFM to obtain multi-scale features. A lightweight and robust positioning map Msp is generated by the Neighbor Connection Decoder (NCD) [6] to locate semantic positions. Since X1 contains more noise and has a small receptive field, which can affect the determination of the target boundary, we discard the first-level features and use X2 and X5 with the Edge-Aware Module (EAM) [22] to generate a boundary map Ef carrying edge spatial information. The BPFM integrates the positioning and boundary maps with the backbone features to guide feature learning and enhance the positioning and boundary representations. Finally, the CAM progressively aggregates the multi-level fused features and performs multi-level supervised training.
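To make the data flow concrete, the following is a minimal PyTorch-style sketch of how the components in Fig. 1 could be wired together. It is an illustrative reading of the architecture rather than the released implementation: the NCD, EAM, and CAM blocks come from [6] and [22] and are simply passed in as pre-built modules, and the assumption that the NCD consumes the EFM-enhanced features follows our reading of Fig. 1.

```python
import torch.nn as nn


class CODNet(nn.Module):
    """Illustrative wiring of Fig. 1: backbone -> EFM -> NCD / EAM -> BPFM chain -> CAM."""

    def __init__(self, backbone, efms, ncd, eam, bpfms, cam):
        super().__init__()
        self.backbone = backbone            # Res2Net-50 feature extractor returning X1..X5
        self.efms = nn.ModuleList(efms)     # three EFMs, one each for X3, X4, X5
        self.ncd = ncd                      # Neighbor Connection Decoder [6] -> positioning map Msp
        self.eam = eam                      # Edge-Aware Module [22] -> boundary map Ef
        self.bpfms = nn.ModuleList(bpfms)   # four BPFMs for levels 5, 4, 3, 2 (deep to shallow)
        self.cam = cam                      # aggregates the fused features into predictions P1..P3

    def forward(self, image):
        x1, x2, x3, x4, x5 = self.backbone(image)

        # Multi-scale enhancement of the deeper features (Sect. 2.1).
        e3, e4, e5 = [efm(x) for efm, x in zip(self.efms, (x3, x4, x5))]

        msp = self.ncd(e3, e4, e5)          # coarse positioning map
        ef = self.eam(x2, x5)               # boundary map

        # Top-down chain: each BPFM fuses its backbone level with R_{i+1} (Sect. 2.2).
        r, fused = None, []
        for x, bpfm in zip((x5, x4, x3, x2), self.bpfms):
            r = bpfm(x, ef, msp, r)
            fused.append(r)

        preds = self.cam(fused)             # three supervised outputs P1..P3
        return preds, msp, ef
```

The deep-to-shallow loop reflects the coarse-to-fine idea of the paper: each shallower stage receives the previous stage's output R_{i+1} and refines it with its own boundary and positioning cues.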

2.1 Enhanced Feature Module

The Enhanced Feature Module (EFM) is designed with inspiration from atrous spatial pyramid pooling (ASPP) [18]. As shown in Fig. 2, the EFM has four branches (bk, k ∈ {1, 2, 3, 4}). In each branch, the number of channels is first reduced to 64 by a 1 × 1 convolutional layer. The first three branches then apply convolutional kernels of different sizes (s = 2k + 1, where s denotes the kernel size and k denotes the branch index), followed by a dilated convolution with a dilation rate of (2k + 1). In addition, we borrow the idea of the residual network and add the output of the previous branch to the next branch, so that the upper and lower branches are mutually fused to obtain multi-scale features. This structure not only enlarges the receptive field but also captures rich contextual information, which reduces noise and highlights the target positioning. Finally, the output of the third branch is added to the fourth branch and the ReLU function is applied to obtain the feature efk:

  b1 = BDConv3(BConv3×3(BConv1×1(Xi)))
  b2 = BDConv5(BConv5×5(BConv1×1(Xi) ⊕ b1))
  b3 = BDConv7(BConv7×7(BConv1×1(Xi) ⊕ b2))
  b4 = BConv1×1(Xi)
  efk = ReLU(b4 ⊕ b3)                                        (1)

where ⊕ denotes element-wise addition, BConvk×k denotes a convolution with kernel size k, stride 1, and padding (k−1)/2, followed by batch normalization, BDConvd denotes a convolution with kernel size 3, stride 1, padding d, and dilation rate d, followed by batch normalization, and ReLU(·) denotes the Rectified Linear Unit (ReLU) activation function.
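A minimal PyTorch sketch of the EFM following Eq. (1) is given below. The 64-channel width comes from the text, while the helper names (conv_bn, EFM) and the in_ch argument are our own illustrative choices rather than the authors' code.

```python
import torch.nn as nn


def conv_bn(in_ch, out_ch, k, dilation=1):
    """Conv + BatchNorm helper; the padding keeps the spatial size unchanged."""
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=1, padding=pad, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
    )


class EFM(nn.Module):
    """Enhanced Feature Module, Eq. (1): three dilated branches fused residually plus a 1x1 shortcut."""

    def __init__(self, in_ch, mid_ch=64):
        super().__init__()
        # Each branch first reduces the channels to 64 with a 1x1 convolution.
        self.reduce = nn.ModuleList([conv_bn(in_ch, mid_ch, 1) for _ in range(3)])
        # Branch k applies a (2k+1)x(2k+1) convolution, then a 3x3 dilated convolution
        # with dilation rate 2k+1 (k = 1, 2, 3).
        self.spatial = nn.ModuleList([conv_bn(mid_ch, mid_ch, 2 * k + 1) for k in (1, 2, 3)])
        self.dilated = nn.ModuleList([conv_bn(mid_ch, mid_ch, 3, dilation=2 * k + 1) for k in (1, 2, 3)])
        self.shortcut = conv_bn(in_ch, mid_ch, 1)   # branch b4
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        prev = None
        for reduce, spatial, dilated in zip(self.reduce, self.spatial, self.dilated):
            b = reduce(x)
            if prev is not None:        # residual fusion between adjacent branches
                b = b + prev
            prev = dilated(spatial(b))
        return self.relu(self.shortcut(x) + prev)   # ef_k = ReLU(b4 + b3)
```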

2.2 Boundary and Positioning Joint-Guided Feature Fusion Module

To harness boundary information and positional information for detecting camouflaged objects, we propose a novel module that mimics the human visual system, called the Boundary and Positioning joint-guided Feature fusion Module (BPFM). Specifically, the semantic information of the positioning map is infused into the backbone features to enhance the positioning expression, emulating how the human eye locates a target. Subsequently, edge information from the boundary map is injected into the backbone features to boost the boundary expression, imitating how humans contrast camouflaged objects against the background to supplement edge details. Research has shown [28] that different convolutional layers retain different levels of image information: low-level features are crucial for boundary detection, while high-level features are vital for positional detection. Therefore, the diversity of information scales is essential for detecting camouflaged targets, and the BPFM implements a cross-scale fusion of multi-level features to ensure a robust feature representation.

As shown in Fig. 3, the backbone feature Xi (i = 2, ..., 5) and the boundary map Ef are given as inputs. Xi and Ef are multiplied element-wise, the product is added element-wise to Xi, and a 3 × 3 convolution is applied to obtain the initial fusion feature fm:

  fm = Conv3×3(Xi ⊗ D(Ef) ⊕ Xi)                              (2)

where D denotes down-sampling, Conv3×3 is a 3 × 3 convolution, and ⊗ is element-wise multiplication. To enhance the feature representation, inspired by [23], we introduce local attention to explore the critical feature channels. Specifically, we aggregate the convolution features fm using channel-wise global average pooling (GAP).


Fig. 3. The detailed architecture of the proposed Boundary and Positioning joint-guided Feature fusion Module (BPFM).

Then, we obtain the corresponding channel attention weights through a 1D convolution followed by a Sigmoid function. After that, we multiply the channel attention with the input feature fm and reduce the channels with a 1 × 1 convolutional layer to obtain the output fei, i.e.,

  fei = Conv1×1(Sig(Fk1D(GAP(fm))) ⊗ fm)                      (3)

where Fk1D is a 1D convolution with kernel size k and Sig(·) denotes the Sigmoid function. The kernel size k is set adaptively as k = |(1 + log2(C))/2|odd, where |*|odd denotes the nearest odd number and C is the number of channels of fei; the kernel size is thus proportional to the channel dimension.

Next, the feature map fei is multiplied element-wise with the positioning map Msp, and the product is processed by a 1 × 1 convolution to fuse the positioning-related information into the feature map of the current stage; this fused output is referred to as Emi and is jointly guided by the boundary and positioning maps. To obtain multi-scale features, the feature map fei of the i-th stage is concatenated with the output of the subsequent stage's BPFM (denoted as Ri+1), which achieves feature fusion between adjacent stages. The concatenated feature map is further processed by a 3 × 3 convolution, the result is added element-wise to Emi, and the feature map Efmi is obtained by applying batch normalization to this sum. Finally, we assign weights α and β to the foreground and background regions of Msp to enhance the positioning of the camouflaged target and suppress interference from the background. The resulting feature map, denoted as Ri, is defined as:

  Emi = Conv1×1(D(Msp) ⊗ fei)
  Efmi = BN(Concat(fei, U(Ri+1)) ⊕ Emi)
  Ri = Efmi ⊗ (1 ⊕ α)Msp ⊕ βMsp                              (4)

where U denotes up-sampling, Concat(·) denotes the concatenation operation, and BN denotes batch normalization.
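The following PyTorch sketch puts Eqs. (2)-(4) together, including the ECA-style channel attention of [23]. It is a best-effort reading rather than the official implementation: the channel widths, the handling of the deepest stage (which has no Ri+1), and the modelling of α and β as learnable scalars are assumptions on our part.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class BPFM(nn.Module):
    """Boundary and Positioning joint-guided Feature fusion Module, Eqs. (2)-(4)."""

    def __init__(self, channels):
        super().__init__()
        # Adaptive 1D kernel size: k = |(1 + log2(C)) / 2| rounded to the nearest odd number.
        k = int((1 + math.log2(channels)) / 2)
        k = k if k % 2 else k + 1
        self.conv_fm = nn.Conv2d(channels, channels, 3, padding=1)     # Eq. (2)
        self.eca = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.conv_fe = nn.Conv2d(channels, channels, 1)                # Eq. (3)
        self.conv_em = nn.Conv2d(channels, channels, 1)                # Eq. (4), Em_i
        self.conv_cat = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.alpha = nn.Parameter(torch.zeros(1))                      # foreground weight (assumed learnable)
        self.beta = nn.Parameter(torch.zeros(1))                       # background weight (assumed learnable)

    def forward(self, x, ef, msp, r_next=None):
        size = x.shape[-2:]
        ef = F.interpolate(ef, size=size, mode='bilinear', align_corners=False)
        msp = F.interpolate(msp, size=size, mode='bilinear', align_corners=False)

        # Eq. (2): inject boundary information into the backbone feature.
        fm = self.conv_fm(x * ef + x)

        # Eq. (3): ECA-style channel attention followed by a 1x1 convolution.
        w = F.adaptive_avg_pool2d(fm, 1)                 # B x C x 1 x 1
        w = self.eca(w.squeeze(-1).transpose(1, 2))      # 1D convolution across channels
        w = torch.sigmoid(w.transpose(1, 2).unsqueeze(-1))
        fe = self.conv_fe(fm * w)

        # Eq. (4): inject positioning information and fuse with the deeper stage R_{i+1}.
        em = self.conv_em(msp * fe)
        if r_next is None:                               # deepest stage has no R_{i+1}
            r_next = torch.zeros_like(fe)
        else:
            r_next = F.interpolate(r_next, size=size, mode='bilinear', align_corners=False)
        efm = self.bn(self.conv_cat(torch.cat([fe, r_next], dim=1)) + em)
        return efm * (1 + self.alpha) * msp + self.beta * msp
```

Passing Ri+1 from the deeper stage into the shallower one is what gives the module its progressive, coarse-to-fine refinement behaviour.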

2.3 Overall Loss Function

The model has five outputs, namely the positioning map (Msp), the boundary map (Ef), and three pseudo-target masks generated by aggregating the multi-level fused features with the CAM (Pi, i ∈ {1, 2, 3}). Binary Cross-Entropy (BCE) loss is commonly used for pixel-wise binary classification. However, because of the significant imbalance between the numbers of foreground and background pixels in the positioning maps and the object masks of camouflaged objects, we employ a weighted binary cross-entropy loss [24]. To better constrain the global optimization, a weighted IoU loss is also used. Thus, the loss for the positioning map and the object mask of the camouflaged object is defined as:

  Lem = LwBCE + LwIOU                                         (5)

where LwBCE and LwIOU denote the weighted BCE and weighted IoU losses, respectively.

Regarding the boundary maps, we use the dice loss (Ldice) as described in [25]. Thus, the overall loss function is defined as:

  Ltotal = Σ(i=1..3) Lem(Pi, G) + Lem(Msp, G) + λLdice(Ef, Ge)   (6)

where G denotes the ground truth of the camouflaged object, Ge denotes the ground truth of the camouflaged object boundary, and λ is set to 3.
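A sketch of the loss terms is shown below. The weighted BCE and weighted IoU follow the widely used boundary-aware weighting of [24]; whether the paper uses exactly this weighting scheme is an assumption, as are the helper names.

```python
import torch
import torch.nn.functional as F


def weighted_bce_iou(pred, gt):
    """L_em = weighted BCE + weighted IoU (structure-loss style of [24]); pred is a logit map."""
    # Pixels near the object boundary receive larger weights.
    weight = 1 + 5 * torch.abs(F.avg_pool2d(gt, 31, stride=1, padding=15) - gt)
    bce = F.binary_cross_entropy_with_logits(pred, gt, reduction='none')
    wbce = (weight * bce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = (prob * gt * weight).sum(dim=(2, 3))
    union = ((prob + gt) * weight).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()


def dice_loss(pred, gt, eps=1.0):
    """Boundary supervision L_dice as in [25]."""
    prob = torch.sigmoid(pred)
    inter = (prob * gt).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (prob.sum(dim=(2, 3)) + gt.sum(dim=(2, 3)) + eps)).mean()


def total_loss(preds, msp, ef, gt, gt_edge, lam=3.0):
    """Eq. (6): three CAM outputs + positioning map + weighted boundary term."""
    loss = sum(weighted_bce_iou(p, gt) for p in preds)
    loss = loss + weighted_bce_iou(msp, gt)
    loss = loss + lam * dice_loss(ef, gt_edge)
    return loss
```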

3 Experiments and Results

3.1 Implementation Details

The model is implemented in PyTorch, with Res2Net-50 [11] pre-trained on ImageNet serving as the backbone. During training, all input images are resized to 416 × 416, and data augmentation is performed by random horizontal flipping. The model is trained with a batch size of 26 using the Adam optimizer [12]. The initial learning rate is set to 1e−4, with a decay period of 40 epochs and a decay rate of 0.1. On an NVIDIA 3090 GPU, the model converges after 50 epochs of training, which takes approximately 3 hours.
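For reference, the reported training recipe can be summarised in a short loop such as the one below. The data loader, the (image, mask, edge) batch format, and the reuse of total_loss from the loss sketch above are assumptions for illustration; only the hyper-parameters are taken from the text.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR


def train(model, train_loader, device='cuda', epochs=50):
    """Training loop with the hyper-parameters reported in Sect. 3.1.

    `train_loader` is assumed to yield (image, mask, edge) batches already
    resized to 416x416 and randomly flipped horizontally.
    """
    model = model.to(device).train()
    optimizer = Adam(model.parameters(), lr=1e-4)
    scheduler = StepLR(optimizer, step_size=40, gamma=0.1)   # decay by 0.1 every 40 epochs

    for epoch in range(epochs):
        for image, mask, edge in train_loader:
            image, mask, edge = image.to(device), mask.to(device), edge.to(device)
            preds, msp, ef = model(image)            # output signature as in the wiring sketch
            loss = total_loss(preds, msp, ef, mask, edge)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```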

3.2 Datasets and Evaluation Metrics

We evaluate our method on three public benchmark datasets: COD10K [8], CAMO [13], and CHAMELEON [20]. The training and test splits are the same as in [8]. We use four widely adopted metrics: mean absolute error (MAE, M), weighted F-measure (Fβω) [15], structure measure (Sα) [3], and mean E-measure (Eφ) [5].
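Among these metrics, the MAE has the simplest closed form: the mean per-pixel absolute difference between the normalized prediction and the ground truth. A minimal sketch is given below; the S-, E-, and weighted F-measures follow the cited papers [3,5,15].

```python
import torch


def mae(pred, gt):
    """Mean absolute error M: average per-pixel |prediction - ground truth|,
    with both maps assumed to be scaled to [0, 1]."""
    return torch.mean(torch.abs(pred.float() - gt.float())).item()
```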


Table 1. Four evaluation metrics are employed in this study, namely Sα [4], Fβω [15], Eφ [7], and M [17]. The symbols "↑" and "↓" indicate that larger and smaller values are better, respectively. The best results are highlighted in bold.

Method    | COD10K-Test                | CAMO-Test                  | CHAMELEON-Test
          | Sα↑    Eφ↑    Fβω↑   M↓    | Sα↑    Eφ↑    Fβω↑   M↓    | Sα↑    Eφ↑    Fβω↑   M↓
C2FNet21  | 0.813  0.890  0.686  0.036 | 0.796  0.864  0.719  0.080 | 0.888  0.935  0.828  0.032
PFNet     | 0.800  0.877  0.660  0.040 | 0.782  0.842  0.695  0.085 | 0.882  0.931  0.810  0.033
R-MGL     | 0.814  0.852  0.666  0.035 | 0.775  0.847  0.673  0.088 | 0.893  0.923  0.813  0.030
LSR       | 0.804  0.880  0.673  0.037 | 0.787  0.854  0.696  0.080 | 0.893  0.938  0.839  0.033
SINet-v2  | 0.816  0.888  0.679  0.037 | 0.818  0.871  0.738  0.071 | 0.889  0.938  0.815  0.032
C2FNet22  | 0.808  0.882  0.683  0.037 | 0.764  0.825  0.677  0.090 | 0.893  0.950  0.841  0.027
FAPNet    | 0.820  0.887  0.693  0.036 | 0.805  0.857  0.724  0.080 | 0.889  0.938  0.821  0.030
BSANet    | 0.817  0.887  0.696  0.035 | 0.804  0.860  0.728  0.079 | 0.888  0.936  0.826  0.032
PreyNet   | 0.813  0.881  0.697  0.034 | 0.790  0.842  0.709  0.077 | 0.895  0.952  0.844  0.028
BGNet     | 0.831  0.901  0.722  0.033 | 0.816  0.871  0.751  0.069 | 0.901  0.944  0.852  0.026
Ours      | 0.838  0.901  0.733  0.030 | 0.822  0.872  0.760  0.068 | 0.909  0.958  0.861  0.024

3.3 Comparison with State-of-the-Art Methods

We compare our proposed method with 18 state-of-the-art models, including C2FNet21 [21], PFNet [16], R-MGL [26], LSR [14], SINet-v2 [6], C2FNet22 [2], FAPNet [29], BSANet [30], PreyNet [27], and BGNet [22]. For C2FNet21, SINet-v2, C2FNet22, FAPNet, BSANet, and BGNet, we retrained the six models using the authors' released code; all other results are taken from the corresponding published papers. All predicted maps are evaluated with the same code.

1. Quantitative Comparison: Table 1 presents the quantitative results of different camouflaged object detection methods on three benchmark datasets. Our model outperforms the 18 compared models on each dataset. Moreover, although BGNet, FAPNet, and BSANet exploit auxiliary edge or boundary information, they still fail to locate some camouflaged objects, whereas our model locates them effectively and achieves the best performance. This is because our model uses boundary and positioning information jointly on ambiguous regions, which significantly improves COD performance. Specifically, compared with BGNet on the COD10K dataset, our model improves Sα by 0.84% and Fβω by 1.52% and decreases M by 9.09% (the arithmetic is spelled out in the short snippet after this comparison).

2. Visual Comparison: As shown in Fig. 4, our model outperforms the compared models on the nine collected test samples. The first and second rows show that our model handles size variations effectively, and the third and fourth rows show that it copes with changes in scene brightness. In the fifth row, the camouflaged objects share similar textures with the background, which poses a severe challenge for identifying them; our model still locates the camouflaged object accurately. In the sixth row, the object has rich edge details, and our method still detects it accurately. Overall, the results demonstrate that our model performs well in detecting camouflaged objects under different challenging factors.
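The percentages quoted in the quantitative comparison are relative improvements over BGNet's scores in Table 1, e.g.:

```python
# Relative improvements over BGNet on COD10K (values taken from Table 1).
s_alpha_gain = (0.838 - 0.831) / 0.831 * 100   # about 0.84 %
f_beta_gain  = (0.733 - 0.722) / 0.722 * 100   # about 1.52 %
mae_drop     = (0.033 - 0.030) / 0.033 * 100   # about 9.09 %
```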


Fig. 4. Visual comparison between our proposed model and recent models: (a) the input image, (b) the ground truth (GT), (c) our method, and (d)-(h) the state-of-the-art models BGNet, FAPNet, BSANet, C2FNet22, and SINet-V2. The results show that our model achieves superior visual quality compared with the other methods.

3.4 Ablation Study

We conducted several ablation experiments to validate the effectiveness of each module; the results are presented in Table 2. For the baseline model (B), all additional modules (EFM and BPFM) were removed; only the 1 × 1 convolution in the BPFM was retained to reduce the number of channels, together with the initial aggregation operation of the CAM [22] for feature aggregation. FS denotes the positioning-map guidance part of the BPFM, BS denotes the boundary-map guidance part, and R denotes the cross-level fusion part.

Effectiveness of EFM. According to Table 2, adding the EFM on top of (d) and (f) improves the model's performance to some extent. The EFM fuses multi-scale features to enhance the features extracted by the backbone, resulting in more accurate positioning maps. However, the performance of experiment (f) is worse than that of the other ablation settings, mainly because the blurred positioning map degrades the boundary guidance; this negative interaction between the two cues in ambiguous regions further validates the effectiveness of the EFM designed in this paper.


Table 2. Four evaluation metrics are employed in this study, namely Sα [4], Fβω [15], Eφ [7], and M [17]. The symbols "↑" and "↓" indicate that larger and smaller values are better, respectively. The best results are highlighted in bold.

Method          | COD10K-Test                | CAMO-Test                  | CHAMELEON-Test
                | Sα↑    Eφ↑    Fβω↑   M↓    | Sα↑    Eφ↑    Fβω↑   M↓    | Sα↑    Eφ↑    Fβω↑   M↓
(a) B           | 0.819  0.887  0.681  0.037 | 0.814  0.863  0.732  0.075 | 0.880  0.929  0.802  0.035
(b) B+R         | 0.834  0.895  0.719  0.032 | 0.825  0.872  0.755  0.073 | 0.889  0.929  0.825  0.033
(c) B+R+BS      | 0.837  0.898  0.727  0.031 | 0.822  0.870  0.755  0.073 | 0.904  0.938  0.850  0.030
(d) B+R+FS      | 0.833  0.895  0.717  0.034 | 0.818  0.863  0.743  0.072 | 0.896  0.942  0.833  0.031
(e) B+R+FS+EF   | 0.838  0.900  0.725  0.032 | 0.821  0.871  0.750  0.072 | 0.900  0.939  0.842  0.026
(f) B+R+FS+BS   | 0.828  0.889  0.706  0.035 | 0.824  0.872  0.748  0.072 | 0.889  0.889  0.822  0.035
(g) Ours        | 0.838  0.901  0.733  0.030 | 0.822  0.872  0.760  0.068 | 0.909  0.958  0.861  0.024

Effectiveness of BPFM. Adding the cross-level fusion part of the BPFM to (a), yielding (b), leads to a certain decline in the detection of camouflaged targets, because indiscriminately fusing features from different levels accumulates noise and hinders detection; this also demonstrates the importance of guiding the input features. Adding the boundary-guidance part of the BPFM to (b), yielding (c), improves the detection capability: on the COD10K dataset, for example, Sα improves by 0.48%, Eφ by 0.34%, and Fβω by 1.54%, while M decreases by 6.06%. With the effectiveness of the EFM already established, and to eliminate the adverse effect of rough initial positioning maps on the positioning-guidance part of the BPFM, we directly feed the finely enhanced initial positioning maps from the EFM into the positioning-guidance experiment, yielding (e), which shows clear improvements on all metrics. These results demonstrate that the boundary-map guidance of the BPFM enhances the details of the detection results, the positioning-map guidance enhances their local features, and the cross-level fusion part provides multi-scale feature information for both forms of guidance, all of which prove the effectiveness of the BPFM.

4 Conclusion

We propose a novel approach for camouflaged object detection that emphasizes boundary positioning and leverages multi-scale feature fusion. The approach is motivated by how humans observe camouflaged objects. Our model uses the Enhanced Feature Module (EFM) and the Boundary and Positioning joint-guided Feature fusion Module (BPFM) to exploit the edge and semantic information of the target and iteratively refines the object structure and boundary to obtain a complete and precise representation. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art models on three benchmark datasets.


References

1. Bhajantri, N.U., Nagabhushan, P.: Camouflage defect identification: a novel approach. In: 9th International Conference on Information Technology (ICIT 2006), pp. 145–148. IEEE (2006)
2. Chen, G., Liu, S.J., Sun, Y.J., Ji, G.P., Wu, Y.F., Zhou, T.: Camouflaged object detection via context-aware cross-level fusion. IEEE Trans. Circuits Syst. Video Technol. 32(10), 6981–6993 (2022)
3. Fan, D.P., Cheng, M.M., Liu, Y., Li, T., Borji, A.: Structure-measure: a new way to evaluate foreground maps. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 4558–4567 (2017). https://doi.org/10.1109/ICCV.2017.487
4. Fan, D.P., Cheng, M.M., Liu, Y., Li, T., Borji, A.: Structure-measure: a new way to evaluate foreground maps. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4548–4557 (2017)
5. Fan, D.P., Gong, C., Cao, Y., Ren, B., Cheng, M.M., Borji, A.: Enhanced-alignment measure for binary foreground map evaluation. arXiv preprint arXiv:1805.10421 (2018)
6. Fan, D.P., Ji, G.P., Cheng, M.M., Shao, L.: Concealed object detection. IEEE Trans. Pattern Anal. Mach. Intell. 44(10), 6024–6042 (2022)
7. Fan, D.P., Ji, G.P., Qin, X., Cheng, M.M.: Cognitive vision inspired object segmentation metric and loss function. Scientia Sinica Informationis 6(6) (2021)
8. Fan, D.P., Ji, G.P., Sun, G., Cheng, M.M., Shen, J., Shao, L.: Camouflaged object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2777–2787 (2020)
9. Fan, D.P., et al.: PraNet: parallel reverse attention network for polyp segmentation. In: Martel, A.L., et al. (eds.) Medical Image Computing and Computer Assisted Intervention, MICCAI 2020, Proceedings, Part VI, pp. 263–273. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59725-2_26
10. Fan, D.P., et al.: Inf-Net: automatic COVID-19 lung infection segmentation from CT images. IEEE Trans. Med. Imaging 39(8), 2626–2637 (2020)
11. Gao, S.H., Cheng, M.M., Zhao, K., Zhang, X.Y., Yang, M.H., Torr, P.: Res2Net: a new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 652–662 (2019)
12. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
13. Le, T.N., Nguyen, T.V., Nie, Z., Tran, M.T., Sugimoto, A.: Anabranch network for camouflaged object segmentation. Comput. Vis. Image Underst. 184, 45–56 (2019)
14. Lv, Y., et al.: Simultaneously localize, segment and rank the camouflaged objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11591–11601 (2021)
15. Margolin, R., Zelnik-Manor, L., Tal, A.: How to evaluate foreground maps? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2014)
16. Mei, H., Ji, G.P., Wei, Z., Yang, X., Wei, X., Fan, D.P.: Camouflaged object segmentation with distraction mining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8772–8781 (2021)
17. Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 733–740. IEEE (2012)


18. Samuels-Lev, Y., et al.: ASPP proteins specifically stimulate the apoptotic function of P53. Mol. Cell 8(4), 781–794 (2001)
19. Singh, S.K., Dhawale, C.A., Misra, S.: Survey of object detection methods in camouflaged image. In: 2013 International Conference on Electronic Engineering and Computer Science (EECS 2013). IERI Procedia 4, 351–357 (2013). https://doi.org/10.1016/j.ieri.2013.11.050. https://www.sciencedirect.com/science/article/pii/S2212667813000531
20. Skurowski, P., Abdulameer, H., Blaszczyk, J., Depta, T., Kornacki, A., Koziel, P.: Animal camouflage analysis: Chameleon database. Unpublished Manuscript 2(6), 7 (2018)
21. Sun, Y., Chen, G., Zhou, T., Zhang, Y., Liu, N.: Context-aware cross-level fusion network for camouflaged object detection. arXiv preprint arXiv:2105.12555 (2021)
22. Sun, Y., Wang, S., Chen, C., Xiang, T.Z.: Boundary-guided camouflaged object detection. In: IJCAI, pp. 1335–1341 (2022)
23. Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020)
24. Wei, J., Wang, S., Huang, Q.: F3Net: fusion, feedback and focus for salient object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12321–12328 (2020)
25. Xie, E., Wang, W., Wang, W., Ding, M., Shen, C., Luo, P.: Segmenting transparent objects in the wild. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision, ECCV 2020, Proceedings, Part XIII. LNCS, vol. 12358, pp. 696–711. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58601-0_41
26. Zhai, Q., Li, X., Yang, F., Chen, C., Cheng, H., Fan, D.P.: Mutual graph learning for camouflaged object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12997–13007 (2021)
27. Zhang, M., Xu, S., Piao, Y., Shi, D., Lin, S., Lu, H.: PreyNet: preying on camouflaged objects. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 5323–5332 (2022)
28. Zhao, T., Wu, X.: Pyramid feature attention network for saliency detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3085–3094 (2019)
29. Zhou, T., Zhou, Y., Gong, C., Yang, J., Zhang, Y.: Feature aggregation and propagation network for camouflaged object detection. IEEE Trans. Image Process. 31, 7036–7047 (2022)
30. Zhu, H., et al.: I can find you! Boundary-guided separated attention network for camouflaged object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 3608–3616 (2022)

Author Index

C Cai, Dunbo 225 Cai, Weibo 68, 81 Cai, Weiling 172 Cao, Liujuan 432 Chen, Fan 508 Chen, Jie 278 Chen, Junliang 445 Chen, Peng 198 Chen, Sichu 392 Chen, Songhang 340 Chen, Tianya 198 Chen, Yao 120 Chen, Yifei 94 Chen, Zeqi 302 Chen, Zhiwei 432 Cheng, Zesen 278 Cui, Junjie 16 Cui, Ying 458 D Dai, Changyi 146 Dai, Pingyang 418 Ding, Haoyang 432 Ding, Shuyan 120 Ding, Xinghao 315 Dong, Junhao 68, 81 Dong, Yuhang 315 Du, Yujian 225 Duan, Danting 328 Duan, Lijuan 107 F Fan, Linyu

315

G Gao, Fei 251 Gao, Guangyu 29 Gao, Jianzhe 470 Gao, Shang 265 Gao, Yan 158

Ge, Jing 29 Gong, Zhi 107 Guo, Haiyun 238 Guo, Hongbo 352 Guo, Jianhui 120 Guo, Jie 158 Guo, Meng 340 H Hu, Fei 328 Hu, Jingqi 495 Huan, Ruohong 198 Huang, Min 238 Huang, Yue 315 Huang, Zepeng 445 Huo, Zhanqiang 365 J Ji, Hongbing 405 Ji, Mingqian 3 Jiang, Jianfei 352 Jiang, Xiaolong 158 Jing, Naifeng 352 K Kou, Guangjie 146 L Lai, Jianhuang 68, 81 Lai, Lanqing 43 Lai, Zhiqing 340 Leng, Zhuo 278 Li, Boyuan 508 Li, Junxia 458 Li, Lunbo 120 Li, Ning 212 Li, Runze 392 Li, Shaozi 470 Li, Songlin 508 Li, Xiuhong 508



Li, Xu 392 Li, Xuan 56 Li, Yang 290 Li, Zeyu 508 Li, Zhe 508 Li, Zhenbo 94, 146 Li, Zheng 68, 81 Li, Ziwen 185 Liang, Qihua 212 Liang, Ronghua 198 Lin, Shaohui 158 Liu, Caihua 392 Liu, Chengying 146 Liu, Hong 495 Liu, Long 405 Liu, Mengyin 133 Liu, Yuehu 302 Liu, Zhengyu 16 Liu, Zhuang 378 Lu, Huchuan 265 Lu, Huijun 225 Luo, Fen 365 Luo, Zhiming 470 M Ma, Bian 107 Ma, Ding 56 Ma, Xiaoyi 392 Mao, Chen 495 Mao, Zhigang 352 Meng, Sensen 365 Miao, Qinghai 238 Mo, Zhiyi 212 P Pan, Xiang 458 Peng, Jinjia 482 Peng, Liang 328 Q Qian, Ling 225 Qiao, Yingxu 365 Qiu, Tianchi 508 Qu, Xu 392 Quan, Yueqian 458 R Ran, Shuang 328 Rong, Yimin 212


S Shen, Linlin 445 Sheng, Weiguang 352 Song, Pengpeng 482 Suo, Wei 43 T Tan, Chong 495 Tian, Cheng 470 W Wang, Congao 56 Wang, Jinqiao 238 Wang, Kai 251 Wang, Peng 43 Wang, Qianxiang 29 Wang, Qimeng 158 Wang, Qin 352 Wang, Siwei 432 Wang, Weijun 94 Wang, Wenjian 107 Wang, Yike 458 Wang, Yingying 315 Wang, Yuehuan 185 Wang, Zhiwen 251 Wei, Pengxu 278 Wu, Hongyan 238 Wu, Xiangqian 56 X Xiang, Haifeng 352 Xie, Xiaohua 68, 81 Xu, Qing 405 Xu, Zhehao 133 Y Yang, Chen 120 Yang, Chengde 418 Yang, Jian 3 Yang, Jianmin 94 Yang, Shuting 16 Ye, Linhua 340 Yin, Baoqun 290 Yin, Xu-Cheng 133 Yu, Chenyang 265 Yu, Yale 43 Yuan, Xia 16 Yuan, Zheng 172 Yue, Jun 94, 146


Z Zhan, Ziwei 198 Zhang, Baochang 158 Zhang, Chi 302 Zhang, Guangyong 107 Zhang, Jinpu 185 Zhang, Mengxuan 405 Zhang, Pingping 265 Zhang, Runqing 225 Zhang, Shanshan 3 Zhang, Xiaowei 378 Zhang, Xuejian 16 Zhang, Yan 29, 418


Zhang, Yijun 225 Zhao, Chen 172 Zhao, Qiubo 405 Zhao, Xiaodong 445 Zheng, Hang 458 Zheng, Min 495 Zhong, Bineng 212 Zhong, Wei 328 Zhou, Chenyu 508 Zhou, Fang 133 Zhu, Chao 133 Zou, Chang 302 Zou, Zhanming 146