Computer Vision – ACCV 2022: 16th Asian Conference on Computer Vision, Macao, China, December 4–8, 2022, Proceedings, Part II (Lecture Notes in Computer Science, 13842) 3031262832, 9783031262838

The 7-volume set of LNCS 13841-13847 constitutes the proceedings of the 16th Asian Conference on Computer Vision, ACCV 2

152 20 115MB

English Pages 687 [683] Year 2023

Table of contents :
Preface
Organization
Contents – Part II
Applications of Computer Vision, Vision for X
Skin Tone Diagnosis in the Wild: Towards More Robust and Inclusive User Experience Using Oriented Aleatoric Uncertainty
1 Introduction
2 Related Work
3 Problem
3.1 Color Estimation: A Continuous 3D Output
3.2 Oriented Uncertainty
4 Model
4.1 Aleatoric Uncertainty Principle
4.2 Covariance for Oriented Uncertainty
4.3 Euler Angles Decomposition
4.4 Oriented Uncertainty Benefits
4.5 Probabilities Re-scaling for Models Comparison
5 Experiments
5.1 Dataset
5.2 Evaluation and Implementation Details
5.3 Metrics for Regression and Uncertainty
5.4 Uncertainty for Samples Selection
5.5 Color Control Using Uncertainty Orientation
5.6 Uncertainty Orientation for Foundation Recommendation
6 Conclusion
A Rotation Matrices
B Neural Network Architecture
C Behavior During Training
References
Spatio-Channel Attention Blocks for Cross-modal Crowd Counting
1 Introduction
2 Related Works
2.1 Detection-Based Methods
2.2 Regression–Based Methods
2.3 Multi-modal Learning for Crowd Counting
3 Proposed Method
3.1 Overview
3.2 Spatial-Wise Cross-modal Attention (SCA)
3.3 Channel-Wise Feature Aggregation (CFA)
4 Experiments
4.1 Implementation Details
4.2 Effectiveness of Plug-and-Play CSCA Blocks
4.3 Comparison with Previous Methods
4.4 Ablation Studies
5 Qualitative Results
6 Conclusion
References
Domain Generalized RPPG Network: Disentangled Feature Learning with Domain Permutation and Domain Augmentation
1 Introduction
2 Related Work
2.1 Remote Photoplethysmography Estimation
2.2 Feature Disentanglement
2.3 Domain Generalization
3 Proposed Method
3.1 Problem Statement and Overview of DG-rPPGNet
3.2 Disentangled Feature Learning
3.3 Domain Permutation for Domain-Invariant Feature Learning
3.4 Domain Augmentation via AdaIN
3.5 Loss Function
3.6 Inference Stage
4 Experiments
4.1 Datasets and Cross-Domain Setting
4.2 Implementation Details
4.3 Evaluation Metrics
4.4 Ablation Study
4.5 Results and Comparison
5 Conclusion
References
Complex Handwriting Trajectory Recovery: Evaluation Metrics and Algorithm
1 Introduction
2 Related Work
2.1 Trajectory Recovery Evaluation Metrics
2.2 Trajectory Recovery Algorithms
3 Glyph-and-Trajectory Dual-modality Evaluation
3.1 Adaptive Intersection on Union
3.2 Length-Independent Dynamic Time Warping
3.3 Analysis of AIoU and LDTW
4 Parsing-and-Tracing ENcoder-Decoder Network
4.1 Double-Stream Parsing Encoder
4.2 Global Tracing Decoder
4.3 Optimization
5 Experiments
5.1 Datasets
5.2 Experimental Setting
5.3 Comparison with State-of-the-Art Approaches
5.4 Ablation Study of PEN-Net
6 Conclusion
References
Fully Transformer Network for Change Detection of Remote Sensing Images
1 Introduction
2 Related Work
2.1 Change Detection of Remote Sensing Images
2.2 Vision Transformers for Change Detection
3 Proposed Approach
3.1 Siamese Feature Extraction
3.2 Deep Feature Enhancement
3.3 Progressive Change Prediction
3.4 Loss Function
4 Experiments
4.1 Datasets
4.2 Evaluation Metrics
4.3 Implementation Details
4.4 Comparisons with Sate-of-the-Arts
4.5 Ablation Study
5 Conclusion
References
Self-supervised Augmented Patches Segmentation for Anomaly Detection
1 Introduction
2 Related Work
2.1 Embedding-Based Approach
2.2 Reconstruction-Based Approach
2.3 Self-supervised Learning-Based Approach
3 Methods
3.1 Add Augmented Patches to Image
3.2 Augmented Patches Segmentation
3.3 Inference
4 Experiment
4.1 Experiment Details
4.2 Effectiveness of Segmentation Task
4.3 Comparison with Other SSL Models
4.4 Ablation Study
4.5 Anomaly Segmentation
5 Conclusion
References
Rethinking Low-Level Features for Interest Point Detection and Description
1 Introduction
2 Related Work
3 Method
3.1 Interest Point Detection Module
3.2 Low-Level Description-Aware Module
3.3 Correspondence Module
3.4 Optimization
3.5 Deployment
4 Experiments
4.1 Details
4.2 Comparison
4.3 Ablation Study
4.4 Validation on Low-Level Feature Learning
4.5 Further Analysis
5 Conclusion
References
DILane: Dynamic Instance-Aware Network for Lane Detection
1 Introduction
2 Related Works
2.1 Segmentation-Based Methods
2.2 Anchor-Based Methods
2.3 Keypoint-Based Methods
2.4 Parameter-Based Methods
3 Proposed Method
3.1 Learnable Anchor
3.2 Dynamic Head
3.3 Self-attention
3.4 Loss Function
4 Experiment
4.1 Datasets
4.2 Implementation Details
4.3 Metrics
4.4 Result
4.5 Ablation Analysis
5 Conclusion
References
CIRL: A Category-Instance Representation Learning Framework for Tropical Cyclone Intensity Estimation
1 Introduction
2 Related Works
2.1 Tropical Cyclone Intensity Estimation
2.2 Metric Learning
3 The Proposed Approach
3.1 Category-Instance Representation Learning Framework
3.2 Inter-class Distance Consistency Loss
3.3 Inference Stage
3.4 A Smooth Algorithm for Intensity Estimating
4 Experiments
4.1 Experimental Settings
4.2 Comparison to State-of-the-Art Methods
4.3 Ablation Studies and Discussions
5 Conclusion
References
Explaining Deep Neural Networks for Point Clouds Using Gradient-Based Visualisations
1 Introduction
2 Related Work
3 Preliminaries
4 Accumulated Piecewise Explanations (APE)
4.1 Partial Heatmap Lj
4.2 Point Cloud Heatmap
5 Experiments
5.1 Qualitative Experiments
5.2 Quantitative Experiments
6 Conclusions
References
Multispectral-Based Imaging and Machine Learning for Noninvasive Blood Loss Estimation
1 Introduction
2 Methodology
2.1 Hardware Setup
2.2 Dataset
2.3 Feature Extraction Techniques (FET)
2.4 Regression Modelling
2.5 Model Evaluation
3 Results and Discussion
3.1 FET Results
3.2 Machine Learning Algorithms Performance
3.3 Sample Results
4 Conclusion
References
Unreliability-Aware Disentangling for Cross-Domain Semi-supervised Pedestrian Detection
1 Introduction
2 Related Work
2.1 Scene-Specific Pedestrian Detection
2.2 Pedestrian Synthesis
3 Method
3.1 Overview
3.2 Information Transferring from ID-Rich Instances to ID-Absence Instances
3.3 Re-training Detector
4 Experiments
4.1 Experimental Settings
4.2 Evaluation of the Generative Ability
4.3 Evaluation of the Discriminative Ability
5 Conclusion
References
Patch Embedding as Local Features: Unifying Deep Local and Global Features via Vision Transformer for Image Retrieval
1 Introduction
2 Related Work
2.1 Local Features
2.2 Global Feature
2.3 Joint Local and Global CNN Features
2.4 Transformers for High-Resolution Images
3 Methodology
3.1 Model
3.2 Training Objective
4 Experiments
4.1 Experimental Setup
4.2 Comparison with the State-of-the-Art
4.3 Ablation Experiments
5 Conclusions
References
Improving Surveillance Object Detection with Adaptive Omni-Attention over Both Inter-frame and Intra-frame Context
1 Introduction
2 Related Work
2.1 Generic Object Detection
2.2 Surveillance Object Detection
3 Proposed Method
3.1 Omni-Attention Based Surveillance Object Detection Architecture
3.2 Inter-frame Attention Module
3.3 Intra-frame Attention Module
3.4 Feature Fusion Module
4 Experiments
4.1 Experimental Details
4.2 Main Results
4.3 Ablation Study
5 Conclusion
References
Lightweight Image Matting via Efficient Non-local Guidance
1 Introduction
2 Related Work
3 Proposed Method
3.1 Network Architecture
3.2 Long-Short Range Pyramid Pooling Module
3.3 Efficient Non-local Block
3.4 Loss Function
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison with the SOTA Methods
4.4 Ablation Study
5 Conclusion
References
RGB Road Scene Material Segmentation
1 Introduction
2 Related Work
3 KITTI-Materials Dataset
4 RMSNet
4.1 Mix-Transformer as Encoder
4.2 SAMixer-Based Decoder
5 Experiments and Discussions
5.1 Implementation Details
5.2 Experimental Results on KITTI-Materials
6 Ablation Study
6.1 Balanced Query-Key Similarity Measure
6.2 Bottleneck Local Statistics Encoding-Decoding Strategy
7 Conclusion
References
Self-distilled Vision Transformer for Domain Generalization
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Preliminaries
3.2 Self-distilled Vision Transformer for Domain Generalization
4 Experiments
4.1 Comparison with the State-of-the-Art
4.2 Ablation Study and Analysis
5 Conclusion
References
CV4Code: Sourcecode Understanding via Visual Code Representations
1 Introduction
2 Related Work
3 CV4Code
3.1 Sourcecode Representation
3.2 Model
3.3 Implementation Details
4 Experimental Evaluation
4.1 Setup
4.2 Baseline Methods
4.3 CV4Code Setup
4.4 Results
4.5 Ablation Studies
5 Conclusion
References
PPR-Net: Patch-Based Multi-scale Pyramid Registration Network for Defect Detection of Printed Label
1 Introduction
2 Method
2.1 The Proposed Framework
2.2 Patch Splitting and Stitching Strategy
2.3 Multi-scale Pyramid Registration Network
2.4 Loss Function
2.5 Reference-Based Comparison Defect Detection (RCDD)
2.6 Discussion
3 Experiments
3.1 Dataset Generation
3.2 Evaluation Metrics
3.3 Implementation Details
3.4 The Parameter Choice of 1 and 2
3.5 Comparison with Baselines and Time Complexity
3.6 Ablation Studies for PPR-Net
3.7 Qualitative Results
4 Conclusions
References
A Prototype-Oriented Contrastive Adaption Network for Cross-Domain Facial Expression Recognition
1 Introduction
2 Related Works
2.1 Cross-Domain Facial Expression Recognition
2.2 Contrastive Learning
3 Method
3.1 Learnable Category Prototypes
3.2 Pre-train on Source Domain
3.3 Domain-Specific Feature Learning
3.4 Cross-Domain Feature Fusion
3.5 Obtain Class Weights and a Adaptative Threshold
3.6 Overall Objective
4 Experiments
4.1 Databases
4.2 Experimental Details
4.3 Comparison Results
4.4 Empirical Analysis
5 Conclusion
References
Multi-stream Fusion for Class Incremental Learning in Pill Image Classification
1 Introduction
2 Preliminaries
2.1 Problem Definition and Notation
2.2 Conventional CIL Methods
3 Methodology
3.1 Multi-stream Class Incremental Learning Model
3.2 Color Histogram Guidance Stream
3.3 Incremental Multi-stream Intermediate Fusion Technique
4 Experiments
4.1 Dataset
4.2 Experimental Protocol
4.3 Implementation Details
4.4 Experimental Results
5 Ablation Studies
6 Discussions
7 Conclusion
References
Decision-Based Black-Box Attack Specific to Large-Size Images
1 Introduction
2 Related Work
2.1 Transfer-Based Black-Box Attacks
2.2 Score-Based Black-Box Attacks
2.3 Decision-Based Black-Box Attacks
3 Problem Definition
4 Decision-Based Black-Box Attack Specific to Large-Size Images (SLIA)
4.1 Initialization
4.2 Gradient Direction Estimate at the Decision Boundary
4.3 Move Along Estimated Gradient Direction
4.4 Project to Decision Boundary
5 Experiments
5.1 Experimental Settings
5.2 Experimental Results
6 Conclusion
References
RA Loss: Relation-Aware Loss for Robust Person Re-identification
1 Introduction
2 Related Works
2.1 Person Re-identification
2.2 Intra-pair Variance
3 Method
3.1 Intra-pair Variance in ReID
3.2 The Macro-constraint
3.3 The Micro-constraint
3.4 Person ReID via RA Loss
4 Experiments
4.1 Datasets and Evaluation Protocols
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparisons with State-of-the-Art Methods
5 Conclusion
References
Class Specialized Knowledge Distillation
1 Introduction
2 Related Work
2.1 Knowledge Distillation
2.2 Task Specialization
3 Our Approach
3.1 Knowledge Distillation
3.2 Renormalized Knowledge Distillation
3.3 Regularization Loss for Feature Embeddings
3.4 Training with Losses
4 Experimental Results
4.1 Ablation Study Setup
4.2 Ablation Study Experiment: Subclass Numbers for the Student
4.3 Ablation Study Experiment: Hyperparameters for Loss Terms
4.4 Classification Performance Comparisons
5 Conclusion
References
Gestalt-Guided Image Understanding for Few-Shot Learning
1 Introduction
2 Related Work
2.1 Few-Shot Learning
2.2 Gestalt Psychology
3 Method
3.1 Problem Definition
3.2 Metric-Based Few-Shot Learning Pipeline
3.3 Gestalt Guide Image Understanding
3.4 Feature Estimation
3.5 The Overview of Our Approach
4 Experiment
4.1 Datasets
4.2 Implementation Details
4.3 Experiment Results
4.4 Result Analysis
5 Conclusion
References
Depth Estimation via Sparse Radar Prior and Driving Scene Semantics
1 Introduction
2 Related Work
2.1 Monocular Depth Estimation
2.2 Depth Completion
3 Proposed Method
3.1 Network Architecture
3.2 Sparse Pre-mapping Module
3.3 Feature Fusion Module
3.4 Loss Function
4 Scenario Semantics_Based Depth Estimation
5 Dataset Generation
6 Experiments
6.1 Implementation Detail
6.2 Comparing Performance
6.3 Ablation Studies
7 Conclusion
References
Flare Transformer: Solar Flare Prediction Using Magnetograms and Sunspot Physical Features
1 Introduction
2 Related Work
2.1 Time-Series Forecasting
2.2 Solar Flare Prediction
3 Problem Statement
4 Proposed Method
4.1 Novelty
4.2 Input
4.3 Flare Transformer
4.4 Loss Function
5 Experiments
5.1 Experimental Setup
5.2 Evaluation Metric
5.3 Experimental Results
5.4 Ablation Studies
5.5 Error Analysis
6 Conclusion
References
Neural Residual Flow Fields for Efficient Video Representations
1 Introduction
2 Related Works
3 Method
3.1 Dense Optical Flow Estimation
3.2 Image Warping and Completion
3.3 Key Frames
3.4 End-to-End Training
3.5 Multi-reference Frames
3.6 Network Split
4 Experiments
4.1 Dataset
4.2 Single Reference Frame Experiments
4.3 Multi-reference Frames and Network Splitting Experiments
4.4 Comparison with Other Neural-Field-Based Video Representation
4.5 Spatial and Temporal Interpolation
5 Conclusions and Discussion
References
Visual Explanation Generation Based on Lambda Attention Branch Networks
1 Introduction
2 Related Work
3 Problem Statement
4 Proposed Method
4.1 Structure
4.2 Lambda Layer
4.3 PID Score
5 Experiments
5.1 Experimental Setup
5.2 Quantitative Results
5.3 Ablation Studies
5.4 Qualitative Results
5.5 Error Analysis
6 Conclusions
References
Shape Prior is Not All You Need: Discovering Balance Between Texture and Shape Bias in CNN
1 Introduction
2 Related Works
3 Adaptive Bias Analysis (AdaBA)
3.1 Baseline and Its Improvement Avenues
3.2 Methodology
3.3 Validation on AdaBA
3.4 Bias Analysis on Fine-tuned CNNs with AdaBA
4 Mitigating Texture Bias by Redesigning Label Space
4.1 Granular Labeling Scheme
4.2 Bias Analysis on Concatenated Sets
5 Is Granular Labeling Scheme Advantageous in Classification Performance?
5.1 Setup
5.2 Analysis
6 Does Granular Labeling Scheme Contribute to Better OOD Detection?
6.1 Setup
6.2 Analogy
7 What Representation Do Various Schemes Acquire?
7.1 Setup
7.2 Analogy
8 Conclusion
References
What Role Does Data Augmentation Play in Knowledge Distillation?
1 Introduction
2 Related Work
3 Contributions
3.1 Inspired by SSKD and HSAKD
3.2 The Role of Data Augmentation
3.3 Decoupling the Overall Loss
3.4 The Probability of Data Augmentation
3.5 Few-Shot Analysis
3.6 Wide Comparison
3.7 Cosine Confidence Knowledge Distillation
4 Conclusion
A Hyperparameter Settings
B Additional Method Comparisons
C Additional Visualization
References
TCVM: Temporal Contrasting Video Montage Framework for Self-supervised Video Representation Learning
1 Introduction
2 Related Works
3 Method
3.1 Overview of Temporal Contrasting Video Montage Framework
3.2 Video Montage Module
3.3 Temporal Contrasting Module
4 Results
4.1 Implementation Details
4.2 Compared with State-of-the-Art Methods
4.3 Ablation Study
4.4 Visualization Analysis
5 Conclusions
References
SCFNet: A Spatial-Channel Features Network Based on Heterocentric Sample Loss for Visible-Infrared Person Re-identification
1 Introduction
2 Related Work
3 Methodology
3.1 Network Architecture
3.2 Loss Function
3.3 Local Random Channel Enhancement
4 Experiments
4.1 Experimental Setting
4.2 Ablation Experiment
4.3 Comparison with the State-of-the-art Methods
5 Conclusion
References
Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward
1 Introduction
2 Related Work
2.1 Video Editing
2.2 Text Coherence Prediction and Evaluation
2.3 Neural Combinatorial Optimization
3 Method
3.1 Problem Formulation
3.2 Architecture
3.3 Reward Design
3.4 Training
4 Experiments
4.1 Dataset
4.2 Metric
4.3 Implementation Details
4.4 Performance Comparison
4.5 Ablation Studies
4.6 Qualitative Analysis
5 Conclusion
References
CCLSL: Combination of Contrastive Learning and Supervised Learning for Handwritten Mathematical Expression Recognition
1 Introduction
2 Related Work
3 Method
3.1 Encoder
3.2 Transformer Decoder
3.3 Supervised Training
3.4 Contrastive Training
3.5 Combination of Contrastive Learning and Supervised Learning
4 Experiments
4.1 Experimental Setup and Results
4.2 Ablation Experiments
4.3 Encoder Migration
4.4 Visualization
5 Conclusion
References
Dynamic Feature Aggregation for Efficient Video Object Detection
1 Introduction
2 Related Works
3 Methodology
3.1 Preliminary
3.2 Vanilla Dynamic Aggregation
3.3 Deformable Dynamic Aggregation
3.4 Inplace Distillation Loss
4 Experiments
4.1 Experiment Setup
4.2 Main Results
4.3 Model Analysis
5 Conclusion
References
Computational Photography, Sensing, and Display
Robust Human Matting via Semantic Guidance
1 Introduction
2 Related Work
3 Method
3.1 Shared Encoder
3.2 Segmentation Decoder
3.3 Matting Decoder
4 Training
4.1 Segmentation Training
4.2 Matting Training
5 Experiment
5.1 Benchmarks
5.2 Quantitative Comparison
5.3 Qualitative Comparison
5.4 Size and Speed Comparison
5.5 Ablation Studies
6 Conclusion
References
RDRN: Recursively Defined Residual Network for Image Super-Resolution
1 Introduction
2 Related Work
3 Proposed Method
3.1 Network Architecture
3.2 Recursively Defined Residual Block (RDRB)
3.3 Intermediate Supervision (IS)
3.4 Implementation Details
3.5 Discussion
4 Experiments
4.1 Implementation and Trainings Details
4.2 Ablation Study
4.3 Results with Bicubic (BI) Degradation Model
4.4 Results with Bicubic Blur-Downscale (BD) Degradation Model
4.5 Results with Bicubic Downscale-Noise (DN) Degradation Model
4.6 Model Complexity Analyses
5 Conclusion
References
Super-Attention for Exemplar-Based Image Colorization
1 Introduction
2 Related Work
3 Colorization Framework
3.1 Main Colorization Network
3.2 Super-Attention as Color Reference Prior
3.3 Training Losses
3.4 Implementation Details
4 Dataset and References Selection
5 Experimental Validations
5.1 Analysis of the Method
5.2 Comparison with Other Exemplar-Based Colorization Methods
6 Conclusion
References
Author Index

Recommend Papers