Computer Vision – ACCV 2022: 16th Asian Conference on Computer Vision, Macao, China, December 4–8, 2022, Proceedings, Part I
ISBN 9783031263187 (print), 9783031263194 (eBook)

The 7-volume set of LNCS 13841-13847 constitutes the proceedings of the 16th Asian Conference on Computer Vision, ACCV 2022, held in Macao, China, during December 4–8, 2022.


English · 686 pages · 2023


Table of contents:
Preface
Organization
Contents – Part I
3D Computer Vision
EAI-Stereo: Error Aware Iterative Network for Stereo Matching
1 Introduction
2 Related Works
2.1 Data-driven Stereo Matching
2.2 Iterative Network
3 Approach
3.1 Multi-scale Feature Extractor
3.2 Iterative Multiscale Wide-LSTM Network
3.3 Error Aware Refinement
4 Experiments
4.1 Middlebury
4.2 ETH3D
4.3 KITTI-2015
4.4 Cross-domain Generalization
4.5 Ablations
5 Conclusion
References
Temporal-Aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking
1 Introduction
2 Related Work
2.1 2D Object Tracking
2.2 3D Single Object Tracking
2.3 3D Multi Object Tracking
3 Approach
3.1 Problem Setting
3.2 Template Set Sampling
3.3 Temporal Feature Enhancement
3.4 Temporal-Aware Feature Aggregation
3.5 Loss Function
4 Experiments
4.1 Experimental Settings
4.2 Result
4.3 Ablation Study
5 Conclusions
References
Neural Plenoptic Sampling: Learning Light-Field from Thousands of Imaginary Eyes
1 Introduction
2 Related Work
3 Neural Plenoptic Sampling
3.1 Proxy Depth Reconstruction
3.2 Imaginary Eye Sampling
3.3 Self-supervision via Photo-Consistency
3.4 Color Blending for View Synthesis
4 Experiments
4.1 Datasets and Evaluations
4.2 Training Details
4.3 Comparison with the State-of-the-Art
4.4 Effectiveness of Imaginary Eye Sampling
4.5 Proxy-Depth from Colors and Features
4.6 With or Without View-Direction Weighting
4.7 Different Number of Input Views
5 Conclusion
References
3D-Yoga: A 3D Yoga Dataset for Visual-Based Hierarchical Sports Action Analysis
1 Introduction
2 Related Work
2.1 Skeleton-Based Action Recognition
2.2 Visual-Based Action Quality Assessment
2.3 Sports Action Dataset
3 3D Yoga Dataset
3.1 Yoga Pose Capturing
3.2 Pose Classification
3.3 Pose Assessment
3.4 Data Organization
4 Yoga Pose Analysis
4.1 Data Pre-processing
4.2 Hierarchical Analysis
5 Experiments and Analysis
5.1 Implementation Details
5.2 Ablation Studies
5.3 Comparison with Other Methods
6 Conclusions
References
NEO-3DF: Novel Editing-Oriented 3D Face Creation and Reconstruction
1 Introduction
2 Related Works
3 The Proposed Method
4 Experiments
5 Conclusion
References
LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing
1 Introduction
2 Related Works
3 LSMD-Net
3.1 Dual-Branch Disparity Predictor
3.2 Mixture Density Module
3.3 Losses
4 Livox-Stereo Dataset
4.1 Data Collecting System
4.2 Livox-Stereo Dataset
5 Experiments
5.1 Datasets and Evaluation Metrics
5.2 Implementation Details
5.3 Results on KITTI Stereo 2015 Dataset
5.4 Results on KITTI Depth Completion Dataset
5.5 Results on Livox-stereo Dataset
5.6 Ablation Study
5.7 Computational Time
6 Conclusion
References
Point Cloud Upsampling via Cascaded Refinement Network
1 Introduction
2 Related Work
2.1 Point Cloud Upsampling
2.2 Point Cloud Processing
3 Proposed Method
3.1 Network Architecture
3.2 Compared with Previous Coarse-to-Fine Approaches
3.3 Training Loss Function
4 Experiments
4.1 Experimental Settings
4.2 Results on Synthetic Dataset
4.3 Results on Real-Scanned Dataset
4.4 Model Complexity Analysis
4.5 Ablation Study
5 Conclusion
References
CVLNet: Cross-view Semantic Correspondence Learning for Video-Based Camera Localization
1 Introduction
2 Related Work
3 CVLNet: Cross-view Video-Based Localization
3.1 Geometry-Driven View Projection (GVP)
3.2 Photo-Consistency Constrained Sequence Fusion
3.3 Scene-Prior Driven Similarity Matching
3.4 Training Objective
4 The KITTI-CVL Dataset
5 Experiments
5.1 Cross-view Video-Based Localization
5.2 Single Image-Based Localization
5.3 Limitations
6 Conclusions
References
Vectorizing Building Blueprints
1 Introduction
2 Related Works
3 Blueprint Vectorization Dataset
4 Blueprint Vectorization Algorithm
4.1 Blueprint Representation
4.2 Instance Segmentation and Type Classification
4.3 Frame Detection Module
4.4 Frame Correction Module
4.5 Heuristic Simplification
5 Implementation Details
6 Evaluations
6.1 Quantitative Results
6.2 Qualitative Results
7 Conclusion
References
Unsupervised 3D Shape Representation Learning Using Normalizing Flow
1 Introduction
2 Related Work
2.1 3D Point Cloud Representation Learning
2.2 Normalizing Flows
3 Method
3.1 Self-supervised Reconstruction
3.2 Variational Normalizing Flow
3.3 Contrastive-Center Loss
4 Experiments and Results
4.1 Experimental Datasets
4.2 Implementation Details
4.3 Results on ModelNet
4.4 Cross-Dataset Evaluation
4.5 Robustness Analysis
4.6 Complexity Analysis
4.7 Ablation Analysis
5 Conclusion
References
Learning Inter-superpoint Affinity for Weakly Supervised 3D Instance Segmentation
1 Introduction
2 Related Work
2.1 3D Semantic Segmentation
2.2 3D Instance Segmentation
3 Method
3.1 Backbone Network
3.2 Inter-superpoint Affinity Mining
3.3 Volume-Aware Instance Refinement
3.4 Network Training
4 Experiments
4.1 Experimental Settings
4.2 Results
4.3 Ablation Study
5 Conclusion
References
Cross-View Self-fusion for Self-supervised 3D Human Pose Estimation in the Wild
1 Introduction
2 Related Work
3 Methods
3.1 Lifting Network
3.2 Reprojection
3.3 Cross-View Self-fusion
3.4 Self-supervised Training
4 Experiments
4.1 Datasets and Metrics
4.2 Ablation Studies
4.3 Comparison with State-of-the-Art Methods
5 Conclusion
References
3D-C2FT: Coarse-to-Fine Transformer for Multi-view 3D Reconstruction
1 Introduction
2 Related Works
2.1 CNN and RNN-Based Models
2.2 Transformer-Based Models
3 3D Coarse-to-Fine Transformer
3.1 Image Embedding Module
3.2 2D-View Encoder
3.3 3D Decoder
3.4 3D Refiner
3.5 Loss Function
4 Experiments
4.1 Evaluation Protocol and Implementation Details
4.2 Result
4.3 Ablation Study
5 Conclusion
References
SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis
1 Introduction
2 Related Work
3 SymmNeRF
3.1 Overview
3.2 Encoding Holistic Representations
3.3 Extracting Symmetric Features
3.4 Injecting Symmetry Prior into NeRF
3.5 Volume Rendering
3.6 Training
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparisons
4.4 Ablation Study
5 Conclusion
References
Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection
1 Introduction
2 Related Work
2.1 3D Object Detection
2.2 Meta-learning
3 Review of VoteNet
4 Method
4.1 Few-Shot 3D Object Detection
4.2 3D Meta-Detector
4.3 3D Object Detector
4.4 Few-Shot Learning Strategy
5 Experiments and Results
5.1 Datasets
5.2 Implementation Details
5.3 Comparison with the Baselines
5.4 Performance Analysis
5.5 Where to Re-weight
6 Conclusion
References
ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection
1 Introduction
2 Related Work
2.1 Point Cloud Representation Learning
2.2 3D Object Detection in Point Clouds
2.3 Transformers in Computer Vision
3 Proposed Method
3.1 Group Embedding
3.2 Reaggregation Transformer Block with Affine Group Features
3.3 Feature Propagation Layer with Multi-scale Connection
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Evaluation Results
4.4 Ablation Study
4.5 Qualitative Results and Discussion
5 Conclusion
References
Adaptive Range Guided Multi-view Depth Estimation with Normal Ranking Loss
1 Introduction
2 Related Work
2.1 Learning-Based MVS
2.2 MVS Loss Functions
3 Method
3.1 Network Architecture
3.2 Adaptive Depth Range
3.3 Structure-Guided Normal Ranking Loss
3.4 Total Loss Function
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation on DTU
4.4 Evaluation on Tanks and Temples
4.5 Evaluation on BlendedMVS
4.6 Runtime and Memory Analysis
4.7 Ablation Study
5 Conclusion
References
Training-Free NAS for 3D Point Cloud Processing
1 Introduction
2 Related Works
2.1 Point Cloud Processing
2.2 Training-Free Neural Architecture Search
2.3 Neural Architecture Search in 3D Point Cloud Processing
3 Methods
3.1 Preliminaries and Notations
3.2 Training-Free Proxies
3.3 Point Cloud Search Space
3.4 Training-Free NAS
4 Experiments
4.1 Implementation Details
4.2 3D Point Cloud Classification
4.3 3D Point Cloud Part Segmentation
5 Conclusions
References
Re-parameterization Making GC-Net-Style 3DConvNets More Efficient
1 Introduction
2 Related Work
2.1 Disparity Estimation
2.2 Kernel Re-parameterization
2.3 Network Compression
3 Re-parameterization for 3DConvNets
3.1 GC-Net-Style 3DConvNet
3.2 Re-parameterization for Efficient Inference
3.3 Implementation Tricks
3.4 Disparity Regression
3.5 Loss Function
4 Experiments
4.1 Experimental Conditions
4.2 Experimental Results
4.3 Ablation Study
5 Conclusion
References
PU-Transformer: Point Cloud Upsampling Transformer
1 Introduction
2 Related Work
3 Methodology
3.1 Overview
3.2 Positional Fusion
3.3 Shifted Channel Multi-head Self-attention
4 Implementation
4.1 PU-Transformer Head
4.2 PU-Transformer Body
4.3 PU-Transformer Tail
5 Experiments
5.1 Settings
5.2 Point Cloud Upsampling Results
5.3 Ablation Studies
5.4 Visualization
6 Limitations and Future Work
7 Conclusions
References
DIG: Draping Implicit Garment over the Human Body
1 Introduction
2 Related Work
3 Method
3.1 Garment Representation
3.2 Modeling Garment Deformations
3.3 Training the Model
3.4 Implementation Details
4 Experiments and Results
4.1 Dataset and Evaluation Metrics
4.2 Garment Reconstruction
4.3 Garment Deformation
4.4 From Images to Clothed People
5 Conclusion
References
Pyramidal Signed Distance Learning for Spatio-Temporal Human Shape Completion
1 Introduction
2 Related Work
3 Network Architecture
3.1 Spatio-Temporal Feature Encoding
3.2 Pyramidal Feature Decoding
3.3 Implicit Surface Decoding
3.4 Implementation Details
4 Training Strategies
4.1 Common Training Principles
4.2 Pyramidal Training Framework
4.3 Simpler Variants of the Pyramidal Approach
4.4 Full Spatio-Temporal Pyramidal Training
5 Experiments
5.1 Dataset
5.2 Training Protocol
5.3 Ablation and Variants Comparison
5.4 Learning-Based Method Comparisons
5.5 Model-Based Method Comparisons
6 Conclusion
References
Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image
1 Introduction
2 Related Works
3 The Method
3.1 Implicit Covering Relationship
3.2 Garment Indication Field
3.3 Layered-Garment Net
3.4 Training and Inference
4 Experiments
4.1 Dataset and Implementation Details
4.2 Quantitative Comparisons
4.3 Qualitative Results
5 Conclusion, Limitation, and Future Work
References
SWPT: Spherical Window-Based Point Cloud Transformer
1 Introduction
2 Related Works
2.1 Projection-Based Networks
2.2 Voxel-Based Networks
2.3 Point-Based Networks
2.4 Transformer in NLP and Vision
3 Spherical-Window Point Transformer
3.1 Overview
3.2 Spherical Projection
3.3 Spherical Window Transformer
3.4 Crossing Windows Self-attention
4 Experiments
4.1 Shape Classification
4.2 Part Segmentation
4.3 Real-World Object Classification
4.4 Ablation Study
5 Conclusion
References
Self-supervised Learning with Multi-view Rendering for 3D Point Cloud Analysis
1 Introduction
2 Related Work
3 Self-supervised Learning for 3D Point Clouds
3.1 Learning Feature Representation for 2D Images
3.2 Knowledge Transfer from 2D to 3D
4 Experiments
4.1 Implementation Details
4.2 Object Classification
4.3 Network Analysis
4.4 Part Segmentation and Scene Segmentation
4.5 Pre-training with Synthetic vs. Real-World Data
5 Conclusion
References
PointFormer: A Dual Perception Attention-Based Network for Point Cloud Classification
1 Introduction
2 Related Work
3 Our Approach
3.1 Scalar and Vector Self-attention
3.2 Point Multiplicative Attention Mechanism
3.3 Local Attention Block
3.4 Global Attention Block
3.5 Framework of PointFormer
3.6 Graph-Multiscale Perceptual Field (GMPF) Testing Strategy
4 Experiments
4.1 Experimental Setup
4.2 Classification Results
4.3 PointFormer Design Analysis
5 Conclusion
References
Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis
1 Introduction
2 Related Work
3 Neural Deformable Voxel Grid
3.1 Overview
3.2 Deformation Module for Motion Modeling
3.3 Canonical Module for View Synthesis
3.4 Occlusion-Aware Volume Rendering
4 Model Optimization
5 Experiments
5.1 Dataset and Metrics
5.2 Implementation Details
5.3 Comparisons
5.4 Method Analysis
6 Conclusions
References
Spotlights: Probing Shapes from Spherical Viewpoints
1 Introduction
2 Related Work
3 Method
3.1 Background
3.2 Spotlights Model Construction
3.3 Analysis of Spotlights
4 Shape Completion with Spotlights
5 Experiments
5.1 Shape Completion on Synthetic Dataset
5.2 Shape Completion on Real Dataset
5.3 Ablation Study
5.4 Limitations
6 Conclusions
References
OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition
1 Introduction
2 Related Work
3 Methods
3.1 Optimal Viewset Construction Based on Information Entropy
3.2 Multi-view Low-level Local Feature Token Sequence Generation
3.3 Global Descriptor Generation Based on the Pooling Transformer
4 Experimental Results and Discussion
4.1 Dataset
4.2 Implementation Details
4.3 The Influence of the Number of Views
4.4 The Influence of Viewset Information Entropy Ranking
4.5 The Influence of the Pooling Transformer Block
4.6 Ablation Study
4.7 Model Complexity
4.8 Visual Analysis of Confusion Matrix
4.9 Comparisons with State-of-the-Art Methods
5 Conclusion
References
Structure Guided Proposal Completion for 3D Object Detection
1 Introduction
2 Related Work
2.1 Single-Stage Object Detectors
2.2 Two-Stage Object Detectors
2.3 Point Cloud Augmentation
3 Methodology
3.1 RPN for Proposals Generation
3.2 Structure Completion Module
3.3 RoI Feature Completion Module
3.4 Detection Head and Loss Function
4 Experiments
4.1 Experimental Setup
4.2 Comparison with State-of-the-Arts
4.3 Ablation Study
5 Conclusion
References
Optimization Methods
MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-task Learning
1 Introduction
2 Related Work
2.1 Multi-task Learning
2.2 Stochastic Gradient Descent
3 Effect of Gradient Noise
3.1 Preliminaries of SGD
3.2 Gradient Noise in STL
3.3 Inter-task Gradient Noise
4 Method
4.1 Gradient-to-Noise Ratio
4.2 Estimation of Expected Gradient
4.3 Weight Selection
5 Experiments
5.1 Experimental Settings
5.2 Baselines
5.3 Main Results
5.4 Dynamic Change of Weights
6 Discussion
6.1 Gradient Noise and Performance
6.2 The Paradox of Weight Design
7 Conclusion
References
Adaptive FSP: Adaptive Architecture Search with Filter Shape Pruning
1 Introduction
2 Related Work
3 Preliminaries
4 Proposed Method
4.1 Filter Shape Pruning (FSP)
4.2 Adaptive Architecture Search (AAS)
5 Experiments
5.1 Experimental Settings
5.2 Results on Cifar-10
5.3 Results on ImageNet
6 Ablation Study
6.1 Effect of Preserving the Receptive Field
6.2 Weight Parameters of the Metric: and
7 Conclusion
References
DecisioNet: A Binary-Tree Structured Neural Network
1 Introduction
2 Related Work
3 DecisioNet
3.1 Architecture
3.2 Classes Hierarchical Clustering
3.3 Routing
3.4 Loss Function
4 Experiments
4.1 FashionMNIST and CIFAR10
4.2 CIFAR100
5 Conclusion
References
An RNN-Based Framework for the MILP Problem in Robustness Verification of Neural Networks
1 Introduction
2 Preliminaries
2.1 Deep Neural Networks
2.2 Robustness Verification of DNNs
3 Semi-Planet Relaxation for MILP Problems
3.1 Formulating the Minimum Function in MILP Model
3.2 Relaxation and Tightening for MILP Constraints
4 Learning to Tighten
4.1 RNN-Based Framework
4.2 Recurrent Tightening Module
4.3 Training the RNN in RTM
5 Experiments
5.1 Effectiveness of RNN-Based Tightening Strategy
5.2 Performance on the OVAL Benchmark
5.3 Performance on the COLT Benchmark
6 Conclusion
References
Image Denoising Using Convolutional Sparse Coding Network with Dry Friction
1 Introduction
2 Related Work
2.1 Convolutional Sparse Coding Model
2.2 Convolutional Sparse Coding Network
2.3 Variance Regularization
3 Proposed CSCNet-DF Network
3.1 Iterative Shrinkage-Thresholding with Dry Friction Algorithm
3.2 Application ISTDFA to CSC Model
3.3 Convolutional Sparse Coding Network with Dry Friction
4 Experiments and Results
5 Conclusion
References
Neural Network Panning: Screening the Optimal Sparse Network Before Training
1 Introduction
2 Related Work
3 Methodology
3.1 Problem Definition
3.2 Transfer of Expressive Force
3.3 Artificial Panning
3.4 Panning Based on Reinforcement Learning
3.5 Selection of Metrics
4 Experiments
4.1 Experimental Setup
4.2 Performance of the Improved Metric
4.3 Experimental Results of Artificial Panning
4.4 Experimental Results of RLPanning
5 Conclusion
References
Network Pruning via Feature Shift Minimization
1 Introduction
2 Related Work
3 Methodology
3.1 Filter Pruning Analysis
3.2 Evaluating the Feature Shift
3.3 Filter Selection Strategy
3.4 Distribution Optimization
3.5 Pruning Procedure
4 Experiments
4.1 Implementation Details
4.2 Results and Analysis
5 Ablation Study
5.1 Effect of the Activation Function
5.2 Error of the Feature Shift Evaluation
5.3 Effect of the Evaluation Error
5.4 Effect of the Variance Adjustment Coefficients
6 Conclusions
References
Training Dynamics Aware Neural Network Optimization with Stabilization
1 Introduction
2 Neural Network Training Dynamics
2.1 Mini-Batch Training as a Dynamic System
2.2 Dynamics of Gradient Descent Optimizers
3 Stable Network Optimization
3.1 Liapunov Stability
3.2 Stable Network Training
3.3 Anchoring
3.4 Training Algorithm
4 Experiments
4.1 Results
5 Discussion
6 Conclusion
References
Neighborhood Region Smoothing Regularization for Finding Flat Minima in Deep Neural Networks
1 Introduction
2 Related Works
3 Neighborhood Region Smoothing (NRS) Regularization
3.1 Basic Background
3.2 Model Divergence
3.3 NRS Regularization
4 Experimental Results
4.1 Cifar10 and Cifar100
4.2 ImageNet
5 Further Studies of NRS
5.1 Parameter Selection in NRS
5.2 Eigenvalues of Hessian Matrix
6 Conclusion
References
Author Index

LNCS 13841

Lei Wang · Juergen Gall · Tat-Jun Chin · Imari Sato · Rama Chellappa (Eds.)

Computer Vision – ACCV 2022 16th Asian Conference on Computer Vision Macao, China, December 4–8, 2022 Proceedings, Part I

Lecture Notes in Computer Science
Founding Editors: Gerhard Goos, Juris Hartmanis

Editorial Board Members: Elisa Bertino, Purdue University, West Lafayette, IN, USA; Wen Gao, Peking University, Beijing, China; Bernhard Steffen, TU Dortmund University, Dortmund, Germany; Moti Yung, Columbia University, New York, NY, USA

13841

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R & D community, the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.

Lei Wang · Juergen Gall · Tat-Jun Chin · Imari Sato · Rama Chellappa Editors

Computer Vision – ACCV 2022 16th Asian Conference on Computer Vision Macao, China, December 4–8, 2022 Proceedings, Part I

Editors

Lei Wang, University of Wollongong, Wollongong, NSW, Australia
Juergen Gall, University of Bonn, Bonn, Germany
Tat-Jun Chin, University of Adelaide, Adelaide, SA, Australia
Imari Sato, National Institute of Informatics, Tokyo, Japan
Rama Chellappa, Johns Hopkins University, Baltimore, MD, USA

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-031-26318-7 ISBN 978-3-031-26319-4 (eBook) https://doi.org/10.1007/978-3-031-26319-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The 16th Asian Conference on Computer Vision (ACCV) 2022 was held in a hybrid mode in Macau SAR, China during December 4–8, 2022. The conference featured novel research contributions from almost all sub-areas of computer vision. For the main conference, 836 valid submissions entered the review stage after desk rejection. Sixty-three area chairs and 959 reviewers made great efforts to ensure that every submission received thorough and high-quality reviews. As in previous editions of ACCV, this conference adopted a double-blind review process. The identities of authors were not visible to the reviewers or area chairs; nor were the identities of the assigned reviewers and area chairs known to the authors. The program chairs did not submit papers to the conference. After receiving the reviews, the authors had the option of submitting a rebuttal. Following that, the area chairs led the discussions and final recommendations were then made by the reviewers. Taking conflicts of interest into account, the area chairs formed 21 AC triplets to finalize the paper recommendations. With the confirmation of three area chairs for each paper, 277 papers were accepted. ACCV 2022 also included eight workshops, eight tutorials, and one grand challenge, covering various cutting-edge research topics related to computer vision. The proceedings of ACCV 2022 are open access at the Computer Vision Foundation website, by courtesy of Springer. The quality of the papers presented at ACCV 2022 demonstrates the research excellence of the international computer vision communities. This conference is fortunate to receive support from many organizations and individuals. We would like to express our gratitude for the continued support of the Asian Federation of Computer Vision and our sponsors, the University of Macau, Springer, the Artificial Intelligence Journal, and OPPO. ACCV 2022 used the Conference Management Toolkit sponsored by Microsoft Research and received much help from its support team. All the organizers, area chairs, reviewers, and authors made great contributions to ensure a successful ACCV 2022. For this, we owe them deep gratitude. Last but not least, we would like to thank the online and in-person attendees of ACCV 2022. Their presence showed strong commitment and appreciation towards this conference. December 2022

Lei Wang Juergen Gall Tat-Jun Chin Imari Sato Rama Chellappa

Organization

General Chairs

Gérard Medioni, University of Southern California, USA
Shiguang Shan, Chinese Academy of Sciences, China
Bohyung Han, Seoul National University, South Korea
Hongdong Li, Australian National University, Australia

Program Chairs

Rama Chellappa, Johns Hopkins University, USA
Juergen Gall, University of Bonn, Germany
Imari Sato, National Institute of Informatics, Japan
Tat-Jun Chin, University of Adelaide, Australia
Lei Wang, University of Wollongong, Australia

Publication Chairs

Wenbin Li, Nanjing University, China
Wanqi Yang, Nanjing Normal University, China

Local Arrangements Chairs

Liming Zhang, University of Macau, China
Jianjia Zhang, Sun Yat-sen University, China

Web Chairs

Zongyuan Ge, Monash University, Australia
Deval Mehta, Monash University, Australia
Zhongyan Zhang, University of Wollongong, Australia


AC Meeting Chair

Chee Seng Chan, University of Malaya, Malaysia

Area Chairs

Aljosa Osep, Technical University of Munich, Germany
Angela Yao, National University of Singapore, Singapore
Anh T. Tran, VinAI Research, Vietnam
Anurag Mittal, Indian Institute of Technology Madras, India
Binh-Son Hua, VinAI Research, Vietnam
C. V. Jawahar, International Institute of Information Technology, Hyderabad, India
Dan Xu, The Hong Kong University of Science and Technology, China
Du Tran, Meta AI, USA
Frederic Jurie, University of Caen and Safran, France
Guangcan Liu, Southeast University, China
Guorong Li, University of Chinese Academy of Sciences, China
Guosheng Lin, Nanyang Technological University, Singapore
Gustavo Carneiro, University of Surrey, UK
Hyun Soo Park, University of Minnesota, USA
Hyunjung Shim, Korea Advanced Institute of Science and Technology, South Korea
Jiaying Liu, Peking University, China
Jun Zhou, Griffith University, Australia
Junseok Kwon, Chung-Ang University, South Korea
Kota Yamaguchi, CyberAgent, Japan
Li Liu, National University of Defense Technology, China
Liang Zheng, Australian National University, Australia
Mathieu Aubry, Ecole des Ponts ParisTech, France
Mehrtash Harandi, Monash University, Australia
Miaomiao Liu, Australian National University, Australia
Ming-Hsuan Yang, University of California at Merced, USA
Palaiahnakote Shivakumara, University of Malaya, Malaysia
Pau-Choo Chung, National Cheng Kung University, Taiwan
Qianqian Xu, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China
Qiuhong Ke, Monash University, Australia
Radu Timofte, University of Würzburg, Germany and ETH Zurich, Switzerland
Rajagopalan N. Ambasamudram, Indian Institute of Technology Madras, India
Risheng Liu, Dalian University of Technology, China
Ruiping Wang, Institute of Computing Technology, Chinese Academy of Sciences, China
Sajid Javed, Khalifa University of Science and Technology, Abu Dhabi, UAE
Seunghoon Hong, Korea Advanced Institute of Science and Technology, South Korea
Shang-Hong Lai, National Tsing Hua University, Taiwan
Shanshan Zhang, Nanjing University of Science and Technology, China
Sharon Xiaolei Huang, Pennsylvania State University, USA
Shin'ichi Satoh, National Institute of Informatics, Japan
Si Liu, Beihang University, China
Suha Kwak, Pohang University of Science and Technology, South Korea
Tae Hyun Kim, Hanyang University, South Korea
Takayuki Okatani, Tohoku University, Japan / RIKEN Center for Advanced Intelligence Project, Japan
Tatsuya Harada, University of Tokyo/RIKEN, Japan
Vicky Kalogeiton, Ecole Polytechnique, France
Vincent Lepetit, Ecole des Ponts ParisTech, France
Vineeth N. Balasubramanian, Indian Institute of Technology, Hyderabad, India
Wei Shen, Shanghai Jiao Tong University, China
Wei-Shi Zheng, Sun Yat-sen University, China
Xiang Bai, Huazhong University of Science and Technology, China
Xiaowei Zhou, Zhejiang University, China
Xin Yu, University of Technology Sydney, Australia
Yasutaka Furukawa, Simon Fraser University, Canada
Yasuyuki Matsushita, Osaka University, Japan
Yedid Hoshen, Hebrew University of Jerusalem, Israel
Ying Fu, Beijing Institute of Technology, China
Yong Jae Lee, University of Wisconsin-Madison, USA
Yu-Chiang Frank Wang, National Taiwan University, Taiwan
Yumin Suh, NEC Laboratories America, USA
Yung-Yu Chuang, National Taiwan University, Taiwan
Zhaoxiang Zhang, Chinese Academy of Sciences, China
Ziad Al-Halah, University of Texas at Austin, USA
Zuzana Kukelova, Czech Technical University, Czech Republic

Additional Reviewers Abanob E. N. Soliman Abdelbadie Belmouhcine Adrian Barbu Agnibh Dasgupta Akihiro Sugimoto Akkarit Sangpetch Akrem Sellami Aleksandr Kim Alexander Andreopoulos Alexander Fix Alexander Kugele Alexandre Morgand Alexis Lechervy Alina E. Marcu Alper Yilmaz Alvaro Parra Amogh Subbakrishna Adishesha Andrea Giachetti Andrea Lagorio Andreu Girbau Xalabarder Andrey Kuehlkamp Anh Nguyen Anh T. Tran Ankush Gupta Anoop Cherian Anton Mitrokhin Antonio Agudo Antonio Robles-Kelly Ara Abigail Ambita Ardhendu Behera Arjan Kuijper Arren Matthew C. Antioquia Arjun Ashok Atsushi Hashimoto

Atsushi Shimada Attila Szabo Aurelie Bugeau Avatharam Ganivada Ayan Kumar Bhunia Azade Farshad B. V. K. Vijaya Kumar Bach Tran Bailin Yang Baojiang Zhong Baoquan Zhang Baoyao Yang Basit O. Alawode Beibei Lin Benoit Guillard Beomgu Kang Bin He Bin Li Bin Liu Bin Ren Bin Yang Bin-Cheng Yang BingLiang Jiao Bo Liu Bohan Li Boyao Zhou Boyu Wang Caoyun Fan Carlo Tomasi Carlos Torres Carvalho Micael Cees Snoek Chang Kong Changick Kim Changkun Ye Changsheng Lu

Chao Liu Chao Shi Chaowei Tan Chaoyi Li Chaoyu Dong Chaoyu Zhao Chen He Chen Liu Chen Yang Chen Zhang Cheng Deng Cheng Guo Cheng Yu Cheng-Kun Yang Chenglong Li Chengmei Yang Chengxin Liu Chengyao Qian Chen-Kuo Chiang Chenxu Luo Che-Rung Lee Che-Tsung Lin Chi Xu Chi Nhan Duong Chia-Ching Lin Chien-Cheng Lee Chien-Yi Wang Chih-Chung Hsu Chih-Wei Lin Ching-Chun Huang Chiou-Ting Hsu Chippy M. Manu Chong Wang Chongyang Wang Christian Siagian Christine Allen-Blanchette


Christoph Schorn Christos Matsoukas Chuan Guo Chuang Yang Chuanyi Zhang Chunfeng Song Chunhui Zhang Chun-Rong Huang Ci Lin Ci-Siang Lin Cong Fang Cui Wang Cui Yuan Cyrill Stachniss Dahai Yu Daiki Ikami Daisuke Miyazaki Dandan Zhu Daniel Barath Daniel Lichy Daniel Reich Danyang Tu David Picard Davide Silvestri Defang Chen Dehuan Zhang Deunsol Jung Difei Gao Dim P. Papadopoulos Ding-Jie Chen Dong Gong Dong Hao Dong Wook Shu Dongdong Chen Donghun Lee Donghyeon Kwon Donghyun Yoo Dongkeun Kim Dongliang Luo Dongseob Kim Dongsuk Kim Dongwan Kim Dongwon Kim DongWook Yang Dongze Lian

Dubing Chen Edoardo Remelli Emanuele Trucco Erhan Gundogdu Erh-Chung Chen Rickson R. Nascimento Erkang Chen Eunbyung Park Eunpil Park Eun-Sol Kim Fabio Cuzzolin Fan Yang Fan Zhang Fangyu Zhou Fani Deligianni Fatemeh Karimi Nejadasl Fei Liu Feiyue Ni Feng Su Feng Xue Fengchao Xiong Fengji Ma Fernando Díaz-del-Rio Florian Bernard Florian Kleber Florin-Alexandru Vasluianu Fok Hing Chi Tivive Frank Neumann Fu-En Yang Fumio Okura Gang Chen Gang Liu Gao Haoyuan Gaoshuai Wang Gaoyun An Gen Li Georgy Ponimatkin Gianfranco Doretto Gil Levi Guang Yang Guangfa Wang Guangfeng Lin Guillaume Jeanneret Guisik Kim

Gunhee Kim Guodong Wang Ha Young Kim Hadi Mohaghegh Dolatabadi Haibo Ye Haili Ye Haithem Boussaid Haixia Wang Han Chen Han Zou Hang Cheng Hang Du Hang Guo Hanlin Gu Hannah H. Kim Hao He Hao Huang Hao Quan Hao Ren Hao Tang Hao Zeng Hao Zhao Haoji Hu Haopeng Li Haoqing Wang Haoran Wen Haoshuo Huang Haotian Liu Haozhao Ma Hari Chandana K. Haripriya Harikumar Hehe Fan Helder Araujo Henok Ghebrechristos Heunseung Lim Hezhi Cao Hideo Saito Hieu Le Hiroaki Santo Hirokatsu Kataoka Hiroshi Omori Hitika Tiwari Hojung Lee Hong Cheng


Hong Liu Hu Zhang Huadong Tang Huajie Jiang Huang Ziqi Huangying Zhan Hui Kong Hui Nie Huiyu Duan Huyen Thi Thanh Tran Hyung-Jeong Yang Hyunjin Park Hyunsoo Kim HyunWook Park I-Chao Shen Idil Esen Zulfikar Ikuhisa Mitsugami Inseop Chung Ioannis Pavlidis Isinsu Katircioglu Jaeil Kim Jaeyoon Park Jae-Young Sim James Clark James Elder James Pritts Jan Zdenek Janghoon Choi Jeany Son Jenny Seidenschwarz Jesse Scott Jia Wan Jiadai Sun JiaHuan Ji Jiajiong Cao Jian Zhang Jianbo Jiao Jianhui Wu Jianjia Wang Jianjia Zhang Jianqiao Wangni JiaQi Wang Jiaqin Lin Jiarui Liu Jiawei Wang

Jiaxin Gu Jiaxin Wei Jiaxin Zhang Jiaying Zhang Jiayu Yang Jidong Tian Jie Hong Jie Lin Jie Liu Jie Song Jie Yang Jiebo Luo Jiejie Xu Jin Fang Jin Gao Jin Tian Jinbin Bai Jing Bai Jing Huo Jing Tian Jing Wu Jing Zhang Jingchen Xu Jingchun Cheng Jingjing Fu Jingshuai Liu JingWei Huang Jingzhou Chen JinHan Cui Jinjie Song Jinqiao Wang Jinsun Park Jinwoo Kim Jinyu Chen Jipeng Qiang Jiri Sedlar Jiseob Kim Jiuxiang Gu Jiwei Xiao Jiyang Zheng Jiyoung Lee John Paisley Joonki Paik Joonseok Lee Julien Mille

Julio C. Zamora Jun Sato Jun Tan Jun Tang Jun Xiao Jun Xu Junbao Zhuo Jun-Cheng Chen Junfen Chen Jungeun Kim Junhwa Hur Junli Tao Junlin Han Junsik Kim Junting Dong Junwei Zhou Junyu Gao Kai Han Kai Huang Kai Katsumata Kai Zhao Kailun Yang Kai-Po Chang Kaixiang Wang Kamal Nasrollahi Kamil Kowol Kan Chang Kang-Jun Liu Kanchana Vaishnavi Gandikota Kanoksak Wattanachote Karan Sikka Kaushik Roy Ke Xian Keiji Yanai Kha Gia Quach Kibok Lee Kira Maag Kirill Gavrilyuk Kohei Suenaga Koichi Ito Komei Sugiura Kong Dehui Konstantinos Batsos Kotaro Kikuchi


Kouzou Ohara Kuan-Wen Chen Kun He Kun Hu Kun Zhan Kunhee Kim Kwan-Yee K. Wong Kyong Hwan Jin Kyuhong Shim Kyung Ho Park Kyungmin Kim Kyungsu Lee Lam Phan Lanlan Liu Le Hui Lei Ke Lei Qi Lei Yang Lei Yu Lei Zhu Leila Mahmoodi Li Jiao Li Su Lianyu Hu Licheng Jiao Lichi Zhang Lihong Zheng Lijun Zhao Like Xin Lin Gu Lin Xuhong Lincheng Li Linghua Tang Lingzhi Kong Linlin Yang Linsen Li Litao Yu Liu Liu Liujie Hua Li-Yun Wang Loren Schwiebert Lujia Jin Lujun Li Luping Zhou Luting Wang

Mansi Sharma Mantini Pranav Mahmoud Zidan Khairallah Manuel Günther Marcella Astrid Marco Piccirilli Martin Kampel Marwan Torki Masaaki Iiyama Masanori Suganuma Masayuki Tanaka Matan Jacoby Md Alimoor Reza Md. Zasim Uddin Meghshyam Prasad Mei-Chen Yeh Meng Tang Mengde Xu Mengyang Pu Mevan B. Ekanayake Michael Bi Mi Michael Wray Michaël Clément Michel Antunes Michele Sasdelli Mikhail Sizintsev Min Peng Min Zhang Minchul Shin Minesh Mathew Ming Li Ming Meng Ming Yin Ming-Ching Chang Mingfei Cheng Minghui Wang Mingjun Hu MingKun Yang Mingxing Tan Mingzhi Yuan Min-Hung Chen Minhyun Lee Minjung Kim Min-Kook Suh

Minkyo Seo Minyi Zhao Mo Zhou Mohammad Amin A. Shabani Moein Sorkhei Mohit Agarwal Monish K. Keswani Muhammad Sarmad Muhammad Kashif Ali Myung-Woo Woo Naeemullah Khan Naman Solanki Namyup Kim Nan Gao Nan Xue Naoki Chiba Naoto Inoue Naresh P. Cuntoor Nati Daniel Neelanjan Bhowmik Niaz Ahmad Nicholas I. Kuo Nicholas E. Rosa Nicola Fioraio Nicolas Dufour Nicolas Papadakis Ning Liu Nishan Khatri Ole Johannsen P. Real Jurado Parikshit V. Sakurikar Patrick Peursum Pavan Turaga Peijie Chen Peizhi Yan Peng Wang Pengfei Fang Penghui Du Pengpeng Liu Phi Le Nguyen Philippe Chiberre Pierre Gleize Pinaki Nath Chowdhury Ping Hu


Ping Li Ping Zhao Pingping Zhang Pradyumna Narayana Pritish Sahu Qi Li Qi Wang Qi Zhang Qian Li Qian Wang Qiang Fu Qiang Wu Qiangxi Zhu Qianying Liu Qiaosi Yi Qier Meng Qin Liu Qing Liu Qing Wang Qingheng Zhang Qingjie Liu Qinglin Liu Qingsen Yan Qingwei Tang Qingyao Wu Qingzheng Wang Qizao Wang Quang Hieu Pham Rabab Abdelfattah Rabab Ward Radu Tudor Ionescu Rahul Mitra Raül Pérez i Gonzalo Raymond A. Yeh Ren Li Renán Rojas-Gómez Renjie Wan Renuka Sharma Reyer Zwiggelaar Robin Chan Robin Courant Rohit Saluja Rongkai Ma Ronny Hänsch Rui Liu

Rui Wang Rui Zhu Ruibing Hou Ruikui Wang Ruiqi Zhao Ruixing Wang Ryo Furukawa Ryusuke Sagawa Saimunur Rahman Samet Akcay Samitha Herath Sanath Narayan Sandesh Kamath Sanghoon Jeon Sanghyun Son Satoshi Suzuki Saumik Bhattacharya Sauradip Nag Scott Wehrwein Sebastien Lefevre Sehyun Hwang Seiya Ito Selen Pehlivan Sena Kiciroglu Seok Bong Yoo Seokjun Park Seongwoong Cho Seoungyoon Kang Seth Nixon Seunghwan Lee Seung-Ik Lee Seungyong Lee Shaifali Parashar Shan Cao Shan Zhang Shangfei Wang Shaojian Qiu Shaoru Wang Shao-Yuan Lo Shengjin Wang Shengqi Huang Shenjian Gong Shi Qiu Shiguang Liu Shih-Yao Lin

Shin-Jye Lee Shishi Qiao Shivam Chandhok Shohei Nobuhara Shreya Ghosh Shuai Yuan Shuang Yang Shuangping Huang Shuigeng Zhou Shuiwang Li Shunli Zhang Shuo Gu Shuoxin Lin Shuzhi Yu Sida Peng Siddhartha Chandra Simon S. Woo Siwei Wang Sixiang Chen Siyu Xia Sohyun Lee Song Guo Soochahn Lee Soumava Kumar Roy Srinjay Soumitra Sarkar Stanislav Pidhorskyi Stefan Gumhold Stefan Matcovici Stefano Berretti Stylianos Moschoglou Sudhir Yarram Sudong Cai Suho Yang Sumitra S. Malagi Sungeun Hong Sunggu Lee Sunghyun Cho Sunghyun Myung Sungmin Cho Sungyeon Kim Suzhen Wang Sven Sickert Syed Zulqarnain Gilani Tackgeun You Taehun Kim


Takao Yamanaka Takashi Shibata Takayoshi Yamashita Takeshi Endo Takeshi Ikenaga Tanvir Alam Tao Hong Tarun Kalluri Tat-Jen Cham Tatsuya Yatagawa Teck Yian Lim Tejas Indulal Dhamecha Tengfei Shi Thanh-Dat Truong Thomas Probst Thuan Hoang Nguyen Tian Ye Tianlei Jin Tianwei Cao Tianyi Shi Tianyu Song Tianyu Wang Tien-Ju Yang Tingting Fang Tobias Baumgartner Toby P. Breckon Torsten Sattler Trung Tuan Dao Trung Le Tsung-Hsuan Wu Tuan-Anh Vu Utkarsh Ojha Utku Ozbulak Vaasudev Narayanan Venkata Siva Kumar Margapuri Vandit J. Gajjar Vi Thi Tuong Vo Victor Fragoso Vikas Desai Vincent Lepetit Vinh Tran Viresh Ranjan Wai-Kin Adams Kong Wallace Michel Pinto Lira

Walter Liao Wang Yan Wang Yong Wataru Shimoda Wei Feng Wei Mao Wei Xu Weibo Liu Weichen Xu Weide Liu Weidong Chen Weihong Deng Wei-Jong Yang Weikai Chen Weishi Zhang Weiwei Fang Weixin Lu Weixin Luo Weiyao Wang Wenbin Wang Wenguan Wang Wenhan Luo Wenju Wang Wenlei Liu Wenqing Chen Wenwen Yu Wenxing Bao Wenyu Liu Wenzhao Zheng Whie Jung Williem Williem Won Hwa Kim Woohwan Jung Wu Yirui Wu Yufeng Wu Yunjie Wugen Zhou Wujie Sun Wuman Luo Xi Wang Xianfang Sun Xiang Chen Xiang Li Xiangbo Shu Xiangcheng Liu

Xiangyu Wang Xiao Wang Xiao Yan Xiaobing Wang Xiaodong Wang Xiaofeng Wang Xiaofeng Yang Xiaogang Xu Xiaogen Zhou Xiaohan Yu Xiaoheng Jiang Xiaohua Huang Xiaoke Shen Xiaolong Liu Xiaoqin Zhang Xiaoqing Liu Xiaosong Wang Xiaowen Ma Xiaoyi Zhang Xiaoyu Wu Xieyuanli Chen Xin Chen Xin Jin Xin Wang Xin Zhao Xindong Zhang Xingjian He Xingqun Qi Xinjie Li Xinqi Fan Xinwei He Xinyan Liu Xinyu He Xinyue Zhang Xiyuan Hu Xu Cao Xu Jia Xu Yang Xuan Luo Xubo Yang Xudong Lin Xudong Xie Xuefeng Liang Xuehui Wang Xuequan Lu


Xuesong Yang Xueyan Zou XuHu Lin Xun Zhou Xupeng Wang Yali Zhang Ya-Li Li Yalin Zheng Yan Di Yan Luo Yan Xu Yang Cao Yang Hu Yang Song Yang Zhang Yang Zhao Yangyang Shu Yani A. Ioannou Yaniv Nemcovsky Yanjun Zhu Yanling Hao Yanling Tian Yao Guo Yao Lu Yao Zhou Yaping Zhao Yasser Benigmim Yasunori Ishii Yasushi Yagi Yawei Li Ye Ding Ye Zhu Yeongnam Chae Yeying Jin Yi Cao Yi Liu Yi Rong Yi Tang Yi Wei Yi Xu Yichun Shi Yifan Zhang Yikai Wang Yikang Ding Yiming Liu

Yiming Qian Yin Li Yinghuan Shi Yingjian Li Yingkun Xu Yingshu Chen Yingwei Pan Yiping Tang Yiqing Shen Yisheng Zhu Yitian Li Yizhou Yu Yoichi Sato Yong A. Yongcai Wang Yongheng Ren Yonghuai Liu Yongjun Zhang Yongkang Luo Yongkang Wong Yongpei Zhu Yongqiang Zhang Yongrui Ma Yoshimitsu Aoki Yoshinori Konishi Young Jun Heo Young Min Shin Youngmoon Lee Youpeng Zhao Yu Ding Yu Feng Yu Zhang Yuanbin Wang Yuang Wang Yuanhong Chen Yuanyuan Qiao Yucong Shen Yuda Song Yue Huang Yufan Liu Yuguang Yan Yuhan Xie Yu-Hsuan Chen Yu-Hui Wen Yujiao Shi

Yujin Ren Yuki Tatsunami Yukuan Jia Yukun Su Yu-Lun Liu Yun Liu Yunan Liu Yunce Zhao Yun-Chun Chen Yunhao Li Yunlong Liu Yunlong Meng Yunlu Chen Yunqian He Yunzhong Hou Yuqiu Kong Yusuke Hosoya Yusuke Matsui Yusuke Morishita Yusuke Sugano Yuta Kudo Yu-Ting Wu Yutong Dai Yuxi Hu Yuxi Yang Yuxuan Li Yuxuan Zhang Yuzhen Lin Yuzhi Zhao Yvain Queau Zanwei Zhou Zebin Guo Ze-Feng Gao Zejia Fan Zekun Yang Zelin Peng Zelong Zeng Zenglin Xu Zewei Wu Zhan Li Zhan Shi Zhe Li Zhe Liu Zhe Zhang Zhedong Zheng


Zhenbo Xu Zheng Gu Zhenhua Tang Zhenkun Wang Zhenyu Weng Zhi Zeng Zhiguo Cao Zhijie Rao Zhijie Wang Zhijun Zhang Zhimin Gao Zhipeng Yu Zhiqiang Hu Zhisong Liu Zhiwei Hong Zhiwei Xu

Zhiwu Lu Zhixiang Wang Zhixin Li Zhiyong Dai Zhiyong Huang Zhiyuan Zhang Zhonghua Wu Zhongyan Zhang Zhongzheng Yuan Zhu Hu Zhu Meng Zhujun Li Zhulun Yang Zhuojun Zou Ziang Cheng Zichuan Liu

Zihan Ding Zihao Zhang Zijiang Song Zijin Yin Ziqiang Zheng Zitian Wang Ziwei Yao Zixun Zhang Ziyang Luo Ziyi Bai Ziyi Wang Zongheng Tang Zongsheng Cao Zongwei Wu Zoran Duric


Contents – Part I

3D Computer Vision

EAI-Stereo: Error Aware Iterative Network for Stereo Matching . . . . . 3
Haoliang Zhao, Huizhou Zhou, Yongjun Zhang, Yong Zhao, Yitong Yang, and Ting Ouyang

Temporal-Aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking . . . . . 20
Kaihao Lan, Haobo Jiang, and Jin Xie

Neural Plenoptic Sampling: Learning Light-Field from Thousands of Imaginary Eyes . . . . . 36
Junxuan Li, Yujiao Shi, and Hongdong Li

3D-Yoga: A 3D Yoga Dataset for Visual-Based Hierarchical Sports Action Analysis . . . . . 55
Jianwei Li, Haiqing Hu, Jinyang Li, and Xiaomei Zhao

NEO-3DF: Novel Editing-Oriented 3D Face Creation and Reconstruction . . . . . 72
Peizhi Yan, James Gregson, Qiang Tang, Rabab Ward, Zhan Xu, and Shan Du

LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing . . . . . 89
Hanxi Yin, Lei Deng, Zhixiang Chen, Baohua Chen, Ting Sun, Yuseng Xie, Junwei Xiao, Yeyu Fu, Shuixin Deng, and Xiu Li

Point Cloud Upsampling via Cascaded Refinement Network . . . . . 106
Hang Du, Xuejun Yan, Jingjing Wang, Di Xie, and Shiliang Pu

CVLNet: Cross-view Semantic Correspondence Learning for Video-Based Camera Localization . . . . . 123
Yujiao Shi, Xin Yu, Shan Wang, and Hongdong Li

Vectorizing Building Blueprints . . . . . 142
Weilian Song, Mahsa Maleki Abyaneh, Mohammad Amin Shabani, and Yasutaka Furukawa

Unsupervised 3D Shape Representation Learning Using Normalizing Flow . . . . . 158
Xiang Li, Congcong Wen, and Hao Huang

Learning Inter-superpoint Affinity for Weakly Supervised 3D Instance Segmentation . . . . . 176
Linghua Tang, Le Hui, and Jin Xie

Cross-View Self-fusion for Self-supervised 3D Human Pose Estimation in the Wild . . . . . 193
Hyun-Woo Kim, Gun-Hee Lee, Myeong-Seok Oh, and Seong-Whan Lee

3D-C2FT: Coarse-to-Fine Transformer for Multi-view 3D Reconstruction . . . . . 211
Leslie Ching Ow Tiong, Dick Sigmund, and Andrew Beng Jin Teoh

SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis . . . . . 228
Xingyi Li, Chaoyi Hong, Yiran Wang, Zhiguo Cao, Ke Xian, and Guosheng Lin

Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection . . . . . 245
Shuaihang Yuan, Xiang Li, Hao Huang, and Yi Fang

ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection . . . . . 262
Chenguang Lu, Kang Yue, and Yue Liu

Adaptive Range Guided Multi-view Depth Estimation with Normal Ranking Loss . . . . . 280
Yikang Ding, Zhenyang Li, Dihe Huang, Kai Zhang, Zhiheng Li, and Wensen Feng

Training-Free NAS for 3D Point Cloud Processing . . . . . 296
Ping Zhao, Panyue Chen, and Guanming Liu

Re-parameterization Making GC-Net-Style 3DConvNets More Efficient . . . . . 311
Takeshi Endo, Seigo Kaji, Haruki Matono, Masayuki Takemura, and Takeshi Shima

PU-Transformer: Point Cloud Upsampling Transformer . . . . . 326
Shi Qiu, Saeed Anwar, and Nick Barnes

DIG: Draping Implicit Garment over the Human Body . . . . . 344
Ren Li, Benoît Guillard, Edoardo Remelli, and Pascal Fua

Pyramidal Signed Distance Learning for Spatio-Temporal Human Shape Completion . . . . . 360
Boyao Zhou, Jean-Sébastien Franco, Martin de La Gorce, and Edmond Boyer

Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image . . . . . 378
Alakh Aggarwal, Jikai Wang, Steven Hogue, Saifeng Ni, Madhukar Budagavi, and Xiaohu Guo

SWPT: Spherical Window-Based Point Cloud Transformer . . . . . 396
Xindong Guo, Yu Sun, Rong Zhao, Liqun Kuang, and Xie Han

Self-supervised Learning with Multi-view Rendering for 3D Point Cloud Analysis . . . . . 413
Bach Tran, Binh-Son Hua, Anh Tuan Tran, and Minh Hoai

PointFormer: A Dual Perception Attention-Based Network for Point Cloud Classification . . . . . 432
Yijun Chen, Zhulun Yang, Xianwei Zheng, Yadong Chang, and Xutao Li

Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis . . . . . 450
Xiang Guo, Guanying Chen, Yuchao Dai, Xiaoqing Ye, Jiadai Sun, Xiao Tan, and Errui Ding

Spotlights: Probing Shapes from Spherical Viewpoints . . . . . 469
Jiaxin Wei, Lige Liu, Ran Cheng, Wenqing Jiang, Minghao Xu, Xinyu Jiang, Tao Sun, Sören Schwertfeger, and Laurent Kneip

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition . . . . . 486
Wenju Wang, Gang Chen, Haoran Zhou, and Xiaolin Wang

Structure Guided Proposal Completion for 3D Object Detection . . . . . 504
Chao Shi, Chongyang Zhang, and Yan Luo

Optimization Methods

MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-task Learning . . . . . 523
Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, and Yaohui Jin

Adaptive FSP: Adaptive Architecture Search with Filter Shape Pruning . . . . . 539
Aeri Kim, Seungju Lee, Eunji Kwon, and Seokhyeong Kang

DecisioNet: A Binary-Tree Structured Neural Network . . . . . 556
Noam Gottlieb and Michael Werman

An RNN-Based Framework for the MILP Problem in Robustness Verification of Neural Networks . . . . . 571
Hao Xue, Xia Zeng, Wang Lin, Zhengfeng Yang, Chao Peng, and Zhenbing Zeng

Image Denoising Using Convolutional Sparse Coding Network with Dry Friction . . . . . 587
Yali Zhang, Xiaofan Wang, Fengpin Wang, and Jinjia Wang

Neural Network Panning: Screening the Optimal Sparse Network Before Training . . . . . 602
Xiatao Kang, Ping Li, Jiayi Yao, and Chengxi Li

Network Pruning via Feature Shift Minimization . . . . . 618
Yuanzhi Duan, Yue Zhou, Peng He, Qiang Liu, Shukai Duan, and Xiaofang Hu

Training Dynamics Aware Neural Network Optimization with Stabilization . . . . . 635
Zilin Fang, Mohamad Shahbazi, Thomas Probst, Danda Pani Paudel, and Luc Van Gool

Neighborhood Region Smoothing Regularization for Finding Flat Minima in Deep Neural Networks . . . . . 652
Yang Zhao and Hao Zhang

Author Index . . . . . 667

3D Computer Vision

EAI-Stereo: Error Aware Iterative Network for Stereo Matching

Haoliang Zhao (1,4), Huizhou Zhou (2,4), Yongjun Zhang (1, corresponding author), Yong Zhao (1,3,4), Yitong Yang (1), and Ting Ouyang (1)

1 State Key Laboratory of Public Big Data, Institute for Artificial Intelligence, College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China ([email protected])
2 School of Physics and Optoelectronic Engineering, Guangdong University of Technology, Guangzhou 510006, China
3 The Key Laboratory of Integrated Microsystems, Shenzhen Graduate School, Peking University, Beijing, China
4 Ghost-Valley AI Technology, Shenzhen, Guangdong, China

H. Zhao and H. Zhou contributed equally to this work.

Abstract. Current state-of-the-art stereo algorithms use a 2D CNN to extract features and then form a cost volume, which is fed into a subsequent cost aggregation and regularization module composed of 2D or 3D CNNs. However, a large amount of high-frequency information, such as texture, color variation, and sharp edges, is not well exploited during this process, which leads to relatively blurry disparity maps that lack fine detail. In this paper, we aim to make full use of the high-frequency information from the original image. Towards this end, we propose an error-aware refinement module that incorporates high-frequency information from the original left image and allows the network to learn error correction capabilities, producing excellent subtle details and sharp edges. To improve the data transfer efficiency between iterations, we propose the Iterative Multiscale Wide-LSTM Network, which carries more semantic information across iterations. We demonstrate the efficiency and effectiveness of our method on KITTI 2015, Middlebury, and ETH3D. At the time of writing, EAI-Stereo ranks 1st on the Middlebury leaderboard and 1st on the ETH3D Stereo benchmark for the 50% quantile metric and 2nd for the 0.5 px error rate among all published methods. Our model performs well in cross-domain scenarios and outperforms current methods specifically designed for generalization. Code is available at https://github.com/David-Zhao-1997/EAI-Stereo.

1 Introduction

Stereo matching is a fundamental problem in computer vision with direct real-world applications in robotics, 3D reconstruction, augmented reality, and autonomous driving. The task is to estimate pixel-wise correspondences of an image pair and generate a displacement map, termed disparity, which can be converted to depth using the parameters of the stereo camera system.
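
As a quick reference, for a calibrated and rectified stereo pair the disparity-to-depth conversion is depth = f * B / d, where f is the focal length in pixels and B is the baseline. The following minimal sketch (generic NumPy code with illustrative names, not taken from the EAI-Stereo repository) makes this concrete:

import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """depth = f * B / d, valid only where disparity > 0."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = focal_px * baseline_m / np.maximum(disparity, eps)
    depth[disparity <= 0] = 0.0  # mark invalid or occluded pixels
    return depth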

Generally, traditional stereo matching algorithms [9,12,13] perform four steps: matching cost computation, cost aggregation, disparity computation, and refinement [29]. These algorithms can be classified into global and local methods. Global methods [8,19,20,33] solve the problem by minimizing a global energy function [5], trading time for accuracy, while local methods [1,15] only make use of neighboring pixels and usually run faster [44]. However, in open-world environments it is difficult for traditional methods to achieve satisfactory results in textureless regions and regions with repetitive patterns, and traditional high-precision algorithms are often limited in terms of computational speed.

Recently, with continued research on convolutional neural networks, learning-based methods have become widely used in binocular stereo matching. Compared with traditional methods, learning-based methods tend to produce more accurate and smoother [23,35] disparity maps, and some of them also have advantages in computational speed [35,41]. However, to apply these algorithms in real scenarios, several challenges remain.

One challenge is that current methods do not perform well in recovering thin objects and sharp edges. Most current algorithms use a 2D CNN to extract features and then form a cost volume, which is fed into the following cost aggregation and regularization module composed of 2D or 3D CNNs. During this process, a large amount of high-frequency information is ignored, which leads to relatively blurry disparity maps lacking fine detail. Yet stereo vision is often used in areas such as navigation, where it is important to recognize thin objects such as wires and highly reflective surfaces such as glass. Another challenge is that current state-of-the-art stereo methods [23,36,38] use an iterative structure based on a stock GRU, which we found to be a bottleneck for iterative models designed for stereo matching; a more efficient iterative structure is needed for further performance improvements. The other key issue is that learning-based algorithms often do not perform as well in real-world scenarios as they do on specific datasets, due to their limited generalization capabilities [39].

In this work, we propose EAI-Stereo (Error Aware Iterative Stereo), a new end-to-end data-driven method for stereo matching. The major contributions of this paper can be summarized as follows:

1. We propose an error-aware refinement module that combines left-right warping with learning-based upsampling. By incorporating the original left image, which contains more high-frequency information, and explicitly calculating error maps, our refinement module enables the network to better cope with overexposure, underexposure, and weak textures, and allows the network to learn error correction capabilities, which lets EAI-Stereo produce fine details and sharp edges. The learning-based upsampling method in the module provides more refined upsampling results than bilinear interpolation. We have carefully studied the impact of the module's microstructure on performance. From our experiments, we find that the structure improves generalization ability while improving performance. This approach is highly general and can be applied to all models that produce disparity or depth maps. A minimal sketch of the warping and error-map computation is given after this list.

2. We propose an efficient iterative update module, called Multiscale Wide-LSTM, which can efficiently combine multi-scale information from feature extraction, the cost volume, and the current state, thus enhancing the information transfer between iterations.

3. We propose a flexible overall structure that can balance inference speed and accuracy. The tradeoff can be made without retraining the network, even at run time, and the number of iterations can also be determined dynamically based on a minimum frame rate.
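
The sketch below illustrates the left-right warping and explicit error-map computation referred to in contribution 1, assuming PyTorch tensors of shape (B, C, H, W) and disparity given in pixels; it shows the general technique rather than the authors' implementation.

import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    b, _, h, w = right.shape
    # Pixel grid of the left view.
    ys, xs = torch.meshgrid(torch.arange(h, device=right.device),
                            torch.arange(w, device=right.device), indexing="ij")
    xs = xs.unsqueeze(0).float() - disparity.squeeze(1)   # x - d(x, y)
    ys = ys.unsqueeze(0).float().expand_as(xs)
    # Normalize coordinates to [-1, 1] for grid_sample.
    grid = torch.stack((2.0 * xs / (w - 1) - 1.0,
                        2.0 * ys / (h - 1) - 1.0), dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def error_map(left, right, disparity):
    # Photometric residual; large values flag occlusions or wrong matches.
    return (left - warp_right_to_left(right, disparity)).abs()

Pixels with a large residual typically correspond to occlusions, reflective surfaces, or mismatches, which is the kind of explicit signal an error-aware refinement can exploit.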

2 Related Works

2.1 Data-driven Stereo Matching

Recently, data-driven methods have come to dominate the field of stereo matching. Zbontar and LeCun proposed the first deep learning stereo matching method [46]. Mayer et al. proposed DispNetC [24], the first end-to-end stereo matching network. To improve accuracy, 3D convolution was adopted by various works [2,10,18,42,49]. Chang et al. proposed PSMNet [2], a pyramid stereo matching network consisting of spatial pyramid pooling and several 3D convolutional layers. Taking advantage of the strong regularization effect of 3D convolution, PSMNet outperformed other methods at the time, although 3D convolutions are very computationally expensive. To further increase accuracy, Zhang et al. proposed GANet [47], which approximates semi-global matching (SGM) [13] by introducing a semi-global guided aggregation (SGA) layer [47]. Accuracy is improved by cost aggregation from different directions, which improves performance in occluded and textureless regions.

However, these networks have limited ability to generalize across datasets. After training on simulated datasets, their feature maps tend to become noisy and discrete when the network is used to predict real-world scenes, and they therefore output inaccurate disparity maps. To address this problem, DSMNet [48] improves the generalization ability of the network by adding domain normalization and a non-local graph-based filter, which also improves accuracy. Shen et al. argue that the large domain differences and unbalanced disparity distributions across datasets limit the real-world performance of such models and propose CFNet [31], which introduces a cascade and fused cost volume to improve the robustness of the network.

Due to the high computational cost, some methods avoid the use of 3D convolutions altogether. Xu et al. proposed AANet [41], which replaces the computationally intensive 3D convolution and improves accuracy by using the ISA and CSA modules, while other researchers proposed coarse-to-fine routines [32,37,43] to replace 3D convolution and further speed up inference. Tankovich et al. proposed HITNet [35], which introduced slanted plane hypotheses that allow geometric warping and upsampling operations to be performed more accurately, achieving a higher level of accuracy [35].

Iterative Network

With the development of deep learning, there has been a tendency to add more layers to convolutional neural networks to achieve better accuracy. However, as networks get deeper, the computational cost and the number of parameters increase greatly. To address this problem, Neshatpour et al. proposed ICNN [27] (Iterative Implementation of Convolutional Neural Networks), which replaces a single heavy feedforward network with a series of smaller networks executed sequentially. Since many images are already handled in early iterations, this method incurs much less computational complexity. IRR [16] first introduced a recurrent network for optical flow, using FlowNetS [6] or PWC-Net [32] as its recurrent module, which enables IRR to achieve better performance by increasing the number of iterations. However, both FlowNetS and PWC-Net are relatively heavy when used as iterative modules, which limits the number of iterations. To address this problem, RAFT [36], proposed by Teed et al., uses a GRU [3] as its iterative module to update the flow predictions and achieves state-of-the-art performance in optical flow. RAFT has been shown to have strong cross-dataset generalization ability while keeping inference efficient [36]. In our work, we found that the stock GRU is becoming a bottleneck for iterative models designed for stereo matching. To alleviate this problem, we propose an improved iterative module to achieve better performance.

3 Approach

Our network takes a pair of rectified images Il and Ir as input. Then the features are extracted and injected into the cost volume. The Multiscale Iterative Module retrieves data from the cost volume and iterates to update the disparity map. Finally, the iterated 1/4-resolution disparity map is fed into the Error Aware Refinement module, which performs learned upsampling and error-aware correction to obtain the final disparity map (Fig. 1).

3.1 Multi-scale Feature Extractor

We use a ResNet-like network [11] as our feature extractor. Feature maps of the image pair Il and Ir are extracted using two weight-sharing extractors and are used to construct a 3D correlation volume, following RAFT-Stereo [23]. The network consists of a sequence of residual blocks followed by two downsampling layers, which provide multi-scale information Fh, Fm and Fl for the following iterative Wide-LSTM modules. The spatial sizes of the feature maps Fh, Fm and Fl are 1/4, 1/8 and 1/16 of the original input image size.
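To make this concrete, the following PyTorch sketch produces feature maps at 1/4, 1/8 and 1/16 resolution from a weight-shared backbone. The channel width, block counts and kernel sizes are illustrative assumptions rather than the exact EAI-Stereo configuration.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(x + self.conv2(self.relu(self.conv1(x))))

    class MultiScaleExtractor(nn.Module):
        """Returns feature maps F_h, F_m, F_l at 1/4, 1/8 and 1/16 resolution."""
        def __init__(self, ch=128):
            super().__init__()
            self.stem = nn.Sequential(                                   # 1/4 resolution
                nn.Conv2d(3, ch, 7, stride=2, padding=3), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                ResBlock(ch), ResBlock(ch))
            self.down8 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), ResBlock(ch))   # 1/8
            self.down16 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), ResBlock(ch))  # 1/16

        def forward(self, img):
            f_h = self.stem(img)      # F_h
            f_m = self.down8(f_h)     # F_m
            f_l = self.down16(f_m)    # F_l
            return f_h, f_m, f_l

    # the same (weight-shared) extractor is applied to the left and right images
    extractor = MultiScaleExtractor()
    f_left = extractor(torch.randn(1, 3, 256, 512))
    f_right = extractor(torch.randn(1, 3, 256, 512))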


Fig. 1. The overall structure of EAI-Stereo. The left and right images are processed by a weight-sharing feature extractor and the features are injected into the multiscale correlation volume. The following Wide-LSTM Module combines information from the correlation volume, the previous iteration and the current disparity map to produce a disparity map and an upsampling mask, which are used by our Error Aware Refinement. The refinement module upsamples the disparity map and then uses it to warp the right image to the left and calculate the error map. The combined information is then fed to the hourglass module to output the final refined disparity map.

3.2 Iterative Multiscale Wide-LSTM Network

For iterative networks, the design of the iteration module has a significant impact on performance. For image tasks, most models use a stock GRU [3] as the iterative module. In our research, however, we found that network performance can be increased by improving the iterative module. To this end, we propose an efficient iterative update module named Multiscale Wide-LSTM that efficiently combines the information from feature extraction, the cost volume, and the current state, and also enhances the information transfer between iterations. Experiments show that our model increases performance with a minor increase in computational cost. The network predicts a sequence of disparity maps {d1, ..., dn} with an initial disparity map d0 = 0, and the first hidden state h0 is initialized using the information extracted by the feature extractors. In each iteration, the LSTM module takes the previous hidden state h_{i-1}, the previous disparity map d_{i-1}, and the information F from feature extraction as inputs, and outputs Δd, which is added to the current disparity: d_i = d_{i-1} + Δd. After n iterations, the result d_n is fed into the refinement module to obtain the final disparity map d_refined. We supervise our network with the following loss:

L_{regress} = \sum_{i=1}^{n-1} \gamma^{n-i} \, \lVert d_{gt} - d_i \rVert_1 + \lVert d_{gt} - d_{refined} \rVert_1, \quad \text{where } \gamma = 0.9.   (1)
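As a concrete illustration of the update rule d_i = d_{i-1} + Δd and of the loss in Eq. (1), here is a minimal PyTorch sketch. The names update_block and corr_lookup are hypothetical placeholders standing in for the Multiscale Wide-LSTM module and the correlation-volume lookup; the sketch only shows the iteration and supervision pattern.

    import torch

    def sequence_loss(disp_preds, disp_refined, disp_gt, gamma=0.9):
        # Eq. (1): sum_{i=1}^{n-1} gamma^(n-i) * |d_gt - d_i|_1 + |d_gt - d_refined|_1
        n = len(disp_preds)
        loss = (disp_gt - disp_refined).abs().mean()
        for i in range(1, n):                                   # i = 1 .. n-1
            loss = loss + gamma ** (n - i) * (disp_gt - disp_preds[i - 1]).abs().mean()
        return loss

    def iterate(update_block, corr_lookup, hidden, context, n_iters):
        """Schematic iteration: d_i = d_{i-1} + delta_d, starting from d_0 = 0."""
        disp = torch.zeros_like(context[:, :1])                 # d_0 = 0
        preds = []
        for _ in range(n_iters):
            corr = corr_lookup(disp)                            # sample the cost volume at the current disparity
            hidden, delta = update_block(hidden, context, corr, disp)
            disp = disp + delta                                 # d_i = d_{i-1} + delta_d
            preds.append(disp)
        return preds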

Multiscale Iterative Module. In our study, we found that for image tasks the width of the information transferred between iterative modules affects model performance. Therefore, we widen the iterative modules in our design. In our module, each of the three submodules of different scales establishes two data paths, C and h, with the preceding and following iterative modules, where h contains the information used to update the disparity map in every iteration, and C extends the data path and carries extra semantic information between iterations to improve the efficiency of the iterative network. The submodules at the three scales also interact through upsampling and downsampling to share data, thus increasing the width of information interaction and improving multiscale performance. Specifically, the lowest-resolution Mutual Conv-LSTM Cell is fused across scales by introducing features from the medium resolution, the medium-resolution Conv-LSTM Cell is fused by introducing features from both the low and high resolutions, and the highest-resolution cell is fused by introducing features from the medium resolution. The multiscale fusion mechanism follows the formulas below:

C_l, h_l = \mathrm{MutualLSTMCell}(C_{l,prev}, h_{l,prev}, ctx, \mathrm{pool}(h_{m,prev}))   (2)
C_m, h_m = \mathrm{CLSTMCell}(C_{m,prev}, h_{m,prev}, ctx, \mathrm{pool}(h_{h,prev}), \mathrm{interp}(h_{l,prev}))   (3)
C_h, h_h = \mathrm{CLSTMCell}(C_{h,prev}, h_{h,prev}, ctx, disp, \mathrm{interp}(h_{m,prev}))   (4)

where subscripts l, m and h denote low, middle and high resolution, respectively, and CLSTMCell is short for Conv-LSTM Cell. The low-resolution MutualLSTMCell not only takes C_{l,prev} and h_{l,prev} as input but also makes use of features downsampled from the middle resolution. The middle-resolution ConvLSTMCell takes advantage of both downsampled high-resolution features and upsampled low-resolution features. For the highest resolution, the module not only makes use of the upsampled middle resolution but also takes disp as input. In general, low-resolution features have a larger receptive field, which helps to improve matching accuracy in textureless regions, while high-resolution features contain more high-frequency details; the combined use of this information increases the receptive field without adding much computational cost, thus improving the results. Another advantage of the multiscale design is that we can use different iterative submodules at each scale: the lower-resolution feature maps have fewer pixels, which allows relatively time-consuming operations to be performed there. In our model, the Mutual Conv-LSTM Cell is used only in the 1/16-resolution module. Experimental results show that using this module only at low resolution improves performance with little change in computational cost (Fig. 2).

Mutual Conv-LSTM Cell. To further improve the performance of the iterative network, we also improve the cells that make up the iterative module. Iterative networks are increasingly used to process image data, but most of them simply adopt the GRU [3] cell and replace its fully connected layers with convolutional layers. According to our observation, however, widening the network and enriching its hidden state can improve performance considerably, so we use the LSTM [21], which improves performance on this task at little extra computational cost, as our baseline.


Fig. 2. Structure of the Multiscale Iterative Module. Instead of fully connected fusion, our model transfers information between adjacent resolutions.
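As a complement to Fig. 2 and Eqs. (2)-(4), the sketch below shows one three-scale iteration. The pooling and interpolation operators (average pooling, bilinear upsampling) and the cell callables are assumptions used only for illustration, not the paper's exact operators.

    import torch.nn.functional as F

    def multiscale_step(cell_l, cell_m, cell_h, states, ctx, disp):
        """One iteration of the three-scale module (Eqs. 2-4).
        states: ((C_l, h_l), (C_m, h_m), (C_h, h_h)) from the previous iteration;
        ctx: per-scale context features; cell_* are the recurrent cells."""
        (c_l, h_l), (c_m, h_m), (c_h, h_h) = states
        pool = lambda x: F.avg_pool2d(x, 2)                                        # go one scale down
        up = lambda x: F.interpolate(x, scale_factor=2, mode="bilinear",
                                     align_corners=True)                          # go one scale up

        new_l = cell_l(c_l, h_l, ctx["l"], pool(h_m))                              # Eq. (2), Mutual Conv-LSTM cell
        new_m = cell_m(c_m, h_m, ctx["m"], pool(h_h), up(h_l))                     # Eq. (3)
        new_h = cell_h(c_h, h_h, ctx["h"], disp, up(h_m))                          # Eq. (4)
        return new_l, new_m, new_h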

In an ordinary LSTM [14], the input and hidden state do not interact much: they are simply concatenated and then passed through the various gate operations, which does not make good use of the information in the input and hidden state. In the field of natural language processing, there have been several attempts to modify the LSTM cell for better results. Inspired by the Multiplicative LSTM [21] and the Mogrifier LSTM [25], we propose the Mutual Conv-LSTM Cell, in which the input x_t and the hidden state h_{t-1} interact according to:

x_t^{i} = \delta\,\mathrm{Sigmoid}(W_{convx}\, h_{t-1}^{i-1}) \odot x_t^{i-2}, \quad \text{for } i \in \{x \mid x \le n,\ x \bmod 2 \neq 0\}   (5)
h_{t-1}^{i} = \delta\,\mathrm{Sigmoid}(W_{convh}\, x_t^{i-1}) \odot h_{t-1}^{i-2}, \quad \text{for } i \in \{x \mid x \le n,\ x \bmod 2 = 0\}   (6)

where n denotes the number of interactions between the input and the hidden state, W_{convx} and W_{convh} denote the weights of the two convolutional layers, and δ is a constant used to balance the distribution, set to 2 in our experiments. As depicted in Fig. 3, the convolution-processed feature maps of h_{t-1}^{i} are element-wise multiplied with x_t^{i} to generate a new input; similarly, the convolution-processed feature maps of x_t^{i} are element-wise multiplied with h_{t-1}^{i} to generate a new hidden state. After these interactions, the generated input x_t^{n} and hidden state h_{t-1}^{n} are processed with a procedure similar to a regular LSTM:

f_t = \mathrm{Sigmoid}(W_f \cdot [h_{t-1}, x_t] + b_f)   (7)
i_t = \mathrm{Sigmoid}(W_i \cdot [h_{t-1}, x_t] + b_i)   (8)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)   (9)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t   (10)
o_t = \mathrm{Sigmoid}(W_o \cdot [h_{t-1}, x_t] + b_o)   (11)
h_t = o_t * \tanh(C_t)   (12)

The features from the input and hidden state are fully fused by multiple interactions, and the effective part of the features in both are retained and enhanced.
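The following sketch shows one way the mutual gating of Eqs. (5)-(6) can precede a standard convolutional LSTM update (Eqs. (7)-(12)). The kernel sizes, the single fused gate convolution and the channel widths are assumptions for illustration.

    import torch
    import torch.nn as nn

    class MutualConvLSTMCell(nn.Module):
        def __init__(self, in_ch, hid_ch, n_rounds=4, delta=2.0):
            super().__init__()
            self.n_rounds, self.delta = n_rounds, delta
            self.conv_x = nn.Conv2d(hid_ch, in_ch, 3, padding=1)   # W_convx: gates x with h
            self.conv_h = nn.Conv2d(in_ch, hid_ch, 3, padding=1)   # W_convh: gates h with x
            self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, 3, padding=1)  # f, i, C~, o

        def forward(self, x, h, c):
            # mutual gating, Eqs. (5)-(6): x and h alternately rescale each other
            for i in range(1, self.n_rounds + 1):
                if i % 2 == 1:
                    x = self.delta * torch.sigmoid(self.conv_x(h)) * x
                else:
                    h = self.delta * torch.sigmoid(self.conv_h(x)) * h
            # standard ConvLSTM update, Eqs. (7)-(12)
            f, i_g, c_tilde, o = torch.chunk(self.gates(torch.cat([h, x], dim=1)), 4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i_g) * torch.tanh(c_tilde)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

    cell = MutualConvLSTMCell(in_ch=64, hid_ch=96)
    h, c = torch.zeros(1, 96, 20, 40), torch.zeros(1, 96, 20, 40)
    h, c = cell(torch.randn(1, 64, 20, 40), h, c)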


In our model, to reduce the number of parameters and increase inference speed, the Mutual Conv-LSTM Cell is applied only at the lowest resolution. Ablation experiments show that this module brings a significant performance improvement.





Fig. 3. Mutual Conv-LSTM Cell. The input and hidden state gate each other with convolution-processed features for n times. After interactions, the generated input and hidden state are processed with a procedure similar to a regular LSTM.

3.3 Error Aware Refinement

As mentioned before, a large amount of high-frequency information is discarded in the earlier stages of the model. In our refinement module, we aim to make full use of this earlier information and to incorporate the high-frequency information from the original left image. To make full use of the earlier information, we use learned upsampling to upsample the 1/4-resolution raw disparity map predicted by the LSTM network. Following RAFT [36], the highest-resolution output is fed to a series of convolutional layers to generate an upsampling mask, which provides the weights for convex upsampling; this method has proved much more effective than bilinear upsampling. After the learned upsampling we obtain a disparity map of the same size as the original image, but it has not yet been processed in an error-aware manner. To incorporate the high-frequency information from the original left image and alleviate false matching, the Error Aware Module performs error perception according to the following equations:

I_l' = \mathrm{warp}(I_r, disp)   (13)
e = I_l' - I_l   (14)
I_{fuse} = \mathrm{Conv}_{3\times3}([e, I_l])   (15)
disp' = \mathrm{hourglass}([I_{fuse}, \mathrm{Conv}_{3\times3}(disp)])   (16)


where I_l' denotes the warped right image, e denotes the reprojection error, and disp' denotes the refined disparity map. The disparities encode the correspondences between the left and right images, so by warping, the reconstructed left image can be computed from the right image and the disparity map. A subtraction then yields the error map, which we call the explicit error. As mentioned before, a large amount of high-frequency information is discarded during the earlier processing; to alleviate this, we introduce the left image directly into the module, which constitutes our implicit error. These two forms of error improve our model in different respects, and we analyze them in the experiment section. By introducing more high-frequency information from the original left image, our model is capable of recovering extreme details and sharp edges; comparisons with state-of-the-art methods are shown in Fig. 6. In the hourglass model, we reduced the number of same-resolution convolution layers to streamline it. We tried deformable convolution [4] and dilated convolution [45]; experimental results show that deformable convolution is not as effective as dilated convolution, possibly because deformable convolution generalizes less well across different scenes, while dilated convolution has a larger receptive field and is capable of long-distance modeling (Fig. 4). We have carefully studied the impact of the module's microstructure on performance; details are shown in Table 5(a).


Fig. 4. Error Aware Refinement. The error map, the left image, and the original disparity map are passed into the hourglass to calculate the refined disparity map.
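A minimal sketch of Eqs. (13)-(16) is given below, assuming the warp is implemented with grid_sample; conv_fuse, conv_disp and hourglass are hypothetical placeholder modules standing in for the layers described above.

    import torch
    import torch.nn.functional as F

    def warp_right_to_left(img_right, disp_left):
        """Reconstruct the left image from the right image and the left disparity map (Eq. 13)."""
        b, _, h, w = img_right.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        xs = xs.to(img_right) - disp_left.squeeze(1)            # shift sampling positions by disparity
        ys = ys.to(img_right).expand_as(xs)
        grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)
        return F.grid_sample(img_right, grid, align_corners=True)

    def error_aware_refine(img_left, img_right, disp, conv_fuse, conv_disp, hourglass):
        left_rec = warp_right_to_left(img_right, disp)           # Eq. (13)
        err = left_rec - img_left                                # Eq. (14): explicit error
        fused = conv_fuse(torch.cat([err, img_left], dim=1))     # Eq. (15): implicit error via I_l
        return hourglass(torch.cat([fused, conv_disp(disp)], dim=1))  # Eq. (16)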

4 Experiments

EAI-Stereo is implemented in PyTorch and trained on two Tesla A100 GPUs. All models are trained using the AdamW optimizer with a weight decay of 1e-5, and warm-up takes 1% of the whole training schedule. We use data augmentation in all experiments, namely saturation change, image perturbation, and random scaling. For pretraining, we train our model on Scene Flow for 200k iterations with a learning rate of 2e-4.
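The exact schedule is not specified; one plausible setup matching these hyperparameters (a one-cycle schedule whose 1% warm-up fraction mirrors the description above) is sketched here, with the one-cycle choice itself being an assumption.

    import torch

    model = torch.nn.Conv2d(3, 1, 3)        # stand-in for the EAI-Stereo model
    total_steps = 200_000
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=2e-4, total_steps=total_steps,
        pct_start=0.01,                      # warm-up over 1% of the schedule
        anneal_strategy="linear")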


We evaluate our EAI-Stereo with different settings using the Scene Flow [24], KITTI-2015 [26], ETH3D [30] and Middlebury [28] datasets.

4.1 Middlebury

EAI-Stereo ranks 1st on the Middlebury test set, with an average error of 1.92% over all pixels, outperforming the next best method by 8.6%. Our method outperforms state-of-the-art methods on most of the metrics; see Table 1. We fine-tune our model on the 23 Middlebury training images with a maximum learning rate of 2e-5 for 4000 iterations. Although Middlebury provides images with different color temperatures, we use only the standard images for training. Experiments show that EAI-Stereo is robust to various lighting conditions with simple data augmentation. We also evaluate EAI-Stereo on the Middlebury dataset without any fine-tuning; results are shown in Table 4, which demonstrates the strong cross-domain performance of our model (Fig. 5).

Table 1. Results on the Middlebury stereo dataset V3 [28] leaderboard.

Method            | bad 0.5 nonocc (%) | bad 1.0 nonocc (%) | bad 2.0 nonocc (%) | Avgerr nonocc (%) | Avgerr all (%)
LocalExp [34]     | 38.7 | 13.9 | 5.43 | 2.24 | 5.13
NOSS-ROB [17]     | 38.2 | 13.2 | 5.01 | 2.08 | 4.80
HITNet [35]       | 34.2 | 13.3 | 6.46 | 1.71 | 3.29
RAFT-Stereo [23]  | 27.2 | 9.37 | 4.74 | 1.27 | 2.71
CREStereo [22]    | 28.0 | 8.25 | 3.71 | 1.15 | 2.10
EAI-Stereo (Ours) | 25.1 | 7.81 | 3.68 | 1.09 | 1.92

Fig. 5. Results on Middlebury dataset. Our EAI-Stereo recovers extreme details such as the spokes of the bicycle, toys on the table, and the subtle structures of plants.

4.2 ETH3D

For the ETH3D [30] dataset, we did not perform further fine-tuning, since the images are all grayscale with many overexposed and underexposed areas. We use data augmentation to simulate these conditions by setting the saturation to 0, adjusting the image gamma between 0.5 and 2.0, and adjusting the image gain between 0.8 and 1.2. At the time of writing, EAI-Stereo ranks 1st on the ETH3D stereo benchmark for the 50% quantile metric and second for the 0.5 px error rate (see Table 2) among all published methods.

Table 2. Results on the ETH3D [30] leaderboard.

Method            | bad 0.5 (%) | bad 1.0 (%) | 50% quantile
AANet RVC [41]    | 13.16 | 5.01 | 0.16
CFNet [31]        | 9.87  | 3.31 | 0.14
ACVNet [40]       | 10.36 | 2.58 | 0.15
HIT-Net [35]      | 7.83  | 2.79 | 0.10
RAFT-Stereo [23]  | 7.04  | 2.44 | 0.10
EAI-Stereo (Ours) | 5.21  | 2.31 | 0.09

Fig. 6. Results on ETH3D compared to state-of-the-art methods. Bad 0.5 error is reported at the corners. EAI-Stereo shows advantages in recovering extreme details and sharp edges of the scenes such as the detailed structure of the pipes and valves. Our model is also capable of handling extreme overexposure and underexposure such as the reflective cardboard and pitch-black pipes on the roof of the top image.

4.3 KITTI-2015

We trained our model on the Scene Flow dataset and then fine-tuned it on the KITTI training set for 6000 iterations with a batch size of 8 and a maximum learning rate of 2e-5. The images and disparity maps are randomly cropped to a resolution of 320×1024, and we run the Multiscale Iterative Module for 32 iterations. Ground-truth values in the KITTI dataset are sparse and only cover the lower part of the image. As Fig. 7 shows, other methods fail to generalize in the upper part of the image, while our method recovers extreme details and sharp edges there, which demonstrates the strong generalization performance of our model (Table 3).

Table 3. Results on the KITTI-2015 [26] leaderboard. Only published results are included. The best results for each metric are bolded, second best are underlined.

Method            | D1-all | D1-fg | D1-bg
AcfNet [49]       | 1.89 | 3.80 | 1.51
AMNet [7]         | 1.82 | 3.43 | 1.53
OptStereo [39]    | 1.82 | 3.43 | 1.50
GANet-deep [47]   | 1.81 | 3.46 | 1.48
RAFT-Stereo [23]  | 1.96 | 2.89 | 1.75
HITNet [35]       | 1.98 | 3.20 | 1.74
CFNet [31]        | 1.88 | 3.56 | 1.54
EAI-Stereo (Ours) | 1.81 | 2.92 | 1.59

Fig. 7. Results on the KITTI-2015 test set compared to state-of-the-art methods. EAI-Stereo shows an advantage in recovering extreme details and sharp edges of the scenes. Zoom in for a better view. RAFT-Stereo [23] is not included in this comparison because it does not have an official submission to the benchmark.

4.4 Cross-domain Generalization

Generalization performance is crucial for real-world applications. To evaluate it, we test our model on three public datasets: we train on the Scene Flow dataset using exactly the same strategy as in pretraining and then use the resulting weights directly for evaluation. In Table 4, we compare our model with several state-of-the-art methods and some classical methods. The comparison shows that our method outperforms DSMNet and CFNet, which are specifically designed for generalization performance, by a notable margin.

4.5 Ablations

We evaluate the performance of EAI-Stereo with different settings, including different architectures and different numbers of iterations.

Iterative Multiscale Wide-LSTM Network. We observe a significant performance leap (a 10.14% D1-error decrease on the Scene Flow validation set and a 4.80% EPE decrease on the KITTI validation set) from using the Wide-LSTM module. Most iterative networks use GRUs as their iterative modules; however, we found that performance can be increased by refining the iterative module. A comparison between the GRU-based network and our iterative multiscale Wide-LSTM network is shown in Table 5(c). Using the Mutual Conv-LSTM Cell at the lowest resolution further improves the model: as shown in Table 5(c), this module leads to a 1.4% D1-error decrease on the Scene Flow validation set and a 2.58% D1-error decrease on the KITTI validation set.

Error Aware Refinement. The Error Aware Refinement module performs the upsampling and refinement. To verify and analyze its effects, we evaluate different structures of the refinement module; the results are shown in Table 5(c). Compared with the Wide-LSTM baseline, Dilation Refinement decreases the D1-error by 2.81% on the Scene Flow validation set and the EPE (end-point error) by 12.39% on the KITTI validation set. Using deformable convolution is not as effective as dilated convolution, and we think the reason may be that deformable convolution generalizes less well across different scenes, while dilated convolution has a larger receptive field and is capable of long-distance modeling. Detailed comparisons are shown in Table 5(c).

Number of Iterations. Due to the structural improvements of our model, inference can be accelerated by reducing the number of iterations while maintaining competitive performance. Since the model requires only a single training run, the number of iterations can be adjusted after training, which increases the flexibility of the model. In practical applications, the number of iterations can also be chosen at run time given a minimum frame rate, which is useful for scenarios with real-time requirements. Details are shown in Table 5(b).

Table 4. Cross-domain generalization experiments.

Method            | KITTI-2015 bad 3.0 (%) | Middlebury bad 2.0 (%) | ETH3D bad 1.0 (%)
PSMNet [2]        | 16.3 | 39.5 | 23.8
GANet [47]        | 11.7 | 32.2 | 14.1
DSMNet [48]       | 6.5  | 21.8 | 6.2
CFNet [31]        |      | 28.2 | 5.8
EAI-Stereo (Ours) | 6.1  | 14.5 | 3.3

Table 5. Ablation experiments.

(a) Ablations on refinement microstructures.

Hourglass | Error | Left image | Scene Flow D1 | KITTI EPE | KITTI D1
          |       |            | 5.88 | 0.47 | 1.11
          |       |            | 5.85 | 0.47 | 0.89
          |       |            | 5.84 | 0.40 | 0.85
          |       |            | 5.76 | 0.41 | 0.89
          |       |            | 5.74 | 0.40 | 0.85

(b) Inference time.

Iters | Scene Flow EPE | Scene Flow D1 | Time (ms)
5     | 0.596 | 7.326 | 92
7     | 0.539 | 6.527 | 100
10    | 0.510 | 6.046 | 132
16    | 0.495 | 5.821 | 154
32    | 0.491 | 5.661 | 236

(c) Ablations on different structures of the model.

Model        | Conv GRU | Wide LSTM | Deform Refine | Dilation Refine | Mutual Conv-LSTM | Scene Flow D1 | KITTI EPE | KITTI D1
Baseline     | ✓ |   |   |   |   | 6.542 | 0.491 | 1.290
Wide LSTM    |   | ✓ |   |   |   | 5.879 | 0.468 | 1.108
EAI-Deform   |   | ✓ | ✓ |   |   | 5.840 | 0.410 | 0.850
EAI-Dilation |   | ✓ |   | ✓ |   | 5.741 | 0.401 | 0.854
EAI-Mutual   |   | ✓ |   | ✓ | ✓ | 5.661 | 0.397 | 0.832
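Building on the Number of Iterations discussion and the latencies in Table 5(b), a minimal sketch of choosing the iteration count at run time from a minimum frame rate could look as follows; the helper and its thresholding rule are illustrative assumptions, not part of the released method.

    def pick_iterations(time_per_config_ms, min_fps):
        """Pick the largest iteration count whose measured latency still meets min_fps.
        time_per_config_ms: dict {iterations: latency in ms}, measured as in Table 5(b)."""
        budget_ms = 1000.0 / min_fps
        feasible = [it for it, ms in time_per_config_ms.items() if ms <= budget_ms]
        return max(feasible) if feasible else min(time_per_config_ms)

    # e.g. with the latencies reported in Table 5(b):
    latency = {5: 92, 7: 100, 10: 132, 16: 154, 32: 236}
    iters = pick_iterations(latency, min_fps=7)   # -> 10 (about 143 ms per frame budget)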

5 Conclusion

We have proposed a novel error-aware iterative network for stereo matching. Several experiments were conducted to determine the structure of the module. Experimental results show that our model performs well on various datasets in both speed and accuracy while having strong generalization performance.

Acknowledgements. This work is supported by the Shenzhen Fundamental Research Program (JCYJ20180503182133411).


References

1. Bleyer, M., Gelautz, M.: Simple but effective tree structures for dynamic programming-based stereo matching. In: International Conference on Computer Vision Theory and Applications, vol. 2, pp. 415–422. SCITEPRESS (2008)
2. Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410–5418 (2018)
3. Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. In: Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, pp. 103–111. Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/W14-4012
4. Dai, J., et al.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)
5. Dinh, V.Q., Munir, F., Sheri, A.M., Jeon, M.: Disparity estimation using stereo images with different focal lengths. IEEE Trans. Intell. Transp. Syst. 21(12), 5258–5270 (2019)
6. Dosovitskiy, A., et al.: FlowNet: learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766 (2015)
7. Du, X., El-Khamy, M., Lee, J.: AMNet: deep atrous multiscale stereo disparity estimation networks. arXiv preprint arXiv:1904.09099 (2019)
8. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient belief propagation for early vision. Int. J. Comput. Vision 70(1), 41–54 (2006)
9. Fife, W.S., Archibald, J.K.: Improved census transforms for resource-optimized stereo vision. IEEE Trans. Circuits Syst. Video Technol. 23(1), 60–73 (2012)
10. Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3273–3282 (2019)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
12. Heo, Y.S., Lee, K.M., Lee, S.U.: Robust stereo matching using adaptive normalized cross-correlation. IEEE Trans. Pattern Anal. Mach. Intell. 33(4), 807–822 (2010)
13. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2007)
14. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
15. Hosni, A., Rhemann, C., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 504–511 (2012)
16. Hur, J., Roth, S.: Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
17. Ji, P., Li, J., Li, H., Liu, X.: Superpixel alpha-expansion and normal adjustment for stereo matching. J. Vis. Commun. Image Represent. 79, 103238 (2021)
18. Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 66–75 (2017)
19. Klaus, A., Sormann, M., Karner, K.: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: 18th International Conference on Pattern Recognition (ICPR 2006), vol. 3, pp. 15–18. IEEE (2006)
20. Kolmogorov, V., Zabih, R.: Computing visual correspondence with occlusions using graph cuts. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, pp. 508–515. IEEE (2001)
21. Krause, B., Lu, L., Murray, I., Renals, S.: Multiplicative LSTM for sequence modelling. arXiv preprint arXiv:1609.07959 (2016)
22. Li, J., et al.: Practical stereo matching via cascaded recurrent network with adaptive correlation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16263–16272 (2022)
23. Lipson, L., Teed, Z., Deng, J.: RAFT-Stereo: multilevel recurrent field transforms for stereo matching. In: 2021 International Conference on 3D Vision (3DV), pp. 218–227. IEEE (2021)
24. Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4040–4048 (2016)
25. Melis, G., Kočiský, T., Blunsom, P.: Mogrifier LSTM. arXiv preprint arXiv:1909.01792 (2019)
26. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3061–3070 (2015)
27. Neshatpour, K., Behnia, F., Homayoun, H., Sasan, A.: ICNN: an iterative implementation of convolutional neural networks to enable energy and computational complexity aware dynamic approximation. In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 551–556. IEEE (2018)
28. Scharstein, D., et al.: High-resolution stereo datasets with subpixel-accurate ground truth. In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 31–42. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11752-2_3
29. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision 47(1), 7–42 (2002)
30. Schöps, T., et al.: A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
31. Shen, Z., Dai, Y., Rao, Z.: CFNet: cascade and fused cost volume for robust stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13906–13915 (2021)
32. Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943 (2018)
33. Sun, J., Zheng, N.N., Shum, H.Y.: Stereo matching using belief propagation. IEEE Trans. Pattern Anal. Mach. Intell. 25(7), 787–800 (2003)
34. Taniai, T., Matsushita, Y., Sato, Y., Naemura, T.: Continuous 3D label stereo matching using local expansion moves. IEEE Trans. Pattern Anal. Mach. Intell. 40(11), 2725–2739 (2017)
35. Tankovich, V., Hane, C., Zhang, Y., Kowdle, A., Fanello, S., Bouaziz, S.: HITNet: hierarchical iterative tile refinement network for real-time stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14362–14372 (2021)
36. Teed, Z., Deng, J.: RAFT: recurrent all-pairs field transforms for optical flow. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 402–419. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_24
37. Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., Stefano, L.D.: Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 195–204 (2019)
38. Wang, F., Galliani, S., Vogel, C., Pollefeys, M.: IterMVS: iterative probability estimation for efficient multi-view stereo (2022)
39. Wang, H., Fan, R., Cai, P., Liu, M.: PVStereo: pyramid voting module for end-to-end self-supervised stereo matching. IEEE Robot. Autom. Lett. 6(3), 4353–4360 (2021)
40. Xu, G., Cheng, J., Guo, P., Yang, X.: ACVNet: attention concatenation volume for accurate and efficient stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
41. Xu, H., Zhang, J.: AANet: adaptive aggregation network for efficient stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1959–1968 (2020)
42. Yao, Y., Luo, Z., Li, S., Fang, T., Quan, L.: MVSNet: depth inference for unstructured multi-view stereo. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 785–801. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_47
43. Yin, Z., Darrell, T., Yu, F.: Hierarchical discrete distribution decomposition for match density estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6044–6053 (2019)
44. Yoon, K.J., Kweon, I.S.: Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)
45. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
46. Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1592–1599 (2015)
47. Zhang, F., Prisacariu, V., Yang, R., Torr, P.S.: GA-Net: guided aggregation net for end-to-end stereo matching. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 185–194. IEEE Computer Society (2019). https://doi.org/10.1109/CVPR.2019.00027
48. Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., Torr, P.: Domain-invariant stereo matching networks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 420–439. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_25
49. Zhang, Y., et al.: Adaptive unimodal cost volume filtering for deep stereo matching. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12926–12934 (2020)

Temporal-Aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking

Kaihao Lan, Haobo Jiang, and Jin Xie(B)
Nanjing University of Science and Technology, Nanjing, China
{lkh,jiang.hao.bo,csjxie}@njust.edu.cn

Abstract. Learning discriminative target-specific feature representation for object localization is the core of 3D Siamese object tracking algorithms. Current Siamese trackers focus on aggregating the target information from the latest template into the search area for target-specific feature construction, which limits performance in the case of object occlusion or object missing. To this end, in this paper we propose a novel temporal-aware Siamese tracking framework, where the rich target clue lying in a set of historical templates is integrated into the search area for reliable target-specific feature aggregation. Specifically, our method consists of three modules: a template set sampling module, a temporal feature enhancement module and a temporal-aware feature aggregation module. In the template set sampling module, an effective scoring network is proposed to evaluate the tracking quality of the template so that high-quality templates are collected to form the historical template set. Then, with the initial feature embeddings of the historical templates, the temporal feature enhancement module concatenates all template embeddings as a whole and feeds them into a linear attention module for cross-template feature enhancement. Furthermore, the temporal-aware feature aggregation module aggregates the target clue lying in each template into the search area to construct multiple historical target-specific search-area features. In particular, we follow the collection order of the templates and fuse all generated target-specific features via an RNN-based module, so that the fusion weight of earlier template information can be discounted to better fit the current tracking state. Finally, we feed the temporally fused target-specific feature into a modified CenterPoint detection head for target position regression. Extensive experiments on the KITTI, nuScenes and Waymo Open datasets show the effectiveness of our proposed method. Source code is available at https://github.com/tqsdyy/TAT.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_2.

1 Introduction

Visual object tracking is a fundamental task in the computer vision field and has extensive applications such as autonomous driving [19] and robotics vision [4]. With the development of cheap LiDAR sensors, point cloud-based 3D object tracking has attracted increasing attention. Compared with visual tracking on 2D images, point cloud data can effectively handle challenges such as changes in lighting conditions and object size. Generally, given a high-quality 3D bounding box in the first frame, 3D single object tracking aims to continuously estimate the state of the object throughout the tracking video sequence. However, the sparsity of point clouds and noise interference still hinder its applications in the real world.

In recent years, inspired by the successful application of the Siamese network [1] in 2D object tracking, 3D trackers have focused on exploiting the Siamese network paradigm for object tracking. As a pioneer, SC3D [8] performs 3D Siamese object tracking by matching the shape information embedded in the template to a large number of candidate proposals in the search area. However, SC3D is time-consuming and not end-to-end. To this end, Qi et al. [25] proposed the Point-to-Box network (P2B) for object tracking, where a PointNet++ [24] is employed to extract the features of the template and the search area, which are then utilized to construct the target-specific feature for object position regression via VoteNet [23]. Based on P2B, Zheng et al. [37] further proposed a box-aware feature embedding to capture the prior shape information of the object for robust object localization. In addition, Hui et al. [10] proposed a voxel-to-BEV Siamese tracker to improve tracking performance on sparse point clouds. In summary, current Siamese trackers mainly focus on exploiting a single template for target-specific feature generation while ignoring the rich temporal context information lying in the set of historical templates.

In this paper, we propose a simple yet powerful temporal-aware Siamese tracking framework, where high-quality historical templates are collected to learn a discriminative target-specific feature for object localization. Our key idea is to utilize a powerful linear attention mechanism for temporal context learning among the historical templates, which is then integrated into the search area for robust temporal-aware target information aggregation and object localization. Specifically, our framework consists of three modules: a template set sampling module, a temporal feature enhancement module and a temporal-aware feature aggregation module. In the template set sampling module, with the template as input, a lightweight scoring network is designed to evaluate the 3DIoU between the template and the ground-truth target, so that a high-quality template set can be obtained by sampling the templates with high 3DIoUs. Then, taking as input the initial features of the historical templates, the temporal feature enhancement module concatenates the point features of all templates as a whole and feeds them into a linear attention module for efficient cross-template feature enhancement. Furthermore, the temporal-aware feature aggregation module constructs a feature matching matrix between each template feature and the search-area feature, which guides the transfer of the target information in each template into the search area for target-specific search-area feature generation.
In particular, an RNN-based (Recurrent Neural Network) module is employed to fuse the multiple target-specific features in the collection order of the templates, based on the intuition that the target information in the later templates tends to have a higher correlation with the current tracking state. Finally, with the learned target-specific feature, we utilize a modified CenterPoint [36] detection head for object position regression. Notably, benefiting from the target information aggregated from the historical templates, our method can still obtain an effective target-specific feature representation in the case of occlusion or object missing. Extensive experiments verify the effectiveness of our proposed method. The main contributions of our work are as follows:

– We propose a novel temporal-aware Siamese tracking framework, where the target information lying in multiple historical templates is aggregated for discriminative target-specific feature learning and target localization.
– An effective 3DIoU-aware template selector is designed to collect high-quality templates as the historical template set.
– Our method achieves state-of-the-art performance on multiple benchmarks and is robust in the cases of object occlusion or object missing.

2 Related Work

2.1 2D Object Tracking

In recent years, with the successful application of the Siamese network [1] to 2D object tracking, Siamese-based trackers [9,30,33,39] have become mainstream. The core idea of the Siamese tracker is to extract features from the template image and the current image with a weight-shared backbone network, ensuring that the two images are mapped to the same feature space, and then to locate the part of the image most similar to the template via similarity matching. However, the lack of depth information in RGB images makes it difficult for such trackers to accurately estimate the depth of objects. Researchers have therefore studied object tracking based on RGB-D data, although only limited efforts have been made on RGB-D object tracking so far. The overall approach used in RGB-D trackers [2,22] is not much different from 2D object tracking, except that the additional depth information enhances the tracker's ability to perceive depth. Consequently, these methods still rely heavily on RGB information and still suffer from problems such as sensitivity to illumination changes and object size variations.

2.2 3D Single Object Tracking

Due to the characteristics of point cloud data, object tracking based on 3D point clouds can effectively avoid a series of problems of 2D image tracking. In recent years, therefore, much work [6,27,32] has focused on 3D object tracking based on point clouds. As a pioneer, Giancola et al. [8] proposed a Siamese tracker named SC3D dedicated to 3D single object tracking (SOT). SC3D uses shape completion on the template to obtain the shape information of the target, generates a large number of candidate proposals in the search area, compares them with the template, and takes the most similar proposal as the current tracking result. Since SC3D cannot be trained end-to-end, Qi et al. [25] proposed a point-to-box network (P2B) that uses feature augmentation to enhance the search area's perception of a specific template and then uses VoteNet [23] to locate the target in the search area. Zheng et al. [37] proposed a box-aware module based on P2B to enhance the network's mining of bounding-box prior information. Furthermore, to enhance tracking performance on sparse point clouds, V2B, proposed by Hui et al. [10], uses a shape-aware feature learning module and a voxel-to-BEV detector to regress the object center. Lately, Jiang et al. [12] proposed a two-stage Siamese tracker named RDT, which uses point cloud registration to achieve robust feature matching between the template and potential targets in the search area.

In the 3D SOT task, existing Siamese trackers use a single-template matching mechanism for object localization, which ignores the rich temporal template information of historical tracking results. This makes the trackers suffer from low-quality template feature representation in the presence of noise interference or occlusion, leading to tracking failure. Therefore, we design a novel Temporal-Aware Siamese Tracker that associates temporal templates and sufficiently mines temporal context to improve tracking robustness.

2.3 3D Multi Object Tracking

Unlike SOT, most multi-object tracking (MOT) algorithms follow the “tracking-by-detection” paradigm [15,26,35], which includes two stages: target detection and target association. Specifically, a detector first detects a large number of instance objects in each frame, and then methods such as motion models are used to associate objects across frames. The difference between 3D MOT and 2D MOT is that 3D detection [17,23,28] and association [14] algorithms are used instead of 2D methods. As a pioneer, Weng et al. [34] use PointRCNN [28] to detect instance targets per frame, a 3D Kalman filter to predict object motion trajectories, and finally the Hungarian algorithm to match detected objects. Recently, Luo et al. [18] attempted to unify detection and association into a single framework and achieved good results. In summary, 3D MOT pays more attention to the completeness of detection and the accuracy of association, while 3D SOT pays more attention to learning a discriminative target-specific feature representation for object localization.

3 Approach

3.1 Problem Setting

In the 3D single object tracking task, given the initial bounding box (BBox) of the object in the first frame, the tracker needs to continuously predict the BBoxes of the object throughout the tracking sequence. Specifically, an object BBox consists of nine parameters: the object center coordinates (x, y, z), the object size (l, w, h), and the rotation angles (α, β, θ) corresponding to the three coordinate axes. Generally, in the 3D SOT field, we assume that the target size is fixed and that rotation occurs only around the z-axis; therefore, the estimated object state only contains the center coordinates (x, y, z) and the rotation angle θ. Our temporal-aware Siamese tracking framework mainly consists of three modules: the template set sampling module (Sect. 3.2), the temporal feature enhancement module (Sect. 3.3) and the temporal-aware feature aggregation module (Sect. 3.4).

3.2 Template Set Sampling

Following the Siamese tracking paradigm, a traditional Siamese tracker takes as input a single template point cloud $P^t = \{p_i^t \in \mathbb{R}^3 \mid i = 1, 2, ..., N\}$ and a search-area point cloud $P^s = \{p_j^s \in \mathbb{R}^3 \mid j = 1, 2, ..., M\}$, and matches the target object in the search area that is closest to the template. Instead, we focus on extracting the rich temporal context information from a set of collected templates $T = \{P_1^t, P_2^t, ..., P_k^t\}$ for robust object localization. We consider three sampling mechanisms for template set generation: random sampling, closest sampling, and template score ranking sampling; the effects of the different ways of generating the template set are discussed in Sect. 4.3.

Random Sampling. We randomly select k templates from the whole historical template buffer as the template set.


Closest Sampling. We select the k templates closest to the current frame as the template set, since they tend to be more related to the current frame.


Fig. 1. The framework of the 3DIoU-aware template selection network.

Template Score Ranking Sampling. To collect high-quality templates, we construct a scoring network to predict the 3DIoU score between each historical template and the ground-truth target, and then select the k templates with the highest scores as the template set. Specifically, as shown in Fig. 1, we exploit the original template $P^t$ and the scale-expanded template $P^{tg} = \{p_i^{tg} \in \mathbb{R}^3 \mid i = 1, 2, ..., N\}$ as the network input, where the scale-expanded template provides the necessary context information for reliable 3DIoU prediction. Then, a weight-shared MLP is employed to extract their local point features $F_{local}^{t} \in \mathbb{R}^{C \times N}$ and $F_{local}^{tg} \in \mathbb{R}^{C \times N}$, and a max-pooling function is applied to extract their global features $F_{global}^{t}$ and $F_{global}^{tg}$, respectively. Finally, taking the concatenation of the local and global features as input, an MLP predicts the score of the template:

$Score = \mathrm{MLP}(\mathrm{Cat}(F_{local}^{tg}, F_{global}^{tg}, F_{local}^{t}, F_{global}^{t}))$   (1)

Fig. 2. The framework of our Temporal-Aware Siamese Tracker. (a) Template set sampling: we first exploit the template selector to collect the high-quality template set. (b) Siamese feature extraction: the weight-shared backbone network followed by a self-attention module is employed as the Siamese network to extract the point-level features of the template set and the search area. (c) Target-specific feature fusion: we integrate the target clue from the k templates into the search area to learn k target-specific search-area features, and an RNN-based fusion module is then employed to obtain the adaptively fused target-specific feature. Finally, we pass the fused target-specific feature into a modified CenterPoint detector for object localization.
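A minimal sketch of such a scoring network is given below. The channel width, the sigmoid output range and, in particular, the way the per-point local features are pooled before the final MLP (a mean pool here) are assumptions, since Eq. (1) does not fix these details.

    import torch
    import torch.nn as nn

    class TemplateScorer(nn.Module):
        """Predict a 3DIoU score for a template from the template and its scale-expanded version."""
        def __init__(self, c=64):
            super().__init__()
            self.point_mlp = nn.Sequential(nn.Conv1d(3, c, 1), nn.ReLU(), nn.Conv1d(c, c, 1))  # shared MLP
            self.head = nn.Sequential(nn.Linear(4 * c, c), nn.ReLU(), nn.Linear(c, 1), nn.Sigmoid())

        def forward(self, pts_t, pts_tg):
            # pts_*: (B, 3, N) template / scale-expanded template point clouds
            f_t, f_tg = self.point_mlp(pts_t), self.point_mlp(pts_tg)        # local features (B, C, N)
            g_t, g_tg = f_t.max(dim=2).values, f_tg.max(dim=2).values        # global features (B, C)
            feat = torch.cat([f_t.mean(2), g_t, f_tg.mean(2), g_tg], dim=1)  # pooled summary (B, 4C)
            return self.head(feat).squeeze(1)                                # predicted score in [0, 1]

    scorer = TemplateScorer()
    score = scorer(torch.randn(2, 3, 512), torch.randn(2, 3, 512))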

We use PointNet++ [24] as the backbone network to extract the initial features of the template set and the search area. We denote the obtained initial search-area feature as $S \in \mathbb{R}^{M \times C}$ and the initial feature of the l-th template in the template set as $T^l \in \mathbb{R}^{N \times C}$, where C is the feature dimension (Fig. 2).

3.3 Temporal Feature Enhancement

The temporal feature enhancement module aims to enhance the initial template features $T^1, T^2, ..., T^k$ through cross-template message passing based on linear attention. In this section, we briefly introduce the linear attention mechanism and then demonstrate how we exploit the temporal context information to enhance the feature representation of each template.

Linear-Attention Mechanism. The basic attention mechanism uses the dot-product attention [31] between the query $Q \in \mathbb{R}^{N_q \times C}$ and key $K \in \mathbb{R}^{N_k \times C}$ as the cross-attention weights for message passing. However, for large-scale tasks, dot-product attention is inefficient and usually requires high computational complexity for long-range relationship modeling. To relieve this, Katharopoulos et al. [13] proposed linear attention, which only needs a linear dot-product of kernel feature maps for efficient attention weight generation. In detail, the linear attention can be defined as:

$\mathrm{LA}(Q, K, V) = \phi(Q)\left(\phi(K)^{\top} V\right)$   (2)

where $\phi(\cdot) = \mathrm{elu}(\cdot) + 1$. Linear attention can also be extended to multi-head attention (denoted as “MultiHead-LA(Q, K, V)”) to capture richer feature representations. In the following, we employ such linear attention to integrate the temporal context information among the template set for the feature enhancement of each template.

Temporal Feature Enhancement. Given the set of initial template features, we first concatenate them to form an ensemble of template features $T = \mathrm{Concat}(T^1, T^2, ..., T^k)$, where k is the number of templates in the template set. Then, we treat them as a whole by reshaping $T \in \mathbb{R}^{k \times N \times C}$ to $T \in \mathbb{R}^{N_t \times C}$ ($N_t = k \times N$) to satisfy the input shape of the linear-attention module. The feature enhancement can be formulated as:

$T' = \text{MultiHead-LA}(T + T_p,\ T + T_p,\ T + T_p)$   (3)

where $T' \in \mathbb{R}^{N_t \times C}$ is the enhanced template feature produced by the linear-attention module, and $T_p \in \mathbb{R}^{N_t \times C}$ is the coordinate embedding of the template points obtained via an MLP. Furthermore, the enhanced feature $T'$ is added as a residual term to the initial feature $T$, followed by an instance normalization operation $\mathrm{Ins.Norm}(\cdot)$:

$T'' = \mathrm{Ins.Norm}(T + T')$   (4)

In addition, to improve the generalization ability of the network, we further enhance $T''$ using a feed-forward network ($\mathrm{FFN}(\cdot)$) followed by an instance normalization:

$T''' = \mathrm{Ins.Norm}(\mathrm{FFN}(T'') + T'')$   (5)

Finally, for sufficient message passing among the template set, we iteratively perform the feature enhancement above m times to achieve deeper temporal context aggregation. We discuss the performance for different numbers of attention iterations m and attention heads n in Sect. 4.3. Also, to ensure feature-space consistency between the template set and the search area, we share the linear-attention module with the search-area feature for feature transformation, i.e., $S \rightarrow \tilde{S}$, $\tilde{S} \in \mathbb{R}^{M \times C}$ (Fig. 3).
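A minimal single-head sketch of the linear attention of Eqs. (2)-(3) is shown below (without the normalization term used in some linear-attention variants); the tensor sizes follow the shapes defined above, and the residual, instance-normalization and FFN steps are omitted for brevity.

    import torch
    import torch.nn.functional as F

    def phi(x):
        return F.elu(x) + 1.0                               # feature map phi(.) = elu(.) + 1

    def linear_attention(q, k, v):
        """Eq. (2): LA(Q, K, V) = phi(Q) (phi(K)^T V); cost is linear in sequence length."""
        kv = torch.einsum("bnc,bnd->bcd", phi(k), v)        # phi(K)^T V : (B, C, D)
        return torch.einsum("bnc,bcd->bnd", phi(q), kv)     # phi(Q) (phi(K)^T V) : (B, N, D)

    # concatenated template features: k templates x N points, C channels
    T = torch.randn(2, 8 * 512, 64)                         # (B, N_t, C) with N_t = k * N
    T_pos = torch.randn(2, 8 * 512, 64)                     # coordinate embedding T_p
    out = linear_attention(T + T_pos, T + T_pos, T + T_pos)  # Eq. (3), single head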

3.4 Temporal-Aware Feature Aggregation

The Siamese tracker focuses on constructing the feature similarity between the template and the search area, which guides the transfer of target information from the template to the search area for target-specific feature learning and object localization. However, current 3D Siamese trackers only exploit a single template for tracking while ignoring the rich template clue lying in the historical templates, resulting in low tracking performance in some challenging cases (e.g., object occlusion and object missing).

Fig. 3. Framework of the linear attention-based template enhancement module, which is iteratively performed m times to achieve a richer temporal feature fusion.

Target-specific Feature Learning. Instead of using a single template, we exploit the sampled high-quality template set (Sect. 3.2) for target-specific feature learning. Specifically, taking as input the enhanced template-set features $\tilde{T} \in \mathbb{R}^{k \times N \times C}$ and the enhanced search-area feature $\tilde{S} \in \mathbb{R}^{M \times C}$, we construct a feature similarity map via the cosine similarity between each template $\tilde{T}_l$ ($1 \le l \le k$) and the search area $\tilde{S}$:

$\mathrm{Sim}_{i,j}^{l} = \dfrac{t_i^{\top} s_j}{\lVert t_i \rVert_2 \cdot \lVert s_j \rVert_2}, \quad \forall\, t_i \in \tilde{T}_l,\ s_j \in \tilde{S}$   (6)
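A minimal sketch of building this similarity matrix and gathering the best-matching template point for every search-area point is given below; fuse_mlp is a hypothetical placeholder for the fusion MLP described next, and the channel sizes are illustrative.

    import torch
    import torch.nn.functional as F

    def target_specific_feature(search_feat, templ_feat, templ_xyz, fuse_mlp):
        """search_feat: (M, C); templ_feat: (N, C); templ_xyz: (N, 3)."""
        sim = F.normalize(templ_feat, dim=1) @ F.normalize(search_feat, dim=1).t()   # Eq. (6): (N, M)
        best_sim, best_idx = sim.max(dim=0)                   # most related template point per search point
        gathered = torch.cat([search_feat,
                              templ_xyz[best_idx],            # template point coordinate
                              templ_feat[best_idx],           # enhanced template point feature
                              best_sim.unsqueeze(1)], dim=1)  # similarity score
        return fuse_mlp(gathered)                             # per-point fusion feature s^l_j

    fuse_mlp = torch.nn.Linear(64 + 3 + 64 + 1, 64)
    s_l = target_specific_feature(torch.randn(1024, 64), torch.randn(512, 64),
                                  torch.randn(512, 3), fuse_mlp)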

Then, based on the feature similarity above, we use it to guide the target-specific search-area feature learning. Specifically, for the j-th search-area point $p_j^s$, we gather the target information from its most related template point $p_{x^*}^t$ (index $x^* = \arg\max_i \mathrm{Sim}_{i,j}^{l}$) of the l-th template and use it to guide the transfer of that information into $p_j^s$. The target information of the template point consists of the point coordinate $p_{x^*}^t$, the enhanced point feature $t_{x^*}^l \in \tilde{T}_l$ and the similarity score $\mathrm{Sim}_{x^*,j}^{l}$. This target information is sent to an MLP together with the corresponding search-area feature $s_j \in \tilde{S}$ to build the fusion feature $s_j^l = \mathrm{MLP}([s_j, p_{x^*}^t, t_{x^*}^l, \mathrm{Sim}_{x^*,j}^{l}])$ of the j-th search-area point for the l-th template. In this way, we form the multiple target-specific features $S_{fused}^1, S_{fused}^2, \ldots, S_{fused}^k$.

RNN-Based Temporal-Aware Feature Fusion. With the multiple target-specific features generated above, we exploit a GRU network (Gated Recurrent Unit, a popular RNN) to fuse them, where the GRU can adaptively assign higher fusion weights to the target-specific features from the later templates while discounting the fusion weights of features from earlier templates. This is based on the intuition that, compared with earlier template information, the later templates tend to have a higher correlation with the current tracking state and can therefore benefit the current tracking performance. For the k features $s_j^1, s_j^2, \ldots, s_j^k$ at the j-th point of the search area, the GRU fusion of the historical template information can be formulated as:

$z_j^t = \sigma(W^z \cdot [h_j^{t-1}, s_j^t])$
$r_j^t = \sigma(W^r \cdot [h_j^{t-1}, s_j^t])$
$\tilde{h}_j^t = \tanh(W \cdot [r_j^t * h_j^{t-1}, s_j^t])$
$h_j^t = (1 - z_j^t) * h_j^{t-1} + z_j^t * \tilde{h}_j^t$   (7)

where $W$, $W^z$, $W^r$ are learnable parameter matrices, and $H = \{h_j^t \in \mathbb{R}^C \mid 1 \le j \le M, 1 \le t \le k\}$ is initialized as a zero matrix. We regard the output $H^k \in \mathbb{R}^{M \times C}$ of the last step as our final temporal-aware feature $S_{final}$.
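Since the implementation details later note that the fusion can be realized with torch.nn.GRUCell, a minimal sketch of fusing the k per-point target-specific features in their collection order (treating the M search-area points as the batch dimension) is:

    import torch
    import torch.nn as nn

    C, M, k = 64, 1024, 8
    gru = nn.GRUCell(input_size=C, hidden_size=C)

    # k target-specific search-area features S^1..S^k, each of shape (M, C)
    fused_feats = [torch.randn(M, C) for _ in range(k)]

    h = torch.zeros(M, C)                  # H initialized to zeros, one hidden state per search point
    for s_t in fused_feats:                # process templates in their collection order
        h = gru(s_t, h)                    # Eq. (7): GRU update
    S_final = h                            # final temporal-aware feature (M, C)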

3.5 Loss Function

Template Scoring Supervision. In the training phase, we first add noise to the GT BBox of the target to generate a sample BBox and then generate the template by cropping the point cloud with the sample BBox. We use the IoU (Intersection over Union) between the GT BBox and the sample BBox as the GT score and use the SmoothL1 loss to supervise network training. Denoting the GT score as $S_{gt}$ and the predicted score of the network as $S_{pred}$, the score supervision loss $L_{score}$ can be written as:

$L_{score} = \mathrm{SmoothL1}(S_{gt} - S_{pred})$   (8)

Detection Head Supervision. Based on the temporal-aware feature aggregation module, we obtain the temporal-aware fusion feature map, and we utilize the modified CenterPoint [36] detection network on this feature map to regress the target position. Following [10], we first voxelize the feature maps of each point by the averaging operation, and then use a stack of 3D convolutions to aggragte the features in the volumetric space. Next, we obtain the bird’s eye view (BEV) feature map along the z-axis by max pooling operation. Finally, we aggregate the feature map using a stack of 2D convolution on the BEV feature map and use three different heads to regress the target position. Specifically, the three heads are 2D-center head, offset&rotation head, and z-axis head. We will use three losses to constrain them separately, and the details of the design can be found in [10], where we denote them as Ldetect . The final loss function L is obtained by simply adding the two terms: L = Lscore + Ldetect .

4 Experiments

4.1 Experimental Settings

Implementation Details. Following [25], we randomly sample N = 512 points for the template point cloud $P^t$, M = 1024 points for the search-area point cloud, and N = 512 points for the scale-expand template point cloud in the template scoring module. We set the size of the template set to k = 8. In our attention enhancement module, we use n = 2 iterations and m = 4 heads for the multi-head attention. For the temporal-aware feature aggregation module, we use a GRU network to associate the k temporal features, which can be defined and used via torch.nn.GRUCell in PyTorch [21]. We use the Adam [16] optimizer to update the network parameters, with an initial learning rate of 0.001 decayed by a factor of 0.2 every 6 epochs. The network converges after training for about 30 epochs. We implement our model with PyTorch [21] and run all experiments on a server with a TITAN RTX GPU and an Intel i5 2.2 GHz CPU.

Datasets. We use the KITTI [7], nuScenes [3] and Waymo Open [29] datasets in our experiments. The KITTI dataset has 21 video sequences; following P2B [25], we split them into three parts: sequences 0-16 for training, 17-18 for validation, and 19-20 for testing. The nuScenes dataset contains 700 training video sequences and 150 validation video sequences. Since nuScenes only labels the ground truth in key frames, following V2B [10] we use the official toolkit to interpolate the corresponding labels for the unlabeled frames. The Waymo Open Dataset (WOD) is currently one of the largest outdoor point cloud datasets; Pang et al. [20] established a 3D SOT benchmark on the WOD, which we use directly.

Evaluation Metrics. Following [25], we use the Success and Precision criteria to measure model performance. Success measures the IoU between the predicted BBox and the GT BBox, and Precision measures the AUC (Area Under Curve) of the distance between the predicted and GT BBox centers from 0 to 2 m.

Data Pre-processing. The input to the core network consists of a template point cloud set and a search-area point cloud; an additional scale-expand template point cloud is required for the template scoring module. For training, we enlarge the GT BBox of the previous frame by 2 m, add a random offset, and crop the search area from the current-frame point cloud. To build the template set, we randomly select k frames between the first frame of the current tracking sequence and the current frame, add a small random noise to the GT BBox of each selected frame, and concatenate the resulting crop with the point cloud of the first frame. In addition, we enlarge each BBox by 1 m to obtain the scale-expand template point cloud. For testing, we enlarge the predicted BBox of the previous frame by 2 m in the current frame and collect the points inside to generate the search area. To build the template set of the current frame, all historical tracking results are evaluated by the template scoring module, and the k frames with the highest scores are selected and each spliced with the first frame. The scale-expand template corresponding to each template is obtained by expanding its BBox by 1 m. For simplicity, in the subsequent discussion we refer to our Temporal-Aware Siamese Tracker as TAT.
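To make the pre-processing concrete, here is a minimal NumPy sketch of cropping a search area by enlarging the previous BBox by 2 m and collecting the points inside. The axis-aligned box test and all names are simplifying assumptions; the actual crop also accounts for the BBox heading.

```python
import numpy as np

def crop_search_area(points, prev_center, prev_size, enlarge=2.0):
    """Collect the points inside the previous BBox enlarged by `enlarge` metres.

    points: (N, 3) array; prev_center: (3,); prev_size: (3,) = (l, w, h).
    Assumes an axis-aligned box for simplicity.
    """
    half = prev_size / 2.0 + enlarge
    mask = np.all(np.abs(points - prev_center) <= half, axis=1)
    return points[mask]

# toy usage: previous car box of size 4 x 2 x 1.5 m centered at the origin
pts = np.random.uniform(-10, 10, size=(5000, 3))
search_area = crop_search_area(pts, np.zeros(3), np.array([4.0, 2.0, 1.5]))
```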


Table 1. The performance of different methods on the KITTI and nuScenes datasets. "Mean" denotes the average result over the four categories (frame numbers in parentheses). Each entry is reported as Success / Precision.

KITTI:
Method       Car (6424)    Pedestrian (6088)  Van (1248)    Cyclist (308)  Mean (14068)
SC3D [8]     41.3 / 57.9   18.2 / 37.8        40.4 / 47.0   41.5 / 70.4    31.2 / 48.5
P2B [25]     56.2 / 72.8   28.7 / 49.6        40.8 / 48.4   32.1 / 44.7    42.4 / 60.0
LTTR [5]     65.0 / 77.1   33.2 / 56.8        35.8 / 45.6   66.2 / 89.9    48.7 / 65.8
BAT [37]     60.5 / 77.7   42.1 / 70.1        52.4 / 67.0   33.7 / 45.4    51.2 / 72.8
PTT [27]     67.8 / 81.8   44.9 / 72.0        43.6 / 52.5   37.2 / 47.3    55.1 / 74.2
PTTR [38]    65.2 / 77.4   50.9 / 81.6        52.5 / 61.8   65.1 / 90.5    58.4 / 77.8
V2B [10]     70.5 / 81.3   48.3 / 73.5        50.1 / 58.0   40.8 / 49.7    58.4 / 75.2
STNet [11]   72.1 / 84.0   49.9 / 77.2        58.0 / 70.6   73.5 / 93.7    61.3 / 80.1
TAT (ours)   72.2 / 83.3   57.4 / 84.4        58.9 / 69.2   74.2 / 93.9    64.7 / 82.8

nuScenes:
Method       Car (15578)   Pedestrian (8019)  Truck (3710)  Bicycle (501)  Mean (27808)
SC3D [8]     23.9 / 26.2   13.6 / 15.1        28.9 / 26.4   16.1 / 18.5    21.5 / 22.9
P2B [25]     32.7 / 35.5   18.1 / 25.0        28.1 / 25.8   18.5 / 23.9    27.6 / 30.9
BAT [37]     32.9 / 35.3   19.6 / 30.3        29.4 / 26.2   17.8 / 21.9    28.3 / 32.4
V2B [10]     36.5 / 39.0   19.4 / 26.9        30.8 / 28.6   18.9 / 22.0    30.5 / 33.8
TAT (ours)   36.8 / 39.6   20.7 / 29.5        32.2 / 28.9   19.0 / 22.6    31.2 / 34.9

4.2 Result

Evaluation on KITTI Dataset. Following [25], we select the four most representative categories of the KITTI dataset [7] for our experiments: Car, Pedestrian, Van and Cyclist. We compare our method with previous state-of-the-art approaches [5,8,10,11,25,27,37,38]; each of these methods has published results on KITTI, which we use directly. As shown at the top of Table 1, our method outperforms the other methods on most metrics. For the mean result over the four categories, our method improves Success from 61.3 (STNet) to 64.7 (TAT) and Precision from 80.1 (STNet) to 82.8 (TAT), i.e., relative gains of 5.5% and 3.4%, respectively.

Evaluation on nuScenes Dataset. We compare our method with four representative Siamese trackers [8,10,25,37] on the Car, Pedestrian, Truck and Bicycle categories of the nuScenes dataset [3]. Following [10], since nuScenes is only labeled on key frames, we report the performance evaluation on key frames only. Compared with KITTI, the scenes of the nuScenes dataset are more complex and diverse and the point clouds are sparser, which greatly increases the difficulty of the 3D SOT task. Nonetheless, as shown at the bottom of Table 1, our method still achieves the best mean results over the four categories.

Evaluation on Generalization Ability. To evaluate the generalization ability of the model, we directly use the model trained on the corresponding classes of


Table 2. The performance of different methods on the Waymo Open Dataset. Each category is divided into three levels of difficulty: Easy, Medium and Hard. "Mean" denotes the average result over the three difficulty levels. Each entry is reported as Success / Precision. Frame numbers: Vehicle Easy 67832, Medium 61252, Hard 56647, Mean 185731; Pedestrian Easy 85280, Medium 82253, Hard 74219, Mean 241752.

Vehicle:
Method       Easy          Medium        Hard          Mean
P2B [25]     57.1 / 65.4   52.0 / 60.7   47.9 / 58.5   52.6 / 61.7
BAT [37]     61.0 / 68.3   53.3 / 60.9   48.9 / 57.8   54.7 / 62.7
V2B [10]     64.5 / 71.5   55.1 / 63.2   52.0 / 62.0   57.6 / 65.9
TAT (ours)   66.0 / 72.6   56.6 / 64.2   52.9 / 62.5   58.9 / 66.7

Pedestrian:
Method       Easy          Medium        Hard          Mean
P2B [25]     18.1 / 30.8   17.8 / 30.0   17.7 / 29.3   17.9 / 30.1
BAT [37]     19.3 / 32.6   17.8 / 29.8   17.2 / 28.3   18.2 / 30.3
V2B [10]     27.9 / 43.9   22.5 / 36.2   20.1 / 33.1   23.7 / 37.9
TAT (ours)   32.1 / 49.5   25.6 / 40.3   21.8 / 35.9   26.7 / 42.2


Fig. 4. The sequence tracking visualization of V2B and our TAT on the KITTI dataset. We color the GT BBoxes in green, while the BBoxes predicted by TAT and V2B are colored red and skyblue, respectively. In addition, we mark the points of target object in orange for better identification from the background. (Color figure online)

the KITTI dataset to evaluate its tracking performance on the WOD. The corresponding categories between WOD and KITTI are Vehicle → Car and Pedestrian → Pedestrian. As shown in Table 2, our method still performs very well, which verifies that our model retains an advantage in generalization ability.

Visualization. Figure 4 shows sequence-tracking visualizations for a car and a pedestrian on the KITTI dataset. For the car, our method achieves more precise localization. For the pedestrian, existing Siamese trackers (such as V2B) easily match the wrong object when multiple pedestrians are adjacent or occluded, whereas our method accurately localizes the target among multiple candidates in complex scenes. In the case of occlusion,


Fig. 5. Attention heatmap visualization of the target-specific features. The historical templates alleviate the low-quality fusion features caused by a recent low-quality template, so that high-quality detection can still be achieved.

since the point cloud on the real target surface is extremely sparse (T = 15 in the bottom row), our method can also be biased toward a nearby wrong target. However, when the occlusion disappears (T = 20 in the bottom row), our method can use the historical template context to quickly re-locate the correct target, which is impossible for ordinary Siamese trackers. In addition, we provide an attention heatmap of the fused features for this example to explain why our TAT achieves high-quality tracking even after occlusion. As shown in Fig. 5, although the recent template is of poor quality due to the previous occlusion, the historical temporal context allows us to obtain high matching confidence for the current frame.

More Results. We provide more experimental results in the supplementary material, including quantitative results on scenes of different sparsity, sequence-tracking visualizations for more categories on the different datasets, running speed, and several tracking videos. Please refer to the supplementary material for more experimental results and analyses.

4.3 Ablation Study

In this section, we design a comprehensive ablation study to validate our proposed modules and the effect of several hyperparameters on the results. We conduct the experiments on the two main categories, car and pedestrian, to provide comprehensive and reliable ablation results.

Different Collection Strategies for Template Set. As discussed earlier, the main purpose of introducing the template set is to use historical successful tracking results to compensate for recent low-quality tracking. As shown at the bottom of Table 3, taking the car category as an example, the performance of the Closest sampling strategy drops significantly by 2.6/3.2 (from 72.2/83.3 to 69.6/80.1), which further validates the effectiveness of our method in this situation. Compared with the Random sampling strategy, our proposed template score selector brings performance gains of 0.7/0.6 (from 71.5/82.7 to 72.2/83.3) points.


Table 3. Ablation study on different template collection strategies and components. CSTS: Collection Strategy for Template Set. TFEM: Temporal Feature Enhancement Module. TFAM: Temporal-aware Feature Aggregation Module. Each entry is Success (Precision) drop relative to the full model shown in parentheses.

CSTS      TFEM  TFAM  Car Success   Car Precision  Pedestrian Success  Pedestrian Precision
Score     –     –     68.1 (-4.1)   79.3 (-4.0)    51.4 (-6.0)         78.1 (-6.3)
Score     ✓     –     69.0 (-3.2)   80.6 (-2.7)    52.1 (-5.3)         81.6 (-2.8)
Score     –     ✓     70.2 (-2.0)   81.5 (-1.8)    54.4 (-3.0)         83.0 (-1.4)
Random    ✓     ✓     71.5 (-0.7)   82.7 (-0.6)    55.3 (-2.1)         83.3 (-1.1)
Closest   ✓     ✓     69.6 (-2.6)   80.1 (-3.2)    52.7 (-4.7)         81.8 (-2.6)
Score     ✓     ✓     72.2          83.3           57.4                84.4

Temporal-Aware Components. The temporal-aware components consist of the temporal feature enhancement module and the temporal-aware feature aggregation module. For the temporal feature enhancement module, the dimensions of the input and output features are identical, so we can simply remove it to verify its effectiveness. For the temporal-aware feature aggregation module, we use the RNN-based network to associate the k fusion feature maps of different time steps; alternatively, we can ignore the temporal relationship between templates and obtain the final fusion feature map with an MLP after concatenating the k feature maps. As shown in the upper part of Table 3, taking the car category as an example, the two modules bring performance improvements of 2.0/1.8 (from 70.2/81.5 to 72.2/83.3) and 3.2/2.7 (from 69.0/80.6 to 72.2/83.3), respectively. These ablation experiments verify that our proposed modules utilize the temporal context effectively.

More Ablation Studies. We also investigate the effect of different numbers of templates and of the attention-module hyperparameters. Based on these experiments, a template set size of 8, 4 heads for the multi-head attention, and 2 iterations of the template enhancement module constitute a good setting. For more experimental data and analysis, please refer to the supplementary material.

5 Conclusions

In this paper, we proposed a simple yet powerful temporal-aware Siamese tracking framework that introduces a temporal feature enhancement module and a temporal-aware feature aggregation module into the Siamese 3D tracking architecture. Our method optimizes the current tracking process by correlating a multi-frame template set and making full use of temporal context. In addition, we designed a simple yet effective 3D-IoU-aware template selector to build a high-quality temporal template set. Our method significantly improves the performance of 3D SOT on several benchmark datasets.


Acknowledgements. This work was supported by the National Science Fund of China (Grant No. 61876084).

References 1. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fullyconvolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-48881-3_56 2. Bibi, A., Zhang, T., Ghanem, B.: 3D part-based sparse tracker with automatic synchronization and registration. In: CVPR (2016) 3. Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR (2020) 4. Comport, A.I., Marchand, É., Chaumette, F.: Robust model-based tracking for robot vision. In: IROS (2004) 5. Cui, Y., Fang, Z., Shan, J., Gu, Z., Zhou, S.: 3D object tracking with transformer. arXiv preprint arXiv:2110.14921 (2021) 6. Fang, Z., Zhou, S., Cui, Y., Scherer, S.: 3D-SiamRPN: an end-to-end learning method for real-time 3d single object tracking using raw point cloud. IEEE Sens. J. (2020) 7. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The Kitti vision benchmark suite. In: CVPR (2012) 8. Giancola, S., Zarzar, J., Ghanem, B.: Leveraging shape completion for 3D Siamese tracking. In: CVPR (2019) 9. Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic Siamese network for visual object tracking. In: ICCV (2017) 10. Hui, L., Wang, L., Cheng, M., Xie, J., Yang, J.: 3D Siamese voxel-to-BEV tracker for sparse point clouds. In: NeurIPS (2021) 11. Hui, L., Wang, L., Tang, L., Lan, K., Xie, J., Yang, J.: 3D Siamese transformer network for single object tracking on point clouds. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol. 13662, pp. 293–310. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20086-1_17 12. Jiang, H., Lan, K., Hui, L., Li, G., Xie, J., Yang, J.: Point cloud registrationdriven robust feature matching for 3d siamese object tracking. arXiv preprint arXiv:2209.06395 (2022) 13. Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are RNNs: fast autoregressive transformers with linear attention. In: ICML (2020) 14. Kelly, A.: A 3d state space formulation of a navigation Kalman filter for autonomous vehicles. Carnegie-Mellon Univ Pittsburgh Pa Robotics Inst, Technical report (1994) 15. Kim, A., Ošep, A., Leal-Taixé, L.: EagerMOT: 3D multi-object tracking via sensor fusion. In: ICRA (2021) 16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015) 17. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR (2019) 18. Luo, C., Yang, X., Yuille, A.: Exploring simple 3D multi-object tracking for autonomous driving. In: ICCV (2021)


19. Luo, W., Yang, B., Urtasun, R.: Fast and furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net. In: CVPR (2018) 20. Pang, Z., Li, Z., Wang, N.: Model-free vehicle tracking and state estimation in point cloud sequences. In: IROS (2021) 21. Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: NeurIPS (2019) 22. Pieropan, A., Bergström, N., Ishikawa, M., Kjellström, H.: Robust 3D tracking of unknown objects. In: ICRA (2015) 23. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: ICCV (2019) 24. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017) 25. Qi, H., Feng, C., Cao, Z., Zhao, F., Xiao, Y.: P2b: Point-to-box network for 3D object tracking in point clouds. In: CVPR (2020) 26. Scheidegger, S., Benjaminsson, J., Rosenberg, E., Krishnan, A., Granström, K.: Mono-camera 3D multi-object tracking using deep learning detections and PMBM filtering. In: IV (2018) 27. Shan, J., Zhou, S., Fang, Z., Cui, Y.: PTT: Point-track-transformer module for 3D single object tracking in point clouds. In: IROS (2021) 28. Shi, S., Wang, X., Li, H.: PointrCNN: 3D object proposal generation and detection from point cloud. In: CVPR (2019) 29. Sun, P., et al.: Scalability in perception for autonomous driving: WAYMO open dataset. In: CVPR (2020) 30. Tao, R., Gavves, E., Smeulders, A.W.: Siamese instance search for tracking. In: CVPR (2016) 31. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017) 32. Wang, L., Hui, L., Xie, J.: Facilitating 3D object tracking in point clouds with image semantics and geometry. In: PRCV (2021) 33. Wang, N., Zhou, W., Wang, J., Li, H.: Transformer meets tracker: exploiting temporal context for robust visual tracking. In: CVPR (2021) 34. Weng, X., Kitani, K.: A baseline for 3D multi-object tracking. arXiv preprint arXiv:1907.03961 (2019) 35. Wu, H., Han, W., Wen, C., Li, X., Wang, C.: 3D multi-object tracking in point clouds based on prediction confidence-guided data association. IEEE Trans. Intell. Transp. Syst. 23, 5668–5677 (2021) 36. Yin, T., Zhou, X., Krahenbuhl, P.: Center-based 3d object detection and tracking. In: CVPR (2021) 37. Zheng, C., et al.: Box-aware feature enhancement for single object tracking on point clouds. In: ICCV (2021) 38. Zhou, C., et al.: PTTR: relational 3D point cloud object tracking with transformer. In: CVPR (2022) 39. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7

Neural Plenoptic Sampling: Learning Light-Field from Thousands of Imaginary Eyes

Junxuan Li, Yujiao Shi, and Hongdong Li
Australian National University, Canberra, Australia
{junxuan.li,yujiao.shi,hongdong.li}@anu.edu.au

Abstract. The plenoptic function describes the light rays observed from any given position in every viewing direction. It is often parameterized as a 5D function L(x, y, z, θ, φ) for a static scene. Capturing all the plenoptic functions in the space of interest is paramount for Image-Based Rendering (IBR) and Novel View Synthesis (NVS): it encodes a complete light-field (i.e., lumigraph) and therefore allows one to roam freely in the space and view the scene from any location in any direction. However, achieving this goal with conventional light-field capture techniques is expensive, requiring dense sampling of the ray space with arrays of cameras or lenses. This paper proposes a much simpler solution to this challenge that uses only a small number of sparsely configured camera views as input. Specifically, we adopt a simple Multi-Layer Perceptron (MLP) network as a universal function approximator to learn the plenoptic function at every position in the space of interest. By placing virtual viewpoints (dubbed 'imaginary eyes') at thousands of randomly sampled locations and leveraging multi-view geometric relationships, we train the MLP to regress the plenoptic function for the space. Our network is trained on a per-scene basis, and the training time is relatively short (on the order of tens of minutes). Once the model has converged, we can freely render novel images. Extensive experiments demonstrate that our method approximates the complete plenoptic function well and generates high-quality results.

1 Introduction

Image-Based Rendering (IBR) for view synthesis is a long-standing problem in the fields of computer vision and graphics. It has a wide range of important applications, e.g., robot navigation, the film industry, and AR/VR. The plenoptic function, introduced by Adelson et al. [1], offers an ultimate solution to this novel view synthesis problem. The plenoptic function captures the visual appearance of a scene viewed from any viewing direction (θ, φ) and at any location (x, y, z). Once a complete plenoptic function (i.e., the light-field) for the entire space is available, one can roam around the space and synthesize free-viewpoint images simply by sub-sampling the plenoptic light-field. (The online version of this chapter contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_3.)


To model the plenoptic function, the best-known methods in the literature are light field rendering and the lumigraph [20,25]. These approaches interpolate rays instead of scene points to synthesize novel views. However, they require the given camera positions to be densely or regularly sampled, or restrict the target image to be a linear combination of source images. Unstructured light-field/lumigraph methods [5,13] were proposed to address this limitation by incorporating geometric reconstruction into light-ray interpolation. This paper introduces a novel solution for plenoptic field sampling from a few, often sparse and unstructured, multi-view input images. Since a plenoptic function is often parameterized as a 5D functional map, we use a simple Multi-Layer Perceptron (MLP) network to learn this map: the MLP takes a 5D vector as input and outputs an RGB color measurement, i.e., $\mathbb{R}^5 \rightarrow \mathbb{R}^3$. However, capturing the complete plenoptic function for a scene remains a major challenge in practice: it requires densely placing many physical cameras, or moving a camera (or even a commercial light-field camera) to scan every point in every direction. To address this challenge, this paper uses an MLP to approximate the plenoptic function (i.e., the entire light field) by placing thousands of virtual cameras (i.e., imaginary eyes) during network training. We use the available physical camera views, however few and sparsely organized, to provide multi-view geometry constraints as the self-supervision signal for training the MLP. We introduce a proxy depth as a bridge to ensure that the multi-view geometry relationship is well respected during training. The "imaginary eyes" are sampled randomly throughout the space following a uniform distribution. We use the proxy depth to describe the depth estimated from the visual similarity among input images. Once the proxy depth of a virtual ray is determined, we can retrieve candidate colors from the input images and pass them to a color blending network to determine the final color.

2 Related Work

Conventional View Synthesis. Novel view synthesis is a long-standing problem in the field of computer vision and graphics [9,14,46]. Conventional methods use image colors or handcrafted features to construct correspondences between the views [16,42]. With the advance of deep networks, recent approaches employ neural networks to learn the transformation between input and target views implicitly [15,38,40,57,70]. In order to explicitly encode the geometry guidance, several specific scene representations are proposed, such as Multi-Plane Images (MPI) [17,35,55,60,69], and Layered Depth Images (LDI) [48,50,61]. Some Image-Based Rendering (IBR) techniques [7,11,21,42,44,45,49,59] warp input view images to a target viewpoint according to the estimated proxy geometry, and then blend the warped pixels to synthesize a novel view. Panorama Synthesis. Zheng et al. [68] propose a layered depth panorama (LDP) to create a layered representation with a full field of view from a sparse set of images taken by a hand-held camera. Bertel et al. [4] investigate two blending methods for interpolating novel views from two nearby views, one is


a linear blending, and the other is a view-dependent flow-based blending. Serrano et al. [47] propose to synthesize new views from a fixed viewpoint 360◦ video. Huang et al. [23] employ a typical depth-warp-refine procedure in synthesizing new views. They estimate the depth map for each input image and reconstruct the 3D point cloud by finding correspondences between input images using handcrafted features. They then synthesize new views from the reconstructed point cloud. With panorama synthesis, scene roaming and lighting estimation [19,26] is possible for AR/VR applications. Plenoptic Modeling. Early light-field/lumigraph methods [20,25] reduce the 5D representation (position (x, y, z) and direction (θ, φ)) of the plenoptic function to 4D ((u, v, s, t), intersection between two image planes). They do not require scene geometry information, but either require the camera grid is densely and regularly sampled, or the target viewing ray is a linear combination of the source views [6,30]. For unstructured settings, a proxy 3D scene geometry is required to be combined with light-field/lumigraph methods for view synthesis [5,13]. Recent methods [24,56,64,65] applied learning methods to improve light field rendering. Neural Rendering. Deep networks have also demonstrated their capability of modeling specific scenes as implicit functions [31,32,36,39,41,52–54,58,66,67]. They encapsulate both the geometry and appearance of a scene as network parameters. They take input as sampled points along viewing rays and output the corresponding color and density values during the inference stage. The target image is then rendered from the sampled points by volume rendering techniques [33]. The denser the sampled points, the higher quality of rendered images. However, densely sampling points along viewing rays would significantly increase the rendering time, prohibiting interactive applications to real-world scenarios. Neural Radiance Fields. Our idea of using MLP to learn light-field is similar Neural radiance fields [10,22,29,36], which estimate the radiance emitted at 3D location by a network; a recent work, DoNeRF [37] speed up the rendering process by only querying a few samples around the estimated depth; but they requires ground-truth depth map for each viewing ray during training; FastNeRF [18] is proposed to accelerate the rendering speed during inference by caching pre-sampled scene points. IBRNet [63] is another recent NeRF-based work that utilizes neighboring features and a transformer for image rendering. MVSNeRF [8] constructed a cost volume from nearby views and use these features to help regressing the neural networks. LFNs [51] proposed a neural implicit representation for 3D light field. However, their method is limited to simple objects and geometries. There are key differences between ours and previous works. Previous methods focused on estimating the lights emitted at every location, in any direction, within a bounded volumetric region, often enclosing the 3D scene or 3D object of interest. In contrast, our method focuses on estimating all the light rays observed at any point in space, coming in any direction. In essence, our formulation is not only close to, but precisely is, the plenoptic function that Adelson and Bergen had


Fig. 1. The overall pipeline of the proposed framework. Our framework includes a proxy depth reconstruction (PDR) model to determine the depth of a virtual viewing ray, a differentiable ray tracer to retrieve corresponding colors from real input images, and a color blending network (CBNet) to recover the RGB color information.

contrived. In fact, in principle, our formulation can be extended to the original 7D plenoptic function by adding time and wavelength as new dimensions [3, 27,28]. Our method also offers a computational advantage over NeRF. Namely, when the model has been well-approximated, we can directly display the network output as rendered images without sampling points along viewing rays and then rendering them in a back-to-front order. Our model will significantly accelerate the rendering speed and facilitate interactive applications.

3 Neural Plenoptic Sampling

A complete plenoptic function corresponds to a holographic representation of the visual world. It was originally defined as a 7D function L(x, y, z, θ, φ, λ, t), which allows reconstruction of every possible view (θ, φ) from any position (x, y, z), at any time t and every wavelength λ. McMillan and Bishop [34] reduce its dimensionality from 7D to 5D by ignoring time and wavelength for the purpose of static-scene view synthesis. By restricting the viewpoints or the object to lie inside a box, light field [25] and lumigraph [20] approaches reduce the dimensionality to four. Without loss of generality, this paper uses the original 5D representation L(x, y, z, θ, φ) of the plenoptic function and focuses on scene representation at a fixed time. We model the plenoptic function by an MLP. However, brute-force training of a network mapping from viewing position and direction to RGB colors is infeasible: the observed images cover only part of the input space, so such a model may fit the observed viewpoints well but generate highly blurred images in the non-observed regions. Our experiments in Fig. 5 demonstrate this situation. To address this problem, we introduce an Imaginary Eye Sampling (IES) method to sample the target domain fully, and we evaluate a proxy depth to provide self-supervision by leveraging photo-consistency among input images. Our method first outputs a proxy depth for a virtual viewing ray from an imaginary eye randomly placed in the scene; it then retrieves colors from the input views by a differentiable projector using this depth; finally, the colors pass through a color blending network to generate the output color. Figure 1 depicts the overall pipeline of our framework.

3.1 Proxy Depth Reconstruction

We model the Proxy Depth Reconstruction (PDR) network as an MLP $F_\Theta$. It takes as input a camera position $\mathbf{x} = [x, y, z]^T \in \mathbb{R}^3$ and a camera viewing ray $\mathbf{v} = [\theta, \phi]^T \in \mathbb{R}^2$, and estimates the distance $d \in \mathbb{R}^+$ between the location $\mathbf{x}$ of the virtual camera and the scene surface observed along that viewing direction,

$$d = F_\Theta(\mathbf{x}, \mathbf{v}), \qquad (1)$$

where $\Theta$ represents the trainable network parameters. We use an MLP structure similar to that of NeRF [36] to parameterize the neural plenoptic model. The difference is that NeRF approximates the emitted colors and transparency at scene-object locations, while our PDR model estimates the distance between the scene objects and the observing camera along the viewing direction.
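A minimal PyTorch sketch of such a depth-regressing MLP is given below. It follows the NeRF-style positional encoding and the 8-layer, 256-channel ReLU architecture described later in the training details, but the exact frequency count and the Softplus used to keep the depth positive are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=10):
    # Fourier features of the 5D input (x, y, z, theta, phi).
    feats = [x]
    for k in range(n_freqs):
        feats += [torch.sin((2.0 ** k) * x), torch.cos((2.0 ** k) * x)]
    return torch.cat(feats, dim=-1)

class PDRNet(nn.Module):
    """F_Theta: (position, direction) -> proxy depth d > 0 (a sketch, not the paper's code)."""
    def __init__(self, in_dim=5, n_freqs=10, width=256, depth=8):
        super().__init__()
        dim = in_dim * (2 * n_freqs + 1)
        layers = []
        for i in range(depth):
            layers += [nn.Linear(dim if i == 0 else width, width), nn.ReLU()]
        self.mlp = nn.Sequential(*layers, nn.Linear(width, 1), nn.Softplus())

    def forward(self, x, v):
        return self.mlp(positional_encoding(torch.cat([x, v], dim=-1)))

# toy usage: proxy depth for 4 virtual viewing rays
d = PDRNet()(torch.rand(4, 3), torch.rand(4, 2))
```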

3.2 Imaginary Eye Sampling

Since our purpose is to move around the scene and synthesize new views continuously, we need to sample the input space densely for network training. In general, however, the camera locations of the input images are sparse, and the observed images cover only partial regions of this input space. To address this problem, we propose an Imaginary Eye Sampling (IES) strategy: we place thousands of imaginary eyes (virtual cameras) in the space of interest, generated randomly so as to sample the plenoptic input space densely. By doing so, we are able to approximate the complete plenoptic function. Note that we do not have ground-truth depths for supervision, even for the real observed images (viewing rays). To provide training signals, we propose a self-supervision method that leverages photo-consistency among the real input images.
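A minimal sketch of this sampling step might look as follows; the box extents and the uniform distributions over azimuth and elevation are assumptions for illustration only.

```python
import math
import torch

def sample_imaginary_eyes(n, box_min, box_max):
    """Randomly place n virtual cameras inside the region of interest and
    draw one random viewing direction (theta, phi) per camera."""
    pos = box_min + (box_max - box_min) * torch.rand(n, 3)     # (x, y, z)
    theta = 2.0 * math.pi * torch.rand(n, 1)                   # azimuth in [0, 2*pi)
    phi = math.pi * (torch.rand(n, 1) - 0.5)                   # elevation in [-pi/2, pi/2]
    return pos, torch.cat([theta, phi], dim=-1)

# toy usage: 10k virtual cameras inside a 0.5 m cube around the origin
positions, directions = sample_imaginary_eyes(
    10_000, torch.tensor([-0.25, -0.25, -0.25]), torch.tensor([0.25, 0.25, 0.25]))
```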

3.3 Self-supervision via Photo-Consistency

Given a virtual camera at a random location $\mathbf{x}$ and a viewing direction $\mathbf{v}$, our network predicts a depth $d$ between the observed scene point and the camera location. The world coordinate $\mathbf{w}$ of the scene point is then computed as

$$\mathbf{w} = \mathbf{x} + d\mathbf{v}. \qquad (2)$$

When the estimated depth $d$ is correct, the colors of its projected pixels in the real observed images should be consistent with each other. We therefore use a differentiable projector $T(\cdot)$ to find the projected pixel of this scene point on each real camera image plane. Denote by $P_i$ the projection matrix of real camera $i$; the projected image coordinate of the 3D point $\mathbf{w}$ is computed as $[u_i, v_i, 1]^T = P_i \mathbf{w}$. Our projector then uses bilinear interpolation to fetch information (e.g., color) from the corresponding real images.
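The differentiable projector T(·) could be sketched as below for a pinhole camera, with torch.nn.functional.grid_sample performing the bilinear fetch. The pinhole model and the pixel normalization are assumptions made for illustration; the experiments in this paper actually use 360° panoramas, whose projection differs.

```python
import torch
import torch.nn.functional as F

def fetch_color(image, P, w):
    """Project 3D points w (B, 3) with a 3x4 projection matrix P and
    bilinearly sample colors from image (1, 3, H, W)."""
    w_h = torch.cat([w, torch.ones(w.shape[0], 1)], dim=-1)        # homogeneous coordinates
    uvw = (P @ w_h.T).T                                            # (B, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                                  # pixel coordinates (u_i, v_i)
    H, W = image.shape[-2:]
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,                # normalize to [-1, 1]
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    grid = grid.view(1, -1, 1, 2)
    return F.grid_sample(image, grid, align_corners=True).squeeze(-1)[0].T   # (B, 3)

# toy usage
colors = fetch_color(torch.rand(1, 3, 480, 640), torch.rand(3, 4), torch.rand(5, 3))
```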


Fig. 2. For a virtual camera at position x with viewing direction v, we estimate a depth d between the scene point w and the camera location x. By reprojecting the scene point onto the real cameras, we retrieve the color c_i and high-level feature f_i from the observed images. The cosine distance (angle) s_i between the virtual viewing direction and each real viewing direction determines the influence of the corresponding real camera when calculating the photometric consistency.

By computing the photo-consistency (similarity) among the retrieved colors, we can measure the correctness of the estimated depth. In practice, we argue that using only the colors of the retrieved pixels is not accurate enough for this measurement, because it cannot handle textureless and reflective regions. To increase the representative and discriminative ability, we propose to retrieve colors as well as high-level features from the real input images for the photo-consistency measure. Denote by $f_i$ and $c_i$ the feature and color retrieved from input camera $i$, respectively. The photo-consistency among all input cameras is defined as

$$L_d = \sum_{i=1}^{N} s_i \left( \| c_{topk} - c_i \|_1 + \lambda \| f_{topk} - f_i \|_1 \right), \qquad (3)$$

where $\|\cdot\|_1$ denotes the L1 distance, $N$ is the number of real input cameras, $\lambda$ balances the influence of the color difference and the feature difference, and $s_i$ is the normalized weight assigned to each real camera $i$, determined by the angle difference (cosine distance) between the virtual camera viewing ray ($\mathbf{w} - \mathbf{x}$) and the real camera viewing ray ($\mathbf{w} - \mathbf{x}_i$). Figure 2 illustrates the situation. Mathematically, it is expressed as

$$s_i = \frac{\cos(\mathbf{w} - \mathbf{x},\, \mathbf{w} - \mathbf{x}_i)}{\sum_{j=1}^{N} \cos(\mathbf{w} - \mathbf{x},\, \mathbf{w} - \mathbf{x}_j)}, \qquad (4)$$

where $\cos(\cdot, \cdot)$ is the cosine of the angle spanned by the two vectors. The smaller the angle between the virtual camera viewing ray and the real camera viewing ray, the larger $s_i$ is. Given the weight for each input camera, the reference color $c_{topk}$ and feature $f_{topk}$ in Eq. 3 are computed as the averages of the top-$k$ retrieved colors and features:

$$c_{topk} = \sum_{i \in topk} c_i / k, \qquad f_{topk} = \sum_{i \in topk} f_i / k. \qquad (5)$$
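The self-supervision of Eqs. (3)-(5) can be sketched as follows. Shapes and names are assumptions: colors (N, B, 3) and feats (N, B, F) are retrieved from the N real views for B virtual rays, weights (N, B) holds the normalized weights s_i, and "top-k" is read here as the k views with the largest weights.

```python
import torch

def photo_consistency_loss(colors, feats, weights, k=4, lam=0.1):
    """Weighted L1 distance of each view's color/feature to the average of the
    top-k views (Eqs. (3)-(5)); a sketch, not the authors' implementation."""
    # Eq. (5): reference color/feature = mean over the k views with the largest weights.
    topk = weights.topk(k, dim=0).indices                                   # (k, B)
    c_ref = colors.gather(0, topk.unsqueeze(-1).expand(-1, -1, colors.shape[-1])).mean(0)
    f_ref = feats.gather(0, topk.unsqueeze(-1).expand(-1, -1, feats.shape[-1])).mean(0)
    # Eq. (3): per-view L1 terms weighted by s_i and summed over the N real cameras.
    per_view = (c_ref - colors).abs().sum(-1) + lam * (f_ref - feats).abs().sum(-1)
    return (weights * per_view).sum(0).mean()

# toy usage: 8 real views, 64 virtual rays, 16-dim features
loss = photo_consistency_loss(torch.rand(8, 64, 3), torch.rand(8, 64, 16),
                              torch.softmax(torch.rand(8, 64), dim=0))
```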


We use Eq. 3 as the supervision for our PDR model, and the network is trained to minimize this objective function.

3.4 Color Blending for View Synthesis

Given an estimated depth $d$ for a virtual viewing ray, we can retrieve colors from the real input images for virtual-view synthesis. However, a naive aggregation of the retrieved colors would cause severe tearing or ghosting artifacts in the synthesized images. Hence, we propose a Color Blending Network (CBNet) that blends the retrieved colors and tolerates the errors caused by inaccurate depths, so as to synthesize realistic images. To provide sufficient cues, we feed the direction differences between the reprojected real viewing rays (solid lines in Fig. 2) and the virtual (target) viewing ray (dashed line in Fig. 2), along with the retrieved colors, to the color blending network. Formally, our CBNet is expressed as

$$c = F_\Phi\left( \{ c_i, d_i \}_{i=1}^{N} \right), \qquad (6)$$

where $\Phi$ is the trainable parameter of the CBNet, $c_i$ is the RGB color retrieved from real camera $i$, $d_i$ is the projection of the real viewing ray onto the target virtual viewing ray, and $c$ is the estimated color of the virtual viewing ray. We employ a PointNet architecture for our CBNet. The supervision of the CBNet is the color observed from the real cameras,

$$L_c = \| c^* - c \|_1, \qquad (7)$$

where $c^*$ is the ground-truth color. Unlike our PDR model, the CBNet is trained only on the observed images (viewing rays), since it needs the ground-truth color as supervision. Instead of memorizing the color of each training ray, the CBNet learns a sensible rule for blending the colors retrieved from the real input views, and thus generalizes to unseen viewing rays. The PDR and the CBNet in our framework are trained separately. During the training of the CBNet, we fix the model parameters of the PDR so as not to destroy the learned patterns for the whole plenoptic space. For inference, a query viewing ray first passes through our PDR model to compute a depth value; its corresponding colors in the real input views are then retrieved and fed into the CBNet to estimate the color. Since this is a single feed-forward pass through the network, rendering is fast (less than one second for a 1024 × 512 image).
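A PointNet-style blending network along these lines could be sketched as follows (layer widths and the Sigmoid output are assumptions): each (color, direction) observation is processed by a shared MLP, max-pooled over the N views, and decoded to an RGB color.

```python
import torch
import torch.nn as nn

class CBNet(nn.Module):
    """Color blending network F_Phi: {(c_i, d_i)} -> c (a sketch, not the paper's code)."""
    def __init__(self, width=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(4, width), nn.ReLU(),
                                    nn.Linear(width, width), nn.ReLU(),
                                    nn.Linear(width, width), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(width, 3), nn.Sigmoid())

    def forward(self, colors, dirs):
        # colors: (B, N, 3) retrieved RGB; dirs: (B, N, 1) projected direction term d_i.
        x = self.shared(torch.cat([colors, dirs], dim=-1))   # per-view features
        x = x.max(dim=1).values                              # max-pool over the N views
        return self.head(x)                                  # blended color in [0, 1]

# toy usage: 8 query rays, 4 input views
c = CBNet()(torch.rand(8, 4, 3), torch.rand(8, 4, 1))
```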

4 Experiments

In this section, we conduct comprehensive experiments to demonstrate the effectiveness of the proposed algorithm. We use 360◦ panoramas captured by an omnidirectional camera for the plenoptic modeling, since its representation well


Fig. 3. (a) An illustration of an omni-directional camera, its captured light-field and a sample image. (b) An illustration of our camera arrangement for dataset generation. For each scene, we capture 125 omnidirectional images at different locations for evaluation. The cameras are positioned in a 50 × 50 × 50 centimeter volume (roughly) at the center of each scene.

Table 1. Quantitative comparison of our method and others given eight input views. Each entry is reported as PSNR↑ / SSIM↑.

Method                   Diningroom      Bar             Livingroom      Lounge          Average
FVS [44]                 26.09 / 0.770   24.54 / 0.800   25.61 / 0.780   21.23 / 0.690   24.37 / 0.760
NeRF [36]                37.54 / 0.938   33.95 / 0.941   33.62 / 0.936   31.96 / 0.939   34.27 / 0.939
NeRF++ [67]              36.29 / 0.931   32.87 / 0.936   33.72 / 0.929   34.05 / 0.947   34.23 / 0.936
IBRNet [63]              37.83 / 0.953   34.12 / 0.959   33.39 / 0.941   32.35 / 0.953   34.42 / 0.952
Ours w/o imaginary eye   32.32 / 0.929   32.93 / 0.950   32.57 / 0.948   30.50 / 0.932   32.08 / 0.940
Ours w/o feature         36.03 / 0.957   33.47 / 0.954   33.97 / 0.957   32.17 / 0.960   33.91 / 0.957
Ours w/o weighting       32.69 / 0.931   29.57 / 0.903   30.81 / 0.919   29.18 / 0.925   30.56 / 0.920
Ours                     36.62 / 0.959   33.86 / 0.961   34.33 / 0.965   34.31 / 0.968   34.78 / 0.963

aligns with the plenoptic function. We show an omnidirectional camera, its imaging geometry, and an example image in Fig. 3. The pixel coordinates of a 360° panorama correspond to the azimuth angle θ and the elevation angle φ of the viewing rays.
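For reference, a small sketch of the equirectangular mapping from a pixel to its viewing ray is given below; the exact pixel convention is an assumption.

```python
import math
import torch

def panorama_ray(u, v, width, height):
    """Map pixel (u, v) of a W x H equirectangular panorama to azimuth/elevation
    and the corresponding unit viewing direction."""
    theta = (u / width) * 2.0 * math.pi - math.pi          # azimuth in [-pi, pi)
    phi = math.pi / 2.0 - (v / height) * math.pi           # elevation in [-pi/2, pi/2]
    d = torch.stack([torch.cos(phi) * torch.cos(theta),
                     torch.cos(phi) * torch.sin(theta),
                     torch.sin(phi)], dim=-1)
    return theta, phi, d

theta, phi, d = panorama_ray(torch.tensor(512.0), torch.tensor(256.0), 1024, 512)
```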

4.1 Datasets and Evaluations

Synthetic Dataset. When the plenoptic function has been correctly (approximately) modeled, we want to freely move across the space to synthesize new views. For the purpose of performance evaluation, we need to sample evaluation viewpoints densely in the space and their corresponding ground truth data. Hence, we propose to synthesize a dataset for our evaluation. Following recent novel view synthesis methods, we use SSIM and PSNR for the performance evaluation. We use Blender [12] to synthesize images with freely moving camera viewpoints. Figure 3 shows the camera setting. Specifically, we randomly sample 125 points in a 50 × 50 × 50 cm3 volume within the space and synthesize corresponding omnidirectional images. The images are generated from four scenes, i.e., “Bar”, “Livingroom”, “Lounge” and “Diningroom”. We refer the readers to our qualitative comparisons for the visualization of sampled images from the four scenes. This evaluation set is adopted for all the experiments in this paper, although the input views and training methods might be changed.
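For completeness, PSNR on images normalized to [0, 1] can be computed as in the following sketch; SSIM is typically taken from a library such as scikit-image and is omitted here.

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

print(psnr(torch.rand(3, 512, 1024), torch.rand(3, 512, 1024)))
```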


Fig. 4. Qualitative comparison with NeRF and NeRF++ on our generated scenes “Lounge”. Our method generates sharper results than the comparison algorithms.

Real Dataset. To fully demonstrate the effectiveness of the proposed method, we also conduct experiments on real-world data. The real-world data we use are from [62], which sparsely captured two images per scene. We provide only qualitative results for visual evaluation; interested readers are encouraged to watch our supplementary video for more results.

4.2 Training Details

We train a separate plenoptic function for each scene. To approximate the sharp edge of real world objects and textures, our plenoptic function model usually has high frequency output in both the viewing rays and camera position. We encode the 5D input into Fourier features as the positional encoding [36] before feeding it into the proxy-depth reconstruction network. The PDR network FΘ is designed following the structure of NeRF. It consists of 8 fully-connected (fc) layers with 256 hidden channels, and a ReLU activation layer follows each fc


Table 2. Quantitative comparison with volume-based methods given two input views. Each entry is PSNR↑ / SSIM↑.

Method              Diningroom      Bar             Livingroom      Lounge
360SD-Net [62]      24.76 / 0.746   23.38 / 0.781   23.30 / 0.747   21.10 / 0.700
Ours (Vertical)     27.54 / 0.910   27.29 / 0.918   28.20 / 0.907   26.13 / 0.888
MatryODShka [2]     20.43 / 0.673   27.26 / 0.864   23.85 / 0.766   22.19 / 0.765
Ours (Horizontal)   30.50 / 0.921   28.20 / 0.918   29.07 / 0.907   27.68 / 0.898

Table 3. Training and testing time comparison with NeRF and NeRF++ given eight unstructured input views. The testing time is for rendering an image of resolution 512 × 1024.

Method        Training (hours)   Testing (seconds)
NeRF [36]     10                 30
NeRF++ [67]   20                 110
Ours          0.14               2

layer. We use the pretrained model from Shi et al. [49] as our feature extractor to compute photo-consistency. When training the MLP, we randomly sample a virtual camera at location x and draw an arbitrary viewing direction v. Given this 5D input, the MLP estimates a proxy depth d, which is then self-supervised by the photometric consistency loss L_d. The above network is end-to-end differentiable. Once the virtual camera domain has been sampled and trained thoroughly, the MLP for proxy-depth reconstruction is frozen for the subsequent training of the color blending network. The CBNet takes a series of color-direction pairs (c_i, d_i) and infers the output color of the plenoptic function. Its design follows the structure of PointNet [43]: the observations from the real cameras are first processed separately by three fc layers; a max-pooling layer then selects the most salient features; and a prediction layer finally generates the color value c. In our experiments, we use 200k rays per iteration for the PDR network training and 100k rays for the CBNet training. Our model is trained from scratch with an Adam optimizer and a learning rate of 5 × 10⁻⁴. The PDR network takes around 30k iterations to converge, while the CBNet takes only 10k iterations. The complete model takes around one hour to converge on an NVIDIA RTX 3090 GPU.

4.3 Comparison with the State-of-the-Art

Comparison with NeRF Variants. We conduct experiments to compare with NeRF and its variants NeRF++ [67], IBRNet [63]. In this comparison, all of the methods take eight views as input. The quantitative evaluation results are


Fig. 5. Qualitative comparisons of our method w or w/o Imaginary Eye Sampling (IES). Without using IES, the image synthesized at a position far from any real camera (top right) suffers much lower quality compared to the one closer to a real camera (top left) (2.74 dB drop). When the IES is applied, the quality of both images (bottom left and bottom right) improves, and the PSNR gap decreases (1.32 dB).

presented in Table 1. Visual comparison is presented in Fig. 4. It can be seen that our method achieves better performance than NeRF and NeRF++ in most of the scenarios. Other NeRF variants aim to estimate the radiance emitted by scene points at any position and direction, while our method is designed to recover the irradiance perceived by an observer from any point and direction. Since NeRF and NeRF++ need to sample points along viewing rays and render them in a back-to-front order, they require hundreds of network calls when synthesizing an image. Thus their rendering time is very long. In contrast, our method directly outputs the color information given a viewing ray. Thus, our training and testing time are relatively shorter, as shown in Table 3. Comparison with Color Blending Approaches. To demonstrate the effectiveness of our CBNet. We compare our CBNet with another image-based warping method FVS [44]. FVS is a neural network based color blending approach for view synthesis. We retrain their network on our datasets. The results are presented in the first row of Table 1. It is evident that our method achieves significantly better performance. Comparison with Panorama Synthesis Approaches. We employ a deepbased method 360SD-Net [62] to estimate depth maps for input images. We then build a point cloud from the depth map and input images. The point cloud are warped and refined for novel view synthesis. Since 360SD-Net only takes two vertically aligned panoramas as input, we take the same vertical inputs in this comparison, denoted as “Our (Vertical)” in Table 2. We further compare with a multi-sphere-images-based method MatryODShka [2] on view synthesis. Note that MatryODShka only takes two horizontally aligned panoramas as input. For fair comparison, we take the same input and denoted as “Our (Horizontal)” in Table 2. The numerical evaluations in Table 2 demonstrate that our method significantly outperforms the conventional depth-warp-refine and multi-sphereimages procedure in synthesizing new views. Besides, the competing methods


Table 4. Quantitative comparison of different imaginary eye sampling (IES) regions (large or small). A larger imaginary eye sampling space contributes to higher image quality. Each entry is PSNR↑ / SSIM↑.

IES Range   Lounge, N = 2    Lounge, N = 4    Livingroom, N = 2   Livingroom, N = 4
Small       24.82 / 0.8484   27.98 / 0.8995   26.77 / 0.8690      29.93 / 0.9144
Large       26.13 / 0.8883   29.13 / 0.9193   28.20 / 0.9068      31.27 / 0.9383

both require a structured input (horizontally or vertically aligned). This limitation does not apply to our method.

4.4 Effectiveness of Imaginary Eye Sampling

We demonstrate the necessity and effectiveness of the imaginary eye sampling strategy. In doing so, we train our network using only the real camera locations and directions, without any imaginary eye sampling, denoted as "Ours w/o Imaginary Eye". The quantitative results and qualitative evaluations are presented in Table 1 and Fig. 5, respectively. For better comparison, we select two images for visualization: one close to a real camera and the other far from the input cameras. As illustrated by the results, the performance of "Ours w/o Imaginary Eye" is inferior to our whole pipeline. More importantly, the performance gap between images near and far from a real camera is significant, with a 2.75 dB difference in terms of PSNR. This demonstrates that the model fits the training data well but cannot interpolate test data of similar quality. A network is usually good at learning a continuous representation from discrete but uniformly distributed samples. In our plenoptic modeling, the values of the input parameters (x, y, z, θ, φ) are continuous and span a large range, while the input images cover only small and sparsely sampled regions of the whole space. Hence, it is not surprising that the model fits the training data well while interpolating low-quality images at locations far from the real cameras. Using our imaginary eye sampling strategy, the performance gap between the two cases is reduced (to 1.32 dB in terms of PSNR). Furthermore, the quality of synthesized images at locations near a real camera is further improved. This is owed to our photometric-consistency self-supervision loss for the virtual eye training, which helps the learned model encode the geometry constraints across different viewpoint images. We also conduct comparison experiments on the size of the imaginary eye sampling area (large or small). The results are shown in Table 4. We find that sampling over a larger region allows more freedom in the moving space of the rendering cameras, at the cost of longer training time.


Table 5. Quantitative evaluations on different numbers of input views. Each entry is PSNR↑ / SSIM↑.

Scene        N = 2            N = 4            N = 8            N = 25
Lounge       26.13 / 0.8883   29.13 / 0.9193   34.31 / 0.9684   37.27 / 0.9775
Livingroom   28.20 / 0.9068   31.27 / 0.9383   34.33 / 0.9648   36.72 / 0.9746

4.5 Proxy-Depth from Colors and Features

In what follows, we conduct experiments to demonstrate why features are required in our photometric consistency loss. In doing so, we remove the feature term in Eq. 3 and train our model again, denoted as "Ours w/o Feature". The quantitative results are presented in the third-to-last row of Table 1. Compared to pixel-wise RGB colors, features have a larger receptive field that makes textureless regions discriminative, and the encoded higher-level information is more robust to illumination changes and other noise. Thus, the proxy depths reconstructed from both RGB colors and features are more accurate than those from RGB colors alone, and the quality of the synthesized images improves accordingly. We present qualitative illustrations in the supplementary material.

4.6 With or Without View-Direction Weighting

We also ablate the necessity of the view-direction-based weighting in Eq. 3. In this experiment, we set the weighting term s_i to one, denoted as "Ours w/o weighting". The results are presented in the second-to-last row of Table 1. Not surprisingly, the performance drops, which demonstrates the effectiveness of our weighting strategy.

4.7 Different Number of Input Views

Below, we conduct experiments on a different number of input views. For this experiment setting, we aim to investigate the performance difference when the input views are located on a line, a flat plane, and a cube, corresponding to 2, 4, 8, and 25 input views, respectively. Table 5 and Fig. 6 provide the quantitative and qualitative results, respectively. As shown by the results, when the input view number is reduced to 2, our method still generates acceptable quality novel view images. As the number of input views increases, the quality of the view synthesis improves rapidly. For 8 and 25 input views in this experiment, the input cameras are randomly sampled within the region (cube) of interest. This demonstrates that our method is not limited to structured settings and can synthesize free-viewpoint images from unstructured input images.


Fig. 6. Qualitative visualization of synthesized images by different view number and camera configurations. The three images are from our generated scenes “Bar”, “Livingroom” and “Diningroom” respectively.

5 Conclusion

Capturing a complete and dense plenoptic function from every point and angle within a space has been the “holy grail” for IBR-based view synthesis applications. There is always a tension between how densely one samples the space using many real cameras and the total efforts and cost that one has to bear in doing this task. This paper proposes a simple yet effective solution to this challenge. By placing thousands of imaginary eyes (virtual cameras) at randomly sampled positions in the space of interest, this paper proposes a new neuralnetwork-based method to learn (or to approximate) the underlying 5D plenoptic function. Real images captured by physical cameras are used as a teacher to train our neural network. Although those randomly placed imaginary eyes themselves do not provide new information, they are critical to the success of our method, as they provide a bridge to leverage the existing multi-view geometry relationship among all the views (of both real and virtual). Our experiments also validate this claim positively and convincingly. Our method produces accurate and highquality novel views (on the validation set) and compelling visual results (on unseen testing images). We will release the code and models in this paper. Limitations and Future Works: The training of the PDR network uses photo-consistency constraint. The proposed method may not work well when such constraint is violated, such as transparent and large areas of occlusion. While the followed CBNet can learn how to resolve these challenging scenarios to some extent, this is not a principled rule. We expect that more sophisticated solutions can be proposed in the future.


Acknowledgments. This research is funded in part by ARC-Discovery grants (DP190102261 and DP220100800).


3D-Yoga: A 3D Yoga Dataset for Visual-Based Hierarchical Sports Action Analysis

Jianwei Li1(B), Haiqing Hu1, Jinyang Li1, and Xiaomei Zhao2

1 Beijing Sports University, Beijing 100084, People's Republic of China {jianwei,hhq,ljy}@bsu.edu.cn
2 Shandong Jianzhu University, Jinan 250101, People's Republic of China [email protected]

Abstract. Visual-based human action analysis is an important research topic in computer vision and has great application prospects in sports performance analysis. Currently available 3D action analysis datasets have a number of limitations for sports applications, including the lack of specialized sports actions, of distinct class or score labels, and of sample variety. Existing research mainly uses specialized RGB videos for sports action analysis, but analysis with 2D features is less effective than with a 3D representation. In this paper, we introduce a new 3D yoga pose dataset (3D-Yoga) with 3,792 action samples and 16,668 RGB-D key frames, collected from 22 subjects performing 117 kinds of yoga poses in front of two RGB-D cameras. We reconstruct 3D yoga poses from sparse multi-view data and carry out experiments with the proposed cascade two-stream adaptive graph convolutional neural network (Cascade 2S-AGCN) to recognize and assess these poses. Experimental results show the advantage of applying our 3D skeleton fusion and hierarchical analysis methods on 3D-Yoga, and the accuracy of Cascade 2S-AGCN outperforms the state-of-the-art methods. The introduction of 3D-Yoga will enable the community to apply, develop and adapt various methods for visual-based sports activity analysis.

Keywords: Yoga pose · Action analysis · 3D dataset · Human motion capture

1 Introduction

Human action analysis is an important and challenging problem in the field of computer vision, and in recent years it has been widely applied in intelligent sports. Intelligent sports action analysis can help improve athletes' competitive ability or promote scientific public fitness, and it usually relies on human motion capture (Mocap) technology to obtain 3D human movements. Traditional inertial and optical motion capture systems can track and record human motion well, but they need to bind sensors or paste markers on the human body, which may affect the motion itself; such systems have not been popularized for public exercise. Therefore, visual-based markerless motion capture technologies have been increasingly researched and used in human sports analysis.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_4.


In recent years, with the development of deep learning and large human action datasets, visual-based human action analysis has made remarkable progress. Human 3.6M [9] and NTU RGB+D [19] are two well-known large-scale 3D datasets, which are potential resources for deep learning methods for regular human action recognition. Human 3.6M has 3.6 million human poses and corresponding RGB+D images, acquired by recording the performance of 11 subjects under 4 different viewpoints. NTU RGB+D 120 contains 114,480 video sequences and over 8 million frames, and has 120 action classes performed by 106 subjects captured from 155 views with Kinect cameras. To meet the increasing application requirements in intelligent sports, several human professional sports datasets (HPSDs) have also been presented in recent years. However, the data of most HPSDs are RGB images or videos collected from the Internet [16, 23, 29, 34], and no 3D skeleton is provided. Deep learning, the current mainstream approach, needs large-scale human motion data to train good models, which limits its effectiveness for sports action analysis. The performance of motion scoring or quality evaluation algorithms for sports training is still below current application requirements. This is especially true for home fitness activities such as yoga, which are highly popular but lack scientific guidance and feedback. Besides, a main challenge in sports action analysis is the accurate recovery of 3D human motion, and how to obtain an accurate 3D human pose and analyze human actions with limited information (e.g., missing data caused by occlusion) needs further study. To address the above issues, this paper proposes a hierarchical 3D yoga pose dataset called 3D-Yoga, which consists of 117 kinds of poses with 3,792 action samples, where each sample includes RGB-D images, human skeletons, a pose label and a quality score. Compared with existing sports action datasets, 3D-Yoga has several appealing properties: i) poses in 3D-Yoga are quite complex and challenging for 3D human pose estimation; ii) data in 3D-Yoga is manually corrected and originally organized with hierarchical classification labels and pose quality scores; iii) 3D-Yoga can be applied both to visual-based action recognition and to action quality assessment tasks. To the best of our knowledge, 3D-Yoga is the first 3D sports dataset that covers both action recognition and quality assessment task types and provides the corresponding RGB-D images and 3D skeletons. Based on 3D-Yoga, we propose a 3D skeleton fusion method to alleviate the occlusion problem and a hierarchical action analysis method for complex yoga poses. In summary, the main contributions of this work are the following:
– A new 3D sports action dataset with 117 categories of yoga poses performed by 22 subjects in various indoor environments;
– A sparse multi-view data alignment method to reconstruct 3D yoga poses and address the severe self-occlusion problem;
– A hierarchical sports action analysis method, based on a cascade graph convolutional neural network, for yoga pose classification and assessment.

2 Related Work 2.1

Skeleton-Based Action Recognition

Human action recognition (HAR) aims to identify what the action is, including action detection and action classification. Deep learning is currently the mainstream method


for skeleton-based action recognition, where the most widely used models are RNNs and CNNs. RNN-based methods [5, 6, 19, 20, 31, 38] usually model the skeleton data as a sequence of the coordinate vectors along both the spatial and temporal dimensions, where each vector represents a human body joint. CNN-based methods [12, 14, 18, 33] generally model the skeleton data as a pseudo image based on the manually designed transformation rules. Both of the RNN-based and CNN-based methods fail to fully represent the structure of the skeleton data because the skeleton data are naturally embedded in the form of graphs rather than a vector sequence or a 2D grid. In recent years, graph convolutional networks (GCNs), which generalize convolution from image to graph, have been successfully adopted in many applications. ST-GCN [36] proposes a dynamic skeleton model which can automatically learn both the spatial and temporal patterns from images, and demonstrated to be effective in learning both spatial and temporal dependencies on skeleton graphs. Thus many improvements based on ST-GCN have emerged, such as ST-TR [28], 2S-AGCN [30], CTR-GCN [3], and so on. ST-TR models dependencies between joints using a spatial self-attention module to understand intra-frame interactions between different body parts and a temporal self-attention module to model inter-frame correlations. 2S-AGCN improves the STGCN method by splitting the adjacency matrix representing action features into three, already containing richer behavioral information. That includes both the first and second features of the skeleton data, which represent the joint coordinates and length and direction of the bone, respectively. CTR-GCN dynamically learns different topologies and effectively aggregates joint features in different channels for skeleton-based action recognition. For sports action recognition, Li et al. [15] introduce an efficient fitness action analysis based on 2D spatio-temporal feature encoding, which can be applied in artificial intelligence (AI) fitness system. Aifit [7] introduces an automatic 3D humaninterpretable feedback models for fitness training. HDVR [8] proposes a hierarchical dance video recognition framework by estimating 3D human pose from the corresponding 2D human pose sequences, but the accuracy is limited because of the lack of groundtruth 3D annotations for training the proposed semi-supervised method. 2.2 Visual-Based Action Quality Assessment Human action quality assessment (AQA) aims to automatically quantify the performance of the action or to score its performance. General methods deployed to compare action sequences are based on estimating the distance error or dynamic time warping. Deep learning methods for AQA can be divided into RGB video-based methods and skeletonbased methods. Algorithms based on RGB video generally extract features directly from images through deep learning models, such as C3D [26], I3D [1], and TSN [35], and then extract time domain features by LSTM, pooling, and so on. The final score prediction is performed by a fully connected neural network. ScoringNet [17] and SwingNet [23] are all based on such methods, and support fine-grained action classification and action scoring. These methods mainly focus on the visual activity information of the whole scene including the performer’s body and background, but ignore the detailed joint interactions. 
Skeleton-based methods firstly detect the human skeleton in the image or video, and then model the correlation information between human joints, so as to realize human motion modeling and motion quality evaluation. Pan et al. [24] propose to learn


Fig. 1. The capturing of yoga poses in 3D-Yoga. (a) Scene layout. (b) Samples of yoga pose. (c) Scene elements: 1 and 2 are two indoor scenes; A, B, C, D, E, and F are the textures of the wall; a, b, c, and d are the textures of yoga mates. (d) Proportions of three light types. (e) Proportions of four cloth types.

the detailed joint motion based on the joint relation, which consists of a joint commonality module modeling the general motion for certain body parts and a joint difference module modeling the motion differences within body parts. These methods can better evaluate the motion information of human body when the skeleton joint is accurate and have good interpretability. SportsCap [2] introduces a multi-stream ST-GCN method to predict a fine-grained semantic action attributes, and adopts a semantic attribute mapping block to assemble various correlated action attributes into a high-level action label for the overall detailed understanding of the whole sequence, so as to enable various applications like sports action assessment or motion scoring. However, the athletes are often in very unusual postures (such as folding and bending), which leads to poor effect of human skeleton model on 2D sports data. 2.3

Sports Action Dataset

UCF-sport [32] is the first sports action dataset, and contains close to 200 action video sequences collected from various sports which are typically featured on broadcast television channels such as BBC and ESPN. Since then a number of sports motion datasets [16, 23, 29, 34] used for action recognition have emerged. Diving48 [16] is a fine-grained 2D dataset for competitive diving, consisting of 18K trimmed video clips of 48 unambiguous dive sequences. Each dive sequence is defined by a combination of takeoff, movements in flight, and entry. GolfDB [23] is a 2D dataset created for general recognition applications in the sport of golf, and specifically for the task of golf swing sequencing. FineGym [29] provides coarse-to-fine annotations both temporally and semanti-


cally for gymnastics videos. There are three levels of categorical labels, and the temporal dimension is also divided into two levels, i.e., actions and sub-actions. Yoga-82 [34] is a hierarchical 2D dataset for large-scale yoga pose recognition with 82 classes. Each image contains one or more people doing the same yoga pose, and the picture information is complex, involving different backgrounds, different clothes and colors, and different camera view angles. Pose tutor [4] curates two other fitness datasets: Pilates32, and Kungfu-7 datasets, and combines vision and 2D pose skeleton models in a coarse-to-fine framework to obtain pose class predictions. Some 2D datasets used for AQA task also have been proposed, such as MIT Olympic sports dataset [27], Nevada Olympic sports dataset [26], and AQA-7 [25], which predict the sports performance of diving, gymnastic vault, etc. FSD-10 [21] is a figure skating dataset for fine-grained sports content analysis, which collects 1,484 clips from the worldwide figure skating championships of 10 different actions in females and males programs. Existing sports action datasets used for AQA are mainly based on publicly available RGB images or videos, but have no 3D human skeleton pose. Action analysis methods with them may focus on the global texture features and tend to ignore the motion relationship within the human skeleton joints.

3 3D Yoga Dataset In this section, we propose a new yoga pose dataset 3D-Yoga. We will introduce 3DYoga from data capturing, pose classification, pose assessment and data organization. The skeleton data will be made publicly available at https://3DYogabsu.github.io. 3.1 Yoga Pose Capturing For yoga pose capturing, we deploy 22 subjects to perform yoga actions on the daily yoga (https://www.dailyyoga.com.cn) in two indoor scenes. The yoga poses are captured by two Microsoft Kinect Azure cameras from front view and side view simultaneously. As illustrated in Fig. 1 (a), two cameras are mutually orthogonal in each scene. The distances between two cameras and the center of yoga mat are 250 cm and 270 cm, respectively. Both cameras are 70 cm above the ground. Figure 1 (b) shows some image samples in 3D Yoga dataset. The top shows images captured from the front view, while the corresponding images collected from the side view are shown at the bottom. To achieve a variety of data samples, environments of the scenes are changed during the capturing. In total, there are five backgrounds, three ambient lights, and four yoga mats. These scene elements are shown in Fig. 1 (c): 1–2 are two indoor scenes, A-F are the textures of the wall, and a-d are the textures of yoga mates. Figure 1 (d) shows proportions of high-light (67%), low-light (18%) and polarized light (15%) (light source on the side). 22 subjects (7 male and 15 female), ranging in age from 18 to 50, are asked to perform 158 consecutive actions in beauty back build plan, relaxation sleep plan and menstrual period conditioning plan. Each subject has different BMI and yoga skill, and wears various styles of clothes. Figure 1 (e) shows proportions of leisure wear (29%), yoga wear (14%), sports wear (43%) and sleep wear (14%). More specific information of the subjects is shown in the supplementary material.

Table 1. Design of the two-level hierarchical classification for yoga poses.

3.2 Pose Classification There are 158 yoga movements performed by each subject following four exercise sets in daily yoga. Since some yoga poses are repeated, such as hunker pranayama appearing three times, we merge the same movements. The final categories of yoga poses in 3D-Yoga are adjusted to 117, and each pose is different but covered all yoga formulas. As shown in Table 1, we design a two-level hierarchy to organize these 117 categories of yoga poses, in which the first level has 10 categories (listed in the first column) and the second level are the sub-categories (labeled in the third column) of yoga poses. For the first level classification, we provide their names, detailed definitions, corresponding labels of second-level, and posture examples. The second level classification is the detailed division of first level classification, which is described in the supplementary material with labels and specific pose names.
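In code, this two-level hierarchy boils down to a mapping from second-level (fine) pose labels to first-level categories. The sketch below is a minimal illustration assuming a hypothetical mapping file and column layout; the actual label-to-category assignment follows Table 1 and the supplementary material.

```python
import csv

def load_label_hierarchy(path="label_hierarchy.csv"):
    """Load a fine-to-coarse label mapping.

    Assumed (hypothetical) CSV layout: one row per second-level pose,
    e.g. "a080,A03" meaning that sub-action a080 belongs to first-level
    category A03. The real assignment is defined in Table 1 and the
    supplementary material, not here.
    """
    fine_to_coarse = {}
    with open(path, newline="") as f:
        for fine_label, coarse_label in csv.reader(f):
            fine_to_coarse[fine_label] = coarse_label
    return fine_to_coarse

def to_coarse(fine_labels, fine_to_coarse):
    # Map a list of second-level labels to their first-level categories.
    return [fine_to_coarse[label] for label in fine_labels]
```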


Table 2. Scoring examples for three subjects performing two different yoga poses. S denotes strength, B denotes balance, and P&T denotes pliable and tough.

3.3 Pose Assessment

To provide a benchmark with domain knowledge for yoga pose quality evaluation, three experienced yoga coaches (holding coach certificates) completed the scoring for each subject. Each sample is evaluated with two indicators, i.e., a difficulty coefficient and a completion score. The difficulty coefficient is obtained by referring to the fitness yoga posture standard released by the national health yoga steering committee. The completion scoring standard is defined by the coaches and has four levels (0–3); the completion score ranges from 0 to 3 in terms of strength (S), balance (B), and pliability and toughness (P&T). Table 2 gives scoring examples for three subjects performing two yoga poses, i.e., split boat pose and easy warrior III pose. The detailed criteria of yoga pose quality assessment are described in the supplementary material. Since the scores given by different coaches for the same sample may vary, we compute their average as the completion score. The final evaluation score $Score^{level}_{pose}$ for each sample is the product of the completion score and the difficulty coefficient:

$Score^{level}_{pose} = P_{level} \times C_{pose}$,   (1)

where $P_{level}$ denotes the difficulty coefficient and $C_{pose}$ denotes the completion score.

3.4 Data Organization

There are 3,792 action samples and 16,668 key frames in 3D-Yoga, and the organization is shown in Fig. 2. Under the Scene directory, there are three folders: Front, Side and Docs. Under Docs, we provide the file list, pose scores and camera calibration information. There are 22 subject folders under both the Front and Side folders: the 7 male subjects are represented as M01, M02, ..., M07, and the 15 female subjects as F01, F02, ..., F15. The action label (A01, A02, ..., A10) gives the folder names of classification I, and the sub-action label (a01, a03, ..., a117) gives the folder names of classification II. In each sub-action folder, corresponding to a motion segment, there are three sub-folders, i.e., Color, Depth and Skeleton.
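As a small, non-authoritative illustration of how samples could be enumerated from this directory layout, consider the following Python sketch; the root path and glob patterns are assumptions based only on the folder names described above.

```python
from pathlib import Path

def enumerate_samples(scene_root):
    """Yield (view, subject, action, sub_action, skeleton_dir) tuples.

    Assumes the layout described above: Front/ and Side/ view folders,
    then subject folders (M01..M07, F01..F15), first-level action folders
    (A01..A10), sub-action folders (a01..a117), each containing Color/,
    Depth/ and Skeleton/ sub-folders.
    """
    scene_root = Path(scene_root)
    for view in ("Front", "Side"):
        for skel_dir in sorted((scene_root / view).glob("*/A*/a*/Skeleton")):
            sub_action = skel_dir.parent.name                  # e.g. "a05"
            action = skel_dir.parent.parent.name               # e.g. "A02"
            subject = skel_dir.parent.parent.parent.name       # e.g. "F01"
            yield view, subject, action, sub_action, skel_dir

# Example usage: count samples per view (the scene path is hypothetical).
if __name__ == "__main__":
    from collections import Counter
    print(Counter(view for view, *_ in enumerate_samples("./Scene1")))
```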


Fig. 2. Schematic representation of directory organization of the 3D-Yoga dataset.

The original unregistered full-resolution depth (640 × 574) and color (1920 × 1080) images are captured by different sensors of the Kinect camera, while the skeletal joints are obtained through the bone tracking technology in the Kinect SDK. To make the data easily applicable in existing implementations, we provide both the color and the depth at a resolution of 1280 × 720 and register them in the color camera coordinate system. The human body skeleton is composed of the three-dimensional position coordinates of the 32 main body joints and is also aligned in the color camera coordinate system. Skeleton data is stored in CSV format and contains time stamps, person IDs, joint IDs, 3D spatial coordinates and quaternions of the joints, confidences, and 2D pixel coordinates of the joints. We compare the 3D-Yoga dataset with state-of-the-art HPSDs, and Table 3 shows the comparison in terms of action classes, sample numbers, data types, sources and analysis tasks. 3D-Yoga is built through Mocap and provides more data types, and it can be used for both action recognition and AQA tasks.
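To make the stored fields concrete, here is a minimal sketch of parsing one skeleton CSV into arrays. Only the set of stored quantities comes from the description above; the column names and their order are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real files store these quantities,
# but the naming and ordering here are assumptions.
COLUMNS = ["timestamp", "person_id", "joint_id",
           "x", "y", "z", "qw", "qx", "qy", "qz",
           "confidence", "px", "py"]

def load_skeleton_csv(path, num_joints=32):
    """Return (T, num_joints, 3) joint positions and (T, num_joints) confidences."""
    df = pd.read_csv(path, names=COLUMNS)
    positions, confidences = [], []
    for _, frame in df.groupby("timestamp", sort=True):
        frame = frame.sort_values("joint_id")
        if len(frame) != num_joints:          # skip incomplete frames
            continue
        positions.append(frame[["x", "y", "z"]].to_numpy())
        confidences.append(frame["confidence"].to_numpy())
    return np.stack(positions), np.stack(confidences)
```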

Table 3. Comparison with the state-of-the-art professional sports datasets. R denotes action recognition task, and AQA denotes action quality assessment task.

Datasets        Classes  Samples    Data types       Sources          Tasks
UCF-sport [32]  10       150        RGB              BBC/ESPN         R
Sport-1M [11]   487      1,133,158  RGB              YouTube          R
Diving48 [16]   48       18,404     RGB              Internet         R
Yoga-82 [34]    82       28,478     RGB              Bing             R
FineGym [29]    99/288   ∼30K       RGB              Internet         R
FSD-10 [21]     10       1,484      RGB              YouTube          R&AQA
3D-Yoga         10/117   3,792      RGB-D+Skeleton   Mocap (Kinects)  R&AQA

4 Yoga Pose Analysis

The pipeline of the proposed yoga pose analysis method is shown in Fig. 3, which consists of data pre-processing and hierarchical pose analysis. The innovative points are elaborated in the following subsections.

Fig. 3. The pipeline of our proposed hierarchical yoga pose analysis method.

4.1 Data Pre-processing

To obtain a complete and accurate 3D skeleton pose for yoga pose analysis, we process the action sequences mainly through the following three steps.

Camera Calibration: The Kinect cameras are calibrated before yoga pose capturing. More than 100 checkerboard images are selected for each calibration. The intrinsic matrix of each camera is obtained from the Kinect Azure SDK, while the geometric relationship between the front-view and side-view cameras is computed with the stereo camera calibration tool in Matlab. The checkerboard has a 10 × 15 grid, and the actual side length of each square is 5 cm. The average re-projection error of the stereo camera calibration is 2.81 pixels.

Key Frame Extraction: Considering that most yoga movements are static processes, we use a key-frame-based method to analyze yoga poses. For each captured data sequence, we first carry out coarse extraction based on image similarity, and then extract frames of each pose based on the confidence and depth values of the human skeleton. For each yoga action, we manually select the best time-synchronized key frames captured by the two Kinects and retain the key frame pairs with the most obvious features. For a few dynamic yoga poses with different sequence lengths, we select multiple key frames to describe the entire movement.

3D Skeleton Fusion: For each pair of key frame data, the alignment of the 3D skeleton pose is executed as follows:
– First, we align the 3D points from the front and side views through the ICP algorithm [37], using the camera calibration result as the initial value. The 3D points are computed from the corresponding depth images and camera intrinsic parameters. The transformation matrix $T_{fs}$ between the two views (from front to side) is obtained after ICP registration.


– Second, the 3D skeletons are fused using the transformation information; the $i$-th joint $S_i^{fused}$ of the fused skeleton is calculated as

$S_i^{fused} = \dfrac{T_{fs} W_{fi} S_{fi} + W_{si} S_{si}}{W_{fi} + W_{si}}, \quad i \in (0, 31)$,   (2)

where $S_{fi}$ and $S_{si}$ are the $i$-th joint coordinates in the front and side camera coordinate systems, and $W_{fi}$ and $W_{si}$ are the corresponding weights, determined by the joint angle and data quality and defined as

$W_i = \dfrac{\log_4\!\left(\frac{c_i}{z_i+1}\right) + 1}{1 + e^{\frac{\theta_i - 90}{10}}}, \quad \theta_i \in (0, 180)$,   (3)

where $z_i$, $\theta_i$ and $c_i$ are the depth value, joint angle and confidence coefficient, respectively. A joint with high confidence that is close to the camera receives a larger fusion weight. Equation 3 is a refinement of Jiang et al. [10].
– Finally, the 3D skeletons are further optimized by embedding a parametric human model [22] and minimizing the following energy function:

$E_{fused}(\theta, \beta) = w_{pro} E_{pro} + w_{shape} E_{shape}$,   (4)

where $E_{pro}$ is the data term aligning the 2D projections on each view to the 3D joints, and $E_{shape}$ penalizes deviation from the human shape prior. $w_{pro}$ and $w_{shape}$ are balancing weights, both set to 1 in our experiments. $\theta$ and $\beta$ are the two optimized parameters, which control the bone length and the joint posture, respectively.
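The per-joint fusion of Eqs. (2)–(3) can be sketched in NumPy as follows. Note that the exact form of Eq. (3) is reconstructed from the garbled layout of this copy, so the weighting function below should be read as an approximation of the authors' formula rather than a verified reimplementation; the SMPL-based refinement of Eq. (4) is not included.

```python
import numpy as np

def joint_weight(z, theta, c):
    """Per-joint fusion weight (Eq. 3, as reconstructed above).

    z: joint depth, theta: joint angle in degrees (0-180), c: confidence.
    Confident joints that are close to the camera get larger weights.
    """
    return (np.log(c / (z + 1.0)) / np.log(4.0) + 1.0) / \
           (1.0 + np.exp((theta - 90.0) / 10.0))

def fuse_skeletons(S_f, S_s, W_f, W_s, T_fs):
    """Fuse front- and side-view skeletons in the side camera frame (Eq. 2).

    S_f, S_s: (32, 3) joint positions in the front / side camera frames.
    W_f, W_s: (32,) per-joint weights from joint_weight().
    T_fs:     (4, 4) rigid transform from the front to the side camera
              (stereo calibration refined by ICP).
    """
    S_f_h = np.hstack([S_f, np.ones((len(S_f), 1))])   # homogeneous coordinates
    S_f_in_side = (T_fs @ S_f_h.T).T[:, :3]            # front joints, side frame
    w_f, w_s = W_f[:, None], W_s[:, None]
    return (w_f * S_f_in_side + w_s * S_s) / (w_f + w_s)
```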

4.2 Hierarchical Analysis

For yoga pose hierarchical analysis, we design the Cascade 2S-AGCN with a cascade structure and 2S-AGCN models to realize the coarse-grained to fine-grained yoga pose classification and specific yoga pose quality evaluation. Cascade Structure: Cascade 2S-AGCN contains three stages and consists of a main network and three branches (two for pose classification and one for pose assessment). The main network is modelled after AlexNet [13] and each branch is a 2S-AGCN model. The details of Cascade 2S-AGCN are described in the supplementary material. After completing the former stage by 2S-AGCN, we choose a recall rate threshold to remove error samples that will not be used to train later stages. The advantage of this filtering mechanism is that since the main and branch networks are in a unified framework, the feature maps extracted at the beginning can be shared throughout the network, rather than features being collected at each layer of the network from the original data. Graph Construction: Based on human skeleton model, we construct the skeleton information undirected graph with human skeleton link rule, in which the origin represents the key point, and the line segments represent the connection relationship of each


joint point. The input of Cascade 2S-AGCN has two streams: the B-stream and the J-stream. For the J-stream, we use a spatio-temporal graph to model the structured information among the joints along the spatial and temporal dimensions. The structure of the graphs contains not only the joint coordinates but also the spatial constraints between adjacent key points. For the B-stream, the input data are the lengths and directions of the bones. We set the middle point of the pelvis as the central point, the joint near the central point as the source joint, and the joint away from the central point as the target joint. Thus the joint is the key point, the bone is the vector from the source to the target joint, the length of the vector is the length of the bone, and the direction of the vector is the direction of the bone. The main formula for the adaptive graph convolution is:

$f_{out} = \sum_{k}^{K_v} W_k f_{in} (A_k + B_k + C_k)$,   (5)

where $f$ denotes the feature map of the graph convolution, $W_k$ is the weight of the 1 × 1 convolution operation, and $K_v$ denotes the number of subsets. $A_k$ is an $N \times N$ adjacency matrix ($N$ denotes the number of vertices) and represents the physical structure of the human body. $B_k$ is also an $N \times N$ adjacency matrix similar to $A_k$, but the elements of $B_k$ are parameterized and optimized together with the other parameters during training. $C_k$ is a graph unique to each key frame that uses a classical Gaussian embedding function to capture the similarity between joints. The input data is of size $C_{in} \times T \times N$ and is encoded into $C_e \times T \times N$ with two embedding functions, $\theta$ and $\phi$. Since the number of parameters of the 1 × 1 convolution is quite large, the encoded features are reshaped into $N \times C_e T$ and $C_e T \times N$ through two conversion functions. These two matrices are multiplied to obtain the $N \times N$ similarity matrix $C_k$. Element $C_k^{i,j}$ of $C_k$ represents the similarity between vertices $v_i$ and $v_j$. Therefore, $C_k$ is calculated from $\theta$ and $\phi$ as:

$C_k = \mathrm{softmax}(f_{in}^{T} W_{\theta k}^{T} W_{\phi k} f_{in})$,   (6)

where T denotes the temporal length, Wθ and Wφ are the parameters of θ and φ. Pose Recognition and Assessment: The yoga pose recognition task is constructed with the first and second stages of the cascade network to obtain a coarse-to-fine pose classification. For example, the sitting leg up and seated butt lift back-bending both are recognized as Bending at the first stage, and then identified as their respective subactions (label 80 and label 96) at the second stage. The yoga pose quality for each sample is evaluated with the pose completion score and difficulty coefficient. We firstly use the third stage of the cascade network to predict a completion score for each pose performed by a subject, and then multiply the difficulty coefficient score based on the yoga level. For example, if the predicted completion score of boat pose (Plevel = 3) is 2, the final evaluation score is 6.
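For clarity, the adaptive graph convolution of Eqs. (5)–(6) can be sketched as a PyTorch module in the spirit of 2S-AGCN: $A_k$ is the fixed skeleton adjacency, $B_k$ a learned adjacency, and $C_k$ a data-dependent similarity obtained from two 1 × 1 embedding convolutions followed by a softmax. This is an illustrative sketch, not the authors' released implementation; residual connections, normalization and the temporal convolution are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """One adaptive graph convolution unit: f_out = sum_k W_k f_in (A_k + B_k + C_k)."""

    def __init__(self, in_channels, out_channels, A, embed_channels=16):
        super().__init__()
        K, N, _ = A.shape                                   # K subsets, N joints
        self.register_buffer("A", A)                        # A_k: physical skeleton graph
        self.B = nn.Parameter(torch.zeros(K, N, N))         # B_k: learned adjacency
        self.theta = nn.ModuleList([nn.Conv2d(in_channels, embed_channels, 1) for _ in range(K)])
        self.phi = nn.ModuleList([nn.Conv2d(in_channels, embed_channels, 1) for _ in range(K)])
        self.W = nn.ModuleList([nn.Conv2d(in_channels, out_channels, 1) for _ in range(K)])

    def forward(self, x):                                   # x: (batch, C_in, T, N)
        b, _, t, n = x.shape
        out = 0
        for k in range(self.A.shape[0]):
            # C_k: data-dependent joint similarity (Eq. 6).
            q = self.theta[k](x).reshape(b, -1, n)          # (batch, C_e*T, N)
            v = self.phi[k](x).reshape(b, -1, n)            # (batch, C_e*T, N)
            C_k = F.softmax(torch.einsum("bcn,bcm->bnm", q, v), dim=-1)
            adj = self.A[k] + self.B[k] + C_k               # broadcasts to (batch, N, N)
            y = torch.einsum("bctn,bnm->bctm", x, adj)      # aggregate over joints
            out = out + self.W[k](y)                        # 1x1 conv plays the role of W_k
        return out

# Usage sketch (shapes are illustrative): 3 subsets, 23 fused joints, (x, y, z) channels.
# layer = AdaptiveGraphConv(3, 64, torch.rand(3, 23, 23))
# y = layer(torch.rand(8, 3, 150, 23))                      # -> (8, 64, 150, 23)
```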


Fig. 4. Results of 3D skeleton fusion for a common yoga pose. The left column shows input data, and the right column shows output data.

5 Experiments and Analysis

In this section, we conduct experiments and analyses with the proposed yoga pose analysis methods on 3D-Yoga. For all experiments, we run our methods on a standard desktop PC with an Intel Core i7-7700 3.6 GHz CPU.

5.1 Implementation Details

Training Details: As introduced in Sect. 4.2, there are three types of graphs in the adaptive graph convolutional block (A, B, C). Three-axis data is fed to the 2S-AGCN model, in which TCN and TGN are the two predominant sub-models that extract time-series features and graph features from the raw data. The predicted result is the class with the maximum probability of the softmax classifier, obtained through the argmax function. In our experiments, cross-entropy and AdamW are selected as the loss function and optimizer, respectively, and the ReLU activation function is used to avoid vanishing gradients. During training, we set the learning rate to 1e-3, the seed to 42, weight_decay to 0.1, betas to (0.9, 0.999), the batch size to 32, and the number of epochs to 80.
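A minimal sketch of this optimization setup is shown below; `model` and `train_loader` stand in for the Cascade 2S-AGCN branch being trained and a 3D-Yoga data loader (batch size 32), which are assumed rather than shown.

```python
import torch
import torch.nn as nn

def train(model, train_loader, device="cuda", epochs=80):
    """Training loop reflecting the hyper-parameters listed above (a sketch)."""
    torch.manual_seed(42)
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                                  weight_decay=0.1, betas=(0.9, 0.999))
    for epoch in range(epochs):
        model.train()
        total, correct, loss_sum = 0, 0, 0.0
        for skeletons, labels in train_loader:
            skeletons, labels = skeletons.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(skeletons)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            loss_sum += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        print(f"epoch {epoch + 1:02d}: loss={loss_sum / total:.4f} acc={correct / total:.4f}")
```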


Table 4. Analysis accuracies (%) of yoga poses in 3D-Yoga. The best results are in bold.

Stages  Categories      Number  Front  Side   Combined  Fused
I       Coarse-grained  10      75.60  71.87  77.18     80.42
II      Dynamic         4       77.6   68.86  78.14     73.73
        Sitting         11      74.65  69.91  85.64     87.29
        Inversion       2       100    93.48  100       92.31
        Standing        8       83.77  72.08  81.81     90.40
        Revolve         10      76.98  79.86  77.70     73.56
        Prone           14      61.68  65.87  76.05     60.68
        Support         8       54.17  62.50  60.41     53.84
        Balance         12      82.96  82.22  80.00     92.30
        Bending         40      64.09  62.19  67.68     75.52
        Kneeling        8       60.43  69.06  66.91     69.33
        Fine-grained    117     76.10  74.13  74.92     82.94
III     Score           4       71.40  70.97  65.43     72.57

Experimental Data: To compare the performance of our methods, we have carried out experiments with four types data of 3D-Yoga: 1) 2.5D skeletons of front view; 2) 2.5D skeletons of side view; 3) combined 2.5D skeletons (both of front and side views); 4) fused 3D skeleton. We use 32 joints model provided by Kinect for experiments (1–3) and 23 joints provided by the SMPL model for the experiment (4). To ensure a better training effect, we split the dataset into a training set with 16 subjects and a validation set with 6 subjects, and use hierarchical sampling in the pose labels. 5.2 Ablation Studies Yoga Pose Fusion: Due to human body self-occlusion, 2.5D skeleton data captured by the front or side camera always has some errors. In order to get an accurate 3D skeleton, we align 2.5D skeletons with the proposed method in Sect. 4.1. Figure 4 visualizes the fused results of 3D skeleton from the front view and side view for a common yoga pose. The left column shows color images and 2.5D skeletons (in camera coordinate system) captured by the front and side Kinects, and the right column shows 3D skeletons after ICP registration, fusion and optimization respectively. The main distinctions are circled in red and blue on the Figure. The contrasts show that we have effectively restored some 3D skeleton joints after the operations of ICP registration, fusion and optimization. The data quality analysis for 3D-Yoga dataset before and after 3D skeleton fusion is presented in supplementary material. Yoga Pose Analysis: Table 4 shows the quantitative analysis accuracies of yoga poses with Cascade 2S-AGCN in terms of front view skeletons, side view skeletons, combined skeletons and fused skeletons. The first line is the recognition results of Classification I in the first stage, lines 2 through 13 are the recognition results of each sub-action categories and the average accuracies of Classification II in the second stage, and the bottom line is the prediction accuracy of completion score for each yoga pose in the third stage. Number is the kinds of yoga poses in each category. Except for individual


Table 5. Comparison of recognition accuracies (%) with the state-of-the-art methods on 3D-Yoga. The best results are in bold.

Methods        Front  Side   Combined  Fused
ST-GCN [36]    53.65  56.56  56.25     58.33
2S-AGCN [30]   58.90  56.86  55.82     66.84
CTR-GCN [3]    73.02  76.40  66.24     72.93
Ours           76.10  74.92  74.13     82.94

rare poses, most recognition accuracies obtained with the combined and fused skeletons are higher than those obtained with the front-view or side-view skeletons alone. With our fused 3D skeletons, the coarse-grained recognition (Classification I) accuracy reaches 80.42%, the fine-grained recognition (Classification II) accuracy is 82.94%, and the prediction accuracy for the pose completion score is 72.57%. This verifies that the average performance of our yoga pose analysis method is improved by the fused 3D skeletons. More experimental results and analyses are provided in the supplementary material.

5.3 Comparison with Other Methods

We also compare Cascade 2S-AGCN with the state-of-the-art methods on 3D-Yoga. Table 5 shows the comparison of recognition accuracies (%) over the 117 categories with the related methods ST-GCN [36], 2S-AGCN [30], and CTR-GCN [3]. We run these methods on an NVIDIA RTX 1080Ti GPU and use the same number of epochs (80) in the comparison experiments. It can be seen that Cascade 2S-AGCN outperforms the other methods and that the accuracies of all methods are improved by using the fused 3D skeletons. A detailed version is provided in the supplementary material.

6 Conclusions

In this work, we present a new 3D-Yoga dataset with RGB-D images, 3D skeletons, multi-level classification labels and pose scores. To evaluate 3D-Yoga, we perform a cascade graph-based convolution network to recognize coarse-grained to fine-grained yoga poses and assess the quality of each pose. Experiments have been carried out for hierarchical yoga pose analysis, and the results show that the proposed pose recognition and assessment methods achieve good performance. The introduction of 3D-Yoga will enable the community to apply, develop and adapt various deep learning techniques for visual-based sports activity analysis. We will further research more efficient and robust methods for sports action analysis with 3D-Yoga.

Acknowledgment. This work is assisted by Siqi Wang, Rui Shi, Ruihong Cheng, Yongxin Yan and Jie Liu, five students from the School of Sport Engineering of Beijing Sports University, who participated in data acquisition. This work is supported by the Open Projects Program of National Laboratory of Pattern Recognition under Grant No. 202100009, and the Fundamental Research Funds for Central Universities No. 2021TD006.


References 1. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017) 2. Chen, X., Pang, A., Yang, W., Ma, Y., Xu, L., Yu, J.: SportScap: monocular 3D human motion capture and fine-grained understanding in challenging sports videos. Int. J. Comput. Vision 129(10), 2846–2864 (2021) 3. Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., Hu, W.: Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13359–13368 (2021) 4. Dittakavi, B., et al.: Pose tutor: an explainable system for pose correction in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3540–3549 (2022) 5. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634 (2015) 6. Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118 (2015) 7. Fieraru, M., Zanfir, M., Pirlea, S.C., Olaru, V., Sminchisescu, C.: AiFit: automatic 3D human-interpretable feedback models for fitness training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9919–9928 (2021) 8. Hu, X., Ahuja, N.: Unsupervised 3D pose estimation for hierarchical dance video recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11015–11024 (2021) 9. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2014) 10. Jiang, Y., Song, K., Wang, J.: Action recognition based on fusion skeleton of two kinect sensors. In: 2020 International Conference on Culture-oriented Science & Technology (ICCST), pp. 240–244. IEEE (2020) 11. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014) 12. Ke, Q., Bennamoun, M., An, S., Sohel, F., Boussaid, F.: A new representation of skeleton sequences for 3D action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3288–3297 (2017) 13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25 (2012) 14. Li, B., Dai, Y., Cheng, X., Chen, H., Lin, Y., He, M.: Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN. In: 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 601–604. IEEE (2017) 15. Li, J., Cui, H., Guo, T., Hu, Q., Shen, Y.: Efficient fitness action analysis based on spatiotemporal feature encoding. In: 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6. IEEE (2020) 16. Li, Y., Li, Y., Vasconcelos, N.: RESOUND: towards action recognition without representation bias. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 
11210, pp. 520–535. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_32


17. Li, Y., Chai, X., Chen, X.: ScoringNet: learning key fragment for action quality assessment with ranking loss in skilled sports. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366, pp. 149–164. Springer, Cham (2019). https://doi.org/10. 1007/978-3-030-20876-9_10 18. Liu, H., Tu, J., Liu, M.: Two-stream 3D convolutional neural network for skeleton-based action recognition. arXiv preprint arXiv:1705.08106 (2017) 19. Liu, J., Shahroudy, A., Perez, M., Wang, G., Duan, L.Y., Kot, A.C.: NTU RGB+ D 120: a large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42(10), 2684–2701 (2019) 20. Liu, J., Shahroudy, A., Xu, D., Wang, G.: Spatio-temporal LSTM with trust gates for 3D human action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 816–833. Springer, Cham (2016). https://doi.org/10.1007/978-3-31946487-9_50 21. Liu, S., et al.: FSD-10: a fine-grained classification dataset for figure skating. Neurocomputing 413, 360–367 (2020) 22. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multiperson linear model. ACM Trans. Graph. (TOG) 34(6), 1–16 (2015) 23. McNally, W., Vats, K., Pinto, T., Dulhanty, C., McPhee, J., Wong, A.: GolfDB: a video database for golf swing sequencing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019) 24. Pan, J.H., Gao, J., Zheng, W.S.: Action assessment by joint relation graphs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6331–6340 (2019) 25. Parmar, P., Morris, B.: Action quality assessment across multiple actions. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1468–1476. IEEE (2019) 26. Parmar, P., Morris, B.T.: What and how well you performed? A multitask learning approach to action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 304–313 (2019) 27. Pirsiavash, H., Vondrick, C., Torralba, A.: Assessing the quality of actions. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 556–571. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_36 28. Plizzari, C., Cannici, M., Matteucci, M.: Skeleton-based action recognition via spatial and temporal transformer networks. Comput. Vis. Image Underst. 208, 103219 (2021) 29. Shao, D., Zhao, Y., Dai, B., Lin, D.: FineGym: a hierarchical video dataset for fine-grained action understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2616–2625 (2020) 30. Shi, L., Zhang, Y., Cheng, J., Lu, H.: Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12026–12035 (2019) 31. Song, S., Lan, C., Xing, J., Zeng, W., Liu, J.: An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017) 32. Soomro, K., Zamir, A.R.: Action recognition in realistic sports videos. In: Computer Vision in Sports, pp. 181–208 (2014) 33. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015) 34. 
Verma, M., Kumawat, S., Nakashima, Y., Raman, S.: Yoga-82: a new dataset for fine-grained classification of human poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1038–1039 (2020)


35. Xiang, X., Tian, Y., Reiter, A., Hager, G.D., Tran, T.D.: S3D: stacking segmental P3D for action quality assessment. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 928–932. IEEE (2018) 36. Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018) 37. Yang, C., Medioni, G.: Object modeling by registration of multiple range images. Image Vis. Comput. 10(3), 145–155 (2002) 38. Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., Zheng, N.: View adaptive neural networks for high performance skeleton-based human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1963–1978 (2019)

NEO-3DF: Novel Editing-Oriented 3D Face Creation and Reconstruction

Peizhi Yan1, James Gregson2, Qiang Tang2, Rabab Ward1, Zhan Xu2, and Shan Du3(B)

1 The University of British Columbia, Vancouver, BC, Canada {yanpz,rababw}@ece.ubc.ca
2 Huawei Technologies Canada, Burnaby, BC, Canada
3 The University of British Columbia (Okanagan), Kelowna, BC, Canada [email protected]

J. Gregson—This work was done when James Gregson was at Huawei Technologies Canada.
S. Du—This work was supported by the University of British Columbia (Okanagan) [GR017752].

Abstract. Unlike 2D face images, obtaining a 3D face is not easy. Existing methods, therefore, create a 3D face from a 2D face image (3D face reconstruction). A user might wish to edit the reconstructed 3D face, but 3D face editing has seldom been studied. This paper presents such a method and shows that reconstruction and editing can help each other. In the presented framework, named NEO-3DF, the 3D face model we propose has independent sub-models corresponding to semantic face parts. It allows us to achieve both local intuitive editing and better 3D-to-2D alignment. Each face part in our model has a set of controllers designed to allow users to edit the corresponding features (e.g., nose height). In addition, we propose a differentiable module for blending the face parts, which makes it possible to automatically adjust the face parts (both their shapes and their locations) so that they are better aligned with the original 2D image. Experiments show that the results of NEO-3DF outperform existing methods in intuitive face editing and have better 3D-to-2D alignment accuracy (14% higher IoU) than global face model-based reconstruction. Code is available at https://github.com/ubc-3d-vision-lab/NEO-3DF.

Keywords: 3D face editing · 3D-to-2D face alignment

1 Introduction

It has recently become popular for individuals to allow their faces to be used in 3D virtual reality (VR) applications and user-generated content games. For example, in a virtual conference or a virtual get-together meeting, the use of reconstructed

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_5.


3D faces that resemble those of the real users helps the participants to identify the persons behind the avatars. Enabling a user to change some features, i.e. edit their reconstructed 3D face, can solve a main problem in existing 3D face reconstruction methods. As the reconstructed 3D face is usually not accurate enough, the user may want to correct their reconstructed 3D face. Also when the user is not satisfied with some parts of their actual face, then they can modify those parts. While 3D face reconstruction has been well-studied in the past two decades [3,8,9,17,35,38], editing of 3D reconstructed faces has not. We believe there is a great potential for deep learning-based 3D face editing in many real-world applications, including 3D gaming, film-making, VR, and plastic surgery. 3D face editing can also benefit 2D face editing because the 3D face can provide explicit geometry information to 2D generative models [7,27,37]. 3D editing is a challenging task. Editing a 3D shape usually requires strong spatial thinking ability, and traditional 3D editing tools are designed for use by artists. Moreover, editing of 3D faces is even more difficult than editing many other 3D shapes. This is because people are very sensitive to the appearance of a human face. Minor modifications of a 3D face could cause very different feelings in people. Because facial parts such as eyebrows and nose are relatively independent, segmenting the 3D face into parts and independently editing each part makes 3D face editing easier to handle [13,14,34]. In this work, we want to take one step further to bridge single-image 3D face reconstruction (SIFR) and 3D face editing. Since ambiguities such as depth and lighting will cause the reconstructed 3D face to be different from the actual face, the SIFR is an ill-posed problem. Recent works on SIFR display limited improvement in reconstruction when compared to the 3D ground-truth [8,11], whereas 3D-to-2D alignment, i.e. aligning the 3D face with the image it was reconstructed from, is still an open challenge [25]. Therefore, we also explore improving the 3D-to-2D alignment by taking advantage of 3D face editing. Inspired by [17,38], we use 2D face semantic segmentation/parsing to provide ground truth in aligning the 3D face to the image. This requires us to segment our 3D face model based on the 2D face parsing, which is different from how a 3D face is segmented in [13,14,34]. In this paper, we propose a novel framework (which we named NEO-3DF) that allows the reconstructed 3D face to be edited locally. The features that can be edited are defined by the designer, but the extent of editing is determined by the user. Most SIFR methods use a global face model (i.e., the 3D face shape is modeled as a whole) which makes it extremely challenging for editing [8,11,21]. Our 3D face modeling however uses different sub-models for the different semantic segments (parts) of the face. This ensures that editing the shape of one part of a face does not cause changes in other parts, resulting in easier and more convenient editing. This approach also allows us to improve the 3D reconstruction accuracy using 2D supervision (see Sect. 3). We developed a way of using the As-Rigid-As-Possible (ARAP) method [33] to blend the face segments (parts), so the face appears natural at the segment boundaries. 
Importantly, we propose a differentiable version of ARAP that makes it possible to use gradient descent-based optimization to improve the alignment between the 3D reconstructed face segments and those of the original 2D image (i.e., 3D-to-2D alignment).


In summary, our main contributions are: (1) We propose NEO-3DF, the first method that couples single-image 3D face reconstruction with intuitive 3D face editing. (2) NEO-3DF is face-parts-based and follows 2D semantic face segmentation, which enables an automatic shape-adjustment process. (3) We propose a differentiable version of ARAP that allows us to use 2D face segmentation as guidance to adjust the reconstructed 3D face shape automatically. (4) We associate editing controllers with the latent space and demonstrate their effectiveness in an automatic 3D-to-2D alignment process.

2 Related Works

3D Face Modeling and Reconstruction. The 3D face is usually represented by a 3D mesh, whose vertices and facets define the face surface. A straightforward way to create a 3D face is to use a weighted linear combination of a set of example 3D faces [36]. An improved version of this approach, the 3D Morphable Face Model (3DMM), transforms the scanned and processed example 3D faces into a vector-space representation using Principal Component Analysis (PCA) [3]. The PCA captures the primary modes of variation in the pre-collected 3D face dataset. Although 3DMM was proposed two decades ago, it is still widely used in current face-related research and applications, such as 3D face reconstruction [24,38], controllable face image generation [7,37], and 3D face animation [2]. In recent years, research on learning general 3D face models with deep neural networks has shown advantages over traditional 3DMMs [4,29,35].

Single-image 3D face reconstruction (SIFR) is an important application of both 3DMMs and neural network-based 3D face models [9]. The task is to reconstruct a 3D face from a given single-view face image. In most cases, the camera intrinsics and lighting conditions are unknown. In addition, faces that appear identical in images can have different actual 3D shapes because of these ambiguities. Thus, SIFR is an ill-posed problem, and we can only approximately reconstruct a 3D face from a single image. However, compared with accurate 3D reconstruction methods that require expensive 3D scanning devices or multi-view images, SIFR is more accessible to users of consumer electronics. Existing SIFR methods can be roughly grouped into fitting-based methods [3,38] and learning-based methods [8,12,17,35]. Generally, both use a 3D face model, such as a linear 3DMM [8,11,17,38] or a non-linear neural network-based model [12,35], as a priori knowledge. The main difference is that fitting-based methods iteratively optimize the variables to make the generated 3D face look more like the face in the photo, whereas learning-based methods use another parametric model (usually a neural network) to regress those variables.

General-Purpose 3D Mesh Editing. Many 3D mesh editing methods allow the user to modify the locations of some vertices (constraint vertices) to edit the 3D mesh. The cage-based deformation (CBD) method warps the space enclosed by "cages" to deform the mesh surface [19].


CBD is usually used for large-scale deformations such as human body postures and is not ideal for 3D face editing, which focuses more on local details. Building appropriate "cages" is also challenging. The ARAP method is a detail-preserving mesh surface deformation method [33]. ARAP assumes that by keeping the local surface deformation close to rigid, one can preserve the local surface details. When editing 3D face shapes, neither CBD nor ARAP can prevent improper edits, because these methods have no prior knowledge of 3D faces. In comparison, blendshape methods solve the face shape editing problem by using a set of predefined shape offsets to deform a given face shape [22]. Although the concept of blendshapes is straightforward, developing intuitive and meaningful blendshapes is labor-intensive. 3DMMs can be considered a special type of blendshape method, where the blendshapes are derived from 3D scans and compressed by PCA; however, the eigenvectors do not have intuitive meaning.

3D Face Editing. In the video gaming industry, 3D face editing systems are usually designed by artists. For example, artists can define meaningful controllers (e.g., nose height) with corresponding handcrafted blendshapes to allow users to easily change the shape of a 3D face [32]. Foti et al. proposed the Latent Disentanglement Variational Autoencoder (LD-VAE) and demonstrated that it can be used for constraint-vertex-based face shape editing [13]. Because latent disentanglement with respect to the face parts is a training objective of LD-VAE, its local editing results are better than those of a traditional 3DMM. However, this type of editing still requires the user to have some artistic skill. Ghafourzadeh et al. developed a face editing system that uses a face-parts-based 3DMM (PB-3DMM) [14]. This method decomposes the face shape into five manually selected non-overlapping parts to gain local control. The models of these face parts are built using PCA on 135 3D face scans. They also use the method proposed by Allen et al. [1] to find a linear mapping between the anthropometric measurement space and the PCA coefficient spaces to achieve intuitive control/editing (such as changing the nose height). PB-3DMM is the first data-driven 3D face model that supports intuitive local editing. However, like the traditional 3DMM [3], PB-3DMM still relies on 3D scans, which are expensive to collect and process. Moreover, PB-3DMM uses a specially designed low-polygon mesh topology and is not optimized for single-image 3D face reconstruction and 3D-to-2D face alignment. We briefly summarize some important properties of PB-3DMM, LD-VAE, and the proposed NEO-3DF in Table 1.

Table 1. Comparison of the NEO-3DF and other parts-based 3D face methods.

Method                Model           Mesh info. (vertices)          Face parsing   Intuitive editing   Parts blending   Back-prop.   SIFR
PB-3DMM [14]          Multiple PCAs   Specially designed (6,014)     no             yes                 yes              no           no
LD-VAE [13]           Single VAE      FLAME [23] (71,928)            no             no                  N/A              yes          no
NEO-3DF (proposed)    Multiple VAEs   Modified BFM [8,26] (35,709)   yes            yes                 yes              yes          yes

3 The Proposed Method

[Fig. 1 diagram blocks: Editing Controllers, Mapping Module, Latent Offsets, Offset Regressor, Vertex Offsets, 2D Image, FaceNet, Encoded Face Representation, Blending Module, 3D Face Shape.]

Fig. 1. Flowchart of the proposed NEO-3DF. FaceNet: the convolutional neural network that encodes the 2D face image into a vector representation. Networks F_i estimate latent representations z_i of the corresponding face segments, and shape decoders D_i reconstruct the segment shapes S_i from z_i. The Offset Regressor predicts the offsets used to assemble the segment shapes to their correct locations.

Overview. The proposed framework (see Fig. 1) supports single-view 3D face reconstruction and further editing of the result. In this work, we first reconstruct a neutral 3D face shape; the expression and texture of the 3D face can be added later with existing methods (such as [5,24]). We use a pre-trained FaceNet [31] as the face image encoder. We segment our 3D face topology into six segments (parts). Each segment has its own editing controllers, which determine the variables that can be changed in that segment (e.g., nose height). For each segment i, a regressor network F_i estimates the latent encoding z_i of the segment. Following each F_i is the shape decoder D_i, which reconstructs the shape S_i. The set of editing controllers enables intuitive face shape editing. To find the mapping between the editing-controller space and the neural network latent space, we use an approach similar to [1,14]: construct a linear system representing the mapping function and solve for its unknown parameters. The blending module uses ARAP optimization to obtain an overall natural-looking face shape.

Semantic Face-Segment Based 3D Face Model. To develop a face model based on semantic face segments, we use the Basel Face Model (BFM) [26] as the base model. We consider the face to have six segments: five semantic face segments (eyebrows, eyes, nose, upper and lower lips) and one segment for the rest of the face. Following [8], we remove the neck and ears from the original BFM mesh topology. The resulting 3D face mesh has 35,709 vertices. To automatically segment the 3D face, we use an approach similar to [17]. The main idea is to render randomly synthesized 3D faces to 2D, run a 2D face parsing model to get 2D parsing labels, and finally map the 2D pixel-level labels back to the 3D vertices.


This automatic 3D face segmentation process ensures that the 3D vertex labels are consistent with mainstream face image semantic parsing, which is essential for our subsequent fine-tuning of the model using ground-truth 2D face parsing masks as supervision. Figure 2 (left) shows the face parts of our 3D face mesh topology.

Fig. 2. Left: The six face segments considered. Right: Shape boundaries (semantic segment boundaries in red lines, outer boundary in blue line) and transition area (in green). (Color figure online)
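The label transfer step described above (rendering synthesized faces, running a 2D parser, and mapping pixel labels back to vertices) can be illustrated with the minimal sketch below. It assumes a simple pinhole projection, ignores occlusion handling, and uses hypothetical inputs (`vertices_list`, `parsing_maps`); it is not the exact procedure of [17].

```python
import numpy as np

def transfer_parsing_labels(vertices_list, K, parsing_maps, num_classes=6):
    """Map 2D face-parsing labels back to mesh vertices by majority vote.

    vertices_list: list of (N, 3) vertex arrays, one per synthesized face,
                   expressed in camera coordinates (z > 0).
    K:             (3, 3) camera intrinsic matrix.
    parsing_maps:  list of (H, W) integer label maps from a 2D face parser.
    """
    num_vertices = vertices_list[0].shape[0]
    votes = np.zeros((num_vertices, num_classes), dtype=np.int64)

    for verts, parsing in zip(vertices_list, parsing_maps):
        # Pinhole projection of every vertex onto the image plane.
        proj = (K @ verts.T).T                       # (N, 3)
        uv = proj[:, :2] / proj[:, 2:3]              # (N, 2) pixel coordinates
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, parsing.shape[1] - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, parsing.shape[0] - 1)
        labels = parsing[v, u]                       # (N,) per-vertex label sample
        votes[np.arange(num_vertices), labels] += 1

    # Each vertex keeps the label it received most often across the renders.
    return votes.argmax(axis=1)
```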

We denote a 3D shape S as a set of vertices S = {v_i | v_i ∈ R^3}_{i∈V}, where V is the set of vertex indices of S and v_i is the 3D location of the i-th vertex. We use S_k (k ∈ {1..5}) to represent the five semantic face segments: eyebrows, eyes, nose, upper lip, and lower lip. For convenience, we call the S_k the five-segments. The shape of the remaining segment, S_rest, is slightly different from the S_k because it has a one-ring overlap with each of them. This one-ring overlap is important in our ARAP-based blending module, as it provides the curvature information at the segment boundaries (see Fig. 2).

We use Multi-Layer Perceptron (MLP) based Variational Autoencoders (VAEs) to learn the 3D shape representation. In total, we train six VAEs, one for each face segment i (E_i and D_i are the encoder and decoder, respectively). For S_rest, the latent representation is a vector z_0 ∈ R^{1×30}; for the S_k, z_k ∈ R^{1×10}. We pre-process each segment shape in the dataset, except S_rest, so that the geometric center of its 3D bounding box becomes the origin of the 3D Cartesian coordinate system. The reason is that we expect the shape decoders of the five-segments to only model the shape itself rather than the relative location of the segment on the face. The loss for training the VAEs is defined as

L_VAE = λ_recon L_recon + λ_KL L_KL + λ_smooth L_smooth,   (1)

where L_recon is the shape reconstruction loss, L_KL is the Kullback-Leibler (KL) divergence loss, L_smooth is the cotangent-weight Laplacian loss, and the λ are loss-term weights. Denote v_i and v'_i the vertices of the ground-truth shape S and the reconstructed shape S', respectively. We compute the reconstruction loss as L_recon = Σ_{i∈V} ||v_i − v'_i||²_2. L_KL is the KL divergence of the distribution of z ∈ R^{1×d_z} from a multivariate normal distribution N(0, I), where I ∈ R^{d_z×d_z} is the identity matrix and 0 ∈ R^{1×d_z} is a vector of zeros. The Laplacian loss L_smooth helps the decoder generate smooth mesh surfaces.
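A minimal PyTorch sketch of the per-segment loss in Eq. (1) is given below. The Laplacian is passed in as a precomputed sparse matrix (cotangent-weighted in the paper), and the default weights follow the values reported in Sect. 4; this is an illustrative reading of the loss, not the authors' code.

```python
import torch

def vae_segment_loss(pred_verts, gt_verts, mu, logvar, laplacian,
                     w_recon=1.0, w_kl=0.01, w_smooth=1.0):
    """Training loss of Eq. (1) for one face segment.

    pred_verts, gt_verts: (N, 3) reconstructed / ground-truth vertices.
    mu, logvar:           (d_z,) parameters of the approximate posterior.
    laplacian:            (N, N) sparse mesh Laplacian, precomputed once
                          from the fixed mesh topology.
    """
    # Per-vertex squared-distance reconstruction loss.
    l_recon = ((pred_verts - gt_verts) ** 2).sum(dim=1).sum()

    # KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    l_kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()

    # Laplacian smoothness term on the reconstructed surface.
    l_smooth = torch.sparse.mm(laplacian, pred_verts).pow(2).sum()

    return w_recon * l_recon + w_kl * l_kl + w_smooth * l_smooth
```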


Because the input to E_i is a 3D shape, we still need to couple the 2D face image encoding with the 3D shape decoders D_i. To do this, after VAE training we discard the VAE encoders E_i and add another set of regressor networks F_i between the 2D face image encoding and the 3D shape decoders D_i. To pre-train the F_i, we freeze the parameters of the face image encoder and the shape decoders. The loss function is the same as (1) except that the L_smooth term is discarded. Here, L_KL acts as a regularization term to prevent over-fitting.

Assembling Face Segments and Network Fine-Tuning. There are two remaining issues. The first is that we re-centered the S_k when training the corresponding shape decoders to disentangle the five-segment shapes from the global face shape; these decoders are therefore not aware of the relative positions of the S_k with respect to S_rest. The second issue is that the pre-trained FaceNet is not optimized for 3D face reconstruction. To address the first issue, we train an offset regressor that takes S_rest's latent representation z_0 as input and predicts an offset (0, Δy_k, Δz_k) for each reconstructed segment shape S'_k. Because the human face is bilaterally symmetric, we do not add any offset along the x-axis. After adding the offsets, we can assemble the S'_k with S'_rest to get a full 3D face S'_overall. Note that S'_overall is likely to have visible segment boundaries without proper blending. To address the second issue, we fine-tune all the networks except the shape decoders D_i to improve the reconstruction accuracy. Besides using the target shapes as supervision, we leverage the ground-truth 2D face parsing masks as additional supervision. The loss function for fine-tuning is defined as

L = λ_s L_s + λ_p L_p + λ_KL L_KL.   (2)

L_s is the shape loss between S_overall and S'_overall; its computation is the same as L_recon. Define P ∈ R^{H×W×5} as the ground-truth parsing mask of the S_k, where H and W are the height and width of the mask, and P_ijk is the binary label indicating whether the pixel at image location (i, j) belongs to the k-th (k ∈ {1..5}) segment. We include the estimated expression and pose and use a differentiable renderer to render S'_overall to 2D, denoted P'. The renderer is modified to use the vertex one-hot semantic labels as five-channel colors for shading. To prevent the gradient from vanishing when the displacement between P' and P is large, we follow the idea in [17] and soften both P' and P with a Gaussian filter (standard deviation 1, filter size 10 × 10). The parsing loss is then defined as L_p = ||P' − P||_2. Here, L_KL again acts as a regularization term.
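The softened parsing loss L_p can be sketched as below, with the Gaussian filter applied identically to the rendered masks P' and the ground-truth masks P. The 10 × 10 filter with σ = 1 follows the values stated above, but the depthwise-convolution implementation is our assumption, not the authors' code.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=10, sigma=1.0):
    # 2D Gaussian kernel, normalized to sum to 1.
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2.0
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()

def soft_parsing_loss(pred_parsing, gt_parsing, size=10, sigma=1.0):
    """L_p: squared distance between Gaussian-softened parsing masks.

    pred_parsing: (B, 5, H, W) rendered five-channel segment masks (P').
    gt_parsing:   (B, 5, H, W) ground-truth binary parsing masks (P).
    """
    k = gaussian_kernel(size, sigma).to(pred_parsing)
    weight = k.view(1, 1, size, size).repeat(pred_parsing.shape[1], 1, 1, 1)
    # Depthwise convolution blurs each segment channel independently.
    blur = lambda x: F.conv2d(x, weight, padding=size // 2, groups=x.shape[1])
    return (blur(pred_parsing) - blur(gt_parsing)).pow(2).sum()
```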


Face Segments Blending. An evident challenge for a segment-based face model is that it is not straightforward to ensure that the transitions between segments look natural. Simply smoothing the regions between different face segments could destroy the high-frequency local structure. Our goal is to preserve the surface details as much as possible, so that the final shape still looks human, and to make the transitions between face segments look natural, without abrupt boundaries. To achieve this, we use the As-Rigid-As-Possible (ARAP) method [33] for mesh deformation optimization in our blending module. The blending module only deforms the areas on S'_rest that connect it to the five-segments. We call these areas the transition area (see Fig. 2, right). We define the transition area using the average BFM face shape: it covers the vertices on S'_rest whose Euclidean distances are less than 30 mm from their nearest vertices on the segment boundaries (Fig. 2, right, red lines). The outer contour of the transition area is the outer boundary (Fig. 2, right, blue lines). We use the ARAP method to deform the shape of the transition area. The constraints are formed by two groups of vertices: vertices from the five-segments lying on the segment boundaries, and vertices from S'_rest lying on the outer boundary. In our experiments, we run the ARAP optimization for three iterations.

Intuitive Face Editing. We expect the editing to be intuitive; for example, one can increase or decrease the nose width by an adjustable strength. The overall idea is to find mappings between a pre-defined facial-feature measurement space and the latent encoding space. Different from [14], our editing is not based on exact measurement values (e.g., the exact width of the nose bridge) but on a direction (+ for increase, − for decrease) and a strength (how many standard deviations from the mean shape). We refer to studies on facial feature measurements [10,16,28,30] to derive a list of features for editing purposes (see supplementary material), grouped by the six segments; for example, nose height is a feature of the nose segment. We denote N the number of face shapes in the dataset and m the number of features of the shape S (S can be the shape of any of the six segments). For each S in the dataset, we record the measurements of its features (performed automatically using pre-defined measurement rules) as a row in the matrix X ∈ R^{N×m}, and its latent encoding vector z as a row in the latent matrix Z ∈ R^{N×d_z}. We compute the average face shape S̄ over the dataset and record its latent encoding as z̄. Next, we subtract the measurements of the average shape S̄ from each row of X and denote the result ΔX; similarly, we subtract z̄ from each row of Z to obtain ΔZ. Assuming a linear mapping M ∈ R^{m×d_z} such that ΔX M = ΔZ, we estimate M as M = ΔX^† ΔZ, where ΔX^† is the Moore-Penrose pseudo-inverse of ΔX. We define the editing controllers as a vector c ∈ R^{1×m}; each value in c controls a corresponding feature. We compute the standard deviation of each column of ΔX and record it as Σ ∈ R^{1×m}. Finally, we represent editing as a modification in the latent encoding space: z̃ = z + Δz = z + (c ∘ Σ)M, where z is the latent encoding of the original shape S and ∘ is the Hadamard product. Decoding the new latent encoding z̃ with the shape decoder gives the edited shape. We compute an M and a Σ for each segment.
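The measurement-to-latent mapping can be written compactly in NumPy. The sketch below centers the data on dataset means rather than on the measurements and latent code of the average shape S̄, which is a simplification of the procedure described above.

```python
import numpy as np

def fit_controller_mapping(X, Z):
    """Fit the linear map M between measurement offsets and latent offsets.

    X: (N, m) automatic feature measurements of one segment over the dataset.
    Z: (N, d_z) corresponding latent codes of that segment.
    Returns (M, Sigma, x_mean, z_mean).
    """
    x_mean, z_mean = X.mean(axis=0), Z.mean(axis=0)
    dX, dZ = X - x_mean, Z - z_mean
    M = np.linalg.pinv(dX) @ dZ          # Moore-Penrose solution of dX M = dZ
    Sigma = dX.std(axis=0)               # per-feature standard deviation
    return M, Sigma, x_mean, z_mean

def edit_latent(z, c, M, Sigma):
    """Apply controller values c (in standard-deviation units) to a latent code z."""
    return z + (c * Sigma) @ M
```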


Differentiable Blending Module in Shape Adjusting. To further improve the 3D-to-2D alignment accuracy, we propose a novel differentiable version of ARAP for our blending module. Our differentiable ARAP is fully functional: it turns the traditional ARAP optimization iterations into a sequence of differentiable operations through which loss gradients can back-propagate. Our method differs from [38], which proposed using the ARAP energy as a loss function but did not carry out the ARAP optimization itself. In contrast, our method back-propagates through the ARAP algorithm, including estimating neighborhood rotations and solving the linear system. In principle, our ARAP deformer module can be applied to many deep learning-based geometry processing tasks and used as a component in end-to-end training. Here, however, we only use it to fine-tune the editing controllers, due to the high vertex count of our facial model.

We define n_p as the number of vertices of the face mesh and n_c as the number of constraint vertices. We represent the vertex positions of the 3D mesh as a matrix P ∈ R^{n_p×3} and the constraint vertex positions as a matrix H ∈ R^{n_c×3}. We first build a sparse constraint matrix C ∈ R^{n_c×n_p}, such that C_ij = 1 only if the j-th vertex of the mesh is the i-th constraint vertex. We then compute the combinatorial Laplacian of the 3D mesh, denoted L ∈ R^{n_p×n_p}. We use the combinatorial Laplacian rather than the cotangent Laplacian because the former depends only on the mesh topology, so it can be reused for different 3D faces with the same topology. We define P' ∈ R^{n_p×3} as the new vertex positions and R as the right-hand side of the linear system AW = R, where A is the sparse matrix in Eq. (3) and (W_ij), i ∈ {1..n_p}, j ∈ {1..3}, gives P'. Estimating R requires both P and P', so P' is optimized by alternately estimating R and solving the linear system AW = R for multiple iterations. Because A depends only on the mesh topology, the system only needs to be formed and inverted once.

A = [ LᵀL  Cᵀ ; C  0 ] ∈ R^{(n_p+n_c)×(n_p+n_c)}.   (3)

Algorithm 1 gives the pseudo-code of the proposed differentiable ARAP, where n_iter is the number of iterations of the ARAP optimization; we choose n_iter = 3 in our experiments. Please refer to the supplementary material for more details of our differentiable ARAP and the pseudo-code of estimate_rhs(). Unlike in the network fine-tuning, we do not apply the Gaussian filter to the ground-truth parsing P and the rendered parsing P'; instead, we only apply a distance transform to P. We empirically find that this leads to better results. Importantly, we directly adjust the editing controllers instead of the segment latent representations. In the optimization process we only minimize ||P' − P||_2, and the optimization is performed on a single shape at a time.


Algorithm 1. Differentiable ARAP
Input: P, H, L, and A^{-1}
Output: P'
Require: n_iter ≥ 1
  iter ← 0
  while iter ≤ n_iter do
    if iter = 0 then
      R ← [LᵀLP ; H]                 // Initialize the right-hand side.
    else
      R ← [estimate_rhs() ; H]       // Estimate the new right-hand side.
    end if
    W ← A^{-1} R                     // Solve the linear system AW = R.
    P' ← (W_ij), i ∈ {1..n_p}, j ∈ {1..3}    // Update the new vertex positions.
    iter ← iter + 1
  end while
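A PyTorch sketch of the loop in Algorithm 1 follows. The authors define estimate_rhs() only in their supplementary material; here we assume the standard per-vertex rotation fit of [33] applied to the Laplacian (delta) coordinates, and we use dense matrices and a Python loop for readability rather than the sparse formulation used in the paper.

```python
import torch

def estimate_rotations(P, P_new, neighbors):
    """Per-vertex rotation fit with uniform weights (Sorkine & Alexa, 2007).

    P, P_new:  (n_p, 3) rest-pose / current vertex positions.
    neighbors: list of LongTensors; neighbors[i] holds the 1-ring of vertex i.
    """
    rotations = []
    for i, nbr in enumerate(neighbors):
        e = (P[nbr] - P[i]).T                 # (3, k) rest-pose edge vectors
        e_new = (P_new[nbr] - P_new[i]).T     # (3, k) deformed edge vectors
        S = e @ e_new.T                       # (3, 3) covariance matrix
        U, _, Vh = torch.linalg.svd(S)
        if torch.det(Vh.T @ U.T) < 0:         # avoid reflections
            Vh = torch.cat([Vh[:-1], -Vh[-1:]], dim=0)
        rotations.append(Vh.T @ U.T)
    return torch.stack(rotations)             # (n_p, 3, 3)

def differentiable_arap(P, H, L, A_inv, neighbors, n_iter=3):
    """Differentiable ARAP loop sketched from Algorithm 1 (dense version).

    P: (n_p, 3) input vertices, H: (n_c, 3) constraint positions,
    L: (n_p, n_p) combinatorial Laplacian, A_inv: pre-inverted matrix of Eq. (3).
    """
    n_p = P.shape[0]
    delta = L @ P                              # original differential coordinates
    P_new = P
    for it in range(n_iter):
        if it == 0:
            rhs_top = L.T @ delta              # identity rotations at the start
        else:
            R = estimate_rotations(P, P_new, neighbors)
            delta_rot = torch.einsum('nij,nj->ni', R, delta)
            rhs_top = L.T @ delta_rot          # rotated right-hand side
        rhs = torch.cat([rhs_top, H], dim=0)   # right-hand side [ . ; H ] of A W = R
        W = A_inv @ rhs                        # solve with the cached inverse of A
        P_new = W[:n_p]                        # new vertex positions P'
    return P_new
```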

4 Experiments

Data and Training. The face images used in our experiments come from the CelebAMask-HQ dataset (30,000 images) [20] and the FFHQ dataset (70,000 images) [15]. We run the method of Deng et al. (Deep 3DMM) [8] on both datasets to estimate the BFM coefficients of each image. An alternative would be to estimate the coefficients with an analysis-by-synthesis method [3], but this would be slow. We then use the BFM coefficients to recover the 3D faces, obtaining 100,000 3D faces. Because the CelebAMask-HQ dataset comes with ground-truth semantic face parsing masks, we use its first 5,000 images to form the validation set; the remaining 95,000 images and their 3D reconstructions are used to train the models. We use the 25,000 training images from the CelebAMask-HQ dataset and their corresponding parsing masks to fine-tune our network, and all 95,000 3D reconstructions in the training set to compute the mappings for intuitive editing. The loss-term weights are set as follows: λ_recon = 1, λ_KL = 0.01, λ_smooth = 1, λ_s = 1, λ_p = 1. We use the Adam optimizer with a constant learning rate of 10^{-4} and a training batch size of 16. Our framework achieves an average per-vertex mean squared error of 9.0 × 10^{-3} on the validation data (3D mesh data are measured in millimeters).

3D-to-2D Segments Alignment. In single-image 3D face reconstruction, the best one can do is to make the rendered 3D face look as close as possible to the face in the image. It is challenging for existing global 3D face reconstruction methods to make each 3D semantic face segment align well with its corresponding 2D image segment. We presume that a global 3D face model imposes strong constraints to ensure the entire 3D face looks as natural as possible; the drawback is that small face segments such as the eyes and the lips may not look realistic and are prone to misalignment.


In addition, the human visual system is very sensitive to tiny differences in a face, so global methods are usually unable to provide satisfactory reconstructions. Some works [6,18] try to mitigate this problem by using more realistic textures to deceive human viewers. In this work, we instead take advantage of our segment-based face model to improve the shape accuracy by tuning each 3D semantic segment so that it better aligns with the image. Learning realistic textures will also benefit from better-aligned 3D shapes [24]; we conjecture that improved registration of the 3D geometry to the semantic face segments will help texture reconstruction models learn realistic textures, although this is not a focus of the current work.

We use the average Intersection-over-Union (IoU) metric to evaluate the 3D-to-2D alignment accuracy. Table 2 compares Deep 3DMM (a global method), PB-3DMM, LD-VAE, and the proposed NEO-3DF. Note that neither PB-3DMM nor LD-VAE is designed for single-image 3D face reconstruction, and both originally use mesh topologies other than the BFM topology we use. To compare with them, we use the BFM mesh topology and our parsing-based face segmentation. Because both PB-3DMM and LD-VAE are local models, we use a network architecture similar to NEO-3DF to give both methods single-image 3D face reconstruction ability. For PB-3DMM, the shape decoders are from BFM; for LD-VAE, the shape decoder is pre-trained using the original LD-VAE code, but on our dataset. We fine-tuned both variants under the same settings as NEO-3DF.

Table 2. 3D-to-2D alignment results on validation data. Metric: average IoU.

Part name        Deep 3DMM [8]   PB-3DMM [14]   LD-VAE [13]   NEO-3DF
Eyebrows         0.312           0.302          0.279         0.363
Eyes             0.387           0.377          0.372         0.553
Nose             0.766           0.728          0.735         0.789
Upper lip        0.483           0.482          0.436         0.531
Lower lip        0.468           0.424          0.418         0.614
All five parts   0.593           0.566          0.560         0.616
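For reference, the per-segment IoU reported in Table 2 can be computed from binary masks as below. Whether the "All five parts" row pools the five masks or averages the per-part scores is not stated in this excerpt, so the averaging variant shown here is an assumption.

```python
import numpy as np

def mean_segment_iou(pred_masks, gt_masks, eps=1e-8):
    """Average IoU over the five semantic face segments.

    pred_masks, gt_masks: (5, H, W) boolean arrays, one channel per segment.
    """
    ious = []
    for p, g in zip(pred_masks, gt_masks):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / (union + eps))
    return float(np.mean(ious))
```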

For single-sample shape adjusting, we use the Adam optimizer with a learning rate of 10^{-3}. We run 20 gradient-descent iterations for each shape, and the shape adjusting takes around 31 s per iteration (on a workstation with one Nvidia Tesla V100-PCIe GPU). On the validation data, the average IoU of the five-segment shapes after fitting is 0.683 (discussed in the Ablation Study), which is 11% higher than the raw network prediction (0.616, as shown in Table 2). Figure 3 visualizes the alignment performance comparison. For each method, one column shows the rendered 3D face on top of the original image, and another column shows the union minus the intersection of P' and P; the better the face segments are aligned to the 2D image, the less visible the black areas.
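The single-sample adjustment described above can be summarized by the optimization loop below. Here, forward_fn is a hypothetical wrapper around the shape decoders, the differentiable ARAP blending, and the renderer; only the stated settings (Adam, learning rate 10^{-3}, 20 iterations, squared parsing distance) are taken from the paper.

```python
import torch

def fit_single_sample(controllers, offsets, forward_fn, gt_parsing,
                      n_steps=20, lr=1e-3):
    """Single-sample 3D-to-2D alignment by adjusting controllers and offsets.

    controllers, offsets: tensors initialised from the network prediction,
                          created with requires_grad=True.
    forward_fn:           callable mapping (controllers, offsets) to a rendered
                          soft parsing map P'.
    gt_parsing:           distance-transformed ground-truth parsing masks P.
    """
    optimizer = torch.optim.Adam([controllers, offsets], lr=lr)
    for _ in range(n_steps):
        optimizer.zero_grad()
        rendered = forward_fn(controllers, offsets)
        loss = (rendered - gt_parsing).pow(2).mean()   # minimise ||P' - P||^2
        loss.backward()                                # gradients flow through ARAP
        optimizer.step()
    return controllers.detach(), offsets.detach()
```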

[Fig. 3 panels. Columns: Image, Deep 3DMM, NEO-3DF (Proposed). Reported per-example IoU: 0.604 vs. 0.732, 0.586 vs. 0.712, 0.490 vs. 0.595.]

Fig. 3. 3D-to-2D semantic face segments alignment results. Compared with Deep 3DMM, we can see that our method gives better aligned facial segments (yields higher segmentation accuracy, as measured by IoU) and produces more consistent rendered results with original images.

Intuitive Editing. We compare our method with a global VAE and LD-VAE [13]. We use the same MLP network structure for both the global VAE and LD-VAE, where the decoder structure is the same as the non-linear 3DMM shape decoder [35]. Our shape decoders have a similar structure (same number of layers and hidden neurons), but each has a smaller output layer because a segment has fewer vertices than the entire face shape. We group the controllers by segment. In each group, we set each controller separately to −3σ and +3σ, then aggregate all the edited shapes and show the maximum change of each vertex location (Euclidean distance measured in millimeters) from the original mean face shape. The results are visualized as rendered 2D heatmaps (see Fig. 4). Figure 5 shows the results of applying the same set of nose edits (Tip −3σ, Breadth +3σ, Bridge Width −3σ) with different methods. To demonstrate the use of our framework for customizing a reconstructed 3D face, we randomly apply a sequence of edits to the reconstructed faces; Fig. 6 shows some example intuitive editing results.

Ablation Study. We first investigate the necessity of the proposed offset regressor and blending module. Figure 7 shows the original shape and the same edit (lifting the nose tip) applied with our unmodified framework, with the framework without the offset regressor, and with the framework without the blending module; in this experiment, only the nose is editable. Without the offset regressor, we need to use absolute vertex locations to train the five-segment decoders, so the relative location of a segment is still entangled in the latent representation. As a result, after removing the offset regressor, the entire nose moves up when the nose tip is lifted. The issue with eliminating the blending module is evident, as we can see the juncture around the nose.

[Fig. 4 panels. Rows: NEO-3DF (Proposed), LD-VAE, Global-VAE. Columns: Eyebrows, Eyes, Nose, Upper Lip, Lower Lip, Rest. Color bar from 0 to max.]

Fig. 4. Vertex location change heatmaps. For visualization purposes, the values in each heatmap are normalized to be from 0 to 1.0. The value above each heatmap indicates the highest value of that heatmap. We can see that our method gives near-perfect disentanglement while the other methods cannot isolate local edits well.

[Fig. 5 panels, left to right: Original Shape, NEO-3DF (Proposed), LD-VAE, Global-VAE, 3DMM (BFM).]

Fig. 5. Nose shape editing comparison.

[Fig. 6 panels, left to right: Image, Reconstruction, then cumulative edits: Mandibular Width, Eye Height, Nose Breadth, Lip Height, Eyebrow Thickness.]

Fig. 6. Reconstruction and further editing examples. The edits are made cumulatively from left to right. Editing via our framework is more intuitive and expressive since subsequent changes (on different segments) will not affect earlier ones.

[Fig. 7 panels: Original Shape, Edited, w/o Offset, w/o Blending.]

Fig. 7. Ablation study shows that the offset regressor and blending module are necessary for our framework. Without the offsets, alignment between segments is compromised, while visible seams appear around the edited segment without the blending module.


We also investigate different ways of improving the 3D-to-2D alignment in our fitting-based single-shape adjusting (see Table 3). With the help of our differentiable ARAP, the best results are obtained by fine-tuning the editing controllers and the shape offsets, which yields 7% higher IoU than directly adjusting the latent encodings and offsets (IoU 0.632). Our explanation for this finding is that the editing-controller space is less complex and more interpretable than the latent encoding space, which reduces the learning complexity. Interestingly, optimizing only the editing controllers without the offsets does not work; we presume that the face segments must be roughly aligned for the loss to provide useful information.

Table 3. Single-sample automatic 3D part shape adjustment results (metric: IoU averaged on validation data).

                                    w/o Blending   w/ diff. ARAP Blending
Latent representations only         0.621          0.626
Editing controllers only            0.581          0.566
Latent representations & offsets    0.629          0.632
Editing controllers & offsets       0.649          0.683

5 Conclusion

We proposed NEO-3DF, a 3D face creation framework that models the different segments of the 3D face independently. NEO-3DF is the first method that couples single-image 3D face reconstruction and intuitive 3D face editing. To train NEO-3DF, we created a 3D face dataset using existing large 2D face image datasets and an off-the-shelf 3D face reconstruction method, Deep 3DMM. The face-segment-based modeling of the 3D face makes face editing more intuitive and user-friendly. It also allows fine-tuning the reconstructed 3D face using 2D semantic face segments as guidance, yielding a 14% improvement in IoU. We also proposed a differentiable version of ARAP to obtain an end-to-end trainable framework, which enables automatic adjustment of the variables in the editing controllers and results in an additional 5% improvement in IoU. The main limitation of our current framework is that the differentiable ARAP is computationally expensive and is only used to fine-tune the result at the last construction stage; we will explore a less expensive method for blending the face segments in the future. We believe our concept of combining 3D face editing and reconstruction also sets a new direction for 3D face reconstruction research.


References
1. Allen, B., Curless, B., Popović, Z.: The space of human body shapes: reconstruction and parameterization from range scans. ACM Trans. Graph. (TOG) 22(3), 587–594 (2003)
2. Bai, Z., Cui, Z., Liu, X., Tan, P.: Riggable 3D face reconstruction via in-network optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6216–6225 (2021)
3. Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 187–194 (1999)
4. Bouritsas, G., Bokhnyak, S., Ploumpis, S., Bronstein, M., Zafeiriou, S.: Neural 3D morphable models: spiral convolutional networks for 3D shape representation learning and generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7213–7222 (2019)
5. Chang, F.J., Tran, A.T., Hassner, T., Masi, I., Nevatia, R., Medioni, G.: Expnet: landmark-free, deep, 3D facial expressions. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 122–129. IEEE (2018)
6. Chen, A., Chen, Z., Zhang, G., Mitchell, K., Yu, J.: Photo-realistic facial details synthesis from single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9429–9439 (2019)
7. Deng, Y., Yang, J., Chen, D., Wen, F., Tong, X.: Disentangled and controllable face image generation via 3D imitative-contrastive learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5154–5163 (2020)
8. Deng, Y., Yang, J., Xu, S., Chen, D., Jia, Y., Tong, X.: Accurate 3D face reconstruction with weakly-supervised learning: from single image to image set. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
9. Egger, B., et al.: 3D morphable face models-past, present, and future. ACM Trans. Graph. (TOG) 39(5), 1–38 (2020)
10. Farkas, L.G., Kolar, J.C., Munro, I.R.: Geography of the nose: a morphometric study. Aesthetic Plastic Surg. 10(1), 191–223 (1986)
11. Feng, Y., Feng, H., Black, M.J., Bolkart, T.: Learning an animatable detailed 3D face model from in-the-wild images. ACM Trans. Graph. (TOG) 40(4), 88:1–88:13 (2021)
12. Feng, Y., Wu, F., Shao, X., Wang, Y., Zhou, X.: Joint 3D face reconstruction and dense alignment with position map regression network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 534–551 (2018)
13. Foti, S., Koo, B., Stoyanov, D., Clarkson, M.J.: 3D shape variational autoencoder latent disentanglement via mini-batch feature swapping for bodies and faces. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18730–18739 (2022)
14. Ghafourzadeh, D., et al.: Local control editing paradigms for part-based 3D face morphable models. Comput. Anim. Virt. Worlds 32(6), e2028 (2021)
15. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119 (2020)
16. Kesterke, M.J., et al.: Using the 3D facial norms database to investigate craniofacial sexual dimorphism in healthy children, adolescents, and adults. Biol. Sex Differ. 7(1), 1–14 (2016)
17. Koizumi, T., Smith, W.A.: Shape from semantic segmentation via the geometric Rényi divergence. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2312–2321 (2021)
18. Lattas, A., et al.: Avatarme: realistically renderable 3D facial reconstruction "in-the-wild". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 760–769 (2020)
19. Le, B.H., Deng, Z.: Interactive cage generation for mesh deformation. In: Proceedings of the 21st ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 1–9 (2017)
20. Lee, C.H., Liu, Z., Wu, L., Luo, P.: Maskgan: towards diverse and interactive facial image manipulation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
21. Lee, G.H., Lee, S.W.: Uncertainty-aware mesh decoder for high fidelity 3D face reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6100–6109 (2020)
22. Lewis, J.P., Anjyo, K., Rhee, T., Zhang, M., Pighin, F.H., Deng, Z.: Practice and theory of blendshape facial models. Eurograph. (State Art Rep.) 1(8), 2 (2014)
23. Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph. (TOG) 36(6), 1–17 (2017)
24. Lin, J., Yuan, Y., Shao, T., Zhou, K.: Towards high-fidelity 3D face reconstruction from in-the-wild images using graph convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5891–5900 (2020)
25. Martyniuk, T., Kupyn, O., Kurlyak, Y., Krashenyi, I., Matas, J., Sharmanska, V.: Dad-3dheads: a large-scale dense, accurate and diverse dataset for 3D head alignment from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20942–20952 (2022)
26. Paysan, P., Knothe, R., Amberg, B., Romdhani, S., Vetter, T.: A 3D face model for pose and illumination invariant face recognition. In: 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 296–301. IEEE (2009)
27. Piao, J., Sun, K., Wang, Q., Lin, K.Y., Li, H.: Inverting generative adversarial renderer for face reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15619–15628 (2021)
28. Ramanathan, N., Chellappa, R.: Modeling age progression in young faces. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 1, pp. 387–394. IEEE (2006)
29. Ranjan, A., Bolkart, T., Sanyal, S., Black, M.J.: Generating 3D faces using convolutional mesh autoencoders. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 704–720 (2018)
30. Rhee, S.C., Woo, K.S., Kwon, B.: Biometric study of eyelid shape and dimensions of different races with references to beauty. Aesthetic Plastic Surg. 36(5), 1236–1245 (2012)
31. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
32. Shi, T., Yuan, Y., Fan, C., Zou, Z., Shi, Z., Liu, Y.: Face-to-parameter translation for game character auto-creation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 161–170 (2019)
33. Sorkine, O., Alexa, M.: As-rigid-as-possible surface modeling. In: Symposium on Geometry Processing, vol. 4, pp. 109–116 (2007)
34. Tena, J.R., De la Torre, F., Matthews, I.: Interactive region-based linear 3D face models. In: ACM SIGGRAPH 2011 Papers, pp. 1–10. ACM (2011)
35. Tran, L., Liu, X.: Nonlinear 3D face morphable model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7346–7355 (2018)
36. Vetter, T., Blanz, V.: Estimating coloured 3D face models from single images: an example based approach. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, pp. 499–513. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0054761
37. Wood, E., Baltrusaitis, T., Hewitt, C., Dziadzio, S., Cashman, T.J., Shotton, J.: Fake it till you make it: face analysis in the wild using synthetic data alone. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3681–3691 (2021)
38. Zhu, W., Wu, H., Chen, Z., Vesdapunt, N., Wang, B.: Reda: reinforced differentiable attribute for 3D face reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4958–4967 (2020)

LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing

Hanxi Yin1, Lei Deng2, Zhixiang Chen3, Baohua Chen4, Ting Sun2, Yuseng Xie2, Junwei Xiao1, Yeyu Fu2, Shuixin Deng2, and Xiu Li1(B)

1 Shenzhen International Graduate School, Tsinghua University, Shenzhen, China {yhx20,xjw20}@mails.tsinghua.edu.cn, [email protected]
2 School of Instrument Science and Opto-Electronics Engineering, Beijing Information Science and Technology University, Beijing, China
3 Department of Computer Science, The University of Sheffield, Sheffield, UK
4 Department of Automation, Tsinghua University, Beijing, China

Abstract. Depth sensing is critical to many computer vision applications, but it remains challenging to generate accurate, dense depth information with a single type of sensor. A stereo camera can provide dense depth predictions but underperforms in texture-less, repetitive, and occluded areas, while a LiDAR sensor generates accurate measurements but only a sparse map. In this paper, we advocate fusing LiDAR and stereo cameras for accurate, dense depth sensing. We consider the fusion of multiple sensors as a multimodal prediction problem and propose a novel end-to-end learning framework, dubbed LSMD-Net, to faithfully generate dense depth. The proposed method has a dual-branch disparity predictor and predicts a bimodal Laplacian distribution over disparity at each pixel. The two modes of this distribution capture the information from the two branches, and the prediction from the branch with higher confidence is selected as the final disparity at each pixel. Our fusion method can be applied to different types of LiDAR. Besides the existing dataset captured with a conventional spinning LiDAR, we build a multi-sensor system with a non-repeating scanning LiDAR and a stereo camera and construct a depth prediction dataset with this system. Evaluations on both the KITTI datasets and our home-made dataset demonstrate the superiority of the proposed method in terms of accuracy and computation time.

1 Introduction

Real-time, dense, and accurate depth sensing is crucial for many computer vision applications, including SLAM, autonomous driving, and augmented reality. Two kinds of sensors, active and passive, are used to sense depth, yet both active sensors such as LiDAR scanners and passive sensors such as stereo cameras have limitations.


Fig. 1. Illustration of predicted disparity maps (d–f) based on inputs (a–c). Green/red dotted rectangles show the areas where the depth prediction is accurate/inaccurate. The proposed method (f) can take full advantages of different sensors. (Color figure online)

On the one hand, a stereo camera can provide dense depth estimation but underperforms in texture-less or repetitive areas, occluded areas, thin structures, and poor lighting conditions. On the other hand, a LiDAR scanner provides precise but relatively sparse depth measurements. These limitations hinder their use in practical applications. One possible solution is to combine them through multi-sensor fusion.

In terms of fusing LiDAR with an RGB camera, existing works have proposed fusing LiDAR and a monocular camera [16,20,36]. However, the monocular setting depends on strong scene priors and is vulnerable to overfitting, since monocular depth estimation is inherently unreliable and ambiguous. In this paper, we instead consider LiDAR-stereo fusion. A stereo camera is more robust because it computes geometric correspondences between an image pair, and the fused depth benefits from this robustness.

With a stereo camera and a LiDAR sensor, there are two possible ways to generate a dense depth prediction: (1) stereo matching from a pair of stereo images, and (2) LiDAR completion from sparse LiDAR measurements and an RGB image. The former estimates disparities between the image pair by matching pixels and recovers depth through triangulation, while the latter uses a corresponding RGB image to guide the depth interpolation. These two methods exploit information from different modalities, with different a priori assumptions and characteristics. The performance of stereo matching depends on image matching, while that of LiDAR completion is limited by the density and quality of the point clouds. As illustrated in Fig. 1, stereo matching works well in richly textured areas but has difficulty with fine structures and texture-less areas. LiDAR completion interpolates depth accurately but extrapolates poorly in areas where the point cloud is too sparse or missing; moreover, LiDAR point clouds are of poor quality on reflective surfaces and in distant areas. Based on this analysis, the two methods are expected to complement each other from the perspective of multimodal fusion.

Existing LiDAR-stereo fusion works either use LiDAR information to assist stereo matching [29,37] or simply combine the two at the output stage [28,38]. The former injects LiDAR information into the cost volume, the core component of stereo matching; it is confined to the stereo matching architecture and therefore cannot avoid the inherent drawbacks of image matching.


The latter lacks deep feature fusion and therefore does not fully exploit the intrinsic information of the different sources. To take full advantage of the unique characteristics of each source, we propose a confidence-based fusion method that combines a stereo matching branch and a LiDAR completion branch. The features of the reference image are fed into both branches to be fused with features from the other sensor. This breaks the symmetry of the stereo image pair and thus frees us from the limitations of the stereo matching pipeline. In addition, the confidence-based fusion better resolves the redundancy and contradiction between heterogeneous sources. Furthermore, our model is built over disparity, which is inversely proportional to depth. Inverse depth allows a probability distribution to describe depth from nearby to infinity and is more stable to regress, as it has a finite range.

Specifically, we formulate LiDAR-stereo fusion as a multimodal prediction problem. Instead of regressing disparity directly, we use a mixture density network to estimate a bimodal probability distribution over possible disparities for each pixel. The prediction from the branch with higher confidence is selected as the final disparity at each pixel. To further evaluate our method, we constructed a dataset based on a solid-state Livox LiDAR. Compared with a conventional spinning LiDAR, a solid-state LiDAR is better suited to LiDAR-stereo fusion in various scenarios, thanks to its large FOV overlap with the RGB camera, its advantage in point cloud density, and its affordable cost.

In summary, the contributions of this paper are as follows: (I) We propose a novel end-to-end dual-branch learning framework called LSMD-Net (LiDAR-Stereo fusion with Mixture Density Network) to fuse LiDAR and stereo cameras for accurate and dense depth estimation in real time. (II) We treat multi-sensor fusion as a multimodal prediction problem. A bimodal distribution captures information from the different modes and provides a per-pixel measure of confidence for each of them, which allows us to take full advantage of the different sensors for better depth prediction. (III) We build a data collection system equipped with a solid-state LiDAR and a stereo camera and present a depth prediction dataset.

2 Related Works

Stereo Matching. With the development of convolutional neural networks (CNNs), learning-based stereo matching methods have achieved great success. An end-to-end stereo matching architecture has four trainable components: (a) feature extraction, (b) cost volume, (c) aggregation, and (d) regression. Most methods can be categorized into 2D and 3D architectures according to the type of cost volume. The first class [24,27] uses a correlation layer to build a 2D cost volume and 2D CNNs for aggregation, and is less accurate than the second. The second class constructs a 3D cost volume by concatenating image features [4,19] or using group-wise correlation [13]; although more accurate, it suffers from high computational complexity. Stereo matching performs poorly in texture-less or repetitive areas, so fusion with LiDAR is important for obtaining reliable depth estimation.


Fig. 2. An overview of our method. Our model has two branches (blue and yellow) which extract features from different sensors. Features from different branches are fused in mixture density module. For each pixel in reference image, a bimodal Laplacian distribution (light orange curve) is predicted. At inference time, we use the expectation of the more confident branch as the final disparity for each specific pixel (as shown by the dotted line). (Color figure online)

LiDAR-RGB Fusion. LiDAR-camera fusion is well known for its practicality in 3D perception. There are two types: LiDAR-monocular and LiDAR-stereo fusion. The former, also known as depth completion, regresses dense depth from sparse depth cues with the help of monocular image information [16,20,36]. Relying on priors of a particular scene, LiDAR-monocular fusion is inherently unreliable and ambiguous [37]. LiDAR-stereo fusion is less ambiguous in terms of absolute distance, because stereo matching relies on geometric correspondences across images. Several works [1,9,22,26,34] have studied LiDAR-stereo fusion in robotics over the past two decades. Park et al. [28] were the first to apply CNNs in the context of LiDAR-stereo fusion. Learning-based methods can be roughly divided into feature-level fusion and decision-level fusion: feature-level fusion [5,29,37,41] encodes LiDAR information at an early stage of stereo matching, while decision-level fusion [28,38] directly fuses the hypotheses generated from different sensors. Our LiDAR-stereo fusion method takes advantage of both feature-level fusion and confidence-based decision-level fusion.


Multimodal Predictions with CNNs. Standard depth prediction works directly regress a scalar depth at every pixel. However, a LiDAR-stereo fusion system is more complex: there are multimodal inputs, and more than one candidate output can be obtained from them. Much work has been done on obtaining multiple solutions from CNNs. Guzman-Rivera et al. [14] introduced the Winner-Takes-All (WTA) loss for classification tasks; another option is Mixture Density Networks (MDNs) by Bishop [3]. Instead of using a fixed parametric distribution, MDNs learn the parameterization as part of the neural network. Makansi et al. [23] used MDNs for multimodal future prediction. Hager [17] proposed an MDN-based approach to estimate uncertainty in stereo disparity prediction networks. Tosi et al. [35] use a bimodal approach to address the over-smoothing issue in stereo matching, which greatly inspired us. In contrast, we apply bimodal distributions to capture information from two sensors, which resolves the redundancy and contradiction between heterogeneous sources.

3 LSMD-Net

As shown in Fig. 2(a), our model aims to generate an accurate dense disparity map D_d ∈ R^{H×W} given sparse LiDAR measurements S ∈ R^{n×3} (n < H × W) and a pair of stereo images I_ref, I_tgt ∈ R^{H×W×3}. A dense depth map can then be obtained from D_d. Our pipeline has two stages. At the first stage (Sect. 3.1), we separately estimate dense disparity maps and confidence-related feature maps for the images and the LiDAR measurements: the image-based disparity estimate is obtained by stereo matching, and the LiDAR-based estimate by LiDAR completion guided by the reference image. At the second stage (Sect. 3.2), we employ a mixture density module to fuse these estimates into a final dense depth map. In particular, we introduce a confidence-based fusion to effectively exploit the information from the different sensors.

3.1 Dual-Branch Disparity Predictor

To estimate dense disparity and features from the images and the LiDAR measurements, we employ separate disparity prediction branches for stereo matching and LiDAR completion. Before passing them through the individual branches, we first preprocess the images and the LiDAR signal. The images are processed by a backbone network to extract meaningful features: we adopt a MobileNetV2 model [31] pre-trained on ImageNet [8] to extract image features at 1/2, 1/4, 1/8, 1/16, and 1/32 of the input image resolution. Both the reference and target images are passed through the same backbone with shared weights to obtain the feature groups G_ref and G_tgt. Inspired by the parametrization used in monocular SLAM [7], we project the sparse LiDAR points S onto the image plane of I_ref and convert the projected depth map into a disparity map D_s. Note that disparity is inversely proportional to depth, and our network directly predicts disparity rather than depth. This conversion has two advantages: it allows us to cover a wider depth range, and the regression target has a finite range, which makes the prediction more stable.
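The projection of LiDAR points into the reference view and the depth-to-disparity conversion can be sketched as below. The sketch assumes rectified images, known LiDAR-to-camera extrinsics, and a known stereo baseline; none of these details are specified at this point in the paper.

```python
import numpy as np

def lidar_to_sparse_disparity(points, K, T_cam_lidar, baseline, H, W):
    """Project LiDAR points into the reference view and convert depth to disparity.

    points:      (n, 3) LiDAR points in the LiDAR frame.
    K:           (3, 3) intrinsics of the rectified reference camera.
    T_cam_lidar: (4, 4) transform from the LiDAR frame to the camera frame.
    baseline:    stereo baseline in metres.
    Returns an (H, W) sparse disparity map D_s (zero where no measurement).
    """
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    valid = pts_cam[:, 2] > 0                      # keep points in front of the camera
    pts_cam = pts_cam[valid]

    proj = (K @ pts_cam.T).T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    z = pts_cam[:, 2]

    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    disparity = K[0, 0] * baseline / z[inside]     # d = f_x * B / z
    D_s = np.zeros((H, W), dtype=np.float32)
    D_s[v[inside], u[inside]] = disparity
    return D_s
```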


Stereo Matching Branch. This branch takes as input the feature groups of the reference and target images, G_ref and G_tgt, and generates a disparity map μ_s ∈ R^{H×W} and a feature F_s ∈ R^{H/4×W/4×D/4}, where D is the maximum disparity value (set to 192 in our network). This feature contains matching probabilities along the possible disparities and is used at the fusion stage. Denoting this stereo matching branch φ with parameters θ, we can write

{F_s, μ_s} = φ_θ(G_ref, G_tgt).   (1)

Our design of this branch follows the mainstream learning-based stereo matching framework: matching cost computation, cost aggregation, and disparity regression. With the feature groups G_ref and G_tgt as input, a U-Net [30] style upsampling module with long skip connections at each scale propagates context information to higher-resolution layers. Image features at less than 1/4 of the input resolution in G_ref and G_tgt are upsampled by this module to a quarter of the input resolution. The cost volume is then built by computing the correlation between the outputs of the upsampling module; we keep its size at H/4 × W/4 × D/4 to reduce the aggregation cost. For cost aggregation, instead of neighborhood aggregation, we first capture geometric features from the cost volume with 3D convolutions and then use guidance weights generated from the image features to redistribute this geometric information to local features, as in CoEx [2]. The output feature F_s comes from the aggregated cost volume. To reduce the regression time, our model regresses disparity at 1/4 of the input resolution from the cost volume and finally upsamples it to the original resolution.

LiDAR Completion Branch. This branch takes as input the feature group of the reference image, G_ref, and a sparse disparity map D_s, and generates a dense disparity map μ_l ∈ R^{H×W} and a feature map F_l ∈ R^{H/4×W/4×C}, where C is set to 64. This feature contains information extracted from the LiDAR measurements and the reference image and is used at the fusion stage. Denoting this disparity completion branch ψ with parameters ω, we can write

{F_l, μ_l} = ψ_ω(G_ref, D_s).   (2)

This disparity completion can be considered a disparity map interpolation guided by the reference image features. Similar to MSG-CHN [20], we use coarse-to-fine cascaded hourglass CNNs to interpolate the disparity features at three levels. The output of the coarse level is upsampled and concatenated with the sparse disparity map at the corresponding scale as the input of the finer level. At each level, the hourglass CNNs refine the disparity features according to both the input disparity features and the corresponding-scale image features from G_ref; the disparity features and the image features are fused by concatenation. We expect this design to exploit the cues in the reference image to guide the interpolation of the LiDAR disparities. The output feature F_l is extracted from the last hourglass CNN; it contains LiDAR and image information and has the same resolution as F_s.
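For concreteness, the correlation cost volume built in the stereo matching branch above can be sketched as follows, assuming the left image is the reference view. The actual branch additionally performs CoEx-style guided aggregation, which is omitted here.

```python
import torch

def correlation_cost_volume(feat_ref, feat_tgt, max_disp):
    """Correlation cost volume at 1/4 resolution.

    feat_ref, feat_tgt: (B, C, H/4, W/4) features of the reference / target views.
    max_disp:           maximum disparity at this scale (D/4).
    Returns a (B, max_disp, H/4, W/4) volume of per-disparity correlations.
    """
    B, C, H, W = feat_ref.shape
    volume = feat_ref.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_ref * feat_tgt).mean(dim=1)
        else:
            # Correlate reference pixels at x with target pixels at x - d.
            volume[:, d, :, d:] = (feat_ref[:, :, :, d:] *
                                   feat_tgt[:, :, :, :-d]).mean(dim=1)
    return volume
```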

3.2 Mixture Density Module

The mixture density module is used to fuse the disparity information from the two branches. We view the estimated disparity at each pixel as a probability distribution over the possible range of disparities, and the fusion of the two disparity estimates yields the final probability distribution, from which we obtain the output disparity. Specifically, we model the probability distribution of each branch with a Laplacian distribution,

P(d) = (1 / (2b)) · exp(−|μ − d| / b),   (3)

where μ is the location parameter map and b is the scale parameter map. We opt for the Laplacian distribution rather than the widely used Gaussian because the Gaussian assumption is sensitive to outliers, whereas the Laplacian is more robust owing to its heavier tails. We take the estimated disparities from each branch, μ_s and μ_l, as the location parameters. Rather than setting a global scale parameter for each branch, we learn pixel-wise scale parameters from the branch features F_s and F_l with two independent networks, as shown in Fig. 2(d):

b_s = σ(MLP(F_s)), b_l = σ(MLP(F_l)),   (4)

where σ(·) is the activation function. Traditional mixture density networks adopt an exponential activation to keep the predicted parameters positive. However, the exponential grows very large in the case of high variance, which makes training unstable. In this work, following [12], we instead choose ELU as the activation function; it shares the exponential behavior for small activation values but is linear in the input for large activation values:

f(β, x) = ELU(β, x) + 1 = { β(e^x − 1) + 1, x < 0 ;  x + 1, x ≥ 0 },   (5)

where β is a parameter that controls the slope and is set to 1 in our work. With the Laplacian distributions of the two branches, P_s and P_l, at hand, we compute the final probability distribution as a weighted sum of the two:

P_m(d) = α·P_s(d) + (1 − α)·P_l(d) = (α / (2b_s)) · e^(−|μ_s − d| / b_s) + ((1 − α) / (2b_l)) · e^(−|μ_l − d| / b_l),   (6)

where α is a parameter map weighting the contributions of the two branches. We also design a network to learn α from the branch features F_s and F_l (Fig. 2(d)): convolutional layers process and aggregate the branch features, followed by an MLP and a sigmoid activation. We can use the fused distribution in Eq. 6 to compute the loss at the training stage. At the inference stage, however, we aim to predict a single disparity value for each pixel. One possible solution is to use the conditional expectation as the final output. In our case,


Fig. 3. Illustration of the characteristics of the two LiDARs. (a) shows that the solid-state LiDAR (red) fits better with the camera (green) than the conventional LiDAR (blue) in terms of FoV. (b) and (c) illustrate the point cloud accumulation of the solid-state Livox LiDAR quantitatively and qualitatively. (Color figure online)

one branch may be more confident than the other for particular pixels. Simply taking the conditional expectation would deteriorate the performance, since outliers from the less confident branch would be taken into account. We therefore use the expectation of the more confident branch as the final disparity prediction d̂, where the branch confidence is determined by α:

d̂ = { μ_s, if α ≥ 0.5 ;  μ_l, if α < 0.5 }.   (7)
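A minimal sketch of the mixture-density head described by Eqs. (4)–(7): the shifted ELU keeps the per-pixel scales positive, a sigmoid head predicts the mixture weight α, and the hard selection of Eq. (7) is applied at inference. Layer widths and module names are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn

class MixtureDensityHead(nn.Module):
    """Predicts per-pixel Laplacian scales b_s, b_l and mixture weight alpha."""

    def __init__(self, c_s, c_l, hidden=64):
        super().__init__()
        # Per-branch scale heads (1x1 convolutions act as pixel-wise MLPs).
        self.scale_s = nn.Sequential(nn.Conv2d(c_s, hidden, 1), nn.ReLU(True), nn.Conv2d(hidden, 1, 1))
        self.scale_l = nn.Sequential(nn.Conv2d(c_l, hidden, 1), nn.ReLU(True), nn.Conv2d(hidden, 1, 1))
        # Weight head fed with both branch features.
        self.weight = nn.Sequential(nn.Conv2d(c_s + c_l, hidden, 3, padding=1), nn.ReLU(True),
                                    nn.Conv2d(hidden, 1, 1), nn.Sigmoid())
        self.elu = nn.ELU(alpha=1.0)

    def forward(self, F_s, F_l, mu_s, mu_l):
        # Eq. (5): ELU(x) + 1 keeps the scales positive without exploding like exp().
        b_s = self.elu(self.scale_s(F_s)) + 1.0
        b_l = self.elu(self.scale_l(F_l)) + 1.0
        # Mixture weight alpha in Eq. (6).
        alpha = self.weight(torch.cat([F_s, F_l], dim=1))
        # Eq. (7): hard selection of the more confident branch at inference time.
        disparity = torch.where(alpha >= 0.5, mu_s, mu_l)
        return disparity, (mu_s, b_s), (mu_l, b_l), alpha
```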

3.3 Losses

Our model is trained with supervision on the final output and intermediate supervision over the two branches. The training objective is to minimize the overall loss

L = ω_m·L_m + ω_s·L_s + ω_l·L_l,   (8)

where ω_m, ω_s and ω_l are the weighting parameters for the three losses L_m, L_s and L_l. L_m is the loss over the final output of the fusion, while L_s and L_l are the losses over the outputs of the stereo matching branch and the LiDAR completion branch, respectively. Each loss is the negative log-likelihood computed from the corresponding PDF (P_m, P_s, P_l) in Eq. 6, which can be expressed as

L_NLL(θ) = −E_{d,x,I} [ log P(d | x, I_ref, I_tgt, D_s, θ) ],   (9)

where d is the ground truth disparity at each pixel location x in reference image Iref from the dataset. θ denotes parameters of our model.
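The negative log-likelihood of Eq. (9) has a closed form for the Laplacian distributions in Eqs. (3) and (6). The sketch below is our own simplified formulation: it uses a log-sum-exp for numerical stability and assumes a validity mask marking pixels with ground-truth disparity; function names are illustrative.

```python
import torch

def laplacian_nll(d_gt, mu, b):
    """-log P(d) for a single Laplacian branch (Eq. 3)."""
    return torch.log(2.0 * b) + torch.abs(mu - d_gt) / b

def mixture_nll(d_gt, mu_s, b_s, mu_l, b_l, alpha, valid):
    """-log P_m(d) for the bimodal Laplacian mixture (Eqs. 6 and 9)."""
    log_ps = torch.log(alpha + 1e-6) - laplacian_nll(d_gt, mu_s, b_s)
    log_pl = torch.log(1.0 - alpha + 1e-6) - laplacian_nll(d_gt, mu_l, b_l)
    nll = -torch.logsumexp(torch.stack([log_ps, log_pl], dim=0), dim=0)
    return nll[valid].mean()

def total_loss(d_gt, out_s, out_l, alpha, valid, w_m=0.7, w_s=0.2, w_l=0.1):
    """Weighted sum of the mixture loss and the two branch losses (Eq. 8)."""
    mu_s, b_s = out_s
    mu_l, b_l = out_l
    loss_m = mixture_nll(d_gt, mu_s, b_s, mu_l, b_l, alpha, valid)
    loss_s = laplacian_nll(d_gt, mu_s, b_s)[valid].mean()
    loss_l = laplacian_nll(d_gt, mu_l, b_l)[valid].mean()
    return w_m * loss_m + w_s * loss_s + w_l * loss_l
```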

4 Livox-Stereo Dataset

There are two widely used kinds of LiDAR sensors: mechanical spinning LiDAR and solid state LiDAR. The mechanical spinning LiDAR uses mechanical rotation to spin the sensor for 360◦ detection. The density of collected point cloud


Fig. 4. Illustration of (a) data collecting system hardware, (b) a set of data in dataset and (c) the depth distribution of dataset.

is determined by the number of scan layers. Most existing depth sensing datasets [36,40] use this kind of sensor to acquire LiDAR information. Solid-state LiDARs such as Livox use prism scanning to acquire depth information in a non-repeating scanning pattern and accumulate point clouds to generate relatively dense depth maps [21,42]. Livox LiDAR is promising for the LiDAR-stereo fusion task for three reasons. First, as shown in Fig. 3(a), it fits well with cameras because of their large overlapping FoV: Livox collects a point cloud that covers a larger image area, rather than focusing on a limited number of scan lines with a narrow FoV like traditional LiDAR, as illustrated in Fig. 3(c). Second, compared to traditional LiDAR, Livox has advantages in terms of point cloud density within the image area when the accumulation time is sufficient, as Fig. 3(b) shows. Third, Livox is promising in various scenarios owing to its portability and cost advantages. To show the feasibility of our proposed depth fusion method on different LiDAR sensors, we further present a Livox-stereo dataset collected with our own system for evaluation.

4.1 Data Collecting System

As shown in Fig. 4(a), our system hardware includes two HIKVISION MV-CA050-11UC color cameras stacked vertically and a Livox Mid-70 LiDAR. The distance between the two cameras is 30 cm, which keeps the depth error within four centimeters for depths up to three meters (see Suppl. for the error estimation). The Livox is placed close to the reference camera (the lower camera) to increase the overlap between their FoVs. The calibration process of our system can be divided into two steps. First, we follow the binocular calibration process in OpenCV [18] to compute the intrinsic and extrinsic parameters of the stereo cameras. After that, we use the calibrated reference camera to estimate the extrinsic parameters of the Livox. Specifically, the Livox extrinsic parameters are derived by PnP and RANSAC after extracting corresponding key points between the depth map generated by the Livox and the remapped image from the reference camera [39].
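The second calibration step can be sketched with standard OpenCV routines, assuming the 3D–2D key-point correspondences between the Livox depth map and the reference image (following [39]) are already extracted. The function and parameter choices below are illustrative assumptions, not the exact procedure used for the dataset.

```python
import cv2
import numpy as np

def calibrate_livox_extrinsics(pts_lidar_3d, pts_image_2d, K, dist_coeffs):
    """Estimate LiDAR-to-camera extrinsics from 3D-2D key-point correspondences.

    pts_lidar_3d: (N, 3) key points in the Livox frame (from the LiDAR depth map).
    pts_image_2d: (N, 2) matching pixel locations in the reference camera image.
    K, dist_coeffs: intrinsics of the calibrated reference camera.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_lidar_3d.astype(np.float32),
        pts_image_2d.astype(np.float32),
        K, dist_coeffs,
        reprojectionError=3.0, iterationsCount=1000)
    if not ok:
        raise RuntimeError("PnP-RANSAC failed to find a pose")

    R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()  # 4x4 LiDAR-to-camera transform
    return T, inliers
```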


Fig. 5. Illustration of noise filtering in the raw point cloud. Background points (blue) that fall on the foreground surface are noise. Most of the noise has been removed successfully. Areas of interest are zoomed in within the dotted boxes. (Color figure online)

4.2 Livox-Stereo Dataset

We collected 507 sets of data in both indoor and outdoor scenes in residential areas. Each set consists of a pair of stereo images at a resolution of 1224×1024 and a pair of sparse and dense depth maps. A sample set is shown in Fig. 4(b); more examples are in the Suppl. The sparse and dense depth maps are collected by the same Livox sensor on the same scene but with different accumulation times. Specifically, we obtain a sparse depth map with a coverage of around 10% by setting the accumulation time to 0.3 s. The dense depth map is obtained by accumulating the point clouds for 3 s to achieve a coverage of around 60%. The dense depth maps are used as ground truth to train our fusion model. We split the dataset into train, validation and test subsets at a ratio of roughly 7:1:2; detailed statistics of the subsets are in the Suppl. We present the statistics of pixel-level depth in Fig. 4(c). From the curves in the figure, we can see that the majority of pixels have small depth values. Specifically, the depths of 53.07% of the pixels fall within 3 m (80.55% and 44.27% for indoor and outdoor, respectively) and 68.75% fall within five meters. This depth distribution matches well with the depth range in which the stereo cameras can produce depth with low errors. During data collection, we observed that the difference between the projection centers of the Livox and the stereo cameras leads to noise when point clouds are projected onto the image plane. To remove this noise, we identify inconsistent LiDAR points by applying semi-global matching (SGM) [15] and reject these points. This point cloud filtering is based on the assumption that passive and active sensors rarely make the same inaccurate prediction in these problematic areas. We show some examples of this filtering in Fig. 5. Empirically, we found that it works well and produces clean point clouds for our task. In Table 1, we compare our dataset with other relevant datasets. Our dataset is unique in several respects. First, the point clouds in our dataset are collected by a Livox LiDAR; its non-repeating scanning pattern allows our dataset to provide sparse LiDAR inputs while offering denser depth supervision than other LiDAR-based datasets. Besides, our dataset includes both indoor and outdoor scenes, which differs from the autonomous driving scenes in existing datasets. This is promising for improving the generalization performance of LiDAR-RGB fusion models in practical applications.
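A hedged sketch of the SGM-based filtering idea is given below, using OpenCV's StereoSGBM as a stand-in for SGM [15] and assuming a horizontally rectified pair; the agreement threshold and the rule for keeping points where SGM is invalid are our own assumptions rather than the exact procedure used for the dataset.

```python
import cv2
import numpy as np

def filter_lidar_by_sgm(img_ref_gray, img_tgt_gray, lidar_disp, max_disp=192, tol=3.0):
    """Reject projected LiDAR points that disagree with an SGM disparity estimate.

    lidar_disp: (H, W) sparse disparity map from projected Livox points (0 = no point).
    tol:        disparity agreement threshold in pixels (an assumed value).
    """
    sgm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=max_disp,
                                blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5)
    # StereoSGBM returns fixed-point disparities scaled by 16.
    sgm_disp = sgm.compute(img_ref_gray, img_tgt_gray).astype(np.float32) / 16.0

    has_point = lidar_disp > 0
    consistent = np.abs(lidar_disp - sgm_disp) < tol
    # Keep points that agree with SGM, or fall where SGM has no valid estimate.
    keep = has_point & (consistent | (sgm_disp <= 0))

    return np.where(keep, lidar_disp, 0.0)
```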


Table 1. Comparison between our dataset and other published depth sensing datasets.

Datasets | Tools | Real | Sparse LiDAR | Scene type | Train | Test | Coverage
Middlebury [32] | Structured light scanner | ✓ | – | Indoor | 15 | 15 | ≈ 96%
ETH3D [33] | Structured light scanner | ✓ | – | Indoor/outdoor | 27 | 20 | ≈ 69%
KITTI stereo [10,25] | Velodyne HDL-64E | ✓ | ✓ | Autonomous driving | 394 | 395 | ≈ 19%
KITTI depth completion [36] | Velodyne HDL-64E | ✓ | ✓ | Autonomous driving | 43k | 1k | 16.1%
FlyingThings3D [24] | Software | – | – | Animation | 22k | 42k | 100%
DrivingStereo [40] | Velodyne HDL-64E S3 | ✓ | ✓ | Autonomous driving | 174k | 8k | ≈ 4%
Ours | Livox Mid-70 | ✓ | ✓ | Indoor/outdoor | 407 | 100 | ≈ 60%

5 Experiments

In this section, we demonstrate the effectiveness of our proposed depth fusion method on three datasets: the KITTI Stereo 2015 dataset [25], the KITTI Depth Completion dataset [36] and our new Livox-stereo dataset.

5.1 Datasets and Evaluation Metrics

KITTI Stereo 2015 and KITTI Depth Completion are real-world datasets with street views from a driving car. We follow [9,11,22,37] to evaluate our model on the training set of the Stereo 2015 dataset and the validation set of the Depth Completion dataset (see Suppl. for more details). For the Stereo 2015 dataset, we report several common stereo matching metrics: the end-point error (EPE, the mean average disparity error in pixels) and the percentage of disparity errors greater than 1, 2 and 3 pixel(s) away from the ground truth (>1px, >2px and >3px). For the Depth Completion dataset, the root mean squared error of depth (RMSE, m), mean absolute error of depth (MAE, m), root mean squared error of the inverse depth (iRMSE, 1/km) and mean absolute error of the inverse depth (iMAE, 1/km) are reported.

5.2 Implementation Details

The proposed network is implemented in PyTorch and optimized with Adam (β1 = 0.9, β2 = 0.999) and a learning rate of 1e−3. Our model is trained on an NVIDIA GeForce RTX 2080 with random changes of brightness and contrast, random dropout of part of the disparity inputs (see Suppl. for more details) and random cropping to 512 × 256 as data augmentation. We initialize the network with random parameters. A weight decay of 1e−4 is applied for regularization. For the KITTI datasets, the network is trained on the Depth Completion dataset for 20 epochs with a batch size of 4 and is tested on both KITTI datasets. The weighting parameters in the loss function are set to ωs = 0.8, ωm = ωl = 0.1 for the first 5 epochs, ωl = 0.8, ωm = ωs = 0.1 for the next 5 epochs, and ωm = 0.7, ωs = 0.2, ωl = 0.1 after 10 epochs. Following [37], input images are bottom-cropped to 1216 × 256 since there is no ground truth at the top. For our Livox-stereo dataset, we fine-tune the network pretrained on the KITTI Depth Completion dataset for another 200 epochs, with loss weights ωm = 1.0, ωs = 0.25, ωl = 0.125.
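The optimizer and the staged loss-weight schedule for KITTI training described above can be summarized as follows. This is a minimal sketch of the stated hyper-parameters; the model class and training loop are assumed to exist elsewhere.

```python
import torch

def build_optimizer(model):
    # Adam with the stated hyper-parameters and weight decay for regularization.
    return torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                            weight_decay=1e-4)

def loss_weights(epoch):
    """Staged loss weighting for KITTI training: emphasize the stereo branch first,
    then the LiDAR branch, and finally the fused (mixture) output."""
    if epoch < 5:
        return dict(w_m=0.1, w_s=0.8, w_l=0.1)
    if epoch < 10:
        return dict(w_m=0.1, w_s=0.1, w_l=0.8)
    return dict(w_m=0.7, w_s=0.2, w_l=0.1)
```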


Table 2. Comparison on the KITTI Stereo 2015 dataset.

Methods | Input | >3px ↓ | >2px ↓ | >1px ↓ | EPE ↓
GC-Net [19] | Stereo | 4.24 | 5.82 | 9.97 | –
CoEx [2] | Stereo | 3.82 | 5.59 | 10.67 | 1.06
Prob. Fusion [22] | Stereo + LiDAR | 5.91 | – | – | –
Park et al. [28] | Stereo + LiDAR | 4.84 | – | – | –
CCVN [37] | Stereo + LiDAR | 3.35 | 4.38 | 6.79 | –
LSMD-Net (ours) | Stereo + LiDAR | 2.37 | 3.18 | 5.19 | 0.86

Table 3. Comparison on the KITTI Depth Completion dataset.

Methods | Input | MAE ↓ | iMAE ↓ | RMSE ↓ | iRMSE ↓
MSG-CHN [20] | Mono + LiDAR | 0.2496 | 1.11 | 0.8781 | 2.59
Park et al. [28] | Stereo + LiDAR | 0.5005 | 1.38 | 2.0212 | 3.39
SCADC [38] | Stereo + LiDAR | 0.4015 | 1.94 | 1.0096 | 3.96
CCVN [37] | Stereo + LiDAR | 0.2525 | 0.81 | 0.7493 | 1.40
LiStereo [41] | Stereo + LiDAR | 0.2839 | 1.10 | 0.8322 | 2.19
VPN [6] | Stereo + LiDAR | 0.2051 | 0.99 | 0.6362 | 1.87
LSMD-Net (ours) | Stereo + LiDAR | 0.2100 | 0.79 | 0.8845 | 1.85

5.3 Results on KITTI Stereo 2015 Dataset

We compare the performance of LSMD-Net with stereo matching methods [2,19] and other publicly available LiDAR-stereo fusion methods [22,28,37]. Note that CoEx [2] is our baseline stereo matching method. The quantitative results in Table 2 show that our method outperforms the other methods in terms of disparity metrics. This further demonstrates the advantage of LSMD-Net in depth prediction, since disparity is inversely proportional to depth.

5.4 Results on KITTI Depth Completion Dataset

We converted the predicted disparity maps into depth maps and compare our LSMD-Net with other depth prediction methods in Table 3. Our method is comparable to the other methods in terms of depth prediction. A qualitative comparison on the test set is shown in Fig. 6. LiDAR completion (MSG-CHN) is more accurate than stereo matching (CoEx) in depth measurement, but performs poorly at the upper part of the maps due to the absence of point clouds. Stereo matching is less precise and performs badly on fine structures, but is more stable than LiDAR. Our method leverages the unique characteristics of the different sensors and provides accurate depth measurements throughout the maps.


Fig. 6. Qualitative results on the KITTI Depth Completion test set. Predicted depth maps of three scenes from methods based on different sensors are illustrated. Boxes of interest are zoomed in at the bottom of the maps.

Table 4. Quantitative results on the Livox-stereo test set.

Methods | Input | >3px ↓ | EPE ↓ | MAE ↓ | RMSE ↓
CoEx [2] | Stereo | 12.72 | 2.80 | – | –
MSG-CHN [20] | Mono + LiDAR | – | – | 0.3437 | 1.08
CCVN [37] | Stereo + LiDAR | 6.57 | 1.86 | 0.6569 | 1.85
LSMD-Net (ours) | Stereo + LiDAR | 5.28 | 1.32 | 0.1957 | 0.93

5.5 Results on Livox-stereo Dataset

The proposed method was further evaluated on our home-made Livox-stereo dataset. LSMD-Net is compared with MSG-CHN and CoEx on depth maps and disparity maps, respectively, in Table 4. Our method has a clear advantage over the other depth sensing methods on all metrics. Qualitative results can be found in Fig. 7. As mentioned in Sect. 3.2, the weight of the two modes, α, determines the branch confidence. The α map in Fig. 7(e) presents the distinctive Livox scanning patterns and discontinuous object edges, which indicates that LiDAR completion is less reliable in areas without LiDAR measurements and at object edges. Our method can capture the advantages of the different sensors using this map.

5.6 Ablation Study

An ablation study is performed on the Livox-stereo dataset to study the effect of using different probability distribution models. Four distribution models are tested and their results are reported in Table 5. The Laplacian distribution over disparity, which we adopt, outperforms the others.

Table 5. Comparison of different probability distribution models.

Models | >3px ↓ | EPE ↓ | MAE ↓ | RMSE ↓
Gaussian distribution over depth | 7.93 | 1.76 | 0.3294 | 1.29
Gaussian distribution over disparity | 5.71 | 1.40 | 0.2102 | 0.94
Laplacian distribution over depth | 6.15 | 1.47 | 0.2750 | 1.32
Laplacian distribution over disparity | 5.28 | 1.32 | 0.1957 | 0.93


Fig. 7. Qualitative results of our LSMD-Net on the Livox-stereo dataset. The brighter part of (e) indicates that the LiDAR completion branch is more reliable and the darker part indicates that the stereo matching branch is more reliable.

Table 6. Computational time of different methods (unit: millisecond).

Methods | GC-Net [19] | CCVN [37] | VPN [6] | SCADC [38] | CoEx [2] | LSMD-Net (ours)
Time | 962 | 1011 | 1400 | ≈ 800 | 22 | 27

5.7 Computational Time

We provide a reference for computational time on KITTI in Table 6. The proposed method takes slightly longer (5 ms more) than the baseline method CoEx [2], but provides a significant improvement in performance, validating the efficiency of our fusion scheme. The other stereo matching method [19] and the LiDAR-stereo fusion methods [6,37,38] take much more time than our method.

6 Conclusion

In this work, we treat multisensor fusion as a multimodal prediction problem and present a real-time dual-branch LiDAR-stereo fusion method for efficient depth sensing. The proposed method utilizes a mixture density network to predict a bimodal Laplacian distribution at each pixel. The two modes capture information from stereo matching and LiDAR completion, respectively, and provide a measure of confidence for each. Our method excels in terms of accuracy and computational time on both KITTI and our home-made Livox-stereo datasets.

Acknowledgement. This work was supported by the National Key Research and Development Program of China under Grant 2020AAA0108302.


References 1. Badino, H., Huber, D., Kanade, T.: Integrating lidar into stereo for fast and improved disparity computation. In: 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, pp. 405–412 (2011). https://doi.org/10.1109/3DIMPVT.2011.58 2. Bangunharcana, A., Cho, J.W., Lee, S., Kweon, I.S., Kim, K.S., Kim, S.: Correlateand-excite: real-time stereo matching via guided cost volume excitation. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3542–3548 (2021). https://doi.org/10.1109/IROS51168.2021.9635909 3. Bishop, C.M.: Mixture density networks. IEEE Computer Society (1994) 4. Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5410–5418 (2018). https://doi.org/10.1109/CVPR.2018.00567 5. Cheng, X., Zhong, Y., Dai, Y., Ji, P., Li, H.: Noise-aware unsupervised deep lidarstereo fusion. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6332–6341 (2019). https://doi.org/10.1109/CVPR.2019. 00650 6. Choe, J., Joo, K., Imtiaz, T., Kweon, I.S.: Volumetric propagation network: stereolidar fusion for long-range depth estimation. IEEE Robot. Automation Lett. 6(3), 4672–4679 (2021). https://doi.org/10.1109/LRA.2021.3068712 7. Civera, J., Davison, A.J., Montiel, J.M.M.: Inverse depth parametrization for monocular slam. IEEE Trans. Rob. 24(5), 932–945 (2008). https://doi.org/10. 1109/TRO.2008.2003276 8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). https://doi.org/10.1109/CVPR.2009. 5206848 ˇ 9. Gandhi, V., Cech, J., Horaud, R.: High-resolution depth maps based on tof-stereo fusion. In: 2012 IEEE International Conference on Robotics and Automation, pp. 4742–4749 (2012). https://doi.org/10.1109/ICRA.2012.6224771 10. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012). https://doi.org/10.1109/CVPR.2012.6248074 11. Godard, C., Aodha, O.M., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6602–6611 (2017). https://doi.org/10.1109/ CVPR.2017.699 12. Guillaumes, A.B.: Mixture density networks for distribution and uncertainty estimation. Ph.D. thesis, Universitat Polit`ecnica de Catalunya. Facultat d’Inform` atica de Barcelona (2017) 13. Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3268–3277 (2019). https://doi.org/10.1109/CVPR.2019.00339 14. Guzman-Rivera, A., Batra, D., Kohli, P.: Multiple choice learning: learning to produce multiple structured outputs. In: Advances in Neural Information Processing Systems 25 (2012) 15. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008). https://doi. org/10.1109/TPAMI.2007.1166


16. Huang, Z., Fan, J., Cheng, S., Yi, S., Wang, X., Li, H.: Hms-net: hierarchical multiscale sparsity-invariant network for sparse depth completion. IEEE Trans. Image Process. 29, 3429–3441 (2020). https://doi.org/10.1109/TIP.2019.2960589 17. H¨ ager, G., Persson, M., Felsberg, M.: Predicting disparity distributions. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4363– 4369 (2021). https://doi.org/10.1109/ICRA48506.2021.9561617 18. Itseez: Open source computer vision library (2015). https://github.com/itseez/ opencv 19. Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 66–75 (2017). https://doi.org/10.1109/ICCV.2017.17 20. Li, A., Yuan, Z., Ling, Y., Chi, W., Zhang, S., Zhang, C.: A multi-scale guided cascade hourglass network for depth completion. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 32–40 (2020). https://doi.org/ 10.1109/WACV45572.2020.9093407 21. Liu, Z., Zhang, F., Hong, X.: Low-cost retina-like robotic lidars based on incommensurable scanning. IEEE/ASME Trans. Mechatron. 27(1), 58–68 (2022). https:// doi.org/10.1109/TMECH.2021.3058173 22. Maddern, W., Newman, P.: Real-time probabilistic fusion of sparse 3d lidar and dense stereo. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2181–2188 (2016). https://doi.org/10.1109/IROS.2016. 7759342 23. Makansi, O., Ilg, E., Cicek, Z., Brox, T.: Overcoming limitations of mixture density networks: a sampling and fitting framework for multimodal future prediction. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7137–7146 (2019). https://doi.org/10.1109/CVPR.2019.00731 24. Mayer, N., Ilg, E., H¨ ausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4040–4048 (2016). https://doi.org/10.1109/CVPR.2016. 438 25. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3061–3070 (2015). https://doi.org/10.1109/CVPR.2015.7298925 26. Nickels, K., Castano, A., Cianci, C.: Fusion of lidar and stereo range for mobile robots. In: International Conference on Advanced Robotics (2003) 27. Pang, J., Sun, W., Ren, J.S., Yang, C., Yan, Q.: Cascade residual learning: a twostage convolutional neural network for stereo matching. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 878–886 (2017). https://doi.org/10.1109/ICCVW.2017.108 28. Park, K., Kim, S., Sohn, K.: High-precision depth estimation with the 3d lidar and stereo fusion. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 2156–2163 (2018). https://doi.org/10.1109/ICRA.2018.8461048 29. Poggi, M., Pallotti, D., Tosi, F., Mattoccia, S.: Guided stereo matching. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 979–988 (2019). https://doi.org/10.1109/CVPR.2019.00107 30. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4 28


31. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv 2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018). https://doi.org/10. 1109/CVPR.2018.00474 32. Scharstein, D., Hirschm¨ uller, H., Kitajima, Y., Krathwohl, G., Neˇsi´c, N., Wang, X., Westling, P.: High-resolution stereo datasets with subpixel-accurate ground truth. In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 31–42. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11752-2 3 33. Sch¨ ops, T., Sch¨ onberger, J.L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., Geiger, A.: A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2538–2547 (2017). https://doi.org/10.1109/CVPR.2017. 272 34. Shivakumar, S.S., Mohta, K., Pfrommer, B., Kumar, V., Taylor, C.J.: Real time dense depth estimation by fusing stereo with sparse depth measurements. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 6482–6488 (2019). https://doi.org/10.1109/ICRA.2019.8794023 35. Tosi, F., Liao, Y., Schmitt, C., Geiger, A.: Smd-nets: stereo mixture density networks. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8938–8948 (2021). https://doi.org/10.1109/CVPR46437.2021. 00883 36. Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant CNNs. In: 2017 International Conference on 3D Vision (3DV), pp. 11–20 (2017). https://doi.org/10.1109/3DV.2017.00012 37. Wang, T.H., Hu, H.N., Lin, C.H., Tsai, Y.H., Chiu, W.C., Sun, M.: 3d lidar and stereo fusion using stereo matching network with conditional cost volume normalization. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5895–5902 (2019). https://doi.org/10.1109/IROS40897.2019. 8968170 38. Wu, C.Y., Neumann, U.: Scene completeness-aware lidar depth completion for driving scenario. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2490–2494 (2021). https://doi.org/ 10.1109/ICASSP39728.2021.9414295 39. Xie, Y., et al.: A4lidartag: depth-based fiducial marker for extrinsic calibration of solid-state lidar and camera. IEEE Robot. Autom. Lett., 1 (2022). https://doi. org/10.1109/LRA.2022.3173033 40. Yang, G., Song, X., Huang, C., Deng, Z., Shi, J., Zhou, B.: Drivingstereo: a large-scale dataset for stereo matching in autonomous driving scenarios. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 899–908 (2019). https://doi.org/10.1109/CVPR.2019.00099 41. Zhang, J., Ramanagopal, M.S., Vasudevan, R., Johnson-Roberson, M.: Listereo: generate dense depth maps from lidar and stereo imagery. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 7829–7836 (2020). https://doi.org/10.1109/ICRA40945.2020.9196628 42. Zhu, Y., Zheng, C., Yuan, C., Huang, X., Hong, X.: Camvox: a low-cost and accurate lidar-assisted visual slam system. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 5049–5055 (2021). https://doi.org/10.1109/ ICRA48506.2021.9561149

Point Cloud Upsampling via Cascaded Refinement Network

Hang Du, Xuejun Yan, Jingjing Wang, Di Xie, and Shiliang Pu(B)
Hikvision Research Institute, Hangzhou, China
{duhang,yanxuejun,wangjingjing9,xiedi,pushiliang.hri}@hikvision.com

Abstract. Point cloud upsampling focuses on generating a dense, uniform and proximity-to-surface point set. Most previous approaches accomplish these objectives by carefully designing a single-stage network, which makes it still challenging to generate a high-fidelity point distribution. Instead, upsampling point cloud in a coarse-to-fine manner is a decent solution. However, existing coarse-to-fine upsampling methods require extra training strategies, which are complicated and time-consuming during the training. In this paper, we propose a simple yet effective cascaded refinement network, consisting of three generation stages that have the same network architecture but achieve different objectives. Specifically, the first two upsampling stages generate the dense but coarse points progressively, while the last refinement stage further adjusts the coarse points to a better position. To mitigate the learning conflicts between multiple stages and decrease the difficulty of regressing new points, we encourage each stage to predict the point offsets with respect to the input shape. In this manner, the proposed cascaded refinement network can be easily optimized without extra learning strategies. Moreover, we design a transformer-based feature extraction module to learn the informative global and local shape context. In inference phase, we can dynamically adjust the model efficiency and effectiveness, depending on the available computational resources. Extensive experiments on both synthetic and real-scanned datasets demonstrate that the proposed approach outperforms the existing state-of-the-art methods. The code is publicly available at https://github.com/hikvision-research/3DVision.

1 Introduction

Point clouds have been widely adopted in many 3D computer vision studies [13,15,18,19,28,37,38] in recent years. However, in real-world scenarios, the raw point clouds produced by 3D sensors are often sparse, noisy, and non-uniform, which has a negative impact on the performance of point cloud analysis and processing tasks. Therefore, in order to facilitate the downstream point cloud
(H. Du and X. Yan—Equal contribution.)

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_7.


Fig. 1. Upsampling results (×4) on real-scanned point clouds. The top line is the upsampled point clouds, and the bottom line shows the corresponding 3D surface reconstruction results using the ball-pivoting algorithm [2]. Compared with the recent state-of-the-art methods, we can generate high-fidelity point clouds with uniform distribution, resulting in a more smooth reconstruction surface.

tasks, it is necessary to upsample sparse point clouds to a dense, uniform and high-fidelity point set. In recent years, many deep learning-based methods have been proposed for point cloud upsampling. Compared with the traditional optimization-based methods [1,7,14,29], deep learning-based methods [11,20,21,34,35] are able to handle more complex geometric structures of 3D objects since they can effectively extract deep features from point clouds and learn to generate new points in a data-driven manner. Among them, as a pioneering point cloud upsampling method, Yu et al. [34] propose a classic upsampling framework, named PUNet, which develops a PointNet-based network to learn multi-scale point features and expand the number of points by multi-branch Multi-layer Perceptrons (MLP). Based on the framework of PU-Net, the recent advanced upsampling methods [11,16,20] have made remarkable progress by designing more effective point feature extractor and upsampling unit. However, these methods directly generate the dense and high-fidelity points via a single-stage network, which makes the network challenging to meet multiple objectives and cannot obtain an optimal result through one-stage structure (Fig. 1). Coarse-to-fine is a widely-used scheme in many point cloud generative models [26,30–32,37]. Certain methods [12,27] also have applied such scheme for point cloud upsampling. Specifically, MPU [27] contains a series of upsampling sub-networks that focus on different level of details. During the training, it needs


to extract the local patches from the previous stage and activate multiple upsampling units progressively, which is complicated and time-consuming, especially on a large upsampling rate. More recently, Li et al. [12] propose Dis-PU to disentangle the multiple objectives of upsampling task, which consists of a dense generator for coarse generation and a spatial refiner for high-quality generation. In order to make the results more reliable, Dis-PU utilizes a warm-up training strategy that gradually increases the learning rate of the spatial refiner, which requires a longer training epoch to ensure full convergence of the networks. In the view of these limitations, we argue that a better coarse-to-fine point upsampling framework should be flexible and easily optimized during the training. To this end, we propose to accomplish coarse-to-fine point cloud upsampling via three cascaded generation stages, each of which has the same network architecture but focuses on different purposes. The overview of the proposed network is shown in Fig. 2. Among each generation stage, we leverage the local self-attention mechanism by designing a transformer-based feature extraction module that can learn both the global and local shape context. As for feature expansion, we employ a two-branch based module to exploit the geometric information and learn shape variations rather than directly duplicating the initial point-wise features. In addition, the learning conflicts between multiple stages is a major challenge to the coarse-to-fine framework. To tackle this problem, we simply adopt a residual learning scheme for point coordinate reconstruction, which predicts the offset of each point and further adjusts the initial point position. In this manner, the entire cascaded refinement network enables to be optimized without extra training strategies. Resorting to the above practices, we can build a flexible and effective point cloud upsampling framework. Compared with the existing methods, the proposed framework can obtain significant performance improvements. According to the available computational resources, the feed-forward process of refinement stage is optional. Thus, we can dynamically adjust the model efficiency and effectiveness in inference phase. Extensive experiments on both synthetic and real-scanned datasets demonstrate the effectiveness of our proposed method. The contributions of this paper are summarized as follows: – We propose a cascaded refinement network for point cloud upsampling. The proposed network can be easily optimized without extra training strategies. Besides, our method has better scalability to dynamically adjust the model efficiency and effectiveness in inference phase. – We adopt the residual learning scheme in both point upsampling and refinement stages, and develop a transformer-based feature extraction module to learn both global and local shape context. – We conduct comprehensive experiments on both synthetic and real-scanned point cloud datasets, and achieve the leading performance among the stateof-the-art methods.

2 Related Work

Our work is related to point cloud upsampling and to networks for point cloud processing. In this section, we provide a brief review of recent advances in these areas.

2.1 Point Cloud Upsampling

Existing point cloud upsampling methods can be roughly divided into traditional optimization-based approaches [1,7,14,29] and the deep learning-based approaches [11,12,16,20–22,27,33–35,39]. The former generally requires the geometric prior information of point clouds (e.g., edges and normal), and achieves poor results on the complex shapes. In contrast, the deep learning-based approaches have been prevailing and dominated the recent state-of-the-arts. Particularly, PU-Net [34] proposes a classic learning-based upsampling framework that involves three components, including feature extraction, feature expansion (upsampling unit), and point coordinate reconstruction. Later on, many works [11,16,20–22,39] follow this classic framework and focus on improving it from many aspects, such as feature extractor, upsampling scheme, training supervision. For example, PU-GAN [11] introduces a upsampling adversarial network with a uniform loss to ensure the point uniformity, and PU-GCN [20] leverages graph convolutional networks for both feature extraction and expansion. The above methods employ a single-stage network to accomplish the multiple objectives of point cloud upsampling. However, it is challenging to achieve all the objectives at the same time, and coarse-to-fine manner is a more appropriate solution. To this end, MPU [27] and Dis-PU [12] propose to divide the upsampling process into multiple steps, however, they require progressive training or warm-up strategy to ensure a better optimization, which is complicated and time-consuming during the training. In this work, instead of using extra training strategies, we develop a more flexible and effective coarse-to-fine framework. 2.2

Point Cloud Processing

There are three major categories of point cloud processing networks, including projection-based, voxel-based, and point-based networks. Since the point clouds have no spatial order and regular structure, projection-based methods [9,24] and voxel-based methods [5,31,40] transform irregular point cloud to 2D pixel or 3D voxel, and then apply 2D/3D convolution to extract the regular representations. In contrast, point-based networks are designed to process the irregular point cloud directly, which is able to avoid the loss of shape context during the projection or voxelization. Among them, some methods [18,19] employ MLPbased structure to extract point-wise features, and others [10,28] utilize graphbased convolutional networks to aggregate the point neighbourhood information. Besides, certain methods [13,23] develop continuous convolutions that can be directly applied to the point cloud. More recently, the success of vision transformer has inspired many 3D point cloud processing approaches [3,6,17,36,38].


Fig. 2. (a) Overview of our cascaded refinement network for point cloud upsampling, which consists of two upsampling stages and one refinement stage. (b) For each upsampling or refinement stage, the detailed network architecture contains a transformer-based feature extraction module, a two-branch feature expansion module, and a coordinate reconstruction module.

As far as we can see, the transformer-based structure is under-explored in point cloud upsampling. Thus, based on the architecture of point transformer [38], we build a transformer-based feature extraction module in our upsampling network.

3 Proposed Method

Given a sparse point cloud set P = {p_i}_{i=1}^{N}, where N is the number of points and p_i denotes 3D coordinates, the objective of point cloud upsampling is to generate a dense and high-fidelity point cloud set S = {s_i}_{i=1}^{rN}, where r is the upsampling rate. Figure 2 shows the overview of the proposed cascaded refinement network, where the first two upsampling stages generate a coarse dense point cloud set progressively and the last refinement stage serves as a refiner to adjust the coarse points to better positions. During training, we train the entire framework in an end-to-end manner and adopt three Chamfer Distance (CD) losses to constrain the stages simultaneously. In the following sections, we elaborate the detailed network architecture of our framework and the training loss function.

3.1 Network Architecture

As shown in the bottom of Fig. 2, the network architecture of upsampling or refinement stage follows the commonly-used pipeline, which consists of a feature extraction module, a feature expansion module and a coordinate reconstruction


Fig. 3. The detailed network architecture of the transformer-based feature extraction module. The attention map is calculated locally within the k-nearest neighbors of the current query. We omit the position encoding in the value vectors and the attention map for clarity.

module. In the following, we introduce each module in detail.

Feature Extraction. Given an input point cloud P = {p_i}_{i=1}^{N}, the feature extraction module aims to encode point features F = {f_i}_{i=1}^{N} with C channel dimensions. Most existing point cloud upsampling methods [11,12,34] fulfill this objective using an MLP-based structure. However, the local details are limited by the insufficient feature representation capability of MLPs. To address this problem, some upsampling approaches [10,20,28] employ a GCN-based structure to capture local neighborhood information for point feature extraction. In view of the success of transformer-based networks [6,17,38] in 3D computer vision, we explore a transformer-based feature extraction module for point cloud upsampling. Figure 3 illustrates the details of our proposed network architecture. Based on the network architecture in [19], we first extract point-wise features from the input point cloud via a set of MLPs, and then apply a max pooling operation to obtain the global features. The global features are subsequently duplicated N times and concatenated with the initial features. After that, a point transformer layer [38] is utilized to refine the local shape context and obtain the output point features. By applying self-attention locally, the learned features incorporate the neighborhood information between point feature vectors. Finally, the encoded point features F = {f_i}_{i=1}^{N}, f_i ∈ R^C, are added to the input features through a residual connection. The local self-attention operation can be formulated as

f_i = x_i ⊕ Σ_{j ∈ N(i)} softmax(q_i − k_j) ⊙ v_j,   (1)

where x_i is the i-th input point feature vector, q_i is the corresponding query vector, k_j is the j-th key vector from the k-nearest neighbors of x_i, and v_j is the j-th value vector. The attention weights are calculated between the query and its k-nearest neighbors by the softmax function. Note that we omit the position encoding in Eq. 1; one can refer to [38] for more details of this part. In this way, we are able to encode the point features by combining local and global information.
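The sketch below illustrates the local self-attention of Eq. 1 with vector attention over query–key differences, as in the point transformer [38]. The position encoding is omitted as in the text; the k value, layer sizes, and the use of torch.cdist for neighbor search are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalPointAttention(nn.Module):
    """Local self-attention over the k nearest neighbors of each point (Eq. 1)."""

    def __init__(self, channels, k=16):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)
        self.attn_mlp = nn.Sequential(nn.Linear(channels, channels), nn.ReLU(True),
                                      nn.Linear(channels, channels))

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates, feats: (B, N, C) input point features x_i.
        B, N, C = feats.shape
        dist = torch.cdist(xyz, xyz)                             # (B, N, N) pairwise distances
        idx = dist.topk(self.k, dim=-1, largest=False).indices   # (B, N, k) neighbor indices

        q = self.to_q(feats)                                     # (B, N, C)
        k = self.to_k(feats)
        v = self.to_v(feats)
        batch = torch.arange(B, device=feats.device).view(B, 1, 1)
        k_nbr = k[batch, idx]                                    # (B, N, k, C) neighbor keys
        v_nbr = v[batch, idx]                                    # (B, N, k, C) neighbor values

        # Vector attention on query-key differences, normalized over the k neighbors.
        weights = F.softmax(self.attn_mlp(q.unsqueeze(2) - k_nbr), dim=2)
        out = (weights * v_nbr).sum(dim=2)                       # aggregate neighbor values

        return feats + out                                       # residual connection (the ⊕ in Eq. 1)
```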


Feature Expansion. Subsequently, the extracted point features are fed into a feature expansion module that produces the expanded point features F_up = {f_i}_{i=1}^{rN}, where r is the upsampling rate. Most previous upsampling approaches apply a duplication-based [11,12,27] or shuffle [20] operation to increase the number of input point features. In this work, we employ a two-branch scheme for feature expansion, which combines the advantages of duplication-based [11,27] and learning-based methods [8,30,31]. Specifically, we first duplicate the input features F = {f_i}_{i=1}^{N} with r copies. Meanwhile, a transposed convolution branch is used for feature expansion, learning new point features in a learnable manner. Several methods [8,30] have discussed the advantages of learning-based feature interpolation for generative models, which can exploit more local geometric structure information with respect to the input shape. Finally, we concatenate the point features from the two branches and feed them to a set of MLPs which produces the expanded features as follows:

F_up = MLP(Concat[Dup(F, r); Deconv(F)]).   (2)
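A minimal sketch of the two-branch expansion in Eq. 2, treating point features as a (B, C, N) tensor: the duplication branch copies each feature r times, while a 1D transposed convolution provides the learned interpolation. Layer sizes and module names are assumptions, not the exact configuration.

```python
import torch
import torch.nn as nn

class FeatureExpansion(nn.Module):
    """Two-branch feature expansion: F_up = MLP(Concat[Dup(F, r); Deconv(F)]) (Eq. 2)."""

    def __init__(self, channels, r):
        super().__init__()
        self.r = r
        # Learnable branch: a 1D transposed convolution expands N features to r*N features.
        self.deconv = nn.ConvTranspose1d(channels, channels, kernel_size=r, stride=r)
        self.mlp = nn.Sequential(nn.Conv1d(2 * channels, channels, 1), nn.ReLU(True),
                                 nn.Conv1d(channels, channels, 1))

    def forward(self, feats):
        # feats: (B, C, N) per-point features.
        dup = feats.repeat_interleave(self.r, dim=2)     # duplication branch, (B, C, r*N)
        learned = self.deconv(feats)                     # learned interpolation branch, (B, C, r*N)
        return self.mlp(torch.cat([dup, learned], dim=1))
```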

The duplication-based branch is able to preserve the initial shape, and the learnable transposed convolution branch provides local geometric variations on the input point cloud. Their combination produces more expressive upsampled features for the subsequent coordinate reconstruction.

Coordinate Reconstruction. The objective of coordinate reconstruction is to generate a new point set S = {s_i}_{i=1}^{rN} from the expanded feature vectors F_up = {f_i}_{i=1}^{rN}. To accomplish this, a common way is to regress the 3D point coordinates directly. However, it is difficult to generate high-fidelity points from the latent space without noise [12]. To address this, several methods [12,27] apply a residual learning strategy that predicts the offset ΔP of each point to adjust the initial position, which can be formulated as

P_output = ΔP + Dup(P_input, r),   (3)

where r is the upsampling rate. Moreover, residual learning is a simple yet effective scheme to mitigate the learning conflicts between multiple stages in a coarse-to-fine framework. We consider the advantages of this scheme to be twofold: (i) learning per-point offsets greatly decreases the learning difficulty of a cascaded network and mitigates the conflicts between multiple stages, which results in a stable and better optimization; (ii) the residual connection preserves the geometric information of the initial shape while providing variations for generating new points. Hence, we adopt the residual learning scheme for coordinate reconstruction, using two MLP layers to generate the per-point offset ΔP from F_up (ΔP = MLP(F_up)), which is then added to the r-times duplicated input points. In this manner, the entire cascaded refinement network can be easily optimized together without extra training strategies (e.g., progressive or warm-up training).
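The residual coordinate reconstruction of Eq. 3 can be sketched as a small offset head on top of the expanded features; with r = 1 the same module serves the refinement stage. Channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoordinateReconstruction(nn.Module):
    """Residual coordinate reconstruction: P_out = ΔP + Dup(P_in, r) (Eq. 3)."""

    def __init__(self, channels, r):
        super().__init__()
        self.r = r
        # Two MLP (1x1 conv) layers regress a 3D offset for every expanded feature.
        self.offset_head = nn.Sequential(nn.Conv1d(channels, 64, 1), nn.ReLU(True),
                                         nn.Conv1d(64, 3, 1))

    def forward(self, points, expanded_feats):
        # points: (B, 3, N) input coordinates, expanded_feats: (B, C, r*N).
        delta = self.offset_head(expanded_feats)        # ΔP, (B, 3, r*N)
        dup = points.repeat_interleave(self.r, dim=2)   # Dup(P_in, r)
        return dup + delta                              # upsampled / refined points
```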


3.2 Compared with Previous Coarse-to-Fine Approaches

The coarse-to-fine scheme has been studied in the previous point cloud upsampling methods [12,27]. Nonetheless, the proposed approach is more efficient and flexible during the training and inference. Specifically, MPU [27] needs to extract the local training patches from the previous stage, and activates the training of each upsampling stage progressively. Dis-PU [12] adopts a warm-up training strategy to control the importance of dense generator and spatial refiner in different training stage. Both of them are complex during the training and cost a longer time to ensure the full convergence of the networks. Moreover, the complicated training strategies rely on the experience of hyper-parameters tuning, which may be sensitive on the new datasets. In contrast, our advantages are two folds. Firstly, the proposed cascaded refinement network can be easily trained without progressive or warm-up training strategies, which enables to require less training consumption. Secondly, our framework is more flexible and scalable. We can dynamically adjust the model efficiency and effectiveness at the stage of inference. For example, we can pursue a faster inference time only using two-stage upsampling structure, and still achieve a leading performance among other competitors (Table 3). Therefore, the proposed coarse-to-fine upsampling framework can be more effective and practical in real-world applications. 3.3

Training Loss Function

During the training, our approach is optimized in an end-to-end manner. To encourage the upsampled points to distribute over the target surface, we employ the Chamfer Distance (CD) [4] L_CD as the training loss function, which is formulated as

L_CD(P, Q) = (1/|P|) Σ_{p∈P} min_{q∈Q} ‖p − q‖₂² + (1/|Q|) Σ_{q∈Q} min_{p∈P} ‖p − q‖₂²,   (4)

where P and Q denote the generated point cloud and its corresponding ground truth, respectively. The CD loss computes the average closest-point distance between the two point sets. In order to explicitly control the generation process of point clouds, we employ the CD loss on all three generation stages simultaneously. The total training loss L_total is therefore defined as

L_total = L_CD(P_1, Q) + L_CD(P_2, Q) + L_CD(P_2′, Q),   (5)

where P_1, P_2 and P_2′ are the output point clouds of the three generation stages, respectively. We validate the effectiveness of supervising all three stages in the experiments (please refer to the supplementary materials); the results show that simultaneous supervision of the three generation stages leads to better performance, so we adopt the CD loss on all three stages.
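A direct PyTorch realization of Eqs. 4 and 5 is sketched below. It computes the O(N·M) pairwise distances explicitly for clarity; practical implementations typically rely on a CUDA Chamfer Distance kernel. Function names are our own.

```python
import torch

def chamfer_distance(p, q):
    """Squared-distance Chamfer Distance between point sets (Eq. 4).

    p: (B, N, 3) generated points, q: (B, M, 3) ground-truth points.
    """
    d = torch.cdist(p, q) ** 2                    # (B, N, M) pairwise squared distances
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)  # (B,)

def total_upsampling_loss(p1, p2, p2_refined, q):
    """Sum of CD losses over the two upsampling stages and the refinement stage (Eq. 5)."""
    return (chamfer_distance(p1, q) + chamfer_distance(p2, q)
            + chamfer_distance(p2_refined, q)).mean()
```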

4 Experiments

In this section, we conduct extensive experiments to verify the effectiveness of the proposed point upsampling network. The section is organized as follows. Section 4.1 introduces the detailed experimental settings. Section 4.2 demonstrates the improvements obtained by our approach on two synthetic datasets. Section 4.3 shows qualitative comparisons on real-scanned data. Section 4.4 provides a model complexity analysis against other counterparts. Section 4.5 studies the effect of the major components of our approach.

4.1 Experimental Settings

Datasets. We employ two point cloud upsampling datasets, PU-GAN [11] and PU1K [20]. The PU-GAN dataset contains 24,000 training patches collected from 120 3D models, and 27 3D models for testing. In contrast, PU1K is a large-scale point cloud upsampling dataset newly released by PU-GCN [20]; it consists of 69,000 training patches collected from 1,020 3D models, and 127 3D models for testing. In addition, we utilize a real-scanned point cloud dataset, ScanObjectNN [25], for qualitative evaluation.

Training and Evaluation. Our models are trained for 100 epochs. The batch size is set to 64 for PU1K and 32 for PU-GAN. The learning rate starts at 0.001 and drops by a decay rate of 0.7 every 50k iterations. The ground truth of each training patch contains 1,024 points and the input contains 256 points randomly sampled from the ground truth. In practice, we set the upsampling rate r = 2 in each upsampling stage and r = 1 in the refinement stage to achieve ×4 upsampling. More detailed settings of our network can be found in the supplementary material. For inference, we follow the common settings [11,20] and divide the input point cloud into multiple patches based on seed points. The patches are then upsampled r times, and the FPS algorithm is used to combine all the upsampled patches into the output point cloud. For quantitative evaluation, we apply three commonly used metrics: Chamfer Distance (CD), Hausdorff Distance (HD), and Point-to-Surface Distance (P2F). Smaller values of these metrics denote better performance.

Comparison Methods. In the following, we compare the proposed method with six existing point cloud upsampling methods: PU-Net [34], MPU [27], PU-GAN [11], Dis-PU [12], PU-GCN [20], and PU-EVA [16]. For a fair comparison, we reproduce these methods with their officially released code and recommended settings in the same experimental environment.

4.2 Results on Synthetic Dataset

PU1K Dataset. As shown in Table 1, we conduct experiments on three different input sizes of point, including sparse (512), medium (1,024), and dense (2,048). From the results, we can observe the proposed method can achieve


Table 1. Quantitative comparison (×4 upsampling) on the PU1K dataset with different input sizes of point cloud. The values of CD, HD, and P2F are multiplied by 10^3. A smaller value denotes a better performance.

Methods | Sparse (512) input CD / HD / P2F | Medium (1,024) input CD / HD / P2F | Dense (2,048) input CD / HD / P2F
PU-Net [34] | 2.990 / 35.403 / 11.272 | 1.920 / 24.181 / 7.468 | 1.157 / 15.297 / 4.924
MPU [27] | 2.727 / 30.471 / 8.190 | 1.268 / 16.088 / 4.777 | 0.861 / 11.799 / 3.181
PU-GAN [11] | 2.089 / 22.716 / 6.957 | 1.151 / 14.781 / 4.490 | 0.661 / 9.238 / 2.892
Dis-PU [12] | 2.130 / 25.471 / 7.299 | 1.210 / 16.518 / 4.606 | 0.731 / 9.505 / 2.719
PU-GCN [20] | 1.975 / 22.527 / 6.338 | 1.142 / 14.565 / 4.082 | 0.635 / 9.152 / 2.590
PU-EVA [16] | 1.942 / 20.980 / 6.366 | 1.065 / 13.376 / 4.172 | 0.649 / 8.870 / 2.715
Ours | 1.594 / 17.733 / 4.892 | 0.808 / 10.750 / 3.061 | 0.471 / 7.123 / 1.925

Table 2. Quantitative comparison on the PU-GAN [11] dataset. The input size of point cloud is 1,024. The values of CD, HD, and P2F are multiplied by 10^3.

Methods | ×4 Upsampling CD / HD / P2F | ×16 Upsampling CD / HD / P2F
PU-Net [34] | 0.883 / 7.132 / 8.127 | 0.845 / 11.225 / 10.114
MPU [27] | 0.589 / 6.206 / 4.568 | 0.365 / 8.113 / 5.181
PU-GAN [11] | 0.566 / 6.932 / 4.281 | 0.390 / 8.920 / 4.988
Dis-PU [12] | 0.527 / 5.706 / 3.378 | 0.302 / 6.939 / 4.146
PU-GCN [20] | 0.584 / 5.257 / 3.987 | 0.320 / 6.567 / 4.381
PU-EVA [16] | 0.571 / 5.840 / 3.937 | 0.342 / 8.140 / 4.473
Ours | 0.520 / 6.102 / 3.165 | 0.284 / 7.143 / 3.724

the leading performance in all metrics and obvious improvements on the sparse input. Besides, compared the results of left three columns (sparse input) with the right three columns (dense input), we can observe that the sparse input size is a more challenging scenario. Nonetheless, our method can consistently yield evident improvements over the existing state-of-the-art methods, which verifies the robustness of our method to different input sizes. Moreover, the qualitative results in Fig. 4 can also demonstrate that our cascaded refinement network enables to generate more uniform point clouds with better fine-grained details and less outliers, such as the arms of the chair and the wings of the plane. PU-GAN Dataset. Following the previous settings [12], we also conduct experiments under two different upsampling rates at the input size of 1,024 on PUGAN dataset. Table 2 reports the quantitative results based on ×4 and ×16 upsampling rates. Since PU-GAN dataset only provides training patches based on ×4 upsampling rate, we apply the model twice for accomplishing ×16 upsampling rates in this experiment. Overall, our method can achieve the best performance in most metrics. For Hausdorff Distance metric, some counterparts slightly outperform our method. We consider the most possible reason is that


Fig. 4. Point cloud upsampling (×4) results on PU1K dataset. The point clouds are colored by the nearest distance between the ground truth and the prediction of each method. The blue denotes the small errors. One can zoom in for more details. (Color figure online)

our transformer-based feature extraction module requires more variation during training. Compared with the 1,020 3D meshes in PU1K, PU-GAN is a much smaller dataset that contains only 120 meshes for training, which limits the capacity of the proposed method. Nonetheless, for Chamfer Distance and Point-to-Surface Distance, our method gains evident improvement over the other counterparts.

4.3 Results on Real-Scanned Dataset

To verify the effectiveness of our method in real-world scenarios, we utilize the models trained on PU1K and conduct the qualitative experiments on ScanObjectNN [25] dataset. We only visualize the input sparse points and upsampled results, since ScanObjectNN dataset does not provide the corresponding dense points. For clarity, we only choose three recent advanced methods (Dis-PU [12], PU-GCN [20], and PU-EVA [16]) for visualization. As shown in Fig. 5, we can observe upsampling the real-scanned data is more challenging, since the input sparse points are non-uniform and incomplete. Comparing with other counterparts, our method can generate more uniform and high-fidelity point clouds. The


Fig. 5. Point cloud upsampling (×4 and ×16) results on real-scanned sparse inputs. Compared with the other counterparts, our method generates more uniform points with detailed structures. The 3D meshes are reconstructed by the ball-pivoting algorithm [2].

visualization of the 3D meshes also shows that the proposed method obtains better surface reconstruction results. More visualization results on real-scanned data can be found in the supplementary material.

4.4 Model Complexity Analysis

In this section, we investigate the model complexity of our method compared with existing point cloud upsampling methods. The experiments are conducted on a Titan X Pascal GPU. As shown in Table 3, we can find that the proposed method is comparable with other counterparts. In point of training speed, our method is more efficiency than PU-GAN [11], Dis-PU [12], and PU-EVA [16]. Note that Dis-PU [12] employs a warm-up strategy during the training and requires more epoch to ensure the convergence of networks. Thus, the total training time of Dis-PU costs about 48 h on PU-GAN dataset, which is much longer than ours (8 h). As for inference time, we obtain a comparable and affordable inference speed compared with existing methods. In terms of model parameters, the proposed method can also be regarded as a lightweight network for point cloud upsampling. In addition, our method is scalable to achieve a tradeoff between model effectiveness and efficiency by removing the refinement stage during the inference. In this manner, we can achieve leading performance as well as the model efficiency among other counterparts (the first bottom line). Overall, our cascaded refinement network can not only obtain the obvious performance improvements but also achieve comparable model complexity with other counterparts.


Table 3. Comparison of model complexity with existing point cloud upsampling methods. The metrics of CD, HD, and P2F are calculated on the PU1K dataset.

Methods             CD     HD      P2F    Training Speed (s/batch)  Inference Speed (ms/sample)  Params. (Kb)
PU-Net [34]         1.157  15.297  4.924  0.13                      9.9                          812
MPU [27]            0.861  11.799  3.181  0.23                      8.8                          76
PU-GAN [11]         0.661  9.238   2.892  0.39                      13.1                         542
PU-GCN [20]         0.635  9.152   2.590  0.13                      8.6                          76
Dis-PU [12]         0.731  9.505   2.719  0.58                      11.2                         1047
PU-EVA [16]         0.649  8.870   2.715  0.57                      12.8                         2869
Ours (w/o refiner)  0.492  7.337   1.984  0.37                      7.3                          567
Ours (full model)   0.471  7.123   1.925  0.37                      10.8                         847

4.5 Ablation Study

To verify the effect of the major components in our approach, we quantitatively evaluate their contributions and provide detailed discussions below. Ablation of Generation Stages. Our full cascaded refinement network consists of two upsampling stages and one refinement stage. To analyze the effectiveness of each sub-network, we investigate the number of generation stages. As shown in Table 4, "two-stage" refers to removing the refinement stage, and "four-stage" denotes adding an extra upsampling stage. All the models are trained from scratch; thus, "two-stage" is different from merely removing the refiner at the inference phase. We can find that the performance consistently increases with the number of generation stages. Our "two-stage" model still outperforms the existing SOTA methods with a faster inference speed. Considering the balance between effectiveness and efficiency, we adopt two upsampling stages and one refinement stage in this work. Effect of Feature Extraction Module. The feature extraction module plays an important role in point cloud upsampling. As shown in Table 5, we investigate the effect of different network architectures for feature extraction. Specifically, "MLP-based" denotes the same feature extraction unit as in MPU [27], and "DenseGCN" refers to replacing the point transformer layer with a DenseGCN block from [10]. From the results, we can find that the proposed transformer-based feature extraction module outperforms the other counterparts, indicating the effectiveness of its local self-attention operator for shape context learning. Moreover, the comparison of the DenseGCN and MLP-based networks shows that an effective feature extraction network can bring obvious improvements to point cloud upsampling. Study of Residual Learning. For a coarse-to-fine upsampling framework, it is crucial to mitigate the learning conflicts between multiple generation units. As mentioned in Sect. 3.2, MPU needs to progressively activate the training of


Table 4. Ablation of generation stages. The values of CD, HD, and P2F are multiplied by 10³.

Model        Dense (2,048) input        Inference Speed (ms/sample)  Params. (Kb)
             CD     HD     P2F
Two-stage    0.513  7.808  2.078        7.3                          567
Three-stage  0.471  7.123  1.925        10.8                         847
Four-stage   0.466  6.793  1.821        14.7                         1127

Table 5. Effect of feature extraction module.

Model       Dense (2,048) input
            CD     HD     P2F
MLP-based   0.628  9.687  2.721
DenseGCN    0.531  8.462  2.254
Ours        0.471  7.123  1.925

Table 6. Effect of residual learning scheme.

Warm-up  Residual learning  Dense (2,048) input
                            CD     HD     P2F
✓        –                  0.511  7.599  2.986
✓        ✓                  0.472  7.329  1.933
–        –                  0.733  9.667  3.380
–        ✓                  0.471  7.123  1.925

each upsampling unit, and Dis-PU employs a warm-up strategy that gradually increases the learning rate of the spatial refiner. In contrast, we only adopt residual learning in each generation stage, without a progressive or warm-up training strategy. From the results in Table 6, we can find that residual learning is more effective than the warm-up training strategy in our framework. The reason is that the residual connection makes each stage regress only the point offsets relative to the initial shape, which mitigates the learning conflicts between multiple stages and results in a better optimization. Therefore, we equip the point reconstruction with residual learning, which allows the proposed method to be easily optimized without extra training strategies.
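The residual learning scheme discussed above can be sketched as follows; this is a simplified illustration under assumed shapes, with the feature extractor and expansion modules left as placeholders rather than the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ResidualUpsampleStage(nn.Module):
    """One generation stage with residual point regression: the stage only
    predicts per-point offsets on top of a duplicated (coarse) initial shape
    instead of regressing absolute coordinates."""

    def __init__(self, feat_extractor, feat_expansion, ratio=2):
        super().__init__()
        self.feat_extractor = feat_extractor   # e.g. transformer-based module
        self.feat_expansion = feat_expansion   # e.g. two-branch expansion, outputs (B, 128, r*N)
        self.offset_head = nn.Sequential(
            nn.Conv1d(128, 64, 1), nn.ReLU(inplace=True), nn.Conv1d(64, 3, 1))
        self.ratio = ratio

    def forward(self, xyz):                    # xyz: (B, 3, N)
        feats = self.feat_extractor(xyz)       # (B, C, N)
        up_feats = self.feat_expansion(feats)  # (B, 128, r*N)
        # Initial coarse shape: duplicate each input point r times.
        coarse = xyz.repeat_interleave(self.ratio, dim=2)  # (B, 3, r*N)
        offsets = self.offset_head(up_feats)   # (B, 3, r*N)
        return coarse + offsets                # residual connection
```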

5 Conclusion

In this paper, we propose a cascaded refinement network for point cloud upsampling. The proposed method consists of three generation stages, which progressively generate coarse dense points and finally refine them to better positions. We adopt a residual learning scheme for point coordinate reconstruction, which reduces the difficulty of regressing the new points and mitigates the conflicts between multiple stages. Moreover, a transformer-based feature extraction module is designed to aggregate the global and local shape context, and a two-branch feature expansion module enables learning more expressive upsampled features. Compared with existing coarse-to-fine point upsampling methods, the proposed network can be easily optimized without extra training strategies. Moreover, we can dynamically adjust the trade-off between model efficiency and effectiveness at inference time, depending on the available computational resources. Both quantitative and qualitative results demonstrate the superiority of our method over the state-of-the-art methods.


References 1. Alexa, M., Behr, J., Cohen-Or, D., Fleishman, S., Levin, D., Silva, C.T.: Computing and rendering point set surfaces. IEEE Trans. Vis. Comput. Graph. 9, 3–15 (2003) 2. Bernardini, F., Mittleman, J., Rushmeier, H., Silva, C., Taubin, G.: The ballpivoting algorithm for surface reconstruction. IEEE Trans. Visual Comput. Graph. 5(4), 349–359 (1999) 3. Engel, N., Belagiannis, V., Dietmayer, K.C.J.: Point transformer. IEEE Access 9, 134826–134840 (2021) 4. Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3d object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613 (2017) 5. Graham, B., Engelcke, M., van der Maaten, L.: 3d semantic segmentation with submanifold sparse convolutional networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018) 6. Guo, M.H., Cai, J., Liu, Z.N., Mu, T.J., Martin, R.R., Hu, S.: Pct: point cloud transformer. Comput. Vis. Media 7, 187–199 (2021) 7. Huang, H., Wu, S., Gong, M., Cohen-Or, D., Ascher, U.M., Zhang, H.: Edge-aware point set resampling. ACM Trans. Graph. (TOG) 32, 1–12 (2013) 8. Hui, L., Xu, R., Xie, J., Qian, J., Yang, J.: Progressive point cloud deconvolution generation network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12360, pp. 397–413. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58555-6_24 9. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12689–12697 (2019) 10. Li, G., Müller, M., Thabet, A.K., Ghanem, B.: Deepgcns: can GCNs go as deep as CNNs? In: ICCV, pp. 9266–9275 (2019) 11. Li, R., Li, X., Fu, C.W., Cohen-Or, D., Heng, P.A.: Pu-gan: a point cloud upsampling adversarial network. In: ICCV, pp. 7202–7211 (2019) 12. Li, R., Li, X., Heng, P.A., Fu, C.W.: Point cloud upsampling via disentangled refinement. In: CVPR, pp. 344–353 (2021) 13. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: Pointcnn: convolution on xtransformed points. In: NeurIPS (2018) 14. Lipman, Y., Cohen-Or, D., Levin, D., Tal-Ezer, H.: Parameterization-free projection for geometry reconstruction. In: SIGGRAPH 2007 (2007) 15. Long, C., Zhang, W., Li, R., Wang, H., Dong, Z., Yang, B.: Pc2-pu: patch correlation and position correction for effective point cloud upsampling. ArXiv abs/2109.09337 (2021) 16. Luo, L., Tang, L., Zhou, W., Wang, S., Yang, Z.X.: Pu-eva: an edge-vector based approximation solution for flexible-scale point cloud upsampling. In: ICCV (2021) 17. Pan, X., Xia, Z., Song, S., Li, L.E., Huang, G.: 3d object detection with pointformer. In: CVPR, pp. 7459–7468 (2021) 18. Qi, C., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: CVPR, pp. 77–85 (2017) 19. Qi, C., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NIPS (2017)


20. Qian, G., Abualshour, A., Li, G., Thabet, A.K., Ghanem, B.: Pu-gcn: point cloud upsampling using graph convolutional networks. In: CVPR, pp. 11678–11687 (2021) 21. Qian, Y., Hou, J., Kwong, S., He, Y.: PUGeo-Net: a geometry-centric network for 3D point cloud upsampling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) ECCV 2020. LNCS, vol. 12364, pp. 752–769. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7_44 22. Qian, Y., Hou, J., Kwong, S.T.W., He, Y.: Deep magnification-flexible upsampling over 3d point clouds. IEEE Trans. Image Process. 30, 8354–8367 (2021) 23. Shi, S., Wang, X., Li, H.: Pointrcnn: 3d object proposal generation and detection from point cloud. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–779 (2019) 24. Tatarchenko, M., Park, J., Koltun, V., Zhou, Q.Y.: Tangent convolutions for dense prediction in 3d. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3887–3896 (2018) 25. Uy, M.A., Pham, Q.H., Hua, B.S., Nguyen, D.T., Yeung, S.K.: Revisiting point cloud classification: a new benchmark dataset and classification model on realworld data. In: ICCV, pp. 1588–1597 (2019) 26. Wang, X., Ang Jr, M.H., Lee, G.H.: Cascaded refinement network for point cloud completion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 790–799 (2020) 27. Wang, Y., Wu, S., Huang, H., Cohen-Or, D., Sorkine-Hornung, O.: Patch-based progressive 3d point set upsampling. In: CVPR, pp. 5951–5960 (2019) 28. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 38, 1–12 (2019) 29. WuShihao, H.: GongMinglun, ZwickerMatthias. Deep points consolidation. ACM Transactions on Graphics, Cohen-OrDaniel (2015) 30. Xiang, P., et al.: SnowflakeNet: point cloud completion by snowflake point deconvolution with skip-transformer. In: ICCV (2021) 31. Xie, H., Yao, H., Zhou, S., Mao, J., Zhang, S., Sun, W.: GRNet: gridding residual network for dense point cloud completion. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 365–381. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_21 32. Yan, X., et al.: FBNet: feedback network for point cloud completion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) ECCV 2022. LNCS, vol. 13662. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20086-1_39 33. Ye, S., Chen, D., Han, S., Wan, Z., Liao, J.: Meta-pu: an arbitrary-scale upsampling network for point cloud. IEEE Trans. Visualization Comput. Graph. (2021) 34. Yu, L., Li, X., Fu, C.W., Cohen-Or, D., Heng, P.: Pu-net: point cloud upsampling network. In: CVPR, pp. 2790–2799 (2018) 35. Yu, L., Li, X., Fu, C.-W., Cohen-Or, D., Heng, P.-A.: EC-Net: an edge-aware point set consolidation network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 398–414. Springer, Cham (2018). https:// doi.org/10.1007/978-3-030-01234-2_24 36. Yu, X., Rao, Y., Wang, Z., Liu, Z., Lu, J., Zhou, J.: Pointr: diverse point cloud completion with geometry-aware transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12498–12507 (2021) 37. Yuan, W., Khot, T., Held, D., Mertz, C., Hebert, M.: Pcn: point completion network. International Conference on 3D Vision (3DV), pp. 728–737 (2018)


38. Zhao, H., Jiang, L., Jia, J., Torr, P.H.S., Koltun, V.: Point transformer. In: ICCV (2021) 39. Zhao, Y., Hui, L., Xie, J.: Sspu-net: self-supervised point cloud upsampling via differentiable rendering. In: ACMMM (2021) 40. Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3d object detection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018)

CVLNet: Cross-view Semantic Correspondence Learning for Video-Based Camera Localization

Yujiao Shi¹, Xin Yu², Shan Wang¹, and Hongdong Li¹
¹ Australian National University, Canberra, Australia [email protected]
² University of Technology Sydney, Sydney, Australia

Abstract. This paper tackles the problem of Cross-view Video-based camera Localization (CVL). The task is to localize a query camera by leveraging information from its past observations, i.e., a continuous sequence of images observed at previous time stamps, and matching them to a large overhead-view satellite image. The critical challenge of this task is to learn a powerful global feature descriptor for the sequential ground-view images while considering its domain alignment with reference satellite images. For this purpose, we introduce CVLNet, which first projects the sequential ground-view images into an overhead view by exploring the ground-andoverhead geometric correspondences and then leverages the photo consistency among the projected images to form a global representation. In this way, the cross-view domain differences are bridged. Since the reference satellite images are usually pre-cropped and regularly sampled, there is always a misalignment between the query camera location and its matching satellite image center. Motivated by this, we propose estimating the query camera’s relative displacement to a satellite image before similarity matching. In this displacement estimation process, we also consider the uncertainty of the camera location. For example, a camera is unlikely to be on top of trees. To evaluate the performance of the proposed method, we collect satellite images from Google Map for the KITTI dataset and construct a new cross-view video-based localization benchmark dataset, KITTI-CVL. Extensive experiments have demonstrated the effectiveness of video-based localization over single image-based localization and the superiority of each proposed module over other alternatives.

1 Introduction

Cross-view image-based localization using ground-to-satellite image matching has attracted significant attention these days [1–11]. It has found many practical applications such as autonomous driving and robot navigation. Prior works have been focused on localizing omnidirectional ground-view images with a 360° Field-of-View (FoV), which helps to provide rich and discriminative features


for localization. However, when a regular forward-looking camera with a limited FoV is used, those omnidirectional camera-based algorithms suffer severe performance degradation. To tackle this challenge, this paper proposes to use a continuous short video, i.e., a sequence of ground-view images, as input for the task of visual localization. Specifically, we localize a camera at the current time stamp tn by augmenting it with previous observations at times t1 ∼ tn−1, as shown in Fig. 1. Compared to using a single query image, a short video provides richer visual and dynamic information about the current location.



Fig. 1. Single-frame image-based localization (a) Vs. Multi-frame video-based localization (b). The multi-frame video-based localization leverages richer scene context of a query place, increasing the discriminating power of query descriptors compared to single image-based localization. As a result, the matching satellite image, marked by green border, from the database is more likely to be retrieved. Red border indicates non-matching satellite images to the query image. (Color figure online)

We present a Cross-view Video-based Localization Network, named CVLNet, to address the camera localization problem. To the best of our knowledge, our CVLNet is the first vision-based, deep cross-view geo-localization framework that exploits a continuous video rather than a single image to pinpoint the camera location. Our CVLNet is composed of two branches that extract deep features from ground and satellite images, respectively. Considering the drastic viewpoint changes between the two views, we first introduce a Geometry-driven View Projection (GVP) module to transform ground-view features to the overhead view by explicitly exploring their geometric correspondences. Then, we design a Photo-consistency Constrained Sequence Fusion (PCSF) module to fuse the sequential features. Our PCSF first estimates the reliability of the sequential ground-view features in the overhead view by leveraging the photo-consistency across them and then aggregates them into a global query descriptor. In this manner, we achieve a more discriminative and reliable ground-view feature representation. Since satellite images in a database are usually pre-cropped and sampled at discretized locations, there would be a misalignment between a query camera location and its matching satellite image center. Furthermore, a query camera is unlikely to be in some regions (e.g., on top of a tree), while more likely in others (e.g., on a road). Hence, we propose a Scene-prior driven Similarity Matching (SSM) strategy to estimate the relative displacement between a query camera location


and a satellite image center while restricting the search space by scene priors. The scene priors are learned statistically from training rather than pre-defined. With the help of SSM, our CVLNet can eliminate unreasonable localization results. In order to train and evaluate our method, we curate a new cross-view dataset by collecting satellite images for the KITTI dataset [12] from Google Map [13]. The new dataset combines sequential ground-view images from the original KITTI dataset and the newly collected satellite images. To the best of our knowledge, it is not only the first cross-view video-based localization dataset, but also the first cross-view localization dataset where ground-view images are captured by a perspective pin-hole camera with a restricted FoV (rather than being cropped from Google street-view panoramas [1,8]). Extensive experiments on the newly collected dataset demonstrate that our method effectively localizes camera positions and outperforms the state-of-the-art remarkably.

2 Related Work

Image-Based Localization. The image-based localization problem is initially tackled as ground-to-ground image matching [14–19], where both the query and database images are captured at the ground level. However, those methods cannot localize query images when there is no corresponding reference image in the database. Thanks to the wide-spread coverage and easy accessibility of satellite imagery, recent works [1–11,20–29] resort to satellite images for city-scale localization. While recent works on city-scale ground-to-satellite localization have achieved promising results, they mostly focus on localizing isolated omnidirectional ground images. When the query camera has a limited FoV, we propose using a continuous video instead of a single image for camera localization, improving the discriminativeness of the query location representation. Video-Based Localization. The concept of video-based localization can be divided into three main categories: Visual Odometry (VO) [30–32], Visual SLAM (vSLAM) [33–38], and Visual Localization [39–45]. VO techniques can be classified according to their camera setup (monocular or stereoscopic) or their processing techniques (feature-based or appearance-based). VO methods usually use a combination of feature tracking and feature matching [46,47]. vSLAM pertains to simultaneously creating a map of features and localizing the robot in that map, all using visual information [48,49]. Often, the map is pre-built and the robot needs to localize itself using camera-based map-matching, which is referred to as Visual Localization [50]. Even though these methods use a series of image frames to determine the robot's location, they match information from the same viewpoint. In our work, we develop a cross-view video-based localization approach by leveraging a sequence of images with varied viewpoints and limited FoVs, aiming to significantly improve the representativeness of a query location. Multi-view Counting/Detection. There are also related methods which project images to the same ground plane for fusion, such as multi-view counting [51–54] and multi-view detection [55–59].


We share a similar idea of projecting features onto the ground plane, but we solve different downstream tasks and face distinct challenges.


Uncertainty Map

Reference Satellite Image

Fig. 2. Overview of our proposed CVLNet. Our Geometry-driven View Projection (GVP) module first aligns the sequential ground-view features in the overhead view and presents them in a unified coordinate system. Next, the Photo-consistency Constrained Sequential Fusion (PCSF) module measures the photo-consistency of an overhead-view pixel across the different ground views and fuses them together, obtaining a global feature representation $\tilde{F}^g$ of the query video. The global feature representation is then compared with the satellite feature map $F^s$ using a Scene-prior driven Similarity Matching (SSM) scheme to determine the relative displacement between the query camera location and the satellite image center, guided by an uncertainty map. After alignment, the feature similarity is computed for image retrieval.

3 CVLNet: Cross-view Video-Based Localization

This paper tackles the ground-to-satellite localization task. Instead of using a single query image captured at the ground level, we augment the query image with a short video containing previous observations. To solve this task, our motivation is to first project the images in the ground video to an overhead perspective and then extract a global description from the projected image sequence for localization. An overview of our pipeline is illustrated in Fig. 2.

3.1 Geometry-Driven View Projection (GVP)

Prior methods often resort to a satellite-to-ground projection to bridge the cross-view domain gap. This is achieved either by a polar transform [6,8,10] or a projective transform [25,60,61]. However, both transforms need to know the query

For clarity, we use “overhead” throughout the paper to denote the projected features from ground-views, and “satellite” to indicate the real satellite image/features.


camera location with respect to the satellite image center. In the CVUSA and CVACT datasets, where the polar transform performs excellently, the query images happen to align with their matching satellite image centers, which does not occur in practice. When there is a large offset between the real camera location and its assumed location with respect to its matching satellite image (e.g., the satellite image center in the polar transform), the performance will be impeded significantly. Hence, instead of projecting satellite images to ground views, we introduce a Geometry-driven View Projection (GVP) module to transform ground-view images to the overhead view.


y

East

South

Fig. 3. A unified projection coordinates for the projected overhead-view features. Ground-view observations at timestamps T = {t1 , t2 , . . . , tn } are projected to the same overhead-view grid with the center corresponding to the query camera location at tn .


p1

p2

Fig. 4. Geometry-driven cross-view semantic correspondence learning. Both tree canopy and tree trunk are “trees”. Building roof and facades are both “buildings”.

Starting from a blank canvas in the overhead view, with its center corresponding to the geo-spatial location of the query camera, we aim to fill it with features collected from the ground-view images. We also set the origin of the world coordinate system to the geo-spatial query camera location, with its x-axis pointing to the south, its y-axis pointing to the east, and its z-axis vertically upward. Different ground-view images in a video sequence are projected to the same overhead-view coordinate system so that they are geographically aligned after projection. Figure 3 provides a visual illustration of the coordinate systems. Parallel Projection of a Satellite Camera. The projection between the satellite image coordinate system $(u^s, v^s)$ and the world coordinate system $(x, y, z)$ can be approximated as a parallel projection [60], $[x, y]^{\top} = \lambda\,[v^s - v^s_0,\; u^s - u^s_0]^{\top}$, where $(u^s_0, v^s_0)$ indicates the satellite map center and $\lambda$ indicates the real-world distance between two neighboring pixels in the satellite map. Perspective Projection of Ground-View Images. Denote $\mathbf{R}_{t_i}$ and $\mathbf{t}_{t_i}$ as the rotation and translation of the camera at time step $t_i$ in the world coordinate system, $\mathbf{E}_{t_i}$ as the camera intrinsics, and $N$ as the sequence length. The relative $\mathbf{R}_{t_i}$ and $\mathbf{t}_{t_i}$ can be easily obtained by Structure from Motion [62]. The projection between


the world coordinate system $(x, y, z)$ and the ground-view camera coordinate system $(u^g_{t_i}, v^g_{t_i})$ is expressed as $w_{t_i} [u^g_{t_i}, v^g_{t_i}, 1]^{\top} = \mathbf{E}_{t_i} [\mathbf{R}_{t_i}, \mathbf{t}_{t_i}] [x, y, z, 1]^{\top}$, where $w_{t_i}$ is a scale factor in the perspective projection. Ground-to-Satellite Projection. There is a height ambiguity of satellite pixels in the ground-to-satellite projection. Instead of explicitly estimating the heights, we present a simple yet effective solution. Specifically, we project ground-view observations to the overhead view assuming satellite pixels lie on the ground plane. Rather than projecting original image RGB pixels, we project high-level deep features. The geometric projection from the ground view to the overhead view is derived as

$$w_{t_i} [u^g_{t_i}, v^g_{t_i}, 1]^{\top} = \mathbf{E}_{t_i} [\mathbf{R}_{t_i}, \mathbf{t}_{t_i}] [\lambda(v^s - v^s_0),\; \lambda(u^s - u^s_0),\; -h,\; 1]^{\top}, \qquad (1)$$

where $h$ is the height of the query camera with respect to the ground plane, and $w_{t_i}$ can be computed from the above equation. Denote $F^g_{t_i} \in \mathbb{R}^{H \times W \times C}$ as the ground-view image features from a CNN backbone, where $H$, $W$ and $C$ are the height, width and channels of the features, respectively, and $\mathrm{GVP}(\cdot)$ as the geometry-driven view projection operation illustrated in Eq. (1). The projected features in the overhead view are then obtained by $F^p_{t_i} = \mathrm{GVP}(F^g_{t_i})$, $F^p_{t_i} \in \mathbb{R}^{S \times S \times C}$, where $S$ indicates the overhead-view feature map resolution. This projection establishes the exact geometric correspondences between the ground and overhead views for scene contents on the ground plane. For scene objects with greater heights, projecting features rather than image pixels relaxes this strict constraint while providing a cue that corresponding objects exist between the views. As shown in Fig. 4, for pixel $(u^s, v^s)$, the projected feature from the ground view at $t_{n-1}$ represents the tree trunk, but the feature in the satellite image corresponds to the tree canopy. Both the tree canopy and the tree trunk indicate that there is a tree at location $p_2$. Then, by applying a matching loss between the two features (tree trunk and tree canopy), the network is trained to learn viewpoint-invariant features, i.e., both tree trunk and tree canopy are mapped to the semantic features of "tree". The coverage of the canvas for the ground-to-satellite projection is set to the reference satellite image coverage, i.e., around 100 m × 100 m, with its center corresponding to the query camera location. When the sequence is too long and some previous image contents exceed the canvas's pre-set coverage, the exceeded contents are not collected. This is because scene contents that are too far from the query camera location are less important for localization, and it is better to cover most of the synthetic overhead-view feature map by the reference satellite images.
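A minimal sketch of the ground-plane projection in Eq. (1) is given below; it is an illustration, not the authors' implementation. The grid resolution, `meters_per_cell`, `cam_height`, and the pose convention (world-to-camera extrinsics) are assumptions.

```python
import torch
import torch.nn.functional as F

def gvp_project(ground_feat, E, R, t, S=128, meters_per_cell=0.8, cam_height=1.65):
    """Project one frame's ground-view features (B, C, H, W) onto an S x S
    overhead grid centered at the query camera, assuming all overhead cells
    lie on the ground plane (Eq. 1). E: (B, 3, 3) intrinsics; R, t: (B, 3, 3),
    (B, 3) pose of the frame in the query-centered world frame."""
    B, C, H, W = ground_feat.shape
    device = ground_feat.device
    idx = torch.arange(S, device=device, dtype=torch.float32) - S / 2
    vs, us = torch.meshgrid(idx, idx, indexing="ij")           # overhead pixel offsets
    x = vs * meters_per_cell                                    # south axis
    y = us * meters_per_cell                                    # east axis
    z = torch.full_like(x, -cam_height)                         # ground plane below camera
    pts = torch.stack([x, y, z], dim=-1).reshape(-1, 3)
    pts = pts.unsqueeze(0).expand(B, -1, -1)                    # (B, S*S, 3)
    # World -> camera -> image: w [u, v, 1]^T = E [R, t] [x, y, z, 1]^T.
    cam = torch.einsum("bij,bnj->bni", R, pts) + t.unsqueeze(1)
    uvw = torch.einsum("bij,bnj->bni", E, cam)
    w = uvw[..., 2].clamp(min=1e-6)
    u, v = uvw[..., 0] / w, uvw[..., 1] / w
    grid = torch.stack([2 * u / (W - 1) - 1, 2 * v / (H - 1) - 1], dim=-1)
    grid = grid.reshape(B, S, S, 2)
    valid = (uvw[..., 2] > 0).float().reshape(B, 1, S, S)       # mask points behind camera
    overhead = F.grid_sample(ground_feat, grid, align_corners=True)
    return overhead * valid                                      # (B, C, S, S)
```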

3.2 Photo-Consistency Constrained Sequence Fusion

We leverage photo consistency among different ground-views for the video sequence fusion. For a satellite pixel, when its corresponding features in several (more than two) ground views are similar, the existence of a scene object at this geographical location is highly reliable for these ground-views. We should


highlight these corresponding features when generating descriptors for scene contents. Driven by this, we design a Photo-consistency Constrained Sequence Fusion (PCSF) module. Our PCSF module employs an attention mechanism [63] to emphasize reliable features when fusing a video sequence and obtaining a global descriptor for the video. Our GVP block has aligned the original ground-view features at different time steps in a unified overhead-view coordinate system. When the features of a geographical location observed by different ground views are similar, those features should be more reliable for localization. We leverage the self-attention mechanism [63] to measure the photo-consistency/similarity across different views and find reliable features. Specifically, for each projected feature map $F^p_{t_i}$ at time step $t_i$, we compute its query, key and value by two stacked convolutional layers, denoted by $Q_{t_i}, K_{t_i}, V_{t_i} \in \mathbb{R}^{S \times S \times C}$, respectively. The stacked convolutional layers increase the receptive field and the representative ability of the key, query, and value features at each spatial location. Next, we compute the similarities between each projected feature map at $t_i$ and the other projected feature maps at $t_j$, $i, j = 1, \ldots, N$, and normalize them across all possible $j$ by a softmax operation, expressed as

$$M_{i,j} = \mathrm{Softmax}_j\!\left( Q_{t_i}^{\top} K_{t_j} \right), \quad M \in \mathbb{R}^{N \times N \times S \times S}. \qquad (2)$$

The final fused feature is obtained by

$$\tilde{F}^g = \frac{1}{N} \sum_{i}^{N} \sum_{j}^{N} M_{i,j} V_{t_j}, \quad \tilde{F}^g \in \mathbb{R}^{S \times S \times C}. \qquad (3)$$

In this way, we highlight the common features across the views and make the global descriptor reliable.
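The following is a hedged sketch of the fusion in Eqs. (2)-(3): per-pixel query/key/value maps from each projected frame, a cross-frame softmax over similarities, and averaging of the attended values. Layer sizes and the two-convolution heads are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PCSFFusion(nn.Module):
    """Photo-consistency constrained fusion of N projected overhead feature maps."""

    def __init__(self, channels=64):
        super().__init__()
        def head():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1))
        self.q, self.k, self.v = head(), head(), head()

    def forward(self, proj_feats):            # (B, N, C, S, S) projected frames
        B, N, C, S, _ = proj_feats.shape
        x = proj_feats.reshape(B * N, C, S, S)
        q = self.q(x).reshape(B, N, C, S, S)
        k = self.k(x).reshape(B, N, C, S, S)
        v = self.v(x).reshape(B, N, C, S, S)
        # M[i, j] at each pixel: similarity of frame i's query with frame j's key.
        sim = torch.einsum("bichw,bjchw->bijhw", q, k)       # (B, N, N, S, S)
        attn = sim.softmax(dim=2)                             # normalize over j
        fused = torch.einsum("bijhw,bjchw->bichw", attn, v)   # attended values
        return fused.mean(dim=1)                              # average over i -> (B, C, S, S)
```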

3.3 Scene-Prior Driven Similarity Matching

We want to address the location misalignment between a query camera location and its matching satellite image center by the ground-to-satellite projection and a spatial correlation between the projected features and the real satellite features. Hence, the satellite feature descriptors should be translationally equivariant, which is an inherent property of conventional CNNs. Following most previous works [2,6–8,11], we use VGG16 [64] as our backbone for satellite (and ground) feature extraction. The extracted satellite features, denoted as $F^s \in \mathbb{R}^{S \times S \times C}$, share the same spatial scale as the global representation of the query video. Next, we adopt a Normalized spatial Cross-Correlation (NCC) to estimate the latent alignment between the query location and a satellite image center. Denote $[F^s]_{m,n}$ as a shifted version of a satellite feature map with its center at $(m, n)$ in the original satellite feature map, where $m = 0, n = 0$ corresponds to the center of the original satellite feature map. The similarity between $F^s$ and $\tilde{F}^g$ aligned at $(m, n)$ computed by NCC is

$$D_0(F^s, \tilde{F}^g)_{m,n} = \frac{[F^s]_{m,n} \cdot \tilde{F}^g}{\big\| [F^s]_{m,n} \big\|_2 \, \big\| \tilde{F}^g \big\|_2}, \qquad (4)$$

Y. Shi et al.

 g ) ∈ Rh×w denotes the similarity matrix between Fs and F  g at where D0 (Fs , F all possible spatial-aligned locations, m ∈ [− h2 , h2 ], and n ∈ [− w2 , w2 ]. A potential spatial-aligned location of the satellite map lies in a region of 10 × 10 m2 in our KITTI-CVL dataset, as the database satellite image is collected very ten meters. To exclude impossible query camera locations, e.g., top of trees, we estimate an uncertainty map from the satellite semantic features, U(Fs ) = U(Fs ), U(Fs ) ∈ Rh×w , where U(·) is the uncertainty net, composed of a set of convolutional layers. The value of each element in U(Fs ) is within the range of [0, 1], forced by a Sigmoid layer. By encoding the uncertainty, The similarity between  g aligned at (m, n) is then written as, Fs and F s g  g )m,n = D0 (F , F )m,n . D(Fs , F U(Fs )m,n

(5)

 g aligned When the uncertainty at (m, n) is large, the similarity between Fs and F at this location will be decreased. We do not have explicit supervisions for the uncertainty map. Rather, it is learned statistically from training. The relative  g is obtained by, displacement between Fs and F  g )m,n . m∗ , n∗ = arg max D(Fs , F

(6)

m,n

During inference, we have no idea which one is the matching reference image for a query image. Thus the uncertainty-guided similarity matching is applied to all reference features (including non-matching ones). Furthermore, it is more challenging when a similarity score between non-matching ground and satellite features is high. Hence, we apply the similarity matching scheme to the pairs of query and non-matching reference images as well during training and minimize their maximum similarity, making the learned features more discriminative. 3.4

Training Objective

We employ the soft-weighted triplet loss [2] to train our network. The loss includes a positive term to maximize the similarity between the matching query and reference pairs and a negative term to minimize the similarity between non-matching pairs. The non-matching term also prevents our view projection module from trivial solutions. Therefore, it is formulated as,     g ,Fs )−d(F  g ,Fs∗ ) α d(F L = log 1 + e ,

(7) ∗

where Fs is the matching satellite image feature to the ground feature Fg , Fs is the non-matching satellite image feature, d(·, ·) is the L2 distance between its two inputs after alignment, and α is set to 10.

CVLNet

4

131

The KITTI-CVL Dataset

KITTI is one of the widely used benchmark datasets for testing computer vision algorithms for autonomous driving [12]. In this paper, we intend to investigate a method for using a short video sequence for satellite image-based camera localization. For this purpose, we supplement the KITTI drive sequences with corresponding satellite images. This is done by cropping high-definition Google earth satellite images using the KITTI-provided GPS tags for vehicle trajectories. Based on these GPS tags of the ground-view images, we select a large region that covers the vehicle trajectory. We then uniformly partition the region into overlapping satellite image patches. Each satellite image patch has a resolution of 1280 × 1280 pixels, amounting to about 20 cm per pixel. Training, Validation and Test Sets. The KITTI data contains different trajectories captured at different time. In our Training, Validation and Test set split, the images of Training and Validation set are from the same region. The Validation set is constructed in this way to select the best model during training. In contrast, the images in the test set are captured at different regions from the Training and Validation sets. The test set aims to evaluate the generalization ability of the compared algorithms. Table 1. Query image numbers in the training, validation and tests sets. Training Validation Test-1 Test-2 Distractor



Query Num 23,905







2,362

2,473

2,473

Only the nearest satellite image for each ground image in the sampled grids is retained for the Training and Validation sets. We use the same method to construct our first test set, Test-1. Furthermore, we construct the second test set, Test-2, where all satellite images in the sampled grids are reserved. In other words, Test-2 contains many distracting satellite images, and it better reflects the real deployment scenario than Test-1. Visual illustrations of the differences between Test-1 and Test-2 are provided in the supplementary material. Table 1 presents the query ground image numbers of the Training, Validation, Test-1, and Test-2 sets.

5 Experiments

Evaluation Metrics. Following the previous cross-view localization work [3], we use the distance and the recall at top k (r@k) for performance evaluation. Specifically, when one of the retrieved top k reference images is within 10 meters of the query ground location, it is regarded as a successful localization. The


percentage of successfully localized query images is recorded as the recall at top k. We set k to 1, 5, 10 and 100, respectively. Implementation Details. The input satellite image size is 512 × 512, center-cropped from the collected images. Their coverage is approximately 102 m × 102 m. The ground image resolution is 256 × 1024. The sizes of our global descriptors for query videos and satellite images are both 4096, which is a typical descriptor dimension in image retrieval. We follow prior works [2–11] and adopt an exhaustive mini-batch strategy [1] with a batch size of B = 8 to prepare the training triplets. The Adam optimizer [65] with a learning rate of 10⁻⁴ is employed, and our network is trained end-to-end for five epochs. Our source code with every detail will be released, and the satellite images will be available for research purposes only and upon request.
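For concreteness, the r@k metric described above can be computed as in the sketch below; descriptor normalization and the array layout are assumptions, not the released evaluation code.

```python
import numpy as np

def recall_at_k(query_desc, ref_desc, query_loc, ref_loc, ks=(1, 5, 10, 100), thresh=10.0):
    """A query counts as localized if any of its top-k retrieved satellite images
    lies within `thresh` meters of the query's ground-truth location.
    query_desc: (Q, D), ref_desc: (R, D), both assumed L2-normalized;
    query_loc: (Q, 2), ref_loc: (R, 2) in meters."""
    sims = query_desc @ ref_desc.T                       # (Q, R) similarities
    order = np.argsort(-sims, axis=1)                    # best matches first
    dists = np.linalg.norm(query_loc[:, None, :] - ref_loc[None, :, :], axis=-1)
    hits = dists < thresh                                # (Q, R) within 10 m
    rows = np.arange(len(order))[:, None]
    return {k: float(np.mean(hits[rows, order[:, :k]].any(axis=1))) for k in ks}
```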

5.1 Cross-view Video-Based Localization

Since there are no existing video-based cross-view localization algorithms, we conduct extensive experiments to dissect the effectiveness and necessity of each component in our framework.

Table 2. Performance comparison on different designs for view projection and sequence fusion (sequence = 4).

Model                   Model size  Test-1                        Test-2
                                    r@1    r@5    r@10   r@100   r@1    r@5    r@10   r@100
View projection
Ours w/o GVP (Unet)     66.4M       0.08   0.61   1.70   26.24   0.00   0.00   0.00   1.09
Ours w/o GVP            66.2M       1.66   4.33   7.97   36.35   0.04   0.16   0.20   5.22
Direct fusion
Conv2D                  66.0M       1.25   5.90   10.80  65.91   8.90   18.44  26.61  76.51
Conv3D                  66.0M       15.08  41.57  53.17  93.09   7.00   20.38  30.33  75.90
LSTM                    66.2M       12.53  32.11  50.42  96.93   5.78   15.89  23.01  70.60
Attention-based fusion
Conv2D                  66.0M       18.80  47.03  61.75  96.64   11.69  25.03  36.55  81.52
Conv3D                  66.0M       19.65  43.27  58.39  97.41   11.36  24.02  34.45  83.58
LSTM                    66.1M       15.93  47.88  66.03  97.61   9.70   24.30  35.26  85.08
Ours                    66.2M       21.80  47.92  64.94  99.07   12.90  27.34  38.62  85.00

5.1.1 Geometry-Driven View Projection
Although our GVP module is the basis for the following sequence fusion and similarity matching steps, we investigate whether it can be replaced or removed. We first replace it with a UNet and expect the domain correspondences to be learned implicitly during training, denoted as "Ours w/o GVP (Unet)". Next, we remove it from our pipeline and directly feed the original ground-view features to our sequence fusion module, denoted as "Ours w/o GVP". As indicated by the results in Table 2, the performance of the two baselines is significantly inferior to our whole pipeline, demonstrating the necessity of our geometry-driven view projection module.


Learned Viewpoint-Invariant Semantic Features. To fully understand the capability of our view projection module, we visualize the learned viewpoint-invariant semantic features of our network using Grad-CAM [66]. As seen in Fig. 5, salient features on roads and road edges are successfully recognized in ground-view images (Fig. 5(a)). The detected salient features in satellite images also concentrate on roads and scene objects along road edges (Fig. 5(b)). By using our view projection module and the photo-consistency constrained sequence fusion mechanism, the learned global representations of the ground video (Fig. 5(c)) capture scene patterns similar to those of their matching satellite counterparts (Fig. 5(d)).

5.1.2 Photo-Consistency Constrained Sequence Fusion

Our goal is to synthesize an overhead-view feature map from a query ground video. To this end, our PCSF module measures the photo consistency for each overhead view pixel across different ground-view images and fuses them with an attention-based (transformer) architecture. Apart from this design, LSTM (RNNs) and 3D CNNs are also known for their power to handle sequential signals. Hence, we compare with these architectures. For completeness, we also experiment with 2D CNNs. Direct Fusion. We first replace our PCSF module with Conv2D, Conv3D, and LSTM based networks, respectively. The Conv2D-based fusion network takes

(a) Sequential ground-view images (sampled) (b) Satellite image (c) Query feature (d) Satellite feature (e) Confidence map

Fig. 5. Visualizations of intermediate results of our method. For ground images, the learned activations focus on salient features of the ground and road edges. Interestingly, they automatically ignore the dynamic objects (first row in (a)). The learned activations of satellite images concentrate on the scene objects along the main road (likely visible to a moving vehicle). The fused query video features (c) capture scene objects similar to those of their satellite counterparts (d), and the learned confidence maps focus on road regions.


the projected sequential ground-view features $F^p_{t_i}$ separately and computes the average of the outputs of different time steps. The Conv3D-based fusion network uses its third dimension to operate on the temporal dimension. The LSTM-based network includes two bidirectional LSTM layers to enhance the sequential relationship encoding. The outputs of the Conv3D-based and the LSTM-based networks are both directly fused features for the query video. The results are presented in the middle part of Table 2. It can be seen that their performance is significantly inferior to ours. Attention-Based Fusion. Based on the above observations, we infer that it may be difficult for a network to fuse a sequence of features implicitly. Hence, we employ the Conv2D, Conv3D, and LSTM based networks to regress the attention weights for the projected features at different time steps, denoted as $N_{t_i} \in \mathbb{R}^{S \times S}$. Then, the global query descriptor is obtained by a dot product between the attention weights $N_{t_i}$ and the features $F^p_{t_i}$. The results are presented in the bottom part of Table 2. It can be seen that the attention-based fusion methods all outperform the direct fusion methods, indicating that the attention-based decomposition helps to achieve better performance. Among the attention-based fusion ablations, our method achieves the best overall performance. This should be attributed to the explicit photo-consistency computation across different ground-views by our PCSF module.
5.1.3 Different Choices for Network Backbone
In this section, we conduct an ablation study on different network backbones, including the Vision Transformer (ViT) [67], the Swin Transformer [68], ResNet50 [69] and VGG16 [64] (ours). Transformers are known for their superior feature extraction ability compared to CNNs. However, they do not preserve translational equivariance, which is an essential element in estimating the relative displacement between query camera locations and their matching satellite image centers. Thus, transformers achieve slightly worse performance than CNNs, as indicated by Table 3. Compared to VGG16, ResNet50 does not bring significant improvement. Hence, following most previous works [2,6–8,11], we use VGG16 as our network backbone.
Table 3. Performance comparison with different backbones (sequence = 4).

Backbone            Test-1                        Test-2
                    r@1    r@5    r@10   r@100   r@1    r@5    r@10   r@100
ViT [67]            20.05  45.13  60.17  97.53   12.86  27.94  38.86  81.64
Swin [68]           18.40  47.80  63.73  99.11   12.29  22.31  35.29  80.70
Resnet [69]         22.68  55.16  67.69  97.90   9.75   28.31  38.45  73.72
VGG16 [64] (Ours)   21.80  47.92  64.94  99.07   12.90  27.34  38.62  85.00


Table 4. Effectiveness of the scene-prior driven similarity matching (sequence = 4).

Model         Test-1                        Test-2
              r@1    r@5    r@10   r@100   r@1    r@5    r@10   r@100
Ours w/o SSM  6.35   25.76  41.97  97.61   3.48   9.42   14.03  63.04
Ours w/o U    13.26  36.76  55.72  97.05   10.47  27.42  39.51  88.92
Ours          21.80  47.92  64.94  99.07   12.90  27.34  38.62  85.00

5.1.4 Scene-Prior Driven Similarity Matching
Next, we study whether the NCC-based similarity matching can be removed. In this experiment, the distance between the satellite features and the fused ground-view features is directly computed without estimating their potential alignments. Instead, they are assumed to be aligned at the satellite image center. The results are presented in the first row of Table 4. The performance drops significantly compared to our whole baseline, demonstrating that the network does not have the ability to tolerate the spatial shifts of query camera locations, and that our explicit alignment strategy (NCC-based similarity matching) is effective. Furthermore, we investigate the effectiveness of the scene prior learned by the uncertainty map (Eq. 5). To do so, we remove the uncertainty map term $U(F^s)_{m,n}$ in Eq. (5), denoted as "Ours w/o U". The results in the second row of Table 4 indicate that the learned uncertainty map boosts the localization performance. Figure 5(e) visualizes the confidence maps (inverse of uncertainty) generated by our method. It can be seen that the higher-confidence regions mainly concentrate on roads, indicating that the confidence maps successfully encode the semantic information of satellite images and recognize the plausible regions for a vehicle location.
5.1.5 Varying Sequence Lengths
One desired property for a video-based localization method is to be robust to various input video lengths after a model is trained. Hence, we investigate the performance of our method on different query video lengths (1–16) using a model trained with sequence length 4. Figure 6 shows that the performance increases gracefully with the number of frames in the video. This confirms our general intuition that more input images increase the discriminativeness of the query place and help boost the localization performance. Note that when the sequence length keeps increasing until the cameras at previous time steps exceed the pre-set coverage of the projected features, the performance no longer increases but stays the same, because the exceeded information is not fused. Scene contents too far from the query camera location are also less useful for localization.



Fig. 6. Recall rates with the increase of input sequence number.

(a) Query Video

(b) Top-1 (c) Top-2 (d) Top-3 (e) Top-4

Fig. 7. Qualitative visualization of retrieved results using 4 image frames in a video.

When only using one image for localization, the feature extraction time for a query descriptor is 0.15 s. As the sequence length increases, the query descriptor extraction time increases linearly. We expect this can be accelerated by parallel computation. The retrieval time for each ground image on Test-2 is around 3 ms, and the coverage of the satellite images in Test-2 is about 710,708 m². It takes 8 GB of GPU memory when the sequence length is 4 and 24 GB when the sequence length is 16. We show some qualitative examples of retrieved results in Fig. 7 using sequence length 4.

5.2 Single Image-Based Localization

Single image-based localization is a special case of video-based localization, i.e., when the number of image frames in the video is one. In this section, we compare the performance of our method with recent state-of-the-art (SOTA) methods designed for cross-view single image-based localization, including CVM-NET [2], CVFT [7], SAFA [6], Polar-SAFA [6], DSM [8], Zhu et al. [11], and Toker et al. [10]. The results are presented in Table 5. It can be seen that our method significantly outperforms the recent SOTA algorithms. Among the compared algorithms, DSM [8] achieves the best performance, because it explicitly addresses the challenge of the limited FoV of query images, while the others assume that query images are full-FoV panoramas. By comparing SAFA and Polar-SAFA, we can observe that the polar transform boosts the performance on Test-1 (one-to-one matching) while impairing the performance on Test-2 (one-to-many matching). This is consistent with the conclusions in Shi et al. [6] and Zhu et al. [11]. Based on SAFA, Zhu et al. [11] propose two training losses: (1) an IoU loss and (2) a GPS loss. However, we found that the two terms do not work well


Table 5. Comparison with the recent state-of-the-art on single image-based localization.

Method             Test-1                        Test-2
                   r@1    r@5    r@10   r@100   r@1    r@5    r@10   r@100
CVM-NET [2]        6.43   20.74  32.47  84.07   1.01   4.33   7.52   32.88
CVFT [7]           1.78   7.20   14.40  73.55   0.20   1.29   3.03   16.86
SAFA [6]           4.89   15.77  23.29  87.75   1.62   4.73   7.40   30.13
Polar-SAFA [6]     6.67   17.06  27.62  86.53   1.13   3.76   6.23   28.22
DSM [8]            13.18  41.16  58.67  97.17   5.38   18.12  28.63  75.70
Zhu et al. [11]    5.26   17.79  28.22  88.44   0.73   3.28   5.66   27.86
Toker et al. [10]  2.79   7.72   11.69  58.92   2.39   5.50   8.90   27.05
Ours               17.71  44.56  62.15  98.38   9.38   24.06  34.45  85.00

on the KITTI-CVL dataset. We suspect that the IoU loss is only suitable for the panorama case. The limited-FoV images in the KITTI-CVL dataset have a smaller overlap with satellite images than panoramas, and thus the original IoU loss may not provide correct guidance for training. The GPS loss does not help mainly because of the inaccuracy of the GPS data in our dataset. We provide a GPS accuracy analysis of the KITTI dataset in the supplementary material. In contrast, our method does not rely on accurate GPS tags of ground or satellite images.

5.3 Limitations

Our method assumes that the north direction is provided by a compass, following previous works [3,6,8,10,11], and that the absolute scale of camera translations can be estimated roughly from the vehicle velocity. We have not investigated how significant tilt and roll angle changes would affect the performance, because the tilt and roll angles in the KITTI dataset are very small and we set them to zero. In autonomous driving scenarios, the vehicle-mounted cameras are usually perpendicular to the ground plane; thus there are only slight changes in tilt and roll during driving.

6 Conclusions

This paper introduced a novel geometry-driven semantic correspondence learning approach for cross-view video-based localization. Our method includes a Geometry-driven View Projection block to bridge the cross-view domain gap, a Photo-consistency Constrained Sequence Fusion module to aggregate the sequential ground-view observations, and a Scene-prior driven Similarity Matching mechanism to determine the location of a ground camera with respect to a satellite image center. Benefiting from the proposed components, we demonstrate that using a video rather than a single image considerably improves the localization performance.


Acknowledgments. This research is funded in part by ARC-Discovery grants (DP190102261 and DP220100800 to HL, DP220100800 to XY) and ARC-DECRA grant (DE230100477 to XY). YS is a China Scholarship Council (CSC)-funded Ph.D. student to ANU. We thank all anonymous reviewers and ACs for their constructive suggestions.

References 1. Vo, N.N., Hays, J.: Localizing and orienting street views using overhead imagery. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 494–509. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0 30 2. Hu, S., Feng, M., Nguyen, R.M.H., Hee Lee, G.: CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 3. Liu, L., Li, H.: Lending orientation to neural networks for cross-view geolocalization. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 4. Regmi, K., Shah, M.: Bridging the domain gap for ground-to-aerial image matching. In: The IEEE International Conference on Computer Vision (ICCV) (2019) 5. Cai, S., Guo, Y., Khan, S., Hu, J., Wen, G.: Ground-to-aerial image geo-localization with a hard exemplar reweighting triplet loss. In: The IEEE International Conference on Computer Vision (ICCV) (2019) 6. Shi, Y., Liu, L., Yu, X., Li, H.: Spatial-aware feature aggregation for image based cross-view geo-localization. In: Advances in Neural Information Processing Systems, pp. 10090–10100 (2019) 7. Shi, Y., Yu, X., Liu, L., Zhang, T., Li, H.: Optimal feature transport for cross-view image geo-localization. Account. Audit. Account. I, 11990–11997 (2020) 8. Shi, Y., Yu, X., Campbell, D., Li, H.: Where am I looking at? Joint location and orientation estimation by cross-view matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4064–4072 (2020) 9. Zhu, S., Yang, T., Chen, C.: Revisiting street-to-aerial view image geo-localization and orientation estimation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 756–765 (2021) 10. Toker, A., Zhou, Q., Maximov, M., Leal-Taix´e, L.: Coming down to earth: Satelliteto-street view synthesis for geo-localization. In: CVPR (2021) 11. Zhu, S., Yang, T., Chen, C.: Vigor: cross-view image geo-localization beyond oneto-one retrieval. In: CVPR (2021) 12. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32, 1231–1237 (2013) 13. https://developers.google.com/maps/documentation/maps-static/overview 14. Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: Netvlad: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5297–5307 (2016) 15. Kim, H.J., Dunn, E., Frahm, J.M.: Learned contextual feature reweighting for image geo-localization. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3251–3260 IEEE (2017) 16. Liu, L., Li, H., Dai, Y.: Stochastic attraction-repulsion embedding for large scale image localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2570–2579 (2019)


17. Noh, H., Araujo, A., Sim, J., Weyand, T., Han, B.: Large-scale image retrieval with attentive deep local features. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3456–3465 (2017) 18. Ge, Y., Wang, H., Zhu, F., Zhao, R., Li, H.: Self-supervising fine-grained region similarities for large-scale image localization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 369–386. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8 22 19. Zhou, Y., Wan, G., Hou, S., Yu, L., Wang, G., Rui, X., Song, S.: DA4AD: end-toend deep attention-based visual localization for autonomous driving. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 271–289. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1 17 20. Castaldo, F., Zamir, A., Angst, R., Palmieri, F., Savarese, S.: Semantic cross-view matching. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 9–17 (2015) 21. Lin, T.Y., Belongie, S., Hays, J.: Cross-view image geolocalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898 (2013) 22. Mousavian, A., Kosecka, J.: Semantic image based geolocation given a map. arXiv preprint arXiv:1609.00278 (2016) 23. Tian, Y., Chen, C., Shah, M.: Cross-view image matching for geo-localization in urban environments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3608–3616 (2017) 24. Hu, S., Lee, G.H.: Image-based geo-localization using satellite imagery. Int. J. Comput. Vision 128, 1205–1219 (2020) 25. Shi, Y., Yu, X., Liu, L., Campbell, D., Koniusz, P., Li, H.: Accurate 3-DOF camera geo-localization via ground-to-satellite image matching. arXiv preprint arXiv:2203.14148 (2022) 26. Zhu, S., Shah, M., Chen, C.: Transgeo: transformer is all you need for cross-view image geo-localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1162–1171 (2022) 27. Elhashash, M., Qin, R.: Cross-view slam solver: global pose estimation of monocular ground-level video frames for 3d reconstruction using a reference 3d model from satellite images. ISPRS J. Photogramm. Remote. Sens. 188, 62–74 (2022) 28. Guo, Y., Choi, M., Li, K., Boussaid, F., Bennamoun, M.: Soft exemplar highlighting for cross-view image-based geo-localization. IEEE Trans. Image Process. 31, 2094– 2105 (2022) 29. Zhao, J., Zhai, Q., Huang, R., Cheng, H.: Mutual generative transformer learning for cross-view geo-localization. arXiv preprint arXiv:2203.09135 (2022) 30. Bloesch, M., Omari, S., Hutter, M., Siegwart, R.: Robust visual inertial odometry using a direct ekf-based approach. In,: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).pp. 298–304. IEEE (2015) 31. Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., Furgale, P.: Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. 34, 314– 334 (2015) 32. Chien, H.J., Chuang, C.C., Chen, C.Y., Klette, R.: When to use what feature? sift, surf, orb, or a-kaze features for monocular visual odometry. 2016 International Conference on Image and Vision Computing New Zealand (IVCNZ), pp. 1–6 (2016) 33. Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Reid, I., Leonard, J.J.: Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Rob. 32, 1309–1332 (2016)

140

Y. Shi et al.

34. Engel, J., Sch¨ ops, T., Cremers, D.: LSD-SLAM: large-scale direct monocular SLAM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 834–849. Springer, Cham (2014). https://doi.org/10.1007/ 978-3-319-10605-2 54 35. Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In,: 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. pp. 225–234. IEEE (2007) 36. Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: Orb-slam: a versatile and accurate monocular slam system. IEEE Trans. Rob. 31, 1147–1163 (2015) 37. Mur-Artal, R., Tard´ os, J.D.: Orb-slam2: An open-source slam system for monocular, stereo, and RGB-D cameras. IEEE Trans. Rob. 33, 1255–1262 (2017) 38. Campos, C., Elvira, R., Rodr´ıguez, J.J.G., Montiel, J.M., Tard´ os, J.D.: Orb-slam3: an accurate open-source library for visual, visual-inertial, and multimap slam. IEEE Trans. Robot. 37, 1874–1890 (2021) 39. Mur-Artal, R., Tard´ os, J.D.: Visual-inertial monocular slam with map reuse. IEEE Robot. Autom. Lett. 2, 796–803 (2017) 40. Wolcott, R.W., Eustice, R.M.: Visual localization within lidar maps for automated urban driving. 2014 IEEE/RSJ International Conference on Intelligent Robots and System, pp. 176–183 (2014) 41. Voodarla, M., Shrivastava, S., Manglani, S., Vora, A., Agarwal, S., Chakravarty, P.: S-BEV: semantic birds-eye view representation for weather and lighting invariant 3-DOF localization (2021) 42. Stenborg, E., Toft, C., Hammarstrand, L.: Long-term visual localization using semantically segmented images. In,: IEEE International Conference on Robotics and Automation (ICRA). pp .6484–6490. IEEE (2018) 43. Stenborg, E., Sattler, T., Hammarstrand, L.: Using image sequences for long-term visual localization. In: 2020 International Conference on 3D Vision (3DV), pp. 938–948 IEEE (2020) 44. Vaca-Castano, G., Zamir, A.R., Shah, M.: City scale geo-spatial trajectory estimation of a moving camera. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1186–1193 IEEE (2012) 45. Regmi, K., Shah, M.: Video geo-localization employing geo-temporal feature learning and GPS trajectory smoothing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12126–12135 (2021) 46. Yousif, K., Bab-Hadiashar, A., Hoseinnezhad, R.: An overview to visual odometry and visual slam: applications to mobile robotics. Intell. Ind. Syst. 1, 289–311 (2015) 47. Scaramuzza, D., Fraundorfer, F.: Visual odometry [tutorial]. IEEE Robot. Autom. Mag. 18, 80–92 (2011) 48. Gao, X., Wang, R., Demmel, N., Cremers, D.: Ldso: direct sparse odometry with loop closure. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2198–2204 IEEE (2018) 49. Kasyanov, A., Engelmann, F., St¨ uckler, J., Leibe, B.: Keyframe-based visualinertial online slam with relocalization. In,: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6662–6669. IEEE (2017) 50. Liu, D., Cui, Y., Guo, X., Ding, W., Yang, B., Chen, Y.: Visual localization for autonomous driving: mapping the accurate location in the city maze (2020) 51. Hou, Y., Zheng, L., Gould, S.: Multiview Detection with Feature Perspective Transformation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 1–18. Springer, Cham (2020). https://doi.org/10.1007/9783-030-58571-6 1

CVLNet

141

52. Hou, Y., Zheng, L.: Multiview detection with shadow transformer (and viewcoherent data augmentation). In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 1673–1682 (2021) 53. Vora, J., Dutta, S., Jain, K., Karthik, S., Gandhi, V.: Bringing generalization to deep multi-view detection. arXiv preprint arXiv:2109.12227 (2021) 54. Ma, J., Tong, J., Wang, S., Zhao, W., Zheng, L., Nguyen, C.: Voxelized 3d feature aggregation for multiview detection. arXiv preprint arXiv:2112.03471 (2021) 55. Zhang, Q., Lin, W., Chan, A.B.: Cross-view cross-scene multi-view crowd counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 557–567 (2021) 56. Zhang, Q., Chan, A.B.: Wide-area crowd counting via ground-plane density maps and multi-view fusion CNNS. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8297–8306 (2019) 57. Zhang, Q., Chan, A.B.: 3d crowd counting via multi-view fusion with 3d gaussian kernels. Proceedings of the AAAI Conference on Artificial Intelligence. 34, 12837– 12844 (2020) 58. Zhang, Q., Chan, A.B.: Wide-area crowd counting: Multi-view fusion networks for counting in large scenes. Int. J. Comput Vis. 130, 1938–1960 (2022) 59. Chen, L., et al.: Persformer: 3D lane detection via perspective transformer and the openlane benchmark. arXiv preprint arXiv:2203.11089 (2022) 60. Shi, Y., Campbell, D.J., Yu, X., Li, H.: Geometry-guided street-view panorama synthesis from satellite imagery. IEEE Trans. Pattern Anal. Mach. Intell. 44, 10009–10022(2022) 61. Shi, Y., Li, H.: Beyond cross-view image retrieval: Highly accurate vehicle localization using satellite image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17010–17020 (2022) 62. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104– 4113 . (2016) 63. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017) 64. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014) 65. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) 66. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Gradcam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017) 67. Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 68. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) 69. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

Vectorizing Building Blueprints Weilian Song(B) , Mahsa Maleki Abyaneh, Mohammad Amin Shabani, and Yasutaka Furukawa Simon Fraser University, Burnaby, BC, Canada [email protected] Abstract. This paper proposes a novel vectorization algorithm for highdefinition floorplans with construction-level intricate architectural details, namely a blueprint. A state-of-the-art floorplan vectorization algorithm starts by detecting corners, whose process does not scale to high-definition floorplans with thin interior walls, small door frames, and long exterior walls. Our approach 1) obtains rough semantic segmentation by running off-the-shelf segmentation algorithms; 2) learning to infer missing smaller architectural components; 3) adding the missing components by a refinement generative adversarial network; and 4) simplifying the segmentation boundaries by heuristics. We have created a vectorized blueprint database consisting of 200 production scanned blueprint images. Qualitative and quantitative evaluations demonstrate the effectiveness of the approach, making significant boost in standard vectorization metrics over the current state-of-the-art and baseline methods. We will share our code at https:// github.com/weiliansong/blueprint-vectorizer. Keywords: Vectorization

1

· Blueprint · Segmentation

Introduction

Blueprints are technical drawings of buildings, conveying rich architectural and engineering information for building maintenance, assessing building code compliance, and remodeling. Unfortunately, blueprints are often stored as scanned raster images, where a professional architect spends hours manually converting to a vector format for down-stream applications. Automated blueprint vectorization will have tremendous impacts on real-estate or construction industries, significantly reducing the amount of human resources in converting hundreds of thousands of blueprints ever scanned and stored as raster images. Floorplan vectorization has a long history in the domain of document scanning [3,9]. A classical approach relies on heuristics and was never robust enough at a production level. With the advent of deep learning, Liu et al.made a breakthrough in combining convolutional neural networks for corner detection and binary integer programming for their connection inference [11]. Their system achieves more than 90% precision and recall for the task of vectorizing consumergrade floorplan images in the real-estate industry. Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_9. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023  L. Wang et al. (Eds.): ACCV 2022, LNCS 13841, pp. 142–157, 2023. https://doi.org/10.1007/978-3-031-26319-4_9

Vectorizing Building Blueprints

143

Fig. 1. This paper proposes a novel algorithm for vectorizing building blueprints with intricate architectural details. The left is an input raster blueprint image. Our vectorized blueprint is shown in the middle two columns (the left is a zoomed-in view). The right is the ground-truth.

Fig. 2. Sample floorplans from the R2V [11] dataset (left two) in comparison to our dataset (right two).

In this paper, we propose a novel vectorization algorithm for building blueprints with intricate architectural details. Our system outputs the floorplan in vector-graphics representation, and keeps intermediate representations as raster segmentation. A standard vectorization process starts by corner detection, which fails severely on complex blueprints. Our approach 1) starts by region-detection (which involves higher level primitives and is more robust) by an instance segmentation technique; 2) detects missing or extraneous architectural components by topological reasoning; 3) adds or removes components by a refinement generative adversarial network; and 4) further refines the boundaries by heurisics. We have annotated 200 raster blueprints as vector-graphics images. We compare against the current state-of-the-art and baseline methods, based on the standard metrics as well as a new one focusing on the topological correctness. The proposed approach makes significant improvements over existing methods.

2

Related Works

We review related literature, namely, primitive detection, floorplan vectorization, and structured reconstruction.

144

W. Song et al.

Primitive Detection: A floorplan is a 2D planar graph, consisting of three levels of geometric primitives: 0-dimensional corners, 1-dimensional edges, and 2-dimensional regions. Primitive detection is a crucial step for floorplan construction, where convolutional neural networks (CNNs) have proven effective. Fully convolutional architecture produces a corner confidence image, where non maximum suppression finds corners [4,15]. The same architecture produces an “edge confidence image”, where a pixel along an edge has a high value. Direct detection of edge or region primitives is possible by object detection networks such as faster-RCNN [16]. Standard instance segmentation techniques such as Mask-RCNN [8] and metric learning [6] yield primitive segmentation. However, blueprints contain many extremely thin and elongated regions, where these techniques perform poorly. We divide an input image into smaller patches, perform metric learning locally per patch, and merge results. Floorplan Vectorization: Liu et al. [11] presented the first successful floorplan vectorization approach by combining CNNs for corner detection and Integer Programming for edge/region inference. However, their approach is unsuitable for our problem due to two key differences in the data. First, their data are consumer-grade floorplans with much simpler graph structures in comparison to our blueprints. Second, their input are digitally rasterized images (e.g., a jpeg image), while our input are optically scanned images from printed papers, exhibiting severe noise and distortions. Some sample floorplans from Liu et al. [11] are shown in Fig. 2. Our approach is to 1) only utilize region detection (i.e., no corner nor edge detection, which are less robust) and 2) learn to add or remove tiny architectural components, which are the hardest to detect, by topological reasoning.

Fig. 3. A sample annotated blueprint with close-ups.

Vectorizing Building Blueprints

145

Table 1. Statistics of our dataset, consisting of 200 blueprint images. All averages are rounded to the nearest integer. Image Primitives Height Width Corners Edges Total

N/A

Average 1,279

Regions by type Regions Outer Inner Window Door Portal Room Frame

N/A

123,230 139,808 17,077

1,384 2,111 1019

3,203 172

3,756

5,432

906

616

7

16

19

27

699

85

11

5

1

Structured Reconstruction: Structured reconstruction [12] seeks to turn raw sensor data (e.g., images or point-clouds) into structured geometry representation such as CAD models. The challenge lies in the noisy input sensor data. A general rule of thumb is to utilize higher level primitive detection (i.e., regions) instead of corner/edge detection, which is less robust [5,13]. Our approach also borrows this idea. The state-of-the-art structured reconstruction algorithms handle outdoor architecture [13,19] or commercial floorplans [5,12], which are much simpler than the building blueprints. Our approach learns topological reasoning to handle tiny architectural components as described above.

3

Blueprint Vectorization Dataset

We collected 200 building blueprints from a production pipeline. Following the annotation protocol by an existing floorplan vectorization paper [11], we use the VIA annotator [7] to annotate a planar graph and associate an architectural component type to each region. There are eight architectural component types: background, outer wall, inner wall, windows, doors, open-portals, rooms, and frames. Table 1 shows various statistics of our dataset, and Fig. 3 illustrates a sample blueprint along with its annotation. We do not have access to previous datasets [11] to compare statistics, but ours is many times more complex. To expand upon our semantic types: open-portals denote an opening between rooms without a physical door. Frames are door trims that often surround doors within our dataset. From the top-down view, they appear as small rectangles on the ends of a door (small red rectangles in Fig. 3). We make distinction between outer (exterior) and inner (interior) walls, which can be adjacent to each other for architectural reasons.

4

Blueprint Vectorization Algorithm

A high-level view of our system is shown in Fig. 4. Our blueprint vectorization consists of four steps: 1) instance segmentation and type classification; 2) frame connectivity inference; 3) frame correction; and 4) ad-hoc boundary simplification. We first explain our intermediate data representation, then provide details of each step.

146

W. Song et al.

Fig. 4. System overview. After obtaining our initial segmentation through instance segmentation followed by type classification, we detect and correct missing/extraneous building-frames. Segment boundaries are simplified with heuristics to produce the final result.

4.1

Blueprint Representation

While the goal is to reconstruct a blueprint as a planar graph, our algorithm also uses a pixel-wise semantic segmentation image as an intermediate representation, which is an 8-channel image as there exist 8 component types. Note that a semantic segmentation image is instance-unaware, and cannot represent two instances of the same type sharing a boundary. However, such cases are rare and we merge such instances into a single segment for simplicity. A planar graph is converted to a semantic segmentation image by simple rasterization. A semantic segmentation image is, in turn, converted to a graph representation by tracing the segmentation boundaries while assuming a Manhattan world (i.e., edges are either horizontal or vertical). For example, a single-pixel segmentation is converted to a square with four corners in the vector representation. 4.2

Instance Segmentation and Type Classification

Instance Segmentation: Blueprint images need to be in high resolution to retain intricate geometric structures. We found that standard instance segmentation techniques such as Mask-RCNN [8] or metric learning [6] do not work well. Semantic segmentation [10,18] is also an option (given our semantic segmentation representation), but it does not work well due to the data-imbalance (i.e., the open-portal is the rare type as shown in Table 1). We divide a scanned blueprint image into a set of overlapping “crops”, each of which is a 256 × 256 grayscale image and overlaps with the other crops by 32 pixels.1 Since Mark-RCNN is not effective for thin structures, we use a metriclearning based approach [6]. 1

At training, we randomly pick a pixel and extract a patch around it.

Vectorizing Building Blueprints

147

First, a standard CNN converts a crop into an embedding of size 256×256×8 (i.e., 8 dimensional embedding per pixel) with a discriminative loss function [6] (implementation borrowed from [1]), which optimizes pixel embedding from the same instance to be similar and vice-versa. We refer to the supplementary for the architectural specification. Second, a mean-shift clustering algorithm extracts instances from the embedding (i.e., a python module MeanShift from the scikit-learn library with binseeding and the bandwidth parameter set to 2.0). We also classify an architectural component type for each instance (see the paragraph below), and merge results by simple heuristics: A pixel in an overlapping region has multiple segmentation results from multiple crops. We take the most frequent component type within a 3 × 3 window centered at the pixel in all the crops. Type Classification: We train a CNN-based classifier to assign component type to each instance in each crop. For each instance in a crop, we pass the blueprint image of the crop and the binary segmentation mask of the instance as a 4 channel image to a standard CNN-based encoder with a softmax crossentropy loss. Again, see the supplementary for the full architectural specification. To handle a large instance, we uniformly sample 20 pixel locations from the binary instance mask and use them as centers to extract crops. At test time, we take the average probability distribution from all the crops and pick the type with the highest probability.

Fig. 5. Figure illustrating the heuristic used to determine the four sides of a segment. Dashed lines indicate the bounding box of a segment. Four disks indicate the vertices closest to the bounding box corners. We then trace the perimeter between the vertices to obtain the four sides.

4.3

Frame Detection Module

Small architectural components tend to be missed by the instance segmentation technique in the first step. We train a classifier that infers the correct “connectivity” of frames. Precisely, we define the connectivity of a segment to be a set of neighboring segments.

148

W. Song et al.

To encode the frame connectivity, we first apply a heuristic to determine the segment’s four sides. Then, assuming a maximum of two frames on each side2 , we use an 8-bit binary vector to denote the presence of frames on all four sides. Figure 5 illustrates the heuristic. The top-left, top-right, bottom-left, and bottom-right corners of the instance are first determined, which are defined as the four vertices closest to the instance bounding box. The four sides are then obtained as boundary edges in between two corners, and each frame is attached to one side only. With our representation, we train a classifier to predict a segment’s groundtruth frame connectivity. The classifier takes as input three masks: a one-hot encoded semantic segmentation mask around the segment, the corresponding gray-scale blueprint crop, and a binary mask highlighting the segment. The network directly outputs the 8-bit vector, and is optimized with the standard binary cross-entropy loss. We use a mix of ground-truth and predicted data to generate training input-label pairs, where each predicted instance is assigned the frame vector of the corresponding ground-truth instance. See supplementary material for more details on training data generation. 4.4

Frame Correction Module

Once we detect a discrepancy in the current and predicted frame arrangement, we employ a generative adversarial network (GAN) to locally correct the segmentation. The main input to the generator is a graph, where each node contains a noise vector, an instance mask for the target segment, the blueprint crop, a one-hot type encoding, and a binary indicator denoting the absence of a frame. The instance mask is trinary with values [−1, 0, 1]; −1 and 1 pixels are the input constraints, and 0-pixels indicate areas of uncertainty. The generator learns to expand input constraints to fill in gaps between instances, and generate missing frames in nodes when indicated. Before inference, for an extra frame, we simply remove the corresponding node from the graph. For a missing frame, we add a new node, compute its approximate centroid location, and paint a square zeromask of 18 × 18 pixels around the centroid in the instance mask. For all other segments, we randomly (50% chance) hide their boundaries by dilating and eroding the instance mask, and marking the difference as 0-pixels. See Fig. 6 for an example input of door with missing frame, and please refer to the supplementary material for heuristics on approximate centroid computation. Our generator and discriminator designs are inspired by House-GAN++ [14]. They are both convolutional message passing networks (ConvMPN) [19], and considers an input blueprint crop as a graph of instances. For simplicity, we consider the graph to be fully-connected. The generator first encodes input masks down to a lower spatial resolution, then performs a series of four convolutional message passing and up-sampling layers back to the input resolution. The discriminator performs a series of three convolutional message passing and down2

One window in our dataset does not follow this rule, which we ignore for simplicity.

Vectorizing Building Blueprints

149

Fig. 6. Visualization of generator preprocessing routine (left) and example valid/invalid edge shift for heuristic simplification (right).

sampling layers, followed by a special pooling and FC module to output a single value for the input graph. During training, we do self-supervised learning on the ground-truth data by randomly hiding frame instances. During testing, following House-GAN++ we do iterative refinement by running the generator a max of 50 steps or until convergence. Please see the supplementary for details. 4.5

Heuristic Simplification

Given our corrected segmentation, we use heuristics to simplify segment boundaries. Concretely, we take an edge shorter than 2 pixels (i.e., a reference edge) and consider neighboring edges that share one of its end-points. We find the neighboring edge that is perpendicular and closest to the reference. We shift the reference perpendicularly until the closest neighboring edge gets contracted. We perform this operation for each edge shorter than 2 pixels, unless one of the following three conditions occur (see Fig. 6). Change in Topology: If an edge shift changes the segment connectivity (e.g., a door being disconnected from a wall). Shift Out-of-Bounds: If an edge shift expands the segment too much. Concretely, we dilate the original segment with a 7 × 7 kernel and check if the new segment is fully contained inside the dilated segment. Instance Added/Removed: If an edge shift removes a segment or splits a segment into two instances or more.

150

W. Song et al.

Table 2. Implementation details for learned stages of our pipeline. We use a U-Net implementation from [2], and use the Resnet50 architecture found in the torchvision library. Training time is for the full training set, while testing time is for each floorplan. “Val.” stands for validation, where we reserve 20 floorplans from training set and use them to choose best checkpoint. Architecture # iterations Batch size Training time Testing time Learning rate lr. schedule lr. decay Val Instance segmentation U-Net

270K

16

2 days

∼1 h

0.0003

6750

0.96

No

Type classification

Resnet50

∼98K

64

5h

β w, ¯ the corresponding proposal can be regarded as the instance. The hyperparameter β is empirically set to 0.3. Finally, the remaining proposals are aggregated to the closest instance with the same semantic label. The volume-aware instance clustering algorithm is shown in Algorithm 1. 3.4

Network Training

Our method is a two-stage framework. As shown in Fig. 1, the first stage learns the inter-superpoint affinity to propagate labels via random walk, while the second stage leverage the object volume information to refine the instance. First Stage. As shown in Fig. 1, the first stage is supervised by the semantic loss Lsem , offset loss Loffset , and affinity loss Laff . The semantic loss Lsem is defined as the cross-entropy loss: 1 Lsem = |V |

i=1 I(vi )

|V | i=1

CE(si , s∗i ) · I(vi )

(6)

where si is the predicted label and s∗i is the ground truth label. Note that the original annotated labels and generated pseudo labels are all regarded as the ground truth labels. If the superpoint vi has the label or is assigned with the pseudo label, the indicator function I(vi ) is equal to 1, otherwise 0. In addition, we use MLP to predict the offset vector oi ∈ R3 . the offset loss Loffset is used to minimize the predicted offset of the superpoint to its instance center. Loffset is defined as: |V | 1 Loffset = |V | oi − o∗i 1 · I(vi ) (7) i=1 I(v ) i i=1 where the oi is the predicted superpoint offset and the o∗i is the ground truth offset. Note the o∗i is computed by coarse pseudo instance labels. Following [7], the affinity loss Laff (refer to Sect. 3.2) is formulated as:

Weakly Supervised 3D Instance Segmentation

Lvar =

I 2 |V |  1 1  µ − X  − δ · I(vj , i) i j 2 v  j=1 I i=1 |V | I(vj , i) + j=1 I I 1 2 [2δd − µiA − µiB 2 ]+ Ldist = i iB =1 =1 I(I − 1) A

185

(8)

iA =iB

1 I µi 2 i=1 I where I is the number of instances (equal to the number of the annotated points, i.e, one point per instance). µi is the mean embedding of the i-th instance and j is the embedding of the j-th superpoint in Eq. (2). According to [7], the X margins δv and δd are set to 0.1 and 1.5, respectively. The parameter α is set to 0.001, and [x]+ = max(0, x) denotes the hinge. I(vj , i) is the indicator function, and I(vj , i) equals to 1 if superpoint vj is labeled as the i-th instance. Note that we only perform Laff on the superpoints with annotated labels or pseudo labels. The final loss function in the first stage is defined as: Laff = Lvar + Ldist + α · Lreg , where Lreg =

Lstage1 = Lsem + Loffset + Laff

(9)

Second Stage. As shown in Fig. 1, the second stage is supervised by the semantic loss Lsem , offset loss Loffset , and volume loss Lvolume . As the affinity loss is used for label propagation, we remove it in the second stage. For the volume loss Lvolume , it uses the predicted object volume information as the ground truth to train the network (refer to Sect. 3.3). The Lvolume is formulated as: Lvolume =

1 K I (ui − u ˆj 1 + ri − rˆj 1 ) · I(i, j) i=1 j=1 K

(10)

where K is the number of labeled superpoints, including the original annotated labels and the generated pseudo labels. If the i-th superpoint belongs to the j-th instance, the indicator function I(i, j) is equal to 1, otherwise 0. u ˆj and rˆj indicate the ground truth voxel numbers and radius counted from the pseudo instances, respectively. The generation of the pseudo instances refers to Sect. 3.3. The final loss function in the second stage is defined as: Lstage2 = Lsem + Loffset + Lvolume

4 4.1

(11)

Experiments Experimental Settings

Datasets. ScanNet-v2 [6] and S3DIS [1] are used in our experiments to conduct 3D instance segmentation. ScanNet-v2 contains 1,613 indoor RGB-D scans with dense semantic and instance annotations. The dataset is split into 1,201 training scenes, 312 validation scenes, and 100 hidden test scenes. The instance segmentation is evaluated on 18 object categories. S3DIS contains 6 large-scale indoor

186

L. Tang et al.

Table 1. 3D instance segmentation results on the ScanNet-v2 validation set and online test set. “Baseline” means the model trained with the initial annotated labels only.

Method Fully Sup.

Annotation AP AP50 AP25 AP AP50 AP25 Validation set Online test set SGPN [28] 3D-SIS [9] MTML [13] PointGroup [12] HAIS [2] SSTNet [17] SoftGroup [27]

Weakly Sup. SPIB [18] SegGroup [24] Baseline 3D-WSIS (ours)

Input

Ours

100% 100% 100% 100% 100% 100% 100%

– – 20.3 34.8 43.5 49.4 46.0

11.3 18.7 40.2 56.9 64.1 64.3 67.6

22.2 35.7 55.4 71.3 75.6 74.0 78.9

4.9 16.1 28.2 40.7 45.7 50.6 50.4

14.3 38.2 54.9 63.6 69.9 69.8 76.1

39.0 55.8 73.1 77.8 80.3 78.9 86.5

0.16% 0.02% 0.02% 0.02%

– 23.4 21.2 28.1

38.6 43.4 39.0 47.2

61.4 62.9 61.3 67.5

– 24.6 – 25.1

– 44.5 – 47.0

63.4 63.7 – 67.8

Ground Truth

Input

Ours

Ground Truth

Fig. 3. Visualization of the 3D instance segmentation results on the validation of ScanNet-v2 (left) and S3DIS (right). We randomly select colors for different instances.

areas, which has 272 rooms and 13 categories. For the ScanNet-v2 dataset, we report both validation and online test results. For the S3DIS dataset, we report both Area 5 and the 6-fold cross validation results. Evaluation Metrics. For the ScanNet-v2 dataset, the mean average precision at the overlap 0.25 (AP25 ), 0.5 (AP50 ) and overlaps from 0.5 to 0.95 (AP) are reported. For the S3DIS dataset, we additionally use mean coverage (mCov), mean weighted coverage (mWCov), mean precision (mPrec), and mean recall (mRec) with the IoU threshold of 0.5 as evaluation metrics. Annotation of Weak Labels. To generate weak labels of point clouds, we randomly click one point of each instance as the ground truth label. Note that our

Weakly Supervised 3D Instance Segmentation

187

Table 2. 3D instance segmentation results on the S3DIS dataset. “Baseline” means the model trained with the initial annotated labels only. Method

Annotation AP

AP50 AP25 mCov mWCov mPrec mRec

100% 100% 100% 100% 100% 100%

– – 64.0 – 67.8 68.9

6-fold cross validation Fully sup.

SGPN [28] ASIS [29] PointGroup [12] HAIS [2] SSTNet [17] SoftGroup [27]

Weakly Sup. Baseline 0.02% SegGroup 0.02% 3D-WSIS (ours) 0.02%

– – – – 54.1 54.4

– – – – – –

37.9 51.2 – 67.0 – 69.3

40.8 55.1 – 70.4 – 71.7

38.2 63.6 69.6 73.2 73.5 75.3

31.2 47.5 69.2 69.4 73.4 69.8

19.5 30.5 42.0 41.1 23.1 37.6 48.5 45.5 26.7 40.4 52.6 48.0

42.3 47.6 50.5

13.3 56.7 59.3

37.1 43.3 46.7

– – – – 42.7 51.6

32.7 44.6 – 64.3 – 66.1

35.5 47.8 – 66.0 – 68.0

36.0 55.3 61.9 71.1 65.5 73.6

28.7 42.4 62.1 65.0 64.2 66.6

18.9 26.8 37.7 36.3 21.0 29.8 41.9 39.1 23.3 33.0 48.3 42.2

37.1 40.8 44.2

11.1 47.2 50.8

28.5 34.9 38.9

Area 5 Fully sup.

SGPN [28] ASIS [29] PointGroup [12] HAIS [2] SSTNet [17] SoftGroup [27]

100% 100% 100% 100% 100% 100%

Weakly Sup. Baseline 0.02% SegGroup 0.02% 3D-WSIS (ours) 0.02%

– – 57.8 – 59.3 66.1

– – – – – –

annotation strategy is the same as SegGroup [24]. Unlike our method and SegGroup, SPIB [18] adopts 3D box-level annotation, which annotates each instance with bounding box. Compared with time-consuming box-level annotation, clicking one point per instance is faster and more convenient. 4.2

Results

ScanNet-v2. Table 1 reports the quantitative results on the ScanNet-v2 validation set and hidden testing set. Compared with the existing semi-/weakly supervised point cloud instance segmentation methods, our approach achieves state-of-the-art performance and improves the AP25 from 62.9% to 67.5% with a gain of about 5% on the ScanNet-v2 validation set. Note that SPIB [18] uses the bounding box of each instance as weak labels, which is different from ours. Although we have the same number of annotated instances, clicking one point per instance provides less information than eight corners of the box. Nonetheless, our method can still achieve higher performance than SPIB. S3DIS. Table 2 reports the results on Area 5 and 6-fold cross validation of the S3DIS dataset. Compared with the fully supervised 3D instance segmentation methods, our model can still achieve good results, even outperforming the fully

188

L. Tang et al.

Table 3. The ablation study of different components on the ScanNet-v2 validation set and S3DIS Area 5. “Baseline” means the model trained the initial annotated labels only. Note that “Stage 2” is performed based on “Stage 1”. ScanNet-v2 Val

S3DIS Area 5

Settings

AP

Baseline

21.2 39.0

61.3

18.9 26.8

37.7

36.3

37.1

11.1

28.5

Stage 1 Iter. 1 23.4 42.2 Iter. 2 24.5 43.6 Iter. 3 25.4 45.3

62.8 64.4 65.8

19.9 27.7 20.1 28.0 20.9 28.3

40.2 40.7 41.9

38.5 39.7 40.0

39.4 40.4 40.8

21.7 22.2 23.1

33.1 33.8 34.1

28.1 47.2 67.5 23.3 33.0 48.3 42.2

44.2

50.8

38.9

Stage 2

AP50 AP25 AP

AP50 AP25 mCov mWCov mPrec mRec

Table 4. The ablation study results (proportion/accuracy) of pseudo labels at different iterations on the ScanNet-v2 training set. The proportion and accuracy of pseudo labels are computed at the point level. Stage 1 Only Random Random+Affinity Random+Affinity+Semantic Iter. 1

33.7/39.9

29.1/52.7

18.2/81.9

Iter. 2

47.7/38.4

35.1/71.6

30.9/82.1

Iter. 3

48.2/38.3

35.2/73.1

31.4/82.5

supervised methods such as SGPN [28]. The quantitative results demonstrate the effectiveness of our method on weakly supervised 3D instance segmentation. Visualization Results. Figure 3 shows the visualization results of instance segmentation on the ScanNet-v2 validation set and S3DIS. It can be found that although some objects of the same class are close to each other, like chairs, the instances are still segmented properly. Since we additionally predict the size of the objects in the network, our method can effectively use the size information of the objects to guide the clustering, thereby segmenting different instances that are close to each other. 4.3

Ablation Study

Effect of Pseudo Labels. We conduct experiments on the ScanNet-v2 validation set to verify the effectiveness of our label propagation. The quantitative results are reported in Table 3. “Baseline” indicates that our method trained with initial annotated labels, without pseudo labels. In the first stage, we report the mean average precision at different iterations, respectively. It can be observed that the performance gradually increases as the number of iterations increases. Furthermore, based on the stage one (dubbed “Stage 1”), it can be found that the performance is greatly improved after training in the second stage (dubbed “Stage 2”). Since the number of generated pseudo labels in the first stage is still less than that of fully annotated labels, it is difficult to effectively segment

189

Superpoint graph

Point cloud

Weakly Supervised 3D Instance Segmentation

Weak labels

Iteration 1

Iteration 2

Iteration 3

Pseudo instances

Fig. 4. Visualization results of pseudo label generation. Note that we remove the superpoints on the walls and floor for a better view.

instances. In the second stage, we use the model trained in the first stage to cluster pseudo instances, so we can regard the obtained object volume as the additional supervision to train the network. The quantitative results in the second stage further demonstrate that using the predicted object volume can indeed improve the performance of weakly supervised 3D instance segmentation. Quality of Pseudo Labels. The pseudo labels are generated on the training set to increase supervision during training, so their quality affects network training. To further study the quality of the generated pseudo labels, we count the proportion and accuracy of pseudo labels on the ScanNet-v2 training set during training. The results are listed in Table 4. When only using random walk (dubbed “Only Random”), the proportion of pseudo labels is high, but the accuracy is low (39.9% at “Iter. 1”). The low-accuracy pseudo labels will affect the training of the network. If we add the extra affinity constraint (dubbed “Random+Affinity”), we can observe that the proportion of pseudo labels is lower, but the accuracy is greatly improved (52.7% at “Iter. 1”). Due to the affinity constraint, the proportion of wrong label propagation is reduced. Therefore, the quality of pseudo labels is improved and high-quality supervision is provided for network training. Furthermore, when we add the semantic constraints (dubbed “Random+Affinity+Semantic”), the accuracy of pseudo labels improves from 52.7% (“Random+Affinity”) to 81.9% (“Random+Affinity+Semantic”), which shows that the semantic constraint is useful for the weakly supervised 3D instance segmentation task. As constraints are added gradually, the proportion of the generated pseudo labels decreases, while the accuracy increases. Label Propagation Times. Different label propagation times influence the quality of pseudo labels. As shown in Table 4, the proportion and accuracy of pseudo labels at three iterations (dubbed “Iter. 3”) is comparable to two iterations (dubbed “Iter. 2”), and performing more iterations consumes more resources, thus we choose three iterations for label propagation. Visualization of Pseudo Labels. Figure 4 shows the pseudo labels at different iterations in the first stage on the superpoint graph. We can observe that the

190

L. Tang et al.

initial annotated labels are extremely sparse, and only a few superpoints are annotated in the superpoint graph. In the first stage, as the iteration number of label propagation increases, the labels spread to the surrounding superpoints in the graph gradually. With the constraints of the predicted affinity and semantic, the propagation of labels is restricted to the same object. In the second stage, we use the model trained in the first stage to predict pseudo instances by performing clustering on the superpoint graph. The last column shows the predicted pseudo instances. It can be observed that different instances can be effectively separated.

5

Conclusion

In this paper, we proposed a simple yet effective method for weakly supervised 3D instance segmentation with extremely few labels. To exploit few point-level annotations, we used an unsupervised point cloud oversegmentation method on the point cloud to generate superpoints and construct the superpoint graph. Based on the constructed superpoint graph, we developed an inter-superpoint affinity mining module to adaptively learn inter-superpoint affinity for label propagation via random walk. We further developed a volume-aware instance refinement module to guide the superpoint clustering on the superpoint graph by learning the object volume information. Experiments on the ScanNet-v2 and S3DIS datasets demonstrate that our method achieves state-of-the-art performance on weakly supervised 3D instance segmentation. Acknowledgments. The authors would like to thank reviewers for their detailed comments and instructive suggestions. This work was supported by the National Science Fund of China (Grant No. 61876084).

References 1. Armeni, I., et al.: 3D semantic parsing of large-scale indoor spaces. In: CVPR (2016) 2. Chen, S., Fang, J., Zhang, Q., Liu, W., Wang, X.: Hierarchical aggregation for 3D instance segmentation. In: ICCV (2021) 3. Cheng, M., Hui, L., Xie, J., Yang, J.: SSPC-Net: semi-supervised semantic 3D point cloud segmentation network. In: AAAI (2021) 4. Cheng, M., Hui, L., Xie, J., Yang, J., Kong, H.: Cascaded non-local neural network for point cloud semantic segmentation. In: IROS (2020) 5. Choy, C., Gwak, J., Savarese, S.: 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In: CVPR (2019) 6. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: CVPR (2017) 7. De Brabandere, B., Neven, D., Van Gool, L.: Semantic instance segmentation with a discriminative loss function. arXiv (2017) 8. Graham, B., Engelcke, M., Van Der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: CVPR (2018) 9. Hou, J., Dai, A., Nießner, M.: 3D-SIS: 3D semantic instance segmentation of RGBD scans. In: CVPR (2019)

Weakly Supervised 3D Instance Segmentation

191

10. Hui, L., Yuan, J., Cheng, M., Xie, J., Zhang, X., Yang, J.: Superpoint network for point cloud oversegmentation. In: ICCV (2021) 11. Jaritz, M., Gu, J., Su, H.: Multi-view PointNet for 3D scene understanding. In: ICCVW (2019) 12. Jiang, L., Zhao, H., Shi, S., Liu, S., Fu, C.W., Jia, J.: PointGroup: dual-set point grouping for 3D instance segmentation. In: CVPR (2020) 13. Lahoud, J., Ghanem, B., Pollefeys, M., Oswald, M.R.: 3D instance segmentation via multi-task metric learning. In: ICCV (2019) 14. Landrieu, L., Simonovsky, M.: Large-scale point cloud semantic segmentation with superpoint graphs. In: CVPR (2018) 15. Lawin, F.J., Danelljan, M., Tosteberg, P., Bhat, G., Khan, F.S., Felsberg, M.: Deep projective 3d semantic segmentation. In: Felsberg, M., Heyden, A., Kr¨ uger, N. (eds.) CAIP 2017. LNCS, vol. 10424, pp. 95–107. Springer, Cham (2017). https:// doi.org/10.1007/978-3-319-64689-3 8 16. Li, M., et al.: HybridCR: weakly-supervised 3D point cloud semantic segmentation via hybrid contrastive regularization. In: CVPR (2022) 17. Liang, Z., Li, Z., Xu, S., Tan, M., Jia, K.: Instance segmentation in 3D scenes using semantic superpoint tree networks. In: ICCV (2021) 18. Liao, Y., Zhu, H., Zhang, Y., Ye, C., Chen, T., Fan, J.: Point cloud instance segmentation with semi-supervised bounding-box mining. TPAMI 44, 10159–10170 (2021) 19. Liu, Z., Qi, X., Fu, C.W.: One thing one click: a self-training approach for weakly supervised 3D semantic segmentation. In: CVPR (2021) 20. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017) 21. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space (2017) 22. Shi, H., Wei, J., Li, R., Liu, F., Lin, G.: Weakly supervised segmentation on outdoor 4D point clouds with temporal matching and spatial graph propagation. In: ICCV (2022) 23. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: CVPR (2017) 24. Tao, A., Duan, Y., Wei, Y., Lu, J., Zhou, J.: SegGroup: seg-level supervision for 3D instance and semantic segmentation. TIP 31, 4952–4965 (2022) 25. Tatarchenko, M., Park, J., Koltun, V., Zhou, Q.Y.: Tangent convolutions for dense prediction in 3D. In: CVPR (2018) 26. Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: flexible and deformable convolution for point clouds. In: ICCV (2019) 27. Vu, T., Kim, K., Luu, T.M., Nguyen, X.T., Yoo, C.D.: SoftGroup for 3D instance segmentation on 3D point clouds. In: CVPR (2022) 28. Wang, W., Yu, R., Huang, Q., Neumann, U.: SGPN: similarity group proposal network for 3D point cloud instance segmentation. In: CVPR (2018) 29. Wang, X., Liu, S., Shen, X., Shen, C., Jia, J.: Associatively segmenting instances and semantics in point clouds. In: CVPR (2019) 30. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. TOG (2019) 31. Wei, J., Lin, G., Yap, K.H., Hung, T.Y., Xie, L.: Multi-path region mining for weakly supervised 3D semantic segmentation on point clouds. In: CVPR (2020) 32. Wu, W., Qi, Z., Fuxin, L.: PointConv: deep convolutional networks on 3D point clouds. In: CVPR (2019)

192

L. Tang et al.

33. Xu, X., Lee, G.H.: Weakly supervised semantic point cloud segmentation: towards 10x fewer labels. In: CVPR (2020) 34. Yang, B., et al.: Learning object bounding boxes for 3D instance segmentation on point clouds. In: NeurIPS (2019) 35. Yi, L., Zhao, W., Wang, H., Sung, M., Guibas, L.J.: GSPN: generative shape proposal network for 3D instance segmentation in point cloud. In: CVPR (2019) 36. Zhang, Y., Qu, Y., Xie, Y., Li, Z., Zheng, S., Li, C.: Perturbed self-distillation: weakly supervised large-scale point cloud semantic segmentation. In: ICCV (2021)

Cross-View Self-fusion for Self-supervised 3D Human Pose Estimation in the Wild Hyun-Woo Kim1 , Gun-Hee Lee2 , Myeong-Seok Oh2 , and Seong-Whan Lee1,2(B) 1

2

Department of Artificial Intelligence, Korea University, Seoul, Korea {kim_hyun_woo,sw.lee}@korea.ac.kr Department of Computer Science and Engineering, Korea University, Seoul, Korea {gunhlee,ms_oh}@korea.ac.kr

Abstract. Human pose estimation methods have recently shown remarkable results with supervised learning that requires large amounts of labeled training data. However, such training data for various human activities does not exist since 3D annotations are acquired with traditional motion capture systems that usually require a controlled indoor environment. To address this issue, we propose a self-supervised approach that learns a monocular 3D human pose estimator from unlabeled multi-view images by using multi-view consistency constraints. Furthermore, we refine inaccurate 2D poses, which adversely affect 3D pose predictions, using the property of canonical space without relying on camera calibration. Since we do not require camera calibrations to leverage the multi-view information, we can train a network from in-the-wild environments. The key idea is to fuse the 2D observations across views and combine predictions from the observations to satisfy the multi-view consistency during training. We outperform state-of-the-art methods in self-supervised learning on the two benchmark datasets Human3.6M and MPI-INF-3DHP as well as on the in-thewild dataset SkiPose. Code and models are available at https://github. com/anonyAcc/CVSF_for_3DHPE.

1

Introduction

Human Pose Estimation (HPE) is widely used in various AI applications such as video analysis, AR/VR, human action recognition, and 3D human reconstruction [1–8]. Owing to the variety of applicability, HPE has received considerable attention in computer vision. Recent methods for 3D HPE have achieved remarkable results in a supervised setting, but they require large amounts of labeled training data. Collecting such datasets is expensive, time-consuming, and mostly limited to fully controlled indoor settings that require a multi-camera motion capture system. Therefore, self-supervised 3D HPE, which does not require 3D annotation, has become an emerging trend in this field. In this study, we propose a novel self-supervised training procedure that does not require camera calibrations (including camera intrinsic and extrinsic parameters) and any annotations in the multi-view training dataset. Specifically, our model requires at least two temporally synchronized cameras to observe a c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023  L. Wang et al. (Eds.): ACCV 2022, LNCS 13841, pp. 193–210, 2023. https://doi.org/10.1007/978-3-031-26319-4_12

194

H.-W. Kim et al. 3D possible skeleton 3D real skeleton 3D lifted skeleton 2D skeleton Camera

2D image plane

2D canonical plane

(b) Visual understanding of canonical space (a) Examples of results lifted from general camera (top) and canonical camera (bottom)

Fig. 1. (a) Examples of results predicted from general and canonical camera. Depthscale ambiguity in general camera results with numerous 3D candidates of varying scales. In canonical camera, 2D pose is lifted to 3D poses of the same scale irrespective of depth. Red and blue skeletons represent estimated 3D skeletons at varying scales. Scale of green skeletons is the same as that of canonical cameras. (b) Visual understanding of a canonical space. In the space, the scales of all observed human poses are equal, and the relationship between cameras is relative. (Color figure online)

person of interest from different orientations, but no further knowledge regarding the scene and camera parameters is required. Note that multi-view images are used to train a network, but only a single image is used at inference time. There are only a few comparable methods [9–12] that apply to our selfsupervised setting. They require additional knowledge about the observed person such as bone length constraints [11] and 3D human structures [10], or traditional computer vision algorithms to obtain a pseudo ground truth pose [9]. On the other hand, CanonPose [12] learns a monocular 3D human pose estimator using the multi-view images without any prior information. However, the research does not address the 2D pose errors caused by a pretrained 2D pose estimator, which remains fixed during the training. The 2D pose errors not only propagate to the 3D prediction, but also may affect the multi-view consistency requirement during training, which can yield an inaccurate camera rotation estimation. To address this issue, we refine the 2D pose errors influencing the 3D prediction and then lift the refined 2D pose to a 3D pose. In principle, we train a neural network by satisfying the multi-view consistency between the 2D poses through refining the incorrect 2D pose, as well as the multi-view consistency between the 3D outputs and 2D inputs. However, it is necessary to know the multi-view relationship to refine an incorrect 2D pose in a multi-view setting. The multi-view relationship can be represented using the parameters of each camera. We assume a training setting in which the camera parameters are not given. Also, estimating the camera parameters of each camera is complex and computationally intensive. Therefore, we deploy a canonical form [13] that fixes one camera and represents the remainder with the relative camera parameters

Cross-View Self-fusion for Self-supervised 3D Human Pose Estimation

195

based on the fixed camera. According to this form, the relationship between the cameras can be represented as a relative rotation and translation. On the other hand, every camera position is different for a particular 3D target, so all the scales of the observed 2D poses are different in each view due to perspective. Therefore, there exists an infinite number of 3D poses with multiple scales corresponding to a given 2D pose due to the depth-scale ambiguity, which is illustrated in Fig. 1(a). To address the ambiguity, we transform the 2D poses into a canonical space by normalizing the position and scale of the 2D poses observed with different scales in all views. The transformation allows the distances between the 3D target and each camera to be the same, so we don’t need to consider the relative translation. In other words, the relationship between the cameras can be represented only by relative rotation, and all lifted 3D poses bear the same scale. Figure 1(b) illustrates the canonical space in which it has the same scale for all transformed 2D poses and a lifted 3D pose satisfies the multi-view consistency. The flow of our approach is as follows: First, we transform the estimated 2D pose in an image plane coordinate system into a canonical plane coordinate system. Second, we input the transformed 2D pose into a lifting network. Then, the network predicts a 3D pose in the canonical coordinate system and a camera rotation to rotate the pose to the canonical camera coordinate system. Third, the proposed cross-view self-fusion module takes the 2D poses along with the camera rotations predicted by the lifting network as input. Subsequently, it refines incorrect 2D poses by fusing all the 2D poses with the predicted rotations. Lastly, the refined 2D poses are lifted to 3D poses by the lifting network. We evaluate our approach on two multi-view 3D human pose estimation datasets, namely Human3.6M [14] and MPI-INF-3DHP [15], and achieve the new state-of-the-art in several metrics for self-supervised manner. Additionally, we present the results for the SkiPose dataset that represents all the challenges arising from outdoor human activities, which can be hard to perform in the limited setting of traditional motion capture systems. The contributions of our research can be summarized as follows: – We propose a Cross-view Self-fusion module that refines an incorrect 2D pose using multi-view data without camera calibration. This can be performed in any in-the-wild setting as it does not require camera calibration. – We improve a self-supervised algorithm to lift a 2D pose to a 3D pose by refining poses across views. Refinement enhances multi-view consistency, and the enhanced consistency enables more accurate refinement. – We achieve state-of-the-art performance on 3D human pose estimation benchmarks in a self-supervised setting.

2

Related Work

Full Supervision. Recent supervised approaches depend primarily on large datasets with 3D annotations. These approaches can be classified into two categories: image-based and lifting-based 3D human pose estimations. The imagebased approaches [16–25] directly estimate the 3D joint locations from images

196

H.-W. Kim et al.

or video frames. Although these approaches generally deliver exceptional performance on similar images, their ability to generalize to other scenes is restricted. In this regard, certain studies [30–33] have attempted to resolve this problem using data augmentation. The lifting-based approaches [34–43] leverage 2D poses from input images or video frames to lift them to the corresponding 3D poses, which is more popular among the state-of-the-art methods in this domain. Martinez et al. [34] showed the prospect of using only 2D joint information for 3D human pose estimation by proposing “a simple and effective baseline for 3D human pose estimation”, which uses only 2D information but achieves highly accurate results. Owing to its simplicity, it serves as a baseline for several future studies. However, the main disadvantage of all full-supervised approaches is that they are not appropriately generalized for the unseen poses. Therefore, their application is substantially limited to new environments or in-the-wild scenes. Self-supervision with Multi-view. Recently, the research interest in selfsupervised 3D pose estimation using unlabeled multi-view images has increased, and our research pertains to this category as well. The self-supervised approaches use 2D poses estimated from unlabeled multi-view images. These approaches usually follow a lifting-based pipeline and therefore, they extract the 2D poses from the images using 2D pose estimators [44–47]. In our case, to get 2D joints from the images, we exploit AlphaPose [46] that is pretrained on a MPII dataset [48]. In constraints to the calibrated multi-view supervised approaches [49–52], the selfsupervised approaches do not require the camera parameters to use multi-view data and thus, do not use traditional computer algorithms such as triangulation to recover the 3D poses. Kocabas et al. [9] leveraged epipolar geometry to acquire a 3D pseudo ground-truth from multi-view 2D predictions and then used them to train the 3D CNN network. Although this effective and intuitive approach shows promising results, the errors caused by incorrectly estimated joints in 2D estimation lead to an incorrect pseudo ground-truth. Iqbal et al. [11] trained a weakly-supervised network that refines the pretrained 2D pose estimator which predict pixel coordinates of joints and their depth in each view during training. Unfavorably, this method is not robust in environments other than the datasets employed for training, which is a limitation of the method of estimating the 3D pose from the image unit. More recently, Wandt et al. [12] reconstructed the 3D poses in a canonical pose space that was consistent across all views. We take advantage of the canonical space to fuse poses between the multiple views without using any camera parameters to refine the 2D pose incorrectly estimated by the 2D pose estimator.

3

Methods

Our goal is to train a neural network to accurately predict a 3D pose from an estimated 2D pose. At training time, we use 2D poses observed in multi-view images to train the network. At inference time, the network estimates a 3D pose from a 2D pose observed in a single image. The overall process of our framework is as follows. For each view, a neural network takes a 2D pose as an input, and

Fig. 2. A framework for learning single-view 3D pose estimation from multi-view self-supervision while refining 2D poses that adversely affect 3D pose estimation. At inference time, only a single view (blue box) is used for estimating a 3D human pose. (Color figure online)

3.1 Lifting Network

Before inputting the 2D pose of each view to a lifting network, we normalize the 2D pose by centering it on the root joint and dividing it by its Euclidean norm. As we do not have any 3D annotations, we train the lifting network by satisfying multi-view consistency. Although the 3D poses lifted in each view must be identical to satisfy multi-view consistency, the scales of the lifted 3D poses differ because the scales of the 2D poses differ in each view. Therefore, through this normalization, we transform the estimated 2D pose of each view into a canonical space in which the distance between the 3D target and each camera is the same. P^{inp} ∈ R^{2×J} is the transformed 2D pose that is input to the lifting network. We concatenate the confidences C ∈ R^{1×J}, provided by the 2D pose estimator [46] for each predicted 2D joint, to the 2D input vector P^{inp} before feeding it to the lifting network. The lifting network predicts a 3D pose X ∈ R^{3×J} and a rotation R ∈ R^{3×3} that rotates the pose to the canonical camera coordinate system. This division of the output into a 3D pose and a camera rotation is what makes cross-view self-fusion and self-supervised learning possible.
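To make the input format concrete, the following PyTorch sketch shows the root-centering/norm normalization and a minimal two-headed lifting network. The hidden sizes, the 6D rotation parameterization, and all helper names are illustrative assumptions on our part, not the authors' architecture.

```python
import torch
import torch.nn as nn

def normalize_2d_pose(p2d, root_idx=0):
    """Center a (2, J) 2D pose on its root joint and scale by its Euclidean norm."""
    centered = p2d - p2d[:, root_idx:root_idx + 1]
    return centered / (centered.norm() + 1e-8)

def rot6d_to_matrix(x):
    """Map a 6D vector to a valid 3x3 rotation via Gram-Schmidt (our assumption)."""
    a1, a2 = x[..., :3], x[..., 3:]
    b1 = nn.functional.normalize(a1, dim=-1)
    b2 = nn.functional.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack([b1, b2, b3], dim=-2)

class LiftingNetwork(nn.Module):
    """Takes a normalized 2D pose plus per-joint confidences, returns (3D pose, rotation)."""
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3 * num_joints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.pose_head = nn.Linear(hidden, 3 * num_joints)   # canonical 3D pose X
        self.rot_head = nn.Linear(hidden, 6)                 # camera rotation R (6D form)

    def forward(self, p2d, conf):
        # p2d: (B, 2, J), conf: (B, 1, J)
        x = torch.cat([p2d, conf], dim=1).flatten(1)
        h = self.backbone(x)
        pose3d = self.pose_head(h).view(-1, 3, p2d.shape[-1])
        rot = rot6d_to_matrix(self.rot_head(h))
        return pose3d, rot
```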

3.2 Reprojection

As the canonical camera neglects perspective in the canonical space, projecting the 3D prediction onto the camera plane is accomplished by discarding the third dimension, which is expressed as:

P^{rep} = I_{[0:1]} \cdot R \cdot X,   (1)

where P^{rep} ∈ R^{2×J} is the reprojected 2D pose and I_{[0:1]} is a truncated identity matrix that projects the 3D pose to 2D. The 3D pose X in the canonical space is rotated by the predicted rotation R to a canonical camera coordinate system. We rotate the m canonical 3D poses into the camera coordinate system of each camera through the m predicted rotations, which provides m² combinations; for instance, there are four possible combinations of rotations and poses for two cameras. During training, all possible combinations are reprojected onto the respective cameras. For example, P^{rep}_{(2,1)} can be obtained by reprojecting the 3D pose X_1 predicted in view-1 to view-2 using the camera rotation R_2 predicted in view-2, as visualized in Fig. 2.
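A small sketch of Eq. (1) and of enumerating the m² rotation/pose combinations; the batching conventions and function names are our own.

```python
import torch

def reproject(pose3d, rot):
    """Eq. (1): P_rep = I_[0:1] . R . X.  pose3d: (B, 3, J), rot: (B, 3, 3)."""
    rotated = torch.bmm(rot, pose3d)      # rotate canonical pose into the camera frame
    return rotated[:, :2, :]              # orthographic projection: drop the third row

def all_view_reprojections(poses, rots):
    """Reproject every view's 3D pose with every view's rotation (m^2 combinations).
    poses, rots: lists of (3, J) and (3, 3) tensors, one per view."""
    reps = {}
    for m, rot in enumerate(rots):            # target camera m
        for n, pose in enumerate(poses):      # source 3D pose from view n
            reps[(m, n)] = reproject(pose.unsqueeze(0), rot.unsqueeze(0))[0]
    return reps
```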

3.3 Cross-View Self-fusion

An inaccurate 2D pose results in an incorrect 3D pose estimate. Typically, existing cross-view fusion methods utilize camera parameters to fuse cross-view data for accurate 2D poses. In the canonical space, the relationship between multiple views can be represented only by a relative rotation. Therefore, we propose a Cross-view Self-fusion Module (CSM) that refines an incorrect 2D pose using the predicted rotations and the other input 2D poses. Figure 3 illustrates the proposed module and the refinement process for a wrist joint that is incorrectly estimated because of occlusion. The module takes a set of 2D poses, the corresponding confidences, and the predicted rotations of all views as inputs, and outputs a set of refined 2D poses: P^{ref} = CSM(P^{inp}, C, R). Formally, the proposed module is defined as:

\mathrm{CSM}(P^{inp}, C, R) = \left\{ \arg\max\Big[ H(p_m^i, c_m^i) + \sum_{n \in V,\, n \neq m} H\big(E(p_n^i, R_{(n,m)}), c_n^i\big) \Big] \;\Big|\; \forall i \in J,\; m \in V \right\},   (2)

Fig. 3. Cross-view Self-fusion module. (a) shows our cross-view self-fusion module, and (b) shows the process of fusing multi-view information about the wrist joint to refine a joint incorrectly estimated because of occlusion.

where V = {1, 2, ..., v} is the set of views and J = {1, 2, ..., j} is the set of joints. p_m^i is the i-th joint of the 2D pose P_m^{inp} from view-m; it has canonical-plane coordinates (x_m^i, y_m^i). The set of confidences corresponding to the 2D pose is C_m = {c_m^1, ..., c_m^j}. R_{(n,m)} is the relative rotation between view-n and view-m, obtained from the rotation matrices R_n and R_m as R_{(n,m)} = R_n R_m^T.

First, a heatmap generator H(·) takes a joint p_m^i of the 2D pose P_m^{inp} from view-m and the confidence c_m^i of that joint as inputs, and generates a Gaussian heatmap for the joint whose maximum value equals its confidence. Second, an epipolar line generator E(·) takes a joint p_n^i of the 2D pose P_n^{inp} from view-n and the relative rotation R_{(n,m)} as inputs, and outputs an epipolar line for the joint on view-m, as illustrated in Fig. 3(b). The input joint is lifted to a 3D ray by simply appending a third dimension, since the input lies in the canonical space where perspective is neglected. The ray is rotated to view-m by the relative rotation R_{(n,m)} and projected onto view-m as in Sect. 3.2; the rotated and projected 3D ray appears as a line on the view, i.e., an epipolar line. Next, an epipolar-line heatmap is generated whose maximum value is the confidence of the joint. Finally, the position of the maximum value of the heatmap obtained by fusing the joint heatmap from view-m with the epipolar-line heatmaps from the other views becomes the coordinate of the newly refined 2D joint. This is repeated for all views.
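The following sketch mirrors the fusion steps above (Gaussian joint heatmaps, epipolar-line heatmaps from rotated 3D rays, and a spatial argmax of the fused heatmap). The heatmap resolution, Gaussian width, depth range, and line rasterization are simplifications on our part, not the authors' implementation.

```python
import torch

def joint_heatmap(joint, conf, size=64, sigma=2.0):
    """Gaussian heatmap for one 2D joint in [-1, 1]^2 whose peak equals its confidence."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                            torch.linspace(-1, 1, size), indexing="ij")
    d2 = (xs - joint[0]) ** 2 + (ys - joint[1]) ** 2
    return conf * torch.exp(-d2 / (2 * (sigma / size) ** 2))

def epipolar_heatmap(joint_n, conf_n, rel_rot, size=64, sigma=2.0):
    """Heatmap of the epipolar line of a joint from view-n on view-m.
    The joint is lifted to a 3D ray (depth appended), rotated by R_(n,m), projected,
    and rasterized as a line of Gaussians (an illustrative choice)."""
    depths = torch.linspace(-3.0, 3.0, 128)
    ray = torch.stack([joint_n[0].repeat(128), joint_n[1].repeat(128), depths])  # (3, 128)
    rotated = rel_rot @ ray                     # rotate the 3D ray into view-m
    line2d = rotated[:2]                        # orthographic projection
    hm = torch.zeros(size, size)
    for p in line2d.t():                        # accumulate Gaussians along the line
        hm = torch.maximum(hm, joint_heatmap(p, conf_n, size, sigma))
    return hm

def refine_joint(joint_m, conf_m, others, size=64):
    """Fuse the view-m heatmap with epipolar heatmaps from the other views and
    return the refined joint at the argmax location, in [-1, 1] coordinates.
    `others` is a list of (joint_n, conf_n, R_(n,m)) triples."""
    fused = joint_heatmap(joint_m, conf_m, size)
    for joint_n, conf_n, rel_rot in others:
        fused = fused + epipolar_heatmap(joint_n, conf_n, rel_rot, size)
    idx = torch.argmax(fused)
    y, x = idx // size, idx % size
    grid = torch.linspace(-1, 1, size)
    return torch.stack([grid[x], grid[y]])
```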

3.4 Self-supervised Training

Due to the absence of supervision, we train the lifting network using the observed 2D information together with multi-view consistency constraints. Our training procedure is divided into two stages. In the first stage, we train the lifting network so that its predictions satisfy multi-view consistency for the input 2D poses. We proceed to the second stage when the total loss converges in the first stage, at which point we consider that the lifting network sufficiently satisfies consistency for the input 2D poses. In the second stage, we train the lifting network so that its predictions satisfy multi-view consistency for the refined 2D poses, until the total loss converges a second time. This enables more accurate camera rotation estimation and enhances multi-view consistency: more accurate camera rotations yield more precise refined 2D poses, which in turn lead to more accurate camera rotation predictions. The key idea is to enhance multi-view consistency through various losses defined by combining the rotations and 3D poses predicted from different views, and by fusing the 2D poses observed from different views with the rotations.

Reprojection Loss. Comparing the input 2D poses and the 2D reprojections of the combined 3D poses, the loss is defined as:

L_{rep} = \left\| \left( P_m^{inp} - \frac{P_n^{rep}}{\| P_n^{rep} \|_E} \right) \odot C \right\|_1,   (3)

where \|\cdot\|_1 denotes the L1 norm and \odot indicates the Hadamard product. In particular, each deviation between the input and reprojected 2D pose is linearly weighted by its confidence, giving a strong weight to joints predicted with high certainty and a weaker weight to joints predicted with high uncertainty. Since the global scale of the 3D human pose is not given, the reprojection P^{rep} is scaled by its Euclidean norm. m and n indicate the camera indices.

Refinement Loss. Comparing the refined 2D poses and the input 2D poses, the loss is defined as:

L_{ref} = \left\| \left( P_m^{inp} - P_n^{ref} \right) \odot C \right\|_1.   (4)

According to multi-view geometric consistency, if the 2D joints of all views are accurate and the camera relationship is known correctly, the intersection of the epipolar lines projected onto one view from the other views should lie on the joint observed in that view. Early in training, the predicted rotation is not accurate, so the 2D pose would be refined to a wrong position. Therefore, we encourage the initially refined 2D pose to be equal to the input 2D pose. This loss makes the camera rotation estimation more accurate and ensures that the refined 2D poses are plausible.

Refinement-Reprojection Loss. The loss between the refined 2D poses and the 2D reprojections is defined as:

L_{ref\text{-}rep} = \left\| \left( P_m^{ref} - \frac{P_n^{rep}}{\| P_n^{rep} \|_E} \right) \odot C \right\|_1.   (5)


We encourage the 2D reprojection of the lifted 3D pose to be the same as the refined 2D pose, which is the output of the cross-view self-fusion module. This enables the lifting network to learn a 3D pose similar to the 3D poses estimated from the other views, so that it does not violate multi-view consistency even when it takes an incorrectly estimated 2D pose as input.

Multi-view 3D Consistency Loss. To ensure multi-view consistency, previous work [12] attempted to introduce a loss between the 3D poses predicted from multiple views to train a lifting network. As reported, the lifting network then learned 3D poses that were invariant to the view but no longer corresponded closely to the input 2D pose, preventing the network from converging to plausible solutions. For each view, we instead lift a refined 2D pose to a 3D pose, as depicted in Fig. 2, and enhance consistency by adding a loss in 3D units using the 3D pose predicted from the refined 2D pose of each view. This loss is defined as the deviation between the 3D poses generated by lifting the refined 2D poses of each view:

L_{3D} = \left\| X_m^2 - X_n^2 \right\|_1,   (6)

where X_m^2 is the second 3D pose lifted from the refined 2D pose of view-m and X_n^2 is the second 3D pose lifted from the refined 2D pose of view-n.

Total Loss. We sum up the losses described above for all views. To this end, the total loss is defined as:

L = \sum_{m=1}^{V} \sum_{n=1}^{V} \left( w_1 L_{rep}^{m,n} + w_1 L_{ref}^{m,n} + w_2 L_{ref\text{-}rep}^{m,n} + w_2 L_{3D}^{m,n} \right).   (7)

Until the total loss L first converges, the weight w_1 is set to 1 and the weight w_2 is set to 0.01 (first stage). Then, until the end of training, we set w_1 to 0.01 and w_2 to 1 (second stage).
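For illustration, a compact sketch of Eqs. (3)-(7) with the two-stage weighting. The data layout (`views[m]` dictionaries) and the choice of which view's confidences weight each term are our reading of the text, not the released code.

```python
import torch

def scale_by_norm(p):
    """Scale a (2, J) reprojection by its Euclidean (Frobenius) norm."""
    return p / (p.norm() + 1e-8)

def l_rep(p_inp_m, p_rep_n, conf_m):      # Eq. (3)
    return ((p_inp_m - scale_by_norm(p_rep_n)) * conf_m).abs().sum()

def l_ref(p_inp_m, p_ref_n, conf_m):      # Eq. (4)
    return ((p_inp_m - p_ref_n) * conf_m).abs().sum()

def l_ref_rep(p_ref_m, p_rep_n, conf_m):  # Eq. (5)
    return ((p_ref_m - scale_by_norm(p_rep_n)) * conf_m).abs().sum()

def l_3d(x2_m, x2_n):                     # Eq. (6)
    return (x2_m - x2_n).abs().sum()

def total_loss(views, first_stage):
    """Eq. (7): sum over all ordered view pairs with stage-dependent weights.
    `views[m]` is a dict with keys 'inp' (2, J), 'ref' (2, J), 'conf' (1, J),
    'x2' (3, J), and 'rep', a list of (2, J) reprojections of view m's first
    3D pose into every camera (hypothetical layout, for illustration only)."""
    w1, w2 = (1.0, 0.01) if first_stage else (0.01, 1.0)
    loss = 0.0
    for m in range(len(views)):
        for n in range(len(views)):
            loss = loss + w1 * l_rep(views[m]['inp'], views[n]['rep'][m], views[m]['conf'])
            loss = loss + w1 * l_ref(views[m]['inp'], views[n]['ref'], views[m]['conf'])
            loss = loss + w2 * l_ref_rep(views[m]['ref'], views[n]['rep'][m], views[m]['conf'])
            loss = loss + w2 * l_3d(views[m]['x2'], views[n]['x2'])
    return loss
```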

4 Experiments

4.1 Datasets and Metrics

Dataset. We perform experiments on two standard benchmark datasets: Human3.6M [14] (H36M) and MPI-INF-3DHP [15] (3DHP). We also evaluate our method on the SkiPose dataset [53,54], captured with six moving cameras, to demonstrate the generality of our method in various in-the-wild scenarios. To conform with a setting of self-supervised training for a particular set of activities, we train one network per dataset without using additional datasets. For each dataset, we follow the self-supervision protocols for training and evaluation [9,12].

Metrics. For quantitative evaluation, we adopt the common protocols, Normalized Mean Per Joint Position Error (NMPJPE) and Procrustes-aligned Mean Per Joint Position Error (PMPJPE), which measure the mean Euclidean distance between the ground-truth and predicted 3D joint positions after applying the optimal shift and scale (for NMPJPE) or the optimal rigid alignment and scale (for PMPJPE) to the poses. For 3DHP and SkiPose, we report the N-PCK, which is the Percentage of Correct Keypoints (PCK) normalized by scale; it indicates the percentage of joints whose estimated position is within 150 mm of the ground truth. Lastly, to evaluate our proposed CSM, we measure the refined 2D pose accuracy by the Joint Detection Rate (JDR), the percentage of successfully detected joints: a joint is deemed successfully detected if the distance between the estimated and ground-truth locations is smaller than a threshold, set to half of the head size.
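For reference, minimal NumPy implementations of the two alignment-based errors (scale alignment for NMPJPE, full Procrustes alignment for PMPJPE), following the standard definitions rather than the authors' evaluation script.

```python
import numpy as np

def nmpjpe(pred, gt):
    """NMPJPE: mean per-joint error after optimally scaling the (root-centered) prediction."""
    s = np.sum(gt * pred) / (np.sum(pred * pred) + 1e-12)
    return np.linalg.norm(s * pred - gt, axis=1).mean()

def pmpjpe(pred, gt):
    """PMPJPE: mean per-joint error after Procrustes alignment (rotation, translation, scale).
    pred, gt: (J, 3) arrays."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    u, sig, vt = np.linalg.svd(p.T @ g)
    d = np.sign(np.linalg.det(u @ vt))           # guard against reflections
    sig[-1] *= d
    rot = u @ np.diag([1.0, 1.0, d]) @ vt        # rotation applied as p @ rot
    scale = sig.sum() / (np.sum(p ** 2) + 1e-12)
    aligned = scale * p @ rot + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()
```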

Table 1. Per-action PMPJPE of different variants on the Human3.6M dataset.

Method         Direct  Discuss  Eating  Greet  Phone  Photo  Pose  Purch
Baseline       50.5    49.0     46.1    54.0   51.2   53.2   59.5  49.1
Ours (S1)      41.9    42.3     41.1    45.3   45.3   45.0   48.8  43.4
Ours (S1+S2)   40.1    40.4     39.9    44.1   44.5   44.2   48.0  42.3

Method         Sitting  SittingD  Smoke  Wait  Walk  WalkD  WalkT  Avg
Baseline       65.7     83.7      53.7   57.7  48.9  60.1   50.7   55.5
Ours (S1)      59.7     73.0      46.2   46.2  40.6  47.9   40.7   47.1
Ours (S1+S2)   59.0     72.1      45.3   44.3  39.4  46.1   39.9   45.9

Table 2. Evaluation of 2D pose refinement accuracy for each dataset. We show the JDR (%) for six important joints in each dataset.

Method  Dataset  Hip   Knee  Ankle  Shoulder  Elbow  Wrist
Single  H36M     97.1  97.5  97.5   98.5      96.7   97.4
Ours    H36M     98.2  98.5  97.8   98.9      98.5   99.6
Single  3DHP     99.8  96.9  97.0   96.9      97.8   98.2
Ours    3DHP     98.8  97.8  99.9   98.4      98.4   98.3
Single  Ski      97.0  73.7  81.2   90.0      70.0   60.9
Ours    Ski      98.7  77.0  75.1   91.9      71.7   56.4

4.2 Ablation Studies

We analyze the effectiveness of the proposed losses. Specifically, we design several variants of our method, described as follows.
Baseline: The baseline does not use the Cross-view Self-fusion module and is trained with the reprojection loss only.
Step 1 (S1): This variant adopts the refinement loss and the refinement-reprojection loss enabled by the CSM. It does not include the lifting of Step 2.
Step 2 (S2): This variant lifts a refined 2D pose to a 3D pose and adds the multi-view 3D consistency loss between the second 3D poses of all views.
We train all variants (Baseline, S1, S1+S2) on H36M, and Table 1 reports the per-action PMPJPE of all variants.

Fig. 4. (a) "Detected heatmap" indicates that it is extracted from an image of the target view. "Fused heatmap" is obtained by summing the "Detected heatmap" and the "Epipolar heatmaps" fused from the three remaining views. (b) Qualitative results of refined 2D poses for the challenging SkiPose dataset. We compare the visual results of the ground-truth, ours, and the baseline.

Compared with the baseline, our CSM helps obtain better results. We also examine whether the CSM intuitively refines 2D poses that are incorrectly estimated due to occlusion. Table 2 shows the 2D pose estimation accuracy as JDR (%) for six important joints on the H36M, 3DHP, and SkiPose datasets. It compares our approach with the 2D pose estimator [46], termed Single, which estimates a 2D pose from a single image without performing cross-view self-fusion. Using the CSM improves the overall accuracy except for some joints. Figure 4(a) shows examples of the fused heatmap during the cross-view self-fusion process on H36M, for wrist joints incorrectly estimated because of occlusion. The 2D pose is correctly refined by fusing the estimated epipolar-line heatmaps; if the correct 2D pose is lifted to 3D, a more accurate 3D pose is estimated. Figure 4(b) shows examples of the refined 2D poses for SkiPose.

4.3 Comparison with State-of-the-Art Methods

We compare the results of the proposed method with other state-of-the-art approaches. For a fair comparison with [12], we follow the implementation and evaluation of [12]. We employ an off-the-shelf detector [46] to extract the 2D human pose required as input to the proposed method.

Most methods using a lifting network without knowledge of the scene show a large gap between NMPJPE and PMPJPE, as a small error in a 2D pose incorrectly estimated in one particular view leads to a large 3D NMPJPE error. They train their networks without addressing the incorrectly estimated 2D poses, which further degrades the camera rotation estimation because the network is trained in violation of the multi-view consistency constraints. This results in incorrect 3D pose and rotation estimates. The MPJPE can be considerably improved by refining the incorrect 2D poses of any of the views during training so that the multi-view consistency constraints are met.


Table 3. Evaluation results on Human3.6M: comparison of the 3D pose estimation errors NMPJPE and PMPJPE (mm) with previous approaches. The best results are marked in bold. Our model outperforms all self-supervised methods.

Supervision  Method            NMPJPE↓  PMPJPE↓
Full         Martinez [34]     67.5     52.5
Weak         Rhodin [55]       122.6    98.2
             Rhodin [53]       80.1     65.1
             Wandt [56]        89.9     65.1
             Kolotouros [57]   –        62.0
             Kundu [58]        85.8     –
Self         Kocabas [9]       76.6     67.5
             Jenni [10]        89.6     76.9
             Iqbal [11]        69.1     55.9
             Wandt [12]        74.3     53.0
             Ours (S1)         63.6     46.1
             Ours (S1+S2)      61.4     45.9

Table 4. Evaluation results on MPI-INF-3DHP. NMPJPE and PMPJPE are reported in millimeters, and N-PCK is in %. The best results are marked in bold.

Supervision  Method            NMPJPE↓  PMPJPE↓  N-PCK↑
Weak         Rhodin [53]       121.8    –        72.7
             Kolotouros [57]   124.8    –        66.8
             Li [59]           –        –        74.1
             Kundu [58]        103.8    –        82.1
Self         Kocabas [9]       125.7    –        64.7
             Iqbal [11]        110.1    68.7     76.5
             Wandt [12]        104.0    70.3     77.0
             Ours (S1)         95.2     57.3     79.3
             Ours (S1+S2)      94.6     56.5     81.9

Table 5. Evaluation results on SkiPose. NMPJPE and PMPJPE are given in mm, N-PCK is in %. The best results are marked in bold.

Supervision  Method          NMPJPE↓  PMPJPE↓  N-PCK↑
Weak         Rhodin [53]     85.0     –        72.7
Self         Wandt [12]      128.1    89.6     67.1
             Ours (S1)       118.2    79.3     70.1
             Ours (S1+S2)    115.2    78.8     72.4

Fig. 5. Qualitative results of our approach on Human3.6M. We present both the 2D pose skeletons on the image and the 3D pose skeletons in space, comparing the visual results of the baseline, ours, and the ground truth.

Results on Human3.6M. A 2D skeleton morphing network introduced by Wandt et al. [12] is used to compensate for the offset between the 2D poses from [46] and the ground-truth 2D poses in the H36M dataset. As illustrated in Table 3, we report the self-supervised pose estimation results in terms of NMPJPE and PMPJPE. The proposed model outperforms every other comparable approach in terms of these metrics. Notably, the achieved performance surpasses our baseline, CanonPose [12], which itself outperforms the fully supervised lifting method of Martinez et al. [34]. In Fig. 5, we present some challenging examples from the H36M dataset and qualitatively compare the visualization results. These examples include occlusions and show the results of our baseline, our method, and the ground truth. The baseline model already outputs plausible results; however, it does not handle occlusion, so an incorrect 2D pose is visibly lifted to an incorrect 3D pose. Our approach resolves the occlusion cases that the baseline cannot.

Results on MPI-INF-3DHP. We evaluate the proposed approach on the 3DHP dataset [15] following the self-supervised protocols and metrics. The results are presented in Table 4. For a more comprehensive comparison, we also report the performance of several recent fully and weakly-supervised methods. The proposed model outperforms all other self-supervised methods. In addition, the visualization results for the test dataset are presented in Fig. 6; our model yields satisfactory results even for dynamic actions and unseen outdoor scenes.

Fig. 6. Qualitative results on (A) the MPI-INF-3DHP dataset and (B) the SkiPose dataset.

Results on SkiPose. Our primary motivation is to train 3D human pose estimation in the wild with multiple uncalibrated cameras. Moreover, we intend to experiment on in-the-wild human activity data for which 3D annotations cannot be captured with traditional motion capture systems. The dataset best suited to these conditions is SkiPose [53,54]. Our approach can handle all these challenges since it operates without relying on static or calibrated cameras. Table 5 shows our results in comparison to Rhodin et al. [53] and Wandt et al. [12]. Rhodin et al. [53] consider a weakly supervised setting with known camera locations, so a direct comparison with our method is not possible. We outperform the baseline approach [12] on SkiPose, and the qualitative results for the dataset are presented in Fig. 6.

5 Conclusion

In this paper, we introduced a novel self-supervised learning method for monocular 3D human pose estimation from unlabeled multi-view images without camera calibration. We exploited multi-view consistency to disentangle 2D estimations into canonical predictions (a 3D pose and a camera rotation), which were used to refine the errors of the 2D estimations and to reproject the 3D pose onto 2D for self-supervised learning. We conducted quantitative and qualitative experiments on three 3D benchmark datasets and achieved state-of-the-art results. The results demonstrate that our method can be applied to real-world scenarios, including dynamic outdoor human activities such as sports.


Acknowledgements. This work was partially supported by the Institute of Information & communications Technology Planning Evaluation (IITP) funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University)) and the Technology Innovation Program (No. 20017012, Business Model Development for Golf Putting Simulator using AI Video Analysis and Coaching Service at Home) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

References 1. Liu, W., Mei, T.: Recent advances of monocular 2d and 3d human pose estimation: a deep learning perspective. ACM Comput. Surv. (CSUR) 55(4), 1–41 (2022) 2. Lim, Y.K., Choi, S.H., Lee, S.W.: Text extraction in mpeg compressed video for content-based indexing. In: Proceedings 15th International Conference on Pattern Recognition, ICPR-2000, vol. 4, pp. 409–412. IEEE (2000) 3. Lee, G.H., Lee, S.W.: Uncertainty-aware human mesh recovery from video by learning part-based 3d dynamics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12375–12384 (2021) 4. Yang, H.D., Lee, S.W.: Reconstruction of 3d human body pose from stereo image sequences based on top-down learning. Pattern Recogn. 40, 3120–3131 (2007) 5. Ahmad, M., Lee, S.W.: Human action recognition using multi-view image sequences. In: 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pp. 523–528. IEEE (2006) 6. Roh, M.C., Shin, H.K., Lee, S.W.: View-independent human action recognition with volume motion template on single stereo camera. Pattern Recog. Lett. 31, 639–647 (2010) 7. Ji, X., Fang, Q., Dong, J., Shuai, Q., Jiang, W., Zhou, X.: A survey on monocular 3d human pose estimation. Virtual Reality Intell. Hardware 2, 471–500 (2020) 8. Roh, M.C., Kim, T.Y., Park, J., Lee, S.W.: Accurate object contour tracking based on boundary edge selection. Pattern Recogn. 40, 931–943 (2007) 9. Kocabas, M., Karagoz, S., Akbas, E.: Self-supervised learning of 3d human pose using multi-view geometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1077–1086 (2019) 10. Jenni, S., Favaro, P.: Self-supervised multi-view synchronization learning for 3d pose estimation. In: Proceedings of the Asian Conference on Computer Vision (2020) 11. Iqbal, U., Molchanov, P., Kautz, J.: Weakly-supervised 3d human pose learning via multi-view images in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5243–5252 (2020) 12. Wandt, B., Rudolph, M., Zell, P., Rhodin, H., Rosenhahn, B.: CanonPose: selfsupervised monocular 3d human pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 13294–13304 (2021) 13. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2003) 14. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2013)


15. Mehta, D., et al.: Monocular 3d human pose estimation in the wild using improved CNN supervision. In: 2017 International Conference on 3D Vision, pp. 506–516 (2017) 16. Tekin, B., Katircioglu, I., Salzmann, M., Lepetit, V., Fua, P.: Structured prediction of 3d human pose with deep neural networks. arXiv preprint arXiv:1605.05180 (2016) 17. Tekin, B., M’arquez-Neila, P., Salzmann, M., Fua, P.: Learning to fuse 2d and 3d image cues for monocular body pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3941–3950 (2017) 18. Sun, X., Shang, J., Liang, S., Wei, Y.: Compositional human pose regression. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2602– 2611 (2017) 19. Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K.: Coarse-to-fine volumetric prediction for single-image 3d human pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7025–7034 (2017) 20. Rogez, G., Weinzaepfel, P., Schmid, C.: LCR-Net: localization-classification regression for human pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3433–3441 (2017) 21. Mehta, D., et al.: Single-shot multi-person 3d pose estimation from monocular RGB. In: 2018 International Conference on 3D Vision, pp. 120–130 (2018) 22. Yang, W., Ouyang, W., Wang, X., Ren, J., Li, H., Wang, X.: 3d human pose estimation in the wild by adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5255–5264 (2018) 23. Fang, H.S., Xu, Y., Wang, W., Liu, X., Zhu, S.C.: Learning pose grammar to encode human body configuration for 3d pose estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence (2018) 24. Moon, G., Chang, J.Y., Lee, K.M.: Camera distance-aware top-down approach for 3d multi-person pose estimation from a single RGB image. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 10133–10142 (2019) 25. Wang, C., Li, J., Liu, W., Qian, C., Lu, C.: HMOR: hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 242–259. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_15 26. Xi, D., Podolak, I.T., Lee, S.W.: Facial component extraction and face recognition with support vector machines. In: Proceedings of 5th IEEE International Conference on Automatic Face Gesture Recognition, pp. 83–88. IEEE (2002) 27. Lee, S.-W., Verri, A. (eds.): SVM 2002. LNCS, vol. 2388. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45665-1 28. Lee, S.W., Kim, S.Y.: Integrated segmentation and recognition of handwritten numerals with cascade neural network. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 29, 285–290 (1999) 29. Lee, S.W., Kim, J.H., Groen, F.C.: Translation-, rotation-and scale-invariant recognition of hand-drawn symbols in schematic diagrams. Int. J. Pattern Recogn. Artif. Intell. 4, 1–25 (1990) 30. Rogez, G., Schmid, C.: MoCap-guided data augmentation for 3d pose estimation in the wild. In: Advances in Neural Information Processing Systems (2016) 31. Varol, G., et al.: Learning from synthetic humans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 109–117 (2017) 32. Cheng, Y., Yang, B., Wang, B., Yan, W., Tan, R.T.: Occlusion-aware networks for 3d human pose estimation in video. 
In: Proceedings of the IEEE International Conference on Computer Vision, pp. 723–732 (2019)


33. Gong, K., Zhang, J., Feng, J.: PoseAug: a differentiable pose augmentation framework for 3d human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8575–8584 (2021) 34. Martinez, J., Hossain, R., Romero, J., Little, J.J.: A simple yet effective baseline for 3d human pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2640–2649 (2017) 35. Xu, T., Takano, W.: Graph stacked hourglass networks for 3d human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 16105–16114 (2021) 36. Cai, Y., et al.: Exploiting spatial-temporal relationships for 3d pose estimation via graph convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2272–2281 (2019) 37. Ci, H., Wang, C., Ma, X., Wang, Y.: Optimizing network structure for 3d human pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2262–2271 (2019) 38. Chen, T., Fang, C., Shen, X., Zhu, Y., Chen, Z., Luo, J.: Anatomy-aware 3d human pose estimation with bone-based pose decomposition. IEEE Trans. Circ. Syst. Video Technol. 32, 198–209 (2021) 39. Zheng, C., Zhu, S., Mendieta, M., Yang, T., Chen, C., Ding, Z.: 3d human pose estimation with spatial and temporal transformers. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 11656–11665 (2021) 40. Hossain, M.R.I., Little, J.J.: Exploiting temporal information for 3d human pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 69–86. Springer, Cham (2018). https://doi.org/10. 1007/978-3-030-01249-6_5 41. Mehta, D., et al.: XNect: real-time multi-person 3d motion capture with a single RGB camera. ACM Trans. Graph. (TOG) 39, 82–1 (2020) 42. Cao, X., Zhao, X.: Anatomy and geometry constrained one-stage framework for 3d human pose estimation. In: Proceedings of the Asian Conference on Computer Vision (2020) 43. Liu, K., Zou, Z., Tang, W.: Learning global pose features in graph convolutional networks for 3d human pose estimation. In: Proceedings of the Asian Conference on Computer Vision (2020) 44. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017) 45. Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7103–7112 (2018) 46. Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2334–2343 (2017) 47. Li, J., Wang, C., Zhu, H., Mao, Y., Fang, H.S., Lu, C.: CrowdPose: efficient crowded scenes pose estimation and a new benchmark. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10863–10872 (2019) 48. Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2d human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE Conference on computer Vision and Pattern Recognition, pp 3686–3693 (2014) 49. Iskakov, K., Burkov, E., Lempitsky, V., Malkov, Y.: Learnable triangulation of human pose. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 7718–7727 (2019)


50. Qiu, H., Wang, C., Wang, J., Wang, N., Zeng, W.: Cross view fusion for 3d human pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4342–4351 (2019) 51. He, Y., Yan, R., Fragkiadaki, K., Yu, S.I.: Epipolar transformers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7779–7788 (2020) 52. Ma, H., et al.: Transfusion: cross-view fusion with transformer for 3d human pose estimation. In: Proceedings of the British Machine Vision Conference (2021) 53. Rhodin, H., et al.: Learning monocular 3d human pose estimation from multi-view images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8437–8446 (2018) 54. Spörri, J.: Research dedicated to sports injury prevention - the ‘sequence of prevention’ on the example of alpine ski racing. Habil. Venia Docendi Biomech. 1, 7 (2016) 55. Rhodin, H., Salzmann, M., Fua, P.: Unsupervised geometry-aware representation for 3d human pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 765–782. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_46 56. Wandt, B., Rosenhahn, B.: RepNet: weakly supervised training of an adversarial reprojection network for 3d human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7782–7791 (2019) 57. Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2252–2261 (2019) 58. Kundu, J.N., Seth, S., Jampani, V., Rakesh, M., Babu, R.V., Chakraborty, A.: Selfsupervised 3d human pose estimation via part guided novel image synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6152–6162 (2020) 59. Li, Y., et al.: Geometry-driven self-supervised method for 3d human pose estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 11442–11449 (2020)

3D-C2FT: Coarse-to-Fine Transformer for Multi-view 3D Reconstruction

Leslie Ching Ow Tiong¹, Dick Sigmund², and Andrew Beng Jin Teoh³

¹ Computational Science Research Center, Korea Institute of Science and Technology, 5, Hwarang-ro 14-gil, Seongbuk-gu, Seoul 02792, Republic of Korea
² AIDOT Inc., 128, Beobwon-ro, Songpa-gu, Seoul 05854, Republic of Korea
³ School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Republic of Korea

L. C. O. Tiong and D. Sigmund—These authors have contributed equally to this work.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_13.

Abstract. Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism that explores the multi-view features and exploits their relations to reinforce the encoding-decoding modules. This paper proposes a new model, namely the 3D coarse-to-fine transformer (3D-C2FT), by introducing a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective voxel-based 3D objects. The C2F attention mechanism enables the model to learn multi-view information flow and to synthesize 3D surface corrections in a coarse-to-fine-grained manner. The proposed model is evaluated on the ShapeNet and Multi-view Real-life voxel-based datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.

Keywords: Multi-view 3D reconstruction · Coarse-to-fine transformer · Multi-scale attention

1 Introduction

Multi-view 3D reconstruction infers the geometry and structure of voxel-based 3D objects from multi-view images. A voxel is essentially a 3D pixel of cubic size. Theoretically, voxel representation is an ideal modeling technique for replicating reality [9,10,13]: voxel representations can closely resemble real-world objects as accurate 3D objects in a virtual environment [14]. Therefore, multi-view 3D reconstruction plays a crucial role in robot perception [2,22], historical artifacts [12,16], dentistry [20,21], etc.


Difficulties in estimating the 3D structure of an object include solving an ill-posed inverse problem from 2D images, violation of overlapping views, and unconstrained environment conditions [7]. Here, the ill-posed inverse problem refers to estimating voxel-based 3D objects when disordered or limited sources are given as inputs [6]. Accordingly, multi-view 3D object reconstruction remains an active topic in computer vision.

Classical approaches to multi-view 3D reconstruction introduce feature extraction techniques to map different reconstruction views [5,24,26,31]. However, these approaches can hardly reconstruct a complete 3D object when only a small number of images, e.g., 1-4, is available. With the advances of deep learning (DL), most attempts utilize an encoder-decoder network architecture for feature extraction, fusion, and reconstruction. Existing DL models usually rely on convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to fuse multiple deep features encoded from 2D multi-view images [3,25]. However, a CNN encoder processes each view image independently, and thus the relations between different views are barely utilized, which makes it difficult to design a fusion method that can traverse the relationships between views. Although RNNs can rectify the fusion problem, they suffer from long processing times [3].

Recently, transformers [27] have gained considerable attention and proved to be a huge success in vision-related problems [4,18]. In the multi-view 3D reconstruction problem, [28] and [35] propose unified multi-view encoding, fusion, and decoding frameworks via a transformer that aggregates features among the patch embeddings, which can explore the profound relations between multi-view images and perform decent reconstruction. However, both studies only consider a single-scaled multi-head self-attention (MSA) mechanism to reinforce the view encoding. Unfortunately, the native single-scaled MSA is not designed to explore the correlation of relevant features between subsequent layers for object reconstruction.

Considering the limitations above, we propose a new transformer model, namely the 3D coarse-to-fine transformer (3D-C2FT). 3D-C2FT introduces a novel coarse-to-fine (C2F) attention mechanism for encoded feature aggregation and decoded object refinement in a multi-scale manner. On top of exploring multi-view image relationships via the MSA of the native transformer, the C2F attention mechanism aggregates coarse (global) to fine (local) features, which favors learning comprehensive and discriminative encoded features. A concatenation operation is then used to aggregate the multiple sequential C2F features derived from each C2F attention block. In addition, a C2F refiner is devised to rectify the defective decoded 3D objects. Specifically, the refiner leverages a C2F cube attention mechanism that attends to blocks of the voxelized object, which benefits coarse-to-fine correction of the 3D surface. In general, C2F attention pairs coarse and fine-grained features and promotes the information flow that helps the 3D object reconstruction task.


Besides that, we compile a new dataset, called the Multi-view Real-life dataset, for evaluation. Unlike existing datasets such as ShapeNet [32] that are assembled from synthetic data, our dataset is composed of real-life single- or multi-view images taken from the Internet without ground truth. We propose this dataset to support real-life studies for multi-view 3D reconstruction evaluation. The contributions of this paper are summarized as follows:
– A novel multi-scale C2F attention mechanism is outlined for the multi-view 3D reconstruction problem. The proposed C2F attention mechanism learns the correlation of the features in a sequential global-to-local manner, on top of exploring multi-view image relations via MSA. Accordingly, coarse-grained features draw attention towards the global object structure, and fine-grained features pay attention to each local part of the 3D object.
– A novel transformer model, namely 3D-C2FT, for multi-scale multi-view image encoding, feature fusion, and coarse-to-fine decoded object refinement is proposed.
– Extensive experiments demonstrate better reconstruction results on the standard benchmark ShapeNet dataset and a challenging real-life dataset than several competing models, even under stringent constraints such as occlusion.
This paper is organized as follows: Sect. 2 reviews the related works on DL-based multi-view 3D reconstruction. Then, Sect. 3 presents our proposed method. Next, the experimental setup, results, and ablation study are presented in Sect. 4. Finally, the conclusions are summarized in Sect. 5.

2 Related Works

This section reviews several relevant works on DL-based multi-view 3D reconstruction. We refer readers to a more comprehensive survey on this subject [7].

2.1 CNN and RNN-Based Models

Early works [3] and [11] utilize modified RNNs to fuse information from multi-view images and reconstruct the 3D objects. However, both works have limitations; for instance, when given the same set of images in different orders, the models fail to reconstruct the 3D objects consistently. This is because both models are permutation-variant and rely on an ordered sequence of input images for feature extraction.

[33,34] and [36] propose CNN-based models to address the randomly-ordered-input and long-sequence forgetting issues of the RNNs. However, these works follow the divide-and-conquer principle by introducing a CNN-based single-view encoder, a single-view decoder, and a fusion model, which work independently. Therefore, the encoder and decoder hardly utilize the relations between different views. Instead, the network relies on the fusion model to integrate the arbitrarily ordered multi-view features for reconstruction. The results suggest that designing a robust fusion method with a CNN-based approach is challenging. Furthermore, these works suffer from a model scaling problem while preserving the permutation-invariant capability: if the number of input views exceeds a certain number, such as 10-16 views, the reconstruction performance levels off, implying the difficulty of learning complementary knowledge from a large set of independent inputs.

2.2 Transformer-Based Models

Recently, [28] and [35] put forward transformer-based models that perceive the multi-view 3D reconstruction problem as a sequence-to-sequence prediction problem, which permits fully exploiting the information from 2D images with different views. [28] leverages the native multi-head self-attention (MSA) mechanism for feature encoding, fusion, and decoding to generate a 3D volume for each query token. However, this model relies only on the MSA to explore the relations of multi-view images by fostering different representations of each view. Hence, it falls short in analyzing the low- to high-level interactions within multi-view images that signify the global structure and local components of the 3D objects. Yagubbayli et al. [35] propose another transformer model, contemporaneous with [28], known as LegoFormer. This model adopts an encoder with pre-norm residual connections [17] and a non-autoregressive decoder that takes advantage of the decomposition factors without explicit supervision. Although this approach is more effective, the model only focuses on reconstructing 3D objects part by part. Such a strategy amounts to local-to-local attention, which does not provide the low-level interaction needed to support information flow across local parts of 3D objects.

Considering the limitations of the previous transformer-based models, we are motivated to design a new attention mechanism by introducing a C2F patch attention mechanism in the encoder to extract multi-scale features of the multi-view images. Furthermore, we also put forward a C2F cube attention mechanism in the refiner to rectify the reconstructed object surface in a coarse to fine-grained manner.

3 3D Coarse-to-Fine Transformer

The 3D coarse-to-fine transformer (3D-C2FT) consists of an image embedding module, a 2D-view encoder, a 3D decoder, and a 3D refiner, as illustrated in Fig. 1. Specifically, the encoder accepts either single- or multi-view images in embedding form, as produced by the image embedding module; C2F patch attention is then used to aggregate the different input views by extracting feature representations for the decoder to reconstruct the 3D volume. Finally, the refiner with the C2F cube attention mechanism rectifies the defective reconstructed 3D volume. The 3D-C2FT network is explained in detail in the following subsections.

Fig. 1. The architecture of 3D-C2FT. The network comprises an image embedding module (DenseNet121), a 2D-view encoder, a 3D decoder, and a 3D refiner.

3.1 Image Embedding Module

We leverage DenseNet121 [8] as the image embedding module of the proposed 3D-C2FT. For details, readers are referred to the supplementary materials, Sect. 6.1. Given a set of n view images of a 3D object, X = {X^1, X^2, ..., X^n}, each view image X^n is fed into the image embedding module to obtain a patch embedding P^n ∈ R^{1×d}, where d is the embedding dimension. We follow the ViT setting, where d is fixed at 768.
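As a rough illustration of this step, a DenseNet121-based embedding module using torchvision is sketched below; global average pooling and the linear projection to d = 768 are our assumptions, since the exact head is only described in the paper's supplementary material.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

class ImageEmbedding(nn.Module):
    """Maps one 224x224 view image to a 1 x d patch embedding (d = 768)."""
    def __init__(self, d=768):
        super().__init__()
        self.backbone = densenet121(weights=None).features   # DenseNet121 feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(1024, d)                        # DenseNet121 ends with 1024 channels

    def forward(self, x):
        # x: (B, 3, 224, 224) -> (B, 1, d)
        f = self.pool(self.backbone(x)).flatten(1)
        return self.proj(f).unsqueeze(1)

# usage: embeddings for n views of one object
# views = torch.randn(n, 3, 224, 224); P = ImageEmbedding()(views)  # (n, 1, 768)
```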

3.2 2D-View Encoder

The encoder of 3D-C2FT receives the patch embeddings of view n, P^n, as well as their associated positional embeddings. These patch and positional embeddings are sent to a C2F patch attention block that consists of I layers of MSA [27] and a multilayer perceptron (MLP), which is simply a fully connected feed-forward network, as shown in Fig. 1. The C2F patch attention block is repeated J times. Thus, the output of the C2F patch attention block, E^n_{I,j}, is calculated as follows:

\hat{E}^n_{i,j} = \mathrm{MSA}(\mathrm{Norm}(P^n_{i-1,j})) + P^n_{i-1,j},   (1)

E^n_{i,j} = \mathrm{MLP}(\mathrm{Norm}(\hat{E}^n_{i,j})),   (2)

where Norm(·) denotes layer normalization, i is the layer index of E^n_{i,j} with i = 1, ..., I (I is empirically set to 4 in this paper), and j is the block index of E^n_{i,j} with j = 1, ..., J.

Each C2F patch attention block is responsible for extracting the features E^n_{I,j} in a coarse to fine-grained manner by shrinking the dimension of E^n_{I,j} by a ratio of two. Thereafter, we fuse the E^n_{I,j} from the J blocks by concatenation to produce a C2F patch embedding F^n ∈ R^{1×[d + d/2 + ... + d/2^{(J-1)}]}, which captures the aggregation of global and local multi-scale features. Since d is set to 768 in this work, and based on our experimental settings and computational complexity concerns, we set J = 3, as shown in the supplementary materials, Sect. 6.1. The C2F patch embedding F^n is determined as follows:

F^n = \mathrm{Norm}([E^n_{I,1}, E^n_{I,2}, \cdots, E^n_{I,J}]),   (3)

where [·] refers to the concatenation operator.
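A minimal sketch of Eqs. (1)-(3); whether the J blocks run sequentially and how the width is halved between blocks (a linear projection here) are our own guesses at details the text does not spell out.

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    """One pre-norm MSA + MLP layer, i.e., Eqs. (1)-(2)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, p):
        e_hat = self.msa(self.norm1(p), self.norm1(p), self.norm1(p))[0] + p   # Eq. (1)
        return self.mlp(self.norm2(e_hat))                                     # Eq. (2)

class C2FPatchEncoder(nn.Module):
    """J C2F patch attention blocks whose widths shrink by a factor of two
    (768, 384, 192), followed by the concatenation and normalization of Eq. (3)."""
    def __init__(self, d=768, I=4, J=3):
        super().__init__()
        dims = [d // (2 ** j) for j in range(J)]
        in_dims = [d] + dims[:-1]
        self.reduce = nn.ModuleList([nn.Linear(i, o) for i, o in zip(in_dims, dims)])
        self.blocks = nn.ModuleList([
            nn.Sequential(*[AttentionLayer(dim) for _ in range(I)]) for dim in dims])
        self.norm = nn.LayerNorm(sum(dims))

    def forward(self, p):
        # p: (B, n_views, d) patch embeddings (positional embeddings omitted for brevity)
        feats, x = [], p
        for red, blk in zip(self.reduce, self.blocks):
            x = blk(red(x))          # halve the width, then run I attention layers
            feats.append(x)
        return self.norm(torch.cat(feats, dim=-1))   # F: (B, n_views, d + d/2 + d/4)
```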

3.3 3D Decoder

As shown in Fig. 1, the 3D decoder learns and explores the global correlation between the multi-view images for reconstruction. The decoder consists of a single 3D reconstruction block, which is crucial for integrating and converting F^n into a 3D volume D of size 32 × 32 × 32. The attention module of the 3D reconstruction block accepts both a view interpretation matrix H and F^n to project the different views of F^n into D. Specifically, given F^n and H ∈ R^{g×[d + d/2 + ... + d/2^{(J-1)}]}, the decoder can easily aggregate the n views of F^n together and share them across all potential inputs in the network. Here, H is a randomly initialized learnable weight matrix that interprets and addresses F^n, and g denotes the number of cube embeddings required to assemble D. In this paper, we set g = 8 × 8 × 8 = 512. The 3D volume output of the decoder, D, is specified as:

D = \sigma\!\left(\mathrm{MLP}(\mathrm{MSA}(H, F^n))\right) \in \mathbb{R}^{32 \times 32 \times 32},   (4)

where σ(·) is the sigmoid activation function. However, the reconstructed 3D volume D at this stage is defective (Sect. 4.3) due to the simplicity of the decoder; therefore, a refiner is needed for further rectification.
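A sketch of Eq. (4) in which the learnable matrix H queries the fused view embeddings via cross-attention and each of the g = 512 tokens is decoded into one 4×4×4 cube of the 32³ volume; the token-to-cube mapping and the MLP width are assumptions on our part.

```python
import torch
import torch.nn as nn

class Decoder3D(nn.Module):
    """Eq. (4): D = sigmoid(MLP(MSA(H, F))) reshaped into a 32x32x32 volume."""
    def __init__(self, feat_dim=768 + 384 + 192, g=512, heads=8):
        super().__init__()
        self.H = nn.Parameter(torch.randn(g, feat_dim))       # view interpretation matrix
        self.cross_attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.mlp = nn.Linear(feat_dim, 4 * 4 * 4)             # one 4^3 cube per token

    def forward(self, F):
        # F: (B, n_views, feat_dim) C2F patch embeddings
        q = self.H.unsqueeze(0).expand(F.shape[0], -1, -1)    # queries from H
        tokens, _ = self.cross_attn(q, F, F)                  # aggregate all views
        cubes = torch.sigmoid(self.mlp(tokens))               # (B, 512, 64)
        cubes = cubes.view(-1, 8, 8, 8, 4, 4, 4)              # 8x8x8 grid of 4^3 cubes
        return cubes.permute(0, 1, 4, 2, 5, 3, 6).reshape(-1, 32, 32, 32)
```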

3.4 3D Refiner

We propose a C2F cube attention mechanism for the refiner, intended to correct and refine the surface of D. The refiner is composed of L C2F cube attention blocks, where each block consists of K layers of a ViT-like attention module (Fig. 1). Here, we set K = 6, as determined by the experiments. In addition, L = 2 is set in this work, as the reconstructed volume resolution is a mere 32 × 32 × 32; a larger L can be used for higher-resolution volumes.

Specifically, D from Eq. (4) is transformed into the C2F cube embeddings R̂_{k,l} by partitioning D into cubes of size 8 × 8 × 8 in the first C2F cube attention block. For the second block, the cube size of R̂_{k,2} is further reduced by a factor of two, thus 4 × 4 × 4. For each block, the cube attention block output is computed as:

\hat{R}_{k,l} = \mathrm{MSA}(\mathrm{Norm}(\hat{R}_{k-1,l})) + \hat{R}_{k-1,l},   (5)

R_{k,l} = \mathrm{MLP}(\mathrm{Norm}(\hat{R}_{k,l})) + \hat{R}_{k,l},   (6)

where k is the layer index of R_{k,l} with k = 1, 2, ..., K and l is the block index of R_{k,l} with l = 1, 2, ..., L. By adopting the C2F notion, the proposed refiner leverages C2F cube attention blocks over embeddings of different scales, which benefits the correction of both the structure and the parts of the 3D object. Furthermore, it enables the refiner to iteratively draw attention from coarse to finer regions by rectifying the spatial surfaces gradually through multi-scale attention.
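The sketch below follows Eqs. (5)-(6), partitioning the decoded volume into 8×8×8 cubes in the first block and 4×4×4 cubes in the second; the MLP width, head counts, and the way refined cube embeddings are folded back into a volume are our assumptions.

```python
import torch
import torch.nn as nn

def volume_to_cubes(vol, cube):
    """Partition a (B, 32, 32, 32) volume into flattened cubes of side `cube`."""
    b, s = vol.shape[0], vol.shape[1]
    g = s // cube
    x = vol.view(b, g, cube, g, cube, g, cube).permute(0, 1, 3, 5, 2, 4, 6)
    return x.reshape(b, g ** 3, cube ** 3)            # (B, num_cubes, cube^3)

def cubes_to_volume(tokens, cube, size=32):
    g = size // cube
    x = tokens.view(-1, g, g, g, cube, cube, cube).permute(0, 1, 4, 2, 5, 3, 6)
    return x.reshape(-1, size, size, size)

class CubeAttentionBlock(nn.Module):
    """K layers of Eqs. (5)-(6) over cube embeddings of one scale."""
    def __init__(self, cube, K=6, heads=4):
        super().__init__()
        dim = cube ** 3
        self.layers = nn.ModuleList([nn.ModuleDict({
            "n1": nn.LayerNorm(dim), "n2": nn.LayerNorm(dim),
            "msa": nn.MultiheadAttention(dim, heads, batch_first=True),
            "mlp": nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim)),
        }) for _ in range(K)])
        self.cube = cube

    def forward(self, vol):
        r = volume_to_cubes(vol, self.cube)
        for lyr in self.layers:
            r_hat = lyr["msa"](lyr["n1"](r), lyr["n1"](r), lyr["n1"](r))[0] + r   # Eq. (5)
            r = lyr["mlp"](lyr["n2"](r_hat)) + r_hat                              # Eq. (6)
        return torch.sigmoid(cubes_to_volume(r, self.cube))

class Refiner3D(nn.Module):
    """L = 2 C2F cube attention blocks: coarse 8^3 cubes, then finer 4^3 cubes."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([CubeAttentionBlock(8), CubeAttentionBlock(4)])

    def forward(self, d):
        for blk in self.blocks:
            d = blk(d)
        return d
```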

3.5 Loss Function

For training, we adopt a combination of a mean-square-error loss (L_MSE) and a 3D structural similarity loss (L_3D-SSIM) as the loss function (L_total) to better capture the quality degradation of the 3D reconstruction. The motivation for adopting the L_3D-SSIM loss [30] is for 3D-C2FT to learn to produce visually pleasing 3D objects by quantifying volume structural differences between a reference and a reconstructed object, while L_MSE evaluates the voxel loss, measuring the L2 difference between the ground-truth (GT) and reconstructed objects. Thus, L_total is given as follows:

L_{total}(Y, \hat{Y}) = L_{MSE}(Y, \hat{Y}) + L_{3D\text{-}SSIM}(Y, \hat{Y}),   (7)

L_{MSE}(Y, \hat{Y}) = \frac{1}{M^3} \sum_{x=0}^{M} \sum_{y=0}^{M} \sum_{z=0}^{M} \left( Y_{x,y,z} - \hat{Y}_{x,y,z} \right)^2,   (8)

L_{3D\text{-}SSIM}(Y, \hat{Y}) = 1 - \frac{(2\mu_Y \mu_{\hat{Y}} + c_1)(2\sigma_{Y\hat{Y}} + c_2)}{(\mu_Y^2 + \mu_{\hat{Y}}^2 + c_1)(\sigma_Y^2 + \sigma_{\hat{Y}}^2 + c_2)},   (9)

where Y is the GT 3D volume and Ŷ is the reconstructed 3D volume. Note that, in Eq. (8), M refers to the dimension of the 3D volume, and x, y, and z are the indexes of the voxel coordinates. In Eq. (9), μ_Y and μ_Ŷ denote the means of the voxels, σ_Y² and σ_Ŷ² are the variances of the voxels, and σ_YŶ denotes the covariance between Y and Ŷ. We set c_1 = 0.01 and c_2 = 0.03 to avoid instability when the means and variances are close to zero.
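A direct sketch of Eqs. (7)-(9), computed with global statistics over the whole 32³ volume (a simplification on our part of the windowed SSIM of [30]).

```python
import torch

def mse_loss(y, y_hat):                         # Eq. (8)
    return ((y - y_hat) ** 2).mean()

def ssim_3d_loss(y, y_hat, c1=0.01, c2=0.03):   # Eq. (9), global statistics
    mu_y, mu_p = y.mean(), y_hat.mean()
    var_y, var_p = y.var(unbiased=False), y_hat.var(unbiased=False)
    cov = ((y - mu_y) * (y_hat - mu_p)).mean()
    ssim = ((2 * mu_y * mu_p + c1) * (2 * cov + c2)) / \
           ((mu_y ** 2 + mu_p ** 2 + c1) * (var_y + var_p + c2))
    return 1.0 - ssim

def total_loss(y, y_hat):                       # Eq. (7)
    return mse_loss(y, y_hat) + ssim_3d_loss(y, y_hat)
```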

4 Experiments

4.1 Evaluation Protocol and Implementation Details

Dataset. For a fair comparison, we follow the protocols specified by [3] and [35] to evaluate the proposed model. A subset of the ShapeNet dataset, which comprises 43,783 objects from 13 categories, is adopted. Each object is rendered from 24 different views. We resize the original images from 137 × 137 to 224 × 224, and a uniform background color is applied before passing them to the network. For each 3D object category, three subsets with a ratio of 70:10:20 for training, validation, and testing are partitioned randomly. Note that all the 3D object samples are provided with a low-resolution voxel representation of 32 × 32 × 32.

Evaluation Metrics. Intersection over Union (IoU) and F-score are adopted to evaluate the reconstruction performance of the proposed model. Higher IoU and F-score values imply better reconstructions; the F-score@1% is adopted in this paper.

Implementation. The proposed model is trained with batch size 32, with view input images of size 224 × 224, and the dimension of the 3D volume is set to 32 × 32 × 32. For multi-view training, the number of input views is fixed to 8, which gave the best results in our experiments; the full results are given in the supplementary materials, Sect. 6.1. During training, the views are randomly sampled out of the 24 views at each iteration, and the network is trained by an SGD optimizer with an initial learning rate of 0.01. Learning rate decay is used: the rate is reduced by a factor of 10 every 500 epochs, with a minimum learning rate of 1.0 × 10⁻⁴. Our network is implemented using the PyTorch toolkit [19] and runs on an NVIDIA Tesla V100. The source code is available at https://github.com/tiongleslie/3D-C2FT/.
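For concreteness, voxel IoU between a binarized prediction and the ground truth can be computed as below; the binarization threshold is an illustrative choice, not a value stated in the paper.

```python
import torch

def voxel_iou(pred, gt, thresh=0.5):
    """IoU between a predicted occupancy volume and a binary GT volume.
    pred: (32, 32, 32) probabilities, gt: (32, 32, 32) in {0, 1}.
    The binarization threshold is an illustrative choice."""
    p = (pred >= thresh)
    g = gt.bool()
    inter = (p & g).sum().float()
    union = (p | g).sum().float()
    return (inter / union.clamp(min=1)).item()
```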

4.2 Result

Multi-view 3D Reconstruction. We compare 3D-C2FT against several benchmark models, namely 3D-R2N2 (RNN-based) [3], AttSets (RNN-based) [36], Pix2Vox++/F (CNN-based) [34], Pix2Vox++/A (CNN-based) [34], LegoFormer (transformer-based) [35], 3D-RETR (transformer-based) [23], VoIT+ (transformer-based) [28], and EVoIT (transformer-based) [28]. For fair comparisons, we re-implemented several models, such as AttSets, Pix2Vox++/F, Pix2Vox++/A, and 3D-RETR, from the code provided by the respective authors, in order to follow the same experimental settings and protocols.

Table 1 shows the reconstruction performance over different numbers of views in terms of IoU and F-score on the ShapeNet dataset. Compared with the RNN-based and CNN-based models, 3D-C2FT achieves the highest IoU and F-score for all numbers of views, as tabulated in Table 1. As an illustration, we show several reconstruction instances from the ShapeNet dataset in Fig. 2 and the supplementary material Video01. As can be seen in Fig. 2, the objects reconstructed by 3D-C2FT have complete and smoother surfaces, even compared with LegoFormer. These results suggest that the C2F patch and cube attention mechanisms play a crucial role in multi-view 3D object reconstruction.

Among the transformer-based models (see Table 1), 3D-C2FT outperforms LegoFormer, VoIT+, EVoIT, and 3D-RETR significantly for almost all numbers of views in terms of IoU and F-score, even in the extreme case where a single view image is used as input. Note that VoIT+ and EVoIT are published without code availability; thus, we only report their results for 4, 8, 12, and 20 view inputs [28]. Nevertheless, the results reveal the distinctive advantages of the C2F patch and cube attention over competing transformer models that utilize a plain single-scaled MSA mechanism in both encoder and decoder. However, 3D-C2FT underperforms EVoIT for 20 views. We speculate that this issue is associated with the attention layer in the decoder being less effective for a large number of input views.


Table 1. Performance comparisons of single and multi-view 3D reconstruction on ShapeNet with IoU and F-score. The best score for each number of views is written in bold. * refers to models re-implemented with the same settings as this study. The '–' entries for VoIT+ and EVoIT are not provided in [28], and their source code is not publicly available.

Metric: IoU
Model                     1      2      3      4      5      8      12     18     20
RNN & CNN-based
3D-R2N2 (2016)            0.560  0.603  0.617  0.625  0.634  0.635  0.636  0.636  0.636
AttSets* (2020)           0.607  0.634  0.639  0.658  0.669  0.674  0.677  0.679  0.681
Pix2Vox++/F* (2019)       0.585  0.636  0.658  0.669  0.677  0.686  0.690  0.693  0.694
Pix2Vox++/A* (2020)       0.623  0.674  0.690  0.695  0.699  0.704  0.707  0.709  0.710
Transformer-based
LegoFormer (2021)         0.519  0.644  0.679  0.694  0.703  0.713  0.717  0.719  0.721
VoIT+ (2021)              –      –      –      0.695  –      0.707  0.714  –      0.715
EVoIT (2021)              –      –      –      0.609  –      0.698  0.720  –      0.738
3D-RETR* (2021)           0.608  0.661  0.672  0.679  0.682  0.685  0.687  0.688  0.689
3D-C2FT                   0.629  0.678  0.695  0.702  0.708  0.716  0.720  0.723  0.725

Metric: F-score
Model                     1      2      3      4      5      8      12     18     20
RNN & CNN-based
3D-R2N2 (2016)            0.351  0.368  0.372  0.378  0.382  0.383  0.382  0.382  0.383
AttSets* (2020)           0.358  0.360  0.370  0.379  0.384  0.386  0.389  0.403  0.408
Pix2Vox++/F* (2019)       0.341  0.367  0.388  0.398  0.405  0.417  0.429  0.430  0.432
Pix2Vox++/A* (2020)       0.365  0.419  0.435  0.443  0.447  0.452  0.457  0.460  0.461
Transformer-based
LegoFormer (2021)         0.282  0.392  0.428  0.444  0.453  0.464  0.470  0.472  0.473
VoIT+ (2021)              –      –      –      0.451  –      0.464  0.469  –      0.474
EVoIT (2021)              –      –      –      0.358  –      0.448  0.475  –      0.497
3D-RETR* (2021)           0.355  0.412  0.425  0.432  0.436  0.440  0.442  0.443  0.444
3D-C2FT                   0.371  0.424  0.443  0.452  0.458  0.468  0.476  0.477  0.479

We speculate that this issue is associated with the attention layer in the decoder being less effective for a large number of input views. Next, we also perform single-view 3D reconstruction against several benchmark models, namely 3D-R2N2, OGN (CNN-based) [25], Pix2Mesh (CNN-based) [29], OccNet (CNN-based) [15], Pix2Vox++/F, Pix2Vox++/A, LegoFormer, and 3D-RETR. For reference, we follow the protocols specified by [3] and [15] in this experiment. As shown in Table 2, 3D-C2FT attains the highest IoU value and F-score for most of the categories. By comparison, the C2F attention mechanism clearly achieves better aggregation performance even with single-view images.
Performance Comparisons with Occluded Images. In this subsection, the reconstruction performance of 3D-C2FT and the competing models is evaluated with occluded images. Here, we use the testing set from the subset of the ShapeNet dataset, following the same protocols as in the previous section. We add occlusion boxes of different sizes (20 × 20, 25 × 25, 30 × 30, 35 × 35 and 40 × 40) to the odd-numbered 2D view images, intentionally occluding essential parts of the images. Readers are referred to the supplementary materials Sect. 6.2 for details.
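The snippet below is a rough sketch of how such occlusion boxes can be pasted into a view image; the fill colour and the random placement policy are assumptions, since only the box sizes are specified above.

```python
import torch

def add_occlusion_box(image, box_size=40, fill_value=0.5):
    # image: [3, H, W] tensor in [0, 1]; fill_value and random placement are assumptions
    _, h, w = image.shape
    top = torch.randint(0, h - box_size + 1, (1,)).item()
    left = torch.randint(0, w - box_size + 1, (1,)).item()
    occluded = image.clone()
    occluded[:, top:top + box_size, left:left + box_size] = fill_value
    return occluded
```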


Fig. 2. 3D object reconstruction using 8 views (only 1 is shown) for specific categories: car, chair, lamp, rifle, table, display, and watercraft.

Table 2. Performance comparisons of single-view 3D reconstruction results for each category on the ShapeNet dataset. * refers to re-implemented models with the same settings in this study. The best score for each category is written in bold.

Category      3D-R2N2  OGN    Pix2Mesh  OccNet  Pix2Vox++/F*  Pix2Vox++/A*  3D-RETR*  3D-C2FT
Airplane      0.513    0.587  0.420     0.571   0.533         0.652         0.647     0.640
Bench         0.421    0.481  0.323     0.485   0.454         0.547         0.547     0.549
Cabinet       0.716    0.729  0.664     0.733   0.726         0.760         0.712     0.761
Car           0.798    0.816  0.552     0.737   0.801         0.831         0.818     0.832
Chair         0.466    0.483  0.396     0.501   0.482         0.522         0.511     0.525
Display       0.468    0.502  0.490     0.471   0.463         0.499         0.434     0.489
Lamp          0.381    0.398  0.323     0.371   0.427         0.445         0.436     0.468
Loudspeaker   0.662    0.637  0.599     0.647   0.690         0.695         0.632     0.697
Rifle         0.544    0.593  0.474     0.402   0.528         0.583         0.567     0.583
Sofa          0.628    0.646  0.613     0.680   0.637         0.695         0.672     0.668
Table         0.513    0.536  0.395     0.506   0.541         0.568         0.536     0.571
Telephone     0.661    0.702  0.661     0.720   0.702         0.720         0.624     0.720
Watercraft    0.513    0.632  0.397     0.530   0.514         0.570         0.547     0.558
Overall       0.560    0.596  0.480     0.571   0.585         0.623         0.607     0.629

As shown in Table 3, the proposed 3D-C2FT achieves the best IoU scores for all sizes of occlusion boxes. Among the benchmark models, LegoFormer performs second best across the various occlusion boxes. Unlike LegoFormer, which attends only to fine-grained features to explore the correlation between multi-view images, 3D-C2FT focuses on both coarse and fine-grained features, which helps the model exploit global and local interactions within the occluded regions and makes it more robust when predicting unknown elements in the multi-view images.


Table 3. Performance comparisons of 12-view 3D reconstruction on ShapeNet using different sizes of occlusion box. The best score for each size is highlighted in bold.

Metric: IoU
Model          20 × 20  25 × 25  30 × 30  35 × 35  40 × 40
Pix2Vox++/F    0.672    0.661    0.651    0.636    0.621
Pix2Vox++/A    0.698    0.695    0.690    0.685    0.678
LegoFormer     0.701    0.699    0.695    0.691    0.686
3D-RETR        0.667    0.651    0.623    0.603    0.584
3D-C2FT        0.717    0.715    0.712    0.706    0.698

Metric: F-score
Model          20 × 20  25 × 25  30 × 30  35 × 35  40 × 40
Pix2Vox++/F    0.416    0.410    0.405    0.396    0.388
Pix2Vox++/A    0.451    0.449    0.446    0.442    0.437
LegoFormer     0.453    0.451    0.448    0.443    0.439
3D-RETR        0.414    0.392    0.361    0.340    0.319
3D-C2FT        0.471    0.465    0.459    0.451    0.468

Experiment on Multi-view Real-Life Dataset. This subsection presents performance comparisons on the Multi-view Real-life dataset. All models were trained only on the ShapeNet dataset. To create our dataset, we randomly searched for images matching the 3D object categories of the ShapeNet database using the Google image search engine. All images were then manually verified to ensure they were correctly labeled. However, collecting many different input views of real images is difficult. Therefore, we manually grouped images of the same object captured from different views. The dataset is designed as a testing set that only contains 60 samples across 13 categories with three cases:
– Case I: samples with 1 to 5 views;
– Case II: samples with 6 to 11 views;
– Case III: samples with 12 or more views.
The experiment is designed to assess the reconstructed 3D objects based on human perception, since no ground-truth 3D volumes are available. In Fig. 3 and supplementary materials Sect. 6.3, we show the qualitative results for single and multi-view 3D reconstruction. It is observed that 3D-C2FT performs substantially better than the benchmark models in terms of refined 3D volume and surface quality, even in Case I. Interestingly, most benchmark models fail to reconstruct the 3D objects, or the surface is not accurately generated. This is likely because the benchmark models lack multi-scale feature extraction mechanisms, which prevents them from capturing the relevant components richly enough. Note that [35] advocates a part-by-part composition notion by means of fine-to-fine attention, which is opposed to our global-to-local attention notion. As depicted in Fig. 3, LegoFormer underperforms the proposed model. This could be because LegoFormer does not utilize the 3D structure information essential for interpreting global interactions between semantic components. As a result, it ignores the structure and orientation information of the 3D objects that guides reconstruction in challenging environments.


Fig. 3. 3D object reconstruction for Case I, Case II and Case III with the Multi-view Real-life dataset.

Table 4. Comparisons on parameter sizes of competing models.

Model          Param. (M)  Backbone     Speed per View (s)
3D-R2N2        36.0        -            -
AttSets        53.1        -            0.011
Pix2Vox++/F    114.2       ResNet50     0.010
Pix2Vox++/A    96.3        ResNet50     0.011
LegoFormer     168.4       VGG16        0.013
3D-RETR        168.0       RETR         0.012
3D-C2FT        90.1        DenseNet121  0.013

In summary, the advantage of the C2F attention mechanism is that it fully utilizes paired coarse and fine-grained features and represents the specific information flow that benefits the 3D object reconstruction task.
Computational Cost. Table 4 tabulates the number of parameters of various benchmark models. Note that the backbone refers to the pre-trained image embedding module. 3D-C2FT has fewer parameters than most of the competing models. Although 3D-R2N2 has a significantly smaller parameter size, its reconstruction performance is below par, as revealed in Table 1. Besides, we also compare reconstruction time (speed per view) in Table 4. As can be seen, 3D-C2FT offers the best balance of speed (Table 4) and performance (IoU score, Table 1).

4.3 Ablation Study

In this section, all the experiments are conducted with the ShapeNet dataset described in Sect. 4.1.
Visualization on C2F Attention. To better understand the benefit of C2F attention, we present visualization results for the three C2F patch attention blocks separately, labeled as C2F1, C2F2 and C2F3, using attention rollout [1].
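For context, attention rollout [1] aggregates the attention matrices of successive layers into a single map; a minimal sketch is given below, assuming the per-layer attention maps (shape [heads, tokens, tokens]) have already been collected from the encoder.

```python
import torch

def attention_rollout(attn_maps):
    # attn_maps: list of [heads, tokens, tokens] attention matrices, one per layer
    num_tokens = attn_maps[0].size(-1)
    rollout = torch.eye(num_tokens)
    for attn in attn_maps:
        attn = attn.mean(dim=0)                       # average over heads
        attn = attn + torch.eye(num_tokens)           # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalise rows
        rollout = attn @ rollout                      # accumulate attention flow across layers
    return rollout
```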


Fig. 4. Visualization of attention maps for each C2F patch attention block.

Figure 4 depicts the attention maps of each stage of the C2F patch attention block, which attend to semantically relevant regions in a coarse-to-fine manner. As shown in Fig. 4, the C2F1 attention is drawn to the central parts of the given images, which represent coarse or global features of the 3D objects. The C2F2 attention is drawn toward the edges of the objects, which can help the model distinguish the view orientations for reconstruction. The final block (C2F3) suggests that the attention is drawn to specific parts of the objects, which can be considered fine-grained features. The results indicate that the proposed coarse-to-fine mechanism pays attention to the relevant components of the 3D objects, which is essential for the decoder to predict and reconstruct an accurate 3D volume.
Reconstruction With and Without Refiner. In Fig. 5a and 5b, we evaluate the influence of the refiner on the 3D reconstruction results using 3D-C2FT and 3D-C2FT without the refiner (3D-C2FT-WR). We observe that 3D-C2FT achieves the best IoU score and F-score for all numbers of views. We also show several qualitative 3D reconstruction results in Fig. 5c. Without the refiner, the decoder generates defective 3D surfaces because it lacks the attention needed to rectify wrongly reconstructed parts of the 3D objects. Our analysis therefore indicates that the refiner plays a crucial role in improving 3D surface quality and can remove noise from the 3D volume reconstructed by the decoder.
Loss Functions. We also show the performance of 3D-C2FT trained with different loss functions, namely L_total, L_MSE and L_3D-SSIM, in Table 5. 3D-C2FT trained with L_total achieves the best IoU and F-score compared to either L_MSE or L_3D-SSIM alone. L_MSE minimizes the voxel grid error between the predicted and ground-truth 3D objects. In contrast, L_3D-SSIM reduces the structural difference, which helps improve the surface of the predicted 3D objects. As a result, both loss functions drive the model training towards more reliable reconstruction with better surface quality.
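A hedged sketch of such a combined objective is given below: the MSE term compares predicted occupancy with the ground-truth voxels, and the 3D-SSIM term is approximated from local statistics computed with 3D average pooling; the window size, stability constants, and equal weighting are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def ssim_3d(pred, gt, window=3, c1=0.01 ** 2, c2=0.03 ** 2):
    # pred, gt: [B, 1, D, H, W] occupancy volumes in [0, 1]
    mu_p = F.avg_pool3d(pred, window, stride=1)
    mu_g = F.avg_pool3d(gt, window, stride=1)
    var_p = F.avg_pool3d(pred * pred, window, stride=1) - mu_p ** 2
    var_g = F.avg_pool3d(gt * gt, window, stride=1) - mu_g ** 2
    cov = F.avg_pool3d(pred * gt, window, stride=1) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    return ssim.mean()

def total_loss(pred, gt, w_ssim=1.0):
    l_mse = F.mse_loss(pred, gt)
    l_ssim = 1.0 - ssim_3d(pred, gt)   # higher SSIM -> lower loss
    return l_mse + w_ssim * l_ssim     # equal weighting is an assumption
```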


Fig. 5. Performance comparisons of multi-view 3D reconstruction results on 3D-C2FT and 3D-C2FT-WR. a) IoU score; b) F-score; c) 3D reconstruction results using 8 views (only 3 are shown) between 3D-C2FT and 3D-C2FT-WR.

Table 5. Performance comparisons of multi-view 3D reconstruction results on 3D-C2FT using different loss functions. The best score for each view is written in bold.

Metric: IoU
Loss function   1      2      3      4      5      8      12     18     20
L_MSE           0.623  0.673  0.683  0.696  0.701  0.708  0.713  0.716  0.716
L_3D-SSIM       0.617  0.668  0.683  0.691  0.696  0.705  0.709  0.712  0.712
L_total         0.629  0.678  0.695  0.702  0.708  0.716  0.720  0.723  0.724

Metric: F-score
Loss function   1      2      3      4      5      8      12     18     20
L_MSE           0.369  0.421  0.437  0.451  0.457  0.466  0.471  0.474  0.476
L_3D-SSIM       0.365  0.418  0.437  0.446  0.452  0.462  0.467  0.470  0.472
L_total         0.371  0.424  0.443  0.452  0.458  0.468  0.475  0.477  0.479

5 Conclusion

This paper proposed a multi-view 3D reconstruction model that employs a C2F patch attention mechanism in the 2D encoder, a 3D decoder, and a C2F cube attention mechanism in the refiner. Our experiments showed that 3D-C2FT achieves significant improvements over several competing models on the ShapeNet dataset. A further study with the Multi-view Real-life dataset showed that 3D-C2FT is far more robust than the other models. For future work, we will consider improving the attention mechanism and the scaling of 3D-C2FT. Finally, we will explore other mechanisms to enhance the 3D reconstruction task, which we hope will facilitate practical applications in robotics, preservation of historical artifacts, and other related domains.


References 1. Abnar, S., Zuidema, W.: Quantifying attention flow in transformers. arXiv e-prints (2020). https://arxiv.org/abs/2005.00928 2. Burchfiel, B., Konidaris, G.: Bayesian eigenobjects: a unified framework for 3D robot perception. In: Robotics: Science and Systems, vol. 13 (2017) 3. Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3d object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8 38 4. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021) 5. Gao, Y., Luo, J., Qiu, H., Wu, B.: Survey of structure from motion. In: Proceedings of 2014 International Conference on Cloud Computing and Internet of Things, pp. 72–76 (2014) 6. Groen, I.I.A., Baker, C.I.: Previews scenes in the human brain: comparing 2D versus 3D representations. Neuron 101(1), 8–10 (2019) 7. Han, X.F., Laga, H., Bennamoun, M.: Image-based 3D object reconstruction: stateof-the-art and trends in the deep learning era. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1578–1604 (2021) 8. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708 (2017) 9. Jablo´ nski, S., Martyn, T.: Real-time voxel rendering algorithm based on screen space billboard voxel buffer with sparse lookup textures. In: 24th Conference on Computer Graphics, Visualization and Computer Vision, pp. 27–36 (2016) 10. Kanzler, M., Rautenhaus, M., Westermann, R.: A voxel-based rendering pipeline for large 3d line sets. IEEE Trans. Visual Comput. Graph. 25(7), 2378–2391 (2019) 11. Kar, A., H¨ ane, C., Malik, J.: Learning a multi-view stereo machine. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), pp. 364–375. Curran Associates, Inc. (2017) 12. Kargas, A., Loumos, G., Varoutas, D.: Using different ways of 3D reconstruction of historical cities for gaming purposes: the case study of Nafplio. Heritage 2(3), 1799–1811 (2019) 13. Kniaz, V.V., Knyaz, V.A., Remondino, F., Bordodymov, A., Moshkantsev, P.: Image-to-voxel model translation for 3d scene reconstruction and segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 105–124. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058571-6 7 14. Malik, J., et al.: HandVoxNet: deep voxel-based network for 3d hand shape and pose estimation from a single depth map. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7111–7120 (2020) 15. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3d reconstruction in function space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 16. Nabil, M., Saleh, F.: 3D reconstruction from images for museum artefacts: a comparative study. In: International Conference on Virtual Systems and Multimedia (VSMM), pp. 257–260. IEEE (2014)


17. Nguyen, T.Q., Salazar, J.: Transformers without tears: improving the normalization of self-attention. In: Proceedings of the 16th International Conference on Spoken Language Translation, Hong Kong (2019) 18. Park, N., Kim, S.: How do vision transformers work? In: International Conference on Learning Representations (ICLR) (2022) 19. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS), pp. 8024–8035 (2019) 20. P˘ av˘ aloiu, I.B., Vasil˘ a¸teanu, A., Goga, N., Marin, I., Ilie, C., Ungar, A., P˘ atra¸scu, I.: 3D dental reconstruction from CBCT data. In: International Symposium on Fundamentals of Electrical Engineering (ISFEE), pp. 4–9 (2014) 21. Roointan, S., Tavakolian, P., Sivagurunathan, K.S., Floryan, M., Mandelis, A., Abrams, S.H.: 3D dental subsurface imaging using enhanced truncated correlationphotothermal coherence tomography. Sci. Rep. 9(1), 1–12 (2019) 22. Shi, Q., Li, C., Wang, C., Luo, H., Huang, Q., Fukuda, T.: Design and implementation of an omnidirectional vision system for robot perception. Mechatronics 41, 58–66 (2017) 23. Shi, Z., Meng, Z., Xing, Y., Ma, Y., Wattenhofer, R.: 3D-RETR: end-to-end single and multi-view 3D reconstruction with transformers. In: British Machine Vision Conference (BMVC), pp. 1–14 (2021) 24. Silveira, G., Malis, E., Rives, P.: An efficient direct approach to visual SLAM. IEEE Trans. Rob. 24(5), 969–979 (2008) 25. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs. In: IEEE International Conference on Computer Vision (ICCV), pp. 2088–2096 (2017) 26. Tron, R., Vidal, R.: Distributed 3-D localization of camera sensor networks from 2-D image Measurements. IEEE Trans. Autom. Control 59(12), 3325–3340 (2014) 27. Vaswani, A., et al.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), vol. 30, pp. 6000–6010 (2017) 28. Wang, D., et al.: Multi-view 3D reconstruction with transformer. In: International Conference on Computer Vision (ICCV), pp. 5722–5731 (2021) 29. Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.-G.: Pixel2Mesh: generating 3d mesh models from single RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6 4 30. Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004) 31. Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 61–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9 5 32. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1912–1920 (2015) 33. Xie, H., Yao, H., Sun, X., Zhou, S., Zhang, S.: Pix2Vox: context-aware 3D reconstruction from single and multi-view images. In: IEEE International Conference on Computer Vision (ICCV), pp. 2690–2698 (2019) 34. Xie, H., Yao, H., Zhang, S., Zhou, S., Sun, W.: Pix2Vox++: multi-scale contextaware 3D object reconstruction from single and multiple images. Int. J. Comput. Vis. 128(12), 2919–2935 (2020)


35. Yagubbayli, F., Tonioni, A., Tombari, F.: LegoFormer: transformers for block-byblock multi-view 3D reconstruction. arXiv e-prints (2021). http://arxiv.org/abs/ 2106.12102 36. Yang, B., Wang, S., Markham, A., Trigoni, N.: Robust attentional aggregation of deep feature sets for multi-view 3D reconstruction. Int. J. Comput. Vis. 128(1), 53–73 (2020)

SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis

Xingyi Li1, Chaoyi Hong1, Yiran Wang1, Zhiguo Cao1, Ke Xian2(B), and Guosheng Lin2

1 Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, School of AIA, Huazhong University of Science and Technology, Wuhan, China
{xingyi li,cyhong,wangyiran,zgcao}@hust.edu.cn
2 S-lab, Nanyang Technological University, Singapore, Singapore
{ke.xian,gslin}@ntu.edu.sg

Abstract. We study the problem of novel view synthesis of objects from a single image. Existing methods have demonstrated the potential in single-view view synthesis. However, they still fail to recover the fine appearance details, especially in self-occluded areas. This is because a single view only provides limited information. We observe that man-made objects usually exhibit symmetric appearances, which introduce additional prior knowledge. Motivated by this, we investigate the potential performance gains of explicitly embedding symmetry into the scene representation. In this paper, we propose SymmNeRF, a neural radiance field (NeRF) based framework that combines local and global conditioning under the introduction of symmetry priors. In particular, SymmNeRF takes the pixel-aligned image features and the corresponding symmetric features as extra inputs to the NeRF, whose parameters are generated by a hypernetwork. As the parameters are conditioned on the image-encoded latent codes, SymmNeRF is thus scene-independent and can generalize to new scenes. Experiments on synthetic and real-world datasets show that SymmNeRF synthesizes novel views with more details regardless of the pose transformation, and demonstrates good generalization when applied to unseen objects. Code is available at: https://github.com/xingyi-li/SymmNeRF.

Keywords: Novel view synthesis · NeRF · Symmetry · HyperNetwork

1 Introduction

Novel view synthesis is a long-standing problem in computer vision and graphics [4,9,20]. The task is to synthesize novel views from a set of input views or even a single input view, which is challenging as it requires comprehensive 3D understanding [36]. Prior works mainly focus on explicit geometric representations, such as voxel grids [14,21,32,42], point clouds [1,7], and triangle


Fig. 1. Novel views from a single image synthesized by SRN [33], PixelNeRF [44] and our SymmNeRF. The competitive methods are prone to miss some texture details, especially when the pose difference between the reference view and target view is large. By contrast, SymmNeRF augmented with symmetry priors recovers more appearance details.

meshes [16,29,39]. However, these methods suffer from limited spatial resolution and representation capability because of the discrete properties. Recently, differentiable neural rendering methods [25,27,28,31,33,37] have shown great progress in synthesizing photo-realistic novel views. For example, neural radiance fields (NeRFs) [26], which implicitly encode volumetric density and color via multi-layer perceptrons (MLPs), show an impressive level of fidelity on novel view synthesis. However, these methods usually require densely captured views as input and test-time optimization, leading to poor generalization across objects and scenes. To reduce the strong dependency on dense inputs and enable better generalization to unseen objects, in this paper, we explore novel view synthesis of object categories from only a single image. Novel view synthesis from a single image is challenging, because a single view cannot provide sufficient information. Recent NeRF-based methods [13,44] learn scene priors for reconstruction by training on multiple scenes. Although they have shown the potential in single-view view synthesis, it is particularly challenging when the pose difference between the reference and target view is large (see Fig. 1). We observe that man-made objects in real world usually exhibit symmetric appearances. Based on this, a question arises: can symmetry priors benefit single-view view synthesis? To answer this question, we explore how to take advantage of symmetry priors to introduce additional information for reconstruction. To this end, we present SymmNeRF, a NeRF-based framework that is augmented by symmetry priors. Specifically, we take the pixel-aligned image features and the corresponding symmetric features as extra inputs to NeRF. This allows reasonable recovery of occluded geometry and missing texture. During training, given a set of posed input images, SymmNeRF simultaneously optimizes a convolutional neural network (CNN) encoder and a hypernetwork. The former encodes image features, and generates latent codes which represent the coarse shape and appearance of unseen objects. The latter maps specific latent codes to the weights of the neural radiance fields. Therefore, SymmNeRF is scene-independent and can generalize to unseen objects. Unlike the original NeRF [26], for a single query point, we take as input its original and symmetric pixel-aligned image features besides its


3D location and viewing direction. At the inference stage, SymmNeRF generates novel views by feed-forward prediction without test-time optimization. In the present paper, we investigate the potential performance gains by combining local and global conditioning under the introduction of the symmetry prior. To this end, we add the assumptions on the data distribution that objects are in a canonical coordinate frame. We demonstrate that such a symmetry prior can lead to significant performance gains. In summary, our main contributions are: – We propose SymmNeRF, a NeRF-based framework for novel view synthesis from a single image. By introducing symmetry priors into NeRF, SymmNeRF can synthesize high-quality novel views with fine details regardless of pose transformation. – We combine local features with global conditioning via hypernetworks and demonstrate significant performance gains. Note that we perform inference via a CNN instead of auto-decoding, i.e., without test-time optimization, which is different from SRN [33]. – Given only a single input image, SymmNeRF demonstrates significant improvement over state-of-the-art methods on synthetic and real-world datasets.

2 Related Work

Novel View Synthesis. Novel view synthesis is the task of synthesizing novel camera perspectives of a scene, given source images and their camera poses. The key challenges are understanding the 3D structure of the scene and inpainting of invisible regions of the scene [11]. The research of novel view synthesis has a long history in computer vision and graphics [4,9,20]. Pioneer works typically synthesize novel views by warping, resampling, and blending reference views to target viewpoints, which can be classified as image-based rendering methods [4]. However, they require densely captured views of the scene. When only a few observations are available, ghosting-like artifacts and holes may appear [36]. With the advancement of deep learning, a few learning-based methods have been proposed, most of which focus on explicit geometric representations such as voxel grids [14,21,32,42], point clouds [1,7], triangle meshes [16,29,39], and multiplane images (MPIs) [8,38,46]. Recent works [3,25,27,28,37] show that neural networks can be used as an implicit representation for 3D shapes and appearances. DeepSDF [28] maps continuous spatial coordinates to signed distance and proves the superiority of neural implicit functions. SRN [33] proposes to represent 3D shapes and appearances implicitly as a continuous, differentiable function that maps a spatial coordinate to the local features of the scene properties at that point. Recently, NeRF [26] shows astonishing results for novel view synthesis, which is an implicit MLPbased model that maps 3D coordinates plus 2D viewing directions to opacity and color values. However, NeRF requires enormous posed images and must be independently optimized for every scene. PixelNeRF [44] tries to address this


issue by conditioning NeRF on image features, which are extracted by an image encoder. This enables its ability to render novel views from only a single image and its generalization to new scenes. Rematas et al. [30] propose ShaRF, a generative model aiming at estimating neural representation of objects from a single image, combining the benefits of explicit and implicit representations, which is capable of generalizing to unseen objects. CodeNeRF [13] learns to disentangle shape and texture by learning separate embeddings from a single image, allowing single view reconstruction and shape/texture editing. However, these methods usually struggle to synthesize reasonable novel views from a single image when self-occlusion occurs. In contrast, SymmNeRF first estimates coarse representations and then takes reflection symmetry as prior knowledge to inpaint invisible regions. This allows reasonable recovery of occluded geometry and missing texture. HyperNetworks. A hypernetwork [10] refers to a small network that is trained to predict the weights of a large network, which has the potential to generalize previous knowledge and adapt quickly to novel environments. Various methods resort to hypernetworks in 3D vision. Littwin et al. [22] recover shape from a single image using hypernetworks in an end-to-end manner. SRN [33] utilizes hypernetworks for single-shot novel view synthesis with neural fields. In this work, we condition the parameters of NeRF on the image-encoded latent codes via the hypernetwork, which allows SymmNeRF to be scene-independent and generalize to new scenes. Reflection Symmetry. Reflection symmetry plays a significant role in the human visual system and has already been exploited in the computer vision community. Wu et al. [41] infer 3D deformable objects given only a single image, using a symmetric structure to disentangle depth, albedo, viewpoint and illumination. Ladybird [43] assigns occluded points with features from their symmetric points based on the reflective symmetry of the object, allowing recovery of occluded geometry and texture. NeRD [47] learns a neural 3D reflection symmetry detector, which can estimate the normal vectors of objects’ mirror planes. They focus on the task of detecting the 3D reflection symmetry of a symmetric object from a 2D image. In this work, we focus on exploring the advantages of explicitly embedding symmetry into the scene representation for single-view view synthesis.

3 SymmNeRF

3.1 Overview

Here we present an overview of our proposed method. We propose to firstly estimate holistic representations as well as symmetry planes, followed by fulfilling details, and to explicitly inject symmetry priors into single-view view synthesis of object categories. In particular, we design SymmNeRF to implement the ideas above. Figure 2 shows the technical pipeline of SymmNeRF.


Fig. 2. An overview of our SymmNeRF. Given a reference view, we first encode holistic representations by estimating the latent code z through the image encoder network f. We then obtain the pixel-aligned feature F(π(X)) and the symmetric feature F(π(X′)) by projecting the query point X and the symmetric point X′ to their 2D locations on the image plane using the camera parameters, followed by bilinear interpolation of the pixel-wise features on the feature volume F. The hypernetwork transforms the latent code z into the weights θ of the corresponding NeRF Φ. For a query point X along a target camera ray with viewing direction d, the NeRF takes the spatial coordinate X, ray direction d, pixel-aligned feature F(π(X)) and symmetric feature F(π(X′)) as input, and outputs the color and density.

Given a set of M instances of training data D = {C_j}_{j=1}^{M}, where C_j = {(I_i^j, E_i^j, K_i^j)}_{i=1}^{N} is the dataset of one object instance, I_i^j ∈ R^{H×W×3} refers to an input image, E_i^j = [R|t] ∈ R^{3×4} and K_i^j ∈ R^{3×3} are the corresponding extrinsic and intrinsic camera matrices respectively, and N denotes the number of input images, SymmNeRF first encodes a holistic representation and regresses the symmetry plane for the input view. We then extract symmetric features along with pixel-aligned features to preserve the fine-grained details observed in the input view. Subsequently, we transform the holistic representation into the weights of the corresponding NeRF and inject symmetry priors (i.e., symmetric features as well as pixel-aligned features) to predict colors and densities. Finally, we adopt the classic volume rendering technique [15,26] to synthesize novel views.

3.2 Encoding Holistic Representations

Humans usually understand 3D shapes and appearances by first generating a profile, then restoring details from observations. Emulating the human visual system, we implement coarse depictions of objects with an image encoder network. The image encoder network f is responsible for mapping the input image I_i into the latent code z_i, which characterizes holistic information about the object's shape and appearance:

f : R^{H×W×3} → R^k,  I_i ↦ f(I_i) = z_i,    (1)


where k is the dimension of z_i, and the parameters of f are denoted by Ω. Here, we denote the feature volume extracted by f during the encoding of holistic representations as F (i.e., the concatenation of upsampled feature maps output by the ResNet blocks). The image encoder network contains the four ResNet blocks of ResNet-34 [12], followed by an average pooling layer and a fully connected layer.
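A minimal sketch of such an encoder is shown below, assuming a latent dimension of 256 and bilinear upsampling for building the feature volume; both choices are illustrative assumptions rather than the exact configuration used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf
import torchvision

class ImageEncoder(nn.Module):
    # Sketch of the encoder f: four ResNet-34 blocks yield the feature volume F;
    # average pooling plus a fully connected layer yield the latent code z.
    def __init__(self, latent_dim=256):  # latent_dim (k) is an assumed value
        super().__init__()
        backbone = torchvision.models.resnet34(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.blocks = nn.ModuleList([backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])
        self.fc = nn.Linear(512, latent_dim)

    def forward(self, image):  # image: [B, 3, H, W]
        x = self.stem(image)
        features = []
        for block in self.blocks:
            x = block(x)
            features.append(x)
        # upsample every block output to a common resolution and concatenate -> feature volume F
        size = features[0].shape[-2:]
        volume = torch.cat(
            [nnf.interpolate(f, size=size, mode='bilinear', align_corners=False) for f in features], dim=1)
        z = self.fc(torch.flatten(nnf.adaptive_avg_pool2d(x, 1), 1))  # holistic latent code z
        return volume, z
```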

3.3 Extracting Symmetric Features

The holistic representations introduced in the previous section coarsely describe objects. To synthesize detailed novel views, we follow PixelNeRF [44] and adopt pixel-aligned image features to compensate for the details. However, simply using pixel-aligned image features ignores the underlying 3D structure. In contrast, humans can infer the 3D shape and appearance from a single image, despite the information loss and self-occlusion that occur during image capture. This boils down to the fact that humans usually resort to prior knowledge, e.g., symmetry. Motivated by the above observation, we propose to inpaint invisible regions and alleviate the ill-posedness of novel view synthesis from a single image via symmetry priors. In the following, we briefly introduce the properties of 3D reflection symmetry [47], followed by how symmetry priors are applied.
3D Reflection Symmetry. When two points on an object's surface are symmetric, they share identical surface properties of the object. Formally, we define the symmetry regarding a rigid transformation M ∈ R^{4×4} as

∀X ∈ S:  MX ∈ S,  F(X) = F(MX),    (2)

where X is the homogeneous coordinate of a point on the object's surface, S ⊂ R^4 is the set of points on the surface of the object, MX is the symmetric point of X, and F(·) stands for the surface properties at a given point. The 2D projections x = [x, y, 1, 1/d]^T and x′ = [x′, y′, 1, 1/d′]^T of two 3D points X, X′ ∈ S satisfy

x = K R_t X / d,   x′ = K R_t X′ / d′,    (3)

where K ∈ R^{4×4} and R_t ∈ R^{4×4} are respectively the camera intrinsic matrix and extrinsic matrix. The latter transforms coordinates from the world coordinate system to the camera coordinate system. d, d′ are the depths in the camera space. When these two points are symmetric w.r.t. a rigid transformation, i.e., X′ = MX, the following constraint can be derived:

x′ = (d/d′) K R_t M R_t^{−1} K^{−1} x,    (4)

where x and x′ are the 2D projections of these two 3D points. This suggests that, given a 2D projection x, we can obtain its symmetric counterpart x′ if M and


camera parameters are known. In this paper, we focus on exploring the benefits of explicitly embedding symmetry into our representation. To this end, we add the assumptions on the data distribution that objects are in a canonical coordinate frame and that their symmetry axis is known.
Applying Symmetry Prior. To inpaint invisible regions, we apply the symmetry property introduced above and extract symmetric features F(π(X′)). This is achieved by projecting the symmetric point X′ to the 2D location x′ on the image plane using the camera parameters, followed by bilinear interpolation of the pixel-wise features on the feature volume F extracted by the image encoder network f. In addition, we follow PixelNeRF [44] and adopt pixel-aligned features F(π(X)), which are obtained in the same way as the symmetric features. They are subsequently concatenated to form the local image features corresponding to X.
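The following sketch illustrates this feature extraction, assuming the symmetry plane is x = 0 in the canonical frame and bilinear sampling on the feature volume; the tensor layout and the plane choice are assumptions for illustration only.

```python
import torch
import torch.nn.functional as nnf

def pixel_aligned_features(feature_volume, points, K, Rt, image_hw, sym_axis=0):
    # feature_volume: [B, C, Hf, Wf]; points: [B, N, 3] query points X in a canonical frame
    # K: [B, 3, 3] intrinsics; Rt: [B, 3, 4] world-to-camera extrinsics; image_hw: (H, W) of the input view
    H, W = image_hw

    def sample(pts):
        cam = torch.einsum('bij,bnj->bni', Rt[:, :, :3], pts) + Rt[:, :, 3].unsqueeze(1)
        pix = torch.einsum('bij,bnj->bni', K, cam)
        xy = pix[..., :2] / pix[..., 2:3].clamp(min=1e-8)            # project to pixel coordinates
        grid = torch.stack([2 * xy[..., 0] / (W - 1) - 1,            # normalise to [-1, 1] for grid_sample
                            2 * xy[..., 1] / (H - 1) - 1], dim=-1)
        feats = nnf.grid_sample(feature_volume, grid.unsqueeze(2),
                                mode='bilinear', align_corners=True)
        return feats.squeeze(-1).transpose(1, 2)                     # [B, N, C]

    sym_points = points.clone()
    sym_points[..., sym_axis] = -sym_points[..., sym_axis]           # reflect X across the assumed plane x = 0
    return torch.cat([sample(points), sample(sym_points)], dim=-1)   # concatenated local features
```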

3.4 Injecting Symmetry Prior into NeRF

In this section, we inject symmetry priors into the neural radiance field for single-view view synthesis. Technically, the weights of the neural radiance field are conditioned on the latent code z_i introduced in Sect. 3.2, which represents a coarse but holistic depiction of the object. To preserve fine-grained details, during the color and density prediction we also take the pixel-aligned image features and the corresponding symmetric features as extra inputs to fill in details observed in the input view.
Generating Neural Radiance Fields. We generate a specific neural radiance field by mapping a latent code z_i to the weights θ_i of the neural radiance field using the hypernetwork Ψ, which can be defined as follows:

Ψ : R^k → R^l,  z_i ↦ Ψ(z_i) = θ_i,    (5)

where l stands for the dimension of the parameter space of neural radiance fields. We parameterize Ψ as an MLP with parameters ψ. This can be interpreted as a simulation of the human visual system. Specifically, humans first estimate the holistic shape and appearance of an unseen object when given a single image, then formulate a sketch in their mind to represent the object. Similarly, SymmNeRF encodes the overall information of the object as a latent code from a single image, followed by generating a corresponding neural radiance field to describe the object.
Color and Density Prediction. Given a reference image with known camera parameters, for a single query point location X ∈ R^3 on a ray r with unit-length viewing direction d ∈ R^3, SymmNeRF predicts the color and density at that point in 3D space, which is defined as:

Φ : R^{m_X} × R^{m_d} × R^{2n} → R^3 × R,
(γ_X(X), γ_d(d), F(π(X)), F(π(X′))) ↦ Φ(γ_X(X), γ_d(d), F(π(X)), F(π(X′))) = (c, σ),    (6)


where Φ represents a neural radiance field, an MLP network whose weights are given by the hypernetwork Ψ, X′ ∈ R^3 is the corresponding symmetric 3D point of X, γ_X(·) and γ_d(·) are positional encoding functions for spatial locations and viewing directions, n, m_X and m_d are respectively the dimensions of the pixel-aligned features (symmetric features), γ_X(X) and γ_d(d), π denotes the process of projecting the 3D point onto the image plane using the known intrinsics, and F is the feature volume extracted by the image encoder network f.
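As an illustration of the hypernetwork conditioning, the sketch below predicts the weights of a single linear layer of the radiance field from the latent code z; the layer sizes are assumptions, and a full model would generate every layer of Φ in this way.

```python
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    # A hypernetwork-generated linear layer: the weights of this NeRF layer are
    # predicted from the latent code z instead of being learned directly (a sketch).
    def __init__(self, latent_dim, in_features, out_features, hidden=256):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.hyper = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_features * out_features + out_features))

    def forward(self, x, z):
        # x: [N, in_features] NeRF activations; z: [latent_dim] latent code of the object
        params = self.hyper(z)
        w = params[: self.in_features * self.out_features].view(self.out_features, self.in_features)
        b = params[self.in_features * self.out_features:]
        return torch.nn.functional.linear(x, w, b)
```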

3.5 Volume Rendering

To render the color of a ray r passing through the scene, we first compute its camera ray r using the camera parameters and sample K points {X_k}_{k=1}^{K} along the camera ray r between near and far bounds, and then perform classical volume rendering [15,26]:

C̃(r) = Σ_{k=1}^{K} T_k (1 − exp(−σ_k δ_k)) c_k,    (7)

where

T_k = exp( − Σ_{j=1}^{k−1} σ_j δ_j ),    (8)

where c_k and σ_k denote the color and density of the k-th sample on the ray, respectively, and δ_k = ||X_{k+1} − X_k||_2 is the interval between adjacent samples.
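A minimal sketch of this volume rendering step (Eqs. (7)-(8)) in PyTorch is given below; the per-ray tensor layout is an assumption for illustration.

```python
import torch

def volume_render(colors, sigmas, deltas):
    # colors: [R, K, 3]; sigmas, deltas: [R, K] densities and sample spacings along each ray
    alphas = 1.0 - torch.exp(-sigmas * deltas)                                  # opacity of each sample
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)                         # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)   # T_k uses samples before k
    weights = trans * alphas
    return (weights.unsqueeze(-1) * colors).sum(dim=1)                          # rendered pixel colors [R, 3]
```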

3.6 Training

To summarize, given a set of M instances of training data D = {C_j}_{j=1}^{M}, where C_j = {(I_i^j, E_i^j, K_i^j)}_{i=1}^{N} is the dataset of one object instance, we optimize SymmNeRF to minimize the rendering error of observed images:

min_{Ω,ψ} Σ_{j=1}^{M} Σ_{i=1}^{N} L(I_i^j, E_i^j, K_i^j; Ω, ψ),    (9)

L = Σ_{r∈R} || C̃(r) − C(r) ||_2^2,    (10)

where Ω and ψ are respectively the parameters of the image encoder network f and the hypernetwork Ψ, R is the set of camera rays passing through image pixels, and C(r) denotes the ground truth pixel color.
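For illustration, a minimal training-step sketch implementing the per-ray photometric objective of Eqs. (9)-(10) is given below; the render_rays interface and the ray-batch format are hypothetical placeholders, and the loss is averaged over the ray batch rather than summed.

```python
import torch

def training_step(model, optimizer, rays, target_rgb):
    # rays: hypothetical ray batch [R, 8] (origins, directions, near/far); target_rgb: [R, 3]
    optimizer.zero_grad()
    pred_rgb = model.render_rays(rays)                           # hypothetical rendering interface
    loss = ((pred_rgb - target_rgb) ** 2).sum(dim=-1).mean()     # photometric error over the ray batch
    loss.backward()
    optimizer.step()
    return loss.item()
```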

4 Experiments

4.1 Datasets

Synthetic Renderings. We evaluate our approach on the synthetic ShapeNet benchmark [2] for single-shot reconstruction. 1) We mainly focus on the


Table 1. Quantitative comparisons against state-of-the-art methods on "Cars" and "Chairs" classes of the ShapeNet-SRN dataset. The best performance is in bold, and the second best is underlined.

Methods                     Chairs           Cars             Average
                            PSNR↑   SSIM↑    PSNR↑   SSIM↑    PSNR↑   SSIM↑
GRF [37] (ICCV'21)          21.25   0.86     20.33   0.82     20.79   0.84
TCO [35] (ECCV'16)          21.27   0.88     -       -        -       -
dGQN [6] (Science'18)       21.59   0.87     -       -        -       -
ENR [5] (ICML'20)           22.83   -        22.26   -        22.55   -
SRN [33] (NeurIPS'19)       22.89   0.89     22.25   0.89     22.57   0.89
PixelNeRF [44] (CVPR'21)    23.72   0.91     23.17   0.90     23.45   0.91
ShaRF [30] (ICML'21)        23.37   0.92     22.53   0.90     22.90   0.91
CodeNeRF [13] (ICCV'21)     23.66   0.90     23.80   0.91     23.74   0.91
Ours                        24.32   0.92     23.44   0.91     23.88   0.92

ShapeNet-SRN dataset, following the same protocol adopted in [33]. This dataset includes two object categories: 3,514 "Cars" and 6,591 "Chairs". The train/test split is predefined across object instances. There are 50 views per object instance in the training set. For testing, 251 novel views on an Archimedean spiral are used for each object instance in the test set. All images are at 128 × 128 pixels; 2) Similar to PixelNeRF [44], we also test our method on the ShapeNet-NMR dataset [17] under two settings: category-agnostic single-view reconstruction and generalization to unseen categories, following [17,23,27]. This dataset contains the 13 largest categories of ShapeNet and provides 24 fixed elevation views for each object instance. All images are of 64 × 64 resolution.
Real-World Renderings. We also generalize our model, trained only on the ShapeNet-SRN dataset, directly to two complex real-world datasets. One is the Pix3D [34] dataset, containing various image-shape pairs with 2D-3D alignment. The other is the Stanford Cars [19] dataset, which contains real images of 196 classes of cars. All images of the two datasets are cropped and resized to 128 × 128 pixels during testing.

4.2 Implementation Details

SymmNeRF is trained using the AdamW optimizer [18,24]. The learning rate follows the warmup strategy [12]: it grows linearly from 0 to 1 × 10−4 during the first 2k iterations and then decays exponentially towards 0 over the rest of the optimization. The network parameters are updated for around 400–500k iterations. We use a batch size of 4 objects and a ray batch size of 256, with 64 samples per ray. Experiments are conducted on 2 NVIDIA GeForce RTX 3090 GPUs.
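A sketch of such a schedule with PyTorch's LambdaLR is shown below; the exponential decay factor and the placeholder module are assumptions, since only the warm-up length and the initial rate are specified above.

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 4)  # placeholder module standing in for SymmNeRF's parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def warmup_then_decay(step, warmup_steps=2000, decay=0.99999):
    # linear warm-up to the base rate, then exponential decay (decay factor is an assumption)
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return decay ** (step - warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_decay)
```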

4.3 Comparisons

Here we compare SymmNeRF against the existing state-of-the-art methods, among which CodeNeRF, SRN and ShaRF require test-time optimization at inference, and ShaRF entails 3D ground-truth voxel grids besides 2D supervision. To evaluate the quality of renderings, we adopt two standard image quality metrics: the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) [40].


Fig. 3. Qualitative comparisons on “Cars” and “Chairs” classes. SymmNeRF can produce high-quality renderings with fine-grained details, proper geometry and reasonable appearance.


Fig. 4. Qualitative comparisons with PixelNeRF [44] on real-world Pix3D [34] and Stanford Cars [19] datasets. Compared with PixelNeRF, SymmNeRF yields better generalization.

We also include LPIPS [45] in all experiments except on the ShapeNet-SRN dataset. A better method yields higher PSNR and SSIM and lower LPIPS. Please refer to the supplementary material for more visualization.
Evaluations on the ShapeNet-SRN Dataset. In general, as shown in Table 1, our method outperforms or is at least on par with state-of-the-art methods. For the "Cars" category, SymmNeRF outperforms its competitors including PixelNeRF, SRN and ShaRF, and achieves comparable performance with CodeNeRF. Note that our SymmNeRF solves a much harder problem than SRN and CodeNeRF. In particular, SymmNeRF directly infers the unseen object representation in a single forward pass, while SRN and CodeNeRF need to be retrained on all new objects to optimize the latent codes. In addition, we observe that most cars from the "Cars" category share similar 3D shapes and simple textures. As a result, the experiment on the "Cars" category is in favor of CodeNeRF. In contrast, for the "Chairs" category, SymmNeRF significantly outperforms all baselines across all metrics by a large margin. This result implies that our model can generalize well to new objects, as the shapes and textures of chairs in the "Chairs" category vary considerably. This implies that SymmNeRF indeed captures the underlying 3D structure of objects with the help of symmetry priors and the hypernetwork, rather than simply exploiting data biases. Here we compare SymmNeRF qualitatively with SRN and PixelNeRF in Fig. 3. One can observe that: i) SRN is prone to generate overly smooth renderings and is unable to capture the accurate geometry, leading to some distortions; ii) PixelNeRF performs well when the query view is close to the reference one, but fails to recover the details invisible in the reference, especially when the rendered view is far from the reference; iii) SymmNeRF, by contrast, can synthesize photo-realistic, reasonable novel views with fine-grained details close to the ground truths.


Fig. 5. Novel view synthesis on “Cars” and “Chairs” of ShapeNet-SRN dataset.

We further demonstrate the high-quality results of SymmNeRF by providing more novel view synthesis visualization in Fig. 5. As can be seen, SymmNeRF can always synthesize photo-realistic and reasonable novel renderings from totally different viewpoints.
Generalization on Real-World Datasets. To further investigate the generalization of SymmNeRF, we evaluate SymmNeRF on two real-world datasets, i.e., Pix3D [34] and Stanford Cars [19]. Note that, for lack of ground truth, we only show qualitative results on the two datasets. Here we apply SymmNeRF trained on the synthetic chairs and cars directly to the real-world images without any finetuning. As shown in Fig. 4, PixelNeRF [44] fails to synthesize reasonable novel views, because it only exploits pixel-aligned image features and ignores the underlying 3D structure the reference view provides. Compared with PixelNeRF [44], SymmNeRF can effectively infer the geometry and appearance of real-world chairs and cars. Please also note that there are no camera poses for the real-world objects from Pix3D and Stanford Cars. Our model assumes that objects are at the center of the canonical space and, once trained, can estimate camera poses for each reference view similar to CodeNeRF [13].
Evaluations on the ShapeNet-NMR Dataset. Although the experimental results on two common categories have demonstrated that including symmetry is simple yet effective, we further explore our approach on the ShapeNet-NMR dataset under two settings: category-agnostic single-view reconstruction and generalization to unseen categories. 1) Category-agnostic single-view reconstruction: only a single model is trained across the 13 largest categories of ShapeNet. We show in Table 2 and Fig. 6 that SymmNeRF outperforms other state-of-the-art methods [23,27,33,44]. This also implies that symmetry priors benefit the reconstruction of almost all symmetric objects; 2) Generalization to unseen categories: we reconstruct ShapeNet categories which are not involved in training.


Table 2. Quantitative comparisons against state-of-the-art methods on the 13 largest categories of the ShapeNet-NMR dataset.

Metric: PSNR↑
Methods    plane  bench  cbnt.  car    chair  disp.  lamp   spkr.  rifle  sofa   table  phone  boat   mean
DVR        25.29  22.64  24.47  23.95  19.91  20.86  23.27  20.78  23.44  22.35  21.53  24.18  25.09  22.70
SRN        26.62  22.20  23.42  24.40  21.85  19.07  22.17  21.04  24.95  23.65  22.45  20.87  25.86  23.28
PixelNeRF  29.76  26.35  27.72  27.58  23.84  24.22  28.58  24.44  30.60  26.94  25.59  27.13  29.18  26.80
Ours       30.57  27.44  29.34  27.87  24.29  24.90  28.98  25.14  30.64  27.70  27.16  28.27  29.71  27.57

Metric: SSIM↑
Methods    plane  bench  cbnt.  car    chair  disp.  lamp   spkr.  rifle  sofa   table  phone  boat   mean
DVR        0.905  0.866  0.877  0.909  0.787  0.814  0.849  0.798  0.916  0.868  0.840  0.892  0.902  0.860
SRN        0.901  0.837  0.831  0.897  0.814  0.744  0.801  0.779  0.913  0.851  0.828  0.811  0.898  0.849
PixelNeRF  0.947  0.911  0.910  0.942  0.858  0.867  0.913  0.855  0.968  0.908  0.898  0.922  0.939  0.910
Ours       0.955  0.925  0.922  0.945  0.865  0.875  0.917  0.862  0.970  0.915  0.917  0.929  0.943  0.919

Metric: LPIPS↓
Methods    plane  bench  cbnt.  car    chair  disp.  lamp   spkr.  rifle  sofa   table  phone  boat   mean
DVR        0.095  0.129  0.125  0.098  0.173  0.150  0.172  0.170  0.094  0.119  0.139  0.110  0.116  0.130
SRN        0.111  0.150  0.147  0.115  0.152  0.197  0.210  0.178  0.111  0.129  0.135  0.165  0.134  0.139
PixelNeRF  0.084  0.116  0.105  0.095  0.146  0.129  0.114  0.141  0.066  0.116  0.098  0.097  0.111  0.108
Ours       0.062  0.085  0.068  0.082  0.120  0.104  0.096  0.108  0.054  0.086  0.067  0.068  0.089  0.084

Fig. 6. Qualitative comparisons on the ShapeNet-NMR dataset under the category-agnostic single-view reconstruction setting.

The results in Table 3 and Fig. 7 suggest that our method performs comparably to PixelNeRF. This means that our method can also handle out-of-distribution categories with the help of symmetry priors.
Asymmetric Objects. As shown in Fig. 6 (Row 2), our method can also deal with objects that are not perfectly symmetric. This is because a few asymmetric objects are also included in the training dataset. Our model can perceive and recognize asymmetry thanks to the global latent code and hypernetwork. SymmNeRF therefore adaptively chooses to utilize local features to reconstruct asymmetric objects.

4.4 Ablation Study

To validate the design choice of SymmNeRF, we conduct ablation studies on the synthetic “Chairs” and “Cars” from the ShapeNet-SRN dataset. Table 4 shows the results corresponding to the effectiveness of the pixel-aligned, symmetric features and the hypernetwork. One can observe: i) The symmetry priors injection benefits novel view synthesis. Compared with (b) in average performance, our full model (c) with the symmetric priors injection yields a relative improvement of 6.0% PSNR and 2.3% SSIM. This finding highlights the importance of the


Table 3. Quantitative comparisons against state-of-the-art methods on 10 unseen categories of the ShapeNet-NMR dataset. The models are trained on only planes, cars and chairs.

Metric: PSNR↑
Methods    bench  cbnt.  disp.  lamp   spkr.  rifle  sofa   table  phone  boat   mean
DVR        18.37  17.19  14.33  18.48  16.09  20.28  18.62  16.20  16.84  22.43  17.72
SRN        18.71  17.04  15.06  19.26  17.06  23.12  18.76  17.35  15.66  24.97  18.71
PixelNeRF  23.79  22.85  18.09  22.76  21.22  23.68  24.62  21.65  21.05  26.55  22.71
Ours       23.87  21.36  16.83  22.68  19.98  23.77  25.10  21.10  20.48  26.80  22.36

Metric: SSIM↑
Methods    bench  cbnt.  disp.  lamp   spkr.  rifle  sofa   table  phone  boat   mean
DVR        0.754  0.686  0.601  0.749  0.657  0.858  0.755  0.644  0.731  0.857  0.716
SRN        0.702  0.626  0.577  0.685  0.633  0.875  0.702  0.617  0.635  0.875  0.684
PixelNeRF  0.863  0.814  0.687  0.818  0.778  0.899  0.866  0.798  0.801  0.896  0.825
Ours       0.873  0.780  0.663  0.824  0.751  0.902  0.881  0.792  0.802  0.909  0.823

Metric: LPIPS↓
Methods    bench  cbnt.  disp.  lamp   spkr.  rifle  sofa   table  phone  boat   mean
DVR        0.219  0.257  0.306  0.259  0.266  0.158  0.196  0.280  0.245  0.152  0.240
SRN        0.282  0.314  0.333  0.321  0.289  0.175  0.248  0.315  0.324  0.163  0.280
PixelNeRF  0.164  0.186  0.271  0.208  0.203  0.141  0.157  0.188  0.207  0.148  0.182
Ours       0.126  0.174  0.251  0.184  0.185  0.121  0.115  0.163  0.178  0.111  0.155

Fig. 7. Qualitative visualization on the ShapeNet-NMR dataset under the generalization to unseen categories setting.

symmetry priors on novel view synthesis when only a single image is provided; ii) The hypernetwork matters. Compared with our full model (c), the rendering quality of (d) deteriorates if we do not adopt the hypernetwork. This may lie in the fact that simply conditioning on local features ignores the underlying 3D structure of objects. In contrast, combining local and global conditioning via the hypernetwork module not only enables the recovery of rendering details, but also improves generalization to unseen objects in a coarse-to-fine manner. We also visualize the comparative results in Fig. 8. The baseline model (a) tends to render smoothly. Simply using pixel-aligned image features (b) still fails to fully capture the 3D structure. In contrast, our full model (c) reproduces photo-realistic details from most viewpoints. The rendering quality of (d) deteriorates as the hypernetwork is not adopted. We emphasize that only the model including both the symmetry priors and the hypernetwork accurately recovers the geometry and texture details despite the occlusions.


Table 4. Ablation study on each component of SymmNeRF. Components per configuration (following the discussion in the text): (a) image encoder network + hypernetwork; (b) (a) + pixel-aligned local features; (c) (a) + local features + symmetric features (full model); (d) image encoder network + local features + symmetric features, without the hypernetwork.

Chairs
Config   PSNR(Δ)↑        SSIM(Δ)↑
(a)      21.26 (−)       0.87 (−)
(b)      23.09 (8.6%)    0.91 (4.6%)
(c)      24.32 (14.4%)   0.92 (5.7%)
(d)      19.76 (−7.1%)   0.85 (−2.3%)

Cars
Config   PSNR(Δ)↑        SSIM(Δ)↑
(a)      20.65 (−)       0.87 (−)
(b)      22.15 (7.3%)    0.89 (2.3%)
(c)      23.44 (13.5%)   0.91 (4.6%)
(d)      21.15 (2.4%)    0.86 (−1.2%)

Average
Config   PSNR(Δ)↑        SSIM(Δ)↑
(a)      20.96 (−)       0.87 (−)
(b)      22.62 (7.9%)    0.90 (3.4%)
(c)      23.88 (13.9%)   0.92 (5.7%)
(d)      20.46 (−2.4%)   0.86 (−1.1%)

Fig. 8. Qualitative evaluation of different configurations on ShapeNet-SRN.

5 Conclusion

Existing methods [13,44] fail to synthesize fine appearance details of objects, especially when the target view is far away from the reference view. They focus on learning scene priors, but ignore fully exploring the attributes of objects, e.g., symmetry. In this paper, we investigate the potential performance gains of explicitly injecting symmetry priors into the scene representation. In particular, we combine hypernetworks [33] with local conditioning [31,37,44], embedded with the symmetry prior. Experimental results demonstrate that such a symmetry prior can boost our model to synthesize novel views with more details regardless of the pose transformation, and show good generalization when applied to unseen objects. Acknowledgement. This work is supported in part by the National Natural Science Foundation of China (Grant No. U1913602). This study is also supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).


References 1. Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3D point clouds. In: International Conference on Machine Learning, pp. 40–49. PMLR (2018) 2. Chang, A.X., et al.: Shapenet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015) 3. Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019) 4. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry-and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 11–20 (1996) 5. Dupont, E., Martin, M.B., Colburn, A., Sankar, A., Susskind, J., Shan, Q.: Equivariant neural rendering. In: International Conference on Machine Learning, pp. 2761–2770. PMLR (2020) 6. Eslami, S.A., et al.: Neural scene representation and rendering. Science 360(6394), 1204–1210 (2018) 7. Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3d object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613 (2017) 8. Flynn, J., et al.: Deepview: view synthesis with learned gradient descent. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2367–2376 (2019) 9. Gortler, S.J., Grzeszczuk, R., Szeliski, R., Cohen, M.F.: The lumigraph. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 43–54 (1996) 10. Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016) 11. H¨ ani, N., Engin, S., Chao, J.J., Isler, V.: Continuous object representation networks: novel view synthesis without target view supervision. arXiv preprint arXiv:2007.15627 (2020) 12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 13. Jang, W., Agapito, L.: CodeNerf: disentangled neural radiance fields for object categories. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12949–12958 (2021) 14. Jimenez Rezende, D., Eslami, S., Mohamed, S., Battaglia, P., Jaderberg, M., Heess, N.: Unsupervised learning of 3D structure from images. Adv. Neural. Inf. Process. Syst. 29, 4996–5004 (2016) 15. Kajiya, J.T., Von Herzen, B.P.: Ray tracing volume densities. ACM SIGGRAPH Comput. Graph. 18(3), 165–174 (1984) 16. Kanazawa, A., Tulsiani, S., Efros, A.A., Malik, J.: Learning category-specific mesh reconstruction from image collections. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 386–402. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0 23 17. Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)


18. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) 19. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for finegrained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013) 20. Levoy, M., Hanrahan, P.: Light field rendering. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 31–42 (1996) 21. Liao, Y., Donne, S., Geiger, A.: Deep marching cubes: learning explicit surface representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2916–2925 (2018) 22. Littwin, G., Wolf, L.: Deep meta functionals for shape representation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1824– 1833 (2019) 23. Liu, S., Li, T., Chen, W., Li, H.: Soft rasterizer: a differentiable renderer for imagebased 3D reasoning. In: The IEEE International Conference on Computer Vision (ICCV), October 2019 24. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2018) 25. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460– 4470 (2019) 26. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 405–421. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058452-8 24 27. Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Differentiable volumetric rendering: learning implicit 3d representations without 3D supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3504–3515 (2020) 28. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165– 174 (2019) 29. Ranjan, A., Bolkart, T., Sanyal, S., Black, M.J.: Generating 3D faces using convolutional mesh autoencoders. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 725–741. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9 43 30. Rematas, K., Martin-Brualla, R., Ferrari, V.: Sharf: shape-conditioned radiance fields from a single view. In: ICML (2021) 31. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2304–2314 (2019) 32. Sitzmann, V., Thies, J., Heide, F., Nießner, M., Wetzstein, G., Zollhofer, M.: Deepvoxels: learning persistent 3D feature embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2437– 2446 (2019)


33. Sitzmann, V., Zollhoefer, M., Wetzstein, G.: Scene representation networks: Continuous 3D-structure-aware neural scene representations. Adv. Neural. Inf. Process. Syst. 32, 1121–1132 (2019) 34. Sun, X., et al.: Pix3D: dataset and methods for single-image 3D shape modeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2974–2983 (2018) 35. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Multi-view 3D models from single images with a convolutional network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 322–337. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7 20 36. Tewari, A., et al.: State of the art on neural rendering. In: Computer Graphics Forum, vol. 39, pp. 701–727. Wiley Online Library (2020) 37. Trevithick, A., Yang, B.: GRF: learning a general radiance field for 3D representation and rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15182–15192, October 2021 38. Tucker, R., Snavely, N.: Single-view view synthesis with multiplane images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 551–560 (2020) 39. Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.-G.: Pixel2Mesh: generating 3D mesh models from single RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6 4 40. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004) 41. Wu, S., Rupprecht, C., Vedaldi, A.: Unsupervised learning of probably symmetric deformable 3D objects from images in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1–10 (2020) 42. Xie, H., Yao, H., Sun, X., Zhou, S., Zhang, S.: Pix2vox: context-aware 3D reconstruction from single and multi-view images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2690–2698 (2019) 43. Xu, Y., Fan, T., Yuan, Y., Singh, G.: Ladybird: Quasi-Monte Carlo sampling for deep implicit field based 3D reconstruction with symmetry. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 248–263. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8 15 44. Yu, A., Ye, V., Tancik, M., Kanazawa, A.: PixelNerf: neural radiance fields from one or few images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4578–4587 (2021) 45. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018) 46. Zhou, T., Tucker, R., Flynn, J., Fyffe, G., Snavely, N.: Stereo magnification: learning view synthesis using multiplane images. ACM Trans. Graph. (TOG) 37(4), 1–12 (2018) 47. Zhou, Y., Liu, S., Ma, Y.: Nerd: neural 3D reflection symmetry detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15940–15949 (2021)

Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection

Shuaihang Yuan1,2,3,4, Xiang Li1,2,3,4, Hao Huang1,2,3,4, and Yi Fang1,2,3,4(B)

1 NYU Multimedia and Visual Computing Lab, New York, USA
{sy2366,xl1845,hh1811,yfang}@nyu.edu
2 NYUAD Center for Artificial Intelligence and Robotics (CAIR), Abu Dhabi, United Arab Emirates
3 NYU Tandon School of Engineering, New York University, New York, USA
4 New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

Abstract. This paper addresses the problem of few-shot indoor 3D object detection by proposing a meta-learning-based framework that only relies on a few labeled samples from novel classes for training. Our model has two major components: a 3D meta-detector and a 3D object detector. Given a query 3D point cloud and a few support samples, the 3D meta-detector is trained over different 3D detection tasks to learn task distributions for different object classes and dynamically adapt the 3D object detector to complete a specific detection task. The 3D object detector takes task-specific information as input and produces 3D object detection results for the query point cloud. Specifically, the 3D object detector first extracts object candidates and their features from the query point cloud using a point feature learning network. Then, a class-specific re-weighting module generates class-specific re-weighting vectors from the support samples to characterize the task information, one for each distinct object class. Each re-weighting vector performs channel-wise attention to the candidate features to re-calibrate the query object features, adapting them to detect objects of the same classes. Finally, the adapted features are fed into a detection head to predict classification scores and bounding boxes for novel objects in the query point cloud. Several experiments on two 3D object detection benchmark datasets demonstrate that our proposed method acquires the ability to detect 3D objects in the few-shot setting.

Keywords: 3D object detection · Indoor scene · Few-shot learning · Meta-learning · Channel-wise attention

1 Introduction

Indoor 3D object detection is one of the key components in various real-world vision applications, such as indoor navigation [1] and visual SLAM [2], and has
S. Yuan and X. Li—Equal contribution.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_15.


Fig. 1. Given abundant annotated samples of base classes and a few annotated samples of novel classes, our proposed model aims to learn from limited novel class samples to detect unseen 3D objects from the novel classes. (Top) Annotated chairs and sofas in base classes. (Bottom) Beds in the novel classes. Note that the objects are incomplete in the scanned 3D point cloud scenes.

been attracting considerable research attention in recent decades [3,4]. Recent progress in deep-learning-based methods has greatly advanced the performance of 3D object detection. However, existing deep-learning-based 3D object detection methods require ground truth supervision. When only a limited amount of training data is provided, existing deep-learning-based methods suffer from severe performance degradation. In contrast, humans can easily learn new concepts and identify new objects with only a few given examples. This observation motivates us to design a novel few-shot 3D object detection framework that leverages the abundant annotated base classes to quickly learn new patterns or concepts from only a few given samples from unseen or novel classes (Fig. 1). Here, we focus on the problem of indoor 3D object detection under the few-shot setting. Specifically, given sufficient labeled training samples from the base classes and a few labeled samples from the novel classes, our model learns to detect objects from the novel classes. To achieve this goal, we design our detection model based on meta-learning [5,6]. 3D object detection requires predicting both object category and localization. However, 3D point clouds are unordered and unstructured in nature, which makes it hard for a neural network to learn robust and representative point features. Therefore, 3D object detection under a few-shot scenario is even more challenging. Although a few previous works focus on meta-learning for 3D point cloud classification [7–9] and segmentation [10,11], meta-learning for few-shot 3D point cloud detection has seldom been explored, and this work paves a path for few-shot 3D point cloud detection using the meta-learning scheme. Specifically, our proposed method adopts a meta-learning-based paradigm for few-shot 3D object detection based on 3D object detectors, e.g., VoteNet [4]. In VoteNet, a task-specific 3D object detector is designed for object detection


with massive training data and thus is inapplicable in few-shot scenarios. To address the problem of 3D object detection in the few-shot scenario, we design a 3D meta-detector to guide a 3D object detector to perform few-shot 3D object detection. Specifically, we introduce a class-specific re-weighting module as the 3D meta-detector to guide the 3D object detector to detect objects from a specific class. Given a query 3D point cloud scene and a few support objects from novel classes, the backbone network of the 3D object detector is first adopted to extract candidates and their features from the input scene. Then, the class-specific re-weighting module receives few-shot objects from novel classes with bounding box annotations and produces re-weighting vectors, one for each novel class. Each vector applies channel-wise attention to all candidate features to re-weight and re-calibrate the query object features. In this manner, the candidate features fuse with support information from novel classes and are adapted to detect objects of the same classes. Finally, the adapted features are fed into a detection head to predict classification scores and bounding boxes for novel objects in the query point cloud scene. We validate our method on two public indoor 3D object detection datasets. Experimental results demonstrate that our proposed meta-learning-based 3D few-shot detection method outperforms several baselines. The contributions of our paper are summarized as follows.

– To the best of our knowledge, we are the first to tackle the problem of few-shot 3D object detection using meta-learning, which is of great importance in real-world applications and has seldom been explored in previous literature. Our method learns meta-knowledge through the training process on base classes and transfers the learned meta-knowledge to unseen classes with only a few labeled samples.
– We design a novel few-shot 3D detection framework that consists of a meta-detector to guide a 3D object detector for object detection. To effectively guide the 3D detector to complete a specific task, we propose a class-specific re-weighting module as the 3D meta-detector that receives a few support samples from novel classes and produces class-attentive re-weighting vectors. These re-weighting vectors are used to re-calibrate the cluster features of the same class and strengthen those informative features for detecting novel objects in the query point cloud scene.
– We demonstrate the effectiveness of our proposed model on two 3D object detection benchmark datasets, and our model achieves superior performance over the well-established baseline methods by a large margin.

2 Related Work

2.1 3D Object Detection

Many existing works have been proposed to tackle the task of object detection from 3D point clouds. In this section, we focus on reviewing indoor 3D object detection. A 3D scene can be represented by three-dimensional point clouds. One intuitive


way to detect 3D objects from a 3D scene is to employ a 3D deep learning network for object detection. However, this approach requires exhaustive computation, which slows the inference speed when handling large indoor scenes. Moreover, in 3D indoor datasets [12,13], the complex configuration of the layout of indoor 3D objects also introduces challenges. Various learning-based methods have been proposed to address the challenges above. Those learning-based indoor 3D object detection methods can be generally categorized into three classes. The first class is the sliding-window-based method, which divides the whole 3D scene into multiple patches and further adopts a classifier to detect 3D objects. The second type of indoor 3D object detection method extracts the per-point feature in a latent space and clusters corresponding points to produce the detection results. Qi et al. [4] propose a method that generates seed points from the input 3D point cloud and then predicts votes from the generated seed points. Then, Qi et al. cluster the votes by gathering the corresponding features and predicting 3D bounding boxes. The third type of 3D indoor detection method is proposal-based. Vote3D [14] uses a grid and predicts bounding boxes to detect 3D indoor objects. In addition, [15] uses MLP to directly predict the 3D bounding boxes from extracted point cloud features. Although the aforementioned methods achieve promising results in various 3D object detection benchmarks, they require the ground truth label as supervision during the training to ensure the model's generalization ability. However, this is not practical in real-world applications. To maintain the model's generalization ability for unseen object categories with only a limited amount of training data, a.k.a., few-shot scenario, we design a meta-learning-based method for 3D object detection for point clouds.

2.2 Meta-learning

Meta-learning aims to handle the scenario where only a few labeled training data are available. Meta-learning can also be categorized into three classes as follows.

Metric-Based Methods. Directly training deep neural networks with a few training data leads the networks to over-fit on the training data. To avoid this issue, metric-learning methods adopt a non-parametric distance measurement as a classifier to directly compare the features of input data and predict labels. In addition, a parametric network is often adopted to extract representative and discriminative features that have large inter-class variations and small intra-class variations. Typical metric-based frameworks include Siamese Networks [16], Matching Networks [17], Prototypical Networks [18], and Relation Networks [19].

Model-Based Methods. Distinct from metric-based methods, model-based methods [20,21] adopt two different parametric networks (often termed learners), where the first one serves as a meta-learner and the second one works as a classifier. The classifier only needs a single feed-forward pass to predict data labels, and its parameters are directly estimated by the meta-learner. Model-based methods only require calculating gradients of the meta-learner for back-propagation, yielding efficient learning for few-shot problems. However, the generalization ability of model-based methods on out-of-distribution data is not satisfactory.


Gradient-Based Methods. The key idea of gradient-based methods is to train a meta-learner to learn meta-knowledge from a dataset and thus help a classifier achieve quick convergence on a few labeled data from unseen classes. In the pioneering work MAML [22], a meta-learner is used to learn a proper parameter initialization of a classifier from training data such that, after a small number of back-propagation steps on the classifier, it can achieve good performance on few-shot tasks with limited labeled data. In addition to learning network parameter initialization, this method can also be utilized to update network hyper-parameters [23].
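As an illustration of this adapt-then-evaluate idea, the following is a minimal MAML-style sketch in PyTorch. The function and dictionary field names are illustrative assumptions, and the snippet assumes a recent PyTorch with torch.func; it is not tied to any particular implementation discussed above.

```python
import torch

def maml_meta_loss(model, loss_fn, support, query, inner_lr=0.01):
    """Minimal sketch of one gradient-based meta-learning (MAML-style) step:
    adapt the parameters on the support set with one inner gradient step, then
    compute the loss of the adapted model on the query set. `support` and
    `query` are assumed to be dicts with "x" and "y" tensors."""
    params = dict(model.named_parameters())
    inner_pred = torch.func.functional_call(model, params, support["x"])
    inner_loss = loss_fn(inner_pred, support["y"])
    # create_graph=True keeps the second-order terms used by full MAML
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}
    outer_pred = torch.func.functional_call(model, adapted, query["x"])
    return loss_fn(outer_pred, query["y"])  # backpropagate this to update the initialization
```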

3 Review of VoteNet

We give a brief review of VoteNet [4], which is incorporated as the building block of our proposed method. VoteNet is a 3D object detection model that takes raw 3D point clouds as input and produces 3D bounding boxes without using 2D detectors. The object detection process in VoteNet can be divided into three steps. The first step is to generate seed points from the input 3D point cloud; the second step is to predict votes from the generated seed points, and the final step is to cluster votes by gathering their features and predicting 3D box proposals for each object. To generate accurate votes for 3D object detection, VoteNet leverages PointNet++ [24], a hierarchical point signature learning model, as the backbone network to learn representative point features for geometric reasoning. The backbone network takes raw point clouds as input and produces a subset of the input point clouds with the coordinates of each point and the corresponding feature vector. Each output point is called a seed point and will produce one vote. Inspired by the generalized Hough voting for object detection [25], VoteNet leverages the power of a deep neural network to generate votes from seed points. Given the seed set $\{\mathrm{seed}_i\}_{i=1}^{P}$ consisting of $P$ seed points, where each $\mathrm{seed}_i$ is represented by the 3D coordinate $x_i$ and feature vector $f_i$, a shared voting module is used to generate votes from each seed point independently. The voting module is simply a multi-layer perceptron (MLP). The MLP takes $[x_i, f_i]$ as input and outputs the displacement of each seed point in Euclidean space as well as the feature offset in feature space, represented as $[\Delta x_i, \Delta f_i]$. The final vote is then represented as $v_i = [y_i = x_i + \Delta x_i,\ g_i = f_i + \Delta f_i]$. This voting process is supervised by the $\ell_2$ distance that penalizes the offsets between the predicted and ground-truth distances from the seed points on the object surface to the object centers. During training, the MLP learns to drive the seed points towards the associated object centers, making it easier for the following vote clustering process. Then, VoteNet adopts farthest point sampling (FPS) [24] to select $Q$ votes based on the vote coordinates in Euclidean space and groups votes by finding the nearest votes to the cluster centers. A cluster $\{C_q\}_{q=1}^{Q}$ is represented by $[y_{qi}, g_{qi}]$, where $y_{qi}$ is the 3D coordinate of the $i$-th vote in this cluster and $g_{qi}$ is the corresponding feature vector. After that, each cluster is transformed into a local coordinate frame by subtracting each vote's coordinate from its associated


cluster center. Finally, a detection head based on PointNet [26] is introduced to generate 3D bounding boxes from each of the vote clusters. The head network takes the normalized vote clusters as input and predicts the bounding boxes and classification scores. However, directly applying VoteNet in few-shot scenarios yields inferior performance as VoteNet requires a large amount of labeled data for training. In the next section, we describe our model utilizing VoteNet to overcome the challenges of few-shot 3D object detection with limited training labels.
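To make the voting step concrete, the following is a minimal PyTorch-style sketch of a VoteNet-like voting module: each seed's feature is mapped by a shared MLP to a spatial offset and a feature offset, which are added back to the seed. The module name, channel sizes, and tensor layout are illustrative assumptions rather than VoteNet's exact implementation.

```python
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    """Shared MLP that turns each seed [x_i, f_i] into one vote [y_i, g_i]."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # applied independently to every seed feature (1x1 convolutions)
        self.mlp = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, 1), nn.BatchNorm1d(feat_dim), nn.ReLU(),
            nn.Conv1d(feat_dim, 3 + feat_dim, 1),  # 3 coordinate offsets + feature offset
        )

    def forward(self, seed_xyz, seed_feat):
        # seed_xyz: (B, P, 3), seed_feat: (B, C, P)
        out = self.mlp(seed_feat)                   # (B, 3 + C, P)
        delta_xyz = out[:, :3, :].transpose(1, 2)   # (B, P, 3)
        delta_feat = out[:, 3:, :]                  # (B, C, P)
        vote_xyz = seed_xyz + delta_xyz             # y_i = x_i + Δx_i
        vote_feat = seed_feat + delta_feat          # g_i = f_i + Δf_i
        return vote_xyz, vote_feat

# usage: B=2 scenes, P=1024 seeds, C=256 feature channels
xyz, feat = torch.rand(2, 1024, 3), torch.rand(2, 256, 1024)
votes_xyz, votes_feat = VotingModule()(xyz, feat)
```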

4 Method

4.1 Few-Shot 3D Object Detection

In this section, we introduce the basic setting of meta-learning-based few-shot 3D object detection. Different from conventional 3D object detection, in the few-shot scenario the whole dataset $D$ is separated into two parts, i.e., base classes $D_{base}$ and novel classes $D_{novel}$, where $D_{base} \cap D_{novel} = \emptyset$. For each object category from the base classes, abundant labeled data is available, and $D_{base}$ is divided into training and testing sets $D_{base}^{train}$ and $D_{base}^{test}$. Similarly, $D_{novel}$ is also divided into training and testing sets $D_{novel}^{train}$ and $D_{novel}^{test}$. However, in the few-shot setting, only a few labeled samples from the novel classes can be used for training. Few-shot 3D object detection aims to predict 3D bounding boxes and class labels for each object from the novel classes with a few labeled data during training.
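The base/novel split and the N-way-K-shot episodes used below can be sketched as a simple sampling routine. The function and variable names are illustrative assumptions, not part of the paper's released code.

```python
import random

def sample_episode(annotations_by_class, novel_classes, n_way=3, k_shot=10, n_query=8):
    """Sketch of N-way-K-shot episode construction for few-shot detection.

    `annotations_by_class` maps a class name to a list of (scene_id, bounding_box)
    annotations; the support set keeps K annotated instances per sampled class and
    the remaining annotations define the query scenes to detect in."""
    classes = random.sample(sorted(novel_classes), n_way)
    support, query_scenes = {}, set()
    for cls in classes:
        anns = random.sample(annotations_by_class[cls], k_shot + n_query)
        support[cls] = anns[:k_shot]                              # K shots per class
        query_scenes.update(scene for scene, _ in anns[k_shot:])  # query point clouds
    return classes, support, sorted(query_scenes)
```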

Fig. 2. Our MetaDet3D contains two components. The first component (bottom) is the 3D meta-detector, which provides task/class-specific object detection guidance through a group of class-specific re-weighting vectors. The second component (top) is the 3D object detector that takes class-specific re-weighting vectors as input and predicts object bounding boxes and labels using three sub-components. A detailed description of each component and the three sub-components is given in Sect. 4.3.


To facilitate the training and evaluation of a few-shot 3D object detection model, we construct several episodes from the training and testing sets of $D_{base}$ and $D_{novel}$, following the settings proposed in [27]. Each episode $E$ is constructed from a set of support point clouds $S$ with annotations and a set of query point clouds $Q$. In an N-way-K-shot setting, the support set $S$ has $K$ instances with bounding box annotations for each of the $N$ classes, i.e., $S = \{(S_{n,k}, r_{n,k})\}$, where $n = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, K$, $S_{n,k}$ denotes the input point cloud, and $r_{n,k}$ denotes the bounding box annotation of the support set instances. We denote the query set as $Q$, which contains $N_q$ instances from the same object classes as the support set. Our few-shot detection model aims to perform 3D object detection over the query point clouds with the support set as guidance. Such a few-shot 3D object detection setting is practical in real-world scenarios. One may need to develop an object detection model, but collecting a large-scale well-annotated dataset for the target classes is time-consuming. An alternative would be deploying a detection model pre-trained on existing large-scale object detection datasets (e.g., Microsoft COCO [28] and SUN RGB-D [13]). However, one may only hope to detect several specific object categories of which these datasets cover only a very limited number of samples. These real-world applications pose a challenge and create a demand for few-shot object detection models. In the following sections, we introduce our novel design of a few-shot 3D object detection framework, MetaDet3D, short for Meta-Detection-3D. The illustration of our model is shown in Fig. 2. Our detection process is divided into two components: 1) a 3D meta-detector, and 2) a 3D object detector. We treat the object detection problem for each class as a learning task. The 3D meta-detector is designed to learn the class-specific task distribution over different tasks and dynamically adapt the 3D object detector to complete a specific detection task. The 3D meta-detector learns class-specific re-weighting vectors that are used to guide the 3D object detector. The 3D object detector aims to complete a specific 3D object detection task by leveraging the class-specific information from the 3D meta-detector to detect objects from the corresponding classes. The details of the proposed 3D meta-detector and 3D object detector are described as follows.

4.2 3D Meta-Detector

In this section, we introduce a novel 3D meta-detector, which is designed to learn the class-specific task distribution from different 3D object detection tasks. We propose a class-specific re-weighting module M to produce the guidance for the 3D object detector. The 3D meta-detector takes 3D point cloud objects of different classes as input and learns a group of compact and robust re-weighting vectors in the embedding space. Formally, we first crop support object instances


from the support set $S$ by using the ground-truth bounding boxes $r_{n,k}$ to construct a supporting instance set $I = \{I_{n,k}\}$, where $I_{n,k}$ denotes the $k$-th cropped instance from the $n$-th class. As deep neural networks have been proved effective and efficient in 3D feature learning, we adopt PointNet++ [24] to produce a set of compact re-weighting vectors, one for each class. Specifically, given $N \times K$ support instances, our re-weighting module $M$ generates $N$ re-weighting vectors $\{z_n\}_{n=1}^{N}$, where $z_n \in \mathbb{R}^{W}$, one for each class. This step is formulated as:

$$ z_n = \frac{1}{K} \sum_{k=1}^{K} M_\theta(I_{n,k}), \quad n = 1, 2, \ldots, N. \tag{1} $$

The output $z_n$ represents the re-weighting vector for the $n$-th class. We use $\theta$ to represent the network parameters of our re-weighting module $M$. The generated $z_n$ is responsible for tuning object features in the 3D object detector for accurate object candidate generation for the corresponding $n$-th class, which is detailed in the following section.
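A minimal sketch of Eq. (1) in PyTorch is given below; the encoder is left as a placeholder for the PointNet++ feature extractor, and tensor shapes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class ClassReweightingModule(nn.Module):
    """Sketch of Eq. (1): embed each cropped support instance and average the K
    embeddings of a class into a single re-weighting vector z_n."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # assumed to map (B, num_points, 3) -> (B, W)

    def forward(self, support_points):
        # support_points: (N, K, num_points, 3) -- K cropped instances per class
        n, k, p, c = support_points.shape
        emb = self.encoder(support_points.reshape(n * k, p, c))  # (N*K, W)
        return emb.reshape(n, k, -1).mean(dim=1)                 # (N, W): one z_n per class
```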

4.3 3D Object Detector

In this section, we introduce a 3D object detector that is used to complete a 3D object detection task for the given query set $Q$ with the guidance provided by the 3D meta-detector. Inspired by previous 3D object detection works [3,4,29], the 3D object detector can be divided into three steps. The first step is to extract point features from the input 3D point cloud. The second step is to generate object candidates. The final step is to predict 3D box proposals for each object.

Point Feature Extraction. To extract point features from a point cloud, we leverage PointNet++ [24], a hierarchical point feature learning model, as the backbone network to learn representative point features for geometric reasoning. The backbone network takes raw point clouds as input and produces a subset of the input point clouds with the coordinates of each point and the corresponding feature vectors. Each output point is called a seed and will produce one vote to generate a candidate. Specifically, for a given query scan $q_i$ from $Q$, the network (denoted as $F$) outputs seeds represented by a combination of the seeds' coordinates and point features, i.e., $[x, f]$ where $x \in \mathbb{R}^{3}$ and $f \in \mathbb{R}^{W}$:

$$ [x, f] = F_\delta(q_i), \tag{2} $$

where $\delta$ denotes the network parameters.


Guided Voting and Clustering. Inspired by voting-based 3D object detection works [3,4,29], we leverage the power of a deep neural network to generate candidates from seeds. Given a group of seeds, we apply channel-wise multiplication $\otimes$ between the seed features $f$ and the learned class-specific re-weighting vector $z_n$ to generate the re-weighted feature $f'$, where $f' = z_n \otimes f$. A shared voting module is then utilized to generate candidates from the seeds. The module is implemented as a multi-layer perceptron (MLP) network, with $\phi$ denoting the network parameters. The MLP takes $[x, f']$ as input and outputs the displacement of the seeds in Euclidean space as well as the feature offset in feature space. The guided voting for object candidate generation is formulated as:

$$ [\Delta x, \Delta f'] = \mathrm{MLP}_\phi(f'), \tag{3} $$

$$ [y, g] = [x + \Delta x,\ f' + \Delta f']. \tag{4} $$

The final candidates are then represented by $[y, g]$. This voting process is supervised by the Euclidean distance from the seeds on the object surface to the object centers. We denote this voting module as $V$:

$$ V_\phi([x, f], z_n) = [x,\ f \otimes z_n] + \mathrm{MLP}_\phi(f \otimes z_n) = [y, g]. \tag{5} $$
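The guided voting of Eqs. (3)-(5) amounts to a channel-wise re-weighting followed by an offset prediction, which can be sketched as below. The function name and the `vote_mlp` argument are illustrative; any module mapping (B, C, P) to (B, 3 + C, P) could play that role.

```python
import torch

def guided_vote(seed_xyz, seed_feat, z_n, vote_mlp):
    """Sketch of Eqs. (3)-(5): re-weight seed features channel-wise with the class
    vector z_n, then predict vote offsets with a shared MLP."""
    # channel-wise multiplication: f' = z_n ⊗ f
    fprime = seed_feat * z_n.view(1, -1, 1)       # (B, C, P)
    out = vote_mlp(fprime)                        # (B, 3 + C, P)
    delta_xyz = out[:, :3, :].transpose(1, 2)     # Δx
    delta_feat = out[:, 3:, :]                    # Δf'
    vote_xyz = seed_xyz + delta_xyz               # y = x + Δx
    vote_feat = fprime + delta_feat               # g = f' + Δf'
    return vote_xyz, vote_feat
```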

After object candidate generation, we adopt farthest point sampling [24] to select $T$ candidates as center points in Euclidean space and generate clusters $\{C_t\}_{t=1}^{T}$ by searching the nearest neighboring points for each center candidate, i.e., $C_t = \{[y_i, g_i] \mid \|y_i - y_t\| \le d\}$, where $y_t$ is the center coordinate of cluster $t$, $y_i$ and $g_i$ denote the coordinates and features of the $i$-th candidate, and $d$ is the search radius for each cluster. We denote the farthest point sampling and grouping process as $G(\cdot)$.

Guided Object Proposal. We apply a 3D point feature learning network to extract cluster features and generate 3D object proposals. Specifically, for each cluster $C_t$ represented by $\{[y_i, g_i]\}$, we normalize the candidate locations to a local normalized coordinate system by $y_i = (y_i - y_t)/d$. A shared PointNet [26], represented by $H$, is adopted to produce cluster features. To generate object proposals with the guidance of the 3D meta-detector, we perform a channel-wise multiplication between the cluster features and $z_n$ to generate the re-calibrated cluster features. The guided object proposal module for the $n$-th class is defined as:

$$ P_\psi(C) = \mathrm{MLP}\big(H([y_i, g_i]) \otimes z_n\big), \tag{6} $$

where MLP is a multi-layer perceptron network serving as a prediction head to predict the parameters of 3D bounding boxes and class labels. The whole detection head network is denoted by $P$ with network parameters $\psi$.
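A rough sketch of this clustering and guided proposal step follows. The `point_mlp` and `head_mlp` arguments are placeholders for the shared PointNet and the prediction head, and strided centers plus a naive radius test stand in for true farthest point sampling; this is an illustration of the data flow, not the paper's exact implementation.

```python
import torch

def guided_proposal(vote_xyz, vote_feat, z_n, point_mlp, head_mlp, radius=0.3, stride=16):
    """Group votes around sampled centers, encode each cluster (shared MLP + max-pool),
    re-calibrate the cluster feature with z_n (Eq. (6)), and predict box parameters."""
    B, C, P = vote_feat.shape
    centers = vote_xyz[:, ::stride, :]                               # (B, T, 3)
    T = centers.shape[1]
    local = (vote_xyz.unsqueeze(1) - centers.unsqueeze(2)) / radius  # normalized local coords
    feats = vote_feat.transpose(1, 2).unsqueeze(1).expand(B, T, P, C)
    grouped = torch.cat([local, feats], dim=-1)                      # (B, T, P, 3 + C)
    in_ball = (local.norm(dim=-1, keepdim=True) <= 1.0).float()      # votes within the radius
    cluster_feat = (point_mlp(grouped) * in_ball).max(dim=2).values  # (B, T, C)
    cluster_feat = cluster_feat * z_n.view(1, 1, -1)                 # channel-wise re-calibration
    return head_mlp(cluster_feat)                                    # box parameters + class scores
```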


4.4 Few-Shot Learning Strategy

In this section, we introduce our learning strategy for N-way-K-shot detection. Our model takes support instances $S = \{S_{n,k}\}$ and a query scan $q_i$ as input. We denote the ground-truth bounding boxes of query scan $q_i$ as $R_{q_i}$. The learning of this task can be reorganized as jointly optimizing the network parameters $\theta$, $\delta$, $\phi$, and $\psi$ using back-propagation by minimizing the following loss:

$$ \min_{\theta,\delta,\phi,\psi} L = \sum_i \mathcal{L}\big(P_\psi(V_\phi(F_\delta(q_i), M_\theta(S_{n,k}))), R_{q_i}\big). \tag{7} $$

It is of importance to employ an appropriate learning strategy such that the 3D meta-detector $M$ can produce representative re-weighting vectors to guide the 3D object detector $\{F, V, P\}$ to generate the correct object proposals. To achieve this goal, we adopt a two-phase learning strategy to train our proposed model. We refer to the first phase as base-training. During base-training, only episode data from the base classes are used, and we jointly train $F$, $V$, $P$ and $M$ using ground-truth annotations. This phase guarantees that the learner, composed of $F$, $V$, and $P$, learns to detect class-specific objects by referring to the re-weighting vectors. We denote the second phase as fine-tuning. In this step, we train our model using episode data from both base classes and novel classes. Similar to the first base-training phase, we jointly optimize $\theta$, $\delta$, $\phi$ and $\psi$ by minimizing Eq. 7.

Loss Function. Our model simultaneously estimates the category labels of the bounding boxes, objectness scores, and bounding box parameters. We adopt a similar loss function as in VoteNet [4]. Specifically, we use the cross-entropy losses $L_{sem}$ and $L_{obj}$ to supervise the bounding box label prediction as well as the objectness score for candidates. Proposals generated from positive candidates that are close to an object center are further regressed to bounding box parameters, which include the box center, heading angle, and box size. We adopt the Huber loss $L_{box}$ to supervise the bounding box prediction. Hence, the overall detection loss function is $L = L_{sem} + L_{obj} + L_{box}$.
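The combined objective and the two-phase schedule can be sketched as follows; the field names in `pred`/`target` and the episode iterables are assumptions made for illustration.

```python
import torch.nn.functional as F

def detection_loss(pred, target):
    """Sketch of L = L_sem + L_obj + L_box: cross-entropy for semantic labels and
    objectness, Huber (smooth L1) for the box parameters."""
    l_sem = F.cross_entropy(pred["sem_logits"], target["sem_labels"])
    l_obj = F.cross_entropy(pred["obj_logits"], target["obj_labels"])
    l_box = F.smooth_l1_loss(pred["box_params"], target["box_params"])
    return l_sem + l_obj + l_box

def train_two_phase(model, base_episodes, mixed_episodes, optimizer):
    """Base-training on base-class episodes, then fine-tuning on episodes drawn from
    both base and novel classes; all parameters are optimized jointly in each phase."""
    for episodes in (base_episodes, mixed_episodes):
        for support, query, target in episodes:
            loss = detection_loss(model(query, support), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```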

5 Experiments and Results

In this section, we conduct experiments on two 3D object detection benchmarks to validate the object detection ability of our model in the few-shot setting. In Sect. 5.1, we first give a description of the benchmarks used in our experiments and how we adapt them to few-shot settings. In Sect. 5.2, we describe the detailed implementation and the network architecture of our proposed model. In Sect. 5.3, we compare our model with several baseline approaches for few-shot 3D object detection in an indoor environment. In Sect. 5.4, we further analyze the performance of our model. In Sect. 5.5, we validate the effectiveness of our proposed modules by comparing different designs.


5.1 Datasets

SUN RGB-D. We conduct our experiments on the SUN RGB-D [13] dataset. It is a widely used benchmark dataset for object detection. The dataset consists of around 10k RGB-D images captured from indoor scenes. Half of them are used for training, and the rest are used for testing. Each sample in the dataset is annotated with amodal-oriented 3D bounding boxes and category labels. The dataset includes a total of 37 object categories. In the common 3D object detection setting, the training and evaluation process is based on the ten most common categories, including nightstand, sofa, and table. In our few-shot experiment setting, we divide the whole dataset into $D_{novel}$ and $D_{base}$ according to the categories, where $D_{base} \cap D_{novel} = \emptyset$. More specifically, we select $N$ out of the ten categories as novel categories, and we refer to the other $10-N$ categories as base categories. We further split each category into support/training and query/test sets.

ScanNet (v2). In addition to the SUN RGB-D dataset, we also conduct our experiments on the ScanNet (v2) [12] benchmark, which is a richly-annotated 3D indoor scene dataset that consists of 1,513 real-world scans. The whole dataset is divided into 1,201 training scenes, 312 validation scenes, and 100 test scenes. In contrast to the SUN RGB-D dataset, neither amodal bounding boxes nor their orientations are available in ScanNet (v2). Moreover, different from the SUN RGB-D dataset, ScanNet (v2) contains 18 object categories and provides complete reconstructed meshes of indoor scenes. We select $N$ out of the 18 categories as novel categories and the others as base categories. We also split each category into support/training and query/test sets, as for the SUN RGB-D dataset.

5.2 Implementation Details

Few-Shot Input and Data Augmentation. Our model takes point clouds from depth images (SUN RGB-D) or 3D indoor scans (ScanNet (v2)) as input. We uniformly sample 20k points from the source data, and each point is represented by its 3D coordinates and a 1D height feature, which is the distance from the point to the floor. Following VoteNet [4], we set the 1% percentile of all heights as the floor to calculate the height features. After the sampling, we apply random scaling from 0.8 to 1.2 to the data, followed by random rotation from −3° to 3° and random flipping.
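A minimal NumPy sketch of this input pipeline is shown below; the exact augmentation order and axis conventions are assumptions for illustration.

```python
import numpy as np

def preprocess_scene(points):
    """Sketch of the preprocessing described above: sample 20k points, random flip,
    random rotation in [-3°, 3°] about the up axis, random scaling in [0.8, 1.2],
    and a 1D height feature measured from the estimated floor."""
    idx = np.random.choice(len(points), 20000, replace=len(points) < 20000)
    pts = points[idx, :3].copy()
    if np.random.rand() < 0.5:                          # random flip along one horizontal axis
        pts[:, 0] *= -1.0
    theta = np.deg2rad(np.random.uniform(-3.0, 3.0))    # random rotation about z (up)
    c, s = np.cos(theta), np.sin(theta)
    pts = pts @ np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]).T
    pts *= np.random.uniform(0.8, 1.2)                  # random scaling
    floor = np.percentile(pts[:, 2], 1.0)               # 1% height percentile as the floor
    height = (pts[:, 2] - floor)[:, None]               # 1D height feature
    return np.concatenate([pts, height], axis=1)        # (20000, 4)
```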

Table 1. Quantitative results for few-shot 3D object detection on SUN RGB-D. Experiments are conducted on three different novel classes with N = 3, K = 10.

Method     | Novel Set 1               | Novel Set 2                | Novel Set 3
           | bathtub  bed  chair  mAP  | dresser  sofa  table  mAP  | chair  desk  table  mAP
VoteNet-JT |   0.0    5.1   6.7   3.9  |   0.0    1.0    5.6   2.2  |  12.4   0.8    0.8   4.7
VoteNet-FT |   0.0   16.1   4.8   7.0  |   0.1    4.2    6.9   3.7  |  13.7   2.5    2.2   6.1
VoteNet-2  |   0.0   16.6   5.5   7.4  |   0.1    4.8    7.5   4.1  |  16.5   2.0    2.1   6.9
Ours       |  21.8   10.3   3.5  11.9  |  10.4    9.9   11.0  10.4  |  10.2  12.7   12.1  11.6


Table 2. Quantitative results for few-shot 3D object detection on ScanNet (v2). Experiments are conducted on three different novel classes with N = 3, K = 10. "shower cur." is the abbreviation for "shower curtain", and "garb. bin" is the abbreviation for "garbage bin".

Method     | Novel Set 1                      | Novel Set 2                 | Novel Set 3
           | curtain  shower cur.  desk  mAP  | garb. bin  bed  sink  mAP   | sofa  table  chair  mAP
VoteNet-JT |   1.9       2.2        1.0  1.7  |    2.4     1.0   2.0  1.8   |  0.3   0.5    0.0   0.3
VoteNet-FT |   3.4       4.9        2.2  3.5  |    6.1     1.7   4.4  4.1   |  1.6   1.8    0.5   1.3
VoteNet-2  |   4.8       5.2        2.9  4.3  |    7.5     2.1   5.1  4.9   |  2.2   2.6    1.1   1.9
Ours       |  26.5      11.0        3.0 13.5  |   19.4    10.8   1.1 10.4   |  9.2   8.0    8.4   8.5

Network Architecture. We use PointNet++ [24] as the backbone for the class-specific re-weighting module. The feature extractor network contains three set abstraction (SA) layers followed by a max-pooling layer. The radii of the three SA layers are [0.3, 0.5, 1] with output channel sizes of [1024, 512, 256]. The 256-dimensional feature vector is fed into the weight learning network, which is an MLP with two hidden layers of size [512, 256]. We fuse the class-specific re-weighting module with VoteNet [4] as our 3D object detector.

Network Training. We follow the two-phase learning strategy to train our model from scratch with an Adam optimizer. In the base-training phase, we set the initial learning rate to 0.001 on $D_{base}^{train}$. We decrease the learning rate by ten times after 50 epochs and 100 epochs. In the fine-tuning phase, we reset the initial learning rate to 0.001, and we decrease the learning rate every 30 epochs on $D_{base}^{train}$ and $D_{novel}^{train}$. Our model is implemented using PyTorch [30] and runs on NVIDIA GTX 2080 GPUs.
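These architecture and training hyper-parameters can be restated as a small configuration dictionary; the layout and key names below are only an illustrative summary, and "SetAbstraction" is a placeholder for any PointNet++ SA layer implementation.

```python
# Hyper-parameters of the class-specific re-weighting branch described above.
REWEIGHTING_ENCODER_CFG = {
    "sa_layers": [
        {"radius": 0.3, "out_channels": 1024},  # SetAbstraction layer 1
        {"radius": 0.5, "out_channels": 512},   # SetAbstraction layer 2
        {"radius": 1.0, "out_channels": 256},   # SetAbstraction layer 3
    ],
    "global_pool": "max",             # max-pooling after the last SA layer
    "weight_mlp_hidden": [512, 256],  # MLP producing the final re-weighting vector
    "base_lr": 1e-3,                  # initial learning rate in both training phases
}
```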

Table 3. Results on SUN RGB-D. The table shows the evaluation results for different methods on three novel classes with different numbers of shots K = 30, 50. The evaluation metric is mAP with IoU threshold 0.25.

Method     | Novel Set 1 | Novel Set 2 | Novel Set 3
           |   30    50  |   30    50  |   30    50
VoteNet-JT |  5.8   6.8  |  3.4   4.8  |  5.3   5.9
VoteNet-FT |  8.1   9.8  |  4.6   5.6  |  7.3   8.1
VoteNet-2  |  9.0  11.3  |  5.1   6.1  |  8.0   8.9
Ours       | 12.1  14.0  | 11.7  13.6  | 13.2  13.8

5.3 Comparison with the Baselines

Experiment Settings. To the best of our knowledge, this is the first work focusing on the few-shot 3D object detection task. In order to explore the generalization ability of our proposed model on few-shot 3D object detection settings, we conduct experiments on SUN RGB-D, and ScanNet (v2) benchmarks to compare our model against various baseline models built on VoteNet [4]. Specifically,


Table 4. Results on ScanNet (v2). The table shows the evaluation results for different methods on three novel classes with different numbers of shots K = 30, 50. The evaluation metric is mAP with IoU threshold 0.25.

Method     | Novel Set 1 | Novel Set 2 | Novel Set 3
           |   30    50  |   30    50  |   30    50
VoteNet-JT |  2.2   4.1  |  2.4   3.1  |  0.6   0.8
VoteNet-FT |  4.8   6.0  |  5.9   7.0  |  1.9   2.7
VoteNet-2  |  6.0   6.9  |  7.3   8.8  |  2.5   4.0
Ours       | 14.7  16.1  | 11.4  12.7  |  9.9  10.8

we adopt three baseline models for the comparison: 1) VoteNet-JT. In this baseline method, VoteNet is alternately trained on the base classes and novel classes without base-training and fine-tuning phases. 2) VoteNet-FT. In this baseline method, VoteNet first uses a base-training phase to train on the data from base classes, and then it only uses data from novel classes to fine-tune the model. 3) VoteNet-2. In this method, we train the VoteNet following the two-phase learning strategy until full convergence. Both our model and the three baseline models are trained in the few-shot setting where N = 3, K = {10, 30, 50}. During inference, we use the class-specific re-weighting vectors learned from the whole supporting set to guide 3D object detection on the query set.

Result Analysis. The quantitative results on SUN RGB-D and ScanNet (v2) benchmarks are presented in Table 1 and Table 2 respectively. In both tables, the average precision values with the IoU threshold 0.25 are reported. Results presented in Table 1 and Table 2 show that our model outperforms the baseline models. The comparison results for different numbers of shots on two benchmarks are presented in Table 3 and Table 4. As shown in the tables, our model improves the mAP by at least 3.1% on SUN RGB-D benchmark when K = 30, and 2.7% when K = 50. Moreover, our method improves the mAP by at least 4.1% on ScanNet (v2) benchmark when K = 30, and 3.9% when K = 50. Refer to the supplementary material for the per-instance mAP. The results also demonstrate that directly adopting conventional object detectors leads to limited generalization abilities for novel classes. In Fig. 4 and Fig. 5, we show several qualitative examples of the 3D detection results on SUN RGB-D and ScanNet (v2) benchmarks respectively.

Fig. 3. The t-SNE visualization of class-specific re-weighting vectors from SUN RGB-D.

5.4 Performance Analysis

Class-Specific Re-weighting Vector. The re-weighting vectors learned from the 3D meta-detector serve as important task-specific information to guide the detection process of the 3D object detector. In this section, we project the re-weighting vectors for different object categories onto a 2D plane to verify that the


Fig. 4. Qualitative results on SUN RGB-D.

re-weighting vectors are learned to contain class-specific information. We conduct experiments on the SUN RGB-D dataset. Figure 3 shows the projections of the re-weighting vectors obtained using t-SNE [31]. It is obvious that vectors belonging to the same class tend to cluster with each other. Moreover, we also observe that classes sharing similar visual features are projected close to each other. Specifically, the desk, table, and chair classes are projected to the bottom-left in the figure, while the sofa and bed classes are projected to the bottom-right.
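A visualization of this kind can be produced with a few lines of standard tooling; the sketch below is a generic t-SNE plot of re-weighting vectors and does not reproduce the paper's exact plotting setup.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_reweighting_tsne(vectors, labels):
    """Sketch of a Fig. 3-style plot: project class-specific re-weighting vectors
    to 2D with t-SNE and color the points by class. `vectors` is an (M, W) array
    and `labels` a length-M list of class names (illustrative inputs)."""
    xy = TSNE(n_components=2, init="pca", perplexity=15).fit_transform(vectors)
    for cls in sorted(set(labels)):
        pts = xy[[i for i, lab in enumerate(labels) if lab == cls]]
        plt.scatter(pts[:, 0], pts[:, 1], s=10, label=cls)
    plt.legend()
    plt.show()
```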

Fig. 5. Qualitative results on ScanNet (v2).

Table 5. The comparison for the model size and the inference time for the object detection in one query scan.

Method  | Model size | SUN RGB-D | ScanNet (v2)
VoteNet |  11.2 MB   |  0.10 s   |   0.14 s
Ours    |  11.6 MB   |  0.11 s   |   0.16 s

Speed and Size. We also analyze the inference speed and model size on different benchmark datasets. In Table 5, the comparison between our model and the VoteNet baseline is presented. Our model takes around 0.16 s to detect objects in one query scan in the ScanNet (v2) dataset, which is almost the same as VoteNet. In


addition, our proposed model has a size of 11.7 MB, which is only 0.5 MB larger than VoteNet. This observation demonstrates that our model can effectively detect novel objects with almost the same memory cost as VoteNet.

5.5 Where to Re-weight

We analyze the effect of different re-weighting locations on the final 3D object detection performance. In this experiment, we compare the object detection performance on ScanNet (v2) using three different designs of our proposed model. In the first design, we only use re-weighting in the "Guided Voting" module, and we denote this design as V.R. In the second design, we only employ re-weighting in the object proposal module, which is referred to as O.R., and the last design corresponds to our proposed method that applies re-weighting in both the guided voting and object proposal modules. Table 6 shows the mAP of the three different designs. We notice that our proposed model that applies re-weighting in both modules achieves the best performance.

Table 6. The object detection results on SUN RGB-D with different model designs.

              | V.R. | O.R. | Ours
Base classes  | 52.1 | 52.5 | 59.0
Novel classes | 10.9 | 11.1 | 11.3

6 Conclusion

To the best of our knowledge, this is the first work to tackle the few-shot 3D object detection problem. Our proposed method generalizes its ability from the meta-training process to infer 3D object proposals and predict 3D bounding boxes for unseen 3D objects in novel classes. A novel class-specific re-weighting vector is introduced with the goal of facilitating the meta-detector to learn task distributions for different object classes and then dynamically adapt the 3D object detector to complete a specific detection task. Our method can be easily adapted to other two-stage detection methods that contain the object proposal stage. Experiments on two public 3D object detection benchmarks demonstrate that our model can effectively detect 3D objects from novel classes with superior performance over the well-established baseline models. In-depth analyses of our model further indicate the effectiveness and efficiency of each proposed component. Acknowledgements. The authors appreciate the generous support provided by Inception Institute of Artificial Intelligence (IIAI) in the form of NYUAD Global Ph.D. Student Fellowship. This work was also partially supported by the NYUAD Center for Artificial Intelligence and Robotics (CAIR), funded by Tamkeen under the NYUAD Research Institute Award CG010.


References 1. Afif, M., Ayachi, R., Said, Y., Pissaloux, E., Atri, M.: An evaluation of retinanet on indoor object detection for blind and visually impaired persons assistance navigation. Neural Process. Lett. 51, 2265–2279 (2020) 2. Yang, S., Scherer, S.: Cubeslam: monocular 3-D object slam. IEEE Trans. Rob. 35, 925–938 (2019) 3. Qi, C.R., Chen, X., Litany, O., Guibas, L.J.: Imvotenet: boosting 3D object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4404–4413 (2020) 4. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9277–9286 (2019) 5. Hospedales, T.M., Antoniou, A., Micaelli, P., Storkey, A.J.: Meta-learning in neural networks: a survey. Trans. Pattern Anal. Mach. Intell. (2021) 6. Huisman, M., van Rijn, J.N., Plaat, A.: A survey of deep meta-learning. Artif. Intell. Rev. 54(6), 4483–4541 (2021). https://doi.org/10.1007/s10462-021-10004-4 7. Nie, J., Xu, N., Zhou, M., Yan, G., Wei, Z.: 3D model classification based on few-shot learning. Neurocomputing 398, 539–546 (2020) 8. Zhang, B., Wonka, P.: Training data generating networks: linking 3D shapes and few-shot classification. arXiv preprint arXiv:2010.08276 (2020) 9. Huang, H., Li, X., Wang, L., Fang, Y.: 3D-metaconnet: meta-learning for 3D shape classification and segmentation. In: International Conference on 3D Vision, pp. 982–991. IEEE (2021) 10. Yuan, S., Fang, Y.: Ross: Robust learning of one-shot 3D shape segmentation. In: The IEEE Winter Conference on Applications of Computer Vision, pp. 1961–1969 (2020) 11. Wang, L., Li, X., Fang, Y.: Few-shot learning of part-specific probability space for 3D shape segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4504–4513 (2020) 12. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: Scannet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017) 13. Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576 (2015) 14. Wang, D.Z., Posner, I.: Voting for voting in online point cloud object detection. In: Robotics: Science and Systems, vol. 1, pp. 10–15, Rome, Italy (2015) 15. Yang, B., et al.: Learning object bounding boxes for 3D instance segmentation on point clouds. In: Advances in Neural Information Processing Systems, vol. 32 (2019) 16. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2, Lille (2015) 17. Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, pp. 3630–3638 (2016) 18. Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems, pp. 4077–4087 (2017) 19. Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208 (2018)


20. Bertinetto, L., Henriques, J.F., Valmadre, J., Torr, P., Vedaldi, A.: Learning feedforward one-shot learners. In: Advances in Neural Information Processing Systems, pp. 523–531 (2016) 21. Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. arXiv preprint arXiv:1410.5401 (2014) 22. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400 (2017) 23. Li, Z., Zhou, F., Chen, F., Li, H.: Meta-SGD: learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835 (2017) 24. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017) 25. Leibe, B., Leonardis, A., Schiele, B.: Combined object categorization and segmentation with an implicit shape model. In: Workshop on Statistical Learning in Computer Vision, ECCV, vol. 2, p. 7 (2004) 26. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017) 27. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: International Conference on Machine Learning, pp. 1842–1850. PMLR (2016) 28. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48 29. Xie, Q., et al.: Mlcvnet: multi-level context votenet for 3D object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10447–10456 (2020) 30. Paszke, A., et al.: Automatic differentiation in PyTorch (2017) 31. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection

Chenguang Lu1, Kang Yue1,2, and Yue Liu1(B)

1 Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, China
[email protected]
2 Institute of Software, Chinese Academy of Sciences, Beijing, China

Abstract. Direct detection of 3D objects from point clouds is a challenging task due to the sparsity and irregularity of point clouds. To capture point features from raw point clouds for 3D object detection, most previous works utilize PointNet and its variants as the feature learning backbone and have seen encouraging results. However, these methods capture point features independently without modeling the interaction between points, and simple symmetric functions cannot adequately aggregate local contextual features, which are vital for 3D object recognition. To address such limitations, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection, which can capture the dependencies between points on the aligned group feature space while retaining flexible receptive fields. The key idea of ReAGFormer is to alleviate the perturbation of the point feature space by affine transformation and extract the dependencies between points using self-attention, while reaggregating the local point set features with the learned attention. Moreover, we also design multi-scale connections in the feature propagation layer to reduce the geometric information loss caused by point sampling and interpolation. Experimental results show that by equipping our method as the backbone for existing 3D object detectors, significant improvements and state-of-the-art performance are achieved over the original models on the SUN RGB-D and ScanNet V2 benchmarks.

Keywords: 3D object detection · Transformer · Point cloud

1 Introduction

3D object detection from point clouds is a fundamental task in 3D scene understanding and has wide applications in robotics, augmented reality, etc. However, most of the latest progress in 2D object detection cannot be directly applied to 3D object detection due to the sparsity and irregularity of point clouds.

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_16.



Fig. 1. Illustration of the ReAGF Transformer block. Affine self-attention is introduced to align the group feature space while capturing the dependencies between points within each group. Compared to using only symmetric functions (e.g. max), we reaggregate point features via dependencies learned from reaggregation cross-attention, thus improving feature aggregation efficiency.

Prior works first convert the point cloud into the regular data format [1–4] and then use convolutional neural networks for feature extraction and 3D object detection. However, the conversion process always leads to geometric information loss due to quantization errors. PointNet and its variants [5,6] alleviate this issue by extracting features directly from the raw point clouds, which can preserve the spatial structure and geometric information of the point cloud. As a result, PointNet and its variants are widely used as the feature learning backbone in 3D object detection [7–11]. However, these methods cannot adequately consider the dependencies between points during the capture of point features, and the simple symmetric function (e.g. max) cannot effectively utilize the dependencies between points to aggregate local contextual features, which are vital for 3D object detection. Recently, Transformer [12] has achieved great success in computer vision [13– 16]. Thanks to its long-range dependencies modeling capability, the Transformer is an ideal way to address the above limitations. However, how to integrate the advantages of PointNet-like backbone and Transformer to boost 3D object detection is still an open problem. One effort is to combine Transformer with PointNet++ and its variants, such as PCT [17] and PT [18], which focus on the classification and segmentation of point clouds, and the resulting architecture may be suboptimal for other tasks such as 3D object detection. Other solutions introduce the sampling and grouping in PointNet++ into Transformer and design a pure Transformer model for 3D object detection, such as Pointformer [19], but such solutions still use simple symmetric functions (e.g. max) to aggregate point features, which limits the representation of the model. In this paper, we propose a plug-and-play reaggregation Transformer backbone with affine group features for 3D object detection, named as ReAGFormer, which utilizes the ability of the Transformer to model the dependencies between points and reaggregate point features through learned attention, while retaining the flexible receptive fields. Specifically, we propose a reaggregation Transformer block with affine group features (ReAGF Transformer block) to form the downsampling stage of the backbone. As shown in Fig. 1, in the ReAGF Transformer block, we introduce affine self-attention (ASA) to interact on the relationship


between points. ASA first conducts an affine transformation on the features of each intra-group point to eliminate the perturbation of the feature space caused by the sparsity of the point cloud, and align the features of all groups. Then self-attention is employed to capture the relationship between points on the aligned group features. For local aggregation features generated by the symmetric function (e.g. max), we model the dependencies between them and the intra-group points by reaggregation cross-attention (RCA), and reaggregate the features by the learned attention (a rough conceptual sketch of this alignment-and-attention step follows the contribution list below). To reduce the geometric information loss caused by point sampling and interpolation, we also use multi-scale connections on the feature propagation layer [6] in the upsampling stage.

To validate the effectiveness and generalization of our method, we replace the backbone of three different state-of-the-art methods, VoteNet [7], BRNet [10] and Group-Free [11], with our proposed ReAGFormer while not changing the other network structures. Experimental results show that when using our proposed ReAGFormer as the feature extraction backbone, all three methods achieve significant improvements, and the modified BRNet and Group-Free achieve state-of-the-art results on the ScanNet V2 [20] and SUN RGB-D [21] datasets, respectively. Our main contributions can be summarized as follows:

• We introduce the reaggregation Transformer block with affine group features (ReAGF Transformer block), which alleviates the perturbation of the local feature space by affine transformation and models the dependencies between points, while reaggregating the point set features with the learned attention.
• Based on the ReAGF Transformer block, we build a reaggregation Transformer backbone with affine group features, named ReAGFormer, which can align the feature spaces of different groups and efficiently capture the relationships between points for 3D object detection. Our ReAGFormer can serve as a plug-and-play replacement feature learning backbone for 3D object detection.
• Experiments demonstrate the effectiveness and generalization of our backbone network. Our proposed method enables different state-of-the-art methods to achieve significant performance improvements.
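As a rough illustration of the alignment idea referenced above, the sketch below normalizes each group's point features with a learnable affine transformation before applying standard multi-head self-attention within the group. It is a conceptual stand-in built from our own assumptions about the affine step, not the paper's exact ASA formulation.

```python
import torch
import torch.nn as nn

class AffineSelfAttentionSketch(nn.Module):
    """Illustrative version of the ASA idea: align each group's feature space with a
    learnable affine transformation, then run self-attention within the group."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(dim))   # learnable offset
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, group_feat):
        # group_feat: (num_groups, group_size, dim)
        mu = group_feat.mean(dim=1, keepdim=True)
        sigma = group_feat.std(dim=1, keepdim=True) + 1e-5
        aligned = (group_feat - mu) / sigma * self.alpha + self.beta  # affine alignment
        out, _ = self.attn(aligned, aligned, aligned)                 # intra-group self-attention
        return out
```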

2 Related Work

2.1 Point Cloud Representation Learning

Grid-based methods such as projection-based methods [22,23] and voxel-based methods [2,24,25] were frequently used in early point cloud representation learning. Such methods effectively sidestep the difficulty of extracting features from irregular point clouds. However, the quantization in projection-based methods suffers from information loss, while voxel-based methods require careful consideration of computation and memory cost. Recently, methods that learn features directly from the raw point cloud have received increasing attention. Prior works include MLP-based methods [5,6,26], convolution-based methods [27–33] and graph-based methods [34–37].


As representative methods, PointNet and its variants [5,6] are widely used for point feature learning in 3D object detection. However, these methods lack the ability to capture dependencies between points; in addition, the symmetric function in PointNet and its variants cannot adequately aggregate local point set features. In this work, we address the above limitations with the Transformer to boost 3D object detection.

2.2 3D Object Detection in Point Clouds

Due to the sparsity and irregularity of point clouds, early 3D object detection methods usually transformed point clouds into regular data structures. One class of methods [1,4,38] projects the point cloud to the bird's eye view. Another class of methods [2,3,39,40] converts the point cloud into voxels. There are also methods that use templates [41] or clouds of oriented gradients [42] for 3D object detection. With the rapid progress of deep learning on point clouds, a series of networks represented by PointNet [5] and PointNet++ [6] that directly process point clouds have been proposed and gradually serve as the backbone of 3D object detectors. PointRCNN [43] introduces a two-stage object detection method that generates 3D proposals directly from the raw point cloud. PV-RCNN [44] combines the advantages of PointNet++ and voxel-based methods. VoteNet [7] introduces deep Hough voting to design an end-to-end 3D object detector, from which a series of methods have been derived. MLCVNet [8] and HGNet [45] use an attention mechanism and a hierarchical graph network, respectively, to boost detection performance. To address the effect of outlier points on detection performance, H3DNet [9] and BRNet [10] introduce hybrid geometric primitives and a back-tracing representative points strategy, respectively, to generate more robust results. DisARM [46] designs a displacement-aware relation module to capture the contextual relationships between carefully selected anchors. In contrast to these methods, we focus on the feature learning backbone in 3D object detection. We show that our ReAGFormer can serve as the point feature learning backbone for most of the above methods.

2.3 Transformers in Computer Vision

Transformer [12] has been successfully applied to computer vision and has seen encouraging results in such tasks as image classification [13,47], detection [14,48] and segmentation [49,50]. The Transformer is inherently permutation invariant and therefore also well suited for point cloud data. PCT [17] and PT [18] construct transformers on point clouds for classification and segmentation. Stratified Transformer [51] proposes a stratified transformer architecture to capture long-range contexts for point cloud segmentation. DCP [52] is the first method to introduce the transformer to the point cloud registration task. In point cloud video understanding, P4Transformer [53] introduces point 4D convolution and a transformer to embed local features and capture information about the entire video. The Transformer also shows great potential for low-level tasks on point clouds, such as point cloud upsampling [54], denoising [55] and completion [56].


Fig. 2. The architecture of our ReAGFormer backbone for 3D object detection. ReAGFormer has four downsampling stages. The first stage is a set abstraction layer [6], and all other stages consist of group embedding and ReAGF Transformer block. The upsampling stage is a feature propagation layer with multi-scale connection.

The Transformer is also used for 3D object detection, for example in Pointformer [19], Group-Free [11] and 3DETR [57]. These methods have made great progress; however, they neglect the perturbation of the local point set feature space caused by the sparsity and irregularity of point clouds, and still use simple symmetric functions to aggregate point set features. In contrast, we propose a reaggregation Transformer block with affine group features that alleviates the perturbation of the feature space while reaggregating the point set features with learned attention to strengthen the symmetric function.

3 Proposed Method

In this work, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection. As shown in Fig. 2, ReAGFormer consists of four downsampling stages that generate point sets at different resolutions, and an upsampling stage that recovers the number of points. Each downsampling stage involves two main components: group embedding and the reaggregation Transformer block with affine group features (ReAGF Transformer block). The group embedding generates a suitable input sequence for the Transformer block. In the ReAGF Transformer block, we introduce affine self-attention (ASA) to apply an affine transformation to the group feature space while modeling the dependencies between points. For the group-aggregated point features generated by symmetric functions (e.g., max), we introduce reaggregation cross-attention (RCA), which strengthens the aggregation of the symmetric function by reaggregating group features with the captured dependencies. Moreover, we utilize feature propagation layers [6] with multi-scale connections in the upsampling stage to reduce the information loss due to point sampling and interpolation. In this section, we describe each part in detail.


Fig. 3. A comparison between self-attention and our affine self-attention. By a simple affine transformation, we align the feature spaces of different groups and therefore better capture the dependencies between points within a group.

3.1 Group Embedding

The way in which the point cloud data is fed into the ReAGF Transformer block is vital to the overall architecture. To efficiently capture the fine-grained spatial geometric features of the point cloud scene and obtain flexible receptive fields, we follow the sampling and grouping strategy of PointNet++ [6] to generate local point sets. Specifically, we use farthest point sampling (FPS) to sample $N'$ points from the input point cloud $P = \{x_i\}_{i=1}^{N}$. Taking each sampled point as a centroid, a ball query generates $N'$ groups according to a specified radius $r$, where each group contains $k$ points. The groups are denoted as $\{G_i\}_{i=1}^{N'}$, and feature learning is performed on them with a shared MLP layer to extract group features $F = \{F_i \in \mathbb{R}^{k \times C}\}_{i=1}^{N'}$, where $C$ is the feature dimension of each point in the group. The group features $F$ serve as the input sequence for the subsequent Transformer block.
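For concreteness, the sketch below shows how the sampling-and-grouping step could be implemented in PyTorch. The brute-force `farthest_point_sample` and `ball_query_group` helpers are illustrative stand-ins for the optimized PointNet++ CUDA operators that an actual implementation would use; the radius and group size in the usage example are arbitrary.

```python
import torch

def farthest_point_sample(xyz, n_sample):
    """Greedy FPS: pick n_sample indices from xyz of shape (B, N, 3)."""
    B, N, _ = xyz.shape
    idx = torch.zeros(B, n_sample, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.zeros(B, dtype=torch.long, device=xyz.device)
    batch = torch.arange(B, device=xyz.device)
    for i in range(n_sample):
        idx[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)              # (B, 1, 3)
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))
        farthest = dist.argmax(dim=-1)                            # point furthest from the chosen set
    return idx

def ball_query_group(xyz, centroids, radius, k):
    """Return k neighbour indices within `radius` of each centroid, shape (B, N', k)."""
    d2 = torch.cdist(centroids, xyz) ** 2                         # (B, N', N) squared distances
    d2 = d2.masked_fill(d2 > radius ** 2, float("inf"))
    group_idx = d2.topk(k, dim=-1, largest=False).indices         # k closest points inside the ball
    # slots with no valid neighbour fall back to the closest point (PointNet++ convention)
    fallback = group_idx[..., :1].expand_as(group_idx)
    invalid = torch.gather(d2, -1, group_idx).isinf()
    return torch.where(invalid, fallback, group_idx)

# Toy usage: sample N' = 256 centroids and build groups of k = 32 points each.
xyz = torch.rand(2, 2048, 3)                                      # (B, N, 3) input cloud
centre_idx = farthest_point_sample(xyz, 256)
centroids = torch.gather(xyz, 1, centre_idx.unsqueeze(-1).expand(-1, -1, 3))
groups = ball_query_group(xyz, centroids, radius=0.2, k=32)       # (B, 256, 32)
```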

3.2 Reaggregation Transformer Block with Affine Group Features

Affine Self-attention (ASA). To extract the dependencies between intra-group points, we resort to the Transformer and self-attention [12]. However, the feature spaces of different groups, each consisting of intra-group point features, may be unaligned due to the sparsity and irregularity of point clouds, as well as the variety of structures within each group, which leads to perturbations of the group feature space. We argue that such perturbations can limit the modeling of the relationships between points.


To alleviate the above problem, we propose affine self-attention (ASA).

Specifically, we project the group features $F = \{F_i \in \mathbb{R}^{k \times C}\}_{i=1}^{N'}$ generated by the group embedding to the query ($Q$), key ($K$) and value ($V$) used in the attention computation. Different from standard self-attention, as shown in Fig. 3, we apply an affine transformation to the group features used to generate the key and value in order to alleviate the perturbation of the group feature space, and then perform the linear projections and shared self-attention, formulated as:

$$h_s = \mathrm{softmax}\!\left(\frac{(W_q^s F)\,(W_k^s \cdot \mathrm{AT}(F))^{T}}{\sqrt{d}}\right)(W_v^s \cdot \mathrm{AT}(F)), \qquad (1)$$
$$\mathrm{ASA}(F) = [h_0, h_1, \ldots, h_s]\,W_o, \qquad (2)$$

where $W_q^s$, $W_k^s$ and $W_v^s$ are the projection parameters that generate $Q$, $K$ and $V$, $s$ denotes the $s$-th head, $W_o$ is the projection matrix used to generate the output, $[\cdot]$ is concatenation, $d$ is the feature dimension of each head, and $\mathrm{AT}$ is the affine transformation.

For the affine transformation (AT) module, inspired by [58], we utilize a simple transformation. Specifically, for the group features $F = \{F_i \in \mathbb{R}^{k \times C}\}_{i=1}^{N'}$ generated by the group embedding, we formulate the following operation:

$$\hat{F}_i = \delta\!\left(\frac{F_i - f_i}{\sigma + \epsilon}\right), \qquad \sigma = \sqrt{\frac{1}{N' \times k \times C}\sum_{i=1}^{N'}\sum_{j=1}^{k}\left(F_i - f_i\right)^2}, \qquad (3)$$
$$\hat{F} = \{\hat{F}_i\}_{i=1}^{N'} = \mathrm{AT}(F), \qquad (4)$$

where $F_i = \{f_j \in \mathbb{R}^{C}\}_{j=1}^{k}$ are the point features of the $i$-th group, $f_i \in \mathbb{R}^{C}$ is the centroid feature of the $i$-th group, $\delta(\cdot)$ is a shared MLP, $\hat{F}$ denotes the group features after the affine transformation, and $\epsilon$ is set to 1e-5 for numerical stability.

Reaggregation Cross-Attention (RCA). Although a simple symmetric function (e.g., max) satisfies the permutation invariance of the point cloud, it cannot exploit the dependencies between points to aggregate the local point set features, which limits the representation power of the model. To alleviate this issue, we propose reaggregation cross-attention. Specifically, for the group features $F$ extracted by ASA, we first apply a symmetric function to aggregate the point features of each group, and then perform cross-attention between the aggregated features and their corresponding intra-group point features:

$$h_s = \mathrm{softmax}\!\left(\frac{(W_q^s \cdot \mathrm{MAX}(F))\,(W_k^s F)^{T}}{\sqrt{d}}\right)(W_v^s F), \qquad (5)$$
$$\mathrm{RCA}(\mathrm{MAX}(F), F) = [h_0, h_1, \ldots, h_s]\,W_o, \qquad (6)$$

where $W_q^s$, $W_k^s$, $W_v^s$, $W_o$, $d$ and $[\cdot]$ have the same meaning as in Eq. (1) and Eq. (2), and $\mathrm{MAX}$ denotes the symmetric function, for which we use max-pooling.
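The sketch below illustrates the two attention operations under simplifying assumptions: `AffineSelfAttention` and `ReaggregationCrossAttention` are hypothetical class names, `nn.MultiheadAttention` stands in for the shared multi-head attention of Eqs. (1)-(2) and (5)-(6), and a learnable scale/shift pair replaces the shared MLP δ(·) of Eq. (3).

```python
import torch
import torch.nn as nn

class AffineSelfAttention(nn.Module):
    """Sketch of ASA: align group features with a simple affine transform,
    then run multi-head attention with the aligned features as key/value."""
    def __init__(self, dim, heads=8, eps=1e-5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.alpha = nn.Parameter(torch.ones(dim))    # stands in for the shared MLP delta(.)
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def affine(self, feats, centroid_feats):
        # feats: (G, k, C) grouped point features, centroid_feats: (G, 1, C)
        diff = feats - centroid_feats
        sigma = diff.pow(2).mean().sqrt()             # single scalar computed over all groups
        return self.alpha * (diff / (sigma + self.eps)) + self.beta

    def forward(self, feats, centroid_feats):
        aligned = self.affine(feats, centroid_feats)
        out, _ = self.attn(query=feats, key=aligned, value=aligned)
        return out

class ReaggregationCrossAttention(nn.Module):
    """Sketch of RCA: the max-pooled group descriptor queries its own points."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        pooled = feats.max(dim=1, keepdim=True).values            # (G, 1, C) MAX(F)
        out, _ = self.attn(query=pooled, key=feats, value=feats)
        return out, pooled
```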


Based on ASA and RCA, the reaggregation Transformer block with affine group features (ReAGF Transformer block) can be summarized as:

$$\hat{z}^{\,l-1} = \mathrm{LN}(\mathrm{ASA}(F^{l-1}) + F^{l-1}),$$
$$z^{l-1} = \mathrm{LN}(\mathrm{MLP}(\hat{z}^{\,l-1}) + \hat{z}^{\,l-1}),$$
$$\hat{z}^{\,l} = \mathrm{LN}(\mathrm{RCA}(\mathrm{MAX}(z^{l-1}), z^{l-1}) + \mathrm{MAX}(z^{l-1})),$$
$$F^{l} = \mathrm{LN}(\mathrm{MLP}(\hat{z}^{\,l}) + \hat{z}^{\,l}) + \mathrm{MAX}(z^{l-1}), \qquad (7)$$

where ASA and RCA are affine self-attention and reaggregation cross-attention, respectively, $F^l$ denotes the output group features of stage $l$, $z$ and $\hat{z}$ are intermediate variables, and LN denotes layer normalization.

Normalized Relative Positional Encoding. Positional encoding plays a vital role in the Transformer. Since the coordinates of points naturally express positional information, some methods that apply the Transformer to point clouds, such as PCT [17], do not use positional encoding. However, we find that adding positional encoding helps to improve detection performance; we argue that this is because the detection task requires explicit position information to assist object localization. In this work, we use a learnable normalized relative positional encoding. Specifically, we compute the normalized relative position between each intra-group point and the corresponding centroid and map it to the group feature dimension with a shared MLP layer:

$$\mathrm{NRPE} = \mathrm{MLP}\!\left(\frac{P_i - p_i}{r}\right), \qquad (8)$$

where $P_i = \{p_j \in \mathbb{R}^{3}\}_{j=1}^{k}$ denotes the point coordinates of the $i$-th group, $p_i \in \mathbb{R}^{3}$ is the centroid coordinate of the $i$-th group, NRPE is the relative positional encoding, and $r$ is the radius of the ball query in the group embedding. The positional encoding is added to the input $F^{l-1}$ before the attention calculation in Eq. (7).

Feature Connection Bridge. The computational processes of the group embedding and the ReAGF Transformer block are quite different, so their output features may exhibit a semantic gap. To connect these two modules more naturally, inspired by [59], we use a simple bridge module, as shown in Fig. 2: a linear projection and normalization placed between the group embedding and the ReAGF Transformer block to eliminate their semantic gap.
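Building on the two sketches above (and their imports), a minimal version of the block in Eq. (7) could look as follows; the hidden sizes and GELU activation of the MLPs are illustrative choices, not taken from the paper.

```python
class ReAGFBlock(nn.Module):
    """Sketch of Eq. (7): ASA -> MLP, then RCA over the max-pooled descriptor -> MLP,
    with residual connections and layer normalization throughout."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.asa = AffineSelfAttention(dim, heads)
        self.rca = ReaggregationCrossAttention(dim, heads)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm3, self.norm4 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mlp1 = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, feats, centroid_feats, pos_enc):
        # feats: (G, k, C) grouped features; pos_enc: normalized relative positions mapped to C
        x = feats + pos_enc
        z_hat = self.norm1(self.asa(x, centroid_feats) + x)           # \hat{z}^{l-1}
        z = self.norm2(self.mlp1(z_hat) + z_hat)                      # z^{l-1}
        rca_out, pooled = self.rca(z)
        z_hat_l = self.norm3(rca_out + pooled)                        # \hat{z}^{l}
        out = self.norm4(self.mlp2(z_hat_l) + z_hat_l) + pooled       # F^{l}: one feature per group
        return out.squeeze(1)                                         # (G, C)
```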

3.3 Feature Propagation Layer with Multi-scale Connection

Previous methods [7,8,10,11] often utilize hierarchical feature propagation layers [6] to upsample points. However, point sampling in the group embedding and interpolation in the feature propagation layer inevitably cause information loss. To alleviate this, inspired by [60,61], we introduce a multi-scale connection into the feature propagation layer, as shown in Fig. 4.


Fig. 4. Upsampling stage consisting of feature propagation (FP) layer with multi-scale connection. F l denotes output group features of downsampling stage l. N is the number of points and C is the point feature dimension.

Specifically, in the upsampling stage, we perform point interpolation and feature mapping on the input point features of each layer using the feature propagation layer. The output of all previous layers is concatenated to the output of the current layer, and the concatenated features serve as input to the subsequent layer. Moreover, we fuse the features from the downsampling stages through skip connections and concatenate them to the corresponding upsampling layer.

4 Experiments

4.1 Datasets and Evaluation Metrics

We validate our approach on two large-scale indoor scene datasets: SUN RGB-D [21] and ScanNet V2 [20]. SUN RGB-D is a 3D scene understanding dataset with 10,335 monocular RGB-D images and oriented 3D bounding box annotations for 37 categories. We follow VoteNet [7], using the ∼5K training samples and evaluating on the 10 most common categories. ScanNet V2 is a large-scale reconstructed indoor dataset consisting of 1513 scenes, with axis-aligned 3D bounding box annotations for 18 categories and point clouds obtained from the reconstructed meshes. Following the setup of VoteNet, we use about 1.2K training samples. For both datasets, we follow the standard evaluation protocol [7]: mean Average Precision (mAP) with IoU thresholds of 0.25 and 0.5.
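As a reference for the metric, the snippet below computes the IoU between axis-aligned 3D boxes (the ScanNet V2 setting); the full mAP protocol additionally involves per-class matching and precision-recall averaging, which is omitted here.

```python
import torch

def axis_aligned_iou_3d(boxes_a, boxes_b):
    """IoU between axis-aligned 3D boxes given as (cx, cy, cz, dx, dy, dz); shapes (N, 6) and (M, 6)."""
    min_a, max_a = boxes_a[:, :3] - boxes_a[:, 3:] / 2, boxes_a[:, :3] + boxes_a[:, 3:] / 2
    min_b, max_b = boxes_b[:, :3] - boxes_b[:, 3:] / 2, boxes_b[:, :3] + boxes_b[:, 3:] / 2
    lo = torch.maximum(min_a[:, None, :], min_b[None, :, :])     # (N, M, 3) intersection lower corner
    hi = torch.minimum(max_a[:, None, :], max_b[None, :, :])     # (N, M, 3) intersection upper corner
    inter = (hi - lo).clamp(min=0).prod(dim=-1)                  # overlap volume
    vol_a = boxes_a[:, 3:].prod(dim=-1)
    vol_b = boxes_b[:, 3:].prod(dim=-1)
    return inter / (vol_a[:, None] + vol_b[None, :] - inter)

# A prediction counts as a true positive under mAP@0.25 (mAP@0.5) when its best IoU
# with an unmatched ground-truth box of the same class exceeds 0.25 (0.5).
```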

4.2 Implementation Details

We apply our ReAGFormer to three state-of-the-art models (i.e., VoteNet [7], BRNet [10], and Group-Free [11]) by replacing their backbone with ReAGFormer; the resulting models are named ReAGF-VoteNet, ReAGF-BRNet, and ReAGF-Group-Free, respectively. The number of input points and the data augmentation follow the corresponding baselines [7,10,11]. Our model is divided into four stages in the downsampling part.


Table 1. Performance comparison by applying our backbone to state-of-the-art models on SUN RGB-D and ScanNet V2. VoteNet∗ denotes that the result is implemented in MMDetection3D [63], which has better results than the original paper [7]. For Group-Free, we report the results for the 6-layer decoder and 256 object candidates.

Method | SUN RGB-D mAP@0.25 | SUN RGB-D mAP@0.5 | ScanNet V2 mAP@0.25 | ScanNet V2 mAP@0.5
VoteNet∗ [7] | 59.1 | 35.8 | 62.9 | 39.9
+Ours (ReAGF-VoteNet) | 62.3 (↑3.2) | 40.7 (↑4.9) | 66.1 (↑3.2) | 45.4 (↑5.5)
BRNet [10] | 61.1 | 43.7 | 66.1 | 50.9
+Ours (ReAGF-BRNet) | 61.5 (↑0.4) | 44.8 (↑1.1) | 67.4 (↑1.3) | 52.2 (↑1.3)
Group-Free [11] | 63.0 | 45.2 | 67.3 | 48.9
+Ours (ReAGF-Group-Free) | 62.9 (↓0.1) | 45.7 (↑0.5) | 67.1 (↓0.2) | 50.0 (↑1.1)

Except for stage 1, each stage contains a group embedding and a reaggregation Transformer block with affine group features. Note that we use a standard set abstraction layer [6] in stage 1, because using the Transformer in the earliest stage does not help the results; we argue that point features are not yet sufficiently extracted in the shallow layers, so dependencies between points cannot be built effectively. For the upsampling stage, we use 2 feature propagation layers with multi-scale connections. The ball query radius of the group embedding is set to {0.2, 0.4, 0.8, 1.2} and the number of sampled points to {2048, 1024, 512, 256}. The upsampling stage interpolates the points to {512, 1024}. The feature dimension of the points generated by the backbone is set to 288. For ASA and RCA, the number of heads is set to 8 and a dropout of 0.1 is used. The initial learning rate of the Transformer blocks is 1/20 of that of the other parts, and the model is optimized with the AdamW optimizer [62]. More implementation details are described in the supplementary material.
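As an illustration of the optimizer setup, the sketch below builds AdamW parameter groups so that Transformer-block parameters train at 1/20 of the base learning rate; the base learning rate, weight decay and the name-based grouping rule are assumptions, not values from the paper.

```python
import torch

def build_optimizer(model, base_lr=8e-3, weight_decay=0.01):
    """Two parameter groups: Transformer-block weights train at base_lr / 20,
    everything else at base_lr (the split-by-name rule here is illustrative)."""
    transformer_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (transformer_params if "transformer" in name.lower() else other_params).append(param)
    return torch.optim.AdamW(
        [{"params": other_params, "lr": base_lr},
         {"params": transformer_params, "lr": base_lr / 20}],
        lr=base_lr, weight_decay=weight_decay)
```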

4.3 Evaluation Results

Evaluation on Different State-of-the-Art Models. We apply our proposed ReAGFormer to three existing state-of-the-art models: VoteNet [7], BRNet [10] and Group-Free [11]. We replace the backbone of these three methods with ReAGFormer and evaluate them on the SUN RGB-D and ScanNet V2 datasets; the results are shown in Table 1. Our proposed ReAGFormer enables all three methods to achieve performance improvements. In particular, ReAGF-VoteNet gains 4.9% and 5.5% mAP@0.5 on the two datasets. Similarly, ReAGF-BRNet outperforms BRNet by 1.1% and 1.3% mAP@0.5 on SUN RGB-D and ScanNet V2, respectively. ReAGF-Group-Free also improves mAP@0.5 by 0.5% and 1.1% on the two datasets, respectively. Note that when applying our approach to the baseline models, the improvement on the more challenging mAP@0.5 is larger than on mAP@0.25, which indicates that ReAGFormer adequately models the interaction between points and improves object localization accuracy.


Table 2. Performance comparison on SUN RGB-D (left) and ScanNet V2 (right). VoteNet∗ indicates that the resulting implementation is based on the MMDetection3D [63] toolbox, which has better results than the original paper [7]. 4×PointNet++ denotes 4 individual PointNet++. - indicates that the corresponding method does not report results under this condition or dataset. For Group-Free, we report the results for 6-layer decoder and 256 object candidates. SUN RGB-D

backbone

VoteNet∗ [7] MLCVNet [8] HGNet [45] SPOT [64] H3DNet 1BB [9] H3DNet 4BB [9] Pointformer+VoteNet [19] 3DETR [57] BRNet [10] Group-Free [11] CaVo [65] DisARM+VoteNet [46] DisARM+Group-Free [46]

PointNet++ PointNet++ GU-Net PointNet++ PointNet++ 4×PointNet++ Pointformer PointNet++ PointNet++ PointNet++ U-Net PointNet++ PointNet++

[email protected] [email protected]

ReAGF-VoteNet (Ours) ReAGFormer (Ours) ReAGFormer (Ours) ReAGF-BRNet (Ours) ReAGF-Group-Free (Ours) ReAGFormer (Ours)

ScanNet V2

backbone

59.1 59.8 61.6 60.4 60.1 61.1 59.1 61.1 63.0 61.3 61.5 -

35.8 36.3 39.0 36.6 32.7 43.7 45.2 44.3 41.3 -

VoteNet∗ [7] MLCVNet [8] HGNet [45] SPOT [64] H3DNet 1BB [9] H3DNet 4BB [9] Pointformer+VoteNet [19] 3DETR [57] BRNet [10] Group-Free [11] CaVo[65] DisARM+VoteNet [46] DisARM+Group-Free [46]

PointNet++ PointNet++ GU-Net PointNet++ PointNet++ 4×PointNet++ Pointformer PointNet++ PointNet++ PointNet++ U-Net PointNet++ PointNet++

62.3 61.5 62.9

40.7 44.8 45.7

ReAGF-VoteNet (Ours) ReAGFormer (Ours) ReAGFormer (Ours) ReAGF-BRNet (Ours) ReAGF-Group-Free (Ours) ReAGFormer (Ours)

[email protected] [email protected] 62.9 64.7 61.3 59.8 64.4 67.2 64.1 65.0 66.1 67.3 66.1 67.0

39.9 42.1 34.4 40.4 43.4 48.1 42.6 47.0 50.9 48.9 49.7 50.7

66.1 67.4 67.1

45.4 52.2 50.0

Table 3. Ablation study on ASA and RCA of the ReAGF Transformer block. If ASA and RCA are not used, each layer will be a standard set abstraction layer [6].

ASA | RCA | mAP@0.25 | mAP@0.5
– | – | 64.1 | 42.6
✓ | – | 65.0 | 45.2
– | ✓ | 66.0 | 44.9
✓ | ✓ | 66.1 | 45.4

Comparisons with the State-of-the-Art Methods. To verify the effectiveness of our proposed ReAGFormer, we compare ReAGF-VoteNet, ReAGF-BRNet and ReAGF-Group-Free with previous state-of-the-art methods on the SUN RGB-D and ScanNet V2 datasets. Table 2 shows the comparison results. By replacing the original backbone with ReAGFormer, VoteNet achieves competitive results on both datasets. On ScanNet V2, ReAGF-BRNet achieves 67.4% mAP@0.25 and 52.2% mAP@0.5, outperforming all previous state-of-the-art methods. On SUN RGB-D, ReAGF-Group-Free achieves 62.9% mAP@0.25 and 45.7% mAP@0.5, which is better than previous state-of-the-art methods on the more challenging mAP@0.5.

4.4 Ablation Study

In this section, we conduct ablation experiments to verify the effectiveness of each module. Unless otherwise specified, all models are trained as ReAGF-VoteNet and evaluated on the ScanNet V2 validation set.

ReAGF Transformer Block. We investigate the effect of the ReAGF Transformer block consisting of ASA and RCA; the results are summarized in Table 3.


Fig. 5. Qualitative comparison results of 3D object detection on ScanNet V2. ReAGF-VoteNet, ReAGF-BRNet and ReAGF-Group-Free denote the replacement of each baseline's original backbone with our ReAGFormer. With the help of ReAGFormer, VoteNet and BRNet achieve more reliable results (orange circles, blue circles and purple circles). Objects with similar shapes (e.g. table and desk, bookshelf and door) can easily be confused, but our method alleviates this problem (yellow circles). Color is used for illustration only and is not used in the experiment. (Best viewed in color.)

Table 4. Ablation study on the affine transformation of ASA. If the affine transformation of Eq. (4) is not used, ASA becomes standard self-attention [12].

Downsampling method | Affine transformation | mAP@0.25 | mAP@0.5
Set abstraction layer | – | 64.1 | 42.6
Set abstraction layer | ✓ | 64.6 | 43.7
ReAGF transformer block | – | 65.9 | 44.5
ReAGF transformer block | ✓ | 66.1 | 45.4

If the ReAGF Transformer block consisting of ASA and RCA is not used, each layer becomes a standard set abstraction layer [6]. We observe that applying ASA and RCA separately improves performance by 2.6% and 2.3% mAP@0.5, respectively, and using both yields the best improvement. Table 4 ablates the affine transformation (AT) module in ASA. The best result is achieved with our complete ReAGF Transformer block, and AT also improves the performance of the set abstraction layer, which demonstrates the effectiveness of the AT module.

Positional Encoding. To investigate whether positional encoding is effective and whether normalized relative positional encoding is better, we compare no positional encoding with absolute and normalized relative positional encoding. As shown in Table 5, normalized relative positional encoding brings a 2.1% mAP@0.25 and 0.4% mAP@0.5 improvement over not using positional encoding. We argue that this is because the detection task requires explicit position information to help object localization. We also find that normalized relative positional encoding outperforms absolute positional encoding;


Fig. 6. Qualitative results of 3D object detection on SUN RGB-D. Our method generates more reasonable boxes (black arrows) and can distinguish between objects with similar shapes (blue arrows). Moreover, our method can even detect objects that are not annotated in the ground truth (red arrows). Images and colors are only used for illustration and are not used in our network. (Best viewed in color.)

Table 5. Ablation study on the effectiveness of positional encoding and the performance of different positional encodings.

Positional encoding | mAP@0.25 | mAP@0.5
None | 64.0 | 45.0
Absolute | 64.8 | 43.9
Normalized relative | 66.1 | 45.4

even the network without positional encoding is 1.1% better on mAP@0.5 than the one using absolute positional encoding.

Feature Connection Bridge. In Table 6, we compare the detection performance with and without the feature connection bridge. With the bridge, we eliminate the semantic gap between the group embedding and the ReAGF Transformer block, achieving improvements of 0.5% mAP@0.25 and 0.9% mAP@0.5.

Multi-scale Connection. We investigate the effect of the multi-scale connection by replacing it with a cascade connection [66,67] or a residual connection [61]; the results are summarized in Table 7. The multi-scale connection achieves the best results, which demonstrates that it aggregates multi-scale contextual information more fully and reduces the information loss caused by point sampling and interpolation.

4.5 Qualitative Results and Discussion

Figure 5 illustrates the qualitative comparison of the results on ScanNet V2. These results show that applying our ReAGFormer to the baseline achieves


Table 6. Ablation study on the feature connection bridge.

Feature connection bridge | mAP@0.25 | mAP@0.5
– | 65.6 | 44.5
✓ | 66.1 | 45.4

Table 7. Ablation study on different connection methods for the feature propagation layer.

Connection method | mAP@0.25 | mAP@0.5
Cascade | 65.2 | 44.9
Residual | 65.0 | 44.3
Multi-scale | 66.1 | 45.4

more reliable results. Specifically, ReAGF-VoteNet, ReAGF-BRNet and ReAGF-Group-Free produce more reasonable and accurate detections (orange circles, blue circles and purple circles), despite the challenges of cluttered scenes or few points. In addition, our method achieves better results for similarly shaped objects (yellow circles); for example, the desk in the second-row scene of Fig. 5 is treated as a table by VoteNet [7], whereas ReAGF-VoteNet successfully detects a desk. Figure 6 visualizes qualitative results on SUN RGB-D scenes. Our model generates more reasonable boxes even in cluttered and occluded scenes (black arrows). Our method can also better distinguish similarly shaped objects on SUN RGB-D; for example, in the first row of Fig. 6 it resolves the case where the same object was assigned different categories (blue arrows), and in the second row it detects the table and the desk correctly (blue arrows). Besides, our method can even detect objects that are not annotated in the ground truth (red arrows).

5 Conclusion

In this paper, we present ReAGFormer, a reaggregation Transformer backbone with affine group features for 3D object detection. We introduce affine self-attention to align the feature spaces of different groups while modeling the dependencies between points. To improve the efficiency of feature aggregation, we utilize reaggregation cross-attention to reaggregate group features based on learned attention. Moreover, we also introduce a multi-scale connection in the feature propagation layer to reduce the information loss caused by point sampling and interpolation. We apply our ReAGFormer to existing state-of-the-art detectors and achieve significant performance improvements on the main benchmarks. Experiments demonstrate the effectiveness and generalization of our method.


Acknowledgements. This work was supported by the Key-Area Research and Development Program of Guangdong Province (No. 2019B010139004) and the National Natural Science Foundation Youth Fund No. 62007001.

References 1. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: CVPR, pp. 1907–1915 (2017) 2. Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3D object detection. In: CVPR, pp. 4490–4499 (2018) 3. Song, S., Xiao, J.: Deep sliding shapes for amodal 3D object detection in RGB-D images. In: CVPR, pp. 808–816 (2016) 4. Yang, B., Luo, W., Urtasun, R.: Pixor: real-time 3D object detection from point clouds. In: CVPR, pp. 7652–7660 (2018) 5. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3D classification and segmentation. In: CVPR, pp. 652–660 (2017) 6. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. NeurIPS 30, 5099–5108 (2017) 7. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: ICCV, pp. 9277–9286 (2019) 8. Xie, Q., et al.: Mlcvnet: multi-level context votenet for 3D object detection. In: CVPR, pp. 10447–10456 (2020) 9. Zhang, Z., Sun, B., Yang, H., Huang, Q.: H3DNet: 3D object detection using hybrid geometric primitives. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 311–329. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58610-2 19 10. Cheng, B., Sheng, L., Shi, S., Yang, M., Xu, D.: Back-tracing representative points for voting-based 3D object detection in point clouds. In: CVPR, pp. 8963–8972 (2021) 11. Liu, Z., Zhang, Z., Cao, Y., Hu, H., Tong, X.: Group-free 3D object detection via transformers. In: ICCV (2021) 12. Vaswani, A., et al.: Attention is all you need. In: NeurIPS, pp. 5998–6008 (2017) 13. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 14. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: Endto-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8 13 15. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV (2021) 16. Liu, Y., et al.: A survey of visual transformers. arXiv preprint arXiv:2111.06091 (2021) 17. Guo, M.H., Cai, J.X., Liu, Z.N., Mu, T.J., Martin, R.R., Hu, S.M.: PCT: point cloud transformer. Comput. Vis. Media 7(2), 187–199 (2021) 18. Zhao, H., Jiang, L., Jia, J., Torr, P., Koltun, V.: Point transformer. arXiv preprint arXiv:2012.09164 (2020) 19. Pan, X., Xia, Z., Song, S., Li, L.E., Huang, G.: 3D object detection with pointformer. In: CVPR, pp. 7463–7472 (2021)


20. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: Scannet: richly-annotated 3D reconstructions of indoor scenes. In: CVPR, pp. 5828–5839 (2017) 21. Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: CVPR, pp. 567–576 (2015) 22. Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. In: ICCV, pp. 945–953 (2015) 23. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: CVPR, pp. 12697–12705 (2019) 24. Riegler, G., Osman Ulusoy, A., Geiger, A.: Octnet: learning deep 3D representations at high resolutions. In: CVPR, pp. 3577–3586 (2017) 25. Maturana, D., Scherer, S.: Voxnet: a 3D convolutional neural network for real-time object recognition. In: IROS, pp. 922–928 (2015) 26. Jiang, L., Zhao, H., Liu, S., Shen, X., Fu, C.W., Jia, J.: Hierarchical point-edge interaction network for point cloud semantic segmentation. In: ICCV, pp. 10433– 10441 (2019) 27. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on x-transformed points. NeurIPS 31, 820–830 (2018) 28. Xu, Y., Fan, T., Xu, M., Zeng, L., Qiao, Yu.: SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 90–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3 6 29. Wang, S., Suo, S., Ma, W.C., Pokrovsky, A., Urtasun, R.: Deep parametric continuous convolutional neural networks. In: CVPR, pp. 2589–2597 (2018) 30. Wu, W., Qi, Z., Fuxin, L.: PointConv: deep convolutional networks on 3D point clouds. In: CVPR, pp. 9621–9630 (2019) 31. Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: flexible and deformable convolution for point clouds. In: ICCV, pp. 6411– 6420 (2019) 32. Xu, M., Ding, R., Zhao, H., Qi, X.: PaConv: position adaptive convolution with dynamic kernel assembling on point clouds. In: CVPR, pp. 3173–3182 (2021) 33. Boulch, A., Puy, G., Marlet, R.: FKAConv: feature-kernel alignment for point cloud convolution. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) ACCV 2020. LNCS, vol. 12622, pp. 381–399. Springer, Cham (2021). https://doi.org/10. 1007/978-3-030-69525-5 23 34. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 38(5), 1–12 (2019) 35. Wang, L., Huang, Y., Hou, Y., Zhang, S., Shan, J.: Graph attention convolution for point cloud semantic segmentation. In: CVPR, pp. 10296–10305 (2019) 36. Xu, Q., Sun, X., Wu, C.Y., Wang, P., Neumann, U.: Grid-GCN for fast and scalable point cloud learning. In: CVPR, pp. 5661–5670 (2020) 37. Zhao, H., Jiang, L., Fu, C.W., Jia, J.: Pointweb: enhancing local neighborhood features for point cloud processing. In: CVPR, pp. 5565–5573 (2019) 38. Liang, M., Yang, B., Wang, S., Urtasun, R.: Deep continuous fusion for multisensor 3D object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11220, pp. 663–678. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01270-0 39 39. Hou, J., Dai, A., Nießner, M.: 3D-sis: 3D semantic instance segmentation of RGB-D scans. In: CVPR, pp. 4421–4430 (2019)


40. Yan, Y., Mao, Y., Li, B.: Second: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018) 41. Yi, L., Zhao, W., Wang, H., Sung, M., Guibas, L.J.: GSPN: generative shape proposal network for 3D instance segmentation in point cloud. In: CVPR, pp. 3947–3956 (2019) 42. Ren, Z., Sudderth, E.B.: Three-dimensional object detection and layout prediction using clouds of oriented gradients. In: CVPR, pp. 1525–1533 (2016) 43. Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: CVPR, pp. 770–779 (2019) 44. Shi, S., et al.: Pv-RCNN: point-voxel feature set abstraction for 3D object detection. In: CVPR, pp. 10529–10538 (2020) 45. Chen, J., Lei, B., Song, Q., Ying, H., Chen, D.Z., Wu, J.: A hierarchical graph network for 3D object detection on point clouds. In: CVPR, pp. 392–401 (2020) 46. Duan, Y., Zhu, C., Lan, Y., Yi, R., Liu, X., Xu, K.: Disarm: displacement aware relation module for 3D detection. In: CVPR, pp. 16980–16989 (2022) 47. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., J´egou, H.: Training data-efficient image transformers & distillation through attention. In: ICML, pp. 10347–10357. PMLR (2021) 48. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020) 49. Zheng, S., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: CVPR, pp. 6881–6890 (2021) 50. Strudel, R., Garcia, R., Laptev, I., Schmid, C.: Segmenter: transformer for semantic segmentation. arXiv preprint arXiv:2105.05633 (2021) 51. Lai, X., et al.: Stratified transformer for 3D point cloud segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8500–8509 (2022) 52. Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3523–3532 (2019) 53. Fan, H., Yang, Y., Kankanhalli, M.: Point 4D transformer networks for spatiotemporal modeling in point cloud videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14204–14213 (2021) 54. Qiu, S., Anwar, S., Barnes, N.: Pu-transformer: point cloud upsampling transformer. arXiv preprint arXiv:2111.12242 (2021) 55. Xu, X., Geng, G., Cao, X., Li, K., Zhou, M.: TDnet: transformer-based network for point cloud denoising. Appl. Opt. 61(6), C80–C88 (2022) 56. Yu, X., Rao, Y., Wang, Z., Liu, Z., Lu, J., Zhou, J.: Pointr: diverse point cloud completion with geometry-aware transformers. In: ICCV, pp. 12498–12507 (2021) 57. Misra, I., Girdhar, R., Joulin, A.: An end-to-end transformer model for 3D object detection. In: ICCV (2021) 58. Ma, X., Qin, C., You, H., Ran, H., Fu, Y.: Rethinking network design and local geometry in point cloud: a simple residual MLP framework. In: ICLR (2021) 59. Peng, Z., et al.: Conformer: local features coupling global representations for visual recognition. arXiv preprint arXiv:2105.03889 (2021) 60. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR, pp. 4700–4708 (2017) 61. Chen, B., Liu, Y., Zhang, Z., Lu, G., Zhang, D.: Transattunet: multi-level attentionguided u-net with transformer for medical image segmentation. arXiv preprint arXiv:2107.05274 (2021)


62. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017) 63. Contributors, M.: MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d (2020) 64. Du, H., Li, L., Liu, B., Vasconcelos, N.: SPOT: selective point cloud voting for better proposal in point cloud object detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 230–247. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8 14 65. You, Y., et al.: Canonical voting: towards robust oriented bounding box detection in 3D scenes. In: CVPR, pp. 1193–1202 (2022) 66. Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O.R., Jagersand, M.: U2-net: going deeper with nested U-structure for salient object detection. Pattern Recogn. 106, 107404 (2020) 67. Cai, Y., Wang, Y.: Ma-unet: an improved version of unet based on multiscale and attention mechanism for medical image segmentation. arXiv preprint arXiv:2012.10952 (2020)

Adaptive Range Guided Multi-view Depth Estimation with Normal Ranking Loss

Yikang Ding1, Zhenyang Li1, Dihe Huang1, Kai Zhang1, Zhiheng Li1(B), and Wensen Feng2(B)

1 Tsinghua University, Beijing, China
2 Huawei Company, Shenzhen, China

Y. Ding and Z. Li: Equal contribution.

Abstract. Deep learning algorithms for Multi-view Stereo (MVS) have surpassed traditional MVS methods in recent years, owing to improved reconstruction quality and runtime. On the other hand, learning-based methods still tend to produce overly smoothed depths, which degrades reconstruction. In this paper, we aim to Boost Depth Estimation (BDE) for MVS and present an approach, termed BDE-MVSNet, for reconstructing high-quality point clouds with precise depth prediction. Motivated by the distinctive difference in estimated probability between foreground and background pixels, we present a non-linear strategy that derives an adaptive depth range (ADR) from the estimated probability. ADR also tends to reduce the fuzzy boundaries caused by up-sampling low-resolution depth maps between stages. Additionally, we provide a novel structure-guided normal ranking (SGNR) loss that imposes geometrical consistency in boundary areas by using the surface normal vector. Extensive experiments on the DTU dataset, the Tanks and Temples benchmark, and the BlendedMVS dataset demonstrate that our method outperforms existing methods and achieves state-of-the-art performance.

1 Introduction

Multi-view Stereo (MVS) is the process of reconstructing the dense 3D geometry of an observed scene from posed images and camera parameters. It is a key problem in computer vision, with applications in domains such as augmented and virtual reality, robotics, and 3D modeling. Although MVS has been studied for several decades, computing a high-quality 3D reconstruction in the presence of occlusions, low-textured regions, and blur remains a challenge [1]. Following the success of deep learning in many fields of computer vision, Convolutional Neural Networks (CNNs) were adopted by MVS methods as an alternative to hand-crafted matching metrics and regularization schemes [2–6], offering a significant improvement in the completeness of the reconstructed model and in the runtime required to generate it [7–13].


Fig. 1. Comparison of depth maps. (a) RGB images. (b)–(d) Depth maps predicted by Vis-MVSNet [9] and our approach, and the respective ground truth depth. Our method predicts much more accurate depth compared to Vis-MVSNet [9]. The resolution of input images is 640 × 512.

The basic learning-based MVS approach [7] begins with the extraction of deep features using a 2D CNN. The corresponding cost maps, each generated from the variance of warped activation maps, are then stacked to form a cost volume over different depth hypotheses. A 3D CNN regularizes the cost volume before Softmax is applied and the depth of the reference image is regressed. Since runtime and memory grow cubically with spatial resolution, with a complexity of O(n³), recent state-of-the-art MVS approaches have proposed various optimizations, such as coarse-to-fine processing of activation maps [8–12], per-pixel depth range estimation [10,11], and pixel-wise visibility constraints [9]. Despite these recent advances, learning-based MVS methods still produce overly smoothed depth and imprecise boundaries, which harm the reconstruction results.

In this work, we propose an MVS method, named BDE-MVSNet, which aims at boosting depth estimation, especially at boundaries and in background areas. Our method is motivated by the uncertainty associated with depth estimation in foreground versus background and boundary regions. After a single estimation stage, we observe that the depth at foreground pixels is assigned a high probability, whereas the depth at background pixels is commonly assigned a low probability. We therefore propose an adaptive depth range (ADR) strategy that computes the depth range per pixel in a non-linear way from its probability. As a consequence, depth ranges for boundary and background pixels are kept broad, while foreground depth is sampled precisely, improving accuracy.


We take inspiration from Monocular Depth Estimation (MDE) approaches [14–16] and present a novel structure-guided normal ranking (SGNR) loss, which promotes geometrical consistency through the surface normal vector, to improve depth regression. With the help of the ADR strategy and the SGNR loss, we estimate depth in only two stages while processing relatively high-resolution images. Without the need to up-sample low-resolution depth maps, this provides an alternative to three-stage cascade approaches such as Vis-MVSNet [9], UCSNet [10] and CasMVSNet [8]. We adopt Vis-MVSNet [9] as our baseline and implement our method on top of it. Figure 1 shows depth maps predicted by Vis-MVSNet [9] and by our method; BDE-MVSNet produces much more accurate depth and sharper boundaries, even in challenging scenarios. We evaluate BDE-MVSNet on commonly benchmarked MVS datasets, namely DTU [17], Tanks and Temples [18] and BlendedMVS [19]. Extensive experiments show that BDE-MVSNet achieves state-of-the-art performance.

To summarize, our main contributions are as follows.

– We propose BDE-MVSNet, which predicts accurate depth and reconstructs high-quality point clouds.
– We introduce the ADR strategy to derive a per-pixel depth interval in a non-linear manner, which enables accurate depth prediction in only two stages.
– We propose the SGNR loss and show that it helps predict sharp boundaries and decreases tailing errors in the reconstructed point cloud.
– We evaluate our method on multiple MVS datasets and show that it achieves state-of-the-art performance.

2 Related Work

2.1 Learning-Based MVS

The accuracy and efficiency of learning-based methods are directly affected by the number of depth hypotheses, the interval from which they are sampled and the spatial resolution of the activation maps, which together determine the dimensions of the regularized cost volume [8]. Recent state-of-the-art learning-based MVS methods have suggested different strategies for extending the basic learned MVS paradigm (MVSNet [7]) and optimizing the aforementioned trade-offs. Different recurrent models were suggested for regularizing the cost volume in a sequential manner [20–22]. However, while decreasing memory consumption, sequential processing does not scale well with spatial resolution. Cascade methods [8–13] instead form a feature pyramid and regress depth maps in a coarse-to-fine manner. Typically, at the coarsest stage, depth is regressed as in the MVSNet paradigm. As the resolution increases, the number of depth hypotheses is decreased, resulting in improved efficiency. The depth range at a given stage is often centered around the depth estimated at the previous, coarser stage, resulting in more accurate depth ranges to sample from [8–11].


Fig. 2. Overview of BDE-MVSNet. We use a visibility-aware architecture, similar to [9], but apply only two processing stages using the same resolution feature maps. Our ADR strategy determines the per-pixel depth range at the second stage based on the estimated probability. We further apply our SGNR loss to enforce geometrical consistency.

One drawback of coarse-to-fine methods is the need to up-sample depth maps from stage to stage, which can yield fuzzy boundaries.

2.2 MVS Loss Functions

Learning-based MVS methods are typically optimized to minimize the L1 loss between the predicted depth map and the ground truth (GT) depth [7,8,10,12,21], or a cross-entropy loss between the predicted probability volume and the GT probability generated by one-hot encoding [20–22]. A recent extension to this formulation is introduced by Vis-MVSNet [9], where per-pixel visibility is explicitly addressed by computing a cost volume per pair of reference and source images before constructing and regularizing the joint cost volume; the common L1 loss is extended to supervise the depth maps estimated in both the pairwise and joint manners. Other extensions improve geometrical consistency through constraints on the surface normal vector [23,24]. Beyond multi-view depth estimation, MDE approaches [14,15,25–27] have also offered novel strategies for improving depth prediction; for example, ranking losses [14,15] and structure-guided point sampling [16] were shown to improve depth accuracy. In this work, we propose a novel SGNR loss that builds on these MDE strategies and surface normal constraints.

3 Method

Given a collection of images and camera parameters (intrinsic matrices $K$, rotation matrices $R$ and translation vectors $t$), our proposed BDE-MVSNet aims to predict a dense depth map $d \in \mathbb{R}^{h \times w}$ for each reference image $I_0 \in \mathbb{R}^{h \times w \times 3}$, using the $m$ source images $\{I_i\}_{i=1}^{m}$ with the highest co-visibility.

3.1 Network Architecture

The main architecture of our proposed BDE-MVSNet is shown in Fig. 2. It first extracts deep feature maps from the reference and source images through a 2D U-Net. The feature maps are then fed into two branches, corresponding to two visibility-aware stages [9] with the same spatial resolution, which construct and regularize 3D cost volumes. Between the two stages, our ADR strategy computes a per-pixel depth interval depending on the probability of the previous depth estimate. We train the model with a loss that combines the Vis-MVSNet loss and our proposed SGNR loss, which computes a structure map from the ground-truth depth map and samples pairs of points to compute the normal ranking loss.

Feature Extraction. We use a 2D U-Net to extract deep features from the reference image $I_0$ and the source images $\{I_i\}_{i=1}^{m}$, and process the finest-resolution feature maps of size $\frac{h}{2} \times \frac{w}{2} \times 32$.

Cost Volume Construction. Following [7,9], we construct pair-wise and joint cost volumes using differentiable homography warping and group-wise correlation. The warping from $I_i$ to $I_0$ at depth $d_j$ is described by

$$H_{i,j} = K_i R_i \left( I - \frac{(t_0 - t_i)\, a_0^{T}}{d_j} \right) R_0^{T} K_0^{-1}, \qquad (1)$$

where $H_{i,j}$ is the homography matrix at depth $d_j$ and $a_0$ denotes the principal axis of the reference view. We first compute pairwise cost volumes and the respective probability volumes [9] using group-wise correlation [28] and a 3D CNN, and then fuse them to construct the final cost volume. A depth-wise $1 \times 1 \times 1$ 3D CNN followed by a Softmax yields the probability volume $P \in \mathbb{R}^{N \times \frac{h}{2} \times \frac{w}{2}}$. The final depth with its probability map can be obtained from $P$ by regression or winner-take-all. The cost volume generation is identical for both stages, i.e., they use the same spatial resolution and sampling strategy; the main difference between the two stages lies in how the prior depth range is derived. The first stage uses a fixed prior depth range for all pixels, as in previous methods, while the second stage updates the depth range for each pixel using our ADR strategy (Sect. 3.2).

Depth Regression. Given the probability volume $P$, the predicted depth $d(p)$ of each pixel $p$ in the reference image is the probability-weighted mean of all $N$ hypotheses:

$$d(p) = \sum_{j=1}^{N} d_j \cdot P(p, j). \qquad (2)$$
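A minimal PyTorch sketch of these two steps is given below; the function names and the toy depth range are illustrative, and a full pipeline would additionally warp the source feature maps with the homography via grid sampling.

```python
import torch

def homography(K_src, R_src, t_src, K_ref, R_ref, t_ref, a_ref, depth):
    """Plane-sweep homography of Eq. (1) for one source view and one depth value.
    Matrices are (3, 3) tensors, vectors are (3,) tensors; a_ref is the reference principal axis."""
    plane = torch.eye(3) - torch.outer(t_ref - t_src, a_ref) / depth
    return K_src @ R_src @ plane @ R_ref.T @ torch.linalg.inv(K_ref)

def regress_depth(prob_volume, depth_values):
    """Soft-argmax depth of Eq. (2): probability-weighted mean over the N hypotheses.
    prob_volume: (B, N, H, W), softmax-normalized over dim 1; depth_values: (N,)."""
    return (prob_volume * depth_values.view(1, -1, 1, 1)).sum(dim=1)   # (B, H, W)

# Toy usage of the regression step.
B, N, H, W = 1, 48, 64, 80
prob = torch.softmax(torch.randn(B, N, H, W), dim=1)
depth_hyp = torch.linspace(425.0, 935.0, N)        # illustrative depth range, not from the paper
depth_map = regress_depth(prob, depth_hyp)
```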

Depth Fusion. At inference time, we apply our model to regress the depth maps of all images and then filter and fuse them to reconstruct the 3D point cloud as in [8], based on photometric and geometric consistency.


Fig. 3. Illustration of ADR. The left column shows a comparison of ADR and the linear strategy [10,11]. The right column shows the distribution of the first stage's probability on the DTU validation set [17].

3.2 Adaptive Depth Range

At the end of the first stage, the network produces a probability volume P and an estimated depth map d. From these two outputs we can obtain an estimated probability P(p) for each pixel p, as in MVSNet [7], which provides a measure of uncertainty. We leverage this information to adapt the depth range at the next stage. We propose a non-linear strategy motivated by the distinctive differences in pixel uncertainty between foreground and background areas. As shown in Fig. 3, when analyzing the distribution of estimated probability, we find that foreground pixels are typically assigned a probability of 0.9–0.999, while pixels in boundary and background regions present a probability of roughly 0.3–0.7.

We report the end point error (EPE) and the fraction of pixels with an error >1 and >3 in the scaled depth maps, denoted as e1 and e3, respectively. We set the number of source images to m = 5 with a resolution of 640 × 512 and keep all other hyper-parameters as in the training phase. We train our model on the BlendedMVS dataset and evaluate it on the BlendedMVS validation set (Table 4). Our approach outperforms other state-of-the-art methods in depth quality by a large margin. A qualitative comparison between Vis-MVSNet and ours on a challenging scene is shown in Fig. 6.
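The paper defines a specific non-linear mapping from probability to depth range; the snippet below is only an illustrative stand-in, a toy exponential shrinkage of the per-pixel interval with the first-stage probability, with hypothetical parameters `gamma` and `min_scale`, meant to convey the idea of keeping broad ranges for uncertain pixels.

```python
import torch

def adaptive_depth_range(prob, base_interval, min_scale=0.05, gamma=4.0):
    """Illustrative ADR-style update (NOT the paper's exact formula): shrink the
    per-pixel depth interval non-linearly with the first-stage probability, so that
    confident foreground pixels get a narrow range while uncertain boundary and
    background pixels keep a broad one.
    prob: (B, H, W) estimated probability in [0, 1]; base_interval: scalar or (B, H, W)."""
    scale = min_scale + (1.0 - min_scale) * torch.exp(-gamma * prob)   # ~1 at p=0, small at p=1
    return base_interval * scale

# Example: probability 0.95 keeps about 7% of the interval, probability 0.4 about 24%.
p = torch.tensor([0.95, 0.4])
print(adaptive_depth_range(p, base_interval=100.0))
```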


Table 3. The effect of the depth range update strategy on the quality of the depth estimation. The linear strategy is used in [10,11]. The resolution of input images is 640 × 512.

Depth range strategy | EPE ↓ | e1 ↓ | e3 ↓
None | 1.35 | 17.82 | 6.55
Linear | 1.24 | 16.81 | 5.67
ADR (ours) | 1.09 | 15.63 | 5.61

Table 4. Accuracy (EPE, e1, e3), memory and runtime for depth map estimation with different MVS methods and ours. Results are reported for the BlendedMVS [19] validation set, using an image resolution of 640 × 512.

Method | EPE ↓ | e1 ↓ | e3 ↓ | Memory | Runtime
MVSNet [7] | 1.49 | 21.98 | 8.32 | 5.50G | 1.18 s
CasMVSNet [8] | 1.43 | 19.01 | 9.77 | 2.71G | 0.44 s
CVP-MVSNet [32] | 1.90 | 19.73 | 10.24 | – | –
DDR-Net [11] | 1.41 | 18.08 | 8.32 | – | –
Vis-MVSNet [9] | 1.47 | 18.47 | 7.59 | 1.85G | 0.56 s
Ours | 1.06 | 15.14 | 5.13 | 1.81G | 0.47 s

Thanks to the accurate predicted depth, BDE-MVSNet is able to reduce the tailing error in the final point cloud (Fig. 7).

4.6 Runtime and Memory Analysis

We measure the runtime and memory cost of depth estimation for our method and several state-of-the-art methods on a Tesla V100 GPU, as shown in Table 4. Our method achieves lower memory and runtime cost than Vis-MVSNet [9] and MVSNet [7], and provides a better runtime-memory trade-off than CasMVSNet.

4.7 Ablation Study

We carry out ablation experiments to evaluate the contribution of our proposed ADR strategy and SGNR loss. As shown in Table 5, ADR is the main contributor to the observed reduction in EPE, e1 and e3. When it is employed together with the SGNR loss, we achieve a further improvement in depth accuracy.

Fig. 8. Comparison between the proposed ADR strategy and the linear strategy [10,11] on two examples.


Table 5. Ablation study on the BlendedMVS validation set. The resolution of input images is 640 × 512.

ADR | SGNR | EPE ↓ | e1 ↓ | e3 ↓
– | – | 1.35 | 17.82 | 6.55
– | ✓ | 1.30 | 16.98 | 5.99
✓ | – | 1.09 | 15.63 | 5.61
✓ | ✓ | 1.06 | 15.14 | 5.13

We further compare the linear strategy employed by [10,11] with our proposed ADR strategy. Table 3 reports the EPE, e1 and e3 of our model when trained without updating the depth range between stages, and when doing so with either the linear strategy or ADR. While the linear depth update improves performance, ADR achieves better depth accuracy. Figure 8 illustrates the difference: on the left we show the RGB images and the corresponding predicted depth maps; on the right we show the probability details of a pixel (red point) together with the depth intervals (pink). (a) For a point with high probability, ADR and the linear strategy both narrow the depth range and predict accurate depth in the second stage. (b) For a point with medium probability at a boundary edge, ADR keeps the depth interval large and predicts accurate depth. (c) For the same point as in (b), the linear strategy narrows the depth interval and obtains the wrong depth.

5 Conclusion

In this paper, we present BDE-MVSNet to boost depth estimation for multi-view stereo. We propose a non-linear method for deriving per-pixel depth ranges and a novel structure-guided normal ranking loss. Together, these optimizations yield more accurate depth prediction and 3D reconstruction. To the best of our knowledge, our work is the first to directly tackle boundary prediction, and it can be used to improve the performance of learning-based MVS methods. As a result, our proposed BDE-MVSNet achieves state-of-the-art performance on multiple datasets.

References 1. Zhu, Q., Min, C., Wei, Z., Chen, Y., Wang, G.: Deep learning for multi-view stereo via plane sweep: a survey (2021) 2. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30, 328–341 (2007) 3. Campbell, N.D.F., Vogiatzis, G., Hern´ andez, C., Cipolla, R.: Using multiple hypotheses to improve depth-maps for multi-view stereo. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 766–779. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2 58


4. Furukawa, Y., Ponce, J.: Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1362–1376 (2009)
5. Tola, E., Strecha, C., Fua, P.: Efficient large-scale multi-view stereo for ultra high-resolution image sets. Mach. Vis. Appl. 23, 903–920 (2012)
6. Galliani, S., Lasinger, K., Schindler, K.: Massively parallel multiview stereopsis by surface normal diffusion. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 873–881 (2015)
7. Yao, Y., Luo, Z., Li, S., Fang, T., Quan, L.: MVSNet: depth inference for unstructured multi-view stereo. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 785–801. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_47
8. Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., Tan, P.: Cascade cost volume for high-resolution multi-view stereo and stereo matching. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2492–2501 (2020)
9. Zhang, J., Yao, Y., Li, S., Luo, Z., Fang, T.: Visibility-aware multi-view stereo network. In: British Machine Vision Conference (2020)
10. Cheng, S., et al.: Deep stereo using adaptive thin volume representation with uncertainty awareness. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2521–2531 (2020)
11. Yi, P., Tang, S., Yao, J.: DDR-Net: learning multi-stage multi-view stereo with dynamic depth range (2021)
12. Wang, F., Galliani, S., Vogel, C., Speciale, P., Pollefeys, M.: PatchmatchNet: learned multi-view PatchMatch stereo. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 14194–14203 (2021)
13. Ding, Y., et al.: TransMVSNet: global context-aware multi-view stereo network with transformers. arXiv preprint arXiv:2111.14600 (2021)
14. Chen, W., Fu, Z., Yang, D., Deng, J.: Single-image depth perception in the wild. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
15. Xian, K., et al.: Monocular relative depth perception with web stereo data supervision. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 311–320 (2018)
16. Xian, K., Zhang, J., Wang, O., Mai, L., Lin, Z., Cao, Z.: Structure-guided ranking loss for single image depth prediction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 608–617 (2020)
17. Jensen, R., Dahl, A., Vogiatzis, G., Tola, E., Aanaes, H.: Large scale multi-view stereopsis evaluation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 406–413 (2014)
18. Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and temples: benchmarking large-scale scene reconstruction. ACM Trans. Graph. 36, 1–13 (2017)
19. Yao, Y., Luo, Z., Li, S., Zhang, J., Quan, L.: BlendedMVS: a large-scale dataset for generalized multi-view stereo networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1787–1796 (2020)
20. Yao, Y., Luo, Z., Li, S., Shen, T., Fang, T., Quan, L.: Recurrent MVSNet for high-resolution multi-view stereo depth inference. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5525–5534 (2019)
21. Yan, J., et al.: Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 674–689. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_39


22. Wei, Z., Zhu, Q., Min, C., Chen, Y., Wang, G.: AA-RMVSNet: adaptive aggregation recurrent multi-view stereo network. In: IEEE International Conference on Computer Vision (2021)
23. Yin, W., Liu, Y., Shen, C.: Virtual normal: enforcing geometric constraints for accurate and robust depth prediction. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2021)
24. Long, X., Liu, L., Theobalt, C., Wang, W.: Occlusion-aware depth estimation with adaptive normal constraints. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 640–657. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_37
25. Chakrabarti, A., Shao, J., Shakhnarovich, G.: Depth from a single image by harmonizing overcomplete local network predictions. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
26. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
27. Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002–2011 (2018)
28. Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3268–3277 (2019)
29. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
30. Galliani, S., Lasinger, K., Schindler, K.: Massively parallel multiview stereopsis by surface normal diffusion. In: IEEE International Conference on Computer Vision, pp. 873–881 (2015)
31. Schönberger, J.L., Zheng, E., Frahm, J.-M., Pollefeys, M.: Pixelwise view selection for unstructured multi-view stereo. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 501–518. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_31
32. Yang, J., Mao, W., Alvarez, J.M., Liu, M.: Cost volume pyramid based depth inference for multi-view stereo. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4877–4886 (2020)
33. Wei, Z., Zhu, Q., Min, C., Chen, Y., Wang, G.: AA-RMVSNet: adaptive aggregation recurrent multi-view stereo network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6187–6196 (2021)
34. Ma, X., Gong, Y., Wang, Q., Huang, J., Chen, L., Yu, F.: EPP-MVSNet: epipolar-assembling based depth prediction for multi-view stereo. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5732–5740 (2021)

Training-Free NAS for 3D Point Cloud Processing

Ping Zhao(B), Panyue Chen, and Guanming Liu

Tongji University, Shanghai, China
{zhaoping,chenpanyue,2130776}@tongji.edu.cn

Abstract. Deep neural networks for 3D point cloud processing have exhibited superior performance on many tasks. However, the structure and computational complexity of existing networks are relatively fixed, which makes it difficult for them to be flexibly applied to devices with different computational constraints. Instead of manually designing the network structure for each specific device, in this paper, we propose a novel training-free neural architecture search algorithm which can quickly sample network structures that satisfy the computational constraints of various devices. Specifically, we design a cell-based search space that contains a large number of latent network structures. The computational complexity of these structures varies within a wide range to meet the needs of different devices. We also propose a multi-objective evolutionary search algorithm. This algorithm scores the candidate network structures in the search space based on multiple training-free proxies, encourages high-scoring networks to evolve, and gradually eliminates low-scoring networks, so as to search for the optimal network structure. Because the calculation of training-free proxies is very efficient, the whole algorithm can be completed in a short time. Experiments on 3D point cloud classification and part segmentation demonstrate the effectiveness of our method (code will be available).

Keywords: 3D point cloud processing · Training-free proxies · Neural architecture search

1 Introduction

Deep neural networks (DNNs) for 3D point cloud processing have achieved superior performance on many tasks and have received more and more extensive attention. However, most of the existing DNNs rely on researchers to manually design the network structures, which places high demands on the researchers' experience and expertise. In addition, the structure and computational complexity of existing networks are relatively fixed, which makes it difficult for them to be flexibly applied to devices with different computational constraints. For example, complex networks [20,30] in academia often cannot be deployed on mobile devices. This restriction creates a split between academia and industry, limiting the development of downstream tasks such as autonomous driving [12,27]


Table 1. Two categories of existing training-free proxies. We classify them according to the properties that they can reflect.

Pruning-Based Proxies for Network Trainability
– Gradnorm [1]: Euclidean norm of the parameter gradients.
– Synflow [23]: absolute Hadamard product of the parameter gradients and parameters.

Linear-Region-Based Proxies for Network Expressivity
– Naswot [16]: Hamming distance of binary activation patterns between mini-batch samples.
– Zen-score [13]: gradients of deep layer feature maps to mini-batch samples.

and robotics [22]. To address the above issues, some studies [8,9,14] proposed using one-shot neural architecture search (NAS) to quickly sample structures. However, these methods require careful training of a relatively large supernet and suffer from the degenerate search-evaluation correlation problem [8], i.e., the performance of the sampled network at search time does not match its performance at evaluation time. This problem further limits the development of one-shot NAS methods.

In this paper, we focus on bridging the gap between academia and industry for the different needs of deep neural networks. Specifically, we propose a novel training-free neural architecture search algorithm which can quickly sample network structures that satisfy the computational constraints of various devices. We first design a cell-based search space in which the network structure has a dynamic number of layers and dynamic feature dimensions. The computational complexity of structures in this search space varies within a wide range to meet the needs of different devices; e.g., we can sample very complex structures for cloud devices and relatively simple structures for mobile devices. Then we propose a multi-objective evolutionary search algorithm. This algorithm scores the candidate network structures in the search space based on multiple training-free proxies, encourages high-scoring networks to evolve, and gradually eliminates low-scoring networks, so as to search for the optimal network structure.

Training-free proxies are numerical metrics that can evaluate the performance of a neural network before training. Previous studies [1,13,16] usually use a single proxy to guide the search. However, a single proxy can only reflect a single property¹ of the network structure and cannot fully reflect the different properties of a complex network structure. Based on such a single proxy, we can only encounter a small part of the network structures in the search space. Therefore, in this paper, we propose to use multiple training-free proxies jointly. The objective of our search algorithm is to find network structures that perform well across multiple training-free proxies. We believe that, in this way similar to ensemble learning, the search algorithm can sample more potential structures from the search space. To jointly use proxies that reflect various properties, we divide them into different categories, as shown in Table 1.

¹ A network has various properties, e.g., trainability (how effectively a network can be optimized via gradient descent) and expressivity (how complex a function the network can represent).

To sum up, there are two stages in our method. In the search stage, based on multiple training-free proxies, we use an evolutionary search algorithm to find potential network structures under specific computational constraints in the designed search space. In the evaluation stage, we train these structures from scratch on different tasks to obtain their final performance. Experiments on 3D point cloud classification and part segmentation show that, compared with previous methods, our method can find better network structures under different computational constraints, which makes it widely applicable to various devices. Furthermore, compared to traditional NAS methods, our search stage only takes hours, which further reduces the usage bottleneck of our method.

Our contributions are as follows:
1. We design a novel search space adapted to training-free proxies for 3D point cloud processing neural networks. This search space allows us to explore potential structures with varying amounts of computation;
2. We propose a multi-objective evolutionary search algorithm, which enables us to search with multiple proxies in an ensemble-learning manner;
3. Our method largely bridges the gap between academia and industry; experiments on 3D point cloud classification and part segmentation demonstrate the effectiveness of our method.

2 Related Works

2.1 Point Cloud Processing

PointNet [18] first proposes to use a multi-layer perceptron with shared weights to process each point and a symmetric aggregation function to obtain the global feature. PointNet++ [19] uses a ball query for each point to obtain features of its neighbor points. PointCNN [11] aligns the points in a certain order by predicting the transformation matrix of the local point set. PCNN [2] proposes a parameterized continuous convolution operation. DGCNN [28] uses features from k-Nearest Neighbors (k-NN) and proposes an edge convolution operator for feature extraction. AdaptConv [35] proposes to jointly use the feature relationship and the coordinate relationship to achieve efficient graph convolution. These studies have proved the effectiveness of deep neural networks in 3D point cloud processing. However, they have relatively fixed network structures, which makes it difficult to apply them flexibly to devices with different computational constraints.

2.2 Training-Free Neural Architecture Search

Training-free neural architecture search algorithms originate from network pruning. Some studies [7,23,26] try to prune networks in the initialization stage. SNIP [7] proposes an importance metric that approximates the change of the network loss after a specific parameter is removed. SynFlow [23] proposes the synflow metric, which can avoid layer collapse during pruning. Abdelfattah et al. [1] extend these indicators to neural architecture search and use them to evaluate the performance of the whole neural network. The proxies in other studies [13,16] are motivated by recent theoretical advances in deep networks and can also be computed before network training. NASWOT [16] calculates the Hamming distance of binary activation patterns between mini-batch samples to estimate the number of linear regions of ReLU networks. Zen-NAS [13] proves that the expressivity of a network can be efficiently measured by its expected Gaussian complexity. These works have achieved competitive performance. However, few of them use multiple training-free proxies jointly. In addition, they are all applied to 2D image classification and have not been verified on more tasks such as 3D point cloud classification and segmentation.

2.3 Neural Architecture Search in 3D Point Cloud Processing

Some studies [8,9,14,17,24] apply one-shot NAS to 3D point cloud processing and achieve relatively good performance. LC-NAS [9] implements a latency constraint formulation to trade off accuracy and latency in 3D NAS. PolyConv [14] introduces a poly-convolutional feature encoder that comprises multiple feature aggregation functions. SPVNAS [24] implements a sparse point-voxel convolution network to effectively process large-scale 3D scenes. SGAS [8] proposes a greedy algorithm and designs multiple numerical metrics such as edge importance for efficient structure selection. PointSeaNet [17] proposes a differentiable convolution search paradigm to create a group of suitable convolutions for 3D point cloud processing. In contrast to these one-shot NAS studies, our training-free method does not require training a relatively large supernet and does not suffer from the degenerate search-evaluation correlation problem.

3 Methods

3.1 Preliminaries and Notations

To quickly sample network structures that satisfy the computational constraints of various devices, we propose a novel training-free neural architecture search algorithm. In Sect. 3.2, we analyze the existing training-free proxies and classify them based on the properties they can reflect. In Sect. 3.3, we propose a search space for 3D point cloud processing networks. In Sect. 3.4, we introduce a novel multi-objective evolutionary search algorithm based on multiple training-free proxies.

Let $X = \{x_i\}_{i=1}^{N}$ denote a mini-batch of point cloud data, where $x_i \in \mathbb{R}^{3 \times G}$ is a point cloud with $G$ points, let $\theta$ denote the network parameters, initialized from a Gaussian distribution $\mathcal{N}(0, 1)$, and let $L(x_i, \theta)$ denote the network loss for the input $x_i$.
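To make the notation concrete, the following PyTorch sketch prepares a random mini-batch of the assumed shape and initializes a network's parameters from a standard Gaussian before any proxy is computed; the shapes and the helper name are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

# A mini-batch X = {x_i} of N point clouds with G points each, x_i in R^{3 x G}.
N, G = 8, 1024
X = torch.randn(N, 3, G)

def init_gaussian(model: nn.Module) -> nn.Module:
    """Draw every parameter theta from N(0, 1), as assumed by the training-free proxies."""
    for p in model.parameters():
        nn.init.normal_(p, mean=0.0, std=1.0)
    return model
```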


Fig. 1. Correlation between the score of different training-free proxies (horizontal axis) and the overall accuracy (vertical axis). ρ stands for the Pearson coefficient.

3.2 Training-Free Proxies

We divide existing training-free proxies into two categories according to the properties that they can reflect: (1) pruning-based proxies for the trainability of neural networks; these proxies come from network pruning studies and indicate the performance of a neural network by calculating an indicator related to the parameter gradients at initialization. (2) Linear-region-based proxies for the expressivity of neural networks; the linear region [4,31] is an essential indicator of the expressivity of a ReLU network, and these proxies indicate the performance of the network by estimating its number of linear regions.

(1) Pruning-Based Proxies for Trainability

Gradnorm [1] is one of the simplest proxies. It sums the Euclidean norm of the parameter gradients after a single backward pass on a mini-batch and uses this value to represent the trainability of a network:

gradnorm: $S^{\mathrm{gradnorm}} = \sum_{\theta} \left\lVert \frac{\partial L(X, \theta)}{\partial \theta} \right\rVert$   (1)

Synflow [1,23] proposes iterative magnitude pruning to avoid layer collapse in premature pruning. Formally, it is the absolute Hadamard product of the gradient of the loss with respect to each parameter and the parameter itself, computed on a single mini-batch. Following [1], when calculating synflow, we compute a loss that is simply the product of all parameters in the network, and we sum the synflow of each parameter to represent the trainability of a network:

synflow: $S^{\mathrm{synflow}} = \sum_{\theta} \left| \frac{\partial L(X, \theta)}{\partial \theta} \odot \theta \right|$   (2)
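A minimal PyTorch sketch of Eqs. (1) and (2), not the authors' implementation: the tiny shared-MLP classifier, the cross-entropy loss used for gradnorm, and the all-ones forward pass used as the product-of-parameters loss for synflow are assumptions of this sketch.

```python
import torch
import torch.nn as nn

# A tiny PointNet-style classifier, used only to make the example runnable.
model = nn.Sequential(
    nn.Conv1d(3, 64, 1), nn.ReLU(),   # shared "1x1" linear layers over points
    nn.Conv1d(64, 128, 1), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(128, 40),               # e.g. 40 ModelNet40 classes
)
x = torch.randn(8, 3, 1024)           # mini-batch: 8 clouds, 1024 points each
labels = torch.randint(0, 40, (8,))

def gradnorm_score(model, x, labels):
    """Eq. (1): sum of Euclidean norms of parameter gradients after one backward pass."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), labels)
    loss.backward()
    return sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None)

def synflow_score(model, x):
    """Eq. (2): sum over parameters of |dL/dtheta * theta|; the 'loss' is the network
    output on an all-ones input with all parameters made positive (a common surrogate
    for the product-of-parameters loss)."""
    signs = {name: p.data.sign() for name, p in model.named_parameters()}
    for p in model.parameters():
        p.data.abs_()                          # linearize: use |theta|
    model.zero_grad()
    model(torch.ones_like(x)).sum().backward()
    score = sum((p.grad * p.data).abs().sum().item()
                for p in model.parameters() if p.grad is not None)
    for name, p in model.named_parameters():   # restore the original signs
        p.data.mul_(signs[name])
    return score

print(gradnorm_score(model, x, labels), synflow_score(model, x))
```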

Fig. 2. Illustration of 1 × 1 convolution (a), n × n convolution (b) and basic cell (c). ‘Linear’ means a linear layer shared by all points. Note that the output feature channels of each ‘Linear’ layer are dynamic (the parts drawn in solid lines are chosen, while those in dashed lines are not), leading to various basic cells.

(2) Linear-Region-Based Proxies for Expressivity

Naswot [16] estimates the linear regions of ReLU networks by calculating the Hamming distance of binary activation patterns between mini-batch samples. Suppose there are M rectified linear units in the network. For a specific input $x_i$, we can arrange its activation patterns into a pattern string $p_i$. In this way, the number of linear regions can be estimated from the Hamming distances $d_H$ between different inputs in a mini-batch:

naswot: $S^{\mathrm{naswot}} = \log \left| \det \begin{pmatrix} M - d_H(p_1, p_1) & \cdots & M - d_H(p_1, p_N) \\ \vdots & \ddots & \vdots \\ M - d_H(p_N, p_1) & \cdots & M - d_H(p_N, p_N) \end{pmatrix} \right|$   (3)
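Eq. (3) is the log-determinant of a kernel matrix built from binary ReLU activation codes. A minimal sketch of this scoring, under the assumption that activation patterns are collected with forward hooks on the ReLU layers; the toy model and shapes are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn

def naswot_score(model, x):
    """Eq. (3): log |det K| with K[i, j] = M - HammingDistance(p_i, p_j), where p_i is
    the binary ReLU activation pattern of sample i and M is the number of ReLU units."""
    codes = []

    def hook(_module, _inp, out):
        codes.append((out > 0).flatten(1).float())   # 1 if the unit fires, else 0

    handles = [m.register_forward_hook(hook) for m in model.modules()
               if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()

    p = torch.cat(codes, dim=1)                      # (N, M) binary pattern strings
    m_units = p.shape[1]
    ham = p @ (1.0 - p).t() + (1.0 - p) @ p.t()      # pairwise Hamming distances
    k = m_units - ham                                # K[i, j] = M - d_H(p_i, p_j)
    _sign, logabsdet = torch.linalg.slogdet(k)
    return logabsdet.item()

# Example with a tiny shared-MLP network and a random mini-batch (placeholders).
model = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU(),
                      nn.Conv1d(64, 128, 1), nn.ReLU())
print(naswot_score(model, torch.randn(8, 3, 1024)))
```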

1

N


Zen-score [13] calculates the change of deep-layer feature maps $f(X, \theta)$ after adding a disturbance to the input $X$. This result is equal to the Gaussian complexity of the neural network, which considers not only the distribution of linear regions but also the Gaussian complexity of the linear classifier in each linear region:

zen-score: $S^{\mathrm{zen\text{-}score}} = \log \left\lVert \frac{\partial f(X, \theta)}{\partial X} \right\rVert$   (4)

Through this classification of existing training-free proxies, we can jointly use proxies with different properties for neural architecture search. We believe that NAS can achieve better performance through this ensemble-learning-like approach. Our experiments also support this view.

3.3

Point Cloud Search Space

Similar to many classical neural architecture search algorithms [3,21,34,36], we design a cell-based search space, the macro architecture of the network is stacked

302

P. Zhao et al.

Fig. 3. Exploring the effects of hyperparameter k on network performance and trainingfree proxies. k stands for the number of neighbors obtaining in k-NN algorithm. The left side shows the network structure used in this experiment, and the right side shows the effects of k to overall accuracy and proxies. Except naswot, other proxies are not sensitive to k.

by several basic cells, only the inner architecture of each cell needs to be searched. In our paper, as shown in Fig. 2, a basic cell is a bottleneck block [5] consisting of two 1 × 1 convolutions and one n × n convolution. However, 3D point cloud data is disordered, and it is difficult to use regular convolutions in 2D image processing to process it, so we first define the convolutions for 3D point cloud processing based on previous studies. 1 × 1 convolution comes from PointNet [18]. Considering the permutation invariance of 3D point cloud data, PointNet first proposes to use a linear layer with shared weights to process each point. This operation well maintains the permutation invariance but can not mine the relationship between points. We take this linear layer with shared weights as a basic component which corresponds to 1 × 1 convolution in 2D image processing. n×n convolution comes from DGCNN [28]. During a n×n convolution, the kNN algorithm is firstly used to obtain the features of k neighbors of each point in the point cloud based on the feature distance between points. Then a linear layer with shared weights is used to deal with these features (maintaining permutation invariance). Finally, a max pooling operation is used to aggregate features from neighbors. We take these operations as a basic component which corresponds to the convolution with a size greater than 1 × 1 in 2D image processing. It should be noted that in 2D image processing, the receptive field of n × n convolution is determined by the size of n, while in 3D point cloud processing, the receptive field of n × n convolution is determined by the size of k in k-NN. In our experiments (Fig. 3), we fix the network structure and the number of feature channels, and only change the value of k in the n × n convolution to observe the change of overall accuracy (training from scratch on ModelNet40 [29] dataset) and proxies, we can find that except naswot, other training-free proxies are not sensitive to the value of k. This may be because the parameter amount of n × n convolution in 3D point cloud processing is completely determined by its linear layer. This operation is similar to 1 × 1 convolution in the case of random

Training-Free NAS for 3D Point Cloud Processing Input Points

Basic Cell 1

Basic Cell 2

Basic Cell 3

Basic Cell 4

Basic Cell 5

Basic Cell 6

Basic Cell 7

Basic Cell 8

Linear

303

Po o l

Repeating Basic Cell 5

Linear

Po o l

MLP

Classification

Concat

MLP Segmentation

Fig. 4. Examples of the network structure for classification with 5 basic cells and the network structure for segmentation with 8 basic cells, respectively. Note that we only search structures of network encoder, and the decoder part is a multi-layer perceptron.

initialization. Therefore, in this paper, we fix the k value as the empirical value of 20. As shown in Fig. 4, we stack multiple different basic cells to form the encoder of the network, the decoder is a multi-layer perceptron. In the search stage, The number of output feature channels of three convolutions in each cell varies in [32,64..., 320], which means that there are 1000 different cells in our search space. We randomly sample 20 classification networks with 3 cells in this search space, calculate their values on different training-free proxies, and train them from scratch on ModelNet40 [29] dataset to obtain the overall accuracy. Then we report the correlation between the proxies and the accuracy. As shown in Fig. 1, gradnorm does not perform well in this search space. Therefore, in the subsequent experiments, we will not consider gradnorm. The other three proxies can achieve high correlation coefficients. 3.4

Training-Free NAS

We use the standard single-objective evolutionary algorithm in [13] as a baseline. In this paper, we call it SE-NAS (Single-objective Evolutionary Neural Architecture Search). It uses one single training-free proxy as objective to rank candidate structures and randomly mutate structures in population. In [13], they only use zen-score as objective. We also use synflow and naswot for comparison. After multiple iterations of SE-NAS, we select the three highest-scoring structures in the population and retrain these structures to report the optimal performance they can achieve. To use multiple proxies jointly in an ensemble learning way, we design MENAS (Algorithm 1): a multi-objective evolutionary neural architecture search algorithm. Different from SE-NAS, since each structure corresponds to multiple proxies, it is difficult for us to compare two structures directly through the value of proxies. Therefore, we introduce the concept of Pareto dominance, that is, when the values of a structure A on all proxies are better than those of structure B, then structure A dominates structure B, or structure A is better than structure B. Through this method, we can divide all structures in the population into multiple non-dominated fronts, the structure of the upper front dominates the structure of the lower front, and there is no dominance relationship



Algorithm 1. ME-NAS
Require: Search space S, FLOPs constraint B, maximal depth L, total number of iterations T, population size N, number of new individuals per iteration n.
Ensure: Optimal structures.
1: Randomly generate N structures {F0, ..., FN} and calculate their training-free proxies.
2: Initialize population P = {F0, ..., FN}.
3: for t = 1, 2, ..., T do
4:   Divide P into M non-dominated fronts (R0, R1, ..., RM) by fast non-dominated sort on the training-free proxies and calculate the crowding distance in each non-dominated front.
5:   Define a new population P̂.
6:   for m = 0, 1, ..., M do
7:     if |P̂| + |Rm| > N then
8:       break.
9:     end if
10:    Add all individuals in Rm to P̂.
11:  end for
12:  Fill P̂ to |P̂| = N by adding individuals from Rm according to crowding distance.
13:  P = P̂.
14:  Repeat step 4 once.
15:  Select n individuals and MUTATE (Algorithm 2) them to generate n new individuals.
16:  Calculate the training-free proxies of the new individuals and add them to P.
17: end for
18: Return all architectures in R0.

between the structures of the same front. When the population evolves, we select the structure to be mutated by its front. The higher the level, the more likely it is to mutate. Finally, ME-NAS returns multiple top-front structures (usually 2 to 4), which we individually retrain and report the optimal performance they can achieve.
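A minimal sketch of the Pareto-dominance test and the fast non-dominated sort used in step 4 of Algorithm 1; proxy scores are assumed to be "higher is better", and the names and data layout are illustrative rather than the authors' code.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Structure a dominates b if it is no worse on every proxy and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def fast_non_dominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Split the population (indices into `scores`) into fronts R0, R1, ...:
    R0 is dominated by nobody, R1 only by members of R0, and so on."""
    n = len(scores)
    dominated_by = [[] for _ in range(n)]   # individuals that i dominates
    counts = [0] * n                        # how many individuals dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(scores[i], scores[j]):
                dominated_by[i].append(j)
            elif dominates(scores[j], scores[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    current = 0
    while fronts[current]:
        nxt = []
        for i in fronts[current]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        current += 1
    return fronts[:-1]                      # drop the trailing empty front

# e.g. each individual scored by (synflow, naswot):
print(fast_non_dominated_sort([(3.0, 5.0), (2.0, 7.0), (1.0, 1.0), (2.5, 6.0)]))
```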

4 4.1

Experiments Implementation Details

Our experiments are mainly carried out on the 3D point cloud classification and part segmentation task. First, we compare SE-NAS and ME-NAS under different computation constraints and compare our results with existing methods on 3D point cloud classification. Then, we give the quantitative and qualitative results of 3D point cloud part segmentation.



Algorithm 2. MUTATE
Require: Structure Ft, cell number M of Ft, search space S.
Ensure: Randomly mutated architecture F̂t.
1: Uniformly select an integer i from [0, M].
2: if i = M then
3:   Generate a new cell b̂ from S.
4:   Add b̂ to the end of Ft.
5: else
6:   Uniformly alter the channel width of the i-th cell in Ft within some range, or even delete this cell.
7: end if
8: Return the mutated architecture F̂t.
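As a sketch, MUTATE can be written as follows, under the assumption that a structure is represented as a list of per-cell channel-width tuples drawn from the [32, 64, ..., 320] choices of Sect. 3.3; the deletion probability and the cell cap are placeholders, not values from the paper.

```python
import random

CHANNEL_CHOICES = list(range(32, 321, 32))   # [32, 64, ..., 320], as in Sect. 3.3

def mutate(cells, max_cells=8):
    """Algorithm 2: either append a new random cell, or alter / delete one existing cell.
    Each cell is modelled as a (c_mid, c_mid, c_out) tuple of channel widths."""
    cells = list(cells)
    i = random.randint(0, len(cells))                 # i == len(cells) means "grow"
    if i == len(cells):
        if len(cells) < max_cells:
            cells.append(tuple(random.choice(CHANNEL_CHOICES) for _ in range(3)))
    elif random.random() < 0.1 and len(cells) > 1:    # occasionally delete the chosen cell
        del cells[i]
    else:                                             # otherwise perturb its channel widths
        cells[i] = tuple(random.choice(CHANNEL_CHOICES) for _ in range(3))
    return cells

print(mutate([(64, 64, 128), (128, 128, 256)]))
```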

During neural architecture search, we set the maximum number of cells to 5 and 8 for classification and part segmentation, respectively. The number of input channels of each cell is determined by the previous cell, and the number of middle-layer channels and output channels are selected from [32, 64, ..., 320]. The population size of the evolutionary algorithm is 50, and the number of iterations is 30000. To show that our algorithm can be flexibly applied to various devices, we set several different computation constraints in the search process. For classification tasks, we set the computation constraint to 0.5G MACs (Tiny), 1G (Small), 2G (Middle), 3G (Large), 4G (Huge). For segmentation tasks, we set the computation constraint to 4G (Tiny), 8G (Small), 12G (Middle), 16G (Large), 20G (Huge). Our classification experiments are carried out on the ModelNet40 [29] dataset. This dataset contains 12311 meshed CAD models from 40 categories, where 9843 models are used for training and 2468 models for testing. We randomly sample 1024 points from each object as network inputs and only use the (x, y, z) coordinates of the sampled points as input. The training strategy is similar to [35]. It should be noted that the randomness of 3D point cloud classification is relatively large, so in all classification experiments, we train the same structure three times with different seeds and report the average accuracy. Table 2. SE-NAS and ME-NAS with different training-free proxies under different computational constraints.

Method

synflow naswot zen-score Tiny

Small Middle Large Huge

SE-NAS



93.01

92.82 

SE-NAS ME-NAS



ME-NAS



ME-NAS





93.31

92.92

93.06

93.22

93.33

93.46

92.99

93.12

93.24

93.37

93.03

93.20 93.40

93.53 93.66



93.00

93.07

93.27

93.38

93.49



93.02

93.17

93.37

93.50

93.68



93.08 93.19

93.38

93.53 93.71

 

ME-NAS

93.29

92.85



SE-NAS [13]

93.05



Fig. 5. Exploring the relationships between computational constraints during NAS (horizontal axis) and network performance (vertical axis) on the ModelNet40 dataset. (a) shows the comparisons of SE-NAS and ME-NAS with different training-free proxies. random means randomly selecting structures. (b) shows the comparisons of ME-NAS (synflow+naswot) and other studies.

Our part segmentation experiments are carried out on the ShapeNetPart [33] dataset. This dataset contains 16,881 shapes from 16 categories, with 14,006 for training and 2,874 for testing. Each point is annotated with one label from 50 parts, and each point cloud contains 2-6 parts. We randomly sample 2048 points from each shape for segmentation. The training strategy is similar to [35]. 4.2

3D Point Cloud Classification

SE-NAS and ME-NAS Under Different Computational Constraints We report the results of SE-NAS and ME-NAS under different computational constraints and different training-free proxies in Table 2 and Fig. 5 (a), where random means randomly selecting structures within the computational constraints. As we can see, all training-free proxies can achieve better results than random, and jointly using multiple proxies that reflect different properties in an ensemble learning way can further improve performance. Comparisons with Other Works. We compare the results2 reported by MENAS (synflow+naswot) with other studies in 3D point cloud classification. In Fig. 5 (b), we draw the FLOPs-OA diagram3 . In Table 3, we provide more comparison results. Experiments show that with a large computational constraints, the performance of the proposed ME-NAS surpasses existing manual methods and one-shot NAS methods, with a small computational constraints, ME-NAS can also achieve relatively good performance.

2 3

Note that we do not use the voting strategy during testing in any experiment. We only measured the FLOPs of a few networks, because some papers are not open-source.



Table 3. Classification results of ME-NAS (synflow+naswot) and other studies on the ModelNet40 dataset. ‘Design’ means how this work is designed, ‘MD’ means manual design. ‘OS’ means one-shot NAS methods, ‘ZS’ means zero-shot NAS methods. Method

Design Input

#points OA(%)

PointNet [18]

MD

xyz

1k

89.2

PointNet++ [19]

MD

xyz, normal 5 k

91.9

3D-GCN [15]

MD

xyz, normal 1 k

92.1

PointCNN [11]

MD

xyz

1k

92.2

PCNN [2]

MD

xyz

1k

92.3

LC-NAS [9]

OS

xyz

1k

92.8

DGCNN [28]

MD

xyz

1k

92.9

KPConv [25]

MD

xyz

6.8 k

92.9

PointASNL [32]

MD

xyz, normal 1 k

93.2

SGAS [8]

OS

xyz

1k

93.2

AdaptConv [35]

MD

xyz

1k

93.4

PolyConv [14]

OS

xyz

1k

93.5

ME-NAS (synflow+naswot)-Tiny

ZS

xyz

1k

93.03

ME-NAS (synflow+naswot)-Small

ZS

xyz

1k

93.20

ME-NAS (synflow+naswot)-Middle ZS

xyz

1k

93.40

ME-NAS (synflow+naswot)-Large

ZS

xyz

1k

93.53

ME-NAS (synflow+naswot)-Huge

ZS

xyz

1k

93.66

Table 4. Part segmentation results of ME-NAS (synflow+naswot) and other studies on the ShapeNetPart dataset evaluated as the mean class IoU (mcIoU) and mean instance IoU (mIoU). Method

mcIoU mIoU air plane bag cap car

chair ear phone guitar knife lamp laptop motor bike mug pistol rocket skate board table

Kd-Net [6] PointNet [18] PointNet++ [19] SO-Net [10] DGCNN [28] PCNN [2] PointCNN [11] PointASNL [32] 3D-GCN [15] AdaptConv [35]

77.4 80.4 81.9 81.0 82.3 – – – 82.1 83.4

82.3 83.7 85.1 84.9 85.2 85.1 86.1 86.1 85.1 86.4

80.1 83.4 82.4 82.8 84.0 82.4 84.1 84.1 83.1 84.8

74.6 78.7 79.0 77.8 83.4 80.1 86.4 84.7 84.0 81.2

74.3 82.5 87.7 88.0 86.7 85.5 86.0 87.9 86.6 85.7

70.3 74.9 77.3 77.3 77.8 79.5 80.8 79.7 77.5 79.7

88.6 89.6 90.8 90.6 90.6 90.8 90.6 92.2 90.3 91.2

73.5 73.0 71.8 73.5 74.7 73.2 79.7 73.7 74.1 80.9

90.2 91.5 91.0 90.7 91.2 91.3 92.3 91.0 90.9 91.9

87.2 85.9 85.9 83.9 87.5 86.0 88.4 87.2 86.4 88.6

81.0 80.8 83.7 82.8 82.8 85.0 85.3 84.2 83.8 84.8

94.9 95.3 95.3 94.8 95.7 95.7 96.1 95.8 95.6 96.2

87.4 65.2 71.6 69.1 66.3 73.2 77.2 74.4 66.8 70.7

86.7 93.0 94.1 94.2 94.9 94.8 95.3 95.2 94.8 94.9

78.1 81.2 81.3 80.9 81.1 83.3 84.2 81.0 81.3 82.3

51.8 57.9 58.7 53.1 63.5 51.0 64.2 63.0 59.6 61.0

69.9 72.8 76.4 72.9 74.5 75.0 80.0 76.3 75.7 75.9

80.3 80.6 82.6 83.0 82.6 81.8 83.0 83.2 82.8 84.2

ME-NAS ME-NAS ME-NAS ME-NAS ME-NAS

83.1 83.5 83.7 83.9 84.1

85.9 86.1 86.3 86.5 86.6

84.1 84.5 85.0 85.1 85.3

84.4 85.3 85.6 86.0 86.2

88.0 88.6 87.3 88.3 88.6

79.2 80.2 80.6 80.4 80.5

91.1 91.0 91.2 91.2 91.5

74.7 76.1 74.3 77.6 77.8

91.5 92.2 92.1 91.3 91.7

88.2 88.4 88.6 88.6 88.7

85.7 85.7 86.0 85.4 86.0

96.0 95.4 96.1 96.1 96.0

69.8 71.0 72.1 71.6 71.8

94.0 94.3 94.2 94.6 94.4

82.0 82.4 82.3 82.2 82.3

59.5 60.5 61.2 62.1 62.6

76.2 76.2 74.6 75.4 75.4

83.3 82.9 83.1 83.6 83.3

4.3

(synflow+naswot)-Tiny (synflow+naswot)-Small (synflow+naswot)-Middle (synflow+naswot)-Large (synflow+naswot)-Huge

3D Point Cloud Part Segmentation

Comparisons with Other Works. 3D point cloud part segmentation is a more challenging task. We compare the results reported by ME-NAS (synflow+naswot) with other studies on the ShapeNetPart dataset in Table 4; the proposed ME-NAS (synflow+naswot) can also surpass existing methods.



Fig. 6. Segmentation results on the ShapeNetPart dataset. We select 8 categories for visualization.

Visualization. Finally, we report the visualization results of ME-NAS (synflow+naswot)-Tiny on the ShapeNetPart dataset. As shown in Fig. 6, our model can achieve results similar to ground truth annotations on multiple categories.

5

Conclusions

In this paper, we propose a novel training-free neural architecture search algorithm for 3D point cloud processing. Our method can flexibly adjust the computation constraint of the searched structures to meet the needs of various devices. Experiments on 3D point cloud classification and part segmentation show that our method can achieve competitive results under different computation constraints, which will promote the development of downstream tasks. Furthermore, our method is highly scalable, future research can add new cells to the search space or come up with new training-free proxies for a better performance. Acknowledgements. The work is partially supported by the National Nature Science Foundation of China (No. 61976160, 61906137, 61976158, 62076184, 62076182) and Shanghai Science and Technology Plan Project (No.21DZ1204800) and Technology research plan project of Ministry of Public and Security (Grant No.2020JSYJD01).



References 1. Abdelfattah, M.S., Mehrotra, A., Dudziak, L  ., Lane, N.D.: Zero-cost proxies for lightweight NAS. arXiv preprint arXiv:2101.08134 (2021) 2. Atzmon, M., Maron, H., Lipman, Y.: Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091 (2018) 3. Elsken, T., Metzen, J.H., Hutter, F.: Efficient multi-objective neural architecture search via Lamarckian evolution. arXiv preprint arXiv:1804.09081 (2018) 4. Hanin, B., Rolnick, D.: Complexity of linear regions in deep networks. In: International Conference on Machine Learning, pp. 2596–2604. PMLR (2019) 5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 6. Klokov, R., Lempitsky, V.: Escape from cells: Deep KD-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 863–872 (2017) 7. Lee, N., Ajanthan, T., Torr, P.H.: Snip: single-shot network pruning based on connection sensitivity. arXiv preprint arXiv:1810.02340 (2018) 8. Li, G., Qian, G., Delgadillo, I.C., Muller, M., Thabet, A., Ghanem, B.: SGAS: sequential greedy architecture search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1620–1630 (2020) 9. Li, G., Xu, M., Giancola, S., Thabet, A., Ghanem, B.: LC-NAS: latency constrained neural architecture search for point cloud networks. arXiv preprint arXiv:2008.10309 (2020) 10. Li, J., Chen, B.M., Lee, G.H.: SO-Net: self-organizing network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9397–9406 (2018) 11. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on X-transformed points, vol. 31 (2018) 12. Liang, M., Yang, B., Wang, S., Urtasun, R.: Deep continuous fusion for multi-sensor 3D object detection. In: Proceedings of the European Conference on Computer Vision, pp. 641–656 (2018) 13. Lin, M., et al.: Zen-NAS: a zero-shot NAS for high-performance image recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 347–356 (2021) 14. Lin, X., Chen, K., Jia, K.: Object point cloud classification via poly-convolutional architecture search. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 807–815 (2021) 15. Lin, Z.H., Huang, S.Y., Wang, Y.C.F.: Convolution in the cloud: learning deformable kernels in 3d graph convolution networks for point cloud analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1800–1809 (2020) 16. Mellor, J., Turner, J., Storkey, A., Crowley, E.J.: Neural architecture search without training. In: International Conference on Machine Learning, pp. 7588–7598 (2021) 17. Nie, X., et al.: Differentiable convolution search for point cloud processing. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021) 18. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)



19. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space, vol. 30 (2017) 20. Ran, H., Zhuo, W., Liu, J., Lu, L.: Learning inner-group relations on point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15477–15487 (2021) 21. Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Aging evolution for image classifier architecture search. In: AAAI Conference on Artificial Intelligence, vol. 3 (2019) 22. Rusu, R.B., Marton, Z.C., Blodow, N., Dolha, M., Beetz, M.: Towards 3D point cloud based object maps for household environments. Robot. Auton. Syst. 56(11), 927–941 (2008) 23. Tanaka, H., Kunin, D., Yamins, D.L., Ganguli, S.: Pruning neural networks without any data by iteratively conserving synaptic flow, vol. 33, pp. 6377–6389 (2020) 24. Tang, H., et al.: Searching efficient 3D architectures with sparse point-voxel convolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 685–702. Springer, Cham (2020). https://doi.org/10.1007/ 978-3-030-58604-1 41 25. Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: flexible and deformable convolution for point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6411–6420 (2019) 26. Wang, C., Zhang, G., Grosse, R.: Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376 (2020) 27. Wang, S., Suo, S., Ma, W.C., Pokrovsky, A., Urtasun, R.: Deep parametric continuous convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2589–2597 (2018) 28. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 38(5), 1–12 (2019) 29. Wu, Z., et al.: 3D shapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912–1920 (2015) 30. Xiang, T., Zhang, C., Song, Y., Yu, J., Cai, W.: Walk in the cloud: learning curves for point clouds shape analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 915–924 (2021) 31. Xiong, H., Huang, L., Yu, M., Liu, L., Zhu, F., Shao, L.: On the number of linear regions of convolutional neural networks. In: International Conference on Machine Learning, pp. 10514–10523. PMLR (2020) 32. Yan, X., Zheng, C., Li, Z., Wang, S., Cui, S.: PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5589–5598 (2020) 33. Yi, L., et al.: A scalable active framework for region annotation in 3D shape collections. ACM Trans. Graph. 35(6), 1–12 (2016) 34. Zhong, Z., Yan, J., Wu, W., Shao, J., Liu, C.L.: Practical block-wise neural network architecture generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2423–2432 (2018) 35. Zhou, H., Feng, Y., Fang, M., Wei, M., Qin, J., Lu, T.: Adaptive graph convolution for point cloud analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4965–4974 (2021) 36. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710 (2018)

Re-parameterization Making GC-Net-Style 3DConvNets More Efficient

Takeshi Endo1(B), Seigo Kaji1, Haruki Matono1, Masayuki Takemura2, and Takeshi Shima2

Hitachi, Ltd., Research & Development Group, Chiyoda City, Japan {takeshi.endo.cw,seigo.kaji.vc,haruki.matono.dm}@hitachi.com 2 Hitachi Astemo, Ltd., Sunnyvale, USA {masayuki.takemura.gx,takeshi.shima.rb}@hitachiastemo.com

Abstract. For depth estimation using a stereo pair, deep learning methods using 3D convolution have been proposed. While the estimation accuracy is high, 3D convolutions on cost volumes are computationally expensive. Hence, we propose a method to reduce the computational cost of 3D convolution-based disparity networks. We apply kernel reparameterization, which is used for constructing efficient backbones, to disparity estimation. We convert learned parameters, and these values are used for inference to reduce the computational cost of filtering cost volumes. Experimental results on the KITTI 2015 dataset show that our method can reduce the computational cost by 31–61% from those of trained models without any performance loss. Our method can be used for any disparity network that uses 3D convolution for cost volume filtering.

Keywords: Stereo matching · Re-parameterization · Efficient architecture

1 Introduction

Depth estimation using a stereo pair is a core problem in computer vision and is applicable to autonomous driving. The depth can be obtained by calculating the disparity of a stereo pair, where the disparity d is the horizontal displacement between a pair of corresponding pixels in the left and right images of the pair. If the position (x, y) in the left image corresponds to the position (x − d, y) in the right image, then we can calculate the depth as fB/d, where f is the camera's focal length and B is the distance between the left and right cameras. In recent years, deep learning has become the mainstream method for disparity estimation. Since DispNetC [12] was proposed, many methods have directly estimated disparities from stereo pairs. Specifically, DispNetC computes the correlation between left and right features to construct a cost volume which represents the matching cost. Then, it filters the cost volume with 2D convolutions to aggregate the cost. This method is efficient, but it loses rich channel information. On the other hand, GC-Net [7] forms a concatenated cost volume to keep



channel information and aggregates the cost with 3D convolutions. This method offers improved accuracy by incorporating geometric and contextual information for disparity estimation. Subsequent 3D convolution-based methods have been proposed, such as PSMNet [2] and GWCNet [5]. These methods can easily be implemented on existing frameworks such as TensorFlow [1] because they have few network-specific operations. They can also be implemented on custom hardware, such as ASICs and FPGAs, with little additional module development. However, these methods filter the cost for each disparity d, which is computationally expensive. CRL [14] and HitNet [15] were thus proposed to speed up disparity estimation. While these 2D convolution-based approaches enable fast disparity estimation, they involve network-specific operations such as warping of feature maps to refine disparities. Accordingly, their implementation on custom hardware requires the development of new additional modules, which increases the implementation cost. In this work, we aim to reduce the computational cost of 3D convolutionbased methods because they can easily be implemented in various environments. For these methods, the most computationally expensive layer in the model is the first 3D convolution for the cost volume. Hence, we propose a method to re-parameterize the learned 3D convolution kernels for the cost volume and then use them during inference. Figure 1 gives an overview of the proposed method. Our method reduces the computational cost without any performance loss, in contrast to network compression methods such as pruning [9], which degrade performance. Our method can easily be applied to any disparity network that uses 3D convolutions for cost volume filtering. Our contributions are summarized below. – We propose a re-parameterization method that can reduce the computational cost of any 3D convolution-based network for disparity estimation. – We show experimentally that our method is effective for the networks of various model sizes.

2 2.1

Related Work Disparity Estimation

Disparity estimation methods based on deep learning can be divided into two categories, depending on whether they are based on 2D or 3D convolution. 2D convolution-based methods compute the correlation between left and right feature maps to generate a cost volume, which is aggregated using 2D convolutions. In general, 2D methods are computationally inexpensive but have lower performance than 3D methods. Accordingly, structures that refine disparity maps have been proposed to achieve a favorable speed-accuracy tradeoff [14,15]. In CRL [14] and HitNet [15], feature maps are warped to calculate error information, and the disparity map is updated by using that information. As these methods include network-specific operations, running them on custom hardware requires support for those unique operations.



Fig. 1. Overview of the proposed method. (top) The 2D encoder extracts features by 2D convolution. Left and right features are concatenated to form the cost volume, which is then aggregated through the 3D encoder and decoder. Finally, a soft-argmin regresses the disparity. (bottom) Training and inference have different structures for computing the initial cost. During training, the initial cost is formed by 3D convolution on the cost volume. On the other hand, during inference, two 2D convolutions are used and these output values are added to form the initial cost.

In contrast, 3D convolution-based methods concatenate left and right feature maps to form a cost volume, which is filtered by using 3D convolutions to aggregate the cost. GC-Net [7] was the first such 3D method. It aggregates the cost with 3D convolutions and regresses the disparity map with a soft-argmin process. PSMNet [2] uses a spatial pyramid module for feature extraction, which enables the cost volume to incorporate multi-scale information. The problem with GCNet and PSMNet is the increased computational cost due to the 3D convolutions. As alternatives, a method forming a low-resolution cost volume [8] and a method using group-wise correlation to construct a cost volume [5] were proposed. These network architectures of 3D methods are simpler than those of 2D methods. In addition, their architectures comprise generic operations that are used in other networks. On the other hand, the 3D methods are more computationally expensive than the 2D methods. Here, we aim to reduce the computational cost of 3D convolution-based methods from the viewpoint of ease of implementation. 2.2

Kernel Re-parameterization

To reduce the computational cost, we convert the learned parameters and use the converted values for inference. This idea is similar to research on kernel re-parameterization. [16] parameterizes a kernel as $\hat{W} = \mathrm{diag}(a)I + \mathrm{diag}(b)W_{\mathrm{norm}}$, where $W_{\mathrm{norm}}$ is the normalized weight, and $a$ and $b$ are learnable vectors. That model uses implicit skip-connections and has the same structure during learning and inference, whereas we use different model structures during training and inference. Our approach is closely related to ExpandNet [4] and RepVGG [3].
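The algebra behind such re-parameterizations is simply that consecutive linear maps with no nonlinearity in between collapse into a single linear map. A toy check (not from the paper) in PyTorch:

```python
import torch

# y = B (A x) = (B A) x: two stacked linear layers without an activation are one layer.
A = torch.randn(64, 32)
B = torch.randn(16, 64)
x = torch.randn(32)
diff = (B @ (A @ x) - (B @ A) @ x).abs().max()
print(diff)   # on the order of float32 rounding error, i.e. the two forms are equivalent
```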



ExpandNet converts a block consisting of multiple convolutions with a single convolution layer. This is based on the idea that multiple linear layers can be fused when there is no nonlinear function. RepVGG is trained with a multibranch model and uses its parameters in a single-path model during inference. The difference between our method and RepVGG is in the model structures before and after re-parameterization: we train a single-path model and use its parameters in a two-branch inference model. The type of convolution is also different, as RepVGG re-parameterizes 2D convolution kernels, as opposed to 3D convolution kernels in our approach. To the best of our knowledge, this is the first work to apply re-parameterization to disparity networks. 2.3

Network Compression

Network compression methods such as pruning also reduce the computational cost. [9] proposes a structured pruning method. They remove filters that are identified as having a small effect on the output accuracy, and then finetune the network. Another pruning method [11] forces channel-level sparsity and leverages learnable batch normalization parameters for pruning. [10] explores the rank of feature maps and measures the information richness of feature maps by using the rank. All of these network compression methods change the network outputs before and after compression. In contrast, our method matches the outputs before and after re-parameterization, which places it along a different research direction from network compression.

3

Re-parameterization for 3DConvNets

In this section, we first describe the architecture of 3D convolution-based networks such as GC-Net. Then, we describe our re-parameterization approach in detail, followed by the implementation tricks required for re-parameterization. 3.1

GC-Net-Style 3DConvNet

We first describe a general 3DConvNet (3D Convolution-based Network) as represented by GC-Net [7]. The model shown in the upper part of Fig. 1 represents the basic structure of a 3DConvNet. A stereo pair of left and right images is input, and shared 2D convolutions are used to extract the left feature map $f^{l}$ and the right feature map $f^{r}$. Given $f^{l}$ and $f^{r}$, the cost volume is computed as

$C_{\mathrm{concat}}(d, x, y) = \mathrm{Concat}\{f^{l}(x, y),\ f^{r}(x - d, y)\},$   (1)


where d denotes the disparity level. We omit the channel dimension here for simplicity. The cost volume Cconcat is formed from f l and f r shifted relative to f l . The 3D encoder aggregates the cost with 3×3×3 3D convolutions, while the 3D decoder uses 3×3×3 3D deconvolutions to upsample the aggregated cost. The disparity map is regressed by a soft-argmin process [7].
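A minimal PyTorch sketch of Eq. (1) and the soft-argmin regression at the feature resolution; the (N, C, D, H, W) tensor layout and the zero filling of out-of-view pixels are assumptions of this sketch, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def build_concat_cost_volume(feat_l, feat_r, max_disp):
    """Eq. (1): C_concat(d, x, y) = Concat{f_l(x, y), f_r(x - d, y)}.
    feat_l, feat_r: (N, F, H, W) feature maps from the shared 2D encoder.
    Returns a cost volume of shape (N, 2F, max_disp, H, W)."""
    n, c, h, w = feat_l.shape
    cost = feat_l.new_zeros(n, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        cost[:, :c, d, :, :] = feat_l
        if d == 0:
            cost[:, c:, d, :, :] = feat_r
        else:
            # shift the right features by d pixels; out-of-view columns stay zero
            cost[:, c:, d, :, d:] = feat_r[:, :, :, :-d]
    return cost

def soft_argmin_disparity(cost):
    """Soft-argmin over aggregated costs of shape (N, D, H, W): lower cost, higher weight."""
    probs = F.softmax(-cost, dim=1)
    disp_values = torch.arange(cost.shape[1], device=cost.device, dtype=probs.dtype)
    return (probs * disp_values.view(1, -1, 1, 1)).sum(dim=1)    # (N, H, W)
```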



Table 1 lists the details of the model structures to which our method applies. H and W denote the height and width of an input image, respectively. F denotes the number of channels for the feature map, and D indicates the maximum disparity to be estimated. The structures described in Table 1 are almost the same as those in GC-Net: the only difference is that we add a Stem2 layer to the 2D encoder to adjust the model’s computational cost. In our experiments, we varied the stride values of the layers to build models having different computational costs. The layers of the 3D encoder are denoted as follows. The output of the first convolution on the cost volume Cconcat is denoted as Cinitial , because it represents the initial matching cost. The layers that correspond to Cinitial in Table 1 are the InitialCost1 and InitialCost2 layers of the 3D encoder. Next, we denote the output of the 3D encoder’s subsequent convolutions as Caggregated , because it represents the aggregated cost. The computational cost of 3D convolution is proportional to the output size, which is similar to the property of 2D convolution. In the 3D encoder described in Table 1, the bottom layers such as InitialCost1 are computationally expensive. Hence, in this work, we propose a re-parameterization method to reduce the computational cost of constructing the initial cost Cinitial . 3.2

Re-parameterization for Efficient Inference

In our approach, we re-parameterize the 3×3×3 3D convolution kernels used to construct the initial cost Cinitial . Specifically, the 3D kernels are converted to 2D convolution kernels. The initial cost Cinitial is computed as follows,  W3d (l, m, n) · Cconcat (d + l, x + m, y + n), (2) Cinitial (d, x, y) = −1≤l,m,n≤1

where W3d is the 3D convolution kernel. From (1), (2) is equivalent to the following:  W3d,l (l, m, n) · f l (x + m, y + n) Cinitial (d, x, y) = −1≤l,m,n≤1

+



W3d,r (l, m, n) · f r (x − d + m − l, y + n), (3)

−1≤l,m,n≤1

where W3d,l and W3d,r are 3D convolution kernels and parts of W3d . W3d,l and W3d,r are used for filtering f l and f r , respectively. If the number of channels in f l and f r is F, then W3d has 2F channels. The kernel corresponding to channels 0 · · · (F − 1) is W3d,l , and the one corresponding to channels F · · · (2F − 1) is W3d,r . The first term in (3) is the operation on the left feature map f l and the second term is the operation on the right feature map f r . In the first term of (3), the distributive law holds. Therefore, W3d,l can be fused in the disparity direction l. This yields the following equation for the first


Table 1. Summary of our 3DConvNet architecture, which has a 2D encoder, cost volume layer, 3D encoder, and 3D decoder. Each convolutional layer comprises convolution, batch normalization, and ReLU activation. The layer properties consist of the kernel size, convolution type, number of kernels, and stride, in this order.

(a) 2D encoder
Name      | Layer Property                              | Output Size
Stem1     | 5×5, conv2d, 32, 2                          | H/2 × W/2 × F
Stem2     | 3×3, conv2d, 32, 1                          | H/2 × W/2 × F
Resblocks | [(3×3, conv2d, 32, 1)×2, skip connect] ×8   | H/2 × W/2 × F
Conv1     | 3×3, conv2d, 32, 1 (no ReLU and BN)         | H/2 × W/2 × F

(b) Cost volume layer and 3D encoder
Name         | Layer Property                           | Output Size
ConcatCost   | Concat left & right features             | D/2 × H/2 × W/2 × 2F
InitialCost1 | From ConcatCost: 3×3×3, conv3d, 32, 1    | D/2 × H/2 × W/2 × F
AggCost1     | 3×3×3, conv3d, 32, 1                     | D/2 × H/2 × W/2 × F
InitialCost2 | From ConcatCost: 3×3×3, conv3d, 64, 2    | D/4 × H/4 × W/4 × 2F
AggCost2-1   | (3×3×3, conv3d, 64, 1)×2                 | D/4 × H/4 × W/4 × 2F
AggCost3-1   | From InitialCost2: 3×3×3, conv3d, 64, 2  | D/8 × H/8 × W/8 × 2F
AggCost3-2   | (3×3×3, conv3d, 64, 1)×2                 | D/8 × H/8 × W/8 × 2F
AggCost4-1   | From AggCost3-1: 3×3×3, conv3d, 64, 2    | D/16 × H/16 × W/16 × 2F
AggCost4-2   | (3×3×3, conv3d, 64, 1)×2                 | D/16 × H/16 × W/16 × 2F
AggCost5-1   | From AggCost4-1: 3×3×3, conv3d, 128, 2   | D/32 × H/32 × W/32 × 4F
AggCost5-2   | (3×3×3, conv3d, 128, 1)×2                | D/32 × H/32 × W/32 × 4F

(c) 3D decoder
Name      | Layer Property                                          | Output Size
AggCost6  | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost4-2)  | D/16 × H/16 × W/16 × 2F
AggCost7  | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost3-2)  | D/8 × H/8 × W/8 × 2F
AggCost8  | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost2-1)  | D/4 × H/4 × W/4 × 2F
AggCost9  | 3×3×3, deconv3d, 32, 2, skip connect (from AggCost1)    | D/2 × H/2 × W/2 × F
AggCost10 | 3×3×3, deconv3d, 1, 2 (no ReLU and BN)                  | D × H × W × 1

This yields the following equation for the first term of (3):

$$\sum_{-1 \le l,m,n \le 1} W_{3d,l}(l, m, n) \cdot f_l(x + m, y + n) = \sum_{-1 \le m,n \le 1} W_{2d,l}(m, n) \cdot f_l(x + m, y + n), \tag{4}$$

where W_{2d,l}(m, n) is defined as

$$W_{2d,l}(m, n) = \sum_{-1 \le l \le 1} W_{3d,l}(l, m, n). \tag{5}$$

Next, for the second term in (3), we let the 3×5 2D kernel W_{2d,r} be given by

$$W_{2d,r}(m, n) = \sum_{-1 \le l \le 1} \tilde{W}_{3d,r}(l, m + l, n), \tag{6}$$

where the zero-padded kernel $\tilde{W}_{3d,r}$ is defined as

$$\tilde{W}_{3d,r}(l, m, n) = \begin{cases} W_{3d,r}(l, m, n), & -1 \le m \le 1 \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$


Substituting (6) into the second term in (3), we obtain the following equation:

$$\sum_{-1 \le l,m,n \le 1} W_{3d,r}(l, m, n) \cdot f_r(x - d + m - l, y + n) = \sum_{\substack{-2 \le m \le 2 \\ -1 \le n \le 1}} W_{2d,r}(m, n) \cdot f_r(x - d + m, y + n). \tag{8}$$

From (4) and (8), C_initial(d, x, y) is given by

$$C_{initial}(d, x, y) = \sum_{-1 \le m,n \le 1} W_{2d,l}(m, n) \cdot f_l(x + m, y + n) + \sum_{\substack{-2 \le m \le 2 \\ -1 \le n \le 1}} W_{2d,r}(m, n) \cdot f_r(x - d + m, y + n). \tag{9}$$

Convolution is not affected by translation. As the order of shift and convolution operations can be interchanged, we have

$$W * \mathrm{Shift}(f, d) = \mathrm{Shift}(W * f, d), \tag{10}$$

where Shift(X, d) denotes shifting a tensor X by d pixels. Hence, we derive the following:

$$C_{initial}(d) = W_{2d,l} * f_l + \mathrm{Shift}(W_{2d,r} * f_r, d), \tag{11}$$

where * is the convolution operator. Equation (11) indicates that the initial cost C_initial can be computed only from 2D convolutions, without 3D convolutions. Moreover, we only need to calculate W_{2d,l} * f_l and W_{2d,r} * f_r once to form C_initial. Hence, we compute C_initial as shown in Fig. 1: we first filter the left feature map f_l and the right feature map f_r with the respective 2D convolution kernels W_{2d,l} and W_{2d,r}; the filtered right feature map is then shifted and added to the filtered left feature map, which constructs the initial cost C_initial.

Concretely, we re-parameterize the trained model's 3D convolution kernels into 2D convolution kernels, as shown in Fig. 2. The 3D kernels are decomposed to filter the left and right feature maps. As seen in (5), the kernels for the left feature map are re-parameterized by element addition, while those for the right feature map are re-parameterized according to (6) and (7). The re-parameterized 2D convolution kernels are then used for inference.

The re-parameterized 2D convolutions reduce the computational cost. We measure the computational cost in terms of FLOPs, the number of multiply-adds required for convolutions. For the model described in Table 1, the computation of the InitialCost1 layer requires $3^3 \cdot \frac{D}{2} \cdot \frac{W}{2} \cdot \frac{H}{2} \cdot 2F^2$ FLOPs. On the other hand, the re-parameterized model requires $3^2 \cdot \frac{W}{2} \cdot \frac{H}{2} \cdot F^2 + 3 \cdot 5 \cdot \frac{W}{2} \cdot \frac{H}{2} \cdot F^2$ FLOPs. Thus, the computational cost is reduced to $\frac{8}{9D}$ of the original. As D is 192 in general, the layer can be computed with 0.5% of the original computational cost in FLOPs.


Fig. 2. Kernel re-parameterization, in which 3D convolution kernels are converted into 2D kernels. Each 3D kernel is decomposed into left and right kernels. The left kernels corresponding to each disparity are fused by elementwise addition, while the right kernels corresponding to each disparity are fused by a shift operation and elementwise addition. The re-parameterized 2D kernels are then used for inference.
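To make the kernel fusion of Fig. 2 and Eqs. (5)-(7) concrete, the following is a hypothetical PyTorch sketch (the weight layout (C_out, 2F, l, n, m) and the function name are illustrative assumptions, not the authors' code). It folds a trained 3×3×3 kernel into the 3×3 left kernel of Eq. (5) and the 3×5 right kernel of Eqs. (6)-(7).

import torch

def reparameterize_initial_cost_kernel(w3d, feat_channels):
    # w3d: (C_out, 2F, 3, 3, 3) trained 3D kernel applied to the concatenated
    # cost volume; kernel axes are assumed to be ordered (l, n, m) =
    # (disparity, y, x). This layout is an assumption for illustration.
    F_ch = feat_channels
    w3d_l = w3d[:, :F_ch]            # part of W3d applied to the left features
    w3d_r = w3d[:, F_ch:]            # part of W3d applied to the right features

    # Eq. (5): fuse the left kernel over the disparity axis l by summation.
    w2d_left = w3d_l.sum(dim=2)      # (C_out, F, 3, 3)

    # Eqs. (6)-(7): build the 3x5 right kernel by zero-padding along x and
    # accumulating each disparity slice l at the shifted position m - l.
    c_out = w3d_r.shape[0]
    w2d_right = torch.zeros(c_out, F_ch, 3, 5, dtype=w3d.dtype, device=w3d.device)
    for i, l in enumerate((-1, 0, 1)):
        w2d_right[:, :, :, 1 - l:4 - l] += w3d_r[:, :, i]
    return w2d_left, w2d_right

At inference, w2d_left would be applied to f_l as a 3×3 2D convolution and w2d_right to f_r as a 3×5 convolution, matching Eq. (11).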

3.3 Implementation Tricks

Because of the effect of padding, the outputs before and after re-parameterization do not match. Hence, we describe two implementation tricks to avoid this issue, namely, left image cropping and initial cost cropping. Because these tricks reduce the output size of the disparity map, our approach cannot estimate disparities at the left and right sides of an image. In a real application, however, disparity estimation is typically performed on a partial region of interest; accordingly, we assume that disparity calculations at the left and right sides are not necessarily required.

Left Image Cropping. To construct the cost volume C_concat, the left and (shifted) right feature maps are concatenated in the channel direction, as seen in (1). The left side of the right feature map is padded with zeros when the feature map is shifted to the right, and the number of padded columns equals the number of shifts; the same number of columns is also padded in the left feature map. Thus, the left feature map f_l is padded differently at each disparity level d, and the same is true for the right feature map f_r. In our method, however, f_l must not be padded differently at different disparity levels d, because W_{2d,l} * f_l is reused for constructing the initial cost C_initial(d) at every disparity level; the same holds for f_r. To satisfy these conditions, we use stereo pairs with different widths. Given the shift of the right feature map, the width of the right image is set to be D pixels larger than that of the left image; that is, we crop the left image so that its width is W_l = W_r − D, where W_r is the width of the right image. This results in left and right feature maps of different widths, so we crop the right feature map f_r to the same width as the left feature map f_l before concatenating them in the channel direction.
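As a companion to the cropping trick above, the following hypothetical sketch builds the initial cost at inference time purely by shifting and adding the 2D-filtered feature maps (Eq. (11)); the indexing convention for the wider right feature map is an assumption made for illustration.

import torch

def build_initial_cost(left_filt, right_filt, max_disp):
    # left_filt : (B, C, H, W)            features filtered with W2d,l
    # right_filt: (B, C, H, W + max_disp) features filtered with W2d,r; the
    #             right view is assumed to be max_disp columns wider, per the
    #             "left image cropping" trick (indexing convention assumed)
    _, _, _, w = left_filt.shape
    slices = []
    for d in range(max_disp):
        shifted = right_filt[:, :, :, max_disp - d: max_disp - d + w]
        slices.append(left_filt + shifted)   # shift-and-add, no 3D convolution
    return torch.stack(slices, dim=2)        # (B, C, max_disp, H, W)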


Initial Cost Cropping. In general, a cost volume is padded with zeros in the spatial and disparity directions when a 3D convolution is performed on it. Accordingly, we must handle the effect of padding in the x and d directions so that the outputs before and after re-parameterization match. During training, we use the 3D convolution to filter the cost volume; the convolution outputs at positions x = 0, W and d = 0, D are affected by padding. During inference, we use the shift operation and elementwise addition to construct the initial cost, as seen in (11). Because these operations lead to a different padding effect, we crop the cost volume to eliminate pixels that do not match between training and inference, as shown in Fig. 3. During training, we first crop the left and right sides of the left feature map f_l; this operation is necessary to keep the original disparity range (d = 0 ... D). Then, the left and right feature maps are concatenated, and the 3D convolution is performed to construct the initial cost. Finally, we crop the initial cost in the x and d directions to remove the pixels that are affected by padding. During inference, we use a similar procedure: first, the left feature map is cropped; then, the re-parameterized 2D convolutions are performed on the left and right feature maps, and the shift operation and elementwise addition are used to construct the initial cost; finally, we crop the initial cost in the x and d directions.

3.4 Disparity Regression

We use the disparity regression method proposed in [7] to estimate disparities. The soft-argmin operation enables us to estimate disparities with sub-pixel accuracy. The probability of each disparity level d is used for estimation: the negative of the predicted cost c_d is converted to a probability via the softmax operation, and the predicted disparity d̂ is then calculated as the sum of each disparity level d weighted by its probability:

$$\hat{d} = \sum_{d=0}^{D} d \times \sigma(-c_d), \tag{12}$$

where σ denotes the softmax operation.
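For illustration, a minimal sketch of this soft-argmin regression, assuming the cost tensor stores the D+1 disparity levels along its second axis:

import torch

def soft_argmin(cost):
    # cost: (B, D+1, H, W) predicted matching cost for disparity levels 0..D
    prob = torch.softmax(-cost, dim=1)                       # sigma(-c_d)
    levels = torch.arange(cost.shape[1], dtype=cost.dtype,
                          device=cost.device).view(1, -1, 1, 1)
    return (prob * levels).sum(dim=1)                        # sub-pixel disparity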

3.5 Loss Function

We use the loss function proposed in [7], which is defined as follows:

$$\mathrm{Loss} = \frac{1}{N} \sum_{n=0}^{N} \left\| \hat{d} - d \right\|_{1}, \tag{13}$$

where N is the number of labeled pixels, d̂ is the predicted disparity, and d is the ground-truth disparity.



Fig. 3. Cost volume cropping. (a) During training, the left and right sides of the left feature map are cropped. Then, the initial cost is constructed by a 3D convolution, and it is finally cropped in the x and d directions. (b) During inference, the cropping is almost the same as during training; the difference is the way to construct the initial cost.

4 Experiments

In this section, we describe our experimental conditions and our results on the KITTI 2015 dataset [13]. In these experiments, we compared several models having different computational costs, and we performed ablation studies to further evaluate our method.

4.1 Experimental Conditions

During training, we used the Monkaa dataset [12] for pre-training and the KITTI 2015 dataset [13] for fine-tuning. For evaluation, we used the KITTI 2015 dataset. As we evaluated several models, we did not use the official KITTI 2015 test set; instead, we split the original 200 training images into 160 training images and 40 evaluation images. For training on these datasets, we cropped the left and right images to input sizes of 256 × 512 and 256 × 704, respectively. We set D = 192 and F = 32. RMSprop [6] was used as the optimizer, with a learning rate of 1 × 10⁻³. These settings are the same as in [7]. For evaluation, we padded zeros at the top and on the right side of each image; the input sizes were 384 × 1056 for left images and 384 × 1248 for right images.


Table 2. Model structures of the (a) 3DConvNet-B1, (b) 3DConvNet-B0, and (c) Rep3DConvNet-B0 models.

(a) 3DConvNet-B1
Name         | Layer Property                                          | Output Size
-- 3D Encoder --
InitialCost1 | From ConcatCost: 3×3×3, conv3d, 64, 1                   | D/4 × H/4 × W/4 × 2F
AggCost1     | 3×3×3, conv3d, 64, 1                                    | D/4 × H/4 × W/4 × 2F
InitialCost2 | From ConcatCost: 3×3×3, conv3d, 64, 2                   | D/8 × H/8 × W/8 × 2F
AggCost2-1   | (3×3×3, conv3d, 64, 1)×2                                | D/8 × H/8 × W/8 × 2F
AggCost3-1   | From InitialCost2: 3×3×3, conv3d, 64, 2                 | D/16 × H/16 × W/16 × 2F
AggCost3-2   | (3×3×3, conv3d, 64, 1)×2                                | D/16 × H/16 × W/16 × 2F
AggCost4-1   | From AggCost3-1: 3×3×3, conv3d, 128, 2                  | D/32 × H/32 × W/32 × 4F
AggCost4-2   | (3×3×3, conv3d, 128, 1)×2                               | D/32 × H/32 × W/32 × 4F
-- 3D Decoder --
AggCost5     | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost3-2)  | D/16 × H/16 × W/16 × 2F
AggCost6     | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost2-1)  | D/8 × H/8 × W/8 × 2F
AggCost7     | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost1)    | D/4 × H/4 × W/4 × 2F
AggCost8     | 3×3×3, deconv3d, 1, 2 (no ReLU and BN)                  | D/2 × H/2 × W/2 × 1

(b) 3DConvNet-B0
Name         | Layer Property                                          | Output Size
-- 3D Encoder --
InitialCost1 | From ConcatCost: 3×3×3, conv3d, 64, 1                   | D/4 × H/4 × W/4 × 2F
AggCost2-1   | From InitialCost1: 3×3×3, conv3d, 64, 2                 | D/8 × H/8 × W/8 × 2F
AggCost2-2   | (3×3×3, conv3d, 64, 1)×2                                | D/8 × H/8 × W/8 × 2F
AggCost3-1   | From AggCost2-1: 3×3×3, conv3d, 64, 2                   | D/16 × H/16 × W/16 × 2F
AggCost3-2   | (3×3×3, conv3d, 64, 1)×2                                | D/16 × H/16 × W/16 × 2F
-- 3D Decoder --
AggCost4     | 3×3×3, deconv3d, 64, 2, skip connect (from AggCost2-2)  | D/8 × H/8 × W/8 × 2F
AggCost5     | 3×3×3, deconv3d, 1, 2 (no ReLU and BN)                  | D/4 × H/4 × W/4 × 1

(c) Rep3DConvNet-B0
Name         | Layer Property                                          | Output Size
Conv2D-L     | From 2D encoder's left feature: 3×3, conv2d, 64, 1      | H/4 × W/4 × 2F
Conv2D-R     | From 2D encoder's right feature: 3×5, conv2d, 64, 1     | H/4 × (W+D)/4 × 2F
InitialCost1 | For each disparity d: Conv2D-L + Shift(Conv2D-R, d)     | D/4 × H/4 × W/4 × 2F
3D Encoder   | AggCost* layers of the 3D encoder of 3DConvNet-B0       | (as in (b))
3D Decoder   | AggCost* layers of the 3D decoder of 3DConvNet-B0       | (as in (b))

4.2 Experimental Results

We trained four models with different computational costs. We refer to the model described in Sect. 3.1 as 3DConvNet-B3, and we created three other models with reduced computational costs: 3DConvNet-B2, 3DConvNet-B1, and 3DConvNet-B0. First, 3DConvNet-B2 limited the number of convolutions in the 3D encoder listed in Table 1: the AggCost1 layer was removed, and the number of convolution layers was reduced to one in the AggCost2-1, AggCost3-2, AggCost4-2, and AggCost5-2 layers. Next, 3DConvNet-B1 and 3DConvNet-B0 used a stride of two for the 2D encoder's Stem2 layer to reduce the spatial resolution. The resulting structures of 3DConvNet-B1 and 3DConvNet-B0 are given in Table 2. The structure of 3DConvNet-B1 was similar to that of 3DConvNet-B3, but the 3D encoder's input size and the 3D decoder's output size were different. 3DConvNet-B0 was the lightest model: its 3D encoder and 3D decoder were shallow, and the output disparity size was one-fourth of the input size.

As described in Sect. 3, the proposed method uses 3D convolutions to compute the initial cost during training; the 3D convolutions are re-parameterized, and the re-parameterized kernels are then used for inference. We refer to the re-parameterized model that uses 2D convolutions to compute the initial cost as Rep3DConvNet. In Table 2(c), we show the structure of Rep3DConvNet-B0; the difference from 3DConvNet-B0 lies in the initial cost construction process. As described in Sect. 3.3, our method cannot estimate disparities at the left and right sides of an image.


Table 3. Experimental results on 40 evaluation images from the KITTI 2015 dataset. The abbreviations 'bg' and 'fg' refer to the results on background and dynamic object pixels, respectively. Each model was evaluated before and after re-parameterization. The column 'FLOPs' gives the computational cost of the entire network, while the column 'FLOPs†' shows the cost for the initial cost construction. The runtime and memory consumption were measured on an NVIDIA Tesla V100 GPU (16 GB).

Model           | Inference structure | D1-bg (%) | D1-fg (%) | D1-all (%) | FLOPs (T) | FLOPs† (G) | Runtime (s) | Memory (GiB)
3DConvNet-B0    | 3D conv             | 3.19      | 9.15      | 4.01       | 0.23      | 141.9      | 0.059       | 3.1
Rep3DConvNet-B0 | 2D conv             | 3.19      | 9.15      | 4.01       | 0.09      | 1.4        | 0.041       | 2.8
3DConvNet-B1    | 3D conv             | 2.58      | 7.76      | 3.30       | 0.52      | 160.8      | 0.102       | 4.9
Rep3DConvNet-B1 | 2D conv             | 2.58      | 7.76      | 3.30       | 0.36      | 1.7        | 0.077       | 3.1
3DConvNet-B2    | 3D conv             | 2.48      | 4.69      | 2.79       | 1.68      | 695.8      | 0.378       | 15.3
Rep3DConvNet-B2 | 2D conv             | 2.48      | 4.69      | 2.79       | 0.98      | 4.2        | 0.225       | 9.9
3DConvNet-B3    | 3D conv             | 2.20      | 5.18      | 2.62       | 2.10      | 695.8      | 0.462       | 15.3
Rep3DConvNet-B3 | 2D conv             | 2.20      | 5.18      | 2.62       | 1.41      | 4.2        | 0.311       | 10.4


Fig. 4. Example of qualitative results on KITTI 2015, showing (a) the input image, and the output results from (b) Rep3DConvNet-B3 and (c) Rep3DConvNet-B0. In (b) and (c), the top and bottom images are the disparity and error maps, respectively.

Thus, in our evaluation, the accuracy was measured for pixels excluding those at the left and right sides. For evaluation, we used the D1 error metric, which counts the pixels that satisfy |d̂ − d| > 3 pix or |d̂ − d|/d > 5%, for the ground-truth disparity d and estimated disparity d̂; better models have lower D1 error. Note that our models estimate disparity maps at different scales: if an estimated disparity map is smaller than the input image, we upsample the output by bilinear interpolation.

The evaluation results on the KITTI dataset are summarized in Table 3, which gives the results for 3DConvNet and Rep3DConvNet. The results show that our method could reduce the computational cost without any performance loss. In particular, the re-parameterization reduced the computational cost of the lightest model, 3DConvNet-B0, by 61%, and that of the largest model, 3DConvNet-B3, by 33%. The reduction rate depends on the computational cost of the initial cost construction. Overall, our experiment showed that the proposed method is effective for various models with different computational costs. We also measured the actual runtime and memory consumption; as seen in the table, the re-parameterization accelerated inference and reduced the memory footprint.
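A minimal sketch of the D1 computation as stated above (the 3-pixel/5% thresholds and the "or" condition follow the text; the mask selects labeled pixels):

import torch

def d1_error(disp_est, disp_gt, mask):
    # disp_est, disp_gt: (H, W) disparities; mask: (H, W) bool for labeled pixels
    err = (disp_est - disp_gt).abs()
    outlier = (err > 3.0) | (err / disp_gt.clamp(min=1e-6) > 0.05)
    return 100.0 * outlier[mask].float().mean()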


Table 4. Ablation study results. (a) D1 error for 3DConvNet-B3 in various training settings, where 'Trick1' and 'Trick2' denote left image cropping and initial cost cropping, respectively. (b) Accuracy comparison between 2DConvNet-B1, which is trained with the same structure as the re-parameterized model, and Rep3DConvNet-B1.

(a) Ablation 1
Trick1 | Trick2 | D1-bg (%) | D1-fg (%) | D1-all (%)
       |        | 2.59      | 7.43      | 3.26
✓      |        | 2.33      | 5.31      | 2.74
✓      | ✓      | 2.20      | 5.18      | 2.62

(b) Ablation 2
Model           | D1-bg (%) | D1-fg (%) | D1-all (%)
2DConvNet-B1    | 2.77      | 8.19      | 3.52
Rep3DConvNet-B1 | 2.58      | 7.76      | 3.30

Figure 4 shows an example of the output results for Rep3DConvNet-B0 and Rep3DConvNet-B3. Rep3DConvNet-B3 had more layers and a higher-resolution disparity map; thus, it could suppress depth mixing (i.e., erroneous blending of foreground and background disparities) better than Rep3DConvNet-B0 could. On the other hand, the overall disparity estimated by Rep3DConvNet-B0 was similar to that of Rep3DConvNet-B3, even though Rep3DConvNet-B0 had only 6% of the convolution operations of Rep3DConvNet-B3. The required accuracy depends on the target application of the disparity estimation; these results demonstrate that our method reduces the computational cost across a range of model sizes, from computationally expensive models down to much smaller ones.

4.3 Ablation Study

Table 4(a) lists the results of this ablation study, where 'Trick1' and 'Trick2' denote left image cropping and initial cost cropping, respectively. It can be seen that the performance was improved by using Trick1; this is because Trick1 eliminates occluded pixels, enabling the model to find better correspondences. We can also see that Trick2 did not degrade the performance. Thus, these tricks are unlikely to have a negative impact on accuracy.

Next, although we use different model structures during training and inference, it is also possible to train a network with the same structure as the re-parameterized model. We refer to the resulting 2D convolution-based model, which is not re-parameterized, as 2DConvNet. Table 4(b) lists the results of a comparison between 2DConvNet-B1 and Rep3DConvNet-B1, which shows that the re-parameterized model had better accuracy. In our re-parameterization process, the number of parameters during training is larger than during inference; this can be viewed as learning an over-parameterized model and then using a compact model for inference. In ExpandNets [4], over-parameterization stabilizes the gradients during training, and it is also known that the loss landscape becomes flatter, which gives better generalization performance. Accordingly, it is important to train models with 3D convolutions and then re-parameterize them for inference, as in our proposed method.

5 Conclusion

We have proposed a kernel re-parameterization method to reduce the computational cost of 3D convolution-based disparity networks. Our method re-parameterizes learned 3D convolution kernels and uses them as 2D convolution kernels. Our experimental results show that the proposed method can reduce the computational cost without degrading the trained model's performance; specifically, the computational cost was reduced by 31-61% for models of different sizes. In addition, the proposed method dramatically reduces the computational cost of constructing the initial cost, which makes it feasible to construct many initial costs. Hence, future work will be to develop a network structure in which our method is especially effective.

Acknowledgements. We want to thank Atsushi Yokoyama and Kumud Shishir for their helpful comments on this paper.

References

1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org (2015)
2. Chang, J., Chen, Y.: Pyramid stereo matching network. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410-5418 (2018)
3. Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J.: RepVGG: making VGG-style convnets great again. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13733-13742 (2021)
4. Guo, S., Alvarez, J., Salzmann, M.: ExpandNets: linear over-parameterization to train compact convolutional networks. In: Advances in Neural Information Processing Systems (NeurIPS) 33, pp. 1298-1310 (2020)
5. Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3273-3282 (2019)
6. Hinton, G., Srivastava, N., Swersky, K.: Neural networks for machine learning, lecture 6a: overview of mini-batch gradient descent (2012)
7. Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: The IEEE International Conference on Computer Vision (ICCV), pp. 66-75 (2017)
8. Khamis, S., Fanello, S., Rhemann, C., Kowdle, A., Valentin, J., Izadi, S.: StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 596-613. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_35
9. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)
10. Lin, M., et al.: HRank: filter pruning using high-rank feature map. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1529-1538 (2020)


11. Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: The IEEE International Conference on Computer Vision (ICCV), pp. 2736-2744 (2017)
12. Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4040-4048 (2016)
13. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3061-3070 (2015)
14. Pang, J., Sun, W., Ren, J.S., Yang, C., Yan, Q.: Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: The IEEE International Conference on Computer Vision (ICCV), pp. 887-895 (2017)
15. Tankovich, V., Hane, C., Zhang, Y., Kowdle, A., Fanello, S., Bouaziz, S.: HITNet: hierarchical iterative tile refinement network for real-time stereo matching. In: The IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14362-14372 (2021)
16. Zagoruyko, S., Komodakis, N.: DiracNets: training very deep neural networks without skip-connections. arXiv preprint arXiv:1706.00388 (2018)

PU-Transformer: Point Cloud Upsampling Transformer

Shi Qiu1,2(B), Saeed Anwar1,2, and Nick Barnes1

1 Australian National University, Canberra, Australia
{shi.qiu,saeed.anwar,nick.barnes}@anu.edu.au
2 Data61-CSIRO, Sydney, Australia

Abstract. Given the rapid development of 3D scanners, point clouds are becoming popular in AI-driven machines. However, point cloud data is inherently sparse and irregular, causing significant difficulties for machine perception. In this work, we focus on the point cloud upsampling task that intends to generate dense high-fidelity point clouds from sparse input data. Specifically, to activate the transformer's strong capability in representing features, we develop a new variant of a multi-head self-attention structure to enhance both point-wise and channel-wise relations of the feature map. In addition, we leverage a positional fusion block to comprehensively capture the local context of point cloud data, providing more position-related information about the scattered points. As the first transformer model introduced for point cloud upsampling, we demonstrate the outstanding performance of our approach by comparing with the state-of-the-art CNN-based methods on different benchmarks quantitatively and qualitatively.

1 Introduction

3D computer vision has been attracting a wide range of interest from academia and industry, since it shows great potential in many fast-developing AI-related applications such as robotics, autonomous driving, and augmented reality. As a basic representation of 3D data, point clouds can be easily captured by 3D sensors [1,2], incorporating the rich context of real-world surroundings. Unlike well-structured 2D images, point cloud data has the inherent properties of irregularity and sparsity, posing enormous challenges for high-level vision tasks such as point cloud classification [3-5], segmentation [6-8], and object detection [9-11]. For instance, Uy et al. [12] fail to classify real-world point clouds when applying a model pre-trained on synthetic data, and recent 3D segmentation and detection networks [8,13,14] achieve worse results on distant/smaller objects (e.g., bicycles, traffic signs) than on closer/larger objects (e.g., vehicles, buildings). If we mitigate point cloud data's irregularity and sparsity, further improvements in visual analysis can be obtained (as verified in [15]). Thus, point cloud upsampling deserves a deeper investigation.


Fig. 1. The details of PU-Transformer. The upper chart shows the overall architecture of the PU-Transformer model containing three main parts: the PU-Transformer head (Sect. 4.1), body (Sect. 4.2), and tail (Sect. 4.3). The PU-Transformer body includes a cascaded set of Transformer Encoders (e.g., L in total), serving as the core component of the whole model. In particular, the detailed structure of each Transformer Encoder is shown in the lower chart, where all annotations are consistent with Lines 3-5 in Algorithm 1.

As a basic 3D low-level vision task, point cloud upsampling aims to generate dense point clouds from sparse input, where the generated data should recover the fine-grained structures at a higher resolution. Moreover, the upsampled points are expected to lie on the underlying surfaces in a uniform distribution, benefiting downstream tasks for both 3D visual analysis [16,17] and graphic modeling [18,19]. Following the success of Convolutional Neural Networks (CNNs) in image super-resolution [20-22] and Multi-Layer Perceptrons (MLPs) in point cloud analysis [3,6], previous methods tended to upsample point clouds via complex network designs (e.g., Graph Convolutional Networks [23], Generative Adversarial Networks [24]) and dedicated upsampling strategies (e.g., progressive training [25], coarse-to-fine reconstruction [26], disentangled refinement [27]). In our view, these methods share one key to point cloud upsampling: learning representative features of the given points to estimate the distribution of new points. Considering that regular MLPs have limited expressive and generalization capability, we need a more powerful tool to extract fine-grained point feature representations for high-fidelity upsampling. To this end, we introduce a succinct transformer model, PU-Transformer, to effectively upsample point clouds following a simple pipeline as illustrated in Fig. 1. The main reasons for adopting transformers for point cloud upsampling are as follows:

Plausibility in Theory. As the core operation of transformers, self-attention [28] is a set operator [29] calculating long-range dependencies between elements regardless of data order. On this front, self-attention can easily estimate the point-wise dependencies without any concern for the inherent unorderedness. However, to comprehensively represent point cloud features, channel-wise information is also shown to be a crucial factor in attention mechanisms [5,11]. Moreover, such channel-wise information enables efficient upsampling via a simple periodic shuffling [30] operated on the channels of point features, avoiding the complex upsampling strategies of [24-27]. Given these facts, we propose a Shifted Channel Multi-head Self-Attention (SC-MSA) block, which strengthens


the point-wise relations in a multi-head form and enhances the channel-wise connections by introducing overlapping channels between consecutive heads.

Feasibility in Practice. The transformer model was originally invented for natural language processing, and its usage has since been widely recognized in high-level visual applications for 2D images [31-33]. More recently, Chen et al. [34] introduced a pre-trained transformer model achieving excellent performance on image super-resolution and denoising. Inspired by the transformer's effectiveness for image-related low-level vision tasks, we attempt to create a transformer-based model for point cloud upsampling. Given the mentioned differences between 2D images and 3D point clouds, we introduce the Positional Fusion block as a replacement for the positional encoding in conventional transformers: on the one hand, local information is aggregated from both the geometric and feature context of the points, implying their 3D positional relations; on the other hand, such local information serves as a complement to the subsequent self-attention operations, where the point-wise dependencies are calculated from a global perspective.

Adaptability in Various Applications. Transformer-based models are often considered a luxury tool in computer vision due to their huge consumption of data, hardware, and computational resources. However, our PU-Transformer can be easily trained with a single GPU in a few hours, retaining a model complexity similar to regular CNN-based point cloud upsampling networks [25,27,35]. Moreover, following a patch-based pipeline [25], the trained PU-Transformer model can effectively and flexibly upsample different types of point cloud data, including but not limited to regular object instances and large-scale LiDAR scenes (as shown in Figs. 3, 4 and 6). Starting from the upsampling task in low-level vision, we expect our transformer-based approach to remain affordable in terms of resource consumption for more point cloud applications.

Our main contributions are:
- To the best of our knowledge, we are the first to introduce a transformer-based model for point cloud upsampling (project page: https://github.com/ShiQiu0419/PU-Transformer).
- We quantitatively validate the effectiveness of the PU-Transformer by significantly outperforming the results of state-of-the-art point cloud upsampling networks on two benchmarks using three metrics.
- The upsampled visualizations demonstrate the superiority of PU-Transformer for diverse point clouds.

2 Related Work

Point Cloud Networks: In early research, projection-based methods [36,37] projected 3D point clouds into multi-view 2D images, applied regular 2D convolutions, and fused the extracted information for 3D analysis. Alternatively, discretization-based approaches [38] converted the point clouds to voxels [39] or lattices [40], and then processed them using 3D convolutions or sparse tensor convolutions [41]. To avoid context loss and complex steps during data conversion, point-based networks [3,4,6] directly process point cloud data via MLP-based operations.


Although current mainstream approaches in point cloud upsampling prefer MLP-related modules, in this paper we focus on an advanced transformer structure [28] in order to further enhance the point-wise dependencies between known points and benefit the generation of new points.

Point Cloud Upsampling: Although current point cloud research in low-level vision [35,42] is less active than that in high-level analysis [3,8,9], many outstanding works have contributed significant developments to the point cloud upsampling task. Specifically, PU-Net [35] is a pioneering work that introduced CNNs to point cloud upsampling based on a PointNet++ [6] backbone. Later, MPU [25] proposed a patch-based upsampling pipeline, which can flexibly upsample point cloud patches with rich local details. In addition, PU-GAN [24] adopted the architecture of Generative Adversarial Networks [43] for the generation of high-resolution point clouds, while PUGeo-Net [44] showed a promising combination of discrete differential geometry and deep learning. More recently, Dis-PU [27] applies disentangled refinement units to gradually generate high-quality point clouds from coarse ones, and PU-GCN [23] achieves good upsampling performance by using graph-based network constructions [4]. Moreover, some papers explore flexible-scale point cloud upsampling via meta-learning [15], self-supervised learning [45], decoupling the ratio from the network architecture [46], or interpolation [47]. As the first work leveraging transformers for point cloud upsampling, we focus on the effectiveness of PU-Transformer in performing the fundamental fixed-scale upsampling task, and expect to inspire more future work on related topics.

Transformers in Vision: With their capacity for parallel processing and their scalability to deep networks and large datasets [48], visual transformers have achieved excellent performance on image-related tasks, covering both low-level [34,49] and high-level analysis [31-33,50]. Due to the inherent gaps between 3D and 2D data, researchers have introduced variants of the transformer for point cloud analysis [51-55], using vector attention [29], offset attention [56], grid rasterization [57], and so on. However, since these transformers still operate on an overall classical PointNet [3] or PointNet++ [6] architecture, the improvement is relatively limited while the computational cost is too expensive for most researchers to re-implement. To simplify the model's complexity and boost its adaptability to point cloud upsampling research, we only utilize the general structure of the transformer encoder [32] to form the body of our PU-Transformer.

3 Methodology

3.1 Overview

As shown in Fig. 1, given a sparse point cloud P ∈ R^{N×3}, our proposed PU-Transformer generates a dense point cloud S ∈ R^{rN×3}, where r denotes the upsampling scale. Firstly, the PU-Transformer head extracts a preliminary feature map from the input. Then, based on the extracted feature map and the inherent 3D coordinates, the PU-Transformer body gradually encodes a more comprehensive feature map via the cascaded Transformer Encoders. Finally, in the PU-Transformer tail, we use the shuffle operation [30] to form a dense feature map and reconstruct the 3D coordinates of S via an MLP.


Algorithm 1: PU-Transformer Pipeline
input:  a sparse point cloud P ∈ R^{N×3}
output: a dense point cloud S ∈ R^{rN×3}
1:  F_0 = MLP(P)                                   # PU-Transformer Head
2:  for each Transformer Encoder (l = 1 ... L) do  # PU-Transformer Body
3:      G_l = PosFus(P, F_{l−1});                  # the l-th Transformer Encoder
4:      G_l' = SC-MSA(Norm(G_l)) + G_l;
5:      F_l = MLP(Norm(G_l')) + G_l';
6:  end for
7:  S = MLP(Shuffle(F_L))                          # PU-Transformer Tail

In Algorithm 1, we present the basic operations that are employed to build our PU-Transformer. As well as the operations ("MLP" [3], "Norm" [58], "Shuffle" [30]) that have been widely used in image and point cloud analysis, we propose two novel blocks targeting a transformer-based point cloud upsampling model, i.e., the Positional Fusion block ("PosFus" in Algorithm 1) and the Shifted Channel Multi-head Self-Attention block ("SC-MSA" in Algorithm 1). In the rest of this section, we introduce these two blocks in detail. Moreover, for a compact description, we only consider the case of an arbitrary Transformer Encoder; thus, in the following, we discard the subscripts annotated in Algorithm 1 that denote a Transformer Encoder's specific index in the PU-Transformer body.

3.2 Positional Fusion

Usually, a point cloud consisting of N points has two main types of context: the 3D coordinates P ∈ R^{N×3}, which are explicitly sampled from synthetic meshes or captured by real-world scanners, showing the original geometric distribution of the points in 3D space; and the feature context F ∈ R^{N×C} (equivalent to "F_{l−1}" in Algorithm 1), which is implicitly encoded by convolutional operations in a C-dimensional embedding space, yielding rich latent clues for visual analysis. Older approaches [24,25,35] to point cloud upsampling generate a dense point set by heavily exploiting the encoded features F, while recent methods [23,44] attempt to incorporate more geometric information. As the core module of the PU-Transformer, the proposed Transformer Encoder leverages a Positional Fusion block to encode and combine both the given P and F of a point cloud, following the local geometric relations between the scattered points.

Based on the metric of 3D Euclidean distance, we can search for the neighbors ∀p_j ∈ N(p_i) of each point p_i ∈ R^3 in the given point cloud P, using the k-nearest-neighbors (knn) algorithm [4]. Coupled with a grouping operation, we thus obtain a matrix P_j ∈ R^{N×k×3}, denoting the 3D coordinates of the neighbors for all points.


Accordingly, the relative positions between each point and its neighbors can be formulated as:

$$\Delta P = P_j - P, \qquad \Delta P \in \mathbb{R}^{N \times k \times 3}; \tag{1}$$

where k is the number of neighbors. In addition to the neighbors' relative positions showing each point's local detail, we also append the centroids' positions in 3D space, indicating the global distribution for all points. By duplicating P in a dimension expanded k times, we concatenate the local geometric context:

$$G_{geo} = \mathrm{concat}\big(\underset{k}{\mathrm{dup}}(P);\ \Delta P\big) \in \mathbb{R}^{N \times k \times 6}. \tag{2}$$

Further, for the feature matrix F_j ∈ R^{N×k×C} of all searched neighbors, we conduct similar operations (Eqs. 1 and 2) as on the counterpart P_j, computing the relative features as:

$$\Delta F = F_j - F, \qquad \Delta F \in \mathbb{R}^{N \times k \times C}; \tag{3}$$

and representing the local feature context as:

$$G_{feat} = \mathrm{concat}\big(\underset{k}{\mathrm{dup}}(F);\ \Delta F\big) \in \mathbb{R}^{N \times k \times 2C}. \tag{4}$$

After the local geometric context G_geo and local feature context G_feat are constructed, we fuse them into a comprehensive point feature representation. Specifically, G_geo and G_feat are encoded via two MLPs, M_Φ and M_Θ, respectively; we then aggregate the local information G ∈ R^{N×C'} (equivalent to "G_l" in Algorithm 1) using a concatenation of the two types of encoded local context, followed by a max-pooling function operating over the neighborhoods. The above operations can be summarized as:

$$G = \underset{k}{\max}\Big(\mathrm{concat}\big(M_\Phi(G_{geo});\ M_\Theta(G_{feat})\big)\Big). \tag{5}$$

Unlike the local graphs in DGCNN [4], which need to be updated in every encoder based on the dynamic relations in embedding space, both our G_geo and G_feat are constructed (i.e., Eqs. 2 and 4) and encoded (i.e., M_Φ and M_Θ in Eq. 5) in the same way, following fixed 3D geometric relations (i.e., ∀p_j ∈ N(p_i) defined upon 3D Euclidean distance). The main benefits of this approach are twofold: (i) it is practically efficient, since the expensive knn search only needs to be conducted once, while its results can be reused in all Positional Fusion blocks of the PU-Transformer body; and (ii) the local geometric and feature context are represented in a similar manner following the same metric, which allows the two types of context to be fused fairly. A detailed behavior analysis of this block is provided in the supplementary material. Overall, the Positional Fusion block not only encodes the positional information of a set of unordered points for the transformer's processing, but also aggregates comprehensive local details for accurate point cloud upsampling.
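A minimal sketch of the Positional Fusion block (Eqs. (1)-(5)) under assumed tensor layouts; the MLP callables, their output widths, and the batching scheme are illustrative assumptions rather than the authors' implementation.

import torch

def positional_fusion(xyz, feats, knn_idx, mlp_geo, mlp_feat):
    # xyz     : (B, N, 3)   point coordinates P
    # feats   : (B, N, C)   point features F
    # knn_idx : (B, N, k)   indices of the k nearest neighbors in 3D space,
    #                       assumed to be precomputed once and reused
    # returns : (B, N, C')  fused local context G
    B, N, k = knn_idx.shape
    batch = torch.arange(B, device=xyz.device).view(B, 1, 1).expand(B, N, k)

    xyz_j = xyz[batch, knn_idx]                       # (B, N, k, 3) neighbor coords P_j
    feat_j = feats[batch, knn_idx]                    # (B, N, k, C) neighbor features F_j

    d_xyz = xyz_j - xyz.unsqueeze(2)                  # Eq. (1): relative positions
    g_geo = torch.cat([xyz.unsqueeze(2).expand_as(xyz_j), d_xyz], dim=-1)      # Eq. (2)

    d_feat = feat_j - feats.unsqueeze(2)              # Eq. (3): relative features
    g_feat = torch.cat([feats.unsqueeze(2).expand_as(feat_j), d_feat], dim=-1)  # Eq. (4)

    # Eq. (5): encode both branches, concatenate, and max-pool over the k neighbors.
    fused = torch.cat([mlp_geo(g_geo), mlp_feat(g_feat)], dim=-1)
    return fused.max(dim=2).values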


Algorithm 2: Shifted Channel Multi-head Self-Attention (SC-MSA)
input:  a point cloud feature map I ∈ R^{N×C'}
output: the refined feature map O ∈ R^{N×C'}
others: channel-wise split width w; channel-wise shift interval d (d < w); the number of heads M
1:   Q = Linear(I)                               # query matrix, Q ∈ R^{N×C'}
2:   K = Linear(I)                               # key matrix, K ∈ R^{N×C'}
3:   V = Linear(I)                               # value matrix, V ∈ R^{N×C'}
4:   for m ∈ {1, 2, ..., M} do
5:       Q_m = Q[:, (m−1)d : (m−1)d + w];
6:       K_m = K[:, (m−1)d : (m−1)d + w];
7:       V_m = V[:, (m−1)d : (m−1)d + w];
8:       A_m = softmax(Q_m K_m^T);
9:       O_m = A_m V_m;
10:  end for
11:  obtain {O_1, O_2, ..., O_M}
12:  O = Linear(concat({O_1, O_2, ..., O_M}))

Fig. 2. Examples of how regular MSA [28] and our SC-MSA generate the low-dimensional splits of query matrix Q for multi-head processing (the same procedure applies to K and V).

3.3 Shifted Channel Multi-head Self-Attention

Different from previous works that applied complex upsampling strategies (e.g., GAN [24], coarse-to-fine [26], task disentangling [27]) to estimate new points, we prefer generating dense points in a simple way. In particular, PixelShuffle [30] is a periodic shuffling operation that efficiently reforms the channels of each point feature to represent new points without introducing additional parameters. However, with regular multi-head self-attention (MSA) [28] serving as the main calculation unit in transformers, only point-wise dependencies are calculated in each independent head of MSA, lacking the integration of channel-related information needed for shuffling-based upsampling. To tackle this issue, we introduce a Shifted Channel Multi-head Self-Attention (SC-MSA) block for the PU-Transformer. As Algorithm 2 states, we first apply linear layers (denoted as "Linear" and implemented as 1 × 1 convolutions) to encode the query matrix Q, key matrix K, and value matrix V. Then, we generate low-dimensional splits Q_m, K_m, V_m for each head. As shown in Fig. 2, regular MSA generates independent splits for the self-attention calculation in the corresponding heads. In contrast, our SC-MSA applies a window (dashed square) shift along the channels to ensure that any two consecutive splits have an overlap of (w − d) channels (slashed area), where w is the channel dimension of each split and d is the channel-wise shift interval. After generating Q_m, K_m, V_m for each head in this manner, we employ self-attention (Algorithm 2, steps 8-9) to estimate the point-wise dependencies as the output O_m of each head. Since any two consecutive heads have part of the input in common (i.e., the overlapping channels), connections between the outputs {O_1, O_2, ..., O_M} (Algorithm 2, step 11) of the multiple heads are established. There are two major benefits of such connections: (i) it is easier to integrate the information between the connected multi-head outputs (Algorithm 2, step 12), compared to using the independent multi-head results


of regular MSA; and (ii) as the overlapping context is captured from the channel dimension, our SC-MSA can further enhance the channel-wise relations in the final output O, better supporting an efficient and effective shuffling-based upsampling strategy than regular MSA's point-wise information alone. These benefits contribute to faster training convergence and better upsampling performance, especially when we deploy fewer Transformer Encoders. More practical evidence is provided in the supplementary material. It is worth noting that SC-MSA requires the shift interval to be smaller than the channel-wise width of each split (i.e., d < w, as in Algorithm 2) so that any two consecutive splits share an area. Accordingly, the number of heads in our SC-MSA is higher than in regular MSA (i.e., M > C′/w in Fig. 2). More implementation details and the choices of parameters are provided in Sect. 4.2.
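A minimal sketch of SC-MSA following Algorithm 2, using the parameter choices w = C′/ψ and d = w/2 described later in Sect. 4.2; the module layout and names are assumptions for illustration.

import torch
import torch.nn as nn

class SCMSA(nn.Module):
    # Shifted Channel Multi-head Self-Attention: consecutive heads share
    # (w - d) channels, so M = 2*psi - 1 heads with w = C'/psi and d = w/2.
    def __init__(self, channels, psi=4):
        super().__init__()
        self.w = channels // psi          # channel-wise split width
        self.d = self.w // 2              # channel-wise shift interval (d < w)
        self.m = 2 * psi - 1              # number of heads
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)
        self.proj = nn.Linear(self.m * self.w, channels)

    def forward(self, x):                 # x: (B, N, C')
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        heads = []
        for m in range(self.m):
            s = m * self.d                # shifted channel window per head
            qm = q[..., s:s + self.w]
            km = k[..., s:s + self.w]
            vm = v[..., s:s + self.w]
            attn = torch.softmax(qm @ km.transpose(-2, -1), dim=-1)  # point-wise dependencies
            heads.append(attn @ vm)
        return self.proj(torch.cat(heads, dim=-1))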

4 Implementation

4.1 PU-Transformer Head

As illustrated in Fig. 1, our PU-Transformer model begins with the head, which encodes a preliminary feature map for the following operations. In practice, we only use a single-layer MLP (i.e., a single 1 × 1 convolution, followed by a batch normalization layer [59] and a ReLU activation [60]) as the PU-Transformer head, where the generated feature map size is N × 16.

4.2 PU-Transformer Body

To balance model complexity and effectiveness, we empirically leverage five cascaded Transformer Encoders (i.e., L = 5 in Algorithm 1 and Fig. 1) to form the PU-Transformer body, where the channel dimension of each output follows: 32 → 64 → 128 → 256 → 256. In each Transformer Encoder, we only use the Positional Fusion block to encode the corresponding channel dimension (i.e., C′ in Eq. 5), which then remains the same in the subsequent operations. For all Positional Fusion blocks, the number of neighbors is empirically set to k = 20, as used in previous works [4,23]. In terms of the SC-MSA block, the way of choosing the shift-related parameters is inspired by the Non-local Network [61] and ECA-Net [62]. Specifically, a reduction ratio ψ [61] is introduced to generate the low-dimensional matrices in self-attention; following a similar method, the channel-wise width (i.e., channel dimension) of each split in SC-MSA is set as w = C′/ψ. Moreover, since the channel dimension is usually set to a power of 2 [62], we simply set the channel-wise shift interval to d = w/2. Therefore, the number of heads in SC-MSA becomes M = 2ψ − 1. In our implementation, ψ = 4 is adopted in all SC-MSA blocks of the PU-Transformer.

4.3 PU-Transformer Tail

Based on the practical settings above, the input to the PU-Transformer tail (i.e., the output of the last Transformer Encoder) has a size of N × 256. Then, the periodic shuffling operation [30] reforms the channels and constructs a dense feature map of rN × 256/r, where r is the upsampling scale. Finally, another MLP is applied to estimate the upsampled point cloud's 3D coordinates (rN × 3).
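A minimal sketch of this parameter-free shuffling step for point features, analogous to PixelShuffle [30]; the exact channel grouping order is an assumption.

import torch

def point_shuffle(feats, r):
    # feats: (N, C) point feature map; returns (r*N, C//r)
    n, c = feats.shape
    return feats.reshape(n, r, c // r).reshape(n * r, c // r)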

5 Experiments

5.1 Settings

Training Details: In general, our PU-Transformer is implemented using TensorFlow [63] with a single GeForce 2080 Ti GPU running on Linux. In terms of the training hyperparameters, we largely adopt the settings from PU-GCN [23] and Dis-PU [27] for the experiments in Table 1 and Table 2, respectively; for example, we use a batch size of 64 for 100 training epochs and an initial learning rate of 1 × 10⁻³ with a 0.7 decay rate. Moreover, we only use the modified Chamfer Distance loss [25] to train the PU-Transformer, minimizing the average closest-point distance between the input set P ∈ R^{N×3} and the output set S ∈ R^{rN×3} for efficient and effective convergence.

Datasets: We use two 3D benchmarks for our experiments:
- PU1K: This is a new point cloud upsampling dataset introduced in PU-GCN [23]. The PU1K dataset incorporates 1,020 3D meshes for training and 127 3D meshes for testing, where most 3D meshes are collected from ShapeNetCore [64], covering 50 object categories. To fit the patch-based upsampling pipeline [25], the training data is generated from patches of 3D meshes via Poisson disk sampling. Specifically, the training data includes 69,000 samples, where each sample has 256 input points (low resolution) and a ground truth of 1,024 points (4× high resolution).
- PU-GAN Dataset: This is an earlier dataset first used in PU-GAN [24] and generated in a similar way to PU1K but on a smaller scale. Concretely, the training data comprises 24,000 samples (patches) collected from 120 3D meshes, while the testing data contains only 27 meshes.
In addition to the PU1K dataset, which provides a large volume of data for the basic 4× upsampling experiment, we conduct both 4× and 16× upsampling experiments on the compact data of the PU-GAN dataset.

Evaluation Metrics: For testing, we follow the common practice of previous point cloud upsampling works [23-25,27]. Specifically, we first cut the input point cloud into multiple seed patches covering all N points. Then, we apply the trained PU-Transformer model to upsample the seed patches with a scale of r. Finally, the farthest point sampling algorithm [3] is used to combine all the upsampled patches into a dense output point cloud with rN points. For the 4× upsampling experiments in this paper, each testing sample has a low-resolution point cloud with 2,048 points, as well as a high-resolution one with 8,192 points. Coupled with the original 3D meshes, we quantitatively evaluate the upsampling performance of our PU-Transformer based on three widely used metrics: (i) Chamfer Distance (CD), (ii) Hausdorff Distance [65] (HD), and (iii) Point-to-Surface Distance (P2F). A lower value under these metrics denotes better upsampling performance.
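For reference, a simple sketch of the CD and HD metrics on two point sets (the benchmark's exact normalization and the P2F metric, which requires the mesh surface, are omitted; this is not the official evaluation script):

import torch

def chamfer_and_hausdorff(p, q):
    # p: (N, 3), q: (M, 3) point sets
    d = torch.cdist(p, q)                   # (N, M) pairwise Euclidean distances
    d_pq = d.min(dim=1).values              # p -> q nearest distances
    d_qp = d.min(dim=0).values              # q -> p nearest distances
    cd = d_pq.mean() + d_qp.mean()          # Chamfer Distance
    hd = torch.max(d_pq.max(), d_qp.max())  # symmetric Hausdorff Distance
    return cd, hd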


Table 1. Quantitative comparisons (4× upsampling) to state-of-the-art methods on the PU1K dataset [23]. ('CD': Chamfer Distance; 'HD': Hausdorff Distance; 'P2F': Point-to-Surface Distance. 'Model': model size; 'Time': average inference time per sample; 'Param.': number of parameters. *: self-reproduced results, -: unknown data.)

Methods         | CD ↓ (×10⁻³) | HD ↓ (×10⁻³) | P2F ↓ (×10⁻³) | Model (MB) | Time (×10⁻³ s) | Param. (×10³)
PU-Net [35]     | 1.155        | 15.170       | 4.834         | 10.1       | 8.4            | 812.0
MPU [25]        | 0.935        | 13.327       | 3.551         | 6.2        | 8.3            | 76.2
PU-GACNet [66]  | 0.665        | 9.053        | 2.429         | -          | -              | 50.7
PU-GCN [23]     | 0.585        | 7.577        | 2.499         | 1.8        | 8.0            | 76.0
Dis-PU* [27]    | 0.485        | 6.145        | 1.802         | 13.2       | 10.8           | 1047.0
Ours            | 0.451        | 3.843        | 1.277         | 18.4       | 9.9            | 969.9

Table 2. Quantitative comparisons to state-of-the-art methods on the PU-GAN dataset [24]. (All metric units are ×10⁻³. The best results are denoted in bold.)

Methods       | 4×: CD ↓ | 4×: HD ↓ | 4×: P2F ↓ | 16×: CD ↓ | 16×: HD ↓ | 16×: P2F ↓
PU-Net [35]   | 0.844    | 7.061    | 9.431     | 0.699     | 8.594     | 11.619
MPU [25]      | 0.632    | 6.998    | 6.199     | 0.348     | 7.187     | 6.822
PU-GAN [24]   | 0.483    | 5.323    | 5.053     | 0.269     | 7.127     | 6.306
PU-GCN* [23]  | 0.357    | 5.229    | 3.628     | 0.256     | 5.938     | 3.945
Dis-PU [27]   | 0.315    | 4.201    | 4.149     | 0.199     | 4.716     | 4.249
Ours          | 0.273    | 2.605    | 1.836     | 0.241     | 2.310     | 1.687

5.2 Point Cloud Upsampling Results

PU1K: Table 1 shows the quantitative results of our PU-Transformer on the PU1K dataset. It can be seen that our approach outperforms the other state-of-the-art methods on all three metrics. In terms of the Chamfer Distance, we achieve the best performance among all tested networks, since all reported values of the others are higher than our 0.451. Under the other two metrics, the improvements of PU-Transformer are particularly significant: compared to the recent PU-GCN [23], our approach almost halves the values under both the Hausdorff Distance (HD: 7.577 → 3.843) and the Point-to-Surface Distance (P2F: 2.499 → 1.277).

PU-GAN Dataset: We also conduct point cloud upsampling experiments on the dataset introduced in PU-GAN [24], under more upsampling scales. As shown in Table 2, we achieve the best performance under all three evaluation metrics for the 4× upsampling experiment. However, in the 16× upsampling test, we (CD: 0.241) are slightly behind the latest Dis-PU network [27] (CD: 0.199) under the Chamfer Distance metric: Dis-PU applies two CD-related terms in its loss function, hence gaining an edge on the CD metric only. As for the results under the Hausdorff Distance and Point-to-Surface Distance metrics, our PU-Transformer shows significant improvements again, where some values (e.g., P2F in 4×, HD and P2F in 16×) are even lower than half of Dis-PU's results.


Table 3. Ablation study of the PU-Transformer's components tested on the PU1K dataset [23]. Specifically, models A1-A3 investigate the effects of the Positional Fusion block, models B1-B3 compare the results of different self-attention approaches, and models C1-C3 test the upsampling methods in the tail. (Results are ×10⁻³.)

Models | Positional fusion | Attention type | PU-Transformer tail | CD ↓  | HD ↓  | P2F ↓
A1     | None              | SC-MSA         | Shuffle             | 0.605 | 6.477 | 2.038
A2     | Ggeo              | SC-MSA         | Shuffle             | 0.558 | 5.713 | 1.751
A3     | Gfeat             | SC-MSA         | Shuffle             | 0.497 | 4.164 | 1.511
B1     | Ggeo & Gfeat      | SA [61]        | Shuffle             | 0.526 | 4.689 | 1.492
B2     | Ggeo & Gfeat      | OSA [56]       | Shuffle             | 0.509 | 4.823 | 1.586
B3     | Ggeo & Gfeat      | MSA [28]       | Shuffle             | 0.498 | 4.218 | 1.427
C1     | Ggeo & Gfeat      | SC-MSA         | MLPs [35]           | 1.070 | 8.732 | 2.467
C2     | Ggeo & Gfeat      | SC-MSA         | DupGrid [25]        | 0.485 | 3.966 | 1.380
C3     | Ggeo & Gfeat      | SC-MSA         | NodeShuffle [23]    | 0.505 | 4.157 | 1.404
Full   | Ggeo & Gfeat      | SC-MSA         | Shuffle             | 0.451 | 3.843 | 1.277

Overall Comparison: The experimental results in Tables 1 and 2 indicate the great effectiveness of our PU-Transformer. Moreover, given the quantitative comparisons to CNN-based methods (e.g., GCN [67], GAN [43]) under different metrics, we demonstrate the superiority of transformers for point cloud upsampling by only exploiting fine-grained feature representations of point cloud data.

5.3 Ablation Studies

Effects of Components: Table 3 shows the experiments that replace the PU-Transformer's major components with different options. Specifically, we test three simplified models (A1-A3) regarding the Positional Fusion block output (Eq. 5), where employing both the local geometric context Ggeo and the feature context Gfeat (model "Full") provides better performance compared to the others. For models B1-B3, we apply different self-attention approaches in the Transformer Encoder, where our proposed SC-MSA block (Sect. 3.3) shows higher effectiveness for point cloud upsampling. In terms of the upsampling method used in the PU-Transformer tail, several learning-based methods are evaluated in models C1-C3. In particular, with the help of our SC-MSA design, the simple yet efficient periodic shuffling operation (i.e., PixelShuffle [30]) proves effective for obtaining a high-resolution feature map.

Robustness to Noise: As the PU-Transformer can upsample different types of point clouds, including real scanned data, it is necessary to verify our model's robustness to noise. Concretely, we test the pre-trained models by adding random noise to the sparse input data, where the noise is generated from a standard normal distribution N(0, 1) and multiplied by a factor β. In practice, we conduct the experiments under three noise levels: β = 0.5%, 1% and 2%. Table 4 quantitatively compares the testing results of state-of-the-art methods.


Table 4. The model’s robustness to random noise tested on the PU1K dataset [23], where the noise follows a normal distribution of N (0, 1) and β is the noise level. Methods

β = 0.5% CD ↓ HD ↓

β = 1% P2F ↓ CD ↓ HD ↓

β = 2% P2F ↓ CD ↓ HD ↓

P2F ↓

PU-Net [35] MPU [25] PU-GCN [23] Dis-PU [27]

1.006 0.869 0.621 0.496

5.253 4.069 3.524 2.604

6.851 5.625 5.585 4.417

10.378 9.291 9.378 7.759

Ours

0.453

14.640 12.524 8.011 6.268

1.017 14.998 0.907 13.019 0.762 9.553 0.591 7.944

4.052 2.127 0.610

1.333 1.130 1.107 0.858

5.787 3.965 1.058

19.964 16.252 13.130 10.960 9.948

7.551

Table 5. Model complexity of PU-Transformer using different numbers of Transformer Encoders. (Tested on the PU1K dataset [23] with a single GeForce 2080 Ti GPU. Results are ×10⁻³.)

# Transformer Encoders | # Parameters | Model size | Training speed (per batch) | Inference speed (per sample) | CD ↓  | HD ↓  | P2F ↓
L = 3                  | 438.3k       | 8.5 MB     | 12.2 s                     | 6.9 ms                       | 0.487 | 4.081 | 1.362
L = 4                  | 547.3k       | 11.5 MB    | 15.9 s                     | 8.2 ms                       | 0.472 | 4.010 | 1.284
L = 5                  | 969.9k       | 18.4 MB    | 23.5 s                     | 9.9 ms                       | 0.451 | 3.843 | 1.277
L = 6                  | 2634.4k      | 39.8 MB    | 40.3 s                     | 11.0 ms                      | 0.434 | 3.996 | 1.210

In most tested noise cases, our proposed PU-Transformer achieves the best performance, while Dis-PU [27] shows robustness under the CD metric as explained in Sect. 5.2.

Model Complexity: Generally, our PU-Transformer is a lightweight model.

… SDF_B(x_2)},   (14)

where ψ(·) denotes the closest garment vertex. Minimizing this term maintains the spatial relationship between points such as x_1 and x_2: it ensures that points that are close to the body before deformation remain close after it.


Fig. 4. (a) Illustration of how artifacts are produced by deformation. (b) Shirts deformed by models trained w/ and w/o Lorder. Without it, the inner face of the t-shirt can intersect the outer one, as shown in dark blue. (Color figure online)

3.4 Implementation Details

The SDF fΘ(x, z) of Eq. 1 is implemented by a 9-layer multilayer perceptron (MLP) with a skip connection from the input layer to the middle one. We use Softplus as the activation function. The Θ weights and the latent code z ∈ R^12 for each garment are optimized jointly during training with a learning rate of 5e−4. We use the architecture of [4] to implement the pose displacement network Δx_θ of Eq. 7. It comprises two MLPs with ReLU activations in between: one encodes the pose θ into an embedding, and the other predicts the blend matrices for the input point. The pose displacement is computed as the matrix product of the embedding and the blend matrices. The weight distribution w(·) of Eq. 8 is implemented by an MLP with an extra Softmax layer at the end to normalize the output. N = 1000 points are sampled to obtain the ground-truth w̄ used to train w. We use the ADAM [16] optimizer with a learning rate of 1e−3 for the training of w and Δx_θ.
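For concreteness, the sketch below shows one way such an SDF network could be laid out in PyTorch. The hidden width and the exact placement of the skip connection are assumptions (the text only fixes the depth, the Softplus activation, the input-to-middle skip, and z ∈ R^12), so treat it as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GarmentSDF(nn.Module):
    """Sketch of the 9-layer SDF MLP f_Theta(x, z): Softplus activations and a skip
    connection that re-injects the input at the middle layer. hidden=512 is an assumption."""
    def __init__(self, latent_dim: int = 12, hidden: int = 512, depth: int = 9):
        super().__init__()
        in_dim = 3 + latent_dim
        self.skip_at = depth // 2
        layers = []
        for i in range(depth):
            d_in = in_dim if i == 0 else hidden
            if i == self.skip_at:
                d_in = hidden + in_dim            # skip connection from the input
            d_out = 1 if i == depth - 1 else hidden
            layers.append(nn.Linear(d_in, d_out))
        self.layers = nn.ModuleList(layers)
        self.act = nn.Softplus()

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # x: (N, 3) query points, z: (latent_dim,) garment latent code
        inp = torch.cat([x, z.expand(x.shape[0], -1)], dim=-1)
        h = inp
        for i, layer in enumerate(self.layers):
            if i == self.skip_at:
                h = torch.cat([h, inp], dim=-1)
            h = layer(h)
            if i < len(self.layers) - 1:
                h = self.act(h)
        return h.squeeze(-1)                       # one signed distance per query point
```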

4 Experiments and Results

Our models can operate in several different ways. First, they can serve as generative models. By varying the latent code of our SDF f_Θ and using Marching Cubes, we can generate triangulated surfaces for garments of different topologies that can then be draped over bodies of changing shapes and poses using the deformation model M_G of Eq. 7, as shown in Fig. 1(a). Second, they can be used to recover both body and cloth shapes from images by minimizing

L(β, θ, z) = L_IoU(NeuR(M_G(G, β, θ), M_B(β, θ)), S) + L_prior(θ),   with   G = MC(f_Θ(x, z)),   (15)

where L_IoU is the IoU loss [17] that measures the difference between segmentation masks, G is the garment surface reconstructed by Marching Cubes MC(·),


NeuR(·) is a differentiable renderer, and S a semantic segmentation obtained using off-the-shelf algorithms. M_B(·) and M_G(·) are the skinning functions for the body and the garment as defined in Eqs. 5 and 7. We also minimize the prior loss L_prior of VPoser [28] to ensure the plausibility of the pose. In theory MC(·) is not differentiable, but its gradient at a vertex x can be approximated by ∂x/∂z = −n ∂f(x, z)/∂z, where n = ∇f(x) is the surface normal [12]. In practice, this makes the minimization of Eq. 15 practical using standard gradient-based tools, and we again rely on ADAM. PyTorch3D [29] serves as the differentiable renderer. In our experiments, we model shirts and trousers and use a separate SDF and a separate skinning model for each. Our pipeline can be used to jointly optimize the body mesh (β and θ) and the garment mesh (z), while previous work [9] can only be used to optimize the garment. In this section, we demonstrate both uses of our model. To this end, we first introduce the dataset and metrics used for our experiments. We then evaluate our method and compare its performance with baselines for garment reconstruction and deformation. Finally, we demonstrate the ability of our method to model people and their clothes from synthetic and real images.
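A minimal sketch of how this gradient approximation can be wired into an autodiff framework is given below. The SDF network `f` and its call signature are assumptions; the sketch only illustrates the stated relation ∂x/∂z = −n ∂f(x, z)/∂z, not the authors' exact implementation.

```python
import torch

def differentiable_vertices(f, z, verts):
    """Marching Cubes itself is treated as non-differentiable, but a surface vertex x
    can be moved along the normal by the (differentiable) SDF value: since f(x, z) ~ 0
    on the surface, the forward value stays at x while the gradient w.r.t. z becomes
    -n * df(x, z)/dz.  `verts` is the (N, 3) Marching Cubes output."""
    verts = verts.detach().requires_grad_(True)
    sdf = f(verts, z)                                   # ~0 on the surface, depends on z
    # normals n = grad_x f(x, z); detached so only the -n * df/dz path remains
    (normals,) = torch.autograd.grad(sdf.sum(), verts, create_graph=True)
    normals = torch.nn.functional.normalize(normals, dim=-1).detach()
    return verts.detach() - sdf.unsqueeze(-1) * normals
```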

4.1 Dataset and Evaluation Metrics

We train our models on data from CLOTH3D [3]. It contains over 7k sequences of different garments draped on animated 3D human SMPL models. Each garment has a different template and a single motion sequence that is up to 300 frames long. We randomly select 100 shirts and 100 trousers and transform them to a body with neutral shape and T-pose using the displacement of the closest SMPL body vertex, which yields meshes in the canonical space. For each garment sequence, we use the first 90% of the frames as training data and the rest as test data (denoted TEST EASY). We also randomly select 30 unseen sequences (denoted TEST HARD) to test the generalization ability of our model. Chamfer Distance (CD), Euclidean Distance (ED), Normal Consistency (NC) and Interpenetration Ratio (IR) are reported as evaluation metrics. NC is implemented as in [12]. IR is computed as the ratio of the area of garment faces inside the body to the overall garment face area.

4.2 Garment Reconstruction

The insets of Fig. 5(a) contrast our reconstruction results against those of SMPLicit [9]. The latter yields large interpenetrations while the former does not. Figure 5(b) showcases the role of the Lgrad term of Eq. 4 in producing smooth surfaces. In Table 1, we report quantitative results for both the shirt and the trousers. We outperform SMPLicit (the first row, w/o proc., w/o Lgrad) in all three metrics. The margin in IR is over 18%, which demonstrates the effectiveness of the interpenetration-aware processing of Sect. 3.1.


Fig. 5. Reconstruction results. (a) The inside of the same garment reconstructed by SMPLicit (upper inset) and our method (bottom inset). The first features interpenetrations whereas the second does not. (b) Reconstructed garment by a model trained without and with Lgrad. The latter is smoother and preserves details better.

Table 1. Comparative reconstruction results. proc. indicates our proposed interpenetration-aware pre-processing.

Shirt | CD (×10⁻⁴) | NC (%) | IR (%)
Ours w/o proc., w/o Lgrad | 1.88 | 92.1 | 18.1
Ours w/o Lgrad | 1.58 | 90.3 | 0.0
Ours | 1.48 | 92.3 | 0.0

Trousers | CD (×10⁻⁴) | NC (%) | IR (%)
Ours w/o proc., w/o Lgrad | 1.65 | 92.0 | 18.6
Ours w/o Lgrad | 1.22 | 91.8 | 0.0
Ours | 1.34 | 92.3 | 0.0

Table 2. Deforming unposed ground-truth garments with DeePSD, SMPLicit and our method on TEST EASY.

Shirt | ED (mm) | NC (%) | IR (%)
DeePSD | 26.1 | 82.3 | 5.8
SMPLicit | 35.9 | 84.0 | 13.3
Ours | 19.0 | 85.3 | 1.6

Trousers | ED (mm) | NC (%) | IR (%)
DeePSD | 17.5 | 85.4 | 1.5
SMPLicit | 27.0 | 85.6 | 6.3
Ours | 14.8 | 86.7 | 0.2

Table 3. Deforming unposed ground-truth garments with DeePSD, SMPLicit and our method on TEST HARD.

Shirt | ED (mm) | NC (%) | IR (%)
DeePSD | 95.6 | 72.6 | 46.4
SMPLicit | 35.4 | 83.9 | 12.9
Ours | 26.5 | 85.1 | 3.0

Trousers | ED (mm) | NC (%) | IR (%)
DeePSD | 37.8 | 79.5 | 27.8
SMPLicit | 31.9 | 84.9 | 8.8
Ours | 24.8 | 85.8 | 0.7

4.3 Garment Deformation

In this section, we compare our deformation results against those of SMPLicit [9] and DeePSD [2]. The input to DeePSD is the point cloud formed by the vertices of the ground-truth mesh so that, like our algorithm, it can deform garments of


Table 4. Deforming SDF-reconstructed garments with SMPLicit and our method on TEST EASY. w/o and w/ proc. means the mesh is reconstructed without and with interpenetration-aware processing, respectively.

Shirt | CD (×10⁻⁴) | NC (%) | IR (%)
SMPLicit | 7.91 | 83.7 | 16.2
Ours - w/o proc. | 3.86 | 84.4 | 1.6
Ours - w/ proc. | 3.78 | 84.7 | 1.5

Trousers | CD (×10⁻⁴) | NC (%) | IR (%)
SMPLicit | 3.66 | 84.1 | 6.6
Ours - w/o proc. | 2.71 | 85.3 | 0.2
Ours - w/ proc. | 2.67 | 85.4 | 0.2

arbitrary topology by estimating the deformation for each point separately. To skin the garment, it learns functions to predict the blending weights and the pose displacement. It also includes a self-consistency module to handle body-garment interpenetration. For a fair comparison, we retrain DeePSD using the same training data as before. To test the deformation behavior of our model, we use the SMPL parameters β and θ provided by the test data as input to our skinning model. As the garment mesh to be deformed, we either use the ground-truth unposed mesh from the data, which is an open surface, or the corresponding watertight mesh reconstructed by our SDF model. In Fig. 4(b), we present a qualitative result that shows the importance of the ordering term Lorder of Eq. 14.

We report quantitative results with the ground-truth mesh in Table 2 on TEST EASY. Our model performs substantially better than both baselines, with the lowest ED and IR and the highest NC. For example, compared to SMPLicit, the ED and IR of our model drop by more than 15 mm and 10% for the deformation of the shirt. In Table 3, we report similar results on TEST HARD, which is more challenging since it resembles the training set less, and we can draw the same conclusions. Since the learning of blending weights in DeePSD does not exploit the prior of the body model as ours does (Eq. 8), it suffers a large performance deterioration in this case, with its ED going up to 95.6 mm and 37.8 mm for the shirt and trousers respectively.

Table 4 reports the results with the SDF-reconstructed meshes. Again, our method performs consistently better than SMPLicit in all metrics. It is also noteworthy that our interpenetration-aware pre-processing helps reduce both the deformation error and the interpenetration ratio, as indicated by the w/o proc. and w/ proc. rows. This demonstrates that learning a physically accurate model of garment interpenetrations results in more accurate clothing deformations. In the qualitative results of Fig. 6, we can observe that SMPLicit cannot generate realistic dynamics and its results tend to be over-smoothed due to its simple skinning strategy. DeePSD produces better results, but they are too noisy. Moreover, neither of them is able to address body-garment interpenetration. Figure 7 visualizes the level of interpenetration occurring in different body regions. Interpenetrations occur almost everywhere on the body for SMPLicit; DeePSD shows fewer, but still more than ours.


Fig. 6. The skinning results for the ground-truth shirt (left) and the SDF-reconstructed shirt (right). Since the input of DeePSD has to be the point cloud of the mesh template, we only evaluate it with the unposed ground-truth mesh. Compared to DeePSD and SMPLicit, our method produces more realistic details and exhibits less body-garment interpenetration.

Fig. 7. The visualization of the body regions having interpenetrations (marked in red). (Color figure online)

Table 5. The evaluation results of SMPLicit-raw (w/o smoothing), SMPLicit (w/ smoothing) and our method for garment fitting on the synthetic data.

Shirt | CD (×10⁻⁴) | NC (%) | IR (%)
SMPLicit-raw | 17.77 | 82.1 | 41.5
SMPLicit | 18.73 | 82.8 | 37.3
Ours | 4.69 | 87.3 | 3.9

Trousers | CD (×10⁻⁴) | NC (%) | IR (%)
SMPLicit-raw | 4.22 | 81.2 | 35.7
SMPLicit | 4.50 | 82.2 | 29.2
Ours | 2.23 | 89.2 | 0.7

4.4 From Images to Clothed People

Our model can be used to recover the body and garment shapes of clothed people from images by minimizing L of Eq. 15 with respect to β, θ, and z. To demonstrate this, we use both synthetic and real images and compare our results to those of SMPLicit. Our optimizer directly uses the posed garment to compute the loss terms. In contrast, SMPLicit performs the optimization on the unposed garment. It first samples 3D points p in the canonical space, i.e., on the unposed body, and uses the weights of the closest body vertices to project these points into posed space and then into 2D image space to determine their semantic label s_p, which is 1 if the point projects inside the garment segmentation and 0 otherwise. The loss

L(z_G) = |C(p, z_G) − d_max|   if s_p = 0,
L(z_G) = min_i |C(p_i, z_G)|   if s_p = 1,   (16)

is then minimized with respect to the latent code z_G, where d_max is the maximum cut-off distance and min_i is used to consider only the point closest to the current garment surface estimate. This fairly complex processing chain tends to introduce inaccuracies.

Fig. 8. Fitting results on a synthetic image. Left to right: the ground-truth segmentation and garment meshes, SMPLicit w/o smoothing (SMPLicit-raw), SMPLicit w/ smoothing (SMPLicit) and ours. Note that SMPLicit requires post-processing to remove artifacts, while our method does not.

Synthetic Images. We use the body and garment meshes from CLOTH3D as synthetic data. Since the ground-truth SMPL parameters are available, we only optimize the latent code z for the garment and drop the pose prior term L_prior from Eq. 15. Image segmentations such as the one of Fig. 8 are obtained by using PyTorch3D to render the meshes under specific camera configurations. Given the ground-truth β, θ and segmentation, we initialize z as the mean of the learned codes and then minimize the loss. Figure 8 shows qualitative results in one specific case. The quantitative results reported in Table 5 confirm the greater accuracy and lesser propensity to produce interpenetrations of our approach.

Real Images. In real-world scenarios such as those depicted by Fig. 9, there are no ground-truth annotations, but we can obtain the required information from single images using off-the-shelf algorithms. As in SMPLicit, we use [31] to estimate the SMPL parameters β̂ and θ̂ and the algorithm of [38] to produce a segmentation. In SMPLicit, β̂ and θ̂ are fixed and only the garment model is updated. In


Fig. 9. Fitting results on images in-the-wild. Left to right: the input images and their segmentation, SMPLicit and ours. Note that SMPLicit recovers garments based on body meshes estimated from [31], while we can optimize the body and garment parameters jointly for more accurate results.

contrast, in our approach, β̂, θ̂, and the latent vector z are all optimized. As can be seen in Fig. 9, this means that inaccuracies in the initial values of β̂ and θ̂ can be corrected, resulting in an overall better fit of both body and garments.

5 Conclusion

We have presented a fully differentiable approach to draping a garment on a body so that both body and garment parameters can be jointly optimized. At its heart is a skinning model that learns to prevent self-penetration. We have demonstrated its effectiveness both for animation purposes and to recover body and cloth shapes from real images. In future work, we will incorporate additional physics-based constraints to increase realism and to reduce the required amount of training data.

References 1. Baraff, D., Witkin, A.: Large steps in cloth simulation. In: ACM SIGGRAPH, pp. 43–54 (1998) 2. Bertiche, H., Madadi, M., Tylson, E., Escalera, S.: DeePSD: automatic deep skinning and pose space deformation for 3D garment animation. In: International Conference on Computer Vision (2021)


3. Bertiche, H., Madadi, M., Escalera, S.: CLOTH3D: Clothed 3D humans. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 344–359. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058565-5 21 4. Bertiche, H., Madadi, M., Escalera, S.: PBNS: physically based neural simulation for unsupervised garment pose space deformation. ACM Trans. Graphics 40(6), 1–14 (2021) 5. Bhatnagar, B.L., Tiwari, G., Theobalt, C., Pons-Moll, G.: Multi-garment net: learning to dress 3D people from images. In: International Conference on Computer Vision (2019) 6. Chen, X., et al.: gDNA: towards generative detailed neural avatars. In: arXiv Preprint (2022) 7. Chen, X., Zheng, Y., Black, M.J., Hilliges, O., Geiger, A.: SNARF: differentiable forward skinning for animating non-rigid neural implicit shapes. In: International Conference on Computer Vision, pp. 11594–11604 (2021) 8. Chibane, J., Mir, A., Pons-Moll, G.: Neural unsigned distance fields for implicit function learning. In: Advances in Neural Information Processing Systems (2020) 9. Corona, E., Pumarola, A., Alenya, G., Pons-Moll, G., Moreno-Noguer, F.: Smplicit: topology-aware generative model for clothed people. In: Conference on Computer Vision and Pattern Recognition (2021) 10. Designer, M.: (2018). https://www.marvelousdesigner.com 11. Gropp, A., Yariv, L., Haim, N., Atzmon, M., Lipman, Y.: Implicit geometric regularization for learning shapes. In: International Conference on Machine Learning (2020) 12. Guillard, B., et al.: DeepMesh: differentiable ISO-surface extraction. In: arXiv Preprint (2021) 13. Guillard, B., Stella, F., Fua, P.: MeshUDF: fast and differentiable meshing of unsigned distance field networks. In: arXiv Preprint (2021) 14. Gundogdu, E., et al.: Garnet++: improving fast and accurate static 3D cloth draping by curvature loss. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 181–195 (2022) 15. Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: BCNet: learning body and cloth shape from a single image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 18–35. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5 2 16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015) 17. Li, R., Zheng, M., Karanam, S., Chen, T., Wu, Z.: Everybody is unique: towards unbiased human mesh recovery. In: British Machine Vision Conference (2021) 18. Li, Y., Habermann, M., Thomaszewski, B., Coros, S., Beeler, T., Theobalt, C.: Deep physics-aware inference of cloth deformation for monocular human performance capture. In: International Conference on 3D Vision, pp. 373–384 (2021) 19. Liang, J., Lin, M., Koltun, V.: Differentiable cloth simulation for inverse problems. In: Advances in Neural Information Processing Systems (2019) 20. Loper, M.M., Black, M.J.: OpenDR: an approximate differentiable renderer. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 154–169. Springer, Cham (2014). https://doi.org/10.1007/978-3-31910584-0 11 21. Ma, Q., Saito, S., Yang, J., Tang, S., Black, M.J.: SCALE: modeling clothed humans with a surface codec of articulated local elements. In: Conference on Computer Vision and Pattern Recognition (2021)


22. Narain, R., Pfaff, T., O’Brien, J.F.: Folding and crumpling adaptive sheets. ACM Trans. Graphics 32(4), 1–8 (2013) 23. Narain, R., Samii, A., O’brien, J.F.: Adaptive anisotropic remeshing for cloth simulation. ACM Trans. Graphics 31(6), 1–10 (2012) 24. Nvidia: Nvcloth (2018) 25. Nvidia: NVIDIA Flex (2018). https://developer.nvidia.com/flex 26. Park, J.J., Florence, P., Straub, J., Newcombe, R.A., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Conference on Computer Vision and Pattern Recognition (2019) 27. Patel, C., Liao, Z., Pons-Moll, G.: Tailornet: predicting clothing in 3D as a function of human pose, shape and garment style. In: Conference on Computer Vision and Pattern Recognition, pp. 7365–7375 (2020) 28. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019) 29. Ravi, N., et al.: PyTorch3D (2020). https://github.com/facebookresearch/ pytorch3d 30. Remelli, E., et al.: MeshSDF: differentiable ISO-surface extraction. In: Advances in Neural Information Processing Systems (2020) 31. Rong, Y., Shiratori, T., Joo, H.: FrankMOCAP: fast monocular 3D hand and body motion capture by regression and integration. In: arXiv Preprint (2020) 32. Saito, S., Yang, J., Ma, Q., Black, M.J.: SCANimate: weakly supervised learning of skinned clothed avatar networks. In: Conference on Computer Vision and Pattern Recognition, pp. 2886–2897 (2021) 33. Santesteban, I., Thuerey, N., Otaduy, M.A., Casas, D.: Self-supervised collision handling via generative 3D garment models for virtual try-on. In: Conference on Computer Vision and Pattern Recognition (2021) 34. Santesteban, I., Otaduy, M.A., Casas, D.: SNUG: self-Supervised Neural Dynamic Garments. In: arXiv Preprint (2022) 35. Software, O.F.D.: (2018). https://optitex.com/ 36. Tiwari, G., Bhatnagar, B.L., Tung, T., Pons-Moll, G.: SIZER: a dataset and model for parsing 3D clothing and learning size sensitive 3D clothing. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 1–18. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8 1 37. Tiwari, G., Sarafianos, N., Tung, T., Pons-Moll, G.: Neural-GIF: neural generalized implicit functions for animating people in clothing. In: International Conference on Computer Vision, pp. 11708–11718 (2021) 38. Yang, L., et al.: Renovating parsing R-CNN for accurate multiple human parsing. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 421–437. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058610-2 25 39. Zakharkin, I., Mazur, K., Grigorev, A., Lempitsky, V.: Point-based modeling of human clothing. In: International Conference on Computer Vision, pp. 14718– 14727 (2021)

Pyramidal Signed Distance Learning for Spatio-Temporal Human Shape Completion

Boyao Zhou1(B), Jean-Sébastien Franco1, Martin de La Gorce2, and Edmond Boyer1

1 Inria-University Grenoble Alpes-CNRS-Grenoble INP-LJK, Grenoble, France
{boyao.zhou,jean-sebastien.franco,edmond.boyer}@inria.fr
2 Microsoft, Cambridge, UK
[email protected]

Abstract. We address the problem of completing partial human shape observations as obtained with a depth camera. Existing methods that solve this problem can provide robustness, with for instance model-based strategies that rely on parametric human models, or precision, with learning approaches that can capture local geometric patterns using implicit neural representations. We investigate how to combine both properties with a novel pyramidal spatio-temporal learning model. This model exploits neural signed distance fields in a coarse-to-fine manner, in order to benefit from the ability of implicit neural representations to preserve local geometry details while enforcing more global spatial consistency for the estimated shapes through features at coarser levels. In addition, our model also leverages temporal redundancy with spatio-temporal features that integrate information over neighboring frames. Experiments on standard datasets show that both the coarse-to-fine and temporal aggregation strategies contribute to outperform the state-of-the-art methods on human shape completion.

1 Introduction

Completing shape models is a problem that arises in many applications where perception devices provide only partial shape observations. This is the case with single depth cameras, which only perceive the front-facing geometric information. Completing the 3D geometry in such a situation while retaining the observed geometric details is the problem we consider in this work. We particularly focus on human shapes and take advantage of the camera's ability to provide series of depth images over time. The task is challenging since different and potentially conflicting issues must be addressed. Strong priors on human shapes and their clothing are required as large shape parts are usually occluded in depth images. However, such priors


can in practice conflict with the capacity to preserve geometric details present in the observations. Another issue lies in the ability to leverage information over time with local geometric patterns that can be either temporally consistent or time varying, as with folds on human clothing. Existing methods fall into two main categories with respect to their global or local approach to the human shape completion problem. On the one hand, model-based methods build on parametric human models which can be fitted to incomplete observations, for instance SMPL [25] Dyna [39] or NPMs [34]. Approaches in this category can leverage strong priors on human shapes and hence often yield robust solutions to the shape completion problem. However, parametric models usually impose that the estimated shapes lie in a low dimensional space, which limits the ability to generalize over human shapes, in particular when considering clothing. Another consequence of the low dimensional shape space is that local geometric details tend to be filtered out when encoding shapes into that space. On the other hand, neural network based approaches, e.g. NDF [13], IFNet [12] or STIF [51], use implicit representations to model the observed surfaces. Such representations give access to a larger set of shapes, although this set is still bounded by the coverage in the training data. They also present better abilities to preserve geometric details as they are, by construction in these approaches, local representations. Besides, in contrast to model based methods, the local aspect of the representation impacts the robustness with estimated shapes that can often be spatially inconsistent. In order to retain the benefit of local representations while providing better robustness and spatial coherence we propose a novel hierarchical coarse-to-fine strategy that builds on implicit neural representations [30,35]. This strategy applies to the spatial domain as well as to the temporal domain. We investigate the temporal dimension since redundancy over time can contribute to shape completion, as demonstrated in [51]. This dimension naturally integrates into the pyramidal spatio-temporal model we present. Our approach considers as input consecutive depth images of humans in a temporal sequence and estimates in turn a sequence of 3D distance functions that maps any point in the space time cube covered by the input frames to its distance to the observed human shape. This distance mapping function is learned over temporally coherent 3D mesh sequences and through a MLP decoder that is fed with multi-scale spatiotemporal features, as encoded from the input depth images. Our experiments demonstrate the spatial and temporal benefit of the hierarchical strategy we introduce with both local and global shape properties that are substantially better preserved with respect to the state of the art. The main contributions of this paper are, first, to show the benefit of a pyramidal architecture that aggregates a residual pyramid of features in the context of implicit surface regression, and second, to propose a tailored training strategy that exploits this residual architecture by using losses that are distributed at each scale of the feature map both spatially and temporally. Code is released at https://gitlab.inria.fr/bzhou/PyramidSpatioTempoHumanShapeComplete.git.

2 Related Work

The set of methods proposed in the literature to estimate full human body surfaces from a sequence of depth images can be broadly divided in 4 main categories: model-based methods, regression-based methods, dynamic fusion based methods and hybrid methods. We detail here these different methods and discuss their strengths and limitations. Model-Based Methods build on learned parametric human models that are fitted to incomplete observations. The parametric model generally represents a 3D undressed human with global shape and pose parameters. The surface can be represented explicitly using a triangulated surface similarly to SMPL [4,25,36,46] or Dyna [39], or implicitly using either a parameterized occupancy or a signed distance function [5,6,18,34]. Model based methods that decouple the body shape parameters from the pose like [25,34] can be used to efficiently exploit temporal coherence in a sequence by estimating a single shape parameter vector for the whole sequence and different pose for each frame. The low dimensionality of the parameterized space inherently limits the ability of these models to represent details such as clothing or hair present in the data. To model the details beyond the parameterized space [2,3,8,29,38] add a cloth displacement map to represent 3D dressed humans, which significantly improves the accuracy of the reconstructed surface but often still impose some constraint on the topology of the surface and thus do not allow to fully explain complex observed data. Regression-Based Methods directly predict the shapes from the incomplete data. This allows to predict less constrained surfaces and keep more of the details from the input data than model-based methods. The representation of the predicted surface can be explicit [19,40,52] or implicit which allows for changes in the topology [11,35]. This approach has been used for non-articulated objects [13,24,30,33,37,47] and articulated human body [12,16,20,40,42,43,51]. The implicit surface can be represented using a (truncated) signed distance function, an unsigned distance function [13] or an occupancy function as in IF-Net [12]. Learning truncated signed distance functions tends to lead to more precise surfaces than learning occupancy, as the signed distance provides a richer supervision signal for points that are sampled near but not on the surface. One main challenge with these approaches is to efficiently exploit the temporal coherence to extract surface detail from previous frames. Dynamic Fusion approaches [17,31] are another classical set of approaches that are neither model-based nor regression-based and which consist in tracking point cloud correspondences, aligning point cloud in a non-rigid manner and fusing the aligned point clouds into a single surface representation. This allows to exploit temporal coherence to complete the surface in regions that are observed in other frames in the sequence. Omitting strong priors on the human body surface, either in the form of a model or a trained regressor (although human prior can be used to help the tracking), these methods are able to retain specific details of the observed surface but are also, consequently, not able to predict the surface in regions that are not observed in any frame. These methods are also


subject to drift and error accumulation when used on challenging long sequences with fast motion and topological changes. Hybrid Methods that mix various of these approaches have been proposed to combine their advantages. IP-Net [7] generates the implicit surface with [12] and register the surface with SMPL+D [1,21] and thus inherits the limited ability of model based methods to represent geometric details. Function4D [50] combines a classical dynamic fusion method with a post processing step to repair holes in the surface using a learned implicit surface regression. While these methods improve results w.r.t each of the combined methods, they still tend to retain part of their drawbacks. Exploiting Temporal Coherence is a main challenge for regression based methods, as mentioned above. Using a pure feed-forward approach where a set of frames is fed into the regressor becomes quickly unmanageable as the size of the temporal window increases. Oflow [32] exploits the temporal coherence by estimating a single surface for the sequence initial frame which is deformed over all frames using a dense correspondence field conditioned on the whole sequence inputs. This strategy enforces by construction temporal consistency, but it also prevents temporal information to benefit to the shape model. STIF [51] addresses the problem of temporal integration using a recurrent GRU [14,15] layer to aggregate information. Such layer can however only be used at the low-resolution features and hence surface details do not propagate from one frame to the next. In this paper we suggest a pyramidal approach that can exploit both spatial and temporal dimensions. Taking inspiration from pyramidal strategies applied successfully to other prominent vision problems such as object detection [23], multi-view stereo [48], 3D reconstruction [45] and optical flow [44] we devise a method that aggregates spatio-temporal information in a coarse to fine manner, propagating features from low to high resolution through up-sampling, concatenation with higher resolution features and the addition of residuals.

3 Network Architecture

We wish to compute the completion of a sequence of shapes observed from noisy depth maps. These maps, noted {D} = {D_t}_{t∈{1,...,N}}, provide a truncated front point cloud of the human shape in motion. Our output, by contrast, is a corresponding sequence of complete shapes encoded as a set of per-frame Truncated Signed Distance Functions, using neural implicit functions SDF_t(p) with t ∈ {1, . . . , N}, each of which can be queried for arbitrary 3D points p = (x, y, z). The surface can then be extracted using standard algorithms [22,26]. In the following, we explain how the SDF_t(p) functions can be coded with a network inspired from point cloud SDF networks [13,35], but which also jointly leverages the analysis of small temporal frame subgroups of size K. Without loss of generality, we expose the framework for a given subgroup of frames {1, . . . , K} and drop the more general index N. We also propose a more general pyramidal coarse-to-fine architecture to balance global information and local detail in the inference, which was previously only demonstrated with explicit TSDF encodings


Fig. 1. Network architecture. Data is processed in three phases: depth images are processed by the spatio-temporal feature encoding backbone of Sect. 3.1 into feature maps at three levels. These feature maps are later aggregated with 3D convolutions over the temporal dimension in a coarse-to-fine manner (Sect. 3.2). The aggregated feature map is used to predict the signed distance field with an MLP (Sect. 3.3).

for a different incremental reconstruction problem [45]. Our network is articulated around three main phases represented in Fig. 1: a feature pyramid extraction phase (Sect. 3.1), a pyramidal feature decoding phase (Sect. 3.2), and an implicit surface decoding phase (Sect. 3.3), detailed as follows.

3.1 Spatio-Temporal Feature Encoding

To build a hierarchical inference structure, a time-tested pattern (e.g. [44,48]) is to build a feature hierarchy or feature pyramid [23] to extract various feature scale levels. The recent literature [51] shows that a measurable improvement to this strategy, in the context of temporal input image sequences, is to link the coarsest layer in each input image's pyramid with a bidirectional recurrent GRU component [14,15], instead of building independent per-frame pyramids. This allows the model to learn common global characteristics and their mutual updates at a modest computational cost. We modify the backbone encoder U-GRU [51] in order to hierarchically yield a set of per-frame feature maps Z^0_t, Z^1_t, Z^2_t at three levels of respectively high, mid and low resolution, for each input depth map D_t with t ∈ {1, . . . , K}. In the rest of the discussion we only use the temporal aggregates of each feature level, Z^0 = {Z^0_t}, Z^1 = {Z^1_t}, and Z^2 = {Z^2_t} with t ∈ {1, . . . , K}, such that we can denote their extraction:

{Z^0, Z^1, Z^2} = Encode(D_1, . . . , D_K).   (1)

The feature maps here contain three dimensions, x, y and t, at the three different levels of the pyramid. However, the coarsening is only done in the spatial domain at this stage, so that all scale levels contain the same number of frames K. More details on these feature maps' dimensions are provided in Sect. 3.4.
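The interface of Eq. (1) can be sketched as below. The real backbone is the U-GRU encoder of [51]; here a plain strided-convolution encoder stands in for it, and the channel widths and strides are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class PyramidEncoderStub(nn.Module):
    """Interface sketch for Eq. (1): K per-frame depth maps go in, three
    aggregated feature levels (high, mid, low resolution) come out."""
    def __init__(self, c0=32, c1=64, c2=128):
        super().__init__()
        self.level0 = nn.Sequential(nn.Conv2d(1, c0, 3, stride=2, padding=1), nn.ReLU())
        self.level1 = nn.Sequential(nn.Conv2d(c0, c1, 3, stride=2, padding=1), nn.ReLU())
        self.level2 = nn.Sequential(nn.Conv2d(c1, c2, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, depths: torch.Tensor):
        # depths: (K, 1, H, W) -- one depth map per frame of the temporal window
        z0 = self.level0(depths)          # (K, c0, H/2, W/2)   high resolution
        z1 = self.level1(z0)              # (K, c1, H/4, W/4)   mid resolution
        z2 = self.level2(z1)              # (K, c2, H/8, W/8)   low resolution
        return z0, z1, z2                 # Z^0, Z^1, Z^2 of Eq. (1)
```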

3.2 Pyramidal Feature Decoding

In the next stage, our goal is to allow the network to progressively decode and refine spatio-temporal features of the sequence that will be used for TSDF decoding. To this goal, we first process the coarsest feature aggregate Z^2 with a full 3D (2D + t) convolution to extract a sequence feature F^2:

F^2 = Conv3D(Z^2).   (2)

We then subsequently process the finer feature levels l ∈ {1, 0}, first by up-sampling the previous-level sequence feature map in the spatial domain using bilinear interpolation, noted U^l, then concatenating it with the aggregate feature Z^l as input to 3D convolutions, as shown in Fig. 1. The output of the 3D convolutions is the residual correction of the previous-level aggregated feature, echoing successful coarse-to-fine architectures [44,48]:

U^l = Up-Sample(F^{l+1}),   (3)
F^l = U^l + Conv3D({U^l, Z^l}).   (4)
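A minimal sketch of this coarse-to-fine decoding is given below. The channel counts reuse the assumptions of the encoder stub above and the inputs are expected as 5-D (1, C, K, H, W) volumes; neither is specified by the paper, so treat this as an illustration of Eqs. (2)-(4) only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDecoderStub(nn.Module):
    """Sketch of Eqs. (2)-(4): process the coarsest aggregate, then up-sample,
    concatenate with the encoder aggregate and add a residual 3D-conv correction."""
    def __init__(self, c0=32, c1=64, c2=128):
        super().__init__()
        self.conv2 = nn.Conv3d(c2, c2, 3, padding=1)
        self.conv1 = nn.Conv3d(c2 + c1, c2, 3, padding=1)   # input: {U^1, Z^1}
        self.conv0 = nn.Conv3d(c2 + c0, c2, 3, padding=1)   # input: {U^0, Z^0}

    def forward(self, z0, z1, z2):
        # z_l: (1, C_l, K, H_l, W_l), i.e. encoder outputs permuted to channels-first volumes
        f2 = self.conv2(z2)                                             # Eq. (2)
        u1 = F.interpolate(f2, size=z1.shape[2:], mode="trilinear",
                           align_corners=False)                         # spatial up-sampling, K kept
        f1 = u1 + self.conv1(torch.cat([u1, z1], dim=1))                # Eq. (4), l = 1
        u0 = F.interpolate(f1, size=z0.shape[2:], mode="trilinear",
                           align_corners=False)
        f0 = u0 + self.conv0(torch.cat([u0, z0], dim=1))                # Eq. (4), l = 0
        return f0                                                       # high-resolution F^0
```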

3.3 Implicit Surface Decoding

The final high-resolution feature F^0 obtained as output of the previous process serves as a latent vector map, from which we sample a latent vector at a given query point's image coordinates. The selected latent vector is then decoded using an MLP and provides the TSDF value for that point. More formally, given a query point (x, y, z) and a time frame t ∈ {1, . . . , K}, this signed distance function is computed by first sampling bi-linearly, at the corresponding 2D location (x, y), the slice F^0_t of the high-resolution 3D feature map that corresponds to frame t of the temporal window. We note this sampler f^0_t:

f^0_t(x, y) = B(F^0_t; x, y).   (5)

After obtaining the queried feature vector at the high-resolution scale, we concatenate it with the depth feature z. The signed distance function can then be computed as

SDF^0_t(x, y, z) = MLP(f^0_t(x, y), z),   (6)

where an MLP is used to decode the TSDF value for a given pixel (x, y) and frame t of the high-resolution feature map, with z implicitly accounted for. Operational details and constants are discussed in the next section and in the supplementary material.
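The sketch below shows how Eqs. (5)-(6) could be realized with a standard bilinear sampler. It assumes query coordinates already normalized to [-1, 1] and an externally defined MLP; both are illustrative conventions rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def decode_sdf(f0_t: torch.Tensor, mlp: torch.nn.Module, points: torch.Tensor) -> torch.Tensor:
    """Sample the frame-t feature slice at each query's (x, y), append the depth z,
    and decode a TSDF value with the MLP (Eqs. (5)-(6))."""
    # f0_t: (1, C, H, W) feature slice for frame t; points: (N, 3) with (x, y, z) in [-1, 1]
    xy = points[:, :2].view(1, 1, -1, 2)                    # (1, 1, N, 2) sampling grid
    feats = F.grid_sample(f0_t, xy, mode="bilinear",
                          align_corners=True)               # (1, C, 1, N)
    feats = feats.squeeze(0).squeeze(1).t()                 # (N, C)
    z = points[:, 2:3]                                      # depth coordinate of each query
    return torch.tanh(mlp(torch.cat([feats, z], dim=-1))).squeeze(-1)   # TSDF in [-1, 1]
```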

3.4 Implementation Details

We adapt a U-Net [41]-like encoder [51] as the backbone for hierarchical learning, combined with a GRU [14,15], for the pyramidal feature extraction of Sect. 3.1. Each 3D convolution block of Sect. 3.2 contains 3 convolution layers with 3 × 3 × 3 kernels. We use zero padding both spatially and temporally. In Sect. 3.3,


the depth feature z is multiplied by 128 in order to get an activation range similar to the values of the feature vector f^0_t(x, y). The final output layer of the MLP is activated with tanh in order to bound the signed distance to the range [−1, 1]. The pre-computed SDF is scaled by a factor of 75 and we set the truncation value to 1 in the normalized space. In practice, this scaling also improves numerical stability; otherwise the loss would be too small during training. While it would be possible to compute the inference results over an input sequence in a sliding-window fashion, as a trade-off between computation and quality we compute the results on consecutive groups of K = 4 frames. A training batch is composed of 4 × 512² depth images and 4 × 2800 query points per scale for the SDF evaluation. A training batch takes 2.86 s with a GPU memory footprint of about 7.86 GB.
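The target normalization described above is a one-liner; a minimal sketch follows, using the quoted scale factor of 75 and truncation to [-1, 1].

```python
import torch

def normalized_tsdf(signed_distance: torch.Tensor, scale: float = 75.0) -> torch.Tensor:
    """Scale pre-computed signed distances by 75 and truncate to [-1, 1], matching
    the tanh output range of the decoding MLP."""
    return torch.clamp(signed_distance * scale, min=-1.0, max=1.0)
```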

4 Training Strategies

In the previous section, we discussed the network used and the inference path for a given set of K input frames. The same path can be used for supervised training, but pyramidal architectures are usually trained with intermediate loss objectives that guide the coarser levels. This is easy to do with a supervised loss when the coarser objectives are just a down-sampled version of the expected result, e.g. for object detection, depth map or optical flow inference [44,48]. A main contribution of this paper is to show the benefit of such schemes with an implicit representation of the surface reconstructed from a residual pyramid of features. One of the key difficulties lies in characterizing the intermediate, lower-resolution contribution of the coarser layers of the implicit decoder to the final result. To this goal we present four training strategies, with several temporal and pyramidal combinations, and discuss their ablation in the experiments. A visual summary of these strategies is provided in Fig. 2.

4.1 Common Training Principles

We render depth maps of 3D raw scans from a training set to associate depth map observations with ground-truth SDF values for a given set of query points in the 3D observation space. Following general supervision principles of implicit networks [42,51], we provide training examples for three groups: on-surface points, points in the vicinity of the surface, and other examples uniformly sampled over the whole observed 3D volume. On-surface training examples are obtained simply by sampling k_s vertices from the ground-truth 3D model and associating them with SDF value 0. For surface-vicinity points at any given time t, as shown in Fig. 2, for each of the k_s vertices we randomly draw 4 training points, with their ground-truth SDF values, among the points of a 26-point 3D grid centered on the ground-truth surface point, in which the closest projected grid edge length matches the observed feature map pixel size, as illustrated in the rightmost column of Fig. 2. For the third, uniform point group, we randomly draw k_u sample training points covering the whole acquisition space, for which


Fig. 2. Training strategy variants, from left to right: (a) temporal naive, (b) static pyramidal, (c) temporal pyramidal and (d) spatio-temporal pyramidal. More detailed explanations can be found in Sect. 4.3.

we provide the SDF to the ground-truth surface. In our experiments, we keep k_s = 400 and k_u = 800 constant regardless of the training scenario.
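The three point groups above can be drawn as in the sketch below. The grid step and the volume extent are placeholders (the paper ties the grid edge to the feature map pixel size), and the ground-truth SDF targets, which come from the pre-computed distance fields, are not shown.

```python
import numpy as np

def sample_training_points(verts: np.ndarray, k_s: int = 400, k_u: int = 800,
                           grid_step: float = 0.01, volume_half_size: float = 1.0):
    """Sketch of the three training point groups: on-surface samples, near-surface
    samples from a 26-neighbour grid around surface points, and uniform samples."""
    rng = np.random.default_rng()
    # 1) on-surface points (SDF target = 0)
    surface = verts[rng.choice(len(verts), k_s)]
    # 2) near-surface points: the 26 offsets of a 3x3x3 grid (centre excluded), 4 per vertex
    offsets = np.array([(i, j, k) for i in (-1, 0, 1) for j in (-1, 0, 1)
                        for k in (-1, 0, 1) if (i, j, k) != (0, 0, 0)]) * grid_step
    picks = offsets[rng.integers(0, 26, size=(k_s, 4))]          # (k_s, 4, 3)
    vicinity = (surface[:, None, :] + picks).reshape(-1, 3)
    # 3) uniform points over the whole observed volume
    uniform = rng.uniform(-volume_half_size, volume_half_size, size=(k_u, 3))
    return surface, vicinity, uniform
```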

4.2 Pyramidal Training Framework

For explicit pyramidal training, we need to supply ground-truth SDFs for all levels of the pyramid, including the coarser ones. To this goal, for a set of sampled ground-truth surface points at time t, we extend the set of surface-vicinity training samples by randomly drawing them from three different 26-point-grid cubes centered on the ground-truth surface point, instead of only one. Each of these grid cubes corresponds to a network pyramid level feature F^l_t with l ∈ {0, 1, 2}, and its closest projected grid edge length matches the corresponding feature map pixel size, the finest being that of F^0_t. We still draw 4 samples at each feature level l, among the 26 possibilities. For each of the m^l query points p_i = (x_i, y_i, z_i) in the training batch, noting its frame t_i and pyramid level l, we generalize Eqs. 5 and 6 to provide an MLP decoding and training path of SDFs in the temporal window:

f^l_{t_i}(x_i, y_i) = B(F^l_{t_i}; x_i, y_i),   (7)
SDF^l_{t_i}(x_i, y_i, z_i) = MLP(f^l_{t_i}(x_i, y_i), z_i).   (8)

Given the ground-truth SDF value s^l_i, computed from the ground-truth 3D models for every training point p_i at level l, the loss can then be defined as:

L = Σ_l (λ^l / m^l) Σ_{i=1}^{m^l} ‖SDF^l_{t_i}(x_i, y_i, z_i) − s^l_i‖²,   (9)

where λ^l are per-level weights balancing the final loss; in practice we set λ^0 = 4 for the finest level, λ^1 = 1 and λ^2 = 0.1 for the coarsest level.
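The multi-scale supervision of Eq. (9) reduces to a weighted sum of per-level mean squared errors; a minimal sketch is shown below with the per-level weights quoted above.

```python
import torch

def pyramid_sdf_loss(pred_by_level, target_by_level, weights=(4.0, 1.0, 0.1)):
    """Eq. (9) as a weighted sum of per-level L2 losses between predicted and
    ground-truth SDF values; inputs are lists of 1-D tensors, one per pyramid level."""
    loss = 0.0
    for lam, pred, target in zip(weights, pred_by_level, target_by_level):
        loss = loss + lam * torch.mean((pred - target) ** 2)
    return loss
```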

4.3 Simpler Variants of the Pyramidal Approach

Based on the previous framework, various training variants can then be devised by ablating the 2D+t convolutional pyramid training or the supervision of all pyramid levels versus only limiting the loss to the finer level, as shown in Fig. 2. – Static pyramidal variant. We use a simple training variant, by considering single input frames K = 1 and consequently replacing 3D (2D+t) convolutions by 2D convolutions, in effect keeping the aggregation from each single feature map Z 0 , Z 1 , and Z 2 decoded from the input feature pyramid to the corresponding F 0 , F 1 , and F 2 . The pyramid is trained including training samples for all pyramid levels F l in the loss L from (9). The static pyramidal baseline is illustrated in Fig. 2(b). – The temporal naive variant keeps the (2D+t) convolutions for a temporal frame group t ∈ {1, . . . , K} but only supervises the finest pyramid level. – The temporal pyramidal variant includes the loss terms for all 3 pyramid levels, and uses unmatched, untracked randomized sample training points in each frame of the processed temporal frame group. The temporal naive and temporal pyramidal variants are illustrated respectively in Fig. 2(a) and (c). 4.4

Full Spatio-Temporal Pyramidal Training

While the previous pyramidal variants account for spatial hierarchical aspects in training, no component in the previous training schemes accounts for hierarchical temporal aggregation or weak surface motion priors in the training losses. One would expect such terms to help the training balance global spatio-temporal aspects against local spatio-temporal details of the underlying surface in motion. Here we propose a coarse-to-fine spatio-temporal training supervision for K = 4 time frames, extending the previous temporal pyramidal variant. In this training strategy we add gradual pooling of the queried features from the feature maps F^l_t to provide 4 supervision signals at the finest level with map F^0, 2 supervision signals at the mid-level F^1 from frame groups {1, 2} and {3, 4}, and one signal global to the sequence for the coarsest level F^2, averaging the feature over all 4 frames. At training time, for a point p = (x, y, z) among the uniform group of training points, this can be done simply by retrieving the four features in the four temporal maps corresponding to point p for the coarse map F^2, and the two averages by retrieving the two features from temporal frames {1, 2} and {3, 4} respectively. To perform this in a meaningful way for points on and in the vicinity of the surface, however, the features pooled temporally should be aggregated along a coherent point trajectory in the temporal sequence. For this we leverage training datasets for which a temporal template fitting is provided (e.g. with SMPL [25]) to work with temporally registered vertices. Intuitively, this allows the proposed network to account for a weak prior on underlying point trajectory coherence when producing estimated SDF sequences at inference time. To formalize this sampling strategy, we first denote v^t_i the 3D position of the i-th vertex of the temporally registered model in frame t. Then, similarly to


what is done in Sect. 4.2, each of the query points p^1_i generated in the vicinity of the surface in the first frame is obtained by choosing a point on a 26-point 3D grid centered around a vertex of index i, located at v^1_i on the fitted ground-truth model in the first frame. However, instead of sampling the query point independently for each of the 4 frames in the temporal window, we reuse the same vertex index and the same offset w.r.t. that vertex throughout the 4 frames to obtain query points (p^t_i)^4_{t=1} along a trajectory that follows the body motion, i.e. p^t_i = v^t_i + p^1_i − v^1_i. We can then formalize the averaging of the features extracted across several frames in a set S using the point trajectories (p^t_i)_{t∈S} as:

f̄^l_S = (1/|S|) Σ_{t∈S} f^l_t(x^t_i, y^t_i),   (10)

with p^t_i = (x^t_i, y^t_i, z^t_i). As shown in Fig. 2(d), for each point sample we use the features f̄^2_{1,2,3,4} to compute a low-level loss and the features f̄^1_{1,2} and f̄^1_{3,4} to compute two mid-level losses. For each of these pooled features, we use the mean of the point's ground-truth SDFs as the supervision signal after MLP decoding.

Experiments

Fig. 3. Shape completion on CAPE. (a) Front-view partial scan. Completion with: (b) our static pyramidal method, (c) our spatio-temporal method.

We provide quantitative and qualitative comparisons on real-world scan data. In particular, we discuss the results of the different training strategies introduced in Sect. 4. We also show numerical comparisons, on raw scan data, between our method and both learning-based and model-based methods.

5.1 Dataset

Focusing on dynamic human shape completion, we collect data from the CAPE [27] dataset for dressed humans, with different clothing styles per character, and from the DFAUST [9] dataset for undressed humans, as in competing methods [34,51]. Both are captured at 60 fps and provide temporally coherent mesh sequences alongside the raw scan data. We render depth images at resolution 512² from the raw


Table 1. Pyramidal approach variants. Spatial completion with IoU and Chamfer-L1 distances (×10⁻¹) in the real 3D space.

Method | CAPE: IoU ↑ / Chamfer-L1 ↓ | DFAUST: IoU ↑ / Chamfer-L1 ↓
(i) Naive temporal (a) | 0.777 / 0.174 | 0.858 / 0.115
(ii) Static pyramidal (b) | 0.788 / 0.172 | 0.871 / 0.112
(iii) Temporal pyramidal (c) | 0.808 / 0.182 | 0.873 / 0.118
(iv) Spatio-temporal pyramidal (d) | 0.839 / 0.161 | 0.898 / 0.103
(v) Spatio-temporal occupancy | 0.800 / 0.168 | 0.872 / 0.109

scans in order to preserve the measurement noise. We also pre-compute pseudo ground-truth signed distances using the mesh models to avoid topological artifacts present in the raw data. Meshes in these datasets are actually obtained by fitting the SMPL [25] model to the raw scan data. However, thanks to its local reasoning, our network captures the part-wise space spanned by the local geometric patterns of the fitted models and is agnostic to the associated shape-wise parametric space, here SMPL.

5.2 Training Protocol

We follow the training protocol of [51]. We take 2 male and 2 female characters from each dataset, 8 characters in total. Each character performs 3 or 4 motion styles, for a total of 28 motion sequences. Six 4-frame sub-sequences are extracted from each motion sequence. We consider 4 consecutive frames as this experimentally proved to be a good trade-off between quality and computational cost. In addition, the 4-frame training sub-sequences are built with two durations, short and long. Note that we train only one model for the characters of the two datasets and for both short and long sub-sequences.

5.3 Ablation and Variants Comparison

We validate our method by testing the different variants of Sect. 4 on 152 frames from newly released CAPE data and 100 frames of two unseen identities from DFAUST in Table 1. Our evaluation metrics are Intersection over Union (IoU) and the Chamfer-L1 distance. These results show that the spatio-temporal pyramidal training strategy of Sect. 4.4, illustrated in Fig. 2(d), yields the best results. In Table 1, from row (i) to row (iii), pyramidal training is more effective than naive training. In row (iv), we leverage the coarse-to-fine strategy not only spatially but also temporally and obtain an improvement from row (ii) to row (iv), especially for the more difficult case of clothed humans with the CAPE dataset. Comparing rows (iv) and (v) highlights the effectiveness of our neural signed distance representation with respect to occupancy. In order to better illustrate the contribution of the temporal information, Fig. 3 shows how the spatio-temporal pyramidal model can correct artefacts still present with the static pyramidal model.


Fig. 4. Static shape completion; the top two identities are from CAPE and the bottom two from DFAUST. From left to right: (a) front-view partial scan, completion with (b) 3D-CODED [19], (c) IF-Net [12] and (d) our method. The maximum reconstruction-to-ground-truth Chamfer-L1 distance of the heatmaps is set to 2 cm.

5.4 Learning-Based Method Comparisons

Baselines. We first compare our method with learning-based baselines in two categories: (1) static methods such as 3D-CODED [19], ONet [30], BPS [40] and IF-Net [12], and (2) dynamic methods such as OFlow [32] and STIF [51]. 3D-CODED and BPS are point-based methods which take a point cloud as input and output a template-aligned mesh. ONet, IF-Net, OFlow and STIF are implicit function learning methods like ours. IF-Net relies on 3D convolutions, so it pre-processes the partial observation into a voxel grid, while STIF takes depth images as input like ours. To fairly compare with IF-Net and STIF, we use the

Table 2. Comparison with learning-based methods with IoU and Chamfer-L1 distances (×10⁻¹) in the real 3D space on the benchmark of [51].

Method | CAPE: IoU ↑ / Chamfer-L1 ↓ | DFAUST: IoU ↑ / Chamfer-L1 ↓
3D-CODED [19] | 0.455 / 0.591 | 0.578 / 0.347
ONet [30] | 0.488 / 0.476 | 0.604 / 0.340
IF-Net [12] | 0.853 / 0.121 | 0.876 / 0.107
BPS [40] | – / – | 0.761 / 0.197
OFlow [32] | – / – | 0.740 / 0.231
STIF [51] | 0.834 / 0.113 | 0.865 / 0.104
Ours | 0.880 / 0.100 | 0.914 / 0.092


same input resolution. The final mesh surfaces are extracted from a 256³ grid with marching cubes [22,26].

Results. We follow the evaluation protocol of STIF [51], evaluating on 9 characters, including 2 unseen ones w.r.t. the training data, with Intersection over Union (IoU) and Chamfer-L1 distance. In Table 2, we improve both IoU and Chamfer-L1 distance on both datasets. We would like to highlight the benefit from two aspects: the dynamic modeling and the pyramidal learning strategy. In Fig. 4, our method largely improves the qualitative results over the static methods, especially for detail preservation. Our pyramidal learning strategy also retains more detail than the dynamic state of the art [51], e.g., wrinkles and face details, and it can correct some artifacts, e.g., the hand and leg in Fig. 5.

5.5 Model-Based Method Comparisons

Baselines. We compare with the model-based methods OpenPose [10]+SMPL [25], IP-Net [7] and NPMs [34]. Such methods require a latent-space optimization process. In more detail, IP-Net extracts a surface from the partial observation with the learning-based method of [12], then optimizes the SMPL+D [1,21] model parameters to fit that surface. NPMs learns a neural parametric model from real-world and synthetic data and optimizes its parameters to fit the partial observation during inference. Note that IP-Net [7] and NPMs [34] train on 45000 frames from not only CAPE but also the synthetic data of DeformingThings4D [49] and the motion capture data of AMASS [28], while we train on 1344 frames from the real scan data of CAPE and DFAUST only.

Results. We follow the evaluation protocol of NPMs [34] on 4 identities in 2 clothing styles with IoU and Chamfer-L2 distance. In Table 3, our method improves the numeric result on IoU but not on the Chamfer-L2 loss. One may note that since our method is devoid of a parametric or latent control space, it has more freedom to reconstruct details, see Fig. 5. NPMs improves over a first inference phase by using a latent-space optimization which specifically minimizes an L2 surface loss. Thus, NPMs is slightly better than ours on the Chamfer-L2 metric, but at the cost of increased inference time. While lacking such a stage, the results produced by our method remain generally competitive with these optimization-based methods.

Table 3. Comparison with model-based methods with IoU and Chamfer-L2 distances (×10⁻³) for CAPE in the normalized space on the benchmark of [34]. Baseline values are reused from the table of [34].

Method | IoU ↑ | Chamfer-L2 ↓
OpenPose+SMPL | 0.68 | 0.243
IP-Net [7] | 0.82 | 0.034
NPMs [34] | 0.83 | 0.022
Ours | 0.89 | 0.029


Fig. 5. Motion completion on the CAPE (left) and DFAUST (right) datasets. From left to right: (a) front-view partial scan, completion with (b1) NPMs [34], (b2) OFlow [32], (c) STIF [51] and (d) our method. The maximum reconstruction-to-ground-truth Chamfer-L1 distance of the heatmaps is set to 2 cm.

6 Conclusion

We propose in this paper a complete framework for the coarse-to-fine treatment and training of implicit shape completion from partial depth maps. We demonstrate the substantial quality improvement that can be obtained by using an architecture built on a residual pyramid of features in the context of implicit surface regression, together with a corresponding tailored training strategy which distributes losses at each scale of the feature map, both spatially and temporally. The proposed architecture benefits from temporal consistency and makes it possible to preserve the high-frequency details of the surface. This result suggests the applicability of such implicit pyramid schemes to other problems in the realm of 3D vision and reconstruction.

Acknowledgements. This work has been partially supported by MIAI@Grenoble Alpes (ANR-19-P3IA-0003).


References 1. Alldieck, T., Magnor, M., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 2. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Detailed human avatars from monocular video. In: International Conference on 3D Vision, pp. 98– 109 (2018). https://doi.org/10.1109/3DV.2018.00022 3. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3d people models. In: IEEE Conference on Computer Vision and Pattern Recognition (2018) 4. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: Scape: Shape completion and animation of people. ACM Trans. Graph. 24(3), 408–416 (2005) 5. Atzmon, M., Lipman, Y.: Sal: Sign agnostic learning of shapes from raw data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020) 6. Atzmon, M., Lipman, Y.: Sald: Sign agnostic learning with derivatives. In: International Conference on Learning Representations (2021). https://openreview.net/ forum?id=7EDgLu9reQD 7. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining implicit function learning and parametric models for 3D human reconstruction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 311–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058536-5_19 8. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Loopreg: Selfsupervised learning of implicit surface correspondences, pose and shape for 3d human mesh registration. In: Advances in Neural Information Processing Systems (NeurIPS) (2020) 9. Bogo, F., Romero, J., Pons-Moll, G., Black, M.J.: Dynamic faust: Registering human bodies in motion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6233–6242 (2017) 10. Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., Sheikh, Y.A.: Openpose: Realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. (2019) 11. Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019) 12. Chibane, J., Alldieck, T., Pons-Moll, G.: Implicit functions in feature space for 3d shape reconstruction and completion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2020) 13. Chibane, J., Mir, A., Pons-Moll, G.: Neural unsigned distance fields for implicit function learning. In: Advances in Neural Information Processing Systems (NeurIPS) (2020) 14. Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014) 15. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)

Pyramidal Spatio-Temporal Human Shape Completion

375

16. Deng, B., et al.: NASA neural articulated shape aproximation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 612–628. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_36 17. Dou, M., et al.: Fusion4d: Real-time performance capture of challenging scenes. ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH 2016 35 (2016). https://www.microsoft.com/en-us/research/publication/fusion4d-realtime-performance-capture-challenging-scenes-2/ 18. Gropp, A., Yariv, L., Haim, N., Atzmon, M., Lipman, Y.: Implicit geometric regularization for learning shapes. In: Proceedings of Machine Learning and Systems 2020, pp. 3569–3579 (2020) 19. Groueix, T., Fisher, M., Kim, V.G., Russell, B., Aubry, M.: 3d-coded : 3d correspondences by deep deformation. In: Proceedings of the European Conference on Computer Vision, pp. 235–251 (2018) 20. Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: Arch: Animatable reconstruction of clothed humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3093–3102 (2020) 21. Lazova, V., Insafutdinov, E., Pons-Moll, G.: 360-degree textures of people in clothing from a single image. In: International Conference on 3D Vision (3DV) (2019) 22. Lewiner, T., Lopes, H., Vieira, A.W., Tavares, G.: Efficient implementation of marching cubes’ cases with topological guarantees. J. Graph. Tools 8(2), 1–15 (2003) 23. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) 24. Liu, S., Saito, S., Chen, W., Li, H.: Learning to infer implicit surfaces without 3d supervision (2019) 25. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: A skinned multi-person linear model. ACM Trans. Graph. 34(6), 248:1-248:16 (2015) 26. Lorensen, W.E., Cline, H.E.: Marching cubes: A high resolution 3d surface construction algorithm. ACM Siggraph Comput. Graph. 21(4), 163–169 (1987) 27. Ma, Q., et al.: Learning to dress 3d people in generative clothing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020) 28. Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: Archive of motion capture as surface shapes. In: International Conference on Computer Vision, pp. 5442–5451 (2019) 29. von Marcard, T., Henschel, R., Black, M., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3d human pose in the wild using imus and a moving camera. In: Proceedings of the European Conference on Computer Vision (2018) 30. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: Learning 3d reconstruction in function space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019) 31. Newcombe, R.A., Fox, D., Seitz, S.M.: Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 343–352 (2015) 32. Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Occupancy flow: 4d reconstruction by learning particle dynamics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2019) 33. Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
3504–3515 (2020)

376

B. Zhou et al.

34. Palafox, P., Božič, A., Thies, J., Nießner, M., Dai, A.: Npms: Neural parametric models for 3d deformable shapes. In: Proceedings of the International Conference on Computer Vision (2021) 35. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: Learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165– 174 (2019) 36. Pavlakos, G., et al.: Expressive body capture: 3d hands, face, and body from a single image. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 37. Peng, S., Niemeyer, M., Mescheder, L., Pollefeys, M., Geiger, A.: Convolutional occupancy networks. In: European Conference on Computer Vision (ECCV) (2020) 38. Pons-Moll, G., Pujades, S., Hu, S., Black, M.J.: Clothcap: Seamless 4d clothing capture and retargeting. ACM Trans. Graph. 36(4), 1–15 (2017) 39. Pons-Moll, G., Romero, J., Mahmood, N., Black, M.J.: Dyna: A Model of Dynamic Human Shape in Motion, vol. 34, pp. 120:1–120:14 (2015) 40. Prokudin, S., Lassner, C., Romero, J.: Efficient learning on point clouds with basis point sets. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4332–4341 (2019) 41. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28 42. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2304– 2314 (2019) 43. Saito, S., Simon, T., Saragih, J., Joo, H.: Pifuhd: Multi-level pixel-aligned implicit function for high-resolution 3d human digitization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020) 44. Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume (2018) 45. Sun, J., Xie, Y., Chen, L., Zhou, X., Bao, H.: NeuralRecon: Real-time coherent 3D reconstruction from monocular video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021) 46. Xu, H., Bazavan, E.G., Zanfir, A., Freeman, W.T., Sukthankar, R., Sminchisescu, C.: Ghum & ghuml: Generative 3d human shape and articulated pose models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6184–6193 (2020) 47. Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: Disn: Deep implicit surface network for high-quality single-view 3d reconstruction. Adv. Neural Inf. Process. Syst. 32, 492–502 (2019) 48. Yang, J., Mao, W., Alvarez, J.M., Liu, M.: Cost volume pyramid based depth inference for multi-view stereo. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 49. Li, Y., Hikari Takehara, T.T.B.Z., Nießner, M.: 4D complete: Non-rigid motion estimation beyond the observable surface (2021) 50. Yu, T., Zheng, Z., Guo, K., Liu, P., Dai, Q., Liu, Y.: Function4d: Real-time human volumetric capture from very sparse consumer RGBD sensors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021)

Pyramidal Spatio-Temporal Human Shape Completion

377

51. Zhou, B., Franco, J.S., Bogo, F., Boyer, E.: Spatio-temporal human shape completion with implicit function networks. In: Proceedings of the International Conference on 3D Vision (2021) 52. Zhou, B., Franco, J.S., Bogo, F., Tekin, B., Boyer, E.: Reconstructing human body mesh from point clouds by adversarial GP network. In: Proceedings of the Asian Conference on Computer Vision (ACCV) (2020)

Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image

Alakh Aggarwal1, Jikai Wang1, Steven Hogue1, Saifeng Ni2, Madhukar Budagavi2, and Xiaohu Guo1(B)

1 The University of Texas at Dallas, Richardson, TX, USA
{alakh.aggarwal,jikai.wang,ditzley,xguo}@utdallas.edu
2 Samsung Research America, Plano, TX, USA
{saifeng.ni,m.budagavi}@samsung.com

A. Aggarwal, J. Wang, S. Hogue and X. Guo are partially supported by the National Science Foundation (OAC-2007661). Supplementary Information: the online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_23.

Abstract. Recent research works have focused on generating human models and garments from their 2D images. However, state-of-the-art research focuses either on only a single layer of garment on a human model or on generating multiple garment layers without any guarantee of an intersection-free geometric relationship between them. In reality, people wear multiple layers of garments in their daily life, where an inner layer of garment could be partially covered by an outer one. In this paper, we try to address this multi-layer modeling problem and propose the Layered-Garment Net (LGN), which is capable of generating intersection-free multiple layers of garments defined by implicit function fields over the body surface, given the person's near front-view image. With a special design of garment indication fields (GIF), we can enforce an implicit covering relationship between the signed distance fields (SDF) of different layers to avoid self-intersections among different garment surfaces and the human body. Experiments demonstrate the strength of our proposed LGN framework in generating multi-layer garments as compared to state-of-the-art methods. To the best of our knowledge, LGN is the first research work to generate intersection-free multiple layers of garments on the human body from a single image.

Keywords: Image-based reconstruction · Multi-layered garments · Neural implicit functions · Intersection-free

1 Introduction

Extracting 3D garments from visual data such as images enables the generation of digital wardrobe datasets for the clothing and fashion industry, and is useful in
Virtual Try-On applications. Restricted to certain classes of garments, it is already possible to generate explicit upper and lower garment meshes from a single image or multi-view images [1,2], to introduce different styles to the garments, such as length, along with varying poses and shapes [3–5], and to transfer garments from one subject to another [1]. However, to the best of our knowledge, none of the existing approaches is capable of generating multiple intersection-free layers of clothing on a base human model, where an inner layer of garment could be partially covered by an outer one without any intersection or protrusion. This limitation does not conform to reality, since people wear multiple layers of garments in their daily life. The existing techniques either generate a single layer of upper-body cloth (e.g., T-shirt, jacket, etc.) and a single layer of lower-body cloth (e.g., pants, shorts, etc.) without any overlap in their covering regions [1,2], or generate multiple garment layers without any guarantee on their intersection-free geometry [5].

The fundamental challenge here is to keep multiple garment layers intersection-free where they overlap. Existing approaches to garment representation are based on explicit models, using either displacement fields over the SMPL surface (SMPL+D) [1,6] or skinned meshes on top of SMPL [2]. However, with explicit mesh representations it is very difficult to ensure an intersection-free configuration of multiple garment layers. SMPLicit [5] is an implicit approach that generates multiple layers of garments, but it does not handle intersections among those layers. In this paper, we propose to use a set of implicit functions – signed distance fields (SDF) – to represent different layers of garments. The benefit is that the intersection-free condition can be easily enforced by requiring the SDF of the inner layer to be greater than the SDF of the outer one. We call this the Implicit Covering Relationship (Sect. 3.1) for modeling multi-layer garments.

There are two challenges associated with such an implicit representation of garments and with the enforcement of the implicit covering relationship: (1) most garments are open surfaces with boundaries, while an SDF can represent closed surfaces only; (2) the implicit covering relationship should only be enforced in those regions where two layers overlap, but how can we define such overlapping regions? In this paper, we solve these two challenges by proposing an implicit function called the Garment Indication Field (GIF, Sect. 3.2), which identifies those regions where the garment has "holes" – the open regions that the garment does not cover. With such garment indication fields, we can not only enforce the implicit covering relationships between layers but also extract the open meshes of garments by trimming the closed marching cubes surfaces.

We propose a Layered-Garment Net (LGN), which consists of a parallel SDF subsystem and GIF subsystem, that takes an image of the person as input and outputs the corresponding SDF and GIF for each garment layer. Specifically, based on the projection of a query point into image space, we obtain its local image features from the encoded features given by a fully convolutional encoder. Using the local image features and other spatial features of the query point, we train different decoder networks for different layers of garments to predict their SDF and GIF, respectively.
The network is trained end-to-end, utilizing a covering inconsistency loss given by GIFs and SDFs of different layers, along with other loss functions to regress the predictions to the ground-truth values. The contributions of this paper can be summarized as follows:

– We present a Layered-Garment Net, the first method that can model and generate multiple intersection-free layers of garments and the human body, from a single image.
– We enforce an implicit covering relationship among different layers of garments by using multiple signed distance fields to represent different layers, which guarantees that multiple layers of garments are intersection-free on their overlapping regions.
– We design garment indication fields that can be used to identify the open regions where the garments do not cover, which can be used to identify the overlapping regions between different layers of garments, as well as to extract open meshes of garments out of the closed surfaces defined by SDF.

2 Related Works

In this section, we review recent works in two areas of research related to our work: Full Human Body Reconstruction, where the focus is on generating a good-quality clothed human model, and Individual Garment Surface Reconstruction, where the focus is on obtaining individual garments for a human model.

Full Human Body Reconstruction. Many recent works generate explicit representations of the human body mesh using parametric models of the naked human body to handle varying geometry [7–9]. This allows them to modify the shape and pose of the generated model according to shape parameters β and pose parameters θ. The underlying idea is to obtain the parameters β and θ that closely define the target human body, and apply linear blend skinning using the blend shapes and blending weights to generate the final human body geometry. Bogo et al. [10] obtained these parameters and fitted a human body model from single unconstrained images. Many deep learning-based methods [11,12] have since been proposed to estimate the shape and pose parameters of a human model. Smith et al. [13] employed silhouettes from different viewpoints to generate the human body. Subsequently, some research works [14,15] also used semantic segmentation of human parts to ensure more accurate parameter estimation. However, the above-mentioned works only generate naked human models and do not reconstruct clothed human models.

To address this issue, several recent research works have focused on displacements over a naked body. Alldieck et al. [6] used frames at continuous intervals from a video of a subject rotating in front of the camera to ensure accurate parameter estimation from different viewpoints and used SMPL+D for clothed human body reconstruction. Such an SMPL+D representation uses a displacement vector for the vertices of the naked human body model to represent clothing details and was later used for single-image reconstruction [16]. Tex2Shape [17] was able to obtain better displacement details by predicting the displacement map for a
model that aligns with the texture map of the model. Several recent works [18] generate explicit dynamic human models. However, since all the above methods are based only on a naked human body model, they cannot generate a human body wearing complex garments like skirts, dresses, long hair, etc. To address these issues, some research works [19,20] used a volumetric representation of the human body with voxelized grids. Ma et al. [21] obtain point clouds of clothed humans with varying garment topology. Some recent works [22,23] also focus on generative approaches for 3D clothed human reconstruction.

There have also been recent works focusing on implicit representations of the clothed body surface. Mescheder et al. [24] used an occupancy field to determine whether a point is inside or outside the surface of an object from the ShapeNet [25] dataset, and trained a classifier to generate a surface dividing the 3D space into inside/outside occupancy values. They calculated occupancy values for each point of a voxel grid and used marching cubes [26] to generate the surface, without having to store a voxel-grid representation or any other mesh information for each data instance. Different from the occupancy field, Chibane et al. [27] predicted an unsigned distance field with a neural network and projected points back onto the surface along the gradient of the distance field to generate a point-cloud-based surface, which can further be turned into a complete mesh. Several recent works [28–30] predicted a signed distance field and used marching cubes [26] to generate a mesh surface; this yields more accurate geometry because of the implicit field's dependence on distance. Based on these works on implicit fields, PIFu [31], PIFuHD [32], StereoPIFu [33], GeoPIFu [34] and PaMIR [35] take a 2D image or depth data of a human as input and, after extracting local encoded image features for a point, predict the occupancy field of the dressed body. MetaAvatar [36] represents cloth-specific neural SDFs for clothed human body reconstruction. Other recent works [37–39] aim to dynamically handle the reconstruction of animatable clothed human models via implicit representations. Several other works [5,40–42] also use implicit fields for 3D human reconstruction. Bhatnagar et al. [43] combine the explicitly defined SMPL model with implicit learning to register scans and point clouds; the method identifies the region between garment and body, but it does not reconstruct different garment layers, and handling individual garment regions, as our Garment Indication Field does, is more involved. Scanimate [42] reconstructs a dynamic human model and utilizes an implicit field for fine-tuning the reconstruction; instead of full supervision, it uses Implicit Geometric Regularization [44] to reconstruct surfaces with an implicit SDF in a semi-supervised way.

Individual Garment Surface Reconstruction. Instead of simply generating a human body model with displacements, Multi-Garment Net (MGN) [1] generates an explicit representation of parametric garment models with SMPL+D. Using single or multiple images, it predicts different upper and lower garments that are parameterized for varying shapes and poses. However, MGN cannot produce garments that do not comply with naked human models, like skirts and dresses. TailorNet [3] uses the wardrobe dataset from MGN and applies different style transforms, such as sleeve length, to obtain different styles of garments. DeepCloth [4]
enables deep-learning-based styling of garments. SIZER [45] provides a dataset enabling resizing of a garment on the human body. Deep Fashion3D [46] contributes a wardrobe dataset consisting of complex garment shapes like skirts and dresses. BCNet [2] uses a deformable mesh on top of SMPL to represent garments and proposes a skinning-weight generating network for the garments to support garments with different topologies. SMPLicit [5] obtains shape and style features for each garment layer from the image, uses these parameters to obtain multiple layers of garments, and applies a distance threshold to reconstruct overlapping garment layers; however, it does not guarantee an intersection-free reconstruction. GarmentNets [47] reconstructs dynamic garments utilizing the Generalized Winding Number [48] for occupancy and for correct trimming of openings in garment meshes, but its approach does not provide a garment indicator field.

To the best of our knowledge, none of the existing works can generate overlapping, intersection-free multiple layers of garments where an inner layer could be partially covered by an outer one. All existing works on individual garment generation [1–3,46] use explicit mesh representations, which makes it difficult to ensure intersection-free results between different layers. SMPLicit [5] does not guarantee intersection-free reconstruction among different layers, especially in overlapping regions. In this paper, we resort to an implicit representation and model the multiple layers of garments with signed distance fields (SDF), which makes it easy to enforce the implicit covering relationship among different garment surface layers with the help of a carefully designed implicit garment indication field (GIF). The combination of these two implicit functions, SDF and GIF, makes the modeling and learning of multi-layer intersection-free garments possible.

3 The Method

Given a near-front-facing image of a posed human, we aim to generate the different intersection-free garment surface layers. The reconstructed surfaces should follow a covering relationship between each other and the body. Our proposed Layered-Garment Net (LGN) can generate implicit functions of the Signed Distance Field (SDF) and Garment Indication Field (GIF) for different layers of garments over varying shapes and poses. An overview of our approach is given in Fig. 1.

Fig. 1. Given an input image, our method first obtains the PGN [49] segmentation and incomplete masks for each garment from the segmentation. Then the complete masks of garments are generated by the Pix2Pix-HD Garment Mask generator. Similarly, the indicator masks for garments are generated by the Pix2Pix-HD Indicator Mask generator. Using the masked input image, the front and back normals are obtained using the Pix2Pix-HD Normal Subnetwork. Then an Encoder and Decoders of the LGN network use the masked images and normals and predict the SDF value s_i(p) of layer i for any point p in 3D space. LGN-GIF further uses the Indicator Masks ind_i to obtain the GIF value h_i(p) of layer i for any point p in 3D space. Finally, LGN is fine-tuned with the covering loss in Eq. (2) to avoid intersections among different layers.
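As a reading aid, the following minimal Python sketch mirrors the per-layer data flow of Fig. 1. Every callable passed in (mask generators, normal network, encoder, decoders) is a placeholder standing in for the trained sub-networks, and the channel-first array layout is our own assumption rather than the authors' implementation.

```python
import numpy as np

def lgn_layers(image, segment, mask_gen, ind_gen, normal_net,
               encoder, sdf_dec, gif_dec, grid):
    """Schematic per-layer inference flow of Fig. 1 (placeholders only).

    Returns, for every garment/body layer i, SDF and GIF values on a 3D query grid."""
    layers = []
    for i in range(len(sdf_dec)):
        g_mask = mask_gen[i](segment)          # completed garment mask g_i
        ind_mask = ind_gen[i](segment)         # indicator mask ind_i (unused for the body layer)
        normals = normal_net(image * g_mask)   # masked front/back normals
        # channel-first arrays assumed; the encoder consumes masked image + normals
        feat = encoder(np.concatenate([image * g_mask, normals], axis=0))
        sdf = sdf_dec[i](feat, grid)           # s_i(p) on the query grid
        gif = gif_dec[i](feat, ind_mask, grid) # h_i(p) on the query grid
        layers.append((sdf, gif))
    return layers
```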

3.1 Implicit Covering Relationship

For two layers of garments i and j, let layer i be partially covered by layer j. If a point p belongs to their overlapping regions, the SDF values s_i(p) and s_j(p) for the two layers should follow the covering relationship:

s_j(p) < s_i(p).   (1)

This is illustrated in Fig. 2(left). The inequality does not hold for all the points in 3D space but only holds for the overlapping region between the two layers. We are only interested in the points near the surface of the garment layer. Let us consider an example where layer i is a pant and layer j is a shirt. The inequality
Eq. (1) should not be satisfied in the leg region of the human body, otherwise, this would result in the generation of a shirt layer on top of the pant layer in the leg region, where the shirt originally does not exist. This problem is shown in Fig. 2(right). Hence, we need an indicator function for both layers, and only ensure that the implicit covering relationship Eq. (1) holds on points that are related to both layers i and j. We call this indicator function for layer i the Garment Indication Field (GIF), and denote it as h_i(p) (Sect. 3.2). To ensure the network's SDF predictions follow the Implicit Covering Relationship inequality for relevant points p, we can define the covering loss for all layers of surfaces for our network as follows:

L_{cov}(p) = \sum_{j=1}^{N} \sum_{i \in C(j)} h_j(p)\, h_i(p) \left[ \max(s_j(p) - s_i(p), 0) + \lambda (s_j(p) - s_i(p))^2 \right],   (2)

where C(j) is the set of layers partially covered by layer j. The multiplication with hj (p) and hi (p) guarantees that the covering loss only applies to the points in the overlapping region between two layers. The last term regularizes the difference between the two SDFs. We choose λ = 0.2 in all our experiments.
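To make Eq. (2) concrete, here is a minimal PyTorch-style sketch of the covering loss, assuming per-layer SDF and GIF predictions at the sampled points have already been stacked into tensors. The tensor layout and the `covered_by` structure are illustrative assumptions, not the authors' code.

```python
import torch

def covering_loss(sdf, gif, covered_by, lam=0.2):
    """Covering loss of Eq. (2).

    sdf: (L, P) tensor, sdf[i, k] = s_i(p_k) for layer i at sample point p_k.
    gif: (L, P) tensor of garment indication values h_i(p_k) in {0, 1}.
    covered_by: dict mapping an outer layer j to the list C(j) of layers it partially covers.
    lam: weight of the squared regularization term (0.2 in the paper).
    """
    loss = sdf.new_zeros(())
    for j, inner_layers in covered_by.items():
        for i in inner_layers:
            diff = sdf[j] - sdf[i]              # s_j(p) - s_i(p), should be negative
            overlap = gif[j] * gif[i]           # only where both layers are indicated
            loss = loss + (overlap * (torch.clamp(diff, min=0.0)
                                      + lam * diff ** 2)).sum()
    return loss
```

For example, with layers ordered as body (0), shirt (1) and coat (2), `covered_by = {1: [0], 2: [1, 0]}` would penalize violations wherever the indicated regions of the corresponding layers overlap.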


Fig. 2. (Left) For a point p associated with two layers of surfaces i and j, where layer i is partially covered by layer j, it should satisfy s_i(p) > s_j(p) in their overlapping region where h_i(p) > 0 and h_j(p) > 0. For other regions, this relationship may not hold. (Right) Garment Indication Fields (GIF) of an inner layer pant and an outer layer shirt are used to constrain the covering relationship only in their overlapping region. Without GIF, the outer layer would completely cover the inner layer since s_i(p) > s_j(p) would be enforced everywhere.

3.2 Garment Indication Field

For a garment and a query point p, we use its generalized winding number [48], denoted as W(p), to distinguish the open regions from the regions concerned with garment surfaces. Since all the garments are open surfaces, W(p) is equal to 0.5 at the opening regions. W(p) > 0.5 for a point inside the surface, and keeps increasing as the point gets farther inside. Similarly, W(p) < 0.5 for a point outside the surface, and keeps decreasing as it goes further away from the open regions. In far-off regions and outside the surfaces, W(p) ≤ 0. Using different field functions of the winding number, we can make different observations, as shown in Fig. 3.

Fig. 3. For a given garment mesh (non-watertight), we show the cross-section views of the following fields: (a) Generalized Winding Number W, (b) Winding Occupancy W − 0.5, (c) "Hole" Region Indication W(W − 0.75), and (d) Garment Indication Field W(W − 0.75) − δ. The transition from (c) to (d) allows a concise bound for the GIF, which will not intersect with nearby body surfaces.

Observation 1: o(p) = W(p) − 0.5 gives the occupancy field for a garment. This is shown in Fig. 3(b). We call o(p) the winding occupancy. This helps us in obtaining the sign of the SDF for a non-watertight mesh. Since all garments, in particular, are non-watertight open meshes, for any query point p in 3D space, the distance d(p) to its nearest surface point is essentially an unsigned distance because there is no inside/outside for the open surface. Thus we use o(p) to obtain a watertight surface mesh with marching cubes first, then compute a ground-truth SDF s′(p) for the watertight garment surface.

Observation 2: h′(p) = W(p)(W(p) − w_h) gives an indication field of the garment opening region, where 0.5 < w_h < 1. As previously discussed, W(p) is greater than 0.5 inside the opening region and the surface mesh, and it keeps increasing inside the mesh. Similarly, W(p) is less than 0.5 outside the opening region and keeps decreasing away from the region outside the mesh. In this paper, we choose w_h = 0.75 for all garments. For any point that is inside the mesh and away from the garment opening region, W(p) > 0.75, so h′(p) is
positive. Similarly, if it is outside the mesh and away from the garment opening region, W(p) < 0, so h′(p) is positive too. However, for any point that is located close to the 0.5-level isosurface, 0 < W(p) < 0.75, so h′(p) is negative. In this way, h′(p) indicates the open region of the garment. This can be observed from Fig. 3(c). Furthermore, it also follows that h′(p) − δ, for some δ → 0+, gives a bound region of the garment closer to the mesh. This is shown in Fig. 3(d). We observe that, for δ = 0.01, we get a good-quality bound for this indication field. Thus we define the following function as the Garment Indication Field (GIF) for the garment surface:

\hat{h}(p) = \left( \mathrm{sign}\left[ W(p)\,(W(p) - 0.75) - \delta \right] + 1 \right) \cdot 0.5.   (3)

Here ĥ(p) = 1 means the point is in the region close to the garment surface, otherwise ĥ(p) = 0. Such ground-truth GIF values will be used for enforcing the covering relationship in Eq. (2).
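The ground-truth fields above can be computed directly from the generalized winding number. The following NumPy sketch evaluates W(p) with the per-triangle solid-angle formula underlying [48] and then derives the winding occupancy and the GIF of Eq. (3). The brute-force loop over query points is for clarity only; this is an illustration, not the authors' implementation.

```python
import numpy as np

def generalized_winding_number(verts, faces, query):
    """Generalized winding number W(p) of an (open) triangle mesh, evaluated
    at each query point by summing per-triangle solid angles."""
    W = np.zeros(len(query))
    tri = verts[faces]                                   # (T, 3, 3) triangle corners
    for k, p in enumerate(query):
        a, b, c = tri[:, 0] - p, tri[:, 1] - p, tri[:, 2] - p
        la = np.linalg.norm(a, axis=1)
        lb = np.linalg.norm(b, axis=1)
        lc = np.linalg.norm(c, axis=1)
        det = np.einsum('ij,ij->i', a, np.cross(b, c))   # scalar triple product
        denom = (la * lb * lc
                 + np.einsum('ij,ij->i', a, b) * lc
                 + np.einsum('ij,ij->i', b, c) * la
                 + np.einsum('ij,ij->i', c, a) * lb)
        W[k] = np.sum(2.0 * np.arctan2(det, denom)) / (4.0 * np.pi)
    return W

def garment_indication_field(verts, faces, query, w_h=0.75, delta=0.01):
    """GIF of Eq. (3): 1 close to the garment surface, 0 in/near its opening regions."""
    W = generalized_winding_number(verts, faces, query)
    occupancy = W - 0.5                                  # Observation 1: winding occupancy o(p)
    h_prime = W * (W - w_h)                              # Observation 2: opening indicator h'(p)
    gif = (np.sign(h_prime - delta) + 1.0) * 0.5         # Eq. (3)
    return gif, occupancy
```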

3.3 Layered-Garment Net

Given an input image I of a person, we first obtain the garment segmentation P on the image using the Part-Grouping Network [49]. It is possible to obtain different garment masks g_i for garment layer i using the corresponding pixel color. However, the mask g_i may not be complete because of overlap with outer layers. Hence, we train a Garment Mask generator that takes the incomplete mask g_i and the PGN segmentation P as input and outputs a corrected garment mask g_i. We follow a pipeline similar to PIFu-HD [32]; however, we require no fine-level network, only a coarse network for each garment layer. We also use semi-supervised Implicit Geometric Regularization (IGR) [44] for fine-tuning the SDF prediction on the surface. For layer i, we use mask g_i on the input image
I and, using a Normal Subnetwork masked with g_i, we obtain the front and back normals [32]. Let us call the concatenation of the masked input image and the masked front and back normals for layer i N_i. LGN consists of a common SDF Encoder for all layers, which gives a feature encoding of N_i as F_i. For a given point p in 3D space, we obtain a local pixel feature by orthogonal projection π(p) of p onto F_i and bilinear interpolation. For the point p, we also obtain spatial features such as depth. Using the spatial features and the local pixel-aligned features, the layer's SDF Decoder S_i(·) predicts the SDF s_i(p). Similarly, to identify whether point p lies in the garment region, we also obtain the Indicator Mask ind_i of layer i from the PGN segmentation P and the incomplete mask g_i by training an Indicator Mask generator. Then we train a common GIF Encoder that gives an encoding F_i and the layer's GIF Decoder H_i(·) to obtain the GIF value of layer i as h_i(p). Mask generators for each garment follow the same architecture as the front and back Normal Subnetworks in [32], i.e., the Pix2Pix-HD network [50]. A common SDF Encoder and front and back Normal Subnetworks are shared among all garment layers and the body layer. A common GIF Encoder is defined for all garment layers. However, we separately define SDF Decoders, GIF Decoders, Garment Mask generators, and Indicator Mask generators for each garment category (shirt, pant, coat, skirt, dress).

s_i(p) = S_i(F_i(\pi(p)), \phi(p)), \quad h_i(p) = H_i(F_i(\pi(p)), \phi(p)),   (4)

where the spatial feature φ(p) here is depth. The L1 loss for the generated SDF is formulated as the following L1 norm:

L_{sdf}(p) = \sum_{i=0}^{N} |s_i(p) - \hat{s}_i(p)|,   (5)

where ŝ_i(p) is the ground-truth SDF value for point p from layer i, and N is the number of garment layers. Similarly, the L1 loss for the predicted GIF is formulated as follows:

L_{gif}(p) = \sum_{i=1}^{N} |h_i(p) - \hat{h}_i(p)|,   (6)

where ĥ_i(p) is the ground-truth GIF for the garment layer i and query point p. We fine-tune the network parameters for SDF prediction using Implicit Geometric Regularization (IGR) [44]. The IGR loss is given as follows:

L_{igr}(p) = \tau(p)\, X(p) + \lambda \left( \lVert \nabla_p s_i(p) \rVert - 1 \right)^2, \quad X(p) = |s_i(p)| + \lVert \nabla_p s_i(p) - n_p \rVert,   (7)

where s_i(p) is the SDF value at p, τ(p) is an indicator of a point on the surface X, and n_p is the surface normal at point p.
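A condensed PyTorch-style sketch of the per-point prediction of Eq. (4) and the L1 losses of Eq. (5)-(6) is given below. The orthographic calibration matrix, the feature resolution and the way points are batched are our own illustrative assumptions rather than the exact LGN architecture.

```python
import torch
import torch.nn.functional as F

def pixel_aligned_features(feat, points, calib):
    """Sample per-point image features F_i(pi(p)) by orthographically projecting
    3D points into the feature map and bilinearly interpolating (cf. Eq. (4)).

    feat:   (B, C, H, W) encoder output for one layer.
    points: (B, N, 3) query points in camera space.
    calib:  (B, 3, 4) orthographic projection matrices (an assumption here).
    """
    homo = F.pad(points, (0, 1), value=1.0)              # (B, N, 4) homogeneous points
    proj = torch.bmm(homo, calib.transpose(1, 2))        # (B, N, 3): (x, y) in [-1, 1], z = depth
    xy = proj[..., :2].unsqueeze(2)                      # (B, N, 1, 2) sampling grid
    local = F.grid_sample(feat, xy, align_corners=True)  # (B, C, N, 1) bilinear interpolation
    local = local.squeeze(-1).transpose(1, 2)            # (B, N, C)
    depth = proj[..., 2:3]                               # spatial feature phi(p)
    return torch.cat([local, depth], dim=-1)             # input to the per-layer MLP decoder

def l1_field_loss(pred, gt):
    """Per-point L1 losses of Eq. (5) (SDF) and Eq. (6) (GIF): pred, gt are (L, B, N);
    the sum runs over layers and the mean over sampled points (our batching choice)."""
    return (pred - gt).abs().sum(dim=0).mean()
```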

3.4 Training and Inference

We first pre-train the Garment Mask generator and Indicator Mask generators with the PGN segmentation and the incomplete mask as inputs for each garment category (shirt, pant, coat, skirt, dress). To train the network, we sample 20,480 points on the surface of each layer. We add a normal perturbation N(0, σ = 5 cm) to these points to generate the near-surface samples. We then add random points in 3D space using a ratio of 1:16 for the randomly sampled points w.r.t. the near-surface samples. These sampled points are used to optimize the SDF prediction of all garment layers and the covering loss between each layer of the garment. We similarly sample points from the 0.5-level iso-surface of the ground-truth Garment Indication Fields (GIFs) of each garment layer and add a normal perturbation N(0, σ = 5 cm) to these points to generate garment-indicating samples. We add random points in 3D space using a ratio of 1:16 for the randomly sampled points w.r.t. the garment-indicating samples. For GIFs, we add additional points along the edges of the ground-truth mesh to obtain accurate trimming.

Given an input image I, we first obtain the PGN segmentation image P, which contains the different garments in the image. For each garment, we obtain its incomplete mask g_i. Using P and g_i for each layer, we obtain the garment masks g_i and ind_i. We leave out the indicator mask prediction for the body layer since it is not required to obtain a GIF for the body; it is assumed that the GIF value for the body layer is 1 at any point. For all the near-surface sampled points, we calculate the ground-truth SDF values ŝ_i for each layer i as explained in Sect. 3.2. The encoder and decoder are warmed up by training with the losses L_sdf and L_igr as defined in Eq. (5) and Eq. (7). We also calculate the ground-truth GIF values ĥ_i (Eq. (3)) for each layer i > 0 and train with the loss L_gif defined in Eq. (6). Using the predicted h_i values and the predicted s_i values for all the sampled points, the network is fine-tuned with the covering loss as defined in Eq. (2). This ensures that the output SDF values follow the covering relationship inequality as defined in Eq. (1). For all the garment-indicating sampled points, we calculate their ground-truth GIF values for each layer of the garment. Using the ground-truth GIF values for each layer, the GIF value prediction is optimized.

During inference, after obtaining the SDF values for each layer, we use marching cubes to obtain its triangle mesh. Then, we apply a trivial post-processing step, using the predictions, to update SDF values: for a given point p where the GIFs of both layers i and j overlap, if s_j > s_i − ε, we set s_j = s_i − ε, where ε is a very small number; experimentally, we use ε = 1e−3. Finally, all the triangular meshes obtained for each layer are trimmed by the predicted GIF values on the vertices of the mesh. To trim the garment opening regions, the triangles that have different signs of GIF values at their three vertices are selected, and each such triangle is trimmed by linearly interpolating GIF values over each edge. Thus, we finally obtain multiple layers of garments along with the reconstructed layer-0 body that follow the covering relationship. For both training and inference, we rely on the covering relationship manually specified with the input image. Different garment layers are then obtained from the output of LGN by satisfying the covering relationship.
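The inference-time post-processing and trimming described above can be sketched as follows, using marching cubes from scikit-image. The grid layout, the `gif_at` helper and the simplified face-level trimming (the paper interpolates GIF values along edges to cut boundary triangles exactly) are assumptions made for illustration.

```python
import numpy as np
from skimage import measure

def enforce_covering(sdf_inner, sdf_outer, gif_inner, gif_outer, eps=1e-3):
    """Post-process predicted SDF grids: where both GIFs indicate garment regions,
    push the outer layer's SDF below the inner one by a margin eps (Sect. 3.4)."""
    overlap = (gif_inner > 0.5) & (gif_outer > 0.5)
    violated = overlap & (sdf_outer > sdf_inner - eps)
    sdf_outer = sdf_outer.copy()
    sdf_outer[violated] = sdf_inner[violated] - eps
    return sdf_outer

def extract_and_trim(sdf_grid, gif_at, spacing=(1.0, 1.0, 1.0)):
    """Run marching cubes on an SDF grid, then discard faces whose vertices all fall
    in the open ('hole') region where the GIF is 0.  A full implementation would also
    split boundary triangles by linearly interpolating GIF values along their edges."""
    verts, faces, _, _ = measure.marching_cubes(sdf_grid, level=0.0, spacing=spacing)
    gif = gif_at(verts)                        # per-vertex GIF values in [0, 1]
    keep = gif[faces].max(axis=1) > 0.5        # keep faces touching the garment region
    return verts, faces[keep]
```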

4 Experiments

4.1 Dataset and Implementation Details

Dataset Preparation. Our multi-layer garment dataset is constructed from 140 purchased rigged human models from AXYZ [51]. For each rigged model, we first perform SMPL [7] fitting to obtain its body shape and pose parameters. We generate eight images from different views for each human model and run semantic segmentation on each image with the Part Grouping Network (PGN) [49]. Using the fitted SMPL, we obtain those segmentations on the SMPL surface and map them to the UV texture space of SMPL. This enables us to perform texture stitching [52] to generate the segmentation texture map. By projecting the texture segmentation onto the 3D human model, we obtain the segmentation of different 3D garment meshes, followed by minor manual corrections on some garment boundaries. Our processed garments include the categories of Shirt, Coat, Dress, Pant (long and short), and Skirt, while Shirt/Coat/Dress all contain three subcategories of no-sleeve, short-sleeve, and long-sleeve. Detailed statistics of the processed garments are provided in the supplementary document.

Table 1. Comparison results (in cm) for (A) per-garment Point-to-Surface error on (i) the Digital Wardrobe [1] and (ii) SIZER [45] datasets, and (B) full-body reconstruction on the BUFF dataset [53].

A(i) Digital Wardrobe    P2S   |  A(ii) SIZER    P2S
BCNet                    9.75  |  BCNet          3.84
SMPLicit                 9.12  |  SMPLicit       6.01
Ours                     9.09  |  Ours           4.04

B (BUFF)    Chamfer   P2S
PIFu-HD     1.22      1.19
BCNet       1.93      1.96
Ours        2.75      2.6

Using different garments, we synthesize around 12,000 different combinations of multi-layer garments on top of a layer-0 SMPL body, in 7 different poses. When combining different garment types, we follow the assumption that the length of the sleeves for the inner layer should NOT be shorter than that of the outer layer; otherwise, the sleeves of the inner layer are covered by the outer garment and there is no visual clue to tell their length. We then use these combinations of generated garment models with a layer-0 body to train our LGN. We use the synthesized combinations of multi-layer garments as the training set. The geometries of the garments are corrected to make sure no intersections exist and the different layers of garment follow the covering relationship. For testing, we use the BUFF [53] and Digital Wardrobe [1] datasets. The dataset preparation details are discussed in the supplementary document.

Implementation Details. The base architecture of our LGN is similar to that of PIFu [31] and PIFu-HD [32], since they also predict implicit fields aligned with image features. For the SDF subsystem, we first obtain garment masks from the PGN segmentation of the input image using Garment Mask Generators and obtain
masked front and back normals using Normal Subnetworks, which are Pix2Pix-HD [50] networks. We use 4 stacks of the Stacked Hourglass Network (HGN) [54] to encode the image features from the concatenation of normals and image. From the spatial features of the points and the local encoded features, obtained by bilinear interpolation of the projected points on the image feature space, different multi-layer perceptron (MLP) decoders predict SDF values for each layer of the garment, with layer 0 being the human body. Similarly, for the GIF subsystem, Indicator Mask Generators obtain GIF masks, 4 stacked HGNs encode image features from the concatenation of the PGN segmentation and the indicator mask, and the GIF is then predicted using GIF Decoders. To optimize the network, we first pre-train the Mask Generators. Then we individually train the SDF and GIF subsystems. Thereafter, we use the covering loss to fine-tune the SDF prediction to avoid intersections, and the GIF to ensure appropriate trimming of the open regions on garment surfaces and a consistent multi-layer covering relationship. We evaluate our method against various state-of-the-art approaches in mainly two areas: 3D Clothed Human Reconstruction and Individual Garment Reconstruction. The quantitative comparison of 3D Clothed Human Reconstruction of our method with BCNet [2], PIFu-HD [32] and SMPLicit [5] is shown in the supplementary document. We omit comparison with MGN [1], Octopus [52] and PIFu [31] because of the availability of better reconstruction methods.

4.2 Quantitative Comparisons

We compare our method with the state-of-the-art approaches (Table 1) on three publicly available datasets: the Digital Wardrobe dataset [1], the SIZER dataset [45] and the BUFF dataset [53]. We use the Digital Wardrobe and SIZER datasets to compare individual garment reconstructions, and the BUFF dataset [53] to compare full human body reconstruction. Note that since these datasets consist of only two layers of garments (upper and lower), we cannot compare multi-layer garment reconstruction on them. Please also note that we do not use the BCNet [2] dataset, to keep the comparison with BCNet fair. Also, we are unable to compare our results with the DeepFashion3D [46] dataset because it only consists of garments and no human body.

Fig. 4. Penetration depths (in cm).

Method     Im1    Im2    Im3    Im4    Im5
SMPLicit   12.9   32.7   1.67   21.3   18.3
Ours       0.34   0      0      0.16   0

Fig. 5. Cover loss finetuning.


Individual Garment Reconstruction. To compare our method with the state-of-the-art garment reconstruction approaches [2,5], we select 96 models from the Digital Wardrobe dataset [1] and 97 models from the SIZER dataset [45]. We use the segmented upper and lower garments available with each dataset for comparison. We calculate the mean P2S error per garment between the reconstructed garments and their ground-truth counterparts and compare our approach with the other approaches in Table 1 A(i)&(ii). Our model outperforms the state of the art on the Digital Wardrobe dataset [1]. On SIZER [45], BCNet performs better because it assumes the reconstruction of only two garment layers, and segmentation of three garment layers does not exist in that dataset. In Fig. 4, we report the maximum penetration depth between different reconstructed garment layers and compare with SMPLicit [5]. It can be seen that our work outperforms the state of the art in this case.
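For reference, the two metrics used in Table 1 can be approximated from dense surface samples with nearest-neighbour queries, as in the SciPy sketch below. This approximates true point-to-surface distance by point-to-point distance to a dense sampling of the surface and is not the authors' evaluation code; the averaging convention for the Chamfer distance is also an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def p2s_error(pred_points, gt_surface_samples):
    """Approximate point-to-surface error: mean distance from predicted points
    to a dense sampling of the ground-truth surface (reported in cm in Table 1)."""
    d, _ = cKDTree(gt_surface_samples).query(pred_points)
    return d.mean()

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets sampled from the meshes."""
    d_ab, _ = cKDTree(points_b).query(points_a)
    d_ba, _ = cKDTree(points_a).query(points_b)
    return 0.5 * (d_ab.mean() + d_ba.mean())
```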

Fig. 6. From left to right, qualitative comparison of full-body reconstruction on 3D clothed human from ground truth (left), and results from BCNet [2], SMPLicit [5], PIFu-HD [32] and Ours (LGN).

Full Human Body Reconstruction. We show in Table 1 B the comparison of our method with the state of the art on full human body reconstruction, on 26 models consisting of different subjects and clothes from the BUFF dataset [53]. We calculate the Chamfer distance and point-to-surface (P2S) error between the ground-truth human models and the reconstructed full-body surfaces. We do not compare with SMPLicit, because it has no method for full body reconstruction. Please note that we obtain lower results in this case than the state of the art because we do not focus on accurate naked body (layer-0) reconstruction.

Fig. 7. Comparison between the reconstruction results of ours, BCNet [2] and SMPLicit [5]. Our reconstruction achieves intersection-free results between different layers by satisfying the implicit covering relationship, while BCNet cannot reconstruct a multi-layered garment structure, and the result from SMPLicit does not have such a guarantee and shows clear intersections between different layers.

4.3 Qualitative Results

We compare the reconstruction quality of garment surfaces on the human body in Fig. 6. We can observe that our method (LGN) reconstructs a more detailed 3D human body than state-of-the-art explicit model reconstruction methods like BCNet, showing the effectiveness of implicit model reconstruction in comparison with the explicit approach. Also, SMPLicit generates a very coarse structure and loses many finer details of the clothes on the human body. Since we can generate individual implicit garment surfaces, we can retain finer details, especially between different layers. Since our networks are fine-tuned with the IGR loss, we reconstruct garments of similar quality to PIFu-HD without using fine-level networks.

In Fig. 7, we further show different challenges faced in the reconstruction of different garment surfaces. In the top row, we show the effect of the covering relationship on multiple layers of garment reconstruction, specifically for the Shirt and Pant layers. From the given image, we expect the Pant layer to cover the Shirt layer without intersections. However, BCNet generates the Shirt covering the Pants, according to its pre-defined template. On the other hand, SMPLicit completely misses the covering relation. In the bottom row, we show the reconstruction of the Coat layer above the Shirt layer as in the image. We expect two layers of garment reconstruction for the upper body. However, since BCNet is based on an explicit reconstruction of garments from a displacement map on the SMPL body, it cannot reconstruct two-layer geometry for the upper body. Since SMPLicit does not guarantee intersection-free reconstruction, we find intersections between the Shirt and Coat
layer. Since the results of our LGN satisfy the covering relationship in Eq. (1), we get the expected output of garments in both cases. In Fig. 5, we show how Covering Loss affects the reconstruction output. Without covering loss finetuning, inner layers intersect with outer layers.

5 Conclusion, Limitation, and Future Work

We introduce a novel deep learning-based approach that reconstructs multiple non-intersecting layers of garment surfaces from an image. Our approach enforces the implicit covering relationship between different garment layers and the human body, identifies overlapping regions of different garment layers, and extracts open (non-watertight) meshes. To the best of our knowledge, the Layered-Garment Net (LGN) is the first approach that can handle the intersection-free reconstruction of multiple layers of garments from a single image. Our approach currently does not handle color information, since obtaining good texture for multiple reconstructed layers is difficult; other neural implicit functions (e.g., Neural Radiance Fields) could address this issue. Our approach also does not handle more challenging geometries consisting of manifold garment surfaces and details like pockets, hoodies, collars, etc., nor some challenging poses, such as limbs close to the body. Also, since the naked human body model was not the focus of this work, the current approach does not handle detailed full-body reconstruction. Addressing these issues is a promising direction for future work.

References 1. Bhatnagar, B.L., Tiwari, G., Theobalt, C., Pons-Moll, G.: Multi-garment net: Learning to dress 3d people from images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5420–5430 (2019) 2. Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: BCNet: learning body and cloth shape from a single image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 18–35. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5 2 3. Patel, C., Liao, Z., Pons-Moll, G.: TailorNet: predicting clothing in 3D as a function of human pose, shape and garment style. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7365–7375 (2020) 4. Su, Z., Yu, T., Wang, Y., Li, Y., Liu, Y.: Deepcloth: neural garment representation for shape and style editing. arXiv preprint arXiv:2011.14619 (2020) 5. Corona, E., Pumarola, A., Alenya, G., Pons-Moll, G., Moreno-Noguer, F.: Smplicit: topology-aware generative model for clothed people. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11875– 11885 (2021) 6. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3d people models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8387–8397 (2018) 7. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graphics (TOG) 34, 1–16 (2015)


8. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10975–10985 (2019) 9. Osman, A.A.A., Bolkart, T., Black, M.J.: STAR: sparse trained articulated human body regressor. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 598–613. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-58539-6 36 10. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1 34 11. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131 (2018) 12. Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern recognition, pp. 459–468 (2018) 13. Smith, B.M., Chari, V., Agrawal, A., Rehg, J.M., Sever, R.: Towards accurate 3D human body reconstruction from silhouettes. In: 2019 International Conference on 3D Vision (3DV), pp. 279–288. IEEE (2019) 14. Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., Schiele, B.: Neural body fitting: Unifying deep learning and model based human pose and shape estimation. In: 2018 international conference on 3D vision (3DV), pp. 484–494. IEEE (2018) 15. Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M.J., Gehler, P.V.: Unite the people: closing the loop between 3D and 2D human representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6050– 6059 (2017) 16. Alldieck, T., Magnor, M., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1175– 1186 (2019) 17. Alldieck, T., Pons-Moll, G., Theobalt, C., Magnor, M.: Tex2shape: detailed full human body geometry from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2293–2303 (2019) 18. Su, Z., et al.: Mulaycap: multi-layer human performance capture using a monocular video camera. arXiv preprint arXiv:2004.05815 (2020) 19. Varol, G., et al.: BodyNet: volumetric inference of 3D human body shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 20–38. Springer, Cham (2018). https://doi.org/10.1007/978-3-03001234-2 2 20. Zheng, Z., Yu, T., Wei, Y., Dai, Q., Liu, Y.: Deephuman: 3D human reconstruction from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7739–7749 (2019) 21. Ma, Q., Saito, S., Yang, J., Tang, S., Black, M.J.: Scale: Modeling clothed humans with a surface codec of articulated local elements. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16082– 16093 (2021) 22. Ma, Q., et al.: Learning to dress 3D people in generative clothing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6469–6478 (2020)


23. Wang, L., Zhao, X., Yu, T., Wang, S., Liu, Y.: NormalGAN: learning detailed 3D human from a single RGB-D image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 430–446. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5 26 24. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460– 4470 (2019) 25. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015) 26. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput. Graphics 21, 163–169 (1987) 27. Chibane, J., Mir, A., Pons-Moll, G.: Neural unsigned distance fields for implicit function learning. arXiv preprint arXiv:2010.13938 (2020) 28. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019) 29. Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: DISN: deep implicit surface network for high-quality single-view 3D reconstruction. arXiv preprint arXiv:1905.10711 (2019) 30. Sitzmann, V., Chan, E.R., Tucker, R., Snavely, N., Wetzstein, G.: MetaSDF: metalearning signed distance functions. arXiv preprint arXiv:2006.09662 (2020) 31. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFU: pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2304–2314 (2019) 32. Saito, S., Simon, T., Saragih, J., Joo, H.: Pifuhd: multi-level pixel-aligned implicit function for high-resolution 3D human digitization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 84–93 (2020) 33. Hong, Y., Zhang, J., Jiang, B., Guo, Y., Liu, L., Bao, H.: StereoPIFU: depth aware clothed human digitization via stereo vision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 535–545 (2021) 34. He, T., Collomosse, J., Jin, H., Soatto, S.: Geo-PIFU: geometry and pixel aligned implicit functions for single-view human reconstruction. arXiv preprint arXiv:2006.08072 (2020) 35. Zheng, Z., Yu, T., Liu, Y., Dai, Q.: Pamir: parametric model-conditioned implicit representation for image-based human reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021) 36. Wang, S., Mihajlovic, M., Ma, Q., Geiger, A., Tang, S.: MetaAvatar: learning animatable clothed human models from few depth images. In: Advances in Neural Information Processing Systems (NeurIPS) (2021) 37. Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: Arch: animatable reconstruction of clothed humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3093–3102 (2020) 38. Peng, S., et al.: Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9054– 9063 (2021)


39. Peng, S., et al.: Animatable neural radiance fields for modeling dynamic human bodies. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14314–14323 (2021) 40. Liu, L., Habermann, M., Rudnev, V., Sarkar, K., Gu, J., Theobalt, C.: Neural actor: neural free-view synthesis of human actors with pose control. arXiv preprint arXiv:2106.02019 (2021) 41. Shao, R., Zhang, H., Zhang, H., Cao, Y., Yu, T., Liu, Y.: Doublefield: bridging the neural surface and radiance fields for high-fidelity human rendering. arXiv preprint arXiv:2106.03798 (2021) 42. Saito, S., Yang, J., Ma, Q., Black, M.J.: Scanimate: weakly supervised learning of skinned clothed avatar networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2886–2897 (2021) 43. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining implicit function learning and parametric models for 3D human reconstruction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 311–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058536-5 19 44. Gropp, A., Yariv, L., Haim, N., Atzmon, M., Lipman, Y.: Implicit geometric regularization for learning shapes. arXiv preprint arXiv:2002.10099 (2020) 45. Tiwari, G., Bhatnagar, B.L., Tung, T., Pons-Moll, G.: SIZER: a dataset and model for parsing 3D clothing and learning size sensitive 3D clothing. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 1–18. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8 1 46. Zhu, H., et al.: Deep Fashion3D: a dataset and benchmark for 3D garment reconstruction from single images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 512–530. Springer, Cham (2020). https:// doi.org/10.1007/978-3-030-58452-8 30 47. Chi, C., Song, S.: Garmentnets: category-level pose estimation for garments via canonical space shape completion. arXiv preprint arXiv:2104.05177 (2021) 48. Jacobson, A., Kavan, L., Sorkine-Hornung, O.: Robust inside-outside segmentation using generalized winding numbers. ACM Trans. Graphics (TOG) 32, 1–12 (2013) 49. Gong, K., Liang, X., Li, Y., Chen, Y., Yang, M., Lin, L.: Instance-level human parsing via part grouping network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 805–822. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0 47 50. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: Highresolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018) 51. : (Axyz design 3d people models and character animation software https://secure. axyz-design.com/) 52. Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Detailed human avatars from monocular video. In: 2018 International Conference on 3D Vision (3DV), pp. 98–109. IEEE (2018) 53. Zhang, C., Pujades, S., Black, M.J., Pons-Moll, G.: Detailed, accurate, human shape estimation from clothed 3D scan sequences. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017) 54. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-31946484-8 29

SWPT: Spherical Window-Based Point Cloud Transformer

Xindong Guo1,2, Yu Sun2, Rong Zhao1, Liqun Kuang1, and Xie Han1(B)

1 North University of China, Taiyuan, China
[email protected]
2 Shanxi Agricultural University, Jinzhong, China
[email protected]

Abstract. While the Transformer architecture has become the de-facto standard for natural language processing tasks and has shown promising prospects in image analysis domains, applying it directly to 3D point clouds is still a challenge due to their irregularity and lack of order. Most current approaches adopt farthest point searching as a downsampling method and construct local areas with the k-nearest neighbor strategy to extract features hierarchically. However, this scheme inevitably consumes a lot of time and memory, which impedes its application to near-real-time systems and large-scale point clouds. This research designs a novel transformer-based network called the Spherical Window-based Point Transformer (SWPT) for point cloud learning, which consists of a Spherical Projection module, a Spherical Window Transformer module and a crossing self-attention module. Specifically, we project the points onto a spherical surface, then a window-based local self-attention is adopted to calculate the relationships between the points within a window. To obtain connections between different windows, crossing self-attention is introduced, which rotates all the windows as a whole along the spherical surface and then aggregates the crossing features. It is inherently permutation invariant because it uses simple and symmetric functions, making it suitable for point cloud processing. Extensive experiments demonstrate that SWPT achieves state-of-the-art performance while being about 3-8 times faster than previous transformer-based methods on shape classification tasks, and achieves competitive results on part segmentation and the more difficult real-world classification tasks.

Keywords: Point cloud · Spherical projection · Transformer

1 Introduction

3D point clouds have been attracting more and more attention from both industry and academia due to their broad applications, including autonomous driving, augmented reality, robotics, etc. Unlike images, which have regular pixel grids, 3D point clouds are irregular and unordered sets of points corresponding to object surfaces. This difference makes it challenging to apply the traditional network architectures widely used in 2D computer vision directly to 3D point clouds.


Researchers have proposed a variety of models for deep learning on 3D point clouds, which can be classified into three categories according to their data representations: projection-based, voxel-based and point-based models. The projection-based models [3,13,25,35] generally project 3D shapes into regular 2D representations so that many classical convolutional models can be employed. However, these methods can only obtain a limited receptive field from one or a few perspectives, which may induce losses of spatial relations between different parts. The voxel-based models [18,23,33] generally rasterize 3D point clouds onto regular grids and apply 3D convolutions for feature learning, which induces massive computational and memory costs due to the cubic growth in the number of voxels with respect to the resolution. Although sparse convolutional models [2,23,29] alleviate this problem by operating only on voxels that are not empty, they may still fail to capture spatial relations. The point-based models [20,22] operate directly on points and propagate features via pooling operators. Specifically, some works transform the point cloud into a graph for message passing [14,24]. These models achieve more competitive results than previous methods. However, the neighborhood searching strategy at their core has a high computational complexity as the number of iterations increases or the point scale grows.

In recent years, the Transformer [1,28] has been dominating the natural language processing field and has also been applied to image vision tasks, achieving encouraging performance [6,31]. As the core component of the Transformer, the self-attention module computes refined, weighted features based on global context by considering the connections between any two words, so the output feature of each word is related to all input features, which makes it capable of obtaining a global feature. Moreover, all operations of the Transformer are parallelizable and order-invariant, so it is naturally suitable for point cloud processing. The Transformer-based model of [8] is the pioneering work that introduces self-attention to point processing; it employs farthest point searching (FPS) as the downsampling strategy and k-nearest neighbors (KNN) as the local-region searching strategy to perform local self-attention. It lacks spatial connections between different regions and consumes a large amount of time when constructing local regions due to the high complexity of FPS and KNN.

To solve the problems mentioned above, we propose a novel Spherical Window based Point Cloud Transformer. Firstly, we project the point cloud onto a spherical surface to reduce the dimension of the points, which makes processing points as simple as processing images. The Spherical Window (SW) module makes the points projected on the spherical surface more regular, resembling pixels in an image, but with more spatial structure information. Then, we partition the points on the spherical surface into spherical windows and apply local self-attention hierarchically on each window. With this spherical window-based mechanism, local self-attention is performed in all windows in parallel, which facilitates point cloud processing and benefits its scalability to large point clouds. Finally, we introduce a Cross-Window operation to achieve connections between neighboring windows, which is an efficient operation with computational complexity as low as O(1). We use SWPT as the backbone for a variety of


point cloud recognition tasks, and extensive experiments demonstrate that it is an effective and efficient framework. The main contributions of this paper are summarized as follows:

– We propose a Spherical Projection (SP) layer, which projects the points onto a spherical surface, followed by a point-wise transformation to maintain the information between points. This layer is permutation-invariant and highly efficient, and thus is inherently suitable for point cloud processing.
– Based on the Spherical Projection layer, we construct a Spherical Window (SW) Transformer, which partitions the spherical surface into windows comprising certain patches. We then perform local self-attention on each window hierarchically.
– We introduce a cross-window operation, which rotates the spherical surface as a whole by half of the window size to exploit the spatial connections between different windows.

Extensive experiments demonstrate that our network achieves state-of-the-art performance on shape classification, part segmentation and semantic segmentation.

2 Related Works

2.1 Projection-Based Networks

As a pioneering work, MVCNN [25] simply uses max-pooling to aggregate multi-view features into a global descriptor, but max-pooling only retains the top elements from a specific view, which results in a loss of details. The methods in [3,13] project 3D point clouds into various image planes, then employ 2D CNNs to extract local features in these image planes, followed by a multi-view feature fusion module to obtain the global feature. The approach in [35] proposes a relation network to exploit the inter-relationships over a group of views, which are aggregated to form a discriminative representation of the 3D object. Different from the previous methods, [30] constructs a directed graph by regarding multiple views as graph nodes, and then designs a GCNN over the view-graph to hierarchically achieve a global shape descriptor.

2.2 Voxel-Based Networks

VoxNet [18] integrates volumetric Occupancy Grid representation with a supervised 3D CNN to utilize 3D information and deal with large amounts of point cloud. 3D ShapeNets [23,33] is another work using a Convolutional Deep Belief Network to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, which can learn the distribution of complex 3D shapes across different categories. However, these methods are unable to scale well to dense 3D data since the computation and memory consumption of such methods increases cubically with respect to the resolution of voxel.

2.3 Point-Based Networks

PointNet [20], as a pioneering work, first utilizes permutation-invariant operators such as MLPs and max-pooling to aggregate a global feature from a point cloud. As PointNet lacks local connections between points, the authors proposed PointNet++ [22], which applies a tiny PointNet within a hierarchical spatial structure to extract local features at increasing contextual scales. Inspired by PointNet, many recent works [9,11,32] have been proposed with more sophisticated architectures, achieving encouraging performance. Graph convolutional neural networks (GCNNs) have also been applied to extract features. DGCNN [24] is the first to perform graph convolutions on KNN graphs; as the core component of its EdgeConv, an MLP is used to learn the features for each edge. DeepGCNs [14] presents a new way to train very deep GCNs that addresses the vanishing gradient problem and shows a 56-layer GCN achieving positive results. PointCNN [15] proposes an X-transformation learnt from the point cloud, which simultaneously weights the input features associated with the points and reorders them into a latent, potentially canonical order. PointConv [32] extends dynamic filters to a new convolution and takes the local coordinates of 3D points as input to compute weight and density functions.

2.4 Transformer in NLP and Vision

[1] proposes neural machine translation with an attention mechanism that allows a model to automatically search for the parts of a source sentence that are relevant to predicting a target word. [16] further proposes a self-attention mechanism to extract an interpretable sentence embedding. Subsequent works employed self-attention layers to replace some or all of the spatial convolution layers. [28] proposes the Transformer, based solely on the self-attention mechanism, dispensing with recurrence and convolutions entirely. [5] proposes a bidirectional Transformer to pre-train deep bidirectional representations and obtains competitive results. Transformer and self-attention models have revolutionized natural language processing and inspired researchers to apply them to vision tasks. [31] proposes visual transformers that apply the Transformer to token-based images from feature maps. [6] proposes a pure transformer-based network that partitions an image into patches, and the results show that, with sufficient training data, the Transformer provides better performance than a traditional convolutional neural network. [17] proposes a new vision Transformer, called the Swin Transformer, which incorporates inductive biases for spatial locality, hierarchy and translation invariance.

3 Spherical-Window Point Transformer

3.1 Overview

In this section we detail the design of SWPT. We first show how the spherical projection (SP) provides an efficient representation for point clouds. Then we present the details of the gridding module, which splits the points on the spherical surface into non-overlapping patches. Lastly, we introduce a high-performance self-attention module to extract local features, with a rotation to enhance cross-window connections. The model architecture is illustrated in Fig. 1.

Fig. 1. Model architecture. The above is the classification network, and the below is the segmentation network. SP stands for the spherical projection. SW is the spherical window layer used to split points. C-SA is the crossing self-attention applied to calculate the cross window attentions. Exp means expanding high-dimension features to local windows.

The above of Fig. 1 illustrates the classification network, which takes as input a raw point cloud with N points. Firstly, the network applies a Spherical Projection (SP) to all points, followed by a point-wise transformation, e.g., a multi-layer perceptron (MLP). Then the outputs of the previous layer are fed into two stacked Spherical Window Transformer (SWT) blocks, each consisting of a Spherical Window (SW) module, a Crossing Self-Attention (C-SA) module and a pooling module, to learn hierarchical features. It is worth noting that the network achieves competitive results with only two iterations of the SWT module thanks to the high efficiency of the Spherical Window based neighborhood searching strategy. Finally, a self-attention (SA) and a pooling operation are employed to produce a global feature, which is fed into a fully connected (FC) layer followed by a softmax function to predict the label.

The below of Fig. 1 shows the segmentation network, which expands the high-dimension features to the corresponding local windows in the previous layer and then concatenates them with features from the last layer. After three expanding and concatenation operations, the network produces dense point features that a lightweight fully connected network uses to generate labels.

3.2 Spherical Projection

Spherical Projection Layer plays a role of data pre-processing in the framework, which projects the points from the point cloud to a spherical surface, as illustrated in Fig. 2. For projecting the points on the spherical surface evenly, we normalize the points by the following formulation:


Fig. 2. Visualization of Spherical projection. For simplification, here we just show two spheres for projection. The points on different sections in the same frustum are gathered into a spherical window. The blue points and the green points in the window are from the blue spherical surface and green spherical surface respectively. (Color figure online)

$$p' = p - p_c, \qquad p_c = \frac{1}{N}\left(\sum_{i=1}^{N} x_i,\; \sum_{i=1}^{N} y_i,\; \sum_{i=1}^{N} z_i\right) \tag{1}$$

where $p_c$ is the centroid of the point cloud and $p'$ is the point translated from $p = (x, y, z)$; N is the number of points. By doing so, we translate the centroid of the point cloud to the origin of the coordinate system. Then, we define the spherical projection as:

$$r = \sqrt{x^2 + y^2 + z^2}, \qquad \theta = \arctan\frac{y}{x}, \qquad \varphi = \arccos\frac{z}{r} \tag{2}$$

Here (x, y, z) are the coordinates of a point in the Cartesian system, and θ and ϕ are the angle of azimuth and the angle of pitch, respectively. Although the spherical projection preserves the most prominent features, the connections between points on the spherical surface may change compared to the original point cloud due to the reduction of dimension, causing a loss of structure information. To solve this problem, we apply a transformation to the points through a shared MLP, projecting the features of the points to a latent space while maintaining the spatial relations between points:

$$f = T(p) \oplus P(p), \qquad P(p) = r \oplus \theta \oplus \varphi \tag{3}$$

where $p \in \mathbb{R}^D$ is the feature of a point, such as its x-y-z coordinates; in our case D is 3. In other cases, D may be different when the point includes features such as normal vectors. T is a pointwise transformation such as an MLP, ⊕ is a vector concatenation operator, and P indicates the spherical projection mentioned before. Eventually, the output of the Spherical Projection module is a new set of features

$$F = \{\, f_i \mid f_i = P(p_i) \oplus T(p_i),\; i \in N \,\} \tag{4}$$

among which the angle of azimuth θ and the angle of pitch ϕ are utilized to partition features into windows.
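The projection is straightforward to implement. The sketch below, written against PyTorch (which the paper uses for its experiments), is our own illustration of Eqs. (1)-(3): it centers the cloud at its centroid and maps each point to (r, θ, ϕ). The function name and tensor shapes are assumptions, and atan2 with clamping is used instead of a bare arctan for numerical robustness.

```python
import torch

def spherical_projection(points):
    """Sketch of the Spherical Projection (SP) mapping of Eqs. (1)-(2).
    `points` is assumed to be an (N, 3) tensor of x-y-z coordinates; this is an
    illustration, not the authors' released code."""
    centroid = points.mean(dim=0, keepdim=True)            # p_c in Eq. (1)
    p = points - centroid                                   # p' = p - p_c
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    r = torch.sqrt(x ** 2 + y ** 2 + z ** 2).clamp(min=1e-8)
    theta = torch.atan2(y, x)                                # azimuth (arctan y/x)
    phi = torch.acos((z / r).clamp(-1.0, 1.0))               # pitch   (arccos z/r)
    return torch.stack([r, theta, phi], dim=1)               # P(p) in Eq. (3)
```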


Fig. 3. The proposed Spherical Window Transformer obtains the global feature by merging features within a 2 × 2 spherical window hierarchically. The points features within a solid window are adopted to perform local self-attention and then aggregated to a point for the next stage denoted by the dotted box. The green box indicates a local spherical window to perform self-attention and pooling operation. Please note that here we simplify the figure structure to illustrate the principle of a module clearly. (Color figure online)

3.3 Spherical Window Transformer

The simplest way to apply a Transformer to point clouds is to treat all the points as a sequence and each point as an element, but this incurs massive computation and memory costs and loses local spatial relations. In addition, it lacks the hierarchies that have been shown to be significant for feature learning [22]. On the other hand, the farthest point sampling (FPS) and k-nearest neighbors (KNN) searching strategies adopted by most previous models introduce large computation and memory costs, making them unsuitable for large-scale point clouds [7,8]. To this end, we propose the Spherical Window Transformer layer to compute local self-attention and to construct a hierarchical architecture, which has linear computational complexity with respect to patch size.

The Spherical Window Transformer layer aims to provide an efficient local-area partitioner that splits the points on the spherical surface into patches, where each patch involves points from a neighborhood. To this end, we adopt a very simple and efficient approach, formally defined as follows:

$$F' = R_\varphi(S_\varphi(R_\theta(S_\theta(F)))) \tag{5}$$

where $F \in \mathbb{R}^{N \times C}$ is the set of point features from the last layer, $S_\theta$ is a sort function by θ, and $R_\theta$ reshapes the set from $N \times C$ to $\Theta \times N_\theta \times C$, where $N = N_\theta \times \Theta$; $S_\varphi$ and $R_\varphi$ operate analogously. We then obtain a new set with shape $\Phi \times \Theta \times N_\varphi \times N_\theta \times C$, meaning that the point cloud is partitioned into $\Phi \times \Theta$ spherical windows with $N_\theta \times N_\varphi$ points in each window. Because it uses only sorting and reshaping, this layer maintains the permutation invariance of the point cloud.

Then a local self-attention is employed in each window, followed by a patch merging block to gather the local low-dimensional features. Afterwards, local points are gathered into a higher-dimensional space with fewer points. The windows are arranged to evenly partition the points into non-overlapping areas, which produces a higher-layer local neighborhood, as shown in Fig. 3. Given a window of size (θ, ϕ), i.e., with θ × ϕ points, the computational complexities of global self-attention and window-based local self-attention on a spherical surface of Θ × Φ patches are

$$\Omega(\mathrm{SA}) = 4\Theta\Phi C^2 + 2(\Theta\Phi)^2 C, \tag{6}$$
$$\Omega(\mathrm{W\text{-}SA}) = 4\Theta\Phi C^2 + 2(\Theta\Phi)(\theta\varphi) C, \tag{7}$$

where the former is quadratic in the number of patches Θ × Φ, while the latter is linear when the window size is fixed. The cost of global self-attention is generally unaffordable for a large-scale point cloud, whereas window-based self-attention is scalable. Another function of the Spherical Window module is to reduce the cardinality of patches as required, for example, from N points to N/W through a window of size W. The neighborhood-selection strategy of previous works is KNN with Euclidean distance, whose computational complexity grows linearly with the number of neighbors, the total number of points and the data dimension, so it is still a challenge to extend such models to large-scale point clouds. Our method only needs to build a spherical surface grid by sorting points along the angles of azimuth and pitch, which is done once after the projection and is invariant to the permutation of points. Extensive experiments demonstrate that the Spherical Window is a novel representation of the point cloud on which local self-attention can be applied effectively and efficiently.
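A minimal sketch of the sort-and-reshape partition of Eq. (5), written in PyTorch, may make the mechanism concrete. The grid size, variable names and the flat (windows, points, channels) output layout are our own simplifications rather than the authors' implementation, and N is assumed divisible by the number of windows.

```python
import torch

def spherical_window_partition(feats, theta, phi, grid=(8, 8)):
    """Sketch of the sort-and-reshape partition in Eq. (5).  `feats` is an
    (N, C) point-feature tensor; `theta` and `phi` are the per-point azimuth
    and pitch angles from the spherical projection."""
    n_theta, n_phi = grid
    n, c = feats.shape
    # S_theta / R_theta: order all points by azimuth and cut into n_theta slabs
    order = torch.argsort(theta)
    feats, phi = feats[order], phi[order]
    feats = feats.view(n_theta, n // n_theta, c)
    phi = phi.view(n_theta, n // n_theta)
    # S_phi / R_phi: inside each slab, order by pitch and cut into n_phi cells
    order = torch.argsort(phi, dim=1)
    feats = torch.gather(feats, 1, order.unsqueeze(-1).expand(-1, -1, c))
    # each of the n_theta * n_phi windows now holds a contiguous block of points
    return feats.reshape(n_theta * n_phi, n // (n_theta * n_phi), c)
```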

3.4 Crossing Windows Self-attention

Spherical window-based local self-attention performs well at obtaining local features. However, it lacks the capacity to acquire relationships across neighboring windows, which limits its modeling power. To capture cross-window relationships while maintaining the efficiency of computation and memory cost, we propose a rotating spherical surface approach, which rotates all the windows as a whole along the angles of azimuth and pitch by half of the window size, respectively. As illustrated in Fig. 4, the module on the left calculates a regular self-attention within each window. Then the module on the right rotates the windows along the angles of azimuth and pitch by θ/2 and ϕ/2, followed by the same self-attention computation as before, where θ and ϕ denote the window size. As the windows form a spherical surface and the data points are distributed discretely over it, the rotation of windows does not need a masked operation resembling that in [17]. Results show that our Crossing Self-Attention approach introduces relationships between neighboring windows in previous layers and performs effectively and efficiently in several point cloud recognition tasks, as shown in Table 5.
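The shift itself reduces to a circular roll of the patch grid, as in the following hedged sketch; the (H, W, C) grid layout and the half-window shift sizes are assumptions for illustration, not the authors' code.

```python
import torch

def cross_window_shift(grid_feats, window=(4, 4)):
    """Sketch of the Crossing Self-Attention shift: rotate the whole spherical
    grid by half a window along azimuth and pitch before re-partitioning, so
    that neighboring windows exchange points.  Because the grid wraps around
    the sphere, a plain circular roll needs no masking."""
    d_theta, d_phi = window[0] // 2, window[1] // 2
    return torch.roll(grid_feats, shifts=(d_theta, d_phi), dims=(0, 1))
```

After the roll, windows are re-cut and local self-attention is applied exactly as before; rolling back by the same amounts restores the original layout.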


Fig. 4. Illustration of crossing windows self-attention considering the relationship between different windows. In this example, the window size is 4 × 4 and the rotating size is 2 × 2. After calculating the window-based self-attention showed by (a), the whole spherical window rotate along azimuth and pitch by 2 × 2 patches respectively, illustrated by (b). The green and blue dotted boxes in (b) indicate the spherical windows before rotation corresponding to windows in (a), and the solid boxes in (b) denote the spherical windows after rotation. (Color figure online)

4 Experiments

We evaluate the performance of SWPT on three recognition tasks: 3D shape classification, 3D part segmentation and real-world object classification, giving comprehensive analyses and comparisons with other approaches. For 3D shape classification, we use the widely adopted ModelNet40 [33] dataset. For 3D part segmentation, we use the ShapeNet55 [36] dataset. For real-world object classification, we use the ScanObjectNN [27] dataset, a recent point cloud object dataset constructed from real-world indoor datasets such as SceneNN [10] and ScanNet [4]. We use PyTorch [19] as the framework to implement SWPT. We employ the stochastic gradient descent (SGD) optimizer with momentum 0.9 and weight decay 0.0001 for training. Other training parameters, including the number of patches, window size, learning rate and batch size, are given in each related section.

4.1 Shape Classification

Data and Metric. The ModelNet40 [33] dataset contains 12,311 CAD models in 40 categories, which is widely used in shape classification. For a fair comparison, we use the official split with 9,843 models for training and 2,468 models for testing, and use the same strategy as PointNet [20] to sample 1,024 points uniformly from each CAD model. For evaluation metrics, we adopt the mean accuracy within each category (mAcc) and the overall accuracy (OA) over all classes. In addition, we retrain the point-based methods to evaluate their efficiency. Experiment Results. During training we used a random translation in [−0.2, 0.2], a random anisotropic scaling in [0.67, 1.5] and a random input dropout as the data augmentation strategy, while no data augmentation was used in testing.
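For concreteness, the training-time augmentation described above can be sketched as follows; the dropout ratio and the convention of replacing dropped points with the first point are our own assumptions, not taken from the paper.

```python
import torch

def augment_point_cloud(points):
    """Sketch of the augmentation used for classification training: random
    translation in [-0.2, 0.2], random anisotropic scaling in [0.67, 1.5],
    and random input dropout.  `points` is an (N, 3) tensor."""
    scale = torch.empty(1, 3).uniform_(0.67, 1.5)       # per-axis (anisotropic) scale
    shift = torch.empty(1, 3).uniform_(-0.2, 0.2)       # global translation
    points = points * scale + shift
    drop = torch.rand(points.shape[0]) < 0.1            # drop roughly 10% of points
    points[drop] = points[0]                            # replace dropped points
    return points
```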


Table 1. Comparison of state-of-the-art models on ModelNet40 dataset. The mAcc and OA results are quoted from the cited papers, while the latencies are obtained by training and testing the official code in the same environment.

Method           | Input | Points | mAcc (%) | OA (%) | Latency (s)
3DShapeNets [33] | Voxel | –      | 77.3     | 84.7   | –
VoxNet [18]      | Voxel | –      | 83.0     | 85.9   | –
Subvolumn [21]   | Voxel | –      | 86.0     | 89.2   | –
MVCNN [25]       | Image | –      | –        | 90.1   | –
Kd-Net [12]      | Point | 1k     | –        | 91.8   | –
PointNet [20]    | Point | 1k     | 86.2     | 89.2   | 11
PointNet++ [22]  | Point | 1k     | –        | 91.9   | 21
PointCNN [15]    | Point | 1k     | 88.1     | 92.2   | 45
DGCNN [24]       | Point | 1k     | 90.2     | 92.2   | 10
PointConv [32]   | Point | 1k     | –        | 92.5   | 58
KPConv [26]      | Point | 1k     | –        | 92.9   | 25
PCT1 [7]         | Point | 1k     | –        | 92.8   | 34
PCT2 [8]         | Point | 1k     | –        | 93.2   | 18
SWPT (ours)      | Point | 1k     | 90.1     | 93.5   | 5

We set the batch size to 32, the number of epochs to 200 and the initial learning rate to 0.01, with a step schedule that adjusts it every 40 epochs. For a fair comparison of efficiency, we use the same hardware environment and only the official code from GitHub; we evaluate on the same test set and report the average latency. The results are presented in Table 1. SWPT outperforms all prior methods in overall accuracy with 93.5%. Compared with the other Transformer-based methods, SWPT achieves a better result in less time.

4.2 Part Segmentation

Data and Metric. Part Segmentation is a challenging task which aims to divide a 3D point cloud into multiple meaningful parts. We evaluate the models on the ShapeNet [36] dataset which contains 16,880 3D models consisting of 14,006 models for training and 2,874 models for testing. It has 16 categories and 50 parts, where each category has a number of parts between 2 to 6. Following PointNet [20], we downsample each model to 2,048 points with point-wise labels, which is widely used in prior works. For metrics, we evaluate the mean Intersection-over-Union (IoU) and IoU for each object category. Experiment Results. During training we used a random translation in [−0.2, 0.2], a random anisotropic scaling in [0.5, 1.5] and a random input dropout as the data augmentation strategy, while no data augmentation was used in testing. We set the batch size to 32, epochs to 200 and initial learning rate to 0.001 with a


step schedule to adjust it at every 40 epochs by 0.5. Table 2 shows the mean part IoU and category-wise IoU. The results show that our SWPT improves by 1.7% over PointNet and is competitive with most SOTA methods. The reason why SWPT does not achieve the best result in mIoU is that projecting overlapped and shaded points (of objects like motorbikes and cars) on a spherical surface may cause semantic ambiguity issues.

Fig. 5. Visualization of part segmentation results on the ShapeNet dataset. The top row shows the ground truth, results of SWPT are on the bottom row.

Visualization. Figure 5 shows the results of object part segmentation on a number of models in the ShapeNet dataset. The predictions of SWPT on the bottom row are precise and close to the ground truth, from which we can see some structural details captured by SWPT, such as the engines of the plane and the wheels of the automobile.

4.3 Real-World Object Classification

Data and Metric. While ModelNet40 is the most popular benchmark for point cloud classification, it may lack a practical scenario due to its synthetic nature (i.e., complete, well-segmented and noise-free). To evaluate the performance of our model on real-world objects, we conduct experiments on the ScanObjectNN benchmark, which contains 15,000 objects categorized into 15 classes with 2,902 unique instances from the real world. We adopt the following variants of ScanObjectNN in our experiments: (1) OBJ_ONLY, which has only ground-truth objects segmented from the scene datasets, and (2) PB_T50_RS, which is the hardest variant, for training and testing our model. For metrics, we report the overall accuracy (OA) over all classes.

Experiment Results. We use the same batch size, training epochs and other settings as in the experiments on ModelNet40. As presented in Table 3, our SWPT achieves competitive results on both OBJ_ONLY and PB_T50_RS compared with previous methods. There is still a small gap between SWPT and the SOTA method, mostly because of the dimensionality reduction caused by projection, which, however, significantly accelerates model inference.


Table 2. Comparison of part segmentation models on the ShapeNet. mIoU means part-average Intersection-over-Union. All results are quoted from the cited papers.

Method          | mIoU | Aero | Bag  | Cap  | Car  | Chair | Earphone | Guitar | Knife | Lamp | Laptop | Motor | Mug  | Pistol | Rocket | Skateboard | Table
Yi [36]         | 81.4 | 81.0 | 78.4 | 77.7 | 75.7 | 87.6  | 61.9     | 92.0   | 85.4  | 82.5 | 95.7   | 70.6  | 91.9 | 85.9   | 53.1   | 69.8       | 75.3
Kd-Net [12]     | 82.3 | 80.1 | 74.6 | 74.3 | 70.3 | 88.6  | 73.5     | 90.2   | 87.2  | 81.0 | 94.9   | 57.4  | 86.7 | 78.1   | 51.8   | 69.9       | 80.3
PointNet [20]   | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6  | 73.0     | 91.5   | 85.9  | 80.8 | 95.3   | 65.2  | 93.0 | 81.2   | 57.9   | 72.8       | 80.6
PointNet++ [22] | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8  | 71.8     | 91.0   | 85.9  | 83.7 | 95.3   | 71.6  | 94.1 | 81.3   | 58.7   | 76.4       | 82.6
PointCNN [15]   | 86.1 | 84.1 | 86.5 | 86.0 | 80.8 | 90.6  | 79.7     | 92.3   | 88.4  | 85.3 | 96.1   | 77.2  | 95.2 | 84.2   | 64.2   | 80.0       | 83.0
DGCNN [24]      | 85.2 | 84.0 | 83.4 | 86.7 | 77.8 | 90.6  | 74.7     | 91.2   | 87.5  | 82.8 | 95.7   | 66.3  | 94.9 | 81.1   | 63.5   | 74.5       | 82.6
PointConv [32]  | 85.7 | –    | –    | –    | –    | –     | –        | –      | –     | –    | –      | –     | –    | –      | –      | –          | –
PCT1 [7]        | 85.9 | –    | –    | –    | –    | –     | –        | –      | –     | –    | –      | –     | –    | –      | –      | –          | –
PCT2 [8]        | 86.4 | 85.0 | 82.4 | 89.0 | 81.2 | 91.9  | 71.5     | 91.3   | 88.1  | 86.3 | 95.8   | 64.6  | 95.8 | 83.6   | 62.2   | 77.6       | 83.7
SWPT (ours)     | 85.4 | 82.2 | 79.2 | 83.1 | 74.3 | 91.7  | 74.7     | 91.7   | 86.0  | 80.6 | 97.7   | 55.8  | 96.6 | 83.3   | 53.0   | 74.5       | 83.9

Visualization. Figure 6 illustrates the confusion matrices of the mainstream methods on the PB_T50_RS dataset, which is the hardest variant. It can be seen that SWPT only has ambiguity issues between box and pillow, while PointNet and PointNet++ additionally have ambiguity issues between table and desk. Even PCT, which obtains a good experimental result on the synthetic dataset, has some ambiguity, such as pillow vs. bag and table vs. desk. DGCNN has less ambiguity, which might benefit from the EdgeConv module recomputing graphs dynamically in each layer.

4.4 Ablation Study

We conduct a number of ablation studies to analyze the performance of the different components of SWPT. All studies are performed on the shape classification task.

Spherical Window Size. We first evaluate the setting of the window size (θ, ϕ), which determines the local areas on which self-attention is performed. As presented in Table 4, SWPT achieves the best performance when the window size is set to (4, 4). When the window size gets smaller, the model may not have sufficient context to compute self-attention, leading to some loss of local detail. When the window size gets larger, the local area may contain more data points with fewer relationships, which introduces extra noise into the self-attention computation and reduces the model's accuracy.

Cross-Attention. We conducted an ablation study on the Cross-Attention layer. As shown in Table 5, we investigate global self-attention, local self-attention and cross-attention applied a different number of times in each layer. The results indicate that adopting cross-attention twice in each layer achieves the best performance, while further increasing cross-attention within a layer reduces performance. This suggests that relationships between windows are critical in learning local features.


Table 3. Comparison of state-of-the-art methods on ScanObjectNN. All results are quoted from [27] except for the PCT.

Method           | OBJ_ONLY (%) | PB_T50_RS (%)
3DmFV [2]        | 73.8         | 63
PointNet [20]    | 79.2         | 68.2
SpiderCNN [34]   | 79.5         | 73.7
PointNet++ [22]  | 84.3         | 77.9
DGCNN [24]       | 86.2         | 78.1
PointCNN [15]    | 85.5         | 78.5
BGA-DGCNN [27]   | –            | 79.7
BGA-PN++ [27]    | –            | 80.2
PCT [8]          | 80.7         | 71.4
SWPT (ours)      | 85.1         | 77.2

Fig. 6. Confusion matrices of some previous methods on the hardest real-world dataset PB_T50_RS.

Attention Type. We now study the attention type in the self-attention layers. The results are shown in Table 6: scalar attention is more expressive than vector attention in our model. In addition, vector attention consumes more memory and time.

Pooling Type. Finally, we investigate the type of pooling used for gathering local features from a window. As shown in Table 7, Max-pooling outperforms the other methods, even the 'Max-Avg' pooling, which is a concatenation of the results


of Max-pooling and Average-pooling. 'Con-Pool' stands for the operation that concatenates all the features in a spherical window. This indicates that Max-pooling captures the most expressive features in the local area.

Table 4. Ablation study: window size used to perform self-attention

Layers mAcc OA

(2,2) 5

88.7

90.8

(3,3) 4

89.6

92.4

(4,4) 3

90.1

93.5

(8,8) 2

89.4

92.3

Table 5. Ablition study: cross-attention Type

mAcc OA

Local-AT

88.0

Cross-AT

90.1

89.5

92.2

Cross-AT × 2 90.1

93.5

Cross-AT × 3 89.7

92.7

Table 6. Ablition study: cross-attention Type

mAcc OA

Scalar attention 90.1

93.5

Vector attention 89.9

93.2

Table 7. Ablation study: type of pooling to gather local features, Con-Pool refers to the concatenation of features in a window. Pooling

5

mAcc OA

Max-Pool 90.1

93.5

Avg-Pool 87.9

90.2

Max-Avg

88.5

91.4

Con-Pool 88.6

92.3

5 Conclusion

In this paper, we present a Transformer-based architecture for 3D point cloud recognition which achieves competitive results with more efficiency compared with previous methods. To this end, we project the points on a spherical surface


to reduce the dimension of each point, followed by a spherical window layer to perform local self-attention. We also introduce crossing window self-attention module to capture hierarchical features, which is proved that the relationships between different windows are effective to improve the accuracy of recognition tasks. In the future, we expect to study the semantic properties extracting, which would promote the network for other tasks such as 3D point cloud generation and 3D object detection. Acknowledgements. This work was supported by the National Natural Science Foundation of China (NSFC, No. 62272426) and the Research Project by Shanxi Scholarship Council of China (No. 2020-113).

References 1. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014) 2. Ben-Shabat, Y., Lindenbaum, M., Fischer, A.: 3D point cloud classification and segmentation using 3D modified fisher vector representation for convolutional neural networks. arXiv preprint arXiv:1711.08241 (2017) 3. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017) 4. Dai, A.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017) 5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) 6. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 7. Engel, N., Belagiannis, V., Dietmayer, K.: Point transformer. IEEE Access 9, 134826–134840 (2021) 8. Guo, M.H., Cai, J.X., Liu, Z.N., Mu, T.J., Martin, R.R., Hu, S.M.: PCT: point cloud transformer. Comput. Visual Media 7(2), 187–199 (2021) 9. Hu, Q., et al.: RandLA-Net: efficient semantic segmentation of large-scale point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11108–11117 (2020) 10. Hua, B.S., Pham, Q.H., Nguyen, D.T., Tran, M.K., Yeung, S.K.: SceneNN: a scene meshes dataset with annotations. In: Fourth International Conference on 3D Vision (2016) 11. Joseph-Rivlin, M., Zvirin, A., Kimmel, R.: Momen(e)t: flavor the moments in learning to classify shapes (2018) 12. Klokov, R., Lempitsky, V.: Escape from cells: deep KD-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 863–872 (2017) 13. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019)


14. Li, G., Muller, M., Thabet, A., Ghanem, B.: DeepGCNs: can GCNs go as deep as CNNs? In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9267–9276 (2019) 15. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on Xtransformed points. In: Advances in Neural Information Processing Systems, vol. 31 (2018) 16. Lin, Z., et al.: A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130 (2017) 17. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) 18. Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for realtime object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928. IEEE (2015) 19. Paszke, A., Gross, S., Massa, F., Lerer, A., Chintala, S.: PyTorch: an imperative style, high-performance deep learning library (2019) 20. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017) 21. Qi, C.R., Su, H., Nießner, M., Dai, A., Yan, M., Guibas, L.J.: Volumetric and multi-view CNNs for object classification on 3D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5648–5656 (2016) 22. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, vol. 30 (2017) 23. Riegler, G., Osman Ulusoy, A., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3577–3586 (2017) 24. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3693–3702 (2017) 25. Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. IEEE, December 2015 26. Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: flexible and deformable convolution for point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6411–6420 (2019) 27. Uy, M.A., Pham, Q.H., Hua, B.S., Nguyen, T., Yeung, S.K.: Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. IEEE (2020) 28. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017) 29. Wang, P.S., Liu, Y., Guo, Y.X., Sun, C.Y., Tong, X.: O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Graph. (TOG) 36(4), 1–11 (2017) 30. Wei, X., Yu, R., Sun, J.: View-GCN: view-based graph convolutional network for 3D shape analysis. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1847–1856. IEEE (2020) 31. Wu, B., et al.: Visual transformers: token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677 (2020)


32. Wu, W., Qi, Z., Fuxin, L.: PointConv: deep convolutional networks on 3D point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 33. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912–1920 (2015) 34. Xu, Y., Fan, T., Xu, M., Long, Z., Yu, Q.: SpiderCNN: deep learning on point sets with parameterized convolutional filters (2018) 35. Yang, Z., Wang, L.: Learning relationships for multi-view 3D object recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7505–7514 (2019) 36. Yi, L., et al.: A scalable active framework for region annotation in 3D shape collections. ACM Trans. Graph. (ToG) 35(6), 1–12 (2016)

Self-supervised Learning with Multi-view Rendering for 3D Point Cloud Analysis

Bach Tran1(B), Binh-Son Hua1, Anh Tuan Tran1, and Minh Hoai1,2

1 VinAI Research, Hanoi, Vietnam
{v.bachtx12,v.sonhb,v.anhtt152,v.hoainm}@vinai.io
2 Stony Brook University, New York 11794, USA

Abstract. Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or from pre-trained models learned purely from point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training technique for better model initialization by utilizing the multi-view rendering of the 3D data. Our pre-training is self-supervised by a local pixel/point level correspondence loss computed from perspective projection and a global image/point cloud level loss based on knowledge distillation, thus effectively improving upon popular point cloud networks, including PointNet, DGCNN and SR-UNet. These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks. We also analyze the benefits of synthetic and real data for pre-training, and observe that pre-training on synthetic data is also useful for high-level downstream tasks. Code and pre-trained models are available at https://github.com/VinAIResearch/selfsup_pcd.git.

Keywords: Self-supervised learning · Point cloud analysis · Multiple-view rendering · 3D deep learning

1 Introduction

Pixels and points are basic elements in computer vision for visual recognition. In the past decade, image collections have been successfully used to train neural networks for common visual recognition tasks, including object classification and semantic segmentation. Concurrently, advances in depth-sensing technologies, including RGB-D and LiDAR sensors, have enabled the acquisition of large-scale 3D data, facilitating the rapid development of visual recognition methods in 3D, notably neural networks for point cloud analysis in the last few years. Unlike images, annotations for point clouds are generally more expensive to acquire due to the laborious process of scene scanning, reconstruction, and annotation, and


thus point cloud neural networks are often trained with datasets that are much smaller than image datasets. A potential direction to improve the robustness for point cloud neural networks is self-supervised learning. By letting the point cloud network perform some pre-text tasks before supervised learning, a process commonly known as pre-training, the network can perform more effectively than that trained from scratch. With self-supervised learning, the pre-text tasks are designed so that the pre-training does not use additional labels. In 3D deep learning, some initial effort has been spent on investigating this direction [45,52,56]. However, most previous works solely considered self-supervised learning in the 3D domain; only a few works exploited images to support the learning of point cloud neural networks. In an early work, Pham et al. [39] trained autoencoders on both images and point clouds and applied constraints on the latent space of both domains, allowing feature transfers between 2D and 3D. Inspired by the recently growing literature on network pre-training, we explore how to use images to more effectively (self-)supervise point cloud neural networks. Particularly, in this paper, we propose a method that utilizes multi-view rendering to generate pixel/point and image/point cloud pairs for self-supervising a point cloud neural network. We train two neural networks, one for image and one for point cloud, respectively, and direct both networks to agree upon their latent features in the 2D and 3D domains. To achieve this, we use the pixel and point correspondences to formulate a local loss function that encourages features of the correspondences to be similar. This is well-motivated by projective geometry in 3D computer vision that defines the coordinate mapping between the image and 3D space. To further regularize the self-supervision, we utilize knowledge distillation to formulate a global loss that encourages the feature distribution between images and point clouds to be similar as well. Our method works even when there is big domain gap between the pre-train data and test data, e.g., between synthetic and real data. In summary, we make three technical contributions in this paper: (1) a pretraining technique built upon multi-view rendering that advocates the use of multi-view image features to self-supervise the point cloud neural network; (2) a local loss function that exploits pixel-point correspondence in the pre-training; (3) a global loss function that performs knowledge distillation from the images to the 3D point clouds.

2 Related Work

3D Deep Learning: Building a neural network to analyze 3D data is a nontrivial task. Early attempts involve extending neural networks for images to work with volumes [36], and projecting 3D data to 2D views that can be used with traditional neural networks [49]. Recent methods in deep learning with point clouds take a different approach by letting a network input point clouds directly. Two major directions can be taken to implement such idea: learning per-point features by pointwise MLP [42,43], and learning point features by


examining points in a local neighborhood by custom convolutions tailored for point sets [26,30,32] and by graph neural networks [6,47,53]. Notable methods in such directions include PointNet [42] and DGCNN [53]. An inherent limitation of PointNet-based approaches is that they can only process a few thousands of points, limiting the ability to handle large-scale point clouds, where a sliding window is often used as a workaround [42]. More recent developments include the use of sparse tensor and sparse convolution [9,13,14] on large-scale point clouds for semantic segmentation and 3D object detection. We refer readers to [16] for a survey of deep learning methods for point clouds. Self-supervised Learning: Unsupervised pre-training is a useful technique in deep learning with proven success in natural language processing [11] and representation learning [7,15,20,21,37,46,50,59]. For pre-training, one can use generative modeling techniques based on GANs [2,54] and autoencoders [19, 20,52,58], or other self-supervised learning techniques [1,4,8,24,40,45,56,57]. Pre-training is relevant to knowledge distillation [23], a class of techniques for transferring features learned from a large network (teacher network) to a small network (student network). Here we assume that the pre-text task is rather general and can be very different to the downstream tasks, and so a subset of the layers in the pre-trained can be transferred depending on the downstream task. Self-supervised learning techniques for pre-training 3D point cloud networks have been recently explored from multiple perspectives. Early works use a pretext task for self-supervised learning. The pre-text task can be solving a jigsaw puzzle [45], where a point cloud is subdivided into randomly arranged voxels, and the task is to predict for each point the voxel ID the point belongs to. Another pre-text task is point cloud completion [52] (OcCo), where a mechanism similar to mask-based pre-training in natural language processing is utilized. As an extension of autoencoder on 3D point clouds, Eckart et al. [12] apply soft segmentation on point clouds and enforces these partitions to comply a latent parametric model in an encoder-decoder network paradigm. Recent contrastive learning is also shown to be effective for pre-training 3D point clouds [56,60]. PointContrast [56] create positive pairs and negative pairs for contrastive learning by the correspondences between two camera views of a point cloud. DepthContrast [60] learn the representation with multiple 3D data formats including points and voxels. Self-supervised learning with other 3D data modalities [3,17,18,25,33,34,39] has also been explored. Jing et al. [28,29] (CM) use 3D data with multi-modality and build cross-modal and cross-view invariant constraints, maximizing crossmodal agreement of the features of point cloud, mesh, and images, and maximizing cross-view agreement with the image features. Hou et al. [25] use contrastive learning on multi-view images constraints and image-geometry constraint. However, they only focus on 2D downstream tasks. Huang et al. [27] (STRL) proposed self-supervised learning for a sequence of point clouds which utilizes spatiotemporal cues. Pham et al. [39] (LCD) leverages a 2D-3D correspondence dataset


and a triplet loss to transfer features from 2D to 3D only on small cropped regions of images and 3D point clouds. Compared with LCD [39], our method is largely different as we self-supervise 3D point features via multi-view projection in the entire image space. LCD [39] is suitable for image matching and point cloud registration tasks, while our method is suitable for point cloud analysis tasks such as classification and segmentation.

Fig. 1. Overview of our proposed method. The main proposal is pre-training steps that exploit multi-modal data, including multi-view images and point clouds, to learn a 3D feature encoder for effective point cloud representation. This model is then fine-tuned for different downstream tasks.

There are a few concurrent works [3,31]. In [3], the authors considered RGB rendering of the object surfaces but required the mesh textures in addition to the geometry for rendering. Our rendering is more practical in that we consider different rendering techniques and only require colorless point clouds. In [31], the authors focus on data from autonomous driving datasets including KITTI and nuScenes. Our method focuses on object datasets.

3 Self-supervised Learning for 3D Point Clouds

In this section, we describe the proposed self-supervised learning with multiview rendering for point clouds. Our goal is to leverage multi-modal data of 3D objects, in which each object is associated with a 3D point cloud and multiple 2D images from various view points to pre-train the point cloud network. We propose to use multi-view rendering to generate images for input 3D objects that pair with the point clouds for pre-training. Using rendered images to pre-train point


cloud networks implies an advantage that different 3D data representations, including triangle meshes and point clouds, can all be converted into a unified 2D representation to pre-train the networks. To ease this pre-training process, we do not require annotation for the 3D objects and rely on self-supervised learning techniques for the pre-training. Our method consists of two steps. First, we learn a feature representation for 2D images with self-supervised learning by ensuring the similarity between the representation features of two transformed images of the same original image. Second, we use the feature representation of the 2D images to learn the 3D feature representation for 3D point clouds. We illustrate the overview of our method in Fig. 1 and describe the two steps in detail in the next two subsections.

Let D denote the pre-training data, $D = \{P_i, \{y_{ij}, M_{ij}\}_{j=1}^{m}\}_{i=1}^{n}$, where n is the number of objects in the dataset and m the number of 2D views for each object; $P_i$ is the 3D point cloud of the i-th object, and $y_{ij}$ is the projected image of the i-th object in the j-th view using the projection matrix $M_{ij}$.

3.1 Learning Feature Representation for 2D Images

Let us start with learning a discriminative feature representation for the multi-view images. In this step, we simply treat all object views $\{\{y_{ij}\}_{j=1}^{m}\}_{i=1}^{n}$ as items of a set. Following SimCLR [7], we randomly sample a batch of k images from this image set in each training iteration. For each image in the batch, we randomly sample two augmentation operators (crop, color distortion, and Gaussian blur) to generate a pair of correlated images based on the original image. Given an image $x_i$ in the batch, let $x'_i$ and $x''_i$ be its two augmented images. Our objective is to learn a feature extraction network $f^{2d}$ so that the resulting feature vectors for different augmentations of the same image, $x'_i$ and $x''_i$, are similar, while both should be different from the feature vectors of other image augmentations $x'_j$ and $x''_j$ for $j \neq i$. We therefore define the loss for image $x_i$ as:

$$L^{2d}(i) = -\log\frac{\psi(x'_i, x''_i)}{\psi(x'_i, x''_i) + \sum_{j\neq i}\big(\psi(x'_i, x'_j) + \psi(x'_i, x''_j)\big)} - \log\frac{\psi(x''_i, x'_i)}{\psi(x''_i, x'_i) + \sum_{j\neq i}\big(\psi(x''_i, x'_j) + \psi(x''_i, x''_j)\big)} \tag{1}$$

Here, $\psi(x_i, x_j)$ measures the similarity between two images, and we use the exponential cosine similarity of the two feature vectors, i.e.,

$$\psi(x_i, x_j) = \exp\big(\cos\big(g^{2d}(f^{2d}(x_i)),\, g^{2d}(f^{2d}(x_j))\big)/\tau\big), \tag{2}$$

where τ is the temperature hyper-parameter and $g^{2d}$ is a nonlinear projection layer. The loss for a batch of k images is $L^{2d} = \frac{1}{k}\sum_{i=1}^{k} L^{2d}(i)$. In each optimization iteration, we calculate the gradient of this loss to optimize the parameters of the feature extraction network $f^{2d}$, which is a fully convolutional neural network. The input to the network is an RGB image of dimensions H×W×3 and the output is a 3D tensor of size $H_1{\times}W_1{\times}C_1$, where $C_1$ is the number of output channels and $H_1 = H/2^s$, $W_1 = W/2^s$, with s being the number of down-sampling layers in the network. The output tensor can be vectorized to form a global representation vector for the entire image. It can also be up-sampled to yield a feature map with the same width and height as the input image; in this case, there is a corresponding $C_1$-dimensional feature vector for each pixel of the input image.
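The loss of Eqs. (1)-(2) is the familiar SimCLR-style NT-Xent objective; a minimal PyTorch sketch is given below. The function name, temperature value and batching convention are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_2d(z1, z2, tau=0.1):
    """Sketch of the 2D self-supervision objective of Eqs. (1)-(2).
    z1, z2: (k, d) projected features g2d(f2d(x')) and g2d(f2d(x'')) of the two
    augmentations of the same k images."""
    k = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2k, d)
    sim = z @ z.t() / tau                                        # cos similarity / temperature
    self_mask = torch.eye(2 * k, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, float('-inf'))              # drop psi(x, x) with itself
    targets = torch.cat([torch.arange(k, 2 * k), torch.arange(0, k)]).to(sim.device)
    return F.cross_entropy(sim, targets)                         # both directions of Eq. (1)
```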

3.2 Knowledge Transfer from 2D to 3D

Once the feature extraction function $f^{2d}$ for 2D images has been learned, we use it to learn a point-wise feature extraction function $f^{3d}$ for 3D point clouds. The input to this feature extraction is a point cloud of L points, and the output is a 2D tensor of size $L{\times}C_2$, where $C_2$ is the number of feature dimensions; each point of the point cloud has a corresponding $C_2$-dimensional feature vector. To learn $f^{3d}$, we develop a novel scheme that uses 2D-to-3D knowledge transfer, with both a global and a point-wise component.

Global Knowledge Transfer. For global knowledge transfer, we minimize the distance between the aggregated 2D feature vector and the aggregated 3D feature vector:

$$L^{3d}_{glb} = \frac{1}{n}\sum_{i=1}^{n}\Big\| g^{2d}\big(\max_j f^{2d}(y_{ij})\big) - g^{3d}\big(\max f^{3d}(P_i)\big)\Big\|^2 \tag{3}$$

where $P_i$ is the point cloud of the i-th object and $y_{ij}$ is the j-th view of the i-th object. $f^{2d}$ is the feature extractor for 2D images described in Sect. 3.1, and $\max_j f^{2d}(y_{ij})$ is the pixel-wise max-pooling across the different 2D views. $f^{3d}$ is the feature extractor for 3D point clouds, which we seek to learn here, and $\max f^{3d}(P_i)$ is the element-wise max-pooling over the feature vectors of all points of $P_i$. Both $g^{2d}$ and $g^{3d}$ are nonlinear projection layers that map the 2D and 3D feature vectors to a common feature space.

Point-Wise Knowledge Transfer. In addition to global knowledge transfer, we use contrastive learning that minimizes the distance between the feature representation of a 3D point and that of its corresponding 2D pixel. To determine the point-to-pixel correspondences, we project each point of the point cloud $P_i$ onto each image view $y_{ij}$ using the camera projection matrix $M_{ij}$, i.e., $y^{2d}_{ij} = M_{ij} P_i$, where $y^{2d}_{ij}$ is the set of pixels in the rendered image $y_{ij}$ corresponding to $P_i$. For point-wise knowledge transfer, in each optimization iteration we sample a batch of k corresponding pixel-point pairs, and let $\{(z^{2d}_i, z^{3d}_i)\}_{i=1}^{k}$ be the corresponding set of feature vector pairs. For the i-th pixel-point pair, $z^{2d}_i$ is obtained by:


(1) using the 2D feature extraction function f^{2d} on the image that contains the pixel; (2) passing the output to the upsampling feature projection module u^{2d}; and (3) extracting the feature vector at the location of the pixel in the projected feature map. z^{3d}_i is obtained by: (1) using the 3D feature extraction function f^{3d} on the point cloud containing that point; (2) passing the output through the projection function g^{3d}; and (3) extracting the corresponding feature vector of the point in the point cloud.

For point-wise knowledge transfer, we maximize the similarity between the pixel representation vector and the point representation vector, using a loss function inspired by SimCLR [7]. The loss term for the i-th pixel-point pair is:

  L^{3d}_{pnt}(i) = -\log \frac{\psi(z^{2d}_i, z^{3d}_i)}{\psi(z^{2d}_i, z^{3d}_i) + \sum_{j \neq i} \big(\psi(z^{2d}_i, z^{2d}_j) + \psi(z^{2d}_i, z^{3d}_j)\big)}
                    -\log \frac{\psi(z^{3d}_i, z^{2d}_i)}{\psi(z^{3d}_i, z^{2d}_i) + \sum_{j \neq i} \big(\psi(z^{3d}_i, z^{2d}_j) + \psi(z^{3d}_i, z^{3d}_j)\big)} ,    (4)

where ψ(·, ·) is the exponential cosine function defined in Eq. (2). Intuitively, both 2D and 3D features can be regarded as augmentations of a common latent feature, so they form a positive pair whose similarity can be maximized with the contrastive loss. The total loss function for a batch of k pixel-point pairs is L^{3d}_{pnt} = \frac{1}{k} \sum_{i=1}^{k} L^{3d}_{pnt}(i).

Combined Loss Function. To pre-train the point cloud network, we minimize a loss function that is the weighted combination of the global knowledge transfer loss and the point-wise knowledge transfer loss:

  L^{3d} = \lambda_{glb} L^{3d}_{glb} + \lambda_{pnt} L^{3d}_{pnt}.    (5)

In our experiments, we simply use λglb = λpnt = 1. After pre-training, we can now use the pre-trained weights to initialize the training of downstream tasks.
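To make the objective concrete, the following is a minimal PyTorch sketch of how Eqs. (3)-(5) could be computed for one batch. Tensor shapes, the temperature of the exponential cosine function ψ, and the names of the projection heads (g2d, g3d) are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def exp_cos(a, b, tau=0.07):
    # psi(a, b) = exp(cos(a, b) / tau); tau is an assumed temperature.
    return torch.exp(F.cosine_similarity(a, b, dim=-1) / tau)

def global_loss(view_feats_2d, point_feats_3d, g2d, g3d):
    # view_feats_2d: (B, V, C1) per-view 2D features; point_feats_3d: (B, L, C2) per-point 3D features.
    agg_2d = g2d(view_feats_2d.max(dim=1).values)    # max-pool over views, then project
    agg_3d = g3d(point_feats_3d.max(dim=1).values)   # max-pool over points, then project
    return ((agg_2d - agg_3d) ** 2).sum(dim=-1).mean()

def pointwise_loss(z2d, z3d, tau=0.07):
    # z2d, z3d: (k, D) features of corresponding pixel-point pairs (Eq. 4).
    k = z2d.shape[0]
    sim_22 = exp_cos(z2d.unsqueeze(1), z2d.unsqueeze(0), tau)   # (k, k)
    sim_23 = exp_cos(z2d.unsqueeze(1), z3d.unsqueeze(0), tau)
    sim_33 = exp_cos(z3d.unsqueeze(1), z3d.unsqueeze(0), tau)
    pos = sim_23.diagonal()
    off = ~torch.eye(k, dtype=torch.bool, device=z2d.device)
    denom_2d = pos + (sim_22 * off).sum(dim=1) + (sim_23 * off).sum(dim=1)
    denom_3d = pos + (sim_23.t() * off).sum(dim=1) + (sim_33 * off).sum(dim=1)
    return -(torch.log(pos / denom_2d) + torch.log(pos / denom_3d)).mean()

def combined_loss(view_feats_2d, point_feats_3d, z2d, z3d, g2d, g3d,
                  lambda_glb=1.0, lambda_pnt=1.0):
    return (lambda_glb * global_loss(view_feats_2d, point_feats_3d, g2d, g3d)
            + lambda_pnt * pointwise_loss(z2d, z3d))
```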

4 Experiments

4.1 Implementation Details

Dataset for Pre-training. Unless otherwise mentioned, we use ModelNet40 [55] for pre-training. ModelNet40 is a synthetic dataset that includes 9,480 training samples and 2,468 test samples in 40 categories. ModelNet40 represents each object by a 3D mesh, making it suitable for our multi-view rendering purpose. For each object in the training set of ModelNet40, we generate its point cloud using farthest-point sampling. We render the object into multi-view images by moving a camera around the object. Unless otherwise mentioned, each point cloud has 1,024 points rendered into 12 views at 32×32 resolution. We use 12 views as they tend to cover all object directions in general. We keep the views at the low resolution of 32×32 to avoid running out of memory during training.
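For reference, a small NumPy sketch of greedy farthest-point sampling as it could be used to generate the 1,024-point clouds; function and variable names are ours and not taken from the released code.

```python
import numpy as np

def farthest_point_sampling(points, n_samples=1024, seed=0):
    """Repeatedly pick the point farthest from the already-selected set. points: (N, 3)."""
    rng = np.random.default_rng(seed)
    selected = np.zeros(n_samples, dtype=np.int64)
    selected[0] = rng.integers(points.shape[0])
    # distance of every point to the closest selected point so far
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for i in range(1, n_samples):
        selected[i] = int(dist.argmax())
        dist = np.minimum(dist, np.linalg.norm(points - points[selected[i]], axis=1))
    return points[selected]

# pc = farthest_point_sampling(mesh_vertices, 1024)   # mesh_vertices is a hypothetical (N, 3) array
```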


Our multi-view rendering is implemented as follows. First, each object is normalized into a unit cube. To generate multi-view images from a mesh object, we used Blender [44] with fixed camera parameters (focal length 35, sensor width 32, and sensor height 32) and a light source. The camera positions are chosen to cover the surrounding views of the object, and the distances from each camera to its neighbor positions are equal.

2D Feature Representation Learning. We use ResNet50 [22] as the 2D feature extractor f^{2d} in the 2D self-supervised learning process. We use the Adam optimizer with an initial learning rate of 0.001 and no learning-rate decay. We train the self-supervised model for 1000 epochs with batch size 512.

Table 1. Comparison among random, Jigsaw [45], OcCo [52], CM [29], STRL [27], and our initialization on the object classification downstream task (DGCNN backbone). We report the mean and standard deviation at the best epoch over three runs.

            Random      Jigsaw      OcCo        CM          STRL        Ours
MN40 [55]   92.7 ± 0.1  92.9 ± 0.1  92.9 ± 0.0  93.0 ± 0.1  93.1 ± 0.1  93.2 ± 0.1
SO [51]     82.8 ± 0.5  82.1 ± 0.2  83.2 ± 0.2  83.0 ± 0.2  83.2 ± 0.2  84.3 ± 0.6
SO BG [51]  81.4 ± 0.5  82.0 ± 0.4  82.9 ± 0.4  82.2 ± 0.2  83.2 ± 0.2  84.5 ± 0.6
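As an illustration of "equal neighbor distances", one possible layout for 12 surrounding cameras is the vertex set of a regular icosahedron; the paper does not specify the exact arrangement used with Blender, so the sketch below is only an assumed configuration.

```python
import numpy as np

def icosahedron_cameras(radius=2.0):
    """12 camera centers at the vertices of a regular icosahedron; every camera is
    equidistant from its neighbors and is meant to look at the origin. (Assumed layout.)"""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    verts = np.array([[-1,  phi, 0], [1,  phi, 0], [-1, -phi, 0], [1, -phi, 0],
                      [0, -1,  phi], [0, 1,  phi], [0, -1, -phi], [0, 1, -phi],
                      [ phi, 0, -1], [ phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1]], dtype=np.float64)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    return verts * radius    # (12, 3) camera positions for the renderer

print(icosahedron_cameras().shape)  # (12, 3)
```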

3D Feature Representation Learning. We experiment with two common backbones, PointNet [42] and DGCNN [53], which can be used for both classification and segmentation tasks. For PointNet [42], we use the Adam optimizer with an initial learning rate of 0.001, decayed by a factor of 0.7 every 20 epochs. The momentum of batch normalization starts at 0.5 and is divided by 2 every 20 epochs. For DGCNN [53], we use an SGD optimizer with an initial learning rate of 0.1 and momentum 0.9. We use the CosineAnnealingLR scheduler [35] for learning-rate decay. For both backbones, we train the model for 250, 200, and 100 epochs for the classification, part segmentation, and semantic segmentation tasks, respectively, with a batch size of 32. After obtaining the pre-trained models, we test their effectiveness when training on different downstream tasks.
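A short PyTorch sketch of the optimizer and scheduler settings described above; the model objects and epoch counts are placeholders, and the batch-norm momentum schedule for PointNet is omitted.

```python
from torch.optim import Adam, SGD
from torch.optim.lr_scheduler import StepLR, CosineAnnealingLR

def build_optimizer(model, backbone="dgcnn", epochs=250):
    if backbone == "pointnet":
        opt = Adam(model.parameters(), lr=0.001)
        sched = StepLR(opt, step_size=20, gamma=0.7)     # lr * 0.7 every 20 epochs
    else:  # dgcnn
        opt = SGD(model.parameters(), lr=0.1, momentum=0.9)
        sched = CosineAnnealingLR(opt, T_max=epochs)     # cosine learning-rate decay
    return opt, sched
```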

4.2 Object Classification

We first experiment with object classification for 3D point cloud analysis. Two standard benchmarks are used, namely the ModelNet40 [55] and ScanObjectNN [51] datasets. We follow the previous paper [52] and use ModelNet40 for both pre-training and the downstream task. ScanObjectNN is a real-world dataset that has 15 categories with 2,321 and 581 samples for training and testing, respectively. We use the default variant (OBJ_ONLY, denoted by ScanObjectNN) and the variant with background (OBJ_BG, denoted by ScanObjectNN BG). We follow the experimental setting of the original PointNet [42].


We compare the performance of DGCNN [53] with and without pre-training. The results are shown in Table 1. We also provide comparisons with the PointNet backbone [42] in the supplementary material. As can be seen, models with pre-training outperform their randomly initialized counterparts. We further compare our method to previous point cloud pre-training methods, including Jigsaw [45], OcCo [52], CM [29], and STRL [27]. In particular, Jigsaw [45] learns to solve jigsaw puzzles as a pretext task for pre-training. OcCo [52] adapts mask-based pre-training from natural language processing and proposes a point cloud completion task for pre-training. CM [29] considers self-supervision from cross-modality and cross-view feature learning, which shares some similarity with ours. Our method differs in that we use a contrastive loss to learn 2D features and an L2 loss to match 2D-3D features, while CM [29] uses a triplet loss for 2D features and a cross-entropy loss for matching 2D-3D features. STRL [27] explores self-supervision with spatial-temporal representation learning. In Table 1, it can be seen that our proposed self-supervision with contrastive loss and multi-view rendering outperforms the other initialization methods on both the ModelNet40 and ScanObjectNN datasets.

Fig. 2. Test-set accuracy over different training epochs in the object classification task. The plots show that previous pre-training methods are only marginally more effective than random initialization, while our method shows a clear improvement.

Table 2. Performance on the object classification task when trained with fewer data. Our method has significant gains compared to other initialization methods. We report the mean at the best epoch over three runs.

       PointNet                           DGCNN
       Random  Jigsaw  OcCo   Ours        Random  Jigsaw  OcCo   Ours
5%     73.2    73.8    73.9   77.9        82.0    82.1    82.3   84.9
10%    75.2    77.3    75.6   79.0        84.7    84.1    84.9   86.6
20%    81.3    82.9    81.6   84.6        89.4    89.2    89.1   90.2
50%    86.6    86.5    87.1   87.6        91.6    91.8    91.7   92.4
70%    88.3    88.4    88.4   88.7        92.3    92.4    92.5   92.8
90%    88.5    88.8    88.8   89.4        92.6    92.9    92.9   93.1

4.3 Network Analysis

Accuracy Over Epochs. Figure 2 plots the accuracy on the test set over different training epochs. The proposed initialization helps both PointNet and DGCNN converge faster and obtain better accuracy than other initialization methods. For example, when we use ModelNet40 with 10% of the training data, the model with our initialization converges after around 15 epochs, while with other initialization methods it takes around 40 epochs. For ScanObjectNN (OBJ_BG variant), the models converge after about 20 epochs with our initialization and about 45 epochs with other methods.

Training Size. Our pre-training allows the point cloud network to be trained with less data compared to initialization with random weights. To demonstrate this, we reduce the number of samples in the training set of ModelNet40 to 5%, 10%, 20%, 50%, and 70%. We then use these datasets to train the model for the object classification task. Finally, we evaluate the learned models on the test set of ModelNet40. Table 2 shows the results with random initialization, Jigsaw [45], OcCo [52], and our initialization, respectively. As can be seen, models using our proposed initialization outperform the other models.

Number of Views. We analyze the influence of multi-view rendering on our pre-training performance. We render the shapes with 4, 8, 12, and 24 views for the object classification task. The results are shown in Table 3. For PointNet, the performance is best with 8 views, while for DGCNN it is generally enough to use 4 views, although DGCNN on ScanObjectNN performs best with 24 views.

Table 3. Performance of object classification tested with different numbers of views in multi-view rendering.

            PointNet                              DGCNN
            4-views  8-views  12-views  24-views  4-views  8-views  12-views  24-views
MN40 [55]   88.9     89.2     88.9      88.9      92.8     92.3     92.5      92.3
SO [51]     79.0     80.4     79.3      79.1      82.7     82.6     82.8      84.9
SO BG [51]  74.2     77.1     75.7      76.6      82.8     81.9     82.6      81.4

Classification with SVM. To evaluate the generalization ability of our pre-trained model, we follow a test scenario similar to [45]. First, we freeze the weights of the pre-trained model and pass the 3D objects through this model to obtain their embeddings. Then, we train a Support Vector Machine (SVM) on the embeddings of the training set and evaluate it on the test set of three datasets: ModelNet40, ScanObjectNN, and ScanObjectNN (OBJ_BG variant). For the SVM, we use grid search to find the best parameters of an SVM with a Radial Basis Function kernel. The results are shown in Table 4. The proposed initialization outperforms the other initialization methods, Jigsaw and OcCo, sometimes by a wide margin, as on ScanObjectNN (OBJ_BG variant). We also provide additional comparisons to previous self-supervised methods on ModelNet40 in Table 5. As can be seen, our proposed method outperforms almost all other methods with both PointNet and DGCNN; the exception is that with PointNet our method is ranked second while ParAE [12] performs best.

Table 4. Results of an SVM applied to the object embeddings learned from different initializations. Features learned by our method are more discriminative than those of other methods, resulting in more accurate classification.

                        PointNet               DGCNN
                        Jigsaw  OcCo   Ours    Jigsaw  OcCo   Ours
ModelNet40 [55]         82.5    87.2   89.7    83.1    89.5   91.7
ScanObjectNN [51]       49.7    62.1   70.2    57.8    69.0   76.3
ScanObjectNN BG [51]    48.9    61.7   69.5    51.1    67.5   74.2
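The SVM protocol above can be sketched with scikit-learn as follows; the C/gamma grid is an assumption for illustration, not the grid used in the paper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def svm_on_embeddings(train_emb, train_labels, test_emb, test_labels):
    """Fit an RBF-kernel SVM on frozen point-cloud embeddings and report test accuracy."""
    grid = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.001]},
                        cv=3, n_jobs=-1)
    grid.fit(train_emb, train_labels)
    return grid.best_estimator_.score(test_emb, test_labels)

# acc = svm_on_embeddings(train_emb, y_train, test_emb, y_test)  # hypothetical pre-computed embeddings
```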

Ablation Study of Loss Functions. Table 6 reports the performance of our method on the classification task when the global loss and the pixel-point loss are used together or individually. The network achieves the best performance when trained with both losses. Using either the global loss or the pixel-point loss alone results in an accuracy drop, especially on the ScanObjectNN dataset [51]. This is because the global loss is only useful for distilling knowledge from an image view to a point cloud, while the pixel-point loss encourages the model to learn consistent features locally. Using both losses lets the model have the best of both worlds.

Table 5. More comparisons of SVM classification on ModelNet40.

          Rotation [40]  STRL [27]  ParAE [12]  CrossPoint [3]  Ours
PointNet  88.6           88.3       90.3        89.1            89.7
DGCNN     99.8           90.9       91.6        91.2            91.7

Table 6. Effect of the loss function choice on our pre-training.

                        PointNet                             DGCNN
                        L^{3d}_glb  L^{3d}_pnt  L^{3d}       L^{3d}_glb  L^{3d}_pnt  L^{3d}
ModelNet40 [55]         88.5        88.5        88.9         92.4        92.1        92.5
ScanObjectNN [51]       77.6        78.8        79.3         81.8        81.1        82.8
ScanObjectNN BG [51]    74.5        74.2        75.7         81.6        81.6        82.6

Table 7. Effect of rendering techniques on pre-training with PointNet [42].

                        RGB   Silhouette  Shading
ModelNet40 [55]         88.3  88.9        88.9
ScanObjectNN [51]       79.7  78.8        79.3
ScanObjectNN BG [51]    75.1  75.6        75.7

Multi-view Rendering. Our pre-training requires multi-view image rendering, which can be implemented with a wide range of rendering techniques. We study the effect of images rendered from the object mesh, from 3D position encoding, and from the object silhouette on the classification task (please refer to the example renderings in Fig. 3). For the original object mesh, we use Blender [44] to render the object geometry with Phong shading, resulting in grayscale shaded images. For 3D position encoding, the images are rendered directly from point clouds. Specifically, we first assign a pseudo-color (RGB) to each point of a point cloud based on its 3D coordinates, then project the points onto the image plane with preset camera projection matrices. For the object silhouette, the process is generally similar except that we use black instead of the pseudo-color for each point in the point cloud. For pixels that have more than one corresponding point, we choose the point with the minimum distance to the camera. The performance of object classification with the different rendering techniques is reported in Table 7, where there is no best technique overall. Using shaded images results in slightly higher accuracy than using position encoding and silhouette rendering on ModelNet40 and ScanObjectNN BG. This is because shaded images often have more details than position encoding and silhouette images, since shaded images are rendered from meshes. Exploring more robust rendering techniques for self-supervised learning is left for future work.
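A NumPy sketch of the position-encoding rendering just described: pseudo-colors derived from the coordinates, projection with a given camera matrix, and a z-buffer that keeps the point closest to the camera. The resolution, background color, and variable names are assumptions.

```python
import numpy as np

def render_position_encoding(points, P, hw=(32, 32)):
    """points: (N, 3) point cloud in a unit cube; P: (3, 4) camera projection matrix.
    Returns an (H, W, 3) image whose colors encode 3D position (the RGB variant);
    using black instead of the pseudo-color gives the silhouette variant."""
    h, w = hw
    colors = (points - points.min(0)) / (points.max(0) - points.min(0) + 1e-8)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    proj = homo @ P.T                                  # (N, 3) homogeneous image coordinates
    uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int)
    depth = proj[:, 2]
    img = np.ones((h, w, 3), dtype=np.float32)         # white background (assumed)
    zbuf = np.full((h, w), np.inf)
    for (u, v), d, c in zip(uv, depth, colors):
        if 0 <= v < h and 0 <= u < w and d < zbuf[v, u]:   # keep the closest point per pixel
            zbuf[v, u] = d
            img[v, u] = c
    return img
```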

Fig. 3. Object rendering with position encoding (RGB), silhouette, and shading.

Table 8. Results of four initializations on the part segmentation task on the ShapeNetPart dataset [53]. We report the mean and standard error of mAcc and mIoU over three runs.

       PointNet                                              DGCNN
       Random      Jigsaw      OcCo        Ours              Random      Jigsaw      OcCo        Ours
mAcc   93.3 ± 0.2  93.0 ± 0.0  93.3 ± 0.1  93.4 ± 0.0        94.2 ± 0.0  94.1 ± 0.0  94.3 ± 0.0  94.2 ± 0.1
mIoU   83.1 ± 0.3  83.2 ± 0.1  83.0 ± 0.2  83.3 ± 0.1        84.7 ± 0.0  84.5 ± 0.1  84.7 ± 0.1  84.7 ± 0.1


Table 9. Fold 1 of overall point accuracy (mAcc) and mean Intersection-over-Union (mIoU) on S3DIS (Stanford Area 5 test) [5].

       PointNet                       DGCNN
       Random  Jigsaw  OcCo   Ours    Random  Jigsaw  OcCo   Ours
mAcc   83.9    82.5    83.6   85.0    86.8    86.8    87.0   87.0
mIoU   43.6    43.6    44.5   46.7    49.3    48.2    49.5   49.9

4.4 Part Segmentation and Scene Segmentation

Beyond classification, we conduct experiments to validate our pre-training on the semantic part segmentation and scene segmentation tasks. We first experiment with object part segmentation on the ShapeNetPart dataset [53], which includes 16,881 objects from 16 categories. Each object is represented by 2,048 points, and each point belongs to one of 50 part types. Most objects in the dataset are divided into two to five parts. As shown in Table 8, our initialization is slightly better than random initialization, Jigsaw, and OcCo in both the overall point accuracy (mAcc) and mean Intersection-over-Union (mIoU) metrics. We observe that the improvement is minor in the part segmentation task because the network architecture used for part segmentation is largely different from the pre-trained networks, and therefore only a few layers of the part segmentation network can be initialized by the pre-trained model.

We also experiment with semantic scene segmentation on the Stanford Large-Scale 3D Indoor Spaces dataset (S3DIS) [5]. This dataset contains point clouds of 272 rooms from 6 areas and 13 categories. Each room is split into 1 m × 1 m blocks. Each point is represented by a 9D vector including XYZ, RGB, and the normalized location in the room. During training, each block is sampled with 4,096 points, but during testing, all points are used. The results are shown in Table 9. We see that models initialized by our method outperform the others with both PointNet [42] and DGCNN [53].

Table 10. Comparison to PointContrast (PC) [56] on the semantic segmentation and 3D object detection tasks on the S3DIS [5], ScanNet [10], and SUN RGB-D [48] datasets. Our method outperforms PointContrast when pre-trained on both datasets. The value in parentheses indicates the performance difference compared to the Random case.

Dataset          Task (Metric)      Random   PC [56] ModelNet   Ours ModelNet   PC [56] ScanNet   Ours ScanNet
S3DIS (Area 5)   sem. seg. (Acc)    72.5     71.2 (-1.3)        73.2 (+0.7)     73.0 (+0.5)       73.0 (+0.5)
S3DIS (Area 5)   sem. seg. (IoU)    64.5     64.1 (-0.4)        66.0 (+1.5)     66.1 (+1.6)       66.5 (+2.0)
ScanNet          sem. seg. (Acc)    80.2     80.3 (+0.1)        81.1 (+0.9)     80.8 (+0.6)       81.0 (+0.8)
ScanNet          sem. seg. (IoU)    72.4     72.5 (+0.1)        73.3 (+0.9)     73.1 (+0.7)       73.6 (+1.2)
ScanNet          3D det. (AP50)     35.2     36.6 (+1.4)        38.2 (+3.0)     36.1 (+0.9)       39.2 (+4.0)
ScanNet          3D det. (AP25)     56.5     58.2 (+1.7)        58.4 (+1.9)     59.5 (+3.0)       60.3 (+3.8)
SUN RGB-D        3D det. (AP50)     32.3     34.8 (+2.5)        34.9 (+2.6)     34.8 (+2.5)       35.1 (+2.8)
SUN RGB-D        3D det. (AP25)     55.5     57.8 (+2.3)        58.1 (+2.6)     57.4 (+1.9)       57.8 (+2.3)

4.5 Pre-training with Synthetic vs. Real-World Data

Multi-view rendering can easily be used for self-supervised learning when working with synthetic data, as we have shown with ModelNet40 [55]. Real-world 3D datasets, however, often do not provide such multi-view images, limiting our choices for pre-training. In this section, we investigate the role of synthetic and real-world data in pre-training by comparing to PointContrast [56] and DepthContrast [60] on the segmentation and detection tasks. We run different experiments using Sparse Residual U-Net (SR-UNet) [9] as the network backbone. Compared to the PointNet and DGCNN backbones used in the previous sections, SR-UNet uses sparse convolutions to learn features on point clouds, discarding the need for a sliding window when processing large-scale point clouds. For pre-training data, we use ModelNet40 [55] as synthetic data and ScanNet [10] as real data.

Pre-training. As the original PointContrast [56] only supports ScanNet for pre-training, we adapt ModelNet40 to PointContrast by using surface point cloud pairs, formed for every two consecutive views, instead of the provided point cloud pairs from ScanNet. As for our model, we use two view images when their corresponding point clouds have at least 30% overlap. To define pixel-point pairs, we reconstruct a point cloud from the first depth image in an image pair, then project it onto the two color images to get pixel-point correspondences. During training, we follow the original setting of PointContrast [56]. For our model pre-trained on ScanNet [10], we use the ResNet50 [22] pre-trained on ImageNet provided by PyTorch [38] as the 2D feature extractor. Besides, all images are resized to 240 × 320, and we only use the point-wise knowledge transfer loss for pre-training. We train the model with one GPU and four GPUs for the ModelNet40 and ScanNet datasets, respectively.

Segmentation and Detection Results. We evaluate four pre-training configurations on the semantic segmentation task on two datasets, S3DIS [5] and ScanNet [10]. We show the results in Table 10 (comparisons to PointContrast [56]) and Table 11 (comparisons to DepthContrast [60]). In Table 10, on both datasets, the performance gap between our models pre-trained on synthetic and real data is small. When testing on S3DIS, our network pre-trained on ModelNet even provides slightly better performance than the model pre-trained on ScanNet on the Acc. metric, and it offers a 2% increase compared with the PointContrast counterpart on both the Acc. and IoU metrics. In Table 11, we also compare with DepthContrast on the semantic segmentation task. For S3DIS, our pre-trained models on both synthetic and real data achieve better performance, by approximately 2% on IoU and 4% on Acc. For ScanNet, our pre-trained model on synthetic data outperforms the random setting but is slightly less effective than DepthContrast. However, our pre-trained model on real data outperforms both the random and DepthContrast initializations.
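A small sketch of one way the overlap ratio between the point clouds of two views could be estimated when selecting view pairs with at least 30% overlap; the nearest-neighbor distance threshold is an assumed parameter, not a value from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def overlap_ratio(points_a, points_b, dist_thresh=0.03):
    """Fraction of points in view A that have a point of view B within dist_thresh."""
    tree = cKDTree(points_b)
    d, _ = tree.query(points_a, k=1)
    return float((d < dist_thresh).mean())

# keep_pair = overlap_ratio(pc_view1, pc_view2) >= 0.30   # select pairs with >= 30% overlap
```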


Table 11. Comparison to DepthContrast [60] on the semantic segmentation task on the S3DIS [5] and ScanNet [10] datasets. The value in parentheses indicates the performance difference compared to the Random case.

Dataset          Task (Metric)     Random   DepthContrast [60]   Ours ModelNet   Ours ScanNet
S3DIS (Area 5)   sem. seg. (Acc)   70.9     72.1 (+1.2)          75.1 (+4.2)     74.5 (+3.6)
S3DIS (Area 5)   sem. seg. (IoU)   64.0     64.8 (+0.8)          66.8 (+2.8)     66.5 (+2.5)
ScanNet          sem. seg. (Acc)   77.2     77.6 (+0.4)          77.4 (+0.2)     78.3 (+1.1)
ScanNet          sem. seg. (IoU)   69.1     69.9 (+0.8)          69.2 (+0.1)     70.7 (+1.6)

We also compare on the 3D object detection task on the ScanNet [10] and SUN RGB-D [48] datasets. Following [56], we replace the original PointNet++ [43] backbone of VoteNet [41] with SR-UNet [9]. The results are also shown in Table 10. As can be seen, our method outperforms PointContrast when pre-training on the same dataset. When using synthetic data, our model obtains an mAP50 about two points higher than its PointContrast counterpart (38.2 vs. 36.6 on ScanNet). When using real data, the mAP scores increase slightly and achieve state-of-the-art performance.

5 Conclusion

We propose a self-supervised learning method based on multi-view rendering to pre-train 3D point cloud neural networks. Our self-supervision with multi-view rendering and global and local loss functions yields state-of-the-art performance on several downstream tasks, including object classification, semantic segmentation and object detection. Our pre-training method works well on both synthetic and real-world data; it also demonstrates the effectiveness of pre-training on synthetic data such as ModelNet40 for downstream tasks on real data, such as semantic segmentation and 3D object detection.

References

1. Achituve, I., Maron, H., Chechik, G.: Self-supervised learning for domain adaptation on point clouds. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 123–133 (2021)
2. Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3D point clouds. In: International Conference on Machine Learning, pp. 40–49. PMLR (2018)
3. Afham, M., Dissanayake, I., Dissanayake, D., Dharmasiri, A., Thilakarathna, K., Rodrigo, R.: CrossPoint: self-supervised cross-modal contrastive learning for 3D point cloud understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9902–9912 (2022)
4. Alliegro, A., Boscaini, D., Tommasi, T.: Joint supervised and self-supervised learning for 3D real world challenges. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 6718–6725. IEEE (2021)


5. Armeni, I., et al.: 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1534–1543 (2016)
6. Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)
7. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)
8. Chen, Y., et al.: Shape self-correction for unsupervised point cloud understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8382–8391 (2021)
9. Choy, C., Gwak, J., Savarese, S.: 4D spatio-temporal convnets: Minkowski convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3075–3084 (2019)
10. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the Computer Vision and Pattern Recognition (CVPR). IEEE (2017)
11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
12. Eckart, B., Yuan, W., Liu, C., Kautz, J.: Self-supervised learning on 3D point clouds by learning discrete generative models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8248–8257 (2021)
13. Graham, B., Engelcke, M., Van Der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018)
14. Graham, B., van der Maaten, L.: Submanifold sparse convolutional networks. arXiv preprint arXiv:1706.01307 (2017)
15. Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. Adv. Neural. Inf. Process. Syst. 33, 21271–21284 (2020)
16. Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., Bennamoun, M.: Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)
17. Gupta, S., Hoffman, J., Malik, J.: Cross modal distillation for supervision transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2827–2836 (2016)
18. Hafner, F., Bhuiyan, A., Kooij, J.F., Granger, E.: A cross-modal distillation network for person re-identification in RGB-depth. arXiv preprint arXiv:1810.11641 (2018)
19. Han, Z., Wang, X., Liu, Y.S., Zwicker, M.: Multi-angle point cloud-VAE: unsupervised feature learning for 3D point clouds from multiple angles by joint self-reconstruction and half-to-half prediction. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10441–10450. IEEE (2019)
20. Hassani, K., Haley, M.: Unsupervised multi-task feature learning on point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8160–8171 (2019)
21. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)


22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
23. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
24. Hou, J., Graham, B., Nießner, M., Xie, S.: Exploring data-efficient 3D scene understanding with contrastive scene contexts. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15587–15597 (2021)
25. Hou, J., Xie, S., Graham, B., Dai, A., Nießner, M.: Pri3D: can 3D priors help 2D representation learning? In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5693–5702 (2021)
26. Hua, B.S., Tran, M.K., Yeung, S.K.: Pointwise convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 984–993 (2018)
27. Huang, S., Xie, Y., Zhu, S.C., Zhu, Y.: Spatio-temporal self-supervised representation learning for 3D point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6535–6545 (2021)
28. Jing, L., Chen, Y., Zhang, L., He, M., Tian, Y.: Self-supervised modal and view invariant feature learning. arXiv preprint arXiv:2005.14169 (2020)
29. Jing, L., Zhang, L., Tian, Y.: Self-supervised feature learning by cross-modality and cross-view correspondences. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1581–1591 (2021)
30. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on x-transformed points. Adv. Neural. Inf. Process. Syst. 31, 820–830 (2018)
31. Li, Z., et al.: SimIPU: simple 2D image and 3D point cloud unsupervised pre-training for spatial-aware visual representations. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 1500–1508 (2022)
32. Liu, Y., Fan, B., Xiang, S., Pan, C.: Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8895–8904 (2019)
33. Liu, Y.C., et al.: Learning from 2D: pixel-to-point knowledge transfer for 3D pre-training. arXiv preprint arXiv:2104.04687 (2021)
34. Liu, Z., Qi, X., Fu, C.W.: 3D-to-2D distillation for indoor scene parsing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4464–4474 (2021)
35. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
36. Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928. IEEE (2015)
37. Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6707–6717 (2020)
38. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems (2019)
39. Pham, Q.H., Uy, M.A., Hua, B.S., Nguyen, D.T., Roig, G., Yeung, S.K.: LCD: learned cross-domain descriptors for 2D–3D matching. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)
40. Poursaeed, O., Jiang, T., Qiao, H., Xu, N., Kim, V.G.: Self-supervised learning of point clouds via orientation estimation. In: 2020 International Conference on 3D Vision (3DV), pp. 1018–1028. IEEE (2020)


41. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE International Conference on Computer Vision (2019)
42. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
43. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems 30 (2017)
44. Roosendaal, T.: Blender - a 3D modelling and rendering package. Blender Foundation (2018). http://www.blender.org
45. Sauder, J., Sievers, B.: Self-supervised deep learning on point clouds by reconstructing space. In: Advances in Neural Information Processing Systems 32 (2019)
46. Sharma, C., Kaul, M.: Self-supervised few-shot learning on point clouds. Adv. Neural. Inf. Process. Syst. 33, 7212–7221 (2020)
47. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3693–3702 (2017)
48. Song, S., Lichtenberg, S.P., Xiao, J.: SUN RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576 (2015)
49. Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 945–953 (2015)
50. Thabet, A., Alwassel, H., Ghanem, B.: Self-supervised learning of local features in 3D point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 938–939 (2020)
51. Uy, M.A., Pham, Q.H., Hua, B.S., Nguyen, D.T., Yeung, S.K.: Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. In: International Conference on Computer Vision (ICCV) (2019)
52. Wang, H., Liu, Q., Yue, X., Lasenby, J., Kusner, M.J.: Unsupervised point cloud pre-training via occlusion completion. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9782–9792 (2021)
53. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 38(5), 1–12 (2019)
54. Wu, J., Zhang, C., Xue, T., Freeman, W.T., Tenenbaum, J.B.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 82–90 (2016)
55. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
56. Xie, S., Gu, J., Guo, D., Qi, C.R., Guibas, L., Litany, O.: PointContrast: unsupervised pre-training for 3D point cloud understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 574–591. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_34
57. Yamada, R., Kataoka, H., Chiba, N., Domae, Y., Ogata, T.: Point cloud pre-training with natural 3D structures. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21283–21293 (2022)


58. Yu, X., Tang, L., Rao, Y., Huang, T., Zhou, J., Lu, J.: Point-BERT: pre-training 3D point cloud transformers with masked point modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19313–19322 (2022)
59. Zhang, L., Zhu, Z.: Unsupervised feature learning for point cloud understanding by contrasting and clustering using graph convolutional neural networks. In: 2019 International Conference on 3D Vision (3DV), pp. 395–404. IEEE (2019)
60. Zhang, Z., Girdhar, R., Joulin, A., Misra, I.: Self-supervised pretraining of 3D features on any point-cloud. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10252–10263 (2021)

PointFormer: A Dual Perception Attention-Based Network for Point Cloud Classification

Yijun Chen1,2,3, Zhulun Yang1,2,3, Xianwei Zheng3(B), Yadong Chang1,2, and Xutao Li1,2

1 Key Lab of Digital Signal and Image Processing of Guangdong Province, Shantou University, Shantou 515063, China
  {21yjchen1,15zlyang3,21ydchang,lixt}@stu.edu.cn
2 Department of Electronic Engineering, Shantou University, Shantou 515063, China
3 School of Mathematics and Big Data, Foshan University, Foshan 52800, China
  [email protected]

Abstract. Point cloud classification is a fundamental but still challenging task in 3-D computer vision. The main issue is that learning representational features from initial point cloud objects is always difficult for existing models. Inspired by the Transformer, which has achieved successful performance in the field of natural language processing, we propose a purely attention-based network, named PointFormer, for point cloud classification. Specifically, we design a novel simple point multiplicative attention mechanism. Based on that, we then construct both a local attention block and a global attention block to learn fine geometric features and overall representational features of the point cloud, respectively. Consequently, compared to the existing approaches, PointFormer has superior perception of local details and overall contours of the point cloud objects. In addition, we innovatively propose the Graph-Multiscale Perceptual Field (GMPF) testing strategy that can significantly improve the overall performance of the proposed PointFormer. We have conducted extensive experiments on the real-world dataset ScanObjectNN and the synthetic dataset ModelNet40. The results show that the PointFormer has stronger robustness and achieves highly competitive performance compared to other state-of-the-art approaches. The code is available at https://github.com/Yi-Jun-Chen/PointFormer.

Keywords: Point cloud classification · Attention mechanism · Feature extraction

1 Introduction

3-D vision is widely used in augmented reality, robotics, autonomous driving and many other fields. Voxels, meshes and point clouds can all be applied to effectively represent 3D data. In contrast to voxels and meshes, point clouds preserve the original geometric information of 3D objects and are easy to collect. They are therefore well suited to 3D object classification, which is a fundamental and still challenging task in 3D vision. However, point clouds are irregular embeddings in continuous space, unlike 2D images, so learning representational features for the point cloud classification task remains challenging for existing methods.

To address this challenge, much work [16,23] has been inspired by the successful application of convolutional neural networks to 2D computer vision, where 3D objects are projected onto different 2D planes and then 2D convolution is performed. Other methods voxelize the point cloud and then apply 3D discrete convolution in 3D space. Unfortunately, the performance of these approaches is largely limited by their computational and memory costs. In addition, transformations of the point cloud, such as voxelization and projection, lead to a loss of information, which is detrimental to point cloud processing.

PointNet [3] is a pioneer in the direct manipulation of point cloud data. It extracts features of point clouds through shared multilayer perceptrons (MLPs) and a max-pooling operation, achieving permutation invariance in point cloud processing and impressive classification results. However, PointNet ignores local structure information, which has been proven important for feature learning in visual recognition. To better capture the local information of point cloud objects, some works [4,27] introduce attention mechanisms to enhance the local representation of features. Unfortunately, they usually incur expensive overheads in memory and computation. Consequently, it is important to design networks with superior perception of the fine and overall geometric structure of point clouds while maintaining a relatively light weight.

To solve this problem, we propose a purely attention-based network, named PointFormer, for point cloud classification. In detail, we propose a novel and simple point multiplicative attention mechanism. Unlike traditional methods [3,17,29] that aggregate information from each point indiscriminately through pooling operations, the point multiplicative attention mechanism treats each point in the point cloud differently to extract more discriminative features. Based on this, we then construct a local attention block and a global attention block that enable the network to have excellent perception of the local fine structure and overall shape of the point cloud. In addition, we propose the Graph-Multiscale Perceptual Field (GMPF) testing strategy for the first time to improve the overall performance of PointFormer. In summary, our main contributions are listed below.

• We design a point multiplicative attention mechanism with high expressiveness for point cloud objects. It applies shared multilayer perceptrons (MLPs) to learn the encoding of points, which maintains low complexity while satisfying the permutation invariance required for point cloud processing.
• Based on the point multiplicative attention mechanism, we design both a local attention block and a global attention block for building PointFormer, which gives our model superior dual perception of the local details and overall contours of the point cloud.


• We propose a Graph-Multiscale Perceptual Field (GMPF) testing strategy, which greatly improves the performance of the model. Furthermore, as a general strategy, it can be easily transferred to other networks.
• Extensive experiments on the real-world dataset ScanObjectNN and the synthetic dataset ModelNet40 show that our proposed network has strong robustness and superior classification performance.

2 Related Work

Unlike 2D images, which have a regular grid structure, 3D point clouds have a spatial continuity and irregularity that preclude direct application of existing deep learning models. To overcome this challenge, many approaches have been proposed, which can be broadly classified into three types: multi-view methods, voxel-based methods and point-based methods. We summarise the main features and limitations of these approaches as follows:

1) Multi-view Methods: Considering the success of CNNs on 2D images, these methods [6,9,23] project 3D point cloud objects onto multiple 2D planes. Then, convolutional neural networks are applied to perform feature extraction. Finally, the multi-view features are aggregated to accurately classify the point cloud. For example, MVCNN [23] is a pioneering network that has achieved impressive results by aggregating multi-scale information through max-pooling operations. However, these methods may lose information during the projection process. At the same time, these approaches usually lead to huge memory and computational overheads because they do not make good use of the sparsity of point clouds.

2) Voxel-based Methods: Voxel-based approaches regularize irregular point clouds by 3D voxelization [15,22] and then perform 3D convolution operations. While these methods can achieve impressive results, the cubic growth in the number of grid cells leads to an explosion in computation and memory overhead. To alleviate this problem, OctNet [20] uses shallow octrees to improve computational efficiency, but it is still a computationally intensive approach. In summary, these approaches not only consume considerable computational and memory resources, but also introduce information loss during the voxelization process.

3) Point-based Methods: Instead of transforming point clouds into other data domains as the previous methods do, point-based approaches directly use points as input for feature extraction. PointNet [3], as a pioneering model, realizes a neural network acting directly on a point cloud. It is implemented with shared MLPs and symmetric operations (e.g., max-pooling) to satisfy the permutation invariance of point cloud processing. However, PointNet operates on each point individually, so it cannot extract local geometric features. To solve this problem, PointNet++ [17] designs a series of farthest point sampling and grouping operations, hierarchically aggregating the features of neighbouring nodes. DGCNN [29] lets local information spread sufficiently to the global level through k-nearest neighbor (k-NN) graph construction and dynamic graph updates between layers, which learns the local geometric information of the point cloud very well. However, these methods uniformly use symmetric pooling operations such as max-pooling to retain only the most significant elements, which leads to loss of information and is detrimental to the fine perception of point cloud objects. Other approaches [4,36] enhance the representation of local information by introducing attention. However, their large number of parameters and heavy computation lead to a huge overhead in computational resources.

In contrast, as a point-based approach, our PointFormer has excellent dual perception of the local fine geometry and the overall contour of the point cloud. At the same time, our model is able to maintain a relatively light weight due to the proposed novel and simple point multiplicative attention mechanism. In addition, the proposed Graph-Multiscale Perceptual Field (GMPF) testing strategy gives PointFormer the ability to sense multi-scale information, thus greatly improving the performance of the model.

Fig. 1. The red point is selected as the central node, and its k-nearest neighbors (k=5 in the figure) are selected to form the k-NN graph. Then, neighboring nodes are weighted by the point multiplicative attention mechanism. The different depths of the colors correspond to the magnitude of the weights. (Color figure online)

3 Our Approach

In this section, we first review the general formulation of self-attention. Then, the point multiplicative attention mechanism is proposed for constructing our local attention block and global attention block. Finally, the complete framework of PointFormer and the Graph-Multiscale Perceptual Field (GMPF) testing strategy for model performance enhancement are presented.

3.1 Scalar and Vector Self-attention

Self-attention is a set operator [36] which can be classified into two types: scalar attention [27] and vector attention [35]. Let X = {x_i}_i denote the set of feature vectors. Scalar attention can be formulated as follows:

  x_i' = \sum_{x_j \in X} S\big(M_1(x_i)^{\top} M_2(x_j)\big)\, M_3(x_j),    (1)


where x_i' is the updated feature. M_1, M_2 and M_3 correspond to different mappings such as projections or MLPs, and S is a normalization operation such as softmax. Scalar attention thus projects the feature vectors into different feature spaces, and the corresponding attention coefficients are found by calculating the inner product. In vector attention, the formulation is slightly different:

  x_i' = \sum_{x_j \in X} S\big(M_4(M_1(x_i), M_2(x_j))\big) \odot M_3(x_j),    (2)

where M_4 represents a mapping (e.g., an MLP) like M_1, M_2, M_3. Vector attention replaces the inner product of scalar attention by introducing more learnable parameters in M_4. Both scalar and vector attention are set operators. Therefore, they satisfy the permutation invariance of point cloud processing and are well suited to processing point clouds.

3.2 Point Multiplicative Attention Mechanism

From the viewpoint of formulation and computation, scalar attention is simpler but often less effective than vector attention. Conversely, vector attention is more effective but more complex. To reach a reliable balance between these mechanisms, we propose our point multiplicative attention mechanism:

  x_i' = \sum_{x_j \in X} S\big(L_{13}(L_1(x_i) + L_2(x_j))\big)\, L_2(x_j),    (3)

where L_1 and L_2 denote different single fully connected layers, and L_{13} denotes a single fully connected layer with an output dimension of 1. To reduce the complexity of vector attention, we replace all the MLPs in Eq. 3 with single linear layers. In particular, we replace the different MLPs M_2, M_3 in Eqs. 1 and 2 with the same single fully connected layer L_2. At the same time, unlike GAN [28], we use vector addition instead of vector concatenation in Eq. 3 to reduce the dimension of the features. Moreover, as in scalar attention, we only output a single attention coefficient for each neighbor feature x_j through the fully connected layer L_{13} and then perform a weighted summation. This is the reason why our attention is called point multiplicative attention.
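To make Eq. 3 concrete, the following is a minimal PyTorch sketch of point multiplicative attention over a generic feature set; the layer sizes and class name are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PointMultiplicativeAttention(nn.Module):
    """Eq. 3: one scalar coefficient per neighbor from L13(L1(x_i) + L2(x_j)),
    followed by a softmax-weighted sum of L2(x_j)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.l1 = nn.Linear(in_dim, out_dim)   # encodes the center feature
        self.l2 = nn.Linear(in_dim, out_dim)   # encodes neighbor features (also the values)
        self.l13 = nn.Linear(out_dim, 1)       # single-output layer -> one coefficient per neighbor

    def forward(self, x_center, x_neighbors):
        # x_center: (B, in_dim); x_neighbors: (B, K, in_dim)
        v = self.l2(x_neighbors)                               # (B, K, out_dim)
        scores = self.l13(self.l1(x_center).unsqueeze(1) + v)  # (B, K, 1)
        attn = torch.softmax(scores, dim=1)                    # normalize over the neighbor set
        return (attn * v).sum(dim=1)                           # (B, out_dim)
```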

3.3 Local Attention Block

As point multiplicative attention follows a formulation similar to scalar and vector attention, it is still a set operator. We construct the k-nearest neighbor (k-NN) graph [29] for each node of the point cloud, as shown in Fig. 1. The features of the central node and the neighbor nodes constitute the set of feature vectors for the attention operation. We construct our local attention block by applying the point multiplicative attention mechanism to the k-NN graph of each point (as presented in Fig. 1).


Fig. 2. Framework diagram of the local attention block. Input: Initial point cloud object features (blue). Output: Updated point cloud object features. The local attention block can be viewed as a d-dimensional to h-dimensional feature mapping. (Color figure online)

Our local attention block provides a better perception of the local details of point cloud objects. For example, in Fig. 1, the neighbor x_{j5} represents points sampled from the rearview mirror of the car, while the other neighbors are points sampled from the body. The point multiplicative attention mechanism can assign a larger attention coefficient to neighbor x_{j5}, thus paying more attention to this particular neighbor while treating the other neighbors differently.

Position Encoding: From the viewpoint of graph neural networks [7,34], attention is an information aggregation operator. Specifically, we learn the aggregation feature of neighboring points x_j to the centroid x_i with L_2 in Eq. 3. Then, the aggregation of information is completed by a weighted summation with the corresponding attention coefficients. Therefore, the linear layer L_2 can be viewed as a fit of the abstract function f(·) between x_i and x_j. However, considering only the features x_j of the neighbour nodes in L_2 may not make good use of the initial geometric information of the point cloud. In order to take higher-order features into consideration, we use x_i, x_j, (x_i − x_j) and ‖x_i − x_j‖_2^2 as inputs to L_2. This means that we use the higher-order difference terms (x_i − x_j) and ‖x_i − x_j‖_2^2 to retain high-frequency detail information. Finally, our local attention block can be formulated as:

  x_i' = \sum_{x_j \in X} S\big(L_{13}(L_1(x_i) + L_2(PE(x_j)))\big)\, L_2(PE(x_j)),    (4)

where PE denotes our position encoding strategy. The framework of the local attention block is illustrated in Fig. 2.
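A sketch (our own naming) of how the k-NN grouping and the position encoding above could be built to feed the attention module sketched earlier; k and the channel layout are assumptions.

```python
import torch

def knn_position_encoding(points, k=20):
    """For every point, build the [x_i, x_j, x_i - x_j, ||x_i - x_j||^2] encoding of its
    k nearest neighbors (10 channels, as in the first block). points: (B, N, 3) -> (B, N, k, 10)."""
    dists = torch.cdist(points, points)                      # (B, N, N) pairwise distances
    idx = dists.topk(k + 1, largest=False).indices[..., 1:]  # (B, N, k), drop the point itself
    B, N, _ = points.shape
    batch = torch.arange(B, device=points.device).view(B, 1, 1).expand(B, N, k)
    xj = points[batch, idx]                                  # neighbor coordinates (B, N, k, 3)
    xi = points.unsqueeze(2).expand(B, N, k, 3)
    diff = xi - xj
    return torch.cat([xi, xj, diff, (diff ** 2).sum(-1, keepdim=True)], dim=-1)
```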


Fig. 3. Architecture of the global attention block. The input is a point cloud carrying high-dimensional features [n × h], and the output is a representation vector of the point cloud [n × 1].

3.4 Global Attention Block

The local attention block provides excellent perception of local details but lacks an overall representation of the point cloud. To address this problem, the global attention block is designed to extract highly representational features from point cloud objects. In detail, our global attention block applies point multiplicative attention to the entire point cloud object carrying high-dimensional features, i.e., the red point cloud output by the local attention block in Fig. 2. Specifically, we concatenate the max-pooling and mean-pooling of the point cloud's 3D coordinates to obtain the query vector. If X = {x_j}_j denotes the set of features of all points of a point cloud object, the global attention block can be formulated as:

  y = \sum_{x_j \in X} S\big(L_{13}(L_1(query) + L_2(x_j))\big)\, L_2(x_j),    (5)

where y denotes the representation vector of the point cloud. The complete structure of the global attention block is presented in Fig. 3. The global attention block can assign an attention level to each point. This mechanism of treating each point differently allows our model to have superior perception of the overall contour of the point cloud. We further visualize this in the experimental section.
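A small sketch of the query construction described above; shapes and names are assumptions. In Eq. 5, L_1 consumes this pooled-coordinate query while L_2 consumes the per-point features produced by the local attention blocks.

```python
import torch

def global_query(coords):
    # coords: (B, N, 3); concatenate max- and mean-pooled coordinates -> (B, 6) query vector.
    return torch.cat([coords.max(dim=1).values, coords.mean(dim=1)], dim=-1)

# The query is then passed through L1 and combined with L2(point_features) as in Eq. 5,
# e.g. reusing the PointMultiplicativeAttention sketch with separate input widths for L1 and L2.
```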

3.5 Framework of PointFormer

In summary, we use four local attention blocks and one global attention block to build PointFormer, as shown in Fig. 4. The output dimensions are set to (64, 64, 128, 256) and (1024), respectively. Meanwhile, shortcut connections are applied to extract multi-scale features. Finally, the classification results are obtained by a multi-layer perceptron (MLP).

Fig. 4. Architecture of PointFormer.

Hierarchical Position Encoding (HPE): To maximize the use of the initial geometric information of the point cloud while keeping the model lightweight, our position encoding strategy in the local attention block is designed to use x_i, x_j, (x_i − x_j) and ‖x_i − x_j‖_2^2 as the input of L_2 only in the first local attention block. In the subsequent three blocks, we just use x_j and the first-order difference term (x_i − x_j) [29]. We refer to this position encoding strategy as Hierarchical Position Encoding (HPE).

3.6 Graph-Multiscale Perceptual Field (GMPF) Testing Strategy

In order to improve performance without introducing any burden (e.g., parameters or inference time), we design a novel Graph-Multiscale Perceptual Field (GMPF) testing strategy. Briefly, in the training phase, we construct the k-nearest neighbor (k-NN) graph using a specific number of neighbors (e.g., k=5 in Fig. 1), while in the testing phase, multi-scale k-NN graphs are used (e.g., k=3, 5, 7). Equivalently, the GMPF strategy learns a model on a fixed-scale k-NN graph while observing information on multi-scale graphs during testing, as shown in Fig. 5. In the experimental section, we further show that this strategy can greatly improve the robustness and overall performance of the model. Moreover, as a general strategy, it can be naturally transferred to other graph-based networks.
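A minimal sketch of how such multi-scale testing could be run. How the per-scale predictions are fused is not specified here, so the probability averaging below is an assumption, and `model(points, k=...)` is a hypothetical interface.

```python
import torch

@torch.no_grad()
def gmpf_predict(model, points, k_list=(15, 16, 20, 25, 28)):
    """Run the same trained model with k-NN graphs of several sizes and fuse the predictions
    (assumed: average of the softmax outputs)."""
    probs = None
    for k in k_list:
        logits = model(points, k=k)              # rebuild the k-NN graph at scale k
        p = torch.softmax(logits, dim=-1)
        probs = p if probs is None else probs + p
    return (probs / len(k_list)).argmax(dim=-1)  # fused class prediction
```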

4 Experiments

In this section, we first provide the experimental setup for evaluation. We then comprehensively evaluate PointFormer on the ScanObjectNN [26] and ModelNet40 [30] benchmarks. Finally, we conduct a large number of careful ablation experiments to prove the rationality and validity of each component of the proposed network.

Fig. 5. The Graph-Multiscale Perceptual Field testing strategy. It can be likened to a person identifying objects from different ranges of vision, from far to near. In reality, this can greatly improve human recognition when objects are obscured or dimly lit.

Table 1. Classification results (%) on the ScanObjectNN dataset. The accuracy gap (last column) is the difference between mean class accuracy and overall accuracy (%).

Method            Mean Class Accuracy  Overall Accuracy  Accuracy gap
3DmFV [2]         58.1                 63.0              4.9
PointNet [3]      63.4                 68.2              4.8
SpiderCNN [32]    69.8                 73.7              3.9
PointNet++ [17]   75.4                 77.9              2.5
DGCNN [29]        73.6                 78.1              4.5
PointCNN [11]     75.1                 78.5              3.4
BGA-DGCNN [26]    75.7                 79.7              4.0
BGA-PN++ [26]     77.5                 80.2              2.7
DRNet [18]        78.0                 80.3              2.3
GBNet [19]        77.8                 80.5              2.7
PointFormer       78.9                 81.1              2.2

4.1 Experimental Setup

1) Training: We implement PointFormer using PyTorch and employ the SGD optimizer with an initial learning rate of 0.1 and a momentum of 0.9 for training. We use cosine annealing [14] to reduce the learning rate to 0.001. The batch size is 36, the number of neighbors is k=20, and we do not use batch normalization decay. The numbers of epochs on the ScanObjectNN and ModelNet40 datasets are 350 and 250, respectively.

2) Testing: In testing, we use the GMPF testing strategy. Specifically, we set the multi-scale k-NN graphs to k=(15,16,20,25,28) on the ScanObjectNN dataset and k=(15,20,28) on the ModelNet40 dataset, respectively. The selection is explained further below.


Table 2. Classification results (%) on the ModelNet40 dataset. (coords: 3-dimensional coordinates, norm: point normal vector, voting: multi-vote test method, k: ×1024, -: unknown)

Method                  Input            #point  Mean Class Accuracy  Overall Accuracy
ECC [21]                coords           1 k     83.2                 87.4
PointNet [3]            coords           1 k     86.0                 89.2
SCN [31]                coords           1 k     87.6                 90.0
Kd-Net [8]              coords           1 k     -                    90.6
PointCNN [11]           coords           1 k     88.1                 92.2
PCNN [1]                coords           1 k     -                    92.3
DensePoint [12]         coords           1 k     -                    92.8
RS-CNN [13]             coords           1 k     -                    92.9
DGCNN [29]              coords           1 k     90.2                 92.9
DGCNN [29]              coords           2 k     90.7                 93.5
KP-Conv [25]            coords           1 k     -                    92.9
PointASNL [33]          coords           1 k     -                    92.9
SO-Net [10]             coords           2 k     87.3                 90.9
SO-Net [10]             coords + norm    5 k     90.8                 93.4
RGCNN [24]              coords + norm    1 k     87.3                 90.5
PointNet++ [17]         coords + norm    5 k     -                    91.9
SpiderCNN [32]          coords + norm    5 k     -                    92.4
DensePoint [12]         coords + voting  1 k     -                    93.2
RS-CNN [13]             coords + voting  1 k     -                    93.6
PCT [5]                 coords           1 k     -                    93.2
Point Transformer [36]  coords           1 k     90.6                 93.7
GB-Net [19]             coords           1 k     91.0                 93.8
PointFormer             coords           1 k     90.7                 93.7

4.2 Classification Results

1) Classification on ScanObjectNN: ScanObjectNN consists of approximately 15,000 objects belonging to 15 categories. Table 1 shows the classification results of PointFormer and each competitive baseline on ScanObjectNN. It can be seen that our model achieves 78.9% mean class accuracy and 81.1% overall accuracy, which are 0.9% and 0.6% higher than the previous best results, respectively. Note that the number of parameters of PointFormer is 3.99 M and its FLOPs are 3.48 G, while those of GBNet [19] are 8.78 M and 11.57 G, respectively, which is much larger than our model. Nevertheless, we achieve superior performance, which means we strike a better balance between model complexity and performance. Besides, our PointFormer attains the smallest gap between mean class accuracy and overall accuracy. This shows that our network performs well for every category, exhibiting better inter-class robustness.


The objects in ScanObjectNN contain many distorted points, background points and missing regions [26], which poses a serious challenge to the robustness of the model. However, the local attention block and the global attention block in PointFormer bring an effective solution to this problem. For example, in the local attention block, the attention weights the features of each neighbor node, which allows the model to learn to discard more useless nodes during training. Moreover, when the features are finally aggregated, our global attention block can even abandon unwanted nodes directly by assigning them a low response. More importantly, our GMPF testing strategy can fully extract useful information from the multi-scale k-NN graphs. This greatly improves the noise immunity of the model when the dataset is not pure, meaning that our model is able to eliminate the detrimental effects of more noise points. PointFormer is therefore able to demonstrate strong robustness on unclean real-world datasets.

Table 3. Effectiveness analysis of the Graph-Multiscale Perceptual Field (GMPF) testing strategy on the ScanObjectNN and ModelNet40 datasets. We compare mean class and overall class accuracy (%).

                    ScanObjectNN                       ModelNet40
                    Mean Class Acc.  Overall Acc.      Mean Class Acc.  Overall Acc.
PointFormer         77.2             80.1              90.5             93.6
PointFormer (GMPF)  78.9             81.1              90.7             93.7

2) Classification on ModelNet40: This dataset contains 12,311 synthetic pure CAD models divided into 40 categories. Typically, 9,843 models are used for training and 2,468 models are reserved for testing, as in our experiments. Table 2 presents the highly competitive classification results of our network (mean class accuracy: 90.7% and overall accuracy: 93.7%). Note that our model uses only the 1k 3D coordinates of the point cloud as input, yet achieves better results than some approaches that use more information. As can be observed in Table 2, DGCNN [29] takes 2k points as input and achieves 90.7% mean class accuracy and 93.5% overall accuracy. RGCNN [24] additionally uses the normal vectors of points as input to attain 90.5% overall accuracy. SO-Net [10] even uses 5k points and normal vectors, achieving 90.8% mean class accuracy and 93.4% overall accuracy. Moreover, our model also outperforms models that employ a voting strategy, such as RS-CNN [13] and DensePoint [12]. It is worth noting that although the classification performance of PointFormer on ModelNet40 is slightly lower than that of GB-Net [19], our model is far lighter and more robust than GB-Net, as analyzed earlier, which implies that PointFormer is more practical in real-world applications.

4.3 PointFormer Design Analysis

To demonstrate the soundness and validity of our model design, we conduct comparative and ablation experiments in this section, together with visualizations and an analysis of model complexity.

1) Effectiveness of the GMPF testing strategy: GMPF adds no parameters to the model, yet greatly improves its robustness and performance. We verify its effectiveness by comparing the model with and without GMPF on the ScanObjectNN and ModelNet40 datasets. The results are shown in Table 3. On the real-world ScanObjectNN dataset, the GMPF strategy improves the mean class accuracy and overall accuracy by 1.7% and 1.0%, respectively, which is a substantial and practically relevant gain. The GMPF strategy also yields gains on the clean synthetic ModelNet40 dataset. Intuitively, the strategy amounts to learning at one specific scale while recognizing point cloud objects at multiple related scales during testing. The effect is less pronounced when the data are very clean, but on realistic, noisy datasets GMPF greatly improves the robustness and overall performance of the network.

Table 4. The effect of different choices of the multiscale k-NN graph on the ScanObjectNN dataset. We compare mean class accuracy and overall accuracy (%).

                  k = (20)   k = (20, 28)   k = (15, 20, 25)   k = (15, 20, 28)   k = (15, 16, 20, 25, 28)
Mean Class Acc.   77.2       77.9           78.8               78.7               78.9
Overall Acc.      80.1       80.2           80.9               80.9               81.1
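As a rough illustration of how multi-scale test-time aggregation in the spirit of GMPF could be implemented, the sketch below classifies the same point cloud with several k-NN graph scales and fuses the class scores. The helper names and the fusion by simple logit averaging are assumptions for illustration, not the authors' exact procedure.

```python
import torch

def knn_graph(points, k):
    # points: (B, N, 3); indices of the k nearest neighbors of every point
    dist = torch.cdist(points, points)               # (B, N, N) pairwise distances
    return dist.topk(k, largest=False).indices       # (B, N, k)

@torch.no_grad()
def gmpf_predict(model, points, ks=(15, 16, 20, 25, 28)):
    """Hypothetical multi-scale testing: evaluate the classifier on several
    k-NN graphs of the same point cloud and average the class logits."""
    logits = [model(points, knn_graph(points, k)) for k in ks]
    return torch.stack(logits, dim=0).mean(dim=0)    # fused prediction per cloud
```

Here `model(points, neighbor_idx)` is assumed to return class logits; since only the neighbor indices change across scales, such a strategy adds no parameters, matching the description above.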

Table 5. Results (%) of different HPE strategies. We compare the overall accuracy on the ModelNet40 dataset. L2: the l2 norm (3D Euclidean distance) of xi − xj; xi: center vertex; xj: neighborhood node; Feature Dimension: the input channels of the four local attention blocks, respectively.

Model   Hierarchical Position Encoding                      Feature Dimension     Overall Accuracy
A       4*(xi, xi − xj)                                     (6, 128, 128, 256)    93.3
B       (xi, xj, L2^2) and 3*(xi, xj)                       (7, 128, 128, 256)    93.1
C       (xi, xj, xi − xj, L2) and 3*(xi, xj, xi − xj)       (10, 192, 192, 384)   92.9
D       (xi, xj, xi − xj, L2^2) and 3*(xi, xi − xj)         (10, 128, 128, 256)   93.7
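The following is a minimal sketch of how the model-D encoding listed in Table 5 could be assembled for the first local attention block, i.e., the concatenation (xi, xj, xi − xj, ||xi − xj||^2). The tensor shapes and the gather-based neighbor lookup are implementation assumptions.

```python
import torch

def hpe_first_block(points, neighbor_idx):
    """Hypothetical construction of the model-D position encoding
    (xi, xj, xi - xj, ||xi - xj||^2) for the first local attention block."""
    B, N, _ = points.shape
    k = neighbor_idx.shape[-1]
    xi = points.unsqueeze(2).expand(B, N, k, 3)                       # center vertex, repeated k times
    xj = torch.gather(points.unsqueeze(1).expand(B, N, N, 3), 2,
                      neighbor_idx.unsqueeze(-1).expand(B, N, k, 3))  # neighbor coordinates
    diff = xi - xj
    sq_dist = (diff ** 2).sum(dim=-1, keepdim=True)                   # squared Euclidean distance
    return torch.cat([xi, xj, diff, sq_dist], dim=-1)                 # (B, N, k, 10)
```

The resulting 10-channel tensor matches the first entry of model D's feature dimensions in Table 5; the later blocks presumably concatenate the center feature and the feature difference in an analogous way.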


On the ScanObjectNN and ModelNet40 datasets, the multiscale k-nearest neighbor graphs were set to k = (15, 16, 20, 25, 28) and k = (15, 20, 28), respectively. In Table 4, we compare the effect of different choices of the k-NN graph on the performance of PointFormer on the ScanObjectNN dataset. As can be seen, the selected combination gives the best testing results, and the selection on ModelNet40 is made in the same way.

2) Effectiveness of Hierarchical Position Encoding (HPE): We verify the effectiveness of HPE and compare different encoding strategies on the ModelNet40 dataset. The results are presented in Table 5. Compared with models A and B, our model D retains and utilises more of the point cloud geometry, while neither increasing the size of the model nor introducing many additional parameters. Model C, on the other hand, sharply increases the FLOPs and the model size once the features become high-dimensional, which greatly raises the risk of overfitting. Our Hierarchical Position Encoding therefore strikes a good balance between model complexity and exploiting the original geometric information of the point cloud object.

3) Network Complexity: We analyze the network complexity of PointFormer and other advanced approaches by comparing the number of parameters and FLOPs on the ModelNet40 dataset. For fairness, we standardize the testing conditions (GeForce RTX 3090, batch size = 1). The results are displayed in Table 6. The complexity of our model is slightly higher than that of the traditional DGCNN approach, but compared with current attention-based methods such as (2) and (4), our model is much lighter while achieving comparable or better accuracy. It is worth noting that, for simplicity, we perform this comparison on the clean synthetic ModelNet40 dataset, whereas the greater strength of our network is the robustness demonstrated on the impure real-world ScanObjectNN dataset, where the more complex approaches (2) and (4) do not necessarily hold up. At the same time, compared to GBNet [19], PointFormer is much lighter and more robust, as analyzed above.

Table 6. Complexity analysis of the classification networks on the ModelNet40 dataset.

Method                       Param     FLOPs     Overall Acc. (%)
(1) DGCNN [29]               1.81 M    2.72 G    92.9
(2) GBNet [19]               8.78 M    11.57 G   93.8
(3) Point Transformer [36]   9.58 M    18.41 G   93.7
(4) Point Transformer [4]    21.67 M   4.79 G    92.8
PointFormer                  3.99 M    3.48 G    93.7


Fig. 6. Visualisation of the importance of each point in the point cloud. Each point is coloured by the attention factor. Blue represents the lowest attention and red the highest. (Color figure online)

4) Visualization: To illustrate PointFormer's excellent perception of point cloud objects, we visualize the attention factor of the global attention block at each point. As shown in Fig. 6, PointFormer focuses more on recognizable contour points of the point cloud while treating less important points differently. In the case of the airplane, the points that PointFormer attends to most are distributed over the various parts of the airplane, such as the wings, tail and fuselage, which together form a recognizable outline of an airplane. At the same time, PointFormer can selectively ignore dispensable points, shown as the blue points in Fig. 6. In addition, our GMPF testing strategy allows the model to be more perceptive of the overall contours of the point cloud. As presented in the last column of Fig. 6, the highly responsive points provide a more complete and accurate representation of the point cloud shape.

5 Conclusion

In this paper, we present a purely attention-based network, PointFormer, for the point cloud classification task. Specifically, we design a local attention block that gives the network fine local perception by extracting finer-grained local geometric features of the point cloud. At the same time, we design a global attention block so that PointFormer can perceive point cloud objects as a whole, improving classification performance. In addition, the general GMPF testing strategy greatly improves the performance and robustness of the model and can easily be transferred to other models. The highly competitive results on the synthetic ModelNet40 dataset and the real-world ScanObjectNN dataset demonstrate the advantages and promising performance of the proposed model. We believe that PointFormer is a strong feature extractor, not only for classification, but also for part segmentation, scene segmentation and even point cloud completion.

Acknowledgement. This work was supported by the National Natural Science Foundation of China (No. 61471229 and No. 61901116), the Natural Science Foundation of Guangdong Province (No. 2019A1515011950), the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515010789 and No. 2021A1515012289), and in part by the Key Field Projects of Colleges and Universities of Guangdong Province (No. 2020ZDZX3065), and in part by the Shantou University Scientific Research Foundation for Talents under Grant NTF19031.

References 1. Atzmon, M., Maron, H., Lipman, Y.: Point convolutional neural networks by extension operators. CoRR abs/1803.10091 (2018). http://arxiv.org/abs/1803.10091 2. Ben-Shabat, Y., Lindenbaum, M., Fischer, A.: 3DmFV: three-dimensional point cloud classification in real-time using convolutional neural networks. IEEE Robotics and Automation Letters 3(4), 3145–3152 (2018). https://doi.org/10. 1109/LRA.2018.2850061 3. Charles, R.Q., Su, H., Kaichun, M., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 77–85 (2017). https://doi.org/10. 1109/CVPR.2017.16 4. Engel, N., Belagiannis, V., Dietmayer, K.: Point transformer. IEEE Access 9, 134826–134840 (2021). https://doi.org/10.1109/ACCESS.2021.3116304 5. Guo, M., Cai, J., Liu, Z., Mu, T., Martin, R.R., Hu, S.: PCT: point cloud transformer. Comput. Vis. Media 7(2), 187–199 (2021) 6. Kanezaki, A., Matsushita, Y., Nishida, Y.: RotationNet: joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), pp. 5010–5019 (2018). https://doi.org/10.1109/CVPR.2018.00526


7. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017, Conference Track Proceedings (2017). https://openreview.net/forum?id=SJU4ayYgl 8. Klokov, R., Lempitsky, V.: Escape from cells: Deep KD-networks for the recognition of 3D point cloud models. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 863–872 (2017). https://doi.org/10.1109/ICCV.2017.99 9. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12689–12697 (2019). https://doi.org/10.1109/CVPR.2019.01298 10. Li, J., Chen, B.M., Lee, G.H.: SO-Net: self-organizing network for point cloud analysis. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 9397–9406 (2018). https://doi.org/10.1109/CVPR.2018.00979 11. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on X-transformed points. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3–8 December 2018, Montr´eal, Canada, pp. 828–838 (2018) 12. Liu, Y., Fan, B., Meng, G., Lu, J., Xiang, S., Pan, C.: DensePoint: learning densely contextual representation for efficient point cloud processing. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5238–5247 (2019). https://doi.org/10.1109/ICCV.2019.00534 13. Liu, Y., Fan, B., Xiang, S., Pan, C.: Relation-shape convolutional neural network for point cloud analysis. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8887–8896 (2019). https://doi.org/10.1109/ CVPR.2019.00910 14. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017, Conference Track Proceedings (2017) 15. Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for realtime object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928 (2015). https://doi.org/10.1109/IROS. 2015.7353481 16. Qi, C.R., Su, H., Nießner, M., Dai, A., Yan, M., Guibas, L.J.: Volumetric and multi-view cnns for object classification on 3D data. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5648–5656 (2016). https:// doi.org/10.1109/CVPR.2016.609 17. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, vol. 30 (2017) 18. Qiu, S., Anwar, S., Barnes, N.: Dense-resolution network for point cloud classification and segmentation. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 3812–3821 (2021). https://doi.org/10.1109/ WACV48630.2021.00386 19. Qiu, S., Anwar, S., Barnes, N.: Geometric back-projection network for point cloud classification. IEEE Trans. Multimedia 24, 1943–1955 (2022). https://doi.org/10. 1109/TMM.2021.3074240 20. Riegler, G., Ulusoy, A.O., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6620–6629 (2017). https://doi.org/10.1109/CVPR.2017. 701


21. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 29–38 (2017). https://doi.org/10.1109/CVPR. 2017.11 22. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 190–198 (2017). https://doi.org/10. 1109/CVPR.2017.28 23. Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3d shape recognition. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 945–953 (2015). https://doi.org/10.1109/ICCV. 2015.114 24. Te, G., Hu, W., Zheng, A., Guo, Z.: RGCNN: regularized graph CNN for point cloud segmentation. In: Boll, S., et al. (eds.) 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, 22–26 October 2018, pp. 746–754. ACM (2018). https://doi.org/10.1145/3240508.3240621 25. Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.: KPConv: flexible and deformable convolution for point clouds. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6410–6419 (2019). https://doi.org/10.1109/ICCV.2019.00651 26. Uy, M.A., Pham, Q.H., Hua, B.S., Nguyen, T., Yeung, S.K.: Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1588–1597 (2019). https://doi.org/10.1109/ICCV.2019.00167 27. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA, pp. 5998–6008 (2017) 28. Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Li` o, P., Bengio, Y.: Graph attention networks. CoRR abs/1710.10903 (2017) 29. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 38(5), 3326362 (2019). https://doi.org/10.1145/3326362 30. Wu, Z., et al.: 3d shapeNets: a deep representation for volumetric shapes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1912–1920 (2015). https://doi.org/10.1109/CVPR.2015.7298801 31. Xie, S., Liu, S., Chen, Z., Tu, Z.: Attentional shapeContextNet for point cloud recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4606–4615 (2018). https://doi.org/10.1109/CVPR.2018.00484 32. Xu, Y., Fan, T., Xu, M., Zeng, L., Qiao, Yu.: SpiderCNN: deep learning on point sets with parameterized convolutional filters. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 90–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3 6 33. Yan, X., Zheng, C., Li, Z., Wang, S., Cui, S.: PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5588–5597 (2020). https://doi.org/10.1109/CVPR42600.2020.00563 34. Yu, Z., Zheng, X., Yang, Z., Lu, B., Li, X., Fu, M.: Interaction-temporal GCN: a hybrid deep framework for covid-19 pandemic analysis. IEEE Open J. Eng. Med. Biol. 2, 97–103 (2021). https://doi.org/10.1109/ojemb.2021.3063890


35. Zhao, H., Jia, J., Koltun, V.: Exploring self-attention for image recognition. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10073–10082 (2020). https://doi.org/10.1109/CVPR42600.2020. 01009 36. Zhao, H., Jiang, L., Jia, J., Torr, P.H., Koltun, V.: Point transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision(ICCV), pp. 16259–16268 (2021)

Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis

Xiang Guo1, Guanying Chen2, Yuchao Dai1(B), Xiaoqing Ye3, Jiadai Sun1, Xiao Tan3, and Errui Ding3

1 Northwestern Polytechnical University, Xi'an, China
{guoxiang,sunjiadai}@mail.nwpu.edu.cn, [email protected]
2 FNii and SSE, CUHK-Shenzhen, Shenzhen, China
[email protected]
3 Baidu Inc., Beijing, China
{yexiaoqing,dingerrui}@baidu.com, [email protected]

X. Guo and G. Chen—Authors contributed equally to this work.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4_27.

Abstract. Recently, Neural Radiance Fields (NeRF) have been revolutionizing the task of novel view synthesis (NVS) with their superior performance. In this paper, we propose to synthesize dynamic scenes. Extending the methods for static scenes to dynamic scenes is not straightforward, as both the scene geometry and appearance change over time, especially under a monocular setup. Moreover, existing dynamic NeRF methods generally require a lengthy per-scene training procedure, where multi-layer perceptrons (MLPs) are fitted to model both motion and radiance. In this paper, building on top of recent advances in voxel-grid optimization, we propose a fast deformable radiance field method to handle dynamic scenes. Our method consists of two modules. The first module adopts a deformation grid to store 3D dynamic features and a light-weight MLP that decodes the interpolated features into a deformation mapping a 3D point in the observation space to the canonical space. The second module contains a density grid and a color grid to model the geometry and appearance of the scene. Occlusion is explicitly modeled to further improve the rendering quality. Experimental results show that our method achieves performance comparable to D-NeRF using only 20 minutes of training, which is more than 70× faster than D-NeRF, clearly demonstrating the efficiency of our proposed method.

Keywords: Dynamic view synthesis · Neural radiance fields · Voxel-grid representation · Fast optimization

1 Introduction

Fig. 1. Neural Deformable Voxel Grid (NDVG) for fast optimization of dynamic view synthesis. The left side of the figure shows that our method achieves super-fast convergence in 20 min, which is 70× faster than the D-NeRF method. The right side visualizes the results after training for 1, 5 and 20 min.

Novel view synthesis (NVS) is a long-standing problem in computer vision and graphics, with many applications in augmented reality, virtual reality, content creation, etc. Recently, neural rendering methods have achieved significant progress on this problem [31,34,60]. In particular, neural radiance fields (NeRF) [31] produce photorealistic renderings by representing a static scene with a multi-layer perceptron (MLP), which maps a 5D input (3D coordinate and 2D view direction) to density and color. Recently, a series of works has extended the NeRF framework from static scenes to dynamic scenes [8,13,22,35,39,52,55,58].

Novel view synthesis of a dynamic scene from a monocular video is still a very challenging problem. Besides the difficulty of recovering motion and geometry with only one observation at each time step, the training process usually takes days, which hinders practical applications. At each iteration, NeRF-based methods require millions of network queries to obtain the colors and densities of the sampled points along the sampled rays, from which volume rendering computes the pixel colors [18]. In the dynamic setting, the methods become even more complex because of the deformation model; e.g., D-NeRF [39] optimizes a large deformation network and a canonical network to fit a dynamic scene, requiring more than 27 hours to converge. How to develop an efficient and accurate dynamic view synthesis method therefore remains an open problem.

In a static scenario, to reduce the training time for a scene, some methods propose to first train the model on a dataset consisting of multiple scenes [5,53,56,65] and then finetune it on the target scene, reducing the optimization time to several minutes. However, these methods rely on a large training dataset and a lengthy pre-training time. Very recently, the voxel-grid representation has been exploited to speed up the optimization of radiance fields [32,48,64]. These methods are able to optimize a scene representation from scratch within just a few minutes, significantly accelerating training without any pre-training. The key idea is to replace the time-consuming deep network query with fast trilinear voxel-grid interpolation. However, these methods are tailored for static scenes and cannot be directly applied to dynamic scenes.


To the best of our knowledge, there are few works on fast optimization of dynamic NeRF, and it is challenging to design a deformation model that is compact yet has sufficient capacity. In this paper, we propose a fast optimization method, named NDVG, for dynamic scene view synthesis based on the voxel-grid representation. Our method consists of a deformation module and a canonical module. The deformation module maps a 3D point in the observation space to the canonical space, and volume rendering is performed in the canonical space to render the pixel color and compute the image reconstruction loss. In contrast to static scenes, where occlusion is determined only by the viewing direction once the scene is known, the occlusion in dynamic scene NeRF is determined by both the view direction and the motion (or view time) and must be handled carefully. Hence, we design a dedicated occlusion handling module to explicitly model occlusion and further improve the rendering quality. In summary, the key contributions of this paper are as follows:

• We propose a fast deformable radiance field method based on the voxel-grid representation to enable space-time view synthesis for dynamic scenes. To the best of our knowledge, this is the first method that integrates voxel-grid optimization with a deformable radiance field.
• We introduce a deformation grid to store the 3D dynamic features and adopt a light-weight MLP to decode the features into deformations. Our method explicitly models occlusion to improve the results.
• Our method produces rendering results comparable to D-NeRF [39] within only 20 minutes, which is more than 70× faster (see Fig. 1).

2 Related Work

Novel View Synthesis. Rendering a scene from arbitrary views has a long history in both vision and graphics [4,6,14,21], and surveys of recent methods can be found in [44,49,50]. Traditional methods need to explicitly build a 3D model of the scene, such as point clouds [1] or meshes [16,42,43,51], and then render novel views from this geometry. Another category of methods explicitly estimates depth and then uses it to warp pixels or learned features to a novel view, such as [7,11,19,37,42,43,59]. Numerous other works use multi-plane images (MPIs) to represent scenes [10,15,17,30,46,47,54,69], but the MPI representation can only support relatively limited viewpoint changes during inference.

Neural Scene Representation. Recently, neural scene representations have come to dominate novel view synthesis. In particular, Mildenhall et al. propose NeRF [31], which uses MLPs to model a 5D radiance field and renders impressive view synthesis results for captured static scenes. Since then, many follow-up methods have extended the capabilities of NeRF, including relighting [3,45,68], handling in-the-wild scenarios [29], extending to large unbounded 360° scenes [66], removing the requirement for pose estimation [23,57], incorporating anti-aliasing for multi-scale rendering [2], and estimating the 6-DoF camera poses [61].


Fast NeRF Rendering and Optimization. Rendering and optimization in NeRF-like schemes are very time-consuming, as they require multiple samples along each ray for color accumulation. To speed up the rendering procedure, some methods predict the depth or sample near the surface to guide more efficient sampling [33,38]. Other methods use an octree or sparse voxel grid to avoid sampling points in empty space [26,28,64]. In addition, some methods subdivide the 3D volume into multiple cells that can be processed more efficiently, such as DeRF [40] and KiloNeRF [41]. AutoInt [24] reduces the number of evaluations along a ray by learning partial integrals. However, these methods still need to optimize a deep implicit model, leading to a lengthy training time. To accelerate the optimization time on a test scene, some methods first train the model on a large dataset to learn a scene prior and then finetune it on the target scene [5,53,56,65]. However, these methods require time-consuming pre-training. More recently, the voxel-grid representation has been exploited to speed up the optimization of radiance fields [32,48,63]. Plenoxels [63] represents a scene as a 3D grid with spherical harmonics, which can be optimized from calibrated images via gradient methods and regularization without any neural components. Similarly, DVGO [48] optimizes a hybrid explicit-implicit representation that consists of a dense grid and a light-weight MLP. Although these methods achieve fast optimization, they are only applicable to static scenes and thus cannot be used to render dynamic scenes.

Dynamic Scene Modeling. Recently, several concurrent methods have extended NeRF to deal with dynamic scenes [8,13,22,35,39,52,55,58]. NeRFlow [8] learns a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Yoon et al. [62] propose to use an underlying 4D reconstruction, combining single-view depth and depth from multi-view stereo, to render virtual views with 3D warping. Gao et al. [13] jointly train a time-invariant (static) model and a time-varying (dynamic) model, regularize the dynamic NeRF by scene flow estimation, and finally blend the results in an unsupervised manner. NSFF [22] models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. DCT-NeRF [55] uses the Discrete Cosine Transform (DCT) to capture dynamic motion and learn smooth and stable trajectories over time for each point in space. D-NeRF [39], NR-NeRF [52] and Nerfies [35] first learn a static canonical radiance field capturing geometry and appearance, and then learn the deformation/displacement field of the scene at each time instant w.r.t. the canonical space. Xian et al. [58] represent a 4D space-time irradiance field as a function that maps a spatial-temporal location to the emitted color and volume density. Although promising results have been shown for dynamic view synthesis, these methods all require a long optimization time to fit a dynamic scene, limiting their wider application.

Concurrently with our work, a few methods aim to speed up the training of dynamic NeRF [9,12,25]. TiNeuVox [9] uses a small MLP to model the deformation and uses multi-distance interpolation to obtain the feature for a radiance network that estimates the density and color. Compared to TiNeuVox [9], we propose a deformation feature grid to enhance the capability of the small deformation network without affecting training speed. V4D [12] uses 3D feature voxels to model the 4D radiance field, with an additional time dimension concatenated, and proposes look-up tables for pixel-level refinement. While V4D [12] focuses mainly on improving image quality, its training speed-up is not significant compared with TiNeuVox [9] and ours. DeVRF [25] also builds on the voxel-grid representation and proposes to use multi-view data to overcome the nontrivial problems of the monocular setup. Multi-view data eases the learning of motion and geometry compared with our method, which uses monocular images.

3 Neural Deformable Voxel Grid

Given an image sequence $\{I_t\}_{t=1}^{T}$ with camera poses $\{\mathbf{T}_t\}_{t=1}^{T}$ of a dynamic scene captured by a monocular camera, our goal is to develop a fast optimization method based on radiance fields to represent this dynamic scene and support novel view synthesis at different times.

3.1 Overview

Mathematically, given a query 3D point $\mathbf{p}$, a view direction $\mathbf{d}$, and a time instance $t$, we need to estimate the corresponding density $\sigma$ and color $\mathbf{c}$. A straightforward way is to directly use an MLP to learn the mapping from $(\mathbf{p}, \mathbf{d}, t)$ to $(\sigma, \mathbf{c})$. However, existing dynamic methods show that such a high-dimensional mapping is difficult to learn, and they propose a canonical-scene based framework to ease the learning difficulty [36,39,52]. These methods adopt a deformation MLP to map a 3D point in the observation space to a static canonical space as $\Psi_t: (\mathbf{p}, t) \to \Delta\mathbf{p}$. The density and color are then estimated in the canonical space as $\Psi_p: (\sigma, \mathbf{c}) = f(\mathbf{p}+\Delta\mathbf{p}, \mathbf{d})$. However, to endow the network with the capability of handling complex motion, large MLPs are inevitable in existing methods and therefore result in a long optimization time (i.e., from hours to days). Motivated by the recent success of voxel-grid optimization in accelerating the training of static radiance fields [48,63], we introduce the voxel-grid representation into the canonical scene representation based framework to enable fast dynamic scene optimization. Our method consists of a deformation module and a canonical module (see Fig. 2). The key idea is to replace most of the heavy MLP computation with fast voxel-grid feature interpolation.

3.2 Deformation Module for Motion Modeling

Assuming the canonical space is at time tcan , the deformation module estimates the offset Δp of a point p at any time t to the canonical space. As the input has 4 dimensions (i.e., 3 for p and 1 for t), it is inefficient to directly store the offsets in a 4D feature grid. Therefore, we adopt a hybrid explicit-implicit representation that consists of a 3D feature grid and a light-weight MLP to decode the interpolated feature.


Fig. 2. Overview of our proposed method. Our method consists of (a) a deformation module to model the motion of the space points and (b) a canonical module to model the radiance field of the static scene at the canonical time. To render a ray shooting from the camera center, we compute the deformation of all sampled points and transform the sampled points to the canonical space, where the density and color are computed. The pixel color can then be rendered by (c) volume rendering.

Neural Deformable Voxel Grid (NDVG). We first build a 3D deformation feature grid $G^d \in \mathbb{R}^{N_x \times N_y \times N_z \times N_c}$ with a resolution of $N_x \times N_y \times N_z$, where the feature vector in each voxel has a dimension of $N_c$. For a continuous 3D coordinate $\mathbf{p}$, its deformation feature can be quickly queried from the deformation feature grid using trilinear interpolation:
$$\mathrm{interp}(\mathbf{p}, G^d): \left(\mathbb{R}^3, \mathbb{R}^{N_x \times N_y \times N_z \times N_c}\right) \to \mathbb{R}^{N_c}. \tag{1}$$

To obtain the offset from the observation space to the canonical space for a query point, we introduce a light-weight MLP $F^d_{\theta_1}$. It takes the coordinate $\mathbf{p}$, the time $t$, and the interpolated feature as input, and regresses the offset $\Delta\mathbf{p}$:
$$f^d = \mathrm{interp}(\mathbf{p}, G^d), \qquad \Delta\mathbf{p} = \begin{cases} F^d_{\theta_1}(\mathbf{p}, t, f^d) & \text{if } t \neq t_{can}, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
Following [31], we apply positional encoding to $\mathbf{p}$ and $t$. Finally, we obtain the position of $\mathbf{p}$ in the canonical space as $\mathbf{p}' = \mathbf{p} + \Delta\mathbf{p}$. The deformation feature grid provides learnable features that encode the deformation information of points in 3D space, so that a light-weight MLP is sufficient to model the deformation of the 3D space.
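A minimal PyTorch sketch of such a hybrid grid-plus-MLP deformation module is given below. The grid resolution, feature dimension, MLP size, the positional-encoding dimension `pe_dim`, and the use of `F.grid_sample` for the trilinear interpolation of Eq. (1) are assumptions for illustration and do not reproduce the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationSketch(nn.Module):
    """Illustrative version of Eqs. (1)-(2): a dense feature grid queried by
    trilinear interpolation, plus a light-weight MLP regressing the offset."""
    def __init__(self, res=(64, 64, 64), feat_dim=8, hidden=64, pe_dim=44):
        super().__init__()
        # grid stored as (1, C, Nz, Ny, Nx) so that grid_sample can be used
        self.grid = nn.Parameter(torch.zeros(1, feat_dim, *res))
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + pe_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3))                      # 3D offset Δp

    def interp(self, pts):
        # pts: (N, 3) normalized to [-1, 1]; returns (N, C) interpolated features
        loc = pts.view(1, -1, 1, 1, 3)
        feat = F.grid_sample(self.grid, loc, mode='bilinear', align_corners=True)
        return feat.view(self.grid.shape[1], -1).t()

    def forward(self, pts, pe_pt):
        # pe_pt: positional encoding of (p, t); at t = t_can the offset is
        # defined to be zero (Eq. 2) and this branch would simply be skipped
        f_d = self.interp(pts)
        return self.mlp(torch.cat([f_d, pe_pt], dim=-1))
```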

3.3 Canonical Module for View Synthesis

To render the pixel color of a camera ray in the observation space, we first sample K points on the ray and transform them to the canonical space via our deformation module. The canonical module then computes the corresponding densities and colors for volume rendering to composite the pixel color.


Density and Color Grids. The canonical module contains a density grid to model the scene geometry, and a hybrid explicit-implicit representation (a color feature grid together with a light-weight MLP) to model the view-dependent effect. The density grid $G^{\sigma} \in \mathbb{R}^{N_x \times N_y \times N_z}$ with a resolution of $N_x \times N_y \times N_z$ stores the density information of the scene. Following [48], for a 3D query point $\mathbf{p}'$, we perform trilinear interpolation on the grid and apply post-activation with the softplus function to get the density: $\sigma = \mathrm{softplus}(\mathrm{interp}(\mathbf{p}', G^{\sigma}))$. The color feature grid $G^c \in \mathbb{R}^{H' \times W' \times D' \times N'_c}$ with a resolution of $H' \times W' \times D'$ stores the color features of the scene, where the feature dimension is $N'_c$. To obtain the view-dependent color $\mathbf{c}$ for a 3D point $\mathbf{p}'$, we use a light-weight MLP $F^c_{\theta_2}$ to decode the interpolated color feature $f^c$. Positional encoding is applied to $\mathbf{p}'$ and $\mathbf{d}$:
$$f^c = \mathrm{interp}(\mathbf{p}', G^c), \qquad \mathbf{c} = F^c_{\theta_2}(\mathbf{p}', \mathbf{d}, f^c). \tag{3}$$
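As a rough companion to the deformation sketch above, the canonical module could be sketched as follows; the grid shapes, the shared interpolation helper, and the small color MLP are illustrative assumptions rather than the exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grid_interp(grid, pts):
    # grid: (1, C, D, H, W); pts: (N, 3) in [-1, 1] -> (N, C) trilinear features
    feat = F.grid_sample(grid, pts.view(1, -1, 1, 1, 3),
                         mode='bilinear', align_corners=True)
    return feat.view(grid.shape[1], -1).t()

class CanonicalSketch(nn.Module):
    """Density grid with softplus post-activation plus a color feature grid
    decoded by a small MLP, in the spirit of Eq. (3)."""
    def __init__(self, res=(64, 64, 64), color_dim=12, pe_dim=60, hidden=128):
        super().__init__()
        self.density = nn.Parameter(torch.zeros(1, 1, *res))      # geometry
        self.color = nn.Parameter(torch.zeros(1, color_dim, *res))  # appearance features
        self.rgb_mlp = nn.Sequential(
            nn.Linear(color_dim + pe_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, pts_can, pe_pd):
        # pts_can: canonical-space points p'; pe_pd: positional encoding of (p', d)
        sigma = F.softplus(grid_interp(self.density, pts_can)).squeeze(-1)
        rgb = self.rgb_mlp(torch.cat([grid_interp(self.color, pts_can), pe_pd], dim=-1))
        return sigma, rgb
```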

3.4 Occlusion-Aware Volume Rendering

Volume rendering is the key component of radiance field based methods to render pixel colors in a differentiable manner. For a ray $\mathbf{r}(w) = \mathbf{o} + w\mathbf{d}$ emitted from the camera center $\mathbf{o}$ with view direction $\mathbf{d}$ through a given pixel on the image plane, the estimated color $\hat{C}(\mathbf{r})$ of this ray is computed as
$$\hat{C}(\mathbf{r}) = \sum_{k=1}^{K} T(w_k)\,\alpha\big(\sigma(w_k)\delta_k\big)\,\mathbf{c}(w_k), \qquad T(w_k) = \exp\Big(-\sum_{j=1}^{k-1} \sigma(w_j)\delta_j\Big), \tag{4}$$
where $K$ is the number of points sampled on the ray, $\delta_k$ is the distance between adjacent samples, and $\alpha(\sigma(w_k)\delta_k) = 1 - \exp(-\sigma(w_k)\delta_k)$.

Occlusion Problem. Note that for dynamic scenes, occlusion causes problems in canonical scene-based methods. We assume the "empty" points (i.e., non-object points) in space are static. If a 3D point is an empty point at time $t$ but is occupied by an object point in the canonical space, occlusion happens. This is because the ideal deformation for an empty point in the observation space is zero, so this point is mapped to the same location (now occupied by an object point) in the canonical space (see Fig. 3 for an illustration). As a result, the obtained density and color for this empty point are non-zero, leading to an incorrect rendered color.

Occlusion Reasoning. To tackle the occlusion problem, we additionally estimate an occlusion mask for a query point using the deformation MLP $F^d_{\theta_1}$. Then Eq. (2) is revised as
$$(\Delta\mathbf{p}, w_{occ}) = F^d_{\theta_1}(\mathbf{p}, t, f^d). \tag{5}$$
If a query point is an empty point at time $t$ but is occupied by an object point in the canonical space, the estimated $w_{occ}$ should be 0.


Fig. 3. Effect of occlusion. If a static empty point at the current time is occluded by an object point at the canonical time, this empty point will bring the object point back to the current time, which creates artifacts.

Based on the estimated occlusion mask $w_{occ}$, we filter the estimated density and color before sending them to volume rendering, removing the influence of the occluded points:
$$\sigma' = \sigma \times w_{occ}, \qquad \mathbf{c}' = \mathbf{c} \times w_{occ}. \tag{6}$$

The occlusion estimation is optimized by the image reconstruction loss in an end-to-end manner. By taking occlusion into account, our method is able to achieve better rendering quality.
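A compact sketch of the occlusion-aware compositing of Eqs. (4) and (6) is given below; the tensor shapes and the numerical-stability epsilon are assumptions.

```python
import torch

def composite_rays(sigma, rgb, deltas, w_occ):
    """Alpha-composite per-sample densities and colors along each ray (Eq. 4),
    after masking occluded samples with the estimated occlusion weight (Eq. 6).
    sigma, deltas, w_occ: (R, K); rgb: (R, K, 3)."""
    sigma = sigma * w_occ                                   # Eq. (6): filter density
    rgb = rgb * w_occ.unsqueeze(-1)                         # Eq. (6): filter color
    alpha = 1.0 - torch.exp(-sigma * deltas)                # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)      # transmittance after sample k
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)  # T_1 = 1
    weights = trans * alpha                                 # contribution of each sample
    return (weights.unsqueeze(-1) * rgb).sum(dim=-2)        # (R, 3) rendered colors
```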

4 Model Optimization

There is a trade-off between grid resolution and time consumption: a higher grid resolution generally leads to better image quality but at the cost of a higher computational time. Meanwhile, the space is dominated by empty regions, which have no effect on the synthesized images.

Loss Function. We utilize a series of losses to optimize the model. The first is the photometric loss, the mean squared error (MSE) between the estimated ray color $\hat{C}(\mathbf{r})$ and the ground-truth color $C(\mathbf{r})$:
$$L_{photo} = \frac{1}{|\mathcal{R}|}\sum_{\mathbf{r}\in\mathcal{R}} \big\| \hat{C}(\mathbf{r}) - C(\mathbf{r}) \big\|_2^2, \tag{7}$$
where $\mathcal{R}$ is the set of rays sampled in one batch. Also, similarly to [48], we use the sampled point color supervision $L_{ptc}$, which optimizes the colors of the sampled points with the top $N$ weights that contribute most to the rendered color of the ray. In addition, we use the background entropy loss $L_{bg}$ to encourage points to concentrate on either foreground or background. For the deformation module, we apply an L1 norm regularization to the estimated deformation, based on the prior that most points are static:


$$L_{d\_norm} = \frac{1}{|\mathcal{R}|}\sum_{\mathbf{r}\in\mathcal{R}}\sum_{i=1}^{K} \left\| \Delta\mathbf{p}_i \right\|_1. \tag{8}$$

Moreover, we apply a total variation loss on the deformation feature grid to smooth the voxel features:
$$L_{d\_tv} = \frac{1}{|\mathcal{V}|}\sum_{v\in\mathcal{V}}\sum_{c\in[N_c]} \sqrt{\Delta^2_x(v, c) + \Delta^2_y(v, c) + \Delta^2_z(v, c)}, \tag{9}$$
where $\Delta^2_x(v, c)$ denotes the squared difference between the $c$-th element of voxel $\mathcal{V}(i, j, k)$ and that of voxel $\mathcal{V}(i+1, j, k)$, and likewise for $\Delta^2_y(v, c)$ and $\Delta^2_z(v, c)$. The overall loss function can be written as
$$L = L_{photo} + w_{ptc}\cdot L_{ptc} + w_{bg}\cdot L_{bg} + w_{d\_norm}\cdot L_{d\_norm} + w_{d\_tv}\cdot L_{d\_tv}, \tag{10}$$

where $w_{ptc}$, $w_{bg}$, $w_{d\_norm}$, and $w_{d\_tv}$ are weights that balance each component of the final coarse loss. Following [48], we use the same strategy for voxel allocation, model point sampling, and low-density initialization. Considering that the deformation is more sensitive to resolution than the radiance (density and color), we set a higher grid resolution for the deformation module than for the canonical module.

Progressive Training. Typically, the further a time step is from the canonical time, the larger the deformation between this time step and the canonical one becomes. To reduce the learning difficulty, we use a progressive training strategy: the optimization starts with training images close to the canonical time and progressively adds images at further time steps.

Coarse-to-Fine Optimization. To speed up training while maintaining model capacity, we adopt a coarse-to-fine optimization procedure following [31,48]. We first optimize a coarse model to roughly recover the deformation and the canonical-space geometry. Then, we use the coarse model to locate the object region and filter out a large portion of empty space, after which a fine model is optimized to recover a more accurate and detailed deformation and geometry. In the coarse model, we do not model the view-dependent effect, and the feature dimension of the color grid is set to 3, directly corresponding to the RGB color. Based on the optimized coarse model, we apply empty space filtering strategies, including finding the fine-stage bounding box and filtering empty points, to speed up training. We also initialize the fine-stage model with the model weights trained during the coarse stage. We present details of our empty space filtering strategies and the fine model design in the supplementary material.
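To tie the terms of Eq. (10) together, here is a minimal sketch of how the total training loss could be assembled; the weight values, the total-variation implementation via squared neighbor differences, and the function signatures are illustrative assumptions.

```python
import torch

def grid_tv(grid):
    # Simple squared-difference total variation over a (1, C, D, H, W) grid,
    # an assumed stand-in for the regularizer of Eq. (9).
    return ((grid[..., 1:] - grid[..., :-1]) ** 2).mean() \
         + ((grid[..., 1:, :] - grid[..., :-1, :]) ** 2).mean() \
         + ((grid[..., 1:, :, :] - grid[..., :-1, :, :]) ** 2).mean()

def total_loss(pred_rgb, gt_rgb, offsets, deform_grid,
               l_ptc, l_bg, w_ptc=0.1, w_bg=0.01, w_norm=0.01, w_tv=0.01):
    """Sketch of Eq. (10): photometric term plus auxiliary terms with
    hypothetical weights; l_ptc and l_bg are assumed to be precomputed."""
    l_photo = ((pred_rgb - gt_rgb) ** 2).sum(dim=-1).mean()   # Eq. (7)
    l_norm = offsets.abs().sum(dim=-1).mean()                 # Eq. (8), L1 on Δp
    return l_photo + w_ptc * l_ptc + w_bg * l_bg \
         + w_norm * l_norm + w_tv * grid_tv(deform_grid)
```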

5 Experiments

5.1 Dataset and Metrics

We evaluate our method on the D-NeRF dataset [39], which contains eight dynamic scenes with 360° viewpoint settings. Besides the synthetic dataset, we also


Table 1. Quantitative comparison. We report LPIPS (lower is better) and PSNR/SSIM (higher is better) on eight dynamic scenes of the D-NeRF dataset. Methods

Hell warrior Mutant Hook Bouncing balls PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓

D-NeRF (half1 ) [39] 25.03 25.53 NDVG (half1 )

0.951 0.069 0.949 0.073

31.29 35.53

0.974 0.027 0.988 0.014

29.26 29.80

0.965 0.117 0.965 0.037

38.93 34.58

0.990 0.103 0.972 0.114

NDVG (w/o occ2 ) NDVG (w/o grid3 ) NDVG (w/o refine4 ) NDVG (w/o filter5 ) NDVG (full6 )

25.16 26.45 24.30 19.55 26.49

0.956 0.959 0.946 0.927 0.960

34.14 34.42 28.59 31.75 34.41

0.980 0.980 0.940 0.961 0.980

29.88 29.08 26.85 27.71 30.00

0.963 0.956 0.935 0.948 0.963

37.14 37.78 28.17 35.47 37.52

0.986 0.988 0.954 0.988 0.987

Methods

Lego T-Rex Stand up Jumping jacks PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓

D-NeRF (half) [39] NDVG (half)

21.64 25.23

0.839 0.165 0.931 0.049

31.76 30.15

0.977 0.040 0.967 0.047

32.80 34.05

0.982 0.021 0.983 0.022

32.80 29.45

0.981 0.037 0.960 0.078

NDVG (w/o occ) NDVG (w/o grid) NDVG (w/o refine) NDVG (w/o filter) NDVG (full) 1

24.77 24.18 23.30 22.75 25.04

0.935 0.916 0.851 0.887 0.940

32.57 31.64 27.35 28.58 32.62

0.979 0.976 0.940 0.952 0.978

18.78 32.99 29.80 32.36 33.22

0.899 0.980 0.965 0.976 0.979

30.87 30.64 26.13 28.19 31.25

0.973 0.971 0.920 0.957 0.974

2 3 4 5 6

0.067 0.065 0.092 0.104 0.067

0.059 0.078 0.167 0.140 0.053

0.026 0.025 0.070 0.051 0.027

0.031 0.034 0.079 0.067 0.033

0.047 0.050 0.081 0.068 0.046

0.112 0.027 0.051 0.035 0.030

0.080 0.063 0.178 0.054 0.075

0.044 0.044 0.150 0.077 0.040

half: using half resolution of the original dataset images. w/o occ: not using occlusion reasoning. w/o grid: not using deformation feature grid, only deform MLP. w/o refine: only using coarse training stage. w/o filter: not using coarse training, direct optimize fine module. full: using full resolution of the original dataset images.

conduct experiments on real scenes proposed by HyperNeRF [36]. This dataset captures images of real dynamic scenes with a multi-view camera rig consisting of two phones with a baseline of around 16 cm. The dynamic motion consists of both rigid and non-rigid deformation in unbounded scenes. We use several metrics for the evaluation: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) [67].

5.2 Implementation Details

We set the expected voxel number to 1,664k and 190³ for the grid in the deformation module in the coarse and fine stages, respectively. For the canonical module, we set the expected voxel number to 1,024k and 160³. The light-weight MLP in the deformation module has 4 layers, each with a width of 64. The light-weight MLP in the canonical module has 3 layers, each with a width of 128. When training with full-resolution images, we train 10k and 20k iterations for the coarse and fine stages for all scenes. When training with half-resolution images, we reduce the iterations to 5k and 10k for the coarse and fine stages. In terms of positional encoding, we set the frequency to 5 for position and time, and 4 for direction. We use the Adam optimizer [20] and sample 8,192 rays per iteration. More details of the settings can be found in our supplementary material.
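For reference, these hyper-parameters could be collected into a single configuration object as sketched below; the dictionary layout and field names are hypothetical, but the values mirror the ones stated above.

```python
# Hypothetical configuration mirroring the stated hyper-parameters.
NDVG_CONFIG = {
    "deform_grid_voxels": {"coarse": 1_664_000, "fine": 190 ** 3},
    "canonical_grid_voxels": {"coarse": 1_024_000, "fine": 160 ** 3},
    "deform_mlp": {"layers": 4, "width": 64},
    "canonical_mlp": {"layers": 3, "width": 128},
    "iterations_full_res": {"coarse": 10_000, "fine": 20_000},
    "iterations_half_res": {"coarse": 5_000, "fine": 10_000},
    "pos_enc_freqs": {"position": 5, "time": 5, "direction": 4},
    "optimizer": "Adam",
    "rays_per_iteration": 8192,
}
```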


5.3 Comparisons

Quantitative Evaluation on the Dataset. We first quantitatively compare the results in Table 1. To compare with D-NeRF [39], we use the same 400×400 image resolution and present the average results on each scene under the metrics mentioned above. According to Table 1, our method NDVG achieves results comparable to D-NeRF [39] on all three metrics. For real scenes, we test on four scenes, namely Peel Banana, Chicken, Broom and 3D Printer, following HyperNeRF [36]. We follow the same experimental settings as TiNeuVox [9] and report PSNR and MS-SSIM in Table 2 (the results of the other methods are taken from the TiNeuVox paper). As shown in Table 2, our method achieves comparable or even better results while being at least 10× faster, which clearly demonstrates its effectiveness.

Table 2. Quantitative comparison on real scenes.

Methods          Time      3D printer        Broom             Chicken           Peel banana       Mean
                           PSNR↑  MS-SSIM↑   PSNR↑  MS-SSIM↑   PSNR↑  MS-SSIM↑   PSNR↑  MS-SSIM↑   PSNR↑  MS-SSIM↑
NeRF [31]        ∼hours    20.7   0.780      19.9   0.653      19.9   0.777      20.0   0.769      20.1   0.745
NV [27]          ∼hours    16.2   0.665      17.7   0.623      17.6   0.615      15.9   0.380      16.9   0.571
NSFF [22]        ∼hours    27.7   0.947      26.1   0.871      26.9   0.944      24.6   0.902      26.3   0.916
Nerfies [35]     ∼hours    20.6   0.830      19.2   0.567      26.7   0.943      22.4   0.872      22.2   0.803
HyperNeRF [36]   ∼hours    20.0   0.821      19.3   0.591      26.9   0.948      23.3   0.896      22.4   0.814
TiNeuVox [9]     30 mins   22.8   0.841      21.5   0.686      28.3   0.947      24.4   0.873      24.3   0.837
NDVG (Ours)      35 mins   22.4   0.839      21.5   0.703      27.1   0.939      22.8   0.828      23.3   0.823

Table 3. Training time and rendering speed comparison. We report these using the public code of D-NeRF [39] on the same device (RTX 3090 GPU) as our method. We include the mean PSNR across the eight scenes of the D-NeRF dataset [39] to compare synthesized image quality. Our method achieves good PSNR while spending much less optimization time and offering a faster rendering speed.

Method               PSNR↑   Training time (s/scene)↓   Rendering speed (s/img)↓
NeRF (half)† [31]    19.00   60185                      4.5
D-NeRF (half) [39]   30.02   99034                      8.7
NDVG (half)          30.32   1380                       0.4
NDVG (w/o refine)    26.73   708                        2.6
NDVG (w/o filter)    27.85   2487                       3.5
NDVG (full)          31.08   2087                       1.7

† We use the implementation of D-NeRF [39] to train NeRF on the dynamic dataset.

Training Time and Rendering Speed Comparison. The key contribution of our work is to accelerate the optimization of novel view synthesis models on dynamic scenes. As shown in Table 3, with the same half-resolution setting as D-NeRF [39], our method achieves 70× faster convergence with an even higher average PSNR. Although our main purpose is to speed up training, the proposed method also has a reasonably fast rendering speed compared to purely neural-network based models: according to Table 3, our method renders 20× faster than D-NeRF [39]. For the real scenes in Table 2, compared with previous methods without acceleration, which take hours or even days to train, our method finishes training in 35 min, at least 10× faster. Compared with the concurrent work TiNeuVox [9], which also aims to speed up training, we achieve comparable results with the same training time, without using CUDA acceleration for ray point sampling and total variation computation.

Fig. 4. Learned Geometry. We show examples of geometries learned by our model. For each, we show rendered images and corresponding disparity under two novel views and six time steps.

Qualitative Comparison. We provide visualizations of the learned scene representation in Fig. 4. Our method successfully recovers the canonical geometry and renders high-quality dynamic sequences, indicating that it faithfully models the motion of the dynamic scenes. In Fig. 5, we show rendering results for more difficult situations and compare them with the results of D-NeRF [39]. Our method achieves comparable or even better image quality using only 1/70 of the training time. Zooming in for more details, we can see that our method actually recovers more high-frequency details, taking the armour and the cloth of the worker as examples.

5.4 Method Analysis

In this section, we aim to study and prove the effectiveness of three designs in our proposed method: the occlusion reasoning, the coarse-to-fine optimization strategy, and the deformation feature grid.


Occlusion Reasoning. In Sect. 3.4, we argued that occlusion reasoning is critical for the canonical-space pipeline under the assumption that empty points are static. The rows NDVG (full) and NDVG (w/o occ) in Table 1 compare the quantitative results of the models with and without occlusion reasoning. The model with occlusion reasoning achieves more accurate results, verifying the effectiveness of our design. We also visualize the estimated occlusion at different time steps in Fig. 6. We warp the canonical grids (density grid and color grid) to the different time steps and show the object points with their corresponding colors, while the occluded points estimated by the deformation module are shown in blue. In Fig. 6, as the character throws a punch and the body moves forward, the empty points behind his back are estimated as occluded, which matches the actual situation. These results indicate that the estimated occlusion is accurate.

Fig. 5. Qualitative comparison. Synthesized images on the test set of the dataset. For each scene, we show an image rendered at a novel view, followed by zoom-ins of the ground truth, our NDVG, and D-NeRF [39].

Effectiveness of Coarse-to-Fine Optimization. The coarse-to-fine optimization not only improves the image quality rendered by our model, but also speeds up the training and rendering process significantly. To verify its efficiency, we conduct two extra experiments (see Table 1). The first is NDVG (w/o refine), which contains only the coarse training stage. As NDVG (w/o refine) has a limited grid resolution and no view-dependent color representation, its performance is expectedly low. The second is NDVG (w/o filter), which is not initialized by coarse training and optimizes the fine stage directly from scratch. As NDVG (w/o filter) has no trained coarse model for locating the object region, the fine model cannot shrink the bounding box of the scene, which means most of the grid space is wasted and leads to clearly worse results. In addition, the empty points cannot be filtered, which significantly increases the training time and decreases the rendering speed, as evidenced in Table 3.

Deformation Feature Grid. In our deformation module, we use a deformation feature grid to encode the dynamic information of 3D points, which is decoded by a light-weight MLP to regress the deformation. By integrating this feature grid, a light-weight MLP is sufficient to model the complex motion of a scene. Table 1 shows that the method with the deformation feature grid, NDVG (full), achieves better novel-view synthesis results than the one without, NDVG (w/o grid). This clearly demonstrates the effectiveness of the proposed deformation feature grid. Please refer to our supplementary material for a further study of the deformation feature grid design.

Fig. 6. Occlusion estimation visualization. We visualize the estimated occluded points at different times in blue. The first row shows the image at each time step, giving an impression of the motion. We warp the canonical grids to the corresponding time step using the deformation estimated by the deformation module, and visualize points whose density is over 0.8 with their RGB colors. (Color figure online)

6 Conclusions

In this paper, we have presented a fast optimization method for dynamic view synthesis based on a hybrid implicit-explicit representation. Our method consists of a deformation module that maps a 3D point in the observation space to the canonical space, and a canonical module that represents the scene geometry and appearance. In each module, explicit dense grids and a light-weight MLP are used for fast feature interpolation and decoding, which significantly reduces the optimization time compared with methods that rely on heavy MLP queries. Moreover, occlusion is explicitly modeled to improve the rendering quality. Experiments show that our method requires only 30 min to converge on a dynamic scene. Compared with the existing D-NeRF method, our method achieves a 70× acceleration with comparable rendering quality.

Limitation. Although our method greatly speeds up the training of radiance field methods for dynamic view synthesis, it is mainly designed for bounded scenes. In the future, we will extend our method to deal with real-world unbounded scenes (e.g., unbounded 360° and forward-facing scenes).

Acknowledgements. This work was supported in part by the National Natural Science Foundation of China (Nos. 61871325, 61901387, 62001394, 62202409) and the National Key Research and Development Program of China (No. 2018AAA0102803).

References 1. Aliev, K.-A., Sevastopolsky, A., Kolos, M., Ulyanov, D., Lempitsky, V.: Neural point-based graphics. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 696–712. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-58542-6 42 2. Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: a multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5855–5864 (2021) 3. Boss, M., Braun, R., Jampani, V., Barron, J.T., Liu, C., Lensch, H.: Nerd: neural reflectance decomposition from image collections. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 12684–12694 (2021) 4. Buehler, C., Bosse, M., McMillan, L., Gortler, S., Cohen, M.: Unstructured lumigraph rendering. In: ACM SIGGRAPH Computer Graphics (SIGGRAPH), pp. 425–432 (2001) 5. Chen, A., et al.: MVSNerf: fast generalizable radiance field reconstruction from multi-view stereo. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 14124–14133 (2021) 6. Chen, S.E., Williams, L.: View interpolation for image synthesis. In: ACM SIGGRAPH Computer Graphics (SIGGRAPH), pp. 279–288 (1993) 7. Choi, I., Gallo, O., Troccoli, A., Kim, M.H., Kautz, J.: Extreme view synthesis. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 7781–7790 (2019) 8. Du, Y., Zhang, Y., Yu, H.X., Tenenbaum, J.B., Wu, J.: Neural radiance flow for 4D view synthesis and video processing. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 14304–14314 (2021) 9. Fang, J., et al.: Fast dynamic radiance fields with time-aware neural voxels. arXiv preprint arXiv:2205.15285 (2022) 10. Flynn, J., et al.: Deepview: view synthesis with learned gradient descent. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2367–2376 (2019) 11. Flynn, J., Neulander, I., Philbin, J., Snavely, N.: Deepstereo: learning to predict new views from the world’s imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5515–5524 (2016)


12. Gan, W., Xu, H., Huang, Y., Chen, S., Yokoya, N.: V4d: Voxel for 4D novel view synthesis. arXiv preprint arXiv:2205.14332 (2022) 13. Gao, C., Saraf, A., Kopf, J., Huang, J.B.: Dynamic view synthesis from dynamic monocular video. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5712–5721 (2021) 14. Greene, N.: Environment mapping and other applications of world projections. IEEE Comput. Graph. Appl. (CG&A) 6(11), 21–29 (1986) 15. Habtegebrial, T., Jampani, V., Gallo, O., Stricker, D.: Generative view synthesis: from single-view semantics to novel-view images. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) (2020) 16. Hedman, P., Philip, J., Price, T., Frahm, J.M., Drettakis, G., Brostow, G.: Deep blending for free-viewpoint image-based rendering. ACM Trans. Graph. (TOG) 37(6), 1–15 (2018) 17. Huang, H.-P., Tseng, H.-Y., Lee, H.-Y., Huang, J.-B.: Semantic view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 592–608. Springer, Cham (2020). https://doi.org/10.1007/978-3-03058610-2 35 18. Kajiya, J.T., Von Herzen, B.P.: Ray tracing volume densities. ACM SIGGRAPH Comput. Graph. (SIGGRAPH) 18(3), 165–174 (1984) 19. Kalantari, N.K., Wang, T.C., Ramamoorthi, R.: Learning-based view synthesis for light field cameras. ACM Trans. Graph. (TOG) 35(6), 1–10 (2016) 20. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the The International Conference on Learning Representations (ICLR) (2015) 21. Levoy, M., Hanrahan, P.: Light field rendering. In: ACM SIGGRAPH Computer Graphics (SIGGRAPH), pp. 31–42 (1996) 22. Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6498–6508 (2021) 23. Lin, C.H., Ma, W.C., Torralba, A., Lucey, S.: Barf: bundle-adjusting neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5741–5751 (2021) 24. Lindell, D.B., Martel, J.N., Wetzstein, G.: Autoint: automatic integration for fast neural volume rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14556–14565 (2021) 25. Liu, J.W., et al.: Devrf: fast deformable voxel radiance fields for dynamic scenes. arXiv preprint arXiv:2205.15723 (2022) 26. Liu, L., Gu, J., Lin, K.Z., Chua, T.S., Theobalt, C.: Neural sparse voxel fields. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 15651–15663 (2020) 27. Lombardi, S., Simon, T., Saragih, J., Schwartz, G., Lehrmann, A., Sheikh, Y.: Neural volumes: learning dynamic renderable volumes from images. ACM Trans. Graph. (TOG) 38(4), 65:1–65:14 (2019) 28. Lombardi, S., Simon, T., Schwartz, G., Zollhoefer, M., Sheikh, Y., Saragih, J.: Mixture of volumetric primitives for efficient neural rendering. ACM Trans. Graph. (TOG) 40(4), 1–13 (2021) 29. Martin-Brualla, R., et al.: Nerf in the wild: neural radiance fields for unconstrained photo collections. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7210–7219 (2021) 30. Mildenhall, B., et al.: Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graph. (TOG) 38(4), 1–14 (2019)

466

X. Guo et al.

31. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: representing scenes as neural radiance fields for view synthesis. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 405–421 (2020) 32. M¨ uller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. arXiv preprint arXiv:2201.05989 (2022) 33. Neff, T., et al.: DONeRF: towards real-time rendering of compact neural radiance fields using depth oracle networks. Comput. Graph. Forum 40(4) (2021) 34. Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Differentiable volumetric rendering: learning implicit 3D representations without 3D supervision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3504–3515 (2020) 35. Park, K., et al.: Nerfies: deformable neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2021) 36. Park, K., et al.: HyperNeRF: a higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph. (TOG) 40(6), 1–12 (2021) 37. Penner, E., Zhang, L.: Soft 3D reconstruction for view synthesis. ACM Trans. Graph. (TOG) 36(6), 1–11 (2017) 38. Piala, M., Clark, R.: Terminerf: ray termination prediction for efficient neural rendering. In: Proceedings of the International Conference on 3D Vision (3DV), pp. 1106–1114 (2021) 39. Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-nerf: neural radiance fields for dynamic scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10318–10327 (2021) 40. Rebain, D., Jiang, W., Yazdani, S., Li, K., Yi, K.M., Tagliasacchi, A.: Derf: decomposed radiance fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14153–14161 (2021) 41. Reiser, C., Peng, S., Liao, Y., Geiger, A.: Kilonerf: speeding up neural radiance fields with thousands of tiny mlps. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 14335–14345 (2021) 42. Riegler, G., Koltun, V.: Free view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12364, pp. 623–640. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7 37 43. Riegler, G., Koltun, V.: Stable view synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12216–12225 (2021) 44. Shum, H., Kang, S.B.: Review of image-based rendering techniques. In: Visual Communications and Image Processing (VCIP), vol. 4067, pp. 2–13 (2000) 45. Srinivasan, P.P., Deng, B., Zhang, X., Tancik, M., Mildenhall, B., Barron, J.T.: Nerv: neural reflectance and visibility fields for relighting and view synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7495–7504 (2021) 46. Srinivasan, P.P., Mildenhall, B., Tancik, M., Barron, J.T., Tucker, R., Snavely, N.: Lighthouse: predicting lighting volumes for spatially-coherent illumination. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8080–8089 (2020) 47. Srinivasan, P.P., Tucker, R., Barron, J.T., Ramamoorthi, R., Ng, R., Snavely, N.: Pushing the boundaries of view extrapolation with multiplane images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 175–184 (2019)

NDVG

467

48. Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction. arXiv preprint arXiv:2111.11215 (2021) 49. Tewari, A., et al.: Advances in neural rendering. In: ACM SIGGRAPH Computer Graphics (SIGGRAPH), pp. 1–320 (2021) 50. Tewari, A., et al.: State of the art on neural rendering. In: Computer Graphics Forum, pp. 701–727 (2020) 51. Thies, J., Zollh¨ ofer, M., Nießner, M.: Deferred neural rendering: image synthesis using neural textures. ACM Trans. Graph. (TOG) 38(4), 1–12 (2019) 52. Tretschk, E., Tewari, A., Golyanik, V., Zollh¨ ofer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 12959–12970 (2021) 53. Trevithick, A., Yang, B.: Grf: learning a general radiance field for 3D representation and rendering. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 15182–15192 (2021) 54. Tucker, R., Snavely, N.: Single-view view synthesis with multiplane images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 551–560 (2020) 55. Wang, C., Eckart, B., Lucey, S., Gallo, O.: Neural trajectory fields for dynamic novel view synthesis. arXiv preprint arXiv:2105.05994 (2021) 56. Wang, Q., et al.: Ibrnet: learning multi-view image-based rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4690–4699 (2021) 57. Wang, Z., Wu, S., Xie, W., Chen, M., Prisacariu, V.A.: Nerf-: neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064 (2021) 58. Xian, W., Huang, J.B., Kopf, J., Kim, C.: Space-time neural irradiance fields for free-viewpoint video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9421–9431 (2021) 59. Xu, Z., Bi, S., Sunkavalli, K., Hadap, S., Su, H., Ramamoorthi, R.: Deep view synthesis from sparse photometric images. ACM Trans. Graph. (TOG) 38(4), 1– 13 (2019) 60. Yariv, L., et al.: Multiview neural surface reconstruction by disentangling geometry and appearance. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 2492–2502 (2020) 61. Yen-Chen, L., Florence, P., Barron, J.T., Rodriguez, A., Isola, P., Lin, T.Y.: inerf: inverting neural radiance fields for pose estimation. In: Proceedings of the International Conference on Intelligent Robots and Systems (IROS), pp. 1323–1330 (2021) 62. Yoon, J.S., Kim, K., Gallo, O., Park, H.S., Kautz, J.: Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5336–5345 (2020) 63. Yu, A., Fridovich-Keil, S., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: radiance fields without neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2021) 64. Yu, A., Li, R., Tancik, M., Li, H., Ng, R., Kanazawa, A.: Plenoctrees for realtime rendering of neural radiance fields. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5752–5761 (2021) 65. Yu, A., Ye, V., Tancik, M., Kanazawa, A.: pixelnerf: neural radiance fields from one or few images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4578–4587 (2021)

468

X. Guo et al.

66. Zhang, K., Riegler, G., Snavely, N., Koltun, V.: NERF++: analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492 (2020) 67. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 586–595 (2018) 68. Zhang, X., Srinivasan, P.P., Deng, B., Debevec, P., Freeman, W.T., Barron, J.T.: Nerfactor: neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. (TOG) 40(6), 1–18 (2021) 69. Zhou, T., Tucker, R., Flynn, J., Fyffe, G., Snavely, N.: Stereo magnification: learning view synthesis using multiplane images. ACM Trans. Graph. (TOG) 37(4), 1–12 (2018)

Spotlights: Probing Shapes from Spherical Viewpoints

Jiaxin Wei1, Lige Liu2, Ran Cheng2, Wenqing Jiang1, Minghao Xu2, Xinyu Jiang2, Tao Sun2(B), Sören Schwertfeger1, and Laurent Kneip1

1 ShanghaiTech University, Shanghai, China
{weijx,jiangwq,soerensch,lkneip}@shanghaitech.edu.cn
2 RoboZone, Midea Inc., Foshan, China
{liulg12,chengran1,xumh33,jxy77,tsun}@midea.com

Abstract. Recent years have witnessed the surge of learned representations that directly build upon point clouds. Inspired by spherical multi-view scanners, we propose a novel sampling model called Spotlights to represent a 3D shape as a compact 1D array of depth values. It simulates the configuration of cameras evenly distributed on a sphere, where each virtual camera casts light rays from its principal point to probe for possible intersections with the object surrounded by the sphere. The structured point cloud is hence given implicitly as a function of depths. We provide a detailed geometric analysis of this new sampling scheme and prove its effectiveness in the context of the point cloud completion task. Experimental results on both synthetic and real datasets demonstrate that our method achieves competitive accuracy and consistency at a lower computational cost. The code and dataset will be released at https://github.com/goldoak/Spotlights.

Keywords: Shape representation · Point cloud completion

1 Introduction

Over the past decade, the community has put major efforts into the recovery of explicit 3D object models, which is of eminent importance in many computer vision and robotics applications [2,10,13,21–23,33,36]. Traditionally, the recovery of complete object geometries from sensor readings relies on scanning systems that employ spherical high-resolution camera arrangements [11,26]. If additionally equipped with light sources, such scanners are turned into light stages. However, without such expensive equipment, shape reconstruction often turns out to be erroneous and incomplete owing to occlusions, limited viewpoints, and measurement imperfections. Previous work has investigated many different representations of objects to deal with those problems, such as volumetric occupancy grids [41], distance


fields [24], and more recently, point cloud-based representations [30,31]. The latter are preferred in terms of efficiency and capacity of maintaining fine-grained details. Yet, each point in a point cloud needs to be represented by a 3D coordinate, and the large number of points generated in reality can strain computational resources, bandwidth, and storage. To this end, we propose a novel structured point cloud representation inspired by spherical multi-view 3D object scanners [11,26]. Sparse point clouds are generated by a simulated configuration of multiple views evenly distributed on a sphere around the object where each view casts a preset bundle of rays from its center towards the object. Due to its similarity to stage lighting instruments, we name it as Spotlights. It represents point clouds in a more compact way since it only needs to store a 1D array of scalars, i.e. the depths along the rays, and thus serves as a basis for efficient point-based networks. Moreover, Spotlights is a structured representation that produces ordered point clouds. This orderpreserving property is of great benefit to spatial-temporal estimation problems such as Simultaneous Localization And Mapping (SLAM) [36] and 3D object tracking [17].

Fig. 1. Illustration of Spotlights with only one view. Dashed arrows indicate the ray bundle cast from pi . Some rays intersect with the bounded object (purple) while others spread in void space (orange). (Color figure online)

To demonstrate the superiority of Spotlights, we focus on the 3D shape completion task, which aims to predict a complete point cloud from a partial observation. Previous efforts are devoted to inferring fine-grained details [5,14,44] and designing delicate point-based decoders [37,43]. In this work, we present a lightweight neural network for fast point cloud prediction based on our proposed shape representation. It achieves competitive accuracy without substantially sacrificing completeness. Furthermore, our completed point clouds can easily find correspondences across temporal states and are therefore useful for subsequent tasks. Our main contributions are as follows:


• We introduce a novel point representation denoted Spotlights. It is a unified and compact representation. The effectiveness of the proposed representation is shown by the theoretical analysis of local point density.
• We apply our proposed representation to the task of 3D shape completion and achieve remarkable results on par with existing supervised state-of-the-art approaches on both synthetic and real data. Notably, our network consumes less memory and is fast at inference time.
• We build a new synthetic dataset for 3D shape completion, comprising 128,000 training samples and 30,000 testing samples over 79 unique car models (i.e. 2000 different observations per object). The ground truth point clouds are extracted by our Spotlights.

2 Related Work

Nowadays, shape prediction is mostly studied in the context of shape completion and reconstruction from 3D data (e.g. depth images or point clouds). Though this problem can be solved using lower-level geometric principles (e.g. volumetric diffusion [6], symmetry [38]) or shape retrieval methods [19,25], the focus of our work lies on learning-based 3D point cloud completion. In recent years, the community has demonstrated the feasibility of learning highly efficient representations for geometric problems by relying directly on 3D point clouds [30,31,34]. Fan et al. [9] and Achlioptas et al. [1] introduce auto-encoder CNNs for direct point cloud completion, showing that point clouds encode only points on the surface, naturally permit direct manipulation and put fewer restrictions on the topology or fine-grainedness of a shape. Yuan et al. [44] extend the idea by permutation invariant and convolution-free feature extraction, as well as a coarse-to-fine feature generation process. Xie et al. [42] internally revert to a volumetric representation by adding differentiable layers to transform back and forth between point clouds and grids. Huang et al. [14] furthermore propose to limit the prediction to the missing part of a point cloud, thus preserving details in the original input measurement. Wen et al. [40] mimic an earth mover and complete point clouds by moving points along the surface. While direct 3D point-based representations have lead to significant advances in terms of efficiency, they suffer from disorder and the inability to focus on inherent local structures. Yang et al. [43] propose FoldingNet, a deep auto-encoder that generates ordered point clouds by a folding-based decoder that deforms a regular 2D grid of points onto the underlying 3D object surface of a point cloud. Although they claim that their representation can learn “cuts” on the 2D grid and represent surfaces that are topologically different from a single plane, they at the same time admit that the representation fails on more complex geometries. Li et al. [18] propose to use GAN to generate realistic shapes from unit spheres, establishing dense correspondences between spheres and generated shapes. Wen et al. [39] propose an extension with hierarchical folding by adding skip-attention connections. Groueix et al. [12] introduce AtlasNet. Similar to the structure of FoldingNet, this work deforms multiple 2D grids. Nonetheless, the modelling as


a finite collection of manifolds still imposes constraints on the overall topology of an object’s shape. Tchapmi et al. [37] propose TopNet. Their decoder generates structured point clouds using a hierarchical rooted tree structure. The root node embeds the entire point cloud, child nodes subsets of the points, and leaf nodes individual points. Rather than sampling a finite set of manifolds, this architecture generates structured point clouds without any assumptions on the topology of an object’s shape.

3 Method

In this section, we first review relevant background knowledge on Fibonacci spheres and solid angles (see Sect. 3.1). Then, we describe the detailed construction of our newly proposed Spotlights representation (see Sect. 3.2), and further prove its theoretical validity (see Sect. 3.3).

3.1 Background

Spherical Fibonacci Point Set. One of the most common approximation algorithms to generate uniformly distributed points on a unit sphere S² ⊂ R³ is mapping a 2D Fibonacci lattice onto the spherical surface. The Fibonacci lattice is defined as a unit square [0, 1)² with an arbitrary number of n points evenly distributed inside it. The 2D point with index i is expressed as

p_i = (x_i, y_i) = ([i / Φ], i / n),   0 ≤ i < n.

Note that [·] takes the fractional part of the argument and Φ = (1 + √5) / 2 is the golden ratio. The 2D points are then mapped to 3D points on the unit sphere using cylindrical equal-area projection. The 3D points are easily parameterized using the spherical coordinates

P_i = (ϕ_i, θ_i) = (2πx_i, arccos(1 − 2y_i)),   ϕ_i ∈ [0, 2π], θ_i ∈ [0, π].

We refer the readers to [15] for more details.

Solid Angle. Similar to the planar angle in 2D, the solid angle for a spherical cap is a 3D angle subtended by the surface at its center. Suppose the area of the cap is A and the radius is r; then the solid angle is given as Ω = A / r². Thus, the solid angle of a sphere is 4π. Extended to arbitrary surfaces, the solid angle of a differential area is

dΩ = dA / r² = (r sin θ dϕ · r dθ) / r² = sin θ dθ dϕ,

where θ ∈ [0, π] and ϕ ∈ [0, 2π]. By calculating the surface integral, we can obtain

Ω = ∫∫ sin θ dθ dϕ.
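As a concrete illustration of the two mappings above, the following NumPy sketch (our own illustration, not code released by the authors) maps a Fibonacci lattice of n points onto the unit sphere and returns Cartesian coordinates. The helper is reused in later sketches.

```python
import numpy as np

def fibonacci_sphere(n):
    """Map the 2D Fibonacci lattice onto the unit sphere (cylindrical equal-area projection)."""
    phi_golden = (1.0 + np.sqrt(5.0)) / 2.0          # golden ratio
    i = np.arange(n)
    x = (i / phi_golden) % 1.0                        # fractional part [i / Phi]
    y = i / n
    azimuth = 2.0 * np.pi * x                         # phi_i in [0, 2*pi]
    polar = np.arccos(1.0 - 2.0 * y)                  # theta_i in [0, pi]
    # Spherical -> Cartesian coordinates on S^2.
    return np.stack([np.sin(polar) * np.cos(azimuth),
                     np.sin(polar) * np.sin(azimuth),
                     np.cos(polar)], axis=1)

points = fibonacci_sphere(1024)   # (1024, 3), roughly evenly spread on the sphere
```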

3.2 Spotlights Model Construction

In this subsection, we show how to construct the Spotlights model (see Fig. 1 for an illustration). There are three consecutive steps, described in the following.

Bounding Sphere Extraction. To find the bounding sphere of an object of interest, we first compute its bounding box by determining the difference between the extrema of the surface point coordinates along each dimension. Practically, we can also easily get the bounding box parameters (i.e. center, size and orientation) for real-world objects using state-of-the-art 3D object detection methods [7,16,27–29]. The bounding sphere is then given as the circumscribed sphere of the bounding box. The radius of the sphere is used as a scaling factor to normalize the real-world object into a unit sphere.

Ray Bundle Organization. We leverage the Fibonacci lattice mapping method presented in Sect. 3.1 to evenly sample the unit sphere obtained from step 1 and place a bundle of virtual viewing rays at each position to mimic the behavior of a camera. The motivation for using a sphere instead of a cuboid arrangement of views is given by the fact that a spherical arrangement has more diversity in the distribution of principal viewing directions, thereby alleviating self-occlusion in the data. Also, spherical multi-view configurations are commonly adopted for 2.5D multi-surface rendering [4,35]. To define structured ray bundles emitted from each viewpoint, we can imagine placing a smaller sphere at each primary sampling point on the unit sphere. The intersection plane between the unit sphere and each small sphere forms an internal spherical cap, on which we again use Fibonacci sampling to define a fixed arrangement of sampling points. Thus, the rays originate from the primary sampling points and traverse the secondary sampling points on the spherical cap. It is easy to adjust the opening angle of the ray bundle. Suppose the radii of the outer sphere and the small sphere are R and r, respectively. The maximum polar angle of the spherical cap is then given by ω = arccos(r / (2R)). The opening angle ω as well as the number of rays serve as design hyper-parameters. We refer to Sect. 5.3 for a detailed discussion.

Ray Casting. We finally cast the rays onto the object and retrieve the distance between the view center and the first intersection point with the object's surface. For rays that miss the surface, we manually set the distances to a sentinel value of zero. With known origins and viewing directions, the 3D point cloud can be easily recovered from the depths along the rays. Compared to 3D point coordinates, 1D distances are more efficient to store and generate. To validate this property, we apply our representation to the 3D point cloud completion task. While obtaining competitive results in terms of accuracy, our network is fast at inference time and has fewer parameters than previous methods (see Sect. 5.1). Besides, the fixed ray arrangement enables the reproduction of ordered point clouds, thus providing one-to-one correspondences across temporal states.
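A minimal sketch of this three-step construction is given below, reusing the fibonacci_sphere helper from the Background sketch. The equal-area spacing of the secondary directions inside the cap and the rotation used to align each bundle with its viewing direction are our own simplifications of the description above, not necessarily the authors' exact scheme; retrieving the depths additionally requires any mesh ray-casting routine.

```python
import numpy as np
# fibonacci_sphere() is the helper defined in the Background sketch above.

def rotate_z_to(direction):
    """Rotation matrix taking the +z axis onto the unit vector `direction`."""
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(z, direction)
    c = float(np.dot(z, direction))
    if np.isclose(c, 1.0):
        return np.eye(3)
    if np.isclose(c, -1.0):
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def spotlights_rays(R=1.0, r=0.25, n_primary=32, n_secondary=64):
    """Ray origins and directions of a Spotlights model (illustrative sketch).

    Primary points: Fibonacci samples on the bounding sphere of radius R.
    Secondary directions: equal-area samples inside a cap of half-angle
    omega = arccos(r / (2R)) around the view axis (towards the center).
    """
    omega = np.arccos(r / (2.0 * R))
    primaries = R * fibonacci_sphere(n_primary)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    j = np.arange(n_secondary)
    cos_t = 1.0 - (j + 0.5) / n_secondary * (1.0 - np.cos(omega))
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    az = j * golden_angle
    local_dirs = np.stack([sin_t * np.cos(az), sin_t * np.sin(az), cos_t], axis=1)

    origins, directions = [], []
    for p in primaries:
        view_dir = -p / np.linalg.norm(p)          # principal axis: towards the center
        Rm = rotate_z_to(view_dir)
        origins.append(np.repeat(p[None, :], n_secondary, axis=0))
        directions.append(local_dirs @ Rm.T)
    return np.concatenate(origins), np.concatenate(directions)

origins, directions = spotlights_rays()            # 32 * 64 = 2048 rays
# Depths along these rays (zero for misses) are what the representation stores.
```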

3.3 Analysis of Spotlights

Fig. 2. Mathematical model of Spotlights.

The quality of the sampling is measured from two aspects, namely the uniformity of the ray density for surfels at different positions and orientations within the bounding sphere when 1) there is no occlusion, or 2) there is occlusion. The first measure analyses how much the sampling density of the object surface depends on its location. The second measure provides a means to analyse the impact of occlusion that typically occurs in practical scenarios. To simplify the theoretical analysis, we make the following assumptions:

1. The object is bounded by a unit sphere, that is, R = 1.
2. An arbitrary surfel dS on the object can be parameterized by the distance r of the surfel to the center of the sphere and the orientation α of the surfel, where α is defined as the angle between the surface normal and the vector connecting the centers of the sphere and the surfel (see Fig. 2). The two parameters are uniformly distributed in their respective ranges, i.e. r ∼ U(0, 1) and α ∼ U(0, π).
3. There are N ray sources uniformly distributed on the sphere and each of them emits M rays within a certain solid angle Ω oriented towards the center of the sphere. For convenience, we set Ω = 2π. Note that we also assume N, M ≫ 1.

The source density ρ_s and the ray density ρ_r are then given by

ρ_s = N / (4πR²) = N / (4π),   and   ρ_r = M / (2π).

In order to calculate the ray density of dS, we take another surfel dA on the sphere, calculate the ray density of dS caused by dA, and then do the integration over dA. The area of the surfel dA with polar angle θ and azimuth angle φ is dA = sin θ dθ dφ. Using trigonometric identities, we can calculate the distance between dS and dA:

d² = 1 + r² − 2r (cos α cos θ − sin α sin θ cos φ).

The rays that can be received by the surfel dS from dA are the solid angle occupied by the view of dS at dA multiplied by the ray density ρ_r. Then, the ray density ρ of the surfel dS is the integration over dA, i.e.

ρ(r, α) = ∫ dA · ρ_s · ρ_r · (1/dS) · (dS cos β / d²)
        = (N M / (8π²)) ∫∫ [sin θ (cos θ − r cos α)] / (r² − 2r(cos α cos θ − sin α sin θ cos φ) + 1)^{3/2} dθ dφ.

The ray density ρ as a function of r and α is shown in Fig. 3. We can observe that ρ has no divergence and is a fairly smooth function of r and α when the surfel dS is not too close to the sphere surface. Quantitatively, when 0 ≤ r ≤ 0.8, ρ takes values between 0.039 and 0.057. The proposed sampling strategy therefore ensures that the object surfaces at arbitrary positions and orientations in the sphere have comparable sampling density.
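The double integral has no simple closed form, so a brute-force numerical evaluation is a convenient sanity check. The snippet below is our own illustration; it assumes N = M = 1 (the NM/(8π²) prefactor only scales the result) and that only source patches facing the surfel (cos β ≥ 0) contribute, under which it lands in the 0.039–0.057 range quoted above.

```python
import numpy as np

def ray_density(r, alpha, n_theta=400, n_phi=400):
    """Numerically evaluate rho(r, alpha) for N = M = 1.

    The numerator is clamped at zero so that only source patches that
    actually face the surfel contribute (an assumption on how the
    integral in the text is meant to be read).
    """
    theta = np.linspace(0.0, np.pi, n_theta)
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi)
    T, P = np.meshgrid(theta, phi, indexing="ij")
    num = np.sin(T) * np.clip(np.cos(T) - r * np.cos(alpha), 0.0, None)
    den = (r * r - 2.0 * r * (np.cos(alpha) * np.cos(T)
                              - np.sin(alpha) * np.sin(T) * np.cos(P)) + 1.0) ** 1.5
    dtheta, dphi = theta[1] - theta[0], phi[1] - phi[0]
    return (num / den).sum() * dtheta * dphi / (8.0 * np.pi ** 2)

print(ray_density(0.0, 0.0))        # ~0.0398, inside the quoted 0.039-0.057 range
print(ray_density(0.8, np.pi / 2))  # still of the same order of magnitude
```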

Fig. 3. Ray density of surfel dS.

Next, we consider the case where certain directions seen from dS are occluded, therefore no rays can be received from the related sources on the surface of the sphere. The blocked direction is defined by the polar and azimuth angles (β, γ) (see Fig. 2). We assume that the blocking is random and uniform, i.e., β ∼ U(0, π) and γ ∼ U(0, 2π). The unit-area ray density ρ as a function of (r, α, β, γ) is then given by

ρ(r, α, β, γ) = (1/dS) · dA · ρ_s · ρ_r · (dS cos β / d²)
             = (N M cos β) / (8π² (ξ r² + 1)^{1/2}),   ξ = (sin α sin β cos γ − cos α cos β)² − 1.

As can be observed, ρ is a continuous function of β and γ, which ensures that when part of the directions are occluded, there are still rays that can hit dS from other angles, resulting in better sampling coverage. A simplified version of our point cloud sampling strategy only consists of casting a single ray towards the center of the sphere from each viewpoint. However, it is intuitively clear that this would limit the local inclination of rays sampling dS to a single direction, and thereby lead to an increased likelihood of occlusions. Note that in practice the solid angle covered by each bundle of viewing rays should be chosen smaller such that more rays intersect with the object surface, and fewer rays are wasted. We experiment with different solid angles and find a balance between completeness and hit ratio in Sect. 5.3.

4 Shape Completion with Spotlights

Next we introduce our 3D shape completion network based on the Spotlights representation, named Spotlights Array Network (SA-Net). By leveraging the Spotlights model, we reformulate 3D shape generation as a 1D array regression problem. The ground truth depth values d_gt satisfy

P_gt = P_1 + r · d_gt,   with   r = (P_2 − P_1) / ‖P_2 − P_1‖,

where P_gt, P_1 and P_2 are the ground truth point cloud, the primary sampling points and the secondary sampling points, respectively. The ray directions r are computed from the primary and secondary sampling points as discussed in Sect. 3.2; in fact, the Spotlights model is fully determined by P_1, P_2 and r. To speed up convergence, we normalize the ground truth depths into the range [0, 1] using a factor 2R, where R is the radius of the bounding sphere. Missing rays in the ground truth are masked with a zero value. Note that the existence of missing rays requires post-processing of the predicted array to filter out outliers in the recovered point cloud. This is achieved by simply clipping the array using a pre-defined threshold, which we empirically set to 0.2. The pipeline of SA-Net is shown in Fig. 4. It consists of a point-based encoder and a decoder purely made up of multi-layer perceptrons (MLPs). Similar to previous work, we feed the partial observation into the encoder to obtain a 1024-dim latent vector. The difference is that we regress an array of scalars instead of 3D coordinates, which significantly reduces the computational cost. We compute an L1 training loss between the predicted values and the ground truth. At inference time, a complete 3D point cloud of the object is recovered from the predicted 1D scalars using the Spotlights sampling model, that is, P_pred = P_1 + r · d_pred. In addition, the 3D points in P_pred are ordered due to the structure of the Spotlights model.
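The following sketch illustrates, under our own naming, how the ground-truth depth array and the recovered point cloud relate to the Spotlights model described above: depths are normalised by 2R, missing rays are stored as zero, and predictions at or below the 0.2 threshold are clipped away before recovery. It is an illustration of the equations, not the authors' implementation, and the NaN convention for misses is a simplifying assumption.

```python
import numpy as np

def depth_targets(gt_points, origins, radius):
    """Ground-truth 1D array d_gt, normalised by 2R, with zeros for missed rays.
    `gt_points` is assumed to be ordered ray-by-ray, with NaN rows for misses."""
    d = np.linalg.norm(gt_points - origins, axis=1) / (2.0 * radius)
    return np.nan_to_num(d, nan=0.0)

def depths_to_points(depths, origins, directions, radius, clip_thresh=0.2):
    """Recover P_pred = P_1 + r * d_pred, discarding predictions below the clip threshold."""
    valid = depths > clip_thresh
    t = depths[valid] * 2.0 * radius                  # undo the 2R normalisation
    return origins[valid] + t[:, None] * directions[valid]

def l1_loss(pred, target):
    """L1 training loss between predicted and ground-truth depth arrays."""
    return np.abs(pred - target).mean()
```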


Fig. 4. Architecture of SA-Net. The network takes incomplete point cloud observations as input and predicts a 1D array, from which the complete point cloud is recovered by an inverse application of the Spotlights Model. Note that the PCN backbone can be replaced with other point-based encoders.
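Since the architecture is only described at a high level (a point-based encoder producing a 1024-dim latent vector, followed by MLPs that regress 2048 scalars), the toy PyTorch skeleton below is merely a plausible reading of that description; the hidden-layer sizes and the final sigmoid are our assumptions, and the PCN backbone is replaced by a generic PointNet-style encoder.

```python
import torch
import torch.nn as nn

class SANetSketch(nn.Module):
    """Toy SA-Net-like model: partial point cloud (B, N, 3) -> 2048 depths in [0, 1]."""

    def __init__(self, n_rays=2048):
        super().__init__()
        # Shared per-point MLP followed by a global max-pool (PointNet/PCN style).
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 1024),
        )
        # Decoder made purely of MLPs, regressing one scalar per ray.
        self.head = nn.Sequential(
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, n_rays), nn.Sigmoid(),
        )

    def forward(self, points):
        feat = self.point_mlp(points).max(dim=1).values   # (B, 1024) latent vector
        return self.head(feat)                            # (B, 2048) normalised depths

model = SANetSketch()
pred = model(torch.rand(4, 256, 3))        # four partial scans of 256 points each
loss = nn.L1Loss()(pred, torch.rand(4, 2048))
```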

5 Experiments

We first describe how to build a synthetic dataset using Spotlights. Next, we perform 3D shape completion on both the newly made dataset and KITTI-360 [20], and compare our results against existing state-of-the-art methods. Moreover, we provide ablation studies on ShapeNet [3] to figure out the importance of the hyper-parameters involved in the Spotlights model.

5.1 Shape Completion on Synthetic Dataset

Data Generation. We export 79 different vehicle assets of 3 categories from CARLA [8] and randomly select 64 vehicles for training, 8 vehicles for validation and the rest for testing. We use Blender to simulate a HDL-64 lidar sensor to further generate 2000 partial observations for each object and downsample those point clouds to 256 points. As for the ground truth, we uniformly sample 2048 points from the mesh surface for other methods while using Spotlights to get a 1D array containing 2048 values for SA-Net. Specifically, we evenly sample 32 primary points and 64 secondary points with 60◦ opening angle to construct the Spotlights model. For a fair comparison, we also densely sample 16384 points for each object to calculate the evaluation metrics introduced in the following. Examples of point clouds generated using different sampling methods are shown in Fig. 5. Despite the sparsity of the point cloud sampled by Spotlights, it still captures the overall shape of the target object. Besides, increasing the number of rays can notably improve the point density.
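A small preprocessing sketch makes the input side concrete: partial scans are downsampled to a fixed number of points and normalised into the unit sphere, with the bounding-sphere radius as the scaling factor (the same normalisation is applied to the real scans in Sect. 5.2). The random selection and the axis-aligned bounding box below are our own assumptions, since the exact procedure is not spelled out here.

```python
import numpy as np

def preprocess_partial_scan(points, n_out=256):
    """Randomly downsample a partial scan and normalise it into the unit sphere."""
    idx = np.random.choice(len(points), n_out, replace=len(points) < n_out)
    pts = points[idx]
    center = 0.5 * (pts.min(axis=0) + pts.max(axis=0))     # bounding-box centre
    radius = np.linalg.norm(pts - center, axis=1).max()    # bounding-sphere radius
    return (pts - center) / radius
```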


Fig. 5. Examples of point clouds generated using different sampling methods. The first column shows the point clouds uniformly sampled from the mesh surface. The second and the third columns show the point clouds sampled by our proposed Spotlights model using different numbers of rays.

Comparison Methods. We compare all methods on the Completion3D benchmark [37], an evaluation platform designed for 3D point cloud completion, and replace the original dataset with our generated synthetic one outlined above. In our setting, TopNet [37], PCN [44], FoldingNet [43] and AtlasNet [12] directly predict a complete point cloud containing 2048 points, whereas our SA-Net outputs a 1D array with 2048 values from which we further decode the sparse point cloud. According to the analysis in [44], we use the PCN encoder for all methods.

Evaluation Metrics. The Chamfer Distance (CD) introduced by [9] is one of the most widely used metrics to measure the discrepancy between a predicted and the ground truth point cloud. Suppose the two point clouds are denoted as P_1, P_2 ⊆ R³. The CD is given by

d_CD(P_1, P_2) = (1/|P_1|) Σ_{x∈P_1} min_{y∈P_2} ‖x − y‖ + (1/|P_2|) Σ_{y∈P_2} min_{x∈P_1} ‖x − y‖,   (1)

and it computes the average nearest neighbor distance between P_1 and P_2. Here we use the CD in a single direction (i.e. we only employ one of the terms in (1)) to calculate the distance from the prediction to the ground truth. As in [32], this metric indicates the accuracy of each of the predicted points with respect to the ground truth point cloud. To prevent large distances caused by too sparse reference point clouds, we furthermore choose dense ground truth point clouds with 16384 points. In addition, the L1-norm is used to improve robustness against outliers.

Performance. The quantitative results of SA-Net compared with alternative methods [12,37,43,44] are shown in Table 1. As can be observed, our SA-Net achieves competitive accuracy while being faster and using less memory. Note that SA-Net performs better after applying an additional clip operation for statistical outlier removal. The qualitative results are shown in Fig. 6. The sparsity of our predicted point clouds is caused by missing rays.
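For reference, the accuracy metric of Eq. (1) can be written in a few lines of NumPy; the brute-force pairwise distance matrix below is our own illustration and is adequate for point sets of a few thousand points.

```python
import numpy as np

def chamfer_single_direction(pred, gt):
    """Average nearest-neighbour distance from `pred` to `gt`
    (the first term of Eq. (1)), i.e. the accuracy reported in Table 1."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (|P1|, |P2|)
    return d.min(axis=1).mean()

def chamfer(p1, p2):
    """Symmetric Chamfer Distance of Eq. (1)."""
    return chamfer_single_direction(p1, p2) + chamfer_single_direction(p2, p1)
```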


Table 1. Accuracy and efficiency of comparisons on the synthetic dataset. Accuracy indicates the quality of completed point clouds, which is measured by the single direction CD. Efficiency is evaluated in terms of inference time and the number of network parameters.

Method          | Hatchback | Sedan   | SUV     | Average | #Params | Runtime
AtlasNet        | 0.06586   | 0.07325 | 0.06051 | 0.06462 | 8.3M    | 14.1 ms
PCN             | 0.02528   | 0.02842 | 0.02221 | 0.02428 | 7.6M    | 34.1 ms
FoldingNet      | 0.02437   | 0.02701 | 0.02166 | 0.02345 | 8.5M    | 34.6 ms
TopNet          | 0.02396   | 0.02631 | 0.02051 | 0.02252 | 7.5M    | 37.9 ms
Ours (w/o clip) | 0.02356   | 0.03095 | 0.02171 | 0.02442 | 0.9M    | 0.80 ms
Ours (w/ clip)  | 0.02124   | 0.02775 | 0.01850 | 0.02133 | 0.9M    | 1.05 ms
Fig. 6. Qualitative Results of shape completion on the synthetic dataset.

5.2

Shape Completion on Real Dataset

We transfer all models trained on our synthetic dataset to KITTI-360 [20], a largescale outdoor dataset, to further evaluate generalization ability. Specifically, we take real-world LiDAR scans and segment out points within the bounding boxes of each frame. This preprocessing generates a total of 4382 partial observations spreading over the classes of car, truck, caravan, and trailer. All point clouds are represented in object coordinates and normalized into a unit sphere using the bounding box parameters. Since there is no ground truth point clouds on KITTI360, we adopt the consistency metric introduced in [44]. It averages the CD results over multiple predictions of the same object in different frames. The results are listed in Table 2. It is obvious that our SA-Net generates more consistent point clouds, indicating that it can better deal with large variations in the input point clouds. The qualitative results on real scans are shown in Fig. 7. 5.3

Ablation Study

In this subsection, we further evaluate the influence of the number of rays, polar angles used in Spotlights. The controlled experiments are conducted on a subset of the ShapeNet dataset [3] containing 6 different categories. Here, we uniformly sample 16384 points on the mesh surface of each object as our ground truth. In addition, the impact of different sampling resolutions is measured through shape completion on our synthetic dataset.

480

J. Wei et al.

Table 2. Consistency of comparisons on KITTI-360. We report the average CD of completed point clouds of the same target to evaluate the consistency of predictions. Model

Consistency Car Truck

Caravan Trailer Average

AtalsNet PCN FoldingNet TopNet

0.0649 0.0493 0.0514 0.0513

0.0651 0.0644 0.0629 0.0648

Ours

0.0410 0.0576 0.0613

0.0674 0.0612 0.0611 0.0644

0.0710 0.0580 0.0595 0.0593

0.0671 0.0582 0.0587 0.0599

0.0498 0.0524

Fig. 7. Qualitative results of point cloud completion on real scans in KITTI-360. The gray points are from raw scans. The purple points and the green points are partial inputs and completed results, respectively. (Color figure online)

Completeness of Sampled 3D Point Clouds. Following [32], we measure the completeness of the 3D point cloud sampled from Spotlights, which indicates how much of the object surface is covered by sample points. The metric again relies on the single-direction CD, however this time iterating through the ground truth point cloud finding the nearest point in the Spotlights representation. For Spotlights, completeness is closely related to both the number of rays and the polar angle. Hence, we vary the number of rays from 2048 to 8192 and test 3 different polar angles under each setting. Note that the maximum polar angle in our case is about 83 degrees (r = 14 R). The average score on each shape category is reported in Table 3. The results coincide with the intuition that more rays usually produce more points on the surface, whereas polar angles that are too large or too small can lead to degraded point cloud quality since most of the rays will either scatter into the void or concentrate on small local regions.

Spotlights: Probing Shapes from Spherical Viewpoints

481

Table 3. Completeness of point clouds sampled from Spotlights. We report the average distance from the dense ground truth to the sampled point cloud with varying numbers of rays and polar angles. #Rays Polar angle Completeness 2048

4096

8192

Bathtub Bottle

Cabinet Chair

Lamp

Sofa

Average

30◦

0.0474

0.0547

0.0678

0.0444

0.0426

0.0619

0.0506

60◦

0.0349

0.0380

0.0450

0.0339

0.0352

0.0447

0.0376

83◦

0.0382

0.0397

0.0457

0.0389

0.0407

0.0469

0.0415

30◦

0.0326

0.0367

0.0476

0.0302

0.0298

0.0450

0.0353

60◦

0.0260

0.0278

0.0349

0.0253

0.0263

0.0346

0.0285

83◦

0.0290

0.0297

0.0361

0.0293

0.0312

0.0367

0.0318

30◦

0.0233

0.0259

0.0345

0.0216

0.0212

0.0331

0.0255

60◦

0.0197

0.0208 0.0279

0.0192 0.0200 0.0274 0.0220

83◦

0.0222

0.0225

0.0223

0.0294

0.0244

0.0295

0.0248

Table 4. Hit ratio of rays with different polar angles. It evaluates how many rays actually hit the object. Polar angle Hit ratio (%) Bathtub Bottle Cabinet Chair Lamp Sofa Average 30◦

93.8

81.6

93.9

79.1

47.2

85.0 78.6

60◦

63.4

45.2

67.0

49.1

31.5

57.1 50.9

43.2

29.9

47.7

32.6

22.6

39.8 34.8



83

Hit Ratio of Virtual Rays. In most cases, objects cannot fill the entire space bounded by the outer sphere of Spotlights, which results in wasted rays that have no intersection with the object. To measure the utilization of rays, we compute the hit ratio defined as the number of surface intersecting rays over the total number of casted rays. As shown in Table 4, the hit ratio is inversely proportional to the polar angle and much lower for thin objects such as lamps. Although small polar angles waste fewer rays, we strongly suggest not to use those values as completeness also needs to be considered. Sampling Resolution. As discussed in Sect. 3.2, we sample twice in Spotlights model to better control the ray directions. The total number of rays is then the multiplication of the number of primary and secondary sampling points. To evaluate the performance of Spotlights model with different sampling resolutions, we try several resolution combinations on the task of shape completion and report the accuracy as in Sect. 5.1 in Table 5. By comparing the first and the second row, we can find that changing the sampling resolutions has no big difference on performance as long as the total number of rays are the same. Also, the third row shows that adding more rays can increase the accuracy, which confirms our intuition.

482

J. Wei et al.

Table 5. Accuracy of completed point clouds with different sampling resolutions on the synthetic dataset. For A*B, A and B are the number of primary and secondary sampling points, respectively. Accuracy measures the single direction CD from the prediction to ground truth.

5.4

Resolution Accuracy Hatchback Sedan

SUV

Average

32*64

0.02775

0.01850

0.02133

0.01907

0.02179

0.02124

64*32

0.02219

0.02771

256*64

0.02058

0.02700 0.01839 0.02098

Limitations

Although Spotlights representation is compact compared to 3D coordinate-based point cloud, it still has several limitations. First, the ray bundle needs to be carefully organized to reach the balance between efficiency and accuracy. Since the rays spread across the space bounded by the sphere, around 50% to 60% of the rays fail to actually hit the object surface, which is a big waste of model capacity and also results in the sparsity of sampled point clouds. Second, adjacent rays do not assure intersections in a local surface, which makes it hard to learn depth variation along rays, especially for complex concave objects. To address these issues, in future extensions, we will try a differentiable variant of our Spotlights model to actively learn the orientation of each ray and dynamically adjust it based on the observations.

6

Conclusions

In this paper, we propose a novel compact point cloud representation denoted Spotlights, which is useful for applications demanding for high efficiency. In contrast to prior art, the structure in our representation is not imposed in an implicit way using complex network architectures, but in an explicit way through a handcrafted object surface sampling model, which is highly inspired by the spherical view arrangement in 3D object scanners. Also, no strong assumptions about the enclosed objects are being made, and a relatively homogeneous sampling distribution is achieved. We demonstrate its potential in the context of shape completion where experiments on both synthetic and real dataset show that it can achieve competitive accuracy and consistency while being orders of magnitude faster. The fact that we directly predict depths along rays suggests that our representation has the ability to predict any shape of similar complexity. In future work, we will try to minimize the invalid rays in Spotlights model and figure out a better way to learn the depth variation along the ray.

Spotlights: Probing Shapes from Spherical Viewpoints

483

References 1. Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.J.: Learning representations and generative models for 3D point clouds. In: Proceedings of the IEEE International Conference on Machine Learning (ICML), pp. 40–49 (2018) 2. Campos, C., Elvira, R., Rodr´ıguez, J.J.G., Montiel, J.M., Tard´ os, J.D.: ORBSLAM3: an accurate open-source library for visual, visual-inertial and multi-map slam. IEEE Trans. Rob. (T-RO) 37(6), 1874–1890 (2021) 3. Chang, A.X., et al.: Shapenet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015) 4. Chen, D.Y., Tian, X.P., Shen, Y.T., Ouhyoung, M.: On visual similarity based 3d model retrieval. In: Computer Graphics Forum, vol. 22, pp. 223–232. Wiley Online Library (2003) 5. Dai, A., Qi, C.R., Nießner, M.: Shape completion using 3D-encoder-predictor cnns and shape synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5868–5877 (2017) 6. Davis, J., Marschner, S.R., Garr, M., Levoy, M.: Filling holes in complex surfaces using volumetric diffusion. In: In Proceedings of FIrst International Symposium on 3D Data Processing Visualization and Transmission (2002) 7. Deng, J., Shi, S., Li, P., Zhou, W., Zhang, Y., Li, H.: Voxel r-cnn: towards high performance voxel-based 3d object detection. arXiv preprint arXiv:2012.15712 1(2), 4 (2020) 8. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: an open urban driving simulator. In: Proceedings of the 1st Annual Conference on Robot Learning, pp. 1–16 (2017) 9. Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 605–613 (2017) 10. Furukawa, Y., Ponce, J.: Accurate, dense, and robust multi-view stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 32, 1362–1376 (2010) 11. Garsthagen, R.: An open source, low-cost, multi camera full-body 3D scanner. In: Proceedings of 5th International Conference on 3D Body Scanning Technologies, pp. 174–183 (2014) 12. Groueix, T., Fisher, M., Kim, V., Russell, B., Aubry, M.: AtlasNet: a papier-mache approach to learning 3d surface generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 13. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2003) 14. Huang, Z., Yu, Y., Xu, J., Ni, F., Le, X.: PF-Net: point fractal network for 3D point cloud completion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7662–7670 (2020) 15. Keinert, B., Innmann, M., S¨ anger, M., Stamminger, M.: Spherical fibonacci mapping. ACM Trans. Graph. (TOG) 34(6), 1–7 (2015) 16. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019) 17. Li, P., Qin, T., Shen, S.: Stereo vision-based semantic 3D object and ego-motion tracking for autonomous driving. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 646–661 (2018) 18. Li, R., Li, X., Hui, K.H., Fu, C.W.: Sp-gan: sphere-guided 3D shape generation and manipulation. ACM Trans. Graph. 40(4) (2021)

484

J. Wei et al.

19. Li, Y., Dai, A., Guibas, L., Nießner, M.: Database-assisted object retrieval for real-time 3d reconstruction. Comput. Graph. Forum 34, 435–446 (2015) 20. Liao, Y., Xie, J., Geiger, A.: KITTI-360: a novel dataset and benchmarks for urban scene understanding in 2D and 3D. arXiv preprint arXiv:2109.13410 (2021) 21. McCormac, J., Clark, R., Bloesch, M., Davison, A., Leutenegger, S.: Fusion++: volumetric object-level slam. In: Proceedings of the International Conference on 3D Vision (3DV), pp. 32–41 (2018) 22. Newcombe, R., et al.: KinectFusion: real-time dense surface mapping and tracking. In: Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR) (2011) 23. Newcombe, R., Lovegrove, S., Davison, A.: DTAM: dense tracking and mapping in real-time. In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 2320–2327 (2011) 24. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 25. Pauly, M., Mitra, N.J., Giesen, J., Gross, M.H., Guibas, L.J.: Example-based 3D scan completion (EPFL-CONF-149337), pp. 23–32 (2005) 26. Pesce, M., Galantucci, L., Percoco, G., Lavecchia, F.: A low-cost multi camera 3D scanning system for quality measurement of non-static subjects. Procedia CIRP 28, 88–93 (2015) 27. Qi, C.R., Chen, X., Litany, O., Guibas, L.J.: Imvotenet: boosting 3D object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4404–4413 (2020) 28. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9277–9286 (2019) 29. Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3D object detection from rgb-d data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 918–927 (2018) 30. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 652–660 (2017) 31. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 30, 1–10 (2017) 32. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 1, pp. 519–528. IEEE (2006) 33. Shan, T., Englot, B., Meyers, D.: LIO-SAM: tightly-coupled lidar inertial odometry via smoothing and mapping. In: Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS) (2020) 34. Shin, D., Fowlkes, C.C., Hoiem, D.: Pixels, voxels, and views: a study of shape representations for single view 3D object shape prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 35. Shin, D., Fowlkes, C.C., Hoiem, D.: Pixels, voxels, and views: a study of shape representations for single view 3D object shape prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3061–3069 (2018)

Spotlights: Probing Shapes from Spherical Viewpoints

485

36. Sucar, E., Wada, K., Davison, A.: NodeSLAM: neural object descriptors for multiview shape reconstruction. In: Proceedings of the International Conference on 3D Vision (3DV) (2020) 37. Tchapmi, L.P., Kosaraju, V., Rezatofighi, H., Reid, I., Savarese, S.: Topnet: structural point cloud decoder. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 383–392 (2019) 38. Thrun, S., Wegbreit, B.: Shape from symmetry. In: Proceedings of the International Conference on Computer Vision (ICCV) (2005) 39. Wen, X., Li, T., Han, Z., Liu, Y.S.: Point cloud completion by skip-attention network with hierarchical folding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 40. Wen, X., et al.: PMP-Net: point cloud completion by learning multi-step point moving paths. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2021) 41. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 42. Xie, H., Yao, H., Zhou, S., Mao, J., Zhang, S., Sun, W.: GRNet: gridding residual network for dense point cloud completion. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020) 43. Yang, Y., Feng, C., Shen, Y., Tian, D.: Foldingnet: point cloud auto-encoder via deep grid deformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 206–215 (2018) 44. Yuan, W., Khot, T., Held, D., Mertz, C., Hebert, M.: PCN: point completion network. In: Proceedings of the International Conference on 3D Vision (3DV), pp. 728–737. IEEE (2018)

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition Wenju Wang , Gang Chen(B) , Haoran Zhou , and Xiaolin Wang University of Shanghai for Science and Technology, Shanghai 200093, China [email protected] Abstract. The current methods for multi-view-based 3D object recognition have the problem of losing the correlation between views and rendering 3D objects with multi-view redundancy. This makes it difficult to improve recognition performance and unnecessarily increases the computational cost and running time of the network. Especially in the case of limited computing resources, the recognition performance is further affected. Our study developed an optimal viewset pooling transformer (OVPT) method for efficient and accurate 3D object recognition. The OVPT method constructs the optimal viewset based on information entropy to reduce the redundancy of the multi-view scheme. We used convolutional neural network (CNN) to extract the multi-view lowlevel local features of the optimal viewset. Embedding class token into the headers of multi-view low-level local features and splicing with position encoding generates local-view token sequences. This sequence was trained parallel with a pooling transformer to generate a local view information token sequence. At the same time, the global class token captured the global feature information of the local view token sequence. The two were aggregated next into a single compact 3D global feature descriptor. On two public benchmarks, ModelNet10 and ModelNet40, for each 3D object we only need a smaller number of optimal viewsets, achieving an overall recognition accuracy (OA) of 99.33% and 97.48%, respectively. Compared with other deep learning methods, our method still achieves state-of-the-art performance with limited computational resources. Our source code is available at https://github.com/shepherds001/OVPT.

1

Introduction

With the rapid development of 3D acquisition technology, 3D scanners, depth scanners, and 3D cameras have become popular and inexpensive. The acquisition of 3D data such as point clouds and meshes has become more convenient and accurate [1]. These factors have promoted the widespread application of 3D data-based object recognition techniques in the fields of environment perception for autonomous driving [2], object recognition for robots [3], and scene understanding for augmented reality [4]. Therefore, 3D object recognition has become a hotspot for current research (Fig. 1). Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-26319-4 29. c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023  L. Wang et al. (Eds.): ACCV 2022, LNCS 13841, pp. 486–503, 2023. https://doi.org/10.1007/978-3-031-26319-4_29

OVPT for 3D Object Recognition

487

Fig. 1. Comparison with the state-of-the-art approaches in terms of recognition performance, and the number of views required per 3D object.

Deep learning-based methods have become mainstream research techniques for 3D object recognition tasks. In general, these methods can be divided according to the type of data input to the deep neural network: voxel-based methods [5–12], point-cloud-based methods [13–20], and multi-view-based methods [21– 30]. Multi-view-based methods render 3D data objects into multiple 2D views, so they no longer need to rely on complex 3D features. Instead, the rendered multi-views are fed into multiple well-established image classification networks to extract multi-view low-level local features. Finally, the multi-view low-level local features are aggregated into global descriptors to complete the 3D object recognition task. Especially when 3D objects are occluded, such methods can complement each other’s detailed features of 3D objects according to views from different perspectives. Compared with voxel-based and point cloud-based methods, the multi-view-based method achieves the best 3D object recognition performance. However, this type of method still has the shortcomings that feature information cannot be extracted for all views simultaneously during training and that related feature information between multiple views cannot be efficiently captured. In addition, there is redundancy when rendering 3D objects into multiple views. The relevant feature information between multiple views is indispensable for aggregating the multi-view local features into a compact global descriptor. The omission of these relevant features is why it is difficult to improve the recognition accuracy of this type of method. The view redundancy problem increases unnecessary network running time and affects final recognition accuracy. This paper researches this and proposes the optimal viewset pooling transformer (OVPT) method. Our main contributions are summarized as follows: – The method to construct the optimal viewset based on information entropy solves the view redundancy problem of the multi-viewpoint rendering method [24], which reduces computational cost of the network.

488

W. Wang et al.

– The proposed multi-view low-level local feature token sequence generation method introduces transformers into 3D object recognition tasks. This method combining transformers and CNNs is able to process all views and capture relevant features among views while maintaining strong inductive bias capability. – The pooling transformer-based global descriptor method can improve the insufficient local feature aggregation ability for insufficient transformer training with small dataset. This method is able to aggregate multi-view lowlevel features coming from local and global respectively into a compact global descriptor. – We conducted extensive experiments on ModelNet10 and ModelNet40 datasets to verify the performance our OVPT method. This OVPT method can achieve respectively 99.33% and 97.48% of overall recognition accuracy (OA) in two datasets, only requiring a smaller number of optimal viewsets. Compared with other state-of-the-art methods, our OVPT network achieves the best performance.

2

Related Work

Voxel-Based Methods. VoxNet [5] uses 3D CNN to extract voxelized 3D object features and processes non-overlapping voxels through max pooling. However, it cannot compactly represent the structure of 3D objects. Therefore, Kdnetwork [9] was proposed. It creates a structural graph of 3D objects based on a Kd-tree structure, and computes a sequence of hierarchical representations in a feed-forward bottom-up fashion. Its network structure occupies less memory and is more computationally efficient, but loses information about local geometry. These voxel-based methods solve the problems of large memory footprint and long training time of point cloud voxelization, but still suffer from the problems of lost information and high computational cost. Point Cloud-Based Methods. Point cloud voxelization inevitably loses information that may be essential. Some methods consider processing point clouds directly. PointNet [13] was the earliest method to process point clouds directly. It uses T-Net to perform an affine transformation on the input point matrix, and extract each point feature through a multi-layer perceptron. But it could not capture the local neighborhood information between points. Thus, PointNet++ [14] was developed, which constructs local neighborhood subsets by introducing a hierarchical neural network, and then extracted local neighborhood features based on PointNet. PointNet++ solved the local neighborhood information extraction of PointNet to a certain extent. Dynamic Graph CNN (DGCNN) [19] uses EdgeConv to build a dynamic graph convolutional neural network for object recognition. EdgeConv could extract local domain feature information, and the local shape features of the extracted point cloud could keep the arrangement invariance. Although the point cloud-based method can directly process the point cloud to reduce the loss of information, its network is often complex, the training time is long, and the final recognition accuracy is not high enough.


Multi-view-Based Methods. Multi-view-based approaches render 3D objects into multiple 2D views, so they no longer need to rely on complex 3D features. This class of methods achieves the best 3D object recognition performance. Multi-view Convolutional Neural Networks (MVCNN) [21] employ a 2D CNN to process each rendered view individually. MVCNN then uses view pooling to combine information from multiple views into a compact shape descriptor. However, it loses the position information of the views when pooling multiple views. Group-view convolutional neural network (GVCNN) [23] mines intra-group similarity and inter-group distinguishability between views by grouping multi-view features to enhance the capture of location information. Hierarchical Multi-View Context Modeling (HMVCM) [30] adopts adaptive calculation of feature weights to aggregate features into compact 3D object descriptors. This hierarchical multi-view context modeling uses a module that combines a CNN and a Bidirectional Long Short-Term Memory (Bi-LSTM) network to learn the visual context features of a single view and its neighborhood, and reaches an overall recognition accuracy (OA) of 94.57% on the ModelNet40 dataset. However, it does not consider the local features of all views in parallel during training and loses relevant information between views, so the aggregated global descriptors are not sufficiently compact, and its 3D object recognition accuracy has room for improvement.

3 Methods

The OVPT network proposed in this paper is shown in Fig. 2. OVPT has three parts: (a) optimal viewset construction based on information entropy; (b) multi-view low-level local feature token sequence generation; (c) global descriptor generation based on the pooling transformer.

3.1 Optimal Viewset Construction Based on Information Entropy

The input stage of the OVPT network renders 3D objects (represented as point clouds or meshes) into multiple 2D views. This study selected the mesh representation, which gives higher accuracy for 3D object recognition. Of course, 3D objects in the form of point clouds can also be reconstructed into mesh form [31–33]. The specific process is as follows:
Multi-view Acquisition. For a 3D object O, different 2D rendering views V = {v1, ..., vi, ..., vN} can be obtained by setting the camera at different positions, where vi represents the view taken from the i-th viewpoint, vi ∈ R^(C×H×W).

[v1, ..., vi, ..., vN] = Render(O)        (1)

We use the dodecahedron camera viewpoint setting [24]. It places the 3D object in the center of the dodecahedron then sets the camera viewpoint at the


Fig. 2. The architecture of the proposed optimal viewset pooling transformer (OVPT).

vertices of the dodecahedron. This setup evenly distributes the camera viewpoints in 3D space, capturing as much global spatial information as possible and reducing information loss. However, among the 20 views rendered in this way, there are always duplicates. This may lead to redundant features being extracted by the deep neural network and increase the running time of the network, eventually decreasing the recognition accuracy.
Optimal Viewset Construction. Information entropy can highlight the grayscale information of each pixel position in the view and the comprehensive characteristics of the grayscale distribution in the pixel neighborhood. Given the amount of information contained in a view, information entropy can therefore be an effective means to evaluate view quality. Aiming at the problem of repetitive views in the dodecahedron camera viewpoint settings, we introduce the information entropy [34] of 2D views as an evaluation criterion to construct the optimal viewset and reduce redundant views. Different from previous viewpoint selection methods based on information entropy [35,36], our method does not require human intervention and only considers the quality of the view itself, without calculating the projected area of the viewpoint, which is more reliable. The dodecahedron camera viewpoint settings and the optimal viewset are shown in Fig. 3. The optimal viewset is constructed as follows:
(1) Information entropy calculation of the N views (N = 20): Hi represents the information entropy of the i-th view vi. The specific calculation is shown in formulas (2) and (3):

P(a,b) = f(a, b) / (W · H)        (2)

Hi = − Σ_{a=0}^{255} P(a,b) log P(a,b)        (3)

where (a, b) is a binary group, a represents the gray value of the center pixel in a sliding window, and b is the average gray value of the pixels in the window around the center. P(a,b) is the probability that (a, b) appears in the full view vi, and f(a, b) is the number of times the binary group (a, b) appears in the full view vi. W and H represent the width and height of the view.
(2) Rank the information entropy values Hi (i = 1, ..., N, N = 20).
(3) Construct the optimal viewset: the n views with the best information entropy ranking (n < N; n = 6, 9 in this paper) are regarded as the optimal viewset. When n = 1, the single view with the highest information entropy value is selected, which is called the optimal view.

Fig. 3. Dodecahedron camera viewpoint settings [24] and optimal viewset.
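To make the view-scoring step concrete, the following is a minimal NumPy sketch of formulas (2)–(3) and the top-n selection. The helper names are illustrative and not from the OVPT implementation, and whether the centre pixel is excluded from the neighbourhood mean is our assumption.

```python
import numpy as np

def view_entropy(view: np.ndarray, win: int = 3) -> float:
    """Two-dimensional information entropy of a grayscale view (formulas (2)-(3)).

    `view` is an HxW uint8 image; `win` is the neighbourhood window used to
    compute the mean gray value b paired with each pixel's gray value a."""
    h, w = view.shape
    pad = win // 2
    padded = np.pad(view.astype(np.float32), pad, mode="edge")
    # Mean gray value of the neighbourhood of every pixel (the "b" component).
    neigh = np.zeros_like(view, dtype=np.float32)
    for dy in range(win):
        for dx in range(win):
            neigh += padded[dy:dy + h, dx:dx + w]
    neigh = (neigh - view) / (win * win - 1)  # exclude the centre pixel itself
    # Joint histogram of (a, b) pairs -> P_{a,b} = f(a, b) / (W * H).
    hist, _, _ = np.histogram2d(view.ravel(), neigh.ravel(),
                                bins=256, range=[[0, 256], [0, 256]])
    p = hist / (h * w)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_optimal_viewset(views, n: int):
    """Keep the n views with the highest information entropy out of the N=20 rendered views."""
    scores = [view_entropy(v) for v in views]
    order = np.argsort(scores)[::-1]  # rank H_i in descending order
    return [views[i] for i in order[:n]]
```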

3.2 Multi-view Low-level Local Feature Token Sequence Generation

Because the transformer was first proposed for natural language processing, its input requirements are two-dimensional matrix sequences. For the optimal viewset V = {v1 , ...vi ..., vn }, vi ∈ RC×H×W , where n is the number of views, C is the number of channels, H is the height of the image, and W is the width of the image. Therefore, the obtained view cannot be directly input to the transformer, and it needs to be flattened into a local view token sequence X = {x1 , ...xi ..., xn }. xi represents the local view token generated by the i-th view, xi ∈ R1×D , where D is the constant latent vector size used in all transformer layers. To this end, this paper proposes a multi-view low-level local feature token sequence generation method. The specific process is as follows: Low-level Local Feature Extraction. We used multiple CNNs pretrained on ImageNet [37] to extract low-level local features of multi-view V = {v1 , ...vi , ...vn }. Any well-established 2D image classification network can be used as a multi-view low-level feature extractor.


Local View Token Sequence Generation. After the multi-view low-level local features are extracted, a local view token sequence X = {x1, ..., xi, ..., xn} is generated by embedding.
Addition of Initialized Class Token and Position Encoding. After obtaining the local view token sequence X = {x1, ..., xi, ..., xn}, it is also necessary to add an initialized class token xclass [38] at the head of the local view token sequence and to combine the sequence with the position encoding Epos [38], where xclass is a random initial value that matches the dimension of the local view tokens, xclass ∈ R^(1×D), and Epos is used to save the location information of the different viewpoints xi, Epos ∈ R^((n+1)×D). Finally, the multi-view low-level local feature token sequence X0 is generated:

X0 = [xclass; x1, ..., xi, ..., xn] ⊕ Epos,  X0 ∈ R^((n+1)×D)        (4)
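A minimal PyTorch sketch of Eq. (4) is shown below; the module name is illustrative, and we interpret ⊕ as element-wise addition of a learnable position embedding (as in ViT), which is an assumption rather than a detail confirmed by the paper.

```python
import torch
import torch.nn as nn

class TokenSequenceBuilder(nn.Module):
    """Builds X0 = [x_class; x_1, ..., x_n] combined with E_pos (Eq. (4))."""

    def __init__(self, num_views: int, dim: int):
        super().__init__()
        # Randomly initialised class token and learnable position encoding.
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.randn(1, num_views + 1, dim))

    def forward(self, view_tokens: torch.Tensor) -> torch.Tensor:
        # view_tokens: (batch, n, D) -- one token per view from the CNN backbone.
        b = view_tokens.size(0)
        cls = self.cls_token.expand(b, -1, -1)     # (batch, 1, D)
        x0 = torch.cat([cls, view_tokens], dim=1)  # (batch, n+1, D)
        return x0 + self.pos_embed                 # add viewpoint position information
```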

3.3 Global Descriptor Generation Based on the Pooling Transformer

The method uses a pooling transformer to aggregate X0 into one compact 3D global descriptor in two steps:
Global Feature Information Generation Based on Transformer. The generation of the global feature information has three steps: layer normalization [39]; multi-head multi-view attention calculation; and residual connection together with a multi-layer perceptron.
(1) Layer Normalization Processing: X0 is input to the pooling transformer as a sequence of multi-view low-level local feature tokens. Before calculating Multi-Head Multi-View Attention (MHMVA), X0 undergoes Layer Normalization (LN), see formula (5):

X̂0 = LN(X0),  X̂0 ∈ R^((n+1)×D)        (5)

(2) Multi-Head Multi-View Attention Calculation: We use the normalized X̂0 to generate Query, Key, and Value through linear transformations. MHMVA performs multiple parallel Multi-View Attention (MVA) operations. The inputs qi, ki and vi of each MVA are obtained by equally dividing the Query, Key and Value vectors, where qi ∈ R^((n+1)×DQ/N), ki ∈ R^((n+1)×DK/N), vi ∈ R^((n+1)×DV/N), and DQ = DK = DV denote the vector dimensions of Query, Key and Value, respectively. Multiple MVA heads are obtained according to the number of heads N, so that multiple subspaces are formed and MHMVA can take the information of different parts of the input features into account. The calculation result Xi^MVA of each MVA is obtained from formula (6):

Xi^MVA = Softmax(qi ki^T / sqrt(DK/N)) vi,  Xi^MVA ∈ R^((n+1)×DK/N)        (6)


Concat is performed on each Xi^MVA after calculation, and the MHMVA calculation is finally completed after a linear transformation, as shown in formula (7):

X_MHMVA = hΘ(Concat_{i=1}^{N}(Xi^MVA)),  X_MHMVA ∈ R^((n+1)×D)        (7)

where hΘ represents a linear function with dropout.
(3) Residual Connection and the Use of Multi-Layer Perceptrons: The X_MHMVA obtained after the MHMVA calculation uses residual connections [40] to avoid vanishing gradients:

X1 = X_MHMVA + X0,  X1 ∈ R^((n+1)×D)        (8)

After X1 is obtained, it is also processed by layer normalization and input to a multi-layer perceptron (MLP). Because MHMVA alone does not fit the complex mapping sufficiently, an MLP is added after it to enhance the model's generalization. The MLP consists of linear layers that use the GELU [41] activation function, as shown in formula (9):

MLP(x) = GELU(W1 x + b1) W2 + b2        (9)

where W1 and b1 are the weights of the first fully connected layer, W2 and b2 are the weights of the second fully connected layer, and x represents the input feature information. There is also a residual connection between the output of the MLP and X1, and the calculation is given by formula (10):

X̂1 = MLP(LN(X1)) + X1        (10)
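Equations (5)–(10) describe a pre-LN transformer block. The sketch below shows one such block in PyTorch; it substitutes the library's nn.MultiheadAttention for the paper's MHMVA and uses the batch_first API (available in newer PyTorch releases than the 1.6 reported in Sect. 4.2), so it is a simplified stand-in rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class PoolingTransformerBlock(nn.Module):
    """Pre-LN block following Eqs. (5)-(10): LN -> multi-head attention -> residual
    -> LN -> MLP with GELU -> residual."""

    def __init__(self, dim: int, num_heads: int, mlp_ratio: int = 4, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        # x0: (batch, n+1, D) token sequence including the class token.
        h = self.norm1(x0)                    # Eq. (5)
        attn_out, _ = self.attn(h, h, h)      # Eqs. (6)-(7)
        x1 = attn_out + x0                    # Eq. (8)
        return self.mlp(self.norm2(x1)) + x1  # Eqs. (9)-(10)
```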

The final X̂1 is the output of global feature information generation based on the transformer method, where X̂1 ∈ R^((n+1)×D). It consists of a global class token x̂class and the local view information token sequence {x̂1, ..., x̂i, ..., x̂n}, where x̂class ∈ R^(1×D) and x̂i ∈ R^(1×D). The global class token x̂class stores the global feature information of the local view token sequence.
Local View Information Token Sequence Aggregation Based on Pooling. After parallel transformer training, the global class token x̂class saves the global feature information of the local view token sequence, but the single best local view information token may be lost. It is thus beneficial to also aggregate this part of the information into the 3D global descriptor. Our pooling-based local view information token sequence aggregation can solve this problem: it captures the single best local view information token while preserving the global feature information of the local view token sequence. This method pools the local view information token sequence {x̂1, ..., x̂i, ..., x̂n} to obtain the best local view information token, and then splices the best local view information token and the global class token x̂class. After these processes, we can


aggregate the multi-view low-level local feature token sequence both locally and globally, and then generate a more compact 3D global descriptor Y, Y ∈ R^(1×D). The 3D global descriptor is input to the head layer to complete the object recognition task.

Y = max[x̂1, ..., x̂i, ..., x̂n] + x̂class        (11)
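Eq. (11) amounts to an element-wise max over the view tokens added to the class token. A minimal PyTorch sketch (with an illustrative function name) is:

```python
import torch

def aggregate_global_descriptor(tokens: torch.Tensor) -> torch.Tensor:
    """Eq. (11): Y = max(x_hat_1, ..., x_hat_n) + x_hat_class.

    `tokens` is the transformer output of shape (batch, n+1, D); index 0 is the
    global class token, the remaining n entries are local view tokens."""
    cls_token = tokens[:, 0]                    # (batch, D)
    view_tokens = tokens[:, 1:]                 # (batch, n, D)
    best_local = view_tokens.max(dim=1).values  # element-wise max over views
    return best_local + cls_token               # compact 3D global descriptor Y
```

The resulting descriptor Y is then passed to the classification head to produce the class logits.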

4 Experimental Results and Discussion

4.1 Dataset

ModelNet [6] is a widely used 3D object recognition dataset, popular for its diverse categories, clear shapes, and well-constructed models. The two benchmark datasets ModelNet40 and ModelNet10 are its subsets. Among them, ModelNet40 consists of 40 categories (such as airplanes, cars, plants, and lights), with 12,311 CAD models, including 9,843 training samples and 2,468 test samples. ModelNet10 consists of 10 categories, with 4,899 CAD models, including 3,991 training samples and 908 test samples.

4.2 Implementation Details

We conducted extensive comparative experiments using PyCharm on a computer with the Windows 10 operating system. The relevant configuration of this computer is as follows: (1) the Central Processing Unit (CPU) was an Intel(R) Xeon CPU @2.80 GHz, (2) the Graphics Processing Unit (GPU) was an RTX 2080, (3) the Random Access Memory (RAM) was 64.0 GB, and (4) PyTorch 1.6 was used. For all our experiments, the learning rate was initialized to 0.0001, the number of epochs was set to 20, and the batch size was set to 8 by default. On ModelNet10 and ModelNet40, the CNNs are ResNet34 [40] and DenseNet121 [42], respectively. We used the Adam [43] algorithm to optimize the network, together with a learning rate decay and an L2 regularization (weight decay) strategy to avoid overfitting.

4.3 The Influence of the Number of Views

When 3D objects are rendered into multiple 2D views, the number of views has different effects on the object recognition performance of the network. We selected eight different view numbers to quantitatively analyze the recognition accuracy of the OVPT method on ModelNet40 (including 1, 3, 6, 9, 12 and 15 views selected using the optimal viewset construction method based on information entropy, and the batch size is uniformly set to 8).


Fig. 4. Recognition performance with different numbers of views.

The object recognition performance of the OVPT method with different number of views is shown in Fig. 4. We can find that more views are not necessarily conducive to 3D object recognition. That is, the object recognition performance can’t always improve with the increase of the number of views. On the ModelNet40 dataset, when the number of views n 0)

(5)

1(z) is an indicator function (evaluated to 1 when z is true and 0 otherwise); σ′ is the saturating sigmoid function from [12]:

σ′(z) = max(0, min(1, 1.2σ(z) − 0.1))        (6)

where σ is the well-known sigmoid function. Notice that gr is a real-valued number in the range [0, 1] while gb is a binary value; moreover, the gradient of gr is well defined w.r.t. gε, while the gradient of gb w.r.t. gε is 0. Finally, we can define our binarization function B as follows:

B(z) = gb/r(z)        (7)

The term gb/r denotes that in the forward pass we use either gr or gb, and that is eventually the output of the RM. The choice is made at random with a 50% chance for each. In the backward pass, we always use the gradients of gr to backpropagate meaningful gradients (see the footnote below). To allow this dual behavior, we sum the branches' outputs in the following way:

f(x) = (1 − RM(x)) · fL(x) + RM(x) · fR(x)        (8)

where fL, fR are the functions applied to x by the left and right branches. Notice that if RM outputs a binary value, only one of the branches is used, while in the real-valued case the branches' outputs are mixed (i.e., soft routing is applied). During evaluation and inference, we perform the same procedure with 2 changes: first, there is no additive noise (ε = 0); second, the routing module only outputs binary values, so in this phase we only use gb as the output of RM. To conclude, the Improved Semantic Hashing method allows us to train the DecisioNet model in an end-to-end manner and to apply hard decisions at inference time.

In PyTorch, this operation can be performed like this: out = g_b + g_r - g_r.detach().
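Putting the pieces together, a routing module trained with Improved Semantic Hashing could be sketched as below. The class layout, the single linear layer, and the noise standard deviation are our assumptions; only the saturating sigmoid, the 50% real/binary choice, and the detach trick come from the text above.

```python
import torch
import torch.nn as nn

def saturating_sigmoid(z: torch.Tensor) -> torch.Tensor:
    # sigma'(z) = max(0, min(1, 1.2 * sigmoid(z) - 0.1))
    return torch.clamp(1.2 * torch.sigmoid(z) - 0.1, 0.0, 1.0)

class RoutingModule(nn.Module):
    """Scalar routing decision per sample, trained with Improved Semantic Hashing."""

    def __init__(self, in_features: int, noise_std: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.fc(x.flatten(1)).squeeze(1)
        if self.training:
            z = z + self.noise_std * torch.randn_like(z)  # additive noise (training only)
        g_r = saturating_sigmoid(z)                        # real-valued gate in [0, 1]
        g_b = (z > 0).float()                              # binary gate
        if not self.training:
            return g_b                                     # hard routing at inference
        # With 50% probability output the binary value in the forward pass while
        # backpropagating through g_r (the detach trick from the footnote).
        if torch.rand(()) < 0.5:
            return g_b + g_r - g_r.detach()
        return g_r
```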

3.4 Loss Function

DecisioNet is trained with 2 sets of labels - classification labels and routing labels (extracted in the clustering phase). During training, we want our model to classify inputs correctly while passing classes to their desired routes, based on the routing labels. Therefore, we design a loss function that balances these 2 demands. This loss is described as follows:

L = Lcls + β Lσ        (9)

where Lcls is a classification loss (e.g., cross-entropy loss) and Lσ is an MSE loss on the routing labels. β is a hyper-parameter for balancing the classification accuracy and the routing accuracy. We also tried an MSE variant which penalizes routing mistakes with different weights depending on the RM depth, the assumption being that it would make sense to give a higher weight to routing mistakes that happen earlier. However, we did not find evidence that this method gives a significant improvement, hence we use the vanilla MSE instead.
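A minimal sketch of Eq. (9) in PyTorch; the function signature is illustrative, with `routing_outputs` standing for the routing-module outputs collected along the taken path and `routing_labels` for the clustering-derived targets.

```python
import torch.nn.functional as F

def decisionet_loss(logits, routing_outputs, class_labels, routing_labels, beta=3.0):
    """L = L_cls + beta * L_sigma (Eq. (9)): cross-entropy on the class prediction
    plus an MSE term pushing the routing outputs towards the routing labels."""
    cls_loss = F.cross_entropy(logits, class_labels)
    sigma_loss = F.mse_loss(routing_outputs, routing_labels.float())
    return cls_loss + beta * sigma_loss
```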

4 Experiments

4.1 FashionMNIST and CIFAR10

Fashion-MNIST [22] consists of a training set of 60K examples and a test set of 10K examples. Each example is a 28 × 28 grayscale image, associated with a label from 10 classes. The CIFAR10 dataset [14] consists of 60K 32×32 color images in 10 classes, with 6000 images per class. There are 5000 training images and 1000 test images per class. Both datasets were trained using their training images and evaluated on the test images.
Model. We used Network-In-Network (NiN) [15] as our baseline model. The architecture we used is displayed in Fig. 4a. We treat it as a stack of 3 small networks (which we refer to as "blocks"). The DecisioNet variants of this model are generated using the method described in Sect. 3. We experimented with 4 varieties of DecisioNets:
1. DN2 - A basic version with 2 splits, located in-between the baseline NiN model blocks. An example of a single DN2 computational path is displayed in Fig. 4b. Since there are 2 splits in the DN2 tree, there are 4 different computational paths (from root to leaf).
2. DN1-early - this version contains a single split point, between the first and second NiN blocks. It is equivalent to the DN2 without the deeper routing modules.
3. DN1-late - this version contains a single split point, between the second and third NiN blocks. It is equivalent to the DN2 without the first routing module and replacing the 2 deeper RMs with a single one.


4. DN2-slim - A slimmer version of DN2, where we “push” the routing modules to earlier stages compared to DN2. It results in an even lighter version of DN2 (fewer parameters and MAC operations). An example of a single DN2-slim computational path is displayed in Fig. 4c. In all cases, the last layer of the DN model outputs 10 values. It means that classes that reached the wrong leaf can still be predicted correctly. This way we overcome the aforementioned choice of not allowing overlapping between clusters.

Preprocessing. For each dataset we calculate its training set mean and standard deviation values (per channel) and use them to normalize the images before feeding them into the model. We also created the routing labels using the method described in Sect. 3.2. For CIFAR10, we conduct 2 sets of experiments - one with data augmentation (which includes random flips and crops, zero-padded by 4 pixels from each side) and one without it. The data-augmented experiments are marked in the tables with a “+” sign. Training. We apply a similar training method to the one used in [15]: we used stochastic gradient descent with mini-batches of 128 samples, momentum of 0.9, and weight decay of 0.0005. The initial learning rates of FashionMNIST and CIFAR10 were 0.01 and 0.05, respectively. We decrease the learning rate by a factor of 10 when there are 10 epochs without training accuracy improvement. We repeat this process twice during the training period. The training stops if either we reach 300 epochs, or the test loss stops improving for 30 epochs (the latter is to prevent overfitting). The DecisioNet training process is identical and has the additional hyperparameter β for the loss function (Eq. 9). The value of β varies between datasets and models. The specific value that was used for every experiment appears next to the results (Table 2). We initialize the weights of the convolutional layers from a zero-centered Gaussian distribution with a standard deviation of 0.05. Clustering Interpretability. Aside from the performance of the different models, we show that the clustering method partitions the data in an intuitive way. For example, the CIFAR10 first partition is into 2 clusters containing animals and vehicles. The vehicle cluster is then separated into land vehicles (car, truck) and non-land vehicles (plane, ship). In FashionMNIST the first division is into footwear (sandal, snicker, ankle-boot) and non-footwear. The latter is then partitioned into legwear (a singleton cluster with only pants) and non-legwear (Tshirt, dress, bag, etc.). The full hierarchical clustering (for 2 decision levels) is displayed in Table 1.
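The preprocessing and optimizer settings above could be wired up as in the hypothetical CIFAR10 snippet below. The per-channel mean/std constants are commonly used CIFAR10 statistics standing in for the values the authors computed from the training set.

```python
import torch
import torchvision.transforms as T

# Augmented CIFAR10 pipeline: 4-pixel zero padding with random crops and flips,
# followed by per-channel normalisation.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
])

def make_optimizer(model: torch.nn.Module, lr: float = 0.05):
    # SGD with momentum 0.9 and weight decay 5e-4, as reported; batches of 128.
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=0.9, weight_decay=5e-4)
```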


Fig. 4. In (a) we display the Network-in-Network (NiN) baseline model architecture. For convolutional layer blocks, the kernel size is stated below each block, and the number of filters is written on the side. We always add zero-padding such that the output and input spatial dimensions are the same (only pooling layers reduce the spatial dimensions). N is the number of output classes (in our case N = 10). In (b) and (c) we display a computational path of 2 DecisioNet variants of the NiN model DN2 and DN2-slim, respectively. The routing module (RM) is the one that is described in Sect. 3.3. Notice that the first block after the RM has convolutional layers with half the amount of filters compared to the baseline model; After the second RM, the corresponding block contains a quarter of the filters in the convolutional layers.

Table 1. Hierarchical clustering results.

(a) CIFAR10 clustering
Class   1st level   2nd level
plane   0           0
ship    0           0
car     0           1
truck   0           1
bird    1           0
cat     1           0
deer    1           0
dog     1           0
frog    1           0
horse   1           1

(b) FashionMNIST clustering
Class        1st level   2nd level
T-shirt      0           0
Pullover     0           0
Dress        0           0
Coat         0           0
Shirt        0           0
Bag          0           0
Trouser      0           1
Sneaker      1           0
Ankle-boot   1           0
Sandal       1           1

Results and Comparison. The results, containing the test accuracy, number of parameters, and number of MACs, are presented in Table 2. For each experiment (row in the table) we measured the average accuracy based on 10 runs with an identical setup (except for the initial weights, of course). For the DN variants, the β values of Eq. 9 that were used also appear in the table. The parameter and MAC counts were calculated for a single input image forward pass using torchinfo [24]. A "+" stands for data augmentation. The DN architectures are described in Sect. 4.1.
Analysis: The results (Table 2) show that our method works well for these datasets. The performance of the different DN variants is on par with the baseline NiN model (and sometimes even slightly outperforms it). Depending on the exact architecture, we save up to roughly 60% of the computational cost and memory with a negligible decrease in the model's performance. Comparing DN models whose routing module is placed earlier in the net (DN1-early and DN2-slim) with their similar alternatives (DN1-late and DN2, respectively), we can see a cost-performance tradeoff: when we put the routing module earlier, the model's performance decreases, along with its cost. This phenomenon is another testimony to the common belief that shallow layers extract features that are important for all classes. We also would like to emphasize the significance of choosing an appropriate β: when setting β = 0, we essentially encourage the DN to only optimize the classification loss, ignoring the routing loss. We found that the performance in this case is similar to a random choice (i.e., the DN outputs a constant prediction). This is surprising because in some cases the DN ignored the routing labels at some point, while still improving its classification accuracy.


Table 2. Accuracy results, along with parameter and MAC counts. For experiments with augmented data (marked with "+") we omit the parameter and MAC counts, as they are identical to those of the non-augmented experiments. More details in Sect. 4.1.

Dataset      | Architecture | β   | Acc. (%) | Acc. change (%) | Params (K) | Params change (%) | MACs (M) | MACs change (%)
FashionMNIST | Baseline     | –   | 92.7     | –               | 957.386    | –                 | 163.3    | –
FashionMNIST | DN1-early    | 3.0 | 92.9     | ↑ 0.2           | 736.309    | ↓ 23.1            | 93.64    | ↓ 42.7
FashionMNIST | DN1-late     | 3.0 | 93.4     | ↑ 0.7           | 939.157    | ↓ 1.9             | 153.76   | ↓ 5.8
FashionMNIST | DN2          | 3.0 | 92.8     | ↑ 0.1           | 727.307    | ↓ 24.0            | 91.24    | ↓ 44.1
FashionMNIST | DN2-slim     | 3.0 | 92.7     | 0.0             | 414.027    | ↓ 56.8            | 60.68    | ↓ 62.8
CIFAR10      | Baseline     | –   | 87.0     | –               | 966.986    | –                 | 223.12   | –
CIFAR10      | DN1-early    | 3.0 | 86.6     | ↓ 0.4           | 745.909    | ↓ 22.9            | 132.14   | ↓ 40.8
CIFAR10      | DN1-late     | 3.0 | 88.2     | ↑ 1.2           | 948.757    | ↓ 1.9             | 210.66   | ↓ 5.6
CIFAR10      | DN2          | 0.5 | 86.8     | ↓ 0.2           | 736.907    | ↓ 23.8            | 129.00   | ↓ 42.2
CIFAR10      | DN2-slim     | 1.0 | 86.2     | ↓ 0.8           | 423.627    | ↓ 56.2            | 89.08    | ↓ 60.1
CIFAR10+     | Baseline     | –   | 88.4     | –               | –          | –                 | –        | –
CIFAR10+     | DN1-early    | 3.0 | 87.8     | ↓ 0.6           | –          | –                 | –        | –
CIFAR10+     | DN1-late     | 3.0 | 89.4     | ↑ 1.0           | –          | –                 | –        | –
CIFAR10+     | DN2          | 0.5 | 87.6     | ↓ 0.8           | –          | –                 | –        | –
CIFAR10+     | DN2-slim     | 1.0 | 86.3     | ↓ 2.1           | –          | –                 | –        | –

4.2 CIFAR100

The CIFAR100 dataset [14] is like CIFAR10, except that it has 100 classes containing 600 images each (500 training images and 100 test images per class). For this dataset, we experimented with Wide ResNet (WRN) [25] as our baseline network. We used the version with 28 layers and a depth factor of k = 10 (denoted as WRN-28-10 in the original paper). We created 3 DN variants out of this network: DN2, whose splits are in between the residual blocks (2 splits), along with DN1-early and DN1-late - each of them with a single split point, positioned at the first or last DN2 split point position. A diagram of the baseline model and the DN2 variant is displayed in Fig. 5, and the results of our experiments are in Table 3.
Training. We train the network using the same method as in [25]. That is, we use SGD with Nesterov momentum of 0.9 and a weight decay of 0.0005. The initial learning rate is set to 0.1 and is decreased by a factor of 5 when we reach 60, 120 and 160 epochs. The training is stopped after 200 epochs. The batch size used for training is 128. We augmented the dataset with the same method as we did with CIFAR10, except padding crops with reflections instead of zeros (that is done to follow the training procedure from the original paper).
Analysis: We see similar behavior to the previous experiments, although the top-1 accuracy drop is larger here. One possible explanation for this is the fact that the parameter ratio between the DNs and the baseline model is significantly higher. Different routing module positioning might yield better performance (at some computational cost).


Fig. 5. Wide-ResNet-n-k architectures; in this paper we only use n = 28 and k = 10. (a) Baseline WRN-n-k architecture with n = 28. The dashed arrows are 1x1 convolutions for dimension matching. (b) Residual block architecture. In some cases, the first convolution has a stride of 2 (in (a) these blocks have the additional "/2" label). (c) WRN-DN2 single computational branch (we group the residual blocks from (a) into a single block for clarity; generally, N = (n − 4)/6, so in our case N = 4). RM is the routing module.


Table 3. CIFAR100 results

Architecture  | β   | Top-1 Acc. (%) | Top-5 Acc. (%) | Params (M)       | MACs (G)
WRN baseline  | –   | 80.7           | 95.4           | 36.537           | 5.24
WRN-DN1-early | 3.0 | 77.6 (↓ 3.1)   | 93.9 (↓ 1.5)   | 19.385 (↓ 46.9%) | 2.6 (↓ 50.4%)
WRN-DN1-late  | 3.0 | 78.7 (↓ 2.0)   | 94.5 (↓ 0.9)   | 23.635 (↓ 35.3%) | 3.94 (↓ 24.8%)
WRN-DN2       | 5.0 | 75.7 (↓ 5.0)   | 92.7 (↓ 2.7)   | 12.935 (↓ 64.6%) | 2.28 (↓ 56.5%)

5 Conclusion

We introduced DecisioNet (DN), a binary-tree structured network with conditional routing, and proposed a systematic way of building it from an existing baseline network. DN utilizes Improved Semantic Hashing for end-to-end training, applying conditional routing both during training and evaluation. DN takes advantage of its tree structure, passing inputs only through a portion of the net's neural modules and thereby saving much of the computational cost. We evaluated multiple DN variants along with their baseline models on multiple image classification datasets and showed that DN is capable of achieving similar performance (w.r.t. its baseline model) while significantly decreasing the computational cost.
Future Research. One clear drawback of our method is the use of additional labels. In addition, these labels are generated using a pre-trained model. Learning hard routing without such auxiliary labels is challenging (as we saw when we set β = 0), and it might be an interesting direction for future research. Another direction is creating DNs with unbalanced trees: in this paper, we only examined balanced DNs, while our clustering trees were not necessarily balanced. A possible idea is to divide the computational power of the branches proportionally to the size of the data they should process (e.g., a dataset with 100 classes which is split into clusters of size 70 and 30 would have one branch with 70% of the computational power and the other with 30%).
Acknowledgements. Thanks to the ISF (1439/22) and the DFG for funding.

References
1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/, software available from tensorflow.org
2. Chen, J., Zhu, Z., Li, C., Zhao, Y.: Self-adaptive network pruning (2019)
3. Chen, Z., Li, Y., Bengio, S., Si, S.: You look twice: GaterNet for dynamic filter selection in CNNs (2019)
4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
5. Frosst, N., Hinton, G.: Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784 (2017)


6. Hazimeh, H., Ponomareva, N., Mol, P., Tan, Z., Mazumder, R.: The tree ensemble layer: differentiability meets conditional computation. In: International Conference on Machine Learning, pp. 4138–4148. PMLR (2020)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015)
8. Hehn, T.M., Kooij, J.F.P., Hamprecht, F.A.: End-to-end learning of decision trees and forests. Int. J. Comput. Vision 128(4), 997–1011 (2020). https://doi.org/10.1007/s11263-019-01237-6
9. Ioannou, Y., et al.: Decision forests, convolutional networks and the models in-between. arXiv preprint arXiv:1603.01250 (2016)
10. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)
11. Kaiser, Ł., Bengio, S.: Discrete autoencoders for sequence models (2018)
12. Kaiser, Ł., Sutskever, I.: Neural GPUs learn algorithms (2016)
13. Kontschieder, P., Fiterau, M., Criminisi, A., Bulo, S.R.: Deep neural decision forests. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1467–1475 (2015)
14. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
15. Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)
16. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
17. Rota Bulo, S., Kontschieder, P.: Neural decision forests for semantic image labelling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 81–88 (2014)
18. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2015)
20. Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)
21. Tanno, R., Arulkumaran, K., Alexander, D., Criminisi, A., Nori, A.: Adaptive neural trees. In: International Conference on Machine Learning, pp. 6166–6175. PMLR (2019)
22. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
23. Yan, Z., et al.: HD-CNN: hierarchical deep convolutional neural network for large scale visual recognition (2015)
24. Yep, T.: torchinfo, March 2020. https://github.com/TylerYep/torchinfo
25. Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

An RNN-Based Framework for the MILP Problem in Robustness Verification of Neural Networks

Hao Xue1, Xia Zeng1,2, Wang Lin3, Zhengfeng Yang1(B), Chao Peng1, and Zhenbing Zeng4

1 Shanghai Key Lab of Trustworthy Computing, East China Normal University, Shanghai, China
[email protected], {zfyang,cpeng}@sei.ecnu.edu.cn
2 School of Computer and Information Science, Southwest University, Chongqing, China
[email protected]
3 School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou, China
[email protected]
4 Department of Mathematics, Shanghai University, Shanghai, China
[email protected]

Abstract. Robustness verification of neural networks is becoming increasingly crucial for their potential use in many safety-critical applications. Essentially, the problem of robustness verification can be encoded as a typical Mixed-Integer Linear Programming (MILP) problem, which can be solved via branch-and-bound strategies. However, these methods can only afford limited scalability and remain challenging for verifying large-scale neural networks. In this paper, we present a novel framework to speed up the solving of the MILP problems generated from the robustness verification of deep neural networks. It employs a semi-planet relaxation to abstract ReLU activation functions, via an RNN-based strategy for selecting the relaxed ReLU neurons to be tightened. We have developed a prototype tool L2T and conducted comparison experiments with state-of-the-art verifiers on a set of large-scale benchmarks. The experiments show that our framework is both efficient and scalable even when applied to verify the robustness of large-scale neural networks.

Keywords: Robustness verification · Learning methods · Semi-planet relaxation · Neural networks

1 Introduction

While the deep neural network has been widely applied to various tasks and achieved outstanding performance in recent years, it also encounters robustness problems from many adversarial examples [14, 30], which could raise serious consequences in safety-critical systems such as autonomous driving, aircraft flight control, and nuclear systems. For example, a slightly perturbed image could make an autonomous car incorrectly classify a white truck as cloud and cause a crash accident [13]. Consequently, robustness


verification of neural networks becomes increasingly crucial, which aims to check that the outputs for all inputs in a given domain have the correct corresponding label. Several methods [3, 16, 28] have been proposed to use abstraction for achieving the robustness. With the wide use of the ReLU activation function in neural networks, it is more useful in practice to investigate the robustness verification problem [20]. Due to the piece-wise linear structure of ReLU networks, the problem of robustness verification can be rewritten into a typical Mixed-Integer Linear Programming (MILP) problem, which can be solved via branch-and-bound strategies [7]. However, for large-scale ReLU networks, solving the MILP problems stemming from robustness verification still remains a challenge. To this end, several convex relaxation approaches have been studied for approximating ReLU activation functions [5, 9, 22, 24, 26]. Nevertheless, as the error accumulation arising from the over-relaxation is often hard to estimate, these approaches may fail to prove the robustness. Meanwhile, machine learning methods have already been used in solving MILP problems. In [1] and [15], regression and ranking approaches are used to assign a branching score to each potential branching choice. Bayesian optimization is also applied in [11] to learn verification policies. The scheme in [21] captures the structure of a neural network and makes tightening decisions through a graph neural network. Recently, symbolic interval propagation and linear programming are used to improve the verification performance [17]. However, the above schemes are either limited to specific neural networks or relied heavily on hand-designed features. To address the issues mentioned above, in this paper, we propose a novel framework to speed up the solving of MILP problems generated from the robustness verification of neural networks. Since the relaxation of ReLU activation functions may result in failure of the original robustness verification, we propose a semi-planet relaxation, which adopts an RNN-based strategy for selecting the relaxed ReLU neurons to be tightened. Our framework entails an iterative process to gradually repair failures caused by overrelaxation, hence it can perform effective verification while ensuring the efficiency of the MILP problem solving. To summarize, the main contributions of this paper are: – We present a general scheme, which augments the original neural network by additional layers with one output neuron, and transforms the verification problem into the output positiveness checking of the augmented network. Thus we transform the robustness verification problem into a single MILP problem, making our RNN-based solving method much easier to implement and scale. – We build an iterative framework that enhances the efficiency of robustness verification through an intelligent neuron selection strategy by an RNN to gradually tighten the relaxed MILP solution. – Our RNN embraces scalability by utilizing the network’s structure information, and the trained RNN is generalizable for robustness verification on different network structures. We also provide an approach to generate the training dataset, which makes RNN much easier to train and implement. – We conduct comparison experiments with state-of-the-art verifiers on large-scale benchmarks, the results show that our framework has excellent applicability: it outperforms most other verifiers in terms of average solving time and has lower timeout rate in most cases.


2 Preliminaries

In this section, we provide the background on Deep Neural Networks (DNNs) and describe the robustness verification of DNNs.

2.1 Deep Neural Networks

A deep neural network N is a tuple ⟨X, H, Φ⟩, where X = {x^[0], . . . , x^[n]} is a set of layers, H = {H_1, . . . , H_n} consists of affine functions between layers, and Φ = {φ_1, . . . , φ_n} is a set of activation functions. More specifically, x^[0] is the input layer, x^[n] is the output layer, and x^[1], . . . , x^[n−1] are called hidden layers. Each layer x^[i], 0 ≤ i ≤ n, is associated with an s_i-dimensional vector space, in which each dimension corresponds to a neuron. For each 1 ≤ i ≤ n, the affine function is written as H_i(x^[i−1]) = W^[i] x^[i−1] + b^[i], where W^[i] and b^[i] are called the weight matrix and the bias vector, respectively. Furthermore, for each layer x^[i], the value of each neuron in the hidden layer is obtained by applying the affine function H_i to the values of the neurons in the previous layer and then applying the activation function φ_i, i.e., x^[i] = φ_i(W^[i] x^[i−1] + b^[i]), with the activation function φ_i being applied element-wise. In this paper, we focus on DNNs with Rectified Linear Unit (ReLU) activation functions [12], defined as ReLU(x) = max(0, x). From the mapping point of view, the i-th layer can be seen as the mapping image of the (i − 1)-th layer. For each non-input layer x^[i], 1 ≤ i ≤ n, we can define a function f^[i] : R^{s_{i−1}} → R^{s_i}, with

f^[i](x) = φ_i(W^[i] x + b^[i]),   1 ≤ i ≤ n.        (1)

A DNN N is expressed as the composition function f : R^{s_0} → R^{s_n}, i.e.,

f(x) = f^[n](f^[n−1](· · · (f^[1](x)))).        (2)

Given an input x^[0], the DNN N assigns the label selected with the largest logit of the output layer x^[n], that is, k = arg max(f(x^[0])) = arg max_{1 ≤ j ≤ s_n}(x_j^[n]).
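For concreteness, the composition f and the arg max labelling can be written in a few lines of PyTorch (a toy network with arbitrary example layer sizes, not one of the benchmark networks):

```python
import torch
import torch.nn as nn

# f = f[n] ∘ ... ∘ f[1], with f[i](x) = ReLU(W[i] x + b[i]).
layer_sizes = [784, 64, 64, 10]          # s_0, s_1, s_2, s_n (example values)
layers = []
for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    layers += [nn.Linear(s_in, s_out), nn.ReLU()]
f = nn.Sequential(*layers)

x0 = torch.rand(1, layer_sizes[0])       # an input point x[0]
logits = f(x0)                           # x[n] = f(x[0])
k = logits.argmax(dim=1)                 # assigned label: arg max_j x[n]_j
```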

2.2 Robustness Verification of DNNs

We begin by formally defining the notion of robustness and introducing the robustness verification of DNNs. Given a DNN N with its associated function f and an input point x_c, we now define ε-robustness for N on x_c. Let B(x_c, ε) be the ℓ∞-ball of radius ε ∈ R_{>0} around the point x_c, i.e., B(x_c, ε) = {x^[0] ∈ R^{s_0} | ‖x^[0] − x_c‖∞ ≤ ε}. We say that an input domain B(x_c, ε) has the same label if any input x^[0] chosen from B(x_c, ε) has the same label as x_c. This property that the ε-ball around the point has the same label is called ε-robustness, i.e.,

arg max(f(x^[0])) = arg max(f(x_c)),  ∀x^[0] ∈ B(x_c, ε).        (3)

Determining the ε-robustness of x_c with the label k can be transformed into the following equivalent optimization problem


p* = min_{x, x̂} { x_k^[n] − x_{k'}^[n], ∀k' ≠ k }
s.t.  x^[0] ∈ B(x_c, ε),
      x̂^[i] = W^[i] x^[i−1] + b^[i],   i = 1, . . . , n,
      x^[i] = φ_i(x̂^[i]),              i = 1, . . . , n.        (4)

Remark that N with respect to B(x_c, ε) is robust if and only if the optimum of (4) is positive, i.e., p* > 0. The main challenge in solving (4) is to handle the ReLU activation functions, which bring non-linearities into the problem of robustness verification. Seeking the optimal solution of (4) is an arduous task, as the optimization problem is generally NP-hard [18]. An effective technique to eliminate the non-linearities is to encode them with the help of binary variables. Let l^[i] and u^[i] be lower and upper bounds on x̂^[i], respectively. The ReLU activations x^[i] = φ_i(x̂^[i]) can then be encoded by the following constraints:

I(x^[i])  ≜  {  x^[i] ⪰ 0,   x^[i] ⪰ x̂^[i],
                u^[i] ⊙ z^[i] ⪰ x^[i],
                x̂^[i] − l^[i] ⊙ (1 − z^[i]) ⪰ x^[i],
                z^[i] ∈ {0, 1}^{s_i}  }        (5)

where z^[i] is a binary vector of length s_i, ⊙ denotes the Hadamard product of matrices, and the generalized inequality ⪰ denotes componentwise inequality between vectors. Therefore, the robustness verification problem (4) can be transformed into an equivalent Mixed-Integer Linear Programming (MILP) problem [6]:

p* = min_{x, x̂, z} { x_k^[n] − x_{k'}^[n], ∀k' ≠ k }
s.t.  x^[0] ∈ B(x_c, ε),
      x̂^[i] = W^[i] x^[i−1] + b^[i],   i = 1, . . . , n,
      I(x^[i]),                         i = 1, . . . , n.        (6)

It is easy to verify that z_j^[i] = 0 ⇔ x_j^[i] = 0 and z_j^[i] = 1 ⇔ x_j^[i] = x̂_j^[i].
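The per-neuron constraints (5) can be written down directly in any MILP modelling library. The sketch below uses PuLP (not the solver used in the paper) and illustrative names, assuming valid pre-activation bounds l < 0 < u for the neuron being encoded:

```python
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable

def add_relu_constraints(prob, x_hat, l, u, name):
    """Encode x = ReLU(x_hat) exactly with one binary variable z (constraints (5))."""
    x = LpVariable(f"x_{name}", lowBound=0)    # x >= 0
    z = LpVariable(f"z_{name}", cat=LpBinary)  # z in {0, 1}
    prob += x >= x_hat                         # x >= x_hat
    prob += x <= u * z                         # u * z >= x
    prob += x <= x_hat - l * (1 - z)           # x_hat - l * (1 - z) >= x
    return x, z

# Tiny usage example: minimise ReLU(x_hat) over x_hat in [-1, 2].
prob = LpProblem("relu_demo", LpMinimize)
x_hat = LpVariable("x_hat", lowBound=-1, upBound=2)
x, z = add_relu_constraints(prob, x_hat, l=-1.0, u=2.0, name="0")
prob += x          # objective
prob.solve()
```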

3 Semi-Planet Relaxation for MILP Problems

3.1 Formulating the Minimum Function in MILP Model

The objective function of the optimization problem (6) is piecewise-linear [7]. There are some approaches [8, 10, 25] making it amenable to mathematical programming solvers, but they either discard the network structure or sacrifice scalability. In the following, we propose an approach for rewriting the objective function with multiple neurons into an equivalent single-neuron setting by adding a few layers at the end of the network N. This allows us to preserve the network structure and ensure the scalability of solving the problem. Note that only linear layers and ReLU activation functions are added, for the sake of simplicity. Before doing this, we first add a new layer to represent the difference between x_k^[n] and all other output neurons:




x^[n+1] = φ_{n+1}(W^[n+1] x^[n] + b^[n+1]),   W^[n+1] = 1^T e_k − I,


(7)

b[n+1] = 0,

where e_k is the one-hot encoding vector for label k and I is an identity matrix. Now, the remaining task is to compute the minimum of x^[n+1]. For simplicity, we illustrate how to use the neural network structure to represent the Min function of two elements:

min(a, b) = (a + b − |a − b|) / 2 = (φ(a + b) − φ(−a − b) − φ(a − b) − φ(b − a)) / 2,        (8)

that is, min(a, b) = W2 · φ(W1 · [a, b]^T), where

W1 = [ 1  −1   1  −1 ;  1  −1  −1   1 ]^T,    W2 = (1/2) · [ 1, −1, −1, −1 ].

(9)

By calling the above process recursively, it is easy to build a neural network to express the Min function for more than 2 elements. Therefore, computing the minimum of s_{n+1} neurons can be represented as a neural network N_g,

g(x) = W2 · g_{⌈log2 s_{n+1}⌉}(· · · (g_2(g_1(x)))),

(10)



 where gj (x) = φn+1+j W [n+1+j] x + b[n+1+j] , j = 1, . . . , log2 sn+1 , which halves the input vector space by finding the minimum in pairs.  , which Combining N and Ng can produce an augmented network, denoted by N has m = n + 1 + log2 sn+1 layers. Now, the original robustness verification problem can be transformed into that of determining the positiveness of the output of the aug . Therefore, the problem (6) can be rewritten as an equivalent MILP mented network N problem: ⎫ JI∗ = min W2 x[m] ⎪ ⎪ x,ˆ x,z ⎪ ⎬ [0] s.t. x ∈ B(xc , ), (11) ˆ [i] = W [i] x[i−1] + b[i] , x i = 1, . . . , m, ⎪ ⎪ ⎪ ⎭ i = 1, . . . , m. I(x[i] ), Notably, the optimum JI∗ is positive if and only if the original network N with respect to B(xc , ) is robust. Hereafter, we abbreviate the optimization problem (11) as following

 , I(X) , JI∗ = min J N (12) . where I(X) denotes the binary constraints for the set of ReLU neurons X of N 3.2 Relaxation and Tightening for MILP Constraints When encountering practical neural networks, due to binary constraints on overwhelm[i] ing I(xj ), how to solve MILP problem (12) is the key challenge. Planet relaxation [9]

576

H. Xue et al.

[i]

[i]

[i]

Fig. 1. The form of P(xj ) is contingent on the value of uj and lj .

has been widely used for ReLU neurons relaxation [5, 6, 21] due to its tightness of for[i] [i] xj ) may mulation. Applying planet relaxation to ReLU activation functions xj = φ(ˆ yield the following over-approximation with linear inequalities, ⎧ [i] [i] ⎪ xj = 0, if uj ≤ 0 ⎪ ⎪ ⎪ [i] [i] [i] ⎨x = x ˆj , if lj ≥ 0 j [i] (13) P(xj ) [i] [i] [i] ⎪ (ˆ x − l ) u j j j [i] [i] [i] [i] ⎪ ⎪ ˆ ≥ 0, x ≥ x , x ≤ . otherwise x ⎪ j j j ⎩ j [i] [i] uj − lj Figure 1 is depicted to show the planet relaxation of a ReLU unit. [i] [i] By introducing P(xj ) to relax I(xj ), the MILP problem (12) is transformed into the following LP problem:

 , P(X) , JP∗ = min J N (14) . where P(X) denotes the planet relaxation for the set of ReLU neurons X of N Remark 1. In comparison with the MILP problem (12), the feasible set of the LP problem (14) is a superset of that of (12). Consequently, the optimum of (14) is a lower bound of the one of (12), i.e., JP∗ ≤ JI∗ . Therefore, the relaxation transformation between the optimization problems derives sufficient conditions for robustness verification of the neural networks. The gap between JP∗ and JI∗ depends on the tightness of planet relaxation. Unfortunately, the relaxation may yield too conservative optimum JP∗ , such that JP∗ < 0 < JI∗ holds, which fails to prove the robustness property for the given neural network. Therefore, how to provide adequate precision and scalability to satisfy the requirement of robustness verification is crucial. In this work, we suggest a semi-planet relaxation scheme where relaxation and tightening processes cooperate to approximate the ReLU neurons. Concretely, when the optimum JP∗ < 0, a tightening procedure is performed to tighten the relaxations of some ReLU neurons and recover their original binary constraints. Let T ⊆ X be a subset of ReLU neurons that are chosen to tighten the relaxation, based on the above semi-planet relaxation, (14) can be transformed into

 , I(T ) ∧ P(X − T ) . (15) JT∗ = min J N



Obviously, the optimum JT∗ of (15) satisfies the inequalities, i.e., JP∗ ≤ JT∗ ≤ JI∗ ,  with respect to B(xc , ) is robust when which derives that the augmented network N ∗ JT > 0. As the number of tightened neurons in T increases, JT∗ will be closer to JI∗ . As a result, the complexity of solving the problem (15) will also grow exponentially. So the challenge is how to balance the relaxation and tightening, that is, how to choose as . few neurons as possible to tighten for proving the robustness of N

4 Learning to Tighten The MILP problem (12), derived from robustness verification of neural networks, transfers a more tractable linear programming (LP) problem when all ReLU neurons are relaxed by the planet relaxation technique. However, this relaxation of non-linearity may yield too conservative constraints so that the optimum of the resulted LP is negative, which fails to prove the original robustness verification problem. A natural idea to avoid this situation would be to tight only a part of neurons that have more influence on the verification problem. The key is how to determine those key neurons. In this work, we will propose a framework for solving the MILP problem which incorporates an RNN-based strategy for selecting neurons to be tightened. 4.1 RNN-Based Framework In this section, we describe an RNN-based framework for solving the MILP problem in robustness verification of neural networks. To build the framework we adopt an iterative process to gradually repair failures caused by over-relaxation, and pursue effective verification while ensuring the efficiency of the MILP problem solving. Specifically, we begin the iteration with a complete relaxed problem in (14) from the original problem, by applying the planet relaxation (12) for all ReLU neurons. If the solution satisfies JP∗ > 0, then the robustness of the original problem has been verified. Otherwise, we apply the one-neuron-tighten-test (15) one by one for the set X consisting of all current relaxed neurons, and further solve each updated semi-relaxed problem. Once the optimal solution satisfies JT∗ > 0 after testing on certain set T of the relaxed neurons, we derive that the neural network has been verified to be robust. Otherwise one more neuron is chosen to be tightened. To guarantee the completeness, suppose x∗ is the minimizer of JT∗ , then assign x∗ to (12) can arrive at the upper bound JT , and the constraint JT > 0 is incorporated to ensure that our tightening scheme is always available for robustness verification. Meanwhile, if it occurs that JT < 0 at one  is unrobust. stage, it is easy to show that the neural network N As shown in Fig. 2, the framework consists of two main functional modules, the MILP-solving module and the neuron-selection module. Given a feed-forward neural  for generating a generic MILP network N with label k, we first augment N into N problem (Sect. 3.1), and then relax all or some of the neurons (Sect. 3.2). Next, we check whether JT∗ > 0 or JT < 0 by the MILP-solving module, and terminate the iteration if the condition is satisfied at certain steps. Otherwise, the process enters the neuronselection module, i.e., the Recurrent Tightening Module (RTM), as shown inside the dashed-round-box in Fig. 2. The RTM module is designed to implement the neuron selection strategy for solving the MILP problem. The trained RTM works by grading



Fig. 2. The architecture of RNN-based framework.

Algorithm 1. The RNN-based Framework for Robustness Verification Require: Network N , ∞ -ball B(xc , ), label k  , X ← Augment(N , B(xc , ), k)  Augment network N and initialize ReLU neurons 1: N 2: T ← {}  Initialize an empty tighten set T 3: while ¬(JT∗ > 0) ∧ ¬(JT < 0) do  , I(T ), P(X − T ) 4: Θ← N  Gather information of current MILP problem 5: 6: 7:

x ← RTM(Θ) T ← T ∪ {x}

 Select a relaxed neuron to tighten

   , I(T ) ∧ P(X − T ) Update JT∗ and JT with J N

8: if JT∗ > 0 then return Robust 9: else if JT < 0 then return Unrobust
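The main loop of Algorithm 1 can be summarised in a short Python sketch; the two injected callables stand for the MILP-solving module and the learned neuron-selection module (RTM), and all names here are placeholders rather than the authors' implementation.

```python
def verify(augmented_net, relu_neurons, solve_semi_relaxed, rtm_select):
    """Iterative relax-and-tighten loop of Algorithm 1. `solve_semi_relaxed` is assumed
    to return the optimum J*_T of problem (15) together with the feasible upper bound
    obtained from its minimiser; `rtm_select` scores the still-relaxed neurons."""
    tightened = set()                    # T <- {}
    while True:
        j_star, j_bar = solve_semi_relaxed(augmented_net, tightened)
        if j_star > 0:                   # J*_T > 0  -> robust
            return "Robust"
        if j_bar < 0:                    # upper bound < 0 -> a concrete violation exists
            return "Unrobust"
        # Tighten the highest-scoring relaxed ReLU neuron and iterate.
        neuron = rtm_select(augmented_net, tightened, relu_neurons)
        tightened.add(neuron)
```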

each neuron from the currently relaxed collection X for optimal tightening choice. A more detailed explanation on how to train such RTM will be given in Sect. 4.2. The main data flow of the RNN-based framework is given in Algorithm 1. The algorithm takes a neural network N with its input domain B(xc , ) and labels it by k.  and extracts all the ReLU neurons X. In In line 1, the algorithm augments N to N line 2, the set T for collecting neurons to tighten is initialized to the empty set. The main loop from Line 3 to Line 8 is to check the robustness condition and do the neuron selection, where line 5 is implemented by the neuron-selection module (cf. RTM in Fig. 2), and line 7 carries out semi-relaxed MILP problem by the MILP-solving module. The algorithm either returns ‘Robust’ when JT∗ > 0 is satisfied in line 8, or reaches ‘Unrobust’ if JT < 0 (Line 9). 4.2

Recurrent Tightening Module

The Recurrent Tightening Module (RTM) implements a learned tightening strategy, the key module of the RNN-based framework introduced in Sect. 4.1. It takes the network presented by a numerical tensor as input. The final output of the RTM is a selected neuron which has the highest score based on the result of the RNN whose training process has been illustrated in Sect. 4.3. The neuron selection process of RTM is shown in Fig. 3. Actually, the input of the module at the beginning is a semi-relaxed MILP problem to be verified, for simplicity we shall express it in an associate network form. Without causing ambiguity, we denote the input as Θ which carries the parameter information of the current MILP



Fig. 3. The flow-chart of RTM.

 to be verified, problem, including the structure parameters of the augmented network N collection of all ReLU neurons X, and the tightened ReLU neuron collection T . The green rectangle submodule in Fig. 3, an RNN consists of the Encoder and the Decoder, is designed to score the tighten. The current neurons in X of the network Θ will be graded and recorded layer-by-layer through this RNN-based scorer. The scoring results of all neurons will be recorded in a list denoted by D whose dimension is the same as the number of neurons. In the final stage, the RTM selects the neuron with the highest score as the output. In RTM, the neuron selecting is processing in the following way. In the initial, a  in layer-by-layer wise trained RNN equipped with parameter θ = {θ 1 , θ 2 } encodes N according to the structure of the input network, and produces a list of feature matrices  , where each M [i] denotes M = {M [1] , . . . , M [m] } which has the same layer with N the extracted feature matrix on the i-th layer, and each row of M [i] is the feature vector for a neuron in the layer. To improve the efficiency of the overall algorithm, we have also stored M in order to reuse it in the subsequent encoding process. In addition, we denote the fully-connected Encoder and Decoder by functions F1 and F2 , respectively. Mathematically, the RTM can be expressed as follows, M [i] = F1 (Θ, Hi (M [i−1] ), M [i] ; θ 1 ), D = F2 (M ; θ 2 ),

i = 1, . . . , m,

d = arg max(D),

x = Index(X, d),

(16)

where x is the selected neuron to be added into T for the next tightening. Remark 2. In addition to update M [i] by Θ, we also allow the updating process reusing M [i] to collect previous information and capturing the local information in M [i−1] by  . Therefore, (16) is a recurrent utilizing the feed-forward function Hi of the network N function for tightening the neuron successively. Remark 3. To increase the scalability of the method for different network structures, the RTM module also encode the structure of network and decode all the neurons in the network layer-by-layer processing. 4.3 Training the RNN in RTM In this subsection, we explain the two important parts of the RNN training process that is based on supervised learning for making the neuron selection policy.



In the part for generating the training dataset, we consider to characterize the impact of each neuron after being tightened on the current MILP problem to obtain the ground truth of scoring list, which reflects the performance of every neuron to be selected and . tightened in the current semi-relaxed problem presented by an associate network N Namely, given a semi-relaxed problem Θ, we design the following function to measure the improvement of tightening for each tightening decision candidate neuron x,

y_T(x) = \frac{J_T^* - \min(J_{T \cup \{x\}}^*, 0)}{J_T^*}.    (17)

Here notice that for each neuron x ∈ X, if x ∈ T then y_T(x) = 0; else if x ∉ T and J*_{T∪{x}} > 0 then y_T(x) = 1, which means the current MILP has been verified after tightening the neuron x, and x is the one we prefer to select. Otherwise, 0 < y_T(x) < 1 for all x ∉ T with J*_{T∪{x}} < 0, and the value y_T(x) measures the relative improvement of the result of the current MILP after tightening the neuron x; briefly, y_T(x) is the target score of selecting and tightening x. Hereafter, we use Y_T to denote the ground-truth score list of all neurons ordered layer by layer.

The second important part of constructing the neuron selection policy is to design a loss function that best fits the relax-and-tighten process of the MILP related to robustness verification. For a given RNN parameterized by θ, the following two principles are adopted in the design of the loss function.

– For the current MILP represented by Θ, the score list D_T output by the trained RNN should fit the ground-truth score list Y_T well, i.e.,

L_1(\theta) = \| Y_T \odot (D_T - Y_T) \|_1,    (18)

where D_T denotes the predicted tightening scores at Θ, i.e., D_T = RNN(Θ; θ), and the Hadamard product ⊙ is introduced in L_1 to assign higher weights to the entries with higher scores according to Y_T.
– It is worth noting that tightening the majority of neurons may give similar improvements, which makes ranking the scores predicted by the RNN inconvenient. So we introduce a small constant s > 0 and construct the following pairwise ranking loss L_2 to regularize the scores to be at least s-apart for each pair of neurons, i.e.,

L_2(\theta) = \frac{1}{r^2} \sum_{i=1}^{r-1} \sum_{j=i+1}^{r} \max(s - d_T^{[i]} + d_T^{[j]}, 0),    (19)

where d_T^{[i]} and d_T^{[j]} are the i-th and j-th highest scores predicted in D_T, respectively. For learning efficiency, the pairwise ranking loss L_2 only involves the top r scores in D_T.

In summary, the total loss function is constructed as

L(\theta) = \lambda L_1(\theta) + (1 - \lambda) L_2(\theta),    (20)

where 0 < λ < 1 is the parameter controlling the weights between L_1 and L_2. In fact, the experimental results in Sect. 5 show that the design of the loss function is reasonable and helps to improve training efficiency, which validates the above analysis.
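To make the construction of (18)–(20) concrete, the following is a minimal PyTorch-style sketch of the combined loss, assuming d_pred and y_true are 1-D tensors of predicted and ground-truth tightening scores; the function and tensor names are illustrative and are not taken from the released L2T code.

import torch

def tightening_loss(d_pred, y_true, lam=0.5, s=0.04, r=35):
    # L1: ground-truth-weighted absolute error (Eq. 18), Hadamard product with y_true
    l1 = torch.sum(torch.abs(y_true * (d_pred - y_true)))
    # L2: pairwise ranking hinge on the top-r predicted scores (Eq. 19)
    top = torch.sort(d_pred, descending=True).values[:r]
    diff = top.unsqueeze(1) - top.unsqueeze(0)              # diff[i, j] = d[i] - d[j]
    hinge = torch.clamp(s - diff, min=0.0)                  # max(s - d[i] + d[j], 0)
    mask = torch.triu(torch.ones_like(hinge), diagonal=1)   # keep pairs with j > i
    l2 = (hinge * mask).sum() / (r ** 2)
    # Total loss (Eq. 20)
    return lam * l1 + (1.0 - lam) * l2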



Algorithm 2. Generating Training Dataset
Require: N, {B(x_i, ε_i), k_i}_{i=1,...,P}
1:  S ← {}                                   ▷ Initialize training dataset S
2:  for i = 1, ..., P do                      ▷ Generate dataset using P images
3:      N̂, X ← Augment(N, B(x_i, ε_i), k_i); T ← {}
4:      for j = 1, ..., t do                  ▷ Tighten neurons at most t times
5:          Θ ← (N̂, I(T), P(X − T))
6:          Construct ground-truth score list Y_T with current Θ
7:          S ← S ∪ {(Θ, Y_T)}                ▷ Add sample (Θ, Y_T) to dataset S
8:          Index the optimal neuron x with the highest tightening score in Y_T
9:          T ← T ∪ {x}
10:         if Robustness is verified then Break
11: return S

Training Dataset. We build a subset of CIFAR-10 images to generate the dataset S by the following steps. First, we randomly pick P = 400 images and the corresponding perturbations ε with adversarial labels determined from [21], which cover most robustness verification problems on CIFAR-10. An adversarially trained network called the Base model is also chosen according to [21]. In the second step, we initialize an empty set T for each image and measure the improvements of all the relaxed ReLU neurons to construct the ground-truth score list Y_T. Finally, we refer to the score list Y_T to select the optimal neuron with the highest improvement, put it into T, and then update the current MILP problem, repeating the above process until the recursion depth reaches the threshold t or the robustness is verified. The procedure is given in Algorithm 2.

Training Details. In the training setting, we employ the Adam optimizer [19] with β1 = 0.9 and β2 = 0.999 for back-propagation of the RNN. The RNN is trained for 100 epochs with a learning rate of 0.001 and a weight decay of 0.0001. We also set r = 35, s = 0.04, and λ = 0.5. For generating datasets, we set t = 20 and randomly split the training and validation sets with a ratio of 7:3. In terms of training accuracy, we rank the tightening scores predicted by the RNN and consider a predicted top-5 score that is also in the top 5 of the ground-truth scores as a correct choice, since we are mainly interested in tightening neurons with large improvement under the metric defined by (17). Finally, our RNN reaches 71.6% accuracy on the training set and 63.5% accuracy on the validation set. We also run an ablation study by removing L_2, which reduces the validation accuracy from 63.5% to 59.7%.
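As an illustration of the accuracy metric described above (predicted top-5 scores judged against the top-5 ground-truth scores), here is a hedged sketch; the function name and the exact counting convention are our assumptions, not taken from the authors' code.

import torch

def top5_overlap_accuracy(d_pred, y_true, k=5):
    # A predicted top-k neuron counts as correct if it also lies in the
    # top-k of the ground-truth score list; return the correct fraction.
    pred_top = set(torch.topk(d_pred, k).indices.tolist())
    true_top = set(torch.topk(y_true, k).indices.tolist())
    return len(pred_top & true_top) / k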

5 Experiments

We have implemented a tool in PyTorch [23] called L2T for neural network robustness verification, which is based on the learning-to-tighten (L2T for short) policy of our proposed framework. The code is available at https://github.com/Vampire689/L2T. To demonstrate the advantages of L2T, we first compare it with different tightening-based strategies on the dataset provided in [21] (c.f. Sect. 5.1). L2T is then compared with the current mainstream robustness verification tools on two large-scale robustness verification benchmarks, i.e., the OVAL benchmark [22] and COLT


Table 1. The result of strategies on Easy, Medium and Hard dataset.

                    Average number of neurons tightened
Dataset            Rand      BaBSR     GNN       L2T
CIFAR - Easy       31.65     24.10     22.82     22.53
CIFAR - Medium     60.71     38.67     27.24     24.90
CIFAR - Hard       122.90    48.74     35.74     31.98

benchmark [4], to show two aspects of its performance: (1) the efficiency of L2T on the OVAL benchmark for verifying adversarial properties (c.f. Sect. 5.2); (2) the scalability of L2T to larger models on the COLT benchmark (c.f. Sect. 5.3).

5.1 Effectiveness of RNN-Based Tightening Strategy

We analyse L2T by comparing it with three different tightening strategies: 1) random selection (Rand), 2) a hand-designed heuristic (BaBSR) [6], and 3) a GNN learning-based strategy (GNN) [21]. We use the datasets of three difficulty levels (Easy, Medium and Hard) with the network of 3172 neurons provided in [21] and compute, for all strategies, the average number of neurons tightened on the solved properties. The results of the different tightening strategies are listed in Table 1. L2T requires the smallest average number of tightened neurons for solving the properties compared to the other strategies. In particular, L2T outperforms GNN at all levels, showing that our RNN-based framework has learned empirical information from the continuous tightening strategy.

5.2 Performance on the OVAL Benchmark

We evaluate the performance of L2T and five baselines on the OVAL benchmark [22] used in [31]. The benchmark consists of sets of adversarial robustness properties on three adversarially trained CIFAR-10 CNNs, the Base, Wide and Deep models, with 3172, 6244 and 6756 ReLUs respectively [22]. All three models are robustly trained using the method from [33]. There are 100 properties for each of the three models; each property is assigned a non-correct class and associated with a specifically designed perturbation radius ε ∈ [0, 16/255] on correctly classified CIFAR-10 images. The benchmark task is to verify that, for the given ε, the trained network will not misclassify the image as the non-correct class. A timeout of one hour per property is suggested. We conduct experimental comparisons with five state-of-the-art verifiers: 1) OVAL [5], a strong verification framework based on Lagrangian decomposition on GPUs; 2) GNN [21], a state-of-the-art tightening-based verifier using a learned graph neural network to imitate an optimal ReLU selection strategy; 3) ERAN [26–29], a scalable verification toolbox based on abstract interpretation; 4) A.set [22], a recent dual-space verifier with a tighter linear relaxation than the Planet relaxation; 5) αβ-crown [32,34,35], a recent fast and scalable verifier with efficient bound propagation. We report the average solving time and the percentage of timeouts over all properties in Table 2. It can be seen that L2T ranks in the top two in average time efficiency



Table 2. The performance of verifiers on the Base, Wide and Deep model.

Network                 Metric        OVAL     GNN      ERAN     A.set    αβ-crown   L2T
Base-3172-[0,16/255]    Time (s)      835.2    662.7    805.9    377.3    711.5      442.8
                        Timeout (%)   20       15       5        7        15         9
Wide-6244-[0,16/255]    Time (s)      539.4    268.5    607.1    162.7    354.7      228.8
                        Timeout (%)   12       6        7        3        5          5
Deep-6756-[0,16/255]    Time (s)      258.6    80.8     574.6    190.8    55.9       26.1
                        Timeout (%)   4        1        1        2        1          0

Fig. 4. Cactus plots for Base, Wide and Deep model on OVAL benchmark.

among the current popular tools, and L2T performs the best on the Deep model with no timeouts. L2T provides an additional average time saving of 37.8%, 35.5% and 53.3% on the three models, respectively, over the very recent αβ-crown. Although A.set achieves a slightly lower average time on the Base and Wide models, L2T is about 7× faster than A.set on the Deep model. Taken together, L2T costs the least total average time over all three models. The cactus plots in Fig. 4 show the growth of solving time with the accumulated number of verified properties for each verifier. L2T consistently performs well on the majority of properties.

5.3 Performance on the COLT Benchmark

Furthermore, we evaluate L2T and six baselines on the COLT benchmark [4], which is quite challenging: it includes the two largest CIFAR-10 CNNs, with tens of thousands of ReLUs, in VNN-COMP 2021 [2]. The goal is to verify that the classification is robust within an adversarial region defined by an l∞-ball of radius ε around an image. The COLT benchmark considers the first 100 images of the test datasets and discards those that are incorrectly classified. The values of ε for the two CNNs are 2/255 and 8/255, respectively, and the verification time is limited to five minutes per property. To evaluate the performance on the COLT benchmark, we introduce two additional well-performing verifiers in place of the underperforming GNN: 1) Nnenum [3], a recent tool using multi-level abstraction to achieve high-performance verification of ReLU networks; 2) VeriNet [16], a sound and complete symbolic interval propagation-based toolkit for robustness verification. As shown in Table 3, L2T significantly outperforms the majority of verifiers in terms of average solving time and achieves a lower timeout rate for the two large-scale CIFAR-10


Table 3. The performance of verifiers on two large-scale CIFAR-10 networks.

Network              Metric        OVAL     Nnenum   VeriNet   ERAN     A.set    αβ-crown   L2T
CIFAR-49402-2/255    Time (s)      85.3     115.2    80.9      87.8     104.5    35.0       48.6
                     Timeout (%)   26.3     30       23.8      26.3     32.5     3.7        11.2
CIFAR-16634-8/255    Time (s)      124.8    135.0    105.6     111.5    166.1    43.8       88.0
                     Timeout (%)   27.5     27.5     23.8      16.3     35       6.3        13.8

Fig. 5. COLT benchmark cactus plot for L2T and six baselines.

networks. In comparison with A.set, which performed excellently in the previous experiments, the average time of L2T on these two networks is reduced by 53.5% and 47.0%, respectively, and the percentage of timeouts also drops considerably. Although L2T is second only to αβ-crown in terms of time efficiency, the overall performance of our tool is still competitive, as shown in the cactus plot in Fig. 5, which shows the growth of solving time with the accumulated number of verified properties of the two networks for each verifier. Within a limited solving time of 10 s, L2T verified around 35 more properties than αβ-crown. As the robustness properties become increasingly complex to verify, αβ-crown verifies more properties than L2T, while L2T remains more efficient than the other verifiers.

6 Conclusion

We have proposed an RNN-based framework for solving MILP problems generated from the robustness verification of deep neural networks. This framework adopts an iterative process to gradually repair failures caused by over-relaxation. Boosted by the intelligent neuron selection strategy through a well-trained RNN in its key recurrent tightening module, our framework can achieve effective verification while ensuring the efficiency of MILP solving. A prototype tool named L2T has been implemented and compared with state-of-the-art verifiers on some large-scale benchmarks. The experimental results demonstrate the efficiency and scalability of our approach for robustness verification on large-scale neural networks.



Acknowledgements. This work is supported by the National Natural Science Foundation of China under Grant 12171159, 61902325, and Shanghai Trusted Industry Internet Software Collaborative Innovation Center.

References 1. Alvarez, A.M., Louveaux, Q., Wehenkel, L.: A machine learning-based approximation of strong branching. INFORMS J. Comput. 29, 185–195 (2017) 2. Bak, S., Liu, C., Johnson, T.: The second international verification of neural networks competition (VNN-COMP 2021): summary and results. arXiv preprint arXiv:2109.00498 (2021) 3. Bak, S., Tran, H.D., Hobbs, K., Johnson, T.T.: Improved geometric path enumeration for verifying relu neural networks. In: International Conference on Computer Aided Verification, pp. 66–96 (2020) 4. Balunovic, M., Vechev, M.: Adversarial training and provable defenses: bridging the gap. In: International Conference on Learning Representations (2020) 5. Bunel, R., et al.: Lagrangian decomposition for neural network verification. In: Conference on Uncertainty in Artificial Intelligence, pp. 370–379 (2020) 6. Bunel, R., Mudigonda, P., Turkaslan, I., Torr, P., Lu, J., Kohli, P.: Branch and bound for piecewise linear neural network verification. J. Mach. Learn. Res. 42:1–42:39 (2020) 7. Bunel, R., Turkaslan, I., Torr, P.H.S., Kohli, P., Mudigonda, P.K.: A unified view of piecewise linear neural network verification. In: Neural Information Processing Systems (2018) 8. Dvijotham, K., Stanforth, R., Gowal, S., Mann, T.A., Kohli, P.: A dual approach to scalable verification of deep networks. In: Conference on Uncertainty in Artificial Intelligence, p. 3 (2018) 9. Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: International Symposium on Automated Technology for Verification and Analysis, pp. 269–286 (2017) 10. Elboher, Y.Y., Gottschlich, J., Katz, G.: An abstraction-based framework for neural network verification. In: International Conference on Computer Aided Verification, pp. 43–65 (2020) 11. Gasse, M., Chetelat, D., Ferroni, N., Charlin, L., Lodi, A.: Exact combinatorial optimization with graph convolutional neural networks. In: Advances in Neural Information Processing Systems. Curran Associates, Inc. (2019) 12. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011) 13. Golson, J.: Tesla driver killed in crash with autopilot active, NHTSA investigating. The Verge (2016) 14. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014) 15. Hansknecht, C., Joormann, I., Stiller, S.: Cuts, primal heuristics, and learning to branch for the time-dependent traveling salesman problem. arXiv preprint arXiv:1805.01415 (2018) 16. Henriksen, P., Lomuscio, A.: Efficient neural network verification via adaptive refinement and adversarial search. In: European Conference on Artificial Intelligence, pp. 2513–2520 (2020) 17. Henriksen, P., Lomuscio, A.: Deepsplit: an efficient splitting method for neural network verification via indirect effect analysis. In: International Joint Conference on Artificial Intelligence (2021) 18. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: International Conference on Computer Aided Verification, pp. 97–117 (2017)



19. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) 20. Lin, W., et al.: Robustness verification of classification deep neural networks via linear programming. In: Conference on Computer Vision and Pattern Recognition, pp. 11418–11427 (2019) 21. Lu, J., Kumar, M.P.: Neural network branching for neural network verification. In: International Conference on Learning Representations (2020) 22. Palma, A.D., Behl, H., Bunel, R.R., Torr, P., Kumar, M.P.: Scaling the convex barrier with active sets. In: International Conference on Learning Representations (2021) 23. Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: Conference on Neural Information Processing Systems, pp. 8024–8035 (2019) 24. Raghunathan, A., Steinhardt, J., Liang, P.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Conference on Neural Information Processing Systems, pp. 10900–10910 (2018) 25. Ruan, W., Huang, X., Kwiatkowska, M.: Reachability analysis of deep neural networks with provable guarantees. In: International Joint Conference on Artificial Intelligence (2018) 26. Singh, G., Ganvir, R., Puschel, M., Vechev, M.: Beyond the single neuron convex barrier for neural network certification. In: Conference on Neural Information Processing Systems (2019) 27. Singh, G., Gehr, T., Mirman, M., P¨uschel, M., Vechev, M.T.: Fast and effective robustness certification. In: Conference on Neural Information Processing Systems (2018) 28. Singh, G., Gehr, T., P¨uschel, M., Vechev, M.: An abstract domain for certifying neural networks. In: Proceedings of the ACM on Programming Languages, pp. 1–30 (2019) 29. Singh, G., Gehr, T., P¨uschel, M., Vechev, M.: Robustness certification with refinement. In: International Conference on Learning Representations (2019) 30. Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014) 31. VNN-COMP: International verification of neural networks competition (VNN-COMP). In: International Conference on Computer Aided Verification (2020) 32. Wang, S., et al.: Beta-CROWN: efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. In: Advances in Neural Information Processing Systems (2021) 33. Wong, E., Kolter, Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning, pp. 5286–5295 (2018) 34. Xu, K., et al.: Fast and complete: enabling complete neural network verification with rapid and massively parallel incomplete verifiers. In: International Conference on Learning Representations (2021) 35. Zhang, H., Weng, T.W., Chen, P.Y., Hsieh, C.J., Daniel, L.: Efficient neural network robustness certification with general activation functions. In: Advances in Neural Information Processing Systems (2018)

Image Denoising Using Convolutional Sparse Coding Network with Dry Friction

Yali Zhang1, Xiaofan Wang1, Fengpin Wang1, and Jinjia Wang1,2(B)

1 School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
[email protected]
2 Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao 066004, China

Abstract. The convolutional sparse coding model has been successfully used in tasks such as signal or image processing and classification. The recently proposed supervised convolutional sparse coding network (CSCNet) model, based on the Minimum Mean Square Error (MMSE) approximation, shows PSNR values for the image denoising problem similar to state-of-the-art methods while using much fewer parameters. CSCNet uses the learned convolutional iterative shrinkage-thresholding algorithm (LISTA) based on the convolutional dictionary setting. However, LISTA methods are known to converge to local minima. In this paper we propose a novel algorithm based on LISTA with dry friction, named LISTDFA. The dry friction enters the LISTDFA algorithm through the proximal mapping. Due to the nature of dry friction, the LISTDFA algorithm is proven to converge in finite time. The corresponding iterative neural network preserves the computational simplicity of the original CSCNet and can reach a better local minimum in practice.

Keywords: Image denoising · Convolutional sparse coding · Iterative shrinkage thresholding algorithms · Dry friction

1 Introduction

(This work was supported in part by Basic Research Cooperation Projects of Beijing, Tianjin and Hebei (F2019203583, 19JCZDJC65600), and the Central Funds Guiding the Local Science and Technology Development (Basic Research Projects) (206Z5001G).)

Noise pollutes images during acquisition, compression and transmission, easily causing the loss of image information and adversely affecting subsequent image processing [13,27]. Image denoising is the process of removing the noise from a polluted image and restoring the original image. In recent years it has received attention as the basis of other image processing tasks and is one of the key issues in the field of image processing. In order to achieve image denoising, we must know the prior information of the original signal. Image priors, also known as image models, involve the mathematical description of the



true distribution of images. Convolutional sparse coding (CSC) is a popular and important prior model in the field of signal processing and machine learning [29]. CSC with the banded convolutional structure constrained forms has a solid theoretical foundation and an uniqueness of the solution. CSC has many applications in the field of image processing, such as inverse problems [21], image reconstruction [20,31,36,37], image denoising [28,30], image inpainting [16,20,37] and so on. Along with the application of deep learning, some advanced image reconstruction methods based on convolutional neural networks have achieved excellent performance in image denoising [10,33–35]. In state of the art studies, the improvement of neural network module makes the reconstruction model more interactive in industrial and commercial scenarios [17,18]. Iterative neural network [7] is a kind of unfold network based on the iterative algorithm. Iterative neural network combines a forward neural network and an iterative algorithm, leading to good generalization capability over an iterative algorithm. For example, an iterative threshold shrink algorithm (ISTA) is a simple algorithm for the sparse representation problem. The learning iterative threshold shrink algorithm (LISTA) [15] is an iterative neural network that can be trained through supervised learning. Moreover, a convolutional iterative threshold shrink algorithm is a simple algorithm for the CSC problem. The learning convolutional iterative threshold shrink algorithm, named learning convolutional sparse coding, is an corresponding iterative neural network [24]. Similarly, a deep coupled ISTA network is proposed for multi-modal image superresolution problem [9]. A deep convolutional sparse coding network is proposed for jpeg artifacts reduction [14]. A deep coupled convolutional sparse coding network is proposed for pan-sharpening CSC problem [32]. Inspired by multilayer neural network, a multilayer ISTA is proposed and a multilayer ISTA network is proposed for classification [26]. Recently, a supervised convolution sparse coding network (CSCNet) is proposed for image denoising problem [22]. CSCNet is based on the ISTA algorithm and LISTA network [15,24]. It is trained via stochastic gradient-descent using self-supervised form. Thus, naturally CSCNet can learn the convolutional dictionary over very large datasets. Although it first answers a problem why the CSC model denoise natural images poorly, there are some problems in CSCNet. The ISTA iterative algorithm tends to jump into a local minima, leading to CSCNet producing the worse results for the application in the noise measurements. In order to jump out of a local minimal of the ISTA algorithm, an iterative shrinkage-thresholding with dry friction algorithms (ISTDFA) is proposed. It is an improved proximal gradient algorithm with forward-backward splitting based on the a heavy ball system with dry friction [1]. The nature of dry friction can make the system reach a stable state in a finite time. Dry friction enters into the algorithm based on the proximal mapping. The proposed ISTDFA algorithm can effectively reduce the value of objective function of the ISTA algorithm. However, dry friction has a certain effect on the update of the convolution dictionary. The back propagation algorithms commonly used in deep learning do not take into account the effect of dry friction. The ISTDFA algorithm solves the



problem through the recently proposed variance regularization [12], which makes the update of the convolution dictionary independent of dry friction. Moreover, ISTDFA is extended to a learned iterative shrinkage-thresholding with dry friction algorithm (LISTDFA) based on an iterative neural network. The resulting iterative neural network, named CSCNet-DF, includes an encoder and a decoder. The encoder uses LISTDFA to produce the codes from the input images, and the decoder reconstructs the images. The mean square error between the input and output images is used to update the convolution dictionary by back-propagation. Finally, the experimental results show that the proposed CSCNet-DF network is superior to CSCNet.

2 Related Work

2.1 Convolutional Sparse Coding Model

This sparse coding model assumes that a signal y = DΓ is a linear combination of atoms, where D ∈ R^{N×M} is a dictionary and Γ ∈ R^M is a sparse vector. For a given signal y and dictionary D, the sparse representation of y is solved by the following optimization problem:

\min_{\Gamma} F(\Gamma) = \min_{\Gamma} \frac{1}{2}\|y - D\Gamma\|_2^2 + \lambda\|\Gamma\|_1.    (1)

The solution of this sparse coding problem (1) is unique and can be obtained by many classical algorithms, such as Orthogonal Matching Pursuit (OMP) [5] and Basis Pursuit (BP) [6]. The corresponding classical dictionary learning methods include MOD [11], trainlets [25], online dictionary learning [19], K-SVD [2], and so on. If D is a shift-invariant convolutional dictionary, problem (1) becomes the convolutional sparse coding problem. The solution is obtained by Fourier-based fast methods [4,21], but the Fourier-based signal representation loses signal localization.

Assume that a signal y can be expressed as y = DΓ = \sum_{i=1}^{N} P_i^T D_L \alpha_i. All shifted versions of the local dictionary D_L ∈ R^{n×m} compose the convolutional dictionary D, and all sparse vectors α_i compose the sparse vector Γ. The slice s_i, defined as s_i = D_L α_i, represents the i-th slice, and P_i^T represents the operator that puts D_L α_i in the i-th position of the signal. The slice-based convolutional sparse coding is proposed in [20], which can be reformulated as

\min_{\alpha_i} \frac{1}{2}\Big\|y - \sum_{i=1}^{n} P_i^T s_i\Big\|_2^2 + \lambda \sum_{i=1}^{n} \|\alpha_i\|_1.    (2)

As far as we know, K-SVD [2] is one of the methods for solving the CSC problem. Both SBDL [20] and LoBCoD [37] are effective algorithms for solving the CSC problem (2). All of them rely heavily on alternating optimization, which requires a large number of iterations.

2.2 Convolutional Sparse Coding Network

We believe that the CSC model is suitable for processing texture signals, leading to Gabor-like non-smooth filters. It is used in some problems, such as cartoon texture separation, image fusion or single image super-resolution to model the texture part of an image. However, the CSC model cannot cope with image denoising or other inverse problems involving noisy signals. In other words, the CSC model can only be used to model noise-free images. Simon et al. [23] argue that a convolutional dictionary with high coherence over-fits the noisy signal, so the CSC model is not suitable for the natural image denoising problem. This is because the filters in the CSC model cannot simultaneously meet the conditions of low global crosscorrelation and low auto-correlation. To solve this problem, a new idea has been proposed [23]. To suppress the dictionary coherence and obtain the optimal solution, a larger stride size in the convolutional dictionary should be chosen. Unlike the basic CSC model with q = 1, the filters can be partially overlap when the stride q is chosen in the range of 1 ≤ q ≤ n and q is large enough. In this case, the convolutional dictionary is guaranteed to be mutually consistent even for the smooth filters. A CSCNet network is proposed for the image denoising problem [22]. CSCNet is an iterate neural network that unfolds the ISAT algorithm [24], which is a supervised denoising model. CSCNet first replicates the input noisy images q 2 times to obtain the possible q 2 shifted versions. Then the sparse coefficients is computed by CSCNet. Finally, the denoised reconstructed image is obtained by calculating the average image of all the shifted reconstructed images. The convolutional dictionary and network parameters are updated by back-propagation algorithm based on the minimization of the mean square error between the clean and denoised images. Experimental results show that CSCNet not only obtains similar denoising performance to SOTA supervised methods, but also uses only fewer neural network parameters. 2.3

Variance Regularization

The goal of convolution sparse coding is to find the corresponding sparse vector Γ and convolution dictionary D on the premise of a given signal y. For any positive constant c, we can obtain the same reconstruction from the re-scaled convolutional dictionary D = cD and sparse vector Γ = Γ/c. If c > 1 holds, then we can get the same reconstruction from a smaller 1 norm of Γ than Γ. If there is no upper limit on the values of the convolution dictionary D, which means the values of the needles can be arbitrarily small, leading to the collapse of the 1 norm of the sparse vector. In order to avoid the collapse of 1 regularization of sparse vector, it is often necessary to restrict convolution dictionary D in optimization problems. In practice, the column of the convolution dictionary D is often limited to a constant 2 norm. However, this is a very challenging thing to expand to a network. In recent work, the variance principle [3] is presented, which uses a regularization term on the variance of the embeddings on each dimension individually to avoid the collapse of the sparse vector. Similar to this method, Yann et al.



[12] add a regularization term to the minimized objection function to neutralize the effect of D’s weights, which ensures the latent sparse vector components have greater variances than a fixed threshold over the sparse representations for given inputs. This strategy also plays a role in one iterative neural network because the variance regularization term is independent from the dictionary D’s architecture.

3 Proposed CSCNet-DF Network

3.1 Iterative Shrinkage-Thresholding with Dry Friction Algorithm

Throughout this section, H is a real Hilbert space with associated norm ‖·‖. The minimized objective function has the form f + g, where the non-convex smooth function f : H → R is an L-Lipschitz continuous function, and the convex non-smooth function g : H → R ∪ {+∞} is a proper lower semi-continuous function. For the non-smooth non-convex potential function f + g, we can obtain the corresponding heavy ball with dry friction system [1]

\ddot{x}(t) + \gamma \dot{x}(t) + \partial\phi(\dot{x}(t)) + \nabla f(x(t)) + \partial g(x(t)) \ni 0.    (3)

The system contains two different kinds of damping: (1) Viscous damping: γx (t) is viscous damping, and γ is a viscous damping coefficient satisfying γ > 0. (2) Dry friction damping: ∂φ (x (t)) represents dry friction. φ is the dry friction potential function, which has the following properties: φ is a convex and lower semi-continuous function, and φ has a sharp minimum at the origin, that is, minξ∈H φ (ξ) = φ (0) = 0. According to the properties of dry friction φ, we assume that φ(x) = r x1 , and r is the dry friction coefficient satisfying r > 0. From the definition of the function φ, the dry friction properties are satisfied with B (0, r) ⊂ ∂φ (0). The time discretization of Eq. (3) with a fixed time step size h > 0 is given as follows (xk+1 −xk )−(xk −xk−1 ) h2

+

γ (xk+1 −xk ) h

+ ∂φ



xk+1 −xk h



    + f xk + ∂g xk+1  0 .

(4) Equation (4) relates the classical dynamics method to the proximal gradient methods: the smooth function f is implicit in Eq. (4), corresponding to the gradient step of the proximal gradient method; similarly, non-smooth functions g and φ are explict in Eq. (4), corresponding to the proximal step of the proximal gradient method. For each k ∈ N, let’s introduce the auxiliary convex function defined by   (5) φk xk + hx := hφ (x) .



   k+1 k    k+1 k Then, we have ∂φk xk+1 = ∂φk xk + h x h−x = ∂φ x h−x . And then we can induce that from (4)  k+1  γ  k+1  1 k +h x + xk−1 − xk h2 x  − 2x     . (6) +∂φk xk+1 + f xk + ∂g xk+1 0 Since φ is continuous, we have ∂φk + ∂g = ∂ (φk + g), which implies     1+γh k+1 1 k−1 k + ∂ (g + φk ) xk+1 2+γh − f xk . h2 x h2 x − h2 x

(7)

We finally get the iterative shrinkage-thresholding with dry friction algorithm (ISTDFA),   k h2 k+1 k+1/2 2 x f x = prox h (g+φk ) x − , (8) 1+γh 1 + γh  k  1 x − xk−1 , and the proximal map is defined as folwhere xk+1/2 = xk + 1+γh lows:

1 2 proxηp (x) := arg min ηp (ξ) + x − ξ . (9) ξ∈H 2 2γ Theorem 1. Suppose k that the parameters h > 0, γ > 0 satisfy h < L . Then for the sequence x genereted by the algorithm (ISTDFA) we have:    k+1  (1) − xk  < +∞, and therefore limk xk = x∗ exists for the strong k x topology of H. (2) The vector x∗ satisfies 0 ∈ ∂φ (0) + f (x∗ ) + ∂g (x∗ ). (3) Suppose that H is finite dimensional, and suppose that − (f (x∗ ) + ∂g (x∗ )) ∈ int (∂ (0)). Then, the sequence xk is finitely convergent.
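To make the recursion in (8) concrete, the following is a minimal NumPy sketch of the ISTDFA outer loop; grad_f and prox are user-supplied callables (the dry-friction proximal map is derived later for the CSC model), and all names are illustrative rather than taken from the authors' code.

import numpy as np

def istdfa(x0, grad_f, prox, h, gamma, n_iter=100):
    # Iterates Eq. (8): x^{k+1} = prox_{lam*(g+phi_k)}( x^{k+1/2} - lam * grad_f(x^k) ),
    # with lam = h^2 / (1 + gamma*h) and
    # x^{k+1/2} = x^k + (x^k - x^{k-1}) / (1 + gamma*h).
    lam = h ** 2 / (1.0 + gamma * h)
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(n_iter):
        x_half = x + (x - x_prev) / (1.0 + gamma * h)
        x_new = prox(x_half - lam * grad_f(x), lam, x)  # anchor a = x^k for phi_k
        x_prev, x = x, x_new
    return x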

3.2 Application ISTDFA to CSC Model

Consider the novel minimization problem for CSC model via local processing and variance regularization as follows: N 2 I I  N  N   2    1   T  T − Var(α·i ) + , arg min Pi DL αl,i −yl  +λ αl,i 1 +β    2 {α l,i }N i=1 l=1 i=1 l=1 i=1 i=1 2 (10) where DL is the local convolutional dictionary which has n rows and m columns; αl,i is the sparse coding of each component i of each sample l; PTi which has N rows and n columns is the operator that puts DL αl,i in the i-th position and pads the rest of the entries with zero; yl is the nosiy signal; ·2 stands for vector of 2 norm or Frobenius norm of matrix; λ is the super parameter. When a noisy ˆ l,i , signal yl = xl + vl ∈ RN is at hand, seeking for its sparse representation α ˆ l,i . leads to an estimation of the original signal via xˆl = PTi DL α



In order to keep the variance of each potential code component remains above a preset threshold, a regularization term is added in (10). For the non-smooth convex optimization model, we define f and g as follows:  2 I N N    2   1   T  f (αl,i ) = Pi DL αl,i − yl  + β , (11) T − Var (α·i )    2 + i=1 i=1 l=1

2

g (αl,i ) = λ

I  N 

αl,i 1 .

(12)

l=1 i=1

The first item in (11) is the reconstruction term. The second item in (11) is over squared hinge terms involving the variance component α·i ∈ Rn I of each latent 2 1 across the batch where Var (α·i ) = I−1 (α − μ ) and μi is the mean across l,i i l=1 I 1 the i-th latent component, namely μi = I l=1 αl,i . The hinge terms are non-zero √ for any latent dimension whose variance is below the fixed threshold of T . For solving the CSC problem, we extended the LISTA to the learning ISTDFA. We proposed convolutional learning ISTDFA algorithm to approximate the convolutional sparse coding model, which is presented in Algorithm 1. The input of the proposed convolutional learning ISTDFA algorithm are the noise signal yl , the dictionary DL . Some parameters are firstly initialized. Going into the algorithm, we first compute the gradient of f and the local Lipschitz can be updated by the proximal mapping. constant respectively, then αk+1 l,i Next we can induce the gradient of f ,   ⎧ √

N

T ⎪ 2β T − Var(α ·b ) ⎪DTL Pb √ Pi DL αa,i −ya − I−1 (αa,b −μb ), Var(α·b ) < T ⎨ Var(α ) i=1

N ·b f= . T ⎪ T ⎪ ⎩ Pi DL αa,i − ya , DL Pb otherwise i=1

(13) Now, let us analyze the computation of proximal function. From the definition of φ and the relationship of φ and φk , we induce that φk (x) = r x − a1 , where a = xk . h2 , from the definition of proximal mapping, we can induce Setting λ = 1+γh that 1 2 (14) proxλ(φk +g) (x) = arg min x − u + λ u1 + λr u − a1 . 2 u By noticing that proxλ(φk +g) is a separable optimization problem, it can be reduced to the computation component-wise of a one dimensional optimization problem. For each a ∈ R , set 1 2 Ta (x) = arg min (u − x) + λr |u − a|1 + λ |u|1 . 2 u

(15)

Observe that Ta (x) = −T−a (−x). So, we just need to consider the case a ≥ 0. Using the discontinuous but differentiable of 1 regularization, we can get that λr∂ |u − a|1 + λ∂ |u|1 x − u.



(1) When u > a, ∂ |u − a|1 = 1, ∂ |u|1 = 1, then u = proxλ(φk +g) (x) = x − λ (1 + r) > a; (2) When a < u < 0, ∂ |u − a|1 = −1, ∂ |u|1 = 1, then a < u = proxλ(φk +g) (x) = x − λ (1 − r) < 0; (3) When u < 0, ∂ |u − a|1 = −1, ∂ |u|1 = −1, then u = proxλ(φk +g) (x) = x + λ (1 + r) < 0; (4) When u = a, ∂ |u − a|1 = {−1, 1}, ∂ |u|1 = 1, then u = proxλ(φk +g) (x) = x − λ (1 + r {−1, 1}) = a; (5) When u = 0, ∂ |u − a|1 = 1, ∂ |u|1 = {−1, 1}, then u = proxλ(φk +g) (x) = x − λ ({−1, 1} + r) = 0; Through the above analysis, we can get that the unique solution of Ta (x) is ⎧ x > λ (1 + r) + a ⎪ ⎪ x − λ (1 + r) ⎪ ⎪ a λ (1 − r) + a < x < λ (1 + r) + a ⎨ Ta (x) = x − λ (1 − r) λ (1 − r) < x < λ (1 − r) + a . (16) ⎪ ⎪ 0 −λ (1 + r) < x < λ (1 − r) ⎪ ⎪ ⎩ x + λ (1 + r) x < −λ (1 + r) This is a threshold operatorwith two critical  values a and 0. Consequently, for each i = 1, 2, ..., n, we have proxλ(φk +g) (x) is i



 proxλ(φk +g) (x) = i



Tai (xi ) ai ≥ 0 , −T−ai (−xi ) ai ≤ 0

(17)

2

h . with ai the i-th component of the vector a = xk and λ = 1+γh The iterative shrinkage-thresholding with dry friction algorithm for the CSC model is proposed in Algorithm 1.
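As an illustration of the threshold operator in (16) and the component-wise proximal map in (17), here is a hedged NumPy sketch assuming r > 0 and λ = h²/(1 + γh); the function and variable names are ours, not the authors'.

import numpy as np

def dry_friction_threshold(x, a, lam, r):
    # Scalar threshold operator T_a(x) of Eq. (16), written for a >= 0 and
    # extended to a < 0 via T_a(x) = -T_{-a}(-x) as in Eq. (17).
    if a < 0:
        return -dry_friction_threshold(-x, -a, lam, r)
    if x > lam * (1 + r) + a:
        return x - lam * (1 + r)
    if x > lam * (1 - r) + a:
        return a
    if x > lam * (1 - r):
        return x - lam * (1 - r)
    if x > -lam * (1 + r):
        return 0.0
    return x + lam * (1 + r)

def prox_g_phi(v, a, lam, r):
    # Component-wise proximal map of lam*(g + phi_k) at v, with anchor a = x^k.
    out = [dry_friction_threshold(vi, ai, lam, r)
           for vi, ai in zip(np.ravel(v), np.ravel(a))]
    return np.array(out).reshape(np.shape(v))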

Algorithm 1. Iterative shrinkage-thresholding with dry friction algorithm for CSC. Input: noised signal yl , local convolutional dictionary DL . Output: Estimated coding αkl,i . Initialization: α0l,i = α1l,i , γ > 0, h
Stop(m(1−ρi ))

(4)

where Stop(κ) denotes the κ-th largest value after sorting in descending order, and m is the total number of weights. When a weight wi does not satisfy the condition of the above formula, the corresponding mask ci is set to zero; the weight is removed and exists as a zero value. The value of wi is reset the next time the weight sensitivity S is calculated, so that the removed weight can be restored. Algorithm 1 shows the complete gold-panning steps.

3.4 Panning Based on Reinforcement Learning

According to our observation, an ideal sparse network can be obtained by setting the hyperparameters of Panning based only on the compression of the network. Naturally, if more network states are examined and dynamic expressive force transfer is achieved based on state changes, this can theoretically lead to a sparse network with better performance. Therefore, we design Panning as an environment containing action and state space, so that the intelligent body can automate Panning through policy learning. Figure 1 gives an overview of RLPanning, which is described in detail in this section. Panning Environment. Each Panning iteration results in a new sparse network, that is, different sparsity and performance. The number of iterations is taken as the time t, so the state space st of the Panning environment is: st = (L, ΔL, Ls , ΔLs , ρt , ρe , t)

(5)

where Ls , ΔLs are the loss and loss reduction of the sparse sub-network in the current state, respectively, ρe represents the effective compression rate of the network [34] (Invalid retention due to disconnection between layers is prone to



occur at high compression rates). Note that all states are normalized to facilitate agent training. The action space a of Panning consists of the hyperparameters pi that regulate the proportions of the metrics; assuming that there are κ metrics, the action space has dimension κ and the actions are continuous:

a ∈ [−1, 1],  p_i = (a_i + 1) / 2.    (6)

Our target is to affect L and ΔL as little as possible during pruning and to ensure that the effective compression rate is close to the target compression rate. So we design a reward term Rt consisting of four components:

R_t = −|N(L) − N(L_s)| − |N(ΔL) − N(ΔL_s)| − α|ρ_e − ρ_t| − r_done,
r_done = T − t if ρ_e = 1, and r_done = 0 otherwise.    (7)

where N (·) represents normalization. T is the maximum number of iterations. rdone is the reward in the final state, done = T rue when t = T or effective compression ratio ρe = 1, ρe = 1 means that all weights are removed, and the iteration will end early. α is the reward strength that adjusts the quality of network compression. Usually the ineffective compression weight is less than one percentage point, and we set α to 100. Panning Agent. The continuous action of Panning is a deterministic strategy, and we adopt TD3 [7], which performs well in the field of continuous control, as the Panning agent. TD3 is an actor-critic network architecture. As an optimized version of DDPG [20], the delay and smoothing processing of TD3 largely mitigates the effects of sudden changes in environmental states due to different batches of samples, which is well suited to our Panning environment. The workflow of the TD3 agent is shown on the left in Fig. 1. The actor network gives an action, and the critic network estimates the reward for the action. The critic network is equivalent to the Q network in DQN [26], and the goal is to solve the action a that maximizes the Q value. The actions and states of the Panning environment are all standardized, the structure and optimization of the TD3 agent basically do not need to be changed, and the network scale and all hyperparameters are shown in the experimental part. In addition, after resetting the environment, in order to better sample in the space, the final target compression ratio ρT is randomly selected between {80%,99.99%}. In addition, we have set a curriculum learning [30] from easy to complex, that is, the probability of the compression target ρT being a low compression ratio in the early stage of the agent training is high, and the probability of ρT > 99% in the later stage is higher, thereby speeding up the convergence speed of the agent training. 3.5

Selection of Metrics

The currently widely accepted single-shot pruning metrics are SNIP and GraSP, which are very sample-dependent and perform poorly at high compression ratios.



SynFlow is more prominent in iterative pruning and achieves a good balance between layers. The most significant advantage of SynFlow is that it does not require samples, but the lack of consideration of sample characteristics also makes it generalize poorly at low compression rates. Actually, SNIP, GraSP, and SynFlow are complementary and computationally very similar. Therefore, we choose these three metrics to apply to Panning, with action space κ = 3 and weight score Sw = p1 N (SSynF low ) + p2 N (SSN IP ) + p3 N (SGraSP ). Due to gradient decay in the deep network, when the SNIP pruning ratio is large, it is easy to remove too many subsequent convolutional layers and cause faults. The idea of GraSP is to maintain the gradient flow and prune the weights to maximize the loss drop, which improves compressibility. However, there is a significant error in the calculation of the first-order approximation g T g, especially in the ill-conditioned problem [9] that the first-order term of the loss function is smaller than the second-order term during training. Based on this, we believe that maximizing the gradient norm is not equivalent to maintaining the gradient flow, and it is more accurate to remove weights that have little T influence on the gradient flow. Therefore, we changed SGraSP to | ∂g∂θ g  θ|, that is, to examine the impact of weights on model trainability. If it is analyzed from the perspective of acting force, after taking the absolute value, it means that only the magnitude of the force is considered, and the direction of action of the force is ignored. Moreover, the experiment proves that the modified SGraSP has better performance. In addition, in the classification task, when R is the fitting loss L of the network, the gradient information brought by different samples is more abundant. Therefore, when calculating the loss L, we set a specific sampling to ensure that Db contains all categories and an equal number of samples for each category. Suppose the dataset has l categories, k samples are taken from each category,   then Db = xb , y b , xb = x11,2,...,k , x21,2,...,k , . . . , xl1,2,...,k .
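As a small illustration of the combined weight score Sw = p1 N(S_SynFlow) + p2 N(S_SNIP) + p3 N(S_GraSP) described above, here is a hedged PyTorch sketch; the min–max form of the normalization N(·) is our assumption, since the text does not spell it out, and the function names are ours.

import torch

def normalize(s):
    # Scale a per-weight score tensor to [0, 1] so different metrics are comparable.
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def panning_score(s_synflow, s_snip, s_grasp, p):
    # Combined weight score S_w with mixing proportions p = (p1, p2, p3).
    return (p[0] * normalize(s_synflow)
            + p[1] * normalize(s_snip)
            + p[2] * normalize(s_grasp))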

4 Experiments

4.1 Experimental Setup

In the image classification task, we prune the LeNet5, VGG19, and ResNet18 convolutional networks with Panning before training and select the MNIST, FashionMNIST, CIFAR10/100, and TinyImageNet datasets to evaluate our method. In this experiment, images are augmented by random flipping and cropping. Details and hyperparameters of sparse network training after pruning are shown in Table 1.

Performance of the Improved Metric

The performance comparison between the improved SNIP and GraSP metrics and the original method on VGG19/CIFAR10 is shown in Table 2. The results were obtained by taking the average of three experiments.

Neural Network Panning

611

Table 1. Model optimizer and hyperparameters. MNIST CIFAR ImageNet LeNet VGG/ResNet VGG ResNet Optimizer

Momentum (0.9)

Learning rate

Cosine annealing (0.1)

Training epochs 80

180

200

300

Batch size

256

128

128

128

Weight decay

1e-4

5e-4

5e-4

1e-4

Table 2. Performance comparison of improved SNIP and GraSP. VGG19/CIFAR10 Acc: 94.20% Pruning ratio

85%

90%

95%

98%

99%

SNIP Ours SNIP

93.91 93.82 93.72 91.16 10.00 94.05 93.98 93.86 91.83 10.00

GraSP Ours GraSP

93.59 93.50 92.90 92.39 91.04 93.77 93.69 93.48 92.78 91.84

Note that the loss is computed using a batch of samples containing all labels, although the overall improved SNIP has only a slight improvement in accuracy, which is essential on datasets with more complex labels. Under the 99% pruning rate, the test set accuracy is still 10%, which is caused by the fault of SNIP itself. For the improvement of GraSP, the improvement of accuracy is more prominent, especially after the compression rate is higher than 90%. This suggests that sensitivity to the gradient norm is better than maximizing the gradient norm, validating our previous analysis. 4.3

Experimental Results of Artificial Panning

In Sect. 3.5, we introduce the metrics selected by Panning. In the process of Panning, when the compression rate is low, the balance between layers is better, and the weight sensitive to loss should have the highest priority. When the compression rate is high, the number of weights is minimal, and it is easy to have connectivity matters [34], so the sensitivity between layers is more important. Combined with the influence of gradient norm on trainability, the hyperparameter settings under different compression stages in the Panning process are shown in Table 3. We compare the compression ratio of Panning, SynFlow, SNIP iteration, and GraSP iteration between {10−1 , 10−4 }. In order to highlight the effectiveness of Panning, the SNIP and GraSP iterations also use the FORCE [14] method to prune the network dynamically, and the number T of iterative pruning is unified as 100 times. Taking the L2 regularized complete network as the baseline, Fig. 3

612

X. Kang et al.

Fig. 3. Performance comparison of Panning and iterative pruning. The shaded area indicates the error of repeated experiments. The gray horizontal line is the baseline.

Fig. 4. Action and reward changes generated by RLPanning during pruning. The left axis is the output action a of the agent. The right axis is the reward r of environmental feedback. The bottom graph shows the change in the remaining ratio over iterations.

shows the experimental results on VGG19/CIFAR10, ResNet18/CIFAR100, and LeNet5/MNIST. Note that the curve corresponding to Panning in the figure is always above the other methods, whether with a high or low compression ratio. The MNIST dataset is relatively simple, and the differences between the various methods are minor. The advantage of Panning on VGG19/CIFAR10 is mainly manifested in the low compression rate. On ResNet18/CIFAR100, the generalization ability through Panning is also maintained well at extremely high compression rates. In addition, the hyperparameters that adjust the proportion of metrics in Panning are set based on our estimates. In fact, if the parameters in Table 3 are further tuned, Panning will achieve better performance, but our estimated settings have demonstrated the effectiveness of Panning for multiple screening.

Neural Network Panning

613

Table 3. Hyperparameter setting of artificial Panning. Pruning ratio p1 (SynFlow) p2 (SNIP) p3 (GraSP) (0, 0.8]

0.2

0.5

0.3

(0.8, 0.9]

0.2

0.4

0.4

(0.9, 0.98]

0.2

0.3

0.5

(0.98, 0.99]

0.4

0.2

0.4

(0.99, 1)

0.5

0

0.5

Table 4. TD3 training details and hyperparameters. Training

Adam Exploration noise

0.1

Learning rate

3e-4

0.99

Discount factor

Start timesteps 2e3

Network update rate 0.01

Max timesteps 2e5

Policy noise

Batch size

4.4

TD3

Optimizer

0.2

256

Experimental Results of RLPanning

The actor and critic networks of the TD3 agent in RLPanning are both threelayer linear networks, using the Tanh activation function, and the dimension of the hidden layer is 256. The training details and key hyperparameters are shown in Table 4. When training the agent, we only use LeNet5/MNIST as the network and dataset for the Panning environment and do not need other network structures and datasets. The resulting Panning agent can be adapted to other convolutional network pruning tasks. We apply the trained Panning agent to prune LeNet5 and VGG19. The feedback of the Panning environment and the actions of the Panning agent are shown in Fig. 4. Where {r1 , r2 , r3 } correspond to the first three unsigned items in Eq. (7), respectively. The reward r comes from the state of the sparse network, and the r2 change is more drastic than other reward items due to different data batches. It is not difficult to find from the figure that the action a2 plays a leading role in the early stage of the iteration. As the remaining weights decrease, the magnitudes of actions a1 , a3 begin to increase. By the late iteration, the reward r gradually stabilizes, with comparable action magnitude in LeNet5/MNIST and even much higher action a1 in VGG19/CIFAR10, due to the deeper network layers of VGG and the more significant challenge of inter-layer equalization. Note that the actions produced by the agent are roughly consistent with our preset changes, and the same pattern occurs on untrained networks and datasets. The performance of RLPanning pruned LeNet5 and VGG19 is shown in Table 5. We take the average of three experiments to get the results. It is observed that there is less room for improvement at low compression ratios, and RLPan-

614

X. Kang et al. Table 5. Performance comparison of RLPanning and Panning. LeNet5/MNIST

Acc: 99.40%

Pruning ratio

90%

Panning RLPanning

99.37 99.21 99.11 99.09 95.51 40.85 99.40 99.18 99.23 99.15 97.89 45.43

95%

98%

99%

99.9% 99.99%

LeNet5/FashionMNIST Acc: 91.98% Panning RLPanning

90.67 90.14 88.74 86.92 68.97 26.67 90.84 90.53 89.35 87.41 69.02 30.26

VGG19/CIFAR10

Acc: 94.20%

Panning RLPanning

94.02 93.66 92.81 91.75 83.87 62.32 94.09 93.80 92.98 92.12 84.52 64.51

Table 6. Performance comparison of pruned VGG19 and ResNet18 on Tiny-ImageNet.

Network

VGG19: 63.29%

ResNet18: 63.92%

Pruning ratio 90%

95%

98%

90%

95%

98%

SNIP

59.32

49.04

60.42

58.56

50.66

61.15

GraSP

60.26

59.53

56.54

60.18

58.84

55.79

SynFlow

59.54

58.06

45.29

59.03

56.77

46.34

Panning

61.67

60.25 58.33 60.74

RLPanning

61.84 59.83

58.17

58.92 56.48

61.07 58.73

55.82

ning is very close to Panning in terms of test accuracy. However, RLPanning still shows the advantage of dynamic pruning according to the environment at a high compression rate. Table 6 shows the performance comparison on TinyImageNet. It is observed that Panning has very good performance on large datasets, or rather more obvious improvement in the underfitting state. Note that RLPanning is analogous to Panning on TinyImageNet and significantly outperforms other methods. RLPanning is a little more prominent at low compression ratios. Note that the TD3 model we trained based on LeNet5/MNIST, and it is impossible to accurately capture features for large datasets. But even so, the experimental data verifies the effectiveness of the agent trained on the small model can still be applied to the large model and large dataset. This means that agents trained with reinforcement learning can suppress the interference caused by changes in dataset size and network scale, and can more profoundly reflect the inherent nature of weight screening.

5

Conclusion

In the experiments of this paper, we generalize the pruning process as the process of expressive force transfer, and analyze and improve the existing weight metrics

Neural Network Panning

615

before training. We then propose a pruning before training approach for Panning and automate the Panning process through reinforcement learning. Our experimental results show that Panning further improves the performance of pruned neural networks before training. The expressive force transfer is a fascinating phenomenon, and in the follow-up work, we will consider the higher-order terms of the Taylor approximation to design a new metric. In addition, we will make further explorations in structured sparsity and constraint-guided sparsity based on expressive force transfer. Acknowledgements. This research was partially supported by Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transp. Thanks to Google Cloud and Huawei Cloud for cloud computing services.


Network Pruning via Feature Shift Minimization

Yuanzhi Duan1, Yue Zhou1, Peng He1, Qiang Liu2, Shukai Duan1, and Xiaofang Hu1(B)

1 College of Artificial Intelligence, Southwest University, Chongqing, China {swuyzhid,hepeng5}@email.swu.edu.cn, {duansk,huxf}@swu.edu.cn
2 Harbin Institute of Technology, Harbin, China [email protected]

Abstract. Channel pruning is widely used to reduce the complexity of deep network models. Recent pruning methods usually identify which parts of the network to discard by proposing a channel importance criterion. However, recent studies have shown that these criteria do not work well in all conditions. In this paper, we propose a novel Feature Shift Minimization (FSM) method to compress CNN models, which evaluates the feature shift by converging the information of both features and filters. Specifically, we first investigate the compression efficiency with some prevalent methods in different layer-depths and then propose the feature shift concept. Then, we introduce an approximation method to estimate the magnitude of the feature shift, since it is difficult to compute it directly. Besides, we present a distribution-optimization algorithm to compensate for the accuracy loss and improve the network compression efficiency. The proposed method yields state-of-the-art performance on various benchmark networks and datasets, verified by extensive experiments. Our codes are available at: https://github.com/lscgx/FSM.

1 Introduction

The rapid development of convolutional neural networks (CNN) has achieved remarkable success in a wide range of computer vision applications, such as image classification [11,18,24], video analysis [5,21,30], object detection [6,8,39], semantic segmentation [1,2,41], etc. The combination of CNN models and IoT devices yields significant economic and social benefits in the real world. However, better performance for a CNN model usually means higher computational complexity and a greater number of parameters, limiting its application in resource-constrained devices. Therefore, model compression techniques are required. To this end, compressing an existing network is a popular strategy, including tensor decomposition [38], parameter quantification [32], weight pruning [9,25], structural pruning [28], etc. Another strategy is to create a new small network directly, including knowledge distillation [16] and compact network design [17]. Among these compression techniques, structural pruning shows significant performance and is usable for various network architectures. In


Fig. 1. Diagram of the proposed Feature Shift Minimization method (FSM).

this paper, we present a channel pruning method, belonging to structural pruning, for model compression. In the process of channel pruning, some channels that are considered unimportant or redundant are discarded and get a sub-network of the original network. The core of channel pruning lies in the design of filter or feature selection criteria. One idea is to directly use the inherent properties of filters or features, such as statistical properties, matrix properties, etc. For example, Li et al. [14,25] sorting the filters by l1 norm of filters, and Lin et al. [28] prune channels by calculating the rank of features. Another idea is to evaluate the impact of removing a channel on the next layer or the accuracy loss [34]. These criteria, which are based on different properties, succeeded in model compression, but the performance faded in some conditions. Because it is difficult for a single criterion to take into account all factors such as feature size or feature dimension at the same time. So, many pruning criteria are suboptimal because they only work well locally, not globally. As shown in Fig. 2, in some cases, random pruning can even outperform well-designed methods. To compensate for the limitations of a single criterion, some multi-criteria approaches have been proposed recently. For example, in [27], a collaborative compression method was proposed for channel pruning, which uses the information of compression sensitivity for each layer. Similarly, in [46], the structural redundancy information of each layer is used to guide pruning. Although these methods take into account the influence of other factors on pruning efficiency, they make no improvements to the criteria and require additional computational consumption. In this paper, we propose a new channel pruning method by minimizing feature shift. We define feature shift as the changes in the distribution of features in a layer due to pruning. As shown in Fig. 1, the feature details changed or disappeared due to pruning, but when the distribution of the features is properly adjusted, the disappeared feature details emerge again. This experiment was done on ImageNet, using the ResNet-50 model. This suggests that the loss of feature detail, caused by pruning some channels, may not have really disap-


peared, which is related to the feature activation state. Even those channels that are considered unimportant may cause changes in the distribution of features, which in turn lead to under-activation or over-activation of features. Therefore, it is reasonable to minimize the feature shift for model compression. Moreover, we mathematically demonstrate that feature shift is an important factor for channel pruning and analyze the effect of the activation functions, in Sect. 3.1. Computing feature shift directly is not an easy task, as it requires traversing the entire training dataset. To address this issue, we propose an evaluation method to measure the feature shift, which combines information from both filters and features. Based on this, we compress models by evaluating the feature shift when each channel is removed, which performs well in different feature dimensions. In particular, the proposed pruning method does not require sampling the features and uses only the parameters of the pre-trained models. In addition, to prove that the feature shift occurs in the pruning process, we design a distribution-optimization algorithm to adjust the pruned feature distribution, which significantly recovers the accuracy loss, as will be discussed in Sect. 3.4. To summarize, our main contributions are as follows: 1) We investigate the relationship between compression rate, layer depth, and model accuracy with different pruning methods. It reveals that the feature shift is an important factor affecting pruning efficiency. 2) We propose a novel channel pruning method, Feature Shift Minimization (FSM), which combines information from both features and filters. Moreover, a distribution-optimization algorithm is designed to accelerate network compression. 3) Extensive experiments on CIFAR-10 and ImageNet, using VGGNet, MobileNet, GoogLeNet, and ResNet, demonstrate the effectiveness of the proposed FSM.

2 Related Work

Channel Pruning. Channel pruning has been successfully applied to the compression of various network architectures, which lies in the selection of filters or features. Some previous methods use the intrinsic properties of filters to compress models. For example, [14,25] prune filters according to their li norms. In [28], the rank of the features is used to measure the information richness of the features and retain the features with high information content. Some other approaches go beyond the limits of a single layer. For example, Chin et al. [3] removes filters by calculating the global ranking. Liu et al. [33] analyzes the relationship between the filters and BN layers and designs a scale factor to represent the importance. In addition, computing the statistical information of the next layer and minimizing the feature reconstruction error is also a popular idea [15,34]. Different from methods that are based on only a single filter or feature, [43,46,48] remove redundant parts of a model by measuring the difference between the filters or features.


Fig. 2. Analysis of the relationship between compression rate, layer depth, accuracy, and methods. Names such as “ResNet-50-1” mean experiment on the 1st layer of ResNet-50. For each subfigure, the horizontal axis is the compression rate, and the vertical axis is the accuracy.

Effect of Layer-Depth. For different layers, the feature size, dimensions, and redundancy are different. Recent works have found that different layers of a model have different sensitivities to the compression rate. Wang et al. [46] proposes that the structural redundancy of each layer are different, which plays a key role in the pruning process. In layers with high structural redundancy, more filters can be safely discarded with little loss of accuracy. Li et al. [27] represents that the compression sensitivity of each layer is distinct, and it is used to guide pruning. In addition, He et al. [12] propose that a pruning criterion does not always work well in all layers, and they achieve better performance by using different pruning criteria on different layer-depths. Earlier methods [13,19,51] required iterative tuning of the compression ratio to achieve better model accuracy. It is tedious and time-consuming. Considering the difference between different layers can greatly improve the pruning efficiency. We investigate the trend of accuracy when adopting different compression rates in different layer-depths. Two methods (L1 and HRank) are investigated, which sort filters and features, respectively. Based on this, we propose a new perspective to explore the redundancy of neural networks.

3 Methodology

In this section, we analyze the process of filter pruning and point out that the distribution of features is changed after pruning. Then we propose a novel pruning criterion that minimizes the feature shift. Unlike some heuristic pruning criteria,


there is much theoretical analysis to support the proposed pruning algorithm. Moreover, a distribution optimization algorithm is presented to restore the model accuracy.

3.1 Filter Pruning Analysis

Filter pruning aims to remove relatively unimportant filters, where the importance of a filter or feature is evaluated by a specific criterion. However, we found that these criteria do not work well in all layers. We conduct experiments on CIFAR-10 with the l1-norm and HRank, which sort filters and features, respectively. In addition, we also test the performance of random pruning in different layers. As shown in Fig. 2, two phenomena are obvious: a) As the compression rate increases, the drop in accuracy becomes more obvious. When the compression rate exceeds a critical value, the accuracy declines dramatically. b) For high-dimensional features, sorting filters or feature maps by these criteria is not substantially better than random pruning. These phenomena reveal that: a) other factors exist that damage the accuracy, and their impact grows with the compression rate; b) for low-dimensional features, pruning criteria are effective, whereas for high-dimensional features random pruning also performs well. This indicates that the gap between the importance scores of filters/features becomes smaller as the dimensionality increases.

To clarify why these phenomena occur, we mathematically analyze the changes that occur in one feature map before and after pruning. A typical inference process in one layer is Input-Conv-BN [20]-ReLU [7]-Output. Many benchmark network architectures, such as ResNet, DenseNet, and GoogLeNet, are based on it. In particular, we let $X_i = (x_i^{(1)} \cdots x_i^{(d_i)})$ represent the input to the $i$-th layer of a CNN model, where $d_i$ stands for the number of dimensions. First, each dimension is normalized by the BN layer:

$$\hat{x}_i^{(k)} = \frac{x_i^{(k)} - E[x_i^{(k)}]}{\sqrt{Var[x_i^{(k)}]}}, \qquad y_i^{(k)} = \gamma_i^{(k)} \hat{x}_i^{(k)} + \beta_i^{(k)}, \qquad (1)$$

where $E$ is the expectation, $Var$ is the variance, and $k$ stands for the $k$-th dimension. We let $Y_i = (y_i^{(1)} \cdots y_i^{(d_i)})$ denote the output of the BN layer. The output of the $k$-th dimension has a mean of $\beta_i^{(k)}$ and a standard deviation of $\gamma_i^{(k)}$, which can be obtained directly from the pre-trained model. Then, $Y_i$ is passed to the ReLU layer, which thresholds each value in $Y_i$. The output can be formulated as:

$$\tilde{y}_i^{(k)} = ReLU(y_i^{(k)}) = \max(0, y_i^{(k)}). \qquad (2)$$

Obviously, the distribution of $y_i^{(k)}$ changes after passing through the ReLU layer. The output expectation is

$$E[\tilde{y}_i^{(k)}] = E[\max(0, y_i^{(k)})], \qquad (3)$$

and the input of the $(i+1)$-th layer is

$$x_{i+1}^{(k)} = (\tilde{y}_i^{(1)} \cdots \tilde{y}_i^{(d_i)}) \cdot w_{i+1}^{(k)} = (\tilde{y}_i^{(1)} \cdots \tilde{y}_i^{(d_i')}) \cdot w_{i+1}^{(k)}, \quad \text{s.t. } d_i' \leq d_i, \qquad (4)$$

where $w$ represents the weights and $d_i'$ stands for the dimension of the $i$-th layer after pruning. As can be seen, the distribution of the input $x$ of one channel will shift when some channels in the previous layer are pruned. We denote the shifted expectation of $x_i^{(k)}$ as $\hat{E}[x_i^{(k)}]$ and the shifted variance as $\widehat{Var}[x_i^{(k)}]$. Further, we let $\Delta_E^{(k)} = E[x_i^{(k)}] - \hat{E}[x_i^{(k)}]$. So we can reformulate Eq. (1) as:

$$y_i^{(k)} = \gamma_i^{(k)} \hat{x}_i^{(k)} + \beta_i^{(k)} = \gamma_i^{(k)} \frac{x_i^{(k)} - (\hat{E}[x_i^{(k)}] + \Delta_E^{(k)})}{\sqrt{Var[x_i^{(k)}]}} + \beta_i^{(k)} = \gamma_i^{(k)} \sqrt{\frac{\widehat{Var}[x_i^{(k)}]}{Var[x_i^{(k)}]}} \cdot \frac{x_i^{(k)} - \hat{E}[x_i^{(k)}]}{\sqrt{\widehat{Var}[x_i^{(k)}]}} + \beta_i^{(k)} - \gamma_i^{(k)} \frac{\Delta_E^{(k)}}{\sqrt{Var[x_i^{(k)}]}}. \qquad (5)$$

The mean of $y_i^{(k)}$ is

$$\beta_i^{(k)} - \gamma_i^{(k)} \frac{\Delta_E^{(k)}}{\sqrt{Var[x_i^{(k)}]}}, \qquad (6)$$

and the standard deviation is shifted to

$$\gamma_i^{(k)} \sqrt{\frac{\widehat{Var}[x_i^{(k)}]}{Var[x_i^{(k)}]}}. \qquad (7)$$

As a result, the expectation $E[\tilde{y}_i^{(k)}]$ is shifted after pruning, which directly affects the input to the next layer. At the same time, due to the ReLU activation function, some values that were not activated before become activated, or previously activated values are deactivated. We refer to the change in the distributions after the ReLU layer during pruning as feature shift. Predictably, the magnitude of the feature shift may be greater at high compression rates than at low rates, and the drop in accuracy may also be greater. To prove it, we consider the feature shift of each channel as the selection criterion in pruning. We prune channels by their expectation of the feature shift; the channels with the least shift are discarded first. As shown in Fig. 2, many experiments show that the decreasing trend of the accuracy of the proposed method is less pronounced than that of other prevalent methods. Our method can maintain higher accuracy compared to others at the same compression rate (e.g., 11.3%/L1 vs. 8.79%/HRank vs. 66.52%/FSM with a ratio of 40% on ResNet-50-11). It reveals that the feature shift is one of the critical factors for performance degradation. In particular, when we prune the model in reverse order, the accuracy decreases catastrophically, as shown in Table 1. It demonstrates that the greater the magnitude of the feature shift, the greater the loss of accuracy.

Table 1. Experiments on ResNet-50-23. FSM-R means pruning in reverse order.

Rate | 20% | 30% | 40%
FSM (%) | 70.42 | 56.50 | 32.20
FSM-R (%) | 32.89 | 8.128 | 1.62


The decrease in accuracy has a positive correlation with the feature shift. So, it is reasonable to use the feature shift to guide pruning. Features that exhibit less feature shift are relatively unimportant.

3.2 Evaluating the Feature Shift

As demonstrated in Eq. (7), we need to calculate $\hat{E}[x_i^{(k)}]$, which stands for the shifted expectation of the $k$-th dimension in the $i$-th layer. It is difficult and time-consuming to calculate it directly over the entire training dataset. We present an approximation method to estimate the expectation of features and prove its feasibility in Sect. 5.2. Specifically, we first expand $\hat{E}[x_i^{(k)}]$ as:

$$\hat{E}[x_i^{(k)}] = E[(\tilde{y}_{i-1}^{(1)} \cdots \tilde{y}_{i-1}^{(d_{i-1}')}) \cdot w_i^{(k)}] \qquad (8)$$
$$= E[\tilde{y}_{i-1}^{(1)}] * w_i^{(k,1)} + \cdots + E[\tilde{y}_{i-1}^{(d_{i-1}')}] * w_i^{(k,d_{i-1}')}, \qquad (9)$$

where $d_{i-1}'$ stands for the dimension of the $(i-1)$-th layer after pruning, and $d_{i-1}' \leq d_{i-1}$. Moreover, according to Eq. (1), we get

$$E[\tilde{y}_{i-1}^{(k)}] = E[\max(0, y_{i-1}^{(k)})] \approx \int_0^{\infty} \frac{x}{\gamma_{i-1}^{(k)}\sqrt{2\pi}}\, e^{-\frac{(x-\beta_{i-1}^{(k)})^2}{2(\gamma_{i-1}^{(k)})^2}}\, dx. \qquad (10)$$

Without computing the expectation over the entire training dataset, the approximation of $\hat{E}[x_i^{(k)}]$ can be obtained by combining Eq. (8) and Eq. (10). Then, according to Eq. (7), we obtain the shifted expectation and standard deviation of the features.
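Eq. (10) is the mean of a rectified Gaussian, so it can also be evaluated in closed form rather than by numerical integration. The following is a minimal Python sketch of this evaluation (ours, not the authors' implementation); the helper name relu_mean and the example values are assumptions.

    import math

    def relu_mean(beta, gamma):
        # Approximate E[max(0, y)] for y ~ N(beta, gamma^2), cf. Eq. (10).
        # beta and gamma play the roles of the BN shift and scale of the previous layer.
        g = abs(gamma)
        z = beta / (g * math.sqrt(2.0))
        # closed form of the rectified-Gaussian mean: beta*Phi(beta/g) + g*phi(beta/g)
        return 0.5 * beta * (1.0 + math.erf(z)) + g / math.sqrt(2.0 * math.pi) * math.exp(-z * z)

    # example: a channel with BN parameters beta = -0.2, gamma = 0.5
    print(relu_mean(-0.2, 0.5))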

3.3 Filter Selection Strategy

The pruning strategy of a layer can be regarded as minimizing the sum of the feature shift over all channels of the pruned model. For a CNN model with $N$ layers, the optimal pruning strategy can be expressed as an optimization problem:

$$\arg\min \sum_{i=1}^{N} \sum_{k=1}^{d_i'} \left| \Delta_{E_i}^{(k)} \right| = \arg\min \sum_{i=1}^{N} \sum_{k=1}^{d_i'} \left| E[x_i^{(k)}] - \hat{E}[x_i^{(k)}] \right|, \qquad (11)$$

where $d_i'$ is the dimension of each layer after pruning. During the pruning of a layer, we evaluate the effect of each feature on the feature shift of the next layer as follows:

$$\delta(x_i^{(k)}) = \sum_{j=1}^{d_{i+1}} \left| w_{i+1}^{(j,k)} * E[\tilde{y}_i^{(k)}] \right|, \qquad (12)$$


where $w_{i+1}^{(j,k)}$ denotes the $k$-th vector of the $(i+1)$-th layer's $j$-th dimension. $\delta(x_i^{(k)})$ represents the sum of the feature shift that one output feature induces over all channels in the $(i+1)$-th layer. Then, we sort all features according to the values $\delta(\cdot)$. Features with low values are considered unimportant and are discarded first.
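A minimal PyTorch sketch of this scoring is given below. It is our reading of Eq. (12), under the assumption that the kernel weights of layer i+1 are summed before taking the magnitude (the reduction over the kernel is not spelled out in the text); the function name and tensor shapes are ours.

    import torch

    def fsm_scores(expected_act, next_weight):
        # expected_act: (C_in,) tensor of E[y_i^(k)], e.g. from the BN parameters via Eq. (10)
        # next_weight:  conv weight of layer i+1 with shape (C_out, C_in, kH, kW)
        # returns delta of shape (C_in,); channels with small delta are pruned first
        w = next_weight.flatten(2).sum(dim=2)                     # (C_out, C_in): sum over the kernel
        delta = (w * expected_act.unsqueeze(0)).abs().sum(dim=0)  # sum_j |w^(j,k) * E[y^(k)]|
        return delta

    # toy usage with random tensors: keep the 48 highest-scoring of 64 channels
    scores = fsm_scores(torch.rand(64), torch.randn(128, 64, 3, 3))
    keep = scores.argsort(descending=True)[:48]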

3.4 Distribution Optimization

For a pruned model, we present a simple distribution optimization method to recover accuracy. As presented in Eq. (7), the distribution of $y_i^{(k)}$ is changed due to the shift of $E[x_i^{(k)}]$ and $Var[x_i^{(k)}]$ after pruning. Hence, we can adjust them towards $\hat{E}[x_i^{(k)}]$ and $\widehat{Var}[x_i^{(k)}]$ to reduce the impact of the shift.

In Eq. (8), we presented an evaluation method for $\hat{E}[x_i^{(k)}]$. Let $\lambda$ represent the evaluation error, $\lambda = \hat{E}[x_i^{(k)}] / E[x_i^{(k)}]$, where $\lambda$ is computed on the unpruned model. After pruning, we set

$$E[x_i^{(k)}] = \hat{E}[x_i^{(k)}] / \lambda. \qquad (13)$$

The variance $Var[x_i^{(k)}]$ is difficult to evaluate, or its evaluation error is large. We experimentally show that setting

$$Var[x_i^{(k)}] = \frac{d_i'}{d_i} \times Var[x_i^{(k)}] \qquad (14)$$

performs well in most layers, especially in the deeper layers. One possible explanation is that the differences between high-dimensional features are not obvious, and their statistical properties are approximately equal. Random pruning in deep layers produces good results, as shown in Fig. 2, supporting this explanation. Furthermore, as shown in Fig. 3, extensive experiments on different architectures reveal that the proposed distribution optimization strategy is effective and prove that feature shift occurs during network pruning. The algorithm is plug-and-play and may also be combined with other pruning methods.
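As a rough illustration of Eqs. (13) and (14), the sketch below assumes that the adjusted statistics are written into the running statistics of the BatchNorm layer that follows the pruned layer; the function and argument names are ours, and e_hat denotes the estimate obtained from Eq. (8).

    import torch

    @torch.no_grad()
    def adjust_bn_stats(bn, e_hat, lam, d_pruned, d_orig):
        # bn: the BatchNorm layer following the pruned layer (assumption of this sketch)
        # e_hat: per-channel estimate of the shifted mean, lam: evaluation error from the unpruned model
        bn.running_mean.copy_(e_hat / lam)            # Eq. (13): E[x] <- E_hat[x] / lambda
        bn.running_var.mul_(d_pruned / d_orig)        # Eq. (14): Var[x] <- (d_i'/d_i) * Var[x]

    # toy usage: a BN layer whose input lost 16 of 64 channels
    bn = torch.nn.BatchNorm2d(64)
    adjust_bn_stats(bn, e_hat=torch.zeros(64), lam=torch.ones(64), d_pruned=48, d_orig=64)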

3.5 Pruning Procedure

The pruning procedure is summarized as follows:

1) For each channel $x_i^{(k)}$ of one layer, we first calculate its output expectation $E[\tilde{y}_i^{(k)}]$.
2) Then, we calculate the feature shift $\delta(x_i^{(k)})$ caused by channel $x_i^{(k)}$, according to Eq. (12).
3) Sort all channels by $\delta(\cdot)$ and discard those channels with small values.
4) After pruning, use Eq. (13) and Eq. (14) to recover accuracy.
5) Fine-tune the model for one epoch, and go back to the first step to prune the next layer.

After all layers have been pruned, we train the pruned model for some epochs.


Fig. 3. Comparison of the accuracy before and after using distribution optimization on VGGNet. Similarly, the horizontal axis represents the compression rate, and the vertical axis is the accuracy. Names such as “HRank-1” mean to adopt the HRank method on the 1st layer of VGGNet. The results suggest that the feature shift is a critical factor that damages model accuracy.

4 Experiments

In this section, in order to verify the effectiveness of the proposed FSM, we conduct extensive experiments on CIFAR-10 [23] and ImageNet [24]. Prevalent models, such as ResNet [11], GoogLeNet [44], MobileNet-V2 [40], and VGGNet [42], are adopted to test the performance. All experiments are run on PyTorch 1.8 [36] with an Intel i7-8700K CPU @ 3.70 GHz and an NVIDIA GTX 1080Ti GPU.

4.1 Implementation Details

For all models, the pruning process is performed in two steps:1) pruning the filters layer by layer and fine-tune 1 epoch for each layer. 2) After pruning, we train the model for some epochs using the stochastic gradient descent algorithm (SGD) with momentum 0.9. For VGGNet and GoogLeNet on CIFAR-10, we train the model for 200 epochs, in which the initial learning rate, batch size, and weight decay are set to 0.01, 128, and 0, respectively. The learning rate is divided by 10 at epochs 100 and 150. For ResNet-56 on CIFAR-10, we train 300 epochs and set the weight decay to 0.0005. The learning rate is divided by 10 at epochs 150 and 225. For ResNet-50, we train the model for 120 epochs with an initial learning rate of 0.01, and the learning rate is divided by 10 at epochs 30, 60, and 90. The batch size is set to 64. For MobileNet-V2, we fine-tune the pruned model for 150 epochs, and the learning rate is decayed by the cosine annealing scheduler with an initial learning rate of 0.01. The weight decay is set to 4 × 10−5 , following the original MobileNet-V2 paper setup. We train all the models three times and report the mean.
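For concreteness, the following sketch mirrors the CIFAR-10 fine-tuning schedule described above (SGD with momentum 0.9, initial learning rate 0.01, decayed by 10 at epochs 100 and 150); the tiny linear model is only a stand-in for the pruned network and is not part of the original setup.

    import torch

    pruned_model = torch.nn.Linear(10, 10)   # placeholder for the pruned network
    optimizer = torch.optim.SGD(pruned_model.parameters(), lr=0.01, momentum=0.9, weight_decay=0)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

    for epoch in range(200):
        # ... one training epoch over the CIFAR-10 loader would go here ...
        optimizer.step()       # placeholder for the per-batch updates
        scheduler.step()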


Table 2. Comparison with other prevalent pruning methods on CIFAR-10

Model | Method | Baseline Acc | Top-1 Acc | FLOPs ↓ | Param. ↓
ResNet-56 | SFP [13] | 93.59% | 93.35% | 52.6% | –
ResNet-56 | CCP [37] | 93.50% | 93.42% | 52.6% | –
ResNet-56 | HRank [28] | 93.26% | 92.17% | 50.0% | 42.4%
ResNet-56 | NPPM [4] | 93.04% | 93.40% | 50.0% | –
ResNet-56 | DHP [26] | – | 93.58% | 49.0% | 41.6%
ResNet-56 | SCP [22] | 93.69% | 93.23% | 51.5% | –
ResNet-56 | FSM (ours) | 93.26% | 93.63% | 51.2% | 43.6%
ResNet-56 | FSM (ours) | 93.26% | 92.76% | 68.2% | 68.5%
VGGNet | L1 [25] | 93.25% | 93.40% | 34.2% | 63.3%
VGGNet | GAL [29] | 93.96% | 92.03% | 39.6% | 77.2%
VGGNet | HRank [28] | 93.96% | 93.42% | 53.7% | 82.9%
VGGNet | EEMC [50] | 93.36% | 93.63% | 56.6% | –
VGGNet | FSM (ours) | 93.96% | 93.73% | 66.0% | 86.3%
VGGNet | FSM (ours) | 93.96% | 92.86% | 81.0% | 90.6%
GoogLeNet | L1 [25] | – | 94.54% | 32.9% | 42.9%
GoogLeNet | GAL [29] | 95.05% | 93.93% | 38.2% | 49.3%
GoogLeNet | HRank [28] | 95.05% | 94.53% | 54.6% | 55.4%
GoogLeNet | FSM (ours) | 95.05% | 94.72% | 62.9% | 55.5%
GoogLeNet | FSM (ours) | 95.05% | 94.29% | 75.4% | 64.6%

GoogLeNet L1 [25]

We set the compression rate of each layer according to its accuracy drop curve, which maintains the model accuracy to the greatest extent possible. The details are provided in the supplementary. 4.2

Results and Analysis

CIFAR-10. In Table 2, we compare the proposed FSM method with other prevalent algorithms on VGGNet, GoogLeNet, and ResNet-56. Our method achieves a significantly compression efficiency on reductions of FLOPs and parameters, but with higher accuracy. For example, compared with HRank, FSM yields a better accuracy (93.73% vs. 93.42%) under a greater FLOPs reduction (66.0% vs. 53.7%) and parameters reduction (86.3% vs. 82.9%). For ResNet-56 on CIFAR10, FSM prunes 51.2% of FLOPs and 43.6% of parameters, but the accuracy improved by 0.37%. NPPM and DHP have been proposed recently and they have a great performance in network compression. Compared with NPPM, our method shows a better performance, and with a higher FLOPs reduction. Compared with DHP, under similar accuracy, FSM performs better at the reduction of FLOPs (51.2% vs. 49.0%) and parameters (43.6% vs. 41.6%). For GoogLeNet, FSM outperforms HRank and GAL in all aspects. With over 62% FLOPs reduction, FSM still maintains 94.72% accuracy. Furthermore, we verified the performance of FSM at high compression rates. For instance, with more than 90% of the parameters discarded, FSM reduces

628

Y. Duan et al. Table 3. Comparison with other prevalent pruning methods on ImageNet Model

Method

Top-1 Acc Top-5 Acc Δ Top-1 Δ Top-5 FLOPs ↓

ResNet-50

DCP [51]

74.95%

92.32%

−1.06%

−0.61%

CCP [37]

75.21%

92.42%

−0.94%

−0.45%

54.1%

Meta [31]

75.40%



−1.20%



51.2%

GBN [49]

75.18%

92.41%

−0.67%

−0.26%

55.1%

BNFI [35]

75.02%



−1.29%



52.8%

HRank [28]

74.98%

92.44%

−1.17%

−0.54%

43.8%

SCP [22]

75.27%

92.30%

−0.62%

−0.68%

54.3%

SRR-GR [46] 75.11%

92.35%

−1.02%

−0.51%

55.1%

GReg [45]

75.16%



−0.97%



56.7%

FSM (ours)

75.43%

92.45%

−0.66% −0.53% 57.2%

55.6%

70.91%



−0.89%



30.7%

BNFI [35]

70.97%



−1.22%



30.0%

FSM (ours)

71.18%

89.81%

−0.70% −0.48% 30.6%

MobileNet-V2 CC [47]

the accuracy by only 1.1% on VGGNet. The same results can be observed on ResNet-56 and GoogLeNet. ImageNet 2012. ImageNet 2012 has over 1.28 million training images and 50,000 validation images divided into 1,000 categories. ResNet-50, compared to VGGNet and ResNet-56, has a larger feature size. To verify the applicability of the proposed FSM, on ResNet-50, we test the performance at different compression rates and different layer-depth, by comparing it with L1 and HRank, as shown in Fig. 2. The results show that FSM successfully picks out the more important channels, even for large-size features. In contrast, L1 and HRank performed poorly, even less well than the random method. In most cases, the proposed FSM shows the best performance. In Table 3, we compare the proposed FSM method with other prevalent methods on ResNet-50. Our method achieves significant performance, reducing the FLOPs by more than 57% with only 0.66% loss in Top-1 accuracy. Compared with HRank, our method shows a better performance (75.43% vs. 74.98%), and with a higher FLOPs reduction (57.2% vs. 43.8%). Compared with GBN and DCP, under similar FLOPs reduction of about 55%, FSM obtains better accuracy. Our method reduces FLOPs by 6% more than MetaPruning under similar Top-1 accuracy. More comparison details can be found in Table 3. MobileNet-V2 is a memory-efficient model that is suitable for mobile devices. However, to further compress it is challenging while maintaining the model accuracy. When pruning around 30% of FLOPs, FSM achieves better performance (71.18%) than CC (70.91%) and BNFI (70.97%) on Top-1 accuracy. The results demonstrate that FSM has excellent performance on the large-scale ImageNet dataset and shows higher applicability. The Speedup on GPU. To show the practical speedup of our method in real scenarios, we measure the inference time by time/images on the NVIDIA 1080Ti

Network Pruning via Feature Shift Minimization

629

Table 4. The practical speedup for the pruned model over the unpruned model. “T”: inference time. “S”: practical speedup on GPU (NVIDIA 1080Ti). Model

FLOPs (%) ↓ Param. (%) ↓ T (ms)

S (×)

GoogLeNet 62.9 75.4

55.5 64.7

5.03 (±0.01) 4.58 (±0.02)

ResNet-50

42.8

13.81 (±0.04) 1.30×

57.2

1.47× 1.62×

Fig. 4. The effect of different activation functions on channel pruning for VGG. For each subfigure, the horizontal axis is the compression rate, and the vertical axis is the accuracy.

GPU. As shown in Table 4, the proposed FSM achieves 1.62× and 1.30× speedup with batch size 64 on GoogLeNet and ResNet-50, respectively.

5 5.1

Ablation Study Effect of the Activation Function

In Sect. 3.1, we illustrate that the activation function affects the distribution of the features. In channel pruning, the threshold property of the ReLU activation function results in changes in the feature activation state. In this section, we experiment with the performance of three popular activation functions (Sigmoid, ReLU, and PReLU [10]) on different layers in VGGNet. As shown in Fig. 4, the trend of accuracy with Sigmoid and PReLU changes smoother than ReLU. Although they do not have the problem of gradient plunges, there is a large loss of accuracy at small compression rates. In contrast, ReLU achieves better performance at most rates. In particular, P ReLU (x) = max(0, x)+a×min(0, x), and Sigmoid(x) = σ(x) = 1+e1−x . Different from the ReLU, their gradient is greater than 0 when x < 0. Some values should reach 0 under ReLU, but return a negative value under PReLU, which leads to more changes in features. This explains why PReLU loses more accuracy when the feature shift occurs compared to Sigmoid and ReLU. In general, models with ReLU activation functions are more tolerable for channel pruning than Sigmoid and PReLU.

630

Y. Duan et al.

Fig. 5. Comparison of the original and estimated distributions. For each subfigure, the horizontal axis is the expectation, and the vertical axis represents different features. The red indicates the distribution estimation, and the blue is the original distribution. More comparisons are provided in the supplementary. (Color figure online)

Fig. 6. The effect of evaluation error λ for channel pruning on VGGNet. For each subfigure, the FSM-L means that the expectation evaluation values are amended with λ.

5.2

Error of the Feature Shift Evaluation

The error of the feature distribution evaluation is shown in Fig. 5. The differences between the proposed evaluation method and the true distribution are compared at different layer depths. It can be seen that the evaluation error is in an acceptable range. In addition, our experiments show that expectation and layer depth are negatively correlated. For very small expectation values, our method can still estimate them accurately. 5.3

Effect of the Evaluation Error λ

In Sect. 3.4, we present the estimation error λ to rectify the evaluated distribution. As shown in Fig. 6, the method using λ generates a better performance at low compression rates. Also, no additional accuracy loss is caused at high compression rates.

Network Pruning via Feature Shift Minimization

631

Fig. 7. Comparisons of the variance estimation with different magnitude coefficients (k) on VGGNet. Names such as “Var-1.5” stand for reduce the variance V ar[xi ] by  (1.5 × ddii )%. Similarly, the horizontal axis represents the compression rate, and the vertical axis is the accuracy.

5.4

Effect of the Variance Adjustment Coefficients (k)



In Sect. 3.4, we empirically reduce the variance V ar[xi ] by ddii % because it is difficult to calculate it directly. As shown in Fig. 7, we adopt different magnitude coefficients to adjust the variances of pruned models. The results show that the VAR-1.0, adopted in our work, achieves the best performance in almost all cases.

6

Conclusions

In this paper, we propose a novel channel pruning method by minimizing feature shift. We first prove the existence of feature shift mathematically, inspired by the investigation of accuracy curves in channel pruning. The feature shift is used to explain why the accuracy plummets when the compression rate exceeds a critical value. Based on this, an efficient channel pruning method (FSM) is proposed, which performs well, especially at high compression rates. Then, we present a feature shift evaluation method that does not traverse the training data set. In addition, a distribution optimization method is designed to improve the efficiency of model compression and is plug-and-play. Extensive experiments and rigorous ablation studies demonstrate the effectiveness of the proposed FSM for channel pruning. In the future, we will further research the impact of multi-branch architecture on the FSM. Also, the combination with other pruning methods is worth trying. Acknowledgements. The work was supported by National Natural Science Foundation of China (Grant Nos. 61976246 and U20A20227), Natural Science Foundation of Chongqing (Grant No. cstc2020jcyj-msxm X0385).

632

Y. Duan et al.

References 1. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K.P., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2018) 2. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv abs/1802.02611 (2018) 3. Chin, T.W., Ding, R., Zhang, C., Marculescu, D.: Towards efficient model compression via learned global ranking. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1515–1525 (2020) 4. Gao, S., Huang, F., Cai, W.T., Huang, H.: Network pruning via performance maximization. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9266–9276 (2021) 5. Girdhar, R., Tran, D., Torresani, L., Ramanan, D.: Distinit: learning video representations without a single labeled video. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 852–861 (2019) 6. Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) 7. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS (2011) 8. Guo, J., et al.: Hit-detector: hierarchical trinity architecture search for object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11402–11411 (2020) 9. Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural network. arXiv abs/1506.02626 (2015) 10. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing humanlevel performance on imagenet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034 (2015) 11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) 12. He, Y., Ding, Y., Liu, P., Zhu, L., Zhang, H., Yang, Y.: Learning filter pruning criteria for deep convolutional neural networks acceleration. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2006–2015 (2020) 13. He, Y., Kang, G., Dong, X., Fu, Y., Yang, Y.: Soft filter pruning for accelerating deep convolutional neural networks. arXiv abs/1808.06866 (2018) 14. He, Y., Liu, P., Wang, Z., Yang, Y.: Pruning filter via geometric median for deep convolutional neural networks acceleration. arXiv abs/1811.00250 (2018) 15. He, Y., Zhang, X., Sun, J.: Channel pruning for accelerating very deep neural networks. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1398–1406 (2017) 16. Hinton, G.E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv abs/1503.02531 (2015) 17. Howard, A.G., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv abs/1704.04861 (2017)

Network Pruning via Feature Shift Minimization

633

18. Huang, G., Liu, Z., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017) 19. Huang, Z., Wang, N.: Data-driven sparse structure selection for deep neural networks. arXiv abs/1707.01213 (2018) 20. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv abs/1502.03167 (2015) 21. Jiang, L., Xu, M., Liu, T., Qiao, M., Wang, Z.: DeepVS: a deep learning based video saliency prediction approach. In: ECCV (2018) 22. Kang, M., Han, B.: Operation-aware soft channel pruning using differentiable masks. In: ICML (2020) 23. Krizhevsky, A.: Learning multiple layers of features from tiny images (2009) 24. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2012) 25. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv abs/1608.08710 (2017) 26. Li, Y., Gu, S., Zhang, K., Gool, L.V., Timofte, R.: DHP: differentiable meta pruning via hypernetworks. arXiv abs/2003.13683 (2020) 27. Li, Y., et al.: Towards compact CNNs via collaborative compression. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6434–6443 (2021) 28. Lin, M., et al.: HRank: filter pruning using high-rank feature map. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1526–1535 (2020) 29. Lin, S., et al.: Towards optimal structured CNN pruning via generative adversarial learning. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2785–2794 (2019) 30. Lin, T., Liu, X., Li, X., Ding, E., Wen, S.: BMN: boundary-matching network for temporal action proposal generation. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3888–3897 (2019) 31. Liu, Z., et al.: Metapruning: meta learning for automatic neural network channel pruning. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3295–3304 (2019) 32. Liu, Z., Shen, Z., Savvides, M., Cheng, K.T.: Reactnet: towards precise binary neural network with generalized activation functions. arXiv abs/2003.03488 (2020) 33. Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2755–2763 (2017) 34. Luo, J.H., Wu, J., Lin, W.: ThiNet: a filter level pruning method for deep neural network compression. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5068–5076 (2017) 35. Oh, J., Kim, H., Baik, S., Hong, C., Lee, K.M.: Batch normalization tells you which filter is important. In: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3351–3360 (2022) 36. Paszke, A., et al.: Automatic differentiation in pytorch (2017) 37. Peng, H., Wu, J., Chen, S., Huang, J.: Collaborative channel pruning for deep networks. In: ICML (2019) 38. Raja, K.B., Raghavendra, R., Busch, C.: Obtaining stable iris codes exploiting lowrank tensor space and spatial structure aware refinement for better iris recognition. In: 2019 International Conference on Biometrics (ICB), pp. 1–8 (2019)

634

Y. Duan et al.

39. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2015) 40. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv 2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018) 41. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 640–651 (2017) 42. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2015) 43. Singh, P., Verma, V.K., Rai, P., Namboodiri, V.P.: Leveraging filter correlations for deep model compression. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 824–833 (2020) 44. Szegedy, C., et al.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015) 45. Wang, H., Qin, C., Zhang, Y., Fu, Y.: Neural pruning via growing regularization. In: International Conference on Learning Representations (2020) 46. Wang, Z., Li, C., Wang, X.: Convolutional neural network pruning with structural redundancy reduction. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14908–14917 (2021) 47. Wang, Z., Li, C., Wang, X.: Convolutional neural network pruning with structural redundancy reduction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14913–14922 (2021) 48. Wang, Z., et al.: Model pruning based on quantified similarity of feature maps. arXiv abs/2105.06052 (2021) 49. You, Z., Yan, K., Ye, J., Ma, M., Wang, P.: Gate decorator: global filter pruning method for accelerating deep convolutional neural networks. arXiv abs/1909.08174 (2019) 50. Zhang, Y., Gao, S., Huang, H.: Exploration and estimation for model compression. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 477–486 (2021) 51. Zhuang, Z., et al.: Discrimination-aware channel pruning for deep neural networks. In: NeurIPS (2018)

Training Dynamics Aware Neural Network Optimization with Stabilization Zilin Fang1,3 , Mohamad Shahbazi1 , Thomas Probst1(B) , Danda Pani Paudel1 , and Luc Van Gool1,2 1

Computer Vision Laboratory, ETH Zurich, Z¨ urich, Switzerland {mshahbazi,probstt,paudel,vangool}@vision.ee.ethz.ch 2 VISICS, ESAT/PSI, KU Leuven, Leuven, Belgium 3 School of Computing, NUS, Singapore, Singapore [email protected]

Abstract. We investigate the process of neural network training using gradient descent-based optimizers from a dynamic system point of view. To this end, we model the iterative parameter updates as a time-discrete switched linear system and analyze its stability behavior over the course of training. Accordingly, we develop a regularization scheme to encourage stable training dynamics by penalizing divergent parameter updates. Our experiments show promising stabilization and convergence effects on regression tasks, density-based crowd counting, and generative adversarial networks (GAN). Our results indicate that stable network training minimizes the variance of performance across different parameter initializations, and increases robustness to the choice of learning rate. Particularly in the GAN setup, the stability regularization enables faster convergence and lower FID with more consistency across runs. Our source code is available at: https://github.com/fangzl123/stableTrain.git.

1

Introduction

Ever since Augustin-Louis Cauchy first described the method of gradient descent in 1847 [1], its efficiency and flexibility has inspired countless solvers and optimization algorithms until this day [2–9]. Gradient descent is, in fact, the backbone of the recent advancements in deep learning, in conjunction with the error back-propagation technique [10]. In particular, auto-differentiation offers gradient computation with negligible overhead on the function evaluation, making possible the optimization of large-scale non-linear functions with millions of parameters – such as deep neural networks – using gradient descent [11,12]. Gradient descent (GD) and its derivatives have been extensively studied with regards to their convergence properties on various problems [13,14]. For instance, the choice of the learning rate is crucial for (fast) convergence, and depends on the curvature around a local optimum. Choosing a high learning rate may cause oscillating and divergent behavior, whereas a low learning rate may cause the optimizer to never reach a good solution. Moreover, when using stochastic gradients in the case of typical neural network training, gradient noise can cause c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023  L. Wang et al. (Eds.): ACCV 2022, LNCS 13841, pp. 635–651, 2023. https://doi.org/10.1007/978-3-031-26319-4_38

636

Z. Fang et al.

undesired updates. Other sources of noise can be multi-task training [15–17], data augmentation [18–20], or re-sampling operations [21–23]. To this end, numerous GD-based optimization algorithms have been proposed to deal with the aforementioned issues. It has shown to be beneficial to modify the gradients before the update step to enforce a desired behavior [24, 25]. For instance, gradi- Fig. 1. Trajectories with and without stabilization. ent clipping [24] offers The loss function randomly switches between 4 quadratic an intuitive solution to attractors, and the convergence to the equilibrium at origin is desired. Switching causes SGD to oscillate, which is the problem of divergdampened by stabilization. ing gradients. Introducing momentum can reduce noise and accelerate convergence in the presence of high curvature regions and is used by many modern optimizers [8,26]. Furthermore, adaptive learning rates for each parameter (e.g. RMSprop, Adam) often offer robust learning behavior. With AdaDelta [27] there is even a hyperparameter-free optimizer with adaptive learning rates. In general, the behavior of GD optimization can be understood from the viewpoint of a corresponding dynamic system [14]. The system intuitively models the evolution of the parameters (as the state variable) under the update given by the optimizer. For instance, in the case of a quadratic cost function, the dynamic system is given by a linear system (as discussed in Sect. 2.1). Based on the spectrum of the update matrix with respect to the learning rate, convergence criteria can be developed [13,14]. In particular, we are interested in the stability of the weight updates, while stability analysis has been previously applied to the input-output dynamics of neural networks [28,29]. In the general case of neural networks, however, mini-batch training, nonlinearities, and non-convex loss functions complicate such analysis. In this paper, we attempt a stability analysis of network training with the tools of switched linear systems (SLS) [30–34]. The key assumption is that evolution of parameters in a specific layer follows a time-variant linear dynamic over the course of iterations. This is motivated by two observations. First, each mini-batch contains different random samples. Second, the input activation to the layer change after each update iteration. In both cases, the update dynamics change accordingly, which we model by switching between corresponding linear systems. The paper is structured as follows. We begin by presenting the theoretical concept of SLS in the context of gradient descent training in Sect. 2. Based on this model, we analyse the stability of the optimization procedure in Sect. 3 and develop a mechanism to control the parameter updates, such that the update dynamics remain stable. For our experiments, we approximate this mechanism

Training Dynamics Aware Neural Network Optimization with Stabilization

637

by introducing a stability regularizer to be jointly optimized with the loss function. In Sect. 4, we empirically demonstrate the effect of stabilization in terms of variability and performance on various computer vision problems, including generative adversarial networks [35,36] and crowd counting [37,38].

Fig. 2. Stability regularization pipeline. For training, the features of the last linear/convolutional layer or the weights and bias themselves are taken out to construct a time-variant state matrix which represents the parameters updating process at the current step. After left multiplying with the accumulated history matrix, singular value decomposition will be applied to obtain the matrix norm. By scaling with a hyperparameter α and adding to the original loss, the training loss now is a combination of the task-specific loss and the regularization loss.

2

Neural Network Training Dynamics

We introduce the theoretical part by analyzing the update process of mini-batch gradient descent in Sect. 2.1, and by formulating a dynamic system to model the corresponding dynamics. Then we discuss how several aspects of GD can be modeled using linear systems in Sect. 2.2. Finally, we present a model for generic updates, and formulate the joint dynamic system evolving over training iterations. The joint dynamics will be analysed with regards to stability in Sect. 3. 2.1

Mini-Batch Training as a Dynamic System

Let us consider one update iteration for a parameter vector θ ⊆ Θ of one layer during training of a network fΘ using stochastic gradient descent with minibatches. At iteration k, the update for batch (xk , yk ) is θk+1 = θk − η

∂ L(fΘ , xk , yk ),   ∂θ 

(1)

:=Lk

with learning rate η and loss functional L. We start with the example of a linear last layer with 2 regression loss, L(a(x), y) = (θT a(x) + b − y)2 ,

(2)

638

Z. Fang et al.

∂ where a is the input activation, and y the label. Since the gradient ∂θ L is a linear function of the parameters θ, the update (1) can be formulated as an affine system (Ak , bk ), (3) θk+1 = Ak θk + bk .

For each iteration k, the parameter update is given by a corresponding system (Ak , bk ). The training process can therefore be interpreted as an affine timediscrete switched linear system [34]. If Ak is stable, the system (3) tends to a different equilibrium θke for each mini-batch k, θke = (I − Ak )−1 bk .

(4)

The set of attainable equilibrium points during network training is denoted as, ϑe = {θke ∈ Rm : θke = (I − Aμ )−1 bμ },

(5)

formed by convex combinations of stable linear systems   μi Ai , bμ = μi bi , Aμ = i=0...k

μ ∈ {μ ∈ Rk :



i=0...k

μi = 1, μi ≥ 0 ∀i}.

(6)

i

Any solution θ∗ of the neural network training is therefore found in θ∗ ∈ ϑe . However, not only can any number of systems in Σ = {(Ai , bi )} be unstable, but even switching between stable systems can create instability. This leads to oscillating behavior, especially with high learning rates η. On the other hand, clever switching of systems can stabilize the overall dynamic. In this paper, we are interested in studying the stability of deep learning optimization problems and finding ways for stabilising the network training. 2.2

Dynamics of Gradient Descent Optimizers

We can model the update dynamics of GD-based optimizers with learning rate η using a first order system as follows:      Dk −ηHk θk θk+1 (7) = + Bk uk , Δθk+1 Lk Mk Δθk with time-variant subsystems Dk , Hk , Lk , and Mk . Dk performs parameter updates proportional to the parameters θ, and can be used to represent 2 weight decay for Dk = I − ητ , with regularization strength τ . The additive update is performed with subsystem Hk , and is typically chosen as Hk = I, in case the learning rate is the same for all parameters. System Lk handles linear gradients, such as in the case of (3). The choice of Mk allows for introduction of gradient momentum, e.g. with Mk = βI. The affine bias terms uk act on the update through the control matrix Bk .

Training Dynamics Aware Neural Network Optimization with Stabilization

639

For many optimizers and loss functions, the linear assumption as in (3) does not apply. In this case, we omit the linear dynamics Lk = 0 and introduce the additive update through the affine term as,        ∂ D −ηI 0 θk θk+1 (8) = + O( Lk ), Δθk+1 Δθk 0 0 I ∂θ ∂ Lk ) is the update from the optimizer (e.g. Adam [8]). Since O already where O( ∂θ includes momentum terms, we set M = 0. To simplify further analysis, we rewrite (8) into a homogeneous form with state θ˜k and state transition matrix Ak as,      ∂ θk+1 Lk ) θk D −ηO( ∂θ ˜ = = Ak θ˜k . θk+1 = (9) 1 1 0 I

The state of this affine switched linear system is therefore directly adapted by the optimizer algorithm. We can write the finite matrix left-multiplication chain up to iteration k as, θ˜k+1 = Ak . . . A0 x0 = 

Dk −η = 0 

k

k

Ai x0

i=0

 ∂ Dk−i O( ∂θ Li ) ˜ θ0 . I  

i=1

(10)

Ck

Ck represents the joint dynamic evolving over iterations.

3

Stable Network Optimization

Motivated by stabilization through switching, in the following, we develop a stability criterion for the system at the next time step in order to stabilize the update dynamic. With the tools of Liapunov functions, we derive a constraint on the joint dynamic Ck , to avoid unstable updates in Sect. 3.1. In practice, we relax this constraint in to ways: by introducing temporary anchors in Sect. 3.3, and by approximation through regularization in Sect. 3.4, to obtain an efficient algorithm. 3.1

Liapunov Stability

To analyse the stability of the system (10), we define a Liapunov energy function, V (ξ) = ξ T Pξ − 1,

(11)



T with ξ = (θ−θ0 )T 1 around the initial parameter θ0 . For time-discrete systems, we consider the energy difference between two steps, ΔV = V (ξk+1 ) − V (ξk ).

(12)

640

Z. Fang et al.

According to Liapunov’s theorem, for V (0) = 0, and ΔV (x) ≤ 0, the system is stable around the origin [39]. Note that we do no desire asymptomatic stability, since we do not know the solution of the learning problem beforehand. Thus, we merely aim for stable behavior in the vicinity of the initial parameter θ0 . For our dynamic (10) we obtain, ΔV = V (Ck ξ) − V (ξ) = ξ T (CTk PCk − P)ξ.

(13)

Therefore, we achieve ΔV ≤ 0 for, CTk PCk − P = −Q.

(14)

with positive definite matrices P, Q  0. Choosing P = I3x3 , we obtain Q  0, if I − CTk Ck  0 ⇒ max λi (CTk Ck ) ≤ min λi (I) i

i

(15)

⇒ σ1 (Ck ) ≤ 1, where σ1 (Ck ) is the largest singular value of Ck . 3.2

Stable Network Training

To facilitate stable training, the goal is to find network parameters Θ, that minimize the loss L under the stability constraints given by (15). At iteration k, we therefore wish to control the network parameters Θk by solving, Θk = arg min L(fΘ , xk , yk ) Θ

s.t. Ck (Θ) = Ak (Θ)Ck−1 , σ1 (Ck ) ≤ 1.

(16)

This can be seen as a model predictive controller with a single step time horizon. Here we predict how θk would be updated according to L, as represented by Ak (Θk ) (9). The actual update of θk ⊆ Θk however is given by the solution Θk of (16), taking the constraints (15) into consideration. In practice, we approximate the constrained problem of (16) by introducing a regularizer to the original loss as, Lstable = Lk + α [ σ1 (Ck ) − 1]+

(17)

where Lk is the task-specific mini-batch loss, [•]+ is a rectifier, and σ1 (Ck ) is the stability-based regularizer with strength α.

Training Dynamics Aware Neural Network Optimization with Stabilization

641

Algorithm 1: Stability-regularized training process # theta : layer weights # eta : l e a r n i n g rate # alpha : r e g u l a r i z e r s t r e n g t h def train_epoch ( batch_data , theta , eta , alpha ) : # reset history ( optional ) C = eye ( len ( theta ) + 1 ) for (x , y ) in batch_data : ∂ # predict update O( ∂θ L), e . g . with GD : update = - eta * compute_grad ( theta , x , y )

# c o n s t r u c t joint system Eq .(9) and Eq .(10) A_tilde = affine_system ( update ) C_tilde = A . matmul ( C_tilde ) # compute total loss Eq .(17) loss = L(x , y ) + alpha * relu ( svd ( C_tilde ) [ 0 ] - 1 ) # backprop and parameter update loss . backwards () update_final = optimizer . step () # update history A = affine_system ( update_final ) C = A . matmul ( C_tilde )

3.3

Anchoring

When encouraging Liapunov stability according to (17), we effectively search only in the vicinity of the initial solution. This is too restrictive in some cases. We therefore introduce a series of anchors, where the history gets re-initialized as Ck ← I. In our experiments, we investigate epoch-wise resetting, trading of stability and exploration. Figure 1 shows the behavior of stabilization with anchors, in a 2D toy example. In every iteration, one of 4 noisy attractors is randomly chosen to compute the gradient (mimicking mini-batch noise), and we desire to reach the equilibrium point at the origin. Stabilization (15) can mitigate the oscillations, while moving the anchor enables asymptotic convergence. 3.4

Training Algorithm

To optimize (17) using SGD, we first perform a forward pass to construct the current Ak (Θ) and update the history matrix Ck−1 . Note that Ck−1 contains updates from preceding mini-batches and is treated as a constant, while the updated Ck involves taking the derivative of L w.r.t. the parameters Θ. Since the gradients of (17) are a function of the update O, we express the update analytically as a function of θ, if possible. For complex optimizers and loss functions, we assuming a locally constant gradient, and approximate the affine part ∂ L)−θk−1 . We summarize the training process in Algorithm 1. as O(θ) ≈ θ+O( ∂θ An illustration of the loss computation is given in Fig. 2.


Fig. 3. Effect of Stabilizing Regularizer. (a) Average iterations needed to reach 90% accuracy on MNIST, with maximal and minimal iterations. Regularization reduces fluctuation across runs and can speed up convergence. (b) MAE (mean absolute error) for MCNN crowd counting network. Better and more stable results are obtained by regularizing. (c) SNGAN training with different regularizing techniques. (d) FID statistics of SNGAN achieved at different learning rates. Regularization reduces variance and can achieve better results. (e) Predicted crowd density maps from MCNN. Stabilization yields more distinct results for crowds (1st row) and sparser dots for negative samples (2nd row). (f) Generated images from fixed noise by BigGAN trained on CIFAR100. The results of the base (2nd column) and regularized (3rd column) models are obtained with higher learning rates and half of the iterations. See Table 2 for more details. Our method also compares favourably in terms of the effective training time.

4 Experiments

We now empirically investigate our theoretical considerations. In particular, we evaluate Algorithm 1 on different tasks, network architectures, and datasets. The presentation of our experiments focuses on regression-type problems, since we do not observe any significant beneficial or detrimental effects of stability-aware training in the case of classification. Possible reasons are discussed in Sect. 5. Additionally, we experiment on the notoriously unstable training of GANs to show the potential of our approach in such highly dynamic environments. Our guiding hypothesis is that stability regularization can improve performance and achieves higher consistency across multiple independent trials with random initialization. We start by providing an overview of the datasets and the tasks investigated for stability-aware training.

MNIST Digit Classification. We start with the popular MNIST [40] dataset, designed for 10-class digit classification. We convert the task into a scalar regression problem and add a linear layer with one output unit to the end of the LeNet-5 [40] network. For evaluation, we bin the continuous output back into 10 classes. In particular, we are interested in the convergence with and without the regularizer, as measured by crossing the 90% accuracy threshold.

NWPU Crowd Counting. Next, we evaluate our regularizer on crowd counting. The task involves counting the number of people in an image of large crowds. The NWPU-Crowd dataset [41] contains 5,109 images covering a wide range of illumination conditions and crowd densities. With 351 negative samples and large appearance variations within the data, it is a challenging dataset. State-of-the-art methods typically approach crowd counting through regression of a pixel-wise density map [37,38,42]. The final count is obtained by summing over the density map.

CIFAR10/100 Image Generation. Furthermore, we evaluate stability regularization in the GAN setup, where gradients are naturally unstable, making training hard. This is due to the dynamics of the two-player game between the discriminator network (judging an image to be real or fake) and a generator trying to fool the discriminator by generating realistic images. As GAN training aims to find the balance (i.e. the Nash equilibrium), unstable gradients will cause problems or even failure in training the model. We focus on CIFAR10 [43] first to demonstrate the ability of our regularizer and compare to several other regularization methods. The test on CIFAR-100 [43] studies the stability behavior further: with fewer samples per class, the model is more prone to mode collapse.

Implementation Details. In all experiments, we stabilize the last layer only. As optimizers, we use SGD and Adam. As discussed in Sect. 3.4 and Algorithm 1, the generic training procedure includes two backpropagation steps. Because of the stateful nature of Adam, we copy the weights and optimizer state for the last layer. Then the forward pass is performed, and we back-propagate to obtain the dynamics for the task-specific loss L_k. Next, A_k is constructed by subtracting the old parameters from the updated ones. Finally, we update the original weights based on the gradients with respect to the complete loss L_stable.


Fig. 4. High Learning Rate Behavior. Accuracy curve across iterations for MNIST and CIFAR-10 when the learning rate is higher than the optimal setting. Top is the result for MNIST (LR = 0.12) and bottom is the result for CIFAR-10 (LR = 0.2). In both cases, the regularizer improves convergence and reduces variance across runs.

4.1 Results

We summarize our main results for all tasks in Fig. 3. To indicate the spread in some figures, we either plot the min/max bar or one standard deviation as a shaded area. As a general trend, we observe that across tasks, the variance in the evaluation metric is significantly reduced when training with stabilization. In many cases, we further see an accuracy improvement on average. The MNIST experiments in (a) show that the regularizer enables significantly quicker convergence on average. In figure (b), we observe that a lower MAE with less variance is achieved in the case of crowd counting. The bottom row (c, d) summarizes our GAN experiments with the SNGAN [44] architecture, also exhibiting reduced variance across runs, as well as a significant improvement in FID across a range of learning rates. In the following, we present each of the experiments in more detail.

MNIST. We train the networks with different values for the learning rate η and the hyper-parameter α, that is, η ∈ {0.03, 0.05, 0.08, 0.1} and α ∈ {1, 3, 10, 30, 100}. Figure 3 (a) compares the number of iterations needed for the base model and the best regularized model to reach the benchmark for the first time under three different learning rate settings. The bar indicates the average speed over ten independent tests. The two models are comparable when the learning rate is below 0.1, and the best learning rate for the base model is around 0.08. As the learning rate increases, the base model takes more time to achieve the same accuracy, and the training process is more unstable, as indicated by the min-max distance in the figure. In contrast, our regularized model behaves better when using higher learning rates, resulting in both fewer iterations on average and a smaller variation range. From this experiment we can conclude that the regularization is orthogonal to a reduction in learning rate. To further demonstrate the effectiveness of stabilization in the high learning rate regime, we experimented on both MNIST (η = 0.12) and CIFAR-10 (η = 0.2). The accuracy curves for the base and regularized models, with error regions defined by one standard deviation, are shown in Fig. 4. The optimal regularization strength is found to be α = 30 for MNIST and α = 10 for CIFAR-10. As can be seen,


a high learning rate leads to performance degradation and model instability, and the base model cannot reach the benchmark (0.9 and 0.4, respectively). However, if we use the stability constraint during training, a stable behavior over multiple runs is maintained.

Crowd Counting. Next we evaluate our regularizer on the natural regression problem of crowd counting. After predicting the density map for an image, the pixel-wise difference between the predicted density map and the ground-truth map is measured with an ℓ2 loss. We follow MCNN [41] in setting up the original training hyperparameters, including the optimizer (Adam) and the learning rate (η = 1·10⁻⁴). The results for the MCNN architecture are reported in Fig. 3 (b). While the Mean Absolute Error (MAE) of the base model fluctuates around 350 and finally goes down to 320, the regularized model leads to better counting performance as well as reduced variance. It reaches an MAE of 265 on average within 100 epochs with fewer fluctuations and has a minimum MAE of around 236 over ten experiments. We also compare with gradient clipping, whose MAE curve is quite similar to that of the base model, with larger variance at some epochs. Our proposed regularizer remains the best among them, as can be seen in Table 1.

Table 1. Crowd Counting using MCNN. Accuracy (in MAE) after 10 independent runs, for 100 epochs each.

Method          Mean MAE   Best MAE
Base            328.386    284.518
Grad-clipping   301.421    281.763
Ours            265.477    236.446

The evaluation results for the more complex CSRNet [38], trained following the original protocol, can be seen in Fig. 5. Both models behave similarly in terms of average MAE, but the regularized network shows a smaller error region over multiple runs. After epoch 60, the MAE of the base model fluctuates strongly, while that of the regularized model generally decreases with few fluctuations. A similar behavior is observed on two other metrics, Mean Squared Error (MSE) and mean Normalized Absolute Error (NAE), in Fig. 5. This may indicate that our approach is indeed helpful for the stability of the training process in the sense of stochastic dataset sampling and random initialization.

Generative Adversarial Network. We now investigate our regularizer applied to the naturally unstable GAN training. We choose SNGAN and BigGAN [45] for our experiments. We use the hinge loss in both models, which is the default loss function in BigGAN. We test SNGAN with the SGD optimizer and BigGAN with Adam. At every epoch, we reset the history matrix, as explained in Sect. 3.3. For evaluation, we mainly compare the FID metric (less is better), which measures the diversity and quality of the generated images [46]. Motivated by the previous results, we start with a learning rate slightly higher than that of the best base model, and show the results for SNGAN in Fig. 3 (c). Around 100 epochs, there is a four-unit FID gap between them, and the blue shaded area (base model) is much larger over the whole process. We also compare to the commonly used regularization methods weight decay and gradient clipping, to further demonstrate that our proposed regularizer is not


Fig. 5. Crowd Counting using CSRNet. From left to right: measured in MAE (mean absolute error), MSE (mean squared error), and NAE (MAE normalized by ground truth). Regularization significantly reduces the variance across runs.

Fig. 6. Regularization Impact on FID using SNGAN. Left: Comparison between the best base model, our best regularized model and model trained by AdaDelta optimizer. Right: Lowest FID achieved within 100 epochs with and without regularizer.

equivalent to them in the GAN setup. To make gradient clipping comparable to our stabilizing regularizer, we constrain the maximal matrix norm to one. For the weight decay coefficient, we tried 0.1, 0.01, and 0.001; since there is no significant difference in the results, we choose 0.01 for the comparison. As can be seen in the same figure, gradient clipping fails to improve the performance of SNGAN and even yields a larger standard deviation. Weight decay is better than the other two baselines but still not as good as our regularized model. Note that even though we observe a large error region at epoch 10 or 20, this does not mean that an FID of 30 is actually reached there; it is merely due to the distortion caused by high-FID outliers on the symmetric standard deviation measure. We also test our regularizer at different learning rates, and the quantitative results are reported in Fig. 3 (d). Similar to the conclusion from the regression task, the difference is negligible if we set the learning rate to the best value for the base model. As the learning rate increases, the FID of the base model increases with a significantly larger variance, while the regularizer can maintain stable results and even improve the FID. Again, we see that the regularizer does not have the same effect on training as learning rate tuning, and offers new and better operating points.


Figure 6 (left) compares the best regularized model with an optimizer that requires no learning-rate setting, AdaDelta [27]. While the regularized model has an average FID below 30, the other two models converge around or above 32. We can conclude that a moderately higher learning rate combined with the regularizer benefits SNGAN training, and it is necessary to find such an appropriate learning rate. To summarize the results for SNGAN, we provide the lowest FID every model can reach after 100 epochs in Fig. 6 (right). The best FID appears earlier if we increase the learning rate for the base model, indicating that the model is usually under-fitting. Adding the regularizer allows for using even higher learning rates, and the resulting FID is better than that of the base model.

Table 2. BigGAN Training. FID for CIFAR-100 when the learning rate is the default setting (top) and increased by a factor of 3 (bottom). With the higher learning rate, the regularized model reaches the lowest FID at 50k iterations.

η0 = 2·10⁻⁴  Iters   55k     60k     65k     70k     75k     80k     85k
Base                 9.641   9.363   9.078   8.850   8.900   8.972   8.865
Reg                  10.097  9.896   9.710   9.553   9.479   9.547   9.508

η1 = 6·10⁻⁴  Iters   40k     45k     50k     55k     60k     65k     70k
Base                 10.280  9.516   9.629   10.708  14.114  53.463  91.141
Reg                  9.396   9.062   8.605   9.048   10.802  27.608  82.379

We now test with the BigGAN architecture on the CIFAR-100 dataset. Table 2 shows the FID for two learning rates: the default η0 = 2·10⁻⁴, and three times higher at η1 = 6·10⁻⁴. The behavior for the default η0 is similar to the previous tasks: the regularizer brings no benefit in this case, yielding a slightly higher FID than the base model. Increasing to η1, the training process becomes fragile, and both models collapse after 65k iterations. Despite this, our regularized model can still reach an FID of 8.605, while the base model reaches only 9.516. Compared to the default setting, the lowest FID is achieved already at 50k iterations with the regularizer at η1, as opposed to 70k iterations with the base model at η0. This observation indicates that the proposed regularizer can help with the stability of the network in some cases.

5 Discussion

While our formulation in Sect. 2 makes no assumption on the loss functions, we do not observe any significant impact of our regularizer on classification tasks such as CIFAR-100. We conjecture that regression, GAN training, and classification (using cross-entropy) losses introduce different types of dynamics, and only a subset can be stabilized with the proposed method.


One possible explanation for this phenomenon could lie in the zero crossing of the loss gradient around the optimum: for ℓ1 and ℓ2 loss functions, too big an update step close to the correct answer causes the gradient to invert and symmetrically regain its magnitude, as shown in Fig. 7. This behavior may be a source of oscillation that our method can compensate well. Such symmetry is not present in the one-sided cross-entropy loss, where the gradient (magnitude) monotonously decreases until the predicted score saturates; even a big update step never causes the gradient to invert. In the case of GANs, even though they technically solve a real/fake classification task, this symmetry might get introduced by the gradient reversal of the adversarial training.

Fig. 7. Loss profiles. From left to right: Cross-entropy (CE), ℓ1, ℓ2, and GAN discriminator training (CE with balanced adversarial samples) and their corresponding gradients. Updates with big steps cause gradient inversion and/or an increase in gradient magnitude for ℓ1, ℓ2 and GAN losses. This is not the case for CE, where the gradient decreases without zero crossing.

6 Conclusion

In this work, we formulated the training of neural networks as a dynamic system and analyzed its stability behavior from the control-theory point of view. Based on this theory, we developed a regularizer to stabilize the training process. The regularization strategy is based on the accumulated gradient information of the task-specific loss. Experimental results on different tasks, network architectures, and datasets show that our method stabilizes the training process across multiple independent runs and improves task performance in conjunction with an appropriately chosen, slightly higher learning rate. Moreover, we found that the effect of the stabilization is orthogonal to a reduction of the learning rate. In many cases, we observe that higher consistency and better results can be achieved with the appropriate parameters. For future work, it would be interesting to investigate why the stabilization has little effect on classification problems. Furthermore, we aim to analyse the effect of stabilization on other layers of the network, and to see whether it is beneficial to stabilize more than one layer.


References

1. Cauchy, A.: Methode generale pour la resolution des systemes d'equations simultanees. C.R. Acad. Sci. Paris 25, 536–538 (1847)
2. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k²) (1983)
3. Sutton, R.S.: Two problems with backpropagation and other steepest-descent learning procedures for networks. In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Erlbaum, Hillsdale (1986)
4. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. (2011)
5. McMahan, H.B., Streeter, M.J.: Delay-tolerant algorithms for asynchronous distributed online learning. In: NIPS (2014)
6. Recht, B., Ré, C., Wright, S.J., Niu, F.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: NIPS (2011)
7. Zhang, S., Choromańska, A., LeCun, Y.: Deep learning with elastic averaging SGD. In: NIPS (2015)
8. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2015)
9. Dozat, T.: Incorporating Nesterov momentum into Adam (2016)
10. Linnainmaa, S.: Taylor expansion of the accumulated rounding error. BIT Numer. Math. 16, 146–160 (1976)
11. Griewank, A.: Who invented the reverse mode of differentiation (2012)
12. LeCun, Y., Bottou, L., Orr, G., Müller, K.: Efficient backprop. In: Neural Networks: Tricks of the Trade (2012)
13. Boyd, S.P., Vandenberghe, L.: Convex optimization. IEEE Trans. Autom. Control 51, 1859–1859 (2006)
14. Helmke, U., Moore, J.: Optimization and dynamical systems. Proc. IEEE 84, 907 (1996)
15. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)
16. Ranjan, R., Patel, V., Chellappa, R.: Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans. Pattern Anal. Mach. Intell. 41, 121–135 (2019)
17. Popović, N., Paudel, D., Probst, T., Sun, G., Gool, L.: Compositetasking: understanding images by spatial composition of tasks. arXiv abs/2012.09030 (2020)
18. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. arXiv abs/1708.04896 (2020)
19. Zhang, H., Cissé, M., Dauphin, Y., Lopez-Paz, D.: Mixup: beyond empirical risk minimization. arXiv abs/1710.09412 (2018)
20. Cubuk, E.D., Zoph, B., Shlens, J., Le, Q.V.: Randaugment: practical data augmentation with no separate search. arXiv abs/1909.13719 (2019)
21. Vahdat, A., Kautz, J.: NVAE: a deep hierarchical variational autoencoder. arXiv abs/2007.03898 (2020)
22. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. CoRR abs/1312.6114 (2014)
23. Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference in deep generative models. In: ICML (2014)
24. Pascanu, R., Mikolov, T., Bengio, Y.: Understanding the exploding gradient problem. arXiv abs/1211.5063 (2012)
25. Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., Finn, C.: Gradient surgery for multi-task learning. arXiv abs/2001.06782 (2020)
26. Vuckovic, J.: Kalman gradient descent: adaptive variance reduction in stochastic optimization. arXiv abs/1810.12273 (2018)
27. Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv abs/1212.5701 (2012)
28. Man, Z., Wu, H.R., Liu, S.X.F., Yu, X.: A new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks. IEEE Trans. Neural Netw. 17, 1580–1591 (2006)
29. Kang, Q., Song, Y., Ding, Q., Tay, W.P.: Stable neural ODE with Lyapunov-stable equilibrium points for defending against adversarial attacks. arXiv abs/2110.12976 (2021)
30. Ahmadi, A.A., Parrilo, P.A.: Joint spectral radius of rank one matrices and the maximum cycle mean problem. In: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 731–733. IEEE (2012)
31. Ahmadi, A.A., Jungers, R.M., Parrilo, P.A., Roozbehani, M.: Joint spectral radius and path-complete graph Lyapunov functions. SIAM J. Control. Optim. 52, 687–717 (2014)
32. Altschuler, J.M., Parrilo, P.A.: Lyapunov exponent of rank-one matrices: ergodic formula and inapproximability of the optimal distribution. SIAM J. Control. Optim. 58, 510–528 (2020)
33. Daubechies, I., Lagarias, J.C.: Two-scale difference equations II. Local regularity, infinite products of matrices and fractals. SIAM J. Math. Anal. 23, 1031–1079 (1992)
34. Deaecto, G.S., Egidio, L.N.: Practical stability of discrete-time switched affine systems. In: 2016 European Control Conference (ECC), pp. 2048–2053 (2016)
35. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
36. Shahbazi, M., Huang, Z., Paudel, D.P., Chhatkuli, A., Van Gool, L.: Efficient conditional GAN transfer with knowledge propagation across classes. In: 2021 IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2021 (2021)
37. Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597 (2016)
38. Li, Y., Zhang, X., Chen, D.: CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1091–1100 (2018)
39. Farina, L., Rinaldi, S.: Positive Linear Systems: Theory and Applications (2000)
40. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
41. Wang, Q., Gao, J., Lin, W., Li, X.: NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Trans. Pattern Anal. Mach. Intell. 43, 2141–2149 (2020)
42. Sun, G., Liu, Y., Probst, T., Paudel, D., Popović, N., Gool, L.: Boosting crowd counting with transformers. arXiv abs/2105.10926 (2021)
43. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
44. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)
45. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)
46. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS (2017)

Neighborhood Region Smoothing Regularization for Finding Flat Minima in Deep Neural Networks

Yang Zhao and Hao Zhang

Department of Electronic Engineering, Tsinghua University, Beijing 100081, China
[email protected], [email protected]

Abstract. Due to diverse architectures in deep neural networks (DNNs) with severe overparameterization, regularization techniques are critical for finding optimal solutions in the huge hypothesis space. In this paper, we propose an effective regularization technique, called Neighborhood Region Smoothing (NRS). NRS leverages the finding that models benefit from converging to flat minima, and tries to regularize the neighborhood region in weight space to yield approximate outputs. Specifically, the gap between outputs of models in the neighborhood region is gauged by a defined metric based on the Kullback-Leibler divergence. This metric provides insights, in accordance with the minimum description length principle, on interpreting flat minima. By minimizing both this divergence and the empirical loss, NRS explicitly drives the optimizer towards converging to flat minima, and is meanwhile compatible with other common regularizations. We confirm the effectiveness of NRS by performing image classification tasks across a wide range of model architectures on commonly-used datasets such as CIFAR and ImageNet, where generalization ability is universally improved. Also, we empirically show that the minima found by NRS have relatively smaller Hessian eigenvalues compared to the conventional method, which is considered evidence of flat minima (code is available at https://github.com/zhaoyang-0204/nrs).

Keywords: Deep learning · Optimization · Flat minima

1 Introduction

Driven by the rapid development of computation hardware, the scale of today's deep neural networks (DNNs) is increasing explosively, where the number of parameters has significantly exceeded the sample size, even by thousands of times [7,10,11]. These heavily overparameterized DNNs induce huge hypothesis weight spaces. On the one hand, this provides the power to fit extremely complex or even arbitrary functions [13]; on the other hand, it becomes much more challenging to seek optimal minima while resisting overfitting during training [26]. Generally, minimizing only the empirical training loss that characterizes


Fig. 1. Description of model divergence and flat minima. Given the same input, model divergence gauges the distance between models in the output space. At flat minima, the divergence between the models in the neighborhood region δθ and the reference model θ is relatively low compared to sharp minima. NRS minimizes the model divergence for models in this neighborhood to explicitly find flat minima during training.

the difference between the true labels and the predicted labels (such as the categorical cross-entropy loss) cannot provide a sufficient guarantee of acquiring minima with satisfactory model generalization [20]. Regularization techniques are in higher demand than ever for guiding optimizers towards minima with better model generalization. Researchers find that minima of well-generalized models are more likely to be located in regions of the loss landscape where the geometry of the neighborhood is flat [12,16]. In other words, models would benefit from biasing the convergence towards flat minima during training. A common simple idea is to minimize the empirical loss of models with random perturbations. Unfortunately, [9,30] empirically suggest that optimizing only the empirical loss of such randomly perturbed models would not help to flatten the loss landscape (Fig. 1).

In this paper, we show that by adding proper regularizations, such simple random perturbation can also lead to flat minima. We propose a simple and effective regularization technique, called neighborhood region smoothing (NRS). NRS regularizes the models with random perturbation in the neighborhood region to yield the same outputs instead of the same loss values. To accomplish such regularization, we first define the model divergence, a metric that gauges the divergence between models in the same weight space. We demonstrate that model divergence could provide an interpretation of flat minima from


the perspective of information theory, sharing the same core idea as the minimum description length principle. Since model divergence is differentiable, NRS regularization can be implemented in a straightforward manner in practice: it minimizes the model divergence between models in the neighborhood region during optimization, smoothing the surface around the target minima.

In our experiments, we show that NRS regularization improves the generalization ability of various models on different datasets. We show that NRS regularization is compatible with other regularization techniques and can improve model performance further when combined with them. Also, we confirm that for models with random perturbation in the neighborhood region, optimizing only the empirical loss does not lead to flat minima, just as [9,30] suggest. We empirically show that NRS regularization reduces the largest eigenvalue of the Hessian matrix, indicating that NRS regularization can indeed lead to flat minima.

2 Related Works

Training models to generalize better is a fundamental topic in deep learning, where regularization is one of the most critical techniques commonly used during training. Regularization is a rather broad concept involving techniques that may benefit the training process in various ways. Generally, common regularization techniques affect model training from three aspects. Firstly, some regularizations aim to reduce the searched hypothesis space, where the most conventional method is weight decay [17,19]. Secondly, others try to complicate the target task so that the models learn more "sufficiently". Typical methods include enlarging the input space, such as cutout [5] and auto-augmentation [4], and interfering with models during training, like dropout [23], spatial dropout [24] and stochastic depth regularization [14]. Thirdly, others implement normalizations, including batch normalization [15], layer normalization [1] and so on.

On the other hand, for a better understanding of generalization, researchers are also interested in studying the underlying factors associated with the generalization ability of models. Several characteristics have been shown to impact generalization, including the flatness of minima, margins of classifiers [2,22], sparsity [18] and the degree of feature extraction [27]. In particular, regarding the flatness of minima, despite the lack of theoretical justification and a strict definition [20], empirical evidence has been found for this phenomenon. For example, [16] demonstrate that the reason large-batch training leads to worse generalization than small-batch training is that large-batch training tends to stall at sharp minima. Moreover, by solving a minimax problem, [8,9,30] show that training could benefit from optimizing towards flat minima. Also, [28,29] explicitly regularize the gradient norm to bias training towards the convergence to flat minima. All of these works have shown that model performance can be improved by adopting proper techniques that encourage flat minima.

3 Neighborhood Region Smoothing (NRS) Regularization

3.1 Basic Background

Consider inputs x ∈ X and labels y ∈ Y which conform to the distribution D, and a neural network model f parameterized by parameters θ in a weight space Θ, which maps the given input space X to the corresponding output space Ŷ,

f(·; θ) : X → Ŷ.    (1)

For classification tasks, ŷ ∈ Ŷ could be the vector of the predicted probability distribution over the classes. Theoretically, given a loss function l(·), we expect to minimize the expected loss L_e(θ) = E_{x,y∼D}[l(ŷ, y, θ)]. However, this is intractable in practice. We instead seek to acquire the model via minimizing the empirical loss L_S(θ) = (1/N) Σ_{i=1}^{N} l(ŷ_i, y_i, θ), where the training set S = {(x_i, y_i)}_{i=0}^{N} is assumed to be drawn independently and identically from the distribution D. The gap between the expected loss and the empirical loss directly leads to generalization errors of models.

For over-parametrized models, when minimizing the empirical loss on the training set, the huge hypothesis weight space is filled with numerous minima. These minima may have approximately the same empirical training loss but meanwhile provide diverse generalization ability. How to discriminate which minima are favorable to model generalization is critical for getting better training performance. In particular, minima of models with better generalization are supposed to be located on flatter surfaces, and these minima are often called flat minima. Although there may be different definitions describing flat minima [6,16], according to [12] the core interpretation conveys the similar idea that "a flat minimum is a large connected region in weight space where the error remains approximately constant". In other words, at flat minima we expect that models parametrized by parameters in the neighborhood area of θ yield approximate outputs.

3.2 Model Divergence

First, we would like to clarify a basic concept, the equivalence of two models.

Definition 1. Two models f(·; θ) and f(·; θ′) are called observationally equivalent if f(x; θ) = f(x; θ′) for ∀ x ∈ X.

If two models are observationally equivalent, then they will definitely have the same outputs given any input in the input space, and will therefore also have the same loss. But if the losses of two models are the same, this does not sufficiently ensure that they are observationally equivalent. In other words, equal loss is not a sufficient condition for two models to be observationally equivalent. From the previous demonstration, to find flat minima, we would expect that the reference model and the neighborhood models are as observationally equivalent


as possible. So, it is natural to optimize to reduce the gap between these models, for which we need to gauge such a gap first. In general, this gap of outputs is usually assessed via the difference in empirical loss between these models when given the same inputs. But this is apparently inappropriate, because it is highly possible for models that are not observationally equivalent to yield the same loss value, especially for the categorical cross-entropy loss. Thus, minimizing this loss gap cannot fully guarantee that models will converge to flat minima. To this end, we measure the gap between outputs by employing the Kullback-Leibler divergence.

Definition 2. For model f(·; θ) and model f(·; θ′) in the same weight space Θ, given x ∈ X, the gap of the two models in the output space could be gauged via the Kullback-Leibler divergence,

d_p(θ, θ′) = E_x [D_KL(f(x; θ) || f(x; θ′))],    (2)

where D_KL(·||·) denotes the KL divergence. We call d_p(θ, θ′) the model divergence between θ and θ′.

Note that d_p(θ, θ′) ≥ 0. Clearly, the lower the value of d_p(θ, θ′), the more approximate the outputs that the two models yield. Meanwhile, it is obvious that,

Lemma 1. The two models f(·; θ) and f(·; θ′) are observationally equivalent if and only if d_p(θ, θ′) = 0.

Basically, this indicates that minimizing model divergence could in a sense serve as a sufficient condition for converging to flat minima during optimization. Further, we would like to discuss the interpretation of flat minima from the perspective of model divergence. Based on information theory, the KL divergence of the two output distributions f(x; θ) and f(x; θ′) provides insights with regard to how many additional bits are required to approximate the true distribution f(x; θ) when using f(x; θ′). Lower divergence indicates less information loss when using f(x; θ′) to approximate f(x; θ). So for flat minima, since the model is expected to have approximate outputs with models in its neighborhood region, they should have low model divergence. This indicates that when using models f(·; θ′) in the neighborhood to approximate the true model f(·; θ), only little extra information is required. In contrast, high model divergence would mean that more extra information is required. Therefore, describing a flat minimum requires much less information than describing a sharp minimum. Remarkably, based on the minimum description length (MDL) principle, better models would benefit from simpler descriptions. Accordingly, compared to a sharp minimum, a flat minimum will imply better model performance, since the required information description is shorter. This is in accordance with Occam's razor principle in deep learning.
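For completeness, an empirical estimate of the model divergence over a mini-batch can be written compactly in JAX (the framework used for our experiments, cf. Sect. 4). This is only an illustrative sketch assuming apply_fn returns per-class log-probabilities; the names are our own and not taken from the released code:

import jax.numpy as jnp

def model_divergence(apply_fn, params, params_prime, x):
    # d_p(theta, theta') from Definition 2, estimated over the batch x.
    logp = apply_fn(params, x)        # log f(x; theta)
    logq = apply_fn(params_prime, x)  # log f(x; theta')
    # KL(f(x;theta) || f(x;theta')) = sum_c p_c * (log p_c - log q_c)
    kl = jnp.sum(jnp.exp(logp) * (logp - logq), axis=-1)
    return jnp.mean(kl)               # expectation over the batch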

3.3 NRS Regularization

Generally, a flat minimum suggests that for ∀ δθ ∈ B(0, ε), where B(0, ε) is the Euclidean ball centered at 0 with radius ε, the model f(·, θ) and its neighborhood


Algorithm 1. Neighborhood Region Smoothing (NRS) Regularization

Input: Training set S = {(x_i, y_i)}_{i=0}^{N}; loss function l(·); batch size B; learning rate η; total steps K; neighborhood region size ε; model divergence penalty coefficient α.
Parameter: Model parameters θ.
Output: Model with final optimized weights θ̂.
1: Parameter initialization θ_0; get the number of devices M.
2: for step k = 1 to K do
3:   Get sample batch B = {(x_i, y_i)}_{i=0}^{B}.
4:   Shard B based on the number of devices, B = B_0 ∪ B_1 ∪ ... ∪ B_M, where |B_0| = ... = |B_M|.
5:   Do in parallel across devices:
6:     Make a unique pseudo-random number generator κ.
7:     Generate a random perturbation δθ within the area B(0, ε) based on κ.
8:     Create the neighborhood model f(·, θ + δθ).
9:     Compute the gradient ∇_θ L(θ) of the final loss based on batch B_i.
10:  Synchronize and collect the gradient g.
11:  Update parameters θ_{k+1} = θ_k − η · g.
12: end for

model f(·, θ + δθ) are expected to have both low model divergence and low empirical training loss. Therefore, we manually add additional regularization to the loss during training for finding flat solutions,

min_θ  L_S(θ) + α · d_p(θ, θ + δθ) + L_S(θ + δθ),    (3)

where α denotes the penalty coefficient of the model divergence d_p(θ, θ + δθ). In Equation (3), the final training loss L(θ) contains three items:

– The first item is the conventional empirical training loss, denoting the gap between the true labels and the predicted labels. Optimizing this item drives the predicted labels towards the true labels.
– The second item is the model divergence regularization, denoting the divergence between the outputs yielded separately by the model with parameters θ and the model θ + δθ in the neighborhood. Optimizing this item explicitly drives the model to yield approximate outputs in the neighborhood.
– The third item is the empirical training loss of the neighborhood model, denoting the gap between the true labels and the labels predicted by this neighborhood model. Optimizing this item explicitly forces the neighborhood model to also learn to predict the true labels.

Compared to minimizing only the empirical loss of perturbed models L(θ + δθ), Equation (3) imposes a much more rigorous constraint on models in the neighborhood region. This also allows us to be relaxed about model selection in the neighborhood, which enables a simple random perturbation. In order to provide sufficient δθ samples in the neighborhood B(0, ε), it would be best to train each input sample with a distinct δθ. However, this would not fit in


the general mini-batch parallel training paradigm well in practice, because it would require an extra computation graph for each distinct δθ when deployed with common deep learning frameworks like TensorFlow, JAX and PyTorch. To balance computational accuracy and efficiency, we generate a unique δθ for each training device instead of for each input sample. In this way, each device needs to generate only one neighborhood model, meaning that the same computation graph is used for all samples loaded on this device. All the mini-batch samples fully enjoy the parallel computing on each device, which significantly decreases the computation budget compared to training each input sample with a distinct δθ. Algorithm 1 summarizes the pseudo-code of the full implementation of NRS regularization. In Algorithm 1, the default optimizer is stochastic gradient descent. We will show the effectiveness of the NRS algorithm in the following experimental section.
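To make the per-device procedure concrete, the following JAX sketch combines a per-device perturbation with the loss of Equation (3). It is a minimal illustration under our own assumptions (a classification model apply_fn returning logits, one-hot labels, and a simple scheme for drawing δθ inside B(0, ε)); the released implementation may differ:

import jax
import jax.numpy as jnp

def sample_perturbation(params, key, eps):
    # Draw a random direction and rescale it to lie inside B(0, eps).
    # (The exact sampling scheme inside the ball is our assumption.)
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noise = [jax.random.normal(k, p.shape) for p, k in zip(leaves, keys)]
    norm = jnp.sqrt(sum(jnp.sum(n ** 2) for n in noise))
    noise = [eps * n / (norm + 1e-12) for n in noise]
    return jax.tree_util.tree_unflatten(treedef, noise)

def nrs_loss(params, batch, key, apply_fn, eps, alpha):
    x, y = batch                                    # y assumed one-hot
    delta = sample_perturbation(params, key, eps)   # one delta per device
    params_nb = jax.tree_util.tree_map(lambda p, d: p + d, params, delta)
    logp = jax.nn.log_softmax(apply_fn(params, x))      # f(x; theta)
    logq = jax.nn.log_softmax(apply_fn(params_nb, x))   # f(x; theta + delta)
    ce = lambda lq: -jnp.mean(jnp.sum(y * lq, axis=-1)) # cross-entropy
    divergence = jnp.mean(jnp.sum(jnp.exp(logp) * (logp - logq), axis=-1))
    # Eq. (3): L_S(theta) + alpha * d_p(theta, theta + delta) + L_S(theta + delta)
    return ce(logp) + alpha * divergence + ce(logq)

# In a pmapped training step, per-device gradients of nrs_loss would be
# averaged with jax.lax.pmean(grads, 'dev'), matching lines 9-11 of Algorithm 1.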

4 Experimental Results

In this section, we demonstrate the effectiveness of NRS regularization by studying image classification performance on commonly-used datasets with extensive model architectures. In our experiments, the datasets include Cifar10, Cifar100 and ImageNet, and the model architectures include VGG [21], ResNet [11], WideResNet [25], PyramidNet [10] and Vision Transformer [7]. All models are trained from scratch to convergence, implemented using the JAX framework on an NVIDIA DGX Station A100 with four NVIDIA A100 GPUs.

4.1 Cifar10 and Cifar100

We start our investigation of NRS by evaluating its effect on the generalization ability of models on the Cifar10 and Cifar100 datasets. Five network architectures are trained from scratch, including both CNN models and ViT models: VGG16, ResNet18, WideResNet-28-10, PyramidNet-164 and the Vision Transformer family¹. For the datasets, we adopt several different augmentation strategies. One is the Basic strategy, which follows the conventional four-pixel extra padding, random cropping and horizontal random flipping. Beyond the Basic strategy, we also perform more complex data augmentations to show that NRS does not conflict with other regularization techniques. Specifically, we adopt the Cutout strategy when training CNN models, which additionally performs cutout regularization [5]. When training ViT models, we instead adopt a Heavy strategy, which additionally performs mixup regularization and trains much longer compared to the Basic strategy.

¹ The names of all the mentioned model architectures are the same as in their original papers.


Table 1. Testing accuracy of various models on Cifar10 and Cifar100 when using the three training strategies.

CNN Models           Cifar10                        Cifar100
VGG16                Basic        Cutout            Basic        Cutout
Baseline             93.12±0.08   93.95±0.11        72.28±0.17   73.34±0.22
RPR                  93.14±0.21   93.91±0.17        72.31±0.21   73.29±0.19
NRS                  93.79±0.11   94.77±0.17        73.61±0.20   75.32±0.19

ResNet18             Basic        Cutout            Basic        Cutout
Baseline             94.88±0.12   95.45±0.16        76.19±0.21   77.03±0.11
RPR                  94.90±0.24   95.41±0.23        76.25±0.31   76.98±0.18
NRS                  95.84±0.15   96.47±0.10        78.67±0.17   79.88±0.14

WideResNet-28-10     Basic        Cutout            Basic        Cutout
Baseline             96.17±0.12   97.09±0.17        80.91±0.13   82.25±0.15
RPR                  96.14±0.11   97.02±0.15        80.94±0.19   82.19±0.24
NRS                  96.94±0.17   97.55±0.13        82.77±0.16   83.94±0.16

PyramidNet-164       Basic        Cutout            Basic        Cutout
Baseline             96.32±0.15   97.11±0.12        82.33±0.19   83.50±0.17
RPR                  96.31±0.19   97.09±0.19        82.25±0.24   83.58±0.22
NRS                  97.23±0.19   97.72±0.11        84.61±0.24   86.29±0.18

ViT Models           Cifar10                        Cifar100
ViT-TI16             Basic        Heavy             Basic        Heavy
Baseline             83.07±0.08   89.22±0.18        60.12±0.15   65.44±0.20
RPR                  83.02±0.13   89.28±0.14        60.18±0.21   65.21±0.17
NRS                  84.17±0.09   90.47±0.14        61.29±0.13   66.84±0.22

ViT-S16              Basic        Heavy             Basic        Heavy
Baseline             85.14±0.11   92.49±0.09        61.83±0.20   72.40±0.17
RPR                  85.17±0.13   92.12±0.11        61.92±0.14   72.33±0.17
NRS                  85.86±0.10   93.55±0.16        62.59±0.17   73.17±0.14

ViT-B16              Basic        Heavy             Basic        Heavy
Baseline             88.42±0.12   91.54±0.11        63.49±0.17   72.32±0.21
RPR                  88.39±0.14   91.57±0.12        63.38±0.15   72.35±0.17
NRS                  89.09±0.08   92.23±0.14        64.46±0.17   73.11±0.13

We focus our investigation on the comparison between three training schemes. The first trains with the standard categorical cross-entropy loss, i.e. min_θ L(θ); this is our baseline. The second training scheme optimizes the same cross-entropy loss for the model with a random perturbation (RPR) in the neighborhood region instead of the true model, i.e. min_θ L(θ + δθ). This scheme serves to confirm that minimizing the loss of simple random perturbations alone does not help the generalization ability of models, just as [9,30] suggest. The last training scheme trains with our NRS regularization. It should be noted that we keep all other deployment details


the same for the three schemes during training, except for the techniques mentioned in each specific scheme.

For the common training hyperparameters, we perform a grid search to acquire the best performance for the CNN models. Specifically, the base learning rate is searched over {0.01, 0.05, 0.1, 0.2}, the weight decay coefficient over {0.0005, 0.001} and the batch size over {128, 256}. We also adopt a cosine learning rate schedule and the SGD optimizer with 0.9 momentum during training. As for the ViT models, we use fixed hyperparameters, with different settings for the two data augmentation strategies. For the Basic strategy, the learning rate is 0.001, the weight decay is 0.3, the number of training epochs is 300, the batch size is 256 and the patch size is 4 × 4, while for the Heavy strategy the learning rate is 2·10⁻⁴, the weight decay is 0.03 and the number of training epochs is 1200. Meanwhile, we adopt the Adam optimizer during training. Additionally, for all the CNN and ViT models, we use three different random seeds and report the mean and variance across the testing accuracies of the three seeds. The RPR scheme involves one extra hyperparameter, the radius ε of the neighborhood region; we perform a grid search over {0.05, 0.1, 0.5} for it. The NRS strategy involves two extra hyperparameters, the radius ε of the neighborhood region and the penalty coefficient α of the model divergence. We adopt the same search for ε, and α is searched over {0.5, 1.0, 2.0}. Notably, the grid search over ε and α in both RPR and NRS is performed on the basis of the hyperparameters of the best model acquired by the first training strategy.

Table 1 shows the corresponding testing accuracies on Cifar10 and Cifar100, where all reported numbers are the best results obtained during the grid search of hyperparameters. We can see that all the testing accuracies are improved by NRS regularization to some extent compared to the baseline, which confirms its benefit for model training. We also try NRS regularization on the recent Vision Transformer models. When performing Basic augmentation, the testing accuracy of ViT models is much lower than that of CNN models, because training ViT models generally requires plenty of input samples. In this case, NRS can also improve the generalization ability of such models. When performing Heavy augmentation, training performance improves significantly, and again, performance can be improved further when training with the NRS scheme. From the results, we find that NRS regularization does not conflict with optimizers and current regularizations like dropout (in VGG16) and batch normalization, which is important for practical implementation. Additionally, we also confirm that simply optimizing the models with a neighborhood random perturbation, as in RPR, indeed has no effect on improving the generalization ability of models.


Table 2. Testing accuracy of various models on the ImageNet dataset when using the standard training strategy (Baseline) and the NRS regularization strategy.

ImageNet
VGG16       Top-1 Accuracy   Top-5 Accuracy
Baseline    73.11±0.08       91.12±0.08
NRS         73.52±0.09       91.39±0.09

ResNet50    Top-1 Accuracy   Top-5 Accuracy
Baseline    75.45±0.15       93.04±0.07
NRS         76.29±0.13       93.53±0.10

ResNet101   Top-1 Accuracy   Top-5 Accuracy
Baseline    77.15±0.11       93.91±0.07
NRS         78.02±0.10       94.44±0.08

4.2 ImageNet

Next, we check the effectiveness of NRS regularization on a large-scale dataset, ImageNet. Here, we take the VGG16, ResNet50 and ResNet101 architectures as our experimental targets. For the dataset, all images are resized and cropped to 224 × 224, and then randomly flipped in the horizontal direction. For the common hyperparameters, instead of performing a grid search, we fix the batch size to 512, the base learning rate to 0.2 and the weight decay coefficient to 0.0001. During training, we smooth the labels with a factor of 0.1. All models are trained for a total of 100 epochs with three different seeds. Our investigation focuses on the comparison between two training strategies: standard training with the categorical cross-entropy loss, which is our baseline, and training with NRS regularization. For the hyperparameters of NRS regularization, we fix ε to 0.1 and α to 1.0 according to our previous tuning experience. Table 2 shows the corresponding results. As can be seen in Table 2, the testing accuracy is again improved to some extent when using NRS regularization. This further confirms that NRS regularization is beneficial to the generalization ability of models.

5 Further Studies of NRS

5.1 Parameter Selection in NRS

In this section, we investigate the influence of the two parameters ε and α in NRS regularization on the results. The investigation is conducted on Cifar10 and Cifar100 using WideResNet-28-10. We train the models from scratch using NRS with the same common hyperparameters as the best models acquired by the baseline strategy. We then perform the grid search for the two parameters using the same scheme as in the previous section.


Fig. 2. Evolution of testing accuracy on Cifar10 and Cifar100 during training when trained with different parameters in NRS. The black dashed lines are reference lines indicating the best testing accuracy obtained with the standard training strategy.

Figure 2 shows the evolution of testing accuracy during training when using the corresponding different parameters. We can see that for all the deployed ε and α, the generalization ability is somewhat improved on both the Cifar10 and Cifar100 datasets compared to the baseline (the black reference line in the figure). This again demonstrates the effectiveness of NRS regularization. From the figure, we find that the model achieves the best performance at ε = 0.1 and α = 1.0 for Cifar10 and at ε = 0.2 and α = 1.0 for Cifar100. Also, the generalization improvement on Cifar100 is generally larger than that on Cifar10.

5.2 Eigenvalues of Hessian Matrix

In this section, we investigate the intrinsic change of models when using NRS regularization. In general, the eigenvalues of the Hessian matrix are considered to be connected with the flatness of minima; specifically, the largest eigenvalue of the Hessian matrix at a flat minimum is expected to be smaller than that at a sharp minimum. Since the dimension of the weight space is huge, computing the Hessian matrix directly is nearly impossible. Therefore, we employ the approximate


method introduced in [3], which calculates the diagonal blocks of the Hessian matrix recursively from the deep layers to the shallower layers. We again use the WideResNet-28-10 model and Cifar10 as our investigation target, and we use the best models acquired by the three training strategies in the previous section. We calculate the eigenvalues of the Hessian matrix for the last layer in each model. Table 3 reports the results.

Table 3. The largest eigenvalue of the Hessian matrix of models trained with the three different strategies.

WideResNet-28-10   λmax
Baseline           49.15
RPR                51.79
NRS                12.42

As can be seen from the table, using NRS regularization significantly reduces the largest eigenvalue of the Hessian matrix compared to the other two strategies. Also, the RPR strategy indeed does not lead to flat minima. This again verifies that NRS regularization finds flat minima during training.
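As a lightweight cross-check of such measurements (this is not the block-diagonal method of [3], only a generic alternative), the dominant Hessian eigenvalue of a layer can also be estimated from Hessian-vector products with power iteration, e.g. in JAX:

import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def dominant_hessian_eigenvalue(loss_fn, params, num_iters=20, seed=0):
    # Estimates the eigenvalue of largest magnitude of the Hessian of
    # loss_fn(params) via power iteration on Hessian-vector products.
    flat, unravel = ravel_pytree(params)
    grad_flat = jax.grad(lambda p: loss_fn(unravel(p)))

    def hvp(v):
        # Hessian-vector product via forward-over-reverse differentiation.
        return jax.jvp(grad_flat, (flat,), (v,))[1]

    v = jax.random.normal(jax.random.PRNGKey(seed), flat.shape)
    v = v / jnp.linalg.norm(v)
    eig = 0.0
    for _ in range(num_iters):
        hv = hvp(v)
        eig = jnp.vdot(v, hv)          # Rayleigh quotient estimate
        v = hv / (jnp.linalg.norm(hv) + 1e-12)
    return eig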

6 Conclusion

In this paper, we propose a simple yet effective regularization technique, called Neighborhood Region Smoothing (NRS), for finding flat minima during training. The key idea of NRS is to regularize the neighborhood region of a model so that it yields approximate outputs. Using outputs in NRS gives a stronger regularization than using loss values, so a simple random perturbation in the neighborhood region is effective. We define the model divergence to gauge the gap between outputs of models in the neighborhood region. In this way, NRS regularization is achieved by explicitly minimizing both the empirical loss and the model divergence. In our experiments, we show that NRS regularization improves the generalization ability of a wide range of models on diverse datasets compared to two other training strategies. We also investigate the best hyperparameters for NRS on the Cifar10 and Cifar100 datasets. Finally, the smaller eigenvalue of the Hessian matrix confirms that NRS regularization could indeed lead to flat minima.

Acknowledgements. We would like to thank all the reviewers and the meta-reviewer for their helpful comments and kind advice. We would also like to thank Jean Kaddour from University College London for the kind discussions.


References

1. Ba, L.J., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint abs/1607.06450 (2016)
2. Bartlett, P.L., Foster, D.J., Telgarsky, M.: Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems, pp. 6240–6249 (2017)
3. Botev, A., Ritter, H., Barber, D.: Practical Gauss-Newton optimisation for deep learning. In: Proceedings of the 34th International Conference on Machine Learning. ICML 2017, vol. 70, pp. 557–565 (2017)
4. Cubuk, E.D., Zoph, B., Mané, D., Vasudevan, V., Le, Q.V.: Autoaugment: learning augmentation policies from data. arXiv preprint abs/1805.09501 (2018)
5. Devries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint abs/1708.04552 (2017)
6. Dinh, L., Pascanu, R., Bengio, S., Bengio, Y.: Sharp minima can generalize for deep nets. In: Proceedings of the 34th International Conference on Machine Learning. ICML 2017, vol. 70, pp. 1019–1028 (2017)
7. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: 9th International Conference on Learning Representations. ICLR 2021 (2021)
8. Du, J., et al.: Efficient sharpness-aware minimization for improved training of neural networks. arXiv preprint abs/2110.03141 (2021)
9. Foret, P., Kleiner, A., Mobahi, H., Neyshabur, B.: Sharpness-aware minimization for efficiently improving generalization. In: 9th International Conference on Learning Representations. ICLR 2021 (2021)
10. Han, D., Kim, J., Kim, J.: Deep pyramidal residual networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2017, pp. 6307–6315 (2017)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2016, pp. 770–778 (2016)
12. Hochreiter, S., Schmidhuber, J.: Flat minima. Neural Comput. 9(1), 1–42 (1997)
13. Hornik, K., Stinchcombe, M.B., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
14. Huang, G., Sun, Yu., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 646–661. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_39
15. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning. ICML 2015, vol. 37, pp. 448–456 (2015)
16. Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: generalization gap and sharp minima. In: 5th International Conference on Learning Representations. ICLR 2017 (2017)
17. Krogh, A., Hertz, J.A.: A simple weight decay can improve generalization. In: Advances in Neural Information Processing Systems, pp. 950–957 (1991)
18. Liu, S.: Learning sparse neural networks for better generalization. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. IJCAI 2020, pp. 5190–5191. ijcai.org (2020)
19. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: 7th International Conference on Learning Representations. ICLR 2019 (2019)
20. Neyshabur, B., Bhojanapalli, S., McAllester, D., Srebro, N.: Exploring generalization in deep learning. In: Advances in Neural Information Processing Systems, pp. 5947–5956 (2017)
21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations. ICLR 2015 (2015)
22. Sokolic, J., Giryes, R., Sapiro, G., Rodrigues, M.R.D.: Robust large margin deep neural networks. IEEE Trans. Sig. Process. 65(16), 4265–4280 (2017)
23. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
24. Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2015, pp. 648–656 (2015)
25. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Proceedings of the British Machine Vision Conference 2016. BMVC 2016 (2016)
26. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. In: 5th International Conference on Learning Representations. ICLR 2017 (2017)
27. Zhao, Y., Zhang, H.: Quantitative performance assessment of CNN units via topological entropy calculation. In: The Tenth International Conference on Learning Representations. ICLR 2022, Virtual Event, 25–29 April 2022 (2022)
28. Zhao, Y., Zhang, H., Hu, X.: Penalizing gradient norm for efficiently improving generalization in deep learning. In: International Conference on Machine Learning. ICML 2022, 17–23 July 2022, Baltimore, Maryland, USA (2022)
29. Zhao, Y., Zhang, H., Hu, X.: Randomized sharpness-aware training for boosting computational efficiency in deep learning. arXiv preprint abs/2203.09962 (2022)
30. Zheng, Y., Zhang, R., Mao, Y.: Regularizing neural networks via adversarial model perturbation. In: IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2021, pp. 8156–8165 (2021)

Author Index

A
Abyaneh, Mahsa Maleki  142
Aggarwal, Alakh  378
Anwar, Saeed  326

B
Barnes, Nick  326
Boyer, Edmond  360
Budagavi, Madhukar  378

C
Cao, Zhiguo  228
Chang, Yadong  432
Chen, Baohua  89
Chen, Gang  486
Chen, Guanying  450
Chen, Panyue  296
Chen, Wenqing  523
Chen, Yijun  432
Chen, Zhixiang  89
Cheng, Ran  469

D
Dai, Yuchao  450
de La Gorce, Martin  360
Deng, Lei  89
Deng, Shuixin  89
Ding, Errui  450
Ding, Yikang  280
Du, Hang  106
Du, Shan  72
Duan, Shukai  618
Duan, Yuanzhi  618

E
Endo, Takeshi  311

F
Fan, Caoyun  523
Fang, Yi  245
Fang, Zilin  635
Feng, Wensen  280
Franco, Jean-Sébastien  360
Fu, Yeyu  89
Fua, Pascal  344
Furukawa, Yasutaka  142

G
Gool, Luc Van  635
Gottlieb, Noam  556
Gregson, James  72
Guillard, Benoît  344
Guo, Xiang  450
Guo, Xiaohu  378
Guo, Xindong  396

H
Han, Xie  396
He, Hao  523
He, Peng  618
Hoai, Minh  413
Hogue, Steven  378
Hong, Chaoyi  228
Hu, Haiqing  55
Hu, Xiaofang  618
Hua, Binh-Son  413
Huang, Dihe  280
Huang, Hao  158, 245
Hui, Le  176

J
Jiang, Haobo  20
Jiang, Wenqing  469
Jiang, Xinyu  469
Jin, Yaohui  523

K
Kaji, Seigo  311
Kang, Seokhyeong  539
Kang, Xiatao  602
Kim, Aeri  539
Kim, Hyun-Woo  193
Kneip, Laurent  469
Kuang, Liqun  396
Kwon, Eunji  539

L
Lan, Kaihao  20
Lee, Gun-Hee  193
Lee, Seong-Whan  193
Lee, Seungju  539
Li, Chengxi  602
Li, Hongdong  36, 123
Li, Jianwei  55
Li, Jinyang  55
Li, Junxuan  36
Li, Ping  602
Li, Ren  344
Li, Xiang  158, 245
Li, Xingyi  228
Li, Xiu  89
Li, Xutao  432
Li, Yitian  523
Li, Zhenyang  280
Li, Zhiheng  280
Lin, Guosheng  228
Lin, Wang  571
Liu, Guanming  296
Liu, Lige  469
Liu, Qiang  618
Liu, Yue  262
Lu, Chenguang  262
Luo, Yan  504

M
Matono, Haruki  311

N
Ni, Saifeng  378

O
Oh, Myeong-Seok  193
Ouyang, Ting  3

P
Paudel, Danda Pani  635
Peng, Chao  571
Probst, Thomas  635
Pu, Shiliang  106

Q
Qiu, Shi  326

R
Remelli, Edoardo  344

S
Schwertfeger, Sören  469
Shabani, Mohammad Amin  142
Shahbazi, Mohamad  635
Shi, Chao  504
Shi, Yujiao  36, 123
Shima, Takeshi  311
Sigmund, Dick  211
Song, Weilian  142
Sun, Jiadai  450
Sun, Tao  469
Sun, Ting  89
Sun, Yu  396

T
Takemura, Masayuki  311
Tan, Xiao  450
Tang, Linghua  176
Tang, Qiang  72
Teoh, Andrew Beng Jin  211
Tian, Jidong  523
Tiong, Leslie Ching Ow  211
Tran, Anh Tuan  413
Tran, Bach  413

W
Wang, Fengpin  587
Wang, Jikai  378
Wang, Jingjing  106
Wang, Jinjia  587
Wang, Shan  123
Wang, Wenju  486
Wang, Xiaofan  587
Wang, Xiaolin  486
Wang, Yiran  228
Ward, Rabab  72
Wei, Jiaxin  469
Wen, Congcong  158
Werman, Michael  556

X
Xian, Ke  228
Xiao, Junwei  89
Xie, Di  106
Xie, Jin  20, 176
Xie, Yuseng  89
Xu, Minghao  469
Xu, Zhan  72
Xue, Hao  571

Y
Yan, Peizhi  72
Yan, Xuejun  106
Yang, Yitong  3
Yang, Zhengfeng  571
Yang, Zhulun  432
Yao, Jiayi  602
Ye, Xiaoqing  450
Yin, Hanxi  89
Yu, Xin  123
Yuan, Shuaihang  245
Yue, Kang  262

Z
Zeng, Xia  571
Zeng, Zhenbing  571
Zhang, Chongyang  504
Zhang, Hao  652
Zhang, Kai  280
Zhang, Yali  587
Zhang, Yongjun  3
Zhao, Haoliang  3
Zhao, Ping  296
Zhao, Rong  396
Zhao, Xiaomei  55
Zhao, Yang  652
Zhao, Yong  3
Zheng, Xianwei  432
Zhou, Boyao  360
Zhou, Haoran  486
Zhou, Huizhou  3
Zhou, Yue  618