Document Analysis and Recognition – ICDAR 2021 Workshops: Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part II (Image Processing, Computer Vision, Pattern Recognition, and Graphics) 3030861589, 9783030861582

This book constitutes the proceedings of the international workshops co-located with the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland, in September 2021.


English Pages 564 [553] Year 2021


Table of contents :
Foreword
Preface
Organization
Contents – Part II
Contents – Part I
ICDAR 2021 Workshop on Machine Learning (WML)
WML 2021 Preface
Organization
Workshop Chairs
Program Chairs
Program Committee
Benchmarking of Shallow Learning and Deep Learning Techniques with Transfer Learning for Neurodegenerative Disease Assessment Through Handwriting
1 Introduction
2 State of the Art Review
3 Shallow Learning for Online Handwriting Neurodegenerative Disease Assessment
3.1 Velocity-Based Features
3.2 Kinematic-Based Features
4 Deep Learning for Offline and Online Handwriting Neurodegenerative Disease Assessment
4.1 CNN Based Networks for Offline Recognition
4.2 Bi-directional LSTM RNN for Online Recognition
5 Dataset Description and Results
5.1 Dataset Description
5.2 Results
6 Results Discussion
7 Conclusions
References
Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel
1 Introduction
2 Method
2.1 Segmentation Module
2.2 Connection Module
2.3 Recognition Module
2.4 Loss Function
3 Experiment
3.1 Dataset
3.2 Experimental Settings
3.3 Experimental Results
4 Conclusion
References
Data Augmentation vs. PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition
1 Introduction
2 Related Work
2.1 Bidirectional Short-Long Term Memory Neural Networks (BLSTM) for OCR
2.2 Data Augmentation
2.3 Fully Convolutional Neural Networks in Modern OCR Systems
3 Propositions
3.1 Data Augmentation: A Probabilistic Approach
3.2 PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition
4 Experiments
4.1 Data-Sets
4.2 Experimental Setup
4.3 Sensitivity of CNN-BLSTM Systems and the Use of Data Augmentation
4.4 The Use of PyraD-DCNN for Text Recognition
References
A Handwritten Text Detection Model Based on Cascade Feature Fusion Network Improved by FCOS
1 Introduction
2 Related Work
2.1 Scene Text Detection
2.2 Text Detection of Ancient Books
3 Methodology
3.1 Improved Multi-scale Feature Fusion Network
3.2 Deformable Convolution
3.3 Other Improvements
4 Dataset
5 Experiments
5.1 Implementation Details
5.2 Ablation Study
5.3 Comparisons with Previous Methods
6 Conclusion and Future Work
References
Table Structure Recognition Using CoDec Encoder-Decoder
1 Introduction
2 Related Work
3 Table Structure Recognition
3.1 Column Identification via Proposed CoDec Encoder-Decoder
3.2 Row Identification Module
3.3 Table Structure Recognition and Data Extraction
4 Datasets and Protocols
4.1 Implementation Details
5 Results
6 Conclusion
References
Advertisement Extraction Using Deep Learning
1 Introduction
2 Related Work
3 Dataset
3.1 Collecting and Labeling Images
4 The PTPNet Model
4.1 PTPNet Architecture
4.2 Rendering Models
5 Loss Functions
5.1 Vertex-Based Loss Functions
5.2 Area Loss Functions
5.3 Hybrid Loss Functions
5.4 Loss Function for PTPNet
6 Experiments
6.1 Dataset Preparation
6.2 Study Models
6.3 PTPNet
7 Conclusion
References
Detection and Localisation of Struck-Out-Strokes in Handwritten Manuscripts
1 Introduction
2 Proposed Methodology
2.1 GAN Preliminary
2.2 Objective
2.3 Generator Architecture
2.4 Discriminator Architecture
2.5 Detection of Struck-Out Word
3 Database and Experimental Set-Up
3.1 Generation of Struck-Out Words
4 Experimental Results and Discussion
4.1 Localisation of Struck Out Stroke
4.2 Detection of Strike-Out Textual Component
4.3 Performance Comparison
4.4 Performance on Real Word Images
4.5 Performance on Satyajit Ray's Manuscript
5 Conclusion
References
Temporal Classification Constraint for Improving Handwritten Mathematical Expression Recognition
1 Introduction
2 Related Works
3 Methodology
3.1 Encoder-Decoder Architecture in WAP
3.2 Temporal Classification Constraint for WAP
3.3 Learning of the Model
3.4 CTC Auxiliary Learning
4 Evaluation
4.1 Datasets
4.2 Setting and Metrics for Evaluation
4.3 Network Configuration
4.4 Experiments
5 Conclusion
References
Using Robust Regression to Find Font Usage Trends
1 Introduction
2 Related Work
2.1 Deep Regression
2.2 Movie Poster Analysis
3 Movie Posters and Title Text
4 Year Prediction by Robust Regression
4.1 Image-Based Estimation
4.2 Shape-Based Estimation
4.3 Feature-Based Estimation
4.4 Robust Regression
4.5 Automatic Loss Switch
5 Experimental Results
5.1 Dataset
5.2 Architecture and Settings
5.3 Results
5.4 Relationship Between Color and Year
5.5 Relationship Between Shape and Year
5.6 Relationship Between Local Features and Year
6 Conclusion
References
Binarization Strategy Using Multiple Convolutional Autoencoder Network for Old Sundanese Manuscript Images
1 Introduction
2 Related Work
2.1 Convolutional Autoencoder Network
2.2 Skip Connections Method
2.3 Dropout Regularization
2.4 The Batch Normalization
3 Dataset and Methodology
3.1 Datasets
3.2 Experiment Procedures
3.3 Evaluation Method
4 Result and Discussion
4.1 Discussion on Five Hyper Parameters Experiments
4.2 Experiments on Batch Normalization, Dropout, Skip-Connection in CAE Network
5 Conclusion
References
A Connected Component-Based Deep Learning Model for Multi-type Struck-Out Component Classification
1 Introduction
2 Related Work
3 Proposed Model
3.1 Text Component Detection
3.2 Deep Learning Model for Struck-Out Component Classification
4 Experimental Results
4.1 Dataset Creation and Evaluation
4.2 Ablation Study
4.3 Experiments on Component Detection
4.4 Experiments on Struck-Out and Non-struck-Out Component Classification
5 Conclusions and Future Work
References
Contextualized Knowledge Base Sense Embeddings in Word Sense Disambiguation
1 Introduction
2 Related Work
2.1 Knowledge-Based Approaches
2.2 Supervised Approaches
2.3 Language Modelling Representation
3 Preliminaries
4 C-KASE
4.1 Context Retrieval
4.2 Word Embedding
4.3 Sense Embedding
5 Experimental Setup
6 Results
7 Conclusion
References
ICDAR 2021 Workshop on Open Services and Tools for Document Analysis (OST)
ICDAR-OST 2021 Preface
Organization
General Chairs
Program Committee Chairs
Program Committee
Automatic Generation of Semi-structured Documents
1 Introduction
2 Semi-structured Document Modeling
2.1 Semi-structured Document Definition
2.2 SSD Graphical Description
2.3 Layout and Content Variation in a SSD
2.4 SSD Annotation
3 Implemented Cases
3.1 Payslips
3.2 Receipts
3.3 Invoices
4 Generated Dataset Evaluation
4.1 Content Text Diversity
4.2 Layout Diversity
5 Use of the SSDs Generator in an Information Extraction System
6 Conclusion
References
DocVisor: A Multi-purpose Web-Based Interactive Visualizer for Document Image Analytics
1 Introduction
2 Related Work
3 Fully Automatic Region Parsing
3.1 Historical Manuscript Region Layout Segmentation
3.2 Table Data Layout Segmentation
3.3 DocBank Dataset
4 Box-Supervised Region Parsing
5 Optical Character Recognition (OCR)
6 Configuration
7 Future Work
8 Conclusion
References
ICDAR 2021 Workshop on Industrial Applications of Document Analysis and Recognition (WIADAR)
WIADAR 2021 Preface
Organization
Organizing Chairs
Program Committee
Object Detection Based Handwriting Localization
1 Introduction
2 Related Work
3 Method
3.1 Faster R-CNN
3.2 Cascade R-CNN
3.3 Other Techniques
4 Experiments
4.1 Dataset
4.2 Preprocessing
4.3 Training with Deep Learning Networks
4.4 Evaluation Scores
4.5 Postprocessing
4.6 Results
4.7 Generalizability
5 Discussion
5.1 Conclusion
5.2 Outlook
References
Toward an Incremental Classification Process of Document Stream Using a Cascade of Systems
1 Introduction
2 Related Work
3 The Cascade of Systems
3.1 Main Idea
3.2 Training Architecture
3.3 Set Division
3.4 Decision Process
4 Experiments
4.1 Architectures Used
4.2 Datasets
4.3 Experimentation Methodology
4.4 Results
5 Conclusion
5.1 Overview
5.2 Perspectives
References
Automating Web GUI Compatibility Testing Using X-BROT: Prototyping and Field Trial
1 Introduction
2 X-BROT: Web Compatibility Testing Tool
2.1 Overall Structure
2.2 Element Identification
2.3 Layout Comparison
2.4 Content Comparison
2.5 Evaluation
3 Field Trial: Feedback and Solutions
3.1 Multi-frame Page
3.2 Non-comparison Areas
3.3 Huge Web Pages
3.4 Discussion
4 Conclusion
References
A Deep Learning Digitisation Framework to Mark up Corrosion Circuits in Piping and Instrumentation Diagrams
1 Introduction
2 Related Work
3 Methodology
3.1 Connection Points and Pipe Specs Detection
3.2 Text Bloc Localisation
3.3 Text Alignment
3.4 Text Recognition
3.5 Linkage
4 Experiments and Results
4.1 Detection
4.2 Text Recognition
4.3 Final Output
5 Conclusion and Future Work
References
Playful Interactive Environment for Learning to Spell at Elementary School
1 Introduction
1.1 Context of Study Project
1.2 Destination Market and Product Industrialisation
2 Advances in Handwriting Recognition
2.1 Software Advances and Recent Deep Neural Networks Techniques
2.2 Recent Application Programming Interface in Free Access
3 Comprehensive Scenario for Tablet-Trainee Interactions
3.1 The Interactive Environment
3.2 Handwriting Recognition Pipeline
3.3 Software Environment of the STUDY Application
4 Experimentations
4.1 Results Obtained on Synthetic Writings and IAM Dataset
4.2 Results on a Sample of Real Child Handwriting
5 Conclusion
References
ICDAR 2021 Workshop on Computational Paleography (IWCP)
IWCP 2021 Preface
Organization
General Chairs
Program Committee
A Computational Approach of Armenian Paleography
1 Introduction
2 Notions of Armenian Paleography
3 A Dataset for Armenian Script Identification
4 Experiments and Discussion
5 Conclusion
References
Handling Heavily Abbreviated Manuscripts: HTR Engines vs Text Normalisation Approaches
1 Introduction
1.1 Abbreviations in Western Medieval Manuscripts
1.2 Expanding Abbreviations
1.3 Computational Approaches
2 MS BnF lat. 14525
3 Experiments and Results
3.1 Ground Truth Creation
3.2 HTR on Abbreviated and Expanded Data
3.3 Text Normalisation Approach
3.4 Results
4 Discussion and Further Research
References
Exploiting Insertion Symbols for Marginal Additions in the Recognition Process to Establish Reading Order
1 Introduction
1.1 The Problem
1.2 Background
2 Discussion
3 Conclusion
References
Neural Representation Learning for Scribal Hands of Linear B
1 Introduction
2 Prior Work
3 Methods
3.1 Architecture
4 Data
4.1 Collection
4.2 Augmentation
4.3 Statistics
5 Experiments
5.1 Findplace Classification
5.2 QVEC
5.3 Baselines
5.4 Training Details
6 Results
7 Conclusion
References
READ for Solving Manuscript Riddles: A Preliminary Study of the Manuscripts of the 3rd ṣaṭka of the Jayadrathayāmala
1 Introduction
1.1 Brief History of the Manuscript Transmission of the Jayadrathayāmala
1.2 Aim and Methods of the Present Study
2 Procedure
2.1 READ as a Virtual Research Environment (VRE) for Working with Manuscript Materials
2.2 Workflow in READ
3 Selected Discoveries
3.1 Manuscripts with Multiple Scripts/Letter Variants
3.2 Identification of the Same or Very Similar Handwriting
3.3 Overview of the JY Scripts from 12th to the 20th Centuries
3.4 Exploratory Statistical Analysis in R of the Data Produced by READ Paleography Report
4 Conclusion
References
ICDAR 2021 Workshop on Document Images and Language (DIL)
DIL 2021 Preface
Organization
Organizers
Program Committee
Additional Reviewers
A Span Extraction Approach for Information Extraction on Visually-Rich Documents
1 Introduction
2 Information Extraction as Span Extraction
3 Pre-training Objectives with Recursive Span Extraction
3.1 LayoutLM for Low-Resource Languages
3.2 Span Extraction Pre-training
4 Experiments
4.1 Experiment Setup
4.2 Results and Discussion
5 Conclusion
References
Recurrent Neural Network Transducer for Japanese and Chinese Offline Handwritten Text Recognition
1 Introduction
2 Related Work
3 RNN-Transducer Architecture
3.1 Visual Feature Encoder
3.2 Linguistic Context Encoder
3.3 Joint Decoder
3.4 Training and Inference Process
4 Datasets
4.1 Kuzushiji Dataset
4.2 SCUT-EPT Chinese Examination Dataset
5 Experiments
5.1 RNN-Transducer Configurations
5.2 Performance on Kuzushiji Dataset
5.3 Performance on SCUT-EPT Dataset
5.4 Sample Visualization
6 Conclusion
References
MTL-FoUn: A Multi-Task Learning Approach to Form Understanding
1 Introduction
2 Multitask Approach to Form Understanding
2.1 Common Backbone
2.2 Task Specific Heads
3 Experiments
3.1 Dataset
3.2 Evaluation Metric
3.3 Baseline Comparisons
3.4 Hyperparameters and Model Details
4 Conclusion and Future Work
References
VisualWordGrid: Information Extraction from Scanned Documents Using a Multimodal Approach
1 Introduction
2 Related Work
3 Data
4 Method
4.1 Document Representation
4.2 Model Architectures
4.3 Implementation Details
5 Experiments
5.1 Datasets
5.2 Results
6 Conclusion
References
A Transformer-Based Math Language Model for Handwritten Math Expression Recognition
1 Introduction
2 Related Works
3 Proposed Method
3.1 Transformer-Based Math Language Model
3.2 Combining Language Model with HME Recognizer
4 Experiments
4.1 Dataset
4.2 HME Recognizer
4.3 Experimental Settings
4.4 Evaluation
4.5 Error Analysis
5 Conclusion and Future Works
References
Exploring Out-of-Distribution Generalization in Text Classifiers Trained on Tobacco-3482 and RVL-CDIP
1 Introduction
2 Related Work
3 In- and Out-of-Distribution Data
3.1 In-Distribution Data
3.2 Out-of-Distribution Data
4 Experiments
4.1 Methodology
4.2 Results
5 Future Work
6 Conclusion
References
Labeling Document Images for E-Commence Products with Tree-Based Segment Re-organizing and Hierarchical Transformer
1 Introduction
2 Related Works
3 Product Document Image Dataset
4 Our Proposed Approach
4.1 Tree-Based Segment Re-organizing
4.2 Hierarchical Transformer
5 Experiments
5.1 Implementation Details
5.2 Experimental Results
6 Conclusion
References
Multi-task Learning for Newspaper Image Segmentation and Baseline Detection Using Attention-Based U-Net Architecture
1 Introduction
1.1 Related Work
1.2 Hand Crafted Features Methods
1.3 Deep Learning Feature Based Methods
1.4 Major Contributions
2 Proposed Multi-task Learning-Based Framework
2.1 Overview
2.2 Modified U-Net Architecture
3 Experimental Results and Discussion
3.1 Dataset Description and Pre-processing
3.2 Evaluation Metrics
3.3 Results of Text-Block Segmentation
3.4 Results of Baseline Detection
4 Conclusion
References
Data-Efficient Information Extraction from Documents with Pre-trained Language Models
1 Introduction
2 Related Works on Information Extraction (IE)
3 Models
3.1 Encoder
3.2 Decoder
4 Datasets
4.1 Scanned Receipts OCR and Information Extraction (SROIE)
4.2 Real-World Purchase Orders (PO-51k)
5 Experiments
5.1 Experiment Settings
5.2 Few-Shot Learning
5.3 Intermediate Learning
6 Conclusion
References
ICDAR 2021 Workshop on Graph Representation Learning for Scanned Document Analysis (GLESDO)
GLESDO 2021 Preface
Organization
General Chairs
Program Committee Chairs
Program Committee
Representing Standard Text Formulations as Directed Graphs
1 Introduction
2 Phrasemes and Standardized Passages in Legal Writing
3 Related Work
4 Legal Corpora
5 Method
5.1 Recurrent Sentences
5.2 Standardized Passages
6 Corpus Analysis
6.1 Statistics
6.2 Examples of Sentence Clusters
6.3 Examples and Analysis of the Found Passages
7 Conclusion and Future Work
A Appendices
A.1 Sources for Case Law Corpus
A.2 Sources for Contract Corpus
References
Multivalent Graph Matching for Symbol Recognition
1 Introduction
2 From GED to ExGED Problem Formulation
2.1 Graph Edit Distance
2.2 Extended Graph Edit Distance Problem Formulation
3 Cost Matrices for ExGED
3.1 Definition of the Cost Matrix for Node Operations
3.2 Definition of the Costs for Edge Operations in Extended Case
3.3 Illustrative Example
4 MMAS for ExGED
4.1 Algorithmic Scheme
4.2 Construction Graph
4.3 Construction of a Complete Matching and Neighborhood Search Strategy
4.4 Pheromone Update
5 Experiments
5.1 Data Set and Graph Representation
5.2 Definition of Cost Functions for Edit Operations
5.3 Parameter Setting for MMAS
5.4 Matching Quality Analysis: Interest of Using Merging and Splitting
5.5 Classification Problem
6 Conclusion
References
Key Information Recognition from Piping and Instrumentation Diagrams: Where We Are?
1 Introduction
2 State of the Art
3 Methodology
3.1 Line Detection Using Kernels
3.2 Text Detection Using Character Region Awareness for Text Detection
3.3 Symbol Detection
4 Conclusion
References
Graph Representation Learning in Document Wikification
1 Introduction
2 Background
2.1 Document Disambiguation with Wikipedia
2.2 Language Modelling Representation
3 Graph Representation Modelling
3.1 Concept Embedding
3.2 Context Representation
3.3 Sense Representation
3.4 Graph Convolutional Network
4 Disambiguation Model
4.1 Evaluation Datasets
4.2 msBERT Configuration
4.3 Comparison Systems
5 Experimental Results
6 Conclusion
References
Graph-Based Deep Generative Modelling for Document Layout Generation
1 Introduction
2 Related Work
2.1 Geometric Deep Learning
2.2 Document Layout Generation
3 Method
3.1 Graph to Sequence Mapping
3.2 GRNN Framework
3.3 Learning via Breadth-First Search
4 Experimental Validation
4.1 Datasets
4.2 Data Preparation for Graph Representation
4.3 Training Setup for Graph Recurrent Neural Network
4.4 Evaluation Schema
4.5 Experiments on Administrative Invoice Documents
5 Conclusion
References
Author Index

LNCS 12917

Elisa H. Barney Smith Umapada Pal (Eds.)

Document Analysis and Recognition – ICDAR 2021 Workshops Lausanne, Switzerland, September 5–10, 2021 Proceedings, Part II

Lecture Notes in Computer Science Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA

Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Gerhard Woeginger RWTH Aachen, Aachen, Germany Moti Yung Columbia University, New York, NY, USA

12917

More information about this subseries at http://www.springer.com/series/7412

Elisa H. Barney Smith Umapada Pal (Eds.)



Document Analysis and Recognition – ICDAR 2021 Workshops Lausanne, Switzerland, September 5–10, 2021 Proceedings, Part II


Editors Elisa H. Barney Smith Boise State University Boise, ID, USA

Umapada Pal Indian Statistical Institute Kolkata, India

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-86158-2 ISBN 978-3-030-86159-9 (eBook) https://doi.org/10.1007/978-3-030-86159-9 LNCS Sublibrary: SL6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

Our warmest welcome to the proceedings of ICDAR 2021, the 16th IAPR International Conference on Document Analysis and Recognition, which was held in Switzerland for the first time. Organizing an international conference of significant size during the COVID-19 pandemic, with the goal of welcoming at least some of the participants physically, is similar to navigating a rowboat across the ocean during a storm. Fortunately, we were able to work together with partners who have shown a tremendous amount of flexibility and patience including, in particular, our local partners, namely the Beaulieu convention center in Lausanne, EPFL, and Lausanne Tourisme, and also the international ICDAR advisory board and IAPR-TC 10/11 leadership teams who have supported us not only with excellent advice but also financially, encouraging us to set up a hybrid format for the conference. We were not a hundred percent sure if we would see each other in Lausanne but we remained confident, together with almost half of the attendees who registered for on-site participation. We relied on the hybridization support of a motivated team from the Luleå University of Technology during the pre-conference, and professional support from Imavox during the main conference, to ensure a smooth connection between the physical and the virtual world. Indeed, our welcome is extended especially to all our colleagues who were not able to travel to Switzerland this year. We hope you had an exciting virtual conference week, and look forward to seeing you in person again at another event of the active DAR community. With ICDAR 2021, we stepped into the shoes of a longstanding conference series, which is the premier international event for scientists and practitioners involved in document analysis and recognition, a field of growing importance in the current age of digital transitions. The conference is endorsed by IAPR-TC 10/11 and celebrates its 30th anniversary this year with the 16th edition. The very first ICDAR conference was held in St. Malo, France in 1991, followed by Tsukuba, Japan (1993), Montreal, Canada (1995), Ulm, Germany (1997), Bangalore, India (1999), Seattle, USA (2001), Edinburgh, UK (2003), Seoul, South Korea (2005), Curitiba, Brazil (2007), Barcelona, Spain (2009), Beijing, China (2011), Washington DC, USA (2013), Nancy, France (2015), Kyoto, Japan (2017), and Sydney, Australia in 2019. The attentive reader may have remarked that this list of cities includes several venues for the Olympic Games. This year the conference was hosted in Lausanne, which is the headquarters of the International Olympic Committee. Not unlike the athletes who were recently competing in Tokyo, Japan, the researchers profited from a healthy spirit of competition, aimed at advancing our knowledge on how a machine can understand written communication. Indeed, following the tradition from previous years, 13 scientific competitions were held in conjunction with ICDAR 2021 including, for the first time, three so-called “long-term” competitions, addressing wider challenges that may continue over the next few years.

vi

Foreword

Other highlights of the conference included the keynote talks given by Masaki Nakagawa, recipient of the IAPR/ICDAR Outstanding Achievements Award, and Mickaël Coustaty, recipient of the IAPR/ICDAR Young Investigator Award, as well as our distinguished keynote speakers Prem Natarajan, vice president at Amazon, who gave a talk on “OCR: A Journey through Advances in the Science, Engineering, and Productization of AI/ML”, and Beta Megyesi, professor of computational linguistics at Uppsala University, who elaborated on “Cracking Ciphers with ‘AI-in-the-loop’: Transcription and Decryption in a Cross-Disciplinary Field”. A total of 340 publications were submitted to the main conference, which was held at the Beaulieu convention center during September 8–10, 2021. Based on the reviews, our Program Committee chairs accepted 40 papers for oral presentation and 142 papers for poster presentation. In addition, nine articles accepted for the ICDAR-IJDAR journal track were presented orally at the conference and a workshop was integrated in a poster session. Furthermore, 12 workshops, 2 tutorials, and the doctoral consortium were held during the pre-conference at EPFL during September 5–7, 2021, focusing on specific aspects of document analysis and recognition, such as graphics recognition, camera-based document analysis, and historical documents. The conference would not have been possible without hundreds of hours of work done by volunteers in the organizing committee. First of all, we would like to express our deepest gratitude to our Program Committee chairs, Josep Lladós, Dan Lopresti, and Seiichi Uchida, who oversaw a comprehensive reviewing process and designed the intriguing technical program of the main conference. We are also very grateful for all the hours invested by the members of the Program Committee to deliver high-quality peer reviews. Furthermore, we would like to highlight the excellent contribution by our publication chairs, Liangrui Peng, Fouad Slimane, and Oussama Zayene, who negotiated a great online visibility of the conference proceedings with Springer and ensured flawless camera-ready versions of all publications. Many thanks also to our chairs and organizers of the workshops, competitions, tutorials, and the doctoral consortium for setting up such an inspiring environment around the main conference. Finally, we are thankful for the support we have received from the sponsorship chairs, from our valued sponsors, and from our local organization chairs, which together enabled us to put in the extra effort required for a hybrid conference setup. Our main motivation for organizing ICDAR 2021 was to give practitioners in the DAR community a chance to showcase their research, both at this conference and its satellite events. Thank you to all the authors for submitting and presenting your outstanding work. We sincerely hope that you enjoyed the conference and the exchange with your colleagues, be it on-site or online. September 2021

Andreas Fischer Rolf Ingold Marcus Liwicki

Preface

Our heartiest welcome to the proceedings of the ICDAR 2021 Workshops, which were organized under the 16th International Conference on Document Analysis and Recognition (ICDAR) held in Lausanne, Switzerland during September 5–10, 2021. We are delighted that this conference was able to include 13 workshops. The workshops were held in Lausanne during September 5–7, 2021. Some were held in a hybrid live/online format and others were held entirely online, with space at the main conference for in-person participants to attend. The workshops received over 100 papers on diverse document analysis topics, and these volumes collect the edited papers from 12 of the workshops. We sincerely thank the ICDAR general chairs for trusting us with the responsibility for the workshops, and for assisting us with the complicated logistics in order to include remote participants. We also want to thank the workshop organizers for their involvement in this event of primary importance in our field. Finally, we thank the workshop presenters and authors without whom the workshops would not exist. September 2021

Elisa H. Barney Smith Umapada Pal

Organization

Organizing Committee General Chairs Andreas Fischer Rolf Ingold Marcus Liwicki

University of Applied Sciences and Arts Western Switzerland University of Fribourg, Switzerland Luleå University of Technology, Sweden

Program Committee Chairs Josep Lladós Daniel Lopresti Seiichi Uchida

Computer Vision Center, Spain Lehigh University, USA Kyushu University, Japan

Workshop Chairs Elisa H. Barney Smith Umapada Pal

Boise State University, USA Indian Statistical Institute, India

Competition Chairs Harold Mouchère Foteini Simistira

University of Nantes, France Luleå University of Technology, Sweden

Tutorial Chairs Véronique Eglin Alicia Fornés

Institut National des Sciences Appliquées, France Computer Vision Center, Spain

Doctoral Consortium Chairs Jean-Christophe Burie Nibal Nayef

La Rochelle University, France MyScript, France

x

Organization

Publication Chairs Liangrui Peng Fouad Slimane Oussama Zayene

Tsinghua University, China University of Fribourg, Switzerland University of Applied Sciences and Arts Western Switzerland, Switzerland

Sponsorship Chairs David Doermann Koichi Kise Jean-Marc Ogier

University at Buffalo, USA Osaka Prefecture University, Japan University of La Rochelle, France

Local Organization Chairs Jean Hennebert Anna Scius-Bertrand Sabine Süsstrunk

University of Applied Sciences and Arts Western Switzerland, Switzerland University of Applied Sciences and Arts Western Switzerland, Switzerland École Polytechnique Fédérale de Lausanne, Switzerland

Industrial Liaison Aurélie Lemaitre

University of Rennes, France

Social Media Manager Linda Studer

University of Fribourg, Switzerland

Workshops Organizers W01-Graphics Recognition (GREC) Jean-Christophe Burie Richard Zanibbi Motoi Iwata Pau Riba

La Rochelle University, France Rochester Institute of Technology, USA Osaka Prefecture University, Japan Universitat Autònoma de Barcelona, Spain

W02-Camera-based Document Analysis and Recognition (CBDAR) Sheraz Ahmed Muhammad Muzzamil Luqman

DFKI, Kaiserslautern, Germany La Rochelle University, France

Organization

xi

W03-Arabic and Derived Script Analysis and Recognition (ASAR) Adel M. Alimi Bidyut Baran Chaudhur Fadoua Drira Tarek M. Hamdani Amir Hussain Imran Razzak

University of Sfax, Tunisia Indian Statistical Institute, Kolkata, India University of Sfax, Tunisia University of Monastir, Tunisia Edinburgh Napier University, UK Deakin University, Australia

W04-Computational Document Forensics (IWCDF) Nicolas Sidère Imran Ahmed Siddiqi Jean-Marc Ogier Chawki Djeddi Haikal El Abed Xunfeng Lin

La Rochelle University, France Bahria University, Pakistan La Rochelle University, France Larbi Tebessi University, Algeria Technische Universitaet Braunschweig, Germany Deakin University, Australia

W05-Machine Learning (WML) Umapada Pal Yi Yang Xiao-Jun Wu Faisal Shafait Jianwen Jin Miguel A. Ferrer

Indian Statistical Institute, Kolkata, India University of Technology Sydney, Australia Jiangnan University, China National University of Sciences and Technology, Pakistan South China University of Technology, China University of Las Palmas de Gran Canaria, Spain

W06-Open Services and Tools for Document Analysis (OST) Fouad Slimane Oussama Zayene Lars Vögtlin Paul Märgner Ridha Ejbali

University of Fribourg, Switzerland University of Applied Sciences and Arts Western Switzerland, Switzerland University of Fribourg, Switzerland University of Fribourg, Switzerland National School of Engineers Gabes, Tunisia

W07-Industrial Applications of Document Analysis and Recognition (WIADAR) Elisa H. Barney Smith Vincent Poulain d’Andecy Hiroshi Tanaka

Boise State University, USA Yooz, France Fujitsu, Japan

W08-Computational Paleography (IWCP) Isabelle Marthot-Santaniello Hussein Mohammed

University of Basel, Switzerland University of Hamburg, Germany

xii

Organization

W09-Document Images and Language (DIL) Andreas Dengel Cheng-Lin Liu David Doermann Errui Ding Hua Wu Jingtuo Liu

DFKI and University of Kaiserslautern, Germany Institute of Automation of Chinese Academy of Sciences, China University of Buffalo, USA Baidu Inc., China Baidu Inc., China Baidu Inc., China

W10-Graph Representation Learning for Scanned Document Analysis (GLESDO) Rim Hantach Rafika Boutalbi Philippe Calvez Balsam Ajib Thibault Defourneau

Engie, France Trinov, France, and University of Stuttgart, Germany Engie, France Trinov, France Trinov, France

Contents – Part II

ICDAR 2021 Workshop on Machine Learning (WML) Benchmarking of Shallow Learning and Deep Learning Techniques with Transfer Learning for Neurodegenerative Disease Assessment Through Handwriting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vincenzo Dentamaro, Paolo Giglio, Donato Impedovo, and Giuseppe Pirlo

7

Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhihao Wang, Yanwei Yu, Yibo Wang, Haixu Long, and Fazheng Wang

21

Data Augmentation vs. PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ahmad-Montaser Awal, Timothée Neitthoffer, and Nabil Ghanmi

36

A Handwritten Text Detection Model Based on Cascade Feature Fusion Network Improved by FCOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ruiqi Feng, Fujia Zhao, Shanxiong Chen, Shixue Zhang, and Dingwang Wang

51

Table Structure Recognition Using CoDec Encoder-Decoder . . . . . . . . . . . . Bhanupriya Pegu, Maneet Singh, Aakash Agarwal, Aniruddha Mitra, and Karamjit Singh

66

Advertisement Extraction Using Deep Learning . . . . . . . . . . . . . . . . . . . . . Boraq Madi, Reem Alaasam, Ahmad Droby, and Jihad El-Sana

81

Detection and Localisation of Struck-Out-Strokes in Handwritten Manuscripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Arnab Poddar, Akash Chakraborty, Jayanta Mukhopadhyay, and Prabir Kumar Biswas Temporal Classification Constraint for Improving Handwritten Mathematical Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cuong Tuan Nguyen, Hung Tuan Nguyen, Kei Morizumi, and Masaki Nakagawa Using Robust Regression to Find Font Usage Trends . . . . . . . . . . . . . . . . . Kaigen Tsuji, Seiichi Uchida, and Brian Kenji Iwana

98

113

126

xiv

Contents – Part II

Binarization Strategy Using Multiple Convolutional Autoencoder Network for Old Sundanese Manuscript Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . Erick Paulus, Jean-Christophe Burie, and Fons J. Verbeek A Connected Component-Based Deep Learning Model for Multi-type Struck-Out Component Classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Palaiahnakote Shivakumara, Tanmay Jain, Nitish Surana, Umapada Pal, Tong Lu, Michael Blumenstein, and Sukalpa Chanda Contextualized Knowledge Base Sense Embeddings in Word Sense Disambiguation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mozhgan Saeidi, Evangelos Milios, and Norbert Zeh

142

158

174

ICDAR 2021 Workshop on Open Services and Tools for Document Analysis (OST) Automatic Generation of Semi-structured Documents. . . . . . . . . . . . . . . . . . Djedjiga Belhadj, Yolande Belaïd, and Abdel Belaïd DocVisor: A Multi-purpose Web-Based Interactive Visualizer for Document Image Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Khadiravana Belagavi, Pranav Tadimeti, and Ravi Kiran Sarvadevabhatla

191

206

ICDAR 2021 Workshop on Industrial Applications of Document Analysis and Recognition (WIADAR) Object Detection Based Handwriting Localization . . . . . . . . . . . . . . . . . . . . Yuli Wu, Yucheng Hu, and Suting Miao Toward an Incremental Classification Process of Document Stream Using a Cascade of Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joris Voerman, Ibrahim Souleiman Mahamoud, Aurélie Joseph, Mickael Coustaty, Vincent Poulain d’Andecy, and Jean-Marc Ogier Automating Web GUI Compatibility Testing Using X-BROT: Prototyping and Field Trial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hiroshi Tanaka A Deep Learning Digitisation Framework to Mark up Corrosion Circuits in Piping and Instrumentation Diagrams. . . . . . . . . . . . . . . . . . . . . . . . . . . Luis Toral, Carlos Francisco Moreno-García, Eyad Elyan, and Shahram Memon

225

240

255

268

Contents – Part II

Playful Interactive Environment for Learning to Spell at Elementary School. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sofiane Medjram, Véronique Eglin, Stephane Bres, Adrien Piffaretti, and Jobert Timothée

xv

277

ICDAR 2021 Workshop on Computational Paleography (IWCP) A Computational Approach of Armenian Paleography . . . . . . . . . . . . . . . . . Chahan Vidal-Gorène and Aliénor Decours-Perez

295

Handling Heavily Abbreviated Manuscripts: HTR Engines vs Text Normalisation Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jean-Baptiste Camps, Chahan Vidal-Gorène, and Marguerite Vernet

306

Exploiting Insertion Symbols for Marginal Additions in the Recognition Process to Establish Reading Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Daniel Stökl Ben Ezra, Bronson Brown-DeVost, and Pawel Jablonski

317

Neural Representation Learning for Scribal Hands of Linear B . . . . . . . . . . . Nikita Srivatsan, Jason Vega, Christina Skelton, and Taylor Berg-Kirkpatrick READ for Solving Manuscript Riddles: A Preliminary Study of the Manuscripts of the 3rd ṣaṭka of the Jayadrathayāmala . . . . . . . . . . . . Olga Serbaeva and Stephen White

325

339

ICDAR 2021 Workshop on Document Images and Language (DIL) A Span Extraction Approach for Information Extraction on Visually-Rich Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tuan-Anh D. Nguyen, Hieu M. Vu, Nguyen Hong Son, and Minh-Tien Nguyen Recurrent Neural Network Transducer for Japanese and Chinese Offline Handwritten Text Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Trung Tan Ngo, Hung Tuan Nguyen, Nam Tuan Ly, and Masaki Nakagawa MTL-FoUn: A Multi-Task Learning Approach to Form Understanding . . . . . Nishant Prabhu, Hiteshi Jain, and Abhishek Tripathi VisualWordGrid: Information Extraction from Scanned Documents Using a Multimodal Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohamed Kerroumi, Othmane Sayem, and Aymen Shabou

353

364

377

389

xvi

Contents – Part II

A Transformer-Based Math Language Model for Handwritten Math Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Huy Quang Ung, Cuong Tuan Nguyen, Hung Tuan Nguyen, Thanh-Nghia Truong, and Masaki Nakagawa Exploring Out-of-Distribution Generalization in Text Classifiers Trained on Tobacco-3482 and RVL-CDIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stefan Larson, Navtej Singh, Saarthak Maheshwari, Shanti Stewart, and Uma Krishnaswamy Labeling Document Images for E-Commence Products with Tree-Based Segment Re-organizing and Hierarchical Transformer . . . . . . . . . . . . . . . . . Peng Li, Pingguang Yuan, Yong Li, Yongjun Bao, and Weipeng Yan Multi-task Learning for Newspaper Image Segmentation and Baseline Detection Using Attention-Based U-Net Architecture . . . . . . . . . . . . . . . . . . Anukriti Bansal, Prerana Mukherjee, Divyansh Joshi, Devashish Tripathi, and Arun Pratap Singh Data-Efficient Information Extraction from Documents with Pre-trained Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Clément Sage, Thibault Douzon, Alex Aussem, Véronique Eglin, Haytham Elghazel, Stefan Duffner, Christophe Garcia, and Jérémy Espinas

403

416

424

440

455

ICDAR 2021 Workshop on Graph Representation Learning for Scanned Document Analysis (GLESDO) Representing Standard Text Formulations as Directed Graphs . . . . . . . . . . . . Frieda Josi, Christian Wartena, and Ulrich Heid

475

Multivalent Graph Matching for Symbol Recognition . . . . . . . . . . . . . . . . . D. K. Ho, J. Y. Ramel, and N. Monmarché

488

Key Information Recognition from Piping and Instrumentation Diagrams: Where We Are? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rim Hantach, Gisela Lechuga, and Philippe Calvez Graph Representation Learning in Document Wikification . . . . . . . . . . . . . . Mozhgan Saeidi, Evangelos Milios, and Norbert Zeh

504 509

Graph-Based Deep Generative Modelling for Document Layout Generation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sanket Biswas, Pau Riba, Josep Lladós, and Umapada Pal

525

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

539

Contents – Part I

ICDAR 2021 Workshop on Graphics Recognition (GREC) Relation-Based Representation for Handwritten Mathematical Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thanh-Nghia Truong, Huy Quang Ung, Hung Tuan Nguyen, Cuong Tuan Nguyen, and Masaki Nakagawa

7

A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images . . . . Felix Thoma, Johannes Bayer, Yakun Li, and Andreas Dengel

20

A Self-supervised Inverse Graphics Approach for Sketch Parametrization. . . . Albert Suso, Pau Riba, Oriol Ramos Terrades, and Josep Lladós

28

Border Detection for Seamless Connection of Historical Cadastral Maps . . . . Ladislav Lenc, Martin Prantl, Jiří Martínek, and Pavel Král

43

Data Augmentation for End-to-End Optical Music Recognition . . . . . . . . . . . Juan C. López-Gutiérrez, Jose J. Valero-Mas, Francisco J. Castellanos, and Jorge Calvo-Zaragoza

59

Graph-Based Object Detection Enhancement for Symbolic Engineering Drawings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Syed Mizanur Rahman, Johannes Bayer, and Andreas Dengel ScanSSD-XYc: Faster Detection for Math Formulas . . . . . . . . . . . . . . . . . . Abhisek Dey and Richard Zanibbi Famous Companies Use More Letters in Logo: A Large-Scale Analysis of Text Area in Logo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shintaro Nishi, Takeaki Kadota, and Seiichi Uchida MediTables: A New Dataset and Deep Network for Multi-category Table Localization in Medical Documents . . . . . . . . . . . . . . . . . . . . . . . . . Akshay Praveen Deshpande, Vaishnav Rao Potlapalli, and Ravi Kiran Sarvadevabhatla

74 91

97

112

Online Analysis of Children Handwritten Words in Dictation Context . . . . . . Omar Krichen, Simon Corbillé, Eric Anquetil, Nathalie Girard, and Pauline Nerdeux

125

A Transcription Is All You Need: Learning to Align Through Attention . . . . Pau Torras, Mohamed Ali Souibgui, Jialuo Chen, and Alicia Fornés

141

xviii

Contents – Part I

Accurate Graphic Symbol Detection in Ancient Document Digital Reproductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zahra Ziran, Eleonora Bernasconi, Antonella Ghignoli, Francesco Leotta, and Massimo Mecella

147

ICDAR 2021 Workshop on Camera-Based Document Analysis and Recognition (CBDAR) Inscription Segmentation Using Synthetic Inscription Images for Text Detection at Stone Monuments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Naoto Morita, Ryunosuke Inoue, Masashi Yamada, Takatoshi Naka, Atsuko Kanematsu, Shinya Miyazaki, and Junichi Hasegawa Transfer Learning for Scene Text Recognition in Indian Languages. . . . . . . . Sanjana Gunna, Rohit Saluja, and C. V. Jawahar How Far Deep Learning Systems for Text Detection and Recognition in Natural Scenes are Affected by Occlusion?. . . . . . . . . . . . . . . . . . . . . . . Aline Geovanna Soares, Byron Leite Dantas Bezerra, and Estanislau Baptista Lima CATALIST: CAmera TrAnsformations for Multi-LIngual Scene Text Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shivam Sood, Rohit Saluja, Ganesh Ramakrishnan, and Parag Chaudhuri DDocE: Deep Document Enhancement with Multi-scale Feature Aggregation and Pixel-Wise Adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . Karina O. M. Bogdan, Guilherme A. S. Megeto, Rovilson Leal, Gustavo Souza, Augusto C. Valente, and Lucas N. Kirsten Handwritten Chess Scoresheet Recognition Using a Convolutional BiLSTM Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Owen Eicher, Denzel Farmer, Yiyan Li, and Nishatul Majid

167

182

198

213

229

245

ICDAR 2021 Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) RASAM – A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chahan Vidal-Gorène, Noëmie Lucas, Clément Salah, Aliénor Decours-Perez, and Boris Dupin Towards Boosting the Accuracy of Non-latin Scene Text Recognition . . . . . . Sanjana Gunna, Rohit Saluja, and C. V. Jawahar

265

282

Contents – Part I

xix

Aolah Databases for New Arabic Online Handwriting Recognition Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Samia Heshmat and Mohamed Abdelnafea

294

Line Segmentation of Individual Demographic Data from Arabic Handwritten Population Registers of Ottoman Empire . . . . . . . . . . . . . . . . . Yekta Said Can and M. Erdem Kabadayı

312

Improving Handwritten Arabic Text Recognition Using an Adaptive Data-Augmentation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohamed Eltay, Abdelmalek Zidouri, Irfan Ahmad, and Yousef Elarian

322

High Performance Urdu and Arabic Video Text Recognition Using Convolutional Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . Abdul Rehman, Adnan Ul-Hasan, and Faisal Shafait

336

ASAR 2021 Online Arabic Writer Identification Competition . . . . . . . . . . . . Thameur Dhieb, Houcine Boubaker, Sourour Njah, Mounir Ben Ayed, and Adel M. Alimi ASAR 2021 Competition on Online Signal Restoration Using Arabic Handwriting Dhad Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Besma Rabhi, Abdelkarim Elbaati, Tarek M. Hamdani, and Adel M. Alimi ASAR 2021 Competition on Online Arabic Character Recognition: ACRC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yahia Hamdi, Houcine Boubaker, Tarek M. Hamdani, and Adel M. Alimi ASAR 2021 Competition on Online Arabic Word Recognition . . . . . . . . . . . Hanen Akouaydi, Houcine Boubaker, Sourour Njah, Mourad Zaied, and Adel M. Alimi

353

366

379 390

ICDAR 2021 Workshop on Computational Document Forensics (IWCDF) Recognition of Laser-Printed Characters Based on Creation of New Laser-Printed Characters Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takeshi Furukawa

407

CheckSim: A Reference-Based Identity Document Verification by Image Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nabil Ghanmi, Cyrine Nabli, and Ahmad-Montaser Awal

422

Crossing Number Features: From Biometrics to Printed Character Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pauline Puteaux and Iuliia Tkachenko

437

xx

Contents – Part I

Writer Characterization from Handwriting on Papyri Using Multi-step Feature Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sidra Nasir, Imran Siddiqi, and Momina Moetesum Robust Hashing for Character Authentication and Retrieval Using Deep Features and Iterative Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Musab Al-Ghadi, Théo Azzouza, Petra Gomez-Krämer, Jean-Christophe Burie, and Mickaël Coustaty Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

451

466

483

ICDAR 2021 Workshop on Machine Learning (WML)

WML 2021 Preface

Our heartiest welcome to the proceedings of the 3rd Workshop on Machine Learning (WML 2021), which was organized under the 16th International Conference on Document Analysis and Recognition (ICDAR) held at Lausanne, Switzerland during September 5–10, 2021. Since 2010, the year of the initiation of the annual ImageNet Competition where research teams submit programs that classify and detect objects, machine learning has gained significant popularity. In the present age, machine learning, in particular deep learning, is incredibly powerful for making predictions based on large amounts of available data. There are many applications of machine learning in computer vision and pattern recognition, including document analysis, medical image analysis, etc. In order to facilitate innovative collaboration and engagement between the document analysis community and other research communities such as computer vision and image analysis, the Workshop on Machine Learning is held as part of the ICDAR conference, this year taking place in Lausanne, Switzerland, on September 7, 2021. The workshop provides an excellent opportunity for researchers and practitioners at all levels of experience to meet colleagues and to share new ideas and knowledge about machine learning and its applications in document analysis and recognition. The workshop enjoys strong participation from researchers in both industry and academia. In this 3rd edition of WML we received 18 submissions, coming from authors in 13 different countries. Each submission was reviewed by at least two expert reviewers (a total of 61 reviews for 18 submissions, with an average of 3.38 reviews per paper). The Program Committee of the workshop comprised 34 members from 16 countries around the world. Taking into account the recommendations of the Program Committee members, we selected 12 papers for presentation at the workshop, resulting in an acceptance rate of 66.6%. The workshop comprised a one-day program with oral presentations of the 12 papers and two keynote talks. The keynote talks were delivered by two well-known researchers: Yi Yang of the Faculty of Engineering and Information Technology, University of Technology Sydney (UTS), Australia, and Xiaojun Chang of the Department of Data Science and AI, Faculty of Information Technology, Monash University, Australia. Our sincere thanks to them for accepting our invitation to deliver the keynotes. We wish to thank all the researchers who showed interest in this workshop by sending contributed papers, along with our Program Committee members for their time and effort in reviewing submissions and organizing the workshop program. We would also like to thank the ICDAR 2021 organizing committee for supporting our workshop. Finally, we wish to thank all the participants of the workshop.

WML 2021 Preface

3

We hope you enjoyed the workshop and look forward to meeting you at the next WML! September 2021

Umapada Pal Yi Yang Xiao-jun Wu Faisal Shafait Jianwen Jin Miguel A. Ferrer

Organization

Workshop Chairs Umapada Pal Yi Yang Xiao-jun Wu

Indian Statistical Institute, Kolkata, India University of Technology Sydney, Australia Jiangnan University, China

Program Chairs Faisal Shafait Jianwen Jin Miguel A. Ferrer

National University of Sciences and Technology, Pakistan South China University of Technology, China Universidad de Las Palmas de Gran Canaria, Spain

Program Committee Alireza Alaei Abdel Belaïd Saumik Bhattacharya Alceu Britto Cristina Carmona-Duarte Sukalpa Chanda Chiranjoy Chattopadhyay Joseph Chazalon Mickaël Coustaty Anjan Dutta Miguel Ángel Ferrer Ballester Baochuan Fu Xin Geng Ravindra Hegadi Donato Impedovo Brian Kenji Iwana

Southern Cross University, Australia Université de Lorraine – Loria, France Indian Institute of Technology, Kharagpur, India Pontifícia Universidade Católica do Paraná, Brazil Universidad de Las Palmas de Gran Canaria, Spain Østfold University College, Norway Indian Institute of Technology, Jodhpur, India EPITA Research and Development Laboratory, France University de La Rochelle, France University of Exeter, UK Universidad de Las Palmas de Gran Canaria, Spain Suzhou University of Science and Technology, China Southeast University, China Central University of Karnataka, India Università degli studi di Bari Aldo Moro, Italy Kyushu University, Japan

WML 2021 Preface

Lianwen Jin Zhouhui Lian Brendan McCane Abhoy Mondal Muhammad Muzzamil Luqman Wataru Ohyama Srikanta Pal Umapada Pal Shivakumara Palaiahnakote Leonard Rothacker Kaushik Roy Rajkumar Saini K. C. Santosh Faisal Shafait Suresh Sundaram Szilard Vajda Tianyang Xu Yi Yang

5

South China University of Technology, China Peking University, China University of Otago, New Zealand University of Burdwan, India La Rochelle Université, France Saitama Institute of Technology, Japan Université de Lorraine – Loria, France Indian Statistical Institute, Kolkata, India University of Malaya, Malaysia TU Dortmund, Germany West Bengal State University, India Luleå University of Technology, Sweden University of South Dakota, USA National University of Sciences and Technology, Pakistan Indian Institute of Technology, Guwahati, India Central Washington University, USA Jiangnan University, China University of Technology Sydney, Australia

Benchmarking of Shallow Learning and Deep Learning Techniques with Transfer Learning for Neurodegenerative Disease Assessment Through Handwriting

Vincenzo Dentamaro, Paolo Giglio, Donato Impedovo(B), and Giuseppe Pirlo

University of Bari “Aldo Moro”, Via Orabona 4, Bari, Italy
[email protected]

Abstract. Neurodegenerative diseases are incurable conditions for which a timely diagnosis plays a key role. For this reason, various computer aided diagnosis (CAD) techniques have been proposed, and handwriting analysis in particular is a well-established diagnostic technique. An analysis of state-of-the-art technologies, compared against those that have historically proved effective for diagnosis, therefore remains of primary importance. In this paper a benchmark between shallow learning techniques and deep neural network techniques with transfer learning is provided: their performance is compared to that of classical methods in order to quantitatively estimate the possibility of performing advanced assessment of neurodegenerative disease through both offline and online handwriting. Moreover, a further analysis of their performance on a subset of a new dataset, which makes use of standardized handwriting tasks, is provided to determine the impact of the various benchmarked techniques and to draw new research directions.

Keywords: Shallow learning · Deep learning · Neurodegenerative disease · Handwriting · Deep neural networks · Benchmark

1 Introduction

Several non-invasive techniques have been developed to assess the presence of neurodegenerative diseases, which are characterized by a gradual decline of the cognitive, functional and behavioral areas of the brain [1, 2]. Among them, behavioral biometrics, such as speech [4], have proven promising in terms of binary classification accuracy (healthy/unhealthy) for neurodegenerative disease assessment. The handwriting behavioral biometric particularly stands out for its close relation with the level of severity of a vast class of neurodegenerative diseases, so changes in its features are considered an important biomarker [1, 2]: handwriting involves kinesthetic, cognitive and perceptual-motor tasks [4], resulting in a very complex activity whose performance is taken into account for the evaluation of several diseases such as Parkinson's disease (PD) and Alzheimer's disease (AD) [3, 5–7].


This work proposes a benchmark of traditional shallow learning techniques against deep learning techniques for neurodegenerative disease assessment through handwriting. The handwriting acquisitions are performed online via a tablet: variables such as the x, y coordinates as well as azimuth, pressure, altitude, in-air movements and the timestamp of each sample are collected. For the specific purpose of this study, only the final handwritten trace, i.e. the whole set of x, y coordinates and the azimuth, is used as the training data set. The handwriting procedure consists of 8 different tasks, which are shown in detail in Sect. 5. The paper is organized as follows. Section 2 sketches the state-of-the-art review for neurodegenerative disease assessment through handwriting; Sect. 3 illustrates the use of shallow learning techniques for online handwriting by means of velocity-based and kinematic-based features. In Sect. 4 both offline and online deep learning techniques are presented. Section 5 provides the dataset description and results, which are then discussed in Sect. 6. Finally, Sect. 7 sketches conclusions and future remarks.
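As a concrete illustration of the acquisition variables listed above, the sketch below (in Python) shows how a single tablet task could be loaded and reduced to the final trace used for training. The CSV layout, the column names, and the PenSample, load_task and final_trace helpers are hypothetical, since the paper does not specify a storage format; interpreting the button status as the on-surface/in-air flag is likewise an assumption.

# Minimal sketch, assuming one CSV file per handwriting task whose columns
# match the variables described above (hypothetical layout and names).
from dataclasses import dataclass
from typing import List, Tuple
import csv

@dataclass
class PenSample:
    x: float           # horizontal pen coordinate
    y: float           # vertical pen coordinate
    t: float           # timestamp of the sample
    pressure: float    # pen-tip pressure
    azimuth: float     # pen azimuth angle
    altitude: float    # pen altitude angle
    on_surface: bool   # button status: False means an in-air movement

def load_task(path: str) -> List[PenSample]:
    """Load one handwriting task from a CSV file (assumed layout)."""
    samples = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            samples.append(PenSample(
                x=float(row["x"]), y=float(row["y"]), t=float(row["t"]),
                pressure=float(row["pressure"]),
                azimuth=float(row["azimuth"]),
                altitude=float(row["altitude"]),
                on_surface=row["button_status"] == "1",
            ))
    return samples

def final_trace(samples: List[PenSample]) -> List[Tuple[float, float, float]]:
    """Keep only the on-surface (x, y, azimuth) trace, as used in this study."""
    return [(s.x, s.y, s.azimuth) for s in samples if s.on_surface]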

2 State of the Art Review

The aim of this work is to provide insights into the best features and techniques to adopt in a computer aided diagnosis system for supporting early diagnosis of neurodegenerative disease. It is important not only to predict the disease, but also to monitor its progression over time [1, 2]. The scientific community has focused its research on predictive models that can accurately detect subtle changes in writing behavior. These techniques will be used to help neurologists and psychologists assess diseases, as an auxiliary tool in addition to the battery of cognitive tests provided in the literature [1–9]. The acquisition tool, at the time of writing, is a digital tablet with a pen. This device captures spatial and temporal data and saves it to storage. After data is captured, as often happens in a shallow learning scenario, features are extracted. Usually, patients are asked to perform several tasks [1]. Even though important results were achieved by the community, there is no homogeneity in the tasks provided by the developed datasets. This is because scientists collected databases of handwriting tasks themselves, resulting in datasets with different kinds of tasks, usually not connected among them and merged together, which provided controversial results. To overcome this problem, the authors in [34] developed a specific acquisition protocol. This protocol includes a digitized version of standard tests used, accepted and tested in the neurological community, used as the ground truth for evaluation. The dataset used in this work is a subset of this larger dataset, which is currently under development. It contains well-established handwriting tasks for kinematic analysis as well as experimental handwriting tasks useful for extracting novel types of features to be investigated by researchers. The literature on handwriting recognition for neurodegenerative disease assessment can be subdivided into two main groups: online handwriting and offline handwriting. In the online case, the features computed in all the tasks are concatenated into a high dimensional vector and then used for classification [9]. Various authors used several kinds of classifiers ranging from SVM, KNN and ensemble learning with Random Forests to neural networks [1–9]. The

use of an ensemble of classifiers, each one built on the single-task feature space of each task, has also been analyzed [1, 2]. For online handwriting recognition for neurodegenerative disease assessment, some of the authors of this work in [1] used several features such as position, button status, pressure, azimuth, altitude, displacement, velocity and acceleration over 5 different datasets, namely PaHaW [9], NewHandPD [29], ParkinsonHW [30], ISUNIBA [31] and EMOTHAW [32], achieving accuracies that range from 79.4% to 93.3% depending on the dataset and tasks. For offline handwriting recognition, the authors in [33] used “enhanced” static images of handwriting generated by exploiting simultaneously the static and dynamic properties of handwriting, drawing the points of the samples and adding pen-ups for the same purpose. They used a Convolutional Neural Network to provide feature embeddings and then a set of classifiers in a majority voting fashion, with transfer learning to cope with the limited amount of training data. Their accuracies on the various tasks ranged from 50% to 65%, showing some limits of this technique. In [35] the authors explored an alternative model that used one single bidirectional LSTM layer on handwriting recognition tasks, achieving better or equivalent results than stacking more LSTM layers, which decreases the complexity and allows faster network training. In [36] the authors investigated the use of a bidirectional LSTM with an attention mechanism for offline and online handwriting recognition, achieving important results on the RIMES handwriting recognition task. The bidirectional LSTM architecture developed in this work was partially inspired by the work in [36]. Some of the authors of this work also used computer vision for assessing neurodegenerative disease through gait [37] and sit-to-stand tasks [38].

3 Shallow Learning for Online Handwriting Neurodegenerative Disease Assessment

The term shallow learning identifies all techniques that do not belong to deep learning. In the case of online handwriting recognition, where online stands for capturing time series of pen movements on a digital support, shallow learning amounts to performing feature extraction and classification with various machine learning algorithms. To this extent, standard velocity-based features and kinematic-based features are extracted and tested with the Random Forest classification algorithm. The set of extracted features is shown in Table 1. All the extracted features were standardized. Moreover, the Random Forest [17] ensemble learning algorithm, with features ordered by relevance, was adopted to select the most important ones [18]. The Random Forest pre-pruning parameters were a maximum tree depth of 10 and 50 trees, in order to prevent overfitting and to balance accuracies. The Random Forest algorithm [17] was also used for classification: the maximum depth adopted was 10 and the number of trees was estimated dynamically by inspection of the validation curve. The reported accuracies are based on a 10-fold cross validation, i.e. the entire procedure was repeated 10 times, with each fold used once as a test set. A minimal sketch of this pipeline is given below.
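As an illustration only (not the authors' code), the following sketch shows how the shallow-learning classification described above could be set up with scikit-learn; the feature and label file names are hypothetical placeholders.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical inputs: one row per task execution, columns = the statistics
# (mean, median, std, 1st/99th percentile) of the Table 1 features.
X = np.load("features.npy")
y = np.load("labels.npy")          # 1 = patient, 0 = healthy control

# Random Forest with the pre-pruning settings quoted in the text
# (maximum tree depth 10, 50 trees for the feature-relevance ranking).
clf = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, max_depth=10, random_state=0),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print("10-fold F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Feature relevance, used in the text to select the most important features.
clf.fit(X, y)
relevance = clf.named_steps["randomforestclassifier"].feature_importances_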

Table 1. Features used in shallow learning.

Feature | Description
Position | Position in terms of s(x, y)
Button status | Movement in the air: b(t) = 0; movement on the pad: b(t) = 1
Pressure | Pressure of the pen on the pad
Azimuth | Angle between the pen and the vertical plane on the pad
Altitude | Angle between the pen and the pad plane
Displacement | d_i = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}, 1 ≤ i ≤ n − 1; d_i = d_n − d_{n−1}, i = n
Velocity | v_i = d_i / (t_{i+1} − t_i), 1 ≤ i ≤ n − 1; v_i = v_n − v_{n−1}, i = n
Acceleration | a_i = v_i / (t_{i+1} − t_i), 1 ≤ i ≤ n − 1; a_i = a_n − a_{n−1}, i = n
Jerk | j_i = a_i / (t_{i+1} − t_i), 1 ≤ i ≤ n − 1; j_i = j_n − j_{n−1}, i = n
x/y displacement | Displacement in the x/y direction
x/y velocity | Velocity in the x/y direction
x/y acceleration | Acceleration in the x/y direction
x/y jerk | Jerk in the horizontal/vertical direction
NCV | Number of Changes in Velocity, also normalized to writing duration
NCA | Number of Changes in Acceleration, also normalized to writing duration

3.1 Velocity-Based Features

The choice of certain velocity-based features is dictated by the motor deficits particularly present in neurodegenerative diseases. Motor deficits like bradykinesia (characterized by slowness of movements), micrographia (a time-related reduction of the size of writing), akinesia (characterized by impairment of voluntary movements), tremor and muscular rigidity [2] are particularly evident when the patient is asked to perform certain tasks. These tasks often consist of drawing stars and spirals, writing names and copying tasks [3, 8–10, 12]. In order to model other symptoms such as tremor and jerk, the patient is often asked to draw meanders, horizontal lines, straight (both forward and backward) slanted lines, circles and a few predefined sentences, as shown in [11–13].

Table 1 shows the features extracted for the shallow learning classification. It is important to state that every feature is a time series; thus statistical functions such as mean, median, standard deviation, 1st and 99th percentile are used to synthesize each feature in Table 1.

3.2 Kinematic-Based Features

For modelling online handwriting and extracting important movement patterns, the authors in [14] used the Maxwell-Boltzmann distribution. This distribution is used to extract parameters that are then used to model the velocity profile. Its formulation is shown in formula (1):

mb_j = v_j^2 e^{-v_j^2}    (1)

where v_j is the velocity at the j-th position. Another kinematic feature used to describe the handwriting pattern of the velocity and acceleration profiles is the Discrete Fourier Transform, as shown in [15]. Its formulation, shown in (2), is composed of the computation of the DFT and of the inverse DFT, which contains the spectrum of harmonics with magnitude inversely proportional to the frequency [16]. Thanks to the logarithm present in the formulation, components with small variations tend to converge toward 0, while repeated peaks at higher frequencies are typical of the periodic patterns described by tremor and jerk:

rcep = IDFT(log(DFT(v_j)))    (2)

Again, v_j is the velocity at the j-th position. A minimal sketch of these kinematic features is given below.
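The following NumPy sketch illustrates formulas (1) and (2), assuming raw arrays x, y, t for one task; it is an illustration rather than the authors' implementation, and a small epsilon is added inside the logarithm to keep it defined.

import numpy as np

def velocity(x, y, t):
    # Displacement and velocity as defined in Table 1; the last sample is padded
    # with the difference of the two previous values, as in the table.
    d = np.hypot(np.diff(x), np.diff(y))
    v = d / np.diff(t)
    return np.append(v, v[-1] - v[-2]) if v.size > 1 else v

def maxwell_boltzmann(v):
    # Formula (1): mb_j = v_j^2 * exp(-v_j^2)
    return v ** 2 * np.exp(-(v ** 2))

def velocity_cepstrum(v, eps=1e-12):
    # Formula (2): rcep = IDFT(log(DFT(v_j))), computed on the magnitude spectrum.
    spectrum = np.abs(np.fft.fft(v)) + eps
    return np.real(np.fft.ifft(np.log(spectrum)))

def summarize(feature):
    # Every feature is a time series, summarized as described in Sect. 3.1.
    return [np.mean(feature), np.median(feature), np.std(feature),
            np.percentile(feature, 1), np.percentile(feature, 99)]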

4 Deep Learning for Offline and Online Handwriting Neurodegenerative Disease Assessment

Deep learning techniques have been developed for various tasks such as image recognition through convolutional neural networks, but also time series analysis using recurrent neural networks with stacked layers such as LSTM and bidirectional LSTM. The motivation behind this work is to benchmark deep learning architectures trained by deep transfer learning on images generated by drawing the x, y coordinates, from now on referred to as offline handwriting, against online handwriting models trained on time series of x, y coordinates for the RNN and against shallow learning as reported in Sect. 3.

4.1 CNN Based Networks for Offline Recognition

For offline handwriting recognition, 224 by 224 pixel images are generated by plotting the x, y coordinates of each task and saving the generated image. Because of the limited amount of training data, it was decided to use deep transfer learning [20], which is useful when little training data is available, as in this case. The idea is to use a deep neural network architecture and

weights trained on a big dataset and fine-tune only the final layers on our dataset, freezing the earlier layers. This is useful because the earlier layers usually generate general-purpose representations of the underlying patterns, while the last layers are specialized in performing the final classification [20]. All the deep learning architectures used are originally trained on the ImageNet dataset [21]; a 2D global average pooling layer is then added, followed by one dense layer with 32 neurons and ReLU activation function, and finally a softmax layer performing the binary classification. The newly added layers are trained on the training set for 100 epochs and cross validated on 33% of the training set. All labels are one-hot encoded. The architectures were chosen depending on their importance in the literature, disk size, number of parameters and the accuracy achieved on the ImageNet dataset as reported on the Keras website [22]. The chosen architectures are briefly reported in Table 2; a minimal sketch of the transfer-learning setup is given after the table.

Table 2. Deep learning architectures used

Architecture name | Size | Parameters | Top-5 accuracy on ImageNet
NASNetLarge | 343 MB | 88,949,818 | 0.960
ResNet 50 | 98 MB | 25,636,712 | 0.921
Inception V3 | 92 MB | 23,851,784 | 0.937
Inception ResNet V2 | 215 MB | 55,873,736 | 0.953
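A minimal Keras sketch of the transfer-learning head described above (frozen ImageNet backbone, 2D global average pooling, a 32-neuron ReLU dense layer and a softmax output); Inception V3 is used here purely as an example backbone, and the commented training call is indicative only.

import tensorflow as tf

base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained layers; only the new head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # healthy vs. patient
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels_onehot, epochs=100, validation_split=0.33)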

4.1.1 NASNetLarge
The architecture of the NASNetLarge deep neural network was not designed by hand, but is the result of a process called Neural Architecture Search, where the parameters of the network and its architecture are discovered as the output of an optimization process that uses reinforcement learning to decide the best choice of layer type and hyperparameters for a given dataset. In the authors' experiments [23], the algorithm searched for the best convolutional layer (or “cell”) on the CIFAR-10 dataset, and this cell was later applied to the ImageNet dataset by iteratively stacking copies of it, each with its own set of hyperparameters, resulting in a novel architecture (Fig. 1).

4.1.2 ResNet 50
The ResNet-50 [24] model is composed of 5 so-called “stages”, each composed of a convolution block and an identity block. Each convolution block and each identity block has 3 convolution layers, which results in over 23 million trainable parameters. ResNet is theoretically important because it introduced two major breakthroughs in computer vision:

Fig. 1. NASNet large architecture

1. The mitigation of the vanishing gradient problem, by providing an alternate shortcut path that reinjects information into the flow.
2. The possibility of learning the identity function of the previous output, ensuring that later layers perform at least as well as the previous ones (Fig. 2).

Fig. 2. ResNet-50

4.1.3 Inception V3
Inception-v3 [25] is the third release of a convolutional neural network architecture developed at Google, derived from the Inception family. This architecture makes several improvements, including the use of label smoothing, factorized convolutions, batch normalization and an auxiliary classifier used to propagate label information lower down the network (Figs. 3 and 4).

Fig. 3. Diagram representation of inception V3 architecture

Fig. 4. Diagram representation of inception ResNet V2 architecture

4.1.4 Inception-ResNet-v2
Inception-ResNet-v2 [26], often called Inception V4, is a convolutional neural architecture built, as the name suggests, by fusing two major architecture families: the Inception family (e.g. Inception V3) and the ResNet family, by incorporating residual connections. It is at the moment one of the state-of-the-art architectures used in image recognition tasks.

4.2 Bi-directional LSTM RNN for Online Recognition
For online recognition using recurrent neural networks, a novel bidirectional LSTM recurrent neural network was developed with the aim of performing online handwriting recognition. This online recognition is based solely on the time series of x, y coordinates; no other information is provided. Thus, in typical deep learning fashion, the architecture automatically exploits long- and short-term coherence and patterns with the aim of recognizing neurodegenerative diseases from raw coordinates alone. Differently from a Long Short-Term Memory RNN (briefly, LSTM), a bidirectional LSTM runs the inputs in two directions, from past to future and from future to past, and combines the two hidden states (one forward and one backward) so as to preserve information from both past and future. The authors in [27] used bidirectional LSTMs for modelling online handwriting recognition. The architecture developed here also contains an attention mechanism layer [28]. The attention mechanism was invented for natural language processing tasks, where the encoder-decoder recurrent neural network architecture was used to learn to encode input sequences into a fixed-length internal representation, with a second set of LSTMs reading the internal representation and decoding it into an output sequence. To overcome the problem that all input sequences are forced to be encoded into an internal vector of fixed length, a selective attention mechanism was developed with the aim of selecting these inputs and

relating them to the output sequence [28]. This attention mechanism searches for a set of positions in the input where the most relevant information is concentrated. It does so by encoding the input vector into a sequence of vectors and then adaptively choosing a subset of vectors while producing the output [28]. The intuition here is that the attention mechanism captures very long-term relations among coordinates, in such a way as to increase the correlation among the handwriting patterns of people affected by some neurodegenerative disease versus the normative sample. The architecture developed is depicted in Fig. 5 and was trained in an end-to-end fashion. It is composed of a bidirectional LSTM layer with 32 neurons followed by a dense layer with 32 neurons and ReLU activation function, followed by an attention layer with 32 neurons. At the end there is a dense layer with softmax activation function that carries out the classification; a simplified sketch of this architecture is given after Fig. 5.

Fig. 5. Bidirectional LSTM with attention mechanism for online handwriting recognition
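The Keras sketch below illustrates the architecture of Fig. 5; the additive-attention pooling used here is a simple stand-in for the attention layer described above, so the layer details are assumptions rather than the authors' exact configuration.

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(None, 2))               # time series of (x, y) coordinates
h = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(inputs)
h = layers.Dense(32, activation="relu")(h)

# Additive attention pooling: score each time step, softmax over time, weighted sum.
scores = layers.Dense(1)(h)                             # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)                # attention weights over time
context = layers.Lambda(lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([h, weights])

outputs = layers.Dense(2, activation="softmax")(context)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])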

5 Dataset Description and Results

5.1 Dataset Description
Raw data were collected by measuring the x and y coordinates of the pen position and their timestamps. The pen inclination (tilt-x and tilt-y) and the pressure of the pen's tip on the surface were also registered. Another important collected parameter was the “button status”, i.e. a binary variable which is 0 for the pen-up state (in-air movement) and 1 for the pen-down state (on-surface movement). The whole execution process of a single task can thus be described by a matrix X = (x, y, p, t, tilt_x, tilt_y, b), where each column is a vector of length N and N is the number of sampled points. All the tasks are listed in Table 3. The check copying task consists of asking the user to copy a check, as shown in Fig. 6. Another task asks the user to find and mark a subset of predefined numbers inside matrices, as shown in Fig. 7. The trail test consists of completing a succession of letters or numbers inside circles by linking them with other ones, generating a path of variable complexity. Figure 8 shows a clear example produced by a user affected by a neurodegenerative disease.

Table 3. Tasks used

Task name | Task description
Chk | Check copying task
M1 | Matrix 1
M2 | Matrix 2
M3 | Matrix 3
Tmt1 | Trail 1 of connecting path
Tmt2 | Trail 2 (difficult) of connecting path
Tmtt1 | Trail test 1 of connecting path
Tmtt2 | Trail test 2 (difficult) of connecting path

Fig. 6. Check copying task performed by a patient with some form of dementia

Fig. 7. M3 Matrix task

The user subset is composed of 42 subjects: 21 of them are affected by a neurodegenerative disease at different levels of severity, qualified as “mild”, “assessed”, “severe” and “very severe”. The other 21 are healthy control subjects. The dataset size is in line with the sizes of the other datasets mentioned in the state of the art review. At this stage of the study, age and sex are not taken into account in the analysis; a deeper analysis will not be able to leave these parameters out of consideration.

Fig. 8. Trail test number 2 mixing letters and numbers

5.2 Results
Table 4 shows the results. The accuracy is expressed as F1 score.

Table 4. Results of various techniques with respect to various tasks

Task name | NASNet Large accuracy | ResNet 50 accuracy | Inception V3 accuracy | Inception ResNet V2 accuracy | Shallow learning | Bidirectional LSTM w/ attention
Check copying | 0.72 | 0.72 | 0.80 | 0.78 | 0.853 | 0.72
M1 | 0.68 | 0.64 | 0.76 | 0.74 | 0.677 | 0.66
M2 | 0.65 | 0.58 | 0.65 | 0.69 | 0.702 | 0.69
M3 | 0.67 | 0.75 | 0.69 | 0.58 | 0.704 | 0.66
Tmt1 | 0.76 | 0.63 | 0.89 | 0.74 | 0.799 | 0.78
Tmt2 | 0.59 | 0.59 | 0.68 | 0.74 | 0.726 | 0.71
Tmtt1 | 0.84 | 0.68 | 0.64 | 0.74 | 0.774 | 0.86
Tmtt2 | 0.58 | 0.71 | 0.74 | 0.63 | 0.858 | 0.70
All tasks | 0.66 | 0.70 | 0.67 | 0.72 | 0.923 | 0.74

6 Results discussion
In Table 4, different CNN architectures (NASNetLarge, ResNet 50, Inception V3, Inception ResNet V2) and an RNN architecture (bidirectional LSTM with attention) were tested in order to understand their performance in detecting the presence (or absence) of a neurodegenerative disease by analyzing the series of previously described tasks (CHK, M1, M2, M3, TMT1, TMT2, TMTT1, TMTT2). Moreover, a further analysis was performed by running the various techniques on a dataset obtained by merging the data of all the tasks. In the following analysis the positive class is represented by 1

(subjects affected by a neurodegenerative disease) and the negative class by 0 (subjects without a neurodegenerative disease). The most promising results were obtained using predefined features and the shallow learning approach, i.e. performing feature engineering by carefully selecting features from a set of physical parameters, followed by automatic feature selection to decrease dimensionality: performance was characterized by a relatively small variance between different tasks, which suggests a low dependency of the accuracy on the specific task dataset. The second best outcome came from the bidirectional LSTM with attention: this deep recurrent neural network architecture achieved the lowest variability among accuracies over the different tasks. Moreover, this network was capable of successfully exploiting neurodegenerative disease patterns to discern healthy from unhealthy subjects based solely on the raw time series of x, y coordinates. All the other deep learning networks, trained on offline (static) images, show significant variations of their accuracy from one task to another. This results in high variability of accuracies between tasks and thus a decrease of confidence. This analysis suggests that online handwriting outperforms offline handwriting, both with a preliminary feature selection and when letting the algorithm learn the most efficient patterns from raw data.

7 Conclusions
In this work, classic features have been employed for healthy/unhealthy binary classification of the subjects included in the new dataset. The main goal of this work is to provide a benchmark of the accuracy of different techniques available for neurodegenerative disease detection. The analysis was performed on a specific subset of the variables acquired during the handwriting tasks, specifically the x, y coordinates and the azimuth. The shallow learning approach, with feature preselection, outperformed all the other architectures, showing a small variance of accuracies between different tasks. Similar results were obtained using the bidirectional LSTM with attention, while the other deep learning algorithms were affected by higher variability in accuracy depending on the specific task analyzed. These results suggest that online handwriting is a better approach than the offline one, both with feature preselection and with the algorithm learning directly from raw data. This last point opens new frontiers in automatically learning specific neurodegenerative disease patterns from time series of raw x, y coordinates. The next evolution of this work will be to perform not only binary prediction of healthy/unhealthy subjects but also to evaluate the severity level of the disease. In this regard, as the dataset provides multiple acquisition sessions for the same patients, the inferability of increments or decrements of disease severity over time, with respect to the adoption of medical treatments, will also be analyzed.

Ethical approval. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent. Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest. The authors declare no conflict of interest.

References 1. Impedovo, D., Pirlo, G.: Dynamic handwriting analysis for the assessment of neurodegenerative diseases: a pattern recognition perspective. IEEE Rev. Biomed. Eng. 12, 209–220 (2019) 2. De Stefano, C., Fontanella, F., Impedovo, D., Pirlo, G., di Freca, A.S.: Handwriting analysis to support neurodegenerative diseases diagnosis: a review. Pattern Recogn. Lett. 121, 37–45 (2018) 3. Rosenblum, S., Samuel, M., Zlotnik, S., Erikh, I., Schlesinger, I.: Handwriting as an objective tool for Parkinson’s disease diagnosis. J. Neurol. 260(9), 2357–2361 (2013) 4. Astrom, F., Koker, R.: A parallel neural network approach to prediction of Parkinson’s Disease. Expert Syst. Appl. 38(10), 12470–12474 (2011) 5. O’Reilly, C., Plamondon, R.: Development of a sigma–lognormal representation for on-line signatures. Pattern Recogn. 42(12), 3324–3337 (2009) 6. Pereira, C.R., et al.: A step towards the automated diagnosis of Parkinson’s disease: analyzing handwriting movements. In: IEEE 28th International Symposium on Computer Based Medical Systems (CBMS), pp. 171–176 (2015) 7. Kahindo, C., El-Yacoubi, M.A., Garcia-Salicetti, S., Rigaud, A., Cristancho-Lacroix, V.: Characterizing early-stage alzheimer through spatiotemporal dynamics of handwriting. IEEE Signal Process. Lett. 25(8), 1136–1140 (2018) 8. Caligiuri, M.P., Teulings, H.L., Filoteo, J.V., Song, D., Lohr, J.B.: Quantitative measurement of handwriting in the assessment of drug-induced Parkinsonism. Hum. Mov. Sci. 25(4), 510–522 (2006) 9. Drotár, P., Mekyska, J., Rektorová, I., Masarová, L., Smékal, Z., Faun-dez-Zanuy, M.: Decision support framework for Parkinson’s disease based on novel handwriting markers. IEEE Trans. Neural Syst. Rehabil. Eng. 23(3), 508–516 (2015) 10. Ponsen, M.M., Daffertshofer, A., Wolters, E.C., Beek, P.J., Berendse, H.W.: Impairment of complex upper limb motor function in de novo Parkinson’s disease. Parkinsonism Related Disord. 14(3), 199–204 (2008) 11. Smits, E.J., et al.: Standardized handwriting to assess bradykinesia, micrographia and tremor in Parkinson’s disease. PLOS One 9(5), e97614 (2014) 12. Broderick, M.P., Van Gemmert, A.W., Shill, H.A.: Hypometria and bradykinesia during drawing movements in individuals with Parkinson disease. Exp. Brain Res. 197(3), 223–233 (2009) 13. Kotsavasiloglou, C., Kostikis, N., Hristu-Varsakelis, D., Arnaoutoglou, M.: Machine learningbased classification of simple drawing movements in Parkinson’s disease. Biomed. Signal Process. Control 31, 174–180 (2017) 14. Li, G., et al.: Temperature based restricted Boltzmann Machines. Sci. Rep. 6, Article no. 19133 (2016) 15. Impedovo, D.: Velocity-based signal features for the assessment of Parkinsonian handwriting. IEEE Signal Process. Lett. 26(4), 632–636 (2019) 16. Rao, K.R., Yip, P.: Discrete Cosine Transform: Algorithms, Advantages. Applications. Academic press, New York (2014) 17. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001) 18. Baraniuk, R.G.: Compressive sensing [lecture notes]. IEEE Signal Process. Mag. 24, 118–121 (2007) 19. Reitan, R.M.: Validity of the Trail Making Test as an indicator of organic brain damage. Perceptual Motor Skills 8(3), 271–276 (1958) 20. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: International Conference on Artificial Neural Networks, pp. 270–279. Springer, Cham (2018)

21. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, June 2009 22. Chollet, F.: Keras. Keras documentation: Keras Applications. Keras.io (2020). https://keras. io/api/applications/. Accessed 30 Oct 2020 23. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710 (2018) 24. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 25. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016) 26. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261 (2016) 27. Liwicki, M., Graves, A., Fernàndez, S., Bunke, H., Schmidhuber, J.: In Proceedings of the 9th International Conference on Document Analysis and Recognition, ICDAR 2007 (2007) 28. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014) 29. Pereira, C.R., Weber, S.A., Hook, C., Rosa, G.H., Papa, J.P.: Deep learning-aided Parkinson’s disease diagnosis from handwritten dynamics. In: 2016 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 340–346. IEEE, October 2016 30. Isenkul, M., Sakar, B., Kursun, O.: Improved spiral test using digitized graphics tablet for monitoring Parkinson’s disease. In: Proceedings of the International Conference on e-Health and Telemedicine, pp. 171–175, May 2014 31. Impedovo, D., et al.: Writing generation model for health care neuromuscular system investigation. In: International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, pp. 137–148. Springer, Cham, June 2013 32. Likforman-Sulem, L., Esposito, A., Faundez-Zanuy, M., Clémençon, S., Cordasco, G.: EMOTHAW: a novel database for emotional state recognition from handwriting and drawing. IEEE Trans. Hum. Mach. Syst. 47(2), 273–284 (2017) 33. Diaz, M., Ferrer, M.A., Impedovo, D., Pirlo, G., Vessio, G.: Dynamically enhanced static handwriting representation for Parkinson’s disease detection. Pattern Recogn. Lett. 128, 204– 210 (2019) 34. Impedovo, D., Pirlo, G., Vessio, G., Angelillo, M.T.: A handwriting-based protocol for assessing neurodegenerative dementia. Cognit. Comput. 11(4), 576–586 (2019) 35. Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 67–72. IEEE, November 2017 36. Doetsch, P., Zeyer, A., Ney, H.: Bidirectional decoder networks for attention-based end-toend offline handwriting recognition. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 361–366. IEEE, October 2016 37. Dentamaro, V., Impedovo, D., Pirlo, G.: Gait analysis for early neurodegenerative diseases classification through the Kinematic Theory of Rapid Human Movements. IEEE Access (2020) 38. 
Dentamaro, V., Impedovo, D., Pirlo, G.: Sit-to-stand test for neurodegenerative diseases video classification. In: International Conference on Pattern Recognition and Artificial Intelligence, pp. 596–609. Springer, Cham, October 2020

Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel

Zhihao Wang(1), Yanwei Yu(1,2)(B), Yibo Wang(1), Haixu Long(1), and Fazheng Wang(1)

(1) School of Software Engineering, University of Science and Technology of China, Hefei, China {cheshire,wrainbow,hxlong,wfzc}@mail.ustc.edu.cn
(2) Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China [email protected]

Abstract. Offline Chinese handwriting text recognition is a long-standing research topic in the field of pattern recognition. In previous studies, text detection and recognition are separated, which leads to the fact that text recognition is highly dependent on the detection results. In this paper, we propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with a text kernel that integrates global text feature information to optimize the recognition from multiple scales, which reduces the dependence on detection and improves the robustness of the system. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and the ICDAR-2013 competition dataset. Without any language model, the correct rates are 99.12% and 94.27% for line-level recognition, and 99.03% and 94.20% for page-level recognition, respectively. Code will be available at GitHub.

Keywords: Offline Chinese handwriting text page spotter · End-to-end · Robust · Text kernel · Multiple scales

1 Introduction

Offline Chinese handwriting text recognition remains a challenging problem. The main difficulties come from three aspects: the variety of characters, the diverse writing styles, and the problem of character touching. In recent years, methods based on deep learning have greatly improved recognition performance. There are two types of handwriting text recognition methods: page-level recognition methods and line-level recognition methods. For offline Chinese handwriting text recognition, most studies are based on line-level text recognition. Line-level recognition is mainly classified into two research directions: over-segmentation methods and segmentation-free methods. Over-segmentation methods first over-segment the input text line image into a sequence of primitive

Fig. 1. Results of End-to-end detection and recognition. The pink areas are the segmentation results of kernel areas and the text boxes are generated by the center points generation algorithm. (Color figure online)

segments, then combine the segments to generate candidate character patterns, forming a segmentation candidate lattice, and classify each candidate pattern to assign several candidate character classes, generating a character candidate lattice. Wang Q F et al. [1] first proposed the over-segmentation method from the Bayesian decision view and converted the classifier outputs to posterior probabilities via confidence transformation. Song W et al. [2] proposed a deep network using heterogeneous CNNs to obtain hierarchical supervision information from the segmentation candidate lattice. However, the over-segmentation method has inherent limitations: if the text lines are not correctly segmented, subsequent recognition becomes very difficult. Segmentation-free methods based on deep learning were proposed to solve this problem. Messina et al. [3] proposed multi-dimensional long short-term memory recurrent neural networks (MDLSTM-RNN) using the Connectionist Temporal Classification [4] (CTC) loss for end-to-end text line recognition. Xie et al. [5] proposed a CNN-ResLSTM model with a data preprocessing and augmentation pipeline to rectify the text pictures and optimize recognition. Xiao et al. [6] proposed a deep network with pixel-level rectification to integrate pixel-level rectification into CNN- and RNN-based recognizers. Page-level recognition can be classified into two-stage recognition methods and end-to-end recognition methods. The two-stage recognition methods apply two models for text detection and recognition respectively. End-to-end methods gradually compress the picture into several lines or a whole line

of feature maps for recognition. Bluche et al. [10] proposed a modification of MDLSTM-RNNs to recognize English handwritten paragraphs. Yousef et al. [11] proposed OrigamiNet, which constantly compresses the width of the feature map to 1 dimension for recognition. The two-stage methods generally detect the text line first and then cut out the text line for recognition. Li X et al. [7] proposed the segmentation-based PSENet with progressive scale expansion to detect arbitrary-shaped text lines. Liu Y et al. [8] proposed ABCNet, based on third-order Bezier curves, for curved text line detection. Liao M et al. [9] proposed Mask TextSpotter v3 with a Segmentation Proposal Network (SPN) and hard RoI masking for robust scene text spotting.
It can be noted that page-level recognition methods that do not detect text lines lose the information of the text location. If the text layout is complicated, it is difficult to correctly recognize text images with these methods. The method of Liao M et al. [9], which applies hard RoI masking to the RoI features instead of transforming text lines, may have to process very big feature maps and loses information when resizing them. Both the line-level and the two-stage recognition methods obtain the location information of the text line first, then segment the text line and recognize it, which actually separates detection and recognition. We think detection and recognition should not be separated. Detection can only provide local information about text lines, which makes it difficult to utilize the global information of the text image during recognition. Whether the detection box is larger or smaller than the ground truth box, it causes difficulties for subsequent recognition, because the alignment of text line images is performed in the original text page image, which is not robust enough. Moreover, the detection of text lines is so critical that if a text line is not detected well, its image cannot be accurately recognized. We believe that the key to text recognition lies in accurately recognizing text: we only need to know the approximate location of the text, not a precise detection.
In this paper, we propose a robust end-to-end Chinese text page recognition framework with text kernel segmentation and multi-scale information integration to unify text detection and recognition. Our main contributions are as follows:
1) We propose a novel end-to-end text page recognition framework, which utilizes global information to optimize detection and recognition.
2) We propose a method to align text lines with the text kernel, which uses center-lines to extract text line strips from feature maps.
3) We propose a text line recognition model with multi-scale information integration, which uses TCN and Self-attention instead of RNNs.
4) We have done a series of experiments to verify the effectiveness of our model and compare it with other state-of-the-art methods. We achieve state-of-the-art performance on both the CASIA-HWDB dataset and the ICDAR-2013 dataset. The page-level recognition performance of our method is even better than that of line-level recognition methods.

2 Method

The framework of our method is depicted in Fig. 2. This framework consists of three modules for text detection and recognition. The segmentation module is used to generate the segmentation map of the kernel area of the text line and the feature map of the text page. The connection module is introduced to extract the text line feature map according to the segmentation map. The recognition module is used for text line recognition which is based on DenseNet [15] with TCN and Self-attention. The segmentation and recognition results are shown in Fig. 1.


Fig. 2. Overview of our text page recognition framework. The dashed arrow denotes that the text line feature maps are transformed by the center lines with the kernel areas when predicting.

2.1 Segmentation Module

The segmentation module processes the input image to generate a feature map and a segmentation map that are one quarter the size of the original image. In this part, we mainly utilize the network structure of PANNet [12] for its good performance on text segmentation of arbitrary shapes. We use ResNet34 [13] as its backbone and change the strides of the 4 feature maps generated by the backbone to 4, 4, 8, 8 with respect to the input image. We extract larger feature maps because we need fine-grained feature information for text recognition. We retain the Feature Pyramid Enhancement Module (FPEM) and Feature Fusion Module (FFM) of PANNet for extracting and fusing feature information at different scales, and the number of repetitions of the FPEM is 4. The size of the text line kernel area is set to 0.6 of the original size, which is enough to distinguish different text line regions.

2.2 Connection Module

The connection module is used to extract the text line feature map according to the segmentation map. We believe that the feature map contains higher-dimensional information than the original image and can gather the text information into the kernel area, which makes the extraction of the text line feature map more robust. We transform the feature map randomly with the kernel area as the center, e.g. with a perspective transformation, so that the text feature information is concentrated in the kernel area. We scale all the text line feature maps to a height of 32 pixels for subsequent recognition. A rough sketch of this warping step is given below.
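The following sketch illustrates, with OpenCV on a plain image rather than on the network's multi-channel feature map, how one quadrilateral text-line region can be warped to a strip of height 32; the helper name and its arguments are assumptions for illustration only.

import cv2
import numpy as np

def warp_line(image, quad, out_h=32):
    # quad: 4 corner points of the (expanded) text-line box,
    # ordered top-left, top-right, bottom-right, bottom-left.
    quad = np.asarray(quad, dtype=np.float32)
    w = max(np.linalg.norm(quad[1] - quad[0]), np.linalg.norm(quad[2] - quad[3]))
    h = max(np.linalg.norm(quad[3] - quad[0]), np.linalg.norm(quad[2] - quad[1]))
    out_w = max(1, int(round(w * out_h / max(h, 1e-6))))
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(quad, dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))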

Fig. 3. The center points are generated by continuously finding the center of the current largest inscribed circle.

We assume that a text line is a strip whose center-line can be considered to pass through the center of each character. Generally speaking, the length of a text line is greater than its height, and each text line has one center-line. We use the inscribed circles of the text line strip to find the text center-line, assuming that the center of the largest inscribed circle of the strip falls on it. During training, we use a perspective transformation to align the text line according to the ground truth text box, and the aligned text line feature maps are randomly affine transformed toward the interior for data enhancement, which allows the model to learn to concentrate the text feature information in the kernel area. During evaluation, we align text lines based on the segmentation map. But in this way we can only get the contour of the text line kernel region; we also need to get the trajectory of the text line. Here we propose Algorithm 1 to generate the center-line based on the contour. We need to calculate the shortest distance between each inside point and the contour boundary, so that we can get the maximum

inscribed circle radius at each point in the contour. Distance denotes the Euclidean distance between two points, and Area and Perimeter denote the area and perimeter of the contour.

Data: contour
Result: Points_center
Calculate the shortest distance between each inside point and the contour boundary as distances;
Points_center = [];
minR = Area(contour) / Perimeter(contour);
while Max(distances) > minR do
    distance_max = Max(distances);
    Delete distance_max from distances;
    Add the point with distance_max to Points_center;
    for point_i in contour do
        if Distance(point, point_i) < 4 * minR then
            distance of point_i = 0;
        else
            pass;
        end
    end
end
Algorithm 1: Center points generation

This algorithm continuously finds the largest inscribed circle in the contour and adds its center to the point set of the contour center-line; we also add the points that extend at both ends of the center-line to the center point set. Figure 3 shows the process of generating the center points. We assume that the abscissa of the start point is smaller than that of the end point, and we can reorder the points by the distance between them, as shown in Algorithm 2. We use thin-plate-spline (TPS) [14] interpolation to align the images, which matches and transforms the corresponding points while minimizing the bending energy generated by the deformation. According to the points and radius of the center-line, we can generate coordinate points for the TPS transformation, as shown in Fig. 4. Through the TPS transformation, we can transform irregular text line strips into rectangles. A simplified sketch of the center point generation is given below.
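The sketch below approximates Algorithm 1 with OpenCV's distance transform: the distance map gives, for every pixel inside the kernel region, the radius of the largest inscribed circle centered there, so center points can be picked greedily. It is an illustrative approximation under these assumptions, not the authors' implementation.

import cv2
import numpy as np

def center_points(mask):
    # mask: binary (H, W) array of one text-line kernel region, values in {0, 1}.
    mask_u8 = (mask > 0).astype(np.uint8)
    dist = cv2.distanceTransform(mask_u8 * 255, cv2.DIST_L2, 5)
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    contour = max(contours, key=cv2.contourArea)
    min_r = cv2.contourArea(contour) / max(cv2.arcLength(contour, True), 1e-6)
    points, work = [], dist.copy()
    while work.max() > min_r:
        y, x = np.unravel_index(np.argmax(work), work.shape)
        points.append((int(x), int(y)))
        # Suppress a neighbourhood of roughly 4*minR around the chosen centre, as in Algorithm 1.
        cv2.circle(work, (int(x), int(y)), max(1, int(round(4 * min_r))), 0, -1)
    return points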

2.3 Recognition Module

This module is roughly equivalent to traditional text line recognition, except that we replace the text line image with the text line feature map. Most methods use RNNs to construct semantic relations over time-sequential feature maps. We think that RNNs have three main disadvantages: 1) they are prone to gradient vanishing; 2) their calculation speed is slow; 3) they do not handle long text well.

Data: Points_c of length l, minR
Result: Points_reorder_c
Points_reorder_c = [];
if Len(Points_c) == 1 then
    Points_new_c = [Points_c];
else
    Points_new_c = Points_c[:2];
    for i ← 2 to l do
        left_d = Distance(point_i, Points_new_c[0]);
        right_d = Distance(point_i, Points_new_c[-1]);
        left_right_d = Distance(Points_new_c[0], Points_new_c[-1]);
        if right_d > left_right_d and right_d > left_d then
            Insert point_i into Points_new_c at position 0;
        else
            if left_d > left_right_d and right_d < left_d then
                Insert point_i into Points_new_c at position -1;
            else
                mid = argmin_{j ∈ [1, l]} (Distance(point_i, Points_c[j]) + Distance(point_i, Points_c[j - 1]));
                Insert point_i into Points_new_c at position mid;
            end
        end
    end
end
Algorithm 2: Center points reordering

Fig. 4. TPS transformation on the text picture. Actually, our method performs transformation operations in the feature map.

We design a multi-scale feature extraction network for text line recognition without using RNNs. Compared with CNNs, TCN and Self-attention have stronger long-distance information integration capabilities. The structure of the recognition module is shown in Fig. 5.

Fig. 5. Network architecture of recognition module.

For the task of text line recognition, we think there are three levels of recognition. The first is to extract features of each character based on that character's image information. The second is to perform auxiliary feature extraction based on the several characters surrounding each single character. The third is to optimize features based on global text information. For the first level, we use DenseNet as the backbone to perform further feature extraction on the feature map, gradually compressing the height of the feature map to 1 and converting it into a time-sequential feature map. The multiplicative factor for the number of bottleneck layers is 4, and 32 filters are added in each Dense Layer. We add a CBAM [16] layer before each Dense Block to exploit both spatial and channel-wise attention. The other detailed configuration of DenseNet is presented in Table 1. We use the Temporal Convolutional Network (TCN) [17] for the second level of feature extraction. Dilated convolutions are used in TCNs to obtain a larger receptive field than a plain CNN. Compared with RNNs, a TCN computes in parallel, the calculation speed is faster, and the gradient is more stable. A 4-layer TCN is introduced in our model with a maximum dilation of 8, which gives the model a receptive field of size 32. For the third level, we adopt the structure of Self-attention for global information extraction. We only use the encoder layer of the Transformer [18], whose multi-head self-attention mechanism is the core used to build global feature connections.


Table 1. The structure of DenseNet.

Module | Output size | Config
Convolution 1 | 16 × W/2 | 3 × 3 conv, stride 2, 512 → 512
Dense block 1 | 16 × W/2 | [1 × 1 conv, 3 × 3 conv] × 4, 512 → 640
Transition layer 1 | 8 × W/4 | 3 × 3 conv, stride 2 × 2, 640 → 448
Dense block 2 | 8 × W/4 | [1 × 1 conv, 3 × 3 conv] × 8, 448 → 704
Transition layer 2 | 4 × W/8 | 3 × 3 conv, stride 2 × 2, 704 → 492
Dense block 3 | 4 × W/8 | [1 × 1 conv, 3 × 3 conv] × 8, 492 → 748
Transition layer 3 | 2 × W/8 | 3 × 3 conv, stride 2 × 1, 748 → 523
Flatten | W/8 | 523 → 1046
Convolution 2 | W/8 | 1 × 1 conv, 1046 → 1024

The self-attention mechanism is mainly used in the field of natural language processing, but the text obtained by text recognition also has semantic features, so we can use Self-attention to construct semantic associations to assist recognition. A 4-layer Self-attention encoder is applied in our model, with a hidden layer dimension of 1024 and 16 parallel attention heads. A condensed sketch of the sequence part of this module is given below.
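The PyTorch sketch below condenses the sequence part of the recognition module: a small dilated-convolution (TCN-style) stack followed by a 4-layer Transformer encoder with 16 heads, matching the hyperparameters quoted above. The DenseNet backbone and the CTC decoding are omitted, and layer details are assumptions rather than the authors' released code.

import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                      # x: (batch, channels, width)
        return self.act(self.conv(x)) + x      # residual connection

class SequenceHead(nn.Module):
    def __init__(self, channels=1024, num_classes=2703):
        super().__init__()
        self.tcn = nn.Sequential(*[TCNBlock(channels, d) for d in (1, 2, 4, 8)])
        enc_layer = nn.TransformerEncoderLayer(d_model=channels, nhead=16,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, feats):                  # feats: (batch, width, channels)
        x = self.tcn(feats.transpose(1, 2)).transpose(1, 2)
        x = self.encoder(x)
        return self.fc(x)                      # per-time-step logits for CTC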

2.4 Loss Function

The loss function is as follows:

L = L_text + α L_kernel    (1)

L_text is the character recognition loss, calculated with CTC, L_kernel is the loss of the kernel, and α balances the importance of the two; we set α to 0.1. Considering the imbalance between the text area and the non-text area, we use the dice coefficient to evaluate the segmentation results:

L_kernel = 1 − (2 Σ_i P_text(i) G_text(i)) / (Σ_i P_text(i)^2 + Σ_i G_text(i)^2)    (2)

P_text(i) and G_text(i) represent the value of the i-th pixel in the segmentation result and in the ground truth of the text regions, respectively. The ground truth of the text regions is a binary image, in which text pixels are 1 and non-text pixels are 0.
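A minimal PyTorch sketch of the combined loss of Eqs. (1) and (2); the CTC term is assumed to be computed elsewhere with torch.nn.CTCLoss, and only the dice term and the weighted sum are shown.

import torch

def kernel_dice_loss(pred, gt, eps=1e-6):
    # pred, gt: (batch, H, W) tensors; gt is the binary kernel ground truth.
    inter = (pred * gt).sum(dim=(1, 2))
    denom = (pred * pred).sum(dim=(1, 2)) + (gt * gt).sum(dim=(1, 2))
    return (1 - 2 * inter / (denom + eps)).mean()

def total_loss(ctc_loss_value, kernel_pred, kernel_gt, alpha=0.1):
    # Eq. (1): L = L_text + alpha * L_kernel
    return ctc_loss_value + alpha * kernel_dice_loss(kernel_pred, kernel_gt)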

3 Experiment

3.1 Dataset

We use CASIA-HWDB [19] as the main dataset, which has been divided into a train set and a test set. It includes offline isolated handwritten character pictures and offline unconstrained handwritten text page pictures. CASIA-HWDB1.0-1.2 contains 3,721,874 isolated character pictures of 7,356 classes. CASIA-HWDB2.0-2.2 contains 5,091 text page pictures, containing 1,349,414 characters of 2,703 classes, and is divided into 4,076 training samples and 1,015 test samples. Task 4 of the ICDAR-2013 Chinese handwriting recognition competition [20] is a dataset containing 300 test samples, with 92,654 characters of 1,421 classes. The isolated character pictures of CASIA-HWDB1.0-1.2 are used to synthesize 20,000 text page pictures, containing 9,008,396 characters; the corpus for synthesis is a dataset containing 2.5 million news articles. We regenerate the text line boxes using the smallest enclosing rectangle according to the outline of the text line, as shown in Fig. 6. We use the same method as PANNet to generate the text kernels, adopting the Vatti clipping algorithm [21] to shrink the text regions by clipping d pixels. The offset d can be determined as d = A(1 − r^2)/L, where A and L are the area and perimeter of the polygon that represents the text region, and r is the shrink ratio, which we set to 0.6. We convert some confusing Chinese symbols into English symbols, such as quotation marks and commas. A sketch of the kernel-shrinking step is given below.
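The kernel-shrinking step can be sketched as follows, using the offset d = A(1 − r^2)/L from the text; the use of the pyclipper library is an assumption (the paper only cites the Vatti clipping algorithm), as is the helper name.

import cv2
import numpy as np
import pyclipper

def shrink_kernel(polygon, ratio=0.6):
    # polygon: (N, 2) array of the text-region outline; offset d = A(1 - r^2) / L.
    pts = np.asarray(polygon, dtype=np.float32)
    area = cv2.contourArea(pts)
    perimeter = cv2.arcLength(pts, True)
    d = area * (1 - ratio ** 2) / max(perimeter, 1e-6)
    offsetter = pyclipper.PyclipperOffset()
    offsetter.AddPath([tuple(map(int, p)) for p in polygon],
                      pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    shrunk = offsetter.Execute(-d)
    return np.array(shrunk[0]) if shrunk else None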

Fig. 6. Regenerate the text boxes with the smallest enclosing rectangle.

3.2 Experimental Settings

We use PyTorch to implement our system. We train the whole system with the Adam optimizer, and the batch size is set to 4. We resize the pictures to a length not greater than 1200 or a width not greater than 1600. No language model is used to optimize the recognition results. We conducted two main experiments; the remaining experimental settings are as follows:

1) Train set: train set of CASIA-HWDB2.0-2.2.
   Test set: test set of CASIA-HWDB2.0-2.2.
   Learning rate: initialized to 1 × 10^-4 and multiplied by 0.9 every 2 epochs (see the sketch after this list).
   Training epochs: 50.
2) Train set: train set of CASIA-HWDB2.0-2.2 and 20,000 synthesized text page pictures.
   Test set: test set of CASIA-HWDB2.0-2.2 and test set of ICDAR-2013.
   Learning rate: initialized to 5 × 10^-5 and multiplied by 0.9 every 2 epochs.
   Training epochs: 50.
   The model weights of Experiment 1 are used for initialization, except for the final fully connected layer.
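A brief PyTorch sketch of the optimizer and learning-rate schedule of setting 1); model, train_loader and training_step are placeholders introduced for illustration, not parts of any released code.

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # model: placeholder network
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.9)

for epoch in range(50):
    for batch in train_loader:                # batch size 4, as stated above
        optimizer.zero_grad()
        loss = training_step(model, batch)    # placeholder for the loss of Sect. 2.4
        loss.backward()
        optimizer.step()
    scheduler.step()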

Data: kernel_boxes with length m, gt_boxes with length l
Result: kernel_boxes_group
kernel_boxes_group is a list with length l;
for i ← 0 to m do
    index = argmax_{j ∈ [0, l−1]} IOU(kernel_boxes[i], gt_boxes[j]);
    add kernel_boxes[i] to kernel_boxes_group[index];
end
Algorithm 3: Kernel boxes grouping
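A small Python sketch of Algorithm 3, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples; the IoU helper is an illustrative assumption.

def iou(a, b):
    # Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / max(area_a + area_b - inter, 1e-6)

def group_kernel_boxes(kernel_boxes, gt_boxes):
    groups = [[] for _ in gt_boxes]
    for kb in kernel_boxes:
        best = max(range(len(gt_boxes)), key=lambda j: iou(kb, gt_boxes[j]))
        groups[best].append(kb)
    return groups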

3.3 Experimental Results

With Ground Truth Box. The ground truth boxes are used for segmentation, so that our model is equivalent to a text line recognition model. But compared with previous text line recognition models, our model can utilize the global information of the text page for recognition, which gives stronger anti-interference capability and robustness. The experimental results are shown in Table 2. Notably, our method is superior to the previous results by large absolute margins of 1.22% CR and 2.14% CR on the CASIA-HWDB2.0-2.2 and ICDAR-2013 datasets without a language model. The recognition performance of our model on the CASIA-HWDB2.0-2.2 dataset is even better than that of methods using a language model. Even when the text boxes do not include the whole text line, our model can recognize the text line correctly, as shown in Fig. 7. We also conduct an experiment with the size of the gt boxes randomly changed, in which the outline points of the gt boxes are randomly moved by −0.2 to 0.2 of the width in the vertical direction and by −1 to 1 of the width in the horizontal direction; the CR only drops by about 0.5%, as shown in Table 2, which shows that our model is very robust. Because we perform segmentation and transformation of text lines in the feature map of the text page, we can expand or shrink the transformation box so that the text information is concentrated in the core area of the

Fig. 7. Recognize text picture when the text box is not right.

text. Low-dimensional information is more sensitive: if such data enhancement is performed directly on the original picture, some text becomes incomplete and cannot be correctly recognized.

Table 2. Comparison with the state-of-the-art methods.

Method | Without LM: HWDB2 CR (%) / AR (%) | Without LM: ICDAR-2013 CR (%) / AR (%) | With LM: HWDB2 CR (%) / AR (%) | With LM: ICDAR-2013 CR (%) / AR (%)
Wu [23] | – / – | – / – | 95.88 / 95.88 | 96.20 / 96.20
Peng [22] | – / – | 89.61 / 90.52 | – / – | 94.88 / 94.88
Song [2] | 93.24 / 92.04 | 90.67 / 88.79 | 96.28 / 95.21 | 95.53 / 94.02
Xiao [6] | 97.90 / 97.31 | – / – | – / – | – / –
Xie [5] | 95.37 / 95.37 | 92.13 / 91.55 | 97.28 / 96.97 | 96.99 / 96.72
Ours, with gt box (a) | 99.12 / 98.84 | – / – | – / – | – / –
Ours, with center-line (a) | 99.03 / 98.64 | – / – | – / – | – / –
Ours, with changed gt box (c) | 98.58 / 98.22 | – / – | – / – | – / –
Ours, line level (d) | 98.38 / 98.22 | – / – | – / – | – / –
Ours, with gt box (b) | 98.55 / 98.21 | 94.27 / 93.88 | – / – | – / –
Ours, with center-line (b) | 98.38 / 97.81 | 94.20 / 93.67 | – / – | – / –

(a) Experiment 1. (b) Experiment 2. (c) Experiment 1 with the size of the gt boxes randomly changed. (d) Line-level recognition with modified recognition module.

With Center-Line. It can be noted that the CR computed with center-line segmentation only drops by about 0.1%, which fully proves the effectiveness of the center-line segmentation. Since one line of text may be divided into multiple text lines during segmentation, it is necessary to match the text boxes obtained by the segmentation with the ground truth boxes. We use the intersection over union (IOU) for this matching, as shown in Algorithm 3. Before running this algorithm, all the segmented boxes must be sorted by abscissa, so that multiple text lines are added to the corresponding group in a left-to-right order. Each group corresponds to one text line label. If there are extra boxes,

that is, boxes whose IOU with every ground truth box is 0, they can be added to any group; this does not affect the calculation of CR and AR. After all the segmented boxes are divided into groups, the detection result is considered correct when the length of the recognition result of the group is greater than or equal to 90% of the label length. Table 3 shows the detection performance of our model.

Effectiveness of the Segmentation Module. We can slightly change the structure of the recognition module to make it a text line recognition model. The main modification is to change from 3 Dense Blocks to 5 Dense Blocks, from [4, 8, 8] to [1, 4, 8, 8, 8]. The input data is changed to segmented text line images with a height of 64 pixels, the batch size is set to 8, and the other settings are similar to Experiment 1. The recognition results show that the segmentation module can make good use of the global text information to optimize text feature extraction, thereby improving recognition performance.

Table 3. Text detection results on the CASIA-HWDB2.0-2.2 and ICDAR-2013 datasets.

Dataset | Precision (%) | Recall (%) | F-measure (%)
HWDB2 | 99.97 | 99.79 | 99.88
ICDAR-2013 | 99.91 | 99.74 | 99.83

Effectiveness of TCN and Self-attention. We conduct a series of ablation experiments on the CASIA-HWDB2.0-2.2 dataset, without data augmentation, to verify the effectiveness of TCN and Self-attention, with results shown in Table 4. On the test set, 1.08% and 1.66% CR improvements are obtained by the TCN module and the Self-attention module, respectively, and the CR is increased by 1.96% when the TCN module and the Self-attention module are applied together.

Table 4. Ablation experiment results on CASIA-HWDB2.0-2.2.

Method | With gt box: CR (%) / AR (%) | With center-line: CR (%) / AR (%)
Baseline | 97.18 / 96.81 | 97.09 / 96.63
+Self-attention | 98.74 / 98.42 | 98.66 / 98.28
+TCN | 98.26 / 97.97 | 98.03 / 97.62
+TCN and Self-attention | 99.12 / 98.84 | 99.03 / 98.64

4 Conclusion

In this paper, we propose a novel, robust end-to-end Chinese text page spotter framework. It utilizes global information to concentrate the text features in the kernel areas, so that the model only needs to detect the text area roughly for recognition. TPS transformation is used to align the text lines with center points, and TCN and Self-attention are introduced into the recognition module for multi-scale text information extraction. The framework can perform end-to-end text detection and recognition, or optimize recognition when ground-truth text boxes are provided. The architecture is also easy to modify: the segmentation module and the recognition module can be replaced by better models. Experimental results on the CASIA-HWDB2.0-2.2 and ICDAR-2013 datasets show that our method achieves state-of-the-art recognition performance. Future work will investigate the performance of our method on English datasets.

References 1. Wang, Q., Yin, F., Liu, C.: Handwritten Chinese text recognition by integrating multiple contexts. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1469–1481 (2012) 2. Wang, S., Chen, L., Xu, L., Fan, W., Sun, J., Naoi, S.: Deep knowledge training and heterogeneous CNN for handwritten Chinese text recognition. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 84–89 (2016) 3. Messina, R.O., Louradour, J.: Segmentation-free handwritten Chinese text recognition with LSTM-RNN. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 171–175 (2015) 4. Graves, A., Fern´ andez, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning (2006) 5. Xie, C., Lai, S., Liao, Q., Jin, L.: High Performance Offline Handwritten Chinese Text Recognition with a New Data Preprocessing and Augmentation Pipeline. DAS (2020) 6. Xiao, S., Peng, L., Yan, R., Wang, S.: Deep network with pixel-level rectification and robust training for handwriting recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 9–16 (2019) 7. Li, X., Wang, W., Hou, W., Liu, R., Lu, T., Yang, J.: Shape robust text detection with progressive scale expansion network. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9328–9337 (2019) 8. Liu, Y., Chen, H., Shen, C., He, T., Jin, L., Wang, L.: ABCNet: real-time scene text spotting with adaptive Bezier-curve network. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9806–9815 (2020) 9. Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting. Arxiv, arXiv:abs/2007.09482 (2020) 10. Bluche, T.: Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: NIPS (2016)


11. Yousef, M., Bishop, T.E.: OrigamiNet: weakly-supervised, segmentation-free, onestep, full page text recognition by learning to unfold. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14698–14707 (2020) 12. Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8439–8448 (2019) 13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) 14. Bookstein, F.: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11, 567–585 (1989) 15. Huang, G., Liu, Z., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017) 16. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/9783-030-01234-2 1 17. Lea, C.S., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.: Temporal convolutional networks for action segmentation and detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1003–1012 (2017) 18. Vaswani, A., et al.: Attention is All you Need. ArXiv, arXiv:abs/1706.03762 (2017) 19. Liu, C., Yin, F., Wang, D., Wang, Q.: CASIA online and offline Chinese handwriting databases. In: 2011 International Conference on Document Analysis and Recognition, pp. 37–41 (2011) 20. Yin, F., Wang, Q., Zhang, X., Liu, C.: ICDAR 2013 Chinese handwriting recognition competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1464–1470 (2013) 21. Vatti, B.R.: A generic solution to polygon clipping. Commun. ACM 35, 56–63 (1992) 22. Peng, D., Jin, L., Wu, Y., Wang, Z., Cai, M.: A fast and accurate fully convolutional network for end-to-end handwritten Chinese text segmentation and recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 25–30 (2019) 23. Wu, Y., Yin, F., Liu, C.: Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models. Pattern Recognit. 65, 251–264 (2017)

Data Augmentation vs. PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition

Ahmad-Montaser Awal(B), Timothée Neitthoffer, and Nabil Ghanmi

Research Department - ARIADNEXT, Cesson-Sévigné, France
{montaser.awal,timothee.neithoffer,nabil.ghanmi}@ariadnext.com

Abstract. Modern OCRs rely on recurrent deep neural networks for their text recognition task. Despite their known capacity to achieve high accuracies, these networks suffer from several drawbacks: they are highly dependent on the training data-set and particularly non-resistant to shift and position variability. A data augmentation policy is proposed to remedy this problem; it allows generating realistic variations from the original data. A novel fast and efficient fully convolutional neural network (FCNN) with invariance properties is also proposed. Its structure is mainly based on a multi-resolution pyramid of dilated convolutions, and it can be seen as an end-to-end 2D-signal-to-sequence analyzer that does not need recurrent layers. In this work, extensive experiments have been conducted to study the stability of recurrent networks. It is shown that data augmentation significantly improves network stability. Experiments have also confirmed the advantage of the PyraD-DCNN based system in terms of performance as well as stability and resilience to position and shift variability. A private data-set composed of more than 600k images has been used in this work.

Keywords: Text recognition · FCNN · Dilated convolutions · Deep learning · OCR · Data augmentation

1 Introduction

Optical character recognition (OCR) systems have become more and more necessary nowadays. Although they first appeared in the 80s [16], text recognition is still a challenging problem and new approaches are continuously emerging. This is motivated by the explosion of the use of digital images producing different formats of text, varying from high-quality scanned documents up to text in the wild. In this work, we are interested in text recognition in the context of Know Your Customer (KYC) systems, where service providers need to extract users' information from their Identity Documents (ID). It is worth noting that text localization and layout analysis are out of the scope of this paper: the text fields are already localized thanks to a document knowledge base. Despite this pre-localization, text field recognition is still challenging as the fields are embedded in complex textured backgrounds, as shown in Fig. 1. In addition, in constraint-free capture conditions, the quality of text fields is very variable, which makes this task even harder.

Fig. 1. Examples of text fields from various IDs in challenging conditions.

Nowadays, hybrid neural networks provide state-of-the-art performance on the task of text recognition. They are basically constructed by stacking two types of network, allowing them to simultaneously handle two types of information, spatial and temporal, as shown in Fig. 2(a). The first stack acts as an optical feature extractor based on convolutional layers (CNN). The second one is a temporal (sequential) modeling part based on a recurrent neural network (RNN, e.g. a BLSTM) that produces a sequence of character probabilities interpreted by a Connectionist Temporal Classification (CTC) heuristic [10]. However, this type of network presents a major drawback when it comes to storage and computational costs. In fact, CNN-BLSTM-like text recognition systems are costly, as a recurrent neuron has the same cost as 4 dense ones. Furthermore, the recurrence implies non-parallelizable computing over the text-reading axis. In addition, traditional convolution layers combined with the BLSTM layers make such systems dependent on the training data-set and sensitive to variations present in real text field images. We propose in this context the following contributions:
– an experimental study allowing to measure the sensitivity of a given text recognition system,
– a data-augmentation method to overcome sensitivity issues and increase the system's invariance,
– a novel text recognition system based on dilated convolutions,
– an empirical demonstration of the advantage of using an FCNN to overcome data dependency (i.e. to increase the network invariance).
It is then shown how this design provides a very good alternative to the costly CNN-BLSTM-like systems while reducing the system sensitivity. The rest of this paper is organized as follows. First, related works and a brief state of the art are presented in Sect. 2. The proposed data augmentation policy is then explained in Sect. 3, together with our novel FCNN architecture for text recognition. Results and experiments on a real data-set are finally discussed in Sect. 4, before concluding the paper and giving some perspectives.

2 Related Work

Early techniques of OCR systems were based on the use of hand-crafted features followed by traditional classifiers (SVM, HMM, ...) applied to segmented words or even characters. Segmentation-free approaches have become more popular allowing significant improvements in OCR accuracy. Such systems employed a simultaneous segmentation/recognition approach [7]. More recently, deep neural networks have emerged as a way to decode text once localized, often through convolutional layers, or using a sliding window to simultaneously localize and decode the text. Recurrent neural networks (RNN) have been massively used in this context as explained in the following section.

2.1 Bidirectional Short-Long Term Memory Neural Networks (BLSTM) for OCR

Basically, our benchmark system is based on a CNN-BLSTM neural network as described in [13]. The main idea is to learn the relationship between successive inputs by using the output of neurons as part of their own inputs. This is usually done by considering time as a discrete dimension and feeding the input piece by piece. Thus, an RNN is able to use the temporal structure of the data as an additional piece of information. In practice, it allows modelling the sequential nature of the text, the writing direction playing the role of the temporal dimension of the input signal. The input field image is cut into overlapping windows and then fed to a CNN that acts as a feature extractor. The features extracted at this stage are purely image dependent (e.g. lines, curves, ...) and no temporal information is used yet. A BLSTM then restores the temporal dimension of the extracted feature maps; this construction allows reading the features both from left to right and from right to left. Finally, a softmax layer is used to predict the probability of each possible character for each window, and a beam-search algorithm [10] is employed to convert the obtained probabilities into a text string. An overview of the structure is shown in Fig. 2(b), with the same CNN part (OFE) as in Fig. 4. However, besides their high cost (in terms of time and storage), BLSTMs suffer from high sensitivity to the text position in the field images, as will be shown in Sect. 4, and even small input shifts can significantly change the OCR result. To overcome this problem, a natural idea is to incorporate data variability in the training process.
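For concreteness, the following is a minimal PyTorch sketch of a generic CNN-BLSTM-CTC recognizer of this kind; the layer sizes and pooling scheme are illustrative assumptions, not the actual configuration of the benchmark system:

```python
import torch
import torch.nn as nn


class CnnBlstmCtc(nn.Module):
    """Generic CNN + BLSTM + CTC text recognizer (illustrative sizes only)."""

    def __init__(self, num_classes, height=32, channels=64, hidden=128):
        super().__init__()
        # Optical feature extractor: purely image-dependent features.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # halves height and width
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                  # halves height, keeps width
        )
        feat_h = height // 4
        # Sequence modelling: reads the feature columns in both directions.
        self.blstm = nn.LSTM(channels * feat_h, hidden,
                             num_layers=2, bidirectional=True, batch_first=True)
        # Per-frame character probabilities (one extra class for the CTC blank).
        self.head = nn.Linear(2 * hidden, num_classes + 1)

    def forward(self, images):                 # images: (B, 1, H, W)
        f = self.cnn(images)                   # (B, C, H', W')
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, W', C*H') = frame sequence
        seq, _ = self.blstm(f)
        return self.head(seq).log_softmax(-1)  # (B, W', num_classes + 1)


# Training would align the per-frame predictions with the target string via CTC:
# loss = nn.CTCLoss(blank=num_classes)(logits.permute(1, 0, 2), targets,
#                                      input_lengths, target_lengths)
```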

2.2 Data Augmentation

Data augmentation techniques are mostly used in image classification tasks. The direct intuitive solution is to artificially add samples with various changes (translation, rotation, etc.) to the training set. This can be done by applying specific transformations, such as adding blocks to simulate occlusions [6], or by augmenting the training data based on a parameterized distribution of applicable transformations [1]. In the latter, an adapted loss function is proposed to learn invariance from augmented data. The individual outputs obtained from


Fig. 2. (a) Global structure of NN architecture for OCR, (b) CNN-BLSTM structure for OCR

the augmented samples of the same label are then averaged to converge together to the correct label. While most data augmentation techniques are performed at the network entrance, a data augmentation policy inside the network itself is proposed in [20]. To this end, the intermediate feature maps are transformed to simulate data augmentation. Applying the transformation on the lower-layer feature maps (at the beginning of the network) was proven to be more efficient than applying it to higher layers (towards the end of the network). As we can notice, there are several augmentation operations, each of which makes the trained model less sensitive to one or more variations. In order to properly apply these techniques and avoid potential negative effects, several strategies have been developed. KeepAugment [8] tries to find the important image components in order to avoid degrading them when generating additional examples for data augmentation. Fast(er) AutoAugment [12] searches for an effective augmentation policy from a combination of augmentation techniques and the corresponding parameters. In [21], the authors propose three algorithms (WeMix, AugDrop, and MixLoss) to overcome the data bias introduced by the augmentation, as the augmented data distribution can differ from the original one. Despite their capacity to reduce the trained system's sensitivity, most of the aforementioned solutions increase the data quantity and thus make the training process harder (more difficult convergence) and longer.

2.3 Fully Convolutional Neural Networks in Modern OCR Systems

As mentioned earlier, recurrent layers are the costly part of OCR systems, preventing their use in embedded environments such as smartphones. Using plain CNN models could be thought of as the simplest way to avoid recurrent layers by transforming the problem into a multi-class classification one.


While this solution might be efficient to handle short strings, it does not scale to real contexts as it might be re-trained for each new text string to consider [9]. A more generic approach is proposed in [17,19] based on a sliding shared output layer. This solution, however, requires to build a secondary system to estimate text-lines length. Text recognition could also be seen as a combination of 2D signal processing (as image segmentation) and sequence processing (as language modeling). Both fields have seen recently common advancements based on the use of dilated convolution neural networks (DCNN). Such networks have been first introduced to expand observation windows within the convolution without impacting the input high-resolution features [22]. In fact, stacking dilated convolutional layers has allowed to simultaneously analyze multi-scale features. Networks with parallel branches of stacked dilated convolutions have been widely used in various applications, such as counting objects in real scenes [4], real-time brain tumor segmentation [3], or even text detection [18]. On the other hand, DCNNs have also been used in sequence processing systems. In this context, convolutions are applied to the sequence axis such as for modelling DNA sequence [11]. Dilated convolutions have shown a very good ability to handle very large sequences and have also been successfully applied for text classification as in [15]. Multiresolution approaches have also been inspired by those works. It has been shown that a progressive merging of layers is necessary to avoid unwanted artifacts from the dilations sub-sampling [5]. Finally, a similar design to our proposed architecture has been presented in [2] but using Gated Residual Dilated convolutions instead of densely connected ones. On the other hand, sharing weights among stacked convolutional layers allows to reproduce a recurrence mechanism. This type of mechanism is implemented in Recurrent Convolutional Neural Networks (RCNN) using Recurrent Convolution Layers (RCL) [2].

3 Propositions

In this section, our data augmentation approach is first detailed. Independently from the neural network architecture, it is intended to increase the robustness of text recognition to the real-life variations of the handled data. Then, our FCNN architecture, namely PyraD-DCNN, is presented. In addition to being faster and lighter than traditional CNN-BLSTM networks, it also significantly increases the OCR robustness to variations with respect to the training data-set.

3.1 Data Augmentation: A Probabilistic Approach

Generally, training data-sets contain samples drawn from representative examples of the real-life problem. In our context, training samples are extracted from identity documents by pre-segmenting field images through the document analysis process. This implies that the extracted field images have fixed margins around the centered text, as defined in the layout model of the document.


However, as the segmentation step is not perfect, this process results in some variability that was not seen during the training stage. In addition, it is strongly dependent on the analysis process, and any change results in margin variability. These two situations lead to unstable recognition systems with unexpected behaviour when shift variability is introduced (i.e. margins different from those seen during training). Data augmentation can be seen as the intuitive solution to improve the system's robustness to such unexpected variations. Thus, we propose an augmentation policy that adds field images with different margin configurations applied during the segmentation step. An augmentation policy that randomly samples margins on each side of the original images is able to incorporate natural variations, allowing the system to learn to deal with such cases. The augmentation also needs to preserve the original text content to avoid relabeling the newly generated images. To that aim, it must not incorporate text from other text lines nor cut characters from the current one. In order to respect this constraint, the randomly added margins must fall in the interval between a fixed threshold and the distance to the closest neighbouring text line. It is also important to keep the local context of the image. Indeed, in this study, we are working on highly textured backgrounds that are difficult to reproduce. As a result, it is not relevant to use padding techniques that modulate the dimensions by adding white pixels (for example), as they disrupt the background. To keep this important background, each text line is segmented using the largest possible margin as defined beforehand, and the coordinates of the text position in this image are kept as well. Then, several images are produced by applying the augmentation policy to the original enlarged image and coordinates (Fig. 3).


Fig. 3. Representation of the augmentation policy (the exact sizes are not respected for the sake of comprehension).

Now that the general concept of the augmentation is defined, there are multiple distributions to sample the margin from. In the real data, the segmentation is in general close to the optimal one, shifting from only some pixels following a normal law. It is then interesting to use such distribution to generate more realistic augmented data in regards to the real data. This raw distribution only


focuses on small variations around the optimal margin. Additional variations are thus necessary, as we want our system to be invariant to extreme margins. This is done by combining the normal distribution with the uniform distribution using a Bernoulli trial selector, as shown in Eq. 1.

Margin = Bern(p) * N(μ, σ) + (1 − Bern(p)) * U(0, max margin)    (1)

Here, Margin ∈ [0, max margin] is the sampled size of the margin, and max margin is the maximum size of the margin as discussed previously. Bern(p) is a Bernoulli trial of probability p; N(μ, σ) is the normal distribution whose parameters are μ, the designed optimal margin of the system, and σ, the variance observed in the real data distribution; and U(0, max margin) is the uniform distribution. This distribution can be tuned through p to adapt it to the desired behaviour. Nevertheless, data augmentation dramatically increases the quantity of training data (up to ×5 to obtain significant improvements). This growth lengthens training time and makes network convergence more difficult. In addition, data augmentation can be seen as a workaround to the main problem, neural network sensitivity. The PyraD-DCNN is proposed as a more efficient solution to this problem, as we will see in the next section.
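A minimal sketch of the sampling rule in Eq. 1 is given below; the default parameter values and function names are placeholders, not the values used in the paper:

```python
import random


def sample_margin(p=0.5, mu=5.0, sigma=2.0, max_margin=10):
    """Draw one margin (in pixels) following Eq. (1): a Bernoulli(p) trial selects
    between the 'realistic' normal component and the uniform component that
    covers extreme margins."""
    if random.random() < p:
        value = random.gauss(mu, sigma)
    else:
        value = random.uniform(0, max_margin)
    # Clip to [0, max_margin] so no neighbouring text line is included.
    return int(min(max(value, 0), max_margin))


def augment_field(text_box, n_variants=5, **kwargs):
    """Enlarge a field box (x1, y1, x2, y2) with independently sampled margins."""
    variants = []
    for _ in range(n_variants):
        left, top, right, bottom = (sample_margin(**kwargs) for _ in range(4))
        x1, y1, x2, y2 = text_box
        variants.append((x1 - left, y1 - top, x2 + right, y2 + bottom))
    return variants
```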

3.2 PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition

Differently from the designs discussed in Sect. 2.3, the proposed model relies on a pyramid of densely connected dilated convolutional layers [14]. The main goal of this design is to reproduce the ability of LSTM layers to handle long/short-term spatial dependencies using lighter alternatives with a lower storage footprint. Indeed, introducing dilated convolutions allows the network size to be dramatically reduced: the dilated kernels allow very large windows to be processed with relatively shallow stacks of convolutional layers. In addition, they capture long-term dependencies, replicating the behavior of LSTM layers and leading to accuracy improvements. Moreover, the pyramid-like design makes it possible to combine downscale and upscale dilated layers, which is particularly helpful to include different levels of feature maps. Thus, we use a stack of dilated convolutions in increasing order of dilation (1-2-4), then in decreasing order (4-2-1), to operate a smooth fusion of the highly dilated features produced in the first half. This design is expected to help produce accurate and robust features by making the best use of the DCNN, and thus to naturally learn invariance to text position variability. Furthermore, the depth of the network requires the introduction of skip connections, which have been shown to be more efficient than residual connections. The main idea is to forward the concatenation of the input and the output of a given layer; in this way, the following layers receive features with different levels of abstraction or scales of the visual input field. As the different levels of dilation induce different context views, all of which are important to the decision, the pyramid needs to be densely connected in order to compose the different upstream steps. This configuration helps to manage both short- and long-term dependencies, completely reproducing the LSTM behavior. The drawback of using skip connections is the increasing number of channels when stacking skipping layers; the direct effect is a growing number of features that may cause the network to require an unaffordable number of parameters. Pointwise 1 × 1 convolutional layers are therefore used as bottlenecks on the input features, reducing their number of channels to conv depth (Fig. 5(b)). A 1 × 3 dilated channel-wise convolution is then applied along the spatial (row) dimension, followed by a ReLU activation and a dropout layer. The bottleneck is also introduced in the middle and at the end of the pyramid to limit any channel expansion by compressing the features down to a fixed number of channels, the feature depth. Overall, the proposed architecture is composed of:
– An optical feature extractor (OFE) producing the feature maps of the input image that will then be analyzed by the sequence modeller. The architecture of the implemented OFE is detailed in Fig. 4.
– A sequence modeller network transforming features into character probabilities. Three dilation units (Fig. 5(a)) are stacked twice to build a symmetrical multi-resolution pyramid (Fig. 5(c)).
– A connectionist temporal classifier aligning characters into text strings. It is used to interpret the output per-frame predictions and compute a loss that bridges the spatially aligned prediction to the text string.
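The following sketch illustrates the general idea of the dilated sequence modeller (pointwise bottleneck, 1 × 3 dilated convolutions, dense connections and a symmetric dilation profile); it is a simplified approximation, not the exact PyraD-DCNN implementation, and all hyper-parameters are assumptions:

```python
import torch
import torch.nn as nn


class DilationUnit(nn.Module):
    """Pointwise bottleneck + 1x3 dilated convolution along the reading axis."""

    def __init__(self, in_ch, conv_depth, dilation, dropout=0.2):
        super().__init__()
        self.bottleneck = nn.Conv2d(in_ch, conv_depth, kernel_size=1)
        self.dilated = nn.Conv2d(conv_depth, conv_depth, kernel_size=(1, 3),
                                 padding=(0, dilation), dilation=(1, dilation))
        self.act = nn.Sequential(nn.ReLU(), nn.Dropout(dropout))

    def forward(self, x):
        out = self.act(self.dilated(self.bottleneck(x)))
        # Dense/skip connection: forward the input together with the output.
        return torch.cat([x, out], dim=1)


class DilatedPyramid(nn.Module):
    """Symmetric pyramid of dilations (1-2-4-4-2-1) over the feature frames."""

    def __init__(self, feature_depth=128, conv_depth=64):
        super().__init__()
        layers, ch = [], feature_depth
        for d in (1, 2, 4, 4, 2, 1):
            layers.append(DilationUnit(ch, conv_depth, d))
            ch += conv_depth  # channels grow because of the concatenations
        self.pyramid = nn.Sequential(*layers)
        # Final bottleneck compresses the accumulated channels back down.
        self.compress = nn.Conv2d(ch, feature_depth, kernel_size=1)

    def forward(self, x):          # x: (B, feature_depth, 1, frames)
        return self.compress(self.pyramid(x))
```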

Fig. 4. Optical Feature Extractor architecture

4 Experiments

4.1 Data-Sets

We focus in this paper on the recognition of text fields in identity documents. Fields can be alphabetic (names, birth places, ...), numerical (ex. personal numbers), or alphanumerical (ex. addresses). Each training data-set is split as follows: 60% for training, 20% for validation and the last 20% for evaluation within the training cycle. The following data-sets are considered in our experiments.


Fig. 5. a) Dilated Convolution unit, b) Pointwise bottleneck unit, c) proposed sequence modeller architecture

Italian Driving License (IDL). This data-set is composed of 2790 training documents (22320 field images) and 252 test documents (2016 field images). Eight fields concerning the ID holder information are considered (named in Fig. 6). Each field, except for the dates and the document number, can be composed of multiple words. Two sets are extracted from the training documents: IDL-Train-Origin, composed of field images extracted with a fixed 5 px margin, and IDL-Train-Aug, generated via the augmentation policy applied to the original training set (with max margin = 10 px). Similarly, two sets are used for the test stage: IDL-Test-Origin, composed of field images extracted with a fixed 5 px margin, and IDL-Test-Var, constituted of field images whose margins are drawn from a normal distribution centered on 5 with a standard deviation of 2. This distribution is chosen to simulate "real-world" data variability, as measuring the real distribution would require intensive human annotation of a huge amount of real data. Even though the standard deviation would be smaller in reality, it allows the test set to sample extreme margins and be more representative of the possible variations.

EU-ID. Composed of seven different European ID documents, EU-ID counts a total of 115 characters, including digits, accented letters, and punctuation, representing some 15 different field types. Totaling 600 000 samples, this data-set offers high variability in field formats, fonts, and backgrounds.

4.2 Experimental Setup

Metrics. Both perfect field match rate (accuracy) and character error rate (CER) are considered during training cycles. At each training epoch the network is tested against the validation set, the best network in terms of accuracy is then evaluated on the test set to get the final scores.
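For reference, the character error rate can be computed as the edit distance between the prediction and the ground truth, normalised by the ground-truth length; the sketch below shows this standard definition (not the authors' evaluation code):

```python
def cer(prediction: str, reference: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length."""
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, 1):
        curr = [i]
        for j, r in enumerate(reference, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (p != r)))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)
```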


Hyper-parameters. All trainings were run on a GeForce GTX 1080 Ti. Inference (prediction) is run on a CPU-only setup, an i7-7820HQ CPU @ 2.90 GHz with 32 GB RAM, which is representative of an industrial deployment platform. The hyper-parameters of the optical feature extractor (Fig. 4) have been set experimentally. The input images are gray-scale, resized to 32 pixels in height, and padded to 800 pixels in width. The features are normed to 64 channels to feed the second part of the OFE. The alpha parameter of the LReLU is set to 0.3. The output features have 144 frames, with a depth related to the data-set character dictionary. The batch size is set to 16 and the training is stopped at the 50th epoch. In addition, the feature depth and channel depth have been set to 128 and 64 respectively. Dropout is set to 0.05 in the Optical Feature Extractor and 0.2 in the Sequence Modeller. We use the SGD algorithm to minimize the loss, with a learning rate that evolves between a minimum value of 0.0001 and a maximum of 0.01 following a periodic triangular profile.

Augmentation Setup. The use of a data augmentation policy increases the storage space as well as the time needed to pass through the data-set. In order to allow for faster experiments, we decided to generate 5 variants per text field and did not explore the evolution of the performance with respect to this number. Some fields are under-represented and thus difficult to learn (e.g. document number, birth date); to alleviate this problem, a larger number of variants is generated for those fields (here 10 variants). We sample the margins of the variants via a custom distribution that draws a margin either from a normal distribution or from a uniform distribution, as presented in Eq. 1. This choice is made according to a parameter p. In this work, we performed several experiments to choose p and selected the value giving the best results: p = 0.5.

4.3 Sensitivity of CNN-BLSTM Systems and the Use of Data Augmentation

Recurrent based systems view images as time series and model probabilities of the next step from what they have seen in the previous steps. Thus, they are intrinsically sensitive to unseen variations. Here, we demonstrate this sensitivity to margin size and text position as well as its impact on the OCR system through two experiments. We also show that the use of an adapted data augmentation method can alleviate the impact of such variations. Fixed Margin. In this experiment, the IDL-Test-Origin set is replicated by using a fixed margin Δx. In each copy, the zone of each extracted field is enlarged by the same margin Δx (regarding its original size as defined in the corresponding model). This experiment intends to show and compare the accuracy evolution


on the different sizes when the system is trained with IDL-Train-Origin (using a fixed margin xt = 5 px) and with the augmented set IDL-Train-Aug.

Fig. 6. Field accuracy with respect to the margin size (5 px being the margin used for IDL-Train-Origin) for (a) the CNN-BLSTM system trained on IDL-Train-Origin, (b) the CNN-BLSTM system trained on IDL-Train-Aug, (c) the PyraD-DCNN system trained on IDL-Train-Origin.

We can observe from Fig. 6(a) that the CNN-BLSTM trained with IDL-Train-Origin has an erratic behaviour on the different fields when shifting away from the "original" 5 px margin. The date fields, however, maintain a good accuracy thanks to their specific format. It can also be seen that the accuracy drops faster for the smaller margins than it does for the bigger margins. On the other hand, training the CNN-BLSTM system using the IDL-Train-Aug set produces a more stable behavior, as seen in Fig. 6(b). The accuracy of the different fields is kept at the same level across the different margin sizes. However, the system still suffers from some loss when facing a very small margin size (1 px).


Fig. 7. Comparison of the mean accuracy with respect to the margin size of the different systems: CNN-BLSTM trained on IDL-Train-Origin, CNN-BLSTM trained on IDL-Train-Aug and PyraD-DCNN trained on IDL-Train-Origin.

Figure 7 compares the mean accuracy of each system with respect to the margin size (mean accuracy over the fields). It confirms that the use of data augmentation techniques has an important impact on the invariance of the LSTM-based system. Indeed, training the system with the augmented data-set improves accuracy for every margin size compared to the system trained with the Original data-set.

Real-World Test Set. In this experiment, the impact of a real-world-like situation is studied. To that end, the two CNN-BLSTM systems (trained as explained in the previous section) are evaluated on the IDL-Test-Var set. One can observe from Table 1 that in the simulated real-world situation, the augmentation allows a significant improvement of the system's stability. If we also compare the mean loss of accuracy between the ideal case (5 px margin for every field) and the real-world case (variable test set), the system trained on the Original set loses 5.4 points, while the system trained on the Augmented set loses 1.45 points. These results demonstrate the sensitivity of CNN-BLSTM-based systems to variability in the position of the text in the image as well as in the size of the margins, making them highly dependent on segmentation. They also show that the use of a suitable augmentation policy can correct such sensitivity.

4.4 The Use of PyraD-DCNN for Text Recognition

The proposed FCNN is a new type of architecture that holds the same long/short-term properties as LSTM systems. However, as it gets rid of the slow recurrent part of the network, it is both faster and lighter. In addition, as it uses dilated convolutional layers in its design and aggregates global and local context, it is expected to be less sensitive to position shift and size.


Table 1. Accuracy of the CNN-BLSTM vs. PyraD-DCNN when tested on IDL-Test-Var.

Fields           | CNN-BLSTM          | CNN-BLSTM       | PyraD-DCNN
                 | (IDL-Train-Origin) | (IDL-Train-Aug) | (IDL-Train-Origin)
Names            | 86.5               | 89.7            | 90.1
Surnames         | 89.7               | 92.5            | 92.1
Birth date       | 87.3               | 90.1            | 89.7
Birth place      | 79.4               | 87.7            | 89.7
Document number  | 82.1               | 84.5            | 82.5
Emit date        | 88.5               | 89.3            | 87.7
Expiration date  | 88.9               | 88.1            | 89.7
Issued by        | 69.4               | 88.5            | 88.5

Performance Comparison with CNN-BLSTM Systems. We can observe from Table 2 that the training time is greatly reduced (on GPU) when using PyraD-DCNN. In addition, the proposed structure achieves very competitive scores in terms of accuracy and CER while reducing the inference time on CPU (up to three times faster).

Table 2. Proposed FCNN structure vs. LSTM-based system evaluated on EU-ID. *The LSTM network training has been stopped after a maximum of 50 h of training.

Model | Accuracy (%) | CER  | Inference time (ms, CPU) | Number of weights (×1000) | Training time on GPU
FCNN  | 94.65        | 0.81 | 13                       | 454                       | 13 h 20
LSTM  | 94.20        | 0.84 | 44                       | 360                       | 50 h*

Shift Invariance of PyraD-DCNN System. In order to evaluate the sensitivity of the PyraD-DCNN system, we compare it to the CNN-BLSTM system trained on the Augmented database on the different setups presented previously. Fixed Margin. From Fig. 6(c), it can be seen that the PyraD-DCNN has a similar behaviour to the CNN-BLSTM trained on IDL-Train-Aug with a very stable accuracy across the various margin sizes and some loss on the smaller margins. Moreover, from Fig. 7(c), the curves for the CNN-BLSTM system trained on IDL-Train-Aug and the PyraD-DCNN system are overlapping, showing that they bear the same invariance to position shifts and margin sizes.


Real-World Test Set. From the previous experiment, it has been shown that the PyraD-DCNN system is resilient to margin size modification and naturally has a behavior similar to the one obtained through augmentation. This experiment aims to verify the generalization of this property to a real-case scenario. To that end, the accuracy of the PyraD-DCNN is evaluated on the IDL-Test-Var set. From Table 1, it can be seen that both systems are equivalent. Moreover, they are both stable in the shifted-position scenario, as their performances in this setting drop only slightly compared to the ideal case of a 5 px margin (1.45 points for the CNN-BLSTM system trained on IDL-Train-Aug and 1.5 points for the PyraD-DCNN, on average). From these experiments, it can be observed that the PyraD-DCNN possesses a degree of shift invariance equal to that of a CNN-BLSTM system trained with data augmentation for that purpose.

Conclusion and Perspectives

The high sensitivity of CNN-BLSTM OCR systems to position shift has been shown in this paper. Such systems are highly dependent on the training data-set, and the use of the proposed data augmentation policy allows this drawback to be overcome. Furthermore, a new FCNN architecture has been proposed as a more generic solution, tackling the problem itself and avoiding the duplication of training data. Using dilated convolutions can replace traditional CNN-BLSTM layers while improving the system's performance in terms of time and storage. In addition, the multi-resolution build-up naturally increases the OCR invariance to position shifts and size variability. The aforementioned observations have been confirmed via experiments held on an industrial real data-set. More generally, the proposed architecture can be applied to a wide range of fields using 2D or even 3D kernels, such as end-to-end scene text detection and recognition, document localization, sound processing, or video processing. Future work will focus on testing the data augmentation policy in the context of the FCNN architecture. In addition, more experiments are to be held on public data-sets to confirm the generic aspects of the proposed approaches.

References 1. Benton, G., Finzi, M., Izmailov, P., Wilson, A.G.: Learning invariances in neural networks. arXiv preprint arXiv:2010.11882 (2020) 2. Chang, S.-Y., et al.: Temporal modeling using dilated convolution and gating for voice-activity-detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5549–5553. IEEE (2018) 3. Chen, C., Liu, X., Ding, M., Zheng, J., Li, J.: 3d dilated multi-fiber network for real-time brain tumor segmentation in MRI 4. Deb, D., Ventura, J.: An aggregated multicolumn dilated convolution network for perspective-free counting. CoRR 5. Devillard, F., Heit, B.: Multi-scale filters implemented by cellular automaton for retinal layers modelling. Int. J. Parallel Emergent Distrib. Syst. 35(6), 1–24 (2018)


6. Devries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. CoRR, abs/1708.04552 (2017) 7. Elagouni, K., Garcia, C., Mamalet, F., S´ebillot, P.: Combining multi-scale character recognition and linguistic knowledge for natural scene text OCR 8. Gong, C., Wang, D., Li, M., Chandra, V., Liu, Q.: KeepAugment: a simple information-preserving data augmentation approach. arXiv preprint arXiv:2011.11778 (2020) 9. Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V.: Multi-digit number recognition from street view imagery using deep convolutional neural networks (2013) 10. Graves, A., Fern´ andez, S., Gomez, F.J., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: International Conference on Machine Learning (2006) 11. Gupta, A., Rush, A.M.: Dilated convolutions for modeling long-distance genomic dependencies (2017) 12. Hataya, R., Zdenek, J., Yoshizoe, K., Nakayama, H.: Faster autoaugment: learning augmentation strategies using backpropagation (2019) 13. He, P., Huang, W., Qiao, Y., Loy, C., Tang, X.: Reading scene text in deep convolutional sequences. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) 14. Jouanne, J., Dauchy, Q., Awal, A.M.: PyraD-DCNN: a fully convolutional neural network to replace BLSTM in offline text recognition systems. In: International Workshop on Computational Aspects of Deep Learning (2021) 15. Lin, J., Su, Q., Yang, P., Ma, S., Sun, X.: Semantic-unit-based dilated convolution for multi-label text classification (2018) 16. Mori, S., Suen, C.Y., Yamamoto, K.: Historical review of OCR research and development. Proc. IEEE 80(7), 1029–1058 (1992) 17. Ptucha, R., Such, F.P., Pillai, S., Brockler, F., Singh, V., Hutkowski, P.: Intelligent character recognition using FCNN. Pattern Recogn. 88, 604–613 (2019) 18. Renton, G., Soullard, Y., Chatelain, C., Adam, S., Kermorvant, C., Paquet, T.: Fully convolutional network with dilated convolutions for handwritten text line segmentation (2018) 19. Such, F.P., Peri, D., Brockler, F., Paul, H., Ptucha, R.: Fully convolutional networks for handwriting recognition (2018) 20. Sypetkowski, M., Jasiulewicz, J., Wojna, Z.: Augmentation inside the network (2020) 21. Xu, Y., Noy, A., Lin, M., Qian, Q., Li, H., Jin, R.: WEMIX: how to better utilize data augmentation (2020) 22. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions (2015)

A Handwritten Text Detection Model Based on Cascade Feature Fusion Network Improved by FCOS

Ruiqi Feng1, Fujia Zhao1, Shanxiong Chen1(B), Shixue Zhang2, and Dingwang Wang1

1 Southwest University, Chongqing, China
[email protected]
2 Guizhou University of Engineering Science, Bijie, Guizhou, China

Abstract. In this paper, we propose a method for detecting handwritten ancient texts. The challenges in detecting this type of data are the complexity of the layout of handwritten ancient texts, the varying text sizes, the mixed arrangement of pictures and texts, the high number of hand-drawn patterns and the high background noise. Unlike general scene text detection tasks (ICDAR, TotalText, etc.), the texts in images of ancient books are more densely distributed. For the characteristics of this dataset, we propose a detection model based on cascade feature fusion called DFCOS, which aims to improve the fusion of localization information from lower layers. Specifically, bottom-up paths are created to use more localization signals from low levels, skip connections are incorporated to better extract information from the backbone, and the model is then improved by parallel cascading. We verified the effectiveness of our DFCOS on HWAD (Handwritten Ancient-Books Dataset), a dataset containing four languages - Yi, Chinese, Tibetan and Tangut - provided by the Institute of Yi of Guizhou University of Engineering Science and the National Digital Library of China; its precision, recall and F-measure outperform most existing text detection models.

Keywords: Scene text detection · Handwritten text detection · FCOS · Ancient books

1 Introduction

Scene text detection, which refers to locating the position of text regions in an image, has attracted a lot of attention in the field of computer vision in recent years as the first step in scene text reading. With the rapid development of deep learning and convolutional neural networks, researchers have proposed an increasing number of excellent frameworks for scene text detection. Most of these CNN-based methods are built on top of successful generic object detection frameworks or semantic segmentation technologies. However, at this stage, scene text detection remains challenging even with the support of deep neural networks, due to various distracting factors such as diverse text fonts and styles, variable text sizes and complex image backgrounds.

As an important carrier of Chinese culture, handwritten ancient books are of great historical value, and the digitization of handwritten ancient text images is vital for the preservation of cultural heritage. However, existing text detection methods do not perform well on handwritten ancient text documents for the following reasons: 1) the layout structure of handwritten ancient texts is complex, which means it is common to find mixed texts and illustrations, arbitrary text sizes, various text arrangements, a large number of hand-painted patterns and noisy backgrounds; 2) the text is densely distributed and the overall text target is smaller than in scene text; 3) the backgrounds and writing styles of different kinds of handwritten ancient texts differ. Therefore, we propose a new text detection method suited to the characteristics of ancient texts.

Anchor-free object detection methods have emerged in recent years. Compared with traditional anchor-based methods, this type of detection is more suitable for our research, because ancient text objects have greater variability in aspect ratio and orientation than the objects in generic object detection tasks, which is very demanding for the design of anchors, adds a huge amount of workload, and decreases the overall efficiency of the detection task. Another defect of anchor-based methods is that they do not handle multi-directional and curved texts well. Fully Convolutional One-Stage Object Detection (FCOS) [1] proposed a one-stage object detection algorithm based on FCN [2] and FPN [3], which is an excellent representative of the anchor-free methods. Its biggest advantage is that detection accuracy is maintained while anchors are eliminated. However, the FPN structure used in the FCOS model has an obvious drawback: it has only top-down paths with horizontal information fusion (an add operation), which makes the network obtain mainly information from the high levels and under-utilize the information from lower layers. This defect makes it less effective on ancient books with small and dense text objects. To overcome these problems, we propose a novel feature fusion network, which we design to integrate into the FCOS framework to improve the detection of textual objects in our ancient books. The main contributions of this paper are as follows: 1) We constructed a Handwritten Ancient-Books Dataset (HWAD) containing 8k images in four languages [4], which lays the data foundation for subsequent research on the digitization of handwritten ancient books; 2) For the smaller and denser objects of ancient books, inspired by FPN and its tendency to miss low-level localization information, we propose a bottom-up information fusion path and, at the same time, add an additional skip connection path between the backbone and the bottom-up outputs; 3) In order to more fully reuse the obtained feature information, we form a feature cascade fusion network by overlaying multiple bottom-up and skip-connected structures in a parallel cascade manner; 4) Since


the fixed sampling points in standard convolutional layers will restrict the receptive field to a fixed position, which is not conducive to the detection of ancient texts, we introduce a deformable convolution [5] for adaptive sampling; 5) After an in-depth study of the advantages and disadvantages of the FCOS, we introduce Gaussian weighted Soft-NMS [6] instead of NMS in the post-processing part, and replace the loss function in regression branch with a more suitable CIoU [7]. Experiments have proved that our DFCOS performs well on HWAD, surpassing most existing text detection methods.

2 Related Work

2.1 Scene Text Detection

In recent years, scene text detection methods can be roughly divided into two categories: regression-based methods and segmentation-based methods. Regression-based methods are often improved from commonly used object detection frameworks, such as Faster R-CNN [8] and SSD [9]. TextBoxes [10] modified the scale of the anchors and the convolution kernels for text detection on the basis of SSD. EAST [11] used an FCN to directly predict the score map, rotation angle and bounding box for each pixel. LOMO [12] proposed an iterative optimization module and introduced an instance-level shape expression module to solve the problem of detecting scene texts of arbitrary shapes. Regression-based methods often achieve good results, but they are highly dependent on manually set anchor sizes, and many regression-based text detectors are designed for specific text detection scenarios, resulting in low robustness. In segmentation-based methods, each pixel in the original image is classified as text or non-text to determine the approximate text region; this has become the mainstream approach for detecting multi-directional and arbitrary-shaped texts. PixelLink [13] predicted the connections between pixels and localized text regions by separating links belonging to different text instances. PSENet [14] adopted a progressive scale expansion network using ground-truths to generate a series of masks of different sizes, ultimately improving the detection of irregular text. Although PSENet works well, its post-processing is quite time-consuming, leading to slow inference, which is a common problem of segmentation-based detectors. To address this, Liao et al. [15] proposed DBNet, which incorporates the binarization process into training and removes it during inference, resulting in faster model inference.

2.2 Text Detection of Ancient Books

Over the past few years, a number of studies have been conducted on the detection of texts in ancient Chinese and minority languages. Su et al. [16] first binarized the Mongolian ancient texts by OTSU, then used the vertical projection


information to locate the text columns, and finally obtained individual Mongolian characters by analyzing connected components. However, the dataset involved in that study is neatly arranged, and the images are less polluted and noisy. Yang et al. [17] proposed a single-character detection framework for Chinese Buddhist scriptures that uses recognition results to guide detection. Shi et al. [18] detected and segmented oracle-bone characters written on bone fragments by a connected-component-based approach. Han et al. [19] presented a binarization algorithm based on the combination of Lab color space channels as well as local processing, starting from the different colors of Tibetan ancient documents. The detection of ancient texts in China started relatively late, and most current studies are conducted under the condition of standardized printed characters or a well-arranged layout with little noise, which does not fit the characteristics of most extant ancient texts. As a result, the obtained models do not generalize well. Our images are of different styles of handwritten texts with noise due to age, poor preservation, etc., so a study on this dataset is more meaningful.

3 Methodology

Figure 1 shows the architecture of our proposed network. It is based on FCOS, but to better avoid the problems of gradient vanishing and gradient explosion caused by too many layers, DenseNet [20] was chosen to build our backbone, i.e. each layer is connected to all previous layers by concatenation. This structure is also capable of reusing low-level information while ensuring its complete transmission. We propose a new feature fusion network that is independent of the feature extraction process; at the same time, we improve the feature extraction method, the loss function of the regression part and the post-processing algorithm, while keeping the other parts of the FCOS framework. Below, we introduce the specific improvements to each part.

3.1 Improved Multi-scale Feature Fusion Network

FPN, compared with using a single feature map for prediction, makes use of the features of neural networks at each stage and can handle the multi-scale variation in object detection (i.e. the fusion of low-resolution feature maps with strong semantic information and high-resolution feature maps with weak semantic information but rich spatial information) with a small increase in computational effort. This is why FCOS chose FPN as their feature fusion network. However, as the targets in our research are smaller and more densely arranged than those in object detection and scene text detection, we need to focus more on the low-level spatial localization signals, while the top-down fusion of FPN is rich in information but many detailed features will be lost after layer-by-layer pooling, which makes the fused feature maps focus more on the abstract information from higher layers and less on low-level spatial information, which is not exactly suitable for our images.


Fig. 1. Architecture of our model; we omit the structure of the subsequent prediction part because we adopt the same structure as FCOS. {C3, C4, C5, C6, C7} represent the features extracted by the CNN; {P3, P4, P5, P6, P7} represent the feature maps generated by the FPN, among which P6 and P7 are generated directly from P5; {D3, D4, D5, D6, D7} represent the feature maps generated by the bottom-up and skip connection structure.

Fig. 2. (a) Bottom-up feature fusion method; (b) Feature fusion structure with skip connection.

Bottom-Up Paths. The top-down structure pays more attention to the abstract information of higher layers, while a bottom-up network contains more low-level localization information. Thus, we consider combining the two to improve the performance of the model. As shown in Fig. 2a, bottom-up feature fusion paths are created to incorporate more localization signals from lower layers. Similar to FPN, feature combination still uses the add operation, which has little computational cost: the feature dimensions are kept unchanged and elements are added correspondingly. Also similar to FPN, the size of the feature map of each


layer among the bottom-up paths corresponds to the same layer of the previous level. The bottom-up fusion is calculated as follows:

D3 = Conv(P3)
D4 = Conv(P4 + Resize(D3))
...
D7 = Conv(P7 + Resize(D6))    (1)

where Resize is an upsampling operation used to make the sizes of the feature maps consistent, and Conv represents a convolution operation for feature processing.

Skip Connection Paths. We connect the inputs and outputs in a skip connection manner in both the top-down and bottom-up paths, as shown in Fig. 2b, which has the advantage of fusing more features without introducing additional parameters. Just as in the original intention of ResNet [21], skip connections can alleviate the problems of gradient vanishing and gradient explosion, while helping with backpropagation and speeding up the training process. By transferring the feature maps of the convolution layers to the bottom-up stage, we can obtain more details from the images and improve the final detection accuracy. Since the information in the backbone is the most important during the entire feature fusion process, we hope to combine the information in the backbone with the top-down features. The concrete calculation of the combined bottom-up and skip connection structure is shown in (2):

D3 = Conv(P3 + C3)
D4 = Conv(P4 + Resize(D3) + C4)
...
D7 = Conv(P7 + Resize(D6) + C7)    (2)

Cascade Feature Fusion Structure. With the addition of the bottom-up and skip connection structures, there is a visible improvement in the performance of the model, but to reuse the feature information more fully, we introduce the idea of cascading (i.e. multiple identical structures are connected in series, and the output of the previous part is used as the input of the next stage). Compared with a single-layer feature fusion network, this makes fuller use of the information in the backbone. However, it is worth mentioning that deeper cascade layers inevitably lead to more computation and higher model complexity. How to balance model performance and complexity is also an issue of great concern (see the experiments in the following parts of this paper).
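A minimal sketch of the fusion defined in Eq. (2) is given below; it assumes that the backbone features C_i have already been projected to the same channel width as the FPN outputs P_i (e.g. by the usual 1 × 1 lateral convolutions), and the convolution and resizing settings are illustrative:

```python
import torch.nn as nn
import torch.nn.functional as F


class BottomUpSkipFusion(nn.Module):
    """Computes D_i = Conv(P_i + Resize(D_{i-1}) + C_i) for levels 3..7."""

    def __init__(self, num_levels=5, channels=256):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_levels))

    def forward(self, c_feats, p_feats):
        # c_feats = [C3..C7] from the backbone, p_feats = [P3..P7] from the FPN.
        d_feats = [self.convs[0](p_feats[0] + c_feats[0])]            # D3
        for i in range(1, len(p_feats)):
            prev = F.interpolate(d_feats[-1], size=p_feats[i].shape[-2:],
                                 mode="nearest")                      # Resize(D_{i-1})
            d_feats.append(self.convs[i](p_feats[i] + prev + c_feats[i]))
        return d_feats                                                # [D3..D7]
```

Cascading, as described above, simply applies this module several times, feeding the previous stage's [D3..D7] outputs back in as the next stage's inputs.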

3.2 Deformable Convolution

There are many types of handwritten ancient books with complex and variable text sizes, resulting in different aspect ratios of the texts. The geometric structure of a standard convolution is fixed: each convolutional unit samples fixed positions of the input feature map. Since different kinds and sizes of text require different receptive fields, modulated deformable convolutions are applied in all the convolutional layers used for feature extraction, which allows the convolutional layers to adapt to the input images and enhances the overall network's ability to model geometric transformations.
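A sketch of such a modulated deformable layer is shown below, assuming a recent torchvision whose DeformConv2d accepts a modulation mask; the block layout and parameter choices are illustrative rather than the authors' exact layer.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d  # modulated variant via the `mask` argument

class ModulatedDeformBlock(nn.Module):
    """3x3 modulated deformable convolution (DCNv2-style): per-location offsets
    and a modulation mask are predicted from the input and passed to DeformConv2d."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.n_offsets = 2 * k * k                       # (dx, dy) per sampling point
        # A plain conv predicts 2*k*k offset channels plus k*k modulation channels.
        self.offset_mask = nn.Conv2d(in_ch, 3 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, :self.n_offsets], om[:, self.n_offsets:]
        return self.deform(x, offset, mask=torch.sigmoid(mask))
```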

3.3 Other Improvements

GIoU [22] was used as the loss function for the regression branch in FCOS. In our experiments we found that, during regression, GIoU tends to enlarge the predicted boxes in order to increase the intersection area between the bounding boxes and the ground-truth boxes, which slows the decrease of the average loss and requires more training iterations. In this paper we introduce CIoU as an alternative; it not only converges faster but also takes the aspect ratio of the texts into account. We retain the loss functions of the other parts of FCOS, so the total loss can be expressed as (3):

L(\{p_{x,y}\}, \{t_{x,y}\}) = \frac{1}{N_{pos}} \sum_{x,y} L_{cls}(p_{x,y}, c^{*}_{x,y}) + \frac{\lambda}{N_{pos}} \sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y} > 0\}} L_{reg}(t_{x,y}, t^{*}_{x,y}) + \mathrm{BCE}(\mathrm{Centerness})        (3)

L_{reg} = \mathrm{Centerness}^{*} \cdot L_{CIoU}        (4)

Equation (3) comes from FCOS; the part we change, the regression loss L_{reg}, is shown in (4). Because the NMS algorithm applies a one-size-fits-all cut according to scores, texts tend to be over-cut (a single character split into multiple detections) or under-cut (the bounding box failing to cover the entire character) during prediction, so Soft-NMS with Gaussian weighting is chosen to optimize the post-processing module.
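A minimal sketch of the centerness-weighted CIoU term of Eq. (4) is given below, assuming torchvision >= 0.13 (which provides complete_box_iou_loss) and decoded boxes in (x1, y1, x2, y2) format; the function and variable names are ours.

```python
import torch
from torchvision.ops import complete_box_iou_loss  # available in torchvision >= 0.13

def regression_loss(pred_boxes, target_boxes, centerness_targets):
    """Centerness-weighted CIoU loss for positive locations.

    pred_boxes, target_boxes: (N, 4) decoded boxes in (x1, y1, x2, y2) format.
    centerness_targets: (N,) ground-truth centerness values in [0, 1].
    """
    ciou = complete_box_iou_loss(pred_boxes, target_boxes, reduction="none")
    # Weight each location's CIoU loss by its centerness target and normalise.
    return (centerness_targets * ciou).sum() / centerness_targets.sum().clamp(min=1e-6)
```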

4 Dataset

The dataset contains scanned images of ancient books provided by the Research Institute of Yi Nationality Studies of Guizhou University of Engineering Science and the National Digital Library of China, covering four languages: Yi, Chinese, Tibetan and Tangut. After manual organization and screening, 600 Yi scriptures, 500 Chinese scriptures, 300 Tangut scriptures and 200 Tibetan scriptures were obtained. These images were divided into simple layouts (Fig. 3a) and complex layouts (Fig. 3b) according to whether the layout structure is neat: 1,250 (78%) of the documents have a simple layout and 350 (22%) a complex layout. Following ICDAR 2015 [23], we annotated each text instance by marking four points clockwise, starting from the top-left corner.


Fig. 3. (a) Images in simple layout; (b) Images in complex layout.

Because the number of images is insufficient, data augmentation is introduced in this paper. In addition to regular methods such as panning, rotation, flipping, scaling, colour dithering and adding noise, we propose a stitching-based data augmentation inspired by CutMix [24], explained in detail below. Samples with a simple layout are divided into four equal parts around the centre point by a script; samples with a complex layout are cropped manually, keeping only the geometrically shaped text layouts, hand-drawn graphics and irregular parts. The cropped images are pooled, four of them are taken at a time, each is given conventional data augmentation, and they are then stitched together, with the gaps in the background filled with grey, as shown in Fig. 4. The 8k images obtained after data augmentation form our entire dataset: 4,000 simple pages and 4,000 complex pages, named HWAD-s and HWAD-c respectively for brevity. We randomly selected 70% of these images as the training set and used the rest as the test set.

Fig. 4. Image stitching method.
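The stitching step described above could be sketched as follows; the output size, grey value, and the assumption that per-crop augmentations (and the corresponding annotation shifts) have already been applied are ours.

```python
import random
from PIL import Image

def stitch_four(crops, out_size=(1024, 1024), background=(128, 128, 128)):
    """Stitch four (already augmented) crops onto a grey canvas in a 2x2 grid,
    leaving any uncovered background grey. `crops` is a list of >= 4 PIL images."""
    canvas = Image.new("RGB", out_size, background)
    cell_w, cell_h = out_size[0] // 2, out_size[1] // 2
    for idx, crop in enumerate(random.sample(crops, 4)):
        patch = crop.copy()
        patch.thumbnail((cell_w, cell_h))            # keep aspect ratio inside the quadrant
        x, y = (idx % 2) * cell_w, (idx // 2) * cell_h
        canvas.paste(patch, (x, y))
    return canvas
```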

5 Experiments

5.1 Implementation Details

The proposed model is implemented in the PyTorch [25] framework using the open-source object detection toolkit MMDetection [26]. Experiments are conducted on a workstation with a 3.6 GHz CPU, an RTX 2080 Super 8 GB GPU and a 64-bit Ubuntu OS. SGD is chosen as the optimizer, with momentum 0.9 and weight decay 0.0001. We also use a warm-up strategy that increases the learning rate from 0 to 0.0025 so that the model stabilizes quickly, and then reduce the learning rate with cosine annealing as the loss approaches its minimum.
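A compact sketch of such a warm-up plus cosine-annealing schedule is shown below; the warm-up and total iteration counts are illustrative values, not the authors' settings.

```python
import math
import torch

def build_optimizer_and_scheduler(model, max_lr=0.0025, warmup_iters=500, total_iters=100000):
    """SGD with momentum and weight decay, linear warm-up from 0 to max_lr,
    then cosine annealing towards zero."""
    optimizer = torch.optim.SGD(model.parameters(), lr=max_lr,
                                momentum=0.9, weight_decay=1e-4)

    def lr_lambda(it):
        if it < warmup_iters:                                   # linear warm-up
            return it / max(1, warmup_iters)
        progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
        return 0.5 * (1.0 + math.cos(math.pi * progress))       # cosine annealing

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```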

5.2 Ablation Study

Bottom-Up and Skip Connection Fusion. In this part, we apply the bottom-up and skip-connection paths to the feature fusion module of FCOS. Since testing the two structures separately would show little effect, we evaluate them in combination. Table 1 shows that the proposed structure significantly improves the performance of FCOS with ResNet-50, ResNet-101 and DesNet backbones. For ResNet-50, F-measure gains of 2.9% and 11.3% are achieved on HWAD-s and HWAD-c respectively. For ResNet-101, it brings improvements of 1.9% (HWAD-s) and 3.4% (HWAD-c). For DesNet, the module yields improvements of 3.3% (HWAD-s) and 6.9% (HWAD-c).

Table 1. Effect of the bottom-up and skip-connection fusion structure on HWAD-s and HWAD-c

Method         | HWAD-s P | HWAD-s R | HWAD-s F1 | HWAD-c P | HWAD-c R | HWAD-c F1
DesNet         | 0.801    | 0.785    | 0.793     | 0.640    | 0.595    | 0.617
DesNet+skip    | 0.824    | 0.828    | 0.826     | 0.679    | 0.694    | 0.686
ResNet50       | 0.698    | 0.675    | 0.687     | 0.432    | 0.365    | 0.395
ResNet50+skip  | 0.716    | 0.716    | 0.716     | 0.498    | 0.519    | 0.508
ResNet101      | 0.814    | 0.773    | 0.793     | 0.513    | 0.432    | 0.469
ResNet101+skip | 0.824    | 0.800    | 0.812     | 0.547    | 0.465    | 0.503

Cascade Feature Fusion Network. Building on the experiment above, we further verify the effectiveness of the cascade feature fusion network and explore whether the cascade operation saturates. Because the experiments become very time-consuming as the cascade deepens, we conduct this experiment only on HWAD-s. As shown in Fig. 5, as the number of cascade layers increases, the three evaluation metrics P, R and F1 first rise, reach a peak, and then begin to decline. Taking DesNet as an example, P reaches its maximum at three cascade layers. Although recall increases at four layers, the F-measure is almost unchanged; recall matters, but every additional cascade layer increases model complexity, and a weak increase in a single metric is of little significance. At five layers the performance even starts to degrade. Based on these results, we set the number of cascade layers to 3 to balance the performance and complexity of the model.

Fig. 5. Experimental results of cascade feature fusion on different backbones: (a) DesNet; (b) ResNet-101; (c) ResNet-50.


Deformable Convolution. We tested the effect of the deformable convolution network (DCN) on the DesNet and ResNet backbones. As shown in Table 2, DCN brings a relatively limited improvement on HWAD-s, but on HWAD-c it generates significant improvements of 9.9% (ResNet-50), 3.1% (ResNet-101) and 6.6% (DesNet), respectively.

Table 2. Performance gains with DCN

Backbone  | DCN | HWAD-s P | HWAD-s R | HWAD-s F1 | HWAD-c P | HWAD-c R | HWAD-c F1
DesNet    | ×   | 0.801    | 0.785    | 0.793     | 0.640    | 0.595    | 0.617
DesNet    | ✓   | 0.818    | 0.802    | 0.810     | 0.679    | 0.687    | 0.683
ResNet50  | ×   | 0.698    | 0.675    | 0.687     | 0.432    | 0.365    | 0.395
ResNet50  | ✓   | 0.723    | 0.743    | 0.733     | 0.512    | 0.477    | 0.494
ResNet101 | ×   | 0.814    | 0.773    | 0.793     | 0.513    | 0.432    | 0.469
ResNet101 | ✓   | 0.828    | 0.801    | 0.814     | 0.525    | 0.459    | 0.490

Comprehensive Experiment. Combining the modules from the previous three experiments yields our final feature fusion network. We verified the performance of DFCOS (which incorporates this feature fusion network) and of the FCOS baselines on HWAD-s and HWAD-c. As shown in Table 3, DFCOS outperforms FCOS on both datasets and has a substantial lead on HWAD-c (18.9% over FCOS with ResNet-50 and 18.8% over FCOS with ResNet-101).

Table 3. Comparison of DFCOS and FCOS on HWAD-s and HWAD-c

Method            | HWAD-s P | HWAD-s R | HWAD-s F1 | HWAD-c P | HWAD-c R | HWAD-c F1
DFCOS (DesNet)    | 0.870    | 0.880    | 0.875     | 0.736    | 0.762    | 0.749
FCOS (ResNet50)   | 0.751    | 0.778    | 0.765     | 0.553    | 0.568    | 0.560
FCOS (ResNet101)  | 0.859    | 0.846    | 0.853     | 0.593    | 0.532    | 0.561

Loss Function. As shown in Fig. 6, we compared GIoU, DIoU [7] and CIoU during training; the horizontal axis is the number of iterations (99,500 for each of the three runs) and the vertical axis is the loss value. CIoU has the lowest initial value, the fastest convergence, and the lowest converged value. We also compared the total loss of the model under the three regression loss functions, and again CIoU is the best in both convergence speed and converged value.


Fig. 6. (a) Comparison of three regression loss functions; (b) Comparison of the total loss with three loss functions respectively.

Post-processing Module. Under the same experimental environment, Soft-NMS achieves higher recall than NMS with either the linear or the Gaussian weighting function, and the improvement is larger with Gaussian weighting: at a threshold of 0.5, recall improves by 0.4% on HWAD-s and 1.1% on HWAD-c.
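For reference, a minimal sketch of Gaussian Soft-NMS rescoring is given below; box format, sigma, and the score threshold are illustrative assumptions.

```python
import numpy as np

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: instead of discarding overlapping boxes, decay their
    scores by exp(-IoU^2 / sigma). boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,)."""
    boxes, scores = boxes.astype(float).copy(), scores.astype(float).copy()
    keep, idxs = [], list(range(len(boxes)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])   # highest (possibly decayed) score
        keep.append(best)
        idxs.remove(best)
        x1, y1, x2, y2 = boxes[best]
        area_best = (x2 - x1) * (y2 - y1)
        for i in idxs:
            ix1, iy1 = max(x1, boxes[i][0]), max(y1, boxes[i][1])
            ix2, iy2 = min(x2, boxes[i][2]), min(y2, boxes[i][3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            area_i = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
            iou = inter / (area_best + area_i - inter + 1e-9)
            scores[i] *= np.exp(-(iou ** 2) / sigma)   # Gaussian re-weighting
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep
```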

5.3 Comparisons with Previous Methods

We compare the proposed model with previous scene text detection methods on the two datasets; the results are shown in Table 4 and Table 5. Our method achieves better P, R and F1 than the other models on both HWAD-s and HWAD-c, with a larger lead on HWAD-c. Among the compared methods, DBNet has the fastest inference because its DB module is removed at prediction time. Some test examples are visualized in Fig. 7.

Table 4. Comparison with prior arts on HWAD-s

Method         | P     | R     | F1    | FPS
CTPN [27]      | 0.593 | 0.587 | 0.590 | 7.1
SegLink [28]   | 0.603 | 0.594 | 0.598 | 8.4
RRD [29]       | 0.633 | 0.658 | 0.645 | 4.5
PixelLink [13] | 0.588 | 0.573 | 0.590 | 13.6
EAST [11]      | 0.625 | 0.675 | 0.649 | 8.7
PSENet [14]    | 0.744 | 0.724 | 0.734 | 3.7
CRAFT [30]     | 0.858 | 0.842 | 0.850 | 7.4
DBNet [15]     | 0.880 | 0.883 | 0.881 | 15.7
DFCOS          | 0.926 | 0.918 | 0.922 | 12.2


Table 5. Comparison with prior arts on HWAD-c

Method    | P     | R     | F1    | FPS
CTPN      | 0.403 | 0.365 | 0.383 | 6.5
SegLink   | 0.463 | 0.322 | 0.380 | 6.3
RRD       | 0.433 | 0.338 | 0.380 | 3.2
PixelLink | 0.388 | 0.350 | 0.368 | 9.6
EAST      | 0.429 | 0.401 | 0.415 | 5.7
PSENet    | 0.524 | 0.492 | 0.507 | 1.8
CRAFT     | 0.645 | 0.602 | 0.623 | 6.3
DBNet     | 0.722 | 0.694 | 0.707 | 11.6
DFCOS     | 0.872 | 0.883 | 0.877 | 8.4

Fig. 7. Detecting results: (a) Chinese; (b) Yi; (c) Tangut; (d) Tibetan.

6 Conclusion and Future Work

In this paper, we propose DFCOS, an effective framework for detecting handwritten ancient texts. Based on the characteristics of ancient book text data, we design a new feature fusion network on top of FCOS and demonstrate experimentally that the method works well on ancient books. In the future, we plan to enlarge the dataset to improve generalization and to optimize the network structure to improve inference and training speed.


References

1. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: ICCV (2019)
2. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
3. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144 (2017)
4. Handwritten Ancient-Books Dataset: HWAD. Unpublished data
5. Dai, J., et al.: Deformable convolutional networks. In: ICCV (2017)
6. Bodla, N., Singh, B., Chellappa, R., Davis, L.: Improving object detection with one line of code. In: ICCV (2017)
7. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. In: AAAI (2020)
8. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
9. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
10. Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network. In: AAAI (2017)
11. Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: CVPR (2017)
12. Zhang, C., et al.: Look more than once: an accurate detector for text of arbitrary shapes. In: CVPR (2019)
13. Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: detecting scene text via instance segmentation. In: AAAI, pp. 6773–6780 (2018)
14. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: CVPR (2019)
15. Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: AAAI (2020)
16. Su, X., Gao, G.: A knowledge-based recognition system for historical Mongolian documents. Int. J. Document Anal. Recogn. Neural Netw. 124, 117–129 (2020)
17. Shi, X., Huang, Y., Liu, Y.: Text on oracle rubbing segmentation method based on connected domain. In: Proceedings of IEEE Advanced Information Management, Communicates Electronic and Automation Control Conference, pp. 414–418. IEEE Computer Society Press, Anyang (2016)
18. Hailin, Y., Lianwen, J., Weiguo, H., et al.: Dense and tight detection of Chinese characters in historical documents: datasets and a recognition guided detector. IEEE Access 6, 30174–30183 (2018)
19. Han, Y.H., Wang, W.L., Wang, Y.Q.: Research on automatic block binarization method of stained Tibetan historical document image based on Lab color space. In: International Forum on Management, Education and Information Technology Application, pp. 327–338 (2018)
20. Huang, G., Liu, Z., Maaten, L., Weinberger, Q.: Densely connected convolutional networks. In: CVPR (2017)
21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: ICCV (2016)
22. Rezatofighi, H., Tsoi, N., Gwak, J.Y., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: CVPR (2019)


23. Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: ICDAR (2015)
24. Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: CutMix: regularization strategy to train strong classifiers with localizable features. In: ICCV (2019)
25. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
26. Chen, K., et al.: MMDetection: open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
27. Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4
28. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of CVPR, pp. 3482–3490 (2017)
29. Liao, M., Zhu, Z., Shi, B., Xia, G.-S., Bai, X.: Rotation-sensitive regression for oriented scene text detection. In: CVPR (2018)
30. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR (2019)

Table Structure Recognition Using CoDec Encoder-Decoder

Bhanupriya Pegu, Maneet Singh(B), Aakash Agarwal, Aniruddha Mitra, and Karamjit Singh

AI Garage, Mastercard, Gurgaon, India
{bhanupriya.pegu,maneet.singh,aakash.agarwal,aniruddha.mitra,karamjit.singh}@mastercard.com

Abstract. Automated document analysis and parsing has long been a focus of research. An important component of document parsing revolves around understanding tabular regions with respect to their structure identification, followed by precise information extraction. While substantial effort has gone into table detection and information extraction from documents, table structure recognition remains a long-standing task demanding dedicated attention. The identification of the table structure enables extraction of structured information from tabular regions which can then be utilized for further applications. To this effect, this research proposes a novel table structure recognition pipeline consisting of row identification and column identification modules. The column identification module utilizes a novel Column Detector Encoder-Decoder model (termed the CoDec Encoder-Decoder), which is trained via a novel loss function for predicting the column mask of a given input image. Experiments have been performed to analyze the different components of the proposed pipeline, thus supporting their inclusion for enhanced performance. The proposed pipeline has been evaluated on the challenging ICDAR 2013 table structure recognition dataset, where it demonstrates state-of-the-art performance.

Keywords: Table structure recognition · Encoder-Decoder · Document analysis

1 Introduction

The volume of digital information being generated is growing at an astonishing rate, and text documents make up a major portion of it. Parsing such documents and extracting the required information is challenging, since many of them contain tables with varying layouts and colour schemes. For example, Fig. 1 shows sample tabular regions of various layouts in different document types, such as invoices, research papers, and reports. To enable automated processing of these documents, an accurate table parsing methodology is required. Significant efforts have been made in the past to extract this tabular information from documents using automated processes [6,7,12,14,20,23].


Fig. 1. Table structure recognition has applications involving automated extraction of tabular content for further analysis. For example, extraction of related fields from invoices, research publications, or reports.

The problem of successful table parsing can be decomposed into two sub-problems [22]: (i) table detection and (ii) structure recognition. The first sub-problem, table detection, can be solved by detecting the pixels representing the tabular region in a document. Several methods have been proposed in the past to solve this problem [6,20,23] and have shown high detection results on publicly available datasets. Once a tabular region is successfully detected, the next sub-problem is to identify the structure of the table by understanding its layout and detecting its cell regions [7]. Detection of cell regions can further be broken down into row and column identification, which can ultimately be combined to discover the corresponding cells in a table [20]. Structure recognition is extremely challenging due to significant intra-class variability: tables can have different layouts, several colour schemes, erratic use of ruling lines, varying structure delineation, or simply diverse contents [2]. While recent techniques such as CascadeTabNet [18] have shown near-perfect results for table detection, table structure identification still requires dedicated attention. To this end, this paper focuses on the table structure recognition sub-problem.

In this paper, an end-to-end pipeline is proposed for table structure recognition containing two components: (i) a column identification module and (ii) a row identification module. The column identification module utilizes a novel Column Detector Encoder-Decoder model (termed the CoDec Encoder-Decoder), which is trained via a novel loss function containing a Structure loss and a Symmetric loss. The intuition of the proposed method is to develop a small and compact deep learning architecture which can be trained with limited training data and which has split-second inference time to enable real-time applications. In particular, the contributions of this research are as follows:

– This research proposes an end-to-end pipeline for table structure recognition using a small and compact deep learning architecture. The relatively low number of trainable parameters enables model training with limited data and split-second inference time for applicability in real-world scenarios.
– The proposed pipeline utilizes a novel column identification module, termed the CoDec Encoder-Decoder model. The CoDec model is trained with a novel loss function consisting of a Structure and a Symmetric loss for faster and more accurate learning.
– The performance of the proposed pipeline has been evaluated on the challenging ICDAR 2013 dataset [7]. The proposed pipeline demonstrates improvement over state-of-the-art networks even without explicitly training or fine-tuning on the ICDAR 2013 dataset, thus supporting the generalizability of the proposed technique. Further, an ablation study supports the inclusion of the different components.

The rest of the paper is organized as follows: Sect. 2 outlines an overview of related work on tabular structure recognition. Section 3 presents a detailed description of the proposed pipeline. Section 4 elaborates upon the details of the experiments and datasets. Section 5 presents the results and analysis of the proposed pipeline, and Sect. 6 presents the concluding remarks.

2 Related Work

The concept of recognising table structure has evolved gradually from the pre-Machine-Learning (ML) era, when it was completely heuristic based, to the recent age of deep learning. A comprehensive summary of the algorithmic evolution can be traced in the surveys describing and summarizing the state of the art in the field [2,3,11,24,29]. One of the earliest successful developments in table understanding is T-RECS by Kieninger and Dengel [12], a framework that groups words into columns on the basis of their horizontal overlaps and then divides those word groups into cells with respect to the column's margin structure. In the same period many handcrafted-feature-based algorithms were introduced [5,9,28], which were task specific and relied heavily on domain knowledge. Another early data-driven approach by Wang et al. [27] proposes a seven-step formulation based on probability optimization. Considering the high intra-class variability, Shigarov et al. [21] proposed a table decomposition algorithm with sets of domain-specific rules, relying on PDF metadata such as font and character bounding boxes as well as ad-hoc heuristics.

The proposed table structure recognition pipeline is conceptually connected with recent Deep Learning (DL) based developments on this subject, so the remainder of this section focuses on recent works that set benchmarks using deep-learning techniques. Although research related to table detection in PDF documents can be traced back to the technique published by Hao et al. [8], research on table structure recognition remains limited owing to the challenging and complex nature of the problem. Schreiber et al. [20] tackle the problem of scarce labelled data, which hinders training of highly parameterized DL models, by leveraging domain adaptation and transfer learning: Fully Convolutional Network (FCN) based general object detection models are adapted to the document domain. However, their performance metric restricts itself to identifying rows/columns instead of using cell-level information. Siddiqui et al. [22] propose to constrain the problem space to obtain improved performance. Qasim et al. [19] model table recognition with Graph Neural Networks, where a Convolutional Neural Network (CNN) and an Optical Character Recognition (OCR) engine are employed to extract the feature maps and word positions, respectively; feature representations are learned through an interaction network, then concatenated and fed into a dense neural network to classify vertex pairs. Recently, the TableNet [16] architecture was proposed, a multi-task network built on a VGG-based encoder followed by task-specific decoders that models the inter-dependency between the twin tasks of table detection and table structure identification. CascadeTabNet [18] accomplishes table detection and structure recognition with a single CNN model utilizing a cascade mask region-based CNN and a high-resolution network.

In the literature, the technique closest to the current manuscript is the TableNet architecture [16]. The TableNet model utilizes a large-scale pre-trained VGG network as its backbone and performs semantic segmentation on the given input image to generate a column mask, along with domain knowledge to generate the row co-ordinates. In comparison, the proposed table structure recognition pipeline utilizes a novel light-weight CoDec Encoder-Decoder model trained with a novel loss function for column detection. Further, as opposed to semantic segmentation, which involves generating a mask for each class, the proposed CoDec Encoder-Decoder constructs a single image containing the column mask, resulting in a further reduction in the number of trainable parameters. A detailed description of the proposed pipeline is provided in the next section.

3 Table Structure Recognition

In order to perform table structure recognition, this research follows a top-down, two-step approach: (i) identification of columns, followed by (ii) identification of rows. Figure 2 presents a broad overview of the proposed table structure recognition pipeline. A given input table image is processed to identify the column details via the proposed Column Detector Encoder-Decoder model (termed the CoDec Encoder-Decoder), followed by the identification of the different rows in the table using domain knowledge and image processing rules. The row and column information is then combined to generate the cell co-ordinates. A detailed explanation of each component is provided in the following subsections.

Fig. 2. Proposed pipeline for table structure recognition. The input tabular image is provided to the row detection and column detection modules, which return the row and column co-ordinates. The information from the two modules is then fused together to generate the cell co-ordinates.

3.1 Column Identification via Proposed CoDec Encoder-Decoder

Column identification involves identifying the columns in a given tabular image. As shown in Fig. 3, the task suffers from several challenges, such as varying tabular formats, the presence or absence of column lines, and differing spacing between columns. Existing techniques in the literature have either relied on hand-crafted rules or focused on specific table designs only. In order to develop a more generalized solution, this research proposes a novel deep learning based CoDec Encoder-Decoder formulation for identifying the columns in the given input table.

Fig. 3. Sample tabular regions having different formats, varying column identifiers (lines/no lines), and varying spacing. The presence of different formats makes the problem of structure identification a challenging task.

Figure 4 presents a diagrammatic overview of the proposed CoDec Encoder-Decoder model. Given an input image, the model outputs a mask with the column identifiers. The loss function of the proposed CoDec Encoder-Decoder model utilizes a (i) Structure loss and a (ii) Symmetric loss for identifying the columns from the given tabular image. For n training samples, the loss function is given as:

L_{CoDec} = \frac{1}{n} \sum_{i=1}^{n} \Big[ \underbrace{\| f(g(x^{i})) - x^{i}_{mask} \|_{2}^{2}}_{\text{Structure loss}} + \lambda \underbrace{\| f(g(x^{i})) - P(f(g(x^{i}))) \|_{2}^{2}}_{\text{Symmetric loss}} \Big]        (1)

where x^{i} and x^{i}_{mask} refer to the i-th training image and the corresponding column mask, g(\cdot) and f(\cdot) refer to the Encoder and Decoder modules, respectively, \lambda is the weight controlling the contribution of the Symmetric loss, and P(\cdot) is the flip operator that mirrors the input image across the x-axis.

Fig. 4. Diagrammatic representation of the CoDec Encoder-Decoder model for extracting the column information from a given tabular image. The model is trained via a combination of Structure loss and Symmetric loss for identifying the columns.

As mentioned above, the proposed CoDec Encoder-Decoder model is trained via a combination of the (i) Structure loss and the (ii) Symmetric loss. As shown in Fig. 4, the Structure loss minimizes the distance between the decoded sample and the column mask of the input tabular image. That is, unlike traditional Encoder-Decoder models, the input is not reconstructed at the output; instead the Decoder is trained to generate the column mask for the input by recognizing the columns in it. The Encoder thus learns a latent representation of the given image, which is then up-sampled by the Decoder to generate the corresponding column mask. It is our belief that the different convolutional filters are able to encode the variations observed across tabular formats, making it possible to recognize the column structure from the given image.

The second component of the CoDec loss corresponds to the Symmetric loss, which exploits the symmetric structure observed in tabular regions. As observed in Fig. 3, the column information is mostly symmetric across the x-axis, and the Symmetric loss therefore encourages the decoded column mask to maintain the same property. As part of the CoDec loss function, the distance between the generated column mask and its flipped version is also minimized, which enforces stricter boundaries along the length of the columns. In Eq. 1, P(\cdot) refers to the flip operator, which flips the provided image across the x-axis. Mathematically, for an input image x, the P(\cdot) operator is represented as:

P(x) = (x^{R})^{R}        (2)

where x^{R} refers to rotating the image by 90°. The Symmetric loss is thus developed using the available domain knowledge of tabular regions for extracting an accurate column mask. During training, the CoDec Encoder-Decoder is optimized using the Structure loss and the Symmetric loss (Eq. 1). At inference, the trained model outputs the column mask for the given tabular image, which is then post-processed via binary thresholding to identify the columns. Given the extracted column information, the input image is processed via the row identification module for row recognition. Details regarding the row identification module are provided in the next subsection.
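A minimal PyTorch sketch of Eq. (1), as we read it, is given below. We realise P(·) as a flip across the x-axis via torch.flip, and use mean-squared error, which matches the squared L2 norm up to a normalisation constant; function and argument names are ours.

```python
import torch
import torch.nn.functional as F

def codec_loss(pred_mask, gt_mask, lam=0.01):
    """Structure loss (distance to the ground-truth column mask) plus a
    lambda-weighted Symmetric loss (distance between the prediction and its
    mirror across the x-axis)."""
    structure = F.mse_loss(pred_mask, gt_mask)
    mirrored = torch.flip(pred_mask, dims=[-2])       # mirror across the x-axis
    symmetric = F.mse_loss(pred_mask, mirrored)
    return structure + lam * symmetric
```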

3.2 Row Identification Module

In the literature, row identification in tabular regions has mostly been performed via domain knowledge and business rules. Similarly, in this research a combination of rules is used to identify the rows in tabular regions. As observed in Fig. 3, a new row is often signified by the presence of an entry in the first column of the table. To utilize this domain knowledge, once the column mask has been extracted, the input image and the mask are provided to the row extraction module. Further, the co-ordinates of each word are extracted using the Tesseract OCR [26] and are also used for identifying the row details in the given tabular image. The row identification module proceeds as follows (a sketch of the line-detection step is given after this list):

1. Image processing based line detection is applied to the input image. The image is converted to grayscale, followed by Canny edge detection [1]. Hough transform [10] based line detection is then applied to the processed image to detect horizontal lines (lines with a large gap between their y1 and y2 co-ordinates are eliminated as vertical lines). Post-processing removes detected lines that are closer than a chosen threshold of pixels along the y-axis, in order to eliminate duplicate row lines and double boundaries.
2. Since not all tabular regions contain row boundaries (lines), the row co-ordinates are, in parallel, also estimated from the y-coordinates of the extracted words. Initially, each y-coordinate beyond a chosen threshold is identified as a new row line. The column information is then utilized to model multi-line cells in un-bordered tables: if a given row contains text in very few columns (less than 80% of the total columns), it is deemed a continuation of the previous row and the information of that y-coordinate is updated.
3. As a final step, the rows identified by the above two techniques are fused together to generate the final row coordinates.
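The line-detection step (step 1 above) could look roughly as follows with OpenCV. The Canny thresholds and Hough voting threshold are our own illustrative values (the paper states that default/image-dependent parameters are used); the 20-pixel duplicate-suppression gap and maxLineGap of 10 follow the implementation details reported later.

```python
import cv2
import numpy as np

def detect_row_lines(image_bgr, min_gap=20, max_slope=10):
    """Grayscale -> Canny -> probabilistic Hough transform; keep near-horizontal
    lines and drop duplicate boundaries closer than min_gap pixels."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=image_bgr.shape[1] // 3, maxLineGap=10)
    ys = []
    for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
        if abs(y2 - y1) <= max_slope:          # near-horizontal lines only
            ys.append((y1 + y2) // 2)
    rows = []
    for y in sorted(ys):                       # suppress duplicate/double boundaries
        if not rows or y - rows[-1] >= min_gap:
            rows.append(y)
    return rows
```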

3.3 Table Structure Recognition and Data Extraction

The row and column coordinates obtained by the two modules are fused together to generate the overall structure of the table. In parallel, as mentioned above, data extraction is performed from the tabular region using the Tesseract OCR [26]. The OCR returns the content of the given image along with the coordinates of each word (the x co-ordinate, y co-ordinate, and width of the word). These coordinates are then used to divide the content into the corresponding cells created using the row and column coordinates obtained via the proposed table structure recognition pipeline. Once the content has been split into the different cells of the table, the information is post-processed for comparison with the ground truth. Similar to existing techniques [16], 1-D tuples are generated for each cell containing the content of the neighbouring cells (upper, lower, immediate left, and immediate right cells). These tuples are then compared with the ground-truth information provided with the datasets. The datasets available for table structure recognition often contain an XML file for each table with the details of the structure and the content (coordinates of each cell along with its content). The XML files are thus used for generating the ground-truth 1-D tuples for each cell, after which matching is performed with the tuples generated using the proposed table structure recognition pipeline.
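As an illustration of the word-to-cell assignment step, a small sketch using pytesseract is shown below; the function name, the use of the word's centre x-coordinate, and the boundary-lookup heuristic are our assumptions rather than the authors' exact rules.

```python
from bisect import bisect_right
import pytesseract
from pytesseract import Output

def words_to_cells(image, row_ys, col_xs):
    """Assign OCR words to table cells using sorted row/column boundary lists."""
    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    cells = {}
    for text, left, top, width in zip(data["text"], data["left"],
                                      data["top"], data["width"]):
        word = text.strip().lower()
        if not word:
            continue
        r = max(0, bisect_right(row_ys, top) - 1)              # row index of the word
        c = max(0, bisect_right(col_xs, left + width / 2) - 1)  # column index (word centre x)
        cells.setdefault((r, c), []).append(word)
    return {rc: " ".join(words) for rc, words in cells.items()}
```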

4 Datasets and Protocols

Two datasets have been used for experiments: (i) Marmot dataset [4] and the (ii) ICDAR 2013 dataset [7]. Details regarding each are as follows:


– Marmot Dataset¹ [4]: The Marmot dataset contains over 1000 PDF documents in English and Chinese with tabular regions. Ground-truth annotations of the table structure (row and column co-ordinates) are provided as an XML file for each document. In this research, the Marmot dataset has been used for training the novel CoDec Encoder-Decoder model. Specifically, the 509 English documents are pre-processed to extract the tabular region (input of the Encoder) and to create the column mask (output of the Decoder) from the ground truth provided with the dataset. Owing to the limited training data, data augmentation has been performed on the tabular regions, specifically mirroring along the y-axis and adding minor Gaussian noise.
– ICDAR 2013 Dataset² [7]: The ICDAR 2013 dataset is one of the most popular and commonly used datasets for table structure recognition, and it is used here to evaluate the proposed pipeline. The dataset contains a total of 67 PDF documents with tabular regions and includes both vertical and horizontal tables: over 30% of the tables are vertical. The dataset provides an XML file for each document with ground-truth annotations of the table position and its structure, including the cell co-ordinates and the content. Consistent with existing techniques, and in order to compare with recent state-of-the-art algorithms [16,20], the standard protocol is followed, wherein 34 images are used for evaluating the model. Existing techniques often utilize the remaining samples for fine-tuning a pre-trained architecture; however, we do not utilize the ICDAR 2013 dataset for training or fine-tuning.

4.1 Implementation Details

As elaborated in the previous section, column identification is performed using the proposed CoDec Encoder-Decoder model, which consists of an Encoder module and a Decoder module. The Encoder is composed of four convolutional layers with 3 × 3 kernels and filter sizes of [32, 16, 8, 4]; ReLU [15] is used as the activation function after each convolution layer, and max-pooling is applied after each convolution layer to reduce the feature dimension. The Decoder mirrors the Encoder architecture with four transposed convolutional layers having 3 × 3 kernels and filter sizes of [4, 8, 16, 32], each followed by a ReLU activation. An image of dimension 224 × 224 is provided as input to the CoDec Encoder-Decoder model. The model is implemented in PyTorch [17], and the Adam optimizer [13] is used to train it with an initial learning rate of 0.01. The weight of the Symmetric loss (λ in Eq. 1) is set to 0.01. In the row identification module, a minimum gap of 20 pixels is maintained between detected rows in order to eliminate duplicate line boundaries and incorrect row co-ordinates. Default parameters are used for the Canny detector and remain consistent across all grayscale input images. For the Hough transform, the 'minLineLength' is a function of the image size, the 'maxLineGap' is set to 10, and other parameters are kept as default. The parameters of the row-identification module are configured once and then used consistently across all the images; that is, once trained/configured, the entire pipeline is fully automated without any manual intervention. Given a tabular image, the pipeline outputs the table structure using the (i) column detector followed by the (ii) row detector, resulting in an end-to-end framework. During the generation of the 1-D tuples, the extracted words (obtained via the ground-truth XML file or the Tesseract OCR) are converted to lower case after removing any trailing or preceding white spaces, all special characters are replaced with a '_', and matching is then performed between the words.

¹ https://www.icst.pku.edu.cn/cpdp/sjzy/index.htm
² http://www.tamirhassan.com/html/competition.html
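A rough PyTorch sketch of the described architecture is given below. The final 1-channel projection, the exact channel ordering of the decoder, and the 3-channel input are our assumptions, so the parameter count of this sketch will not match the reported 9,269 exactly.

```python
import torch.nn as nn

class CoDecSketch(nn.Module):
    """Sketch of a 4-layer conv encoder (channel widths [32, 16, 8, 4], ReLU and 2x2
    max-pooling after each layer) with a mirrored 4-layer transposed-conv decoder
    that outputs a single-channel column mask."""
    def __init__(self, in_ch=3):
        super().__init__()
        enc, c_prev = [], in_ch
        for c in [32, 16, 8, 4]:
            enc += [nn.Conv2d(c_prev, c, 3, padding=1), nn.ReLU(inplace=True),
                    nn.MaxPool2d(2)]
            c_prev = c
        self.encoder = nn.Sequential(*enc)
        dec = []
        for c in [8, 16, 32]:
            dec += [nn.ConvTranspose2d(c_prev, c, 3, stride=2, padding=1,
                                       output_padding=1), nn.ReLU(inplace=True)]
            c_prev = c
        # Assumed final layer: project back to a 1-channel mask in [0, 1].
        dec += [nn.ConvTranspose2d(c_prev, 1, 3, stride=2, padding=1,
                                   output_padding=1), nn.Sigmoid()]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        # x: (N, 3, 224, 224) -> predicted column mask: (N, 1, 224, 224)
        return self.decoder(self.encoder(x))
```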

Table 1. Table structure recognition performance on the ICDAR 2013 dataset, along with comparison with recent state-of-the-art algorithms. Owing to the same protocol, results have directly been taken from the published manuscript [16].

Algorithm                                      | Recall | Precision | F1-Score
TableNet + Semantic features (fine-tuned) [16] | 0.9001 | 0.9307    | 0.9151
TableNet + Semantic features [16]              | 0.8994 | 0.9255    | 0.9122
TableNet [16]                                  | 0.8987 | 0.9215    | 0.9098
DeepDeSRT (fine-tuned) [20]                    | 0.8736 | 0.9593    | 0.9144
Proposed                                       | 0.9289 | 0.9337    | 0.9304

5 Results

Table 1 presents the results obtained on the ICDAR 2013 dataset for table structure recognition. The top result is presented in bold, while the second best result is underlined. As per existing research, performance is reported in terms of recall, precision, and F1-score:

\mathrm{Recall} = \frac{TP}{TP + FN}; \quad \mathrm{Precision} = \frac{TP}{TP + FP}        (3)

F1\ \mathrm{Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}        (4)

It can be observed that the proposed technique achieves a recall of 0.9289, an increase of over 2% compared to the state-of-the-art technique (0.9001 obtained by TableNet [16]). The proposed technique also obtains the second best precision of 0.9337, while DeepDeSRT [20] obtains 0.9593. It is important to note that the best results of DeepDeSRT and the TableNet architecture are obtained after fine-tuning on the ICDAR 2013 dataset, whereas the proposed technique is trained only on the Marmot dataset and does not utilize the ICDAR 2013 dataset during training or fine-tuning. The improved performance obtained without explicit training/fine-tuning on the ICDAR 2013 dataset supports the generalizable behavior of the proposed pipeline. Finally, an overall F1-score of 0.9304 is obtained, an improvement over the state-of-the-art TableNet (0.9151). The improved results on this standard benchmark demonstrate the efficacy of the proposed algorithm. Further, Fig. 5 presents sample images from the ICDAR 2013 dataset along with the raw column masks generated by the proposed CoDec Encoder-Decoder model. The model is able to identify the column demarcations in the absence of column lines (Fig. 5(a)) as well as in the presence of line demarcations (Fig. 5(b–c)). The high performance obtained on the ICDAR 2013 dataset without explicitly training or fine-tuning on it further promotes the utility of the column detection model on unseen datasets.

Fig. 5. Sample images from the ICDAR 2013 dataset and the corresponding column mask generated by the proposed CoDec Encoder-Decoder. The model is able to process images with/without column lines and is able to identify column details well.

Ablation Study and Effect of λ: In order to analyze the proposed pipeline for table structure recognition, an ablation study has been performed. Table 2 presents the results obtained by removing different components from the structure recognition pipeline: (i) the image processing based row detection in the row detection module, (ii) the co-ordinate based row detection in the row detection module, and (iii) the Symmetric loss in the column detection module. As demonstrated in Table 2, removal of any component results in a reduction of precision, recall, and F1-score. The largest drop is observed when removing the image processing based row detection component (almost 15% in F1-score). A drop is also observed upon removing the Symmetric loss from the proposed CoDec Encoder-Decoder model for column detection (Eq. 1), which further reinstates the benefit of incorporating the Symmetric loss across the x-axis. Experiments were also performed to analyze the impact of λ (Eq. 1), which controls the contribution of the Symmetric loss in the CoDec model. With a much smaller value (λ = 0.001), the F1-score reduces to 84.78%, while a larger λ (λ = 0.1) results in an F1-score of 86.16%. The reduction in performance upon the removal of different components demonstrates the benefit of each component in the final table structure recognition pipeline, along with the choice of appropriate hyper-parameters.

Table 2. Ablation study on the proposed table structure recognition pipeline. Analysis has been performed by modifications of the row detection module (removal of the image processing (IP) based row detection and co-ordinate based row detection) and the column detection module (removal of the Symmetric loss).

Algorithm                                 | Recall | Precision | F1-Score
Proposed - IP based row detection         | 0.7832 | 0.7984    | 0.7882
Proposed - Coordinate based row detection | 0.8545 | 0.8910    | 0.8636
Proposed - L_Symmetric                    | 0.8335 | 0.8918    | 0.8556
Proposed                                  | 0.9289 | 0.9337    | 0.9304

Comparison on Number of Trainable Parameters: The proposed table structure recognition pipeline utilizes a light-weight Encoder-Decoder architecture for column extraction. The proposed column detection model contains only 9,269 trainable parameters, whereas existing state-of-the-art column detection models such as TableNet [16] and DeepDeSRT [20] contain at least 1.38M (VGG-19 and VGG-16 architectures [25], respectively). The light-weight nature of the proposed framework results in fewer trainable parameters and a smaller model, which enables training with fewer images and makes the pipeline deployable in real-world scenarios with limited resources. The smaller number of parameters also prevents the model from over-fitting on the training dataset (Marmot dataset), resulting in generalized behavior on a new dataset as well (ICDAR 2013 dataset). Further, the entire pipeline takes less than one second for inference on an input image.

6 Conclusion

The requirement for automated detection and identification of tables in document images has been increasing over the last two decades. Information extraction from tabular regions is useful for automated content generation and summarization. Extraction of relevant content from tabular regions also requires table structure recognition, which corresponds to identifying the exact structure of the table along with the cell information. Despite its wide-spread applicability, table structure recognition has received limited attention and continues to be a challenging task due to the complexity and diversity in the structure and style of different tables. To this effect, this paper presents an end-to-end pipeline for table structure recognition containing two components: (i) a column identification module and (ii) a row identification module. The column identification module utilizes a novel Column Detector Encoder-Decoder model (termed the CoDec Encoder-Decoder), which is trained via a novel loss function containing a Structure loss and a Symmetric loss. The detection of the columns is followed by the identification of the different rows in the table using domain information and image processing rules. The performance of the proposed pipeline is evaluated on the ICDAR 2013 dataset, where it demonstrates improvement over state-of-the-art networks even without explicitly training or fine-tuning on the ICDAR 2013 dataset, suggesting a generalizable behavior of the proposed pipeline. Further, an ablation study has been performed by removing different components from the structure recognition pipeline, and the results of each experiment have been discussed in this paper. Another key contribution of this research is the limited number of trainable parameters of the proposed CoDec Encoder-Decoder model compared to existing techniques: only 9,269 trainable parameters (in comparison to over 1.39M for existing techniques), which makes the proposed framework trainable with a limited number of images and deployable with low resource requirements in real-world scenarios. As part of future work, the proposed technique can be improved to better model multi-column variations for table structure recognition.

References

1. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 6, 679–698 (1986)
2. Coüasnon, B., Lemaitre, A.: Recognition of tables and forms. In: Handbook of Document Image Processing and Recognition (2014). https://doi.org/10.1007/978-0-85729-859-1
3. Embley, D.W., Hurst, M., Lopresti, D., Nagy, G.: Table-processing paradigms: a research survey. Int. J. Doc. Anal. Recogn. 8(2–3), 66–86 (2006)
4. Fang, J., Tao, X., Tang, Z., Qiu, R., Liu, Y.: Dataset, ground-truth and performance metrics for table detection evaluation. In: International Workshop on Document Analysis Systems, pp. 445–449 (2012)
5. Gatos, B., Danatsas, D., Pratikakis, I., Perantonis, S.J.: Automatic table detection in document images. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds.) ICAPR 2005. LNCS, vol. 3686, pp. 609–618. Springer, Heidelberg (2005). https://doi.org/10.1007/11551188_67
6. Gilani, A., Qasim, S.R., Malik, I., Shafait, F.: Table detection using deep learning. In: International Conference on Document Analysis and Recognition, pp. 771–776 (2017)
7. Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: International Conference on Document Analysis and Recognition, pp. 1449–1453 (2013)


8. Hao, L., Gao, L., Yi, X., Tang, Z.: A table detection method for PDF documents based on convolutional neural networks. In: IAPR Workshop on Document Analysis Systems, pp. 287–292 (2016)
9. Hu, J., Kashi, R.S., Lopresti, D.P., Wilfong, G.: Medium-independent table detection. In: Document Recognition and Retrieval VII, vol. 3967, pp. 291–302 (1999)
10. Illingworth, J., Kittler, J.: A survey of the Hough transform. Comput. Vis. Graph. Image Process. 44(1), 87–116 (1988)
11. Khusro, S., Latif, A., Ullah, I.: On methods and tools of table detection, extraction and annotation in PDF documents. J. Inf. Sci. 41(1), 41–57 (2015)
12. Kieninger, T., Dengel, A.: The T-Recs table recognition and analysis system. In: International Workshop on Document Analysis Systems, pp. 255–270 (1998)
13. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
14. Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: TableBank: table benchmark for image-based table detection and recognition. In: Language Resources and Evaluation Conference, pp. 1918–1925 (2020)
15. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning, pp. 807–814 (2010)
16. Paliwal, S.S., Vishwanath, D., Rahul, R., Sharma, M., Vig, L.: TableNet: deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In: International Conference on Document Analysis and Recognition, pp. 128–133 (2019)
17. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
18. Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: CascadeTabNet: an approach for end to end table detection and structure recognition from image-based documents. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 572–573 (2020)
19. Qasim, S.R., Mahmood, H., Shafait, F.: Rethinking table recognition using graph neural networks. In: International Conference on Document Analysis and Recognition, pp. 142–147 (2019)
20. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: International Conference on Document Analysis and Recognition, vol. 1, pp. 1162–1167 (2017)
21. Shigarov, A., Mikhailov, A., Altaev, A.: Configurable table structure recognition in untagged PDF documents. In: ACM Symposium on Document Engineering, pp. 119–122 (2016)
22. Siddiqui, S.A., Khan, P.I., Dengel, A., Ahmed, S.: Rethinking semantic segmentation for table structure recognition in documents. In: International Conference on Document Analysis and Recognition, pp. 1397–1402 (2019)
23. Siddiqui, S.A., Malik, M.I., Agne, S., Dengel, A., Ahmed, S.: DeCNT: deep deformable CNN for table detection, vol. 6, pp. 74151–74161 (2018)
24. e Silva, A.C., Jorge, A.M., Torgo, L.: Design of an end-to-end method to extract information from tables. Int. J. Doc. Anal. Recogn. (IJDAR) 8(2–3), 144–171 (2006)
25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
26. Smith, R.: An overview of the Tesseract OCR engine. In: International Conference on Document Analysis and Recognition, vol. 2, pp. 629–633 (2007)


27. Wang, Y., Phillips, I.T., Haralick, R.M.: Table structure understanding and its performance evaluation. Pattern Recogn. 37(7), 1479–1497 (2004)
28. Wang, Y., Phillips, I., Haralick, R.: Automatic table ground truth generation and a background-analysis-based table structure extraction method. In: International Conference on Document Analysis and Recognition, pp. 528–532 (2001)
29. Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition. Doc. Anal. Recogn. 7(1), 1–16 (2004)

Advertisement Extraction Using Deep Learning

Boraq Madi(B), Reem Alaasam(B), Ahmad Droby(B), and Jihad El-Sana(B)

Ben-Gurion University of the Negev, Beersheba, Israel
{borak,rym,drobya}@post.bgu.ac.il, [email protected]

Abstract. This paper presents a novel deep learning model for extracting advertisements in images, PTPNet, and multiple loss functions that capture the extracted object’s shape. The PTPNet model extracts features using Convolutional Neural Network (CNN), feeds them to a regression model to predict polygon vertices, which are passed to a rendering model to generate a mask corresponding to the predicted polygon. The loss function takes into account the predicted vertices and the generated mask. In addition, this paper presents a new dataset, AD dataset, that includes annotated advertisement images, which could be used for training and testing deep learning models. In our current implementation, we focus on quadrilateral advertisements. We conducted an extensive experimental study to evaluate the performance of common deep learning models in extracting advertisement from images and compare their performance with our proposed model. We show that our model manages to extract advertisements at high accuracy and outperforms other deep learning models.

Keywords: Ads extraction · Loss function · Segmentation model · Regression model

1 Introduction

Advertisements play a significant role in various aspects of our daily life. Commercial organizations advertise their products and services to the general public, and governments utilize the advertisement framework to deliver messages and announcements that educate the public on a wide range of issues. As a result, advertisements occupy non-trivial portions of our buildings, streets, and media (printed and digital). Detecting advertisements in a view, an image, or a document has numerous applications. For example, local municipalities are interested in detecting advertisements on the streets for taxation purposes, and advertisement agencies want to measure the exposure of their posted advertisements in various media.

Advertisement extraction can be viewed as an image segmentation problem, i.e., the task of assigning the same label to the pixels of each object in the image. The segmentation output is usually represented as a pixel map, a graph, or a list of polygons that form the boundaries of these segments. Much recent research [1,2,7] focuses on predicting polygons instead of explicit pixel-wise labeling, due to their simple representation. In this work, we adopt this scheme and focus on extracting the boundary of objects in a polygon representation. Object extraction research has made impressive progress over recent years [14], yet it is far from obtaining pixel-level accuracy. In this work, we restrict our scope to convex quadrilateral (tetragon) objects and obtain high pixel-level accuracy. Such objects are very common in our daily views and include advertisements, billboards, frames, paintings, screens, etc. We show that by narrowing the search domain, we can obtain more accurate results than the state-of-the-art general segmentation methods.

This paper explores extracting advertisements from images. It introduces a new dataset (AD dataset), presents a novel deep learning model for advertisement extraction, and develops several geometry-based loss functions for the presented model. Our experimental study shows that our model outperforms other general segmentation methods. The AD dataset includes around six thousand images containing various forms of advertisements; the images were collected from the internet and manually annotated by determining the boundary of each advertisement, and the label of each advertisement includes a boundary polygon and a mask. In addition, we present and evaluate a novel deep learning model called PTPNet, which is trained using a loss function that considers the polygon and mask labels in the AD dataset. PTPNet is composed of a regression model and a mask generator model. The regression model extracts CNN features, which are fed to fully connected regression layers that output the vertices of a polygon. These vertices are passed to the generator model to produce a mask, which is used by the loss function. The current PTPNet model outputs the vertices and the mask of a quadrilateral. The vertex-based loss function is the vertex-wise L1 distance between the predicted and the ground-truth polygon. The drawback of such loss functions is the assumption that the object's vertices are independent variables, whereas they are in fact highly correlated; considering the area of the polygon reduces this dependency and strengthens the loss function. To overcome this limitation, we introduce a novel loss function that takes into account both the polygon and its mask representations. The mask loss function is the Dice distance between the ground-truth mask and the mask generated from the predicted polygon, and the final PTPNet loss combines these two terms. According to our experimental study, this loss function contributes to higher accuracy and accelerates model convergence; these findings align with [23]. To summarize, the main contributions of this paper are:

– A new dataset of quadrilateral objects, the AD dataset, which includes advertisement images labeled with boundary contours and binary masks.
– Novel loss functions for regression models that take into account the geometric shape and the distance between the vertices of quadrilateral objects.
– A novel regression model, PTPNet, that utilizes the boundary contours and binary masks of elements in the AD dataset to learn to predict the boundary polygon of an advertisement.

Advertisement Extraction Using Deep Learning

83

The rest of the paper is organized as follows: we review the related work in Sect. 2, then describe the new dataset in Sect. 3. In Sect. 4 we present our method and in Sect. 5 we describe the different loss functions. Section 6 presents our experiments and results. Finally, we draw concluding remarks in Sect. 7.

2

Related Work

Extracting the polygon representation of an object can be done by first segmenting the object in the image and extracting its contour, which is often simplified to form a compact polygon. Classic object contour extraction methods are based on superpixel grouping [15], Grabcut [20], and saliency detection [4,22]. One of the popular methods for polygon extraction using deep learning is Polygon-RNN [3] and its improved version [1]. This method provides semiautomatic object annotation using polygons. The RNN’s decoder considers only three preceding vertices when predicting a vertex at each step which may produce polygons with self-intersections and overlaps. PolyCNN [8] extracts rectangular building roofs from images. However, this method does not handle perspective transformations and its accuracy does not reach pixel level. PolyMapper [16] presents a more advanced solution by using CNNs and RNNs with convolutional long-short term memory modules. It provides good results for aerial images of residential buildings. In recent years, many deep learning methods and architectures have been proposed for semantic segmentation and they lead to outstanding progress in semantic segmentation. The most two relative groups of methods for our work are Regional proposal and Upsampling/Deconvolution models. Regional proposal models detect regions according to similarity metrics, determine whether an object is present in the region or not, and applying segmentation methods for positive regions. He et al. [9] proposed a Mask Regional Convolutional Neural Network (Mask-RCNN). It extends Faster R-CNN [19] abilities for object detection and segmentation. The deconvolution models focus on extracting high-level features via layerto-layer propagation and obtain segmentation by upsampling and deconvolution. The reconstruction techniques for obtaining a segmentation map include refinement methods that fuse low and high-level features of convolutional layers. Long et al. [18] proposed the first Fully Convolutional Network (FCN), which constructs the segmentation by adding skip architecture that combines the semantic and appearance for precise segmentation results. Many other segmentation models [14] are used to provide benchmark results for new datasets, Such as Xception [5], Resenet [10], MobileNet [11] and DenseNet [12]. These models are regression models which output polygon, while Mask-RCNN [9] provides an object’s mask. In this paper, we apply these models to test various loss functions and compare their performance with our proposed model.

84

B. Madi et al.

3

Dataset

This section presents a new dataset of advertisement images called AD dataset1 . It includes about six thousand RGB-images where each image contains at least one quadrilateral advertisement. The AD dataset includes a wide range of advertisements types and sizes, starting from billboards on the highway to advertisements in malls. Figure 1 shows example images from the dataset.

Fig. 1. Example of images in AD dataset.

3.1

Collecting and Labeling Images

The images of the AD dataset were collected automatically from the internet and labeled manually. We collected images that include advertisements using a python script called ‘Google images download’2 which downloads images from google using given keywords. The resolution of the images varies from 256 × 144 to 2048 × 1080. The advertisements within these images have diverse perspective views. We label the images using Labelbox3 , the annotators went over each of the downloaded images and traced the boundary of each advertisement in the image. We define the boundary of an advertisement by its vertices in a counter-clockwise (CCW) order, as shown in Fig. 2b. The labeling task was conducted by several undergraduate and graduate students in our lab. In addition, we generate a binary mask (See Fig. 2c) using the annotated polygon for each image, where the pixels of the advertisement are marked by 1 and the background by 0. Hence, we have two types of labels for each image.

1 2 3

https://github.com/BorakMadi/Ads-Extraction-using-Deep-learning. https://github.com/hardikvasa/google-images-download. https://labelbox.com/.

Advertisement Extraction Using Deep Learning

85

Fig. 2. Element example of AD dataset: (a) the original image, (b) the vector of vertexes label, and (c) the binary mask label

4

The PTPNet Model

Yu et al. [23] introduced Intersection-Over-Union (IOU) loss function to regress four vertices of a box. This loss function considers the four vertices as a whole unit, i.e. the bounding box. Their trained model manages to obtain higher accuracy than those using L2 (Euclidean) distance loss function. Their IOU loss function assumes axis aligned bounding boxes, which are calculated directly from the predicted and ground-truth boxes. This calculation scheme of the IOU does not suit our task, as the perspective view of the quadrilateral is not expected to be axis aligned. In addition, the scalar difference between the size of the predicted and the ground-truth polygons is an inappropriate approximation for the IOU loss. To overcome these limitations, we constructed a novel learning model called PTPNet, which enables applying advanced geometrical loss functions to optimize the prediction of the vertices of a polygon. The PTPNet outputs a polygon and its mask representation and manages to combine classical regression and advanced geometrical loss functions, such as IOU. 4.1

PTPNet Architecture

PTPNet network includes two sub-networks: Regressor and Renderer. The Regressor predicts the four vertices of a quadrilateral. It utilizes the Xception model as its backbone. The classification component of the Xception model is removed and replaced by a regression component that outputs a vector of eight scalars that represent the four vertices. The regression component includes Global Average Pooling layers followed by 4-Fully-Connected layers with different sizes, as shown in Fig. 9. The Renderer (rendering model) generates a binary mask corresponding to the Regressor’s predicted polygon. It is trained separately from the regression model using the quadrilaterals’ contours from the AD dataset (see Render Datasets). The trained model is concatenated to the regression model and its weights are frozen during the training of the regression model (Fig. 3).

B. Madi et al. Feature map

Feature Extractor Model

Shapes’ map

Polygon's Verces

Regression Model

2nx1

Combined Loss Funcon

86

Renderer Model 2nx1

Fig. 3. The architecture of PTPNet

4.2

Rendering Models

We experimented with various rendering decoder networks, which are based on fully connected and deconvolution networks. The results obtained using the fully connected decoders were inadequate, while the deconvolutional decoders produce promising results (see Sect. 6). Therefore, we developed a novel rendering model based on the Progressive Growing of GANs [13], which is a form of deconvolutional decoders. We removed the discriminator network and modified the generator network (the deconvolutional decoder) to act as a supervised based learning decoder that accepts a polygon representation and outputs its corresponding mask. We shall refer to this decoder as GenPTP. To train and test the rendering models we build two datasets, synthesizedquads and ads-quads. The synthesized-quads dataset was constructed by randomly sampling convex quadrilaterals, i.e. sampling four vertices. In typical images, the convex quadrilaterals are the perspective transformation of rectangular advertisements. Figure 4(b) shows an example of such quadrilaterals. The ads-quads dataset was generated by considering only the contours of the advertisements in the AD-Dataset (the manually annotated quadrilaterals), as shown in Fig. 4(a). The labels of these contours are the binary masks of the corresponding advertisement, which are part of the AD-Dataset. We evaluated the performance of the Rendering models using the two datasets, i.e. synthesizedquads and ads-quads. We trained two different instances of each rendering model, one for each dataset and compare their performances. For faithful comparison, we evaluate the performance of these instances using test sets from the ads-quads dataset.

Advertisement Extraction Using Deep Learning

(a) ads-quads

87

(b) synthesized-quads

Fig. 4. Sample from ads-quads and synthesized-quads.

5

Loss Functions

In this section, we present several loss functions for optimizing our models. These loss functions are categorized into vertex-based regression functions, areas-based loss functions, and hybrid methods. 5.1

Vertex-Based Loss Functions

These loss functions compute the sum of the distances between the corresponding vertices of predicted and ground-truth polygons. To determine the corresponding among the vertices, we compute all the possible circular shifts of vertices of the predicted polygon and choose the shift with minimum distance with respect to the ground truth vertices. The distance between vertices is calculated using L1 and L2 metric. In this approach, the model learns to regress to polygon vertices independent of their order, similar to [8]. Equation 1 and Eq. 2 formulate the loss functions with respect to L1 and L2 , receptively, where n is the number of vertices, Vgt and Vpred represent vertices of the ground truth and predicted polygons, and R is the circular shift function applied to the Vp with step r. Since we limit our domain to quadrilaterals, n is equal to four. 1 Vg − R(Vp , 2 ∗ r)2 ∀r∈[0,3] n i=1

(1)

1 Vg − R(Vp , 2 ∗ r)1 ∀r∈[0,3] n i=1

(2)

n

M inRL2 = min

n

M inRL1 = min

5.2

Area Loss Functions

Area based loss functions, such as the difference between the area of prediction and ground-truth polygons, are insufficient. These functions motivate the model to learn to regress to four vertices that have the same area as the groundtruth, but without considering the location of these vertices, which is incorrect. We propose an alternative approach that considers the locations of the vertices

88

B. Madi et al.

and treats them as representatives of geometric shapes rather than independent vertices, similar to IOU in [23]. Theoretically, calculating the similarity between two quadrilateral can be performed using advanced geometrical loss functions such as IOU. However, calculating such similarity based only on polygon representation is complicated. Thus, we approximate the difference between the intersection and the union between the two polygons using two trapezoids. Among possible shifts of the predicted vertices, we choose the permutation with minimum distance with respect to the ground truth according to L1 criterion (see Eq. 3). Let us denote this permutation, by rmin . We calculate the two trapezoid areas by utilizing four vertices, two from the rmin permutation and two from the ground truth. The final loss value is the sum of these two trapezoid areas (see Eq. 4 and Fig. 5c). 1 Vg − R(Vp , 2 ∗ r)1 ∀r∈[0, 2 −1] n i=1 n

rmin =

min n

(3)

T rapezoidsloss = T rapezoidArea(p1, p2, g1, g2) + T rapezoidArea(p3, p4, g3, g4) (4)

5.3

Hybrid Loss Functions

The vertex-based and area-based approaches suffer from vertex-independent optimization and overlook the importance of boundary vertices, respectively. Since the vertices of a quadrilateral are correlated, the independent optimization can not provide the best results. The distance between vertices does not describe the difference in the induced areas faithfully. For example, the groundtruth (See Fig. 6a) and predicted (See Fig. 6b) share the same three vertices and differ in one (See Fig. 6c). As seen, the loss is the green region (See Fig. 6d),

Fig. 5. The trapezoids between the two polygons

Advertisement Extraction Using Deep Learning

89

the loss based only on the vertices is small and gives insufficient feedback. The area-based loss functions provide uniform weights for all quadrilateral points and neglect the importance of vertices and edges. To overcome these limitations, we combine independent vertex regression and geometrical loss functions to take into account the appropriate weight of the vertices and the correlation between them. Practically, we define the loss function as a sum of the quadrilateral area and vertex-distance loss functions.

Fig. 6. Intersection and loss regions

Vertex and area difference based loss functions combine the distance between the corresponding vertices using L1 or L2 and the difference between the size of the quadrilaterals, i.e. subtracting the two scalars. The size difference is not an adequate optimization parameter in our case (see the discussion earlier). Nevertheless, it improves performance when combined with vertex distance loss components, i.e. this combination guides the loss to choose the nearest points with the same area. Practically, we combine Eq. 1 or Eq. 2 with area difference. The distance between ground-truth and predicted vertices is performed using Eq. 1 or Eq. 2. We calculate the area based on Shoelace or Gauss’s area formula. The area difference is computed by subtracting the areas of polygons spanned by the ground truth Vgt and the predicted Vpsh vertices. This combination is formally expressed by Eq. 5 and Eq. 6

5.4

Area M inRL2 = M inRL2 + AreaSub(Vpsh , Vgt )

(5)

Area M inRL1 = M inRL1 + AreaSub(Vpsh , Vgt )

(6)

Loss Function for PTPNet

Above we have shown how to integrate vertex-based loss function with simple geometrical loss, such as areas. The geometrical element in previous hybridloss functions is limited to scalar representation. This limitation prohibits using sophisticated loss functions that accurately describe the geometrical relation between the two quadrilateral shapes. The PTPNet model overcomes this by integrating a rendering component, which generates a binary mask that resembles the quadrilateral corresponding to the predicted vertices.

90

B. Madi et al.

The PTPNet loss function, Eq. 7, combines vertex-regression, Eq. 2, with Sørensen–Dice coefficient (Dice) loss function. The Dice is applied to the generated and ground-truth mask. P T P N et loss = M inRL1 + (1 − Dice)

6

(7)

Experiments

In this section, we overview data preparation, models construction, and evaluation results. 6.1

Dataset Preparation

In Sect. 3 we discussed collecting and annotating the dataset. Next, we overview preparing the train and test dataset, the augmentation process, and the sanitycheck dataset. The train and test sets are selected from the AD-dataset (see Sect. 3). In this study, we focus on images that include one quadrilateral advertisement and subdivide them to 70% and 30% for training and testing, respectively. The train and test images are resized to 256 × 256. The annotations are modified according to the resized images. Recall that the annotation of each advertisement includes its boundary contours and a binary mask. The contour (polygon) is represented by a normalized vector. We augment the training dataset by applying the basic geometric transformation, i.e. rotation, scale, and sheer. We apply random rotations between 0◦ to 90◦ , scale between 0.8 to 1.2, and shear from 0◦ to 20◦ . The same transformation is applied to the image and its labels, i.e. contour and mask. We build a sanity-check dataset for the initial evaluation of our regression methods. This dataset is generated from the contours of advertisements within the images of the AD-dataset, i.e. a contour (quadrilateral polygon) is embedded within a black image creating a binary image with a quadrilateral polygon (see Fig. 7) (Fig. 8).

Fig. 7. Samples from sanity-check dataset.

Advertisement Extraction Using Deep Learning

91

Fig. 8. Examples of augmented images

6.2

Study Models

Toward our evaluation study, we experimented with various types of deep learning architectures, which we discuss next. We explored applying Mask-RCNN [9] to detect and localize advertisements within images and outputting a binary mask. The MaskRCNN was pre-trained on COCO-dataset [17]. We define the performance of the pre-trained MaskRCNN as the baseline for our experiments. Our regression networks accept a color image that contains quadrilateral advertisement and predict the coordinates of its four vertices. These neural networks consist of a features extractor and a regression model. The model regresses to four vertices (vector) using the latent vector of the feature extractor, similar to PolyCNN [8], but with a different truncation style and model architecture. We choose a set of study models that includes 13 network architecture, which are variations4 of Xception [5], Resenet [10], MobileNet [11] and DenseNet [12]. We refer to these 13 models as the study models, which are pre-trained on ImageNet [6] and truncated to act as feature extractors. To build a regression model, we append to each feature extractor a component composed of Global Average Pooling layers followed by 4-Fully-Connected layers with different sizes as shown in Fig. 9. We trained the study models on 10k synthesized images using the Mean Square Error (MSE) loss function. As shown in Table 1 the top five models are the modified (replacing the fully connected component with a regression model)

Fig. 9. The modified regression models network 4

The full list appears in Table 1.

92

B. Madi et al.

versions of Xception [5], ResNet50 [10], ResNet101 [10], DensNet121 [12] and MobileNetV2 [21]. Therefore, we adopted these models to study the influence of the various loss functions, i.e. we explore the performance of these models using the loss functions we discussed in Sect. 5. Table 1. The accuracy of the regression models on the sanity-check dataset Regression model

Loss function Accuracy Regression model Loss function Accuracy

Xception

M SE

InceptionResNetV2 M SE

0.961

InceptionV3

M SE

0.937

0.927

VGG16

M SE

0.947

ResNet50

M SE

0.965

ResNet50V2

M SE

0.949

ResNet101

M SE

0.964

ResNet101V2

M SE

0.951

ResNet152V2

M SE

0.951

MobileNet

M SE

0.936

MobileNetV2

M SE

0.964

DenseNet121

M SE

0.955

DenseNet201

M SE

0.943

In this experiment we compare the performance of the modified regression models using Eq. 1 and Eq. 2 as mentioned in Sect. 5. The training of the models utilizes the polygon label only. Table 2. The regression models results using M inRL2 and M inRL1 loss functions on AD dataset. Regression model Loss function Accuracy Regression model Loss function Accuracy Xception

M inRL1

0.843

Xception

M inRL2

0.822

ResNet50

M inRL1

0.814

ResNet50

M inRL2

0.775

ResNet101

M inRL1

0.837

ResNet101

M inRL2

0.751

MobileNetV2

M inRL1

0.839

MobileNetV2

M inRL2

0.808

DenseNet121

M inRL1

0.850

DenseNet121

M inRL2

0.78

The results are shown in Table 2. As seen, Eq. 2 gives better results than Eq. 1 for the five models. It outperforms Eq. 1 by at least 2.5% for all the models. However, these two loss functions consider the four vertices independently, i.e. do not take into account the correlation among the vertices. To overcome this limitation, we integrate area difference in the loss functions. The area difference, darea , measures the size difference between the area of the ground truth and predicated quadrilaterals. We add darea to the M inRL2 and M inRL1 and get hybrid loss functions that consider the vertices independently and takes into account the area of the polygon they define. The loss functions we use are Area M inRL2 and Area M inRL1 .

Advertisement Extraction Using Deep Learning Table 3. The performance of regression models using Area M inRL2 Area M inRL1 Regression model Loss function

Accuracy Regression model Loss function

93 and

Accuracy

Xception

Area M inRL1 0.863

Xception

Area M inRL2 0.8303

ResNet50

Area M inRL1 0.825

ResNet50

Area M inRL2 0.816

ResNet101

Area M inRL1 0.840

ResNet101

Area M inRL2 0.818

MobileNetV2

Area M inRL1 0.8366

MobileNetV2

Area M inRL2 0.801

DenseNet121

Area M inRL1 0.8578

DenseNet121

Area M inRL2 0.8177

As shown in Table 3 adding the area difference to M inRL2 and M inRL1 improves the performance of all the models by 2% on average, expect MobileNetV2. The accuracy of MobileNetV2 deteriorates by less than 0.05%. We believe this is due to the lack of training data, as our dataset is not big enough. Area M inRL2 and Area M inRL1 loss functions consider the vertices and the area separately. They capture the correlation between the vertices as a geometric shape; i.e. the predicted vertices aim at producing a shape, which area is equal to that of the ground truth. The T rapezoidsloss loss function aims at capturing the geometric correlation between the predicted vertices and the shape they form without separating the two, as we explained in Sect. 5. As shown in Table 4 the trapezoid loss function gives better results than M inRL2 for all the study models. In addition, it gives better results than M inRL1 for both Xception and MobileNetV2 models. This indicates its usability for extracting objects in a similar way to M inRL2 and M inRL1 with a focus on the object as a whole. The T rapezoidsloss loss function considers the shape as one entity, vertices and area together, thus restricting the independent movement of the vertices. To evaluate the role of M inRL1 in the independent movement of the vertices and the overall performance, we added the M inRL1 to the T rapezoidsloss loss function. We refer to this loss function, as T rapezoids M inRL1 . As shown in Table 5, using T rapezoids M inRL1 outperform T rapezoidsloss for all the study models, expect ResNet101. The accuracy of the remaining study Table 4. The performance of the regression models using T rapezoidsloss Regression model Loss function Xception

Accuracy

T rapezoidsloss 0.8671

ResNet50

T rapezoidsloss 0.7949

ResNet101

T rapezoidsloss 0.7923

MobileNetV2

T rapezoidsloss 0.8461

DenseNet121

T rapezoidsloss 0.8068

94

B. Madi et al. Table 5. The performance of the regression models using T rapezoids M inRL1 Regression model Loss function

Accuracy

Xception

T rapezoids M inRL1 0.8674

ResNet50

T rapezoids M inRL1 0.8349

ResNet101

T rapezoids M inRL1 0.7852

MobileNetV2

T rapezoids M inRL1 0.840

DenseNet121

T rapezoids M inRL1 0.820

models increased by an average of 2%, which indicates that combing the vertices with the geometric shape loss function improves performance. 6.3

PTPNet

We refer to the rendering model composed of Fully-Connected (FC) layers as n-FCGenNet, where n refers to the number of FC layers in the network. In this experiment we compare the performance of n-FCGenNet and GenPTP (see Sect. 4). We experiment with two n-FCGenNet instance models: 3-FCGenNet and 6-FCGenNet. The two models input a vertex vector and output a pixel vector, which represents a 256 × 256 mask. We choose the Dice coefficient loss function to train the rendering models, in which accuracy is measured using IOU and DICE. Table 6 summarized the comparison of the three rendering models. As seen, the GenPTP outperforms the 3-FCGenNet and 6-FCGenNet models. The GenPTP trained using ads-quads outperforms the same model trained on the synthesized-quads dataset by %12 and %7 using IOU and DICE metrics, respectively. We train GenPTP separately and combine it with the Regressor. Table 6. The performance of different rendering models Render model Dataset

IOU

DICE

GenPTP

ads-quads

91.18 95.33

GenPTP

synthesized-quads 79.64 88.21

3-FCGenNet

ads-quads

3-FCGenNet

synthesized-quads 49.23 63.38

6-FCGenNet

ads-quads

6-FCGenNet

synthesized-quads 65.4

59.73 73.3 75.59 85.95 78.15

Advertisement Extraction Using Deep Learning

95

Table 7. Comparing the accuracy of PTPNet and MaskRCNN Model

Loss function

Accuracy

PTPNet

P T P N et loss

0.851

MaskRCNN M askRCN N Lossf unction 0.810

We decided to compare the performances of PTPNet with MaskRCNN. As shown in Table 7, PTPNet outperforms MaskRCNN by 4%. The performance of PTPNet is similar to the top regression model, which is the Xception architecture with T rapezoids M inRL1 loss function. However, PTPNet is easier to generalize for any polygon since its loss function does not assume a prior geometric shape. In addition, having the mask and the polygon in the training phase enables handling more complex tasks.

7

Conclusion

In this paper, we presented various deep learning models with different loss functions for advertisement extraction. We introduce AD dataset, which is a dataset of quadrilateral advertisements. Modified versions of several regression models are explored and their performances are studied using various loss functions. We use L2 and L1 loss functions and add the area difference to improve performance. We introduce the trapezoid loss function that considers the vertices as representatives of a shape instead of focusing on the predicted vertices independently and show that adding L1 loss to trapezoid loss gives the best results in most of the modified regression models. In addition, we introduce the PTPNet model with its own loss function that combines the results of a rendering model and a regression model. We conduct an extensive experimental study to evaluate the performance of common deep learning models in extracting advertisements from images and compare their performance with our proposed model. We show that our proposed model manages to extract advertisements at high accuracy and outperforms common deep learning models. The scope of future work includes extending our approach to handle general polygons. Acknowledgment. This research was supported in part by Frankel Center for Computer Science at Ben-Gurion University of the Negev. One of the authors, Reem Alaasam, is a fellow of the Ariane de Rothschild Women Doctoral Program, and would like to thank them for their support.

References 1. Acuna, D., Ling, H., Kar, A., Fidler, S.: Efficient interactive annotation of segmentation datasets with polygon-RNN++. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 859–868 (2018)

96

B. Madi et al.

2. Bauchet, J.P., Lafarge, F.: KIPPI: kinetic polygonal partitioning of images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154 (2018) 3. Castrejon, L., Kundu, K., Urtasun, R., Fidler, S.: Annotating object instances with a polygon-RNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5230–5238 (2017) 4. Cheng, M.M., Mitra, N.J., Huang, X., Torr, P.H., Hu, S.M.: Global contrast based salient region detection. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 569–582 (2014) 5. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) 6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) 7. Duan, L., Lafarge, F.: Image partitioning into convex polygons. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3119–3127 (2015) 8. Girard, N., Tarabalka, Y.: End-to-end learning of polygons for remote sensing image classification. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 2083–2086. IEEE (2018) 9. He, K., Gkioxari, G., Doll´ ar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017) 10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 11. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 12. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017) 13. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017) 14. Lateef, F., Ruichek, Y.: Survey on semantic segmentation using deep learning techniques. Neurocomputing 338, 321–348 (2019) 15. Levinshtein, A., Sminchisescu, C., Dickinson, S.: Optimal contour closure by superpixel grouping. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 480–493. Springer, Heidelberg (2010). https://doi.org/10. 1007/978-3-642-15552-9 35 16. Li, Z., Wegner, J.D., Lucchi, A.: Topological map extraction from overhead images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1715–1724 (2019) 17. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 18. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038 (2014). http://arxiv.org/abs/1411.4038 19. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. CoRR abs/1506.01497 (2015). http:// arxiv.org/abs/1506.01497

Advertisement Extraction Using Deep Learning

97

20. Rother, C., Kolmogorov, V., Blake, A.: “GrabCut” interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 23(3), 309–314 (2004) 21. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018) 22. Wang, L., Wang, L., Lu, H., Zhang, P., Ruan, X.: Saliency detection with recurrent fully convolutional networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 825–841. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-46493-0 50 23. Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016)

Detection and Localisation of Struck-Out-Strokes in Handwritten Manuscripts Arnab Poddar1(B) , Akash Chakraborty2 , Jayanta Mukhopadhyay3 , and Prabir Kumar Biswas2 1 Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, India [email protected] 2 Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India [email protected] 3 Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India [email protected]

Abstract. The presence of struck-out texts in handwritten manuscripts adversely affects the performance of state-of-the-art automatic handwritten document processing systems. The information of struck-out words (STW) are often important for real-time applications like handwritten character recognition, writer identification, digital transcription, forensic applications, historical document analysis etc. Hence, the detection of STW and localisation of struck-out strokes (SS) are crucial tasks. In this paper, we introduce a system for simultaneous detection of STWs and localisation of the SS using a single network architecture based on Generative Adversarial Network (GAN). The system requires no prior information about the type of SS stroke and it is also able to robustly handle variant of strokes like straight, slanted, cris-cross, multiple-lines, underlines and partial STW as well. However, we also present a methodology to generate STW with high variability of SS for network learning. We have evaluated the proposed pipeline on publicly available IAM dataset and also on struck-out words collected from real-world writers with high variability factors like age, gender, stroke-width, stroke-type etc. The evaluation metrics show robustness and applicability in realworld scenario. Keywords: Handwritten document image · Struck-out words · Handwritten manuscripts · Generative Adversarial Networks (GAN) The work is partially supported by the project entitled as “Information Access from Document Images of Indian Languages” sponsored by IMPRINT, MHRD, Govt. of INDIA. c Springer Nature Switzerland AG 2021  E. H. Barney Smith and U. Pal (Eds.): ICDAR 2021 Workshops, LNCS 12917, pp. 98–112, 2021. https://doi.org/10.1007/978-3-030-86159-9_7

Detection and Localisation of Struck-Out-Strokes

1

99

Introduction

Automatic handwritten text recognition is a prime topic of research in digital document image analysis [20,23,24]. A prime limitation in most published reports on handwriting recognition is that they consider the documents containing no writing error or struck-out texts. However, a free-form handwritten manuscript often contains struck-out words. Some typical examples of struckout words (STW) are shown in Fig. 1. Such word produces nonsense output in optical character recognition (OCR) or writer verification framework [12]. To tackle the problem, we need an automatic module to identify STW and analyse further if required. The meta-information of STWs are often important for real-time applications like writer identification, digital transcriptions, handwritten character recognition, etc. The recent digital transcriptions of famous writers like G. Washington, J. Austen, H. Balzac, G. Flaubert etc. use annotations of the STWs [1,3,8,10]. In forensic applications also, a quick and automatic detection of struck-out texts and examining their patterns may provide important clues like behavioral and psychological pattern of the suspect [13] and mentally challenged patients [14]. Automatic detection and localisation of STWs and strokes may be helpful in such cases as well.

(a) Straight

(b) Slanted

(c) Partial struck

(d) Multiple

(e) Crossed

(f) Underline

Fig. 1. Some examples of striked-through words with various kinds of strokes.

A few previous approaches dealt with the detection of struck-out texts. Tuganbaev and Deriaguine [26] registered a US patent for their crossed-out character recognizer using a feature-based classifier. The work in [19] presented a HMM (Hidden Markov Model) based crossed-out word recognition. A graphbased solution for detection of STW and localisation of SS from handwritten manuscripts was reported in [11]. More recently, a modified version of [11] was presented in [15], where morphological and graph based features were extracted from STW which was followed by an SVM classifier. In [12], a CNN-SVM based approach is proposed to detect STW in context of writer identification. The approach presented in literature use separate modules for detection of SS and localisation of SS which allows error propagation from detection module to localisation modules [11,12,15]. The works are also dependent on prior information about the type of the SS and prescribe rule-based solutions for each

100

A. Poddar et al.

type of SS like straight, slanted, crossed, etc. which attracts manual intervention [11,12,15]. In this paper, we present an earliest attempt to tackle both the problems i.e., detection of STWs and localising the SS simultaneously without prior knowledge of the type of stroke. We use a single network architecture based on Generative Adversarial Network (GAN) [17,22] for localising the struck-out region. Further the system extracts concatenated features from the input image and the image localising the struck-out region. Finally, an SVM based classifier is used to classify between clean and struck-out word-images. The system uses no additional computation for localisation after the detection of SS. The system robustly handles different variant of strokes like straight, slanted, cris-cross, multiple-lines etc. and is also able to detect partially STWs. The system takes the handwritten word image IHW and generates a mask image ISS localising the potential SS. The ISS is further used for detecting whether IHW is a STW or a clean word. A simple block diagram of the proposed system is shown in Fig. 2. Since we require training samples of STWs to train deep learning architectures, we also present a technique to generate struck out words of various kind. We use the proposed pipeline on publicly available IAM dataset [21] and achieved encouraging results. The system is also tested on struck-out words from real world scenario which indicates the robustness and applicability of the system in challenging variability conditions. The contribution of the paper are listed as follows 1. A single network architecture solution for simultaneous detection of struck-out words and localisation of strike-through stroke in handwritten manuscripts. 2. The proposed system requires no prior information about the type of strokes and is also able to handle partially struck-out handwritten words. 3. We introduce methods to generate STWs and corresponding ground truths from clean handwritten words and hand-drawn stroke templates. The rest of the paper is organized as follows. The details of the proposed method are described in Sect. 2. The experimental set-up and datasets are described in Sect. 3. Then the experimental results are presented in Sect. 4. Finally, Sect. 5 summarizes the pros and cons of this approach and suggests the scope of future work.

2

Proposed Methodology

We propose the localisation of struck-out regions of the input word image and the detection of strike-through or clean word-image in two steps. Initially, the input image is passed through a localisation network, where the struck-out regions (if any) of the input word-image is localised. Further few simple features are calculated from input image and the corresponding struck-out region localised image. Further an SVM classifier is used to discriminate between clean words and words with struck-out strokes. Figure 2 shows an overall workflow of the proposed system.

Detection and Localisation of Struck-Out-Strokes

101

Fig. 2. Block diagram for the workflow of proposed system

We conceptualise the problem of localising the struck-out-stroke in handwritten word as an image to image translation problem, where we generate the SS image from a given image of a handwritten word. We directly feed the segmented word image to the generator network irrespective of the presence of SS (Fig. 2). The network is supposed to generate a mask image ISS localising SS. Our generator model learns a mapping function from handwritten word images to their corresponding SS. The proposed network is expected to generate a uniform black image ideally for clean handwritten word input. The loss function used in the network simultaneously uses data loss, measured from mean square error and the structural loss (SSIM) of ISS and expected ground-truth outcome IGT . The detector of the adversarial model is implemented such that it differentiates between the real struck-out-strokes and its fake counterparts. The generated ISS is used for detection of STW. For this, the foreground pixels of the mask are considered which also appear as foreground in input image. We then consider the largest axis parallel bounding box of the connected components in the mask and the features of the contour is considered for detection of struck-out and clean words. The extracted features are finally fed into an SVM for the final decision of the clean and struck-out word (Fig. 2). 2.1

GAN Preliminary

In our proposed system, we use conditional GANs to learn a mapping from observed condition image x and random noise vector z, to y, G : x, z −→ y [17,22]. The generator network G is trained to deliver outputs that are not differentiable from “real” images by its adversary discriminator, D.

102

2.2

A. Poddar et al.

Objective

In this work, we design the network to generate the SS image ISS from a handwritten word image IHW . First, we crop the word image from IAM dataset. Then, we resize binarised handwritten word and pad zeros to fit into size 128 × 128 as an input to the generator network. The discriminator network D acts as adversary to detect fake samples. Both the ground truth of real SS images (IGT ) and the generated ones (ISS ) are fed into discriminator D. During training, the discriminator D enforces the generator to produce realistic images SS. The objective function of the GAN network, conditioned on the input handwritten word images, can be expressed as: LcGAN (G, D) = EIHW ∼ptrain (IHW ),IGT ∼ptrain (IGT ) [log D(IHW , IGT )] + EIHW ∼ptrain (IHW ),z [log(1 − D(IHW , G(IHW , z))]

(1)

where G tries to minimize this objective against an adversarial D that tries to maximize it, i.e. G∗ = arg min max LcGAN (G, D) G

D

(2)

We mix the GAN objective with a more traditional 1 loss, which encourages less blurring: L1 (G) = EIHW ∼ptrain (IHW ),IGT ∼ptrain (IGT ) [|| IGT − G(IHW ) ||1 ]

(3)

We incorporate noise as dropout, applied on several layers of our generator in both training and evaluation phase. Here, we are interested to generate the struck-out strokes having structural similarity with ground-truth SS. Contrary to the 1 loss, the structural similarity (SSIM) index provides a measure of the similarity by comparing two images based on luminance, contrast and structural similarity information. In our case, the handwritten word images are supposed to contain the SS inside the word. Hence structural similarity may play a vital role. Hence we also introduce the SSIM loss in the objective of the network. For two images I HW and I GT , the luminescence similarity l(I HW , I GT ), contrast similarity c(I HW , I GT ) and structural similarity s(I HW , I GT ) are calculated and the overall structural loss for the generator can be defined as LSSIM (G) = 1 − EIHW ptrain (IHW ),IGT ptrain (IGT ),z [l(I HW , I GT ) × c(I HW , I GT ) × s(I HW , I GT )]

(4)

Our final objective is G∗ = arg min max LcGAN (G, D) + λL1 (G) + αLSSIM (G) G

D

(5)

where λ and α are ratio control parameter for data-loss and structure-loss respectively.

Detection and Localisation of Struck-Out-Strokes

103

Fig. 3. An Example of localisation of struck-out stroke using proposed generative architecture. It shows various parameters for extraction of features from struck-out stroke image and input image for detection of struck-out and clean words.

2.3

Generator Architecture

The work adopts network architectures from Johnson et al. [18]. Here, 9 residual blocks are used in the generator network to train 128 × 128 size images. Let, c7s1-k denote a 7 × 7 Convolution-InstanceNorm-ReLU layer with k filters and stride 1. The parameter dk denotes a 3 × 3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Reflection padding was used to reduce artifacts. Whereas Rk denotes a residual block that contains two 3 × 3 convolutional layers with the same number of filters on both layers. uk denotes a 3 × 3 fractionalstrided-Convolution-InstanceNorm-ReLU layer with k filters and stride 12 . The network with 9 residual blocks consists of: c7s1-64, d128, d256, R256, R256, R256, R256, R256, R256, R256, R256, R256, u128, u64, c7s1-3 2.4

Discriminator Architecture

For discriminator networks, 70×70 PatchGAN is used [16]. Let Ck denote a 4×4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, a convolution layer is applied to produce a 1-dimensional output. The proposed system does not use InstanceNorm for the first c64 layer. It uses leaky ReLUs with a slope of 0.2. The discriminator architecture is: C64, C128, C256, C512. 2.5

Detection of Struck-Out Word

We use the generated mask images (ISS ) and input image IHW to train an SVM for detection of STWs or clean words. We compute features from both IHW and (ISS ) an d concatenate them to train an SVM classifier for detection of STws and clean words. We consider the foreground pixels which are also present in IHW . The residual pixels in ISS are discarded as noise and a new   is generated. We compute simple feature-vector from ISS to train image ISS the SVM for detection of STW. Furthermore, the largest connected component

104

A. Poddar et al.

  CISS is selected from ISS . The pixel-width wss from leftmost foreground pixel to  is calculated. The horizontal and vertical pixelrightmost foreground pixel of ISS  is calculated to extract features. Furthermore, span wcss and hcss of contour CISS from image IHW horizontal and vertical pixel-span of foreground pixels whw and hhw are also calculated. An illustrative diagram is shown in Fig. 3 describing the  images for detection of struck-out words. procedure to extract features from ISS  , ISS and IHW to formulate SVM We consider the following features from CISS feature vector for detection:  1. The axis parallel bounding box and contour area of CISS

2. The ratio wcss /wss and wcss /whw 3. The values of hcss , hhw , hss  4. set all values to 0 if CISS is N U LL.

Thus we train and test the SVM for detection of struck-out images from the features mentioned in above list.

(a) straight

(b) partial Straight

(c) slanted

(d) partial slant

(e) multiple lines

(f) cross

Fig. 4. Various types of STWs generated from single clean word.

3

Database and Experimental Set-Up

The previous approaches mostly used classical image processing and machine learning techniques and used private databases with limited examples [11,12,15]. However, deep learning based methods require large number of training samples with high variability of writer, writing style, word-length, word-content, strokewidth etc. However there is no publicly available database to deal with STW in handwritten manuscripts. Here, we use original clean words from IAM database [21], and simultaneously collect separate stroke images various types like straight, slanted, cross, etc. A total of 2400 hand-drawn strokes are collected to generate the STWs for training and separate 1230 hand-drawn strokes are collected for testing. We have used 81412 handwritten word-images for training and 22489 word-images for testing the network for the task of SS localisation.

Detection and Localisation of Struck-Out-Strokes

3.1

105

Generation of Struck-Out Words

The STWs are generated using the words from IAM dataset and collected handdrawn strokes. Both the inputs, i.e., the word-images and stroke-images are taken from separate pool for training and testing. We describe the struck-out word generation procedure in Algorithm 1. Algorithm 1: Generation of struck-out words Result: Struck-out word 1. Compute height (h) and width (w) of input word; 2. Select a pool P of hand-drawn strokes having width within the range w ± r%; 3. Select randomly a stroke (s) from the pool P ; 4. Give a rotation to s of degree ±θ; 5. Perturb the vertical center of s randomly within the range ±hg% of the word-height h.; 6. Superimpose the stroke s on the word.;

In word generation, we introduce randomised rotation (±θ), shift of vertical centre of stroke (±hg%), and select random strokes from P in Algorithm 1 to ensure different variability like slant of strokes, vertical and horizontal position, type variability etc. We have varied these parameters r%, ±θ, ±hg% to generate various types of strokes like straight, slanted, cris-cross, underline etc. The strokes are used in a way to generate both full and partial STWs. The generation technique allows to generate various types of STWs using single clean word as shown in Fig. 4.

4

Experimental Results and Discussion

The models are trained in mixed setting with equal proportion of straight, slanted, partial straight, partial slanted, Multiple strike and crossed strokes with a total of 81412 STWs. The single trained model is used for testing with various types of strokes separately and also with mixed type accumulating all types of STWs. The control parameters λ, α used in Eq. 5 are given the values 50 for both and the learning rate is used 0.0002 for training. 4.1

Localisation of Struck Out Stroke

Localisation result of STWs using our proposed system is shown in Fig. 8, 9, 10, 11 and 12. Figure 13 shows the effectiveness of our proposed method also  and use foreground pixels on underline strokes. We consider the generated ISS count in SS for performance evaluation. We compute precision (P), recall (R) and F-measure (FM) for each ISS from input IHW , and finally take the average (harmonic mean) of them. The performance metrics are measured in pixel-topixel setting. For a strike-through component true positive (TP), false positive (FP) and false negative (FN) are measured as follows:

106

A. Poddar et al.

Table 1. Performance comparison of objective functions: 1, SSIM and 1 + SSIM on STW with straight strokes only Word length

Precision (%)

Recall (%)

F1-Score

1

SSIM

1 + SSIM

1

SSIM

1 + SSIM

1

SSIM

1 + SSIM

1

73.11

74.97

83.47

75.76

77.59

75.39

74.41

76.26

79.22

2

83.58

80.10

87.51

86.12

84.56

85.25

84.83

82.27

86.37

3

88.07

84.71

91.64

83.96

79.21

82.31

85.97

81.87

86.73

4

91.00

87.80

93.40

85.09

79.07

82.11

87.94

83.21

87.39

5

92.41

88.98

94.26

87.20

80.46

83.86

89.73

84.50

88.76

≥6

90.81

86.88

93.68

89.94

84.24

88.51

90.37

85.54

91.02

All

89.91

86.38

92.86

87.19

81.78

85.17

88.53

84.02

88.85

Table 2. Results of localisation of struck-out region by the proposed system on IAM dataset. The network localising the struck-out region is trained with mixed type of struck-out words such as straight, partial straight, slanted, partial slanted, crossed, and multiple in equal proportion. Separate performance evaluation on different types of struck-out words are reported below. Stroke type

Precision (%) Recall (%) F1 score

Straight Partial straight Slanted Partial slanted Crossed Multiple

92.86 89.95 96.54 92.01 95.51 95.33

85.17 89.18 84.60 87.65 82.72 81.21

88.85 89.52 90.17 89.78 88.65 87.70

Mixed

93.97

83.92

88.66

– TP: number of black (object) pixels in SS (correctly classified), – FP: number of black pixels those are incorrectly labeled as SS (unexpected), – FN: number of black pixels of SS those are not labeled (missing result). Table 1 presents the comparison of performance of localisation of SS with data loss L1 (G), structural loss LSSIM (G) and fusion of both. However, Table 1 also depicts the performance measures in varying character length (1, 2, 3, 4, 5, greater than 5), which show more robust and reliable performance to spot the stroke for larger character length. Fusing SSIM loss information with 1 loss improves performance of localisation of SS. The overall performance of the network is presented for various types of STWs in Table 2. We obtain consistent performance for various types of struck-out like straight, slanted, multiple strokes, crossed strokes, mixed etc. The system is also found to perform consistently well on partial STWs. 4.2

Detection of Strike-Out Textual Component

For detection of STWs we compute features from input image and IHW gen and concatenate them for classification. We compute the features erated ISS

Detection and Localisation of Struck-Out-Strokes

107

Table 3. Results of detection performance of struck-out and clean words on IAM dataset. Stroke type

Precision (%) Recall (%) F1 score

Straight

98.60

99.37

98.98

Partial straight 96.40

98.03

97.21

Slanted

86.20

98.31

91.86

Partial slanted

95.56

97.85

96.74

Crossed

97.12

98.02

97.57

Multiple

95.81

97.24

96.52

Mixed

95.78

97.11

96.44

Table 4. Performance comparison of detection of struck-out words between proposed system and Lenet-SVM network [12] on IAM dataset System type

Precision (%) Recall (%) F1 score

Lenet-SVM [12] 88.60

76.98

82.65

Proposed

97.11

96.44

95.78

for training and testing of SVM as presented in Sect. 2.5. The region localised images, i.e., ISS are used to evaluate the detection performance. We subdivide the test set of localisation task, i.e., 22489 images into 70:30 ratio for training and testing the SVM for detection of struck-out and clean words. The performance metrics for detection performance for various strokes are presented in Table 3. We obtained very high values of accuracy for detection upto 98.93% for straight strokes and 97.31% for mixed strokes. We consistently obtained significant performance for other variant of strokes as depicted in Table 3 which indicates robustness of the system in different types of STWs (Figs. 5, 6 and 7).

a) Struck-out b) Detected

c) GT

a) Struck-out b) Detected

c) GT

Fig. 5. Generated output images with partially-slanted (above left) and fully slanted (above right) strokes.

a) Struck-out b) Detected

c) GT

a) Struck-out b) Detected

Fig. 6. Generated output images cross struck-out-strokes

c) GT

108

A. Poddar et al.

a) Struck-out b) Detected

c) GT

a) Struck-out b) Detected

c) GT

Fig. 7. Generated output images straight struck-out strokes

a) Struck-out b) Detected

c) GT

a) Struck-out b) Detected

c) GT

Fig. 8. Generated output images partially-straight struck-out strokes

a) Struck-out b) Detected

c) GT

a) Struck-out b) Detected

c) GT

Fig. 9. Generated output images multiple struck-out strokes

4.3

Performance Comparison

In our proposed framework, we have encountered two tasks i.e., localisation of struck-out region of a word image and detection of struck-out and clean words. We have uniquely proposed a struck-out region localisation network and reported the performance metrics pixel-wise. However, the work in [12] and [15] reported struck-out word detection performance in a privately prepared data-set. The work in [12] uses a Lenet-SVM based deep-network architecture for detection performance. Here we compare the detection performance of the proposed system with that of [12] in publicly available IAM dataset [21]. The struck-out words are generated as described in Sect. 3.1. We have used word images from IAM dataset in 70:30 ratio for training and testing respectively. The struck-out words and clean words are both present in 50:50 ratio for evaluation. We have compared the performance of proposed system with that of the Lenet-SVM framework as in [12] with mixed type of struck-out words. The mixed type of struck-out words include struck-out stroke types like straight, partial straight, slanted, partial slanted, crossed, multiple as described in previous section. The Table 4 shows that the proposed system performs significantly better than the state-of-theart system in terms of all precision, recall and F-1Score. As the performance is measured on widely used and publicly available IAM database [21], the system shows significant performance in presence of challenges like writer variability, age variability, cursiveness, etc. 4.4

Performance on Real Word Images

The results in Table 3 depict the performance of the proposed system on IAM database. Here we intent to show the performance of the system on the image

Detection and Localisation of Struck-Out-Strokes

a) Struck-out b) Detected

c) GT

a) Struck-out b) Detected

109

c) GT

Fig. 10. Generated output images underline strokes Table 5. Results for detection performance of struck-out and clean words on real unconstrained handwritten data-set. Training data Test data IAM

Precision (%) Recall (%) F1 score

Real struck-out 87.70

99.39

93.18

collected from real world to evaluate the robustness of the proposed system. The IAM database inherently contains significant variability in terms of writing style, age of the writer, gender, texture of the page, ink of writing, stroke-width etc. The proposed struck-out region localisation network is trained on struck-out words from IAM data-set. However it would be informative to evaluate the performance of the proposed system on other data, collected from real-world writers. We have collected English handwriting from 45 individuals. The writers were of both the genders in the age group of 19–56 years with various regional and spoken-language background. Each writer provided 1 full page of handwriting sample. The content of handwriting is selected independently by the writers and written in running hand. The writers were requested to strike-through some of the words in their running handwriting style. The writers were given an A4 sized 75 GSM (g/m2 ) white paper and instructed to use any pen of their choice with black or blue colored ink. Thus we collected 443 unconstrained various types of struck-out words. We also collected 1661 clean words to measure the detection performance. The Table 5 presents the performance metrics on the collected real data.

Fig. 11. Performance on real struck-out word images

The proposed struck-out region localisation network is trained on mixed simulated struck-out images as mentioned in the previous section, while the test results are obtained from the collected real manuscripts. Figure 11 depicts a few examples of real handwritten struck-out word images.


Fig. 12. Unconstrained offline document images from Satyajit Ray's movie scripts of Goopy Gyne Bagha Byne and the Apu Trilogy.

The collected test dataset contains various types of struck-out words in running hand. We obtained high precision and F1-score in this real-world scenario, and the proposed system appears to work robustly on it.

4.5 Performance on Satyajit Ray's Manuscript

Here we present result images of struck-out strokes from the manuscripts of film-maker and writer Satyajit Ray [2,7], taken from his movie scripts. We collected the scanned images of Satyajit Ray's movie manuscripts, with consent, from the National Digital Library of India [6,9]. The struck-out words are collected from the scripts of the movies 'Goopy Gyne Bagha Byne' [4] and the 'Apu Trilogy' [5]. A few pages of Satyajit Ray's manuscripts from the aforementioned movie scripts are shown in Fig. 12. The struck-out word images are collected from annotated pages using the MultiDIAS annotation tool [25].

Fig. 13. Result images from Satyajit Ray's movie scripts of Goopy Gyne Bagha Byne and the Apu Trilogy, in both English (top row) and Bengali (middle and bottom rows).


Satyajit Ray's manuscripts were written roughly between 1960 and 1980. They contain highly cursive words, and the texture of the pages and the pen ink are quite different from those of the IAM data-set images. Our system is trained on the IAM data-set with mixed types of struck-out words. Here we show the localisation of struck-out regions by the proposed localisation network; we present a few result images in varied settings to evaluate the robustness of the system. The struck-out strokes obtained from Satyajit Ray's manuscripts are shown in Fig. 13. The performance is displayed on both cursive English and Bengali scripts, which shows that our system is useful across scripts.

5 Conclusion

We present a single-network solution for simultaneous detection of STWs and localisation of SSs in handwritten manuscripts. To the best of our knowledge, it is the earliest attempt in which the system requires no prior information about the type of stroke and is also able to handle partially struck-out handwritten words. The experiments cover a wide variety of SS types, such as straight, slanted, multiple strikes, underlines, and crossed strokes. We observed very high performance for both detection of STWs and localisation of SSs. The robustness and applicability of the proposed system are evaluated on real, free-form handwritten manuscripts. In the future, the work can be extended with improved architectures and with more challenging and complex strike-through text. Further, the localisation of struck-out regions can be used to reconstruct the clean word from the struck-out word.

References

1. Bibliothèque de Rouen (Rouen Library), Rouen Cedex-76043, France. http://www.bovary.fr. Accessed 19 July 2008
2. Ray, S.: https://en.wikipedia.org/wiki/Satyajit_Ray. Accessed 22 Feb 2021
3. The Morgan Library & Museum, New York, USA-10016. http://www.themorgan.org. Accessed 19 July 2008
4. A brief about the movie Goopy Gyne Bagha Byne. https://en.wikipedia.org/wiki/Goopy_Gyne_Bagha_Byne. Accessed 22 Feb 2021
5. A brief about the movie series Apu Trilogy. https://en.wikipedia.org/wiki/The_Apu_Trilogy. Accessed 22 Feb 2021
6. A brief description about 'National Digital Library of India' on Wikipedia. https://en.wikipedia.org/wiki/National_Digital_Library_of_India. Accessed 22 Feb 2021
7. A brief description of 'Satyajit Ray' by the Satyajit Ray Organisation. https://satyajitray.org/. Accessed 22 Feb 2021
8. George Washington Papers, The Library of Congress, USA. http://memory.loc.gov/ammem/gwhtml/gwhome.html. Accessed 19 July 2008
9. National Digital Library of India. https://www.ndl.gov.in/. Accessed 22 Feb 2021
10. Queensland State Archive, Australia-4113. http://www.archivessearch.qld.gov.au. Accessed 19 July 2008


11. Adak, C., Chaudhuri, B.B.: An approach of strike-through text identification from handwritten documents. In: 2014 14th International Conference on Frontiers in Handwriting Recognition, pp. 643–648. IEEE (2014)
12. Adak, C., Chaudhuri, B.B., Blumenstein, M.: Impact of struck-out text on writer identification. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1465–1471. IEEE (2017)
13. Brink, A., Schomaker, L., Bulacu, M.: Towards explainable writer verification and identification using vantage writers. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 824–828. IEEE (2007)
14. Caligiuri, M.P., Mohammed, L.A.: The Neuroscience of Handwriting: Applications for Forensic Document Examination. CRC Press, Boca Raton (2012)
15. Chaudhuri, B.B., Adak, C.: An approach for detecting and cleaning of struck-out handwritten text. Pattern Recogn. 61, 282–294 (2017)
16. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423 (2016)
17. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
18. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
19. Likforman-Sulem, L., Vinciarelli, A.: HMM-based offline recognition of handwritten words crossed out with different kinds of strokes (2008)
20. Liu, C.L., Yin, F., Wang, D.H., Wang, Q.F.: Online and offline handwritten Chinese character recognition: benchmarking on new databases. Pattern Recogn. 46(1), 155–162 (2013)
21. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002)
22. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
23. Pal, U., Jayadevan, R., Sharma, N.: Handwriting recognition in Indian regional scripts: a survey of offline techniques. ACM Trans. Asian Lang. Inf. Process. (TALIP) 11(1), 1 (2012)
24. Plamondon, R., Srihari, S.N.: Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 63–84 (2000)
25. Poddar, A., Mukherjee, R., Mukhopadhyay, J., Biswas, P.K.: MultiDIAS: a hierarchical multi-layered document image annotation system. In: Sundaram, S., Harit, G. (eds.) DAR 2018. CCIS, vol. 1020, pp. 3–14. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-9361-7_1
26. Tuganbaev, D., Deriaguine, D.: Method of stricken-out character recognition in handwritten text. US Patent 8,472,719, 25 June 2013

Temporal Classification Constraint for Improving Handwritten Mathematical Expression Recognition

Cuong Tuan Nguyen(B), Hung Tuan Nguyen, Kei Morizumi, and Masaki Nakagawa

Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan
[email protected], [email protected], [email protected]

Abstract. We present a temporal classification constraint as an auxiliary learning method for improving the recognition of Handwritten Mathematical Expressions (HME). Connectionist temporal classification (CTC) is used to learn the temporal alignment of the input feature sequence and the corresponding symbol label sequence. The CTC alignment is trained with the encoder-decoder alignment through a combination of CTC loss and encoder-decoder loss to improve the feature learning of the encoder in the encoder-decoder model. We show the effectiveness of the approach in improving symbol classification and expression recognition on the CROHME datasets.

Keywords: Temporal classification · Mathematical expression · Encoder-decoder model

1 Introduction

Recently, there have been many applications of Information and Communications Technology in mathematics education, such as Intelligent Tutoring Systems (ITS) [1], which benefit students through learning at their own pace, tailored feedback, step-by-step hints, and curriculum. As handwriting is a natural way of learning and practising math, Handwritten Mathematical Expression (HME) recognition plays an important role as a user interface that bridges information exchange between students and an ITS. Recognizing mathematical expressions is a challenging problem composed of two main tasks: symbol detection/recognition and structure recognition through analyzing spatial relations. Recently, the encoder-decoder-based end-to-end recognition approach [2–4], which benefits from the global context, has gained a clear advantage over the multi-stage approach [5, 6], as shown in the latest recognition competitions [7, 8]. The approach uses an encoder for detecting symbol recognition features and an attention-based decoder to learn the alignment constraint between the input and the LaTeX transcription. The attention-based approach, however, does not apply any constraint on sequential order, which may reduce the performance of the encoder-decoder model.


Extensive research has focused on improving the encoder-decoder model for HME recognition. To improve the encoder, a large Convolutional Neural Network (CNN) such as a densely connected CNN (DenseNet) is applied to learn high-level features [9], and adversarial networks are applied to learn robust features [10]. To improve the decoder, coverage-based attention is applied to avoid over-decoding or under-decoding [3], and multi-scale attention is used to deal with small symbols [9]. Another approach is to apply auxiliary learning to improve the encoder of the model. Truong et al. used weakly supervised learning as auxiliary learning, which improves symbol detection and recognition [11]. Weakly supervised learning, however, suffers from the problem of learning multiple instances of a symbol class in a mathematical expression. In the literature, there are many works that use temporal classification constraints for learning alignment in sequential classification tasks. Connectionist temporal classification learns alignment without segmentation and is widely applied to handwriting recognition and speech recognition [12, 13]. In this work, we use a temporal classification constraint as auxiliary learning to improve the encoder-decoder model. The encoder benefits from the alignment of both the temporal classification constraint and the attention decoder constraint. We show the effect of our approach in improving encoder classification performance and the mathematical expression recognition rate. The rest of the paper is organized as follows: Sect. 2 introduces the related works, Sect. 3 describes our method, Sect. 4 shows the experimental results and discussion, and Sect. 5 draws our conclusion.

2 Related Works

The encoder-decoder model is a neural network model that has achieved high accuracy in machine translation [14]. It consists of an encoder component that extracts features and a decoder component that outputs a sequence of words or characters according to the task. In detail, the encoder component transforms the input character sequence into a feature vector, and the decoder component takes this vector as input to generate the sequential result. The encoder-decoder model is also used in other tasks: it is extended to extract features from images using a CNN-based encoder, and this extension has been widely employed in offline HME recognition [2, 3, 11, 15]. Watch, Attend and Parse (WAP) [3] is an encoder-decoder model for mathematical image recognition. In addition, coverage-based attention [16] is used together with the decoder. The conventional attention mechanism focuses on the location of a single symbol at different decoding steps; the attentive regions should be distinct so that the outputs of different decoding steps are not duplicated. Thus, coverage-based attention is proposed to accumulate the attention maps from the previous decoding steps, eliminating duplications in the attention map. The recognition rate is 46.55%, which is about 9 percentage points higher than the conventional method without the coverage attention mechanism. An extension of WAP applied a DenseNet encoder and two levels of multi-scale attention (MSA) [9], which further improve the performance of the system.
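To make the coverage idea concrete, the following is a minimal sketch of one decoding step with coverage-based attention. It is not the exact WAP formulation: the projection layers W_h, W_f, W_c, the scoring layer v, and the attention dimension are illustrative names, and the accumulated attention map plays the role of the coverage term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def coverage_attention_step(h_prev, feats, attn_sum, W_h, W_f, W_c, v):
    """One decoding step of coverage-based attention (illustrative sketch).

    h_prev:   (batch, dec_dim)     previous decoder hidden state
    feats:    (batch, L, enc_dim)  serialized encoder feature map
    attn_sum: (batch, L)           accumulated attention maps of past steps
    W_h, W_f, W_c, v: nn.Linear layers projecting to a common attention space
    """
    energy = torch.tanh(
        W_h(h_prev).unsqueeze(1)          # (batch, 1, att_dim)
        + W_f(feats)                      # (batch, L, att_dim)
        + W_c(attn_sum.unsqueeze(-1)))    # coverage term discourages reuse
    alpha = F.softmax(v(energy).squeeze(-1), dim=1)            # (batch, L)
    context = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # (batch, enc_dim)
    return context, alpha, attn_sum + alpha  # update the coverage accumulator

# Example layer shapes (illustrative):
# W_h = nn.Linear(dec_dim, att_dim); W_f = nn.Linear(enc_dim, att_dim)
# W_c = nn.Linear(1, att_dim);       v   = nn.Linear(att_dim, 1)
```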


Paired Adversarial Learning (PAL) [10] is a recognition model for HMEs. There are two main methods for improvement: one is adversarial learning, and the other is improved feature extraction in the encoder using Multi-Dimensional Long Short-Term Memory (MDLSTM). Due to the high variability of HMEs, this method focuses on extracting features that are invariant to different writing styles. A pair of input images is prepared: a handwritten mathematical expression image and a similar mathematical expression image rendered by a LaTeX generator. Each image in this pair is processed through the same network to extract discriminative features. A discriminator is used to identify whether the extracted features come from a handwritten image or a rendered image; it assists the feature extractor in focusing on invariant features that do not depend on the writer. Moreover, there are corresponding symbols in HMEs, such as parentheses, which are essential for judging the context between symbols. In PAL, an MDLSTM network, implemented by applying LSTM in four directions (up, down, left, and right), is incorporated into the encoder. Thus, the encoder of PAL may access the context of a 2D image and extract local features that match handwritten mathematical formulas. For improving the decoder, a convolutional sequence-to-sequence model with attention (Conv-Attention) combined with position embedding is employed to generate the corresponding LaTeX sequences from the encoded features [6]. The recognition rate of PAL is improved by about 2.5 percentage points compared to the conventional method. Truong et al. used Weakly Supervised Learning (WSL) as an auxiliary learning method for the encoder-decoder of WAP. The method showed its effectiveness in improving the high-level feature extraction of the encoder. WSL learns to predict whether each symbol exists in an input image or not, and the features learned by WSL benefit symbol detection in HME images. However, since WSL only learns the existence of symbols, it may be hard to learn when many symbols of the same class exist in an HME image. WSL also has difficulty discriminating co-occurring symbols such as a pair of brackets in an HME image.

3 Methodology

3.1 Encoder-Decoder Architecture in WAP

The encoder-decoder model in WAP is based on the image-to-sequence model, where the model learns to transcribe an input image into a label sequence of LaTeX symbols. The WAP architecture consists of a CNN encoder, which encodes the 2-dimensional (2D) input image into a 2D feature image, and a GRU decoder, which transcribes the serialized 2D feature image into the label sequence. Let X be the input image; the feature image F produced by the CNN encoder is obtained as follows:

F = CNN(X)   (1)

Let f_t be the serialized feature of F at timestep t; the decoder output y_t is calculated as follows:

h_t = GRU(h_{t-1}, f_t, c_t)   (2)


y_t = FC(h_t)   (3)

where c_t is the coverage-based attention context vector at each decoding timestep t, and FC denotes the fully connected layer that transforms the GRU hidden output h_t into the label output y_t.

3.2 Temporal Classification Constraint for WAP

Applying an auxiliary learning method helps the encoder learn better features alongside the main learning method [11]. Auxiliary learning uses a constraint as a loss function to learn the features. Truong et al. use a constraint on the existence of symbols in the HME. Due to its limitations, we propose a new constraint, a temporal classification constraint, which overcomes the problems of the WSL constraint. The temporal classification constraint method considers the output features of the encoder as a feature sequence in temporal order. The sequential features can be obtained by tracing the two-dimensional features row-by-row or column-by-column, and are then classified by a temporal classifier. In this sequence, the temporally classified label order is constrained by the symbols' positions in the HME input image. To learn the temporal classification constraint, we use the Connectionist Temporal Classification (CTC) loss [12]. Briefly, the CTC loss provides another way to learn symbol classification from the label sequence. It is combined with the sequence decoder loss to form a multi-task learning method that improves the symbol classification and expression rates. Our model with CTC auxiliary learning is shown in Fig. 1. The model consists of three modules: a CNN encoder, a GRU decoder, and a temporal classification block. The CNN encoder and GRU decoder architectures are inspired by WAP and its extension using DenseNet [9]; we name this model DenseWAP. The temporal classification block is a sequence of two fully connected layers for symbol classification. In the example, the temporal classification constraint is applied to the sequential features obtained by tracing the two-dimensional feature column-by-column. Then, the label sequence is constrained to the sequence 'a', '-' (fraction), '+', 'c'.
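As an illustration of the temporal classification block described above, the sketch below serializes the 2D encoder feature map column-by-column and classifies every timestep with two fully connected layers. The channel and class counts follow Sect. 4.3; the handling of the extra CTC blank class is an assumption.

```python
import torch
import torch.nn as nn

class TemporalClassificationHead(nn.Module):
    """Serialize the 2D encoder features and classify every timestep."""

    def __init__(self, enc_channels=684, hidden=256, num_classes=101):
        super().__init__()
        self.fc1 = nn.Linear(enc_channels, hidden)
        # 101 symbol classes + CTC blank (how blank is counted is an assumption)
        self.fc2 = nn.Linear(hidden, num_classes + 1)

    def forward(self, feat_2d):            # feat_2d: (batch, C, H, W)
        b, c, h, w = feat_2d.shape
        # Trace the feature map column-by-column (top to bottom, left to right)
        seq = feat_2d.permute(0, 3, 2, 1).reshape(b, w * h, c)
        logits = self.fc2(torch.relu(self.fc1(seq)))
        # Per-timestep log-probabilities, ready for a CTC loss
        return logits.log_softmax(dim=-1)
```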

Fig. 1. Temporal classification auxiliary learning for WAP.

3.3 Learning of the Model

Let F_seq be the serialized output feature of the CNN encoder, obtained by serializing the two-dimensional feature F row-by-row or column-by-column.


Let C be the set of characters (mathematical symbols) and T the length of F_seq. The temporal classification block outputs a temporal classification probability for each timestep t ∈ {1, ..., T} in F_seq as follows:

y_k^t = p(k, t | F_seq), ∀k ∈ C ∪ {blank}   (4)

where y_k^t is the output y_t for a class k and 'blank' denotes no label. The probability of a label sequence l is calculated as the total probability of all the paths that produce the label sequence l:

p(l | F_seq) = Σ_{π_{1:T} ∈ B^{-1}(l)} p(π_{1:T} | F_seq)   (5)

where p(π_{1:T} | F_seq) = Π_{t=1}^{T} p(k_t, t | F_seq) is the probability over the path π_{1:T} of the temporal classification output y_k^t. For a pair of an input feature sequence F_seq and an output sequence l from the training dataset, the network is trained by minimizing the CTC loss obtained by Eq. (6):

loss_CTC = −log p(l | F_seq)   (6)

We add the temporal classification constraint to the WAP model as an auxiliary loss function. The model trains two tasks: LaTeX transcription from an image, as in the default WAP model, and label transcription from that image via the temporal classification constraint. Therefore, the model is learned by a combination of two loss functions: the encoder-decoder loss for learning to generate the LaTeX sequence and the CTC loss for learning to generate the temporal label sequence. The encoder-decoder loss is the cross-entropy loss for generating the LaTeX sequence. Let l_s ∈ L be the LaTeX label sequence of length S and F the two-dimensional input feature; then the cross-entropy loss is shown in (7):

loss_decode = Σ_{s=1}^{S} log p(l_s | F)   (7)

Then, the combined loss loss_combine of the CTC WAP system is shown in (8):

loss_combine = (1 − α) · loss_decode + α · loss_CTC   (8)

where α ∈ [0, 1] is a parameter for balancing loss_decode and loss_CTC. If α is high, the model learns mainly from loss_CTC; on the contrary, if α is low, the model learns mainly from loss_decode.

3.4 CTC Auxiliary Learning

We apply the temporal classification constraint according to the locations of the symbols in the mathematical expression. Considering that the output of the encoder highlights a location for each symbol, we obtain the sequential order constraint of the symbols by tracing through the 2D feature maps vertically or horizontally. In this work, we process the constraint on temporal location vertically, as shown in Fig. 2.
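A minimal sketch of the combined objective in Eq. (8) is shown below, assuming PyTorch's built-in CTC loss and cross-entropy for the decoder loss. The blank index, tensor shapes, and batching are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

ce_loss_fn = nn.CrossEntropyLoss()
ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)  # blank index is an assumption

def combined_loss(dec_logits, latex_targets,
                  ctc_log_probs, ctc_targets, input_lengths, target_lengths,
                  alpha=0.5):
    """loss_combine = (1 - alpha) * loss_decode + alpha * loss_CTC (cf. Eq. (8)).

    dec_logits:    (N*S, num_latex_tokens)  decoder outputs flattened over the batch
    latex_targets: (N*S,)                   LaTeX token ids
    ctc_log_probs: (T, N, C)                per-timestep log-probs from the CTC head
    ctc_targets:   (N, max_label_len)       symbol labels traced vertically
    """
    loss_decode = ce_loss_fn(dec_logits, latex_targets)
    loss_ctc = ctc_loss_fn(ctc_log_probs, ctc_targets,
                           input_lengths, target_lengths)
    return (1 - alpha) * loss_decode + alpha * loss_ctc
```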


Fig. 2. Temporal classification auxiliary model for WAP.

From the 2D encoded features, we obtain 1D encoded features by serializing them vertically. The features are fed to the temporal classification block, a stack of two fully connected layers, and learn from the label sequence via the CTC loss. The label sequence is also obtained by tracing the symbols vertically using the symbol localization ground truth. From the label ground truth with the locations of the symbols in an HME image, we first create a grid on the input image of the same size as the encoder's feature map, i.e., 1/16 of the input image size in the current setting of the DenseNet encoder (see Sect. 4.3). The center of each symbol is assigned to a single cell of the grid, as shown in Fig. 3. If two or more centers fall into the same cell, we assign them to unoccupied neighboring cells. In the current setting of the output feature scale of the DenseNet encoder, 3.22% of the symbols need to be re-assigned in the training datasets. We could reduce this number by learning features at a higher resolution.
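The label preparation above can be sketched as follows, assuming symbol-level ground truth given as (label, center) pairs. The order in which neighboring cells are tried for re-assignment is not specified in the paper and is an assumption here.

```python
def make_ctc_labels(symbols, img_h, img_w, scale=16):
    """Sketch of the CTC label preparation.

    symbols: list of (label, center_x, center_y) tuples from the symbol-level
             ground truth; grid cells are 1/16 of the input resolution.
    Returns the label sequence obtained by tracing the grid vertically
    (column by column, top to bottom).
    """
    gh, gw = img_h // scale, img_w // scale
    grid = [[None] * gw for _ in range(gh)]

    for label, cx, cy in symbols:
        r, c = min(int(cy) // scale, gh - 1), min(int(cx) // scale, gw - 1)
        # If the cell is taken, move to the nearest unoccupied neighbour
        # (a full implementation would widen the search if all are occupied)
        for dr, dc in [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]:
            rr, cc = r + dr, c + dc
            if 0 <= rr < gh and 0 <= cc < gw and grid[rr][cc] is None:
                grid[rr][cc] = label
                break

    # Vertical tracing: column by column, top to bottom
    return [grid[r][c] for c in range(gw) for r in range(gh)
            if grid[r][c] is not None]
```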

Fig. 3. CTC label preparation.


4 Evaluation

We validated our method using the official datasets of the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) [17, 18]. We also analyze how temporal classification improves end-to-end HME recognition.

4.1 Datasets

We conducted the experiments on the two datasets of CROHME 2014 and CROHME 2016 [17, 18] for mathematical expression recognition. The number of symbol classes is 101. There are 8,836 samples for training. Note that the LaTeX sequences for training are normalized by removing redundant brace symbols.

4.2 Setting and Metrics for Evaluation

We evaluate our method according to the two following metrics: Expression Recognition Rate (ExpRate) and Word Error Rate (WER), which are commonly used in other research on HME recognition. ExpRate is evaluated as in (9):

ExpRate = N_truepredicted^Exp / N^Exp   (9)

where:

• N_truepredicted^Exp: the number of successfully predicted HMEs.
• N^Exp: the number of HMEs in the dataset.

Next, WER is the edit distance between the decoded LaTeX sequence and the ground-truth LaTeX sequence. In the experiments, we tuned the parameter α of the combined loss in Eq. (8) from 0 to 1 with a step of 0.25 and obtained α = 0.5 based on the best validation result.

4.3 Network Configuration

The configuration of the model is shown in Table 1. We used a DenseNet architecture in the encoder since it was successfully employed for HME recognition as DenseWAP [9]. In general, DenseNet consists of three Dense blocks with a Transition layer between two Dense blocks. Each Dense block has 16 pairs of convolution layers using kernel sizes of 1 × 1 and 3 × 3, with a growth rate of 24. Note that the convolutional layers in the Dense and Transition blocks are equipped with batch normalization and Rectified Linear Unit activation layers. The Dense blocks might extract visual features, while the Transition block seems to generalize those extracted features so that they would be invariant. The encoder block is followed by a classification block designed for temporal classification. Our classification block consists of two Fully Connected (FC) layers of 256 nodes and 101 nodes. The FC layer with 256 nodes is used for the GRU decoder, while the FC layer with 101 nodes is used with the CTC loss to predict the character classes.

Table 1. Network architecture.

Module | Type | Configurations | Output shape
Encoder block | Convolution | 7 × 7 conv, 48, s 2 × 2 | (h/2), (w/2), 48
 | Max pooling | 2 × 2 max pooling, s 2 × 2 | (h/4), (w/4), 48
 | Dense block | [1 × 1 conv; 3 × 3 conv] × 16, G 24, s 1 × 1 | (h/4), (w/4), 432
 | Transition layer | 1 × 1 conv, 216, s 1 × 1 | (h/4), (w/4), 216
 | | 2 × 2 avg pooling, s 2 × 2 | (h/8), (w/8), 216
 | Dense block | [1 × 1 conv; 3 × 3 conv] × 16, G 24, s 1 × 1 | (h/8), (w/8), 600
 | Transition layer | 1 × 1 conv, 300, s 1 × 1 | (h/8), (w/8), 300
 | | 2 × 2 avg pooling, s 2 × 2 | (h/16), (w/16), 300
 | Dense block | [1 × 1 conv; 3 × 3 conv] × 16, G 24, s 1 × 1 | (h/16), (w/16), 684
Classification block | FC layer | 256 units | (h/16) × (w/16), 256
 | FC layer | 101 units, softmax | (h/16) × (w/16), 101

s: stride, G: growth rate, (h, w): input shape.

4.4 Experiments

This section presents the results of the proposed model and previous models on both the CROHME 2014 and 2016 datasets. First, we compare the proposed model with the WAP and DenseWAP models to demonstrate the effect of CTC on the DenseWAP model. Secondly, we compare with previous models, including the state-of-the-art models in HME recognition. Thirdly, we conduct experiments to analyze the effect of the hyperparameter α and present the way to optimize it. Finally, we visualize the attention maps during the decoding stage of the proposed model to support our conclusions on the effectiveness of CTC for improving recognition accuracy.

Effect of CTC on Recognition Rate. We compare the results of the DenseWAP model with and without CTC in Table 2. In terms of ExpRate, the model with CTC achieves more than 3 percentage points on CROHME 2014 and more than 2.5 percentage points on CROHME 2016 over the model without CTC. For WER, applying CTC also helps DenseWAP reduce WER by 2.3 and 1.3 percentage points on the CROHME 2014 and 2016 datasets. ExpRate improves further than the WER, suggesting that some expressions with a single error have been correctly recognized.


These results show the effectiveness of the CTC loss on the DenseWAP model, which improves both the symbol classification (reduced WER) and the ExpRate. Although the integrated CTC loss is simple, it seems helpful for improving the accuracy of a state-of-the-art model such as DenseWAP.

Table 2. Expression rate (ExpRate) and word error rate (WER) (%).

Models | CROHME 2014 ExpRate | CROHME 2014 WER | CROHME 2016 ExpRate | CROHME 2016 WER
WAP [3] | 46.55 | 17.73 | 44.55 | -
DenseWAP | 47.97 | 14.50 | 47.17 | 14.46
DenseWAP + CTC | 50.96 | 12.20 | 49.96 | 13.10

Compare with State-of-the-Art. We compare our method with other methods on both the CROHME 2014 and CROHME 2016 datasets in Table 3. Compared to other extensions of WAP models, our model is close to DenseWAP + MSA* [9], where an ensemble of five models with different initialized states was used. The model performs worse than DenseWAP + WSL [11], with a gap of about 2 points on CROHME 2014 and CROHME 2016. Here, the CTC loss may make the model overfit to the sequential order of symbols and propagate that error to the convolution features.

Table 3. Expression rate (ExpRate) on the CROHME testing sets (%).

Category | Systems | CROHME 2014 | CROHME 2016
CFG-based parser, graph-based model | València/Wiris [17, 18] | 37.22 | 49.61
 | Tokyo [17, 18] | 25.66 | 43.94
 | Sao Paolo [17, 18] | 15.01 | 33.39
 | Nantes [17, 18] | 26.06 | 13.34
Data augmentation | SCUT [15] | 56.59 | 54.58
Ensemble of models | WAP* [3] | 44.40 | 44.55
 | DenseWAP + MSA* [9] | 52.80 | 50.10
 | PAL_v2 [10] | 48.78 | 49.35
Single model, no data augmentation | DenseWAP + WSL [11] | 53.65 | 51.96
 | DenseWAP + CTC | 50.96 | 49.96

"*" denotes an ensemble of five differently initialized recognition models.

122

C. T. Nguyen et al.

Compared to other deep neural network-based models, our proposed model outperforms the single PAL_v2 [10] model but lower than SCUT [15] since the SCUT model benefits from large training data by scaled data augmentation. Effect of Hyperparameter. We Evaluated the DenseWAP + CTC Model According to ExpRate, WER of the Latex Recognition Result, and WER for the CTC Output Label Sequence with Different Settings of α Hyperparameter for the Combined Loss Function in Eq. (5). Figure 4 Shows the Results. Increasing α Makes the Model More Focusing on the CTC Loss (or the Constraint of Temporal Order) While Decreasing α Reduces the Constraint. Compared to the Baseline Model Without CTC (α = 0), the Proposed Model Improves the ExpRate and Reduces WER. The Best ExpRate is Achieved with α of 0.5. Although the α Hyperparameter Has a High Impact on ExpRate and WER of the Latex Result, It Does not Affect WER by CTC. 50

52 51

ExpRate (%)

49

30

48 20

47 46

WER (%)

40

50

10

45 44 0

0.25 ExpRate

0.5 WER

0.75

1

α

0

WER(CTC)

Fig. 4. ExpRate of the system with changing hyperparameter.

Effect of CTC on Features. The model with CTC produces more precise attention visualizations than the model without CTC, as shown in Fig. 5. The attention by the model with CTC on the symbols 'cos', '2', and 'theta' (Fig. 5a) shows high strength compared to the attention by the model without CTC (Fig. 5b). The model with CTC also suppresses the attention on non-symbol outputs such as '{', which encourages the decoder to learn to output non-symbols from the sequential context instead of from visual features. The effect comes from the better symbol detection features learned in the encoder, which help the attention decoder attend to symbols with high probability and suppress the attention on non-symbol outputs. Figure 6 shows an example of prediction using the proposed model, in which the symbols are recognized through the temporal features of a 2D image. The model correctly recognizes all the symbols and matches them with the temporal order constraint. However, some predictions are away from the location of the corresponding symbols, for example, the '=' symbol and the second 'x' symbol from the left.


Fig. 5. Attention visualization of the system: a) with CTC, b) without CTC.

Fig. 6. CTC prediction of a test sample.

5 Conclusion

We proposed applying a temporal classification constraint as auxiliary learning for the encoder-decoder model. The temporal classification constraint is employed by integrating the CTC loss. By employing a stack of two fully connected layers, the temporal classification constraint seems to help generalize the features extracted by the encoder, since these features are optimized for multiple objectives. In the experiments, we applied the proposed constraint to DenseWAP, a variant of the state-of-the-art model WAP for Handwritten Mathematical Expression recognition [3]. Compared with a pure DenseWAP model, the model with our proposed constraint achieved a higher expression recognition rate (ExpRate), by more than 3 and 2.5 percentage points on CROHME 2014 and CROHME 2016, respectively. In general, the recognition model with auxiliary learning improved the ExpRate. Although it is simple to integrate auxiliary learning such as a temporal classification constraint into a deep neural network, it has a good effect on improving the recognition accuracy. Thus, auxiliary learning may open future research on improving even the state-of-the-art models for higher accuracy.

Acknowledgement. This work is being partially supported by the Grant-in-Aid for Scientific Research (A) 19H01117 and that for Early-Career Scientists 21K17761.


References

1. Anthony, L., Yang, J., Koedinger, K.R.: Toward next-generation, intelligent tutors: adding natural handwriting input. IEEE Multimed. 15, 64–68 (2008)
2. Deng, Y., Kanervisto, A., Ling, J., Rush, A.M.: Image-to-Markup generation with coarse-to-fine attention. In: Proceedings of the 34th International Conference on Machine Learning, pp. 980–989 (2017)
3. Zhang, J., et al.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recognit. 71, 196–206 (2017)
4. Zhang, J., Du, J., Dai, L.: Track, Attend, and Parse (TAP): an end-to-end framework for online handwritten mathematical expression recognition. IEEE Trans. Multimed. 21, 221–233 (2019)
5. Le, A.D., Nakagawa, M.: A system for recognizing online handwritten mathematical expressions by using improved structural analysis. International Journal on Document Analysis and Recognition (IJDAR) 19(4), 305–319 (2016). https://doi.org/10.1007/s10032-016-0272-4
6. Álvaro, F., Sánchez, J.A., Benedí, J.M.: Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recognit. Lett. 35, 58–67 (2014)
7. Mahdavi, M., Zanibbi, R., Mouchère, H.: ICDAR 2019 CROHME + TFD: competition on recognition of handwritten mathematical expressions and typeset formula detection. In: Proceedings of the 15th International Conference on Document Analysis and Recognition, pp. 1533–1538 (2019)
8. Wang, D.H., et al.: ICFHR 2020 competition on offline recognition and spotting of handwritten mathematical expressions - OffRaSHME. In: Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, pp. 211–215 (2020)
9. Zhang, J., Du, J., Dai, L.: Multi-scale attention with dense encoder for handwritten mathematical expression recognition. In: Proceedings of the 24th International Conference on Pattern Recognition, pp. 2245–2250 (2018)
10. Wu, J.W., Yin, F., Zhang, Y.M., Zhang, X.Y., Liu, C.L.: Handwritten mathematical expression recognition via paired adversarial learning. Int. J. Comput. Vis. 128, 2386–2401 (2020)
11. Truong, T.N., Nguyen, C.T., Phan, K.M., Nakagawa, M.: Improvement of end-to-end offline handwritten mathematical expression recognition by weakly supervised learning. In: Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, pp. 181–186 (2020)
12. Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31, 855–868 (2009)
13. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the 38th IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649 (2013)
14. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 12th Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421 (2015)
15. Li, Z., Jin, L., Lai, S., Zhu, Y.: Improving attention-based handwritten mathematical expression recognition with scale augmentation and drop attention. In: Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, pp. 175–180 (2020)
16. Tu, Z., Lu, Z., Yang, L., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 76–85 (2016)
17. Mouchère, H., Viard-Gaudin, C., Zanibbi, R., Garain, U.: ICFHR 2014 competition on recognition of on-line handwritten mathematical expressions (CROHME 2014). In: Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, pp. 791–796 (2014)
18. Mouchère, H., et al.: ICFHR 2016 CROHME: competition on recognition of online handwritten mathematical expressions. In: Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, pp. 607–612 (2016)

Using Robust Regression to Find Font Usage Trends

Kaigen Tsuji(B), Seiichi Uchida, and Brian Kenji Iwana

Kyushu University, Fukuoka, Japan
[email protected], {uchida,iwana}@ait.kyushu-u.ac.jp

Abstract. Fonts have had trends throughout their history, not only in when they were invented but also in their usage and popularity. In this paper, we attempt to specifically find the trends in font usage using robust regression on a large collection of text images. We utilize movie posters as the source of fonts for this task because movie posters can represent time periods by using their release date. In addition, movie posters are documents that are carefully designed and represent a wide range of fonts. To understand the relationship between the fonts of movie posters and time, we use a regression Convolutional Neural Network (CNN) to estimate the release year of a movie using an isolated title text image. Due to the difficulty of the task, we propose the use of a hybrid training regimen that combines Mean Squared Error (MSE) and Tukey's biweight loss. Furthermore, we perform a thorough analysis on the trends of fonts through time.

Keywords: Font analysis · Year estimation · Movie poster · Regression neural network

1 Introduction

After Gutenberg invented the printing press with metal type in the fifteenth century, a huge variety of fonts have been created. For example, MyFonts (https://www.myfonts.com/) provides more than 130,000 different fonts and categorizes them into six types (Sans-Serif, Slab-Serif, Serif, Display, Handwriting, Script) to make it easy to search its huge collection. Some of them are very famous and versatile (e.g., Helvetica, Futura, Times New Roman) and some are very minor and rarely used. An important fact about fonts is that they have trends in their history. It is well known that sans-serif fonts were originally called "grotesque" when serif fonts were the majority (around the early twentieth century); however, sans-serif fonts are not grotesque for us anymore. Moreover, many sans-serif font styles (neo-grotesque, geometric, humanist) have been developed after grotesque and had big trends in each era. It is also well known that psychedelic fonts were often used in the 1970s, and recent high-definition displays allow us to use very thin fonts, such as Helvetica Neue Ultralight.


In this paper, we attempt to find the trends in font usage using a robust regression technique and a large collection of font images. More specifically, we first train a regression function y = f(x), where x is the visual feature of a font image and y is the year that the image was created. Then, if the trained regression function can estimate the year of unseen font images with reasonable accuracy, we can conclude that the function captures the expected trend; in other words, there were historical trends in font usage.

The value of this attempt is twofold. First, it gives objective and reliable evidence about font usage trends, which is very important for typographic designers to understand past history. It should be noted that not only the trends (i.e., the main-stream fonts) but also the exceptional usages of fonts are meaningful. Second, it helps to realize an application to support typography design: if we can evaluate the "age" of a font image by using the regression function, it helps typographers select appropriate fonts for their artworks.

It is easy to imagine that this regression task is very difficult. For example, Helvetica, which was created in the 1950s, is still used today. In addition, many early typographic designers tried to use futuristic fonts, and current designers often use "retro" fonts. In short, font usage is full of exceptions (i.e., outliers), and those exceptions make the trends weak and hard to see. In addition, non-experts use fonts without careful consideration and disturb the trends. Moreover, the ground truth (i.e., the year that a font image was created) is often uncertain or unavailable, especially for old images.

We tackle this difficult regression task with two ideas. First, we use font images collected from movie posters, as shown in Fig. 1. Since we know when a movie was created and published, we also know the accurate year that the image was made. In addition, movie posters are created by professional designers, who pay careful attention to the impressions given by fonts and to trends. Moreover, the recent movie poster dataset from the Internet Movie Database (IMDb) allows us to use up to 14,504 images. This large collection is helpful for catching large trends while weakening the effect of exceptional usages. Second, we developed a new robust regression method based on deep neural networks with a robust loss function, Tukey's biweight loss. Deep neural networks realize heavily nonlinear regression functions and are thus appropriate for our difficult task. Tukey's biweight loss has been utilized in robust statistical estimation and, roughly speaking, discounts large deviations from the average, whereas a typical loss function, such as L2, gives a larger loss for a larger deviation. With this loss function, we can suppress the effect of exceptional samples.

One more point that we must consider is which font image features should be used in our year estimation task. Of course, deep neural networks have a representation learning function, which automatically determines appropriate features. However, they might learn some uninteresting features, such as the printing quality of old movie posters (e.g., resolution or color fading). Even though the regression might become accurate with these printing quality features, it would not capture the trends of font usage.


Fig. 1. Example movie posters and their title text and release year.

We, therefore, employ a second feature representation using font shape features. As shape features, we use two kinds: edge images and Scale-Invariant Feature Transform (SIFT) [17] features. The former is straightforward; by extracting edges from a font image, it is possible to remove colors and textures that suggest rough printing. The latter is a classical local descriptor and is expected to capture the corner and curve shapes of font outlines. By following a standard Bag-of-Visual-Words (BoVW) approach, we can have a fixed-dimensional feature representation even though each font image gives a different number of SIFT descriptors.

The main contributions of this paper are summarized as follows:

– To the authors' best knowledge, this is the first attempt to reveal font usage trends based on regression functions trained on a large number of font images that have the created year as ground truth.
– The learned regression function shows a reasonable year estimation accuracy and compares different features as the base of regression.
– From a technical aspect, this paper proposes a novel robust regression method based on deep neural networks with Tukey's biweight loss, which helps to realize a regression function robust to exceptional samples (i.e., outliers).

2 Related Work

2.1 Deep Regression

Deep regression is the use of neural networks for regression tasks and it is a wide field of study with many different architectures and applications [14]. Some example applications of deep regression include housing price prediction from house images [22], television show popularity prediction based on text [12], image orientation and rotation estimation [5,18], estimation of wave velocity through rocks [9], stock prices [16,20,24], and age prediction [21,32].


In these methods, the network is usually trained using Mean Squared Error (MSE) or sometimes Mean Absolute Error (MAE). However, using MSE can lead to a vulnerability to outliers [2]. Thus, Belagiannis et al. [2] proposed using Tukey's biweight loss function in order to perform deep regression that is robust to outliers. Huber loss [7] and Adaptive Huber loss [3] are other losses designed to be less sensitive to outliers. DeepQuantreg [8] is one example of a deep regression model that uses Huber loss. However, Lathuiliere et al. [14] found MSE loss to be better than Huber loss across all four of their tested datasets.

2.2 Movie Poster Analysis

Studying and analyzing the design choices of movie posters is of interest to researchers in art and design. In particular, research on the relationship between movie poster design and the fonts and text used has been performed. For example, Wang [30] describes the fonts of movie posters and identified relationships between the tonality, use, and design of the fonts. They explain that the fonts on a movie poster contain a large amount of information about the film by communicating artistic expression and symbolic functions to the viewer. As part of a larger study on movie taglines, Mahlknecht [19] looked at the relationship between the tagline and the poster image. In relation to the design of the tagline text, Mahlknecht found that the size of the tagline on the poster carries meaning. Also, many studies have looked at the cultural aspects of the design of text in movie posters across the world. Kim et al. [10] compared the differences in title text (as well as characters, backgrounds, and color contrast) between localized Korean movies and Western movies. In addition, there have been studies on the title design of Indian movies [25, 26] and the calligraphy of Chinese movies [23, 27]. The difference between these works and ours is that we take a quantitative approach using machine learning. In this regard, there have only been attempts that classify the genre [4, 6, 13, 28, 29, 31] and predict box office revenues [33] from movie posters. Compared to these, we analyze movie poster fonts through their relationship with time.

3 Movie Posters and Title Text

The elements that make up a movie poster can be roughly divided into two categories: visual information that represents the theme of the movie, and textual information such as the title, actors, production companies, and reviews. The visual information is mainly placed in the center of the poster and conveys the characters in the film and the atmosphere of the film to the viewer. The textual information is often placed at the top or bottom of the poster to avoid overlapping with the visual information. It conveys detailed information to the viewer who is interested in the movie after seeing the visual information. The upper part of Fig. 1 shows examples of movie posters. Looking at the textual information, the movie title tends to be larger than other textual information


Fig. 2. The process of extracting the title image. The text regions on the movie poster are detected using CRAFT [1] and then the largest area region is cropped to be the title image.

(e.g., the names of the director and actors). It is also more elaborately designed than other textual information. In this paper, a title image is defined as a cutout of the part of a movie poster where the title is placed. The title image is a rectangle, and the background information within the rectangle is retained. The lower part of Fig. 1 shows examples of title images. In some posters, the title is placed vertically or curved, which might cause vertical title images or images with significant portions of the background, respectively. To extract the title image, we use a popular scene text detection method called Character Region Awareness for Text Detection (CRAFT) [1]. Figure 2 shows the workflow used to create a title image using CRAFT. The poster is input to CRAFT, and the text regions in the poster are detected and then cropped. As mentioned above, movie titles tend to be larger than other textual information. Thus, the title image is created by cropping the word proposal with the largest area.
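As an illustration of the largest-area cropping rule, the following sketch assumes that the word proposals (4-point polygons) have already been produced by a detector such as CRAFT; running the detector itself is outside the scope of the sketch, and the function name is illustrative.

```python
import numpy as np
from PIL import Image

def crop_title_image(poster_path, word_boxes):
    """Crop the title image as the largest-area word proposal.

    word_boxes: list of 4-point polygons [(x, y), ...] from a text detector
    such as CRAFT. The background inside the rectangle is retained.
    """
    poster = Image.open(poster_path).convert("RGB")

    def poly_area(pts):
        # Shoelace formula for the area of a quadrilateral proposal
        x, y = np.asarray(pts, dtype=float).T
        return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

    largest = max(word_boxes, key=poly_area)
    xs, ys = zip(*largest)
    # Axis-aligned rectangle around the largest proposal
    return poster.crop((int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))))
```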

4 Year Prediction by Robust Regression

In this paper, we propose three methods for estimating the year of movie release from title images: Image-based Estimation, Shape-based Estimation, and Feature-based Estimation. In Image-based Estimation, a regression CNN is constructed to estimate the year of movie release given a title image from a movie poster. In Shape-based Estimation, edge detection is first applied to the title image in order to remove background and color information, and a regression CNN is then used on the title outline image. In Feature-based Estimation, an MLP is constructed to estimate the year of movie release based on SIFT features [17].

4.1 Image-Based Estimation

In Image-based Estimation, a regression CNN is constructed to estimate the year of movie release when a title image of a movie poster is input. Figure 3 shows the architecture of the proposed Image-based Estimation.


Fig. 3. The regression CNN used for image-based estimation and shape-based estimation.

The Image-based Estimation regression CNN is a standard CNN except with a single output node representing the year. In a standard CNN, a fixed input image size is required because of the fixed number of parameters in the fully connected layers. In order to input title images of any size to the proposed regression CNN, a Global Average Pooling (GAP) layer [15] is incorporated before the fully connected layers. The GAP layer outputs the average value of each feature map. Since the number of features output by the GAP layer depends only on the number of feature maps output by the feature extraction layers, title images can be input to the CNN even if they have different sizes.
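A minimal sketch of such a regression CNN with a GAP layer is given below. The layer sizes follow the description in Sect. 5.2, while details such as padding and the number of input channels (3 for RGB title images, 1 for edge images) are assumptions.

```python
import torch
import torch.nn as nn

class YearRegressionCNN(nn.Module):
    """Sketch of the regression CNN in Fig. 3 (layer sizes follow Sect. 5.2)."""

    def __init__(self, in_channels=3):
        super().__init__()
        chans = [in_channels, 16, 32, 64, 128]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 3, stride=1, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2, stride=2)]
        self.features = nn.Sequential(*blocks)
        self.gap = nn.AdaptiveAvgPool2d(1)          # Global Average Pooling
        self.head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(32, 1))                       # single output node: the year

    def forward(self, x):                           # x: (batch, C, H, W), any H, W
        f = self.gap(self.features(x)).flatten(1)   # (batch, 128) regardless of size
        return self.head(f).squeeze(1)
```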

4.2 Shape-Based Estimation

For Shape-based Estimation, we use edge detection as a pre-processing step before using the same structure as Image-based Estimation (Fig. 3). The reason for using edge images as inputs is to remove the color information from the title image. For the purpose of this paper, it is not desirable that features other than the font affect the estimation of the movie release year. When raw pixels are used, it is possible for color to have more effect on the regression CNN than the font itself. By using edge images without color information as input, we attempt to estimate the movie release year based on font features alone. Edge detection also removes the background image and noise.
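The paper does not name a specific edge detector, so the sketch below assumes Canny edges with illustrative thresholds as the pre-processing step.

```python
import cv2

def title_to_edge_image(title_bgr, low=100, high=200):
    """Edge-only version of a title image for Shape-based Estimation.

    Canny with illustrative thresholds is assumed here; color and background
    texture are discarded, leaving only the font outlines.
    """
    gray = cv2.cvtColor(title_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, low, high)
```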

4.3 Feature-Based Estimation

In Feature-based Estimation, an MLP is constructed to estimate the movie release year when the local features of the title image are input. Figure 4 shows the architecture of Feature-based Estimation. SIFT features are extracted from the title image. Next, the SIFT features are embedded into a Bag-of-Visual-Words (BoVW) representation, where the bags are found through k-means clustering.


Fig. 4. The procedure for performing feature-based estimation.

Fig. 5. A comparison of different loss functions.

We use SIFT to extract local features from the title images. SIFT, which detects rotation- and scale-invariant feature points, is effective for feature extraction of character designs with various inclinations and sizes. Each detected feature point is represented by a 128-dimensional vector, but since the number of feature points varies depending on the image, they cannot be directly input to the MLP. By converting the local features to a BoVW representation, we can make the input data have the same number of dimensions. The conversion of local features to the BoVW representation is as follows. First, the local features of all images are divided into k clusters using the k-means method. Next, the feature points in each title image are assigned to the nearest cluster and a histogram for each image is created. The k-dimensional vector created by this process is the BoVW representation. The possibility of removing color information from the title image is another reason for using the BoVW representation as input. As mentioned above, there is a correlation between the font color and the movie release year, but the color information is not included in the features extracted by SIFT. By using the BoVW representation converted from these features, we attempt to estimate the movie release year based on font features alone.
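A sketch of this SIFT-plus-BoVW pipeline is shown below, assuming OpenCV's SIFT implementation and scikit-learn's k-means; the parameter values mirror Sect. 5.2, while the function names are illustrative.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift(gray, max_features=500):
    # 128-dimensional SIFT descriptors; the count varies per image
    sift = cv2.SIFT_create(nfeatures=max_features)
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.zeros((0, 128), dtype=np.float32)

def build_vocabulary(train_descriptors, k=128):
    # Cluster all local features from the training images into k visual words
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(np.vstack(train_descriptors))
    return kmeans

def to_bovw_histogram(desc, kmeans, k=128):
    # Quantize each descriptor to its nearest cluster and count occurrences
    hist = np.zeros(k, dtype=np.float32)
    if len(desc) > 0:
        for word in kmeans.predict(desc):
            hist[word] += 1
    return hist
```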

4.4 Robust Regression

Since there are many variations in design, some title images have feature values that are far from the distribution in the feature space. For example, some designs are bizarre, such as using objects instead of simple lines to compose the text. Such data can be regarded as outliers because it is difficult to estimate the movie release year and the residuals are large. In some cases, the designer intentionally chooses a design that matches the setting year of the movie: for example, the movie release year is in the 2000s, but the poster uses a retro font design reminiscent of the 1950s to match the setting of the movie. Such data can also be regarded as outliers because it is difficult to estimate the movie release year and the residuals are large. To reduce the impact of outliers on learning, we use Tukey's biweight loss, which is a loss function that is robust to outliers [2]. Figure 5 compares a plot of Tukey's biweight loss with other common loss functions; the horizontal axis is the residual between the ground truth and the prediction, and the vertical axis is the output value of the loss function. For MSE loss and L1 loss, the output value always increases as the residual increases, so outliers have a large impact on learning. On the other hand, for Tukey's biweight loss, outliers do not affect the learning as much because the output value remains constant when the residual exceeds a threshold value.
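For reference, a sketch of Tukey's biweight loss is given below. The threshold c = 4.685 is the value commonly used in robust statistics (and in [2] together with MAD-scaled residuals); it is not a value stated in this paper and is an assumption here.

```python
import torch

def tukey_biweight_loss(pred, target, c=4.685):
    """Tukey's biweight loss (sketch). Residuals larger than c contribute a
    constant loss, so outliers stop influencing the gradient.

    In practice the residuals are often scaled by a robust estimate such as
    the MAD before applying the threshold c.
    """
    r = target - pred
    inlier = r.abs() <= c
    loss = torch.where(
        inlier,
        (c ** 2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3),  # bounded region
        torch.full_like(r, c ** 2 / 6.0))                    # constant beyond c
    return loss.mean()
```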

4.5 Automatic Loss Switch

In the early stage of training, the output values of the neural network are unstable. If Tukey's biweight loss is used as the loss function at this point, the residuals become large and much of the data is treated as outliers; the weights are then not updated properly and training does not progress. We therefore propose to use MSE loss in the early stages of training and to change to Tukey's biweight loss after the output of the neural network has stabilized. To determine when the output of the neural network becomes stable, it is possible to inspect the learning curve of a training run that uses only MSE loss in advance. However, this has disadvantages, such as the need to train once beforehand and the need to make a heuristic decision. Therefore, we propose a method that determines the stability of the output of the neural network and automatically switches the loss function. Specifically, if the validation loss does not improve for a certain number of epochs, the output is judged to have stabilized. In addition, there is a certain delay between the time when the output is judged to be stable and the time when the loss function is switched. This prevents switching the loss function when the weights of the neural network have fallen into a local solution.
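The switching rule can be sketched as a small helper. The patience of 50 epochs and the forced switch at 450 epochs follow Sect. 5.2, while the length of the extra delay after stability is judged is not given in the paper and is a hypothetical parameter.

```python
class AutoLossSwitch:
    """Decide when to switch from MSE loss to Tukey's biweight loss (sketch)."""

    def __init__(self, patience=50, delay=10, force_epoch=450):
        self.patience, self.delay, self.force_epoch = patience, delay, force_epoch
        self.best = float("inf")
        self.stale = 0
        self.stable_at = None   # epoch at which the output was judged stable

    def use_tukey(self, epoch, val_loss):
        # Track the minimum validation loss and how long it has been stale
        if val_loss < self.best:
            self.best, self.stale = val_loss, 0
        else:
            self.stale += 1
        # Judge stability after `patience` stale epochs, or force it at `force_epoch`
        if self.stable_at is None and (self.stale >= self.patience
                                       or epoch >= self.force_epoch):
            self.stable_at = epoch
        # Switch only after an extra delay, to avoid switching at a local minimum
        return self.stable_at is not None and epoch >= self.stable_at + self.delay

# Usage (illustrative): pick the loss for the current epoch
# loss_fn = tukey_biweight_loss if switch.use_tukey(epoch, val_loss) else mse_loss
```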

5 Experimental Results

5.1 Dataset

The movie poster images and release years were obtained from the Internet Movie Database (IMDb) dataset available on Kaggle (https://www.kaggle.com/neha1703/movie-genre-from-its-poster). For the purpose of language unification, we focused on movies released in the U.S. and obtained data for 14,504 movies. Of these, we selected movies released during the 85-year period from 1932 to 2016 and randomly selected 56 movies from each year to make the final dataset. The dataset was randomly divided into 20 training images, 8 validation images, and 28 test images per year, so that the total dataset consisted of 1,700 training images, 680 validation images, and 2,380 test images.

5.2 Architecture and Settings

In Image-based Estimation and Shape-based Estimation, Regression CNNs were constructed with four convolutional and pooling layers, one Global Average Pooling (GAP) layer, and three fully-connected layers. The filter size of all convolutional layers was set to 3 × 3 with a stride of 1. The number of filters was 16, 32, 64, and 128, respectively, and Rectified Linear Units (ReLU) were used as the activation functions. Max pooling was used for all pooling layers, with a 2 × 2 filter and a stride of 2. The GAP layer was added after the last pooling layer; it computes the average value for each channel, so the resulting descriptor has 128 channels. The number of nodes in the fully-connected layers is 64, 32, and 1, and ReLU is used as the activation function. Dropout was used after the first two fully-connected layers with a dropout rate of 0.5. In Feature-based Estimation, the maximum number of local features extracted from each title image using SIFT was 500. The number of clusters k was set to 128, and clustering was performed with the k-means method using only the local features extracted from the training images. Then, all title images were converted to the BoVW representation. Regression MLPs were constructed with three fully-connected layers. The number of nodes in the fully-connected layers is 64, 32, and 1, respectively, and ReLU is used as the activation function. Dropout was used after the first two fully-connected layers with a dropout rate of 0.5. The learning procedure is as follows. The number of epochs was set to 3,000 and the batch size to 128. For the proposed loss function, MSE+Tukey (Prop), we used MSE loss in the early stage of learning and Tukey’s biweight loss after learning had converged. Convergence was judged when the minimum validation loss had not been updated for 50 consecutive epochs, and the loss was switched automatically after 450 epochs if convergence had not been reached by then. We also conducted comparison experiments using only MSE loss, only Tukey’s biweight loss, and L1 loss. Adam [11] was used as the optimization method, and the learning rate was set to 10−4. The target variable of the regression, the movie release year, was normalized so that the first year starts at 0; in other words, we changed the movie release years from 1932–2016 to 0–85.
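A minimal Keras sketch of the Regression CNN described above is given below; it is an illustration, not the authors’ code. The input resolution and the use of "same" padding are assumptions (the excerpt does not state them), and the output layer is taken to be linear.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_regression_cnn(input_shape=(256, 256, 3)):   # input size is an assumption
    x = inputs = keras.Input(shape=input_shape)
    for filters in (16, 32, 64, 128):                  # four conv/pool stages (Sect. 5.2)
        x = layers.Conv2D(filters, 3, strides=1, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.GlobalAveragePooling2D()(x)             # 128-channel average -> 128-d vector
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1)(x)                       # normalized release year
    return keras.Model(inputs, outputs)

model = build_regression_cnn()
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
```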


Table 1. Test results comparison

Method                                        Loss function       MAE ↓   R2 score ↑   Corr. ↑
Image-based estimation (CNN w/RGB images)     L1                  16.90   0.2472       0.5112
                                              MSE                 17.14   0.2580       0.5093
                                              Tukey               16.80   0.1784       0.5079
                                              MSE+Tukey (Prop)    16.61   0.2002       0.5264
Shape-based estimation (CNN w/Edge images)    L1                  18.24   0.0667       0.3793
                                              MSE                 18.50   0.1760       0.4209
                                              Tukey               18.48   0.0389       0.3642
                                              MSE+Tukey (Prop)    18.35   0.0845       0.3995
Feature-based estimation (MLP w/SIFT+BoVW)    L1                  18.78   0.0679       0.3491
                                              MSE                 18.83   0.1364       0.3712
                                              Tukey               18.75   0.0434       0.3397
                                              MSE+Tukey (Prop)    18.71   0.0653       0.3577
Linear regression (SIFT+BoVW)                 –                   19.18   0.0768       0.3287
Constant prediction                           –                   21.25   0.0000       0.0000

5.3 Results

To evaluate the proposed method, we use three comparison losses for each estimation type. The losses used are the L1 loss, MSE loss, Tukey’s biweight loss, and the proposed MSE+Tukey with the automatic loss switching. We also include two more comparisons, Linear Regression and Constant Prediction. Linear Regression uses the SIFT features represented by BoVW and Constant Prediction is a baseline where the average year of the training dataset is used for all predictions. We evaluated the losses using three metrics, MAE, R2 score, and the correlation coefficient (Corr.). Table 1 shows the evaluation indices of each method. The performance of Image-based Estimation is better than that of the other methods, indicating that there is a correlation between the color information of the title image and the movie release year. In addition, the performance of Shape-based Estimation is better than that of Feature-based Estimation, indicating that there are other factors in the title image that are related to the year estimation besides local features. Linear Regression using the SIFT features in a BoVW representation generally performed worse than the neural network based methods. When comparing the loss functions, the proposed MSE+Tukey generally performed the best using the MAE metric and MSE loss generally performed the best on the other metrics. As mentioned before, this is because MSE loss puts a heavy emphasis on the outliers while Tukey’s biweight loss and L1 loss do not. Also, in every case, MSE+Tukey performed better than Tukey’s biweight loss alone. Thus, the benefit of the proposed method is that the loss is a balance between the robustness and emphasis on the outliers.
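The three metrics can be computed, for instance, with scikit-learn and NumPy; this is an illustrative sketch, not the authors’ evaluation code, and `test_years`/`test_images` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def evaluate(y_true, y_pred):
    """MAE, R2 score, and Pearson correlation, as reported in Table 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    corr = np.corrcoef(y_true, y_pred)[0, 1]   # undefined for a constant predictor
    return mae, r2, corr

# Hypothetical usage with a trained model's test-set predictions:
# mae, r2, corr = evaluate(test_years, model.predict(test_images).ravel())
```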


Fig. 6. Confusion matrices of each estimation method (Image-, Shape-, and Feature-based) and loss type (L1, MSE, Tukey, MSE+Tukey).


Fig. 7. Image-based estimation with MSE+Tukey loss: the four title images with the smallest residuals in each decade. In each case, the release year was estimated accurately.

Figure 6 shows the confusion matrices of each estimation method and loss type. In all the methods, the predictions are skewed toward the center when MSE or L1 loss is used. When Tukey’s biweight loss or the proposed combination of MSE and Tukey’s biweight loss is used, the output is spread out much more. While the evaluation metrics might indicate an overall worse year estimation, a wider distribution of predictions is desirable so that trend analysis can be performed. As mentioned before, MSE+Tukey gets the best characteristics of both MSE and Tukey’s biweight loss.

5.4 Relationship Between Color and Year

The results in Table 1 demonstrate that Image-based Estimation achieved the highest correlation between the predicted results and the year. One reason for this is that the color information in the RGB images conveys a lot of information about the year. As shown in Fig. 7, older movie posters tended to use warmer colors, such as red or yellow fonts. However, the trend changed around the 1970s to the 1980s, and achromatic and cold colors were used more frequently afterward. In addition, dark backgrounds became prominent from the 1980s to the 2000s. Figure 8 shows example title images with large residuals. The “Brooklyn” and “U.N.C.L.E.” titles, in particular, use warm colors and accordingly were estimated to be older.

5.5 Relationship Between Shape and Year

Figure 9 shows the four title images with the smallest residuals for each year in the Shape-based Estimation with MSE+Tukey loss. In the 1950s and earlier, thicker lines were used more frequently in characters. When examining the presence or absence of serifs, most titles in every decade use sans-serif typefaces, with the exception of the 1990s and one title from the 1980s. This trend can also be seen in the other methods, as shown in Figs. 7 and 10, and it captures the fact that serif fonts were frequently used for movie posters in the 1990s. Another observation is that for the 1960s and before, the fonts of the movie titles are often skewed, curved, of different sizes, or italic. This trend is lost after the 1980s, where the fonts are generally straight and uniform.


Fig. 8. Example title images that have large residuals for (a) Image-based, (b) Shape-based, and (c) Feature-based Estimation. Each method uses the proposed MSE+Tukey loss.

Fig. 9. Shape-based estimation with MSE+Tukey loss: the four title images with the smallest residuals in each decade. In each case, the release year was estimated accurately.

5.6 Relationship Between Local Features and Year

Even though Feature-based Estimation performed the worst, there are still clear trends that can be observed in the SIFT features. As shown in Fig. 10, thicker lines were mostly used before the 1970s, but the trend changed and thinner lines were mostly used after the 1990s. In the 1960s and 1970s, there were also many fonts with decorations. As mentioned in Sect. 5.5, in the 1990s, most of the title images use serif typefaces. However, unlike Shape-based Estimation, with Feature-based Estimation there are still title images with serif typefaces in the 2000s and later.


Fig. 10. Feature-based Estimation with MSE+Tukey loss: the four title images with the smallest residuals in each decade. In each case, the release year was estimated accurately.

6 Conclusion

In this paper, we attempted to find trends in font usage by applying robust regression to a large collection of text images. We used font images collected from movie posters because they are created by professional designers and portray a wide range of fonts. We found that it is very difficult to find trends in font usage because the data is full of exceptions. We tackled this difficult regression task with a robust regression method based on deep neural networks that combines MSE and Tukey’s biweight loss. We experimented with three methods: Image-based Estimation, Shape-based Estimation, and Feature-based Estimation. Using these different estimation methods, we compared L1 loss, MSE loss, Tukey’s biweight loss, and the proposed MSE+Tukey with automatic loss switching. As a result, it became clear that the benefit of the proposed method is that the loss balances robustness against an emphasis on outliers. In addition, we found that font usage, such as the thickness and slant of characters and the presence or absence of serifs, has changed over time. We hope that the findings in this paper will contribute to a better understanding of the historical changes in font design.

References

1. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR (2019)
2. Belagiannis, V., Rupprecht, C., Carneiro, G., Navab, N.: Robust optimization for deep regression. In: ICCV (2015)
3. Cavazza, J., Murino, V.: Active regression with adaptive Huber loss. arXiv preprint arXiv:1606.01568 (2016)
4. Chu, W.T., Guo, H.J.: Movie genre classification based on poster images with deep neural networks. In: MUSA2 (2017)


5. Fischer, P., Dosovitskiy, A., Brox, T.: Image orientation estimation with convolutional networks. In: Gall, J., Gehler, P., Leibe, B. (eds.) GCPR 2015. LNCS, vol. 9358, pp. 368–378. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24947-6_30
6. Gozuacik, N., Sakar, C.O.: Turkish movie genre classification from poster images using convolutional neural networks. In: ICEEE (2019)
7. Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)
8. Jia, Y., Jeong, J.H.: Deep learning for quantile regression: deepquantreg. arXiv preprint arXiv:2007.07056 (2020)
9. Karimpouli, S., Tahmasebi, P.: Image-based velocity estimation of rock using convolutional neural networks. Neural Netw. 111, 89–97 (2019)
10. Kim, J., Kim, J., Suk, H.J.: Layout preference for Korean movie posters: contextual background or character dominance. In: IASDR (2019)
11. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
12. Krishnamoorthy, N., Ramya, K.S., Pavithra, K., Naveenkumar, D.: TV shows popularity and performance prediction using CNN algorithm. J. Adv. Res. Dyn. Control Syst. 12(SP7), 1541–1550 (2020)
13. Kundalia, K., Patel, Y., Shah, M.: Multi-label movie genre detection from a movie poster using knowledge transfer learning. Augment. Hum. Res. 5(1), 1–9 (2020)
14. Lathuiliere, S., Mesejo, P., Alameda-Pineda, X., Horaud, R.: A comprehensive analysis of deep regression. IEEE Trans. Pattern Anal. Mach. Intell. 42(9), 2065–2081 (2020)
15. Lin, M., Chen, Q., Yan, S.: Network in network. In: ICLR (2014)
16. Lin, M., Chen, C.: Short-term prediction of stock market price based on GA optimization LSTM neurons. In: ICDLT (2018)
17. Lowe, D.: Object recognition from local scale-invariant features. In: ICCV (1999)
18. Mahendran, S., Ali, H., Vidal, R.: 3D pose regression using convolutional neural networks. In: CVPR Workshops (2017)
19. Mahlknecht, J.: Three words to tell a story: the movie poster tagline. Word Image 31(4), 414–424 (2015)
20. Mehtab, S., Sen, J.: Stock price prediction using CNN and LSTM-based deep learning models. In: DASA (2020)
21. Niu, Z., Zhou, M., Wang, L., Gao, X., Hua, G.: Ordinal regression with multiple output CNN for age estimation. In: CVPR (2016)
22. Piao, Y., Chen, A., Shang, Z.: Housing price prediction based on CNN. In: ICIST (2019)
23. Qiang, Z.: Calligraphy font is analysed in the level of new appearance in the movie poster design. In: Art and Design, p. 11 (2016)
24. Selvin, S., Vinayakumar, R., Gopalakrishnan, E.A., Menon, V.K., Soman, K.P.: Stock price prediction using LSTM, RNN and CNN-sliding window model. In: ICACCI (2017)
25. Shahid, M.: Title design in Bollywood movie posters: design features, trends, and the role of technology. Int. J. Vis. Design 14(4), 15–33 (2021)
26. Shahid, M., Bokil, P., Kumar, D.U.: Title design in Bollywood film posters: a semiotic analysis. In: ICoRD (2014)
27. Shi, Z.: Preliminary exploration of calligraphy elements in Chinese movie poster design. In: IELSS (2019)
28. Sirattanajakarin, S., Thusaranon, P.: Movie genre in multi-label classification using semantic extraction from only movie poster. In: ICCCM (2019)


29. Supriya, K., Bharathi, P.: Movie classification based on genre using map reduce. Int. J. Appl. Res. Inform. Tech. Comput. 11(1), 27 (2020)
30. Wang, L.: The art of font design in movie posters. In: ICASSEE (2019)
31. Wi, J.A., Jang, S., Kim, Y.: Poster-based multiple movie genre classification using inter-channel features. IEEE Access 8, 66615–66624 (2020)
32. Zhang, C., Liu, S., Xu, X., Zhu, C.: C3AE: exploring the limits of compact model for age estimation. In: CVPR (2019)
33. Zhou, Y., Zhang, L., Yi, Z.: Predicting movie box-office revenues using deep neural networks. Neural Comput. Appl. 31(6), 1855–1865 (2017). https://doi.org/10.1007/s00521-017-3162-x

Binarization Strategy Using Multiple Convolutional Autoencoder Network for Old Sundanese Manuscript Images

Erick Paulus(1,3), Jean-Christophe Burie(2), and Fons J. Verbeek(1)

1 Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
{e.paulus,f.j.verbeek}@liacs.leidenuniv.nl
2 Laboratoire Informatique Image Interaction (L3i), University of La Rochelle, Avenue Michel Crépeau, 17042 La Rochelle Cedex 1, France
[email protected]
3 Computer Science Department, Universitas Padjadjaran, Bandung, Indonesia
[email protected]

Abstract. The binarization step for old documents is still a challenging task even though many hand-engineered and deep learning algorithms have been offered. In this research work, we address foreground and background segmentation using a convolutional autoencoder network with three supporting components. We assess several hyper-parameters, including the window size, the number of convolution layers, the kernel size, the number of filters, and the number of encoder-decoder layers of the network. In addition, the skip connections approach is considered in the decoding procedure, and we evaluate the summation and concatenation functions before the up-sampling process to reuse the previous low-level feature maps and to enrich the decoded representation. Based on several experiments, we determined that the kernel size, the number of filters, and the number of encoder-decoder blocks have little impact in terms of binarization performance, whereas the window size and multiple convolutional layers are more impactful than the other hyper-parameters; however, they require more storage and may increase computation costs. Moreover, a careful embedding of batch normalization and dropout layers also contributes to handling overfitting in the deep learning model. Overall, the multiple convolutional autoencoder network with skip connections successfully enhances the binarization accuracy on old Sundanese palm leaf manuscripts compared to preceding state-of-the-art methods.

Keywords: Binarization · Autoencoder · Palm leaf manuscript · Deep learning

1 Introduction

In document image analysis (DIA), the binarization task is responsible for determining which pixels belong to the character and which to the background. As a
result, binarization plays a key role on the performance of the OCR. Nonetheless, there are numerous difficulties in binarizing old document images, including fading text, age color change, crack, and bleed-through noises [1]. Furthermore, the binarization problems are exacerbated by a variety of tools and acquisition methods. Owing to inadequate lighting and camera blitz, the image quality may suffer from blur, low contrast, and uneven illumination [2]. Many works to solve these problems have been proposed in the literature. Initially, several researchers advocated hand-engineered solutions that relied on image characteristics. The global thresholding procedure is one of the most basic binarization methods. The Otsu algorithm [3] is the most widely used thresholding technique and is commonly used as a benchmarking tool. When dealing with degraded images, a single threshold seldom produces acceptable binarization results. Since 1986, researchers have been pursuing local adaptive thresholding, for example, Niblack’s [4] and Bernsen’s [5]. It has an advantage in finding the adaptable threshold per patch or window. Some authors utilize histograms and statistics to gain a better threshold by calculating the minimum, maximum, mean, and standard deviation of intensity for each window. Another advantage of these handcrafted methods is that they are less expensive to compute because they only use the image characteristics and do not need a labelled image to be trained. They also successfully yield a good binarized image over uniform and low contrast problems. Nonetheless, local adaptive thresholding algorithms are not enough to tackle images suffering from uneven illumination. For example, the F-measure of hand-engineered binarization approaches on old Sundanese manuscripts, only reached a score of 46.79 [6]. Another challenge of traditional adaptive thresholding is that there are several parameters to be adjusted to achieve better performance. As a result, it is regarded as a disadvantage. Several methods for automatically adjusting those parameters have been proposed and evaluated. Howe improved his previous algorithm by incorporating automatic parameter tuning with pairwise Canny-based approach [7]. Later, Mesquita optimized Howe’s method using a racing algorithm [8]. Kaur et al. improved Sauvola’s thresholding [9] by applying stroke width transform (SWT) to automatically calculate window size. The advent of deep learning algorithms allowing machine work with or without any supervision has opened a new gate to address more complex problems and tune automatically parameter. In 2015, Pastor et al. first introduced the CNN implementation on binarization task by classifying every pixel into background and foreground based on a sliding window [10]. In the meantime, a transformation of CNN into a Fully Convolutional Networks (FCN) topology was announced for the semantic segmentation task where every pixel is determined as one of K classes [11,12]. Later, Tensmeyer et al. defined binarization likewise a semantic segmentation with K = 2 and gained a better performance compared to CNN [13]. Recently, a combination of an hourglass-shaped deep neural network (DNN) and convolution network was proposed to handle binarization challenges. It is named as Convolutional Auto Encoder (CAE) network that has the main advantage to convolve image-to-image as well as to learn compressed and


distributed encoding (representation) from a training set. Furthermore, in [14,15], a composite of residual blocks and CAE outperformed several prior DNNs, especially on the LRDE DBD [16], DIBCO [17], and palm leaf manuscript [18] datasets. In this paper, we first aim to assess the contribution of five CAE hyper-parameters and three supporting components, namely the window size, the number of encoder-decoder layers, the number of convolution layers, the number of filters, the kernel size, the skip connection, the dropout rate, and the presence of batch normalization, to binarization performance. Second, we aim to assess the effectiveness of the CAE network in dealing with the binarization challenges of old Sundanese palm leaf manuscripts. The main contribution of this work is a strategy to determine the best parameters for configuring a CAE. We present a set of comprehensive experiments analyzing in detail the influence of both the hyper-parameters of the CAE and the supporting components, so that the efficacy of each hyper-parameter can be exploited to build an appropriate CAE network for processing the images of our old Sundanese palm leaf manuscripts. As far as we know, the CAE network proposed in this paper is the first architecture with several convolution layers for each encoder-decoder block in an hourglass model. This paper is organized as follows. The second section describes the related work in terms of the CAE network, skip connections, dropout, and batch normalization. Experiment procedures and evaluation metrics are detailed in the third section. Then, the fourth section presents the results of the experiments. Finally, the conclusions and future research topics are discussed in the last section.
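For reference, the global (Otsu) and local adaptive (Sauvola) thresholding baselines discussed above can be reproduced with scikit-image. This is only an illustration: the file name is hypothetical, and the window size and k shown are common defaults, not the settings used in the cited benchmarks.

```python
from skimage import filters, io

image = io.imread("palm_leaf_sample.png", as_gray=True)   # hypothetical manuscript scan

# Global thresholding: a single Otsu threshold for the whole image.
otsu_mask = image < filters.threshold_otsu(image)          # ink is darker than the background

# Local adaptive thresholding: Sauvola computes one threshold per pixel from a
# sliding window, which copes better with low contrast and uneven illumination,
# but its window_size and k parameters must be tuned for the material.
sauvola_mask = image < filters.threshold_sauvola(image, window_size=25, k=0.2)
```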

2 Related Work

2.1 Convolutional Autoencoder Network

Autoencoder is a type of unsupervised neural network which can learn latent representation from input data [19]. In this case, it does not need a labeled training set. Autoencoder is also well-known as a dimensionality reduction algorithm, feature detector, generative model, or unsupervised pretraining of deep learning model. Several implementations of this neural network in computer vision area are image denoising [20], augmentation data [20], anomaly detection [21,22], image restoration [23] and invertible grayscale [24]. In another case, the execution of autoencoder in a two-level segmentation task or binarization task still needs labeled image or ground truth binary image. In other words, the weights of this deep neural network architecture are gained from supervised learning. In general, an autoencoder is made up of three parts: an encoder part, an internal hidden layer (the code part), and a decoder part. The encoder is responsible for compressing the input and producing the code or latent representation, whereas the decoder is responsible for rebuilding or unzipping the code part provided by the hidden layer. Furthermore, autoencoders use fully-connected feedforward neural networks for training, following the same procedure as backpropagation.


Several types of autoencoders have been proposed successively, including simple autoencoders, convolutional autoencoders (CAE), and variational autoencoders. The similarity of which is that the dimension of output and input are always same. Nonetheless, the different types of autoencoders can be distinguished by the manner in which the latent representation (the code part) is constructed with the input data, then used to reconstruct the output data. The simple autoencoder employs a dense network, which means that each output is linearly connected to each input. The convolution autoencoder can use convolution operators to generate a low-level representation feature map, which can then be used to analyze the images. So, the spatial information of the image is preserved, which can be useful during the training step. Variational autoencoders propose a learning process based on a probability distribution function by imposing additional constraints on the encoded representations using Bayesian inference [25]. The purpose of a variational autoencoder is to serve as a generative model with the ability to generate new data instances. Nonetheless, the principle is much easier to understand than those of Generative Adversarial Networks. In this study, we focus the use of the convolutional autoencoder on the binarization task. We decompose the CAE for experimental purposes into convolution blocks, deconvolution blocks, encoder blocks, and decoder blocks. As shown in Fig. 1, the decoder block is generally mirrored to the encoder block. Each encoder block employs a convolution block with a stride of 2, in order to compress the input by dividing the dimension of the output by two. During the model training process, each decoder block uses transposed convolution with the same stride as the encoder to up sample the input and learn how to fill in missing parts. In addition, some researchers demonstrated the use of a pooling layer for down sampling in the encoder and an up-sampling method in the decoder, as explained by [22], allows to solve the problem of anomaly image detection. In the case of image restoration using the Residual Encoder-Decoder Network (RED-Net), a pooling layer is not recommended for down sampling because deconvolution in the decoder does not work well [23]. However, the use of strided convolution and pooling layers is still debatable. Because each brings its own set of benefits and drawbacks. [26] proposes another study on strided convolution layer and pooling layer. Aside from stride, several parameters in the convolution and deconvolution blocks must be configured: the number of filters, kernel size, padding type, and activation function. RED-Net advised to configure layers with a kernel size of 3 × 3 and 64 filters [23]. Meanwhile, Selectional Auto-Encoder (SAE) recommended that the convolution block and deconvolution be fixed with kernel sizes of 5 × 5 and 64 filters [15]. In addition, both blocks have a rectified linear activation ReLU and padding “same”. In the context of the encoder-decoder block, we must consider the number of encoder and decoder blocks, as well as the number of convolution/deconvolution within them. In a typical CAE, one convolution is used for each encoder and decoder block. Indeed, in convolution/deconvolution, these parameters are closely related to the input size and stride size. For instance, if there are five


encoder blocks and the size of the squared input is 256, the size of the encoded representation after the fifth encoding with stride 2 in each convolution layer is 8 × 8. If the same configuration is used, but the number of encoders increases to eight, the size of the encoded representation in the final encoding is 1 × 1. The information in the latent representation is now meaningless. Fixing the larger patch size, on the other hand, will increase the computation cost. Furthermore, the patch should be smaller in size than the image. As a result, we need to manage them carefully.
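A minimal Keras sketch of such a convolutional autoencoder is shown below, using the baseline configuration adopted later in Sect. 3.2 (a 256 × 256 window, 64 filters, 5 × 5 kernels, five encoder and five decoder blocks). The sigmoid output, the placement of batch normalization, and the exact points where the skip connections (summation or concatenation, cf. Sect. 2.2) attach are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cae(window=256, filters=64, kernel=5, depth=5, skip="add"):
    x = inputs = keras.Input(shape=(window, window, 1))
    encoder_maps = []
    for _ in range(depth):                    # encoder: each strided conv halves the resolution
        x = layers.Conv2D(filters, kernel, strides=2, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        encoder_maps.append(x)
    for level in reversed(range(depth)):      # decoder mirrors the encoder and upsamples again
        x = layers.Conv2DTranspose(filters, kernel, strides=2, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        if level > 0:                         # reuse the encoder map at the matching resolution
            enc = encoder_maps[level - 1]
            x = layers.Add()([x, enc]) if skip == "add" else layers.Concatenate()([x, enc])
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # per-pixel foreground probability
    return keras.Model(inputs, outputs)
```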

Fig. 1. The standard architecture of convolutional autoencoder.

2.2 Skip Connections Method

The skip connection is used for two reasons [23]. First, the feature banks delivered by skip connections contain significantly more information, which will be useful for obtaining an enriched representation in the next layer. Second, the skip connection in the deeper network allows training easier. Concatenation and summation methods are the two most common skip connection variants. ResNet [27], for example, uses element-wise summation as a standard skip-connection architecture. It enables the features to be refined as it moves through the network’s various layers. The number of features is preserved by using additional skip-connections in which the feature dimensions are the same before and after the skip-connection. Residual Encoder Decoder Network


(RED-Net) [23] and Selectional Auto-Encoder (SAE) [15] are two other deep learning architectures that use the summation method. Concatenated skip connections, on the other hand, increase the dimension of the representations. Nonetheless, they allow representations from previous layers to be reused. Concatenation’s main advantage is that it provides more information, which is useful in deep supervision and may result in better gradient propagation. DenseNet [28] and Inception [29] are two examples of deep learning architectures that utilize the concatenation approach.

2.3 Dropout Regularization

Aside from incurring higher computation costs, training a deeper network with many hyper-parameters may result in overfitting. Dropout is a regularization technique that is commonly used to overcome this problem. The main idea is to drop units at random during training to avoid excessive co-adaptation [30]. The probability of dropping a unit is a value between 0 and 1; a drop rate of 0 indicates that all units are preserved, and the higher the dropout rate, the more units are dropped. Dropout can be placed in either the input or hidden layers. A dropout rate close to 0 is suitable for the input layer, while a rate between 0.2 and 0.5 is typical for hidden layers.

2.4 The Batch Normalization

To address the training speed issue, Ioffe and Szegedy developed the batch normalization (BN) mechanism, which reduces internal covariate shift. BN has several advantages, including allowing higher learning rates, reducing the sensitivity to initialization, and, in some cases, reducing the need for dropout layers [31]. They demonstrated promising results with BN and dropout experiments on the Inception model. Many current deep learning architectures, including [32] and [33], use BN because of its ability to reduce training time while avoiding overfitting. An empirical study of BN over several deep neural networks was also presented in [34]. Other arguments with respect to the BN and dropout mechanisms are found in the literature. Garbin et al. presented a comprehensive comparison between them on multilayer perceptron networks and convolutional neural networks (ConvNets) [35]. As a practical guideline for ConvNets, BN can improve accuracy with a short training period and should be the first consideration when optimizing a CNN. Dropout regularization, on the other hand, should be used with caution because it can reduce accuracy.

3 Dataset and Methodology

3.1 Datasets

In our experimental study, we used old Sundanese palm leaf manuscripts written by different scribes. The dataset consists of 61 images split into a training set of 31


images and a testing set of 30 images. This dataset is the one used for the ICFHR 2018 competition [36], which allows us to compare our results to the methods presented in the competition. In addition, each image has its own ground truth image, which was manually created using Pixlabeler software by a group of philologists and students from the Sundanese Department at Universitas Padjadjaran in Indonesia.

3.2 Experiment Procedures

We investigate the impact of five CAE architecture parameters and three other components on binarization performance. We conduct the experiments on old Sundanese palm leaf manuscripts. Note that these experiments do not use augmented data. As a starting point, we followed the SAE configuration recommendation [15]. Instead of processing the image as a whole, image data is trained locally using a patch-wise approach. The weights of network are initialized based on Xavier weight initialization to manage the variance of activations and backpropagated gradients, resulting in faster convergence and higher accuracy. They are then optimized using stochastic gradient descent, a batch size of 10, and an adaptive learning rate of 0.001 [37]. The training step iteration is set to 200 epochs. However, if there is no decrease in training loss after 20 epochs, the early stopping strategy is executed. Several configurations are available, including a square window with a side size of 256, 64 filters and a kernel size of five for each convolution layer, five encoder blocks and five decoder blocks, a skip connection for each decoder block, BN, and a dropout of 0.
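A hedged sketch of this training setup in Keras follows (Glorot/Xavier initialization is the Keras default). The training loss is not named in this excerpt, so binary cross-entropy is assumed; `train_patches`, `train_masks`, `val_patches`, and `val_masks` are hypothetical patch tensors, and `build_cae` refers to the sketch in Sect. 2.1.

```python
from tensorflow import keras

model = build_cae(window=256, filters=64, kernel=5, depth=5)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),   # "adaptive learning rate of 0.001" [37]
    loss="binary_crossentropy",                            # assumption: the loss is not named here
    metrics=[keras.metrics.Precision(), keras.metrics.Recall()],
)
early_stop = keras.callbacks.EarlyStopping(monitor="loss", patience=20,
                                           restore_best_weights=True)
model.fit(train_patches, train_masks,                      # hypothetical 256x256 patches and masks
          validation_data=(val_patches, val_masks),
          batch_size=10, epochs=200, callbacks=[early_stop])
```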

Fig. 2. The architecture of CAE network with its supporting components.

The first experiment scenario focuses on the CAE hyper-parameters. We evaluate five window sizes: 64, 128, 256, 320, and 384. Then, we take the best two window sizes and change the number of filters to 16, 32, 64, 96,


and 128 while keeping the kernel size at 3 × 3 and 7 × 7. Several modifications to the number of encoder-decoder blocks are then simulated based on the best two previous experiments. Afterward, we evaluate the number of strided convolutions/deconvolutions inside each encoder-decoder block. The rectified linear activation function (ReLU) and “same” padding are applied to all convolutions and deconvolutions in this scenario. The position of each CAE layer and its supporting components is depicted in Fig. 2. Secondly, a review of the supporting components is carried out. We begin by varying the dropout value between 0.1 and 0.5, with and without BN. The existence of skip-connections, for both summation and concatenation, is then evaluated using the best two results of the previous configurations.

3.3 Evaluation Method

During the training process, F-measure (Fm) is used to evaluate the CAE architecture, because this measure is a popular metric for imbalanced classification [38]. In the context of binarization, this means that the number of background pixels is much greater than the number of foreground pixels.
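As a reference implementation (not the authors’ code), the pixel-level F-measure between a predicted binary image and its ground truth can be computed as follows; the small epsilon only guards against division by zero.

```python
import numpy as np

def f_measure(pred, gt, eps=1e-9):
    """Both inputs are boolean masks in which True marks foreground (ink) pixels."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2 * precision * recall / (precision + recall + eps)
```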

4 Results and Discussion

All experiments are written in the Python programming language and rely on packages such as Scikit-Learn, NumPy, Keras, and TensorFlow. In terms of hardware, we use two GeForce RTX 2070 GPUs (8 GB per GPU) and 48 GB of RAM. All discussions are based on the binarized images predicted by the best-weighted model on the testing dataset.

4.1 Discussion on the Five Hyper-Parameter Experiments

In the first experiment, we assess the influence of the window size of the input image and the number of filters in a CAE model that uses the three supporting components. As shown in Fig. 3, we evaluated five squared window sizes ranging from 64 × 64 to 384 × 384 with four different numbers of filters (f16, f32, f64, and f96). The window size of 384 had the lowest accuracy, because such windows are too wide or long in comparison to the original image, which may result in poor performance. Windows with sides of 64, 128, 256, and 320 produce comparable results, which suggests that those four window sizes can be used to binarize images, particularly for old Sundanese palm leaf manuscripts. However, we can see that a window size of 320 provides the most stable performance, with an F-measure greater than 68 for all numbers of filters. Furthermore, the highest score in this experiment is obtained for a window size of 320 and 96 filters, with an F-measure of 68.44. Another important consideration is that using a larger window requires more storage and has an impact on computation costs. Afterward, the kernel size is evaluated using the previous best configuration. As shown in Fig. 4, three kernel sizes (3, 5, and 7)


are evaluated in eight CAE models by combining two window sizes (256 and 320) and four numbers of filters (16, 32, 64, and 96). The impact of kernel size varies depending on the size of the window and the number of filters. The best performance is reached with a window size of 320, 96 filters, and a kernel size of 5, which is the same as in the previous experiment. The number of encoder-decoder blocks is evaluated in the following experiment. We reevaluated the best model from the previous experiment and a baseline configuration by varying the number of encoder and decoder blocks. As shown in Fig. 5, the performance rate fluctuates as encoder-decoder blocks are added. Nonetheless, the strided convolution layer within the encoder and decoder blocks must be considered. In a configuration with a squared window size of 320 and a strided convolution of 2, for example, we cannot use more than 6 encoder blocks (blue line), because the dimension of the encoded representation from the last encoder and the dimension of the decoded representation from the first decoder are not the same. This indicates that we must set this parameter with care.

Fig. 3. The binarization performance of the autoencoder network for different numbers of filters across window sizes.

Fig. 4. The binarization performance of the autoencoder network for different numbers of filters across kernel sizes.

Fig. 5. The binarization performance of autoencoder network based on the number of encoder and decoder blocks. (Color figure online)

Table 1 shows how the presence of multiple convolution layers in an autoencoder network affects computation cost and performance rate. With each additional layer, the computation requires more storage to handle larger numbers of trainable and non-trainable parameters. We did not consider storage costs in this study, instead focusing on the impact of multiple convolution layers on binarization performance. We only evaluate multiple convolution layers (MCL) in the CAE model with a window size of 256, 64 filters, and a kernel size of 5, which is the default configuration proposed in [15]. The use of MCL may improve accuracy, but it requires more computation time and storage. According to the results of this experiment, 15 MCL in the encoder block and 15 MCL in the decoder block performed better than the other settings.

Table 1. The binarization performance of the autoencoder network (W256 f64 k5) based on the number of multiple convolution layers (MCL).

MCL in encoder   MCL in decoder   Trainable params   Non-trainable params   Fm
5                5                926,721            1,280                  67.73
10               10               1,952,641          2,560                  68.57
15               15               2,978,561          3,840                  69.58
20               20               4,004,481          5,120                  69.36
25               25               5,030,401          6,400                  69.45

4.2 Experiments on Batch Normalization, Dropout, and Skip-Connection in the CAE Network

To assess the impact of the three supporting components, we set several static hyper-parameters as follows: 5 encoder/decoder blocks, each with one convolution layer and a kernel size of 5. Figure 6 and Fig. 7 depict the types of accuracy performance measures performed by CAE networks with various supporting components, covering a dropout rate range of 0 to 0.5 in the binarization task of old Sundanese palm leaf manuscripts. We can see that the accuracy of the CAE networks increases in general. The configuration with added skip connection and no batch normalization provides the best performance overall if many filters are occupied. First, a discussion of dropout regularization is presented. Based on 4 CAE models, we discovered that the existence of dropout layer relatively increases the accuracy rate. Without dropout layer (dropout = 0.0), the F-measure reaches most of the time the lowest level for a given CAE model. On contrary, the dropout layer in range 0.2 and 0.5 generally may obtain the better accuracy. In order to obtain a global optimum performance, dropout rate could be different for each model. This conclusion corresponds to the suggestion made by the authors who proposed the dropout concept [30]. It means that dropout rate needs to be set carefully. In terms of skip-connection, CAE models with no residual blocks cannot outperform CAE models with either added or concatenated skip-connection. The highest accuracy score for a CAE model with a residual block is based on a window size of 320 × 320, 96 filters, a kernel size of 5, and a dropout rate of

0.3. In many cases, such as with 96 filters, the summation function (blue and orange lines) outperforms the concatenation function (green and red lines). We therefore recommend using the residual block with the summation function. Another important point is that transferring meaningful information from the previous layer via the skip method may improve accuracy. Figure 6 depicts the performance of residual CAE networks, with accuracy rates higher than 67. In contrast, as shown in Fig. 7, a CAE model without residual blocks only achieves accuracy below this rate. As shown in Fig. 7, BN has a significant impact on the CAE network with no skip-connection. For all dropout rates, BN within the non-residual CAE (blue line) consistently outperforms the configuration without BN (orange line). Additionally, the residual CAE with BN and 50% dropout after each convolution block performs slightly better than the model without BN, as depicted in Fig. 6. In contrast, the residual CAE with a dropout rate of 0.4 or less in some configurations performs worse than the model without BN. Meanwhile, the model without residual blocks and batch normalization fails to segment foreground and background, because its F-measure is most of the time lower than 55%. Figure 8 illustrates the quantitative results of several binarization methods applied to the testing dataset of old Sundanese palm leaf manuscripts. Several experiments on the supporting components of the CAE architecture revealed that the highest accuracy (F-measure) is 69.60 for a 320 × 320 window size, 96 filters, and a 5 × 5 kernel, with added skip-connection, no batch normalization, and a dropout rate of 0.3. Another intriguing aspect is the possibility of improving accuracy by using multiple convolutional layers. The accuracy improvement over the CAE add BN w256 f64 k5 model can be seen in the CAE add BN 30MCL w256 f64 k5 configuration. Overall, the evaluation metrics of the CAE model also validate the promising results and outperform the winner of ICFHR 2018 [36]. Moreover, from the visualization of a sample image, the binarized images produced by the CAE models are cleaner than prior state-of-the-art approaches in terms of separating the foreground from randomly distributed noise, as shown in Fig. 9.

Fig. 6. An accuracy performance comparison of different dropout rates, skip-connection functions, and batch normalization on the CAE network.

Fig. 7. A comparison of the dropout and batch normalization (BN) influence on the CAE network without skip-connection (SC). (Color figure online)

Fig. 8. Evaluation metrics for several binarized images.

Fig. 9. A sample of original palm leaf manuscript image, ground truth image (GT) and the binarization results.

5 Conclusion

Several experiments on the CAE network’s hyper-parameters and three supporting components were carried out for the binarization task on old Sundanese palm leaf manuscripts. Tuning the parameters can improve the performance of the binarization. Nevertheless, some parameters such as kernel size, number of filters and the number of encoder-decoder blocks, don’t play a significant role


on accuracy measurement. However, in this network, the window size and the use of multiple convolution layers have a significant impact. Furthermore, skip-connections, with either summation or concatenation, that supply the encoded representation from the previous layer can increase accuracy. To handle overfitting in a deep network, the dropout rate needs to be set in the range 0.2 to 0.5. Moreover, batch normalization also has a big impact if there is no residual block in the CAE network. Therefore, to achieve good performance, we need to choose the hyper-parameters and the supporting components carefully. These experimental results will help us in future work on document image analysis, specifically for evaluating the CAE network on larger datasets of palm leaf manuscripts.

Acknowledgments. We would like to thank Riki Nawawi and Undang Darsa as Sundanese philologists for their fruitful discussions regarding old Sundanese manuscripts. This work is supported by the Directorate General of Higher Education, Ministry of Education, Culture, Research and Technology, the Republic of Indonesia through the funding of the BPPLN-DIKTI Indonesian Scholarship Program.

References

1. Sulaiman, A., Omar, K., Nasrudin, M.F.: Degraded historical document binarization: a review on issues, challenges, techniques, and future directions. J. Imaging 5(4), 48 (2019)
2. Tensmeyer, C., Martinez, T.: Historical document image binarization: a review. SN Comput. Sci. 1(3), 1–26 (2020). https://doi.org/10.1007/s42979-020-00176-1
3. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
4. Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs (1986)
5. Bernsen, J.: Dynamic thresholding of gray level image. In: ICPR 1986: Proceedings of International Conference on Pattern Recognition, Berlin, pp. 1251–1255 (1986)
6. Kesiman, M.W.A., et al.: Benchmarking of document image analysis tasks for palm leaf manuscripts from Southeast Asia. J. Imaging 4(2), 43 (2018)
7. Howe, N.R.: Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. 16(3), 247–258 (2013)
8. Mesquita, R.G., Silva, R.M.A., Mello, C.A.B., Miranda, P.B.C.: Parameter tuning for document image binarization using a racing algorithm. Expert Syst. Appl. 42(5), 2593–2603 (2015)
9. Sauvola, J., Seppanen, T., Haapakoski, S., Pietikainen, M.: Adaptive document binarization. In: Proceedings of the Fourth ICDAR, pp. 147–152 (1997)
10. Pastor-Pellicer, J., España-Boquera, S., Zamora-Martínez, F., Afzal, M.Z., Castro-Bleda, M.J.: Insights on the use of convolutional neural networks for document image binarization. In: Rojas, I., Joya, G., Catala, A. (eds.) IWANN 2015. LNCS, vol. 9095, pp. 115–126. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19222-2_10
11. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440 (2015)
12. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017)


13. Tensmeyer, C., Martinez, T.: Document image binarization with fully convolutional neural networks. In: 2017 14th IAPR ICDAR, pp. 99–104 (2017)
14. Peng, X., Cao, H., Natarajan, P.: Using convolutional encoder-decoder for document image binarization. In: 2017 14th IAPR ICDAR, pp. 708–713 (2017)
15. Calvo-Zaragoza, J., Gallego, A.J.: A selectional auto-encoder approach for document image binarization. Pattern Recognit. 86, 37–47 (2019)
16. Lazzara, G., Levillain, R., Geraud, T., Jacquelet, Y., Marquegnies, J., Crepin-Leblond, A.: The SCRIBO module of the Olena platform: a free software framework for document image analysis. In: 2011 ICDAR, pp. 252–258 (2011)
17. Pratikakis, I., Zagoris, K., Barlas, G., Gatos, B.: ICFHR 2016 handwritten document image binarization contest (H-DIBCO 2016). In: Proceedings of the International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 619–623 (2016)
18. Burie, J.C., et al.: ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts. In: 2016 15th ICFHR, pp. 596–601 (2016)
19. Géron, A.: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly, Sebastopol (2019)
20. Feng, X., Wu, Q.M.J., Yang, Y., Cao, L.: An autoencoder-based data augmentation strategy for generalization improvement of DCNNs. Neurocomputing 402, 283–297 (2020)
21. Ribeiro, M., Lazzaretti, A.E., Lopes, H.S.: A study of deep convolutional autoencoders for anomaly detection in videos. Pattern Recognit. Lett. 105, 13–22 (2018)
22. Heger, J., Desai, G., Zein El Abdine, M.: Anomaly detection in formed sheet metals using convolutional autoencoders. In: Procedia CIRP, pp. 1281–1285. Elsevier B.V. (2020)
23. Mao, X.-J., Shen, C., Yang, Y.-B.: Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 2810–2818. Curran Associates Inc., Red Hook (2016)
24. Xia, M., Liu, X., Wong, T.T.: Invertible grayscale. ACM Trans. Graph. 37, 1–10 (2018)
25. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Canada, pp. 1–14 (2014)
26. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: ICLR 2015 (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on CVPR, pp. 2261–2269 (2017)
29. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on CVPR, pp. 2818–2826 (2016)
30. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)


31. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: The 32nd International Conference on Machine Learning, Lille, France (2015)
32. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 4278–4284. AAAI Press (2017)
33. Sun, S., Pang, J., Shi, J., Yi, S., Ouyang, W.: FishNet: a versatile backbone for image, region, and pixel level prediction. In: 32nd Conference on Neural Information Processing Systems, pp. 754–764 (2018)
34. Thakkar, V., Tewary, S., Chakraborty, C.: Batch normalization in convolutional neural networks - a comparative study with CIFAR-10 data. In: 2018 Fifth International Conference on Emerging Applications of Information Technology, pp. 1–5 (2018)
35. Garbin, C., Zhu, X., Marques, O.: Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimed. Tools Appl. 79, 12777–12815 (2020)
36. Kesiman, M.W.A., et al.: ICFHR 2018 competition on document image analysis tasks for southeast Asian palm leaf manuscripts. In: Proceedings of ICFHR, pp. 483–488. IEEE (2018)
37. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
38. Pastor-Pellicer, J., Zamora-Martínez, F., España-Boquera, S., Castro-Bleda, M.J.: F-measure as the error function to train neural networks. In: Rojas, I., Joya, G., Gabestany, J. (eds.) IWANN 2013. LNCS, vol. 7902, pp. 376–384. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38679-4_37

A Connected Component-Based Deep Learning Model for Multi-type Struck-Out Component Classification

Palaiahnakote Shivakumara(1), Tanmay Jain(2), Nitish Surana(2), Umapada Pal(2), Tong Lu(3), Michael Blumenstein(4), and Sukalpa Chanda(5)

1 Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
[email protected]
2 Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
[email protected]
3 National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China
[email protected]
4 University of Technology Sydney, Sydney, Australia
[email protected]
5 Department of Information Technology, Østfold University College, Halden, Norway
[email protected]

Abstract. Due to the presence of struck-out handwritten words in document images, the performance of different methods degrades for several important applications, such as handwriting recognition, writer, gender, fraudulent document identification, document age estimation, writer age estimation, normal/abnormal behavior of person analysis, and descriptive answer evaluation. This work proposes a new method which combines connected component analysis for text component detection and deep learning for classification of struck-out and non-struck-out words. For text component detection, the proposed method finds the stroke width to detect edges of texts in images, and then performs smoothing operations to remove noise. Furthermore, morphological operations are performed on smoothed images to label connected components as text by fixing bounding boxes. Inspired by the great success of deep learning models, we explore DenseNet for classifying struck-out and non-struck-out handwritten components by considering text components as input. Experimental results on our dataset demonstrate the proposed method outperforms the existing methods in terms of classification rate. Keywords: Handwriting recognition · Writer identification · Connected component analysis · Deep learning · Struck-out words

1 Introduction

It is common to see writing errors such as crossed-out words when a writer finds mistakes while writing answers, filling in forms, writing letters, etc. [1]. As a result, handwritten documents are vulnerable to different struck-out words and noise during handwriting. It


can be noted that the appearance of struck-out words varies from writer to writer. Therefore, the presence of such struck-out handwritten words causes problems for several important applications, such as handwriting recognition [2], automatic answer script evaluation [3], gender identification [4], writer identification [2], document age estimation [5], writer age estimation [6], normal/abnormal writer behavior identification, fraudulent document identification, and forged text detection [7–9]. For all of the above-mentioned applications, most methods require ‘clean’ handwritten documents, without writing errors and crossed-out words, to achieve better results.

Fig. 1. Multiple types of struck-out handwritten words in answer script images (see Table 1 for the full list of types).

However, in reality, obtaining clean documents without errors in all situations is challenging. Thus, the performance of methods for the above-mentioned applications degrades. This has motivated us to address the challenge of struck-out and non-struck-out handwritten word classification in this work. Since crossing out handwritten text depends on a person’s mindset, the context, and the mistake, one cannot expect a particular type of struck-out word. This makes the problem more complex and challenging. The


possible types of struck-out words are listed in Table 1, and sample images are shown in Fig. 1, where it can be observed that classifying struck-out words from non-struck-out words is not easy because of the multiple types of struck-out words. Therefore, this work focuses on developing a new method for classifying struck-out and non-struck-out words such that the performance of handwriting analysis improves. The main contributions of the proposed work are as follows. (i) The use of stroke width information for detecting text components is new for classification. (ii) Exploring the DenseNet model for classifying struck-out and non-struck-out words is novel in contrast to the existing methods, which generally use handcrafted features such as intersection points and properties of cross-out lines (a minimal sketch of such a classifier is given after Table 1). (iii) The way the proposed work combines connected component analysis and deep learning for successful classification is the key contribution compared to existing work.

Table 1. Multi-type struck-out handwritten words

(i) Single Straight Strikeout
(ii) Single Slanted Left Strikeout
(iii) Single Slanted Right Strikeout
(iv) Single Crossed Strikeout
(v) Multiple Straight Strikeout
(vi) Multiple Slanted Left Strikeout
(vii) Multiple Slanted Right Strikeout
(viii) Multiple Crossed Strikeout
(ix) Box Single Straight Strikeout
(x) Box Single Slanted Left Strikeout
(xi) Box Single Slanted Right Strikeout
(xii) Box Single Crossed Strikeout
(xiii) Box Multiple Straight Strikeout
(xiv) Box Multiple Slanted Left Strikeout
(xv) Box Multiple Slanted Right Strikeout
(xvi) Box Multiple Crossed Strikeout
(xvii) Zigzag Strikeout
(xviii) Wavy Strikeout
(xix) Hybrid Strikeout
(xx) Blackout Strikeout
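As a rough illustration of contribution (ii), and not the authors’ implementation, a DenseNet-based binary classifier for the detected word components could be set up as follows. The specific variant (DenseNet121), input size, ImageNet pre-training, dropout, and optimizer settings are all assumptions; the paper does not specify them here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_struckout_classifier(input_shape=(224, 224, 3)):
    # DenseNet121 backbone with a binary head: struck-out vs. non-struck-out components.
    backbone = keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                              input_shape=input_shape, pooling="avg")
    x = layers.Dropout(0.5)(backbone.output)
    output = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(backbone.input, output)
    model.compile(optimizer=keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Each detected text component would be cropped, resized to the input size,
# and scored by this classifier.
model = build_struckout_classifier()
```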

2 Related Work

Since struck-out and non-struck-out handwritten word classification is still in its infancy, we hardly found any methods addressing this classification. Brink et al. [2] proposed a method for the automatic removal of crossed-out handwritten text and studied the effect on writer verification and identification. The method uses connected component analysis, which involves branching and size features of text, and a classifier for distinguishing crossed-out words in document images. However, the method does not focus on multiple types of struck-out words as considered in our work. Adak et al. [10] used a connected component analysis approach, which constructs a graph using intersection points to study degrees of straightness to identify struck-out words in handwritten document images. However, the method is not robust to noise and degraded documents because its performance depends on finding the correct intersection points. Chaudhuri et al. [1] focused on cleaning struck-out handwritten words in document images based on a combination of pattern classification and a graph-based approach. The method uses feature-based two-class classification for detecting struck-out words in the images. Then a graph-based approach

Since the method involves several stages and many heuristics, it may not perform well for multiple types of struck-out words; the success of the method depends on the success of many stages, which may not be feasible for all applications and situations. Adak et al. [11] proposed a hybrid method, which combines an SVM classifier and a convolutional neural network for classifying struck-out handwritten words. In addition, the method studies the impact of struck-out words on writer identification. However, it is not clear whether the method works for different types of struck-out words or focuses on a particular type. In summary, it is noted from the above review that the existing methods primarily use handcrafted features or a combination of deep learning and handcrafted features for struck-out word classification, and that their main purpose is to improve handwriting recognition by removing struck-out words from a document image. Nisa et al. [12] proposed a deep learning-based method for recognizing handwritten text in the presence of struck-out text by exploring convolutional recurrent neural networks. The method works well for improving recognition performance but not for other handwriting-based applications, such as answer script evaluation, gender identification, and fraudulent document identification. In the same way, Qi et al. [13] proposed a deep learning-based method, called DeepErase, for weakly-supervised ink artifact removal in document images, based on the observation that a text region can contain random ink smudges or spurious strokes. However, the scope of the method is confined to simple struck-out handwritten words rather than complex struck-out words such as those shown in Fig. 1. For online documents, Bhattacharya et al. [14] proposed a method for struck-out text detection based on different density-based features. The method focuses on two-class classification, namely relevant and unwanted; however, it is applicable to online crossed-out word detection but not to offline documents. Overall, although methods have been developed for struck-out handwritten word detection and classification, none of them considered multiple struck-out word types for classification. Most of the methods are confined to particular types of struck-out words, and they work well mainly for images affected by horizontal strike-out lines, which is not true in reality because strike-out lines depend on a person's mindset. Hence, in this work we propose a novel method for classifying struck-out handwritten words, irrespective of the type of strike-out lines.

3 Proposed Model

In the case of handwritten documents, the characters of each word are usually connected, and hence we may not find spaces between characters; however, one can expect clear spaces between words. In addition, if a word is struck out, the strike-out lines may connect the characters of the word. This observation motivated us to propose a connected component analysis approach that treats word components as text components. The proposed method finds the stroke width and then performs smoothing vertically and horizontally to remove noise. Furthermore, we perform morphological operations over the smoothed images to fix a bounding box for each component. This step detects text components irrespective of whether they are struck-out or non-struck-out words.

Inspired by deep learning models that have shown great success in solving complex classification problems [11], we explore DenseNet [15, 16] for the classification of struck-out and non-struck-out words. This step classifies struck-out and non-struck-out handwriting components irrespective of the strike-out lines and their types, which is the advantage of the proposed model.

3.1 Text Component Detection

For a given input handwritten document image, the proposed method finds the Stroke Width (SW) by counting black pixels along the horizontal axis of the image; SW is taken as the mode of the run lengths of black pixels. This gives a clue about the thickness of the strokes. While finding SW, there is a chance of introducing noise pixels. Therefore, to remove noise pixels, the proposed method performs the following smoothing operations.

Vertical Smoothing: If any run of black pixels encountered when traversing the image vertically is shorter than the stroke width, those pixels are considered noise pixels and hence are discarded. Otherwise, the pixels are considered text pixels (Black-Flag), as defined in Eq. (1).

BlackFlag(x) = \begin{cases} 1, & \text{Run\_Black\_Pixels}(x) \le SW \\ 0, & \text{Run\_Black\_Pixels}(x) > SW \end{cases} \quad (1)

Horizontal Smoothing: If any run of white pixels encountered when traversing the image horizontally is shorter than five times the stroke width, the pixels are considered text pixels (White-Flag). Otherwise, the pixels are considered noise pixels, as defined in Eq. (2). The value of five is determined empirically by testing on randomly chosen samples.

WhiteFlag(x) = \begin{cases} 1, & \text{Run\_White\_Pixels}(x) \le 5 \times SW \\ 0, & \text{Run\_White\_Pixels}(x) > 5 \times SW \end{cases} \quad (2)

Morphological Operations: The proposed model performs a morphological opening operation using a kernel of size (SW, SW × 6) to merge all the characters of each word into one text component. The value of six is estimated based on our experiments. The effectiveness of the above steps is illustrated in Fig. 2, where, for the non-struck-out and different struck-out words shown in Fig. 2(a), the results of smoothing and morphological operations are shown in Fig. 2(b) and Fig. 2(c), respectively. It is noted from Fig. 2 that the steps work well for both struck-out and non-struck-out words and merge the characters of each word into a single component.
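To make the above steps concrete, the following is a minimal Python sketch of the component detection pipeline, assuming a binarized page image in which text pixels are 1 and background pixels are 0. The helper names, the run-length smoothing implementation, and the use of OpenCV for the opening and connected components are our own illustrative choices, not the authors' code.

```python
import cv2
import numpy as np

def stroke_width(binary):
    """Estimate stroke width as the mode of horizontal black-pixel run lengths."""
    runs = []
    for row in binary:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                runs.append(run)
                run = 0
        if run:
            runs.append(run)
    return int(np.bincount(runs).argmax()) if runs else 1

def smooth_runs(line, run_value, min_len):
    """Flip runs of `run_value` shorter than `min_len` to the opposite value (1-D helper)."""
    out = line.copy()
    run_start, run_len = 0, 0
    for i, v in enumerate(np.append(line, 1 - run_value)):  # sentinel closes the last run
        if v == run_value:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if 0 < run_len < min_len:
                out[run_start:run_start + run_len] = 1 - run_value
            run_len = 0
    return out

def detect_components(binary):
    sw = stroke_width(binary)
    # Vertical smoothing: black runs shorter than SW are treated as noise and removed.
    v = np.apply_along_axis(smooth_runs, 0, binary, 1, sw)
    # Horizontal smoothing: white gaps shorter than 5 * SW inside a word are filled.
    h = np.apply_along_axis(smooth_runs, 1, v, 0, 5 * sw)
    # Morphological opening with a (SW, 6 * SW) kernel merges the characters of a word.
    kernel = np.ones((sw, 6 * sw), np.uint8)
    merged = cv2.morphologyEx(h.astype(np.uint8), cv2.MORPH_OPEN, kernel)
    # Bounding boxes of the resulting connected components, one per word component.
    n, _, stats, _ = cv2.connectedComponentsWithStats(merged, connectivity=8)
    return [tuple(stats[i, :4]) for i in range(1, n)]  # (x, y, w, h) per component
```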


Fig. 2. Text component detection: (a) non-struck-out and multiple struck-out words; (b) the results of smoothing; (c) the result of morphological opening

3.2 Deep Learning Model for Struck-Out Component Classification

The step presented in the previous section outputs text components regardless of whether they are struck-out or non-struck-out components. As mentioned in the Introduction, one cannot expect a particular type of struck-out word, and classifying struck-out components from non-struck-out components is challenging. In order to balance the numbers of struck-out and non-struck-out components for training the deep learning model, we create a synthetic struck-out dataset using the IAM Handwriting Dataset [17] and the Washington Manuscript Dataset [18]. The images are resized to the standard size of 224 × 224. In total, our dataset contains nearly 100,000 samples, including struck-out and non-struck-out components, for training the proposed deep learning model. The data is split in the ratio 80:10:10, i.e., 80% of the data is used for training the model, 10% for validation, and the remaining 10% for testing. In this work, we explore the classic DenseNet121 architecture with a few modifications, as shown in Fig. 3, where we can see the layers of our model. Here, we replace the last fully-connected layers of the classic DenseNet architecture with a fully-connected layer of 1024 units, followed by a single output unit with a sigmoid activation function that gives the probability of the component being struck-out. We use weights pre-trained on the ImageNet dataset, Stochastic Gradient Descent (SGD) as the optimizer with a learning rate of 0.001, and binary cross-entropy as the loss. The details of the proposed DenseNet architecture are as follows. Each dense block consists of a stack of densely-connected convolutional layers and can extract more efficient features than plain and residual convolutional networks; each layer consists of a batch normalization layer, a ReLU layer and a convolutional layer. Dense Block #1 contains the BN-ReLU-Conv(1 × 1)-BN-ReLU-Conv(3 × 3) architecture repeated 6 times, whereas Dense Block #2 repeats it 12 times, Dense Block #3 24 times and Dense Block #4 16 times. The feature maps pass through a dense block without changing their size, but it is important to reduce the feature map size in a convolutional network, so we apply a transition block between two dense blocks to decrease the size.

Fig. 3. The proposed deep learning DenseNet architecture for struck-out word classification

All the transition layers shown in the architecture consist of a batch normalization layer and a 1 × 1 convolutional layer followed by an average pooling layer with a kernel size of 2 × 2. The input image is first passed through a convolution layer with a 7 × 7 kernel followed by a 3 × 3 max pooling layer. After this, the feature maps are passed through multiple dense blocks and transition blocks before a 7 × 7 global max pooling layer. Next, the result is passed through a fully-connected layer containing 1024 units with a ReLU activation function and finally sent to the prediction layer containing one unit with a sigmoid activation function, which outputs the probability that the given connected component is struck-out.
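The classifier head described above can be sketched as follows in PyTorch, assuming torchvision's DenseNet121 as the backbone (newer torchvision versions use a weights argument instead of pretrained=True). The training-step helper and its names are illustrative; the authors' exact data pipeline and training schedule are not reproduced.

```python
import torch
import torch.nn as nn
from torchvision import models

# DenseNet121 pre-trained on ImageNet; its feature extractor produces a
# 1024-dimensional vector after global pooling.
backbone = models.densenet121(pretrained=True)

# Replace the original 1000-way classifier with a 1024-unit fully-connected
# layer followed by a single sigmoid output unit, as described above.
backbone.classifier = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1),
    nn.Sigmoid(),
)

criterion = nn.BCELoss()                                       # binary cross-entropy
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.001)   # SGD, learning rate 0.001

def train_step(images, labels):
    """One training step on a batch of 224 x 224 component images and 0/1 labels."""
    optimizer.zero_grad()
    probs = backbone(images).squeeze(1)        # probability of being struck-out
    loss = criterion(probs, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```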

4 Experimental Results

There is no benchmark dataset for experimentation. In this work, we create our own dataset by collecting answer scripts written by different students at different levels.

4.1 Dataset Creation and Evaluation

We randomly chose 50 answer sheets for experimentation in this work. For creating the ground truth, we manually fixed a bounding box around each word to label components as either struck-out or non-struck-out for all 50 answer sheets. As expected, the number of struck-out components is lower than the number of non-struck-out components. To balance the numbers of struck-out and non-struck-out components, we generate synthetic struck-out components using the IAM Handwriting Dataset [17] and the Washington Manuscript Dataset [18]. The IAM Handwriting Dataset provides forms of handwritten English text that are widely used as training data by text recognition applications; it contains over 1,500 pages of scanned text written by over 650 writers. The Washington Manuscript Dataset provides data from the George Washington Papers at the Library of Congress and contains over 4,000 word-level segmented text images.


The main challenge during the creation of a synthetic dataset for struck-out components was replicating how students strike out words on actual answer sheets. Many factors need to be considered while creating this dataset, such as the stroke width used by students, the different strike-out patterns that are followed, the placement of strike-outs, etc. For example, in the case of a single straight strike-out, a student does not always strike out the text at the center of the word; instead, he/she generally strikes out the text such that the strike-out passes through nearly all the letters in it. For synthetically generating this pattern, details such as the position of the horizontal maxima of black pixels in the image are used. In many cases, students use a wave-like or spiral pattern to strike out the text; such patterns need to be synthetically created over the clean data. When a student strikes out text in a multiple slanted right/left pattern, the lines are generally non-parallel, so an uncertainty factor also needs to be added when synthetically creating this pattern over clean data. All these factors are considered while creating the synthetic strike-out dataset, as shown in the sample images in Fig. 4, where we can see samples of both struck-out and non-struck-out components. A total of 1,000 images comprising 50% struck-out and 50% non-struck-out component images were used as the test dataset for the experiments.
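As an illustration of the single straight strike-out case, the sketch below draws a line through the row with the highest black-pixel count of a clean word image, so that it passes through most letters. The thickness and jitter parameters are illustrative assumptions rather than the settings used to build the actual synthetic dataset.

```python
import cv2
import numpy as np

def add_straight_strikeout(word_img, thickness=2, jitter=2):
    """Synthesize a single straight strike-out on a grayscale word image
    (dark text on a light background)."""
    out = word_img.copy()
    h, w = out.shape[:2]
    ink = out < 128                                   # rough text mask
    # Place the line at the row with the maximum number of black pixels,
    # so it passes through nearly all letters of the word.
    row = int(np.argmax(ink.sum(axis=1)))
    y0 = int(np.clip(row + np.random.randint(-jitter, jitter + 1), 0, h - 1))
    y1 = int(np.clip(row + np.random.randint(-jitter, jitter + 1), 0, h - 1))
    cv2.line(out, (0, y0), (w - 1, y1), color=0, thickness=thickness)
    return out
```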

Fig. 4. Labeling struck-out and non-struck-out words manually.

For evaluating the step of text component detection, we consider standard measures such as Precision, Recall and F-score as defined in Eqs. (3)–(5), where TP: True Positive, FP: False Positive, FN: False Negative.

Precision = \frac{TP}{TP + FP} \quad (3)

Recall = \frac{TP}{TP + FN} \quad (4)

F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \quad (5)


For evaluating the performance of the proposed classification, we use the standard confusion matrix and the Average Classification Rate (ACR), which is the mean of the diagonal elements of the confusion matrix. To decide whether a bounding box was correctly predicted, we use the Intersection over Union (IoU) measure defined in Eq. (6), i.e., the ratio between the area shared by the predicted bounding box and the ground-truth bounding box and the total area covered by both boxes together. In this work, we consider IoU > 0.5, which is the standard threshold, for counting correctly classified and misclassified components.

IoU = \frac{\text{Overlapping area between the prediction and the ground truth}}{\text{Union of the areas of the prediction and the ground truth}} \quad (6)
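A small helper for the IoU measure of Eq. (6), assuming bounding boxes are given as (x, y, width, height); this is the standard computation rather than code from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# A predicted component counts as correctly localized only if
# iou(predicted_box, ground_truth_box) > 0.5.
```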

In order to show the effectiveness of the proposed classification, we implemented two state-of-the-art methods as a comparative study, namely Chaudhuri et al.'s method [1], which uses the aspect ratio of words as features and an SVM classifier for classifying normal (non-struck-out) and struck-out words, and Adak et al.'s method [11], which uses a convolutional neural network for classifying struck-out and non-struck-out words. The former uses handcrafted features for classification, while the latter uses a CNN. We compare the proposed method with these two existing methods to show that handcrafted features alone and CNNs alone may not be effective for classifying multiple types of struck-out components, whereas their combination is. For the smoothing operation presented in Sect. 3.1 for detecting components, we used a threshold value of 5, which was determined experimentally as illustrated in Fig. 5, where the Average Classification Rate (ACR) is calculated for 500 randomly chosen samples from our dataset for different threshold values. It is observed from Fig. 5 that the ACR is highest for the threshold value 5 and decreases for subsequent values. Therefore, 5 is considered the optimal threshold value for component detection.

Fig. 5. Determining optimal threshold for smoothing operations


4.2 Ablation Study

To test the performance of the proposed method on each of the struck-out component types shown in Fig. 1, we calculated the Average Classification Rate (ACR), as reported in Table 2. It is noted from Table 2 that the proposed approach scores 77% or more for all 20 types of struck-out components. This shows that the proposed model is consistent and stable in achieving a high ACR irrespective of the type of struck-out component. Therefore, one can infer that the proposed model is robust to multiple types of struck-out components.

Table 2. Analyzing the performance of the proposed method for each type of struck-out components mentioned in Fig. 1.

#       Type                        ACR     #       Type                        ACR     #        Type                       ACR
(i)     Single Straight             84.0    (viii)  Multiple Crossed            90.0    (xv)     Box Multiple Left Slanted   79.0
(ii)    Single Left Slanted         85.0    (ix)    Box Single Straight         81.0    (xvi)    Box Multiple Crossed        92.0
(iii)   Single Right Slanted        87.0    (x)     Box Single Right Slanted    87.0    (xvii)   Zigzag                      89.0
(iv)    Single Crossed              88.0    (xi)    Box Single Crossed          85.0    (xviii)  Wavy Slanted                80.0
(v)     Multiple Straight           83.0    (xii)   Box Single Left Slanted     79.0    (xix)    Hybrid                      84.0
(vi)    Multiple Left Slanted       88.0    (xiii)  Box Multiple Right Slanted  77.0    (xx)     Blackout                    81.0
(vii)   Multiple Right Slanted      85.0    (xiv)   Box Multiple Straight       88.0

4.3 Experiments on Component Detection

Qualitative results of the proposed struck-out and non-struck-out component detection are illustrated in Fig. 6 and Fig. 7, where one can see that the proposed model fixes bounding boxes for struck-out and non-struck-out words properly. It is observed from Fig. 6 and Fig. 7 that the images contain different types of struck-out and non-struck-out components, which the proposed method detects successfully. This shows that the proposed text component detection step works well irrespective of the component type. In Fig. 7, a few components are missed by the component detection step because they are eliminated during binarization with the Otsu threshold. The quantitative results of the proposed text component detection step reported in Table 3 confirm that the proposed component detection works well for multiple struck-out components in the image.

Table 3 shows that the proposed method achieves promising Precision, Recall and F-score for text component detection. Since the precision is higher than the recall, one can infer that the proposed text component detection step is stable and reliable in different situations.

Fig. 6. The results of component detection by the proposed method

Table 3. Evaluating the component detection step of the proposed method.

Precision  Recall  F-score
0.96       0.93    0.94

4.4 Experiments on Struck-Out and Non-struck-Out Component Classification

Qualitative results of the proposed classification are shown in Fig. 8 and Fig. 9, where it can be seen that our method correctly classifies struck-out components, marked by red rectangles, and non-struck-out components, marked by green rectangles. It is seen from Fig. 8 and Fig. 9 that the proposed classification approach works well for multiple types of struck-out words.


Fig. 7. One further sample result of component detection by the proposed method

This is confirmed by the quantitative results reported in Table 4, which show that the proposed method achieves the best average classification rate compared to the two existing methods. When we compare the results of the existing methods, Adak et al.'s method [11] is better than Chaudhuri et al.'s [1], which indicates that a CNN is better than handcrafted features for classifying struck-out and non-struck-out components. However, both existing approaches are worse than the proposed classification method in terms of ACR (Average Classification Rate). The main reason for the lower results of the two existing methods is that their scope is limited to the classification of a particular type of struck-out component. On the other hand, since the proposed classification method combines the advantages of handcrafted features and deep learning, it performs best among the compared methods. Sometimes, when non-struck-out words contain only one or two characters with some noise, as shown in the samples in Fig. 10(a), there is a high chance of misclassification due to the loss of contextual information. At the same time, when struck-out components contain a mixture of non-struck-out components and there is no clear distinction between the cross-out line and the text, as shown in Fig. 10(b), the proposed method gets confused and hence produces erroneous results.


Table 4. Comparative study with the existing methods for struck-out and non-struck-out handwritten word classification. ACR: Average Classification Rate in (%).

                 Proposed                     Chaudhuri et al. [1]         Adak et al. [11]
Classes          Non-struck-out  Struck-out   Non-struck-out  Struck-out   Non-struck-out  Struck-out
Non-struck-out   89.0            11.0         72.0            28.0         76.0            24.0
Struck-out       15.0            85.0         25.0            75.0         12.0            88.0
ACR              87.0                         73.0                         82.0

Fig. 8. Example of struck-out and non-struck-out component classification by the proposed method. Red boxes denote struck-out.

In Fig. 10(b), we can observe that some of the components have been classified correctly as non-struck-out while others have been misclassified. To overcome this problem, contextual information needs to be extracted at the pixel level rather than at the character or word level. This is beyond the scope of the proposed work and can be considered in the future.


Fig. 9. One more example of struck-out and non-struck-out component classification by the proposed method

Fig. 10. Examples of the limitations of the proposed method: (a) misclassification of non-struck-out components by the proposed method; (b) erroneous results of the proposed classification method


5 Conclusions and Future Work

In this work, we have proposed a new model for the classification of struck-out and non-struck-out handwritten components, which combines a connected component analysis approach for text component detection with deep learning for classification. For text component detection, the proposed method explores the stroke width of characters and morphological operations. For classifying struck-out and non-struck-out text components, we have explored DenseNet with a modified classification head. Experimental results on text component detection and classification show that the proposed method works well for multiple types of struck-out words. A comparative study shows that the proposed method outperforms the existing methods in terms of average classification rate. However, in some cases, as discussed in the experimental section, when a word contains only one or two characters, the performance of the proposed method degrades due to the loss of contextual information. This requires further investigation of the proposed method, which will be our future work.

References

1. Chaudhuri, B.B., Adak, C.: An approach for detecting and cleaning of struck-out handwritten text. Pattern Recogn. 61, 282–294 (2017)
2. Brink, A., Klauw, H.V.D., Schomaker, L.: Automatic removal of crossed-out handwritten text and the effect on writer verification and identification. In: Proceedings of SPIE (2008)
3. Shivakumara, P., Pal, U., Lu, T., Chakarborti, T., Blumenstein, M.: A new roadmap for evaluating descriptive handwritten answer script. In: Proceedings of ICPRAI, pp. 83–96 (2019)
4. Navya, B.J., et al.: Multi-gradient directional features for gender identification. In: Proceedings of ICPR, pp. 3657–3662 (2018)
5. Raghunandan, K.S., et al.: Fourier coefficients for fraud handwritten document identification through age analysis. In: Proceedings of ICFHR, pp. 25–30 (2016)
6. Basavaraj, V., Shivakumara, P., Guru, D.S., Pal, U., Lu, T., Blumenstein, M.: Age estimation using disconnectedness features in handwriting. In: Proceedings of ICDAR, pp. 1131–1136 (2019)
7. Nag, S., Shivakumara, P., Wu, Y., Pal, U., Lu, T.: New COLD feature based handwriting analysis for ethnicity/nationality identification. In: Proceedings of ICFHR, pp. 523–527 (2018)
8. Nandanwar, L., et al.: A new method for detecting altered text in document images. In: Proceedings of ICPRAI, pp. 93–108 (2020)
9. Kundu, S., Shivakumara, P., Grouver, A., Pal, U., Lu, T., Blumenstein, M.: A new forged handwriting detection method based on Fourier spectral density and variation. In: Proceedings of ACPR, pp. 136–150 (2019)
10. Adak, C., Chaudhuri, B.B.: An approach of strike-out text identification from handwritten documents. In: Proceedings of ICFHR, pp. 643–648 (2014)
11. Adak, C., Chaudhuri, B.B., Blumenstein, M.: Impact of struck-out text on writer identification. In: Proceedings of IJCNN, pp. 1465–1471 (2017)
12. Nisa, H., Thom, J.A., Ciesielski, V., Tennakoon, R.: A deep learning approach to handwritten text recognition in the presence of struck-out text. In: Proceedings of IVCNZ (2019)
13. Qi, Y., Huang, W.R., Li, Q., DeGange, J.L.: DeepErase: weakly supervised ink artifact removal in document text images. In: Proceedings of WACV, pp. 3511–3519 (2020)


14. Bhattacharya, N., Frinken, V., Pal, U., Roy, P.P.: Overwriting repetition and crossing-out detection in online handwritten text. In: ACPR 2015, pp. 680–684 (2015)
15. Wan, Z., Yuxiang, Z., Gong, X.Z., Yu, B.: DenseNet model with RAdam optimization algorithm for cancer image classification. In: Proceedings of ICCECE, pp. 771–775 (2021)
16. Tong, W., Chen, W., Han, W., Li, X., Wang, L.: Channel-attention-based DenseNet network for remote sensing image scene classification. IEEE Trans. AEORS 13, 4121–4132 (2020)
17. Marti, U., Bunke, H.: The IAM-database: an English sentence database for off-line handwriting recognition. IJDAR 5, 39–46 (2002)
18. Frinken, V., Fischer, A., Manmatha, R., Bunke, H.: A novel word spotting method based on recurrent neural networks. IEEE Trans. PAMI 34, 211–224 (2012)

Contextualized Knowledge Base Sense Embeddings in Word Sense Disambiguation

Mozhgan Saeidi(B), Evangelos Milios, and Norbert Zeh

Dalhousie University, Halifax, Canada
[email protected], {eem,nzeh}@cs.dal.ca

Abstract. Contextualized sense embeddings have been shown to carry useful semantic information that improves the results of various Natural Language Processing tasks. However, it is still challenging to integrate them with the information of a knowledge base, which is a shortcoming of current state-of-the-art representations. This integration is helpful in NLP tasks, specifically in the lexical ambiguity problem. In this paper, we present C-KASE (Contextualized-Knowledge base Aware Sense Embedding), a novel approach to producing sense embeddings for the lexical meanings within a lexical knowledge base. The novel difference of our representation is the integration of the knowledge base information and the input text. This representation lies in a space that is comparable to that of contextualized word vectors. C-KASE representations enable a simple 1-Nearest-Neighbour algorithm to perform as well as state-of-the-art models in the English Word Sense Disambiguation task. Since this embedding is specific to each individual knowledge base, it also performs well in other similar tasks, i.e., Wikification and Named Entity Recognition. The results of comparing our method with recent state-of-the-art methods show the efficiency of our method.

Keywords: Sense embedding · Representation learning · Word sense disambiguation · Pre-trained language models

1 Introduction

Word representations have been shown to play an important role in different Natural Language Processing (NLP) tasks [21,35], especially in disambiguation tasks [37]. Embeddings based on pre-trained deep language models have attracted much interest recently, as they have proved to be superior to classical embeddings for several NLP tasks, including Word Sense Disambiguation (WSD). These models, e.g., ELMO [27], BERT [6], and XLNET [43], encode several pieces of linguistic information in their word representations. These representations differ from static neural word embeddings [26] in that they depend on the surrounding context of the word. This difference makes these vector representations especially interesting for disambiguation, where effective contextual representations can be highly beneficial for resolving lexical ambiguity.

These representations have enabled sense-annotated corpora to be exploited more efficiently [14]. One important factor in the text ambiguity problem is the knowledge base. In WSD with Wikipedia as the knowledge base, we face the problem of link ambiguity: a phrase can usually be linked to more than one Wikipedia page, and the correct link depends on the context in which it occurs. For example, the word "bar" can be linked to different articles, depending on whether it was used in a business or musical context. In the next section of this study, we give an overview of current approaches to text embedding, focusing on contextualized sense representations, and of the disambiguation methods most used in the literature. Our novel contribution is a new representation learning approach that uses the context of the input text and the context of the knowledge base, together with a nearest-neighbour heuristic, to disambiguate ambiguous words. We finally compare the performance of our proposed approach, with our representations, against the most recent methods on the disambiguation task.

The text ambiguity problem arises from the difficulty of picking the right meaning of a word with multiple meanings. This task is easy for humans, especially when one considers the surrounding words; a human reader identifies the correct meaning of each word based on the context in which the word is used. Computational methods try to mimic this approach [9]. These methods often represent their output by linking each word occurrence to an explicit representation of the chosen sense [42]. Two approaches to tackle this problem are the machine learning-based approach and the knowledge-based approach [37]. In the machine learning-based approach, systems are trained to perform the task [35]. The knowledge-based approach requires external lexical resources such as Wikipedia, WordNet [17], a dictionary, or a thesaurus. Machine learning-based approaches mainly focus on achieving maximal precision or recall and have drawbacks in run-time and space requirements at classifier training time [5]. So, knowledge-based disambiguation methods are still worth studying. Among the knowledge-based methods, the coherence-based approach has been particularly effective [7]. In the coherence-based approach, one important factor is the coherence of the whole text after disambiguation, while in other approaches this factor may change to the coherence of each sentence or paragraph. It is a significant challenge to perform disambiguation accurately but also fast enough to process long text documents [15]. Knowledge bases differ in nature [3]; for example, WordNet is a lexical graph database of semantic relations (e.g., synonyms, hyponyms, and meronyms) between words. Synonyms are grouped into synsets with short definitions and usage examples. WordNet can thus be seen as a combination and extension of a dictionary and thesaurus [4]. Wikipedia is a hyperlink-based graph between encyclopedia entries.


The task of "Entity Recognition and Disambiguation" (ERD) is to identify mentions of entities (a mention can be one or more tokens) and link them to a relevant entry in an external knowledge base; this task is also known under the names "Entity Linking", "Wikification" or, more generally, "Text Annotation" [1]. Given an input document, a Wikifier links the entities of the document to the most relevant corresponding Wikipedia pages. Automated document annotation has become an important topic due to the vast growth in Natural Language Processing (NLP) applications [40]. One benefit of document annotation is enhancing text readability and unambiguousness by inserting connections between the text and an external knowledge base, like Wikipedia, which is the most popular online encyclopedia [32]. The problem of Wikification is closely related to other core NLP tasks, such as Named-Entity Recognition (NER) and Word Sense Disambiguation (WSD) [39]. Wikification, in particular, is the task of associating a word in context with the most suitable meaning from the predefined sense inventory of Wikipedia. Named-Entity Recognition involves identifying occurrences of noun phrases as belonging to particular categories of named entities [24]. These expressions refer to names, including person, organization, and location names, and numeric expressions including time, date, money, and percent expressions [12]. In Word Sense Disambiguation, the knowledge base is not limited to Wikipedia, which is the difference between Wikification and WSD [13]; in WSD, the knowledge base is a repository like WordNet or Wikipedia [11]. The details and performance of each WSD method are highly dependent on the knowledge base it links to.

2 Related Work

In Natural Language Processing (NLP), Word Sense Disambiguation (WSD) is one of the main tasks. This task is at the core of lexical semantics and has been tackled with many different approaches. We divide these approaches into two categories: knowledge-based and supervised approaches [21].

2.1 Knowledge-Based Approaches

Knowledge-based methods use the structure of a semantic network, e.g., Wikipedia [10], WordNet [17], or BabelNet [23], to find the correct meaning of each input word based on its context [20]. These approaches employ graph algorithms to address word ambiguity in texts [2]. Disambiguation based on Wikipedia has been demonstrated to be comparable in terms of coverage to domain-specific ontologies [41], since Wikipedia has broad coverage, with documents about entities in a variety of domains. The most widely used lexical knowledge base is WordNet, although it is restricted to the English lexicon, limiting its usefulness for other vocabularies. BabelNet solves this challenge by combining lexical and semantic information from various sources in numerous languages, allowing knowledge-based approaches to scale across all languages it supports.


Despite their potential to scale across languages, knowledge-based techniques on English fall short of supervised systems in terms of accuracy.

2.2 Supervised Approaches

Supervised approaches surpass knowledge-based ones on all English datasets. These approaches use neural architectures or SVM models, while still suffering from the need for large manually-curated corpora, which limits their ability to scale to unseen words [25]. Automatic data augmentation approaches have been developed to cover more words, senses, and languages.

2.3 Language Modelling Representation

Most NLP tasks now use semantic representations derived from language models. There are static word embeddings and contextual embeddings. This section covers the aspects of word and contextual embeddings that are especially important to our work.

Static Word Embeddings. Word embeddings are distributional semantic representations usually trained with one of two goals: predict the context words given a target word (Skip-Gram), or the inverse (CBOW) [16]. In both, the target word is at the center, and the context is a fixed-length window that slides over tokenized text. These models produce dense word representations. One limitation of these word embeddings, as mentioned before, is the conflation of meanings within a word type. This limitation affects their capability in tasks that are sensitive to context [34].

Contextual Word Embeddings. The limitation mentioned for static word embeddings is solved by this type of embedding. The critical difference is that contextual embeddings are sensitive to the context, which allows the same word type to have different representations according to its context. The first work on contextual embeddings is ELMO [27], which was followed by BERT [6] as the state-of-the-art model. The critical feature of BERT, which sets it apart, is the quality of its representations; its results come from task-specific fine-tuning of pre-trained neural language models. The recent representations whose effectiveness we analyze are based on these two models [28,29].

3 Preliminaries

Our new C-KASE representations use different resources to build their vectors. In this section, we provide information on these resources.

Wikipedia is the largest electronic encyclopedia freely available on the Web. Wikipedia organizes its information in articles called Wikipedia pages.


Disambiguation based on Wikipedia has been demonstrated to be comparable in terms of coverage to domain-specific ontologies [41], since it has broad coverage with documents about entities in a variety of domains [15]. Moreover, Wikipedia has unique advantages over the majority of other knowledge bases, which include [44]:
– The text in Wikipedia is primarily factual and available in a variety of languages.
– Articles in Wikipedia can be directly linked to the entities they describe in other knowledge bases.
– Mentions of entities in Wikipedia articles often provide a link to the relevant Wikipedia pages, thus providing labeled examples of entity mentions and associated anchor texts in various contexts, which can be used for supervised learning in WSD with Wikipedia as the knowledge base.

BabelNet is a multilingual semantic network, which comprises information coming from heterogeneous resources, such as WordNet and Wikipedia [23]. It is organized into synsets, i.e., sets of synonyms that express a single concept, which in turn are connected to each other by different types of relations. One of BabelNet's features that is useful for our representation is its hypernym-hyponym relations: each concept is connected to other concepts via hypernym relations (for generalization) and via hyponym relations (for specification). The semantically-related relation is another feature that we use, expressing a general notion of relatedness between concepts. The last feature of BabelNet used in this work is its mapping of concepts to Wikipedia pages.

WordNet is the most widely used lexical knowledge repository for English. It can be seen as a graph, with nodes representing concepts (synsets) and edges representing semantic relationships between them. Each synset has a set of synonyms, such as the lemmas spring, fountain, and natural spring in the synset "a natural flow of groundwater".

SemCor is the typical manually-curated corpus for WSD, with about 220K words tagged with 25K distinct WordNet meanings, resulting in annotated contexts for around 15% of WordNet synsets.

BERT is a Transformer-based language model for learning contextual representations of words in a text. The contextualized representation of BERT is the key factor that has changed the performance in many NLP tasks, such as the text ambiguity task. In our representations, we use BERT-base-cased to generate the vectors of each sense [6].

SBERT is a modification of the pre-trained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. We use this sentence representation when generating the vector representations of sense sentences, both in the input text and in the knowledge base text.

4 C-KASE

This section presents C-KASE, our novel knowledge-based approach for creating sense representations of BabelNet senses. C-KASE is created by combining semantic and textual information from the first paragraph of each sense's Wikipedia page and the paragraph of the input document that includes the concept. C-KASE uses the representation power of neural language models, i.e., BERT and SBERT. We divide our approach into the following steps.

4.1 Context Retrieval

In this step, we collect suitable contextual information from Wikipedia for each given concept in the semantic network. Similar to [36], we exploit the mapping between synsets and Wikipedia pages available in BabelNet, as well as its taxonomic structure, to collect textual information that is relevant to a target synset s. For each synset s, we collect all the concepts connected to s through hyponym and hypernym connections of the BabelNet knowledge base. We denote this set of synsets related to s by R_s:

R_s = \{ s' \mid (s, s') \in E \}

where, similar to [36], E is the set of all hyponym and hypernym connections. For each page p_s, we consider the opening paragraph of the page and compute its lexical vector by summing the SBERT vector representations of the sentences in this first paragraph. These lexical representations are later used to compute a similarity score between p_s and p_{s'}, for each s' ∈ R_s, using the weighted overlap measure from [30], which is defined as follows:

WO(p_1, p_2) = \frac{\sum_{w \in O} \left( r_w^{p_1} + r_w^{p_2} \right)^{-1}}{\sum_{i=1}^{|O|} (2i)^{-1}}

where O is the set of overlapping dimensions of p_1 and p_2, and r_w^{p_i} is the rank of the word w in the lexical vector of p_i. We preferred the weighted overlap over the more common cosine similarity, as it has proven to perform better when comparing sparse vector representations [30]. Similar to [36], once we have scored all the (p_s, p_{s'}) pairs, we create partitions of R_s, each comprising all the senses s' connected to s with the same relation r, where r is either hypernymy or hyponymy. We then retain from each partition only the top-k scored senses according to WO(p_s, p_{s'_i}), with k = 15 in our experiments.
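The weighted overlap measure can be sketched as follows, assuming each lexical vector is given as a dictionary mapping words to weights, from which ranks are derived by sorting the weights in decreasing order; the function and variable names are ours.

```python
def weighted_overlap(p1, p2):
    """Weighted overlap between two sparse lexical vectors (dicts of word -> weight)."""
    def ranks(vec):
        # Rank 1 for the highest-weighted word, 2 for the next, and so on.
        ordered = sorted(vec, key=vec.get, reverse=True)
        return {w: i + 1 for i, w in enumerate(ordered)}

    r1, r2 = ranks(p1), ranks(p2)
    overlap = set(p1) & set(p2)                       # the set O of shared dimensions
    if not overlap:
        return 0.0
    num = sum(1.0 / (r1[w] + r2[w]) for w in overlap)
    den = sum(1.0 / (2 * i) for i in range(1, len(overlap) + 1))  # best possible score
    return num / den
```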

4.2 Word Embedding

In the second step, we use BERT to represent the given concepts from the input text. For each ambiguous word of the input, which we call a mention, we extract the BERT representation of the mention. Using the BabelNet relations of hyponymy and hypernymy, we extract all synsets of the mention from BabelNet (the set E).

For each of these senses, using the link structure of BabelNet and Wikipedia, we collect the corresponding Wikipedia pages. We use the BERT representation a second time to generate a vector representation for each sense. In our settings, each word is represented as a 300-dimensional vector, corresponding to the BERT dimension used.
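For illustration, the sketch below extracts a contextual vector for a mention with the HuggingFace transformers library, summing the last four hidden layers (as described in Sect. 5) and averaging over the mention's sub-tokens. The model name, the simple sub-token matching, and the pooling details are illustrative assumptions, not the exact implementation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)

def mention_embedding(sentence, mention):
    """Contextual embedding of `mention` inside `sentence`: sum of the last 4 layers,
    averaged over the mention's sub-tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states            # tuple of layer outputs
    layers = torch.stack(hidden[-4:]).sum(dim=0)[0]    # (seq_len, hidden_size)
    # Locate the mention's sub-token positions with a simple substring-based match.
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    positions = [i for i, t in enumerate(tokens)
                 if t.lstrip("#").lower() in mention.lower()]
    return layers[positions].mean(dim=0) if positions else layers.mean(dim=0)
```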

4.3 Sense Embedding

In this step, we build the final representation of each concept. From the previous step, we have the representation of the mention, R(m), and the representation of each of its senses; we denote the representation of the i-th of the k senses of m by R(s_i), where i varies from 1 to k. Our representations combine the mention representation with the sense representation by concatenating the two vectors R(m) and R(s_i). If mention m has k senses, C-KASE generates k different representations R(m, s_1), R(m, s_2), ..., R(m, s_k). Since the dimension of R(m) and of each R(s_i) is 300, each concatenated representation has dimension 600. The next novelty in our C-KASE representations is ranking the k senses of each mention based on their degree of relevance to the context. To this aim, we concatenate the representations from the first step: the representation of the input text paragraph containing the ambiguous mention, denoted R(PD) (Paragraph of the input Document), and the representation of the first paragraph of the Wikipedia page, denoted R(PW) (first Paragraph of the Wikipedia page). We concatenate these two representations as R(PD, PW). The dimension of this concatenated representation is equal to that of the word-level representations, which makes it possible to compute their cosine similarities. To rank the senses most related to the context, we use the cosine similarity as follows:

Sim(m, s_i) = Cosine(R(m, s_i), R(PD, PW)), \quad i = 1, \ldots, k

This ranking provides, for each mention, the sense most similar to the context. This novelty makes our representation more effective than previous contextualized embeddings, especially for sense disambiguation. At the end of these three steps, each sense is associated with a vector that encodes both the contextual information and the knowledge base semantic information from the extracted Wikipedia context and its gloss.
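A sketch of this ranking step, assuming the mention vector R(m), the candidate sense vectors R(s_i), and the paragraph vectors R(PD) and R(PW) have already been computed as NumPy arrays; the function names are ours.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_senses(r_m, sense_vectors, r_pd, r_pw):
    """Rank the candidate senses of a mention by Sim(m, s_i).

    r_m           : R(m), contextual vector of the mention
    sense_vectors : {sense_id: R(s_i)} for the k candidate senses
    r_pd, r_pw    : R(PD) and R(PW), paragraph vectors of the document and the Wikipedia page
    """
    context = np.concatenate([r_pd, r_pw])             # R(PD, PW)
    scored = []
    for sense_id, r_s in sense_vectors.items():
        pair = np.concatenate([r_m, r_s])              # R(m, s_i)
        scored.append((sense_id, cosine(pair, context)))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# The top-ranked sense is taken as the meaning of the mention in this context.
```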

5 Experimental Setup

We present the settings of our evaluation of C-KASE on the English WSD task. This setup includes the benchmark, the C-KASE setup for the disambiguation task, and the state-of-the-art WSD models used as comparison systems.

Evaluation Benchmark. We use the English WSD evaluation framework, which is constructed from five standard evaluation benchmark datasets (http://lcl.uniroma1.it/wsdeval/).


It consists of Senseval-2 [8], Senseval-3 [38], SemEval-07 [31], SemEval-13 [22], and SemEval-15 [19], along with ALL, i.e., the concatenation of all the test sets [33].

C-KASE Setup. In our experiments, we use the BERT pre-trained cased model. Similar to [36], among all the configurations reported by Devlin et al. (2019), we used the sum of the last four hidden layers as the contextual embeddings of the words, since this was shown to perform better. In order to be able to compare our system with supervised models, we also build a supervised version of our C-KASE representations. This version combines the gloss and contextual information with the sense-annotated contexts in SemCor [18], a corpus of 40K sentences in which words have been manually annotated with WordNet meanings. We leveraged SemCor to build a representation of each sense therein. To this end, following [27], given a mention-sense pair (m, s), we collected all the sentences c_1, ..., c_n where m appears tagged with s. Then, we fed all the retrieved sentences into BERT and extracted the embeddings BERT(c_1, m), ..., BERT(c_n, m). The final embedding of s was built by concatenating the average of its context and sense gloss vectors with its representation coming from SemCor, i.e., the average of BERT(c_1, m), ..., BERT(c_n, m). When a sense did not appear in SemCor, we built its embedding by replacing the SemCor part of the vector with its sense gloss representation.

WSD Model. For WSD modeling, we employed a 1-nearest-neighbour approach, as previous methods in the literature do, to test our representations on the WSD task. For each target word m in the test set, we computed its contextual embedding by means of BERT and compared it against the C-KASE embeddings associated with the senses of m. We then took as the prediction for the target word the sense corresponding to its nearest neighbour. Note that the embeddings produced by C-KASE are created by concatenating two BERT representations, i.e., context and sense gloss (see Sect. 4.3), hence we repeated the BERT embedding of the target instance to match the number of dimensions.

Comparison Systems. We compared our representation against the best recent systems evaluated on the English WSD task. LMMS generates sense embeddings with full coverage of WordNet; it uses pre-trained ELMO and BERT models, as well as the relations in a lexical knowledge base, to create contextual embeddings [14]. SensEmBERT relies on different resources for building sense vectors, including Wikipedia, BabelNet, NASARI lexical vectors, and BERT. It computes context-aware representations of BabelNet senses by combining the semantic and textual information derived from multilingual resources, using the BabelNet mapping between WordNet senses and Wikipedia pages, which removes the need for sense-annotated corpora [36]. The next comparison system is ARES, a semi-supervised approach that produces sense embeddings for all the word senses in a language vocabulary; ARES compensates for the lack of manually annotated examples for a large portion of word meanings and is, to our knowledge, the most recent contextualized sense embedding system. In our comparisons, we also considered BERT as a comparison system, since it is at the core of all the considered methods.

BERT has also shown good performance in most NLP tasks by using pre-trained neural networks.
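A minimal sketch of the 1-nearest-neighbour prediction used by the WSD model above, assuming a dictionary of C-KASE sense embeddings restricted to the senses of the target lemma and a BERT embedding of the target word in context; repeating the target embedding to match the concatenated dimensionality follows the description in the text.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def predict_sense(target_embedding, candidate_senses):
    """1-NN WSD: pick the candidate sense whose C-KASE embedding is closest
    to the target word's contextual embedding.

    target_embedding : BERT embedding of the target word in context
    candidate_senses : {sense_id: ckase_embedding} for the senses of the target lemma
    """
    # C-KASE vectors concatenate two BERT-sized parts, so the target embedding
    # is repeated to match their dimensionality.
    query = np.concatenate([target_embedding, target_embedding])
    return max(candidate_senses, key=lambda s: cosine(query, candidate_senses[s]))
```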

6 Results

The results of our evaluation on the WSD task are presented in this section. We show the effectiveness of the C-KASE representation by comparing it with existing state-of-the-art models on the standard WSD benchmarks. In Table 1 we report the results of C-KASE and compare them against the results obtained by other state-of-the-art approaches on all the nominal instances of the test sets in the framework of [33]. All performances are reported in terms of F1-measure, i.e., the harmonic mean of precision and recall. As we can see, C-KASE achieves the best results on most of the datasets when compared to the previous contextualized approaches, which indicates that C-KASE is competitive with these models. These results show that the novel idea behind the C-KASE representation improves lexical disambiguation, and they indicate how strongly the WSD task depends on a representation that is aware of both the context and the information extracted from the reference knowledge base.

Table 1. F-Measure performance of the WSD evaluation framework on the test sets of the unified dataset.

Model        Senseval-2  Senseval-3  Semeval-7   Semeval-13  Semeval-15  All
BERT         77.1 ± 0.3  73.2 ± 0.4  66.1 ± 0.3  71.5 ± 0.2  74.4 ± 0.3  73.8 ± 0.3
LMMS         76.1 ± 0.6  75.5 ± 0.2  68.2 ± 0.4  75.2 ± 0.3  77.1 ± 0.4  75.3 ± 0.2
SensEmBERT   72.4 ± 0.1  69.8 ± 0.2  60.1 ± 0.4  78.8 ± 0.1  75.1 ± 0.2  72.6 ± 0.3
ARES         78.2 ± 0.3  77.2 ± 0.1  71.1 ± 0.2  77.2 ± 0.2  83.1 ± 0.2  77.8 ± 0.1
C-KASE       79.6 ± 0.2  78.5 ± 0.2  74.6 ± 0.3  79.3 ± 0.6  82.9 ± 0.4  78.9 ± 0.1

We also evaluate the effectiveness of our representation across parts of speech. The parts of speech in the dataset are nouns, verbs, adjectives, and adverbs; Table 2 shows the number of instances in each category. In our second evaluation, we examined the effect of our representation, compared to previous ones, on each word category. Table 3 reports the F-Measure performance of 1-NN WSD for each of the contextualized word embeddings we considered on the ALL dataset, split by part of speech.


Table 2. The number of instances and ambiguity level of the concatenation of all five WSD datasets [33].

            Nouns  Verbs  Adj.  Adv.  All
#Entities   4300   1652   955   346   7253
Ambiguity   4.8    10.4   3.8   3.1   5.8

Table 3. F-Measure performance of the 1-NN WSD of each embedding on All dataset split by parts of speech. The dataset in this experiment is a concatenation of all five datasets, which is split by Part-of-Speech tags.

Model        Nouns       Verbs       Adjectives  Adverbs
BERT         76.2 ± 0.2  62.9 ± 0.5  79.7 ± 0.2  82.9 ± 0.3
LMMS         78.2 ± 0.6  64.1 ± 0.3  81.3 ± 0.1  86.4 ± 0.2
SensEmBERT   77.8 ± 0.3  63.4 ± 0.5  80.1 ± 0.4  85.5 ± 0.5
ARES         78.7 ± 0.1  67.3 ± 0.2  82.6 ± 0.3  87.1 ± 0.4
C-KASE       79.6 ± 0.2  69.6 ± 0.1  85.2 ± 0.1  89.3 ± 0.5

7 Conclusion

In this paper, we have presented C-KASE, a novel approach for creating sense embeddings that considers both the knowledge base and the context of the input document text. We showed that this context-rich representation is beneficial for lexical disambiguation in English. The results of the experiments on the WSD task show the efficiency of C-KASE representations in comparison with other state-of-the-art methods, despite relying on English data only. The results across the different datasets show the high quality of our embeddings, which enable English WSD while at the same time relieving the heavy requirement for sense-annotated corpora. We further tested our embeddings on the data split into four parts of speech. As future work, we plan to extend our approach to cover multiple languages. Another way to improve our representations for the text ambiguity task is to train the model with data containing more verbs than the current dataset: as the results of our second experiment show, the effectiveness of contextualized embeddings for WSD on verbs is not as good as on nouns, which is due to the lack of verb instances in the dataset.

References 1. Aghaebrahimian, A., Cieliebak, M.: Named entity disambiguation at scale. In: Schilling, F.-P., Stadelmann, T. (eds.) ANNPR 2020. LNCS (LNAI), vol. 12294, pp. 102–110. Springer, Cham (2020). https://doi.org/10.1007/978-3030-58309-5 8, https://digitalcollection.zhaw.ch/bitstream/11475/21530/3/2020 Aghaebrahimian-Cieliebak Named-entity-disambiguation-at-scale.pdf


2. Agirre, E., de Lacalle, O.L., Soroa, A.: Random walks for knowledge-based word sense disambiguation. Comput. Linguist. 40(1), 57–84 (2014). https://direct.mit. edu/coli/article/40/1/57/145 3. Aleksandrova, D., Drouin, P., Lareau, F.C.C.O., Venant, A.: The multilingual automatic detection of ’e nonc ´e s bias ’e s in wikip ´e dia. ACL (2020). https://www. aclweb.org/anthology/R19-1006.pdf 4. Azad, H.K., Deepak, A.: A new approach for query expansion using wikipedia and wordnet. Inf. Sci. 492, 147–163 (2019). https://www.sciencedirect.com/science/ article/pii/S0020025519303263 5. Calvo, H., Rocha-Ram´ırez, A.P., Moreno-Armend´ ariz, M.A., Duchanoy, C.A.: Toward universal word sense disambiguation using deep neural networks. IEEE Access 7, 60264–60275 (2019). https://ieeexplore.ieee.org/abstract/document/ 8706934 6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: North American Association for Computational Linguistics (NAACL) (2018). https://www.aclweb.org/ anthology/N19-1423/ 7. Dixit, V., Dutta, K., Singh, P.: Word sense disambiguation and its approaches. CPUH-Res. J. 1(2), 54–58 (2015). http://www.cpuh.in/academics/pdf 8. Edmonds, P., Cotton, S.: SENSEVAL-2: overview. In: Proceedings of SENSEVAL2 Second International Workshop on Evaluating Word Sense Disambiguation Systems, pp. 1–5. Association for Computational Linguistics, Toulouse, France, July 2001. https://www.aclweb.org/anthology/S01-1001.pdf 9. Ferreira, R.S., Pimentel, M.D.G., Cristo, M.: A wikification prediction model based on the combination of latent, dyadic, and monadic features. IST 69(3), 380–394 (2018). https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/asi.23922 10. Fogarolli, A.: Word sense disambiguation based on wikipedia link structure. In: 2009 IEEE International Conference on Semantic Computing, pp. 77–82. IEEE (2009). https://ieeexplore.ieee.org/stamp/stamp.jsp 11. Kwon, S., Oh, D., Ko, Y.: Word sense disambiguation based on context selection using knowledge-based word similarity. Inf. Process. Manag. 58(4), 102551 (2021). https://www.sciencedirect.com/science/article/pii/S0306457321000558 12. Li, B.: Named entity recognition in the style of object detection. arXiv preprint arXiv:2101.11122 (2021). https://arxiv.org/pdf/2101.11122.pdf 13. Logeswaran, L., Chang, M.W., Lee, K., Toutanova, K., Devlin, J., Lee, H.: Zeroshot entity linking by reading entity descriptions. arXiv preprint arXiv:1906.07348 (2019). https://arxiv.org/pdf/1906.07348.pdf 14. Loureiro, D., Jorge, A.: Language modelling makes sense: propagating representations through wordnet for full-coverage word sense disambiguation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 5682–5691 (2019). https://www.aclweb.org/anthology/P19-1569 15. Martinez-Rodriguez, J.L., Hogan, A., Lopez-Arevalo, I.: Information extraction meets the semantic web: a survey. Semantic Web Preprint, pp. 1–81 (2020). http://repositorio.uchile.cl/bitstream/handle/2250/174484/Informationextraction-meets-the-Semantic-Web.pdf?sequence=1 16. Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of ICLR, vol. 4, pp. 321–329 (2013). https://arxiv.org/pdf/1301.3781.pdf 17. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to wordnet: An on-line lexical database. Int. J. Lexicography 3(4), 235–244 (1990). 
https://watermark.silverchair.com/235.pdf


18. Miller, G.A., Leacock, C., Tengi, R., Bunker, R.T.: A semantic concordance. In: Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, 21–24 March, 1993 (1993). https://www.aclweb.org/anthology/H93-1061/ 19. Moro, A., Navigli, R.: SemEval-2015 task 13: multilingual all-words sense disambiguation and entity linking. In: SEM, pp. 288–297. Association for Computational Linguistics, Denver, Colorado, June 2015. https://www.aclweb.org/ anthology/S15-2049.pdf 20. Moro, A., Raganato, A., Navigli, R.: Entity linking meets word sense disambiguation: a unified approach. Trans. Assoc. Comput. Linguist. 2, 231–244 (2014). https://watermark.silverchair.com/tacl a 00179.pdf 21. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. (CSUR) 41(2), 1–69 (2009). https://dl.acm.org/doi/abs/10.1145/1459352.1459355 22. Navigli, R., Jurgens, D., Vannella, D.: SemEval-2013 task 12: multilingual word sense disambiguation. In: SEM. Association for Computational Linguistics, Atlanta, Georgia, USA, June 2013. https://www.aclweb.org/anthology/S13-2040. pdf 23. Navigli, R., Ponzetto, S.P.: Babelnet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193, 217–250 (2012). https://www.sciencedirect.com/science/article/pii/ S0004370212000793 24. Nguyen, D.B., Hoffart, J., Theobald, M., Weikum, G.: Aida-light: high-throughput named-entity disambiguation. LDOW 14, 22–32 (2014). http://ceur-ws.org/Vol1184/ldow2014 paper 03.pdf 25. Pasini, T., Elia, F.M., Navigli, R.: Huge automatically extracted training sets for multilingual word sense disambiguation. arXiv preprint arXiv:1805.04685 (2018). https://arxiv.org/abs/1805.04685 26. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543. EMNLP, Qatar (2014). https://www.aclweb. org/anthology/D14-1162.pdf 27. Peters, M., et al.: Deep contextualized word representations. Association for Computational Linguistics, pp. 2227–2237 (2018). https://www.aclweb.org/anthology/ N18-1202 28. Peters, M.E., Logan IV, R.L., Schwartz, R., Joshi, V., Singh, S., Smith, N.A.: Knowledge enhanced contextual word representations. arXiv preprint arXiv:1909.04164 (2019). https://arxiv.org/pdf/1909.04164.pdf 29. Peters, M.E., Neumann, M., Zettlemoyer, L., Yih, W.T.: Dissecting contextual word embeddings: architecture and representation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1499–1509 (2018). https://www.aclweb.org/anthology/D18-1179/ 30. Pilehvar, M.T., Jurgens, D., Navigli, R.: Align, disambiguate and walk: a unified approach for measuring semantic similarity. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1341–1351 (2013). https://www.aclweb.org/anthology/P13-1132.pdf 31. Pradhan, S., Loper, E., Dligach, D., Palmer, M.: SemEval-2007 task-17: English lexical sample, SRL and all words. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 87–92. Association for Computational Linguistics, Prague, Czech Republic, June 2007. https://www. aclweb.org/anthology/S07-1016

186

M. Saeidi et al.

32. Raganato, A., Bovi, C.D., Navigli, R.: Automatic construction and evaluation of a large semantically enriched wikipedia. In: IJCAI, pp. 2894–2900 (2016). http:// wwwusers.di.uniroma1.it/∼navigli/pubs/IJCAI 2016 Raganatoetal.pdf 33. Raganato, A., Camacho-Collados, J., Navigli, R.: Word sense disambiguation: a unified evaluation framework and empirical comparison. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 99–110 (2017). https://www.aclweb.org/ anthology/E17-1010/ 34. Reisinger, J., Mooney, R.: Multi-prototype vector-space models of word meaning. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 109–117 (2010). https://www.aclweb.org/anthology/N10-1013.pdf 35. Saeidi, M., da Sousa, S.B.S., Milios, E., Zeh, N., Berton, L.: Categorizing online harassment on twitter. In: Cellier, P., Driessens, K. (eds.) ECML PKDD 2019. CCIS, vol. 1168, pp. 283–297. Springer, Cham (2020). https://doi.org/ 10.1007/978-3-030-43887-6 22, https://link.springer.com/chapter/10.1007/978-3030-43887-6 22 36. Scarlini, B., Pasini, T., Navigli, R.: Sensembert: context-enhanced sense embeddings for multilingual word sense disambiguation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8758–8765 (2020). https://ojs. aaai.org//index.php/AAAI/article/view/6402 37. Scarlini, B., Pasini, T., Navigli, R.: With more contexts comes better performance: contextualized sense embeddings for all-round word sense disambiguation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3528–3539 (2020). https://www.aclweb.org/anthology/ 2020.emnlp-main.285/ 38. Snyder, B., Palmer, M.: The English all-words task. In: Proceedings of SENSEVAL3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pp. 41–43. Association for Computational Linguistics, Barcelona, Spain, July 2004. https://www.aclweb.org/anthology/W04-0811 39. Wang, A., et al.: Superglue: a stickier benchmark for general-purpose language understanding systems. arXiv preprint arXiv:1905.00537 (2019) 40. Wang, Y., Wang, M., Fujita, H.: Word sense disambiguation: a comprehensive knowledge exploitation framework. Knowl. Based Syst. 43, 105–117 (2019). https://www.sciencedirect.com/science/article/pii/S0950705119304344 41. Weikum, G., Dong, L., Razniewski, S., Suchanek, F.: Machine knowledge: creation and curation of comprehensive knowledge bases. arXiv preprint arXiv:2009.11564 (2020). https://arxiv.org/pdf/2009.11564.pdf 42. West, R., Paranjape, A., Leskovec, J.: Mining missing hyperlinks from human navigation traces: a case study of wikipedia. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1242–1252 (2015). https://dl.acm.org/doi/ pdf/10.1145/2736277.2741666 43. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V.: Xlnet: generalized autoregressive pretraining for language understanding. Curran Associates, Inc., vol. 32, pp. 221–229 (2019). https://proceedings.neurips.cc/paper/ 2019/file/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf 44. Zhao, G., Wu, J., Wang, D., Li, T.: Entity disambiguation to wikipedia using collective ranking. Inf. Process. Manag. 52(6), 1247–1257 (2016). https://www. sciencedirect.com/science/article/pii/S0306457316301893

ICDAR 2021 Workshop on Open Services and Tools for Document Analysis (OST)

ICDAR-OST 2021 Preface

The ICDAR Workshop on Open Services and Tools for Document Analysis (ICDAR-OST) aims at promoting open tools, software, and open services (for processing, evaluation, or visualization), as well as facilitating public dataset usage, in the domain of document image analysis research, building on the experience of our community and of others. Such tools, software, services, formats, and datasets observe the principles of being reusable (I can use it on my data), transferable (I can use it on my premises), and reproducible (I can obtain the same results). The accepted contributions are presented during interactive pitch and demo sessions, enabling authors to advertise their work, identify potential issues and solutions in their approach, and ignite collaboration with other participants.

ICDAR-OST 2021 was a half-day online workshop comprising interactive pitch and demo sessions and a keynote speech by Marcel Gygli (Institut für Interaktive Technologien FHNW, Switzerland). For this third edition of the workshop, we received contributions from authors in a number of different countries, and each paper was reviewed by at least three experts from the field. We accepted two papers which deal with two different topics: platforms for visualizing documents and automatic generation of semi-structured data.

As the Program Committee chairs and organizers, we would like to warmly thank all the authors who submitted their work to the ICDAR-OST workshop and chose to release the fruit of their research as broadly as possible. Finally, we also thank the ICDAR conference for sponsorship and the International Association for Pattern Recognition for their endorsement.

September 2021

Lars Vögtlin Fouad Slimane Oussama Zayene Paul Märgner Ridha Ejbali

Organization

General Chairs

Fouad Slimane, University of Fribourg, Switzerland
Oussama Zayene, University of Applied Sciences Western Switzerland, Switzerland
Lars Vögtlin, University of Fribourg, Switzerland

Program Committee Chairs

Paul Märgner, University of Fribourg, Switzerland
Ridha Ejbali, National School of Engineers Gabes, Tunisia

Program Committee

Vlad Atanasiu, University of Fribourg, Switzerland
Christopher Kermorvant, Teklia, France
Maroua Mehri, University of Sousse, Tunisia
Nicholas Journet, Université de Bordeaux, France
Marcel Gygli, Institut für Interaktive Technologien FHNW, Switzerland
Mourad Zaied, University of Gabes, Tunisia
Günter Mühlberger, University of Innsbruck, Austria
Bertrand Coüasnon, IRISA, INSA Rennes, France
Josep Llados, Universitat Autònoma de Barcelona, Spain
Olfa Jemai, University of Gabes, Tunisia
Barsha Mitra, BITS Pilani, Hyderabad, India
Dorra Sellami, University of Sfax, Tunisia
Pascal Monasse, LIGM, Ecole des Ponts ParisTech, France
Jihad El-Sana, Ben-Gurion University of the Negev, Israel
Salwa Said, University of Gabes, Tunisia
Nouredine Boujnah, Waterford Institute of Technology, Ireland
Tahani Bouchrika, University of Gabes, Tunisia
Mohamedade Farouk Nanne, University of Nouakchott Al Aasriya, Mauritania
Mohamed Lemine Salihi, University of Nouakchott Al Aasriya, Mauritania
Nouha Arfaoui, University of Gabes, Tunisia

Automatic Generation of Semi-structured Documents

Djedjiga Belhadj, Yolande Belaïd, and Abdel Belaïd

Université de Lorraine-LORIA, Campus Scientifique, 54500 Vandoeuvre-lès-Nancy, France
{yolande.belaid,abdel.belaid}@loria.fr

Abstract. In this paper, we present a generator of semi-structured documents (SSDs). The generator can provide samples of administrative documents that are useful for training information extraction systems. It can also take care of the document annotation step, which is generally difficult and time-consuming to perform manually. We propose a general structure for SSDs and show that it works well on three SSD types: invoices, payslips and receipts. Both the content and the layout are managed by random variables, allowing them to be varied so as to obtain different samples. These documents share a certain resemblance, which gives them a common global model with particularities for each type. The generator outputs the documents in three formats: PDF, XML and TIFF image. We add an evaluation step to choose an adequate dataset for the learning process and to avoid overfitting. The current implementation (https://github.com/fairandsmart/facogen) can easily be extended to other SSD types. We use the generator's output to experiment with an information extraction system for SSDs.

Keywords: Semi-structured documents · Automatic generation · Random variables

1 Introduction

Document recognition using deep neural models requires many examples covering both structure and content. In addition, if one seeks to identify several types of information in a document, many more samples are necessary to relate the different forms of content and their context. Unfortunately, it is not often possible to gather such a mass of documents, and even when it is, these documents still have to be annotated correctly, which is very costly in time and money. This is why the current trend is to generate one's own documents artificially, yielding annotated documents that are ready for use.

Many solutions offer different deep learning architectures to generate document layouts. The LayoutGAN approach of [5] proposes layout generation using a generative adversarial network (GAN), while [4] generates layouts based on graph neural networks and a variational auto-encoder (VAE). The solution of [6] generates bounding boxes from graphs using multiple constraints, whereas the READ framework proposed in [8] uses recursive autoencoders. The LayoutTransformer framework suggested in [2] generates layouts of graphical elements by building a self-attention model to capture contextual relationships between layout elements. All of these approaches demonstrate effective results in generating reasonable document layouts with about ten elements, represented as bounding boxes in a document. However, various types of highly organized documents can have a much larger number of elements, up to dozens or even hundreds. In addition, their training data consist of thousands, or at least hundreds, of annotated documents. This can be difficult to obtain for different types of documents, especially for administrative semi-structured documents, which are usually confidential and do not exist in public datasets. Hence the need for an initial database of annotated documents, even for deep learning solutions and especially for confidential administrative documents.

The authors of [1] proposed an invoice generator using two techniques: one cloning sample invoices by varying the contents within existing structures, and a completely random one generating new structures and contents from scratch. Here we extend this latter work to generate a set of semi-structured documents encompassing invoices, payslips and receipts. These documents present a certain resemblance which gives them a common global model with particularities for each of them.

The remainder of the paper is structured as follows: Sect. 2 defines semi-structured documents and presents our SSD modeling, the SSD graphical description, its layout and content variation, as well as the annotation method; Sect. 3 summarizes the implemented cases: payslips, receipts and invoices; Sect. 4 presents the evaluation of the generated SSDs; Sect. 5 describes the use of the generator in an information extraction system; and Sect. 6 concludes this work.

2 Semi-structured Document Modeling

2.1 Semi-structured Document Definition

Typically, a document falls into one of three categories: structured, semi-structured, and unstructured. A structured document has the same page layout, with fields always positioned in the same places on the page. The number and type of these fields are always the same. Regular expressions or templates can be used to extract this type of information. Conversely, semi-structured documents are documents in which the location of the data and the fields containing the data vary from one document to another. For example, the delivery address on an order form may be in the top-left area, in the top middle, or in the bottom right of the page. The number of fields, the number of rows per table, or the number of items per transaction may also differ from document to document and from supplier to supplier. Certain fields, and even entire columns in the tables, may be optional: present on certain documents but absent on other documents of the same type. Content elements can be of three types: highly formatted; accompanied by a keyword, such as "Total: amount"; or a few grouped lines containing a few keywords, such as "street" and "postcode" for an address. Finally, unstructured documents contain information embedded in the text, requiring recourse to NLP techniques for its extraction.

2.2 SSD Graphical Description

Fig. 1. Global SSD structure.

As illustrated in Fig. 1, an SSD page is modeled as a vertical sequence of blocks. A block B is a horizontal sequence of sub-blocks that we call groups. Each group G, in turn, is a vertical sequence of elementary components (ECs). The EC content is usually a pair (keyword (KW), information). As shown in Fig. 1, seven configurations (Config 1 to Config 7) are possible for this pair, corresponding to different relative positions of its two parts (horizontal, vertical, or both) at horizontal (dh) and vertical (dv) distances. An EC can also contain only the information, without a keyword (Config 8), or a table (T1, T2 or T3) built from a list of pairs whose information part is a list of elementary pieces of information. The sequence of blocks forming the SSD page is centered on the page. The content of the blocks, groups and ECs can be justified vertically (top, center or bottom) and horizontally (right, left or center).
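To make the hierarchy above concrete, a minimal Python sketch of the page/block/group/EC structure is given below. The class and field names are illustrative assumptions and do not reproduce the generator's actual (Java) classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EC:
    # Elementary component: usually a (keyword, information) pair.
    keyword: Optional[str]        # None for Config 8 (information only)
    information: str
    config: int = 1               # one of the positional configurations 1..8

@dataclass
class Group:
    ecs: List[EC] = field(default_factory=list)          # vertical sequence of ECs

@dataclass
class Block:
    groups: List[Group] = field(default_factory=list)    # horizontal sequence of groups

@dataclass
class SSDPage:
    blocks: List[Block] = field(default_factory=list)    # vertical sequence of blocks
```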

2.3 Layout and Content Variation in a SSD

Fig. 2. UML representation of SSD content generation class diagram: the composition association is used when a class is a part of another class and it cannot exist without it (IDNumbers and Company), whereas the child class in the aggregation could exist independently of its parent class (Address and Client)

The creation of an SSD proceeds in two separate stages. The first stage concerns the content creation for the three SSD types (payslips, receipts and invoices), and the second one concerns the organization of this content on the document, i.e., its layout. We give the details of these two steps in this section.


Content Variation: As shown in Fig. 2, a generic "Model" class is created to generate the content of the three SSD types (invoices, payslips and receipts). It brings together the properties common to the three types of SSDs, namely the company and customer information; most administrative documents contain these two types of information. The "Model" class also contains the document language (lang), its title (headTitle) and a random variable (rnd) used for content management. For each type of SSD, we then derive its own model that instantiates its specific properties ("PayslipModel", "InvoiceModel" and "ReceiptModel"). The various SSD types can share multiple properties; for example, the invoice and the receipt share the product information and the document references. As shown in Fig. 2, most of the class attributes are represented by a label and a value. This corresponds to the EC content in the SSD, the pair (keyword, information), where the labels are the keywords and the values are the information. To generate a new SSD type, we can extend the current schema by adding new classes if necessary (for example, a payments table for a bank statement) and exploit the existing classes that are found in most administrative documents (company, client, date, etc.).

Variation Strategy: Random variables are responsible for managing the content generation of each SSD, according to the schema of Fig. 2. The "rnd" attribute of the Model class is used to facilitate this task. To generate the keywords, a random variable selects keywords randomly from lists. Since keyword variations are rather limited, keyword lists are created and filled in the bodies of the different classes. To generate the information part, different types of random variables are used, depending on the type and nature of the information:

– If the information is a numerical value or a date, the random variable takes a random value within defined thresholds or precise formats;
– If the information is a text phrase, the random variable chooses its value randomly from a list of values, stored in the class's body if the variation is limited (for example, the contract type in French payslips can be either CDD or CDI) or loaded from a source file if the variation is larger.

If the information is a numerical value calculated from other information values, a function is used to compute it, such as the totals in invoices and receipts. Both the keyword and the information value can be empty in some cases. The text information variations can be enriched by adding new values to the lists in the different classes, as well as to the resource files.
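The following Python sketch illustrates the random-variable-driven content strategy described above. The field names, keyword lists and value thresholds are assumptions chosen for illustration; the actual generator is implemented in Java with its own class hierarchy and resource files.

```python
import random

# Illustrative keyword lists and value sources (assumed, not the generator's actual data).
TOTAL_KEYWORDS = ["Total", "Total amount", "Amount due"]
CONTRACT_TYPES = ["CDD", "CDI"]   # limited variation kept in the class body

def generate_amount_field(rng: random.Random) -> dict:
    """Numerical information: drawn between defined thresholds."""
    keyword = rng.choice(TOTAL_KEYWORDS)
    value = round(rng.uniform(1.0, 5000.0), 2)
    return {"label": keyword, "value": f"{value:.2f}"}

def generate_contract_field(rng: random.Random) -> dict:
    """Textual information with limited variation: picked from a fixed list."""
    return {"label": "Type de contrat", "value": rng.choice(CONTRACT_TYPES)}

def generate_company_name(rng: random.Random, names_file: str = "companies.txt") -> dict:
    """Textual information with larger variation: loaded from a resource file."""
    with open(names_file, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    return {"label": "", "value": rng.choice(names)}   # the keyword may be empty

if __name__ == "__main__":
    rng = random.Random(42)
    print(generate_amount_field(rng))
    print(generate_contract_field(rng))
```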


Layout Variation: In this step, we create and organize the containers that hold the content generated in the first step. As shown in Fig. 3, three layout classes implement a generic layout interface (SSDLayout). The three classes define a "buildModel" function, which is responsible for assembling and managing the SSD layout and for saving the final SSD in PDF and image formats, as well as the SSD annotation in an XML file. The three layout classes (PayslipLayout, InvoiceLayout and ReceiptLayout) make use of multiple elementary layout components in the "buildModel" function. These components generally represent the blocks of each SSD type that have complex configurations; this is the case when a block contains multiple groups and ECs and there are different possibilities for organizing them, such as "EmployeeInfoBox" and "SalaryBox" in payslips, and "ProductBox" and "ReceiptProductBox" in receipts and invoices. When the blocks have a simple composition (one group or one pair), we use two generic container types, "HorizontalContainer" and "VerticalContainer", to create them directly in the "buildModel" function.

Fig. 3. UML class diagram of the SSD layout generation. The aggregation represents the Java class attributes, the dependency indicates the classes used in the buildModel method, and the realization represents the implementation of the SSDLayout interface.

Variation Strategy: The SSD layout elements (like their content) are managed using different random variables, described below:

– xi_available: decides whether an optional layout component will be added or not;
– pos_xi: chooses the order of the blocks (if one or more blocks can be placed at different positions in the SSD);
– width_random: chooses the width and the height of the SSD from a list of proposed values;
– xi_justif: selects a horizontal or a vertical justification from three possible values for each justification type;
– columns_order: designates the column order and widths in a table from a list of fixed values;
– font: selects a text font.

Figure 4 shows the algorithm for generating an example SSD containing 4 mandatory blocks (A, B, C, D) and an optional block (E). The blocks can have two different orders: A, B, C, D, E or A, D, B, C, E. Each block contains a number of mandatory and optional groups and ECs, as shown in Fig. 4. We start by choosing the SSD width and creating its global container. The first block A is then generated, in detail, as follows: the mandatory EC A1 is generated; according to the value of the random variable A2_available, we decide whether the optional component A2 is generated or not; we generate A3 and then add all the generated components to the group GA1. The justification of GA1 is selected according to the value of the GA1_justif variable; three values are possible: right (JR), left (JL) and center (JC). Finally, GA1 is added to A, which is, in turn, added to the SSD container. According to pos_b, the position of block B is chosen, and the remaining blocks are then generated in the same way, as shown in Fig. 4.
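Below is a compact Python sketch of the block-ordering and optional-component logic described above (and depicted in Fig. 4). The block names, probabilities, candidate widths and justification codes are assumptions for illustration; the real generator builds concrete layout containers rather than the nested structures used here.

```python
import random

JUSTIFICATIONS = ["JR", "JL", "JC"]   # right, left, center

def generate_layout(rng: random.Random) -> dict:
    """Sketch of the Fig. 4 procedure: choose width, block order, optional parts."""
    page = {"width": rng.choice([595, 612]),   # assumed candidate widths (pt)
            "blocks": []}

    # Block A: mandatory EC A1, optional A2, mandatory A3, grouped in GA1.
    ga1 = ["A1"]
    if rng.random() < 0.5:                     # A2_available
        ga1.append("A2")
    ga1.append("A3")
    page["blocks"].append({"name": "A", "groups": [ga1],
                           "justif": rng.choice(JUSTIFICATIONS)})   # GA1_justif

    # Remaining mandatory blocks: order chosen by pos_b; optional block E.
    order = rng.choice([["B", "C", "D"], ["D", "B", "C"]])          # pos_b
    for name in order:
        page["blocks"].append({"name": name, "groups": [[f"{name}1"]],
                               "justif": rng.choice(JUSTIFICATIONS)})
    if rng.random() < 0.5:                     # E_available
        page["blocks"].append({"name": "E", "groups": [["E1"]],
                               "justif": rng.choice(JUSTIFICATIONS)})
    return page

if __name__ == "__main__":
    print(generate_layout(random.Random(0)))
```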

Fig. 4. SSD layout generation algorithm.

To add new layout variations to the three generated SSD types, we can add new values to the different lists of variables. To create the layout of a new SSD type, we add a layout class that implements the SSDLayout interface and, if necessary, add new blocks. Its different layout variations can then be managed in the "buildModel" method.

2.4 SSD Annotation

The appropriate classes are assigned to the information values, and an "undefined" class is assigned to both the keywords and the additional information. The "buildModel" method saves each SSD's annotations in an XML file. The latter contains the attributes of all the SSD words: their classes, bounding box coordinates and text fonts. A simpleTextBox class, which gathers all these properties, is used to create the layout component of the labels and values and to save their properties in the XML file.
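As an illustration of this annotation step, the sketch below writes word-level annotations (class, bounding box, font) to an XML file using Python's standard library. The tag and attribute names are assumptions and do not reproduce the generator's actual XML schema.

```python
import xml.etree.ElementTree as ET

def save_annotations(words, out_path="ssd_annotation.xml"):
    """words: iterable of dicts with 'text', 'cls', 'bbox' (x0, y0, x1, y1) and 'font'."""
    root = ET.Element("document")
    for w in words:
        node = ET.SubElement(root, "word", {
            "class": w["cls"],                         # e.g. "total_amount" or "undefined"
            "bbox": ",".join(str(v) for v in w["bbox"]),
            "font": w["font"],
        })
        node.text = w["text"]
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    save_annotations([
        {"text": "Total:", "cls": "undefined", "bbox": (40, 700, 80, 712), "font": "Helvetica-10"},
        {"text": "128.40", "cls": "total_amount", "bbox": (84, 700, 120, 712), "font": "Helvetica-10"},
    ])
```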

3 Implemented Cases

In this section, we present the three implemented cases of SSDs, namely French payslips, and invoices and receipts in both French and English. We list the mandatory and optional information of each SSD type and show some generated examples.

3.1 Payslips

The French payslip is divided into three main parts: employer and employee information; components of the salary and employee contributions; and a synthesis of the important information, as shown in Fig. 5(a). The first part can be divided into one or two blocks; it groups the company's information, the payment dates and the employee information. It may contain the leave information (in table format) as well as the name of the national collective agreement. The second part is represented as a table that contains the salary information and the different employee and employer contributions. The third part sums up the important salary information: the gross, the net, the taxes, etc., and can also contain the leave information and the national collective agreement if they do not appear in the first part. Table 1 details all the mandatory and optional information that can be found in a payslip.

3.2 Receipts

A receipt is an SSD that proves a payment. This type of document has a variable layout because of the variability of the companies and of the software that produce the receipts (see Fig. 5(b)). Most receipts can be divided into three parts: the first part contains the company's information, the receipt's references, the date and time, the name of the customer, etc. The second part is reserved for the sold-products table, and the last part summarizes the totals, the taxes, the payment amount, the payment mode and the footnotes.

Fig. 5. Examples of generated SSDs: a) payslips, b) receipts and c) invoices. All three SSDs are divided into three main parts (marked by the red line), but the content and layout details of each part vary widely within each SSD type (Color figure online)

Table 1. Mandatory and optional payslips information

Mandatory
  Employer: Name, Address, APE code, NAF code, Siret number
  Employee: Name, Address, Employment, Level/Coefficient, Contributions
  Salary component: Gross salary, Net salary, Contributions
  Others: Period, Number of hours, Payment date, Leave information, Collective agreement

Optional
  Employer: Logo, Phone, Fax, RCS, VAT
  Employee: Social security number, Contract type, Seniority, Affectation, Qualification, Coefficient/index
  Salary component: Net before taxes, Social security ceiling, Monthly salary, Reference salary
  Others: Payment mode, Date

Table 2. Mandatory and optional receipts information

Mandatory
  Company: Name, Address
  Sold products: Names list, Prices
  Others: Date and time of transaction, Payment amount, Total amount paid, Sales tax, Discounts or coupons

Optional
  Company: Logo, Contact information, SIRET and SIREN number, VAT number
  Others: Payment mode, Receipts reference numbers, Organization's return policy, Terms of sale

The first and third parts can be composed of several blocks, while the second one contains a single block represented by a table of products. The detailed mandatory and optional information that can be found in receipts is summarized in Table 2. Additional information can be found in other types of receipts; it typically details the sold products or the service provided. For example, restaurants may add the table number, the server name, etc.

3.3 Invoices

Invoices are documents that are very similar to receipts. The main difference between the two is that an invoice is a request for payment issued by the company before the payment, while a receipt is a proof of payment. Thus, invoices contain additional information (beyond that listed in Table 2) related to the customer, such as the shipping and billing addresses. On the other hand, they generally do not contain payment information (cash, change, etc.). Unlike receipts, which can have different dimensions, invoices mostly have an A4 format. The invoice is divided into three parts: the first contains the company's information as well as the shipping and billing information; the second part is reserved for the products table; the last part contains the totals and the footnotes. This can be seen in the examples shown in Fig. 5(c).

4 Generated Dataset Evaluation

We add an evaluation step to choose an adequate dataset for the learning process and to avoid overfitting. The generated dataset must be varied in both content and layout, depending on the SSD type and its generation restrictions.

4.1 Content Text Diversity

We evaluate the content diversity of the generated datasets using a recent metric for the diversity of generated texts, the SELF-BLEU metric proposed in [10]. It measures the differences between generated phrases of the same class; a low SELF-BLEU score implies a high variety. The BLEU score, as defined in [7], is calculated for every generated sentence, and the average BLEU score is then the SELF-BLEU of the dataset:

$$\mathrm{BLEU} = BP \cdot \exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big) \qquad (1)$$

where BP is the brevity penalty, computed as

$$BP = \begin{cases} 1 & \text{if } c > r \\ \exp\big(1 - \frac{r}{c}\big) & \text{if } c \le r \end{cases} \qquad (2)$$

where c is the length of the selected phrase (the hypothesis), r is the length of another sentence (the reference), N is the maximum n-gram length, and $w_n$ is a positive weight ($\frac{1}{N}$). The term $p_n$ is the modified precision score, calculated as

$$p_n = \frac{\sum_{C \in \mathrm{Candidates}} \sum_{n\text{-gram} \in C} \mathrm{Count}_{clip}(n\text{-gram})}{\sum_{C' \in \mathrm{Candidates}} \sum_{n\text{-gram} \in C'} \mathrm{Count}(n\text{-gram})} \qquad (3)$$

Candidates are the generated phrases that we want to evaluate, Count is the number of times an n-gram appears in the candidate phrase, and $\mathrm{Count}_{clip} = \min(\mathrm{Count}, \mathrm{Max\_Ref\_Count})$, where Max_Ref_Count is the maximum number of times the n-gram occurs in any reference phrase (i.e., in the other generated phrases).
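A minimal Python sketch of this SELF-BLEU computation is shown below, assuming NLTK's sentence_bleu implementation of the BLEU score; the whitespace tokenization and smoothing choices are assumptions, since the paper does not specify them.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(phrases, max_n=4):
    """SELF-BLEU of a list of phrases: average BLEU of each phrase against the others."""
    weights = tuple(1.0 / max_n for _ in range(max_n))    # w_n = 1/N
    smooth = SmoothingFunction().method1                  # assumed smoothing choice
    tokenized = [p.split() for p in phrases]              # assumed whitespace tokenization
    scores = []
    for i, hyp in enumerate(tokenized):
        refs = tokenized[:i] + tokenized[i + 1:]          # all other generated phrases
        scores.append(sentence_bleu(refs, hyp, weights=weights, smoothing_function=smooth))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    sample = ["total amount 128.40 eur", "total 99.00 eur", "amount due 54.10 eur"]
    print(round(self_bleu(sample), 3))
```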


Table 3 shows the SELF-BLEU score on three generated datasets: receipts, invoices and payslips. Each dataset contains 1000 SSDs. The SROIE dataset comes from an ICDAR competition [3] and is taken here as a reference; it is a real public dataset that contains 973 receipts. The SELF-BLEU score recorded on the three generated datasets is low, even lower than the score recorded on SROIE, which is a real dataset. This means that the three generated datasets have a high content diversity.

Table 3. SELF-BLEU scores of the SROIE dataset and our three generated datasets

SROIE    G Receipts    G Invoices    G Payslips
0.44     0.29          0.29          0.23

4.2 Layout Diversity

Most metrics used to compute the layout diversity of generated documents calculate the distance between blocks of the same class. This distance can be defined in different ways: the Euclidean distance between the centers of the blocks, the Manhattan distances of the corners, the difference between heights and widths, the overlap area between the blocks, or combinations of these distances [9]. An alignment score defined as the sum of the Manhattan distances between the two corners of the blocks gives both position and area information, while the overlapping zone depends on the position, size and height/width ratio of the blocks. These two measures combine several pieces of layout information in a single value, so we adopt these two diversity metrics introduced in [9] to compute our layout diversity. The two metrics (Align and Over) are computed on the mandatory ECs of each SSD type. We also define another metric that we call SCR (SSD Compositions Ratio), where the SSD composition of a document is the list of all its elementary components (ECs), whether mandatory or optional, ordered according to their position. SCR is a measure of layout diversity that focuses on the general SSD layout composition. We detail the three adopted metrics, Align, Over and SCR, in what follows.

Let $B_i = (p_a, p_b)$ and $B_j = (p_c, p_d)$ be two mandatory EC blocks of the same class in two SSD layouts, where $p_a$ and $p_b$ are the corner points that define the bounding box of block $B_i$ (and similarly $p_c$, $p_d$ for $B_j$). The alignment (Align) score is calculated as

$$\mathrm{Align}(B_i, B_j) = D_{mh}(p_a, p_c) + D_{mh}(p_b, p_d) \qquad (4)$$

where $D_{mh}(p, q) = |x_p - x_q| + |y_p - y_q|$ is the Manhattan distance between two points p and q. The overlapping score Over is computed as

$$\mathrm{Over}(B_i, B_j) = 1 - \frac{2 \cdot ov(B_i, B_j)}{\mathrm{area}(B_i) + \mathrm{area}(B_j)} \qquad (5)$$

where $ov(B_i, B_j)$ is the overlapping area of the two EC blocks $B_i$ and $B_j$, and $\mathrm{area}(B_i)$ is the area of the EC block $B_i$. SCR is computed as the number of distinct SSD compositions divided by the total number of SSD compositions, which equals the number of generated SSDs in one dataset. Two SSD compositions are considered distinct if their EC lists are not perfectly identical (comparing the number of ECs, their classes, and their order in the list):

$$\mathrm{SCR} = \frac{\#\text{distinct SSD compositions}}{\#\text{SSD}} \qquad (6)$$
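The sketch below computes these three scores for axis-aligned bounding boxes, following the definitions in Eqs. (4)-(6). The box representation (x0, y0, x1, y1) and the composition encoding are assumptions for illustration.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def align(box_i, box_j):
    """Eq. (4): sum of Manhattan distances between corresponding corners."""
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = box_i, box_j
    return manhattan((ax0, ay0), (bx0, by0)) + manhattan((ax1, ay1), (bx1, by1))

def overlap_score(box_i, box_j):
    """Eq. (5): 1 - 2*overlap_area / (area_i + area_j)."""
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = box_i, box_j
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    ov = iw * ih
    area_i = (ax1 - ax0) * (ay1 - ay0)
    area_j = (bx1 - bx0) * (by1 - by0)
    return 1.0 - 2.0 * ov / (area_i + area_j)

def scr(compositions):
    """Eq. (6): compositions is a list of EC-class sequences, one per generated SSD."""
    distinct = {tuple(c) for c in compositions}
    return len(distinct) / len(compositions)

if __name__ == "__main__":
    print(align((0, 0, 10, 5), (2, 1, 12, 6)))                     # 6
    print(round(overlap_score((0, 0, 10, 5), (2, 1, 12, 6)), 3))   # 0.36
    print(scr([["company", "total"], ["company", "date", "total"], ["company", "total"]]))
```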

The alignment and overlap scores of the SSDs' EC blocks are computed on the SROIE dataset as well as on 1000 generated samples of each SSD type, as shown in Table 4, while the SCR score is computed on 2000 generated SSDs of each type. SCR is not computed on the SROIE dataset because the calculation is very costly in time and we do not have enough information about all the EC classes in the SROIE samples.

Table 4. Alignment, overlap and SCR scores, where G indicates a generated dataset

SSD type     SROIE    G Receipts    G Invoices    G Payslips
Alignment    0.48     0.06          0.14          0.06
Overlap      0.998    0.998         0.997         0.986
SCR          -        0.90          0.88          0.79

As can be seen in Table 4, the normalized alignment score of the three generated datasets has very low values, even lower than the SROIE score. The variation of EC positions in the SSDs is the main source of these low values, and the variation of content also affects this score: the corners of the EC bounding boxes depend on the EC content's font size and on the number of characters that make it up. The overlapping score has high values, almost the same as the SROIE score. Restrictions on the positions of the mandatory ECs in SSDs are the first reason for this score in this type of document. The content variation has almost no influence on it: even if the corner positions of the EC bounding boxes change, their shared area remains large. And even for ECs that have variable positions, overlapping is very likely because the variation generally occurs in the same region of the SSD. The SCR score, as can be noted, has high values, which means that the generated layouts of each SSD type are quite diverse. The various optional ECs that may be present in one SSD and not in another, as well as the position variation of both mandatory and optional ECs, are the main explanations for this high diversity score.

In conclusion, the generator provides SSD datasets whose content and layout diversity is quite high. This makes it possible to avoid overfitting in the learning process of information extraction systems. Despite this diversity, the SSDs are subject to some layout constraints, which characterize this type of document.

5 Use of the SSDs Generator in an Information Extraction System

As can be seen in Fig. 6, the generator is responsible for providing datasets (annotated SSD samples) for both the pre-processing step and the graph and word modeling in the learning step of the system architecture. The pre-processing step calculates a maximum distance (d-max), useful for the graph modeling, by applying a heuristic.

Fig. 6. The generator's role in the information extraction system architecture

In the system's training process, the main labeled database is used to model the SSD content as a graph of sub-star graphs, using d-max. The graph nodes represent the document words and are modeled by feature vectors that include both textual (word embedding) and visual (normalized positions) features, while the edges are the neighborhood relationships. The graph is then fed to a multi-layer GAT (graph attention network), which applies the multi-head attention mechanism. In the analysis step, an SSD image is OCRed and the result is transformed into an XML file with the same format as the labeled generated SSDs. The graph associated with this new SSD is created based on d-max and fed to the trained multi-layer GAT, which classifies all the nodes of the graph into their appropriate classes and saves the result in an XML file. A post-processing step is finally applied to regroup the words of the same class and form the entities to extract.
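The sketch below illustrates how such a word graph could be assembled: nodes carry textual and normalized-position features, and edges connect words whose centers lie within d_max. It is a simplified assumption of the pipeline described above, not the authors' implementation (which additionally uses word embeddings and a GAT classifier).

```python
import math

def build_word_graph(words, d_max, page_w, page_h):
    """words: list of dicts with 'text' and 'bbox' (x0, y0, x1, y1).
    Returns (node_features, edges) for a neighborhood graph."""
    nodes, centers = [], []
    for w in words:
        x0, y0, x1, y1 = w["bbox"]
        centers.append(((x0 + x1) / 2.0, (y0 + y1) / 2.0))
        nodes.append({
            "text": w["text"],                        # placeholder for a word embedding
            "pos": (x0 / page_w, y0 / page_h,         # normalized visual features
                    x1 / page_w, y1 / page_h),
        })
    edges = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if math.dist(centers[i], centers[j]) <= d_max:   # neighborhood relationship
                edges.append((i, j))
    return nodes, edges

if __name__ == "__main__":
    ws = [{"text": "Total:", "bbox": (40, 700, 80, 712)},
          {"text": "128.40", "bbox": (84, 700, 120, 712)},
          {"text": "Facture", "bbox": (40, 60, 100, 80)}]
    nodes, edges = build_word_graph(ws, d_max=60, page_w=595, page_h=842)
    print(edges)   # only the two nearby words are connected
```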

6 Conclusion

In this paper, we introduced a semi-structured document (SSD) generator that automatically generates three SSD types (payslips, receipts and invoices) with significant content and layout variations. Based on a general SSD structure, we provided random variables to handle both aspects (content and layout) and guarantee the variation of the SSDs. We set some constraints to ensure that our artificial documents remain close to real ones. We then introduced layout and content metrics to measure the diversity of the generated SSDs. The evaluation results showed that the generated datasets of the three SSD types have a high content and layout diversity. This diversity in the generated datasets helps avoid overfitting during the learning process of the information extraction system.

Acknowledgements. This work was carried out within the framework of the BPI DeepTech project, in partnership between the University of Lorraine (Ref. UL: GECO/2020/00331), the CNRS, INRIA Lorraine and the company FAIR&SMART. The authors would like to thank all the partners for their fruitful collaboration.

References

1. Blanchard, J., Belaïd, Y., Belaïd, A.: Automatic generation of a custom corpora for invoice analysis and recognition. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 7, p. 1. IEEE (2019)
2. Gupta, K., Achille, A., Lazarow, J., Davis, L., Mahadevan, V., Shrivastava, A.: Layout generation and completion with self-attention. arXiv preprint arXiv:2006.14615 (2020)
3. Huang, Z., et al.: ICDAR 2019 competition on scanned receipt OCR and information extraction. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1516–1520. IEEE (2019)
4. Lee, H.-Y., et al.: Neural design network: graphic layout generation with constraints. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 491–506. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_29
5. Li, J., Yang, J., Hertzmann, A., Zhang, J., Xu, T.: LayoutGAN: generating graphic layouts with wireframe discriminators. arXiv preprint arXiv:1901.06767 (2019)
6. Nauata, N., Chang, K.-H., Cheng, C.-Y., Mori, G., Furukawa, Y.: House-GAN: relational generative adversarial networks for graph-constrained house layout generation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 162–177. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_10
7. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
8. Patil, A.G., Ben-Eliezer, O., Perel, O., Averbuch-Elor, H.: READ: recursive autoencoders for document layout generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 544–545 (2020)
9. Van Beusekom, J., Keysers, D., Shafait, F., Breuel, T.M.: Distance measures for layout-based document image retrieval. In: Second International Conference on Document Image Analysis for Libraries (DIAL 2006), 11 pp. IEEE (2006)
10. Zhu, Y., et al.: Texygen: a benchmarking platform for text generation models. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1097–1100 (2018)

DocVisor: A Multi-purpose Web-Based Interactive Visualizer for Document Image Analytics

Khadiravana Belagavi, Pranav Tadimeti, and Ravi Kiran Sarvadevabhatla

Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad 500032, India
[email protected]
https://github.com/ihdia/docvisor

Abstract. The performance on many document-based problems (OCR, document layout segmentation, etc.) is typically studied in terms of a single aggregate performance measure (Intersection-over-Union, Character Error Rate, etc.). While useful, such aggregation comes at the expense of instance-level analysis of predictions, which may shed better light on a particular approach's biases and performance characteristics. To enable a systematic understanding of instance-level predictions, we introduce DocVisor, a web-based multi-purpose visualization tool for analyzing the data and predictions related to various document image understanding problems. DocVisor provides support for visualizing data sorted using custom-specified performance metrics and display styles. It also supports the visualization of intermediate outputs (e.g., attention maps, coarse predictions) of the processing pipelines. This paper describes the appealing features of DocVisor and showcases its multi-purpose nature and general utility. We illustrate DocVisor's functionality for four popular document understanding tasks: document region layout segmentation, tabular data detection, weakly-supervised document region segmentation and optical character recognition. DocVisor is available as a documented public repository for use by the community.

1 Introduction

Tasks in document image analysis are evaluated using task-specific performance measures. For example, measures such as average Intersection-over-Union (IoU) or the Jaccard Index, mAP@x (mean Average Precision for an IoU threshold of x), and mean HD (Hausdorff distance) are used to evaluate semantic segmentation approaches [6,10]. Similarly, measures such as Character Error Rate (CER) and Word Error Rate (WER) are used to evaluate Optical Character Recognition (OCR) systems [5]. These measures are undoubtedly informative as aggregate performance measures. However, they are prone to statistical bias arising from the imbalanced distribution of performance measures, e.g., mean averaging [11]. Therefore, it is helpful to have a mechanism that enables predictions to be visualized on a per-instance (image or region) basis after sorting them by the performance measure of interest. Doing so can be valuable in understanding factors contributing to performance or lack thereof. Sometimes, it may also be essential to visualize outputs of the intermediate stages in a system's pipeline. For example, in an end-to-end OCR system, the prediction from a skew correction module is fed to a line segmenter module, which then produces an input to the recognizer module [15]. Similarly, in attention-based OCR systems, visualizing intermediate attention maps may help better understand the final prediction [25]. Moreover, multiple tasks may be of interest to different project groups working on the same document corpus. One group could be working on document layout segmentation, while another could be working on OCR. Instead of maintaining separate visualization mechanisms, it may be more efficient to develop and maintain a single interface that can be used concurrently by all project groups.

[Fig. 1 diagram: multiple users (User 1, User 2, User 3) connect to the DocVisor server, which handles communication and storage with the data stores for fully automatic parsing, box-supervised parsing, and OCR.]

Fig. 1. Architecture of the DocVisor tool.

Motivated by the above requirements, we introduce DocVisor, a configurable web-based multi-purpose visualization tool for analyzing the data and predictions related to various document image understanding problems. Currently, DocVisor supports four popular document understanding tasks and their variants: document region layout segmentation, tabular data detection, weakly-supervised document region segmentation, and optical character recognition. The tool is flexible and provides support for visualizing data sorted using custom-specified performance metrics and display styles.

Table 1. Salient attributes of popular visualization tools.

Methods                  Released in  Metric-sorted  Allow comparison with g.t.  Multi-model comparison  Task(s)   Web-based
dhSegment [17]           2018         ×              ✓                           ×                       Layout    ✓
PAGEViewer (a)           2014         ×              Partially                   ×                       Layout    ×
OpenEvaluationTool [1]   2017         ×              ✓                           ×                       Layout    ✓
HInDoLA [23]             2019         ×              ×                           ×                       Layout    ✓
DocVisor                 2021         ✓              ✓                           ✓                       Multiple  ✓

(a) https://www.primaresearch.org/tools/PAGEViewer

The architecture of the DocVisor tool is depicted in Fig. 1. Model outputs for the three document-analysis tasks (Fully Automatic Region Parsing, Box Supervised Region Parsing and OCR) are pre-processed and given as an input to the DocVisor tool, which then renders an interactive multi-layout web-page. The tool and associated documentation can be accessed from https://github. com/ihdia/docvisor.

2 Related Work

Tools for document analysis and Optical Character Recognition (OCR) form an essential part of improving the related models. They are broadly classified into annotation tools, visualization tools, post-correction tools and evaluation tools. Prominent tools for layout parsing, image-based text search and historical manuscript transcription include HInDoLA(1), Aletheia(2), Monk(3) and Transkribus(4), respectively [4,13,20,23]. Kiessling et al. [14] propose eScriptorium for historical document analysis, which allows manuscripts to be segmented and transcribed manually or automatically. The anyOCR model is designed to be unsupervised by combining older segmentation-based methods with recent segmentation-free methods to reduce annotation effort [12]. It also includes error correction services such as anyLayoutEdit and anyOCREdit [2]. OpenOCRCorrect(5) is designed to detect and correct OCR errors by exploiting multiple auxiliary sources [19]. It includes features like error detection, auto-corrections, and task-specific color coding to reduce cognitive load. OCR4all(6) provides a semi-automatic workflow for OCR of historical documents.

Some of these tools naturally allow analysis of annotations and related metadata as a secondary feature. One work close to ours is the open evaluation tool(7) by Alberti et al. [1], which allows evaluation of document layout models at the pixel level. Another is dhSegment [17], which enables predictions and ground truth to be serially visualized using the TensorBoard functionality of the popular machine learning library TensorFlow. Unlike DocVisor, these tools omit certain important features such as interactive overlays, visualization at multiple levels of granularity, metric-sorted visualization of all test set predictions, simultaneous comparison of multiple model outputs, etc. The few tools which provide a reasonable degree of functionality for visualization are summarized in Table 1. We describe the functionalities of DocVisor in subsequent sections.

Footnotes:
1. https://github.com/ihdia
2. www.primaresearch.org/tools/Aletheia
3. http://monkweb.nl/
4. https://readcoop.eu/transkribus/
5. https://tinyurl.com/5dd7dh6a
6. https://github.com/OCR4all/OCR4all
7. https://tinyurl.com/69thzj3c

3 Fully Automatic Region Parsing

Fig. 2. Fully Automatic Region Parsing interface in DocVisor. Refer to Sect. 3.

Segmenting a document or tabular image into regions based on their semantic category (e.g., text lines, pictures, boundary lines, tabular rows) is an important problem [7,16,18,26]. This is especially crucial for isolating popularly sought regions such as text lines within handwritten and historical manuscripts. We consider three problem settings: historical manuscript region layout segmentation, printed document region layout detection and tabular data layout detection.

3.1 Historical Manuscript Region Layout Segmentation

For this setting, we consider a state-of-the-art deep network for handwritten document layout segmentation [21]. The document images are sourced from Indiscapes, a large-scale historical manuscript dataset [18]. The dataset is annotated using the HInDoLA annotation system [23]. As part of the annotation process, each region of interest is captured as a user-drawn polygon along with a user-supplied region category label ('text line', 'picture', 'library stamp', etc.). These polygons are rendered into binary image masks and used along with the associated document images to train the deep network. The outputs from the deep network are also binary image masks.

Selecting Go to→Fully Automatic Region Parsing and Page→Fully Automatic at the top-left corner of DocVisor enables visualization of the document images, associated ground truth and deep network predictions (Fig. 2). To analyze performance at a per-region level, the region type is first selected at the top. The metric by which the selected regions are to be sorted (IoU, HD) is selected from the Sort by (metrics) dropdown in the left navigation bar. An important utility of the setup becomes evident when regions with an IoU value of 0 are examined: these regions correspond to false negatives (i.e., they failed to be detected by the network). In Display Options→Region Display Options on the bottom-left, the visualization elements of interest are checked. This section allows the ground truth or prediction to be displayed (on the top-right) either in terms of annotator-provided polygon vertices or the associated polygon boundary.
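The per-region, metric-sorted view described above boils down to computing a metric for every (ground-truth, prediction) mask pair and ordering the regions by it. A minimal NumPy sketch of this idea follows; it illustrates the concept and is not DocVisor's internal code.

```python
import numpy as np

def region_iou(gt_mask: np.ndarray, pred_mask: np.ndarray) -> float:
    """IoU between two binary region masks; 0.0 marks a false negative (no overlap)."""
    gt, pred = gt_mask.astype(bool), pred_mask.astype(bool)
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(gt, pred).sum() / union)

def sort_regions_by_metric(regions, metric=region_iou, ascending=True):
    """regions: list of dicts with 'gt' and 'pred' masks; returns (score, region) pairs."""
    scored = [(metric(r["gt"], r["pred"]), r) for r in regions]
    return sorted(scored, key=lambda s: s[0], reverse=not ascending)

if __name__ == "__main__":
    gt = np.zeros((8, 8), dtype=np.uint8); gt[2:6, 2:6] = 1
    pred = np.zeros((8, 8), dtype=np.uint8); pred[3:7, 3:7] = 1
    missed = np.zeros((8, 8), dtype=np.uint8)   # prediction entirely missing the region
    for score, _ in sort_regions_by_metric([{"gt": gt, "pred": pred}, {"gt": gt, "pred": missed}]):
        print(round(score, 3))                  # 0.0 (false negative) first, then 0.391
```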

Fig. 3. PubTabNet [26] data visualized using DocVisor.

In the main panel on the right, the region image (sorted by the metric) is shown at the top. To enable viewing in the original document context, the region is highlighted as a shaded region in the full image. As with the region display elements, the navigation bar on the left can also be used to view visualization elements related to the full document image (Display Options→Document Display Options). Finally, two additional options are provided for selective navigation. The Bookmark button at the bottom enables the user to shortlist selected regions for later viewing during the session. The Save button enables selected regions to be saved to the user's device for offline analysis.

3.2 Table Data Layout Segmentation

Image-based table recognition is an important problem in document analysis. In our case, we have obtained the table images from PubTabNet [26]. Given the prominence of this dataset, we provide all the necessary scripts to parse data in the PubTabNet format and convert it to a format recognized by DocVisor. This particular instance of Fully Automatic Region Parsing can be viewed by selecting Go To→Fully Automatic Region Parsing as well as Choose Page→PubTab in the sidebar to the left. Since this page is an instance of the Fully Automatic Region Parsing layout, the functionalities available are identical to the FullyAutomatic instance. Refer to Fig. 3 for an example visualization.

Fig. 4. DocBank [16] data visualized using DocVisor.


3.3 DocBank Dataset

DocBank [16] is a popular, large-scale dataset consisting of around 500,000 annotations obtained via a weak-supervision approach. This data is also visualized as an instance of the Fully Automatic Region Parsing layout. To select the DocBank page, select Go To→Fully Automatic Region Parsing and Choose Page→DocBank. Additionally, a script is provided in the DocVisor repository for parsing and converting data from the DocBank format to the DocVisor format. Much like PubTabNet, since DocBank is an instance of the Fully Automatic Region Parsing layout, the features are identical to FullyAutomatic. Figure 4 shows an example visualization of DocBank data.

4 Box-Supervised Region Parsing

Fig. 5. The architectural diagram of the box-supervised deep network BoundaryNet [24].

A popular alternative to fully supervised region parsing (of the type described in the previous section) is weakly supervised region parsing. In this setup, a weak supervisory input is provided (e.g., the region class or the bounding box enclosing the region). We consider a state-of-the-art bounding-box-supervised deep network called BoundaryNet [24], which outputs a sequence of predicted points denoting the vertices of the polygon enclosing the region (Fig. 5). The network also predicts the class label associated with the region. BoundaryNet is trained on an earlier version of the dataset [18] mentioned in the previous section. As Fig. 5 shows, BoundaryNet comprises two component modules. The first module, Mask-CNN, produces an initial estimate of the region boundary mask. This estimate is processed by a contourization procedure to obtain the corresponding polygon. The polygon estimate is refined by the Anchor-GCN module to produce the final output as a sequence of predicted points. Note that the output is not a mask, unlike the full-image setup in the previous section. In our setup, the set of intermediate and final outputs from BoundaryNet can be viewed in DocVisor by making the Box-supervised Region Parsing selection at the top-left corner. The various menu and visualization choices resemble the ones present for Fully Automatic Region Parsing, as described in the previous section (Fig. 6).
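As context for the intermediate outputs mentioned above, the sketch below shows one common way to turn a binary mask estimate into a polygon using OpenCV contour extraction (assuming the cv2 package is installed). It is a generic, assumed contourization step for illustration; BoundaryNet's actual procedure and parameters may differ.

```python
import cv2
import numpy as np

def mask_to_polygon(mask: np.ndarray, epsilon_frac: float = 0.01):
    """Convert a binary region mask (H x W, values 0/1) into polygon vertices."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    contour = max(contours, key=cv2.contourArea)           # keep the largest region
    epsilon = epsilon_frac * cv2.arcLength(contour, True)  # simplification tolerance
    approx = cv2.approxPolyDP(contour, epsilon, True)
    return [tuple(pt[0]) for pt in approx]                 # list of (x, y) vertices

if __name__ == "__main__":
    m = np.zeros((60, 100), dtype=np.uint8)
    m[10:50, 20:80] = 1
    print(mask_to_polygon(m))   # approximately the four corners of the rectangle
```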


Fig. 6. Box-Supervised Region Parsing interface in DocVisor. A region image of type ‘Picture’ is shown in the example. Refer to Sect. 4.

5 Optical Character Recognition (OCR)

The OCR task involves recognizing text from images and is an intensely studied problem in the domain of document image understanding [3]. Although OCR systems work at various levels of granularity, we focus on the line-image variant: the input is a single line of a document page, and the output is the corresponding text. We consider the line-level OCR developed by Agam et al. [5], built for historical documents in Sanskrit, a low-resource Indic language. We use DocVisor for analyzing two OCR models, one trained on printed texts and another on handwritten manuscripts. The accuracy of an OCR system is typically characterized by the Word Error Rate (WER) and Character Error Rate (CER); the smaller these quantities, the better the performance. The final numbers reported in the OCR literature are typically obtained by averaging WER and CER across line images from the test set. However, we use the line-level WER and CER scores as the metric for sorting the image set (train, validation, or test) in DocVisor. The DocVisor interface enables selection between the WER and CER metrics for sorting, and a slider bar enables random access across the sorted image sequence. For each line image, the predicted text, corresponding ground-truth text, WER, and CER are shown (see Fig. 7). To enable aligned comparison with the image, we provide an option for changing the font size in the left-side navigation bar (under Settings→Render). For quick comprehension of differences between ground-truth and predicted text, we numerically index and color-code matching and mismatched segments (see Visualize diffs in Fig. 7).
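For reference, CER and WER are edit-distance-based error rates at the character and word level, respectively. The sketch below computes them with a standard Levenshtein distance and sorts lines by the chosen metric, mirroring the sorted view described above; it is an illustrative implementation, not DocVisor's.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def cer(gt: str, pred: str) -> float:
    return levenshtein(list(gt), list(pred)) / max(1, len(gt))

def wer(gt: str, pred: str) -> float:
    return levenshtein(gt.split(), pred.split()) / max(1, len(gt.split()))

def sort_lines(lines, metric=cer):
    """lines: list of (ground_truth, prediction) pairs; worst lines first."""
    return sorted(lines, key=lambda p: metric(*p), reverse=True)

if __name__ == "__main__":
    data = [("namo buddhaya", "namo budhaya"), ("iti sastram", "iti sastram")]
    for gt, pred in sort_lines(data):
        print(f"CER={cer(gt, pred):.3f}  WER={wer(gt, pred):.3f}  '{pred}'")
```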


Fig. 7. OCR interface in DocVisor depicting line image, predicted text and groundtruth text. The metrics (CER, WER) and the color-coded mismatches and matches can also be seen. Refer to Sect. 5.

An interesting aspect of the above OCR lies in its 'attention' component [5]. In the process of learning to map a line image to its corresponding text, an intermediate stage produces an alignment matrix via the attention component. The alignment matrix can be used to map a line-image segment to an estimate of the corresponding matching sub-string in the predicted text, and vice versa. In the DocVisor UI, we enable this mapping via a selection process (Fig. 8). For instance, selecting Visualizer Settings→Image Selection enables selecting a segment of the line image; upon completing the selection, the corresponding portion of the predicted text is highlighted. A similar mechanism for highlighting the image segment corresponding to a selection within the predicted text is achieved by switching to Visualizer Settings→Text Selection. From an explainability point of view, this feature is useful for understanding the image-to-text mapping decision process of the deep network, particularly for incorrectly predicted text portions. DocVisor also allows the user to select the color used for highlighting the image through a color picker for added flexibility.
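A minimal sketch of how such an alignment matrix can drive two-way highlighting is given below: given attention weights of shape (output characters x image frames), a selected range on one axis is mapped to the highest-attention positions on the other. The matrix shape, the thresholding rule and the frame-to-pixel scaling are assumptions for illustration.

```python
import numpy as np

def image_range_to_text(align: np.ndarray, frame_start: int, frame_end: int):
    """align: (num_chars, num_frames) attention weights.
    Returns indices of predicted characters attending most to the selected frames."""
    selected = align[:, frame_start:frame_end + 1]
    char_mass = selected.sum(axis=1)                     # attention mass per character
    return np.where(char_mass >= 0.5 * char_mass.max())[0]

def text_range_to_image(align: np.ndarray, char_start: int, char_end: int):
    """Returns the image-frame span covered by the selected characters' argmax frames."""
    frames = align[char_start:char_end + 1].argmax(axis=1)
    return int(frames.min()), int(frames.max())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.random((5, 20))                            # 5 characters, 20 image frames
    print(image_range_to_text(toy, 4, 8))
    print(text_range_to_image(toy, 1, 3))
```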


Fig. 8. OCR interface in DocVisor depicting the image-text alignment by leveraging the attention mechanism from the OCR architecture of Agam et al. [5]. Refer to Sect. 5. Note that multiple (here, 2) attention OCR models, along with non-attention models, can be loaded in the OCR layout, allowing the user to compare line-level results of multiple models simultaneously.

While analysing the results of a model on a particular dataset, it is sometimes important to compare the predictions of two different models, or even of two identical models trained with different hyper-parameter settings (e.g., learning rates) or different user-defined configurations. The DocVisor interface allows visualization and comparison of multiple OCR outputs for the same image or dataset. The user can load multiple attention models along with the outputs of other non-attention models (models that do not produce an alignment matrix) to compare results. The user can select the model by which the dataset is to be sorted using the metric of interest, and can choose to visualize the differences between output texts and the ground truth, or between the outputs of any two models. In our setting, we use the DocVisor tool to compare the results of the C1 and C3 configurations of the IndicOCR model, as defined by Agam et al. [5], along with the results of popular state-of-the-art non-attention systems such as Google OCR [8], Tesseract [22], and IndSenz [9]. Figure 8 shows an example of this setting.


Fig. 9. Example metadata files to launch the DocVisor tool. (a) Metadata file for Fully Automatic Layout. (b) Metadata file for Box Supervised Region Parsing Layout and (c) Metadata file for OCR layout. Refer to Sect. 6.

6 Configuration

To launch DocVisor, the user provides the meta-information required to set up the tool through metadata files. Essentially, the metadata files act as configuration files in which the user not only supplies the relevant information regarding the data to be visualized, but also certain settings that can be tweaked for specific use cases. Figure 9 shows the structure of example metadata files for the three different layouts of DocVisor. The user can load DocVisor with one or more of its layouts. Additionally, a single layout can have multiple instances in the app: for example, a user may be interested in viewing the results of a model on two different datasets. In such cases, the user provides metadata files which each contain the meta-information for the corresponding layout instance. Figure 10 depicts standard data-file formats for each of the three layouts provided in the DocVisor tool. Each layout can be loaded provided the data files have the mandatory fields. Another advantage of DocVisor lies in its ability to import standard, pre-existing configuration formats for tasks, which removes the need to write custom configuration files.


Fig. 10. Example data files for (a) Fully Automatic Layout (b) Box Supervised Region Parsing (c) OCR Layout. Refer to Sect. 6.

For more details of the data file format, refer to the tool’s documentation at https://github.com/ihdia/docvisor.

7 Future Work

The existing work on DocVisor can be extended to include formats for some popular image segmentation tasks such as hOCR, the PAGE format, LatexMath, etc., making it much easier for users to integrate their data into DocVisor. Additionally, support can be added for approaches with large amounts of data, where it may not be feasible to include all data in a single file.

8 Conclusion

We have presented an overview of DocVisor, our web-based multi-purpose visualization tool for analyzing the data and predictions related to various document image understanding problems. DocVisor currently supports four popular document understanding tasks and their variants: document region layout segmentation, tabular data detection, weakly-supervised document region segmentation, and optical character recognition. Given the lack of general-purpose data and prediction analysis tools in the document image community, we expect our tool, with its interactive and flexible display options, to be a valuable contribution, readily available for reuse and modification given its open-source, public availability. Given the tool's capability for displaying point, polygon, and mask overlays, we also expect the tool to be helpful to researchers working on the broader computer vision problems of layout prediction.

References

1. Alberti, M., Bouillon, M., Ingold, R., Liwicki, M.: Open evaluation tool for layout analysis of document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 4, pp. 43–47. IEEE (2017)
2. Bukhari, S.S., Kadi, A., Jouneh, M.A., Mir, F.M., Dengel, A.: anyOCR: an open-source OCR system for historical archives. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 305–310. IEEE (2017)
3. Cheriet, M., Kharma, N., Liu, C.L., Suen, C.: Character Recognition Systems: A Guide for Students and Practitioners. John Wiley & Sons, Hoboken (2007)
4. Clausner, C., Pletschacher, S., Antonacopoulos, A.: Aletheia - an advanced document layout and text ground-truthing system for production environments. In: 2011 International Conference on Document Analysis and Recognition, pp. 48–52. IEEE (2011)
5. Dwivedi, A., Saluja, R., Kiran Sarvadevabhatla, R.: An OCR for classical Indic documents containing arbitrarily long words. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020
6. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
7. Gatos, B., et al.: Ground-truth production in the tranScriptorium project. In: 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 237–241. IEEE (2014)
8. Google: Convert PDF and photo files to text (2020). https://support.google.com/drive/answer/176692?hl=en. Accessed 26 Mar 2020
9. Hellwig, O.: Indsenz OCR (2020). http://www.indsenz.com/. Accessed 26 Mar 2020
10. Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J.: Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 850–863 (1993)
11. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, vol. 112. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-7138-7
12. Jenckel, M., Bukhari, S.S., Dengel, A.: anyOCR: a sequence learning based OCR system for unlabeled historical documents. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 4035–4040. IEEE (2016)
13. Kahle, P., Colutto, S., Hackl, G., Mühlberger, G.: Transkribus - a service platform for transcription, recognition and retrieval of historical documents. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 4, pp. 19–24. IEEE (2017)
14. Kiessling, B., Tissot, R., Stokes, P., Ezra, D.S.B.: eScriptorium: an open source platform for historical document analysis. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 2, p. 19. IEEE (2019)
15. Kumar, M.P., Kiran, S.R., Nayani, A., Jawahar, C., Narayanan, P.: Tools for developing OCRs for Indian scripts. In: 2003 Conference on Computer Vision and Pattern Recognition Workshop, vol. 3, p. 33. IEEE (2003)
16. Li, M., Xu, Y., Cui, L., Huang, S., Wei, F., Li, Z., Zhou, M.: DocBank: a benchmark dataset for document layout analysis (2020)
17. Oliveira, S.A., Seguin, B., Kaplan, F.: dhSegment: a generic deep-learning approach for document segmentation. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 7–12. IEEE (2018)
18. Prusty, A., Aitha, S., Trivedi, A., Sarvadevabhatla, R.K.: Indiscapes: instance segmentation networks for layout parsing of historical Indic manuscripts. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 999–1006. IEEE (2019)
19. Saluja, R., Adiga, D., Ramakrishnan, G., Chaudhuri, P., Carman, M.: A framework for document specific error detection and corrections in Indic OCR. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 4, pp. 25–30. IEEE (2017)
20. Schomaker, L.: Design considerations for a large-scale image-based text search engine in historical manuscript collections. IT - Inf. Technol. 58(2), 80–88 (2016)
21. Sharan, S.P., Aitha, S., Amandeep, K., Trivedi, A., Augustine, A., Sarvadevabhatla, R.K.: Palmira: a deep deformable network for instance segmentation of dense and uneven layouts in handwritten manuscripts. In: International Conference on Document Analysis and Recognition, ICDAR 2021 (2021)
22. Smith, R.: Tesseract-OCR (2020). https://github.com/tesseract-ocr/. Accessed 26 Mar 2020
23. Trivedi, A., Sarvadevabhatla, R.K.: HInDoLA: a unified cloud-based platform for annotation, visualization and machine learning-based layout analysis of historical manuscripts. In: 2nd International Workshop on Open Services and Tools for Document Analysis, OST@ICDAR 2019, Sydney, Australia, September 22–25, 2019, pp. 31–35. IEEE (2019). https://doi.org/10.1109/ICDARW.2019.10035
24. Trivedi, A., Sarvadevabhatla, R.K.: BoundaryNet: an attentive deep network with fast marching distance maps for semi-automatic layout annotation. In: International Conference on Document Analysis and Recognition, ICDAR 2021 (2021)
25. Wojna, Z., et al.: Attention-based extraction of structured information from street view imagery. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 844–850. IEEE (2017)
26. Zhong, X., ShafieiBavani, E., Yepes, A.J.: Image-based table recognition: data, model, and evaluation. arXiv preprint arXiv:1911.10683 (2019)

ICDAR 2021 Workshop on Industrial Applications of Document Analysis and Recognition (WIADAR)

WIADAR 2021 Preface

There is a lot of innovative work being conducted in industry, in public administration, and in universities in partnership with industry or public administration. Usually, the aim of applied research for industrial innovation is to make breakthroughs along two main dimensions: performance (highest success rate with the lowest critical error, robustness to specific constraints) and user scope (application to a new domain, ability to adapt the application to another domain with minimal effort). Hence, the nuance is that scientists target the methodology, whereas industrial fellows target the application. Most applied research reuses, combines, tunes, and trains existing methods, with the advantage of accessing large corpora of data. Industrial researchers adapt or create methods with scientific innovations, but also with heuristics, workarounds, and whatever can pragmatically be used to reach the target for an acceptable scope. Furthermore, we recognize that when industrial researchers create a truly novel approach, the business strategy, which includes industrial competition, often prevents disclosure of the information. Hence, that applied research cannot be submitted to ICDAR.

All these works are of interest to the ICDAR audience, because experiments with large real-life datasets provide pragmatic feedback on technologies. They demonstrate the limitations of the state of the art and suggest a ranking of technology performance, and both workarounds and optimizations may point out new perspectives for future research. In addition, such experiments highlight end-user problems and needs, i.e. they may suggest academic perspectives.

The Workshop on Industrial Applications of Document Analysis and Recognition (WIADAR) topics are the same as for ICDAR, including applications of document analysis, physical and logical layout analysis, character and text recognition, interactive document analysis, cognitive issues, and semantic information extraction. For the second edition of WIADAR we received seven proposals. Each paper was carefully evaluated by at least three reviewers, including two Program Committee (PC) members and one workshop co-chair. Our PC members were from industry or academia, with strong experience in industrial projects. Following the review process, five proposals with very positive evaluations were accepted. All of the accepted papers were related to industrial frameworks or embedded in industrial applications. These papers were presented during the poster sessions of the main conference.

The WIADAR papers and posters were an effort to make information about innovative applications available to the general ICDAR attendee. They allowed researchers and engineers to complement ICDAR discussions even if an application is not scientifically original or not fully described. Moreover, this workshop was an opportunity to build more interaction between industry and academia. For this reason, we believe the second WIADAR was a success.

September 2021

Elisa H. Barney Smith Vincent Poulain d’Andecy Hiroshi Tanaka

Organization

Organizing Chairs
Elisa H. Barney Smith, Boise State University, USA
Vincent Poulain d'Andecy, Yooz, France
Hiroshi Tanaka, Fujitsu, Japan

Program Committee
Sameer Antani, National Institutes of Health, USA
Bertrand Coüasnon, IRISA, France
Volkmar Frinken, Onai, USA
Christopher Kermorvant, Teklia, France
Liuan Wang, Fujitsu R&D Center, China

Object Detection Based Handwriting Localization

Yuli Wu 1, Yucheng Hu 2, and Suting Miao 3 (B)

1 Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
2 Nanjing Normal University, Nanjing, China
3 SAP Innovation Center Network (ICN) Nanjing, Nanjing, China
[email protected]

Abstract. We present an object detection based approach to localize handwritten regions in documents, which initially aims to enhance anonymization during data transmission. The concatenated fusion of original and preprocessed images, containing both printed texts and handwritten notes or signatures, is fed into a convolutional neural network, where the bounding boxes are learned to detect the handwriting. Afterwards, the handwritten regions can be processed (e.g. replaced with redacted signatures) to conceal the personally identifiable information (PII). This processing pipeline, based on the deep learning network Cascade R-CNN, runs at 10 fps on a GPU during inference, ensuring enhanced anonymization with minimal computational overhead. Furthermore, impressive generalizability has been empirically showcased: the trained model based on the English-dominant dataset works well on fictitious unseen invoices, even ones in Chinese. The proposed approach is also expected to facilitate other tasks such as handwriting recognition and signature verification.

Keywords: Handwriting localization · Object detection · Regional convolutional neural network · Anonymization enhancement

1 Introduction

Handwriting localization plays an important role in the following scenarios. First, the handwritten regions in documents may contain sensitive information, which must be anonymized before transmission. Second, handwriting localization naturally serves as the first stage of handwriting-to-text recognition. Third, signatures to be verified must be extracted via localization from the surrounding texts or lines in the documents. This work is initially motivated by the demanding case of anonymization enhancement. (This work was done during the internship of Y. Wu and Y. Hu at SAP ICN Nanjing.)

Access to data is vital for enterprises today. One of the most common data types is the invoice: with digitalized invoices and all kinds of powerful AI-driven technologies, companies are able to analyze customers' behaviors and extract business intelligence automatically, offering utmost help to refine strategies and make decisions. However, personally identifiable information (PII) must be anonymized beforehand, as it is not worth the risk of any privacy exposure. Ostensibly, tabular texts are the overwhelming majority in business processing. In fact, documents containing handwritten notes or signatures, such as invoices, also play an important role. The goal of this work is to localize the handwritten regions in full-page invoice images for anonymization enhancement. The detected handwriting shall be anonymized afterwards, while the detailed implementation of this step is beyond the scope of this work. As the expected result of the whole processing, handwriting should be excluded from the anonymized invoices, where it is assumed that all handwritten regions contain PII. One trivial way to realize this would be to replace the handwritten boxes with redacted signatures or notes.

Tesseract [24], the de facto paradigm of optical character recognition (OCR) engines, is nowadays widely used in industry to extract textual information from images. OCR engines are competent at dealing with optimal data, which in the context of this work refers to image data of documents where all items of interest are regularly printed texts. In contrast, real-world documents usually contain not only regularly printed texts, but also irregular patterns, such as handwritten notes, signatures, logos, etc., which might also be desired.

In this work, we adopt object detection approaches with deep learning networks to localize handwritten regions in document data, based on SAP's Data Anonymization Challenge (Footnote 1). The feasibility and effectiveness of such algorithms have been empirically shown in scenarios where the objects (handwriting) and the background (printed texts) are extremely similar. Besides, the improvement from Faster R-CNN [23] to Cascade R-CNN [1] can be effortlessly reproduced. In addition, a new baseline for handwriting localization as a subtask of the SAP Data Anonymization Challenge (see Footnote 1) is released. Last but not least, the proposed deep learning approach with Cascade R-CNN demonstrates impressive generalizability: the trained model based on the English-dominant dataset works well on fictitious unseen invoices, even for those in Chinese as toy examples. Empirically, it is believed that the deep learning model has learned the irregularity of the images.

Since the detailed types of handwritten regions, such as signatures or notes, are not discriminated in the experiments, we use the terms detection and localization interchangeably in this work. The one-class detection merely consists of the localization regression task without classification. Despite the simplicity of the task description, it is still challenging to distinguish the handwritten notes from the printed texts, as they are similar regarding the contextual information. Furthermore, the detected bounding boxes, which should contain PII, are expected to be more accurate than in general object detection tasks, i.e. the primary evaluation score AP^FP (average precision with penalty of false positives, see Sect. 4.4) is thresholded with an IoU of 80%.

Footnote 1: https://www.herox.com/SAPAI/

2 Related Work

In signature verification competitions [15,19], the input images are usually given in the form of cropped handwritten regions. Likewise, some handwritten text recognition datasets provide the option of images labeled with divided lines [16]. This work is expected to bridge the gap between this research and industrial applications through handwriting localization. Besides, text detection in natural scene images is close to our work. One significant difference between the two tasks is the target objects: all texts should be detected in the scene text detection task (e.g. [18]), while only the handwritten texts are targeted in this work. The other difference is the background: the background in scene text detection is the natural view, while in this work it consists of the printed texts and tables on the blank document. Also based on Faster R-CNN [23], Zhong et al. [26] use LocNet [6] to improve the accuracy in scene text detection, whereas we use Cascade R-CNN [1], the cascade version of Faster R-CNN.

There are two main categories of methods to localize handwritten regions in documents. OCR based approaches recognize and then exclude printed texts; as a result, the unrecognizable parts are believed to be handwriting. In contrast, object detection based approaches regard this as a localization task, where the handwriting is the target and all other items (such as printed texts, logos, tables, etc.) are considered background. Thanks to the datasets and detection challenges on common objects (e.g. [5,13]), a considerable number of novel object detection algorithms have been proposed in recent years, e.g. Faster R-CNN [23], YOLO [21], SSD [14], RetinaNet [12], Cascade R-CNN [1], etc. Three different approaches submitted to the Data Anonymization Challenge (see Footnote 1) are also briefly introduced in the following paragraphs, including an OCR based approach and two deep learning based approaches (one with YOLOv3 [22], one with Google's paid cloud service).

OCR Based Approaches. Here, an example proposal from the challenge (see Footnote 1) is described. First, the images are sequentially preprocessed, including removing the horizontal and vertical lines, median filtering (to remove salt-and-pepper noise), thresholding and morphological filtering (e.g. dilation and erosion). The handwritten parts are then discriminated from the printed ones with respect to manually chosen features such as the heights and widths of the text boxes, text contents and the confidence scores produced by OCR. In the experiments, this approach yields results on a par with those of deep learning approaches. However, the robustness and generalizability of the deep learning approaches are believed to be advantageous.

Object Detection Based Approaches with Deep Learning. Since object detection is an intensively researched area in the field of computer vision, it is natural to directly apply deep learning algorithms to the handwriting localization task. With the deep learning engine ImageAI [17], networks like YOLOv3 [22] can be trained in an end-to-end manner. Moreover, some deep learning services like Google's Cloud AutoML Vision API take it further, managing the training process even without specifically assigning an algorithm.

Fig. 1. Network architectures of (a) Faster R-CNN [23] and (b) Cascade R-CNN [1]. Figures adapted from [1].

3 Method

3.1 Faster R-CNN

Faster R-CNN [23] consists of two modules: the Region Proposal Network (RPN), which proposes rectangular regions containing the desired objects, and the Fast R-CNN detector [7], which predicts the classes and the locations. The processing pipeline is demonstrated based on Fig. 1a. The input images (I) are first fed into a convolutional neural network (conv), where the shared features are extracted for both the RPN and the Fast R-CNN detector. Given the shared convolutional feature map of size w × h × d and the number of anchors k for each location in the feature map, the RPN head (H0) transforms it into two proposal features of w × h × 2k (C0) and w × h × 4k (B0) with one (e.g. 3 × 3) convolutional layer followed by two sibling 1 × 1 convolutional layers. Now, w × h × k proposals have been generated, each in the form of 6 representative values: 2 objectness scores and 4 coordinate offsets. The higher-scored proposals from B0 are selected as the inputs of the Fast R-CNN detector, together with the shared convolutional feature map. The transform of the proposals' coordinates between the original images and the feature maps is calculated via e.g. RoIPool [7] (pool) or RoIAlign [9] (align). The pooled or aligned region of interest (RoI) feature map of some fixed size is flattened and then projected onto a feature vector via the RoI head (H1). Finally, two vectors of classes (C1) and locations (B1) are obtained by fully connected layers on top of the feature vector.

There are two places where multi-task loss functions are calculated: the RPN (C0 and B0) and the Fast R-CNN detector (C1 and B1). First, log loss is used for both classification tasks (specifically, a sigmoid activation function plus binary cross-entropy loss for C0, and a softmax activation function plus cross-entropy loss for C1). Second, smooth L1 loss [7] is used for both bounding box regression tasks (B0 and B1), which is defined as $\mathrm{smooth}_{L1}(x) = 0.5x^2$ if $|x| < 1$ and $\mathrm{smooth}_{L1}(x) = |x| - 0.5$ otherwise.
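For reference, a minimal NumPy sketch of this element-wise smooth L1 loss, exactly as defined above:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss used for bounding-box regression targets:
    0.5 * x**2 when |x| < 1, and |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

print(smooth_l1([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [1.5, 0.125, 0.0, 0.125, 1.5]
```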

3.2 Cascade R-CNN

Figure 1 (adapted from [1]) depicts the differences between these two framework architectures. First, a cascade network is used to train the regressors and classifiers in a multi-stage manner. A four-stage version is illustrated in Fig. 1b, including one RPN stage and three detection stages. Second, the IoU threshold is different for each detection stage and is increasingly set to {0.5, 0.6, 0.7}. Note that this IoU threshold does not refer to the one in the RPN or to the one used when calculating mAP (mean Average Precision [13]); it is used to define the positive and negative candidates during mini-batch sampling (sample in Fig. 1). Thanks to the cascade architecture with progressively increasing IoU thresholds for sampling, Cascade R-CNN can accomplish object detection of high quality, which is exactly what is desired in the handwriting localization task to enhance anonymization.

3.3 Other Techniques

Canny Edge Detection. The gradient-intensity based Canny edge detector [2] generates preliminary edges for the following processes. The detected edges might be truncated, e.g. under different optical circumstances. Thus, further refinement or extraction steps are normally applied to the edges detected by Canny.

Hough Transform. Through the transform of line parameterizations, straight lines can be efficiently detected by the voting-based Hough Transform [4]. The images are usually first processed by edge detectors like Canny, followed by thresholding. Next, each edge pixel (x, y) in the binary image is represented by k evenly rotating lines through it in the Hesse normal form: r = x cos(θ) + y sin(θ). In the so-called accumulator space of (r, θ), each edge pixel thus has k votes. The peaks of the accumulator space correspond to the desired lines. In this work, lines detected by the Hough Transform are removed in the preprocessing step to generate cleaner input images for the deep learning network.

Tesseract OCR Engine. As an open-source paradigm OCR engine, Tesseract [24] has been widely used to recognize textual information in the industry. In this work, a Python wrapper for the tesseract-ocr API (https://github.com/sirfz/tesserocr) is used to detect and eliminate the printed texts in the preprocessing step.
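As a rough illustration of the Canny + Hough line-removal step described above (not the authors' code), straight lines can be detected and painted out with OpenCV; the thresholds below are illustrative assumptions, since the paper does not report the exact Hough parameters.

```python
import cv2
import numpy as np

def remove_straight_lines(gray):
    """Detect long straight lines (e.g. table rulings) with Canny edges and
    the probabilistic Hough transform, then paint them out in white."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=100, maxLineGap=5)
    cleaned = gray.copy()
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(cleaned, (x1, y1), (x2, y2), color=255, thickness=3)
    return cleaned

page = cv2.imread("invoice.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
page_without_lines = remove_straight_lines(page)
```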

4 Experiments

4.1 Dataset

The dataset used in this work consists of scanned full-page low-quality invoices from the tobacco industry in the 1990s (http://legacy.library.ucsf.edu/), which previously served in a document classification challenge [8]. Based on the invoice (or invoice-like) images from the same dataset, the labels and the bounding boxes of names and handwritten notes were manually annotated for the Data Anonymization Challenge (see Footnote 1). In total, we have access to 998 gray-scale images with ground-truth labels, which are randomly split into 600, 198 and 200 images as training, validation and testing sets, respectively (denoted below as 600train+198val+200test). The hidden evaluation set consists of 400 images. In this case the training set covers 800 images, the validation set remains unchanged and the testing set covers 400 unseen images (denoted below as 800train+198val+400test). The sizes of the images vary around 700 × 1000 and are resized to 768 × 768.

4.2 Preprocessing

Despite automatic feature extraction being one of the benefits of deep learning algorithms, it is believed that elementarily preprocessed inputs or their fusions might improve the performance. Intuitively, if some non-handwritten parts can be omitted in the preprocessing step, the subsequent handwriting localization task is facilitated. Based on this assumption, texts recognized by the OCR engine (tesseract-ocr [24]) with high confidence and the straight lines detected by the Hough transform [4] are excluded. In the experiments, the confidence threshold for the OCR engine is set to 0.7. The preprocessed images without highly confident textual information or straight table lines are denoted as ‘‘pre’’ in the following, while the original ones are denoted as ‘‘o’’. Besides, the documents usually consist of a white background (of the highest intensity values, e.g. 1) and black texts (of the lowest intensity values, e.g. 0). As the background is dominant in terms of the number of pixels, it is natural to negate the images to obtain inputs that are sparse tensors, which is believed to make the learning more effective. Given an image with values in [0, 1], the negated image is calculated by element-wise subtraction of 1 from the original image. With this in mind, the negated original and preprocessed images are denoted as ‘‘o-’’ and ‘‘pre-’’, respectively. In addition, the inputs of deep learning networks can usually have an arbitrary number of channels. The original and preprocessed images are concatenated to create fused inputs, where the preprocessed layer can be interpreted as an attention mechanism which highlights the regions most likely to contain the target objects. The concatenated inputs are denoted as e.g. ‘‘o/pre’’, the two channels of which are the original and preprocessed images. The influences of the different inputs are investigated in Sect. 4.6.
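A minimal sketch of how the fused ‘‘o/o-/pre-’’ input could be assembled (an illustration under the stated assumptions, not the authors' implementation; images are assumed to be floats scaled to [0, 1]):

```python
import numpy as np

def build_fused_input(original, preprocessed):
    """Stack the original image, its negated version and the negated
    preprocessed image into a 3-channel input (the ''o/o-/pre-'' variant).
    Negation follows the paper's definition (element-wise subtraction of 1),
    which turns the white background into zeros and yields sparse tensors."""
    o_neg = original - 1.0        # ''o-'': background 1 -> 0, ink 0 -> -1
    pre_neg = preprocessed - 1.0  # ''pre-''
    return np.stack([original, o_neg, pre_neg], axis=-1)

img = np.random.rand(768, 768)       # stand-in for a resized invoice image
pre = np.random.rand(768, 768)       # stand-in for its OCR/Hough-cleaned version
fused = build_fused_input(img, pre)  # shape: (768, 768, 3)
```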

4.3 Training with Deep Learning Networks

All experiments were run on a single RTX 2080 Ti GPU. The deep learning networks are implemented with MMDetection [3], the open-source object detection toolbox from OpenMMLab.


The default optimizer is Stochastic Gradient Descent (SGD [11]) with a learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0001. Since the training set of 600 images is relatively small, the default number of epochs is set to a relatively large value (200) to make full use of the computational capacity when the training lasts overnight. During the experiments, it is observed that 200 epochs are appropriate (Fig. 4). The train/val/test sets are defined as in Sect. 4.1. The model weights of the epoch with the best results on the val set are chosen to make predictions on the test set. The different preprocessing steps are introduced in Sect. 4.2 and are compared in detail with Faster R-CNN [23] and Cascade R-CNN [1]. Next, two additional deep learning networks, RetinaNet [12] and YOLOv3 [22], are tested with the preprocessing step that yields the best result on Cascade R-CNN. Detailed experimental results can be found in Sect. 4.6.
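Assuming MMDetection's usual Python config conventions, the reported settings would correspond to a config fragment along these lines (a sketch only; the exact keys depend on the MMDetection version, and the full configuration also defines the model, data pipelines and learning-rate schedule):

```python
# Optimizer-related fragment of an MMDetection-style config reflecting the
# settings reported above (illustrative sketch, not the authors' exact file).
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
total_epochs = 200  # training budget used in the experiments
```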

4.4 Evaluation Scores

IoU. Intersection over Union of two bounding boxes p and g is defined as below:

$$\mathrm{IoU}(p, g) = \frac{\mathrm{Area}\{\, p \cap g \,\}}{\mathrm{Area}\{\, p \cup g \,\}} \tag{1}$$

GIoU. Global IoU of two lists of bounding boxes P = {p1, p2, ...} and G = {g1, g2, ...} is defined as below:

$$\mathrm{GIoU}(P, G) = \frac{\mathrm{Area}\{\, (p_1 \cup p_2 \cup \ldots) \cap (g_1 \cup g_2 \cup \ldots) \,\}}{\mathrm{Area}\{\, (p_1 \cup p_2 \cup \ldots) \cup (g_1 \cup g_2 \cup \ldots) \,\}} \tag{2}$$

AP^FP. Average Precision with penalty of False Positives is the original evaluation score for handwriting detection used in the Data Anonymization Challenge (see Footnote 1), which is defined as follows:

$$AP^{FP} = \begin{cases} \dfrac{|M^{G}|}{|G|} \cdot 0.75^{\,|P| - |M^{P}|}, & \text{if } |G| \neq 0;\\[6pt] 0.75^{\,|P| - |M^{P}|}, & \text{otherwise.} \end{cases} \tag{3}$$

In Eq. 3, P, G, M^G and M^P denote the sets of predicted, ground-truth, matched w.r.t. ground-truth and matched w.r.t. predicted bounding boxes, respectively, and |·| denotes the number of bounding boxes in a set. The criterion for calling a predicted bounding box p_i ∈ P a match w.r.t. a ground-truth bounding box g ∈ G is that the IoU between p_i and g is greater than a threshold T, i.e.:

$$M^{G} = \{\, g \in G \mid \exists\, p_i \in P:\ \mathrm{IoU}(p_i, g) > T \,\} \tag{4}$$

Analogously,

$$M^{P} = \{\, p \in P \mid \exists\, g_i \in G:\ \mathrm{IoU}(p, g_i) > T \,\} \tag{5}$$
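The following is a direct, illustrative transcription of Eqs. 1 and 3–5 for a single image, assuming axis-aligned boxes in (x1, y1, x2, y2) format (a sketch, not the challenge's official evaluation code):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def ap_fp(pred_boxes, gt_boxes, t=0.8):
    """Image-level AP^FP as in Eqs. 3-5: recall over the ground truth,
    multiplied by a 0.75 penalty per unmatched (false-positive) prediction."""
    matched_gt = sum(any(iou(p, g) > t for p in pred_boxes) for g in gt_boxes)
    matched_pred = sum(any(iou(p, g) > t for g in gt_boxes) for p in pred_boxes)
    penalty = 0.75 ** (len(pred_boxes) - matched_pred)
    if len(gt_boxes) == 0:
        return penalty
    return (matched_gt / len(gt_boxes)) * penalty

print(ap_fp([(0, 0, 10, 10)], [(0, 0, 10, 9)], t=0.8))  # high-overlap match -> 1.0
```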

It differs from the popular evaluation score AP (Average Precision) for object detection in COCO [13], where false positives (FP) are not particularly punished. Moreover, considering the potential application to anonymization enhancement, the IoU threshold T in the Data Anonymization Challenge (see Footnote 1) is set to 0.8, which is more strict than the common single threshold of 0.5 in COCO. In this work, the results are evaluated both with T = 0.8 and T = 0.5, which are denoted by AP^FP_80 and AP^FP_50, respectively. In the Data Anonymization Challenge, there is a mechanism of Bad-Quality: if an image is marked as Bad-Quality, its AP^FP_80 is assigned the value 35%. We therefore record another two evaluation scores, AP^FP_80* and AP^FP_80+, where * denotes that the Bad-Quality mechanism is used when calculating AP^FP_80, and + denotes that the images marked as Bad-Quality are excluded when calculating AP^FP_80.

Fig. 2. Comparison of different evaluation scores (in %). Green: ground-truth box; Red: predicted box. (a) GIoU = 76.4, AP^FP_80 = 0, AP^FP_50 = 100: the IoU threshold of 80% might be too strict. (b) GIoU = 70.7, AP^FP_80 = 0, AP^FP_50 = 0: the AP family might not effectively indicate the decent quality of the prediction if the ground truth contains multiple adjacent boxes and a larger one is detected, while GIoU could. (c) GIoU = 23.1, AP^FP_80 = 0, AP^FP_50 = 0: an unacceptable prediction where most of the sensitive data are exposed, in which case an evaluation score of the AP family (with a high threshold) is desired. (Color figure online)

In addition, the overall evaluation score is calculated by averaging all image-level scores, which applies to both GIoU and AP^FP. These three evaluation scores are illustrated with visual results of ground-truth and predicted bounding boxes in Fig. 2. It shows that no single evaluation score can be considered a silver bullet, especially if the ground-truth annotations are not perfect. In this dataset, imperfect refers to the fact that neighboring handwritten regions are sometimes annotated as a single large box and sometimes as multiple separate small boxes. With this in mind, the results are evaluated with the following five scores: AP^FP_80, AP^FP_80*, AP^FP_80+, AP^FP_50 and GIoU. Note that the first two evaluation scores (AP^FP_80 and AP^FP_80*) are the ones eligible in the SAP Data Anonymization Challenge (see Footnote 1).

4.5 Postprocessing

The deep learning classifier outputs a confidence score for each corresponding class. This confidence score can be thresholded as a hyperparameter to control the false positive rate in the postprocessing step. It has been observed during the experiments that the best results (w.r.t. AP^FP_80) are achieved if this confidence is thresholded at 0.8, which is thus chosen as the default. In addition, to reduce overlapping bounding boxes, some postprocessing steps are applied at the end, following a simple criterion: one large box is preferable to multiple small ones. First, all predicted bounding boxes of each image are sorted in ascending order by their areas. Second, starting from the smallest one, each box is checked to see whether its intersection area with a larger box, divided by the smaller box's area, is greater than a threshold (chosen as 0.9). If this is the case, the smaller box is omitted. This postprocessing is applied by default, and we observed a minimal improvement of around 0.3% in terms of AP^FP_80.
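A minimal sketch of this overlap-removal criterion (illustrative only; the box format and helper names are assumptions):

```python
def merge_overlapping_boxes(boxes, containment_thresh=0.9):
    """Keep one large box instead of several small ones: drop a box when its
    intersection with any larger remaining box, divided by its own area,
    exceeds the threshold. Boxes are (x1, y1, x2, y2)."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    def intersection(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    boxes = sorted(boxes, key=area)          # ascending by area
    kept = []
    for i, small in enumerate(boxes):
        larger = boxes[i + 1:]
        covered = any(intersection(small, big) / area(small) > containment_thresh
                      for big in larger if area(small) > 0)
        if not covered:
            kept.append(small)
    return kept

preds = [(0, 0, 50, 20), (2, 2, 20, 18), (100, 100, 150, 120)]
print(merge_overlapping_boxes(preds))  # the box nested inside the first one is dropped
```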

4.6 Results

In this section, the experimental results are presented regarding the different preprocessing steps (Table 1), the influence of the mechanism of Bad-Quality (Tables 1 and 3), the comparison of various deep learning networks (Table 2) and the results released on the leaderboard of the Data Anonymization Challenge (see Footnote 1) (Table 3). The improvement from Faster R-CNN to Cascade R-CNN is specifically demonstrated (Figs. 3 and 4). Furthermore, examples of predicted handwritten regions from the val set are visualized in Fig. 5, together with the ground-truth bounding boxes.

Fig. 3. mAP50 and mAP75 (val) of Cascade and Faster R-CNN. The cascade version surpasses Faster R-CNN by a larger margin under the more strict criterion.

Fig. 4. Overall loss (bottom; left axis) and mAP50 (top; right axis) during the training. The dashed vertical line indicates the boundary of overfitting after around 170 epochs.


Preprocessing. To begin with, different types of preprocessed images as inputs of the deep learning networks are investigated based on Faster R-CNN and Cascade R-CNN. As shown in Table 1, the preprocessed images alone can bring in worse results compared to the original ones, while the concatenated preprocessed and original images as inputs (‘‘o/o-/pre-’’) achieve the best result w.r.t. AP^FP_80, which is therefore used as the default in the following experiments (Tables 2 and 3). It is believed that the fused preprocessed images could serve as an attention mechanism which highlights the regions more likely to contain the desired objects. Moreover, it is empirically shown in Table 1 that sparse inputs have not always yielded better results when comparing the negated inputs with the original ones.

Table 1. Handwriting detection results (in %) with different preprocessing steps as inputs. Images which contain more than 3 detected boxes are marked as Bad-Quality. Dataset: 600train+198val+200test. Input abbreviations: see Sect. 4.2.

Network | Input | AP^FP_80 | AP^FP_80* | AP^FP_80+ | AP^FP_50 | GIoU
Faster R-CNN | o | 34.2 | 45.4 | 59.2 | 65.5 | 64.6
Faster R-CNN | o- | 35.1 | 45.7 | 58.2 | 64.9 | 65.1
Faster R-CNN | pre | 31.3 | 43.1 | 55.9 | 54.0 | 56.3
Faster R-CNN | pre- | 29.6 | 42.4 | 54.0 | 54.6 | 55.3
Faster R-CNN | o/pre | 34.6 | 43.9 | 56.3 | 64.1 | 64.4
Faster R-CNN | o-/pre- | 35.1 | 44.5 | 53.3 | 66.7 | 65.4
Faster R-CNN | o/o-/pre | 37.2 | 45.6 | 59.6 | 63.3 | 64.9
Faster R-CNN | o/o-/pre- | 35.1 | 45.0 | 57.4 | 62.4 | 64.3
Cascade R-CNN | o | 37.7 | 45.7 | 56.6 | 65.6 | 66.4
Cascade R-CNN | o- | 34.7 | 42.9 | 49.2 | 65.1 | 66.5
Cascade R-CNN | pre | 31.9 | 41.6 | 47.5 | 55.4 | 56.7
Cascade R-CNN | pre- | 32.2 | 42.0 | 47.6 | 57.0 | 58.7
Cascade R-CNN | o/pre | 36.3 | 44.3 | 56.0 | 64.4 | 66.4
Cascade R-CNN | o-/pre- | 35.0 | 44.7 | 56.6 | 64.1 | 64.3
Cascade R-CNN | o/o-/pre | 37.2 | 46.9 | 60.0 | 64.4 | 66.3
Cascade R-CNN | o/o-/pre- | 38.3 | 46.0 | 56.5 | 65.0 | 66.8

Mechanism of Bad-Quality. As introduced in Sect. 4.4, the mechanism of Bad-Quality is provided in the challenge (see Footnote 1) and has been made full use of with the evaluation scores AP^FP_80* and AP^FP_80+. It is observed on the val set that the performance tends to get worse as the number of detected bounding boxes increases. With this in mind, all images with more than 3 detected bounding boxes are marked as Bad-Quality. The purpose of this mechanism is to flag abnormal or complicated cases and turn them over to a manual process, which is practical in the industry. However, if the number of images marked as Bad-Quality is too large to maintain the productive advantage of machines, the evaluation scores AP^FP_80* and AP^FP_80+ might bring in pseudo-good results. Therefore, the number of images marked as Bad-Quality is loosely limited to at most 50% of all images. Conclusively, the mechanism of Bad-Quality is believed to be a flexible trick to deal with the hard cases.

Faster and Cascade R-CNN. Not surprisingly, Table 1 shows that Cascade R-CNN outperforms Faster R-CNN in general except for AP^FP_50, as Cascade R-CNN focuses on object detection of higher quality and might perform worse than Faster R-CNN in terms of AP^FP_50. As shown in Fig. 3, the cascade version surpasses Faster R-CNN by a larger margin under the more strict criterion, when comparing mAP75 with mAP50 (using COCO's evaluation scores [13] for brevity) on the val set. In addition, as depicted in Fig. 4, the training overfits after around 170 epochs, from where the loss values and mAP50 start decreasing. Thus, the 200 epochs chosen for all experiments are appropriate.

Comparison with Other Approaches. Some of the popular deep learning networks for object detection are compared in Table 2, including YOLOv3 [22], RetinaNet [12], Faster R-CNN [23] and Cascade R-CNN [1]. They are implemented with [3]. Darknet-53 [20] is used as the backbone for YOLOv3 and ResNeXt-101 [25] for the other three. The dataset used in this experiment, 800train+198val+400test (see Sect. 4.1), is identical to the one used for the leaderboard. The images are preprocessed to the form ‘‘o/o-/pre-’’ as described in Sect. 4.2. The results show that Cascade R-CNN outperforms the other networks, with a trade-off regarding the inference speed on a single RTX 2080 Ti GPU due to the extra computational overhead. As showcased in Table 3, the results of our approaches are compared with those submitted to the leaderboard. The first three rows are the results from the leaderboard, and the following four rows are our approaches. Our best result (with Cascade R-CNN and BQ) surpasses the previous submissions on the leaderboard. Without the mechanism of BQ, however, the paid Google AutoML Vision API is slightly more advantageous (by 0.7%). Besides, it is noticeable that our YOLOv3 result (implemented with [3]) outperforms the version submitted to the leaderboard (implemented with [17]) by more than 10% in terms of AP^FP_80.

Table 2. Handwriting detection results (in %) using different deep learning networks. Inference speed is tested on a single RTX 2080 Ti GPU. Images which contain more than 3 detected boxes are marked as Bad-Quality. Dataset: 800train+198val+400test.

Network | Inference | AP^FP_80 | AP^FP_80* | AP^FP_80+ | AP^FP_50 | GIoU
YOLOv3 | 42 fps | 36.6 | 43.2 | 47.4 | 61.0 | 62.2
RetinaNet | 11 fps | 27.9 | 40.8 | 44.4 | 51.7 | 54.9
Faster R-CNN | 11 fps | 37.1 | 45.3 | 57.2 | 62.1 | 66.6
Cascade R-CNN | 10 fps | 41.8 | 47.5 | 57.5 | 66.9 | 68.2

Table 3. Comparison with the leaderboard (AP^FP_80 in %). OCR: Tesseract with manual engineering. Service: Google's Cloud API. * and † denote YOLOv3 results from the leaderboard and ours, respectively. BQ: whether the Bad-Quality mechanism is used.

Method | AP^FP_80 | BQ
YOLOv3* | 26.3 | –
OCR | 37.5 | –
Service | 42.5 | –
YOLOv3† | 36.6 | no
YOLOv3† | 43.2 | yes
Cascade | 41.8 | no
Cascade | 47.5 | yes

Fig. 5. Visual results of cropped handwritten regions from the val set (Cascade R-CNN with ‘‘o/o-/pre-’’). Green: ground-truth box; Red: predicted box. (Color figure online)

4.7 Generalizability

English is by far the dominant language in the dataset. Other languages such as Dutch or German are also included. However, the deep learning network is not expected to recognize the discrepancies between different languages; it is natural to treat the languages using Latin alphabets indiscriminately. In this section, we test whether the trained model works on redacted real-world images in foreign languages. Figure 6 illustrates two toy examples of fictitious and unseen invoices used to evaluate the generalizability of the trained model. The model used to localize the handwritten regions is Cascade R-CNN. It is noteworthy that the language in the left image in Fig. 6 is Chinese, which can be considered a foreign language with respect to the dataset. Analogous to the German invoice shown on the right, the handwritten regions of both images are accurately detected as desired. The generalizability of the R-CNN family has also been observed by [26] for text detection in natural scene images. It is believed to be beneficial in industry if a model can be trained once and applied to various cases. Additionally, this raises the common question of what the deep learning network has learned. In this case, it is supposed that the irregularity might be learned to discriminate the printed and handwritten texts.


Fig. 6. Test of generalizability on toy examples with fictitious and unseen invoices. The handwritten regions are accurately localized from the images in Chinese and German.

5 Discussion

5.1 Conclusion

In this work, we present an object detection based approach to localize handwritten regions, which is effective, fast and applicable to unseen languages. First, the influences of the preprocessing steps are investigated. It has been empirically found that the fused concatenation of original and preprocessed images as inputs achieves the best performance. Second, different deep learning networks are compared. It is noticeable that the improvement from Faster R-CNN to Cascade R-CNN can be reproduced, and the high-quality characteristic of the cascade version suits the problem of handwriting localization well. The results of our approaches can serve as a baseline of deep learning approaches to the handwriting localization problem. At last, the generalizability of the deep learning approach is impressive. The learned model is capable of successfully detecting the handwritten regions in real-world unseen images, even for those in the unseen language of Chinese. We believe this is of great interest both for future research and for industrial applications.

5.2 Outlook

As showcased in Fig. 5, some printed cursive texts are also detected as handwriting. It remains challenging to distinguish such nuanced discrepancies. Furthermore, apart from object detection approaches, other proposals from the field of computer vision can also be adopted to distinguish handwritten texts from printed ones. One example is anomaly detection, where the printed texts can be considered the normal instances, since they are more regularly shaped. Thanks to algorithms like the variational autoencoder (VAE) [10], it is also promising to accomplish such tasks in a semi-supervised or even unsupervised manner. The other benefit of using algorithms like the VAE is that the learned intermediate representations can also be exploited to synthesize artificial signatures, further enhancing the anonymization without eliminating the existence of such entities.

References

1. Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2019)
2. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679–698 (1986). https://doi.org/10.1109/TPAMI.1986.4767851
3. Chen, K., et al.: MMDetection: open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
4. Duda, R.O., Hart, P.E.: Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 15(1), 11–15 (1972)
5. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge: a retrospective. Int. J. Comput. Vision 111(1), 98–136 (2015)
6. Gidaris, S., Komodakis, N.: LocNet: improving localization accuracy for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 789–798 (2016)
7. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
8. Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 991–995. IEEE (2015)
9. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
10. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
11. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
12. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
13. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
14. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
15. Malik, M.I., Liwicki, M., Alewijnse, L., Ohyama, W., Blumenstein, M., Found, B.: ICDAR 2013 competitions on signature verification and writer identification for on- and offline skilled forgeries (SigWiComp 2013). In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1477–1483. IEEE (2013)
16. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002)
17. Moses, Olafenwa, J.: ImageAI, an open source Python library built to empower developers to build applications and systems with self-contained computer vision capabilities, March 2018. https://github.com/OlafenwaMoses/ImageAI
18. Nayef, N., et al.: ICDAR 2019 robust reading challenge on multi-lingual scene text detection and recognition - RRC-MLT-2019. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1582–1587. IEEE (2019)
19. Ortega-Garcia, J., et al.: MCYT baseline corpus: a bimodal biometric database. IEE Proc. Vis. Image Signal Process. 150(6), 395–401 (2003)
20. Redmon, J.: Darknet: open source neural networks in C (2013–2016). http://pjreddie.com/darknet/
21. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
22. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
23. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)
24. Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 629–633. IEEE (2007)
25. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431 (2016)
26. Zhong, Z., Sun, L., Huo, Q.: Improved localization accuracy by LocNet for Faster R-CNN based text detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 923–928. IEEE (2017)

Toward an Incremental Classification Process of Document Stream Using a Cascade of Systems

Joris Voerman 1,2 (B), Ibrahim Souleiman Mahamoud 2, Aurélie Joseph 2, Mickael Coustaty 1, Vincent Poulain d'Andecy 2, and Jean-Marc Ogier 1

1 La Rochelle Université, L3i, Avenue Michel Crépeau, 17042 La Rochelle, France
{joris.voerman,mickael.coustaty,jean-marc.ogier}@univ-lr.fr
2 Yooz, 1 Rue Fleming, 17000 La Rochelle, France
{joris.voerman,ibrahim.mahamoud,aurelie.joseph,vincent.dandecy}@getyooz.com

Abstract. In the context of imbalanced classification, deep neural networks suffer from the lack of samples provided by low-represented classes: they cannot train their weights on a statistically reliable set. All solutions in the state of the art that could offer better performance for those classes sacrifice in return a large part of their precision on bigger classes. In this paper, we propose a solution to this problem by introducing a cascade-of-systems concept that can integrate a deep neural network. This system is designed to preserve the original network performance as much as possible while reinforcing the classification of the minor classes through the addition of stages with more specialised systems. This cascade offers the possibility to integrate few-shot learning or incremental architectures after the deep neural network without major restrictions on the systems' internal architectures. Our method keeps intact or slightly improves the performance of a deep neural network (used as the first stage) in conventional cases and improves performance in strongly imbalanced cases by around +8% accuracy.

Keywords: Document classification · Imbalanced classification · Deep learning · Image processing · NLP

1 Introduction

Private companies and public administrations have to manage a huge quantity of documents every day. These documents come from internal processes and from exchanges with external entities like subcontractors or the public administration. Without an automatic system, processing so many documents requires a lot of human resources. In addition, these documents are generally linked to the administrative part or to the company's core activities. Such documents are consequently of primary importance, as they generally validate an action or a decision inside and/or outside the company. The management of these documents becomes a trade-off between speed and precision. Indeed, an error could have a heavy cost by causing a wrong action or decision. Consequently, any automatic system used in this context needs to have a high precision. Many companies, like ABBYY, KOFAX, PARASCRIPT, or YOOZ, propose document classification solutions known as Digital Mailrooms [5,25]. Those systems rely on a combination of deep learning methods and expert systems, where experts are needed to make those systems operational. Expert systems offer high performance in almost every situation, but with a very high maintenance cost to keep them up to date. They are therefore gradually being replaced by machine learning methods, whose performance is linked to the availability of large labeled datasets. These machine learning methods have not entirely replaced expert systems, because in imbalanced cases even the most reliable machine learning methods like deep learning lose a significant part of their performance and cannot cope with the lack of samples available to train them [30]. In the state of the art, this problem can only be solved by retraining the model each time enough samples have been gathered to properly train a new class.

In order to formalize the document workflow, we use a document stream model proposed in [5]. A document stream is a sequence of very heterogeneous documents that appear irregularly over time. The stream is composed of numerous classes unequally distributed inside the stream with a group of core classes that is the majority of the stream and a lot of minor classes less represented. Figure 1 illustrates the distribution of documents among classes in a real document stream. Many of the minor classes have only few documents that do not allow a good training. Even if the number of documents per class increases as the content of a document stream evolves, new classes spontaneously appear

242

J. Voerman et al.

time-to-time and others disappear. So a training set generated from a document stream has two main properties: it is incomplete and imbalanced. In other words, it does not contain a sample for each class and the distribution of samples between classes is not equal. The classification of documents issuing from a document stream is already studied in the literature, and the most recent and efficient approaches rely on machine learning and deep learning methods. Those methods have to deal with two main issues: the first one is the imbalance distribution of documents between classes. This leads to classes with a very few samples (sometimes only 1) to train the network while a few classes gather more than 70% of the total stream. In addition, the network has to be trained again each time a new class appears in the stream (which is far from being simple to detect). The second issue is precisely this addition of new classes, or samples with a really different template in an existing class. This second point has also been studied and some technology using incremental learning [3] or few-shot machine learning methods [31] have been proposed. However, even if they are able to learn new classes, the global performances on major classes (i.e. with the largest number of documents) did not compete with the elder methods. Finally, the last challenge related to our context is the document classification itself. Indeed, this task is not simple and many articles have been published recently to adapt methods from image and language processing to the document classification. In fact, documents have the particularity to have two main modalities whose the importance varies according to the context: the image and the text. Here, with a majority of administrative and companies documents the text is the most important modality. Recently, a multi-modal approach of deep learning [2] appears to combine these two modalities. In the next section, we will review the state of the art around this theme.

2

Related Work

In the literature the document classification is traditionally divided into two approaches: text and image processing. For each of these approaches, many highly accurate deep learning architectures have been developed during the last years. The majority of newest deep learning methods for text classification use two steps. The first step is the generation of a word embedding because words are not suitable input for a neural network. The embedding is used to change a word into a numerical vector that could represent as much as possible the contextual word meaning. Multiple methods like Word2Vec [20], Fasttext [12] or BERT [6] have been developed to generate word embedding by training them on huge textual dataset like Wikipedia articles. The second step is a deep neural network trained on the targeted dataset. Many methods use a recurrent architecture like biRNN (bidirectional recurrent) [26] or RCNN (recurrent convolutional) [16] to take in account the sequence information of the sentence. These methods use specific recurrent neuron architectures with GRU [4] and LSTM [10]. Not recurrent methods also work, like textual CNN [13,33]. CouldScan [21] is a good example of these methods apply to the industrial context.

Classiffication of Imbalanced Data with a Cascade

243

The majority of image processing methods used for classification are deep convolutional pixel-based neural network (CNN) with multiple architecture: VGG [27], NasNet [34], InceptionResNet [29], DenseNet [11], etc. Initially, those methods are designed for image classification, but they offer suitable performance on document classification. There are also some methods designed specifically for document classification [9]. Recently, a multi-modal strategy emerges and seems to take advantage of previous approaches. The state of the art proposes multiple combinations of previous deep neural network for a multi-modal classification [2]. Architectures are mainly two networks with one for text and the second for image that combine their output. The one-shot and few-shot learning strategies [31] are also relevant in our case. The few/one-shot learning is a challenge in image processing which is defined as train with only a few numbers of samples per classes. Firstly, there is Bayesian based approach methods [7,17]. Secondly, some methods try to adapt the neural network architecture to few-shot learning like Neural Turing Machine and other Memory Augmented Network [23], Siamese Network [14] and Prototypical Network [28]. The incremental training also competes for this task, but the multiple tries to adapt neural networks for incremental learning [15,22,24] do not show great results. Older methods like incremental SVM [18], K-means [1] and Growing Neural Gas (IGNG) [3] seem to be more reliable for now. The literature shows some good samples of industrial applications based on these methods like INTELLIX [25] or INDUS [5]. At first sight, zero-shot learning [32] seems to be an interesting option to solve the problem of new class classification. However, zero-shot learning strategies are mainly designed for image classification and use transfer learning from a textual description or all other prior knowledge. In our case, we have no prior knowledge about new classes so these methods are unsuitable.

3 The Cascade of Systems

3.1 Main Idea

To address the classification of poorly represented classes, we propose to introduce a cascade of systems. The objective is to keep the precision of a deep learning network for the main classes (i.e. those that represent the most frequent documents of the stream), while using more specific architectures designed for the least represented classes. Our cascade follows a divide-and-conquer strategy, where the decision proposed by a system is taken into account if and only if its confidence is high and the decision is reliable from a global point of view. More formally, the first stages of the cascade deal with the most frequent classes, while the lower stages rely on other systems trained as specialists to reinforce the classification of outliers and small classes. The cascade aims at a high precision for each stage. This also implies the use of a rejection mechanism with a high threshold that sends all rejected elements to the next stage of the cascade. The further we advance in the cascade, the more the systems become


specialized in rejected cases. This process is repeated as many times as needed. This strategy offers several advantages. First, it makes it possible to combine the network with a system that compensates for the weaknesses of deep learning, such as an incremental method. The next-stage system can also use a different modality from the previous one, which seems to offer better performance when the modalities are complementary.

3.2 Training Architecture

The new problem introduced by the cascade is the training of the next-stage system. Keeping the original training set to train stage n+1 would mainly duplicate the results without significant benefit: the stage-n+1 system would reject the same elements as stage n. To train stage n+1 effectively, we need to remove from the new training set all the elements easily classified by stage n and keep all likely rejected cases. We define this new training set in Eq. 1, where:

– $D_n$ is the training dataset of stage n, with $D_n \subset \{d_1, d_2, \dots, d_i\}$;
– each sample $d_i$ has a class $c_j$;
– $D_{n+1}$ is the training dataset used by stage n+1 (i.e. the samples rejected at stage n);
– $\bar{D}_n$ is the set of documents with a very high confidence level and a trustworthy class representation according to stage n (detailed in the next section).

$$D_n = \bar{D}_n \cup D_{n+1} \qquad (1)$$

The evolution of the training set from one stage to the next acts as a focusing step and forces the next system to better discriminate the documents rejected at the previous stages. In our architecture, the training phases are run successively, as illustrated in Fig. 2: we perform this set division after each system's training phase and then update the training set for stage n+1.
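A minimal sketch of this successive training procedure is shown below; the `fit` and `select_for_next_stage` helpers are hypothetical placeholders for the stage training and for the selection function of Sect. 3.3, not the authors' implementation.

```python
# Minimal sketch of the successive cascade training (Fig. 2, Eq. 1), assuming
# generic stage objects; none of these names come from the paper.

def train_cascade(stages, train_set, valid_set, select_for_next_stage):
    D_train, D_valid = train_set, valid_set
    for stage in stages:
        stage.fit(D_train, D_valid)                      # train stage n on D_n
        # keep only the samples that stage n handles poorly (D_{n+1} of Eq. 1)
        D_train = select_for_next_stage(stage, D_train)
        D_valid = select_for_next_stage(stage, D_valid)  # same division on validation
        if not D_train:                                  # nothing left to specialise on
            break
    return stages
```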

3.3 Set Division

With this architecture, the global performance of the later stages depends on the division of the training set, which becomes one of the main components of the cascade. $D_{n+1}$ needs to model as well as possible all the elements that will be rejected by the previous stage. This includes complex classes with a high intra-class variance, classes that overlap with their closest classes in the feature representation space, and classes that do not have enough samples to train the model properly. It also includes outliers that have a lower confidence rate than the other samples of the same class. The selection process, which in practice corresponds to the confidence we grant to the system at stage n, is defined by a parametric selection function presented in Eq. 2. This selection function uses four different features to assess how the system performs at this stage on the proposed training set; these features gather information at both the document level and the class level.


Fig. 2. Training architecture

We define $D_{n+1}$ as follows: $\forall d_i \in D_n$ with class $c_j$, $d_i \in D_{n+1}$ if and only if

$$\underbrace{\alpha A}_{\text{doc level}} + \underbrace{\beta B + \gamma C + \delta D}_{\text{class level}} < T \qquad (2)$$

where:

– $\alpha$, $\beta$, $\gamma$, $\delta$ are weighting parameters, with $\alpha + \beta + \gamma + \delta = 1$.
– $A$ is the confidence rate computed by the stage-n system $N_n$ for the corresponding class $c_j$. As deep neural networks generally produce very high confidence rates, we use a normalisation function to spread the confidence levels; depending on the distribution of confidence scores given by the system, we advise a min/max or an exponential normalisation. The objective is to keep only the documents with the lowest scores.
– $B$ is the accuracy obtained by the stage-n system $N_n$ on class $c_j$. This feature is used to keep more documents from classes with a low accuracy.
– $C$ is the ratio between the inter-class and the intra-class variance. Variances are computed from the Euclidean distance between the document embeddings $E_n$ trained by $N_n$ and their class centroids: $Ed_n(d_i)$ is the distance between the embedding of $d_i$ and its class centroid, $Ed_n(c_j)$ is the mean of these distances for class $c_j$, and $Ed_n(D_n)$ the mean over the whole dataset; $x_j$ is the number of samples in class $c_j$ and $x_n$ the number of classes in $D_n$. As $C \in [0, +\infty[$, we recommend clipping its values to $[0, 2]$ and then normalising so that all features remain in $[0, 1]$.


$$
\begin{cases}
C = \dfrac{\operatorname{var}_{inter}(c_j)}{\operatorname{var}_{intra}(c_j)}\\[6pt]
\operatorname{var}_{inter}(c_j) = \dfrac{1}{x_n}\displaystyle\sum_{y=1}^{x_n}\bigl(Ed_n(c_y) - Ed_n(D_n)\bigr)^2, \quad \forall c_y \in D_n\\[6pt]
\operatorname{var}_{intra}(c_j) = \dfrac{1}{x_j}\displaystyle\sum_{y=1}^{x_j}\bigl(Ed_n(d_y) - Ed_n(c_j)\bigr)^2, \quad \forall d_y \in c_j\\[6pt]
Ed_n(c_j) = \dfrac{1}{x_j}\displaystyle\sum_{y=1}^{x_j} Ed_n(d_y)
\end{cases}
\qquad (3)
$$

– $D$ is the representation of class $c_j$ inside $D_n$, where $size(c_j)$ is the number of documents in class $c_j$ and $\max(size(c))$ is the number of documents in the largest class. This feature ensures that the least represented classes remain in $D_{n+1}$:

$$D = \frac{size(c_j)}{\max(size(c))} \qquad (4)$$

– $T$ is a threshold parameter.

The division of the training set is done once the stage-n training phase is completed; consequently, the features are computed when the whole system is fully trained. The overall proposed architecture is summarised in Fig. 3. The selection function is applied to both the training set and the validation set, in order to ensure a fair validation process for the stage-n system and to make sure that the training of the stage-n+1 system focuses on the remaining samples. The last step of our architecture concerns the reduction of the training and validation set sizes. In most cases the balance between them is broken, because the validation set becomes proportionally larger than the training set. We rebalance them class by class, randomly selecting enough samples to restore the ratio between the two sets; this has to be done class by class because each class is not reduced at the same rate.
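For illustration, the selection score of Eq. 2 could be computed as in the sketch below; the feature values are assumed to be pre-normalised to [0, 1] as described above, and all names and default weights are ours, not the authors'.

```python
import numpy as np

def selection_score(A, B, C, D, alpha=0.25, beta=0.25, gamma=0.25, delta=0.25):
    """Weighted score of Eq. 2 for one document: alpha*A + beta*B + gamma*C + delta*D."""
    # C (inter-/intra-class variance ratio, Eq. 3) is unbounded, so clip it to
    # [0, 2] and rescale to [0, 1] as advised in the text.
    C = np.clip(C, 0.0, 2.0) / 2.0
    return alpha * A + beta * B + gamma * C + delta * D

def build_next_stage_set(documents, features, threshold):
    """Keep in D_{n+1} every document whose score falls below the threshold T."""
    return [doc for doc, (A, B, C, D) in zip(documents, features)
            if selection_score(A, B, C, D) < threshold]
```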

3.4 Decision Process

The decision process starts once the training phase has ended for all the stages of the cascade. It uses a rejection system to separate the less reliable answers from the others. The rejection is applied to the confidence rate returned by the system (in the one-hot vector case, the confidence rate is the highest score of the vector). As mentioned earlier in this paper, our document stream classification problem requires a very high precision rate to avoid mistakes; to this end, we set a high value for the threshold. All rejected results are then sent to the next stage of the cascade, which applies the same process.


Fig. 3. Set division architecture

If no decision has been taken when the document goes through the last stage of the cascade, the document is finally labelled with a "reject" label. The last adjustment of our architecture concerns the computation of the final confidence rate in an imbalanced context. Indeed, the confidence rate returned for a class trained with two samples does not have the same meaning as the confidence rate returned for a class with five thousand samples. In addition, the cascade stages are potentially not all trained on all classes. To take this into account, we propose to weight the confidence rate. The main idea is to use the validation set to assess the reliability of each cascade stage for each class, and to adapt the confidence rate in proportion to the accuracy score of the class. This increases the rejection rate on the classes estimated (on the validation set) to have the lowest accuracy, and hence the highest risk of giving a wrong answer. These classes are, in theory, the same ones whose samples were sent to the stage-n+1 training set by the selection function. The new weighted confidence score of a document $d_i$, denoted $N'_n(d_i)$, is computed by Eq. 5. It is composed of the former confidence score $N_n(d_i)$ and a penalty. The penalty is based on the stage-n accuracy $Acc_n(c_j)$ on the validation set for the predicted class $c_j$. An additional parameter $r$ limits the effect of the class weight to a specific bound; it prevents the system from rejecting whole low-reliability classes rather than just more of their documents.

$$N'_n(d_i) = \underbrace{N_n(d_i)}_{\text{confidence score of } d_i} \;-\; \underbrace{r\,\bigl(1 - Acc_n(c_j)\bigr)}_{\text{weighted penalty } W_b} \qquad (5)$$


Figure 4 illustrates the proposed decision process with the new weighted scoring system. Weighting generally reduces the stage-n accuracy slightly while increasing its precision, mainly on poorly represented classes. The next stages of the cascade compensate for this loss of accuracy, as we demonstrate in the next section.

Fig. 4. Cascade decision with weighted rejection
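The decision process of Fig. 4 and Eq. 5 can be sketched as follows; the `predict` method returning a (class, confidence) pair and the per-class validation accuracies are our own assumptions about the stage interface.

```python
REJECT_LABEL = "reject"

def weighted_confidence(conf, class_acc, r=0.5):
    """Eq. 5: penalise the confidence of classes with a low validation accuracy."""
    return conf - r * (1.0 - class_acc)

def cascade_decision(stages, document, threshold=0.9, r=0.5):
    for stage in stages:
        label, conf = stage.predict(document)            # raw confidence N_n(d_i)
        acc = stage.validation_accuracy.get(label, 0.0)  # Acc_n(c_j) on the validation set
        if weighted_confidence(conf, acc, r) >= threshold:
            return label                                 # confident answer: stop here
        # otherwise the document is rejected and passed to the next stage
    return REJECT_LABEL                                  # no stage was confident enough
```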

4 Experiments

4.1 Architectures Used

All experiments reported in this paper use a two-stage cascade adapted to our context (a private dataset). For the first stage, we use deep neural networks for their ability to classify classes with a large number of labelled samples. For the second stage, we propose more specific architectures able to deal with few samples per class during the learning phase, selecting one method per modality. To validate our ideas, we use two different networks in the first stage: a VGG16 [27] network as an image-based classification model, and a text-based bidirectional recurrent network (biRNN) [26] with CamemBERT [19] as the embedding model. The biRNN is reinforced with an attention model to improve the trained features, as proposed in [8]. These two methods correspond to state-of-the-art deep neural network models for their respective modality. For the second stage, we compare a few-shot learning method, the Prototypical Network (ProtoNet [28]), known as one of the best few-shot learning methods, with an incremental learning approach based on a growing neural gas, A2ING [3], developed for document classification. The first method is based on visual features while the second uses raw OCR text as input. These systems were selected to offer a maximum variety of modalities and of state-of-the-art architectures designed for document, image and text classification in the experimental results.


Finally, in order to allow a fair comparison of the performance of our proposed cascade architecture, we also used two other methods as baselines: an image-based holistic CNN designed for document classification (HCNN [9]) and a non-recurrent textual network (TCNN), which combines the architectures of [13] and [33].

4.2 Datasets

To evaluate our system we use two datasets, one private and one public. The first is a private document database provided by the Yooz company to assess performance on a real-case document stream; we call it "YoozDB". It gathers real documents coming from Yooz's customers, with access to the scanned image of each document and the text extracted by ABBYY FineReader (version 14.0.107.232). It is composed of 23,577 documents unequally distributed over 47 classes, with 15,491 documents (65.71%) for training, 2,203 (9.34%) for validation and 5,883 (24.95%) for test. Classes are imbalanced, with between 1 and 4,500 documents per class (the distribution of documents per class is shown in Fig. 1). The classes include multiple variations of invoices, business orders, dues, tax notices, bank details, contracts, mail, cheques, ID cards, etc. The main drawback of this dataset comes from its test set, which follows the same distribution as the training set, with some classes composed of only one (or very few) documents. This leads to a lack of samples for the least represented classes, which does not yield statistically reliable results for those classes and reduces their effect on the global performance. The second dataset used in our comparison is the well-known RVL-CDIP dataset [9]. It is composed of 400,000 industrial documents gathered from tobacco companies, equally distributed over 16 classes: letter, memo, email, file folder, form, handwritten, invoice, advertisement, budget, news article, presentation, scientific publication, questionnaire, resume, scientific report and specification. Some of these documents contain almost no text; overall, this dataset is built as an image dataset and therefore disadvantages text-based methods. RVL-CDIP is divided into a training set of 320,000 documents (20,000 per class), and a validation set and a test set of 40,000 documents each (2,500 per class).

4.3 Experimentation Methodology

To evaluate the performance of our cascade architecture and of all the methods used in our comparison, we use three classical information retrieval metrics: accuracy, precision and rejection rate. We use accuracy to compare our results to the other methods of the state of the art; precision is important for our context; and the rejection rate displays the relevance of our proposed rejection system. All results on YoozDB are computed with a k-fold cross-validation process (in our case, k = 10, with seven folds for training, one for validation and two for testing) in order to cope as well as possible with the low number of documents.


Folds are randomly generated class by class to take into account the imbalanced distribution between classes. In addition, we apply specific strategies to evaluate how well the systems adapt to the imbalanced and incomplete context of a document stream. These strategies follow the same methodology as proposed in our previous work [30]. Briefly, we generate two new sets, called Imbalanced and Realistic, from RVL-CDIP. The first evaluates resilience to an imbalanced class distribution: the original set is modified to simulate the imbalanced distribution of YoozDB by gathering the classes into four groups of four random classes, each group keeping only a fraction of its samples (5%, 10%, 50% and 100%). The second adds a group of classes with only one sample, to represent classes that will appear later in the document stream; the distribution then becomes four classes with a single document and three classes in each of the imbalanced groups as before. All these modifications are applied only to the training and validation sets, not to the test set, in order to keep the results as reliable as possible.
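A possible implementation of the Imbalanced protocol described above (four groups of four random classes keeping 5%, 10%, 50% and 100% of their training samples) is sketched below; the data structures are our own assumptions.

```python
import random

def make_imbalanced_split(train_samples, ratios=(0.05, 0.10, 0.50, 1.00), seed=0):
    """train_samples: dict mapping class name -> list of training documents."""
    rng = random.Random(seed)
    classes = sorted(train_samples)
    rng.shuffle(classes)
    groups = [classes[i::len(ratios)] for i in range(len(ratios))]  # 4 groups of 4 classes
    reduced = {}
    for ratio, group in zip(ratios, groups):
        for cls in group:
            docs = train_samples[cls]
            keep = max(1, int(len(docs) * ratio))   # keep only a fraction of each class
            reduced[cls] = rng.sample(docs, keep)
    return reduced
```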

4.4 Results

All experimental results on the balanced RVL-CDIP and YoozDB sets are presented in Table 1. We can observe that all the results given by the cascade methods are at least equivalent to those of the basic methods. One exception is the biRNN-ProtoNet architecture on RVL-CDIP, which offers better performance. This improvement is linked to the fact that the modalities complement each other between the biRNN (text) and the ProtoNet (visual); the gain is mainly on the file-folder and handwritten classes, which are very difficult for a text classifier due to their lack of text. The results on YoozDB are not very significant here because, as explained previously, the test set is imbalanced and incomplete. The under-representation of minor classes in the test set limits or nullifies the impact of these classes on the global result. In addition, it is impossible to evaluate the real internal diversity of a class with only two samples of the same document template, and the results are not statistically reliable for these classes. Consequently, the results on this dataset come mainly from a good classification of the major classes, with imbalance having less importance, and the global performance remains close to that of the balanced dataset. The results of the A2ING combinations on RVL-CDIP are not displayed, because this method is not adapted to RVL documents, which lack words in the majority of cases. Finally, the multi-modal potential seems limited here: other methods from the state of the art obtain a larger gain from combining the text and image modalities in document classification [2]. The results given by the table for VGG16 and biRNN alone can be read as the raw performance of the first stage (n) of the corresponding cascades; if desired, the performance of the second stage (n+1) alone can be derived from the combined result and the corresponding raw method. Table 2 gives the results of all methods on RVL-CDIP modified to simulate an imbalanced set with the protocol introduced in Sect. 4.3.


Table 1. Balanced case results

Method / Measure  | RVL-CDIP Accuracy | Precision | Reject rate | YoozDB Accuracy | Precision | Reject rate
HCNN              | 80.76%            | 95.42%    | 15.36%      | 84.57%          | 95.67%    | 11.61%
TCNN              | 65.52%            | 93.72%    | 30.09%      | 88.36%          | 98.97%    | 10.72%
VGG16             | 75.14%            | 94.41%    | 20.41%      | 84.70%          | 95.47%    | 11.28%
VGG16 - ProtoNet  | 75.79%            | 94.04%    | 19.41%      | 84.21%          | 95.61%    | 11.92%
VGG16 - A2ING     | –                 | –         | –           | 82.05%          | 96.62%    | 15.08%
biRNN             | 77.95%            | 88.58%    | 12.01%      | 93.57%          | 99.22%    | 5.69%
biRNN - ProtoNet  | 80.72%            | 88.99%    | 9.30%       | 93.77%          | 98.95%    | 5.23%
biRNN - A2ING     | –                 | –         | –           | 91.45%          | 99.30%    | 7.90%

For the imbalanced case, the cascade offers +11% accuracy for VGG16 and +6.5% for biRNN, against a loss of around 1% precision. This means that the cascade recovers a third of the documents rejected by VGG16, and close to half of those rejected by biRNN, with almost the same precision in an imbalanced context. In absolute terms, the cascade recovers more than 5,000 documents with the same precision as the network alone, i.e. 5,000 fewer documents out of 40,000 to process manually. This gain comes from the better suitability of 5-shot learning methods such as ProtoNet for training classes with few samples, and from the rebalancing performed by the cascade: an important part of the major-class samples is not transferred to the second-stage training set, so the minor classes become proportionally better represented. For example, with VGG16, the cascade with ProtoNet recovers 43.52%, 17.16%, 29.16% and 13% of the rejected documents for the classes reduced to 5% of their training documents, as can be seen in Table 3. The results for the realistic case are less impressive, even though accuracy gains +8%, because precision is reduced more by the second stage; indeed, the Prototypical Network is less effective in the one-shot learning situation.

Table 2. Imbalanced case results

Method / Measure  | Imbalanced Accuracy | Precision | Reject rate | Realistic Accuracy | Precision | Reject rate
HCNN              | 67.97%              | 89.14%    | 23.75%      | 55.70%             | 77.55%    | 28.18%
TCNN              | 51.98%              | 91.63%    | 43.27%      | 36.88%             | 78.50%    | 53.02%
VGG16             | 59.17%              | 88.90%    | 33.44%      | 51.28%             | 77.97%    | 34.24%
VGG16 - ProtoNet  | 70.19%              | 87.67%    | 19.94%      | 58.89%             | 73.28%    | 19.64%
biRNN             | 68.37%              | 79.73%    | 14.25%      | 50.71%             | 67.56%    | 24.94%
biRNN - ProtoNet  | 74.90%              | 79.00%    | 5.19%       | 58.05%             | 64.30%    | 9.72%


Table 3. Differences per class between VGG16 and the VGG16-ProtoNet cascade on imbalanced RVL-CDIP

Groups      | 100%                                     | 50%
Classes     | Advert  | File F  | Handwr  | Sc report | Budget  | Email   | Invoice | Resume
Precision   | 0.06%   | −5.97%  | −5.80%  | 0.69%     | −1.45%  | −0.25%  | −0.06%  | −0.23%
Reject rate | −4.60%  | −7.00%  | −2.12%  | −5.00%    | −6.52%  | −1.72%  | −3.20%  | −3.44%

Groups      | 10%                                      | 5%
Classes     | Form    | Letter  | Presen  | Questi    | Memo    | News A  | Sc Public. | Speci
Precision   | −8.16%  | −1.98%  | −5.48%  | −2.02%    | −1.27%  | −4.05%  | −3.51%     | −1.10%
Reject rate | −18.40% | −17.76% | −22.88% | −20.56%   | −13.00% | −29.16% | −43.52%    | −17.16%

5 Conclusion

5.1 Overview

We propose a new architecture that successively combines multiple neural networks in a cascade to reinforce them in an imbalanced context. A two-stage combination of a deep network and a 5-shot learning method yields between +8% and +11% accuracy in the imbalanced context while keeping equivalent performance in balanced cases. The cascade offers multiple possibilities of combination and places few constraints on the internal architecture of the combined systems: it only needs an embedding representation of the documents and a per-class confidence score to compute the next-stage training set. The results of the cascade in the imbalanced context are promising but require a better adaptation to one-shot cases.

5.2 Perspectives

As future work, we will try to combine our cascade with a multi-modal architecture and evaluate the potential of cascades with more than two stages. We also want to explore further the potential of a neural network cascade combined with an incremental system and build a better synergy between them, in order to create a high-performance incremental deep learning system. In addition, we plan to design a methodology to assess the quality of the next stages' training sets, and thus of the selection function. All the parameters used in our experiments have so far been selected empirically; we will look for a way to compute them automatically in future research.

References

1. Aaron, B., Tamir, D.E., Rishe, N.D., Kandel, A.: Dynamic incremental k-means clustering. In: 2014 International Conference on Computational Science and Computational Intelligence, vol. 1, pp. 308–313. IEEE (2014)
2. Bakkali, S., Ming, Z., Coustaty, M., Rusinol, M.: Visual and textual deep feature fusion for document image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 562–563 (2020)


3. Bouguelia, M.R., Bela¨ıd, Y., Bela¨ıd, A.: A stream-based semi-supervised active learning approach for document classification. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 611–615. IEEE (2013) 4. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014) 5. d’Andecy, V.P., Joseph, A., Ogier, J.M.: IndUS: incremental document understanding system focus on document classification. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 239–244. IEEE (2018) 6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) 7. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006) 8. G´ orriz, M., Antony, J., McGuinness, K., Gir´ o-i Nieto, X., O’Connor, N.E.: Assessing knee OA severity with CNN attention-based end-to-end architectures. arXiv preprint arXiv:1908.08856 (2019) 9. Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 991–995. IEEE (2015) 10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 11. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017) 12. Joulin, A., Grave, E., Bojanowski, P., Douze, M., J´egou, H., Mikolov, T.: Fasttext. zip: Compressing text classification models. arXiv preprint arXiv:1612.03651 (2016) 13. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014) 14. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, Lille, vol. 2 (2015) 15. Kochurov, M., Garipov, T., Podoprikhin, D., Molchanov, D., Ashukha, A., Vetrov, D.: Bayesian incremental learning for deep neural networks. arXiv preprint arXiv:1802.07329 (2018) 16. Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015) 17. Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) 18. Laskov, P., Gehl, C., Kr¨ uger, S., M¨ uller, K.R.: Incremental support vector learning: analysis, implementation and applications. J. Mach. Learn. Res. 7, 1909–1936 (2006) 19. Martin, L., et al.: CamemBERT: a tasty French language model. arXiv preprint arXiv:1911.03894 (2019) 20. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) 21. Palm, R.B., Winther, O., Laws, F.: CloudScan-a configuration-free invoice analysis system using recurrent neural networks. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 406–413. IEEE (2017) 22. Rosenfeld, A., Tsotsos, J.K.: Incremental learning through deep adaptation. IEEE Trans. Pattern Anal. Machine Intell. 42, 651–663 (2018)


23. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065 (2016) 24. Sarwar, S.S., Ankit, A., Roy, K.: Incremental learning in deep convolutional neural networks using partial network sharing. IEEE Access 8, 4615–4628 (2019) 25. Schuster, D., et al.: Intellix-end-user trained information extraction for document archiving. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 101–105. IEEE (2013) 26. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997) 27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 28. Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. arXiv preprint arXiv:1703.05175 (2017) 29. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (2017) 30. Voerman, J., Joseph, A., Coustaty, M., Poulain d’Andecy, V., Ogier, J.-M.: Evaluation of neural network classification systems on document stream. In: Bai, X., Karatzas, D., Lopresti, D. (eds.) DAS 2020. LNCS, vol. 12116, pp. 262–276. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-57058-3 19 31. Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: a survey on few-shot learning. ACM Comput. Surv. (CSUR) 53(3), 1–34 (2020) 32. Xian, Y., Schiele, B., Akata, Z.: Zero-shot learning-the good, the bad and the ugly. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4582–4591 (2017) 33. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015) 34. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710 (2018)

Automating Web GUI Compatibility Testing Using X-BROT: Prototyping and Field Trial

Hiroshi Tanaka

Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki 211-8588, Japan
[email protected]

Abstract. The main purpose of this paper is to introduce the application area of GUI test automation to researchers in the document analysis and recognition (DAR) field. Continuous integration (CI) represents the main stream in the field of web application development and reducing the burden of testing has become an important issue. In particular, GUI testing relies on human labor in many cases and test automation in this area is the least advanced. Because web content is essentially a type of “document,” DAR technology is useful for promoting automation in this field. However, there have been few presentations on testing technology at conferences in the DAR field. This may be because researchers in the DAR field do not know much about which types of technologies are required in the field of software testing. We developed X-BROT, which is a tool for automatically determining the compatibility of web applications, and attempted to automate GUI testing. X-BROT can detect degradation by comparing the behaviors of web applications between versions under development. Compatibility verification between web documents is realized using DAR technology. X-BROT has shown nearly 100% accuracy for detecting differences between browsers [1] and it seems to be sufficient for automating testing, but many issues have been pointed out in the field of web application development. This paper describes this feedback and discusses new features that have been developed in response, as well as cases where DAR technology is required for GUI test automation. Keywords: Web browser · HTML · DOM · Clique · Relative position · OCR · Selenium · WebDriver · CI

1 Introduction

In the field of software engineering (the software industry), as continuous integration (CI) has become more widely used, the automation of web GUI testing, which relies on human labor, has become an increasingly important issue. In general, CI automatically executes tests in the background every time a software version is released. Therefore, to reduce the number of required man-hours, a fully automated function for determining the pass/fail of tests is required. Software behavior testing involves executing typical usage procedures (test cases) and verifying the correctness of the produced results. In the case of CI, it can be assumed that the version prior to a revision has been tested and is working properly, so if there are


differences between the web pages before and after revisions, they can be considered as possible bugs. In other words, the execution results of the previous version can be regarded as expected values. Because web applications can behave differently in different browsers, cross-browser testing (XBT) is required to compare behavior across browsers [1]. There are two types of compatibility tests for XBT, called functional compatibility tests (FCT) and visual compatibility tests (VCT). FCT verifies the compatibility of the behavior of webpages, whereas VCT verifies the compatibility of the visual representations of webpages. To automate VCT, document analysis and recognition (DAR) techniques such as image processing and document structure analysis are particularly useful. However, there are few presentations on testing techniques at conferences in the DAR field. This may be because web GUI testing is not well known as an application field of DAR technology. VCT requires the ability to detect differences between web pages. The simplest way to achieve this goal is to compare images captured from browser screens in a pixel-by-pixel manner using functions such as those of OpenCV. There are other image comparison techniques such as Browserbite [2], which divides images based on Harris corners, and the technique presented in [3], which iteratively divides page images and compares each partial image. Another approach to detecting page incompatibilities is DOM-based comparison, where the DOM is a tree structure from which the content on a web page is extracted by each browser using the Selenium WebDriver [4]. Because each DOM element contains attribute values (position, size, borders, etc.), visual incompatibilities between browsers can be detected by referring to DOM information [5–9]. We developed X-BROT, a tool for DOM-based web page compatibility testing using Selenium, and incorporated it into a scenario testing tool (STT) (Figs. 1 and 2) that was applied to CI-based development sites. Because CI requires comparing different versions of web applications, X-BROT has the ability to identify DOM element changes between different DOM trees. X-BROT has achieved sufficient practical verification accuracy, but several practical problems were reported from the development site. In this paper, we first present an overview of the previously reported X-BROT system. We then discuss the issues identified in the field of development and explain the newly developed functions as solutions to these issues. Finally, we discuss how DAR technology can be used to apply web GUI test automation to CI.

Fig. 1. Typical structure of a scenario-based test.

2 X-BROT: Web Compatibility Testing Tool

In this section, we discuss the web GUI compatibility testing tool X-BROT, which we developed in a previous study [1]. First, we describe the overall structure and processing modules, followed by the comparison algorithm. We then describe the results of a simple evaluation experiment assessing the accuracy of web page comparisons.


Fig. 2. Incorporating X-BROT into a scenario-based test.

2.1 Overall Structure

First, it should be noted that X-BROT can be used in two ways: for inter-browser comparison and for inter-version comparison. The former detects differences on the same web page displayed in different browsers, and the latter detects the differences between the previous and new versions of a web page to check for degradation. In this paper, we mainly focus on the latter function. The structure of X-BROT is illustrated in Fig. 3. The FCT module accesses the browser through WebDriver and obtains DOM data. When X-BROT is used for inter-browser comparison, multiple browsers must perform the same operation synchronously, so the simultaneous operation module operates each browser and obtains web page data from each of them. Because this paper focuses on inter-version comparison, it is the display content of a single browser that is saved for a single web operation: as shown in Fig. 2, the STT operates twice, once for the previous version and once for the new version, so the web page is saved for each operation. The VCT module reads the stored web page data, compares them, and generates a difference detection report; either layout comparison or content comparison can be performed to compare web pages. X-BROT has three main functions: element identification, layout comparison, and content comparison. Element identification identifies the same DOM elements in order to compare different versions of a DOM. This is similar to a "diff" command, which maps components that match before and after a web page has been modified. Layout comparison compares the positions of DOM elements on the browser screen. Subtle changes in position are difficult for humans to detect, so automating display comparisons is effective at reducing workloads and improving accuracy. Content comparison compares the contents of each DOM element and detects differences between versions, using functions that directly compare the attribute values of DOM elements and that compare character codes obtained through optical character recognition (OCR) of the text images displayed in DOM elements. The details of each process are described below.


Fig. 3. Structure of X-BROT.

2.2 Element Identification

The element identification function compares DOM structures between web pages and identifies the corresponding DOM elements (Fig. 4). Typically, a given DOM element has a consistent "xpath" and "id," so identification is easy. However, in the case of inter-version comparisons, because the web page has been modified, the xpath may change based on changes in the DOM structure, the id and attribute values may change, or a corresponding DOM element may not exist (Fig. 5). Therefore, the element identification function must be able to identify a target DOM element, even if its xpath or id changes. If the DOM structure and attribute values are unreliable, it is necessary to evaluate changes in structures and values comprehensively to find the best mapping. If the change in a DOM structure is local and slight, similar xpaths can be mapped based on the edit distance between xpaths [5–7], but if changes in the DOM structure are large, then a correct mapping cannot be obtained. Therefore, we use tree edit distance [10, 11], which evaluates edit operations (insertions, deletions, and replacements) at the tree level to find the correspondence that minimizes the changes in both structures and attribute values.

Fig. 4. Element identification.


Fig. 5. Element identification (inter-version comparison).
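To give a concrete feel for the matching problem, the sketch below pairs the elements of two DOM versions greedily, combining xpath similarity with attribute overlap. This is a simplified alternative for illustration only, not the tree-edit-distance method [10, 11] actually used by X-BROT; the element dictionaries and thresholds are our own assumptions.

```python
from difflib import SequenceMatcher

def element_similarity(a, b, attrs=("id", "class", "nodeText", "type")):
    """a, b: dicts holding the xpath and attribute values of one DOM element."""
    xpath_sim = SequenceMatcher(None, a["xpath"], b["xpath"]).ratio()
    attr_hits = sum(1 for k in attrs if a.get(k) and a.get(k) == b.get(k))
    return 0.5 * xpath_sim + 0.5 * attr_hits / len(attrs)

def match_elements(old_elems, new_elems, min_sim=0.6):
    """Greedy one-to-one mapping; unmatched elements are treated as added/removed."""
    candidates = sorted(
        ((element_similarity(o, n), i, j)
         for i, o in enumerate(old_elems)
         for j, n in enumerate(new_elems)),
        reverse=True)
    mapping, used_old, used_new = {}, set(), set()
    for sim, i, j in candidates:
        if sim < min_sim:
            break
        if i not in used_old and j not in used_new:
            mapping[i] = j
            used_old.add(i)
            used_new.add(j)
    return mapping
```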

2.3 Layout Comparison

The layout comparison function compares the relative positions of DOM elements identified as identical elements and detects differences in their layouts on a display screen (Fig. 6). In our layout comparison technique, we select arbitrary pairs of DOM elements from a set of DOM elements displayed on the same screen and consider the differences in their relative positions as differences in layout. For example, in Fig. 6, the pair of "text2" and "text3" are in the same relative position, but their positional relationships with the other elements are different. In this case, it is easy to understand that "text2" and "text3" are considered as a group and that this group has moved downward. In other words, we classify items that do not change their positional relationships with each other into groups and detect positional differences between minor groups. To detect minor groups, we use graph analysis (Fig. 7). We generate a complete graph with all DOM elements as nodes and remove the links whose relative positions have changed. We then use the remaining links to find maximal cliques [12, 13]. The largest clique obtained is considered as the major clique and the rest are minor cliques. As shown in Fig. 8, the DOM elements belonging to minor cliques are displayed as differences in the results of layout comparisons.

Fig. 6. Difference area detection.


Fig. 7. Maximal clique extraction.

Fig. 8. Difference area detection based on cliques.
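A compact sketch of this clique-based grouping is shown below, using the third-party networkx package for maximal-clique enumeration; the element records and the pixel tolerance are our own assumptions, and overlaps between maximal cliques are ignored for simplicity.

```python
import networkx as nx

def layout_differences(old_boxes, new_boxes, tol=2):
    """old_boxes/new_boxes: dict element_id -> (x, y) position in each version.
    Returns the elements outside the major clique, i.e. the layout differences."""
    ids = [e for e in old_boxes if e in new_boxes]
    g = nx.Graph()
    g.add_nodes_from(ids)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # relative position of the pair (a, b) in the old and new layouts
            old_rel = (old_boxes[a][0] - old_boxes[b][0], old_boxes[a][1] - old_boxes[b][1])
            new_rel = (new_boxes[a][0] - new_boxes[b][0], new_boxes[a][1] - new_boxes[b][1])
            if abs(old_rel[0] - new_rel[0]) <= tol and abs(old_rel[1] - new_rel[1]) <= tol:
                g.add_edge(a, b)   # keep links whose relative position is unchanged
    cliques = sorted(nx.find_cliques(g), key=len, reverse=True)
    major = set(cliques[0]) if cliques else set()
    return [e for e in ids if e not in major]   # members of minor cliques
```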

2.4 Content Comparison

Content comparison is a function that compares the contents of corresponding DOM elements. The comparison functions include "attribute data comparison," which compares the attribute values stored by each DOM element, and "image comparison," which compares the images displayed in the DOM element area. For attribute data comparison, for example, the coordinate values (x and y coordinates, height, and width) of DOM elements, internal text, scroll bars, and other attribute values are compared. Text comparison is particularly important for web GUI testing because internal text is often contained in item data values that are displayed to users in web applications. There are many types of attributes, which makes it difficult to perform a comprehensive analysis; therefore, we defined a set of common attributes that are used often (Table 1). In addition to these attributes, we decided to use attributes defined by users in HTML sources for comparison. Image comparison involves the comparison of partial images displayed in the regions corresponding to each DOM element. Because pixel-level comparisons are commonly used [3], we specifically focus on the comparison of text images. Figure 9 presents an example of garbled text displayed in a font. There are two problems: one is the character code conversion problem and the other is the line feed position problem. In particular, the line feed position problem affects the size of the DOM element and leads to layout collapse. To detect the occurrence of such garbled characters, we use OCR to recognize the text images and compare the recognition results.


Table 1. Common attributes

<tag name>, alt, height, font-family, src, display, color, nodeText, type, position, text-align, border, value, margin, hidden, scrollbar, disabled, padding, visibility, id, checked, overflow, opacity, class, selected, font-size, href, width, font-weight

To this end, we must prevent as many OCR errors as possible, and we adopted the following three approaches. The first approach involves hiding images other than the target DOM element: using the hidden attribute to hide irrelevant areas improves the accuracy of character recognition, so that images of other DOM elements do not have a negative impact when capturing text images. The second approach is to enlarge the character image displayed by the font when capturing text images, to increase the resolution of the target image for OCR (Fig. 10). The third approach is correction using contextual post-processing. Each DOM element has an "innerText" attribute and, in many cases, this text is displayed on the screen. However, because there are cases in which text other than the innerText is displayed, the value of the innerText can be used as reference information for context processing to correct the recognition results (Fig. 11).

Fig. 9. Garbled text.

Fig. 10. Expanded text image.


Fig. 11. Lexical post-processing based on HTML innerText elements.
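The following sketch illustrates this kind of text-image comparison with Selenium, Pillow and pytesseract; it is a simplified illustration under our own assumptions, not X-BROT's actual implementation (which also hides surrounding elements before capturing).

```python
import io
from PIL import Image
import pytesseract

def ocr_element_text(element, scale=3):
    """element: a Selenium WebElement. Returns the OCR result of its rendered text."""
    png = element.screenshot_as_png                      # capture only this element
    img = Image.open(io.BytesIO(png)).convert("L")
    img = img.resize((img.width * scale, img.height * scale))  # enlarge for better OCR
    return pytesseract.image_to_string(img).strip()

def text_image_differs(old_element, new_element):
    """Compare rendered text, using innerText as contextual reference when available."""
    old_txt, new_txt = ocr_element_text(old_element), ocr_element_text(new_element)
    ref_old = (old_element.get_attribute("innerText") or "").strip()
    ref_new = (new_element.get_attribute("innerText") or "").strip()
    # if OCR agrees with the reference text on both sides, trust the reference
    if ref_old and ref_new and old_txt == ref_old and new_txt == ref_new:
        return ref_old != ref_new
    return old_txt != new_txt
```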

2.5 Evaluation

We evaluated the performance of the developed tool through inter-browser comparisons, using the behavior of IE11 as a reference. Difference detection was performed by displaying the same web page in IE9, IE10, and Firefox, and the accuracy of detecting differences was determined through visual inspection. The URLs used for evaluation are listed in Table 2. The accuracy evaluation was performed for layout comparison and for text image comparison using OCR (Tables 3 and 4); true positive (TP), false positive (FP), and false negative (FN) occurrences were counted.

Table 2. Web site URLs used to evaluate the developed tool

http://www.smbc.co.jp/
http://www.mizuhobank.co.jp/index.html
http://www.bk.mufg.jp/
http://t21.nikkei.co.jp/
http://t21.nikkei.co.jp/public/guide/article/index.html
http://jp.fujitsu.com/
http://www.au.kddi.com/
https://www.nttdocomo.co.jp/
http://www.softbank.jp/
http://www.jreast.co.jp/
http://www.westjr.co.jp/
http://www.jr-central.co.jp/
http://www.tokyo-solamachi.jp/
http://www.motomachi.or.jp/
http://www.ytv.co.jp/index.html

For layout comparison, all 33 differences were detected and no false positives occurred. For text image comparison, most differences (97.0%) were detected, but many false positives occurred in the “IE11-FF” comparison, resulting in a significant decrease in precision. The main causes of false positives were OCR errors (7 cases) and line feed detection errors (10 cases).


As described previously, the X-BROT tool we developed was determined to be very accurate at detecting differences between browsers. Although our evaluations were performed based on inter-browser comparisons, the same principles of comparison can be applied to inter-version comparison if DOM elements are successfully identified, so a similar level of accuracy can be expected.

Table 3. Incompatibility detection (layout comparison)

          | TP | FP | FN | Recall | Precision
IE11-IE9  | 18 | 0  | 0  | 100%   | 100%
IE11-IE10 | 7  | 0  | 0  | 100%   | 100%
IE11-FF   | 8  | 0  | 0  | 100%   | 100%
Total     | 33 | 0  | 0  | 100%   | 100%

Table 4. Incompatibility detection (text image comparison)

          | TP | FP | FN | Recall | Precision
IE11-IE9  | 20 | 0  | 1  | 95.2%  | 100%
IE11-IE10 | 10 | 1  | 0  | 100%   | 90.9%
IE11-FF   | 34 | 16 | 1  | 97.1%  | 57.6%
Total     | 64 | 17 | 2  | 97.0%  | 79.0%

3 Field Trial: Feedback and Solutions

In this section, we discuss the problems identified as a result of testing X-BROT in the development field and the solutions to these problems. As described in the previous section, the accuracy of X-BROT for detecting browser differences is sufficiently high, making it easy to apply for practical use. However, unexpected problems may occur during trials. One reason for this is that the data used for evaluation may not sufficiently reflect actual usage conditions. To make our evaluation environment more similar to the real world, field trials at an early stage were necessary. Our field trial was conducted in our web application development department using form software under development. The developers conducted tests using X-BROT, listed issues encountered when using X-BROT for practical purposes, and developed improvements to solve the identified issues. In this section, we present various examples.

3.1 Multi-frame Page

The first problem identified is the inability to handle web pages with multiple frames. A normal web page is written in a single HTML file, but if a frame is created using


the “frame” or “iframe” tags, then an external HTML file can be imported into a page (Fig. 12). Because the inside of a frame has a different coordinate system and DOM tree structure, and because the DOM elements in different frames cannot be mapped, web pages with frames cannot be compared to each other. This problem initially went unnoticed because there were no samples containing frames in the evaluation URLs, but feedback from professionals in the field made us aware of the need for frame support. We solved this problem by adopting a two-step process: 1) creating a frame-only tree structure to determine the correspondence between frames (Fig. 13) and 2) comparing the DOMs for each frame to detect differences.

Fig. 12. A web page consisting of multiple HTML files.

Fig. 13. A web page with multiple frames.
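The frame handling can be illustrated with the Selenium sketch below, which recursively collects the DOM of the main document and of each (i)frame so that the two versions can then be compared frame by frame; the traversal strategy is our own simplification of the two-step process described above.

```python
from selenium.webdriver.common.by import By

def collect_frame_doms(driver, path=()):
    """Recursively collect the DOM source of the main page and of every (i)frame.
    Returns a dict mapping a frame path (tuple of indices) to the page source."""
    doms = {path: driver.page_source}
    n_frames = len(driver.find_elements(By.TAG_NAME, "iframe") +
                   driver.find_elements(By.TAG_NAME, "frame"))
    for idx in range(n_frames):
        # re-locate the frames each time: switching frames invalidates old references
        frame = (driver.find_elements(By.TAG_NAME, "iframe") +
                 driver.find_elements(By.TAG_NAME, "frame"))[idx]
        driver.switch_to.frame(frame)
        doms.update(collect_frame_doms(driver, path + (idx,)))
        driver.switch_to.parent_frame()
    return doms
```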

3.2 Non-comparison Areas

The next issue was a request to exclude from the comparison those DOM elements whose contents change every time the page is executed. For example, in areas that count the number of accesses to a web page or display the date and time, differences are always detected because the corresponding data change every time the page is opened, even if the web application itself has not changed. It was pointed out that this is not a form of degradation and that the professionals in the field do not want it to be detected as a difference. As shown in Fig. 2, the STT considers differences from previous versions as candidates for degradation, so this request was made to reduce unnecessary differences as much as possible.


This problem was solved by implementing a function to set areas in which detected differences should be ignored ("non-comparison areas") (Fig. 14). To specify non-comparison areas, it is sufficient to compare the results of displaying the same web page twice before executing a test and to register the detected difference areas as non-comparison areas. This function is difficult to notice without conducting field trials, but it is intuitive when pointed out.

Fig. 14. Interface that specifies non-comparison areas.
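A minimal sketch of such a filter is shown below; the rectangle representation of differences and of registered areas is our own assumption.

```python
def inside(region, box):
    """region and box are (left, top, right, bottom) rectangles in page coordinates."""
    return (box[0] >= region[0] and box[1] >= region[1] and
            box[2] <= region[2] and box[3] <= region[3])

def filter_differences(differences, non_comparison_areas):
    """Drop every detected difference whose bounding box lies inside a registered
    non-comparison area (e.g. access counters or date/time fields)."""
    return [d for d in differences
            if not any(inside(area, d["box"]) for area in non_comparison_areas)]
```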

3.3 Huge Web Pages

It was pointed out that the comparison process stopped with an error when the target web page was too large. This is because there was insufficient processing memory available when there were more than 10,000 DOM elements in a single web page. In our X-BROT implementation, a dynamic programming table is created when determining the correspondences of DOM elements, but the length of the array required to compare more than 10,000 elements exceeds 100 million and the memory required exceeds 4 GB (implemented in C++). Although this amount of memory can be compressed several times through programming choices, large web pages are still problematic. Through our investigation, we determined that this large number of DOM elements is mainly caused by the large numbers of nodes under "Table" tags. Therefore, a potential solution is to group all the child nodes of Table tags together to reduce the number of DOM elements significantly.

3.4 Discussion

In this section, we presented examples of the findings obtained when X-BROT was applied in the software development field. As discussed in the previous section, the


accuracy of X-BROT at detecting differences is close to 100% and it seems feasible to put it into practical use immediately. However, a number of problems were identified that were not noticed in our laboratory analysis. The issues described in this section can be addressed by investigating the real scenarios of software development and by improving our evaluation database, but there are likely other issues that have not been noticed and other possibilities for improvement. Overall, feedback from field trials is important for discovering new technological issues.

4 Conclusion

We developed X-BROT, which is a tool for automatically determining the compatibility of web applications, and attempted to automate web GUI testing. In this paper, we discussed the details of the X-BROT tool reported in [1] and described issues and solutions that emerged when X-BROT was tested in the field of software development. X-BROT has very high accuracy for detecting differences between browsers, but there are lingering issues that must be addressed to improve the functionality of the tool for practical use, such as the ability to handle web pages that contain frames and set areas that should not be compared. We were able to confirm the importance of identifying issues related to practical use through field trials. Although most of the issues described in this paper can be resolved by extending the functionality of our tools, there are some technical issues that remain unsolved. One such issue is the problem of mapping DOM elements when a web page is heavily modified. As discussed in Sect. 2, we use tree edit distance to identify DOM elements. This is a method of mapping elements in a partially modified DOM structure, but it is unable to find an appropriate mapping when the structure of a DOM changes significantly. Research on identifying DOM elements across different DOMs is part of the research on automatic test case repair for web test automation and various methods have been proposed [14–16]. By using this related research as a reference, we believe it should be possible to develop a method that can stably obtain a mapping of DOM elements, even when comparing web pages that have changed significantly. Additionally, as a future task, we will conduct a more detailed survey of specific scenarios in which X-BROT is used and enhance our database for evaluation. Particularly for inter-version comparisons, it is very difficult to enrich evaluation data because it is necessary to prepare new and old versions during the process of developing a web application. We believe it is necessary to collect data tailored to the actual conditions of development by surveying how web content is edited.

References

1. Tanaka, H.: X-BROT: prototyping of compatibility testing tool for web application based on document analysis technology. In: International Conference on Document Analysis and Recognition Workshops (ICDARW), (WIADAR 2019), pp. 18–21 (2019)
2. Saar, T., Dumas, M., Kaljuve, M., Semenenko, N.: Browserbite: cross-browser testing via image processing. Softw. Pract. Exp. 46(11), 1459–1477 (2016). arXiv:1503.03378v1
3. Lu, P., Fan, W., Sun, J., Tanaka, H., Naoi, S.: Webpage cross-browser test from image level. Proceedings of IEEE ICME 2017, 349–354 (July 2017)


4. Selenium Project. https://www.selenium.dev/ 5. Choudhary, S.R., Versee, H., Orso, A.: WEBDIFF: automated identification of cross-browser issues in web applications. In: IEEE International Conference on Software Maintenance (ICSM 2010), pp. 1–10 (2010) 6. Choudhary, S.R., Prasad, M.R., Orso, A.: Crosscheck: combining crawling and differencing to better detect cross-browser incompatibilities in web applications. In: IEEE International Conference on Software Testing, Verification and Validation (ICSTVV 2012), pp. 171–180 (2012) 7. Choudhary, S.R., Prasad, M., Orso, A.: X-PERT: accurate identification of cross-browser issues in web applications. In: International Conference on Software Engineering (ICSE 2013), pp. 702–711, May 2013 8. Dallmeier, V., Pohl, B., Burger, M., Mirold, M., Zeller, A.: WebMate: web application test generation in the real world. In: 7th International Conference on Software Testing, Verification and Validation Workshops (ICSTW 2014), pp. 413–418, March 2014 9. Mesbah, A., Prasad, M.R.: Automated cross-browser compatibility testing. In: International Conference on Software Engineering (ICSE 2011), pp. 561–570, May 2011 10. Masuda, S., Mori, I., Tanaka, E.: Algorithms for finding one of the largest common subgraphs of two trees. IEICE Trans. J77-A(3), 460–470, March 1994 (in Japanese) 11. Wang, J.T.L., Shapiro, B.A., Shasha, D., Zhang, K., Currey, K.M.: An algorithm for finding the largest approximately common substructures of two trees. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 889–895 (1998) 12. Östergård, P.R.J.: A fast algorithm for the maximum clique problem. Discrete Appl. Math. 120(1–3), 197–207 (2002) . https://core.ac.uk/download/pdf/82199474.pdf 13. Tomita, E., Matsuzaki, S., Nagao, A., Ito, H., Wakatsuki, M.: A much faster algorithm for finding a maximum clique with computational experiments. IPSJ J. Inf. Process. 25, 667–677 (2017) . https://www.jstage.jst.go.jp/article/ipsjjip/25/0/25_667/_pdf 14. Leotta, M., Stocco, A., Ricca, F., Tonella, P.: Using multi-locators to increase the robustness of web test cases. In: Proceedings of 8th IEEE International Conference on Software Testing, Verification and Validation (ICST 2015), pp. 1–10, April 2015 15. Eladawy, H.M., Mohamed, A.E., Salem, S.: A new algorithm for repairing web-locators using optimization techniques. In: 13th International Conference on Computer Engineering and Systems (ICCES 2018), pp. 327–331, December 2018 16. Leotta, M., Stocco, A., Ricca, F., Tonella, P.: Robula+: an algorithm for generating robust XPath locators for web testing. J. Softw. Evol. Process 28(3), 177–204 (2016)

A Deep Learning Digitisation Framework to Mark up Corrosion Circuits in Piping and Instrumentation Diagrams

Luis Toral¹, Carlos Francisco Moreno-García¹, Eyad Elyan¹ and Shahram Memon²

¹ Robert Gordon University, Aberdeen AB10 7QB, Scotland, UK
[email protected]
² Archimech Limited, Aberdeen AB15 9RJ, Scotland, UK

Abstract. Corrosion circuit mark up in engineering drawings is one of the most crucial tasks performed by engineers. This process is currently done manually, which can result in errors and misinterpretations depending on the person assigned for the task. In this paper, we present a semi-automated framework which allows users to upload an undigitised Piping and Instrumentation Diagram, i.e. without any metadata, so that two key shapes, namely pipe specifications and connection points, can be localised using deep learning. Afterwards, a heuristic process is applied to obtain the text, orient it and read it with minimal error rates. Finally, a user interface allows the engineer to mark up the corrosion sections based on these findings. Experimental validation shows promising accuracy rates on finding the two shapes of interest and enhance the functionality of optical character recognition when reading the text of interest. Keywords: Digitisation · Corrosion detection · Piping and instrumentation diagrams · Convolutional neural networks

1 Introduction Experienced corrosion and material engineers have the task of defining corrosion circuits within a system based on construction materials, operating conditions and active damage mechanisms [1]. This is part of a recommended practice developed by the American Petroleum Institute (API), which outlines the basic elements to maintain a credible risk-based inspection (RBI) programme1 . Once the circuits have been defined, the Condition Monitoring Locations (CMLs) and the Thickness Monitoring Locations (TMLs) are installed and documented on a type of engineering drawings known as a Piping and Instrumentation Diagram (P&ID). This process involves a manual mark up which becomes time-consuming and error prone, as shown in Fig. 1. 1 https://store.nace.org/corrosion-looping-for-down-stream-petroleum-plants-an-enigma-for-rbi-

© Springer Nature Switzerland AG 2021 E. H. Barney Smith and U. Pal (Eds.): ICDAR 2021 Workshops, LNCS 12917, pp. 268–276, 2021. https://doi.org/10.1007/978-3-030-86159-9_18


Fig. 1. Extract of a P&ID representing three corrosion circuits (colour sections), pipe specification (rectangles) and connection points (ellipses) (original resolution: 2048 × 1080 pixels).

To define a corrosion circuit, engineers need to identify two key elements within the piping system on the P&IDs. The first one is the pipe specification (pipe spec), which is a character string formed by seven sections divided by hyphens. The second one is the connection point. This is a pair of text lines and arrows pointing towards a division line. In a specific P&ID, both the pipe spec and the connection point can be oriented in different directions (see Fig. 2a and 2b). The more complex a piping system is, the larger the amount of information displayed in the drawing. Our aim is first to detect the text in both pipe specs and connection points, then to read it and identify the limits of the corrosion sections.

Fig. 2. Different types of shapes and orientations for (a) Pipe specifications and (b) Connection points detections.

In this paper, we propose a framework to develop a novel semi-automatic tool that allows the user to mark up the corrosion circuits in P&IDs based on automatically locating pipe specs and connection points. The paper is organised as follows: Sect. 2 presents the related work, Sect. 3 the methodology, Sect. 4 the experiments and results, and Sect. 5 concludes and presents future directions.

2 Related Work Text detection is one of the cornerstones of document image analysis, as it can lead to the location and identification of the depicted shapes. Later on, it can also help in mapping the structural representation, as the labels usually contain relevant information such as the direction of flow, sectioning, and other useful data [2]. Many attempts have been presented in the literature to digitise P&IDs, mostly following two lines of work. The first one involves the use of heuristics to detect certain well-known


shapes, such as geometrical symbols, arrows, connectors, tables and even text [3–6]. The second and most recent one relies on deep learning techniques in which the algorithms are trained to recognise shapes based on the collection and tagging of numerous samples [7–10]. Both approaches have advantages and disadvantages depending on the use case. The first family of methods is better suited when the characteristics of the P&ID follow a certain standard; however, if this is not the case, then the latter option can be used. In the previous edition of this workshop, we presented a paper addressing the challenges and future directions of P&ID digitisation [11]. In that work, we addressed three main challenges: image quality, class imbalance and information contextualisation. While being able to digitise drawings with reduced quality is still a work in progress, in this new challenge we focus on the latter two. On the one hand, there is a need to locate symbols and text strings which do not appear often in these drawings. Therefore, we must consider both heuristic and automatic solutions to properly locate the text strings and symbols which depict pipe specs and connection points respectively. On the other hand, by correctly identifying these pointers, we are able to allow the user to manually mark up the corrosion sections, bringing us one step closer to identifying the structures depicted within the engineering diagram [12].

3 Methodology The digitisation workflow of our method comprises five steps:
I. Train two different deep learning models to detect pipe specs and connection points using the YOLOv5 framework and store the detection coordinates.
II. Create a binarised image for each detection and search for connected components based on pixel connectivity to locate the region of interest (ROI) containing the text.
III. Get the component statistics from the ROIs and apply a heuristic method to align the detections horizontally.
IV. Apply the Tesseract2 Optical Character Recognition (OCR) engine with a custom configuration.
V. Link the codes found in both pipe specs and connection points with their respective locations on the drawing.

3.1 Connection Points and Pipe Specs Detection A pre-processing step was applied to the dataset to standardise the P&IDs and convert them into grayscale images to reduce noise. Subsequently, two different models were built to detect the pipe specs and connection points separately (see Fig. 3). The convolutional neural network selected was YOLOv5x3 with the default configuration. Each model was trained for 3000 epochs with a batch size of 8. The marker tool runs the trained models on PyTorch to generate a cropped image of every single detection and store the positional coordinates relative to the P&ID. 2 https://github.com/tesseract-ocr/tesseract. 3 https://github.com/ultralytics/yolov5/releases.
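For illustration, the sketch below shows how such a detection stage can be run with the public YOLOv5 PyTorch hub interface. The weight file names, the image path and the exact placement of the confidence threshold are assumptions for the example, not the authors' marker-tool code.

# Minimal sketch of the detection stage: two YOLOv5 models (one per shape class)
# are loaded from hypothetical weight files and every detection is cropped and
# stored together with its coordinates relative to the full P&ID image.
import cv2
import torch

def detect_and_crop(image_path, weights_path, conf_threshold=0.4, img_size=2048):
    # Load a custom-trained YOLOv5 model through the public torch.hub interface.
    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)
    model.conf = conf_threshold                      # confidence threshold from Sect. 4.1

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    results = model(image, size=img_size)            # inference on the full drawing
    detections = results.pandas().xyxy[0]            # one row per detection

    crops = []
    for _, det in detections.iterrows():
        x1, y1, x2, y2 = map(int, (det.xmin, det.ymin, det.xmax, det.ymax))
        crops.append({"crop": image[y1:y2, x1:x2], "coords": (x1, y1, x2, y2)})
    return crops

# Two separate models, as described above (weight file names are placeholders).
pipe_spec_dets = detect_and_crop("pid.png", "pipe_spec_best.pt")
conn_point_dets = detect_and_crop("pid.png", "connection_point_best.pt")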


Fig. 3. Detection boxes (a) Connection points and (b) Pipe specs.

3.2 Text Bloc Localisation The next step is to localise the text block regions. While the text in pipe specs tends to have a consistent shape, the text in the connection points can vary considerably in size and orientation. Depending on the amount of information shown on the P&ID, textual and non-textual elements can overlap and appear on the detection boxes. Thus, to extract the ROI, a noise removal technique was introduced. Firstly, a binarisation process is applied to all the cropped images to reduce noise. Secondly, a connected component labelling (CCL) technique is used to get the shape and size information of the elements in the image. Finally, the components that fulfil a heuristic threshold criterion are kept (see Figs. 4a and 4b for more details).

Fig. 4. Location ROI for (a) Connection points and (b) Pipe specifications ROI.
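As a concrete illustration of the ROI extraction in Sect. 3.2, the following sketch binarises a cropped detection, labels its connected components and keeps only components whose size satisfies a heuristic criterion; the threshold values are illustrative assumptions, not the ones used in the marker tool.

# Minimal sketch of the ROI extraction: binarisation, connected component
# labelling (CCL) and a heuristic size filter on the resulting components.
import cv2
import numpy as np

def extract_text_roi(crop, min_area=20, max_area_ratio=0.2):
    # Binarise (inverted so that ink becomes the foreground) with Otsu's threshold.
    _, binary = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Connected component labelling gives position and size statistics per component.
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

    mask = np.zeros_like(binary)
    max_area = max_area_ratio * binary.shape[0] * binary.shape[1]
    for label in range(1, n_labels):                  # label 0 is the background
        area = stats[label, cv2.CC_STAT_AREA]
        if min_area <= area <= max_area:              # heuristic size criterion
            mask[labels == label] = 255

    # Bounding box of the surviving components = text region of interest.
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return crop[ys.min():ys.max() + 1, xs.min():xs.max() + 1]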

3.3 Text Alignment One of the limitations of the Tesseract OCR engine is that it can only interpret the text when horizontally aligned. Although this engine has a built-in configuration to assess the orientation of the text in an image, the number of characters in the detections is not enough to implement this feature. Hence, an additional method is applied to adjust those detections that are misaligned. This method consists of a two-step process. First, the marker tool identifies the nonaligned detections, and then it determines the direction to rotate them. Given the height and width of the cropped image, we can classify which detections are vertically oriented. While this is a general rule for pipe specs, it does not apply to connection points (see Fig. 2b). Thus, a conditional criterion is added.


Figure 5 shows the rotation conditions for connection points. For each ROI, the x-coordinates of the components are calculated and checked for collinearity among themselves. If the group standard deviation falls below a heuristically learned threshold, the image is vertically oriented. Subsequently, if the ROI is on the left side of the image, a clockwise rotation is applied. Conversely, if it is located on the right side, the rotation is counter-clockwise.

Fig. 5. (a) Clockwise and (b) counter-clockwise rotation criteria.
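A simplified sketch of this alignment rule is given below. It merges the aspect-ratio test used for pipe specs with the collinearity test used for connection points into one function, and the standard-deviation threshold is an illustrative assumption rather than the learned value.

# Minimal sketch of the alignment rule: a crop is treated as vertical when the
# x-coordinates of its components are nearly collinear (or when it is taller than
# wide), and the side on which the ROI sits decides the rotation direction.
import cv2
import numpy as np

def align_horizontally(crop, component_centroids, std_threshold=5.0):
    h, w = crop.shape[:2]
    xs = np.array([c[0] for c in component_centroids], dtype=float)

    # Collinear x-coordinates (low spread) indicate a vertically written text block.
    if xs.std() < std_threshold or h > w:
        if xs.mean() < w / 2:
            # ROI on the left side: rotate clockwise.
            return cv2.rotate(crop, cv2.ROTATE_90_CLOCKWISE)
        # ROI on the right side: rotate counter-clockwise.
        return cv2.rotate(crop, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return crop  # already horizontal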

3.4 Text Recognition The text strings inside the detected areas contain the code which delimits the corrosion circuits; therefore, the next stage is to read them as accurately as possible. The ROI in the image can contain text in a single or a double line (see Fig. 2b). Connection points are composed of two text blocs with three characters, whereas pipe specs contain a long code string. These aspects can affect accuracy performance. Hence, different experiments were run on Tesseract OCR to set a custom configuration which delivered the highest accuracy. Since the detections are significantly small compared to the original image, different filters were tested to remove the noise caused by this distortion. The results of these configurations will be discussed in detail in Sect. 4.2. 3.5 Linkage The last step is to link the codes extracted in the connection points and pipe specs with their respective positional coordinates relative to the P&ID. The pair codes on the connection points are processed and stored as three-character strings (see Fig. 6). For pipe specs, since the code is right next to the fourth hyphen, this is extracted by iterating over the string of characters. Finally, both codes and coordinates are stored in a dictionary.
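The recognition and linkage steps can be sketched as follows. The median-blur kernel size, the Tesseract page segmentation mode and the dictionary keys are assumptions for the example (the paper only states that a median blur and a custom page segmentation model were used), and the pipe-spec code is taken as the token following the fourth hyphen, per the remark above.

# Minimal sketch of the recognition (Sect. 3.4) and linkage (Sect. 3.5) steps.
import cv2
import pytesseract

def read_code(roi):
    denoised = cv2.medianBlur(roi, 3)                               # median blur filter
    text = pytesseract.image_to_string(denoised, config="--psm 7")  # treat as one text line
    return text.strip()

def link_codes(pipe_spec_dets, conn_point_dets):
    """Map every recognised code to its position(s) on the P&ID."""
    linkage = {}
    for det in pipe_spec_dets:                   # det["roi"], det["coords"] come from the previous steps
        text = read_code(det["roi"])
        parts = text.split("-")
        if len(parts) > 4:                       # code sits right after the fourth hyphen
            linkage.setdefault(parts[4], []).append(("pipe_spec", det["coords"]))
    for det in conn_point_dets:
        code = read_code(det["roi"])[:3]         # connection point codes are three characters
        linkage.setdefault(code, []).append(("connection_point", det["coords"]))
    return linkage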

4 Experiments and Results The private dataset consists of 85 P&IDs provided by our industrial partner Archimech Limited. A total of 75 images were labelled; 70% were used for training, 20% for validation, and 10% for testing, giving 1653 and 537 annotations for pipe specs and connection points, respectively. The remaining 10 images from our dataset were used to test the end-to-end performance of the marker tool.


Fig. 6. The linkage between pipe specs and connection points codes.

4.1 Detection Several experiments were deployed modifying the hyperparameters. The optimal results were attained by setting the input image size to 2048 × 1080 pixels, a batch size of 8, and 3000 epochs. The confidence threshold used to run the detection was 0.4 for both models. In the end, we achieved an accuracy of 96.23% for detecting pipe specs and 92.68% for connection points (see Table 1).

Table 1. Stages and their accuracies.
Deep learning model                   Accuracy (%)
Pipe specs localisation               96.23
Connection points localisation        92.68
Pipe specs text recognition           98.72
Connection points text recognition    93.55
End-to-end performance                82.37

4.2 Text Recognition The text recognition accuracy for pipe specs is 98.72% and 93.55% for connection points. Due to the variation in shape and the overlapping of text with graphics, it becomes more challenging to recognize the codes on the connection points. Different experiments were tested in order to improve the text recognition accuracy. The optimal performance was achieved by applying a median blur filter to the detections and setting a custom page segmentation model in the Tesseract-OCR engine. In Tables 2 and 3, we show how this method can successfully improve the text recognition accuracy for both connection points and pipe specs, respectively. 4.3 Final Output The marker tool allows uploading either a single or a set of multiple P&IDs at once. After running the application, a list of all the codes detected in the drawing is displayed. The user can then select the code to visualize on the drawing and the colour of the marker.

Table 2. Connection points text recognition accuracy improvement. Columns: connection points | text localisation | Tesseract OCR | filter + Tesseract OCR (custom config).

Table 3. Pipe specs text recognition accuracy improvement. Columns: pipe specs | text localisation | Tesseract OCR | filter + Tesseract OCR (custom config).

Finally, the user can mark up the corrosion circuits and save the document as a JPG file. Figure 7 shows an example of a P&ID section with three different corrosion circuits marked with our novel tool. The company works with many stakeholders who have even more P&IDs to digitise.

Fig. 7. Extract of a P&ID with three corrosion circuits marked using the proposed framework (original resolution: 2048 × 1080 pixels).

5 Conclusion and Future Work In this paper, we presented an end-to-end framework which allows Oil & Gas engineers to load undigitised P&IDs and locate the corrosion circuits with minimal intervention and error. We used two models trained with a state-of-the-art deep learning technique (i.e. YOLOv5) to find the two shapes of interest, namely the pipe specs and the connection


points. Once these are located, there are additional post-processing steps that have to be performed prior to presenting these symbols to the user. In the case of the connection points, the text needs to be found, oriented properly and read with total accuracy. This allows the system to identify the symbols of interest that will be shown to the engineer so that the corrosion circuits can be marked up. In future work, we would like to explore the scalability of our system to perform appropriately on P&IDs generated with other standards and qualities. Moreover, we aim to test more novel deep learning frameworks which allow us to increase the accuracy of the detection and OCR tasks by using techniques that work with limited character sets [13]. Finally, we aim to automate the last step so that the engineer is shown the corrosion circuits automatically. Acknowledgements. We would like to thank Innovate UK for funding this research under the Innovation Voucher scheme.

References 1. Mohammed, M.: Corrosion looping for down stream petroleum plants: An Enigma for RBI Engineers A Perspective from the Review of Mechanical Integrity Systems. AMPP. Corrosion and Monitoring Control (2016) 2. Moreno-García, C.F., Elyan, E., Jayne, C.: New trends on digitisation of complex engineering drawings. Neural Comput. Appl. 31(6), 1695–1712 (2019) 3. Howie, C., Kunz, J., Binford, T., Chen, T., Law, K.H.: Computer interpretation of process and instrumentation drawings. Adv. Eng. Softw. 29, 563–570 (1998) 4. Arroyo, E., Hoernicke, M., Rodríguez, P., Fay, A.: Automatic derivation of qualitative plant simulation models from legacy piping and instrumentation diagrams. Comput. Chem. Eng. 92, 112–132 (2016) 5. Moreno-García, C.F., Elyan, E., Jayne, C.: Heuristics-based detection to improve text/graphics segmentation in complex engineering drawings. In: Engineering Applications of Neural Networks (EANN), pp. 87–98 (2017) 6. Sinha, A., Bayer, J., Bukhari, S.S.: Table localization and field value extraction in piping and instrumentation diagram images. In: Graphics Recognition Methods and Applications (GREC), pp. 26–31 (2019) 7. Mani, S., Haddad, M.A., Constantini, D., Douhard, W., Li, Q., Poirier, L.: Automatic digitization of engineering diagrams using deep learning and graph search. In: Computer Vision and Pattern Recognition (CVPR) (2020) 8. Yun, D.Y., Seo, S.K., Zahid, U.: Deep neural network for automatic image recognition of engineering diagrams. Appl. Sci. 10, 1–16 (2020) 9. Sierla, S., Azangoo, M., Fay, A., Vyatkin, V., Papakonstantinou, N.: Integrating 2D and 3D digital plant information towards automatic generation of digital twins. In: 2020 IEEE 29th International Symposium on Industrial Electronics (ISIE), pp. 460–467 (2020) 10. Elyan, E., Jamieson, L., Ali-Gombe, A.: Deep learning for symbols detection and classification in engineering drawings. Neural Netw. 129, 91–102 (2020)


11. Moreno-García, C.F., Elyan, E.: Digitisation of assets from the oil & gas industry: challenges and opportunities. In: International Conference on Document Analysis and Recognition (ICDARW), pp. 16–19 (2019) 12. Rica, E., Moreno-García, C.F., Álvarez, S., Serratosa, F.: Reducing human effort in engineering drawing validation. Comput. Ind. 117, 103198 (2020) 13. Majid, N., Barney Smith, E.H.: Performance comparison of scanner and camera-acquired data for Bangla offline handwriting recognition. In: International Conference on Document Analysis and Recognition Workshops (ICDARW), pp. 31–36 (2019)

Playful Interactive Environment for Learning to Spell at Elementary School Sofiane Medjram1(B), Véronique Eglin1, Stephane Bres1, Adrien Piffaretti2, and Jobert Timothée3 1

Université de Lyon, INSA Lyon, LIRIS, UMR5205, 69621 Lyon, France {veronique.eglin,stephane.bres}@insa-lyon.fr 2 20 rue du Professeur Benoît Lauras, 42000 St-Étienne, France [email protected] 3 22 Avenue Benoit Frachon, 38400 Saint Martin D'Hères, France [email protected]

Abstract. This paper proposes a new playful approach to improve children's writing skills in primary school. This approach is currently being developed in the Study project (1 Study Project, French Region Auvergne-Rhône-Alpes R&D Booster Program, 2020-2022), which proposes a global learning solution through the use of an innovative interactive device, combining advanced technologies for writing acquisition (Advanced Magnetic Interaction tech from AMI/ISKN 2 (ISKN Repaper Tablet, https://www.iskn.co/fr)) and the conception of a gamified environment for children (Super Extra Lab 3 (SuperExtraLab, https://www.superextralab.com/), distributed by Extrapage, publisher of the solution 4 (Extrapage, https://www.extrapage.io/fr/pages/index.html)). The proposition associates traditional school materials (textbook, paper, pencil...) with very recent technologies and didactic innovations. While the state of the art is usually satisfied with solutions strictly focusing on lexical spelling, the Study project aims to contribute to a broader control of handwriting skills, i.e., the acquisition of efficient handwriting micromotor skills and also strong grammatical skills. To achieve this, the approach relies on a personalized, regular and enriched copy and dictation training program. It establishes conditions that encourage children to be involved in an activity that is often unpleasant, and is inspired by the mechanics of video games. In the paper, we present an overview of the project, the construction of the general framework in relation with the design of a game interface for learning, and the first recognition results based on a dedicated MDLSTM-CTC trained on a GAN-generated dataset that covers different writing styles. Keywords: Child handwriting recognition · Dictation · French language · ISKN Repaper · Extrapage publisher · SCOLEDIT dataset · Study R&D project · MDLSTM technologies · GAN-generated handwriting styles · Human computer interaction · Deep-learning

© Springer Nature Switzerland AG 2021 E. H. Barney Smith and U. Pal (Eds.): ICDAR 2021 Workshops, LNCS 12917, pp. 277–290, 2021. https://doi.org/10.1007/978-3-030-86159-9_19

1 Introduction
1.1 Context of Study Project

Learning to read and write the mother tongue is an indispensable task for children in school. It builds the basis of their learning and their integration into the different modules of higher grades, and it allows them to express themselves and communicate properly. Nowadays, learning to write in France still uses classical methods where the child learns to write by dictation. This method, although it has shown its reliability, remains difficult for teachers because they have to check all the pupils' copies, adapt this check to the different pupils' profiles, and then return the spelling errors. This intertwining of dictation, the child's handwriting and the teacher's correction feedback can lose effectiveness because the student's state of concentration when writing and when receiving the correction is different. To help the teacher with this task, we propose to integrate modern interaction technologies and recent advances in deep-learning based handwriting recognition. Thanks to the ISKN interaction device, which preserves the natural handwriting aspect [7], trainees can receive feedback and spelling correction on the fly during dictation, each according to his or her level of expertise. The teacher's task also becomes lighter and more precise in terms of monitoring the children's progress. Using the ISKN Repaper, the child writes on a sheet of paper attached to the tablet like a slate and, on the fly, the system retrieves the handwritten patterns of what has been written. We also propose a deep learning recognition model for child handwriting recognition which allows spelling error detection and synchronous interactions with the learner.

1.2 Destination Market and Product Industrialisation

Without even mentioning the French-speaking market (with 51.5 million learners worldwide), the national target market is already very significant, since it is made up of 47,000 French elementary schools. In addition, we can also count about 30,000 kindergarten classes with approximately 700,000 trainees. The French market of Technologies for Education (EdTech) is split among more than 350 companies, and 86% of them have fewer than twenty employees. These are concentrated in three main segments: continuing education (58%), higher education (34%), and schools (42%). The segment in which we operate, educational resources, represents 42 million euros and should double by 2022. At the same time, the EdTech market in the field of educational innovations is also expanding worldwide (in China and the United States). In particular, AMI/ISKN's experience in China has made the Study project very successful and encourages the consortium to pursue further developments in foreign languages (especially Spanish and Chinese). In this context, Study offers a complete solution to the crucial issue of spelling acquisition and handwriting quality, whereas the competition only approaches these questions from the more reductive angle of lexical spelling skills, focusing on the learning of writing motor skills.


The project extends and enriches the work of AMI/ISKN (Advanced Magnetic Interaction), which, with toriTM (a learning platform co-developed with the Japanese group Bandai Namco)1 and Repaper (a pen-and-paper tablet, already used by a Chinese company on its territory for learning calligraphy in schools), has already developed several products at the frontiers of the education market. The Study project also supports the economic model of the SuperExtraLab company (for the software application and design, distributed by the Extrapage publisher2) by ensuring the marketing of a joint product. This work is also prepared in collaboration with a team of linguists, speech therapists and primary school teachers who give us access to a unique database of handwritten texts from children learning to spell [13]. At the end of the value chain, the project is also associated with a publisher (Extrapage) and further distributors specialized in education, supplying the equipment (graphic tablet) and the service (gamified learning software), see Fig. 1.

Fig. 1. Different software and hardware components for an integrative solution to improve children’s writing.

Because the Study project aims to enhance children's handwriting in French, it mainly depends on the performance of the handwriting recognition model. Indeed, poor handwriting recognition will result in incorrect word alignment and many inaccuracies in the feedback to trainees. In the following sections, we develop the technical and scientific aspects related to HTR (Handwritten Text Recognition) and the interactive scenario as it will be proposed to young trainees through the design of a gamified application.

1 toriTM, https://www.tori.com/fr/.
2 Extrapage, https://www.extrapage.io/.

2 Advances in Handwriting Recognition
2.1 Software Advances and Recent Deep Neural Networks Techniques

Nowadays, handwriting recognition still represents a challenging problem. The high variance in handwriting styles for the same person/child or across people, and the specific difficulty of reading children's writing, pose significant hurdles in transcribing it to machine-readable text [10]. Handwriting recognition methods are divided into online and offline approaches. Online methods consist of tracking a digital pen and receiving information about position and stroke while the text is written. Offline methods transcribe text contained in an image into digital text; they are mostly used when the temporal dimension of the writing execution is not required [11]. Recently, as in other application domains, deep learning approaches have tremendously improved handwriting recognition accuracy [1,6,9]. In [6], offline recognition is handled by a CRNN: the image is processed through several convolutional (CNN) layers to produce deep spatial features, which are then passed to a recurrent network (RNN) to produce deep contextual features. After being collapsed to a one-dimensional sequence by summing over the height axis, the sequence is processed through a softmax layer with a Connectionist Temporal Classification (CTC) loss handling the alignment between the input and the output sequence. In [1], an attention-based model for end-to-end handwriting recognition has been proposed. Attention mechanisms are often used in seq2seq models for machine translation and speech recognition tasks [2,8]. In handwriting recognition tasks, the attention mechanism allows controlling the text transcription without segmenting the image into lines as a pre-processing step: it can process an entire page as a whole, while giving predictions for every single line and word in it [10], see Fig. 2. Although the models in [1,6] achieve good results in offline handwriting recognition, they are slow to train due to the LSTM layers involved. To overcome this drawback, Transformers have been introduced as an efficient alternative to the LSTM mechanism. In [9], the authors proposed a non-recurrent model for handwriting recognition based on Transformers; it uses attention in the spatial and contextual stages, which allows building character representation models while training a language model at the same time. Using both recent deep-learning based handwriting recognition technologies, adapted with transfer learning to the specific handwriting of pupils, and the ISKN Repaper interactive tablet, our proposition aims at establishing a comprehensive system improving children's handwriting in their mother tongue, French.
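To make the CRNN-plus-CTC recipe summarised above concrete, the following minimal sketch is given in PyTorch (for brevity; the Study prototype itself relies on TensorFlow). Layer sizes, character-set size and the dummy tensors are illustrative assumptions, not the project's architecture.

# Minimal PyTorch sketch of a CRNN trained with CTC: CNN features, collapsed
# along the height axis, fed to a bidirectional LSTM and scored with CTC loss.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_chars):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, n_chars + 1)          # +1 for the CTC blank symbol

    def forward(self, x):                              # x: (batch, 1, H, W)
        feats = self.cnn(x)                            # (batch, 64, H/4, W/4)
        feats = feats.sum(dim=2).permute(0, 2, 1)      # collapse height -> (batch, W/4, 64)
        out, _ = self.rnn(feats)
        return self.fc(out).log_softmax(dim=2)         # (batch, T, n_chars + 1)

model = CRNN(n_chars=80)
ctc = nn.CTCLoss(blank=80, zero_infinity=True)
images = torch.randn(4, 1, 32, 128)                    # dummy word images
logits = model(images).permute(1, 0, 2)                # CTC expects (T, batch, classes)
targets = torch.randint(0, 80, (4, 10))                # dummy label indices
loss = ctc(logits, targets,
           input_lengths=torch.full((4,), logits.size(0), dtype=torch.long),
           target_lengths=torch.full((4,), 10, dtype=torch.long))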

2.2 Recent Application Programming Interface in Free Access

Different children's handwriting recognition APIs that are now available in open source have been tested, and we measured their performance in the context of our industrial gamification project. Both the Microsoft Ink Recognizer API3 and

3 Microsoft Ink Recognizer API: https://www.iskn.co/fr.


Cognitive Services Vision provides a cloud-based REST API to analyze and recognize digital ink content. Unlike services that use Optical Character Recognition (OCR), the API requires digital ink stroke data as the input. Digital ink strokes are time-ordered sets of 2D points (x, y coordinates) that represent the motion of input tools such as digital pens or fingers. It recognizes both shapes and handwritten content from the input and returns all recognized features (containing position related to time) in JSON files. According to a series of tests, this API leads to very weak results for French text recognition on adult handwriting (less than 70% of correct character recognition for a set of adult handwriting styles). Based on the same technology, Google Cloud Vision (GCV)4 provides more reliable results than its competitor: in the tests performed on children's handwritten texts, the API managed to recognize children's handwriting fairly well (about a 60% average text recognition rate). Finally, with the recent MyScript Interactive Ink API5, results outperform the others, even on children's handwriting styles, but the input requirements are based on a drawing technology (called Digital Ink Recognition) which is not in line with the way the Study project intends to acquire and process the data.

Fig. 2. Attention-based model for end-to-end handwriting recognition - MDLSTM-RNN for handwriting recognition [1]

3 Comprehensive Scenario for Tablet-Trainee Interactions
3.1 The Interactive Environment

ISKN Repaper Tablet. Using a complex interaction medium increases the difficulty of producing efficient online answers for the trainee. The ISKN Repaper

Google Cloud Vision: https://cloud.google.com/vision. MyScript: https://www.myscript.com/interactive-ink/.

282

S. Medjram et al.

Tablet has been designed to respect the traditional way of writing on a sheet of paper with a standard pen. This latter consists of a tablet coupled with a pen: the tablet acts as a slate where a sheet of A5 paper is attached. The pen is the standard pen that we usually use (pencil, felt-tip or ballpoint pen) connected to the tablet by a magnetic ring, without battery and without electronics [7]. A Playful Dictation-Based Scenario. Dictation is a method used in French schools for learning the language and it is often done by the teacher in front of the students. In the Study project, dictation can be done in two ways: face-to-face or using an audio recording. Unlike face-to-face dictation alone, audio recordings allow the teacher to manage difference in levels between students and to create a new environment of concentration for each of them. The presence of the teacher and students in the classroom will maintain the necessary communication aspect, see Fig. 3 for the complete interaction cycle.

Fig. 3. Interactions cycle (from left to right) between trainee and tablet for an automatic and real-time handwriting spelling errors detection and correction.

Children’s Handwriting Complexity and Sparse Data. Children’s handwriting presents several variations and ambiguities, see Fig. 4. For a given child, a word, repeated several times, presents a large panel of intra-class variability. From one child to another, the variability becomes more complex, including: the writing style, thickness and curves, the alignment of words, the spelling accuracy. In addition, written production at this age is difficult to achieve. It is only possible to work with limited amounts of text and vocabulary. 3.2

Handwriting Recognition Pipeline

To interact with a child through a graphical interface and guide him in his language learning, an intelligent model is implemented. As mentioned in Sect. 2, deep learning methods are the most efficient and suitable models for our project. For the recognition architecture, we propose to use an adapted MDLSTM (HTR

Playful Interactive Environment for Learning

283

Fig. 4. Intra-class handwriting variations for trainees of primary classes, [13]

model) [12] that receives, as an input, an image of text and returns, as an output, a transcription and a reliability score. The model transforms the input image into a feature vector after a series of convolution operations, which are then passed as an input to a bidirectional recurrent neural network. Operations series of the recurrent network are then decoded by a decoder in order to produce a clean transcription. The proposed architecture in [12] is adapted to our needs considering primarily two central key steps: 1. Data preparation and data augmentation for the training steps, 2. Correct and misspelling words alignment. Let’s notice that the project is designed for the French language. Data preparation consists of collecting French word images with their transcription with respect to IAM Dataset format described in [4]. For the first key step, due to the lack of annotated children data (with the current available ScolEdit6 dataset), we investigated the preparation of a synthetic multi-writer dataset with GAN [5], which enable the generation of handwriting samples from online handwriting features. The standard architecture of the model used is illustrated in Fig. 2 stacking multiple layers of alternating convolution and multidirectional MDLSTM. A softmax layer with a Connectionist Temporal Classification (CTC) loss is finally used to handle the alignment between the input and the output sequence. For the second key step, let’s notice that children are very likely to make spelling errors and also to produce illegible sequences of characters. To localize errors or illegible characters/words, transcription results are (periodically) compared with the expected sequences of the dictation. A transcription that produces an incomplete, incorrect or illegible sequence (false correspondences according 6

ScoleEdit, http://scoledit.org/scoledition/corpus.php.

284

S. Medjram et al.

to the Levenshtein distance) leads to an alert that is delivered through a customized answer adapted to the trainee via the application’s interface (Fig. 3). We will not detail this part in the paper. 3.3

Software Environment of the STUDY Application

This section presents the interactions existing between the different software modules of the Study application, including the data circulation from the writing acquisition to its recognition, see Fig. 5. The handwriting data are acquired by the Repaper-Unity device through the motion and pressure tracker of the magnetic ring (consisting of a magnet and a silicone). The information is collected in .json/xml files that sample the handwriting in a temporal sequence (50 position points x,y are captured per second). Concomitantly with the production of dynamic points, the ISKN-Repaper device produces bitmap images that are sent to the local HTR module.

Fig. 5. Process order sequences : from the acquisition module via online Repaper-Unity application to the HTR module via local MD-LSTM recognition

The processing pipeline is composed of two major parts either for data acquisition and recognition. Due the lack of annotated data of handwriting children, we thought about exploring the power of GAN to learn and imitate children handwriting after series of acquisition form dictation. Extracted from the acquisitions coming from a specific child, online handwriting features are stored to feed the GAN model that will learn to imitates his style. Once the GAN is trained, we can generate augmented data (synthetic handwritings of the chosen style) to enhance offline handwriting recognition performance.

Playful Interactive Environment for Learning

285

For the standard process of handwriting recognition, the student listens to the dictation text and writes corresponding words on one or several lines on a A5 paper sheet. After a defined time of waiting, the system of offline handwriting recognition processes the input and segments written words with respect to language and text lines. For each segmented word, a proposition with alignment is returned to the student and showed in its playful interface. After describing how both parts of the pipeline works, it is convenient to illustrates minimum and recommended material configuration. The main hardware of the pipeline is the ISKN Repaper tablet. It can be coupled with a smartphone or tablet as an interactive interface of the playful application installed on it. For data processing (for GAN and MDLSTM-HTR models), the best choice is to use an integrated solution in the same mobile application. However, in order to reduce the processing load dedicated to smartphones or tablets, we can use external servers to process data as well. The playful mobile application runs under Unity 3D and data process models uses Python and Tensorflow.

4 4.1

Experimentations Results Obtained on Synthetic Writings and IAM Dataset

The evaluation criteria for this application are twofold: evaluation of the playful learning scenario for children and performance evaluation of the integrated handwriting recognition system. The procedure of collecting annotated data (text with transcription) requires schools administrative protocols and important amount of time. In order to study recognition module architecture feasibility, we generated data from a Generative Adversarial Network (GAN), producing synthetic text from different writing styles of adults [5], later improved by Alonso et al. [3]. Three major training experiments are proposed. The first one consists of studying the model trained with one style of handwriting generated by a GAN and tested on the same style. The second one considers three styles of handwriting generated by the GAN, two of which are used for training, and one for test. The third experiment consists of training the model with two styles of handwriting generated by a GAN and two styles of handwriting from the IAM Dataset. In addition, for each experiment considering synthetic data, the model has been tested with repeated sentences produced with a same style. We chose a sentence composed of six words, repeated five times to simulate intra-class variability, see Fig. 6. For each experience, we trained an MD-LSTM architecture which is an adaptation of the HTR model proposed in [12]. The architecture consists of processing the input image by a five-layer CNN each layer performs a convolution operation, which consist of applying a filter kernel of size 55 in the first two layers and 33 in the last three layers to the input image. Then, the feature sequence obtained after convolutions contains 256 features. A multi-directional MDLSTM with 4 LSTM hidden layers propagates relevant information through this sequence and provides more robust trainingcharacteristics than a classical RNN. The output sequence is mapped to a matrix

286

S. Medjram et al.

Fig. 6. Repeated sentence generated over two styles of GAN used for testing model against intra-class variability

of size 3280. At the end, CTC operations are required to train the MDLSTM during the training phase with pairs of images and GT texts. They operate on MD-LSTM output matrix through a loss function, evaluating the difference between the expected (ground-truth) text and the output (at most 32 characters long). For the inference part, the CTC decodes the output matrix into the final text. For the first experience, we trained the HTR model [12], which is a modified version of the MDLSTM architecture, on 1566 word images from scratch (from Synthetic dataset). We divided the data into 95% training, 5% validation, we set the mini-batch size into 64 images and we apply Word Beam Search (WBS) algorithm as decoder, using a prefix tree that is created from a lexicon of words to constrain the words in the recognized text. The results that we obtained were very encouraging : WER (word error rate) of 9.38% and a CER (character error rate) of 2.58%. For the second experiment, we trained the HTR model on 3077 word images from scratch. We divided the data into 95% training, 5% validation. We set the mini-batch size to 64 images and we apply WBS algorithm as decoder. Here again, we obtained very encouraging results: WER (word error rate) of 15.58% and CER (character error rate) of 17.41%. The third style of GAN has been used for the tests. Within this set, we chose 26 random images of words for testing. The model succeeded in recognizing correctly 20 over the 26 word images, and on the 6 images not correctly recognized, one or two characters were missing. As for the first experiment, we also tested the robustness of the model towards a repeated sentence using three different styles of GAN, two styles used for training and one style reserved for testing. As a result, the model succeeded in recognizing 4/5 of the variability inter-class. In the last experiment, we trained, from scratch, the HTR model on 3181 word images composed by data previously used in experiment two, plus 52 new images from two different styles of IAM Data-set. We divided the data into 95% training, 5% validation and we set the mini-batch size into 64 images and we apply WBS algorithm as decoder. The result obtained were very encouraging also, with a WER (word error rate) of 29.09% and a CER (character error rate) of 35.38%.

Playful Interactive Environment for Learning

287

Table 1. Experimentation results of HTR model [12] on one style of GAN, two styles of GAN and two styles of GAN and IAM. Experimentations and results Human level First exp Second exp Third exp CER

1%

2.58%

15.58%

29.09%

WER 1%

9.38%

17.41%

35.38%

From the Table 1 results, we can notice that increasing the variability (intra and inter-writer) causes a drop in performance compared to human recognition. In order to enhance them, important amount and balance of annotated data are highly recommended in addition to a customized deep-learning tuning strategy. Although actual experiments results are still far from human level, the model is very encouraging, especially because it quickly converges with very little data.

Fig. 7. Words alignment results according to three scenarios of spelling errors correction.

The last step of our system is the visual error tracking to the child interface. For errors detection and alignment, we used a simple French lexicon acting as a spellchecker. If a spelling error exists, we compare the number of characters between prediction and lexicon. At this stage, for the three error scenarios that we defined, we used color codes to indicate the type of detection: red for character recognition error, cyan for insertion of incorrect character and purple for missing characters. Regarding image word characters alignment, we proposed two approaches. The first approach is slow but precise: it consists of making a retro-prediction from transcription of the model to character images using sliding window crops. The second approach, the simplest and fastest, consists of cutting the image into vertical slices according to the number of characters obtained in the transcription, see Fig. 5. 4.2

Results on a Sample of Real Child Handwriting

Previous experiments were fully based on GAN data of several handwriting adult styles, whereas the Study project targets child handwriting. Because of the limitation of available children annotated data (children’s data acquisition

288

S. Medjram et al.

being still in progress), we are limited in our experiments with the SCOLEDIT dataset. However, the system trained on synthetic children writings produced by the GAN shows a promising efficiency when tested on real child writings, coming from scans of an elementary French school (50 random clean words well written by five different children) [13]. From those scans, we first extract handwriting words by a classical morphologial segmentation, then we passed them through the trained model inference for prediction. Due to changing style of handwriting, we used beamsearch decoder for more precision. The results obtained on isolated words produced by young trainees are quite encouraging, see Table 2. The HTR-model trained on a couple of synthetic styles achieves promising preliminary results on five trainees handwriting styles, see Table 2. For the misspredicted words, several character (CER) are however well recognized: this paves the way to further research requiring adaptations on the training stage, with some combinations of learning datasets composed of potentially more abundant children synthetic data and limited real writings. Table 2. Average WER and CER on a reduced dataset for five young trainees handwriting [13]. Child 1 Child 2 Child 3 Child 4 Child 5 CER

22.5%

WER 23%

5

32.5%

19%

32.8%

32%

50%

30.1%

40%

50%

Conclusion

In this paper, we present a new framework providing a comprehensive learning solution for children developing their writing skills in French language through an innovative device (Advanced Magnetic Interaction technologies of the AMI/ISKN Society, and the Extrapage software environment from Super Extra Lab) and traditional school material (textbook, paper, pencil...). We also focus on how we can recognize handwriting and localize spelling errors with high precision, by combining ISKN REPAPER tablet, SuperExtraLab’s playful interface and a deep-learning offline handwriting recognition model. By integrating this framework with an educational dictation program, children will interact smoothly with their spelling errors, just after the writing period and most probably, at the period of their best concentration. The teacher will manage easily the progress of their pupils and adapt adequate solutions to their different profiles. We have illustrated, across several experiments on adult GAN styles, that a dedicated MD-LSTM architecture has a quite promising capacity to adapt to handwriting intra-variability and new writing. The results of these experiments are encouraging for future works. This ability to adapt to new scripts suggests

Playful Interactive Environment for Learning

289

that the model also allows adaptation to Latin languages other than French. The overall end-to-end scenario for recognizing children’s handwriting dictations is technically sound and will require further investigations to improve the transfer learning from real data augmented with children GAN-synthetized writings. Study project concerns an end-to-end chain of handwriting recognition in a context of dictation. It is not the first industrial experience in the international educational sphere but it represents a first hardware and software integrative solution for training in spelling and writing at primary school with a first collaboration between two french start’ups, one dedicated to harware environment, and the other to the develoment of the gamified learning application.

References 1. Bluche, T., Louradour, J., Messina. R.: Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1050–1055 (2017). ISSN:15205363. https://doi.org/10.1109/ICDAR.2017.174 2. Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., Bengio, Y.: Attention-based models for speech recognition. Adv. Neural Inf. Proces. Syst. 2015, 577–585 (2015). ISSN 10495258 3. Eloi, A., Bastien, M., Ronaldo, M.: Adversarial generation of handwritten text images conditioned on sequences. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 481–486 (2019). ISSN 15205363. https://doi.org/10.1109/ICDAR.2019.00083 4. FKI: Research group on computer vision and artificial intelligence — computer vision and artificial intelligence (2002). https://fki.tic.heia-fr.ch/databases/iamhandwriting-database 5. Graves, A.: Generating Sequences With Recurrent Neural Networks. pp. 1–43 (2013). http://arxiv.org/abs/1308.0850 6. Graves, A., Fern´ andez, S., Schmidhuber, J.: Multi-dimensional recurrent neural networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4668 LNCS (PART 1), pp. 549–558 (2007). ISSN 16113349. https://doi.org/10.1007/978-3-540-746904 56 7. ISKN: Tablette graphique Repaper: tablette papier crayon originale — iskn, 2020. URL https://www.iskn.co/fr 8. Jia, Y.: Attention mechanism in machine translation. J. Phys. Conf. Ser: 1314(1), 012186 (2019). ISSN:17426596. https://doi.org/10.1088/1742-6596/1314/1/012186 9. Kang, L., Riba, P., Rusi˜ nol, M., Forn´es, A.: Mauricio villegas. pay attention to what you read: non-recurrent handwritten text-line recognition. arXiv, (2020). ISSN 23318422 10. Matcha, A.C.N.: How to easily do handwriting recognition using deep learning. (2020). https://nanonets.com/blog/handwritten-character-recognition/ 11. Pham, V., Bluche, T., Kermorvant, C., Louradour, J.: Dropout improves recurrent neural networks for handwriting recognition. In: Proceedings of International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 285–290. December 2014. ISSN 21676453. https://doi.org/10.1109/ICFHR.2014.55

290

S. Medjram et al.

12. Scheidl, H., Fiel, S., Sablatnig, R.: Word beam search: a connectionist temporal classification decoding algorithm. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR) (2018) 13. Universit´e Grenoble Alpes. SCOLEDIT - Etat du Corpus (2013). http://scoledit. org/scoledition/corpus.php

ICDAR 2021 Workshop on Computational Paleography (IWCP)

IWCP 2021 Preface

Computational paleography is an emerging field investigating new computational approaches for analyzing ancient documents. Paleography, understood as the study of ancient writing systems (scripts and their components) as well as their material (characteristics of the physical inscribed objects), can benefit greatly from recent technological advances in computer vision and instrumental analytics. Computational paleography, being truly interdisciplinary, creates opportunities for experts from different research fields to meet, discuss, and exchange ideas. Collaborations between manuscript specialists in the humanities rarely overcome the chronological and geographical boundaries of each discipline. However, when it comes to applying optical, chemical, or computational analysis, these boundaries are often no longer relevant. Computer scientists are keen to confront their methodologies with actual research questions based on solid data. Natural scientists open new perspectives by focusing on the physical properties of the written artefacts. In many cases, only a collaboration between experts from the three communities can yield significant results. In this workshop, we aimed to bring together specialists from the different research fields analyzing handwritten scripts in ancient artefacts. It mainly targeted computer scientists, natural scientists, and humanists involved in the study of ancient scripts. By fostering discussion between the three communities, it facilitated future interdisciplinary collaborations that tackle actual research questions on ancient manuscripts. The first edition was held in hybrid form on September 7, 2021, in Lausanne, Switzerland, in conjunction with ICDAR 2021. We had two invited speakers: Peter Stokes from Université PSL, France, and Sebastian Bosch from Universität Hamburg, Germany. The Program Committee was selected to reflect the interdisciplinary nature of the field. For this first edition, we welcomed two kinds of contributions: short papers and abstracts. We received a total of 11 submissions. Each short paper was reviewed by two members of the Program Committee via EasyChair and five out of six were accepted. A single blind review was used for the short paper submissions, but the authors were welcome to anonymize their submissions. The five abstract submissions were evaluated and accepted by the organizers, and were published separately by the organizers in a dedicated website. September 2021

Isabelle Marthot-Santaniello Hussein Adnan Mohammed

Organization

General Chairs Isabelle Marthot-Santaniello Hussein Mohammed

Universität Basel, Switzerland Universität Hamburg, Germany

Program Committee Andreas Fischer Dominique Stutzmann Giovanni Ciotti Imran Siddiqi Jihad El-Sana Marie Beurton-Aimar Maruf Dhali Nachum Dershowitz Peter Stokes Sebastian Bosch Volker Märger

University of Applied Sciences and Arts Western Switzerland, Switzerland IRHT CNRS, France Universität Hamburg, Germany Bahria University, Pakistan Ben-Gurion University of the Negev, Israel LaBRI, University of Bordeaux, France University of Groningen, The Netherlands Tel Aviv University, Israel Université PSL, France Universität Hamburg,Germany Universität Hamburg, Germany

A Computational Approach of Armenian Paleography Chahan Vidal-Gor`ene1(B) 1

and Ali´enor Decours-Perez2

´ Ecole Nationale des Chartes - Universit´e Paris, Sciences & Lettres, 65 rue de Richelieu, 75002 Paris, France [email protected] 2 Calfa, MIE Bastille, 50 rue des Tournelles, 75003 Paris, France [email protected] http://www.chartes.psl.eu https://calfa.fr

Abstract. Armenian Paleography is a relatively recent field of study, which emerged in the late 19th century. The typologies are fairly well established and paleographers agree on the distinction between four main types of writing in describing Armenian manuscripts. Although these types characterize clearly different and lengthy periods of production, they are less appropriate for more complex tasks, such as precise dating of a document. A neural network composed of a stack of convolutional layers confirms the relevance of the classification, but also highlights considerable disparity within each type. We propose a precise description of the specificities of Armenian letters in order to explore some other possible classifications. Even though the outcomes need to be further developed, the intermediate evaluations show a 8.07% gain with an extended classification. Keywords: Armenian manuscripts Handwritings classification

1

· Computational paleography ·

Introduction

The Armenian language belongs to the Indo-European family [5] and is documented from the 5th century, when the monk Maˇstoc‘ created the alphabet in 405 AD. The original Armenian alphabet has 36 letters sorted in the Greek alphabetical order and according to the gradual introduction of characters specific to Armenian phonetics. Numbers are written using letters, as in Greek (from 1 to 9,000). Two letters were added in the Middle Ages and expanded the alphabet to 38 letters. There exists a well established classification made by Armenian paleographers. Four major types of writing have been identified and used to describe Armenian manuscripts in the major catalogues, namely the erkat‘agir ´ Supported by the Ecole Nationale des Chartes-PSL and the Calouste Gulbenkian Foundation. We especially thanks libraries for providing us manuscripts for this research. c Springer Nature Switzerland AG 2021  E. H. Barney Smith and U. Pal (Eds.): ICDAR 2021 Workshops, LNCS 12917, pp. 295–305, 2021. https://doi.org/10.1007/978-3-030-86159-9_20

296

C. Vidal-Gor`ene and A. Decours-Perez

script, “majuscule” or uncial [9]; the bolorgir script, word modeled on the latin rotunda, that refers to a minuscule script [9]; the n¯ otrgir script that refers to a notarial script, also named late minuscule [9]; and finally the ˇslagir script or ligatured cursive script [9]. It is very frequent to find manuscripts that combine several types: sometimes as a means structuring a document (e.g. with one script restricted to the title) but mostly within the same text, in “transition” periods between two different types of writing. The names of the scripts are themselves attested in colophons of manuscripts, although their meaning is still open to debate. As soon as 1895, the Mekhitarist monk Yakovbos Taˇsean [10] introduced more detailed descriptions of the scripts: even though he kept the same terminology, he introduced formal nuances (e.g. “large”, “medium”, “thick” or “regular”), morphological details (e.g. “rounded” or “slanted”) and chronological observations (e.g. “intermediate” or “transitional”); his extended terminology was also completed a posteriori with geographical origins (e.g. “eastern”). At times, these characterizations are used in catalogs without consistency or consensus on the terminology, due to their ambiguity and obvious subjectivity. However, they are seldom used and the description of manuscripts is very often limited to the distinction between the four main types of writing, without any further detail that could be helpful. Even though the relevance of the four types is uncontested, the profusion of names mirrors their great heterogeneity. It reveals the need for a more efficient classification, more appropriate to meet the challenges of cataloguing Armenian manuscripts, especially issues of dating. 40% of the manuscripts are not dated, due to the loss of their colophon [7], and the current classification does not provide enough criteria to date the documents on a paleographical basis. In this article, we will begin to explore how to refine this classification through the measurement of the intra-class similarity of the four main types of writing. To that effect, we proceed to an automatic classification monitored by a CNN, following a proven approach tested on Latin scripts (see infra Sect. 3). The article will first shed new light on the paleographic study of Armenian writings and their specificities in the context of automatic processing. Then, it will assess the relevance of several groupings achieved on the basis of paleographical criteria. The CNN, whose intermediate outcomes we outline, has been trained on a dataset created for the purposes of this research.

2

Notions of Armenian Paleography

The classification created by Armenian copyists and later adopted by paleographers enables us to represent very distinct types of writings, with their own ductus and temporality. The erkat‘agir script is monumental. The letters are comprised of very thick downstrokes and thin connecting upstrokes, if not non-existant, which complicates reading and create ambiguities. As for Latin and Greek manuscripts of the first centuries, there are no opposition between majuscule and minuscule letters, yet, but a distinction is drawn between body text and initials.


Table 1. Armenian printed and handwritten alphabet

These are capital letters, also found in lapidary inscriptions. It is a type of writing with no ligatures; the letters are in most cases distinctly separate. The script can be upright (boloragic‘ erkat‘agir, see E1 in Table 1, shapes from manuscript W538) or slanted (ullagic‘ erkat‘agir, see E2 in Table 1, shapes from manuscript W539); the difference lies in the inclination of the minims and the curvature of the upstrokes. In the oldest manuscripts using this script, all letters have the same height – between the baseline and the cap line (bilinear system) – and the same width, with only two exceptions: p‘ and k‘, which show ascenders and descenders sometimes reaching the previous or the following lines. With time, letters comprised of a horizontal stroke on the baseline tended to overreach and extend below the next letter. The letters are well drawn and particularly easy to read in the upright form, whereas in the smaller slanted form, the size reduction of letters and connecting strokes (sometimes barely suggested) impairs legibility. Over time, the use of this type of writing decreased and became limited to highly sophisticated bibles (e.g. the W539 manuscript copied by T‘oros Roslin in 1262), to rubrics and to the first lines of the latest manuscripts.

The bolorgir script (see B in Table 1, shapes from manuscript W540) introduces a distinction between majuscule and minuscule letters (bicameral system), subsequently adopted by the nōtrgir and šlagir scripts. It is a book script. Sometimes copyists used the erkat‘agir script for majuscules, sometimes new shapes were created. The stark contrast between downstrokes and upstrokes of erkat‘agir letters diminishes; however, the minims are still significantly thicker than the crossbars, producing a very uniform overall appearance. Henceforth, it is a quadrilinear writing system, with tilted and slanted letters, comprised of ascenders and descenders. The letters in themselves are easy to decipher and do not raise particular problems, but ligatures appear. Ew and mn are the most common (see Table 3, bolorgir, miscellaneous). Copyists sometimes innovated and introduced new ligatures of their own making (e.g. a pair of letters with ascenders joined at the top). The nōtrgir script (see N in Table 1, shapes from manuscript M1495) differs from bolorgir by a significant change in letter morphology, so that paleographers identify it as a new ductus [9]. The acceleration of the copying process of bolorgir was carried further: the script is cursive and fine, and minims are now often limited to simple dots, whereas ascenders and descenders are excessively long compared to the body of the letter (see Table 3). Bolorgir majuscules are preserved. New ligatures appear, and the horizontal stretching of some letters results in disproportions (see Table 3). Depending on the diligence of the copyist, numerous shapes can be ambiguous: for instance, z, l and š do not always differ from one another. It is a type of script that is difficult to read and that introduces numerous recognition difficulties (innovations, ambiguities, etc.). Lastly, the šlagir script (see S in Table 1, shapes from manuscript MAF68) carries the innovations of nōtrgir to their conclusion. The two types share the same appearance, with thinner strokes for šlagir, leading to frequent confusion between them. This script favors non-formalized ligatures and abbreviations (see Table 3, šlagir, miscellaneous).

The erkat‘agir script, used for inscriptions, is prevalent, at least in the written production that has reached us, until the 12th century (it is then limited to rubrics, titles, etc.). From that date, the bolorgir script was widely adopted and its use extends until the 19th century. The nōtrgir and šlagir scripts appeared respectively in the 15th and 19th century [1,9]. These periods of prevalence do not exclude earlier or later evidence [1]. It is worth noting that since the first use of printing for Armenian (the first book was printed in Venice in 1512), erkat‘agir has been used for majuscule letters, bolorgir for plain roman minuscule and nōtrgir for italic minuscule. This system continued in the digital age, except for more unusual fonts.

Abbreviations and Ideograms: We find a great number of abbreviations in Armenian manuscripts, originally restricted to nomina sacra as in Latin or in Greek. Abbreviations are created through contraction, using only the first and last letters of the word and adding a horizontal stroke above (badiw, and sometimes a double accent). Very early on, abbreviations multiplied and took on various forms: still created through contraction, they may include some other consonants of the abbreviated word (see Table 3, nōtrgir). The letters of the abbreviated word are sometimes ligated. In nōtrgir manuscripts, we can observe a very high density of abbreviations. Starting with bolorgir manuscripts, we also find ideograms that transform a whole word into a figure. These ideograms can be declined, adding the inflection to the figure, which adds complexity for other tasks such as HTR [12]. There is a very wide range of ideograms (more than 500 words, with sometimes three different ideograms for the same word) that cover a very large lexicon. Abbreviations and ideograms can be a reliable criterion to date a manuscript or to identify a script.

3 A Dataset for Armenian Script Identification

To this day, there is no dataset for the analysis and classification of Armenian manuscripts. The only existing dataset, proposed by Calfa, is focused on the description of medieval Armenian handwritten characters and annotated with the Calfa Vision platform [12], which enables different customized levels of annotation on handwritten documents, here at the character level as with DigiPal (now Archetype) [2]. We have therefore adopted a methodology similar to that used for the creation of the CLAMM 2016 [4] and 2017 [3] datasets. We have built an internal dataset of 2,300 images (D-ARM) from 190 manuscripts: 500 images come from the Calfa database to include older witnesses, and 1,800 have been newly added (a mix of thesis corpus and new images). The dataset covers a period from the 10th to the 20th century and approximately 230 hands (Table 2). Most of the images are copyrighted and used within the scope of partnerships we have with libraries.

Table 2. Manuscripts provenance

Provider | Manuscripts | Scripts | Images
Gallica (Bibliothèque Nationale de France (BnF), Paris), https://gallica.bnf.fr | 80 | 1–4 | 1,386
Matenadaran (Erevan) | 26 | 1–4 | 112
BVMM (Musée Arménien de France-IRHT, Paris), https://bvmm.irht.cnrs.fr | 23 | 1–3 | 368
Congregation of Mekhitarist Fathers (Vienna) | 20 | 1 | 49
Staatsbibliothek zu Berlin (Berlin), https://digital.staatsbibliothek-berlin.de | 18 | 2–4 | 195
Congregation of Mekhitarist Fathers (Venice) | 11 | 2–3 | 49
Walters Art Museum (Baltimore), https://www.thedigitalwalters.org | 7 | 1–2 | 110
Armenian Patriarchate of Jerusalem (Jerusalem) | 4 | 2–3 | 23
British Library (London), https://www.bl.uk/manuscripts/ | 1 | 2 | 8
Total | 190 | – | 2,300

We have primarily collected the samples from non-religious manuscripts, to limit the number of manuscripts copied with great care and consistency. The manuscripts come from 9 libraries: the Bibliothèque Nationale de France (80), the Matenadaran (26), the Musée Arménien de France-IRHT (23), the Congregation of Mekhitarist Fathers of Venice (11) and of Vienna (20), the Staatsbibliothek zu Berlin (18), the Walters Art Museum (7), the Armenian Patriarchate of Jerusalem (4) and the British Library (1). That corresponds on average to 12 pages per manuscript, non-consecutive and mixing different sections of the manuscripts (cover pages, title-pages, colophons, etc.), thus limiting redundancy. The images are tagged according to 4 classes (1 = erkat‘agir, 2 = bolorgir, 3 = nōtrgir, 4 = šlagir), following the traditional typology. We have strictly replicated the descriptions featured in the catalogs, except in cases of manifest error. The classes are not uniformly distributed (13% for class 1, 39% for class 2, 38% for class 3 and 8% for class 4) – which is consistent with the distribution of these scripts within digital libraries – so we created 3 sub-datasets with an equal class distribution (156 images for each class). Each sub-dataset (D-ARMi) is composed of training data (624 images) and test data (1,546 images), completely separated and randomly generated. Each sub-dataset is evaluated independently from the others for a cross-validation of the results. The size of the test data allows for a very wide variety of unseen variations of a script.

Table 3. Overview of the intra-class and inter-class heterogeneity within the dataset (non-exhaustive list). Sources: mss Walters Art Museum (W537, W538, W539, W540), Bibliothèque Virtuelle des Manuscrits Médiévaux (BVMM) – IRHT-CNRS (MAF52, MAF67, MAF80), Bibliothèque nationale de France-Gallica (P52, P55, P79, P118, P119, P132, P135, P178, P193, P206, P207, P208, P218, P220, P236, P242, P304, P317, P324), Staatsbibliothek zu Berlin – PK (BER Petermann I 143, BER or. oct. 282, BER or. oct. 341)
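As a purely illustrative sketch of the sub-dataset construction described above (156 images per class, several independent random draws), one might proceed as follows; the file-listing format, the seed and the handling of the held-out test images are assumptions, not the authors' code.

```python
import random
from collections import defaultdict

def make_subdatasets(labelled_images, per_class=156, n_subsets=3, seed=0):
    """labelled_images: list of (image_path, class_id) pairs, class_id in {1, 2, 3, 4}.
    Returns n_subsets (train, test) splits with a balanced training set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, cls in labelled_images:
        by_class[cls].append(path)
    splits = []
    for _ in range(n_subsets):
        train, test = [], []
        for cls, paths in by_class.items():
            shuffled = paths[:]
            rng.shuffle(shuffled)
            # 156 images per class go to training; the remaining images
            # (or a fixed-size draw from them) form the test set
            train += [(p, cls) for p in shuffled[:per_class]]
            test += [(p, cls) for p in shuffled[per_class:]]
        splits.append((train, test))
    return splits
```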

The images have different resolutions (from 300 to 600 DPI) and consist of grey-level images in TIFF format. Most of the BnF images are digitized microfilms. For each image, automatic semantic segmentation to localize text regions has been performed on Calfa Vision and manually proofread. The images have been neither straightened nor enhanced, which constitutes a bias in our experiment.


A first look at the dataset highlights the great diversity in the morphology of the letters included (see Table 3). The framed examples notably show the mix of shapes that is representative of transitional writing. Erkat‘agir displays a wide variety of shapes and sometimes shares some letters, such as a, with the bolorgir script. Many more similarities exist between the three other, bicameral scripts. One of the more selective criteria, obvious in Table 3, is the higher proportion of ligatures in the most cursive shapes. We also notice words that do not follow the bilinear or quadrilinear system of their class (see supra Sect. 2). The šlagir script appears to be less codified, closer to a modern cursive hand than to a formally defined type of writing such as is classically found in the older manuscripts. The table does not display all the variations present in our dataset or in Armenian.

4 Experiments and Discussion

Architecture and Training Parameters: The classification task is performed by a neural network composed of several convolutional layers [4]. This is a simplification of the VGG16 architecture [8] used by M. Kestemont for DeepScript [6]. The model takes the form of a stack of convolutional layers with a ReLU activation function, each with a 3 × 3 receptive field and an increasing number of filters (2 × 64 > 3 × 128 > 2 × 256), followed by a MaxPooling layer with a size of 2 × 2, two fully-connected dense layers with a dimensionality of 1048, and a softmax layer. This streamlined version is more effective on our dataset than the deeper version proposed by DeepScript, whose main parameters we otherwise reproduce. We use an adaptive learning rate starting from 0.001, the RMSprop optimizer provided in Keras and a batch of 100 images. 90% of our training dataset is used for the training process and 10% for validation.
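As an illustration only, a minimal Keras sketch consistent with this description might look like the following; the input patch size (assumed here to be 150 × 150 grey-level patches), the placement of a pooling layer after each convolutional block and the ReLU activation on the dense layers are assumptions that are not stated explicitly above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(150, 150, 1), n_classes=4):
    """Simplified VGG-like stack: 2x64, 3x128 and 2x256 conv filters (3x3, ReLU),
    MaxPooling 2x2, two dense layers of 1048 units and a softmax output."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for n_filters, n_convs in [(64, 2), (128, 3), (256, 2)]:
        for _ in range(n_convs):
            x = layers.Conv2D(n_filters, (3, 3), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1048, activation="relu")(x)
    x = layers.Dense(1048, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```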

Fig. 1. Examples of generated patches for class 1 (ms W549), class 2 (mss MAF50 and MAF52), class 3 (ms MAF68), and class 4 (fragment from Venice)

Data Augmentation: Due to the small size of the dataset and the great disparity of image resolutions, we also follow the strategy adopted by DeepScript regarding data augmentation and perturbation [6]. In particular, we apply a resizing that depends on the original resolution of the images, in order to smooth out discrepancies in image resolution and to limit the influence of the size of the script or of the text density (see Fig. 1). In total, the network is faced with 62,400 patches at each new epoch.
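The patch-sampling strategy could be sketched roughly as below; the patch size, the zoom range and the number of patches per image (100 patches per training image would yield the 62,400 patches per epoch mentioned above, given 624 training images) are illustrative assumptions rather than the authors' exact procedure.

```python
import random
from PIL import Image

def sample_patches(image_path, dpi, n_patches=100, patch_size=150,
                   target_dpi=300, seed=None):
    """Randomly crop fixed-size patches from a text region, after rescaling
    the image according to its original resolution so as to smooth out DPI
    and script-size discrepancies (illustrative values only)."""
    rng = random.Random(seed)
    img = Image.open(image_path).convert("L")
    scale = target_dpi / dpi * rng.uniform(0.9, 1.1)  # mild random zoom
    img = img.resize((int(img.width * scale), int(img.height * scale)))
    patches = []
    for _ in range(n_patches):
        x = rng.randint(0, max(0, img.width - patch_size))
        y = rng.randint(0, max(0, img.height - patch_size))
        patches.append(img.crop((x, y, x + patch_size, y + patch_size)))
    return patches
```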


Results: The three evaluations provide convergent results, with a clear improvement for D-ARM-2 (see Fig. 2). The classification established by paleographers is clearly verified. A clear distinction between the four types of writing can be observed despite the very different data for classes 2 and 3. The global accuracy is 82.97%. This outcome is in line with the accuracy achieved by DeepScript on Latin script [4], which was 76.49%, albeit with three times as many classes to predict. The original DeepScript architecture, that is, with an additional convolutional layer of 256 filters, achieves here an average accuracy of 76.86%.

Fig. 2. Confusion matrix for D-ARM-1, D-ARM-2 and D-ARM-3

Generalities: Classes 1 and 4 appear to be the most discriminating, and class 3, conversely, presents the most difficulties. While the distinction between class 1 and all the others appears consistent with the morphology and ductus of the letters, which are starkly different, there is, from a paleographical point of view, no obvious reason for such a variance in class 3, in particular with respect to class 2. Following Table 3, we could expect more confusion between classes 3 and 4. The latter surprise may be explained in part by the difficulty of labelling šlagir, often mistakenly catalogued as nōtrgir – the model then correcting the mistake. Class 1 is accurately identified. Some very marginal errors can be observed for datasets D-ARM-1 and D-ARM-3; the mistakes are located on a set of patches in marginal notes, often in less formal writing, or in very damaged areas (data extracted from manuscript fragments used as binding material). Class 2 tends to be confused with class 3, and less frequently with classes 1 and 4. The confusion with class 1 coincides with the existence of a mixed script (see Fig. 1 and Table 3). Patches with erkat‘agir titles or majuscules (see supra Sect. 2) are not misclassified. More generally, bolorgir is not cursive and comprises thicker strokes, without being monumental either. The use of a thinner writing tool, similar to the pen used for nōtrgir and šlagir, could be the source of the error (see Table 3). Class 3 has the worst classification score. It is the least uniform class: it often shares its ductus and morphologies with class 2 (e.g. letters y, c and l, Table 1), even though the script is more cursive and has its own ductus for other letters (e.g. j and t, Table 1). With 19% of confusion for D-ARM-3, we can reasonably rule out the coincidence factor for patches that do not mix different scripts. Concerning class 4, the network is undermined by errors of cataloguing, but class 4, which incorporates widely varying but very cursive scripts, is still well recognized in all scenarios.

Cross-Entropy Loss: Although the accuracy provides information that is easily interpreted by paleographers, it is worth noting that the loss, calculated at each epoch (train loss and validation loss), remains high (0.3 at epoch 30 on average), even when increasing the complexity of the model or the amount of data. The training and validation sets are widely different and each epoch has its own randomly generated patches. This persistently high loss may be explained by the significant data heterogeneity within each class.

Which New Classes to Consider? Class 1 already has at least two distinct subclasses (see supra Sect. 2), which we keep. The paleographical studies on the letter A [11] show some evolutions of the ductus that could be the basis for new types. The so-called mixed script between erkat‘agir and bolorgir is also a strong candidate. In our dataset, there is a mix of boloragic‘ erkat‘agir (E1) and ullagic‘ erkat‘agir (E2), kept to create two new classes. As shown in Fig. 1, we obtain a lot of variation inside each class. We start from the misclassified images and from this variety to experiment with two new classifications. The first one is composed of 10 classes: (1) boloragic‘ erkat‘agir, (2) ullagic‘ erkat‘agir, (3) a rounded and compact erkat‘agir, (4) the mixed script between erkat‘agir and bolorgir, (5) bolorgir, (6) a rounded bolorgir, (7) a mixed script between bolorgir and nōtrgir, (8) nōtrgir, (9) a more cursive nōtrgir, and (10) šlagir. The lack of data could strongly penalize these classes, so we also tried another, simpler distribution: (1) boloragic‘ erkat‘agir, (2) ullagic‘ erkat‘agir, (3) the mixed script between erkat‘agir and bolorgir, (4) bolorgir, (5) a mixed script between bolorgir and nōtrgir, (6) nōtrgir, and (7) šlagir. Experiment 1 shows a tendency, but remains inconclusive (see Fig. 3). The lack of data for classes 3, 6, 7 and 9 creates additional noise and the final accuracy achieved (86.6%) is impacted. Experiment 2 is more relevant, with a robust model (91.04%) and a loss under 0.25 at epoch 30 (see Fig. 3). Class 2 suffers from a shortfall of quality data (coming from damaged fragments), but class 5 also seems more reliable and ensures a better intra-class inertia for nōtrgir. The gain of the new classification is 8.07% with three additional classes, even if this tendency should be confirmed in future work. Aside from the morphology of the letters or the study of their ductus, there are other relevant criteria to identify the type of writing or, more generally, to classify Armenian manuscripts. Their layout is a good indicator, even if the correlation between text layout and script is not self-evident. We can simply observe that text density increases when the minuscule scripts appear. The writing medium often makes it possible to discriminate erkat‘agir from the other scripts. Abbreviations, ideograms or specific punctuation marks could also prove to be an interesting avenue.
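For reference, the categorical cross-entropy minimised during training, whose values are discussed above, is the standard

L(x, y) = -\sum_{c=1}^{C} y_c \log p_c(x),

where p_c(x) is the softmax output for class c and y the one-hot label, averaged over the patches of an epoch. As a purely arithmetical observation, an average loss of 0.3 corresponds to an average probability of about e^{-0.3} ≈ 0.74 assigned to the correct class.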


Fig. 3. Confusion matrix for experiments 1 and 2

5 Conclusion

Armenian paleography constitutes a new field of study within the scope of Digital Humanities, with only very recent developments for HTR. The dating of manuscripts in particular is an open issue, for which a new classification may be needed. The exploratory approach we have undertaken highlights both the homogeneity of the main Armenian scripts and their considerable intra-class heterogeneity. A large variance is evidenced for the scripts that became prevalent after the 12th century, which are also the most widely used. We have provided some directions for possible classifications, without a full assessment due to the lack of data. While the existing classification is appropriate, a potential extension appears to be relevant. Future work will focus on the constitution and diffusion of a representative and open dataset, for which different classification scenarios will be considered. The supervised approach that we have adopted may also be compared with unsupervised clustering. Going forward, it will be useful to discern the points of interest of the neural network, in order to visualize its criteria for classification and to put them into perspective with a paleographical approach. Lastly, another aspect that we intend to study in the future is the relevance of a new paleography-based classification for the improvement of HTR models for Armenian.

References

1. Hayoc‘ gir (= Armenian letters). Erevan (1928)
2. Brookes, S., Stokes, P.A., Watson, M., De Matos, D.M.: The DigiPal project for European scripts and decorations. Essays Stud. 68, 25–59 (2015)
3. Cloppet, F., Eglin, V., Helias-Baron, M., Kieu, C., Vincent, N., Stutzmann, D.: ICDAR 2017 competition on the classification of medieval handwritings in Latin script. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1371–1376. IEEE (2017)
4. Cloppet, F., Eglin, V., Stutzmann, D., Vincent, N., et al.: ICFHR 2016 competition on the classification of medieval handwritings in Latin script. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 590–595. IEEE (2016)
5. Hübschmann, H.: Über die Stellung des Armenischen im Kreise der indogermanischen Sprachen. Weimar (1875)
6. Kestemont, M., Christlein, V., Stutzmann, D.: Artificial paleography: computational approaches to identifying script types in medieval manuscripts. Speculum 92(S1), S86–S109 (2017)
7. Mahé, J.P.: Une approche récente de la paléographie arménienne. Revue des Études Arméniennes 30, 433–438 (2005–2007)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
9. Stone, M.E., Kouymjian, D., Lehmann, H.: Album of Armenian Paleography. Aarhus University Press, Aarhus (2002)
10. Tašean, Y.: C‘uc‘ak hayerēn Matenadaranin Mxit‘areanc‘ i Vienna (= Catalog der Armenischen Handschriften in der Mechitharisten Bibliothek zu Wien) (1895)
11. Vidal-Gorène, C.: Notes de paléographie arménienne à propos de la lettre ayb. Revue des Études Arméniennes 39, 143–168 (2020)
12. Vidal-Gorène, C., Dupin, B., Decours-Perez, A., Thomas, R.: A modular and automated annotation platform for handwritings: evaluation on under-resourced languages. In: Lladós, J., et al. (eds.) ICDAR 2021. LNCS, vol. 12823. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86334-0_33

Handling Heavily Abbreviated Manuscripts: HTR Engines vs Text Normalisation Approaches

Jean-Baptiste Camps, Chahan Vidal-Gorène, and Marguerite Vernet

École Nationale des Chartes – Université Paris, Sciences and Lettres, 65 rue de Richelieu, 75002 Paris, France
{jean-baptiste.camps,chahan.vidal-gorene,marguerite.vernet}@chartes.psl.eu
http://www.chartes.psl.eu/

Abstract. Although abbreviations are fairly common in handwritten sources, particularly in medieval and modern Western manuscripts, previous research dealing with computational approaches to their expansion is scarce. Yet abbreviations present particular challenges to computational approaches such as handwritten text recognition and natural language processing tasks. Often, pre-processing ultimately aims to lead from a digitised image of the source to a normalised text, which includes expansion of the abbreviations. We explore different setups to obtain such a normalised text, either directly, by training HTR engines on normalised (i.e., expanded, disabbreviated) text, or by decomposing the process into discrete steps, each making use of specialist models for recognition, word segmentation and normalisation. The case studies considered here are drawn from the medieval Latin tradition.

Keywords: Abbreviations · Handwritten text recognition · Medieval western manuscripts

1 Introduction

1.1 Abbreviations in Western Medieval Manuscripts

In medieval Latin manuscripts, abbreviations are fairly common and follow a practice that was established, by and large, during the first centuries A.D., reserved for a time to administrative and everyday written production and then extended to literary manuscripts [10,11]. They mostly derive from two antique conventional systems: notae antiquae, on the one hand, which proceed by suspension, superscript letters or tachygraphic signs, and mostly affect grammatical morphemes such as inflections, adverbs, prepositions, pronouns and the forms of the verb esse; Christian nomina sacra, on the other, abbreviations of holy names, by contraction [1]. Extended by Irish monks, with the addition of new (insular) signs such as ÷ (est), and then standardised and generalised by the Carolingian renovatio, this system forms the basis for abbreviation practices in medieval Latin manuscripts, but also for the abbreviations of many vernaculars. They were notably adapted to Old French by Anglo-Norman scribes [10]. Inherited from this history are abbreviations that can be categorised as:
– tachygraphic signs, e.g. the Tironian sign for et or the sign for cum, con-, com-, ...;
– superscript letters, e.g. a superscript a for ua or ra in qua or trans;
– suspension, e.g. est;
– contraction, e.g. Deus.

In the 12th and 13th centuries, the intellectual flourishing and the development of schools and universities caused a heavy demand for written artefacts. The copying of manuscripts expanded beyond the sole framework of monastic scriptoria and spread to the city, in professional workshops and lay scriptoria. The development of a larger literate milieu of students and masters, and the growth of book production, led to modifications in intellectual practices and ultimately in the processes of reading and writing. A switch, at least in scholastic milieus, from slow syllabic reading to faster, expert modes of reading, based on the global perception of each word, led to a very significant increase in word-level abbreviations and, specifically, abbreviations by contraction [10]. They display much variety and include:
– simple contraction, using letters from the original word (with a marker to make the presence of the abbreviation explicit), which can be relatively unambiguous (e.g. for ecclesia or ratione) or ambiguous (e.g. a form read as ita, illa or infra, or even sometimes prima or una, depending on the context);
– composite contraction, combining other conventional devices with the contraction itself, for instance for persona.

In Latin manuscripts, we already encounter a great versatility of signs and many homographic abbreviations (or alternative expansions), a situation that is made even more uncertain in vernacular manuscripts, due to spelling variation.

1.2 Expanding Abbreviations

Expanding abbreviations is not a trivial task, because there is no unambiguous character-, syllable- or word-level mapping between abbreviation and expansion: the same abbreviation can correspond to several expanded forms, and an expanded form can have several abbreviations. In addition, the same sign can fulfil alternative functions on different levels. Attempts to model the relationship between abbreviations and expanded forms exist on a theoretical level in linguistic research [16], but the situation remains complex. In graph-theory terms, the binary relation between the set of abbreviations and the set of expanded forms can be characterised as a many-to-many relation and not as a function. On an individual level, a sign can have:
– one character expansion, e.g. a sign expanding to et (word) or -et (word syllable) in Latin, while, on the contrary, the same sign expands to et, e or ed in Old French;
– multiple character expansions, e.g. a sign expanding to cum (word) and cum-, con-, com- (prefix) in both Latin and Old French.

The same is also true at the level of sequences of signs, which can have:
– one expanded form, e.g. a form expanding only to ratione (Latin);
– many expanded forms, e.g. a form expanding to ita, illa, infra, prima, una, ... (Latin).

The same sign can enter into different relations both in isolation and as part of groups. A good example of this plasticity is given by the common abbreviative mark known as ‘titulus’ or ‘tittle’: alone, it can be used to stand for a nasal consonant (-m- or -n-), while it is also the most common marker to indicate that a word is globally abbreviated, having, in that case, no explicit character value by itself, as in the aforementioned examples. The actual incidence of abbreviation polyvalence varies in time, between languages and language variants, as well as per type of document or text, and ultimately per scribe.
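Purely as an illustration of why this relation cannot be treated as a simple function, a context-free lookup table can only return candidate sets; the ASCII stand-ins for the signs below are invented, and the candidate sets merely echo the examples given above.

```python
# Hypothetical, context-free candidate table (signs rendered as ASCII stand-ins).
expansions = {
    "<cum-sign>": {"cum", "cum-", "con-", "com-"},             # one sign, several expansions
    "<ita-abbrev>": {"ita", "illa", "infra", "prima", "una"},  # one form, several words
}

def expand(sign):
    """Without a model of the context, only the set of candidates can be returned;
    choosing among them is exactly what the HTR or normalisation models must learn."""
    return expansions.get(sign, {sign})
```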

1.3 Computational Approaches

Handling abbreviations is a general problem with manuscripts, especially medieval manuscripts, but we find relatively few studies dealing with computational approaches to their expansion. Romero et al. report on recording both diplomatic transcriptions (with abbreviations) and normalised (expanded) transcriptions of Dutch medieval manuscripts, through the use of XML/TEI, but give results only for the first version [19]. The problem of homograph abbreviations and the versatility of signs seems to call for a representation of context. In practice, two main kinds of approaches have been used: HTR systems trained on normalised data, on the one hand; treating abbreviation expansion as a text normalisation task, on the other. HTR approaches can include some representation of token context, because state-of-the-art HTR systems usually take into account the full text line. From a pragmatic perspective, this makes the creation of ground truth easier, because it facilitates the reuse of existing transcriptions, and it has been investigated for this very reason; yet it has tended to show relatively high character error rates (CER), where ‘deletions’ (including letters that should have been added as part of abbreviation expansion) represent more than half of the errors [2,25]. Alternatively, normalisation can be treated as a separate (posterior) normalisation task, based on the output of the HTR phase. The literature concerning historical text normalisation is considerably larger, and includes approaches based on substitution lists and rules, as well as distance-based and statistical approaches (in particular, character-based neural machine translation) and, more recently, neural models [4]. To include a modelling of context, normalisation systems can reuse deep-learning architectures originally intended for neural machine translation [4,9] or lemmatisation [6,15]. In this paper, we explore two approaches to expanding abbreviations in Latin manuscripts, with and without post-processing. Evaluations are carried out with a small dataset in order to highlight the benefits of each approach within the scope of an under-resourced language.

2 MS BnF lat. 14525

For this work, we used MS BnF lat. 14525, the subject of an ongoing master's thesis [22]. It belonged to the library of Saint-Victor of Paris, a canonical abbey that played a central role in the intellectual life of Paris, especially during the 12th century, and was situated at the intersection between the monastic and Parisian worlds. BnF lat. 14525 was produced for the library of Saint-Victor in the first half of the 13th century. It is not made up of a single codicological unit, but was completed and improved about ten years later. This manuscript of 305 folios brings together various texts from different origins (Victorine, Cistercian, Parisian schools) and includes spiritual treatises, works on practical theology, numerous sermons, Constitutions of the Fourth Lateran Council or synodal constitutions, and even particular material and spiritual privileges. Despite this heterogeneity, it is a very useful item for a better understanding of Saint-Victor during this time. About ten different hands wrote it, though the handwriting is quite similar. We are dealing here with a script akin to scholastic writing, with a few broken stems, a reduced module, as well as numerous abbreviations.

Fig. 1. Folio 45v, MS BnF lat. 14525

We focus here on one of these hands, which occupies a quarter of the manuscript, i.e. 79 fols, copied in two columns of 42 ruled lines (the first of which also carries writing), most likely between 1215 and 1225, on parchment of relatively good quality despite some defects. The copy is neat, with few errors, corrections by expunctuation or crossing out, as well as some interlinear or marginal additions. Hyphens and dotted ‘i’s are also present, although irregularly. The writing itself is very regular and skilled, and the use of abbreviations is quite essential. The density of abbreviations, measured as the ratio of words with at least one abbreviation, is around 47%, not out of the ordinary for this type of manuscript, but much larger than what is found, for instance, for contemporary Old French epic manuscripts, with figures in the 10 to 20% range [5]. We find the systematic abbreviation of et under two forms. The copyist uses different tittles: mostly one for er, while another stands for nasals or signals suspensions or contractions; further signs are used for ur and for us. Superscript letters are also abundantly used, in particular a, o and i. This frequent use of abbreviations is typical of the kind of scholastic writing whose use was widespread from the mid-13th century in university circles. Images of the manuscript are available on Gallica as a grayscale digitisation of the microfilm [3].

3 Experiments and Results

The paper aims to compare two approaches to decipher Latin abbreviations, and HTR experiments have been carried out with two neural architectures. The first one (HTR-CB) is proposed by Kraken [13]. The results produced by this architecture serve as a baseline for pure character-level recognition of this manuscript and are provided to the other modules in the defined pipeline (see infra 3.3). The second architecture (HTR-WB) is an adaptation of that proposed on Calfa Vision [24], originally developed for the reading of medieval Armenian manuscripts and for the management of abbreviations and ideograms specific to that language, which cannot be recognised at the character level [23]. Character recognition is preceded by a word-based system, to which we first provide an exhaustive list of the abbreviations encountered in the manuscript. The results of this architecture serve as a point of comparison.

3.1 Ground Truth Creation

Training and testing data have been annotated with a layout analysis and a baseline model, and manually proofread via eScriptorium [12,14]. The dataset is composed of 1,861 lines of text (1,524 reserved for training, 168 for validation and 169 for testing). We built a total of four datasets:
1. D-exp, consisting of a transcription with full expansion of the abbreviations. We consider two variants, D-exp1 with inter-word spaces restored (separation of words according to Latin grammar and not the spacing present in the manuscript), and D-exp2 without spaces. These datasets are respectively composed of 34 and 33 classes. The number of classes has been limited so as to include enough samples in each. This can lead to strong intra-class inertia, due in particular to the unsystematic grouping of upper- and lower-case letters for the less populated classes.
2. D-abb, composed of transcriptions registering the abbreviation system used in this manuscript (see supra 1.1). We also consider two variants, D-abb1 with inter-word spaces and D-abb2 without spaces. These datasets are respectively composed of 60 and 59 classes. In detail, we have 36 classes representing alphanumeric characters and punctuation marks, 10 classes specific to Latin paleography to represent certain abbreviations according to the same scheme as the Oriflamms project (see infra), and 13 classes made up of combining signs (written above or below one or more letters). These last 13 classes can be difficult to identify for an HTR engine.


Moreover, additional data was used for some of the trainings:
1. Oriflamms: diplomatic and allographetic transcriptions provided by D. Stutzmann and team [7,18,20,21].
2. PL: 216 volumes of normalised editions from Migne's Patrologia latina [17].

3.2 HTR on Abbreviated and Expanded Data

Common parameters have been chosen for the training steps of HTR-CB and HTR-WB. Line images provided as input are resized to 64 px in height and have varying widths. No Unicode normalisation or data augmentation is applied. A repolygonisation has been performed to equalise the polygons of the first and last rows of each text column [24]. Due to the small size of the dataset and to avoid overfitting, training is carried out with a dynamic learning rate starting at 0.001 – to which a coefficient of 0.75 is applied every 10 epochs – and we use a batch of one image for each iteration. We have limited training to 30 epochs. First results are summarised in Table 1.

Table 1. Evaluation of the character-based and word-based HTR systems on datasets with and without expanded abbreviations

CER (%) | D-exp1 | D-exp2 | D-abb1 | D-abb2
HTR-CB | 10.59 | 9.69 | 4.89 | 4.55
HTR-WB | 3.76 | 2.96 | 5.57 | 4.83
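For reference, the dynamic learning-rate schedule described above (0.001, multiplied by 0.75 every 10 epochs) corresponds to a simple step decay; a sketch, not the authors' code:

```python
def step_decay_lr(epoch, base_lr=0.001, factor=0.75, step=10):
    """Learning rate applied at a given epoch: base_lr * factor ** (epoch // step)."""
    return base_lr * factor ** (epoch // step)

# epochs 0-9 -> 0.001, epochs 10-19 -> 0.00075, epochs 20-29 -> 0.0005625
```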

The two architectures give equivalent results on the two variants of the D-abb dataset. With identical initial parameters, we do not observe any significant difference, except that HTR-CB converges twice as fast as HTR-WB. A dynamic learning rate brings a real benefit for both architectures (the CER stagnates until epoch 10 and then gradually decreases). Epoch 30 is never reached. 20% of the errors of HTR-CB concern combining signs, but it generally recognises combined letters well. It also confuses close classes, such as abbreviation signs with the letters p or q, but the lack of data is a plausible explanation for this phenomenon. There is a clear benefit to using a word-based approach for the management of abbreviations directly within the HTR process. While it seems quite logical that the absence of spaces, an ambiguous notion in manuscripts, can really benefit text recognition, there is however only a marginal gain between D-abb1 and D-abb2. On the other hand, the HTR-WB model takes advantage of the lack of spaces. Most errors concern small and independent abbreviations, generally limited to one single character (e.g. the abbreviation for modo), but also long abbreviated words (e.g. the abbreviation for misericordia) for which we do not have enough samples in the training set (e.g. only one sample of misericordia). We give in Table 2 an example of predictions. For the rest of the paper, we consider the HTR-CB output, as shown in Table 2.

Table 2. Example of predictions with and without abbreviations.

3.3 Text Normalisation Approach

The modular text normalisation approach uses several consecutive steps, with specialised tools each necessitating a training of its own [6], as follows (Fig. 2):
1. HTR (see above);
2. word segmentation, using a deep-learning word segmentor, Boudams [8];
3. abbreviation expansion and word normalisation, using a deep-learning-based word annotator, Pie [15].

Fig. 2. Workflow for the full modular approach [6]. The digital image is segmented in lines that are submitted to recognition, and the recognised text then goes through word segmentation and normalisation (abbreviation expansion).

Word Segmentation. A neural word segmentor, Boudams, was trained using the best known configuration: a convolutional encoder without position embeddings (Embedding 256, KernelSize 5, Dropout 0.25, Layers 10) [8]. Three datasets were used: for abbreviated text, D-abb (with word separation and line hyphenations normalised), alone and with the addition of the Oriflamms data; for the normalised version, D-exp (with word separation and line endings normalised) with the addition of the PL data. Three models were trained for each setting, and the best of the three (based on the F-statistic) was selected. Results are shown in Table 3.

Table 3. Results for the training of the word segmentor Boudams

dataset | type | n-lines (train / dev / test) | F-s. | Prec. | Recl.
D-abb | abbr. | 1,962 / 196 / 200 | 95.1 | 96.1 | 94.1
D-abb+Oriflamms | abbr. | 17,139 / 196 / 200 | 97.3 | 97.4 | 97.1
D-exp+PL | norm. | 3,891,929 / 196 / 200 | 98.8 | 98.6 | 98.9
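The F-statistic reported in Table 3 is presumably the usual harmonic mean of precision and recall,

F = \frac{2 P R}{P + R},

which is consistent with the first row: 2 × 96.1 × 94.1 / (96.1 + 94.1) ≈ 95.1.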


Word Normalisation. For abbreviation expansion, the neural tagger Pie was trained [15], following a setup already used for Old French [6]. It was trained on an aligned version of the previous D-abb and D-exp (with word separation and line hyphenations normalised), alone and with the addition of the Oriflamms data. Results are shown in Table 4.

Table 4. Results for the training of Pie for abbreviation expansion (test accuracy, %)

dataset | all | known | unkn. | ambig. | unkn. targ.
D-abb/D-exp | 94.04 | 95.37 | 92.42 | 89.58 | 91.23
D-abb/D-exp+Oriflamms | 97.02 | 98.65 | 95.02 | 97.92 | 92.86

3.4 Results

The full pipeline was evaluated globally on the normalised transcription of one folio of the manuscript, with expanded abbreviations, normalised word segmentation and line hyphenations. The metrics used were the character error rate (CER) and the word error rate (WER), in the Python fastwer implementation [26]. Results are shown in Table 5.

Table 5. Results of the application of the full pipeline for a selection of the most accurate setups, with or without additional data

Setup | HTR data | HTR soft. | Segment. data | Segment. soft. | Norm. data | Norm. soft. | CER | WER
1a | D-exp1 | CB (HTR-CB) | – | – | – | – | 15.63 | 82.38
1b | D-exp1 | WB (HTR-WB) | – | – | – | – | 7.46 | 54.68
2a | D-exp2 | CB (HTR-CB) | D-exp1+PL | Boudams | – | – | 12.65 | 50.83
2b | D-exp2 | WB (HTR-WB) | D-exp1+PL | Boudams | – | – | 6.89 | 34.01
3a | D-abb2 | CB (HTR-CB) | D-abb1 | Boudams | D-abb/exp | Pie | 10.89 | 41.28
3b | D-abb2 | CB (HTR-CB) | D-abb1+Orifl. | Boudams | D-abb/exp+Orifl. | Pie | 8.60 | 31.81
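For reference, the CER/WER scores in Table 5 can be computed with the fastwer package cited above [26] along the following lines; the example strings are invented.

```python
import fastwer

hypotheses = ["in principio erat uerbum"]   # invented system output
references = ["in principio erat verbum"]   # invented ground truth

wer = fastwer.score(hypotheses, references)                   # word error rate, in %
cer = fastwer.score(hypotheses, references, char_level=True)  # character error rate, in %
print(f"WER = {wer:.2f}%  CER = {cer:.2f}%")
```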

The simple use of an HTR engine trained on a normalised transcription, with normalised spaces and expanded abbreviations, actually provides a strong baseline. This is particularly true of the word-based HTR, and in terms of character error rate. Yet, these baselines are outperformed by the more refined setups, either in terms of character error rate or of word error rate. The best scores (Table 5) are obtained, for the character error rate, by the HTR-WB trained on normalised transcriptions, whose output is then resegmented by a word segmentor trained on the Patrologia latina; and, for the word error rate, by the HTR-CB trained on abbreviated transcriptions, whose output is then resegmented and normalised with models trained using the additional Oriflamms data.

Table 6. Facsimile and ground truth with sample predictions of the three best performing setups (words with content or segmentation errors in bold), from the beginning of fol. 45v (see Fig. 1).

4 Discussion and Further Research

Results seem to confirm the importance of the word level rather than the character level, with word-based HTR outperforming character-based HTR, and with character-based HTR results being improved by adding tools dealing with word-level tokens and context, in particular for a small dataset. In this regard, artificial intelligence can be compared to human intelligence and seems to confirm the practice of global reading and global perception of words rather than individual letters, reflected in the use of abbreviations in Latin (especially scholastic) manuscripts. Future research should investigate differences with vernacular manuscripts, for instance literature in Old French, where reading and the use of abbreviations are supposed to have remained mostly syllabic. Another conclusion of this paper is that, given the nature of spacing in medieval manuscripts, the word error rate regarding normalised words can be greatly reduced by using a dedicated word segmentor, for which (normalised) training data can be easily collected (since all that is needed are normalised editions). In our experiments, this proved to be the best performing setup in terms of character error rate. Thirdly, a fully modular approach combining HTR on abbreviated data with a word segmentor and a text normalisation tool is the best performing in terms of word error rate. It would be possible to improve the results of this approach further, first by improving the word-based approach – which still suffers from a lack of data at the word level – and then the segmentation and normalisation steps with additional data; yet the limit of this approach is the availability of adequate training material, i.e., editions recording abbreviations, which are much harder to come by than normalised editions. Another future line of research should pursue the comparison with the vernacular, where the ambivalence of abbreviations, due to the variety of written norms and alternative spellings, should make the automated production of a normalised text more difficult. It should also investigate the impact of data augmentation techniques, which are particularly easy to apply for the word segmentation training and could, for instance, include random character substitutions so as to emulate the effects of HTR [8].

Datasets Availability. Datasets produced for this paper and evaluation scripts are available at DOI 10.5281/zenodo.5071963.

Acknowledgements. We thank the École nationale des chartes and the DIM STCN for the computing power and GPU server used for training, as well as INRIA and Calfa. We also thank Marc H. Smith for his keen review of our draft. Any remaining mistakes are only attributable to us.

References

1. Bischoff, B.: Paläographie des römischen Altertums und des abendländischen Mittelalters. Grundlagen der Germanistik, 4th edn. E. Schmidt, Berlin (2009)
2. Bluche, T., et al.: Preparatory KWS experiments for large-scale indexing of a vast medieval manuscript collection in the Himanis project. ICDAR 1, 311–316 (2017)
3. BnF: Petrus Pictaviensis, Tractatus de confessione (...). Latin 14525. In: Gallica. BnF (1997). https://gallica.bnf.fr/ark:/12148/btv1b9080806r/
4. Bollmann, M.: A large-scale comparison of historical text normalization systems. NAACL-HLT, pp. 3885–3898. arXiv:1904.02036 (2019). https://doi.org/10.18653/v1/N19-1389
5. Camps, J.B.: La ‘Chanson d'Otinel’: édition complète du corpus manuscrit et prolégomènes à l'édition critique. Thèse de doctorat, dir. Dominique Boutet, Paris-Sorbonne, Paris (2016). https://doi.org/10.5281/zenodo.1116735
6. Camps, J.B., Clérice, T., Pinche, A.: Stylometry for noisy medieval data: evaluating Paul Meyer's hagiographic hypothesis, December 2020. arXiv:2012.03845 (2020). http://arxiv.org/abs/2012.03845
7. Ceccherini, I.: Manuscrits datés (notices complètes). In: Stutzmann, D. (ed.) Github, Paris (2017). https://github.com/oriflamms/Dated-and-DatableManuscripts LIRIS
8. Clérice, T.: Evaluating deep learning methods for word segmentation of scripta continua texts in Old French and Latin. J. Data Min. Digit. Humanities (2020). https://doi.org/10.46298/jdmdh.5581
9. Gabay, S., Barrault, L.: Traduction automatique pour la normalisation du français du XVIIe siècle. In: Benzitoun, C., et al. (eds.) TALN 27, vol. 2, pp. 213–222. Nancy (2020). https://hal.archives-ouvertes.fr/hal-02784770
10. Hasenohr, G.: Abréviations et frontières de mots. Langue française 119, 24–29 (1998). https://doi.org/10.3406/lfr.1998.6257
11. Hasenohr, G.: Écrire en latin, écrire en roman: réflexions sur la pratique des abréviations dans les manuscrits français des XIIe et XIIIe siècles. In: Banniard, M. (ed.) Langages et peuples d'Europe: cristallisation des identités romanes et germaniques (VIIe-XIe siècle), pp. 79–110. Toulouse (2002)
12. Kiessling, B.: A modular region and text line layout analysis system. In: ICFHR, pp. 313–318 (2020). https://doi.org/10.1109/ICFHR2020.2020.00064
13. Kiessling, B., Miller, M.T., Maxim, G., Savant, S.B., et al.: Important new developments in arabographic optical character recognition (OCR). Al-ʿUṣūr al-Wusṭā 25, 1–13 (2017)
14. Kiessling, B., Tissot, R., Stokes, P., Stökl Ben Ezra, D.: eScriptorium: an open source platform for historical document analysis. In: ICDARW, vol. 2, pp. 19–24 (2019)
15. Manjavacas, E., Kádár, A., Kestemont, M.: Improving lemmatization of non-standard languages with joint learning. arXiv preprint arXiv:1903.06939 (2019)
16. Mazziotta, N.: Traiter les abréviations du français médiéval: théorie de l'écriture et pratiques d'encodage. Corpus 7, 1517 (2008). http://corpus.revues.org/1517
17. Migne, J.P. (ed.): Patrologiae cursus completus ... Series Latina. Apud Garnieri Fratres, editores et J.-P. Migne successores, Parisiis (1844)
18. Muzerelle, D., Bozzolo, C., Coq, D., Ornato, E.: Psautiers IMS. In: Stutzmann, D. (ed.) Github, Paris (2018). https://github.com/oriflamms/PsautierIMS
19. Romero, V., Toselli, A.H., Vidal, E., Sánchez, J.A., Alonso, C., Marqués, L.: Modern vs. diplomatic transcripts for historical handwritten text recognition. In: ICIAP, pp. 103–114 (2019)
20. Stutzmann, D.: Psautiers: transcriptions de différents manuscrits. Github, Paris (2018). https://github.com/oriflamms/PsautierIMS
21. Stutzmann, D.: Recueil des actes de l'abbaye de Fontenay. TELMA, Github, Paris (2018). https://github.com/oriflamms/Fontenay
22. Vernet, M.: Un manuscrit victorin au service de la pastorale du XIIIe siècle. Master's thesis, Université PSL, Paris (2021)
23. Vidal-Gorène, C., Decours-Perez, A.: A computational approach of Armenian paleography. In: Accepted for IWCP Workshop of ICDAR 2021 (2021)
24. Vidal-Gorène, C., Dupin, B., Decours-Perez, A., Riccioli, T.: A modular and automated annotation platform for handwritings: evaluation on under-resourced languages. In: Lladós, J., et al. (eds.) ICDAR 2021. LNCS, vol. 12823. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86334-0_33
25. Villegas, M., Toselli, A.H., Romero, V., Vidal, E.: Exploiting existing modern transcripts for historical handwritten text recognition. In: ICFHR, pp. 66–71 (2016)
26. Wang, C.: Fastwer (2020). https://github.com/kahne/fastwer, v0.1.3

Exploiting Insertion Symbols for Marginal Additions in the Recognition Process to Establish Reading Order

Daniel Stökl Ben Ezra, Bronson Brown-DeVost, and Pawel Jablonski

1 École pratique des hautes études (EPHE), PSL University, Paris, France
{daniel.stoekl,bronson.browndevost,pawel.jablonski}@ephe.psl.eu
2 Archéologie et philologie d'Orient et d'Occident (AOrOc), UMR 8546, CNRS, Université PSL (ENS, EPHE), Paris, France

Abstract. In modern and medieval manuscripts, a frequent phenomenon is additions or corrections that are marginal or interlinear with regard to the main text blocks. With the recent success of Long Short-Term Memory neural networks (LSTM) in handwritten text recognition (HTR) systems, many have chosen lines as the primary structural unit. Due to this approach, establishing the reading order for such additions becomes a non-trivial problem, because they must be inserted between words inside line units at undefined locations. Even a perfect reading order detection system, ordering all lines of a text correctly, would not be able to deal with inline insertions. The present paper proposes to include markers for the insertion points in the recognition training process; these markers can then teach the recognition models themselves to detect scribal insertion markers for marginal or interlinear additions.

Keywords: Symbols · Medieval manuscripts · HTR · Reading order · Hebrew

1 Introduction

1.1 The Problem

Among the special glyphs found in many medieval Hebrew manuscripts are insertion markers that indicate the location where marginal additions, i.e. writing adjacent to the main text blocks, are to be read. While such additions can have many semantic functions, e.g. additions to or corrections of the main text, commentaries, paraphrases, translations, and section indicators, the present paper focuses on those notes which have been written to correct the main text and are therefore central to the task of establishing the reading order of the main text. Even if the main text follows a simple Manhattan-style layout with rectangular columns and straight lines, marginal (and interlinear) additions greatly complicate the establishment of the reading order. This problem is particularly challenging for the current wave of successful LSTM-based HTR systems, whose basic object structure is often lines, since the corrections have to be inserted between some words or characters inside such a line object. Finding a solution to this problem is crucial due to the frequency of such additions in manuscripts and the implications for the reconstruction of the reading order of the text itself. Beyond the obvious approach of exploiting basic topological heuristics, this paper presents a possible path to automatically detecting at least some of the insertion points of such additions as part of the LSTM training. In other words, the task this paper addresses is the automatic detection of the insertion spot of the addition inside the character sequence of the main text line. It is limited to manuscripts with a single main text and not, e.g., a text surrounded by commentaries.

Reading order is generally a difficult and unsolved problem, which concerns an enormous number of items [1–6]. Newspapers and comic books can have highly complex reading orders [1,3,6]. For medieval manuscripts, reading order is particularly difficult because the fourth dimension, time, plays a greater role there than it does in modern print. The problem starts on the conceptual level. Even in the case of a single text on each manuscript page – and not, e.g., a text and its commentary or translation – one should rather speak of reading orders in the plural. Even the first “original” scribe might have modified his original reading order through the insertion of marginal and interlinear additions or through changes in the word order. A greater number of scribes usually entails a greater number of modifications to the original reading order, and each intervention represents a valid reading order that may be of interest to researchers. The problem in this particular corpus is therefore of a finer granularity than it is with regard to the more commonly researched newspapers or comic books, which deal with complete lines.

A frequent task in philology is the creation of a scholarly edition of a text that compares all manuscripts (and possibly secondary witnesses through citations). Such an edition is usually structured in a book-chapter-verse hierarchy. Without interlinear and marginal additions, it is trivial to convert a manuscript-page-zone-line hierarchy, as stored in a PageXML file, into a book-chapter-verse hierarchy. It is the insertions of interlinear and marginal additions inside lines that make such a transition considerably more difficult. We know of no system currently able to deal with this problem in an automatic way. Topological heuristics are one approach to the reading order detection problem [1,5,6]. Another possible technique uses trainable reading order [2–4]. Assuming all lines have been correctly detected, the generally accepted rules for reading order in Western and Middle Eastern literary texts in Manhattan-style layout are top-to-bottom, followed by the writing direction (leftwards or rightwards, depending on whether it is a Right-to-Left (RTL) or a Left-to-Right (LTR) script).1

While Poetic texts can be written stichographically with half verses appearing as if written in columns, we assume that lines in Poetic texts are correctly detected if they cover a complete such line connecting both ‘columns’.


The insertion spot of an interlinear addition is usually close to the x position of the first letter of the first word of the addition. Due to human imprecision, the precise spot may be one word before or after. Transcription-glyph alignment algorithms can convert this into positions in the transcription [7]. In Hebrew manuscripts, scribes usually indicated the intended insertion spot of a marginal addition by adding a small marker in the line, usually at its top, and by placing the addition close to the intended insertion spot, on the right or left side of the column at the same height. Calculating these positions is usually trivial, even though the association may be ambiguous between two lines if the scribe was not precise enough.
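As an illustration of the topological heuristics mentioned above, a naive ordering of main-text lines and association of a marginal addition with the nearest line by baseline height could look like the following sketch; the data structures are hypothetical and not those of eScriptorium or kraken.

```python
from dataclasses import dataclass

@dataclass
class Line:
    baseline_y: float   # vertical position of the baseline
    start_x: float      # horizontal starting point of the line
    text: str

def reading_order(lines, rtl=True):
    """Sort main-text lines top-to-bottom, then in the writing direction."""
    return sorted(lines, key=lambda l: (l.baseline_y, -l.start_x if rtl else l.start_x))

def closest_main_line(addition, main_lines):
    """Associate a marginal addition with the main-text line whose baseline height
    is closest; the association is ambiguous when two lines are almost equally close."""
    return min(main_lines, key=lambda l: abs(l.baseline_y - addition.baseline_y))
```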

1.2 Background

On the user-interface (UI) and HTR-engine level, we use the eScriptorium document analysis UI [8,9], interacting with the kraken HTR engine for the segmentation of lines and regions, including their semantic classification, as well as for transcription [10–12]. In addition to having an ergonomic interface for the manual transcription of LTR and RTL scripts, baseline and region annotation (including a semantic ontology), as well as trainable segmentation and transcription, eScriptorium has a powerful application programming interface (API) that allows interaction with outside tools. Our test corpus consists of literary Hebrew manuscripts from the first half of the second millennium. Our current central target corpus consists of 17 manuscripts of Tannaitic Rabbinic compositions (Mishnah, Tosefta, Mekhilta deRabbi Yishmael, Sifra, Sifre Numbers and Sifre Deuteronomy) with a total of more than 6,000 pages, usually with one or two columns as well as marginal and interlinear additions whose number and precise location vary greatly. The manuscripts are written in Ashkenazi (3), Byzantine (1), Italian (8), Oriental (1) and Sephardi (4) styles and date from the 10th to 15th centuries. Yet it is one thing to train a segmenter capable of detecting all lines, including the lines of interlinear and marginal additions, with a high accuracy, and another to know where to insert the text of such additions. For the final product we want to propose not only transcriptions that follow the layout of each manuscript, but also a text-oriented transcription that encodes the reading order, rather than a list of unconnected regions containing lists of lines.

2 Discussion

For the precise location of a marginal addition inside the line, scribes of Hebrew manuscripts frequently use special marks: circles, ‘v’-shapes or other small glyphs. During the process of preparing the necessary ground truth data for transcription training and training attempts, we noted that we can actually train the recognition model to learn such insertion points for at least some of the interlinear and marginal additions and to mark them directly in the transcription. For intercolumnar additions, for which the association with the left or right column is ambiguous, the detection of the insertion mark is crucial.


In a first experiment, we represented the insertion marks by a single glyph in the ground truth transcription used for the recognition training. Our test manuscript was ms Kaufmann A50 from the Library of the Hungarian Academy of Sciences with the Mishnah from 11th- or 12th-century Italy. The training material consisted of 29 pages in 2 columns, each of which usually has 26–27 lines, altogether 1541 lines with 41488 characters in the transcription ground truth. Apart from the 58 main text columns, there are 53 additional zones, 50 of which are marginal additions. For 39 marginal additions, the scribes indicated insertion spots in the main columns, which were marked with the glyph ‘→’ in the transcription ground truth. As the markers for the insertion spots are too rare to be easily learned, we used a three-step training process. On top of (1) a generalized basic model for Medieval Hebrew manuscripts, we trained (2) a manuscript-specific model on 26 pages (1380 lines). As a third step, we trained, on top of the manuscript-specific recognition model, (3) a model tailored to the recognition of insertion spots by restricting the training material to only the 39 lines that did include insertion markers.
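
As an illustration of the third step, the sketch below selects the marker-bearing lines from an exported list of (image, transcription) pairs. The data structure and the fine-tuning call are placeholders, not the actual eScriptorium/kraken interface used in the paper.

```python
# Hypothetical sketch of step (3): keep only ground-truth lines whose
# transcription contains the insertion-spot marker, so fine-tuning concentrates
# on the rare '→' glyph. `lines` is assumed to be a list of
# (image_path, transcription) pairs exported from the annotation environment.

MARKER = "→"

def insertion_marker_lines(lines):
    return [(img, txt) for img, txt in lines if MARKER in txt]

# marker_set = insertion_marker_lines(train_lines)
# fine_tune(manuscript_model, marker_set)   # engine-specific call, not shown here
```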

Fig. 1. Correctly recognized v-shaped insertion markers (indicated by ‘→’ in the transcription) in ms Kaufmann A50 of the Library of the Hungarian Academy of Sciences. The relevant manuscript line is indicated by the highlighted polygon. Below each manuscript image, the automatic transcription is shown (no manual correction).


The test corpus consists of 10 different pages from the same manuscript that contained 9 marginal additions with insertion marks. The best model, after only one epoch, recognized 8 out of 9 insertion marks. Figure 1 shows three true positives while Fig. 2 illustrates the one false negative. The system wrongly detected insertion marks at 9 additional places where none was present (see Fig. 3 for three examples of these false positives). While the precision may seem low (8 of the 17 detections, i.e. about 47%, were correct), the important positive result is the high recall of 88.89%, because it gives a character-precise insertion spot for the vast majority of marginal additions inside the text flow of the lines, i.e. we can reconstruct the reading order on a word level.

Fig. 2. Undetected insertion marker (false negative) in ms Kaufmann A50 of the Library of the Hungarian Academy of Sciences. Perhaps the ink, darker than that of the other insertion markers, is a reason for the miss (automatic transcription, no manual correction).

Fig. 3. Wrongly claimed insertion marker (false positive) in ms Kaufmann A50 of the Library of the Hungarian Academy of Sciences. The system mistook an additional supra-linear character (the letter “he” ( )) for an insertion marker (automatic transcription, no manual correction).


The low precision can easily be overcome by a simple topological test that rejects a detected insertion spot if no marginal region lies in its vicinity (a minimal sketch of such a test follows below). The probable reasons for 8 of the 9 false positives were: a) confusion with single interlinear added letters (4×) that look similar to insertion marks (see Fig. 3); b) confusion with small ornaments (1×), high dots (1×) or abbreviation marks (1×); c) confusion with remains of a descender from the previous line (1×).
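
The following minimal sketch illustrates such a test under stated assumptions (axis-aligned bounding boxes, a per-page list of marginal regions); it is our illustration, not the authors' implementation.

```python
# Keep a detected insertion mark only if some marginal region lies at roughly
# the same height as the line containing the mark. Boxes are (x0, y0, x1, y1).

def plausible_insertion_mark(line_box, marginal_boxes, max_gap=0):
    y0, y1 = line_box[1], line_box[3]
    for m in marginal_boxes:
        # vertical overlap (or near-overlap within max_gap pixels)
        if min(y1, m[3]) - max(y0, m[1]) >= -max_gap:
            return True
    return False

# Hypothetical usage over a list of detections, each carrying its line box:
# detections = [d for d in detections
#               if plausible_insertion_mark(d["line_box"], page_marginal_boxes)]
```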

Fig. 4. Correctly identified interlinear “waw” ( ) in parentheses before the last word (on the left) in ms Kaufmann A50 of the Library of the Hungarian Academy of Sciences.

The system also learned to detect some of the single letter interlinear additions if they occur frequently enough in the training data (see Fig. 4). We also ran a series of preliminary tests on another manuscript, ms. Parma 3259 de Rossi 139 with the Sifra. Figure 5 shows three examples of insertion spots that were detected automatically by the recognizer. In the training material, all insertion spots for additions longer than a single letter were marked with opening-plus-closing parentheses and the recognizer was able to learn this to a certain degree. The top example concerns a word inserted as interlinear addition without any further scribal mark. The middle and the lower example show the scribe using a circle and a v-shaped symbol to mark the insertion spots. Both interlinear and marginal additions were marked up with the same combination ‘()’ in the training material and not with an arrow as in the previous experiment. The training material for ms Parma consisted of 20 pages with a total of 1441 lines and 62 occurrences of the insertion marks ‘()’. This was then applied to the remaining 252 pages with a total of 11165 lines. The combination ‘()’ was detected 48 times.


Fig. 5. Three automatically transcribed lines (without manual correction) in which the insertion points for an interlinear (top) or marginal (middle and bottom) addition have been automatically detected by the recognizer. Ms. Parma 3259, de Rossi 139 (Sifra).

3 Conclusion

We have shown that including mark-up for symbols that indicate insertion spots of marginal additions in the ground-truth transcriptions used to train the recognizer can lead the system to learn them well. It is a simple procedure, but as far as we know it has never been suggested before, and it may be of great help in the complex process of reading order reconstruction in historical manuscripts. Acknowledgments. Images from manuscript Kaufmann A50 of the Library of the Hungarian Academy of Sciences, Budapest, by permission CC-BY-NC-SA.


References

1. Clausner, C., Pletschacher, S., Antonacopoulos, A.: The significance of reading order in document recognition and its evaluation. In: 12th International Conference on Document Analysis and Recognition (ICDAR), Washington DC, 2013, pp. 688–692. IEEE (2013)
2. Quirós, L., Vidal, E.: Learning to sort handwritten text lines in reading order through estimated binary order relations. In: 25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event/Milan, Italy, January 10–15, 2021, pp. 7661–7668. IEEE (2020)
3. Prasad, A., Déjean, H., Meunier, J.-L.: Versatile layout understanding via conjugate graph. In: 15th International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, September 20–25, 2019, pp. 287–294. IEEE (2019)
4. Malerba, D., Ceci, M., Berardi, M.: Machine learning for reading order detection in document image understanding. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition, Studies in Computational Intelligence, vol. 90, pp. 45–69. Springer, Berlin (2008). https://doi.org/10.1007/978-3-540-76280-5_3
5. Ferilli, S., Pazienza, A.: An abstract argumentation-based strategy for reading order detection. In: Proceedings of the 1st AI*IA Workshop on Intelligent Techniques At Libraries and Archives, co-located with the XIV Conference of the Italian Association for Artificial Intelligence, IT@LIA@AI*IA 2015, Ferrara, Italy, September 22, 2015. CEUR Workshop Proceedings 1509. http://ceur-ws.org/Vol-1509/ITALIA2015_paper_1.pdf
6. Kovanen, S., Aizawa, K.: A layered method for determining manga text bubble reading order. In: 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 4283–4287. https://doi.org/10.1109/ICIP.2015.7351614
7. Stökl Ben Ezra, D., Brown-DeVost, B., Dershowitz, N., Pechorin, A., Kiessling, B.: Transcription alignment for highly fragmentary historical manuscripts: the Dead Sea Scrolls. In: 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund (2020), pp. 361–366. IEEE (2020)
8. Kiessling, B., Tissot, R., Stökl Ben Ezra, D., Stokes, P.: eScriptorium: an open source platform for historical document analysis. In: Open Software Technologies (OST@ICDAR) 2019, pp. 19–24. IEEE (2019)
9. Stokes, P., Kiessling, B., Tissot, R., Stökl Ben Ezra, D.: EScripta: a new digital platform for the study of historical texts and writing. In: Digital Humanities, Utrecht 2019 (DH 2019)
10. Kiessling, B.: Kraken – a universal text recognizer for the humanities. In: Digital Humanities, Utrecht 2019 (DH 2019)
11. Kiessling, B., Stökl Ben Ezra, D., Miller, M.: BADAM: a public dataset for baseline detection in Arabic-script manuscripts. In: HIP@ICDAR 2019, Sydney (2019), pp. 13–18. ACM (2019)
12. Kiessling, B.: A modular region and text line layout analysis system. In: 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund (2020), pp. 313–318. IEEE (2020)

Neural Representation Learning for Scribal Hands of Linear B

Nikita Srivatsan1(B), Jason Vega3, Christina Skelton2, and Taylor Berg-Kirkpatrick3

1 Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
[email protected]
2 Center for Hellenic Studies, Harvard University, Cambridge, USA
[email protected]
3 Computer Science and Engineering, University of California, San Diego, USA
[email protected], [email protected]

Abstract. In this work, we present an investigation into the use of neural feature extraction in performing scribal hand analysis of the Linear B writing system. While prior work has demonstrated the usefulness of strategies such as phylogenetic systematics in tracing Linear B’s history, these approaches have relied on manually extracted features which can be very time consuming to define by hand. Instead we propose learning features using a fully unsupervised neural network that does not require any human annotation. Specifically our model assigns each glyph written by the same scribal hand a shared vector embedding to represent that author’s stylistic patterns, and each glyph representing the same syllabic sign a shared vector embedding to represent the identifying shape of that character. Thus the properties of each image in our dataset are represented as the combination of a scribe embedding and a sign embedding. We train this model using both a reconstructive loss governed by a decoder that seeks to reproduce glyphs from their corresponding embeddings, and a discriminative loss which measures the model’s ability to predict whether or not an embedding corresponds to a given image. Among the key contributions of this work we (1) present a new dataset of Linear B glyphs, annotated by scribal hand and sign type, (2) propose a neural model for disentangling properties of scribal hands from glyph shape, and (3) quantitatively evaluate the learned embeddings on findplace prediction and similarity to manually extracted features, showing improvements over simpler baseline methods.

Keywords: Linear B · Paleography · Representation learning

1 Introduction

Neural methods have seen much success in extracting high level information from visual representations of language. Many tasks from OCR, to handwriting recognition, to font manifold learning have benefited greatly from relying on


architectures imported from computer vision to learn complex features in an automated, end-to-end trainable manner. However much of the emphasis of this field of work has been on well-supported modern languages, with significantly less attention paid towards low-resource, and in particular, historical scripts. While self-supervised learning has seen success in these large scale settings, in this work we demonstrate its utility in a low data regime for an applied analysis task. Specifically, we propose a novel neural framework for script analysis and present results on scribal hand representation learning for Linear B. Linear B is a writing system that was used on Crete and the Greek mainland ca. 1400-1200 BCE. It was used to write Mycenaean, the earliest dialect of Greek, and was written on leaf or page-shaped clay tablets for accounting purposes. Linear B is a syllabary, with approximately 88 syllabic signs as well as a number of ideograms, which are used to indicate the commodity represented by adjacent numerals. The sites which have produced the most material to date are Knossos, Pylos, Thebes, and Mycenae (see Palmer [15] for more detail) but for the purposes of this paper, we focus specifically on Knossos and Pylos from which we collect a dataset of images of specific instances of glyphs written by 74 different scribal hands. Many signs exhibit slight variations depending on which scribe wrote them. Therefore, being able to uncover patterns in how different scribes may write the same character can inform us about the scribes’ potential connections with one another, and to an extent even the ways in which the writing system evolved over time. This is a process that has previously been performed by hand [18], but to which we believe neural methods can provide new insights. Rather than building representations of each scribe’s writing style based on the presence of sign variations as determined by a human annotator, we propose a novel neural model that disentangles features of glyphs relating to sign shape and scribal idiosyncrasies directly from the raw images. Our model accomplishes this by modeling each image using two separate real-valued vector embeddings— we share one of these embeddings across glyphs written by the same scribe, and the other across glyphs depicting the same sign, thereby encouraging them to capture relevant patterns. These embeddings are learned via three networks. The first is a decoder that attempts to reconstruct images of glyphs from their corresponding embeddings using a series of transpose convolutional layers. We penalize the reconstructed output based on its mean squared error (MSE) vs. the original. In addition, we also use two discriminator networks, which attempt to predict whether a given image-embedding pair actually correspond to one another or not, both for scribe and sign embeddings respectively. These losses are backpropagated into those original vectors to encourage them to capture distinguishing properties that we care about. This model has many potential downstream applications, as paleography has long been an integral part of the study of the Linear B texts [14]. For a handful of examples from prior work, Bennett’s study of the Pylos tablets [1] revealed the existence of a number of different scribal hands, all identified on the basis of handwriting, since scribes did not sign their work. This research was carried out even before the decipherment of Linear B. Later, Driessen [5]


used the archaic sign forms on Linear B material from the Room of the Chariot Tablets at Knossos, in conjunction with archaeological evidence, to argue that these tablets pre-dated the remainder of the Linear B tablets from Knossos. Skelton [18] and later Skelton and Firth [20] carried out a phylogenetic analysis of Linear B based on discrete paleographical characteristics of the sign forms, and were able to use this analysis to help date the remainder of the Linear B tablets from Knossos, whose relative and absolute dates had already been in question. Our hope is that leveraging the large capacity of neural architectures can eventually lead to further strides in these areas. Additionally, qualitative assessments of Linear B paleography have long been used to date tablets whose history or findplaces are uncertain. This has been vital in the case of the Knossos tablets, which were excavated before archaeological dating methods were well-developed [7], and the tablets from the Pylos megaron, which seem to date to an earlier time period than the rest of the palace [19]. While qualitative techniques have been fruitful, a key motivating question in our work is to what extent we can use quantitative methods to improve on these results. Going back further, Linear B was adapted from the earlier writing system Linear A, which remains undeciphered. There is a hundred-year gap between the last attestation of Linear A and the first attestation of Linear B. With a more thorough understanding of the evolution of these two writing systems, can we reconstruct the sign forms of the earliest Linear B, and elucidate the circumstances behind its invention? With this in mind, our approach explicitly incorporates a decoder network capable of reconstructing glyphs from learned vector representations, which we hope can eventually be useful for such investigation. In total, three writing systems related to Linear B remain undeciphered (Cretan Hieroglyphic, Linear A, and Cypro-Minoan), and a fourth (the Cypriot Syllabary) was used to write an unknown language. One motivating question is whether a better understanding of the evolution of sign forms would help contribute additional context that could make a decipherment possible. In particular, scholars are not yet in agreement over the sign list for Cypro-Minoan, and how many distinct languages and writing systems it may represent [6]. Understanding variation in sign forms is foundational for this work [14], and it makes sense to test our methods on a related known writing system (i.e. Linear B) before attempting a writing system where so much has yet to be established. Overall, these are just some of the applications that we hope the steps presented here can be directed towards.

2 Prior Work

One of our primary points of comparison is the work of Skelton [18], who manually defined various types of systematic stylistic variations in the way signs were written by different scribal hands, and used these features to analyze similarity between scribes via phylogenetic systematics. In a sense this work can serve as our human baseline, giving insight into what we might expect reasonable


Fig. 1. Binarized examples from our collected dataset. Note the stylistic differences particularly in the left and right most signs, and that we do not have an observation for the left most sign in the middle scribe’s hand.

output from our system to look like, and helping us gauge performance in an unsupervised domain without a clear notion of ground truth. While neural methods have not previously been used in analyzing this particular writing system, there is established literature on applying them to typography more broadly. For example, Srivatsan et al. [21] used a similar shared embedding structure to learn a manifold over font styles and perform font reconstruction. There has also been extensive work on writer identification from handwritten samples [2,4,17,25], as well as writer retrieval [3]. However these approaches rely on supervised discriminative training, in contrast to ours which is not explicitly trained on predicting authorship, but rather learning features.

3 Methods

In this section we describe the layout of our model and its training procedure. At a high level, our goal is to train it on our dataset of images of glyphs, and in doing so learn vector embeddings for each scribe that encode properties of their stylistic idiosyncrasies—for example which particular variations of each sign they preferred to use—while ignoring more surface level information such as the overall visual silhouettes of the signs we happen to observe written by them. To do this, we break the roles of modeling character shape and scribal stylistic tendencies into separate model parameters, as we now describe. Our model, depicted in Fig. 2, assigns each sign and each scribal hand a vector embedding to capture the structural and stylistic properties of glyphs in our data. These embeddings are shared by glyphs with the same sign or scribe, effectively factorizing our dataset into a set of vectors, one for each row and column. The loss is broken into two parts, a reconstruction loss and a discriminative loss. The reconstruction loss is computed by feeding a pair of sign and scribe embeddings to a decoder, which then outputs a reconstruction of the original glyph that those embeddings correspond to. A simple MSE is computed between this and the gold image. The discriminative cross-entropy loss is computed by the discriminator, a binary classifier which given an image of a glyph and a scribe embedding must predict whether or not the two match (i.e. did that scribe write that glyph). We train this on equally balanced positive and negative pairs. A similar discriminator


Fig. 2. Diagram of the model—each row is assigned a scribe embedding, and each column is assigned a sign embedding, which feed into a decoder and discriminator to collectively compute the loss function.

is applied to the sign embeddings. In our experiments, we ablate these three losses and examine the effects on performance. More formally, suppose we have a corpus of images of glyphs X, where X_i ∈ [0, 1]^(64×64) depicts one of J characters and is authored by one of K scribes. We define a set of vectors Y, where each Y_j ∈ R^d represents features of sign j. Similarly we define a set of vectors Z, where each Z_k ∈ R^d represents features of scribe k. Thus, a given X_i depicting sign j and authored by scribe k is represented at a high level in terms of its corresponding Y_j and Z_k, each of which is shared by other images corresponding to the same character or scribal hand respectively. For a given X_i, the decoder network takes in the corresponding pair of vectors Y_j and Z_k, and then passes them through a series of transpose convolutional layers to produce a reconstructed image X̂_i. This reconstruction incurs a loss on the model based on its MSE compared to the original image. Next, the scribal hand discriminator with parameters φ passes X_i through several convolutional layers to obtain a feature embedding which is then concatenated with the corresponding Z_k and passes through a series of fully connected layers, ultimately outputting a predicted p(Z_k | X_i; φ) which scores the likelihood of the scribal hand represented by Z_k being the true author of X_i. Similarly, a sign discriminator with parameters ψ predicts p(Y_j | X_i; ψ), i.e. the likelihood of X_i depicting the sign represented by Y_j. These discriminators are each shown one true match and one false match for an X_i at every training step, giving us a basic cross-entropy loss.


These components all together yield the following loss function for our model:

$$\mathcal{L}(X_i, Y_j, Z_k) = \|X_i - \hat{X}_i\|^2 + \lambda_1 \big(\log p(Y_j \mid X_i; \psi) + \log(1 - p(Y_{j'} \mid X_i; \psi))\big) + \lambda_2 \big(\log p(Z_k \mid X_i; \phi) + \log(1 - p(Z_{k'} \mid X_i; \phi))\big)$$

where $j'$ and $k'$ are randomly selected such that $j \neq j'$ and $k \neq k'$. The format of this model lends itself out of the box to various downstream tasks. For example, when a new clay tablet is discovered, it may be useful to determine if the author was a known scribal hand or someone completely new. The discriminator of our model can be used for exactly such a purpose, as it can be directly queried for the likelihood of a new glyph matching each existing scribe. Even if this method is not exact enough to provide a definitive answer, we can at least use the predictions to get a rough sense of who the most similar known scribes are, despite never having seen that tablet during training. Also, while our model treats the scribe and sign embeddings as parameters to be directly learned, one could easily imagine an extension in which embeddings are inferred from the images using an encoder network. Taking this further, we could even treat those embeddings as latent variables by adding a probabilistic prior, such as in Variational Autoencoders [9]. This would let us compute embeddings without having trained on that scribe's writing, taking the idea from the previous paragraph one step further, and letting us perform reconstructions of what we expect a full set of signs by that new scribe to look like.
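
The factorization and the combined loss can be summarized in a short PyTorch sketch. This is our reading of the description above, not the authors' released code: the decoder and discriminators are collapsed into small fully connected stand-ins, the embedding size of 16 follows Sect. 5.4, and all other layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScribalHandModel(nn.Module):
    """Factorized sign/scribe embeddings with a reconstruction loss and two
    discriminative losses (a sketch; layer sizes are assumptions)."""

    def __init__(self, n_signs, n_scribes, d=16):
        super().__init__()
        self.sign_emb = nn.Embedding(n_signs, d)      # Y_j, shared per sign
        self.scribe_emb = nn.Embedding(n_scribes, d)  # Z_k, shared per scribal hand
        self.decoder = nn.Sequential(                 # stand-in for the conv decoder
            nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, 64 * 64), nn.Sigmoid())
        self.sign_disc = nn.Sequential(               # stand-ins for the conv discriminators
            nn.Linear(64 * 64 + d, 256), nn.ReLU(), nn.Linear(256, 1))
        self.scribe_disc = nn.Sequential(
            nn.Linear(64 * 64 + d, 256), nn.ReLU(), nn.Linear(256, 1))

    def loss(self, x, j, k, j_neg, k_neg, lam1=1.0, lam2=1.0):
        flat = x.flatten(1)                           # (B, 64*64) binarized glyphs
        y, z = self.sign_emb(j), self.scribe_emb(k)
        x_hat = self.decoder(torch.cat([y, z], dim=-1))
        recon = F.mse_loss(x_hat, flat)

        def disc_loss(disc, pos, neg):
            # one true and one mismatched embedding per image, cross-entropy on both
            p_pos = disc(torch.cat([flat, pos], dim=-1))
            p_neg = disc(torch.cat([flat, neg], dim=-1))
            return (F.binary_cross_entropy_with_logits(p_pos, torch.ones_like(p_pos))
                    + F.binary_cross_entropy_with_logits(p_neg, torch.zeros_like(p_neg)))

        sign_term = disc_loss(self.sign_disc, y, self.sign_emb(j_neg))
        scribe_term = disc_loss(self.scribe_disc, z, self.scribe_emb(k_neg))
        return recon + lam1 * sign_term + lam2 * scribe_term
```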

3.1 Architecture

We now describe the architecture of our model in more detail. Let Fi represent a fully connected layer of size i, T represent a 2 × 2 transpose convolutional layer, C be a 3 × 3 convolution, I be an instance normalization layer [23], R be a ReLU, M be a 2 × 2 blur pool [24]. The decoder network is then F1024∗4∗4 → 4 × (T → 2 × (C → I → R)), where each convolutional layer C reduces the number of filters by half. The discriminator’s encoder network is then C → 4 × (3 × (C → R) → M ) → F16 , where each convolutional layer C increases the number of filters by double, except for the first which increases it to 64. Having encoded the input image, the discriminator concatenates the output of its encoder with the input scribe/sign embedding, and then passes the result through 7 fully connected layers to obtain the final output probability.
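
Read literally, the decoder recipe can be rendered in PyTorch roughly as follows. This is an interpretation of the written description rather than the authors' code; the final 1-channel projection and the output Sigmoid are our assumptions, and BlurPool (which belongs to the discriminator's encoder) is not shown.

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    # C -> I -> R: 3x3 convolution, instance normalization, ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.InstanceNorm2d(c_out),
                         nn.ReLU())

class Decoder(nn.Module):
    def __init__(self, d=16):                             # d follows Sect. 5.4
        super().__init__()
        self.fc = nn.Linear(2 * d, 1024 * 4 * 4)          # F_{1024*4*4}
        blocks, c = [], 1024
        for _ in range(4):                                 # 4 x (T -> 2 x (C -> I -> R))
            blocks.append(nn.ConvTranspose2d(c, c, 2, stride=2))   # 2x2 transpose conv
            blocks += [conv_block(c, c // 2), conv_block(c // 2, c // 4)]  # each C halves filters
            c //= 4
        self.blocks = nn.Sequential(*blocks)
        self.to_image = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())  # assumed output head

    def forward(self, yz):                                 # yz = concat(Y_j, Z_k)
        h = self.fc(yz).view(-1, 1024, 4, 4)
        return self.to_image(self.blocks(h))               # (B, 1, 64, 64)
```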

4 Data

4.1 Collection

There has been previous work on transcribing the tablets of Knossos by Olivier [11] and Pylos by Palaima [13]. These authors present their transcriptions in form of a rough table for each hand containing every example of each


Fig. 3. Visualization of the data pipeline. Glyphs are (1) manually cropped from photos of pages and labeled by sign and scribe, (2) preprocessed to 64 × 64 binary images, and (3) augmented to produce multiple variants, before being added to the overall corpus.

sign written by them. While these are an important contribution and certainly parseable by a human reader, they are not of a format readily acceptable by computer vision methods. Since our model considers the atomic datapoint to be an individual glyph, we must first crop out every glyph from each table associated with a scribal hand. This is a nontrivial task as the existing tables are not strictly grid aligned, and many glyphs overlap with one another, which requires us to manually paint out the encroaching neighbors for many images. Ultimately we were able to convert the existing transcriptions into a digital corpus of images of individual glyphs, labeled both by scribal hand as well as sign type. This process was carried out by two graduate students under the guidance of a domain expert. Before being fed into our model, images were also binarized via Otsu's thresholding [12]. The digitization of this dataset into a standardized, well-formatted corpus is of much potential scientific value. While this process was laborious, it opens the doors not just to our own work, but also to any other future research hoping to apply machine learning methods to this data. Of course, had this been a high-resource language such as English, we would have been able to make use of the multitude of available OCR toolkits to automate this preprocessing step. As such, one possible direction for future work is to train a system for glyph detection and segmentation, for which the corpus we have already created can serve as a source of supervision. This would be useful in more efficiently cropping similarly transcribed data, whether in Linear B or other related languages.

4.2 Augmentation

Due to the limited size of this corpus, there is a serious risk of models overfitting to spurious artifacts in the images rather than properties of the higher level structure of the glyphs. To mitigate this, we perform a data augmentation procedure to artificially inflate the number of glyphs by generating multiple variants


Fig. 4. Examples of the three types of data augmentation we perform. Each original image is augmented in all possible ways, resulting in an effective 27× increase in the size of our corpus.

of each original image that differ in ways we wish for our model to ignore. These are designed to mimic forms of variation that naturally occur in our data—for example, glyphs may not always be perfectly centered within the image due to human error during cropping, and strokes may appear thicker or thinner in some images based on the physical size of the sign being drawn relative to the stylus' width, or even as a result of inconsistencies in binarization during preprocessing. Following these observations, the specific types of augmentation we use include translation by 5 pixels in either direction horizontally and/or vertically, as well as eroding or dilating with a 2 × 2 pixel wide kernel (examples shown in Fig. 4). Altogether we produce 27 variations of each image. Doing this theoretically encourages our model to learn features that are invariant to these types of visual differences, and therefore more likely to capture the coarser stylistic properties we wish to study.
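
Because the augmentation is fully specified by these three operations, it can be re-implemented in a few lines. The sketch below is illustrative rather than the authors' script; border handling and data types are our assumptions.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def shift(img, dy, dx):
    """Translate a binary glyph image, padding with background (0)."""
    out = np.zeros_like(img)
    h, w = img.shape
    ys, xs = slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0))
    yd, xd = slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0))
    out[ys, xs] = img[yd, xd]
    return out

def augment(img, step=5):
    """Return the 27 variants: 3x3 translations times {identity, erode, dilate}."""
    kernel = np.ones((2, 2), dtype=bool)
    variants = []
    for dy in (-step, 0, step):
        for dx in (-step, 0, step):
            shifted = shift(img, dy, dx)
            for morph in (lambda a: a,
                          lambda a: binary_erosion(a, kernel),
                          lambda a: binary_dilation(a, kernel)):
                variants.append(morph(shifted).astype(img.dtype))
    return variants
```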

4.3 Statistics

Our dataset represents 88 Linear B sign types, written by 74 scribal hands (of these, 48 are from Knossos and 26 are from Pylos). For each scribal hand, we observe an average of 56 glyph images, some of which may represent the same sign. Conversely, each sign appears an average of 47 times across all scribal hands. This gives us a total of 4,134 images in our corpus, which after performing data augmentation increases to 111,618.

5 Experiments

Having trained our model, we need to be able to quantitatively evaluate the extent to which it has learned features reflective of the sorts of stylistic patterns we care about, something which is nontrivial given the lack of any supervised ground truth success criteria. There are two metrics we use to accomplish this.


First, given that we expect scribes of the same findplace to share stylistic commonalities, we train a findplace classifier based on our scribal hand embeddings to probe whether our model's embeddings contain information that is informative for this downstream task. Secondly, since we have human-generated scribal hand representations in the form of Skelton's manual features [18], we can measure the correlation between them and our model's learned manifold to see how well our embeddings align with human judgements of relevant features. We now describe these evaluation setups in more detail.

5.1 Findplace Classification

The first way we evaluate our model is by probing the learned embeddings to see how predictive they are of findplace, a metadata attribute not observed during training. Since our goal is to capture salient information about broad similarities between scribal hands, one coarse measure of our success at this is to inspect whether our model learns similar embeddings for scribes of the same findplace, as they would likely share similar stylistic properties. One reason for this underlying belief is that such scribes would have been contemporaries of each other, as can be inferred from the fact that existing Linear B tablets were written “days, weeks, or at most months” before the destruction of the palaces [14]. For any scribes working together at the same place and time, we observe similar writing styles, likely as a result of the way in which they would have been trained. The number of scribes attested at any one palatial site is quite low, which would seem to rule out a sort of formal scribal school as attested in contemporary Near Eastern sources, and instead suggests that scribal training took place as more of a sort of apprenticeship, or as a profession passed from parents to children, as is attested for various sorts of craftsmen in the Linear B archives [14]. Thus, we expect that scribes found at the same findplace would exhibit similar writing styles, and therefore measuring how predictive our model's embeddings are of findplace serves as a reasonable proxy for how well they encode stylistic patterns. To this end, we train a supervised logistic regression classifier (one per model) to predict the corresponding findplaces from the learned embeddings, and evaluate its performance on a held-out set. In principle, the more faithfully a model's embeddings capture distinguishing properties of the scribal hands, the more accurately the classifier can predict a scribe's findplace based solely on those neural features. We perform k = 5 fold cross-validation, and report average F1 score over the 5 train/test splits. Our final numbers reflect the best score over 15 random initializations of the classifier parameters for each split. Findplaces for the Knossos tablets were described in Melena and Firth [10], and for Pylos in Palaima [13]. However, there are some findplaces from which we only have one or two scribal hands in our corpus. As supervised classification is not practically possible in such cases, we exclude any findplaces corresponding to 2 or fewer scribal hands from our probing experiments. This leaves us with 63 scribal hands, labeled by 8 different findplaces to evaluate on.
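
The probing protocol can be approximated with scikit-learn as follows. This is our reading of the setup, not the authors' evaluation script: it uses plain (unstratified) 5-fold cross-validation, a default solver instead of the SGD training with 15 restarts described in Sect. 5.4, and macro-averaged F1, none of which are confirmed by the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

def probe_findplace(Z, findplaces, n_splits=5, seed=0):
    """Average F1 of a logistic-regression findplace classifier over k folds.
    Z: [n_scribes, d] learned scribe embeddings; findplaces: [n_scribes] labels."""
    Z, findplaces = np.asarray(Z), np.asarray(findplaces)
    scores = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(Z):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(Z[train_idx], findplaces[train_idx])
        pred = clf.predict(Z[test_idx])
        scores.append(f1_score(findplaces[test_idx], pred, average="macro"))
    return float(np.mean(scores))
```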

5.2 QVEC

While findplace prediction is a useful downstream metric, we also wish to measure how well the neural features our model learns correlate with those extracted by expert humans. In order to do this, we use QVEC [22]—a metric originally developed for comparing neural word vectors with manual linguistic features—to score the similarity between our learned scribal hand embeddings and the manual features of Skelton [18]. At a high level, QVEC works by finding the optimal one-to-many alignment between the dimensions of a neural embedding space and the dimensions of a manual feature representation, and then for each of the aligned neural and manual dimensions, summing their Pearson correlation coefficient with respect to the vocabulary (or in our case the scribal hands). Therefore, this metric provides an intuitive way for us to quantitatively evaluate the extent to which our learned neural features correlate with those extracted by human experts. As a side note, while the original QVEC algorithm assumes a one-to-many alignment between neural dimensions and manual features, we implement it as a many-to-one, owing to the fact that in our setting we have significantly more manual features than dimensions in our learned embeddings, implying that each neural dimension is potentially responsible for capturing information about multiple manual features rather than vice versa as is the case in the original setting of Tsvetkov et al. [22].
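
A simplified version of this many-to-one score can be written directly in NumPy. The sketch below aligns every manual feature with its best-correlated neural dimension and sums the resulting Pearson correlations; it is our reading of the metric, not the authors' or Tsvetkov et al.'s implementation.

```python
import numpy as np

def qvec_many_to_one(neural, manual):
    """neural: [n_scribes, d] learned embeddings; manual: [n_scribes, m] features.
    Returns the sum, over manual features, of the best Pearson correlation
    achievable with any single neural dimension."""
    n = (neural - neural.mean(0)) / (neural.std(0) + 1e-8)   # standardize columns
    m = (manual - manual.mean(0)) / (manual.std(0) + 1e-8)
    corr = (m.T @ n) / len(neural)        # [m_features, d] Pearson correlations
    return float(corr.max(axis=1).sum())  # best-aligned neural dim per manual feature
```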

5.3 Baselines

Our primary baseline is a naive autoencoder that assumes a single separate embedding for every image in the dataset, therefore not sharing embeddings across signs or scribal hands. Its encoder layout is the same as that of our discriminator, and its decoder is identical to that of our main model so as to reduce differences due to architecture capacity. In order to obtain an embedding for a scribal hand using this autoencoder, we simply average the embeddings of all glyphs produced by that scribe. We report several ablations of our model, indicating the presence of the reconstruction loss, scribal hand discriminator loss, and sign discriminator loss by +Recon, +Scribe, and +Sign respectively. Finally, we provide the F1 score for a naive model that always predicts the most common class to indicate the lowest possible performance.

5.4 Training Details

We use sign and scribe embeddings of size 16 each for all experiments. Our model is trained with the Adam [8] optimization algorithm at a learning rate of 10^-4 and a batch size of 25. It is implemented in PyTorch [16] version 1.8.1 and trains on a single NVIDIA 2080ti in roughly one day. The classifier used for evaluation is trained via SGD at a learning rate of 10^-3 and a batch size of 15.


Table 1. (Left) Best cross validated F1 achieved by logistic regression findplace classifier on embeddings produced by ablations of our model, compared to an autoencoder, and naive most common label baseline. (Right) QVEC [22] score between models’ embeddings and manual features of Skelton [18].

Model                      Findplace classifier F1   QVEC similarity to manual
Most common                0.154                     –
Autoencoder                0.234                     54.1
–Recon, +Scribe, +Sign     0.198                     53.2
+Recon, –Scribe, –Sign     0.234                     51.1
+Recon, +Scribe, –Sign     0.206                     52.9
+Recon, +Scribe, +Sign     0.292                     57.0

6 Results

Table 1 shows results of a findplace classifier trained on our system’s output vs. that of an autoencoder, with the score for a most common baseline as a lower bound. We see that our full model with all three losses achieves the highest score, with reduced performance as they are successively removed. The worst score comes from the model trained on purely discriminative losses with no decoder, suggesting that the reconstruction loss provides important bias to prevent overfitting given the limited dataset size. This also makes sense given the strong performance by the vanilla autoencoder. For reference, Skelton [18] provides manual feature representations for 20 of the scribal hands in our corpus (17 of which are from the 6 findplaces shared by at least one other scribe in that set). If we perform a similar probing experiment with the hand crafted features for those 17 scribes, we get an F1 score of 0.37. While this result is useful as an indicator of where the upper bound on this task may be, it is very important to note that this experiment is only possible on a small subset of the scribes we evaluate our model on and is therefore not directly comparable to the results in Table 1. A more direct comparison to the neural features of our model would require extracting manual features for the remaining 54 scribal hands, which is beyond the scope of this work. Table 1 also shows the QVEC similarity between each of the models’ learned embeddings and the manual features of Skelton [18]. Once again, our system ranks highest, with the autoencoder also performing more strongly than our ablations. Interestingly, we see here that our discriminative losses appear more important than the reconstructive loss, suggesting that by encouraging the model to focus on features that are highly distinguishing between scribes or signs we may end up recovering more of the same information as humans would. Figure 5 shows example reconstructions from our decoder. These reconstructions are based solely on the scribe and sign representations for the corresponding row and column. We see that the model is able to realistically reconstruct the


glyphs, showing that it is very much capable of learning shared representations for sign shape and scribal hand styles, which can then be recombined to render back the original images. This would imply that our embedding space is able to adequately fit the corpus, and perhaps requires some additional inductive bias in order to encourage those features to be more high level. There are also additional downstream tasks we have not yet evaluated on that may be worth attempting with our system. The most obvious is that of phylogeny reconstruction. Now that we have trained scribal hand embeddings, it remains to be seen how closely a phylogenetic reconstruction based on these features would resemble that found by Skelton [18]. This may give us a better idea as to whether the embeddings we have computed are informative as to the broader hierarchical patterns in the evolution of Linear B as a writing system. (In Fig. 5, columns correspond to Linear B signs and rows to scribal hands.)

Fig. 5. Example reconstructions from our model for various scribal hands. The model manages to capture subtle stylistic differences, but we see a failure mode in the middle row right column where the output appears as a superposition of two images.

7 Conclusion

In this paper we introduced a novel approach to learning representations of scribal hands in Linear B using a new neural model which we described. To that end, we digitized a corpus of glyphs, creating a more standardized dataset more readily amenable to machine learning methods going forwards. We benchmarked the performance of this approach compared to simple baselines and ablations by quantitatively measuring how predictive the learned embeddings were of held out metadata regarding their findplaces. In addition, we directly evaluated the similarity of our scribal hand representations to those from existing manual features. We believe this work leaves many open avenues for future research, both with regard to the model design itself, as well as to further tasks it can be applied towards. For example, one of the key applications of manual features in the past has been phylogenetic analysis, something which our neural features could potentially augment for further improvements. This type of downstream work can further be used in service of dating Linear B tablets, understanding the patterns in the writing system’s evolution, building OCR systems, or even analyzing other related but as of yet undeciphered scripts.


References

1. Bennett Jr., E.L.: The Minoan Linear Script from Pylos. University of Cincinnati, Cincinnati (1947)
2. Bulacu, M., Schomaker, L.: Text-independent writer identification and verification using textural and allographic features. IEEE Trans. Pattern Anal. Mach. Intell. 29(4), 701–717 (2007)
3. Christlein, V., Nicolaou, A., Seuret, M., Stutzmann, D., Maier, A.: ICDAR 2019 competition on image retrieval for historical handwritten documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1505–1509. IEEE (2019)
4. Djeddi, C., Al-Maadeed, S., Siddiqi, I., Abdeljalil, G., He, S., Akbari, Y.: ICFHR 2018 competition on multi-script writer identification. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 506–510. IEEE (2018)
5. Driessen, J.: The Scribes of the Room of the Chariot Tablets at Knossos: Interdisciplinary Approach to the Study of a Linear B Deposit. Ediciones Universidad de Salamanca (2000)
6. Ferrara, S.: Cypro-Minoan Inscriptions: Volume 1: Analysis, vol. 1. Oxford University Press, Oxford (2012)
7. Firth, R.J., Skelton, C., et al.: A Study of the Scribal Hands of Knossos Based on Phylogenetic Methods and Find-Place Analysis, pp. 159–188. Ediciones Universidad de Salamanca (2016)
8. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. ICLR (2015)
9. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. ICLR (2014)
10. Melena, J.L., Firth, R.J.: The Knossos Tablets. INSTAP Academic Press (Institute for Aegean Prehistory), Philadelphia (2019)
11. Olivier, J.P.: Les scribes de Cnossos: essai de classement des archives d'un palais mycénien (1965)
12. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
13. Palaima, T.: The Scribes of Pylos. Incunabula Graeca, Edizioni dell'Ateneo, Roma (1988)
14. Palaima, T.G.: Scribes, Scribal Hands and Palaeography, pp. 33–136. Peeters, Louvain-la-Neuve, Walpole (2011)
15. Palmer, R.: How to Begin? An Introduction to Linear B Conventions and Resources, pp. 25–68. Peeters, Louvain-la-Neuve, Walpole (2008)
16. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop (2017)
17. Siddiqi, I., Vincent, N.: Text independent writer recognition using redundant writing patterns with contour-based orientation and curvature features. Pattern Recogn. 43(11), 3853–3865 (2010)
18. Skelton, C.: Methods of using phylogenetic systematics to reconstruct the history of the Linear B script. Archaeometry 50(1), 158–176 (2008)
19. Skelton, C.: A look at early Mycenaean textile administration in the Pylos megaron tablets. Kadmos 50(1), 101–121 (2011)
20. Skelton, C., Firth, R.J., et al.: A Study of the Scribal Hands of Knossos Based on Phylogenetic Methods and Find-Place Analysis. Part III: Dating the Knossos Tablets Using Phylogenetic Methods, pp. 215–228. Ediciones Universidad de Salamanca (2016)


21. Srivatsan, N., Barron, J., Klein, D., Berg-Kirkpatrick, T.: A deep factorization of style and structure in fonts. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2195–2205. Association for Computational Linguistics, Hong Kong, China, November 2019. https://doi.org/10.18653/v1/D19-1225, https://www.aclweb.org/anthology/D19-1225
22. Tsvetkov, Y., Faruqui, M., Ling, W., Lample, G., Dyer, C.: Evaluation of word vector representations by subspace alignment. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2049–2054 (2015)
23. Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016)
24. Zhang, R.: Making convolutional networks shift-invariant again. In: International Conference on Machine Learning, pp. 7324–7334. PMLR (2019)
25. Zhang, X.Y., Xie, G.S., Liu, C.L., Bengio, Y.: End-to-end online writer identification with recurrent neural network. IEEE Trans. Hum.-Mach. Syst. 47(2), 285–292 (2016)

READ for Solving Manuscript Riddles: A Preliminary Study of the Manuscripts of the 3rd Ṣaṭka of the Jayadrathayāmala

Olga Serbaeva1(B) and Stephen White2

1 University of Zurich, Zürich, Switzerland
[email protected]
2 Ca' Foscari University of Venice, Venezia, Italy
[email protected]

Abstract. This is a part of an in-depth study of a set of the manuscripts related to the Jayadrathayāmala. Taking JY.3.9 as a test-chapter, a comparative paleography analysis of the 11 manuscripts was made within the READ software framework. The workflow within READ minimized the effort needed to make a few important discoveries (manuscripts containing more than one script, identification of the manuscripts potentially written by the same person) as well as to create an overview of the shift from Nāgarī to Newārī and, finally, to Devanāgarī scripts within the history of manuscript transmission of a single chapter. Exploratory statistical analysis in R of the syllable frequency in each manuscript, based on the paleography analysis export from READ, helped to establish that there are potentially two lines of manuscript transmission of JY.3.9.

Keywords: Sanskrit manuscripts · Indic scripts · Paleography · Virtual research environment

1 Introduction

1.1 Brief History of the Manuscript Transmission of the Jayadrathayāmala

The Jayadrathayāmala (further JY) is a text compiled in Northern India around the end of the 9th–10th century, and it was cited on multiple occasions by the Kashmiri polymath Abhinavagupta in his Tantrāloka. However, none of the manuscripts from Kashmir seem to have reached academia. The manuscript transmission is almost exclusively Nepalese: the Nepal-German Manuscript Preservation Project (NGMPP) lists about 30 Jayadrathayāmala-related codices. Prof. Dr. Diwakar Acharya, University of Oxford1, suggested that the oldest existing manuscript of this text comprising all four books, palm-leaf JY B here, was written in Vārāṇasī around the time of the reign of the king Jayachandra (1170–1194). Theoretically, all Nepalese transmissions of the JY could stem from this single palm-leaf.

Pers. comm. 05.04.2021.


1.2 Aim and Methods of the Present Study

The aim of the present study is to verify whether all Nepalese transmissions indeed could have come from this single source, or whether the paleographic and statistical facts would allow any other representation of the history of transmission of the JY. A secondary aim of this phase of the study is to test whether the paleography features of the Research Environment for Ancient Documents (READ) can be helpful for resolving the above-named issue. The preliminary results of this phase are based on a single chapter of the 3rd book, namely JY.3.9, dealing with an invocation and the worship of the goddess Pratyaṅgirā. This is a short chapter, comprising some 73 verses, and it is situated towards the middle of the text, i.e. the script should be much less calligraphic than on the initial folios of a book. The chosen chapter survives in 8 manuscripts of the JY (A, B, C, D, E, G, I, K) and in 3 manuscripts of the text called the Jayadrathayāmalamantroddhāraṭippanī (further JYMUṬ A, B, O). The latter is an independent compilation of JY materials, which often includes whole chapters. The overall methodology can be divided into two procedures: READ2 paleography analysis and R statistical analysis. We use READ to segment and group the individual characters of the text, as well as to characterize the handwriting through side-by-side visual inspection of each set of characters across the different transmissions, using READ's generic tagging/classification feature. We use the export of these results as the input to the R-based statistical analysis. Having mapped the syllable frequency within each manuscript, we cluster the manuscripts by the proximity of their particular syllabic portraits. This should point to the number of independent lines of transmission.

2 Procedure

2.1 READ as a Virtual Research Environment (VRE) for Working with Manuscript Materials

Producing a curated digital edition is at the center of READ's design, along with a data model in which heavily interlinked data is used to support the variety of workflows for the different language types. READ has projects in (1) syllabic Indic languages, (2) alphabetic Latin, Greek, Hebrew, and (3) logo-syllabic Mayan. The workflow consists of adding a manuscript image (via upload or URL), marking segments on the manuscript image, adding a transcript, linking the transcript and the marked segments together, and visually inspecting and classifying characters using the paleography tool. READ is centered around the concept of a transcription (interpretation) encoded to match the original markings on the artifact. Markings are annotated by segments (boundaries). The linkage between segments and the encoding of each letter/syllable/glyph of the document is preserved throughout the workflow (Fig. 1).

2 READsoftware 2021: https://github.com/readsoftware/read/wiki.


Fig. 1. Direct links between Paleography report (right), Document (top) and Edition (bottom) for the syllable “ma” of JY.3.9 K selected. Manuscript image: Digitale Sammlungen, Staatsbibliothek zu Berlin

2.2 Workflow in READ

Having obtained the manuscript images, one should find out if there are any missing folios or wrongly numbered pages, and work out the manuscript structure by assigning coherent filenames to the images of each side of the folio. For importing transcriptions, READ has a data/constraint driven parser to validate the transcription before creating the highly linked data model. One should ensure the transcription adheres to the defined encoding constraints for the language variation used (here Sanskrit; for example, transcription 1 will validate, while transcription 2 will not). During import READ will identify deviations, which should be adjusted in the READ constraint data or changed in the transcription before import. This constitutes one of the three transcription coherency verification flowlets built into READ (Fig. 2).

Fig. 2. Various transcription standards of the same word, of which one is in the data model, and the other is not.

Having imported the document and uploaded the image(s), segmentation of every glyph (letter/syllable and punctuation mark) necessary for the research is completed by a semi-automated linking process. For manual segmentation, one should expect to spend about 10 min for marking up 500 glyphs. The second stage of built in verification happens during the process of linking. When hovering over letter/syllable/word in the transcription with the mouse, the corresponding glyph(s) highlight on the image. Having located any misalignments, one should correct the edition or relink.


The third and final level of verification comes when one works with the Paleography table. It is a grid where each cell represents the set of a single type of glyph found in the document. The exact number of rows and columns in the table is defined by the type of language. For example, for Greek one will only see the alphabet in one column; for Sanskrit, a consonant-vs-vowel matrix will be generated (Fig. 3).

Fig. 3. Paleography report: partial grid showing consonants as rows and vowels as columns, JY.3.9 B. Digitally modified segments of the manuscript JY B, copy obtained via NGMPP.

READ shows one example of the glyphs for a given cell; clicking on it will open a presentation of all linked glyphs. In this side-by-side presentation one will immediately see any intruders (clicking on the intruder identifies its location in the document). Thanks to this triple coherency check, close to 100% correctness of reading can be achieved. This level of precision, as well as the preserved links between segments and glyphs, constitutes a solid basis for further analysis (Fig. 4).

Fig. 4. Paleography for Newārī “a” of BT-1 type by order, JY.3.9 A. Digitally modified segments of the manuscript JY A, copy obtained via NGMPP.

In this side-by-side presentation, all segments can be tagged with one of several values in each of up to 3 categories (work in progress to extend this to 6 categories); a segment belongs to “unknown” until it is classified. All categories are independent from one another, thus one has a sufficient number (10) of values per category (3) to mark any desired glyph particularities.

3 Selected Discoveries

3.1 Manuscripts with Multiple Scripts/Letter Variants

While checking each letter in the paleography tables, some important manuscript characteristics became clear. JY B and JYMUṬ B within a single chapter and


often within a single word mix two different scripts. For JYMUṬ B there is a switch from Devanāgarī to a script that is probably native to the scribe, namely, the Newārī script (the distribution of the two is about 50-50). As for JY B, going through the paleography report glyph by glyph made it clear that under the same syllables (consonant plus “e” or “o”) there were two very different variations of glyphs (Fig. 5). The actual script is yet to be determined, but it has close similarities with the North-Indian variety of Nāgarī. The documents written in Nāgarī between the 10th and the 12th centuries, besides having similar shapes of glyphs, demonstrate the same variations concerning the writing of the consonant combinations with “e” and “o”3. This fascinating script inconsistency can point to a combination of two different sub-types of the Nāgarī script attested within a single manuscript. If the same incoherence is found in the other three ṣaṭkas of the palm-leaf manuscript of JY B, it would argue for the idea of Diwakar Acharya that all palm-leaf manuscripts of the JY indeed constitute a set belonging to the same period and were probably written by the same scribe.

Fig. 5. Variation of “me” in JY.3.9 B made apparent in READ. In the close-up vowel sign “e” is highlighted in red for each variant. Digitally modified segments of the manuscript JY B, copy obtained via NGMPP.

3.2 Identification of the Same or Very Similar Handwriting

By simply opening up two paleography reports side by side in READ for each combination of two of the selected 11 manuscripts, it became apparent that JY D and JYMUṬ O, separated in time by some 22 years according to their dating, are most probably written by the same person. This initial impression formed in READ was further confirmed by the similarity of the syllabic portraits of the two in the complete paleography set comprising about 690 different syllables (Fig. 6).

3.3 Overview of the JY Scripts from the 12th to the 20th Centuries

Exporting the encoded and categorised segments from READ has allowed the creation of a compiled overview table, where the manuscripts are arranged

Based on Gupta [n.d.]. See also Fig. 5 here.

344

O. Serbaeva and S. White

Fig. 6. Paleography of JY D and JYMUT . O side-by-side. Digitally modified segments of the manuscripts, copies obtained via NGMPP.

according to their dates (when known). The table also represents the main differences between (1.) N¯agar¯ı, (2.) New¯ar¯ı and (3.) Devan¯ agar¯ı scripts in their historical transition from 1 to 3 based on the 11 manuscript variants of the same chapter of the JY (Fig. 7). 3.4

Exploratory Statistical Analysis in R of the Data Produced by READ Paleography Report

It is still impossible to cluster the manuscripts based on the varieties of handwriting, but we are in the process of designing a methodology that uses Computer Vision (the present data set, JY.3.9, will serve as training data) to process all 201 chapters of the JY. However, even the paleography report constitutes a solid basis for statistical analysis in R code adapted for the project.⁴ For example, taking just the file names of each exported segment (of the type bha bt1 63r4.10.png, where the syllable "bha", classified under "bt1", appears on folio "63r", line "4", syllable number "10"), we first generated a syllabic portrait of each manuscript, i.e. how many times a syllable appears in a given manuscript. From that the mean frequency was calculated, and based on the difference from that mean, we found that a set of just 59 syllables (punctuation excluded) was sufficient to describe the major differences between the 11 manuscripts for the same chapter. These include the preferences for single and duplicated letters (sarva versus sarvva), for rendering the nasals (Pratyaṅgirā versus Pratyaṃgirā⁵), and the final sandhi, i.e. the letters at the end of the word.
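To make the filename-based syllabic portrait concrete, the following is a minimal Python sketch. It is not the project's actual R pipeline (which uses quanteda's dfm_weight and pheatmap); the function names, whitespace-based parsing convention, and the deviation threshold are illustrative assumptions only.

# Illustrative sketch: build a syllabic portrait per manuscript from exported
# segment file names of the form "bha bt1 63r4.10.png" (syllable, category,
# folio+line, syllable index), then pick syllables that deviate from the mean.
from collections import Counter
from pathlib import Path

def syllabic_portrait(segment_dir: str) -> Counter:
    """Count how many times each syllable occurs in one manuscript's export."""
    counts = Counter()
    for png in Path(segment_dir).glob("*.png"):
        syllable = png.stem.split()[0]   # first token is the syllable, e.g. "bha"
        counts[syllable] += 1
    return counts

def differentiating_syllables(portraits: dict, threshold: float = 2.0):
    """Keep syllables whose frequency deviates strongly from the cross-manuscript mean."""
    all_syllables = set().union(*portraits.values())
    selected = []
    for s in all_syllables:
        freqs = [p.get(s, 0) for p in portraits.values()]
        mean = sum(freqs) / len(freqs)
        if any(abs(f - mean) >= threshold for f in freqs):
            selected.append(s)
    return selected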

⁴ For clustering, the function dfm_weight of the R library quanteda was used; visualisation with pheatmap. We thank Dmitry Serbaev, Omsk, Russia, for his expert advice concerning R.
⁵ For the historical changes in Sanskrit orthography in close comparison to Middle-Indic languages see Edgerton 1946:199, 202–203.


Fig. 7. Selected digitally modified segments (all copies obtained via NGMPP, except for JY K, belonging to Staatsbibliothek zu Berlin) from the 11 chosen manuscripts, demonstrating the instability of scripts in JY B (switch from Nāgarī to Newārī) and JYMUṬ A and B (from Newārī to Devanāgarī), as well as the outline of the change from Nāgarī to Newārī and to Devanāgarī script in the process of transmission of the JY.3. JYMUṬ A contains Newārī characters only occasionally.

Fig. 8. The heatmap above visualises the clusters of the manuscripts by the closeness of their syllabic portraits. The "differentiating" syllables (listed on the right) of the 11 manuscripts are shown by their number of occurrences, where colour intensity ranges from minimum (white) to maximum (dark red).

One could argue that orthography could be accidental and therefore cannot be used to identify manuscript relationships. However, the present case is a particular one: we are dealing with 11 manuscripts of the same chapter written in verses, which means that the structure is particularly stable, and these manuscripts are identical for about 98% of their material. Thus, the tiny differences found were rechecked for stability and significance, i.e. singular scribal mistakes were separated from the recurring, statistically significant patterns that became apparent during the clustering process. All of this is represented as a heatmap (Fig. 8). The difference in the manuscript cluster visualisation is represented by height. One line stems from JY B, based on syllabic-portrait similarity, and includes JY A, E and I, of which only the last two are very closely related; the other is related to the JYMUṬ transmission variant and includes the remaining 7 manuscripts, of which JYMUṬ B, JYMUṬ O, JY K and JY D are particularly close to each other. This is a preliminary result based on a single chapter; however, it already shows that this chapter has at least two independent lines of transmission, one rooted in JY B, and another coming from an independent manuscript from which both JYMUṬs and JY C, G, D and K stem. Most likely it was very close to JY C.

4 Conclusion

We have discovered important variations in the scripts of two manuscripts. For JY B this confirms the dating around the 12th century suggested by Diwakar Acharya, although further work is needed to ascertain the precise varieties of the Nāgarī script; JYMUṬ B attests the tendency to change the script from Newārī to Devanāgarī at the beginning of the 18th century. Paleography analysis has allowed us to establish that JY D and JYMUṬ O were likely written by the same scribe. We have also compiled a table representing the evolution of the Newārī script from 1593 to 1723. The syllabic portrait of each manuscript, established with precision, has produced an exploratory clustering which highlights the changes in orthography. The clustering results tend to support the idea that there are at least 2 different lines of manuscript transmission, and we can suggest that another variation of the JY, not rooted in JY B, existed in Nepal, which gave rise to the JYMUṬ variant of JY.3.9. The fact that JY B stands very far from the rest of the manuscripts, both in time and by its syllabic portrait, also suggests that more manuscripts existed in between, which, we hope, will surface in the future. The READ software paleography tools proved to be an important asset for this research, and its data model, which maintains the link between the manuscript surface, transcription and various forms of interpretation, allows the scholar to see higher levels of association and to verify the coherence of the data. READ handles highly linked Big Data, which is visualised and made available in an interactive, bite-size format for human scholars. This is a good example of what a VRE should be. Besides clarifying the questions of manuscript transmission, this IT-DH collaboration has also identified priorities for new features to develop in READ. The next steps of the current study include expanding to all chapters of the JY, using the current (possibly modified) methodology, to extend the analysis of the transmission of the JY. The analysis will be enriched by using data produced through other READ features (lexicography, syntax, compound analysis) with Computational Linguistics approaches.


References
1. Benoit, K., et al.: quanteda: an R package for the quantitative analysis of textual data. J. Open Sour. Softw. 3(30), 774 (2018). https://doi.org/10.21105/joss.00774, https://quanteda.io
2. Edgerton, F.: Meter, phonology, and orthography in Buddhist Hybrid Sanskrit. J. Am. Orient. Soc. 66(3), 197–206 (1946). https://www.jstor.org/stable/595566
3. Gupta, P. L.: Medieval Indian Alphabets: [Nagari, North India]. All Indian Educational Supply Company, Delhi (n.d., pre 2012)
4. [Jayadrathayāmala, ṣaṭka 3]: A NGMPP Microfilm A152/9; B NGMPP Microfilm B26/9; C NGMPP Microfilm A1312/25; D NGMPP Microfilm B122/3; E NGMPP Microfilm C72/1; G NGMPP Microfilm B121/13; I NGMPP Microfilm C47/03
5. [Jayadrathayāmala, ṣaṭka 3]: K. https://digital.staatsbibliothek-berlin.de/werkansicht?PPN=PPN898781620&PHYSID=PHYS_0004. Accessed 23 May 2021
6. [Jayadrathayāmalamantroddhāraṭippaṇī]: A NGMPP Microfilm B122/07; B NGMPP Microfilm A1267/3; O NGMPP Microfilm A152/8
7. NGMPP (Nepal-German Manuscript Preservation Project) Catalogue. https://catalogue.ngmcp.uni-hamburg.de/content/search/ngmcpdocument.xed. Accessed 22 May 2021

ICDAR 2021 Workshop on Document Images and Language (DIL)

DIL 2021 Preface

Document image analysis and recognition has always been the focus of the ICDAR conference. Rich information can be exploited from document images and text to understand a document. In the past, researchers in this community usually turned to computer vision technology alone to solve the problems of text detection and recognition, layout analysis, and graphic structure recognition. In order to understand the messages captured in the multimodal nature of documents, various techniques from natural language processing (NLP), such as named entity recognition and linking, visual question answering, and text classification, can be combined with the traditional methods of optical character recognition (OCR), layout analysis, and logical labeling that are well established in the ICDAR community.

This workshop, Document Images and Language (DIL), follows the tradition of ICDAR workshops and addresses the research between computer vision and NLP, including the recent developments in the field of machine learning. The aim is to provide a forum that attracts researchers from both communities to share their knowledge and expertise, as well as colleagues who are already working at the edge of both fields, in order to bring substantial improvement to document image understanding.

The first edition of DIL was held virtually on September 6, 2021, at ICDAR 2021 and was organized by Andreas Dengel (DFKI & University of Kaiserslautern, Germany), Cheng-Lin Liu (Institute of Automation of Chinese Academy of Sciences, China), David Doermann (University of Buffalo, USA), Errui Ding (Baidu Inc., China), Hua Wu (Baidu Inc., China), and Jingtuo Liu (Baidu Inc., China). We invited three great speakers: Lei Cui from Microsoft Research Asia, Lianwen Jin from the South China University of Technology, and Heiko Maus from the German Research Center for Artificial Intelligence (DFKI). They have diverse backgrounds and research experiences, and provided unique insights on the topics of the workshop.

When it comes to paper contributions, we received 11 submissions, of which we accepted nine, including eight long papers and one short paper. We followed a single-blind review process and each submission was reviewed by three reviewers. The accepted papers cover the topics of OCR, information extraction with visual and textual features, and language models for document understanding.

Altogether, we believe that the first edition of DIL was a success, with high-quality talks and contributed papers. We are glad to see that this topic attracted so much interest in the community and hope that research on multimodal approaches in document image understanding will become a trend in the future.

September 2021

Andreas Dengel
Cheng-Lin Liu
David Doermann
Errui Ding
Hua Wu
Jingtuo Liu

Organization

Organizers

Andreas Dengel (DFKI & University of Kaiserslautern, Germany)
Cheng-Lin Liu (CASIA, China)
David Doermann (University of Buffalo, USA)
Errui Ding (Baidu Inc., China)
Hua Wu (Baidu Inc., China)
Jingtuo Liu (Baidu Inc., China)

Program Committee

Chengquan Zhang (Baidu Inc., China)
Dominique Mercier (German Research Center for Artificial Intelligence, Germany)
Fei Qi (Xidian University, China)
Lianwen Jin (South China University of Technology, China)
Lingyong Yan (Chinese Academy of Sciences, China)
Longlong Ma (Chinese Academy of Sciences, China)
Marco Sticker (German Research Center for Artificial Intelligence, Germany)
Syed Yahseen Raza Tizvi (German Research Center for Artificial Intelligence, Germany)
Xinyan Xiao (Baidu Inc., China)
Yu Zhou (Chinese Academy of Sciences, China)
Zhenyu Huang (Sichuan University, China)

Additional Reviewers

Chengyang Fang, Fangchao Liu, Pengfei Wang, Xugong Qin, Jianjian Cao, Minghui Liao, Dely Yu, Xinye Yang

A Span Extraction Approach for Information Extraction on Visually-Rich Documents

Tuan-Anh D. Nguyen1(B), Hieu M. Vu1, Nguyen Hong Son1, and Minh-Tien Nguyen1,2

1 Cinnamon AI, 10th floor, Geleximco Building, 36 Hoang Cau, Dong Da, Hanoi, Vietnam
{tadashi,ian,levi,ryan.nguyen}@cinnamon.is
2 Hung Yen University of Technology and Education, Hung Yen, Vietnam
[email protected]

Abstract. Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which shows the great potential of pre-training methods. In this paper, we present a new approach to improve the capability of language model pre-training on VRDs. Firstly, we introduce a new query-based IE model that employs span extraction instead of using the common sequence labeling approach. Secondly, to extend the span extraction formulation, we propose a new training task focusing on modelling the relationships among semantic entities within a document. This task enables target spans to be extracted recursively and can be used to pre-train the model or as an IE downstream task. Evaluation on three datasets of popular business documents (invoices, receipts) shows that our proposed method achieves significant improvements compared to existing models. The method also provides a mechanism for knowledge accumulation from multiple downstream IE tasks.

Keywords: Information extraction · Span extraction · Pre-trained model · Visually-rich document

1 Introduction

Information Extraction (IE) for visually-rich documents (VRDs) is different from plain text documents in many aspects. VRDs contain sparser textual content, have strictly defined visual structures, and complex layouts which are not present in plain text documents. Due to these characteristics, existing works on IE for VRDs mostly employ Computer Vision [1] and/or graph-based [6,9,11] techniques. Recently, BERT provides more perspectives on VRDs from the Natural Language Processing point of view [3,4,12–15]. LayoutLM [13] emerges as a simple yet promising approach, where the BERT architecture was extended by the addition of 2D positional embedding and visual embedding to its input embedding [2]. The model was trained with the Masked Visual Language Modelling (MVLM) objective [13], which is itself an adaptation of positional embedding and visual embedding to the Masked Language Modelling objective [2] of BERT. More recently, LayoutLMv2 [12] was introduced with more focus on better leveraging 2D positional and visual information, together with two new pre-training objectives designed for modeling image-text correlation.

While LayoutLM is very efficient at modeling multi-modal data like visually-rich documents, it bears a few drawbacks regarding practicality. First, pre-training a language model requires a large amount of data, and unlike plain text documents, visually-rich document data is not readily available in such large quantities, especially for low-resource languages (e.g. Japanese). Second, previous methods mostly utilize sequence labeling for IE, but we argue that this method might not work well for short entities due to the imbalance of classes and the training loss function [5,10]. Third, sequence labeling requires a fixed and predefined set of entity tags/classes for each dataset, which hinders applying the same IE model to multiple datasets.

To address the aforementioned problems, we propose a new method employing span extraction instead of sequence labeling. The method combines QA-based span extraction and multi-value extraction using a novel recursive relation-predicting scheme. We also introduce a pre-training procedure that extends LayoutLM to both English and non-English datasets by using semantic entities. Since the query is embedded independently from the context, this new method can be applied across multiple datasets, thus enabling the seamless accumulation of knowledge through one common formulation.

2 Information Extraction as Span Extraction

The most common approach for extracting information from visually-rich documents is sequence labeling, which assigns a label to each token of the document [3,4,12–15]. Even though this solution has shown promising results, we argue that sequence labeling might not work well when some entity types have only a small number of samples or when the target entities are short spans of text in dense documents. The likely reason is the imbalance of labels when training IE models. Moreover, sequence labeling requires an explicit definition of token classes, which diminishes the reusability of the model on other tasks. To overcome this challenge, we follow the span extraction approach, also known as Extractive Question Answering (QA) in NLP [2]. Different from sequence labeling, which assigns a label to every token based on its embedding, QA models predict the start and end positions of the corresponding answer for each query. For the IE problem, a query represents a required field/tag that we want to extract from the original document context. Each query is represented as a learnable embedding vector and kept separate from the main language model encoder. Figure 1 describes our span extraction model.

Fig. 1. The span extraction model.

Let D = {w_1, w_2, ..., w_n} be the input sequence and H = [h_1, ..., h_j, ..., h_n] = f(D|Θ) be the hidden representation of the input sequence produced by the pretrained encoder f(·), where h_i ∈ R^c is the representation vector of the i-th token, T ∈ R^c is the vector representation of the query (field embedding), and c is the hidden size of the encoder. The start and end positions of the corresponding answer are calculated as:

start, end = g(T, H)    (1)

where g(·) is represented by an attention layer [7] that captures the interaction between the query T and the context H. To calculate the span extraction loss, the softmax operation is applied over all tokens in the context instead of over the token classes for each token as in sequence tagging. The softmax loss over all sequence tokens eliminates the class imbalance problem for shorter answer entities. The separation of the question (query) embedding from the context embedding also opens the potential of accumulating and reusing information across different datasets.
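To illustrate this formulation, the following is a minimal PyTorch sketch of a span extraction head with a learnable field embedding and a Luong-style (general) attention scorer over the context tokens. The class, layer names, and scoring function are illustrative assumptions, not the authors' exact implementation.

# Minimal sketch of a span extraction head: a learnable query T per field,
# scored against every context token to obtain start/end logits (Eq. 1).
import torch
import torch.nn as nn

class SpanExtractionHead(nn.Module):
    def __init__(self, hidden_size: int, num_fields: int):
        super().__init__()
        self.field_embedding = nn.Embedding(num_fields, hidden_size)  # query T per field
        self.start_proj = nn.Linear(hidden_size, hidden_size)         # general attention
        self.end_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, H: torch.Tensor, field_ids: torch.Tensor):
        # H: (batch, seq_len, hidden) token representations from the pre-trained encoder
        # field_ids: (batch,) index of the queried field
        T = self.field_embedding(field_ids)                            # (batch, hidden)
        start_logits = torch.einsum("bth,bh->bt", H, self.start_proj(T))
        end_logits = torch.einsum("bth,bh->bt", H, self.end_proj(T))
        return start_logits, end_logits                                # (batch, seq_len)

# Training would apply cross-entropy over token positions (softmax over the
# sequence), i.e. nn.CrossEntropyLoss() against the gold start/end indices,
# which is what removes the per-token class imbalance described above.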

3 Pre-training Objectives with Recursive Span Extraction

3.1 LayoutLM for Low-Resource Languages

This section describes effective methods for transferring LayoutLM to low-resource languages, e.g. Japanese. Pre-training a language model from scratch with the MLM objective normally requires millions of samples and can take a long time to train. While such an amount of data can be acquired in plain text, visually-rich data does not naturally exist in large quantities due to the lack of word-level OCR annotations. Since pre-trained weights are available for English only, applying LayoutLM to other languages is still an open question. To overcome this, we propose a simple transferred pre-training procedure for LayoutLM that first takes advantage of the available pre-trained weights of BERT and then transfers its contextual representation to LayoutLM.
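As a rough illustration of this transferred initialization, the sketch below copies word-embedding weights from a target-language BERT into an English LayoutLM checkpoint using the Hugging Face transformers library. The model identifiers and attribute paths are assumptions that may need adjustment for a given library version; this is not the authors' released code.

# Hypothetical sketch of the transferred pre-training initialization:
# word embeddings from a Japanese BERT, all other layers from English LayoutLM.
from transformers import BertModel, LayoutLMForMaskedLM

layoutlm = LayoutLMForMaskedLM.from_pretrained("microsoft/layoutlm-base-uncased")
bert_ja = BertModel.from_pretrained("cl-tohoku/bert-base-japanese")

# Resize LayoutLM's vocabulary to the Japanese tokenizer's size, then copy weights.
layoutlm.resize_token_embeddings(bert_ja.config.vocab_size)
layoutlm.layoutlm.embeddings.word_embeddings.weight.data.copy_(
    bert_ja.embeddings.word_embeddings.weight.data
)
# The 2D positional embeddings, encoder layers, and MVLM head keep their English
# LayoutLM initialization; MVLM pre-training on the smaller target-language data follows.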


Overcoming Data Shortage. To transfer the LayoutLM model from English (the source language) to another language (the target language), we used the following procedure. First, the model's word embedding layer was initialized from the pre-trained BERT of the target language, and the remaining layers (including the positional embedding layers, the encoder layers, and the MVLM head) were initialized from the publicly available LayoutLM for English. After that, the model was trained with the MVLM objective using a much smaller dataset (about 17,000 samples) for 100 epochs. The final model can be fine-tuned to downstream tasks such as sequence labeling, document classification, or question answering. The word embedding layer maps plain text tokens into a semantically meaningful latent space; while the latent space can be similar, the mapping process itself is strictly distinct from language to language. We argue that spatial information is language-independent and can be useful among languages with similar reading orders. Similarly, the encoder layers are taken from the LayoutLM weights to leverage their ability to capture attention from both semantic and positional inputs. Our experiments on internal datasets show that this transferred pre-training setting yields significantly better results than not using the word embedding weights from BERT. Additionally, our dataset of 17,000 samples is not sufficient to pre-train LayoutLM from scratch, which would give an unfair comparison between our proposed transferred pre-training procedure and training from scratch. Thus, we plan to verify our approach on the recently introduced XFUN dataset [14].

Overcoming Inadequate Annotations. VRDs usually exist in image formats without any OCR annotations. Thus, to address the lack of word-level OCR annotations, we used an in-house OCR reader to extract line-level annotations. Then, word-level annotations were approximated by dividing the bounding box of each line proportionally to the length of the words in that line; a minimal sketch of this approximation is given below.
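The sketch below illustrates the word-box approximation just described. The function name and the equal-width-per-character assumption are illustrative, not the in-house OCR tooling.

# Illustrative sketch: approximate word-level boxes by splitting a line-level
# box proportionally to the character lengths of the words in that line.
def split_line_box(line_box, words):
    """line_box: (x0, y0, x1, y1) of a text line; words: list of word strings."""
    x0, y0, x1, y1 = line_box
    total_chars = sum(len(w) for w in words) or 1
    width_per_char = (x1 - x0) / total_chars
    boxes, cursor = [], x0
    for w in words:
        w_width = len(w) * width_per_char
        boxes.append((cursor, y0, cursor + w_width, y1))
        cursor += w_width
    return boxes

# Example: split_line_box((0, 0, 100, 20), ["invoice", "no", "123"])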

3.2 Span Extraction Pre-training

To address the limitation of the training-data requirement when pre-training the LayoutLM model, we present a new pre-training objective that re-uses existing entity annotations from downstream IE tasks. Our pre-training task is based on a recursive span extraction formulation that extends the traditional QA method to extract multiple answer spans from a single query.

Recursive Span Extraction. As opposed to sequence labeling, our span extraction formulation keeps the query (field) independent from the context, which enables knowledge accumulation from multiple datasets by continually updating a collection of query embeddings. However, one limitation of the span extraction model described in Fig. 1 is that it cannot extract more than one answer (entity) per query, which is arguably crucial in many IE use cases. We therefore propose a strategy to extend the capability of the model by using a recursive link decoder mechanism, which is described in Fig. 2.

Fig. 2. The recursive span extraction mechanism.

Fig. 3. Recursive link prediction demonstrated on a receipt example [8]. Green and blue arrows denote the target start and end positions of the answer spans corresponding to the query tokens (green bounding boxes). The query tokens are the first tokens of each text segment. (Color figure online)

Using the same notation as in Sect. 2, our recursive mechanism is described as follows:

s_1, e_1 = g(T, H)    (2)
s_{i+1}, e_{i+1} = g(h_{s_i}, H)    (3)

The first answer span is extracted from the vector representation T of the query and the hidden representation H of the context (Eq. 2). After that, the hidden representation corresponding to the start token of the i-th answer is used as the query vector to extract the (i+1)-th answer recursively (Eq. 3). This recursive decoding procedure stops if s_i = e_i = 0 or when s_i = s_j and e_i = e_j for some j < i. By using this method, we can decode all answer spans that belong to the same query/entity tag in a document context.
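A minimal Python sketch of this recursive decoding loop follows. It assumes a hypothetical helper score_spans(query, H) that returns the arg-max start/end pair for one query, and simply applies the two stopping rules stated above.

# Sketch of recursive span decoding: start from the field query T, then reuse
# the start-token representation of each found span as the next query.
def decode_all_spans(T, H, score_spans):
    spans, query = [], T
    while True:
        s, e = score_spans(query, H)
        if (s, e) == (0, 0) or (s, e) in spans:   # stop: empty span or repeated span
            break
        spans.append((s, e))
        query = H[s]                              # start-token representation becomes the next query
    return spans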


Pre-training with Semantic Entities. The recursive span extraction task forms a generic relation structure among entities, which reinforces the language model's capability to represent context and spatial information in visually-rich documents. The model has to infer the continuous links between same-field items, which are often represented by vertical or horizontal links following the table structure. A sample visualization of a document and the corresponding continuous-linking formulation can be seen in Fig. 3. By forcing the model to reconstruct the relation between continuous entities belonging to the same query, we hypothesize that the model can produce better performance when fine-tuned on downstream IE tasks. Experiment results in Sect. 4 validate this hypothesis. The proposed query-based architecture and recursive span decoding enable a generic IE formulation that is independent of the specific query tags of each document type. Recall that the query embedding is built separately from the main pre-trained model, so the LayoutLM encoder focuses on modelling the document context and the interactions between its entities. We can therefore pre-train the LayoutLM model seamlessly on multiple IE datasets with different key-value fields. To this end, we name this pre-training objective Span extraction semantic pre-training; it allows us to pre-train LayoutLM with several small-sized IE-annotated datasets in low-resource languages, instead of using large-scale document data as in [12].

4 Experiments

4.1 Experiment Setup

Dataset. We use two internal self-collected datasets: one for pre-training the language model and one for evaluating IE performance as a downstream task. We also evaluate the performance of our proposed method on a public dataset consisting of shop receipts [8], which represents a typical example of the document understanding task in commercial systems. Table 1 shows the statistics of the datasets used in our experiments.

Table 1. The three VRD document datasets (the Japanese datasets are internal).
Data                            #train   #test   #words/doc   #fields
Japanese Pre-training BizDocs    5,170       -          503       123
Japanese Internal Invoice          700     300          421        25
CORD                               800     100           42        30


Japanese pre-training BizDocs is a self-collected dataset. It consists of 5,170 document images and their OCR transcriptions. The dataset contains various types of business documents such as financial reports, invoices, medical receipts, and insurance contracts. Each document type was annotated with a unique set of key-value fields that represent the important information end-users want to extract. There are over 100 distinct fields and around 80,000 labeled entities. We use this dataset to pre-train our LayoutLM model with the proposed span-based objectives (Sect. 3.2).

Japanese invoice is another invoice document dataset, disjoint from the aforementioned pre-training data. The data consists of 1,200 document images of shipping invoices and is annotated with 25 types of key-value fields. We divide the data into 700/200/300-sample sets corresponding to the training/validation/testing data respectively.

CORD [8] consists of Indonesian receipts collected from shops and restaurants. The dataset includes 800 receipts for training, 100 for validation, and 100 for testing. An image and a list of OCR annotations are provided alongside each receipt. There are 30 fields covering the most important classes, such as store information, payment information, menu, and total. The objective of the model is to label each semantic entity in the document and extract their values (Information Extraction) from the input OCR annotations. The dataset is publicly available at https://github.com/clovaai/cord.

Training and Inference. The English-based LayoutLM model starts from the pre-trained weights of [13] provided with the paper, dubbed EN-LayoutLM-base. We used the LayoutLM-base version from the original paper (113M parameters) due to hardware limitations. Another Japanese-based model (JP-LayoutLM-base) was pre-trained using the procedure described in Sect. 3.1 with the same configuration as LayoutLM-base, using the MVLM task on our internal data. This model is intended to be used with Japanese datasets. To conduct our experiments, we first pre-train the JP-LayoutLM-base model on the Japanese Pre-training BizDocs dataset with the proposed span-extraction objectives. Then, we perform fine-tuning on two IE datasets, Japanese invoice and CORD, with the pre-trained weights JP-LayoutLM-base and EN-LayoutLM-base respectively. We compare the results of span extraction with sequence labeling on these downstream IE tasks. We also measure the fine-tuning result of the LayoutLM-base model without span-based pre-training to demonstrate the effect of the proposed method. As for the hyperparameter details, the two models JP-LayoutLM-base and EN-LayoutLM-base share the same backbone with a hidden size of 768. In the Japanese model, we adopt the embedding layer and the vocabulary from the Japanese pre-trained BERT cl-tohoku¹, which contains 32,000 sub-words.

¹ https://github.com/cl-tohoku/bert-japanese.


We also set the input sequence length to 512 in all tested models. The learning rate was set to 5e-5 and the Adam optimizer was used [13]. In terms of the loss function, with the QA format, the standard cross-entropy (CE) loss is used for the start and end indices over all document tokens. With the sequence tagging model, CE loss was used for the whole sequence classification output of the document.

Evaluation Metrics. We adopt the popular entity-level F1-score (micro) for evaluation, which is commonly used in previous studies [3,4,12,13]. The entity F1-score (macro) is used as the second metric to represent the overall result on all key-value fields, since it emphasizes model performance on rare fields/tags. This situation commonly appears in industrial settings where data collection is costly for some infrequent fields.
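For clarity, the following is a small sketch of how entity-level micro and macro F1 can be computed, assuming predictions and ground truth are represented as sets of entity values per field. This is a generic illustration, not the authors' evaluation script.

# Sketch of entity-level micro vs. macro F1 over per-field sets of entities.
def entity_f1(preds: dict, golds: dict, fields: list):
    per_field = {}
    for f in fields:
        tp = len(preds.get(f, set()) & golds.get(f, set()))
        fp = len(preds.get(f, set())) - tp
        fn = len(golds.get(f, set())) - tp
        per_field[f] = (tp, fp, fn)
    def f1(tp, fp, fn):
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    # micro: pool counts over all fields; macro: average per-field F1
    TP = sum(v[0] for v in per_field.values())
    FP = sum(v[1] for v in per_field.values())
    FN = sum(v[2] for v in per_field.values())
    micro = f1(TP, FP, FN)
    macro = sum(f1(*v) for v in per_field.values()) / len(fields)
    return micro, macro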

4.2 Results and Discussion

CORD. We report the results on the test and dev subsets of CORD in Table 2 and Table 3, respectively. The span extraction method consistently improves the IE performance compared to the existing sequence labeling method. We observe a 1%–3% improvement in both metrics, which is significant given the competitive baseline result of the default LayoutLM model. Interestingly, compared to the results from LayoutLMv2 [12], our model performs better than both LayoutLM-large and LayoutLM-base-v2, even though we use a much lighter version of the model. This demonstrates the effectiveness of our span extraction formulation for downstream IE tasks. Note that the span extraction formulation is independent of the model architecture, so we can expect the same improvement when starting from more complex pre-trained models. We leave this experiment as future work. Another finding is that the macro F1-score increases by a significant amount alongside the entity-level (micro) score in both subsets. This illustrates that our span extraction method helps to improve performance on all fields/tags, not only the common tags with more annotated entities.

Table 2. Results on the CORD (test) dataset (*: result from original paper [12]).
Methods                                 F1 (macro)   F1 (micro)   #Params
EN-LayoutLM-base (seq labeling)         80.08        94.86        113M
EN-LayoutLM-base (span extraction)      83.46        95.71        113M
EN-LayoutLM-large (seq labeling)*       -            94.93        343M
EN-LayoutLM-base-v2 (seq labeling)*     -            94.95        200M


Table 3. Results on the CORD (dev) dataset.
Methods                               F1-score (macro)   F1-score (micro)
EN-LayoutLM-base (seq labeling)       80.74              96.16
EN-LayoutLM-base (span extraction)    82.13              97.35

Japanese Invoice. Table 4 presents the results on the Japanese invoice data. We compare our method to sequence labeling, which shows similar improvements in terms of both F1-score metrics. Additionally, when adding the span-based pre-training task to the model, the performance of the downstream IE task increases by a substantial amount (+1.2% in F1-score (micro)). The improvement illustrates that our pre-training strategy can effectively improve the IE performance, given only semantically annotated entities from other document types. This means that we can accumulate future IE datasets during fine-tuning as a continual learning scheme to further improve the base pre-trained LayoutLM model. The F1-score (macro) metric is also significantly improved when using the span pre-training objectives compared to starting from the raw LayoutLM model. This can be explained by the knowledge shared between different fields being effectively exploited to improve performance on less frequent tags, which often causes the macro F1 metric to be lower than the micro one in both datasets. The result aligns with our initial hypothesis in Sect. 3.1.

Table 4. Ablation study on the Japanese invoice dataset.
Methods                                                    F1-score (macro)   F1-score (micro)
JP-LayoutLM-base (seq labeling)                            85.67              90.15
JP-LayoutLM-base (span extraction)                         87.55              91.34
JP-LayoutLM-base + span pre-training (seq labeling)        88.02              91.84
JP-LayoutLM-base + span pre-training (span extraction)     89.76              92.55

Visualization. The output of the model on the CORD (test) dataset is visualized in Fig. 4. The model successfully infers spatial relations between various document elements to form the final IE output. We observe some irregular table structures (due to camera-capture conditions) that are processed correctly. This demonstrates the model's capability to understand the general layout structure of input documents.


Fig. 4. Visualization of model output on the CORD (test) dataset [8]. Red and blue boxes denote the output start and end positions of answer spans corresponding to each target field; arrows represent the links between tokens. (Color figure online)

5 Conclusion

This paper introduces a new span extraction formulation for the information extraction task on visually-rich documents. A pre-training scheme based on semantic entity annotations from IE data is also presented. The proposed method shows promising results on two IE datasets of business documents. Our method consistently achieves a 1%–3% improvement compared to the sequence labeling approach, even with a smaller LayoutLM backbone. Furthermore, it can be applied to most existing pre-trained models for IE and opens a new direction for continuously improving the performance of pre-trained models through downstream-task fine-tuning. For future work, we would like to consider the integration of our span extraction mechanism with the latest LayoutLMv2 model. We also plan to extend the model pre-training task with more diverse document relations, such as table structures and header-paragraph relations, to further push the performance of pre-trained LMs on VRDs.

References
1. Dang, T.A.N., Thanh, D.N.: End-to-end information extraction by character-level embedding and multi-stage attentional U-Net. In: BMVC, p. 96 (2019)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
3. Hong, T., Kim, D., Ji, M., Hwang, W., Nam, D., Park, S.: BROS: a pre-trained language model for understanding texts in document (2021). https://openreview.net/forum?id=punMXQEsPr0
4. Hwang, W., Yim, J., Park, S., Yang, S., Seo, M.: Spatial dependency parsing for semi-structured document information extraction (2020)
5. Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., Li, J.: Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855 (2019)
6. Liu, X., Gao, F., Zhang, Q., Zhao, H.: Graph convolution for multimodal information extraction from visually rich documents (2019). https://arxiv.org/abs/1903.11279
7. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
8. Park, S., et al.: CORD: a consolidated receipt dataset for post-OCR parsing. In: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver (2019)
9. Qian, Y., Santus, E., Jin, Z., Guo, J., Barzilay, R.: GraphIE: a graph-based framework for information extraction (2018). http://arxiv.org/abs/1810.13083
10. Tomanek, K., Hahn, U.: Reducing class imbalance during active learning for named entity annotation. In: Proceedings of the Fifth International Conference on Knowledge Capture (K-CAP '09), pp. 105–112. Association for Computing Machinery, New York (2009). https://doi.org/10.1145/1597735.1597754
11. Vedova, L.D., Yang, H., Orchard, G.: An Invoice Reading System Using a Graph Convolutional Network, vol. 2, pp. 434–449 (2019). https://doi.org/10.1007/978-3-030-21074-8
12. Xu, Y., et al.: LayoutLMv2: multi-modal pre-training for visually-rich document understanding. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), August 2021
13. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1192–1200 (2020)
14. Xu, Y., et al.: LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding (2021)
15. Yu, W., Lu, N., Qi, X., Gong, P., Xiao, R.: PICK: processing key information extraction from documents using improved graph learning-convolutional networks. In: 2020 25th International Conference on Pattern Recognition (ICPR) (2020)

Recurrent Neural Network Transducer for Japanese and Chinese Offline Handwritten Text Recognition

Trung Tan Ngo, Hung Tuan Nguyen(B), Nam Tuan Ly, and Masaki Nakagawa

Tokyo University of Agriculture and Technology, Tokyo, Japan
[email protected], [email protected]

Abstract. In this paper, we propose an RNN-Transducer model for recognizing Japanese and Chinese offline handwritten text line images. As far as we know, it is the first approach that adopts the RNN-Transducer model for offline handwritten text recognition. The proposed model consists of three main components: a visual feature encoder that extracts visual features from an input image by a CNN and then encodes the visual features by a BLSTM; a linguistic context encoder that extracts and encodes linguistic features from the input image by embedding layers and an LSTM; and a joint decoder that combines and then decodes the visual and linguistic features into the final label sequence by fully connected and softmax layers. The proposed model takes advantage of both visual and linguistic information from the input image. In the experiments, we evaluated the performance of the proposed model on two datasets: Kuzushiji and SCUT-EPT. Experimental results show that the proposed model achieves state-of-the-art performance on both datasets.

Keywords: Offline handwriting recognition · Japanese handwriting · Chinese handwriting · RNN transducer · End-to-end recognition

1 Introduction

There has been an increasing demand for automatic handwriting recognition for different purposes of document processing, education, and digitizing historical documents in recent years. Over the past decade, numerous fundamental studies have provided large handwriting sample datasets such as handwritten mathematical expressions [1], handwritten answers [2], and historical documents [3, 4]. Moreover, many deep neural network (DNN)-based approaches have been proposed to extract visual features for recognizing handwritten text [5–9]. Thanks to the acceleration of computational software and hardware, DNN has become an essential component of any handwriting recognition system.

One of the essential characteristics of handwriting is that there are many groups of categories having similar shapes [10, 11]. This is especially true in languages such as Chinese or Japanese, where the number of categories is more than 7,000, so that many groups of categories cannot be distinguished in isolation. Therefore, current handwriting recognition systems need to rely on grammar constraints or dictionary constraints as post-processing to eliminate inaccurate predictions. These post-processing methods have been studied for a long time, with several popular approaches such as the n-gram model [12], the dictionary-driven model [13], the finite-state machine-based model [14], and the recurrent neural network (RNN)-based language model [15]. These linguistic post-processing steps are combined with the DNN in most practical handwriting recognition systems. Generally, the handwriting predictions are combined with the linguistic probabilities in a log-linear way, and beam search is used to get the n-best predictions in the inference process [15]. These methods are usually called explicit language models, which need to be pre-trained using large corpora. On the other hand, language models can also be implicitly embedded into the DNN during the training process. In this case, however, the handwriting recognizer might overfit a text corpus if the training dataset is not diverse, with many different sentence samples [7].

The limitation of the linguistic post-processing methods is the separation of the linguistic post-processing from the DNN character/text recognition models. Previous studies have shown that end-to-end unified DNN systems achieve better recognition results than discrete systems [16–18]. This is because every module in an end-to-end system is optimized simultaneously with the other DNN modules, while individual systems might not achieve global optimization. Therefore, we propose to directly integrate a linguistic context module into a well-known DNN for the handwriting recognition problem. The main contribution of this paper is to demonstrate the way and the effectiveness of the RNN-Transducer for Chinese and Japanese handwriting recognition, which combines both visual and linguistic features in a single DNN model. As far as we know, this is the first attempt to combine these two different types of features using the RNN-Transducer model. It might open a new approach for improving handwriting recognition.

The rest of the paper is organized as follows: Sect. 2 introduces the related works, while Sect. 3 describes our methods. Sections 4 and 5 present the experiment settings and results. Finally, Sect. 6 draws our conclusion.

2 Related Work

Most traditional text recognition methods are based on the segmentation of a page image into text lines and of each text line into characters. The latter stage has been studied intensively [19, 20]. However, this segmentation-based approach is error-prone and affects the performance of the whole recognizer. In recent years, based on deep learning techniques, many segmentation-free methods have been designed to overcome this problem and have proven to be state-of-the-art on many text recognition datasets. These methods are grouped into two main approaches: convolutional recurrent neural networks (CRNN) [21] and attention-based sequence-to-sequence models. Graves et al. [21] introduced the Connectionist Temporal Classification (CTC) objective function and combined it with a convolutional network and a recurrent neural network to avoid explicit alignment between input images and their labels. The convolutional network helps extract features from images, while the recurrent network deals with the sequence of labels. This combination achieved high performance in offline handwriting recognition [9].


Besides, the attention-based sequence-to-sequence methods have been successfully adopted in many tasks such as machine translation [22] and speech recognition [23]. These methods are also applied to text recognition tasks and achieve high accuracy [11]. The attention mechanism helps to select the relevant regions of an input image to recognize text sequentially. Furthermore, these methods can be applied to text lines by implicitly learning the reading order and predicting the recognized text in any order. However, the attention-based methods require predicting exactly the same as the target result [2]. In addition, these models do not achieve high results on the Chinese and Japanese datasets compared to CRNN and CTC methods [2, 24]. Recently, some new CRNN models combined with the attention mechanism achieved state-of-the-art results in speech recognition [25]. On the Kuzushiji Japanese cursive script dataset, Nam et al. [24] also achieved the best result with the same approach. However, these modifications cannot surpass the result of the CRNN model on the SCUT-EPT Chinese examination dataset [2]. In Chinese and Japanese, a character pattern consists of multiple radicals, and the radical set is much smaller than the character set. Some researchers [26, 27] use the relationship between radicals and characters to enhance recognition. They reported the best results on both the CASIA [4] and SCUT-EPT datasets. The CRNN model has some limitations, such as requiring the output sequence to be no longer than the input sequence, and it does not model the interdependencies between the outputs. Therefore, Graves [28] introduced the RNN-Transducer as an extension of CTC to address these issues. The RNN-Transducer defines a joint model of both input-output and output-output dependencies. It therefore mixes the visual features from the input with the context information of the predicted result to find the next prediction. It can be seen as an integration of a language model into the CRNN model without external data. In addition, the model also defines a distribution over the output sequence without the length constraint. However, there is no prior research on the RNN-Transducer in the offline handwritten text recognition field. We describe this model architecture and its application to Chinese and Japanese handwriting datasets in the next sections.

3 RNN-Transducer Architecture

Similar to the RNN-Transducer in [28], our network architecture for handwriting recognition also has three main parts: a visual feature encoder, a linguistic context encoder, and a joint decoder. This structure is depicted in Fig. 1. At first, a convolutional neural network (CNN) extracts local features x_t from an image. Then these features are encoded by a recurrent neural network (RNN), as in the CRNN model. During training, the linguistic encoder extracts context features from the previous characters of the ground-truth text, whereas during inference it takes the predicted characters as input. These two feature vectors are aggregated in a joint decoder, which then predicts a distribution over output sequences. In this experiment, we apply a greedy search for the inference process. Let x = (x_1, x_2, ..., x_T), x_t ∈ X, be an input feature sequence of length T over the visual feature space X. Meanwhile, let y = (y_1, y_2, ..., y_U), y_u ∈ Y, be an output sequence of length U over the output space Y. In addition, let K be the number of categories; thus, the output vectors in y have size K.


Fig. 1. RNN-Transducer architecture.

The extended output space Ȳ is defined as Y ∪ {∅}, where ∅ denotes a blank symbol. The meaning of the blank symbol is "output nothing"; thus, it is used to represent alignments of an output text on an input sample. Let a = (a_{1,1}, a_{1,2}, ..., a_{T,U}), a_{i,j} ∈ Ȳ, be an alignment of y. For example, in the case y = (B, E, E) and x = (x_1, x_2, x_3, x_4), some valid alignments are (∅, ∅, B, ∅, E, E, ∅), (∅, B, E, E, ∅, ∅, ∅), (∅, B, ∅, E, ∅, E, ∅), or (∅, ∅, B, E, ∅, E, ∅). Given x, the RNN-Transducer defines the conditional distribution Pr(y|x) as a sum over all possible alignments a, as in Eq. (1):

Pr(y|x) = Σ_{a ∈ B⁻¹(y)} Pr(a|x)    (1)

where B is a function that removes the ∅ symbols from a, i.e. B(a) = y.

3.1 Visual Feature Encoder

In this work, the visual feature encoder is composed of a CNN and an RNN. As mentioned above, the CNN effectively extracts features from handwritten text images [15, 16].


This CNN module is constructed by stacking convolutional, batch-norm, and max-pooling layers. It extracts a feature sequence from a text image by sliding a sub-window over the input image along a defined direction. This feature sequence is then passed to the recurrent neural network. The RNN allows information from nearby inputs to remain in the network's internal states. Therefore, the RNN can use visual contextual features and support the following processes. In our proposed visual feature encoder, a BLSTM is used to learn the long-range context in both input directions. Given a text line image, the CNN extracts a feature sequence (x_1, x_2, ..., x_T). Then the recurrent layers in the BLSTM scan this sequence forward and backward and compute a feature sequence f = (f_1, f_2, ..., f_T). This encoder is similar to the CRNN model [9].

3.2 Linguistic Context Encoder

The linguistic context encoder consists of an input embedding layer and an RNN. The inputs are sparsely encoded as one-hot vectors of size K + 1. At first, the embedding layer maps the one-hot variables into dense representations. It reduces the dimensionality of the variables and creates meaningful vectors in the transformed space. Then, we use an LSTM for this encoder to process the embedded vectors. This network takes as input a sequence y = (y_1, y_2, ..., y_U) of length U and outputs a context vector sequence g = (g_1, g_2, ..., g_U). It is similar to a deep learning language model, which extracts linguistic information from the context. These vectors are then aggregated with the vectors from the visual feature encoder. Therefore, a dense vector with a smaller size than the character set is more appropriate for the aggregation purpose.

3.3 Joint Decoder

The joint decoder is constructed from several fully connected layers and a softmax layer. It combines features from the transcription vector f_t, where 1 ≤ t ≤ T, and the prediction vector g_u, where 1 ≤ u ≤ U. Then it defines the output distribution over Ȳ as in Eq. (2):

Pr(a_{t,u} | f_t, g_u) = softmax(W^{joint} tanh(W^{visual} f_t + W^{linguistic} g_u + b))    (2)

where W^{visual} and W^{linguistic} are the weights of the linear projections of, respectively, the visual feature vectors and the context feature vectors into a joint subspace; b is a bias for this combination; and W^{joint} is a mapping to the extended output space Ȳ. Furthermore, the function Pr(a|x) in Eq. (1) is factorized as in Eq. (3):

Pr(a|x) = ∏_{t=1}^{T} ∏_{u=1}^{U} Pr(a_{t,u} | f_t, g_u)    (3)
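The following is a minimal PyTorch sketch of the joint decoder in Eq. (2), producing the Pr(a_{t,u} | f_t, g_u) grid over all T × U combinations. Layer names and dimensions are assumptions for illustration, not the authors' exact implementation.

# Minimal sketch of the joint decoder: project visual and linguistic features
# into a joint subspace, apply tanh, and map to the extended output space.
import torch
import torch.nn as nn

class JointDecoder(nn.Module):
    def __init__(self, visual_dim, linguistic_dim, joint_dim, num_classes):
        super().__init__()
        self.w_visual = nn.Linear(visual_dim, joint_dim)
        self.w_linguistic = nn.Linear(linguistic_dim, joint_dim)  # its bias plays the role of b
        self.w_joint = nn.Linear(joint_dim, num_classes + 1)      # +1 for the blank symbol

    def forward(self, f, g):
        # f: (batch, T, visual_dim), g: (batch, U, linguistic_dim)
        joint = torch.tanh(self.w_visual(f).unsqueeze(2) + self.w_linguistic(g).unsqueeze(1))
        return self.w_joint(joint).log_softmax(dim=-1)            # (batch, T, U, K + 1)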


3.4 Training and Inference Process

To train the RNN-Transducer model, the log-loss L = −log(Pr(y|x)) is minimized. However, a naive calculation of Pr(y|x) in Eq. (1) by enumerating all alignments a is intractable. Therefore, based on Eq. (3), Graves et al. used the forward-backward algorithm [28] to calculate both the loss and the gradients. In the inference process, we seek the predicted sequence a with the highest probability given the input image. However, the linguistic feature vector g_u, and hence Pr(a_{t,u} | f_t, g_u), depends on all previous predictions from the model. Therefore, the calculation over all possible sequences a is also intractable. In this paper, we apply a greedy search for a fast inference process through output sequences, as shown in Algorithm 1.

Algorithm 1: Output Sequence Greedy Search
Initialise: Y = (), (g, h) = LinguisticEncoder(∅)
for t = 1 to T do
    out = JointDecoder(f[t], g)
    y_pred = argmax(out)
    if y_pred != ∅ then
        add y_pred to Y
        (g, h) = LinguisticEncoder(y_pred, h)
    end if
end for
Return: Y

4 Datasets

To evaluate the proposed RNN-Transducer model, we conducted experiments on the Kuzushiji_v1 Japanese text line dataset and the SCUT-EPT Chinese examination paper text line dataset. Our model is evaluated with the Character Error Rate (CER) on each dataset (a minimal sketch of the CER computation is given after Table 1). The dataset details are described in the following subsections.

4.1 Kuzushiji Dataset

Kuzushiji is a dataset of pre-modern Japanese in cursive writing style. It was collected and created by the National Institute of Japanese Literature (NIJL). The Kuzushiji_v1 line dataset is a collection of text line images from the first version of the Kuzushiji dataset. The line dataset consists of 25,874 text line images from 2,222 pages of 15 pre-modern Japanese books. There are a total of 4,818 classes. To compare with the state-of-the-art result, we apply the same dataset separation as in [24], as shown in Table 1. The testing set is collected from the 15th book. The text line images of the 1st to 14th books are divided randomly into training and validation sets with a ratio of 9:1.

Table 1. The details of the Kuzushiji_v1 line dataset.
                   Training set   Validation set   Testing set
Text line images   19,797         2,200            3,878
Books              1st–14th       1st–14th         15th
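As referenced above, a small sketch of the CER computation: the Levenshtein edit distance between the predicted and ground-truth character sequences, normalised by the ground-truth length. This is a generic illustration rather than the authors' evaluation script.

# Character error rate: edit distance (insertions, deletions, substitutions)
# between prediction and ground truth, divided by the ground-truth length.
def cer(pred: str, truth: str) -> float:
    m, n = len(pred), len(truth)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                                   # deletion
                        dp[j - 1] + 1,                               # insertion
                        prev + (pred[i - 1] != truth[j - 1]))        # substitution
            prev = cur
    return dp[n] / max(n, 1)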

4.2 SCUT-EPT Chinese Examination Dataset

SCUT-EPT is a new and large-scale offline handwriting dataset extracted from Chinese examination papers [2]. The dataset contains 50,000 text line images from the examinations of 2,986 writers, as shown in Table 2. They are randomly divided into 40,000 samples for the training set and 10,000 samples for the test set. We also split the training set into 36,000 samples for the sub-training set and 4,000 for the validation set. There are a total of 4,250 classes, but the number of classes in the training set is just 4,058. Therefore, some classes in the testing set do not appear in the training set. In [2], augmentation from another Chinese character dataset, CASIA-HWDB1.0–1.2, was applied. While that CRNN model covers the entire 7,356 classes, we use only the SCUT-EPT dataset for the training and testing processes. The character set of the SCUT-EPT dataset is smaller than that of the Kuzushiji_v1 dataset. However, the SCUT-EPT dataset has some additional problems such as character erasure, text line supplements, character/phrase switching, and noisy backgrounds.

Table 2. The details of the SCUT-EPT dataset.
                   Training set   Validation set   Testing set
Text line images   36,000         4,000            10,000
Classes            4,058          4,058            3,236
Writers            2,986 (in total)

5 Experiments

The configuration of each experiment is given in Sect. 5.1. We compare different network configurations in terms of the number of LSTM (BLSTM) layers belonging to the visual extractor and the linguistic network. These configurations are compared with the state-of-the-art methods on the Kuzushiji dataset in Sect. 5.2 and on the SCUT-EPT dataset in Sect. 5.3. Furthermore, correctly and wrongly recognized samples are illustrated in Sect. 5.4.

5.1 RNN-Transducer Configurations

We use a ResNet32 pre-trained on the ImageNet dataset for the visual feature extractor, which gives our model a good initialization for training.


In detail, we use all layers of ResNet32 except the final fully connected layer and replace the final max-pooling layer of ResNet32 with our own max-pooling layer. With the reading direction from top to bottom in the Kuzushiji Japanese dataset, the width of the feature maps is normalized to 1 pixel and the height is kept unchanged. Meanwhile, for the SCUT-EPT Chinese dataset, which has horizontal text lines, the height of the feature maps is normalized to 1 pixel and their width is kept unchanged. At the recurrent layers, we adopt a number of BLSTM layers with 512 hidden nodes. A projection is applied to the sequence by a fully connected layer. Each experiment has a different number of layers and output sizes of the projection. The architecture of our CNN model in the visual extractor is shown in Table 3; a sketch of such an encoder is given after the table.

Table 3. Network configuration of our CNN model.
Component      Description
Input_1        H × W image
ResNet32       All layers except the final max-pooling layer and fully connected layers
MaxPooling_1   #window: (depends on the dataset)
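A hedged PyTorch sketch of such a visual feature encoder follows. Since "ResNet32" is not a standard torchvision model, resnet34 is used here as a stand-in, and the pooling and BLSTM details are assumptions rather than the authors' released code.

# Illustrative visual feature encoder: ResNet backbone (avgpool/fc removed),
# pooling that collapses one spatial dimension, then a BLSTM over the sequence.
import torch
import torch.nn as nn
import torchvision

class VisualEncoder(nn.Module):
    def __init__(self, hidden_size: int = 512):
        super().__init__()
        resnet = torchvision.models.resnet34(weights="IMAGENET1K_V1")  # torchvision >= 0.13 API
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])   # drop avgpool + fc
        # Collapse the height to 1 for horizontal text lines (SCUT-EPT);
        # for vertical Kuzushiji lines the width would be collapsed instead.
        self.pool = nn.AdaptiveMaxPool2d((1, None))
        self.blstm = nn.LSTM(512, hidden_size, num_layers=2,
                             bidirectional=True, batch_first=True)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.backbone(images))      # (batch, 512, 1, W')
        feats = feats.squeeze(2).permute(0, 2, 1)     # (batch, W', 512)
        f, _ = self.blstm(feats)                      # (batch, W', 2 * hidden_size)
        return f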

An embedding layer is employed in the linguistic context encoder before the recurrent network, with an output vector of size 512. The following part is a number of LSTM layers with 512 hidden nodes and a projection layer; these have a similar structure to the recurrent network and the projection layer in the visual extractor network. The joint network projects a concatenated encoded vector from the visual extractor network and the linguistic network to 512 dimensions via a fully connected (FC) layer. This vector is fed to a hyperbolic tangent function and then to another FC layer and a softmax layer. The last FC layer has a node size equal to the size of the extended character set Ȳ. For training the RNN-Transducer model, each training batch and validation batch have 16 and 12 samples respectively. We preprocess and resize the input images to 128-pixel width for the Kuzushiji dataset and 128-pixel height for the SCUT-EPT dataset. We apply augmentations at every iteration, such as random affine transforms and Gaussian noise. Moreover, layer normalization and recurrent variational dropout are employed to prevent over-fitting. We utilize greedy search in the evaluation. For updating the RNN-Transducer model parameters, we use an Adam optimizer with an initial learning rate of 2e-4. This learning rate is applied with a linear warm-up scheduler in the first epoch and a linearly decreasing scheduler in the remaining epochs. The model of each setting is trained for 100 epochs. During training, the best models are chosen according to the best accuracy on the validation set. To evaluate the effect of the number of LSTM layers and the size of the visual and linguistic encoded vectors, we conduct experiments with 3 different settings: the number of LSTM layers ranges from one to three, and the encoded vector sizes are 512 or 1024. All the configurations are described in Table 4.

Table 4. Configurations of the RNN-Transducer for the SCUT-EPT and Kuzushiji_v1 datasets.

  Name   Configuration
  S1     1 LSTM layer  + 512-dimensional encoded vector
  S2     2 LSTM layers + 512-dimensional encoded vector
  S3     3 LSTM layers + 1024-dimensional encoded vector

5.2 Performance on Kuzushiji Dataset

Table 5 shows the evaluation of all configurations and the other methods. The lowest CER of 20.33% is obtained by the S1 setting, with a single LSTM layer and 512-dimensional encoded vectors. Our best result is lower than the state-of-the-art CER of the AACRN method by 1.15 percentage points. As mentioned before, the RNN-Transducer model is an extension of the CRNN model with the support of a linguistic network; our model is better than the baseline CRNN model [18] by 8.01 percentage points, which demonstrates the contribution of the linguistic module to the RNN-Transducer model. Furthermore, all other settings of the RNN-Transducer model also have lower CERs than the previous methods (Table 5).

Table 5. Character error rates (%) on the test set of the Kuzushiji_v1 dataset.

  Method                       CER (%)
  CRNN [18]                    28.34
  Attention-based model [11]   31.38
  AACRN [24]                   21.48
  S1                           20.33
  S2                           20.78
  S3                           21.39
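As a reminder of how these numbers are computed, the short Python sketch below evaluates the character error rate as the normalized edit distance between a predicted string and the ground truth. It is a generic illustration of the metric, not the evaluation code used in this paper.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + insertions + deletions) / len(reference),
    computed with a standard Levenshtein dynamic program."""
    n, m = len(reference), len(hypothesis)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[n][m] / max(n, 1)

print(character_error_rate("recognition", "recogniton"))  # one deletion -> ~0.09
```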

5.3 Performance on SCUT-EPT Dataset

Table 6 shows the results of the RNN-Transducer model in comparison with the other state-of-the-art methods. All our models are better than the other methods without using external data and radical knowledge. The lowest CER of 23.15% is obtained by the S3 setting, with 3 LSTM layers and 1024-dimensional encoded vectors. As mentioned above, the SCUT-EPT dataset poses additional challenges, so the most complex model, S3, achieves the best performance. Our best result improves on the state-of-the-art CER of 23.39% reported for the LODENet method by 0.24 percentage points, while our proposed method does not require prior knowledge of Chinese radicals during training. The CRNN model [2] has a CER of 24.03%, which is slightly worse than that of our best model. Since the LODENet model achieves a CER of 22.39% when text from a Wiki corpus and prior radical knowledge are used, our models are expected to improve further once such external resources are utilized.

Table 6. Character error rate (CER) (%) on the test set of the SCUT-EPT dataset.

  Method                    CER without external data   CER with external data
  CRNN [2]                  24.63                        24.03
  Attention [2]             35.22                        32.96
  Cascaded attention [2]    51.02                        44.36
  CTC and MCTC:WE [27]      23.43                        -
  LODENet [26] *            23.39                        22.39
  S1                        24.34                        -
  S2                        24.44                        -
  S3                        23.15                        -

  * Requires radical knowledge.

Fig. 2. Correctly recognized and misrecognized samples in the Kuzushiji_v1 dataset.

5.4 Sample Visualization

Figure 2 and Fig. 3 show correctly recognized and misrecognized samples produced by the best RNN-Transducer models for the Kuzushiji_v1 and SCUT-EPT datasets, respectively.


Fig. 3. Correctly recognized and misrecognized samples in the SCUT-EPT dataset.

For each sample, the blue rectangle below or to the right shows the ground truth, and the red one shows the recognition result. As illustrated in Fig. 2b, most of the errors are substitutions (wrongly recognized characters in the ground truth) rather than insertion or deletion errors. In Fig. 3a, the results show that our model can handle the challenges of the SCUT-EPT dataset, such as character erasure, text line supplement, and noisy backgrounds. In the misrecognized samples shown in Fig. 3b, our model correctly recognizes the green-colored characters even though the ground-truth text was mislabeled.

6 Conclusion

In this paper, we have presented an RNN-Transducer model for recognizing Japanese and Chinese offline handwritten text line images. The proposed model effectively models both the visual features and the linguistic context information of the input image. In the experiments, the proposed model achieved character error rates of 20.33% and 23.15% on the test sets of the Kuzushiji and SCUT-EPT datasets, respectively. These results outperform the state-of-the-art accuracies on both datasets. In future work, we will conduct experiments with the proposed model on handwritten text datasets in other languages. We also plan to incorporate an external language model into the proposed model.

Acknowledgement. The authors would like to thank Dr. Cuong Tuan Nguyen for his valuable comments. This research is partially supported by A-STEP JPMJTM20ML and the grants-in-aid for scientific research (S) 18H05221 and (A) 18H03597.

References
1. Mouchère, H., Zanibbi, R., Garain, U., Viard-Gaudin, C.: Advancing the state of the art for handwritten math recognition: the CROHME competitions, 2011–2014. International Journal on Document Analysis and Recognition (IJDAR) 19(2), 173–189 (2016). https://doi.org/10.1007/s10032-016-0263-5
2. Zhu, Y., Xie, Z., Jin, L., Chen, X., Huang, Y., Zhang, M.: SCUT-EPT: new dataset and benchmark for offline Chinese text recognition in examination paper. IEEE Access 7, 370–382 (2019)
3. Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep learning for classical Japanese literature. In: The Neural Information Processing Systems - Workshop on Machine Learning for Creativity and Design (2018)
4. Xu, Y., Yin, F., Wang, D.H., Zhang, X.Y., Zhang, Z., Liu, C.L.: CASIA-AHCDB: a large-scale Chinese ancient handwritten characters database. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 793–798 (2019)
5. Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31, 855–868 (2009)
6. Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, M.F., Kermorvant, C.: The A2iA Arabic handwritten text recognition system at the OpenHaRT 2013 evaluation. In: International Workshop on Document Analysis Systems, pp. 161–165 (2014)
7. Nguyen, H.T., Ly, N.T., Nguyen, K.C., Nguyen, C.T., Nakagawa, M.: Attempts to recognize anomalously deformed Kana in Japanese historical documents. In: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing - HIP 2017, pp. 31–36 (2017)
8. Du, J., et al.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recognit. 71, 196–206 (2017)
9. Ly, N.T., Nguyen, C.T., Nguyen, K.C., Nakagawa, M.: Deep convolutional recurrent network for segmentation-free offline handwritten Japanese text recognition. In: Proceedings of the 14th International Conference on Document Analysis and Recognition, ICDAR, pp. 5–9 (2018)
10. Yin, F., Wang, Q.F., Zhang, X.Y., Liu, C.L.: ICDAR 2013 Chinese handwriting recognition competition. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 1464–1470 (2013)
11. Ly, N.T., Nguyen, C.T., Nakagawa, M.: An attention-based end-to-end model for multiple text lines recognition in Japanese historical documents. In: Proceedings of the 15th International Conference on Document Analysis and Recognition, pp. 629–634 (2019)


12. Hamdani, M., Doetsch, P., Kozielski, M., Mousa, A.E.D., Ney, H.: The RWTH large vocabulary Arabic handwriting recognition system. In: International Workshop on Document Analysis Systems (DAS), DAS 2014, pp. 111–115 (2014)
13. Koerich, A.L., Sabourin, R., Suen, C.Y.: Large vocabulary off-line handwriting recognition: a survey (2003)
14. Nguyen, C.T., Nakagawa, M.: Finite state machine based decoding of handwritten text using recurrent neural networks. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, ICFHR, pp. 246–251 (2016)
15. Ingle, R., Fujii, Y., Deselaers, T., Baccash, J.M., Popat, A.: A scalable handwritten text recognition system. In: Proceedings of the 15th International Conference on Document Analysis and Recognition, pp. 17–24 (2019)
16. Bluche, T., Louradour, J., Messina, R.: Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 1050–1055 (2017)
17. Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, pp. 67–72 (2017)
18. Ly, N.T., Nguyen, C.T., Nakagawa, M.: Training an end-to-end model for offline handwritten Japanese text recognition by generated synthetic patterns. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, ICFHR, pp. 74–79 (2018)
19. Wang, Q.F., Yin, F., Liu, C.L.: Handwritten Chinese text recognition by integrating multiple contexts. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1469–1481 (2012)
20. Srihari, S.N., Yang, X., Ball, G.R.: Offline Chinese handwriting recognition: an assessment of current technology (2007). https://link.springer.com/article/10.1007/s11704-007-0015-2
21. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning - ICML 2006, pp. 369–376 (2006)
22. Bahdanau, D., Cho, K.H., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd International Conference on Learning Representations (2015)
23. Bahdanau, D., Chorowski, J., Serdyuk, D., Brakel, P., Bengio, Y.: End-to-end attention-based large vocabulary speech recognition. In: Proceedings of the 41st International Conference on Acoustics, Speech and Signal Processing, pp. 4945–4949 (2016)
24. Ly, N.T., Nguyen, C.T., Nakagawa, M.: Attention augmented convolutional recurrent network for handwritten Japanese text recognition. In: Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, ICFHR, pp. 163–168 (2020)
25. Kim, S., Hori, T., Watanabe, S.: Joint CTC-attention based end-to-end speech recognition using multi-task learning. In: Proceedings of the 42nd International Conference on Acoustics, Speech and Signal Processing, pp. 4835–4839 (2017)
26. Hoang, H.-T., Peng, C.-J., Tran, H.V., Le, H., Nguyen, H.H.: LODENet: a holistic approach to offline handwritten Chinese and Japanese text line recognition. In: Proceedings of the 25th International Conference on Pattern Recognition (ICPR), pp. 4813–4820 (2021)
27. Wigington, C., Price, B., Cohen, S.: Multi-label connectionist temporal classification. In: Proceedings of the 15th International Conference on Document Analysis and Recognition, ICDAR, pp. 979–986 (2019)
28. Graves, A.: Sequence transduction with recurrent neural networks (2012)

MTL-FoUn: A Multi-Task Learning Approach to Form Understanding

Nishant Prabhu, Hiteshi Jain(B), and Abhishek Tripathi

Perfios Software Solutions Private Limited, Mumbai, India

Abstract. Form layout understanding is the task of extracting and structuring information from scanned documents. It consists primarily of three tasks: (i) word grouping, (ii) entity labeling and (iii) entity linking. While the three tasks depend on each other, current approaches solve each of these problems independently. In this work, we propose a multi-task learning approach that learns all three tasks jointly. Since the three tasks are related, the idea is to learn a shared embedding that performs better on all three tasks. Furthermore, the publicly available form understanding datasets are too small to be ideal for training complex deep learning models, and multi-task learning is an effective way to provide some degree of regularization to the model on such small datasets. The proposed model, MTL-FoUn, outperforms existing approaches that learn the individual form understanding tasks on the publicly available data.

Keywords: Multi-task learning · Form understanding

1 Introduction

Document AI is an emerging field with many commercial applications. Document AI typically refers to reading, understanding and analysing business documents [16]. Extracting structured information from business documents such as invoices, bank statements, financial documents, purchase orders and medical bills is an important activity in various industries. These documents can be of digital origin or scanned/photographed copies of physical documents. The sheer variety in the quality, templates, and formats of business documents makes document image analysis an extremely difficult problem to solve.
Document AI models are usually built to solve specific sub-tasks involved in document image analysis. Some of the early models tried to solve the problem of extracting text from document images using Optical Character Recognition (OCR) [5,11,13]. While OCR models extract text efficiently, they fail to extract the more complicated structures that are common in business documents, such as tables, graphs and images.


Fig. 1. End-to-end pipeline for key-value pair extraction illustrated with an example image.

Table recognition involves identifying tables in a document image, extracting the structure of the table by identifying the individual rows and columns, and extracting the content of the identified cells. Hao et al. [2], Schreiber et al. [10] and others build more sophisticated CNN models specifically for table recognition. Another related problem is document layout analysis, which involves identifying the different blocks of data in a document image. Soto et al. [12] and Zhong et al. [17] use Faster R-CNN [9] and Mask R-CNN [3] models to perform document layout analysis. Liu et al. [7] use a Graph Convolutional Network (GCN) to solve the task of receipt understanding (detecting entities of specific types in document images).
In this work, we focus on the form understanding task (FoUn) [6]. Forms are common methods of collecting data and are used across various domains: hospitals, industries, banks, etc. Form understanding is a relatively less explored field and involves three tasks [6]:
i) Word grouping: the task of grouping words of the same entity together, for example grouping together the first and last names of a person as one entity, or grouping together all the words in an address.
ii) Semantic entity labeling: the analysis of the text and spatial layout of the document to classify entities into the question, answer, header and other categories present in the form.
iii) Entity linking: assuming the existence of question-answer pairs in the document, entity linking is the task of identifying which question entity is linked to which answer entity. For example, if "Name" is a question entity and "Mr. John Doe" is its corresponding answer entity, then "Name" is linked to "Mr. John Doe".


Recently, many approaches to form understanding have been published. However, the existing approaches consider each of the three sub-tasks separately. Jaume et al. [6] formulate the problem of word grouping as a text-line extraction task and solve it using the Tesseract [11] and Google Vision OCR engines. This naive approach performs poorly as it considers neither the spatial layout nor the textual content, and the authors postulate the need for a learning-based algorithm to improve performance. Jaume et al. [6] perform the task of semantic labeling at the entity level, and Yiheng et al. [16] perform it at the word level. In [6], a multi-layer perceptron is trained over input features of the entities to classify them into 4 classes: questions, answers, headers and others. The input features include BERT [1] features, spatial features (i.e. bounding box coordinates), and the length of the sequences. In contrast, Yiheng et al. [16] propose LayoutLM, a pre-trained transformer network that uses both text and layout features to obtain word-level embeddings. This joint pre-training with text and layout features performs better than the individual features for document layout analysis. Yiheng et al. [16] use these LayoutLM embeddings to classify each word in the document into one of 13 classes (Question, Answer, and Header in the BIES format, plus Other). To solve the entity linking task, Jaume et al. [6] use the same entity features as in the labeling task: for every pair of entities, these features are concatenated and passed through a two-layer neural-network binary classifier to predict the existence of a link between the two entities. Wang et al. [14] propose a model called DocStruct which uses features of 3 different modalities (semantic BERT features, layout features and visual features) to solve the entity linking task. They model key-value pair linking as a parent-child hierarchical relation, and the probability of the existence of a parent-child relation between two entities is estimated using an asymmetric algorithm with negative sampling [8].
All of the existing methods solve each of the three form understanding tasks (word grouping, entity labeling, and entity linking) independently and fail to exploit the inter-dependence between the three tasks. We pose the following question: can learning these three tasks together using a single model help improve the performance of all three tasks individually? We hypothesize that forcing the network to learn to perform all three tasks simultaneously allows the model to learn a better representation for FoUn. Thus, we propose a multi-task learning (MTL) approach to FoUn. Specifically, we use task-specific heads along with LayoutLM as a backbone to extract document features, and optimize the entire model end-to-end using loss functions that account for word grouping, entity labeling and entity linking. An overview of our multi-task approach is shown in Fig. 1.
The primary contribution of this work is the problem formulation: while existing approaches use different features (word embeddings, visual features, layout features), they fail to leverage the inter-dependence between the three tasks. To the best of our knowledge, we are the first to formulate form layout understanding in a multi-task setting to jointly learn entity embeddings for all three tasks.


We also propose a novel embedding-learning based approach for the word grouping task. The proposed model, MTL-FoUn, is simple yet effective, and is based on an intuitively understandable idea. Empirical evaluations on the FUNSD dataset show that our multi-task learning setup outperforms single-task learning across all three tasks.
The paper is organized as follows: Sect. 2 explains our proposed model. Section 3 presents the experimental results obtained with the proposed approach and discusses the performance of the proposed model. Finally, the paper is concluded in Sect. 4.

Fig. 2. Multi-Task Model with LayoutLM backbone: the inputs to the Grouping model are word vectors, whereas the inputs to the Labelling and Linking models are entity vectors.

2 Multitask Approach to Form Understanding

As discussed in Sect. 1, the task of key-value pair extraction consists of 3 distinct sub-tasks: Word Grouping, Entity Labeling, and Entity Linking. Unlike previous approaches, which learn a separate model for each of these tasks, we propose a single model trained jointly on all three tasks in a multi-task setting. The details of our proposed model, MTL-FoUn, are shown in Fig. 2.
Multi-task learning is a learning paradigm in which a single model serves to learn more than one task. The multi-task model consists of a common backbone and task-specific heads. The network is optimized using a weighted sum of the individual losses of all the tasks. When we optimize such a model end-to-end, the model is able to learn richer representations in the common backbone, which helps solve all the sub-tasks.


Thus, the performance of the main task can be improved while learning the auxiliary tasks. We consider entity linking to be our main task, and entity labelling and word grouping to be auxiliary tasks. The problem formulation of the 3 tasks and the choice of loss functions are specifically designed so that learning the two auxiliary tasks helps the model perform better on the main task. The entity labelling task forces the model to learn to identify which entity is a key and which is a value; this auxiliary task can be seen as an inductive bias that restricts the hypothesis space for the main task and hence helps improve performance. The word grouping task is formulated as an embedding-learning problem, such that the embeddings of words in the same entity are close together and those of words in different entities are far apart. Learning this task helps the backbone model generate richer word embeddings, which in turn improves entity linking by producing better entity embeddings.
Furthermore, since there is very little training data available for FoUn, multi-task learning is an effective method to provide some degree of regularization to our model. The features learned by our model have to be suitable for all three tasks, and this prevents the model from overfitting to any one of them.
As discussed earlier, networks trained in multi-task learning settings usually have two parts: a common network backbone and task-specific heads. The common backbone learns a representation of the input document, which is further processed through task-specific heads to obtain task-specific outputs. We describe both parts below.

2.1 Common Backbone

We use the LayoutLM model [16] as the backbone network to extract word-level features from the document. In order to keep the model simple, we use the version of LayoutLM that uses the semantic (text) and layout (2-D position) features of a given document to provide word-level features; our LayoutLM backbone does not make use of visual (image) features. The features extracted by the LayoutLM backbone are used for all three downstream tasks.
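For concreteness, the sketch below shows how word-level features can be obtained from the text+layout LayoutLM available in the HuggingFace transformers library, and how an entity vector is formed by averaging the word vectors. This is our own minimal illustration, not the authors' code: the checkpoint name follows the public LayoutLM release, the boxes are dummy values on the 0-1000 scale used by that release, and special tokens are omitted for brevity.

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

# Example OCR output: words with boxes already normalized to the 0-1000 range.
words = ["Name", ":", "Mr.", "John", "Doe"]
boxes = [[60, 40, 120, 60], [122, 40, 128, 60], [150, 40, 190, 60],
         [195, 40, 250, 60], [255, 40, 300, 60]]

# Tokenize word by word so that each sub-token inherits its word's box.
tokens, token_boxes = [], []
for word, box in zip(words, boxes):
    word_tokens = tokenizer.tokenize(word)
    tokens.extend(word_tokens)
    token_boxes.extend([box] * len(word_tokens))

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
bbox = torch.tensor([token_boxes])

with torch.no_grad():
    outputs = model(input_ids=input_ids, bbox=bbox)
word_features = outputs.last_hidden_state  # (1, num_tokens, 768)

# Entity vector = average of the word vectors belonging to that entity,
# e.g. the answer entity "Mr. John Doe" covers tokens 2 onward here.
entity_vector = word_features[0, 2:].mean(dim=0)  # (768,)
```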

2.2 Task Specific Heads

Word Grouping. Word grouping is the task of grouping words together to form entities. An entity is a group of one or more words that together form a logical unit. For example, if the key-value pair "Name" → "Mr. John Doe" is to be extracted from the document image, then the key "Name" is one entity and the value "Mr. John Doe" is another. We obtain the vector representation of an entity by simply averaging the word vectors belonging to that entity.
We propose a novel word grouping method in which our model learns to generate task-specific vectors for every word in the document and then performs clustering on the learned word vectors to group words into entities. During training, we use a hinge loss to minimize the distance between vectors of words belonging to the same entity and to maximize the distance between the vectors representing different entities.


During inference, we run agglomerative clustering on the learned word vectors and group them into entities (each cluster represents one entity). We pass the word vectors obtained from LayoutLM through a fully connected layer and then compute the grouping loss on its activations. Considering that (1) a given document is made up of a set of entities E, (2) each entity is made up of a set of words, and (3) we have access to a distance function d that takes two vectors as input and returns the distance between them, we define the grouping loss for a single document as

$$\mathrm{Loss}_{Group} = \delta \sum_{e \in E} \sum_{x, y \in e} d(x, y) + (1 - \delta) \sum_{a, b \in E} \max\bigl(0,\; \Delta - d(a, b)\bigr) \qquad (1)$$

where δ is the weighting factor for the two terms of the loss, and Δ is the margin we define to ensure that the distance between any two entities is at least Δ.
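For illustration, the sketch below implements a grouping head of this kind: a hinge-style loss following Eq. (1) with cosine distance as d, and agglomerative clustering at inference. It is our own sketch under stated assumptions, not the authors' implementation; the clustering threshold is a placeholder, and scikit-learn versions before 1.2 take `affinity` instead of `metric`.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import AgglomerativeClustering

def grouping_loss(word_vecs, entity_ids, delta=0.1, margin=0.1):
    """Hinge-style grouping loss of Eq. (1): pull words of the same entity
    together, push different entities at least `margin` apart (cosine distance)."""
    entities = torch.stack([word_vecs[entity_ids == e].mean(dim=0)
                            for e in entity_ids.unique()])
    pull, push = word_vecs.new_zeros(()), word_vecs.new_zeros(())
    for e in entity_ids.unique():
        members = word_vecs[entity_ids == e]
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pull = pull + (1 - F.cosine_similarity(members[i], members[j], dim=0))
    for a in range(len(entities)):
        for b in range(a + 1, len(entities)):
            dist = 1 - F.cosine_similarity(entities[a], entities[b], dim=0)
            push = push + torch.clamp(margin - dist, min=0)
    return delta * pull + (1 - delta) * push

def group_words(word_vecs, distance_threshold=0.1):
    """Inference: agglomerative clustering on word vectors; each cluster is one entity."""
    clustering = AgglomerativeClustering(n_clusters=None,
                                         distance_threshold=distance_threshold,
                                         metric="cosine", linkage="average")
    return clustering.fit_predict(word_vecs.detach().cpu().numpy())

# Usage sketch: 6 word vectors belonging to 2 ground-truth entities.
vecs = torch.randn(6, 64, requires_grad=True)
ids = torch.tensor([0, 0, 0, 1, 1, 1])
grouping_loss(vecs, ids).backward()
print(group_words(vecs))  # e.g. [0 0 0 1 1 1] once the vectors are trained
```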

Entity Labelling. Entity labelling is the task of assigning to each entity a label from the set {question, answer, header, other}. The labels can be described as follows:
– question: entities which are the keys in the key-value pairs; these entities usually have outbound links (to answers).
– answer: entities which are the answers in the key-value pairs; these entities usually have inbound links (from questions).
– header: entities which appear as titles or headings in documents.
– other: the catch-all category assigned to any entity that is not a question, answer or header.
We obtain a vector representation for each entity in a given document by averaging the word vectors (as obtained from the LayoutLM backbone) belonging to that entity. These entity representations are then passed through a four-class classifier consisting of two fully connected layers. Since entity labelling is a classification task, we use the standard cross-entropy loss as the labelling loss,

$$\mathrm{Loss}_{Label} = \sum_{e \in E} \mathrm{CrossEntropy}(y_e, y'_e) \qquad (2)$$

where $y_e$ is the ground-truth label and $y'_e$ is the prediction of the labelling model for entity e.

Entity Linking. Entity linking is the task of predicting the probability of a link between a question entity and an answer entity. We follow an approach similar to the one proposed by Wang et al. [14]: given two vectors $x_i$ and $x_j$ representing entities i and j, we compute the probability of a link existing from i to j as

$$P_{i \to j} = x_i M x_j^T \in \mathbb{R} \qquad (3)$$


where M is a parameter matrix. Our approach to entity linking differs from that of Wang et al. [14] in two ways:
– We replace the complicated feature extraction modules used by Wang et al. [14] (they use 4 separate models: a transformer network, a fully connected network, a CNN + LSTM network, and an attention-based influence gate) with a single simplified feature extractor (LayoutLM). Unlike the DocStruct model proposed by Wang et al. [14], which extracts features of each entity separately, LayoutLM extracts features by taking the global context of the document into account.
– Wang et al. [14] skip the entity labelling step and are hence forced to mitigate the problem of class imbalance (most entity pairs do not have a link connecting them) by using negative sampling during training. Since we train the model in a multi-task setting, with entity labeling as an auxiliary task, we utilize the entity labels and only consider the possibility of a link existing from a Question to an Answer. We do not consider other possibilities such as Question → Header, Other → Answer or Question → Question. Hence, we are able to train our model without negative sampling.
For a given Answer entity, our entity linking model predicts a likelihood score of an inbound link from each Question entity in the document. We then pass the predicted scores through a softmax function to obtain a probability distribution over the set of all Question entities. This probability distribution is then passed to a negative log-likelihood (NLL) loss function. The procedure is repeated for all the Answer entities in the document, and the total loss is the sum of the losses over the Answer entities. The final loss function is shown in Eq. 4, where softmax + NLL is written as CrossEntropy:

$$\mathrm{Loss}_{Link} = \sum_{a \in A} \mathrm{CrossEntropy}(y_{q \to a}, y'_{q \to a}) \qquad (4)$$

where A is the set of all Answer entities, $y_{q \to a}$ are the ground-truth links from Question q to Answer a, and $y'_{q \to a}$ are the predicted probabilities of links from Question q to Answer a. We train our multi-task model by combining the three loss functions shown in Eqs. 1, 2 and 4 as

$$\mathrm{Loss} = \alpha \cdot \mathrm{Loss}_{Group} + \beta \cdot \mathrm{Loss}_{Label} + \gamma \cdot \mathrm{Loss}_{Link} \qquad (5)$$

where α, β, and γ are hyperparameters used to weight the three loss functions.
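As an illustration of the linking head and the combined objective, the following PyTorch sketch scores Question→Answer pairs with the bilinear form of Eq. 3 and applies the cross-entropy of Eq. 4 per Answer entity. It is our own minimal sketch: the class names, the 256-dimensional entity vectors and the default loss weights (taken from the values reported in Sect. 3.4) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LinkingHead(nn.Module):
    """Bilinear link scorer of Eq. (3): P(i -> j) = x_i M x_j^T."""
    def __init__(self, dim=256):
        super().__init__()
        self.M = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.M)

    def forward(self, questions, answers):
        # questions: (num_q, dim), answers: (num_a, dim)
        # Returns a (num_a, num_q) score matrix: one row of question scores per answer.
        return answers @ self.M @ questions.T

def linking_loss(scores, gt_question_index):
    # scores: (num_a, num_q); gt_question_index: (num_a,) index of the linked question.
    return nn.functional.cross_entropy(scores, gt_question_index)  # softmax + NLL, Eq. (4)

# Combined multi-task objective of Eq. (5); weights follow Sect. 3.4 (assumed).
def total_loss(loss_group, loss_label, loss_link, alpha=10.0, beta=1.0, gamma=0.1):
    return alpha * loss_group + beta * loss_label + gamma * loss_link
```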

3 Experiments

In this section, we describe the form understanding dataset, the baselines for the word grouping, entity labeling and entity linking tasks, and the evaluation metrics used to compare our models against these baselines.

3.1 Dataset

We evaluate the models on the publicly available FUNSD dataset [6], which is the current benchmark dataset for the form understanding task. It is composed of 199 annotated scanned form images. These images are noisy and vary widely in appearance, which makes the form understanding task difficult. The annotations provide the positions of single words and the links between text fragments. The authors provide a train/test split with 149 images in the train split and 50 images in the test split. The train split consists of 7411 text fragments and 4236 key-value pairs, whereas the test set consists of 2332 text fragments and 1076 key-value pairs.

3.2 Evaluation Metric

We discuss the evaluation metrics used for model comparison on the different form understanding tasks.
– Word Grouping: Word grouping can be considered a clustering task where the words are the data points and the clusters are the semantic entities. Jaume et al. [6] proposed the Adjusted Rand Index (ARI) [4] as the metric for the grouping task. ARI measures the similarity between two clusterings; we use it to compare the clusters predicted by our model to the ground-truth groups of words in the dataset. ARI ranges from -1 to 1, where 0 represents a random assignment of clusters and 1 represents a perfect clustering. A negative value indicates that the clustering is worse than random.
– Entity Labeling: Entity labeling is a multi-class classification problem, so we evaluate the models using the F1 score.
– Entity Linking: Entity linking is evaluated in two different ways: as a ranking task and as a classification task. Following the work of Wang et al. [14], we evaluate our linking models with the ranking metrics mean Average Precision (mAP) and mean Rank (mRank). These metrics measure how likely a given answer is to be correctly linked to its corresponding question (as opposed to any other entity in the document); a higher mAP and a lower mRank indicate a better model. To evaluate our entity linking models as classifiers, we follow Jaume et al. [6] and use the F1 score. Reporting both types of metrics lets us directly compare our proposed models to the entity linking models proposed by both Jaume et al. [6] and Wang et al. [14].
Following the evaluation procedures of Jaume et al. [6] and Wang et al. [14], the entity labelling task is evaluated assuming that the optimal word groups are available, and the entity linking task is evaluated assuming that both the optimal word groups and the optimal entity labels are available. This allows us to evaluate one task at a time and enables fair comparisons with the baselines, which report results in this format.
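The sketch below illustrates how these metrics can be computed. It is our own example, not the evaluation code of [6] or [14]; the ranking part assumes exactly one ground-truth question per answer, in which case average precision reduces to the reciprocal rank, and the exact protocol of [14] may differ.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Word grouping: ARI between predicted and ground-truth cluster assignments.
gt_groups   = [0, 0, 1, 1, 2, 2]   # entity id per word (ground truth)
pred_groups = [0, 0, 1, 2, 2, 2]   # entity id per word (predicted)
print(adjusted_rand_score(gt_groups, pred_groups))

def ranking_metrics(score_matrix, gt_question_index):
    """score_matrix: (num_answers, num_questions) link scores.
    gt_question_index[a] is the index of the question truly linked to answer a."""
    ranks, aps = [], []
    for a, gt_q in enumerate(gt_question_index):
        order = np.argsort(-score_matrix[a])            # best score first
        rank = int(np.where(order == gt_q)[0][0]) + 1   # 1-based rank of the true question
        ranks.append(rank)
        aps.append(1.0 / rank)                          # AP with a single relevant item
    return float(np.mean(aps)), float(np.mean(ranks))   # (mAP, mRank)

scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.3, 0.5]])
print(ranking_metrics(scores, gt_question_index=[0, 2]))  # (1.0, 1.0)
```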

3.3 Baseline Comparisons

As discussed in Sect. 1, several different methods have been proposed to solve the three form understanding tasks (word grouping, entity labeling, and entity linking). These methods form the baselines for our model.
Word Grouping: Jaume et al. [6] propose a naive approach to the word grouping task: they extract lines of text using OCR engines such as Tesseract and Google Vision and consider each line to be one entity. We compare the performance of our MTL-FoUn model with this baseline word grouping model [6] in Table 1. Our proposed multi-task model clearly outperforms the naive line detection approach proposed in [6].

Table 1. Word grouping results.

  Method        Adjusted Rand Index (ARI)
  FUNSD Paper   0.41
  MTL-FoUn      0.69

Entity Labelling: Jaume et al. [6] perform entity labelling by obtaining a feature vector for each entity independently of the other entities in the document. Each entity is represented by a combination of the number of words in it, the BERT representation of its text, and its bounding box position; the feature representation of any given entity does not depend on any other entity. These feature representations are then passed to a simple classifier. This is the only available baseline that is directly comparable to our approach. Yiheng et al. [16] (LayoutLM) propose a classifier that classifies words into questions, answers, headers and others. The difference between their approach and approaches like ours and that of Jaume et al. [6] is that they run the classification at the word level instead of the entity level. Since they classify words instead of entities, they use the BIESO format (13 classes) for classification. In order to make a fair comparison, we also report the results of training their LayoutLM model for word-level classification with only 4 classes, and of our MTL-FoUn model trained on the same task (word-level classification). We report the results of the labelling task in Table 2; our model outperforms both baselines.
Entity Linking: Jaume et al. [6] propose an approach to entity linking that is very similar to the approach they propose for entity labelling. They obtain feature vectors for entities independently of each other, then concatenate pairs of entity vectors and use them for binary classification (link exists or not). Wang et al. [14], on the other hand, use a more sophisticated method involving 3 different feature extractors and an attention-based influence gate. Wang et al. [14] also report a baseline result where they replace their feature extractors with LayoutLM and run linking at the word level.

Table 2. Entity labelling results.

  Method           F1     Remarks
  FUNSD Paper      0.57   Entity Level (4 classes)
  MTL-FoUn         0.85   Entity Level (4 classes)
  LayoutLM Paper   0.79   Word Level (BIESO - 13 classes)
  LayoutLM         0.84   Word Level (4 classes)
  MTL-FoUn         0.83   Word Level (4 classes)

We use all 3 of the above-mentioned models as baselines, and also report results for two variants of our own model. We first train entity labelling and then fine-tune the trained LayoutLM backbone of the labelling model on the entity linking task; we call this model the Sequential Model. The other model is MTL-FoUn, the multi-task model trained simultaneously on all 3 tasks as explained in Sect. 2. The Sequential Model outperforms the baselines with a lower mRank and a higher F1 score. The multi-task model performs better than all the other models in terms of all three evaluation metrics (mRank, mAP and F1). Thus, multi-task learning helps in all individual tasks related to form understanding.

Table 3. Entity linking results. For F1 and mAP, higher is better; for mRank, lower is better.

  Method                 mAP    mRank   F1
  FUNSD Paper            0.23   11.68   0.04
  LayoutLM Word Level    0.47   7.11    -
  DocStruct Model        0.71   2.89    -
  Sequential Model       0.65   1.45    0.61
  MTL-FoUn               0.71   1.32    0.65

3.4 Hyperparameters and Model Details

This section describes the hyperparameters used and other details of our proposed models. As explained in Sect. 2.1, we use the text+layout version of the LayoutLM model provided by Wolf et al. [15]. Each of the three task-specific heads consists of 2 fully connected layers that map the 768-dimensional outputs of the LayoutLM backbone to lower dimensions: 64 dimensions for the word grouping task and 256 dimensions for the other two tasks. The δ and Δ values of the grouping loss in Eq. 1 are both set to 0.1. The distance function d in Eq. 1, used to train the word grouping model, is the cosine distance. The α, β, and γ values used to weight the three individual losses in Eq. 5 are set to 10.0, 1.0, and 0.1, respectively.


These specific hyperparameter values were chosen manually by running multiple experiments and observing the loss curves. Since the FUNSD dataset is relatively small, with only 149 training examples, we did not rely on any automated hyperparameter-learning approach.

4 Conclusion and Future Work

Form understanding involves three important sub-tasks: word grouping to form entities, entity labeling, and entity linking. In this work, we introduce a multi-task learning model, MTL-FoUn, which learns all three tasks simultaneously. We show that our multi-task approach captures richer representations for words and entities and performs better than the single-task approaches used previously. We show empirically that our MTL-FoUn model outperforms all baseline works for word grouping, entity labeling and entity linking.
The biggest bottleneck for form understanding performance is the lack of annotated data. Having more annotated data would not only help the models train better, but would also open up other avenues for performance improvement, such as learning hyperparameters on a validation set. Another avenue for improving performance is the inclusion of visual cues from the document image: using CNN- or Transformer-based visual encoders to encode the document image, and combining these visual features with the text and layout features, can help the model learn better overall representations and improve the performance of all three sub-tasks. We intend to extend this work to incorporate some of these ideas in the future.

References
1. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
2. Hao, L., Gao, L., Yi, X., Tang, Z.: A table detection method for PDF documents based on convolutional neural networks. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 287–292. IEEE (2016)
3. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
4. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
5. Islam, N., Islam, Z., Noor, N.: A survey on optical character recognition system. arXiv preprint arXiv:1710.05703 (2017)
6. Jaume, G., Ekenel, H.K., Thiran, J.P.: FUNSD: a dataset for form understanding in noisy scanned documents. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 2, pp. 1–6. IEEE (2019)
7. Liu, X., Gao, F., Zhang, Q., Zhao, H.: Graph convolution for multimodal information extraction from visually rich documents. arXiv preprint arXiv:1903.11279 (2019)

388

N. Prabhu et al.

8. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546 (2013)
9. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)
10. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1162–1167. IEEE (2017)
11. Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 629–633. IEEE (2007)
12. Soto, C.X., Soto, C.X.: Visual detection with context for document layout analysis. Tech. rep., Brookhaven National Lab. (BNL), Upton, NY (United States) (2019)
13. Tafti, A.P., Baghaie, A., Assefi, M., Arabnia, H.R., Yu, Z., Peissig, P.: OCR as a service: an experimental evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. In: Bebis, G., et al. (eds.) ISVC 2016. LNCS, vol. 10072, pp. 735–746. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50835-1_66
14. Wang, Z., Zhan, M., Liu, X., Liang, D.: DocStruct: a multimodal method to extract hierarchy structure in document for general form understanding. arXiv preprint arXiv:2010.11685 (2020)
15. Wolf, T., et al.: Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45. Association for Computational Linguistics, October 2020. https://www.aclweb.org/anthology/2020.emnlp-demos.6
16. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1192–1200 (2020)
17. Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1015–1022. IEEE (2019)

VisualWordGrid: Information Extraction from Scanned Documents Using a Multimodal Approach

Mohamed Kerroumi(B), Othmane Sayem(B), and Aymen Shabou(B)

DataLab Groupe, Credit Agricole S.A., Montrouge, France
[email protected], {othmane.sayem,aymen.shabou}@credit-agricole-sa.fr

Abstract. We introduce a novel approach to scanned document representation for field extraction. It allows the simultaneous encoding of the textual, visual and layout information in a 3-axis tensor that is used as input to a segmentation model. We improve the recent Chargrid and Wordgrid [10] models in several ways: first by taking the visual modality into account, then by boosting their robustness on small datasets while keeping the inference time low. Our approach is tested on public and private document-image datasets, showing higher performance compared to recent state-of-the-art methods.

Keywords: Information extraction · Multimodal · Scanned document analysis · WordGrid · Chargrid

1 Introduction

In a vast majority of business workflows, information extraction from templatic documents such as invoices, receipts, tax notices, etc. is largely a manual task. Automating this process has become a necessity as the number of client documents increases exponentially. Most industrial automated systems today have a rule-based approach for documents with a certain structure, which can be associated with a finite number of templates. However, documents often have a variety of layouts and structures. In order to understand the semantic content of these documents, the human brain uses the document's layout as well as the textual and visual information available in its contents. The challenge is to go beyond rule-based systems and to design end-to-end models that automatically understand both the visual structure of the document and the textual information it contains. For instance, in a document like an invoice, the total amount to pay is associated with a numerical value that frequently appears near terms such as total, total to pay and net to pay, and also after fields like total before taxes, taxes, cost, etc. Thus, as Katti et al. showed with Chargrid [10], combining positional and textual information is efficient for this task.


On the other hand, the visual content of a document was shown to improve model accuracy for document classification when combined with textual information [1]. In this article, we show that adding visual information to textual and positional features improves the performance of the information extraction task. The improvement is more significant for documents with rich visual characteristics such as tables, logos and signatures. We extend the work of Katti et al. [6,10] with a new approach (called VisualWordGrid) that combines the aforementioned modalities with two different strategies to achieve the best results in the task of information extraction from document images.
The paper is organized as follows: Sect. 2 presents related work on information extraction. Section 3 describes the datasets used for evaluation. Section 4 introduces the proposed approach. Section 5 discusses the obtained results. Finally, Sect. 6 provides our conclusions regarding the new method.

2 Related Work

Interest in solving the information extraction task has grown in fields where machine learning is used, from Natural Language Processing (NLP) to Computer Vision (CV). Depending on the representation of the document, different methods are applied to achieve the best possible performance. For instance, NLP methods transform each document into a 1D sequence of tokens before applying named entity recognition models to recognize the class of each word [11]. These methods can be successful when applied to documents with a simple layout, such as books or articles. However, for documents like invoices, receipts or tax notices, where visual objects such as tables and grids are common, these textual methods are less efficient: structural and visual information is essential to achieve good performance.
Alternatively, computer vision methods can also be very efficient for this task, specifically for documents like identity cards that are rich in visual features. In these approaches, only the image of the scanned document is given as input. Object detection and semantic segmentation are among the most used techniques for field extraction [8] from such documents, and an OCR engine is applied at the end of the pipeline on image crops to extract the text of the detected fields. These approaches can be very useful when dealing with documents with normalized templates, but they do not perform well on documents with varied templates and layouts.
Most recent studies try to exploit both the textual and the layout aspects of the document by combining NLP and CV methods in the extraction task. In the Chargrid [10] and BertGrid [6] papers, a document is represented as a 2D grid of character (or word) embeddings. The idea behind this representation is to preserve structural and positional information while exploiting the textual information contained in the document. Both papers reported a significant increase in the performance of the information extraction task compared to purely textual approaches. In a more general approach, Zhang et al. [20] recently proposed TRIE, an end-to-end text reading and information extraction approach in which a text reading module is introduced.



The model mixes textual and visual features for text reading and information extraction so that both tasks reinforce each other in a multi-task approach. More recently, Yiheng et al. proposed LayoutLM [16], a new method to leverage the visual information of a document in the learning process. Instead of using the text embedding of each token as the sole input, the relative position of tokens in the image and the feature map of the corresponding image crop within the document are added as well. Inspired by the BERT model [7], Yiheng et al. used the scanned document classification task as a supervised pre-training step for LayoutLM to learn the interactions between text and layout information, and then strengthened this learning with a semi-supervised pre-training step using a Masked Visual-Language Model (MVLM) objective in a multi-task setting. The dataset used for pre-training contains 11M documents, and the pre-training took 170 h on 8 GPUs; this approach therefore requires large computational resources.
Other works, focused on solving the document semantic segmentation task, introduced the idea of simultaneously encoding visual, textual and structural modalities into a tensor representation. For instance, Yang et al. [19] proposed a multimodal approach to extract the semantic structure of documents using a fully convolutional neural network, and Barman et al. [2] proposed a multimodal segmentation model that combines visual and textual features to extract the semantic structure of historical documents. Compared to these related works, we propose in this paper two multimodal document representation strategies suited to the information extraction task. The first one is simpler, yet highly effective compared to state-of-the-art multimodal approaches (while improving the extraction scores). The second one is similar to the related works [2,19] but slightly improved and adapted to the field extraction task (i.e. extracting small text regions).

3 Data

In order to evaluate our work, we report experiments on two datasets that exhibit interesting visual structures, which helps to improve the information extraction task with multimodal strategies.
RVL-CDIP Dataset [13]. This is a public dataset that was released to help improve and evaluate layout analysis techniques on scanned invoice documents. It contains 520 invoice images (Fig. 1) with their corresponding OCR files containing the extracted text, along with XML files containing the ground-truth bounding boxes of all the semantic fields. Each word in a given document of the dataset is assigned to a semantic region described by a box. Among the 6 available fields, we focus on extracting 4 of them: Receiver, Supplier, Invoice info, Total.
Tax Notice Dataset. This is an in-house private dataset. It contains 3455 tax notices issued since 2015 (Fig. 2). The documents are in French and their templates have changed over the years.



Fig. 1. Invoice example from RVL-CDIP with fields of interest.

Hence, template matching cannot be used as an approach to information extraction. The dataset was annotated by manually drawing bounding boxes around the fields of interest. There are mainly 6 entities to extract from each document: Year, Name, Address, Type of Notice, Reference Tax Income, and Family Quotient. The dataset contains first and second pages of tax notices, as some fields can appear on either page depending on the issue date. Moreover, a single page does not necessarily contain all fields.

4 Method

In this section, we introduce the VisualWordGrid approach, a new 2D representation of documents that extends the Chargrid philosophy by adding the visual aspect of the document to the textual and layout ones. We define two main models that differ in document representation and model architecture: VisualWordGrid-pad and VisualWordGrid-2encoders.

4.1 Document Representation

Our main idea is to add the visual information of the image to the textual and structural data used in the WordGrid representation. The most direct way of doing so is to append the corresponding RGB channels to each pixel embedding. While this concatenation has no impact on background pixels, it adds a large amount of noise to pre-trained word embeddings. The main challenge here is to



Fig. 2. Tax Notice fake example with fields of interest.

adapt the concatenation method to preserve textual embeddings, while adding the background visual information. Our representations of documents extend [10] using two strategies as follows. Using an OCR, each document can be represented as a set of words and their corresponding bounding boxes. The textual and layout information of each document can be represented in D = {(tk , bk )|k = 1, ..., n}, with tk the k-th token in the text of the document and bk = (xk , yk , wk , hk ) its corresponding bounding box in the image. VisualWordGrid-pad Our first model representation of the document is defined as follows :   (ed (tk ), 0, 0, 0  if ∃k such as (i, j) ≺ bk Wij = (0d , Rij , Gij , Bij otherwise (i, j) ≺ bk ⇐⇒ xk ≤ i ≤ xk + wk




where d is the embedding dimension, $e_d$ is the word embedding function, $0_d$ denotes an all-zero vector of size d, and $(R_{ij}, G_{ij}, B_{ij})$ are the RGB channels of the (i, j)-th pixel of the raw document image. In other words, for each point of the document image, if the point is included in a word's bounding box $b_k = (x_k, y_k, w_k, h_k)$, the vector representing



this point is the word's embedding padded with $0_3$; by setting the RGB channels to $0_3$, we drop the visual information related to this point. However, if the point is not included in any word's bounding box, the vector representing this point is the concatenation of $0_d$ and the RGB channels of the point, so the visual information is kept. Hence, the visual, textual and layout information of the document are encoded simultaneously in a 3-axis tensor of shape (H, W, d + 3), as shown in Fig. 3, while preserving the original information.
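A minimal NumPy sketch of this encoding is given below. It is our own illustration of Eq. (1), not the paper's implementation: the OCR token format, the embedding function and the embedding dimension d = 100 are placeholder assumptions.

```python
import numpy as np

def visual_word_grid_pad(image, ocr_tokens, embed, d=100):
    """Build the (H, W, d + 3) VisualWordGrid-pad tensor of Eq. (1).

    image:      (H, W, 3) uint8 RGB array of the scanned document.
    ocr_tokens: list of (word, (x, y, w, h)) in pixel coordinates.
    embed:      function mapping a word to a d-dimensional vector (e.g. Word2Vec/FastText).
    """
    h, w, _ = image.shape
    grid = np.zeros((h, w, d + 3), dtype=np.float32)
    # Background pixels: zero word embedding, RGB channels kept (normalized to [0, 1]).
    grid[:, :, d:] = image.astype(np.float32) / 255.0
    # Word pixels: word embedding, RGB channels set to zero.
    for word, (x, y, bw, bh) in ocr_tokens:
        grid[y:y + bh, x:x + bw, :d] = embed(word)
        grid[y:y + bh, x:x + bw, d:] = 0.0
    return grid

# Usage sketch with a dummy embedding function.
dummy_embed = lambda word: np.random.RandomState(abs(hash(word)) % (2**31)).rand(100)
image = np.full((64, 128, 3), 255, dtype=np.uint8)
grid = visual_word_grid_pad(image, [("Total", (10, 20, 40, 12))], dummy_embed)
print(grid.shape)  # (64, 128, 103)
```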

Fig. 3. VisualWordGrid encoding of an invoice sample. On the right, the proposed concatenation of a Wordgrid representation and the image of the document. On the left, a zoom of the previous figure.

VisualWordGrid-2encoders. Our second representation is similar to the CharGrid-Hybrid approach presented in [10]. Instead of encoding the document at the character level with a one-hot encoding, we encode it at the word level using Word2Vec [12] or FastText [4] embeddings. Hence, for each document we have two inputs:
– WordGrid encoding: This input encodes the textual and layout information of the document, similarly to the WordGrid encoding presented in [10]. For word encoding, we use Word2Vec or FastText embeddings:

$$W_{ij} = \begin{cases} e_d(t_k) & \text{if } \exists k \text{ such that } (i, j) \prec b_k \\ 0_d & \text{otherwise} \end{cases} \qquad (2)$$

– Image: The raw image of the document, resized to match the WordGrid encoding dimensions.


4.2 Model Architectures

In this section, we discuss the model architectures of both strategies.
VisualWordGrid-pad. Once the 2D representation of the document is encoded, we use it to train a neural segmentation model. Unlike the Chargrid and Wordgrid papers, we drop the bounding box regression block and keep only the semantic segmentation block, since there can be at most one instance of each class in our datasets. We use UNet [14] as the segmentation model and ResNet34 [9] as the backbone of the encoder. The weights of the backbone are initialized by transfer learning from a model pre-trained on the ImageNet classification task; these weights are available in the open-source package Segmentation Models [17]. The encoder extracts and encodes high-level features of the input grid into a small feature map, and the decoder expands this feature map to recover segmentation maps of the same size as the input grid, thus generating the predicted label masks. We use a softmax activation function for the final layer of the decoder. The shape of the decoder's output is (H, W, K + 1), where K is the number of fields of interest and 1 accounts for the background class. In the inference step, we iterate over the bounding box of each word in the OCR output and attribute to it the pixel-wise dominant class, to get the final prediction value for the corresponding field (see Fig. 4).
VisualWordGrid-2encoders. This model is composed of 2 encoders: the first one is the classic WordGrid encoder (2), and the second one is the raw image encoder. We keep one decoder and, for each block of the decoder, we concatenate the skip connections coming from both encoders (see Fig. 4).
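As an illustration of this pipeline, the sketch below builds a UNet with a ResNet34 encoder and shows the word-level aggregation used at inference. The paper cites the Segmentation Models package [17]; using its PyTorch variant (segmentation_models_pytorch) here is our assumption, as are the input channel count (d + 3 = 103) and the helper names.

```python
import numpy as np
import torch
import segmentation_models_pytorch as smp  # PyTorch variant of Segmentation Models (assumption)

K = 4  # number of fields of interest; class 0 is the background
# in_channels = d + 3 = 103 for a 100-dimensional word embedding (assumed value).
model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                 in_channels=103, classes=K + 1)

def predict_fields(grid, ocr_tokens):
    """grid: (H, W, d+3) VisualWordGrid tensor, with H and W multiples of 32 for the UNet.
    ocr_tokens: list of (word, (x, y, w, h)) in pixel coordinates.
    Each word receives the pixel-wise dominant class inside its bounding box."""
    x = torch.from_numpy(grid).permute(2, 0, 1).unsqueeze(0).float()
    with torch.no_grad():
        mask = model(x).softmax(dim=1).argmax(dim=1)[0].numpy()   # (H, W) class map
    fields = {}
    for word, (bx, by, bw, bh) in ocr_tokens:
        crop = mask[by:by + bh, bx:bx + bw]
        label = int(np.bincount(crop.ravel(), minlength=K + 1).argmax())
        if label != 0:  # 0 = background
            fields.setdefault(label, []).append(word)
    return fields
```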

4.3 Implementation Details

In this section, we provide implementation details.
Word Embedding Function. We use different word embedding functions depending on the dataset. For RVL-CDIP, we use Word2Vec embeddings pre-trained on the Wikipedia corpus, publicly available through Wiki2Vec [18]. The choice of Wiki2Vec is motivated by the good quality of the OCR files: words are correctly recognized and most of them have a corresponding embedding. Unlike the RVL-CDIP dataset, the OCR outputs of our Tax Notice dataset are noisy, due to the quality of the customers' document scans, and we observed frequent misspelling errors in the Tesseract 4.1 [15] outputs. Our experiments show that a custom FastText embedding trained on the corpus of the Tax Notice dataset is the word embedding function that best handles this noise. As explained in [4], a word embedding in this approach is the sum of n-gram subword embeddings. Hence, even in the case of a misspelled



Fig. 4. VisualWordGrid pipelines.



or a dropped character in a token, its embedding does not differ much from the embedding of the original word.
Loss Function. The loss function we use for training is the sum of the cross-entropy loss for segmentation ($L_{seg}$) and the Intersection over Union loss ($L_{IoU}$):

$$\mathrm{Loss} = L_{seg} + L_{IoU} \qquad (3)$$


– Cross Entropy Loss: The cross-entropy loss penalizes pixel misclassification; the goal is to assign each pixel to its ground-truth field:

$$L_{seg} = \sum_{x \in \Omega} -\log\, \hat{p}_{\hat{l}(x)}(x) \qquad (4)$$

with ˆl(x) the ground truth label of the point x. – Intersection over Union Loss: This loss function is often used to train neural networks for a segmentation task. It’s a differentiable approximation of the IoU metric and is the most indicative of success for segmentation tasks as explained in [3]. In our case, it significantly increases performances of the model compared to a model trained only with the cross entropy. The IoU metric is defined as : IoU =

|T ∩ P | |T P | I = = U |T ∪ P | |T + P − (T P )| LIoU = 1 − IoU

(5) (6)

where T is the true labels of the image pixels and P is their prediction labels. We also use the IoU metric to monitor the training of our models. Metrics To evaluate the performance of the different models, we used two metrics: – Word Accuracy Rate (WAR): It’s the same metric as the one used in [10]. It’s similar to the Levenshtein distance computed on the token level instead of the character level. It counts the number of substitutions, insertions and deletions between the ground-truth and the predicted instances. This metric is usually used to evaluate speech-to-text models. The WAR formula is as follows: W AR = 1 −
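A minimal sketch of the combined loss of Eqs. (3)-(6), assuming TensorFlow/Keras and one-hot ground-truth masks of shape (batch, H, W, K + 1); the function names are illustrative, not taken from the paper's code.

```python
import tensorflow as tf

def soft_iou_loss(y_true, y_pred, eps=1e-6):
    # Differentiable IoU on soft probabilities: I = T*P, U = T + P - T*P.
    inter = tf.reduce_sum(y_true * y_pred, axis=[1, 2])
    union = tf.reduce_sum(y_true + y_pred - y_true * y_pred, axis=[1, 2])
    iou = (inter + eps) / (union + eps)
    return 1.0 - tf.reduce_mean(iou)

def segmentation_loss(y_true, y_pred):
    # Eq. (3): cross entropy + IoU loss.
    ce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))
    return ce + soft_iou_loss(y_true, y_pred)
```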

Metrics. To evaluate the performance of the different models, we use two metrics:

– Word Accuracy Rate (WAR): this is the same metric as the one used in [10]. It is analogous to the Levenshtein distance computed at the token level instead of the character level: it counts the substitutions, insertions and deletions between the ground-truth and the predicted instances. This metric is commonly used to evaluate speech-to-text models. The WAR formula is:

WAR = 1 - \frac{\#[insertions] + \#[deletions] + \#[substitutions]}{N}   (7)

where N is the total number of tokens in the ground-truth instance of a specific field. The WAR of a document is the average over all fields.


– Field Accuracy Rate (FAR): this metric evaluates how well the model extracts complete and exact field information. A field is the set of words belonging to the same entity. The metric counts the number of exact matches between the ground-truth and the predicted instances. It is useful in industrial applications, where we need to know in how many cases the model extracts the whole field correctly, for control purposes for example. The FAR formula is:

FAR = \frac{\#[Fields\ exact\ match]}{N_{fields}}   (8)

where #[Fields exact match] is the number of fields correctly extracted from the document, with an exact match between the ground-truth and the predicted word values, and N_{fields} is the total number of fields. Note that for any processed document in the evaluation set with no target field in the ground truth, we attribute an empty string to the value of each field, so false positives are penalized too. In the next section, we report for each model the average WAR and FAR over the documents of the test set.
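A minimal sketch of both metrics, assuming the ground truth and prediction of each field are available as token lists (for WAR) and as field-to-string dictionaries (for FAR); names are illustrative.

```python
def token_edit_distance(ref, hyp):
    # Levenshtein distance at the token level: insertions, deletions, substitutions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1]

def war(ref_tokens, hyp_tokens):
    # Eq. (7): 1 - (insertions + deletions + substitutions) / N.
    return 1.0 - token_edit_distance(ref_tokens, hyp_tokens) / max(len(ref_tokens), 1)

def far(ref_fields, hyp_fields):
    # Eq. (8): fraction of fields extracted with an exact match.
    exact = sum(1 for f in ref_fields if hyp_fields.get(f, "") == ref_fields[f])
    return exact / len(ref_fields)
```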

5 Experiments

In this section, we compare our approaches (VisualWordGrid-pad, VisualWordGrid-2encoders) to two others on both datasets. We report their average scores (FAR, WAR), their inference time on CPU for a single document, and their number of trainable parameters. The two competing approaches are the following:

– Layout Approach: this approach encodes the layout only. Instead of using word embeddings or pixel RGB channels to encode a document as in (1), it uses a simpler 2D encoding suited to a segmentation task (see the sketch after this list):

W_{ij} = \begin{cases} (1, 1, 1) & \text{if } \exists k \text{ such that } (i, j) \prec b_k \\ (0, 0, 0) & \text{otherwise} \end{cases}   (9)

We then use this document encoding (Fig. 5) as input to train an information extraction model with the proposed architecture, loss and model hyper-parameters.
– WordGrid: this approach is very similar to BERTgrid [6]. Instead of using a BERT [7] model to generate contextual embeddings, we use a pre-trained Word2Vec embedding for the RVL-CDIP dataset and a custom FastText embedding for the Tax Notice dataset. Equation (2) gives the document encoding formula. We keep the same model architecture, loss function and hyper-parameters as in the VisualWordGrid model.
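A minimal sketch of the layout-only encoding of Eq. (9), assuming OCR word boxes given as (x1, y1, x2, y2) pixel rectangles; the box format and names are illustrative assumptions.

```python
import numpy as np

def layout_grid(word_boxes, height, width):
    # Eq. (9): (1, 1, 1) wherever a word box covers the pixel, (0, 0, 0) elsewhere.
    grid = np.zeros((height, width, 3), dtype=np.float32)
    for (x1, y1, x2, y2) in word_boxes:
        grid[y1:y2, x1:x2, :] = 1.0
    return grid
```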


Fig. 5. Invoice sample and its encoding using the Layout approach

5.1 Datasets

Since the RVL-CDIP dataset volume is very small, we do not use a classic split into training and validation sets. Instead, we use a k-fold split of the dataset with k = 5. For each experiment, we run 5 tests: in each one, the model is trained on 80% of the dataset and the remaining 20% is split equally into validation and test sets. We report the average of the metrics over the 5 tests. This way, the metric values do not depend on the seed of the split and give a more reliable picture of real model performance. For the Tax Notice dataset, we assign 80% of the dataset to training, 15% to validation and 5% to test, on which we report our results. The OCR step used to extract the textual information was performed with the open-source OCR engine Tesseract 4.1.
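A minimal sketch of this 5-fold protocol, assuming scikit-learn and placeholder document identifiers.

```python
from sklearn.model_selection import KFold

doc_ids = [f"doc_{i}" for i in range(520)]  # placeholder identifiers
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for train_idx, holdout_idx in kfold.split(doc_ids):
    train_ids = [doc_ids[i] for i in train_idx]            # 80% for training
    half = len(holdout_idx) // 2
    val_ids = [doc_ids[i] for i in holdout_idx[:half]]     # 10% validation
    test_ids = [doc_ids[i] for i in holdout_idx[half:]]    # 10% test
    # ... train on train_ids, select on val_ids, report metrics on test_ids ...
```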

5.2 Results

For all experiments, we use the Adam optimizer with a learning rate of 0.001 and a batch size of 8. We use an NVIDIA Quadro RTX 6000 GPU with 24 GB of memory and the Keras framework [5] to build and train the models. The inference time is measured on an Intel Xeon W-2133 CPU (3.60 GHz).
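A minimal sketch of this training configuration, assuming the model and loss functions from the earlier sketches (model, segmentation_loss, soft_iou_loss, K) are in scope; the number of epochs and the data arrays are placeholders, not the paper's settings.

```python
import tensorflow as tf

def iou_metric(y_true, y_pred):
    # Soft IoU tracked as the monitoring metric for callbacks.
    return 1.0 - soft_iou_loss(y_true, y_pred)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=segmentation_loss,
    metrics=[iou_metric],
)
model.fit(train_grids, train_masks,
          validation_data=(val_grids, val_masks),
          batch_size=8, epochs=50)
```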


Table 1. Models performances on the RVL-CDIP dataset.

Approach                 | FAR   | WAR   | Inference time | #Parameters
Layout Only              | 23.0% | 5.4%  | 2.14 s         | 24 439 384
WordGrid                 | 27.7% | 10.8% | 2.22 s         | 24 743 673
VisualWordGrid-pad       | 28.7% | 18.7% | 3.77 s         | 24 753 084
VisualWordGrid-2encoders | 26.9% | 17.0% | 6.08 s         | 48 003 004

Table 1 shows the scores (WAR, FAR) of the different approaches on the RVL-CDIP dataset. VisualWordGrid-pad clearly gives the best FAR and WAR scores: our proposed encoding improves the WordGrid FAR and WAR by 1.0 and 7.9 points, respectively. Moreover, it exploits all the visual, textual and structural content of the documents while keeping the inference time and the number of parameters close to those of WordGrid. Unlike Katti et al. [10], we observe an increase in the WAR score when using the two-encoder approach (VisualWordGrid-2encoders) to capture the visual and textual aspects of the document: it boosts the WordGrid performance, with the WAR going up by 6.2 points. This improvement comes from the modifications we made to make the model more robust for the information extraction task: we used a ResNet34 backbone for the encoder and took advantage of transfer learning to speed up training, and we extended the cross-entropy loss used in [10] with the IoU loss. Note that we also used the IoU as the callback metric. Similarly, we tested the different approaches on the Tax Notice dataset and report the results in Table 2.

Table 2. Models performances on the Tax Notice dataset.

Approach                 | FAR   | WAR   | Inference time | #Parameters
Layout Only              | 83.3% | 92.3% | 5.29 s         | 24 439 674
WordGrid                 | 83.6% | 92.4% | 5.70 s         | 24 743 963
VisualWordGrid-pad       | 83.9% | 92.9% | 5.92 s         | 24 753 374
VisualWordGrid-2encoders | 85.8% | 93.6% | 6.19 s         | 48 003 294

The VisualWordGrid-pad approach slightly improves on the WordGrid scores, while VisualWordGrid-2encoders gives the best performance, at the expense of a slightly higher inference time. Since many industrial applications of information extraction require the smallest possible inference time, VisualWordGrid-pad would often be the best choice: it leverages the visual, textual and layout information of a document while keeping the number of trainable parameters roughly the same as WordGrid.

6 Conclusion

VisualWordGrid is a simple yet effective 2D representation of documents that encodes textual, layout and visual information simultaneously. The grid-based representation combines token embeddings with the image's RGB channels.


We can take advantage of these multimodal inputs to perform several document understanding tasks. For the information extraction task, VisualWordGrid gives better results than state-of-the-art models on two datasets (the public RVL-CDIP dataset and our private Tax Notice dataset), while keeping the number of model parameters and the inference time roughly the same, especially with the padding strategy. This makes the approach suitable for production in many settings.

References

1. Audebert, N., Herold, C., Slimani, K., Vidal, C.: Multimodal deep networks for text and image-based document classification. In: Cellier, P., Driessens, K. (eds.) ECML PKDD 2019. CCIS, vol. 1167, pp. 427–443. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43823-4_35
2. Barman, R., Ehrmann, M., Clematide, S., Oliveira, S.A., Kaplan, F.: Combining visual and textual features for semantic segmentation of historical newspapers. J. Data Min. Digit. Humanit. (2020)
3. van Beers, F., Lindström, A., Okafor, E., Wiering, M.: Deep neural networks with intersection over union loss for binary image segmentation. In: ICPRAM (2019)
4. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
5. Chollet, F., et al.: Keras (2015). https://github.com/fchollet/keras
6. Denk, T., Reisswig, C.: BERTgrid: contextualized embedding for 2D document representation and understanding. CoRR abs/1909.04948 (2019). http://arxiv.org/abs/1909.04948
7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)
8. Hao, L., Gao, L., Yi, X., Tang, Z.: A table detection method for PDF documents based on convolutional neural networks. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 287–292 (2016). https://doi.org/10.1109/DAS.2016.23
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
10. Katti, A.R., et al.: Chargrid: towards understanding 2D documents. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 4459–4469. Association for Computational Linguistics (2018). https://www.aclweb.org/anthology/D18-1476/
11. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 260–270. Association for Computational Linguistics, San Diego (2016). https://doi.org/10.18653/v1/N16-1030
12. Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
13. Riba, P., Dutta, A., Goldmann, L., Fornés, A., Ramos, O., Lladós, J.: Table detection in invoice documents by graph neural networks. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 122–127 (2019)


14. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: MICCAI (2015)
15. Smith, R.: An overview of the Tesseract OCR engine. In: Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR), pp. 629–633 (2007)
16. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020). https://doi.org/10.1145/3394486.3403172
17. Yakubovskiy, P.: Segmentation Models (2019)
18. Yamada, I., et al.: Wikipedia2Vec: an efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia. arXiv preprint arXiv:1812.06280 (2020)
19. Yang, X., Yumer, E., Asente, P., Kraley, M., Kifer, D., Giles, C.L.: Learning to extract semantic structure from documents using multimodal fully convolutional neural network. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 4342–4351 (2017)
20. Zhang, P., et al.: TRIE: end-to-end text reading and information extraction for document understanding (2020)

A Transformer-Based Math Language Model for Handwritten Math Expression Recognition

Huy Quang Ung, Cuong Tuan Nguyen, Hung Tuan Nguyen, Thanh-Nghia Truong, and Masaki Nakagawa

Tokyo University of Agriculture and Technology, Tokyo, Japan
[email protected], [email protected]

Abstract. Handwritten mathematical expressions (HMEs) contain ambiguities in their interpretation, sometimes even for humans. Several math symbols are very similar in writing style, such as dot and comma or "0", "O", and "o", which is a challenge for HME recognition systems to handle without contextual information. To address this problem, this paper presents a Transformer-based Math Language Model (TMLM). Based on the self-attention mechanism, the high-level representation of an input token in a sequence of tokens is computed from how it relates to the previous tokens, so TMLM can capture long dependencies and correlations among symbols and relations in a mathematical expression (ME). We trained the proposed language model on a corpus of approximately 70,000 LaTeX sequences provided in CROHME 2016. TMLM achieved a perplexity of 4.42, outperforming the previous math language models, i.e., the N-gram and recurrent neural network-based language models. In addition, we combine TMLM with a stochastic context-free grammar-based HME recognition system, using a weighting parameter to re-rank the top-10 best candidates. The expression rates on the testing sets of CROHME 2016 and CROHME 2019 improved by 2.97 and 0.83 percentage points, respectively.

Keywords: Language model · Mathematical expressions · Handwritten · Transformer · Self-attention · Recognition

1 Introduction

Nowadays, devices such as pen-based or touch-based tablets and electronic whiteboards are becoming more and more popular as educational media. Learners can use them to take courses and do exercises. In particular, educational institutions can use these devices to support their online learning platforms in the context of SARS-CoV-2 (COVID-19) spreading widely worldwide. These devices provide a user-friendly interface for learners to input handwritten mathematical expressions, which is more natural and quicker than common editors such as Microsoft Equation Editor, MathType, and LaTeX. Due to the demands of real applications, research on handwritten mathematical expression (HME) recognition has played an important role in document analysis since the 1960s and has been very active during the last two decades.



The performance of HME recognition systems has improved significantly over the series of Competitions on Recognition of Handwritten Mathematical Expressions (CROHME) [1]. However, challenging problems remain in HME recognition. One problem is that there are many ambiguities in the interpretation of HMEs. For instance, some math symbols are very similar in writing style, such as "0", "o", and "O", or dot and comma. These ambiguities challenge HME recognition without contextual information. In addition, recognition systems that do not use predefined grammar rules, such as encoder-decoder models [2, 3], may produce syntactically unacceptable misrecognitions. One promising solution to these problems is to combine an HME recognition system with a math language model. Employing language models for handwritten text recognition has proven effective in previous research [4–6].

A mathematical expression (ME) has a 2D structure that can be represented in several formats, such as MathML, one-dimensional LaTeX sequences, and two-dimensional symbol layout trees [7]. Almost all recent HME recognition systems output their predictions as LaTeX sequences, since LaTeX is commonly used in real applications. Thus, we focus on an ME language model for LaTeX sequences in this paper.

Modeling MEs shares several challenges with natural language processing. First, there is a lack of ME corpora, as MEs rarely appear in daily documents. Secondly, there are infinitely many combinations of symbols and spatial relationships in MEs. Thirdly, there are long-term dependencies and correlations among symbols and relations in an ME. For example, "(" and ")" are often used to enclose a sub-expression, and if they enclose a long sub-expression, it is challenging to learn the dependency between them.

There are several methods for modeling MEs. The statistical N-gram model was used in [8]. It assigns a probability to the n-th token given the (n−1) previous tokens based on maximum likelihood estimation. However, the N-gram model might not capture long dependencies due to its limited context length, and increasing this length leads to the problem of estimating a high-dimensional distribution, which requires a sufficiently large training corpus. In practice, the trigram model is usually used, and the 5-gram model is more effective when training data is sufficient. The recurrent neural network-based language model (RNNLM) proposed in [9] has been utilized in HME recognition systems [1, 2]. RNNLM predicts the n-th token given the (n−1) previous tokens from previous time steps, but it still faces the problem of long-term dependencies. In recent years, transformer-based networks using a self-attention mechanism have achieved impressive results in natural language processing (NLP). For the language modeling task, Al-Rfou et al. [10] presented a deep transformer-based language model (TLM) for character-level modeling and showed its effectiveness against RNNLM.

In this paper, we present the first transformer-based math language model (TMLM). Based on the self-attention mechanism, the high-level representation of an input token in a sequence of tokens is computed from how it relates to the previous tokens, so TMLM can capture long dependencies and correlations in MEs. We then propose a method to combine TMLM with a stochastic context-free grammar-based HME recognizer.


In our experiments, we show that TMLM outperforms the N-gram model and RNNLM in the task of modeling MEs.

The rest of the paper is organized as follows. Section 2 briefly presents related research. Section 3 describes our method in detail. Section 4 presents our experiments for evaluating the proposed method. Finally, Sect. 5 concludes our work and discusses future work.

2 Related Works

Language models are known as generative, autoregressive models, since they predict the next state of a variable given its previous states. In NLP, Radford et al. [11, 12] proposed the Generative Pre-Training models (GPT and GPT-2), which achieved strong results on NLP benchmarks based on the vanilla transformer network of [13]. Their models are trained with a causal language modeling loss and then fine-tuned for multitask learning such as text classification, question answering, and similarity. Dai et al. [14] presented Transformer-XL, which captures an extended context length by using a recurrent architecture over context segments. Transformer-XL can learn dependencies that are 80% longer than RNNs and 450% longer than TLM, and its inference is 1,800 times faster than TLM thanks to caching and reusing previous computations. XLNet, presented by Yang et al. [15], is the first transformer-based language model utilizing bidirectional contexts; it significantly outperformed the conventional BERT model [16] on 20 NLP benchmark tasks.

There are several studies combining HME recognition systems with pre-trained language models. Wu et al. [8] combined their encoder-decoder HME recognizer with a pre-trained 4-gram model to obtain the N best paths. Zhang et al. [2] utilized a Gated Recurrent Unit-based language model (GRULM) for their HME recognizer, an encoder-decoder model with temporal attention; this attention helps the decoder determine the reliability of the spatial attention and of the language model at each time step. The language models improved the expression rate by around 1 percentage point. Hence, the approach of combining language models with recognition systems is important to study. In CROHME 2019 [1], the Samsung R&D team used a probabilistic context-free grammar-based recognizer combined with two bigram language models, i.e., a language sequence model and a language model for spatial relationships. Besides, the MyScript team used LSTM-based language models for their grammar-based recognition system.

3 Proposed Method

Given a sequence of tokens X = (x_1, x_2, ..., x_N), constructing a language model amounts to estimating the joint probability P(X), which is often auto-regressively factorized as P(X) = \prod_t P(x_t \mid X_{<t}).
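A minimal sketch of this factorization and of the perplexity measure used to compare the language models, with made-up conditional probabilities.

```python
import math

def sequence_log_prob(step_probs):
    # step_probs[t] = P(x_t | x_1, ..., x_{t-1}) as produced by a language model.
    return sum(math.log(p) for p in step_probs)

def perplexity(step_probs):
    # PPL = exp(-(1/N) * sum_t log P(x_t | x_<t))
    return math.exp(-sequence_log_prob(step_probs) / len(step_probs))

print(perplexity([0.5, 0.2, 0.8, 0.4]))  # example for a 4-token LaTeX sequence
```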