Document Analysis and Recognition – ICDAR 2021: 16th International Conference, Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part II (Lecture Notes in Computer Science, 12822) [1st ed. 2021] 3030863301, 9783030863302

This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 1

630 169 116MB

English Pages 893 [878] Year 2021

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Foreword
Preface
Organization
Contents – Part II
Document Analysis for Literature Search
Towards Document Panoptic Segmentation with Pinpoint Accuracy: Method and Evaluation
1 Introduction
2 Related Work
3 Document Panoptic Segmentation Model
3.1 System Overview
3.2 Column Region Segmentation
3.3 Line Region Splitting
3.4 Line Region Sequence Labeling
3.5 Page Representation
4 Evaluation Metric
5 Experiments and Results
5.1 Dataset
5.2 Baseline Methods and Proposed Method
5.3 Experimental Results
5.4 Application
6 Conclusion
References
A Math Formula Extraction and Evaluation Framework for PDF Documents
1 Introduction
2 Related Work
3 Formula Detection Pipeline Components
3.1 SymbolScraper: Symbol Extraction from PDF
3.2 Formula Detection in Page Images
3.3 Recognizing Formula Structure (Parsing)
4 Results for Pipeline Components
5 New Tools for Visualization and Evaluation
6 Conclusion
References
Toward Automatic Interpretation of 3D Plots
1 Introduction
2 Related Work
3 Method
3.1 Network Architecture
3.2 The SurfaceGrid Dataset
3.3 Training Details
4 Experiments
4.1 Final Model Performance
4.2 Impact of Training Set Components on Performance
4.3 Performance on Real Plots and Wireframe Models
5 Conclusion
References
Document Summarization and Translation
Can Text Summarization Enhance the Headline Stance Detection Task? Benefits and Drawbacks
1 Introduction
2 Related Work
3 Text Summarization Approaches
3.1 Extractive Summarization
3.2 Abstractive Summarization
3.3 Hybrid Summarization
4 Stance Detection Approaches
4.1 Deep Learning Stance Detection Approach
4.2 Machine Learning Stance Detection Approach
5 Evaluation Environment
5.1 Datasets
5.2 Experiments
5.3 Evaluation Metrics and Results
5.4 Discussion
6 Conclusion and Future Work
References
The Biased Coin Flip Process for Nonparametric Topic Modeling
1 Introduction
2 Background
2.1 Dirichlet Process
2.2 Latent Dirichlet Allocation
2.3 Hierarchical Dirichlet Processes
3 Methods
3.1 Hierarchical Biased Coin Flip Process
4 Results
4.1 Perplexity
4.2 K-Finding
4.3 Gaussian Mixture Models
5 Conclusion
References
CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization
1 Introduction
2 CoMSum: Automatically Generated Summarization Dataset
2.1 Overview
2.2 Creating Contextual Multi-document Summarization (CoMSum)
2.3 Quality Evaluation of CoMSum
3 SIBERT: A Use of CoMSum for Extractive Summarization
3.1 The SIBERT Model
3.2 Experiments
4 Related Work
5 Conclusion
References
RTNet: An End-to-End Method for Handwritten Text Image Translation
1 Introduction
2 Related Works
2.1 Machine Translation
2.2 Text Line Recognition
2.3 Handwritten Text Line Translation
3 Methodology
3.1 Architecture
3.2 Theory
3.3 Training Strategy
3.4 OCR Model
3.5 MT Model
3.6 Autoregressive Decoding
4 Experiments
4.1 Dataset
4.2 Pre-training Results
4.3 End-to-End Results
5 Conclusions
References
Multimedia Document Analysis
NTable: A Dataset for Camera-Based Table Detection
1 Introduction
2 Related Work
2.1 Table Detection Dataset
2.2 Table Detection
3 The NTable Dataset
3.1 NTable-ori and NTable-cam
3.2 NTable-gen
3.3 Manually and Automatic Annotation
4 Experiment
4.1 Data and Metrics
4.2 Experimental Results and Analysis
5 Conclusion
References
Label Selection Algorithm Based on Boolean Interpolative Decomposition with Sequential Backward Selection for Multi-label Classification
1 Introduction
2 Preliminaries
3 A Label Selection Algorithm Based on Boolean Interpolative Decomposition with Sequential Backward Selection
3.1 Uninformative Label Reduction via Exact Boolean Matrix Decomposition
3.2 Less Informative Label Reduction via Sequential Backward Selection
3.3 Label Selection Algorithm Based on BID with SBS
4 Experiments
4.1 Six Benchmark Data Sets and Two Evaluation Metrics
4.2 Experimental Settings
4.3 Reconstruction Error Analysis for Three BMD-Type Methods
4.4 Classification Performance Comparison and Analysis
5 Conclusions
References
GSSF: A Generative Sequence Similarity Function Based on a Seq2Seq Model for Clustering Online Handwritten Mathematical Answers
1 Introduction
2 Related Works
2.1 Computer-Assisted Marking
2.2 Seq2seq OnHME Recognizer
3 Proposed Method
3.1 Generative Sequence Similarity Function
3.2 Similarity-Based Representation
4 Clustering-Based Measurement
5 Experiments
5.1 Datasets
5.2 OnHME Recognizer
5.3 Experimental Settings
5.4 Evaluation
5.5 Visualizing Similarity-Based Representation Matrix
6 Conclusion and Future Works
References
C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis
1 Introduction
2 IMCDB: Proposed Indian Mythological Comic Dataset
3 C2VNet Architecture
3.1 Preprocessing
3.2 Extraction of Comic Elements
3.3 Video Asset Creation
3.4 Video Composition
4 Experiments and Results
4.1 Training and Testing
4.2 Results
5 Conclusion
References
LSTMVAEF: Vivid Layout via LSTM-Based Variational Autoencoder Framework
1 Introduction
2 Related Work
3 Framework
3.1 LSTM-Based VAE Framework
3.2 Loss Function Design
4 Experiments
4.1 Details for Generation
4.2 Layout Generation Results
4.3 Pre-training for Document Layout Analysis
5 Conclusions
References
Mobile Text Recognition
HCRNN: A Novel Architecture for Fast Online Handwritten Stroke Classification
1 Introduction
2 Related Work
3 HDNN for Handwritten Stroke Classification
3.1 Background
3.2 HCRNN Architecture
4 Fragment Pooling
4.1 Fragment Max Pooling
4.2 Fragment Average Pooling
4.3 Fragment Stochastic Pooling
4.4 Mixed Fragment Pooling
5 Experimental Results
5.1 Dataset Description
5.2 Evaluation Procedure
5.3 Ablation Study
6 Conclusion and Outlook
References
RFDoc: Memory Efficient Local Descriptors for ID Documents Localization and Classification
1 Introduction
2 Previous Work
3 A Dataset of Aligned Patches from ID Document Images
4 Proposed Image Descriptor
4.1 Image Patch Descriptor Construction
4.2 Training and Features Selection
5 Experimental Evaluation
5.1 Training and Features Selection
5.2 Performance in ID Document Patches Verification
5.3 Complex Camera-Based Document Recognition
5.4 Memory Consumption
5.5 Matching Speed Comparison
5.6 Conclusion
References
Dynamic Receptive Field Adaptation for Attention-Based Text Recognition
1 Introduction
2 Related Work
2.1 Segmentation-Based Approaches
2.2 Sequence-Based Approaches
3 Method
3.1 Feature Encoder
3.2 Recurrent Attention Decoder
3.3 Dynamic Feature Adaption
3.4 Memory Attention
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparisons with State-of-the-Arts
4.5 Visual Illustrations
5 Conclusion
References
Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition
1 Introduction
2 Related Work
2.1 Text Detection and Recognition
2.2 End-to-End Text Recognition
2.3 Real-Time Image Recognition
3 Method
3.1 Text-Box Detector and Character Spotter
3.2 Character Decoder
3.3 Feature Extractor
3.4 Training
4 Experiments
4.1 Results
4.2 On-Device Benchmarking
5 Conclusion
References
MIDV-LAIT: A Challenging Dataset for Recognition of IDs with Perso-Arabic, Thai, and Indian Scripts
1 Introduction
2 Datasets for the Indian, Thai, and Perso-Arabic Scripts
3 MIDV-LAIT Dataset
3.1 MIDV-LAIT Target Problems
4 Baseline
4.1 Tesseract OCR 4.1.0
4.2 Experimental Results
5 Conclusion and Future Work
References
Determining Optimal Frame Processing Strategies for Real-Time Document Recognition Systems
1 Introduction
2 Framework
2.1 Real-Time Video Frames Recognition Model
2.2 Frame Processing Strategies
3 Experimental Evaluation
3.1 Dataset
3.2 Frame Processing Time
3.3 Comparing Combination Strategies
3.4 Performance Profiles with Stopping Rules
4 Discussion
References
Document Analysis for Social Good
Embedded Attributes for Cuneiform Sign Spotting
1 Introduction
2 Related Work
2.1 Cuneiform Character Retrieval
2.2 Word Embeddings
3 Cuneiform Sign Expressions
3.1 Gottstein System
3.2 Gottstein with Spatial Information
4 Evaluation
4.1 Database
4.2 Model Architecture
4.3 Training Setup
4.4 Similarity Metric
4.5 Evaluation Protocol
4.6 Evaluation Metrics
4.7 Results and Discussion
5 Conclusion
References
Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach
1 Introduction
2 Related Work
3 Learning Objectives
4 Training Process
5 Experiments
6 Conclusions
References
Two-Step Fine-Tuned Convolutional Neural Networks for Multi-label Classification of Children's Drawings
1 Introduction
2 Related Work
3 Methodology
3.1 Data Preparation
3.2 Two-Step Fine Tuning
3.3 Multi-label Classification
4 Experimental Protocol, Results and Analysis
5 Conclusion
References
DCINN: Deformable Convolution and Inception Based Neural Network for Tattoo Text Detection Through Skin Region
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Skin Region Detection Using YCrCb and HSV Color Space
3.2 Deep Model for Tattoo Text Detection from Skin Region
4 Experimental Results
4.1 Ablation Study
4.2 Experiments for Tattoo Text Detection
4.3 Experiments on Natural Scene Text Dataset
5 Conclusion and Future Work
References
Sparse Document Analysis Using Beta-Liouville Naive Bayes with Vocabulary Knowledge
1 Introduction
2 The Dirichlet Smoothing for Multinomial Bayesian Inference
3 The Liouville Assumption Using Vocabulary Knowledge
4 Experimental Results
4.1 Emotion Intensity Analysis
4.2 Hate Speech Detection
5 Conclusion
References
Automatic Signature-Based Writer Identification in Mixed-Script Scenarios*-6pt
1 Introduction
2 Proposed Method
2.1 Proposed Architecture
3 Experiments
3.1 Dataset
3.2 Experimental Framework and Results with Ablations Studies on Parameters
3.3 Comparative Study with State-of-the-Art and Analysis
3.4 Error Analysis
4 Conclusions and Future Directions
References
Indexing and Retrieval of Documents
Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting
1 Introduction
2 Related Work
2.1 Word Spotting
2.2 Ranking Metrics
3 Architecture
3.1 Problem Formulation
3.2 Encoder Networks
3.3 Learning Objectives
3.4 End-to-End Training
4 Experimental Evaluation
4.1 Implementation Details
4.2 Experimental Discussion
5 Conclusions
References
A-VLAD: An End-to-End Attention-Based Neural Network for Writer Identification in Historical Documents
1 Introduction
2 Related Work
2.1 Handcrafted Feature-Based Methods
2.2 Deep Neural Network-Based Methods
3 Proposed Method
3.1 CNN-Based Feature Extraction
3.2 Attention Classifier
3.3 A-VLAD Core Architecture
3.4 Online Triplet Training
4 Experiments
4.1 Datasets
4.2 Evaluation Methods
4.3 A-VLAD Configuration
5 Results and Discussion
5.1 Hyperparameter Finetuning
5.2 Performance on HisFragIR20 Dataset
6 Conclusion
References
Manga-MMTL: Multimodal Multitask Transfer Learning for Manga Character Analysis
1 Introduction
2 Related Work
3 Proposed Method
3.1 Overview
3.2 Learning Uni-modal Feature Extractor
3.3 Learning Multimodal Feature Extractor
3.4 Manga M-MTL Modal for Comic Character Analysis Tasks
3.5 Implementation Details
4 Results
4.1 Experiment Protocol
4.2 Multimodal vs. Uni-modal Models
4.3 Boosting with Simple Learnt Features Combination
5 Conclusion
References
Probabilistic Indexing and Search for Hyphenated Words
1 Introduction
2 Hyphenation in Historical Manuscripts
3 Probabilistic Indexing of Handwritten Text Images
4 Proposed Probabilistic Hyphenation Model
4.1 Simplifications
4.2 HwF Prediction: Optical and Language Modeling
4.3 Geometric Reasoning
4.4 Unite Prefix/Suffix HwF Pairs
5 Dataset, Assessment and Empirical Settings
5.1 Dataset
5.2 Evaluation Measures
5.3 Query Selection
5.4 Experimental Settings
6 Experiments and Results
6.1 Model Assessment Through HTR Performance
6.2 Hyphenated Word Probabilistic Indexing and Search Results
6.3 Live Demonstrator and Search Examples
7 Conclusion
References
Physical and Logical Layout Analysis
SandSlide: Automatic Slideshow Normalization
1 Introduction
2 Background
2.1 Slideshow Structure
2.2 PDF to PPT Converters
2.3 Automatic Document Annotation
3 Slideshow Normalisation
3.1 Problem Statement
3.2 Scope
4 SandSlide: Automatic Slideshow Normalization
4.1 Detecting and Labeling Objects in Slides
4.2 Qualitative Representation
4.3 Searching for Slide Layouts
5 Evaluation
5.1 Experimental Setup
5.2 Results
5.3 Rebuilding Slides
6 Conclusion and Future Work
References
Digital Editions as Distant Supervision for Layout Analysis of Printed Books
1 Introduction
2 Related Work
3 Analyzing Ground Truth Markup
4 Annotation by Forced Alignment
5 Experiments
5.1 Models
5.2 Pixel-Level Evaluations
5.3 Word-Level Evaluations
5.4 Region-Level Evaluations
5.5 Improving Accuracy with Self-training
5.6 Correlation of Word/Region-Level and Pixel-Level Metrics
6 Conclusions
References
Palmira: A Deep Deformable Network for Instance Segmentation of Dense and Uneven Layouts in Handwritten Manuscripts
1 Introduction
2 Related Work
3 Indiscapes2
4 Our Layout Parsing Network (Palmira)
4.1 Vanilla Mask-RCNN
4.2 Modification-1: Deformable Convolutions in Backbone
4.3 Modification-2: Deforming the Spatial Grid in Mask Head
4.4 Implementation Details
5 Experimental Setup
5.1 Baselines
5.2 Evaluation Setup
5.3 Evaluation Measures
6 Results
7 Conclusion
References
Page Layout Analysis System for Unconstrained Historic Documents
1 Introduction
2 Joint Text Line and Block Detection
2.1 ParseNet Architecture and Training
2.2 Text Baseline Detection
2.3 Text Line Polygon Estimation
2.4 Clustering Text Lines into Text Blocks
3 Multi-orientation Text Detection
4 Experiments
4.1 CBAD 2019 Text Baseline Detection
4.2 PERO Dataset Layout Analysis
4.3 Czech Newspaper OCR Evaluation
5 Discussion
6 Conclusion
References
Improved Graph Methods for Table Layout Understanding
1 Introduction
2 Background
3 Proposed Approach
3.1 Building the Initial Graph
3.2 Graph Neural Networks
3.3 Node and Edge Features
4 Experimental Settings and Datasets
4.1 Datasets
4.2 Improved Graphs Parameters
4.3 GNN Parameters
4.4 Assessment Measures
5 Results
5.1 Oracle Results
5.2 GNN Results
6 Discusion
7 Conclusions
References
Unsupervised Learning of Text Line Segmentation by Differentiating Coarse Patterns
1 Introduction
2 Related Work
3 Method
3.1 Deep Convolutional Network
3.2 Pseudo-RGB Image
3.3 Energy Minimization
4 Experiments
4.1 Data
4.2 Baseline Experiment
4.3 Effect of Patch Size (p)
4.4 Effect of Central Window Size (w)
4.5 Patch Saliency Visualization
4.6 Limitations
5 Results
5.1 Results on the VML-AHTE Dataset
5.2 Results on the ICDAR2017 Dataset
6 Conclusion
References
Recognition of Tables and Formulas
Rethinking Table Structure Recognition Using Sequence Labeling Methods
1 Introduction
2 Related Work
2.1 Table Structure Recognition
2.2 Sequence Labeling
3 Method
3.1 Encoder
3.2 Decoder
4 Data Preparation and Evaluation Metric
4.1 Data Preparation
4.2 Evaluation Metric
5 Experiments and Result
6 Conclusion
References
TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables
1 Introduction
2 The Current State of the Research
3 The TabLeX Dataset
3.1 Data Acquisition
3.2 Data Processing Pipeline
3.3 Dataset Statistics
4 Evaluation Metrics
5 Experiments
5.1 Implementation Details
5.2 Results
6 Conclusion
References
Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer
1 Introduction
2 Related Work
2.1 HMER Methods
2.2 Transformer
2.3 Right-to-Left Language Modeling
3 Methodology
3.1 CNN Encoder
3.2 Positional Encoding
3.3 Transformer Decoder
3.4 Bidirectional Training Strategy
4 Implementation Details
4.1 Networks
4.2 Training
4.3 Inferencing
5 Experiments
5.1 Datasets
5.2 Compare with State-of-the-Art Results
5.3 Ablation Study
5.4 Case Study
6 Conclusion
References
TabAug: Data Driven Augmentation for Enhanced Table Structure Recognition
1 Introduction
2 Related Work
3 Methodology
3.1 Augmentation Operations
3.2 Augmentation Sub-operations
3.3 Augmentation Pipeline
4 Experiments and Results
4.1 Ground Truth
4.2 Performance Measures
4.3 Results and Analysis
5 Conclusion
References
An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder
1 Introduction
2 Our Approach
2.1 WAP-Based HMER Model
2.2 CNN-Transformer Based HMER Model
2.3 WAP-based HMER Model with MHA and Stacked Decoder
3 Experiments
3.1 Experimental Setup
3.2 Comparison of WAP and CNN-Transformer Baseline Models
3.3 Effect of MHA
3.4 Effect of Stacked Decoder
3.5 Comparison with Other HMER Systems and Error Analysis
4 Summary
References
Global Context for Improving Recognition of Online Handwritten Mathematical Expressions
1 Introduction
2 Related Works
2.1 Tree-Based BLSTM for HME Recognition
2.2 Spatial Relations Classification
2.3 Symbol Recognition with Bidirectional Context
3 Our Method
3.1 Overview of the Proposed Method
3.2 Sequential Model for Segmentation, Recognition, and Relation Classification
3.3 Deep BLSTM for Segmentation, Recognition, and Relation Classification
3.4 Constraint for Output at Precise Time Steps
3.5 Training Path Extractions
3.6 Symbol Level Parse Tree
4 Evaluation
4.1 Dataset
4.2 Setup for Experiments
4.3 Experiments and Results
5 Conclusion
References
Image-Based Relation Classification Approach for Table Structure Recognition
1 Introduction
2 Related Works
3 Problem Setting
4 Observations
5 Description of Our Approach
5.1 Preprocessor
5.2 Model
5.3 Data Augmentation
5.4 Scalability
6 Experiments
6.1 Dataset
6.2 Evaluation Metrics
6.3 Baselines
6.4 Experimental Details
6.5 Performances on the Full Data
6.6 Ablation Studies
6.7 Performances on the Small Data
7 Conclusion and Discussion
References
Image to LaTeX with Graph Neural Network for Mathematical Formula Recognition
1 Introduction
2 Related Work
2.1 Tree Based Methods
2.2 Grammar Based Methods
2.3 Learning Based Methods
3 Proposed Method
3.1 Problem Formulation
3.2 Graph Construction
3.3 Encoder
3.4 Decoder
4 Experiments
4.1 Datasets and Preprocessing
4.2 Implementation Details
4.3 Evaluation Metrics
5 Results
5.1 Performance Comparison
5.2 Robustness Analysis
5.3 Contribution Analysis
5.4 Case Study
6 Conclusion
References
NLP for Document Understanding
A Novel Method for Automated Suggestion of Similar Software Incidents Using 2-Stage Filtering: Findings on Primary Data
1 Introduction
2 Related Work
3 The Proposed Method
3.1 Class Labelling and Dataset Creation
3.2 Data Preprocessing
3.3 Text Vectorization
3.4 Removal of Imbalance
3.5 Classification Model
3.6 Classification Performance Analysis
3.7 Heuristic Definition
3.8 Calculating Thresholds
4 Experimental Setup and Evaluation
4.1 TTM Reduction
5 Discussions
6 Conclusion
References
Research on Pseudo-label Technology for Multi-label News Classification
1 Introduction
2 Related Work
3 Method
3.1 Text Representation
3.2 Text Label Matching
3.3 Pseudo-label Generation
3.4 Weighted Loss
4 Experiments
4.1 Datasets
4.2 Experimental Setting
5 Result and Analysis
5.1 Model Comparison
5.2 Pseudo-label Matching Threshold Screening
5.3 Pseudo-label Generation Strategy Verification
5.4 Pseudo-label Loss Weight Screening
6 Conclusion
References
Information Extraction from Invoices
1 Introduction
2 Related Work
3 Dataset
4 Methodology
4.1 Feature Extraction
4.2 Data Extraction Using the NER-Based Method
4.3 Data Extraction Using the Class-Based Method
5 Results
6 Conclusion
References
Are You Really Complaining? A Multi-task Framework for Complaint Identification, Emotion, and Sentiment Classification
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Problem Definition
3.2 BERT-Based Shared Private Multi-task Framework (BERT-SPMF)
4 Experiments, Results, and Analysis
4.1 Dataset
4.2 Deep Learning Baselines
4.3 Experimental Setup
4.4 Results and Discussion
4.5 Error Analysis
5 Conclusion and Future Work
References
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
1 Introduction
2 Related Works
3 Model Architecture
4 Regularization Techniques
5 Experiments
5.1 Training Procedure
5.2 Results
6 Ablation Study
7 Summary
References
Data Centric Domain Adaptation for Historical Text with OCR Errors
1 Introduction
2 Methods
2.1 Architecture
2.2 Noise Methods
2.3 Embeddings
3 Experiments
3.1 Data
3.2 Baselines
4 Results and Discussion
4.1 In-domain Setup
4.2 Cross-Domain Setup
5 Related Work
6 Conclusion
A Appendix
References
Temporal Ordering of Events via Deep Neural Networks
1 Introduction
2 State of the Art
3 Proposed Model
3.1 Shortest Dependency Path Branch
3.2 Sentence Sequence Branch
3.3 Classification Module
3.4 Training Model
4 Experiments
4.1 Dataset
4.2 Model Implementation and Training
4.3 Results and Model Analysis
5 Conclusion and Future Work
References
Document Collection Visual Question Answering
1 Introduction
2 Related Work
2.1 Document Understanding
2.2 Document Retrieval
2.3 Visual Question Answering
3 DocCVQA Dataset
3.1 Data Collection
3.2 Statistics and Analysis
3.3 Evaluation Metrics
4 Baselines
4.1 Text Spotting + QA
4.2 Database Approach
5 Results
5.1 Evidences
5.2 Answers
6 Conclusions and Future Work
References
Dialogue Act Recognition Using Visual Information
1 Introduction
2 Related Work
3 Model Architectures
3.1 Visual Model
3.2 Text Model
3.3 Joint Model
4 Dataset
4.1 Image Dataset Acquisition
4.2 Utterance Segmentation
5 Experiments
5.1 Comparison with SoTA
5.2 OCR Experiment
5.3 Image Embedding Dimension
5.4 Visual Model Experiment
5.5 Text Model Experiment
5.6 Joint Model Experiment
6 Conclusions
References
Are End-to-End Systems Really Necessary for NER on Handwritten Document Images?
1 Introduction
2 Related Work
3 Datasets
3.1 Esposalles
3.2 Synthetic Groningen Meaning Bank
3.3 George Washington
3.4 IAM DB
4 Two-Staged Named Entity Recognition
5 Experiments
5.1 Evaluation Protocol
5.2 Results
5.3 Discussion
6 Conclusion
References
Training Bi-Encoders for Word Sense Disambiguation
1 Introduction
2 Related Work and Motivation
3 Datasets and Preprocessing
3.1 Source Datasets
3.2 Data Preprocessing
3.3 Oversampling
4 Model Training
4.1 Base Model
4.2 Context-Gloss Training
4.3 Context-Hypernym Gloss Training
4.4 Context-Gloss and Context-Hypernym Multi-Task Training
4.5 Triplet Training
5 Experiments
6 Performance on Unseen Synsets
7 Training Parameters
8 Conclusion and Future Work
References
DeepCPCFG: Deep Learning and Context Free Grammars for End-to-End Information Extraction
1 Introduction
2 Related Work
3 Problem Description
4 Approach
4.1 Parsing
4.2 Learning
4.3 Compatible Tree for Structured Prediction
5 Experiments
5.1 Results on CORD
5.2 Results on Invoices
6 Conclusion
References
Consideration of the Word's Neighborhood in GATs for Information Extraction in Semi-structured Documents
1 Introduction
2 Related Work
3 Proposed Method
3.1 SSD Representation
3.2 Graph Modeling
3.3 Word Modeling
3.4 Specific Graph: GAT
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Experimental Results
5 Conclusion
References
Author Index
Recommend Papers

Document Analysis and Recognition – ICDAR 2021: 16th International Conference, Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part II (Lecture Notes in Computer Science, 12822) [1st ed. 2021]
 3030863301, 9783030863302

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

LNCS 12822

Josep Lladós Daniel Lopresti Seiichi Uchida (Eds.)

Document Analysis and Recognition – ICDAR 2021 16th International Conference Lausanne, Switzerland, September 5–10, 2021 Proceedings, Part II

Lecture Notes in Computer Science Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA

Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Gerhard Woeginger RWTH Aachen, Aachen, Germany Moti Yung Columbia University, New York, NY, USA

12822

More information about this subseries at http://www.springer.com/series/7412

Josep Lladós Daniel Lopresti Seiichi Uchida (Eds.) •



Document Analysis and Recognition – ICDAR 2021 16th International Conference Lausanne, Switzerland, September 5–10, 2021 Proceedings, Part II

123

Editors Josep Lladós Universitat Autònoma de Barcelona Barcelona, Spain

Daniel Lopresti Lehigh University Bethlehem, PA, USA

Seiichi Uchida Kyushu University Fukuoka-shi, Japan

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-86330-2 ISBN 978-3-030-86331-9 (eBook) https://doi.org/10.1007/978-3-030-86331-9 LNCS Sublibrary: SL6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

Our warmest welcome to the proceedings of ICDAR 2021, the 16th IAPR International Conference on Document Analysis and Recognition, which was held in Switzerland for the first time. Organizing an international conference of significant size during the COVID-19 pandemic, with the goal of welcoming at least some of the participants physically, is similar to navigating a rowboat across the ocean during a storm. Fortunately, we were able to work together with partners who have shown a tremendous amount of flexibility and patience including, in particular, our local partners, namely the Beaulieu convention center in Lausanne, EPFL, and Lausanne Tourisme, and also the international ICDAR advisory board and IAPR-TC 10/11 leadership teams who have supported us not only with excellent advice but also financially, encouraging us to setup a hybrid format for the conference. We were not a hundred percent sure if we would see each other in Lausanne but we remained confident, together with almost half of the attendees who registered for on-site participation. We relied on the hybridization support of a motivated team from the Lule University of Technology during the pre-conference, and professional support from Imavox during the main conference, to ensure a smooth connection between the physical and the virtual world. Indeed, our welcome is extended especially to all our colleagues who were not able to travel to Switzerland this year. We hope you had an exciting virtual conference week, and look forward to seeing you in person again at another event of the active DAR community. With ICDAR 2021, we stepped into the shoes of a longstanding conference series, which is the premier international event for scientists and practitioners involved in document analysis and recognition, a field of growing importance in the current age of digital transitions. The conference is endorsed by IAPR-TC 10/11 and celebrates its 30th anniversary this year with the 16th edition. The very first ICDAR conference was held in St. Malo, France in 1991, followed by Tsukuba, Japan (1993), Montreal, Canada (1995), Ulm, Germany (1997), Bangalore, India (1999), Seattle, USA (2001), Edinburgh, UK (2003), Seoul, South Korea (2005), Curitiba, Brazil (2007), Barcelona, Spain (2009), Beijing, China (2011), Washington DC, USA (2013), Nancy, France (2015), Kyoto, Japan (2017), and Syndey, Australia (2019). The attentive reader may have remarked that this list of cities includes several venues for the Olympic Games. This year the conference was be hosted in Lausanne, which is the headquarters of the International Olympic Committee. Not unlike the athletes who were recently competing in Tokyo, Japan, the researchers profited from a healthy spirit of competition, aimed at advancing our knowledge on how a machine can understand written communication. Indeed, following the tradition from previous years, 13 scientific competitions were held in conjunction with ICDAR 2021 including, for the first time, three so-called “long-term” competitions addressing wider challenges that may continue over the next few years.

vi

Foreword

Other highlights of the conference included the keynote talks given by Masaki Nakagawa, recipient of the IAPR/ICDAR Outstanding Achievements Award, and Mickaël Coustaty, recipient of the IAPR/ICDAR Young Investigator Award, as well as our distinguished keynote speakers Prem Natarajan, vice president at Amazon, who gave a talk on “OCR: A Journey through Advances in the Science, Engineering, and Productization of AI/ML”, and Beta Megyesi, professor of computational linguistics at Uppsala University, who elaborated on “Cracking Ciphers with ‘AI-in-the-loop’: Transcription and Decryption in a Cross-Disciplinary Field”. A total of 340 publications were submitted to the main conference, which was held at the Beaulieu convention center during September 8–10, 2021. Based on the reviews, our Program Committee chairs accepted 40 papers for oral presentation and 142 papers for poster presentation. In addition, nine articles accepted for the ICDAR-IJDAR journal track were presented orally at the conference and a workshop was integrated in a poster session. Furthermore, 12 workshops, 2 tutorials, and the doctoral consortium were held during the pre-conference at EPFL during September 5–7, 2021, focusing on specific aspects of document analysis and recognition, such as graphics recognition, camera-based document analysis, and historical documents. The conference would not have been possible without hundreds of hours of work done by volunteers in the organizing committee. First of all we would like to express our deepest gratitude to our Program Committee chairs, Joseph Lladós, Dan Lopresti, and Seiichi Uchida, who oversaw a comprehensive reviewing process and designed the intriguing technical program of the main conference. We are also very grateful for all the hours invested by the members of the Program Committee to deliver high-quality peer reviews. Furthermore, we would like to highlight the excellent contribution by our publication chairs, Liangrui Peng, Fouad Slimane, and Oussama Zayene, who negotiated a great online visibility of the conference proceedings with Springer and ensured flawless camera-ready versions of all publications. Many thanks also to our chairs and organizers of the workshops, competitions, tutorials, and the doctoral consortium for setting up such an inspiring environment around the main conference. Finally, we are thankful for the support we have received from the sponsorship chairs, from our valued sponsors, and from our local organization chairs, which enabled us to put in the extra effort required for a hybrid conference setup. Our main motivation for organizing ICDAR 2021 was to give practitioners in the DAR community a chance to showcase their research, both at this conference and its satellite events. Thank you to all the authors for submitting and presenting your outstanding work. We sincerely hope that you enjoyed the conference and the exchange with your colleagues, be it on-site or online. September 2021

Andreas Fischer Rolf Ingold Marcus Liwicki

Preface

It gives us great pleasure to welcome you to the proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). ICDAR brings together practitioners and theoreticians, industry researchers and academics, representing a range of disciplines with interests in the latest developments in the field of document analysis and recognition. The last ICDAR conference was held in Sydney, Australia, in September 2019. A few months later the COVID-19 pandemic locked down the world, and the Document Analysis and Recognition (DAR) events under the umbrella of IAPR had to be held in virtual format (DAS 2020 in Wuhan, China, and ICFHR 2020 in Dortmund, Germany). ICDAR 2021 was held in Lausanne, Switzerland, in a hybrid mode. Thus, it offered the opportunity to resume normality, and show that the scientific community in DAR has kept active during this long period. Despite the difficulties of COVID-19, ICDAR 2021 managed to achieve an impressive number of submissions. The conference received 340 paper submissions, of which 182 were accepted for publication (54%) and, of those, 40 were selected as oral presentations (12%) and 142 as posters (42%). Among the accepted papers, 112 had a student as the first author (62%), and 41 were identified as coming from industry (23%). In addition, a special track was organized in connection with a Special Issue of the International Journal on Document Analysis and Recognition (IJDAR). The Special Issue received 32 submissions that underwent the full journal review and revision process. The nine accepted papers were published in IJDAR and the authors were invited to present their work in the special track at ICDAR. The review model was double blind, i.e. the authors did not know the name of the reviewers and vice versa. A plagiarism filter was applied to each paper as an added measure of scientific integrity. Each paper received at least three reviews, totaling more than 1,000 reviews. We recruited 30 Senior Program Committee (SPC) members and 200 reviewers. The SPC members were selected based on their expertise in the area, considering that they had served in similar roles in past DAR events. We also included some younger researchers who are rising leaders in the field. In the final program, authors from 47 different countries were represented, with China, India, France, the USA, Japan, Germany, and Spain at the top of the list. The most popular topics for accepted papers, in order, included text and symbol recognition, document image processing, document analysis systems, handwriting recognition, historical document analysis, extracting document semantics, and scene text detection and recognition. With the aim of establishing ties with other communities within the concept of reading systems at large, we broadened the scope, accepting papers on topics like natural language processing, multimedia documents, and sketch understanding. The final program consisted of ten oral sessions, two poster sessions, three keynotes, one of them given by the recipient of the ICDAR Outstanding Achievements Award, and two panel sessions. We offer our deepest thanks to all who contributed their time

viii

Preface

and effort to make ICDAR 2021 a first-rate event for the community. This year’s ICDAR had a large number of interesting satellite events as well: workshops, tutorials, competitions, and the doctoral consortium. We would also like to express our sincere thanks to the keynote speakers, Prem Natarajan and Beta Megyesi. Finally, we would like to thank all the people who spent time and effort to make this impressive program: the authors of the papers, the SPC members, the reviewers, and the ICDAR organizing committee as well as the local arrangements team. September 2021

Josep Lladós Daniel Lopresti Seiichi Uchida

Organization

Organizing Committee General Chairs Andreas Fischer Rolf Ingold Marcus Liwicki

University of Applied Sciences and Arts Western Switzerland, Switzerland University of Fribourg, Switzerland Luleå University of Technology, Sweden

Program Committee Chairs Josep Lladós Daniel Lopresti Seiichi Uchida

Computer Vision Center, Spain Lehigh University, USA Kyushu University, Japan

Workshop Chairs Elisa H. Barney Smith Umapada Pal

Boise State University, USA Indian Statistical Institute, India

Competition Chairs Harold Mouchère Foteini Simistira

University of Nantes, France Luleå University of Technology, Sweden

Tutorial Chairs Véronique Eglin Alicia Fornés

Institut National des Sciences Appliquées, France Computer Vision Center, Spain

Doctoral Consortium Chairs Jean-Christophe Burie Nibal Nayef

La Rochelle University, France MyScript, France

Publication Chairs Liangrui Peng Fouad Slimane Oussama Zayene

Tsinghua University, China University of Fribourg, Switzerland University of Applied Sciences and Arts Western Switzerland, Switzerland

Sponsorship Chairs David Doermann Koichi Kise Jean-Marc Ogier

University at Buffalo, USA Osaka Prefecture University, Japan University of La Rochelle, France

x

Organization

Local Organization Chairs Jean Hennebert Anna Scius-Bertrand Sabine Süsstrunk

University of Applied Sciences and Arts Western Switzerland, Switzerland University of Applied Sciences and Arts Western Switzerland, Switzerland École Polytechnique Fédérale de Lausanne, Switzerland

Industrial Liaison Aurélie Lemaitre

University of Rennes, France

Social Media Manager Linda Studer

University of Fribourg, Switzerland

Program Committee Senior Program Committee Members Apostolos Antonacopoulos Xiang Bai Michael Blumenstein Jean-Christophe Burie Mickaël Coustaty Bertrand Coüasnon Andreas Dengel Gernot Fink Basilis Gatos Nicholas Howe Masakazu Iwamura C. V. Javahar Lianwen Jin Dimosthenis Karatzas Laurence Likforman-Sulem Cheng-Lin Liu Angelo Marcelli Simone Marinai Wataru Ohyama Luiz Oliveira Liangrui Peng Ashok Popat Partha Pratim Roy Marçal Rusiñol Robert Sablatnig Marc-Peter Schambach

University of Salford, UK Huazhong University of Science and Technology, China University of Technology Sydney, Australia University of La Rochelle, France University of La Rochelle, France University of Rennes, France DFKI, Germany TU Dortmund University, Germany Demokritos, Greece Smith College, USA Osaka Prefecture University, Japan IIIT Hyderabad, India South China University of Technology, China Computer Vision Center, Spain Télécom ParisTech, France Chinese Academy of Sciences, China University of Salerno, Italy University of Florence, Italy Saitama Institute of Technology, Japan Federal University of Parana, Brazil Tsinghua University, China Google Research, USA Indian Institute of Technology Roorkee, India Computer Vision Center, Spain Vienna University of Technology, Austria Siemens, Germany

Organization

Srirangaraj Setlur Faisal Shafait Nicole Vincent Jerod Weinman Richard Zanibbi

xi

University at Buffalo, USA National University of Sciences and Technology, India Paris Descartes University, France Grinnell College, USA Rochester Institute of Technology, USA

Program Committee Members Sébastien Adam Irfan Ahmad Sheraz Ahmed Younes Akbari Musab Al-Ghadi Alireza Alaei Eric Anquetil Srikar Appalaraju Elisa H. Barney Smith Abdel Belaid Mohammed Faouzi Benzeghiba Anurag Bhardwaj Ujjwal Bhattacharya Alceu Britto Jorge Calvo-Zaragoza Chee Kheng Ch’Ng Sukalpa Chanda Bidyut B. Chaudhuri Jin Chen Youssouf Chherawala Hojin Cho Nam Ik Cho Vincent Christlein Christian Clausner Florence Cloppet Donatello Conte Kenny Davila Claudio De Stefano Sounak Dey Moises Diaz David Doermann Antoine Doucet Fadoua Drira Jun Du Véronique Eglin Jihad El-Sana Jonathan Fabrizio

Nadir Farah Rafael Ferreira Mello Miguel Ferrer Julian Fierrez Francesco Fontanella Alicia Fornés Volkmar Frinken Yasuhisa Fujii Akio Fujiyoshi Liangcai Gao Utpal Garain C. Lee Giles Romain Giot Lluis Gomez Petra Gomez-Krämer Emilio Granell Mehdi Hamdani Gaurav Harit Ehtesham Hassan Anders Hast Sheng He Jean Hennebert Pierre Héroux Laurent Heutte Nina S. T. Hirata Tin Kam Ho Kaizhu Huang Qiang Huo Donato Impedovo Reeve Ingle Brian Kenji Iwana Motoi Iwata Antonio Jimeno Slim Kanoun Vassilis Katsouros Ergina Kavallieratou Klara Kedem

xii

Organization

Christopher Kermorvant Khurram Khurshid Soo-Hyung Kim Koichi Kise Florian Kleber Pramod Kompalli Alessandro Lameiras Koerich Bart Lamiroy Anh Le Duc Frank Lebourgeois Gurpreet Lehal Byron Leite Dantas Bezerra Aurélie Lemaitre Haifeng Li Zhouhui Lian Minghui Liao Rafael Lins Wenyin Liu Lu Liu Georgios Louloudis Yue Lu Xiaoqing Lu Muhammad Muzzamil Luqman Sriganesh Madhvanath Muhammad Imran Malik R. Manmatha Volker Märgner Daniel Martín-Albo Carlos David Martinez Hinarejos Minesh Mathew Maroua Mehri Carlos Mello Tomo Miyazaki Momina Moetesum Harold Mouchère Masaki Nakagawa Nibal Nayef Atul Negi Clemens Neudecker Cuong Tuan Nguyen Hung Tuan Nguyen Journet Nicholas Jean-Marc Ogier Shinichiro Omachi Umapada Pal Shivakumara Palaiahnakote

Thierry Paquet Swapan Kr. Parui Antonio Parziale Antonio Pertusa Giuseppe Pirlo Réjean Plamondon Stefan Pletschacher Utkarsh Porwal Vincent Poulain D’Andecy Ioannis Pratikakis Joan Puigcerver Siyang Qin Irina Rabaev Jean-Yves Ramel Oriol Ramos Terrades Romain Raveaux Frédéric Rayar Ana Rebelo Pau Riba Kaspar Riesen Christophe Rigaud Syed Tahseen Raza Rizvi Leonard Rothacker Javad Sadri Rajkumar Saini Joan Andreu Sanchez K. C. Santosh Rosa Senatore Amina Serir Mathias Seuret Badarinath Shantharam Imran Si