Guessing Random Additive Noise Decoding: A Hardware Perspective [1st ed. 2023]
3031316622, 9783031316623

This book gives a detailed overview of a universal Maximum Likelihood (ML) decoding technique, known as Guessing Random Additive Noise Decoding (GRAND).

Table of contents:
Preface
Acknowledgments
Contents
Part I Guessing Random Additive Noise Decoding: Preliminaries
1 Guessing Random Additive Noise Decoding (GRAND)
1.1 Introduction
1.2 GRAND for Linear Block Codes (n,k)
1.3 GRAND Variants
1.3.1 Conventional GRAND Variants
1.3.2 Specialized GRAND Variants
1.4 Test Error Pattern (TEP) Generation with GRAND
1.4.1 GRANDAB
1.4.2 ORBGRAND
1.4.3 SGRAND
1.5 Performance and Complexity of GRAND
1.5.1 Performance of GRAND Variants
1.5.2 Computational Complexity of GRAND
1.6 Hardware Architectures for GRAND
1.6.1 Offline TEP Generation
1.6.2 Online TEP Generation
1.7 Conclusion
References
Part II Hardware Architectures for Conventional GRAND Variants
2 Hardware Architecture for GRAND with ABandonment (GRANDAB)
2.1 GRAND with ABandonment (GRANDAB)
2.1.1 Decoding Performance and Complexity Analysis
2.1.2 Proposed Simplifications for the Codebook Membership Evaluation
2.1.2.1 Simplifications for Checking TEPs (e) with HW = 1
2.1.2.2 Simplifications for Checking TEPs (e) with HW > 1
2.2 Proposed VLSI Architecture for GRANDAB
2.2.1 Checking TEPs with Hamming Weight of 1
2.2.2 Checking TEPs with Hamming Weight of 2
2.2.3 Checking TEPs with HW of 3
2.3 VLSI Architecture for GRANDAB with Improved Parallelization Factor
2.3.1 Evaluating TEPs with Hamming Weight of 2 (L > 1)
2.3.2 Evaluating TEPs with Hamming Weight of 3 (L > 1)
2.4 Implementation Results for the Proposed GRANDAB Hardware
2.5 Conclusion
References
3 Hardware Architecture for Ordered Reliability Bits GRAND (ORBGRAND)
3.1 Ordered Reliability Bits GRAND (ORBGRAND)
3.2 ORBGRAND Design Considerations
3.2.1 Parametric Analysis of ORBGRAND
3.2.2 Proposed Simplified Generation of Integer Partitions (λ)
3.3 Proposed Hardware Architecture for ORBGRAND
3.3.1 Scheduling and Details
3.3.2 Evaluating TEPs (e) for P ≤ 3
3.3.3 Evaluating TEPs (e) for P > 3
3.3.4 Proposed ORBGRAND VLSI Architecture
3.4 ORBGRAND Design Exploration
3.4.1 ORBGRAND Baseline Implementation
3.4.2 Design Expansion and Latency Analysis
3.4.3 ORBGRAND Area Optimization
3.5 Conclusion
References
4 Hardware Architecture for List GRAND (LGRAND)
4.1 Introduction
4.2 ORBGRAND: Analysis and Proposed Modifications
4.2.1 ORBGRAND: Parametric Analysis (LWmax and HWmax)
4.2.2 Proposed List-GRAND (LGRAND) for Approaching ML Decoding
4.2.3 LGRAND: Analyzing the Parameter δ
4.3 Evaluating the Decoding Performance of LGRAND
4.4 Implementing List-GRAND (LGRAND) in Hardware
4.4.1 Baseline ORBGRAND Hardware
4.4.2 Developing LGRAND Hardware by Tweaking ORBGRAND VLSI Architecture
4.4.3 Hardware Implementation Results for the Proposed LGRAND
4.4.3.1 Comparison with ORBGRAND
4.4.3.2 Comparison with Fixed Latency ORBGRAND Decoder (F.L ORBGRAND)
4.5 Conclusion
References
Part III Hardware Architectures for Specialized GRAND Variants
5 Hardware Architecture for GRAND Markov Order (GRAND-MO)
5.1 GRAND Markov Order (GRAND-MO): Introduction
5.1.1 Channel Model for GRAND-MO
5.1.2 GRAND-MO Decoding of Linear Block Codes
5.2 Decoding Performance Evaluation for GRAND-MO
5.3 Analyzing Test Error Patterns (TEPs) for GRAND-MO
5.3.1 TEP Generation for Baseline GRAND-MO (Markov Query Order)
5.3.2 Simplifying TEP Generation Scheme for GRAND-MO
5.3.3 Analyzing Parameters (m, lm) of the Proposed GRAND-MO TEP Generation
5.4 VLSI Architecture for GRAND Markov Order (GRAND-MO)
5.4.1 Proposed GRAND-MO VLSI Architecture
5.4.2 Microarchitecture for Checking TEPs with Noise Burst of Length l (1 ≤ l ≤ n)
5.4.3 Proposed TEP (e) Scheduling for GRAND-MO
5.4.4 GRAND-MO Hardware Implementation Results
5.5 Conclusion
References
6 Hardware Architecture for Fading-GRAND
6.1 Introduction
6.2 Fading-GRAND: Channel Model
6.2.1 Spatial Diversity Combining Techniques
6.3 Fading GRAND: Algorithm, TEP Generation and Complexity Analysis
6.3.1 Fading GRAND: Algorithm
6.3.2 Fading GRAND: TEP Generation
6.3.2.1 Threshold (Δ) Computation for Reliable Set (I)
6.4 Fading GRAND: Performance Evaluation
6.4.1 Fading-GRAND Decoding of RLCs
6.4.2 Fading-GRAND Decoding of BCH Codes
6.4.3 Fading-GRAND Decoding of CRC Codes
6.5 Proposed VLSI Architecture for Fading-GRAND
6.5.1 Proposed Modifications in GRANDAB Hardware FGRANDAB-VLSI
6.5.2 Proposed Fading-GRAND Hardware
6.6 Conclusion
References
Part IV GRAND Extensions
7 A Survey of Recent GRAND Variants
7.1 GRAND for Various Communication Channels
7.1.1 GRAND for MIMO Channel
7.1.2 GRAND for MAC Channel
7.2 GRAND for Higher-Order Modulation
7.2.1 GRAND with 64-QAM Modulation
7.2.2 GRAND with 256-QAM Modulation
7.2.3 Symbol-Level GRAND (s-GRAND) for Higher Order Modulations Over Fading Channels
7.3 GRAND for Joint Detection and Decoding
7.4 GRAND for Network Coding
7.5 GRAND for Countering Jamming in Wireless Communication
7.6 GRAND for Assisting Conventional Decoders
7.7 Partitioned GRAND (PGRAND)
References
Appendix A
Proof of Lemma 1

Syed Mohsin Abbas Marwan Jalaleddine Warren J. Gross

Guessing Random Additive Noise Decoding A Hardware Perspective


Syed Mohsin Abbas McGill University Montréal, QC, Canada

Marwan Jalaleddine McGill University Montréal, QC, Canada

Warren J. Gross McGill University Montréal, QC, Canada

ISBN 978-3-031-31662-3    ISBN 978-3-031-31663-0 (eBook)
https://doi.org/10.1007/978-3-031-31663-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

This book is dedicated to all the members and alumni of the Integrated Systems for Information Processing (ISIP) Lab, Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada.

Preface

Recently, a universal Maximum Likelihood (ML) decoding technique known as Guessing Random Additive Noise Decoding (GRAND) for short-length and high-rate linear block codes has been introduced. This book provides a comprehensive explanation of GRAND and its variants; the hardware implementation of the different GRAND variants is its primary topic. The structure of this book is as follows: Part I introduces linear block codes and the GRAND algorithm. Part II discusses the hardware architectures for conventional GRAND variants, which can be applied to any underlying communication channel, whereas Part III describes the hardware architectures for specialized GRAND variants developed for specific communication channels. Last but not least, Part IV provides an overview of recently proposed GRAND variants and their unique applications.

Montréal, QC, Canada
August 2022

Syed Mohsin Abbas Marwan Jalaleddine Warren J. Gross


Acknowledgments

We would like to acknowledge the contributions of Dr. Thibaud Tonnellier and Dr. Furkan Ercan to the work presented in Chapters 2 and 3, made during their tenure at the ISIP Lab, McGill University, Montréal, Québec, Canada. Dr. Furkan Ercan is a research scientist affiliated with Intel Labs, Intel Corporation, Hudson, MA, 01749 (Email: [email protected]). Dr. Thibaud Tonnellier is with Airbus Defence and Space, France (Email: [email protected]). Furthermore, we thank Jiajie Li (a PhD student at the ISIP Lab) for proofreading this edition of the book.



Part I

Guessing Random Additive Noise Decoding: Preliminaries

Preliminaries for Guessing Random Additive Noise Decoding (GRAND), a universal maximum likelihood decoding technique for linear block codes with short code lengths and high code rates, are discussed in this part of the book. Different GRAND variants are also described in detail, as well as their decoding performance and computational complexity.

Chapter 1

Guessing Random Additive Noise Decoding (GRAND)

Abstract This chapter introduces GRAND, a universal maximum likelihood decoding technique for linear block codes of short code-length and high code-rate. GRAND is a noise-centric decoder, which implies that, in contrast to other decoding techniques that use the underlying code structure to decode a codeword, GRAND tries to guess the noise that corrupted the transmitted codeword. GRAND features soft-input and hard-input variants that differ primarily in the order in which the channel-induced noise is guessed. This chapter introduces those GRAND variants and provides an analysis of their Frame Error Rate (FER) performance as well as their computational complexity.

Notations Matrices are denoted by a bold upper-case letter (M), while vectors are denoted with bold lower-case letters (v). The transpose operator is represented by ⊤. The number of k-combinations from a given set of n elements is denoted by $\binom{n}{k}$. 1_n is the indicator vector in which all locations except the nth are 0 and the nth is 1. Similarly, 1_v is the indicator vector in which all locations v_i (∀i ∈ [1, n]) are 1. All indices start at 1. For this work, all operations are restricted to the Galois field with 2 elements, noted F2. Furthermore, we restrict ourselves to (n, k) linear block codes, where n is the code length and k is the code dimension.

1.1 Introduction

Since their conception in 1948 [1], channel codes have become an integral part of modern communication systems. Shannon proved the existence of a maximum achievable rate, known as the Shannon limit, for reliable communication [1]. Since then, a great deal of effort has gone into developing practical channel coding techniques in order to reach this Shannon limit (channel capacity). Early channel coding methods targeted algebraic coding [2, 3], in which the number of correctable errors is determined by design. Later, probabilistic coding approaches, such as Low Density Parity Check (LDPC) codes [4] and Turbo codes [5], were developed that outperformed the previously proposed algebraic coding techniques; these LDPC and Turbo codes are also known for their ability to approach capacity. More recently, Polar codes [6, 7] were proven in 2009 to be the first family of codes that asymptotically achieves the Shannon limit for binary-input symmetric memoryless channels, as well as discrete and continuous memoryless channels, with lower encoding and decoding complexity.


Algorithm 1: GRAND for linear codes
Input: H, G⁻¹, yˆ
Output: uˆ
1: e ← 0
2: while H · (yˆ ⊕ e)⊤ ≠ 0 do
3:     e ← generateNewErrorPattern()
4: uˆ ← (yˆ ⊕ e) · G⁻¹
5: return uˆ

However, in the short-to-medium block length regime, these capacity-approaching and capacity-achieving techniques do not perform well [8]. Short channel codes and the corresponding Maximum Likelihood (ML) decoding algorithms have recently reignited a lot of interest in industry and academia. Applications with stringent reliability and ultra-low latency requirements are fueling this interest. A few examples of these applications include Ultra-Reliable and Low Latency Communications (URLLC) [9, 10], a 5G use case for mission-critical applications; augmented and virtual reality [11]; Intelligent Transportation Systems (ITS) [12]; the Internet of Things (IoT) [13, 14]; Machine-to-Machine (M2M) communication [15]; and many others.

Guessing Random Additive Noise Decoding (GRAND) [16] is a universal ML decoding technique for short-length and high-rate channel codes. GRAND is a noise-centric and code-agnostic decoding approach, which implies that it tries to determine the noise that corrupted the codeword during transmission across the communication channel rather than relying on the structure of the underlying code to decode a codeword. GRAND guesses the channel-induced noise in transmitted codewords by generating Test Error Patterns (TEPs) (e), which are then applied to the received vector (yˆ) from the channel, and the resulting vector (yˆ ⊕ e) is queried for codebook membership. The most important feature of GRAND is its ability to work with both structured and unstructured codes, provided that there is a way of checking the codebook membership of a vector. Linear block codes (n, k) with a well-defined encoding structure are an example of structured codes. Unstructured codes, on the other hand, do not have any kind of encoding structure and are stored in a dictionary [1, 17]. For such unstructured codes, a tree search with a complexity that is logarithmic in the code length can be used to check the codebook membership of a vector. Furthermore, GRAND achieves capacity when used with random codebooks, as demonstrated in [16].
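To make the guessing loop concrete, the following is a minimal Python sketch of Algorithm 1. It is an illustrative rendering rather than the book's reference implementation: the iterable `teps` stands in for any TEP generator (e.g., the Hamming-weight or logistic-weight orders of Sect. 1.4), and all arithmetic is over F2.

import numpy as np

def grand_decode(H, G_inv, y_hat, teps):
    # H: (n-k) x n parity check matrix; G_inv: n x k matrix with G @ G_inv = I_k.
    # y_hat: hard-demodulated received vector; teps: TEPs, most likely first.
    for e in teps:
        c_hat = (y_hat + e) % 2            # apply the TEP (bit flips)
        if not (H @ c_hat % 2).any():      # codebook membership check, Eq. (1.2)
            return c_hat @ G_inv % 2       # estimated message u_hat
    return None                            # abandonment: no codeword found

The all-zero TEP should be queried first, so that an uncorrupted codeword is accepted immediately.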

1.2 GRAND for Linear Block Codes (n, k)

Definition 1.1 A linear block code is a linear mapping g : F2^k → F2^n, where k < n. In this mapping, a vector u of size k maps to a vector c of size n.

Definition 1.2 The ratio R ≜ k/n is called the code-rate.

Definition 1.3 To characterise any linear block code, there exists a k × n matrix G called the generator matrix and an (n − k) × n matrix H called the parity check matrix.

Definition 1.4 The set of the 2^k vectors c is called a code C, whose elements c are called codewords, and each codeword verifies the following property:


∀ c ∈ C,  H · c⊤ = 0.    (1.1)

Consider the case where c was sent via a noisy channel and yˆ was received at the channel's output. Because of the channel noise, yˆ and c might differ. As a result, the relationship between yˆ and c may be expressed as yˆ = c ⊕ e, where e is the error pattern caused by the channel noise. Algorithm 1 summarizes the steps of the GRAND procedure. The inputs to the algorithm are the received vector yˆ of size n, an (n − k) × n parity check matrix H, and an n × k matrix G⁻¹, where G⁻¹ refers to the inverse of the generator matrix G of the code such that G · G⁻¹ is the k × k identity matrix. GRAND is centered around generating test error patterns (e), applying them to the received vector (yˆ), and querying the resultant vector (yˆ ⊕ e) for codebook membership as follows:

H · (yˆ ⊕ e)⊤ = 0    (1.2)

If this codebook membership criterion (1.2) is satisfied, e is the guessed noise and cˆ ≜ yˆ ⊕ e is the estimated codeword. Please note that GRAND can be used with any codebook (C) as long as there is a method for validating a vector's codebook membership. For linear codebooks, codebook membership can be verified using H, whereas for other non-structured codebooks, stored in a dictionary, the codebook membership of a vector can be checked with a dictionary lookup. For the rest of the discussion, we restrict ourselves to (n, k) linear block codes.

Example 1.1: GRAND for the (7, 4) Hamming Code
Let G and H be the generator and parity check matrices for the (7, 4) Hamming code, such that a vector u of size 4 maps to the vector c of size 7:

    ⎡1 0 0 0 1 1 0⎤            ⎡1 0 1 1 1 0 0⎤
G = ⎢0 1 0 0 0 1 1⎥ ,      H = ⎢1 1 0 1 0 1 0⎥ .
    ⎢0 0 1 0 1 0 1⎥            ⎣0 1 1 1 0 0 1⎦
    ⎣0 0 0 1 1 1 1⎦

Let u = [1 0 0 1], c = [1 0 0 1 0 0 1] and yˆ = [1 0 0 1 0 0 0].

Table 1.1: GRAND TEP generation for yˆ = [1 0 0 1 0 0 0]

TEP e      yˆ ⊕ e     H · (yˆ ⊕ e)⊤   (yˆ ⊕ e) ∈ C
0000000    1001000    001             No
1000000    0001000    111             No
0100000    1101000    010             No
0010000    1011000    100             No
0001000    1000000    110             No
0000100    1001100    101             No
0000010    1001010    011             No
0000001    1001001    000             Yes


Suppose that c is the vector transmitted across the noisy communication channel, and yˆ is the vector received at the output of the channel. The vector yˆ differs from the transmitted vector c and does not satisfy the codebook membership criterion (1.2) (H · yˆ⊤ ≠ 0). GRAND attempts to guess the noise induced by the communication channel by generating the Test Error Patterns (TEPs) e shown in Table 1.1, sequentially combining each TEP with the received vector, and checking whether the resultant vector (yˆ ⊕ e) is a member of the codebook (C) (1.2). If the resulting vector (yˆ ⊕ e) is a member of the codebook (1.2), the decoding is assumed to be successful: e is declared as the guessed noise, and cˆ ≜ yˆ ⊕ e is output as the estimated codeword.
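Example 1.1 can be reproduced with a short, self-contained Python sketch (illustrative only); it queries the TEPs of Table 1.1 and reports the one with an all-zero syndrome:

import numpy as np

H = np.array([[1,0,1,1,1,0,0],
              [1,1,0,1,0,1,0],
              [0,1,1,1,0,0,1]])
y_hat = np.array([1,0,0,1,0,0,0])              # received vector from Example 1.1

# TEPs of Table 1.1: the all-zero pattern, then all single-bit flips.
teps = [np.zeros(7, dtype=int)] + list(np.eye(7, dtype=int))

for e in teps:
    c_hat = (y_hat + e) % 2
    s = H @ c_hat % 2                          # syndrome, criterion (1.2)
    if not s.any():
        print("guessed noise:", e, "-> c_hat:", c_hat)

The only all-zero syndrome occurs for e = [0 0 0 0 0 0 1], recovering cˆ = [1 0 0 1 0 0 1] as in Table 1.1.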

1.3 GRAND Variants

GRAND features both soft-input and hard-input variants, which differ primarily in the order in which TEPs (e) are generated. In addition, GRAND variants can be differentiated based on the underlying communication channels they are designed for. We briefly go over the key GRAND variants in this section.

1.3.1 Conventional GRAND Variants

The standard GRAND variants that can be used with any underlying communication channel are discussed in this section. GRAND with ABandonment (GRANDAB) [16, 18] is a hard-input variant that generates test error patterns in ascending Hamming weight order, up to the Hamming weight AB. Symbol Reliability GRAND [17] is another variant that employs thresholding on the input channel observation values to discern between reliable and unreliable bits of the received vector. Soft GRAND (SGRAND) [19] and Ordered Reliability Bits GRAND (ORBGRAND) [20] are soft-input variants that efficiently leverage soft information (channel observation values), resulting in improved decoding performance compared to the hard-input GRANDAB.

1.3.2 Specialized GRAND Variants

The use of GRAND is not limited to AWGN channels, as variants of GRAND have been shown to work on both memoryless and memory-containing communication channels. Furthermore, GRAND can be applied to both fading and non-fading communication channels, since GRAND adjusts the generation of TEPs in response to channel conditions. GRAND Markov Order (GRAND-MO) [21], for example, uses noise correlations and adapts its TEP generation to mitigate the effect of noise bursts in communication channels with memory. Generally, this burst noise degrades the decoding performance of typical channel code decoders; hence, time-diversity techniques such as interleaving/de-interleaving have typically been used to mitigate the effect of burst noise and make the communication channel appear memoryless. GRAND-MO [21] eliminates the need for interleavers/de-interleavers for communication channels with memory, facilitating reliable communication even in the presence of burst noise.


Similarly, Fading-GRAND [22] adjusts the TEP generation to accommodate the channel's fading conditions. Over the multipath flat Rayleigh fading communication channel, which models the impact of small-scale fading in a multipath propagation environment without a dominant line of sight between the transmitter and the receiver, Fading-GRAND outperforms traditional channel code decoders.

GRAND-MO: A comprehensive analysis of GRAND-MO and its associated hardware implementation will be provided in Chapter 5.

Fading-GRAND: In Chapter 6, a comprehensive analysis of Fading-GRAND and its proposed VLSI architecture will be presented.

1.4 Test Error Pattern (TEP) Generation with GRAND

As previously discussed in the introduction, GRAND operates on the premise that the channel-induced noise can be guessed by first generating TEPs (e), then applying them to the received vector from the channel (yˆ), and finally querying the resulting vector (yˆ ⊕ e) for codebook membership. This section describes the TEP generation scheme and computational complexity analysis for GRAND and its variants.

Notations: Please note that in this section the received vector of soft channel observation values is denoted as y, whereas the vector yˆ denotes the hard-demodulated received vector from the channel.

1.4.1 GRANDAB

Combining a TEP (e) with the hard-demodulated received vector yˆ corresponds to flipping certain bits of that vector (yˆ). GRAND with ABandonment (GRANDAB) [16] is a hard-input variant of GRAND that generates TEPs in increasing Hamming weight order, up to a Hamming weight AB.

Example 1.2: GRANDAB Test Error Pattern Generation
The TEPs generated in Hamming weight order for n = 6 and AB = 3 are depicted in Fig. 1.1a, where each column corresponds to a TEP and a dot corresponds to a flipped bit location of the received hard-demodulated vector (yˆ). TEPs with Hamming weight 1 are generated first, followed by TEPs with Hamming weights 2 and 3, as shown in Fig. 1.1a.
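Generating this Hamming-weight order in software is straightforward with itertools (a sketch for illustration; within a given weight, the order of patterns is a free design choice):

from itertools import combinations
import numpy as np

def grandab_teps(n, ab):
    # Yield length-n TEPs in ascending Hamming-weight order, up to weight ab.
    yield np.zeros(n, dtype=int)                    # weight 0: all-zero pattern
    for hw in range(1, ab + 1):
        for support in combinations(range(n), hw):  # all C(n, hw) flip sets
            e = np.zeros(n, dtype=int)
            e[list(support)] = 1
            yield e

For n = 6 and AB = 3 this enumerates 6 + 15 + 20 = 41 non-zero patterns, matching the TEPs of Fig. 1.1a, plus the initial all-zero query.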


Labels: In Fig. 1.1, the x-axis indicates the TEP number, and the y-axis indicates the bit positions of a TEP (e), where a dot denotes a 1 and the absence of a dot denotes a 0. Alternately, we could say that each column in Fig. 1.1 represents a TEP (e) and each dot represents a location where a bit is flipped in the received vector yˆ. For instance, when the received vector yˆ is combined with the TEP e = [0 0 1 1 0 0] (yˆ ⊕ e), the third and fourth bits of yˆ will be flipped.

1.4.2 ORBGRAND

ORBGRAND is a soft-input GRAND variant that uses the Logistic Weight (LW) order to generate TEPs (e). The logistic weight corresponds to the sum of the indices of the non-zero elements of a TEP (e) [20]. For example, the TEP e = [1, 1, 0, 0, 1, 0] has a Hamming weight of 3, whereas its logistic weight is 1 + 2 + 5 = 8. ORBGRAND generates TEPs (e) in ascending logistic weight order using the concept of integer partitions. An integer partition λ of a positive integer m, noted λ = (λ1, λ2, ..., λP) ⊢ m where λ1 > λ2 > ... > λP, is the multi-set of positive integers λi (∀i ∈ [1, P]) that sum to m. The integer partitions (λ) of each integer m (∀m ∈ [1, LWmax]) are generated sequentially, and these integer partitions are then used to generate TEPs. Please note that a TEP (e) obtained from an integer partition with P elements has a Hamming weight of P.

Example 1.3: ORBGRAND TEP Generation
TEPs generated by ORBGRAND with LWmax = 21 are shown in Fig. 1.1b. For n = 6 and LWmax = 21, 63 TEPs are generated, with the maximum Hamming weight (HWmax) of the generated TEPs being 6. However, when LWmax is reduced from 21 to 6, the number of TEPs is reduced to 13, as shown in Fig. 1.1c. As such, the parameter LWmax can be adjusted to limit the maximum number of TEPs.


Fig. 1.1: Test error pattern (TEP) generation for GRAND for n = 6 (a) Upper: TEP generation for GRANDAB (AB = 3) (b) Middle: TEP generation for ORBGRAND (LWmax = 21) (c) Bottom: TEP generation for ORBGRAND (LWmax = 6)
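In software, the logistic-weight order can be realized by enumerating, for each m ≤ LWmax, the partitions of m into distinct parts no larger than n; each partition is the support (the set of flipped positions, indexed in the reliability-sorted domain) of one TEP. The recursive sketch below is illustrative and unoptimized; the simplified generation used in hardware is the subject of Chapter 3.

def partitions_distinct(m, max_part):
    # Yield partitions of m into strictly decreasing positive parts <= max_part.
    if m == 0:
        yield []
        return
    for first in range(min(m, max_part), 0, -1):
        for rest in partitions_distinct(m - first, first - 1):
            yield [first] + rest

def orbgrand_supports(n, lw_max):
    # Yield TEP supports in ascending logistic-weight order.
    yield []                                   # LW = 0: the all-zero TEP
    for m in range(1, lw_max + 1):
        yield from partitions_distinct(m, n)

For n = 6 and LWmax = 6 this yields the 13 non-zero TEPs of Fig. 1.1c; for example, m = 6 contributes the supports [6], [5, 1], [4, 2], and [3, 2, 1].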


1.4.3 SGRAND

Soft GRAND (SGRAND) [19] incorporates all soft information into the decoder to generate the ML order of TEPs; an efficient method for generating the TEPs in ML order can be found in [23]. Table 1.2 shows an example of ML-order TEP generation for n = 6 and y = [1.0, 2.1, 3.2, 4.3, 5.4, 6.5]. The TEP (e) that maximizes the likelihood arg max_{e ∈ S} Σ_{i=1}^{n} (−1)^{ĉ_i} y_i (where ĉ = yˆ ⊕ e) is selected from a set of candidate TEPs maintained in the set S, in accordance with Algorithm 2 of [19]. Each time a TEP (e) is removed from the set S, two new TEPs are added to it. We refer the reader to Algorithm 2 of [19] for further details.

Table 1.2: Generating TEPs (e) for SGRAND based on ML order [19]

#   e                    Σ_{i=1}^{n} (−1)^{ĉ_i} y_i   S
0   (0, 0, 0, 0, 0, 0)   –       {(1, 0, 0, 0, 0, 0)}
1   (1, 0, 0, 0, 0, 0)   20.5    {(1, 1, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0)}
2   (0, 1, 0, 0, 0, 0)   18.30   {(1, 1, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0), (0, 1, 0, 0, 0, 0)}
3   (1, 1, 0, 0, 0, 0)   16.3    {(0, 1, 1, 0, 0, 0), (0, 0, 1, 0, 0, 0), (1, 1, 1, 0, 0, 0), (1, 0, 1, 0, 0, 0)}
4   (0, 0, 1, 0, 0, 0)   16.1    ...
...

Example 1.4: SGRAND TEP Generation


Fig. 1.2: TEP generation for SGRAND (a) Upper: ML order for y1 = [1.0, 2.1, 3.2, 4.3, 5.4, 6.5] (b) Middle: ML order for y2 = [1.3, 2.5, 3.6, 4.9, 5.8, 6.1] (c) Bottom: ML order for y3 = [1.8, 2.0, 3.9, 4.1, 5.6, 3.3]

Let y1 = [1.0, 2.1, 3.2, 4.3, 5.4, 6.5] be the received vector of channel observation values at time instant 1; the ML order corresponding to y1 is shown in Fig. 1.2a. At the second time step, the received vector from the channel is y2 = [1.3, 2.5, 3.6, 4.9, 5.8, 6.1], and the corresponding ML order for TEP generation is depicted in Fig. 1.2b. Unlike ORBGRAND, even though the order of the absolute values of the LLRs (ȳi ≤ ȳj, ∀i < j) is the same for y1 and y2, the TEPs are generated in a different order for y1 and y2.


Similarly, as shown in Fig. 1.2c, the ML order changes again at a third time instant, for y3 = [1.8, 2.0, 3.9, 4.1, 5.6, 3.3]. Because the TEP query order changes with each received vector from the channel (y) (Fig. 1.2), and because of the TEP interdependence [19], SGRAND does not lend itself to efficient parallel hardware implementation. Alternatively, developing a sequential hardware implementation for SGRAND would result in a high decoding latency, which is unsuitable for applications requiring ultra-low latency. ORBGRAND, on the other hand, generates TEPs in the predetermined logistic weight order; therefore, ORBGRAND is far better suited to parallel hardware implementation than SGRAND.
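For completeness, the ML order itself can be generated with a priority queue using the successor rule of [23]: pop the lowest-cost candidate, then push the two TEPs obtained by extending its support with the next position or by advancing its highest flipped position. The sketch below is illustrative, uses 0-based positions, and assumes the positions are already sorted by ascending reliability, as in Table 1.2.

import heapq

def sgrand_supports(y_abs):
    # y_abs: per-bit reliabilities, sorted ascending, so flipping position i
    # costs y_abs[i] and supports are yielded in ascending total cost (ML order).
    n = len(y_abs)
    yield ()                                      # the all-zero TEP first
    heap = [(y_abs[0], (0,))]                     # flip the least reliable bit
    while heap:
        cost, supp = heapq.heappop(heap)
        yield supp
        j = supp[-1]
        if j + 1 < n:
            # Child 1: additionally flip the next position.
            heapq.heappush(heap, (cost + y_abs[j + 1], supp + (j + 1,)))
            # Child 2: move the highest flip one position up.
            heapq.heappush(heap, (cost - y_abs[j] + y_abs[j + 1], supp[:-1] + (j + 1,)))

For y1 = [1.0, 2.1, 3.2, 4.3, 5.4, 6.5] the first supports are (), (0,), (1,), (0, 1), (2,), which matches the query order of rows 0–4 in Table 1.2; the heap plays the role of the candidate set S.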

1.5 Performance and Complexity of GRAND

1.5.1 Performance of GRAND Variants

Figure 1.3 compares the decoding performance of various GRAND variants for decoding the 5G NR CRC-aided Polar code (128, 105+11) and Polar code (128, 99+11), with BPSK modulation over an AWGN channel.

Fig. 1.3: Comparison of decoding performance of different GRAND variants with OSD (Order = 2), CA-SCL and DSCF (ω = 2, Tmax = 50) decoder for 5G-NR polar codes

Furthermore, the decoding performance of state-of-the-art soft-input decoders such as the CRC-aided Successive Cancellation List (CA-SCL) decoder [24, 25], the Dynamic SC-Flip (DSCF) decoder [26, 27], and the Ordered Statistics Decoder (OSD) [25, 28, 29] is included for reference. The ORBGRAND and SGRAND soft-input decoders outperform the hard-input GRANDAB variant in decoding performance, and SGRAND achieves ML decoding performance similar to OSD, as shown in Fig. 1.3.


Figure 1.4 compares the decoding performance of different GRAND variants with OSD and ML decoding of the BCH codes (127, 106) and (127, 113). The ML decoding results are obtained from [30]. Similar trends in decoding performance can be seen for the BCH codes depicted in Fig. 1.4, where the soft-input variants of GRAND (ORBGRAND and SGRAND) outperform the hard-input traditional Berlekamp-Massey (B-M) [31, 32] decoder, and SGRAND achieves ML performance comparable to OSD.

Fig. 1.4: Comparison of decoding performance of different GRAND variants with OSD (Order = 2) and ML decoder [30] for BCH codes

In Fig. 1.5, the decoding performance of GRAND-MO is compared to that of the traditional Berlekamp-Massey (B-M) decoder [31, 32] with the BCH code (127, 106) in Markov channels (communication channels with memory). Markov channels are susceptible to burst noise. The transitional probability (g) of the considered Markov channel is inversely proportional to the memory of the channel: as channel memory increases (indicated by a decreasing value of g), more burst noise is observed on the channel. The traditional B-M decoder's decoding performance deteriorates significantly since interleavers/de-interleavers are not present to spread these noise bursts across multiple codewords. On the other hand, the performance of GRAND-MO improves with increasing channel memory due to GRAND-MO's unique method of generating TEPs and mitigating the noise bursts.


Fig. 1.5: Comparison of the GRAND-MO and BCH Berlekamp-Massey (B-M) decoding performance for BCH code (127, 106) in Markov channels

Fig. 1.6: Comparing fading-GRAND and BCH Berlekamp-Massey (B-M) decoding of BCH codes on Rayleigh flat fading channel


Figure 1.6 compares the FER performance of Fading-GRAND with the traditional Berlekamp-Massey (B-M) decoder [31, 32] for decoding the BCH codes (127, 106) and (127, 113) on a multipath flat Rayleigh fading communication channel. For decoding the BCH code (127, 106), Fading-GRAND outperforms the B-M decoder by 4 dB; similarly, for decoding the BCH code (127, 113), Fading-GRAND outperforms the B-M decoder by 6.5 dB, as shown in Fig. 1.6.

1.5.2 Computational Complexity of GRAND

The computational complexity of GRAND and its variants can be expressed in terms of the number of codebook membership queries required. In GRAND and its variants, a codebook membership query consists of simple operations such as bit-flips and a syndrome check (codebook membership verification (1.2)). Furthermore, the complexity can be divided into two categories: worst-case complexity, which corresponds to the maximum number of codebook membership queries required, and average complexity, which corresponds to the average number of codebook membership queries required. For a code length of n = 128, the worst-case number of queries for the GRANDAB (AB = 3) decoder is 349,632 queries ($\sum_{i=1}^{AB} \binom{n}{i}$ [16]). The worst-case number of queries for the ORBGRAND decoder depends on the value of the parameter LWmax; for example, with LWmax = 96 and n = 128, the worst-case complexity is 3.69 × 10⁶ queries [33]. For SGRAND, the parameter Queriesmax (Queriesmax = 10⁷ [19]), which represents the maximum number of queries allowed, determines the worst-case complexity. Figure 1.7 compares the frame error rate (FER) performance and average complexity of different GRAND variants for decoding the CRC code (128, 104). As seen in Fig. 1.7b, as channel conditions improve, the average complexity of GRAND and its variants decreases sharply because transmissions subject to light noise are decoded quickly [16, 19, 20]. In terms of error decoding performance, SGRAND outperforms other GRAND variants by generating TEPs in ML order [19, 23]. As a result, SGRAND achieves ML decoding performance while requiring the fewest average number of codebook membership queries, as shown in Fig. 1.7. However, as explained previously, SGRAND is not suited for parallel hardware implementation.
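The GRANDAB figure quoted above can be checked directly (a quick arithmetic sketch):

from math import comb

n, AB = 128, 3
print(sum(comb(n, i) for i in range(1, AB + 1)))   # 128 + 8128 + 341376 = 349632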

1.6 Hardware Architectures for GRAND

The VLSI hardware architectures for GRAND and its variants published in the literature [33–39] can be classified into two basic categories, based on whether TEPs are generated offline or online.

1.6.1 Offline TEP Generation

The hardware architectures in this category rely on storing pre-computed TEPs in a large memory, which can be accessed either sequentially or in parallel during the decoding process to retrieve the TEPs. For example, the ORBGRAND hardware implementation proposed in [36] requires T − 2 pattern memories, each of size Qs × n bits, to store the pre-computed TEPs, where T is the number of pipeline stages and Qs is the size of each pattern memory.


Fig. 1.7: Comparison of decoding performance and average complexity of GRANDAB, ORBGRAND, and SGRAND (Queriesmax = 10⁷) decoding of CRC code (128, 104)

Although these hardware architectures are straightforward, they require a large area for the hardware implementation, since they need to store all the pre-computed TEPs for a particular GRAND variant.

1.6.2 Online TEP Generation

Instead of pre-computing and storing TEPs in memories, the GRAND hardware architectures in this category generate TEPs online. In [34], a cyclic shifter and pattern generator modules are used in collaboration to generate the TEPs online; we refer the reader to [34] for further details. The hardware architectures presented in [33–35, 37–39] employ a network of XOR gates and n × (n − k)-bit shift registers to generate the TEPs. These VLSI architectures [33–35, 37–39] enable hardware implementations that are both area-efficient and energy-efficient, and these implementations will be discussed in detail in the relevant chapters of this book.

1.7 Conclusion

The GRAND universal decoding technique has been introduced in this chapter. Furthermore, several GRAND variants are described in detail, as well as their test error pattern generation methods. In-depth insights are also given regarding GRAND's decoding performance and computational complexity.


References

1. Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
2. Hocquenghem, A. (1959). Codes correcteurs d'erreurs. Chiffres, 2, 147–156.
3. Bose, R., & Ray-Chaudhuri, D. (1960). On a class of error correcting binary group codes. Information and Control, 3, 68–79.
4. Gallager, R. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, 8, 21–28.
5. Berrou, C., Glavieux, A., & Thitimajshima, P. (1993). Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1. In Proceedings of ICC '93 - IEEE International Conference on Communications (Vol. 2, pp. 1064–1070).
6. Sasoglu, E., Telatar, E., & Arikan, E. (2009). Polarization for arbitrary discrete memoryless channels. In 2009 IEEE Information Theory Workshop (pp. 144–148).
7. Arikan, E. (2009). Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55, 3051–3073.
8. Coskun, M., Durisi, G., Jerkovits, T., Liva, G., Ryan, W., Stein, B., & Steiner, F. (2019). Efficient error-correcting codes in the short blocklength regime. Physical Communication, 34, 66–79.
9. Bennis, M., Debbah, M., & Poor, H. (2018). Ultrareliable and low-latency wireless communication: Tail, risk, and scale. Proceedings of the IEEE, 106, 1834–1853.
10. 3GPP NR; Multiplexing and Channel Coding (2020). http://www.3gpp.org/DynaReport/38-series.htm, Rel. 16.1.
11. Durisi, G., Koch, T., & Popovski, P. (2016). Toward massive, ultrareliable, and low-latency wireless communication with short packets. Proceedings of the IEEE, 104, 1711–1726.
12. Parvez, I., Rahmati, A., Guvenc, I., Sarwat, A., & Dai, H. (2018). A survey on low latency towards 5G: RAN, core network and caching solutions. IEEE Communications Surveys Tutorials, 20, 3098–3130.
13. Ma, Z., Xiao, M., Xiao, Y., Pang, Z., Poor, H., & Vucetic, B. (2019). High-reliability and low-latency wireless communication for internet of things: Challenges, fundamentals, and enabling technologies. IEEE Internet of Things Journal, 6, 7946–7970.
14. Zhan, M., Pang, Z., Dzung, D., & Xiao, M. (2018). Channel coding for high performance wireless control in critical applications: Survey and analysis. IEEE Access, 6, 29648–29664.
15. Chen, H., Abbas, R., Cheng, P., Shirvanimoghaddam, M., Hardjawana, W., Bao, W., et al. (2018). Ultra-reliable low latency cellular networks: Use cases, challenges and approaches. IEEE Communications Magazine, 56, 119–125.
16. Duffy, K. R., Li, J., & Médard, M. (2019). Capacity-achieving guessing random additive noise decoding. IEEE Transactions on Information Theory, 65(7), 4023–4040.
17. Duffy, K. R., & Médard, M. (2019). Guessing random additive noise decoding with soft detection symbol reliability information - SGRAND. In 2019 IEEE International Symposium on Information Theory (ISIT) (pp. 480–484).
18. Duffy, K., Solomon, A., Konwar, K., & Médard, M. (2020). 5G NR CA-polar maximum likelihood decoding by GRAND. In 2020 54th Annual Conference on Information Sciences and Systems (CISS) (pp. 1–5).
19. Solomon, A., Duffy, K. R., & Médard, M. (2020). Soft maximum likelihood decoding using GRAND. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC) (pp. 1–6).


20. Duffy, K. R. (2021). Ordered reliability bits guessing random additive noise decoding. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8268–8272).
21. An, W., Médard, M., & Duffy, K. R. (2020). Keep the bursts and ditch the interleavers. In GLOBECOM 2020 - 2020 IEEE Global Communications Conference (pp. 1–6).
22. Mohsin Abbas, S., Jalaleddine, M., & Gross, W. (2022). GRAND for Rayleigh fading channels. ArXiv e-prints. arXiv:2205.00030.
23. Valembois, A., & Fossorier, M. (2001). An improved method to compute lists of binary vectors that optimize a given weight function with application to soft-decision decoding. IEEE Communications Letters, 5, 456–458.
24. Tal, I., & Vardy, A. (2015). List decoding of polar codes. IEEE Transactions on Information Theory, 61, 2213–2226.
25. Balatsoukas-Stimming, A., Parizi, M., & Burg, A. (2015). LLR-based successive cancellation list decoding of polar codes. IEEE Transactions on Signal Processing, 63, 5165–5179.
26. Chandesris, L., Savin, V., & Declercq, D. (2018). Dynamic-SCFlip decoding of polar codes. IEEE Transactions on Communications, 66, 2333–2345.
27. Ercan, F., Tonnellier, T., Doan, N., & Gross, W. (2020). Practical dynamic SC-flip polar decoders: Algorithm and implementation. IEEE Transactions on Signal Processing, 68, 5441–5456. https://doi.org/10.1109/TSP.2020.3023582
28. Fossorier, M., & Lin, S. (1995). Soft-decision decoding of linear block codes based on ordered statistics. IEEE Transactions on Information Theory, 41, 1379–1396.
29. Wonterghem, J., Alloum, A., Boutros, J., & Moeneclaey, M. (2017). On performance and complexity of OSD for short error correcting codes in 5G-NR. In Proceedings of the First International Balkan Conference on Communications and Networking (BalkanCom 2017).
30. Helmling, M., Scholl, S., Gensheimer, F., Dietz, T., Kraft, K., Ruzika, S., & Wehn, N. (2019). Database of channel codes and ML simulation results. www.uni-kl.de/channel-codes
31. Berlekamp, E. (1968). Nonbinary BCH decoding (Abstr.). IEEE Transactions on Information Theory, 14, 242–242.
32. Massey, J. (1969). Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, 15, 122–127.
33. Abbas, S., Tonnellier, T., Ercan, F., Jalaleddine, M., & Gross, W. (2021). High-throughput VLSI architecture for soft-decision decoding with ORBGRAND. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8288–8292).
34. Riaz, A., Bansal, V., Solomon, A., An, W., Liu, Q., Galligan, K., Duffy, K., Medard, M., & Yazicigil, R. (2021). Multi-code multi-rate universal maximum likelihood decoder using GRAND. In ESSCIRC 2021 - IEEE 47th European Solid State Circuits Conference (ESSCIRC) (pp. 239–246).
35. Abbas, S., Jalaleddine, M., & Gross, W. (2022). Hardware architecture for guessing random additive noise decoding Markov order (GRAND-MO). Journal of Signal Processing Systems, 94, 1047–1065. https://doi.org/10.1007/s11265-022-01775-2
36. Condo, C. (2022). A fixed latency ORBGRAND decoder architecture with LUT-aided error-pattern scheduling. IEEE Transactions on Circuits and Systems I: Regular Papers, 69, 2203–2211.
37. Abbas, S., Tonnellier, T., Ercan, F., & Gross, W. (2020). High-throughput VLSI architecture for GRAND. In 2020 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 1–6).


38. Abbas, S., Tonnellier, T., Ercan, F., Jalaleddine, M., & Gross, W. (2022). High-throughput and energy-efficient VLSI architecture for ordered reliability bits GRAND. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30, 681–693.
39. Abbas, S. M., Jalaleddine, M., & Gross, W. J. (2021). High-throughput VLSI architecture for GRAND Markov order. In 2021 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 158–163).

Part II

Hardware Architectures for Conventional GRAND Variants

We explore the conventional GRAND variants in this part of the book; these variants can be used with any underlying communication channel. The conventional GRAND variants covered in this part include GRAND with ABandonment (GRANDAB), ORBGRAND, and List-GRAND (LGRAND). For each GRAND variant, a detailed VLSI hardware architecture is also discussed.

Chapter 2

Hardware Architecture for GRAND with ABandonment (GRANDAB)

Syed Mohsin Abbas, Marwan Jalaleddine, Furkan Ercan, Thibaud Tonnellier, and Warren J. Gross

McGill University, Montréal, QC, Canada. e-mail: [email protected]
McGill University, Montréal, QC, Canada. e-mail: [email protected]
Intel Labs, Intel Corporation, Hudson, MA, 01749, USA. e-mail: [email protected]
Airbus Defence and Space, France. e-mail: [email protected]
McGill University, Montréal, QC, Canada. e-mail: [email protected]

Abstract This chapter presents a high-throughput and energy-efficient hardware architecture for the GRAND with ABandonment (GRANDAB) decoder, a hard-input variant of GRAND. GRANDAB generates Test Error Patterns (TEPs) $e$ in increasing Hamming weight order, which are applied to the hard-demodulated vector received from the communication channel ($\hat{y}$), and the resultant vector ($\hat{y} \oplus e$) is evaluated for codebook membership. In this chapter, we propose leveraging the linearity property of the underlying code to simplify the codebook membership verification procedure. Furthermore, we present a shift-register-based VLSI design that applies multiple TEPs simultaneously via a network of XOR gates. For a (128, 104) linear block code, the proposed GRANDAB hardware can achieve an average information throughput of up to 52 Gbps. Furthermore, the proposed hardware can be used to decode any code of length 128 with a code rate between 0.75 and 1.

2.1 GRAND with ABandonment (GRANDAB)

In this chapter, we focus on the GRAND with ABandonment (GRANDAB) [1, 2] decoder, which is a hard-input variant of GRAND. We begin by examining the decoding performance and computational complexity of the GRANDAB decoder. Based on this analysis, we then simplify the GRANDAB algorithm and design a highly parallelized VLSI architecture for its hardware implementation. Throughout the chapter, we use the vector $\hat{y}$ to refer to the hard-demodulated received vector from the channel, and the symbol HW to represent the Hamming weight of the TEP $e$. Furthermore, we consider BPSK modulation over an AWGN channel with variance $\sigma^2$ for all the numerical simulation results presented in this chapter. Please note that this chapter follows up on our previously published work [3].

2.1.1 Decoding Performance and Complexity Analysis

The GRANDAB ($AB = t$, $\forall t \in [1, AB_{max}]$) decoder can decode channel codes of various classes, lengths, and rates, since GRAND is code-agnostic. To do so, the GRANDAB decoder generates TEPs ($e$) in ascending


Hamming weight order, up to the Hamming weight AB. The decoding performance and computational complexity (number of codebook membership queries required) of GRANDAB decoding are directly impacted by the parameter AB. Therefore, the parameter AB can be explored to strike a balance between the decoding performance requirements and the complexity budget of a target application.

Definition 2.1 The term GRANDAB, with $AB = t$, indicates that the Hamming weight of the TEPs ($e$) under consideration does not exceed $t$.

Worst Case Complexity of GRANDAB
For a code of length $n$ and $AB = t$, the worst-case complexity (i.e., the maximum number of codebook membership queries) is given by

$$\sum_{i=1}^{t} \binom{n}{i}. \qquad (2.1)$$
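Expression (2.1) is straightforward to evaluate numerically; the following minimal Python sketch (the function name is ours) computes it and reproduces the count used in Example 2.1:

```python
from math import comb

def grandab_worst_case_queries(n: int, ab: int) -> int:
    """Worst-case number of codebook membership queries for GRANDAB, Eq. (2.1)."""
    return sum(comb(n, i) for i in range(1, ab + 1))

# Code of length n = 128 decoded with AB = 3:
assert grandab_worst_case_queries(128, 3) == 349_632
```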

Example 2.1: Worst-Case Number of Queries for GRANDAB with n = 128 and AB = 3
The worst-case number of queries for GRANDAB ($AB = 3$) is 349,632 for a code length of $n = 128$.

Recap: Performance Evaluation
Figure 2.1 compares the decoding performance and average complexity of GRANDAB decoding of Bose-Chaudhuri-Hocquenghem (BCH) [4, 5] codes (127, 106) and (127, 113) against those of the traditional Berlekamp-Massey (B-M) decoder [6, 7] for BCH codes. As seen in Fig. 2.1a, the decoding performance of GRANDAB for the BCH code (127, 106) improves with increasing AB up to 3, at which point it performs similarly to the B-M decoder. GRANDAB with $AB = 2$ performs similarly to the B-M decoder for the BCH code (127, 113), as shown in Fig. 2.1c. For the BCH (127, 106) and BCH (127, 113) codes, respectively, Fig. 2.1b and d show the average complexity of the GRANDAB decoder for various choices of the parameter AB. As shown in Fig. 2.1b and d, the average complexity of GRANDAB decreases with improved channel conditions (higher $E_b/N_0$ values), regardless of the code rate. As will be seen in Sec. 2.4, this reduction in complexity translates to a lower average decoding latency for the GRANDAB hardware implementation, rendering GRAND suitable for applications requiring ultra-low latency.


Fig. 2.1: Comparison of decoding performance and average complexity of GRANDAB ($AB = t$) and B-M decoding [6, 7] of BCH codes

2.1.2 Proposed Simplifications for the Codebook Membership Evaluation

GRAND is centered around generating TEPs $e$, applying them to the hard-demodulated received vector ($\hat{y}$) from the channel, and querying the resultant vector $\hat{y} \oplus e$ for codebook membership as follows:

$$H \cdot (\hat{y} \oplus e)^\top = 0 \qquad (2.2)$$

If the codebook membership evaluation criterion (2.2) is satisfied, $e$ is the guessed noise and $\hat{c} = \hat{y} \oplus e$ is the estimated codeword. The GRANDAB ($AB = t$, $\forall t \in [1, AB_{max}]$) decoding procedure applies TEPs ($e$) with a Hamming weight of $\geq 1$ and $\leq AB_{max}$ to the received vector ($\hat{y}$) and terminates the decoding if any of the applied TEPs ($e$) satisfies the codebook membership criterion (2.2).

2.1.2.1 Simplifications for Checking TEPs (e) with HW = 1
For checking the TEPs with Hamming weight of 1 ($e = 1_i$, $\forall i \in [1, n]$), the codebook membership verification (2.2) can be expanded as follows:

$$H \cdot (\hat{y} \oplus 1_i)^\top = H \cdot \hat{y}^\top \oplus H \cdot 1_i^\top, \qquad (2.3)$$

where $H \cdot \hat{y}^\top$ (denoted as $s_c$) represents the $(n-k)$-bit syndrome of the received vector $\hat{y}$, and $H \cdot 1_i^\top$ (denoted as $s_i$) is the $(n-k)$-bit syndrome of the error pattern with Hamming weight of 1 ($1_i$). Please note that the syndrome $s_i$, $\forall i \in [1, n]$, also corresponds to the $i$-th column of the parity check matrix $H$.

2.1.2.2 Simplifications for Checking TEPs (e) with HW > 1
In a similar manner, for verifying error patterns with Hamming weight $> 1$, we can leverage the underlying code's linearity property to aggregate $t$ syndromes of error patterns with a Hamming weight of 1 ($s_i$), and then generate syndromes that correspond to an error pattern with a Hamming weight of $t$ ($s_{1,2,\ldots,t} = H \cdot 1_1^\top \oplus H \cdot 1_2^\top \oplus \cdots \oplus H \cdot 1_t^\top$). For example, the codebook membership for TEPs with Hamming weight 2, $e = 1_{i,j}$ with $i \in [1, n]$, $j \in [1, n]$ and $i \neq j$, can be checked as

$$H \cdot (\hat{y} \oplus 1_{i,j})^\top = H \cdot \hat{y}^\top \oplus H \cdot 1_i^\top \oplus H \cdot 1_j^\top. \qquad (2.4)$$

Similarly, the codebook membership for TEPs with Hamming weight of 3, $e = 1_{i,j,k}$, can be checked as

$$H \cdot (\hat{y} \oplus 1_{i,j,k})^\top = H \cdot \hat{y}^\top \oplus H \cdot 1_i^\top \oplus H \cdot 1_j^\top \oplus H \cdot 1_k^\top. \qquad (2.5)$$

Checking TEPs of an Arbitrary HW
In general, codebook membership for TEPs with any Hamming weight, $e = 1_{i,j,k,\ldots,z}$, can be checked as:

$$H \cdot (\hat{y} \oplus 1_{i,j,k,\ldots,z})^\top = H \cdot \hat{y}^\top \oplus H \cdot 1_i^\top \oplus H \cdot 1_j^\top \oplus H \cdot 1_k^\top \oplus \cdots \oplus H \cdot 1_z^\top \qquad (2.6)$$

This is equivalent to applying an XOR operation on the relevant columns of the parity check matrix $H$.
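In software terms, (2.6) amounts to XOR-ing columns of $H$ into the received-vector syndrome. The following minimal Python sketch (function names are ours, not part of the hardware design) illustrates the check:

```python
import numpy as np

def syndrome(H: np.ndarray, v: np.ndarray) -> np.ndarray:
    """(n-k)-bit syndrome H·v^T over GF(2)."""
    return H @ v % 2

def tep_is_codebook_member(H: np.ndarray, s_c: np.ndarray, flips) -> bool:
    """Check (2.6): XOR the columns of H at the flipped positions into the
    received-vector syndrome s_c; an all-zero result means y^ XOR e is a codeword."""
    s = s_c.copy()
    for i in flips:
        s ^= H[:, i]  # H·1_i^T is simply the i-th column of H
    return not s.any()
```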

Fig. 2.2: Proposed VLSI architecture for GRANDAB ($AB = 3$)

2.2 Proposed VLSI Architecture for GRANDAB

In this section, we provide details of the proposed VLSI architecture for GRANDAB ($AB \le 3$) decoding of linear $(n, k)$ block codes [3]. The proposed VLSI architecture for GRANDAB ($AB = 3$) is shown in Fig. 2.2, which features the following components:

1. Two $n \times (n-k)$-bit shift registers. Each of these shift registers stores all the $(n-k)$-bit syndromes of error patterns with a Hamming weight of 1 (denoted as $s_i = H \cdot 1_i^\top$, $i \in [1, n]$).
2. A network of $(n-k)$-bit XOR gates, used to combine multiple $s_i$ to generate the syndrome of an error pattern with a Hamming weight of $t > 1$ ($s_{1,2,\ldots,t} = H \cdot 1_1^\top \oplus H \cdot 1_2^\top \oplus \cdots \oplus H \cdot 1_t^\top$).
3. Index registers, which are $n \times \log_2 n$-bit shift registers that perform the same operations as the main shift registers (which store $s_i$) in order to keep track of the indices ($i$ in $s_i$).
4. A controller module that works in conjunction with the shift registers.

Shift Operations for the Shift Registers
There are two shift operations supported by the shift registers:
• Cyclic-Shift: At each time step, the shift registers' contents can be shifted in a cyclic manner. For example, when the content of the second row is shifted to the first row, the first row's content is moved to the last row.
• Shift-up: This operation is similar to the Cyclic-Shift, except that the content of the last row is replaced by the $(n-k)$-bit wide null vector during a shift-up operation. Subsequent cyclic shifts exclude rows holding null vectors after a shift-up operation has been performed.


The proposed GRANDAB VLSI architecture can be used to decode any linear block code of length $n$ and code rates between 0.75 and 1. Please note that the control and clock signals are not shown in Fig. 2.2 for clarity. The proposed hardware architecture takes $\hat{y}$ as an input and outputs the estimated word $\hat{u}$. Furthermore, any $H$ matrix can be loaded into the $(n-k) \times n$-bit H memory at any time to support different codes and rates. A syndrome check (2.2) is applied to the input vector $\hat{y}$ in the first stage of decoding. If the syndrome is verified (i.e., the syndrome is zero), decoding is assumed to be successful. Otherwise, the TEPs $e$ are generated in ascending Hamming weight order and applied to $\hat{y}$, after which the resulting vector $\hat{y} \oplus e$ is checked for codebook membership. If any of the generated TEPs ($e$) satisfies the codebook membership constraint (2.2), the controller module forwards the corresponding indices to the Word Computation Unit (WCU) module, which translates these index values to their correct bit-flip locations in $\hat{y}$.

Fig. 2.3: VLSI architecture for checking error patterns with Hamming weight of 1 ($s_i = H \cdot 1_i^\top$, $i \in [1, n]$)

2.2.1 Checking TEPs with Hamming Weight of 1

Figure 2.3 shows the VLSI architecture for checking TEPs with Hamming weight of 1 for codebook membership (2.3). To check these TEPs, a network of $(n-k)$-bit wide XOR gates is used to combine the syndrome of the received vector ($s_c$) with each row of the shift register ($s_i \oplus s_c$, $i \in [1, n]$). Each of the $n$ test syndromes is then NOR-reduced to feed an $n$-to-$\log_2 n$ priority encoder. The output of each NOR-reduce is 1 if and only if all the bits of the syndrome computed by (2.3) are 0. If any of the NOR-reduce outputs is 1, the priority encoder module is used to locate the TEP ($e$) whose syndrome satisfies (2.3).
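Functionally, this one-step check can be mimicked in a few lines of Python (a behavioural sketch only; the hardware performs all $n$ comparisons in parallel, and the function name is ours):

```python
import numpy as np

def hw1_hit_index(H: np.ndarray, s_c: np.ndarray):
    """Behavioural model of Fig. 2.3: test s_i XOR s_c for every column i,
    NOR-reduce each test syndrome, and priority-encode the first hit.
    Returns the flip index i satisfying (2.3), or None if no HW-1 TEP works."""
    tests = H ^ s_c[:, None]      # s_i XOR s_c for all i at once (columns of H are s_i)
    hits = ~tests.any(axis=0)     # NOR-reduce: True where a test syndrome is all-zero
    idx = np.flatnonzero(hits)
    return int(idx[0]) if idx.size else None
```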


Time Steps Required for HW = 1
One time step is required to exhaustively evaluate the codebook membership (2.3) for all the TEPs ($e$) of $HW = 1$ using the VLSI architecture presented in Fig. 2.3.

Fig. 2.4: Content of the shift registers for checking TEPs with Hamming weight of 2, $e = 1_{i,j}$, at different time steps. (a) First time step. (b) Second time step

2.2.2 Checking TEPs with Hamming Weight of 2

Figure 2.4a shows the content of the shift registers at the first time step when checking the TEPs with Hamming weight 2: the contents of shift register 2 are an identical replica (copy) of the contents of shift register 1, cyclically shifted by one. By combining each row of the shift registers with the syndrome of the received vector ($s_c$), we can compute $n$ syndromes, associated with TEPs with Hamming weight of 2, in one time step. At the next time step, the content of shift register 2 is cyclically shifted by one, as shown in Fig. 2.4b. Observing that $1_{i,j} = 1_{j,i}$, all the $\binom{n}{2}$ TEPs with Hamming weight of 2 are tested for codebook membership (2.4) after a total of $\lfloor n/2 \rfloor - 1$ cyclic shifts from the original setting (Fig. 2.4a).

Time Steps Required for TEPs with HW = 2
A total of $\lfloor n/2 \rfloor$ time steps are required to exhaustively check all the test error patterns of $HW = 2$ (2.4) with the VLSI architecture presented in Fig. 2.4. Please note that every time a shift register is updated (cyclically shifted or shifted up), its corresponding index shift register (index register) is likewise updated in order to keep track of the indices.
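The coverage claim is easy to verify in software. The sketch below (our own helper, with 0-based indices) enumerates the index pairs tested at each cyclic shift and confirms that all $\binom{n}{2}$ pairs are covered within $\lfloor n/2 \rfloor$ steps:

```python
def hw2_schedule(n: int):
    """Index pairs (i, j) tested per time step of the HW = 2 schedule:
    at step s, row i of register 1 is paired with row (i + s) mod n of register 2."""
    for s in range(1, n // 2 + 1):
        yield {frozenset((i, (i + s) % n)) for i in range(n)}

n = 6
covered = set()
for pairs in hw2_schedule(n):
    covered |= pairs
assert len(covered) == n * (n - 1) // 2   # all 15 HW-2 TEPs seen in floor(6/2) = 3 steps
```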


Example 2.2: Checking TEPs HW = 2, n = 6

Fig. 2.5: Checking TEPs with Hamming weight of 2 for n = 6 at different time steps

The contents of the shift registers at various time steps for evaluating TEPs with Hamming weight 2 and $n = 6$ are shown in Fig. 2.5. The contents of shift register 2 are cyclically shifted by one at each time step, and all 15 ($\binom{6}{2}$) TEPs are evaluated for codebook membership after 3 ($\lfloor 6/2 \rfloor$) time steps.

2.2.3 Checking TEPs with HW of 3

Fig. 2.6: Content of the shift registers and the syndrome output by the controller for checking the TEPs with Hamming weight of 3, $e = 1_{i,j,k}$, at different time steps. (a) First time step. (b) Second time step. (c) Time step $\lfloor (n-1)/2 \rfloor + 1$. (d) Time step $\lfloor (n-1)/2 \rfloor + 2$


To check all the TEPs corresponding to Hamming weight of 3, a controller is used in conjunction with the shift registers. Figure 2.6a shows the content of the shift registers and the syndrome output by the controller to generate $n-1$ TEPs with Hamming weight of 3 at time step 1. To do so, shift register 1 is shifted up by 1, while shift register 2 is shifted up by 1 and cyclically shifted by 1 at initialization. In the next time step, shift register 2 is cyclically shifted by 1 to generate the next $n-1$ TEPs with Hamming weight of 3, as shown in Fig. 2.6b. After $\lfloor (n-1)/2 \rfloor$ time steps, all the $\binom{n-1}{2}$ TEPs with Hamming weight of 3 and with $s_1$ output by the controller module are evaluated for codebook membership (2.5). In the next time step, the controller outputs $s_2$ while shift register 1 is shifted up by 1 and shift register 2 is reset, shifted up by 2 and cyclically shifted by 1. This step evaluates (2.5) for $n-2$ TEPs with Hamming weight of 3, as shown in Fig. 2.6c. Shift register 2 is cyclically shifted by 1 in the following time step, enabling the evaluation of the subsequent $n-2$ TEPs with a Hamming weight of 3, as illustrated in Fig. 2.6d. Hence, a total of $\lfloor (n-2)/2 \rfloor$ time steps are required to check (2.5) all the $\binom{n-2}{2}$ TEPs with Hamming weight of 3 and $s_2$ output by the controller. Similarly, this process is repeated until $s_{n-2}$ is output by the controller, where only one TEP with Hamming weight of 3 is evaluated: $H \cdot \hat{y}^\top \oplus H \cdot 1_{n-2}^\top \oplus H \cdot 1_{n-1}^\top \oplus H \cdot 1_n^\top$.

Time Steps Required for TEPs with HW = 3
Checking all the TEPs ($e$) with Hamming weight of 3 requires $\sum_{i=2}^{n-1} \lfloor i/2 \rfloor$ time steps.

Time Steps Required for Checking all TEPs (AB ≤ 3)
In summary, the number of required time steps to check all the TEPs with Hamming weight of 3 or less ($AB \le 3$) is given by:

$$2 + \sum_{i=2}^{n} \left\lfloor \frac{i}{2} \right\rfloor. \qquad (2.7)$$
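Expression (2.7) can be checked numerically; the following one-liner (our own sketch) reproduces the worst-case latency of 4098 cycles quoted in Sec. 2.4 for $n = 128$:

```python
def grandab_worst_case_cycles(n: int) -> int:
    """Worst-case time steps for AB <= 3, Eq. (2.7): 2 steps (initial syndrome
    check plus the HW = 1 sweep), floor(n/2) steps for HW = 2 (the i = n term),
    and sum of floor(i/2) for i = 2..n-1 for HW = 3."""
    return 2 + sum(i // 2 for i in range(2, n + 1))

assert grandab_worst_case_cycles(128) == 4098
```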


Example 2.3: Checking TEPs HW = 3, n = 6

Fig. 2.7: Checking TEPs with Hamming weight of 3 for n = 6 at different time steps

The contents of the shift registers at various time steps for evaluating TEPs with Hamming weight 3 and $n = 6$ are shown in Fig. 2.7. All 20 ($\binom{6}{3}$) TEPs are evaluated for codebook membership after 6 ($\sum_{i=2}^{n-1} \lfloor i/2 \rfloor$) time steps.

Fig. 2.8: Proposed VLSI architecture for GRANDAB ($AB = 3$ and $L > 1$). PU: Parallelization Unit

2.3 VLSI Architecture for GRANDAB with Improved Parallelization Factor

The GRANDAB ($AB = 3$) VLSI architecture presented in the preceding section has a worst-case latency of 4098 time steps (2.7) for a linear block code of length $n = 128$. It is also possible to duplicate shift register 2 and its associated circuitry, referred to as the Parallelization Unit (PU) in Fig. 2.2, to increase the parallelization factor and thereby decrease the worst-case latency (number of required time steps). Figure 2.8 depicts the proposed VLSI architecture for GRANDAB ($AB = 3$) with added parallelization units. Please note that $L$ stands for the number of parallelization units; as the number of parallelization units ($L > 1$) increases, the worst-case latency decreases, as shown in the following subsections and in the implementation results section. This section discusses the scheduling and evaluation of TEPs using the proposed hardware architecture with additional PUs shown in Fig. 2.8.

Fig. 2.9: Content of the shift registers for checking TEPs with Hamming weight of 2 for $L > 1$ at different time steps. (a) First time step. (b) Second time step


2.3.1 Evaluating TEPs with Hamming Weight of 2 (L > 1)

The contents of the shift registers at the first time step while checking TEPs with Hamming weight 2 and $L > 1$ are shown in Fig. 2.9a: the contents of shift register $P_1$ are a cyclically-shifted-by-one image of shift register 1. Similarly, the contents of shift register $P_j$, $\forall j \in [2, L]$, are a cyclically-shifted-by-one image of the preceding shift register $P_{j-1}$. In the first time step, $n \times L$ TEPs can be tested for codebook membership (2.2) by combining each row of shift register 1 and of the shift registers $P_j$, $\forall j \in [1, L]$, with the syndrome of the received vector ($s_c$). At the next time step, to check the next $n \times L$ TEPs, the contents of shift register $P_1$ are cyclically shifted by $L$ and the contents of the shift registers $P_j$, $\forall j \in [2, L]$, are a cyclically-shifted-by-one image of the preceding shift register $P_{j-1}$, as shown in Fig. 2.9b. This procedure is repeated to test the subsequent $n \times L$ TEPs, with shift register $P_1$ being cyclically shifted by $L$ at each time step and the shift registers $P_j$, $\forall j \in [2, L]$, being cyclically-shifted-by-one images of their preceding shift registers $P_{j-1}$.

Time Steps Needed with L Parallelization Factor
Hence, with $L > 1$, all the $\binom{n}{2}$ TEPs with Hamming weight 2 are checked for codebook membership (2.2) after a total of $\lceil \lfloor n/2 \rfloor / L \rceil - 1$ cyclic shifts (time steps) from the original setting (Fig. 2.9a).

Example 2.4: Using Parallelization Factor of 2 with n = 6 and HW = 2

Fig. 2.10: (a) Content of the shift registers for checking TEPs with Hamming weight of 2, n = 6 and L = 2 at the first time step. (b) Combining elements of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, with $s_c$

Figure 2.10a displays the contents of the shift registers at the first time step for evaluating TEPs with Hamming weight 2, $n = 6$, and 2 parallelization units ($L = 2$). Figure 2.10b shows how each element of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, is combined with the received vector syndrome ($s_c$).

Fig. 2.11: (a) Content of the shift registers for checking TEPs with Hamming weight of 2, n = 6 and L = 2 at the second time step. (b) Combining elements of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, with $s_c$

At the next time step, the contents of shift register $P_1$ are cyclically shifted by 2 ($L = 2$) and the contents of shift register $P_2$ are a cyclically-shifted-by-one image of the preceding shift register ($P_1$). Figure 2.11a shows the content of the shift registers with $n = 6$ and $L = 2$ at the second time step, whereas the combination of each element of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, with $s_c$ is shown in Fig. 2.11b.

Fig. 2.12: Content of the shift registers for checking TEPs with Hamming weight of 3 for $L > 1$ at different time steps. (a) First time step. (b) Second time step. (c) Time step $\lceil \lfloor (n-1)/2 \rfloor / L \rceil + 1$. (d) Time step $\lceil \lfloor (n-1)/2 \rfloor / L \rceil + 2$


2.3.2 Evaluating TEPs with Hamming Weight of 3 (L > 1)

The procedure described in Sec. 2.2.3 can be used to check TEPs with a Hamming weight of 3 with $L > 1$ parallelization units; only minor modifications are needed to make that procedure suitable for the VLSI architecture of Fig. 2.8. In the VLSI architecture with a single parallelization unit ($L = 1$), the controller module provides the syndrome $s_1$, which is combined with the elements of the shift registers (shift registers 1 and 2 in Fig. 2.2) as well as with the syndrome of the received vector ($s_c$). For the VLSI architecture supporting $L > 1$ parallelization units (Fig. 2.8), the contents of the shift registers and the syndrome output by the controller ($s_1$) to check TEPs with Hamming weight of 3 are shown in Fig. 2.12a. This arrangement of shift registers evaluates all of the $(n-1) \times L$ TEPs with the $s_1$ output from the controller in one time step. To check the subsequent $(n-1) \times L$ TEPs, at the next time step, shift register $P_1$ is cyclically shifted by $L$, whereas shift register $P_j$, $\forall j \in [2, L]$, is a cyclically-shifted-by-one image of its preceding shift register $P_{j-1}$, as illustrated in Fig. 2.12b. This procedure is repeated at each time step, and after $\lceil \lfloor (n-1)/2 \rfloor / L \rceil$ time steps, all $\binom{n-1}{2}$ TEPs with a Hamming weight of 3 and $s_1$ output by the controller are evaluated. In the next time step, the controller outputs $s_2$, shift register 1 is shifted up by 1 and shift register $P_1$ is reset, shifted up by 2 and cyclically shifted by 1, as shown in Fig. 2.12c. Furthermore, the shift registers $P_j$, $\forall j \in [2, L]$, are cyclically-shifted-by-one images of the preceding registers $P_{j-1}$. In this setting, all of the $(n-2) \times L$ TEPs are checked for codebook membership in one time step, as illustrated in Fig. 2.12c. In the following time step, shift register $P_1$ is shifted by $L$ to check the next $(n-2) \times L$ TEPs, where the shift registers $P_j$, $\forall j \in [2, L]$, are the cyclically shifted images of their previous shift registers $P_{j-1}$, as illustrated in Fig. 2.12d. Therefore, after $\lceil \lfloor (n-2)/2 \rfloor / L \rceil$ time steps, all $\binom{n-2}{2}$ TEPs with Hamming weight of 3 and $s_2$ output by the controller are checked for codebook membership. This process is repeated until the controller outputs $s_{n-2}$, as explained in the preceding subsection. The preceding discussion leads to the conclusion that, for $L > 1$, checking all TEPs with a Hamming weight of 3 requires a total of $\sum_{i=2}^{n-1} \left\lceil \frac{\lfloor i/2 \rfloor}{L} \right\rceil$ time steps.

Time Steps Required to Check All TEPs (AB ≤ 3 and L > 1)
In summary, the number of required time steps to check all the TEPs for codebook membership, with Hamming weight $AB \le 3$ and with $L > 1$, is given by:

$$2 + \sum_{i=2}^{n} \left\lceil \frac{\lfloor i/2 \rfloor}{L} \right\rceil. \qquad (2.8)$$
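A short numerical check (our own sketch) confirms that (2.8) reproduces the worst-case latencies reported for the implementations in Sec. 2.4:

```python
from math import ceil

def grandab_worst_case_cycles_L(n: int, L: int) -> int:
    """Worst-case time steps for AB <= 3 with L parallelization units, Eq. (2.8)."""
    return 2 + sum(ceil((i // 2) / L) for i in range(2, n + 1))

print([grandab_worst_case_cycles_L(128, L) for L in (1, 2, 4, 8, 16)])
# [4098, 2082, 1074, 570, 318]
```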


Example 2.5: Using Parallelization Factor of 2 with n = 6 and HW = 3

Fig. 2.13: (a) Content of the shift registers for checking TEPs with Hamming weight of 3, n = 6 and L = 2 at the first time step. (b) Combining elements of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, with $s_c \oplus s_1$

The contents of the shift registers at the first time step for evaluating TEPs with Hamming weight 3, $n = 6$ and 2 parallelization units ($L = 2$) are shown in Fig. 2.13a. Furthermore, Fig. 2.13b shows how each element of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, is combined with the received vector syndrome ($s_c$) and the syndrome output by the controller ($s_1$).

Fig. 2.14: (a) Content of the shift registers for checking TEPs with Hamming weight of 3, n = 6 and L = 2 at the second time step. (b) Combining elements of shift register 1 and shift register $P_j$, $\forall j \in [1, 2]$, with $s_c \oplus s_1$

Figure 2.14 shows the contents of the shift registers at the second time step for checking TEPs with Hamming weight 3, $n = 6$ and $L = 2$.


Table 2.1: TSMC 65 nm CMOS implementation comparison for GRANDAB ($AB = 3$) decoding of CRC code (128, 104)

| Parameters (AB = 3) | L = 1 | L = 2 | L = 4 | L = 8 | L = 16 |
|---|---|---|---|---|---|
| Implementation | Synthesis | Synthesis | Synthesis | Synthesis | Synthesis |
| Technology (nm) | 65 | 65 | 65 | 65 | 65 |
| Supply (V) | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 |
| Max. Frequency (MHz) | 500 | 500 | 500 | 500 | 500 |
| Area (mm²) | 0.25 | 0.38 | 0.58 | 1.12 | 2.12 |
| W.C. Latency (cycles) | 4098 | 2082 | 1074 | 570 | 318 |
| Avg. Latency (cycles) | 1 | 1 | 1 | 1 | 1 |
| W.C. Latency (μs) | 8.196 | 4.164 | 2.148 | 1.140 | 0.636 |
| Avg. Latency (ns) | 2 | 2 | 2 | 2 | 2 |
| W.C. T/P (Mbps) | 12.689 | 24.975 | 48.714 | 91.228 | 163.52 |
| Avg. T/P (Gbps) | 52 | 52 | 52 | 52 | 52 |
| Power (mW) | 45.4 | 69.7 | 114.8 | 198.8 | 370 |
| Energy per Bit (pJ/bit) | 0.87 | 1.34 | 2.20 | 3.8 | 7.11 |
| Area Efficiency (Gbps/mm²) | 208 | 136.8 | 89.65 | 46.42 | 24.52 |
| Code compatible | Yes | Yes | Yes | Yes | Yes |
| Rate compatible | Yes | Yes | Yes | Yes | Yes |

Information Throughput (Gbps) = k / Decoding Latency (ns); Area Efficiency (Gbps/mm²) = Avg. T/P (Gbps) / Area (mm²); Energy per Bit (pJ/bit) = Power (mW) / Avg. T/P (Gbps)

2.4 Implementation Results for the Proposed GRANDAB Hardware

The proposed hard-input GRANDAB ($AB = 3$, $L \ge 1$) VLSI architecture is implemented using Verilog HDL and synthesized with Synopsys Design Compiler using general-purpose TSMC 65 nm CMOS technology. The synthesis results for the proposed GRANDAB decoder with $n = 128$, $AB = 3$, $L \ge 1$, and $0.75 \le R \le 1$ are presented in Table 2.1. The hardware design is verified using test benches generated via the bit-true C model of the proposed hardware. Furthermore, switching activities from real test vectors are extracted for the GRANDAB hardware architecture to ensure accuracy in power measurements. The GRANDAB hardware implementation can support a maximum frequency of 500 MHz. Since no pipelining strategy is used, one clock cycle corresponds to one time step. In the worst-case (W.C.) scenario with $L = 1$, 4098 cycles are required to decode a code of length $n = 128$ (Eq. (2.7)). Furthermore, a total of 2082, 1074, 570, and 318 cycles are required for parallelization factors $L = 2$, $L = 4$, $L = 8$, and $L = 16$ (Eq. (2.8)), respectively. As shown in Table 2.1, the GRANDAB hardware can achieve a worst-case information throughput (W.C. T/P) of 12.689 to 163.52 Mbps, depending on the parameter $L$, at the maximum frequency of 500 MHz. As shown in Table 2.1, increasing the value of $L$ from 1 to 2 improves the worst-case throughput by 1.9×; however, this increase results in a 52% area overhead and a 53.5% increase in power consumption. Similarly, increasing $L$ from 1 to 16 improves the worst-case throughput by 12.8× while increasing the area by 8.5× and the power consumption by 8×, as shown in Table 2.1. Therefore, it can be deduced that appropriate $L$ values can be chosen to strike a balance between the area overhead, the power budget, and the worst-case throughput requirement of a target application.


Fig. 2.15: Comparison of average latency and average information throughput for the proposed GRANDAB ($AB = 3$) VLSI architecture for the Cyclic Redundancy Check (CRC) [8] code (128, 104)

The average latency, on the other hand, is significantly smaller than the worst-case latency, especially in the higher $E_b/N_0$ region. The average latency is computed using the bit-true C model after taking into account at least 100 frames in error for each $E_b/N_0$ point. As channel conditions improve, the average latency of the GRANDAB ($AB = 3$, $1 \le L \le 16$) decoder decreases until it reaches only 1 cycle per decoded codeword, as illustrated in Fig. 2.15a, resulting in an average latency of 2 ns (corresponding to the maximum clock frequency of 500 MHz) for the considered CRC code (128, 104). The average information throughput, which is inversely proportional to the latency, is depicted in Fig. 2.15b. It should be noted that the information throughput increases with $E_b/N_0$, reaching values of ∼52 Gbps at a target FER of $\le 10^{-7}$ (Fig. 2.15b). The preceding discussion can be summarized as follows: owing to the proposed parallelized, high-throughput and energy-efficient VLSI implementation, the GRANDAB decoder is suitable for applications that require extremely low average latency and have a constrained energy consumption budget. Furthermore, the degree of parallelization can be tuned based on the target application's area and latency budget.

2.5 Conclusion

The GRANDAB decoder, a hard-input variant of GRAND, is presented in this chapter along with a high-throughput and energy-efficient VLSI architecture. The proposed GRANDAB hardware can decode any linear block code with length $n = 128$ and code rates between 0.75 and 1. The GRANDAB decoder's inherent parallelism is exploited through algorithmic simplifications. As a result, the proposed hardware design can execute 349,632 codebook membership queries in 318 to 4098 clock cycles. Furthermore, the proposed GRANDAB hardware's parallelization factor can be chosen based on the target application's desired latency as well as its area and energy budget. The proposed GRANDAB hardware can achieve a maximum average


decoding information throughput of up to 52 Gbps for a (128, 104) linear block code at a target FER of $\le 10^{-7}$.

Acknowledgements We would like to thank Dr. Thibaud Tonnellier and Dr. Furkan Ercan for contributing to this chapter as co-authors. Dr. Furkan Ercan is a Research Scientist affiliated with Intel Labs, Intel Corporation, Hudson, MA, 01749 (Email: [email protected]). Dr. Thibaud Tonnellier is with Airbus Defence and Space, France. (Email: [email protected].)

References

1. Duffy, K. R., Li, J., & Médard, M. (2019). Capacity-achieving guessing random additive noise decoding. IEEE Transactions on Information Theory, 65(7), 4023–4040.
2. Duffy, K., Solomon, A., Konwar, K., & Médard, M. (2020). 5G NR CA-Polar maximum likelihood decoding by GRAND. In 2020 54th Annual Conference on Information Sciences and Systems (CISS) (pp. 1–5).
3. Abbas, S., Tonnellier, T., Ercan, F., & Gross, W. (2020). High-throughput VLSI architecture for GRAND. In 2020 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 1–6).
4. Hocquenghem, A. (1959). Codes correcteurs d'erreurs. Chiffres, 2, 147–156.
5. Bose, R., & Ray-Chaudhuri, D. (1960). On a class of error correcting binary group codes. Information and Control, 3, 68–79.
6. Berlekamp, E. (1968). Nonbinary BCH decoding (Abstr.). IEEE Transactions on Information Theory, 14, 242–242.
7. Massey, J. (1969). Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, 15, 122–127.
8. Peterson, W., & Brown, D. (1961). Cyclic codes for error detection. Proceedings of the IRE, 49, 228–235.

Chapter 3

Hardware Architecture for Ordered Reliability Bits GRAND (ORBGRAND)

Syed Mohsin Abbas, Marwan Jalaleddine, Furkan Ercan, Thibaud Tonnellier, and Warren J. Gross

Syed Mohsin Abbas, McGill University, Montréal, QC, Canada. e-mail: [email protected]
Marwan Jalaleddine, McGill University, Montréal, QC, Canada. e-mail: [email protected]
Furkan Ercan, Intel Labs, Intel Corporation, Hudson, MA, 01749, USA. e-mail: [email protected]
Thibaud Tonnellier, Airbus Defence and Space, France. e-mail: [email protected]
Warren J. Gross, McGill University, Montréal, QC, Canada. e-mail: [email protected]

Abstract Ordered Reliability Bits GRAND (ORBGRAND) is a soft-input GRAND variant that offers superior decoding performance compared to the hard-input GRANDAB. The ORBGRAND hardware architecture is presented in this chapter, which achieves an average information throughput of up to 42.5 Gbps for a code length of 128 at a target FER of $10^{-7}$. Furthermore, the proposed ORBGRAND hardware, which achieves a decoding performance similar to that of the state-of-the-art Fast Dynamic Successive Cancellation Flip (Fast-DSCF) decoder, is 32× more energy efficient, 5× more area efficient, and has a 49× higher average throughput than the Fast-DSCF hardware decoder for a 5G polar code (128, 105).

Additional Notations In addition to the notations defined in the previous chapters, the symbols ∴ and ∵ refer to "therefore" and "because", respectively. $a \circ b$ denotes permuting the elements of $a$ according to the permutation order in $b$. Please note that this chapter follows up on our previously published works [1, 2].

3.1 Ordered Reliability Bits GRAND (ORBGRAND)

Definition 3.1 The logistic weight (LW) corresponds to the sum of the indices of the non-zero elements in a Test Error Pattern (TEP) $e$ [3]. The maximum logistic weight considered for a TEP ($e$) will be referred to as $LW_{max}$ ($LW_{max} \le \frac{n(n+1)}{2}$).


Example 3.1: Calculating LW and HW

Problem 3.1 Given three TEPs $e_1 = [1, 1, 0, 0, 1, 0]$, $e_2 = [0, 0, 0, 0, 1, 0]$ and $e_3 = [0, 0, 0, 0, 1, 1]$, calculate their Hamming and logistic weights.

Solution 3.1 The Hamming weight can be calculated as

$$HW = \sum_{i=1}^{n} e_i \quad (\forall\, e_i \in \{0, 1\}),$$

while the logistic weight can be calculated as

$$LW = \sum_{i=1}^{n} i \times e_i.$$
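These two sums translate directly into code; the following sketch (helper names are ours) computes them for the three TEPs above:

```python
def hamming_weight(e):
    return sum(e)

def logistic_weight(e):
    # Indices are 1-based, matching the book's convention
    return sum(i * ei for i, ei in enumerate(e, start=1))

for e in ([1, 1, 0, 0, 1, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 1, 1]):
    print(hamming_weight(e), logistic_weight(e))
# (3, 8), (1, 5), (2, 11)
```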

Hence: $e_1$ has a Hamming weight of 3, whereas its logistic weight is $1 + 2 + 5 = 8$; $e_2$ has a Hamming weight of 1, whereas its logistic weight is 5; $e_3$ has a Hamming weight of 2, whereas its logistic weight is $5 + 6 = 11$.

Definition 3.2 An integer partition $\lambda$ of a positive integer $m$, denoted as $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_P) \vdash m$ where $\lambda_1 > \lambda_2 > \cdots > \lambda_P$, is the multiset of positive integers $\lambda_i$ ($\forall i \in [1, P]$) that sum to $m$. Furthermore, the Hamming weight of a TEP ($e$) generated from an integer partition with $P$ elements is $P$.

P and HW
Since the Hamming weight of an error pattern ($e$) always equals the number of elements in the generating integer partition $\lambda$, we will use the variables P and HW interchangeably in this chapter.

Definition 3.3 If all parts $\lambda_i$ ($\forall i \in [1, P]$) of the integer partition $\lambda$ are distinct, the partition is called a distinct integer partition.

Example 3.2: Integer Partitions

Fig. 3.1: Distinct integer partitions for LW = 10 with P elements


The integer partitions for LW = 10 are displayed in Fig. 3.1, where they are organised in ascending order of integer partition size ($P$). A TEP ($e$) with a Hamming weight of 1 will be generated from an integer partition of size one ($P = 1$). Similarly, TEPs ($e$) with Hamming weights greater than one will be generated using integer partitions of sizes greater than one ($P > 1$).

ORBGRAND [3] is a soft-input GRAND variant that effectively uses the soft channel information (Log-Likelihood Ratios (LLRs)) to improve the decoding performance of GRAND. Algorithm 2 lists the framework used by ORBGRAND to decode the received channel signal $y$. The algorithm is explained in detail below:

Algorithm 2: ORBGRAND algorithm

Input: $y$, $H$, $G^{-1}$, $LW_{max}$
Output: $\hat{u}$

1: $[ind, \bar{y}] \leftarrow \mathrm{Sort}(|y|)$   // $\bar{y}_i \le \bar{y}_j\ (\forall i < j)$
2: for $i \leftarrow 0$ to $LW_{max}$ do
3:   $S \leftarrow (\lambda_1, \lambda_2, \ldots, \lambda_P) \vdash i$   // $P \in [1, P_{max}]$
4:   forall $j$ in $S$ do
5:     $e \leftarrow 0$
6:     $e \leftarrow (e \oplus 1_j) \circ ind$
7:     if $H \cdot (\hat{y} \oplus e)^\top == 0$ then
8:       $\hat{u} \leftarrow (\hat{y} \oplus e) \cdot G^{-1}$
9:       return $\hat{u}$

Line 1: ORBGRAND starts by sorting $y$ in ascending order of the absolute value of the LLRs ($|y|$); ORBGRAND then records the corresponding indices in a permutation vector, which will be referred to as $ind$.
Line 3: Following that, all integer partitions ($\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_P) \vdash i$, where $P \in [1, P_{max}]$ and $i \in [0, LW_{max}]$; explained in Sec. 3.2) are generated for each logistic weight $i$.
Line 6: ORBGRAND then generates TEPs ($e$) using the integer partitions ($\lambda \in S$). The generated TEPs ($e$) are ordered using the permutation vector $ind$.
Line 7: The TEPs ($e$) are then combined sequentially with the hard decision vector ($\hat{y}$), which is obtained from $y$. The resulting vector $\hat{y} \oplus e$ is then queried for codebook membership ($H \cdot (\hat{y} \oplus e)^\top == 0$).
Line 8: If the codebook membership criterion is met, then $e$ is the guessed noise and $\hat{c} = \hat{y} \oplus e$ is the estimated codeword. Using $G^{-1}$ (line 8), the original message ($\hat{u}$) is retrieved from the estimated codeword, and the decoding process is terminated. If the codebook membership criterion is not met with the current TEP ($e$), either larger values of logistic weight or other TEPs for the same logistic weight are considered.

It should be noted that only distinct integer partitions are considered for generating TEPs ($e$), and all parts ($\lambda_i$) of the integer partition ($\lambda$) must be less than or equal to $n$ ($\lambda_i \le n$, $\forall i \in [1, P]$).
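For reference, the sketch below is a straightforward software rendering of Algorithm 2 (function names are ours; partition generation is simplified and message recovery via $G^{-1}$ is omitted). It is not a model of the hardware described later in this chapter:

```python
import numpy as np

def distinct_partitions(m, p_max, low=1):
    """Distinct integer partitions of m with at most p_max parts (smallest part first)."""
    if m == 0:
        yield ()
        return
    if p_max == 0:
        return
    for part in range(low, m + 1):
        for rest in distinct_partitions(m - part, p_max - 1, part + 1):
            yield (part,) + rest

def orbgrand_decode(llr, H, lw_max, p_max):
    """Sketch of Algorithm 2: returns the estimated codeword c^ = y^ XOR e, or None."""
    n = len(llr)
    y_hat = (np.asarray(llr) < 0).astype(int)      # hard demodulation
    ind = np.argsort(np.abs(llr))                  # least reliable positions first
    s_c = H @ y_hat % 2                            # syndrome of the received vector
    if not s_c.any():
        return y_hat                               # y^ is already a codebook member
    for lw in range(1, lw_max + 1):
        for lam in distinct_partitions(lw, p_max):
            if lam[-1] > n:                        # all parts must satisfy lambda_i <= n
                continue
            e = np.zeros(n, dtype=int)
            e[ind[[l - 1 for l in lam]]] = 1       # flips land on the least reliable bits
            if not ((s_c + H @ e) % 2).any():      # H·(y^ XOR e)^T == 0
                return y_hat ^ e
    return None
```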


Example 3.3: Generating Test Error Patterns (e) Problem 3.2 Generate the TEP (e) corresponding to λ = (9, 1) and ind = (2, 6, 5, 4, 3, 1, 7, 10, 8, 9). Solution 3.2 We can first generate the TEP (e) corresponding to flipping the first and ninth bits e = (1, 0, 0, 0, 0, 0, 0, 0, 1, 0). After permuting the generated TEP (e) with permutation ind = (2, 6, 5, 4, 3, 1, 7, 10, 8, 9), we obtain the following vector e ◦ ind = (0, 0, 0, 0, 0, 1, 0, 0, 0, 1).

3.2 ORBGRAND Design Considerations

The computational complexity of GRAND and its variants can be expressed in terms of the number of codebook membership queries required. Please note that the computational complexity can be characterised either by its worst case, which corresponds to the maximum number of codebook membership queries required, or by its average, which corresponds to the average number of codebook membership queries required. ORBGRAND generates distinct integer partitions of a given logistic weight (LW) and uses these integer partitions to generate the TEPs ($e$) required for the codebook membership queries. In its original form, ORBGRAND uses $LW_{max} = \frac{n(n+1)}{2}$ as the maximum logistic weight for an $(n, k)$ linear block code.

Worst Case Complexity
The worst-case complexity of ORBGRAND when $LW_{max} = \frac{n(n+1)}{2}$ is $2^n$ queries, as it entails querying all the TEPs ($e$) for a specific codeword. On the other hand, the worst-case complexity of Maximum Likelihood decoding is $2^k$ queries, where all the vectors of message bits are explored based on the maximum likelihood criterion.

To reduce the worst-case complexity of ORBGRAND associated with $LW_{max} = \frac{n(n+1)}{2}$, the value of the parameter $LW_{max}$ as well as the Hamming weight of the TEPs ($e$) can be restricted. This limit can be determined empirically through Monte Carlo simulations, as will be discussed in Sec. 3.2.1. It should be noted that, with improved channel conditions, the average complexity of all GRAND variants decreases sharply since more reliable transmissions are quickly decoded [3–5].

3.2.1 Parametric Analysis of ORBGRAND

The ORBGRAND parameters $LW_{max}$ and $P$ influence both the decoding performance and the worst-case computational complexity of ORBGRAND. Therefore, these parameters ($LW_{max}$, $P$) can be appropriately chosen to reduce the worst-case complexity while having a minimal impact on the decoding performance. The effect of the $LW_{max}$ and $P$ parameters on ORBGRAND's decoding performance for the 5G


Fig. 3.2: Comparison of decoding performance of ORBGRAND with different parameters ($LW_{max}$ and $P$) for decoding 5G CRC-aided Polar Code (128, 105+11)

Fig. 3.3: Comparison of decoding performance of ORBGRAND with different parameters LWmax and P for decoding BCH Code (127, 106)

CRC-Aided (CA) polar code (128, 105+11) [6] is displayed in Fig. 3.2 with BPSK modulation over an AWGN channel.

Effect of varying $LW_{max}$: By simulating over $LW_{max}$ values of 128, 96, and 64, we are in fact setting the maximum number of queries to $5.33 \times 10^7$, $3.69 \times 10^6$, and $1.5 \times 10^5$, respectively. A performance degradation of 0.2 dB is observed at FER = $10^{-7}$ when $LW_{max}$ is reduced from 128 to 64. On the other hand, this reduction decreases the complexity of ORBGRAND by 355×.


Effect of varying $P$: Similarly, the degradation in ORBGRAND decoding performance is negligible when $P = 6$ is used instead of an unbounded $P$ with $LW_{max} = 64$. As such, the maximum number of queries can be limited to $1.16 \times 10^5$ by choosing $P = 6$ with $LW_{max} = 64$ for ORBGRAND. Similarly, Fig. 3.3 compares ORBGRAND's decoding performance for the BCH (127, 106) code. Observations similar to those made for the CA-polar code (128, 105+11) can be drawn for the BCH code (127, 106) as $P$ and $LW_{max}$ are varied. In conclusion, limiting the ORBGRAND parameters $LW_{max}$ and $P$ reduces ORBGRAND's computational complexity, which aids in the design of the high-throughput and energy-efficient hardware discussed later in Sec. 3.3.

Recap: Decoding Performance of ORBGRAND Compared to Traditional Decoders

Fig. 3.4: Comparison of decoding performance of ORBGRAND with different parameters ($LW_{max}$, $P$) for the 5G CRC-Aided (CA) Polar Code (128, 105+11) and the BCH code (127, 106)

As seen in Fig. 3.4a, with the Polar Code (128, 105+11):
1. ORBGRAND outperforms GRANDAB ($AB = 3$) [4] by around 2 dB for target FER $\le 10^{-5}$.
2. ORBGRAND outperforms the CA-SCL decoder [7, 8] and the DSCF decoder [9, 10] for decoding the CA-Polar code (128, 105+11) up to target FERs of $10^{-4}$ and $10^{-6}$, respectively. The number of DSCF attempts ($T_{max}$) is set to 50, and the maximum bit-flipping order (ω) is set to 2.


As seen in Fig. 3.4b, with the BCH Code (127, 106):
1. ORBGRAND yields a 1.7 dB performance gain at a target FER of $10^{-5}$ compared to B-M decoding of the same code [11, 12].
2. For a target FER of $10^{-6}$, Ordered Statistics Decoding (OSD) [13–15] outperforms the ORBGRAND decoder by 0.7 dB.

3.2.2 Proposed Simplified Generation of Integer Partitions (λ)

In this section, we propose a novel way to generate TEPs ($e$) for ORBGRAND using distinct integer partitions of a particular logistic weight. We will also expand on how this can be done using a specific arrangement of shift registers and XOR gates. Although a method for the generation of integer partitions was proposed in [16], that approach cannot be directly applied to our proposed ORBGRAND architecture since the generated partitions are not distinct. Furthermore, the method in [16] generates integer partitions in a sequential fashion, which prevents us from using it in a parallelized, high-throughput hardware architecture. We observed that a breakdown of the integer $m$ yields useful patterns for generating integer partitions of a particular logistic weight ($m$).

Example 3.4: All the Distinct Integer Partitions of m = 12
λ = {(12); (11, 1); (10, 2); (9, 3); (8, 4); (7, 5); (9, 2, 1); (8, 3, 1); (7, 4, 1); (6, 5, 1); (7, 3, 2); (6, 4, 2); (5, 4, 3); (6, 3, 2, 1); (5, 4, 2, 1)}

For a particular logistic weight $m$, integer partitions of size 2 ($P = 2$) can be generated as $(\lambda_1, \lambda_2) \vdash m$, where $\lambda_2 \in [1, \lceil m/2 \rceil - 1]$ and $\lambda_1 = m - \lambda_2$. It can be observed that, if a listing order is followed for integer partitions of size 2 ($P = 2$), the first integer descends while the second ascends.

Example 3.5: Generating Distinct Integer Partitions of Size 2 (m = 12, P = 2)
λ = {(11, 1); (10, 2); (9, 3); (8, 4); (7, 5)}

Similar trends can be observed for higher-order ($P \ge 3$) integer partitions. The integer partitions of size 3 ($P = 3$) can be generated as $(\lambda_1, \lambda_2, \lambda_3) \vdash m$, where $\lambda_3 \in [1, \lambda_3^{max}]$, $\lambda_2 \in [\lambda_3 + 1, \lambda_2^{max}|_{\lambda_3}]$ and $\lambda_1 = m - \lambda_2 - \lambda_3$. Moreover, $\lambda_3^{max}$ is the maximum value of $\lambda_3$, and $\lambda_2^{max}|_{\lambda_3}$ is the maximum value of $\lambda_2$ for a specific value of $\lambda_3$ ($\lambda_3 \in [1, \lambda_3^{max}]$). For integer partitions of size 3 ($P = 3$), it can be seen that the first integer descends, the second ascends, and the third integer remains fixed until all iterations over the first two integers are completed.

Example 3.6: Generating Distinct Integer Partitions of Size 3 (m = 12, P = 3)
λ = {(9, 2, 1); (8, 3, 1); (7, 4, 1); (6, 5, 1); (7, 3, 2); (6, 4, 2); (5, 4, 3)}


Generating Integer Partitions of Size P
In general, an integer partition of size $P$ can be generated as $(\lambda_1, \lambda_2, \ldots, \lambda_P) \vdash m$, where $\lambda_P \in [1, \lambda_P^{max}]$, $\lambda_i \in [\lambda_{i+1} + 1, \lambda_i^{max}|_{\lambda_{i+1},\ldots,\lambda_{P-1}}]$ $\forall i \in [2, P-1]$, and $\lambda_1 = m - \sum_{i=2}^{P} \lambda_i$. Moreover, $\lambda_P^{max}$ is the maximum value of $\lambda_P$, and $\lambda_i^{max}|_{\lambda_{i+1},\ldots,\lambda_{P-1}}$ is the maximum value of $\lambda_i$ for specific values of $\lambda_j$ ($\forall j \in [i+1, P-1]$). For simplicity, we will denote $\lambda_i^{max}|_{\lambda_{i+1},\ldots,\lambda_{P-1}}$ as $\lambda_i^{max}$. The maximum value for each $\lambda_i$ ($\forall i \in [2, P]$) is bounded by (3.1).

Lemma 3.1 If a positive integer $m$ is partitioned into $P$ distinct parts, and assuming that the parts $\lambda_i$ are ordered, the maximum value for each $\lambda_i$, $\forall i \in [2, P]$, is bounded by

$$\lambda_i^{max} = \left\lfloor \frac{2 \times \left(m - \sum_{j=i+1}^{P} \lambda_j\right) - i \times (i-1)}{2 \times i} \right\rfloor \qquad (3.1)$$

As in the GRANDAB architecture of Chap. 2, syndromes of TEPs with a Hamming weight $> 1$ are obtained by combining the syndromes $s_i$ of Hamming-weight-1 error patterns ($s_{1,2,\ldots,P} = H \cdot 1_1^\top \oplus H \cdot 1_2^\top \oplus \cdots \oplus H \cdot 1_P^\top$).
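The nested ranges above, together with the bound (3.1), translate directly into a small recursive generator. The following Python sketch (our own construction, not the hardware's parallel scheme) reproduces the 15 distinct partitions of m = 12 listed in Example 3.4:

```python
def lam_max(m, i, tail_sum):
    """Bound (3.1): largest admissible lambda_i, given that the parts below it sum to tail_sum."""
    return (2 * (m - tail_sum) - i * (i - 1)) // (2 * i)

def partitions_of_size(m, P):
    """All distinct partitions (lambda_1 > lambda_2 > ... > lambda_P) of m."""
    if P == 1:
        return [(m,)]
    out = []
    def rec(i, chosen):                      # chosen = [lambda_P, ..., lambda_{i+1}]
        if i == 1:
            lam1 = m - sum(chosen)
            if lam1 > chosen[-1]:            # keep the parts strictly decreasing
                out.append((lam1, *reversed(chosen)))
            return
        lo = chosen[-1] + 1 if chosen else 1
        for lam in range(lo, lam_max(m, i, sum(chosen)) + 1):
            rec(i - 1, chosen + [lam])
    rec(P, [])
    return out

total = sum(len(partitions_of_size(12, P)) for P in range(1, 5))
assert total == 15   # matches Example 3.4
```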

3.3.2 Evaluating TEPs (e) for P ≤ 3

The Hamming weight of the TEPs ($e$) that can be evaluated for codebook membership in parallel depends on the size and number of shift registers used in the GRANDAB VLSI architecture [17]. For example, two $n \times (n-k)$-bit shift registers are used in [17] to evaluate, in parallel, $n$ TEPs ($e$) with a Hamming weight of 2. However, when more shift registers are added, the complexity of the interconnections increases. Therefore, in the proposed ORBGRAND VLSI architecture, we employ three shift registers to evaluate concurrently all of the TEPs ($e$) corresponding to integer partitions with a maximum size of 3 ($P \le 3$). In the proposed ORBGRAND VLSI architecture, the first, second and third shift registers are initialized with $\lambda_1$, $\lambda_2$ and $\lambda_3$ ($(\lambda_1, \lambda_2, \lambda_3) \vdash m$), respectively. The first and second shift registers are each $2 \times (\lambda_3^{max} + 1) \times (n-k)$ bits in size, whereas the third shift register is a $\lambda_3^{max} \times (n-k)$-bit shift register, where the value of $\lambda_3^{max}$ is given by (3.1) for $P = 3$. Since $\lambda_1 = m - \sum_{i=2}^{3} \lambda_i$ for $P = 3$, the syndrome $s_{m-i}$ is stored at the $i$-th index of the first shift register, whereas the syndrome $s_i$ is stored at the $i$-th index of the second and

Fig. 3.10: Interconnections and the associated XOR gates for the fourth (4th) bus for checking error patterns of Hamming weight of 3 ($P = 3$)

Fig. 3.11: Interconnections and the associated XOR gates for the fifth (5th) bus for checking error patterns of Hamming weight of 3 ($P = 3$)

Fig. 3.12: Interconnections and the associated XOR gates for the last (6th) bus for checking error patterns of Hamming weight of 3 ($P = 3$)

Fig. 3.13: Shift registers contents for checking error patterns corresponding to a Hamming weight of 2 and 3 for any logistic weight $m$

Fig. 3.14: Shift registers contents for checking error patterns corresponding to $P > 3$ for any logistic weight $m$ ($\lambda_1 = m - \sum_{i=2}^{P} \lambda_i$)

Fig. 3.15: Shift registers contents for checking TEPs ($e$) corresponding to $P = 4$ at time step 1

third shift registers. An example of the content and interconnection of the three shift registers for logistic weight $m = 20$ is presented in Fig. 3.6. The contents of the three shift registers are the syndromes ($s_i$) of the TEPs ($e$) with Hamming weight of 1. These syndromes are then combined, using an array of $(n-k)$-bit XOR gates, to generate the syndromes of TEPs ($e$) with Hamming weights 2 and 3. Please note that a collection of these interconnections of the shift registers, shown in Fig. 3.6, is defined as a bus. Since there are numerous connections and XOR gates involved, we use a single XOR gate to illustrate a group of XOR gates, and a single bus symbol to illustrate a group of interconnections in Fig. 3.6. As seen in Fig. 3.6, there are 6 buses ($\lambda_3^{max} + 1$, where $\lambda_3^{max} = 5$ for $P = 3$ (3.1)) for $m = 20$. The first bus, highlighted by a blue solid rectangle, is used to check TEPs ($e$) with Hamming weight of 2, and the remaining buses, highlighted by the dashed green rectangle in Fig. 3.6, are used to check TEPs ($e$) of Hamming weight 3. To evaluate for codebook membership the TEPs ($e$) corresponding to a Hamming weight of 2 ($P = 2$), the first bus (highlighted by the blue solid rectangle in Fig. 3.6) is used to combine all the elements of shift register 1 with all the elements of shift register 2 using an array of XOR gates.

Fig. 3.16: Interconnections and the associated XOR gates for the second (2nd), third (3rd), fourth (4th), and fifth (5th) bus for checking error patterns corresponding to $P = 4$ at time step 1

These results are then combined with the syndrome of the received vector ($s_c$) to check the TEPs ($e$) with Hamming weight of 2. The detailed interconnections and the associated XOR gates for the first bus are shown in Fig. 3.7. Similarly, to check the error patterns corresponding to a Hamming weight of 3 ($P = 3$), the selected elements of shift registers 1 and 2 are combined with $s_c$ as well as with the elements of shift register 3. We use a single bus and a single XOR gate to illustrate these interconnections, which are depicted in Fig. 3.6 by the dashed rectangle. Figures 3.8, 3.9, 3.10, 3.11 and 3.12 depict the detailed interconnections and associated XOR gates for the second (2nd), third (3rd), fourth (4th), fifth (5th), and sixth (6th) buses.

Fig. 3.17: Shift registers contents for checking TEPs ($e$) corresponding to $P = 4$ at time step 2

Due to the described arrangement and interconnection of the shift registers and XOR gates, all the TEPs ($e$) corresponding to integer partitions of sizes 2 and 3 for a specific logistic weight $m$ are checked in one time step. In general, the content and the interconnection of the three shift registers used to check the error patterns corresponding to integer partitions of sizes 2 and 3 for any logistic weight $m$ are depicted in Fig. 3.13.

Duplicate TEPs
The proposed ORBGRAND VLSI architecture may generate TEP ($e$) syndromes for duplicated integer partitions as well as TEP syndromes for non-distinct integer partitions. We can disregard the resulting queries because duplicate integer partitions result in exactly the same TEP syndrome and hence do not negatively influence the FER performance. We can also disregard the non-distinct integer partitions because they generate a TEP syndrome that corresponds to a LW smaller than the current LW, which has been queried before. For instance, in Fig. 3.7, the dotted red rectangles highlight the TEP syndromes for the duplicated integer partitions ($s_c \oplus s_8 \oplus s_{12}$ and $s_c \oplus s_{12} \oplus s_8$). Similarly, the dotted blue rectangle highlights the TEP syndrome for the non-distinct integer partition ($s_c \oplus s_{10} \oplus s_{10} = s_c$).

3.3.3 Evaluating TEPs (e) for P > 3

To check all the TEPs ($e$) corresponding to integer partitions of sizes $P > 3$, a controller is used in conjunction with the shift registers. The controller combines up to $P_{max} - 3$ syndromes with the syndrome of the received vector $s_c$ to generate the combined syndrome, represented as $s_{comp}$. Hence, when $s_{comp}$ is fixed, only one time step is required to generate all combinations of $\{\lambda_1, \lambda_2, \lambda_3\}$ using the shift registers with adequately chosen shift values.

Fig. 3.18: Interconnections and the associated XOR gates for the second (2nd), third (3rd) and fourth (4th) bus for checking error patterns corresponding to $P = 4$ at time step 2

The content and the interconnection of the three shift registers used to check the TEPs ($e$) corresponding to integer partitions of sizes $P > 3$ are depicted in Fig. 3.14. Since the first bus is only used to check TEPs with Hamming weight of 2 ($P = 2$), it is disabled for $P > 3$ and is not shown in Fig. 3.14. A 0 corresponds to a disabled connection, meaning that the respective elements of the bus do not take part in the final computations. Figures 3.15, 3.16, 3.17, 3.18 and 3.19 illustrate checking the TEPs corresponding to $P = 4$ (Hamming weight of 4). At each time step, the controller outputs $s_{comp} = s_c \oplus s_{\lambda_4}$ ($\lambda_4 \in [1, \lambda_4^{max}]$), and $\{\lambda_1, \lambda_2, \lambda_3\}$ are computed and $s_{\lambda_i}$ ($i \in [1, 3]$) are mapped to their corresponding shift registers.
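The number of controller steps (one per $s_{comp}$ value) follows directly from the nested ranges and the bound (3.1); the sketch below (our own helper) reproduces the step counts worked out in the examples that follow:

```python
def lam_max(m, i, tail_sum):
    """Bound (3.1): largest admissible lambda_i given the fixed smaller parts sum to tail_sum."""
    return (2 * (m - tail_sum) - i * (i - 1)) // (2 * i)

def controller_steps(m, P):
    """Time steps for partitions of size P > 3: the controller fixes (lambda_P, ..., lambda_4)
    while the shift registers sweep (lambda_3, lambda_2, lambda_1) in a single step."""
    def rec(i, tail):                      # tail = [lambda_P, ..., lambda_{i+1}]
        if i == 3:
            return 1                       # one step per fixed (lambda_P, ..., lambda_4)
        lo = tail[-1] + 1 if tail else 1
        return sum(rec(i - 1, tail + [lam])
                   for lam in range(lo, lam_max(m, i, sum(tail)) + 1))
    return rec(P, [])

print(controller_steps(20, 4), controller_steps(20, 5))   # 3 3
```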

Fig. 3.19: Shift registers contents for checking test error patterns corresponding to $P = 4$ at time step 3

Step 1: P = 4, Time Step = 1
At the first time step ($\lambda_4 = 1$), the controller combines the syndrome of the received signal $s_c$ with $s_1$ ($s_{comp} = s_c \oplus s_1$). Furthermore, $\lambda_3$ ($\lambda_3 \in [2, \lambda_3^{max}|_{\lambda_4}]$, where $\lambda_3^{max}|_{\lambda_4} = 5$ with $\lambda_4 = 1$ (3.1)) is computed and $s_{\lambda_3}$ is mapped to the third shift register. Similarly, $\lambda_2$ ($\lambda_2 \in [\lambda_3 + 1, \lambda_2^{max}|_{\lambda_3,\lambda_4}]$) and $\lambda_1$ ($\lambda_1 = m - \sum_{i=2}^{4} \lambda_i$) are computed and $s_{\lambda_i}$ ($i \in [1, 2]$) are mapped to their corresponding shift registers. All the TEPs ($e$) with $\lambda_4 = 1$ are checked in the arrangement shown in Fig. 3.15 in a single time step.

Recall
The blue dotted rectangle in Fig. 3.5 depicts all the distinct integer partitions of size 4 ($P = 4$) with $\lambda_4 = 1$. These integer partitions correspond to the TEPs ($e$) with Hamming weight of 4 and $s_{comp} = s_c \oplus s_1$ output by the controller in this time step.

Figure 3.16 depicts the detailed interconnections and associated XOR gates for the second (2nd), third (3rd), fourth (4th) and fifth (5th) buses for checking TEPs ($e$) corresponding to $P = 4$ at time step 1.

Fig. 3.20: Interconnections and the associated XOR gates for the second (2nd) bus for checking error patterns corresponding to $P = 4$ at time step 3

Step 2: P = 4, Time Step = 2
At the next time step, the controller outputs $s_{comp} = s_c \oplus s_2$ ($\lambda_4 = 2$), and $\lambda_3$ ($\lambda_3 \in [3, \lambda_3^{max}|_{\lambda_4}]$, where $\lambda_3^{max}|_{\lambda_4} = 5$ with $\lambda_4 = 2$ (3.1)) is computed and $s_{\lambda_3}$ is mapped to shift register 3. Shift register 2 is shifted up by 1 position and shift register 1 maps $s_{\lambda_1}$ ($\lambda_1 = m - \sum_{i=2}^{4} \lambda_i$), as shown in Fig. 3.17.

Fig. 3.21: Shift registers contents for checking TEPs ($e$) corresponding to $P = 5$ at time step 1

Fig. 3.22: Interconnections and the associated XOR gates for the second (2nd) and third (3rd) bus for checking error patterns corresponding to $P = 5$ at time step 1

Fig. 3.23: Shift registers contents for checking TEPs ($e$) corresponding to $P = 5$ at time step 2

Recall
The green dotted rectangle in Fig. 3.5 depicts all the distinct integer partitions of size 4 ($P = 4$) with $\lambda_4 = 2$. These integer partitions correspond to the TEPs ($e$) with Hamming weight of 4 and $s_{comp} = s_c \oplus s_2$ output by the controller.

Figure 3.18 depicts the detailed interconnections and associated XOR gates for the second (2nd), third (3rd) and fourth (4th) buses for checking error patterns corresponding to $P = 4$ at time step 2. All TEPs ($e$) with $\lambda_4 = 2$ are checked in this second time step.


Fig. 3.24: Interconnections and the associated XOR gates for the second (2nd) bus for checking error patterns corresponding to P = 5 at time step 2


Fig. 3.25: Shift register contents for checking TEPs (e) corresponding to P = 5 at time step 3


Fig. 3.26: Interconnections and the associated XOR gates for the second (2nd) bus for checking error patterns corresponding to P = 5 at time step 3


Step 3: P = 4, Time Step = 3
Similarly, at the third time step, the controller outputs s_comp = s_c ⊕ s_3 (λ4 = 3), and λ3 (λ3 ∈ [4, λ3^max|λ4], where λ3^max|λ4 = 4 with λ4 = 3 (3.1)) is computed, as shown in Fig. 3.19. Figure 3.20 depicts the detailed interconnections and associated XOR gates of the second (2nd) bus for checking error patterns corresponding to P = 4 at time step 3. Therefore, a total of 3 time steps (λ4^max = 3, Eq. 3.1) are required to check the error patterns corresponding to P = 4 and m = 20.
Recall: The pink dotted rectangle in Fig. 3.5 depicts all the distinct integer partitions of size 4 (P = 4) with λ4 = 3. These integer partitions correspond to the TEPs (e) with Hamming weight of 4 and to the s_comp = s_c ⊕ s_3 output by the controller.

Step 4: P = 5, Time Step = 1
Figures 3.21, 3.22, 3.23, 3.24, 3.25 and 3.26 depict the use of the shift registers to check the TEPs (e) corresponding to P = 5 (Hamming weight of 5) and m = 20. At each time step, the controller outputs s_comp = s_c ⊕ s_λ5 ⊕ s_λ4. For each value of λ5 (λ5 ∈ [1, λ5^max]), λ4 (λ4 ∈ [λ5 + 1, λ4^max|λ5]) is computed. Similarly, for each value of λ4, λ3 (λ3 ∈ [λ4 + 1, λ3^max|λ4,λ5]), λ2 (λ2 ∈ [λ3 + 1, λ2^max|λ3,λ4,λ5]) and λ1 (λ1 = m − Σ_{i=2}^{5} λi) are computed, and the s_λi (i ∈ [1, 3]) are mapped to their corresponding shift registers. At the first time step, the controller outputs s_comp = s_c ⊕ s_1 ⊕ s_2 (λ5 = 1 and λ4 = 2). The TEPs (e) with λ5 = 1 and λ4 = 2 are checked as shown in Fig. 3.21.
Recall: The solid brown rectangle in Fig. 3.5 depicts all the distinct integer partitions of size 5 (P = 5) with λ5 = 1 and λ4 = 2. These integer partitions correspond to the TEPs (e) with Hamming weight of 5 and to the s_comp = s_c ⊕ s_1 ⊕ s_2 output by the controller.
Figure 3.22 depicts the detailed interconnections and associated XOR gates of the second (2nd) and third (3rd) bus for checking error patterns corresponding to P = 5 at time step 1.


Step 5: P = 5, Time Step = 2
At the next time step, the controller outputs s_comp = s_c ⊕ s_1 ⊕ s_3 (λ5 = 1 and λ4 = 3). The TEPs (e) with λ5 = 1 and λ4 = 3 are checked as shown in Fig. 3.23.
Recall: The solid gray rectangle in Fig. 3.5 depicts all the distinct integer partitions of size 5 (P = 5) with λ5 = 1 and λ4 = 3. These integer partitions correspond to the TEPs (e) with Hamming weight of 5 and to the s_comp = s_c ⊕ s_1 ⊕ s_3 output by the controller. Figure 3.24 shows the detailed interconnections and associated XOR gates of the second (2nd) bus for checking TEPs (e) corresponding to P = 5 at time step 2.

Step 6: P = 5, Time Step = 3
Similarly, at the third time step, the controller outputs s_comp = s_c ⊕ s_2 ⊕ s_3 (λ5 = 2 and λ4 = 3). The TEPs (e) with λ5 = 2 and λ4 = 3 are checked as shown in Fig. 3.25.
Recall: The solid yellow rectangle in Fig. 3.5 depicts all the distinct integer partitions of size 5 (P = 5) with λ5 = 2 and λ4 = 3. These integer partitions correspond to the TEPs (e) with Hamming weight of 5 and to the s_comp = s_c ⊕ s_2 ⊕ s_3 output by the controller. Furthermore, Fig. 3.26 depicts the detailed interconnections and associated XOR gates of the second (2nd) bus for checking error patterns corresponding to P = 5 at time step 3.

Hence, a total of 3 time steps (Σ_{λ5=1}^{λ5^max} Σ_{λ4=λ5+1}^{λ4^max|λ5} 1, where λ5^max = 2, λ4^max|λ5=1 = 3 and λ4^max|λ5=2 = 3 (3.1)) are required to check the error patterns corresponding to P = 5 and m = 20, as shown in Figs. 3.21, 3.22, 3.23, 3.24, 3.25 and 3.26. In general, the number of time steps required to generate all integer partitions of size P > 3 for a specific logistic weight (LW) is given by:

Σ_{λ_P=1}^{λ_P^max}  Σ_{λ_{P−1}=λ_P+1}^{λ_{P−1}^max|λ_P}  ···  Σ_{λ_4=λ_5+1}^{λ_4^max|λ_5,...,λ_P}  1.    (3.2)
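To make the time-step accounting of (3.1) and (3.2) concrete, the short Python sketch below (an illustrative model with helper names of our choosing, not the bit-true one) enumerates the distinct-part integer partitions of a logistic weight m and counts one time step per distinct (λP, ..., λ4) prefix, since λ1, λ2 and λ3 are resolved in parallel by the three shift-register buses:

```python
def distinct_partitions(m, p):
    """Yield all partitions of m into p distinct positive parts,
    smallest part first: (lam_P, ..., lam_1)."""
    def rec(remaining, parts_left, minimum):
        if parts_left == 1:
            if remaining >= minimum:
                yield (remaining,)
            return
        for v in range(minimum, remaining // parts_left + 1):
            for rest in rec(remaining - v, parts_left - 1, v + 1):
                yield (v,) + rest
    yield from rec(m, p, 1)

def time_steps(m, p):
    """One controller time step per distinct (lam_P, ..., lam_4) prefix;
    lam_1..lam_3 are checked in parallel by the three shift-register buses."""
    return len({part[:p - 3] for part in distinct_partitions(m, p)})

print(time_steps(20, 4), time_steps(20, 5))  # -> 3 3, matching Steps 1-6 above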


Fig. 3.27: Proposed VLSI architecture for ORBGRAND [1, 2]

3.3.4 Proposed ORBGRAND VLSI Architecture

Figure 3.27 shows the proposed VLSI architecture for ORBGRAND, which can be used to decode any linear block code of length n and code rate R. The control and clock signals are not shown for clarity. At any time, a new H matrix can be loaded into the H memory of size (n − k) × n bits, which allows this VLSI architecture to support various codes and rates. A syndrome check is performed on the hard-decided vector ŷ in the first phase of decoding. Decoding is deemed successful if the computed syndrome (s_c = H · ŷᵀ) is 0. Otherwise, the LLR values are sorted in ascending order of their absolute value |y| (|y_i| ≤ |y_j| ∀ i < j; Sec. 3.1) to be used in the following stages of ORBGRAND decoding. As seen in Fig. 3.27, the decoder core receives the sorted syndromes of TEPs (e) with Hamming weight of 1 (s_i, ∀ i ∈ [1, n]), while the multiplexers receive the indices of the sorted LLRs for later use by the word generator module. After sorting the LLRs (|y|), all syndromes of error patterns with Hamming weight of 1 (s_i) are queried for codebook membership in a single time step. This is followed by testing all the TEPs (e) for codebook membership in ascending logistic weight (LW) order, as explained in Sec. 3.1. The shift register and XOR gate arrangement proposed in Sec. 3.3.1 generates the syndromes of the TEPs (e) corresponding to integer partitions of a logistic weight m (∀ m ∈ [3, LWmax]). The rows of the shift registers are combined with the controller's output s_comp, and the resulting test syndromes are NOR-reduced and fed to a 2D priority encoder. Each NOR-reduce output is 1 if and only if all of the bits of its test syndrome are 0. If any of the tested syndrome combinations satisfies the codebook membership constraint (NOR-reduce output is 1), the 2D priority encoder is used in conjunction with the controller module to forward the respective indices to the word generator module, where P multiplexers convert the sorted index values to their appropriate bit-flip locations in ŷ.
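In software terms, one decoder time step amounts to the following check (a simplified, sequential analogue of the parallel NOR-reduce and priority-encoder logic; array shapes are illustrative):

```python
import numpy as np

def codebook_hits(s_comp, candidate_syndromes):
    """One time step: XOR every candidate row with the controller output
    s_comp and flag rows whose test syndrome is all-zero (the hardware
    NOR-reduce); the 2D priority encoder then forwards the indices of
    the hits to the word generator."""
    tests = np.bitwise_xor(candidate_syndromes, s_comp)  # shape (rows, n-k)
    hits = ~tests.any(axis=1)                            # NOR-reduce per row
    return np.flatnonzero(hits)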

3.4 ORBGRAND Design Exploration

In this section, we present the VLSI implementation results for ORBGRAND (LW, P). Initially, the implementation results for the baseline ORBGRAND architecture with parameters LW ≤ 64 and P ≤ 6, and with parameters LW ≤ 96 and P ≤ 8, are presented. This is accompanied by a comprehensive analysis of the worst-case latency and the worst-case throughput achieved by varying the parameters (LW, P). Finally, to reduce the area overhead of the VLSI implementation, the sorter module of the proposed ORBGRAND hardware is segmented into multiple partitions. The effect of the number of partitions on ORBGRAND decoding performance as well as on area overhead is also presented and compared to ORBGRAND with a non-segmented sorter.


3.4.1 ORBGRAND Baseline Implementation

The proposed ORBGRAND VLSI architecture with parameters LW ≤ 64 and P ≤ 6 has been implemented in Verilog HDL and synthesized using Synopsys Design Compiler with general-purpose TSMC 65 nm CMOS technology. The design has been verified using test benches generated via the bit-true C model of the proposed ORBGRAND hardware. Table 3.1 shows the synthesis results for the proposed ORBGRAND hardware decoder with n = 128. The proposed ORBGRAND hardware decoder can support code rates R between 0.75 and 1. Input channel LLRs are quantized on 5 bits, including 1 sign bit and 3 bits for the fractional part. To ensure accuracy in power measurements, switching activities from real test vectors are extracted for all of the VLSI architectures presented in Table 3.1.

Table 3.1: TSMC 65 nm CMOS synthesis comparison for ORBGRAND with GRANDAB and DSCF for n = 128

Decoders: (1) GRANDAB [17], AB = 3; (2) ORBGRAND, LW ≤ 64, P ≤ 6; (3) ORBGRAND, LW ≤ 96, P ≤ 8; (4) ORBGRAND, LW ≤ 96, P ≤ 8, S = 2; (5) ORBGRAND, LW ≤ 96, P ≤ 8, S = 4; (6) DSCF [10], ω = 2, Tmax = 50

Parameters                       (1)      (2)      (3)      (4)      (5)      (6)
Technology (nm)                  65       65       65       65       65       65
Supply (V)                       0.9      0.9      0.9      0.9      0.9      0.9
Max. Frequency (MHz)             500      454      454      454      454      426
Area (mm²)                       0.25     1.82     2.25     2.08     1.85     0.22
W.C. Latency (μs)                8.196    9.30     205.76   205.76   205.76   6.103
Avg. Latency (ns)                2        2.47     2.47     2.47     2.47     122
W.C. T/P (Mbps) (a)              12.8     11.3     0.51     0.51     0.51     17.2
Avg. T/P (Gbps) (a)              52.5     42.5     42.5     42.5     42.5     0.86
Power (mW)                       46       104.3    133      131.3    130      68.51
Energy per Bit (pJ/bit) (b)      0.87     2.45     3.13     3.09     3.0      79.6
Area Efficiency (Gbps/mm²) (c)   210      23.3     18.9     20.4     23       3.9
Code compatible                  Yes      Yes      Yes      Yes      Yes      No

(a) Information Throughput (Gbps) = k / Decoding Latency (ns). (b) Energy per Bit (pJ/bit) = Power (mW) / Avg. Throughput (Gbps). (c) Area Efficiency (Gbps/mm²) = Avg. Throughput (Gbps) / Area (mm²).

The maximum frequency supported by the ORBGRAND implementation is 454 MHz. One clock cycle corresponds to one time step, since no pipelining is used in the decoder core. The average decoding latency of the proposed ORBGRAND hardware is calculated using the bit-true C model, accounting for at least 100 frames in error at each Eb/N0 point. At a target FER of 10⁻⁷ (Eb/N0 > 7.5 dB), the average latency is 2.47 ns, resulting in an average decoding information throughput of 42.5 Gbps for a 5G (128, 105) polar code. However, the worst-case (W.C.) scenario needs 4226 cycles with n = 128 and parameters LW ≤ 64 and P ≤ 6, culminating in a W.C. latency of 9.3 μs. As seen in Fig. 3.4a, ORBGRAND with parameters LW ≤ 64 and P ≤ 6 has decoding performance similar (at a target FER of 10⁻⁵) to the Dynamic SC-Flip (DSCF) [9] decoder for a 5G (128, 105) polar code. Therefore, we compare the proposed ORBGRAND VLSI implementation (LW ≤ 64 and P ≤ 6) to the VLSI architecture for the DSCF (ω = 2, Tmax = 50) polar code decoder [10], which employs 7-bit and 6-bit internal and channel LLR quantizations, respectively. The ORBGRAND (LW ≤ 64, P ≤ 6) implementation has an 8× area overhead, as well as a 52% increase in worst-case latency, compared to the DSCF decoder [10]. However, the proposed ORBGRAND hardware delivers 49× higher average information throughput than the DSCF [10] at a target FER of 10⁻⁷.
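These figures follow directly from the table footnote formulas; a quick sanity check (the cycle count is taken from the text, not re-derived):

```python
k = 105              # information bits of the 5G (128, 105) polar code
f_clk = 454e6        # maximum clock frequency (Hz)
avg_latency_ns = 2.47
wc_cycles = 4226     # worst case for LW <= 64, P <= 6 (one cycle per time step)

wc_latency_us = wc_cycles / f_clk * 1e6   # -> 9.31 us
avg_tp_gbps = k / avg_latency_ns          # bits/ns = Gbps -> 42.5
wc_tp_mbps = k / wc_latency_us            # bits/us = Mbps -> 11.3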


Furthermore, compared to the DSCF decoder [10], the ORBGRAND (LW ≤ 64, P ≤ 6) hardware is 5× more area efficient and 32× more energy efficient. Moreover, the proposed ORBGRAND hardware is code and rate compatible, while the DSCF [10] decoder can only decode polar codes. In comparison to the hard-input GRANDAB decoder (AB = 3) [17], ORBGRAND (LW ≤ 64, P ≤ 6) has a 7× area overhead, as well as a 13.5% higher W.C. latency and a 23.5% higher average latency. Furthermore, compared to the GRANDAB decoder [17], ORBGRAND (LW ≤ 64, P ≤ 6) is 2× less energy efficient and 9× less area efficient. However, as seen in Fig. 3.4, ORBGRAND (LW ≤ 64, P ≤ 6), being a soft-input decoder, outperforms its hard-input counterparts in FER performance by at least 1.3∼2 dB for target FERs ≤ 10⁻⁵.

Fig. 3.28: Worst-Case (W.C.) latency and W.C. information throughput for the proposed ORBGRAND VLSI architecture with various parameters (LW, P)

3.4.2 Design Expansion and Latency Analysis

As illustrated in Fig. 3.4, ORBGRAND with parameters LW ≤ 96 and P ≤ 8 has decoding performance similar to ORBGRAND with parameter LWmax = 8256 for the (128, 105) polar code. Furthermore, at a target FER of ≤ 10⁻⁷, ORBGRAND with parameters LW ≤ 96 and P ≤ 8 yields a 0.2∼0.3 dB gain in decoding performance compared to ORBGRAND with parameters LW ≤ 64 and P ≤ 6. Table 3.1 presents the VLSI implementation results for the proposed ORBGRAND VLSI architecture with parameters LW ≤ 96 and P ≤ 8. As shown in Table 3.1, the implementation with LW ≤ 96 and P ≤ 8 incurs a 23.6% area overhead compared to the implementation with LW ≤ 64 and P ≤ 6, and it is 18.8% less area efficient and 27.7% less energy efficient. The ORBGRAND parameters LW and P influence the worst-case decoding latency as well as the decoding performance of the proposed ORBGRAND VLSI hardware. In the worst-case scenario, the ORBGRAND hardware with parameters LW ≤ 64 and P ≤ 6 requires 4226 cycles (n = 128), whereas the ORBGRAND


with parameters LW ≤ 96 and P ≤ 8 requires 93,417 cycles, resulting in a worst-case latency of 205.76 μs. Figure 3.28a depicts the worst-case latency (in clock cycles, (3.2)) of the proposed ORBGRAND hardware for various LW and P parameter values. Additionally, Fig. 3.28b depicts the information throughput corresponding to k = 105 and a maximum frequency of 454 MHz for the proposed ORBGRAND (LW, P) hardware. To conclude, the ORBGRAND parameters (LW, P) can be selected to balance the area overhead, the energy consumption budget, and the decoding performance requirements of a target application.


Fig. 3.29: Proposed segmented sorter (S = 4) for ORBGRAND

Table 3.2: Displacement of LLR elements |y_i| (∀ i ∈ [1, n]) from their correct locations with the segmented sorter (for different numbers of bitonic sorter segments S)

Displacement   S = 2     S = 4     S = 8     S = 16
= 0            10.31%    5.98%     3.87%     2.59%
≤ 1            29.40%    17.42%    11.40%    7.67%
≤ 2            45.50%    28.18%    18.67%    12.65%
≤ 3            58.76%    38.05%    25.67%    17.50%
≤ 5            77.84%    54.62%    38.65%    26.85%
≤ 10           96.89%    81.64%    63.82%    47.68%
≤ 20           99.99%    98.34%    90.09%    75.95%
≤ 30           100%      99.94%    98.10%    90.58%


3.4.3 ORBGRAND Area Optimization

In this section, we investigate the sorter module of the proposed ORBGRAND (LW, P) VLSI implementation and propose segmenting it into multiple partitions to reduce the area overhead of the ORBGRAND hardware. As described in Algorithm 2, ORBGRAND starts by sorting the channel LLRs (y) in ascending order of their absolute value (|y|). It should be noted that any sorter [18], such as an insertion, merge, or bubble sorter, can be used to sort |y| for the proposed ORBGRAND hardware. The choice of sorter is determined by the hardware implementation cost and the desired decoding latency for the target application: sequential sorters tend to have higher latency but lower hardware implementation cost, while parallel sorters generally have lower latency but higher hardware implementation cost. The proposed ORBGRAND VLSI implementation employs a log2(n)-stage pipelined bitonic sorter [19] of length n, capable of fully sorting the received LLR vector (y) in log2(n) clock cycles. The area utilized by the bitonic sorter module of the proposed ORBGRAND VLSI implementation can be reduced by partitioning the sorter into multiple segments. The size and number of partitions influence the decoding performance as well as the area utilized by the proposed ORBGRAND VLSI implementation.

Fig. 3.30: Comparison of decoding performance of ORBGRAND decoding with different parameters (LWmax, P, S) for polar code (128, 105+11)

For the proposed segmented sorter approach, the bitonic sorter module of length n is partitioned into S partitions, each of size n/S. Please note that the number of segments S should be chosen such that the size of each segment n/S is a positive integer. A segmented bitonic sorter, which employs four sorters (S = 4) of length n/4, each receiving a unique subset of the channel LLRs (y), is presented in Fig. 3.29. To obtain the final sorted LLRs, the sorted LLRs from the individual sorters are concatenated: the first four elements of the final sorted LLRs consist of the first element from the output of each sorter; similarly, the second element from each sorter's output is placed in the next four positions. This process is repeated until the last elements of each sorter's output are placed in the last four positions of the sorted LLRs, as illustrated in Fig. 3.29. Please note that a non-segmented sorter ensures that the sorted LLR elements |y_i| (∀ i ∈ [1, n]) are placed in their correct locations, whereas the sorted LLRs produced by a segmented sorter (S > 1) will have elements (|y_i|) that are displaced from their correct locations.
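The interleaving scheme is easy to emulate in software; a minimal sketch, assuming contiguous segments (the exact subset assignment is our assumption):

```python
import numpy as np

def segmented_sort(y, S):
    """Approximate |LLR| sort with S independent length-(n/S) sorters;
    sorter outputs are interleaved round-robin as in Fig. 3.29."""
    n = len(y)
    assert n % S == 0, "n/S must be a positive integer"
    segs = [sorted(y[i * (n // S):(i + 1) * (n // S)], key=abs)
            for i in range(S)]
    out = []
    for j in range(n // S):        # j-th element of every sorter's output
        out.extend(segs[s][j] for s in range(S))
    return out

# Monte-Carlo displacement measurement in the spirit of Table 3.2
rng = np.random.default_rng(0)
y = list(rng.normal(size=128))
approx, exact = segmented_sort(y, S=4), sorted(y, key=abs)
displacement = [abs(approx.index(v) - i) for i, v in enumerate(exact)]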


To assess the impact of the number of segments on the displacement of the LLR elements |y_i| from their correct positions, we conducted Monte-Carlo simulations and measured the percentage of LLR elements |y_i| that lie within a particular distance of their correct position. Table 3.2 shows the displacement of the LLR elements |y_i| from their correct positions for different numbers of segments. It indicates that with fewer segments, more elements land close to their correct positions, while a larger number of segments pushes the LLR elements |y_i| further away from their correct locations. Figure 3.30 illustrates the FER performance of the segmented sorter approach for ORBGRAND (LW ≤ 96, P ≤ 8) decoding of the (128, 105+11) polar code. As shown in the figure, ORBGRAND using the segmented sorter with S = 2 and S = 4 experiences a FER performance degradation of 0.1 dB and 0.3 dB, respectively, at the target FER of 10⁻⁶, compared to ORBGRAND with a non-segmented sorter. Table 3.1 compares the VLSI implementation results for ORBGRAND (LW ≤ 96, P ≤ 8) using the non-segmented sorter to those using the segmented sorter approach. As shown in Table 3.1, the proposed ORBGRAND with a non-segmented sorter has an area overhead of 8% and 21.6%, respectively, compared to ORBGRAND with segmented sorter parameters S = 2 and S = 4. In summary, the number of sorter segments affects both the decoding performance and the area overhead of the ORBGRAND hardware; it can be chosen to strike a balance between decoding performance requirements and area overhead for a target application.

3.5 Conclusion

In this chapter, we proposed a hardware architecture for the ORBGRAND algorithm, a soft-input variant of GRAND that generates test error patterns in a fixed logistic weight order, rendering it suitable for parallel hardware implementation. The ORBGRAND architecture is code-agnostic and can decode any code as long as the length and rate constraints are satisfied. We proposed modifications to the ORBGRAND algorithm that simplify the hardware implementation and reduce the computational complexity. Furthermore, the proposed ORBGRAND VLSI architecture exposes parameters that can be tuned to meet the decoding performance and decoding latency requirements of a specific application. According to the VLSI implementation results, an average decoding throughput of 42.5 Gbps can be achieved for a code length of 128 at a target FER of 10⁻⁷. In comparison to the state-of-the-art DSCF hardware decoder for the 5G (128, 105) polar code, the proposed ORBGRAND VLSI implementation has 49× higher decoding throughput, 32× higher energy efficiency, and 5× higher area efficiency. Overall, the proposed ORBGRAND VLSI architecture is a significant step toward implementing GRAND family soft-input decoders in hardware.

Acknowledgements We would like to thank Dr. Thibaud Tonnellier and Dr. Furkan Ercan for contributing to this chapter as co-authors. Dr. Furkan Ercan is a Research Scientist affiliated with Intel Labs, Intel Corporation, Hudson, MA, 01749 (Email: [email protected]). Dr. Thibaud Tonnellier is with Airbus Defence and Space, France (Email: [email protected]).


References

1. Abbas, S., Tonnellier, T., Ercan, F., Jalaleddine, M., & Gross, W. (2021). High-throughput VLSI architecture for soft-decision decoding with ORBGRAND. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8288–8292).
2. Abbas, S., Tonnellier, T., Ercan, F., Jalaleddine, M., & Gross, W. (2022). High-throughput and energy-efficient VLSI architecture for ordered reliability bits GRAND. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30, 681–693.
3. Duffy, K. R. (2021). Ordered reliability bits guessing random additive noise decoding. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8268–8272).
4. Duffy, K. R., Li, J., & Médard, M. (2019). Capacity-achieving guessing random additive noise decoding. IEEE Transactions on Information Theory, 65(7), 4023–4040.
5. Solomon, A., Duffy, K. R., & Médard, M. (2020). Soft maximum likelihood decoding using GRAND. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC) (pp. 1–6).
6. 3GPP NR; Multiplexing and Channel Coding (2020). http://www.3gpp.org/DynaReport/38-series.htm, Rel. 16.1.
7. Tal, I., & Vardy, A. (2015). List decoding of polar codes. IEEE Transactions on Information Theory, 61, 2213–2226.
8. Balatsoukas-Stimming, A., Parizi, M., & Burg, A. (2015). LLR-based successive cancellation list decoding of polar codes. IEEE Transactions on Signal Processing, 63, 5165–5179.
9. Chandesris, L., Savin, V., & Declercq, D. (2018). Dynamic-SCFlip decoding of polar codes. IEEE Transactions on Communications, 66, 2333–2345.
10. Ercan, F., Tonnellier, T., Doan, N., & Gross, W. (2020). Practical dynamic SC-flip polar decoders: Algorithm and implementation. IEEE Transactions on Signal Processing, 68, 5441–5456. https://doi.org/10.1109/TSP.2020.3023582
11. Berlekamp, E. (1968). Nonbinary BCH decoding (Abstr.). IEEE Transactions on Information Theory, 14, 242–242.
12. Massey, J. (1969). Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, 15, 122–127.
13. Fossorier, M., & Lin, S. (1995). Soft-decision decoding of linear block codes based on ordered statistics. IEEE Transactions on Information Theory, 41, 1379–1396.
14. Yue, C., Shirvanimoghaddam, M., Li, Y., & Vucetic, B. (2019). Segmentation-discarding ordered-statistic decoding for linear block codes. In 2019 IEEE Global Communications Conference, GLOBECOM 2019, Waikoloa, HI, USA, December 9–13, 2019 (pp. 1–6). https://doi.org/10.1109/GLOBECOM38437.2019.9014173
15. Wonterghem, J., Alloum, A., Boutros, J., & Moeneclaey, M. (2017). On performance and complexity of OSD for short error correcting codes in 5G-NR. In Proceedings of the First International Balkan Conference on Communications and Networking (BalkanCom 2017).
16. Butler, J., & Sasao, T. (2014). High-speed hardware partition generation. ACM Transactions on Reconfigurable Technology and Systems, 7, 1–17.
17. Abbas, S., Tonnellier, T., Ercan, F., & Gross, W. (2020). High-throughput VLSI architecture for GRAND. In 2020 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 1–6).
18. Cormen, T. (2013). Algorithms for sorting and searching. In Algorithms Unlocked (pp. 25–59).
19. Batcher, K. (1968). Sorting networks and their applications. In Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference (pp. 307–314). https://doi.org/10.1145/1468075.1468121

Chapter 4

Hardware Architecture for List GRAND (LGRAND)

Abstract In this chapter, we introduce the GRAND variant List-GRAND (LGRAND), which achieves Maximum Likelihood (ML) decoding performance similar to SGRAND and is well suited for parallel hardware implementation like ORBGRAND. Numerical simulation results demonstrate that the proposed LGRAND outperforms ORBGRAND in decoding performance by 0.5∼0.75 dB for channel codes of different classes at a target FER of 10⁻⁷. Furthermore, the LGRAND VLSI implementation achieves an average information throughput of 47.27∼51.36 Gbps for linear block codes of length 127/128 and various code rates, and has an area overhead of 4.84% compared to the ORBGRAND VLSI implementation.

4.1 Introduction

Soft GRAND (SGRAND) [1] outperforms the other GRAND variants in terms of decoding performance and achieves ML decoding performance. However, SGRAND is not suitable for parallel hardware implementation. The Test Error Patterns (TEPs) e generated by SGRAND are interdependent, and the order of the codebook membership queries changes depending on the received vector of channel observation values (y) (explained in Chapter 1, Sec. 1.4). Due to this TEP interdependence, realizing an efficient parallel hardware implementation for SGRAND proves to be a challenging task, and a sequential hardware implementation for SGRAND would lead to high decoding latency, rendering it unsuitable for applications that require extremely low latency. On the other hand, Ordered Reliability Bits GRAND (ORBGRAND) [2] employs integer partitioning to generate TEPs (e) in a predefined logistic weight order. The generated TEPs (e) are independent of one another and can be generated concurrently (see Chapter 1, Sec. 1.4 as well as Chapter 3 for a detailed explanation of TEP generation for ORBGRAND). ORBGRAND is thus extremely parallelizable and well suited to parallel hardware implementation. A high-throughput VLSI design for ORBGRAND is presented in [3] for linear block codes with a code length (n) of 128 (see Chapter 3 for details). Due to the parallel generation of TEPs, the ORBGRAND hardware [3] can execute 1.16 × 10⁵ codebook membership queries in 4226 clock cycles. Therefore, on one end of the spectrum we have SGRAND, which achieves ML decoding performance but is not suitable for parallel hardware implementation, and on the other end we have ORBGRAND, which is suitable for parallel hardware implementation but does not offer ML decoding performance. In this chapter, we propose List-GRAND (LGRAND), a variant of ORBGRAND that approaches ML decoding performance for


channel codes of different classes (Bose–Chaudhuri–Hocquenghem (BCH) codes [4, 5], Cyclic Redundancy Check (CRC) codes [6] and CRC-Aided-Polar (CA-Polar) codes [7]), and is suitable for efficient parallel hardware implementation. The proposed LGRAND also introduces parameters that can be adjusted to meet the target decoding performance and complexity budget of a target application. LGRAND works on the premise of generating a list (L) of possible candidate codewords during the decoding process and choosing the most likely candidate as the final output codeword. This chapter begins by examining the worst-case complexity and the TEP (e) generation procedure of ORBGRAND. Afterwards, the proposed List-GRAND (LGRAND) is presented, its parameters are evaluated, and numerical simulation results are discussed. Towards the end of the chapter, the LGRAND hardware architecture as well as the VLSI implementation results are presented. Please note that this chapter follows up on our previously published work [8].

Fig. 4.1: Comparison of decoding performance of ORBGRAND (LWmax ,HWmax ) decoding of BCH code (63, 45)

4.2 ORBGRAND: Analysis and Proposed Modifications

This section explores the impact of the ORBGRAND parameters on decoding performance as well as on the required maximum number of codebook membership queries (the worst-case complexity). In addition, a list-based technique to enhance ORBGRAND decoding performance is introduced in this section.

4.2.1 ORBGRAND: Parametric Analysis (LWmax and HWmax)

Figures 4.1 and 4.2 illustrate how the ORBGRAND parameters (LWmax and HWmax) influence the decoding performance as well as the worst-case complexity of ORBGRAND for decoding the BCH code (63, 45) with BPSK modulation over an AWGN channel. The decoding performance of ORBGRAND is improved by


Fig. 4.2: Maximum number of queries (worst-case complexity) comparison for ORBGRAND decoding of BCH code (63, 45)

increasing the values of the parameters LWmax and HWmax; however, as shown in Fig. 4.2, this also increases the worst-case complexity. Therefore, appropriate ORBGRAND parameter values (LWmax and HWmax) can be chosen to reduce the worst-case complexity while having a minimal impact on decoding performance.

4.2.2 Proposed List-GRAND (LGRAND) for Approaching ML Decoding

Algorithm 3 outlines the steps of the proposed LGRAND technique for approaching ML decoding performance. The inputs to LGRAND are identical to those of ORBGRAND, as described in Chapter 3, with the exception of the additional parameter δ (a threshold on the logistic weight). In contrast to ORBGRAND, which stops decoding as soon as any vector (ŷ ⊕ e) satisfies the codebook membership criterion (H · (ŷ ⊕ e)ᵀ = 0), the proposed LGRAND generates a list of estimated codewords (L) during the decoding process and selects the most likely one (arg max_{ĉ∈L} p(y|ĉ)) as the final codeword ĉ_final.

Lines 1–2: A preliminary syndrome check is performed on the hard-demodulated received signal ŷ from the communication channel. If the syndrome is zero (H · ŷᵀ = 0), the received signal ŷ is error free. Otherwise, the decoding procedure continues with LGRAND.
Line 4: Similar to ORBGRAND, LGRAND sorts the received vector (y) from the communication channel in ascending order of the absolute values of the Log-Likelihood Ratios (LLRs) (|y_i| ≤ |y_j| ∀ i < j), and the associated indices are recorded in a permutation vector denoted by ind.
Line 5: The Hamming weight of the generated TEPs (e) is restricted to ≤ Δ (Δ is initialized to HWmax) and Λ is initialized to LWmax.
Lines 7–8: The integer partitions (λ) of a logistic weight i (∀ i ∈ [0, Λ], where Λ = LWmax) are generated (for details on generating the integer partitions λ of a specific logistic weight, please refer to Chapter 3, Sec. 3.2).
Line 10: Following that, LGRAND generates TEPs (e) using the integer partitions λ, and the TEPs (e) are ordered using the permutation vector ind.

76

4 Hardware Architecture for List GRAND (LGRAND)

Algorithm 3: List-GRAND (LGRAND) algorithm

Input: y, H, G⁻¹, LWmax, HWmax, δ
Output: û

1:  if H · ŷᵀ == 0 then
2:      return û ← ŷ · G⁻¹
3:  else
4:      ind ← sortChannelObservationValues(y)            // |y_i| ≤ |y_j| ∀ i < j
5:      e ← 0; Λ ← LWmax; Δ ← HWmax
6:      L ← ∅
7:      for i ← 1 to Λ do
8:          S ← generateAllIntPartitions(i)              // (λ1, λ2, ..., λP)_i
9:          forall l in S do
10:             e ← genErrorPatternMaxHammingWt(l, ind, Δ)   // HammingWeight(e) ≤ Δ
11:             if H · (ŷ ⊕ e)ᵀ == 0 then
12:                 ĉ ← ŷ ⊕ e
13:                 addToList(L, ĉ)
14:                 if Λ == LWmax then
15:                     Λ ← min(i + δ, LWmax)
16:                     Δ ← HammingWeight(e)
17:     ĉ_final ← arg max_{ĉ ∈ L} p(y | ĉ)
18:     û ← ĉ_final · G⁻¹
19:     return û

Lines 11–12: The generated TEPs (e) are then applied to the hard-decided vector ŷ, and the codebook membership criterion is evaluated (H · (ŷ ⊕ e)ᵀ == 0).
Lines 13–16: LGRAND adds the vector ŷ ⊕ e to the list L whenever it satisfies the codebook membership constraint (H · (ŷ ⊕ e)ᵀ == 0). If this is the first vector that satisfies the constraint, Λ is set to the minimum of LWmax and (i + δ), and Δ is limited to the Hamming weight of the current TEP (e). These updated parameters enable LGRAND to perform an extended search for TEPs (e) that have the same Hamming weight as the current TEP (e). Any vector ŷ ⊕ e that satisfies the codebook membership criterion in the extended search is instantly added to the list L.
Lines 17–18: Finally, the most likely candidate (arg max_{ĉ∈L} p(y|ĉ)) from the list L is selected as the final codeword ĉ_final, and the original message (û) is retrieved from ĉ_final.

Limiting the extended search for LGRAND The parameter δ serves as a strict hard limit on the extended search for possible additional codewords (ĉ). A valid codeword is added to the list L as soon as LGRAND discovers it during this extended search. In some cases, the decoder may terminate its search after reaching the δ hard limit without discovering any additional codewords. However, if a desired list size were chosen as the limit for the extended search, the decoder would have to continue searching until it reaches the maximum number of queries (Queriesmax) whenever the extended search fails to find any additional codewords. Therefore, the parameter δ is more suitable than a desired list size for imposing a tight limit on the extended search.
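To make the control flow concrete, a minimal software model of Algorithm 3 is sketched below (NumPy over GF(2); the partition generator and TEP construction are simplified stand-ins for the hardware machinery of Chapter 3, and arg max p(y|ĉ) is realized with the equivalent metric Σ (−1)^ĉᵢ yᵢ used by the MLCU in Sec. 4.4):

```python
import numpy as np

def distinct_partitions(m, max_parts):
    """Partitions of m into at most max_parts distinct parts (smallest first)."""
    def rec(remaining, minimum, parts):
        if remaining == 0:
            yield tuple(parts)
            return
        if len(parts) == max_parts:
            return
        for v in range(minimum, remaining + 1):
            yield from rec(remaining - v, v + 1, parts + [v])
    yield from rec(m, 1, [])

def lgrand(y, H, lw_max, hw_max, delta):
    """Sketch of Algorithm 3: extended search of width delta after first hit."""
    y = np.asarray(y, dtype=float)
    y_hard = (y < 0).astype(np.uint8)            # BPSK hard decision
    if not (H @ y_hard % 2).any():
        return y_hard                            # syndrome already zero
    ind = np.argsort(np.abs(y))                  # least-reliable positions first
    lam, max_hw, cands = lw_max, hw_max, []
    for i in range(1, lw_max + 1):
        if i > lam:                              # extended-search limit reached
            break
        for part in distinct_partitions(i, max_hw):
            if max(part) > len(y):               # partition exceeds block length
                continue
            e = np.zeros(len(y), dtype=np.uint8)
            e[ind[np.asarray(part) - 1]] = 1     # TEP via reliability permutation
            c = y_hard ^ e
            if not (H @ c % 2).any():            # codebook membership query
                cands.append(c)
                if lam == lw_max:                # first hit opens the window
                    lam = min(i + delta, lw_max)
                    max_hw = int(e.sum())
    if not cands:
        return y_hard
    return max(cands, key=lambda c: float(np.sum((-1.0) ** c * y)))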


Fig. 4.3: Test Error Pattern (TEP) generation for LGRAND for (n = 12, LWmax = 12, HWmax = 4, δ = 2). (a) Codebook membership criterion satisfied by the 21st TEP (e) with LW = 8 and HW = 2 (red rectangle). (b) Checking additional TEPs for LGRAND (δ = 2) (green rectangle; extended search). (c) Restricting the HW of additional TEPs to ≤ 2 (δ = 2 and Δ = HammingWeight(e)) (brown rectangle; extended search)

Example 4.1: LGRAND Test Error Patterns (TEPs) Figure 4.3a depicts the LGRAND TEPs (e) for parameters n = 12, LWmax = 12, HWmax = 4 and δ = 2. Please note that each column in Fig. 4.3 represents a TEP (e), and each dot represents a position where a bit of the received hard-demodulated vector ŷ is to be flipped. For instance, the 21st TEP, which is highlighted in red, corresponds to an integer partition of 8 (LW = 8) with a Hamming weight of 2; it corresponds to e = [0 1 0 0 0 1 0 0 0 0 0 0] and, when combined with the hard-demodulated vector (ŷ ⊕ e), flips the 2nd and 6th bits of ŷ. Consider the scenario where the 21st TEP (Fig. 4.3a) satisfies the codebook membership criterion (Algorithm 3: line 11) during LGRAND decoding. Instead of terminating the decoding process, LGRAND checks additional TEPs (e) corresponding to LW = 9 and LW = 10 (since δ = 2; extended search), as shown in Fig. 4.3b. If any of these additional TEPs (e), shown in the green rectangle in Fig. 4.3b, fulfill the codebook membership constraint, they are added to the list L (Algorithm 3: line 13), and the most likely codeword is selected as the final codeword ĉ_final (Algorithm 3: line 18).


The parameter δ serves as a strict hard limit on the extended search (green rectangle in Fig. 4.3b) for possible additional codewords. To decrease the number of TEPs (e) generated in the extended search, the maximum Hamming weight of the TEPs (e) is restricted to the Hamming weight of the first TEP that satisfied the codebook membership criterion. For instance, the 21st TEP in Fig. 4.3a that meets the codebook membership criterion has a Hamming weight of 2 (Δ = 2) (Algorithm 3: line 16); hence the TEPs (e) in the extended search can only have a Hamming weight of less than or equal to 2 (≤ Δ), as shown in Fig. 4.3c (brown rectangle).
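The partition-to-TEP mapping in this example is easily reproduced; a toy sketch where `ind` is taken as the identity permutation (i.e., the channel reliabilities are assumed already in ascending order):

```python
def tep_from_partition(part, ind, n):
    """Build the TEP that flips the bit positions named by an integer
    partition, routed through the reliability permutation ind."""
    e = [0] * n
    for pos in part:            # e.g., part = (2, 6): LW = 2 + 6 = 8, HW = 2
        e[ind[pos - 1]] = 1
    return e

ind = list(range(12))           # identity permutation (assumed for illustration)
print(tep_from_partition((2, 6), ind, 12))
# -> [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]: the 21st TEP of Fig. 4.3a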

Fig. 4.4: Parametric analysis of LGRAND (LWmax, HWmax, δ) for the BCH (127, 113) code (at Eb/N0 = 6.5 dB)

Fig. 4.5: Worst-case complexity for ORBGRAND and LGRAND decoding of linear block codes of length n


4.2.3 LGRAND: Analyzing the Parameter δ

The proposed LGRAND technique introduces the parameter δ, which affects both the decoding performance and the average complexity (average number of codebook membership queries (TEPs)) of LGRAND. In this section, we examine the LGRAND parameter δ from both a complexity and a decoding performance perspective. It should be noted that, because the worst-case complexity of ORBGRAND and LGRAND depends on the parameters LWmax and HWmax, which are fixed for both algorithms, the worst-case complexities of LGRAND and ORBGRAND are equal. Figure 4.4 shows the impact of varying δ on the decoding performance and average computational complexity of LGRAND decoding of the BCH code (127, 113) at Eb/N0 = 6.5 dB. Figure 4.4a illustrates that increasing δ improves decoding performance, but it also increases the average computational complexity, as shown in Fig. 4.4b. Therefore, an appropriate value of δ can be chosen to strike a balance between the decoding performance and the computational complexity of LGRAND.

Worst-Case Complexity The worst-case complexity of ORBGRAND and LGRAND decoding of linear block codes with lengths n = 127 and n = 128 is shown in Fig. 4.5. For parameters n = 127, LWmax = 127 and HWmax = 16, the maximum number of queries (worst-case complexity) for both the LGRAND and ORBGRAND decoders is 4.93 × 10⁷ queries, as shown in Fig. 4.5a.

Fig. 4.6: Comparison of decoding performance and average complexity of different GRAND variants for BCH (127, 106) code


Fig. 4.7: Comparison of decoding performance and average complexity of different GRAND variants for BCH (127, 113) code

Fig. 4.8: Comparison of decoding performance and average complexity of different GRAND variants for CRC Code (128, 104)


Fig. 4.9: Comparison of decoding performance and average complexity of different GRAND variants for CRC code (128, 112)

Fig. 4.10: Comparison of decoding performance and average complexity of different GRAND variants for polar code (128, 105+11)


4.3 Evaluating the Decoding Performance of LGRAND

In this section, we investigate the proposed LGRAND's decoding performance and computational complexity for various classes of channel codes (BCH codes, CA-Polar codes and CRC codes). For the numerical simulation results presented in this section, BPSK modulation over an AWGN channel with variance σ² is considered. Figures 4.6a and 4.7a compare the FER performance of several GRAND variants for decoding the BCH codes (127, 106) and (127, 113), respectively. Furthermore, ML decoding [9] results are also provided for reference. At a target FER of 10⁻⁷, the proposed LGRAND, with different parameter settings (LWmax, HWmax, δ), outperforms ORBGRAND in decoding performance by 0.25∼0.75 dB. For the BCH code (127, 106), as shown in Fig. 4.6a, LGRAND with parameters LWmax = 127, HWmax = 16 and δ = 30 achieves a decoding performance gain of 0.7 dB over ORBGRAND at a target FER of 10⁻⁷. Similarly, for the BCH code (127, 113), as demonstrated in Fig. 4.7a, LGRAND with parameters LWmax = 96, HWmax = 8 and δ = 25 achieves a decoding performance gain of 0.75 dB over ORBGRAND at a target FER of 10⁻⁷. The average computational complexity of the different GRAND variants for decoding the BCH code (127, 106) and the BCH code (127, 113) is shown in Figs. 4.6b and 4.7b, respectively. Although SGRAND requires the fewest queries of any GRAND variant, it is not suitable for parallel hardware implementation (explained in detail in Chapter 1). Therefore, it is reasonable to compare the number of queries required by the proposed LGRAND to the number required by ORBGRAND, since both are equally suitable for parallel hardware implementation.

Comparison with Enhanced ORBGRAND TEP Scheduling Schemes Recently, an improved ORBGRAND TEP scheduling technique was presented in [10]; this approach generates TEPs using an Improved Logistic Weight Order (ILWO) as opposed to the conventional logistic weight order [2]. TEPs with lower Hamming weights are given precedence in the ILWO, whereas TEPs with higher Hamming weights are penalized. The parameter Queriesmax, which indicates the maximum number of queries allowed, influences the decoding performance as well as the computational complexity of ORBGRAND-ILWO [10]. To strike a balance between decoding performance and computational complexity, appropriate values of Queriesmax can be selected; we refer the reader to [10] for more details on the ILWO TEP schedule and its performance/complexity tradeoffs. As shown in Fig. 4.6a, ORBGRAND-ILWO (Queriesmax = 10⁶) [10] outperforms the baseline ORBGRAND by ∼0.6 dB, at a target FER of 10⁻⁷, for decoding the BCH code (127, 106). Similarly, Fig. 4.7a illustrates that ORBGRAND-ILWO [10] outperforms the baseline ORBGRAND decoder by 0.3∼0.55 dB, at the target FER of 10⁻⁷, for decoding the BCH code (127, 113).


LGRAND decoding performance and average computational complexity for CRC codes [6] are presented in Figs. 4.8 and 4.9. The generator polynomials are 0xB2B117 and 0x1021 for the CRC code (128, 104) and the CRC code (128, 112), respectively. In addition to outperforming ORBGRAND, the proposed LGRAND can also achieve ML decoding performance similar to SGRAND with different parameter (LWmax, HWmax) settings. Furthermore, at the target FER of 10⁻⁷, as demonstrated in Figs. 4.8 and 4.9, the proposed LGRAND outperforms ORBGRAND-ILWO [10] by 0.1∼0.2 dB. Figure 4.10 compares the proposed LGRAND's decoding performance, as well as its average computational complexity, with different variants of GRAND for decoding the 5G NR CA-polar code (128, 105+11). Furthermore, the decoding performance of a state-of-the-art soft-input decoder, the CA-SCL decoder [11, 12], is included for reference. At a target FER of 10⁻⁷, LGRAND with parameters LWmax = 96, HWmax = 8 and δ = 20 achieves decoding performance comparable to the SGRAND and CA-SCL decoders.


Fig. 4.11: VLSI architecture for ordered reliability bits GRAND (ORBGRAND) [3]

4.4 Implementing List-GRAND (LGRAND) in Hardware

The proposed VLSI architecture for LGRAND is covered in this section. The proposed LGRAND hardware incorporates techniques from the GRANDAB [13] and ORBGRAND [3] VLSI architectures, which are covered in depth in Chapters 2 and 3 of this book.


Fig. 4.12: Proposed LGRAND VLSI architecture [8]


Fig. 4.13: Maximum likelihood computation unit (M ← Σ_{i=1}^{n} (−1)^{ĉ_i} y_i)

Fig. 4.14: Example of interaction between the MLCU module and the ORBGRAND decoder for n = 8 and y = [−1, +10, +5, −2, +3, +6, −7, +4]

Shift registers are used in [13] and [3] to store the (n − k)-bit syndromes of TEPs (e) with a Hamming weight of 1 (e = 1_i, i ∈ [1, n]). Please note that the syndrome of a TEP (e) with Hamming weight of 1 is denoted by s_i (s_i = H · 1_iᵀ, i ∈ [1, n]). Furthermore, the proposed LGRAND hardware enables combining several s_i to generate the syndrome of a TEP (e) with Hamming weight of l (s_{1,2,...,l} = H · 1_1ᵀ ⊕ H · 1_2ᵀ ⊕ ... ⊕ H · 1_lᵀ, ∀ 1 < l ≤ n).
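Because syndromes are linear over GF(2), the stored single-flip syndromes compose directly; in software terms (0-indexed positions):

```python
import numpy as np

def tep_syndrome(H, flip_positions):
    """Syndrome of a TEP flipping the given bit positions: by linearity,
    the XOR of the corresponding stored single-flip syndromes, i.e., of
    the columns of H."""
    s = np.zeros(H.shape[0], dtype=np.uint8)
    for p in flip_positions:
        s ^= H[:, p]
    return s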


Fig. 4.15: Computing the likelihood value (M) for the candidate codeword ĉ1 = [1 0 1 0 0 1 0 0]


Fig. 4.16: Computing the likelihood value (M) for the candidate codeword ĉ2 = [0 0 1 1 0 0 1 1]

4.4.1 Baseline ORBGRAND Hardware

The top-level ORBGRAND VLSI architecture [3], which can decode any linear (n, k) block code with code rate 0.75 ≤ R ≤ 1, is shown in Fig. 4.11. The ORBGRAND hardware receives a vector of channel observation values (y) as input and returns the estimated word û as output. Hard-demodulated channel observation values (ŷ) are passed into the decoding core, which then generates TEPs (e) in logistic weight order and applies them to ŷ. The resulting vector ŷ ⊕ e is then evaluated for codebook membership. If any of the TEPs (e) satisfies the codebook membership constraint, a 2D priority encoder and controller module forward the corresponding indices to the word generator module, where P multiplexers are


Fig. 4.17: Computing the likelihood value (M) for the candidate codeword ĉ3 = [1 0 0 1 0 1 1 0]

used to transform the sorted index values to their appropriate bit-flip locations. Please note that P refers to the maximum Hamming weight (HWmax) of the generated TEPs (e) supported by the ORBGRAND hardware. We refer the reader to Chapter 3 for more details pertaining to the ORBGRAND decoder hardware.

4.4.2 Developing LGRAND Hardware by Tweaking the ORBGRAND VLSI Architecture

This section explains how the VLSI design for ORBGRAND [3] can be modified to support the proposed LGRAND. The proposed LGRAND hardware, which expands on the ORBGRAND hardware and incorporates a Maximum Likelihood Computation Unit (MLCU) module, is shown in Fig. 4.12. In the proposed LGRAND VLSI architecture, the ORBGRAND decoder collaborates with the MLCU module to select the most likely codeword from a list (L) of candidate codewords. Figure 4.13 shows the micro-architecture of the proposed MLCU module. The MLCU takes the estimated codeword ĉ and the n × Q-bit y, where Q is the quantization width, as inputs and generates the (Q + log2 n)-bit likelihood value M as output. To compute the likelihood M ← Σ_{i=1}^{n} (−1)^{ĉ_i} y_i [14], the elements of the y vector (y_i, ∀ i ∈ [1, n]) are added using an adder tree with log2 n stages. Moreover, to facilitate signed addition and comparison, the SMto2C and 2CtoSM components convert from sign-magnitude to 2's-complement form and from 2's-complement to sign-magnitude representation, respectively. In the proposed LGRAND VLSI architecture, depicted in Fig. 4.12, the ORBGRAND decoder delivers the estimated codeword (ĉ) to the MLCU module, which calculates the likelihood value M ← Σ_{i=1}^{n} (−1)^{ĉ_i} y_i and returns it to the ORBGRAND decoder. Please note that the list L is not stored in a separate memory in the proposed LGRAND hardware; rather, the likelihood value M_curr ← Σ_{i=1}^{n} (−1)^{ĉ_i} y_i of an estimated codeword (ĉ) is computed by the MLCU as soon as that codeword is available at the ORBGRAND decoder.
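A behavioural sketch of the adder-tree reduction (n assumed to be a power of two; the SMto2C/2CtoSM conversions are omitted for brevity):

```python
def adder_tree(vals):
    """Model of the MLCU adder tree: pairwise reduction in log2(n) stages."""
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]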


The output estimated codeword (ĉ_final) is replaced with the currently estimated codeword if the likelihood value of the currently estimated codeword (M_curr) is higher than the likelihood value of the previous codeword (M_prev) in the LGRAND decoder. The decoding process is then completed after recovering the original message (û = ĉ_final · G⁻¹) from the output codeword (ĉ_final ← arg max_{ĉ∈L} Σ_{i=1}^{n} (−1)^{ĉ_i} y_i [14]).

Example 4.2: Selection of the Most Likely Codeword with LGRAND Let us look at an example where n = 8 and y = [−1, +10, +5, −2, +3, +6, −7, +4] to better understand how the ORBGRAND decoder and the MLCU module interact in the proposed LGRAND hardware depicted in Fig. 4.12. Suppose there are three candidate codewords in the list L, as shown in Fig. 4.14. The MLCU module generates the likelihood value M = −2, as displayed in Fig. 4.15, for the candidate codeword ĉ1 = [1 0 1 0 0 1 0 0]. Similarly, the likelihood values computed by the MLCU module for the candidate codewords ĉ2 = [0 0 1 1 0 0 1 1] and ĉ3 = [1 0 0 1 0 1 1 0] are +18 and +26, respectively, as shown in Figs. 4.16 and 4.17. The proposed LGRAND decoder selects ĉ3 = [1 0 0 1 0 1 1 0] as the final candidate codeword because it yields the highest likelihood value (M = +26) among all of the candidate codewords in the list L shown in Fig. 4.14.
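The likelihood metric is easy to verify in software; the following sketch reproduces the values in Example 4.2 (plain Python, not the adder-tree hardware):

```python
def likelihood(c_hat, y):
    """M = sum((-1)^c_i * y_i), the metric computed by the MLCU [14]."""
    return sum((-1) ** c * yi for c, yi in zip(c_hat, y))

y = [-1, +10, +5, -2, +3, +6, -7, +4]
candidates = [
    [1, 0, 1, 0, 0, 1, 0, 0],   # c1 -> M = -2
    [0, 0, 1, 1, 0, 0, 1, 1],   # c2 -> M = +18
    [1, 0, 0, 1, 0, 1, 1, 0],   # c3 -> M = +26 (selected)
]
print([likelihood(c, y) for c in candidates])           # -> [-2, 18, 26]
print(max(candidates, key=lambda c: likelihood(c, y)))  # -> c3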

4.4.3 Hardware Implementation Results for the Proposed LGRAND

The proposed LGRAND with parameters (LW ≤ 96, HW ≤ 8, δ ≤ 30) has been implemented in Verilog HDL and synthesized with Synopsys Design Compiler using general-purpose TSMC 65 nm CMOS technology. The hardware implementation is verified using test benches generated by the bit-true C model of the proposed LGRAND hardware. Furthermore, switching activity from real test vectors is extracted for the proposed LGRAND hardware to ensure accuracy in power measurements.

4.4.3.1 Comparison with ORBGRAND

The LGRAND implementation is compared with the ORBGRAND hardware (LW ≤ 96, HW ≤ 8), supporting code lengths n = 128/127 and code rates R (0.75 ≤ R ≤ 1), as shown in Table 4.1. Please note that the input channel LLRs (y) are quantized on 5 bits, for both the LGRAND and ORBGRAND designs, with 1 sign bit and 3 bits for the fractional part. A maximum clock frequency of 454 MHz can be supported by the hardware implementations of both ORBGRAND and LGRAND. Since we do not consider any pipelining techniques for the ORBGRAND decoder core, one clock cycle corresponds to one time step. As shown in Table 4.1, the proposed LGRAND hardware has a worst-case information throughput (W.C. T/P) of 0.5∼0.549 Mbps. However, with improved channel conditions, the average latency for both the ORBGRAND and LGRAND decoders is reduced to 1 clock cycle per decoded codeword, which translates to 2.2 ns. Please note that at least 100 error frames are captured at each Eb/N0 point, and the average latency is calculated using the bit-true C model of the proposed hardware. Figure 4.18 shows the average information throughput and decoding latency for channel codes of different


Fig. 4.18: Comparison of average latency and average information throughput for the ORBGRAND [3] VLSI architecture and the proposed LGRAND VLSI architecture for the CRC code (128, 112) and the BCH code (127, 113)

classes using the ORBGRAND and LGRAND hardware. As depicted in Fig. 4.18, the average information throughput increases with Eb/N0, reaching values of 47.27∼51.36 Gbps. As shown in Table 4.1, in comparison to the ORBGRAND implementation, the proposed LGRAND hardware has a 4.84% area overhead. Furthermore, the proposed LGRAND is 7.7∼8.2% less energy efficient than ORBGRAND.


Table 4.1: TSMC 65 nm CMOS synthesis comparison for LGRAND (LW ≤ 96, HW ≤ 8, δ ≤ 30) with ORBGRAND (LW ≤ 96, HW ≤ 8) for n = 128/127 and 0.75 ≤ R ≤ 1

Parameters                                LGRAND     ORBGRAND [3]
                                          (LW ≤ 96, HW ≤ 8, δ ≤ 30)   (LW ≤ 96, HW ≤ 8)
Technology (nm)                           65         65
Supply (V)                                0.9        0.9
Max. Frequency (MHz)                      454        454
Area (mm²)                                2.38       2.27
W.C. Latency (μs)                         205.76     205.76
Avg. Latency (ns)                         2.2 (a)    2.2 (a)
W.C. T/P (Mbps)        k = 104 (b)        0.505      0.505
                       k = 105 (c)        0.510      0.510
                       k = 106 (d)        0.515      0.515
                       k = 112 (e)        0.544      0.544
                       k = 113 (f)        0.549      0.549
Avg. T/P (Gbps)        k = 104 (b)        47.27      47.27
                       k = 105 (c)        47.72      47.72
                       k = 106 (d)        48.18      48.18
                       k = 112 (e)        50.90      50.90
                       k = 113 (f)        51.36      51.36
Power (mW)                                146.27     134.43
Energy per Bit         k = 104 (b)        3.09       2.84
(pJ/bit)               k = 105 (c)        3.06       2.81
                       k = 106 (d)        3.03       2.79
                       k = 112 (e)        2.87       2.64
                       k = 113 (f)        2.84       2.62
Area Efficiency        k = 104 (b)        19.86      20.82
(Gbps/mm²)             k = 105 (c)        20.05      21.02
                       k = 106 (d)        20.24      21.22
                       k = 112 (e)        21.38      22.42
                       k = 113 (f)        21.58      22.62
Code compatible                           Yes        Yes
Rate compatible                           Yes        Yes

(a) For Eb/N0 ≥ 8.5 dB (Fig. 4.18). (b) CRC code (128, 104). (c) Polar code (128, 105+11). (d) BCH code (127, 106). (e) CRC code (128, 112). (f) BCH code (127, 113).
Information Throughput (Gbps) = k / Decoding Latency (ns); Energy per Bit (pJ/bit) = Power (mW) / Avg. T/P (Gbps); Area Efficiency (Gbps/mm²) = Avg. T/P (Gbps) / Area (mm²).

4.4.3.2 Comparison with Fixed Latency ORBGRAND Decoder (F.L. ORBGRAND)

A Look-Up-Table (LUT) assisted F.L. ORBGRAND has recently been proposed in [15] and implemented in TSMC 7 nm Fin-FET technology. Table 4.2 compares the proposed LGRAND to the F.L. ORBGRAND hardware implementation [15].

Disparity in the Technology Nodes (65 vs 7 nm) Due to the vast discrepancy in the technology nodes employed (65 vs 7 nm), scaling is not used to compare LGRAND with F.L. ORBGRAND [15].


Table 4.2: TSMC CMOS synthesis comparison for LGRAND (LW ≤ 96, HW ≤ 8, δ ≤ 30) with the F.L. ORBGRAND [15] decoder for the 5G NR CA-polar code (128, 105+11)

Parameters                    LGRAND (LW ≤ 96, HW ≤ 8, δ ≤ 30)   F.L. ORBGRAND (a) [15] (Queriesmax = 2¹³)
Technology (nm)               65            7
Supply (V)                    0.9           0.5
Max. Frequency (MHz)          454           701
Area (mm²)                    2.38          3.70
W.C. Latency (ns)             205,764.3     58.49
Avg. Latency (ns)             2.2 (b)       58.49
W.C. T/P (Mbps)               0.51          73,610
Avg. T/P (Gbps)               47.7          73.61
Power (mW)                    146.27        170.84
Energy per Bit (pJ/bit)       3.06          2.32
Area Efficiency (Gbps/mm²)    20.04         19.89
Code compatible               Yes           Yes
Rate compatible               Yes           Yes

(a) For QLUT = 512, QS = 256, T = 34. (b) For LGRAND with parameters LWmax = 96, HWmax = 8 and δ = 20.

The F.L. ORBGRAND decoder [15] deploys T decoder pipeline stages and stores the n-bit error patterns in T − 2 pattern memories of size QS × n bits, where QS is the memory depth. Furthermore, F.L. ORBGRAND also employs QLUT × n-bit LUTs to store certain error patterns (for details we refer the reader to [15]). The highly pipelined VLSI architecture of the F.L. ORBGRAND decoder [15] enables it to decode the 5G NR CA-polar code (128, 105+11) with a fixed latency of 58.49 ns and achieve a maximum information throughput of 73.61 Gbps, whereas the proposed LGRAND hardware achieves an average information throughput of 47.7 Gbps for the same 5G CA-polar code (128, 105). Please note that the proposed LGRAND (LWmax = 96, HWmax = 8, δ = 20) outperforms the F.L. ORBGRAND (Queriesmax = 2¹³) decoder [15] by ∼0.5 dB at the target FER of 10⁻⁷, and achieves decoding performance similar to SGRAND, as depicted in Fig. 4.10.

4.5 Conclusion

In this chapter, we introduced List-GRAND (LGRAND), a technique for improving ORBGRAND decoding performance to approach ML decoding performance. The proposed LGRAND features parameters that can be tuned to meet the target decoding performance and complexity budget of a specific application; with appropriate parameters, LGRAND achieves decoding performance comparable to SGRAND. Numerical simulation results show that the proposed LGRAND achieves a 0.5∼0.75 dB decoding performance gain over ORBGRAND for channel codes of different classes (BCH, CA-Polar, and CRC) at a target FER of 10⁻⁷. The proposed LGRAND, like ORBGRAND, is well suited to parallel hardware implementation. Furthermore, the proposed LGRAND VLSI architecture can achieve an average information throughput of 47.27∼51.36 Gbps for linear block codes of length 127/128 and different code rates.


References

1. Solomon, A., Duffy, K. R., & Médard, M. (2020). Soft maximum likelihood decoding using GRAND. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC) (pp. 1–6).
2. Duffy, K. R. (2021). Ordered reliability bits guessing random additive noise decoding. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8268–8272).
3. Abbas, S., Tonnellier, T., Ercan, F., Jalaleddine, M., & Gross, W. (2022). High-throughput and energy-efficient VLSI architecture for ordered reliability bits GRAND. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 30, 681–693.
4. Hocquenghem, A. (1959). Codes correcteurs d'erreurs. Chiffres, 2, 147–156.
5. Bose, R., & Ray-Chaudhuri, D. (1960). On a class of error correcting binary group codes. Information and Control, 3, 68–79.
6. Peterson, W., & Brown, D. (1961). Cyclic codes for error detection. Proceedings of the IRE, 49, 228–235.
7. Arikan, E. (2009). Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55, 3051–3073.
8. Abbas, S., Jalaleddine, M., & Gross, W. (2023). List-GRAND: A practical way to achieve maximum likelihood decoding. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 31, 43–54.
9. Helmling, M., Scholl, S., Gensheimer, F., Dietz, T., Kraft, K., Ruzika, S., & Wehn, N. (2019). Database of Channel Codes and ML Simulation Results. www.uni-kl.de/channel-codes
10. Condo, C., Bioglio, V., & Land, I. (2021). High-performance low-complexity error pattern generation for ORBGRAND decoding. In 2021 IEEE Globecom Workshops (GC Wkshps) (pp. 1–6).
11. Tal, I., & Vardy, A. (2015). List decoding of polar codes. IEEE Transactions on Information Theory, 61, 2213–2226.
12. Balatsoukas-Stimming, A., Parizi, M., & Burg, A. (2015). LLR-based successive cancellation list decoding of polar codes. IEEE Transactions on Signal Processing, 63, 5165–5179.
13. Abbas, S., Tonnellier, T., Ercan, F., & Gross, W. (2020). High-throughput VLSI architecture for GRAND. In 2020 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 1–6).
14. Ye, M., & Abbe, E. (2020). Recursive projection-aggregation decoding of Reed-Muller codes. IEEE Transactions on Information Theory, 66, 4948–4965.
15. Condo, C. (2022). A fixed latency ORBGRAND decoder architecture with LUT-aided error-pattern scheduling. IEEE Transactions on Circuits and Systems I: Regular Papers, 69, 2203–2211.

Part III

Hardware Architectures for Specialized GRAND Variants

In this part, we look at the specialized GRAND variants that are developed for specific communication channels. The specialized GRAND variants described in this part are Fading-GRAND, which is designed for Rayleigh fading communication channels, and GRAND Markov Order (GRAND-MO), which is designed for communication channels with memory to mitigate burst noise. Furthermore, for each GRAND variant, a detailed VLSI hardware architecture is also discussed.

Chapter 5

Hardware Architecture for GRAND Markov Order (GRAND-MO)

Abstract This chapter introduces GRAND Markov Order (GRAND-MO), a hard-input variant of GRAND designed specifically for communication channels with memory that are prone to burst noise. In a traditional communication system, burst noise is generally mitigated by employing interleavers and de-interleavers at the expense of higher latency. GRAND-MO can be applied directly to hard demodulated channel signals, eliminating the need for additional interleavers and de-interleavers and resulting in a substantial reduction in the overall latency of a communication system. This chapter proposes a high-throughput GRAND-MO VLSI design that can achieve an average throughput of up to 52 Gbps for code length n = 128. Furthermore, the proposed GRAND-MO decoder implementation with a code length of n = 79 has a 33% lower worst-case latency and a 2 dB gain in decoding performance, at a target FER of 10−5, compared to the (79, 64) BCH code decoder.

5.1 GRAND Markov Order (GRAND-MO): Introduction

GRAND Markov Order (GRAND-MO) [1, 2] was recently proposed as a hard-input GRAND variant that can mitigate the effects of burst noise on communication channels with memory. The decoding performance of traditional channel code decoders degrades significantly as the noise burst length exceeds the error correction capability of the underlying code. Therefore, to minimize the impact of burst noise and make the communication channel appear memoryless, time-diversity techniques such as interleaving/de-interleaving have typically been employed. In a practical communication system, burst errors are dispersed across multiple codewords, and the depth of the interleaver/de-interleaver pair determines this dispersion. The deeper the interleavers (i.e., the larger their size), the greater the noise dispersion, and therefore the smaller the degradation in decoding performance for a traditional channel code decoder. On the other hand, the depth (size) of the interleavers introduces significant latency at both the transmitter and receiver ends of the communication system. For more information on the interleaver depth (and hence the latency) required to achieve a target decoding performance with a typical channel code decoder under various amounts of channel memory, we refer the reader to Section IV(C) of [1]. Communication latency introduced by interleavers/de-interleavers, or performance degradation caused by burst noise, is undesirable in emerging applications like Ultra-Reliable Low-Latency Communication (URLLC) [3, 4], where latency and reliability are critical. In communication channels with memory, GRAND-MO does not require interleavers/de-interleavers, enabling reliable communication even in the presence


of burst noise. GRAND-MO [1] uses noise correlations to its advantage and adapts its Test Error Pattern (TEP) (e) generation to reduce the effects of noise bursts on communication channels with memory. Therefore, GRAND-MO outperforms traditional channel code decoders on such communication channels with burst noise. In this chapter, we discuss the TEP (e) generation scheme employed by GRAND-MO [1] and propose a simplified TEP generation technique that reduces the computational complexity of GRAND-MO and results in a simpler hardware implementation. The proposed GRAND-MO TEP generation scheme introduces parameters that can be adjusted for various classes of channel codes to strike a balance between the decoding performance requirements and the complexity/latency budget of a target application. Furthermore, in this chapter, we present a high-throughput hardware architecture for GRAND-MO that can achieve an average information throughput of up to 52 Gbps for a linear block code with a code length of n = 128. Please note that this chapter follows up on our previously published works [5, 6].

5.1.1 Channel Model for GRAND-MO

We use a classic two-state Markov chain [7] to mimic a communication channel with memory characterized by burst noise. There are two states in the Markov channel under consideration: the good state G, which represents a noiseless channel, and the bad state B, which represents a noisy channel. Furthermore, the probability of transitioning from state G to state B is b, and the probability of transitioning from state B to state G is g. The length of consecutive errors (the size of a burst of errors/noise) introduced by the Markov channel follows a geometric distribution with a mean of 1/g and a variance of (1 − g)/g². Similarly, the length of an error-free run, when the channel does not introduce any errors, has a mean of 1/b. The stationary bit-flip probability for the Markov channel is p = b/(b + g) = Q(√(2R · Eb/N0)), where R is the code-rate. Please note that the Markov channel is equivalent to a memoryless Binary Symmetric Channel (BSC) for g = 1 − b, in which case p = b.

Algorithm 4: GRAND Markov Order
  Input: H, G⁻¹, ŷ, b, g, ⌈d/2⌉
  Output: û OR ABANDON
  1: e ← 0
  2: Δl ← ⌈log(b/g) / log((1 − g)/(1 − b))⌉
  3: while H · (ŷ ⊕ e)ᵀ ≠ 0 do
  4:   [e, m, l] ← generateNewMarkovErrorPattern(Δl)
  5:   if m = ⌈d/2⌉ AND l = ⌈d/2⌉ then
  6:     return ABANDON
  7: û ← (ŷ ⊕ e) · G⁻¹
  8: return û
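As a concrete illustration of the channel model above, the following minimal Python sketch (our own, not from [1, 2]; the function name is illustrative) draws an n-bit noise sequence from the two-state Markov chain:

```python
import random

def markov_noise(n, b, g, seed=None):
    """Draw an n-bit error vector from the two-state Markov (burst) channel.

    b: transition probability from good state G to bad state B.
    g: transition probability from bad state B back to good state G.
    A bit is flipped (1) whenever the channel is in the bad state, so the
    mean burst length is 1/g and the stationary flip probability b/(b + g).
    """
    rng = random.Random(seed)
    bad = rng.random() < b / (b + g)  # start in the stationary distribution
    e = []
    for _ in range(n):
        e.append(1 if bad else 0)
        # State transition for the next bit.
        bad = (rng.random() >= g) if bad else (rng.random() < b)
    return e

# Small g (long channel memory) produces visibly bursty error patterns.
print(markov_noise(32, b=0.05, g=0.3, seed=7))
```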


5.1.2 GRAND-MO Decoding of Linear Block Codes

Algorithm 4 presents the pseudo-code for decoding a linear (n, k) block code with GRAND-MO. The inputs to the algorithm are ŷ, b, g, and ⌈d/2⌉, where ŷ is the hard-demodulated received vector of size n and d is the minimum distance of the underlying linear (n, k) block code. Furthermore, GRAND-MO uses the k × n generator matrix G and the (n − k) × n parity check matrix H of the underlying code.
Line 1–2: The TEP e is initialized to 0, and Δl is computed from the transition probabilities b and g (Δl ← ⌈log(b/g) / log((1 − g)/(1 − b))⌉).

Line 3–4: With Δl as its input, the function generateNewMarkovErrorPattern iteratively generates TEPs (e). Each generated TEP has a Hamming weight of l and contains m bursts. The generated TEPs (e) are combined with the received vector ŷ, and the resulting vector ŷ ⊕ e is queried for codebook membership (H · (ŷ ⊕ e)ᵀ = 0).
Line 5: When both the number of bursts (m) and the Hamming weight (l) of the generated TEPs (e) equal ⌈d/2⌉, where d is the minimum distance of the underlying code, GRAND-MO terminates and returns ABANDON.
Line 7: If the resulting vector ŷ ⊕ e belongs to the codebook (H · (ŷ ⊕ e)ᵀ = 0), e is the guessed noise, ĉ = ŷ ⊕ e is the estimated codeword, and the message (û ← (ŷ ⊕ e) · G⁻¹) is recovered.

Generation of TEPs (e) for GRAND-MO
Please note that the TEPs (e) generated by the function generateNewMarkovErrorPattern either have the same number of bursts (m) and a higher Hamming weight (l) than the previously generated TEPs, or the same Hamming weight (l) and a higher number of bursts (m). We refer the reader to Section III of [1] for more information on GRAND-MO TEP generation.
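To make the control flow of Algorithm 4 concrete, here is a minimal Python sketch of the GRAND-MO decoding loop over GF(2) (ours, behavioral only); generate-style tep_generator stands in for the Markov-query-order TEP generator of [1] and is assumed to yield (e, m, l) tuples in decreasing likelihood order:

```python
import numpy as np

def grand_mo(H, G_inv, y_hat, tep_generator, d):
    """GRAND-MO decoding loop (sketch of Algorithm 4).

    H: (n-k) x n parity-check matrix, G_inv: n x k matrix recovering the
    message from a codeword, y_hat: hard-demodulated length-n 0/1 vector.
    """
    n = H.shape[1]
    e = np.zeros(n, dtype=int)
    limit = -(-d // 2)                          # ceil(d/2)
    while ((H @ (y_hat ^ e)) % 2).any():        # codebook membership check
        try:
            e, m, l = next(tep_generator)       # next Markov-order TEP
        except StopIteration:
            return None                         # no more TEPs: give up
        if m == limit and l == limit:
            return None                         # ABANDON
    return ((y_hat ^ e) @ G_inv) % 2            # recovered message u_hat
```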

5.2 Decoding Performance Evaluation for GRAND-MO

In this section, we evaluate the decoding performance (Frame Error Rate (FER) performance) of GRAND-MO for channel codes of different classes, including Bose-Chaudhuri-Hocquenghem (BCH) codes [8, 9], Random Linear Codes (RLCs) [10, 11], and Cyclic Redundancy Check (CRC) codes [12]. Figures 5.1 and 5.2 compare the FER performance of GRAND-MO decoding of BCH code (127, 106) and BCH code (127, 113) to that of traditional Berlekamp-Massey (B-M) decoding [13, 14]. The GRAND-MO decoder receives hard decision values (ŷ) from the demodulator. Please note that the Markov channel's transition probability (g) and the channel memory are inversely correlated: as channel memory increases (indicated by a decreasing value of g), more burst noise is observed on the channel. Since interleavers/de-interleavers are absent to disperse these noise bursts across multiple codewords, the decoding performance of the B-M decoder deteriorates significantly as channel memory increases. On the other hand, GRAND-MO's decoding performance improves with channel memory because of its unique approach to generating TEPs, which relies on noise correlations to guess the noise bursts in the channel. Other classes of channel codes, including CRC codes [12] and RLCs [10, 11], exhibit similar trends in decoding performance. The FER performance for GRAND-MO and GRANDAB decoding of RLCs and CRC codes of length n = 128 and various code-rates is plotted in Figs. 5.3, 5.4 and Figs. 5.5, 5.6, respectively.


Fig. 5.1: Comparison of the GRAND-MO and BCH Berlekamp-Massey (B-M) decoding performance for BCH code (127, 106) in Markov channels

Fig. 5.2: Comparison of the GRAND-MO and BCH Berlekamp-Massey (B-M) decoding performance for BCH code (127, 113) in Markov channels


Fig. 5.3: Comparison of the decoding performance of GRAND-MO and GRANDAB (AB = 3) with RLC (128, 104) in Markov channels

Fig. 5.4: Comparison of the decoding performance of GRAND-MO and GRANDAB (AB = 2) with RLC (128, 112) in Markov channels


Fig. 5.5: Comparison of the decoding performance of GRAND-MO and GRANDAB (AB = 3) with CRC (128, 104) in Markov channels

Fig. 5.6: Comparison of the decoding performance of GRAND-MO and GRANDAB (AB = 2) with CRC (128, 112) in Markov channels


We can observe from the numerical simulation results presented in Figs. 5.1–5.6 that the GRAND-MO decoder outperforms the GRANDAB decoder in terms of FER performance as channel memory is increased (decreasing value of g).


Fig. 5.7: Test error pattern generation for GRAND-MO for n = 6 and Δl = 2. (a) Markov query order [1, 2]. (b) Proposed re-arranged query order. (c) Proposed query order with parameters (m = 1, l1 = 6). (d) Proposed query order with parameters (m = 2, l1 = 4 and l2 = 3)

5.3 Analyzing Test Error Patterns (TEPs) for GRAND-MO

This section explores the TEP generation scheme of GRAND-MO [1, 2]. Despite the absence of interleavers/de-interleavers to mitigate burst noise, GRAND-MO outperforms traditional channel code decoders because it leverages the noise correlation of the underlying Markov channel to adapt its TEP generation to the burstiness of the noise (channel memory).

Reminder
In Fig. 5.7, the x-axis indicates the TEP (e) number, and the y-axis indicates the bit positions (i, ∀ i ∈ [1, 6]) of the TEP, where a dot denotes a 1 and the absence of a dot denotes a 0. The 10th TEP in Fig. 5.7a, which is highlighted by a red rectangle, is denoted as e = [0 0 1 1 0 0]. Alternatively, we could say that each column in Fig. 5.7 represents a TEP (e) and that each dot represents a location where a bit is flipped in the received vector ŷ. For instance, when the received vector ŷ is combined with the TEP (ŷ ⊕ e) shown by the red rectangle in Fig. 5.7a, the third and fourth bits of ŷ will be flipped.


Fig. 5.8: Test error pattern generation for GRANDAB [15]

5.3.1 TEP Generation for Baseline GRAND-MO (Markov Query Order)

The Markov query order proposed in [1, 2] is used by the GRAND-MO decoder to generate TEPs (e). The Markov query order for a code length of n = 6 and Δl = 2 is shown in Fig. 5.7a. Please refer to Algorithm 1 in [1] for further details on the Markov query order TEP generation scheme.

Recall: GRANDAB Test Error Patterns
Before describing our proposed simplified TEP generation scheme for GRAND-MO, we briefly discuss TEP generation for the GRANDAB decoder [15], which is a hard-decision-input variant of GRAND. GRANDAB generates TEPs in increasing Hamming weight order (1 ≤ HammingWeight(e) ≤ AB). Figure 5.8 displays the TEPs generated for n = 6 and AB = 3 in increasing Hamming weight order: TEPs (e) with Hamming weight 1 are generated first, followed by TEPs (e) with Hamming weights 2 and 3.
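For reference, the Hamming-weight query order of Fig. 5.8 is straightforward to express in Python (our illustration, not the hardware schedule); for n = 6 and AB = 3 it produces C(6,1) + C(6,2) + C(6,3) = 41 TEPs:

```python
from itertools import combinations

def grandab_teps(n, ab):
    """Yield TEPs (as 0/1 lists) in increasing Hamming weight order."""
    for hw in range(1, ab + 1):                   # weight 1, then 2, ..., AB
        for flips in combinations(range(n), hw):  # all index sets of size hw
            e = [0] * n
            for i in flips:
                e[i] = 1
            yield e

print(sum(1 for _ in grandab_teps(6, 3)))         # -> 41
```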

5.3.2 Simplifying TEP Generation Scheme for GRAND-MO

In this section, we propose a simplified TEP (e) generation strategy for GRAND-MO that simplifies the hardware implementation and reduces the worst-case computational complexity, which corresponds to the maximum number of TEPs required (codebook membership queries).

Complexity and Parameters
The parameters Δl, 1/g, and Eb/N0 influence the maximum number of TEPs (and hence the worst-case complexity) of GRAND-MO decoding [1, 2]. For the GRAND-MO decoder, we propose to generate TEPs (e) with an increasing number of bursts (m) and increasing burst size (lm). The proposed GRAND-MO TEP generation scheme is shown in Fig. 5.7b, where all TEPs corresponding to a single noise burst (m = 1) are generated first (TEPs 1 through 21 in Fig. 5.7b), then TEPs with two bursts (m = 2) (TEPs 22 through 55 in Fig. 5.7b), and finally TEPs with three bursts (m = 3) (TEPs 56 through 60 in Fig. 5.7b).


In order to reduce the worst-case complexity (maximum number of TEPs), we also propose restricting the number of bursts m (∀ m ∈ [1, mmax]) and the burst sizes lm of the generated TEPs (e). Please note that when the number of bursts is m, the maximum burst size is lm. Figure 5.7c and d show the TEPs (e) generated by our proposed query order with different m and lm. The TEPs for a single noise burst (m = 1) are shown in Fig. 5.7c (TEPs 1 through 21 in Fig. 5.7c), with a maximum burst size of 6 (l1 = 6). The TEPs for two bursts (m = 2) are similarly shown in Fig. 5.7d, where the burst sizes are limited to 4 (l1 = 4) and 3 (l2 = 3), respectively. The proposed query order with parameters m = 1 and l1 = 6 (Fig. 5.7c) reduces the worst-case complexity from the 60 TEPs of the Markov query order (Fig. 5.7a) to 21 TEPs. Similarly, the proposed query order with the parameters m = 2, l1 = 4, and l2 = 3 reduces the worst-case complexity to 51 TEPs (Fig. 5.7d).
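The proposed burst-limited enumeration can likewise be prototyped in Python. The sketch below is our own: the exact intra-group ordering and any Δl-dependent pruning of [1] are omitted, so the two-burst count can differ slightly from the figures, but the single-burst count matches the 21 TEPs of Fig. 5.7c:

```python
def one_burst_teps(n, l1_max):
    """Yield every TEP containing a single noise burst of length <= l1_max."""
    for length in range(1, l1_max + 1):
        for start in range(n - length + 1):
            e = [0] * n
            e[start:start + length] = [1] * length
            yield e

def two_burst_teps(n, l1_max, l2_max):
    """Yield TEPs with two bursts (sizes <= l1_max and <= l2_max), gap >= 1."""
    for la in range(1, l1_max + 1):
        for lb in range(1, l2_max + 1):
            for sa in range(n - la + 1):
                for sb in range(sa + la + 1, n - lb + 1):
                    e = [0] * n
                    e[sa:sa + la] = [1] * la
                    e[sb:sb + lb] = [1] * lb
                    yield e

assert sum(1 for _ in one_burst_teps(6, 6)) == 21   # TEPs 1-21 in Fig. 5.7c
```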

Fig. 5.9: Comparison of the GRAND-MO (g = 0.2) decoding performance using BCH code (127, 106) with Markov query order [1] and proposed query order with parameters (m, l1 , l2 )

5.3.3 Analyzing Parameters (m, lm) of the Proposed GRAND-MO TEP Generation

This section explores the impact of varying the parameters (m and lm) of the proposed GRAND-MO TEP generation scheme (proposed query order) on various classes of channel codes (i.e., BCH codes, RLCs, and CRC codes). These parameters (m and lm) impact the worst-case computational complexity (maximum number of codebook membership queries) of the GRAND-MO decoder as well as its decoding performance. The FER performance for GRAND-MO decoding, with the proposed query order (m = 2, l1, l2), of BCH code (127, 106) is shown in Figs. 5.9 and 5.10 for different values of channel memory (noise burstiness) in the Markov channel. As shown in Fig. 5.9, GRAND-MO with the proposed query order (m = 2, l1 = 32, l2 = 8) results in a 0.4 dB degradation, at a target FER of 10−5, when compared to the Markov query order (baseline GRAND-MO TEP generation [1]) for decoding BCH code (127, 106). However, the maximum number of codebook


Fig. 5.10: Comparison of the GRAND-MO (g = 0.4 and g = 0.8) decoding performance using BCH code (127, 106) with Markov query order [1] and proposed query order with parameters (m, l1 , l2 )

Fig. 5.11: GRAND-MO decoding performance using BCH code (127, 106), at Eb/N0 = 8 dB, with proposed query order with parameters (m, l1, l2) and g = 0.2

membership queries (worst-case complexity) is reduced from the 3,530,504 queries required by the Markov query order [1] to 487,818 queries at Eb/N0 = 8 dB. Similarly, the proposed query order with the parameters (m = 2, l1 = 32, l2 = 16) reduces the maximum number of codebook membership queries (worst-case complexity) to 1,788,530 queries. Therefore, it can be inferred from the foregoing explanation that the parameters of the proposed query order (m, lm) can be tweaked for the target decoding performance and complexity budget of a specific application. For a specific channel code (i.e., BCH codes, RLCs, and CRC codes), the parameters of the proposed query order (m, lm) can be appropriately chosen by evaluating the decoding performance of GRAND-MO (m, lm) and choosing the set of parameters (m, lm) that provides the best decoding performance at a specific Eb/N0.


Fig. 5.12: GRAND-MO decoding performance using BCH code (127, 106), at Eb/N0 = 8 dB, with proposed query order with parameters (m, l1, l2) and g = 0.4

Fig. 5.13: GRAND-MO decoding performance using BCH code (127, 106), at Eb/N0 = 8 dB, with proposed query order with parameters (m, l1, l2) and g = 0.8

The decoding performance of GRAND-MO, using the proposed query order (m = 2, l1, l2), on the BCH (127, 106) code is shown in Figs. 5.11, 5.12 and 5.13 at Eb/N0 = 8 dB and with varying amounts of channel memory (different values of g). The parameters (m, lm) capable of minimizing the FER, at a constant Eb/N0, are the parameters that achieve the optimum decoding performance. The GRAND-MO decoder employing the proposed query order with parameters (m = 2, l1 = 32, l2 = 16) outperforms all other sets of parameters (m = 2, l1, l2) for g = 0.2, as shown in Fig. 5.11. Similarly, for g = 0.4 and g = 0.8, the proposed query orders with the parameters (m = 2, l1 = 32, l2 = 10) and (m = 2, l1 = 16, l2 = 4), respectively, outperform other sets of parameters (m = 2, l1, l2) in decoding performance, as shown in Figs. 5.12 and 5.13.


Fig. 5.14: Comparison of the GRANDAB (AB = 2) and GRAND-MO decoding performance using RLC (128, 112) with Markov query order and proposed query order (g, m = 1, l1 )

Fig. 5.15: Comparison of the GRANDAB (AB = 2) and GRAND-MO decoding performance using CRC code (128, 112) with Markov query order and proposed query order (g, m = 1, l1 )

Furthermore, as shown in Figs. 5.14, 5.15, 5.16 and 5.17, the parameters of the proposed query order (m, li ∀ i ∈ [1, m]) can be chosen to match the FER performance of the Markov query order (baseline GRAND-MO [1]) for various classes of channel codes (BCH codes, RLCs, and CRC codes) of different lengths (n) and rates R = k/n.

5.4 VLSI Architecture for GRAND Markov Order (GRAND-MO)

The proposed GRAND-MO VLSI architecture is based on the VLSI architecture of the GRANDAB decoder [16], which uses an n × (n − k)-bit shift register to store the TEP (e) syndromes with a Hamming weight of 1 (si = H · 1ᵢᵀ, with i ∈ [1, n]).


Fig. 5.16: Comparison of the B-M Decoder and GRAND-MO decoding performance using BCH code (127, 113) with Markov query order and proposed query order (g, m = 1, l1 )

Fig. 5.17: Comparison of the GRANDAB (AB = 3) and GRAND-MO decoding performance using RLC (128, 104) with Markov query order and proposed query order (g, m, li ∀ i ∈ [1, m])

Each row (si) of the shift register is combined with the syndrome of the received vector (sc = H · ŷᵀ) to compute the n test syndromes (si ⊕ sc, with i ∈ [1, n]) in a single time step. Each of the n test syndromes is NOR-reduced to feed an n-to-log2(n) priority encoder, as shown in Fig. 5.18. If all of the TEP syndrome bits are 0, the output of the NOR-reduce is 1, indicating that the codebook membership criterion for the corresponding TEP (e) has been satisfied. Furthermore, the GRANDAB VLSI architecture [16] uses the linearity property of the underlying code to combine l syndromes corresponding to TEPs with a Hamming weight of 1 (si, ∀ i ∈ [1, l]) and generate the syndrome corresponding to a TEP with a Hamming weight of l (s1,2,...,l = H·1₁ᵀ ⊕ H·1₂ᵀ ⊕ . . . ⊕ H·1ₗᵀ). We direct the reader to [16] for a full description of the strategy used to evaluate (for codebook membership) the TEPs (e) with Hamming weights > 1.
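The identity underlying this datapath, H · (ŷ ⊕ e)ᵀ = sc ⊕ se, is easy to model in numpy; the following is a behavioral sketch of the weight-1 case (our own, not the parallel hardware itself):

```python
import numpy as np

def check_weight_one_teps(H, y_hat):
    """Test all n weight-1 TEPs at once via precomputed syndromes.

    Column i of H is exactly s_i = H * 1_i, so s_i XOR s_c for every i can
    be formed in one shot; an all-zero column (the NOR-reduce in hardware)
    marks a TEP that satisfies codebook membership.
    """
    s_c = (H @ y_hat) % 2                      # syndrome of received vector
    tests = (H + s_c[:, None]) % 2             # s_i XOR s_c for every i
    hits = np.flatnonzero(~tests.any(axis=0))  # all-zero test syndromes
    return int(hits[0]) if hits.size else None
```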


Fig. 5.18: VLSI architecture for checking error patterns with Hamming weight of 1 (si = H · 1ᵢᵀ, i ∈ [1, n])

(Block diagram modules: H memory, decoder core, word generator, controller)

Fig. 5.19: The proposed VLSI architecture for GRAND Markov order (GRAND-MO) [5, 6]


Fig. 5.20: Checking test error patterns corresponding to a noise burst of length 1 ≤ l ≤ n, where s1,2,...,l = H·1₁ᵀ ⊕ H·1₂ᵀ ⊕ . . . ⊕ H·1ₗᵀ


Fig. 5.21: TEP syndromes associated with a noise burst of length 1 ≤ l ≤ n (s1,2,...,l = H·1₁ᵀ ⊕ H·1₂ᵀ ⊕ . . . ⊕ H·1ₗᵀ)

5.4.1 Proposed GRAND-MO VLSI Architecture

Figure 5.19 depicts the proposed GRAND-MO VLSI architecture, which can be used to decode any linear block code of length n. For clarity, the clock and control signals are not shown in Fig. 5.19. The proposed GRAND-MO VLSI architecture receives a hard-decided vector ŷ of channel observation values and outputs the estimated word û. To support various classes of channel codes with various code rates, any parity check matrix H can be loaded into the (n − k) × n-bit H memory. In the initial phase of decoding, the input vector ŷ is evaluated for codebook membership; if the codebook membership criterion is satisfied (H · ŷᵀ = 0), the decoding is considered successful. Otherwise, the decoder core generates the TEPs e according to the proposed GRAND-MO query order (GRAND-MO TEP generation) and applies them to ŷ. The resulting vector ŷ ⊕ e is then evaluated for codebook membership. If any of the generated TEPs (e) satisfies the codebook membership constraint (H · (ŷ ⊕ e)ᵀ = 0), the controller module transmits the corresponding index values to the word generator module, which converts these index values to the correct bit-flip locations in ŷ, and the message (û) is recovered.

5.4.2 Microarchitecture for Checking TEPs with Noise Burst of Length l (1 ≤ l ≤ n)

The GRANDAB VLSI architecture [16] can only generate TEPs (e) with Hamming weights l ≤ 3; hence, further enhancements are needed to enable the generation of TEPs with noise bursts of length up to l = n (s1,2,...,l = H·1₁ᵀ ⊕ H·1₂ᵀ ⊕ . . . ⊕ H·1ₗᵀ). The contents of the n × (n − k)-bit shift register and the corresponding peripheral circuitry of the proposed GRAND-MO VLSI architecture are displayed in Fig. 5.20. The TEPs corresponding to a noise burst of length l (1 ≤ l ≤ n) are evaluated using this configuration of the shift register and XOR gates. Please note that each row of the shift register stores a syndrome associated with a specific noise burst, with the lth row (∀ l ∈ [1, n]) storing the syndrome associated with a noise burst of length l (s1,2,...,l = H·1₁ᵀ ⊕ H·1₂ᵀ ⊕ . . . ⊕ H·1ₗᵀ). Figure 5.21 displays the TEP syndromes, associated with a noise burst of length l, computed with the proposed VLSI architecture of Fig. 5.20.


(a) Checking all test error patterns corresponding to a single noise burst of size ≤ 4 in one time-step. (b) Error patterns corresponding to a single noise burst of size ≤ 4 (red rectangle).

Fig. 5.22: Checking test error patterns for GRAND-MO decoding using the proposed query order with parameters n = 6, m = 2, l1 = 4 and l2 = 3

Fig. 5.23: TEP syndromes associated with a single noise burst of size 1 ≤ l ≤ 4


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s1 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s1 (red rectangle).

Fig. 5.24: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 2

5.4.3 Proposed TEP (e) Scheduling for GRAND-MO

This section explains the evaluation of the TEPs (e) using the proposed GRAND-MO VLSI architecture and the proposed query order. The following steps specify the contents of the shift register and the configuration of the XOR gates for evaluating the TEPs (e) of the proposed query order with the parameters n = 6, m = 2, l1 = 4, and l2 = 3 (Fig. 5.7d).

Step 1: Evaluating TEPs with a Single Burst of Length l at Time Step 1
The contents of the shift register and the configuration of the XOR gates required to verify the codebook membership of TEPs with a single burst of length l (the highlighted TEPs (e) in Fig. 5.22b) are shown in Fig. 5.22a. For the sake of clarity, the priority encoder and its related signals are not shown in Fig. 5.22a. All the TEPs (e) corresponding to a single noise burst (m = 1) of size l ≤ 4 (TEPs 1 to 18 in Fig. 5.22b) are evaluated (for codebook membership) in a single time step due to the specific configuration of the shift register and the XOR gates shown in Fig. 5.22a.


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s2 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s2 (red rectangle).

Fig. 5.25: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 3

Computing the TEP Syndromes Corresponding to a Single Burst
The TEP syndromes associated with a single noise burst of length 1 ≤ l ≤ 4 are shown in Fig. 5.23. Please note that the last connection, shown in Fig. 5.23 with a red highlight, computes the syndrome sc ⊕ s6, which is the syndrome associated with a noise burst of length 1. The computation of sc ⊕ s6 can be explained as follows:
Since s1,2,3,4,5,6 = s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5 ⊕ s6 and sc ⊕ s1,2,3,4,5 = sc ⊕ s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5,
therefore s1,2,3,4,5,6 ⊕ (sc ⊕ s1,2,3,4,5) = s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5 ⊕ s6 ⊕ sc ⊕ s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5 = sc ⊕ s6,
because sa ⊕ sa = 0 and sa ⊕ 0 = sa.


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s3 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s3 (red rectangle).

Fig. 5.26: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 4

Highly Parallelized VLSI Architecture
It should be noted that the proposed GRAND-MO VLSI architecture is highly parallelized and requires only 2 time-steps to check all of the TEPs corresponding to a single noise burst (n, m = 1, l1). The computation of sc takes place in the first time step, and all of the TEPs associated with a single noise burst are evaluated in the second time step.

Step 2: Evaluating TEPs with Two Bursts and l1 = 1 at Time Step 2
To evaluate the TEPs corresponding to m > 1 (when the number of noise bursts in a TEP (e) exceeds 1), a controller is used in conjunction with the shift register. Figure 5.24 shows the contents of the shift register used to evaluate TEPs for m = 2 and |l1| = 1, as well as the syndrome output by the controller, which is denoted as scomp. Please note that |li| is the size of the ith burst (∀ |li| ∈ [1, lm]). The controller generates scomp = sc ⊕ s1, the shift register is shifted up two positions, and, as a result, all TEPs with scomp = sc ⊕ s1 are evaluated in a single time step. The TEPs corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s1 are highlighted by the red rectangle in Fig. 5.24b. Please note that 0 corresponds to


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s4 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s4 (red rectangle).

Fig. 5.27: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 5

a disabled connection, which means that the respective elements of the shift register do not take part in the final computations. Furthermore, there is an extra syndrome sc ⊕ s1 ⊕ s3 ⊕ s4 ⊕ s5 ⊕ s6 (highlighted in green in Fig. 5.24) that can be ignored or disabled in hardware.

Step 3: Evaluating TEPs with Two Bursts and l1 = 1 at Time Step 3
The controller module generates scomp = sc ⊕ s2, and the shift register is shifted up by one position at the next time step, as shown in Fig. 5.25. We can evaluate all of the TEPs associated with m = 2 and |l1| = 1 with scomp = sc ⊕ s2 (highlighted by the red rectangle in Fig. 5.25b) in a single time step due to this specific configuration of the shift register and the XOR gates (Fig. 5.25).


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 2 with scomp = sc ⊕ s1 ⊕ s2 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 2 with scomp = sc ⊕ s1 ⊕ s2 (red rectangle).

Fig. 5.28: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 6

Step 4: Evaluating TEPs with Two Bursts and l1 = 1 at Time Step 4
At the following time step, the controller generates scomp = sc ⊕ s3, and all the TEPs corresponding to scomp = sc ⊕ s3 (highlighted by the red rectangle in Fig. 5.26b) are evaluated, for codebook membership, in a single time step, as shown in Fig. 5.26.

Step 5: Evaluating TEPs with Two Bursts and l1 = 1 at Time Step 5
In a similar fashion, the controller outputs scomp = sc ⊕ s4 at the subsequent time step, and Fig. 5.27 shows the evaluation of the TEPs corresponding to m = 2 and |l1| = 1 with scomp = sc ⊕ s4 (highlighted by the red rectangle in Fig. 5.27b).


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 2 with scomp = sc ⊕ s2 ⊕ s3 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 2 with scomp = sc ⊕ s2 ⊕ s3 (red rectangle).

Fig. 5.29: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 7

Checking all TEPs Corresponding to m = 2 and |l1| = 1
We can deduce from the previous discussion that, for a code length of n, it takes n − 2 time steps to evaluate all TEPs (TEPs 19 through 37 in Fig. 5.25) corresponding to m = 2 and |l1| = 1, where the shift register is shifted up by one position per time step.

Step 6: Evaluating TEPs with Two Bursts and l1 = 2 at Time Step 6
At the following time step, the shift register is reset and shifted up by 3 positions to evaluate the TEPs corresponding to the parameters m = 2 and |l1| = 2 (highlighted by the red rectangle in Fig. 5.28b), with the controller generating scomp = sc ⊕ s1 ⊕ s2, as shown in Fig. 5.28.

Step 7: Evaluating TEPs with Two Bursts and l1 = 2 at Time Step 7
At the next time step, the controller outputs scomp = sc ⊕ s2 ⊕ s3, and Fig. 5.29 shows the checking of the TEPs corresponding to scomp = sc ⊕ s2 ⊕ s3 (highlighted by the red rectangle in Fig. 5.29b).


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 2 with scomp = sc ⊕ s3 ⊕ s4 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 2 with scomp = sc ⊕ s3 ⊕ s4 (red rectangle).

Fig. 5.30: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 8

Step 8: Evaluating TEPs with Two Bursts and l1 = 2 at Time Step 8
Similarly, at the following time step, the controller outputs scomp = sc ⊕ s3 ⊕ s4, and Fig. 5.30 shows the checking of the TEPs corresponding to scomp = sc ⊕ s3 ⊕ s4 (highlighted by the red rectangle in Fig. 5.30b).

Evaluating all TEPs Corresponding to m = 2 and |l1| = 2
It can therefore be deduced from the foregoing description that n − 3 time steps are required to evaluate all the TEPs corresponding to m = 2 and |l1| = 2 (TEPs 38 to 47 in Fig. 5.28), with the shift register being shifted up by one position in each time step.

Step 9: Evaluating all TEPs Corresponding to m = 2 and l1 = 3
Similarly, n − 4 time steps are required to check the TEPs corresponding to m = 2 and |l1| = 3 (TEPs 49–51 in Fig. 5.25), as illustrated in Fig. 5.31 (corresponding to scomp = sc ⊕ s1 ⊕ s2 ⊕ s3) and Fig. 5.32 (corresponding to scomp = sc ⊕ s2 ⊕ s3 ⊕ s4).


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 3 with scomp = sc ⊕ s1 ⊕ s2 ⊕ s3 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 3 with scomp = sc ⊕ s1 ⊕ s2 ⊕ s3 (red rectangle).

Fig. 5.31: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 9

Time-Steps Required by the Proposed GRAND-MO VLSI Architecture
Based on the preceding discussion, the number of time steps required to evaluate the TEPs corresponding to parameters m = 2 and L = min(l1, l2) with the proposed GRAND-MO VLSI architecture is (n − 2) + (n − 3) + . . . + (n − (L + 1)), which can be expressed as L × n − (2 + 3 + . . . + (L + 1)). Since 1 + 2 + . . . + n = n × (n + 1)/2, this equals L × n − (L + 1) × (L + 2)/2 + 1, which with some algebraic manipulation simplifies to L × (2 × n − L − 3)/2. In summary, adding the two initial time steps (computing sc and checking all single-burst TEPs), the number of time steps required to check all the TEPs corresponding to the proposed query order (GRAND-MO TEP generation) with parameters (n, m = 2, L = min(l1, l2)) is given as:

L × ((2 × n − L − 3)/2) + 2.    (5.1)
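Equation (5.1) is easy to sanity-check numerically; the small helper below (names are ours) reproduces the worst-case cycle count quoted in Sect. 5.4.4 for the (n = 128, m = 2, l1 ≤ 32, l2 ≤ 32) configuration:

```python
def grand_mo_time_steps(n, l1, l2):
    """Total time steps per (5.1) for the proposed m = 2 architecture."""
    L = min(l1, l2)
    return L * (2 * n - L - 3) // 2 + 2   # L(2n - L - 3)/2 + 2

assert grand_mo_time_steps(128, 32, 32) == 3538  # W.C. cycles in Table 5.2
print(grand_mo_time_steps(6, 4, 3))              # running example: 11 steps
```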


(a) Checking all test error patterns corresponding to m = 2 and |l1| = 3 with scomp = sc ⊕ s2 ⊕ s3 ⊕ s4 and n = 6. (b) Error patterns corresponding to m = 2 and |l1| = 3 with scomp = sc ⊕ s2 ⊕ s3 ⊕ s4 (red rectangle).

Fig. 5.32: Checking test error patterns corresponding to proposed query order for GRAND-MO at time step 10

5.4.4 GRAND-MO Hardware Implementation Results

The proposed GRAND-MO VLSI architecture (n, m, li ∀ i ∈ [1, m]) has been implemented in Verilog HDL and synthesized with Synopsys Design Compiler using general-purpose TSMC 65 nm CMOS technology. The implemented GRAND-MO hardware supports the following two parametric configurations:
1. A GRAND-MO VLSI design that supports TEPs (e) with m = 2 noise bursts, with burst sizes l1 ≤ 32 and l2 ≤ 32.
2. A GRAND-MO VLSI design that supports TEPs (e) with a single noise burst (m = 1) with a burst size of l1 ≤ 128.
The GRAND-MO VLSI designs have been verified using test benches generated via the bit-true C model of the proposed GRAND-MO hardware. The ASIC synthesis results for the proposed GRAND-MO decoder, which supports codes of length n = 128 and code-rates 0.75 ≤ R ≤ 1, are summarized in Table 5.2. Please note that, for all of the VLSI designs shown in Table 5.2, switching activity from real test vectors is extracted to ensure accuracy in power measurements. The maximum frequencies supported by the GRAND-MO implementations with the parameters (m = 2, l1 ≤ 32, l2 ≤ 32) and (m = 1, l1 ≤ 128) are 500 MHz and 400 MHz, respectively.


Fig. 5.33: Comparison of average latency and average information throughput for the GRANDAB (AB = 3) [16] hardware and the proposed GRAND-MO hardware with parameters (g, m = 2, l1, l2) for the (128, 104) RLC code

Fig. 5.34: A comparison of GRAND-MO and PGZ decoding performance for BCH code (79, 64) in Markov channels (g = 0.4)


No pipelining strategy is employed in the GRAND-MO hardware implementations; hence, one clock cycle corresponds to one time-step. The average latency of the proposed GRAND-MO hardware for RLC (128, 104) with various parameters is shown in Fig. 5.33a, where it is compared with the average latency of the GRANDAB hardware [16]. For both the proposed GRAND-MO and GRANDAB hardware, the average latency is determined using the bit-true C model after considering at least 100 frames in error for each Eb/N0 value. Regardless of the parameters, we can observe that the average latency reduces as the channel conditions improve, eventually reaching 1 clock cycle per decoded codeword, at which point an average information throughput of 52 Gbps is achieved (Fig. 5.33b).

Table 5.1: TSMC 65 nm CMOS synthesis comparison for BCH decoder with GRAND-MO (m = 1, l1 ≤ 16) for n = 79

                                GRAND-MO     (79, 64) BCH decoder [17]
Technology (nm)                 65           65
Supply (V)                      1.1          1.2
Max. frequency (GHz)            1            N/A
Area (μm²)                      225,964      3264
W.C. latency (ns)               2            3
Avg. latency (ns)               1            1.1
W.C. T/P (Gbps)ᵃ                32           21.3
Avg. T/P (Gbps)ᵃ                64           58.2
Power (mW)                      39.9         1.29
Energy per bit (pJ/bit)         0.62         0.022
Area efficiency (Gbps/mm²)      283          17,830
Code compatible                 Yes          No
Rate compatible                 Yes          No

ᵃ For k = 64

Table 5.2: TSMC 65 nm CMOS synthesis comparison for GRANDAB with GRAND-MO for n = 128

                                GRANDAB [16]   GRAND-MO                   GRAND-MO
Parameters                      AB = 3         m = 2, l1 ≤ 32, l2 ≤ 32    m = 1, l1 ≤ 128
Technology (nm)                 65             65                         65
Supply (V)                      0.9            0.9                        0.9
Max. frequency (MHz)            500            500                        400
Area (mm²)                      0.25           0.71                       1.3
W.C. latency (ns)               8196           7076                       5
Avg. latency (ns)               2              2                          2.5
W.C. T/P (Mbps)ᵃ                12.68          14.69                      20,800
Avg. T/P (Gbps)ᵃ                52             52                         41.6
Power (mW)                      41             113                        65.97
Energy per bit (pJ/bit)         0.788          2.17                       1.58
Area efficiency (Gbps/mm²)      208            73.24                      32
Code compatible                 Yes            Yes                        Yes
Rate compatible                 Yes            Yes                        Yes

ᵃ For k = 104


As shown in Table 5.2, the worst-case (W.C.) scenario requires 3538 cycles (5.1), resulting in a W.C. latency of 7.0 μs for the GRAND-MO decoder implementation with parameters (n = 128, m = 2, l1 ≤ 32, l2 ≤ 32). The GRAND-MO implementation with parameters (n = 128, m = 1, l1 ≤ 128), however, results in a worst-case latency of 5 ns (2 cycles) and a worst-case information throughput of 20.8 Gbps for RLC (128, 104). The proposed GRAND-MO decoder (m = 2, l1 ≤ 32, l2 ≤ 32) has a 2.8× area overhead and is 2.7× less energy efficient than the GRANDAB decoder (AB = 3) [16]. However, as illustrated in Fig. 5.17 for decoding RLC (128, 104), the proposed GRAND-MO with parameters (g = 0.4, m = 2, l1 = 32, l2 = 16) outperforms the GRANDAB decoder, in decoding performance, by at least 3 dB for target FERs ≤ 10−5. Similarly, the GRANDAB decoder (AB = 3) [16] is 6.5× more area-efficient and 2× more energy-efficient than the GRAND-MO decoder implementation with parameters (n = 128, m = 1, l1 ≤ 128). However, as shown in Fig. 5.17, the GRAND-MO with parameters (g = 0.05, m = 1, l1 = 110) outperforms the GRANDAB decoder in terms of decoding performance for RLC (128, 104) by at least 3 dB for target FERs ≤ 10−5. Furthermore, when compared to the GRANDAB decoder, the GRAND-MO decoder with parameters (n = 128, m = 1, l1 ≤ 128) achieves 1640× higher worst-case throughput, as shown in Table 5.2.
A high-throughput VLSI architecture, based on the Peterson-Gorenstein-Zierler (PGZ) algorithm, for a (79, 64) BCH code decoder was proposed in [17]. Table 5.1 compares the ASIC synthesis results of the GRAND-MO decoder with parameters (n = 79, m = 1, l1 ≤ 16) and the BCH decoder implementation proposed in [17]. Please note that in the W.C. scenario, the GRAND-MO decoder with parameters (m = 1, l1 ≤ 16) requires 2 cycles, resulting in a 32 Gbps information throughput, as shown in Table 5.1. Compared to the GRAND-MO decoder, the BCH decoder [17] is 28× more energy efficient and 63× more area efficient. However, at a target FER of ≤ 10−5, the GRAND-MO decoder with parameters (m = 1, l1 = 16) achieves a 2 dB gain in decoding performance over the BCH decoder [17], as shown in Fig. 5.34. Furthermore, the proposed GRAND-MO hardware can be used to decode any channel code with n = 79 and 0.75 ≤ R ≤ 1, while the BCH decoder [17] can only decode the (79, 64) BCH code.

5.5 Conclusion

In this chapter, we proposed a highly parallelized VLSI architecture for the universal GRAND-MO decoder. We proposed a modification of the GRAND-MO test error pattern generation, which results in a simple hardware implementation and reduced worst-case complexity. The ASIC synthesis results show that, for a code length of 128 and a target FER of 10−5, an average information throughput of 52 Gbps can be achieved. For decoding RLC (128, 104), the proposed GRAND-MO hardware outperforms GRANDAB, a hard-input variant of GRAND, by at least 3 dB at the target FER of 10−5. When compared to the BCH decoder designed for a (79, 64) code, the proposed GRAND-MO hardware achieves a 33% reduction in worst-case latency while also delivering a 2 dB improvement in decoding performance at a target FER of 10−5.

References

1. An, W., Médard, M., & Duffy, K. R. (2020). Keep the bursts and ditch the interleavers. In GLOBECOM 2020 - 2020 IEEE Global Communications Conference (pp. 1–6).
2. An, W., Médard, M., & Duffy, K. (2022). Keep the bursts and ditch the interleavers. IEEE Transactions on Communications, 70, 3655–3667.


3. Durisi, G., Koch, T., & Popovski, P. (2016). Toward massive, ultrareliable, and low-latency wireless communication with short packets. Proceedings of the IEEE, 104, 1711–1726.
4. Chen, H., Abbas, R., Cheng, P., Shirvanimoghaddam, M., Hardjawana, W., Bao, W., et al. (2018). Ultra-reliable low latency cellular networks: Use cases, challenges and approaches. IEEE Communications Magazine, 56, 119–125.
5. Abbas, S. M., Jalaleddine, M., & Gross, W. J. (2021). High-throughput VLSI architecture for GRAND Markov order. In 2021 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 158–163).
6. Abbas, S., Jalaleddine, M., & Gross, W. (2022). Hardware architecture for guessing random additive noise decoding Markov order (GRAND-MO). Journal of Signal Processing Systems, 94, 1047–1065. https://doi.org/10.1007/s11265-022-01775-2
7. Gilbert, E. N. (1960). Capacity of a burst-noise channel. The Bell System Technical Journal, 39(5), 1253–1265.
8. Hocquenghem, A. (1959). Codes correcteurs d'erreurs. Chiffres, 2, 147–156.
9. Bose, R., & Ray-Chaudhuri, D. (1960). On a class of error correcting binary group codes. Information and Control, 3, 68–79.
10. Gallager, R. G. (1968). Information theory and reliable communication.
11. Coffey, J. T., & Goodman, R. M. (1990). Any code of which we cannot think is good. IEEE Transactions on Information Theory, 36(6), 1453–1461.
12. Peterson, W. W., & Brown, D. T. (1961). Cyclic codes for error detection. Proceedings of the IRE, 49(1), 228–235.
13. Berlekamp, E. (1968). Nonbinary BCH decoding (abstr.). IEEE Transactions on Information Theory, 14(2), 242–242.
14. Massey, J. (1969). Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, 15(1), 122–127.
15. Duffy, K. R., Li, J., & Médard, M. (2019). Capacity-achieving guessing random additive noise decoding. IEEE Transactions on Information Theory, 65(7), 4023–4040.
16. Abbas, S., Tonnellier, T., Ercan, F., & Gross, W. (2020). High-throughput VLSI architecture for GRAND. In 2020 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 1–6).
17. Choi, S., Ahn, H. K., Song, B. K., Kim, J. P., Kang, S. H., & Jung, S. (2019). A decoder for short BCH codes with high decoding efficiency and low power for emerging memories. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(2), 387–397.

Chapter 6

Hardware Architecture for Fading-GRAND

Abstract In this chapter, we introduce Fading-GRAND, a specialized variant of GRAND proposed for multipath flat Rayleigh fading communication channels. Fading-GRAND outperforms a traditional channel code decoder by adjusting its test error pattern generation scheme to mitigate the fading effects of the communication channel. Furthermore, it is demonstrated that the proposed Fading-GRAND can be used in spatial diversity scenarios in which a receiver is equipped with multiple antennas. The proposed Fading-GRAND outperforms a traditional channel code decoder by 0.2 ∼ 8 dB for channel codes of various classes at the target FER of 10−7.

Additional Notations
In addition to the notation defined in the previous chapters, the symbol ⇐⇒ denotes "if and only if". The conjugate of a complex number h is denoted by h*. Furthermore, the symbol ∼|(a) stands for the bit-wise NOR operator.

6.1 Introduction

A multipath frequency non-selective Rayleigh fading communication channel models the effect of small-scale fading in a multipath propagation environment with no dominant line of sight between the transmitter and the receiver [1]. In dense urban environments, multipath fading can drastically degrade the performance of a wireless communication system. Diversity approaches in time, frequency, and space are widely used to reduce multipath fading and enhance the performance of a wireless communication system [2]. In this chapter, we introduce Fading-GRAND, a hard-input variant of GRAND designed to mitigate the fading effects of the communication channel. The proposed Fading-GRAND outperforms, in decoding performance, conventional channel code decoders by adjusting its Test Error Pattern (TEP) generation to the fading conditions of the underlying communication channel. The proposed Fading-GRAND is also shown to work with the SIMO (Single Input Multiple Output) spatial diversity communication model, where the receiver is equipped with multiple receive antennas and the transmitter has a single antenna. Numerical simulation results demonstrate that Fading-GRAND outperforms the Berlekamp-Massey (B-M) decoder [3, 4] in decoding BCH code (127, 106) [5, 6] and BCH code (127, 113) by 0.5∼6.5 dB at a


target FER of 10−7. Similarly, when decoding the CRC (128, 104) code [7] and the Random Linear Code (RLC) (128, 104) [8], Fading-GRAND outperforms the predecessor hard-input GRANDAB by 0.2∼8 dB at a target FER of 10−7. Furthermore, the average complexity of Fading-GRAND (average number of TEPs required), at the Eb/N0 corresponding to a target FER of 10−7, is 1/12× ∼ 1/46× the complexity of GRANDAB. Please note that this chapter follows up on our previously published work [9].

6.2 Fading-GRAND: Channel Model

For the proposed Fading-GRAND, let √Eb · c represent the transmitted binary phase-shift keying (BPSK) signal, where c = ±1 with equal probability. We assume that there are L receive antennas that are sufficiently separated from one another and that the signal is transmitted by a single transmit antenna over a slow frequency-nonselective (flat) Rayleigh fading channel. Furthermore, the channel gains hi (∀ i ∈ [1, L]) are independent Rayleigh distributed, and a diversity gain of L is achieved [2]. In this setting, the low-pass equivalent received signal at the ith (∀ i ∈ [1, L]) receive antenna is given by (6.1) [10]:

yi = hi √Eb c + ni,   i = 1, . . . , L    (6.1)

where the noise ni (∀ i ∈ [1, L]) is complex-valued (ni ∈ C) additive white Gaussian noise (AWGN) with a variance of N0/2 per complex dimension. Furthermore, hi is the multipath fading gain, whose real (in-phase) and imaginary (quadrature) components are assumed to be Gaussian, stationary, and orthogonal to one another. The fading gain amplitude |hi| ≥ 0 follows a Rayleigh probability distribution, and the average power gain is normalized to unity (E[|hi|²] = 1). In this work, we assume that the receiver has perfect knowledge of the Channel State Information (CSI) [11]. Furthermore, the fading coefficients (hi) influencing different symbols are independent of one another due to perfect interleaving (i.e., infinite-depth interleaving) [2].

6.2.1 Spatial Diversity Combining Techniques

Given that the receiver, in the channel model under consideration, is equipped with L receive antennas, there are L independent spatial diversity branches. We explore the following diversity combining techniques to combine these L spatial diversity branches, followed by demodulation:
1. Selection Combining (SC) is the simplest spatial diversity combining technique, in which the diversity branch with the highest instantaneous signal power among the L diversity branches is chosen for signal detection [10]:

y = h*max (hmax √Eb c + nj),   hmax = max(|h1|, . . . , |hL|)    (6.2)

2. Maximal Ratio Combining (MRC) is the optimal spatial diversity combining technique, achieving the maximum array gain by coherently combining all L diversity branches [11, 12]:

y = ∑_{i=1}^{L} [ h*i / ∑_{z=1}^{L} |hz| ] (hi √Eb c + ni)    (6.3)
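A small numpy sketch of the two combiners in (6.2) and (6.3) follows (our illustration; the MRC normalization follows the formula as printed above, and the example values are hypothetical):

```python
import numpy as np

def selection_combining(h, y):
    """(6.2): keep the branch with the largest |h| and co-phase it."""
    j = int(np.argmax(np.abs(h)))
    return np.conj(h[j]) * y[j]

def maximal_ratio_combining(h, y):
    """(6.3): coherently combine all L branches, scaled by sum of |h_z|."""
    return np.sum(np.conj(h) * y) / np.sum(np.abs(h))

# Example: L = 2 Rayleigh branches observing one BPSK symbol c = +1.
rng = np.random.default_rng(0)
h = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
y = h * 1.0 + 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
c_hat = 1 if maximal_ratio_combining(h, y).real > 0 else -1
```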


Algorithm 5: Fading GRAND
  Input: ŷ, H, G⁻¹, I, AB
  Output: û
  1: e ← 0
  2: if H · ŷᵀ == 0 then
  3:   û ← ŷ · G⁻¹
  4:   return û
  5: else
  6:   for HW ← 1 to AB do
  7:     e ← 1{λ1, λ2, ..., λHW} ∀ {λ1, λ2, . . . , λHW} ⊄ I   // λi ∈ [1, n], i ∈ [1, HW]
  8:     if H · (ŷ ⊕ e)ᵀ == 0 then
  9:       û ← (ŷ ⊕ e) · G⁻¹
  10:      return û

6.3 Fading GRAND: Algorithm, TEP Generation and Complexity Analysis

6.3.1 Fading GRAND: Algorithm

Algorithm 5 presents the steps of Fading-GRAND. The hard-demodulated received vector (ŷ) from the communication channel, the maximum Hamming weight (AB) of the generated TEPs, the set of reliable indices I (which is discussed in detail in Sect. 6.3.2), the parity check matrix H, and the generator matrix G are the inputs of the Fading-GRAND algorithm. The algorithm proceeds as follows:
Line 1–4: In the initial phase of decoding, the hard-demodulated vector ŷ is evaluated for codebook membership, and the decoding is considered successful if the codebook membership criterion is satisfied (H · ŷᵀ = 0).
Line 5–7: Otherwise, Fading-GRAND generates all the TEPs (e) with a Hamming weight HW (HW ∈ [1, AB]) such that the indices i (∀ i ∈ [1, n]) of the generated TEPs (e) do not belong to the reliable set I (i ∉ I).
Line 8: The generated TEPs (e) are then successively applied to ŷ, and the resulting vector ŷ ⊕ e is queried for codebook membership (H · (ŷ ⊕ e)ᵀ = 0).
Line 9: If the codebook membership criterion is satisfied for a particular TEP (e), then ĉ = ŷ ⊕ e represents the estimated codeword, from which the message û is recovered. Otherwise, the remaining TEPs (e) for that Hamming weight, or TEPs (e) with larger Hamming weights, are generated. The decoding process terminates after all TEPs (e) with the maximum Hamming weight (AB) have been evaluated for codebook membership.
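A compact behavioral Python model of Algorithm 5 follows (our sketch over GF(2); reliable_set is the index set I obtained from the CSI as described in Sect. 6.3.2):

```python
import numpy as np
from itertools import combinations

def fading_grand(H, G_inv, y_hat, reliable_set, ab):
    """Fading-GRAND (Algorithm 5): GRANDAB restricted to unreliable bits."""
    if not ((H @ y_hat) % 2).any():                 # lines 2-4
        return (y_hat @ G_inv) % 2
    n = H.shape[1]
    unreliable = [i for i in range(n) if i not in reliable_set]
    for hw in range(1, ab + 1):                     # lines 6-7
        for flips in combinations(unreliable, hw):
            e = np.zeros(n, dtype=int)
            e[list(flips)] = 1
            y_e = (y_hat + e) % 2                   # y_hat XOR e
            if not ((H @ y_e) % 2).any():           # line 8
                return (y_e @ G_inv) % 2            # lines 9-10
    return None                                     # all TEPs exhausted
```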

6.3.2 Fading GRAND: TEP Generation

The TEP generation of the proposed Fading-GRAND is based on the TEP generation of GRANDAB. GRANDAB generates TEPs in ascending Hamming weight order up to a maximum Hamming weight AB (HWmax = AB). We propose a modification of the Hamming-weight-order TEP generation of GRANDAB, which reduces the fading impact when employing GRAND on Rayleigh fading communication channels. As shown in Algorithm 5, the proposed Fading-GRAND is based on generating a set of reliable indices I that


Fig. 6.1: TEP generation for n = 6. (a) Upper: For GRANDAB (AB = 4) (b) Bottom: For Fading-GRAND (AB = 4, I = {3})

are used to exclude specific GRANDAB TEPs; as a result, only a subset of the GRANDAB TEPs is queried, for codebook membership, in Fading-GRAND. For a particular threshold (Δ), the set I can be generated as

i ∈ I ⇐⇒ wi ≥ Δ, ∀ i ∈ [1, n]    (6.4)

where

wi = |hmax|,            if SC (L > 1)
     ∑_{a=1}^{L} |ha|,  if MRC (L > 1)
     |hi|,              if L = 1

Example 6.1: GRANDAB vs Fading-GRAND Test Error Patterns
Figure 6.1a displays the TEPs generated for n = 6 and AB = 4 in ascending Hamming weight order, where a column represents a TEP (e) and a dot represents a flipped bit location of ŷ. For example, for n = 6, the TEP e = [1 0 0 1 0 0], with its first and fourth bits non-zero, corresponds to flipping the first and fourth bits of the received vector (ŷ ⊕ e). As shown in Fig. 6.1, TEPs with Hamming weight 1 are generated initially, followed by TEPs with Hamming weights 2, 3, and 4. The Fading-GRAND TEP generation with reliable set I = {3} is shown in Fig. 6.1b, where only the TEPs (e) that do not have a non-zero element at index 3 (since I = {3}) are generated (1{λ1, λ2, λ3, λ4} ∀ {λ1, λ2, λ3, λ4} ⊄ I, where λi ∈ [1, 6] and i ∈ [1, 4] (AB = 4)). As a result, only a subset of the GRANDAB TEPs (30 out of the 56 TEPs of Fig. 6.1a) is generated by Fading-GRAND, and this subset depends on I.
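The TEP counts in Example 6.1 can be verified directly; with n = 6, AB = 4, and one reliable index, only C(5,1) + C(5,2) + C(5,3) + C(5,4) = 30 of the 56 GRANDAB TEPs remain:

```python
from math import comb

n, ab, reliable = 6, 4, {3}
grandab = sum(comb(n, hw) for hw in range(1, ab + 1))                 # 56
fading = sum(comb(n - len(reliable), hw) for hw in range(1, ab + 1))  # 30
print(grandab, fading)
```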

6.3.2.1 Threshold (Δ) Computation for Reliable Set (I)

We determine the optimal Fading-GRAND thresholds by performing Monte-Carlo simulations on various classes of linear (n, k) block codes. To that end, the FER performance for decoding the underlying channel code with the proposed Fading-GRAND is evaluated with different threshold values at a constant Eb/N0. The threshold value that results in the lowest FER at a constant Eb/N0 is considered to be the optimal threshold (Δ̃) at that specific Eb/N0: Δ̃ = argmin_Δ FER(Eb/N0).


Fig. 6.2: (a) (Left) Finding the optimal threshold Δ̃ for RLC (128, 104) Fading-GRAND decoding. (b) (Right) Best-fit line (Δ̃ = m × (Eb/N0) + b; m = −0.0376, b = 1.228)

These optimal thresholds (Δ̃) are then plotted against Eb/N0, and the equation of the best-fit line is derived as Δ̃ = m × (Eb/N0) + b, where m is the slope of the best-fit line and b is the y-intercept.

Pre-computing the Optimal Thresholds Δ̃
Please note that the optimal thresholds (Δ̃), for various classes of channel codes, can be computed offline and stored in a look-up table (LUT) for different Eb/N0. Therefore, during Fading-GRAND decoding, the reliable set I can be generated by utilizing these pre-computed thresholds Δ̃ and the CSI from the receiver.

Example 6.2: Threshold Computation for RLC (128, 104)
Figure 6.2a shows the Frame Error Rate (FER) performance for Fading-GRAND decoding of RLC (128, 104) [8], with varying threshold (Δ) values at different Eb/N0, for a Rayleigh fading channel with no spatial diversity (L = 1; the receiver has just one receive antenna). In Fig. 6.2b, the optimal thresholds for each Eb/N0 are plotted, and the equation of the best-fit line (Δ̃ = m × (Eb/N0) + b) can be derived from the optimal thresholds.
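In code, generating I from the CSI then reduces to one comparison per bit. The sketch below is our own; it uses the L = 1 best-fit parameters from Fig. 6.2b (m = −0.0376, b = 1.228 for RLC (128, 104)) and hypothetical CSI values:

```python
def reliable_set(csi, ebn0_db, m=-0.0376, b=1.228):
    """Build I = {i : w_i >= threshold} per (6.4) for L = 1 (w_i = |h_i|).

    (m, b) are best-fit line parameters as in Table 6.1; the threshold
    itself is LUT-friendly since it depends only on Eb/N0.
    """
    threshold = m * ebn0_db + b
    return {i for i, w in enumerate(csi) if w >= threshold}

# Hypothetical per-bit fading amplitudes; at 10 dB the threshold is 0.852.
print(reliable_set([0.30, 1.70, 0.90, 2.10], ebn0_db=10.0))  # -> {1, 2, 3}
```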


SRGRAND vs Fading-GRAND For the AWGN channel, a thresholded variant of GRANDAB called Symbol Reliability GRAND (SRGRAND) was proposed in [13]. SRGRAND thresholds the soft input channel observation values (y), categorizes each bit as reliable or unreliable, and then applies GRANDAB only to the unreliable bits. Fading-GRAND differs from SRGRAND in that it applies thresholding to the CSI (6.4) of the fading channel; as a result, it requires the hard-demodulated vector yˆ rather than the soft input channel values y.

Table 6.1: Parameters for Fading-GRAND (Δ̃ = m × (Eb/N0) + b) for decoding linear (n, k) block codes

(n, k)            Channel / Combining   L   m          b
RLC (128, 104)    Rayleigh              1   −0.0376    1.228
                  MRC                   2   −0.0579    1.325
                  SC                    2   −0.04975   1.614
BCH (127, 106)    Rayleigh              1   −0.02944   1.002
                  MRC                   2   −0.04833   1.191
                  MRC                   3   −0.0378    1.275
                  SC                    2   −0.04238   1.563
                  SC                    3   −0.05249   1.836
BCH (127, 113)    Rayleigh              1   −0.02165   0.7924
                  MRC                   2   −0.04044   0.9037
                  MRC                   3   −0.0588    1.174
                  SC                    2   −0.03669   1.244
                  SC                    3   −0.03385   1.404
CRC (128, 104)    Rayleigh              1   −0.03467   1.222
                  MRC                   2   −0.06626   1.47
                  MRC                   3   −0.0541    1.541
                  MRC                   4   −0.07166   1.734
                  SC                    2   −0.04426   1.568
                  SC                    3   −0.03443   1.737
                  SC                    4   −0.02029   1.839


For the numerical simulation results presented in this section, the Fading-GRAND thresholds the CSI and generates the reliable set I using the equation Δ̃ = m × (Eb/N0) + b and the parameters listed in Table 6.1. Furthermore, two scenarios are taken into account for the simulation results presented in this section: the scenario with no spatial diversity (L = 1) and the scenario with multiple branches of spatial diversity (L > 1).
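As a minimal sketch of how a receiver could apply these pre-computed parameters (the function name, and the convention that positions whose CSI magnitude falls below Δ̃ are the flip-eligible ones, are illustrative assumptions consistent with the thresholding described above):

```python
import numpy as np

def reliable_mask(csi_mag: np.ndarray, ebn0_db: float, m: float, b: float) -> np.ndarray:
    """Boolean mask of reliable bit positions: CSI magnitude >= Delta-tilde.

    Unreliable positions (mask False) are the ones Fading-GRAND
    considers for bit flips when generating TEPs.
    """
    delta = m * ebn0_db + b  # Delta-tilde = m * (Eb/N0) + b
    return csi_mag >= delta

# Example with the RLC (128, 104), Rayleigh, L = 1 entry of Table 6.1.
rng = np.random.default_rng(0)
h = np.abs(rng.standard_normal(128) + 1j * rng.standard_normal(128)) / np.sqrt(2)
mask = reliable_mask(h, ebn0_db=20.0, m=-0.0376, b=1.228)
print(f"{np.count_nonzero(~mask)} of {h.size} positions eligible for flipping")
```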

Fig. 6.3: Comparison of decoding performance and average complexity of RLC (128, 104) decoding via Fading-GRAND and GRANDAB decoder


Fig. 6.4: Comparing Fading-GRAND and Berlekamp-Massey (B-M) decoding of the BCH (127, 106) code


Fig. 6.5: Comparing Fading-GRAND and Berlekamp-Massey (B-M) decoding of the BCH (127, 113) code


Fig. 6.6: Comparison of decoding performance and average complexity of CRC code (128, 104) decoding via Fading-GRAND and GRANDAB decoder; for both decoders, HWmax = 4


6.4.1 Fading-GRAND Decoding of RLCs

The performance of the GRANDAB and the Fading-GRAND for decoding RLC (128, 104),¹ which uses the optimal thresholds Δ̃ for generating the reliable set I, on the Rayleigh fading channel is compared in Fig. 6.3. Additionally, the GRANDAB decoding performance on the AWGN channel is provided for reference. At a target FER of 10⁻⁷, as shown in Fig. 6.3b, the Fading-GRAND outperforms the GRANDAB by 8 dB in terms of decoding performance (L = 1). When the number of spatial diversity branches (L) is increased to 2 (L = 2), which corresponds to a scenario in which the transmitter has one antenna (nTx = 1) and the receiver has two antennas (nRx = 2), the proposed Fading-GRAND results in a 3 dB improvement over GRANDAB for both SC and MRC combining techniques, as shown in Fig. 6.3a. Figure 6.3d depicts the average complexity of GRANDAB and Fading-GRAND for decoding RLC (128, 104). When there is no spatial diversity (L = 1), the GRANDAB decoder requires 70 queries on average, whereas Fading-GRAND requires only 1.5 queries at the Eb/N0 = 26 dB point, which corresponds to a target FER of 10⁻⁷. When there are two spatial diversity branches (L = 2), MRC with GRANDAB requires 15 queries on average (Fig. 6.3c), whereas Fading-GRAND needs 2.6 queries at Eb/N0 = 14 dB, which corresponds to a target FER of 10⁻⁷. At Eb/N0 = 15 dB with L = 2, as shown in Fig. 6.3c, SC with GRANDAB requires 22 queries on average, whereas Fading-GRAND requires only 3.3 queries at the same Eb/N0 = 15 dB point, corresponding to a target FER of 10⁻⁷. It can be inferred from the simulation results displayed in Fig. 6.3c–d that the complexity of Fading-GRAND is ∼1/46× that of GRANDAB for L = 1 and 1/6×∼1/5× that of GRANDAB for L = 2.

¹ For both GRANDAB and Fading-GRAND decoders, HWmax = 4 for BCH code (127, 106), CRC code (128, 104) and RLC (128, 104), whereas HWmax = 3 for BCH code (127, 113).

Fading-GRAND: Computational Complexity
The computational complexity of GRAND and its variants can be expressed in terms of the number of codebook membership queries required (TEPs (e) generated). Additionally, the complexity can be divided into two categories: worst-case complexity, which corresponds to the maximum number of queries required, and average complexity, which corresponds to the average number of queries required. For a code length of n, the worst-case number of codebook membership queries required for both the Fading-GRAND and GRANDAB decoders is Σ_{i=1}^{AB} (n choose i) [14], where AB is the maximum Hamming weight of the TEPs (e) generated. Please note that when channel conditions improve, the average complexity of GRAND (GRANDAB and Fading-GRAND) decreases rapidly, since transmissions that are subject to light noise are quickly decoded [14].
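The worst-case query count is straightforward to evaluate; a small sketch under the definition above (the function name is an illustrative assumption):

```python
from math import comb

def worst_case_queries(n: int, ab: int) -> int:
    # Sum over i = 1..AB of C(n, i): all TEPs of Hamming weight up to AB.
    return sum(comb(n, i) for i in range(1, ab + 1))

# For n = 128 with AB = HWmax = 4 (as used for RLC/CRC (128, 104)):
print(worst_case_queries(128, 4))  # 11017632
```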

6.4.2 Fading-GRAND Decoding of BCH Codes

Figures 6.4 and 6.5 present a comparison between the FER decoding performance of Fading-GRAND and the Berlekamp-Massey (B-M) decoder [3, 4] for BCH code (127, 106) and BCH code (127, 113), respectively, on Rayleigh fading channels. Furthermore, the B-M decoding performance [15] for the AWGN channel is presented for reference. At a target FER of 10⁻⁷, the Fading-GRAND outperforms the B-M decoder for decoding BCH code (127, 106) by 4 dB for L = 1, 1.5∼2 dB for L = 2 (Fig. 6.4b), and ∼0.5 dB for L = 3 (Fig. 6.4a).


Similarly, the Fading-GRAND outperforms the B-M decoder for decoding BCH code (127, 113) by 6.5 dB, 2.5∼3 dB, and 1 dB for L = 1, 2, and 3, respectively, at a target FER of 10⁻⁷, as shown in Fig. 6.5. Please note that as channel conditions improve, the average computational complexity of Fading-GRAND decoding of BCH code (127, 106) and BCH code (127, 113) (shown in Figs. 6.4c–d and 6.5c–d) reduces significantly.

6.4.3 Fading-GRAND Decoding of CRC Codes

The FER performance and average computational complexity of Fading-GRAND and GRANDAB decoding of CRC codes [7] are presented in Fig. 6.6. Please note that the generator polynomial for CRC code (128, 104) is 0xB2B117. As shown in Fig. 6.6, at a target FER of 10⁻⁷, Fading-GRAND outperforms GRANDAB by 4.5 dB for L = 1, 2∼2.5 dB for L = 2, and 0.7 dB for L = 3. For L = 1 (corresponding to Eb/N0 = 26 dB), the average complexity of Fading-GRAND is 1/34× that of GRANDAB, while the average complexity of Fading-GRAND is 1/7×∼1/8× and 1/2× the complexity of GRANDAB for L = 2 (corresponding to Eb/N0 = 13 dB for MRC and Eb/N0 = 14 dB for SC) and for L = 3 (corresponding to Eb/N0 = 9 dB for MRC and Eb/N0 = 11.5 dB for SC), respectively. For L = 4, however, the Fading-GRAND achieves a 0.2 dB improvement with roughly similar complexity to the GRANDAB.

6.5 Proposed VLSI Architecture for Fading-GRAND

Fading-GRAND is similar to GRANDAB in that it generates TEPs (e) in ascending Hamming weight order (Fig. 6.1); therefore, with a few minor modifications, the hardware architecture for GRANDAB [16] can also be used for Fading-GRAND.

6.5.1 Proposed Modifications in GRANDAB Hardware [16]

The GRANDAB VLSI architecture [16] uses an n × (n − k)-bit shift register to store all n syndromes associated with the TEPs (e) with Hamming weight of 1 (s_i = H · 1_i, ∀ i ∈ 1..n). To evaluate all the TEPs (e) with Hamming weight of 1, each row of the shift register is combined with the syndrome of the received vector (s_c) to compute the n test syndromes (s_i ⊕ s_c, ∀ i ∈ 1..n) in a single time step. Each of the n test syndromes is NOR-reduced to feed an n-to-log₂ n priority encoder, as depicted in Fig. 6.7a. The output of each NOR-reduce is 1 if and only if all the bits of the test syndrome (s_i ⊕ s_c) are 0. Figure 6.7 illustrates the NOR-reduced test syndromes, which are used as inputs to the priority encoder. We refer the reader to Chapter 2 for more information regarding the GRANDAB VLSI architecture.
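A bit-accurate software model of this HW-1 check stage is sketched below (an illustration, not the RTL; it exploits the fact that s_i = H · 1_i is simply the i-th column of H, so all n test syndromes can be formed in one vectorized XOR):

```python
import numpy as np

def hw1_hits(H: np.ndarray, s_c: np.ndarray) -> np.ndarray:
    """Model the n parallel HW-1 checks: row i of test_syndromes is s_i XOR s_c,
    and a position 'hits' when its test syndrome is all-zero (the NOR-reduce
    output in hardware)."""
    test_syndromes = H.T ^ s_c           # shape (n, n-k); row i is s_i ^ s_c
    return ~test_syndromes.any(axis=1)   # True where every syndrome bit is 0

# Toy example: the (7, 4) Hamming code, with codeword 1011010 corrupted in bit 7.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)
y = np.array([1, 0, 1, 1, 0, 1, 1], dtype=np.uint8)
s_c = (H @ y) % 2
print(hw1_hits(H, s_c.astype(np.uint8)))  # hit at index 6 (bit 7) only
```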


Fig. 6.7: VLSI architecture for checking test error patterns with Hamming weight of 1 (s_i = H · 1_i, ∀ i ∈ 1..n)

The GRANDAB VLSI architecture [16] can be modified to support Fading-GRAND by including n two-input AND gates at the priority encoder's inputs, as shown in Fig. 6.8. The 1-bit NOR-reduced test syndrome (∼|(s_c ⊕ s_i), ∀ i ∈ 1..n) is the first input of each AND gate, and the mask bit (m_i, ∀ i ∈ 1..n) from the controller module is the second input. As shown in Fig. 6.8, when the mask bit is set to 1, the AND gate outputs the NOR-reduced test syndrome ∼|(s_c ⊕ s_i) (highlighted in green), and when it is set to 0 (as for m_{n−1}), the AND gate output is 0 (highlighted in red).
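In software terms, the modification amounts to gating the hit vector with the mask before the priority encode; a minimal sketch, using the convention from the text above that mask bit 1 lets a hit pass (the function name is an illustrative assumption):

```python
import numpy as np

def first_flip_index(hits: np.ndarray, mask: np.ndarray) -> int:
    """Model of the masked priority-encoder front end of Fig. 6.8:
    n two-input AND gates suppress hits on masked-off positions, then
    the priority encoder reports the first surviving hit (-1 if none)."""
    gated = hits & mask                  # one AND gate per position
    idx = np.flatnonzero(gated)          # priority encoder: lowest index wins
    return int(idx[0]) if idx.size else -1

hits = np.array([0, 1, 0, 1, 0, 0, 1], dtype=bool)  # NOR-reduced test syndromes
mask = np.array([0, 0, 1, 1, 1, 1, 1], dtype=bool)  # mask bits m_i from the controller
print(first_flip_index(hits, mask))                 # 3: the hit at index 1 is masked off
```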


Fig. 6.8: Proposed modified VLSI architecture for checking error patterns with Hamming weight of 1 (s_i = H · 1_i, ∀ i ∈ 1..n) for Fading-GRAND

Fig. 6.9: The proposed VLSI architecture for Fading-GRAND

6.5.2 Proposed Fading-GRAND Hardware

The proposed Fading-GRAND hardware, shown in Fig. 6.9, can be used to decode any linear block code of length n with code rate between k/n and 1. For clarity, the clock and control signals are not displayed. The proposed Fading-GRAND VLSI architecture takes as inputs the hard-demodulated received vector ŷ from the communication channel, the CSI, and the optimal threshold Δ̃, and produces an estimated word û. Note that the framework presented in Sec. 6.3.2 enables the offline computation of the threshold Δ̃ for the Rayleigh fading channel. To support various codes and rates, the (n − k) × n-bit H memory can be loaded with any parity check matrix H.


We refer the reader to [16] and Chapter 2 for further details on the baseline GRANDAB hardware.

A syndrome check is performed on the hard-demodulated input vector ŷ in the initial stage of the decoding process. If the codebook membership check is verified (H · ŷ = 0), decoding is assumed to be successful. Otherwise, the controller module generates the TEPs e in Hamming weight order and applies them to ŷ. The resulting vector ŷ ⊕ e is then tested for codebook membership. If any of the generated TEPs e satisfies the codebook membership constraint, the controller module forwards the corresponding indices to the Word Computation Unit (WCU) module, which converts these index values to the correct bit-flip locations in ŷ.

Proposing a VLSI Architecture for SRGRAND [13]
The proposed Fading-GRAND hardware (Fig. 6.9) can also be used for Symbol Reliability GRAND (SRGRAND). The only modification is that, instead of using the CSI from the fading channel, the thresholds and the mask bits (m_i, ∀ i ∈ 1..n) would have to be computed using the received soft-input channel observation values y from the AWGN channel. We refer the reader to [13] for more information about SRGRAND.
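Putting the pieces together, the following end-to-end Python sketch mirrors this decoding flow at a behavioral level (an illustration of the algorithm, not the RTL; the combinations-based TEP enumeration over the unreliable positions is one simple way to realize the ascending Hamming-weight order):

```python
import numpy as np
from itertools import combinations

def fading_grand_decode(H, y_hat, unreliable, hw_max=4):
    """Behavioral model of the Fading-GRAND flow: syndrome-check y_hat first,
    then try TEPs in ascending Hamming-weight order, restricted to the
    unreliable positions selected by the CSI threshold."""
    s_c = (H @ y_hat) % 2
    if not s_c.any():                       # H . y_hat = 0: already a codeword
        return y_hat
    cols = H.T % 2                          # s_i = H . 1_i is the i-th column of H
    candidates = np.flatnonzero(unreliable)
    for w in range(1, hw_max + 1):          # ascending Hamming weight
        for pos in combinations(candidates, w):
            s_e = np.bitwise_xor.reduce(cols[list(pos)], axis=0)
            if not (s_e ^ s_c).any():       # test syndrome all-zero: membership hit
                u_hat = y_hat.copy()
                u_hat[list(pos)] ^= 1       # WCU role: flip the identified bits
                return u_hat
    return y_hat                            # abandonment: decoding failure
```

Here `unreliable` is the complement of the reliable-set mask derived from the CSI threshold of Sec. 6.3.2 (i.e., `~reliable_mask(...)` in the earlier sketch).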

6.6 Conclusion

In this chapter, we introduced Fading-GRAND, a hard-input variant of GRAND designed for multipath flat Rayleigh fading communication channels. Fading can significantly degrade the performance of wireless communication systems; hence, diversity approaches (in time, frequency and space) are employed to combat it. The Fading-GRAND outperforms a traditional channel code decoder by adjusting its TEPs to the fading conditions of the underlying channel. Both scenarios, with no spatial diversity and with multiple spatial diversity branches, are evaluated using the proposed Fading-GRAND. The complexity of Fading-GRAND is 1/2×∼1/46× that of its predecessor, the GRANDAB decoder, and numerical simulation results presented in this chapter demonstrate that it outperforms a conventional channel code decoder by 0.2∼8 dB at a target FER of 10⁻⁷ for channel codes of various classes. Furthermore, a high-throughput VLSI architecture for the Fading-GRAND is presented, which is based on the GRANDAB hardware.

Fading-GRAND: Applications
Fading-GRAND appeals to applications with strict performance and resource constraints due to its improved decoding performance and reduced complexity. The Fading-GRAND ushers GRAND research into real-world multipath fading channel scenarios, and it could be further explored for various spatial and frequency diversity techniques (MIMO and OFDM) as well as other fading communication channels such as Rician and Nakagami fading channels [2].


References

1. Rappaport, T. (1996). Wireless communications: Principles and practice. Prentice Hall.
2. Tse, D., & Viswanath, P. (2005). Fundamentals of wireless communication. Cambridge University Press.
3. Berlekamp, E. (1968). Nonbinary BCH decoding (Abstr.). IEEE Transactions on Information Theory, 14, 242–242.
4. Massey, J. (1969). Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, 15, 122–127.
5. Hocquenghem, A. (1959). Codes correcteurs d'erreurs. Chiffres, 2, 147–156.
6. Bose, R., & Ray-Chaudhuri, D. (1960). On a class of error correcting binary group codes. Information and Control, 3, 68–79.
7. Peterson, W. W., & Brown, D. T. (1961). Cyclic codes for error detection. Proceedings of the IRE, 49(1), 228–235.
8. Coffey, J. T., & Goodman, R. M. (1990). Any code of which we cannot think is good. IEEE Transactions on Information Theory, 36(6), 1453–1461.
9. Abbas, S., Jalaleddine, M., & Gross, W. (2022). GRAND for Rayleigh fading channels. In 2022 IEEE Globecom Workshops (GC Wkshps) (pp. 504–509).
10. Kong, N. (2009). Performance comparison among conventional selection combining, optimum selection combining and maximal ratio combining. In 2009 IEEE International Conference on Communications (pp. 1–6).
11. Benedetto, S., & Biglieri, E. (1999). Principles of digital transmission: With wireless applications. Springer Science & Business Media.
12. Kim, Y., & Kim, S. (2001). Optimum selection diversity for BPSK signals in Rayleigh fading channels. IEEE Transactions on Communications, 49, 1715–1718.
13. Duffy, K. R., & Médard, M. (2019). Guessing random additive noise decoding with soft detection symbol reliability information - SRGRAND. In 2019 IEEE International Symposium on Information Theory (ISIT) (pp. 480–484).
14. Duffy, K. R., Li, J., & Médard, M. (2019). Capacity-achieving guessing random additive noise decoding. IEEE Transactions on Information Theory, 65(7), 4023–4040.
15. Cassagne, A., Hartmann, O., Léonardon, M., He, K., Leroux, C., Tajan, R., Aumage, O., Barthou, D., Tonnellier, T., Pignoly, V., Le Gal, B., & Jégo, C. (2019). AFF3CT: A fast forward error correction toolbox! SoftwareX, 10, 100345. http://www.sciencedirect.com/science/article/pii/S2352711019300457
16. Abbas, S., Tonnellier, T., Ercan, F., & Gross, W. (2020). High-throughput VLSI architecture for GRAND. In 2020 IEEE Workshop on Signal Processing Systems (SiPS) (pp. 1–6).

Part IV

GRAND Extensions

This part covers the GRAND variants that have recently been proposed for various communication channels, GRAND variants employing higher-order modulation, and innovative ways of using GRAND to assist traditional channel code decoders.

Chapter 7

A Survey of Recent GRAND Variants

Abstract This chapter discusses the recently proposed GRAND variants for different communication channels, including the MIMO channel and the Multiple Access Channel. The GRAND variants that employ higher-order modulation techniques such as QPSK, QAM, and OFDM are also covered in this chapter. Furthermore, this chapter reviews unique techniques that employ GRAND to assist conventional channel code decoders, improve network coding performance, and mitigate jamming in wireless communication.

7.1 GRAND for Various Communication Channels

Due to GRAND's unique capability to adapt its Test Error Pattern (TEP) generation to the underlying communication channel, GRAND can be used with a variety of communication channels. This section explores GRAND's application to the Multiple Access Channel (MAC) and Multiple Input Multiple Output (MIMO) channels.

7.1.1 GRAND for MIMO Channel

Sarieddeen et al. [1] proposed a GRAND variant for MIMO fading channels (Rayleigh and Rician fading [2]) employing Zero Forcing (ZF) and Minimum Mean Square Error (MMSE) channel equalization. The proposed MIMO-GRAND [1] is based on soft-input ORBGRAND [3] and generates pseudo-soft information from the colored-noise statistics of the channel, which is exploited for TEP ordering and scheduling. Numerical simulation results show that MIMO-GRAND [1], leveraging pseudo-soft information, achieves a 10 dB gain in decoding performance when compared to hard-input GRAND [4]. Furthermore, MIMO-GRAND [1] achieves decoding performance similar to state-of-the-art soft-input decoders that use the full soft information. We refer the reader to [1] for more details.


7.1.2 GRAND for MAC Channel

Solomon et al. [5] proposed employing GRAND's soft-input variant (SGRAND [6]) on MAC channels to mitigate both additive noise and interference from multiple users. The proposed MAC-SGRAND [5] uses GRAND decoding to eliminate channel noise, followed by ZigZag decoding [7] to eliminate interference from multiple users. The proposed MAC-SGRAND [5] technique achieves optimal Maximum A-Posteriori (MAP) decoding while requiring no user coordination. For more details on the proposed MAC-SGRAND, please see [5].

7.2 GRAND for Higher-Order Modulation

7.2.1 GRAND with 64-QAM Modulation

In a coded massive MIMO channel employing higher-order modulation up to 64-QAM, Allahkaram et al. [8] proposed using GRAND to decode Random Linear Codes (RLCs). Numerical simulation results demonstrate that the proposed approach outperforms recent soft-input decoders in terms of both decoding performance and computational complexity [8].

7.2.2 GRAND with 256-QAM Modulation

A communication system typically employs interleavers and de-interleavers to mitigate the burst noise caused by communication channels with memory (Markov channels). However, interleavers/de-interleavers introduce latency to a communication system. An et al. [9] proposed using GRAND-MO, a hard-input variant of GRAND, on Markov channels employing higher-order modulations up to 256-QAM. The proposed approach outperforms a conventional channel code decoder by adjusting the TEP generation to minimize the impact of channel memory (burst noise). The proposed approach also eliminates the requirement for interleavers/de-interleavers, which can significantly reduce communication system latency. For more details about GRAND-MO employing higher-order modulations (256-QAM), please see [9].

7.2.3 Symbol-Level GRAND (s-GRAND) for Higher Order Modulations Over Fading Channels

The symbol-level GRAND (s-GRAND), employing high-order modulation schemes for AWGN and flat-fading communication channels, is introduced in [10]. The symbol-level GRAND [10] uses knowledge of the underlying modulation scheme, as well as the communication channel's fading coefficient, to estimate the probability of occurrence of error patterns and determine the order in which TEPs are evaluated. The symbol-level GRAND [10] achieves decoding performance similar to the baseline GRAND while reducing the average computational complexity. However, this reduction in complexity comes at the expense of an increased memory requirement, because the most frequently occurring TEPs must be stored in a look-up-table-based memory.


7.3 GRAND for Joint Detection and Decoding

Sarieddeen et al. [11] proposed a GRAND variant (turbo-GRAND) for joint detection and decoding that uses the complex received symbols, Channel State Information (CSI), and demapped bits to generate bit-reliability information. This bit-reliability information is used by the proposed turbo-GRAND [11] as enhanced a priori information for the subsequent GRAND decoding iteration. Numerical simulation results demonstrate that, after a few iterations, the proposed turbo-GRAND can approach the decoding performance of soft GRAND (SGRAND [6]) based on ML detection in both AWGN and Rayleigh fading channels. Please note that a similar approach of adapting GRAND into an iterative soft-input soft-output (SISO) decoder, which can improve decoding performance with a minimal number of iterations, was also proposed in [12].

7.4 GRAND for Network Coding

Random Linear Network Coding (RLNC) [13], a forward error correcting approach similar to fountain codes [14], has the potential to boost throughput and improve resilience to network attacks. RLNC encoding generates coded packets using random linear combinations of input data packets. RLNC decoding discards corrupted received coded packets and attempts to reconstruct the original data packets from correctly received coded packets. A hard-input GRAND variant known as Transversal-GRAND has recently been proposed in [15]. The proposed Transversal-GRAND expands on the previously proposed GRAND-MO [9] and mitigates burst errors for RLNC. Similar to GRAND-MO [9], Transversal-GRAND [15] employs a traditional two-state Markov chain model. Transversal-GRAND improves RLNC decoding performance by exploiting correlations between burst errors over several codewords.

7.5 GRAND for Countering Jamming in Wireless Communication

GRAND has recently been employed to counteract the impact of jammers on wireless transmissions [16]. By blending malicious signals with the genuine transmission, jammers often try to disrupt or degrade a signal at the receiver without being noticed. As a result, the received frame is rendered undecodable, leading to anomalies such as an increase in repeat requests, a decrease in throughput, extended delays, or a total malfunction. Ercan et al. [16] proposed a pre-processing approach that updates log-likelihood ratio (LLR) reliability information to reflect inferences made in the presence of a bursty jammer, enabling improved decoding performance for any soft-detection decoder such as soft-input ORBGRAND [3]. The attacker cannot maintain secrecy since, as shown by numerical simulation results, the proposed method discovers a significant portion of the attacks. Furthermore, it is demonstrated that the proposed approach can successfully prevent a Denial of Service (DoS) and restore an order of magnitude of FER performance while decoding channel codes of various classes using the ORBGRAND algorithm.


7.6 GRAND for Assisting Conventional Decoders

Instead of being employed as a stand-alone decoder, GRAND can also be used to assist a conventional decoder, which lowers the overall latency and energy consumption of the hybrid decoding solution known as Assisted GRAND (AGRAND) [17]. The AGRAND configuration relies on adding a GRAND stage before the conventional decoder. If the GRAND stage succeeds in decoding the received vector from the communication channel, the conventional decoder stage is bypassed and AGRAND terminates the decoding, reducing overall latency and power consumption. The numerical simulation results presented in [17] demonstrate that the AGRAND configuration, when employed with successive cancellation list decoding of a CA-polar code, can reduce latency by up to 84% at Eb/N0 = 5.5 dB.

7.7 Partitioned GRAND (PGRAND)

PGRAND [17] is a recently proposed stand-alone, code-agnostic channel decoder that can achieve a 0.2 dB gain over ORBGRAND at a target FER of 10⁻⁵ while using 50% fewer Average Queries per Frame (AQPF) at Eb/N0 = 5.5 dB. The PGRAND algorithm divides the received codewords into several partitions and generates TEPs for each partition. To determine the optimal bit positions for flipping in the generated TEPs, a highly parameterized pattern generator is introduced alongside an abandonment criterion. The pattern generator produces TEPs based on the most frequent empirical error patterns, analyzed through Monte-Carlo simulations; these patterns prioritize flipping less reliable bits in less reliable partitions.

References

1. Sarieddeen, H., Médard, M., & Duffy, K. (2022). GRAND for fading channels using pseudo-soft information. CoRR, abs/2207.10842. https://doi.org/10.48550/arXiv.2207.10842
2. Tse, D., & Viswanath, P. (2005). Fundamentals of wireless communication. Cambridge University Press.
3. Duffy, K. R. (2021). Ordered reliability bits guessing random additive noise decoding. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8268–8272).
4. Duffy, K. R., Li, J., & Médard, M. (2019). Capacity-achieving guessing random additive noise decoding. IEEE Transactions on Information Theory, 65(7), 4023–4040.
5. Solomon, A., Duffy, K., & Médard, M. (2021). Managing noise and interference separately - Multiple access channel decoding using soft GRAND. In 2021 IEEE International Symposium on Information Theory (ISIT) (pp. 2602–2607).
6. Solomon, A., Duffy, K. R., & Médard, M. (2020). Soft maximum likelihood decoding using GRAND. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC) (pp. 1–6).
7. Gollakota, S., & Katabi, D. (2008). ZigZag decoding: Combating hidden terminals in wireless networks. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication (pp. 159–170). https://doi.org/10.1145/1402958.1402977
8. Allahkaram, S., Monteiro, F., & Chatzigeorgiou, I. (2022). URLLC with coded massive MIMO via random linear codes and GRAND. https://doi.org/10.48550/arXiv.2208.00086
9. An, W., Médard, M., & Duffy, K. (2022). Keep the bursts and ditch the interleavers. IEEE Transactions on Communications, 70, 3655–3667. https://doi.org/10.1109/TCOMM.2022.3171798
10. Chatzigeorgiou, I., & Monteiro, F. (2022). Symbol-level GRAND for high-order modulation over block fading channels. IEEE Communications Letters, 27, 1–1.
11. Sarieddeen, H., Médard, M., & Duffy, K. (2022). Soft-input, soft-output joint detection and GRAND. CoRR, abs/2207.10836. https://doi.org/10.48550/arXiv.2207.10836
12. Condo, C. (2022). Iterative soft-input soft-output decoding with ordered reliability bits GRAND. In 2022 IEEE Globecom Workshops (GC Wkshps) (pp. 510–515).
13. Ho, T., Médard, M., Koetter, R., Karger, D., Effros, M., Shi, J., & Leong, B. (2006). A random linear network coding approach to multicast. IEEE Transactions on Information Theory, 52, 4413–4430.
14. Byers, J., Luby, M., & Mitzenmacher, M. (2002). A digital fountain approach to asynchronous reliable multicast. IEEE Journal on Selected Areas in Communications, 20, 1528–1540.
15. Chatzigeorgiou, I. (2022). Transversal GRAND for network coded data. In IEEE International Symposium on Information Theory, ISIT 2022, Espoo, Finland, June 26–July 1, 2022 (pp. 1773–1778). https://doi.org/10.1109/ISIT50566.2022.9834692
16. Ercan, F., Galligan, K., Duffy, K., Médard, M., Starobinski, D., & Yazicigil, R. (2022). A general security approach for soft-information decoding against smart bursty jammers. arXiv. https://arxiv.org/abs/2210.04061
17. Jalaleddine, M. (2021). Towards achieving ultra-reliable low latency communications using guessing random additive noise decoding. M.Eng. Thesis, McGill University, Montreal, Canada.

Appendix A

Proof of Lemma 1

It is sufficient to show that, for all i ∈ [2, P],

λ_i^max < (2 × m − i × (i − 1) + 2 − 2 × Σ_{j=i+1}^{P} λ_j) / (2 × i).

We use induction on i.

Base case (i = 2): substituting i = 2 gives λ_2^max < (m − Σ_{j=3}^{P} λ_j) / 2