English Pages xvi, 571 pages: illustrations; 24 cm [592] Year 2004
SIMULATING WIRELESS COMMUNICATION SYSTEMS
Companion Software Website http://authors.phptr.com/rorabaugh/
C. Britton Rorabaugh
PRENTICE HALL Professional Technical Reference Upper Saddle River, NJ 07458 www.phptr.com
Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book can be obtained from the Library of Congress.
Editorial/Production Supervision: Patti Guerrieri Cover Design Director: Jerry Votta Cover Design: Anthony Gemmellaro Art Director: Gail Cocker-Bogusz Manufacturing Buyer: Maura Zaldivar Publisher: Bernard Goodwin Editorial Assistant: Michelle Vincenti Marketing Manager: Dan DePasquale © 2004 Pearson Education, Inc. Publishing as Prentice Hall Professional Technical Reference Upper Saddle River, New Jersey 07458 Prentice Hall PTR offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For more information, please contact: U.S. Corporate and Government Sales, 1-800-382-3419, [email protected]. For sales outside of the U.S., please contact: International Sales, 1-317-581-3793, [email protected]. Company and product names mentioned herein are the trademarks or registered trademarks of their respective owners. All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher. Printed in the United States of America First Printing
ISBN 0-13-022268-2
Pearson Education Ltd. Pearson Education Australia Pty., Limited Pearson Education South Asia Pte. Ltd. Pearson Education Asia Ltd. Pearson Education Canada, Ltd. Pearson Educación de Mexico, S.A. de C.V. Pearson Education — Japan Pearson Malaysia SDN BHD
To Joyce, Geoff, Amber, and Eleanor
CONTENTS

PREFACE

1 SIMULATION: BACKGROUND AND OVERVIEW
  1.1 Communication Systems
  1.2 Simulation Process
  1.3 Simulation Programs

2 SIMULATION INFRASTRUCTURE
  2.1 Parameter Input
    2.1.1 Individual Parameter Values
    2.1.2 Parameter Arrays
    2.1.3 Enumerated Type Parameters
    2.1.4 System Parameters
    2.1.5 Signal-Plotting Parameters
  2.2 Signals
    2.2.1 Signal Management Strategy
    2.2.2 SMS Implementation
  2.3 Controls
  2.4 Results Reporting

2A EXAMPLE SOURCE CODE
  2A.1 PracSimModel
  2A.2 GenericSignal

3 SIGNAL GENERATORS
  3.1 Elementary Signal Generators
    3.1.1 Unit Step
    3.1.2 Rectangular Pulse
    3.1.3 Unit Impulse
    3.1.4 Software Implementation
  3.2 Tone Generators
    3.2.1 Software Implementation
  3.3 Sampling Baseband Signals
    3.3.1 Spectral View of Sampling
  3.4 Baseband Data Waveform Generators
    3.4.1 NRZ Baseband Signaling
    3.4.2 Biphase Baseband Signaling
    3.4.3 Delay Modulation
    3.4.4 Practical Issues
  3.5 Modeling Bandpass Signals

3A EXAMPLE SOURCE CODE
  3A.1 MultipleToneGener
  3A.2 BasebandWaveform

4 RANDOM PROCESS MODELS
  4.1 Random Sequences
    4.1.1 Discrete Distributions
    4.1.2 Discrete-Time Random Processes
  4.2 Random Sequence Generators
    4.2.1 Linear Congruential Sequences
    4.2.2 Software Implementations
    4.2.3 Evaluating Random-Number Generators
  4.3 Continuous-Time Noise Processes
    4.3.1 Continuous Random Variables
    4.3.2 Random Processes
  4.4 Additive Gaussian Noise Generators
    4.4.1 Gaussian Distribution
    4.4.2 Error Function
    4.4.3 Spectral Properties
    4.4.4 Noise Power
    4.4.5 Gaussian Random Number Generators
  4.5 Bandpass Noise
    4.5.1 Envelope and Phase
    4.5.2 Rayleigh Random Number Generators
  4.6 Parametric Models of Random Processes
    4.6.1 Autoregressive Noise Model

4A EXAMPLE SOURCE CODE
  4A.1 AdditiveGaussianNoise

5 DISCRETE TRANSFORMS
  5.1 Discrete Fourier Transform
    5.1.1 Parameter Selection
    5.1.2 Properties of the DFT
  5.2 Decimation-in-Time Algorithms
    5.2.1 Software Notes
  5.3 Decimation-in-Frequency Algorithms
  5.4 Small-N Transforms
  5.5 Prime Factor Algorithm
    5.5.1 Software Notes

5A EXAMPLE SOURCE CODE
  5A.1 FFT Wrapper Routines
  5A.2 FFT Engines

6 SPECTRUM ESTIMATION
  6.1 Sample Spectrum
    6.1.1 Software Implementation
  6.2 Daniell Periodogram
    6.2.1 Software Implementation
  6.3 Bartlett Periodogram
    6.3.1 Software Implementation
  6.4 Windowing and Other Issues
    6.4.1 Triangular Window
    6.4.2 Software Considerations
    6.4.3 von Hann Window
    6.4.4 Hamming Window
    6.4.5 Software Implementation
  6.5 Welch Periodogram
    6.5.1 Software Implementation
  6.6 Yule-Walker Method
    6.6.1 Software Implementation

6A EXAMPLE SOURCE CODE
  6A.1 BartlettPeriodogramWindowed
  6A.2 GenericWindow

7 SYSTEM CHARACTERIZATION TOOLS
  7.1 Linear Systems
    7.1.1 Characterization of Linear Systems
    7.1.2 Transfer Functions
    7.1.3 Computer Representation of Transfer Functions
    7.1.4 Magnitude, Phase, and Delay Responses
  7.2 Constellation Plots
    7.2.1 Eye Diagrams

7A EXAMPLE SOURCE CODE
  7A.1 CmpxIqPlot
  7A.2 HistogramBuilder

8 FILTER MODELS
  8.1 Modeling Approaches
    8.1.1 Numerical Integration
    8.1.2 Sampled Frequency Response
    8.1.3 Digital Filters
  8.2 Analog Filter Responses
    8.2.1 Magnitude Response Features of Lowpass Filters
    8.2.2 Filter Transformations
  8.3 Classical Analog Filters
    8.3.1 Butterworth Filters
    8.3.2 Chebyshev Filters
    8.3.3 Elliptical Filters
    8.3.4 Bessel Filters
  8.4 Simulating Filters via Numerical Integration
    8.4.1 Biquadratic Form
    8.4.2 Software Design
  8.5 Using IIR Digital Filters to Simulate Analog Filters
    8.5.1 Properties of IIR Filters
    8.5.2 Mapping Analog Filters into IIR Designs
    8.5.3 Software Design
  8.6 Filtering in the Frequency Domain
    8.6.1 Fast Convolution
    8.6.2 Software Design

8A EXAMPLE SOURCE CODE
  8A.1 Classical Filters

9 MODULATION AND DEMODULATION
  9.1 Simulation Issues
    9.1.1 Using the Recovered Carrier
  9.2 Quadrature Phase Shift Keying
    9.2.1 Nonideal Behaviors
    9.2.2 Quadrature Modulator Models
    9.2.3 Correlation Demodulator Models for QPSK
    9.2.4 Quadrature Demodulator Models
    9.2.5 QPSK Simulations
    9.2.6 Properties of QPSK Signals
    9.2.7 Offset QPSK
  9.3 Binary Phase Shift Keying
    9.3.1 BPSK Modulator Models
    9.3.2 BPSK Demodulation
    9.3.3 BPSK Simulations
    9.3.4 Properties of BPSK Signals
    9.3.5 Error Performance
  9.4 Multiple Phase Shift Keying
    9.4.1 Ideal m-PSK Modulation and Demodulation
    9.4.2 Power Spectral Densities of m-PSK Signals
    9.4.3 Error Performance
  9.5 Frequency Shift Keying
    9.5.1 FSK Modulators
  9.6 Minimum Shift Keying
    9.6.1 Nonideal Behaviors
    9.6.2 MSK Modulator Models
    9.6.3 Properties of MSK Signals

9A EXAMPLE SOURCE CODE
  9A.1 MskModulator
  9A.2 MpskOptimalDemod

10 AMPLIFIERS AND MIXERS
  10.1 Memoryless Nonlinearities
    10.1.1 Hard Limiters
    10.1.2 Bandpass Amplifiers
  10.2 Characterizing Nonlinear Amplifiers
    10.2.1 AM/AM and AM/PM
    10.2.2 Swept-Frequency Response
  10.3 Two-Box Nonlinear Amplifier Models
    10.3.1 Filter Measurements

10A EXAMPLE SOURCE CODE
  10A.1 NonlinearAmplifier

11 SYNCHRONIZATION AND SIGNAL SHIFTING
  11.1 Shifting Signals in Time
    11.1.1 Delaying Signals by Multiples of the Sampling Interval
    11.1.2 Advancing Signals by Multiples of the Sampling Interval
    11.1.3 Continuous-Time Delays via Interpolation
  11.2 Correlation-Based Delay Estimation
    11.2.1 Software Implementation
  11.3 Phase-Slope Delay Estimation
  11.4 Changing Clock Rates

11A EXAMPLE SOURCE CODE
  11A.1 DiscreteDelay

12 SYNCHRONIZATION RECOVERY
  12.1 Linear Phase-Locked Loops
  12.2 Digital Phase-Locked Loops
    12.2.1 Phase-Frequency Detector
  12.3 Phase-Locked Demodulators
    12.3.1 Squaring Loop
    12.3.2 Costas Loop

12A EXAMPLE SOURCE CODE
  12A.1 DigitalPLL

13 CHANNEL MODELS
  13.1 Discrete Memoryless Channels
    13.1.1 Binary Symmetric Channel
    13.1.2 Other Binary Channels
    13.1.3 Nonbinary Channels
  13.2 Characterization of Time-Varying Random Channels
    13.2.1 System Functions
    13.2.2 Randomly Time-Varying Channels
  13.3 Diffuse Multipath Channels
    13.3.1 Uncorrelated Tap Gains
    13.3.2 Correlated Tap Gains
  13.4 Discrete Multipath Channels

14 MULTIRATE SIMULATIONS
  14.1 Basic Concepts of Multirate Signal Processing
    14.1.1 Decimation by Integer Factors
    14.1.2 Interpolation by Integer Factors
    14.1.3 Decimation and Interpolation by Noninteger Factors
  14.2 Filter Design for Interpolators and Decimators
    14.2.1 Interpolation
    14.2.2 Decimation
  14.3 Multirate Processing for Bandpass Signals
    14.3.1 Quadrature Demodulation
    14.3.2 Quadrature Modulation

15 MODELING DSP COMPONENTS
  15.1 Quantization and Finite-Precision Arithmetic
    15.1.1 Coefficient Quantization
    15.1.2 Signal Quantization
    15.1.3 Finite-Precision Arithmetic
  15.2 FIR Filters
  15.3 IIR Filters

16 CODING AND INTERLEAVING
  16.1 Block Codes
    16.1.1 Cyclic Codes
  16.2 BCH Codes
  16.3 Interleavers
    16.3.1 Block Interleavers
    16.3.2 Convolutional Interleavers
  16.4 Convolutional Codes
    16.4.1 Trellis Representation of a Convolutional Encoder
    16.4.2 Viterbi Decoding
  16.5 Viterbi Decoding with Soft Decisions

A MATHEMATICAL TOOLS
  A.1 Trigonometric Identities
  A.2 Table of Integrals
  A.3 Logarithms
  A.4 Modified Bessel Functions of the First Kind
    A.4.1 Identities

B PROBABILITY DISTRIBUTIONS IN COMMUNICATIONS
  B.1 Uniform Distribution
  B.2 Gaussian Distribution
  B.3 Exponential Distribution
  B.4 Rayleigh Distribution
    B.4.1 Relationship to Exponential Distribution
  B.5 Rice Distribution
    B.5.1 Marcum Q Function

C GALOIS FIELDS
  C.1 Finite Fields
    C.1.1 Fields
  C.2 Polynomial Arithmetic
  C.3 Computer Generation of Extension Fields
    C.3.1 Computer Representations for Polynomials
    C.3.2 Using a Computer to Find Primitive Polynomials
    C.3.3 Programming Considerations
  C.4 Minimal Polynomials and Cyclotomic Cosets

D REFERENCES

INDEX
PREFACE
Modern communications systems and the devices operating within
these systems would not be possible without simulation, but practical information specific to the simulation of communications systems is relatively scarce. My motive for writing this book was to collect and capture in a useful form the techniques that can be used to simulate a wireless communication system using C++. It has been my experience that organizations newly confronted with a need to simulate a communication system are in a rush to get started. Consequently, these organizations will purchase a commercial simulation package like SPW or MATLAB Simulink without even considering the alternative of constructing their own simulation using C++. In the beginning, progress comes quickly as simple systems are configured from standard library models. Only when they begin to model the more complex proprietary parts of their systems do these organizations begin to realize how much control and flexibility they sacrificed in going with a commercial package. It is not possible for any library of precoded models to be absolutely complete. There will always be a need to build a highly specialized model or make modifications to existing models. A user attempting to do either, using a commercial package, usually spends more time dealing with the rules and limitations of the simulation infrastructure than with the details of the model algorithms themselves. In the mid 1990s, I was the architect and lead designer for a proprietary simulation package that was used to simulate the wireless data communication links in several very large U.S. defense systems. This package wasn’t perfect—software never is—but I drew upon this experience, and while writing this book, I developed a simpler simulation package that avoids many of the complexities and objectionable features of my earlier effort. This new package is called PracSim, which is short for Practical Simulation. 
All of the source code for the models and infrastructure comprising the PracSim package is provided on the Prentice Hall Web site (http://authors.phptr.com/rorabaugh/). Examples of this code are presented and discussed throughout the book, but there is far too much code to include it all in the text. The library of PracSim models is not intended to be complete, but rather to provide a foundation that users can modify or build upon as needed to capture the nuances of the particular systems they are attempting to model. I didn’t keep accurate records, but I’m sure that construction of the PracSim software took far more time than the actual writing of the text.
I would like to thank my wife Joyce, son Geoffrey, daughter Amber, and mother-in-law Eleanor for not complaining too much about all the time I spent on this project and for dealing with all of the household problems that I never seemed to have time for. I would also like to thank my editor, Bernard Goodwin, for his patience despite the numerous times that I postponed delivery of the final manuscript.
Chapter 1
SIMULATION: BACKGROUND AND OVERVIEW
Modern communications systems and the devices operating within these systems would not be possible without simulation. The expanded use of digital signal processing techniques has spawned cell phones and wireless transceivers that offer incredible performance and features at a per-unit cost that puts them within the reach of nearly everyone. However, these low per-unit costs are achieved through mass production of hundreds of thousands or even millions of units from a single design. The design of a new cell phone or wireless modem for a PDA is a very complex and expensive affair. Because of the complexity in such devices, it is not practical to breadboard prototypes for testing until after the design has been exhaustively tested and honed using simulation.
Even after a new device has been prototyped, it is usually impractical to test it under every possible combination of operating conditions. For example, the nature of CDMA and GSM cellular phone systems is such that all of the phones in a given area unavoidably interfere with each other. The phones and base stations all include processing to mitigate this interference, but the severity of the interference and the effectiveness of the countermeasures depend upon the relative locations, with respect to the base station tower, of all the potentially interfering phones. Assessment of the interference is complicated by the fact that the phones can individually vary their transmit powers via power-control loops executing in the phones or in response to commands from the base station. Analysis is impossible and exhaustive testing is impractical. Simulation using carefully constructed models of the phones and base station is the only answer. In the design of nearly any type of communications equipment, simulation provides an inexpensive way to explore possibilities and design trades before the more expensive process of prototyping is initiated.
1.1 Communication Systems
There are many aspects to the operation of a large complex communication system, and simulations of various kinds can be used to assess the system’s performance with respect to these various aspects. Consider a typical cellular phone system. At the lowest level, there is the radio link between the mobile phone and the base station tower. The performance of this link is degraded by additive noise, interference from other phones, interference from other towers, interference from other noncellular man-made sources, attenuation of the RF signal, and multipath propagation. The system architects can employ a number of techniques to combat these sources of degradation. These techniques include selection of modulation technique, transmit power control, improved receiver sensitivity, diversity combining, error-correction coding, interleaving, equalization, RAKE demodulator designs and interference cancellation. Analytically assessing the performance of these techniques in various combinations is always difficult and often impossible. Simulation is often the only practical way to estimate the performance of the link without actually building and testing it. The simulation techniques presented in this book are concerned with the performance of this link. Other aspects of the mobile-to-base-station link (such as the capacity and throughput limitations of a particular multiple-access protocol under various traffic-loading conditions) involve discrete event simulations of the sort used to model local area network protocols and are covered elsewhere [1]. Another variant of discrete event simulation would be needed to assess the performance of a particular geographic deployment of a cell cluster with respect to tower-to-tower handover of fast moving mobiles on a nearby interstate highway.
1.2 Simulation Process
The simulation process begins with an analysis of the system to be simulated. Obviously, the nature of this analysis depends upon the nature of the system and the maturity of the design. If the system has already been completely designed, the simulation effort can immediately focus on the selection or development of high-fidelity models for constituent parts of the system. On the other hand, if the simulation effort is being mounted for the purpose of assisting in system architecture decisions, the effort might begin with textbook models of channel impairments and idealized subsystems. In many satellite communication systems, the transmit power amplifiers onboard the satellite introduce significant amounts of signal distortion and the rest of the system must be designed to tolerate or mitigate this distortion. In the early stages of architecting the system, simulations using a detailed model of the power amplifier along with idealized, or perhaps “typical-performance,” models of other components might be used to decide whether the amplifier-induced distortions are best mitigated by predistortion at the transmitter, equalization at the receiver, or some combination of both.
For early-stage, “broad-brush” estimates of system performance, simulations can use idealized models that are implemented directly from textbook descriptions of modulators, demodulators, codecs, and equalizers. These are the types of models usually included with commercial simulation packages. In other situations, high-fidelity models that include detailed second- and third-order behaviors of the actual devices must be used. At 110 bits per second, an equalizer can be implemented using digital signal processing (DSP) techniques, and performance can be freely traded against cost and complexity using “canned” models that capture the quantization strategy of the proposed implementations. At one gigabit per second, equalizers must be implemented using analog techniques, and the irreducible distortions that remain in a state-of-the-art design must be captured in a handcrafted model that is validated against measurements and circuit-level simulations of the proposed or brassboard device. Models of nonlinear power amplifiers almost always need to be validated against measurements of the as-built device.
1.3 Simulation Programs
Thirty years ago, the few simulations of truly large communication systems were performed using large, hand-coded FORTRAN programs. Because of the computation-intensive nature of simulation and the limited computer speeds available at the time, these programs were tightly integrated monoliths of code that used subroutines only when portions of the code needed to be written in assembly language. Now simulations are performed using modular packages, either proprietary or commercial off-the-shelf (COTS), that emphasize flexibility and ease of use.
PracSim is a modular set of simulation models and connective infrastructure that was developed during the writing of this book. The infrastructure is discussed at length in Chapter 2, and the models are mentioned throughout all of the other chapters. The source code for both the infrastructure and models is available on the Prentice Hall Web site (http://authors.phptr.com/rorabaugh/). The models include various sources of performance degradation that are often left out of the “textbook” models included with some of the popular commercial simulation packages. However, these models cannot be considered truly “industrial grade” because their internal coding is structured for tutorial clarity rather than for maximum execution speed.
Chapter 2
SIMULATION INFRASTRUCTURE
Among designers of real-time software, there is a tendency to shun object-oriented programming in C++ in favor of hand-optimized programming in a mix of C and assembly. Because they often involve very short sampling intervals and very long durations, simulations can take a long time to execute. The lengthy execution time creates an incentive to apply runtime optimization techniques to simulation programs. However, simulation tools also need to be convenient and easy to use correctly. When designing the simulation software for this book, the convenience and robustness of an object-oriented approach won out over execution speed advantages potentially offered by other approaches.
2.1 Parameter Input
Simulations, and the models from which they are constructed, require a number of input parameters. Usually the total number of parameters for a simulation is large enough to make it impractical to enter them interactively each time the simulation is executed. PracSim provides a capability to read input parameters from a simply formatted text file. The names of the input file and several output files are tied to the name of the simulation. The simulation’s name is established at the beginning of the main program by the macro definition for SIM_NAME. To name the simulation BpskSim, the definition would be

    #define SIM_NAME "BpskSim\0"

An extract from a typical parameter file is shown in Table 2.1. The file is divided into sections for the system and each constituent model. Each new section begins with a line containing a single dollar sign. The second line of each section is the name of the model instance to which the parameters pertain. The section for system-level parameters uses system for the name of the model instance. All of the other sections use the instance names that are passed into each model constructor by main. The final section is tagged SignalPlotter and specifies which system-level signals are to be plotted along with some constraints on how they are to be plotted. The file ends with a line containing $EOF.
2.1.1 Individual Parameter Values
To read a value into a double variable named Pulse_Duration, the constructor for the wav_gen instance of the BitsToWave model would have to invoke ParmFile::GetDoubleParm using the syntax

    Pulse_Duration = ParmInput->GetDoubleParm("Pulse_Duration");

In order for this method to be successful, the input file section for model instance wav_gen must contain a line that assigns a value to Pulse_Duration, such as

    Pulse_Duration = 1.0

The header file parmfile.h provides a number of macro definitions that can be used to simplify this syntax to

    GET_DOUBLE_PARM(Pulse_Duration);

Similar macro definitions are provided for other parameter types:

    GET_INT_PARM(X)
    GET_BOOL_PARM(X)
    GET_LONG_PARM(X)
    GET_FLOAT_PARM(X)
    GET_STRING_PARM(X)
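One plausible way for such macros to work is the preprocessor stringizing operator, which lets the variable name double as the lookup key. The sketch below is an assumption about the general technique, not the contents of parmfile.h: DemoParmFile, DEMO_GET_DOUBLE_PARM, and DEMO_GET_INT_PARM are hypothetical names, and the macros assume a pointer named ParmInput is in scope, as in the call shown above.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the parameter reader; not the PracSim ParmFile class.
class DemoParmFile {
 public:
  // Store the raw text for one parameter, as if it had been read from a file.
  void Set(const std::string& name, const std::string& text) { values_[name] = text; }

  // Look up a parameter by name and convert it; at() throws if it is missing.
  double GetDoubleParm(const char* name) const { return std::stod(values_.at(name)); }
  int GetIntParm(const char* name) const { return std::stoi(values_.at(name)); }

 private:
  std::map<std::string, std::string> values_;
};

// Assumed macro shape: #X turns the variable name into the string used as the
// lookup key, so the name is written only once at the call site.
#define DEMO_GET_DOUBLE_PARM(X) X = ParmInput->GetDoubleParm(#X)
#define DEMO_GET_INT_PARM(X) X = ParmInput->GetIntParm(#X)
```

With this shape, DEMO_GET_DOUBLE_PARM(Pulse_Duration); expands to Pulse_Duration = ParmInput->GetDoubleParm("Pulse_Duration"); so the variable and the parameter file entry cannot drift apart.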
2.1.2 Parameter Arrays
To read values into a double array named Tone_Gain, the constructor for the SinesInAwgn model would have to invoke ParmFile::GetDoubleParmArray using the syntax

    Tone_Gain = ParmInput->GetDoubleParmArray("Tone_Gain\0", Tone_Gain, Num_Sines);

where Num_Sines is the number of elements to be read into the Tone_Gain array. If Num_Sines equals 3, and the input file section for the instance of SinesInAwgn being initialized contains a line of the form

    Tone_Gain = 5.0,100.0,5.0

then Tone_Gain[0] is set to 5.0, Tone_Gain[1] is set to 100.0, and Tone_Gain[2] is set to 5.0. The header file parmfile.h provides a number of macro definitions that can be used to simplify this syntax to

    GET_DOUBLE_PARM_ARRAY(Tone_Gain,Num_Sines);

Table 2.1 Extract from parameter file.

    $
    system
    Date_In_Short_Rpt_Name = false
    Date_In_Full_Rpt_Name = false
    Max_Pass_Number = 50
    $
    symb_gen
    Initial_Seed = 7733115
    Bits_Per_Symb = 3
    $
    m_psk_mod
    Bits_Per_Symb = 3
    Samps_Per_Symb = 16
    Symb_Duration = 1.0
    $
    spec_analyzer_1
    Kind_Of_Spec_Estim = SPECT_CALC_BARTLETT_PDGM
    Num_Segs_To_Avg = 600
    Seg_Len = 4000
    Fft_Len = 4096
    Norm_Factor = 1.0
    Hold_Off = 0
    Psd_File_Name = test_sig_psd
    Freq_Norm_Factor = 1.0
    Output_In_Decibels = true
    Plot_Two_Sided = true
    Plot_Relative_To_Peak = true
    Halt_When_Completed = false
    Num_Bits_Per_Symb = 3
    Time_Const_For_Pwr_Mtr = 30.0
    Seed = 69069
    Sig_Pwr_Meas_Enabled = true
    Outpt_Pwr_Scaling_On = false
    Sig_Filtering_Enabled = false
    $
    SignalPlotter
    Num_Plot_Sigs = 2
    symb_vals, 0.0, 500.0, 1, 1, 0
    modulated_signal, 0.0, 500.0, 1, 0, 0
    $
    $EOF
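The comma-separated array value described in this section could be split along the following lines. This is a minimal sketch under stated assumptions: ParseDoubleArray is a hypothetical helper rather than a PracSim function, and throwing when the element count does not match is a choice made here for illustration, since the text does not say how PracSim handles a mismatch.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: split text such as "5.0,100.0,5.0" into exactly `count` doubles.
std::vector<double> ParseDoubleArray(const std::string& text, std::size_t count) {
  std::vector<double> values;
  std::istringstream fields(text);
  std::string field;
  while (std::getline(fields, field, ',')) {
    values.push_back(std::stod(field));  // std::stod skips leading whitespace
  }
  if (values.size() != count) {
    // Assumed policy: a wrong element count is treated as an input error.
    throw std::runtime_error("unexpected number of array elements");
  }
  return values;
}
```

Calling ParseDoubleArray("5.0,100.0,5.0", 3) yields the three gains in file order, matching the assignment of Tone_Gain[0] through Tone_Gain[2] described above.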
2.1.3 Enumerated Type Parameters
To improve the readability of the parameter input files, PracSim includes a number of enumerated types for specifying various model options. For example, the DiscreteDelay model described in Chapter 11 has four different operating modes, each represented by a value of the enumerated type DELAY_MODE_T. The possible values are DELAY_MODE_NONE, DELAY_MODE_FIXED, DELAY_MODE_DYNAMIC, and DELAY_MODE_GATED. These values are read from the parameter input file using the function GetDelayModeParm, which is provided in file delay_modes.h. This function is not a member of the ParmFile class. Table 2.2 summarizes the other enumerated types implemented in PracSim.
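Reading an enumerated parameter amounts to mapping the text found in the file onto an enumerator. The sketch below shows the idea for the four delay modes named above. It is an illustration only: the signature of GetDelayModeParm is not given in the text, so ParseDelayMode is a hypothetical helper, and both the enumerator order and the use of an exception for unknown text are assumptions.

```cpp
#include <stdexcept>
#include <string>

// The four modes named in the text; the numeric order is an assumption.
enum DELAY_MODE_T {
  DELAY_MODE_NONE,
  DELAY_MODE_FIXED,
  DELAY_MODE_DYNAMIC,
  DELAY_MODE_GATED
};

// Hypothetical helper: convert parameter-file text to the enumerated value.
DELAY_MODE_T ParseDelayMode(const std::string& text) {
  if (text == "DELAY_MODE_NONE") return DELAY_MODE_NONE;
  if (text == "DELAY_MODE_FIXED") return DELAY_MODE_FIXED;
  if (text == "DELAY_MODE_DYNAMIC") return DELAY_MODE_DYNAMIC;
  if (text == "DELAY_MODE_GATED") return DELAY_MODE_GATED;
  // Assumed policy: unrecognized text is an input error, not a silent default.
  throw std::invalid_argument("unknown delay mode: " + text);
}
```

Rejecting unknown text keeps a misspelled mode in the parameter file from quietly selecting the wrong behavior.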
2.1.4 System Parameters
System-level parameters are placed in the system section of the parameter input file. There are normally only three system-level parameters: two booleans, Date_In_Long_Report_Name and Date_In_Short_Report_Name, which are discussed in Section 2.4; and Max_Pass_Number, an int that specifies the final pass number after which the simulation terminates execution. System-level parameters are read by code located in files sim_preamble.cpp and sim_startup.cpp. The file sim_preamble.cpp should be inserted using the #include directive at the very beginning of the main program for every simulation. This code reads Max_Pass_Number directly and calls SimulationStartup, which reads the two system-level booleans.
Table 2.2 Enumerated types in PracSim.

    Type                  File                Using Module
    ADVANCE_MODE_T        adv_modes.h         ContinuousAdvance, DiscreteAdvance
    DELAY_MODE_T          delay_modes.h       ContinuousDelay, DiscreteDelay
    FILT_BAND_CONFIG_T    filter_types.h      AnalogFilterByIir, AnalogFilterByInteg, DenormalizedPrototype
    FILT_RESP_CONFIG_T    filter_resp.h       FilterResponse
    PCM_WAVE_KIND_T       wave_kinds.h        BasebandWaveform
    INTERP_MODE_T         interp_modes.h      ContinuousAdvance, ContinuousDelay, DftDelay
    KIND_OF_SPECT_CALC_T  spect_calc_kinds.h  SpectrumAnalyzer
    WINDOW_SHAPE_T        window_shapes.h     BartlettPeriodogram
    WINDOW_SHAPE_T        window_shapes.h     WelchPeriodogram
2.1.5 Signal-Plotting Parameters
The signal-plotting parameters are contained in the SignalPlotter section of the input file. This section always begins with the parameter Num_Plot_Sigs that indicates the number of signals to be plotted. This is followed by a series of one-line plot specifications each having a form like

    input_sig, 0.0, 500.0, 1, 1, 0
The first parameter in the specification is the name of the signal to be plotted. This name must match the signal’s name as it is defined in the main simulation program. The second and third parameters are the starting and stopping times of the signal interval to be plotted. The fourth parameter is the plot decimation rate. A value of 1 indicates that every available sample in the interval is to be written to the plot file. A value of k indicates that k − 1 samples are to be skipped between samples that are written to the plot file. The fifth parameter is a flag, which for most signals is set to zero. If the flag is set to 1, the plotting routine interprets the values provided for the start and stop times as sample counts rather than time values. For each signal plotted, PracSim creates a file that has the same name as the signal and an extension of .txt. Each sample appears on a separate line. The time value appears first on the line, followed by a comma and then the sample value. Complex signal values are written as two floating-point values separated by a comma.
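The plot specification fields and the decimation rule described above can be sketched as follows. The names PlotSpec, ParsePlotSpec, and SelectPlotSamples are hypothetical and are not taken from PracSim. The sketch assumes that the stop bound is inclusive and that decimation starts at the first sample inside the interval. The sixth field of the specification is not described in this section, so the sketch ignores it.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record holding the first five fields of one plot specification.
struct PlotSpec {
  std::string name;
  double start = 0.0;
  double stop = 0.0;
  int decim_rate = 1;
  bool count_vice_time = false;  // true: start/stop are sample counts
};

// Split a line such as "input_sig, 0.0, 500.0, 1, 1, 0" into its fields.
PlotSpec ParsePlotSpec(const std::string& line) {
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, ',')) fields.push_back(field);
  if (fields.size() < 5) throw std::invalid_argument("plot spec needs at least 5 fields");

  PlotSpec spec;
  const auto first = fields[0].find_first_not_of(" \t");
  const auto last = fields[0].find_last_not_of(" \t");
  if (first == std::string::npos) throw std::invalid_argument("missing signal name");
  spec.name = fields[0].substr(first, last - first + 1);
  spec.start = std::stod(fields[1]);
  spec.stop = std::stod(fields[2]);
  spec.decim_rate = std::stoi(fields[3]);
  spec.count_vice_time = (std::stoi(fields[4]) != 0);
  if (spec.decim_rate < 1) throw std::invalid_argument("decimation rate must be >= 1");
  return spec;
}

// Return the (time, value) pairs that would be written to the plot file:
// samples inside [start, stop], keeping one and then skipping decim_rate - 1.
std::vector<std::pair<double, double>> SelectPlotSamples(
    const PlotSpec& spec, const std::vector<double>& samples, double samp_intvl) {
  std::vector<std::pair<double, double>> out;
  const std::size_t step = static_cast<std::size_t>(spec.decim_rate);
  std::size_t in_window = 0;  // counts samples that fall inside the interval
  for (std::size_t n = 0; n < samples.size(); ++n) {
    const double t = static_cast<double>(n) * samp_intvl;
    const double pos = spec.count_vice_time ? static_cast<double>(n) : t;
    if (pos < spec.start || pos > spec.stop) continue;
    if (in_window++ % step == 0) out.emplace_back(t, samples[n]);
  }
  return out;
}
```

Each returned pair corresponds to one line of the plot file, with the time value first and the sample value second.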
2.2 Signals
In their simplest embodiment, signals in a simulation would be little more than buffer areas to hold some number of consecutive values from a sampled waveform or a discrete-time sequence. However, such a simple implementation would impose a significant amount of bookkeeping and housekeeping on the simulation models that write to or read from these buffers. PracSim implements signals as objects that each include a number of ancillary parameters and housekeeping methods in addition to the essential buffer for sample values.
In a hierarchical system, something as apparently simple as a signal’s name can be a source of complexity. At the system level, a signal might have a name like filtered_baseband_waveform. Within the filter model that creates this signal, it would be more appropriately called filtered_output or simply output_sig. Within a bit slicer model that takes this signal as an input, it might be called input_wave, while in a correlator model used to estimate delay, this signal might be named delayed_input. PracSim handles this situation by allocating a separate Signal object for each context in which a signal appears. One of these objects is designated as the master and all of the other objects for the same signal are linked or connected to this master. Only the master instance of Signal actually allocates the buffer space needed to store sample values; the connected instances all point to this common buffer.
The selection of the instance of Signal to be designated as the master instance is not arbitrary. A signal can exhibit “fan-out” in which it serves as an input for multiple models. However, a signal cannot usually exhibit “fan-in,” where a single input on a single model is driven by signals from multiple model outputs. Therefore, it makes sense for the master instance of Signal to be the one associated with the output of the model that actually generates the values to be placed in the buffer. This model controls the one and only write pointer for the sample buffer. Fan-out is accomplished by multiple subordinate instances of Signal that each maintain a separate read pointer for the sample buffer.
Signal is actually a class template that can be instantiated for a number of different types of sample values. PracSim includes specializations of Signal for types float, int, bit_t, and byte_t. A complete implementation of signals involves many attributes and methods that do not depend upon the type of the sample values, and these attributes and methods have been extracted into the nontemplate base class GenericSignal. Tables 2.3 through 2.5 list the attributes and methods belonging to Signal and GenericSignal.
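The master and connected-instance arrangement can be illustrated with a much-reduced sketch. It is not the PracSim Signal or GenericSignal code: DemoGenericSignal and DemoSignal are hypothetical classes, the shared buffer is a plain std::vector held through std::shared_ptr, and block handling, plotting, and buffer wraparound are all omitted. What it does show is a nontemplate base holding the type-independent name, a single write position owned by the master, and a separate read position in every instance.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Nontemplate base for attributes that do not depend on the sample type.
class DemoGenericSignal {
 public:
  explicit DemoGenericSignal(std::string name) : name_(std::move(name)) {}
  virtual ~DemoGenericSignal() = default;
  const std::string& GetName() const { return name_; }

 private:
  std::string name_;
};

template <typename T>
class DemoSignal : public DemoGenericSignal {
 public:
  // Master instance: the only one that allocates the sample buffer.
  DemoSignal(std::string name, std::size_t buf_len)
      : DemoGenericSignal(std::move(name)),
        shared_(std::make_shared<Shared>()),
        is_master_(true) {
    shared_->samples.resize(buf_len);
  }

  // Connected instance: a local name that points at the master's buffer.
  DemoSignal(const DemoSignal& master, std::string local_name)
      : DemoGenericSignal(std::move(local_name)),
        shared_(master.shared_),
        is_master_(false) {}

  // Only the master may write; there is one write position per buffer.
  bool Write(const T& value) {
    if (!is_master_ || shared_->written >= shared_->samples.size()) return false;
    shared_->samples[shared_->written++] = value;
    return true;
  }

  // Every instance keeps its own read position, which is what allows fan-out.
  bool Read(T& value) {
    if (read_pos_ >= shared_->written) return false;
    value = shared_->samples[read_pos_++];
    return true;
  }

 private:
  struct Shared {
    std::vector<T> samples;
    std::size_t written = 0;  // the single write position
  };
  std::shared_ptr<Shared> shared_;
  bool is_master_;
  std::size_t read_pos_ = 0;
};
```

In this sketch a filter model would own the master, for example DemoSignal<float> master("filtered_baseband_waveform", 4096), while a bit slicer and a correlator would each hold a connected instance created from it under their own local names. Each consumer then reads every sample once, independently of the other.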
10
Simulation Infrastructure
Chapter 2
Table 2.3 Summary of class template Signal. Constructors: Signal::Signal( char* name ); Signal::Signal( Signal* root id, char* name, PracSimModel* model ); Public Methods: ˜Signal(void); void AllocateSignalBuffer(void); void InitializeReadPtrs(void); T* GetRawOutputPtr(PracSimModel* model); T* GetRawInputPtr(PracSimModel* model); Signal* AddConnection( PracSimModel* model, char* name in model ); void Dump(ofstream); void PassUpdate(void); void SetupPlotSignal(void); void IssuePlotterData(void); Private Attributes: T *Buf Beg T *Phys Buf Beg T *Buf Final Mem Beg T *Next Loc To Plot Notes: 1. This class inherits all methods belonging to GenericSignal. 2. Source code is contained in file signal t.cpp.
2.2.1 Signal Management Strategy
The simplest approach to controlling the flow of signal samples in a simulation would be to have a constant sampling interval throughout the simulation and to have each model, each time invoked, read one sample from each of its input signals and generate one sample for each of its output signals. However, due to the overhead processing incurred every time a model is invoked, it is more efficient to have each model process a block containing multiple samples upon each invocation. Some models,
Section 2.2
Signals
Table 2.4 Summary of class GenericSignal.

Constructors:
  GenericSignal( char* name, PracSimModel* model );
Public Methods:
  ~Signal(void);
  ~GenericSignal(void);
  int GetBlockSize();
  int GetValidBlockSize();
  void SetBlockSize(int block_size);
  void SetValidBlockSize(int block_size);
  char* GetName(void);
  double GetSampIntvl(void);
  void SetSampIntvl(double samp_intvl);
  void GenericSignal::SetAllocMemDepth( int req_mem_depth );
  virtual void AllocateSignalBuffer(void){};
  GenericSignal* GetId();
  virtual void InitializeReadPtrs(void){};
  virtual void SetupPlotSignal(void){};
  virtual void IssuePlotterData(void){};
  void SetupPlotFile( GenericSignal* sig_id, double start_time, double stop_time, int plot_decim_rate, bool count_vice_time, bool header_desired);
  virtual void PassUpdate(void){};
  double GetTimeAtBeg(void);
  void SetTimeAtBeg(double time_at_beg);
  void SetEnclave(int enclave_num);
  int GetEnclave(void);
Notes:
  1. Source code is contained in file gensig.cpp.
such as those that use fast convolution or fast correlation, must process a block of a particular size each time invoked. Furthermore, in many situations, it is inconvenient
Table 2.5 Attributes of class GenericSignal.

Protected Attributes:
  int Buf_Len
  int Block_Size
  int Valid_Block_Size
  int Prev_Block_Size
  long Cumul_Samps_Thru_Prev_Block
  int Alloc_Mem_Depth
  double Samp_Intvl
  char* Name
  PracSimModel* Owning_Model
  GenericSignal* Root_Id
  bool Sig_Is_Root
  double Plot_Start_Time
  double Plot_Stop_Time
  int Plot_Decim_Rate
  ofstream* Plotter_File
  bool Plotting_Enabled
  bool Plot_Setup_Complete
  bool Count_Vice_Time
  int Start_Sample
  int Stop_Sample
  int Plotting_Wakeup
  int Plotting_Bedtime
  int Cumul_Samp_Cnt
  double Time_At_Beg
  int Enclave_Num
  std::vector *Connected_Sigs
Notes:
  1. Definitions are contained in file gensig.h.
to maintain a constant sampling interval throughout the simulation. Some models (such as BitsToWave from Chapter 3) have an inherent need to generate multiple output waveform samples for each input sample. Other models, discussed in Chapter 14, exist for the specific purpose of changing the sampling rates between their inputs and outputs. PracSim includes signal management infrastructure designed to facilitate the use of different sample rates at different places within a simulation. This
section explores the strategy behind this infrastructure, and Section 2.2.2 discusses the details of its implementation.

Consider a model that has one input signal and one output signal. The model and its signals can be represented using a graph, as shown in Figure 2.1. Each signal is represented by a vertex in the graph, and the model is represented by an edge. Each signal has several attributes that are of interest in a discussion of multirate simulation. A processing block size and a sampling interval are associated with each signal. In a single-rate simulation, every signal has the same block size and the same sampling interval. In a multirate simulation, different signals may have different block sizes and different sampling rates, and the simulation can contain both single-rate models and multirate models. For a single-rate model, every input signal and every output signal has the same block size and the same sampling interval. For a multirate model, each signal can, in principle, have a different block size and a different sampling interval, subject to one significant constraint: the average time epoch covered by each signal block must be constant across the entire simulation. In other words, for each model,

$$N_{out} T_{out} = N_{in} T_{in}$$

where $N$ is the average block size and $T$ is the sampling interval. There is a resampling rate associated with each input/output pair for a multirate model. The value of the resampling rate $R$ is given by

$$R = \frac{N_{out}}{N_{in}} = \frac{T_{in}}{T_{out}}$$
Once determined, each value of Tin and Tout remains constant over the life of a simulation run. However, only the average values of Nin and Nout need to remain constant over the life of a simulation run. As discussed in Chapter 11, if there is a signal-shifting model in the upstream signal path, there may be slight variations in Nin from block to block. It would be possible to specify the values for N and T as parameter inputs for every model in the simulation, but this would be very cumbersome. Instead, PracSim has been designed to require that values for N and T be specified as input parameters for only a few critical places within a simulation. The PracSim signal management system (SMS) then propagates a consistent set of values for N and T throughout the simulation. To support this approach, each multirate model must convey certain information about itself to the SMS. The information to be conveyed can vary depending upon the nature of the model and the relationships between its inputs and outputs.
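The epoch constraint and the definition of the resampling rate can be checked numerically. The sketch below is only an illustration: the struct SigParms and the two helper functions are hypothetical names, and the values mentioned afterward are chosen for illustration rather than read from any PracSim setup file.

```cpp
#include <cassert>
#include <cmath>

// Signal parameters for one side of a model (hypothetical names).
struct SigParms {
    double block_size;  // N, average number of samples per block
    double samp_intvl;  // T, sampling interval
};

// R = N_out / N_in, which equals T_in / T_out when the epochs match.
inline double resamp_rate(const SigParms& in, const SigParms& out) {
    assert(in.block_size > 0.0);
    return out.block_size / in.block_size;
}

// True when both blocks cover the same time epoch: N_out*T_out == N_in*T_in,
// compared with a relative tolerance to allow for rounding.
inline bool epochs_match(const SigParms& in, const SigParms& out,
                         double rel_tol = 1e-9) {
    const double e_in = in.block_size * in.samp_intvl;
    const double e_out = out.block_size * out.samp_intvl;
    return std::fabs(e_in - e_out) <= rel_tol * std::fabs(e_in);
}
```

For an input with N = 236 and T = 1.0 feeding an output with N = 3776 and T = 0.0625, both blocks span 236 time units and the resampling rate works out to 16.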
Figure 2.1 Model graph. [Figure: vertex S0 (input_sig) joined to vertex S1 (output_sig) by edge M1 (model_A).]

2.2.1.1 Example System
This section uses a simple example system to explore some of the issues involved in the design of the SMS. For the system shown in Figure 2.2, let’s make the following assumptions:
• Encoder uses a code with $R_{code} = 1/2$.
• The bit pulses that are output from BitsToWave have a duration of 1.0 normalized time units.
• For the fast Fourier transform (FFT) in CmpxFrequencyDomainFilter, $T_{mem} = 20.0$, $T = 0.0625$, and $N = 4096$.

BitGener is a source: it has no input signals and it outputs a single output signal (S1) comprising a sequence of ones and zeros (one sample per bit). The model’s mission is to generate a block of $N_{out}$ bits each time its Execute method is invoked. The model is hardwired to produce one sample per bit. A value for $N_{out}$ must eventually be determined by the SMS, which sets the required size for this output block equal to the required input block size for models adjacent to BitGener. (The term adjacent is used in the nonreflexive, directed graph sense. In Figure 2.2, Encoder is adjacent to BitGener, but BitGener is not adjacent to Encoder.) This model’s constructor does not convey any signal parameters to the SMS. The required value for $N_{out}$ is obtained by the model’s Initialize method.

The Encoder model accepts as input a bit sequence (S1) having one sample per bit. The model encodes this bit sequence in accordance with the definition of the code being implemented. The result of encoding is an output sequence of bits containing more than one bit for each bit in the input sequence. If the input block size is $N_{in}$, the output block size $N_{out}$ will equal $N_{in}/R_{code}$, where $R_{code}$ is the code rate
Figure 2.2 Block diagram of example simulation. [Figure: models M1 BitGener, M2 Encoder, M3 BitsToWave, M4 Frequency Domain Filter, M5 BitSlicer, M6 Decoder, M7 Comb Error Counter, M8 Binary Error Counter #1, and M9 Binary Error Counter #2, connected by signals S1 through S6.]
for the particular code being implemented. The Encoder model “knows” that its resampling rate is equal to the inverse of the code rate, and therefore the constructor conveys the value $R_{resamp} = (R_{code})^{-1}$ to the SMS as the model’s resampling rate. The model cannot immediately determine values for $T_{in}$, $T_{out}$, $N_{in}$, or $N_{out}$. The values for these signal parameters are determined by the SMS based upon signal parameter values defined elsewhere in the system. Because Encoder has a value for $R_{resamp}$ defined in the model constructor, the SMS can immediately propagate signal parameters through this model. For this example, let’s assume $R_{code} = 1/2$.

The model BitsToWave accepts a sequence of ones and zeros and generates as output a train of rectangular pulses with a positive-valued pulse for each 1 in the input sequence and a negative-valued pulse for each 0 in the input sequence. The input signal (S2) has one sample per bit, and the output signal (S3) has multiple samples per bit. The user must provide a value for the pulse duration $T_P$. Because there is one pulse per input bit, the BitsToWave model knows that the sampling interval for its input signal is equal to $T_P$. The model’s constructor conveys this value to the SMS as the sampling interval $T_{in}$ for the input signal.
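The pulse mapping performed by a model of this kind can be illustrated with a short function. This is a sketch under stated assumptions, not the BitsToWave source: the function name bits_to_wave, the unit pulse amplitude, and the samples_per_bit argument (the pulse duration divided by the output sampling interval) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: each input bit becomes samples_per_bit output samples,
// +1.0 for a 1 bit and -1.0 for a 0 bit (unit amplitude is an assumption).
std::vector<float> bits_to_wave(const std::vector<int>& bits,
                                std::size_t samples_per_bit) {
    assert(samples_per_bit > 0);
    std::vector<float> wave;
    wave.reserve(bits.size() * samples_per_bit);
    for (int bit : bits) {
        const float level = (bit != 0) ? 1.0f : -1.0f;
        wave.insert(wave.end(), samples_per_bit, level);  // one rectangular pulse
    }
    return wave;
}
```

With a pulse duration of 1.0 and an output sampling interval of 0.0625, samples_per_bit would be 16, so a block of 236 input bits would produce 3776 output samples.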
FrequencyDomainFilter is one of the few models that determines its own block size. Most models have their block sizes determined for them by the SMS. A model such as FrequencyDomainFilter, which is based on the discrete Fourier transform (DFT), is an obvious place to fix both block size and sampling interval because of the relationship $F N T = 1$, which holds for all DFTs (here $F$ is the frequency spacing between DFT bins). In this model it is assumed that $R_{resamp} = 1$, $T_{in} = T_{out}$, and $N_{in} = N_{out}$. The user must provide values for the parameters $N_{FFT}$, $T_{FFT}$, and $T_{mem}$, where $T_{mem}$ is the effective time duration of the filter’s impulse response. As discussed in Chapter 7, Section 7.6, the number of saved samples in the overlap-and-save segmenting strategy must be sufficient to cover a time interval of at least $T_{mem}$. The model computes $N_{in}$ and $N_{out}$ as

$$N_{in} = N_{out} = N_{FFT} - \left\lfloor \frac{T_{mem}}{T_{FFT}} + 0.5 \right\rfloor$$

For the assumed values of $T_{mem} = 20.0$, $T_{FFT} = 0.0625$, and $N_{FFT} = 4096$, the resulting value for $N_{in}$ and $N_{out}$ is 3776. The model conveys the values of $T_{in}$, $T_{out}$, $N_{in}$, $N_{out}$, and $R_{resamp}$ to the SMS.

CombErrorCounter is a hypothetical model that generates a sampling comb that is applied to the input waveform (S4) once per symbol time. The sample values are compared to a bit sequence provided as the reference input (S2). This model was contrived to illustrate what happens when a model has inputs at two or more different sampling rates. This model does not convey any signal parameters to the SMS.

The model BitSlicer accepts an input signal (S4), which is a baseband binary waveform, and samples this waveform to generate a sequence of individual output bits. The input signal has multiple samples per bit, and the output signal has one sample per bit. The user must provide a value for the pulse duration $T_P$. Because there is one output bit per pulse, the BitSlicer model knows that the sampling interval for its output signal is equal to $T_P$. The model conveys this value to the SMS as the sampling interval for the output signal (S5).
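The block-size arithmetic for the frequency-domain filter is easy to verify. The helper below is a minimal sketch; the function name usable_block_size is hypothetical, and the formula it encodes is the FFT length minus the number of saved samples, where the saved-sample count is the memory duration divided by the sampling interval, rounded to the nearest integer.

```cpp
#include <cassert>
#include <cmath>

// N_in = N_out = N_FFT - floor(T_mem / T_FFT + 0.5)
// Hypothetical helper, not a PracSim function.
inline int usable_block_size(int n_fft, double t_fft, double t_mem) {
    assert(t_fft > 0.0);
    const int saved = static_cast<int>(std::floor(t_mem / t_fft + 0.5));
    return n_fft - saved;
}
```

With the assumed values, the filter memory spans 20.0 / 0.0625 = 320 samples, so usable_block_size(4096, 0.0625, 20.0) leaves 3776 new samples per block.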
The Decoder model accepts as input an encoded bit sequence (S5) and performs decoding in accordance with the definition of the particular code being implemented. Multiple input bits will be consumed in the generation of each bit in the output sequence (S6). The Decoder model knows that its resampling rate is equal to the code rate and therefore conveys the value $R_{resamp} = R_{code}$ to the SMS as the model’s resampling rate. The model cannot immediately determine values for $T_{in}$, $T_{out}$, $N_{in}$, or $N_{out}$. The values for these signal parameters are determined by the SMS based upon signal parameter values defined elsewhere in the system. Because Decoder has a value for $R_{resamp}$ defined in the model constructor, the SMS can immediately
propagate signal parameters through this model.

The BerCounter model compares a received bit sequence to the “true” or reference bit stream for the purpose of counting errors in the received bit sequence. The only signal parameter that this model needs to know is the block size $N_{in}$ for its two input signals. This value is obtained from the SMS by the model’s Initialize method.

All of the signal parameter information provided to the SMS by the model constructors is summarized in the directed graph shown in Figure 2.3. Each signal is represented by a node, and each model instance is represented by an edge in this graph. To avoid unterminated edges in this graph, a dummy source node is provided for BitGener, and dummy destination nodes are provided for CombErrorCounter and both instances of BerCounter.

2.2.1.2 Propagation of Signal Parameters
PracSim models are carefully designed to make maximum use of known relationships between input and output signal parameters, and to provide sufficient information to the simulation infrastructure so that missing signal parameters can be computed from relationships between models that are connected to each other. To generate these missing parameters, the PracSim infrastructure follows these steps:
1. The SMS must search through the graph until it finds a node for which both time increment and block size are defined. Such a node represents a base node from which the SMS can begin to propagate time increments and block sizes to other nodes in the graph. In this example, such a search will find node S3. From the discussion of FrequencyDomainFilter in the previous section, we can determine the time increment $T_{S3} = 0.0625$ and block size $N_{S3} = 3776$. The SMS attempts to propagate “upstream” from the base node by locating (1) an incident edge to the base node and (2) the source node for this incident edge. In this example, the SMS would locate edge M3 and node S2. Parameter propagation along an edge in the graph occurs in one of two ways, depending on what parameters have already been defined for the edge and its source node:
   (a) If a resampling rate is defined for the edge, the SMS can use this value to calculate the time increment and block size for the source node from the time increment and block size defined for the base node:

$$T_{source} = R_{resamp} T_{base} \tag{2.2.1}$$

$$N_{source} = \frac{N_{base}}{R_{resamp}} \tag{2.2.2}$$
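Figure 2.3 Signal dependency graph after all models have been constructed. [Figure: nodes S0 through S6 and D1 through D3, joined by edges M1 through M6, M7a, M7b, M8a, M8b, M9a, and M9b, annotated with the parameters known at this stage: $R_{M2} = 1/R_{code} = 2$; $T_{S2} = T_P = 1.0$; $T_{S3} = T_{FFT} = 0.0625$; $N_{S3} = N_{FFT} - \lfloor T_{mem}/T_{FFT} + 0.5 \rfloor = 3776$; $T_{S4} = T_{FFT} = 0.0625$; $N_{S4} = N_{S3} = 3776$; $T_{S5} = T_P = 1.0$; $R_{M6} = R_{code} = 0.5$.]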
In this example, there is no resampling rate yet defined for edge M3.
   (b) If no resampling rate is defined for the edge, the SMS must turn to the edge’s source node. If either time increment or block size is defined for the source node, the SMS can use this information in Eqs. (2.2.1) and (2.2.2) to compute the resampling rate for the edge and the missing quantity for the source node. In this example, the time increment for node S2 has been defined to be the pulse duration $T_P$. For the sake of concreteness, let’s say $T_{S2} = T_P = 1.0$. Then, using the block size $N_{S3} = 3776$ found for the base node, we can write

$$R_{resamp} = \frac{T_{source}}{T_{base}} = \frac{T_{S2}}{T_{S3}} = \frac{1}{0.0625} = 16$$

$$N_{source} = \frac{N_{base}}{R_{resamp}} = \frac{N_{S3}}{R_{resamp}} = \frac{3776}{16} = 236$$
2. After both $T_{S2}$ and $N_{S2}$ have been determined, the SMS treats S2 as a temporary base node and attempts another step of upstream propagation. The new incident edge of interest is M2, and the source node for M2 is S1. It so happens there is a resampling rate defined for M2, and it is $R_{resamp} = (R_{code})^{-1} = 2$. Thus, with $N_{S2} = N_{S3}/16 = 236$, we can write

$$T_{S1} = R_{M2} T_{S2} = 2$$

$$N_{S1} = \frac{N_{S2}}{R_{M2}} = \frac{236}{2} = 118$$

3. Node S0 and edge M1 are special cases.
4. Now the SMS begins attempts at forward or downstream propagation. A choice must be made here, and the decision is not immediately obvious. The SMS can begin its campaign of forward propagation at node S1, or it could move forward to the original base node S3 and begin from there. The SMS arrived at S1 via backward propagation through edge M2, so there is no need to attempt forward propagation through M2. There is a second departing edge, M9a. The destination node for edge M9a is vertex D3, which is a sink node. Sink nodes exist strictly for keeping the graph tidy, and they do not have a time increment or a block size.
5. After discovering D3 to be a sink node, the SMS backs up to S1 and then moves forward from S1 to S2. Departing from S2 are two previously untraversed edges, M7a and M8a, which lead to D1 and D2.
6. Both of these edges terminate in sink nodes, so the SMS moves forward to S3 and attempts forward propagation to S4 via M4. Both $T_{S4}$ and $N_{S4}$ are already defined, so the SMS computes $R_{M4}$ as

$$R_{M4} = \frac{N_{S4}}{N_{S3}} = \frac{3776}{3776} = 1$$

and then confirms this value as

$$R_{M4} = \frac{T_{S3}}{T_{S4}} = \frac{0.0625}{0.0625} = 1$$

7. The SMS moves forward to S4 and attempts forward propagation to S5 via edge M5. The time increment has been defined, so the resampling rate for M5 can be computed as

$$R_{M5} = \frac{T_{S4}}{T_{S5}} = \frac{0.0625}{1} = 0.0625$$

and the block size for S5 can be computed as

$$N_{S5} = R_{M5} N_{S4} = (0.0625)(3776) = 236$$

8. The SMS moves forward to S5 and attempts forward propagation to S6 via edge M6. The resampling rate $R_{M6}$ has been defined, so the time increment and block size for S6 can be computed as

$$T_{S6} = \frac{T_{S5}}{R_{M6}} = \frac{1}{(1/2)} = 2$$

$$N_{S6} = R_{M6} N_{S5} = (1/2)(236) = 118$$
Figure 2.4 shows the directed graph after all signal parameters have been propagated throughout the system.
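The propagation rules can be exercised on the example system with a small program. The sketch below is not the SMS algorithm: instead of the base-node search and upstream/downstream traversal described in the steps, it simply sweeps over all edges until nothing new can be defined, and it omits the dummy nodes and all consistency checks. The types Node and Edge, the use of 0 to mean "undefined," and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { double T; double N; };                        // 0 means "not yet defined"
struct Edge { std::size_t src; std::size_t dst; double R; };

// Applies T_src = R * T_dst and N_src = N_dst / R (and their inverses)
// along every edge until a full sweep defines nothing new.
void propagate(std::vector<Node>& nodes, std::vector<Edge>& edges) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (Edge& e : edges) {
            assert(e.src < nodes.size() && e.dst < nodes.size());
            Node& s = nodes[e.src];
            Node& d = nodes[e.dst];
            if (e.R == 0.0) {  // derive the edge's resampling rate if possible
                if (s.T > 0.0 && d.T > 0.0)      { e.R = s.T / d.T; changed = true; }
                else if (s.N > 0.0 && d.N > 0.0) { e.R = d.N / s.N; changed = true; }
            }
            if (e.R == 0.0) continue;
            if (s.T == 0.0 && d.T > 0.0) { s.T = e.R * d.T; changed = true; }
            if (d.T == 0.0 && s.T > 0.0) { d.T = s.T / e.R; changed = true; }
            if (s.N == 0.0 && d.N > 0.0) { s.N = d.N / e.R; changed = true; }
            if (d.N == 0.0 && s.N > 0.0) { d.N = e.R * s.N; changed = true; }
        }
    }
}

// Example system before propagation; indices 0..5 stand for S1..S6.
std::vector<Node> example_nodes() {
    return {
        {0.0, 0.0},        // S1
        {1.0, 0.0},        // S2: T = T_P
        {0.0625, 3776.0},  // S3: set by the frequency-domain filter
        {0.0625, 3776.0},  // S4: set by the frequency-domain filter
        {1.0, 0.0},        // S5: T = T_P
        {0.0, 0.0}         // S6
    };
}

std::vector<Edge> example_edges() {
    return {
        {0, 1, 2.0},  // M2, Encoder: R = 1 / R_code
        {1, 2, 0.0},  // M3, BitsToWave
        {2, 3, 0.0},  // M4, filter
        {3, 4, 0.0},  // M5, BitSlicer
        {4, 5, 0.5}   // M6, Decoder: R = R_code
    };
}
```

Running propagate on example_nodes() and example_edges() yields the values shown in Figure 2.4: block sizes of 118, 236, 3776, 3776, 236, and 118 for S1 through S6, and resampling rates of 16 for M3 and 0.0625 for M5.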
2.2.2 SMS Implementation
In order to make systematic use of the signal parameters provided by individual model constructors, the SMS must construct a directed graph, or digraph, similar to the one shown in Figure 2.3. This graph is used to determine dependencies between the various signals, controls, and model instances. In this graph, signals and controls are represented as vertices, and models are represented as directed edges going from the vertices representing model inputs to vertices representing model outputs. For
Figure 2.4 Signal dependency graph with all node and edge parameters defined. [Figure: the graph of Figure 2.3 annotated with $T_{S1} = R_{M2} T_{S2} = 2.0$; $N_{S1} = N_{S2}/R_{M2} = 118$; $R_{M2} = 2$; $T_{S2} = 1.0$; $N_{S2} = N_{S3}/R_{M3} = 236$; $R_{M3} = T_{S2}/T_{S3} = 16$; $T_{S3} = 0.0625$; $N_{S3} = 3776$; $R_{M4} = N_{S4}/N_{S3} = 1$; $T_{S4} = 0.0625$; $N_{S4} = 3776$; $R_{M5} = T_{S4}/T_{S5} = 0.0625$; $T_{S5} = 1.0$; $N_{S5} = R_{M5} N_{S4} = 236$; $R_{M6} = 0.5$; $T_{S6} = T_{S5}/R_{M6} = 2.0$; $N_{S6} = R_{M6} N_{S5} = 118$.]
a single-rate model, the default is to define an edge from each input vertex to each output vertex. For a multirate model, each edge must be explicitly specified by the model constructor. The total “connection picture” for a model is built up using a series of declarations in the model constructor. Eventually, the pictures for all models are merged into one large graph for the entire active system. Until the merging can be accomplished, the connection pattern for each model is stored in an instance of the ModelGraph class.

2.2.2.1 Top-Level Approach
The implementation of the signal management strategy is spread across a number of different classes and methods. This section outlines the sequence of signal-related events that take place as a simulation is constructed, initialized, and executed. Each event is discussed in greater detail in subsequent sections.
1. The program main allocates the master instance of each Signal object that will be used in the simulation. (Section 2.2.2.2.)
2. The program main calls the various model constructors in the order that the models are to be executed. In the current implementation of PracSim, the user must establish this sequence manually by the order in which the constructor calls are placed in main. As discussed on the companion Web site, some infrastructure is provided to support an eventual migration to a graphical specification of the system topology, with the execution sequence of the models determined automatically based on signal dependencies between the models. In the current implementation of PracSim, pointers to the appropriate input and output Signal objects are passed in the call to each constructor.
3. For each model constructed,
   (a) If the model being constructed is not the first model in the simulation, the constructor for the base class PracSimModel calls the method PracSimModel::CloseoutModelGraph for the previous model. This causes the model graph for the previous model to be integrated into the system graph. (Section 2.2.2.4.)
   (b) The constructor for the base class PracSimModel creates an instance of ModelGraph. (Section 2.2.2.3.)
   (c) The constructor for each specific derived model class takes the following actions with respect to signals:
       i. For multirate models, the constructor enables multirate operation using the macro ENABLE_MULTIRATE. (Section 2.2.2.3.)
       ii. The constructor reads parameters from the simulation setup file. Some of these parameters may pertain to input or output signal characteristics.
       iii. The model constructor copies each passed-in Signal pointer to a corresponding class variable.
       iv. The model constructor registers each output signal with the current instance of ModelGraph using the macro MAKE_OUTPUT(X). (Section 2.2.2.3.)
       v. The model constructor registers each input signal with the current instance of ModelGraph using the macro MAKE_INPUT(X). (Section 2.2.2.3.)
       vi. If appropriate for the specific model being constructed, the constructor sets edge (i.e., model) parameters in the current model graph using the macro CHANGE_RATE(X,Y,Z) or SAME_RATE(X,Y). (Section 2.2.2.3.)
       vii. If appropriate for the specific model being constructed, the constructor sets node (i.e., signal) parameters in the current model graph using SET_SAMP_INTVL(X,Y) or SET_BLOCK_SIZE(X,Y). (Section 2.2.2.3.)
4. After the final model constructor has completed, main calls the Executive method MultirateSetup, which performs the following actions:
   (a) Calls CloseoutModelGraph for the final model.
   (b) Initializes the SigPlot class.
   (c) Invokes SystemGraph::ResolveSignalParms, which uses the strategy presented in Section 2.2.1.2 to propagate known signal parameters throughout the entire system graph.
   (d) Invokes SystemGraph::DistributeSignalParms, which sets the propagated values of signal parameters within each individual signal object.
   (e) Invokes SystemGraph::AllocateStorageBuffers, which causes the master instance of each signal to allocate the buffer space needed to store a block of signal samples. This allocation is deferred to this point in the processing because not all Signal objects “know” how large
       their buffers are until after propagated signal parameters have been set in step 4d.
   (f) Invokes SystemGraph::InitializeReadPtrs, which causes the master instance of each signal to initialize the buffer read pointers in subordinate Signal objects that are connected to the master instance.
   (g) Invokes SystemGraph::AllocatePlotPointers, which causes the master instance of each signal to create and initialize a complicated structure that is essentially a signal buffer read pointer used by the signal-plotting subsystem. The buffer pointers allocated in step 4f are simple pointers that get reset to the beginning of the buffer area on each pass through the simulation. The plotting pointers are complicated by the fact that the plotting of a signal can span many passes and may not even start until after the simulation has been running for some amount of time.
   (h) Invokes SystemGraph::InitializeModels, which runs the initialization method of every model instance in the system. Invoking the Initialize methods at this point allows these methods to include any calculations that depend upon the availability of valid signal parameters.
5. The simulation enters a loop that invokes SystemGraph::RunSimulation once for each simulation pass. RunSimulation performs the following tasks:
   (a) Calls the Execute method for each model in the same sequence that the models were constructed. Each time called, the model processes one block’s worth of signal samples.
   (b) Calls the PassUpdate method for each signal in the system after each model has been invoked.

2.2.2.2 Signal Allocation
The master instance of each Signal object is allocated in main prior to calling the first model constructor. Signal is a class template that can be specialized for a number of different signal types. The constructor for Signal takes an input argument that is a string containing the name of the signal by which it is known at the system level. The calling syntax to allocate a float-valued signal named tx_signal would be:

  Signal<float>* tx_signal = new Signal<float>("tx_signal");
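Because the pointer variable and the signal name are normally spelled the same way, this kind of allocation lends itself to a macro built on the preprocessor stringizing operator. The following is a minimal sketch of the idea only: NamedSignal is a hypothetical stand-in for the Signal template that keeps nothing but the name, and DEMO_FLOAT_SIGNAL is a hypothetical macro, not a definition taken from sigstuff.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Signal class template; only the name is kept.
template <typename T>
class NamedSignal {
public:
    explicit NamedSignal(const char* name) : name_(name ? name : "") {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Hypothetical convenience macro: declares a pointer named X and uses the
// stringizing operator (#X) so that the signal's name matches the variable.
#define DEMO_FLOAT_SIGNAL(X) NamedSignal<float>* X = new NamedSignal<float>(#X)
```

Writing DEMO_FLOAT_SIGNAL(tx_signal); then declares tx_signal and names the signal "tx_signal" in a single statement, so the two spellings cannot drift apart.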
A number of macros have been defined in sigstuff.h to simplify the syntax for allocating different specialized Signal objects:

  BIT_SIGNAL(X);
  BYTE_SIGNAL(X);
  INT_SIGNAL(X);
  FLOAT_SIGNAL(X);
  COMPLEX_SIGNAL(X);

2.2.2.3 Current Model Graph
A directed graph can be implemented in software using a list of node descriptors, a list of edge descriptors, and an adjacency matrix that specifies the connections between nodes and edges. Edge k going from vertex i to vertex j is indicated by placing the index k into row i, column j of the adjacency matrix. The class DirectedGraph provided in file digraph.cpp is a minimal implementation of a digraph. The implementation of DirectedGraph (and ModelGraph) makes use of the C++ standard template library (STL). Each node descriptor is simply a pointer to the master instance of the Signal object for the corresponding signal. The list of pointers to Signal instances is kept in an STL vector.

ModelGraph maintains a number of lists parallel to the list of node descriptors in DirectedGraph. These lists are parallel in the sense that element k of each list pertains to signal k in the list of node descriptors. These parallel lists contain information like sampling interval, block size, and input/output sense. The hard-core object-oriented approach would be to create node descriptor objects that contain all of this information and eliminate the need for parallel lists. However, PracSim is implemented using parallel lists to avoid the speed penalty associated with dereferencing complicated data structures. In a similar vein, each edge descriptor is simply a pointer to the particular model instance that the edge represents.

Each time a model is instantiated, the constructor for the PracSimModel base class creates an instance of the class ModelGraph and establishes the protected class variable Curr_Model_Graph as a pointer to this instance. The constructor for ModelGraph performs the following tasks:
1. Allocates one instance of DirectedGraph.
2. Allocates STL vector objects for the lists of node and edge properties.
3. Sets Model_Is_Multirate to the default value of false.
4. Sets Model_Is_Constant_Interval to the default value of false.
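The adjacency-matrix representation and the idea of parallel property lists can be sketched as follows. This is an illustration under stated assumptions and is not taken from digraph.cpp: the class name MiniDigraph, its method names, the use of -1 to mark "no edge," and the single parallel list for block size are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal digraph sketch. Row i, column j of the adjacency matrix holds the
// index k of the edge going from vertex i to vertex j, or -1 if none exists.
class MiniDigraph {
public:
    // Adds a vertex whose descriptor is an opaque pointer (e.g., to a signal).
    int add_vertex(const void* signal_id) {
        vertices_.push_back(signal_id);
        block_size_.push_back(0);                  // parallel list entry
        for (auto& row : adj_) row.push_back(-1);  // new column in old rows
        adj_.emplace_back(vertices_.size(), -1);   // new row
        return static_cast<int>(vertices_.size()) - 1;
    }

    // Adds an edge whose descriptor is an opaque pointer (e.g., to a model).
    int add_edge(int from, int to, const void* model_id) {
        assert(in_range(from) && in_range(to));
        edges_.push_back(model_id);
        const int k = static_cast<int>(edges_.size()) - 1;
        adj_[from][to] = k;
        return k;
    }

    int edge_between(int from, int to) const {
        assert(in_range(from) && in_range(to));
        return adj_[from][to];
    }

    void set_block_size(int v, int n) { assert(in_range(v)); block_size_[v] = n; }
    int block_size(int v) const { assert(in_range(v)); return block_size_[v]; }

private:
    bool in_range(int v) const {
        return v >= 0 && static_cast<std::size_t>(v) < vertices_.size();
    }

    std::vector<const void*> vertices_;  // node descriptors
    std::vector<const void*> edges_;     // edge descriptors
    std::vector<int> block_size_;        // parallel to vertices_
    std::vector<std::vector<int>> adj_;  // adjacency matrix of edge indices
};
```

Element k of block_size_ always describes vertex k, which is the "parallel list" arrangement the text describes; a production version would carry additional parallel lists for sampling interval and input/output sense.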
Model-Wide Signal Properties
After the constructor for PracSimModel has created an instance of ModelGraph, and before any specific signals are inserted into the digraph, the constructor for the derived model class has an opportunity to change several of the defaults that were set in the constructor for ModelGraph:
1. The macro ENABLE_MULTIRATE, which expands to
     Curr_Mod_Graph->EnableMultirate();
   can be used to indicate that the model is a multirate model.
2. The macro ENABLE_CONST_INTERVAL, which expands to
     Curr_Mod_Graph->EnableConstantInterval();
   can be used to indicate that the model uses the same sampling interval for all of its inputs and outputs.

Signal Registration
Before any specific signal parameters can be conveyed to the current model graph, all of the signals that are inputs or outputs of the model must be registered with *Curr_Model_Graph using the macros MAKE_OUTPUT(X) and MAKE_INPUT(X). The macro MAKE_OUTPUT(X) expands to
     Curr_Mod_Graph->InsertSignal( X, this, false);
where X is a pointer to the signal object being registered as an output, this is the pointer to the calling model, and false is a boolean constant indicating that the signal is not an input. The macro MAKE_INPUT(X) expands to
     Curr_Mod_Graph->InsertSignal( X, this, true);
     X = X->AddConnection( this, #X);
where this is the pointer to the calling model, true is a boolean constant indicating that the signal is an input, and X starts out as a pointer to the master instance of Signal that actually contains the buffer of samples that were written as output by some upstream model and that will be read as input by the current model. The second line of the macro expansion invokes Signal::AddConnection, which creates a slave instance of Signal, connects this instance to the master instance and then
returns a pointer to the new slave instance. Before MAKE_INPUT(X) is executed, X points to the master instance for the signal of interest. After MAKE_INPUT(X) has executed, X points to the newly created slave instance that holds a read pointer for the sample buffer and the signal’s name as it is known inside the model.

The InsertSignal method performs the following tasks:
1. Calls the AddVertex method of DirectedGraph to add a new vertex for the signal being inserted.
2. Sets default values for node property list items:
   (a) Vertex_Is_Input is set to true or false depending upon which macro was used to invoke InsertSignal.
   (b) Vertex_Kind is set to the enumerated value SK_REGULAR_SIGNAL.
   (c) Node_Is_Feedback is set to false.
   (d) Block_Size is set to zero.
   (e) Samp_Intvl is set to zero.
3. Compares the new vertex against existing vertices to see where new edges need to be added to the graph. If the new vertex is an input, edges are added from the new vertex to existing output vertices. If the new vertex is an output, edges are added from existing input vertices to the new vertex. For each newly added edge, it sets default values for edge property list items:
   (a) Delta_Delay is set to zero.
   (b) Const_Intvl is set to the value of Model_Is_Constant_Interval, which was set for the entire model prior to any signals being registered.
   (c) Resamp_Rate is set to an undefined rate if the model is a multirate model. Otherwise, Resamp_Rate is set to 1.0.

Setting Signal Parameters
Once a model’s constructor has registered all of its input and output signals with *Curr_Model_Graph, known signal parameters can be inserted into the graph using a number of macros from the file sigstuff.h, which invoke public methods from class ModelGraph:
1. The macro CHANGE_RATE(X,Y,Z), which expands to
     Curr_Mod_Graph->ChangeRate(X->GetId(), Y, Z, this);
   can be used to set the resampling rate to Z for the edge in the digraph that connects input signal X to output signal Y. The pointer Y can be used directly because, as an output signal, the instance of Signal pointed to by Y will be the master instance for this particular signal. As an input signal, the instance of Signal pointed to by X will only be a connected instance, and the pointer X must be used to invoke GenericSignal::GetId(), which returns a pointer to the corresponding master instance.
2. The macro SAME_RATE(X,Y), which expands to
     Curr_Mod_Graph->ChangeRate(X->GetId(), Y, 1.0, this);
   can be used to set the resampling rate to 1.0 for the edge in the digraph that connects input signal X to output signal Y.
3. The macro SET_SAMP_INTVL(X,Y), which expands to
     Curr_Mod_Graph->SetSampIntvl(X->GetId(),Y);
   can be used to set the sampling interval for signal X to the value Y.
4. The macro SET_BLOCK_SIZE(X,Y), which expands to
     Curr_Mod_Graph->SetBlockSize(X->GetId(),Y);
   can be used to set the block size for signal X to the value Y.

2.2.2.4 Building the System Graph
The first thing that the constructor for the PracSimModel base class does is call the method PracSimModel::CloseoutModelGraph for the previous model. The constructor for the first model in a simulation detects that no previous model exists and therefore does not call CloseoutModelGraph. The last model in a simulation is taken care of by Executive::MultirateSetup, which runs immediately after the final model’s constructor. Except for some optional debug output, CloseoutModelGraph accomplishes its work by invoking two methods belonging to other classes. The first method is ModelGraph::Closeout, which puts the finishing touches on the previous model’s digraph. To avoid the possibility of “dangling” edges, each current model graph (CMG) must have at least one input node
and one output node. If the CMG does not have any input nodes, Closeout adds a dummy source node for which Vertex_Kind is set to SK_DUMMY_SOURCE_SIGNAL. If the CMG does not have any output nodes, Closeout adds a dummy destination node for which Vertex_Kind is set to SK_DUMMY_DEST_SIGNAL.

SystemGraph::MergeCurrModelGraph is the second method invoked by CloseoutModelGraph and performs the following tasks:
1. Checks each node in the model graph against each node that is already in the system graph. If a match is found, it places the node’s identity in a temporary list of merged nodes and performs the following:
   (a) If the sampling rate is undefined for the matching node in the system graph (SG), then it copies the sampling rate from the matching CMG node. If the sampling rate is defined differently in the CMG and SG, then a fatal error condition exists.
   (b) If the block size is undefined for the matching node in the SG, then it copies the block size from the matching CMG node. If the block size is defined differently in the CMG and SG, then a fatal error condition exists.
2. For each CMG node not matched to a node in the SG, the SystemGraph method MergeCurrModelGraph adds a new node to the SG, places this node’s identity in a temporary list of merged nodes, and sets the sampling rate and block size to the values defined in the CMG.
3. Once all the nodes in the CMG have been matched or added to the SG, the adjacency matrix is examined for the presence of edges between nodes appearing on the temporary list of merged nodes. If any such edges are already in the SG, a fatal error condition exists. Such edges must have been placed by some other model, implying that such a model and the current model are both trying to produce the same output signal. This type of fan-in is not supported in PracSim.
4. Inserts new edges into the SG so that every input signal in the merged node list is connected by an edge to every output signal in this list. Edge parameters are copied from the corresponding edges in the CMG.
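The parameter-matching rule applied to matched nodes (copy when the system graph value is undefined, fail when the two graphs disagree) can be sketched as follows. This is only an illustration and not PracSim source: the names NodeParms, MergeValue, and MergeNode are hypothetical, zero is assumed to be the "undefined" marker, and a CMG value that is itself undefined is assumed to leave the SG value alone.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the per-node properties kept in a graph.
// A value of zero is ASSUMED to mean "undefined".
struct NodeParms {
    double samp_intvl = 0.0;
    int block_size = 0;
};

// Merge one parameter from the CMG node into the matching SG node.
template <typename T>
void MergeValue(T& sg_value, T cmg_value, const std::string& what)
{
    if (cmg_value == T(0)) return;          // assumption: CMG has nothing to contribute
    if (sg_value == T(0)) {
        sg_value = cmg_value;               // SG undefined: copy from CMG
    } else if (sg_value != cmg_value) {
        // both defined, but differently: the text calls this a fatal error
        throw std::runtime_error("conflicting " + what);
    }
}

// Apply the rule to both parameters of a matched node pair.
void MergeNode(NodeParms& sg, const NodeParms& cmg)
{
    MergeValue(sg.samp_intvl, cmg.samp_intvl, "sampling interval");
    MergeValue(sg.block_size, cmg.block_size, "block size");
}
```

Throwing an exception is just one way to model the fatal error condition; a real implementation might instead log the conflict and terminate the simulation.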
2.3 Controls
Controls in PracSim are “lightweight” signals, created to handle situations in which one model needs to exercise some form of control over another model. Although this
control could be accomplished using the signal mechanism discussed in Section 2.2, a separate control mechanism can make it easier to configure a simulation and eliminate a significant amount of the overhead associated with signals. Control is a class template that can be instantiated for a number of different types of control values. PracSim includes specializations of Control for types double, float, int, and bool. A complete implementation of controls involves a few attributes and methods that do not depend upon the type of the control values, and these attributes and methods have been extracted into the nontemplate base class GenericControl. Tables 2.6 and 2.7 list the attributes and methods belonging to Control and GenericControl.
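The split between a nontemplate base class and a thin template layer might look roughly like the sketch below. The class and member names follow the tables that summarize Control and GenericControl, but the method bodies, the use of std::string, and the omission of the PracSimModel argument are assumptions made to keep the example self-contained; this is not the PracSim source.

```cpp
#include <string>

// Nontemplate base: holds everything that does not depend on the value type.
class GenericControl {
public:
    explicit GenericControl(const std::string& name)
        : Name(name), Root_Id(this) {}
    virtual ~GenericControl() {}
    const std::string& GetName() const { return Name; }
    GenericControl* GetId() { return Root_Id; }
protected:
    std::string Name;
    GenericControl* Root_Id;   // assumed here to refer to the control itself
};

// Template layer: adds only the typed value and its accessors.
template <typename T>
class Control : public GenericControl {
public:
    explicit Control(const std::string& name)
        : GenericControl(name), Cntrl_Value(T()) {}
    T GetValue() const { return Cntrl_Value; }
    void SetValue(T value) { Cntrl_Value = value; }
private:
    T Cntrl_Value;
};
```

With this arrangement one model can call SetValue on a Control<double> while another polls GetValue once per pass, without any of the buffering machinery that a Signal requires.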
Table 2.6 Summary of class template Control.
Constructors:
  Control::Control( char* name );
  Control::Control( char* name, PracSimModel* model );
Public Methods:
  ~Control(void);
  T GetValue(void);
  void SetValue(T value);
Private Attribute:
  T Cntrl_Value
Notes:
  1. This class inherits all methods belonging to GenericControl.
  2. Source code is contained in file control_t.cpp.

Table 2.7 Summary of class GenericControl.
Constructor:
  GenericControl( char* name, PracSimModel* model );
Public Methods:
  ~GenericControl(void);
  char* GetName(void);
  GenericControl* GetId(void);
Protected Attributes:
  char* Name;
  PracSimModel* Owning_Model;
  GenericControl* Root_Id;
Notes:
  1. Source code is contained in file genctl.cpp.

2.4 Results Reporting
PracSim uses stream I/O for all results reporting. Not including the signal plotting files, PracSim creates two or three output files and defines four output streams for each simulation. Often, a user will want different levels of detail in the results
report depending upon where the simulation is in its development cycle. Early in the development cycle, very detailed results can be useful in determining that the simulation and its constituent models are configured and operating correctly. Later, when the development is complete, and a number of different parametrized cases are to be run, a less-detailed report may be more convenient. PracSim creates both a full report and a short report. The user has a choice via the system-level parameters Date_In_Full_Report_Name and Date_In_Short_Report_Name concerning whether or not the report file names will include the time and date at the start of the simulation. For a simulation named BpskSim, started at 14:27:03 on December 28, 2003, the full report file would have one of the following names:
  BpskSim_full.txt
  BpskSim_full_031228_14_27_03.txt
The corresponding short report would have one of two similar names:
  BpskSim_short.txt
  BpskSim_short_031228_14_27_03.txt
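The naming pattern in this example can be reproduced with a few lines of code. The helper below is a hypothetical sketch, not a PracSim routine; it assumes that the parts of the file name are joined by underscores and that the date is written as a two-digit year, month, and day followed by hours, minutes, and seconds.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds a report file name such as
// "BpskSim_full_031228_14_27_03.txt". "kind" is "full" or "short".
std::string ReportFileName(const std::string& sim_name,
                           const std::string& kind,
                           bool date_in_name,
                           int year, int month, int day,
                           int hour, int minute, int second)
{
    std::string name = sim_name + "_" + kind;
    if (date_in_name) {
        char stamp[32];
        // yymmdd_hh_mm_ss, each field zero-padded to two digits
        std::snprintf(stamp, sizeof(stamp), "_%02d%02d%02d_%02d_%02d_%02d",
                      year % 100, month, day, hour, minute, second);
        name += stamp;
    }
    return name + ".txt";
}
```

Passing false for date_in_name yields the fixed name that is overwritten on every run.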
If the date is not included in the report file name, running the simulation multiple times will cause the new file to overwrite the existing file of the same name. This is convenient while a simulation is being constructed and the results are of no interest. If the flag _DEBUG is defined, PracSim will also create a debug file BpskSim.dbg. It will be appropriate for some results to be written to more than one report file. To make this easy for the user, PracSim defines a number of output streams that automatically route the results to the appropriate files. The relationship between these streams and the output files is summarized in Figure 2.5.
Figure 2.5 Relationship between output streams and files. (Diagram relating the output streams DetailedResults, BasicResults, and ErrorStream to the long report file, the short report file, and the debug file.)
Appendix 2A
EXAMPLE SOURCE CODE
The support directory on the companion Web site contains the classes that implement the PracSim simulation infrastructure. A number of these classes are listed in Table 2A.1. In addition to the items listed, the support directory also contains I/O support routines for the enumerated types presented in Section 2.1.3.

Table 2A.1 Classes in the support directory.
  class            file
  DirectedGraph    digraph.cpp
  Executive        exec.cpp
  GenericControl   genctl.cpp
  GenericSignal    gensig.cpp
  PsModelError     model_error.cpp
  ModelGraph       model_graph.cpp
  ParmFile         parmfile.cpp
  PracSimModel     psmodel.cpp
  PracSimStream    psstream.cpp
  SignalPlotter    sigplot.cpp
  SystemGraph      syst_graph.cpp
  Control          control_T.cpp
  Signal           signal_T.cpp
2A.1 PracSimModel
Every model in PracSim inherits from the base class PracSimModel. The header for PracSimModel is provided in Listing 2A.1, and implementations for the various methods are provided in Listings 2A.2 through 2A.4.

Listing 2A.1 Header for the PracSimModel base class.
class PracSimModel
{
public:
  PracSimModel( char *model_name,
                PracSimModel* outer_model);
  PracSimModel( int dummy_for_unique_signature,
                char *model_name);
  ~PracSimModel(void);
  const char* GetModelName(void);
  const char* GetInstanceName(void);
  int GetNestDepth(void);
  void CloseoutModelGraph(int key);
  virtual void Initialize(void){};
  virtual int Execute(void){return(-1);};
protected:
  typedef struct{
    GenericSignal *Ptr_To_Sig;
    bool Sig_Is_Optional;
  } Sig_List_Elem;
  char *Model_Name;
  char *Instance_Name;
  std::list *Output_Sigs;
  std::list *Input_Sigs;
  ModelGraph* Curr_Mod_Graph;
  int Nest_Depth;
};
Listing 2A.2
Constructor for the PracSimModel base class.
PracSimModel::PracSimModel( char* instance_name,
                            PracSimModel* outer_model)
{
  //---------------------------------------------------
  // Closeout the CMSG for previous model instance
  // and merge it with the Active System Graph
  Nest_Depth = 1 + outer_model->GetNestDepth();
  if( (PrevModelConstr !=NULL) && (Nest_Depth==1))
  {
    #ifdef _DEBUG
    *DebugFile [...]

[...] GetName()),".txt\0");
  Plotter_File = new ofstream(file_name, ios::out);
  Plotting_Enabled = true;
  Plot_Start_Time = start_time;
  Plot_Stop_Time = stop_time;
  Plot_Decim_Rate = plot_decim_rate;
  Count_Vice_Time = count_vice_time;
  *DebugFile [...]

[...] SetValidBlockSize(Proc_Block_Size);
  if(Noise_Only_Sig != NULL)
    Noise_Only_Sig->SetValidBlockSize(Proc_Block_Size);
  //---------------------------------------------------
  // determine the power of the input signal
  in_sig = GET_INPUT_PTR(In_Sig);
  double sum = 0.0;
  for(is=0; is [...]

[...] *cmpx_out_sig, Signal *mag_out_sig, Signal *phase_out_sig);
Parameters:
  double Bit_Durat;
  double Data_Skew;
  double Subcar_Misalign;
  double Phase_Unbal;
  double Amp_Unbal;
  bool Shaping_Is_Bipolar;
Notes:
  1. Source code is contained in file mskmod.cpp.
Figure 9.37 Envelope variations for an MSK modulator with an amplitude unbalance of 1.1 and a phase unbalance of 5 degrees. (Plot of envelope versus time in normalized seconds.)

Figure 9.38 Envelope variations for an MSK modulator with an amplitude unbalance of 1.1, a phase unbalance of 5 degrees, and a data skew of 0.05. (Plot of envelope versus time in normalized seconds.)
9.6.3 Properties of MSK Signals
The power spectral density of an MSK signal is given by

$$P_{MSK} = \frac{8E_b}{\pi^2}\left\{ \left[ \frac{\cos\left[2\pi (f - f_c) T_b\right]}{1 - \left[4T_b (f - f_c)\right]^2} \right]^2 + \left[ \frac{\cos\left[2\pi (f + f_c) T_b\right]}{1 - \left[4T_b (f + f_c)\right]^2} \right]^2 \right\} \tag{9.6.4}$$

Evaluation of this equation is straightforward except at the values of $f$ for which the denominators in the two fraction terms become zero. When $f = f_c \pm (4T_b)^{-1}$, the denominator of the first term equals zero. For these values of $f$, the numerator will also be zero, so we have an indeterminate form of type $0/0$ that can be evaluated using L’Hospital’s rule. Similarly, the second term will be an indeterminate form of type $0/0$ when $f = -f_c \pm (4T_b)^{-1}$. Applying L’Hospital’s rule, we obtain

$$\lim_{f \to f_c \pm (4T_b)^{-1}} \frac{\cos\left[2\pi (f - f_c) T_b\right]}{1 - \left[4T_b (f - f_c)\right]^2} = \frac{\pi}{4}$$

$$\lim_{f \to -f_c \pm (4T_b)^{-1}} \frac{\cos\left[2\pi (f + f_c) T_b\right]}{1 - \left[4T_b (f + f_c)\right]^2} = \frac{\pi}{4}$$

These values of $f$ are treated as special cases in the function MskPsd that was used to generate the normalized plot of Eq. (9.6.4) shown in Figure 9.39. An estimated PSD for a complex baseband simulation of an MSK signal is shown in Figure 9.40.
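The special-case handling can be illustrated with a short routine that evaluates one squared fraction term of Eq. (9.6.4). This is a sketch written for this discussion, not the book's MskPsd function; the names MskPsdTerm and MskPsdDb, the tolerance of 1e-9, and the normalization to 0 dB at zero offset are all assumptions. The argument x stands for the normalized offset $T_b(f - f_c)$, and the limiting value substituted at the singular points is $\pi/4$, which follows from applying L'Hospital's rule to $\cos(\pi u/2)/(1-u^2)$ at $u = \pm 1$.

```cpp
#include <cmath>

// One squared fraction term of the MSK PSD, as a function of the
// normalized frequency offset x = Tb*(f - fc). Hypothetical helper.
double MskPsdTerm(double x)
{
    const double pi = 3.14159265358979323846;
    const double u = 4.0 * x;
    const double denom = 1.0 - u * u;
    double ratio;
    if (std::fabs(denom) < 1e-9) {
        ratio = pi / 4.0;                        // 0/0 case: use the limit
    } else {
        ratio = std::cos(2.0 * pi * x) / denom;  // ordinary evaluation
    }
    return ratio * ratio;
}

// Term expressed in dB relative to its value at zero offset.
double MskPsdDb(double x)
{
    return 10.0 * std::log10(MskPsdTerm(x) / MskPsdTerm(0.0));
}
```

Because the function is continuous at the singular points, a value computed just beside x = 0.25 should agree closely with the substituted limit, which is a convenient sanity check on the special case.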
Figure 9.39 PSD for an MSK signal. (Plot of PSD in dB versus normalized frequency offset from carrier, (f − fc)/Rs.)

Figure 9.40 PSD estimated from a complex baseband simulation of an MSK signal. (Plot of PSD in dB versus normalized frequency offset from carrier, (f − fc)/Rs.)
9.6.3.1 Error Performance
The probability of bit error for MSK is the same as for QPSK; that is,

$$P_b = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E_b}{N_0}}\right)$$

Figure 9.41 contains a plot of $P_b$ and several estimated bit-error-rate values that were obtained from simulation of an ideal MSK modulator and perfectly synchronized I&D demodulator.
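When checking simulated bit-error rates against theory, it is handy to have the theoretical curve available as a function. The sketch below evaluates the erfc form of the expression using std::erfc from the standard library; the function name MskBitErrorProb and the choice to take $E_b/N_0$ in decibels are assumptions made for this example.

```cpp
#include <cmath>

// Theoretical bit-error probability, Pb = 0.5*erfc(sqrt(Eb/N0)).
// ebno_db is Eb/N0 expressed in decibels. Hypothetical helper.
double MskBitErrorProb(double ebno_db)
{
    const double ebno = std::pow(10.0, ebno_db / 10.0);  // dB -> linear ratio
    return 0.5 * std::erfc(std::sqrt(ebno));
}
```

At $E_b/N_0 = 0$ dB the function returns $0.5\,\mathrm{erfc}(1)$, which is approximately 0.0786.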
Figure 9.41 Probability of bit error for MSK signals. (Plot of probability of bit error, on a logarithmic scale, versus Eb/N0; X marks indicate values estimated from simulation.)
Appendix 9A
EXAMPLE SOURCE CODE
The companion Web site includes 14 Microsoft Visual Studio .NET projects, each comprising a simulation that demonstrates and provides a test vehicle for a different pairing of modulator and demodulator models, as listed in Table 9A.1.

Table 9A.1 Projects in Modulation directory.
  project        modulator               demodulator
  BpskSim        BpskModulator           BpskCorrelationDemod
  BpskSim_Bp     BpskBandpassModulator   BpskBandpassDemod
  FskCohSim      ComplexVco              FskCoherentDemod
  FskCohSim_Bp   BandpassVco             FskCoherentBandpassDemod
  FskSim_Bp      FskTwoToneModulator     FskBandpassDemod
  MpskSim        MpskSymbsToQuadWave     MpskOptimalDemod
                 QuadratureModulator
  MpskSim_Bp     MpskSymbsToQuadWave     MpskOptimalBandpassDemod
                 QuadBandpassModulator
  MskSim         MskModulator            QuadratureDemod
  OqpskSim       QuadratureModulator     QuadratureDemod
  QamSim         QamSymbsToQuadWaves     QamOptimalDemod
                 QuadratureModulator     IntegrateDumpAndSlice
  QamSim_Bp      QamSymbsToQuadWaves     QuadBandpassMixer
                 QuadBandpassModulator   IntegrateAndDump
                                         QamSymbolDecoder
  QpskSim        QuadratureModulator     QuadratureDemod
  QpskSim_Bp     QuadBandpassModulator   QuadBandpassMixer
                                         IntegrateDumpAndSlice
  QpskSim_Corr   QuadratureModulator     QpskOptimalBitDemod
9A.1 MskModulator
The header for MskModulator is shown in Listing 9A.1. The constructor is provided in Listing 9A.2, and the Execute method is provided in Listing 9A.3.

Listing 9A.1 Header for MskModulator model.
class MskModulator : public PracSimModel
{
public:
  MskModulator( char* instance_name,
                PracSimModel *outer_model,
                Signal< float >* in_signal_i,
                Signal< float >* in_signal_q,
                Signal< complex >* out_signal,
                Signal< float >* mag_signal,
                Signal< float >* phase_signal );
  ~MskModulator(void);
  void Initialize(void);
  int Execute(void);
private:
  float Phase_Unbal;
  float Amp_Unbal;
  double Pi_Over_Bit_Dur;
  double Bit_Durat;
  double Samp_Intvl;
  float Subcar_Misalign;
  float Data_Skew;
  int Shaping_Is_Bipolar;
  std::complex Phase_Shift;
  int Samps_Out_Cnt;
  int Block_Size;
  Signal< float > *I_In_Sig;
  Signal< float > *Q_In_Sig;
  Signal< std::complex > *Cmpx_Out_Sig;
  Signal< float > *Mag_Out_Sig;
  Signal< float > *Phase_Out_Sig;
};
Listing 9A.2 Constructor for MskModulator model.
MskModulator::MskModulator( char* instance_name,
                            PracSimModel* outer_model,
                            Signal< float >* i_in_sig,
                            Signal< float >* q_in_sig,
                            Signal< complex >* cmpx_out_sig,
                            Signal< float >* mag_out_sig,
                            Signal< float >* phase_out_sig )
             :PracSimModel(instance_name, outer_model)
{
  MODEL_NAME(MskModulator);

  // Read model config parms
  OPEN_PARM_BLOCK;
  GET_DOUBLE_PARM(Bit_Durat);
  GET_DOUBLE_PARM(Data_Skew);
  GET_DOUBLE_PARM(Subcar_Misalign);
  GET_DOUBLE_PARM(Phase_Unbal);
  GET_DOUBLE_PARM(Amp_Unbal);
  GET_BOOL_PARM(Shaping_Is_Bipolar);

  // Connect input and output signals
  I_In_Sig = i_in_sig;
  Q_In_Sig = q_in_sig;
  Cmpx_Out_Sig = cmpx_out_sig;
  Mag_Out_Sig = mag_out_sig;
  Phase_Out_Sig = phase_out_sig;
  MAKE_OUTPUT( Cmpx_Out_Sig );
  MAKE_OUTPUT( Mag_Out_Sig );
  MAKE_OUTPUT( Phase_Out_Sig );
  MAKE_INPUT( I_In_Sig );
  MAKE_INPUT( Q_In_Sig );

  // Set up derived parms
  double phase_unbal_rad = PI * Phase_Unbal / 180.0;
  Pi_Over_Bit_Dur = PI/Bit_Durat;
  Phase_Shift = complex( -sin(phase_unbal_rad),
                         cos(phase_unbal_rad));
}
Listing 9A.3 Execute method for MskModulator model.
int MskModulator::Execute(void)
{
  float *i_in_sig_ptr, *q_in_sig_ptr;
  float *phase_out_sig_ptr, *mag_out_sig_ptr;
  float subcar_misalign, amp_unbal, data_skew;
  float work, work1;
  std::complex work2;
  std::complex *cmpx_out_sig_ptr;
  int samps_out_cnt;
  double samp_intvl;
  double pi_over_bit_dur, argument;
  std::complex phase_shift;
  long int_mult;
  int shaping_is_bipolar;
  int block_size;
  int is;

  cmpx_out_sig_ptr = GET_OUTPUT_PTR( Cmpx_Out_Sig );
  phase_out_sig_ptr = GET_OUTPUT_PTR( Phase_Out_Sig );
  mag_out_sig_ptr = GET_OUTPUT_PTR( Mag_Out_Sig );
  i_in_sig_ptr = GET_INPUT_PTR( I_In_Sig );
  q_in_sig_ptr = GET_INPUT_PTR( Q_In_Sig );

  samps_out_cnt = Samps_Out_Cnt;
  samp_intvl = Samp_Intvl;
  subcar_misalign = Subcar_Misalign;
  amp_unbal = Amp_Unbal;
  data_skew = Data_Skew;
  pi_over_bit_dur = Pi_Over_Bit_Dur;
  phase_shift = Phase_Shift;
  shaping_is_bipolar = Shaping_Is_Bipolar;

  block_size = I_In_Sig->GetValidBlockSize();
  Cmpx_Out_Sig->SetValidBlockSize(block_size);
  Mag_Out_Sig->SetValidBlockSize(block_size);
  Phase_Out_Sig->SetValidBlockSize(block_size);

  for (is=0; is [...]

[...] * in_sig,
                    Signal< bit_t >* symb_clock_in,
                    Signal< byte_t >* out_sig );
  ~MpskOptimalDemod(void);
  void Initialize(void);
  int Execute(void);
private:
  double Out_Samp_Intvl;
  int Block_Size;
  Signal< byte_t > *Out_Sig;
  Signal< std::complex > *In_Sig;
  Signal< bit_t > *Symb_Clock_In;
  int Bits_Per_Symb;
  int Samps_Per_Symb;
  byte_t Num_Diff_Symbs;
  double *Integ_Val;
  std::complex *Conj_Ref;
};
Listing 9A.5 Constructor for MpskOptimalDemod model.
MpskOptimalDemod::MpskOptimalDemod( char* instance_name,
                                    PracSimModel* outer_model,
                                    Signal< complex< float > >* in_sig,
                                    Signal< bit_t >* symb_clock_in,
                                    Signal< byte_t >* out_sig )
                 :PracSimModel(instance_name, outer_model)
{
  MODEL_NAME(MpskOptimalDemod);
  ENABLE_MULTIRATE;

  // Read model config parms
  OPEN_PARM_BLOCK;
  GET_INT_PARM(Bits_Per_Symb);
  GET_INT_PARM(Samps_Per_Symb);

  // Connect input and output signals
  Out_Sig = out_sig;
  Symb_Clock_In = symb_clock_in;
  In_Sig = in_sig;
  MAKE_OUTPUT( Out_Sig );
  MAKE_INPUT( Symb_Clock_In );
  MAKE_INPUT( In_Sig );

  double resamp_rate = 1.0/double(Samps_Per_Symb);
  CHANGE_RATE( In_Sig, Out_Sig, resamp_rate );
  CHANGE_RATE( Symb_Clock_In, Out_Sig, resamp_rate );

  Num_Diff_Symbs = 1;
  for(int i=1; i [...]

[...] GetBlockSize();
  Out_Samp_Intvl = Out_Sig->GetSampIntvl();
  //
  // set up table of phase references
  Conj_Ref = new std::complex[Num_Diff_Symbs];
  Integ_Val = new double[Num_Diff_Symbs];
  for( byte_t isymb=0; isymb [...]

[...] GetValidBlockSize();
  Out_Sig->SetValidBlockSize(block_size/ Samps_Per_Symb);
  integ_val = Integ_Val;
  for (is=0; is [...]

[...] *in_sig, Signal< std::complex > *out_sig);
Parameters:
  int Fft_Size;
  double Dt_For_Fft;
  float Overlap_Save_Mem;
  bool Bypass_Enabled;
  char* Magnitude_Data_Fname;
  double Mag_Freq_Scaling_Factor;
  char* Phase_Data_Fname;
  double Phase_Freq_Scaling_Factor;
Notes:
  1. Source code is contained in file polar_freq_dom_filt.cpp.
Example 10.4 As shown in Example 10.3, the scatter diagram for an 8-PSK signal is virtually unchanged when the signal is passed through a memoryless nonlinearity. However, this same signal will be degraded by a two-box nonlinear amplifier model that uses the magnitude and phase responses from Figures 10.22 and 10.23. The simulation architecture is shown in Figure 10.24. Figure 10.25(b) shows the scatter diagram for an 8-PSK signal (with $E_b/N_0 = 14$ dB) passed through the two-box nonlinear amplifier. Compare this to Figure 10.25(a), which shows the scatter diagram when the filter is bypassed so that the signal passes through only the memoryless nonlinearity from Example 10.3. In AWGN, an 8-PSK signal with $E_b/N_0 = 14$ dB achieves an SER of $2 \times 10^{-5}$ when optimally demodulated. When the two-box model from this example is added to the signal path, the SER degrades to $9 \times 10^{-4}$.

Figure 10.24 Simulation architecture for Example 10.4. (Block diagram containing the models SymbGener, MpskSymbsToQuadWaves, QuadratureModulator, AdditiveGaussianNoise, NonlinearAmplifier, PolarFreqDomainFilter, CmpxToQuadrature, two instances of AnlgDirectFormFir used as lowpass filters, QuadratureToCmpx, and CmpxIqPlot.)
Figure 10.25 Scatter diagram of 8-PSK constellation for Example 10.4: (a) output of data filters with PolarFreqDomainFilter bypassed and (b) with PolarFreqDomainFilter enabled.
Appendix 10A
EXAMPLE SOURCE CODE
The companion Web site includes four Microsoft Visual Studio projects, each comprising a simulation that demonstrates and provides a test vehicle for a different aspect of amplifier modeling. The project Zm_Nonlin uses the model IdealHardLimiter to demonstrate the operation of a memoryless nonlinearity. The project AmAm_AmPm uses the NonlinearAmplifier model presented in Section 10A.1 to simulate the AM/AM and AM/PM conversion often exhibited by practical amplifiers. The project NLA_2_Box combines the use of NonlinearAmplifier with the use of the model PolarFreqDomainFilter to assess the impact of nonlinear amplification upon modulated data waveforms. Project Msk_2Box is similar to NLA_2_Box, but structured to use MSK modulator and demodulator models.
10A.1 NonlinearAmplifier
The header for NonlinearAmplifier is provided in Listing 10A.1, and implementations for the constructor and the Execute method are provided in Listings 10A.2 and 10A.3 respectively. This model uses the SampledCurve class shown in Listing 10A.4 to interpolate sampled AM/AM and AM/PM curves for various input power levels.
Listing 10A.1 Header for NonlinearAmplifier model.
class NonlinearAmplifier : public PracSimModel
{
public:
  NonlinearAmplifier( char* instance_nam,
                      PracSimModel *outer_model,
                      Signal< complex > *in_signal,
                      Signal< complex > *out_sig );
  ~NonlinearAmplifier(void);
  void Initialize(void);
  int Execute(void);
private:
  int Out_Avg_Block_Size;
  int In_Avg_Block_Size;
  Signal< complex > *In_Sig;
  Signal< complex > *Out_Sig;
  double Output_Power_Scale_Factor;
  double Phase_Scale_Factor;
  double Anticipated_Input_Power;
  double Operating_Point;
  double Agc_Time_Constant;
  float Input_Power_Scale_Factor;
  bool Agc_On;
  SampledCurve *Am_Am_Curve;
  SampledCurve *Am_Pm_Curve;
  char *Am_Am_Fname;
  char *Am_Pm_Fname;
};
Listing 10A.2 Constructor for NonlinearAmplifier model.
NonlinearAmplifier::NonlinearAmplifier( char* instance_name,
                                        PracSimModel* outer_model,
                                        Signal< complex >* in_sig,
                                        Signal< complex >* out_sig )
                   :PracSimModel(instance_name, outer_model)
{
  MODEL_NAME(NonlinearAmplifier);

  // Read model config parms
  OPEN_PARM_BLOCK;
  GET_DOUBLE_PARM(Output_Power_Scale_Factor);
  GET_DOUBLE_PARM(Phase_Scale_Factor);
  GET_DOUBLE_PARM(Anticipated_Input_Power);
  GET_DOUBLE_PARM(Operating_Point);
  GET_DOUBLE_PARM(Agc_Time_Constant);
  Input_Power_Scale_Factor =
      float(Operating_Point/Anticipated_Input_Power);

  Am_Am_Fname = new char[64];
  strcpy(Am_Am_Fname, "\0");
  GET_STRING_PARM(Am_Am_Fname);
  Am_Pm_Fname = new char[64];
  strcpy(Am_Pm_Fname, "\0");
  GET_STRING_PARM(Am_Pm_Fname);

  // Connect input and output signals
  In_Sig = in_sig;
  Out_Sig = out_sig;
  MAKE_OUTPUT( Out_Sig );
  MAKE_INPUT( In_Sig );

  Am_Am_Curve = new SampledCurve(Am_Am_Fname);
  Am_Pm_Curve = new SampledCurve(Am_Pm_Fname);
}
Listing 10A.3 Execute method for NonlinearAmplifier model.
int NonlinearAmplifier::Execute()
{
  complex *out_sig_ptr, out_sig;
  complex *in_sig_ptr, in_sig;
  complex agc_in_sig;
  float power, power_out;
  float input_phase;
  double phase_shift;
  double amplitude;
  double phase_out;
  double sum_in, sum_out;
  double amp_sqrd;
  double avg_power_in, avg_power_out;
  int block_size, is;

  block_size = In_Sig->GetValidBlockSize();
  Out_Sig->SetValidBlockSize(block_size);
  //-------------------------------------------------
  out_sig_ptr = GET_OUTPUT_PTR( Out_Sig );
  in_sig_ptr = GET_INPUT_PTR( In_Sig );
  block_size = In_Sig->GetValidBlockSize();
  Out_Sig->SetValidBlockSize(block_size);

  sum_in = 0.0;
  sum_out = 0.0;
  for(is=0; is [...]

[...] GetValue(power));
    sum_out += power_out;
    phase_shift = Am_Pm_Curve->GetValue(power);
    amplitude = sqrt(2.0*power_out);
    phase_out = input_phase + Phase_Scale_Factor*phase_shift;
    out_sig = complex( float(amplitude*cos(phase_out)),
                       float(amplitude*sin(phase_out)));
    *out_sig_ptr++ = out_sig;
  }
  avg_power_out = sum_out/block_size;
  avg_power_in = sum_in/block_size/2;
  BasicResults [...]

[...] ::DiscreteDelay( //note 2
    char* instance_name,
    PracSimModel* outer_model,
    Signal* in_signal,
    Signal* out_signal,
    Control *dynamic_delay)
  DiscreteDelay< T > ::DiscreteDelay( //note 3
    char* instance_name,
    PracSimModel* outer_model,
    Signal* in_signal,
    Signal* out_signal)
Parameters:
  DELAY_MODE_T Delay_Mode;
  int Initial_Delay_In_Samps;
  int Num_Initial_Passes;
  int Max_Delay_In_Samps;
Notes:
  1. BlockSize for this model is set by the PracSim system.
  2. This constructor does not support DELAY_MODE_GATED.
  3. This constructor does not support DELAY_MODE_GATED or DELAY_MODE_DYNAMIC.
  4. Source code is contained in file discrete_delay_T.cpp.
the second input block. The remaining $N_{block} - N_{adv}$ samples from the second input block are saved inside the model for use in building the second output block. A model that operates this way to realize a signal advance does not adhere to a block-synchronous protocol: output block $B$ cannot be generated until after input block $B + 1$ is available. The problem becomes even worse when $N_{adv}$ is greater than $N_{block}$. In this case, output block $B$ cannot be generated until after input block $B + B_{adv}$, where $B_{adv} = \lceil N_{adv}/N_{block} \rceil$. Some fundamental changes to the simulation infrastructure must be made to accommodate models that operate this way. Two different approaches are supported in PracSim.

11.1.2.1 Block-Synchronous Enclaves
Consider the hypothetical simulation shown in Figure 11.2. The models upstream from the SignalAdvance model (Model_A through Model_B) must each be executed in sequence for a total of $B_{adv} + 1$ passes to generate the input samples needed by SignalAdvance to generate the first block of its output signal, sig_b. The models downstream from the SignalAdvance model (Model_C through Model_D) are not permitted to execute until after SignalAdvance begins producing blocks of sig_b. Considered apart from the rest of the simulation, Model_A through Model_B can be operated in a block-synchronous fashion. Likewise, the models Model_C through Model_D can be operated in a block-synchronous fashion when the rest of the simulation is not involved. Thus, even though the simulation as a whole is not block-synchronous, it can be divided into two block-synchronous enclaves joined together by the SignalAdvance model, which acts as a gateway for the flow of data and control between the enclaves.

Figure 11.2 Simulation architecture for exploring the problems raised by signal advance. (Block diagram: Model_A and Model_B form enclave 0, Model_C and Model_D form enclave 1, and the SignalAdvance model sits between the two enclaves.)

During the execution phase of a simulation, the Execute methods of the various models are called in the proper sequence by the RunSimulation method of the ActiveSystemGraph class. When the Execute methods of block-synchronous models terminate normally, they return a value of MES_AOK (Model Execution Status, All OKay) to RunSimulation, thus allowing the execution to continue on to the next model in the sequence. Until it is able to issue a block of output samples, a block-asynchronous model such as SignalAdvance will return a value of MES_RESTART that causes RunSimulation to increment the global value of PassNumber and begin executing the first model in enclave 0. Once enclave 0 has been executed in this way for a number of passes and the block-asynchronous model has received a number of input blocks sufficient to allow generation of the first output block, the block-asynchronous model will perform the following exit sequence:
1. Call SigPlot.CollectData to cause plot values for enclave 0 to be collected. [For the final enclave in a simulation, the call to SigPlot.CollectData is made by RunSimulation at the end of each pass. When the entire simulation is block-synchronous, the final enclave is the only enclave, and none of the models needs concern itself with calling SigPlot.CollectData.]
2. Increment the global value of enclave number.
3. Set the local value of New_Pass_Number to 1.
4. Set the global value of PassNumber equal to the local value of New_Pass_Number.
5. Return a value of MES_AOK to RunSimulation. This allows execution to continue on to the first model in the next enclave rather than return to the top of enclave 0.

On subsequent passes, the model performs all of these actions with one exception. Instead of setting New_Pass_Number equal to 1, the existing value is simply incremented by 1.
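The exit sequence above can be sketched as a small state machine. Everything in this example other than the names MES_AOK, MES_RESTART, PassNumber, and New_Pass_Number is an assumption made for illustration: SimState stands in for the global simulation state, PlotCollections merely counts what would be calls to SigPlot.CollectData, and GatewaySketch is a hypothetical model that needs a fixed number of input blocks before its first output.

```cpp
// Status codes named in the text; the numeric values are assumptions.
enum ModelExecStatus { MES_AOK, MES_RESTART };

// Hypothetical stand-in for the global simulation state.
struct SimState {
    int PassNumber = 0;
    int EnclaveNumber = 0;
    int PlotCollections = 0;   // stands in for calls to SigPlot.CollectData
};

class GatewaySketch {
public:
    explicit GatewaySketch(int blocks_before_first_output)
        : Blocks_Needed(blocks_before_first_output) {}

    // Called once per pass of the upstream enclave.
    ModelExecStatus Execute(SimState& sim)
    {
        ++Blocks_Received;
        if (Blocks_Received < Blocks_Needed)
            return MES_RESTART;            // not enough input yet

        ++sim.PlotCollections;             // step 1
        ++sim.EnclaveNumber;               // step 2
        if (Blocks_Received == Blocks_Needed)
            New_Pass_Number = 1;           // step 3, first output block
        else
            ++New_Pass_Number;             // step 3, subsequent passes
        sim.PassNumber = New_Pass_Number;  // step 4
        return MES_AOK;                    // step 5
    }
private:
    int Blocks_Needed;
    int Blocks_Received = 0;
    int New_Pass_Number = 0;
};
```

If the gateway needs four input blocks, the downstream enclave sees pass 1 while the upstream enclave is on pass 4, so the two enclaves stay offset by three passes for the rest of the run. The sketch assumes the caller restores the upstream pass number and enclave number at the top of each pass.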
11.1.2.2 Variable Block Lengths
For operation using block-synchronous enclaves, execution is not permitted to progress from one enclave to the next until the block-asynchronous model joining the two enclaves has received a number of input samples sufficient to allow generation of a complete output block. An alternative approach is to allow the block-asynchronous model to generate an incomplete output block during every pass. On each pass, the output block will be as large as possible given the available inputs. Models downstream from the block-asynchronous model must be prepared to deal with varying block sizes. For most models, this is simply a matter of reading the block size from the signal once per pass rather than just once at system initialization. Models that use FFTs and models that implement memory as fixed-length circular buffers both depend upon input signals having block lengths that are constant. A reblocking model can be used to allow simulations with variable block lengths to include models that depend upon fixed block lengths. A reblocking model contains a buffer that accumulates input samples over multiple simulation passes. Whenever the buffer does not contain enough samples for an output block to be issued, the model sets a flag that downstream models can check during each pass to determine whether or not they should execute.

11.1.2.3 Dynamic Advances
Implementing a dynamic advance model can run into a buffering problem that is similar to the buffering problem encountered in connection with the dynamic delay model. Consider the case of a signal having 10 samples per block, as depicted in Figure 11.3. If the initial advance is four sample intervals in passes 1 and 2 of enclave 0, the samples in the enclave 1, pass 1, output block will be the samples $x_4, x_5, x_6, \ldots, x_{13}$. Input samples $x_{14}$ through $x_{19}$ will be saved in the model’s internal buffer for use in pass 2 of enclave 1. If the advance is then reduced to one sample, the samples in the pass 2 output block should be $x_{11}, x_{12}, \ldots, x_{20}$. However, if the model is implemented using a minimal buffering scheme, the samples $x_{11}$, $x_{12}$, and $x_{13}$ would not be available, having been used and discarded during pass 1 of enclave 1. Similar to the approach taken for variable delays, the variable advance model should allocate and maintain its buffer based on a maximum specified advance that is provided as one of the model’s configuration parameters.

Figure 11.3 Sample loss in simple implementation of dynamic advance. (Diagram of the input blocks for enclave 0, the contents of the model’s internal buffer between passes, and the output blocks for enclave 1 as the advance changes from four samples to one sample; the samples required at the start of the enclave 1, pass 2, output block are not available.)

The correct implementation of buffering for variable advances is slightly more complicated than it is for variable delays. Not only does the specified maximum advance govern the size of the internal buffer, it also establishes the offset between the pass numbers in the enclaves for which the advance model serves as a gateway. Suppose that we specify a maximum advance of 24 samples for the case of a signal having 10 samples per block. If the variable advance starts out at the maximum, the model must skip over the first two input blocks and the first four samples of the third block. The first output block (i.e., pass 1 of enclave 1) will contain samples $x_{24}$ through $x_{33}$. The enclave providing input to the advance model must execute pass 4 before the enclave accepting the advanced output can begin pass 1; thus the offset between enclaves is $4 - 1 = 3$ passes. If the two enclaves are to remain block synchronous within themselves, this offset cannot change over the life of the simulation, even if the desired amount of advance changes. If the desired advance is reduced from 24 to zero during pass 5 of enclave 0, and the offset between enclaves is held constant, the second output block of the advance model should contain samples $x_{10}$ through $x_{19}$, and the internal buffer should hold samples $x_{20}$ through $x_{49}$ for use in subsequent passes. We can generalize on this example to conclude that the internal buffer must be sized to hold $P$ complete input blocks, where

$$P = \left\lceil \frac{N_{max}}{N_{block}} \right\rceil$$

Another way to look at this is to forget about enclaves for a minute and just recognize that an advance of 24 is equivalent to an advance of 30 plus a delay of 6. This delay could be accomplished using a buffer of length 6. The two different block-synchronous enclaves offset by three passes simply provide a systematic way to mechanize an advance of 30 samples. However, once the offset between enclaves is set to 3 passes, it must stay fixed for the life of the simulation. Thus, when the advance is reduced to zero, it must be viewed as an advance of 30 plus a delay of 30, the latter requiring the model to have an internal buffer of length 30. In general, an advance of $N_{max}$ samples can be viewed as an advance of $P N_{block}$ samples plus a delay of $P N_{block} - N_{max}$ samples.
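The arithmetic in this example is easy to capture in code. The two helpers below are hypothetical (they are not part of the DiscreteAdvance model); they compute the number of whole blocks $P$ that must be buffered and the delay component needed to realize a given net advance once the enclave offset has been fixed at $P$ passes.

```cpp
// Number of whole input blocks the advance model must buffer,
// P = ceil(n_max / n_block); this is also the pass offset between enclaves.
int BlocksToBuffer(int n_max, int n_block)
{
    return (n_max + n_block - 1) / n_block;   // integer ceiling for positive values
}

// Delay component that, combined with the fixed advance of P*n_block samples,
// yields a net advance of n_adv samples.
int DelayComponent(int n_adv, int n_max, int n_block)
{
    return BlocksToBuffer(n_max, n_block) * n_block - n_adv;
}
```

For a maximum advance of 24 samples and 10 samples per block, BlocksToBuffer gives 3, and DelayComponent gives 6 for a net advance of 24 and 30 for a net advance of zero, matching the figures worked out above.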
It turns out that even though a maximum advance of $N_{\max}$ is specified, the model can actually support an advance of up to $P N_{\text{block}}$ samples by simply allowing the delay component to become zero.

11.1.2.4 Discrete Advance Model
Table 11.2 summarizes the model DiscreteAdvance, which can be used for advancing signals by integer multiples of the sampling interval. The model can operate in one of four different modes depending upon the value of Advance_Mode that is read from the parameter input file when the simulation is being configured. Advance_Mode is a variable of type ADVANCE_MODE_T, which is an enumerated type defined in advance_modes.h. Parameter input and stream I/O support for this enumeration are provided in advance_modes.cpp. The possible values for Advance_Mode and the corresponding model behaviors are analogous to those given above for the Delay_Mode in the DiscreteDelay model.
Section 11.1
Shifting Signals in Time
Table 11.2 Summary of model DiscreteAdvance.

Constructors:
    DiscreteAdvance<T>::DiscreteAdvance(
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_signal,
        Signal* out_signal,
        Control* dynamic_adv,
        Control* adv_change_enabled )

    DiscreteAdvance<T>::DiscreteAdvance(    // note 2
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_signal,
        Signal* out_signal,
        Control* dynamic_adv )

    DiscreteAdvance<T>::DiscreteAdvance(    // note 3
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_signal,
        Signal* out_signal )

Parameters:
    ADVANCE_MODE_T Advance_Mode;
    int Initial_Adv_In_Samps;
    int Num_Initial_Passes;
    int Max_Adv_In_Samps;

Notes:
1. BlockSize for this model is set by the PracSim system.
2. This constructor does not support ADVANCE_MODE_GATED.
3. This constructor does not support ADVANCE_MODE_GATED or ADVANCE_MODE_DYNAMIC.
4. Source code is contained in file discrete_adv_T.cpp.
11.1.3 Continuous-Time Delays via Interpolation
Sometimes it is necessary to delay a signal by an interval that is not an integer multiple of the sampling interval. In these cases, interpolation must be used to generate new sample values that fall within the intervals between existing samples. If the signal that is to be delayed has been sampled at a relatively high rate, linear interpolation will often provide sufficient accuracy. Consider the case of a signal having 10 samples per block, as depicted in Figure 11.4. Let’s say the signal is to be delayed by $3.6T$, where $T$ is the sample interval. This would delay the input $x_0$ to a point $0.6T$ to the right of $y_3$ and $0.4T$ to the left of $y_4$. Similarly, $x_1$ will be delayed to a point $0.6T$ to the right of $y_4$ and $0.4T$ to the left of $y_5$. The model’s output samples must be those values that occur at integer multiples of $T$, not at times $(n + 0.6)T$. The output $y_4$ occurring at time $4T$ needs to be the value that $x$ would have at this point if the signal were a function of continuous time. As depicted for arbitrary values of $x_0$ and $x_1$ in Figure 11.5, the output $y_4$ occurs at a point $0.4T$ to the right of the delayed $x_0$ and at $0.6T$ to the left of the delayed $x_1$. Using linear interpolation to calculate $y_4$ yields

$$\frac{y_4 - x_0}{x_1 - x_0} = \frac{4T - 3.6T}{4.6T - 3.6T}$$

$$\frac{y_4 - x_0}{x_1 - x_0} = \frac{0.4T}{T}$$

$$y_4 = 0.6x_0 + 0.4x_1$$

A similar equation can be written for each output and generalized to obtain

$$y_n = 0.6x_{n-4} + 0.4x_{n-3} \tag{11.1.1}$$

Figure 11.4 Relative sample positions when $y_k$ is delayed by $3.6T$ with respect to $x_k$.

If we assume $x_m \equiv 0$ for $m < 0$, we have $y_0 = y_1 = y_2 = 0$. Calculation of $y_3$ requires special consideration. If we assume $x_{-1} = 0$ and apply (11.1.1), we obtain

$$y_3 = 0.4x_0$$

On the other hand, we could just define $y_3 = 0$. The output sample $y_9$ is the last output in block 1 and makes use of inputs $x_5$ and $x_6$. The samples $x_6$, $x_7$, $x_8$, and $x_9$ must be saved inside the model for use during pass 2 of the simulation.

These results can be extended to the general case of a signal that is to be delayed by $\tau T$, where $T$ is the sampling interval and $\tau$ is nonnegative and real. Let $m$ be the largest integer that does not exceed $\tau$:

$$m = \lfloor \tau \rfloor$$

Then we can state the following:

1. The first $m + 1$ samples of output block 1 will be zero.
2. The last $m + 1$ samples in each input block must be saved for use at the start of the next pass (in the example above, $m = 3$ and the four samples $x_6$ through $x_9$ are saved).
3. The output samples $y_n$ for $n > m$ can be computed as

$$y_n = W x_{n-m-1} + (1 - W)\, x_{n-m}$$

where $W = \tau - \lfloor \tau \rfloor$.
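The three rules above can be sketched as a small block-oriented class. This is an illustrative sketch only: the class name `LinearDelaySketch` and its interface are assumptions, not the PracSim ContinuousDelay implementation. It keeps $m + 1$ past samples between calls so that the first outputs of each block can reach back to $x_{n-m-1}$, and it applies the interpolation formula with $x_{-1} = 0$ rather than forcing the output at $n = m$ to zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

// Illustrative sketch (not PracSim code): delays a real signal by tau
// sample intervals using linear interpolation, one block per call.
class LinearDelaySketch {
public:
    explicit LinearDelaySketch(double tau)
        : w_(tau - std::floor(tau)),               // W = tau - floor(tau)
          history_(checked_floor(tau) + 1, 0.0) {} // x[n-m-1]..x[n-1], zero before x[0]

    std::vector<double> process(const std::vector<double>& in) {
        std::vector<double> out;
        out.reserve(in.size());
        for (double x : in) {
            history_.push_back(x);  // history_ now holds x[n-m-1] ... x[n]
            // y[n] = W * x[n-m-1] + (1 - W) * x[n-m]
            out.push_back(w_ * history_[0] + (1.0 - w_) * history_[1]);
            history_.pop_front();   // keep m + 1 samples for the next output
        }
        return out;
    }

private:
    static std::size_t checked_floor(double tau) {
        assert(tau >= 0.0);
        return static_cast<std::size_t>(std::floor(tau));
    }
    double w_;
    std::deque<double> history_;
};
```

Because the saved samples live in `history_`, calling `process` once per block yields the same output as processing the whole signal in one call, which is the purpose of rule 2.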
Figure 11.5 Linear interpolation of output $y_4$ from inputs $x_0$ and $x_1$.

11.1.3.1 Interpolation Using Sampling Functions

According to the uniform sampling theorem, if the spectrum of a signal vanishes beyond an upper frequency of $f_H$, the signal can be completely determined by samples taken at uniform intervals of $T < 1/(2f_H)$. The sampled signal $x[m]$ is related to the analog $x_a(t)$ by

$$x[m] = x_a(mT)$$

The original signal $x_a(t)$ can be reconstructed from $x[m]$ by

$$x_a(t) = \sum_{m=-\infty}^{\infty} x[m]\, \frac{\sin\left[(\pi/T)(t - mT)\right]}{(\pi/T)(t - mT)} \tag{11.1.2a}$$

$$\phantom{x_a(t)} = \sum_{m=-\infty}^{\infty} x[m]\, \operatorname{sinc}(t/T - m) \tag{11.1.2b}$$

where

$$\operatorname{sinc}(\tau) = \begin{cases} 1 & \tau = 0 \\ \dfrac{\sin \pi\tau}{\pi\tau} & \text{otherwise} \end{cases} \tag{11.1.3}$$
Because Eq. (11.1.2) can be used to find $x_a(t)$ for any value of $t$, it can be used as the basis for a time shifter that can shift a sampled signal by an arbitrary amount. To realize a delay of $\tau$, we need to calculate values of $x_a(t)$ for $t = nT - \tau$.

$$x_{\text{dly}}(nT) = x_a(nT - \tau) = \sum_{m=-\infty}^{\infty} x[m]\, \operatorname{sinc}\!\left(n - m - \frac{\tau}{T}\right) \tag{11.1.4}$$
Equation (11.1.4) corresponds to the “usual” view of sinc interpolation, depicted in Figure 11.6, in which each input sample $x[m]$ is used to weight a sinc function that is centered at $t = mT$. The interpolated value at any arbitrary time $t_d$ is obtained as the sum of the values at time $t_d$ of all the weighted sinc functions. However, because $\operatorname{sinc}(\tau)$ is symmetric about $\tau = 0$, it is a simple matter to show that the value at $t = nT - \tau$ of a sinc function centered at $t = mT$ is equal to the value at $t = mT$ of a sinc function centered at $t = nT - \tau$. This observation leads to an equivalent view of sinc interpolation at time $t_d$, which has a set of weighted sinc functions all centered at $t_d$, as depicted in Figure 11.7. The sinc weighted by $x[m_1]$ is evaluated at time $m_1 T$, the sinc weighted by $x[m_2]$ is evaluated at time $m_2 T$, and so on. The equation corresponding to this alternate view of interpolation is given by

$$x_{\text{dly}}(nT) = x_a(nT - \tau) = \sum_{m=-\infty}^{\infty} x[m]\, \operatorname{sinc}\!\left(m + \frac{\tau}{T} - n\right) \tag{11.1.5}$$

Figure 11.6 Interpolation using sinc functions.
This alternate view is used in the theoretical development of a continuous-delay model. Equation (11.1.3) indicates that for all integer values of $\tau$ other than zero, the value of $\operatorname{sinc}(\tau)$ is zero. This means that in cases where the desired shift, $\tau$, is an integer multiple of $T$, the only nonzero summand in both Eqs. (11.1.4) and (11.1.5) will be the one for which $m = n - \tau/T$, so the interpolation is merely selecting an input sample for each shifted output such that

$$x_{\text{out}}[n] = x\!\left[n - \frac{\tau}{T}\right]$$

The more interesting case is when $\tau$ is not an integer multiple of $T$. Using the alternative view of Figure 11.7, the peaks of all the sinc functions used to compute $x_{\text{out}}[n]$ occur at a time between samples $k$ and $k + 1$, where

$$k = \left\lfloor n - \frac{\tau}{T} \right\rfloor \tag{11.1.6}$$

The output index $n$ can be expressed in terms of the sinc alignment index $k$ as

$$n = k + \left\lceil \frac{\tau}{T} \right\rceil \tag{11.1.7}$$

Substitution of Eq. (11.1.7) into Eq. (11.1.4) yields

$$x_{\text{dly}}(nT) = \sum_{m=-\infty}^{\infty} x[m]\, \operatorname{sinc}\!\left(k - m + \left\lceil \frac{\tau}{T} \right\rceil - \frac{\tau}{T}\right)$$

Let $\tau$ represent the displacement between $t = kT$ and the centers of the sinc functions; then

$$\left\lceil \frac{\tau}{T} \right\rceil - \frac{\tau}{T} = \frac{(1 - \tau)}{T}$$

and

$$x_{\text{dly}}(nT) = \sum_{m=-\infty}^{\infty} x[m]\, \operatorname{sinc}\!\left(k - m + \frac{(1 - \tau)}{T}\right) \tag{11.1.8}$$
The values of $\operatorname{sinc}(\tau)$ become small for large values of $\tau$, so the range of the summation in Eq. (11.1.8) can be truncated to a range of $m$ for which the values of $\operatorname{sinc}(k - m + (1 - \tau)/T)$ are “significant.” Assume that each output sample is to include contributions from $M$ leading input samples that occur immediately prior to the sinc peak as well as contributions from $M$ lagging input samples that occur immediately after the sinc peak. The $M$ leading samples are $x[k + 1 - M]$ through $x[k]$, and the $M$ lagging samples are $x[k + 1]$ through $x[k + M]$. The interpolation equation then becomes

$$y[n] = \sum_{m=k-M+1}^{k+M} x[m]\, \operatorname{sinc}\!\left(k - m + \frac{(1 - \tau)}{T}\right) \tag{11.1.9}$$

For any given value of $k$, the summation will require input samples $k - M + 1$ through $k + M$, and the result will be stored in output sample $k + \lceil \tau/T \rceil$. For given values of $M$ and $\tau$, the sinc function needs to be evaluated at the same $2M$ abscissae for every value of the output index $n$. This fact allows a continuous-delay model to compute a set of $2M$ sinc factors at the beginning of a simulation and use these same values for generating each output for as long as the delay remains constant. It turns out that an interpolating delay element based on Eq. (11.1.9) is really nothing more than an FIR filter in which the filter coefficients $h[p]$ are obtained as

$$h[p] = \operatorname{sinc}\!\left(p - M + \frac{(1 - \tau)}{T}\right), \qquad p = 0, 1, \ldots, 2M - 1$$
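The idea of precomputing $2M$ sinc factors once and reusing them for every output can be sketched as follows. This is an illustration under stated assumptions, not the PracSim implementation: the names `sinc_weights` and `interpolate` are hypothetical, the delay is passed already normalized as $\tau/T$, the fractional offset is taken directly as $\lceil \tau/T \rceil - \tau/T$ (the quantity that appears when Eq. (11.1.7) is substituted into Eq. (11.1.4)), and input samples outside the supplied vector are treated as zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// sinc as defined in Eq. (11.1.3)
double sinc(double t) {
    const double kPi = 3.14159265358979323846;
    return (t == 0.0) ? 1.0 : std::sin(kPi * t) / (kPi * t);
}

// Hypothetical helper: the 2M weights for a delay of tau/T samples.
// weights[i] multiplies x[k - M + 1 + i], i = 0 .. 2M-1.
std::vector<double> sinc_weights(double delay_in_samples, int M) {
    const double frac = std::ceil(delay_in_samples) - delay_in_samples;
    std::vector<double> weights(2 * M);
    for (int i = 0; i < 2 * M; ++i) {
        const int k_minus_m = M - 1 - i;          // k - m for m = k - M + 1 + i
        weights[i] = sinc(k_minus_m + frac);
    }
    return weights;
}

// Hypothetical helper: one output sample y[n] from the truncated sum.
// The weights depend only on the delay and M, so a real model would
// compute them once and reuse them while the delay stays constant.
double interpolate(const std::vector<double>& x, long n,
                   double delay_in_samples, int M) {
    const std::vector<double> w = sinc_weights(delay_in_samples, M);
    const long k = n - static_cast<long>(std::ceil(delay_in_samples));
    double y = 0.0;
    for (int i = 0; i < 2 * M; ++i) {
        const long m = k - M + 1 + i;
        if (m >= 0 && m < static_cast<long>(x.size())) {
            y += w[i] * x[static_cast<std::size_t>(m)];
        }
    }
    return y;
}
```

When the delay is a whole number of samples, the offset is zero and the weights collapse (to within floating-point rounding) to a single unit tap, so the routine simply selects $x[n - \tau/T]$, as the text notes for the integer case.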
Figure 11.7 Alternate view of sinc interpolation.

Block Mode Considerations

Before using Eq. (11.1.9) to implement a delay model, it is useful to explore how the various sample locations are related to the block boundaries of the input and output signals. In the discussions so far, the indices $n$ and $k$ have been assumed to start at zero and continually increment until the simulation ends. A different notation is needed to indicate indices within a particular block. In the development that follows, $k^{[B]}$ is used to indicate a sample index within input block $B$. The intrablock index $k^{[B]}$ and corresponding global index $k$ are related by

$$k^{[B]} = \begin{cases} k & B = 1 \\ k - \displaystyle\sum_{b=1}^{B-1} N_b & \text{otherwise} \end{cases}$$

where $N_b$ is the number of samples in block $b$. (Note that in PracSim, block numbering begins with one rather than zero because it is convenient to keep pass numbering and block numbering the same, and pass zero is reserved for some advanced consistency checking that does not generate any blocks of signal samples.) To simplify the initial development, let’s assume that $M + \lceil \tau/T \rceil < N$.

For development of implementation rules, it is more convenient to express the position of the sinc window in terms of its first and last samples rather than in terms of the alignment index $k$. The alignment index is a contrived quantity introduced solely for the purpose of allowing window position and window length to be considered independently in the foregoing development. Let $m_L$ represent the global index of the leftmost (i.e., first) sample in the truncated sinc window, and let $m_R$ represent the global index of the rightmost sample in the truncated sinc window:

$$m_L = k - M + 1$$

$$m_R = k + M$$

$$y[n] = \sum_{m=m_L}^{m_R} x[m]\, \operatorname{sinc}\!\left(m_L + M - m - 1 + \frac{(1 - \tau)}{T}\right)$$

The output index $n$ can be expressed in terms of $m_L$ as

$$n = m_L + M - 1 + \left\lceil \frac{\tau}{T} \right\rceil = m_L + M + \left\lfloor \frac{\tau}{T} \right\rfloor$$

Consider input block $B$ containing $N_B$ samples. The earliest output sample that depends on any inputs from block $B$ is the output computed when $m_R^{[B]} = 0$ or

$$m_R = \sum_{b=1}^{B-1} N_b$$

Generation of this output also involves the $2M - 1$ samples $m^{[B-1]} = N_{B-1} - 2M + 1$ through $m^{[B-1]} = N_{B-1} - 1$ saved from input block $B - 1$. (When $B = 1$, the values of
these saved samples are each assumed to be zero.) Using Eq. (11.1.7), the index for this output sample is obtained as

$$n_E = \sum_{b=1}^{B-1} N_b - M + \left\lceil \frac{\tau}{T} \right\rceil \tag{11.1.10}$$

The latest output sample that can be computed before input block $B + 1$ becomes available is the one computed when $m_R^{[B]} = N_B - 1$ or

$$m_R = N_B - 1 + \sum_{b=1}^{B-1} N_b = \sum_{b=1}^{B} N_b - 1$$

Using Eq. (11.1.7), the index for this output sample is obtained as

$$n_L = \sum_{b=1}^{B-1} N_b + N_B - M + \left\lceil \frac{\tau}{T} \right\rceil - 1$$

The output samples, indexed $n_E$ through $n_L$, comprise a sequence of exactly $N_B$ samples, and it seems logical to keep them together as a single output block. Let’s denote this block as output block $D$ and not immediately assume that $D = B$. Close examination of Eq. (11.1.10) reveals that the number of input samples prior to block $B$ and the number of output samples prior to block $D$ differ by $\lceil \tau/T \rceil - M$. When $\lceil \tau/T \rceil = M$, the difference is zero, and it seems logical to make $D = B$ and keep the output block size equal to the input block size.

When $\lceil \tau/T \rceil < M$, the number of input samples prior to block $B$ is greater than the number of output samples prior to block $D$. If $M - \lceil \tau/T \rceil$ is less than the nominal block size, it still makes sense to make $D = B$, but one output block (block 1 is the logical choice) needs to be shorter than the corresponding input block. If $M - \lceil \tau/T \rceil$ is greater than the nominal block size, we are faced with a choice. We can totally eliminate a number of output blocks such that at most one output block needs to be shortened, or we can keep $D = B$ and shorten two or more output blocks.

When $\lceil \tau/T \rceil > M$, the number of input samples prior to block $B$ is less than the number of output samples prior to block $D$. This case arises when the delay is larger than half the span of the truncated sinc window. The sequence of output samples will begin with a preamble of $\lceil \tau/T \rceil - M$ zeros taking the place of output samples
whose calculation would require delayed samples from input blocks prior to block 1. Signal management is easier if the nominal block size is made the maximum block size. Therefore, when $\lceil \tau/T \rceil > M$, a number of extra output blocks must be introduced so that the size of any one block does not exceed the nominal size.

The first sample in output block $B$ has an intrablock index of $n^{[B]} = 0$, which corresponds to a global index $n = (B - 1)N$. The position of the sinc window for generating this output is obtained from Eq. (11.1.6) as

$$k_1 = \left\lfloor (B - 1)N - \frac{\tau}{T} \right\rfloor$$

The first input sample needed for the summation in this position has a global index $m_{1A}$ given by

$$m_{1A} = \left\lfloor (B - 1)N - \frac{\tau}{T} \right\rfloor - M + 1 \tag{11.1.11}$$

$$\phantom{m_{1A}} = (B - 1)N - \left\lceil \frac{\tau}{T} \right\rceil - M + 1 \tag{11.1.12}$$

and the final input sample needed has a global index $m_{1B}$ given by

$$m_{1B} = \left\lfloor (B - 1)N - \frac{\tau}{T} \right\rfloor + M \tag{11.1.13}$$

$$\phantom{m_{1B}} = (B - 1)N - \left\lceil \frac{\tau}{T} \right\rceil + M \tag{11.1.14}$$

The final sample in output block $B$ has an intrablock index of $n^{[B]} = N - 1$, which corresponds to a global index $n = BN - 1$. The position of the sinc window for generating this output is obtained from Eq. (11.1.6) as

$$k_2 = \left\lfloor BN - 1 - \frac{\tau}{T} \right\rfloor$$

The first input sample needed for the summation in this position has a global index $m_{2A}$ given by

$$m_{2A} = \left\lfloor BN - 1 - \frac{\tau}{T} \right\rfloor - M + 1 \tag{11.1.15}$$

$$\phantom{m_{2A}} = BN - \left\lceil \frac{\tau}{T} \right\rceil - M \tag{11.1.16}$$

and the final input sample needed has a global index $m_{2B}$ given by

$$m_{2B} = \left\lfloor BN - 1 - \frac{\tau}{T} \right\rfloor + M \tag{11.1.17}$$

$$\phantom{m_{2B}} = BN - 1 - \left\lceil \frac{\tau}{T} \right\rceil + M \tag{11.1.18}$$
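The index bookkeeping in Eqs. (11.1.11) through (11.1.18) reduces to one floor operation per output sample. The sketch below is illustrative only; the `WindowSpan` struct and `window_for_output` function are assumed names, the delay is passed as $\tau/T$, and all blocks are assumed to have the same length $N$.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical type: global indices of the first and last input samples
// covered by the truncated sinc window.
struct WindowSpan {
    long first;  // m_L = k - M + 1
    long last;   // m_R = k + M
};

// Hypothetical helper: input span needed to generate the output sample
// with global index n, using k = floor(n - tau/T) from Eq. (11.1.6).
WindowSpan window_for_output(long n, double delay_in_samples, int M) {
    const long k = static_cast<long>(
        std::floor(static_cast<double>(n) - delay_in_samples));
    return WindowSpan{k - M + 1, k + M};
}
```

For example, with $N = 100$, $B = 2$, $M = 4$, and $\tau/T = 6.5$, the first output of the block ($n = 100$) needs inputs 90 through 97 and the last output ($n = 199$) needs inputs 189 through 196, in agreement with the closed forms (11.1.12), (11.1.14), (11.1.16), and (11.1.18).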
Large Delays

When $\lceil \tau/T \rceil > M$, the early samples in output block $B$ will be generated using only samples from input block $B - 1$, as depicted in Figure 11.8. Curly braces are used to indicate the span of the sinc interpolation window for various output samples. Specifically, when $\lceil \tau/T \rceil = M + p$, $0 < p \le (N + 1 - 2M)$, the first $p$ output samples in block $B$ are generated using only samples from input block $B - 1$. The first sample in output block $B$ is computed when

$$m_{1A} = (B - 1)N - \left\lceil \frac{\tau}{T} \right\rceil - M + 1$$

$$\phantom{m_{1A}} = (B - 1)N - (M + p) - M + 1$$

$$\phantom{m_{1A}} = (B - 2)N + N - 2M - p + 1$$

or

$$m_{1A}^{[B-1]} = N - 2M - p + 1$$

The first sample in output block $B$ that can be computed without using any samples from input block $B - 1$ corresponds to the point at which $m_{1A}^{[B]} = 0$ or $m_{1A} = (B - 1)N$. This sample is stored in location $n^{[B]} = 2M + p - 1$. Evaluation of Eqs. (11.1.15) and (11.1.17) for $\lceil \tau/T \rceil = M + p$ reveals that the final sample in output block $B$ is generated using samples $N - 2M - p$ through $N - 1 - p$ from input block $B$. When $\lceil \tau/T \rceil = M + p$ for $p > (N + 1 - 2M)$, some of the early samples in output block $B$ will require input samples from input blocks $B - 2$ and prior.

Small Delays

When the delay is smaller than half the width of the sinc window, the generation of some samples late in output block $B$ will require samples from input block $B + 1$, as depicted in Figure 11.9. Specifically, when $\lceil \tau/T \rceil = M - q$, $0 < q < M$, generation of sample $N - 1$ in output block $B$ requires samples 0 through $q - 1$ from input block $B + 1$. Generation of the first sample in output block $B$ uses samples $N - 2M + q + 1$ through $N - 1$ from input block $B - 1$ plus samples 0 through $q$ from input block $B$. Because the sinc interpolation sometimes must wait for future samples, even when used to implement a delay, it encounters buffering difficulties similar to those discussed in Section 11.1.2 for signal advance models.

11.1.3.2 Dynamic Block Size
The first output sample that depends upon block $B$ input is generated when the sinc window is positioned such that its final sample is aligned with the first sample in input block $B$. Thus, the first sample in output block $B$ is computed using one input from block $B$ and $2M - 1$ inputs from block $B - 1$. The final output that can be generated before block $B + 1$ becomes available is computed when the sinc window is positioned such that its final sample is aligned with the final sample in input block $B$, as depicted in Figure 11.10. If there are $N$ samples in the input block, there will be a total of $N$ output samples corresponding to the end of the sinc window aligning with each of the $N$ input samples. However, it would be inappropriate to issue these $N$ outputs as output block $B$. Assuming that all input blocks have fixed length $N$, sample $N - 1$ of block $B$ has a global index of $BN - 1$. When the final sample of the sinc window is aligned with this sample, $k = BN - 1 - M$ and the corresponding output index is obtained from Eq. (11.1.7) as

$$n = BN - 1 - M + \left\lceil \frac{\tau}{T} \right\rceil$$

which corresponds to a sample in block $B + 1$ or beyond whenever $\lceil \tau/T \rceil > M$.

Figure 11.8 Relationship between input and output blocks when $\lceil \tau/T \rceil > M$.
Figure 11.9 Relationship between input and output blocks when $\lceil \tau/T \rceil < M$.

The output that should be issued as sample $N - 1$ of block $B$ is generated when

$$k = BN - 1 - \left\lceil \frac{\tau}{T} \right\rceil$$

or when the final sample of the sinc window is aligned with input sample $m$, where $m$ is obtained as

$$m = k + M = BN - 1 + M - \left\lceil \frac{\tau}{T} \right\rceil$$

Consider the very first output sample indexed by $n = 0$. Equation (11.1.7) indicates that for this sample, the sinc alignment index is given by

$$k = -\left\lceil \frac{\tau}{T} \right\rceil$$
Figure 11.10 Alignment of sampling function for generating final sample in output block.

For this alignment, the summation (11.1.9) involves input samples $x[m_{0A}]$ through $x[m_{0B}]$, where

$$m_{0A} = -\left\lceil \frac{\tau}{T} \right\rceil - M + 1$$

$$m_{0B} = -\left\lceil \frac{\tau}{T} \right\rceil + M$$

For $\lceil \tau/T \rceil > M$, the summation for $y[0]$ involves only inputs $x[m]$ for which $m < 0$. Assuming that $x[m] = 0$ for $m < 0$, the proper output value should be $y[0] = 0$. The summation will not begin to involve inputs $x[m]$ for $m \ge 0$ until the output index $n$ equals or exceeds $\lceil \tau/T \rceil - M$. For $\lceil \tau/T \rceil = M$ with $n = 0$, the final summand in Eq. (11.1.9) involves $x[0]$. All of the other summands involve $x[m]$ for $m < 0$. For $\lceil \tau/T \rceil < M$, the summation for $y[0]$ involves a number of inputs $x[m]$ for $m \ge 0$.

Assuming that the input block has a length of $N$, the final sample in the block will have an index $m = N - 1$. When the final sample of the sinc window is aligned with this sample, the alignment index is $k = N - 1 - M$, and the corresponding
output index is obtained from Eq. (11.1.7) as

$$n = N - 1 - M + \left\lceil \frac{\tau}{T} \right\rceil$$

Thus, if the first output block contains all of the samples that can be generated from the first input block, there will be $N - M + \lceil \tau/T \rceil$ samples in this block. The final $2M - 1$ samples from the input block must be saved for use in generating the first $2M - 1$ samples in the second output block.

The first sample in the second output block is generated when the sinc window is positioned such that its final sample is aligned with the first sample of the second input block. Thus, the first sample will be computed using one input from the second block and $2M - 1$ inputs that were saved from the first block. The final output that can be generated using the second input block is computed when the sinc window is positioned such that its final sample is aligned with the final sample in the input block. If there are $N$ samples in the input block, there will be $N$ output samples, one output corresponding to each of the $N$ possible alignments of the sinc window.

If $\tau/T$ is larger than $N$, there may be one or more all-zero output blocks before the interpolation starts using actual input samples. In such a situation it would be possible to determine the exact length of the all-zero preamble and (1) prepend this preamble to the first output block computed from actual inputs, (2) issue this preamble as one large block during any pass prior to the pass that begins using actual inputs, or (3) issue this preamble as a number of smaller blocks spread over all passes prior to the pass that begins using actual inputs.

11.1.3.3 Continuous Delay Model
Table 11.3 summarizes the model ContinuousDelay, which can be used for delaying signals by arbitrary amounts. This model is provided as a template that can be instantiated for signals of various types. The model can operate in any of the four different modes described for DiscreteDelay. Additional constructors are provided that do not require the controls for gating and dynamic delay to be connected if they are not needed. The model uses interpolation to delay the input signal by intervals that in general are not integer multiples of the sampling interval. The particular interpolation technique to be used is selected by the value of Interp_Mode that is read from the parameter input file when the simulation is being configured.

1. INTERP_MODE_LINEAR. Delayed signal values are determined using simple linear interpolation between the point immediately before the desired time and the point immediately after the desired time.
2. INTERP_MODE_QUADRATIC. Quadratic interpolation is used to determine delayed signal values.
3. INTERP_MODE_SINC. Delayed signal values are interpolated using the sampling function $(\sin x)/x$. The number of points used for this interpolation is specified by an input parameter to the model.

Near the end of each block, the model must save a number of input samples for use during processing of the subsequent block. It would be possible to draw most samples to be processed directly from the input signal buffer, and only use the internal buffer for those samples that have been saved from the previous block. However, it is conceptually easier to copy samples from the input buffer to an internal interpolation buffer and then perform all processing using samples from this internal buffer. Let’s assume that this internal buffer is implemented as a circular buffer having $L \ge 2M$ locations. At the beginning of pass 1, the model initializes locations $L - M$ through $L - 1$ to zero. The first $M$ input samples are read into locations 0 through $M - 1$. At this point, the buffer is set up for interpolating a value at a time $t$, where $-T < t \le 0$. In other words, the maximum available right-bracketing input sample is $x_0$. Saying that input sample $x_{k+1}$ is the maximum available right bracket means that samples $x_{k+1}$ through $x_{k+M}$ but not $x_{k+M+1}$ have been read into the interpolation buffer.

By convention in PracSim, when rate-changing is being performed, the first output sample always has the same value as the first input sample. Then, if the output waveform is being compressed in time, the second output is interpolated at some time $t$, where $0 < t \le T$. In this case, the second output can be generated without reading additional inputs into the interpolation buffer. If the output waveform is being stretched in time, then the second output is interpolated at some time $t$, where $T < t \le 2T$. In this case, it is necessary to read input sample $x_M$ into the buffer and thereby change the maximum available right-bracket sample from $x_0$ to $x_1$. In general, to compute output sample $y_N$, the maximum required right-bracketing input sample will be $x_{k+1}$, where $k + 1$ is obtained as

$$k + 1 = \left\lceil \frac{N\, T_{\text{out}}}{T_{\text{in}}} \right\rceil$$

Whenever the available right-bracket index equals or exceeds the required right-bracket index, the output sample can be interpolated without reading further input samples into the interpolation buffer. Whenever the required right-bracket index exceeds the available right-bracket index, additional samples must be read into the interpolation buffer.
Table 11.3 Summary of model ContinuousDelay.

Constructors:
    ContinuousDelay<T>::ContinuousDelay(
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_signal,
        Signal* out_signal,
        Control* new_delay,
        Control* delay_change_enabled )

    ContinuousDelay<T>::ContinuousDelay(    // note 2
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_signal,
        Signal* out_signal,
        Control* new_delay )

    ContinuousDelay<T>::ContinuousDelay(    // note 3
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_signal,
        Signal* out_signal )

Parameters:
    DELAY_MODE_T Delay_Mode;
    INTERP_MODE_T Interp_Mode;
    double Initial_Delay;
    int Max_Delay;

Notes:
1. BlockSize for this model is set by the PracSim system.
2. This constructor does not support DELAY_MODE_GATED.
3. This constructor does not support DELAY_MODE_GATED or DELAY_MODE_DYNAMIC.
4. Source code is contained in file contin_delay_T.cpp.
11.2 Correlation-Based Delay Estimation
A simple cross-correlation between a transmitted waveform and the corresponding received waveform can be used to estimate the delay experienced by the waveform in passing through a simulated channel. In continuous time, the correlation of $x(t)$ and $y(t)$ is defined by

$$R_{xy}(\tau) = \int_{-\infty}^{\infty} x(t)\, y^*(t + \tau)\, dt$$

Subject to certain constraints, if $y(t)$ is simply a delayed version of $x(t)$, the value of $\tau$ for which $R_{xy}(\tau)$ is maximized is equal to the delay between $x(t)$ and $y(t)$. One important constraint is that $x(t)$ be a waveform that has “good” autocorrelation properties. The autocorrelation $R_x(\tau)$ is simply the correlation of a signal with itself, as in

$$R_x(\tau) = \int_{-\infty}^{\infty} x(t)\, x^*(t + \tau)\, dt$$

A signal is said to have good autocorrelation properties if the autocorrelation function $R_x(\tau)$ has a single maximum that is sharply peaked and easily distinguished from other large “near-maximum” values. A sinusoid is an example of a waveform with “bad” autocorrelation properties: the autocorrelation of $\sin \omega t$ is itself periodic, with equal maxima at delays $\tau = 2k\pi/\omega$ where $k = 0, \pm 1, \pm 2, \ldots$, so no single peak identifies the delay.

In simulations, correlations must be performed between finite blocks of samples from discrete-time signals. The length-$N$ discrete-time correlation of $x[n]$ and $y[n]$ is defined by

$$R_{xy}[k] = \sum_{n=0}^{N-1} x[n]\, y^*[n + k] \tag{11.2.1}$$

Subject to certain constraints, estimating the delay can be mapped into the equivalent problem of finding the delay between $\tilde{x}[n]$ and $\tilde{y}[n]$, where $\tilde{x}[n]$ and $\tilde{y}[n]$ are the periodic extensions of $x[n]$ and $y[n]$:

$$\tilde{x}[mN + n] = x[n], \qquad \tilde{y}[mN + n] = y[n]$$

for $n = 0, 1, 2, \ldots, N - 1$ and $m = 0, \pm 1, \pm 2, \ldots$
Assuming that $N$ is a power of 2, this mapping allows the correlation to be performed using FFT-based fast correlation techniques. Specifically, the correlation of $x[n]$ and $y[n]$ can be obtained as the inverse FFT of the product $X^*[m]\,Y[m]$, where $X[m]$ and $Y[m]$ are respectively the FFTs of $x[n]$ and $y[n]$. (The conjugation of one transform is what makes the product a correlation; without it, the inverse FFT would yield a circular convolution.) The result of the IFFT is searched to find the sample with the greatest magnitude, and the index of this sample is multiplied by the sampling interval to obtain the delay interval. In other words, if sample $L$ in the IFFT result has the largest magnitude, then the delay is $\tau = L T_S$, where $T_S$ is the sampling interval.

Because of the assumed periodicities implicit in the DFT, it is impossible to distinguish between the case of $y[n]$ delayed by $L$ samples with respect to $x[n]$ and the case of $x[n]$ delayed by $N - L$ samples with respect to $y[n]$. There are three different approaches for coping with this ambiguity. If it is known that $y[n]$ is always delayed with respect to $x[n]$, then all delays can be assumed to be positive and computed as $\tau = L T_S$. If it is not known whether $y[n]$ is delayed with respect to $x[n]$ or if $x[n]$ is delayed with respect to $y[n]$, but the magnitude of the delay is restricted to be less than $N T_S/2$, then the delay can be computed as

$$\tau = \begin{cases} L T_S & 0 \le L \le N/2 \\ (L - N)\, T_S & (N/2) < L < N \end{cases}$$

A negative value of $\tau$ indicates that $x[n]$ is delayed with respect to $y[n]$. The third, and most robust, approach involves padding both $x[n]$ and $y[n]$ with $N$ zero-valued samples prior to performing the FFTs. The result of the IFFT will then have a length of $2N$, and the delay can be computed as

$$\tau = \begin{cases} L T_S & 0 \le L \le N \\ (L - 2N)\, T_S & N < L < 2N \end{cases}$$

This approach accommodates delays from $(1 - N) T_S$ to $(N - 1) T_S$ without ambiguity. However, as the magnitude of the delay approaches $N$ samples, the amount of overlap between $x[n]$ and $y[n]$ decreases, making the results of the finite correlation very unreliable.

The estimation of delays larger than approximately $0.8 N T_S$ is usually performed as a two-step process. Using knowledge about the source and nature of the delay, an analyst can make a rough estimate of the delay between signals $x[n]$ and $y[n]$. In the simulation, an extra delay element is used to explicitly create $x_D[n]$ as a delayed version of $x[n]$. The amount of delay between $x[n]$ and $x_D[n]$ is an integer number of sampling intervals and is chosen to be large enough that the delay between $x_D[n]$ and $y[n]$ is “guaranteed” to be less than $0.8N$ sample intervals. This residual delay can be estimated using fast correlation and added to the fixed delay between $x[n]$ and $x_D[n]$ to obtain the total delay between $x[n]$ and $y[n]$.
11.2.1 Software Implementation
The CoarseDelayEstimator model, summarized in Table 11.4, uses fast correlation to estimate the delay between an input waveform in_sig and a reference waveform ref_sig. The model assumes that in_sig is similar to a delayed version of the reference waveform. If the two waveforms are unrelated, the CoarseDelayEstimator model still returns a delay estimate that corresponds to the peak of the correlation. The model does not perform any of the thresholding that would be needed to determine whether the peak represents a valid alignment between similar waveforms or simply the largest correlation between two random uncorrelated waveforms.
Table 11.4 Summary of model CoarseDelayEstimator.

Constructor:
    CoarseDelayEstimator::CoarseDelayEstimator(
        char* instance_name,
        PracSimModel* outer_model,
        Signal* in_sig,
        Signal* ref_sig,
        Control* estim_valid_ctl,
        Control* delay_est_ctl,
        Control* samps_delay_est_ctl );

Parameters:
    int Num_Corr_Passes
    bool Limited_Search_Window_Enab
    int Search_Window_Beg
    int Search_Window_End
    bool Invert_Input_Sig_Enab

Notes:
1. This model is a signal sink and has no output signals.
2. The nominal block length for the input signals is set by the PracSim infrastructure. The correlation length is $2^{k+1}$, where $2^k$ is the smallest power of two that equals or exceeds the nominal block length.
3. Source code is contained in file coarse_delay_est.cpp.
388
Synchronization and Signal Shifting
Chapter 11
A block diagram of the CoarseDelayEstimator model is shown in Figure 11.11. The input signals in_sig and ref_sig must have nominal block lengths that are equal. However, because these signals generally arrive via two very different paths, they may have different values for valid block length in any given pass. The model must employ special measures to ensure that different block lengths do not cause an apparent block-to-block variation in the relative delay between in_sig and ref_sig. An instance of the SignalReblocker class is created for each input to ensure that an equal number of samples from each input signal is used each time a correlation is performed. If there is an insufficient number of available samples for either waveform, the model exits without generating a new delay estimate. The SignalReblocker objects accumulate unused samples for use in subsequent passes.

The parameter Limited_Search_Window_Enab, when set true, causes the search for the correlation peak to be confined to delays greater than Search_Window_Beg and less than Search_Window_End. If Limited_Search_Window_Enab is set false, the search window parameters need not be specified, and the search is conducted over all delays from $(1 - N)T_S$ to $(N - 1)T_S$. If the parameter Invert_Input_Sig_Enab is set true, each sample of in_sig is multiplied by −1 before the correlation is performed. Correlation is performed for the number of passes indicated by Num_Corr_Passes. Once this limit is reached, the model operates in a bypass mode for the remainder of the simulation.

The delay estimate is conveyed to other models via three controls, which are outputs from CoarseDelayEstimator. Once a correlation has been performed, (1) the value of the samps_delay_est_ctl control is set to the estimated delay in samples, (2) the value of the delay_est_ctl control is set to the estimated delay in normalized time units, and (3) the value of estim_valid_ctl is set to true, indicating that the delay estimates are valid for use by downstream models. On subsequent passes in which a correlation is performed, the values of delay_est_ctl and samps_delay_est_ctl are updated to reflect the results of the new correlation. Figure 11.12 shows a block diagram indicating how these controls might be used in a simulation. A simulation similar to this block diagram is provided in the file coarsedelayest_sim.cpp.
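The peak-search idea behind the coarse estimator can be sketched in a few lines. The sketch below is not the PracSim implementation: it swaps the FFT-based fast correlation for a direct cross-correlation so that it stays self-contained, and the function name EstimateDelayInSamples and its arguments are hypothetical. It searches all lags from 1 − N to N − 1, optionally negates the input, and returns the lag having the largest correlation magnitude.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of PracSim). Returns the lag, in samples,
// at which in_sig best matches ref_sig. A positive result means that
// in_sig is delayed relative to ref_sig.
int EstimateDelayInSamples(const std::vector<double>& in_sig,
                           const std::vector<double>& ref_sig,
                           bool invert_input = false)
{
  assert(in_sig.size() == ref_sig.size() && !in_sig.empty());
  const int n = static_cast<int>(in_sig.size());
  const double sign = invert_input ? -1.0 : 1.0;

  int best_lag = 0;
  double best_mag = -1.0;
  // Direct (slow) correlation over every lag from 1-N to N-1.
  for (int lag = 1 - n; lag <= n - 1; ++lag) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
      const int j = i + lag;           // index into in_sig
      if (j < 0 || j >= n) continue;   // sample lies outside the block
      sum += ref_sig[i] * sign * in_sig[j];
    }
    const double mag = std::fabs(sum); // search on correlation magnitude
    if (mag > best_mag) {
      best_mag = mag;
      best_lag = lag;
    }
  }
  return best_lag;
}
```

Multiplying the returned lag by the sampling interval gives the corresponding delay in time units, which mirrors the relationship between the two delay controls described above.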
11.3
Phase-Slope Delay Estimation
The delay estimation approach described in Section 11.2 can estimate a delay to the nearest integer multiple of the sampling interval. For certain applications, finer estimates may be needed. Such estimates can be obtained using the differential phase slope approach, which is based on the time-delay property of the Fourier transform.
Figure 11.11 Block diagram for CoarseDelayEstimator model. (Blocks shown: a signal reblocker for each of in_sig and ref_sig, negate controlled by Invert_Input_Sig_Enab, two FFTs, conjugate, sample-by-sample multiply, IFFT, sample-by-sample magnitude, and search for peak. Model parameters shown: Ltd_Search_Win_Enab, Search_Win_Beg, Search_Win_End, Samp_Intvl. Outputs: est_is_valid_ctl, delay_est_ctl, samps_delay_est_ctl.)
Figure 11.12 Block diagram for a simulation that uses the CoarseDelayEstimator model. (Blocks shown: BitGener, BasebandWaveform, ButterworthFilterByIir, DiscreteDelay, CoarseDelayEstimator with inputs in_sig and ref_sig and outputs est_is_valid_ctl and samps_delay_est_ctl, and a second DiscreteDelay with inputs in_sig, delay_change_enab_ctl, and dynam_delay_ctl and output out_sig. The two signals at the output are time aligned.)
Consider a signal x[n] having X[m] as its discrete Fourier transform. If x[n] is delayed by τ, the Fourier transform of the delayed signal is equal to exp(−j2πτf) times the transform of the original signal:

$$x[n] \Longleftrightarrow X[m]$$

$$x_D[n] \Longleftrightarrow X[m]\,\exp(-j 2\pi \tau f)$$

If the transform of x[n] is multiplied by the conjugate of the transform of $x_D[n]$, the result will have a magnitude of $|X[m]|^2$ and a phase of $\theta[m] = 2\pi\tau m F$, where F is the spacing between adjacent discrete frequencies, so that f = mF. Theoretically, the delay τ can be determined by estimating the phase θ[m] at any discrete frequency m and computing

$$\tau = \frac{\theta[m]}{2\pi m F}$$

However, in practice, an estimate based on a single frequency is subject to various sources of error. A robust estimate based on multiple frequencies can be obtained by observing that the phase function θ[m] is a linear function of the frequency index m. The value for θ[m] can be estimated at a number of frequencies, and then a line can be fitted to the estimated points using linear regression, a standard statistical analysis technique described in numerous texts. The estimated delay is then proportional to the slope of the fitted line.

Figure 11.13 shows plots of $|X[m]|^2$ and θ[m] in degrees for the case of an NRZ waveform (see Chapter 3) for a random sequence of bits. The sampling interval is T = 0.125, and there are eight samples per bit. The nominal block length is 2048 samples padded with zeros to make a correlation record of length 4096. The delay is τ = 1.46. There are two important phenomena illustrated in this figure. First, the phase spectrum exhibits “wrapping” such that the phase values remain in the range −180 to +180 degrees. Second, the phase spectrum appears to become “noisier,” exhibiting larger deviations from a straight line at those frequencies for which the power spectrum becomes very small. The linear regression should be confined to a range of frequencies for which wrapping does not occur and the deviations from linear remain small. For a record of N samples, the discrete frequencies are spaced F = 1/(NT) apart, and the explicit formula for computing the delay estimate is

$$\tau = \frac{NT}{2\pi}\,\frac{\sum_{m=m_1}^{m_2} (m - \bar{m})\left(\theta[m] - \bar{\theta}\right)}{\sum_{m=m_1}^{m_2} (m - \bar{m})^2}$$

where θ is in radians and the overbar denotes the average over m = m_1 to m_2. For large delays, the slope of the phase spectrum becomes steep and wrapping occurs within a relatively small number of samples. Therefore, the two-step process described in Section 11.2 is extended to a three-step process when high-resolution estimates of large delays are desired:
1. Using knowledge about the source and nature of the delay, an analyst makes a rough estimate of the delay between signals x[n] and y[n]. In the simulation, an extra delay element is used to explicitly create $x_D[n]$ as a delayed version of x[n]. The amount of delay between x[n] and $x_D[n]$ is an integer number of sampling intervals and is chosen to be large enough that the delay between $x_D[n]$ and y[n] is “guaranteed” to be less than N sample intervals.
2. The residual delay is estimated using fast correlation and added to the fixed delay between x[n] and $x_D[n]$.
3. The delay between the adjusted $x_D[n]$ and y[n] is estimated using the differential phase slope approach.

The FineDelayEstimator model, summarized in Table 11.5, uses the differential phase slope approach to estimate the delay between an input waveform in_sig and a reference waveform ref_sig. The model assumes that in_sig is similar to a delayed version of the reference waveform. A block diagram of the FineDelayEstimator model is shown in Figure 11.14. The input signals in_sig and ref_sig are reblocked using the SignalReblocker class, as discussed for the CoarseDelayEstimator model in Section 11.2. The parameters Regression_Index_Begin and Regression_Index_End specify the range of frequencies over which regression of the phase is to be performed. (A deluxe version of FineDelayEstimator might be designed to use knowledge about the spectra of various types of waveform to automatically set the range of the frequency index over which the regression is performed.) If the parameter Invert_Input_Sig_Enab is set true, each sample of in_sig is multiplied by −1 before the phase is estimated. Estimation is performed for the number of passes indicated by Num_Corr_Passes. Once this limit is reached, the model operates in a bypass mode for the remainder of the simulation.

Because the FineDelayEstimator model is often used to refine a delay estimate produced by the CoarseDelayEstimator model, an input control, estim_enab_cntl, is provided so that operation of FineDelayEstimator can be suppressed until after other models have applied a coarse delay adjustment to the ref_sig input signal. The fine estimate of delay is conveyed to other models via two controls, which are outputs from FineDelayEstimator. Once an estimate has been computed, the value of estimated_delay_ctl is set to the estimated delay in normalized time units, and the value of estim_valid_ctl is set to true, indicating that the delay estimate is valid for use by downstream models. On subsequent passes in which a phase-slope estimation is performed, the value of estimated_delay_ctl is updated to reflect the results of the new estimate.
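The regression step of the phase-slope approach can be illustrated with a short sketch. It is not taken from fine_delay_est.cpp; the function name PhaseSlopeDelay and its argument list are hypothetical, and the sketch assumes that the phase values supplied to it are in radians and free of wrapping over the chosen index range. It fits a least-squares line to θ[m] over m1 ≤ m ≤ m2 and scales the slope by NT/(2π).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const double kTwoPi = 6.283185307179586476925;

// Hypothetical helper (not part of PracSim). theta[m] is the differential
// phase in radians at frequency index m, record_len is N, and samp_intvl
// is T. The regression runs over m1 <= m <= m2.
double PhaseSlopeDelay(const std::vector<double>& theta,
                       int m1, int m2,
                       int record_len, double samp_intvl)
{
  assert(m1 >= 0 && m1 < m2);
  assert(static_cast<std::size_t>(m2) < theta.size());

  // Averages of the index and of the phase over the regression range.
  const int count = m2 - m1 + 1;
  double m_bar = 0.0;
  double theta_bar = 0.0;
  for (int m = m1; m <= m2; ++m) {
    m_bar += m;
    theta_bar += theta[m];
  }
  m_bar /= count;
  theta_bar /= count;

  // Least-squares slope of theta versus m.
  double num = 0.0;
  double den = 0.0;
  for (int m = m1; m <= m2; ++m) {
    const double dm = m - m_bar;
    num += dm * (theta[m] - theta_bar);
    den += dm * dm;
  }
  const double slope = num / den;

  // Convert radians per frequency index into a delay in time units.
  return record_len * samp_intvl * slope / kTwoPi;
}
```

With an ideal, noise-free phase ramp the function simply recovers the delay used to build the ramp; with estimated phases, the choice of m1 and m2 plays the role described for the regression index parameters.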
Figure 11.13 Spectrum for fine delay estimation for simulated ideal NRZ waveform: (a) power spectrum, magnitude (dB) versus sample index, (b) differential phase spectrum, phase (degrees) versus sample index.
Figure 11.15 shows a block diagram indicating how these controls might be used in a simulation. A simulation similar to this block diagram is provided in the file finedelayest_sim.cpp.
Table 11.5 Summary of model FineDelayEstimator.
Constructor:
FineDelayEstimator::FineDelayEstimator( char* instance_name, PracSimModel* outer_model, Signal* in_sig, Signal* ref_sig, Signal* out_sig);
Parameters:
int Num_Corr_Passes
bool Limited_Search_Window_Enab
int Search_Window_Beg
int Search_Window_End
bool Invert_Input_Sig_Enab
Notes:
1. Source code is contained in file fine_delay_est.cpp.

Figure 11.14 Block diagram for FineDelayEstimator model. (Blocks shown: a signal reblocker for each of in_sig and ref_sig, negate controlled by Invert_Input_Sig_Enab, two FFTs, conjugate, sample-by-sample multiply, sample-by-sample phase, linear regression over Regression_Start to Regression_Stop, and scaling of the slope. Outputs: est_is_valid_ctl, delay_est_ctl.)

11.4
Changing Clock Rates
It is sometimes necessary to model the effects of clock-rate discrepancies between a transmitter and a receiver. For example, in a data transmitter, baseband I and Q inputs for a quadrature modulator are often generated by passing a symbol-rate sequence of complex samples through a pair of pulse-shaping filters. Ideally, a similar pair of baseband waveforms will be available at the output of a quadrature demodulator in the receiver. These waveforms are then sampled at the symbol rate to obtain the sequence of symbol values. The symbol-rate clock signals in the transmitter and receiver are nominally equal, but in practical systems, the two clock rates usually differ by some small amount, and it is often necessary to model the effects of this difference. A constant offset between clock rates is technically a frequency shift, but the techniques for modeling a small offset are more closely related to the techniques of time-shifting than they are to the techniques of frequency-shifting.

The interpolation techniques of Section 11.1.3.1 can easily be adapted for making small rate changes. In a fixed-delay model, a single set of interpolation coefficients is computed based on the relative time offset between the set of available samples and the set of desired samples. In a rate-change model, the relative offset between available samples and desired samples is continuously changing, so in general, a new set of interpolation coefficients must be computed for each output sample to be produced. In some cases, it would be possible to have a number of different sets of interpolation coefficients and periodically cycle through these sets. Specifically, consider the rate change factor $F_R$, defined as

$$F_R = \frac{T_{in}}{T_{out}} = \frac{R_{out}}{R_{in}}$$

If $T_{in}$ and $T_{out}$ are rational numbers, then $F_R$ is a rational number. If $F_R$ is expressed as a ratio

$$F_R = \frac{N_F}{D_F}$$

where $N_F$ and $D_F$ are integers whose greatest common factor is 1, then $N_F$ output intervals span exactly the same time as $D_F$ input intervals. A complete cycle of interpolation coefficients will therefore occur every $N_F$ output samples, and generating each cycle of $N_F$ output samples will consume exactly $D_F$ input samples. The RateChanger model, summarized in Table 11.6, takes the most general approach of computing a new set of interpolation coefficients for each output sample.

Table 11.6 Summary of model RateChanger.
Constructor:
RateChanger::RateChanger( char* instance_name, PracSimModel* outer_model, Signal* in_sig, Signal* out_sig);
Parameters:
int Num_Sidelobes
double Rate_Change_Factor
Notes:
1. Source code is contained in file rate_changer_T.cpp.
2. This template model is instantiated for types int, float, and complex.
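The periodic pattern of interpolation offsets for a rational rate change can be made concrete with a small sketch. It is not part of the RateChanger model, and the names RateCycle and ComputeRateCycle are hypothetical. Under the definition $F_R = R_{out}/R_{in} = N_F/D_F$, output sample k falls $k D_F / N_F$ input intervals after the start of the cycle, so the sketch reduces the ratio to lowest terms and lists the fractional part of that position for each output sample in one cycle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for one cycle of a rational rate change.
struct RateCycle {
  long outputs_per_cycle;       // N_F: output samples in one cycle
  long inputs_per_cycle;        // D_F: input samples consumed per cycle
  std::vector<double> offsets;  // offset of each output, in input intervals
};

static long GreatestCommonFactor(long a, long b)
{
  while (b != 0) {
    const long t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// rate_out and rate_in are integer sample rates in the same units.
RateCycle ComputeRateCycle(long rate_out, long rate_in)
{
  assert(rate_out > 0 && rate_in > 0);
  const long g = GreatestCommonFactor(rate_out, rate_in);

  RateCycle cycle;
  cycle.outputs_per_cycle = rate_out / g;  // N_F
  cycle.inputs_per_cycle = rate_in / g;    // D_F

  // Output k sits at k*D_F/N_F input intervals; keep the fractional part.
  for (long k = 0; k < cycle.outputs_per_cycle; ++k) {
    const long remainder = (k * cycle.inputs_per_cycle) % cycle.outputs_per_cycle;
    cycle.offsets.push_back(static_cast<double>(remainder) /
                            static_cast<double>(cycle.outputs_per_cycle));
  }
  return cycle;
}
```

For a rate change of 3/2, the offsets are 0, 2/3, and 1/3 before the pattern repeats, so only three sets of interpolation coefficients would be needed in that case.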
Figure 11.15 Block diagram for a simulation that uses the FineDelayEstimator model. (Blocks shown: BitGener, BasebandWaveform, ButterworthFilterByIir, ContinuousDelay, CoarseDelayEstimator driving a DiscreteDelay through est_is_valid_ctl and samps_delay_est_ctl, and FineDelayEstimator driving a ContinuousDelay through est_is_valid_ctl and delay_est_ctl. The two signals at the output are time aligned.)
Appendix 11A
EXAMPLE SOURCE CODE
The companion Web site includes nine Microsoft Visual Studio projects, each comprising a simulation that demonstrates and provides a test vehicle for a different signal shifter model, as listed in Table 11A.1.

Table 11A.1 Projects in Shifters directory.
project            featured model
DiscDelay          DiscreteDelay
DiscAdv            DiscreteAdvance
ContinDelay        ContinuousDelay
ContinAdv          ContinuousAdvance
RealCorrTest       RealCorrelator
RateChange         RateChanger
DftDelay           DftDelay
CoarseDelayEst     CoarseDelayEstimator
FineDelayEst       FineDelayEstimator

11A.1
DiscreteDelay
As discussed in Section 11.1.1.2, DiscreteDelay has three different constructors, which are provided in Listings 11A.1 through 11A.3. Tasks common to all the constructor forms are performed by the Constructor_Common_Tasks method, which is provided in Listing 11A.4. The Execute method is provided in Listing 11A.5.
Listing 11A.1 Constructor for DiscreteDelay model that supports gated, dynamic operation.
template< class T >
DiscreteDelay< T >::DiscreteDelay( char* instance_name,
                                   PracSimModel* outer_model,
                                   Signal* in_sig,
                                   Signal* out_sig,
                                   Control *dynam_dly_ctl,
                                   Control *delay_chg_enab_ctl )
                  :PracSimModel( instance_name, outer_model)
{
  MODEL_NAME(DiscreteDelay_T);
  this->Constructor_Common_Tasks( instance_name, in_sig, out_sig);
  //---------------------------
  // Controls
  Dynam_Dly_Ctl = dynam_dly_ctl;
  Delay_Chg_Enab_Ctl = delay_chg_enab_ctl;
  return;
}
Listing 11A.2 Constructor for DiscreteDelay model that supports ungated, dynamic operation.
template< class T >
DiscreteDelay< T >::DiscreteDelay( char* instance_name,
                                   PracSimModel* outer_model,
                                   Signal* in_signal,
                                   Signal* out_signal,
                                   Control *dynam_dly_ctl )
                  :PracSimModel(instance_name, outer_model)
{
  this->Constructor_Common_Tasks( instance_name, in_signal, out_signal);
  //------------------------------------------
  // Controls
  Dynam_Dly_Ctl = dynam_dly_ctl;
  switch (Delay_Mode){
  case DELAY_MODE_NONE:
  case DELAY_MODE_FIXED:
  case DELAY_MODE_DYNAMIC:
    break;
  case DELAY_MODE_GATED:
    ostrstream temp_stream;
    temp_stream Constructor_Common_Tasks( instance_name, in_signal, out_signal);
  char *message;
  ostrstream temp_stream;
  switch (Delay_Mode){
  case DELAY_MODE_NONE:
  case DELAY_MODE_FIXED:
    break;
  case DELAY_MODE_DYNAMIC:
    temp_stream GetBlockSize();
  Samp_Intvl = fsig_Input->GetSampIntvl();
  Filter_Core->Initialize(block_size, Samp_Intvl);
  Osc_Output_Prev_Val = 0.0;
  OscOutput = 0;
  Phi_Sub_2 = 0;
  Prev_Input_Positive = true;
  Prev_Input_Val = 0;
  Prev_Filt_Val = 0;
  Prev_Time_Zc = 0;
  Prev_State = 0;
  Prev_Cap_Val = 0;
  Time_Of_Samp = 0.0;
  Prev_Osc_Phase = 0.0;
}
Example Source Code
Appendix 12A
Listing 12A.4 Execute method for DigitalPLL model.
int DigitalPLL::Execute()
{
  // pointers for signal data
  float *fsOutput_ptr;
  float *fs_filtered_error_ptr;
  float *fsOscOutput_ptr;
  float *fsOscFreq_ptr;
  float *fsOscPhase_ptr;
  float *fsInput_ptr;

  float input_val;
  float prev_input_val;
  float osc_output_val;
  float filt_val;
  float inst_freq;

  double samp_intvl;
  double err_sum=0;
  double err_avg;
  int block_size, is;
  double time_of_samp;
  double time_zc;
  double delta_T;
  double osc_phase;
  double cap_val;
  double prev_cap_val;
  double prev_osc_phase;
  double prev_filt_val;
  double prev_time_zc;
  double output_phase;
  double tau_n;
  int prev_state;
  int new_state;
  double time_dwell_plus;
  double time_dwell_minus;

  // set up pointers to data buffers for input and
  // output signals
  fsOutput_ptr = GET_OUTPUT_PTR( fsig_Output );
  fs_filtered_error_ptr = GET_OUTPUT_PTR( fsig_Filtered_Error );
Section 12A.1 DigitalPLL
  fsOscOutput_ptr = GET_OUTPUT_PTR( fsig_Osc_Output );
  fsOscFreq_ptr = GET_OUTPUT_PTR( fsig_Osc_Freq );
  fsOscPhase_ptr = GET_OUTPUT_PTR( fsig_Osc_Phase );
  fsInput_ptr = GET_INPUT_PTR( fsig_Input );

  samp_intvl = Samp_Intvl;
  osc_output_val = Osc_Output_Prev_Val;
  prev_input_val = Prev_Input_Val;
  prev_filt_val = Prev_Filt_Val;
  prev_time_zc = Prev_Time_Zc;
  prev_state = Prev_State;
  prev_cap_val = Prev_Cap_Val;
  prev_osc_phase = Prev_Osc_Phase;

  block_size = fsig_Input->GetValidBlockSize();
  fsig_Output->SetValidBlockSize(block_size);
  fsig_Filtered_Error->SetValidBlockSize(block_size);
  fsig_Osc_Output->SetValidBlockSize(block_size);
  fsig_Osc_Freq->SetValidBlockSize(block_size);
  fsig_Osc_Phase->SetValidBlockSize(block_size);

  for (is=0; is= 0) != Prev_Input_Positive){
      // zero crossing has occurred
      time_zc = time_of_samp - samp_intvl * input_val
                /(input_val - prev_input_val);
      // compute elapsed interval
      delta_T = time_zc - prev_time_zc;
      // update oscillator phase
      inst_freq = Omega_Sub_0 + K_Sub_0 * prev_filt_val;
      osc_phase = prev_osc_phase + inst_freq * delta_T;
      // based on osc_phase and prev_osc_phase,
      // determine if the oscillator waveform has
      // had a positive-going zero crossing between
      // times prev_time_zc and time_zc.
      //
      // Normalize osc_phase and prev_osc_phase
      if(osc_phase > TWO_PI){
        osc_phase -= TWO_PI;
        prev_osc_phase -= TWO_PI;
      }
      if(osc_phase >= 0.0 && prev_osc_phase < 0.0){
        // a positive-going zero crossing has
        // occurred, so compute the crossing time
        tau_n = -delta_T * prev_osc_phase/
                ( osc_phase - prev_osc_phase);
      }
      else{
        tau_n = 0.0;
      }
      //------------------------------------------
      // do state machine for phase detector
      switch (prev_state){
      case 1:
        if(tau_n !=0.0){
          new_state = 0;
          time_dwell_plus = tau_n;
          time_dwell_minus = 0.0;
        }
        else{
          new_state = 0;
          time_dwell_plus = delta_T;
          time_dwell_minus = 0.0;
        }
        break;
      case -1:
        if(!Prev_Input_Positive){
          //step 3 Algorithm 12.3
          new_state = -1;
          time_dwell_plus = 0.0;
          time_dwell_minus = delta_T;
        }
        else{
          if(tau_n != 0.0){
            //step 4a Algorithm 12.3
            new_state = -1;
            time_dwell_plus = 0.0;
            time_dwell_minus = delta_T - tau_n;
          }
          else{
            //step 4b Algorithm 12.3
            new_state = 0;
            time_dwell_plus = 0.0;
            time_dwell_minus = 0.0;
          }
        }
        break;
      case 0:
        if(Prev_Input_Positive){
          if(tau_n != 0.0 ){
            // step 5a Algorithm 12.3
            new_state = 0;
            time_dwell_plus = tau_n;
            time_dwell_minus = 0.0;
          }
          else{
            // step 5b Algorithm 12.3
            new_state = 1;
            time_dwell_plus = delta_T;
            time_dwell_minus = 0.0;
          }
        }
        else{
          if(tau_n != 0.0 ){
            // step 6a Algorithm 12.3
            new_state = -1;
            time_dwell_plus = 0.0;
            time_dwell_minus = delta_T - tau_n;
          }
          else{
            // step 6b Algorithm 12.3
            new_state = 0;
            time_dwell_plus = 0.0;
            time_dwell_minus = 0.0;
          }
        }
      }
      // Perform Filtering
      //
      if(time_dwell_plus == 0 && time_dwell_minus == 0){
        // step 6 Algorithm 12.5
        cap_val = prev_cap_val;
        filt_val = cap_val;
      }
      else{
        if(time_dwell_minus > 0){
          // step 7 Algorithm 12.5
          cap_val = prev_cap_val*(1.0 -
                    time_dwell_minus/(Tau_1 + Tau_2));
          filt_val = cap_val * (1.0 - (Tau_2/(Tau_1+Tau_2))*
                     (time_dwell_minus/delta_T));
        }
        else{
          // step 8 Algorithm 12.5
          cap_val = prev_cap_val +
                    (time_dwell_plus/(Tau_1 + Tau_2))*
                    (Supply_Volts - prev_cap_val);
          filt_val = cap_val +
                     (time_dwell_plus/delta_T)*
                     (Tau_2/(Tau_1+Tau_2))*
                     (Supply_Volts - cap_val);
        }
      }
      //---------------------------------------------
      // update delayed variables
      prev_state = new_state;
      prev_time_zc = time_zc;
      prev_osc_phase = osc_phase;
      prev_cap_val = cap_val;
      prev_filt_val = filt_val;
      Prev_Input_Positive = (input_val >= 0);
    }
    delta_T = time_of_samp - prev_time_zc;
    inst_freq = Omega_Sub_0 + K_Sub_0 * prev_filt_val;
    output_phase = prev_osc_phase + inst_freq * delta_T;
    *fsOutput_ptr++ = sin(output_phase);
    *fs_filtered_error_ptr++ = prev_filt_val;
    *fsOscPhase_ptr++ = output_phase;
    *fsOscFreq_ptr++ = inst_freq/TWO_PI;
    fsInput_ptr++;
  }
  Prev_Input_Val = prev_input_val;
  Prev_Filt_Val = prev_filt_val;
  Prev_Time_Zc = prev_time_zc;
  Prev_State = prev_state;
  Prev_Cap_Val = prev_cap_val;
  Time_Of_Samp = time_of_samp;
  Prev_Osc_Phase = prev_osc_phase;

  err_avg = err_sum / block_size;
  BasicResults 1 Nin − dk−1 − RNout
Multirate Simulations
Chapter 14
Section 14.1
Basic Concepts of Multirate Signal Processing

Figure 14.1 Block diagram of decimation, which consists of antialias filtering followed by downsampling. (The input x[n] passes through an LPF and then a downsampler ↓M to produce y[n].)

Table 14.1 Summary of template model Downsampler.
Constructor:
Downsampler::Downsampler( char* instance_name, PracSimModel* outer_model, Signal* in_signal, Signal* out_signal);
Parameter:
int Decim_Rate
Note:
1. Source code is contained in file downsampler_t.cpp.

14.1.2
Interpolation by Integer Factors
The basic idea behind interpolation is to produce new sample values between the existing samples. Consider the sampled signal and its continuous-frequency spectrum shown in Figure 14.2. With a sampling rate of $F_S$, the simulation bandwidth extends from $-F_S/2$ to $F_S/2$, thus rejecting all but the baseband image, as shown in Figure 14.2(c).

Figure 14.2 Waveforms for discussion of interpolation: (a) sampled signal, (b) its spectrum, and (c) relationship between spectral images and simulation bandwidth.

Suppose we wish to triple the sampling rate. Upsampling is accomplished by inserting two zero-valued samples between each pair of original samples to yield the sequence shown in Figure 14.3. With a new sampling rate of $3F_S$, the system bandwidth now extends from $-3F_S/2$ to $3F_S/2$. Now three images of the original spectrum fit within the system bandwidth, as shown in Figure 14.4. Lowpass filtering is performed to limit the signal to a bandwidth equal to half the original sampling rate. This filtering removes the two extra spectral images from the system bandwidth to yield the signal and spectrum depicted in Figure 14.5. For this reason, the filter in an interpolator is sometimes called an anti-imaging filter.

Sometimes in the DSP literature, the introduction of the zero-valued samples, as in Figure 14.3, is described as compressing the signal’s spectrum by a factor of M. Upsampling does not really compress the spectrum, but this description arises from the common practice of using the sampling rate to normalize the frequencies in a DSP system. If the frequencies in Figure 14.2(b) are normalized by $F_S$, the bandwidth of the baseband image is confined to the normalized frequency range of ±1/2. After interpolation, the sampling rate is $3F_S$, and if the frequencies are normalized accordingly, the baseband image is confined to the normalized frequency range of ±1/6. Thus, due to the change in normalization, the spectrum appears to have been compressed by a factor of three.

To allow for maximum flexibility in the choice of anti-imaging filters, PracSim does not include an integrated interpolation model. Instead, interpolation is accomplished by using an upsampler followed by a separate filter model. The Upsampler template model is summarized in Table 14.2.
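The zero-insertion step of upsampling is simple enough to show directly. The sketch below is an illustration only and is not the code in upsampler_t.cpp; the function name UpsampleByZeroInsertion is hypothetical. It assumes that each input sample is followed by interp_rate − 1 zeros, including the last one, so that the output block is exactly interp_rate times as long as the input block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PracSim). Places each input sample at
// every interp_rate-th output position and leaves zeros in between.
std::vector<double> UpsampleByZeroInsertion(const std::vector<double>& in,
                                            int interp_rate)
{
  assert(interp_rate >= 1);
  const std::size_t rate = static_cast<std::size_t>(interp_rate);
  std::vector<double> out(in.size() * rate, 0.0);
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i * rate] = in[i];
  }
  return out;
}
```

The output of such a routine still contains the spectral images discussed above, so it would be followed by an anti-imaging lowpass filter before use.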
Figure 14.3 Sampled signal after zero-valued samples have been inserted.

Figure 14.4 Relationship between simulation bandwidth and the spectrum of the signal after zero-valued samples have been inserted.

14.1.3
Decimation and Interpolation by Noninteger Factors
The sampling rate of a signal can be changed by a rational factor L/M by first interpolating by a factor of L and then decimating by a factor of M. If L > M, the net effect is interpolation by a factor of L/M. If M > L, the net effect is decimation by a factor of M/L. Interpolation by a noninteger factor can be used to convert a compact disc (CD) signal into a digital audio tape (DAT) signal. The sampling rate for CD recordings is 44.1 kHz, and the sampling rate for DAT recordings is 48 kHz. Interpolation by a factor of 160 can be used to convert the CD sample rate from 44.1 kHz to 7056 kHz. Decimation by a factor of 147 can then be used to convert the 7056 kHz sample rate down to the DAT rate of 48 kHz.
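The factors 160 and 147 in the CD-to-DAT example follow from reducing the ratio of the two sample rates to lowest terms. The sketch below shows that arithmetic; the names RationalRateChange and FindRateFactors are hypothetical, and the rates are assumed to be given as integers in hertz.

```cpp
#include <cassert>

// Hypothetical result type for a rational sample-rate change.
struct RationalRateChange {
  long interp_factor;      // L
  long decim_factor;       // M
  long intermediate_rate;  // rate after interpolation, before decimation
};

RationalRateChange FindRateFactors(long rate_in_hz, long rate_out_hz)
{
  assert(rate_in_hz > 0 && rate_out_hz > 0);
  // Euclid's algorithm for the greatest common factor of the two rates.
  long a = rate_in_hz;
  long b = rate_out_hz;
  while (b != 0) {
    const long t = a % b;
    a = b;
    b = t;
  }
  RationalRateChange result;
  result.interp_factor = rate_out_hz / a;
  result.decim_factor = rate_in_hz / a;
  result.intermediate_rate = rate_in_hz * result.interp_factor;
  return result;
}
```

For rates of 44100 Hz and 48000 Hz the greatest common factor is 300, which gives L = 160, M = 147, and an intermediate rate of 7,056,000 Hz, in agreement with the figures quoted above.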
14.2
Filter Design for Interpolators and Decimators
In DSP applications in which both sampling rates and filter lengths need to be kept as low as possible, the design of filters for decimation and interpolation can sometimes be quite a challenge. However, in simulations, which typically sample signals at rates many times higher than the Nyquist rate, and which tolerate whatever filter lengths are needed for high-fidelity modeling, the design of the necessary filters can be relatively easy. In DSP applications, the filter design difficulties are often eased by using a multistage approach [2], but in a simulation context, single-stage designs are almost always adequate.
Figure 14.5 (a) Sampled signal from Figure 14.3, and (b) its spectrum after lowpass filtering.

Table 14.2 Summary of template model Upsampler.
Constructor:
Upsampler::Upsampler( char* instance_name, PracSimModel* outer_model, Signal* in_signal, Signal* out_signal);
Parameter:
int Interp_Rate
Note:
1. Source code is contained in file upsampler_t.cpp.
The Remez exchange algorithm provides a convenient way to design FIR filters that can be used for interpolation and decimation. In most digital filter design programs, such as those provided in [2] or in various commercial packages, passband and stopband edge frequencies are normalized by the sample rate. For example, given a sampling rate of 10,000 samples per second, a passband edge at 3 kHz would be specified using a normalized value given by

$$f_p = \frac{3 \times 10^3}{10^4} = 0.3$$

The other design parameter used by the Remez exchange is the ripple ratio, which is the ratio of maximum passband ripple to minimum stopband attenuation. For a maximum passband ripple of $\delta_1 = 0.025$ and a minimum stopband attenuation of $L_S = 60$ dB, the ripple ratio K can be obtained as

$$K = \frac{\delta_1}{10^{(-L_S/20)}} = \frac{0.025}{10^{(-60/20)}} = 25.0$$

14.2.1
Interpolation
Figure 14.4 shows the continuous-frequency spectrum for a signal that has been upsampled by a factor of three. Three images of the original spectrum fit within the new system bandwidth indicated by the unshaded area. The relative spacing of the images depicts a typical DSP situation in which the original sampling rate is only slightly larger than twice the original signal’s one-sided bandwidth. The interpolation filter must remove all but the center image; therefore, the filter’s passband must be wide enough to accommodate this image. The filter’s transition band must be narrow so that the adjacent spectral image falls completely within the stopband.

Example 14.1 Consider a signal consisting of four sinusoids of equal magnitudes. The normalized frequencies of the sinusoids are 3.0, 1.5, 0.75, and 0.375. The sampling rate is 8 samples per second. A segment of such a signal generated by the MultipleToneGener model is shown in Figure 14.6. The spectrum of this signal is shown in Figure 14.7. The spectrum shown was generated by the SpectrumAnalyzer model using the parameters listed in Table 14.3. After the signal is upsampled by a factor of four, the spectrum develops images as shown in Figure 14.8. The interpolation filter needs to have a flat passband response and good attenuation in the stopband. The filter must pass the signal components at ±3 Hz while rejecting all components at ±5 Hz and beyond. A filter having a passband edge frequency of 3.5 and a stopband edge frequency of 4.5 should satisfy these requirements. Normalized to the sampling rate, these two frequencies are specified as $f_P = 3.5/32 = 0.109375$ and $f_S = 4.5/32 = 0.140625$. Figures 14.9 and 14.10 show the response of a 71-tap filter having these critical frequencies and a ripple ratio of 20. When the upsampled signal is passed through this filter, the result will have the spectrum shown in Figure 14.11. The baseband image is passed with low distortion, and the undesired images are attenuated by more than 55 dB.

The ultimate test of goodness for an interpolator would be some measure of the deviation between the interpolated waveform and what the original waveform would have been if it had been originally generated at the higher sample rate. While not possible in a real system, this kind of test can easily be performed in a simulated system. Simply generate a test waveform at the sample rate desired for the interpolated signal and downsample to the rate that will be used for the interpolator input in the production runs of the simulation. The output of the interpolator can then be compared to the original high-sample-rate waveform. The filtering operation will introduce a delay in the interpolated signal, and the original reference signal must be delayed by the same amount before a sample-by-sample comparison can be performed to gauge the fidelity of the interpolated signal.

When the simulation depicted in Figure 14.12 is run using the interpolation filter of Figure 14.9, the measured signal-to-distortion ratio is approximately 33.3 dB. This simulation generates a reference signal directly at the interpolator’s output sample rate. The reference signal is then downsampled to obtain the test signal that will be interpolated. The signal-to-distortion ratio is measured by comparing the interpolated signal to the originally generated reference signal.
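The comparison described above can be sketched as a small routine that delays the reference by the filter delay and accumulates signal and distortion power. This is an illustration under stated assumptions rather than the MeanSquareError model: the function name SignalToDistortionDb is hypothetical, the filter delay is assumed to be a known whole number of samples, and the interpolated signal is assumed to have already been scaled to the same amplitude as the reference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of PracSim). Compares interp_sig[n] with
// ref_sig[n - delay_samps] and returns the ratio of reference power to
// error power in decibels.
double SignalToDistortionDb(const std::vector<double>& ref_sig,
                            const std::vector<double>& interp_sig,
                            std::size_t delay_samps)
{
  assert(ref_sig.size() == interp_sig.size());
  assert(delay_samps < ref_sig.size());

  double sig_power = 0.0;
  double dist_power = 0.0;
  for (std::size_t n = delay_samps; n < interp_sig.size(); ++n) {
    const double delayed_ref = ref_sig[n - delay_samps];
    const double err = interp_sig[n] - delayed_ref;
    sig_power += delayed_ref * delayed_ref;
    dist_power += err * err;
  }
  if (dist_power == 0.0) {
    // Identical signals: the ratio is unbounded.
    return std::numeric_limits<double>::infinity();
  }
  return 10.0 * std::log10(sig_power / dist_power);
}
```

In practice the first samples of the interpolated signal, which contain the filter's start-up transient, would usually be excluded from the sums as well.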
Figure 14.6 Segment of the four-sinusoid signal used in Example 14.1. (Axes: amplitude versus time.)

Figure 14.7 Spectrum of four-sinusoid signal for Example 14.1. (Axes: relative PSD in dB versus frequency in Hz.)

Table 14.3 Spectrum analyzer parameters for Example 14.1.
Kind_Of_Spec_Estim = SPECT_CALC_BARTLETT_PDGM
Num_Segs_To_Avg = 600
Seg_Len = 4000
Fft_Len = 4096
Hold_Off = 0
Norm_Factor = 1.0
Freq_Norm_Factor = 1.0
Output_In_Decibels = true
Plot_Two_Sided = true

Figure 14.8 Spectrum of four-sinusoid signal after upsampling by a factor of four. (Axes: relative PSD in dB versus frequency in Hz.)

Figure 14.9 Magnitude response of 71-tap interpolation filter designed for a passband edge of 3.5 Hz and a stopband edge of 4.5 Hz. (Axes: magnitude in dB versus frequency in Hz.)

Figure 14.10 Magnified passband detail for filter response shown in Figure 14.9. (Axes: magnitude in dB versus frequency in Hz.)

Figure 14.11 Spectrum of interpolated signal after filtering. (Axes: relative PSD in dB versus frequency in Hz.)

Figure 14.12 Block diagram for assessing the signal-to-distortion ratio due to interpolation. (Blocks shown: MultipleSineGener, SignalAnchor, Downsampler, Upsampler, AnlgDirectFormFir, DiscreteDelay, and MeanSquareError. Signals shown: ref_sig, test_sig, upsamp_test_sig, filt_sig, and delayed_ref_sig.)
In Example 14.1, the original sampling rate was only slightly larger than twice the bandwidth of the test signal. In a simulation context, the sampling rate is usually much larger than twice the signal bandwidth, making it easier to design a high-performance interpolation filter.

Example 14.2 Consider the signal used in Example 14.1 but with a sample rate of 16 rather than 8 samples per second. After the signal is upsampled by a factor of four, the spectrum develops images as shown in Figure 14.13. The interpolation filter must pass the signal components at ±3 Hz while rejecting all components at ±11 Hz and beyond. Figures 14.14 and 14.15 show the response of a 71-tap filter designed for a passband edge frequency of 4 Hz, a stopband edge frequency of 8 Hz, and a ripple ratio of 20. When the upsampled signal is passed through this filter, the result will have the spectrum shown in Figure 14.16. The baseband image is passed with virtually no distortion, and the undesired images are attenuated by more than 90 dB. The measured signal-to-distortion ratio is approximately 66.7 dB.

When a signal consisting of a desired signal plus white noise is interpolated, the SNR at the interpolator output is generally different from the SNR at the interpolator input. For an interpolation factor of L, the upsampling process spreads both the original signal energy and the original noise energy over a bandwidth L times greater than the original bandwidth. A well-designed interpolation filter passes almost exactly 1/L of the upsampled signal’s energy. However, the amount of noise energy passed by the filter depends upon the noise-equivalent bandwidth of the filter. If the amount passed is less than 1/L times the total noise energy in the simulation bandwidth, then the overall SNR will be improved by the interpolation process. On the other hand, if the amount of noise energy passed is greater than 1/L times the total noise energy, then the overall SNR will be degraded by the interpolation process. These changes must be accounted for when setting the level of additive noise in a simulation.
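The SNR bookkeeping described above can be sketched numerically. The following Python fragment is only an illustration, not part of the companion software; the function name `interpolation_snr_change_db` is hypothetical. It assumes a unity-passband-gain FIR filter, noise that is white across the full upsampled bandwidth, and a baseband signal image that is passed intact, so that the signal fraction is 1/L and the noise fraction equals the sum of the squared filter taps.

```python
import math

def interpolation_snr_change_db(taps, L):
    """Estimate the SNR change (in dB) caused by an interpolation filter.

    Assumptions (not taken from the companion software): unity passband
    gain, white noise over the whole upsampled bandwidth, and a signal
    whose baseband image (1/L of the upsampled energy) is passed intact.
    """
    # By Parseval's relation, a filter passes this fraction of white-noise power.
    noise_fraction = sum(h * h for h in taps)
    signal_fraction = 1.0 / L
    return 10.0 * math.log10(signal_fraction / noise_fraction)

# A 4-tap boxcar passes exactly 1/4 of the noise power: no SNR change for L = 4.
print(interpolation_snr_change_db([0.25] * 4, 4))
# A wider (2-tap) filter passes half the noise power: SNR is degraded.
print(interpolation_snr_change_db([0.5, 0.5], 4))
```

A positive result indicates that the filter passes less than 1/L of the noise and the SNR improves; a negative result indicates degradation.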
478
Multirate Simulations
Chapter 14
Figure 14.13 Spectrum of four-sinusoid signal from Example 14.2 after upsampling by a factor of four. (Plot: relative PSD in dB versus frequency in Hz, −32 to 32 Hz.)

Figure 14.14 Magnitude response of 71-tap interpolation filter designed for a passband edge of 4 Hz and a stopband edge of 8 Hz. (Plot: magnitude in dB versus frequency in Hz, 0 to 32 Hz.)

Figure 14.15 Magnified passband detail for filter response shown in Figure 14.14. (Plot: magnitude in dB versus frequency in Hz, 0 to 5 Hz.)

Figure 14.16 Spectrum of interpolated signal from Example 14.2 after filtering. (Plot: relative PSD in dB versus frequency in Hz, −32 to 32 Hz.)
14.2.2 Decimation
Consider the case in which a lowpass signal is to be decimated. If decimation is to be useful, the signal of interest must occupy a bandwidth which is much smaller than half the original sampling rate. For DSP applications, after the sampling rate is reduced, the signal bandwidth must still be less than half the new sampling rate. These requirements drive the design of the decimation filter. As shown in Figure 14.17, the passband of the filter must be wide enough to accommodate the bandwidth of the desired signal, and the filter’s stopband edge frequency must be less than half the new sampling rate. If the signal to be decimated consists of a desired signal plus some amount of AWGN, care must be taken to ensure that the decimation filter does not introduce unacceptable changes in the spectral characteristics of the noise.

Figure 14.17 Critical frequencies in the design of a decimation filter. (Labels: signal spectrum, filter response, f_H, f_pass, f_stop, f_samp/2.)

Example 14.3 Consider a signal consisting of three sinusoids of equal magnitudes. The normalized frequencies of the sinusoids are 1.5, 0.75, and 0.375. The sampling rate is 64 samples per second. A segment of such a signal generated by the MultipleToneGener model is shown in Figure 14.18. The estimated PSD of this signal is shown in Figure 14.19. The signal is to be decimated by a factor of four, reducing the sample rate to 16 samples per second. In a DSP application, a decimation filter having a passband just large enough to pass the desired signal might be used. If the filter from Example 14.2 is applied to the signal from Figure 14.18 prior to downsampling, the result after downsampling will have a PSD that agrees with Figure 14.19. However, if AWGN is added to the original signal such that the SNR is 5 dB, the signal will be as shown in Figure 14.20, and the estimated PSD of this signal will be as shown in Figure 14.21. If the filter from Example 14.2 is applied to this signal prior to downsampling, the result after downsampling will have a PSD as shown in Figure 14.22. The noise in this signal is not white; it is bandlimited to a range of ±4 Hz.
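The two edge-frequency requirements for a decimation filter can be checked with a few lines of code. The sketch below is illustrative only; the function name `check_decimation_filter` is hypothetical, the highest tone of Example 14.3 is assumed here to lie at 1.5 Hz, and a stopband edge exactly equal to half the new rate is treated as acceptable, as it is when the filter of Example 14.2 is reused.

```python
def check_decimation_filter(f_signal_max, f_samp, factor, f_pass, f_stop):
    """Check decimation-filter edge frequencies (all values in Hz).

    f_signal_max is the highest frequency of the desired signal;
    f_samp is the original sampling rate; factor is the decimation factor.
    """
    f_new = f_samp / factor
    return {
        "new_rate": f_new,
        # The passband must be wide enough for the desired signal.
        "passes_signal": f_pass >= f_signal_max,
        # The stopband edge must not exceed half the new sampling rate.
        "stopband_ok": f_stop <= f_new / 2.0,
    }

# Filter of Example 14.2 (4 Hz passband edge, 8 Hz stopband edge) applied to
# a 64 sample/s signal decimated by four (highest tone assumed at 1.5 Hz).
print(check_decimation_filter(1.5, 64.0, 4, 4.0, 8.0))
```

Passing both checks says nothing about the whiteness of any noise that accompanies the signal; that question depends on where the transition band falls relative to the new folding frequency.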
Figure 14.18 Segment of the three-sinusoid signal used in Example 14.3. (Plot: amplitude versus time, 0 to 20.)

Figure 14.19 Estimated PSD of the three-sinusoid signal for Example 14.3. (Plot: relative PSD in dB versus frequency in Hz, −32 to 32 Hz.)

Figure 14.20 Noisy test signal for Example 14.3. (Plot: amplitude versus time, 0 to 20.)

Figure 14.21 Estimated PSD for noisy test signal used in Example 14.3. (Plot: relative PSD in dB versus frequency in Hz, −32 to 32 Hz.)

Figure 14.22 Estimated PSD for decimated signal from Example 14.3. (Plot: relative PSD in dB versus frequency in Hz, −8 to 8 Hz.)
In order to preserve the whiteness of the downsampled noise, it seems reasonable that the bandwidth of the decimation filter should be matched to the downsampled simulation bandwidth. It may not be immediately clear exactly what constitutes a good match. Practical filters have transition bands of nonzero width. If the bandwidth is set so that the transition band falls within the simulation bandwidth, the PSD will roll off before reaching the folding frequency, as demonstrated in Example 14.3. If the bandwidth is set so that the transition band falls outside of the simulation bandwidth, the energy in the transition band will alias back into the simulation bandwidth and the PSD will exhibit peaks near the folding frequency. The trick is to find the combination of (1) passband edge frequency, (2) transition width, and (3) transition-band-response shape such that the transition band straddles the folding frequency as shown in Figure 14.23. In the ideal combination, energy (a) from the transition band above $f_{fold}$ is aliased into transition band frequencies (b) below $f_{fold}$ in just the right amounts to replace the “missing” energy at (c), thereby yielding a flat PSD out to the folding frequency.
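The aliasing argument above can be explored numerically by summing the squared magnitude response of a candidate filter over all of the input frequencies that fold onto a given output frequency. The Python sketch below is an illustration under stated assumptions: the function name `aliased_noise_psd` is hypothetical, the input noise is assumed white, and frequencies are expressed in cycles per output sample.

```python
import cmath
import math

def aliased_noise_psd(taps, factor, f_out):
    """Relative PSD of white noise after FIR filtering and downsampling.

    f_out is in cycles per output sample (|f_out| <= 0.5). Each of the
    `factor` input-rate frequencies that alias onto f_out contributes
    |H|^2 to the output PSD.
    """
    total = 0.0
    for k in range(factor):
        f_in = (f_out + k) / factor  # cycles per input sample
        H = sum(h * cmath.exp(-2j * math.pi * f_in * n) for n, h in enumerate(taps))
        total += abs(H) ** 2
    return total / factor

# A 2-tap averager decimated by 2 is power complementary about the folding
# frequency, so the aliased transition-band energy exactly fills the roll-off.
flat = [aliased_noise_psd([0.5, 0.5], 2, f) for f in (0.0, 0.25, 0.5)]
# A 3-tap averager is not; its decimated noise PSD sags toward the folding frequency.
sag = [aliased_noise_psd([1 / 3] * 3, 2, f) for f in (0.0, 0.25, 0.5)]
print(flat, sag)
```

Evaluating this sum over a grid of output frequencies for different passband and stopband edges is one way to search for a transition band that yields a nearly flat decimated noise PSD.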
Figure 14.23 Optimal transition-band configuration for a decimation filter. (Labels: passband edge, folding frequency, stopband edge; regions a, b, and c.)
Example 14.4 Figure 14.24 shows the magnitude response of a 71-tap filter designed for a passband edge frequency of 8 Hz, a stopband edge frequency of 11 Hz, and a ripple ratio of 20. If a signal comprising only the noise portion of Figure 14.20 is decimated using this filter, the result will have a PSD as shown in Figure 14.25. This spectrum has peaks near the folding frequency due to noise energy in the transition band of the filter being aliased to frequencies within the passband. A 71-tap filter designed for a passband edge frequency of 7.12 Hz and a stopband edge frequency of 10.12 Hz comes close to the ideal, as shown by the nearly flat PSD in Figure 14.26.
Figure 14.24 Magnitude response of a 71-tap filter designed for a passband edge frequency of 8 Hz and a stopband edge frequency of 11 Hz. (Plot: magnitude in dB versus frequency in Hz, 0 to 32 Hz.)

Figure 14.25 Estimated PSD for noise-only signal from Example 14.4 decimated using a filter having a transition band from 8 Hz to 11 Hz. (Plot: relative PSD in dB versus frequency in Hz, −8 to 8 Hz.)

Figure 14.26 Estimated PSD for noise-only signal from Example 14.4 decimated using a filter having a transition band from 7.1 Hz to 10.1 Hz. (Plot: relative PSD in dB versus frequency in Hz, −8 to 8 Hz.)
14.3 Multirate Processing for Bandpass Signals
Prior to downsampling a bandpass signal, the bandwidth of the signal is usually reduced by moving the signal’s passband to a lower center frequency. Conversely, for interpolation of bandpass signals, once the signal has been upsampled, the signal bandwidth is usually increased by moving the signal’s passband to a higher center frequency.
14.3.1 Quadrature Demodulation
A real-valued bandpass signal has a spectrum that is conjugate symmetric; that is, the real part of the spectrum is even-symmetric and the imaginary part of the spectrum is odd-symmetric. Consider the real-valued discrete-time signal $x[n]$ having the DTFT spectrum $X(e^{j\omega})$ shown in Figure 14.27(a). If the signal $x[n]$ is multiplied by $e^{j\omega_0 nT}$, the spectrum will be shifted to the right by $\omega_0$, as shown in Figure 14.27(b). The signal can then be lowpass-filtered to remove the spectral component centered at $2\omega_0$. The resulting spectrum, shown in Figure 14.27(c), is in general not conjugate-symmetric. This means that the corresponding time signal will in general be complex-valued. This signal is called the complex envelope [2] of $x[n]$ and is usually denoted as $\tilde{x}[n]$. The complex envelope can be expressed in terms of an inphase component $x_I[n]$ and a quadrature component $x_Q[n]$:

$$\tilde{x}[n] = x_I[n] + j\,x_Q[n] \qquad (14.3.1)$$

Because $x[n]$ is real-valued and

$$e^{j\omega_0 nT} = \cos(\omega_0 nT) + j\sin(\omega_0 nT)$$

the inphase and quadrature components of $x[n]$ can be obtained via quadrature demodulation using the demodulator structure shown in Figure 14.28. As an alternative to Eq. (14.3.1), the complex envelope can be expressed in polar form as

$$\tilde{x}[n] = a(nT)\exp\bigl(j\phi(nT)\bigr)$$

where $a(nT)$ is the envelope and $\phi(nT)$ is the phase of the signal $x[n]$.
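The demodulator structure can be sketched in a few lines of Python. This is a minimal illustration rather than the companion-site model: the function name `quadrature_demodulate` is hypothetical, and a simple moving average stands in for the lowpass filter $h[n]$. For a single tone $a\cos(\omega_0 nT + \phi)$ with eight samples per carrier cycle, a four-sample average removes the double-frequency term exactly, leaving $x_I = (a/2)\cos\phi$ and $x_Q = -(a/2)\sin\phi$.

```python
import math

def quadrature_demodulate(x, w0T, avg_len):
    """Mix x[n] with cos and sin of w0*n*T, then lowpass each product.

    A moving average of length avg_len is used as a stand-in for the
    lowpass filter h[n]; outputs begin once the average is fully primed.
    """
    i_mix = [s * math.cos(w0T * n) for n, s in enumerate(x)]
    q_mix = [s * math.sin(w0T * n) for n, s in enumerate(x)]

    def moving_average(v):
        return [sum(v[n - avg_len + 1:n + 1]) / avg_len
                for n in range(avg_len - 1, len(v))]

    return moving_average(i_mix), moving_average(q_mix)

w0T = 2.0 * math.pi / 8.0            # eight samples per carrier cycle
x = [2.0 * math.cos(w0T * n + 0.3) for n in range(32)]
x_i, x_q = quadrature_demodulate(x, w0T, 4)
print(x_i[0], x_q[0])                # close to cos(0.3) and -sin(0.3)
```

In a practical model the moving average would be replaced by a lowpass filter designed to pass the complex envelope and reject the component centered at $2\omega_0$.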
14.3.2 Quadrature Modulation
Figure 14.27 Spectral interpretation of quadrature demodulation: (a) real-valued bandpass signal, (b) shifted spectrum, (c) filtered spectrum.

Given a complex-valued signal $\tilde{x}[n]$ that has been obtained via quadrature demodulation of a real-valued bandpass signal, the process of quadrature modulation can be used to reconstruct the original bandpass signal. Consider the complex envelope signal’s spectrum shown in Figure 14.27(c) and repeated in Figure 14.29(a). Multiplying $\tilde{x}[n]$ by $\exp(-j\omega_0 nT)$ will shift the spectrum to the left by $\omega_0$, as shown in Figure 14.29(b). Our goal is to replicate the spectrum shown in Figure 14.27(a), so we can multiply $\tilde{x}^*[n]$, the complex conjugate of $\tilde{x}[n]$, by $\exp(j\omega_0 nT)$ to produce the shifted spectrum shown in Figure 14.29(c). The original bandpass spectrum of Figure 14.27(a) can then be obtained by adding together the two signals represented by the spectra in Figures 14.29(b) and 14.29(c):

$$
\begin{aligned}
x[n] &= \tilde{x}[n]\exp(-j\omega_0 nT) + \tilde{x}^*[n]\exp(j\omega_0 nT) \\
     &= \bigl(x_I[n] + j\,x_Q[n]\bigr)\exp(-j\omega_0 nT) + \bigl(x_I[n] - j\,x_Q[n]\bigr)\exp(j\omega_0 nT) \\
     &= 2\,x_I[n]\cos(\omega_0 nT) + 2\,x_Q[n]\sin(\omega_0 nT)
\end{aligned}
$$
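As a numerical check on the modulation relationship, the sketch below rebuilds a bandpass tone from constant inphase and quadrature components. It is an illustration only; the function name `quadrature_modulate` is hypothetical. Expanding the sum of the two conjugate terms gives $2x_I[n]\cos(\omega_0 nT) + 2x_Q[n]\sin(\omega_0 nT)$, and that expression is what the code evaluates.

```python
import math

def quadrature_modulate(x_i, x_q, w0T):
    """Rebuild the real bandpass signal from inphase/quadrature sequences.

    Evaluates x~[n] exp(-j w0 n T) + conj(x~[n]) exp(+j w0 n T), which
    reduces to 2*(x_I cos(w0 n T) + x_Q sin(w0 n T)).
    """
    return [2.0 * (i * math.cos(w0T * n) + q * math.sin(w0T * n))
            for n, (i, q) in enumerate(zip(x_i, x_q))]

# Complex envelope of the tone 2*cos(w0*n*T + 0.3) under the demodulation
# convention of Section 14.3.1: x_I = cos(0.3), x_Q = -sin(0.3).
w0T = 2.0 * math.pi / 8.0
x_i = [math.cos(0.3)] * 16
x_q = [-math.sin(0.3)] * 16
x = quadrature_modulate(x_i, x_q, w0T)
print(x[0], 2.0 * math.cos(0.3))     # the two values agree
```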
Figure 14.28 Block diagram of a quadrature demodulator. (Labels: input $x[n]$; mixing signals $\cos(\omega_0 nT)$ and $\sin(\omega_0 nT)$; two filters $h[n]$; outputs $x_I[n]$ and $x_Q[n]$.)

Figure 14.29 Spectral interpretation of quadrature modulation: (a) spectrum of complex envelope, (b) shifted spectrum, (c) shifted complex-conjugate spectrum.
Chapter 15
MODELING DSP COMPONENTS
Modeling DSP components is different from modeling other constituent parts of a communications system. The model of an analog device is based on mathematical descriptions of processes that are defined by the laws of physics, often with only limited observability of all the various processes actually involved. Modeling of a DSP device is simplified by the fact that its model is based on mathematical descriptions of processes that are themselves defined mathematically. On the other hand, modeling of a DSP device is often complicated by the need to model the effects of quantized signal values, quantized coefficients, and finite-precision arithmetic. When quantization effects are included, otherwise linear systems become nonlinear, making the exact architecture of the device and the order of mathematical operations an important consideration in the design of the model.
15.1 Quantization and Finite-Precision Arithmetic
In the analysis and modeling of DSP devices, quantization effects are usually grouped into three categories: coefficient quantization, signal quantization, and finite-precision arithmetic.
15.1.1 Coefficient Quantization
Design algorithms for digital filters often determine the filter coefficients to a very fine level of precision. If the filters are to be implemented using floating-point hardware or software, most or all of the precision in the coefficients can be maintained. However, if the filter is to be implemented using fixed-point hardware or software, the necessary quantization of the design coefficients introduces degradation into the performance of the filter. It is sometimes possible to predict, and perhaps mitigate, this degradation by introducing coefficient quantization into the design process itself. These approaches may reduce, but not entirely eliminate, the need for simulations to determine filter performance. Most of the attempts to treat coefficient quantization in the design process still assume nearly infinite precision in the representation of the input signal and in the arithmetic used to implement the filter. The ultimate filter performance will include the effects of interactions between coefficient quantization, signal quantization, and finite-precision arithmetic, and these interactions are most easily explored via simulation.

15.1.1.1 Floating-Point Formats
Most higher-level language implementations, such as Microsoft Visual C++, represent floating-point numbers using formats specified in IEEE Standard 754. A float in Visual C/C++ is stored in the IEEE 754 single-precision format depicted in Figure 15.1. The 23-bit mantissa is effectively 24 bits because for normalized values there is always an implicit bit with value 1 just to the left of the radix point. The 8-bit exponent is in excess 127 form with values 0x00 and 0xFF reserved for special values. In excess 127 format, the actual value of the exponent is the amount by which the indicated value exceeds 127. An exponent of −3 is represented as $124_{10}$ = 0x7C, an exponent of 0 is represented as $127_{10}$ = 0x7F, and an exponent of 51 is represented as $178_{10}$ = 0xB2.

The implicit bit to the left of the mantissa’s radix point makes it impossible to exactly represent the value zero in normalized form. The smallest possible positive value in normalized form corresponds to an explicit mantissa of zero and has a value of $2^{-126} \approx 1.175 \times 10^{-38}$. The second smallest value in normalized form corresponds to the least significant bit (LSB) of the mantissa set to 1 and has a value of $(1 + 2^{-23}) \times 2^{-126}$. The interval between the negative value closest to zero and the smallest positive value is $2^{-125}$, which is $2^{24}$ times larger than the interval of $2^{-149}$ between the smallest and second smallest positive values, thus creating a resolution “gap” around zero.

The standard includes provisions for a denormalized form that provides an exact representation for zero and fills in the resolution gap around zero. The denormalized form, which is indicated by the reserved exponent value of 0x00, removes the implicit bit to the left of the radix point and uses a fixed scale factor of $2^{-126}$. In this form, when only the mantissa LSB is set to 1, the value represented is $2^{-149}$. Zero is exactly represented in denormalized form by a mantissa of zero.
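The field layout described above can be inspected directly. The following Python sketch is an illustration and not part of the companion software; the function name `decode_float32` is hypothetical. It uses the standard `struct` module to obtain the 32-bit pattern of a value and splits it into sign, exponent, and mantissa fields.

```python
import struct

def decode_float32(value):
    """Split an IEEE 754 single-precision value into its bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # stored in excess 127 form
    mantissa = bits & 0x7FFFFF           # 23 explicit mantissa bits
    if exponent == 0x00:
        kind = "denormalized"            # includes exact zero
    elif exponent == 0xFF:
        kind = "special"                 # reserved exponent value
    else:
        kind = "normalized"
    return sign, exponent, mantissa, kind

print(decode_float32(0.125))             # exponent of -3 stored as 124 = 0x7C
print(decode_float32(2.0 ** -126))       # smallest normalized positive value
print(decode_float32(2.0 ** -149))       # denormalized, mantissa LSB only
```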
When the result of an operation has a magnitude that exceeds the largest representable magnitude of $(2 - 2^{-23}) \times 2^{127} \approx 3.4 \times 10^{38}$, the value is reported as infinity by setting the mantissa to zero and the exponent to all ones. Infinity can be negative or positive depending upon the value of the sign bit. The result of an indeterminate or invalid operation is reported as a Quiet Not a Number (QNaN), which has an exponent of all ones and a nonzero mantissa, with the most significant explicit bit of the mantissa set to 1. A Signaling Not a Number (SNaN) has an exponent of all ones and a nonzero mantissa, with the most significant explicit bit of the mantissa set to 0; it is not generated as the result of an arithmetic operation, but it raises an invalid-operation exception when it is used as an operand.

Figure 15.1 Single-precision floating-point format. Bit 31: sign s; bits 30 to 23: exponent e; bits 22 to 0: mantissa m. Normalized ($1 \le e \le 254$): value = $(-1)^s \times M \times 2^{(e-127)}$, where M consists of an implicit 1 to the left of the radix point followed by mantissa bits with weights $2^{-1}$ through $2^{-23}$. Denormalized: value = $(-1)^s \times M \times 2^{-126}$, where M has no implicit 1 and consists only of the mantissa bits with weights $2^{-1}$ through $2^{-23}$.

A C/C++ double is stored in the IEEE 754 double-precision format depicted in Figure 15.2. This format is similar to the single-precision format but has longer mantissa and exponent fields. The exponent is in excess 1023 form with values 0x000 and 0x7FF reserved to indicate denormalized form, infinity, and NaNs as described for the single-precision format.
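Figure 15.2 Double-precision floating-point format. Bit 63: sign s; bits 62 to 52: exponent e; bits 51 to 0: mantissa m. Normalized ($1 \le e \le 2046$): value = $(-1)^s \times M \times 2^{(e-1023)}$, where M consists of an implicit 1 to the left of the radix point followed by mantissa bits with weights $2^{-1}$ through $2^{-52}$. Denormalized: value = $(-1)^s \times M \times 2^{-1022}$, where M has no implicit 1 and consists only of the mantissa bits with weights $2^{-1}$ through $2^{-52}$.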
15.1.2 Signal Quantization
Quantization of the signals in a communications system can be significantly more complicated than it first appears. It is fairly straightforward to quantize a given range of voltages into $2^N$ different N-bit digital values. The difficult part is determining where the signal of interest will be positioned within this range of voltages. If the gain is optimized with respect to the quantizer range, the signal of interest may span 7 or 8 bits. On the other hand, if the signal is particularly weak or the gain is set improperly, the signal may span only 2 or 3 bits of the quantizer range. The performance of subsequent DSP stages will differ greatly for these two cases. A particular system may need to be designed so as to provide adequate performance at both extremes.
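The dependence on signal level can be made concrete with a small sketch. The code below is illustrative only; the function names `quantize_uniform` and `bits_spanned` are hypothetical, and a uniform quantizer with codes clamped to the valid range is assumed.

```python
import math

def quantize_uniform(v, v_min, v_max, n_bits):
    """Map a voltage onto one of 2**n_bits uniformly spaced codes."""
    levels = 1 << n_bits
    step = (v_max - v_min) / levels
    code = int(math.floor((v - v_min) / step))
    return min(max(code, 0), levels - 1)     # clamp to the quantizer range

def bits_spanned(peak_to_peak, v_min, v_max, n_bits):
    """Approximate number of quantizer bits exercised by a signal."""
    step = (v_max - v_min) / (1 << n_bits)
    codes = peak_to_peak / step
    return math.log2(codes) if codes >= 1.0 else 0.0

# An 8-bit quantizer covering -1 V to +1 V.
print(bits_spanned(2.0, -1.0, 1.0, 8))       # full-scale signal: 8 bits
print(bits_spanned(0.0625, -1.0, 1.0, 8))    # weak signal: only 3 bits
```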
15.1.3 Finite-Precision Arithmetic
Assessing performance degradations due to finite-precision arithmetic is very much dependent on the details of how a particular algorithm is implemented. For rough estimates of the impact that quantization will have on the performance of a particular DSP device, it is a common practice to use a floating-point model of the device and apply “generic” quantization to the input, coefficients, and output. Generic quantization ignores the numerical format of the actual implementation and simply introduces granularity into the representation of the various values. One very simple approach for approximating N-bit quantization is to multiply each floating-point input sample by $2^{N-M-1}$, truncate the result to an integer, and then floating-point-divide this integer by $2^{N-M-1}$. The value of M is the smallest positive integer for which the input samples are guaranteed to have magnitudes less than $2^M$.

For very simple devices like FIR filters, this treatment of quantization may be adequate. However, for devices that involve feedback like IIR filters, or adaptive coefficients like equalizers and RAKE demodulators, the only way to ensure accuracy of the simulation is to employ bit-true modeling of the device. If the actual device multiplies a 6-bit signed input sample in fractional form by a 7-bit signed coefficient in fractional form and truncates the result to a 10-bit signed value, then a bit-true model does exactly the same thing.

Constructing bit-true models of DSP devices is facilitated by libraries that perform the finite-precision arithmetic. In one particularly elegant approach, every signal sample is represented by a C++ object that encapsulates both a fixed-point representation and a “full-precision” floating-point representation. Arithmetic operations are implemented in methods belonging to the class. These operations are performed on both the fixed-point and floating-point representations, storing the results in a new instance of the C++ object. Consequently, at any point in the processing, every fixed-point result can be immediately compared to the value it would have had if quantization were not in the picture. These comparisons help pinpoint those locations in a proposed device where additional precision will yield the greatest performance improvements.
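Both ideas in this section can be sketched briefly in Python. The fragment below is an illustration under stated assumptions and does not reproduce any particular library: the names `generic_quantize` and `DualSample` are hypothetical, fractional values are held as integers with a stated number of fraction bits, and truncation of fixed-point products is assumed to behave as an arithmetic right shift.

```python
import math

def generic_quantize(x, n_bits, m):
    """Generic N-bit quantization: scale by 2**(N-M-1), truncate, rescale."""
    scale = 2.0 ** (n_bits - m - 1)
    return math.trunc(x * scale) / scale

class DualSample:
    """Carries a fixed-point value alongside its full-precision counterpart."""

    def __init__(self, ideal, frac_bits, fixed=None):
        self.ideal = ideal                   # unquantized reference value
        self.frac_bits = frac_bits           # number of fraction bits
        if fixed is None:
            fixed = math.floor(ideal * (1 << frac_bits))
        self.fixed = fixed                   # integer holding the fraction

    def multiply(self, other, out_frac_bits):
        """Multiply both representations; truncate the fixed-point product.

        Assumes out_frac_bits does not exceed the sum of the operands'
        fraction bits, so the shift count is nonnegative.
        """
        product = self.fixed * other.fixed
        shift = self.frac_bits + other.frac_bits - out_frac_bits
        return DualSample(self.ideal * other.ideal, out_frac_bits, product >> shift)

    def quantization_error(self):
        return self.fixed / (1 << self.frac_bits) - self.ideal

print(generic_quantize(0.3, 8, 1))           # 19/64 = 0.296875
x = DualSample(0.3, 5)                       # 6-bit signed fraction
c = DualSample(-0.7, 6)                      # 7-bit signed fraction
y = x.multiply(c, 9)                         # truncated to a 10-bit signed fraction
print(y.fixed, y.quantization_error())
```

The last three lines mirror the 6-bit by 7-bit multiplication mentioned in the text; the reported error is the difference between the truncated fixed-point product and the full-precision product.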
15.2 FIR Filters
A finite impulse response (FIR) digital filter is one of the simplest DSP devices. The defining equation for an N-tap FIR filter is

$$y[k] = \sum_{n=0}^{N-1} h_n\, x[k-n]$$

where
$y[k]$ is the filter output at time $k$
$x[k]$ is the filter input at time $k$
$h_n$ is the filter coefficient for delay $n$

There are a number of well-known techniques for determining the coefficients for an FIR filter having specified characteristics. These techniques include windowing, frequency sampling, and the Remez exchange [2]. The principles of modeling FIR filters are the same regardless of the method used to generate the coefficients.

As depicted in Figure 15.3, the direct form of an N-tap FIR filter consists of a string of N − 1 single-sample delays with N coefficient multipliers and a summer. Sometimes this structure is referred to as a tapped-delay line or transversal filter. If the input signal consists of signed values having $b_{in}$ bits and the coefficients have $b_{coef}$ bits, then the outputs of the multipliers can have at most $(b_{in} + b_{coef} - 1)$ bits. The sum of the multiplier outputs will have at most $(b_N + b_{in} + b_{coef} - 1)$ bits, where $b_N = \lceil \log_2 N \rceil$.

Figure 15.3 FIR filter.

The model IntDirectFormFir, provided on the companion Web site, models an FIR filter implemented using integer arithmetic. This model accepts an input signal of type Signal assumed to contain no more than $b_{in}$ bits per value. Coefficients are externally scaled into integers of $b_{coef}$ bits and supplied to the model via the parameter input mechanism. Multiplier and summer outputs are allowed to grow to as many bits as needed. Internal calculations use type int64, so use of this model is limited to cases where $(b_N + b_{in} + b_{coef}) < 64$. The summer output is right-shifted by $b_{shift}$ bits and the $b_{mask}$ least significant bits of the result are issued as the filter output. Both $b_{shift}$ and $b_{mask}$ are user-specified values.

The model FracDirectFormFir, provided on the companion Web site, takes a different approach, which is diagrammed in Figure 15.4. The $b_{scale}$ LSBs of the input signal are scaled into fractional values of $b_{in}$ bits. Coefficients are externally scaled into fractional values of $b_{coef}$ bits and supplied to the model via the parameter input mechanism. Multiplier outputs are in fractional form and truncated to $b_{mult}$ bits. The summer output is truncated to $b_{sum}$ bits. Values for $b_{scale}$, $b_{in}$, $b_{coef}$, $b_{mult}$, and $b_{sum}$ are all user-specified.

Figure 15.4 Fractional quantization scheme for a direct-form FIR filter.

In exploring alternative topologies for implementing digital filters, it is convenient to depict the topology in the form of a signal flow graph (SFG) as in Figure 15.5. The interpretation of an SFG is subject to four simple rules:

1. The direction of signal flow is indicated by an arrowhead near the center of each branch, and the gain of the branch is indicated near this arrowhead.
2. An unlabeled branch has unity gain.
3. A delay of M sample times is indicated by a branch gain of $z^{-M}$.
4. All of the branch signals entering a node are added together to obtain the signal exiting the node.

Interpreting Figure 15.5 by these rules reveals that the SFG in the figure is equivalent to the block diagram in Figure 15.3.

Figure 15.5 Signal flow graph for a direct-form realization of an FIR filter.

The transposition theorem for SFGs states that the system represented by a particular SFG can be transposed into a different but equivalent system by simply reversing the flow direction in every branch and reversing the roles of the system-level input node and output node. Figure 15.5 can be transposed to yield the SFG shown in Figure 15.6, which can be redrawn as in Figure 15.7. The direct-form implementation delays raw input samples, multiplies the current sample and N − 1 delayed samples by the filter coefficients, and then immediately sums the multiplier outputs. The transposed direct form multiplies the current sample by all N filter coefficients and then sums each multiplier output into a different point in a delay chain, as depicted in Figure 15.8.

Figure 15.6 Transposed signal flow graph for a direct-form realization of an FIR filter.

FIR filters are often selected for an application because they can be designed to have constant group delay, which is a desirable property for filters because nonconstant group delay will cause envelope distortion in modulated-carrier signals and pulse-shape distortion in baseband data signals. A filter’s frequency response $H(e^{j\omega})$ can be expressed in terms of amplitude response $A(\omega)$ and phase response $\theta(\omega)$:

$$H(e^{j\omega}) = A(\omega)\,e^{j\theta(\omega)}$$

The filter will have constant group delay if and only if

$$\theta(\omega) = \beta + \alpha\omega \qquad (15.2.1)$$

where $\alpha$ and $\beta$ are constants. It can be shown that an N-tap FIR filter will satisfy Eq. (15.2.1) if all of the following are satisfied:

$$\alpha = \frac{N-1}{2} \qquad (15.2.2a)$$

$$\beta = \pm\frac{\pi}{2} \qquad (15.2.2b)$$

$$h[n] = -h[N-1-n], \quad 0 \le n \le N-1 \qquad (15.2.2c)$$
Figure 15.7 Signal flow graph for a transposed direct-form realization of an FIR filter.

Figure 15.8 Transposed direct-form FIR filter.
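The equivalence of the direct-form structure of Figure 15.3 and the transposed direct-form structure of Figure 15.8 can be checked with a short integer-arithmetic sketch. The code below is an illustration and not the IntDirectFormFir model; the function names `fir_direct`, `fir_transposed`, and `accumulator_bits` are hypothetical. The last function evaluates the bound $b_N + b_{in} + b_{coef} - 1$ with $b_N = \lceil \log_2 N \rceil$.

```python
def fir_direct(h, x):
    """Direct form: delay raw inputs, multiply by coefficients, then sum."""
    delay = [0] * len(h)                     # delay[0] holds the current sample
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        out.append(sum(c * d for c, d in zip(h, delay)))
    return out

def fir_transposed(h, x):
    """Transposed direct form: multiply first, then sum into a delay chain."""
    n_taps = len(h)
    state = [0] * (n_taps - 1)               # partial sums held in the delays
    out = []
    for sample in x:
        products = [c * sample for c in h]
        out.append(products[0] + (state[0] if state else 0))
        state = [products[j + 1] + (state[j + 1] if j + 2 < n_taps else 0)
                 for j in range(n_taps - 1)]
    return out

def accumulator_bits(n_taps, b_in, b_coef):
    """Upper bound on the width of the summer output, in bits."""
    b_n = (n_taps - 1).bit_length()          # equals ceil(log2(n_taps))
    return b_n + b_in + b_coef - 1

h = [3, -1, 4, 1]
x = [5, -9, 2, 6, -5, 3]
print(fir_direct(h, x) == fir_transposed(h, x))
print(accumulator_bits(71, 8, 12))
```

With unquantized integer arithmetic the two structures produce identical outputs; once intermediate results are truncated, the order of operations differs and the outputs can diverge, which is why the topology matters in a bit-true model.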
A constant group-delay filter will have linear phase if the phase response passes through the origin; that is, if

$$\theta(\omega) = \alpha\omega \qquad (15.2.3)$$

It can be shown that an N-tap FIR filter will satisfy Eq. (15.2.3) if both of the following are satisfied:

$$\alpha = \frac{N-1}{2} \qquad (15.2.4a)$$

$$h[n] = h[N-1-n], \quad 0 \le n \le N-1 \qquad (15.2.4b)$$
FIR filters having constant group delay are usually categorized into four types corresponding to the four combinations of odd/even N and odd/even symmetry of h[n]. These four types and their properties are summarized in Table 15.1. Because of the symmetry in their coefficients, FIR filters with constant group delay can be implemented more efficiently than the general implementations of Figures 15.5 and 15.7.
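The efficiency gain from coefficient symmetry can be illustrated with a short sketch. The code below is not taken from the companion software; the function names `classify_fir` and `fir_symmetric` are hypothetical. The first function sorts a coefficient set into one of the four types, and the second implements an even-symmetric filter by adding mirrored delay-line samples before multiplying, which roughly halves the number of multiplications.

```python
def classify_fir(h, tol=1e-12):
    """Return the constant-group-delay type (1 to 4), or None if h has no symmetry."""
    n = len(h)
    even = all(abs(h[i] - h[n - 1 - i]) <= tol for i in range(n))
    odd = all(abs(h[i] + h[n - 1 - i]) <= tol for i in range(n))
    if even:
        return 1 if n % 2 else 2
    if odd:
        return 3 if n % 2 else 4
    return None

def fir_symmetric(h, x):
    """Folded implementation for even-symmetric coefficients (types 1 and 2)."""
    n = len(h)
    delay = [0.0] * n
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0.0
        for i in range(n // 2):
            # One multiplication serves both mirrored taps.
            acc += h[i] * (delay[i] + delay[n - 1 - i])
        if n % 2:
            acc += h[n // 2] * delay[n // 2]  # unpaired center tap
        out.append(acc)
    return out

print(classify_fir([1, 2, 3, 2, 1]), classify_fir([1, -1]))
print(fir_symmetric([1, 2, 3, 2, 1], [1, 0, 0, 0, 0, 0]))
```

An odd-symmetric (type 3 or 4) filter could be folded in the same way by subtracting rather than adding the mirrored samples.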
Signal flow graphs for implementations of types 1 through 4 are shown in Figures 15.9 through 15.12.

Table 15.1 Properties of FIR filters having constant group delay.

Type   Length, N   h[n] symmetry   Linear phase   Constant group delay
1      Odd         Even            Yes            Yes
2      Even        Even            Yes            Yes
3      Odd         Odd             No             Yes
4      Even        Odd             No             Yes

Figure 15.9 Signal flow graph for a Type 1 constant-group-delay FIR filter.

15.3 IIR Filters
Figure 15.10 Signal flow graph for a Type 2 constant-group-delay FIR filter.

Figure 15.11 Signal flow graph for a Type 3 constant-group-delay FIR filter.

Figure 15.12 Signal flow graph for a Type 4 constant-group-delay FIR filter.

Infinite impulse response (IIR) digital filters were discussed in Chapter 8 in connection with using the bilinear transformation to model classical analog filters. The current section revisits IIR filters from the perspective of modeling them when they are used as part of the DSP processing in a communication system. IIR filters offer some advantages over FIR filters; they also suffer some disadvantages. IIR filters can usually achieve narrow transition bands and high levels of stopband attenuation using significantly fewer coefficients than a comparable FIR filter, but IIR filters cannot be designed to have exactly linear phase or constant group delay. IIR filters are also more likely to experience stability and numerical precision problems when implemented using finite-precision arithmetic. The defining equation for an IIR filter is

$$y[k] = \sum_{n=1}^{N} a_n\, y[k-n] + \sum_{m=0}^{M} b_m\, x[k-m]$$

The SFG for the direct-form 1 realization of an IIR filter is shown in Figure 15.13. This system can be viewed as two systems in cascade: a moving average (MA) system followed by an autoregressive (AR) system. Because both of these systems are linear time-invariant systems, the order of the cascade can be reversed to obtain the system shown in Figure 15.14. The two delay chains running down the center of the figure are delaying the same signal, so they can be merged into a single chain to yield the system in Figure 15.15. This system is known as the direct-form 2 realization of an IIR filter. The models DirectForm1Iir and DirectForm2Iir, provided on the companion Web site, both use generic quantization strategies.
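The two realizations can be compared with a floating-point sketch. The code below is an illustration only and does not reproduce the DirectForm1Iir or DirectForm2Iir models; the function names are hypothetical. Feedback coefficients are passed as `a = [a1, ..., aN]` and feedforward coefficients as `b = [b0, ..., bM]`, following the sign convention of the defining equation above.

```python
def iir_direct_form_1(a, b, x):
    """Direct form 1: MA section on input history, AR section on output history."""
    x_hist = [0.0] * len(b)
    y_hist = [0.0] * len(a)
    out = []
    for sample in x:
        x_hist = ([sample] + x_hist)[:len(b)]
        y = (sum(bm * xm for bm, xm in zip(b, x_hist))
             + sum(an * yn for an, yn in zip(a, y_hist)))
        y_hist = ([y] + y_hist)[:len(a)]
        out.append(y)
    return out

def iir_direct_form_2(a, b, x):
    """Direct form 2: AR section first, with one shared delay chain."""
    order = max(len(a), len(b) - 1)
    w_hist = [0.0] * order                   # w[k-1], ..., w[k-order]
    out = []
    for sample in x:
        w = sample + sum(an * wn for an, wn in zip(a, w_hist))
        taps = [w] + w_hist                  # w[k], w[k-1], ..., w[k-order]
        out.append(sum(bm * wm for bm, wm in zip(b, taps)))
        w_hist = taps[:order]
    return out

impulse = [1.0, 0.0, 0.0, 0.0]
print(iir_direct_form_1([0.5], [1.0, 1.0], impulse))
print(iir_direct_form_2([0.5], [1.0, 1.0], impulse))
```

In full-precision arithmetic the two forms agree to within rounding error. Once the internal signals are quantized the forms are no longer interchangeable, because the quantization is applied to different intermediate signals.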
Figure 15.13 Signal flow graph for direct-form 1 IIR filter.

Figure 15.14 IIR filter with order of AR and MA sections reversed.

Figure 15.15 Signal flow graph for direct-form 2 IIR filter.
Chapter 16
CODING AND INTERLEAVING
Error-correction codes used in wireless communication can be divided into two broad categories: block codes and convolutional codes. The implementations for these two categories are very different, and so are the corresponding simulation models. Convolutional codes almost always need to be simulated at a level of detail that amounts to a de facto implementation. On the other hand, it is often possible to simulate the performance of block codes without explicitly modeling the details of the encoder or decoder at all. Interleavers are often employed with block codes to improve their performance. An introduction to both block and convolutional codes can be found in [35].
16.1 Block Codes
A block code operates on fixed-length blocks of information bits. These blocks are called message blocks. A block encoder operates on a message block of k information bits to generate an output block of n bits, where n > k. The output block is called a code word, code vector, or code block. The rules for generating the code block vary depending upon the specific code (Hamming, Reed-Solomon, BCH, etc.) being used. An information block of k bits is capable of conveying $2^k$ different messages. The encoded transmission uses n bits to convey only k bits of useful information, so the code block contains (n − k) bits of redundancy. It is this redundancy that is the source of the code’s ability to detect or correct errors. A block code having n bits per code block and k bits per message block is designated as an (n, k) code. The minimum distance d or number of correctable errors τ is often included as an explicitly identified parameter, as in a “(7, 4, d = 3) code” or “(7, 4, τ = 1) code.” Encoding and decoding of block codes involve arithmetic over Galois fields, which is summarized in Appendix C.
16.1.1 Cyclic Codes
A binary block code is linear if and only if the modulo-2 sum of any two codewords is also a codeword. A linear block code C is a cyclic code if every cyclic shift of a codeword in C is also a codeword in C. Linear cyclic codes possess mathematical properties that make them easier to encode and decode than linear codes that are not cyclic. The codewords for a (7, 4, d = 3) cyclic code are listed in Table 16.1. The codewords have been sorted into groups such that the codewords within each group are cyclic shifts of each other. Just like the elements of a Galois field, as discussed in Appendix C, each of the codewords can be represented by a polynomial as shown in the table. The nonzero code polynomial of minimum degree is unique and is designated as the generator polynomial of the code and denoted as $g(x)$. A number of useful results have been developed concerning the generator polynomial:

• If the generator polynomial of a cyclic code is of the form
  $$g(x) = \sum_{i=0}^{r} g_i x^i$$
  the constant term $g_0$ will always equal 1.
• For an (n, k) cyclic code, the degree of the generator polynomial is n − k.
• A polynomial of degree n − 1 or less with binary coefficients is a code polynomial of the code C if and only if the polynomial is divisible by the code’s generator polynomial $g(x)$.
• If $g(x)$ is the generator polynomial of an (n, k) cyclic code, then $g(x)$ is a factor of $x^n + 1$.
• If $g(x)$ is a polynomial of degree n − k and is a factor of $x^n + 1$, then $g(x)$ is the generator polynomial for some (n, k) cyclic code.

Algorithm 16.1 provides an encoding approach for cyclic codes that is based on synthetic division. For small values of n and k, this approach can be applied manually using pencil and paper, perhaps to verify the correct operation of a cyclic encoder model. The generator polynomial for the code in Table 16.1 is $x^3 + x + 1$. The information sequence for the tenth codeword in the table corresponds to the polynomial $p(x) = x^3 + 1$. When multiplied by $x^{n-k}$, this becomes $x^6 + x^3$. Synthetic division of $x^6 + x^3$ by $x^3 + x + 1$ is shown in Figure 16.1. The remainder of $x^2 + x$ corresponds to the check bits 110, which agrees with the check bits shown for the tenth entry in Table 16.1.
Table 16.1 Codewords for linear cyclic (7,4) code.

Info block   Check block   Polynomial
0000         000           0

0001         011           x^3 + x + 1
0010         110           x^4 + x^2 + x
0101         100           x^5 + x^3 + x^2
1011         000           x^6 + x^4 + x^3
0110         001           x^5 + x^4 + 1
1100         010           x^6 + x^5 + x
1000         101           x^6 + x^2 + 1

0100         111           x^5 + x^2 + x + 1
1001         110           x^6 + x^3 + x^2 + x
0011         101           x^4 + x^3 + x^2 + 1
0111         010           x^5 + x^4 + x^3 + x
1110         100           x^6 + x^5 + x^4 + x^2
1101         001           x^6 + x^5 + x^3 + 1
1010         011           x^6 + x^4 + x + 1

1111         111           x^6 + x^5 + x^4 + x^3 + x^2 + x + 1
Algorithm 16.1
Using synthetic division to encode cyclic codes.
n is the code length. k is the number of information bits.
Execute:
1. Form the polynomial representation of the k-bit information sequence.
2. Multiply this polynomial by $x^{n-k}$.
3. Divide this product by the generator polynomial g(x). The remainder produced by this division is the polynomial representation of the check bits.
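The steps of Algorithm 16.1 can be sketched in a few lines of C++ when each binary polynomial is held as a bit mask (bit i is the coefficient of x^i). This is a minimal illustration and not the PracSim encoder; the names polyDegree, polyMod, and cyclicEncode are assumptions introduced here.

```cpp
#include <cstdint>

// Degree of a binary polynomial stored as a bit mask
// (bit i = coefficient of x^i). Returns -1 for the zero polynomial.
int polyDegree(std::uint32_t p) {
    int deg = -1;
    while (p != 0) { p >>= 1; ++deg; }
    return deg;
}

// Remainder of dividend / divisor over GF(2); divisor must be nonzero.
// (Hypothetical helper, not part of PracSim.)
std::uint32_t polyMod(std::uint32_t dividend, std::uint32_t divisor) {
    const int divisorDeg = polyDegree(divisor);
    for (int d = polyDegree(dividend); d >= divisorDeg; d = polyDegree(dividend)) {
        // Subtracting over GF(2) is XOR; align the divisor with the leading term.
        dividend ^= divisor << (d - divisorDeg);
    }
    return dividend;
}

// Systematic encoding: information bits occupy the high-order positions
// and the check bits (the remainder) occupy the low-order positions.
std::uint32_t cyclicEncode(std::uint32_t info, std::uint32_t genPoly) {
    const int r = polyDegree(genPoly);           // r = n - k
    const std::uint32_t shifted = info << r;     // step 2: multiply by x^(n-k)
    return shifted | polyMod(shifted, genPoly);  // step 3: append the remainder
}
```

With the generator polynomial x^3 + x + 1 written as 0xB, calling cyclicEncode(0x9, 0xB) encodes the information block 1001 and should reproduce the tenth row of Table 16.1.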
                      x^3 + x
                ------------------------
  x^3 + x + 1 ) x^6             + x^3
                x^6 + x^4       + x^3
                ------------------------
                      x^4
                      x^4             + x^2 + x
                      -------------------------
                                        x^2 + x

Figure 16.1 Synthetic division of $x^6 + x^3$ by $x^3 + x + 1$.

16.2 BCH Codes
BCH codes are a class of linear cyclic block codes discovered by R. C. Bose and D. K. Ray-Chaudhuri [36] and independently by A. Hocquenghem [37]. A coding theory result known as the BCH bound states that a linear cyclic code is guaranteed to have a minimum distance of δ or greater if the code is constructed such that

• each codeword contains n bits.
• the code’s generator polynomial g(x) has included among its roots (δ − 1) consecutive powers of β, where β is an element of order n from the extension field $GF(2^m)$.

BCH codes are the result of creating a generator polynomial having a sequence of roots that satisfies the BCH bound. In order to correct τ errors, the code must have a minimum distance of at least δ = 2τ + 1. The sequence of required roots can be denoted as

$$\beta^{b+1}, \beta^{b+2}, \ldots, \beta^{b+2\tau}$$

Because the roots $\beta^{b+1}$ through $\beta^{b+2\tau}$ are drawn from the extension field $GF(2^m)$, the polynomial formed as the product of the factors $(x + \beta^{b+1})$ through $(x + \beta^{b+2\tau})$ will, in general, have coefficients that are also elements of $GF(2^m)$. For a binary code, the generator polynomial must have coefficients from the prime field GF(2). Therefore, the generator polynomial is formed as

$$g(x) = (x + \beta^{b+1})(x + \beta^{b+2})(x + \beta^{b+3}) \cdots (x + \beta^{b+2\tau})\, p(x)$$

where the polynomial p(x) contains additional roots that are needed to ensure that each coefficient of g(x) is either 0 or 1. The additional roots needed to define p(x) can be found using minimal polynomials. (See Appendix C.) For each required root $\beta^r$, there is a minimal polynomial $M^{(r)}(x)$ that has $\beta^r$ as a root and has binary coefficients drawn from GF(2). It follows, then, that a generator polynomial that has binary coefficients and that includes all the required roots can be obtained as the least common multiple of the minimal polynomials $M^{(r)}(x)$ for r = b + 1 through r = b + 2τ:

$$g(x) = \operatorname{lcm}\left[ M^{(b+1)}(x), M^{(b+2)}(x), \ldots, M^{(b+2\tau)}(x) \right]$$

The BCH codes most often encountered in practical communications systems are primitive narrow-sense BCH codes. A primitive BCH code results when the element β is a primitive element of the extension field $GF(2^m)$. A narrow-sense BCH code results when b = 0, making the sequence of required roots

$$\beta^{1}, \beta^{2}, \ldots, \beta^{2\tau}$$

All of the BCH codes considered in this book are primitive narrow-sense BCH codes even if not specifically identified as such.
Algorithm 16.2 can be used to construct the generator polynomial for given values of n and τ . This algorithm is implemented by the class BchGenPoly, which is summarized in Table 16.2.
Algorithm 16.2
Constructing the generator polynomial for a primitive narrow-sense BCH code.
n is the code length subject to the constraint $n = 2^m − 1$, where m is an integer. τ is the desired maximum number of errors to be corrected in each block of n bits.
Initialize: g(x) = 1
Execute:
1. From $GF(2^m)$ select a primitive element $\beta = \alpha^j$, where j is an integer such that $1 \le j \le 2^m − 2$ and $\gcd(j, 2^m − 1) = 1$.
2. Use Algorithm C.5 to decompose $2^m − 1$ into cyclotomic cosets $C_0, C_1, C_3, \ldots, C_Q$.
3. Set $u_p = 0$ for p = 1, 3, . . . , Q.
4. For j = 0, 1, 2, . . . , τ − 1 compare (2j + 1) to the elements of each cyclotomic coset. If (2j + 1) is an element of cyclotomic coset $C_p$, then set $u_p = 1$.
5. For p = 1, 3, . . . , Q, if $u_p = 1$, then form the minimal polynomial $M^{(p)}(x)$ as
$$M^{(p)}(x) = \prod_{q \in C_p} \left( x + \beta^{q} \right)$$
6. Form the generator polynomial g(x) as
$$g(x) = \prod_{\substack{p=1 \\ p\ \text{odd}}}^{Q} \left[ M^{(p)}(x) \right]^{u_p}$$
so that only the minimal polynomials with $u_p = 1$ contribute to the product.
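The construction in Algorithm 16.2 can be illustrated with a compact sketch that builds $GF(2^m)$ from log and antilog tables, walks the cyclotomic coset of each required root, and multiplies in the corresponding factors. The sketch assumes β = α (that is, j = 1), a caller-supplied primitive polynomial, n ≤ 63, and 2τ < n. The function name bchGenPoly is hypothetical and this is not the BchGenPoly class of Table 16.2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Generator polynomial of a primitive narrow-sense binary BCH code of
// length n = 2^m - 1. primPoly is a primitive polynomial of degree m as a
// bit mask (for example 0x13 = x^4 + x + 1). The result is a bit mask in
// which bit i is the coefficient of x^i.
std::uint64_t bchGenPoly(int m, std::uint32_t primPoly, int tau) {
    const int n = (1 << m) - 1;
    std::vector<std::uint32_t> antilog(n);   // antilog[i] = alpha^i
    std::vector<int> logTable(n + 1, 0);     // logTable[alpha^i] = i
    std::uint32_t elem = 1;
    for (int i = 0; i < n; ++i) {
        antilog[i] = elem;
        logTable[elem] = i;
        elem <<= 1;                          // multiply by alpha
        if (elem & (1u << m)) elem ^= primPoly;
    }
    auto mul = [&](std::uint32_t a, std::uint32_t b) -> std::uint32_t {
        if (a == 0 || b == 0) return 0;
        return antilog[(logTable[a] + logTable[b]) % n];
    };

    std::vector<bool> covered(n, false);     // exponents already used as roots
    std::vector<std::uint32_t> g{1};         // g(x) = 1; g[i] multiplies x^i
    for (int r = 1; r <= 2 * tau; ++r) {
        if (covered[r % n]) continue;        // root already supplied by a coset
        int q = r % n;
        do {                                 // coset of r: r, 2r, 4r, ... (mod n)
            covered[q] = true;
            std::vector<std::uint32_t> next(g.size() + 1, 0);
            for (std::size_t i = 0; i < g.size(); ++i) {
                next[i + 1] ^= g[i];                   // x * g_i x^i
                next[i] ^= mul(g[i], antilog[q]);      // alpha^q * g_i x^i
            }
            g = next;                        // g(x) <- g(x) * (x + alpha^q)
            q = (2 * q) % n;
        } while (q != r % n);
    }

    // After whole cosets are multiplied in, every coefficient is 0 or 1.
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < g.size(); ++i)
        if (g[i] == 1) mask |= (std::uint64_t{1} << i);
    return mask;
}
```

For n = 15 with the primitive polynomial x^4 + x + 1, the sketch yields generator polynomials of degree 4, 8, and 10 for τ = 1, 2, and 3, corresponding to the (15, 11), (15, 7), and (15, 5) codes.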
The models BchEncoder and BchDecoder both make use of BchGenPoly. The BchEncoder model implements Algorithm 16.1 as tailored for the BCH case. The BchDecoder model implements the Peterson-Berlekamp algorithm described in the next section.
Table 16.2 Summary of class BchGenPoly.

Constructors:
  BchGenPoly::BchGenPoly( int code_block_len, int max_correctable_errs ) :PolyOvrPrimeField();
  BchGenPoly::BchGenPoly( int code_block_len, int info_block_len ) :PolyOvrPrimeField();
Public methods:
  int GetMinDistance( void );
  int GetInfoBlockLen( void );
  int GetMaxCorrectableErrs( void );
Notes:
1. This class does not inherit from PracSimModel and does not read from ParmInput.
2. This class inherits from the base class PolyOverPrimeField.
3. This class creates an instance of class BinaryExtenField.
4. This class creates an instance of class CyclotomicPartition.
5. This class creates several instances of class MinimalPolynomial.
6. Source code is contained in file bch_gen_poly.cpp.
16.3 Interleavers
Block codes such as BCH codes fail when more than the correctable number of bit errors occur within a single code block. There are many channels in which errors tend to occur in bursts, greatly increasing the likelihood that an excessive number of bit errors will occur within a single code block. In communications systems designed to operate over bursty channels, interleavers are often used to reduce the likelihood of code-block failures. The interleavers permute the transmission order of the encoded bits so that bits from a single code block are dispersed over many code-block durations in the channel. Within a single code-block duration in the channel, there will be bits from many different code blocks. Thus, a burst of errors created in the channel will be spread over many different code blocks when the deinterleaver restores the encoded bits to their original sequence at the receiver prior to being decoded. With a properly designed combination of block code and interleaver length, each deinterleaved code block will contain sufficiently few bit errors so that the decoder can correct them and deliver an error-free information block. There are two basic types of interleavers: block interleavers and convolutional interleavers.
16.3.1 Block Interleavers
A block interleaver is conceptually very simple. As depicted in Figure 16.2, a rectangular array of bit cells is filled row by row, and when the array is full, the bits are read out column by column. In a practical system, continuous delivery of bits out of the interleaver is accomplished by having two arrays so that one can be read out while the other is being written. Once the input array is completely filled, it becomes the output array, and the original output array becomes the new input array. At the receiver, the deinterleaver conceptually fills its array column by column and reads out row by row. As a practical matter, the sense of the rows and columns is not important. If the interleaver has an $N_R \times N_C$ array and fills this array row by row, a second interleaver with an $N_C \times N_R$ array can accomplish the deinterleaving while filling its array row by row also. The important issue is the relative dimensions of the interleaving and deinterleaving arrays. A single simulation model can be designed to perform either interleaving or deinterleaving and avoid altogether the notion of rows and columns. The BlockPermuter model, summarized in Table 16.3, configures its internal arrays based on user-supplied values for Fill_Segment_Len and Drain_Segment_Len. The value of Drain_Segment_Len for the interleaver must equal the value of Fill_Segment_Len for the deinterleaver. Likewise, the value of Fill_Segment_Len for the interleaver must equal the value of Drain_Segment_Len for the deinterleaver.
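The row-fill, column-read permutation described above can be sketched for a single full array as follows. The function name blockInterleave is an assumption for illustration, and the sketch omits the double buffering and parameter handling of the BlockPermuter model.

```cpp
#include <cstddef>
#include <vector>

// Permute one full array of numRows * numCols symbols:
// the array is filled row by row and read out column by column.
// (Hypothetical helper; throws std::out_of_range if `in` is too short.)
std::vector<int> blockInterleave(const std::vector<int>& in,
                                 std::size_t numRows,
                                 std::size_t numCols) {
    std::vector<int> out;
    out.reserve(numRows * numCols);
    for (std::size_t c = 0; c < numCols; ++c) {
        for (std::size_t r = 0; r < numRows; ++r) {
            out.push_back(in.at(r * numCols + c));  // element at (row r, column c)
        }
    }
    return out;
}
```

Applying the same function a second time with the row and column counts swapped restores the original order, which mirrors the observation that only the relative dimensions of the two arrays matter.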
Input order (fill by rows): 1, 2, 3, 4, 5, 6, 7, 8, . . .

     1   2   3   4   5   6   7   8
     9  10  11  12  13  14  15  16
    17  18  19  20  21  22  23  24
    25  26  27  28  29  30  31  32

Output order (read by columns): 1, 9, 17, 25, 2, 10, 18, 26, . . .

Figure 16.2 Block interleaver.

Table 16.3 Summary of model BlockPermuter.

Constructors:
  BlockPermuter( char* instance_name, PracSimModel* outer_model, Signal< int >* in_sig, Signal< int >* out_sig )
Parameters:
  int Fill_Segment_Len;
  int Drain_Segment_Len;
Notes:
1. BlockSize for this model is set by the PracSim system.
2. Source code is contained in file block_perm.cpp.
16.3.2 Convolutional Interleavers
The conceptual design of a convolutional interleaver is depicted in Figure 16.3. A routing commutator cycles through a set of delay lines, routing each successive symbol through a different line. At the other end of the delay lines, a selecting commutator cycles through the parallel lines to construct a permuted serial symbol stream for transmission over the channel. At the receiver, a routing commutator cycles through a set of delay lines, routing each successive symbol from the permuted symbol sequence through a different line. A selecting commutator cycles through the parallel lines to construct a serial symbol stream that is restored to the same order as before the interleaver at the transmitter. The delay lines in the interleaver are fed in sequence from the shortest (no delay) to the longest delay. The delay lines in the deinterleaver are fed in sequence from longest to shortest. The model ConvolutionalPermuter, summarized in Table 16.4, can be used to implement either an interleaver or deinterleaver according to the value of the parameter Shortest_Delay_First.
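A small sketch may help make the commutator-and-delay-line arrangement concrete. It assumes that line i holds i × (delay increment) cells in the interleaver and the complementary number in the deinterleaver, and that both commutators start on line 0 in step with each other. The class name ConvPermuterSketch, the helper runThrough, and the fill argument are hypothetical; this is not the PracSim ConvolutionalPermuter model.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// One instance acts as an interleaver (shortestFirst = true) or as the
// matching deinterleaver (shortestFirst = false). Requires numLines >= 1.
class ConvPermuterSketch {
public:
    ConvPermuterSketch(std::size_t numLines, std::size_t delayIncrement,
                       bool shortestFirst, int fill = 0)
        : lines_(numLines), next_(0) {
        for (std::size_t i = 0; i < numLines; ++i) {
            const std::size_t steps = shortestFirst ? i : numLines - 1 - i;
            lines_[i].assign(steps * delayIncrement, fill);  // preload the cells
        }
    }

    // Route one symbol into the current delay line and return the symbol
    // that emerges from the far end of that same line.
    int process(int symbol) {
        std::deque<int>& line = lines_[next_];
        next_ = (next_ + 1) % lines_.size();   // advance both commutators
        line.push_back(symbol);
        const int out = line.front();
        line.pop_front();
        return out;
    }

private:
    std::vector<std::deque<int>> lines_;
    std::size_t next_;
};

// Convenience helper: push a whole sequence through a permuter.
std::vector<int> runThrough(ConvPermuterSketch& p, const std::vector<int>& in) {
    std::vector<int> out;
    for (int s : in) out.push_back(p.process(s));
    return out;
}
```

Under these assumptions, a symbol passes through the interleaver and deinterleaver with a total delay of N(N − 1)D symbol times, where N is the number of delay lines and D is the delay increment; the first N(N − 1)D outputs of the deinterleaver are fill values.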
Figure 16.3 Convolutional interleaver. (Signal flow: input, transmitter delay lines, channel, receiver delay lines, output.)

16.4 Convolutional Codes
An encoder for a simple convolutional code is shown in Figure 16.4. Bits are shifted into the left end of the three-stage shift register. The switch changes position at twice the input shift rate, thereby producing two output bits for each input bit. The code produced by this encoder can be described as a rate one-half constraint length 3 code, because the input rate is half the output rate and the encoder has 3 bits of memory. The two adders shown in the figure perform modulo-2 addition of the shift-register bits to which they are connected. The encoder is a state machine that can be represented as the Moore machine, shown in Figure 16.5. The taps (connections to the shift register) are characterized by polynomials, which can be represented using a k-tuple of bits. These k-tuples are further abbreviated by using their equivalent octal values. For the encoder in Figure 16.4, the octal representations are $g_0 = 7_8$ and $g_1 = 5_8$. For the encoder in Figure 16.6, the octal tap representations are $g_0 = 171_8$ and $g_1 = 133_8$.
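The shift-register encoder can be sketched directly from the tap descriptions. In this illustration the register is held in an unsigned integer whose most significant used bit is the leftmost (newest) stage, and each output is the parity of the register bits selected by a tap mask. The names parity and convEncode are assumptions and do not correspond to PracSim models.

```cpp
#include <vector>

// Modulo-2 sum (parity) of the bits set in v.
int parity(unsigned v) {
    int p = 0;
    while (v != 0) {
        p ^= 1;
        v &= v - 1;   // clear the lowest set bit
    }
    return p;
}

// Rate one-half convolutional encoder. Taps are given in octal, as in the
// text; the defaults describe the constraint length 3 encoder with g0 = 7
// and g1 = 5. Each input bit produces two output bits (g0 output first).
std::vector<int> convEncode(const std::vector<int>& bits,
                            unsigned g0 = 07, unsigned g1 = 05,
                            int constraintLen = 3) {
    const unsigned mask = (1u << constraintLen) - 1u;
    unsigned reg = 0;   // encoder starts with an all-zero register
    std::vector<int> out;
    for (int b : bits) {
        // Shift right and insert the new bit at the left end of the register.
        reg = ((reg >> 1) |
               (static_cast<unsigned>(b & 1) << (constraintLen - 1))) & mask;
        out.push_back(parity(reg & g0));
        out.push_back(parity(reg & g1));
    }
    return out;
}
```

Encoding the input sequence 1, 0, 0, 0, 1, 0, 1, 0, 0 with the default taps gives the dibit sequence 11 10 11 00 11 10 00 10 11.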
Table 16.4 Summary of model ConvolutionalPermuter.

Constructors:
  ConvolutionalPermuter( char* instance_name, PracSimModel* outer_model, Signal* in_sig, Signal* out_sig )
Parameters:
  int Numb_Delay_Lines;
  int Delay_Increment;
  bool Shortest_Delay_First;
Notes:
1. BlockSize for this model is set by the PracSim system.
2. Source code is contained in file conv_perm.cpp.

Figure 16.4 A simple convolutional encoder. (Outputs labeled $g_0$ and $g_1$.)
In the Moore machine model of the encoder, there are $2^3 = 8$ states, and the two output bits are functions of the machine’s state immediately after a new input bit has been shifted in to create the new state. The rightmost bit of the shift register at state-time k plays a role in the outputs for time k, but is of no consequence in the transition to a new state at time k + 1. The state at time k + 1 and the corresponding outputs are completely determined by the input value at time k + 1 along with the two leftmost register bits at time k. This view of encoder operation suggests the Mealy machine representation shown in Figure 16.7. Because only the two leftmost register bits are considered, there are only $2^2 = 4$ states in this representation. In a Mealy machine, the outputs are associated with the state transitions rather than with the states. The Moore machine representation is more of a static view and is perhaps easier to think about. However, the Mealy machine representation leads to a simpler trellis representation of encoder operation.

Figure 16.5 Moore machine representation of the encoder from Figure 16.4. (States and their output dibits: 000 gives 00, 100 gives 11, 010 gives 10, 001 gives 11, 110 gives 01, 101 gives 00, 111 gives 10, 011 gives 01.)

Figure 16.6 A convolutional encoder for a constraint length 7 code with $g_0 = 171_8$ and $g_1 = 133_8$.
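The four-state Mealy view can be captured as a single transition function. The sketch below assumes the state is packed as a 2-bit integer whose high bit is the leftmost register bit, and the output dibit is packed with the $g_0$ output in the high bit. The names Branch and mealyBranch are introduced here for illustration only.

```cpp
// One trellis branch of the Mealy machine for the g0 = 7, g1 = 5 encoder.
struct Branch {
    unsigned nextState;  // two leftmost register bits after the shift
    unsigned output;     // output dibit, g0 output in the high bit
};

// Next state and output for a given state (0..3) and input bit (0 or 1).
Branch mealyBranch(unsigned state, unsigned input) {
    const unsigned s1 = (state >> 1) & 1u;  // leftmost bit (most recent input)
    const unsigned s0 = state & 1u;         // second register bit
    const unsigned in = input & 1u;
    const unsigned g0Out = in ^ s1 ^ s0;    // taps 111 (octal 7)
    const unsigned g1Out = in ^ s0;         // taps 101 (octal 5)
    return Branch{ (in << 1) | s1, (g0Out << 1) | g1Out };
}
```

Evaluating mealyBranch for all four states and both input values reproduces the eight labeled transitions of the Mealy machine, for example state 10 with input 0 moves to state 01 with output 10.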
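Figure 16.7 Mealy machine representation of the encoder from Figure 16.4. (Transitions, labeled input/output: from state 00, 0/00 to state 00 and 1/11 to state 10; from state 10, 0/10 to state 01 and 1/01 to state 11; from state 01, 0/11 to state 00 and 1/00 to state 10; from state 11, 0/01 to state 01 and 1/10 to state 11.)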
16.4.1 Trellis Representation of a Convolutional Encoder

The trellis representation of encoder operation is essential to the understanding of how a Viterbi decoder operates. Figure 16.8 shows a trellis representation for the encoder corresponding to the Mealy machine in Figure 16.7. Each vertical column of nodes corresponds to the possible states at a single time step. As drawn, the trellis indicates the assumption that the encoder is always initialized to the 00 state at time 0. Depending upon the value of the first input bit, the encoder can remain in state 00 or transition to state 10 for time 1. Remaining in state 00 is indicated by the upper branch leaving state 00 time 0 and entering state 00 time 1. This branch is labeled 00, indicating that output bits $G_0$ and $G_1$ are both 0. Transition to state 10 is indicated by the lower branch leaving state 00 time 0 and entering state 10 time 1. This branch is labeled 11, indicating that output bits $G_0$ and $G_1$ are both 1. If the encoder remains in state 00 at time 1, the possible transitions from time 1 to time 2 are the same as for time 0 to time 1. However, if the encoder is in state 10 at time 1, the second input bit will force a transition to either state 01 or state 11 for time 2. Transition to state 01 is indicated by the upper branch leaving state 10 time 1 and entering state 01 time 2. This branch is labeled 10, indicating that the output bits are
$G_0 = 1$ and $G_1 = 0$. Transition to state 11 is indicated by the lower branch leaving state 10 time 1 and entering state 11 time 2. This branch is labeled 01, indicating that the output bits are $G_0 = 0$ and $G_1 = 1$. Notice that the transition branches are not labeled with the particular input value needed to cause the transition. To eliminate clutter in trellis diagrams, it is a common practice to arrange the states within each column so that for the transitions leaving a state, the transition caused by a 0 input is drawn above the transition caused by a 1 input.

Figure 16.8 Encoder trellis for 7-bit message plus 2-bit tail. (States 00, 10, 01, and 11 at times 0 through 9, with each branch labeled by its output dibit.)
16.4.2 Viterbi Decoding

Figure 16.9 shows the encoder trellis from Figure 16.8 with highlighting on the path that would be traversed to encode the 7-bit message 1000101 plus a 2-bit tail of zeros added to flush out the encoder, giving the 9-bit input sequence 100010100. The trellis has been modified for the specific message length. The eighth and ninth input bits will always be zeros, so the transitions corresponding to inputs of 1 have been removed from the eighth and ninth transition columns of the trellis. The output generated by the encoder is

11 10 11 00 11 10 00 10 11

Suppose that two bit errors occur so that the received code sequence is

10 10 11 00 11 11 00 10 11
Figure 16.9 Encoder trellis showing the path traversed to encode the input sequence 100010100.

The receiver “knows” that the encoder started at the 00 state. Therefore, upon receiving the first dibit of 10, the receiver knows that one of two possibilities must have occurred:
1. The first information bit was 0, causing the encoder to remain in the 00 state and produce an output symbol of 00. This symbol was received with an error in the first bit, changing 00 into 10.
2. The first information bit was 1, causing the encoder to transition from the 00 state to the 10 state and produce an output symbol of 11. This symbol was received with an error in the second bit, changing 11 into 10.

The two possibilities are equally likely because each implies that one bit error was introduced by the channel. These results are summarized in the partial trellis of Figure 16.10.

Figure 16.10 Partial encoder trellis for time 1. (Received symbol 10; the branch into state 00 and the branch into state 10 each imply 1 error.)

Upon receiving the second symbol of 10, the receiver knows that one of four possibilities must have occurred:

1. The encoder was in the 00 state, and the second information bit was 0, causing the encoder to remain in the 00 state and produce an output symbol of 00. This symbol was received with an error in the first bit, changing 00 into 10. This possibility implies one bit error in symbol 1 and one bit error in symbol 2 for a total of two bit errors.
2. The encoder was in the 00 state, and the second information bit was 1, causing the encoder to transition from the 00 state to the 10 state and produce an output symbol of 11. This symbol was received with an error in the second bit, changing 11 into 10. This possibility implies one bit error in symbol 1 and one bit error in symbol 2 for a total of two bit errors.
3. The encoder was in the 10 state, and the second information bit was 0, causing the encoder to transition from the 10 state to the 01 state and produce an output symbol of 10. This symbol was received correctly. This possibility implies one bit error in symbol 1.
4. The encoder was in the 10 state, and the second information bit was 1, causing the encoder to transition from state 10 to state 11 and produce an output symbol of 01. This symbol was received with errors in both the first and second bits, changing 01 into 10. This possibility implies one bit error in symbol 1 plus two bit errors in symbol 2 for a total of three bit errors.

Possibility 3 is the most likely, because only one bit error would be needed to turn the hypothesized transmit sequence into the received sequence. The other possibilities would require two or three bit errors to produce the received sequence. The results after reception of the second symbol are summarized in the partial trellis of Figure 16.11.

The analysis of the third received symbol gets interesting, because this analysis will reveal the “trick” that makes the Viterbi decoder into something that is vastly more efficient than a combinatorial analysis of all possible error scenarios.
Figure 16.11 Partial encoder trellis for time 2. (Received symbols 10 10; path metrics at time 2 are 2 errors into state 00, 2 errors into state 10, 1 error into state 01, and 3 errors into state 11.)
Upon receiving the third symbol of 11, the receiver knows that one of eight possibilities must have occurred:

1. The encoder was in state 00, and the third information bit was 0, causing the encoder to remain in state 00 and produce an output symbol of 00. This symbol was received with errors in both the first and second bits, changing 00 into 11. This possibility implies a total of four bit errors: two in symbol 3 and two in prior symbols.
2. The encoder was in state 00, and the third information bit was 1, causing the encoder to transition from state 00 to state 10 and produce an output symbol of 11. This symbol was received correctly. This possibility implies two bit errors, both in prior symbols.
3. The encoder was in state 10, and the third information bit was 0, causing the encoder to transition from state 10 to state 01 and produce an output symbol of 10. This symbol was received with an error in the second bit, changing 10 into 11. This possibility implies a total of three bit errors: one in symbol 3 and two in prior symbols.
4. The encoder was in state 10, and the third information bit was 1, causing the encoder to transition from state 10 to state 11 and produce an output symbol of 01. This symbol was received with an error in the first bit, changing 01 into 11. This possibility implies a total of three bit errors: one in symbol 3 and two in prior symbols.
5. The encoder was in state 01, and the third information bit was 0, causing the encoder to transition from state 01 to state 00 and produce an output symbol of 11. This symbol was received correctly. This possibility implies one bit error in a prior symbol.
6. The encoder was in state 01, and the third information bit was 1, causing the encoder to transition from state 01 to state 10 and produce an output symbol of 00. This symbol was received with errors in both the first and second bits, changing 00 into 11. This possibility implies a total of three bit errors: two in symbol 3 and one in a prior symbol.
7. The encoder was in state 11, and the third information bit was 0, causing the encoder to transition from state 11 to state 01 and produce an output symbol of 01. This symbol was received with an error in the first bit, changing 01 into 11. This possibility implies a total of four bit errors: one in symbol 3 and three in prior symbols.
8. The encoder was in state 11, and the third information bit was 1, causing the encoder to remain in state 11 and produce an output symbol of 10. This symbol was received with an error in the second bit, changing 10 into 11. This possibility implies a total of four bit errors: one in symbol 3 and three in prior symbols.

The results after reception of the third symbol are summarized in the partial trellis of Figure 16.12.

The ultimate goal of the decoding process is to determine, after all nine 2-bit symbols have been received, which complete path through the trellis the encoder most likely followed during the encoding operation.
If the selected path is indeed the path that the encoder followed, then all nine information bits can be recovered correctly even though errors may have occurred in the reception of the encoded symbols.

In Figure 16.12, each state at time 3 has two paths arriving from different states in time 2. The circled number near each path indicates the metric for the path. In this case, the metric is the Hamming distance, which is simply the cumulative number of differing bits between the received sequence and the encoder output sequence corresponding to that path. It is assumed that paths with lower metrics are more likely than paths with higher metrics. If the most likely complete path passes through state X at time 3, then this path must include the most likely partial path from state 00 at time 0 to state X at time 3. Thus, of the two paths arriving at each node for time 3, the less likely one can be pruned away. Figure 16.13 shows the pruning result of the partial trellis from Figure 16.12.

Figure 16.12 Partial encoder trellis for time 3.

Figure 16.13 Partial encoder trellis for time 3 after pruning to remove non-surviving paths. (Surviving path metrics: 1 error into state 00, 2 errors into state 10, 3 errors into state 01, and 3 errors into state 11.)

A sequence of partial trellises for times 4 through 9 is shown in Figures 16.14 through 16.19. The trellis for time k shows all eight possible transitions from states at time k − 1. Then, in the trellis for time k + 1, the non-surviving paths from time k − 1 to time k are pruned away. The trellis in Figure 16.14 shows a tie for state 10 at time 4; both transitions entering state 10 have a metric of 3. In cases of a tie, the surviving path can be selected arbitrarily. The soft-decision metrics, discussed in Section 16.5, greatly reduce the incidence of ties.

In the contrived example just presented, the total message length was only 9 bits, including 2 flush bits. Typical message lengths are longer than this, but the practice of using flush bits to drive the encoder back to the 00 state is common for relatively short message lengths. However, for longer messages, the decoding of received bits does not need to wait for the entire encoded message to be received. As shown in Figure 16.16, by time 6, all surviving paths share a common subpath from time 0 through time 3. More complicated codes have more rows in their trellis and it takes more than three symbol times for the surviving paths to merge into a single subpath. However, at some point, the early portions of the surviving paths all share a common subpath and it is possible to decode the bits corresponding to this subpath. The number of symbol intervals that must elapse before the decoder can assume that all surviving paths are merged is called the traceback depth of the decoder. For the commonly used constraint-length 7 codes, a traceback depth of 40 symbol times is typical. This means that at time k the decoder can issue the decoded bits for times up through k − 40.
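The add, compare, and select operations walked through above, followed by a traceback from the final 00 state, can be sketched for the four-state code as shown below. This is a minimal hard-decision illustration that decodes only after the whole flushed message has arrived (no traceback-depth windowing). The function name viterbiDecode and the dibit packing (first received bit in the high position) are assumptions, not the PracSim decoder.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <vector>

// Hard-decision Viterbi decoder for the rate one-half code with g0 = 7 and
// g1 = 5 (octal). `received` holds one dibit (0..3) per symbol time.
// Assumes the encoder started in state 00 and was flushed back to state 00.
std::vector<int> viterbiDecode(const std::vector<unsigned>& received) {
    const int kNumStates = 4;
    const int kInf = INT_MAX / 2;                        // marks unreachable states
    std::array<int, 4> metric = {0, kInf, kInf, kInf};   // start in state 00
    std::vector<std::array<int, 4>> prevState(received.size());

    for (std::size_t t = 0; t < received.size(); ++t) {
        std::array<int, 4> next = {kInf, kInf, kInf, kInf};
        for (int s = 0; s < kNumStates; ++s) {
            if (metric[s] >= kInf) continue;
            for (unsigned in = 0; in < 2; ++in) {
                const unsigned s1 = (s >> 1) & 1u;
                const unsigned s0 = s & 1u;
                const unsigned out = ((in ^ s1 ^ s0) << 1) | (in ^ s0);
                const int ns = static_cast<int>((in << 1) | s1);
                const unsigned diff = out ^ received[t];
                // Branch metric: Hamming distance between dibits.
                const int branch =
                    static_cast<int>((diff & 1u) + ((diff >> 1) & 1u));
                if (metric[s] + branch < next[ns]) {     // compare and select
                    next[ns] = metric[s] + branch;
                    prevState[t][ns] = s;                // remember the survivor
                }
            }
        }
        metric = next;
    }

    // Trace back from state 00; the high bit of each state is the input bit
    // that caused the transition into it.
    std::vector<int> bits(received.size());
    int state = 0;
    for (std::size_t t = received.size(); t-- > 0;) {
        bits[t] = (state >> 1) & 1;
        state = prevState[t][state];
    }
    return bits;
}
```

Ties are resolved here by keeping the first candidate examined, which is one of the arbitrary choices the text permits. Feeding in the received sequence 10 10 11 00 11 11 00 10 11 from the example recovers the input sequence 100010100.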
16.5 Viterbi Decoding with Soft Decisions

The previous discussion of Viterbi decoders involved only hard decisions: the received symbol decisions input to the decoder were 00, 01, 10, and 11. The real strength of Viterbi decoders is their ability to easily make use of soft decisions. Soft decisions convey an indication of signal quality and how confident the receiver is regarding the decisions that have been made. Assume that a demodulator output voltage of +1 V corresponds to a bit value of 1, and an output voltage of −1 V corresponds to a bit value of 0. When Gaussian noise is added to the signal, the demodulator output for a binary 1 will have a probability density function (PDF) like the one shown in Figure 16.20. Under a hard-decision paradigm, all positive voltages would be decided as 1, and all negative voltages decided as 0. The shaded area in the figure represents the probability of correctly deciding 1 for this noisy demodulator output. The unshaded area under the curve represents the probability of incorrectly deciding 0 instead of 1. A similar PDF can be drawn for a demodulator output consisting of a −1 V signal plus additive Gaussian noise.

Figure 16.14 Partial encoder trellis for time 4.

Figure 16.15 Partial encoder trellis for time 5.
Figure 16.16 Partial encoder trellis for time 6.

Figure 16.17 Partial encoder trellis for time 7.
Figure 16.18 Partial encoder trellis for time 8.

Figure 16.19 Partial encoder trellis for time 9.

Figure 16.20 PDF for signal consisting of a 1 V level plus additive Gaussian noise. (Area for v > 0 is 0.9; area for v < 0 is 0.1.)

Soft decisions divide the decision space for a bit interval into more than two regions. In Figure 16.21, the abscissa has been divided into four zones. The zone for v > 0.5 is a confident decision of 1. This decision is designated as $1_H$. The zone for 0 < v < 0.5 is a less confident decision of 1, designated as $1_L$. The zone for v < −0.5 is a confident (albeit incorrect) decision of 0, designated $0_H$, and the zone for −0.5 < v < 0 corresponds to a less confident decision of 0, designated $0_L$. Instead of the binary symmetric channel assumed for hard-decision decoding, the set of four soft decisions $0_H$, $0_L$, $1_H$, and $1_L$ implies the binary-to-quaternary discrete memoryless channel (DMC) diagrammed in Figure 16.22.

The probabilities shown in the figure should not be used directly as branch metrics in a Viterbi decoder. Because they are probabilities, they would need to be multiplied when combining branch metrics into path metrics, and multiplication is something to be avoided in high-speed decoder implementations. Instead, the branch metrics for the decoder should be based on the logarithms of the probabilities, which are listed in Table 16.5. Because multiplication of two numbers corresponds to the addition of their logarithms, branch metrics based on logarithms can be added to form path metrics. The values shown in Table 16.5 are all negative with the least negative value corresponding to the most likely event. For convenience, the negative signs can be dropped, making the smallest value correspond to the most likely event. Figure 16.23 shows the trellis from Figure 16.9 redrawn to indicate metrics based on soft decisions.
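As a sketch of how the logarithmic metrics might be computed, the following code maps a transmitted bit and a soft decision to the corresponding DMC transition probability and returns the negated base-10 logarithm, so that metrics add along a path and smaller values indicate more likely events. The enumeration SoftDecision and the function names are assumptions introduced for illustration; the probabilities are the four values used in this section.

```cpp
#include <cmath>

// The four soft decisions, ordered from confident 0 to confident 1.
enum SoftDecision { k0H = 0, k0L = 1, k1L = 2, k1H = 3 };

// Transition probability P(soft decision | transmitted bit) for the
// binary-to-quaternary DMC of this section.
double transitionProbability(int txBit, SoftDecision rx) {
    // Ordered from "strongly agrees with txBit" to "strongly disagrees".
    static const double kProb[4] = {0.60, 0.30, 0.075, 0.025};
    const int idx = (txBit == 0) ? static_cast<int>(rx)
                                 : 3 - static_cast<int>(rx);
    return kProb[idx];
}

// Additive metric for one bit: negated base-10 logarithm of the probability.
double softBitMetric(int txBit, SoftDecision rx) {
    return -std::log10(transitionProbability(txBit, rx));
}

// Branch metric for one dibit: the sum of the two per-bit metrics.
// txDibit is packed with the first transmitted bit in the high position.
double softBranchMetric(unsigned txDibit, SoftDecision rx0, SoftDecision rx1) {
    return softBitMetric(static_cast<int>((txDibit >> 1) & 1u), rx0) +
           softBitMetric(static_cast<int>(txDibit & 1u), rx1);
}
```

A branch whose hypothesized output is 00 and whose received soft decisions are $1_L$ and $0_L$ would receive a metric of roughly 1.12 + 0.52 under this scheme.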
Figure 16.21 PDF for noisy signal showing regions for soft decisions. (Areas: 0.6 for v > 0.5, 0.3 for 0 < v < 0.5, 0.075 for −0.5 < v < 0, and 0.025 for v < −0.5.)

Figure 16.22 Binary to quaternary discrete memoryless channel. (Transition probabilities for a transmitted 0: 0.6 to $0_H$, 0.3 to $0_L$, 0.075 to $1_L$, and 0.025 to $1_H$. Transition probabilities for a transmitted 1: 0.6 to $1_H$, 0.3 to $1_L$, 0.075 to $0_L$, and 0.025 to $0_H$.)

Table 16.5 Transition probabilities and their logarithms for the DMC shown in Figure 16.22.

probability   logarithm
0.60          −0.22
0.30          −0.52
0.075         −1.12
0.025         −1.60
Figure 16.23 Decoder trellis showing soft-decision path metrics.
Appendix A
MATHEMATICAL TOOLS
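A.1 Trigonometric Identities

\[ \tan x = \frac{\sin x}{\cos x} \tag{A.1.1} \]
\[ \sin(-x) = -\sin x \tag{A.1.2} \]
\[ \cos(-x) = \cos x \tag{A.1.3} \]
\[ \tan(-x) = -\tan x \tag{A.1.4} \]
\[ \cos^2 x + \sin^2 x = 1 \tag{A.1.5} \]
\[ \cos^2 x = \frac{1}{2}\left[1 + \cos(2x)\right] \tag{A.1.6} \]
\[ \sin(x \pm y) = (\sin x)(\cos y) \pm (\cos x)(\sin y) \tag{A.1.7} \]
\[ \cos(x \pm y) = (\cos x)(\cos y) \mp (\sin x)(\sin y) \tag{A.1.8} \]
\[ \tan(x + y) = \frac{(\tan x) + (\tan y)}{1 - (\tan x)(\tan y)} \tag{A.1.9} \]
\[ \sin(2x) = 2(\sin x)(\cos x) \tag{A.1.10} \]
\[ \cos(2x) = \cos^2 x - \sin^2 x \tag{A.1.11} \]
\[ \tan(2x) = \frac{2(\tan x)}{1 - \tan^2 x} \tag{A.1.12} \]
\[ (\sin x)(\sin y) = \frac{1}{2}\left[-\cos(x + y) + \cos(x - y)\right] \tag{A.1.13} \]
\[ (\cos x)(\cos y) = \frac{1}{2}\left[\cos(x + y) + \cos(x - y)\right] \tag{A.1.14} \]
\[ (\sin x)(\cos y) = \frac{1}{2}\left[\sin(x + y) + \sin(x - y)\right] \tag{A.1.15} \]
\[ (\sin x) + (\sin y) = 2 \sin\frac{x + y}{2} \cos\frac{x - y}{2} \tag{A.1.16} \]
\[ (\sin x) - (\sin y) = 2 \sin\frac{x - y}{2} \cos\frac{x + y}{2} \tag{A.1.17} \]
\[ (\cos x) + (\cos y) = 2 \cos\frac{x + y}{2} \cos\frac{x - y}{2} \tag{A.1.18} \]
\[ (\cos x) - (\cos y) = -2 \sin\frac{x + y}{2} \sin\frac{x - y}{2} \tag{A.1.19} \]
\[ A\cos(\omega t + \psi) + B\cos(\omega t + \phi) = C\cos(\omega t + \theta) \tag{A.1.20} \]
where
\[ C = \left[A^2 + B^2 + 2AB\cos(\phi - \psi)\right]^{1/2} \]
\[ \theta = \tan^{-1}\left[\frac{A\sin\psi + B\sin\phi}{A\cos\psi + B\cos\phi}\right] \]
\[ A\cos(\omega t + \psi) + B\sin(\omega t + \phi) = C\cos(\omega t + \theta) \tag{A.1.21} \]
where
\[ C = \left[A^2 + B^2 + 2AB\sin(\phi - \psi)\right]^{1/2} \]
\[ \theta = \tan^{-1}\left[\frac{A\sin\psi - B\cos\phi}{A\cos\psi + B\sin\phi}\right] \]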
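Identities (A.1.20) and (A.1.21) are easy to get wrong by a sign, so a numerical spot check can be useful. The sketch below is an illustration only, with hypothetical function names. It takes the amplitude as C = [A² + B² + 2AB cos(φ − ψ)]^(1/2) for the cosine-plus-cosine case and C = [A² + B² + 2AB sin(φ − ψ)]^(1/2) for the cosine-plus-sine case, and it uses `math.atan2` in place of a bare arctangent so that θ lands in the correct quadrant, which is an assumption about how the tan⁻¹ in the identities is meant to be evaluated.

```python
import math

def combine_cos_cos(A, psi, B, phi):
    """(C, theta) such that A cos(wt+psi) + B cos(wt+phi) = C cos(wt+theta)."""
    C = math.sqrt(A * A + B * B + 2.0 * A * B * math.cos(phi - psi))
    theta = math.atan2(A * math.sin(psi) + B * math.sin(phi),
                       A * math.cos(psi) + B * math.cos(phi))
    return C, theta

def combine_cos_sin(A, psi, B, phi):
    """(C, theta) such that A cos(wt+psi) + B sin(wt+phi) = C cos(wt+theta)."""
    C = math.sqrt(A * A + B * B + 2.0 * A * B * math.sin(phi - psi))
    theta = math.atan2(A * math.sin(psi) - B * math.cos(phi),
                       A * math.cos(psi) + B * math.sin(phi))
    return C, theta

def max_mismatch(A, psi, B, phi, omega=2.0 * math.pi, samples=1000):
    """Largest difference between the two sides of both identities over one period."""
    C1, th1 = combine_cos_cos(A, psi, B, phi)
    C2, th2 = combine_cos_sin(A, psi, B, phi)
    worst = 0.0
    for k in range(samples):
        t = k / samples
        lhs1 = A * math.cos(omega * t + psi) + B * math.cos(omega * t + phi)
        lhs2 = A * math.cos(omega * t + psi) + B * math.sin(omega * t + phi)
        worst = max(worst,
                    abs(lhs1 - C1 * math.cos(omega * t + th1)),
                    abs(lhs2 - C2 * math.cos(omega * t + th2)))
    return worst

if __name__ == "__main__":
    print(max_mismatch(1.5, 0.3, 2.0, -1.1))
```

A quick sanity case is ψ = φ in (A.1.20): the two cosines are then in phase and the combined amplitude must be A + B.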
A.2 Table of Integrals
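\[ \int \frac{1}{x}\,dx = \ln x \tag{A.2.1} \]
\[ \int e^{ax}\,dx = \frac{1}{a} e^{ax} \tag{A.2.2} \]
\[ \int x e^{ax}\,dx = \frac{ax - 1}{a^2} e^{ax} \tag{A.2.3} \]
\[ \int \sin(ax)\,dx = -\frac{1}{a} \cos(ax) \tag{A.2.4} \]
\[ \int \cos(ax)\,dx = \frac{1}{a} \sin(ax) \tag{A.2.5} \]
\[ \int \sin(ax + b)\,dx = -\frac{1}{a} \cos(ax + b) \tag{A.2.6} \]
\[ \int \cos(ax + b)\,dx = \frac{1}{a} \sin(ax + b) \tag{A.2.7} \]
\[ \int x \sin(ax)\,dx = -\frac{x}{a} \cos(ax) + \frac{1}{a^2} \sin(ax) \tag{A.2.8} \]
\[ \int x \cos(ax)\,dx = \frac{x}{a} \sin(ax) + \frac{1}{a^2} \cos(ax) \tag{A.2.9} \]
\[ \int \sin^2 ax\,dx = \frac{x}{2} - \frac{\sin 2ax}{4a} \tag{A.2.10} \]
\[ \int \cos^2 ax\,dx = \frac{x}{2} + \frac{\sin 2ax}{4a} \tag{A.2.11} \]
\[ \int x^2 \sin ax\,dx = \frac{1}{a^3}\left(2ax \sin ax + 2 \cos ax - a^2 x^2 \cos ax\right) \tag{A.2.12} \]
\[ \int x^2 \cos ax\,dx = \frac{1}{a^3}\left(2ax \cos ax - 2 \sin ax + a^2 x^2 \sin ax\right) \tag{A.2.13} \]
\[ \int \sin^3 x\,dx = -\frac{1}{3} \cos x \left(\sin^2 x + 2\right) \tag{A.2.14} \]
\[ \int \cos^3 x\,dx = \frac{1}{3} \sin x \left(\cos^2 x + 2\right) \tag{A.2.15} \]
\[ \int \sin x \cos x\,dx = \frac{1}{2} \sin^2 x \tag{A.2.16} \]
\[ \int \sin(mx) \cos(nx)\,dx = -\frac{\cos(m - n)x}{2(m - n)} - \frac{\cos(m + n)x}{2(m + n)} \qquad (m^2 \ne n^2) \tag{A.2.17} \]
\[ \int \sin^2 x \cos^2 x\,dx = \frac{1}{8}\left[x - \frac{1}{4} \sin(4x)\right] \tag{A.2.18} \]
\[ \int \sin x \cos^m x\,dx = \frac{-\cos^{m+1} x}{m + 1} \tag{A.2.19} \]
\[ \int \sin^m x \cos x\,dx = \frac{\sin^{m+1} x}{m + 1} \tag{A.2.20} \]
\[ \int u\,dv = uv - \int v\,du \tag{A.2.21} \]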
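Entries in a table of integrals can be spot-checked by differentiating the right-hand side numerically and comparing the result with the integrand. The sketch below does this for (A.2.12), (A.2.13), and (A.2.18), with each antiderivative written out explicitly in the code. It is an illustration only: the helper names, the constant a = 1.7, the step size, and the sample grid are arbitrary assumptions.

```python
import math

def central_difference(F, x, h=1e-5):
    """Approximate dF/dx with a symmetric finite difference."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

def max_mismatch(F, f, xs):
    """Largest difference between the numerical derivative of F and the integrand f."""
    return max(abs(central_difference(F, x) - f(x)) for x in xs)

a = 1.7  # arbitrary nonzero constant used only for this check

# Each entry pairs an antiderivative F(x) with its integrand f(x).
ENTRIES = {
    "A.2.12": (lambda x: (2*a*x*math.sin(a*x) + 2*math.cos(a*x)
                          - a*a*x*x*math.cos(a*x)) / a**3,
               lambda x: x*x*math.sin(a*x)),
    "A.2.13": (lambda x: (2*a*x*math.cos(a*x) - 2*math.sin(a*x)
                          + a*a*x*x*math.sin(a*x)) / a**3,
               lambda x: x*x*math.cos(a*x)),
    "A.2.18": (lambda x: (x - math.sin(4*x) / 4.0) / 8.0,
               lambda x: math.sin(x)**2 * math.cos(x)**2),
}

xs = [-2.0 + 0.1 * k for k in range(41)]
errors = {label: max_mismatch(F, f, xs) for label, (F, f) in ENTRIES.items()}

if __name__ == "__main__":
    for label, err in errors.items():
        print(label, err)
```

A mismatch on the order of the finite-difference error indicates agreement, while a transcription error such as a swapped sine and cosine produces a mismatch comparable to the size of the integrand itself.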
A.3 Logarithms
The base-10 logarithm or common logarithm of a number x is the power to which 10 must be raised to equal x:

\[ y = \log_{10} x \;\Leftrightarrow\; x = 10^y \]

The base-e logarithm or natural logarithm of a number x is the power to which e must be raised to equal x:

\[ y = \log_e x = \ln x \;\Leftrightarrow\; x = e^y \]

In the study of Galois fields and coding theory, it is sometimes necessary to use base-2 logarithms:

\[ y = \log_2 x \;\Leftrightarrow\; x = 2^y \]

Table A.1 lists a number of useful properties of logarithms.

Table A.1 Properties of Logarithms.

1. \( \log_b (xy) = \log_b x + \log_b y \)
2. \( \log_b \left(\frac{1}{x}\right) = -\log_b x \)
3. \( \log_b (y^x) = x \log_b y \)
4. \( \log_c x = \log_b x \, \log_c b = \frac{\log_b x}{\log_b c} \)
5. \( \ln(1 + z) = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{z^n}{n} \qquad |z| < 1 \)
6. \( \ln x = \int_1^x \frac{1}{y}\,dy \qquad x > 0 \)
7. \( \frac{d}{dx}(\ln x) = \frac{1}{x} \qquad x > 0 \)

A.4 Modified Bessel Functions of the First Kind
The modified Bessel function of the first kind of order zero is used in the analysis of Rice random variables. The modified Bessel function of the first kind of order n is denoted \( I_n(x) \) and is defined as

\[ I_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(x \cos\theta) \cos(n\theta)\,d\theta \tag{A.4.1} \]

A.4.1 Identities

\[ I_n(x) = \sum_{m=0}^{\infty} \frac{(x/2)^{2m+n}}{m!\,(n + m)!} \tag{A.4.2} \]
\[ I_{-n}(x) = I_n(x) \tag{A.4.3} \]
\[ I_n(-x) = (-1)^n I_n(x) \tag{A.4.4} \]
\[ \exp(x \cos\theta) = \sum_{n=-\infty}^{\infty} I_n(x) \exp(jn\theta) \tag{A.4.5} \]
\[ \exp(x \cos\theta) = I_0(x) + 2 \sum_{n=1}^{\infty} I_n(x) \cos(n\theta) \tag{A.4.6} \]
\[ \frac{d}{dx}\left[x^n I_n(x)\right] = x^n I_{n-1}(x) \tag{A.4.7} \]
\[ \frac{d}{dx}\left[\frac{I_n(x)}{x^n}\right] = \frac{I_{n+1}(x)}{x^n} \tag{A.4.8} \]
\[ I_0(x) = \frac{1}{\pi} \int_0^{\pi} \exp(x \cos\theta)\,d\theta \tag{A.4.9} \]
\[ I_0(x) = \frac{1}{\pi} \int_0^{\pi} \cosh(x \cos\theta)\,d\theta \tag{A.4.10} \]
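For simulation work, I_n(x) usually has to be evaluated numerically. The sketch below is a minimal illustration, not the companion software: it evaluates the power series (A.4.2) with a fixed number of terms and cross-checks the order-zero case against the integral form (A.4.9) using a simple midpoint rule. The function names, the number of series terms, and the number of integration steps are all assumptions chosen for moderate values of x.

```python
import math

def bessel_i_series(n, x, terms=40):
    """I_n(x) for integer n >= 0 from the power series (A.4.2), truncated."""
    return sum((x / 2.0) ** (2 * m + n)
               / (math.factorial(m) * math.factorial(n + m))
               for m in range(terms))

def bessel_i0_integral(x, steps=200):
    """I_0(x) from the integral form (A.4.9) using the midpoint rule on [0, pi]."""
    h = math.pi / steps
    total = sum(math.exp(x * math.cos((k + 0.5) * h)) for k in range(steps))
    return total * h / math.pi

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0):
        print(x, bessel_i_series(0, x), bessel_i0_integral(x))
```

The two evaluations should agree closely for moderate x. For large arguments the series terms grow before they decay, so a production routine would more likely rely on a scaled or asymptotic form.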
Appendix B
PROBABILITY DISTRIBUTIONS IN COMMUNICATIONS
B.1 Uniform Distribution

A random variable uniformly distributed between a and b, where a < b, has a probability density function (PDF) given by

\[ p(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{elsewhere} \end{cases} \]

The mean is given by

\[ \mu = \frac{a + b}{2} \]

and the variance is given by

\[ \sigma^2 = \frac{(b - a)^2}{12} \]

B.2 Gaussian Distribution
The Gaussian distribution, also called the normal distribution, is ubiquitous throughout science and engineering. A Gaussian random variable has a PDF given by

\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{-(x - \mu)^2}{2\sigma^2}\right] \]

where μ is the mean and σ² is the variance. For zero mean and unity variance, the PDF reduces to

\[ p(x) = \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-x^2}{2}\right) \]

The cumulative distribution function (CDF) is obtained by integrating the PDF:

\[ P(x \le X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{X} \exp\left(\frac{-x^2}{2}\right) dx \]

This integral cannot be evaluated in closed form, but to facilitate manipulations of Gaussian CDFs, the error function has been defined as

\[ \operatorname{erf} x = \frac{2}{\sqrt{\pi}} \int_0^x \exp\left(-y^2\right) dy \]

The complementary error function, erfc, is defined by

\[ \operatorname{erfc} x = \frac{2}{\sqrt{\pi}} \int_x^{\infty} \exp\left(-y^2\right) dy = 1 - \operatorname{erf} x \]

The Q function is defined as

\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\left(\frac{-y^2}{2}\right) dy = \frac{1}{2} \operatorname{erfc}\left(\frac{x}{\sqrt{2}}\right) \]
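The relation between Q and erfc translates directly into code wherever a complementary error function is available. The sketch below is a minimal illustration using `math.erfc` from the Python standard library. The function names `q_function` and `gaussian_cdf` are hypothetical, and the CDF helper assumes the usual standardization (X − μ)/σ for a Gaussian variable with mean μ and standard deviation σ.

```python
import math

def q_function(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2)): upper-tail probability of a unit Gaussian."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gaussian_cdf(X, mu=0.0, sigma=1.0):
    """P(x <= X) for a Gaussian variable, written in terms of the Q function."""
    return 1.0 - q_function((X - mu) / sigma)

if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, q_function(x), gaussian_cdf(x))
```

Writing the CDF as 1 − Q keeps a single tail function in the code, although for very small tail probabilities it is numerically safer to work with `q_function` directly than to subtract from 1.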
B.3 Exponential Distribution
An exponentially distributed random variable has a PDF given by λ exp(−λx) x≥0 p(x) = 0 x