Springer Topics in Signal Processing
Aurelio Uncini
Digital Audio Processing Fundamentals
Springer Topics in Signal Processing Volume 21
Series Editors: Jacob Benesty, INRS-EMT, University of Quebec, Montreal, QC, Canada; Walter Kellermann, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
The aim of the Springer Topics in Signal Processing series is to publish very high-quality theoretical works, new developments, and advances in the field of signal processing research. Important applications of signal processing are covered as well. Within the scope of the series are textbooks, monographs, and edited books. Topics include but are not limited to:

* Audio & Acoustic Signal Processing
* Biomedical Signal & Image Processing
* Design & Implementation of Signal Processing
* Graph Theory & Signal Processing
* Industrial Signal Processing
* Machine Learning for Signal Processing
* Multimedia Signal Processing
* Quantum Signal Processing
* Remote Sensing & Signal Processing
* Sensor Array & Multichannel Signal Processing
* Signal Processing for Big Data
* Signal Processing for Communication & Networking
* Signal Processing for Cyber Security
* Signal Processing for Education
* Signal Processing for Smart Systems
* Signal Processing Implementation
* Signal Processing Theory & Methods
* Spoken Language Processing

Indexing: The books of this series are indexed in Scopus and zbMATH.
Aurelio Uncini
DIET, Sapienza University of Rome
Rome, Italy
ISSN 1866-2609  ISSN 1866-2617 (electronic)
Springer Topics in Signal Processing
ISBN 978-3-031-14227-7  ISBN 978-3-031-14228-4 (eBook)
https://doi.org/10.1007/978-3-031-14228-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
The recent evolution of information and communication technologies (ICT) has enabled many innovative uses of voice, music, audiovisual, multimedia and, in general, multimodal information. In the case of music, for example, the impact of today's technologies is comparable to the impact that the printing press had on literary work: among the arts, music is in fact the one that engages most intimately with the means that science and technology make available. In this context, deeply linked to the world of multimedia and multimodal communication, the volume "Digital Audio Processing Fundamentals" focuses on some aspects that characterize the pairings of audio and technology, and of engineering and music. The methods of digital manipulation of sounds, the study of artificial spaces and audio effects for musical use, the methods and techniques for digital synthesis of the musical signal, signal compression, sound spatialization, and virtual audio are just some concrete technical examples. In more recent times, the collaboration between scientists from ICT, communication sciences, modeling, and acoustic physics has led to the emergence of new disciplines such as the science of auditory communication, whose salient aspects range from the computational analysis of the auditory scene to the study of multimodal human–computer interfaces, augmented reality systems, audio for virtual environments, and so on. The medium, in fact, does not simply coincide with the technical transmission–reception apparatus, but consists of a complex system which, in addition to the technological apparatus, involves the relationship between this apparatus and human perceptual and cognitive processes; in this context, sound information, whether vocal, non-vocal, or musical, takes on a role of primary importance.
The interest in digital audio signal processing and its spin-offs in various manufacturing sectors, as well as the cross-cutting skills it requires, is very broad. The basic disciplines needed to address such topics in depth are hardly covered in the core courses of a single degree program. This text, born from the author's twenty-five years of experience teaching the course Digital Audio Signal Processing, held at the University of Rome "La Sapienza," Italy, for students of degree courses in Engineering, is a transversal and interdisciplinary work, mainly dedicated to the various specializations of computer science and communications, but also to others such as electronics and mechanics, and to other courses where there is often an interest in digital audio signal processing. The work provides a didactically consistent overview of the problems related to the processing of the audio signal in its various aspects, from the numerical modeling of vibrating systems to the circuits and algorithms for the manipulation and generation of sound signals. The topics covered, while requiring strong interdisciplinary skills in acoustics, mechanics, electrical engineering, and computer science, are developed in a self-contained fashion, using mathematics that should be known from courses in the first two years of science faculties. The two volumes, while complementary in subject matter, are autonomous and self-contained works.

The first volume, Digital Audio Processing Fundamentals, is organized conceptually into three parts. The first, introductory part consists of two chapters, in which the fundamental concepts of vibration mechanics and the various possible modes of generation of sound waves are presented. Linear and nonlinear vibration models in continuous and discrete systems are introduced. The circuit analogies of mobility and impedance are introduced, and the concepts of sound waves in air and electro-acoustic transducers are briefly recalled. Also in the first part, discrete-time circuit models and basic methodologies for one- and multi-dimensional numerical filtering are introduced. The central part deals specifically with audio signal processing methodologies. Filters for audio, multirate systems, and wavelet transforms are introduced; so-called special transfer functions for audio applications are discussed; and circuit modeling methods for the physical modeling of complex acoustic phenomena, with both lumped and distributed parameters, are presented.
In the third part, the so-called audio effects are introduced, such as reverberators, dynamic range control of the audio signal, and effects based on time–frequency transformations. In the last two chapters, abstract sound synthesis methods and physical modeling are introduced and discussed.

Rome, Italy
July 2022
Aurelio Uncini
Acknowledgments
Many friends and colleagues have contributed to the creation of this book by providing useful suggestions, rereading drafts, or endorsing my ruminations on the subject. I would like to thank my colleagues Stefania Colonnese, Danilo Comminiello, Massimo Panella, Raffaele Parisi, Simone Scardapane and Michele Scarpiniti of the Department of Information Engineering, Electronics and Telecommunications of the Università degli Studi di Roma—"La Sapienza," Italy; Giovanni Costantini of the Università degli Studi di Roma—"Tor Vergata," Italy; and Stefania Cecchi of the Università Politecnica delle Marche, Ancona, Italy. I would also like to thank all the students who provided me with very useful verifications for the correct exposition of the topics in the text.
Contents
1 Vibrating Systems
  1.1 Introduction
  1.2 Vibrating Systems in One Dimension
    1.2.1 Simple Harmonic Motion
    1.2.2 Damped Harmonic Oscillator
    1.2.3 Phasor Related to Quantities x, v and a
  1.3 Forced Oscillations
    1.3.1 Transfer Function, Frequency, and Impulse Responses
    1.3.2 Transient and Steady-State Response
    1.3.3 Sinusoidal Response, Mobility and Impedance
    1.3.4 Calculation of Complete Response
    1.3.5 Generic Helmholtz Oscillating Systems
  1.4 Electrical Circuit Analogies
    1.4.1 Kirchhoff Laws
    1.4.2 Analogy of Mobility
    1.4.3 Analogy of Impedance
  1.5 Nonlinear Oscillating Systems
    1.5.1 General Notation and State Space Representation
    1.5.2 2nd-Order Nonlinear Oscillating Systems
    1.5.3 Undamped Pendulum
    1.5.4 Van der Pol and Rayleigh Oscillators
    1.5.5 Duffing Nonlinear Oscillator
    1.5.6 Musical Instrument as a Self-sustained Oscillator
    1.5.7 Multimodal Nonlinear Oscillator
  1.6 Continuous Vibrating Systems
    1.6.1 Ideal String Wave Equation
    1.6.2 Solution of the String Wave Equation
    1.6.3 Strings Vibration and Musical Scales
    1.6.4 Lossy and Dispersive String
    1.6.5 The Vibration of Membranes
  1.7 Sound Waves in the Air
    1.7.1 Plane Wave
    1.7.2 Spherical Waves
    1.7.3 Acoustic Impedance and Characteristic Acoustic Impedance
    1.7.4 Lumped Parameters Acoustic Impedance
    1.7.5 Sound Field and Intensity
    1.7.6 Effects Related to Propagation
  1.8 Acoustic Transducers
    1.8.1 The Microphone and Its Directional Characteristics
    1.8.2 Loudspeakers: Operating Principle and Model
  References

2 Discrete-Time Signals, Circuits, and System Fundamentals
  2.1 Introduction
    2.1.1 Ideal Continuous-Discrete Conversion Process
  2.2 Basic Deterministic Sequences
    2.2.1 Unitary Impulse
    2.2.2 Unit Step
    2.2.3 Real and Complex Exponential Sequences
  2.3 Discrete-Time Circuits
    2.3.1 General DT System Properties and Definitions
    2.3.2 Properties of DT Linear Time-Invariant Circuits
    2.3.3 Basic Elements of DT Circuits
    2.3.4 Frequency Domain Representation of DT Circuits
  2.4 DT Circuits Representation in Transformed Domains
    2.4.1 The z-Transform
    2.4.2 Discrete-Time Fourier Transform
    2.4.3 Discrete Fourier Transform
    2.4.4 Ideal Filters
  2.5 Discrete-Time Signal Representation with Unitary Transformations
    2.5.1 DFT as Unitary Transformation
    2.5.2 Discrete Hartley Transform
    2.5.3 Discrete Sine and Cosine Transforms
    2.5.4 Haar Unitary Transform
    2.5.5 Data Dependent Unitary Transformation
  2.6 Finite Difference Equations
    2.6.1 Transfer Function and Pole–Zero Plot
    2.6.2 BIBO Stability Criterion
  2.7 Finite Impulse Response Filter
    2.7.1 Online Convolution Computation
    2.7.2 Batch Convolution as a Matrix-Vector Product
    2.7.3 Convolutional-Matrix Operator
    2.7.4 FIR Filters Design Methods
  2.8 Infinite Impulse Response Filter
    2.8.1 Digital Resonator
    2.8.2 Anti-Resonant Circuits and Notch Filter
    2.8.3 All-Pass Filters
    2.8.4 Inverse Circuits
    2.8.5 Linear Ordinary Differential Equation Discretization: IIR Filter from an Analog Prototype
  2.9 Multiple-Input Multiple-Output FIR Filter
    2.9.1 MIMO Filter in Composite Notation 1
    2.9.2 MIMO (P, Q) System as Parallel of Q Filters Banks
    2.9.3 MIMO Filter in Composite Notation 2
    2.9.4 MIMO Filter in Snap-Shot or Composite Notation 3
  References

3 Digital Filters for Audio Applications
  3.1 Introduction
  3.2 Analog and Digital Audio Filters
    3.2.1 Classifications of Audio Filters
    3.2.2 Shelving and Peaking Filter Transfer Functions
    3.2.3 Frequency Bandwidth Definitions for Audio Application
    3.2.4 Constant-Q Equalizers
    3.2.5 Digital Audio Signal Equalization
  3.3 IIR Digital Filters for Audio Equalizers
    3.3.1 Bristow-Johnson Second-Order Equalizer
  3.4 Robust IIR Audio Filters
    3.4.1 Limits and Drawbacks of Digital Filters in Direct Forms
    3.4.2 All-Pass Decompositions
    3.4.3 Ladder and Lattice Structures
    3.4.4 Lattice FIR Filters
    3.4.5 Orthogonal Control Shelving Filters: Regalia-Mitra Equalizer
    3.4.6 State-Space Filter with Orthogonal Control
    3.4.7 TF Mapping on Robust Structures
  3.5 Fast Frequency Domain Filtering for Audio Applications
    3.5.1 Block Frequency Domain Convolution
    3.5.2 Low Latency Frequency Domain Filtering
  References

4 Multi-rate Audio Processing and Wavelet Transform
  4.1 Introduction
  4.2 Multirate Audio Processing
    4.2.1 Sampling Rate Reduction by an Integer Factor
    4.2.2 Sampling Rate Increase by an Integer Factor
    4.2.3 Polyphase Representation
    4.2.4 Noble Identity of Multirate Circuits
    4.2.5 Fractional Sampling Ratio Frequency Conversion
  4.3 Filter Banks for Audio Applications
    4.3.1 Generalities on Filter Banks
    4.3.2 Two-Channel Filter Banks
    4.3.3 Filter Bank Design
    4.3.4 Lowpass Prototype Design
    4.3.5 Cosine-Modulated Pseudo-QMF FIR Filter Banks
    4.3.6 Non-uniform Spacing Filter Banks
  4.4 Short-Time Frequency Analysis
    4.4.1 Time–Frequency Measurement Uncertainty
    4.4.2 The Discrete Short-Time Fourier Transform
    4.4.3 Nonparametric Signal Spectral Representations
    4.4.4 Constant-Q Fourier Transform
  4.5 Wavelet Basis and Transforms
    4.5.1 Continuous-Time Wavelet
    4.5.2 Inverse Continuous-Time Wavelet
    4.5.3 Orthogonal Wavelets and the Discrete Wavelet Transform
    4.5.4 Multiresolution Analysis: Axiomatic Approach
    4.5.5 Dilation Equations for Dyadic Wavelets
    4.5.6 Compact Support Orthonormal Wavelet Basis
    4.5.7 Wavelet for Discrete-Time Signals
    4.5.8 Wavelet Examples
  References

5 Special Transfer Functions for DASP
  5.1 Introduction
  5.2 Comb Filters
    5.2.1 FIR Comb Filters
    5.2.2 IIR Comb Filters
    5.2.3 Feedback Delay Networks
    5.2.4 Universal All-Pass Comb Filters
    5.2.5 Nested All-Pass Filters
  5.3 Rational Orthonormal Filter Architecture
    5.3.1 Kautz–Broome Orthogonal Basis Filter Model
    5.3.2 Parameters Estimation of OBF Models
    5.3.3 Laguerre Filters
    5.3.4 Frequency Warped Signal Processing
  5.4 Circular Buffer Delay Lines
    5.4.1 Circular Buffer Addressing
    5.4.2 Delay Lines with Nested All-Pass Filters
  5.5 Fractional Delay Lines
    5.5.1 Problem Formulation of Band-Limited Interpolation
    5.5.2 Approximate FIR Solution
    5.5.3 Approximate All-Pass Solution
    5.5.4 Polynomial Interpolation
    5.5.5 Time-Variant Delay Lines
    5.5.6 Arbitrary Sampling Rate Conversion
    5.5.7 Robust Fractional Delay FIR Filter
    5.5.8 Taylor Expansion of Lagrange Interpolation Filter
  5.6 Digital Oscillators
    5.6.1 Sinusoidal Digital Oscillator
    5.6.2 Wavetable Oscillator
  References

6 Circuits and Algorithms for Physical Modeling
  6.1 Introduction
    6.1.1 Local and Global Approach
    6.1.2 Structural, Functional and Interconnected Models
    6.1.3 Local Approach with Circuit Model
  6.2 Wave Digital Filters
    6.2.1 Representation of CT Circuits with Wave Variables
    6.2.2 Mapping the Electrical Elements from CT to DT
    6.2.3 Connecting DT Circuit Elements
    6.2.4 DT Circuit Corresponding to Given Analog Filter
  6.3 Digital Waveguide Theory
    6.3.1 Lossless Digital Waveguides
    6.3.2 Lossy Digital Waveguides
    6.3.3 Terminated Digital Waveguides
    6.3.4 Alternative and Normalized Wave Variables
    6.3.5 Digital Waveguides Connection
  6.4 Finite-Differences Modeling
    6.4.1 FDM Definition
    6.4.2 Derivation of Connection Models for FDTD Simulators
  6.5 Nonlinear WDF and DW Models
    6.5.1 Mapping Memoryless Nonlinear Elements
    6.5.2 Mapping of Nonlinear Elements with Memory
    6.5.3 Impedance Adaptation with Scattering Junction
    6.5.4 Mixed Modeling Methods
  6.6 System Modeling by Modal Decomposition and Impulse Response Estimation
    6.6.1 Theoretical Route of Vibration Analysis
    6.6.2 On the Impulse Response Estimation
    6.6.3 Physical Model Synthesis by Modal Decomposition
  References

7 Digital Audio Effects
  7.1 Introduction
  7.2 Room Acoustic Simulation
    7.2.1 Physical Modeling Versus Perceptual Approach
  7.3 Schroeder's Artificial Reverberator
    7.3.1 Schroeder's First Model
    7.3.2 The Schroeder–Moorer Model
    7.3.3 The Frequency-Dependent Moorer Model
    7.3.4 Selecting Reverberator Parameters
  7.4 The Quality of Artificial Reverberation
    7.4.1 Energy Decay Curves
    7.4.2 Characterization of Diffuse Radiation
    7.4.3 Early Reflections Characterization
  7.5 Reverb Model with Feedback Delay Networks
    7.5.1 Stautner and Puckette's Model
    7.5.2 Jot and Chaigne Model
    7.5.3 Choice of Feedback Matrix
    7.5.4 Other Reverberator Models
  7.6 Acoustic Modeling with Digital Waveguides Networks
    7.6.1 Wave Propagation Modeling with Digital Waveguide Networks
    7.6.2 DWN Topologies
  7.7 Dynamic Range Control of Audio Signal
    7.7.1 DRC Static Curves
    7.7.2 Dynamic Gain Control
    7.7.3 Signal Level Calculation
    7.7.4 Constructive Considerations of the DRC
    7.7.5 DRC with Multiband Approach
    7.7.6 Dynamic Range Control Applications
  7.8 Effects Based on Time-Variant Fractional-Delay Lines
    7.8.1 Angular Modulation with TV-FDL
    7.8.2 Vibrato and Other TV-FDL-Based Effects
    7.8.3 Amplification Systems with Rotating Speakers (Leslie)
  7.9 Effects Based on Time–Frequency Transformations
    7.9.1 Frequency Transposition and Time Scale Change
    7.9.2 Classification of TFT Algorithms
    7.9.3 Time-Domain TFT Algorithms
    7.9.4 Time–Frequency Domain Algorithms Based Effects
  References

8 Sound Synthesis
  8.1 Introduction
    8.1.1 Synthesis by Recorded Sound
    8.1.2 Abstract Algorithm Synthesis
  8.2 Sampling Instruments
    8.2.1 The ADSR Envelope Control
    8.2.2 Looping Technique
    8.2.3 Tremolo and Vibrato
    8.2.4 Sampling Techniques
    8.2.5 The Cross-Fade Technique
  8.3 Wavetable Synthesizer
    8.3.1 Additive Synthesis and Wavetable Summation
    8.3.2 Wavetable and Parameters Estimation
    8.3.3 Granular Synthesis
    8.3.4 WS Hybrid Models
  8.4 Spectral Representation of the Signal
    8.4.1 Additive Synthesis
    8.4.2 Subtractive Synthesis
  8.5 Discrete-Time Modeling of Analog Synthesizer
    8.5.1 Synthesizer's Units and Signal and Control Paths
    8.5.2 Layout of Simple Custom Synthesizers
    8.5.3 Voltage Control Filter
    8.5.4 Physical Modeling of Analog Synthesizer: The Virtual Analog Music Synthesizers
    8.5.5 Large-Signal DT Model of Moog VCF
  8.6 Frequency Modulation Synthesis
    8.6.1 FM Sound Synthesis Principles
    8.6.2 Concatenated FM with Multiple Operators
  8.7 Nonlinear Distortion Synthesis
    8.7.1 NLD Theoretical Development
    8.7.2 Extensions of the NLD Technique
    8.7.3 Comparison with FM Technique
  8.8 Karplus–Strong Algorithm
  References

9 Physical Modeling
  9.1 Introduction
  9.2 Physical, Mathematical, and Computational Models
  9.3 Vibrating String Model
  9.4 Excitation Models of Vibrating Systems
    9.4.1 Plucked String
    9.4.2 Continuously Excited String
    9.4.3 Generalized Excitation by Data-Driven Pseudo-Physical Model
  9.5 Modeling of Wind Instruments
    9.5.1 Excitation Mechanism Modeling of Wind Instrument
    9.5.2 Wind Instruments Acoustic Tube Modeling
    9.5.3 Modeling of Tonal Holes
  9.6 Woodwinds Physical Modeling
    9.6.1 Single-Reed Woodwinds: Clarinet
    9.6.2 Air-Jet Instruments
  9.7 Brass Physical Modeling
    9.7.1 Shape and Characteristics of the Cup Mouthpiece
    9.7.2 Bell-Shaped Flaring and Radiation Modeling
    9.7.3 Brass Excitation Model
  9.8 Commuted Synthesis
    9.8.1 Generalized Commuted Synthesis Model
  9.9 Guitar Physical Model
    9.9.1 Guitar Body Physical Model
    9.9.2 Soundboard Bracing
    9.9.3 Guitar Overall Physical Model
  9.10 Bowed String Instruments
    9.10.1 Bow Excitation Model
    9.10.2 Helmholtz Motion of Bowed String
    9.10.3 Bow Control Variables: Position, Velocity, and Force
    9.10.4 Body and Bridge Modeling of Bowed String Instruments
    9.10.5 Violin's Impulse Response
9.10.6 Physical Model of the Bowed String Instrument . . . . . . . . 9.11 Piano Physical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.11.1 Mechanical Principle and Simplified Piano Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.11.2 Piano String Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.11.3 Hammer Excitation Mechanism . . . . . . . . . . . . . . . . . . . . . . 9.11.4 Numerical Piano Modeling by Modal Synthesis . . . . . . . . 9.11.5 Bridge and Soundboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
622 631 635 641 643 648 649 651 652 653 654 657 658 658 661 662 664 666 671 673 676 681 683 687 688 690 693 698 700 702
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
Chapter 1
Vibrating Systems
1.1 Introduction

The generation of sound is commonly associated with the vibration of solid objects, and sound itself is often referred to as "air vibration." The words sound and vibration are, in fact, closely connected. In musical instruments, for example, the sound generators are vibrating strings, as in the piano, guitar, and violin; vibrating bars, as in the xylophone and vibraphone; vibrating membranes, as in drums and the banjo; and vibrating columns of air in pipes, as in the pipe organ, brass, and woodwinds. This chapter briefly describes the basic mathematics of mechanical oscillations; the mobility and impedance analogies with electrical circuits; acoustic waves; the main quantities used to measure the intensity of sound; the effects principally related to the phenomena of propagation; and, very briefly, electroacoustic transducers: background that is essential for a deep understanding of digital audio signal processing (DASP) methods and techniques. Moreover, since sounds rich in harmonics and with acoustically more interesting timbres are obtained through interactions of linear and nonlinear systems, a short section on nonlinear oscillating systems has been included.
1.2 Vibrating Systems in One Dimension

When we perturb a certain physical system, for example an acoustic-mechanical system with the possibility of relative motion between its parts provided with elasticity and mass, the system may be subject to periodic motions called vibrations, and the dynamical system can be referred to as a vibrating system [1, 2]. To analyze the behavior of a vibrating system, let us consider the simple mechanical system illustrated in Fig. 1.1a, consisting of a massless spring, a mass attached to it, and placed on a plane that constrains its motion horizontally. We also consider drag losses due to friction and to the viscosity of the fluid in which the mass moves.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Uncini, Digital Audio Processing Fundamentals, Springer Topics in Signal Processing 21, https://doi.org/10.1007/978-3-031-14228-4_1
Fig. 1.1 Simple ideal oscillating system. a Mass–spring–friction system. b Circuit representation of a vibrating system in one dimension: mass m, spring K, and damper R
For a more detailed analytical study, it is better to refer to the model (or circuit representation) illustrated in Fig. 1.1b, in which the following quantities are highlighted: the spring with Hooke's elastic constant K; the mass m; a shock absorber or damper, which models the friction and is characterized by a mechanical resistance coefficient R; and the displacement of the mass, indicated as x(t). By Newton's second law, the sum of the forces acting on the mass satisfies

$$\sum_k F_k = m\,\ddot{x}(t) \qquad (1.1)$$

where the quantity $\ddot{x}(t) \equiv \partial^2 x(t)/\partial t^2 = a(t)$ represents the acceleration.¹ In the ideal case, the forces involved in the equilibrium of the above dynamic system are: the force of the spring, proportional to the displacement, $F_1 = -K x(t)$; and the friction, proportional to the velocity, $F_2 = -R\dot{x}(t)$. Thus, the so-called free motion equation can be determined according to the following definition.
Definition 1.1 Let $x(t)$, $\dot{x}(t)$, and $\ddot{x}(t)$ be the quantities denoted as displacement, velocity, and acceleration, respectively; substituting into Eq. (1.1) the constitutive relations $F_1$ and $F_2$ of the spring and the damper yields the following homogeneous linear ordinary differential equation (ODE)

$$m\ddot{x}(t) + R\dot{x}(t) + K x(t) = 0 \qquad (1.2)$$

defined as the motion equation of the mass–spring–damper oscillating system.
1.2.1 Simple Harmonic Motion

For lossless systems, since no friction is present to resist the motion (i.e., R = 0 in Fig. 1.1b), an initial perturbation produces a persistent oscillation that continues indefinitely, according to the following definition.

¹ Note that with the symbol $\dot{p}$ we intend the temporal derivative, i.e., $\dot{p} = \partial p/\partial t$, while with the symbol $p'$ we intend the spatial derivative, i.e., $p' = \partial p/\partial x$.
Definition 1.2 A simple harmonic motion is a special type of periodic or oscillatory motion where the restoring force is directly proportional to the displacement and acts in the direction opposite to that of the displacement. The simple harmonic motion is characterized by the following homogeneous ODE

$$m\ddot{x}(t) + K x(t) = 0. \qquad (1.3)$$

Definition 1.3 The constant

$$\omega_0 = \sqrt{K/m} \qquad (1.4)$$

is defined as resonance pulsation (or resonance pulsatance) or resonance angular frequency, and similarly the resonance frequency is defined as $f_0 = \frac{1}{2\pi}\sqrt{K/m}$.

Equation (1.3) can be rewritten as

$$\ddot{x}(t) + \omega_0^2 x(t) = 0 \qquad (1.5)$$

which represents the differential equation of simple harmonic motion. Thus, according to Definition 1.2, from the theory of differential equations, Eq. (1.5) admits a solution x(t), sometimes denoted as trial solution, of the type

$$x(t) = A_1 \cos\omega_0 t + A_2 \sin\omega_0 t \qquad (1.6)$$

where $A_1$ and $A_2$ are two arbitrary constants to be determined from the initial conditions (i.c.) $x(0)$ and $\dot{x}(0)$ of Eq. (1.5). In addition, observe that the trial solution in Eq. (1.6) is very often written as

$$x(t) = A \cos(\omega_0 t + \phi). \qquad (1.7)$$

The constant $\omega_0$ represents the resonance (or natural) angular frequency of the system. The natural frequency $f_0$ of the oscillating system is therefore equal to $f_0 = (1/2\pi)\sqrt{K/m}$. The amplitude of the oscillation is $A = \sqrt{A_1^2 + A_2^2}$, and the initial phase is $\phi = \tan^{-1}(A_2/A_1)$.

1.2.1.1 Velocity and Acceleration
Differentiating Eq. (1.7) with respect to time, we obtain the velocity v(t)

$$v(t) = \dot{x}(t) = -\omega_0 A \sin(\omega_0 t + \phi) \qquad (1.8)$$
Fig. 1.2 Simple oscillating system. The displacement from the equilibrium position (or simply position) x(t), the velocity v(t) and the acceleration a(t) of the simple harmonic motion relative to a simple linear mechanical oscillator modeled by linear ODE Eq. (1.5)
by differentiating again, we obtain the acceleration a(t)

$$a(t) = \ddot{x}(t) = -\omega_0^2 A \cos(\omega_0 t + \phi) \qquad (1.9)$$

drawn in Fig. 1.2. Note that the velocity is out of phase by π/2 radians and the acceleration by π radians. From Eqs. (1.8) and (1.9), note that: (1) for the same amplitude A, velocity and acceleration are greater as the frequency is higher; (2) x(t) and a(t) are in phase opposition, while v(t) is in quadrature with them.
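These phase relations can be checked numerically. A minimal NumPy sketch follows; the values of K, m, A, and φ are arbitrary examples, not taken from the text:

```python
import numpy as np

# Displacement, velocity, and acceleration of simple harmonic motion,
# Eqs. (1.7)-(1.9). K, m, A, phi are arbitrary example values.
K, m = 100.0, 0.25              # spring constant [N/m], mass [kg]
w0 = np.sqrt(K / m)             # resonance angular frequency, Eq. (1.4)
A, phi = 1e-3, 0.3              # amplitude [m], initial phase [rad]

t = np.linspace(0.0, 0.1, 1000)
x = A * np.cos(w0 * t + phi)             # Eq. (1.7)
v = -w0 * A * np.sin(w0 * t + phi)       # Eq. (1.8)
a = -w0**2 * A * np.cos(w0 * t + phi)    # Eq. (1.9)

# a(t) = -w0^2 x(t): x and a are in phase opposition, so the ODE (1.5)
# is satisfied at every instant
assert np.allclose(a, -w0**2 * x)
```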
1.2.1.2 Potential and Kinetic Energy

Vibration mechanics deals with moving objects (masses, strings, membranes, and fluids), so we can also consider energy aspects such as kinetic and potential energy, where:

• kinetic energy is the energy of a body in motion, so it depends on the velocity v(t) and can be written as $E_k = \frac{1}{2}mv^2$;
• potential energy is the energy that a body possesses in relation to its position x(t) and can be written as $E_p = \int_0^x K x\,dx$.

Due to the phase shift between x(t) and v(t), the kinetic energy $E_k$ of the system is maximum when the potential energy $E_p$ is zero (and vice versa). It is also known from basic physics that during the oscillatory motion these two energies transform alternately into each other and their sum is constant. So, the internal energy of the system is given for each t by the sum $E_k + E_p$, and from Eqs. (1.7) and (1.8), we get

$$E = \frac{1}{2}mv^2 + \frac{1}{2}Kx^2 = \frac{1}{2}m\omega_0^2 A^2 \sin^2(\omega_0 t + \phi) + \frac{1}{2}K A^2 \cos^2(\omega_0 t + \phi)$$
since $K = m\omega_0^2$ and $\sin^2(\cdot) + \cos^2(\cdot) = 1$, we have that

$$E = E_p + E_k = \frac{1}{2}K A^2 = \frac{1}{2}m\omega_0^2 A^2 = \frac{1}{2}mU^2 \qquad (1.10)$$
where U represents the maximum speed. In a loss-free system, the energy is constant and is equal to the maximum potential energy (at maximum deviation) or the maximum kinetic energy (at the central point). Moreover, to characterize the oscillator motion we often refer to effective quantities or root mean squares (RMS).

Definition 1.4 Effective quantities or root mean squares (RMS)—Given a signal x(t) that can represent a physical quantity, its effective or RMS value is defined by the following scalar quantity

$$x_{RMS} = \lim_{T \to \infty} \sqrt{\frac{1}{T}\int_0^T |x(t)|^2\,dt} \qquad (1.11)$$

and for a periodic signal it is equal to the RMS of one period of the signal. Thus, for a sinusoidal quantity we have that $x_{RMS} = A/\sqrt{2}$, where A is the amplitude of the oscillation.

Remark 1.1 In physical modeling, the mass m and the spring K are considered as "concentrated" at a single point. The model is denoted as a lumped parameters model that, under certain assumptions, approximates the behavior of the distributed system. It is useful in electrical, mechanical, and acoustic systems. In particular, the mass m can be considered as an elementary sound source, which transmits the motion to the particles of the medium with which it is in contact.
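Definition 1.4 can be checked numerically for a sinusoid. A short NumPy sketch, with arbitrary example amplitude and frequency:

```python
import numpy as np

# Numerical check of Definition 1.4: the RMS of a sinusoid of amplitude A,
# evaluated over an integer number of periods, equals A/sqrt(2), Eq. (1.11).
A, f0 = 2.0, 440.0
n_periods, n_samples = 50, 100000
t = np.linspace(0.0, n_periods / f0, n_samples, endpoint=False)
x = A * np.cos(2 * np.pi * f0 * t)

x_rms = np.sqrt(np.mean(np.abs(x) ** 2))   # discrete version of Eq. (1.11)
assert np.isclose(x_rms, A / np.sqrt(2))
```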
1.2.2 Damped Harmonic Oscillator

In the case of real systems, drag or friction losses are no longer negligible (R > 0). The ODE (1.2), with the substitutions $\alpha = R/2m$ and $\omega_0^2 = K/m$, can be rewritten as

$$\ddot{x}(t) + 2\alpha\dot{x}(t) + \omega_0^2 x(t) = 0. \qquad (1.12)$$

It is known that the solution of the 2nd-order homogeneous differential equation depends on the solution of an associated homogeneous algebraic equation defined as

$$\gamma^2 + 2\alpha\gamma + \omega_0^2 = 0 \qquad (1.13)$$

where the variable γ is defined as a complex frequency. The roots $\gamma_1$ and $\gamma_2$ of (1.13) are computed as
$$\gamma_{1,2} = -\alpha \pm \sqrt{\alpha^2 - \omega_0^2} = -\alpha \pm j\sqrt{\omega_0^2 - \alpha^2} = -\alpha \pm j\omega_d \qquad (1.14)$$

where $\omega_d = \sqrt{\omega_0^2 - \alpha^2}$ represents the natural angular frequency of the damped oscillation (which, due to the presence of friction, is lower than the undamped resonance angular frequency $\omega_0 = \sqrt{K/m}$). The solutions (1.14) can be: (1) real and distinct; (2) complex conjugate; and (3) real coincident. For each of these cases, there will be a trial solution that characterizes a certain physical behavior.
1.2.2.1 Real and Distinct Roots

From Eq. (1.14), the roots $\gamma_1$ and $\gamma_2$ are real and distinct if $\alpha^2 > \omega_0^2$. In this case, it is easy to verify that the general integral of Eq. (1.12) turns out to be

$$x(t) = A_1 e^{\gamma_1 t} + A_2 e^{\gamma_2 t} \qquad (1.15)$$

where the constants $A_1$ and $A_2$ (also real) are determined from the initial conditions (Cauchy problem) by solving the system

$$A_1 + A_2 = x(0), \qquad \gamma_1 A_1 + \gamma_2 A_2 = \dot{x}(0). \qquad (1.16)$$

In this case the motion is determined by the sum of two real exponentials: the system does not oscillate.
1.2.2.2 Coincident Roots

In the event that the roots coincide, we have $\gamma = \gamma_1 = \gamma_2$ and the solution is the sum of two terms

$$x(t) = A_1 e^{\gamma t} + A_2 t e^{\gamma t} \qquad (1.17)$$

where the constants $A_1$ and $A_2$ can be determined by solving the system

$$A_1 = x(0), \qquad \gamma A_1 + A_2 = \dot{x}(0). \qquad (1.18)$$

Also in this case the system does not oscillate.
1.2.2.3 Complex Conjugate Roots
In the case of not too high friction, a finite-duration impulse perturbation of the system produces an oscillation. However, due to the even minimal presence of friction, the oscillation tends to zero after a certain time. In fact, in the case that $\alpha^2 < \omega_0^2$, we have complex conjugate roots and $\gamma_2 = \gamma_1^*$. So, the general solution can be written as

$$x(t) = A_1 e^{\gamma_1 t} + A_2 e^{\gamma_1^* t}. \qquad (1.19)$$

The system for the determination of the constants $A_1$ and $A_2$ results in

$$A_1 + A_2 = x(0), \qquad \gamma_1 A_1 + \gamma_1^* A_2 = \dot{x}(0). \qquad (1.20)$$

From the previous system it is easy to verify the fundamental property that the constants $A_1$ and $A_2$ are complex conjugate: $A_1 = A_2^*$. In this case, setting $A_1 = |A_1| e^{j\phi}$ and remembering that $\gamma_{1,2} = -\alpha \pm j\omega_d$, Eq. (1.19) can be rewritten as

$$x(t) = |A_1| e^{j\phi} e^{(-\alpha + j\omega_d)t} + |A_1| e^{-j\phi} e^{(-\alpha - j\omega_d)t} = |A_1| e^{-\alpha t}\left[e^{j(\omega_d t + \phi)} + e^{-j(\omega_d t + \phi)}\right]$$

so, from the Euler formula, it holds that

$$x(t) = A e^{-\alpha t} \cos(\omega_d t + \phi) \qquad (1.21)$$
where the amplitude turns out to be $A = 2|A_1|$.

Remark 1.2 Observe that Eq. (1.21) can be written as

$$x(t) = e^{-\alpha t}(B \cos\omega_d t + C \sin\omega_d t). \qquad (1.22)$$

For t = 0, Eq. (1.22) can be used to compute the initial displacement $x_0 = x(0)$ and the initial velocity $v_0 = v(0)$, leading to

$$x(t) = e^{-\alpha t}\left[x_0 \cos\omega_d t + \frac{v_0 + \alpha x_0}{\omega_d} \sin\omega_d t\right]. \qquad (1.23)$$

The amplitude of the damped oscillator is therefore given by the envelope $e^{-\alpha t}$, and its trend is not strictly periodic. The distance between two successive zero crossings and that between two maxima (or minima) remain constant and equal to $T_d = 1/f_d$ (the oscillation period). However, the maxima and minima are not located exactly midway between two successive zeros, and each period is not exactly identical to the previous one (pseudoperiodic oscillation).
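Equation (1.23) can be verified against the ODE (1.12) by finite differences. A NumPy sketch follows, with arbitrary example parameters chosen in the underdamped case α < ω₀:

```python
import numpy as np

# Check that Eq. (1.23) solves the damped ODE (1.12), i.e. that
# m*x'' + R*x' + K*x = 0 for the given initial conditions x0, v0.
# Parameter values are arbitrary examples (underdamped: alpha < w0).
m, R, K = 1.0, 2.0, 1000.0
alpha = R / (2 * m)
w0 = np.sqrt(K / m)
wd = np.sqrt(w0**2 - alpha**2)           # damped angular frequency
x0, v0 = 1e-3, 0.0                       # initial displacement and velocity

def x(t):                                # Eq. (1.23)
    return np.exp(-alpha * t) * (x0 * np.cos(wd * t)
                                 + (v0 + alpha * x0) / wd * np.sin(wd * t))

# verify the ODE by central finite differences
t = np.linspace(1e-3, 0.5, 200)
h = 1e-6
xd = (x(t + h) - x(t - h)) / (2 * h)                  # x'(t)
xdd = (x(t + h) - 2 * x(t) + x(t - h)) / h**2         # x''(t)
assert np.allclose(m * xdd + R * xd + K * x(t), 0.0, atol=1e-3)
```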
Fig. 1.3 Motion of 2nd-order damped harmonic oscillator for different values of the decay time constant τ (envelopes $e^{-0.1t}$, $e^{-t}$, and $e^{-4t}$)
1.2.2.4 Time Constant τ

An important measure of damping is the time needed for the amplitude to attenuate by a factor equal to 1/e (for which $e^{-\alpha t} = e^{-1}$). Referring to Fig. 1.3, this time, denoted as τ, is called in various ways according to the application field: time constant, decay constant, lifetime, characteristic time. So, for Eq. (1.21), we have that

$$\tau = \frac{1}{\alpha} = \frac{2m}{R} \qquad (1.24)$$

and in the following, τ will be called the time constant.
1.2.2.5 Quality Factor Q and Damping Ratio ζ

Very often, to characterize the behavior of a damped oscillator, we use the quality factor or Q-factor, according to the following definition.

Definition 1.5 Quality factor—The Q-factor is defined as the ratio between the spring force $K x_0$ and the damping force $R\omega_0 x_0$

$$Q = \frac{K x_0}{R\omega_0 x_0} = \frac{K}{R\omega_0} = \frac{\omega_0}{2\alpha}. \qquad (1.25)$$

This seems to be due, in addition to historical and customary reasons, to the fact that the Q-factor allows an immediate definition in the frequency domain, as explained in the following section.

Remark 1.3 It is useful to note that in other areas, such as automatic control, instead of the Q-factor it is more common to use the damping ratio ζ, defined as

$$\zeta = \frac{\alpha}{\omega_0} = \frac{1}{2Q}. \qquad (1.26)$$
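The relations among τ, Q, and ζ of Eqs. (1.24)–(1.26) can be cross-checked numerically. Component values below are arbitrary examples:

```python
import numpy as np

# Damping descriptors of Sects. 1.2.2.4-1.2.2.5:
# tau = 1/alpha = 2m/R (Eq. 1.24), Q = w0/(2*alpha) (Eq. 1.25),
# zeta = alpha/w0 = 1/(2Q) (Eq. 1.26). m, R, K are example values.
m, R, K = 0.5, 4.0, 2.0e4
alpha = R / (2 * m)
w0 = np.sqrt(K / m)

tau = 1.0 / alpha
Q = w0 / (2 * alpha)
zeta = alpha / w0

assert np.isclose(tau, 2 * m / R)
assert np.isclose(Q, K / (w0 * R))      # equivalent form of Eq. (1.25)
assert np.isclose(zeta, 1 / (2 * Q))
```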
Fig. 1.4 Quantities of the linear oscillator described as rotating vectors in the plane of the phasors
1.2.3 Phasor Related to Quantities x, v and a

In the case of linear systems, typical in electrical engineering and acoustics, steady-state analysis for sinusoidal excitations is often of interest. Signals of the type $e^{j\omega t}$ (sometimes referred to as cisoids and indicated as $\mathrm{cis}(\omega t) \triangleq \cos(\omega t) + j\sin(\omega t)$), which have distinct advantages over operations with sine or cosine functions, are eigenfunctions of linear systems. Thus, when transient phenomena have terminated, they can be omitted from the formal treatment. For this purpose the following definition applies.
x(t) = Re Xe jωt . So, the phasor X is then defined as X = X e jφ
(1.27)
therefore phasor contains only information on the amplitude and phase, and not the information on the angular frequency. In addition, the quantity x(t) ¯ = Xe jωt
⇔
x(t) = Re [x(t)] ¯
(1.28)
takes the name of rotating vector or complex displacement associated with the quantity x(t) = X cos(ωt + φ) (see Fig. 1.4). The rotating vector x(t) ¯ is typically written as x¯ (i.e., omitting the time index (t)). The velocity and acceleration vectors can be, respectively, written as v¯ = jω0 Xe jω0 t = jω0 x¯
(1.29)
¯ a¯ = −ω02 Xe jω0 t = −ω02 x.
(1.30)
and
Graphically these vectors can represent the motion indicated in Fig. 1.4 in the phasor plane. Therefore, we observe that velocity v¯ increases with the frequency, while acceleration a¯ with its square.
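The phasor calculus of Eqs. (1.27)–(1.29) can be verified numerically: differentiating the rotating vector multiplies it by jω. A NumPy sketch, with arbitrary example amplitude, phase, and frequency:

```python
import numpy as np

# Check of Eqs. (1.28)-(1.29): the time derivative of the rotating vector
# x_bar(t) = X*exp(j*w0*t) equals j*w0*x_bar(t), i.e. the velocity phasor
# is the displacement phasor rotated by pi/2 and scaled by w0.
w0 = 2 * np.pi * 50.0
X = 0.7 * np.exp(1j * 0.4)          # phasor: amplitude 0.7, phase 0.4 rad

t = np.linspace(0.0, 0.1, 20000)
x_bar = X * np.exp(1j * w0 * t)

h = 1e-8                            # numerical derivative of x_bar
dx = (X * np.exp(1j * w0 * (t + h)) - x_bar) / h
assert np.allclose(dx, 1j * w0 * x_bar, rtol=1e-4)
```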
1.3 Forced Oscillations

Consider the mass–spring–damper mechanical system of Fig. 1.5. If the oscillator is coupled to an external energy source (or force) indicated by f(t), the mathematical model of the motion equation takes the following form

$$m\ddot{x}(t) + R\dot{x}(t) + K x(t) = f(t). \qquad (1.31)$$

Equation (1.31) expresses the balance between the driving force and the inertial, damping, and elastic forces of the oscillator. In general, f(t) can take various forms: random, impulsive, sinusoidal, etc. From the theory of differential equations, the solutions of Eq. (1.31) vary according to the form assumed by f(t).

Property 1.1 Linearity—The system described by Eq. (1.31) is linear; that is, the superposition principle is valid. The superposition principle states that for all linear systems the response caused by two or more stimuli is the sum of the responses that would have been caused by each stimulus individually. So, if input $f_1(t)$ produces response $x_1(t)$ and input $f_2(t)$ produces response $x_2(t)$, then the input $f_1(t) + f_2(t)$ produces the response $x_1(t) + x_2(t)$. The linearity property, therefore, simplifies the analysis and characterization of the dynamic system.

Property 1.2 Time invariance—It should also be noted that if the system coefficients do not vary over time, the system is called time-invariant or stationary. That is, if an input f(t) produces an output x(t), the same input presented at another instant, f(t − τ), produces the output x(t − τ).
Fig. 1.5 Mechanical mass–spring–damper oscillator driven by an external force f (t): a mechanical scheme; b model
Fig. 1.6 Bode plots of a typical 2nd-order system. Normalized frequency and phase response of the mechanical mass–spring–damper oscillator for increasing Q-factor. Remember that $Q = \frac{1}{2\zeta}$, where ζ is the damping factor
1.3.1 Transfer Function, Frequency, and Impulse Responses

A linear system can be represented by the Laplace transform or L-transform.² So, considering the L-transform of Eq. (1.31), we can write³

$$m s^2 X(s) + R s X(s) + K X(s) = F(s). \qquad (1.32)$$

The transfer function (TF), defined as the ratio between the L-transform of the output variable and the L-transform of the driving variable, can be written as

$$H(s) = \frac{X(s)}{F(s)} = \frac{1/K}{1 + \frac{R}{K}s + \frac{m}{K}s^2}. \qquad (1.33)$$

Recalling the definition of the resonance angular frequency (see Definition 1.3) and the definition of the quality factor Q (see Eq. 1.25), Eq. (1.33) can be rewritten as

$$H(s) = \frac{1/K}{1 + \frac{R}{K}s + \frac{m}{K}s^2} = \frac{G}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}} \qquad (1.34)$$

where G is the gain, Q the quality factor, and $\omega_0$ the natural angular frequency. The Bode diagram of the above TF, which determines its frequency response, is illustrated in Fig. 1.6.
² The reader is reminded that, by transform of a function f(t), we mean the scalar product between the function and a basis function $\varphi_s(t)$, i.e., $F(s) = \langle f(t), \varphi_s^*(t)\rangle$. In the Laplace transform $\varphi_s(t) = e^{-st}$; so the mono-lateral L-transform is $F(s) = \int_0^\infty f(t)e^{-st}dt$, where $s = \alpha + j\omega$ is the complex frequency.
³ For simplicity we consider zero initial conditions.
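The resonant behavior of the TF in Eq. (1.34) is easy to probe numerically by evaluating it on the imaginary axis s = jω. A minimal NumPy sketch; G, Q, and ω₀ are arbitrary example values:

```python
import numpy as np

# Frequency response of the 2nd-order TF of Eq. (1.34) on s = j*omega.
# At w = w0 the real denominator terms cancel, leaving G/(j/Q),
# so the magnitude at resonance equals Q*G. Values are examples.
G, Q, w0 = 1.0, 10.0, 2 * np.pi * 440.0

def H(w):
    s = 1j * w
    return G / (1 + s / (w0 * Q) + s**2 / w0**2)

assert np.isclose(abs(H(w0)), Q * G)           # resonance peak
assert np.isclose(abs(H(w0 / 1000)), G, rtol=1e-3)  # static gain at low freq
```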
Fig. 1.7 Response of a 2nd-order damped oscillator to an impulsive external force f(t) = δ(t) (i.e., the impulse response (IR)), for different values of the Q-factor. Note that the amplitude decays to 37% of its initial value after a time equal to the time constant τ = 1/α = 2Q/ω₀, which corresponds to Q/π cycles
Remark 1.4 More generally, a TF is defined as a rational function of order (q, p) in the variable s and usually is written as

$$H(s) = \frac{N(s)}{D(s)} = \frac{b_0 + b_1 s + \cdots + b_q s^q}{a_0 + a_1 s + \cdots + a_p s^p} = \frac{\sum_{k=0}^{q} b_k s^k}{\sum_{k=0}^{p} a_k s^k}. \qquad (1.35)$$

The roots of the numerator N(s) are called "zeros," while those of the denominator D(s) are referred to as "poles." For example, a 2nd-order mechanical oscillator described by the TF (1.34) has two poles (which are the solutions of the ODE's associated homogeneous algebraic equation (1.14)).
1.3.1.1 Impulse Response

The inverse Laplace transform (or $\mathcal{L}^{-1}$-transform) of a TF is defined as the system impulse response (IR). Specifically for Eq. (1.34), as shown in Fig. 1.7, it takes the form

$$h(t) = \mathcal{L}^{-1}\left[\frac{G}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}\right] = \frac{G\,\omega_0^2}{\omega_d}\, e^{-\alpha t}\sin(\omega_d t), \qquad \omega_d = \omega_0\sqrt{1 - (1/2Q)^2} \qquad (1.36)$$

where for high Q values, we have that $\omega_d \approx \omega_0$.
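The closed form of Eq. (1.36) can be cross-checked against SciPy's LTI simulator. A sketch assuming SciPy is available; G, Q, and ω₀ are arbitrary example values:

```python
import numpy as np
from scipy import signal

# Check of Eq. (1.36): the impulse response of
# H(s) = G*w0^2 / (s^2 + s*w0/Q + w0^2)
# is h(t) = (G*w0^2/wd) * exp(-alpha*t) * sin(wd*t),
# with alpha = w0/(2Q) and wd = w0*sqrt(1 - (1/(2Q))^2).
G, Q, w0 = 1.0, 5.0, 2 * np.pi * 100.0
alpha = w0 / (2 * Q)
wd = w0 * np.sqrt(1 - (1 / (2 * Q)) ** 2)

t = np.linspace(0, 0.05, 2000)
sys = signal.TransferFunction([G * w0**2], [1.0, w0 / Q, w0**2])
_, h_num = signal.impulse(sys, T=t)

h_analytic = (G * w0**2 / wd) * np.exp(-alpha * t) * np.sin(wd * t)
assert np.allclose(h_num, h_analytic, atol=1e-3 * np.max(np.abs(h_analytic)))
```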
Remark 1.5 Observe that in other contexts the IR is denoted as the Green's function.⁴
1.3.1.2 Force Versus Velocity Transfer Function

Indicating with V(s) = sX(s) the L-transform of the velocity, the transfer function between velocity and force becomes

$$H_v(s) = \frac{V(s)}{F(s)} = \frac{s\,(1/K)}{1 + \frac{R}{K}s + \frac{m}{K}s^2}. \qquad (1.37)$$
Remark 1.6 Note that the TF in Eq. (1.37) has a zero at the origin and two poles, and, as we will also see in Sect. 1.4, it is typical of the so-called resonant circuit.
1.3.1.3 Bandwidth at −3 dB

Considering the resonant circuit TF in Eq. (1.37), for $\omega_0 = \sqrt{K/m}$, $Q = K/(\omega_0 R)$, and $s = j\omega$, to better study its frequency response it is more convenient to express the TF (1.37) in terms of the angular frequency normalized with respect to $\omega_0$, as

$$H(j\omega) = \frac{1}{1 + jQ\left(\dfrac{\omega}{\omega_0} - \dfrac{\omega_0}{\omega}\right)}.$$
Considering the amplitude response for f = ω/2π, it is easy to verify that

$$|H(f)| = \frac{1}{\sqrt{1 + Q^2\left(\dfrac{f}{f_0} - \dfrac{f_0}{f}\right)^2}}$$

where the Q-factor, as is known, provides a direct measurement of the bell shape of the frequency response. The term $f_0$, as shown in Fig. 1.8, represents the frequency of the maximum of the bell and is also called the resonance frequency. By definition, the bandwidth at −3 dB⁵ (see Fig. 1.8) corresponds to the frequencies for which the TF modulus is equal to $1/\sqrt{2}$ (i.e., $20\log_{10}\frac{1}{\sqrt{2}} \approx -3$), which

⁴ The particular solution $y_p(t)$ of a linear non-homogeneous differential equation with constant coefficients can be expressed in terms of a convolution integral, namely $y_p(t) = \int G(\tau)x(t-\tau)d\tau$, where x(t) is the non-homogeneous (forcing) term and G(t) is the Green's function for the associated differential operator.
⁵ Given a quantity expressed in natural value, denoted as $x_{nat}$, which generally indicates a ratio between a quantity and its reference value (e.g., a voltage $V/V_0$, a pressure $P/P_0$, etc.), this quantity can be expressed in decibels (dB) as $x_{dB}$, and vice versa, according to $x_{dB} = 10\log_{10} x_{nat} \Leftrightarrow x_{nat} = 10^{x_{dB}/10}$. When referring to power, assuming it is proportional to the square of the given quantity, we get $x_{dB} = 10\log_{10} x_{nat}^2 = 20\log_{10} x_{nat}$.
Fig. 1.8 Frequency and phase response of a resonant oscillator (or circuit) around its resonant frequency (normalized so that f₀ = 1), for different values of the Q-factor
corresponds to half power. These frequencies, defined as cutoff frequencies, can be easily calculated by imposing this value on the modulus; i.e., we get

$$\frac{1}{\sqrt{1 + Q^2\left(\dfrac{f}{f_0} - \dfrac{f_0}{f}\right)^2}} = \frac{1}{\sqrt{2}} \qquad (1.38)$$

solving with respect to f, the filter cutoff frequency values can be determined as

$$f_1 = f_0\,\frac{\sqrt{1 + 4Q^2} - 1}{2Q}, \quad \text{low-cut frequency} \qquad (1.39)$$

$$f_2 = f_0\,\frac{\sqrt{1 + 4Q^2} + 1}{2Q}, \quad \text{high-cut frequency.} \qquad (1.40)$$

From the previous expressions it is easy to verify that $f_0$ corresponds to the geometric mean of the cutoff frequencies

$$f_0 = \sqrt{f_1 f_2}, \quad \text{resonance frequency.}$$
Combining the expressions (1.39) and (1.40), we have that

$$f_2 - f_1 = \frac{f_0}{Q}. \qquad (1.41)$$

So, it is usual to define the value of the Q-factor as
$$Q = \frac{f_0}{\Delta f} \qquad (1.42)$$

where the quantity $\Delta f = f_2 - f_1$ is defined as the filter bandwidth at −3 dB.⁶

Remark 1.7 Note that the definition of Q in (1.42) does not depend on the components of the oscillator (K, R), so its use is much more convenient, for example, in defining the design specifications for analog or digital filters used in the audio sector.
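The cutoff-frequency relations of Eqs. (1.38)–(1.42) can be verified numerically. A NumPy sketch; f₀ and Q are arbitrary example values:

```python
import numpy as np

# Check of Eqs. (1.38)-(1.42): the -3 dB cutoff frequencies f1, f2 of the
# resonant response satisfy |H(f1)| = |H(f2)| = 1/sqrt(2),
# f0 = sqrt(f1*f2) and Q = f0/(f2 - f1). Values are examples.
f0, Q = 1000.0, 4.0

f1 = f0 * (np.sqrt(1 + 4 * Q**2) - 1) / (2 * Q)   # low-cut, Eq. (1.39)
f2 = f0 * (np.sqrt(1 + 4 * Q**2) + 1) / (2 * Q)   # high-cut, Eq. (1.40)

def mag(f):                            # |H(f)| of the resonant TF
    return 1 / np.sqrt(1 + Q**2 * (f / f0 - f0 / f) ** 2)

assert np.isclose(mag(f1), 1 / np.sqrt(2))        # half-power points
assert np.isclose(mag(f2), 1 / np.sqrt(2))
assert np.isclose(np.sqrt(f1 * f2), f0)           # geometric mean
assert np.isclose(f0 / (f2 - f1), Q)              # Eq. (1.42)
```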
1.3.2 Transient and Steady-State Response

Consider the oscillating system modeled by Eq. (1.31), excited by an external force with a sinusoidal waveform of the type

$$f(t) = F\cos(\omega t + \phi), \quad t \geq 0. \qquad (1.43)$$
In the case of zero initial conditions, it is known from the theory of differential equations and from the L-transform that the solution of Eq. (1.31) can be expressed, in the time or s-domain, as the sum of two parts

$$x(t) = x_t(t) + x_p(t) \quad \Leftrightarrow \quad X(s) = X_t(s) + X_p(s). \qquad (1.44)$$
Assuming a stable system, i.e., all the TF's poles located to the left of the imaginary axis of the s-plane, the part $x_t(t)$ is called the transient response of the system which, by definition, tends to zero in a "short" time, while $x_p(t)$ is called the permanent or steady-state response, which predominates over time. To obtain the sinusoidal steady-state solution, it is better to write the solution in the Laplace domain as

$$X(s) = H(s)F(s). \qquad (1.45)$$

The term X(s) is characterized by a pair of complex conjugate poles deriving from the sinusoidal excitation function (1.43). Developing X(s) in partial fractions, there is a development of the type

$$X(s) = X_t(s) + \frac{R}{s - j\omega} + \frac{R^*}{s + j\omega} \qquad (1.46)$$

where the part $X_t(s)$ is the transitory one, as it relates to the poles of H(s) which, due to the stability condition, have real part less than zero. It therefore follows that the steady-state response is sinusoidal and can be written as

⁶ For the definition of the cutoff frequencies, in the case of resonant circuits for which $4Q^2 \gg 1$, instead of (1.39) and (1.40) it is usual to use the approximate formulas $f_1 = f_0(1 - 1/(2Q))$ and $f_2 = f_0(1 + 1/(2Q))$. This definition is almost never used in the audio sector, where the value of Q is generally less than 10.
$$x_p(t) = \mathcal{L}^{-1}\left[\frac{R}{s - j\omega} + \frac{R^*}{s + j\omega}\right] = A_1\cos(\omega t + \phi_1), \quad t \geq 0. \qquad (1.47)$$
1.3.3 Sinusoidal Response, Mobility and Impedance

To obtain only the sinusoidal steady-state response, we can proceed by writing the equation of motion in the phasor domain. Let $\mathbf{F} = Fe^{j\phi}$ be the phasor of the external force signal; from Eq. (1.33) the displacement can be written in the following equivalent forms

$$\mathbf{X} = H(s)|_{s=j\omega}\,\mathbf{F} = \frac{\mathbf{F}/K}{1 + \dfrac{j\omega}{\omega_0 Q} - \dfrac{\omega^2}{\omega_0^2}} = \frac{\mathbf{F}/m}{\omega_0^2 - \omega^2 + j\omega 2\alpha} = \frac{\mathbf{F}/m}{\omega_0^2 - \omega^2 + j\omega 2\zeta\omega_0} \qquad (1.48)$$
jω/K V (s) = jω F(s) s= jω 1 + Qω − 0
ω2 ω02
=
jω/m ω02 − ω2 + jω2ζ ω0
(1.49)
Remark 1.8 Note that the mobility plays a very important role in modeling the inherent dynamic characteristics of vibrating mechanical structure in the form of natural frequencies, damping factors, and shapes of so-called vibration modes. It is also of particular importance in the study of acoustic musical instruments, in particular of its constituent parts such as the soundboards of the piano and the sound boxes and soundboards of stringed instruments such as guitars and violins. In addition, the inverse of mobility in the phasor domain is denoted as mechanical impedance Z = 1/Y, defined as the ratio of the force and velocity phasors, whereby Z=
F = R + j (ωm − K /ω) = R + j X m V
(1.50)
where X m = ωm − K /ω represents mechanical reactance. The trend the various impedance functions, as a function of frequency, is shown in Fig. 1.9. The steady-state displacement law, according to Eq. (1.47), can be expressed as a function of the mechanical impedance such as For a sinusoidal quantity x(t) = X cos(ωt + φ), called X its associated phasor, calculating the derivative of x(t) we have: dx(t)/dt = d Re[Xe jωt ]/dt = Re[ jωXe jωt ]; so in the phasors domain ˙ = jωX. we can write X
7
17
B
Mobility Y
Impedance Z
1.3 Forced Oscillations
Fig. 1.9 Real and imaginary parts of the mechanical impedance and admittance of the harmonic oscillator (1.49), for 1/K = 1 (Gain), Q = 0.5 and ω0 = 2π 440. a Mechanical impedance; b mechanical admittance or mobility; c parametric frequency representation of the imaginary part as a function of the real part of the mobility (B(ω) vs. G(ω) mobility Nyquist diagram)
Fig. 1.10 Mobility analysis in mechanics allows us to identify the natural modes of the system in which the system has the most mechanical stress
x(t) = Re
F F jωt = e sin(ωt + φ1 ) jωZ ωZ
(1.51)
where Z is the module of the complex impedance. Remark 1.9 Note that very often for the representation of dynamic systems, instead of using the frequency response plot, as in Fig. 1.6, is used the curve of the real and imaginary part of the mechanical impedance Z = F/V (or of the mobility Y = 1/Z), as for example Fig. 1.9 shown. In addition, notice how at the resonance the value of the mobility has its maximum value and is real, while for the impedance the value of the real part remains constant. Example 1.1 As an example Fig. 1.10 shows the mobility trend for a dynamic system with two resonances at 140 and 280 Hz and an anti-resonance at 170 Hz.
1 Vibrating Systems
Fig. 1.11 Complete response of a simple oscillator with Q = 10 for a co-sinusoidal external force f (t) = cos(ωt) applied for t ≥ 0. The ratio ω/ω0 varies between 0.1 and 4 (modified from [2])
1.3.4 Calculation of Complete Response

In order to obtain the complete transient and steady-state response, Eq. (1.46) should be explicitly calculated. In these cases, it can be shown that the displacement takes the form

x(t) = Ae^{−αt} cos(ωd t + φ) + (F/(ωZ)) sin(ωt + φ),  t ≥ 0    (1.52)

where the arbitrary constants A and φ depend on the system initial conditions. Example 1.2 As an example, Fig. 1.11 reports the trend of the complete response for different ω/ω0 ratios for a mechanical oscillator with Q = 10, where it can be observed that the natural resonance at frequency ωd tends to dampen very quickly.
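A minimal numerical check of Eq. (1.52): integrating the driven oscillator from rest, the exponentially damped free term dies out and the residual amplitude matches F/(ω|Z|) of Eq. (1.51). The dimensionless parameter values (m = K = F = 1, Q = 10, ω = 0.5ω0) are illustrative assumptions.

```python
import math

m, K, F = 1.0, 1.0, 1.0          # dimensionless illustrative values
Q = 10.0
w0 = math.sqrt(K / m)
R = w0 * m / Q                   # damping from Q = w0*m/R
w = 0.5 * w0                     # driving frequency

def deriv(t, x, v):
    """State derivative of m*x'' + R*x' + K*x = F*cos(w*t)."""
    return v, (F * math.cos(w * t) - R * v - K * x) / m

# RK4 integration starting from rest (so the transient term is excited)
x, v, t, dt = 0.0, 0.0, 0.0, 0.01
peak = 0.0
while t < 220.0:
    k1 = deriv(t, x, v)
    k2 = deriv(t + dt / 2, x + dt / 2 * k1[0], v + dt / 2 * k1[1])
    k3 = deriv(t + dt / 2, x + dt / 2 * k2[0], v + dt / 2 * k2[1])
    k4 = deriv(t + dt, x + dt * k3[0], v + dt * k3[1])
    x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    t += dt
    if t > 200.0:                # transient A*exp(-alpha*t) has died out
        peak = max(peak, abs(x))

Zmod = math.sqrt(R**2 + (w * m - K / w)**2)   # |Z| from Eq. (1.50)
steady = F / (w * Zmod)                       # predicted amplitude, Eq. (1.51)
```

The measured steady-state peak agrees with the predicted F/(ω|Z|) to within a few percent.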
1.3.5 Generic Helmholtz Oscillating Systems

The previously studied mass–spring–losses model is also representative of physical systems of a different nature. For example, a piston of mass m, free to move inside a closed cylinder of surface S and length L, vibrates in a manner similar to the mass–spring system described above (see Fig. 1.12a). In this case the spring constant, constituted by the confined and compressed air, is equal to K = γ pa S/L, where pa is the atmospheric pressure, m is the mass of the piston, and γ is a constant that for
Fig. 1.12 Simple vibrating systems. a Piston in a cylinder. b Helmholtz resonator; c Helmholtz resonator without neck (as the cajón box-shaped percussion instrument). d Pendulum (modified from [2])
the air is equal to γ = 1.4. The natural frequency of the system will therefore be equal to

f0 = (1/2π)√(γ pa S/(mL)).    (1.53)
1.3.5.1 Helmholtz Resonators
Hermann von Helmholtz,8 in 1877 [4], developed a series of hollow spherical resonators made of glass and metal with two openings: a large opening at the bottom, suitable for insertion into the ear, and an opening at the top through which the sound enters the body of the resonator. Basically, the Helmholtz resonator consists of a volume of confined air V accessible through a cylinder called the neck, as shown in Fig. 1.12b. Neglecting nonlinear effects, the system behaves as a 2nd-order mechanical (or electrical) resonator, for which the mechanical parameters m, K, R (or electrical parameters: resistance R, inductance L, and capacitance C) must be redefined according to the acoustic quantities characterizing the resonant cavity. With respect to the previous case, the mass of the piston in the cylinder is replaced by the mass of air in the neck. The mass m is equal to

m = ρSL    (1.54)
8 H. von Helmholtz (1821–1894) was a German physician, physiologist, and physicist. A true homo universalis, he was one of the most versatile scientists of his time.
while the constant K can be defined as

K = ρS²c²/V    (1.55)
where ρ [kg m⁻³] is the density of the air and c the speed of sound. According to Definition 1.3, the natural frequency of the system is equal to

f0 = (1/2π)√(K/m) = (c/2π)√(S/(VL)).    (1.56)
Figure 1.12c shows the neckless variant of the Helmholtz oscillator, where the confined cavity is accessible through a hole of radius r. In this case the oscillating mass m can be calculated by means of an equivalent neck of length L′ = (16/3π)r, for which the mass m is

m = ρSL′ = ρπr²(16/3π)r ≅ 5.33ρr³.    (1.57)
The natural frequency will therefore be equal to

f0 = (c/2π)√(1.85r/V).    (1.58)
Moreover, note that if the surface around the hole is not wide, the actual natural frequency is lower than the calculated one.
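Eqs. (1.56) and (1.58) can be sketched as small helper functions. The cavity and neck dimensions below are illustrative assumptions; note that Eq. (1.58) is just Eq. (1.56) with S = πr² and the equivalent neck length L′ = 16r/(3π), since 3π²/16 ≈ 1.85.

```python
import math

c = 343.0                     # speed of sound [m/s] (at roughly 20 C)

def helmholtz_f0(S, V, L):
    """Eq. (1.56): resonator with a neck of section S and length L."""
    return c / (2 * math.pi) * math.sqrt(S / (V * L))

def neckless_f0(r, V):
    """Eq. (1.58): neckless resonator with a hole of radius r."""
    return c / (2 * math.pi) * math.sqrt(1.85 * r / V)

# Example (assumed dimensions): 1-litre cavity, neck 5 cm long, 1 cm radius
f_neck = helmholtz_f0(math.pi * 0.01**2, 1e-3, 0.05)
f_hole = neckless_f0(0.01, 1e-3)   # same cavity, hole only, no neck
```

Removing the neck raises the resonance, since the equivalent neck L′ is much shorter than a physical neck of the same radius.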
1.3.5.2 Helmholtz Resonator’s Losses
Even though no additional damping is included, a Helmholtz resonator dissipates energy around the natural frequency due to losses in the resonator’s neck. Thus, the damping R is proportional to the velocity in the neck (Rẋ). The total linear acoustic dissipation in the resonator neck can be formulated as

R = (L/r)·√(2μ_eff ρω)/(πr²) + 2√(2μ0 ρω)/(πr²)    (1.59)

where L and r are, respectively, the length and radius of the neck, and
μ_eff = μ0 (1 + (γ − 1)√(v/(μ0 c_p)))²

where μ0 [Ns/m²] is the dynamical viscosity of air, γ ≈ 1.401 the ratio of specific heats, c_p the heat capacity at constant pressure, and v the thermal conductivity; and where the
viscosity of air depends mostly on the temperature; at 20 °C, the viscosity of air is 1.813 × 10⁻⁵ Ns/m². The first right-hand term in Eq. (1.59) represents the viscous losses at the neck wall. At the wall boundaries the velocity of the moving air particles is zero, whereas the velocity reaches a maximum in the middle of the opening (free-stream velocity). In a wider pipe, the area of free-stream velocity is bigger, and hence the losses due to viscous friction are relatively smaller. The second right-hand term in Eq. (1.59) represents the viscous losses at the neck ends. This expression holds for neck ends ending in an open space as well as for neck ends in an infinite baffle. Remark 1.10 Note that the calculation of losses can be difficult. So, in practical cases, ω0 and Q of the resonant system can be estimated with simple measurements, and the losses determined with a simple inverse procedure.
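As a sketch of the inverse procedure of Remark 1.10, assuming the standard 2nd-order relation Q = ω0 m/R, measured values of f0 and Q (together with an estimate of the moving air mass) directly yield the loss term. All numerical values below are hypothetical.

```python
import math

def losses_from_measurement(f0_hz, Q, m):
    """Damping R [N s/m] from measured f0 and Q, using Q = w0*m/R."""
    w0 = 2 * math.pi * f0_hz
    return w0 * m / Q

# Hypothetical measurement: f0 = 110 Hz, Q = 25, moving air mass 1e-4 kg
R = losses_from_measurement(110.0, 25.0, 1e-4)
```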
1.4 Electrical Circuit Analogies

Many phenomena of great interest in engineering, such as electromagnetic, mechanical, acoustic, and thermodynamic ones, are generally very complex, and the fundamental laws of the models that represent them, while adequately explaining their basic mechanisms, are mostly inadequate for technological application or numerical simulation. Moreover, the fundamental laws of physics are not suitable for the design or study of systems characterized by a complex aggregation of many interacting subsystems (or devices). Simplification of the problem can be achieved by introducing appropriate mathematical models that allow a certain local simplicity and a high global complexity.
Local simplicity—The problem of local simplicity can be addressed by introducing the so-called lumped-parameter elements, i.e., a few types of basic atomic devices, called constituent elements or simply elements, formally well defined and characterized by precise, and generally simple, laws among the physical quantities of interest. These laws, called constitutive relations, locally concentrate all the properties of the element itself.
Global complexity—Global complexity is achieved by connecting simple elements according to a certain topology. The topological and algebraic principles of circuit theory represent a fundamental tool for the consistent analysis and systematic synthesis of complex systems [10].
The versatility of the circuital approach has favored the determination of an analogous circuit representation for many physical structures: mechanical, acoustical, biological, and thermodynamical. This representation is mainly related to the way the physical variables in the domain of interest, such as force, velocity, pressure, temperature, and heat, are associated with the electrical quantities: voltage and current [14].
Fig. 1.13 Electric bipole with the reference directions (sign conventions) of the current i(t) and voltage v(t)
1.4.1 Kirchhoff Laws

In circuit analysis with the lumped-element (or lumped-parameter) model [10], the circuits are characterized by two electrical (or acoustical, mechanical, electromagnetic, ...) quantities:
1. one of the type denoted as “through” or “flux,” φ(t);
2. one of the type denoted as “at-the-ends” or “potential” or “across,” ξ(t);
3. their product is the instantaneous power: p(t) = φ(t)ξ(t).
The notions of “through” or “flux,” and “at-the-ends” or “potential” or “across,” refer to the definition of the constitutive relationships9 of a given element and are interchangeable. For example, in the electrical case, considering the physical element resistor R, Ohm’s law gives v(t) = Ri(t). In this case the “through” quantity is the current i(t), and the “at-the-ends” quantity is the voltage v(t). However, note that one can also make the opposite, also denoted dual, choice: considering the physical element denoted as conductance G (defined as the inverse of a resistance, G = 1/R), we have i(t) = Gv(t). So, in this case the “through” quantity is the voltage v(t), while the “at-the-ends” quantity is the current i(t). An electric bipole, as shown in Fig. 1.13, is entirely characterized by the constitutive relations between the voltage v(t) (i.e., the “at-the-ends” quantity) and the current i(t) (i.e., the “through” quantity). For example, the resistor R, the capacitor C, and the inductor L are, respectively, defined by the following constitutive relations.
v(t) = Ri(t) ⇔ i(t) = v(t)/R,  Resistor R
v(t) = (1/C)∫i(t)dt ⇔ i(t) = C dv(t)/dt,  Capacitor C
v(t) = L di(t)/dt ⇔ i(t) = (1/L)∫v(t)dt,  Inductor L.    (1.60)
For an electrical circuit, simple topological relationships hold, known as Kirchhoff’s laws, which together with the constitutive relationships allow a simplified and systematic study even of very complex circuits.
9 In engineering, a constitutive relationship is a relationship between two physical quantities (e.g., field or force) that is specific to a material or substance, or to an ideal abstract or physical element, and which approximates the response of that material, substance, or physical element to external stimuli, usually applied fields or forces.
Kirchhoff’s first law, or Kirchhoff’s current law (KCL), states that: The current flowing into a closed surface (or a node) is equal to the current flowing out of it. That is

∑_k i_k(t) = 0.    (1.61)
The second law of Kirchhoff, or Kirchhoff’s voltage law (KVL), states that: For a closed-loop series path, the algebraic sum of all the voltages around any closed loop in a circuit is equal to zero. That is

∑_k v_k(t) = 0.    (1.62)
Property 1.3 Power Conservation Principle—The basic consequence of Kirchhoff’s topological laws is the law of conservation of power. Given a circuit with R branches we have that

P(t) = ∑_{k=1}^{R} v_k(t) i_k(t) = 0.    (1.63)
The fundamental basis for the circuit representation of non-electric structures is the definition of flow-type and potential-type variables such that Kirchhoff’s topological laws, and consequently the law of conservation of power (1.63), remain valid. As a consequence, it is possible to define, in the domain of interest, circuit components analogous to the resistor, capacitor, and inductor. However, the choice of the mechanical variables in analogy to the electrical variables is not unique, and there are two widely used analogies: the analogy of mobility and the analogy of impedance.
• The analogy of mobility is an indirect analogy where the force is analogous to the current and the velocity to the voltage.
• The analogy of impedance is a direct analogy where the force is analogous to the voltage and the velocity to the current.
Remark 1.11 The analogy of mobility is a consequence of the use of Lagrangian-type coordinates, i.e., moving coordinates attached to each element of the mechanical structure. On the contrary, in the analogy of impedance, Eulerian (fixed) coordinates are used with respect to the inertial reference; so, in this case the force is the potential (or at-the-ends) variable. However, both methods produce equivalent results.
1.4.2 Analogy of Mobility

To introduce the mechanical–electrical circuit analogy, consider the expression (1.31) of the mass–spring–damper mechanical system reported in Fig. 1.5.
Fig. 1.14 Analogy of mobility. Equivalent electrical circuit (parallel resonant circuit) of the mechanical oscillator of Fig. 1.5
In the analogy of mobility, the force f(t) is the flux quantity and the velocity v(t) is the potential quantity. Thus, denoting the velocity as v(t) = ẋ(t), Eq. (1.31) can be written as

m v̇(t) + Rv(t) + K∫v(t)dt = f(t).    (1.64)
For the KCL the previous expression indicates that the force f (t) is equal to the sum of three flux quantity f m (t), f R (t) and f K (t), so it can be interpreted as the parallel of three electrical components subjected to the identical velocity. Therefore, the constitutive relations of the mechanical components, in analogy of Eq. (1.60), expressed in terms of force and velocity, can be defined as follows 1 f (t)dt, Mass C = m m v(t) = R −1 f (t), Damper Re = R −1 d f (t) , Spring L = K −1 . v(t) = −K −1 dt v(t) =
(1.65) (1.66) (1.67)
The equivalent electrical circuit is shown in Fig. 1.14.
1.4.2.1 Analogy of Mobility Example
As an example, we consider the mass–spring–damper systems previously introduced, characterized by Hooke’s and Newton’s laws. To this end, we consider the system, illustrated in Fig. 1.15a, consisting of two rigid bodies of masses m1 and m2, sliding on a horizontal support plane with velocities v1 and v2, respectively, two springs, and a damper. The rigid body of mass m1 is subjected to the forces F1, F2, and F3, and, with the directions in Fig. 1.15a, Newton’s equilibrium law ∑_k F_k = ma holds, so we get

F1 − F2 − F3 = m1 dv1/dt    (1.68)
where the quantity m1 dv1/dt represents the force of inertia. The above expression can be easily interpreted as a Kirchhoff law: the algebraic sum of the forces is zero. Generalizing, we can assume that the force is analogous to the current. Let us now
Fig. 1.15 Analogy of mobility example. a Mechanical structure with masses, springs, and dampers. b Equivalent electrical circuit
consider the velocities of points a, b, c, and d. For reasons of mechanical continuity we can easily verify that the sum of the relative velocities of the elements that meet along the dotted path in the figure is zero: ∑_k v_k = 0; the relative velocity between the extremes of spring 2, (v_c − v_d), and that of the damper, (v_a − v_b), are necessarily identical, so we have

(v_c − v_d) + (v_a − v_b) = 0.    (1.69)
Ideal spring—The ideal spring is a massless and lossless (due to vibration) element. The force F developed by the spring depends on its displacement x, and the law is of the type F = g(x). Considering g(x) linear, we arrive at Hooke’s formula, whereby the force is proportional to the spring’s elastic constant K times the displacement

F = −Kx ⇔ x = −(1/K)F.    (1.70)
To obtain the constitutive relation of the spring, it is necessary to make explicit the link between the force F, here considered as the through quantity, and the velocity v, here considered the at-the-ends quantity. Then we have

v = dx/dt = d(−F/K)/dt = −(1/K) dF/dt    (1.71)
since F is analogous to the current, and v is analogous to the voltage, it follows that the mechanical spring is analogous to an electrical inductance, and we can write L = −1/K. If we have two springs connected in series, we therefore have

L_s = L1 + L2    (1.72)
so the equivalent elastic constant is

K_s = K1 K2/(K1 + K2).    (1.73)
Fig. 1.16 Analogy of impedance. Equivalent electrical circuit (series resonant circuit) of the mechanical oscillator of Fig. 1.5
Ideal damper—The ideal damper is a mechanical element analogous to the electrical resistance. The velocity v is proportional to the damping constant R:

v = RF.    (1.74)
Ideal mass—The ideal mass consists of a rigid body that can move without friction. Therefore the law of inertia applies

F = m dv/dt.    (1.75)
This relationship is analogous to i(t) = C dv(t)/dt, so for the ideal mass we have the analogy with the capacitance of the electric circuit. It is important to note that in this case the velocity to be considered is the absolute velocity of the element; in other words, one of the two extremes of the component is the fixed inertial reference. Considering the structure of Fig. 1.15a, for the purpose of characterization with an electrical circuit, we need to consider only the two absolute velocities of the rigid bodies, v1 and v2, with respect to the inertial reference (the circuit ground). These velocities are considered as the voltages of nodes 1 and 2 of the circuit. Spring 1 (denoted by inductor L1) is connected to rigid body m1 (node 1) and to the constraint (considered as zero velocity, or the inertial reference). Spring L2 and the damper (denoted by resistor R) are connected to the two masses. With simple visual inspection we can derive the equivalent electrical circuit of the mechanical system shown in Fig. 1.15b.
1.4.3 Analogy of Impedance

In the analogy of impedance, the force f(t), analogous to the voltage, is the potential quantity, and the velocity v(t) is the flux quantity. Thus, with reasoning similar to the previous case, the expression (1.64) can be seen as the contribution of three terms v_m(t), v_R(t), and v_K(t). However, in this case we have the sum of three potentials, so the equation is interpreted as the KVL of a mesh with three circuit elements in series, as shown in Fig. 1.16. Therefore, the constitutive relations of the mechanical components, in analogy with Eq. (1.60), expressed in terms of force and velocity, can be defined as follows
f(t) = −K∫v(t)dt,  Spring C = K⁻¹    (1.76)
f(t) = Rv(t),  Damper Re = R    (1.77)
f(t) = m dv(t)/dt.  Mass L = m    (1.78)
Considering a more complex mechanical device, other impedances should be added to the series of elements in Fig. 1.16. So, let Z_ms(s) = F(s)/V(s) be the global impedance of the mechanical system; as V(s) = sX(s), for a given force f(t) the force–displacement relationship in the time domain can be written as

f(t) = L⁻¹{sZ_ms(s)} ∗ x(t)    (1.79)

where the symbol “∗” indicates the convolution operator.
1.5 Nonlinear Oscillating Systems

Interesting theoretical tools for the study of complex acoustic systems can be determined by introducing nonlinear dynamical systems [1–23]. In fact, in many acoustic musical instruments, acoustically interesting timbres arise in the presence of nonlinearity in the excitation organs: for example, in the interaction of the bow with the violin string, in the hammer striking the piano string, in reed instruments such as the saxophone and clarinet, and in many other cases. More generally, nonlinear acoustics deals with sound waves of sufficiently large amplitude that the friction, elastic coefficients, or fluid parameters characterizing the propagation of the acoustic wave cannot be approximated with linear relationships. Thus, large amplitudes require using the full systems of governing equations of fluid dynamics (for sound waves in liquids and gases) and elasticity (for sound waves in solids). These equations are generally nonlinear, and their traditional linearization is no longer possible. However, the treatment of nonlinear systems is very complex and goes beyond the scope of this text; in this section, only simple cases are examined by way of example.
1.5.1 General Notation and State Space Representation

In general terms, a nonlinear dynamical system is a group of interacting or interrelated elements acting according to a set of rules, i.e., “as a system,” such that the superposition principle is no longer valid, and that can be described by an nth-order nonlinear ordinary differential equation (ODE). Let u(t), x(t) be, respectively, the external forcing and the output signal; a general form of a nonlinear dynamical system
can be written as

g(t, x(t), ẋ(t), ẍ(t), ...) = f(t, u(t), u̇(t), ü(t), ...)    (1.80)
where f(·) and g(·) represent the system structure. For the study of Eq. (1.80), we proceed to the reduction of the nth-order ODE to a system of n 1st-order differential equations. This technique is based on the introduction of new quantities called state variables (w(t), ẇ(t), etc.) that greatly simplify some types of problems, avoiding the introduction of complex forms of ODE solution.
Example 1.3 For example, consider a system (without external forcing, u(t) = 0) described by the following 3rd-order ODE

x⃛ = f(t, x, ẋ, ẍ)

With a simple change of variables, placing w1 = x, w2 = ẋ, and w3 = ẍ, you can replace the above 3rd-order ODE with a system of 1st-order ODEs. That is

ẇ1 = w2
ẇ2 = w3
ẇ3 = f(t, w1, w2, w3).    (1.81)
It is easy to see that the 3rd-order ODE and the system of 1st-order ODEs are equivalent, in the sense that if x(t) is a solution of the 3rd-order ODE, then w1(t) = x(t), w2(t) = ẋ(t), and w3(t) = ẍ(t) are the solutions of the system of ODEs.
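The order-reduction technique of Example 1.3 can be sketched directly in code. Taking the trivial case x⃛ = 0 (an assumption chosen only because its exact solution is a polynomial), the state space system (1.81) is integrated by forward Euler and recovers the analytic solution x(t) = t + t² for the initial state (0, 1, 2).

```python
def f(t, w1, w2, w3):
    """Right-hand side of the 3rd-order ODE; here the trivial case x''' = 0."""
    return 0.0

# Initial state: x(0) = 0, x'(0) = 1, x''(0) = 2  ->  exact x(t) = t + t^2
w1, w2, w3 = 0.0, 1.0, 2.0
n = 10000
dt = 1.0 / n
for k in range(n):
    t = k * dt
    dw1, dw2, dw3 = w2, w3, f(t, w1, w2, w3)   # the system (1.81)
    w1, w2, w3 = w1 + dt * dw1, w2 + dt * dw2, w3 + dt * dw3
# w1 now approximates x(1) = 2, w2 approximates x'(1) = 3
```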
1.5.1.1 Nonlinear State Space Model
In the previous example, the new variables w1, w2, and w3, replaced in the 3rd-order ODE, are denoted as state variables. Moreover, the resulting 1st-order ODE system is denoted as a state space model that, generalizing Eq. (1.81), can be written as

ẇ1 = f1(t, w1, ..., wn, u1, ..., u_p)
ẇ2 = f2(t, w1, ..., wn, u1, ..., u_p)
⋮
ẇn = fn(t, w1, ..., wn, u1, ..., u_p)    (1.82)
where w1(t), ..., wn(t) are the state variables that, together with their time derivatives ẇ1, ..., ẇn, represent the memory of the dynamic system; if present, u1(t), ..., u_p(t) are specified input variables. Thus, in vector form we can write
ẇ = f(t, w, u)    (1.83)

usually denoted the state equation, where we refer to w as the state and u as the input. Sometimes another equation

y = h(t, w, u)    (1.84)

denoted the output equation, is associated with (1.83), thereby defining a q-dimensional output vector. Moreover, if the system is time-invariant, the variable t can be omitted and Eq. (1.83) can be written as

ẇ = f(w, u)
y = h(w, u)    (1.85)

Sometimes these kinds of systems of ODEs are defined as autonomous systems.
1.5.1.2 Typical Nonlinear Phenomena
The dynamics of nonlinear systems are much richer than those of linear systems. This gives them great appeal in acoustics, because the dynamics of acoustic systems are often nonlinear in nature, and, more constructively, in the design of acoustic, electronic, and virtual musical instruments, where nonlinear phenomena often result in perceptually interesting sounds. The following are some of the most typical behaviors of nonlinear systems [22].
• Multiple isolated equilibrium points—A linear system can have only one equilibrium point. A nonlinear system can have more than one isolated equilibrium point. The state may converge to one of several steady-state operating points, depending on the initial state of the system.
• Limit cycles—A nonlinear system can produce stable oscillations of fixed amplitude and frequency, irrespective of the initial state. This type of oscillation is known as a limit cycle.
• Multiple modes of behavior: subharmonic, harmonic, or almost-periodic oscillations—A nonlinear system under periodic excitation can oscillate with frequencies that are submultiples or multiples of the input frequency. Moreover, it may even generate an almost-periodic oscillation. In addition, it may even exhibit a discontinuous jump in the mode of behavior as the amplitude or frequency of the excitation is smoothly changed.
• Chaos—Despite the deterministic nature of the system, a nonlinear system can have a more complicated steady-state behavior that is not an equilibrium, a periodic oscillation, or an almost-periodic oscillation. Such behavior is usually referred to as chaos. In practice, in chaotic systems, self-similar or pseudoperiodic oscillations are generated in which each period has a pattern similar, but not identical, to the previous one. In this case, drawing the progress of the state variables (e.g., w1(t), w2(t), etc.),
we see that the trajectory is different in each period. Furthermore, to better characterize the behavior of nonlinear systems, it is interesting to draw the so-called phase portrait.
Definition 1.7 Phase portrait—Given a dynamical system, the phase portrait (a term borrowed from Poincaré’s theory [9]) is a geometric representation of the orbits of the system’s state variables in the space spanned by them, called the state space. Thus, the phase portrait is a representation of the solutions of a system of differential equations by trajectories in the space of the system’s state variables, called phases (e.g., w1(t) vs. w2(t) vs. w3(t), etc.).
For example, in the linear case, for sinusoidal inputs, the phase portrait would be given by the composition of sinusoidal signals, since we would simply have circles or ellipses or, in cases of more harmonic frequencies, the so-called Lissajous figures (see, for example, [2]). Instead, in nonlinear cases the trajectories would be similar but different for each period, generating a chaotic drawing, i.e., strange figures, as if they rotated around a fixed point which, for this reason, is sometimes called a strange attractor or attractor, toward which the trajectory of a complex system tends in time, whatever the initial conditions. It is said, therefore, that the system is attracted to this set of points.
Phase plane analysis is a commonly used technique for determining the qualitative behavior of the solutions of systems of ODEs in low-dimensional spaces, using the normal form notation.
1.5.1.3 Linear State Space Model
Finally, observe that in the case of a linear and time-invariant system, the state space representation can be written as

ẇ = Aw + Bu
y = Cw + Du    (1.86)
where A ∈ R^{n×n}, B ∈ R^{n×p}, C ∈ R^{q×n}, and D ∈ R^{q×p} are the matrices that characterize the linear system.
Example 1.4 As a simple example, we derive the state space model of the mass–spring–damper oscillating system with external forcing. To derive the model we rewrite Eq. (1.31) as

m ẍ = f(t) − Rẋ − Kx

For the solution we place w1(t) = x(t) and w2(t) = ẇ1(t), for which we can write the following equations
Fig. 1.17 Generic feedback system which, under certain conditions, can be considered a self-sustaining nonlinear oscillator, i.e., the system input also contains the output variable, or its derivative, resulting in a possibly unstable feedback system
ẇ1(t) = w2(t)
ẇ2(t) = (1/m)f(t) − (R/m)w2(t) − (K/m)w1(t)

and x(t) = w1(t); the previous expressions can be reduced to the general form (1.86) as

[ẇ1(t); ẇ2(t)] = [0  1; −K/m  −R/m] [w1(t); w2(t)] + [0; 1/m] f(t)
x(t) = [1  0] [w1(t); w2(t)] + [0] f(t)

where the reader can easily recognize the matrices A, B, C, and D, which in this case is null.
1.5.2 2nd-Order Nonlinear Oscillating Systems

Considering Eq. (1.31), which describes a linear 2nd-order oscillating system, if one of the coefficients m, R, and K is no longer constant but depends nonlinearly on the displacement x or on the velocity ẋ, it is easy to see that the superposition principle is no longer valid and the overall system is nonlinear. For example, while the mass can usually be considered constant, very often in real cases, due to large displacements, the spring coefficient is not constant but depends nonlinearly on the displacement. In this case we can use a polynomial development of the type K(x) = K0 + K1 x + K2 x² + ⋯. In other cases, due to the friction viscosity phenomenon, the damping coefficient can assume values that depend nonlinearly on the velocity; then we can write R(v) = R0 + R1 v + R2 v² + ⋯.
In other situations, very common in complex acoustics, as shown in Fig. 1.17, there is a nonlinear interaction, which can be thought of as feedback, between the external forcing and the vibrating device. The excitation itself may be nonlinearly dependent on displacement or velocity as, for example, in the interaction
of the violin string with the bow, in the piano hammer and string, or in the clarinet reed and airflow. In these cases, Eq. (1.31) can be rewritten in the more general form

m ẍ(t) + R ẋ(t) + K x(t) + g(x, ẋ, ..., t) = f(x, ẋ, ..., t).    (1.87)
Remark 1.12 Observe that physical systems described by Eq. (1.87), due to the presence of feedback and nonlinearity, may under some conditions exhibit chaotic or hysteretic behavior. Below are some simple examples of nonlinear systems that show, under certain conditions, a chaotic trend, i.e., the presence of an attractor or strange attractor in the phase space.
1.5.3 Undamped Pendulum As the first example of a nonlinear oscillator, let’s consider the simple undamped nonlinear planar pendulum. Let θ be the angle of the pendulum respect to the vertical axes of rest position (see Fig. 1.12d); using Lagrangian mechanics the dynamics of a pendulum under the influence of gravity, the motion of a pendulum can be described by the dimensionless nonlinear equation θ¨ = sin(θ ). Thus, considering Eq. (1.87), for x ≡ θ we have that R = 0, K = 0 and f = sin(x). In this case Eq. (1.87) is reduced to x¨ = sin(x)
(1.88)
so the above expression is easily traceable to the general nonlinear system in Fig. 1.17, where f(·) = 0 and the feedback signal is equal to −sin(x).
Reducing the ordinary 2nd-order ODE to normal form, we convert (1.88) into the state space representation by introducing the state variables defined as w1 = x and w2 = ẋ. This gives the following system of 1st-order equations ([22], Eqs. 1.7 and 1.8)

ẇ1 = w2
ẇ2 = −sin(w1).    (1.89)
Physically, the state variable w1 represents the position of the pendulum (i.e., the angle x ≡ θ in Fig. 1.12d) and w2 represents its angular velocity v ≡ ω = ∂θ/∂t. In Fig. 1.18 the trajectories of w1(t) and w2(t) are given, starting from different initial conditions. From the figure it can be observed that, starting from the small i.c. [0 0.5] (i.e., x(0) = 0 and v(0) = 0.5), the pendulum trajectories are (almost) of sinusoidal type. For i.c. [0 1.95], the trajectories have the trend of a distorted sinusoid. Instead, for i.c. [0 2.0], the trajectories assume a “strange” non-periodic trend. The locus in the w1–w2 plane, that is, the solution w(t) for all t ≥ 0, represents a trajectory or orbit of (1.89), and the plane is denoted as the state plane or phase plane. In addition, we can consider a vector field according to the following definitions.
Fig. 1.18 Dynamics of the simple pendulum. Time domain evolution of the position x(t) and the velocity ẋ(t), starting from different i.c.
Definition 1.8 Vector fields—A vector field in the state space plane is a map which assigns to each point w in the plane a vector indicated as f(w). In general, these curves can be written in vector form as

ẇ = f(w)

where f(w), for a 2nd-order system, is the vector [f1(w) f2(w)], which we can consider as a vector field on the state plane: to each point w in the plane, we assign a vector f(w). Thus, repeating this at every point in a grid covering the plane, we obtain the vector field diagram. For example, Fig. 1.19 represents the vector field diagram for the pendulum Eq. (1.89) for a grid w1, w2 ∈ [−π, π] with a fixed step equal to 0.5. Note that, overlapped on the vector field, the state trajectories (at steady state) are reported for three specific i.c.
Fig. 1.20 reports the phase portrait of the pendulum (1.88). From the figure we can observe the presence of many attraction points, or attractors, spaced 2π radians apart on the w1-axis, i.e., at w = [2πk 0], for k = ..., −2, −1, 0, 1, 2, etc. Starting from i.c. not far from these attractors, the trajectories are well defined; therefore, the trends of w1(t) and w2(t), as already illustrated in Fig. 1.18, are almost periodic. Instead, the i.c. w = [0 2], which corresponds to the curve at the top of Fig. 1.20, does not present any point of attraction, for which, as shown in the right part of Fig. 1.18, the trends of x(t) and v(t) tend to take non-periodic and sometimes divergent forms.
A phase space of a dynamical system is a space in which all possible states of a system are represented, with each possible state of the system, in terms of position and velocity (w1, w2), corresponding to one unique point in the phase space.
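The pendulum regimes discussed above can be reproduced with a few lines of code: a small initial velocity gives a bounded, almost sinusoidal libration, while an initial velocity above the separatrix value w2 = 2 (here 2.5, chosen for clarity) makes w1 grow without bound (rotation). RK4 integration of the system (1.89):

```python
import math

def simulate(w1, w2, T=50.0, dt=0.01):
    """RK4 integration of the pendulum system (1.89)."""
    f = lambda s: (s[1], -math.sin(s[0]))      # state derivative
    t = 0.0
    while t < T:
        k1 = f((w1, w2))
        k2 = f((w1 + dt / 2 * k1[0], w2 + dt / 2 * k1[1]))
        k3 = f((w1 + dt / 2 * k2[0], w2 + dt / 2 * k2[1]))
        k4 = f((w1 + dt * k3[0], w2 + dt * k3[1]))
        w1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        w2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return w1, w2

libration = simulate(0.0, 0.5)   # small i.c.: bounded periodic swing
rotation = simulate(0.0, 2.5)    # i.c. above the separatrix: w1 keeps growing
```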
Fig. 1.19 Vector field of the pendulum Eq. (1.89) for a grid w1, w2 ∈ [−π : 0.5 : π]. Moreover, the state trajectories are reported for three specific i.c., whose values are given in the figure legend, identical to those of the time series in Fig. 1.18
Fig. 1.20 Dynamics of the simple pendulum. Phase portrait for several i.c.: from x(0) = −2π and v(0) ∈ [−π, π]; from x(0) = 0 and v(0) ∈ [−π, π], etc. Note that the w1-axis (that is, the θ angle of the pendulum) wraps onto itself after every 2π radians
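As a numerical illustration of these phase-plane concepts, the following sketch (illustrative, not from the text: it assumes the normalized pendulum ẇ1 = w2, ẇ2 = −sin w1, i.e., g/l = 1, and the function name is our own) integrates the state equations with a classical 4th-order Runge–Kutta scheme. For a small i.c. such as w = [0 0.1], the motion is almost sinusoidal with period close to 2π, as Fig. 1.18 suggests.

```python
import math

def pendulum_rk4(w, t_end, dt=1e-3):
    """Integrate the normalized pendulum state equations (g/l = 1):
    w1' = w2,  w2' = -sin(w1), with a classical RK4 scheme."""
    def f(w):
        return (w[1], -math.sin(w[0]))
    w = list(w)
    for _ in range(int(round(t_end / dt))):
        k1 = f(w)
        k2 = f([w[i] + 0.5*dt*k1[i] for i in range(2)])
        k3 = f([w[i] + 0.5*dt*k2[i] for i in range(2)])
        k4 = f([w[i] + dt*k3[i] for i in range(2)])
        w = [w[i] + dt/6.0*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(2)]
    return w

# Small i.c. -> near-sinusoidal motion: after one nominal period 2*pi
# the state returns very close to the starting point [0, 0.1].
w_small = pendulum_rk4([0.0, 0.1], 2*math.pi)
```

For larger initial velocities the period lengthens and the trajectory distorts, consistently with the i.c. [0 1.95] and [0 2.0] cases in Fig. 1.18.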
Fig. 1.21 Simplified mechanical scheme of a relaxation mass–spring–damper oscillator driven by an external force: a Van der Pol oscillator; b Rayleigh oscillator
1.5.4 Van der Pol and Rayleigh Oscillators

A simple nonlinear oscillator can be characterized by a mechanical resistance coefficient that is a nonlinear function of displacement and velocity: R → R(x, ẋ). In particular, as we shall see, in the Van der Pol oscillator the damping coefficient depends on the square of the displacement, R ∝ μ(1 − x²), while in the Rayleigh oscillator it depends on the square of the velocity, R ∝ μ(1 − ẋ²(t)).

Remark 1.13 The Rayleigh [7] and Van der Pol [8] oscillators, shown schematically in Fig. 1.21, are both self-sustaining oscillators.
1.5.4.1 Van der Pol Oscillator
The Van der Pol oscillator is the first relaxation oscillator10 [8]. The equation of motion, easily derived from the mechanical system in Fig. 1.21a, is given by ([18], Eq. 1a)

m ü + ch [u²/(a² + u²)] u̇ + K u = c u̇   (1.90)

where m and K are the mass and stiffness respectively, ch is the linear viscous damping coefficient, and a is the length of the damper. Provided that the displacement of the system is such that |u/a| < 0.2, the above can be written as m ü − (c − ch u²) u̇ + K u = 0. Writing the undamped natural frequency as ω0 = √(K/m), the damping ratio as ζ = c/(2√(Km)), and with the substitution x = u/√(c/ch), the above equation can be written in the canonical Van der Pol 2nd-order ODE

ẍ − μ(1 − x²) ẋ + x = 0   (1.91)

10 A relaxation oscillator is defined as a nonlinear device, e.g., an electronic circuit, that can generate a non-sinusoidal repetitive output signal.
where μ = 2ζ = c/√(Km). In other words, the damping coefficient is R = μ(1 − x²), while the frictional force is F = μ(1 − x²)ẋ.

Property 1.4 The above equation represents a self-sustaining oscillator that vibrates in a limit cycle and has the property that it generates energy in the part of the cycle where the displacement is small and dissipates energy in the part of the cycle where the displacement is large.

The above can be written in the state space representation given by the following system of equations

ẇ1 = w2
ẇ2 = −w1 + μ(1 − w1²) w2

where w1 and w2 are the system state variables, which are functions of the time t, and μ is a scalar parameter indicating the nonlinearity and the strength of the damping. Starting with different i.c., the phase portraits tend to the same limit-cycle trajectory. Fig. 1.22 reports the time series and the phase portrait for different values of the parameter μ.
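The convergence of different i.c. onto the same limit cycle can be checked numerically. Below is a minimal sketch (illustrative; the function name and parameter values are our own) that integrates the state equations above with RK4; for moderate μ the steady limit-cycle amplitude of w1 is known to be close to 2.

```python
def vdp_trajectory(w1, w2, mu, t_end, dt=0.01):
    """Integrate the Van der Pol state equations
    w1' = w2,  w2' = -w1 + mu*(1 - w1**2)*w2
    with RK4 and return the list of w1 samples."""
    def f(w):
        return (w[1], -w[0] + mu*(1.0 - w[0]**2)*w[1])
    w = [w1, w2]
    out = []
    for _ in range(int(round(t_end/dt))):
        k1 = f(w)
        k2 = f([w[i] + 0.5*dt*k1[i] for i in range(2)])
        k3 = f([w[i] + 0.5*dt*k2[i] for i in range(2)])
        k4 = f([w[i] + dt*k3[i] for i in range(2)])
        w = [w[i] + dt/6.0*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(2)]
        out.append(w[0])
    return out

# Two very different i.c. converge to the same limit cycle (amplitude ~ 2):
amp_small = max(abs(x) for x in vdp_trajectory(0.1, 0.0, 0.5, 80.0)[-2000:])
amp_large = max(abs(x) for x in vdp_trajectory(4.0, 0.0, 0.5, 80.0)[-2000:])
```

The amplitude is measured over the last 20 time units, i.e., after the transient has died out.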
1.5.4.2 Rayleigh Oscillator
The equation of the Rayleigh oscillator has the form ([7], p. 81, Eq. 2)

m ü − (c − c3 u̇²) u̇ + K u = 0   (1.92)

where the damping terms have been collected on the left-hand side of the equation as in the previous case. Now, with the substitution x = u/√(c/c3), and using the previously defined non-dimensional variables, we obtain the canonical Rayleigh equation

ẍ − μ(1 − ẋ²/3) ẋ + x = 0.   (1.93)

The reader can easily see that by differentiating Eq. (1.93) with respect to time and applying a simple change of variable, the result is formally identical to the Van der Pol relation given in Eq. (1.91).

Remark 1.14 Note that, as stated in the introduction of this section, the Van der Pol and Rayleigh oscillators differ in the interpretation of the damping force, which for Van der Pol is Fv = μ(1 − x²)ẋ while for Rayleigh it is Fr = μ(1 − ẋ²/3)ẋ. Again, as mentioned above, the displacement and velocity of the Van der Pol system are the same as the velocity and acceleration of the Rayleigh system. In the Rayleigh oscillator, the mechanical resistance depends on the square of the velocity. The resultant force follows a cubic relation Fr = μ(v − v³/3), as shown in Fig. 1.23. Thus, in the negatively sloped section, the curve can be interpreted as a negative mechanical resistance.
Fig. 1.22 Dynamics of the Van der Pol oscillator for i.c. x(0) = 1 and ẋ(0) = 0 and for different values of the nonlinearity parameter μ = [0.5 5 10]. Increasing the value of μ increases the oscillation frequency and accentuates the saturation of the waveform
The Rayleigh oscillator is very similar to the Van der Pol oscillator, except for the following key differences.
1. When the parameter μ increases, the Van der Pol oscillator increases its oscillation frequency, while the Rayleigh oscillator increases in amplitude (slope of the F–v curve);
2. In Rayleigh's equation the damping force is a function of the velocity cube only, which makes it easier to study, while in Van der Pol's equation it is a function of both displacement and velocity.
Fig. 1.23 Rayleigh damping force versus velocity, for different values of μ. For |v| < 1 the curve is interpreted as a negative mechanical resistance. The dashed lines indicate that energy is being dissipated by the damper while the solid lines indicate that energy is being supplied by the damper (modified from [18])
Fig. 1.24 The Duffing oscillator, for K > 0, can be interpreted as a forced oscillator with a nonlinear spring whose restoring force is written as Fx = −Kx − bx³
1.5.5 Duffing Nonlinear Oscillator

In the Duffing nonlinear oscillator we have K(x) = K + bx² and f(t) = γ cos(ωt). Thus the Duffing nonlinear oscillator, which under certain conditions exhibits chaotic behavior, is described by the following relationship, denoted as the Duffing equation ([19], pp. 215, 293)

m ẍ + R ẋ + K x + bx³ = γ cos(ωt)   (1.94)
where m is the mass, R the damping constant, and K(x) the spring parameter defined by the constants K (the usual spring constant) and b, such that for b > 0 the spring is called a hardening spring, while for b < 0 it is called a softening spring, although this interpretation is valid only for small x. For K > 0 (see Fig. 1.24), the Duffing oscillator can be interpreted as a forced oscillator with a spring whose restoring force depends on the displacement cube according to the following law
1.5 Nonlinear Oscillating Systems
39
Fx = −Kx − bx³   (1.95)
where, for l ≫ l0, K ≅ 2Km and b = Km l0/l³. If the amplitude of the oscillation is small, the term x³/l³ is small and can be neglected. In addition, for K < 0, the Duffing oscillator describes dynamics in which chaotic motion can be observed. Fig. 1.25 reports an example of the dynamic behavior of the Duffing oscillator for K = −1, R = 0.1, b = 0.25, ω = 2 and γ = [1.3 1.322 1.5], for [0, 1] i.c. Observe that by increasing the forcing amplitude γ the oscillator assumes a chaotic trend with a complex phase portrait. For small R (i.e., R ≈ 0), if the amplitude of the oscillation is increased, the Duffing equation of motion can be written as

m ẍ = −Kx − bx³ − F0 cos(ωt)   (1.96)
This is the case of the simple mass–spring mechanical oscillator reported in Fig. 1.26. Fig. 1.27 reports an example of the dynamic behavior of the Duffing oscillator for K = 1, R = 0, b = 0.25, ω = 2, and γ = [1.3 1.322 1.5], for [0, 1] i.c. So, also in this case, by increasing the forcing amplitude the behavior of the oscillator takes on a chaotic pattern with a complex phase portrait. However, for its analysis, as a first approximation of x, it is possible to consider a solution of the type x1 = A cos(ωt) and replace it in the right part of Eq. (1.96), obtaining a second approximation of the displacement x, denoted as x2. So we can write

m ẍ2 = −K A cos(ωt) − b(A cos(ωt))³ − F0 cos(ωt)   (1.97)
then, using the identity cos³x = (3/4) cos x + (1/4) cos 3x, we get

ẍ2 = −(K A/m + 3bA³/(4m) + F0/m) cos(ωt) − (bA³/(4m)) cos(3ωt)   (1.98)

(after integrating twice, the third-harmonic contribution to the displacement x2 has amplitude bA³/(36mω²)). This approximation, known as the Duffing method, is reliable if the terms b, A, and F0 are sufficiently small. It should be noted that the term bx³ is responsible for generating the third harmonic of pulsation 3ω. Moreover, considering Eq. (1.98) and equating the coefficient of cos(ωt) to that of the first approximation x1 = A cos(ωt) (for which ẍ1 = −Aω² cos(ωt)), we obtain

Aω² = K A/m + 3bA³/(4m) + F0/m   (1.99)

i.e.,

A² = (4m/3b)(ω² − K/m − F0/(mA)).   (1.100)
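Equation (1.99) is an implicit relation for the amplitude A. A minimal numerical sketch (with illustrative parameter values; the function is our own) solves it by bisection and checks the result against the equivalent form (1.100):

```python
def duffing_amplitude(m, K, b, F0, w, a_lo=0.0, a_hi=10.0, iters=100):
    """Solve A*w**2 = K*A/m + (3*b/(4*m))*A**3 + F0/m (Eq. 1.99)
    for A by bisection, assuming a single root in [a_lo, a_hi]."""
    def g(A):
        return A*w*w - K*A/m - 0.75*b*A**3/m - F0/m
    for _ in range(iters):
        mid = 0.5*(a_lo + a_hi)
        if g(a_lo)*g(mid) <= 0.0:
            a_hi = mid
        else:
            a_lo = mid
    return 0.5*(a_lo + a_hi)

# Illustrative values: m = 1, K = 1, b = 0.25, F0 = 1, w = 2;
# the bracket [0, 1] contains the single positive root.
A = duffing_amplitude(1.0, 1.0, 0.25, 1.0, 2.0, 0.0, 1.0)
```

Since (1.100) is an algebraic rearrangement of (1.99), the computed A must satisfy both to numerical precision.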
Fig. 1.25 Dynamic behavior of the Duffing oscillator for K = −1, R = 0.1, b = 0.25, ω = 2 and γ = [1.3 1.322 1.5], for [0, 1] i.c. Time domain evolution of the position x(t) and phase portrait. Note that for a forcing amplitude γ > 1.3 the behavior of the Duffing oscillator takes on a chaotic pattern with a complex phase portrait

Fig. 1.26 Single mode model for a nonlinear oscillator consisting of a mass–spring system without a supporting plane. This model can be equated to the Duffing equation for R = 0
Fig. 1.27 Dynamic behavior of the Duffing oscillator for K = 1, R = 0, b = 0.25, ω = 1 and γ = [0.2 2 5], for [0, 1] i.c. Time domain evolution of the position x(t) and phase portrait. Note that also in this case, by increasing the forcing amplitude, the behavior of the oscillator takes on a chaotic pattern with a complex phase portrait
This last expression turns out to be a good approximation of the amplitude–frequency relation shown in Fig. 1.28. When b is positive, the spring constant increases with amplitude: in this case it is called a hardening spring. In the case of negative b, the spring constant and the frequency decrease with the amplitude: in this case the system is described with the term softening spring. The responses of these systems are shown in Fig. 1.29. In acoustics there are many other nonlinear oscillators; many examples can be found in the physical models of musical instruments, and their motion equations have solutions that can be very complicated.
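The hardening behavior can be checked numerically: for b > 0, the free-oscillation period of m ẍ + Kx + bx³ = 0 decreases (the frequency increases) as the release amplitude grows. A minimal sketch with illustrative parameters and a simple semi-implicit integrator:

```python
def duffing_free_period(A0, m=1.0, K=1.0, b=0.25, dt=1e-4):
    """Release the undamped, unforced Duffing oscillator from x(0) = A0,
    x'(0) = 0 and estimate the period as 4x the time of the first
    zero crossing of x(t) (a quarter period, by symmetry)."""
    x, v, t = A0, 0.0, 0.0
    while x > 0.0:
        # semi-implicit (symplectic) Euler step
        v += dt * (-(K*x + b*x**3) / m)
        x += dt * v
        t += dt
    return 4.0 * t

T_small = duffing_free_period(0.5)   # small release amplitude
T_large = duffing_free_period(2.0)   # large release amplitude
```

For small amplitude the period is close to the linear value 2π; at amplitude 2 it is noticeably shorter, as Fig. 1.28 indicates for a hardening spring.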
Fig. 1.28 Amplitude–frequency relation for the system described by Eq. (1.96). The dotted line represents free oscillation (i.e., the natural angular frequency) that varies according to the amplitude of the oscillation itself
Fig. 1.29 Amplitude–frequency relation for the system described by Eq. (1.96). a Softening spring system; b hardening spring system; c system with hysteresis. The dotted line represents free oscillation [1, 2]
1.5.6 Musical Instrument as a Self-sustained Oscillator

A simplified model of an acoustical musical instrument can be formalized as a 2nd-order nonlinear oscillator according to N. H. Fletcher's model [3], shown in Fig. 1.30. In the scheme, the player is responsible for supplying energy to the system: for example, by blowing into a wind instrument such as a saxophone or trumpet, rubbing a sticky bow on a string as in a bowed-string instrument, striking a membrane as in drums, or operating a key that in turn activates a hammer that strikes a string as in the piano, and so on. The scheme is quite general and also applies to instruments such as the pipe organ, in which the player operates the key that opens a valve to let air into the pipe, although, in this case, it is the pump that provides power rather than the player directly.
1.5.6.1 Rayleigh's Mass–Spring Self-sustained Oscillator
To better investigate the nonlinear mechanisms underlying many complex acoustical models, such as most of the musical instruments represented by Fig. 1.30, leaving aside the acoustic analysis of the whole instrument, which will be covered later in the text, in this section we want to introduce and analyze the "heart of the sound generator," which in Fig. 1.30 is a self-sustained oscillator. For example, to analyze the interaction
Fig. 1.30 Generic musical instrument as a self-sustained oscillator (modified from [3])
Fig. 1.31 Dissipative self-sustained oscillator. If the moving belt is in contact with the mass, then the friction between them will cause a displacement of the mass and will initiate an oscillation at the resonant frequency ω0 = √(K/m). a Principle diagram to illustrate the stick-slip cycle typical of the excitation mechanism; b qualitative trend of mass position and velocity in the oscillator (modified from [6])
between the bow and the string in a violin, a simple mathematical model of a bowed-string instrument was proposed in 1877 by Rayleigh [7, 20]. Lord Rayleigh compared the behavior of a bow driving a string to a simple dissipative nonlinear mass–spring oscillator, i.e., a self-sustained oscillator [1], with the mass excited by a moving frictional belt, as illustrated in Fig. 1.31a. The belt motion provides energy to the system and, for simplicity, assuming the belt velocity V constant, the energy required to sustain the motion comes from a time-independent energy source. The oscillations are self-sustaining, produced by a local feedback mechanism due to the belt motion, which depends on the presence of friction in static and dynamic states.

Remark 1.15 Note that a large number of real systems (mechanical, acoustic, electronic) show a mechanism traceable to this model.

With reference to Fig. 1.31b, at the beginning of the movement, due to the effect of static friction, the mass m adheres to the belt and moves jointly with it at a speed V for a certain time, called the sticking time, covering a certain space. The
adherence to the belt ceases when the spring force exceeds the static friction force; the state changes and the friction force becomes dynamic. At this point, due to the linear spring force K, the mass moves abruptly in the opposite direction to the belt speed, toward the left in the figure, with a speed determined by:
1. the time constants of the mechanical mass–spring oscillator;
2. the dynamic friction, which is less than the static friction;
3. a local feedback mechanism that depends on the velocity of the mass with respect to the belt, and that can be interpreted as an additive negative resistance term.
After a certain slip time (flyback time), when the spring force ceases, the mass velocity is zero, friction is static again, and the cycle repeats at the oscillator resonant frequency ω0.
1.5.6.2 Motion Equation
The system analysis can be performed with reference to Fig. 1.31b, referring to the simple one-dimensional mass–spring oscillating system introduced in Sect. 1.2. Let x(t) be the mass position, Kx(t) the linear restoring force, v(t) = ẋ(t) its velocity, V the belt velocity, and v′(t) = V − v(t) the relative velocity between the mass and the belt; we assume that friction of air, or some other damping mechanism, produces a damping force proportional to the velocity of the mass, Rẋ(t). In addition, there is the belt which, due to the friction between the belt and the mass, provides constant energy to the system, i.e., a forcing term F, which depends on the relative velocity between the mass and the belt: F → F[v′(t)]. Therefore, we can write the following differential equation:

m ẍ(t) + R ẋ(t) + K x(t) = F[v′(t)],   v′(t) = V − ẋ(t)   (1.101)
the solution of the previous equation is quite complex because the forcing term is a function of the relative velocity v′(t) (local feedback mechanism). To get a solution, Morse in [1] suggests the series expansion of the force ([1], pag. 842)

F[V − ẋ(t)] = F(V) − F′(V) ẋ(t) + F″(V) ẋ²(t)/2 − ⋯ .   (1.102)
In the dynamical case, the friction force decreases with increasing relative velocity (i.e., with ẋ(t)), such that F′(V) < 0. As a first approximation, to determine the dynamic behavior of the system, we consider the development up to 1st order. If we set F′(V) = −R′ and neglect the higher order terms in v, we have F[v′(t)] ≈ F(V) + R′v(t), and Eq. (1.101) simplifies as

m ẍ(t) + R1 ẋ(t) + K x(t) = F(V)   (1.103)
where R1 = R − R′. In other words, the effect of the moving belt, for small velocities v, can be interpreted as a negative resistance R′, and if R1 < 0 the total resistance in the oscillator becomes negative and the oscillations grow indefinitely in time. For brevity, we can indicate qualitatively, also by intuition, that for R1 > 0 the mass moves in the same direction as the belt, i.e., with reference to Fig. 1.31b, the mass moves to the right with constant velocity V; while if R1 < 0 the mass moves very fast to the left, in agreement with the time constant of the mass–spring oscillator in Eq. (1.103). As shown in Fig. 1.31b, also in this phase the speed is approximated as constant.

Remark 1.16 Note that for large oscillations the motion can be divided into two distinct phases, indicated as sticking (R1 > 0) and sliding (R1 < 0), producing the sawtooth displacement trend and velocity illustrated in Fig. 1.31b. This mechanism describes in principle the oscillations of a bowed violin string, the squeaking of a door, or the brake of a car.

Considering the development up to the third order we have:

F(V − ẋ(t)) = −R1 ẋ(t) + C ẋ³(t) = −R1 v(t)[1 − (v(t)/μ)²]

where μ is the velocity at which the magnitudes of the linear and nonlinear terms are the same. It follows that the expression takes the form of the nonlinear Rayleigh oscillator equation ([20], Eq. 2.1)

m ẍ(t) + [R1 − C ẋ²(t)] ẋ(t) + K x(t) = 0   (1.104)
which, except for trivial differences, is formally identical to the expression (1.93). However, if this force is small compared with the linear restoring force, i.e., F(V − ẋ(t)) < K x(t), it is possible to define a solution of the type

x(t) = (1/ω0) A(t) cos(ω0 t + φ)

in which the amplitude A(t) is ([6], Eq. 14.1.18)

A(t) = v0 e^(t/τ) [1 + (3/4)(v0/μ)² (e^(2t/τ) − 1)]^(−1/2)

where τ = 2m/R1 and v0 = v(t)|t=0.
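A quick numerical look at this amplitude law (with illustrative values v0 = 0.01, μ = 1, and a growth time constant τ = 1, taken positive for the self-sustaining case R1 < 0): A(t) first grows exponentially and then saturates; from the formula itself, the limiting amplitude for t → ∞ is 2μ/√3.

```python
import math

def rayleigh_amplitude(t, v0=0.01, mu=1.0, tau=1.0):
    """Evaluate A(t) = v0*e^(t/tau) * [1 + (3/4)*(v0/mu)^2*(e^(2t/tau)-1)]^(-1/2),
    the amplitude law of the self-sustained Rayleigh oscillator."""
    g = math.exp(t / tau)
    return v0 * g / math.sqrt(1.0 + 0.75 * (v0/mu)**2 * (g*g - 1.0))

A_early = rayleigh_amplitude(1.0)    # still in the exponential-growth phase
A_late = rayleigh_amplitude(20.0)    # saturated at the limit amplitude
```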
1.5.6.3 Graphical Solution with Friction Force Diagram
The complete study of the dynamics of the self-sustained Rayleigh oscillator can only be performed by knowing the curve F[v′(t)]. However, in many real cases, as in the dynamics of bowed instruments and their numerical implementation by physical modeling, the friction force versus velocity curve can be determined experimentally. For example, the nonlinear relationship between the friction force and the relative velocity may have the shape shown in Fig. 1.32b. This curve, called the Friedlander–Keller diagram [5, 6], which although introduced for the study of bowed strings is reintroduced in Chap. 9, Sect. 9.10.1.2, characterizes the nonlinear behavior of the friction force as a function of the relative velocity F[v′(t)]. In fact, the dynamic friction coefficient is inversely proportional to velocity and thus increases with decreasing relative velocity, as shown in Fig. 1.32b. When the position of the mass reaches a value such that K x(t) ≥ F[v′(t)], the friction decreases abruptly and the mass slips steeply to the left. Thus, the evolution of the velocity waveform is as shown in Fig. 1.32d and the evolution of the displacement waveform in Fig. 1.32c. It is clear that the characteristic square stick-slip velocity waveform depends on the nonlinearity of the frictional force, and in particular on the strong nonlinearity and the sign change near the zero relative velocity condition.
Fig. 1.32 Rayleigh’s self-sustained nonlinear oscillator. a A simplified 2nd-order mass–spring oscillator scheme over moving belt with a constant velocity V . b The Friedlander–Keller diagram that characterize the nonlinear behavior of the friction force as a function of relative velocity v (t). c The evolution of the displacement waveform x(t). d The associated evolution of the velocity waveform x(t) ˙ (modified from [3])
1.5.7 Multimodal Nonlinear Oscillator

In real mechanical systems, there can be an almost infinite number of modes of vibration. For example, in the acoustic resonators of musical instruments, these modes are almost linear with nearly harmonic relationships. In these cases we have a nearly linear oscillator excited by some nonlinear mechanism (usually in the presence of some nonlinear feedback mechanism). In an oscillator with many normal modes (and normal coordinates ψ1, ψ2, ..., ψj), the spring constant Ki used in the equation describing the i-th normal mode may also depend on the other normal modes; then Ki can be expressed as

Ki = K0 + K1(ψi)   (1.105)

in other words, in general there is a coupling between the various normal modes of oscillation. The force resulting from these nonlinear interactions can be modeled with the following motion equation

mi ψ̈i + R ψ̇i + K0 ψi = f0(t) + Σj Fij(ψj).   (1.106)
Remark 1.17 Observe that acoustic musical instruments are generally modeled as nonlinear oscillators. Very often, in this case, the excitation f(t) turns out to be of the type of Eq. (1.106): for example, the force impressed by the bow on the violin string, or the air flowing through the clarinet's reed. In general, however, the nonlinearities encountered are much more complex than those of the above examples.
1.6 Continuous Vibrating Systems

Let us now analyze the case in which the mechanical oscillator is a device such as a metal bar or a taut string. In this case the elements of the system, such as the mass and the spring, are no longer distinct from each other but are merged together and distributed over the entire geometry of the system according to precise laws. The physical system can no longer be described with lumped-parameter elements (i.e., elements with null dimension and phenomena with infinite propagation speed); instead, the spatial interactions and the propagation speed of the mechanical phenomenon must be taken into account.
Fig. 1.33 Mechanical oscillator with N mass–spring elements. At equal length, for N → ∞, the system becomes continuous, defined as a distributed parameters system and characterized by an infinite number of modes
By way of example, with reference to Fig. 1.33, we can think of a distributed system as a system with many lumped elements (in the example, masses and springs) very close to each other. The system composed of N mass–spring elements is characterized by N modes. So, in the case where the number of these elements tends to infinity (while the length does not change), we can think in terms of distributed parameters, that is, mass density and mechanical tension. Moreover, it is obvious that in the modeling of the distributed phenomenon we must also consider the variation of the mechanical quantities in space as well as in time.
1.6.1 Ideal String Wave Equation

The study of vibrating strings has a very long history. Pythagoras, observing the configurations of a finite vibrating string (i.e., fixed at its extremes), noted different configurations corresponding to specific divisions of the string according to natural ratios (2:1, 3:1, 3:2, etc.); moreover, he observed that to each of these specific configurations was associated a specific frequency of vibration. Such vibration modes are precisely the natural modes of a string fixed at its extremes. In addition, a more in-depth analysis of the string motion reveals that these modes depend on the mass density, the tension, the length, and the boundary conditions of the string itself.
1.6.1.1 Ideal String Assumptions
For the development of the vibrational model, consider an ideal string as a simple mathematical model that can be defined according to the following assumptions [15]:
1. the string vibrates in one plane only;
2. it is perfectly flexible;
3. the tension is constant;
4. the amplitude of the oscillations is small compared to the length of the string;
5. the weight of the string is very small compared to the tension;
6. the string has a uniform linear density;
7. only transverse waves are present and there is no possibility of longitudinal movement.
We indicate with μ [kg/m] the linear mass density and with K [N] the tension. Moreover, for the description of the motion of the string, we consider a two-dimensional Cartesian coordinate system with the string lying along the x axis and oscillating along the y axis, so the vibration is denoted as a displacement y(x, t).
Fig. 1.34 Representation of a string of infinite length (a), and of an infinitesimal section of it (b)
1.6.1.2 Ideal String Modeling
For the determination of the model of the stretched string, we consider an infinitesimal section of length Δx, so that the mass of the section is equal to μΔx. With reference to Fig. 1.34, the vertical force Fy acting on the section is given by the difference in mechanical tension at its ends

Fy = Fy2 − Fy1 = K sin θ2 − K sin θ1   (1.107)

for very small angles we have that sin θ ≈ tan θ and, considering the slope mx of the stretch of string Δx, it follows that tan θ = ∂y/∂x = mx. Denoting by Δmx the difference in slope between the two extremes of the section, the vertical force acting on the section Δx can therefore be rewritten as KΔmx. The law that defines the model of the dynamic evolution of the vibrating string can be determined by simply considering the balance between the forces involved in the string, that are: force = mass × acceleration (Newton's law); force = spring-elastic-constant × displacement (Hooke's law). So, we can rewrite Newton's second law as

K Δmx = (μΔx) ∂²y/∂t²   ⇒   K Δmx/Δx = μ ∂²y/∂t²   (1.108)

for Δx tending to zero we have that

lim_{Δx→0} Δmx/Δx = ∂mx/∂x = ∂²y/∂x².   (1.109)

These forces, on a microscopic scale, can be rewritten as

(1) mass-density × transversal-acceleration → μ ∂²y(x, t)/∂t², or μÿ
(2) tension × curvature → K ∂²y(x, t)/∂x², or Ky″.
Thus, the equilibrium between the two forces leads to the writing of the wave equation (or D'Alembert11 equation) in a single dimension (1D wave equation)

K ∂²y(x, t)/∂x² = μ ∂²y(x, t)/∂t²   ⇔   ∂²y(x, t)/∂x² = (1/c²) ∂²y(x, t)/∂t²   (1.110)

where c = √(K/μ) is the wave propagation velocity.
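The 1D wave equation (1.110) can be integrated directly by finite differences. The sketch below (illustrative; it uses the standard leapfrog scheme with unit Courant number cΔt/Δx = 1, for which the scheme reproduces the exact traveling-wave solution) shows that an initial displacement released with zero velocity splits into two half-amplitude waves traveling in opposite directions, anticipating the d'Alembert solution discussed next.

```python
import math

def string_fdtd(y0, steps):
    """Leapfrog scheme for the 1D wave equation with Courant number
    c*dt/dx = 1 and fixed ends: y[i]^{n+1} = y[i+1]^n + y[i-1]^n - y[i]^{n-1}."""
    n = len(y0)
    y_prev = list(y0)
    # first step for zero initial velocity (Courant number 1)
    y = [0.0]*n
    for i in range(1, n-1):
        y[i] = 0.5*(y0[i+1] + y0[i-1])
    for _ in range(steps - 1):
        y_next = [0.0]*n
        for i in range(1, n-1):
            y_next[i] = y[i+1] + y[i-1] - y_prev[i]
        y_prev, y = y, y_next
    return y

# Raised-cosine "pluck" in the middle of a 201-point grid
N, c0, w = 201, 100, 10
y0 = [0.0]*N
for i in range(c0-w, c0+w+1):
    y0[i] = 0.5*(1.0 + math.cos(math.pi*(i - c0)/w))

y50 = string_fdtd(y0, 50)   # two half-amplitude pulses, centered at 50 and 150
```

With unit Courant number the computed field equals the discrete d'Alembert superposition y_i^n = (1/2)(y0[i−n] + y0[i+n]) exactly, to rounding error.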
1.6.2 Solution of the String Wave Equation

The assumptions for determining the solution of Eq. (1.110) are that: (1) an infinitesimal string segment dx moves only vertically, so that we can calculate the acceleration of the transverse components only; (2) the amplitude of the vibration is small. In the hypothesis of an infinite, unterminated string we consider a simple trial function as the product of two exponentials: one defined in the time domain, the other in the space domain

y(t, x) = e^(st) · e^(vx) = e^(st+vx).   (1.111)

Substituting the trial function into (1.110), it is easy to verify that (1.111) represents its solution when s = jω, where ω denotes the angular frequency, and v = jλ, where λ is denoted as spatial frequency, also called wave number. Thus, the following relationship is valid

ω² = c²λ²   (1.112)

for which ω/λ = ±c. Now, substituting backwards into (1.111), for arbitrary ω, the trial function can be rewritten as

y(x, t) = Re{e^(j(ωt+λx))} = Re{e^(jω(t±x/c))}.   (1.113)
The above equation shows that the ideal, unterminated string can vibrate at arbitrary angular frequencies. However, the wave number λ and the angular frequency ω are related by the constant c. Since the wave equation is linear, we can also consider linear combinations of the solutions (1.113), i.e., the superposition of the previous two cases is also a solution

y(x, t) = Re{A e^(jω(t−x/c)) + B e^(jω(t+x/c))}   (1.114)

11 Historical note: Jean Le Rond D'Alembert, French mathematician and philosopher who lived in Paris between 1717 and 1783. A member of the Academy at only 20 years of age, he wrote several works and performed various studies, including on the motion of vibrating strings, giving a rigorous formulation of its mathematical problem: "The motion of a vibrating string is described by a partial differential equation of the 2nd order in the space and time variables, called D'Alembert's equation".
where A and B are the complex amplitudes of the two components.

Remark 1.18 Note that Eq. (1.113) can be interpreted as the basis function of the inverse Fourier transform. Thus, by the superposition principle the solution can be written as

y⁺ + y⁻ = Re{ ∫_{−∞}^{∞} [A(ω) e^(jω(t−x/c)) + B(ω) e^(jω(t+x/c))] dω }   (1.115)

as both t + x/c and t − x/c hold, the time-domain solution of the wave equation is written as the superposition of two generic functions. With these assumptions, the solution of Eq. (1.110) (published by D'Alembert in 1747) can be written as

y(x, t) = y⁺(t − x/c) + y⁻(t + x/c)   (1.116)
where the term y⁺(t − x/c) represents a wave traveling to the right, or progressive wave, while the term y⁻(t + x/c) represents a wave traveling to the left, or regressive wave; both retain their shape during their movement, as shown in Fig. 1.35. The solution of the wave equation can be determined by first analyzing its frequency eigenmodes. A so-called eigenmode is a solution that oscillates in time with a well-defined and constant angular frequency ω, so that the time part of the wave function takes the form f(x)e^(−jωt). Thus, from (1.115), one can find a harmonic solution of the wave equation that, in exponential notation, takes the form

ȳ(x, t) = A e^(j(ωt−kx)) + B e^(j(ωt+kx))   (1.117)

in which the quantities12 ȳ, A and B are complex (defined in Sect. 1.2.3 as rotating vectors and phasors), for which y(x, t) = Re[ȳ(x, t)].
1.6.2.1 Force and Velocity Waves
The force acting on a string section dx can be expressed as
Fig. 1.35 Plucked infinite-length string and progressive–regressive waves, some time after release. The figure shows the string initial position, the regressive wave (left) and the progressive wave (right)

12 The reader should note, to avoid the proliferation of symbols, that from now on and for the rest of the chapter, phasors are no longer indicated in bold typeface as previously.
f(x, t) = −K ∂y(x, t)/∂x = −K [∂y⁺(t − x/c)/∂x + ∂y⁻(t + x/c)/∂x]
        = (K/c) ẏ⁺(t − x/c) − (K/c) ẏ⁻(t + x/c)   (1.118)

that is, in compact form, f± = ±(K/c) ẏ±. So, the equation of the velocity is given by

v(x, t) = ∂y(x, t)/∂t = ẏ⁺(t − x/c) + ẏ⁻(t + x/c)   (1.119)

that in compact form is written as v± = ẏ±. The two quantities, force and velocity, by analogy with the electrical quantities, are called Kirchhoff variables (or K-model), and are linked by the relation (similar to Ohm's law)

f±(t ∓ x/c) = ±Z0 v±(t ∓ x/c)   (1.120)

with Z0 = K/c = √(Kμ) defined as the characteristic impedance.
Remark 1.19 Note that, through the definition of the characteristic impedance Z0 and the Kirchhoff continuity law, the force f and the velocity v are related to the wave variables (or W-model) f⁺ and f⁻ as

f = f⁺ + f⁻,   v = (1/Z0)(f⁺ − f⁻)   (1.121)

where

f⁺ = (1/2)(f + Z0 v),   f⁻ = (1/2)(f − Z0 v).   (1.122)
So, the previous expressions transform the pair of physical variables force and velocity (v, f ) into the pair of wave variables ( f + , f − ) and vice versa.
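The K-model/W-model change of variables of (1.121)–(1.122) is a simple invertible linear transform. A minimal sketch (with an illustrative impedance value) implementing it and checking that the round trip returns the original pair:

```python
def kirchhoff_to_wave(f, v, Z0):
    """(f, v) -> (f_plus, f_minus), Eq. (1.122)."""
    return 0.5*(f + Z0*v), 0.5*(f - Z0*v)

def wave_to_kirchhoff(f_plus, f_minus, Z0):
    """(f_plus, f_minus) -> (f, v), Eq. (1.121)."""
    return f_plus + f_minus, (f_plus - f_minus)/Z0

Z0 = 0.5                     # illustrative characteristic impedance [N*s/m]
fp, fm = kirchhoff_to_wave(1.2, -0.8, Z0)
f, v = wave_to_kirchhoff(fp, fm, Z0)
```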
1.6.2.2 Space and Time
The wave function y(x, t) depends on two variables: time and space. For the analytical and qualitative study of two-variable functions it is convenient to keep one variable at a time constant and to define significant quantities in the domain of the variable of interest. Figure 1.36a shows the time-domain trend of a sinusoidal harmonic wave. In this case the oscillation period T can be defined as the duration of a cycle. By keeping the time constant, it is possible to study the spatial evolution of the wave. Figure 1.36b reports the spatial trend of the harmonic wave. In this case the quantity of interest is the wavelength λ, which represents the length of a cycle. The link between the wavelength λ (spatial quantity) and the period T (time quantity), as reported earlier in Eq. (1.112), is defined by the wave propagation velocity c, and is expressed by the relation
1.6 Continuous Vibrating Systems
Fig. 1.36 Space and time characteristic quantities of waves y(x, t). a Wave in the time domain (x = const) and its characteristic quantities. b Wave in the space domain (t = const) and its characteristic quantities
$$\lambda = cT = \frac{c}{f} \qquad (1.123)$$
where f = 1/T [Hz] is the frequency of the wave. From the above expressions it follows that the period T is the time the wave takes to complete an entire cycle or, equivalently, the wavelength λ is the distance traveled by the wave in one period. In addition, the wavenumber k relates the wave propagation speed and the frequency as

$$k = \frac{2\pi}{\lambda} = \frac{2\pi f}{c}. \qquad (1.124)$$
Remark 1.20 Observe that the wave propagation speed c depends on the medium in which the wave propagates. In the case of mechanical waves, as in strings, the speed depends on the mass density μ and the mechanical tension K as $c = \sqrt{K/\mu}$. For example, when we tighten a string to tune a guitar, the propagation speed increases; consequently, the time to go through one cycle decreases. The acoustic effect is that the note perceived by the listener has a higher pitch; that is, the frequency of the acoustic wave is increased.
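As a quick numerical check of the remark above, the following sketch computes the propagation speed c = √(K/μ) and the resulting fundamental frequency for a string of fixed length; the tension, density, and length values are illustrative assumptions, not data from the text.

```python
import math

def wave_speed(K, mu):
    """Propagation speed c = sqrt(K/mu) on an ideal string.
    K: tension [N], mu: linear mass density [kg/m]."""
    return math.sqrt(K / mu)

def wavelength(c, f):
    """Wavelength lambda = c/f (Eq. 1.123)."""
    return c / f

# Illustrative values (assumed): the same light string under two tensions.
mu = 0.001                       # [kg/m]
c_low  = wave_speed(60.0, mu)    # slack string
c_high = wave_speed(80.0, mu)    # same string, tightened

# Tightening the string raises the propagation speed.
assert c_high > c_low

# With the string length fixed (so the fundamental wavelength lambda = 2L
# is fixed), a higher speed means a higher frequency, i.e., a higher pitch.
L = 0.65                         # string length [m] (assumed)
f_low, f_high = c_low / (2 * L), c_high / (2 * L)
assert f_high > f_low
```

The point of the sketch is only the ordering of the results: the tuned-up string has a larger c, and therefore a higher fundamental at the same length.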
1.6.2.3 Input Impedance of a String
Consider an infinite-length string subject to a tension K and excited at one of its extremes (e.g., the left end) with a sinusoidal force defined by the rotating vector $\bar{f}(t) = Fe^{j\omega t}$. Since the string is infinite, the solution in Eq. (1.117) is composed only of the progressive wave
1 Vibrating Systems
$$\bar{y}(x,t) = Ae^{j(\omega t - kx)} \qquad (1.125)$$
where $A \to Ae^{j\phi}$ is a phasor containing the amplitude and phase with respect to the excitation force, and k is the wavenumber defined in (1.124). Since there is no concentrated mass, the excitation force will be in equilibrium with the transverse component of the tension; that is

$$f(t) = -K\sin\theta \cong -K\frac{\partial y(x,t)}{\partial x}, \quad \text{for } x \geq 0. \qquad (1.126)$$
It is obvious that the equilibrium condition is also valid for the rotating vectors $\bar{y}(x,t) = Ae^{j(\omega t - kx)}$ and $\bar{f}(t) = Fe^{j\omega t}$. For x = 0 we get $Fe^{j\omega t} = jkKAe^{j\omega t}$, or $A = F/(jkK)$, so we have that $\bar{y} = (F/(jkK))\, e^{j(\omega t - kx)}$. Moreover, from Eq. (1.119) the velocity can be expressed as

$$\bar{u} = \frac{\partial \bar{y}}{\partial t} = \frac{\omega F}{kK}\, e^{j(\omega t - kx)} = \frac{cF}{K}\, e^{j(\omega t - kx)}. \qquad (1.127)$$
Definition 1.9 We define the mechanical input impedance of a string as the ratio between force and velocity at x = 0:

$$Z_{in} = \left.\frac{\bar{f}}{\bar{u}}\right|_{x=0} = \frac{Fe^{j\omega t}}{(cF/K)\, e^{j\omega t}} = \frac{K}{c} = \sqrt{K\mu} = \mu c. \qquad (1.128)$$
So, for an infinite-length string, or for a finite-length string without reflections (i.e., with matched terminations), the quantity $Z_{in}$ is real and equal to the characteristic impedance of the string defined in Eq. (1.120); that is

$$Z_0 = \frac{K}{c} = \sqrt{K\mu} = \mu c. \qquad (1.129)$$
If the string has a finite length, instead of Eq. (1.125) we consider the solution in which both the progressive and regressive waves are present

$$\bar{y} = Ae^{j(\omega t - kx)} + Be^{j(\omega t + kx)}. \qquad (1.130)$$
Now, consider the excited string with a force at x = 0 and fixed at the end x = L. By replacing Eq. (1.130) in (1.126) we get
$$Fe^{j\omega t} = K\left(jkA - jkB\right)e^{j\omega t}. \qquad (1.131)$$
The condition at the extreme x = L (boundary condition) is such that $0 = Ae^{-jkL} + Be^{jkL}$. Solving, we obtain

$$A = \frac{Fe^{jkL}}{2jkK\cos kL}, \quad \text{and} \quad B = -\frac{Fe^{-jkL}}{2jkK\cos kL}$$
so, we have that

$$\bar{y} = \frac{F}{kK}\,\frac{\sin k(L-x)}{\cos kL}\, e^{j\omega t}.$$
Differentiating with respect to time, we get

$$\bar{u} = j\omega\,\frac{F}{kK}\,\frac{\sin k(L-x)}{\cos kL}\, e^{j\omega t}.$$
Thus, the input impedance (at x = 0) is defined as

$$Z_{in} = \left.\frac{\bar{f}}{\bar{u}}\right|_{x=0} = \frac{kK}{j\omega}\cot(kL) = -jZ_0\cot(kL). \qquad (1.132)$$
Remark 1.21 Observe that the string impedance is purely reactive and varies between 0 (at kL = π/2, 3π/2, ...) and ±j∞ (at kL = 0, π, ...), which represent, respectively, the resonances and the anti-resonances of the vibrating string.
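The reactive behavior described by Eq. (1.132) is easy to explore numerically. The sketch below evaluates $Z_{in} = -jZ_0\cot(kL)$ and checks that it vanishes near kL = π/2 (resonance) and diverges near kL = π (anti-resonance); the normalized parameter values are illustrative assumptions.

```python
import math

def input_impedance(Z0, k, L):
    """Z_in = -j * Z0 * cot(kL) for an ideal string fixed at x = L (Eq. 1.132)."""
    return -1j * Z0 / math.tan(k * L)

Z0, L = 1.0, 1.0   # normalized units (assumed)

# Resonance: kL = pi/2  ->  cot(kL) = 0  ->  |Z_in| ~ 0
assert abs(input_impedance(Z0, math.pi / 2, L)) < 1e-12

# Anti-resonance: kL -> pi  ->  |Z_in| -> infinity
assert abs(input_impedance(Z0, math.pi - 1e-6, L)) > 1e5

# The impedance is purely reactive: the real part is zero everywhere.
assert input_impedance(Z0, 1.234, L).real == 0.0
```

In a real string, losses would add a small resistive component, so the zeros and poles of the impedance would become finite minima and maxima.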
1.6.2.4 Bounded String Wave Solution
Consider a traveling wave that encounters a sharp change in the medium in which it is propagating. In this situation a fraction of the energy is reflected, according to a coefficient called the reflection coefficient $k_r$, defined as the ratio between the reflected wave and the direct wave. Considering the general solution in Eq. (1.116), we have that

$$k_r = \frac{f^-}{f^+}.$$
1.6.2.5 Fixed-End String at x = 0
Consider the case in which a string termination, for example at x = 0, is fixed. In this situation the vertical displacement is necessarily null, y(0, t) = 0; it follows that the wave is entirely reflected, i.e., $f^-(ct) = -f^+(ct)$. In this case the reflection coefficient is $k_r = -1$, and the reflected wave is inverted, as shown in Fig. 1.37.
1.6.2.6 Free-End String at x = 0
In the opposite case, shown in Fig. 1.38, in which the termination is completely free, no transverse force is possible, so $\partial y/\partial x|_{x=0} = 0$. Differentiating the general solution in Eq. (1.116), we obtain $\partial y/\partial x = -f^+ + f^-$. By integrating, it can be seen that $f^-(ct) = f^+(ct)$; this means that we have reflection without inversion, as illustrated in Fig. 1.39. In this case the reflection coefficient is $k_r = 1$. In the case of several dimensions, such as for the acoustic waves in air which will be treated later, the angle of incidence of a wave front on a reflecting surface equals the angle of the reflected wave.
Fig. 1.37 Reflection with bound extreme. The reflected wave is inverted kr = −1
Fig. 1.38 Reflection with free extreme at x = 0. As we have the constraint ∂ y/∂ x|x=0 = 0, we have that f − (ct) = f + (ct), i.e., the reflection coefficient is kr = 1
Fig. 1.39 Reflection with free extreme. The reflected wave is f − (ct) = f + (ct)|x=0
1.6.2.7 Standing Waves and Vibration Modes
A particular but very important case of interference is due to waves propagating along the same line in opposite directions. Suppose the two waves have the same amplitude and are defined as

$$f^+(x,t) = y_0\sin(kx - \omega t), \qquad f^-(x,t) = y_0\sin(kx + \omega t). \qquad (1.133)$$

In this case, considering the general solution (1.116), $y = f^+ + f^-$, we have that

$$y(x,t) = y_0\left[\sin(kx - \omega t) + \sin(kx + \omega t)\right] = 2y_0\sin(kx)\cos(\omega t) \qquad (1.134)$$
then, we can observe that this function cannot properly represent a traveling wave.

Property 1.5 In the expression (1.134) the wave is expressed as a product of functions of two mutually independent variables: time and space. Thus, the argument of the function y(x, t) is not of the traveling-wave type (ct ± x). In other words, the function y presents itself as the product of two functions, one dependent only on x, the other dependent only on t. The simplest way to obtain this particular situation is to exploit the reflections of a wave that occur in the presence of a discontinuity of the propagation medium, for example a rigid wall that fully reflects the wave.

Remark 1.22 Note that in the study of interfering waves it is interesting to analyze the trend of the resulting wave in the spatial domain x. The function $y(x,t) = 2y_0\sin(kx)\cos(\omega t)$ has minima for $\sin(kx) = 0$, i.e., at

$$x = n\,\frac{\pi}{k}$$

and these points are defined as nodes. On the contrary, we have maxima at

$$x = \frac{(2n+1)}{2}\,\frac{\pi}{k}$$

and these points are defined as anti-nodes. Figure 1.40 shows the trend of the standing wave, also known as a stationary wave, due to direct and reflected wave interference.
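The factorization in Eq. (1.134) can be verified numerically: sampling the two traveling waves of Eq. (1.133) on a space-time grid and summing them reproduces $2y_0\sin(kx)\cos(\omega t)$ to machine precision. The grid and parameter values below are arbitrary illustrative choices.

```python
import math

y0, k, w = 1.0, 2.0, 5.0   # amplitude, wavenumber, angular frequency (assumed)

def traveling_sum(x, t):
    """Sum of the two opposite-direction waves of Eq. (1.133)."""
    return y0 * math.sin(k * x - w * t) + y0 * math.sin(k * x + w * t)

def standing(x, t):
    """Factored standing-wave form of Eq. (1.134)."""
    return 2.0 * y0 * math.sin(k * x) * math.cos(w * t)

# The two expressions agree at every sampled point of the grid.
for i in range(20):
    for j in range(20):
        x, t = 0.1 * i, 0.05 * j
        assert abs(traveling_sum(x, t) - standing(x, t)) < 1e-12

# A node sits at x = pi/k and stays fixed regardless of time.
assert abs(traveling_sum(math.pi / k, 0.37)) < 1e-12
```

The last assertion makes the "stationary" character explicit: at a node the two traveling waves cancel at every instant, which no single traveling wave can do.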
Fig. 1.40 Direct and reflected waves and the standing wave in the free-extreme case
To better explain the phenomenon, consider a plucked string of length L fixed at both ends (as in a guitar or a piano). Plucking generates waves that propagate and terminate at the fixed ends of the string, where they are reflected with phase reversal: the string becomes the site of progressive and regressive waves that interfere with each other. This situation is illustrated in Fig. 1.41.
Fig. 1.41 Finite length string with fixed terminations, simultaneously plucked on three points P
Definition 1.10 Normal modes—Consider the destructive and constructive interference due to the sum of the two waves in a finite-length string with fixed terminations. The only propagating modes that survive, denoted as stationary waves, are defined as the normal modes of vibration of the string. Their characteristics depend on the length L (and on the other parameters, K and μ) of the string. From the conditions at the string extremes

$$y(x,t)|_{x=0} = 0, \quad \text{and} \quad y(x,t)|_{x=L} = 0$$

we can easily see that the first condition is automatically satisfied, while the second imposes that

$$kL = \frac{2\pi}{\lambda}L = n\pi$$

that is

$$\lambda = \frac{2L}{n}$$
with n a positive integer. If this condition is not met, the interference between the various reflected waves is destructive and the vibration decays very rapidly. The wavelengths of the normal modes (and therefore the relative frequencies) are not arbitrary, but are related to the length of the string:

$$\lambda = 2L,\ L,\ \tfrac{2}{3}L,\ \tfrac{1}{2}L,\ \ldots$$

The modes are therefore quantized. The mode with the maximum wavelength (equal to 2L), called the fundamental or first harmonic, has nodes only at the extremes. The second mode (second harmonic) has a node in the middle of the string, the third mode has two nodes, and so on. So, we will have stationary waves on a string fixed at both ends or in a tube containing air closed at the ends (see Fig. 1.42). The waveshape established on a string with fixed ends is a linear combination of normal modes. The same holds in the case of a tube closed at the ends, as in many wind instruments.

Remark 1.23 Observe that this phenomenon can be exploited for the construction of musical instruments. From the acoustic-musical point of view, it allows generating a timbre rich in harmonics, which depends on the characteristics of the string or on the shape of the tube section. By changing the length L, the tension K, or the mass density μ of a string (or the tube length L, in the case of wind instruments), we can set the various vibration frequencies (or the fundamental mode). In other words, as in a harp, a piano, a trumpet, a clarinet, and so on, by varying the length of the string (or of the air column), we define the tonal range of the instrument.
Fig. 1.42 Stationary wave and modes of a tensioned string of length L constrained to extremes. Note that, an antinode is spaced from a node by a λ/4 distance. Also note that for the fundamental mode λ = 2L, for the 2nd harmonic λ = L, for the 3rd harmonic λ = 2L/3, and so on
1.6.3 String Vibration and Musical Scales

The most common way to produce sounds is to use vibrating strings. By varying the length of the string, with the same tension and mass density, the frequency of the emitted sound varies. Between the frequency and the length of the string there is an inverse proportionality: long string, low note (bass tones); short string, high note (treble tones).
1.6.3.1 Natural Musical Scales
Consider a string of length L tensioned at its extremes. If the string is plucked in such a way as to excite only the fundamental mode, the resulting vibration has a wavelength equal to twice its length (see Fig. 1.43). In fact, considering the relation λ = c/f, the so-called natural frequency of the emitted sound depends on the tension, on the mass density, and on the length L according to the law

$$f_0 = \frac{1}{2L}\sqrt{\frac{K}{\mu}}. \qquad (1.135)$$

Thus the natural frequency is:

• inversely proportional to the length of the string (the law of Pythagoras): $f_0 \propto \frac{1}{L}$;
• proportional to the square root of the stretching force: $f_0 \propto \sqrt{K}$;
• inversely proportional to the square root of the mass per unit length: $f_0 \propto \frac{1}{\sqrt{\mu}}$.

From (1.135), if a string of length L emits the C1 note (at 32.70 Hz) (or, in Italian notation, Do), which we take as the reference, the same string of length L/2
Fig. 1.43 Fundamental mode or 1st harmonic where λ = 2L
sounds at double frequency and emits a C note one octave higher¹³ (i.e., C2 at 65.41 Hz). The length L/3 gives a G2 (98.00 Hz). If we want to generate a G1 in the same octave as the reference C1, we just need to take the double length 2L/3 (49.00 Hz). Similarly, a string of length L/4 generates a C3 sound, i.e., two octaves above the reference, a string of length 2L/5 generates an E2 (82.41 Hz), 4L/5 generates an E1 (41.20 Hz), and so on. Furthermore, from (1.135) it is evident that the displacement of a vibrating string can be expressed as a Fourier-like series

$$y(x,t) = \sum_{k=1}^{\infty} y_k(t)\,\sin\!\left(\frac{2\pi f_0}{c}\,k\,x\right) = \sum_{k=1}^{\infty} y_k(t)\,\sin\!\left(\frac{\pi x}{L}\,k\right) \qquad (1.136)$$
where x ∈ [0, L], and $y_k(t)$ is the instantaneous amplitude of mode k. Figure 1.44 shows the link between the length of the string and the frequency of the emitted sound for some notes. For example, if the string at the top vibrates at a frequency corresponding to the musical note C, it can be seen that by decreasing its length the pitch of the sound increases. In the past, Pythagoras and his pupils realized that by dividing a string according to ratios composed of small numbers and making it vibrate, harmonic
Fig. 1.44 Frequencies of the fundamental mode, and example musical notes, for a string of variable length subject to the same tension. In fact, in string instruments such as the piano, the strings are generally subjected to identical tension so as to stress the structure uniformly
13 We remind the reader that by octave we mean the distance between f and 2f, while by decade we mean the distance between f and 10f.
sounds were obtained. With the Pythagorean approach, the notes are defined as ratios between the lengths of the strings that generate the sounds. The Western musical tradition has assigned a special role to some frequencies, or rather to a sequence of relationships between frequencies (or between the lengths of the strings that generate them). The set of note frequencies deriving from this approach is called the natural scale. The natural scale is so named because it is based on a physical phenomenon: the succession of harmonic sounds. The natural scale is made up of seven fundamental notes: C, D, E, F, G, A, B. If f is the frequency (in Hertz) of the fundamental note C, the frequencies of the others can be obtained by considering the ratios between the lengths of a string. Taking the note C as reference and assigning it, for simplicity, the normalized frequency 1, the relative frequencies of the intervals are shown in Fig. 1.45a. The musical scale is enriched with other notes by introducing the sharp and flat alterations between two successive notes, except between E and F and between B and the C of the upper octave. In the natural scale, a note at an interval 25/24 above the reference note is said to be sharp (indicated with the symbol "#"), while a note at the interval 24/25, below the reference, is said to be flat (indicated with the symbol "b").

Remark 1.24 Observe that in the natural scale the intervals between the notes are not all the same, and only some fretless musical instruments, such as the violin, allow producing all the notes of the natural scale (not the keyboard instruments). The determination of the frequencies of the notes can be done in the following way. Starting from note B, you get the other notes going up from fifth to fifth (a fifth corresponds to an interval of three tones plus a semitone: see Fig. 1.45a), finding F#, C#, etc. Then, starting from the F and descending from fifth to fifth, we find Bb, Eb, etc.
Building the scale in this way, it happens that C# and Db, for example, do not coincide.
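The length-frequency relations above follow directly from Eq. (1.135): halving the length doubles the frequency, a length of 2L/3 yields the fifth, and so on. A minimal numerical sketch (the string parameters K, μ, L are assumed, chosen only so that the full-length string gives roughly C1 ≈ 32.7 Hz):

```python
import math

def f0(K, mu, L):
    """Fundamental frequency of an ideal string (Eq. 1.135)."""
    return math.sqrt(K / mu) / (2.0 * L)

# Assumed parameters (not from the text): c = sqrt(K/mu) = 100 m/s, L ~ 1.529 m.
K, mu, L = 100.0, 0.01, 1.529

ref = f0(K, mu, L)                                    # ~32.7 Hz (C1)
assert abs(f0(K, mu, L / 2) - 2.0 * ref) < 1e-9       # half length: octave up (C2)
assert abs(f0(K, mu, 2 * L / 3) - 1.5 * ref) < 1e-9   # 2L/3: the fifth (G1)
assert abs(f0(K, mu, L / 4) - 4.0 * ref) < 1e-9       # quarter length: two octaves (C3)
assert abs(f0(4 * K, mu, L) - 2.0 * ref) < 1e-9       # f0 grows as sqrt(K)
```

The assertions restate the three proportionalities listed after Eq. (1.135) and the Pythagorean length ratios in exact form.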
1.6.3.2 Tempered Musical Scale
To simplify the notation and to avoid the inconveniences of the natural scale, at the end of the 1600s Andreas Werckmeister (later followed by J. Sebastian Bach) introduced the tempered scale. In the tempered scale, also called the equalized scale, the intervals between two successive notes are always the same. The mathematical construction of this scale is based on the general formula $2^{k/N}$ (with k and N positive integers), where N represents the total number of intervals or semitones of the scale and k ∈ [0, N − 1] represents the distance in semitones from the note taken as reference (in musical language, the reference is called the first degree). With this definition, two notes of frequencies $f_n$ and $f_{n+1}$ at one-semitone distance have the frequency ratio $f_{n+1}/f_n = 2^{1/N}$ or, in general,

$$\frac{f_{n+k}}{f_n} = 2^{k/N}.$$
In the twelve-tone scale the octave is subdivided into twelve semitones, for which N = 12; that is, two notes are a semitone apart if $f_{n+1} = f_n 2^{1/12}$. Starting from a reference note (for example, the A4, $f_0$ = 440 Hz) it is possible to obtain the frequency of the note k semitones away as $f_k = f_0 2^{k/12}$. In Fig. 1.45b the relative frequencies of the tempered musical scale are shown. Observe that the frequencies of the natural and tempered scales are not exactly the same. Fig. 1.46 gives a table of the frequencies in [Hz] of musical pitches, covering the full range of all normal musical instruments, for a tempered scale with A4 = 440 Hz.
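The tempered-scale formula $f_k = f_0\,2^{k/12}$ is straightforward to implement; the sketch below derives a few pitches from the A4 = 440 Hz reference used in the text.

```python
def tempered_freq(f_ref, k, N=12):
    """Frequency of the note k semitones away from f_ref
    in an N-interval equal-tempered scale (f_k = f_ref * 2**(k/N))."""
    return f_ref * 2.0 ** (k / N)

A4 = 440.0

# Twelve semitones up exactly double the frequency (A5), twelve down halve it (A3).
assert abs(tempered_freq(A4, 12) - 880.0) < 1e-9
assert abs(tempered_freq(A4, -12) - 220.0) < 1e-9

# C5 is 3 semitones above A4: 440 * 2**(3/12) ~= 523.25 Hz.
assert abs(tempered_freq(A4, 3) - 523.2511306) < 1e-4
```

Because every semitone has the same ratio $2^{1/12}$, transposition is a pure multiplication, which is precisely the property the natural scale lacks.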
1.6.3.3 The Harmonics
Now consider the case where a point $x_1$ between the string terminations, $0 < x_1 < L$, is constrained so that $x_1$ becomes a node, i.e., at the left and right of that point the string can oscillate while $x_1$ remains fixed. In this situation, by appropriately choosing the position $x_1$, one can excite only a specific mode of vibration. If this constraint is placed exactly at half the length of the string, we will have only the 2nd mode, characterized by a frequency double (2nd harmonic) that of the fundamental mode. When a string is excited (for example with the bow, as in the violin), most of the vibration modes are simultaneously excited. The resulting sound will therefore be characterized by a complex spectrum composed mainly of the frequency of the fundamental
Fig. 1.45 Natural versus tempered musical scale. a Natural scale; b Tempered scale
mode and its harmonics. Figure 1.47 shows the excitation of individual modes, with a constraint placed in an appropriate position, to obtain some vibration modes of a string.

Remark 1.25 Note that a vibrating string in open space transmits the vibration to the surrounding air molecules, generating sounds of typically very small intensity. To increase the acoustic efficiency, in stringed instruments (guitar, piano, violin, etc.) the strings are stretched so that one of the two ends is connected, through a so-called bridge, to a resonant cavity and/or a soundboard, which can reinforce the vibrations produced by the strings in the air. The resonant cavity is also characterized by modes. In this case, the vibrating string can excite some modes of the cavity, obtaining an amplification at certain frequencies that characterizes the timbre of the instrument.
Fig. 1.46 Musical pitch versus frequency table of the tempered musical scale

Fig. 1.47 Excitation of individual modes of vibration of a string. A harmonic sound can be generated by imposing a constraint at a point that is a node of the specific mode one wants to excite. However, when the string is subjected to external forces, for example struck with a hammer or excited with a rosined bow, several natural modes are simultaneously excited, creating a harmonics-rich timbre
In the acoustic guitar, for example, there are both the soundboard, which propagates the medium-high frequencies, and the cavity, which amplifies the lower tones of the instrument. In this case, the desired tonal balance is achieved by providing the sound cavity with adequate volume and shaped openings (as in the Helmholtz resonator in Sect. 1.3.5.1). Moreover, the soundboard is equipped with appropriate glued-in ribs, which prevent the emergence of certain natural modes that could cause unwanted resonances.
1.6.4 Lossy and Dispersive String

In developing the wave theory above, according to the assumptions in Sect. 1.6.1.1, we considered an ideal string, for which: (1) the string is perfectly flexible, i.e., the only restoring force is due to tension; (2) the string moves in a fluid without friction; (3) the string has an infinitesimal section, so there is no stiffness or alternating compression; (4) the tension variations that can determine longitudinal waves and dispersion are negligible. In practical cases, however, there is no clear distinction between a rod or bar and a string. Real strings are used in musical instruments because, through longitudinal waves and inharmonicity, as in the piano, they characterize the timbre of the instrument itself. To decide whether to treat a vibrating structure as a string or as a bar, we compare tension and stiffness: we speak of a string when the predominant characteristic is tension, i.e., stiffness is negligible; conversely, for a bar, stiffness dominates while tension is negligible. In real musical-acoustic situations there is a whole plethora of intermediate cases of stiff bars/strings under tension.
1.6.4.1 Lossy Strings
In cases where the string, with infinitesimal cross section, moves or vibrates in a fluid, the effects of friction are not negligible. The effect of friction is to damp the free vibration and to slightly change the allowed frequencies and their amplitudes. Because of friction, higher frequencies (or higher modes of vibration) are more penalized. As with the real oscillator, it is easy to predict that friction can be modeled as a contribution proportional to the velocity, opposing the oscillation. Thus, the equation of motion for the string when friction is included is ([1], Eq. 4.3.17)

$$K y'' = \mu\ddot{y} + 2R\dot{y} \qquad (1.137)$$
where 2R y˙ is the damping term. The coefficient R is the effective frictional resistance per unit length of string. Thus, as a first approximation, it is easy to see that the
proportionality parameter R behaves like an inductor which, when a current flows through it, attenuates the higher frequencies.

Remark 1.26 Note that the term R mainly accounts for dissipative phenomena which, in the case of a string vibrating in a fluid, are very complex. Thus, for more adequate modeling, it is possible to assume that this term is itself a function of frequency, R → R(ω). Now, to determine the solution, proceeding as before, we can assume a trial function of the type $y(t,x) = e^{st+vx}$. Substituting the trial function into (1.137), it is possible to show that ([12], Eq. C.27)
(1.138)
In the above expression we observe that the progressive and regressive waves, are multiplied by an exponential envelope. In practice, it is observed that the wave attenuates along the spatial propagation coordinate x. In addition, note that the factor 2 before R(ω) in Eq. (1.137) is chosen in order to make the decay time rate τ (ω) at the angular frequency ω equal to τ (ω) = 1/R(ω).
1.6.4.2 Real Frequency-Dependent Lossy String
In strings oscillating in real fluids such as air, the loss phenomena are very complex. The model in Eq. (1.137) can be improved by replacing the effective frictional resistance term R(ω)ẏ, whose identification may be complex, with the following substitution suggested by Hiller and Ruiz in [15]

$$R(\omega)\,\dot{y} = b_1\dot{y} - b_3\,\dddot{y} \qquad (1.139)$$
where $b_1$ accounts for heat dissipation, while the $b_3$ term, proportional to $\partial^3 y/\partial t^3$, accounts for sound radiation and represents a damping force proportional to $\omega^2$. Starting from the string model in Eq. (1.137), this is the simplest way of implementing frequency-dependent losses; the decay time rate τ(ω) can be written as
$$\tau(\omega) = \frac{1}{R(\omega)} \approx \frac{1}{b_1 + b_3\omega^2} \qquad (1.140)$$
Chaigne and Askenfelt in [17] suggest that the above formula is more appropriate for modeling the decay time rate in strings of real instruments such as the piano. Thus, by including a 3rd-order term as in (1.139), which more adequately accounts for losses as a function of frequency, we get

$$K y'' = \mu\ddot{y} + 2b_1\dot{y} - 2b_3\,\dddot{y} \qquad (1.141)$$
in fact, as shown in [15], the addition of odd-order time derivatives to the wave equation allows the model to better approximate frequency-dependent damping characteristics. However, as we will see in Chap. 9, in implementations with discrete models at high sampling frequencies, Eq. (1.141) can lead to an ill-posed problem [16]. In this case, it is possible to replace the 3rd-order time derivative with a mixed space-time derivative as ([12], Eq. C.29)

$$K y'' = \mu\ddot{y} + 2b_1\dot{y} - 2b_2\dot{y}''. \qquad (1.142)$$

1.6.4.3 Bending Bar
When the string does not have an infinitesimal cross section, as in piano strings, we must account for stiffness. In these cases, it is also necessary to consider longitudinal waves and the various loss phenomena associated with stiffness. To illustrate the phenomenon, consider a bar that bends when stressed by an external force F: while one side is subject to compressive forces, the opposite side is subject to stretching forces. In order to characterize phenomena related to the stiffness of a solid body, it is necessary to define a quantity, referred to as Young's modulus, which relates the deformation of the body to the external applied force.

Definition 1.11 Young's modulus—Young's modulus describes the elastic properties of linear objects like wires, rods, and air columns which are either stretched or compressed. It is defined as the ratio of the stress to the strain

$$E \triangleq \frac{\text{Stress}}{\text{Strain}} = \frac{F/S}{\Delta L/L} \qquad (1.143)$$

where F is the force exerted on the object under tension, S is the cross-sectional area (the area of the cross section perpendicular to the applied force), L is the length at rest, and ΔL is the amount by which the length of the object changes.

Let dM = F dx be the moment exerted by the force on one end of the bar; the bar is subjected to a bending proportional to this moment. To compress a bar of cross-sectional area S and length L by an amount dL requires (by Eq. 1.143) a force dF = ES(dL/L), where E is the Young's modulus. With reference to Fig. 1.48, we assume that the bar is composed of a bundle of fibers of cross section dS, arranged parallel to the central plane. If the bar is bent by an angle dφ over a length dx, then a fiber at distance z from the center plane is compressed by a length z dφ; the force required to compress each fiber is E dS (z dφ/dx), and the moment of this force around the center line of the cross section of the bar is (E dφ/dx) z² dS. The total moment of this force, required to compress and stretch all the fibers in the bar, is equal to
Fig. 1.48 Moment acting on a bar and bending it. When the bar is bent, its lower half is compressed and its upper half stretched (or vice versa). For small φ, approximating the bend as a right triangle A-B-C, we have that dφ/dx ≈ −∂²y/∂x²
$$M = \int dM = E\frac{d\varphi}{dx}\int z^2\, dS \qquad (1.144)$$
where the integration is over the whole area of the cross section. The resistance of a cross section of any material to bending is indicated by the gyration radius $r_g$, defined as

$$r_g^2 = \frac{1}{S}\int z^2\, dS \qquad (1.145)$$
i.e., the gyration radius describes how the components of an object are distributed around its axis of rotation, so that the area moment of inertia (or second moment of area) can be written as $I = S r_g^2$. Its values can be easily evaluated for the simpler cross-sectional shapes, as shown in Fig. 1.49. Combining Eqs. (1.144) and (1.145), since dφ/dx ≈ −∂²y/∂x², the moment can be written as

$$M = E S r_g^2\,\frac{d\varphi}{dx} \approx -E S r_g^2\,\frac{\partial^2 y}{\partial x^2}. \qquad (1.146)$$
Moreover, since dM = F dx, solving for the force we obtain

$$F = \frac{\partial M}{\partial x} = -E S r_g^2\,\frac{\partial^3 y}{\partial x^3}. \qquad (1.147)$$
Now, as the force produces an acceleration perpendicular to the bar axis, the equation of motion can be written as ([2], Eq. 2.56)

$$\frac{\partial F}{\partial x}\,dx = -\rho S\, dx\,\frac{\partial^2 y}{\partial t^2} \qquad (1.148)$$
where ρ is the bar density, so that the mass of an infinitesimal segment is ρS dx. Substituting into the latter the expression of the force (1.147), we get
$$\rho S\,\frac{\partial^2 y}{\partial t^2} = -E S r_g^2\,\frac{\partial^4 y}{\partial x^4} \qquad (1.149)$$
which is a fourth order space-domain differential equation.
1.6.4.4 Stiff String Vibration: The Dispersive Wave Equation
In a real string the restoring force is due to the tension (as in the ideal string) to which, the system being linear, we can add the contribution due to the bending stiffness of the string. So, considering a string with linear density μ, the dispersive- or stiff-wave equation can be written as ([15], Eq. 11)

$$K y'' = \mu\ddot{y} + \kappa\, y'''' \qquad (1.150)$$

where κ = EI, and $I = S r_g^2$ is the area moment of inertia. Proceeding as in [12], the solution can be determined by assuming a trial function of the type $y(t,x) = e^{st+vx}$, which substituted into (1.150) produces $Kv^2 = \mu s^2 + \kappa v^4$. To determine the solution, we consider the 2nd- and 4th-order contributions separately. Thus, when stiffness is negligible, i.e., at very low frequencies, $\mu s^2 = Kv^2$, and $v = \pm s/c$. At high frequencies, when the tension K is negligible, $\mu s^2 \approx -\kappa v^4$, and the solution is $v \approx \pm e^{\pm j\frac{\pi}{4}}\left(\mu/\kappa\right)^{1/4}\sqrt{s}$. In real-world situations, such as in instruments with strings of non-negligible cross section like the piano, the region of greatest acoustic interest is at intermediate frequencies, where inharmonicity is present due to the presence of longitudinal waves.
Fig. 1.49 Gyration radius of some simple bar/rod shapes. The gyration radius $r_g$ of a body around the rotation axis is defined as the radial distance of a point at which the body's mass could be concentrated to give a moment of inertia equal to that of the true spatial mass distribution
In this case, for the determination of the solution, for simplicity we consider an ideal string to which we add a correction term that models the stiffness [12]. Assuming $\kappa_0 \triangleq \kappa/K \ll 1$, we have

$$s^2 = \frac{K}{\mu}v^2 - \frac{\kappa}{\mu}v^4 = c_0^2 v^2\left(1 - \kappa_0 v^2\right)$$

where $c_0 = \sqrt{K/\mu}$, and after some math we have

$$v \approx \pm\frac{s}{c_0}\left(1 + \frac{1}{2}\kappa_0\frac{s^2}{c_0^2}\right).$$
The general eigen-solution can be obtained by substituting v in terms of s into $e^{st+vx}$:

$$e^{st+vx} = \exp\left\{s\left[t \pm \frac{x}{c_0}\left(1 + \frac{1}{2}\kappa_0\frac{s^2}{c_0^2}\right)\right]\right\}.$$

For s = jω we obtain $e^{st+vx} = e^{j\omega[t \pm x/c(\omega)]}$ where, as defined by Smith in [12],
$$c(\omega) \approx c_0\left(1 + \frac{\kappa\omega^2}{2Kc_0^2}\right). \qquad (1.151)$$
In other words, as intuitively expected, the higher-frequency components travel faster than the lower-frequency components. For more details see [1, 12].
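Equation (1.151) can be used to quantify dispersion directly: with assumed, illustrative stiffness values the sketch below shows that c(ω) grows with frequency, so the high partials travel faster than the low ones — the physical origin of piano inharmonicity.

```python
import math

def c_of_w(w, c0, kappa, K):
    """Frequency-dependent propagation speed of a stiff string (Eq. 1.151)."""
    return c0 * (1.0 + kappa * w * w / (2.0 * K * c0 * c0))

# Assumed illustrative values: low-frequency speed [m/s], tension [N], stiffness EI.
c0, K, kappa = 200.0, 800.0, 1e-4

w_lo, w_hi = 2 * math.pi * 100.0, 2 * math.pi * 2000.0

# Higher-frequency components travel faster; both exceed the ideal-string speed.
assert c_of_w(w_hi, c0, kappa, K) > c_of_w(w_lo, c0, kappa, K) > c0

# Zero stiffness recovers the non-dispersive ideal string: c(w) = c0 at any w.
assert c_of_w(w_hi, c0, 0.0, K) == c0
```

Since the mode frequencies scale with the propagation speed, the same correction term makes the upper partials of a stiff string slightly sharp with respect to the exact harmonic series.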
1.6.4.5 The Chaigne-Askenfelt Lossy String Model
In real vibrating systems all the dissipative phenomena due to stiffness and frictional losses are present. The string model will therefore have both the loss component proportional to the odd time derivatives in Eq. (1.141) and the fourth spatial derivative in Eq. (1.150). Thus, a general model that includes the various sources of loss is the one defined by Hiller-Ruiz [15] and later popularized by Chaigne-Askenfelt ([17], Eq. 1)

$$\frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2} - \varepsilon c^2 L^2\frac{\partial^4 y}{\partial x^4} - 2b_1\frac{\partial y}{\partial t} + 2b_3\frac{\partial^3 y}{\partial t^3} + f(x,y,t) \qquad (1.152)$$

where $b_1$ and $b_3$ represent, respectively, the damping and the loss parameters, which in this model are assumed to be frequency-independent constants, i.e., the frequency-decay rate has the form in Eq. (1.140), and

$$\varepsilon = r_g^2\,\frac{ES}{KL^2}$$
is the stiffness parameter, where we recall that:

• c is the speed of sound in the string [m/s];
• L is the string length [m];
• E is the Young's modulus of the string [N/m²];
• S is the cross-sectional area of the core [m²];
• $r_g$ is the radius of gyration of the string section [m];
• K is the tension of the string along the axis x [N];
• $f(x, x_0, t)$ is the external driving force applied at time t, e.g., by the hammer; this excitation is limited in time and distributed over a certain width.
The forcing term f(x, y, t) in Eq. (1.152) is a force density which represents the string excitation, such as the hammer in the piano. Also in this model, it is possible to replace the 3rd-order time derivative with a mixed time–space derivative, as proposed by Bensa et al. in [16] (as in Eq. 1.142). Therefore, in agreement with this hypothesis, the equation of the string with stiffness and a complex dissipation model assumes the form ([16], Eq. 6)

ÿ = c² y″ − c²L²ε y⁗ − 2b₁ ẏ + 2b₂ ẏ″ + f(x, y, t)    (1.153)

where primes denote spatial derivatives, so that ẏ″ = ∂³y/(∂t ∂x²) is the mixed time–space loss term.
Remark 1.27 Note that the damping parameters b₁ and b₃ (or b₂ in Eq. 1.153) were derived from experimental values and a standard identification procedure (such as the Ordinary Least Squares (OLS) approach), with the assumption that the above empirical laws account globally for the losses in the air and in the string material [17].

Remark 1.28 The model is physically more consistent in that the friction losses depend only on the 1st-order time derivative.
1.6.4.6 Empirical Dispersive–Dissipative String Model
Finally, note that the empirical dispersive–dissipative string model can be further enriched with a more general expression of the type ([16], Eq. 1)

ÿ = Σ_{k=0}^{N} r_k ∂^{2k}y/∂x^{2k} + 2 Σ_{k=0}^{M} q_k ∂^{2k+1}y/(∂x^{2k} ∂t) + f(x, y, t).    (1.154)
In this case we observe that the order of the time derivative is at most equal to two, and that the unknown parameters r_k and q_k can be estimated by standard identification procedures.
1.6.5 The Vibration of Membranes The treatment previously carried out on the strings can easily be extended to the case of membranes [1, 2, 25–28].
1 Vibrating Systems
Fig. 1.50 Vibration modes of a surface rigidly fixed along its perimeter. a Rectangular membrane. b Circular membrane
Consider the case of an ideal rectangular membrane with a perfectly flexible surface that does not offer flexural strength, and with surface mass density ρ_a [kg/m²]. In this case we will have a wave y(x, z, t) that propagates in the two directions x and z. So, the wave propagation equation is very similar to the one-dimensional case and can be written as

∂²y/∂x² + ∂²y/∂z² = (1/c²) ∂²y/∂t².    (1.155)
where c = √(K_a/ρ_a) (K_a [N/m] is the membrane tension per unit length). Without going into the analytical details, which can easily be obtained in a way similar to what we have done for the vibrating strings, we can obtain all the qualitative and quantitative information related to the vibration modes of the membranes. Figure 1.50 shows, by way of example, the vibration modes of a rectangular and a circular membrane rigidly fixed along the perimeter.
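For the rectangular membrane, the modal frequencies follow from Eq. (1.155) with standing-wave boundary conditions, f_mn = (c/2)√((m/L_x)² + (n/L_z)²). A minimal sketch, with illustrative tension and density values (not from the text):

```python
import math

def membrane_mode_freq(m, n, c, Lx, Lz):
    """f_mn = (c/2) * sqrt((m/Lx)**2 + (n/Lz)**2) for a rectangular membrane
    rigidly fixed along its perimeter (modes of Eq. 1.155)."""
    return 0.5 * c * math.sqrt((m / Lx)**2 + (n / Lz)**2)

# Illustrative values: tension K_a = 200 N/m, surface density rho_a = 0.26 kg/m^2
c = math.sqrt(200.0 / 0.26)   # wave speed on the membrane [m/s]
Lx, Lz = 0.6, 0.4             # membrane sides [m]

for m, n in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    print(f"mode ({m},{n}): {membrane_mode_freq(m, n, c, Lx, Lz):7.1f} Hz")
```

Note that, unlike the ideal string, the modal frequencies are in general not integer multiples of f₁₁ (here only the (2,2) mode is exactly 2f₁₁): the membrane spectrum is inharmonic.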
1.7 Sound Waves in the Air

By sound we mean the mechanical phenomenon caused by a perturbation of a transmission medium (usually air) which has characteristics such that it can be perceived by the human ear. When the air is perturbed, the pressure value is no longer constant but varies from point to point: it increases where the molecules are compressed and decreases where the molecules are expanded (see Fig. 1.51). While in the string the propagation occurs through transverse waves, sound in the air propagates (predominantly) with longitudinal waves, in which the particles of the medium move in a direction parallel or anti-parallel to the direction of the energy transport.

Fig. 1.51 Representation of the propagation of sound in air by a longitudinal wave. Alternating air compression and rarefaction generates a longitudinal wave. The gas particles move in a direction parallel or anti-parallel to the wave direction

The propagation speed is determined only by the properties of the gas.¹⁴ In particular, air waves propagate with a speed equal to c = 331.3 + 0.6 t_e m/s, where t_e indicates the temperature in degrees Celsius. So, sound waves can be described with equations very similar to those of the vibrating strings and membranes previously considered. For example, in seismics, longitudinal waves are referred to as primary waves (P-waves); typical speeds are 340 m/s in air, 1450 m/s in water, and about 5000 m/s in granite. Transverse waves are referred to as secondary waves (S-waves); their speed is typically around 60% of that of P-waves in any given material.
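The empirical temperature law quoted above can be checked with a one-line function (the formula and constants are those given in the text):

```python
def speed_of_sound(te):
    """Approximate speed of sound in air [m/s], te in degrees Celsius:
    c = 331.3 + 0.6 * te."""
    return 331.3 + 0.6 * te

for te in (0, 20, 40):
    print(f"{te:2d} C -> {speed_of_sound(te):.1f} m/s")   # 331.3, 343.3, 355.3
```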
1.7.1 Plane Wave

For the determination of the mathematical model of the plane wave, we consider a gas (or fluid) contained in a long tube of section S arranged along the propagation direction x, as illustrated in Fig. 1.52. Consider also the following hypotheses: (a) medium with zero viscosity; (b) homogeneous and continuous medium; (c) adiabatic process; (d) isotropic and perfectly elastic medium. Each perturbation in the fluid translates into a movement of its particles along the longitudinal axis of the tube, producing small variations in pressure and density that oscillate around an equilibrium point, so that the phenomenon can be modeled as a one-dimensional wave equation (as in the vibrating string described above) of the type

∂²ξ(x, t)/∂t² = c² ∂²ξ(x, t)/∂x².    (1.156)
Fig. 1.52 Generation scheme of a plane wave
¹⁴ When the speed of sound does not depend on amplitude and frequency, the propagation medium is assumed to be linear.
Fig. 1.53 The cubic compression or bulk modulus K is a constant that expresses a property of a material: it indicates the compressibility of the volume due to a certain external pressure
where ξ, measured in [m], represents the instantaneous acoustic displacement, i.e., the displacement of the gas particles at the passage of the acoustic wave, and c represents the velocity of sound propagation in the gas.

It is known that for the propagation speed we have c = √(K/ρ), where ρ represents the density of the fluid and the term K, called the modulus of elasticity to cubic compression or bulk modulus,¹⁵ is derived from the law of infinitesimal variation of the volume as a function of the infinitesimal variation in pressure (see Fig. 1.53), which can be written as

dp_a = −K (dV₀/V₀)    (1.157)
where p_a, in this case, represents the pressure in the absence of sound propagation and V₀ an infinitesimal volume of the gas. Thus the bulk modulus can be formally defined as

K = −V₀ (∂p_a/∂V).
In other words, the inverse of the bulk modulus indicates the compressibility of a substance. Strictly speaking, since the bulk modulus is a thermodynamic quantity, it is necessary to specify how the temperature varies in order to specify a bulk modulus (e.g., constant-temperature). In the propagation of acoustic waves, if the phenomenon is isothermal, the empirical Boyle's gas law is valid:

p_a V₀ = nkT = constant    (1.158)
where T is the absolute temperature and p_a the average atmospheric pressure. In the case of adiabatic behavior, instead, the following law is used:

p_a V₀^γ = constant    (1.159)
¹⁵ The difference between Young's modulus and bulk modulus is that Young's modulus is the ratio of tensile stress to tensile strain, while bulk modulus is the ratio of volumetric stress to volumetric strain.
where γ = C_p/C_v = 1.4 represents the ratio between the specific heat at constant pressure and at constant volume. In the propagation in air the behavior is almost adiabatic; in fact, it can be considered isothermal only at very high frequencies and, in confined rooms, at very low frequencies.

The magnitude of the exponential sequence is
• decreasing with n, for |α| < 1;
• constant, for |α| = 1;
• increasing with n, for |α| > 1.

Special cases of the expression (2.4), for α = 1, are given below:
• complex sinusoid, |A| e^{j(ωn+φ)};
• real cosine, cos(ωn + φ) = (e^{j(ωn+φ)} + e^{−j(ωn+φ)})/2;
• real sine, sin(ωn + φ) = (e^{j(ωn+φ)} − e^{−j(ωn+φ)})/(2j).
2 Discrete-Time Signals, Circuits, and System Fundamentals
Fig. 2.9 Systems for analog signal processing: a unifilar system diagram, b analog circuit approach
Fig. 2.10 Discrete-time system Single Input/Single Output (SISO), maps the input x[n] into the output y[n] through the operator T : a unifilar (one wire only) scheme; b infinite-precision algorithm or DT circuit; c finite-precision algorithm or digital circuit
2.3 Discrete-Time Circuits

The processing of signals may occur in the CT domain with analog circuits, or in the DT domain with numerical circuits [1–7]. In the case of analog signals, a unifilar systemic representation is often used, as shown in Fig. 2.9a, in which the processing is defined by a mathematical operator T such that y(t) = T{x(t)}. This schematization, although very useful from a simplified mathematical and descriptive point of view, does not take into account the energy interaction in the presence of many system blocks. In fact, analog signal processing occurs through (mostly electrical) circuits, as shown in Fig. 2.9b. When the signals are made up of sequences, signal processing must necessarily be done with algorithms or, more generally, with DT or numerical circuits. Namely, as shown in Fig. 2.10b, c, we define as a DT circuit or numerical circuit the signal processing algorithm implemented with infinite-precision arithmetic, while we define as a digital circuit the finite-precision algorithm. In audio processing devices the use of analog circuits is basically confined to specific aspects such as signal conditioning circuits (microphone preamplifiers or other transducers, anti-aliasing filters, etc.) or, at the opposite side of the chain, power amplifiers, analog crossover filters, and so on.
The current applications of DSP techniques in the audio sector, or digital audio signal processing (DASP), are innumerable. Non-real-time processing tasks, in which the signal is first stored and then processed without strict time constraints, are the most varied: filtering, insertion of effects, compression, etc. Real-time DT circuits are currently used, within the limits of the processing speed of the available hardware, in all audio sectors: from professional to consumer devices (radio, TV, PCs, smartphones, voice assistant speakers, etc.).
2.3.1 General DT System Properties and Definitions

As shown in Fig. 2.10, we can assimilate a DT system to a mathematical operator T such that y[n] = T{x[n]}. Below, we outline some general properties of the operator T that apply to the related DT circuits [1–8].

Property 2.2 Linearity — An operator T is said to be linear if the superposition principle holds, defined as

T{c₁x₁[n] + c₂x₂[n]} = c₁ T{x₁[n]} + c₂ T{x₂[n]}.    (2.6)

Property 2.3 Time-Invariance or Stationarity — If the operator T is time-invariant, the translation property holds, defined as

y[n] = T{x[n]}  ⇒  y[n − n₀] = T{x[n − n₀]}.    (2.7)
Definition 2.1 Linear time-invariant DT circuits — A DT circuit that satisfies the previous two relationships is called linear time-invariant (DT-LTI).

Definition 2.2 Causality — The operator T is causal if its output at time index n₀ depends only on the input samples with index n ≤ n₀, i.e., the past and the present but not the future. For example, a circuit that realizes the backward first difference, characterized by the relation y[n] = x[n] − x[n−1], is causal. In contrast, the relation y[n] = x[n+1] − x[n], called the forward first difference, is non-causal: in this case, in fact, the output at instant n depends on the input at the future time sample n + 1.
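The causality of the backward difference versus the non-causality of the forward difference can be made concrete with a short sketch (the names and test sequence are illustrative):

```python
def backward_difference(x, n):
    """Causal: y[n] = x[n] - x[n-1] needs only present and past samples."""
    return x[n] - (x[n - 1] if n > 0 else 0)

def forward_difference(x, n):
    """Non-causal: y[n] = x[n+1] - x[n] needs the future sample x[n+1]."""
    return x[n + 1] - x[n]

x = [1, 3, 6, 10]
y_causal = [backward_difference(x, n) for n in range(len(x))]       # every sample computable
y_noncausal = [forward_difference(x, n) for n in range(len(x) - 1)] # last output needs an unseen future sample
print(y_causal, y_noncausal)   # -> [1, 2, 3, 4] [2, 3, 4]
```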
2.3.1.1 Impulse Response
The DT impulse response, as shown in Fig. 2.11, is defined as the circuit response when the unit impulse δ[n] is applied at its input. This response is, in general, indicated as h[n]. So, one can write

h[n] = T{δ[n]}.    (2.8)
Fig. 2.11 Example of DT circuit response to a unitary impulse
2.3.1.2 Bounded-Input-Bounded-Output Stability
An operator T is said to be stable if and only if (iff) for any bounded input a bounded output is obtained. In formal terms,

y[n] = T{x[n]},  |x[n]| ≤ c₁ < ∞  ⇒  |y[n]| ≤ c₂ < ∞,  ∀n, x[n].    (2.9)

This definition is also called DT bounded-input-bounded-output stability or BIBO stability.

Remark 2.2 The DT bounded-input-bounded-output (DT-BIBO) stability definition, although formally very simple, is most of the time not useful for determining whether the operator T, or the circuit that realizes it, is stable or not. Usually, to verify that a circuit is stable, some simple criteria derived from definition (2.9) are adopted, taking into account the intrinsic structure of the circuit, the impulse response, or some significant parameters that characterize it.
2.3.2 Properties of DT Linear Time-Invariant Circuits

A special class of DT circuits, often used in digital signal processing, is that of the linear time-invariant (LTI) circuits. These systems are fully characterized by their impulse response h[n], according to the following theorem.

Theorem 2.1 Let T be a DT-LTI operator; then it is fully characterized by its impulse response. By fully characterized we mean the property that, when the input x[n] and the impulse response h[n] are known, it is always possible to calculate the circuit output y[n].

Proof From the time-invariance property, if h[n] = T{δ[n]} then h[n − n₀] = T{δ[n − n₀]}. It also follows, from the sampling property, that the sequence x[n] can be described as a sum of shifted impulses, i.e.,

x[n] = Σ_{k=−∞}^{∞} x[k] δ[n − k]    (2.10)
so it is

y[n] = T{ Σ_{k=−∞}^{∞} x[k] δ[n − k] }.    (2.11)

By linearity, it is possible to switch the operator T with the summation and, by (2.8), we get

y[n] = Σ_{k=−∞}^{∞} x[k] h[n − k];    (2.12)

the latter is generally referred to as the infinite convolution sum or, in other contexts, as the Cauchy product of two (finite/infinite) series.
2.3.2.1 Convolution Sum
The expression (2.12) shows that, for a DT-LTI circuit, when the impulse response and the input are known, the output is computed by the convolution sum. This operation, very important in DT circuits, also from a software/hardware implementation point of view, is indicated as y[n] = x[n] ∗ h[n] or y[n] = h[n] ∗ x[n], where the symbol "∗" denotes the DT convolution sum.

Remark 2.3 The convolution can be seen as a generalization of the superposition principle. Indeed, from the previous development, it appears that the output can be interpreted as the sum of many shifted impulse responses. Note also that (2.12), with a simple variable substitution, can be rewritten as

y[n] = Σ_{k=−∞}^{∞} h[k] x[n − k].    (2.13)

It is easy to show that the convolutional input–output link is a direct consequence of the superposition principle (2.5) and the translation property (2.7). In fact, the convolution defines a DT-LTI system; equivalently, a DT-LTI system is completely defined by the convolution.
2.3.2.2 Convolution Sum of Finite Duration Sequences
For finite duration sequences, denoted as

x ∈ ℝ^{N×1} ≜ [x[0] x[1] ··· x[N−1]]^T    (2.14)

and

h ∈ ℝ^{M×1} ≜ [h[0] h[1] ··· h[M−1]]^T    (2.15)
Fig. 2.12 Example of finite duration convolution. If one of the two sequences represents the impulse response of a physical system, the greater duration can be interpreted as the presence of transient phenomena at the beginning and at the end of the convolution operation
the summation extremes in the convolution sum assume finite values. Therefore, (2.13) becomes

y[n] = Σ_{k=0}^{M−1} h[k] x[n − k],  for 0 ≤ n < N + M − 1    (2.16)

(assuming x[n] = 0 outside the interval 0 ≤ n ≤ N − 1). Note that, as shown in Fig. 2.12, the output sequence duration is greater than that of the input. In the case that one of the two sequences represents an impulse response of a physical system, the greater duration is interpreted as the presence of transient phenomena at the beginning and at the end of the convolution operation.
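A minimal sketch of the finite-duration convolution (2.16), using hypothetical sequences, showing that the direct double loop reproduces the library convolution and that the output has length N + M − 1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input, N = 5
h = np.array([0.5, 0.3, 0.2])             # impulse response, M = 3

# Direct evaluation of y[n] = sum_k h[k] x[n-k] over the full output support,
# with x[n] taken as zero outside 0 <= n <= N-1
N, M = len(x), len(h)
y = np.zeros(N + M - 1)
for n in range(N + M - 1):
    for k in range(M):
        if 0 <= n - k < N:
            y[n] += h[k] * x[n - k]

print(y, "length:", len(y))   # length N + M - 1 = 7: transients at both ends
```

The result matches `np.convolve(x, h)`, whose default "full" mode computes exactly this sum.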
Fig. 2.13 Definition and constitutive relations of elements in linear DT circuits. Given the unifilar nature of the circuit we will have only one reactive component: the unit delay
2.3.3 Basic Elements of DT Circuits

Similarly to the CT case, in the DT domain it is also possible to define circuit elements through simple constitutive relations. In this case, the nature of the signal is unique (only one quantity) and symbolic (a sequence of numbers). Hence, in the DT domain, a circuit element does not represent a physical law, but rather a causal relationship between its input–output quantities. In DT circuits, since only through-type quantities are present, there is only one reactive element: the delay unit (generally indicated by the symbols D, z⁻¹ or q⁻¹). This allows the study of DT circuits through simple unifilar diagrams. Figure 2.13 presents the definition of the DT-LTI circuit elements.

Example 2.1 Consider the DT-LTI circuit in Fig. 2.14a. By visual inspection, it is easy to determine the circuit input–output relationship, which can be written as

y[n] = 3x[n] + x[n − 1] + (1/2) y[n − 1].    (2.17)

The above expression is a causal finite difference equation (FDE). Therefore, the following property applies.

Property 2.4 Any DT-LTI circuit, defined with the elements of Fig. 2.13, can always be related to an algorithm of this type. It follows that an algorithm, so formulated, can always be associated with a circuit. Consequently, we can assume the dualism algorithm ⇐⇒ circuit. Moreover, this property can easily be generalized even in the presence of nonlinear elements (with memory or memoryless) by causal nonlinear FDEs.

Definition 2.3 Signal flow graph — Note that the circuit diagram in Fig. 2.14, here obtained by simple visual inspection, is often referred to as a signal flow graph² (SFG). The SFG is of great importance in the analysis and synthesis of DT circuits because, through simple rules derived from graph theory, it allows one to derive various implementation structures of the same circuit, with possibly different characteristics of robustness, complexity, etc.

Fig. 2.14 Examples of DT circuits: a with two delay elements; b with only one delay element

Example 2.2 Calculation of the impulse response. Consider the circuit in Fig. 2.14b. By visual inspection, we can determine the difference equation that defines the causal input–output relationship

y[n] = 2x[n] − (1/2) y[n − 1].

For the impulse response calculation we must assume zero initial conditions (i.c.), y[−1] = 0. For an input x[n] = δ[n], we evaluate the output for n ≥ 0, obtaining

n = 0:  y[0] = 2·1 + 0 = 2
n = 1:  y[1] = −(1/2) y[0] = −1
n = 2:  y[2] = −(1/2) y[1] = 1/2
n = 3:  y[3] = −(1/2) y[2] = −1/4
...

Generalizing to the n-th sample, with simple considerations, the closed-form expression y[n] = (−1)ⁿ 2^{1−n} is obtained, which has infinite duration and is reported in Fig. 2.15.

Remark 2.4 In general, DT-LTI circuits with an infinite duration impulse response are referred to as infinite impulse response (IIR) filters.

² Although this representation is a "circuit" rather than a "true graph," the reader can directly derive the graphical form of the corresponding signal flow graph, i.e., a directed graph in which nodes represent system variables, and branches (edges, arcs, or arrows) represent functional connections between pairs of nodes.
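Example 2.2 can be verified numerically: a short sketch that iterates the difference equation y[n] = 2x[n] − (1/2)y[n−1] with zero initial conditions and compares the result against the closed form y[n] = (−1)ⁿ 2^{1−n}:

```python
def iir_impulse_response(n_samples):
    """Iterate y[n] = 2*x[n] - 0.5*y[n-1] with x[n] = delta[n] and zero i.c."""
    h, y_prev = [], 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y = 2.0 * x - 0.5 * y_prev
        h.append(y)
        y_prev = y
    return h

h = iir_impulse_response(8)
closed = [(-1)**n * 2.0**(1 - n) for n in range(8)]
assert all(abs(a - b) < 1e-12 for a, b in zip(h, closed))
print(h[:4])   # -> [2.0, -1.0, 0.5, -0.25]
```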
Fig. 2.15 Impulse response of the simple IIR filter in Fig. 2.14b. Furthermore, it is easy to verify that if the circuit has an infinite duration impulse response we have lim_{n→∞} h[n] = 0, otherwise the circuit would be unstable
Fig. 2.16 Block diagram for the measurement of the frequency response (amplitude and phase) of a linear DT circuit
2.3.4 Frequency Domain Representation of DT Circuits

The sinusoidal and exponential sequences e^{jωn}, as inputs of DT-LTI circuits, represent a set of eigenfunctions. In fact, the output sequence is exactly equal to the input sequence simply multiplied by a real or complex weight. Suppose we want to measure experimentally the frequency response of a DT-LTI circuit by placing at its input a sinusoidal signal of unitary amplitude and variable frequency. Since the input is an eigenfunction, it is possible to evaluate the amplitude Aₙ and phase φₙ of the output sequence for a set of frequencies, which can be reported in graphic form as represented in the diagram of Fig. 2.16: these are precisely the measured amplitude and phase responses.
2.3.4.1 Frequency Response Computation
In order to compute the frequency response in closed form, we proceed as in the empirical approach. The input is fed with a unitary-amplitude complex exponential of the type x[n] = e^{jωn}, and we evaluate the output sequence. Knowing its impulse response h[n], the circuit's output can be calculated through the convolution sum defined by (2.13); we obtain

y[n] = Σ_{k=−∞}^{∞} h[k] e^{jω(n−k)} = ( Σ_{k=−∞}^{∞} h[k] e^{−jωk} ) e^{jωn}.

It is observed that the output is calculated as the product between the input signal e^{jωn} and the quantity in brackets, in the following indicated as

H(e^{jω}) = Σ_{k=−∞}^{∞} h[k] e^{−jωk}.    (2.18)

The complex function H(e^{jω}), defined as the frequency response, shows that the steady-state response to a sinusoidal input is also a sinusoid with the same frequency as the input, but with the amplitude and phase determined by the circuit characteristics represented by the function H(e^{jω}). For this reason, as we will see later in this chapter, this function is also called the network function or transfer function (TF).
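Equation (2.18) can be evaluated numerically for a finite impulse response. The sketch below, using a hypothetical 4-tap moving-average filter, computes H(e^{jω}) at a few frequencies:

```python
import numpy as np

def freq_response(h, omega):
    """Evaluate H(e^{j*omega}) = sum_k h[k] * exp(-j*omega*k) (Eq. 2.18)
    for a finite impulse response h at an array of frequencies omega."""
    k = np.arange(len(h))
    return np.sum(h * np.exp(-1j * np.outer(omega, k)), axis=1)

h = np.ones(4) / 4.0                       # hypothetical 4-tap moving-average filter
omega = np.array([0.0, np.pi / 2, np.pi])
H = freq_response(h, omega)
print(np.abs(H))   # unit gain at DC; nulls at omega = pi/2 and pi
```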
2.3.4.2 Frequency Response Periodicity
The frequency response H(e^{jω}) is a periodic function with period 2π. In fact, if we write

H(e^{j(ω+2π)}) = Σ_{n=−∞}^{∞} h[n] e^{−j(ω+2π)n},

noting that the term e^{±j2πn} = 1, it follows that e^{−j(ω+2π)n} = e^{−jωn}. So it is true that H(e^{j(ω+2πn)}) = H(e^{jω}).
2.3.4.3 Frequency Response and Fourier Series
The H(e^{jω}) is a periodic function of ω; therefore, (2.18) can be interpreted as a Fourier series with coefficients h[n]. From this observation, we can derive the sequence of coefficients from the well-known relationship

h[n] = (1/2π) ∫_{−π}^{π} H(e^{jω}) e^{jωn} dω.    (2.19)
Remark 2.5 In practice, Eq. (2.18) allows us to evaluate the frequency-domain behavior of circuits, while relation (2.19) allows us to determine the impulse response given the frequency response. The pair of Eqs. (2.18) and (2.19) represents a linear transformation that allows us to represent a circuit in the time or in the frequency domain. This transformation, valid not only for impulse responses but extendable to any sequence, is denoted as the Discrete-Time Fourier Transform (DTFT). In the remainder of the chapter, the DTFT will be reintroduced in an axiomatic and less intuitive way. In addition, some properties and links to other types of linear transformations will be introduced.
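Relation (2.19) can be checked numerically: for an ideal lowpass response H(e^{jω}) with cutoff ω_c, the integral yields the well-known result h[n] = sin(ω_c n)/(πn), with h[0] = ω_c/π. A sketch using simple midpoint-rule integration (the discretization parameters are arbitrary):

```python
import numpy as np

def inverse_dtft(H, n, num=20000):
    """Midpoint-rule approximation of h[n] = (1/2pi) * integral over [-pi, pi]
    of H(e^{jw}) e^{jwn} dw (Eq. 2.19)."""
    dw = 2.0 * np.pi / num
    w = -np.pi + dw * (np.arange(num) + 0.5)
    return np.sum(H(w) * np.exp(1j * w * n)) * dw / (2.0 * np.pi)

wc = np.pi / 4                                     # cutoff of the ideal lowpass
H_lp = lambda w: (np.abs(w) <= wc).astype(float)

for n in range(4):
    ideal = wc / np.pi if n == 0 else np.sin(wc * n) / (np.pi * n)
    print(n, inverse_dtft(H_lp, n).real, ideal)    # the two columns agree
```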
2.4 DT Circuits Representation in Transformed Domains The signal analysis and DT circuits design can be facilitated if performed in the frequency domain. In fact, it is possible to represent signals and systems in various domains. Therefore, it is useful to see briefly the definitions and basic concepts of the z-transform and its relation with the Fourier transform [2–10].
2.4.1 The z-Transform

Given a DT signal x[n] as an arbitrary sequence defined in the Hilbert space of quadratically summable sequences, indicated as x[n] ∈ l²(ℝ, ℂ), for n ∈ ℤ, and let z ∈ ℂ be a variable denoted as complex frequency; the z-transform of the sequence x[n] is defined by the following equation pair:

X(z) = Z{x[n]} ≜ Σ_{n=−∞}^{∞} x[n] z^{−n},  direct z-transform    (2.20)

denoted as the bilateral direct z-transform, and its inverse

x[n] = Z^{−1}{X(z)} ≜ (1/2πj) ∮_C X(z) z^{n−1} dz,  inverse z-transform.    (2.21)
One can see that X(z) is an infinite power series in the complex variable z, where the sequence x[n] plays the role of the series coefficients. Thus, it can be considered a discrete-time equivalent of the Laplace transform. In general, this series converges to a finite value only for certain values of z. A sufficient condition for convergence is given by

Σ_{n=−∞}^{∞} |x[n]| |z|^{−n} < ∞.    (2.22)

The set of values of z for which the series converges defines a region in the complex z-plane, called the region of convergence (ROC). This region is delimited by two circles of radius R₁ and R₂, of the type R₁ < |z| < R₂.
Example 2.3 Given the sequence x[n] = δ[n − n₀], its z-transform is X(z) = z^{−n₀}. Let x[n] = u[n] − u[n − N]; it follows that X(z) is

X(z) = Σ_{n=0}^{N−1} z^{−n} = (1 − z^{−N}) / (1 − z^{−1}).

In both examples the sequence x[n] has a finite duration. X(z) thus appears to be a polynomial in the variable z^{−1}, and the ROC is all the z-plane except the point z = 0. Thus, all finite-length sequences have a ROC of the type 0 < |z| < ∞.

Definition 2.4 Unilateral z-transform — Let x[n] be a sequence defined only for n ≥ 0; we define the one-sided or single-sided or unilateral z-transform as

X(z) = Σ_{n=0}^{∞} x[n] z^{−n}.

In this case, X(z) turns out to be a power series. Moreover, the unilateral z-transform is applicable to causal DT circuits, i.e., those described by causal FDEs with given i.c. The ROC of X(z) is always the exterior of a circle, |z| > R₁, and hence need not be specified.

Example 2.4 Let x[n] = aⁿu[n] be a sequence defined for n ≥ 0; its unilateral z-transform can be written as

X(z) = Σ_{n=0}^{∞} aⁿ z^{−n} = 1 / (1 − a z^{−1}),  |z| > |a|.

In this case, X(z) turns out to be a geometric power series for which there exists a closed-form expression of the sum. This is a typical result for infinite-length sequences defined for positive n.

Example 2.5 Let x[n] = −bⁿu[−n − 1]; its z-transform can be written as

X(z) = −Σ_{n=−∞}^{−1} bⁿ z^{−n} = 1 / (1 − b z^{−1}),  |z| < |b|.

Here the infinite-length sequence x[n] is defined for negative n. In this case the ROC has the form |z| < R₂. The most general case, where x[n] is defined for −∞ < n < ∞, can be seen as a combination of the previous cases; the region of convergence is thus R₁ < |z| < R₂. There are theorems and important properties of the z-transform that are very useful for the study of linear systems. A non-exhaustive list of such properties is shown in Table 2.1.

Table 2.1 Main properties of the z-transform

Property               Sequence            z-transform
Linearity              a x₁[n] + b x₂[n]   a X₁(z) + b X₂(z)
Translation            x[n − m]            z^{−m} X(z)
Exponential weighting  aⁿ x[n]             X(z/a)
Linear weighting       n x[n]              −z (dX(z)/dz)
Temporal inversion     x[−n]               X(z^{−1})
Convolution            x[n] ∗ h[n]         X(z) H(z)
Product                x[n] w[n]           (1/2πj) ∮_C X(v) W(z/v) v^{−1} dv
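The closed form of Example 2.4 can be checked numerically at a point inside the ROC: the truncated power series Σ aⁿz^{−n} converges to 1/(1 − az^{−1}) when |z| > |a|. A minimal sketch (the values of a and z are arbitrary):

```python
import cmath

a = 0.5
z = cmath.rect(1.2, 0.7)          # |z| = 1.2 > |a| = 0.5: the point lies inside the ROC
partial = sum((a / z)**n for n in range(200))   # truncated sum of a^n z^{-n}, n >= 0
closed = 1 / (1 - a / z)          # closed form 1/(1 - a z^{-1})
print(abs(partial - closed))      # negligible truncation error
```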
2.4.2 Discrete-Time Fourier Transform

As introduced in Sect. 2.3.4, just as signals can be defined in CT or DT, transformations can also be defined in a continuous or discrete domain. For a DT signal x[n], it is possible to define a CT transform by the relations (2.18) and (2.19), which are not restricted to circuit impulse responses only. In fact, this is possible by applying (2.18) and (2.19) to any sequence, provided the existence conditions hold. A sequence x[n] can be represented by the relation pair (2.18)–(2.19), known as the Discrete-Time Fourier Transform (DTFT), rewritten below as
X(e^{jω}) = Σ_{n=−∞}^{∞} x[n] e^{−jωn},   direct DTFT    (2.23)

x[n] = (1/2π) ∫_{−π}^{π} X(e^{jω}) e^{jωn} dω,   inverse DTFT.    (2.24)

2.4.2.1 Discrete-Time Fourier Transform Existence Condition
The existence condition of the transform of a sequence x[n] is simply its computability, namely:

1. If x[n] is absolutely summable, X(e^{jω}) exists and is a continuous function of ω (sufficient condition):

Σ_{n=−∞}^{∞} |x[n]| ≤ c < ∞  →  uniform convergence.

2. If x[n] is quadratically summable, then X(e^{jω}) exists and is, in general, a discontinuous function of ω (sufficient condition):

Σ_{n=−∞}^{∞} |x[n]|² ≤ c < ∞  →  non-uniform convergence.

3. If x[n] is neither absolutely nor quadratically summable, then X(e^{jω}) exists only in special cases.

Remark 2.6 Note that a sequence summable in modulus has finite energy. In general, the reverse is not true; in other words, there are sequences x[n] that, despite having finite energy, are not summable in modulus.

Example 2.6 Given a complex exponential sequence defined as x[n] = e^{−jω₀n}, −∞ < n < ∞, its DTFT can be written as

X(e^{jω}) = Σ_{n=−∞}^{∞} 2π δ(ω + ω₀ + 2πn)

where δ denotes the continuous-time Dirac impulse function.
Fig. 2.17 DTFT and z-transform link. The DTFT can be generated by setting z = e jω in the z-transform
Remark 2.7 From the previous expressions one can simply deduce that:
• a stable circuit always has a frequency response;
• a circuit with a bounded impulse response (|h[n]| < ∞, ∀n) of finite duration, later called a Finite Impulse Response (FIR) filter, always has a frequency response and is therefore always stable.
2.4.2.2 DTFT and z-Transform Link
One can easily observe that Eqs. (2.23) and (2.24) may be seen as a particular case of the z-transform (see (2.20) and (2.21)). The Fourier representation is in fact obtained by considering the z-transform only on the unit circle of the z-plane, as shown in Fig. 2.17. As indicated above, the DTFT can be simply formulated by setting z = e^{jω} in the z-transform. In the first of the examples discussed above it is clear that, since the ROC of X(z) includes the unit circle, the DTFT also converges. In the other examples, the DTFT exists only if |a| < 1 and |b| > 1, respectively. Note that these conditions correspond to exponentially decreasing sequences and, therefore, to BIBO-stable circuits.
2.4.2.3 Convolution Theorem
Table 2.1 shows the convolution property for the z-transform. We now show that this property (as well as others) is also valid for the DTFT.

Theorem 2.2 For a linear circuit with impulse response h[n], to whose input a sequence x[n] is applied, the following relations hold:

y[n] = h[n] ∗ x[n]  ⇔  Y(e^{jω}) = H(e^{jω}) X(e^{jω})
and

y[n] = h[n] x[n]  ⇔  Y(e^{jω}) = (1/2π) H(e^{jω}) ∗ X(e^{jω})

where "∗" denotes the (periodic) convolution operator. That is, convolution in the time domain is equivalent to multiplication in the frequency domain, and vice versa.

Proof From Eqs. (2.13) and (2.24), the output of the DT circuit can be written as

y[n] = Σ_{k=−∞}^{∞} h[k] x[n − k] = Σ_{k=−∞}^{∞} h[k] [ (1/2π) ∫_{−π}^{π} X(e^{jω}) e^{jω(n−k)} dω ].

Separating the variables and, by linearity, switching the integration with the summation, we obtain

y[n] = (1/2π) ∫_{−π}^{π} [ Σ_{k=−∞}^{∞} h[k] e^{−jωk} ] X(e^{jω}) e^{jωn} dω.

From the transform definition (2.18), we can write

y[n] = (1/2π) ∫_{−π}^{π} H(e^{jω}) X(e^{jω}) e^{jωn} dω;

finally, again by definition (2.24), for the output we have Y(e^{jω}) = H(e^{jω}) X(e^{jω}). With similar considerations, it is also easy to prove the inverse property.
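Theorem 2.2 can be verified on samples of the DTFT: evaluating the transforms of x, h and y = x ∗ h on a common grid of frequencies via zero-padded FFTs (length N + M − 1), the transform of the convolution equals the product of the transforms. The sequences below are hypothetical:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])             # hypothetical input sequence
h = np.array([0.5, 0.25])                  # hypothetical impulse response
y = np.convolve(x, h)                      # time-domain convolution, length N+M-1

L = len(y)
Y = np.fft.fft(y, L)                       # samples of Y(e^{jw}) on a uniform grid
HX = np.fft.fft(h, L) * np.fft.fft(x, L)   # product H(e^{jw}) X(e^{jw}) on the same grid

assert np.allclose(Y, HX)
print("Y(e^jw) = H(e^jw) X(e^jw) verified on", L, "frequency samples")
```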
2.4.2.4 Amplitude and Phase Response
The frequency response is a complex function of ω; it can be written as

H(e^{jω}) = H_R(e^{jω}) + j H_I(e^{jω})    (2.25)

where the terms H_R(e^{jω}) and H_I(e^{jω}) are real functions of ω and represent, respectively, the real and imaginary parts of the frequency response. The complex function H(e^{jω}) can be expressed in terms of modulus and phase as

H(e^{jω}) = |H(e^{jω})| e^{j∠H(e^{jω})} = A(ω) e^{jφ(ω)}    (2.26)

where

A(ω) = √( H_R²(e^{jω}) + H_I²(e^{jω}) )    (2.27)

φ(ω) = tan^{−1}( H_I(e^{jω}) / H_R(e^{jω}) )    (2.28)

and where the real functions of real variable A(ω) and φ(ω) represent, respectively, the amplitude and phase response. Sometimes, it is more convenient to consider the group delay τ(e^{jω}), defined as

τ(e^{jω}) = −dφ(ω)/dω.    (2.29)
Example 2.7 Calculate the amplitude and phase response of a circuit characterized by the real exponential impulse response h[n] of the type

h[n] = aⁿ u[n],  |a| < 1.    (2.30)

From (2.18) we get

H(e^{jω}) = Σ_{n=0}^{∞} h[n] e^{−jωn} = Σ_{n=0}^{∞} (a e^{−jω})ⁿ.

For |a| < 1 the previous expression converges to

H(e^{jω}) = 1 / (1 − a e^{−jω}) = 1 / (1 − a(cos ω − j sin ω)) = 1 / ((1 − a cos ω) + j a sin ω).

Calculating the modulus, we get

A(ω) = 1 / √( (1 − a cos ω)² + (a sin ω)² ) = 1 / √(1 − 2a cos ω + a²);

for the phase, instead, we have

φ(ω) = −arctan( a sin ω / (1 − a cos ω) ).

The periodic trends of amplitude and phase are plotted in Fig. 2.18.
2 Discrete-Time Signals, Circuits, and System Fundamentals
Fig. 2.18 Amplitude and phase response of the sequence h[n] = a n u[n] for |a| < 1. The amplitude and phase trend is periodic with period 2π
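The closed-form expressions for A(ω) and φ(ω) above can be checked against a truncated DTFT sum; a = 0.8 and ω = 1 rad below are arbitrary illustrative choices, and the sum is truncated at 200 terms since they decay geometrically.

```python
import cmath, math

a = 0.8           # example pole radius, |a| < 1 (illustrative choice)
omega = 1.0       # evaluation frequency in radians

# Truncated DTFT sum of h[n] = a^n u[n]; the residual term is about a^200
H = sum((a ** n) * cmath.exp(-1j * omega * n) for n in range(200))

# Closed-form amplitude and phase from Example 2.7
A_closed = 1.0 / math.sqrt(1 - 2 * a * math.cos(omega) + a * a)
phi_closed = -math.atan2(a * math.sin(omega), 1 - a * math.cos(omega))

assert abs(abs(H) - A_closed) < 1e-9
assert abs(cmath.phase(H) - phi_closed) < 1e-9
```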
2.4.2.5
The z-Domain Transfer Function and Relationship with DTFT
Given a DT circuit, the transfer function (TF) H(z) is defined as the z-domain ratio between the output and the input

H(z) ≜ Y(z)/X(z).   (2.31)

As previously noted, for a DT-LTI circuit the frequency response is defined as H(z)|_{z=e^{jω}} = H(e^{jω}), so the DTFT coincides with the TF evaluated on the unit circle of the z-plane. In the case that X(z) = 1, i.e., x[n] = δ[n], by definition (2.31) and by the convolution theorem, it results that the impulse response can be recovered from the TF as

h[n] = Z^{−1}{H(z)}.   (2.32)
The previous relation generalizes the Fourier series relation previously introduced.
2.4.3 Discrete Fourier Transform

A DT sequence is said to be periodic, of period N, if x̃[n] = x̃[n + N], −∞ < n < ∞.
Similar to the Fourier series for analog signals, a periodic sequence can be represented as a discrete sum of sine waves. Thus, we can define the Fourier series for a periodic sequence x̃[n] as

X̃(k) = Σ_{n=0}^{N−1} x̃[n] e^{−j(2π/N)kn},   (2.33)

x̃[n] = (1/N) Σ_{k=0}^{N−1} X̃(k) e^{j(2π/N)kn}.   (2.34)
The Fourier series is an exact representation of the periodic sequence. In practical cases, however, it is of great use to give a different interpretation to the above equations. Consider, for example, a sequence x[n] of finite duration N, i.e., null outside the interval [0, N − 1]. Its z-transform will then result in

X(z) = Σ_{n=0}^{N−1} x[n] z^{−n}.

If we evaluate X(z) over a set of uniformly spaced points around the unit circle, z_k = e^{j2πk/N}, k = 0, 1, ..., N − 1, we obtain

X(e^{j(2π/N)k}) = Σ_{n=0}^{N−1} x[n] e^{−j(2π/N)kn},  k = 0, 1, ..., N − 1.

It can be observed that this expression is formally identical to the Fourier series of the periodic sequence of period N (Eq. 2.33). A sequence of length N can then be represented exactly by the pair of equations defined as the direct discrete Fourier transform (DFT) and its inverse (IDFT)

X(k) = Σ_{n=0}^{N−1} x[n] e^{−j(2π/N)kn},  k = 0, 1, ..., N − 1,   (2.35)

x[n] = (1/N) Σ_{k=0}^{N−1} X(k) e^{j(2π/N)nk},  n = 0, 1, ..., N − 1.   (2.36)
Remark 2.8 The DFT represents, in practice, a simple method for calculating the DTFT. When using the DFT, one must always remember that one is representing a periodic sequence of period N. Table 2.1 shows some of its properties.

Property 2.5 The DFT X(k) can be seen as a version of X(z) sampled at N points on the unit circle; i.e., by the previous definition of the DTFT, the DFT has similar properties with respect to the z-transform and the DTFT.
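A direct, if inefficient, implementation of the DFT/IDFT pair (2.35)–(2.36) also makes the sampling interpretation explicit: the DFT coefficients are exactly the DTFT of the finite sequence evaluated at ω_k = 2πk/N. The test sequence below is an arbitrary example.

```python
import cmath

def dtft(x, omega):
    # DTFT of a finite-length sequence evaluated at a single frequency
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

def dft(x):
    # Direct DFT, Eq. (2.35)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1.0, -2.0, 0.5, 3.0]
N = len(x)
X = dft(x)
# The DFT samples the DTFT at the N uniformly spaced frequencies 2*pi*k/N
for k in range(N):
    assert abs(X[k] - dtft(x, 2 * cmath.pi * k / N)) < 1e-12

# The IDFT, Eq. (2.36), recovers the original sequence
xr = [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
      for n in range(N)]
assert all(abs(a - b) < 1e-12 for a, b in zip(xr, x))
```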
2.4.3.1 The Fast Fourier Transform Algorithm
The N values of the DFT can be computed very efficiently by means of a family of algorithms called the fast Fourier transform (FFT) [9–11]. The FFT algorithm is among the most widely used in many areas of numerical signal processing, such as spectral estimation, numerical filtering, and many other applications. The fundamental paradigm for FFT development is that of divide et impera (divide and conquer). The computation of the DFT is divided into simpler blocks, and the entire DFT is computed by subsequent re-aggregation of the various sub-blocks. In Cooley and Tukey's original algorithm [9], the length of the sequence is a power of two, N = 2^B, i.e., B = log₂N. The computation of the entire transform is divided into two DFTs that are (N/2) long. Each of these is computed by further splitting into (N/4)-long sequences, and so on.

Remark 2.9 Without going into the merits of the algorithm, whose details and variants can be found in [9–11], we want to emphasize in these notes that the computational cost of an FFT of a sequence of length N is proportional to N log₂N, while for a direct DFT it is proportional to N².
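The divide-and-conquer splitting can be sketched as a minimal recursive radix-2 Cooley–Tukey FFT (illustrative only; production FFTs are iterative and in-place), cross-checked here against the direct DFT definition:

```python
import cmath

def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])          # (N/2)-point DFT of the even-indexed samples
    odd = fft(x[1::2])           # (N/2)-point DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

# Cross-check against the direct O(N^2) DFT definition, Eq. (2.35)
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
N = len(x)
X_dft = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
         for k in range(N)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), X_dft))
```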
2.4.4 Ideal Filters

Filters are circuits capable of allowing certain frequencies present in the input signal to pass, while attenuating others (e.g., noise, unwanted components, etc.). Ideal filters are those that, instead of attenuating, completely eliminate the unwanted frequencies [3–7]. The classification of linear filters is generally made in relation to the range of frequencies maintained in the filter's output signal. The term lowpass is used when the output signal has unattenuated components below a certain frequency, called the cutoff frequency. Other types of filter, with characteristics easily guessed by the reader, are the highpass, the bandpass, and the bandstop. The frequency responses and respective ideal DTFTs of ideal filters are shown in Fig. 2.19.

Fig. 2.19 Frequency response or mask and definition of ideal filters. From top: lowpass, highpass, bandpass and bandstop

The impulse response of the ideal lowpass filter can be derived simply from the expression (2.18) as

h_id[n] = (1/2π) ∫_{−π}^{π} H_LP(e^{jω}) e^{jωn} dω = (1/2π) ∫_{−ω_c}^{ω_c} e^{jωn} dω = (e^{jω_c n} − e^{−jω_c n})/(j2πn) = sin(ω_c n)/(πn)

which, defining the function sinc(ω_c n) ≜ sin(ω_c n)/(ω_c n), for ω_c = 2π f_c, is typically rewritten as

h_id[n] = 2 f_c sinc(ω_c n),  −∞ < n < ∞.   (2.37)
Note that this filter has infinite duration and is non-causal, since the impulse response is also defined for negative n. An example of the impulse response of a lowpass filter, with ω_c = 2π·0.2, obtained by truncating the impulse response of the ideal filter Eq. (2.37), is shown in Fig. 2.20. From the figure, it can be seen that the truncated filter response exhibits a certain ripple (Gibbs phenomenon) around the ideal response, whose amplitude does not depend on the truncation length. Given the non-causality (the response is symmetric about n = 0), the phase response, not shown in the figure, is zero. In filtering implementations, the impulse response of the filter is causalized by translating it by an amount such that h[n] is defined for 0 ≤ n ≤ N − 1. This introduces a nonzero phase contribution and, in the case of a symmetric impulse response, a linear one (i.e., with constant group delay for all frequencies).
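The truncation and causalization just described can be sketched as follows; f_c = 0.2 and the length M = 51 are illustrative choices in the spirit of Fig. 2.20:

```python
import math

def ideal_lowpass(fc, M):
    # Truncated, causalized ideal lowpass: h_id[n] = 2*fc*sinc(wc*n), Eq. (2.37),
    # shifted by (M-1)/2 so that the filter is defined for 0 <= n <= M-1
    wc = 2 * math.pi * fc
    h = []
    for n in range(M):
        m = n - (M - 1) / 2          # centered index
        h.append(2 * fc if m == 0 else math.sin(wc * m) / (math.pi * m))
    return h

h = ideal_lowpass(0.2, 51)           # illustrative cutoff and length
assert abs(h[25] - 0.4) < 1e-12      # center tap equals 2*fc
# Symmetric impulse response -> linear phase after causalization
assert all(abs(h[n] - h[50 - n]) < 1e-12 for n in range(51))
```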
2.5 Discrete-Time Signal Representation with Unitary Transformations

Consider a real or complex sequence of finite length

x ∈ (R, C)^{N×1} ≜ [x[0] x[1] · · · x[N − 1]]^T.   (2.38)
Fig. 2.20 Impulse response and frequency response of a non-causal ideal filter with truncated h[n]
Denoting by F ∈ (R, C) N ×N an appropriate invertible matrix, called basis matrix or kernel matrix, consider the linear transformation defined as X = F · x.
(2.39)
The vector X ∈ (R, C)^{N×1} ≜ [X(0) X(1) · · · X(N − 1)]^T contains the values of the sequence x represented in the domain described by the basis matrix F. Thus, Eq. (2.39) is defined as the direct transformation and X represents the transformed signal. Similarly, the inverse transformation is defined as

x = F^{−1} · X.   (2.40)
The transformation is said to be a unitary transformation if it preserves the inner product, i.e., if for a real matrix F ∈ R^{N×N} we have

F^{−1} = F^T ⇔ F F^T = I   (2.41)

and, similarly, in the complex case, for F ∈ C^{N×N}, it holds

F^{−1} = F^H ⇔ F F^H = I   (2.42)

where the superscript (H) denotes the conjugate transpose (Hermitian) matrix.

Property 2.6 Parseval property—The unitary transformation defined by the basis F rotates the vector x without changing its length. Indicating with ‖·‖ the norm (i.e., the length), it is ‖X‖ = ‖x‖; in fact, we can write

‖X‖² = X^H X = [F·x]^H F·x = x^H · F^H F · x = ‖x‖².   (2.43)

In general, the basis matrix F can be expressed as
F ≜ K ⎡ F_N^{00}        F_N^{01}        ···  F_N^{0(N−1)}
        F_N^{10}        F_N^{11}        ···  F_N^{1(N−1)}
        ⋮               ⋮               ⋱    ⋮
        F_N^{(N−1)0}    F_N^{(N−1)1}    ···  F_N^{(N−1)(N−1)} ⎦   (2.44)

where F_N is a real or complex number defined by the nature of the transformation itself and K is a normalization constant.
2.5.1 DFT as Unitary Transformation

The DFT can be interpreted as a unitary transformation if in (2.39) the matrix is complex, F ∈ C^{N×N}, and chosen based on the definition of the DFT (2.35). In fact, the summation in (2.35) can be interpreted as a matrix-vector product if the matrix is formed by the components of the DFT itself. For (2.35) to be identical to the definition (2.39), the coefficients F_N of the transformation matrix F must be determined as F_N = e^{−j2π/N} = cos(2π/N) − j sin(2π/N). The DFT matrix is defined as F ≜ { f_{k,n}^{DFT} = F_N^{kn}, k, n ∈ [0, N − 1] }, i.e.,

f_{k,n}^{DFT} = e^{−j(2π/N)kn} = cos((2π/N)kn) − j sin((2π/N)kn),  k, n = 0, 1, ..., N − 1   (2.45)

or, in explicit terms,

F ≜ K ⎡ 1  1                1                ···  1
        1  e^{−j2π/N}       e^{−j4π/N}       ···  e^{−j2π(N−1)/N}
        1  e^{−j4π/N}       e^{−j8π/N}       ···  e^{−j4π(N−1)/N}
        ⋮  ⋮                ⋮                ⋱    ⋮
        1  e^{−j2π(N−1)/N}  e^{−j4π(N−1)/N}  ···  e^{−j2π(N−1)²/N} ⎦.   (2.46)

The matrix F is symmetric and complex and, from its definition, the reader can easily observe that (2.42) is verified (F^H = F^{−1} and F F^H = I) provided that, in (2.45), K = 1/√N. The multiplication by the term 1/√N is inserted precisely to make the transformation unitary. From the above expressions, the IDFT (2.36) can be calculated as

x = F^H · X.   (2.47)
The DFT, in practice, can be defined as an invertible linear transformation that maps a real or complex sequence into another complex sequence. In formal terms, it can be stated as DFT⇒ f : (C, R) N → C N .
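The unitarity (2.42) and the Parseval property (2.43) of the normalized DFT matrix can be verified numerically; N = 8 and the test vector below are arbitrary illustrative choices.

```python
import cmath, math

def dft_matrix(N):
    # Unitary DFT matrix, Eqs. (2.45)-(2.46), with K = 1/sqrt(N)
    K = 1 / math.sqrt(N)
    return [[K * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)]
            for k in range(N)]

def matvec(F, x):
    return [sum(f * v for f, v in zip(row, x)) for row in F]

N = 8
F = dft_matrix(N)
# Check F F^H = I (unitarity, Eq. (2.42)) element by element
for k in range(N):
    for m in range(N):
        s = sum(F[k][n] * F[m][n].conjugate() for n in range(N))
        assert abs(s - (1.0 if k == m else 0.0)) < 1e-12

# Parseval: the unitary transform preserves the norm, Eq. (2.43)
x = [1.0, -2.0, 0.5, 0.0, 3.0, 1.5, -1.0, 2.0]
X = matvec(F, x)
assert abs(sum(abs(v) ** 2 for v in X) - sum(v ** 2 for v in x)) < 1e-9
```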
2.5.2 Discrete Hartley Transform

In the case of real sequences it is possible, and often convenient, to use transformations defined in the real domain, f : R^N → R^N [14–19]. In fact, for real signals, complex arithmetic imposes a computational load that is not always strictly necessary. The discrete Hartley transform (DHT) is defined as

X(k) = Σ_{n=0}^{N−1} x[n] [ cos((2π/N)kn) + sin((2π/N)kn) ],  k = 0, 1, ..., N − 1.   (2.48)

Whereby, in (2.44), F_N = cos(2π/N) + sin(2π/N); then the DHT matrix can be defined as F ≜ { f_{k,n}^{DHT} = F_N^{kn}, k, n ∈ [0, N − 1] }

f_{k,n}^{DHT} = cos((2π/N)kn) + sin((2π/N)kn),  k, n = 0, 1, ..., N − 1.   (2.49)

To verify the unitarity conditions (2.42), as for the DFT, K = 1/√N applies. In practice, for real signals the DHT conveys the same information as the DFT.
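The relation between the DHT and the DFT of a real sequence can be checked numerically: the unnormalized DHT equals Re{X(k)} − Im{X(k)}, and applying the DHT twice returns N·x. The test sequence below is an arbitrary example.

```python
import cmath, math

def dht(x):
    # Discrete Hartley transform, Eq. (2.48) (unnormalized)
    N = len(x)
    return [sum(x[n] * (math.cos(2 * math.pi * k * n / N) +
                        math.sin(2 * math.pi * k * n / N)) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, -0.5, 3.0, 0.0, -1.0]
N = len(x)
# Unnormalized DFT of the same real sequence
X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
     for k in range(N)]
# For a real sequence: DHT = Re{DFT} - Im{DFT} (same information, real arithmetic)
H = dht(x)
assert all(abs(H[k] - (X[k].real - X[k].imag)) < 1e-9 for k in range(N))
# The unnormalized DHT applied twice recovers N*x (self-inverse up to 1/N)
HH = dht(H)
assert all(abs(HH[n] - N * x[n]) < 1e-9 for n in range(N))
```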
2.5.3 Discrete Sine and Cosine Transforms

The audio signal is typically encoded as a simple real sequence, so it is often represented in terms of transforms specialized for real signals. In the discrete cosine transform (DCT) [14] and in the discrete sine transform (DST), the sequence can only be real, x ∈ R^N, and is represented in terms of a series of only real functions of cosine and sine type. In formal terms: DCT/DST ⇒ f : R^N → R^N. In particular, DCT and DST are transformations similar but not identical to the DFT, applicable only to real sequences. A number of variants of DCT/DST are defined in the literature. Unlike the DFT, which is uniquely defined, the real transformations DCT and DST can be defined in different ways depending on the type of periodicity imposed on the sequence x[n] of finite length³ N (for details see [3]). There are (at least) four variants reported in the literature, and the one called type II, which is based on a 2N periodicity, turns out to be the most widely used.

³ Given a sequence x[n] with 0 ≤ n ≤ N − 1, there are multiple ways to make it periodic, depending on the type of aggregation of segments and the type of symmetry, even or odd, chosen.
2.5.3.1 DCT-II

The cosine transform, DCT-II version, is defined as

X(k) = K_k Σ_{n=0}^{N−1} x[n] cos( (π/N)((2n + 1)/2)k ),  k = 0, 1, ..., N − 1.   (2.50)

In terms of unitary transformation, the coefficients of the basis matrix F (see [14, 15]) are defined as

f_{k,n}^{DCT} = K_k cos( π(2n + 1)k / (2N) ),  n, k = 0, 1, ..., N − 1   (2.51)

where, for F^{−1} = F^T, it turns out that

K_k = 1/√N for k = 0, and K_k = √(2/N) for k > 0.   (2.52)

2.5.3.2 DST-II

The DST-II version is defined as

X(k) = K_k Σ_{n=0}^{N−1} x[n] sin( π(2n + 1)(k + 1) / (2N) ),  k = 0, 1, ..., N − 1.   (2.53)

Accordingly, the elements of the matrix F are defined as

f_{k,n}^{DST} = K_k sin( π(2n + 1)(k + 1) / (2N) ),  n, k = 0, 1, ..., N − 1   (2.54)

with K_k defined as in (2.52).

Remark 2.10 Note that the DCT, DST, and other transformations can be computed with fast algorithms based on, or similar to, the FFT [14–20].
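The orthonormality of the DCT-II basis (2.51)–(2.52) can be checked directly; N = 8 and the test vector below are arbitrary illustrative choices.

```python
import math

def dct2_matrix(N):
    # Orthonormal DCT-II basis matrix, Eqs. (2.51)-(2.52)
    F = []
    for k in range(N):
        Kk = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        F.append([Kk * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N)])
    return F

N = 8
F = dct2_matrix(N)
# F F^T = I, hence F^{-1} = F^T: the transformation is orthogonal (unitary)
for k in range(N):
    for m in range(N):
        s = sum(F[k][n] * F[m][n] for n in range(N))
        assert abs(s - (1.0 if k == m else 0.0)) < 1e-12

# Round trip: x -> X = F x -> x = F^T X
x = [1.0, -1.0, 2.0, 0.5, 0.0, 3.0, -2.0, 1.5]
X = [sum(F[k][n] * x[n] for n in range(N)) for k in range(N)]
xr = [sum(F[k][n] * X[k] for k in range(N)) for n in range(N)]
assert all(abs(a - b) < 1e-9 for a, b in zip(xr, x))
```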
2.5.4 Haar Unitary Transform

Given a real analog signal x(t), t ∈ [0, 1), divided into N = 2^b tracts, i.e., sampled with sampling period equal to t_s = 1/N, the Haar transform can be defined as [16]

X(k) = K Σ_{n=0}^{N−1} x(t_s · n) φ_k(t_s · n),  k = 0, 1, ..., N − 1   (2.55)

so that in continuous time the family of Haar functions is defined in the interval t ∈ [0, 1). For a generic index k, write

k = 2^p + q − 1,  p, q ∈ Z   (2.56)

where p is such that 2^p ≤ k, i.e., 2^p is the greatest power of two contained in k, and (q − 1) is the remainder, i.e., q = k − 2^p + 1. For k = 0 it holds

φ_0(t) = 1/√N   (2.57)

i.e., the function φ_0(t) is constant for t ∈ [0, 1).

Remark 2.11 Haar basis functions can be constructed as dilations and translations of a starting function denoted, sometimes, as the mother function. For k ≥ 1 it holds

φ_k(t) = (1/√N) ⎧ 2^{p/2},   (q − 1)/2^p ≤ t < (q − ½)/2^p
                 ⎨ −2^{p/2},  (q − ½)/2^p ≤ t < q/2^p        with q = k − 2^p + 1   (2.58)
                 ⎩ 0,         otherwise

for which φ_k(t) assumes constant alternating positive and negative or null values. From the above expression, in fact, we can see that p determines the amplitude and width of the nonzero part of the function φ_k(t), while q determines the position of the alternating nonzero part of the function. Fig. 2.21 shows the trend of some Haar basis functions computed with (2.58).
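A sampled version of the Haar basis (2.58) can be generated and checked for orthonormality as follows; N = 8 is an illustrative power of two, and the sampling follows the t = n·t_s grid of Eq. (2.55).

```python
import math

def haar_matrix(N):
    # Sampled Haar basis from Eqs. (2.56)-(2.58); N must be a power of two
    F = [[1.0 / math.sqrt(N)] * N]          # phi_0 is constant, Eq. (2.57)
    for k in range(1, N):
        p = int(math.log2(k))               # greatest power of two with 2^p <= k
        q = k - 2 ** p + 1
        row = []
        for n in range(N):
            t = n / N                       # sample instants t = n*ts, ts = 1/N
            if (q - 1) / 2 ** p <= t < (q - 0.5) / 2 ** p:
                row.append(2 ** (p / 2) / math.sqrt(N))
            elif (q - 0.5) / 2 ** p <= t < q / 2 ** p:
                row.append(-(2 ** (p / 2)) / math.sqrt(N))
            else:
                row.append(0.0)
        F.append(row)
    return F

N = 8
F = haar_matrix(N)
# The sampled Haar basis is orthonormal: F F^T = I
for k in range(N):
    for m in range(N):
        s = sum(F[k][n] * F[m][n] for n in range(N))
        assert abs(s - (1.0 if k == m else 0.0)) < 1e-12
```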
Fig. 2.21 Shapes of some Haar basis functions computed with Eq. (2.58)

2.5.5 Data-Dependent Unitary Transformation

Transformations such as the DFT, DCT, DST, etc., are usually referred to as data-independent transformations. In fact, the basis functions that define F are a priori defined, fixed, and chosen in a way that is completely independent of the input data. In the case where the basis depends on the data itself, the matrix F must be computed according to some optimum criterion [21, 22]. In addition, when the signal is non-stationary, F must be determined at run-time at considerable computational cost.⁴ In a data-dependent transformation, the basis matrix is a function of the input signal statistics. Since the input sequence may have time-varying statistical characteristics, F must be recalculated (or updated), even in real time, whenever the input statistics change. For this purpose, we define an M-length sliding window

x_n ∈ (R, C)^{M×1} ≜ [x[n] x[n − 1] · · · x[n − M + 1]]^T   (2.59)

in order to select a short input data segment in which the statistical moments,⁵ as described in the next Section, can be considered constant.
⁴ The definition of stationarity has been introduced for systems only. By non-stationary signal, for the sake of simplicity, we can mean a signal generated by a non-stationary system such as, for example, a sinusoidal oscillator that continuously varies its amplitude, phase, and frequency; therefore its statistical characteristics (mean value, rms, etc.) are not constant.
⁵ The moments of a random variable (RV) are quantitative measures related to the shape of its probability density function (pdf). For example, the first moment is the so-called expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.
2.5.5.1 Expected Values of Random Sequences
As already indicated in the introduction of this chapter (see Sect. 2.1), a deterministic sequence is entirely predictable from a precise mathematical relation. In contrast, a sampled signal such as speech or music can be defined as a stochastic process (SP) characterized by a certain probability density function (pdf) [2, 8]. Typically, in these cases the pdf is not a priori known, and for the statistical description of the SP certain quantities, referred to as moments, can be easily estimated without explicit knowledge of the pdf. Let x be a vector that contains the samples of the SP x[n]; we remind the reader that the statistical moments are defined as the scalar product of a function of the SP, denoted as g(x), and its pdf, i.e.,

⟨g(x), p(x)⟩ = ∫_{−∞}^{∞} g(x) p(x) dx   (2.60)

where g(x) is an invertible polynomial function usually defined as g(x) = (c − x)^m, where c is a constant, m denotes the moment's order, and usually the above expression is used with c = 0. In practice, the moments can be evaluated with the so-called plug-in paradigm, which is a method of empirical estimation based on the discretization of the expression related to their definitions. For example, the 1st-order moment of a stochastic process x, also denoted as statistical expectation or simply expectation, is formally defined as

E{x} ≜ ⟨x, p(x)⟩ = ∫_{−∞}^{∞} x p(x) dx   (2.61)

sometimes indicated as μ_x ≡ E{x}, or denoted as the expected value of x. Similarly, the 2nd-order central moment, denoted as the variance, is defined as

σ_x² ≜ ∫_{−∞}^{∞} (x − μ_x)² p(x) dx   (2.62)

where, as the reader should already know, σ_x denotes the standard deviation of the process. Thus, applying the plug-in method, for a finite-length sequence x ∈ R^{N×1}, the expectation is usually empirically estimated by the simple time-average

Ê{x} = (1/N) Σ_{n=0}^{N−1} x[n]   (2.63)
i.e., we can write μ̂_x ≡ Ê{x}. Similarly, the variance can be empirically estimated as

σ̂_x² = (1/P) Σ_{n=0}^{N−1} (x[n] − μ̂_x)²   (2.64)

where, depending on the type of estimator used, biased or unbiased, we have P = N or P = N − 1.

Remark 2.12 Note that for a zero-mean SP the square root of Eq. (2.64) is defined as the root mean square (RMS) of the sequence x[n], indicated as RMS(x[n]) = √( (1/N) Σ_{n=0}^{N−1} x²[n] ).

Moreover, an SP x is defined as an ergodic process according to the following definition.

Definition 2.5 Ergodic process—Given an SP x, this is denoted as ergodic if

Pr{ lim_{N→∞} | (1/N) Σ_{n=0}^{N−1} x[n] − E{x} | = 0 } = 1   (2.65)

which indicates convergence in probability.

Remark 2.13 Observe that, for an ergodic process, we have Ê{x} ≈ E{x}, i.e., the time average is equivalent to the ensemble average. It is therefore possible to make a consistent estimate of the expected value from a single SP realization.
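The plug-in estimates (2.63)–(2.64) can be sketched as follows; the zero-mean Gaussian white process with σ = 2, the sample count, and the tolerances are illustrative choices, and the fixed seed makes the run repeatable.

```python
import math, random

def mean_est(x):
    # Plug-in estimate of the expectation, Eq. (2.63)
    return sum(x) / len(x)

def var_est(x, unbiased=False):
    # Plug-in estimate of the variance, Eq. (2.64); P = N or P = N - 1
    mu = mean_est(x)
    P = len(x) - 1 if unbiased else len(x)
    return sum((v - mu) ** 2 for v in x) / P

random.seed(1)
# One realization of a zero-mean white Gaussian process with sigma = 2
x = [random.gauss(0.0, 2.0) for _ in range(100000)]
mu = mean_est(x)
var = var_est(x)
rms = math.sqrt(sum(v * v for v in x) / len(x))   # RMS, Remark 2.12
# For this (ergodic) process the time averages approach mu = 0, sigma^2 = 4
assert abs(mu) < 0.05
assert abs(var - 4.0) < 0.1
assert abs(rms - 2.0) < 0.05
```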
2.5.5.2 Autocorrelation Matrix

By considering, without loss of generality, zero-mean random sequences, one of the most common methods for defining a unitary transformation dependent on the input data is based on the autocorrelation matrix of x_n, indicated as R ∈ (R, C)^{M×M} and defined as

R ≜ E{x_n x_n^H}   (2.66)

or, in practice, on an empirical estimate of it, indicated as R_xx. In fact, considering an ergodic process and applying the plug-in method, (2.66) can be estimated as

R_xx = (1/N) Σ_{n=0}^{N−1} x_n x_n^H.   (2.67)

Furthermore, by placing the signal-window vectors x_n, for n = 0, 1, ..., into the rows of a matrix X ∈ (R, C)^{N×M}, denoted the signal data matrix, the empirical correlation matrix can be evaluated as

R_xx = (1/N) X^H X.   (2.68)

The correlation matrix is important in DSP algorithm development because it contains structural information about the SP itself.

Remark 2.14 Note that, as better explained in Sect. 2.7.2, the signal data matrix X has Toeplitz symmetry, that is, each column/row contains the same windowed signal, shifted by one position. Thus, its columns/rows are all similar to each other except for a shift operation. It follows that the empirical covariance matrix determined with the expression Eq. (2.68) could be ill-conditioned, i.e., its determinant could be close to zero; so the inversion of the covariance matrix, sometimes necessary in some problems of parameter estimation, could be difficult.

Below are some properties of the correlation matrix.

Property 2.7 Main properties of the correlation matrix R or its empirical estimate R_xx.
i. The matrix R_xx ∈ (R, C)^{M×M} is positive definite and Hermitian.
ii. It is also a normal matrix, i.e., R_xx^H R_xx = R_xx R_xx^H.
iii. A Hermitian normal matrix can always be diagonalized via the unitary similarity transformation defined by the relation Λ = Q^H R_xx Q, where Q is orthonormal, i.e., Q^H = Q^{−1}.
iv. The diagonal matrix Λ consists of its eigenvalues which, since R_xx is Hermitian and positive definite, are all real and positive, λ_k ∈ R⁺, and can be ordered such that λ₀ > λ₁ > · · · > λ_{M−1}.
v. Consider the DFT{x_n}, defined as X_n = F x_n; from the above we can write

R_xx^F = E{F x_n [F x_n]^H} = F E{x_n x_n^H} F^H = F R_xx F^H.   (2.69)

vi. The correlation matrix has Toeplitz symmetry, that is, each column/row contains the same vector r ∈ (R, C)^{M×1}, shifted by one position: R_xx ∈ (R, C)^{M×M} = Toeplitz(r_xx), where r_xx is the correlation vector defined as r_xx = ⟨x*[n], x[n + k]⟩.
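An empirical autocorrelation matrix (2.67) built from sliding windows can be sketched as follows (white-noise input; window length M = 4 and the tolerances are arbitrary illustrative choices). The result is symmetric and, apart from edge effects, its entries depend only on the lag |i − j| (Toeplitz structure).

```python
import random

def autocorr_matrix(x, M):
    # Empirical autocorrelation R_xx = (1/N) * sum_n x_n x_n^T, Eq. (2.67),
    # with windows x_n = [x[n], x[n-1], ..., x[n-M+1]]^T (zeros before n = 0)
    N = len(x)
    R = [[0.0] * M for _ in range(M)]
    for n in range(N):
        w = [x[n - i] if n - i >= 0 else 0.0 for i in range(M)]
        for i in range(M):
            for j in range(M):
                R[i][j] += w[i] * w[j] / N
    return R

random.seed(7)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]
R = autocorr_matrix(x, 4)
# Exact symmetry; near-Toeplitz structure up to estimation noise
for i in range(4):
    for j in range(4):
        assert abs(R[i][j] - R[j][i]) < 1e-12
assert abs(R[0][0] - R[1][1]) < 0.05      # same lag-0 value along the diagonal
assert abs(R[0][1] - R[1][2]) < 0.05      # same lag-1 value off the diagonal
```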
2.5.5.3 Principal Component Analysis and Karhunen–Loève Transform

From the above properties, the unitary transformation that diagonalizes the correlation matrix is precisely the data-dependent unitary similarity transformation F ≡ Q^H, known as the Karhunen–Loève transform (KLT) [21, 22]. The problem of choosing the optimal transform is essentially related to the computational cost required for its determination. In general, an optimal transformation F is dependent on the signal itself and its determination has complexity O(N²).
Remark 2.15 Observe that, from a statistical point of view, the correlation matrix's eigenvectors q_i are oriented along the directions of maximum variance of the input signal. These orientations are denoted as principal directions. The eigenvalues, on the other hand, indicate the variances relative to these principal directions, λ_i = σ_i². For this reason, the eigenanalysis of the covariance matrix is usually denoted as Principal Component Analysis (PCA). Hence, the KLT is also referred to as the Principal Component Transformation.

Remark 2.16 Typically, the KLT is calculated by performing the Singular Value Decomposition (SVD)⁶ of the correlation matrix (see, for example, [21]). Another orthogonal data-dependent transformation consists of the decomposition J = LRL^H, where L is a lower triangular matrix with orthogonal columns. This decomposition can be implemented in the form of an adaptive lattice filter applied directly to the input data vector x_n (see, for example, [22], Sect. 8.3.5).

Remark 2.17 Note that, using transformations not dependent on the input signal, i.e., representations of the signal related to a predetermined and fixed a priori set of orthogonal basis vectors, such as the DFT or DCT, the computational cost can be reduced to O(N).

Property 2.8 Transformations such as the DCT can represent an approximation of the KLT. In fact, it is known that the performance of the DCT approaches that of the KLT for processes generated by a 1st-order Markov model with adjacent-sample correlation coefficient close to one (see, e.g., [22], Sect. 8.2.5). In addition, another important aspect is that the KLT is used as a benchmark to evaluate the performance of other reduced-complexity transformations.
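As a minimal illustration of the eigenanalysis behind the KLT/PCA, power iteration (a simple stand-in for a full SVD, used here only for illustration) recovers the principal direction of a synthetic 2×2 correlation matrix of a strongly correlated process:

```python
def dominant_eigvec(R, iters=500):
    # Power iteration: estimates the principal eigenvector of a symmetric
    # positive-definite matrix (the first KLT/PCA basis vector)
    M = len(R)
    q = [1.0] * M
    for _ in range(iters):
        q = [sum(R[i][j] * q[j] for j in range(M)) for i in range(M)]
        norm = sum(v * v for v in q) ** 0.5
        q = [v / norm for v in q]
    return q

# Synthetic correlation matrix with strong adjacent-sample correlation
R = [[1.0, 0.9],
     [0.9, 1.0]]
q = dominant_eigvec(R)
# Rayleigh quotient q^T R q gives the corresponding eigenvalue (variance)
lam = sum(q[i] * sum(R[i][j] * q[j] for j in range(2)) for i in range(2))
# Principal direction of this matrix is [1, 1]/sqrt(2), eigenvalue 1.9
assert abs(abs(q[0]) - 2 ** -0.5) < 1e-6
assert abs(abs(q[1]) - 2 ** -0.5) < 1e-6
assert abs(lam - 1.9) < 1e-6
```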
2.6 Finite Difference Equations

A class of DT-LTI circuits of great practical importance in DSP [3–7] is the one that satisfies the finite difference equations (FDE) of order p

Σ_{k=0}^{p} a_k y[n − k] = Σ_{k=0}^{q} b_k x[n − k],  a_k, b_k ∈ (R, C),  p ≥ q.   (2.70)

For a₀ = 1, the previous expression, usually referred to as a DT filter, can be written in the normalized form

y[n] = Σ_{k=0}^{q} b_k x[n − k] − Σ_{k=1}^{p} a_k y[n − k]   (2.71)

⁶ The SVD of a rectangular matrix X is a factorization of the form X = USV^H, where U and V are orthogonal matrices and S is a diagonal matrix.

Fig. 2.22 Signal flow graphs (SFGs) of the DT circuit defined by the FDE (2.71), or general form of an IIR filter, for a₀ = 1: a Direct form I; b Direct form II with only one delay line, possible when p = q
which turns out to be characterized by the useful circuit representations shown in Fig. 2.22, denoted as direct form I and direct form II.

Remark 2.18 Note that it is possible to derive the so-called direct form II of the FDE (2.71), shown in Fig. 2.22b, with the advantage of having only one delay line for its implementation, simply by commuting, thanks to linearity, the numerator and denominator sections of the TF for p = q. For this purpose, an internal variable w[n], called the filter's state variable, is introduced.
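A direct form II realization of Eq. (2.71) can be sketched as follows (a minimal Python sketch with a₀ = 1 assumed; the state variable w[n] is shared between the recursive and non-recursive sections, and the first-order test recursion is illustrative):

```python
def fde_filter(b, a, x):
    # Direct form II realization of Eq. (2.71), with a[0] assumed equal to 1:
    #   w[n] = x[n] - a1*w[n-1] - ... - ap*w[n-p]   (state variable)
    #   y[n] = b0*w[n] + b1*w[n-1] + ... + bq*w[n-q]
    K = max(len(a), len(b))
    w = [0.0] * K                       # single shared delay line (the state)
    y = []
    for xn in x:
        wn = xn - sum(a[k] * w[k - 1] for k in range(1, len(a)))
        w = [wn] + w[:-1]               # shift the delay line
        y.append(sum(b[k] * w[k] for k in range(len(b))))
    return y

# First-order recursion y[n] = x[n] + 0.5*y[n-1]; its impulse response is 0.5^n
h = fde_filter([1.0], [1.0, -0.5], [1.0] + [0.0] * 9)
assert all(abs(h[n] - 0.5 ** n) < 1e-12 for n in range(10))
```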
2.6.1 Transfer Function and Pole–Zero Plot

The TF of the FDE (2.71) is a rational function of the type

H(z) = ( Σ_{k=0}^{q} b_k z^{−k} ) / ( Σ_{k=0}^{p} a_k z^{−k} ) = (b₀/a₀) ( Π_{k=1}^{q} (1 − c_k z^{−1}) ) / ( Π_{k=1}^{p} (1 − d_k z^{−1}) ).   (2.72)
Being, for physical realizability, p ≥ q, the FDE order is expressed by the degree of the denominator of (2.72). Taking into account also the degree of the numerator, the TF order is more properly expressed by the pair (p, q).

Example 2.8 In Matlab, the FDE (2.70) or (2.71) is implemented with the function y = filter(b,a,x); where, denoting by M = q + 1 and N = p + 1, the vectors b ∈ (R, C)^M and a ∈ (R, C)^N represent the TF's coefficients of the numerator and denominator, respectively. In other words, remembering that in Matlab the indices of vectors and matrices always start from 1, the filter is implemented as

a(1)y[n] = b(1)x[n] + b(2)x[n − 1] + · · · + b(M)x[n − M + 1] − a(2)y[n − 1] − · · · − a(N)y[n − N + 1].   (2.73)
Note that the filter procedure is applied to a vector x containing all the available signal samples. In the case of online, or blockwise, filtering, it is necessary to specify the filter state variable vector w, whose length is equal to max(length(a), length(b))-1, which should be given explicitly as [y, w] = filter(b,a,x,w). For example, if you want to implement the filter with only the numerator of the TF, i.e., a FIR filter, simply put a = [1].

In Eq. (2.72) the roots of the polynomial at the numerator, denoted here as z₁, z₂, ..., z_k, ..., are called the TF's zeros. The name zero comes simply from the fact that H(z) → 0 for z → z_k. The roots of the polynomial at the denominator, denoted here as p₁, p₂, ..., p_k, ..., are such that the TF H(z) → ∞ for z → p_k. Such values are denoted as poles.⁷ For circuit characterization, in the same manner as the frequency response (amplitude, phase, and group delay), the graphical representation of the roots of H(z) is also used. The resulting graph, called the pole–zero diagram or pole–zero plot, is very important, even in the design process, for the evaluation of some characteristics of the TF's circuit.

Example 2.9 As an example, we consider a TF H(z) defined as follows.
H(z) = [(1 + 0.75z^{−1})(1 − 0.5z^{−1})(1 − 0.9e^{jπ/2}z^{−1})(1 − 0.9e^{−jπ/2}z^{−1})] /
       [(1 − 0.5e^{jπ/4}z^{−1})(1 − 0.5e^{−jπ/4}z^{−1})(1 − 0.75e^{j3π/4}z^{−1})(1 − 0.75e^{−j3π/4}z^{−1})]

     = (1 + 0.25z^{−1} + 0.435z^{−2} + 0.2025z^{−3} − 0.30375z^{−4}) /
       (1 + 0.35355z^{−1} + 0.0625z^{−2} − 0.13258z^{−3} + 0.14062z^{−4})   (2.74)

characterized by two pairs of conjugate complex poles, two real zeros, and a pair of conjugate complex zeros. Figure 2.23 shows the trend of the characteristic curves of that TF. The position of the poles, given in the form re^{±jθ} (in this case p_{1,2} = 0.5e^{±jπ/4} and p_{3,4} = 0.75e^{±j3π/4}), results in two resonances at the respective angular frequencies, visible in the figure. The position of the zeros on the real axis at π and at 0 [rad] (z₁ = 0.75e^{jπ} and z₂ = 0.5e^{j0}) determines the attenuation of the response curve at the ends of the band, while the pair of zeros (z_{3,4} = 0.9e^{±jπ/2}) determines the anti-resonance at the center of the band. Observe that the amplitudes of resonance and anti-resonance are proportional to the radius of the pole or zero, respectively.
⁷ Legend has it that the term pole comes from the circus pole that holds up the tent tarpaulin: the cusp shape assumed by the tarp stretched by the underlying pole is reminiscent of the way the modulus |H(z)| tends to infinity for z → p_k.
Fig. 2.23 Characteristic curves (amplitude response, phase, group delay, and pole–zero diagram) of a TF. By positioning the poles and zeros in the appropriate way it is possible to obtain, in an approximate way, a certain frequency response
2.6.2 BIBO Stability Criterion

Previously, we saw that a circuit is stable if: ∀ |x[n]| < ∞ ⇒ |y[n]| < ∞. For an LTI circuit, the link between impulse response and input given by the convolution sum (2.13) also holds. If the input is bounded, the condition for the boundedness of the output then depends on the characteristics of the impulse response. Thus, a simple sufficient stability condition is the absolute summability of the impulse response h[n]; it results then

Σ_{k=−∞}^{∞} |h[k]| < ∞   (2.75)

or, equivalently, in terms of the z-transform, |Σ_{k=−∞}^{∞} h[k]z^{−k}| < ∞ for |z| = 1. This condition is equivalent to the condition on the region of convergence (ROC) of H(z), which must include the unit circle (see Sect. 2.4.1). The consequence of this observation leads to the following property.

Property 2.9 A DT-LTI circuit modeled with a causal FDE is stable if and only if all poles of the network function H(z) are inside the unit circle.

Thus, the previous property establishes a simple criterion for evaluating the stability of a given DT-LTI circuit.
2.7 Finite Impulse Response Filter

A DT-LTI circuit characterized by an FDE as in Eq. (2.71) can have an impulse response of finite or infinite duration. As previously stated, if the impulse response has infinite duration, the circuit is called an IIR (infinite impulse response) filter. Conversely, if the circuit has an impulse response of finite duration, it is called a FIR (finite impulse response) filter [3, 4]. In the special case of the FIR filter, in the FDE (2.71) the terms a_k are all zero, so the coefficients b_k coincide precisely with the impulse response, h[k] ≡ b_k for k = 0, 1, ..., M − 1. Thus, the expression (2.71) becomes a simple (finite) convolution sum

y[n] = Σ_{k=0}^{M−1} h[k] x[n − k].   (2.76)

In this case, from Eqs. (2.72) and (2.76), the TF H(z) can be written as

H(z) = h[0] + h[1]z^{−1} + · · · + h[M − 1]z^{−(M−1)};

all poles are positioned at the origin and, by Property 2.9, the circuit is always stable. Note that, by convention, the index M indicates the length of the impulse response and therefore the maximum degree of the polynomial is equal to M − 1.
2.7.1 Online Convolution Computation

The convolution expression (2.76) can be written as a scalar product between the vector h ∈ R^M containing the impulse response and a vector x_n ∈ R^M containing a window of the input sequence. So, we can express the n-th sample of the output sequence as a scalar product of vectors

y[n] = x_n^T h = h^T x_n,  n = 0, 1, ..., L − 1   (2.77)
Fig. 2.24 Signal flow graph representation of a numerical convolution, or simply convolver circuit, of length M. The input sequence flows into the tapped delay line (DL). The output of each delay element is multiplied by its respective filter coefficient in order to perform the scalar product ⟨x_n, h⟩
where x_n = [x[n] · · · x[n − M + 1]]^T, as defined in Eq. (2.59), indicates a window of length M that "slides" over the input signal.

Remark 2.19 In general, given a FDE of the type (2.71), it describes a FIR circuit if a₀ = 1 and a₁ = a₂ = · · · = a_N = 0; it is then usual to assume h[k] = {b_k}. The resulting circuit, called a numerical convolver, is illustrated in Fig. 2.24.

Remark 2.20 A fundamental operation in numerical filtering, which very often also determines the hardware architecture of processors dedicated to numerical signal processing, or digital signal processors (DSP), is the multiplication between filter coefficients and signal samples and their accumulation, or Multiply And Accumulate (MAC) (see the comments in the "C" code that implements the FIR filter) [7]. In this case, the convolution is implemented as an inner product (2.77) in which, for each time instant, the vector x_n is updated with the new sample of the input sequence, as schematically illustrated in Fig. 2.25.

// ----------------------------------------------------------------
double fir1(int M, double x, double *h, double *xw)
// M   filter length (number of coefficients)
// x   input sample
// h   filter impulse response
// xw  delay-line buffer
{
    int i;
    double y;                       // output
    xw[0] = x;                      // load input sample into delay line
    y = h[0]*xw[0];                 // MAC initialize
    for (i = 1; i < M; i++)
        y += h[i]*xw[i];            // MAC
    for (i = M-1; i >= 1; i--)
        xw[i] = xw[i-1];            // delay-line shift
    return y;
}
Almost all DSPs have a hardware multiplier and an instruction, in the Assembly language, that implements the MAC directly in a single machine cycle.
2.7 Finite Impulse Response Filter
Fig. 2.25 FIR filtering as linear combination and signal shift in the delay line
2.7.2 Batch Convolution as a Matrix-Vector Product

The expression (2.77) can be generalized by considering a batch approach, i.e., computing the convolution, with a finite-duration impulse response h, as a matrix-vector product.
2.7.2.1 Batch Convolution with Finite Duration Sequences
When the impulse response and the input signal are both sequences of finite duration, the expression (2.76) can be interpreted as a matrix-vector product. Let x[n], 0 ≤ n ≤ N − 1, and h[n], 0 ≤ n ≤ M − 1, with M < N; it follows that the output y[n], 0 ≤ n ≤ L − 1, has length L = N + M − 1. Arranging the impulse response and output samples into column vectors defined, respectively, as
h ≜ [h[0] h[1] · · · h[M − 1]]^T, and y ≜ [y[0] y[1] · · · y[L − 1]]^T

the output of the system, reinterpreting (2.76) as a matrix-vector product, can be written as

\begin{bmatrix} y[0]\\ y[1]\\ \vdots\\ y[M-1]\\ y[M]\\ \vdots\\ y[N-1]\\ \vdots\\ y[L-1] \end{bmatrix}
=
\begin{bmatrix}
x[0] & 0 & \cdots & 0\\
x[1] & x[0] & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
x[M-1] & x[M-2] & \cdots & x[0]\\
x[M] & x[M-1] & \cdots & x[1]\\
\vdots & \vdots & & \vdots\\
x[N-1] & x[N-2] & \cdots & x[N-M]\\
0 & x[N-1] & \cdots & x[N-M+1]\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & x[N-1]
\end{bmatrix}
\begin{bmatrix} h[0]\\ h[1]\\ \vdots\\ h[M-1] \end{bmatrix}
   (2.78)
144
2 Discrete-Time Signals, Circuits, and System Fundamentals
or, compactly, as y = Xh
(2.79)
in which the matrix X ∈ R^{L×M} contains the samples of the input signal arranged in columns, each column shifted down by one sample.8

Remark 2.21 Note again that, by definition, the matrix X has Toeplitz symmetry (see also Sect. 2.5.5.2), i.e., all elements placed on each diagonal are identical. This is very important for many applications described in later chapters. Note that the first and last M − 1 rows of the X matrix contain zeros, due to the transit of the signal; as a result, the first and last M − 1 samples of the output are characterized by a so-called transient effect.
2.7.2.2 Online Mini-Batch Convolution
Sometimes in digital audio signal processing (DASP), it is necessary to have the input sequence in tabular form, with a certain number of input windows arranged in a matrix defined as

X_n ∈ R^{N×M} ≜ [x_n x_{n−1} · · · x_{n−N+1}]^T.   (2.80)

This matrix, previously referred to as the signal data matrix (see Sect. 2.5.5.2), can be used, for example, for the determination of the run-time empirical correlation matrix (see Eq. 2.68) or for the calculation of the convolution in mini-batch mode as y_n ∈ R^{N×1} = X_n h, or in extended mode

\begin{bmatrix} y[n]\\ y[n-1]\\ \vdots\\ y[n-N+1] \end{bmatrix}
=
\begin{bmatrix}
x[n] & x[n-1] & \cdots & x[n-M+1]\\
x[n-1] & x[n-2] & \cdots & x[n-M]\\
\vdots & \vdots & \ddots & \vdots\\
x[n-N+1] & x[n-N] & \cdots & x[n-N-M+2]
\end{bmatrix}
\begin{bmatrix} h[0]\\ h[1]\\ \vdots\\ h[M-1] \end{bmatrix}.
   (2.81)
Therefore, an efficient mechanism for updating the signal data matrix must be defined for run-time convolution computation in mini-batch mode.

Remark 2.22 Note that in Eq. (2.81) the output dimension is equal to that of the input because, in the run-time mini-batch convolution calculation, we do not take into account the transient phenomena that are instead present in the full convolution of Eq. (2.79).
8 In some texts, the expression (2.79) is written as y = X^T h where, of course, the data matrix is defined as the transpose of (2.78).
2.7 Finite Impulse Response Filter
145
2.7.3 Convolutional-Matrix Operator

The convolution relation can also be written in convolutional-matrix operator notation. Denoting by x_n ∈ R^{M×1} = [x[n] x[n − 1] · · · x[n − M + 1]]^T the input signal sliding in the delay line of the FIR filter, as defined in Eq. (2.77), we can write

y_n = Hx_n   (2.82)
where the convolutional operator H ∈ R^{2M×M} is the matrix containing the sample-translated replicas of the same impulse response h, padded with zeros, as shown in Eq. (2.83), such that the vectors x_n and y_n contain, respectively, a sample window of the input and of the output sequence

\begin{bmatrix} y[n+M-1]\\ \vdots\\ y[n]\\ \vdots\\ y[n-M+1]\\ y[n-M] \end{bmatrix}
=
\begin{bmatrix}
h[M-1] & 0 & \cdots & 0\\
h[M-2] & h[M-1] & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
h[0] & h[1] & \cdots & h[M-1]\\
0 & h[0] & \cdots & h[M-2]\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & h[0]\\
0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x[n]\\ x[n-1]\\ \vdots\\ x[n-M+1] \end{bmatrix}
   (2.83)
Although this notation is of little practical use for convolution implementation, the matrix H can reveal important structural properties of the impulse response, usable in theoretical developments as, for example, we will see in Sect. 4.3 in the study of filter banks for audio applications.
2.7.3.1 Convolution with DT Filtering Operator
The online linear filtering operation is a fairly common procedure in many disciplines, and sometimes it is necessary or useful to have appropriate formal mathematical tools for more specific needs. Denoting by q^{−1} the unit discrete-time delay symbol, we define H^{q^{−1}}{ · } as the discrete-time filtering operator. With such a formalism, the online convolution, which involves updating the delay line of the input signal samples and then computing the inner product between the vectors x_n and h, can be stated more simply and compactly through such an operator as y[n] = H^{q^{−1}}{x[n]}.
146
2 Discrete-Time Signals, Circuits, and System Fundamentals
Fig. 2.26 Impulse response of a moving average filter of order (M − 1)
Unlike the other formalisms where the convolution or filtering operation is specified and not the implementation mode, with this formalism we specify that the filter is implemented online with the structure shown in Figs. 2.24 and 2.25.
2.7.4 FIR Filter Design Methods

2.7.4.1 Moving-Average Filter
A circuit that calculates a simple moving average can be characterized by the following FDE

y[n] = (1/M) Σ_{k=0}^{M−1} x[n − k] = (1/M)(x[n] + x[n − 1] + · · · + x[n − M + 1]).   (2.84)
The moving-average filter is of FIR type, and the term M indicates its length. Its impulse response is shown in Fig. 2.26. The network function is

H(z) = (1/M) Σ_{k=0}^{M−1} z^{−k} = (1/M)(1 − z^{−M})/(1 − z^{−1}) = (1/M)(z^M − 1)/(z^{M−1}(z − 1)).   (2.85)
Developing the H(z) we get

H(z) = [1/(M z^{M−1}(z − 1))] (z − 1)(z − e^{j2π/M})(z − e^{j4π/M}) · · · (z − e^{j2π(M−1)/M})

the zero at z = 1 is cancelled by the respective pole. Dividing each term of the development by z we get

H(z) = (1/M) ∏_{k=1}^{M−1} (1 − e^{j(2π/M)k} z^{−1})
Fig. 2.27 Amplitude response, phase, and pole–zero plot of the moving average filter: a of order 4 (M = 5); b of order 9 (M = 10)
for which there is a pole of order (M − 1) at z = 0. The zeros are distributed uniformly on the unit circle, except for the cancelled zero at z = 1. From (2.85), for z = e^{jω}, the DTFT of the moving average filter is equal to

H(e^{jω}) = (1/M)(1 − e^{−jωM})/(1 − e^{−jω}) = (1/M) e^{−jω(M−1)/2} sin(ωM/2)/sin(ω/2).

The amplitude and phase response are therefore

|H(e^{jω})| = |sin(πf M)/(M sin(πf))|,  arg H(e^{jω}) = −πf(M − 1),  −0.5 ≤ f ≤ 0.5.   (2.86)
Figure 2.27 shows the frequency responses of standard moving-average filters with coefficients defined as h[k] = 1/M, k = 0, 1, ..., M − 1, for M = 5 and M = 10, so as to obtain unity gain for ω → 0. Note that the moving average filter has, of course, lowpass characteristics.
2.7.4.2 FIR Filter Design by Windowing Method
The impulse response of an ideal lowpass filter, with cutoff frequency ωc, has infinite length and is defined as

h_id[n] = (1/2π) ∫_{−ωc}^{ωc} e^{jωn} dω = (e^{jωc n} − e^{−jωc n})/(j2πn) = 2f_c sinc(ωc n),  −∞ < n < ∞   (2.87)
148
2 Discrete-Time Signals, Circuits, and System Fundamentals
where sinc(ωc n) = sin(ωc n)/(ωc n), and ωc = 2πf_c. The sharp truncation of (2.87) has the effect of a certain ripple around the mask of the ideal filter. The truncation of (2.87) is equivalent to multiplying the ideal impulse response by a window sequence w[n] defined as

w[n] = 1, for −M ≤ n ≤ M;  0, otherwise.   (2.88)
The ripple, then, can be interpreted as the effect of the convolution between the ideal frequency response of the filter and the DTFT of the window. The frequency response of the filter is then H(e^{jω}) = W(e^{jω}) ∗ H_id(e^{jω}). It is obvious that the shape of the window can be chosen so as to diminish this effect: in order to attenuate the ripple, the truncation is performed, in a less abrupt manner, by multiplying the ideal response by a certain sequence w[n] of limited length, called window function [13]. It follows that the causal impulse response of a real lowpass filter can be calculated as

h[n] = w[n] [2f_c sinc(ωc(n − M))],  0 ≤ n ≤ N − 1   (2.89)

where the index M, which determines the translation to make the filter causal, is such that 2M + 1 = N for odd N, and 2M = N for even N. It holds then

M = (N − 1)/2, for N odd;  N/2, for N even.   (2.90)

The window function w[n] can take many forms, and here are some of the most common ones

w[n] = a − b·cos(2πn/(N − 1)) + c·cos(4πn/(N − 1)),  n = 0, 1, ..., N − 1;  0, otherwise   (2.91)

where, depending on the coefficients a, b and c, we have

a = 1.00  b = 0.00  c = 0.00  rectangular
a = 0.50  b = 0.50  c = 0.00  Hanning
a = 0.54  b = 0.46  c = 0.00  Hamming
a = 0.42  b = 0.50  c = 0.08  Blackman.

In the literature, there are many different types of window functions, computed in various ways and with different optimization criteria. For further discussion see, e.g., [6, 13]. Figure 2.28 shows, as an example, the characteristic curves of a lowpass filter realized with the expression (2.89) with Hamming window for N = 51.
Fig. 2.28 Characteristic curves of a lowpass FIR filter with Hamming window, also called raised cosine, for N = 51, and f c = 0.25
2.7.4.3 Parametric Kaiser Window
Some types of filter windows have control parameters that allow for greater design flexibility. In other words, the window function is regulated by one or more parameters that allow us to fix a priori the transition bandwidth and the stopband attenuation. In this case, the window length N required to achieve the specification is determined indirectly. Among the most widely used in filtering and spectral analysis applications is the Kaiser window, governed by the β parameter, whose trend is shown in Fig. 2.29, defined for N = 2M + 1 (odd) as

w[n] = I_0(β √(n(2M − n))/M) / I_0(β),  n = 0, 1, ..., N − 1   (2.92)

where I_0(x) is the modified zero-order Bessel function of the first kind, defined as

I_0(x) = 1 + Σ_{k=1}^{∞} [(x/2)^k / k!]^2.
Denoting by A the minimum value, calculated in decibels, of the desired ripple between the stopband and the passband

A = −20 log10 min(δ_p, δ_s)   (2.93)

the Kaiser window allows us to obtain a trade-off between the various specifications of the filter, regulated by the value of β, that can be determined with the following formulas

β = 0.11020(A − 8.7), if A ≥ 50
β = 0.58420(A − 21)^{0.4} + 0.07886(A − 21), if 21 < A < 50
β = 0, if A ≤ 21.   (2.94)

Furthermore, Kaiser found that to achieve the required performance, the filter length can be determined as ([3], Eq. 7.63)

N = f_s (A − 8)/(17.9 Δf).   (2.95)
Example 2.10 Design of a three-way crossover filter for audio applications with the following specifications: sampling frequency f_s = 48 kHz; cutoff frequency f_1 = 800 Hz, transition band Δf = 500 Hz, and stopband attenuation of 100 dB; cutoff frequency f_2 = 5 kHz, transition band Δf = 2.5 kHz, and stopband attenuation of 100 dB.
Fig. 2.29 Kaiser window trend as a function of the control parameter β

Fig. 2.30 Three-way digital crossover filter schematic for an active speaker system: the digital audio input x[n] feeds the bandpass branch H_BP(z), the lowpass branch H_LP(z), and a delay z^{−M}; each band (tweeter, mid-range, woofer) has its own DAC and amplifier. For the delay, M = (N − 1)/2
Denoting by h_LP[n], h_BP[n], and h_HP[n] the impulse responses of the lowpass, bandpass, and highpass filters, respectively, we proceed with the windowing technique and impose the condition that the overall response of the three filters is a delayed impulse. It is therefore obtained

h_LP[n] + h_BP[n] + h_HP[n] = δ[n − M]

where the index M denotes the group delay of the three filters, assumed of identical length (Fig. 2.30). The calculation of the Kaiser window parameters is performed by Eqs. (2.93)–(2.95). For Eq. (2.93) it holds A = 100. The parameter β, from (2.94), is β = 10.0613. Note that the high-frequency band is obtained by subtracting from the suitably delayed input signal the low-frequency components coming out of the filters. In fact, it is easy to show that the TF of the highpass filter is complementary to that of the lowpass and bandpass filters; thus we have

H_HP(e^{jω}) = e^{−jωM} − H_LP(e^{jω}) − H_BP(e^{jω}).

Figure 2.31 shows the amplitude responses of the crossover FIR filters designed with the Kaiser window method.
Fig. 2.31 Magnitude response in dB of three-way digital crossover FIR filter design by Kaiser window (designed with the Matlab procedure kaiserord(·)). Stopband attenuation of 100 dB. The crossover frequencies are 800 Hz and 5 kHz
2.7.4.4 FIR Filter Design by Optimization Criteria

The filter impulse response h[n] is determined by considering a problem of optimal approximation of the ideal desired frequency mask H_id(e^{jω}) [6]. Let E(e^{jω}) = H_id(e^{jω}) − H(e^{jω}) be the difference between the desired response and the filter response, denoted as error; then, indicating with h ∈ (R, C)^{M×1} the vector containing the filter coefficients, such that H(e^{jω}) = DTFT(h), the filter design can be done by defining a cost function (CF) that coincides with the error L_p-norm evaluated for a set of frequencies ω_k, k ∈ [0, K − 1]

J(h) = [Σ_{k=0}^{K−1} |H_id(e^{jω_k}) − H(e^{jω_k})|^p]^{1/p},  ω_k ∈ Ω.   (2.96)
In filter design the most common norms are the L_2-norm, known as the minimum squared error criterion, and the L_∞-norm, known as the Chebyshev or min-max criterion, corresponding to p → ∞ [3, 12]. The optimal value h_opt, relative to the chosen norm, is obtained by minimizing J(h), for which h_opt ∴ ∂J(h)/∂h → 0. Let H_id(e^{jω}) be the desired complex-valued response, ω ∈ [−π, π], and h ∈ (R, C)^{M×1} = [h[0] h[1] · · · h[M − 1]]^T the filter coefficients, such that H(e^{jω_k}) = Σ_{m=0}^{M−1} h[m]e^{−jω_k m} = Fh where, considering a uniform grid of K frequency values (i.e., ω_k ∈ [−π, π], for k = 0, 1, ..., K − 1), F ∈ C^{K×M} is the DFT matrix, denoted also Fourier matrix, which is a complex Vandermonde matrix, defined as
2.7 Finite Impulse Response Filter
153
F = \begin{bmatrix}
1 & e^{-j\omega_0} & e^{-j2\omega_0} & \cdots & e^{-j(M-1)\omega_0}\\
1 & e^{-j\omega_1} & e^{-j2\omega_1} & \cdots & e^{-j(M-1)\omega_1}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & e^{-j\omega_{K-1}} & e^{-j2\omega_{K-1}} & \cdots & e^{-j(M-1)\omega_{K-1}}
\end{bmatrix}.   (2.97)
Let d ∈ C^{K×1} = [H_id(e^{jω_0}) H_id(e^{jω_1}) ... H_id(e^{jω_{K−1}})]^T be the vector that contains the desired magnitude and phase response; the CF (2.96) can be rewritten in terms of the L_2 or Frobenius norm as J(h) = ‖d − Fh‖_2^2. In this case, the vector of the optimal filter coefficients is calculated by setting the gradient to zero, h_LS ∴ ∂J(h)/∂h = −2F^H d + 2F^H Fh → 0, i.e., we have a set of linear relationships that are usually denoted as normal equations9

F^H Fh = F^H d   (2.98)

and the optimal least-squares (LS) solution is therefore

h_LS = (F^H F)^{−1} F^H d   (2.99)

i.e., obtained as a standard LS solution of an overdetermined linear system. However, for the case of a real impulse response it is not necessary to use the LS approximation, by virtue of the following property.

Property 2.10 The impulse response of a FIR filter, evaluated by L_2-norm error minimization, coincides with the truncated ideal filter response (2.87)

h_LS ∈ R^{M×1} = (F^H F)^{−1} F^H d  ⇒  h_LS[n] = h_id[n],  n = 0, ..., M − 1.
Proof By the Parseval relation, the L_2-norm error (2.96) can be expressed in the time domain as

J(h) = Σ_{n=−∞}^{∞} |h_id[n] − h[n]|^2 = Σ_{n=−∞}^{∞} (h_id^2[n] + h^2[n] − 2h_id[n]h[n]).   (2.100)

Since, by definition, Σ_{n=−∞}^{∞} |h_id[n]|^2 = C, for an M-length FIR filter Eq. (2.100) can be written as

J(h) = C + Σ_{n=0}^{M−1} (h^2[n] − 2h_id[n]h[n]).

The optimal solution is for ∂J(h)/∂h → 0. Operating in scalar form, switching the derivative and sum operations, we can write
9 By indicating with R = F^H F, Eqs. (2.98) are denoted as normal equations, since R is a normal matrix, i.e., R^H R = RR^H.
h_LS[n] ∴ ∂/∂h[n] Σ_{n=0}^{M−1} (h^2[n] − 2h_id[n]h[n]) = 2h[n] − 2h_id[n] → 0

that has a minimum point for h[n] = h_id[n], for n = 0, 1, ..., M − 1. Thus, for h_id[n] = sinc(ωc(n − M_0)), where for M odd M_0 = (M − 1)/2, we get

h_LS[n] = sinc(ωc(n − M_0)),  n = 0, 1, ..., M − 1.
Remark 2.23 Note that, in case the frequency specification H_id(e^{jω}), symmetrically defined between −π and π, is an even function, and the phase specification arg[H_id(e^{jω})] is an odd function, the solution is real, and the coefficients calculated with the LS approach have zero imaginary part. In addition, in the case of a CF weighted with a frequency weighting function G(ω), the solution is the classic weighted LS (WLS) solution [3]. The optimal WLS solution is therefore

h_WLS = (F^H GF)^{−1} F^H Gd   (2.101)
where G ∈ R^{K×K} is a diagonal matrix defined as G = diag[G(ω_0), G(ω_1), ..., G(ω_{K−1})].

However, one of the most used techniques in FIR filter design is based on the min-max criterion. This method is based on the polynomial approximation of the frequency response. The optimization procedure minimizes the maximum error, i.e., the L_∞-norm (denoted also as uniform norm or Chebyshev norm), between the desired frequency response and that reachable by the filter. The error (ripple) is almost identical at all frequencies, and for this property the filter is also called equiripple. The equiripple optimization method is usually based on the well-known Remez exchange method, usually implemented by the Parks–McClellan algorithm variation [12], which is an iterative algorithm used to find the polynomial approximation to a continuous function on an interval C[a, b] that is best in the uniform-norm sense.

Example 2.11 Figure 2.32 shows the amplitude responses of three FIR filters of equal length M = 21 and, with f_s the sampling frequency, normalized cutoff frequency f_c/f_s = 0.25, designed by minimizing the CF (2.96) for the L_2 and L_∞ norms. The passband and stopband regions, expressed as normalized frequency f/f_s, for the three filters are f_p1 = [0.0, 0.15], f_s1 = [0.35, 0.5]; f_p2 = [0.0, 0.2], f_s2 = [0.3, 0.5]; and f_p3 = [0.0, 0.24], f_s3 = [0.26, 0.5], respectively. In other words, as shown in the figure, the three filters are characterized by the same half-amplitude cutoff frequency, equal to f_c/f_s = 0.25, and different transition bands Δf_1 = 0.2, Δf_2 = 0.1, Δf_3 = 0.02. It can be observed that minimizing the Chebyshev norm produces an equiripple filter, whereas this is not the case for the L_2-norm. In addition, for both techniques, decreasing the transition band Δf increases the ripple.
Fig. 2.32 FIR filter design by optimization algorithms. Filter length M = 21, half amplitude normalized cutoff frequency f c = 0.25. a L 2 -norm minimization or Least squares method. b L ∞ - or Chebyshev-norm (or min-max criterion)
Fig. 2.33 Two-way digital crossover for audio application scheme. The crossover is implemented by a single FIR filter and a delay line of length (M + 1)/2
Remark 2.24 Note that an approximate formula for the calculation of the filter length from the ripple specifications of the Parks–McClellan algorithm is the following ([3], Eq. 7.104)

M = f_s (−10 log(δ_p δ_s) − 13)/(14.6 Δf)   (2.102)

where δ_p and δ_s are the maximum errors in the passband and stopband regions, respectively.

Example 2.12 Design of a two-way crossover filter for audio applications with the scheme in Fig. 2.33, with the following specifications: sampling frequency f_s = 48 kHz, cutoff frequency f_c = 800 Hz, transition band Δf = 500 Hz, and stopband attenuation of 100 dB. Note that the high-frequency band is obtained by subtracting from the input signal the low-frequency component coming out of the filter, suitably delayed. In fact, it
Fig. 2.34 Crossover FIR filter design by Parks–McClellan algorithm. The crossover level is ≈ −6 dB at 800 Hz
Fig. 2.35 Low frequency effect (LFE) or sub-woofer crossover filter designed by Parks–McClellan algorithm
is easy to show that, for the TF of the highpass filter to be complementary to that of the lowpass, the following applies

H_HP(e^{jω}) = e^{−jω(M+1)/2} − H_LP(e^{jω})

Figure 2.34 shows the amplitude responses of the crossover FIR filters designed with the Parks–McClellan algorithm. The filter length has been determined with Eq. (2.102).

Example 2.13 Design of a lowpass crossover filter for a sub-woofer or low frequency effect (LFE) channel with the following specifications: sampling frequency f_s = 44.1 kHz, cutoff frequency f_c = 120 Hz, transition band Δf = 90 Hz, and stopband attenuation of 100 dB.
Fig. 2.36 SFGs of 2nd-order IIR filter (in direct forms I and II), or standard 2nd-order cell. High-order TFs can be factorized to be implemented as cascaded 2nd-order cells. This results in a more robust structure
The length of the filter is proportional to the Q-factor f_s/Δf. In fact, by Eq. (2.102) the filter has a length of 2921 taps. Furthermore, using the Kaiser window method, given the non-equiripple nature of the response, a filter with 3141 taps would be required (Fig. 2.35).

Remark 2.25 Note that to reproduce low frequencies, a sampling rate lower than the original one can be used, provided it still guarantees the Shannon–Nyquist condition. For example, for the LFE filter, at a sampling frequency of 1024 Hz, with the same specifications as in the previous example, a length of 69 taps would be sufficient, which can be implemented with significant computational savings. As we will see better in Chap. 4, multi-rate methods can be used in these cases, i.e., defining different sampling rates for different parts of the algorithm.
2.8 Infinite Impulse Response Filter

A filter with an impulse response of infinite duration is characterized by a TF of the type (2.72) in which the denominator is not trivial: with a_0 = 1, at least one of the coefficients a_k, k ≥ 1, is nonzero. For example, for a 2nd-order IIR filter the finite difference equation results to be

y[n] = b_0 x[n] + b_1 x[n − 1] + b_2 x[n − 2] − a_1 y[n − 1] − a_2 y[n − 2]

with TF

H(z) = (b_0 + b_1 z^{−1} + b_2 z^{−2})/(1 + a_1 z^{−1} + a_2 z^{−2}).

The 2nd-order TF, with SFG in Fig. 2.36a, called the 2nd-order cell, is important because it is one of the fundamental bricks of filtering techniques with which to make more complex filter forms.
Fig. 2.37 Characteristic curves of an IIR digital resonator: a r = 0.707 and θ = π/2; b r = 0.95 and θ = π/2
2.8.1 Digital Resonator

The digital resonator is a circuit characterized by a peak in the response around a certain frequency. Now, recall that resonance in the continuous case is realized by placing a pair of complex conjugate poles close to the imaginary axis; and, by analogy, in DT the resonance can be realized by placing a pair of complex conjugate poles very close to the unit circle. Thus, the network TF, in the case of a 2nd-order resonator, takes the form

H(z) = 1/[(1 − re^{jθ}z^{−1})(1 − re^{−jθ}z^{−1})],  with r < 1
(r > 1), and all remaining zeros, as well as poles, inside it. Extracting that zero, H(z) can then be expressed as

H(z) = H_1(z)(1 − re^{jθ}z^{−1})(1 − re^{−jθ}z^{−1})

where, by definition, H_1(z) is minimum phase. Observe further that, by multiplying and dividing by the term (1 − r^{−1}e^{±jθ}z^{−1}), this expression can be written as

H(z) = H_1(z)(1 − r^{−1}e^{jθ}z^{−1})(1 − r^{−1}e^{−jθ}z^{−1}) · [(1 − re^{jθ}z^{−1})(1 − re^{−jθ}z^{−1})] / [(1 − r^{−1}e^{jθ}z^{−1})(1 − r^{−1}e^{−jθ}z^{−1})]

where the first factor is minimum phase and the second is all-pass,
which can be extended even in the presence of multiple zeros outside the unit circle, and thus proves (2.110).

Example 2.16 As an example, given H(z) = (1/3 − z^{−1})/(1 + (1/4)z^{−1}), this TF is non-minimum phase since it has a zero at z = 3, i.e., outside the unit circle. Multiplying and dividing by the term (1 − (1/3)z^{−1}) we obtain

H(z) = [(1/3 − z^{−1})/(1 + (1/4)z^{−1})] · [(1 − (1/3)z^{−1})/(1 − (1/3)z^{−1})] = [(1 − (1/3)z^{−1})/(1 + (1/4)z^{−1})] · [(1/3 − z^{−1})/(1 − (1/3)z^{−1})] = H_min(z) · H_ap(z).
2.8.5 Linear Ordinary Differential Equation Discretization: IIR Filter from an Analog Prototype

In the reference literature there are many approaches and methods for the discretization of analog filters [3–7]. These methods are generally based on two distinct philosophies: direct discretization of the ordinary differential equation (ODE), and discretization of the ODE integral solution. The most common are the following:
1. Mapping differentials or Euler methods (backward or forward difference).
2. Bilinear transformation (Tustin transformation and trapezoidal integration method).
3. Bilinear transformation and frequency prewarping (precompensation of the frequency distortion between the s and z planes).
4. Impulse-invariant z-transform.
5. Matched z-transform.

2.8.5.1 Derivative Approximation by Finite Difference: Euler's Method
The discretization of an ODE into a finite difference equation (FDE) is based on the method of discretization of the 1st-order derivative ẋ(t), which is defined as

ẋ(t) = dx(t)/dt ≜ lim_{Δt→0} [x(t) − x(t − Δt)]/Δt

where Δt is a time interval. The above, in the continuous case, can be written in the equivalent form

ẋ(t) = dx(t)/dt ≜ lim_{Δt→0} [x(t + Δt) − x(t)]/Δt.

In Euler's method, the time derivative can be approximated by replacing the derivative with a 1st-order finite difference. Thus, the above definition can be discretized with an incremental ratio calculated for a finite sampling interval T. So we have that

y[n − 1] = (x[n] − x[n − 1])/T,  forward difference
y[n] = (x[n] − x[n − 1])/T,  backward difference.   (2.111)
Furthermore, in order to reduce the approximation error, we can combine the previous expressions with a sum, obtaining the modified Euler method, also known as the trapezoidal method,11 obtaining

11 The integration operation with respect to time can be performed with the trapezoidal rule y(t) = y(t − T) + ∫_{t−T}^{t} x(τ)dτ, which can be approximated as the modified Euler formula (2.112).
Fig. 2.42 First-order differentiators filter characteristic curves: a backward difference; b trapezoidal method
y[n] + y[n − 1] = (2/T)(x[n] − x[n − 1])   (2.112)

and solving for y[n], we obtain

y[n] = (2/T)(x[n] − x[n − 1]) − y[n − 1].   (2.113)
Remark 2.27 Note that the expressions (2.111) correspond to a two-coefficient FIR filter, and that the expression (2.113) corresponds to a 1st-order IIR filter. As we will see in the next section, the goodness of the approximation of the derivative depends on the transfer function of such filters. Figure 2.42 shows the filter responses of Eqs. (2.111) and (2.113); it can be seen that the frequency response has highpass characteristics and grows linearly with frequency until approximately 0.3 (normalized frequency). Thus, the DT circuits based on Euler's methods: (1) approximate an ideal differentiator at low frequencies; (2) the trapezoidal method approximates the transfer function of an ideal differentiator better than the simple differences. Note also that the trapezoidal method has zero phase.
2.8.5.2 Bilinear Transformation

For a more quantitative analysis of the approximation error of the 1st-order analog derivative in discretized form, we perform the analysis in the transformed domain. Considering the z-transform of the trapezoidal method expressed by Eq. (2.113) and the Laplace transform of the ideal derivative or differentiator analog circuit (with ideal TF H(s) = s), these are related as
H(z) = Y(z)/X(z) = (2/T)(1 − z^{−1})/(1 + z^{−1}),  ⇔  H(s) = Y(s)/X(s) = s.
Thus, the mapping between the s-plane and the z-plane (also for s = jΩ and z = e^{jωT}) can be written as

s = (2/T)(1 − z^{−1})/(1 + z^{−1}),  ⇔  jΩ = (2/T)(1 − e^{−jωT})/(1 + e^{−jωT}).   (2.114)

From the above, the bilinear z-transform (BZT) is defined by the following substitution

H(z) = H(s)|_{s = (2/T)(1−z^{−1})/(1+z^{−1})}   (2.115)
where T is the sampling period and the pulsatance is equal to ω = 2πf/f_s [radians/sample] (where f_s = 1/T is the sampling rate).
2.8.5.3 Bilinear Transformation with Frequency Prewarping
The mapping between the continuous-time (CT) and discrete-time (DT) frequencies is not linear; in fact, from (2.114), using De Moivre's formula,12 we get

Ω = (2/T) tan(ωT/2)   (2.116)

i.e., the CT angular frequency Ω = 2πf_a is determined as a prewarped version of the requested DT angular frequency ω = 2πf. For example, in the case of IIR sections, the filter mask is usually determined starting from a 2nd-order analog prototype H(s) expressed as

H(s) = (b_2 s^2 + b_1 s + b_0)/(a_2 s^2 + a_1 s + a_0)   (2.117)
in which the frequency specifications (gain, cutoff frequency, bandwidth) are predistorted by the bilinear transformation and remapped again in z after determining the H(s) such as to satisfy the specifications on the target mask. The analog filter specifications are therefore those of the type shown in Fig. 2.43. Let us consider the following example for H(s) = Ω_a/(s + Ω_a):

1. Frequency prewarp, i.e.,

H(s, Ω_a) = K_a / [s + (2/T) tan(Ω_a T/2)];

12 We remind the reader that the De Moivre formula is fundamental in complex numbers, and is defined as (cos x + j sin x)^n = cos(nx) + j sin(nx), for x ∈ R and n ∈ N. It can be derived from the Euler formula: e^{jx} = cos x + j sin x.
Fig. 2.43 Usual frequency-domain specifications for an analog filter
2. Mapping s = (2/T)(1 − z^{−1})/(1 + z^{−1}):

H(z) = K_a / [(2/T)(1 − z^{−1})/(1 + z^{−1}) + (2/T) tan(Ω_a T/2)].

Then, if the required lowpass gain is 1, i.e., the analog DC gain for s = 0, which maps to the digital DC gain at z = 1, we consider

lim_{s→0} H(s) = lim_{z→1} H(z) = K_a / [(2/T) tan(Ω_a T/2)] = 1,  i.e.,  K_a = (2/T) tan(Ω_a T/2).

The digital filter can be defined as

H(z) = tan(Ω_a T/2) / [(1 − z^{−1})/(1 + z^{−1}) + tan(Ω_a T/2)],  i.e.,

H(z) = (1 + z^{−1}) / {[1 + cot(Ω_a T/2)] + [1 − cot(Ω_a T/2)] z^{−1}}.
For other methods, less used in DASP, refer to specialized DSP texts [3–7].

Example 2.17 Discrete-time implementation of an RC analog (resistor, capacitor) 1st-order lowpass filter with cutoff frequency equal to 10 kHz, with R ≈ 160 Ω and C ≈ 0.1 µF. The analog circuit TF is

H(s) = (1/sC)/(R + 1/sC) = 1/(1 + τs) = ω_c/(ω_c + s)

where it is known that the cutoff frequency of the circuit is ω_c = 1/τ and, with different formalism, τ = RC represents the time constant. For a sampling rate of f_s = 44.1 kHz, applying the bilinear transformation we have

H(z) = H(s)|_{s = 2f_s(1−z^{−1})/(1+z^{−1})} = 1/[1 + τ·2f_s(1 − z^{−1})/(1 + z^{−1})] = (1 + z^{−1})/[1 + k_s − (k_s − 1)z^{−1}].

The difference equation for the implementation of the DT filter is equal to

y[n] = [(k_s − 1)/(k_s + 1)] y[n − 1] + [1/(k_s + 1)](x[n] + x[n − 1])   (2.118)
Fig. 2.44 Magnitude response of analog RC filter and its numerical realizations with bilinear transformation with and without frequency prewarping. With prewarping the −3 dB cutoff frequency of the numerical filter exactly coincides with that of the analog filter
where in the absence of prewarping k_s = f_s/(πf_c) = 1.4037, while after the application of prewarping k_s = 1.1578. The magnitude frequency responses of the analog RC filter and its numerical implementations, with and without frequency prewarping, are shown in Fig. 2.44. Observe from the response zoom that in the presence of prewarping the −3 dB cutoff frequency of the numerical filter coincides with that of the analog filter, whereas in its absence it is about 8.7 kHz.
2.8.5.4
Impulse-Invariant Transformation
The basic idea is that the DT impulse response h[n] is a sampled version of the CT impulse response h(t), i.e., h[n] = h(nT). Thus, as the analog TF can be factorized as a sum of the type H(s) = \sum_i \frac{A_i}{s + a_i}, it follows that

H(z) = \mathcal{Z}\{H(s)\} = \sum_i \frac{A_i z}{z - e^{-a_i T}}.
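For a single-pole term, the mapping above can be verified numerically (a sketch with illustrative values, not from the text): the digital filter's impulse response should coincide with the sampled CT exponential.

```python
import math

# single-pole term H(s) = A/(s + a): impulse invariance gives
# H(z) = A z/(z - e^{-aT}), whose impulse response is h[n] = A e^{-a n T} = h(nT)
A, a, fs = 2.0, 1000.0, 8000.0        # illustrative values
T = 1.0 / fs
p = math.exp(-a * T)                  # digital pole

h = []                                # impulse response via y[n] = p y[n-1] + A x[n]
y = 0.0
for n in range(50):
    y = p * y + (A if n == 0 else 0.0)
    h.append(y)

h_ct = [A * math.exp(-a * n * T) for n in range(50)]   # sampled CT response
err = max(abs(u - v) for u, v in zip(h, h_ct))
print(err)                            # numerically zero
```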
However, this method does not preserve the frequency response and may be affected by aliasing.

2.8.5.5

Matched z-Transform
The matched z-transform (MZT) maps the poles and zeros of H(s) in the s-plane to the poles and zeros of H(z) in the z-plane
H(z) = H(s)\big|_{(s+a)\,\to\,(1 - z^{-1}e^{-aT})}, i.e.,

(s + a) \to (1 - z^{-1}e^{-aT})
(s + a \pm jb) \to (1 - 2e^{-aT}\cos(bT)\, z^{-1} + e^{-2aT} z^{-2}).    (2.119)

Let us consider the following example for H(s) = \frac{s}{s+a}, that is a highpass filter; we have that H(z) = K\,\frac{1-z^{-1}}{1-z^{-1}e^{-aT}}. The highest analog frequency s = ∞ maps to the highest digital frequency z = −1; thus we have that H(z)|_{z=-1} = K\,\frac{1-(-1)}{1-(-1)e^{-aT}} = 1, i.e., K = \frac{1+e^{-aT}}{2}. Finally we get

H(z) = \frac{1+e^{-aT}}{2}\,\frac{1-z^{-1}}{1-z^{-1}e^{-aT}}.
Remark 2.28 In the MZT the mapping of the poles from the s-plane to the z-plane is the same as in the impulse-invariant method [3]. However, in the latter the mapping of the zeros takes place differently. Compared to the BZT and other transformations, the MZT requires a gain scaling. If the analog filter has zeros at a frequency higher than f_s/2, their position on the z-plane is affected by aliasing. For an all-pole H(s) TF, the MZT does not adequately represent H(s); we need to add a zero at the point z = −1 for each excess pole.

Example 2.18 Given a highpass H(s)

H(s) = \frac{s}{s+a} \quad \Rightarrow \quad H(z) = K\,\frac{1-z^{-1}}{1-z^{-1}e^{-aT}}.
The gain K can be calculated observing that H(s → ∞) = 1 and imposing the same condition on the DT TF at the highest digital frequency, which is H(z → −1) = 1:

H(z)\big|_{z=-1} = K\,\frac{1-(-1)}{1-(-1)e^{-aT}} = 1 \quad \Rightarrow \quad K = \frac{1+e^{-aT}}{2}.
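A short check of the matched-z highpass (a sketch; the pole frequency and rate below are illustrative). Imposing unity gain at z = −1 gives 2K/(1 + e^{-aT}) = 1, i.e., K = (1 + e^{-aT})/2, which the evaluation confirms:

```python
import math

a, fs = 2 * math.pi * 500.0, 44100.0   # illustrative pole frequency and sample rate
T = 1.0 / fs
K = (1 + math.exp(-a * T)) / 2         # from 2K/(1 + e^{-aT}) = 1

def H(zinv):
    # matched-z highpass H(z) = K (1 - z^-1)/(1 - e^{-aT} z^-1), zinv = z^-1
    return K * (1 - zinv) / (1 - math.exp(-a * T) * zinv)

print(abs(H(-1.0)))   # z = -1 (Nyquist): unity gain
print(abs(H(1.0)))    # z = 1 (DC): the highpass zero, as for H(s) = s/(s+a)
```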
2.9 Multiple-Input Multiple-Output FIR Filter

Modern DASP applications, such as multi-point equalization for room acoustic correction, employ microphone and loudspeaker arrays that are typically implemented with multiple-input multiple-output (MIMO) numerical filters calibrated with adaptive filtering methods [8, 22]. In this section, we extend the FIR filtering notation to the MIMO case, as shown in Fig. 2.45. A MIMO FIR filter, with P inputs and Q outputs, can be characterized by the following impulse responses
Fig. 2.45 Representation of a P-input and Q-output MIMO FIR filter. Each output can be seen as a P-channel filter bank (FB). Hence the MIMO system is interpreted as Q FBs, each with P channels
\mathbf{h}_{ij} \in \mathbb{R}^{M\times 1} = \left[\, h_{ij}[0] \;\cdots\; h_{ij}[M-1] \,\right]^T, \quad i = 1, \ldots, Q, \;\; j = 1, \ldots, P    (2.120)

where the h_{ij} indicate the P × Q impulse responses, considered for simplicity all of identical length M, between the j-th input and the i-th output. Indicating with

\mathbf{x}_j \in \mathbb{R}^{M\times 1} = \left[\, x_j[n] \;\cdots\; x_j[n-M+1] \,\right]^T, \quad j = 1, 2, \ldots, P    (2.121)

the input signals present on the filters' delay lines h_{ij}, for j = 1, 2, ..., P, at the instant n, for the Q outputs we can write

y_1[n] = \mathbf{h}_{11}^T\mathbf{x}_1 + \mathbf{h}_{12}^T\mathbf{x}_2 + \cdots + \mathbf{h}_{1P}^T\mathbf{x}_P
y_2[n] = \mathbf{h}_{21}^T\mathbf{x}_1 + \mathbf{h}_{22}^T\mathbf{x}_2 + \cdots + \mathbf{h}_{2P}^T\mathbf{x}_P    (2.122)
\vdots
y_Q[n] = \mathbf{h}_{Q1}^T\mathbf{x}_1 + \mathbf{h}_{Q2}^T\mathbf{x}_2 + \cdots + \mathbf{h}_{QP}^T\mathbf{x}_P.

The vector x_j is often referred to as the data-record relative to the j-th input of the MIMO system.
Said \mathbf{y}[n] \in \mathbb{R}^{Q\times 1} = [\, y_1[n] \; y_2[n] \; \cdots \; y_Q[n] \,]^T the vector representing all the outputs of the MIMO filter at the time n, called output snap-shot, the output expression can be written as

\mathbf{y}[n] \in \mathbb{R}^{Q\times 1} =
\begin{bmatrix}
\mathbf{h}_{11}^T & \mathbf{h}_{12}^T & \cdots & \mathbf{h}_{1P}^T \\
\mathbf{h}_{21}^T & \mathbf{h}_{22}^T & \cdots & \mathbf{h}_{2P}^T \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{h}_{Q1}^T & \mathbf{h}_{Q2}^T & \cdots & \mathbf{h}_{QP}^T
\end{bmatrix}_{Q\times P}
\begin{bmatrix}
\mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_P
\end{bmatrix}_{P\times 1}    (2.123)

where the above notation implies that each element of the array is an M-length vector. So, with this convention the index M does not appear explicitly in the array dimensions. In fact, rewriting the above in extended mode takes the form

\mathbf{y}[n] =
\begin{bmatrix}
h_{11}[0] \cdots h_{11}[M-1] & \cdots & h_{1P}[0] \cdots h_{1P}[M-1] \\
\vdots & & \vdots \\
h_{Q1}[0] \cdots h_{Q1}[M-1] & \cdots & h_{QP}[0] \cdots h_{QP}[M-1]
\end{bmatrix}_{Q\times PM}
\begin{bmatrix}
x_1[n] \\ \vdots \\ x_1[n-M+1] \\ \vdots \\ x_P[n] \\ \vdots \\ x_P[n-M+1]
\end{bmatrix}_{PM\times 1}    (2.124)

The i-th row of the matrix in Eq. (2.124) contains all the impulse responses of the filters that belong to the i-th output, while the column vector on the right contains the signals of the input channels all stacked in a single column.
2.9.1 MIMO Filter in Composite Notation 1

Equation (2.123) in more compact notation, defined as MIMO filter in composite notation 1, takes the form

\mathbf{y}[n] = \mathbf{H}\mathbf{x}    (2.125)

where \mathbf{H} \in \mathbb{R}^{Q\times P(M)} is defined as

\mathbf{H} \in \mathbb{R}^{Q\times P(M)} =
\begin{bmatrix}
\mathbf{h}_{11}^T & \mathbf{h}_{12}^T & \cdots & \mathbf{h}_{1P}^T \\
\mathbf{h}_{21}^T & \mathbf{h}_{22}^T & \cdots & \mathbf{h}_{2P}^T \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{h}_{Q1}^T & \mathbf{h}_{Q2}^T & \cdots & \mathbf{h}_{QP}^T
\end{bmatrix}_{Q\times P}    (2.126)
Fig. 2.46 Diagram of the i-th MISO sub-system, of the MIMO filter. Each output channel can be implemented as a single scalar product
where with the notation Q × P(M) we denote a partitioned Q × P matrix, in which each element of the partition is a row vector \mathbf{h}_{ij}^T \in \mathbb{R}^{1\times M}. The vector x, said composite input, defined as

\mathbf{x} \in \mathbb{R}^{P(M)\times 1} =
\begin{bmatrix}
\mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_P
\end{bmatrix}_{P\times 1}
= \left[\, \mathbf{x}_1^T \; \mathbf{x}_2^T \; \cdots \; \mathbf{x}_P^T \,\right]^T    (2.127)
is constructed as the vector of all stacked inputs at instant n, i.e., x is formed by the input vectors xi , for i = 1, ..., P.
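Composite notation 1 is easy to verify numerically (a sketch with illustrative sizes; the array names are not from the text): reshaping the bank of impulse responses into the Q × PM matrix H and stacking the delay lines into x reproduces the per-channel dot products of Eq. (2.122).

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q, M = 3, 2, 4                      # inputs, outputs, filter length (illustrative)
h = rng.standard_normal((Q, P, M))     # impulse responses h_ij
xd = rng.standard_normal((P, M))       # delay-line contents x_j at instant n

# direct form of Eq. (2.122): y_i[n] = sum_j h_ij^T x_j
y_direct = np.array([sum(h[i, j] @ xd[j] for j in range(P)) for i in range(Q)])

# composite notation 1, Eq. (2.125): y[n] = H x
H = h.reshape(Q, P * M)                # rows [h_i1^T h_i2^T ... h_iP^T]
x = xd.reshape(P * M)                  # composite (stacked) input, Eq. (2.127)
y_comp = H @ x

print(np.allclose(y_direct, y_comp))   # True
```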
2.9.2 MIMO (P, Q) System as Parallel of Q Filter Banks

The previous notation allows us to interpret a MIMO system as the parallel of Q multiple-input single-output (MISO) systems. Indeed, according to Eqs. (2.122)–(2.124), each of the Q outputs at the instant n can be interpreted as the scalar product of the vectors \mathbf{h}_i = [\mathbf{h}_{i1}^T \; \mathbf{h}_{i2}^T \; \cdots \; \mathbf{h}_{iP}^T]^T and \mathbf{x} = [\mathbf{x}_1^T \; \mathbf{x}_2^T \; \cdots \; \mathbf{x}_P^T]^T; i.e.,

y_i[n] = \mathbf{h}_i^T \mathbf{x}, \quad i = 1, 2, \ldots, Q.    (2.128)
In other words, each output can be interpreted as a bank of P filters. Remark 2.29 Note that the previous notation is the one normally used in the field of array processing, as, for example, in broadband beamforming.
2.9.3 MIMO Filter in Composite Notation 2

Let us define the vector

\mathbf{h}_{j:}^T \in \mathbb{R}^{1\times P(M)} = \left[\, \mathbf{h}_{j1}^T \; \mathbf{h}_{j2}^T \; \cdots \; \mathbf{h}_{jP}^T \,\right]    (2.129)

i.e., the j-th row of the matrix H, and the composite weights vector h, built with the vectors \mathbf{h}_{j:}^T for all j = 1, 2, ..., Q, for which we can write

\mathbf{h} \in \mathbb{R}^{(PM)Q\times 1} =
\begin{bmatrix}
\mathbf{h}_{1:} \\ \mathbf{h}_{2:} \\ \vdots \\ \mathbf{h}_{Q:}
\end{bmatrix}_{Q\times 1}    (2.130)

that is made with all the rows of the matrix H, stacked in a single column, i.e., h = vec(H). In composite notation 2, the MIMO filters h_{ij} are stacked in a column vector defined as

\mathbf{h} \in \mathbb{R}^{(PM)Q\times 1} =
\begin{bmatrix}
\left[\mathbf{h}_{11}^T \cdots \mathbf{h}_{1P}^T\right]^T \\
\left[\mathbf{h}_{21}^T \cdots \mathbf{h}_{2P}^T\right]^T \\
\vdots \\
\left[\mathbf{h}_{Q1}^T \cdots \mathbf{h}_{QP}^T\right]^T
\end{bmatrix}_{Q\times 1}    (2.131)
We define the data composite matrix X as

\mathbf{X} \in \mathbb{R}^{(PM)Q\times Q} = \mathbf{I}_{Q\times Q} \otimes \mathbf{x} =
\begin{bmatrix}
\mathbf{x} & 0 & \cdots & 0 \\
0 & \mathbf{x} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{x}
\end{bmatrix}_{Q\times Q}

where the symbol ⊗ indicates the Kronecker product. Thus, from the above definitions we can express the output as

\mathbf{y}[n] = (\mathbf{I} \otimes \mathbf{x})^T \mathrm{vec}(\mathbf{H}) =
\begin{bmatrix}
\mathbf{x}^T & 0 & \cdots & 0 \\
0 & \mathbf{x}^T & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{x}^T
\end{bmatrix}_{Q\times Q}
\begin{bmatrix}
\mathbf{h}_{1:} \\ \mathbf{h}_{2:} \\ \vdots \\ \mathbf{h}_{Q:}
\end{bmatrix}_{Q\times 1}
= \mathbf{X}^T \mathbf{h}.    (2.132)
2.9.4 MIMO Filter in Snap-Shot or Composite Notation 3

Considering the MISO system of Fig. 2.46, we define the vector

\mathbf{h}_j[k] \in \mathbb{R}^{P\times 1} = \left[\, h_{j1}[k] \; h_{j2}[k] \; \cdots \; h_{jP}[k] \,\right]^T.

In addition, in a similar way we define

\mathbf{x}[0] \in \mathbb{R}^{P\times 1} = \left[\, x_1[0] \; x_2[0] \; \cdots \; x_P[0] \,\right]^T

the vector containing all the inputs of the MISO filter at instant n; this vector is the so-called input snap-shot. Furthermore, we define the vector \mathbf{x}[k] \in \mathbb{R}^{P\times 1} as the signals present on the filter delay lines at the k-th delay. With this formalism, combining the above, the j-th MISO channel output can be expressed in snap-shot notation as

y_j[n] = \sum_{k=0}^{M-1} \mathbf{h}_j^T[k]\, \mathbf{x}[k].    (2.133)
Remark 2.30 Note that the MIMO composite notations 1, 2 and 3, defined by Eqs. (2.125), (2.132) and (2.133), are, from the algebraic point of view, completely equivalent. However, for certain developments in the rest of the text, it is more convenient to use one notation rather than another.
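The algebraic equivalence stated in the remark can be demonstrated directly (a sketch with illustrative sizes; names are not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
P, Q, M = 2, 3, 5                      # illustrative sizes
h = rng.standard_normal((Q, P, M))     # h_ij impulse responses
xd = rng.standard_normal((P, M))       # x_j delay lines at instant n

# notation 1, Eq. (2.125): y = H x
y1 = h.reshape(Q, P * M) @ xd.reshape(P * M)

# notation 2, Eq. (2.132): y = (I_Q kron x)^T vec(H) = X^T h
x_col = xd.reshape(P * M, 1)
X = np.kron(np.eye(Q), x_col)          # (PM)Q x Q data composite matrix
y2 = X.T @ h.reshape(Q * P * M)        # h stacks the rows of H in one column

# notation 3, Eq. (2.133): y_j = sum_k h_j[k]^T x[k], snap-shots x[k] = xd[:, k]
y3 = np.array([sum(h[j, :, k] @ xd[:, k] for k in range(M)) for j in range(Q)])

print(np.allclose(y1, y2), np.allclose(y1, y3))   # True True
```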
References

1. A. Fettweis, Digital circuits and systems. IEEE Trans. Circuits Syst. 31(1), 31–48 (1984)
2. M. Vetterli, J. Kovačević, V.K. Goyal, Foundations of Signal Processing, free version (2013). http://www.fourierandwavelets.org
3. A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing, 2nd edn. (Prentice Hall, Hoboken, 1999)
4. L.R. Rabiner, B. Gold, Theory and Application of Digital Signal Processing (Prentice-Hall, Englewood Cliffs, 1975)
5. T. Kailath, Linear Systems (Prentice Hall, Englewood Cliffs, 1980)
6. A. Antoniou, Digital Filters: Analysis and Design (McGraw-Hill, New York, 1979)
7. S.J. Orfanidis, Introduction to Signal Processing (Prentice Hall, Englewood Cliffs, 2010). ISBN 0-13-209172-0
8. D.G. Manolakis, V.K. Ingle, S.M. Kogon, Statistical and Adaptive Signal Processing (Artech House, Norwood, 2005)
9. J.W. Cooley, J.W. Tukey, An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965)
10. E.O. Brigham, The Fast Fourier Transform and its Application (Prentice-Hall, Englewood Cliffs, 1998)
11. M. Frigo, S.G. Johnson, The design and implementation of FFTW3. Proc. IEEE 93(2), 216–231 (2005)
12. T.W. Parks, J.H. McClellan, Chebyshev approximation for nonrecursive digital filters with linear phase. IEEE Trans. Circuit Theory 19, 189–194 (1972)
13. F.J. Harris, On the use of windows for harmonic analysis with the discrete Fourier transform. Proc. IEEE 66(1), 51–83 (1978)
14. N. Ahmed, T. Natarajan, K.R. Rao, Discrete cosine transform. IEEE Trans. Comput. C-23, 90–93 (1974)
15. S.A. Martucci, Symmetric convolution and the discrete sine and cosine transforms. IEEE Trans. Signal Process. 42(5), 1038–1051 (1994)
16. A. Haar, Zur Theorie der orthogonalen Funktionensysteme. Math. Ann. 69, 331–371 (1910)
17. S.G. Mallat, A Wavelet Tour of Signal Processing (Academic Press, Cambridge, 1998). ISBN 0-12-466605-1
18. M. Vetterli, J. Kovačević, Wavelets and Subband Coding, open-access edn. (2007). http://www.waveletsandsubbandcoding.org/
19. E. Feig, S. Winograd, Fast algorithms for the discrete cosine transform. IEEE Trans. Signal Process. 40(9), 2174–2193 (1992)
20. F. Beaufays, Transform domain adaptive filters: an analytical approach. IEEE Trans. Signal Process. 43(3), 422–431 (1995)
21. M. Moonen, B. De Moor, SVD and Signal Processing, vol. III (Elsevier Science, Amsterdam, 1995). ISBN 9780080542157
22. A. Uncini, Fundamentals of Adaptive Signal Processing (Springer, Berlin, 2015). ISBN 978-3-319-02806-4
Chapter 3
Digital Filters for Audio Applications
3.1 Introduction

The possibility of modifying the signal spectrum is a very common requirement for all digital audio signal processing (DASP) applications [1–9]. Considering a general scheme as shown in Fig. 3.1, the device for its implementation, generally known as a filter, is almost always present in both professional and consumer equipment. For example, the variation of the spectrum may be necessary for the acoustic correction of the listening environment or, more simply, it may be guided by the musical tastes of the listener. Applications include real-time audio effects, such as convolution reverbs, digital room equalization, audio rendering for computer games and acoustic virtual reality, spatial sound reproduction techniques, crosstalk cancelation and many more. In a more general sense the filter, fixed or adaptive, is a device designed to remove components or modify some spectral characteristics of a signal. By filter design, in general, we mean the approximation of a certain frequency response, called the filter mask, or in the case of adaptive filters (AFs) the fulfillment of an appropriate criterion; in both cases the result is obtained with an optimization technique that can sometimes take into account other constraints. In addition, in real-time music applications, such as in synthesizers and digital effects, filters are rarely stationary. For these applications, it is important that the filter remains stable and that the variability of the parameters does not introduce perceptible artifacts. In the case of analog processing, the signal is present at the output of the filter with a delay due to the analog components only, and in this case the processing is, by definition, online and real time. In the case of digital processing, whether it be in the form of dedicated hardware or software, in real-time mode the processing scheme must also be of an online type.
As in the case of analog circuits, the signal is present at the output with the minimum possible delay, compatible with the specific requirements.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Uncini, Digital Audio Processing Fundamentals, Springer Topics in Signal Processing 21, https://doi.org/10.1007/978-3-031-14228-4_3
Fig. 3.1 Online versus batch audio processing: a general scheme for online, mini-batch or batch audio processing [implementing a transfer function (TF)]; b a Digital Signal Processor (DSP), often used in embedded and real-time applications, processes signal blocks (at the limit, a single sample) at a time. The overall delay should be as short as possible, compatible with the specific requirements
3.2 Analog and Digital Audio Filters

In analog or digital audio processing, the filter is commonly used as a tone control or equalizer, i.e., the filter frequency response is modified by enhancing (boosting) or attenuating (cutting) some frequency bands. Moreover, for the determination of the discrete-time (DT) transfer function (TF) H(z), we usually start from the definition of a continuous-time (CT) TF prototype H(s).
3.2.1 Classifications of Audio Filters

Given the possible specifications that can be requested from H(s), or from its digital counterpart H(z) determined by an s → z map, it is necessary to define some filter shapes specific to the area of DASP.

Shelving Filters—Lowpass and highpass filters with variable gain and cut frequency. For example, Fig. 3.2a reports a low-frequency boost and cut and a high-frequency boost and cut, at 250 Hz and 2 kHz respectively. The order of the filter determines the slope of the curve in the transition region, which in audio is generally expressed as ±6 dB/octave (or ±20 dB/decade) for 1st-order filters and ±12 dB/octave (or ±40 dB/decade) for 2nd-order filters.

Peaking or Presence Filters—Bandpass and band-stop filters with variable parameters: gain, frequency of intervention and bandwidth, called parametric filters (see Fig. 3.2a, b).

Equalizers—Combined filters with fixed intervention frequency and bandwidth and variable gains, also called graphic equalizers. They can be connected in parallel or cascade. The nature of the combined response depends on the filter combining topology. More in general, by audio signal equalization we mean the process of amplification or attenuation of specific frequency bands so as to obtain a certain desired frequency response.
[Fig. 3.2 plots: a tone control with shelving and peaking filters, f_L = 250 Hz, f_M = 700 Hz, f_H = 2000 Hz; b parametric filters, peaking const-G = ±18.0 dB vs. peaking const-Q = 0.707. Axes: 20 log_10 |H(f)| [dB] vs. frequency [Hz].]
Fig. 3.2 Typical frequency response of a 2nd-order analog or digital filters used as audio bass, middle, and treble tone control, and parametric filter to modify the audio signal spectrum: a lowpass boost-cut shelving filter and boost-cut peaking filter (low cut frequency at 250 Hz, mid at 700 Hz and high-cut frequency at 2 kHz); b parametric filter bandpass and band-stop filters with equal gain and different frequency bandwidths (fixed G) and parametric filter bandpass and band-stop peaking filter with different gains and fixed bandwidth (constant-Q)
The equalizers and peaking filters can be used for the acoustic correction of the environment or of devices (loudspeakers, etc.), to correct resonances and anti-resonances or, in other terms, to obtain a flat frequency response; or, more simply, to enhance/attenuate certain frequencies depending on the sensitivity of the listener or of the sound engineer. The most common audio signal equalization system, available in almost all analog and digital devices, is probably the tone control. Through a simple adjustment it is
possible to boost or attenuate the high or low frequencies. The circuit that performs this adjustment is the shelving filter defined previously. For example, Fig. 3.2a shows the frequency response of the bass/mid/treble tone control for f_{c1} = 250 Hz, f_{c2} = 700 Hz and f_{c3} = 2 kHz, for some values of G and Q = 1/\sqrt{2} (where Q is the quality factor defined in Sect. 1.3.1.3 as Q = f_0/\Delta f) for the bass and treble controls, and Q = 0.4 for the mid tone control. In the audio field the term equalizer is more properly used when operating selectively on single frequency bands. For example, with a parametric equalizer it is possible to control the gain G, the frequency of intervention f_0 and, in some cases, the bandwidth \Delta f of the filter. In the case of boost or attenuation of a certain frequency the circuit behaves as a band-boost or band-cut.
3.2.2 Shelving and Peaking Filter Transfer Functions We have previously defined shelving-filter as a circuit for adjusting low (bass), mid and high (treble) tones. The shelving filters are, usually, implemented with a 1st- or 2nd-order TFs.
3.2.2.1
First Order Shelving Filter
A 1st-order shelving filter is characterized by a TF with a pole and a zero, plus a parameter K which regulates the gain (boost and cut) of the intervention. The gain parameter K can be placed in the numerator, adjusting the position of the zero, or in the denominator, adjusting the pole position. Consequently, we have two types of 1st-order TF, indicated as type 1 and type 2. Thus, in total, we can identify four types of shelving filter: bass-boost/cut and treble-boost/cut.

Bass Boost and Cut Shelving Filter—The 1st-order bass-boost/cut type 1 and 2 TFs can be, respectively, defined as

H_B^{(1)}(s) = \frac{s/\Omega_c + K}{s/\Omega_c + 1}
H_B^{(2)}(s) = \frac{s/\Omega_c + 1}{s/\Omega_c + K}    (3.1)

where \Omega_c denotes the analog cutoff frequency. The relative frequency responses are shown, respectively, in Fig. 3.3a–c. Remark 3.1 Observe how the boost and cut frequency responses are not complementary; then, to get symmetric boost and cut, we need to insert a switch that selects H_B^{(1)}(s) for bass-boost and H_B^{(2)}(s) for bass-cut, i.e., that flips the position of the numerator and denominator TFs.
Fig. 3.3 Frequency response of 1st-order shelving filters with cut frequency at −3 dB
Treble-Boost and Cut Shelving Filter—The 1st-order treble-boost/cut type 1 and 2 TFs can be defined as

H_T^{(1)}(s) = \frac{K s/\Omega_c + 1}{s/\Omega_c + 1}
H_T^{(2)}(s) = \frac{s/\Omega_c + 1}{K s/\Omega_c + 1}.    (3.2)
Remark 3.2 Observe that also in this case the boost and cut frequency responses are not complementary. So, as shown in Fig. 3.3b–d, to get symmetric boost and cut, we need to insert a switch that selects H_T^{(1)}(s) for treble-boost and H_T^{(2)}(s) for treble-cut.

Parametric Filter—As indicated by Harris in [10], defining

H_{LP}(s) = \frac{1}{s/\Omega_c + 1}

a prefixed lowpass TF, Eq. (3.1) can be decomposed as

H_B^{(1)}(s) = 1 + (K-1)H_{LP}(s), \quad \text{bass-boost}
H_B^{(2)}(s) = \frac{1}{1 + (K-1)H_{LP}(s)}, \quad \text{bass-cut}    (3.3)

for bass-boost and cut, while Eq. (3.2) can be decomposed as

H_T^{(1)}(s) = K + (1-K)H_{LP}(s), \quad \text{treble-boost}
H_T^{(2)}(s) = \frac{1}{K + (1-K)H_{LP}(s)}, \quad \text{treble-cut.}    (3.4)
Remark 3.3 Note that with these configurations, starting from a single a priori determined lowpass HL P (s) TF, the control parameter K is external.
3.2.2.2
2nd-Order Bass Boost and Cut Type 1 and 2 Shelving Filter
To increase the slope (i.e., the band transition) of the shelving filter response, you need to use 2nd-order cells. The 2nd-order type 1–2 lowpass shelving filters can be written as

H_B^{(1)}(s) = \frac{G + \sqrt{G}\,\dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}{1 + \dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}, \quad \text{and} \quad
H_B^{(2)}(s) = \frac{1 + \dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}{G + \sqrt{G}\,\dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}    (3.5)
where \Omega_0 = 2\pi f_0 represents the analog-domain cut angular frequency (or pulsatance) of the filter, and the term Q is the quality factor (Q-factor) that determines the filter approximation around \Omega_0. For the so-called Butterworth max-flat TF we have Q = 1/\sqrt{2} and |H(j\Omega_0)|_{dB} = |H(j0)|_{dB} − 3 dB. So, for type 1, if s → 0 we have H(s) = G, while for s → ∞, H(s) = 1. It follows that for G > 1 we obtain a bass-boost and for G < 1 a bass-cut.
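The two limits just stated can be confirmed by evaluating Eq. (3.5) at the extremes of the jω axis (a sketch; gain, Q and Ω_0 values are illustrative):

```python
import numpy as np

G = 10 ** (12 / 20)                    # +12 dB shelf (illustrative)
Q = 1 / np.sqrt(2)                     # Butterworth max-flat choice
w0 = 2 * np.pi * 100.0                 # illustrative corner frequency

def H_bass1(s):
    # type-1 2nd-order bass shelving, Eq. (3.5)
    return (G + np.sqrt(G) * s / (Q * w0) + (s / w0) ** 2) \
         / (1 + s / (Q * w0) + (s / w0) ** 2)

print(abs(H_bass1(0)))                 # s -> 0 : gain G (bass-boost for G > 1)
print(abs(H_bass1(1j * 1e9)))          # s -> infinity : gain -> 1
```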
3.2.2.3
2nd-Order Treble Boost and Cut Type 1 and 2 Shelving Filter
The highpass type 1–2 filters can be defined by the following TFs

H_T^{(1)}(s) = \frac{1 + \sqrt{G}\,\dfrac{s}{Q\Omega_0} + G\dfrac{s^2}{\Omega_0^2}}{1 + \dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}, \quad \text{and} \quad
H_T^{(2)}(s) = \frac{1 + \dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}{1 + \sqrt{G}\,\dfrac{s}{Q\Omega_0} + G\dfrac{s^2}{\Omega_0^2}}    (3.6)
so, for type 1, if s → 0 we get H(s) = 1, while for s → ∞, H(s) = G. It follows that for G > 1 we have a treble-boost and for G < 1 a treble-cut. In this way, we can obtain amplitude response curves similar to those in Fig. 3.3a–e, but with double slopes of 12 dB/octave, as those reported, for example, for the bass and treble controls in Fig. 3.4.
3.2.2.4
2nd-Order Peaking Filters Transfer Functions
A general form of a 2nd-order type 1–2 s-domain TF that can be used for peaking filters can be written as

H_P^{(1)}(s) = \frac{1 + G\dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}{1 + \dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}, \quad \text{and} \quad
H_P^{(2)}(s) = \frac{1 + \dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}{1 + G\dfrac{s}{Q\Omega_0} + \dfrac{s^2}{\Omega_0^2}}.    (3.7)
Fig. 3.4 Amplitude response of symmetric shelving boost/cut bass filters with f c = 100 Hz and boost/cut-treble with f c = 5 kHz, for various values of the G dB gain. To obtain a symmetrical response it is necessary to switch between two TFs
Peaking filters are very important in DASP as they can be used as parametric filters in which it is possible to define the intervention frequency \Omega_0, the gain G and, by adjusting the Q-factor, the peak bandwidth. Figure 3.5 shows some typical situations. For example, in the case of tone control, we choose a certain Q-factor (e.g., Q = 0.5, 1/\sqrt{2}, ...) and set the desired gain G. In other situations (see the figure), setting the frequency and the gain, it is possible to adjust the width of the intervention band by adjusting the Q-factor.
3.2.2.5
Shelving and Peaking Filter’s Composition
In practice, it is possible to combine multiple shelving and peaking filters, for example by connecting them in cascade, in order to obtain an overall desired response or equalization curve. With very narrow-band peaking filters, it is possible to correct specific resonance or anti-resonance frequencies due to the presence of troublesome natural modes of the room, or to eliminate annoying whistles due to feedback between microphone and loudspeaker. For example, Fig. 3.6 shows the frequency response of some shelving and peaking filters connected in cascade that are used for acoustic room correction. However, note that equalization with deep peaks or notches (with amplitude > 12 dB) should not be attempted, because of potential adverse effects in other areas of the room, as well as the possibility of amplifier overload.
Fig. 3.5 2nd-order peaking filters centered at various frequencies 100 Hz, 1 kHz and 5 kHz, for example designed in order to remove specific resonances. It is possible to perform a fine-tuning, suitably varying the parameters frequency 0 , Q-factor and the filter gain G
3.2.3 Frequency Bandwidth Definitions for Audio Applications

In DASP the definition of the filter bandwidth may differ from the standard definition at half-power, or at −3 dB, obtained from Eqs. (1.39) and (1.40). In fact, as described in the literature (e.g., [11, 12]), there are some variations in the definition of the bandwidth \Delta f and of the gain G_B at the cutoff frequencies f_1 and f_2. Let us now consider some definitions of bandwidth for filters, or banks of digital equalizer filters, commonly used in the audio field.
3.2.3.1
Regalia-Mitra’s Bandwidth Definitions
Regalia and Mitra [13] and Massie [14] define the band of the digital filter as the frequency distance between the two points at −3 dB around the minimum-gain point when the filter is a notch filter, i.e., for G = 0 in Eq. (3.7) (H_P^{(1)}(s)), as shown in Fig. 3.7. Note that with this definition the bandwidth value does not change by modifying the value of the gain G (Figs. 3.7 and 3.8). The problem with this definition is that two complementary filters with the same boost and cut frequency and with opposite gains do not produce a perfectly flat resultant at 0 dB.
Fig. 3.6 Example of equalization curve using two shelving filters at 100 Hz and 5 kHz, and four peaking filters at 80 Hz, 120 Hz, 1 kHz and 3.1 kHz. The narrowband peaking filters are used in order to remove specific resonances or anti-resonances due to the room modes. For example, the notch filter at about 3 kHz can be used to remove the so-called Larsen effect due to microphone–loudspeaker feedback
Fig. 3.7 Bandwidth definition of Regalia-Mitra [15]
White in [8] defines bandwidth simply as the frequency distance between the two points at +3 dB for boost filters, and at −3 dB for cut filters, evaluated with respect to the 0 dB level. However, with this definition it is impossible to determine the band when the boost or attenuation is less than 3 dB. Moorer in [7] corrects White's definition by specifying that: (1) for gains greater than ±6 dB, the bandwidth is defined as the distance between the two points at ±3 dB; (2) for gains lower than ±6 dB, the band is defined as the distance between the two points at G_{dB}/2 (between 0 and G_{dB} dB), the midpoint gain in dB.
Fig. 3.8 Bandwidth definition of White [8]
Fig. 3.9 Typical frequency response of a 2nd-order numerical resonator used as a digital equalizer cell
This last definition is attractive in that it is mathematically consistent and simplifies the automatic filter design procedures (which often must be done online, i.e., during the filtering operation itself, simply by moving a cursor).
3.2.3.2
Frequency Bandwidth Definition of Orfanidis
Referring to Fig. 3.9, we indicate with f_s, f_0 and \Delta f = f_2 − f_1 the sampling frequency, the central frequency, and the filter bandwidth, respectively; we indicate with G_0 the gain at zero frequency (DC), which is equal to the gain at f_s/2 (Nyquist frequency); with G the gain at f_0; and with G_B the gain at f_1 and f_2. Moreover, we can affirm that for a boost filter G > G_B > G_0 holds, while for a cut filter we have G < G_B < G_0.
For a boost filter it is possible to define G_B (and consequently the band \Delta f) in various ways. Below are some examples [12]:

• 3 dB below the peak G, as G_B^2 = G^2/2;
• 3 dB above the reference gain G_0, as G_B^2 = 2G_0^2;
• arithmetic average between the peak and the reference gain, G_B^2 = (G_0^2 + G^2)/2;
• geometric average between the peak and the reference gain, G_B^2 = G_0 G.
It should also be noted that the 3 dB definition is possible only if the boost gain is higher than 3 dB, i.e., if G^2 \geq 2G_0^2. With similar reasoning, also for a cut filter it is possible to define a gain G_B at 3 dB above the cut gain G, as G_B^2 = 2G^2; or at 3 dB below the reference gain G_0, as G_B^2 = G_0^2/2; or as an arithmetic or geometric mean between the peak and the reference gain. Also in this case the first definition is possible only if the cut is greater than 3 dB.

Remark 3.4 The definition of G_B through the geometric or arithmetic mean is also possible when the boost or cut gain is less than 3 dB. In this case, as indicated in [12], it is appropriate to use weighted arithmetic or geometric averages

G_B^2 = \alpha G_0^2 + (1-\alpha)G^2, \quad \text{arithmetic mean}
G_B = G_0^{\alpha} G^{(1-\alpha)}, \quad \text{geometric mean}    (3.8)

where 0 < \alpha < 1 (conventional averages are given for \alpha = 1/2). The advantage of defining G_B through the weighted geometric mean is that equal and opposite boost and cut compensate each other perfectly, since the transfer functions are exactly the inverse of each other.
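The symmetry argument behind the geometric-mean choice can be sketched numerically (illustrative gains; the helper names are not from the text): a boost of G and the complementary cut of 1/G give reciprocal bandwidth gains, so their product is exactly unity.

```python
import math

def gb_arithmetic(G0, G, alpha=0.5):
    # weighted arithmetic mean of squared gains, Eq. (3.8)
    return math.sqrt(alpha * G0 ** 2 + (1 - alpha) * G ** 2)

def gb_geometric(G0, G, alpha=0.5):
    # weighted geometric mean, Eq. (3.8)
    return G0 ** alpha * G ** (1 - alpha)

G0 = 1.0                               # 0 dB reference
G = 10 ** (12 / 20)                    # +12 dB boost (illustrative)

gb_boost = gb_geometric(G0, G)         # midpoint: +6 dB
gb_cut = gb_geometric(G0, 1 / G)       # complementary -12 dB cut: -6 dB
print(20 * math.log10(gb_boost), gb_boost * gb_cut)
```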
3.2.3.3
Midpoint Gain Versus −3 dB Bandwidth
In DASP, to obtain a certain response curve, filter combinations are very often used, so the choice of the definition of bandwidth is very important and depends on the overall desired response. One of the definitions widely used in the audio sector (see for example [16, 17]) is the so-called midpoint gain (see Fig. 3.10), which corresponds to the geometric mean in the expression (3.8) for G_0 = 1 and \alpha = 1/2; that is, the arithmetic mean of the extreme gains in dB, which on a natural scale is \sqrt{G_{max}}. In other words, in dB we have the midpoint between 0 dB and G_{max,dB}; in fact

G_{mid} = 10^{\frac{1}{2}\frac{G_{max,dB}}{20}} = \sqrt{G_{max}}.    (3.9)
Fig. 3.10 Midpoint gain versus −3 dB bandwidth: amplitude responses of boost-cut bass-treble shelving filters with equal cut frequency, at 800 Hz
Fig. 3.11 Amplitude responses of a constant-Q filter bank: a represented on a logarithmic frequency scale; b represented on a natural frequency scale
3.2.4 Constant-Q Equalizers

In multiband equalizers, made with filter banks, each filter has a central frequency which is equally spaced, on a logarithmic scale, with respect to the upper and lower cutoff frequencies, while the band is proportional to the central frequency itself. When the ratio between the central frequency f_0 and the band \Delta f is fixed a priori, and is the same for all the filters, the bank is denoted as a constant-Q filter bank [56]. In other words, a graphic equalizer created with a filter bank is said to be a constant-Q equalizer if the bandwidth of the filters increases with the filter's central frequency, so that the quantity Q = f_k/\Delta f_k is constant. Figure 3.11 shows the qualitative trend of a constant-Q filter bank on a logarithmic and on a linear frequency scale. In the audio sector the bandwidth is generally expressed in octaves. The bandwidth in octaves (or fractions, generally integer, of it), b_w, is defined considering the ratio between the upper cutoff frequency f_2 and the lower one f_1. One octave, for example, corresponds to a doubling of frequency, i.e., f_2 = 2 f_1. For a width of half an octave (b_w = 1/2) we will have an upper cutoff frequency of f_2 = 2^{1/2} f_1. The octave ratio between f_2 and f_1 is expressed as
Fig. 3.12 Example of 1/3rd octave equalizer with 30 bands from 25.00 Hz to 20 kHz according to ISO Standard [18]. Due to the crowding of filters the maximum resolution of the common audio equalizer is one-third octave
\frac{f_2}{f_1} = 2^{b_w}.    (3.10)
The bandwidth b_w, expressed in integer fractions of an octave, is a natural quantity defined as the exponent of 2 in the expression (3.10); that is, as the difference between the base-2 logarithms of the upper and lower cutoff frequencies: b_w = \log_2(f_2) − \log_2(f_1). The central frequency of the filter results, from (1.41) and (3.10),

f_0 = \sqrt{f_1 f_2} = f_1\sqrt{2^{b_w}} = f_2/\sqrt{2^{b_w}}    (3.11)

in other words, the central frequency of the filter is equally spaced, on the logarithmic scale, from the upper and lower cutoff frequencies; that is,

\ln(f_0) = \frac{\ln(f_1) + \ln(f_2)}{2}.
Remark 3.5 In the natural musical scale, the distance of one octave corresponds to the interval of an eighth degree with respect to a reference note. The octave note is a replica of the reference but with double frequency. Boosting or attenuating the spectrum by a certain number of octaves therefore has a very precise correspondence with the musical scale. It is for this reason that, in the design of equalizers for the audio signal, the number of channels is expressed in octaves or fractions of octaves. The entire audible spectrum can be covered with about 10 octaves: an octave equalizer covers the audible spectrum with 10 filters, while a one-third-octave equalizer, for which bw = 1/3, takes 30 bands (see Fig. 3.12). As an example of frequency calculation, consider an octave equalizer (bw = 1), so that f2 = 2f1. By (3.11), the center frequency of the filter must be equidistant, on a logarithmic scale, from the upper and lower cutoff frequencies, for which
3 Digital Filters for Audio Applications
Table 3.1 Octave equalizer frequencies (10 bands)

f1 = f0 / 2^(1/2)    f0          f2 = f0 · 2^(1/2)
22.1 Hz              31.25 Hz    44.2 Hz
44.2 Hz              62.5 Hz     88.4 Hz
88.4 Hz              125 Hz      177 Hz
177 Hz               250 Hz      353 Hz
353 Hz               500 Hz      707 Hz
707 Hz               1 kHz       1.414 kHz
1.414 kHz            2 kHz       2.828 kHz
2.828 kHz            4 kHz       5.657 kHz
5.657 kHz            8 kHz       11.31 kHz
11.31 kHz            16 kHz      22.627 kHz
The number of bands to cover the audible spectrum is equal to ten. Note that in ISO standards, in order to have simpler numbers, some frequencies are rounded [18]
f1 = f0 / 2^(bw/2),  and  f2 = f0 · 2^(bw/2).   (3.12)
Table 3.1 shows a possible choice of frequencies for a ten-channel octave audio equalizer covering the entire audible spectrum. In general, the choice is made starting from the central frequency of 1 kHz, calculating the lower and upper frequencies with the previous expressions.

Remark 3.6 For octave filter banks (or fractions thereof) the quality factor Q can be obtained from expression (3.11) and from the definition of Q. So we get

Q = f0 / (f2 − f1) = √(2^bw) / (2^bw − 1).   (3.13)

It follows that the filter Q-factor depends only on the choice of the bandwidth expressed in octaves, and is constant for every filter of the equalizer. For example, for an octave equalizer crossing at −3 dB, calculating Q with expression (3.13) we obtain Q = √2 ≈ 1.414; for a half-octave equalizer Q ≈ 2.871; for a one-third-octave equalizer Q ≈ 4.318; and so on.
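As a quick numerical check of Eqs. (3.12) and (3.13), the following sketch (the function names are illustrative, not from the text) computes the band edges and the constant Q-factor for octave-based equalizers:

```python
import math

def band_edges(f0, bw):
    """Lower and upper cutoff frequencies of a band centered at f0
    and bw octaves wide (Eq. 3.12)."""
    return f0 / 2 ** (bw / 2), f0 * 2 ** (bw / 2)

def q_factor(bw):
    """Constant Q-factor of a bw-octave filter bank (Eq. 3.13)."""
    return 2 ** (bw / 2) / (2 ** bw - 1)

# The ten octave bands of Table 3.1, built around the 1 kHz reference.
centers = [1000.0 * 2 ** k for k in range(-5, 5)]   # 31.25 Hz ... 16 kHz
```

For bw = 1, 1/2, and 1/3 the function reproduces the Q values quoted above (√2 ≈ 1.414, ≈2.871, and ≈4.318).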
3.2.5 Digital Audio Signal Equalization

Analog filtering can be performed only in real time, whereas the numerical approach also allows the processing to be performed asynchronously with respect to the signal acquisition times. For example, in post-production operations it is possible to modify the signal spectrum with filters that operate offline, usually faster than the online algorithms. During the filtering process the signal is stored on
a device (e.g., hard disk) and available for listening only at the end of the filtering process. With the advent of digital devices (both hardware and software), it is possible to define several ways to perform audio signal processing.

Online Filtering—The process is performed with a filter, generally of causal type, in real time; that is, the processing time of a signal sample is less than the sampling period. The filter output generally feeds a reproduction device. Sometimes it is required to modify the characteristics of the filter while listening (i.e., at run-time) without interrupting it. In this case, to avoid audible artifacts, it is necessary to pay particular attention to the structure used for filtering and to the technique of updating the filter coefficients [56]. It is possible, if the application allows it, to use non-causal filters, but with a delay between input and output (group delay) equal to the length of the buffer required for mini-batch type processing. In these cases the choice of configuration and the design of the equalizer are more critical. It is possible to use a cascade configuration of IIR filters or a parallel bank of IIR filters. Moreover, the use of FIR filters, although theoretically possible, would require filters with a length of thousands of coefficients, requiring implementations in the frequency domain or with complex multirate architectures.

Offline Filtering—The filtering process can be completely independent of the acquisition process. In general, the possibility of modifying the amplitude response during the filtering process may not be required, and the filter may not be causal. The filter output can be a storage device; but even if the output, compatibly with the sampling period, is an audio device, the filter group delay can be high.
Remark 3.7 It should be noted that, although from the point of view of the final result the aims of online and offline filtering are in general the same, the two approaches often lead to algorithm structures and problems that can be very different. In digital mixing consoles (where, by definition, the filters must work online) the specifications on the filters are very restrictive and concern, in addition to the computation time, the group delay and the possibility of modifying the filter parameters online. A very high group delay, for example, may not be compatible with the application. Furthermore, the online modification of the filter parameters is not a problem of simple solution, since it involves online control of stability and, depending on the adopted circuit form, one could obtain audible artifacts at the output.
3.2.5.1 Equalization Using FIR or IIR Filters
In the audio sector, both FIR and IIR filter types are commonly used. When the specifications on the group delay are not stringent and no run-time tone controls are required, as in audio-streaming software, the possibility of having a linear phase makes FIR filters more suitable for implementing equalizers. In the audio field, in fact, FIR filters are always used when the problem of high group delay does
Fig. 3.13 Digital FIR filter equalizer. This solution is generally adopted in audio-streaming software and where there are no problems of excessive group delay
not arise. However, in digital audio mixers the use of quite long FIR filters must be approached with some caution. As shown in Fig. 3.13, a graphic equalizer can be implemented by a single FIR filter. The desired frequency response is generally defined in terms of the absolute value |Hd(e^jω)| and can be derived by interpolating the positions of a set of graphic or mechanical cursors centered around appropriate frequencies. The calculation of the filter parameters can be performed with an optimization procedure that provides an approximate frequency response, with an error that mainly depends on: (1) the type of metric used in defining the cost function; (2) the length of the filter itself. To avoid the optimization procedure, the calculation of the impulse response can be performed with the windowing method (see Sect. 2.7.4). In practice, h[n] is obtained from |Hd(e^jω)|, imposing simple constraints on the phase, through an inverse Fourier transform (generally implemented with the FFT), and then multiplying the result by a suitable window function w[n].

Remark 3.8 An FIR filter can be designed to approximate any equalization curve and provides maximum flexibility, since magnitude and phase can be adjusted specifically and semi-independently. However, to support low frequencies, many thousands of taps are needed. So, to implement these filters economically, complicated methods like multirate processing or the use of the fast Fourier transform (FFT) are needed (see [1–5]). Furthermore, to ensure a low group delay the equalizer can be implemented directly in the frequency domain with the partitioned overlap-save algorithm (this argument is introduced in Sect. 3.5.2).
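The windowing procedure just described can be sketched in a few lines (a hedged illustration, not the book's reference implementation: the target curve, grid sizes, and names are arbitrary choices). A zero-phase impulse response is obtained from the sampled target magnitude by an inverse DFT, then truncated, windowed, and shifted to causal linear phase:

```python
import cmath
import math

fs = 48000.0   # sampling rate (assumption for this example)
M = 1024       # frequency-grid size for the inverse DFT
N = 255        # FIR length (odd, so the causal filter has exactly linear phase)

def desired_mag(f):
    """Hypothetical slider curve |Hd|: +6 dB (x2) below 1 kHz, 0 dB above."""
    return 2.0 if f < 1000.0 else 1.0

# Target magnitude sampled on the DFT grid; the min(k, M - k) folding makes
# the sequence real and even, so the impulse response below is real.
Hd = [desired_mag(min(k, M - k) * fs / M) for k in range(M)]

def h_zero_phase(n):
    """Zero-phase impulse response: inverse DFT of the real, even |Hd|."""
    return sum(Hd[k] * math.cos(2 * math.pi * k * n / M) for k in range(M)) / M

# Truncate to N taps, apply a Hann window, and shift to causal linear phase.
L = (N - 1) // 2
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (N - 1)) for i in range(N)]
h = [h_zero_phase(i - L) * hann[i] for i in range(N)]

def mag_at(f):
    """Achieved magnitude response of the windowed FIR at frequency f."""
    w = 2 * math.pi * f / fs
    return abs(sum(h[n] * cmath.exp(-1j * w * n) for n in range(N)))
```

With these (arbitrary) sizes, the achieved response stays within a fraction of a dB of the target away from the 1 kHz transition, illustrating why low-frequency detail demands long filters.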
3.2.5.2 Tone Control, Parametric Filters and Graphic Equalizers
In general terms, in professional or consumer audio scenarios, any change in frequency response determined by a certain transfer function is referred to as equalization. The term was originally used to flatten, i.e., equalize, the response of a device affected by linear distortions: for example, correcting the response of a telephone line, correcting a room response, or flattening the response of a microphone or of a loudspeaker. However, the more modern meaning of the term is related to a much broader and more active use of equalization in the entire chain of production and consumption of musical, or in general audio, signals. More specifically, from the point of view of the end user, we can define the following types of equalizers.

Tone Control—Also present in older audio amplifiers, the tone control appears as two or three knobs or sliders that amplify or attenuate frequency bands specified a priori, generally indicated as bass and treble, or bass, mid-range, and treble. The end user adjusts the tone control according to his/her musical taste.

Parametric Filters—Parametric filtering is generally based on IIR filters, in which the central frequency, the bandwidth (i.e., Δf), and the level (boost or cut) can be selectively adjusted. Parametric equalizers are mostly used in professional audio because they require an experienced user.

Graphic Equalization—In the most common understanding of the term, an equalizer is a graphic equalizer. Typically it is a device (hardware or software) that is externally controlled by a set of sliders (physical or virtual) whose positions determine the desired frequency response, as shown in Fig. 3.13. A graphic equalizer can be implemented using a (usually long) FIR filter, as shown in Fig. 3.13, or by a cascade or parallel filter structure.

Remark 3.9 Note that, as better explained further on, both cascade and parallel equalizers suffer from interaction between adjacent passband filters.
This phenomenon is due to the fact that the gain of a filter centered around its working frequency is not zero in the adjacent bands. So, increasing the gain of a certain band inevitably produces an increase in the adjacent bands. Therefore, the actual response of the equalizer is different from the graphical target defined by the position of the sliders. In other words, the band-interaction phenomenon causes substantial errors in the overall magnitude response of the equalizer [19, 20].
3.2.5.3 Cascade Equalization Architecture
In the case of filters for tone control and equalization, the most common configuration is the cascade one [15, 19, 21]. With reference to Fig. 3.14, as already seen in the example of Fig. 3.6, for each filter it is possible to adjust the intervention frequency, the bandwidth, and the gain. The filter, in this case, is a parametric filter.
Fig. 3.14 Audio equalizer with cascade configuration of IIR 2nd-order sections. The parameters of each filter can be changed online. This configuration is typically used in professional tone controls
Furthermore, for the implementation of the equalizer when computing resources are limited, it can be more practical and more efficient to use a cascade configuration of 2nd-order IIR sections. In the case of graphic equalizers, as in octave, half-octave, and one-third-octave equalizers, the Q-factor is constant and fixed a priori by Eq. (3.13), and only the gain of each peaking filter is changed. If the possibility of modifying the filters' response online, without interrupting the filtering operation, is required, the implementation form of the IIR filter is of central importance. In addition, filters that require the possibility of modifying the spectrum at run-time are used in many other applications such as, for example, vocal synthesis, sound synthesis, and some audio effects.

Remark 3.10 Note that, in the cascaded implementation of the graphic equalizer, each filter has a fixed bandwidth and unity gain (i.e., 0 dB) so that, in the absence of adjustments, the overall amplitude response is perfectly flat.
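The flatness property of Remark 3.10 can be checked numerically. The sketch below (an illustration, assuming peaking sections in the Audio-EQ-Cookbook form popularized by R. Bristow-Johnson, consistent with the design derived in Sect. 3.3.1; all names are illustrative) cascades ten octave-band sections and verifies that with all gains at 0 dB the response is flat, while a +12 dB slider reproduces exactly +12 dB at its center frequency:

```python
import cmath
import math

fs = 48000.0  # illustrative sampling rate

def peaking_coeffs(f0, q, gain_db):
    """2nd-order peaking-section coefficients (Audio-EQ-Cookbook form),
    normalized so that a0 = 1.  At 0 dB the b and a vectors coincide."""
    A = 10.0 ** (gain_db / 40.0)          # square root of the natural gain G
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / A
    b = [(1 + alpha * A) / a0, -2 * math.cos(w0) / a0, (1 - alpha * A) / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha / A) / a0]
    return b, a

def response(sections, f):
    """Cascade magnitude response: the section responses multiply."""
    z = cmath.exp(1j * 2 * math.pi * f / fs)
    h = 1.0 + 0j
    for b, a in sections:
        h *= (b[0] + b[1] / z + b[2] / z ** 2) / (a[0] + a[1] / z + a[2] / z ** 2)
    return abs(h)

Q = math.sqrt(2.0)  # constant octave-band Q, Eq. (3.13)
flat = [peaking_coeffs(1000.0 * 2 ** k, Q, 0.0) for k in range(-5, 5)]
boost = peaking_coeffs(1000.0, Q, 12.0)
```

At 0 dB each section degenerates to H(z) = 1, so the cascade is exactly flat — the property Remark 3.10 relies on.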
3.2.5.4 Band Interactions
The 2nd-order peaking cells are quite flexible, as they have a magnitude response that can be quite sharp at the peak. However, it asymptotically approaches a 6 dB per octave slope farther away. This means that, as previously stated (see Remark 3.9) and as shown in Fig. 3.15, the adjustment of a specific band inevitably leads to a significant effect on the adjacent bands. For example, it is difficult to make the response maximum at one band and minimum at the next. It is well known, in fact, that the frequency responses of the
Fig. 3.15 Overall response of a one-half octave equalizer with 22 bands, for different target cursor positions. In the upper part of the figure the TFs H(z) are determined with the bilinear transform (see Sect. 2.8.5) of the peak filters in Eq. (3.7) (Zölzer equalizer), while in the lower part of the figure the H(z) are determined by the Bristow-Johnson method of Sect. 3.3.1. a Linearly decreasing gain ramp; b alternate gains ±12 dB; c all gains at +12 and −12 dB
graphic equalizers, both analog and digital, created by combining IIR filter sections correspond only approximately to the operator's settings. In order to remove the interband interference it is therefore necessary to add other filters, in this case adaptive, able to correct these errors [17, 22]. Moreover, parallel and cascade topologies can be combined to yield overall responses with more accuracy and flatness.
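The magnitude of the interaction can be measured with a short numerical experiment (again a hedged sketch using Audio-EQ-Cookbook-style peaking sections; the chosen bands and gains are illustrative): with octave bands at Q = √2, a +12 dB slider centered at 2 kHz adds a few decibels at the 1 kHz center as well.

```python
import cmath
import math

fs = 48000.0  # illustrative sampling rate

def peaking(f0, q, gain_db):
    """Peaking section (Audio-EQ-Cookbook form), normalized to a0 = 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2 * math.pi * f0 / fs
    al = math.sin(w0) / (2 * q)
    a0 = 1 + al / A
    return ([(1 + al * A) / a0, -2 * math.cos(w0) / a0, (1 - al * A) / a0],
            [1.0, -2 * math.cos(w0) / a0, (1 - al / A) / a0])

def mag_db(sections, f):
    """Cascade magnitude response in dB."""
    z = cmath.exp(1j * 2 * math.pi * f / fs)
    h = 1.0 + 0j
    for b, a in sections:
        h *= (b[0] + b[1] / z + b[2] / z ** 2) / (a[0] + a[1] / z + a[2] / z ** 2)
    return 20 * math.log10(abs(h))

Q = math.sqrt(2.0)                                  # octave bands, Eq. (3.13)
alone = mag_db([peaking(1000.0, Q, 12.0)], 1000.0)  # the slider's own band
both = mag_db([peaking(1000.0, Q, 12.0),
               peaking(2000.0, Q, 12.0)], 1000.0)   # plus the adjacent slider
leak = both - alone   # extra dB at 1 kHz caused by the 2 kHz slider
```

The leakage amounts to a few dB — the same error visible in Fig. 3.15b, where alternating ±12 dB targets are clearly not met.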
3.2.5.5 Parallel Equalization Architecture
In the case of a parallel filter bank (FB) [20, 23–26], as shown in Fig. 3.16, the circuit parameters are determined by the specifications on the intervention bands, while the tone adjustment is performed with a gain gi that boosts, for gi > 1, or attenuates, for gi < 1, the ith frequency band. The parallel connection of the bandpass filters prevents the accumulation of phase errors and, potentially, also avoids the accumulation of quantization noise typical of cascade architectures [1]. In a parallel implementation, each bandpass filter produces a resonance at its center frequency and has a low gain (ideally zero) at the other center frequencies. However, as in the cascade structure, the parallel structure is equally prone to interaction between the filter bands when the command gains are set to a non-flat configuration. This implies that not only the amplitude response but also the phase of each bandpass filter affects the total frequency response of the parallel graphic equalizer [25]. Moreover, the determination of the phase of the parallel structure is a rather more complicated operation than the design of a cascade graphic equalizer. Given
Fig. 3.16 Audio equalizer realized with parallel IIR filter bank. The ith band gain is regulated by a gain coefficient, positioned downstream of each filter. The gain g0 represents the direct path gain used in flat configuration (modified from [10])
a target response, determined by the command gain settings, the determination of the individual filter responses turns out to be a rather complex optimization problem. To simplify the problem, Bank in [26] proposes a simplified parallel structure where each channel is characterized by an IIR filter with a 1st-order numerator and a 2nd-order denominator, whose poles are set a priori considering the number of bands. For example, for a one-octave equalizer with 10 bands, 20 filter poles are preassigned. The determination of the remaining free parameters is done with an ordinary least squares (OLS) procedure (such as that used in adaptive filtering [27]) that determines the numerator coefficients and the actual gain of each channel. In addition, assuming the equalizer is minimum phase, the desired phase curve can be easily obtained by considering the Hilbert transform of the amplitude spectrum and can therefore be inserted into the OLS optimization process.
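A minimal sketch of the parallel architecture of Fig. 3.16 follows (this is not Bank's OLS design: here the bandpass sections are simply Audio-EQ-Cookbook-style constant 0 dB peak-gain resonators and the gains are set by hand, so all choices are illustrative):

```python
import cmath
import math

fs = 48000.0  # illustrative sampling rate

def bandpass(f0, q):
    """Constant 0 dB peak-gain bandpass (Audio-EQ-Cookbook form);
    at its center frequency the section has unit gain and zero phase."""
    w0 = 2 * math.pi * f0 / fs
    al = math.sin(w0) / (2 * q)
    a0 = 1 + al
    return ([al / a0, 0.0, -al / a0],
            [1.0, -2 * math.cos(w0) / a0, (1 - al) / a0])

def parallel_mag(g0, bands, gains, f):
    """|H| of the parallel bank of Fig. 3.16: H = g0 + sum_i g_i H_i."""
    z = cmath.exp(1j * 2 * math.pi * f / fs)
    h = g0 + 0j
    for (b, a), g in zip(bands, gains):
        h += g * (b[0] + b[1] / z + b[2] / z ** 2) / (a[0] + a[1] / z + a[2] / z ** 2)
    return abs(h)

centers = [1000.0 * 2 ** k for k in range(-2, 3)]     # 250 Hz ... 4 kHz
bands = [bandpass(f0, math.sqrt(2.0)) for f0 in centers]
flat = parallel_mag(1.0, bands, [0.0] * 5, 1000.0)    # direct path only
one_up = parallel_mag(1.0, bands, [0.0, 0.0, 1.0, 0.0, 0.0], 1000.0)
```

With all gi = 0 the direct path g0 gives an exactly flat response; raising one gi adds that band's (complex) response to g0, which is why both the magnitude and the phase of each section shape the total response.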
3.3 IIR Digital Filters for Audio Equalizers

The simplest and most intuitive way, widely used to implement equalizers, is to use normalized 2nd-order IIR cells (i.e., with a0 = 1) defined by the following TF

H(z) = (b0 + b1 z^−1 + b2 z^−2) / (1 + a1 z^−1 + a2 z^−2).   (3.14)
However, in practical applications the above TF can be implemented in the most appropriate form with simple topology transformations. In any case, whatever the chosen circuit architecture, by filter design we mean the determination of the polynomial coefficients [ai , bi ] of Eq. (3.14), starting from the desired frequency response specifications.
3.3.1 Bristow-Johnson Second-Order Equalizer

If we consider the 2nd-order IIR cell in Eq. (3.14), we can observe that it has five free parameters. In [11, 57] Bristow-Johnson proposed a procedure for determining H(z) starting from five constraints specified by the desired filter specifications. For example, we can impose: specific amplitudes at zero frequency and at the Nyquist frequency (typically unity gain, i.e., 0 dB, at the band ends); a maximum or a minimum point at ω0; the bandwidth Δω (or equivalently the Q-factor of the boost/cut region); and finally the filter gain K at ω0. So we can write:

|H(e^j0)| = G0,   DC gain   (3.15)
|H(e^jπ)| = Gπ,   gain at Nyquist frequency   (3.16)
∂|H(e^jω)|/∂ω |ω=ω0 = 0,   min/max at the central frequency   (3.17)
|H(e^j(ω0 ± Δω/2))| = GB,   bandwidth (or Q-factor)   (3.18)
|H(e^jω0)| = 10^(K/20) ≡ G,   gain at the central frequency.   (3.19)
Thus, (3.15)–(3.19) impose five constraints that can be used to determine the five unknown filter coefficients [b0, b1, b2, a1, a2]. Although the determination of the IIR 2nd-order coefficients directly in the z-domain has been proposed by Reiss [28], originally this method was proposed by Bristow-Johnson [11], starting from the s-domain specifications.
3.3.1.1 Analog Prototype Design with Midpoint Cut Frequency
For the determination of the TF H(z) parameters, Bristow-Johnson [11] starts from a 2nd-order analog prototype written in the canonical form (3.7), which we rewrite for convenience

H(s) = (s² + 2GαΩ0 s + Ω0²) / (s² + 2αΩ0 s + Ω0²)

where Ω0 indicates the analog angular frequency (properly predistorted through the bilinear transformation), Q = 1/(2α), and the parameter G is the gain at Ω0. It is indeed easy to verify that, for a generic analog bandpass filter, setting for the bilinear z-transform (BZT) Ω0 = (2/T) tan(ω0/2) (see Sect. 2.8.5), the analog counterparts of the specifications (3.15)–(3.19) are satisfied; indeed it holds that H(j0) = H(j∞) = 1, |H(jΩ0)| = G, and

∂|H(jΩ)|²/∂Ω |Ω=Ω0 = 0.
For the bandwidth constraint, the parameter α (or equivalently Q) must be determined by imposing one of the bandwidth definitions given above in Sect. 3.2.3. Below, indicating with G the gain in natural values and with K the gain in dB, we develop the procedure adopting the definition of bandwidth at K/2 dB (midpoint gain in dB): the bandwidth is that for which the gain, expressed in decibels, is equal to half the peak gain; that is

|H(jΩ)| = 10^((K/2)/20) = √G,  or  |H(jΩ)|² = G.   (3.20)
By imposing this condition on the TF (3.7), we can write the following equation

Ω² = Ω0² [1 + 2Gα² ± 2α√G √(Gα² + 1)]

so that the upper and lower frequencies, defined by the midpoint gain, are respectively

Ω2² = Ω0² [1 + 2Gα² + 2α√G √(Gα² + 1)]   (3.21)

and

Ω1² = Ω0² [1 + 2Gα² − 2α√G √(Gα² + 1)].   (3.22)
If the bandwidth, as commonly done in the audio field, is expressed in octaves bw (or in fractions thereof), so that it is defined by the relationship Ω2/Ω1 = 2^bw, it then holds that

Ω2 = Ω1 · 2^bw = Ω1 e^β,   Ω2² e^−β = Ω1² e^β   (3.23)
where β = ln(2) bw. The coefficient α can be determined by solving Eqs. (3.21)–(3.23) w.r.t. α, and we obtain

α² = (1/(2G)) [−1 ± √(1 + (1/4)(e^β − e^−β)²)]

and, being α² > 0, the negative solution can be neglected. It follows that

α = (1/√G) sinh(β/2).
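The closed-form α can be checked against the constraints (3.20)–(3.23) numerically (a verification sketch with normalized Ω0 = 1; the names are illustrative):

```python
import math

def mag2(omega, omega0, G, alpha):
    """|H(jΩ)|² of the analog prototype (3.7)."""
    num = (omega0 ** 2 - omega ** 2) ** 2 + (2 * G * alpha * omega0 * omega) ** 2
    den = (omega0 ** 2 - omega ** 2) ** 2 + (2 * alpha * omega0 * omega) ** 2
    return num / den

bw = 1.0                        # one octave
G = 10.0 ** (12.0 / 20.0)       # K = 12 dB boost, in natural values
beta = math.log(2.0) * bw
alpha = math.sinh(beta / 2.0) / math.sqrt(G)     # the closed-form solution

omega0 = 1.0                    # normalized center frequency
r = 2 * alpha * math.sqrt(G) * math.sqrt(G * alpha ** 2 + 1)
omega2 = math.sqrt(1 + 2 * G * alpha ** 2 + r)   # Eq. (3.21) with omega0 = 1
omega1 = math.sqrt(1 + 2 * G * alpha ** 2 - r)   # Eq. (3.22) with omega0 = 1
```

With this α the two band edges carry exactly the midpoint gain √G, sit exactly one octave apart, and are geometrically symmetric about Ω0.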
The analog equalizer can therefore be determined as

H(s) = [s² + 2√G sinh(β/2) Ω0 s + Ω0²] / [s² + (2/√G) sinh(β/2) Ω0 s + Ω0²],  with β/2 = (ln(2)/2) bw   (3.24)

and in the resulting TF the control parameters appear explicitly: the bandwidth in octaves bw, the gain in natural values G, and the center band frequency Ω0.

3.3.1.2 Digital Filter H(z) Realization
The discrete-time Bristow-Johnson TF is obtained from (3.24) by means of the bilinear transformation (2.115). Setting A = Ω0 T/2 = π f0/fs, after some steps one gets

H(z) = [(1 + A² + 2GαA) − 2(1 − A²) z^−1 + (1 + A² − 2GαA) z^−2] / [(1 + A² + 2αA) − 2(1 − A²) z^−1 + (1 + A² − 2αA) z^−2].   (3.25)

Equation (3.25) is evaluated through the BZT considering the predistortion of the center band frequency Ω0 only. However, for a more correct mapping between the s and z complex planes it is necessary to predistort, besides the frequency Ω0, also the analog prototype frequencies that determine the bandwidth, Ω1 and Ω2. This can be done with a constraint of the type

tan⁻¹(Ω2 T/2) = 2^bw tan⁻¹(Ω1 T/2).
However, this constraint does not lead to a closed form suitable for automatic calculation in the case of implementation with online filter control. An inexact but very accurate way to proceed automatically with the precompensation of all the frequencies of interest is based on the differentiation of the logarithm of the analog frequency with respect to the logarithm of the digital one in the neighborhood of Ω0, i.e., from expression (2.116). It follows that

ln(Ω) = ln(2/T) + ln(tan(ω/2))

and, differentiating,

∂ ln(Ω) / ∂ ln(ω) = ω / sin(ω)

for which

bw ← bw · ω0 / sin(ω0).   (3.26)
From the previous expression it appears that the DT bandwidth must be precompensated (slightly widened) by multiplying bw by the term ω0/sin(ω0) (which is always greater than 1). For the mapping from the analog prototype H(s) to the DT filter H(z) it is then necessary to:

• precompensate ω0 with Eq. (2.116);
• precompensate bw with Eq. (3.26).

With these positions the relation (3.25) becomes

H(z) = [(1 + B√G) − 2 cos(ω0) z^−1 + (1 − B√G) z^−2] / [(1 + B/√G) − 2 cos(ω0) z^−1 + (1 − B/√G) z^−2]   (3.27)

where

B = sinh( (ln(2)/2) · bw · ω0/sin(ω0) ) · sin(ω0).

Thus, the coefficients of the normalized filter H(z) can be written as

b0 = (1 + B√G)/(1 + B/√G),   b1 = −2 cos(ω0)/(1 + B/√G),   b2 = (1 − B√G)/(1 + B/√G)   (3.28)

a1 = −2 cos(ω0)/(1 + B/√G),   a2 = (1 − B/√G)/(1 + B/√G).   (3.29)
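Equations (3.27)–(3.29), including the bandwidth precompensation (3.26), translate directly into code (a hedged sketch; the helper names are illustrative). Note that, by construction, the constraints |H| = 1 at DC and at Nyquist and |H| = G at ω0 are satisfied exactly, whatever the value of B:

```python
import cmath
import math

def bristow_johnson(f0, bw, K_db, fs):
    """Peaking-filter coefficients of Eqs. (3.27)-(3.29); the bandwidth bw
    (in octaves) is precompensated as in Eq. (3.26)."""
    G = 10.0 ** (K_db / 20.0)
    sG = math.sqrt(G)
    w0 = 2 * math.pi * f0 / fs
    B = math.sinh(0.5 * math.log(2.0) * bw * w0 / math.sin(w0)) * math.sin(w0)
    a0 = 1 + B / sG
    b = [(1 + B * sG) / a0, -2 * math.cos(w0) / a0, (1 - B * sG) / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - B / sG) / a0]
    return b, a

def mag(b, a, f, fs):
    """Magnitude response of the biquad at frequency f."""
    z = cmath.exp(1j * 2 * math.pi * f / fs)
    return abs((b[0] + b[1] / z + b[2] / z ** 2) /
               (a[0] + a[1] / z + a[2] / z ** 2))

fs = 48000.0
b, a = bristow_johnson(1000.0, 1.0, 12.0, fs)   # 1 kHz, one octave, +12 dB
```

At the band edges (approximately f0 · 2^(±bw/2), after the prewarping) the gain is close to the midpoint value √G, as required by the design.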
Remark 3.11 The expressions of the coefficients of H(z) in (3.28) and (3.29) were evaluated considering the definition of bandwidth at midpoint gain K/2 dB. In case another definition of the bandwidth is desired, it is necessary to solve expressions (3.20)–(3.23) with the new definition and recalculate the expression of the analog prototype.

Remark 3.12 It can be shown (see [11]) that with the bandwidth definition of Regalia-Mitra [13] and Massie [14], the parameter B in Eqs. (3.28) and (3.29) is defined as

B = √G tan(Δω/2)

where Δω is the bandwidth, defined as the frequency distance between the −3 dB points referred to the 0 dB gain level. In case the bandwidth is defined according to the Moorer definition [7], the term B takes the form
B = √G · √((F² − 1)/(G² − F²)) · tan(Δω/2)

where F represents the gain at the cutoff angular frequencies ω1 and ω2, defined at ±3 dB. Note also that, comparing with the Regalia-Mitra definition, we have F = √((G² + 1)/2). With the White definition of bandwidth [8] (the frequency distance between two points at +3 dB for boost filters, and at −3 dB for cut filters, evaluated with respect to the 0 dB level), the term B instead becomes

B = (√G / P) tan(Δω/2),  with

P = √(G² − 2),   for G > √2
P = √(1 − 2G²),  for G < 1/√2
P not defined,   for 1/√2 < G < √2.
Remark 3.13 The literature on equalizers is very wide, and other solutions exist which for brevity are not reported here. For example, Orfanidis [12] proposes a method that achieves a close approximation of the analog prototype and does not suffer from the prewarping effect of the BZT near the Nyquist frequency. Clark et al. [29] propose, in addition to the BZT, the use of a direct mapping of poles and zeros from the analog TF to the discrete-time one, called the matched-z transform (MZT); in particular, a modification of the MZT is proposed in order to reduce the aliasing problem implicit in this technique. In [21, 30] parametric methods are proposed for high-order shelving filters. A recent overview on the subject of audio equalization was given by Välimäki and Reiss [16].
3.4 Robust IIR Audio Filters

In DASP, one of the problems of central importance, after having chosen a TF with the desired frequency response, is the choice of the filter structure. Problems such as noise control, signal scaling, efficient coefficient calculation, limit cycles, overflow control, and round-off noise are very difficult to solve and strongly affect the final audio quality. For example, in digital filtering it is necessary to appropriately rescale the filter input signal taking into account two conflicting requirements: a low signal level increases the round-off noise and, moreover, the distortion grows as the signal level decreases; on the other hand, a signal that is too high increases the probability of overflow. In DASP the choice of the optimal level is, at times, extremely complex. These problems are present, in a less evident way, even if the filter is implemented in floating-point arithmetic.
3.4.1 Limits and Drawbacks of Digital Filters in Direct Forms

One of the most used structures for the realization of DT filters is the direct form II (see Sect. 2.6, Fig. 2.22b). This is certainly a very efficient architecture, but in some situations it can present some problems. Its TF, for a 2nd-order cell, is H(z) = (b0 + b1 z^−1 + b2 z^−2)/(a0 + a1 z^−1 + a2 z^−2) where, usually, a0 = 1. The main limitations of this architecture are due to quantization noise, overflow, coefficient scaling, difficult parameter tuning, and stability.

• Noise—The noise performance, as pointed out by Jackson [31] and by Roberts and Mullis [32], is not good. An increase in the Q-factor, or a narrowing of the filter intervention band, corresponds to an increase (without limit) in the quantization noise. In audio processing this represents a big problem, since narrowband filters in multi-channel equalizers are the norm and not the exception.
• Coefficient scaling—The coefficient range of a 2nd-order cell for normal audio applications is, in general, ±1 for a2 and b2, and ±2 for a1 and b1. These values, completely acceptable in the case of floating-point arithmetic, can create problems in fixed-point two's complement arithmetic, in which the maximum admissible value is equal to 1.
• Overflow and scaling of internal variables—In the direct form II the values of the internal variables depend on the location of the poles and zeros and can have very large amplitudes. This means that in hardware implementations, or with fixed-point DSPs, the internal registers must be rather long.
• Tuning—The coefficients of the powers z^−1, a1 and b1, are generally equal to −2r cos θ, and this implies that the tuning of these coefficients interacts with the Q-factor of the filter. The coefficients of z^−2, a2 and b2, control the radius of the poles/zeros.
This means that if these are changed, the coefficients a1 and b1 must also be changed, and the frequency of filter intervention (angle θ) is also altered. In audio applications, in order to make the calculation of the coefficients easier, it is desirable to have independent controls of the Q-factor and of the cut frequency.
• Stability—In the case of direct forms it is not unlikely to reach unstable configurations, so it is necessary to add controls that guarantee the stability of the filter. For 2nd-order filters in direct forms it is therefore necessary to verify that |a1| < 1 + a2 (together with |a2| < 1). In other forms, which we shall see below, such control can be much simpler.
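The direct-form stability test just mentioned is a one-line check of the standard 2nd-order "stability triangle" (the function name is illustrative):

```python
def df2_is_stable(a1, a2):
    """Stability check for the denominator 1 + a1 z^-1 + a2 z^-2: both poles
    are strictly inside the unit circle iff the pair (a1, a2) lies in the
    stability triangle |a2| < 1 and |a1| < 1 + a2."""
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2
```

For a complex pole pair r e^(±jθ) one has a1 = −2r cos θ and a2 = r², so, for example, r = 0.9 passes the test while r = 1.1 fails it.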
3.4.2 All-Pass Decompositions

In 1988, Regalia et al., in the paper [33], defined all-pass filters as "a versatile signal processing building block." This is primarily due to the possible factorization of TFs involving all-pass sub-sections. In agreement with Property 2.11, a stable non-minimum-phase rational TF H(z) can be factorized as the product of a minimum-phase function and an all-pass function.
Fig. 3.17 All-pass decomposition. A generic structurally passive TF G(z) = P(z)/D(z), with symmetric P(z), can be realized with the parallel connection of two all-pass cells
That is, H(z) = Hmin(z)Hap(z). Moreover, Vaidyanathan [34] has demonstrated the following widely used results for building robust filtering structures.

Definition 3.1 (Bounded real TFs) Every rational N-th order TF, G(z) = P(z)/D(z), with |G(e^jω)| ≤ 1 and with P(z) of symmetric type, pk = p(N−k), is defined as structurally passive or bounded real (BR).

Property 3.1 (All-pass decomposition) Every BR TF can be decomposed as the sum of all-pass subsystems with an additional output multiplier. The resulting structure, illustrated in Fig. 3.17, is therefore of the type

G(z) = (1/2) [A1(z) + A2(z)].
A very useful aspect in the design of audio filters is that this structure can exhibit excellent properties in relation to parametric sensitivity, noise, etc. In particular, the overall filter is robust if A1 (z) and A2 (z) are implemented robustly. Note that the symmetry condition of the numerator coefficients indicates that P(z) is linear phase. This restriction is not very strong as most IIR lowpass filters enjoy this property.
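A minimal instance of Property 3.1 can be written down directly (an illustrative sketch: choosing the trivial pair A1(z) = 1 and a 1st-order all-pass A2(z) yields the classic complementary lowpass/highpass pair; the names are not from the text):

```python
import cmath
import math

def A(k, w):
    """1st-order all-pass A(z) = (k + z^-1)/(1 + k z^-1) on the unit circle."""
    z = cmath.exp(1j * w)
    return (k + 1 / z) / (1 + k / z)

def lowpass(k, w):
    """BR lowpass from the decomposition G(z) = (1/2)[A1(z) + A2(z)],
    with the trivial choice A1(z) = 1 and A2(z) = A(z)."""
    return 0.5 * (1 + A(k, w))

def highpass(k, w):
    """Power-complementary highpass: (1/2)[A1(z) - A2(z)]."""
    return 0.5 * (1 - A(k, w))
```

For any |k| < 1 the pair is doubly complementary: the lowpass is 1 at DC and 0 at Nyquist, the highpass the reverse, and |G(e^jω)| ≤ 1 everywhere, i.e., the BR condition of Definition 3.1.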
3.4.3 Ladder and Lattice Structures

Particularly interesting structures for the construction of 1st-order filter cells are the so-called ladder and lattice structures [55]. The name derives from the shape of the signal flow graph: if it has crossings, it is called lattice; if, instead, it is planar, it is called ladder. However, in the audio literature almost always only the term lattice is used (including for the ladder structure). An example of a ladder and of a lattice network is shown in Fig. 3.18. A simple and powerful way to study and derive ladder and lattice structures is to refer to the digital two-port (2P) network of Fig. 3.19, for which it is possible to write the input–output relations as Y = TX, where X = [X1 X2]^T and Y = [Y1 Y2]^T are variables that can be expressed both in the z-domain and in the time domain n.
Fig. 3.18 Example of filter cells. a Ladder. b Lattice
For example, for the ladder structure of Fig. 3.18a we have

Y1 = k X1 + √(1 − k²) X2
Y2 = √(1 − k²) X1 − k X2   (3.30)

while for the lattice structure of Fig. 3.18b we get

Y1 = k X1 + X2
Y2 = X1 − k X2.   (3.31)
With the inclusion of appropriate closing constraints on port 2, it is possible to obtain TFs with certain properties.

Property 3.2 (All-pass structure with constrained two-pair network) If in the two-pair network of Fig. 3.19 a delay element z^−1 is inserted at port 2, then the TF A(z) = Y(z)/X(z) turns out to be of all-pass type. In this case, depending on the choice of the T matrix, there are several equivalent possible topologies, some of which are illustrated in Fig. 3.20.
3.4.3.1 Properties of Lattice and Ladder Structures
Ladder and lattice basic structures have remarkable properties. Below are some of them.
Fig. 3.19 Digital two-pair or two-port (2P) network, where T denotes the 2P transfer matrix
3.4 Robust IIR Audio Filters
Fig. 3.20 Equivalent structures of the 1st-order all-pass filter. a Lattice structure with two multipliers. b Ladder structure with four multipliers. c Ladder structure with three multipliers. d Lattice structure with one multiplier and three adders
Property 3.3 The matrix T of the ladder structure (3.30) is orthonormal.

Proof Setting the coefficient k = cos θ, we get

$$
\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} =
\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \end{bmatrix},
\quad\text{or equivalently}\quad
\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}
$$

so, by the identity sin²θ + cos²θ = 1, TᵀT = I.
Also note that the matrix T has the form of a planar Givens rotator. The main features of the normalized structure with four multipliers are the low quantization noise, the absence of overflow, and the absence of need for coefficient scaling.

Property 3.4 The matrix T of the lattice structure (3.31) is a unitary matrix up to a constant.

Proof In fact, we have that

$$
\mathbf{T}^T\mathbf{T} =
\begin{bmatrix} k & 1 \\ 1 & -k \end{bmatrix}
\begin{bmatrix} k & 1 \\ 1 & -k \end{bmatrix} =
\begin{bmatrix} k^2+1 & 0 \\ 0 & k^2+1 \end{bmatrix} =
(k^2+1)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
$$
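The two matrix properties above are easy to verify numerically; a minimal sketch in Python/NumPy, with an arbitrary illustrative value of k:

```python
import numpy as np

k = 0.6  # any |k| < 1; illustrative value

# Ladder matrix (3.30): orthonormal, T^T T = I
T_ladder = np.array([[k, np.sqrt(1 - k**2)],
                     [np.sqrt(1 - k**2), -k]])
assert np.allclose(T_ladder.T @ T_ladder, np.eye(2))

# Lattice matrix (3.31): unitary up to the constant (k^2 + 1)
T_lattice = np.array([[k, 1.0],
                      [1.0, -k]])
assert np.allclose(T_lattice.T @ T_lattice, (k**2 + 1) * np.eye(2))
```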
Advantages

• Coefficient scaling—Each node of the structure is normalized in the L2 norm. This property is very important in the case of real-time filter tuning (and therefore time-variant filters), widely used in many audio applications (vocal synthesis, sound synthesis, equalizers, etc.).
• Noise—Many authors have shown that these structures are also optimal from the point of view of quantization noise. In particular, the performance in terms of noise does not vary with the position of the poles and zeros (i.e., with the passband region); the noise level is constant and depends only on the filter order.
• Orthogonal tuning—As we will see in the next paragraphs, the use of all-pass sub-structures for the creation of audio filters allows the determination of control parameters (gain, band and frequency) that are almost independent.
• Overflow cycles—In normalized ladder structures, overflow cycles rapidly tend to zero even in the case of saturation-free arithmetic [32].
• Limit cycles—In general, normalized structures are not limit cycle free. However, limit cycles can be eliminated if the quantization of the coefficients is performed with truncation rather than with rounding (passive quantization).
• Stability—With the normalized structures the stability test is immediate: just check that the lattice parameters satisfy |ki| < 1.
• Stability in time-variant filters—In time-varying filters, stability control is a non-trivial problem. In fact, poles inside the unit circle do not, by themselves, guarantee stability: the condition |pi| < 1 is, in this case, a necessary but not sufficient condition.

Drawbacks

The main drawback of normalized structures is that they have a higher computational cost than other structures (about four times higher, in terms of multiplications, than the direct form II). In the case of hardware implementations this limitation can be ignored by considering particular architectures (e.g., CORDIC). Moreover, further reductions of the computational cost can be obtained by exploiting the properties of complex multiplication.
3.4.4 Lattice FIR Filters

A FIR filter can be realized by cascading sections of the type shown in Fig. 3.20 by inserting a delay element at its input. The resulting structure, which represents a 1st-order base cell, is illustrated in Fig. 3.21a. Defining the quantities [p_{i−1} q_{i−1}]^T and [p_i q_i]^T respectively as inputs and outputs of the 1st-order base cell of Fig. 3.21b, by simple visual inspection the z-domain input–output relations are

$$
\begin{aligned}
P_i(z) &= P_{i-1}(z) + k_i z^{-1} Q_{i-1}(z) \\
Q_i(z) &= k_i P_{i-1}(z) + z^{-1} Q_{i-1}(z).
\end{aligned} \qquad (3.32)
$$

In the form of a two-port network with transmission matrix T_i(z), we have that

$$
\begin{bmatrix} P_i(z) \\ Q_i(z) \end{bmatrix} =
\begin{bmatrix} 1 & k_i z^{-1} \\ k_i & z^{-1} \end{bmatrix}
\begin{bmatrix} P_{i-1}(z) \\ Q_{i-1}(z) \end{bmatrix}
\quad\Leftrightarrow\quad
\mathbf{T}_i(z) = \begin{bmatrix} 1 & k_i z^{-1} \\ k_i & z^{-1} \end{bmatrix}.
$$
Fig. 3.21 Lattice FIR filter of order N − 1. a First order lattice base cell. b First order lattice base cell with two multipliers (with reflection coefficient) and unit transmission coefficient. c FIR filter made with cascade lattice sections
Let’s now consider a FIR filter realized by cascade connection of 1st-order lattice described by Eq. (3.32), as illustrated in Fig. 3.21c. The TFs between the input x[n], the outputs at the various stages are defined as Ai (z) = Pi (z)/X (z) , Bi (z) = Q i (z)/X (z)
i = 1, ..., N − 1
(3.33)
and reiterating for each stage, it is possible to write the following recursive expressions
1 ki z −1 1 k2 z −1 1 k1 z −1 A0 (z) Ai (z) = . (3.34) · ... · · · ki z −1 k2 z −1 k1 z −1 Bi (z) B0 (z) Considering A0 (z) = 1 and B0 (z) = 1 and generalizing, the polynomials Ai (z) and Bi (z) assume the following recursive form Ai (z) =
i k=0
ak(i) z −k ,
Bi (z) =
i
(i) −k ai−k z , i = 0, 1, ..., N − 1
k=0
with a0(i) = 1. So, for reflection coefficients it’s worth ai(i) = b0(i) = ki , and for the last stage bk = a(N −1)−k , k=0, 1, ..., N − 1; i.e., in the z-domain we have that Bi (z) = z −i Ai (z −1 )
(3.35)
that is, one is the mirror version of the other. Therefore, the ratio between the polynomials is equal to an all-pass filter
$$
H_{ap_i}(z) = \frac{B_i(z)}{A_i(z)} = \frac{z^{-i} A_i(z^{-1})}{A_i(z)} =
\frac{a_i + a_{i-1} z^{-1} + \cdots + a_1 z^{-(i-1)} + z^{-i}}{1 + a_1 z^{-1} + \cdots + a_i z^{-i}}.
$$
Remark 3.14 It should be noted that the lattice structure is not as computationally efficient as the direct form, since two multiplications are required for each filter tap coefficient. This structure, in fact, is mainly used in cases where high robustness is needed, or in cases where both filter outputs y[n] = p_{N−1}[n] and y_q[n] = q_{N−1}[n] are needed.
3.4.4.1 FIR Lattice Analysis Formula: Step-Up Recursion
Starting from the vector of reflection coefficients k = [k0 k1 ··· k_{N−1}]^T and A_0(z) = B_0(z) = 1, the TF of the overall structure of Fig. 3.21c can be determined by the recursive Eq. (3.34), rewritten as

$$
\begin{aligned}
A_i(z) &= A_{i-1}(z) + k_i z^{-1} B_{i-1}(z) \\
B_i(z) &= k_i A_{i-1}(z) + z^{-1} B_{i-1}(z)
\end{aligned}, \qquad i = 1, 2, ..., N-1 \qquad (3.36)
$$

also called step-up recursion, initialized with A_0(z) = B_0(z) = 1 and with B_i(z) = z^{−i}A_i(z^{−1}), which allows one to determine the FIR filter TF given the reflection coefficients.
3.4.4.2 FIR Lattice Synthesis Formula: Step-Down Recursion
For the determination of the reflection coefficients k_i, starting from the impulse response of the (N − 1)-order FIR filter, we consider a step-down recursion of Eq. (3.36). Thus, by Eq. (3.35) it is possible to write the following recursive formula

$$
\begin{aligned}
A_{i-1}(z) &= \frac{1}{1-k_i^2}\left[A_i(z) - k_i B_i(z)\right] \\
B_i(z) &= z^{-i} A_i(z^{-1})
\end{aligned}, \qquad i = N-1, N-2, ..., 1.
$$

In addition, to determine the k_i coefficients, it is possible to use the expression k_i = a_i^{(i)} = b_0^{(i)}, with a_0^{(i)} = 1. So, after normalizing the impulse response as a[n] = h[n]/h[0], n = 0, 1, ..., N − 1, we determine a priori the reflection coefficient of the last stage as k_{N−1} = a[N − 1]. This initializes the iterative procedure for the calculation of all the reflection coefficients given the filter impulse response.
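The two recursions can be sketched as follows (Python/NumPy; function names are illustrative). The round trip reflection coefficients → polynomial → reflection coefficients recovers the original values:

```python
import numpy as np

def step_up(k):
    """Step-up recursion (3.36): reflection coefficients -> coefficients of A_{N-1}(z)."""
    a = np.array([1.0])  # A_0(z) = 1
    b = np.array([1.0])  # B_0(z) = 1
    for ki in k:
        a_new = np.append(a, 0.0) + ki * np.append(0.0, b)  # A_i = A_{i-1} + k_i z^-1 B_{i-1}
        b_new = ki * np.append(a, 0.0) + np.append(0.0, b)  # B_i = k_i A_{i-1} + z^-1 B_{i-1}
        a, b = a_new, b_new
    return a

def step_down(a):
    """Step-down recursion: polynomial with a[0] = 1 -> reflection coefficients."""
    a = np.asarray(a, dtype=float)
    k = []
    while len(a) > 1:
        ki = a[-1]                       # k_i = a_i^{(i)}, the highest-order coefficient
        k.append(ki)
        b = a[::-1]                      # B_i(z) = z^-i A_i(z^-1): mirrored coefficients
        a = ((a - ki * b) / (1.0 - ki**2))[:-1]
    return np.array(k[::-1])

k = np.array([0.5, -0.3, 0.2])
a = step_up(k)
assert np.allclose(step_down(a), k)      # synthesis inverts analysis
```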
3.4.4.3 Lattice QMF/CQF Filters
A variant of the architecture of Fig. 3.21c consists in using a cell in which the reflection coefficients appear with alternating signs and the even coefficients are all null, as
Fig. 3.22 QMF lattice FIR filter of order N − 1 with 1st-order lattice base cell
illustrated in Fig. 3.22. This structure is of particular interest because it allows the implementation of mirror-symmetric and paraunitary filter banks. Defining [p_{i−1} q_{i−1}]^T and [p_i q_i]^T, respectively, as the input and output quantities of the 1st-order base cell of Fig. 3.22, the input–output relationships can be written as

$$
\begin{bmatrix} P_i(z) \\ Q_i(z) \end{bmatrix} =
\begin{bmatrix} 1 & -k_i z^{-1} \\ k_i & z^{-1} \end{bmatrix}
\begin{bmatrix} P_{i-1}(z) \\ Q_{i-1}(z) \end{bmatrix}
\quad\Leftrightarrow\quad
\mathbf{T}_i(z) = \begin{bmatrix} 1 & -k_i z^{-1} \\ k_i & z^{-1} \end{bmatrix}
$$

where T_i(z) is the transmission matrix.

Property 3.5 The transmission matrix T_i(z), representing the various processing stages of the QMF lattice FIR filter of order N − 1, is a paraunitary matrix; i.e.,

$$
\mathbf{T}_i^H(z)\mathbf{T}_i(z) = c\mathbf{I}, \qquad \forall\, |z| = 1.
$$

Proof Indeed, with the paraconjugate T^H(z) = T^T(z^{−1}),

$$
\mathbf{T}^H(z)\mathbf{T}(z) =
\begin{bmatrix} 1 & k_i \\ -k_i z & z \end{bmatrix}
\begin{bmatrix} 1 & -k_i z^{-1} \\ k_i & z^{-1} \end{bmatrix} =
\begin{bmatrix} 1+k_i^2 & 0 \\ 0 & k_i^2+1 \end{bmatrix} =
(k_i^2+1)\,\mathbf{I}.
$$
Property 3.6 The matrices T(z) are paraunitary, thus

$$
\mathbf{T}^T(e^{-j\omega})\mathbf{T}(e^{j\omega}) = c\mathbf{I}, \qquad \forall\, \omega.
$$
Paraunitary matrices are fundamental for the design of orthonormal and wavelet filter banks. For example, in the case where T(z) is 1 × 1, then |T(e^{jω})| = 1, i.e., the corresponding filter is an all-pass filter. Moreover, if T(z) is M × M, it can give rise to an M-channel filter bank; i.e., a filter bank is orthogonal if this matrix is paraunitary. By constraining the values of the reflection coefficients, filter banks with power complementary and conjugate TF responses can be realized. In this case the filter bank that can be realized with such structures is said to be a power complementary conjugate quadrature filter (CQF) bank. The A_i(z) TF has the following properties
$$
\begin{aligned}
A_i(z)A_i(z^{-1}) + A_i(-z)A_i(-z^{-1}) &= 1 \\
B_i(z) &= z^{-i} A_i(-z^{-1})
\end{aligned}, \qquad i = 1, 3, ..., N-1 \qquad (3.37)
$$
with N taking only even values. These TFs, which are very important in multi-rate signal processing, are used in the definition of paraunitary filter banks.

Property 3.7 The upper relation of Eq. (3.37) implies that the A_i(z) and B_i(z) filters are power complementary, i.e.,

$$
|A_i(e^{j\omega})|^2 + |B_i(e^{j\omega})|^2 = 1, \qquad \forall\omega;
$$

also, inverse transforming the lower relation of Eq. (3.37), in the time domain it holds that

$$
b_i[n] = (-1)^{(i-n)} a_i[i-n], \qquad i = 1, 3, ..., N-1 \qquad (3.38)
$$

that is, one is the mirror version of the other with alternating signs. In practice, if you know the zeros of A_i(z), you can determine the zeros of B_i(z) by rotating the whole polar diagram by π, i.e., by mirroring and conjugating the zeros of A_i(z) with respect to the unit circle; hence the appellation CQF.
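The equivalence between the time-domain relation (3.38) and the z-domain relation B_i(z) = z^{−i}A_i(−z^{−1}) can be checked numerically; a sketch with an arbitrary, purely illustrative coefficient set (only the mirror relation is tested here, since an arbitrary A_i(z) does not satisfy the power complementary condition in (3.37)):

```python
import numpy as np

def evalz(c, z):
    """Evaluate C(z) = sum_k c[k] z^-k at the points z."""
    return np.polyval(c[::-1], 1.0 / z)

a = np.array([1.0, 0.4, -0.2, 0.1])   # A_i(z) coefficients, i = 3 (odd, as in (3.37))
i = len(a) - 1

# Time-domain relation (3.38): b[n] = (-1)^(i-n) a[i-n]
b = np.array([(-1.0)**(i - n) * a[i - n] for n in range(i + 1)])

# z-domain check on the unit circle: B(z) = z^-i A(-z^-1)
z = np.exp(1j * np.linspace(0, 2 * np.pi, 64, endpoint=False))
assert np.allclose(evalz(b, z), z**(-i) * evalz(a, -1.0 / z))
```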
3.4.5 Orthogonal Control Shelving Filters: Regalia-Mitra Equalizer

As underlined in Sect. 3.4.1, the direct forms I and II are not particularly suitable for audio processing. To obtain a target frequency response, such as that defined by the slider positions of an equalizer, it is necessary to modify all the coefficients a_i and b_i of the filter TF polynomials. However, the dynamic variation of all the filter's parameters would produce perturbations audible as artifacts on the audio signal. The instantaneous update of all TF coefficients would produce uncomfortable clicks (especially for IIR filters) due to the decoupling between the internal state variables of the recursive part and the new filter coefficients; in other words, an audible transient effect is produced.

As suggested by Bristow-Johnson [11], a desirable property for biquadratic equalizers is to have a gain control parameter that scales the frequency response in dB in a certain band, using a frequency independent factor. Moreover, as suggested by Jot [35], an equalizer with this property is called a proportional parametric equalizer. The problem of creating a filter whose frequency, bandwidth, and gain can be varied by independent parameters is therefore difficult to solve. Usually, it is necessary to recalculate online all the filter parameters whenever the position of only one of the control parameters is changed. A practical solution could be to precompute some of the possible combinations (which are infinite) and appropriately interpolate the intermediate situations.
This section describes an important contribution due to Regalia and Mitra (R-M) [13], in which the audio filter is implemented considering a particular all-pass transformation. With this structure the authors demonstrate how the filter parameters independently control the gain, the intervention frequency and the bandwidth. In other words, acting on one of the filter parameters directly intervenes on the specific acoustic quantity of interest.
3.4.5.1 Equalizer with All-Pass Decomposition
The analog prototype for the so-called type 1, 1st-order shelving filter, described in Sect. 3.2.2.1 (see Eq. 3.1), is defined as

$$
H(s) = \frac{s + Kp}{s + p} \qquad (3.39)
$$

where the gains at low and high frequencies are respectively H(0) = K and H(∞) = 1, the parameter K controls the bass boost and cut, and the parameter p controls the analog cut frequency. Note that the previous equation can be decomposed as the sum of a highpass and a lowpass TF, that is

$$
H(s) = \frac{s}{s+p} + K\frac{p}{s+p};
$$

furthermore, it is easy to verify that the following relationships are also valid

$$
\frac{s}{s+p} = \frac{1}{2}\left[1 + \frac{s-p}{s+p}\right], \quad\text{and}\quad
\frac{p}{s+p} = \frac{1}{2}\left[1 - \frac{s-p}{s+p}\right].
$$

Now, calling A(s) = (s − p)/(s + p), we can recognize that A(s) is a simple 1st-order all-pass TF. It follows that H(s) can be expressed as H(s) = ½[1 + A(s)] + ½K[1 − A(s)].

Applying the BZT (2.115) to the previous expression, the H(z) of the 1st-order filter can be rewritten as
$$
H(z) = \tfrac{1}{2}[1 + A(z)] + \tfrac{1}{2}K[1 - A(z)] = \tfrac{1}{2}\left[(1+K) + (1-K)A(z)\right] \qquad (3.40)
$$

where the all-pass A(z) is

$$
A(z) = -\frac{z^{-1} + k_1}{1 + k_1 z^{-1}} \qquad (3.41)
$$

and, with T the sampling period, the parameters are
$$
t = \tan(\omega_c T/2), \quad\text{and}\quad k_1 = \frac{t-1}{t+1}. \qquad (3.42)
$$
The form (3.40) represents the R-M all-pass decomposition.

Remark 3.15 Note that, in Eq. (3.40), the filter gain is regulated by K; in addition, from Eq. (3.42) we see that the parameter k1, relative to the cutoff frequency, is independent of the gain, as also shown in Fig. 3.23a. The expression of the TF (3.40) suggests the structure shown in Fig. 3.24a. In the same figure other equivalent structures, proposed in [14], are also reported. Furthermore, the all-pass cell A(z) can be made with the particularly robust structures previously studied, such as those shown in Fig. 3.20.
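The decomposition (3.40)–(3.42) can be assembled in a few lines (Python/NumPy; function names are illustrative), verifying the limit gains H(1) = K at low frequency and H(−1) = 1 at Nyquist:

```python
import numpy as np

def rm_shelf1(K, fc, fs):
    """1st-order R-M shelving filter, Eqs. (3.40)-(3.42).
    Returns (b, a) with H(z) = 0.5[(1+K) + (1-K)A(z)],
    where A(z) = -(z^-1 + k1)/(1 + k1 z^-1)."""
    t = np.tan(np.pi * fc / fs)        # t = tan(wc*T/2)
    k1 = (t - 1.0) / (t + 1.0)
    aA = np.array([1.0, k1])           # denominator of A(z)
    bA = np.array([-k1, -1.0])         # numerator of A(z)
    b = 0.5 * ((1 + K) * aA + (1 - K) * bA)
    return b, aA

def H(b, a, z):
    """Evaluate the TF at the point(s) z."""
    return np.polyval(b[::-1], 1 / z) / np.polyval(a[::-1], 1 / z)

b, a = rm_shelf1(K=4.0, fc=100.0, fs=44100.0)
assert np.isclose(H(b, a, 1.0), 4.0)    # DC gain = K
assert np.isclose(H(b, a, -1.0), 1.0)   # Nyquist gain = 1
```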
3.4.5.2 2nd-Order R-M Filter
The 2nd-order all-pass cell can be obtained starting from the 1st-order cell with the lowpass–bandpass spectral transformation [1, 2]

$$
z^{-1} \rightarrow -z^{-1}\,\frac{z^{-1} + k_2}{1 + k_2 z^{-1}}; \qquad (3.43)
$$
applying the transformation (3.43) to (3.41), we get
Fig. 3.23 Example of R-M filter frequency responses. a 1st-order shelving for fixed frequency ωc = 2π·100·Ts and variable K. b 2nd-order peaking filter for fixed frequency ωc, Q = 1/√2 and variable K. c 2nd-order presence filter for variable frequency ωc and gain K
Fig. 3.24 Equivalent robust structures for the realization of the R-M 1st-order shelving and 2nd-order peaking filters (see Eq. (3.40)). The TF A(z) represents a 1st- or 2nd-order all-pass cell

Fig. 3.25 All-pass transformation. The 2nd-order cell is obtained by nesting two 1st-order all-pass cells
Fig. 3.26 The 2nd-order R-M filter. There are three independent control parameters: k1 regulates the bandwidth; k2 the frequency ωc; K the gain
$$
A(z) = \frac{k_1 + k_2(1+k_1)z^{-1} + z^{-2}}{1 + k_2(1+k_1)z^{-1} + k_1 z^{-2}} \qquad (3.44)
$$
which corresponds to a 2nd-order all-pass realizable with a low-sensitivity lattice structure. The terms k1 and k2 correspond to the coefficients of the lattice structure chosen for the implementation (with one, two, three or four multipliers). In Fig. 3.25, for example, the structure with only one multiplier per cell is shown. If in Eq. (3.44) we set k2 = −cos ωcT, the transformation maps the behavior of the lowpass filter into a bandpass filter around the frequency ωc (Fig. 3.26). For K = 0 the filter degenerates into a notch filter, for which the definition of bandwidth indicated in Sect. 3.2.3 is valid. By choosing a passband Δω = ω2 − ω1 at −3 dB (defined for the notch filter with K = 0), and a central frequency equal to ωc, the design equations are
$$
\begin{aligned}
k_2 &= -\cos\omega_c T \\
K &= \left|H(e^{j\omega})\right|_{\omega=\omega_c} \\
t &= \tan(\Delta\omega T/2) = \tan(\omega_c T/2Q) \\
k_1 &= \frac{1-t}{1+t}.
\end{aligned}
$$

With the previous expressions it is possible to modify the filter frequency response online, controlling the gain, the bandwidth and the intervention frequency independently. The parameter k2 controls the intervention frequency, k1 the bandwidth, and K the gain (K < 1 cut, K > 1 boost) (see Fig. 3.23b–c).
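The design equations can be sketched directly (Python/NumPy; names and parameter values are illustrative). By construction A(e^{jωc}) = −1, so the gain at the center frequency is exactly K, while at DC and at Nyquist it is 1:

```python
import numpy as np

def rm_peak2(K, fc, Q, fs):
    """2nd-order R-M peaking filter: H(z) = 0.5[(1+K) + (1-K)A(z)], A(z) as in (3.44)."""
    wcT = 2 * np.pi * fc / fs
    k2 = -np.cos(wcT)                    # center frequency control
    t = np.tan(wcT / (2 * Q))            # t = tan(dw*T/2), dw = wc/Q
    k1 = (1 - t) / (1 + t)               # bandwidth control
    aA = np.array([1.0, k2 * (1 + k1), k1])   # denominator of A(z)
    bA = aA[::-1].copy()                      # all-pass numerator: mirrored denominator
    b = 0.5 * ((1 + K) * aA + (1 - K) * bA)
    return b, aA

def Hmag(b, a, w):
    z = np.exp(1j * w)
    return np.abs(np.polyval(b[::-1], 1 / z) / np.polyval(a[::-1], 1 / z))

fs, fc, Q, K = 48000.0, 1000.0, 2.0, 4.0
b, a = rm_peak2(K, fc, Q, fs)
assert np.isclose(Hmag(b, a, 2 * np.pi * fc / fs), K)   # gain K at wc
assert np.isclose(Hmag(b, a, 0.0), 1.0)                 # unity at DC
assert np.isclose(Hmag(b, a, np.pi), 1.0)               # unity at Nyquist
```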
3.4.5.3 R-M Symmetrical Boost-Cut Response
A problem with the R-M filter is the non-symmetrical boost-cut response relative to the control parameters. For example, the effect of a filter with a gain K equal to 0.5 would not be perfectly canceled by a filter with gain 2. This phenomenon can also be observed in Figs. 3.19 and 3.24. As explained in Sect. 3.2, a very simple way to obtain symmetrical cut-boost curves is to use type 1 shelving for the boost and type 2 shelving for the cut [36], or

$$
H(s) = \frac{s + Kp}{s + p}, \qquad K > 1, \;\text{boost} \qquad (3.45)
$$
$$
H(s) = \frac{s + p}{s + p/K}, \qquad K < 1, \;\text{cut}. \qquad (3.46)
$$
If in (3.45) K > 1 and in (3.46) K < 1, as the gain term K varies the filter response passes from cut to boost with a perfectly symmetrical response. However, when passing through 0 dB, it is necessary to switch between the two TFs.

R-M 1st-order Symmetrical Response

In the case where (3.46) is implemented with the previously described R-M all-pass transformation, by applying the BZT one can simply prove that the parameter k1 in this case assumes the value

$$
k_1 = \frac{t-K}{t+K}, \quad\text{where}\quad t = \tan(\omega_c T/2) \qquad (3.47)
$$
which is different from the one calculated with Eq. (3.42): in this case, in fact, the term K also appears. Thus, while for the boost filter there is complete independence between the amplitude and frequency control parameters, in the cut case, due to Eq. (3.47), there is a relationship between the parameter k1, relative to the frequency, and the parameter K, relative to the gain.
The structure of the filter H(z) is however identical for the boost and cut cases, H(z) = ½[1 + A(z)] + ½K[1 − A(z)]. From Eqs. (3.42) and (3.47), with t = tan(ωcT/2) the parameter related to the cut frequency, A(z) has instead an internal parameter k1 whose value is calculated as a function of the filter gain K:

$$
A(z) = -\frac{z^{-1} + k_1}{1 + k_1 z^{-1}}, \quad\text{with}\quad
k_1 = \begin{cases}
\dfrac{t-1}{t+1}, & K > 1 \\[2mm]
\dfrac{t-K}{t+K}, & K < 1.
\end{cases}
$$
Proceeding as above and reversing the order of the boost-cut expressions (3.45) and (3.46), we obtain a family of curves like that shown in Fig. 3.3.

R-M 2nd-order Symmetrical Response

A similar reasoning can also be applied to the 2nd-order filter. Indeed, it can easily be shown that

$$
\begin{aligned}
k_2 &= -\cos\omega_c T \\
K &= \left|H(e^{j\omega_c})\right| \\
t &= \tan(\Delta\omega T/2) = \tan(\omega_c T/2Q)
\end{aligned}
$$
$$
A(z) = \frac{k_1 + k_2(1+k_1)z^{-1} + z^{-2}}{1 + k_2(1+k_1)z^{-1} + k_1 z^{-2}}, \quad\text{with}\quad
k_1 = \begin{cases}
\dfrac{1-t}{1+t}, & K > 1 \\[2mm]
\dfrac{K-t}{K+t}, & K < 1.
\end{cases} \qquad (3.48)
$$
Also in this case, for K < 1 the parameter k1, which regulates the intervention bandwidth, is not independent of the gain. The price paid to have a symmetrical boost-cut response is the insertion of a switch into the filter structure and the imperfect independence of the gain and frequency control parameters. An alternative way, using a single TF and a deviator, is the one proposed by Harris and Brooking [10]. The parametric filter, denoted as feed-forward (FF) and feed-backward (FB), is that shown in Fig. 3.27 with a H(z) which, to ensure stability, is of the type H(z) = z⁻¹H1(z) (Fig. 3.28).

In [35], Jot proposes a modification of the R-M method which allows: (1) a unique boost and cut filter without the switch; (2) the definition of the midpoint gain cutoff frequency. The bass and treble filter TFs take the form

$$
H_B(z) = \frac{t\sqrt{K} + 1 + (t\sqrt{K} - 1)z^{-1}}{t/\sqrt{K} + 1 + (t/\sqrt{K} - 1)z^{-1}}, \qquad
H_T(z) = \frac{t\sqrt{K} + K + (t\sqrt{K} - K)z^{-1}}{t\sqrt{K} + 1 + (t\sqrt{K} - 1)z^{-1}},
\qquad \text{for}\; t = \tan(\omega_c T/2)
$$
Fig. 3.27 Symmetrical shelving/peaking with only one TF plus a switch
Fig. 3.28 Example of magnitude response of symmetrical R-M shelving and peaking filters evaluated with the structure of Fig. 3.27 (cut frequency −3 dB)
while for the 2nd-order peaking filter we have that

$$
H(z) = \frac{(t\sqrt{K} + 1) - 2c\,z^{-1} + (1 - t\sqrt{K})z^{-2}}{(t/\sqrt{K} + 1) - 2c\,z^{-1} + (1 - t/\sqrt{K})z^{-2}},
\qquad \text{for}\;
\begin{cases} c = \cos(\omega_c T) \\ t = \tan(\omega_c T/2Q). \end{cases}
$$

Note that this solution coincides with the formulae previously reported by Bristow-Johnson (Fig. 3.29).

Remark 3.16 Note that a drawback of Jot's modification of the R-M equalizer design is that each filter coefficient no longer depends on a single control parameter. For example, the adjustment of the gain K changes all the TF coefficients. Therefore, care must be taken to prevent noise when K changes dynamically.
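A quick numerical check of the bass shelving TF H_B(z) above (a sketch; parameter values are illustrative): the gain is K at DC, 1 at Nyquist, and √K at the cutoff frequency, consistent with the midpoint gain definition:

```python
import numpy as np

K, fs, fc = 4.0, 48000.0, 1000.0
t = np.tan(np.pi * fc / fs)                # t = tan(wc*T/2)
rK = np.sqrt(K)

b = np.array([t * rK + 1, t * rK - 1])     # numerator of H_B(z)
a = np.array([t / rK + 1, t / rK - 1])     # denominator of H_B(z)

def Hmag(b, a, w):
    z = np.exp(1j * w)
    return np.abs(np.polyval(b[::-1], 1 / z) / np.polyval(a[::-1], 1 / z))

wc = 2 * np.pi * fc / fs
assert np.isclose(Hmag(b, a, 0.0), K)           # DC gain = K
assert np.isclose(Hmag(b, a, np.pi), 1.0)       # Nyquist gain = 1
assert np.isclose(Hmag(b, a, wc), np.sqrt(K))   # midpoint gain sqrt(K) at wc
```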
Fig. 3.29 Example of magnitude response of symmetrical Jot shelving and peaking filters (midpoint gain cut frequency)
3.4.6 State-Space Filter with Orthogonal Control

Among the various topologies available for the implementation of numerical filters for DASP, a widely used structure is the one deriving from the state-space form (SSF) (see Sect. 1.5.1). The advantages of this form are many, for example: (1) minimum round-off noise; (2) low sensitivity with respect to the quantization of the coefficients; (3) absence of limit cycles. A possible SSF, which derives directly from analog filters made with operational amplifiers and is very suitable for audio applications, is that illustrated in Fig. 3.30 [37, 38, 54]. The input–output relationship is described by the system of equations

$$
\begin{aligned}
y_{hp}[n] &= x[n] - y_{lp}[n-1] - k_2\, y_{bp}[n-1] \\
y_{bp}[n] &= k_1\, y_{hp}[n] + y_{bp}[n-1] \\
y_{lp}[n] &= k_1\, y_{bp}[n] + y_{lp}[n-1]
\end{aligned} \qquad (3.49)
$$

where, given f_s the sampling frequency and f_c the cut frequency, we have that

$$
k_1 = 2\sin(\pi f_c/f_s), \qquad k_2 = 1/Q;
$$

i.e., k2 represents the parameter related to the bandwidth. With simple steps it can be shown that the TF of the lowpass output can be written as

$$
H(z) = \frac{Y_{lp}(z)}{X(z)} = \frac{r^2}{1 + (r^2 - q - 1)z^{-1} + q z^{-2}} \qquad (3.50)
$$

where r = k1 and q = (1 − k1k2). The filter in Fig. 3.30 is particularly indicated for DASP due to the simple relationship between the control parameters and the filter frequency response. Furthermore, it can easily be shown that the filter is stable for
Fig. 3.30 Second-order digital filter with independent control of center frequency and Q-factor, derived from the analog state space filter in [37]. A single filter simultaneously provides lowpass, bandpass, highpass outputs from a single input
√ Fig. 3.31 State-space filter responses for f s = 44.1 kHz, f c = 100 Hz, Q = 1/ 2 (left) and Q = 10 (right)
k1 < (2 − k2). This limitation, however, does not represent a problem in audio (in particular in musical applications) since the tuning frequency is very small compared to the sampling frequency and, moreover, the value of Q is never very high (Fig. 3.31). In particular, this structure is widely used in musical instrument effects and in sound synthesis (synthesizers, etc.) since it does not present transition noise even between extreme parameter tuning situations. Moreover, the SSF simultaneously provides the lowpass, bandpass and highpass outputs. Note that it is realized with 3 multipliers, 2 delay elements and 3 adders.
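A sample-by-sample sketch of the recursion (3.49) (Python/NumPy; names and parameter values are illustrative); as a check, its lowpass output matches the direct-form recursion implied by the TF (3.50):

```python
import numpy as np

def svf(x, fc, Q, fs):
    """State-variable filter, Eq. (3.49); returns (lp, bp, hp) outputs."""
    k1 = 2 * np.sin(np.pi * fc / fs)
    k2 = 1.0 / Q
    lp = np.zeros(len(x)); bp = np.zeros(len(x)); hp = np.zeros(len(x))
    lp1 = bp1 = 0.0                        # states y_lp[n-1], y_bp[n-1]
    for n, xn in enumerate(x):
        hp[n] = xn - lp1 - k2 * bp1
        bp[n] = k1 * hp[n] + bp1
        lp[n] = k1 * bp[n] + lp1
        lp1, bp1 = lp[n], bp[n]
    return lp, bp, hp

fs, fc, Q = 48000.0, 1000.0, 0.7071
x = np.zeros(64); x[0] = 1.0               # unit impulse
lp, bp, hp = svf(x, fc, Q, fs)

# Direct-form check against Eq. (3.50): H(z) = r^2 / (1 + (r^2-q-1)z^-1 + q z^-2)
r = 2 * np.sin(np.pi * fc / fs); q = 1 - r / Q
y = np.zeros(64)
for n in range(64):
    y[n] = r * r * x[n]
    if n >= 1: y[n] -= (r * r - q - 1) * y[n - 1]
    if n >= 2: y[n] -= q * y[n - 2]
assert np.allclose(lp, y)
```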
3.4.7 TF Mapping on Robust Structures

For an H(z) determined according to what was stated in the previous sections, the transformation to robust structures can take place through simple identities between polynomials. First of all, we express the TF of the robust structure in a form similar to that of the direct forms. For example, if the 2nd-order Regalia-Mitra structure described in the previous paragraph is used, the H(z) must be rewritten by replacing (3.44) in (3.40), obtaining an explicit form of the type

$$
H(z) = \frac{\tfrac{1}{2}\left[(1+k_1) + K(1-k_1)\right] + k_2(1+k_1)z^{-1} + \tfrac{1}{2}\left[(1+k_1) - K(1-k_1)\right]z^{-2}}{1 + k_2(1+k_1)z^{-1} + k_1 z^{-2}}.
$$
If the H(z) has been calculated, for example, with the Bristow-Johnson procedure [11], the normalized H(z) (3.27) is

$$
H(z) = \frac{1+B\sqrt{G}}{1+B/\sqrt{G}} \cdot
\frac{1 - \dfrac{2\cos(\omega_0)}{1+B\sqrt{G}}\,z^{-1} + \dfrac{1-B\sqrt{G}}{1+B\sqrt{G}}\,z^{-2}}
{1 - \dfrac{2\cos(\omega_0)}{1+B/\sqrt{G}}\,z^{-1} + \dfrac{1-B/\sqrt{G}}{1+B/\sqrt{G}}\,z^{-2}}
$$

and, by identification with the previous expression, we have that

$$
k_1 = a_2 = \frac{1 - B/\sqrt{G}}{1 + B/\sqrt{G}}, \quad\text{and}\quad k_2 = -\cos\omega_c. \qquad (3.51)
$$
Remark 3.17 Note that the parameter k2 depends only on the cut frequency ωc, while the parameter k1 depends on both the gain G and the bandwidth B. In the case in which the bandwidth were defined as proposed by Regalia-Mitra, the parameter B would be defined as B = √G tan(Δω/2) and Eq. (3.51) would coincide with the expression of k1 in Eq. (3.48) for K > 1. Similar procedures can be determined starting from different H(z) prototypes.
3.5 Fast Frequency Domain Filtering for Audio Applications

In DASP the IIR filters are primarily used as shelving filters, while in many other cases FIR filters are widely used. FIR filters, in addition to equalizers, can also be used in acoustic modeling, reverberation effects (direct convolution with impulse responses), room equalization, binaural 3D audio (implementation by direct convolution of HRTFs), cross-over filters (used in multi-way speakers), etc. [39, 40].
Fig. 3.32 Convolution in the time domain and frequency domain (FD) and its computational cost. In terms of real multiplications, block FD convolution will be more efficient starting at N = 32
However, it is well known that the DT convolution of two sequences of length N has a run-time complexity of O(N²), and in audio and acoustic modeling the impulse responses are often longer than 1 s (i.e., thousands of samples). In this case, as proposed in 1966 by Stockham [41], it is more convenient to implement the filtering operation in the frequency domain (FD). Let us consider the convolution theorem (see Sect. 2.4.2.3, Theorem 2.2), by which the FD product is equivalent to the DT convolution, and take advantage of the fast Fourier transform (FFT) algorithm, as shown in Fig. 3.32c. The computational cost can be estimated considering an input sequence x ∈ ℝ^{L×1} and a filter impulse response h ∈ ℝ^{M×1}. Then, letting N be the smallest power of 2 such that N = 2^k ≥ M + L − 1, the computational complexity in terms of real multiplications is given by

$$
T_{FDconv} = O(3N\log_2 N + 6N). \qquad (3.52)
$$

Thus, for sequences such that N > 32, it is possible to obtain a considerable computational saving.
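The FD filtering scheme of Fig. 3.32c can be sketched in a few lines (Python/NumPy; sequence lengths are illustrative): zero-pad both sequences to the power of 2 N ≥ L + M − 1, multiply the FFTs, and inverse transform; the result matches the direct DT convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)   # input, L = 100
h = rng.standard_normal(31)    # impulse response, M = 31

# smallest power of 2 with N >= L + M - 1
N = 1 << int(np.ceil(np.log2(len(x) + len(h) - 1)))

# FD product <=> DT linear convolution (convolution theorem)
y_fd = np.fft.irfft(np.fft.rfft(x, N) * np.fft.rfft(h, N), N)[:len(x) + len(h) - 1]
assert np.allclose(y_fd, np.convolve(x, h))
```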
3.5.1 Block Frequency Domain Convolution

With reference to Fig. 3.33, to have a simpler formalism it is convenient to use the DFT in vector notation. Considering the complex number, or phasor, F_N = e^{−j2π/N}, the DFT matrix F can be defined as F ≜ {f_{kn} = F_N^{kn} = e^{−j(2π/N)·k·n}, k, n ∈ [0, N − 1]}, i.e., a Vandermonde matrix consisting of powers of F_N, the primitive root of unity, which can be written as (see Eq. 2.46)
Fig. 3.33 Block convolution in the frequency domain. The symbol ⊙ indicates the Hadamard or pointwise vector product
$$
\mathbf{F} = K
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & F_N^{1\cdot 1} & F_N^{2\cdot 1} & \cdots & F_N^{(N-1)\cdot 1} \\
1 & F_N^{1\cdot 2} & F_N^{2\cdot 2} & \cdots & F_N^{(N-1)\cdot 2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & F_N^{1\cdot(N-1)} & F_N^{2\cdot(N-1)} & \cdots & F_N^{(N-1)\cdot(N-1)}
\end{bmatrix} \qquad (3.53)
$$
where K = 1 or K = 1/√N. The matrix F is symmetrical and complex and, from its definition, we can observe that FF^H = K²NI; so, to avoid scaling problems in the computation of the transform and its inverse, it is possible to insert in the transformation definition a suitable gain factor K such that FF^H = I (and hence F⁻¹ = F^H), for which K = 1/√N is considered.

Because the product of two DFT sequences inherently performs a circular convolution, FD filters generally require data constraints in order to implement the desired linear convolution. These constraints force certain elements of the signal vectors to be zero, and only a subset of the components is retained for later use in the algorithm. Let ĥ = [h[0] h[1] ··· h[M − 1]]^T be the filter impulse response; the complex vector H ∈ ℂ^{N×1}, containing the DFT of the filter, is defined as

$$
\mathbf{H} = \mathbf{F}\mathbf{h} \qquad (3.54)
$$

where the vector h ∈ ℝ^{N×1} represents an augmented form defined as

$$
\mathbf{h} = [\hat{\mathbf{h}}^T\ \mathbf{0}_{N-M}^T]^T. \qquad (3.55)
$$

Performing the inverse DFT (IDFT) of H, i.e., left multiplying both members of (3.54) by F⁻¹, we get the following augmented form

$$
\mathbf{h} = \mathbf{F}^{-1}\mathbf{H} \qquad (3.56)
$$
Fig. 3.34 The overlap-save method performs a linear convolution between a finite-length sequence h[n] and an infinite-length sequence x[n] by appropriate input data segmentation. a Input signal buffer composition mechanism for frequency domain filtering. b Sectioning method in the case of a filter ĥ ∈ ℝ^M, L-length blocks and FFT size equal to N = M + L
so, only the first M elements of the vector h are significant. In other words, the normal form can be written as

$$
\hat{\mathbf{h}} = [\mathbf{F}^{-1}\mathbf{H}]_M \qquad (3.57)
$$

where [·]_M indicates the selection of the first M elements of the vector.
3.5.1.1 Frequency Domain Linear Convolution with Overlap-Save Method
Let us consider the case of convolution between a sequence of infinite duration (the filter input) and one of finite duration (the filter impulse response). To determine the linear convolution from the product of the respective FFTs, it is necessary to proceed by sectioning the input sequence into blocks of finite length and to impose appropriate constraints: in fact, inverse transforming the pointwise product of two FFT sequences produces a circular convolution/correlation in the DT domain. In practice, there are two distinct methods of sectioning the sequences, known as overlap-save (OLS) and overlap-add (OLA). For brevity we consider only the OLS [1]. To understand the OLS technique we analyze a simple filtering problem of an infinite duration sequence with an FIR filter. Consider an M-length filter and L-length
signal blocks, with L ≤ M. In order to generate L effective output samples it is necessary to have an FFT of length N ≥ M + L − 1. Considering, as usual, N = M + L, the FFT is calculated on a block of L input samples preceded by M past samples. In formal terms, for k = 0 we can write

$$
\mathbf{X}_0 = \mathrm{diag}\Big\{\mathbf{F}\,[\underbrace{0 \cdots 0}_{\text{i.c.}\;M\;\text{points}}\;\underbrace{x[0] \cdots x[L-1]}_{\text{block}\;L\;\text{points}}]^T\Big\}
= \mathrm{diag}\big\{\mathbf{F}\,[\mathbf{0}_M^T\ \mathbf{x}_0^T]^T\big\}
$$

while for k > 0 we have that

$$
\mathbf{X}_k = \mathrm{diag}\Big\{\mathbf{F}\,[\underbrace{x[kL-M] \cdots x[kL-1]}_{\text{overlap: old}\;M\;\text{points}}\;\underbrace{x[kL] \cdots x[kL+L-1]}_{\text{new}\;L\;\text{points block}}]^T\Big\}
= \mathrm{diag}\big\{\mathbf{F}\,[\mathbf{x}_{old}^T\ \mathbf{x}_k^T]^T\big\}.
$$

This formalism allows one to express the output signal as Y_k = X_k H_k. It should be noted that the matrix-vector product form X_k H_k is only possible with the convention of inserting the input DFT data into a diagonal matrix X_k. If, instead, we consider a DFT vector, for example X̂_k = F[x_old^T x_k^T]^T, the output would take the form Y_k = X̂_k ⊙ H_k, in which the operator ⊙ denotes the Hadamard, or pointwise, product. With the overlap-save method the useful time-domain samples are determined by selecting only the last L samples of the output vector:

$$
\hat{\mathbf{y}}_k = [\mathbf{F}^{-1}\mathbf{X}_k\mathbf{H}_k]_L.
$$

Note that the FFT of the input is defined considering a window of N = M + L samples. With reference to Fig. 3.34, moving the running window forward by one block on the input sequence, the new FFT is calculated considering also the old M samples. The new FFT window contains L new samples and M old samples: this is referred to as an overlap of 100M/(M + L)%.
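A streaming sketch of the OLS method (Python/NumPy; names are illustrative): each FFT frame holds M old samples followed by L new ones, and only the last L output samples of each frame are kept:

```python
import numpy as np

def ols_filter(x, h, L):
    """Overlap-save FIR filtering with blocks of L new samples, FFT size N = M + L."""
    M = len(h)
    N = M + L
    H = np.fft.rfft(h, N)
    nblocks = int(np.ceil(len(x) / L))
    # M zeros as initial conditions, zero-pad the tail to a whole number of blocks
    xp = np.concatenate([np.zeros(M), x, np.zeros(nblocks * L - len(x))])
    y = np.empty(nblocks * L)
    for k in range(nblocks):
        frame = xp[k * L : k * L + N]          # old M points + new L points
        yk = np.fft.irfft(np.fft.rfft(frame) * H, N)
        y[k * L : (k + 1) * L] = yk[-L:]       # discard the first M (circularly wrapped) samples
    return y[:len(x)]

rng = np.random.default_rng(1)
x = rng.standard_normal(230)
h = rng.standard_normal(16)
assert np.allclose(ols_filter(x, h, L=16), np.convolve(x, h)[:len(x)])
```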
3.5.2 Low Latency Frequency Domain Filtering

The main disadvantage of frequency domain filtering algorithms is related to the delay required for the preliminary acquisition of the entire signal block before processing. Even in the case of an implementation with a certain degree of parallelism, the systematic delay introduced between the input and the output, also referred to as latency, is at least equal to the block length. However, since a common choice is N = L + M, and in DASP M can represent a very long impulse response (lasting even several seconds), this latency may not be compatible with the required low-latency specifications.
224
3 Digital Filters for Audio Applications
A simple solution to decrease the inherent OLS latency, denoted the partitioned overlap and save (POLS) algorithm, consists in partitioning the filter impulse response. By the linearity of the convolution, the impulse response of the filter is divided into blocks, or partitions, of shorter length; the convolution for each partition is calculated locally, and the overall filter output is obtained as the sum of the appropriately delayed partial results. In this way the overall filter delay is equal to the length of the partition. The method, already well known in the field of adaptive filtering as partitioned frequency domain adaptive filtering (PFDAF) or multidelay block frequency domain adaptive filter (MDF) (see, for example, [42–45] and the references therein), was re-proposed in the audio sector, for the convolution part only, by Kulp [46], refined by Gardner [47], and later modified by several other authors; see, for example, [48–52].
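The partitioning idea can be checked directly in the time domain with a short sketch (illustrative code with our own naming; by linearity, the sum of the delayed partial convolutions equals the full convolution):

```python
import numpy as np

def partitioned_convolution(x, h, M):
    """Convolve x with h by splitting h into length-M partitions and
    summing the delayed partial results (time-domain illustration)."""
    P = int(np.ceil(len(h) / M))               # number of partitions
    y = np.zeros(len(x) + len(h) - 1)
    for l in range(P):
        hl = h[l * M : (l + 1) * M]            # l-th partition of the impulse response
        yl = np.convolve(x, hl)                # partial convolution
        y[l * M : l * M + len(yl)] += yl       # delay by l*M samples and accumulate
    return y
```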
3.5.2.1 Uniform Partitioned Overlap-Save Algorithm
As shown in Fig. 3.35, the convolution is implemented as P smaller convolutions that can be performed in parallel, each of which can be implemented in the frequency domain. With this type of implementation, the advantages of the frequency domain approach are associated with a significant latency reduction. Let us consider the implementation of a filter of length equal to $M_F = PM$ taps,¹ where M is the length of the partition and P the number of partitions. The output of the filter is equal to

$$y[n] = \sum_{i=0}^{PM-1} h_n[i]\,x[n-i]. \qquad (3.58)$$

By the linearity of the convolution, the sum (3.58) can be partitioned as

$$y[n] = \sum_{l=0}^{P-1} y_l[n] \qquad (3.59)$$

where $y_l[n] = \sum_{i=0}^{M-1} h_n[i+lM]\,x[n-i-lM]$. As schematically illustrated in Fig. 3.35, by inserting appropriate delay lines between the partitions, the filter is implemented with P separate M-length convolutions, each of which can be simply implemented in the frequency domain as shown in Fig. 3.36a. The overall output is the sum (3.59). Consider the case in which the block length is L ≤ M. Let k be the block index; denote with $\mathbf{x}_k^l \in \mathbb{R}^{(M+L)\times 1}$ the l-th partition of the input sequence vectors, and with $\mathbf{h}_k^l \in \mathbb{R}^{(M+L)\times 1}$ the augmented form of the impulse response, denoted as filter vector, respectively defined as
¹ In this section the filter length is referred to as $M_F$.
3.5 Fast Frequency Domain Filtering for Audio Applications
225
Fig. 3.35 Principle of partitioned impulse response convolution
$$\mathbf{x}_k^l = \big[\,(\mathbf{x}_{\mathrm{old}}^{l,M})^T \;\; (\mathbf{x}_{k,\mathrm{new}}^{l,L})^T\,\big]^T = \big[\underbrace{x[kL-lM-M] \,\cdots\, x[kL-lM-1]}_{\text{old samples } M}\;\; \underbrace{x[kL-lM] \,\cdots\, x[kL-lM+L-1]}_{\text{new block } L}\big]^T$$

and

$$\mathbf{h}_k^l = \big[\underbrace{h_k[lM] \,\cdots\, h_k[lM+M-1]}_{M \text{ subfilter weights}}\;\; \underbrace{0 \,\cdots\, 0}_{\text{zero padding } L \text{ samples}}\big]^T.$$
Now, the input data FD representation for the single partition is defined by a diagonal matrix $\mathbf{X}_k^l \in \mathbb{C}^{(M+L)\times(M+L)}$ with the DFT elements of $\mathbf{x}_k^l$, i.e., $\mathbf{X}_k^l = \mathrm{diag}\{\mathbf{F}\mathbf{x}_k^l\}$, while the FD representation of the impulse response partition $\mathbf{h}_k^l$ is defined as $\mathbf{H}_k^l = \mathbf{F}\mathbf{h}_k^l$. Letting $\mathbf{Y}_k^l = \mathbf{X}_k^l\mathbf{H}_k^l$ be the output augmented form for the l-th partition, the time-domain output is defined as (see Eq. 7.26) $\mathbf{y}_k^l = \mathbf{F}^{-1}\mathbf{Y}_k^l$, whereby the filter overall output is defined by the sum of all partitions

$$\mathbf{y}_k = \sum_{l=0}^{P-1} \mathbf{F}^{-1}\mathbf{Y}_k^l. \qquad (3.60)$$
By reversing the order of the inverse DFT and the summation, the output expression can be written as
226
3 Digital Filters for Audio Applications
Fig. 3.36 Principle of partitioned impulse response convolution. a Time-domain delay line implementation, Eq. (3.60); b equivalent frequency-domain delay line implementation, Eq. (3.61)
Fig. 3.37 Implementation scheme of UPOLS filtering algorithm (modified from [52])
$$\mathbf{y}_k = \mathbf{F}^{-1}\sum_{l=0}^{P-1} \mathbf{X}_k^l\mathbf{H}_k^l. \qquad (3.61)$$
In simpler terms, as shown in Fig. 3.36b, the linearity of the FFT allows us to sum the complex frequency domain outputs of each subfilter before taking the IFFT, which reduces the number of IFFTs required to one. This allows an efficient frequency domain calculation of the individual partition contributions $\mathbf{X}_k^l\mathbf{H}_k^l$. The important aspect is that the FFT in Eq. (3.61) is calculated only on M + L points (relative to the partition). In addition, note that the delay lines are defined in the frequency domain. In Fig. 3.37 an implementation scheme of the uniformly partitioned overlap-save (UPOLS) algorithm is shown.
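A compact, illustrative NumPy sketch of the UPOLS recursion for L = M follows (our own naming, not the book's code; the array `fdl` plays the role of the frequency-domain delay line of Fig. 3.37):

```python
import numpy as np

def upols_filter(x, h, M):
    """Uniformly partitioned overlap-save: block length L = M, FFT size N = 2M."""
    P = int(np.ceil(len(h) / M))                     # number of partitions
    hp = np.zeros(P * M)
    hp[:len(h)] = h
    # spectra of the zero-padded partitions, H_k^l in the text
    H = np.array([np.fft.fft(hp[l*M:(l+1)*M], 2*M) for l in range(P)])
    n_blocks = int(np.ceil(len(x) / M))
    xp = np.concatenate([x, np.zeros(n_blocks * M - len(x))])
    fdl = np.zeros((P, 2 * M), dtype=complex)        # frequency-domain delay line
    prev = np.zeros(M)
    y = np.empty(n_blocks * M)
    for k in range(n_blocks):
        cur = xp[k*M:(k+1)*M]
        fdl = np.roll(fdl, 1, axis=0)                # shift the spectral delay line
        fdl[0] = np.fft.fft(np.concatenate([prev, cur]))  # M old + M new samples
        Y = np.sum(fdl * H, axis=0)                  # accumulate before one IFFT
        y[k*M:(k+1)*M] = np.real(np.fft.ifft(Y))[-M:]     # keep the last M samples
        prev = cur
    return y[:len(x)]
```

Only one FFT of the input and one IFFT per block are needed, regardless of the number of partitions P.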
Fig. 3.38 Implementation scheme of the NUPOLS filtering algorithm with the Gardner partition scheme [47] P = (N, N, 2N, 4N, 8N, …, 2^{k−1}N). For zero-latency requirements the first block is implemented as a time-domain (TD-FIR) convolution, while the others are implemented in the frequency domain (FD-FIR)
Computational Complexity of the UPOLS Regarding the computational cost, indicating with L = M the length of the block, with $P = M_F/L$ the number of partitions, with N = 2L the length of the FFT, and with $T_{\mathrm{FFT}}(N) \approx N\log_2(N)$ the computational cost of the FFT on the single block, to a first approximation the cost of the algorithm in terms of real multiplications is equal to

$$T_{\mathrm{UPOLS}} = (P+1)\cdot N\log_2(N) + P\cdot 6N + \gamma L \qquad (3.62)$$

where $\gamma L$ denotes the computational cost of zero-padding, buffer shifting, and all the other necessary operations. So, the run-time complexity of the filtering operations lies in $O(M_F)$. Moreover, in the case of a time-variant filter, it is also necessary to add the computational cost of the FFTs of the impulse response partitions.
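A rough evaluation of Eq. (3.62), normalized per input sample, shows the gain over a direct-form FIR, which needs $M_F$ multiplications per output sample (illustrative sketch; the $\gamma L$ term is neglected and the numeric values are our own example):

```python
import numpy as np

def upols_cost_per_sample(MF, L):
    """Approximate real-multiply cost per input sample from Eq. (3.62)."""
    P = MF // L                          # number of partitions (L assumed to divide MF)
    N = 2 * L                            # FFT length for block length L = M
    T = (P + 1) * N * np.log2(N) + P * 6 * N   # gamma*L term neglected
    return T / L                         # normalize by the L samples produced per block

# a direct-form FIR filter costs MF multiplications per output sample
```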
3.5.2.2 Non-Uniform Partitioned Overlap-Save Algorithm
An innovative approach, called non-uniform partitioned overlap and save (NUPOLS), was independently proposed by Sommen [45] and Gardner [47]. To achieve very low latency, the partitions are of increasing lengths, and the first partition, in order to achieve zero latency, can be implemented in the time domain. This approach greatly reduces the computational cost and makes it possible to implement in real time particularly long filters, such as those of artificial reverberators denoted as convolution reverberators. Obviously, the implementation of non-uniformly partitioned convolution is more complex than that of uniformly partitioned methods (Fig. 3.38). To obtain a reduced computational cost, it is necessary to consider longer blocks, which entail more latency.
228
3 Digital Filters for Audio Applications
However, for long filters the block size is usually too long compared to acceptable latency values and, conversely, the cost increases considerably when the block length is reduced. For high sampling rates and for impulse responses several seconds long, considering some maximum latency constraint, the choice of the optimal number and length of the blocks is not a trivial problem. In fact, in order to determine the optimal configuration in terms of computational complexity per input sample, it would be necessary to make an exhaustive search over all the possible combinations, which can be very expensive from the computational point of view. To determine an optimal configuration, Garcia [48] presented a method to optimize the number and length of the blocks. The proposed method is based on the Viterbi algorithm and allows for optimization of several convolution channels in parallel. For example, for a 3 s convolution reverberator with f_s = 44.1 kHz, the optimal choice determined by the Garcia approach, for a latency of 5.8 ms, is: 8 blocks of 256 samples, 7 blocks of 2048 samples, and 7 blocks of 16,384 samples, respectively. For more details and new trends, we refer to the literature on this topic. See for example [41–52].
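The quoted configuration is easy to verify numerically (the tuple list below simply restates Garcia's example):

```python
fs = 44100                                   # sampling frequency (Hz)
scheme = [(8, 256), (7, 2048), (7, 16384)]   # (number of blocks, block length)
total = sum(count * size for count, size in scheme)
latency_ms = 1000 * 256 / fs                 # the first block sets the I/O latency
# total = 131072 samples, i.e. about 2.97 s of impulse response; latency about 5.8 ms
```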
References

1. A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing, 3rd edn. (Pearson Education, London, 2010)
2. L.R. Rabiner, B. Gold, Theory and Application of Digital Signal Processing (Prentice-Hall Inc, Englewood Cliffs, 1975)
3. S.J. Orfanidis, Introduction to Signal Processing (Prentice Hall, Englewood Cliffs, 2010). ISBN 0-13-209172-0
4. T.W. Parks, J.H. McClellan, Chebyshev approximation for nonrecursive digital filters with linear phase. IEEE Trans. Circuit Theory 19, 189–194 (1972)
5. E.O. Brigham, The Fast Fourier Transform and Its Application (Prentice-Hall Inc, Englewood Cliffs, 1998)
6. G.W. McNally, Digital Audio: Recursive Digital Filtering for High Quality Audio Signals (BBC Research Department Report, 1981)
7. J.A. Moorer, The manifold joys of conformal mapping: applications to digital filtering in the studio. J. Audio Eng. Soc. 31, 826–841 (1983)
8. S.A. White, Design of a digital biquadratic peaking or notch filter for digital audio equalization. J. Audio Eng. Soc. 34, 479–483 (1986)
9. S.K. Mitra, K. Hirano, S. Nishimura, Design of digital bandpass/bandstop filters with independent tuning characteristics. Frequenz 44, 117–121 (1990)
10. F.J. Harris, E. Brooking, A versatile parametric filter using an imbedded all-pass sub-filter to independently adjust bandwidth, center frequency, and boost or cut, presented at the 95th Convention of the AES, Preprint 3757 (1993)
11. R. Bristow-Johnson, The equivalence of various methods of computing biquad coefficients for audio parametric equalizers, in Proceedings of Audio Engineering Society Convention (1994)
12. S.J. Orfanidis, Digital parametric equalizer design with prescribed Nyquist-frequency gain. J. Audio Eng. Soc. 45(6) (1997)
13. F. Fontana, M. Karjalainen, A digital bandpass/bandstop complementary equalization filter with independent tuning characteristics. IEEE SP Lett. 10(4) (2003)
14. D.C. Massie, An engineering study of the four-multiply normalized ladder filter. J. Audio Eng. Soc. 41, 564–582 (1986)
15. P.A. Regalia, S.K. Mitra, Tunable digital frequency response equalization filters. IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (1987)
16. V. Välimäki, J.D. Reiss, All about audio equalization: solutions and frontiers. MDPI Appl. Sci. 6, 129 (2016). https://doi.org/10.3390/app6050129
17. R.J. Oliver, J.-M. Jot, Efficient multi-band digital audio graphic equalizer with accurate frequency response control, in 139th AES Convention, Paper no. 9406, New York, USA, Oct. 29–Nov. 1, 2015
18. ISO 266, Acoustics—Preferred Frequencies for Measurements (1975)
19. J. Rämö, V. Välimäki, Optimizing a high-order graphic equalizer for audio processing. IEEE Sig. Proces. Lett. 21(3), 301–305 (2014)
20. Z. Chen, G.S. Geng, F.L. Yin, J. Hao, A pre-distortion based design method for digital audio graphic equalizer. Digit. Sig. Proces. 25, 296–302 (2014)
21. M. Holters, U. Zölzer, Parametric higher-order shelving filters, in 14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, 4–8 Sept 2006
22. R. Miller, Equalization methods with true response using discrete filters, in 116th AES Convention, Paper no. 6088, Berlin, Germany, 8–11 May 2004
23. Motorola Inc., Digital Stereo 10-Band Graphic Equalizer Using the DSP56001, Application note (1988)
24. S. Tassart, Graphical equalization using interpolated filter banks. J. Audio Eng. Soc. 61(5), 263–279 (2013)
25. J. Rämö, V. Välimäki, B. Bank, High-precision parallel graphic equalizer. IEEE/ACM Trans. Audio Speech Lang. Process. 22, 1894–1904 (2014)
26. B. Bank, Audio equalization with fixed-pole parallel filters: an efficient alternative to complex smoothing. J. Audio Eng. Soc. 61, 39–49 (2013)
27. A. Uncini, Fundamentals of Adaptive Signal Processing (Springer, Berlin, 2015). ISBN 978-3-319-02806-4
28. J.D. Reiss, Design of audio parametric equalizer filters directly in the digital domain. IEEE Trans. Audio Speech Lang. Process. 19(6) (2011)
29. R.J. Clark, E.C. Ifeachor, G.M. Rogers, P.W.J. Van Eetvelt, Techniques for generating digital equalizer coefficients. J. Audio Eng. Soc. 48(4) (2000)
30. S.J. Orfanidis, High-order digital parametric equalizer design. J. Audio Eng. Soc. 53(11), 1026–1046 (2005)
31. L.B. Jackson, Roundoff noise bounds derived from coefficient sensitivities for digital filters. IEEE Trans. Circuits Syst. CAS-23, 481–485 (1976)
32. C.T. Mullis, R.A. Roberts, Roundoff noise in digital filters: frequency transformations and invariants. IEEE Trans. Acoust. Speech Signal Process. 24(6) (1976)
33. P.A. Regalia, S.K. Mitra, P.P. Vaidyanathan, The digital all-pass filter: a versatile signal processing building block. Proc. IEEE 76(1) (1988)
34. P.P. Vaidyanathan, S.K. Mitra, Y. Neuvo, A new approach to the realization of low-sensitivity IIR digital filters. IEEE Trans. Acoust. Speech Signal Process. 34, 350–361 (1986)
35. J.M. Jot, Proportional parametric equalizers—application to digital reverberation and environmental audio processing, in 139th AES Convention, Paper no. 9358, New York, USA, Oct. 29–Nov. 1, 2015
36. U. Zölzer, T. Boltze, Parametric digital filter structures, in Proceedings of Audio Engineering Society Convention (1995)
37. H. Chamberlin, Musical Applications of Microprocessors (Hayden Book Company, Indianapolis, 1980)
38. P. Dutilleux, U. Zölzer, Filters, in DAFX—Digital Audio Effects (Wiley, Hoboken, 2002), pp. 31–62
39. U. Zölzer, Digital Audio Effects (Wiley, New York, 1997). ISBN 0-471-97266-6
40. U. Zölzer (ed.), DAFX—Digital Audio Signal Processing (Wiley, New York, 2002). ISBN 0-471-49078-4
41. T.G. Stockham Jr., High-speed convolution and correlation, in Proceedings of Spring Joint Computer Conference, vol. 28, New York, USA, pp. 229–233 (1966)
42. W. Kellermann, Kompensation akustischer Echos in Frequenzteilbändern. Frequenz 39(7/8), 209–215 (1985)
43. P.C.W. Sommen, Partitioned frequency domain adaptive filters, in Proceedings of Asilomar Conference on Signals and Systems, Pacific Grove, California, USA, pp. 677–681 (1989)
44. J.S. Soo, K.K. Pang, Multidelay block frequency domain adaptive filter. IEEE Trans. Acoust. Speech Sig. Proces. 38, 373–376 (1990)
45. G.P.M. Egelmeers, P. Sommen, A new method for efficient convolution in frequency domain by nonuniform partitioning for adaptive filtering, in 7th European Signal Processing Conference (EUSIPCO 1994), Edinburgh, Scotland, 1994, pp. 1030–1033
46. B.D. Kulp, Digital equalization using Fourier transform techniques, in Proceedings of 85th Audio Engineering Society Convention, Los Angeles, USA (1988)
47. W.G. Gardner, Efficient convolution without input-output delay. J. Audio Eng. Soc. 43(3), 127–136 (1995)
48. G. Garcia, Optimal filter partition for efficient convolution with short input/output delay, in 113th Convention of the Audio Engineering Society (2002)
49. F. Wefers, J. Berg, High-performance real-time FIR-filtering using fast convolution on graphics hardware, in Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, Austria, 6–10 Sept 2010
50. E. Battenberg, R. Avižienis, Implementing real-time partitioned convolution algorithms on conventional operating systems, in Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), Paris, France, 19–23 Sept 2011
51. A. Primavera, S. Cecchi, L. Romoli, P. Peretti, F. Piazza, A low latency implementation of a non uniform partitioned overlap and save algorithm for real time applications, in Audio Engineering Society Convention, vol. 131, New York, USA (2011)
52. F. Wefers, M. Vorländer, Optimal filter partitions for non-uniformly partitioned convolution, in AES 45th International Conference, Helsinki, Finland, 1–4 Mar 2012
53. U. Zölzer, B. Redmer, J. Bucholtz, Strategies for switching digital audio filters, in Proceedings of 95th Audio Engineering Society Convention, New York (1993)
54. J.O. Smith, Digital State-Variable Filters (2019). https://ccrma.stanford.edu/jos/svf/svf.pdf
55. A.H. Gray, J.D. Markel, Digital lattice and ladder filter synthesis. IEEE Trans. Audio Electroacoust. AU-21, 491–500 (1973)
56. D.A. Bohn, Constant-Q graphic equalizer. J. Audio Eng. Soc. 34(9) (1986)
57. R. Bristow-Johnson, Cookbook Formulae for Audio EQ Biquad Filter Coefficients (2011). https://www.w3.org/2011/audio/audio-eq-cookbook.html. Accessed 23 Mar 2019
Chapter 4
Multi-rate Audio Processing and Wavelet Transform
4.1 Introduction The hearing organ is capable of perceiving sounds in a very wide frequency range, covering about 4 decades: nominally from 20 Hz to 20 kHz. The most common audio signal sampling frequencies, although they ensure adequate acquisition of the higher frequencies, may not be necessary for signals in the lower frequency range. For example, for processing a signal to be sent to a subwoofer (or LFE, low-frequency effects channel), which typically covers the band from 3 to 120 Hz, a sampling frequency of 44.1 kHz or higher is completely redundant. In addition, to design a cross-over filter for the subwoofer with a sufficiently narrow transition frequency band, it would be necessary to have a long FIR filter, or a very high precision, robust structure in the IIR case (see Sect. 2.7.4.4). In DASP it is possible and convenient to process portions of the audio signal with different sampling frequencies, and filter banks (FBs) represent a formal way to perform the so-called subband decomposition and to partition the input spectrum for further processing [1–15]. Under certain conditions, the subband signal decomposition can be assimilated to an orthogonal transformation. Thus, closely related to multi-rate processing and FBs are the methods for wavelet analysis and the wavelet transform. Wavelet analysis and transforms represent a tool for time–frequency analysis of audio signals with identical precision over the entire spectral range, which generalizes both the short-time Fourier transform (STFT) presented in Sect. 4.4 and the concept of the FB itself [16–28]. The formal development of these topics is dense with theoretical implications that would require adequate space for their in-depth analysis. However, in this chapter the topics are presented by trying to limit the use of mathematical tools to strict necessity, while attempting to cover the arguments in a sufficiently comprehensive manner appropriate to the objectives of the book.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Uncini, Digital Audio Processing Fundamentals, Springer Topics in Signal Processing 21, https://doi.org/10.1007/978-3-031-14228-4_4
4.2 Multirate Audio Processing The term multirate signal processing indicates, in general, a signal processing methodology in which more than one sampling frequency is present [1]; and, as we shall see, in professional audio this methodology is widely used in many applications such as, for example, digital storage, transmission, and digital audio signal processing (DASP). The interest in multirate signal processing and filter banks (FBs) has grown enormously in recent years. Such techniques are widely used in modern audio (and also image and video) signal compression and coding methods, subband equalizers, low-frequency narrowband transition filters, and so on [1–9]. A first example where multirate processing may be necessary is the conversion between two different sampling frequencies, such as the conversion from studio rates (48, 96, or 192 kHz) to the CD rate (44.1 kHz) or vice versa [29–32]. In other situations multirate signal processing can be used to reduce the computational cost. In signal processing there is no reason why the various processing stages should all be performed at the same sampling frequency. For example, in the case of FIR filtering, if a narrow transition band is required (as in subband coding for audio compression), by undersampling the signal the filter performance can be improved and the computational cost reduced.
4.2.1 Sampling Rate Reduction by an Integer Factor The sampling rate conversion to a lower submultiple frequency can be performed with a process called decimation, defined as

$$x_d[n] = x[nD] \qquad (4.1)$$

with D > 1 integer. The sampling frequency can be reduced by a factor D without aliasing if the starting sequence is band-limited, i.e., $X(e^{j\omega}) = 0$ for $|\omega| > \pi/D$. Usually, to ensure the bandwidth limitation, a so-called anti-aliasing filter is inserted, as shown in Fig. 4.1. Let h[n] be the impulse response of the anti-aliasing filter; the decimator output can be expressed as
Fig. 4.1 Decimation process by an integer factor D
$$x_d[n] = \sum_{k=-\infty}^{\infty} h[k]\,x[nD-k]. \qquad (4.2)$$
The anti-aliasing lowpass filter is inserted upstream of the decimation process and is characterized by an ideal frequency response of the type

$$H(e^{j\omega}) = \begin{cases} 1, & |\omega| \le \pi/D \\ 0, & \pi/D < |\omega| \le \pi. \end{cases} \qquad (4.3)$$
By performing the inverse discrete-time Fourier transform (IDTFT) of the previous expression, the impulse response of the ideal lowpass filter with $\omega_c = \pi/D$ is obtained, which is of the type

$$h_{D\mathrm{id}}[n] = \frac{\omega_c}{\pi}\,\frac{\sin(\pi n/D)}{\pi n/D}, \qquad -\infty < n < \infty. \qquad (4.4)$$
The filter impulse response can be determined with one of the classical methodologies (windowing, equiripple, etc.) reported in [33–35] or with more specific procedures related to decimator filters [1, 6]. In the DTFT frequency domain the spectrum of the decimated signal can be written as

$$X_d(e^{j\omega}) = \frac{1}{D}\sum_{k=0}^{D-1} X\big(e^{j(\omega-2\pi k)/D}\big) = \frac{1}{D}\sum_{k=0}^{D-1} X\big(e^{j\omega/D}e^{-j2\pi k/D}\big) = \frac{1}{D}\sum_{k=0}^{D-1} X\big(e^{j\omega/D}F_D^k\big) \qquad (4.5)$$

where $F_D \triangleq e^{-j2\pi/D} = \cos(2\pi/D) - j\sin(2\pi/D)$. This means $D\,X_d(e^{j\omega}) = \sum_{k=0}^{D-1} X(e^{j(\omega-2\pi k)/D})$; e.g., for k = 0, the spectrum corresponds to the D-stretched version of $X(e^{j\omega})$, and the D − 1 terms with k > 0 are uniformly shifted versions of this stretched version. An example of a signal decimated by a factor D = 2 is shown in Fig. 4.2. The overall D terms form a function with period 2π in ω, which is the fundamental property of the Fourier transform of any sequence. With $z = e^{j\omega}$, (4.5) becomes the z-transform of the decimated signal

$$X_d(z) = \frac{1}{D}\sum_{k=0}^{D-1} X\big(z^{1/D}F_D^k\big). \qquad (4.6)$$
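On the DFT grid, (4.6) for D = 2 reduces to the well-known identity $X_d[k] = \tfrac{1}{2}(X[k] + X[k+N/2])$, which can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
X = np.fft.fft(x)              # 16-point DFT of the original sequence
Xd = np.fft.fft(x[::2])        # 8-point DFT of the decimated sequence (D = 2)
# the stretched copy (k = 0) and its shifted copy (k = 1) add up with weight 1/D
assert np.allclose(Xd, 0.5 * (X[:8] + X[8:]))
```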
Fig. 4.2 Time and spectral representation of the decimation process for D = 2. The overlapping of the spectral tails is due to the aliasing phenomenon
For example, for D = 2,

$$X_d(z) = \frac{1}{2}\big[X(z^{1/2}F_D^0) + X(z^{1/2}F_D^1)\big] = \frac{1}{2}\big[X(z^{1/2}e^{j2\pi 0/2}) + X(z^{1/2}e^{-j2\pi 1/2})\big] = \frac{1}{2}\big[X(z^{1/2}) + X(-z^{1/2})\big]. \qquad (4.7)$$

The multiplication of z by the phasor $F_D = e^{-j2\pi/D}$, which can be interpreted as a modulation process, is equivalent to its clockwise rotation by an angle equal to $2\pi/D$. For example, for D = 2 and k = 1, $zF_2^1 = ze^{-j\pi} = z(\cos\pi - j\sin\pi) = -z$.
4.2.1.1 Spectral Representation with Modulation Components
Sometimes it is useful to represent the spectrum of the decimated signal by indicating the various modulation components more explicitly. In this case the spectrum is defined with respect to the normalized angular frequency ω of the undecimated sequence as

$$X_d(e^{j\omega}) = \frac{1}{D}\sum_{k=0}^{D-1} X\big(e^{j\omega - j2\pi k/D}\big) = \frac{1}{D}\sum_{k=0}^{D-1} X_k^{(m)}(e^{j\omega}) \qquad (4.8)$$

where $X_k^{(m)}(e^{j\omega}) \triangleq X(e^{j\omega}F_D^k)$ denotes the modulation component of $X(e^{j\omega})$, that is, the multiplication of the argument $e^{j\omega}$ by the phasor $F_D^k = e^{-j2\pi k/D}$. Figure 4.3 shows the various spectral representations for a signal decimated by a factor of D = 4.
Fig. 4.3 Spectral representations of the decimated signal for D = 4. a Spectrum of the band-limited sequence, $X(e^{j\omega}) = 0$, $|\omega| \ge \pi/D$; b the terms $X_k^{(m)}(e^{j\omega})$ represent the modulation spectral components, i.e., the spectrum is replicated D-fold; c decimated spectrum considering the new frequency variable, normalized w.r.t. the reduced sampling rate
Fig. 4.4 Decimation procedure with three cascaded stages. For each stage the filter cutoff frequency is high enough that each filter can be realized with a moderate length
4.2.1.2 Cascade Stages Decimation
In the case of a high decimation factor D, the filter will be characterized by a rather low cutoff frequency. To guarantee a good attenuation of the stopband, the filter impulse response will be rather long (often hundreds of taps or more) and, therefore, will have a high computational cost. For large D, the use of a multistage approach is convenient. To perform a high decimation process we can use a cascade of stages, for which $D = \prod_k D_k$; for example, if D = 12, it is possible to carry out the decimation with three stages $D_1 = 2$, $D_2 = 2$, $D_3 = 3$, as illustrated in Fig. 4.4.
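Ignoring the per-stage anti-aliasing filters (which must of course be present in practice), the index-selection view of the cascade is immediate:

```python
import numpy as np

x = np.arange(120)
one_stage = x[::12]            # direct decimation by D = 12
cascade = x[::2][::2][::3]     # three stages: D1 = 2, D2 = 2, D3 = 3
assert np.array_equal(one_stage, cascade)
```

The computational advantage comes entirely from the filters: each intermediate stage needs only a mild cutoff, hence a short impulse response, and runs at a progressively lower rate.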
4.2.2 Sampling Rate Increase by an Integer Factor The interpolation process, defined as the opposite of decimation, allows a numerical sequence to be represented with a higher sampling frequency than the original one.
This process is characterized by an input–output relationship of the type

$$x_i[n] = \sum_{k=-\infty}^{\infty} h[k]\,\hat{x}[n-k] \qquad (4.9)$$

where

$$\hat{x}[n] = \begin{cases} x[n/L], & n = 0, \pm L, \pm 2L, \ldots \\ 0, & \text{otherwise}. \end{cases} \qquad (4.10)$$
As illustrated in Fig. 4.5a, from Eqs. (4.9) and (4.10) we can deduce that the interpolation process is realized in the following two cascaded stages.

• Sampling rate expander—The first stage consists of inserting L − 1 zeros between two successive samples of the input sequence x[n], thus obtaining the zero-padded sequence $\hat{x}[n]$. This operation increases the sampling rate of the input signal by a factor L.

• Interpolation filter—The second stage consists of interpolating between the null and non-null samples. The interpolation is performed by lowpass filtering the sequence $\hat{x}[n]$ with a FIR filter h[n], called the reconstruction filter or anti-image filter. Such a filter has a cutoff frequency equal to π/L and a gain equal to L to compensate for the lowering of the average level due to the previous L − 1 zero-filling between samples.

The scheme of the interpolation process is shown in Fig. 4.5b. The ideal lowpass interpolator filter is characterized by a frequency response of the type

$$H(e^{j\omega}) = \begin{cases} L, & |\omega| \le \pi/L \\ 0, & \pi/L < |\omega| \le \pi \end{cases} \qquad (4.11)$$
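The two stages can be sketched as follows (illustrative code with our own names; `interp_fir` is a windowed truncation of the ideal reconstruction filter (4.11)):

```python
import numpy as np

def interp_fir(L, num_taps):
    """Windowed truncation of the ideal reconstruction filter (gain L, cutoff pi/L)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    return np.sinc(n / L) * np.hamming(num_taps)     # L*sin(pi n/L)/(pi n) = sinc(n/L)

def interpolate(x, h, L):
    """Expander (L - 1 zeros between samples) followed by the reconstruction filter."""
    xe = np.zeros(len(x) * L)
    xe[::L] = x                                      # zero-stuffed sequence
    return np.convolve(xe, h)[:len(xe)]
```

Note that the sinc zeros fall exactly on the non-null input samples, so the input values pass through unchanged (no intersymbol distortion), as discussed below for Eq. (4.13).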
Fig. 4.5 Interpolation process by an integer factor L: a sampling rate expander: zero-padded sequence obtained by inserting L − 1 zeros between two successive samples; b lowpass DT filter with cutoff frequency π/L and gain L
with an impulse response

$$h_{L\mathrm{id}}[n] = L\,\frac{\sin(\pi n/L)}{\pi n}, \qquad -\infty < n < \infty \qquad (4.12)$$

so, we get

$$h_{L\mathrm{id}}[n] = 0, \qquad n = \pm L, \pm 2L, \pm 3L, \ldots \qquad (4.13)$$
Figure 4.6 shows an example of a signal interpolated by a factor L = 2 with the corresponding spectra. The z-transform of the interpolator output $x_i[n]$ is given by

$$X_i(z) = X(z^L) \qquad (4.14)$$

which means $X_i(e^{j\omega}) = X(e^{j\omega L})$, i.e., $X_i(e^{j\omega})$ is an L-fold compressed version of $X(e^{j\omega})$, as Fig. 4.6d shows. The appearance of multiple copies of the basic spectrum in Fig. 4.6d is called the imaging effect, and the extra copies are the images created by the interpolator. From Eq. (4.13), as also shown in Fig. 4.6, we observe that the impulse response is zero at the non-null input signal samples. This circumstance guarantees the absence of intersymbol distortion in the signal reconstruction process. The design of the interpolator filter can be performed with various methodologies, and the interpolator can be implemented with several cascaded stages (see, e.g., [1, 6, 33]).
Fig. 4.6 Temporal and spectral representation of the interpolation process. The transfer function $H(e^{j\omega})$ represents the anti-image (interpolator, or reconstruction) filter
4.2.3 Polyphase Representation The polyphase components of a sequence represent a very useful tool in multirate signal processing problems. We introduce the topic, for simplicity, by analyzing the two-channel case. Let x[n] be a sequence with z-transform X(z); this can be expressed by separating its even and odd (polynomial) components

$$X(z) = \sum_{n=-\infty}^{\infty} x[n]z^{-n} = \sum_{n=-\infty}^{\infty} x[2n]z^{-2n} + z^{-1}\sum_{n=-\infty}^{\infty} x[2n+1]z^{-2n}. \qquad (4.15)$$

The terms

$$X_0^{(p)}(z) = \sum_{n=-\infty}^{\infty} x[2n]z^{-n}, \quad\text{and}\quad X_1^{(p)}(z) = \sum_{n=-\infty}^{\infty} x[2n+1]z^{-n} \qquad (4.16)$$

are defined as subphase components, while the terms

$$X_0^{(p)}(z^2) = \sum_{n=-\infty}^{\infty} x[2n]z^{-2n}, \quad\text{and}\quad X_1^{(p)}(z^2) = \sum_{n=-\infty}^{\infty} x[2n+1]z^{-2n} \qquad (4.17)$$

are defined as polyphase components. From the above, X(z) can be represented as the sum of the terms

$$X(z) = X_0^{(p)}(z^2) + z^{-1}X_1^{(p)}(z^2). \qquad (4.18)$$

For D = 2, the two polynomials of the polyphase representation have the form

$$X_0^{(p)}(z^2) = x[0]z^{-0} + x[2]z^{-2} + x[4]z^{-4} + \cdots \qquad (4.19)$$

$$X_1^{(p)}(z^2) = x[1]z^{-0} + x[3]z^{-2} + x[5]z^{-4} + \cdots \qquad (4.20)$$

so the transform X(z) can be written as

$$X(z) = z^{-0}\big(x[0]z^{-0} + x[2]z^{-2} + x[4]z^{-4} + \cdots\big) + z^{-1}\big(x[1]z^{-0} + x[3]z^{-2} + x[5]z^{-4} + \cdots\big).$$

Generalizing (4.15), the polyphase representation of an FIR filter h[n], for any integer D > 1 and writing n = mD + k, can be expressed as

$$H(z) = \sum_{k=0}^{D-1}\sum_{m=-\infty}^{\infty} h[mD+k]\,z^{-(mD+k)} = \sum_{k=0}^{D-1} z^{-k}H_k^{(p)}(z^D) \qquad (4.21)$$
indicated as the type-I polyphase representation,¹ where the subphase filters are defined as

$$H_k^{(p)}(z) = \sum_{m=-\infty}^{\infty} h_k^{(p)}[m]\,z^{-m}, \quad\text{for } k = 0, 1, \ldots, D-1 \qquad (4.22)$$

with

$$h_k^{(p)}[m] \triangleq h[mD+k], \quad k\text{th subphase filter}. \qquad (4.23)$$

Observe that, from the z-transform of the above, the subphase filter $h_k^{(p)}[m]$ can be interpreted as the output of a circuit consisting of a shift of k samples ($z^k$) followed by a decimator D, as depicted in Fig. 4.7. The term $H_k^{(p)}(z^D)$ that appears in (4.21), of the form

$$H_k^{(p)}(z^D) = \sum_{m=-\infty}^{\infty} h[mD+k]\,z^{-mD}, \qquad k = 0, 1, \ldots, D-1 \qquad (4.24)$$

is defined as the kth polyphase component of the TF H(z). From Eq. (4.22), generalizing the interpretation in Fig. 4.7, the polyphase decomposition of order D can be interpreted as shown in Fig. 4.8. Sometimes, the M polynomials of the polyphase representation are collected as the elements of a vector

$$\mathbf{x}^{(p)}(z) = \big[\, X_0^{(p)}(z) \;\; z^{-1}X_1^{(p)}(z) \;\cdots\; z^{-(M-1)}X_{M-1}^{(p)}(z) \,\big]^T. \qquad (4.25)$$
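The decomposition (4.21)–(4.23) can be verified by splitting a filter into its subphase components and reassembling it (an illustrative sketch with our own function names):

```python
import numpy as np

def polyphase_split(h, D):
    """Type-I subphase filters h_k^(p)[m] = h[mD + k], k = 0, ..., D-1."""
    return [h[k::D] for k in range(D)]

def polyphase_reassemble(subphase, D):
    """Rebuild h[n] from Eq. (4.21): upsample each h_k by D, delay by k, sum."""
    n = max(len(s) for s in subphase) * D
    h = np.zeros(n)
    for k, s in enumerate(subphase):
        h[k : k + D * len(s) : D] = s    # z^{-k} delay and D-fold upsampling
    return h

h = np.arange(12.0)
assert np.array_equal(polyphase_reassemble(polyphase_split(h, 3), 3), h)
```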
Regarding the spectrum, we observe that $X_d(z^D)$ in (4.21) can be expressed in terms of the modulation components, confirming the expressions (4.6)–(4.8). In fact, for $z = (z')^{1/D}$ and $F_D^k = e^{-j2\pi k/D}$, it results

Fig. 4.7 Circuit representation of the kth subphase filter

Fig. 4.8 Graphical interpretation of a D-order polyphase decomposition

¹ Other types of polyphase representations are defined in the literature. See, e.g., [6, 8].
Fig. 4.9 Representation of the modulation components of the signal X (ejω ) for D = 4. This representation contains all the modulation spectral components and in multirate systems allows a unitary description related only to the sampling rate of the input sequence
$$X_d(z') = \frac{1}{D}\sum_{k=0}^{D-1} X\big((z')^{1/D}F_D^k\big) = \frac{1}{D}\sum_{k=0}^{D-1} X_k^{(m)}(z) \qquad (4.26)$$
where $X_k^{(m)}(z) \triangleq X(zF_D^k)$ denotes the modulation component of X(z), i.e., the multiplication of the independent variable z by the number $F_D^k = e^{-j2\pi k/D}$, so we have the substitution $z \to zF_D^k$. Similarly to (4.25), the modulated representation of X(z), for a given decimation or interpolation rate D, is defined as [6]

$$\mathbf{x}^{(m)}(z) = \big[\, X_0^{(m)}(z) \;\; X_1^{(m)}(z) \;\cdots\; X_{D-1}^{(m)}(z) \,\big]^T = \big[\, X(z) \;\; X(zF_D) \;\cdots\; X(zF_D^{D-1}) \,\big]^T \qquad (4.27)$$

which, in practice, consists of a representation of X(z) in which all its D modulation components $X_k^{(m)}(z)$, for k = 0, 1, …, D − 1, are shown. Figure 4.9, for example, shows the modulated representation of X(z) for D = 4. Observe, see (4.7), that for D = 2 the modulated representation is

$$\mathbf{x}^{(m)}(z) = \big[\, X(z) \;\; X(-z) \,\big]^T \qquad (4.28)$$

where $X_0^{(m)}(z) = X(z)$ is the baseband component while, for $F_2^1 = \cos(\pi) = -1$, the component $X_1^{(m)}(z) = X(-z)$ is the one translated around ω = π.

Remark 4.1 The modulated representation of X(z) contains, by definition, all the spectral modulation components. In multirate systems, this representation is very important because it allows a unitary description, related to the sampling rate of the input sequence and the subblocks that compose the signal processing system. In multirate systems, the various blocks work at different frequencies, usually multiples or submultiples of the input sampling frequency, which is taken as a reference.
Fig. 4.10 Noble identities for commuting downsamplers and upsamplers with filtering. Downsamplers and upsamplers are linear, time-varying operators; the noble identity is not a trivial property because in time-varying systems the order of the operations is very important
Remark 4.2 From the previous development, the reader can easily verify that the polyphase components are obtained from the modulation components as

$$\mathbf{x}^{(p)}(z) = \frac{1}{D}\,\mathbf{F}\,\mathbf{x}^{(m)}(z) \qquad (4.29)$$
where F is the DFT matrix defined in Chap. 2, Sect. 2.5.1.
4.2.4 Noble Identity of Multirate Circuits

Figure 4.10 shows the so-called noble identity of multirate circuits, which allows commuting a decimator/interpolator with a sparse TF, i.e., one that can be expressed as a function of $z^D$. Note that decimators and interpolators are time-varying operators and, therefore, the order of the processing is very important. Note also that the noble identity holds for all memoryless operators, such as sum and multiplication.
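As an illustration, the identity can be verified numerically. The following sketch (plain Python; the filter and the test signal values are arbitrary choices, not taken from the text) checks that filtering by the sparse $H(z^D)$ and then downsampling by $D$ gives the same output as downsampling first and then filtering by $H(z)$:

```python
# Noble identity: H(z^D) followed by downsampling by D
# is equivalent to downsampling by D followed by H(z).

def conv(a, b):
    """Full linear convolution of two coefficient lists."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def downsample(x, D):
    """Keep every D-th sample."""
    return x[::D]

D = 3
h = [1.0, -0.5, 0.25]                  # arbitrary FIR H(z)
hD = [0.0] * ((len(h) - 1) * D + 1)    # sparse filter H(z^D)
for k, hk in enumerate(h):
    hD[k * D] = hk

x = [float((7 * n) % 5 - 2) for n in range(50)]  # arbitrary test signal

y1 = downsample(conv(x, hD), D)        # filter at the high rate, then decimate
y2 = conv(downsample(x, D), h)         # decimate, then filter at the low rate

L = min(len(y1), len(y2))
assert y1[:L] == y2[:L]
```

The equality is exact here because all coefficients are dyadic rationals; for general coefficients the two paths agree up to floating-point rounding.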
4.2.4.1 Decimation: Revisiting with Polyphase Components
Consider the decimation circuit of Fig. 4.11a in which the input sequence is first filtered by the anti-aliasing filter $h[n]$ and then decimated. By applying the noble identity it is possible to swap the two operations in a way that allows a computational gain in the realization of the filter. In fact, as shown in Fig. 4.11b, the filters are applied to a $D$ times smaller signal, saving a factor $D$ of multiplications in the convolution calculation. The outputs of the decimator bank in Fig. 4.11b are defined as polyphase signals. The decimator circuit is structured as a polyphase filter bank, where the $k$th polyphase signal $x_k^{(p)}[n]$ is filtered by the $k$th subphase filter $h_k^{(p)}[n]$ and the overall output is defined by the sum of the polyphase bank. The circuit in Fig. 4.11b can be interpreted as a length-$D$ input delay line that performs a series–parallel conversion. The $D$ signals for $k = 0, \ldots, D-1$ are decimated
Fig. 4.11 Circuit for decimating a signal: a the input sequence is first lowpass filtered and then decimated by a factor D. b the input sequence is first decimated by a factor D and then lowpass filtered by a bank of D filters: $h_0^{(p)}[m], h_1^{(p)}[m], \ldots, h_{D-1}^{(p)}[m]$
Fig. 4.12 Circular switching demultiplexer circuit with polyphase structure. For each sampling instant n the position switches sending the output from one of the D subphase filters
and then filtered with impulse response $h_k^{(p)}[n]$. Observe that the input delay line is equivalent to a circular switching demultiplexer, as shown in Fig. 4.12. Let $L_f = L_s D - 1$, with $D$ and $L_s$ integers, be the length of the anti-aliasing filter $h[n]$; the subphase filters are then of length $L_s$. The subphase filters and the subphase sequences (the inputs to the delay lines of the subphase filters) are defined as

$$h_k^{(p)}[n] \triangleq h[nD+k], \qquad x_k^{(p)}[n] \triangleq x[nD+k], \qquad k = 0, 1, \ldots, D-1. \qquad (4.30)$$
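The equivalence between Fig. 4.11a and b can be checked numerically. In the sketch below (plain Python; the filter and signal values are arbitrary, and the branch convention — branch $k$ fed by the input delayed by $k$ samples and then downsampled — is one possible realization of the commutator of Fig. 4.12), the sum of the $D$ subphase convolutions reproduces the filter-then-decimate output:

```python
# Polyphase decimation sketch: the sum of D subphase convolutions, each running
# at the low rate, equals lowpass filtering followed by decimation by D.

def conv(a, b):
    """Full linear convolution of two coefficient lists."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

D = 4
h = [0.5, 0.25, -0.25, 1.0, 0.75, -0.5, 0.25, 0.125]   # anti-aliasing FIR (arbitrary)
x = [float((3 * n) % 7 - 3) for n in range(64)]         # arbitrary test signal

# Reference: filter at the input rate, then keep every D-th output sample
y_ref = conv(x, h)[::D]

n_out = len(y_ref)
y_pp = [0.0] * n_out
for k in range(D):
    h_k = h[k::D]                  # subphase filter h_k[l] = h[l*D + k]
    x_k = []                       # branch input: x delayed by k samples, then decimated
    for l in range(n_out):
        i = l * D - k
        x_k.append(x[i] if 0 <= i < len(x) else 0.0)
    part = conv(x_k, h_k)
    for n in range(n_out):
        y_pp[n] += part[n]

assert y_pp == y_ref
```

Each branch convolution is $D$ times shorter and runs at the low rate, which is where the computational saving of the polyphase structure comes from.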
4.2.5 Fractional Sampling Ratio Frequency Conversion

Conversion between two arbitrary sample rates, including cases where the ratio is integer, rational, or irrational, is of central importance in many audio applications. In all the situations described, the conversion problem consists of determining a new signal sample placed between two samples of the original signal by a process of interpolation or extrapolation.
Fig. 4.13 Fractional-ratio sampling rate conversion scheme
4.2.5.1 Sampling Rate Conversion with Rational Ratio
In the case of a fractional $D/L$ decimation or interpolation factor, the result can be easily achieved by placing a decimator stage downstream of an interpolator stage, as shown in Fig. 4.13. For linearity, the two cascaded filters can be implemented as a single filter with cutoff frequency

$$\omega_c = \min\left(\frac{\pi}{L}, \frac{\pi}{D}\right) \qquad (4.31)$$

and a gain $G = L$. In the case where the conversion factor cannot be expressed exactly as a ratio of small integers, it is possible to determine an approximate solution. For example, if we want to change from a sampling frequency of 44.1 kHz to a sampling frequency of 48 kHz, we see that the ratio between the two frequencies is not an integer: $D/L = 44.1/48 = 0.91875$. The choice of the $L$ and $D$ factors can be made as

$$\min\left(\,\bigl|\tilde L - L\bigr|\,\right), \qquad \tilde L = D\,\frac{48}{44.1} \qquad (4.32)$$

where $L = \lfloor \tilde L \rfloor$, $\lfloor\cdot\rfloor$ denoting the integer part. In this case the solution is

$$L = 160, \quad D = 147, \quad \text{whereby} \quad \frac{160}{147} = \frac{48}{44.1}. \qquad (4.33)$$

However, the previous solution requires high interpolation/decimation factors and is reported only by way of example. As will be described later (see Sect. 5.5.6), in fact, ad hoc solutions are available for online frequency conversion between 48 and 44.1 kHz [30].

Remark 4.3 Note that the non-rational frequency conversion can be addressed as a general interpolation problem, which will be dealt with in Sect. 5.5.
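The exact conversion factors can be recovered with Python's standard `fractions` module; reducing $48000/44100$ to lowest terms gives $L = 160$ and $D = 147$:

```python
from fractions import Fraction

# Exact ratio between the two sampling rates: 48 kHz over 44.1 kHz
ratio = Fraction(48000, 44100)    # automatically reduced to lowest terms

L = ratio.numerator               # interpolation factor
D = ratio.denominator             # decimation factor
assert (L, D) == (160, 147)

# Interpolating by L and decimating by D converts 44.1 kHz exactly to 48 kHz
assert 44100 * L // D == 48000
```
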
4.3 Filter Banks for Audio Applications

Filter banks (FBs) are circuits consisting of lowpass, bandpass, and highpass filters, combined according to appropriate architectures, designed to decompose the spectrum of the input signal into a number of contiguous bands.
Fig. 4.14 General principle of a subband encoder implemented with a filter bank (FB)
The desired characteristics of such a spectral subband decomposition (SBD) vary depending on the context and, moreover, for any particular application, multiple methodologies are available. Therefore, it is necessary to consider:

i. whether to use FIR or IIR filters: IIR filters can lead to computationally very efficient structures, while FIR FBs offer great flexibility in both the design and the use of the bank itself;

ii. the type of SBD: the most commonly used subdivisions are the uniform one, in which the contiguous signal bands have the same width, and the octave subdivision, in which the width of contiguous signal bands doubles, as in constant-Q filter banks.

Other characteristics often desired or imposed in the FB design may concern the width of the filters' transition bands, the stopband attenuation, the ripple, the regularity, etc. The SBD, as shown in Fig. 4.14, is performed by a FB called the analysis filter bank, while the reconstruction is done through a so-called synthesis filter bank. Thus, a subband coding (SBC) is constituted by an analysis FB followed by a synthesis FB: the signal is partitioned into subbands, coded or processed according to the type of use, and then reconstructed through the synthesis bank.
4.3.1 Generalities on Filter Banks

In the philosophy of FB design, rather than analyzing the individual filters of the analysis and synthesis FBs separately, it is appropriate to consider the global SBC specification such as, for example, the acceptable level of aliasing, the group delay, and the reconstruction error. In other words, the specification is given in terms of the relationships between the input and output signals, regardless of the local characteristics of the analysis and synthesis filters. For example, it is possible to consider a FB called a perfect reconstruction, aliasing-free filter bank (PRFB). Sometimes it is possible to trade off computational efficiency, or other characteristics (such as group delay), against acceptable signal reconstruction errors. Thus, by imposing a certain quality index on the reconstructed signal, it is possible to have more degrees of freedom and control capabilities in the filter design.
Fig. 4.15 A subband coding (SBC) consists of an analysis FB followed by a synthesis FB: the signal is partitioned into subbands and encoded
A large class of FBs can be represented by the diagram in Fig. 4.15. The analysis bank decomposes the input signal into M subbands of identical width, each of which is decimated by a factor D. In the case where D = M , the bank is called maximally decimated or at critical sampling.
4.3.1.1 FB's Orthogonality and Orthonormality Conditions
From a formal point of view, a filter bank can be seen as a linear transformation applied to an input signal. This aspect allows the assimilation of FBs with other linear transformations (and vice versa), and the use of algebraic tools in the theoretical development. Therefore, in the design and use of filter banks, the conditions of orthogonality, normality, and bi-orthogonality with which the signals are decomposed and represented are very important. Now, before proceeding with the study of filter banks, we recall some fundamental concepts. Given a set of vectors $\{\mathbf{q}_i \in (\mathbb{R},\mathbb{C})^N\}$, $i = 1, \ldots, N$, the following definitions apply.

Definition 4.1 A set of vectors $\mathbf{q}_i$ is orthogonal if

$$\langle \mathbf{q}_i, \mathbf{q}_j \rangle = 0, \quad \text{or} \quad \mathbf{q}_i^H\mathbf{q}_j = 0, \qquad i \neq j.$$

Definition 4.2 A set of vectors $\mathbf{q}_i$ is orthonormal if $\mathbf{q}_i^H\mathbf{q}_j = \delta[i-j]$, where $\delta[i-j]$ or $\delta_{ij}$ represents the Kronecker symbol: $\delta_{ij} = 1$ for $i = j$; $\delta_{ij} = 0$ for $i \neq j$.

Definition 4.3 A matrix $\mathbf{Q} \in (\mathbb{R},\mathbb{C})^{N\times N}$ is orthonormal if

$$\mathbf{Q}^H\mathbf{Q} = \mathbf{Q}\mathbf{Q}^H = \mathbf{I}$$
observe, also, that in the case of orthonormality it holds that $\mathbf{Q}^{-1} = \mathbf{Q}^H$. Furthermore, a matrix for which $\mathbf{Q}^H\mathbf{Q} = \mathbf{Q}\mathbf{Q}^H$ holds is called a normal matrix.

Definition 4.4 Two non-square matrices $\mathbf{Q}$ and $\mathbf{P}$ are said to be bi-orthogonal if $\mathbf{Q}^H\mathbf{P} = \mathbf{P}^H\mathbf{Q} = \mathbf{I}$. In particular, the latter definition may be useful if one wants to design SBCs with non-symmetrical and reciprocal analysis and synthesis filter banks.
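These definitions are easy to check numerically. The sketch below (plain Python, no libraries) verifies Definition 4.3 for the real orthonormal matrix Q = (1/√2)[[1, 1], [1, −1]], for which the Hermitian transpose reduces to the ordinary transpose:

```python
import math

def matmul(A, B):
    """Product of two small real matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

s = 1.0 / math.sqrt(2.0)
Q = [[s,  s],
     [s, -s]]        # real orthonormal matrix (Q^H = Q^T here)

QtQ = matmul(transpose(Q), Q)

# Definition 4.3: Q^H Q = I; equivalently, the columns are orthonormal (Defs. 4.1-4.2)
for i in range(2):
    for j in range(2):
        target = 1.0 if i == j else 0.0
        assert abs(QtQ[i][j] - target) < 1e-12
```
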
4.3.2 Two-Channel Filter Banks

Without loss of generality, for simplicity, consider the two-channel SBC shown in Fig. 4.16. It is known that the two-channel SBC can be used to determine certain fundamental characteristics extensible to the $M$-channel case.
4.3.2.1 Perfect Reconstruction Conditions
For $M = 2$, the TFs of the analysis–synthesis FBs, i.e., the pairs $H_0(z)$, $G_0(z)$ and $H_1(z)$, $G_1(z)$, are, respectively, symmetrical lowpass and highpass filters with a cutoff frequency equal to $\pi/2$. To formally determine the so-called perfect reconstruction conditions (PRC), we consider the overall input–output TF. The cascade connection of the analysis and synthesis FBs, as shown in Fig. 4.16, is defined as a perfect reconstruction FB if the input–output relationship is a simple delay. Let $\hat x[n]$ be the reconstructed output sequence; we have $\hat x[n] = x[n - n_0]$, such that the overall TF, relative to the input signal sampling frequency, is equal to

$$T(z) = \frac{\hat X(z)}{X(z)} = z^{-n_0}. \qquad (4.34)$$
Through the modulated component representation, referred to the sampling frequency of the input sequence, the (4.34) can be expressed as
Fig. 4.16 Two-channel SBC with critical sample rate (M = D = 2)
$$\hat X(z) = \frac{1}{2}\left[\, G_0(z)\ \ G_1(z) \,\right]\begin{bmatrix} H_0(z) & H_0(-z)\\ H_1(z) & H_1(-z)\end{bmatrix}\begin{bmatrix} X(z)\\ X(-z)\end{bmatrix} \qquad (4.35)$$
which can be rewritten as the sum of two contributions $F_0(z)$ and $F_1(z)$

$$\hat X(z) = \tfrac{1}{2}\left[G_0(z)H_0(z) + G_1(z)H_1(z)\right]X(z) + \tfrac{1}{2}\left[G_0(z)H_0(-z) + G_1(z)H_1(-z)\right]X(-z) = F_0(z)X(z) + F_1(z)X(-z). \qquad (4.36)$$

Definition 4.5 Perfect reconstruction and aliasing-free conditions—Imposing on Eq. (4.36) the overall TF desired for a perfect reconstruction FB, the following expression holds

$$\frac{1}{2}\left[\, G_0(z)\ \ G_1(z) \,\right]\begin{bmatrix} H_0(z) & H_0(-z)\\ H_1(z) & H_1(-z)\end{bmatrix} = \left[\, z^{-n_0}\ \ 0 \,\right]. \qquad (4.37)$$
The $F_1(z)$ condition guarantees the cancelation of the aliasing components and results in

$$\tfrac{1}{2}\left[G_0(z)H_0(-z) + G_1(z)H_1(-z)\right] = 0 \qquad (4.38)$$
also referred to as the aliasing-free condition. The $F_0(z)$ condition guarantees perfect reconstruction, up to a delay,

$$\tfrac{1}{2}\left[G_0(z)H_0(z) + G_1(z)H_1(z)\right] = z^{-n_0} \qquad (4.39)$$

and is referred to as the perfect reconstruction condition or non-distortion condition. The analysis–synthesis banks must be chosen to satisfy, or at least approximate, the perfect reconstruction and aliasing-free conditions in Eqs. (4.38) and (4.39). In addition, a second essential characteristic is the frequency selectivity of the filters. Furthermore, in addition to the expression (4.34), the other modulated components must also be taken into account. Thus, solving for the component $\hat X(-z)$ we can write

$$\hat X(-z) = \frac{1}{2}\left[\, G_0(-z)\ \ G_1(-z) \,\right]\begin{bmatrix} H_0(z) & H_0(-z)\\ H_1(z) & H_1(-z)\end{bmatrix}\begin{bmatrix} X(z)\\ X(-z)\end{bmatrix}. \qquad (4.40)$$

Now, to determine global conditions, the expressions (4.35) and (4.40) can be aggregated as
$$\begin{bmatrix} \hat X(z)\\ \hat X(-z)\end{bmatrix} = \frac{1}{2}\begin{bmatrix} G_0(z) & G_1(z)\\ G_0(-z) & G_1(-z)\end{bmatrix}\begin{bmatrix} H_0(z) & H_0(-z)\\ H_1(z) & H_1(-z)\end{bmatrix}\begin{bmatrix} X(z)\\ X(-z)\end{bmatrix}. \qquad (4.41)$$
The latter expression provides a complete description of the two-channel bank in terms of the input and output modulation components.
4.3.2.2 FBs Modulation Component Conditions
Denoting the modulation component matrices in (4.41) as

$$\mathbf{G}_m(z) = \begin{bmatrix} G_0(z) & G_1(z)\\ G_0(-z) & G_1(-z)\end{bmatrix}, \quad \text{and} \quad \mathbf{H}_m^T(z) = \begin{bmatrix} H_0(z) & H_0(-z)\\ H_1(z) & H_1(-z)\end{bmatrix} \qquad (4.42)$$
and with $\hat{\mathbf{x}}^{(m)}(z) = \left[\hat X(z)\ \ \hat X(-z)\right]^T$, $\mathbf{x}^{(m)}(z) = \left[X(z)\ \ X(-z)\right]^T$, Eq. (4.41), in modulation components notation, can be rewritten in the compact form

$$\hat{\mathbf{x}}^{(m)}(z) = \frac{1}{2}\,\mathbf{G}_m(z)\,\mathbf{H}_m^T(z)\,\mathbf{x}^{(m)}(z). \qquad (4.43)$$
The perfect reconstruction conditions (4.38)–(4.39) turn out to be

$$\begin{bmatrix} \hat X(z)\\ \hat X(-z)\end{bmatrix} = \begin{bmatrix} z^{-n_0} & 0\\ 0 & (-z)^{-n_0}\end{bmatrix}\begin{bmatrix} X(z)\\ X(-z)\end{bmatrix} \qquad (4.44)$$
from which it is easy to derive the relationship between the matrices $\mathbf{G}_m(z)$ and $\mathbf{H}_m(z)$ which, assuming $n_0$ odd, is equal to

$$\mathbf{G}_m(z) = 2z^{-n_0}\begin{bmatrix} 1 & 0\\ 0 & (-1)^{-n_0}\end{bmatrix}\left[\mathbf{H}_m^T(z)\right]^{-1} = \frac{2z^{-n_0}}{\det\mathbf{H}_m(z)}\begin{bmatrix} H_1(-z) & -H_0(-z)\\ H_1(z) & -H_0(z)\end{bmatrix} \qquad (4.45)$$
and, writing the determinant as

$$\det\mathbf{H}_m(z) = H_0(z)H_1(-z) - H_0(-z)H_1(z) \qquad (4.46)$$
the link between the analysis and synthesis filters, for perfect reconstruction conditions, can be expressed as

$$G_0(z) = \frac{2z^{-n_0}}{\det\mathbf{H}_m(z)}\,H_1(-z), \quad \text{and} \quad G_1(z) = -\frac{2z^{-n_0}}{\det\mathbf{H}_m(z)}\,H_0(-z). \qquad (4.47)$$
The FB TFs can be realized with IIR or FIR filters. However, in many applications it is more appropriate to use FIR filters. Note that, even though $H_0(z)$ and $H_1(z)$ are of FIR type, from (4.47), due to the presence of the denominator, $G_0(z)$ and $G_1(z)$ are, in general, IIR filters.

Property 4.1 The condition for $G_0(z)$ and $G_1(z)$ to also be of FIR type is that the denominator be a pure delay

$$\det\mathbf{H}_m(z) = H_0(z)H_1(-z) - H_0(-z)H_1(z) = c\cdot z^{-k} \qquad (4.48)$$
in this case it applies that ([6], Eqs. (6.38) and (6.39))

$$G_0(z) = +\frac{2}{c}\,z^{-n_0+k}H_1(-z), \qquad G_1(z) = -\frac{2}{c}\,z^{-n_0+k}H_0(-z). \qquad (4.49)$$
Property 4.2 From Eqs. (4.38)–(4.39), the condition on the modulation component matrices can be written compactly as

$$\mathbf{T}_m(z) = \mathbf{H}_m(z)\,\mathbf{G}_m(z) = \begin{bmatrix} F(z) & 0\\ 0 & F(-z)\end{bmatrix}. \qquad (4.50)$$

Again, from the latter and (4.38), a necessary and sufficient condition for aliasing cancelation is $G_0(z)H_0(-z) + G_1(z)H_1(-z) = 0$.
4.3.2.3 Filter Bank Signal Decomposition: Time-Domain Analysis
Let us consider a two-channel analysis FB that produces a decimated output. As described in Chap. 2, Sect. 2.7.2, we can write the input–output relation in convolution operator form, i.e., $\mathbf{y}_i = \mathbf{H}_i\mathbf{x}$, $i = 0, 1$. In this case, since we need to decimate the signal, we can put the decimation operation directly into the convolution operator. The matrix $\mathbf{H}_i$ is constructed by translating the impulse responses by two samples from row to row (instead of one sample as in normal convolution). Indexing, for example, from $n = 0$, the input–output relation turns out to be of the form

$$\begin{bmatrix} y_i[0]\\ y_i[1]\\ y_i[2]\\ \vdots \end{bmatrix} = \begin{bmatrix} h_i[0] & 0 & 0 & 0 & \cdots\\ h_i[2] & h_i[1] & h_i[0] & 0 & \cdots\\ h_i[4] & h_i[3] & h_i[2] & h_i[1] & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix} x[0]\\ x[1]\\ x[2]\\ \vdots \end{bmatrix} \qquad (4.51)$$

where $h_i[m]$, $m = 0, \ldots, M-1$, is the impulse response of the $i$th channel.
In the same way, on the synthesis side, we define a matrix $\mathbf{G}_i$ similarly to $\mathbf{H}_i$, but with the filter coefficients $g_i[n]$ written in reverse order, such that the overall output of the analysis–synthesis bank turns out to be equal to

$$\hat{\mathbf{x}} = \left(\mathbf{G}_0\mathbf{H}_0 + \mathbf{G}_1\mathbf{H}_1\right)\mathbf{x}. \qquad (4.52)$$

In this case, the condition of perfect reconstruction turns out to be simply

$$\mathbf{G}_0\mathbf{H}_0 + \mathbf{G}_1\mathbf{H}_1 = \mathbf{I}. \qquad (4.53)$$
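For a concrete check of (4.53), the following sketch builds the decimated convolution matrices $\mathbf{H}_i$ and the synthesis matrices $\mathbf{G}_i = \mathbf{H}_i^T$ for the orthonormal Haar pair $h_0 = [1, 1]/\sqrt 2$, $h_1 = [1, -1]/\sqrt 2$ — a case in which boundary terms vanish; an inner-product row convention is assumed here — and verifies $\mathbf{G}_0\mathbf{H}_0 + \mathbf{G}_1\mathbf{H}_1 = \mathbf{I}$:

```python
import math

def dec_conv_matrix(h, n_in):
    """One channel's analysis operator: the filter translated by two samples
    per row (decimation by 2), inner-product convention."""
    rows = []
    for r in range(n_in // 2):
        row = [0.0] * n_in
        for m, hm in enumerate(h):
            row[2 * r + m] = hm
        rows.append(row)
    return rows

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(c) for c in zip(*A)]

s = 1.0 / math.sqrt(2.0)
h0 = [s,  s]          # lowpass Haar
h1 = [s, -s]          # highpass Haar

N = 8
H0 = dec_conv_matrix(h0, N)
H1 = dec_conv_matrix(h1, N)
G0, G1 = transpose(H0), transpose(H1)   # synthesis = transposed analysis

T = matmul(G0, H0)
U = matmul(G1, H1)
S = [[T[i][j] + U[i][j] for j in range(N)] for i in range(N)]

# Perfect reconstruction: G0 H0 + G1 H1 = I  (Eq. 4.53)
for i in range(N):
    for j in range(N):
        assert abs(S[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```
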
Now, in order to have a more compact notation, and to define global (algebraic) properties of the bank, Eq. (4.52) is rewritten in terms of a linear transformation, as $\hat{\mathbf{x}} = \mathbf{T}_a\mathbf{T}_s\mathbf{x}$, where the matrix $\mathbf{T}_a$, called the transfer matrix of the analysis bank, contains both impulse responses $\mathbf{h}_0$, $\mathbf{h}_1$. Similarly, $\mathbf{T}_s$, called the synthesis bank transfer matrix, contains the impulse responses $\mathbf{g}_0$, $\mathbf{g}_1$. The composition of the matrices $\mathbf{T}_a$ and $\mathbf{T}_s$ is such that the following equality is met

$$\mathbf{G}_0\mathbf{H}_0 + \mathbf{G}_1\mathbf{H}_1 = \mathbf{T}_a\mathbf{T}_s. \qquad (4.54)$$
Denoting by h0 , h1 , and g0 , g1 , the impulse responses vectors of the analysis and synthesis bank, the reader will be able to verify that the perfect reconstruction condition is met in the case where the matrices Ta , and Ts , are defined as ⎡
h0 [M − 1] · · ·
h0 [0]
0
···
h0 [0]
0
⎢ ⎢ h1 [M − 1] ⎢ ⎢ ⎢ 0 Ta = ⎢ ⎢ 0 ⎢ ⎢ .. ⎢ . ⎣ .. .
0 h0 [M − 1] · · · 0 h1 [M − 1] · · · ··· .. .
··· .. .
··· .. .
⎤ ··· ··· ··· . ⎥ · · · · · · .. ⎥ ⎥ .. ⎥ h0 [0] · · · . ⎥ ⎥ h1 [0] · · · · · · ⎥ ⎥ .. .. ⎥ ⎥ . . ··· ⎦ .. .. .. . . .
(4.55)
and for Ts we have that ⎡
g0 [0] · · · g0 [M − 1] 0
⎢ ⎢ g1 [0] ⎢ ⎢ ⎢ 0 T Ts = ⎢ ⎢ 0 ⎢ ⎢ . ⎢ .. ⎣ .. .
· · · g1 [M − 1] 0 0 0
g0 [0] g1 [0]
··· ···
··· .. .
··· .. .
··· .. .
⎤ ··· ··· . ⎥ ··· · · · .. ⎥ ⎥ .. ⎥ g0 [M − 1] · · · . ⎥ ⎥. g1 [M − 1] · · · · · · ⎥ ⎥ .. .. ⎥ ⎥ . . ··· ⎦ .. .. .. . . . ···
(4.56)
The condition of perfect reconstruction is $\mathbf{T}_a\mathbf{T}_s = \mathbf{T}_s\mathbf{T}_a = \mathbf{J}_a$, where $\mathbf{J}_a$ denotes the anti-diagonal unitary matrix. The matrices $\mathbf{T}_a$ and $\mathbf{T}_s$ thus form a bi-orthogonal pair. It follows that, in terms of the filter impulse responses, this results in

$$\left\langle h_i[2k-n],\, g_j[n-2l]\right\rangle = \delta[k-l]\,\delta[i-j], \qquad i, j = 0, 1. \qquad (4.57)$$

In the special case where $\mathbf{T}_s^T = \mathbf{T}_a$ holds, and hence $\mathbf{T}_a^T\mathbf{T}_a = \mathbf{J}_a$, the SBC is called an orthonormal and perfect reconstruction FB. In fact, the bi-orthogonal term comes from the fact that the identity condition is more relaxed and is satisfied only with the presence of the two matrices.
4.3.2.4 Quadrature Mirror Filters
A first simple solution, suggested in the literature in 1976 by Croisier et al. [2] (see also [6–13]), to achieve an aliasing-free and perfect reconstruction FB satisfying Eqs. (4.38) and (4.39), is the so-called quadrature mirror filter (QMF) FB. The starting point is a TF, called the lowpass prototype, with a cutoff frequency equal to $\pi/2$ which, for now, we assume is known a priori, determined according to an optimality criterion which will be reported later in this chapter. Regarding the aliasing-free condition, we can state the following theorem.

Theorem 4.1 Let $H(z)$ be the TF of the lowpass prototype, also called a half-band filter, realized with a FIR filter; the aliasing-free condition in Eq. (4.38) is verified if the following conditions are met

$$H_0(z) = H(z) \qquad (4.58)$$
$$H_1(z) = H(-z) \qquad (4.59)$$
$$G_0(z) = 2H(z) \qquad (4.60)$$
$$G_1(z) = -2H(-z) \qquad (4.61)$$
where the (optional) factor 2 in the synthesis FB is inserted to compensate for the factor 1/2 introduced by the decimation.

Proof Substituting (4.58)–(4.61) into Eq. (4.38), we have that $H(z)H(-z) - H(-z)H(z) \equiv 0$, so the aliasing components are canceled.

Remark 4.4 No anti-aliasing and anti-image filters are included in the decimation and interpolation banks. This result is surprising in that the conditions of the sampling theorem are violated in both the analysis and synthesis banks but satisfied overall. As might be expected, observe that the TFs $H_0(z)$, $H_1(z)$ and $G_0(z)$, $G_1(z)$ are complementary and symmetrical lowpass and highpass filters, respectively, with cutoff frequency equal to $\pi/2$. For example, typical frequency responses are shown in Fig. 4.17. As we shall see better below, the impulse responses of the lowpass and highpass analysis and synthesis FB filters are derived from a given lowpass prototype: $h = [0.27844,\ 0.73454,\ 0.58191,\ -0.05046,\ -0.19487,\ 0.03547,\ 0.04692,\ -0.01778]$ (tabulated in Vaidyanathan [9]). In addition, from Theorem 4.1 the following property holds.

Property 4.3 In the time domain, the conditions (4.58)–(4.61) are equivalent to

$$h_0[n] = h[n] \qquad (4.62)$$
$$h_1[n] = (-1)^n h[n] \qquad (4.63)$$
$$g_0[n] = 2h[n] \qquad (4.64)$$
$$g_1[n] = -2(-1)^n h[n]. \qquad (4.65)$$
The frequency response and cutoff frequency of the lowpass prototype can be adjusted as a function of the slope (roll-off rate) in order to have a maximally flat overall bank response. This issue is discussed later in Sect. 4.3.3.

Remark 4.5 Note that Eq. (4.63) indicates that the highpass response $h_1$ is obtained by changing the sign of the odd samples of $h_0$. In terms of the z-transform, this is equivalent to mirroring the zeros of $H_0(z)$ about the imaginary axis and conjugating them. In fact, writing the $i$th zero of $H_0(z)$ as $z_i^{H_0} = \alpha_i \pm j\beta_i$, by (4.59) we have $z \rightarrow -z$, so the zeros of $H_1(z)$ are $z_i^{H_1} = -\alpha_i \mp j\beta_i$, i.e., simply sign-changed and conjugated. For example, Fig. 4.18 shows the pole/zero diagrams of the TFs with the frequency responses of Fig. 4.17. Theorem 4.1 guarantees the aliasing-free condition. For perfect reconstruction, we also need to verify the condition in Eq. (4.39). Hence, we can state the following theorem.
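The alias cancelation of Theorem 4.1 holds for any prototype, as the following sketch verifies using the 8-tap prototype tabulated above: the bank is built with (4.62)–(4.65) and the alias term $F_1(z) = \tfrac{1}{2}[G_0(z)H_0(-z) + G_1(z)H_1(-z)]$ is computed by polynomial multiplication. (The distortion term, by contrast, only approximates a pure delay for this prototype.)

```python
# QMF bank built from (4.62)-(4.65): the alias term
# F1(z) = (1/2)[G0(z)H0(-z) + G1(z)H1(-z)] cancels for any prototype h[n].

def conv(a, b):
    """Full linear convolution (polynomial product) of coefficient lists."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def alternate(a):
    """Coefficients of A(-z): change the sign of the odd-indexed samples."""
    return [ai * (-1) ** n for n, ai in enumerate(a)]

# Lowpass prototype tabulated in the text (Vaidyanathan)
h = [0.27844, 0.73454, 0.58191, -0.05046, -0.19487, 0.03547, 0.04692, -0.01778]

h0 = h[:]                        # (4.62)
h1 = alternate(h)                # (4.63)
g0 = [2 * c for c in h]          # (4.64)
g1 = [-2 * c for c in h1]        # (4.65)

F1 = [0.5 * (a + b)
      for a, b in zip(conv(g0, alternate(h0)), conv(g1, alternate(h1)))]

# Aliasing is canceled (up to floating-point rounding) for any prototype
assert max(abs(c) for c in F1) < 1e-12
```
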
Fig. 4.17 Typical pair of half-band analysis filters for a given prototype lowpass impulse response h[n] tabulated in Vaidyanathan [9]. a symmetric amplitude response; b complementary in power (constant sum)
Fig. 4.18 Typical impulse responses of h0 and h1 half-band analysis filters and poles/zero diagram. The name quadrature mirror filter (QMF) comes from the observation that the zeros of H1 are mirrors and conjugates of the zeros of H0
Theorem 4.2 Given an $N$-length FIR filter $h_0[n]$, with $N$ even, a two-channel QMF FB has perfect reconstruction if

$$H_0^2(z) - H_0^2(-z) = z^{-(N-1)}. \qquad (4.66)$$
Proof In order to have perfect reconstruction we need to satisfy (4.38)–(4.39). For $H_0(z) = H(z)$, substituting the QMF conditions (4.59)–(4.61) into (4.38), we get $\tfrac{1}{2}\left[2H(z)H(-z) - 2H(-z)H(z)\right] = 0$, as in Theorem 4.1. Now, substituting the QMF conditions into (4.39), we obtain

$$H(z)H(z) - H(-z)H(-z) = H^2(z) - H^2(-z) = z^{-n_0} \qquad (4.67)$$

where $n_0$ is assumed odd (see Sect. 4.3.2.2). Moreover, the left-hand side of Eq. (4.66) has odd symmetry; therefore, $H(z)$ must be chosen of even length to avoid distortion. Let $N$ be the (even) length of the filter; in (4.67) the overall delay of the analysis–synthesis pair is odd: $n_0 = N - 1$.

The relation (4.66) explains the name quadrature mirror filters. If $H(z)$ is lowpass, then $H(-z)$ is highpass. The frequency response $H(e^{j(\omega\pm\pi)})$ is just the mirror image of $H(e^{j\omega})$ with respect to the axis of symmetry. Furthermore, the filters are also complementary in power; in fact, from (4.66) for $z = e^{j\omega}$ we have that

$$\left|H(e^{j\omega})\right|^2 + \left|H(e^{j(\omega\pm\pi)})\right|^2 = 1. \qquad (4.68)$$
To achieve perfect reconstruction, the lowpass FIR prototype must fully satisfy the condition (4.66). Many design techniques are available in the literature that can determine the $h[n]$ coefficients so as to well approximate the condition (4.66) and/or (4.68) (see, for example, [1, 2]). Later in the chapter, in Sect. 4.3.4.1, the Johnston technique [3] for the optimal design of two-channel QMF banks is introduced.
4.3.2.5 Example: Haar's FB
The simplest symmetric two-channel filter bank that can intuitively be devised is the one referring to the moving-average filter (see Sect. 2.7.4.1), so the lowpass prototype results in $\mathbf{h}_0 = \tfrac{1}{\sqrt 2}[1\ \ 1]^T$. For $N = 2$, in fact, the moving-average filter is precisely half-band. Indeed, $H(e^{j\omega})\big|_{\omega=\pi/2} = a(1 + e^{-j\omega})\big|_{\omega=\pi/2} = a(1-j)$, for which $|H(e^{j\pi/2})| = a\sqrt 2$. Applying (4.63) we have that $\mathbf{h}_1 = \tfrac{1}{\sqrt 2}[1\ \ {-1}]^T$ and, as could intuitively be assumed, the highpass is realized as a simple first difference. The respective TFs are therefore equal to

$$H_0(z) = \tfrac{1}{\sqrt 2}\left(1 + z^{-1}\right), \quad \text{and} \quad H_1(z) = \tfrac{1}{\sqrt 2}\left(1 - z^{-1}\right). \qquad (4.69)$$
It then results in $H_0(z) + H_1(z) = \tfrac{1}{\sqrt 2}\left[(1 + z^{-1}) + (1 - z^{-1})\right] = \sqrt 2$. The respective magnitude responses are

$$\left|H_0(e^{j\omega})\right| = \sqrt{1 + \cos\omega}, \quad \text{and} \quad \left|H_1(e^{j\omega})\right| = \sqrt{1 - \cos\omega} \qquad (4.70)$$
which result in perfectly complementary half-bands. The plots of the magnitudes (4.70) are shown in Fig. 4.20. Note that, due to its form, this FB is assimilated to the Haar unitary transform (see Sect. 2.5.4). Similarly, for the synthesis FB, from Eqs. (4.64) and (4.65) we have $\mathbf{g}_0 = [1\ \ 1]$, and $\mathbf{g}_1 = [-1\ \ 1]$. Observe, also, that for (4.55) and (4.56) the perfect reconstruction condition is verified as:

$$\mathbf{T}_a = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0\\ -1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & -1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & -1 & 1 \end{bmatrix} \qquad (4.71)$$

thus, $\mathbf{T}_s = \mathbf{T}_a^T$, and $\mathbf{T}_a^T\mathbf{T}_a = \mathbf{J}_a$, where $\mathbf{J}_a$ is the anti-diagonal unitary matrix.

Remark 4.6 Note that the condition (4.66), in the case of FIR filters, cannot be satisfied exactly in closed form except in the case of Haar filters. In this case the lowpass prototype is a moving-average filter with only two coefficients, for which $H(z) = a(1 + z^{-1})$. Thus, Eq. (4.66) becomes

$$\tfrac{1}{2}\left(a + 2az^{-1} + az^{-2}\right) - \tfrac{1}{2}\left(a - 2az^{-1} + az^{-2}\right) = 2az^{-1} \qquad (4.72)$$
which exactly satisfies (4.66) and the conditions (4.58)–(4.61) or, in the time domain, (4.62)–(4.65). Indeed, for the Haar filter we have that $\mathbf{T}_s = \mathbf{T}_a^T$, and therefore $\mathbf{T}_a^T\mathbf{T}_a = \mathbf{J}_a$. This result implies that the Haar filter bank is orthonormal and perfect reconstruction.
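At the signal level, the Haar analysis–synthesis pair reconstructs any even-length input up to floating-point precision. A minimal sketch (the sample values below are arbitrary):

```python
import math

s = 1.0 / math.sqrt(2.0)

def haar_analysis(x):
    """Two-channel Haar analysis FB with critical decimation (D = 2)."""
    y0 = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    y1 = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return y0, y1

def haar_synthesis(y0, y1):
    """Matching synthesis FB: upsample, filter, and sum (butterfly form)."""
    x = []
    for a, b in zip(y0, y1):
        x.append(s * (a + b))
        x.append(s * (a - b))
    return x

x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.25, 4.0]
x_hat = haar_synthesis(*haar_analysis(x))

assert all(abs(u - v) < 1e-12 for u, v in zip(x, x_hat))
```
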
4.3.2.6 Conjugate Quadrature Filters
A second solution for the design of a perfect reconstruction two-channel FB, similar to the one suggested above, is to choose the highpass prototype in conjugate form. In fact, the following property applies.

Property 4.4 Given a lowpass prototype $H(z)$ such that

$$H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 1 \qquad (4.73)$$

with $H_0(z) = H(z)$, the highpass filter is chosen with conjugate symmetry with respect to $H(z)$, i.e., $H_1(z) = z^{-(N-1)}H(-z^{-1})$. In that case, denoting
by $N$ the length of the lowpass FIR prototype, the conditions (4.58)–(4.61) can be rewritten as

$$H_0(z) = H(z) \qquad (4.74)$$
$$H_1(z) = z^{-(N-1)}H(-z^{-1}) \qquad (4.75)$$
$$G_0(z) = 2z^{-(N-1)}H(z^{-1}) \qquad (4.76)$$
$$G_1(z) = 2H(-z). \qquad (4.77)$$
Proof For the proof, it is sufficient to check whether (4.76)–(4.77) satisfy the conditions (4.47). For the determinant (4.46) we have that

$$\det\mathbf{H}_m(z) = -z^{-(N-1)}\left[H(z)H(z^{-1}) + H(-z)H(-z^{-1})\right] = -z^{-(N-1)} \qquad (4.78)$$

and substituting the latter in (4.47) we have

$$G_0(z) = -2H_1(-z) = 2z^{-(N-1)}H(z^{-1}), \qquad G_1(z) = 2H_0(-z) = 2H(-z) \qquad (4.79)$$

whereby, for $H_0(z) = H(z)$ and $H_1(z) = z^{-(N-1)}H(-z^{-1})$, the conditions of perfect reconstruction are precisely the (4.79), which coincide with (4.76) and (4.77).

Property 4.5 From (4.74)–(4.77), in the time domain the following conditions apply

$$h_0[n] = h[n] \qquad (4.80)$$
$$h_1[n] = (-1)^{(N-1-n)}\,h[N-1-n] \qquad (4.81)$$
$$g_0[n] = 2h[N-1-n] \qquad (4.82)$$
$$g_1[n] = 2(-1)^n h[n]. \qquad (4.83)$$
The above, while only sufficient conditions, are widely used in practice. Moreover, being derived analytically with a weak approach based on the $z^{-n_0}$ component alone, they do not produce aliasing-free banks under all conditions.

Remark 4.7 Comparing Figs. 4.18 and 4.19, starting from the same lowpass prototype $h[n]$, we can observe that in the QMF case the zeros of $H_1(z)$ are a mirrored version, with respect to the symmetry axis, of those of $H(z)$; in the CQF case, however, they are a mirrored, reciprocal version. Indeed, denoting the $i$th zero of $H_0(z)$ as $z_i^{H_0} = \alpha_i \pm j\beta_i$, by (4.75) we have $z \rightarrow -z^{-1}$, so the zeros of $H_1(z)$ are

$$z_i^{H_1} = -\frac{1}{\alpha_i^2 + \beta_i^2}\left(\alpha_i \mp j\beta_i\right),$$

i.e., sign-changed and reciprocal. This is due to the fact that, in the CQF case, the condition (4.81), in addition to the alternating sign change (equivalent to a rotation equal to $\pi$ on the unit circle), also imposes the time-reversal of the impulse response. In fact, in the time domain, the synthesis filters are equivalent to the time-reversed version of the analysis filters (with, in addition, a gain that compensates for the decimation). In real-world situations, sometimes,
Fig. 4.19 Pair of half-band CQF analysis filters, the same as in Fig. 4.18. a Impulse response h0 ; b impulse response h1 alternate signs and time-reversal of h0 ; c poles/zero diagram
instead of including a gain factor equal to 2 in the synthesis bank, the gain is distributed between the analysis and synthesis filters, each equal to $\sqrt 2$. The resulting bank is referred to as conjugate quadrature filters (CQF).

Property 4.6 With the perfect FB reconstruction condition in Eq. (4.37), and taking into account the conditions (4.80)–(4.83), for the first modulation component of the bank we obtain the equality

$$H^2(z) + H^2(-z) = T(z) = z^{-n_0}. \qquad (4.84)$$

Suppose that $H(z)$ is realized with a linear phase FIR filter of even length, for which it is possible to write

$$H(z) = |H(z)|\cdot z^{-(N-1)/2} \qquad (4.85)$$

where $|H(z)|$ is the zero-phase amplitude response and the term $z^{-(N-1)/2}$ represents its time translation. Substituting the previous expression into (4.84), one can easily verify that the overall bank delay turns out to be $n_0 = N - 1$. Considering (4.84), neglecting the delay (or phase), it turns out that $|H(z)|^2 + |H(-z)|^2 = 1$. Hence, for $z = e^{j\omega}$, the condition in Eq. (4.68), identical to that for QMF filters, also holds, and for (4.80)–(4.83) it holds that

$$\left|H_0(e^{j\omega})\right|^2 + \left|H_0(e^{j(\omega\pm\pi)})\right|^2 = 1, \qquad \left|H_0(e^{j\omega})\right|^2 + \left|H_1(e^{j\omega})\right|^2 = 1 \qquad (4.86)$$

and
$$\left|G_0(e^{j\omega})\right|^2 + \left|G_0(e^{j(\omega\pm\pi)})\right|^2 = 2, \qquad \left|G_0(e^{j\omega})\right|^2 + \left|G_1(e^{j\omega})\right|^2 = 2. \qquad (4.87)$$
Fig. 4.20 Impulse and frequency response of the Haar CQF filter
The frequency responses of the filter pairs $H(z)$, $H(-z)$, and consequently, for (4.80)–(4.82), of the filters $H_0(z)$, $H_1(z)$ and $G_0(z)$, $G_1(z)$, shown schematically in Fig. 4.17, are power complementary; i.e., the sum of their powers is constant, as expressed by (4.68).

Example 4.1 Let us consider the previously discussed Haar filter, in which the normalized lowpass prototype is equal to $\mathbf{h} = \tfrac{1}{\sqrt 2}[1\ \ 1]^T$. The characteristics of the filters $h_0[n]$, $h_1[n]$, $g_0[n]$, and $g_1[n]$ are shown in Fig. 4.20 where, note, the filters are of CQF type and, further, for $\omega \rightarrow 0$ the filter gain is equal to $\sqrt 2$.
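The CQF construction (4.80)–(4.83) can be verified by polynomial arithmetic. The sketch below uses a Haar prototype rescaled to $h = [1/2,\ 1/2]$ — an assumption made here so that (4.73) holds exactly with unit constant, rather than the $1/\sqrt 2$ normalization of the example — and checks that the distortion term is a pure delay and the alias term vanishes:

```python
def conv(a, b):
    """Full linear convolution (polynomial product) of coefficient lists."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def alternate(a):
    """Coefficients of A(-z)."""
    return [ai * (-1) ** n for n, ai in enumerate(a)]

h = [0.5, 0.5]          # prototype scaled so that (4.73) holds exactly (assumption)
N = len(h)

h0 = h[:]                                                      # (4.80)
h1 = [(-1) ** (N - 1 - n) * h[N - 1 - n] for n in range(N)]    # (4.81)
g0 = [2 * h[N - 1 - n] for n in range(N)]                      # (4.82)
g1 = [2 * (-1) ** n * h[n] for n in range(N)]                  # (4.83)

# Distortion term F0(z) = (1/2)[G0 H0 + G1 H1]: a pure delay z^{-(N-1)}
F0 = [0.5 * (a + b) for a, b in zip(conv(g0, h0), conv(g1, h1))]
assert F0 == [0.0, 1.0, 0.0]

# Alias term F1(z) = (1/2)[G0(z)H0(-z) + G1(z)H1(-z)]: identically zero
F1 = [0.5 * (a + b) for a, b in zip(conv(g0, alternate(h0)), conv(g1, alternate(h1)))]
assert F1 == [0.0, 0.0, 0.0]
```

With these dyadic coefficients both checks are exact, not merely within rounding tolerance.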
4.3.2.7 Half-Band Lowpass Prototype Properties
The QMF and CQF FBs both have a lowpass prototype, or reference half-band filter, i.e., with normalized cutoff angular frequency $\pi/2$. Let us see some conditions related to this filter for a correct design of the bank. Denote by $T(z)$ the zero-phase power transfer function, defined as ([6], Eq. 6.51)

$$T(z) = H(z)H(z^{-1}) \qquad (4.88)$$

which on the unit circle becomes

$$T(e^{j\omega}) = H(e^{j\omega})H(e^{-j\omega}) = \left|H(e^{j\omega})\right|^2 \geq 0, \qquad \forall\omega. \qquad (4.89)$$

The condition (4.73) can then be written as

$$T(z) + T(-z) = 1. \qquad (4.90)$$

Thus, the power transfer function must be a half-band filter. However, as discussed earlier, this condition alone is not sufficient.
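As a numerical illustration of (4.88)–(4.90), the sketch below uses the classical maximally flat half-band coefficients (an illustrative choice, not a design from this section): the zero-phase $T(z)$ has null coefficients at the even lags $n = \pm 2$, center value $1/2$, and therefore satisfies $T(z) + T(-z) = 1$ exactly:

```python
from fractions import Fraction as F

# Zero-phase half-band power TF T(z): coefficients t[n] for n = -3..3
# (classical maximally flat half-band coefficients, used only as an illustration)
t = [F(-1, 32), F(0), F(9, 32), F(1, 2), F(9, 32), F(0), F(-1, 32)]
center = len(t) // 2          # index of lag n = 0

# Half-band property: t[0] = 1/2, and t[n] = 0 for even lags n != 0
assert t[center] == F(1, 2)
assert all(t[center + n] == 0 for n in (-2, 2))

# T(z) + T(-z): the coefficient at lag n picks up the factor (1 + (-1)^n)
s = [t[i] * (1 + (-1) ** ((i - center) % 2)) for i in range(len(t))]
assert s == [0, 0, 0, 1, 0, 0, 0]   # i.e., T(z) + T(-z) = 1
```

Using exact rationals (`fractions.Fraction`) makes the half-band identity hold with no rounding at all.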
Finally, observe that, as for the QMF and CQF filters, substituting (4.89) into (4.90), the condition (4.68) is always valid: $|H(e^{j\omega})|^2 + |H(e^{j(\omega\pm\pi)})|^2 = 1$; i.e., $H(z)$, $H(-z)$, and thus $H_0(z)$, $H_1(z)$, $G_0(z)$ and $G_1(z)$, must be power complementary.

Remark 4.8 Note that a linear phase FIR filter cannot verify the exact condition in Eq. (4.68), except in two specific cases of little practical importance (such as Haar's FB in Sect. 4.3.2.5), and the condition must be approximated numerically.
4.3.3 Filter Bank Design

In the following, some procedures for the design of orthogonal FBs are presented. The first is based on spectral factorization, and the second is based on lattice structures. In addition, design methods with a time-domain approach are presented.
4.3.3.1 FB Design by Spectral Factorization
The QMF and CQF conditions are sufficient conditions for which, in principle, various modes can be defined for the design of perfect reconstruction FBs. A technique that allows for multiple degrees of freedom in determining the SBC filters is what is known as spectral factorization, proposed by Smith and Barnwell [5]. The method, derived directly from the properties of the half-band lowpass prototype in Sect. 4.3.2.7 and the conditions (4.38) and (4.39), allows a global formulation for synthesizing the entire bank rather than the individual filters. We define the product of the filters $H_0(z)$ and $G_0(z)$ as

$$P_0(z) = G_0(z)H_0(z) \qquad (4.91)$$
then, for alternating signal Theorem 4.1, it is easy to prove that − P0 (−z) = G 1 (z)H1 (z)
(4.92)
such that the perfect reconstruction condition in Eq. (4.39) can be written as

F0(z) = G0(z)H0(z) + G1(z)H1(z) = P0(z) − P0(−z) = 2z^(−n0).
(4.93)
A consequence of (4.93) is that the zeros of P0(z) appear in pairs and that the factorization of P0(z) into its components can be done in many ways. That is, it is possible to partition the group delay of the SBC in non-identical ways between the analysis and synthesis banks; furthermore, the individual filters need not necessarily be linear phase. A solution to this problem, introduced in [5], consists in observing that:
4.3 Filter Banks for Audio Applications
Fig. 4.21 Pair of complementary half-band filters: a complementary in amplitude; b complementary in power; c typical impulse response of the zero-phase half-band filter
• the product H0(z)G0(z) = P0(z) is a lowpass filter;
• the product H1(z)G1(z) = −P0(−z) is the corresponding frequency-translated highpass filter;
• the filters P0(z) and −P0(−z) constitute a lowpass/highpass pair with cutoff frequency π/2, complementary in amplitude, i.e., a half-band pair, as illustrated in Fig. 4.21a.

In this way the filters H0(z), H1(z) and G0(z), G1(z), regardless of the type of factorization chosen, constitute a lowpass/highpass pair of half-band filters complementary in power, as shown in Fig. 4.21b. In the literature, many methodologies are available for the design of lowpass filters, such as windowing methods (Kaiser, Hamming, etc.) or equiripple methodologies such as the Parks–McClellan algorithm (see [6]). Half-band lowpass, zero-phase FIR filters have the property of a null impulse response at the points n = ±2, ±4, ±6, …, as, for example, the one illustrated in Fig. 4.21c. Once P0(z) is determined with the chosen methodology, it can subsequently be factored as the product P0(z) = G0(z)H0(z) ≡ H0(z)H1(−z).

Remark 4.9 The decomposition of P0(z) into the product H0(z)H1(−z) is not unique and, generally, the best choice is such that H1(−z) = H0(z⁻¹); that is, the roots of H1(−z) are the reciprocals of those of H0(z). The result of the described procedure is that the FIR filters, of equal length, are complementary in power (their amplitude responses intersect at 1/√2, i.e., −3 dB) and form an FB basis. H1(z) therefore corresponds to the time-reversed, frequency-translated version of H0(z), so that

|H0(e^jω)| = |H1(−e^jω)|.
(4.94)
In the literature, many optimal techniques are available, according to various criteria, for the determination of the coefficients h1[n] and h0[n]; for further information, see [5–9].

Property 4.7 The filter specification is given in terms of the global response p[n]. Consider the impulse response of an ideal half-band filter multiplied by a symmetric window
p[n] = w[n] · sin(πn/2)/(πn/2),  n = −2L + 1, …, 2L − 1.
Since p[2n] = δ[n] we have that P(z) + P(−z) = 2.
(4.95)
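Property 4.7 can be checked numerically with a short sketch; the window choice (Hamming) and the length parameter L below are illustrative assumptions, not values from the text.

```python
import numpy as np

# Numerical check of Property 4.7: a zero-phase half-band filter built by
# windowing the ideal response sin(pi n/2)/(pi n/2) has p[2n] = delta[n],
# which is equivalent to P(z) + P(-z) = 2.
L = 8
n = np.arange(-2 * L + 1, 2 * L)        # n = -2L+1, ..., 2L-1
w = np.hamming(len(n))                  # any symmetric window works here
p = w * np.sinc(n / 2)                  # np.sinc(x) = sin(pi x)/(pi x)
p = p / p[n == 0][0]                    # normalize so that p[0] = 1

# All even-indexed taps except n = 0 must vanish
even_nonzero = p[(n % 2 == 0) & (n != 0)]
print(np.allclose(even_nonzero, 0.0))   # True: half-band property holds
```

The even taps are exact zeros because the windowed sinc vanishes at the even integers, regardless of the symmetric window used.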
Daubechies Minimum Phase Wavelet Filters

An example of a spectral factorization that satisfies (4.95) is to synthesize P(z) so that it has a large number of zeros at ω = π:

P(z) = (1 + z⁻¹)^k (1 + z)^k R(z).
(4.96)
To satisfy (4.95), the TF R(z) is characterized by: (i) the symmetry constraint R(z) = R(z⁻¹); (ii) positivity on the unit circle, R(e^jω) ≥ 0; (iii) powers z^j with j ∈ [−k + 1, k − 1]. The solution of this optimization problem, keeping only the zeros inside the unit circle, yields a family of minimum-phase filters G0(z), known as Daubechies filters [18].

Example 4.2 Consider the Daubechies filter of Eq. (4.96) for k = 2, that is, a filter of length four

P(z) = G0(z)G0(z⁻¹) = (1 + z⁻¹)²(1 + z)²R(z)   (4.97)

Since P(z) + P(−z) = 2, all even-power coefficients of P(z) are zero and the coefficient of z⁰ equals 1. For a filter of length N = 4, the highest degree of the polynomial is z⁻³, which implies an R(z) of the symmetric form R(z) = az¹ + bz⁰ + az⁻¹. Substituting the latter into (4.97) and developing the product, we have

P(z) = az³ + (4a + b)z² + (7a + 4b)z + (8a + 6b) + (7a + 4b)z⁻¹ + (4a + b)z⁻² + az⁻³.

By equating the coefficients of the even powers to zero and the coefficient of z⁰ to one, we can write the relationships
4a + b = 0,  8a + 6b = 1

with solution a = −1/16 and b = 1/4. Factoring as R(z) = K_R(a₀ + a₁z⁻¹)(b₀ + b₁z), we get
R(z) = −(1/16)z + 1/4 − (1/16)z⁻¹ = [1/(4√2)]² [(1 + √3) + (1 − √3)z⁻¹] × [(1 + √3) + (1 − √3)z]

of which we need to take only the factor with its zero inside the unit circle, (1 + √3) + (1 − √3)z⁻¹. Combining with (4.97), we obtain a FIR lowpass TF equal to ([18], Eq. (3.42))

G0(z) = (1/(4√2)) (1 + z⁻¹)² [(1 + √3) + (1 − √3)z⁻¹]
      = (1/(4√2)) [(1 + √3) + (3 + √3)z⁻¹ + (3 − √3)z⁻² + (1 − √3)z⁻³]   (4.98)

characterized by a pair of zeros at z = −1. The impulse and amplitude responses of the resulting FB are shown in Fig. 4.22.

Remark 4.10 Note that the FIR filter g0 = (1/(4√2))[1 + √3, 3 + √3, 3 − √3, 1 − √3]ᵀ, obtained in the above exercise and known as the Daubechies filter of length 4 (denoted D2, or db2 as in Matlab), is very important because, as we will see later, it is the starting point for defining a very important class of functions: the compactly supported wavelets.
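The db2 filter of Eq. (4.98) can be verified numerically: its autocorrelation gives the half-band product filter P(z), and the alternating-flip highpass companion is orthogonal to it. This is a quick check, not part of the book's derivation.

```python
import numpy as np

# db2 lowpass from Eq. (4.98)
s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

# P(z) = G0(z) G0(z^-1)  <->  the autocorrelation of g0
r = np.correlate(g0, g0, mode="full")   # lags -3 ... 3; r[3] is lag 0

print(np.isclose(r[3], 1.0))            # lag 0: half-band center tap
print(np.isclose(r[1], 0.0), np.isclose(r[5], 0.0))   # lags -2, +2 vanish

# Companion highpass by alternating flip: g1[n] = (-1)^n g0[N-1-n]
g1 = ((-1) ** np.arange(4)) * g0[::-1]
print(np.isclose(np.dot(g0, g1), 0.0))  # orthogonal lowpass/highpass pair
```

The vanishing even-lag autocorrelation terms are exactly the half-band condition P(z) + P(−z) = 2 verified in Example 4.2.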
4.3.4 Lowpass Prototype Design

In the literature, many filter design techniques are available that can determine the coefficients h[n] so as to finely approximate the QMF/CQF condition.
Fig. 4.22 Daubechies filters. Impulse and amplitude responses of the two-channel orthogonal Daubechies FB, derived from Eq. (4.98) [18]
Fig. 4.23 Amplitude response of two-channel orthogonal filters in Table 4.1. a Smith and Barnwell [5]; b Vaidyanathan and Hoang [9]; c Daubechies db4 [18]

Table 4.1 Lowpass prototype coefficients of length 8 determined by various spectral factorization methods

n   Smith and Barnwell   Daubechies    Vaidyanathan and Hoang
0   −0.10739700          −0.01059740   −0.01778800
1   −0.03380010           0.03288301    0.04692520
2    0.50625500           0.03084138    0.03547370
3    0.78751500          −0.18703481   −0.19487100
4    0.31665300          −0.02798376   −0.05046140
5   −0.08890390           0.63088076    0.58191000
6   −0.01553230           0.71484657    0.73454200
7    0.04935260           0.23037781    0.27844300
For example (see Table 4.1), Johnston [3] and Jain–Crochiere [4] determined sets of coefficients of the impulse responses of QMF filters that are tabulated and available in many articles and texts [3, 9]. The design of the lowpass prototype starts from the assumption of a maximally flat response in the passband region, without the need for an equiripple-type response (Fig. 4.23).
4.3.4.1 Lowpass Prototype Design by Johnston's Method
In Johnston’s method [3], the design of the lowpass prototype assumes a maximally flat response in the bandpass region without the need for an equiripple response. Let h ∈ RN ×1 , for convenience of notation we define M = N /2 and the following M -length sequences w[n] = he [2n] , v[n] = ho [2n + 1]
n = 0, . . . , M − 1.
As the odd-indexed elements of h_e[n] (and the even-indexed elements of h_o[n]) are zero, w[n] and v[n] may be considered as the packed sequences of the even and odd parts, respectively. Moreover, as v[n] = w[M − 1 − n], the vector w ∈ R^(M×1) can be considered representative of all the coefficients of the analysis–synthesis FB. The cost function to be optimized to obtain a frequency response H(e^jω) can be defined as the sum of two contributions

J(w) = J_ER(w) + αJ_SB(w),  subject to the constraint wᵀw = 1,

where:
• the term J_ER(w) takes into account the reconstruction error, i.e., the condition that imposes the complementarity of the FB in power and amplitude over the whole band; this term corresponds to the energy of the ripple of the overall response of the bank;
• the term J_SB(w) takes into account the error in the stopband of the lowpass filter;
• the constraint wᵀw = 1 takes into account the necessary and sufficient condition for perfect reconstruction, i.e., that the overall impulse response of the FB is a delayed delta function.

It turns out then
J_ER(w) = Σ_{ω=0}^{π} ( |H(e^jω)|² + |H(e^{j(ω−π)})|² − 1 )²

J_SB(w) = Σ_{ω=ω_SB}^{π} |H(e^jω)|²,  ω_SB = π/2 + δπ.
For small δ, the frequency ω_SB is as close as possible to π/2. The weight α in the cost function determines the amount of stopband attenuation energy. The coefficients of the prototype TF H(e^jω) are chosen by minimizing the function
Fig. 4.24 Lowpass prototype and the complementary half-band filters: a lowpass prototype and its power-complementary frequency response; b impulse response of the zero-phase QMF half-band filter
w ∴ arg min_{α, w∈R^M} [J_ER(w) + αJ_SB(w)]  s.t.  wᵀw − 1 = 0.   (4.99)
The previous expression can be minimized through conventional optimization methods (e.g., with Lagrange multipliers) or with global optimization algorithms that do not involve the calculation of derivatives; algorithms such as Tabu search, simulated annealing, or genetic algorithms can be used in this regard. Note that the coefficient α is usually set a priori to a value close to 1. Let f_sb be the normalized stopband frequency and Δf the transition band; since, as for the QMF filter, the default cutoff is f_c = 0.25, we have 0.25 < f_sb < 0.5 and, of course, 0 < Δf < 0.25 (i.e., Δf = f_sb − f_c). For example, Fig. 4.24 shows the frequency response of the lowpass prototype that minimizes the cost function (4.99) (and the generated QMF responses), with the procedure described in [1], for N = 64, f_sb = 0.3 and α = 1.

Remark 4.11 Note that the perfect reconstruction solution does not always exist. In the literature, several methods for the design of the lowpass prototype are available, operating both in the time and frequency domains. Such solutions, omitted here for brevity, generally take into account the overall characteristics of the FB. For example, it is possible to consider the overall group delay as a parameter of the optimization algorithm; in other words, one can impose a certain low delay and determine the optimal solution for that delay, or consider other critical aspects related to the specific audio application. For more details, see [2–13].
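A minimal numerical sketch of Johnston's criterion (4.99) follows; it is not the original design program. The frequency-grid density, the penalty weight used for the wᵀw = 1 constraint, and the BFGS optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

N = 16                                   # prototype length (even symmetry)
M = N // 2                               # free coefficients: packed half w
alpha, f_sb = 1.0, 0.30                  # CF weight, normalized stopband edge
om = np.linspace(0.0, np.pi, 256)        # frequency grid
om_sb = om[om >= 2 * np.pi * f_sb]       # stopband portion of the grid

def amp(w, grid):
    h = np.concatenate([w, w[::-1]])     # h[n] = h[N-1-n]: linear phase
    E = np.exp(-1j * np.outer(grid, np.arange(N)))
    return np.abs(E @ h)

def cost(w):
    j_er = np.mean((amp(w, om) ** 2 + amp(w, om - np.pi) ** 2 - 1.0) ** 2)
    j_sb = np.mean(amp(w, om_sb) ** 2)
    return j_er + alpha * j_sb + 10.0 * (w @ w - 1.0) ** 2  # soft constraint

w0 = np.full(M, 1.0 / np.sqrt(M))        # feasible start: w^T w = 1
res = minimize(cost, w0, method="BFGS")
print(res.fun < cost(w0))                # the CF decreased
```

The soft quadratic penalty replaces the explicit Lagrange-multiplier treatment of the constraint mentioned in the text; a constrained solver (e.g., SLSQP) would be an equivalent choice.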
4.3.4.2 FB Synthesis in the Time Domain
One of the most flexible methodologies for solving the synthesis problem of the PRQMF FB is the time-domain formulation proposed in 1992 by Nayebi et al. [7]. This formulation has allowed the discovery of new classes of FBs, such as those with minimum delay, variable delay, and time-variant behavior.
With time-domain synthesis, it is possible to design any type of linear FB, optimizing according to the parameter (or combination of parameters) most convenient for the application. This synthesis technique can be formulated by expressing the exact reconstruction condition directly in the time domain in a suitable matrix form. Let us consider an FB with M critically sampled channels. The whole analysis–synthesis system, given the decimation operation, can be seen as consisting of M different time-invariant TFs. It is possible, therefore, to consider the exact reconstruction condition for each of the subsystems of the structure. In other words, a pulse δ[n − i] placed at the input of the entire system must appear at the output unchanged at the instant n = i + n0, with i = 0, 1, 2, …, M − 1, where n0 represents the system group delay. Then, it is possible to write an overdetermined system of linear equations of the type AS = B, where A represents the matrix of the coefficients of the analysis FB; B represents the desired response, that is, all zeros except the appropriately delayed impulse δ[n − n0]; and S is defined as an unknown matrix of the FB coefficients. Particular linear combinations of the coefficients of the analysis and synthesis filters can be determined at the different instants corresponding to the different impulses placed at the input of the system. The idea is to construct the matrices A, S, and B in such a way as to completely describe all M TFs of the bank. Given the decimation/interpolation rate D, and denoting by M the number of channels of the bank and by N the length of the filters, with L = N/D (i.e., N = LD), we define the matrices P, Q ∈ R^(M×N) from the impulse responses of the analysis and synthesis banks, i.e.,

P = ⎡ h0[0]     h0[1]     ···  h0[N−1]   ⎤
    ⎢ h1[0]     h1[1]     ···  h1[N−1]   ⎥
    ⎢ ⋮         ⋮         ⋱    ⋮         ⎥
    ⎣ hM−1[0]   hM−1[1]   ···  hM−1[N−1] ⎦    (4.100)

and

Q = ⎡ g0[0]     g0[1]     ···  g0[N−1]   ⎤
    ⎢ g1[0]     g1[1]     ···  g1[N−1]   ⎥
    ⎢ ⋮         ⋮         ⋱    ⋮         ⎥
    ⎣ gM−1[0]   gM−1[1]   ···  gM−1[N−1] ⎦    (4.101)
For the development of the method, we define the submatrices Pi ∈ R^(M×D) ⊂ P and Qi ∈ R^(M×D) ⊂ Q, for i = 0, 1, …, L − 1, such that the linear equation system is defined as
⎡ P0ᵀ    0     ···  0    ⎤ ⎡ Q0   ⎤   ⎡ 0  ⎤
⎢ P1ᵀ    P0ᵀ   ···  ⋮    ⎥ ⎢ Q1   ⎥   ⎢ ⋮  ⎥
⎢ ⋮      ⋮     ⋱    0    ⎥ ⎢ ⋮    ⎥   ⎢ 0  ⎥
⎢ PL−1ᵀ  PL−2ᵀ ···  P0ᵀ  ⎥ ⎢ Qi   ⎥ = ⎢ JD ⎥    (4.102)
⎢ 0      PL−1ᵀ ···  P1ᵀ  ⎥ ⎢ ⋮    ⎥   ⎢ 0  ⎥
⎢ ⋮      ⋮     ⋱    ⋮    ⎥ ⎢ QL−2 ⎥   ⎢ ⋮  ⎥
⎣ 0      0     ···  PL−1ᵀ⎦ ⎣ QL−1 ⎦   ⎣ 0  ⎦
           A                  S          B
The first M columns of the matrix A are the transpose of the coefficient matrix of the analysis filter bank P, followed by L − 1 blocks of null matrices 0_(D×M); hence A ∈ R^((2L−1)D×LM), S ∈ R^(LM×D), B ∈ R^((2L−1)D×D), and the matrix JD is defined as a scaled unitary anti-diagonal matrix

JD = (1/D) ⎡ 0 ··· 0 1 ⎤
           ⎢ 0 ··· 1 0 ⎥
           ⎢ ⋮  ⋰  ⋮ ⋮ ⎥
           ⎣ 1 ··· 0 0 ⎦    (4.103)
The location of the matrix JD within the matrix B is, therefore, a specification that depends on the desired group delay. In the case where JD is located at the center of B, as in (4.102), the group delay of the analysis–synthesis system corresponds to the length of the impulse response of the FIR filter and is equal to N − 1. This condition is not the only possible one: different time conditions can be expressed with this methodology. The condition (4.102) is, in fact, necessary and sufficient for the bank to have perfect reconstruction properties [7]. To better understand this condition, consider the simple case where M = D = 1: the condition (4.102) becomes h1[n] ∗ g1[n] = δ[n − n0]. Equation (4.102) represents an overdetermined linear system, and the least-squares approximate solution turns out to be computable as

S = (AᵀA)⁻¹AᵀB   (4.104)

where the matrix (AᵀA)⁻¹Aᵀ is the pseudo-inverse of A. The solution of AS = B approximates the optimal solution B̂ for a given analysis FB and an N − 1 delay. For this problem, many approximation algorithms can be found in the literature; for example, it is possible to determine the solution recursively using gradient descent methods. The cost function (CF) to be minimized can contain the frequency specifications relating to the FB responses. For example, in the case of a two-channel FB we can write
ε = ‖B̂ − B‖²_F + ∫₀^(ωp) [1 − |H1(e^jω)|²] dω + ∫_(ωs)^(π) |H0(e^jω)|² dω   (4.105)
where ε represents the error function, i.e., the CF to be minimized. Note that the term ‖B̂ − B‖²_F represents the squared Frobenius norm. Note also that the previous expression simultaneously reduces the global error and the error in the stopband of the analysis FB. Moreover, the cost function can also contain other types of constraints (e.g., the width of the transition bands, constraints on the step response, etc.), and other optimization criteria can be used as well (e.g., the L1 norm, the L∞ norm, …).

Remark 4.12 Note that the perfect reconstruction solution does not always exist. The time-domain techniques, in particular the formulation in Eq. (4.105), allow one to find a suboptimal solution subject to specific constraints. One of the advantages of the time-domain method is its flexibility: with this technique, for example, it is possible to consider the overall group delay as a parameter of the optimization algorithm. In other words, it is possible to impose a certain delay and determine the optimal solution for that delay.
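The construction of A and B and the pseudo-inverse solution (4.104) can be sketched for the two-channel case. The row-by-row assembly below is an illustrative equivalent of the block layout in (4.102), and the db2 pair is used as the analysis bank only as a test case.

```python
import numpy as np

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
h1 = ((-1) ** np.arange(4)) * h0[::-1]      # alternating-flip highpass
Hset = [h0, h1]
N, D, n0 = 4, 2, 3                          # filter length, decimation, delay N-1

# PR condition in the time domain: for each input phase i and output
# instant n,  sum_k sum_m h_k[Dm - i] g_k[n - Dm] = delta[n - i - n0]
rows, rhs = [], []
for i in range(D):
    for n in range(2 * N - 1):
        row = np.zeros(2 * N)
        for k in range(2):
            for m in range(N):              # decimated instants
                j = n - D * m               # synthesis tap index
                l = D * m - i               # analysis tap index
                if 0 <= j < N and 0 <= l < N:
                    row[k * N + j] += Hset[k][l]
        rows.append(row)
        rhs.append(1.0 if n == i + n0 else 0.0)
A, b = np.array(rows), np.array(rhs)

g, *_ = np.linalg.lstsq(A, b, rcond=None)   # pseudo-inverse solution (4.104)
print(np.allclose(A @ g, b))                # consistent: PR synthesis exists
```

For an orthogonal analysis pair such as db2, the least-squares solution reproduces a perfect-reconstruction synthesis bank with delay N − 1; for a generic analysis bank, the same code returns the suboptimal approximation discussed in Remark 4.12.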
4.3.5 Cosine-Modulated Pseudo-QMF FIR Filter Banks

The modulation of a lowpass prototype impulse response has been widely used for the generation of M-channel FBs since the 1980s. These banks do not achieve perfect reconstruction like the PRQMFs previously described, and for this reason they are often called pseudo-QMF (PQMF) banks. Despite the non-perfect reconstruction, cosine-modulated FBs are widely used in audio signal coding by virtue of the following properties, which are very attractive for the audio field:
1. design simplicity: in practice, only a lowpass prototype is defined;
2. the entire analysis–synthesis structure presents a linear phase and therefore a constant group delay;
3. possibility of implementation with very fast block algorithms;
4. low complexity (a filter plus a modulator);
5. critical sampling.

Let ωk be the central frequency of each bandpass filter. The coefficients of the cosine-modulated FB are generated from a lowpass prototype h[n], of order N − 1, with cutoff frequency π/(2M), translated in frequency by multiplying by a cosine of frequency

ωk = π(2k + 1)/(2M),  k = 0, 1, …, M − 1
(4.106)
For example, from the prototype h[n] it is possible to generate two complex filters with translations equal to ω0 = π/(2M) and −ω0:

h0⁽⁺⁾[n] = h[n]·e^(jω0 n),  h0⁽⁻⁾[n] = h[n]·e^(−jω0 n).
(4.107)
In fact, adding the previous expressions, it is possible to obtain a real filter

h0[n] = h0⁽⁺⁾[n] + h0⁽⁻⁾[n] = h[n](e^(jω0 n) + e^(−jω0 n)) = 2h[n]·cos(ω0 n).
(4.108)
With a similar approach, it is possible to generate all the other impulse responses by translating by ±(2k + 1)π/(2M), so that we get

hk[n] = 2h[n]·cos((2k + 1)πn/(2M)),  k = 0, 1, …, M − 1.
(4.109)
The FB frequency responses generated in this way are illustrated in Fig. 4.25, from which it is possible to observe that the bank has aliasing components. In PQMF banks, the impulse response of the synthesis bank is simply obtained as

gk[n] = hk[N − 1 − n]
(4.110)
i.e., the response gk [n] satisfies the mirror image condition which guarantees the elimination of the phase distortions of the global TF of the bank.
4.3.5.1 Aliasing Compensation
Fig. 4.25 Cosine-modulated FIR FB: a frequency response of the lowpass prototype; b FB's response

The aliasing cancelation can be obtained by establishing precise relationships between the analysis and synthesis TFs Hk(z) and Gk(z), respectively. This phenomenon can be analyzed by expressing (4.108) in the z-domain with appropriate scale coefficients

Hk(z) = αk Hk⁽⁺⁾(z) + αk* Hk⁽⁻⁾(z)   (4.111)

and, as regards the synthesis bank,
Gk(z) = βk Gk⁽⁺⁾(z) + βk* Gk⁽⁻⁾(z),  k = 0, 1, …, M − 1.
(4.112)
A fundamental property of the pseudo-QMF cosine-modulated filter bank is that, by appropriately choosing the α and β parameters, the aliasing components can be mutually compensated; in this case the FB is "almost perfect reconstruction," or Perfect Reconstruction Pseudo-QMF (PR-Pseudo-QMF). For the exact derivation of the parameters α and β, we refer to the literature [9–13]. As the result of the optimization (see, e.g., [9]), the impulse responses of the analysis and synthesis FBs, hk[n] and gk[n], respectively, are generated by modulating the impulse response of the lowpass prototype h[n] with particular cosine functions defined as ([12], Eqs. (1) and (2))

hk[n] = 2h[n] cos[(π/M)(k + 1/2)(n − (N − 1)/2) + θk]
gk[n] = 2h[n] cos[(π/M)(k + 1/2)(n − (N − 1)/2) − θk]   (4.113)

for k = 0, 1, …, M − 1; n = 0, 1, …, N − 1, where

θk = (−1)^k π/4   (4.114)
so, it is easy to verify that gk[n] = hk[N − 1 − n] and that Gk(z) = z^(−(N−1))Hk(z⁻¹). With the impulse responses calculated with the previous expressions, the aliasing distortion is completely eliminated. The design procedure of the bank therefore consists only in the determination of the impulse response h[n] such that the amplitude distortion is minimal, as in Sect. 4.3.4. The prototype is designed in a way similar to Johnston's method [3], seen above for the two-channel case. To minimize the amplitude distortion and the aliasing due to the stopbands, the CFs can be defined as
J_ER(h) = ∫₀^(π/M) ( |H(e^jω)|² + |H(e^{j(ω−π/M)})|² − 1 )² dω

J_SB(h) = ∫_(π/(2M)+δπ)^(π) |H(e^jω)|² dω.
For small δ, the frequency ω_SB is close to π/(2M); so, even in this case, to cancel the aliasing the stopband frequency is slightly modified. The prototype TF coefficients can then be chosen by minimizing over α and h; that is,

α, h ∴ arg min_{α∈(0,1), h∈R^N} [αJ_ER(h) + (1 − α)J_SB(h)].
It should be noted that the previous CF defines a good compromise between the attenuation of the stopband and the aliasing cancelation. This operation is necessary to minimize the amplitude distortion through the CF J_ER(h) (see Fig. 4.26) and the aliasing error due to the stopbands through the CF J_SB(h). With the impulse responses calculated with the previous expressions, the aliasing and phase distortions are completely eliminated, and the design procedure of the bank consists only in the determination of the impulse response h[n] such that the amplitude distortion is minimal.

Remark 4.13 Observe that PQMF filter banks have played a very important role in the evolution of compressed coding methods based on perceptive models of the audio signal. In fact, the ISO standards IS11172-3 and IS13818-3 (better known as MPEG1 and MPEG2) employ a 32-channel PQMF filter bank in layers I–II [14, 15]. As shown in Fig. 4.27a, the standard ISO lowpass prototype has a length of N = 512 samples (a multiple of the number of channels: N = 16·M), which ensures an attenuation of the secondary lobes equal to 96 dB, while the output ripple (non-PR) is less than 0.7 dB. In layer III, a similar cosine-modulated bank is used with, in addition, the PR conditions, forming a hybrid architecture. The MPEG1 layer III algorithm, known as mp3, given its enormous diffusion due to the exchange of audio files on the Web, has now become one of the most widespread standards in audio reproduction. In addition, in digital satellite transmission (DBS/DSS) and in the European digital audio broadcasting standard (DAB), the standard used is MPEG1 layer I.

Fig. 4.26 Prototype lowpass frequency response and response composition for the first and second channels of the bank

Fig. 4.27 Cosine-modulated FIR uniform FB: a frequency response of the lowpass prototype; b MP3 FB's response implemented with Eq. 4.113
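A sketch of the modulation step of Eq. (4.113) follows. The windowed-sinc prototype from SciPy's firwin is only a stand-in for the optimized prototype of Sect. 4.3.4, and the channel count and length are illustrative, not the MPEG values.

```python
import numpy as np
from scipy.signal import firwin

M, N = 8, 128                               # channels, prototype length
h = firwin(N, 1.0 / (2 * M))                # lowpass, cutoff pi/(2M)
n = np.arange(N)
arg = np.pi / M * (np.arange(M)[:, None] + 0.5) * (n - (N - 1) / 2)
theta = ((-1) ** np.arange(M) * np.pi / 4)[:, None]   # theta_k = (-1)^k pi/4
hk = 2 * h * np.cos(arg + theta)            # analysis bank, Eq. (4.113)
gk = 2 * h * np.cos(arg - theta)            # synthesis bank, Eq. (4.113)

# Mirror-image relation g_k[n] = h_k[N-1-n] holds by construction
print(np.allclose(gk, hk[:, ::-1]))
```

Because the prototype is linear phase, the mirror relation of Eq. (4.110) follows automatically, which is exactly the property that removes the phase distortion of the global TF.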
4.3.6 Non-uniform Spacing Filter Banks

Non-uniform spacing FBs are used in various sectors and are extremely important in audio applications. Among these, an important class, already seen in the previously introduced equalizer circuits, consists of constant-Q filters, where the ratio between bandwidth and central frequency is constant (see Fig. 3.11). FBs with these features can be made by appropriately connecting two-channel, lowpass/highpass PRQMF banks of the type previously introduced.
4.3.6.1 Non-uniform Dyadic Filter Banks
A class of non-uniform spacing FBs, very interesting for audio applications, is that consisting of constant-Q filters, where the ratio between bandwidth and central frequency is constant. Banks with these characteristics can be created by appropriately connecting two-channel PRQMF banks of the type previously introduced. An important aspect of these structures is their close connection with the spectral transformations called wavelet transforms. Considering the two-channel case, the banks are constructed from a lowpass filter h0[n] and a highpass filter h1[n] connected as shown in Fig. 4.28. By recursively connecting the lowpass output to another FB of the same type, as indicated in Fig. 4.29a, the decomposition, called a dyadic tree, turns out to be equivalent to a constant-Q FB which, under certain conditions, is equivalent to the computation of the discrete wavelet transform (DWT). The synthesis FB structure, perfectly dual to the analysis structure of Fig. 4.29a, reconstructs the signal starting from its subband components; this bank, perfectly symmetric to the analysis structure, is shown in Fig. 4.29b and in Fig. 4.30. The overall analysis–synthesis scheme is shown in Fig. 4.31. This type of bank can be used for the implementation of octave (or fractional-octave) equalizers. The decimation and interpolation processes allow the use of relatively short FIR filters. In addition, the use of dyadic tree structures based on conjugate complementary symmetry filters ensures linear phase for each channel, and the lowpass prototype can be designed to minimize the aliasing phenomenon. A filter bank at critical sampling rate leads to output aliasing; to avoid it, a multi-complementary filter bank can be used, as indicated in [6]. Finally, observe that with the structure in Fig. 4.32 the aliasing cancelation condition is not verified for Ci ≠ Cj.

Fig. 4.28 Filter bank with two complementary channels used to define a dyadic FB: a symbols; b complementary frequency response of the bank

Fig. 4.29 Octave FB with dyadic tree structure. a Analysis FB. b Synthesis FB

Fig. 4.30 Two-channel synthesis FB and its symbolic representation

Fig. 4.31 Analysis–synthesis FB for signal reconstruction starting from the subbands
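The dyadic analysis tree of Fig. 4.29a can be sketched as below, using the db2 orthogonal pair; the recursion depth and the test signal are arbitrary choices. For an orthogonal pair, the subband energies sum exactly to the input energy.

```python
import numpy as np

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
h1 = ((-1) ** np.arange(4)) * h0[::-1]      # highpass companion

def split(x):
    lo = np.convolve(x, h0)[::2]            # lowpass filtering + decimation
    hi = np.convolve(x, h1)[::2]            # highpass filtering + decimation
    return lo, hi

def dyadic_analysis(x, levels):
    bands = []
    for _ in range(levels):                 # split the lowpass branch again
        x, hi = split(x)
        bands.append(hi)                    # one octave subband per level
    bands.append(x)                         # final lowpass approximation
    return bands

x = np.random.default_rng(0).standard_normal(256)
bands = dyadic_analysis(x, 3)
print([len(b) for b in bands])
# Orthogonal pair: the subband energies sum to the input energy
print(np.isclose(sum(np.sum(b * b) for b in bands), np.sum(x * x)))
```

Each additional level halves the rate of the remaining lowpass branch, which is what produces the constant-Q (octave) band spacing.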
4.3.6.2 Filter Banks and Perceptive Models: Gammatone Filters
The physical modeling of the hearing organ is a very complex subject; this study is beyond the scope of this book, and several philosophies and models for this purpose are available in the literature. In general terms, however, the models are made of several stages: the first performs a spectral decomposition of the signal; the second, generally nonlinear, represents a dynamic model of the hair cell.

Fig. 4.32 Example of subband linear phase equalizer

The concept of the so-called critical band is strongly related to the masking phenomenon. In fact, the modification of the audibility threshold due to sinusoidal tones or narrowband noise is intimately linked to the way in which the hearing organ performs the spectral analysis of the acoustic signal. The perceptual model, based on critical bands and masking thresholds, can be implemented through an FB with an appropriate band distribution. For example, Gammatone filters are characterized by the impulse response

g(t) = a t^(n−1) e^(−bt) cos(2π f0 t + ϕ),  t > 0
which represents an intermediate model between the physical and the psychoacoustic approaches. The response of these filters is asymmetric and represents an accurate approximation of the auditory channel. Figure 4.33 shows a Gammatone filter bank equally spaced on the equivalent rectangular bandwidth (ERB) scale.
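A single gammatone channel can be sketched directly from the impulse-response formula above. The Glasberg–Moore ERB expression and the 1.019 bandwidth factor used to set the decay rate b are standard values from the auditory-modeling literature, not taken from this text.

```python
import numpy as np

fs, f0 = 16000.0, 1000.0                    # sample rate, center frequency
order, phi, a = 4, 0.0, 1.0                 # filter order n, phase, gain
erb = 24.7 * (4.37 * f0 / 1000.0 + 1.0)     # ERB(f0) in Hz (Glasberg-Moore)
b = 2 * np.pi * 1.019 * erb                 # decay rate tied to the ERB

t = np.arange(int(0.05 * fs)) / fs          # 50 ms of impulse response
g = a * t ** (order - 1) * np.exp(-b * t) * np.cos(2 * np.pi * f0 * t + phi)
g = g / np.max(np.abs(g))                   # normalize the peak to 1

# The magnitude response should peak close to the center frequency f0
G = np.abs(np.fft.rfft(g, 8192))
f = np.fft.rfftfreq(8192, 1.0 / fs)
print(abs(f[np.argmax(G)] - f0) < 50.0)
```

Repeating this for a set of center frequencies equally spaced on the ERB scale yields a bank like the one in Fig. 4.33.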
Fig. 4.33 Gammatone filter bank is designed to model the human auditory system. Gammatone filter (up), and 32-channel filter bank (down)
4.4 Short-Time Frequency Analysis

The audio signal usually has non-stationary characteristics. In order to have an accurate frequency analysis, it is necessary to introduce a two-dimensional time–frequency representation,² which in continuous time can be indicated as X(t, jω) and in discrete time as X(n, e^jω) [36–39].
4.4.1 Time–Frequency Measurement Uncertainty

Given the time-varying nature of the spectrum, it is necessary to introduce appropriate transformations which, in the CT domain, can be indicated as

F_t{x(t)} = X(t, jω)
(4.115)
i.e., the spectrum is not constant but depends on the time instant t at which the measurement is made. However, in this kind of analysis, to avoid having to consider the indeterminacy in the measurement of frequency, it is convenient to think of the spectrum as stationary when measured over short intervals. The signal spectrum given by the expression (4.115) in the discrete-time domain, denoted X(n, e^jω), is a function of two variables: time, understood as the index of the time sample n, and angular frequency ω. As is usually done in the study of two-variable functions, one of the two variables is fixed at a time. By fixing the time variable n, we obtain the time-varying spectrum, also called the instantaneous frequency spectrum, defined as

F{x[n]}|_(n=const) = Xn(e^jω).
(4.116)
By fixing the frequency variable ω, we obtain an output in the time domain that corresponds to the signal around a certain frequency ωc , i.e., a bandpass signal F{x[n]}|ω=const = xωc [n].
(4.117)
In the above formulation, as illustrated in Fig. 4.34, with the notation Xn(e^jω) we want to highlight the spectral nature of the representation, while with the notation xωc[n] of (4.117) we indicate the temporal structure of the transformation. In Fig. 4.34 the non-stationary signal is represented on the time–frequency plane. This representation, reminiscent of musical notation on a staff, illustrates the various frequencies played over time.
² From now on in this chapter, for simplicity of representation, the angular frequency is indicated by ω in both the continuous-time and the discrete-time domains. In some texts, and in other chapters of this book, the continuous-time angular frequency is denoted by the variable Ω.
Fig. 4.34 Representation of a non-stationary signal in the time–frequency plane. On the left interpretation as filter bank outputs; on the bottom as sliding window of the short-time Fourier transform (STFT). The time–frequency tile is referred to as Heisenberg uncertainty rectangle or Gabor diagram of information ([23], p. 432)
Remark 4.14 In Fig. 4.34, we have highlighted the time–frequency tile, given by the product Δω · Δt, which determines an uncertainty in the measurement of instantaneous frequency or time. This uncertainty can be traced back to the Heisenberg uncertainty principle: if the signal is non-stationary, its frequency changes at each time instant, so the measurement should be made for each of these instants on very short signal windows or, in the limit, sample by sample. Frequency measurement, by definition, cannot be done by analyzing a single sample; a certain time interval Δt of the signal is necessary. Therefore, the measurement of instantaneous frequency has an inherent uncertainty due to the finite size of Δt, within which the signal may be non-stationary. On the other hand, measuring the frequency over a finite Δt yields an average frequency over that interval; this also results in uncertainty in time, because we do not know to what precise time instant the measured frequency corresponds.
4.4.2 The Discrete Short-Time Fourier Transform

A simple extension of the DTFT in Eq. (2.23) to non-stationary signals, known as the short-time Fourier transform (STFT), is one in which the operator (4.116), in discrete time, takes the form

X(n, e^jω) = Xn(e^jω) = Σ_(m=−∞)^(∞) h[n − m] x[m] e^(−jωm)   (4.118)
where h[n], as shown in Fig. 4.35, represents an analysis window³ of appropriate duration, usually normalized; multiplied by the signal, it selects a portion of it
³ The Gabor transform is very similar to the STFT, except that the window function h has the shape of a Gaussian (i.e., the normal distribution).
Fig. 4.35 Discrete-time short-time Fourier transform (DSTFT) is equivalent to a DTFT of a portion of the signal determined by an analysis window of appropriate duration. a Analysis window h[n]; b windowed signal or short-time sequence x[n, m]
(approximated as stationary for this portion) where the spectrum is evaluated. In other words, the discrete-time STFT (DSTFT) is equivalent to a DTFT over limited (or compact) support, i.e., a transformation of a frame of the signal limited by an analysis window of appropriate shape and duration. The DSTFT X(n, e^{jω}), defined by (4.118), turns out to be formally the DTFT of the signal x[n] multiplied by the analysis window h[n − m]; i.e.,

$$ x[n, m] = h[n-m]\,x[m]. \qquad (4.119) $$
Remark 4.15 Note that the analysis windows are the same as those already used in the FIR filter design (see Sect. 2.7.4.2, e.g., Eq. (2.91)). They are applied to each signal segment to mitigate the Gibbs phenomenon due to discontinuities resulting from abrupt truncation, i.e., segment selection by multiplication with a rectangular window.
4.4.2.1 Time–Frequency Duality: Short-Time Fourier Transform and Filter Banks
A different interpretation of the STFT, dual to the previous one, is possible by observing that (4.118) is linear and can be rewritten as the following convolution expression

$$ x_\omega[n] \equiv X(n, e^{j\omega}) = h[n] * \left( x[n]\,e^{-j\omega n} \right). \qquad (4.120) $$

Equation (4.120) reveals that x_ω[n] can be interpreted as the output of a linear filter with impulse response h[n], whose input is the signal modulated with a carrier e^{−jωn}, i.e., shifted down in frequency around a certain ω. In the case where the translation frequency is some specified ω_k, it is usual to add the subscript k as well; as shown in Fig. 4.36, we write

$$ x_{\omega_k}[n] = h[n] * \left( x[n]\,e^{-j\omega_k n} \right). $$
4.4 Short-Time Frequency Analysis
Fig. 4.36 STFT is equivalent to lowpass filtering of a frequency-shifted (or demodulated) DT signal
In this case, h[n] represents the impulse response of a lowpass filter, referred to as the analysis filter, that determines the bandwidth or frequency portion of the signal to be analyzed (Δω in Fig. 4.34). Equation (4.120) can also be written as

$$ X(n, e^{j\omega}) = x_{\omega_k}[n] = \left( h[n]\,e^{-j\omega_k n} \right) * x[n] = h_k[n] * x[n]. \qquad (4.121) $$
Differently, in this case, the structure takes the form of a filter bank, as illustrated in Fig. 4.37, where the signal x_{ω_k}[n] represents the output of a bandpass filter centered around the frequency ω_k

$$ x_{\omega_k}[n] = x[n] * h_k[n] = \sum_{m=0}^{L-1} h_k[m]\,x[n-m] \qquad (4.122) $$
With this second interpretation, h_k[n] = h[n] · e^{−jω_k n} indicates the impulse response of the filter centered around frequency ω_k. It is then usual to refer to the impulse response h[n] of the analysis filter as the filter bank's lowpass prototype.
4.4.2.2 Discrete STFT Implementation with Filter Bank
From the expressions (4.120) and (4.121), it is possible to determine two different ways to implement the filter bank. • In the first, as already illustrated in Fig. 4.36, the signal is multiplied by the complex exponential (demodulation) and then lowpass filtered with the unique impulse response h[n]. • In the second, illustrated in Fig. 4.37, it is the response of the prototype lowpass filter that is translated in frequency, thus giving rise to the bandpass filter bank.
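The equivalence between the two implementations can be checked numerically. The sketch below (Python/NumPy, not from the book; window length, center frequency, and test signal are arbitrary choices) builds one channel both ways. Note that the demodulated output of (4.120) equals the bandpass output obtained with the prototype modulated by e^{+jω_k n}, up to the carrier e^{−jω_k n}, a phase rotation that leaves the per-sample channel magnitude unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)              # arbitrary test signal
n = np.arange(len(x))
h = np.hanning(65)
h /= h.sum()                              # normalized lowpass prototype (analysis window)
wk = 2 * np.pi * 0.1                      # channel center frequency omega_k

# (a) demodulate, then lowpass filter with the prototype h[n], as in (4.120)
y_demod = np.convolve(x * np.exp(-1j * wk * n), h)[:len(x)]

# (b) bandpass filter with the modulated prototype h[n] e^{+j omega_k n}
m = np.arange(len(h))
hk = h * np.exp(1j * wk * m)
y_band = np.convolve(x, hk)[:len(x)]

# the two outputs coincide up to the carrier e^{-j omega_k n}
err = np.max(np.abs(y_demod - np.exp(-1j * wk * n) * y_band))
```

Since |e^{−jω_k n}| = 1, both structures yield the same channel envelope; they differ only in where the frequency translation is applied.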
Fig. 4.37 Interpretation of the STFT as a filter bank. The filters outputs represent the signal around frequency ωk
Fig. 4.38 Single channel filter bank implementation
Given the expression (4.120), making explicit the real and imaginary parts of the complex exponential (using Euler's formula), we have

$$ x_{\omega_k}[n] = h[n] * \left( x[n]\left[\cos(\omega_k n) - j\,\sin(\omega_k n)\right] \right) = a_{\omega_k}[n] + j\,b_{\omega_k}[n]. \qquad (4.123) $$
With the above expression, the single channel of the bank can be represented as shown in the left part, related to the analysis filter, in Fig. 4.38. The real and imaginary parts of the signal, a_{ω_k}[n] and b_{ω_k}[n], are called low-frequency components. Generally, before the synthesis phase, they are processed (decimated/interpolated, compressed, etc.).

Remark 4.16 Note that, in applications to speech and/or music signals, the processing is almost never done directly on the real and imaginary parts; it is usual to process the modulus $X_{\omega_k}[n] = \sqrt{a_{\omega_k}^2[n] + b_{\omega_k}^2[n]}$ and the phase $\theta_{\omega_k}[n] = \arctan\left(b_{\omega_k}[n]/a_{\omega_k}[n]\right)$. In order to eliminate the problems related to the calculation of the phase (cancelation of the denominator and multi-valuedness), its derivative $\dot\theta_{\omega_k}[n]$ is used, calculated as

$$ \dot\theta_{\omega_k}[n] = \frac{a_{\omega_k}[n]\,\dot b_{\omega_k}[n] - b_{\omega_k}[n]\,\dot a_{\omega_k}[n]}{a_{\omega_k}^2[n] + b_{\omega_k}^2[n]}. \qquad (4.124) $$

This last expression avoids the phase jump from −π to π that occurs in the direct calculation when θ < −π; the denominator, moreover, is equal to the square of the modulus and, except in the trivial case, is never null. The derivatives of the real and imaginary part sequences can be approximated by means of first differences, $\dot a_{\omega_k}[n] \approx a_{\omega_k}[n] - \lambda\, a_{\omega_k}[n-1]$ and $\dot b_{\omega_k}[n] \approx b_{\omega_k}[n] - \lambda\, b_{\omega_k}[n-1]$, with $0 < \lambda \le 1$.
4.4.2.3 Discrete STFT Implementation with FFT
In DASP applications it is usual to use the discrete STFT (DSTFT) defined as

$$ X_n(k) = \sum_{m=0}^{N-1} x[m]\,h[n-m]\,W_N^{mk}, \qquad k = 0, 1, \ldots, N-1 \qquad (4.125) $$
where W_N = e^{−j2π/N} and by X_n(k) we denote the sampled version (with N samples) of the time-variant spectrum (4.118), with k being the frequency index and n the time index. Equation (4.125) represents a single discrete Fourier transform (DFT) computed for a particular time index n. Notably, Eq. (4.125) can be computed with fast algorithms such as the FFT. An appropriate choice of the analysis window h[n] ensures that the sequence x[n] is reconstructible from the DSTFT through the synthesis formula defined as

$$ x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X_n(k)\,W_N^{-nk}, \qquad \text{for each } n. \qquad (4.126) $$
Although for sequence reconstruction it is necessary and sufficient to compute the previous expression, it is possible to interpret x[n] as the sum of the outputs of a contiguous bank of bandpass filters. This interpretation is derived by considering a set of impulse responses h_k[n] defined as

$$ h_k[n] = \frac{1}{N}\,h[n]\,W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (4.127) $$
where h[n] is the lowpass prototype of the filter bank. The output of a single filter of the bank is then

$$ x_k[n] = \sum_{r=-\infty}^{\infty} x[r]\,h_k[n-r] \qquad (4.128) $$

where in x_k[n], k is the frequency index and n is the time index. The output of the entire bank is therefore equal to the sum of the contributions at the various frequencies

$$ y[n] = \sum_{k=0}^{N-1} x_k[n]. \qquad (4.129) $$
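A direct numerical check of the analysis/synthesis pair (4.125)–(4.126) can be sketched as follows (Python/NumPy, illustrative values): with an analysis window satisfying h(0) = 1 (here a Gaussian, an arbitrary choice), the synthesis formula returns exactly x[n] at the analyzed time index.

```python
import numpy as np

N = 64
rng = np.random.default_rng(1)
x = rng.standard_normal(N)

def hwin(j):
    # Gaussian analysis window with h(0) = 1 (illustrative choice)
    return np.exp(-(np.asarray(j, dtype=float) / 12.0) ** 2)

n0 = 20                                        # analyzed time index
m = np.arange(N)
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(m, k) / N)   # W_N^{mk}

# analysis, Eq. (4.125): X_{n0}(k) = sum_m x[m] h[n0 - m] W_N^{mk}
Xn = (x * hwin(n0 - m)) @ W

# synthesis, Eq. (4.126): x[n0] = (1/N) sum_k X_{n0}(k) W_N^{-n0 k}
xhat = (Xn * np.exp(2j * np.pi * n0 * k / N)).sum() / N
```

Substituting (4.125) into (4.126) shows why: the exponentials collapse to a Kronecker delta, leaving x[n₀]·h(0).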
4.4.3 Nonparametric Signal Spectral Representations

The DSTFT calculated with a succession of FFTs gives rise to several spectral representations, for both stationary and non-stationary signals. Commonly, in graphical representations of the signal spectrum, what is displayed is an estimate of the power spectral density, referred to as the sample power spectral density (PSD), according to the following definitions [40, 41].

Definition 4.6 Periodogram—Given a sequence x[n], n ∈ [0, N − 1], the periodogram or sample power spectral density (PSD) of x[n] is defined as the DTFT of its autocorrelation function (acf), i.e.,
$$ R_{xx}(e^{j\omega}) \triangleq \mathrm{DTFT}\{r_{xx}^{\mathrm{true}}[k]\} = \lim_{N \to \infty} \sum_{k=-N+1}^{N-1} r_{xx}[k]\,e^{-j\omega k} \qquad (4.130) $$

where r_{xx}^{true}[k] denotes the true acf, and r_{xx}[k] its empirical estimate.
The empirical acf r_{xx}[k] is an estimate over a finite number of samples N ≤ const < ∞ and can be computed according to the following definition.

Definition 4.7 Empirical autocorrelation function—Let Ê ∼ E be a time-average approximation of the statistical expectation as defined in Sect. 2.5.5; under some weak assumptions, the empirical acf can be evaluated as

$$ r_{xx}[k] = \hat E\{x[n]\,x[n+k]\} = \begin{cases} \dfrac{1}{N}\displaystyle\sum_{n=0}^{N-1-k} x[n+k]\,x^*[n], & k = 0, \ldots, N-1 \\[2mm] r_{xx}^*[-k], & k = -(N-1), \ldots, -1 \end{cases} \qquad (4.131) $$

where (*) indicates the complex conjugate. Note that (4.131) ensures that the values of x[n] that fall outside the interval [0, N − 1] are excluded; we further assume convergence lim_{N→∞} r_{xx}[k] = r_{xx}^{true}[k].

Given a sequence of infinite length x_s[n], when the sequence is weighted with a rectangular window of length N, the following property holds.

Property 4.8 Given a sequence x_s[n], multiplied by an N-length window function h[n] that selects the so-called analysis segment, such that x[n] = h[n]x_s[n]; from the above definitions and the convolution Theorem 2.2 (see Sect. 2.4.2.3), it is easy to show that the periodogram can also be defined as the squared-magnitude DTFT of x[n], divided by N, that is

$$ P_{xx}(e^{j\omega}) \triangleq \mathrm{DTFT}\left\{\hat E\{x[n]\,x[n+k]\}\right\} = \frac{1}{N}\,X(e^{j\omega})X^*(e^{j\omega}) = \frac{1}{N}\left|X(e^{j\omega})\right|^2 \qquad (4.132) $$

such that, for N → ∞, we have that P_{xx}(e^{jω}) → R_{xx}(e^{jω}). From the above definitions, for a non-stationary sequence, if for a short window length the sequence is approximated as stationary, we can write

$$ P_{xx}(n, e^{j\omega}) \approx \frac{1}{N}\left|X(n, e^{j\omega})\right|^2. \qquad (4.133) $$
Remark 4.17 The periodogram is an estimate of the signal spectrum and thus characterized by the mean and variance. There are several variants in the literature, which in some situations can result in a more robust estimate [40, 41].
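The equivalence stated by Definition 4.6 and Property 4.8 can be verified numerically. The sketch below (Python/NumPy, illustrative lengths) computes the periodogram once as the DTFT of the biased empirical acf (4.131), sampled on a dense FFT grid, and once as |X(e^{jω})|²/N as in (4.132).

```python
import numpy as np

N = 128
rng = np.random.default_rng(2)
x = rng.standard_normal(N)

# biased empirical acf, Eq. (4.131), lags -(N-1) .. (N-1)
r = np.correlate(x, x, mode='full') / N
lags = np.arange(-(N - 1), N)

# periodogram as the DTFT of the acf, sampled on an M-point frequency grid
M = 4 * N                          # M >= 2N-1 avoids aliasing of the acf
c = np.zeros(M)
c[lags % M] = r                    # place r[k] at index k (mod M)
P_acf = np.fft.fft(c).real

# periodogram as squared-magnitude DFT divided by N, Eq. (4.132)
P_mag = np.abs(np.fft.fft(x, M)) ** 2 / N
```

The two grids agree to machine precision, since (4.132) is the Wiener–Khinchin relation applied to the finite, biased acf estimate.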
4.4.3.1 Bartlett's Periodogram
A variant of the periodogram technique, which under certain conditions tends to reduce the variance of the estimate, is the so-called Bartlett's method, in which the estimate is made through an average over several periodograms. Given a sequence in the observation interval [0, N_s − 1], this is divided into K shorter intervals, each of duration N. The segments are assumed to be different realizations of the stochastic process, independent and identically distributed (i.i.d.). The spectral estimate is thus reduced to a simple average of the periodograms computed on the K segments.
4.4.3.2 Welch's Method
Welch’s method is an improvement of Bartlett’s method but with two differences: i. An opportune (non-rectangular)-shaped analysis window can be applied to each of the K segments to mitigate the Gibbs phenomenon. ii. During the analysis, to reduce discontinuity effects, the segments are partially overlapped (as illustrated in Fig. 4.39). Indicating with L the number of overlap samples, in the implementation of the method we have two degrees of freedom. 1. We can increase the length N of each segment, N → N + L, without changing either the duration Ns of the observation interval, nor the number K = Ns /N of segments. In this case we can expect an increase in frequency resolution. 2. We can increase the number of segments with the same length K = Ns /(N − L), without varying the duration Ns . This results in a decrease in the variance of estimation.
Fig. 4.39 The graphical representation of the spectrum is usually an estimate of the time-varying power spectral density (PSD). Typically, to reduce discontinuity effects, analyses are performed on partially overlapping time windows
4.4.3.3 Audio Signals 3D Graphical Representations
The first representation is a 3D waterfall type. In this case the frequency is usually represented on the horizontal axis, the time on the oblique axis, and the amplitude is represented by the height of the peaks. In the second representation, called spectrogram, the x-axis represents time, the y-axis represents frequency, and the amplitude is encoded by the color intensity. This last representation is also called sonogram because it resembles musical notation on the staff. In Fig. 4.40a, b, there are two graphical representations of the time-varying spectrum, for the same sinusoidal logarithmic sweep input signal, evaluated using the DSTFT technique.
4.4.4 Constant-Q Fourier Transform

Audio signal analysis often requires time–frequency representations but, given the wide frequency range involved, which covers about four decades, a uniform time–frequency resolution as in the DSTFT may not always be adequate. In this section we therefore present the so-called constant-Q transform (CQT) in which, in order to obtain a more uniform resolution, longer analysis windows are used at low frequencies than at high frequencies. The CQT was originally introduced in 1978 by Youngberg and Boll [42], applied to musical processing by Brown and Puckette [43, 44], and more recently the methodology has been revisited in [45–49].
Fig. 4.40 3D spectrum representations of a sinusoidal logarithmic sweep signal calculated with the DSTFT: a waterfall representation; b spectrogram (or sonogram or sonograph) representation. Spectrograms are used extensively in music and speech processing
4.4.4.1 Time versus Frequency Uncertainty
With the transformation defined by Eq. (4.125), we highlight the spectrum of the signal within the time interval bounded by the window function h[n − m], thus obtaining information about its harmonic content in a neighborhood of the m-th signal sample. The signal segment selection through the multiplication of the sequence x[n] by the analysis window h[n] is equivalent to the convolution between the respective transforms X(e^{jω}) and H(e^{jω}) in the frequency domain. The window's shape influences the well-known Gibbs phenomenon, and its duration influences the time–frequency resolution; the shape and duration of h[n] are therefore of primary importance in spectral analysis problems. Specifically, in time–frequency resolution, we must account for the trade-off between time resolution Δt and frequency resolution Δf. For the basic Gabor uncertainty principle, which defines limits on the representation of any signal, we have that ([23], Eq. (1.2))

$$ \Delta t \times \Delta f \ge 1 \qquad (4.134) $$

which states that Δt and Δf cannot both be defined exactly: an increment in one implies a decrease in the other. In other words, the product of the uncertainties in frequency and time must exceed a fixed constant, which generalizes Heisenberg's uncertainty principle.⁴

From the above discussion, the DSTFT, defined by the relation (4.125), yields a frequency resolution, understood as the distance between two distinguishable spectral lines, given by the sampling rate divided by the sequence length N. Thus, if we analyze a music signal, sampled at a certain sampling rate F_c, over an analysis window of N samples, the frequency resolution, denoted as the bandwidth Δf, turns out to be defined by the ratio

$$ \Delta f = F_c / N. \qquad (4.135) $$

Now, to intuitively introduce the problem of the accuracy of frequency spectral estimation, consider the following examples.

Example 4.3 Given two signals, the first at 1 kHz and the second at 100 Hz, both sampled with F_c = 32 kHz and with N = 1024 samples.
In the case of spectral analysis with the standard periodogram, the bandwidth given by (4.135), equal to Δf = F_c/N = 31.25 Hz and constant for both frequencies, is determined by the duration of the analysis window, defined as Δt = 1/Δf = N/F_c = 32 ms. Now, we note that for the first signal, at 1 kHz, the analysis is performed on a time window equivalent to 32 periods of the signal, while for the second, at 100 Hz, it is performed on only 3.2 periods of the signal. In other words, constant Δt (or Δf) implies the same number of samples, both in the low and high range of the signal spectrum, but not the same number of signal periods to be analyzed which, in contrast, is what determines the robustness of the frequency estimate. Therefore, to have the same accuracy for the 100 Hz signal, we will have to use the same number of periods, and thus a window length of N = 10240 samples.

⁴ Observe that Gabor's research is, in part, motivated by the observation that our perception of sound is simultaneously defined by duration and pitch, and thus that an analysis of sound should be performed in terms of elements localized in both duration and frequency ([23], pp. 431–432).

Example 4.4 As a second example, consider two musical notes, A3 and A8, acquired with F_c = 32 kHz, and of length equal to N = 1024 samples. Note A3 has a frequency equal to f_0 = 220 Hz, while A8 has a frequency equal to 7.04 kHz. The resolution, equal to Δf = F_c/N = 31.25 Hz, is equivalent to an error of about (Δf/f_0) × 100 ≈ 14% (at 220 Hz); if we consider the A8 note, with a frequency of 7.04 kHz, the error is ≈ 0.44%. Considering that the minimum analysis error for separating two adjacent notes is about 6% (the distance between two notes one semitone apart is equal to 1 − 2^{−1/12}), an analysis window of 1024 samples is undersized at low frequencies and oversized at high frequencies. It is evident, as shown in Fig. 4.41, that for a correct analysis of musical signals the conventional Fourier transform is inadequate: in order to have a constant resolution at all frequencies, it is necessary to have long analysis windows at low frequencies and short analysis windows at high frequencies [42–44].
4.4.4.2 Spectrum Analysis by Constant-Q Filter Bank
To perform a spectral analysis of broadband processes, such as music signals, with the same resolution over the entire frequency range, we must assume analysis-window lengths inversely proportional to frequency or, equivalently, bandwidths Δf proportional to frequency f. Thus, let Δf_k be the bandwidth and f_k the central frequency, i.e., the frequency around which the spectral analysis needs to be performed, where k is the frequency index; to have the same resolution at all frequencies we need to impose the Q factor, defined as

$$ Q = f_k / \Delta f_k \qquad (4.136) $$
Fig. 4.41 Time–frequency resolution displayed with Heisenberg uncertainty tiles of equal surface Δt × Δf. a STFT: the rectangles are identical and depend only on the choice of the analysis window; b constant-Q transform: the area of the rectangle remains constant, but the time and frequency resolutions vary
identical for all frequencies in the range; this type of measurement is then called constant-Q frequency analysis. The constant Q is a constraint that, in practice, allows the same precision at all frequencies f_k that we intend to analyze. For example, to have a resolution equal to a quarter tone which, in the case of the equal-tempered scale, corresponds to a distance between two notes equal to 2^{1/24} − 1, the Q turns out to be Q > 1/(2^{1/24} − 1) ≈ 34.

Remark 4.18 Note that, to realize a spectrum analyzer with constant-Q analysis, it is possible to use a filter bank with bandwidths equal to Δf_k, as in the constant-Q equalizers presented in Sect. 3.2.4. In this case, at each bandpass filter output we can insert an average energy meter ∼ (1/N)|X(e^{j2πf_k})|² and have direct information about the power spectral density of the signal for each frequency band Δf_k of the bank. For example, Fig. 4.42 shows the magnitude 10 log₁₀|X(e^{jω})|² of a Gaussian white random noise and of a random sequence called pink noise, or noise with 1/f distribution. The sequences have a duration of 10 s, with F_c = 44.1 kHz, and the spectra are evaluated by a bank of constant-Q filters with 1/3- and 1/6-octave resolution. The spectra have been evaluated with the Matlab function Pxx = poctave(x,Fc,'BandsPerOctave',Bw); which returns the fraction-of-octave spectrum of a sequence x[n] sampled at a rate Fc. The octave spectrum is the average power over octave bands as defined by the ANSI S1.11 standard [50].

Remark 4.19 Note that, in spectral analysis performed at fractions of an octave, it must be taken into account that the energy is greater the wider the interval Δf_k is. As a matter of fact, for white Gaussian noise the energy spectrum is uniform over the whole frequency range. Hence, with a fraction-of-octave analyzer (1/3 and 1/6 in the figure), the curve has an increasing trend as a function of frequency. Since for each octave the bandwidth doubles, Δf_{k+1} = 2Δf_k, for a uniform-distribution process it follows that the energy also doubles. So the increment for each band is equal to 10 log₁₀ 2 ≈ 3 dB per octave (equivalent to 10 dB/dec).

Pink noise is therefore a random signal, widely used in audio measurements, since by definition it has a distribution equal to 1/f and, therefore, has the property of
Fig. 4.42 Gaussian versus pink noise spectra evaluated by constant-Q filter bank with 1/3 and 1/6 octave resolution. By definition, pink noise has a spectrum distribution equal to 1/f . Hence, it has the property of having the same energy for each octave (or fraction thereof). The decrease in magnitude is 3 dB per octave. So the curve appears flat
having the same energy for each octave (or fraction thereof). Indeed, from the spectra in Fig. 4.42, measured with a constant-Q filter bank, it can be observed that the pink noise spectrum appears flat. It turns out that a simple way to generate pink noise is to lowpass filter a Gaussian white noise with a filter of slope equal to −3 dB/oct.

Remark 4.20 Note that, from a perceptual point of view, the octave, referred to the musical scale, is more representative of the human hearing organ. For example, the interval between the notes A1 at 55 Hz and A2 at 110 Hz is perceived like the interval between the notes A6 at 1760 Hz and A7 at 3520 Hz. However, to resolve very high harmonics, such as those in the violin spectrum, it is usual to choose higher values of Q (e.g., Q = 24, 48, 96).
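To see why pink noise reads flat on a constant-Q analyzer, one can synthesize it and integrate its energy over octave bands. The sketch below (Python/NumPy) shapes white Gaussian noise by 1/√f directly in the frequency domain, an alternative to the −3 dB/oct filter mentioned above; the FFT size, sampling rate, and 100 Hz starting band are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
Nfft = 2 ** 18
Fc = 44100.0

# shape white Gaussian noise by 1/sqrt(f) in the frequency domain -> PSD ~ 1/f
f = np.fft.rfftfreq(Nfft, d=1 / Fc)
X = rng.standard_normal(len(f)) + 1j * rng.standard_normal(len(f))
X[0] = 0.0                            # remove DC
X[1:] = X[1:] / np.sqrt(f[1:])
pink = np.fft.irfft(X, n=Nfft)        # pink-noise sequence

# energy per octave band [f0, 2 f0): constant for a 1/f spectrum
band_energy = []
f0 = 100.0
while 2 * f0 <= Fc / 2:
    idx = (f >= f0) & (f < 2 * f0)
    band_energy.append(float(np.sum(np.abs(X[idx]) ** 2)))
    f0 *= 2
ratio = max(band_energy) / min(band_energy)
```

Since ∫ f₀→2f₀ (1/f) df = ln 2 for any f₀, all octave bands carry (statistically) the same energy, which is exactly what the constant-Q analyzer of Fig. 4.42 displays as a flat curve.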
4.4.4.3 Constant-Q Short-Time Fourier Transform
In a dual way, in order to realize an STFT-based spectral analysis algorithm that has the same resolution for each frequency range, one must impose a certain duration Δt_k, expressed in terms of a number of samples N[k], different for each frequency f_k, such that the Q, defined in Eq. (4.136), remains constant for all spectral components. From Eq. (4.135), we have that Δf_k = F_c/N[k], and from Eq. (4.136), we can write Q = f_k/(F_c/N[k]). Thus, solving for N[k], the length of the analysis window for the k-th frequency component results to be equal to

$$ N[k] = Q\,\frac{F_c}{f_k}. \qquad (4.137) $$
The above expression indicates that the analysis window contains exactly Q cycles of signal at frequency f_k; one period corresponds in fact to F_c/f_k samples. To perform spectral analysis with the same resolution at all frequencies, we consider an analysis window of different length for each frequency index, calculated with (4.137). The DSTFT expression (4.125) is then rewritten, considering a variable length of the analysis window, as ([43], Eq. (5))

$$ X_n(k) = \frac{1}{N[k]}\sum_{n=0}^{N[k]-1} x[n]\,h[n,k]\,e^{-j2\pi n Q/N[k]}, \qquad \text{for } N[k] = Q\,\frac{F_c}{f_k} \qquad (4.138) $$
where h[n, k] is the analysis time window of variable length equal to N[k]. In the above expression, the period in samples is N[k]/Q; i.e., we always analyze Q cycles. Thus, Q ≥ 17 corresponds to a resolution of a semitone. For a quarter-tone resolution Q ≥ 34, and the length of the window is evaluated as N[k] = N_max/(2^{1/24})^k,
where N_max is Q times the period, evaluated in number of samples, of the lowest frequency to be analyzed. For example, at 27.5 Hz, to have about 10 periods with F_c = 44.1 kHz, the length of the window is equal to ∼16000 samples; while at 4186.00 Hz, the length of the window is equal to about ∼100 samples. The analysis window h[n, k], in case we use the raised cosine window, turns out to be equal to

$$ h[n,k] = \alpha + (1-\alpha)\cos(2\pi n/N[k]), \qquad 0 \le n \le N[k]-1 $$

where, in the case of the Hamming window, α = 25/46. To speed up the analysis, it is usual to precompute and store in a LUT all the values present in (4.138):

$$ \mathrm{LUT}[n,k] = h[n,k]\,e^{-j2\pi n Q/N[k]}, \qquad k = 1, \ldots, k_{\max}, \quad n = 1, \ldots, N[k]. $$
For musical applications, considering that the piano keyboard covers over 7.3 octaves with 88 notes, in the frequency range 27.50–4186.00 Hz, a number of bands per octave ≥ 24 is needed for quarter-tone resolution; thus the number of frequency bins should be k_max ≥ 88 × 2. An efficient FFT-based procedure for Eq. (4.138) is given in [44]. For example, Fig. 4.43 shows a comparison of some constant-Q analysis methods with the previously illustrated filter bank-based method. In particular, Fig. 4.43a shows the spectrum analysis with the algorithm of Holighaus et al. [48], implemented in the Matlab function cfs = cqt(x,'SamplingFrequency',Fs,'BinsPerOctave',Bw). The algorithm, an evolution of Brown's FFT method in [43] and [44] called the non-stationary Gabor transform (NSGT), is based on the theory of non-stationary Gabor frames, allows a very efficient FFT-based implementation and, unlike Brown's formula (4.138), also allows the inverse transformation. Note, however, that unlike the filter bank approach, the spectrum of the white noise appears flat and that of the pink noise decreasing. Figure 4.43b shows the spectrum calculated with constant-Q analysis with 24 filters per octave. Figure 4.43c shows the spectrum evaluated with the standard periodogram (see Sect. 4.4.3, Eq. (4.132)). Figure 4.43d shows the spectrum calculated with Welch's method, which allows a spectral estimation with low variance; in fact, in this case the curves appear smoother than in the standard periodogram. Finally, Fig. 4.44 shows the magnitude of the body transfer function (TF), obtained from the analysis of the measured impulse response of a bowed-string instrument (a viola) [46]. The figure shows the response of the periodogram superimposed on the constant-Q spectral analysis at 48 bins per octave which, in practice, corresponds to a smoothed version of the periodogram.
From the smoothed curve, the various modes of vibration that represent the acoustic signature of the instrument are clearly evident.

Remark 4.21 It should be noted that the main advantage of the constant-Q method is that we can adjust the spectrum smoothness by changing the number of filters
Fig. 4.43 Constant-Q spectral analysis of Gaussian white and pink noise, with 1/24-octave resolution. Values in dB are not normalized and are defined as the default library options

Fig. 4.44 Normalized TF magnitude obtained from the measured impulse response of a viola. Spectra evaluated by the standard periodogram and by a constant-Q FFT with 1/48-octave resolution. (Viola impulse response courtesy of [46])
per octave. Decreasing the value of Q widens the filter band, and the spectrum appears smoother. In audio this is very important because, as with the musical scale, the hearing organ is sensitive to the logarithm of frequency. In fact, spectral analysis is often used to adjust EQ curves related to acoustic perception, which are therefore better described with constant-Q analysis techniques.
4.5 Wavelet Basis and Transforms

The wavelet transform (WT) can be seen as a generalization of the STFT analysis with variable resolution. In particular, the discrete wavelet transform (DWT) generalizes the constant-Q frequency analysis and is also related to perfect reconstruction orthogonal octave filter banks [16–26]. Wavelet transforms are implemented considering a parametric basis function, denoted as wavelet, literally "small wave", or mother wavelet, written as ψ_{s,τ}(t), where s represents the dilation or scale parameter, and τ is the translation parameter. The WT can be defined as the following scalar product:

$$ X(s,\tau) = \int_{-\infty}^{\infty} x(t)\,\psi_{s,\tau}^*(t)\,dt \qquad (4.139) $$
where

$$ \psi_{s,\tau}(t) = \frac{1}{\sqrt{|s|}}\,\psi\!\left(\frac{t-\tau}{s}\right). \qquad (4.140) $$
We have two types of WTs: continuous WTs (CWTs) and discrete WTs (DWTs). The CWTs allow wide flexibility in the choice of the wavelet shape and of the sequence of analysis scales; differently, the DWTs limit the scaling factors, usually to powers of two, but allow a simpler formulation and numerical implementation of the transformation and inverse transformation operations.

Remark 4.22 Note that, as we will see later in Sect. 4.5.3, the mother wavelet functions ψ_{s,τ}(t) can be derived by a simple recursive transformation of a given scaling function ϕ(t), also called the father function, which can be defined according to simple regularity conditions. The mother wavelets ψ(·) and the father functions ϕ(·) are functions of very short duration oscillating around zero. That is, they have limited support because they are localized in time–frequency space. Differently, in the Fourier transform the sine–cosine basis functions are perfectly localized in frequency but are defined on the entire time axis; i.e., they have infinite support. Wavelets, on the other hand, are well-localized in time, since they are relative to a precise time interval (they tend to zero for t → ±∞), and are also well-localized in frequency. These properties make them particularly suitable for the representation of signals localized in both time and frequency and, therefore, for transient, spiked, and non-stationary audio signals. In DASP the DWT can be used to analyze the temporal and spectral properties of non-stationary sequences such as the most common audio signals. In fact, given its greater flexibility, it can be used in all contexts where the spectral analysis technique is a critical factor, such as, for example, to extract features in automatic classification systems, for audio signal denoising, in sound synthesis systems, in compression systems, in audio effects based on time–frequency transformation, etc. [16].
4.5.1 Continuous-Time Wavelet

Given a signal x(t) ∈ L²(ℝ, ℂ), i.e., a signal defined in the Hilbert space of quadratically integrable functions, also called the space of energy functions, to introduce the fundamental concepts of the wavelet transform we consider the generalized transformation, indicated as T{x(t)}, defined by the scalar product between x(t) and a generic basis function ψ(t), which defines the domain of the transformed signal, as

$$ T\{x(t)\} = \langle x(t), \psi(t)\rangle = \int_{-\infty}^{\infty} x(t)\,\psi^*(t)\,dt. \qquad (4.141) $$
So, it also applies

$$ x(t) = T^{-1}\{T\{x(t)\}\} = \left\langle \langle x(t), \psi_k(t)\rangle, \psi_j^*(t) \right\rangle \qquad (4.142) $$
where ⟨a(t), b(t)⟩ denotes the scalar product between the functions a(t) and b(t). The function ψ(t) is defined as the transformation kernel. The above expression indicates that the signal x(t) is mapped from the original domain into the transformed domain defined by the transformation kernel. For example, for ψ(t) = e^{jωt} the (4.141) coincides with the continuous-time Fourier transform (CTFT)

$$ F(x(t), \omega) = \langle x(t), e^{j\omega t}\rangle = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt. \qquad (4.143) $$
The CTFT of a signal x(t) depends only on the angular frequency ω and is generally called the spectrum of the signal. The dependence also on time was introduced with the definition of the STFT (see Sect. 4.4) which, expressed with the formalism (4.141), can be rewritten as

$$ \mathrm{STFT}(x(t), \omega, \tau) = \int_{-\infty}^{\infty} x(t)\,h(t-\tau)\,e^{-j\omega t}\,dt = \langle x(t), h(t-\tau)\,e^{j\omega t}\rangle = \langle x(t)\,h(t-\tau), e^{j\omega t}\rangle. \qquad (4.144) $$
Thus, in the STFT the bandwidth Δf and the duration Δt of the time window are constant and depend only on the shape and the length of the a priori fixed analysis window h(t).
4.5.1.1 Continuous-Time Wavelet Transform Definition
The CWT (4.139) is defined in terms of the scalar product as ([22], Eq. (1.14), and [26], Eq. (2.6.8))

$$ \mathrm{CWT}(x(t), s, \tau) = \left\langle x(t), \frac{1}{\sqrt{|s|}}\,\psi\!\left(\frac{t-\tau}{s}\right) \right\rangle = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\,\psi^*\!\left(\frac{t-\tau}{s}\right) dt \qquad (4.145) $$
where the scalars s and τ are, respectively, defined as the dilation (or scaling) parameter and the translation parameter, and the multiplication by 1/√|s| is introduced for energy normalization. The parametric function ψ_{s,τ}(t), referred to as the mother wavelet, can be seen as a parametric basis such that, through the dilation and translation parameters, it allows to properly adjust both the time scale and the frequency resolution. As we will see, the function ψ_{s,τ}(t) has bandpass characteristics, so the mapping onto the time-scale plane coincides with the time–frequency mapping. Figure 4.45 shows the Heisenberg uncertainty rectangles (time–frequency tiles) of various types of transformations. In practice, the CWT stems from the idea of replacing the modulation operation e^{jωt} of the Fourier transform with a dilation-translation operation obtained by means of a function ψ_{s,τ}(t), parametric in s and τ, called a wavelet. Expanding (4.145), we obtain the CWT of the signal x(t) ∈ L²(ℝ, ℂ), usually denoted as CWT(x(t), s, τ), X(s, τ), …, which is formally defined as in Eq. (4.140) or (4.145), where the wavelet functions are obtained via the dilation and translation parameters s and τ, applied recursively from a certain initial function called the father function.

Remark 4.23 In other words, the wavelets ψ_{s,τ}(t) are a set of basis functions of finite duration, very suitable for the representation of non-stationary signals, which, under certain conditions, are orthonormal⁵. In formal terms, (4.145) consists of a wavelet representation of a function x(t) with respect to a complete set of functions, for certain values of s and τ. A typical trend of wavelet functions, typically a sine or exponential function multiplied by a window of appropriate duration and shape, is shown in Fig. 4.46.

Property 4.9 With a simple change of variable (t → ts), the expression (4.145) can be rewritten as a convolution integral
Fig. 4.45 Time–frequency tile representation for various types of transforms: a short-time Fourier transform; b wavelet transform; c and d generic transforms with time-varying and frequency-varying tiles (modified from [26])

Fig. 4.46 Possible trend of wavelet functions for different dilation parameters or time scales (independent of translation): a small time scale (broadband in the frequency domain); b large time scale (narrowband in the frequency domain)
⁵ We remind the reader that a set of complex functions ψn(t) is orthonormal if ⟨ψn(t), ψm(t)⟩ = δnm. In the case n = m this simply expresses the normalization to 1 of a function (‖ψn(t)‖² = 1), while in the case n ≠ m it expresses a condition of orthogonality between the two functions.
4 Multi-rate Audio Processing and Wavelet Transform
$$X(s,\tau)=\sqrt{|s|}\int_{-\infty}^{+\infty}x(st)\,\psi^{*}\!\left(t-\frac{\tau}{s}\right)dt. \qquad (4.146)$$
Thus, the interpretation of (4.145) is that for s > 1 the wavelet ψ((t − τ)/s) expands and captures the behavior of the signal over longer time periods. Equivalently, Eq. (4.146) indicates that as the scale grows, a contracted version x(st) of the signal flows through the filter with impulse response ψ(·). It follows that the scale factor s can be interpreted as the scale of a map or the zoom of a camera: a large scale (s > 1) gives a wider (global) view, while a small scale (s < 1) gives an enlarged, more detailed view of a small portion of the scene.

Remark 4.24 Recall that, in terms of frequency response, the resolution of a continuous-time signal is related to its frequency content. For example, lowpass filtering reduces the resolution but does not change the time scale of the signal. Moreover, in continuous time, a change of scale x(t) → x(t/c) changes the time duration but does not alter the signal resolution; indeed, the transformation is perfectly reversible. On the contrary, in the discrete-time case a time-scale reduction usually implies a non-invertible subsampling that automatically reduces the frequency resolution, while a change of scale by oversampling, being reversible, does not change the resolution of the signal.
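This zoom interpretation can be sketched numerically. The fragment below is our own discretization of (4.145), not the book's code; the wavelet shape, scale grid, and step size are illustrative assumptions. For a pure sinusoid, the CWT magnitude at τ = 0 is maximal at the scale whose dilated wavelet oscillates at the signal frequency.

```python
import numpy as np

def cwt_point(x, t, s, tau=0.0, w0=5.0):
    # X(s, tau) ~= (1/sqrt(s)) * sum_n x[n] * psi((t[n]-tau)/s) * dt,
    # with a real Morlet-like wavelet psi(u) = cos(w0*u) * exp(-u^2/2)
    dt = t[1] - t[0]
    u = (t - tau) / s
    psi = np.cos(w0 * u) * np.exp(-u**2 / 2.0)
    return np.sum(x * psi) * dt / np.sqrt(s)

t = np.arange(-20.0, 20.0, 0.01)
x = np.cos(2.5 * t)                       # sinusoid, angular frequency 2.5
scales = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
resp = np.abs([cwt_point(x, t, s) for s in scales])
best_scale = scales[np.argmax(resp)]      # maximal response near s = w0/2.5 = 2
```

On this grid the response peaks at s = 2, i.e., at the scale that stretches the wavelet's center frequency w0 down to the signal frequency.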
4.5.1.2 Continuous-Time Wavelet Properties
The wavelet mother function ψs,τ(t) has the following properties:

$$\int_{-\infty}^{\infty}\psi_{s,\tau}(t)\,dt=0 \qquad (4.147)$$

$$\int_{-\infty}^{\infty}\left|\psi_{s,\tau}(t)\right|^{2}dt=1 \qquad (4.148)$$

$$\int_{-\infty}^{\infty}\left|\psi_{s,\tau}(t)\right|dt=c<\infty. \qquad (4.149)$$

Equations (4.147)–(4.149) indicate that ψs,τ ∈ L²(R, C) is zero mean, normalized, and absolutely integrable (of finite amplitude). It is also generally useful to note that

$$\int_{-\infty}^{\infty}t^{p}\,\psi_{s,\tau}(t)\,dt=0,\quad\text{for } p<P\in\mathbb{N} \qquad (4.150)$$
by interpreting the previous expression from a statistical point of view (i.e., considering ψs,τ(t) as a distribution), we see that ψs,τ(t) has a number of vanishing moments.

Example 4.5 A widely used wavelet family can be defined as a modulated Gaussian. An example is the Morlet or Gabor wavelet, consisting of a complex exponential modulated by a Gaussian, defined as ([23], Eq. (1.27))

$$\psi_{k}(t)=e^{j\omega_{k}t}\,e^{-\alpha_{k}^{2}(t-\tau_{k})^{2}} \qquad (4.151)$$

where the parameters $\alpha_{k}^{2}=\frac{1}{2\sigma_{k}^{2}}$, ωk, and τk define the pulse sharpness. The ψ_{sk,τk}(t), defined by scaling and translation of ψ(t) in (4.151), is illustrated in Fig. 4.47.

Remark 4.25 Note that Morlet or Gabor wavelet transforms, as well as Gaussian smoothing, are widely used for time–frequency analysis of non-stationary time series. The most important parameter of a wavelet is its width (or duration); in the case of Morlet wavelets, the width is the standard deviation of the Gaussian that modulates the sine wave. In fact, as we know, it is not possible to have arbitrarily good precision in both time and frequency at the same time, and the wavelet width controls the trade-off between time precision and frequency precision. In frequency analysis, the Gaussian width is generally defined through the number of cycles to be analyzed in each frequency range. Thus, letting N_C be the number of cycles (usually 2 ≤ N_C < 20), we have that ([25], Eq. (2))

$$\sigma_{k}=N_{C}/\omega_{k}\quad\Leftrightarrow\quad N_{C}=\frac{\omega_{k}}{\sqrt{2}\,\alpha_{k}} \qquad (4.152)$$

i.e., the parameter N_C defines the time–frequency precision trade-off, similarly to the Q factor in constant-Q spectral analysis FBs (see Sect. 4.4.4.2). For more details see, for example, [25]. In digital audio processing, for example, wavelets can capture short bursts of repeating and alternating music notes with a clear start and end time for each note; in speech processing, they can be used for pitch estimation and can produce more accurate results than Fourier transform techniques.
Fig. 4.47 Morlet or Gabor wavelet consists of a complex sinusoidal signal modulated by a Gaussian envelope. Example for three different values of αk , ωk and τk . Each wavelet contains an identical number of periods of the basis
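Equations (4.151)–(4.152) translate directly into a small atom generator. The sketch below is ours; the chosen ωk, N_C, and time grid are illustrative assumptions.

```python
import numpy as np

def morlet_atom(t, wk, NC=5.0, tauk=0.0):
    # Eq. (4.151): complex exponential modulated by a Gaussian envelope.
    # Width rule (4.152): sigma_k = NC / wk, with alpha_k^2 = 1/(2 sigma_k^2).
    sigma = NC / wk
    alpha2 = 1.0 / (2.0 * sigma**2)
    return np.exp(1j * wk * t) * np.exp(-alpha2 * (t - tauk)**2)

t = np.linspace(-2.0, 2.0, 4001)          # grid step 0.001, contains t = 0
psi = morlet_atom(t, wk=20.0, NC=5.0)
peak_time = t[np.argmax(np.abs(psi))]     # envelope peaks at t = tau_k = 0
```

Because σk scales as 1/ωk for fixed N_C, every atom generated this way contains the same number of oscillation periods, as in Fig. 4.47.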
4.5.2 Inverse Continuous-Time Wavelet

The function x(t) can be reconstructed from its CWT using the inverse transform defined as

$$x(t)=\frac{1}{C_{\psi}}\int_{0}^{\infty}\!\int_{-\infty}^{\infty}X(s,\tau)\,\psi_{s,\tau}(t)\,\frac{d\tau\,ds}{s^{2}} \qquad (4.153)$$

where C_ψ is a finite constant that depends on the chosen mother wavelet. This formula can be approximated in many ways; the most usual is

$$x(t)\simeq K_{\psi}\int_{0}^{\infty}X(s,t)\,\frac{ds}{s^{3/2}}. \qquad (4.154)$$
The inner product (4.139) defines the similarity between the signal x(t) and the basis functions ψs,τ(t); as (4.153) indicates, each signal can be represented as the sum (in the form of an integral) of many elementary functions of identical shape but different amplitudes and durations.

Remark 4.26 We have seen that the mother wavelets ψs,τ(t), in the parameters s and τ, behave in wavelet analysis and synthesis as basis functions, although strictly speaking they do not form a basis. In fact, as indicated earlier, in general the ψs,τ(t) are not orthogonal, since they are defined for continuous variations of s and τ. Surprisingly, the reconstruction formula (4.154) is satisfied for any ψ(t) of finite energy and of bandpass type (i.e., with a finite impulse response, similar to a small wave, from which the name derives). More precisely, the condition for reconstruction is a simple sufficient regularity of ψ(t), defined by the conditions (4.147)–(4.149).
4.5.2.1 Spectrogram
Used in signal analysis, under appropriate conditions that we will see in the next paragraphs, the CWT behaves as an orthonormal basis and allows an effective representation of the signal energy in the time–frequency plane in terms of the squared modulus of (4.146) associated with the measure dτ ds/s². In fact, Parseval's theorem holds:

$$E_{x}=\int_{0}^{\infty}\!\int_{-\infty}^{\infty}\left|X(s,\tau)\right|^{2}\frac{d\tau\,ds}{s^{2}}=\int_{-\infty}^{\infty}\left|x(t)\right|^{2}dt. \qquad (4.155)$$
This ensures a consistent representation of the signal energy distribution, expressed as energy per unit frequency, on the time-scale plane, with the advantage of a different resolution at each frequency.
Fig. 4.48 Time–frequency resolution of a continuous wavelet transform for time-scale dilation s = 2 and integer time translation τk = k
Remark 4.27 Note that in the case where s = 2 and τ = k is integer, as we will see in the next paragraphs, the resulting spectrum has the resolution shown in Fig. 4.48.
4.5.3 Orthogonal Wavelets and the Discrete Wavelet Transform

To introduce the fundamental concepts of orthogonal wavelets and multiresolution analysis, we proceed with an intuitive approach based on the following property.

Property 4.10 The wavelet functions ψs,τ(t) can be generated by translations and dilations of a certain function ϕ(t), called the father function, or scaling function, characterized by the property

$$\int_{-\infty}^{\infty}\varphi(t)\,dt=1. \qquad (4.156)$$
As an example, Fig. 4.49 shows the "father" and "mother" functions of Haar, Symmlet-4, Daubechies-4, and Coiflet-4.

Fig. 4.49 Common scaling functions (or father functions) (top) and related wavelet functions (bottom) of: Haar, Symmlet-4, Daubechies-4, Coiflet-4
To introduce the fundamental concepts of orthogonal wavelets and multiresolution analysis, we proceed with an intuitive approach. Consider a parametric father function ϕs,τ(t), for example of the type previously given in (4.140), in which the dilation is always equal to s = 2 (or a power of two, positive or negative), denoted as scale-2 dilation, and the translation is always equal to an integer quantity τ = k ∈ Z.

Remark 4.28 Note that, in the case of integer scale-2 dilations and translations, the transform, while defined in continuous time (and therefore analog), is referred to as a discrete wavelet transform (DWT).

As a first example, for simplicity, we take the father function to be the Haar function (shown at the top left of Fig. 4.49), defined as

$$\varphi(t)=\sqcap_{0}^{1}(t)=\begin{cases}1, & t\in[0,1)\\ 0, & \text{otherwise}\end{cases} \qquad (4.157)$$

where ⊓₀¹(t) denotes the rect(t) function of unit amplitude and unit area, i.e., defined for t ∈ [0, 1). We consider the generation of new functions by applying to Eq. (4.157) a dilation scale-2 and integer translation (D2I) operator, according to the following definition.

Definition 4.8 Dilation scale-2 and integer translations - Indicated as DT_{i,k}{·}, the D2I operator is defined as

$$DT_{i,k}\{\varphi(t)\}\;\to\;2^{i/2}\varphi(2^{i}t-k),\quad\text{with } k,i\in\mathbb{Z} \qquad (4.158)$$

where the multiplication by 2^{i/2} is introduced in order to preserve a unit norm; for simplicity of exposition, we denote

$$\varphi_{i,k}(t)=2^{i/2}\varphi(2^{i}t-k) \qquad (4.159)$$
for which the following property applies.

Property 4.11 For integer translations and scale-2 dilations, orthonormality holds.

Proof Orthonormality can be easily proved as

$$\langle\varphi_{i,k}(t),\varphi_{i,l}(t)\rangle=\langle 2^{i/2}\varphi(2^{i}t-k),\,2^{i/2}\varphi(2^{i}t-l)\rangle=\delta(k-l). \qquad (4.160)$$
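Property 4.11 can be verified numerically for the Haar father function. In the sketch below (a finite-grid approximation of ours, with an assumed step and level), the Gram matrix of a few translated basis functions approximates the identity.

```python
import numpy as np

def haar_phi(t):
    """Haar father function: 1 on [0, 1), 0 elsewhere (Eq. 4.157)."""
    return ((t >= 0) & (t < 1)).astype(float)

def phi_ik(t, i, k):
    """Scale-2 dilated/translated basis 2**(i/2) * phi(2**i * t - k) (Eq. 4.159)."""
    return 2.0**(i / 2) * haar_phi(2.0**i * t - k)

dt = 1e-4
t = np.arange(-4.0, 4.0, dt)
i = 2
# Gram matrix of inner products <phi_{i,k}, phi_{i,l}> for k, l = 0, 1, 2
gram = np.array([[np.sum(phi_ik(t, i, k) * phi_ik(t, i, l)) * dt
                  for l in range(3)] for k in range(3)])
# gram should approximate the identity, i.e., delta(k - l)
```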
4.5.3.1 DWT Orthogonal Space Definition

To proceed more formally, we define V0 = span_k{ϕ0,k(t)} as the space generated (or spanned) by the integer translations of the father function alone, DT_{0,k}{ϕ(t)} = ϕ(t − k), i.e.,
$$\varphi_{0,k}(t)=\varphi(t-k)\;\Leftrightarrow\; V_{0}=\operatorname{span}_{k}\{\varphi_{0,k}(t)\}. \qquad (4.161)$$

Similarly, the scale-2 dilation of the same father function, DT_{1,k}{ϕ(t)}, generates a family of functions ϕ1,k(t) which in turn constitute an orthonormal basis for a space V1, i.e.,

$$\varphi_{1,k}(t)=2^{1/2}\varphi(2t-k)\;\Leftrightarrow\; V_{1}=\operatorname{span}_{k}\{\varphi_{1,k}(t)\} \qquad (4.162)$$

which, for Haar's father function, is constant on intervals of length 2⁻¹. By definition, V0 includes within it the space V−1, i.e., V−1 ⊂ V0 ⊂ V1. Generalizing, the spaces Vi are specified by a succession of scale-2 dilations (and integer translations) of the father function ϕ(t); formally

$$\varphi_{i,k}(t)=2^{i/2}\varphi(2^{i}t-k)\;\Leftrightarrow\; V_{i}=\operatorname{span}_{k}\{\varphi_{i,k}(t)\},\quad k,i\in\mathbb{Z}. \qquad (4.163)$$
Remark 4.29 The scale-2 dilation operator produces an implicit reduction of the norm, so its definition (see (4.158)) includes a normalization factor ensuring that ‖2^{i/2}ϕ(2^{i}t − k)‖ = const for each scale-2 dilation level and translation, regardless of the indices i, k; without such correction we would have ‖ϕ(2^{i}t − k)‖ → 0 for i → ∞.

For the definition of the mother wavelet functions ψs,k(t), generated from the orthogonally translated and dilated father functions ϕs,k(t), we can think of representing some function x(t) ∈ L², defined in the Hilbert space denoted here as x(t) ∈ V_{i+1}, as the sum of a component in Vi and a component in its orthogonal complement, denoted Wi; formally

$$V_{i+1}=V_{i}\oplus W_{i} \qquad (4.164)$$

with Vi ⊥ Wi, where the symbol ⊕ denotes the direct sum. In other words, the space V_{i+1} can be viewed as the "sum" of the Vi component and the orthogonal Wi component, also referred to as the wavelet space. For example, with reference to Fig. 4.50, the orthogonal complement of V−1 (i.e., the part that does not intersect it and that, added to V−1, equals V0) is the set indicated by the dashed line and denoted W−1, i.e., V0 = V−1 ⊕ W−1, generalized by (4.164).
Fig. 4.50 Sequence of nested closed spaces such that: · · · Vi−1 ⊂ Vi ⊂ Vi+1 · · · , as sum of the Vi component and the orthogonal Wi component
Thus, by (4.164), every level Vi can be decomposed into the orthogonal components V_{i−1} and W_{i−1} and, consequently, the following property holds.

Property 4.12 The space of wavelet functions Wi can be interpreted as the difference between the space of dilated and translated father functions V_{i+1} and Vi:

$$W_{i}=V_{i+1}\oplus(-V_{i}). \qquad (4.165)$$

Then, the mother wavelet ψ(t) ∈ W0, denoted as the zero-order wavelet ψ0,0(t), is generated from the two-scale father functions as the difference ψ0,0(t) = ϕ(2t) − ϕ(2t − 1). For example, by directly applying property (4.165) to (4.157), the (zero-order) Haar wavelet shown at the bottom left of Fig. 4.49 can be written as

$$\psi_{0,0}(t)=\varphi(2t)-\varphi(2t-1)=\sqcap_{0}^{1/2}(t)-\sqcap_{1/2}^{1}(t) \qquad (4.166)$$

or

$$\psi(t)=\begin{cases}1, & t\in[0,1/2)\\ -1, & t\in[1/2,1)\\ 0, & \text{otherwise.}\end{cases} \qquad (4.167)$$
By analogy with ϕ0,k(t) = ϕ(t − k) defined previously, it is easy to verify that the functions ψ0,k(t) = ψ(t − k), generated by translation of the mother wavelet, constitute in turn an orthonormal basis of the orthogonal space W0. Similarly, the following property applies.

Property 4.13 The parametric functions ψi,k(t), defined as

$$\psi_{i,k}(t)=2^{i/2}\psi(2^{i}t-k)\;\Leftrightarrow\; W_{i}=\operatorname{span}_{k}\{\psi_{i,k}(t)\},\quad k,i\in\mathbb{Z} \qquad (4.168)$$

constitute an orthonormal basis for the spaces Wi. In addition, the wavelet spaces at different levels are mutually orthogonal, Wi ∩ Wj = {0} for i ≠ j (see Fig. 4.51).
4.5.4 Multiresolution Analysis: Axiomatic Approach

Wavelet theory was introduced for multiresolution signal decomposition, or multiresolution analysis (MRA), to analyze the information content of signals at different resolutions in the time–frequency plane. Formally introduced by Mallat in [21], MRA is a direct consequence of Eq. (4.164) and of Properties 4.12 and 4.13. MRA can be viewed as a succession of transformations of some function x(t) with resolution 2^i, defined as the orthogonal projection of x(t) onto the reduced-dimensional subspace V_{i−1}.
Fig. 4.51 Multiresolution analysis. Representation of the nested closed subspaces such that Vi ⊂ Vi+1 and ⋃ᵢ Vi = L²(R)
Thus, MRA is useful for analyzing the information content of signals at different resolutions in the time–frequency plane, is intimately related to wavelet transform theory, and also provides methods for its practical use. Given V0, the space defining the coarsest resolution, and V−1, V−2, …, the finer resolution spaces, the shift from one level of resolution to another with twice the resolution in frequency (or half in time) is described by the following rule, called the scaling-2 property:

$$\cdots\;x(2^{-1}t)\in V_{i-1}\;\Leftrightarrow\;x(t)\in V_{i}\;\Leftrightarrow\;x(2t)\in V_{i+1}\;\Leftrightarrow\;x(2^{2}t)\in V_{i+2}\;\cdots. \qquad (4.169)$$

By properties (4.164) and (4.165), it holds that V_{i+1} = V_i ⊕ W_i and V_i = V_{i−1} ⊕ W_{i−1}. Applying the latter recursively yields the following pyramid decomposition

$$\begin{aligned} V_{0}&=V_{-1}\oplus W_{-1}, && \text{level }-1\\ &=V_{-2}\oplus W_{-2}\oplus W_{-1}, && \text{level }-2\\ &\;\;\vdots\\ &=V_{-L}\oplus W_{-L}\oplus\cdots\oplus W_{-2}\oplus W_{-1}, && \text{level }-L \end{aligned} \qquad (4.170)$$

where V0 is the coarsest scale subspace (corresponding to the low-frequency characteristics of the signal x(t), i.e., the longest signal window). It follows that a given function x(t) ∈ V0, in addition to being represented by its level i and the level i of its orthogonal complement Wi, can be represented as decomposed at gradually finer levels of detail. In (4.170), as illustrated in Fig. 4.51, in the L-level case we have a total of L elements W−i and one V−L.

Property 4.14 In summary, the fundamental assumptions of MRA can be defined axiomatically by considering the following definitions and properties (see [21, 26] for details).

i. Causal property of MRA: ∀x(t) ∈ L²(R), there exists a set of nested subspaces of the signal space L²(R) with various temporal resolutions

$$V_{-\infty}\subset\cdots\subset V_{-1}\subset V_{0}\subset V_{1}\subset\cdots\subset V_{\infty}=L^{2}(\mathbb{R}) \qquad (4.171)$$
such that each space has different basis vectors with different temporal resolution. The resolution in Vi becomes finer as the index i increases.

ii. Completeness and closure:

$$\overline{\bigcup_{i\in\mathbb{Z}}V_{i}}=L^{2}(\mathbb{R}),\qquad \bigcap_{i\in\mathbb{Z}}V_{i}=\{0\}. \qquad (4.172)$$

iii. Decomposition into orthogonal components:

$$V_{i+1}=V_{i}\oplus W_{i},\quad i\in\mathbb{Z} \qquad (4.173)$$

with Wi ∩ Wj = {0} for i ≠ j.

iv. Scale-2 expansion property or scale invariance:

$$x(t)\in V_{0}\;\Leftrightarrow\;x(2^{i}t)\in V_{i},\quad i\in\mathbb{Z}. \qquad (4.174)$$

v. Translation invariance property:

$$x(t)\in V_{0}\;\Leftrightarrow\;x(t-k)\in V_{0},\quad k\in\mathbb{Z}. \qquad (4.175)$$

vi. There exists a basis ϕ(t) ∈ V0 such that

$$\varphi_{0,k}(t)=\varphi(t-k)\;\Leftrightarrow\; V_{0}=\operatorname{span}_{k}\{\varphi_{0,k}(t)\} \qquad (4.176)$$

orthonormal in V0:

$$\langle\varphi_{i,k}(t),\varphi_{i,l}(t)\rangle=\delta(k-l),\quad i,k,l\in\mathbb{Z}. \qquad (4.177)$$

4.5.4.1 Multiresolution Analysis Wavelet Series Expansion
From the previous properties, since V0 = span_k{ϕ0,k(t)} and Wi = span_k{ψi,k(t)}, a signal x(t) ∈ V0 can be expanded in series as a contribution in V−L plus a linear combination of contributions in the W−i; formally

$$x(t)=\underbrace{\sum_{k}a_{j}(k)\,\varphi_{j,k}(t)}_{\in V_{-L}}+\underbrace{\sum_{i=j}^{\infty}\sum_{k}b_{i}(k)\,\psi_{i,k}(t)}_{\in(W_{-1}\oplus W_{-2}\cdots\oplus W_{-L+1}\oplus W_{-L})} \qquad (4.178)$$

where the term aj(k) is calculated as

$$a_{j}(k)=\langle x(t),\varphi_{j,k}(t)\rangle=2^{j/2}\int_{-\infty}^{\infty}\varphi(2^{j}t-k)\,x(t)\,dt \qquad (4.179)$$
and, similarly, the term bi(k) is given by

$$b_{i}(k)=\langle x(t),\psi_{i,k}(t)\rangle=2^{i/2}\int_{-\infty}^{\infty}\psi(2^{i}t-k)\,x(t)\,dt. \qquad (4.180)$$
Remark 4.30 From a more physical point of view, MRA is based on: (1) a coarse-grained decomposition considering an average (i.e., a sum); and (2) a finer-grained one considering detail or contrast (i.e., a difference). As i → −∞ the norm tends to zero, i.e., lim_{i→−∞} ‖x(2^{i}t − k)‖ → 0: this allows the MRA of any function x(t) ∈ L²(R).
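This average/difference mechanism can be sketched for a sampled signal (a discrete analogy of ours, not the book's code): one normalized Haar step splits a vector into a coarse "sum" part and a "contrast" detail part, is exactly invertible, and preserves energy.

```python
import numpy as np

def haar_step(x):
    """One level of the discrete Haar MRA: normalized pairwise
    averages (coarse, the 'sum') and differences (detail, the 'contrast')."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # coarse average
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # fine detail
    return a, d

def haar_step_inv(a, d):
    """Exact inverse of haar_step."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

x = np.array([8.0, -4.0, 8.0, -10.0])
a, d = haar_step(x)
xr = haar_step_inv(a, d)
```

Iterating `haar_step` on the coarse part `a` realizes the pyramid decomposition (4.170) level by level.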
Example 4.6 For a better understanding, we perform the MRA of a simple signal x(t) defined as

$$x(t)=\begin{cases}8, & t\in[0,\tfrac14)\\ -4, & t\in[\tfrac14,\tfrac12)\\ 8, & t\in[\tfrac12,\tfrac34)\\ -10, & t\in[\tfrac34,1)\\ 0, & \text{otherwise.}\end{cases} \qquad (4.181)$$
For the MRA we consider, for simplicity, the Haar father function (4.157) and its associated wavelet defined in (4.167). To calculate the coefficients, making the scalar product (4.179) explicit for i = k = 0, we have

$$\begin{aligned} a_{0}(0)&=\int_{-\infty}^{\infty}\varphi_{0,0}(t)\,x(t)\,dt\\ &=\int_{0}^{1/4}\varphi(t)x(t)dt+\int_{1/4}^{2/4}\varphi(t)x(t)dt+\int_{2/4}^{3/4}\varphi(t)x(t)dt+\int_{3/4}^{1}\varphi(t)x(t)dt\\ &=(1)(8)(\tfrac14)+(1)(-4)(\tfrac14)+(1)(8)(\tfrac14)+(1)(-10)(\tfrac14)=\tfrac12 \end{aligned}$$
similarly, for the scalar product (4.180), we have

$$\begin{aligned} b_{0}(0)&=\int_{-\infty}^{\infty}\psi_{0,0}(t)\,x(t)\,dt\\ &=\int_{0}^{1/4}\psi(t)x(t)dt+\int_{1/4}^{2/4}\psi(t)x(t)dt+\int_{2/4}^{3/4}\psi(t)x(t)dt+\int_{3/4}^{1}\psi(t)x(t)dt\\ &=(1)(8)(\tfrac14)+(1)(-4)(\tfrac14)+(-1)(8)(\tfrac14)+(-1)(-10)(\tfrac14)=\tfrac32 \end{aligned}$$
$$\begin{aligned} b_{1}(0)&=\langle x(t),\psi_{1,0}(t)\rangle=\langle x(t),\sqrt{2}\,\psi(2t)\rangle=\sqrt{2}\int_{0}^{1/2}\psi(2t)x(t)dt+\sqrt{2}\int_{1/2}^{1}\psi(2t)x(t)dt\\ &=(\sqrt{2})(8)(\tfrac14)+(-\sqrt{2})(-4)(\tfrac14)=3\sqrt{2} \end{aligned}$$

$$\begin{aligned} b_{1}(1)&=\langle x(t),\psi_{1,1}(t)\rangle=\langle x(t),\sqrt{2}\,\psi(2t-1)\rangle=\sqrt{2}\int_{0}^{1/2}\psi(2t-1)x(t)dt+\sqrt{2}\int_{1/2}^{1}\psi(2t-1)x(t)dt\\ &=(\sqrt{2})(8)(\tfrac14)+(-\sqrt{2})(-10)(\tfrac14)=\tfrac{9}{2}\sqrt{2}. \end{aligned}$$
The previous development can be written in matrix form as

$$\begin{bmatrix}8\\-4\\8\\-10\end{bmatrix}=\begin{bmatrix}1\\1\\1\\1\end{bmatrix}\frac{1}{2}+\begin{bmatrix}1\\1\\-1\\-1\end{bmatrix}\frac{3}{2}+\begin{bmatrix}\sqrt{2}\\-\sqrt{2}\\0\\0\end{bmatrix}3\sqrt{2}+\begin{bmatrix}0\\0\\\sqrt{2}\\-\sqrt{2}\end{bmatrix}\frac{9}{2}\sqrt{2} \qquad (4.182)$$

and, expressing it in transformation-matrix form as y = W₄b, we have

$$\begin{bmatrix}8\\-4\\8\\-10\end{bmatrix}=\begin{bmatrix}1&1&\sqrt{2}&0\\1&1&-\sqrt{2}&0\\1&-1&0&\sqrt{2}\\1&-1&0&-\sqrt{2}\end{bmatrix}\cdot\begin{bmatrix}1/2\\3/2\\3\sqrt{2}\\\tfrac{9}{2}\sqrt{2}\end{bmatrix} \qquad (4.183)$$

from which

$$\mathbf{W}_{4}=\begin{bmatrix}1&1&\sqrt{2}&0\\1&1&-\sqrt{2}&0\\1&-1&0&\sqrt{2}\\1&-1&0&-\sqrt{2}\end{bmatrix} \qquad (4.184)$$

is the discrete Haar transform matrix, and

$$\mathbf{b}=\begin{bmatrix}\tfrac12 & \tfrac32 & 3\sqrt{2} & \tfrac{9}{2}\sqrt{2}\end{bmatrix}^{T} \qquad (4.185)$$

represents the vector of wavelet coefficients, which are exactly the values of the wavelet transform of the signal x(t). From the above discussion, it appears that the resolution in the subspaces grows as a power of two (dyadic scaling or dyadic multiresolution).
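The development above can be reproduced by solving y = W₄b numerically (our own check; the variable names are ours), recovering the coefficient vector (4.185) from the Haar transform matrix (4.184).

```python
import numpy as np

s2 = np.sqrt(2.0)
# Discrete Haar transform matrix W4 of Eq. (4.184)
W4 = np.array([[1.0,  1.0,  s2,  0.0],
               [1.0,  1.0, -s2,  0.0],
               [1.0, -1.0, 0.0,  s2],
               [1.0, -1.0, 0.0, -s2]])
y = np.array([8.0, -4.0, 8.0, -10.0])   # signal samples of (4.181)
b = np.linalg.solve(W4, y)              # wavelet coefficients of (4.185)
# b = [a0(0), b0(0), b1(0), b1(1)] = [1/2, 3/2, 3*sqrt(2), (9/2)*sqrt(2)]
```

Note that the columns of W₄ are mutually orthogonal, so the system is always well conditioned.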
4.5.4.2 Parseval's Theorem

The bases defined in the subspaces Vj and Wi are orthonormal. Also, by definition, every subspace is orthogonal to every other subspace. Thus
$$\langle\varphi_{j,m}(t),\varphi_{j,n}(t)\rangle=\delta_{mn} \qquad (4.186)$$

$$\langle\psi_{i,m}(t),\psi_{l,n}(t)\rangle=\delta_{il}\,\delta_{mn} \qquad (4.187)$$

and

$$\langle\varphi_{j,m}(t),\psi_{i,n}(t)\rangle=0 \qquad (4.188)$$

with m, n, l, i ∈ Z. The signal x(t) in (4.178) is thus expanded in terms of basis vectors that are all orthonormal to each other, for which it holds that

$$\int\left|x(t)\right|^{2}dt=\sum_{k}\left|a_{j}(k)\right|^{2}+\sum_{i=j}^{\infty}\sum_{k}\left|b_{i}(k)\right|^{2}. \qquad (4.189)$$

The energy of the signal is equal to the sum of the squares of the expansion coefficients. This is easily verified by inserting the coefficients of the expansion (4.178) into the scalar product ⟨x(t), x(t)⟩ and using the relations (4.186)–(4.188).
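For the signal of Example 4.6, relation (4.189) can be checked directly (our numeric verification): the signal energy is (64 + 16 + 64 + 100)/4 = 61, which equals the sum of the squared coefficients (4.185).

```python
import numpy as np

# Piecewise-constant values of x(t) in Eq. (4.181) on the four quarter intervals
values = np.array([8.0, -4.0, 8.0, -10.0])
signal_energy = np.sum(values**2) * 0.25        # each piece lasts 1/4

# Wavelet coefficients computed in Example 4.6 (Eq. 4.185)
coeffs = np.array([0.5, 1.5, 3*np.sqrt(2), 4.5*np.sqrt(2)])
coeff_energy = np.sum(coeffs**2)
```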
4.5.5 Dilation Equations for Dyadic Wavelets

The scale-2 translation and dilation equations ensure the orthonormality of the wavelet basis and allow the MRA representation of any function x(t) ∈ L²(R). In this section, by analyzing other properties of ϕi,k(t) and ψi,k(t), we will see how it is possible to associate with these functions the finite-duration impulse response of a digital filter.

Remark 4.31 Note that in wavelet analysis, operations between different domains, in this case the continuous-time (CT) and discrete-time (DT) domains, are very often defined empirically. Even if not correct from a formal point of view, this allows a simpler treatment for those familiar with DT circuits.
4.5.5.1 Dilation Equations

From the previous definitions of nested subspaces, in particular from the causal MRA property (4.171), since V0 ⊂ V1, if ϕ(t) ∈ V0 then ϕ(t) ∈ V1. Hence, every function belonging to V1 (including ϕ(t) itself) can be represented as a linear combination of the basis V1 = span_k{2^{1/2}ϕ(2t − k)}. Therefore, the following theorems apply.

Theorem 4.3 Dilation equations - There always exists a set of real coefficients g0[k] such that the function ϕ(t) ∈ V0 ⊂ V1 can be represented by the following linear combination

$$\varphi(t)=\sqrt{2}\sum_{k}g_{0}[k]\,\varphi(2t-k) \qquad (4.190)$$
and in general it applies that

$$\varphi(2^{i}t)=2^{1/2}\sum_{k}g_{0}[k]\,\varphi(2^{i+1}t-k). \qquad (4.191)$$

The above expressions are known by different names, including dilation equations, refinement equations, scale-2 difference equations, etc. In the following, we will also see how the coefficients of the linear combination (4.190) can be assimilated to a digital FIR filter with impulse response g0 = [g0[0] ··· g0[N − 1]]^T.

Theorem 4.4 An important property of the coefficients g0 is that the following normalization holds:

$$\sum_{k}g_{0}[k]=\sqrt{2}. \qquad (4.192)$$
Proof Integrating both members of (4.190), we have

$$\int_{-\infty}^{\infty}\varphi(t)\,dt=\sqrt{2}\sum_{k}g_{0}[k]\int_{-\infty}^{\infty}\varphi(2t-k)\,dt$$

and, making the change of variable u = 2t − k in the integral on the right, we have

$$\int_{-\infty}^{\infty}\varphi(t)\,dt=\frac{\sqrt{2}}{2}\sum_{k}g_{0}[k]\int_{-\infty}^{\infty}\varphi(u)\,du\;\Rightarrow\;\sum_{k}g_{0}[k]=\sqrt{2}.$$
Theorem 4.5 Considering the Fourier transform Φ(ω) = F{ϕ(t)}, for ω → 0 it holds that

$$\Phi(\omega)\big|_{\omega=0}=\int_{-\infty}^{\infty}\varphi(t)\,e^{-j\omega t}dt\Big|_{\omega=0}=1 \qquad (4.193)$$

for which Φ(ω) has lowpass characteristics and also allows the representation of continuous signals.

Theorem 4.6 Given ϕ(t) such that

$$\langle\varphi(t),\varphi(t+k)\rangle=\delta(k),\;k\in\mathbb{Z}\;\Rightarrow\;\sum_{k=-\infty}^{\infty}\left|\Phi(\omega+2k\pi)\right|^{2}=1. \qquad (4.194)$$
Proof Observe that the scalar product on the left side of the preceding expression corresponds to the autocorrelation function of ϕ(t) evaluated at integer translations, i.e., letting ρ(τ) = ⟨ϕ(t), ϕ(t + τ)⟩, the (4.194) (repetition with unit step) corresponds to the product ρ(τ)·s_k(τ), with s_k(τ) equal to a train of pulses δ(τ − kT). Thus, by the convolution Theorem 2.2 (see Sect. 2.4.2.3), for T = 1 we have

$$\rho(\tau)\cdot\sum_{k=-\infty}^{\infty}\delta(\tau-k)\;\;\Rightarrow\;\;\Phi(\omega)\ast\sum_{k=-\infty}^{\infty}\delta(\omega-2\pi k)=\sum_{k=-\infty}^{\infty}\Phi(\omega-2\pi k) \qquad (4.195)$$

which proves (4.194). Observe also that, by the unit-norm normalization of ϕ(t) and Parseval's theorem (energy conservation), it holds that

$$\sum_{k=-\infty}^{\infty}\left|\Phi(\omega+2\pi k)\right|^{2}=1. \qquad (4.196)$$
Theorem 4.7 The Fourier transform Φ(ω) = F{ϕ(t)}, for (4.190), is equal to the product of the DTFT of the filter g0 and the CTFT of ϕ(t) evaluated at ω/2. Formally

$$\Phi(\omega)=\frac{1}{\sqrt{2}}\underbrace{G_{0}(e^{j\omega/2})}_{\text{DTFT}}\;\underbrace{\Phi(\omega/2)}_{\text{CTFT}}. \qquad (4.197)$$

Proof We take the Fourier transform of the expression (4.190), for which it holds

$$\begin{aligned}\Phi(\omega)&=\int\varphi(t)\,e^{-j\omega t}dt=\sqrt{2}\sum_{k}g_{0}[k]\int\varphi(2t-k)\,e^{-j\omega t}dt\\ &=\frac{\sqrt{2}}{2}\sum_{k}g_{0}[k]\int\varphi(t)\,e^{-j\omega t/2}e^{-j\omega k/2}dt=\frac{1}{\sqrt{2}}\underbrace{\sum_{k}g_{0}[k]e^{-j\omega k/2}}_{\text{DTFT}}\;\underbrace{\int\varphi(t)\,e^{-j\omega t/2}dt}_{\text{CTFT}}\\ &=\frac{1}{\sqrt{2}}\,G_{0}(e^{j\omega/2})\cdot\Phi(\omega/2).\end{aligned}$$

Remark 4.32 Note that the term G0(e^{jω/2}) can be seen as the DTFT of the digital filter g0. The expression (4.197) is important because it allows us to formally link the continuous-time wavelets with the impulse responses of such digital filters. Furthermore, the following property holds.

Property 4.15 Equation (4.197) allows us to:

i. determine the wavelet basis by iterating a digital filter;
ii. determine algorithms implementing the wavelet transform directly in time, without specific knowledge of the functions ϕ(t) and ψ(t).

Theorem 4.8 For the filter g0, it holds that

$$\left|G_{0}(e^{j\omega})\right|^{2}+\left|G_{0}(e^{j(\omega+\pi)})\right|^{2}=2. \qquad (4.198)$$
Proof Substituting (4.197) into (4.196), we can write

$$\sum_{k=-\infty}^{\infty}\left|\frac{1}{\sqrt{2}}\,G_{0}(e^{j(\omega/2+k\pi)})\,\Phi(\omega/2+k\pi)\right|^{2}=1$$

and, developing the square and separating even and odd k,

$$\frac{1}{2}\left|G_{0}(e^{j\omega/2})\right|^{2}\sum_{k=-\infty}^{\infty}\left|\Phi(\omega/2+2k\pi)\right|^{2}+\frac{1}{2}\left|G_{0}(e^{j(\omega/2+\pi)})\right|^{2}\sum_{k=-\infty}^{\infty}\left|\Phi(\omega/2+(2k+1)\pi)\right|^{2}=1.$$

Again by (4.196), each periodized sum equals one, so that, with ω̂ = ω/2, we have |G0(e^{jω̂})|² + |G0(e^{j(ω̂+π)})|² = 2.
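Theorem 4.8 is easy to verify numerically for the Haar lowpass filter g0 = [1/√2, 1/√2]^T. The sketch below is our own check; the DTFT is evaluated on an assumed frequency grid.

```python
import numpy as np

def dtft(g, w):
    """DTFT G(e^{jw}) = sum_k g[k] e^{-jwk}, evaluated at the frequencies w."""
    k = np.arange(len(g))
    return np.array([np.sum(g * np.exp(-1j * wi * k)) for wi in w])

g0 = np.array([1.0, 1.0]) / np.sqrt(2)            # Haar lowpass, Eq. (4.220)
w = np.linspace(0.0, 2*np.pi, 256, endpoint=False)
power = np.abs(dtft(g0, w))**2 + np.abs(dtft(g0, w + np.pi))**2
# power should equal 2 at every frequency (Eq. 4.198)
```

Analytically, |G0(e^{jω})|² = 1 + cos ω for the Haar filter, so the two shifted terms sum to 2 exactly.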
4.5.5.2 Wavelet Equations

By similar reasoning to that for the dilation function, since V1 = V0 ⊕ W0, the wavelet function must also necessarily satisfy W0 = V1 ⊕ (−V0). Therefore, for any ψ(t) ∈ W0 ⊂ V1, the following theorems hold.

Theorem 4.9 Existence - There always exists a set of real coefficients g1[k] such that the function ψ(t) ∈ W0 can be represented by the following linear combination

$$\psi(t)=\sqrt{2}\sum_{k}g_{1}[k]\,\varphi(2t-k) \qquad (4.199)$$

called the wavelet equation, generalized as

$$\psi(2^{i}t)=2^{1/2}\sum_{k}g_{1}[k]\,\varphi(2^{i+1}t-k). \qquad (4.200)$$

Recall that, by (4.147), ψi,k(t) has zero mean, from which we can derive the properties needed for the filter g1.

Property 4.16 For the filter g1[k], it holds that

$$\sum_{k}g_{1}[k]=0. \qquad (4.201)$$
Proof Integrating both members of (4.199), we have

$$\int_{-\infty}^{\infty}\psi(t)\,dt=\sqrt{2}\sum_{k}g_{1}[k]\int_{-\infty}^{\infty}\varphi(2t-k)\,dt. \qquad (4.202)$$

Since, by definition, $\int_{-\infty}^{\infty}\psi(t)\,dt=0$ [see Eq. (4.147)] while $\int_{-\infty}^{\infty}\varphi(2t-k)\,dt=\tfrac12\neq 0$, (4.201) holds.

Property 4.17 From the Fourier transform Ψ(ω) = F{ψ(t)}, for ω → 0 (again by (4.147)), it holds that
$$\Psi(\omega)\big|_{\omega=0}=\int_{-\infty}^{\infty}\psi(t)\,e^{-j\omega t}dt\Big|_{\omega=0}=0 \qquad (4.203)$$

for which Ψ(ω) blocks the DC component; i.e., it has non-lowpass characteristics.

Remark 4.33 As illustrated by the example in Fig. 4.52, given the filter coefficients g1[k] and the father function ϕ(t), the expression (4.199) allows us to determine the wavelet function.

Theorem 4.10 For the filter g1 we have that

$$g_{1}[k]=(-1)^{k}\,g_{0}[-k+1] \qquad (4.204)$$
i.e., the orthonormality of the bases in the subspaces Vi (see Eq. (4.160)) is directly related to the orthonormality of the impulse responses of the filters g0 and g1.

Proof Let us consider the Fourier transform of the wavelet equation (4.199). Proceeding in the same way as for the dilation equation, it is easy to verify that

$$\Psi(\omega)=\frac{1}{\sqrt{2}}\,G_{1}(e^{j\omega/2})\cdot\Phi(\omega/2). \qquad (4.205)$$

By definition ψi,k(t) ∈ Wi, ϕi,k(t) ∈ Vi, and Vi ⊥ Wi, for which ⟨ψ(t), ϕ(t − k)⟩ = 0, and then

$$\langle\psi(t),\varphi(t-k)\rangle=0,\;\forall k\in\mathbb{Z}\;\Rightarrow\;\int_{-\infty}^{\infty}\Psi(\omega)\,\Phi^{*}(\omega)\,e^{j\omega k}d\omega=0 \qquad (4.206)$$

Fig. 4.52 Haar wavelet function determined with the expression (4.199)
or, similarly to (4.195), we have

$$\int_{-\infty}^{\infty}\sum_{l}\Psi(\omega+2l\pi)\,\Phi^{*}(\omega+2l\pi)\,e^{j\omega k}d\omega=0 \qquad (4.207)$$

whereby

$$\sum_{l}\Psi(\omega+2\pi l)\,\Phi^{*}(\omega+2\pi l)=0. \qquad (4.208)$$

Substituting (4.197) and (4.205) into (4.208), and separating the even and odd components,

$$\begin{aligned}&\frac{1}{2}\sum_{k=2l\in\mathbb{Z}}G_{1}(e^{j(\omega/2+k\pi)})\,\Phi(\omega/2+k\pi)\cdot G_{0}^{*}(e^{j(\omega/2+k\pi)})\,\Phi^{*}(\omega/2+k\pi)\\ &\quad+\frac{1}{2}\sum_{k=(2l+1)\in\mathbb{Z}}G_{1}(e^{j(\omega/2+k\pi)})\,\Phi(\omega/2+k\pi)\cdot G_{0}^{*}(e^{j(\omega/2+k\pi)})\,\Phi^{*}(\omega/2+k\pi)=0\end{aligned} \qquad (4.209)$$

so that, by the 2π-periodicity of the DTFTs and by (4.196), with ω̂ = ω/2,

$$G_{1}(e^{j\hat{\omega}})\,G_{0}^{*}(e^{j\hat{\omega}})+G_{1}(e^{j(\hat{\omega}+\pi)})\,G_{0}^{*}(e^{j(\hat{\omega}+\pi)})=0. \qquad (4.210)$$
Note that the condition of orthonormality between the impulse responses of the filters g0 and g1 can also be proved by the following theorem.

Theorem 4.11 The coefficients g0 and g1 are the lowpass and highpass impulse responses of a two-channel orthonormal filter bank (FB).

Proof By the orthonormality property and the scaling equation, we can write

$$\langle\varphi(t+l),\varphi(t+m)\rangle=\Big\langle\sqrt{2}\sum_{k}g_{0}[k]\,\varphi(2t+2l-k),\;\sqrt{2}\sum_{n}g_{0}[n]\,\varphi(2t+2m-n)\Big\rangle$$

and, for k − 2l → k₁ and n − 2m → n₁, we get

$$\begin{aligned}\langle\varphi(t+l),\varphi(t+m)\rangle&=2\Big\langle\sum_{k_{1}}g_{0}[k_{1}+2l]\,\varphi(2t-k_{1}),\;\sum_{n_{1}}g_{0}[n_{1}+2m]\,\varphi(2t-n_{1})\Big\rangle\\ &=\sum_{k_{1}}g_{0}[k_{1}+2l]\,g_{0}[k_{1}+2m]=\delta[l-m]\end{aligned} \qquad (4.211)$$

since ⟨ϕ(2t − k₁), ϕ(2t − n₁)⟩ = ½δ[k₁ − n₁]; that is, the lowpass and highpass filters are orthogonal to their even translations. Similarly, the lowpass filter is orthogonal to the highpass filter and its even translations. In other words, gi[n − 2k], for i = 0, 1, is an orthonormal set and can be used to construct an orthogonal FB.
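Theorem 4.11 can be illustrated with the minimal two-channel orthonormal Haar bank. The sketch below is our own implementation (it exploits the fact that the length-2 Haar filters act on non-overlapping sample pairs, so filtering plus downsampling reduces to pairwise combinations): the bank analyzes a signal into lowpass/highpass branches and reconstructs it exactly while preserving energy.

```python
import numpy as np

g0 = np.array([1.0,  1.0]) / np.sqrt(2)   # lowpass, Eq. (4.220)
g1 = np.array([1.0, -1.0]) / np.sqrt(2)   # highpass, Eq. (4.222)

def analyze(x):
    """Filter by g0/g1 and downsample by 2 (Haar: non-overlapping pairs)."""
    lo = g0[0]*x[0::2] + g0[1]*x[1::2]
    hi = g1[0]*x[0::2] + g1[1]*x[1::2]
    return lo, hi

def synthesize(lo, hi):
    """Upsample and filter with the synthesis pair; for the orthonormal
    Haar bank this inverts analyze() exactly."""
    x = np.empty(2 * len(lo))
    x[0::2] = g0[0]*lo + g1[0]*hi
    x[1::2] = g0[1]*lo + g1[1]*hi
    return x

x = np.array([8.0, -4.0, 8.0, -10.0, 3.0, 0.0, -1.0, 5.0])
lo, hi = analyze(x)
xr = synthesize(lo, hi)
```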
4.5.5.3 Regularity Conditions for h0, h1, g0, g1, and H(z)

In agreement with the above proof, we report below the properties of the filter bank g0 and g1 (extended by symmetry also to the filters h0 and h1), such that these impulse responses are precisely the expansion coefficients of the dilation and wavelet functions:

1. the filters h0, h1, and g0, g1 constitute a perfect reconstruction filter bank;
2. the sum of the coefficients of the filters h0, h1 results in

$$\sum_{n}h_{0}[n]=\sqrt{2},\qquad \sum_{n}h_{1}[n]=0; \qquad (4.212)$$

3. the sum of the coefficients of the filters g0, g1 results in

$$\sum_{n}g_{0}[n]=\sqrt{2},\qquad \sum_{n}g_{1}[n]=0; \qquad (4.213)$$

4. the L² norm of the filters results in

$$\|h_{0}\|=\|h_{1}\|=\|g_{0}\|=\|g_{1}\|=1; \qquad (4.214)$$

5. the transfer functions H0(z), H1(z), and G0(z), G1(z) must be regular (i.e., passive).

For the existence of the wavelet and dilation functions, all the previous conditions must be satisfied.

Regularity conditions for H(z) - The condition for a regular bank is that the lowpass prototype has at least one zero at z = −1. This emerges from the Daubechies sufficient conditions [18]; to verify this condition, the Nz zeros at z = −1 can be separated as

$$H(z)=\sqrt{2}\left(\frac{1+z^{-1}}{2}\right)^{N_{z}}F(z) \qquad (4.215)$$

and, denoting by B the upper bound of the modulus of the remaining part on the unit circle,

$$B=\sup_{\omega\in[0,2\pi]}\left|F(e^{j\omega})\right|. \qquad (4.216)$$
Remark 4.34 Observe that conditions 1–5 coincide with the conditions of perfect reconstruction filter banks. Now, denoting by h[n] the impulse response of a lowpass prototype, its translations constitute an orthogonal set of functions. This is true if:

i. the impulse responses h0, h1, g0, and g1 belong to a two-channel perfect reconstruction analysis–synthesis bank;
ii. the CQF conditions (see Sect. 4.3.2.6, Eqs. (4.80)–(4.81)) hold.

Thus, for the analysis bank, the following holds

$$\begin{aligned} h_{0}[n]&=h[n]\\ h_{1}[n]&=(-1)^{(N-1-n)}h[N-n-1] \end{aligned} \qquad (4.217)$$

together with the conditions (4.82)–(4.83). For the synthesis bank, neglecting the scale factor 2, we have that

$$\begin{aligned} g_{0}[n]&=h[N-1-n]\\ g_{1}[n]&=(-1)^{n}h[n]. \end{aligned} \qquad (4.218)$$

In the case of a filter h with only two coefficients, as for the Haar wavelet, the impulse responses of the filters g0 and g1 can be determined unambiguously by explicitly writing the properties (4.213) combined with (4.214). So for the filter g0 we have

$$\begin{aligned} g_{0}[0]+g_{0}[1]&=\sqrt{2}\\ g_{0}^{2}[0]+g_{0}^{2}[1]&=1 \end{aligned} \qquad (4.219)$$

and, solving the system, we have

$$\mathbf{g}_{0}=\begin{bmatrix}\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\end{bmatrix}^{T}. \qquad (4.220)$$
While for the filter g1 it holds that

$$\begin{aligned} g_{1}[0]+g_{1}[1]&=0\\ g_{1}^{2}[0]+g_{1}^{2}[1]&=1 \end{aligned} \qquad (4.221)$$

so, solving, we have

$$\mathbf{g}_{1}=\begin{bmatrix}\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\end{bmatrix}^{T}. \qquad (4.222)$$
Remark 4.35 Note that the filter g1 can also be determined directly through property (4.204).

Example 4.7 For the Haar wavelet, characterized by the father function (4.157), the dilation equation (4.190) is verified with the filter (4.220):

$$\varphi(t)=\sqrt{2}\,g_{0}[0]\,\varphi(2t)+\sqrt{2}\,g_{0}[1]\,\varphi(2t-1)=\varphi(2t)+\varphi(2t-1) \qquad (4.223)$$
and the wavelet equation is verified for the filter (4.222):

$$\psi(t)=\sqrt{2}\,g_{1}[0]\,\varphi(2t)+\sqrt{2}\,g_{1}[1]\,\varphi(2t-1)=\varphi(2t)-\varphi(2t-1). \qquad (4.224)$$

In fact, from the preceding, for the father function and the wavelet function we have that

$$\begin{aligned} \varphi(t)&=\sqcap_{0}^{1/2}(t)+\sqcap_{1/2}^{1}(t)\\ \psi(t)&=\sqcap_{0}^{1/2}(t)-\sqcap_{1/2}^{1}(t). \end{aligned} \qquad (4.225)$$
4.5.6 Compact Support Orthonormal Wavelet Basis

A very important class of wavelet functions in practical applications is that of compactly supported orthonormal wavelet bases, i.e., of exactly finite duration between $0 \le t < T$ [18–20]. For the construction of limited-support orthogonal wavelets we take the dilation equation (4.190) and the wavelet equation (4.199) and rewrite them as

$$\varphi(2^{i+1} t) = 2^{i/2} \sum_k g_0[k + 2n]\,\varphi(2^i t - k) \tag{4.226}$$

$$\psi(2^{i+1} t) = 2^{i/2} \sum_k g_1[k + 2n]\,\varphi(2^i t - k). \tag{4.227}$$
From the above expressions, it is immediate to see that the compactness of the supports of $\varphi$ and $\psi$ is guaranteed if the number of coefficients of $g_0$ and $g_1$ is finite, and thus if the bank filters are of FIR type.
4.5.6.1 Determination of the Father Function ϕ(t) from g0
As shown in [19], determining the dilation and wavelet functions starting from the filter $g_0$ has the advantage of ensuring, and allowing better control of, the support of $\varphi(t)$ and $\psi(t)$. The support is, in this case, compact, i.e., null outside a specific time region. To determine the connection between the filter $g_0$ and the function $\varphi(t)$, we consider the expansion from the space $V_{-L}$ to the space $V_0$, represented by (4.226) as a series connection of filters and interpolators by 2, as shown in Fig. 4.53.
Fig. 4.53 Circuit representation of the expression (4.226)
4 Multi-rate Audio Processing and Wavelet Transform
In this case, we can write the input–output relations for each block and substitute backwards so that we get the overall TF between $\Phi_0(\omega)$ and $\Phi_{-L}(\omega)$. Starting with the block on the left, by Theorems 4.3 and 4.7, we can write

$$\Phi_{-L+1}\!\left(\frac{j\omega}{2^{L-1}}\right) = \frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/2^{L}}\big)\cdot \Phi_{-L}\!\left(\frac{j\omega}{2^{L}}\right)$$

and

$$\Phi_{-L+2}\!\left(\frac{j\omega}{2^{L-2}}\right) = \frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/2^{L-1}}\big)\cdot \Phi_{-L+1}\!\left(\frac{j\omega}{2^{L-1}}\right) = \frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/2^{L-1}}\big)\cdot\frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/2^{L}}\big)\cdot \Phi_{-L}\!\left(\frac{j\omega}{2^{L}}\right)$$

$$\vdots$$

for which it holds

$$\Phi_0(j\omega) = \frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/2}\big)\cdot\frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/4}\big)\cdots\frac{1}{\sqrt{2}}\,G_0\big(e^{j\omega/2^{L}}\big)\cdot \Phi_{-L}\!\left(\frac{j\omega}{2^{L}}\right) \tag{4.228}$$
where, for $L \to \infty$, $\omega/2^L \to 0$; thus the term $\Phi(\omega/2^L)$, by the lowpass property (4.193), converges to the value $\Phi(0) = 1$. It is then true that

$$\Phi_0(j\omega) = \prod_{k=0}^{\infty} 2^{-1/2}\, G_0\big(e^{j\omega/2^{k+1}}\big) \tag{4.229}$$
considering that the filter $g_0$ has a regular $G_0(z)$ (i.e., by definition the filter is passive and has specific properties), this limit exists and is the Fourier transform of the dilation function. By transforming to the DT domain, we can write the dilation function with the following recursive formula

$$\mathbf{g}^{(k)} = 2\!\uparrow\!\mathbf{g}^{(k-1)}, \qquad \boldsymbol{\varphi}^{(k+1)} = \boldsymbol{\varphi}^{(k)} * \sqrt{2}\,\mathbf{g}^{(k)}, \qquad k = 1, \ldots, K-1 \tag{4.230}$$

with the following initial conditions, for $k = 0$,

$$\mathbf{g}^{(0)} = \mathbf{g}_0, \qquad \boldsymbol{\varphi}^{(1)} = \begin{bmatrix} 0 & \sqrt{2}\,\mathbf{g}_0^T & \mathbf{0}_{N-2}^T \end{bmatrix}^T$$
where $K$ represents the number of iterations, $\boldsymbol{\varphi}^{(k)}$ is the vector which contains the sampled dilation function at the $k$-th iteration, $2\!\uparrow$ is the zero-filling operator (see Sect. 4.2.2), $\mathbf{g}^{(k)}$ is the zero-filled vector, and "$*$" is the full convolution operator. The
(4.230) is a recursive convolution whereby, at each iteration, the filter $\mathbf{g}^{(k)}$ increases its length. Given a filter $g_0$ of length $N$, the length of the sampled dilation function after $K$ iterations can be computed as

$$N_T = 2^K (N-1) + 1. \tag{4.231}$$
The described procedure is performed in the DT domain, so we need to return the functions to the correct time dimension. By comparing the length $N_T$ of the filter with the time interval $t \in [0, T)$, we can define the time step $t_s = T/N_T$, by which the function $\varphi(t)|_{t = t_s \cdot n}$ is determined for $n = 0, 1, \ldots, N_T - 1$. In practice, $T$ is determined by the a priori known length of the impulse response of the filter $g_0$, as $T = N - 1$, so

$$t_s = (N-1)/N_T. \tag{4.232}$$

The recursive procedure (4.230) is schematically illustrated with the circuit in Fig. 4.54.

Remark 4.36 The computation of $\varphi(t)$ within the interval $t \in [0, T)$ (typically, $T$ is equal to the length $N-1$ of the filter $g_0$) is performed with a higher degree of resolution (or sampling) as the number of iterations in Eq. (4.230) increases. Given that the convolution produces a gradually longer sequence, the class of wavelet functions generated by this procedure is a sequence corresponding to the function $\varphi(t)$, $t \in [0, T)$, sampled with $N_T$ samples and, by definition, null outside this range. This property makes the wavelet transform thus generated able to operate on a finite set of data (a property that is formally called compact support). Furthermore, the wavelets generated by such a procedure appear, by visual inspection, to be extremely irregular. This is due to the fact that the recursion equation ensures that the resulting function $\varphi(t)$ is everywhere non-differentiable. The coefficients of the filters used to determine $\varphi(t)$ (and consequently $\psi(t)$) are chosen from some specific set of families and, as a result, the resulting wavelet functions have a rather recognizable shape.

Determination of the Wavelet Function ψ(t) from g1
Fig. 4.54 DT circuit that produces a gradually longer sequence corresponding to the sampled form of the scaling function ϕ(t), in the interval t ∈ [0, T). The term √2 compensates the gain (equal to 2) due to zero-filling: √2 = 2/√2
It is possible to determine the wavelet function with a simple linear combination of $\varphi(t)$ functions translated and scaled, by applying the expression (4.199). The term $k$ appearing in the wavelet equation is the index of the vector $g_0[k]$ while, in the case of the function $\varphi(2t-k)$, the term $k$ corresponds to a time translation of $k$ [sec]. Considering the sampling period $t_s$ defined in (4.232), the expression (4.199) in the DT domain takes the form

$$\psi[n] = \sqrt{2}\,\sum_k g_1[k]\cdot\varphi[2n - k/t_s] \tag{4.233}$$
where $g_1$ is determined from $g_0$ with the CQF conditions described above.
4.5.6.2 Example: Dilation Function ϕ(t) and Daubechies Wavelet ψ(t)
As an example, we consider Daubechies wavelets of order $N = 4$ [19]. The impulse response of the $g_0$ filter, computed in the example in §4.3.3.1 (see Eq. (4.98)), is

$$\mathbf{g}_0 = \frac{1}{4\sqrt{2}} \begin{bmatrix} 1+\sqrt{3} & 3+\sqrt{3} & 3-\sqrt{3} & 1-\sqrt{3} \end{bmatrix} = \begin{bmatrix} 0.4830 & 0.8365 & 0.2241 & -0.1294 \end{bmatrix}. \tag{4.234}$$
Figures 4.55 and 4.56 show an example of the computation of Eq. (4.230), using the iterative procedure described in the previous section.
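The recursion (4.230) can be transcribed in a few lines; the version below (a minimal Python sketch, not the Wavelet Toolbox routine mentioned in the figures) also checks the length predicted by (4.231):

```python
import numpy as np

# Daubechies order-4 lowpass filter g0, Eq. (4.234)
g0 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
N = len(g0)                                  # N = 4
K = 8                                        # number of iterations

# Initial conditions of (4.230): g^(0) = g0, phi^(1) = [0, sqrt(2)*g0, 0_{N-2}]
g = g0.copy()
phi = np.concatenate(([0.0], np.sqrt(2) * g0, np.zeros(N - 2)))

for _ in range(1, K):
    up = np.zeros(2 * len(g) - 1)            # zero-filling operator 2-up
    up[::2] = g
    g = up
    phi = np.convolve(phi, np.sqrt(2) * g)   # recursive (full) convolution

NT = 2 ** K * (N - 1) + 1                    # expected length, Eq. (4.231)
ts = (N - 1) / NT                            # time step, Eq. (4.232)
```

For $K = 8$ the sampled dilation function has $N_T = 769$ points on $t \in [0, 3)$, consistent with the 7–8 iterations reported in the text for a good approximation.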
Fig. 4.55 DT dilatation and wavelet functions computed with the iterative procedure starting with impulse responses of the g0 and g1 filters, and time axis determined with (4.232). a Daubechies dilation function and wavelet of order 4, computed by iterative procedure in Eq. (4.230). b and c Simple Matlab code used to generate the sampled version of the dilation and wavelet functions
Fig. 4.56 Example of DT dilatation and wavelet functions computation with 9 iterations (similar to the Matlab procedure [phi,psi,tts] = wavefun(’db2’,9))
Fig. 4.57 Daubechies wavelet of order N = 4: a value of coefficients; b amplitude spectrum of analysis–synthesis filters
As shown in [18], to obtain a dilation function $\varphi(t)$ with a good approximation, the procedure converges in 7–8 iterations of the recursion of Eq. (4.230). Since, by definition, the filter bank has perfect reconstruction, the coefficients $h_0$, $h_1$, $g_0$ and $g_1$ of the analysis and synthesis filter banks can be derived from (4.80) to (4.83). Figure 4.57 shows the trends of the filter coefficients with the corresponding amplitude spectra of Daubechies wavelets of order $N = 4$. The wavelet function can be derived by applying (4.227). Figure 4.58 shows the dilation function $\varphi(2t-k)$ (defined in the interval $0 \le t \le 3$) calculated with the expression (4.233). The bottom of Fig. 4.58 shows the wavelet function plot obtained from the linear combination of the functions $\varphi(2t-k)$ with the filter coefficients $\sqrt{2}\,g_1[k]$.
Fig. 4.58 Daubechies wavelet function of order N = 4, determined with the expression (4.233) for a number of iterations of (4.230) equal to 4 (left) and 10 (right)
4.5.7 Wavelet for Discrete-Time Signals

The sequences wavelet transform (SWT) can be defined by sampling the signal $x(t) \in V_0 \subset L^2(\mathbb{R})$ and the wavelet functions ($\varphi(t) \to \varphi[n]$ and $\psi(t) \to \psi[n]$). Consider an arbitrary quadratically summable sequence $x[n] \in V_0 \subset l^2(\mathbb{Z})$; the signal is projected onto a finite number of subspaces $V_i$, elements of a proper subspace of $l^2(\mathbb{Z})$. Considering the decomposition from level 0 to level $L$, we have

$$V_0 = V_{-L} \oplus W_{-L} \oplus \cdots \oplus W_{-2} \oplus W_{-1}. \tag{4.235}$$
Hence, denoting the projections of $x[n] \in V_0$ on the subspaces $V_i$ as $a_i[n] \in V_i$, i.e.,

$$a_{-1}[n] \in V_{-1}, \ldots, a_{-L}[n] \in V_{-L} \tag{4.236}$$

and the projections onto the subspaces $W_i$ as

$$b_{-1}[n] \in W_{-1}, \ldots, b_{-L}[n] \in W_{-L}, \tag{4.237}$$

the signal $x[n] \in V_0$ can always be expanded in the terms

$$x[n] \to \big[\, a_{-L}[n],\ b_{-L}[n],\ b_{-L+1}[n],\ \ldots,\ b_{-1}[n] \,\big]^T = \big[\, \mathbf{a}_{-L}^T\ \ \mathbf{b}^T \,\big]^T \tag{4.238}$$

and uniquely reconstructed as the sum of the various contributions
$$x[n] = a_{-L}[n] + b_{-L}[n] + b_{-L+1}[n] + \cdots + b_{-1}[n]. \tag{4.239}$$
The wavelet series development consists of calculating the coefficients $a_j[n]$ and $b_j[n]$ by deriving them recursively, down to $a_{-L}[n]$. This computation, which in practice is performed with a dyadic $L$-level filter bank, is by definition of discrete type and is known as a discrete-time wavelet transform, here indicated as discrete-time DWT (DT-DWT) or sequences wavelet transform (SWT).⁶
4.5.7.1 Discrete-Time DWT with Filter Bank
In the discrete-time domain, there is a precise relationship between dyadic wavelets and perfect reconstruction filter banks [6, 26–28]. This relation makes the computation of the DT-DWT very efficient, without explicit knowledge of the functions $\varphi(t)$ and $\psi(t)$, and implies interesting practical considerations that are very useful, especially in the analysis and the compressed coding of audio and video signals. Let us now see how the computation of the development coefficients for the projection of a signal from the space $V_{i+1}$ into the subspaces $V_i$ and $W_i$ can be traced to a two-channel analysis filter bank [6]. Since $V_{i+1} = V_i \oplus W_i$, the signal can be developed into the series of its projections into the subspaces $V_i$ and $W_i$, so applying the expression (4.178) we have

$$x[n] = \sum_k a_i[k]\cdot\varphi_{i,k}[n] + \sum_k b_i[k]\cdot\psi_{i,k}[n] = \mathbf{a}_i^T\cdot\boldsymbol{\varphi}_{i,k} + \mathbf{b}_i^T\cdot\boldsymbol{\psi}_{i,k} \tag{4.240}$$

where the coefficients of the projections $a_i[k]$ and $b_i[k]$ are defined by the scalar products

$$a_i[k] = \langle x[n], \varphi_{i,k}[n]\rangle = \mathbf{x}_n^T\cdot\boldsymbol{\varphi}_{i,k} \tag{4.241}$$

$$b_i[k] = \langle x[n], \psi_{i,k}[n]\rangle = \mathbf{x}_n^T\cdot\boldsymbol{\psi}_{i,k}. \tag{4.242}$$

Recalling that it holds $\varphi(2^i t) = 2^{(i+1)/2}\cdot\sum_k g_0[k]\cdot\varphi(2^{i+1}t - k)$ and considering the generic translation $m$, with $\varphi_{i,m}(t) \in V_i$, we can write

$$\varphi(2^i t - m) = 2^{(i+1)/2}\cdot\sum_k g_0[k]\cdot\varphi(2^{i+1}t - k - 2m).$$

With the substitution $2m + k \to j$ this is $\varphi(2^i t - m) = 2^{(i+1)/2}\cdot\sum_j g_0[j - 2m]\cdot\varphi(2^{i+1}t - j)$, rewritten for expressive convenience as $\varphi_{i,k}(t) = \sum_m g_0[m - 2k]\cdot\varphi_{i+1,m}(t)$.
⁶ Note that the DWT as defined above in Sect. 4.5.3 is a continuous transform. In many texts and articles, however, it is common practice to denote the wavelet transform for sequences, or discrete-time wavelet transform, by the acronym DWT, so we encourage the reader to understand from the context the type of transform considered.
With the same reasoning we can also write $\psi_{i,k}(t) = \sum_m g_1[m - 2k]\cdot\varphi_{i+1,m}(t)$. In discrete time, then,

$$\varphi_{i,k}[n] = \sum_m g_0[m - 2k]\cdot\varphi_{i+1,m}[n] \tag{4.243}$$

$$\psi_{i,k}[n] = \sum_m g_1[m - 2k]\cdot\varphi_{i+1,m}[n]. \tag{4.244}$$
For the recursive, and efficient, calculation of the coefficients $a_i[k]$, substituting (4.243) into the scalar product (4.241) and remembering that $h_0[n] = g_0[N-1-n]$, we have that

$$\begin{aligned}
a_i[k] = \langle x[n], \varphi_{i,k}[n]\rangle &= \Big\langle x[n], \sum_m g_0[m-2k]\cdot\varphi_{i+1,m}[n]\Big\rangle \\
&= \sum_m g_0[m-2k]\cdot\langle x[n], \varphi_{i+1,m}[n]\rangle = \sum_m g_0[m-2k]\cdot a_{i+1}[m] \\
&= g_0[-n] * a_{i+1}[n]\big|_{n=2k} = \mathbf{h}_0^T\,\mathbf{a}_{i+1}\big|_{n=2k}.
\end{aligned} \tag{4.245}$$
Similarly, the coefficients $b_i[k]$ can also be derived recursively from $a_{i+1}[k]$. In fact, by substituting (4.244) into (4.242), we have that

$$\begin{aligned}
b_i[k] = \langle x[n], \psi_{i,k}[n]\rangle &= \Big\langle x[n], \sum_m g_1[m-2k]\cdot\varphi_{i+1,m}[n]\Big\rangle \\
&= \sum_m g_1[m-2k]\cdot\langle x[n], \varphi_{i+1,m}[n]\rangle = \sum_m g_1[m-2k]\cdot a_{i+1}[m] \\
&= \mathbf{h}_1^T\,\mathbf{a}_{i+1}\big|_{n=2k}.
\end{aligned} \tag{4.246}$$

The coefficients $a_i[n]$ can thus be computed as a convolution between the coefficients $a_{i+1}[n]$ and the filter $h_0$, followed by a subsampling by a factor of 2. The coefficients $b_i[n]$ can be computed similarly but with the filter $h_1$. The resulting circuit structure consists of a two-channel filter bank, as depicted in Fig. 4.59.
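Expressions (4.245)–(4.246) translate into a few lines of code. The following literal (and deliberately unoptimized) Python sketch applies them with the Haar pair of (4.254); the function name is illustrative:

```python
import numpy as np

def analysis_step(a_next, g0, g1):
    """One level of wavelet analysis, Eqs. (4.245)-(4.246):
    a_i[k] = sum_m g0[m - 2k] * a_{i+1}[m],  b_i[k] = sum_m g1[m - 2k] * a_{i+1}[m]."""
    K = len(a_next) // 2
    ai = np.zeros(K)
    bi = np.zeros(K)
    for k in range(K):
        for m in range(len(a_next)):
            j = m - 2 * k
            if 0 <= j < len(g0):
                ai[k] += g0[j] * a_next[m]
                bi[k] += g1[j] * a_next[m]
    return ai, bi

# Haar CQF pair, Eq. (4.254)
g0 = np.array([1.0, 1.0]) / np.sqrt(2)
g1 = np.array([1.0, -1.0]) / np.sqrt(2)

x = np.array([4.0, 2.0, 6.0, 8.0])   # a_{i+1}: coefficients at the finer scale
a, b = analysis_step(x, g0, g1)      # scaled pairwise sums and differences
```

For the orthonormal pair, the coefficient energy at the two half-rate outputs equals the input energy.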
Fig. 4.59 Two-channel analysis filter bank for dyadic wavelet expansion computation (projection into scale-2 subspaces)
Fig. 4.60 Two-channel filter bank for calculating the coefficients of the wavelet expansion. a Symbols; b complementary frequency response of the bank
Fig. 4.61 DT-DWT with filter bank: a dyadic analysis filter bank implementing DWT; b overall frequency response of the filter bank
4.5.7.2 Dyadic Filter Bank for Wavelet Analysis
The expressions (4.245) and (4.246) indicate that, in the DT domain, the wavelet coefficient decomposition is computed as the output of a two-channel analysis filter bank, and the coefficients $a_i[n]$ and $b_i[n]$ are evaluated at half the sampling rate of the coefficients of the previous level $a_{i+1}[n]$. If we consider the scaling-2 property and the definition of nested subspaces (4.235), it can be shown that the projection of $V_{i+1}$ onto $V_i$ corresponds to lowpass filtering, while the projection onto the subspace $W_i$ corresponds to highpass filtering. The $h_0[n]$ turns out to be the response of a lowpass filter, and $h_1[n]$ is the associated complementary highpass impulse response, as illustrated in Fig. 4.60. By recursively connecting the lowpass output to another two-channel filter bank identical to the previous one, as shown in Fig. 4.61, the decomposition results as indicated by (4.235)–(4.237). The DT-DWT is therefore realized with an octave filter bank implemented as a dyadic tree structure. The coefficients $b_1$ through $b_5$ thus correspond to highpass filtered sequences, while the coefficients $a_1$ to $a_5$ of the scaling function correspond to lowpass filtered sequences. The structure in Fig. 4.61, also referred to as the dyadic analysis filter bank, is the one usually used to implement the DT-DWT or SWT.
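Iterating the two-channel stage on the lowpass branch, as in Fig. 4.61, gives the full dyadic tree. A minimal sketch for the Haar case (Python for illustration; function names are hypothetical):

```python
import numpy as np

def haar_step(a):
    """One two-channel analysis stage (Haar CQF pair): lowpass and highpass
    outputs at half the rate of the input coefficients."""
    a = np.asarray(a, dtype=float)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2)
    return lo, hi

def dyadic_analysis(x, L):
    """L-level dyadic tree (Fig. 4.61): returns [a_-L, b_-L, ..., b_-1] as in (4.238)."""
    coeffs = []
    a = np.asarray(x, dtype=float)
    for _ in range(L):
        a, b = haar_step(a)
        coeffs.append(b)          # detail coefficients at this octave
    coeffs.append(a)              # final approximation a_-L
    return coeffs[::-1]

x = np.arange(8, dtype=float)
c = dyadic_analysis(x, 3)         # 1 + 1 + 2 + 4 = 8 coefficients in total
```

The total number of coefficients equals the signal length, and the orthonormal bank preserves the signal energy across all levels.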
Fig. 4.62 Discrete-time discrete wavelet transform (DT-DWT) with pyramid algorithm
4.5.7.3 Triangular Pyramid Algorithm
A dyadic filter bank is implemented with an algorithm proposed in [21], called the pyramid algorithm, described below. Consider a data block of length equal to a power of two, $N = 2^D$, where $D$ is the number of dilations; the algorithm is implemented with a dyadic filter bank fed with the dilated scale-2 signal. In practice, the dilation corresponds to a decimation by a factor of two, i.e., the highpass filters are fed with even samples, and the lowpass filters are fed with odd samples from the previous stage. The pyramid dilation scheme is illustrated in Fig. 4.62. Although Mallat's algorithm is computationally quite efficient and allows for the definition of appropriate hardware structures, one of the most robust methods for realizing the filter bank consists of the lattice structure shown in Fig. 4.63 (see [8, 28] for details).
4.5.7.4 DT Inverse DWT: Filter Bank for Wavelet Synthesis
The discrete-time inverse wavelet transform (DT-IDWT), a dual situation to the one previously presented, consists in reconstructing the signal in the space Vi+1 from its projections Vi and Wi . Starting from the wavelet projections ai [n] and bi [n] the
Fig. 4.63 Lattice structure implementing a 6th-order filter for wavelet transform reconstruction
reconstruction of the signal is done through a (synthesis) filter bank. In practice, we proceed similarly to the analysis, considering the space $V_{i+1}$ as the sum of the subspaces $V_i$ and $W_i$. Since $V_i \subset V_{i+1}$, a translation of $n$ in $V_{i+1}$ corresponds to a change equal to $2n$ in $V_i$ and in $W_i$. Formally

$$\varphi_{i+1,n}(t) = \sum_k \bar{g}_0[k]\cdot\varphi_{i,k+\frac{n}{2}}(t) + \sum_k \bar{g}_1[k]\cdot\psi_{i,k+\frac{n}{2}}(t) \tag{4.247}$$

$$\varphi_{i+1,n}(t) = \sum_m g_0[2m-n]\cdot\varphi_{i,m}(t) + \sum_m g_1[2m-n]\cdot\psi_{i,m}(t) \tag{4.248}$$
It is immediate to identify the above expression as the sum of two convolutions. The structure for the DT-IDWT is, therefore, a two-channel filter bank of the type illustrated in Fig. 4.64. Due to the oversampling effect, the filters, characterized by the impulse responses g0 and g1 , are fed only by the components of ai [n] and bi [n] with even index. The synthesis structure for performing DT-IDWT, perfectly dual to the analysis structure of Fig. 4.61, is shown in Fig. 4.65. Remark 4.37 The computation of the coefficients for the projection of the signal x[n] into its subspaces essentially corresponds to the computation of the output of a dyadic octave filter bank. In accordance with (4.245), the coefficients of the impulse response h0 [n] express, in fact, the expansion of the basis ϕi,k (t) in terms of the
Fig. 4.64 Two-channel synthesis filter bank and its representation for signal reconstruction from subspace projections
$\varphi_{i+1,k}(t)$; similarly, from (4.246), the coefficients of the impulse response $h_1[n]$ express the expansion of the basis $\psi_{i,k}(t)$ in terms of the $\psi_{i+1,k}(t)$. Since the signals are projected into the subspaces by direct summation, the discrete-time signal can be unambiguously decomposed with a two-channel filter bank. The analysis bank of Fig. 4.61 and the synthesis bank of Fig. 4.65 (or the respective pyramid implementation structures shown in Figs. 4.62 and 4.66) form an octave filter bank with perfect reconstruction; it follows that the coefficients $a_0[n]$ are reconstructed perfectly.

Remark 4.38 For the paraunitary filter bank for DT-DWT/DT-IDWT, it can be shown that (see [21] for details) it holds

$$\langle h[n-2k],\, h[n-2l]\rangle = \tfrac{1}{2}\cdot\delta_{kl}. \tag{4.249}$$
The impulse response $h[n]$ and its even translations constitute an orthogonal set of functions. This is true if, for the impulse responses $h_0[n]$, $h_1[n]$, $g_0[n]$, $g_1[n]$ of the analysis–synthesis banks, the CQF conditions (4.80)–(4.81) hold. The orthogonality conditions

$$\langle h_0[n-2k],\, g_0[N-n+2l-1]\rangle = \delta_{kl} \tag{4.250}$$

$$\langle h_1[n-2k],\, g_1[N-n+2l-1]\rangle = \delta_{kl} \tag{4.251}$$

also hold
for the lowpass and highpass channels, respectively.
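For the Haar pair, the whole analysis–synthesis chain and its perfect reconstruction property can be checked in a few lines. The closed-form interleaving below is specific to the two-tap Haar filters (Python, for illustration):

```python
import numpy as np

def haar_analysis(a):
    """Two-channel Haar analysis: lowpass/highpass outputs at half rate."""
    return (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)

def haar_synthesis(lo, hi):
    """Two-channel Haar synthesis (Fig. 4.64): upsampling by 2 and filtering
    with g0, g1 reduces, for the two-tap Haar pair, to this interleaving."""
    out = np.empty(2 * len(lo))
    out[0::2] = (lo + hi) / np.sqrt(2)
    out[1::2] = (lo - hi) / np.sqrt(2)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
x_rec = haar_synthesis(*haar_analysis(x))   # perfect reconstruction of x
```

Applying the synthesis stage level by level to the dyadic tree outputs reconstructs $a_0[n]$ exactly, as stated above.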
Fig. 4.65 DT-IDWT: synthesis filter bank for signal reconstruction with subspace projections
4.5.8 Wavelet Examples

In the following examples, the Wavelet Toolbox™ software from Matlab was used to define the scaling and wavelet functions.
4.5.8.1 Haar's Wavelet
Haar’s wavelet is among the simplest and most intuitive. In fact, the scaling function is simply the function (4.157) and for expressive convenience denoted as 10 (t), the scaling function is defined as (4.252) ϕ(t) = 10 (t) while the wavelet associated with it is equal to 1/2
ψ(t) = 0 (t) − 11/2 (t)
(4.253)
Furthermore, as determined in (4.220) and (4.222), it is worth
Fig. 4.66 DT-IDWT: synthesis filter bank for signal reconstruction with subspace projections: x[n] = aL [n] + bL [n] + bL−1 [n] + · · · + b1 [n]
Fig. 4.67 Haar (or db1) scaling and wavelet function and related amplitude spectrum plot
Fig. 4.68 Haar’s wavelet, impulse and frequency responses, and poly/zero diagram, of the h0 , h1 analysis and g0 , g1 synthesis FBs
$$\mathbf{g}_0 = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}^T, \quad \text{and} \quad \mathbf{g}_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}^T. \tag{4.254}$$
In Fig. 4.67, the scaling function and the Haar wavelet function are shown. However, in Fig. 4.68, the impulse responses and their amplitude spectra are shown. Obviously, analysis and synthesis FBs are complementary and this coincides with what was previously shown above. Remark 4.39 As it is easy to assume the frequency response of the Haar scaling function which is lowpass and defined as !1 (jω) =
1 · e−jωt dt =(sin π t/π t) · e−jπt .
(4.255)
0
In contrast, the frequency response of the wavelet, derived as a difference, has highpass characteristics. In fact
$$\Psi(j\omega) = \operatorname{sinc}(\omega/4)\cdot\sin(\omega/4) \tag{4.256}$$
and is zero for $\omega \to 0$, consistent with (4.203). The frequency responses of Haar's scaling and wavelet functions are shown in Fig. 4.67 (left). We can observe that the spectrum (4.256) decays slowly at high frequencies. It follows that the Haar wavelets have unsatisfactory spectral resolution.
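The closed form (4.256) can be confirmed by a direct quadrature of the Fourier integral of ψ(t). A Python sketch (note that numpy's `np.sinc` is the normalized sinc, hence the ω/(4π) argument to express sin(ω/4)/(ω/4)):

```python
import numpy as np

def psi_haar_ft(w, n=4000):
    """Numerical Fourier transform of the Haar wavelet
    (+1 on [0, 1/2), -1 on [1/2, 1)) by the midpoint rule."""
    t = (np.arange(n) + 0.5) / n
    psi = np.where(t < 0.5, 1.0, -1.0)
    return np.sum(psi * np.exp(-1j * w * t)) / n

w = np.array([0.5, 1.0, 3.0, 7.0])
numeric = np.array([abs(psi_haar_ft(wi)) for wi in w])
# Eq. (4.256): |Psi(jw)| = |sinc(w/4) * sin(w/4)|, with sinc(x) = sin(x)/x
closed = np.abs(np.sinc(w / (4 * np.pi)) * np.sin(w / 4))
```

The agreement at several test frequencies confirms the magnitude law, and the $\sin^2(\omega/4)/(\omega/4)$ form makes the slow $1/\omega$ high-frequency decay explicit.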
4.5.8.2 Wavelet Functions with the Inverse Fourier Transform
The Haar transform, very selective in time, has a Fourier transform equal to a $\operatorname{sinc}(\pi t)$, as defined by (4.255), which by its nature represents a lowpass filter with poor attenuation of the secondary lobes. If more frequency selectivity is needed, at the expense of less precise temporal localization, a wavelet function can be determined by a dual procedure: the father function is defined as a windowed $\operatorname{sinc}(\pi t)$, which allows an optimal trade-off between frequency selectivity and time locality.
4.5.8.3 Shannon's Wavelet
The Shannon wavelet is dual, in the sense of the Fourier transform, to the Haar wavelet. The scaling function is, in fact, defined as the inverse Fourier transform of the Haar wavelet. Therefore

$$\varphi(t) = \operatorname{sinc}(\pi t) \tag{4.257}$$

and

$$\Phi(j\omega) = \begin{cases} 1, & |f| < 1/2 \\ 0, & \text{otherwise.} \end{cases} \tag{4.258}$$

The orthogonality of $\varphi(t)$ with respect to integer translations is, in fact, already implicitly used in Shannon's sampling theorem. To calculate the wavelet associated with (4.257), it is convenient to consider the spectrum of the wavelet itself, which will have the form

$$\Psi(j\omega) = \mathcal{F}\{\varphi(2t) - \varphi(2t-1)\} = \frac{1}{2}\left[\Phi\!\left(j\frac{\omega}{2}\right) - e^{-j\frac{\omega}{2}}\,\Phi\!\left(j\frac{\omega}{2}\right)\right]$$

therefore it results

$$\Psi(j\omega) = \begin{cases} 1, & \pi \le |\omega| < 2\pi \\ 0, & \text{otherwise.} \end{cases} \tag{4.259}$$

Inverse transforming the previous expression, the wavelet function results in
$$\begin{aligned}
\psi(t) &= \frac{1}{2\pi}\int_{-2\pi}^{-\pi} e^{j\omega t}\,d\omega + \frac{1}{2\pi}\int_{\pi}^{2\pi} e^{j\omega t}\,d\omega \\
&= 2\,\frac{\sin(2\pi t)}{2\pi t} - \frac{\sin(\pi t)}{\pi t} \\
&= \operatorname{sinc}\big(\tfrac{\pi}{2}t\big)\cdot\cos\big(\tfrac{3\pi}{2}t\big).
\end{aligned} \tag{4.260}$$
While Haar’s wavelets have an unsatisfactory spectral characteristic, on the contrary Shannon’s wavelets, by definition, are very frequency selective and not very selective in the time domain.
4.5.8.4 Meyer's Wavelet
For the definition of the Meyer orthonormal smooth wavelet basis, the frequency response of the scaling function is, a priori, predetermined. A possible choice, for example, is the following

$$\Phi(j\omega) = \begin{cases}
1, & |\omega| < \frac{2}{3}\pi \\
\cos\!\left[\frac{\pi}{2}\,\theta\!\left(\frac{3}{4\pi}|\omega| - 1\right)\right], & \frac{2}{3}\pi \le |\omega| \le \frac{4}{3}\pi \\
0, & \text{otherwise}
\end{cases} \tag{4.261}$$
another possible choice is

$$\Phi(j\omega) = \begin{cases}
\sqrt{\theta\!\left(2 + \frac{3\omega}{2\pi}\right)}, & \cdots \\
\sqrt{\theta\!\left(2 - \frac{3\omega}{2\pi}\right)}, & \cdots
\end{cases}$$
For $\lambda > 0$, the low-frequency components will proceed slower than the high-frequency components. To evaluate this dispersion effect, we place at the input of the AP sections a signal consisting of the sum of five sinusoids of unit amplitude at frequencies $f = [625, 1250, 2500, 5000, 10000]$ Hz, shown in Fig. 5.23a. In Fig. 5.23b, c, the output signal and the 3D spectrum, evaluated with the short-time Fourier transform (STFT), are also shown. Observing the output signal and the trend of the spectrum as a function of time, it can be seen that the low-frequency component (at 625 Hz) appears at the output after a delay of about 80 ms. This is due to the dispersion introduced by the all-pass cells which, for $\lambda > 0$, is higher at low frequency. In particular, Fig. 5.24 shows the group delay of the cascade of the 500 1st-order all-pass cells that constitutes the warping operator $D(z)$. From the figure, it is evident that the dispersive effect is high at low frequency and gradually smaller for the higher-frequency sinusoids. Finally, in Fig. 5.23d–e we can observe that the spectrum of the overall output signal is identical to that of the input signal, i.e., $G(e^{j\theta(\omega)}) \equiv X(e^{j\hat\omega})$, while the signal's length and its time-variant spectrum computed with the STFT are not, i.e., $G(e^{j\omega}, n) \ne X(e^{j\hat\omega}, n)$. This confirms that the warping operator is all-pass and linear, but not shift-invariant.

Example 5.5 In this example we consider the same warping delay line implemented with 500 all-pass sections of the previous example, but at the input we put a linear
Fig. 5.23 Effect of a chain warping operator (output of all-pass delay line) D500 (z), applied to a harmonic signal consisting of 5 sine waves of different frequencies. a Input signal x[n]; b output delayed signal g[n] = D500 (x[n]); c 3D spectrum evaluated with STFT of the output signal; d spectrum of input signal; e spectrum of the overall output signal
5.3 Rational Orthonormal Filter Architecture
5 Special Transfer Functions for DASP
Fig. 5.24 Group delay of the warping operator D500 (z) as a function of frequency. Delays related to the frequencies of the sine wave of the input signal are highlighted
Fig. 5.25 Effect of the operator D500 (x[n]) on a linear sweep type signal
sweep signal², that is, a simple sine wave of unit amplitude whose frequency $\omega(t)$ varies linearly from $\omega_1$ to $\omega_2$ in a certain time interval $t \in [0, T_M]$, so

$$x(t) = \sin\left[\omega(t)\cdot t\right], \quad \text{where} \quad \omega(t) = \omega_1 + \frac{\omega_2 - \omega_1}{T_M}\,t.$$
In Fig. 5.25 the temporal and spectral trends of the input sweep signal and the corresponding output are shown. In this case, due to the effect of higher group delay in low frequency, a time compression is determined and the output signal is shorter than the input one. However note that although the temporal trends of the input and output are different, the spectrum of the input and output signals is exactly the same.
² Sweeps are test signals commonly used in the field of audio measurements. Typically the frequency of the sweep varies over the standard audio bandwidth of 20 Hz to 20 kHz with linear or exponential law.
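The dispersive behavior discussed in these examples follows from the group delay of a single all-pass cell, which can be checked against its closed form (a Python/SciPy sketch; the 500-cell cascade $D_{500}(z)$ simply multiplies this delay by 500):

```python
import numpy as np
from scipy.signal import group_delay

lam = 0.5                         # warping parameter lambda > 0
b = [-lam, 1.0]                   # D(z) = (z^-1 - lam) / (1 - lam*z^-1)
a = [1.0, -lam]

w, gd = group_delay((b, a), w=np.array([0.01, np.pi / 2, 3.0]))

# Closed form for the 1st-order all-pass:
# tau(w) = (1 - lam^2) / (1 - 2*lam*cos(w) + lam^2)
tau = (1 - lam ** 2) / (1 - 2 * lam * np.cos(w) + lam ** 2)
```

For $\lambda > 0$ the delay exceeds one sample at low frequencies and drops below one sample near $\omega = \pi$, which is exactly the dispersion visible in Fig. 5.24.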
5.3.4.2 Warped FIR Filters
Warped signal processing consists in the generalization of the scheme in Fig. 5.21, where the warping operator is applied to the filter impulse response in order to synthesize a given target TF. In practice, frequency-warped filters are numerical transversal or recursive structures where the unit delays $\hat{z}^{-1}$ are replaced by frequency-dependent dispersive elements $\hat{z}^{-1} = D_1(z)$. In other words, warped filtering is not a method for filter design; it is an implementation structure that produces the effect of warping the frequency axis by tuning the AP parameter $\lambda$. The filter can be designed using any of the well-known filter design methods [22–32]. Examples of possible warped FIR filter (WFIR) structures are shown in Fig. 5.26, where the all-pass cells $D_\lambda(z)$ are implemented in one of the forms in Fig. 2.40 or other possible architectures [31]. Now, denoting by $t_s$ the sampling period, for a regular $M$-length FIR filter the duration of the impulse response is equal to $t_s \cdot M$. With reference to Fig. 5.27a, if we substitute for the unit delay an all-pass cell

$$\hat{z}^{-1} \to \frac{z^{-1} - \lambda}{1 - \lambda z^{-1}},$$

for $\lambda > 0$ the time-lag between two successive samples is $\tau_\lambda > 1$ and, as can be seen from Figs. 5.20 and 5.24, this delay is not uniform over frequency. The resulting effect is that the duration of the impulse response, for the same number of filter coefficients, is warped and becomes approximately $t_s \cdot \tau_\lambda \cdot M$. Formally, let $w[n]$ be a given starting prototype; the warped IR $h[n]$ can be obtained by the bilinear mapping between the $\hat{z}$-domain and the $z$-domain ([32], Eq. (11))
Fig. 5.26 Equivalent structures for WFIR implementation. a General scheme; b structure with AP cell Dλ (z) with 2 delay elements and one multiplier; c optimized structure proposed in [31], with only one delay and one multiplier. Note that these architecture are similar to the Laguerre filter presented in Sect. 5.3.3
Fig. 5.27 Warped FIR (WFIR) filter. The warped transfer function (WTF) is synthesized by replacing the delay elements zˆ −1 with all-pass dispersive element D1 (z). a WFIR scheme with M coefficients; b original M-length IR w[n], and the warped IRs h[n] for various λ values
$$H(z) = \sum_{n=0}^{\infty} h[n]\,z^{-n} = \sum_{k=0}^{\infty} w[k]\left(\frac{z^{-1}-\lambda}{1-\lambda z^{-1}}\right)^{k}. \tag{5.34}$$
Hence, $H(z)$ is the TF of the warped filter obtained from the prototype $w[n]$, i.e., (5.34) is the so-called warped filter synthesis formula. Obviously the inverse transform, usually denoted as the warped filter analysis formula, is also valid ([32], Eq. (12))

$$W(\hat{z}) = \sum_{k=0}^{\infty} w[k]\,\hat{z}^{-k} = \sum_{n=0}^{\infty} h[n]\left(\frac{\hat{z}^{-1}+\lambda}{1+\lambda\hat{z}^{-1}}\right)^{n}. \tag{5.35}$$
Thus, $H(z)$ and $W(\hat{z})$ define the direct and reverse mapping between TFs, i.e., these equations define all necessary mappings between a standard IR $w[n]$ and its warped counterpart $h[n]$. The frequency response of the warping filter depends on the parameter $\lambda$. For $\lambda = 0$, the filter behaves like a normal FIR filter (i.e., $w[n] \equiv h[n]$), while for $\lambda \ne 0$ one has a bilinear mapping between the $\hat{z}$-domain and the $z$-domain. Therefore, the frequency warping depends on the warping parameter $\lambda$, as shown in Fig. 5.20. For an $M$-length FIR filter, the TF can be written as

$$H_{\mathrm{WFIR}}(z) = \sum_{m=0}^{M-1} w[m]\,\hat{z}^{-m} = \sum_{m=0}^{M-1} w[m]\,\{D_1(z)\}^{m}. \tag{5.36}$$
Note that, since each delay element is a 1st-order IIR all-pass filter, the overall impulse response of the warped FIR filter has infinite duration. However, if $\tau_\lambda > 1$ is the delay introduced by the all-pass cell, in common practice the length is truncated to only $\sim\tau_\lambda \cdot M$ values. In the following example we see how, with the warping paradigm, it is possible to synthesize a filter with very demanding specifications, low computational complexity, and the possibility of simple parametric adjustment.
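A direct way to obtain the (truncated) warped IR of (5.34)–(5.36) is to feed a unit impulse through the chain of all-pass sections and tap it with the prototype coefficients. A Python/SciPy sketch with a toy 3-tap prototype (the coefficients are illustrative, not a designed filter):

```python
import numpy as np
from scipy.signal import lfilter

def warped_fir_ir(w_proto, lam, n_out):
    """Truncated IR of the warped FIR filter, Eq. (5.36):
    H(z) = sum_m w[m] * D(z)^m, with D(z) = (z^-1 - lam)/(1 - lam*z^-1)."""
    state = np.zeros(n_out)
    state[0] = 1.0                           # unit impulse (D^0 branch)
    h = np.zeros(n_out)
    for wm in w_proto:
        h += wm * state                      # tap after m all-pass sections
        state = lfilter([-lam, 1.0], [1.0, -lam], state)
    return h

w_proto = np.array([0.25, 0.5, 0.25])        # toy prototype w[n]
h_flat = warped_fir_ir(w_proto, 0.0, 16)     # lam = 0: reduces to the prototype
h_warp = warped_fir_ir(w_proto, 0.5, 16)     # lam > 0: stretched ("warped") IR
```

Since $D(e^{j0}) = 1$, the DC gain of the prototype is preserved under warping, up to the truncation error of the infinite IR.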
Example 5.6 Synthesis of a WFIR lowpass-crossover filter for a low frequency effect (LFE) (sub-woofer) channel from a simple lowpass prototype. One of the typical problems in DASP is to design filters with very low cutoff frequencies relative to the sampling frequency. For example, the cutoff frequency of an LFE crossover filter typically varies from 60 to 200 Hz; so, at the usual sampling frequency of $f_s = 44.1$ kHz or greater, a stopband attenuation of at least 100 dB would require a FIR filter with an IR of several thousand coefficients (see Sect. 2.7.4.4). For the solution of the synthesis problem with a WFIR filter, as a starting point we proceed with the definition of a generic lowpass prototype filter $w[n]$, which can be designed by one of the methods known in the literature. In our example we consider the Parks–McClellan equiripple procedure³ [33], with parameters that meet the required specifications in terms of stopband attenuation and transition bandwidth. In our example, placing $f_c/\Delta f = 1.364$ and choosing $f_c = 3$ kHz, with a desired stopband attenuation of at least 100 dB, one can easily verify that these specifications can be achieved with a filter length of $N = 131$ (i.e., order equal to 130). Considering the implementation scheme in Fig. 5.28a, by adjusting the AP parameter $\lambda$ it is possible to smoothly adjust the delay between two successive samples at the input of the filter's linear combiner, denoted in the figure as $\tau_\lambda$. Thus, the parameter $\lambda > 0$ results in a fractional delay $\tau_\lambda > 1$ between two successive samples, easily computed by considering the all-pass cell group delay as previously illustrated in Fig. 5.24. As far as the filtering operation is concerned, this delay is equivalent to a stretching of the filter's IR, determining a lowering of its cutoff frequency.
Fig. 5.28 Warped lowpass FIR filter (WFIR). a FIR filter with controllable dispersion delay line; b all-pass warping parameter λ as a function of desired cutoff frequency normalized w.r.t. prototype cutoff frequency f_c
³ For example, in Matlab, denoting by Nrd the order of the filter, by fs and fc the sampling and cutoff frequencies of the filter, and by DF the transition bandwidth, the Parks–McClellan procedure is as follows: w = firpm(Nrd, 2*[0 fc-DF/2 fc+DF/2 fs/2]/fs, [1 1 0 0]);
Fig. 5.29 Synthesis of a WFIR for LFE from a simple lowpass prototype. Lowpass FIR filter of length 131 and its WFIR versions with warped IR for different values of λ, determined with the realization in Fig. 5.28b, corresponding to the cutoff frequencies of 60, 120, and 240 Hz

The relationship between the cutoff frequency specification $f$ of the warped filter and $\lambda$ can be determined from Eq. (5.36); however, in Fig. 5.28b, for the sake of brevity, an empirical curve is given, evaluated with a simple exponential regression $g(x)$, where $x = f/f_c$, that links the cutoff frequency of the warped filter, normalized with respect to the cutoff frequency of the lowpass prototype, to the warping parameter. Hence, it is possible to determine the precise value of $\lambda$ of the all-pass cells, as $\lambda = g(x)$, such that the warped filter with the required specification is obtained. Some examples of LFE filter designs with frequencies of 60, 120, and 240 Hz, obtained by simply varying the parameter of the AP cell $\lambda = g(x)$, are shown in Fig. 5.29 (see the exponential regression model in Fig. 5.28b). The top figure shows the amplitude and phase response for $\lambda = 0$, i.e., of the filter used as a lowpass prototype at 3 kHz. For the other values of $\lambda$, determined with the curve in Fig. 5.28b, the responses of the filters at 240, 120, and 60 Hz are shown. From the curves, it can be seen that the specifications of the warped filters are, in practice, the same as those of the lowpass prototype (100 dB attenuation and same relative transition band $f_c/\Delta f = 1.364$). Obviously, due to the warping effect, the resulting filters' IRs are not exactly symmetrical, so the phase is not exactly linear, even if from the curves (see the right side of the figure) it can be observed that in the transition band this effect is negligible.
The relationship between the cutoff frequency specification of the warped filter f and λ can be determined from Eq. (5.36), however, in Fig. 5.28b, for the sake of brevity, an empirical curve is given, evaluated with a simple exponential regression g(x) where x = f / f c , that links the cutoff frequency of the warped filter, normalized with respect to the cutoff frequency of the lowpass prototype. Hence, it is possible to determine the precise value of λ of the all-pass cells, as λ = g(x) such that the warped filter with the required specification is obtained. Some examples of LFE filter designs with frequencies of 60, 120, and 240 Hz obtained by simply varying the parameter of the AP cell λ = g(x) are shown in Fig. 5.29 (see exponential regression model in Fig. 5.28b). The top figure shows the amplitude and phase response for λ = 0, i.e., of the filter used as a lowpass prototype at 3 kHz. For the other values of λ determined with the curve in Fig. 5.28b, the responses of the filters at 240, 120, and 60 Hz are shown. From the curves, it can be seen that the specifications of the warped filters are, in practice, the same as the lowpass prototype (100 dB attenuation and same relative transition band f c / f = 1.364). Obviously, due to the warping effect the resulting filters IR is not exactly symmetrical, so the phase is not exactly linear even if from the curves (see the right side of the figure) it can be observed that in the transition band this effect is negligible. Example 5.7 In this example, we analyze the warped filters for LFE designed in the previous example with an exponential sweep signal (ESS), evaluating the output signal and the output spectrum of the filters themselves. The ESS is a sinusoidal
5.3 Rational Orthonormal Filter Architecture
363
Fig. 5.30 Lowpass FIR and WFIR filtering of an exponential sweep signal (ESS), with the LFE filters with cutoff frequencies of 60, 120, and 240 Hz (as in the previous example). Note that the frequency variation of the ESS signal is slow at low frequencies and fast at high frequencies, so its spectrum is not white but falls off at −10 dB/decade (i.e., −3 dB/octave)
signal with frequency varying from f₁ to f₂ with exponential law, defined as

x[n] = A sin[2π f₁ K (e^{n/K} − 1)],  with K = N / ln(f₂/f₁)
where A represents the amplitude and N the number of signal samples.
From the previous examples, we have seen how, with the warping method, it is possible to design a lowpass filter with a cutoff frequency equal to a small fraction of the sampling frequency and with very demanding specifications in terms of stopband attenuation and transition bandwidth. Moreover, the time duration of the filter's IR can be very long, yet characterized by a limited number of free parameters.
Remark 5.13 Observe that the main advantages of the warping method are that the filter can be implemented with a small number of multiplications and that the cutoff frequency can be adjusted with the λ parameter of the all-pass transform.
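As a hedged illustration of Remark 5.13, a WFIR can be sketched by replacing each unit delay of the FIR tap line with a 1st-order all-pass cell D₁(z) = (−λ + z⁻¹)/(1 − λz⁻¹), each realized in transposed direct form II with one state per cell. Names here are illustrative, not the book's listings:

```c
/* Warped FIR: tap chain of 1st-order all-pass cells
   D1(z) = (-lam + z^-1)/(1 - lam*z^-1), one state s[m] per cell
   (transposed direct form II). For lam = 0 it reduces to a plain FIR. */
double wfir_tick(double x, const double *b, double *s, int M, double lam)
{
    double g = x;                          /* tap signal entering the chain */
    double y = b[0] * g;
    for (int m = 1; m <= M; ++m) {
        double out = -lam * g + s[m - 1];  /* all-pass cell output          */
        s[m - 1] = g + lam * out;          /* state update                  */
        g = out;
        y += b[m] * g;                     /* weighted tap sum              */
    }
    return y;
}
```

Calling `wfir_tick` once per input sample with the prototype coefficients b and a zeroed state vector s realizes the warped filter; changing `lam` retunes the cutoff without redesigning b.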
5.3.4.3 Warped IIR Filters
The TF of a warped IIR filter (WIIR) can be written as ([27], Eq. (23))
364
5 Special Transfer Functions for DASP
H_WIIR(z) = ( Σ_{m=0}^{M} b_m {D₁(z)}^m ) / ( 1 + Σ_{m=1}^{N} a_m {D₁(z)}^m ).    (5.37)
However, as shown in Fig. 5.31b, a direct implementation of Eq. (5.37) with AP cells is not possible, since it would contain delay-free loops that cannot be computed, even for λ = 0. Some solutions to the problem exist in the literature, and possible implementation schemes that avoid it are shown in Fig. 5.31b, c [26]. From the figures we can see that, to make the WIIR feasible, the topology is modified by moving the feedback aᵢ (dashed in the figure) directly to the unit-delay outputs. Obviously, this modification can also be applied to the structure of Fig. 5.31c [27]. Moreover, in the above structures the parameters bᵢ of the numerator of the resulting TF are unchanged. In contrast, the denominator coefficients aᵢ must be recalculated appropriately. For example, the parameters of Fig. 5.31c can be computed with the following recursive formula ([32], p. 6):

α_{N+1} = λ a_N,  S_N = a_N
for i = N, N − 1, …, 2
    S_{i−1} = a_{i−1} − λ S_i
    α_i = λ S_{i−1} + S_i
end
α₁ = S₁,  α₀ = 1/(1 − λ S₁).    (5.38)
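The recursion (5.38) translates directly into code; a sketch (with a indexed from 1 to N; function and array names are illustrative):

```c
/* Recompute the modified denominator coefficients alpha[0..N+1] from the
   original a[1..N] and warping parameter lam, per Eq. (5.38). */
void wiir_alpha(const double *a, double *alpha, int N, double lam)
{
    double S[N + 1];                   /* C99 VLA; entries S[1..N] are used */
    alpha[N + 1] = lam * a[N];
    S[N] = a[N];
    for (int i = N; i >= 2; --i) {
        S[i - 1] = a[i - 1] - lam * S[i];
        alpha[i] = lam * S[i - 1] + S[i];
    }
    alpha[1] = S[1];
    alpha[0] = 1.0 / (1.0 - lam * S[1]);
}
```

A quick sanity check: for λ = 0 the recursion leaves the coefficients unchanged (αᵢ = aᵢ, α₀ = 1), as expected since the AP cells then reduce to unit delays.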
Other variants of the structures in Fig. 5.31b, c are presented in [28]. Since warping is a simple mapping from the ẑ-plane to the warped z-plane, practically all conventional DASP methods can be revised in the warped domain. As in the case of
Fig. 5.31 Warped IIR (WIIR) filter (5.37) implementation schemes. a General WIIR filter structure; b form with non-computable delay-free loops and a possible solution with modified connections and parameters σi; c a realizable form with 1st-order AP single unit-delay cells: structure with modified topology that avoids the delay-free loop, requiring redefinition of the FDE and computation of new topology coefficients αi [27]
WFIR filters, it is possible to use the standard filter design methods for the IIR case and then convert the result to a WIIR. The topic is very wide; for more details on implementation refer, for example, to the literature [27–32].
Example 5.8 Synthesis of a lowpass and a bandpass WIIR from lowpass and bandpass Butterworth prototypes. Starting from a 2nd-order lowpass filter with f_c = 2 kHz, Fig. 5.32a reports the WIIR filter responses at 100, 200, and 400 Hz realized with Eq. (5.37). Figure 5.32b shows the filter responses at 100, 200, and 400 Hz, also made with Eq. (5.37), starting from a 2nd-order bandpass with f_c = 2 kHz and Q = 2.87, whose band edges are

f_{2,1} = (f_c / 2Q)(√(1 + 4Q²) ± 1).

The values of the λ parameters needed to obtain the desired responses are obtained as in the WFIR filter design example and are reported in the figure.
Example 5.9 In this example, originally proposed in [27], we see how a uniform bank of 24 constant-Q 2nd-order IIR filters crossing at −3 dB can be transformed into a uniform FB on the Bark psychoacoustic scale [30]. The Bark scale ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing. A suitable approximation of the Bark scale in terms of conventional frequency is given by ([27], Eq. (5))

f_B = 13 tan⁻¹(0.76 f / kHz) + 3.5 tan⁻¹[(f / 7.5 kHz)²]
Fig. 5.32 Synthesis of a lowpass and bandpass WIIR from lowpass and bandpass prototypes (dashed curves) with f_c = 2 kHz, for f_s = 44.1 kHz. a Prototype and warped 2nd-order lowpass WIIR; b bandpass 2nd-order Q = 2.87 prototype warped at 100, 200 and 400 Hz; c regression for warping parameter λ computation
Fig. 5.33 Bark scale (solid curve) and AP warping transform Dλ(z) (dashed curve) as functions of the conventional frequency scale. The approximation is determined for a sampling frequency f_s = 32 kHz, where the optimal warping parameter is λ = 0.71276
and, as shown by the solid curve in Fig. 5.33, this relation has a form similar to the AP warping transform defined by replacing the delay element of the filter with a 1st-order AP cell, z⁻¹ → Dλ(z). In addition, Smith and Abel [30] derived an analytical expression for determining the parameter λ such that the distance between the Bark and the AP-cell warping bilinear transform functions is minimum ([30], Eq. (26)):

λ = 1.0674 [(2/π) tan⁻¹(0.06583 f_s)]^{1/2} − 0.1916    (5.39)
where f_s is the sampling rate in kHz. This approximation is shown by the dashed curve of Fig. 5.33 for a sampling frequency f_s = 32 kHz. In addition, the uniform FB consisting of 24 IIR prototypes is shown in Fig. 5.34a. Each bandpass filter is designed with the Butterworth approximation, so the cutoff and crossover frequencies at −3 dB coincide. Figure 5.34b shows the FB with the 24 channels warped with Dλ(z), where λ = 0.7128 is determined with Eq. (5.39), considering the linear frequency scale. In Fig. 5.34c the warped FB is shown on the Bark scale. This warped IIR FB is approximately uniform on the Bark psychoacoustic scale, i.e., the bandwidth of each filter is approximately one [Bark].
5.4 Circular Buffer Delay Lines

As we know, the delay line (DL), sometimes called tapped delay line (TDL), is the fundamental structure for the implementation of FIR and IIR filters [1–5]. In DASP, DLs are extremely important as they are the basis for the realization of numerous audio effects such as vibrato, flanger, chorus, slapback, echo, and so on [4, 55], and, as mentioned above, for the simulation of room acoustics [5, 56, 69]. For example, in
Fig. 5.34 Transformation of a uniform FB on the conventional frequency scale into a uniform FB on the Bark psychoacoustic scale. a Uniform 2nd-order IIR Butterworth FB. b Corresponding 2nd-order warped FB obtained with the all-pass transformation Dλ(z), for λ = 0.7128; c 2nd-order warped FB represented on the Bark scale
the case of FIR digital filtering, as shown in Fig. 5.27a, the delay line is the element on which the input signal samples are shifted down and then multiplied by the filter coefficients. In audio signal processing, very often, the delay line is used as a pure delay, i.e., the signal enters at one end of the line and exits with a certain delay, due to the number of memory elements, at the opposite end. If the line consists of D elements z −1 , it can be represented with a single block as shown in Fig. 5.35. The DL’s constitutive relation is therefore: y[n] = x[n − D].
(5.40)
5.4.1 Circular Buffer Addressing

The DL's software implementation can be done with a vector in which the samples are shifted, as shown in Fig. 5.36. The signal sample feeds the first memory location: at each clock tick the samples flow (i.e., according to Fig. 5.36, they shift right), freeing the first vector position, where the new incoming sample x[n] is simultaneously entered. Therefore, to realize the shifting, the algorithm that implements the DL performs D assignment operations (w[i] = w[i − 1]).
Remark 5.14 Observe that, in common audio applications, the delay D required to perform a certain type of processing (e.g., a long echo effect) can be hundreds of [ms] and sometimes even some [s]; therefore, at typical audio sampling rates, the line length can reach tens of thousands of samples. In these cases, the computational cost of the shift operations may not be negligible.
For an efficient implementation of a long delay line, it is necessary to avoid shift operations, according to the technique called circular-buffer addressing [3, 5]. With reference to Fig. 5.37, it is possible to think of the buffer (that implements the DL) as if it were circularly arranged. Instead of scrolling the samples along the line, the value of the index (i.e., a pointer p) that points to the position where the input sample is inserted is increased. The first signal sample is entered at position zero, the second at position one, and so on. When the buffer length is exceeded, the first position is overwritten, and so on (wrap operation).
Fig. 5.35 A delay line with D memory elements and compact representation
Fig. 5.36 A delay line naive shift mechanism
Fig. 5.37 Operating principle of a circular buffer. Diagram of the circular buffer that realizes a DL with D = 7 (8 locations, from 0 to 7). For n = 8, the first position of the buffer is overwritten by the new incoming signal sample and the pointer is set to zero again ( p = ( p + 1)%(D + 1))
The last D signal samples are then always present in the buffer. Taking as output the value in the location just ahead of the one where the input is loaded, the output is delayed by D samples. Again with reference to Fig. 5.37, when n = 7 the output is equal to the first element of the signal vector, i.e., y[n] = x[n − D]. Figure 5.38 shows two procedures to implement a DL of D samples.
Fig. 5.38 Possible implementations of a “C”-language function that implements the structure described in Fig. 5.37
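The listings of Fig. 5.38 are rendered as a figure and not reproduced here; as a hedged sketch, one possible single-pointer "C" version (names are illustrative, not the book's exact code) is:

```c
/* Circular-buffer delay line y[n] = x[n - D] using D + 1 locations:
   the read position is the one the write pointer is about to reach. */
typedef struct {
    float *w;     /* buffer of D + 1 zero-initialized locations */
    int    D;     /* delay in samples                           */
    int    p;     /* write index                                */
} DelayLine;

float dl_tick(DelayLine *dl, float x)
{
    dl->w[dl->p] = x;                    /* write the new sample        */
    dl->p = (dl->p + 1) % (dl->D + 1);   /* circular increment (wrap)   */
    return dl->w[dl->p];                 /* oldest sample = x[n - D]    */
}
```

No samples are moved: only the pointer advances, so the cost per sample is constant regardless of D.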
A more general way to implement circular buffers that realize a TF H(z) = z⁻ᴰ is the so-called modulo addressing, described for example in [5], which uses two pointers: one to define the input p (write pointer) and the other the output q (read pointer). If you want to realize a DL of order D, with M the number of accessible contiguous memory locations, the input and output pointers are linked by the relationship p = (q + D)%M, where the % symbol indicates the modulus-M operation. At any given time, the input is written to the location addressed by p while the output is taken from location q. The two pointers are updated as p = (p + 1)%M and q = (q + 1)%M, i.e., they are increased respecting the buffer circularity.
Remark 5.15 Observe that, in certain dedicated architectures such as wavetable synthesizers, where the sampled waveforms are read sequentially from the buffer and sent to the D/A converter, the samples can be read with a variable-increment pointer. In general, the sampled waveform available in the buffer has a certain duration; sometimes, however, not the entire waveform is used in execution but only a portion of it. If we indicate with 2ʳ the amount of globally available memory locations and with M = 2ˢ (with s < r) the locations actually used, these locations are not contiguous and the update of the pointers must be done accordingly: p = (p + 2^{r−s})%2ʳ and q = (q + 2^{r−s})%2ʳ. In practice, if the addressing is r bits long, you don't need to explicitly calculate the modulus: you just let the sum wrap around on overflow. The following also applies: p = (q + m2^{r−s})%2ʳ.
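A sketch of this two-pointer modulo-addressing scheme, here with a power-of-two buffer so the modulus reduces to the bit-mask mentioned in Remark 5.15 (names and sizes are illustrative):

```c
#define BUFLEN 8u                     /* M = 2^3; must exceed the delay D */

typedef struct {
    float    buf[BUFLEN];
    unsigned p, q;                    /* write / read pointers, p=(q+D)%M */
} ModDelay;

void md_init(ModDelay *d, unsigned D)
{
    for (unsigned i = 0; i < BUFLEN; ++i) d->buf[i] = 0.0f;
    d->p = 0;
    d->q = (BUFLEN - D) % BUFLEN;     /* chosen so that p = (q + D) % M  */
}

float md_tick(ModDelay *d, float x)
{
    d->buf[d->p] = x;                 /* write at p                       */
    float y = d->buf[d->q];           /* read at q: y[n] = x[n - D]       */
    d->p = (d->p + 1) & (BUFLEN - 1); /* modulo via mask, M power of two  */
    d->q = (d->q + 1) & (BUFLEN - 1);
    return y;
}
```

The mask `& (BUFLEN - 1)` is exactly the "let the address arithmetic wrap" trick: with an r-bit address and M = 2ʳ, the modulo comes for free.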
5.4.2 Delay Lines with Nested All-Pass Filters

Rather long delay lines, together with all-pass filters, are particularly used in artificial reverberation circuits [59]. In this case the nested AP structures are particularly interesting, as it is possible to realize several nested filters on a single delay line, as shown in Fig. 5.39. As you can see, these circuits are simple extensions of the nested-AP filters seen in Sect. 5.2.4.
Fig. 5.39 All-pass filters nested in direct form II on a delay line

Fig. 5.40 All-pass filters. a Implemented on a single DL; b nested on a single DL
Let's consider a circuit with a generic AP TF defined as

A_i(z) = (z^{−D_i} + a_i) / (1 + a_i z^{−D_i}).

This circuit can be implemented on a single DL according to the schematization in Fig. 5.40a. Replacing the k_i-th element of the DL with a TF A_{i+1}(z), we get z^{−k₁} ← z^{−k₂}A₂(z) ← z^{−k₃}A₃(z) ← ⋯.
The resulting structure is composed of a number of all-pass filters nested one inside the other, as shown in Fig. 5.40b. This form is particularly efficient because, in practice, a single delay line is used whose length is equal to the sum of the delays of the individual APs, D = Σᵢ Dᵢ.
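For reference, a single section A_i(z) as above can be sketched in direct form II with its internal delay kept on a circular buffer; in a nested realization, one slot of that buffer would itself be replaced by another such section. Names are illustrative:

```c
/* One all-pass section A(z) = (z^-D + a)/(1 + a z^-D), direct form II:
   v[n] = x[n] - a*v[n-D],  y[n] = a*v[n] + v[n-D],
   with the state v[] kept on a circular buffer of D locations. */
typedef struct { float *v; int D, p; float a; } AllPass;

float ap_tick(AllPass *s, float x)
{
    float vD = s->v[s->p];        /* v[n - D], read before overwriting */
    float v0 = x - s->a * vD;     /* new state sample v[n]             */
    float y  = s->a * v0 + vD;    /* all-pass output                   */
    s->v[s->p] = v0;
    s->p = (s->p + 1) % s->D;     /* circular increment                */
    return y;
}
```

For a = 0 the section degenerates into a pure D-sample delay, which makes the structure easy to verify.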
5.5 Fractional Delay Lines

The digital delay line is characterized by a minimum delay defined by the sampling frequency f_s of the signal. The minimum time delay is equal to the sampling period T_s = 1/f_s which, considering the representation with normalized sampling rate, is defined as the unit delay. In many applications it is necessary to have a delay that is not exactly a multiple of the unit delay. In these situations, indicated as αT_s, for α ∈ [0, 1), it is necessary to define tools able to control a continuous delay or fractional delay (FD). An FD may be necessary in applications such as echo cancellation, phased-array antennas (and array processing problems in general), pitch-synchronous speech synthesis, time-delay estimation and detection of arrivals, modem synchronization, and the physical modeling of musical instruments. In audio signal processing, fractional delay lines (FDLs) can be used for various types of applications, such as sampling-rate conversion with an irrational ratio, audio effects such as vibrato, microphone-array processing, and the physical modeling of complex phenomena (FDLs are essential in wave-field synthesis), just to name a few [37–50, 56]. From an implementation point of view, band-limited fractional delay lines can be seen as a linear lowpass TF implemented by a numerical filter. Therefore, these can be of FIR or IIR type, and designed with different philosophies, methodologies, and optimization criteria, like max-flat or min-max. As we will see, simple FDLs can be determined by simple intuitive considerations or by using optimization techniques usually adopted in the design of numerical filters. For example, the maximally flat FIR filter approximation is equivalent to the classical Lagrange interpolation method.
However, it is not always convenient or possible to determine in closed form the impulse responses of decimation and interpolation filters: In the case of real-time applications, the computational cost of the exact solution would be too expensive (see Sect. 4.2). The topic of fractional delay lines is very specific and broad, and only a few aspects are explained here. For a more in-depth study, please refer to the specific bibliography [35–52, 82–88].
5.5.1 Problem Formulation of Band-Limited Interpolation

The delay D ∈ ℝ can be decomposed into an integer and a fractional part: D = Dᵢ + α, where Dᵢ = ⌊D⌋ is the integer part of the delay and α = D − ⌊D⌋ ∈ [0, 1) is its fractional part.
Let x[n] be a sequence coming from a band-limited analog signal by ideal conversion; the output of the fractional delay line (FDL) is y[n] = x[n − (Dᵢ + α)]. However, in order to avoid aliasing, it is necessary to verify the Nyquist band-limited condition with respect to the new sampling period. There are numerous methods in the literature for the determination of the FDL [37], which are generally based on the approximation of a so-called ideal delay operator L_D{·} by an FIR/IIR filter or other interpolation techniques. The problem can be formulated by defining an L_D-operator such that

y[n] = L_D{x[n]} = x[n − D]    (5.41)

that in z-transform notation is defined by the relationship Y(z) = z^{−D}X(z), so the ideal transfer function (TF) in the frequency domain is

H_id(e^{jω}) = e^{−jωD},  |ω| ≤ π.    (5.42)
For the modulus and phase we have

|H_id(e^{jω})| = 1,  arg{H_id(e^{jω})} = Θ_id(ω) = −Dω

where Θ_id(ω) indicates the ideal phase response. The group delay is therefore

τ_g^id = −∂Θ_id(ω)/∂ω = D

while for the phase delay we have

τ_p^id = −Θ_id(ω)/ω = D.
Having a group delay identical to the phase delay means, in fact, that the entire waveform, regardless of its frequency content, is delayed by a time equal to D. The ideal solution to delay a signal by a quantity D ∈ ℝ is a filter with a TF equal to (5.42). The impulse response is therefore

h_id[n] = (1/2π) ∫_{−π}^{π} e^{−jωD} e^{jωn} dω = sin[π(n − D)] / [π(n − D)] = sinc(n − D),  ∀n    (5.43)

so, for n = D, the previous expression becomes unity. In general, the ideal solution is of limited usefulness for online applications because h_id: (1) has infinite length; (2) is non-causal.
Fig. 5.41 Linear interpolation between two successive signal samples x[n − 1] and x[n], for α ∈ [0, 1)
5.5.2 Approximate FIR Solution

The ideal solution in Eq. (5.43) can be approximated in many different ways. Of special interest in audio applications are digital filters that approximate the ideal interpolation in a maximally flat manner at low frequencies. In addition, in the case of audio signals, you should also be very careful about the following aspects: (1) online and real-time implementability; (2) group delay; (3) perceived quality.
5.5.2.1 Linear Interpolation: 1st-Order FIR Filter
The simplest and most intuitive way to determine a fractional delay is to consider a linear interpolation between two successive samples, as shown in Fig. 5.41. Let α ∈ [0, 1) be the value representing the fractional interpolation between the samples x[n − 1] and x[n]. The equation of the straight line passing through these points gives

(α − 0)/(1 − 0) = (x[n − 1 + α] − x[n − 1]) / (x[n] − x[n − 1])

i.e., the expression of the linear interpolator is a FIR filter h = [1 − α  α]ᵀ, and the interpolated sample value can be computed as

x[n − 1 + α] = αx[n] + (1 − α)x[n − 1] = x[n − 1] + (x[n] − x[n − 1])α.    (5.44)

Based on the type of factorization of the previous expression, the linear interpolator filter scheme can be made with two multipliers or, using the equivalent polynomial Horner's scheme (or Farrow structure, Sect. 5.5.7), with only one, as shown in Fig. 5.42. To evaluate the frequency response of the linear interpolator filter, it is necessary to evaluate the DTFT of Eq. (5.44), for which we have

H(z)|_{z=e^{jω}} = Y(e^{jω})/X(e^{jω}) = α + (1 − α)e^{−jω}.    (5.45)
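A minimal sketch of the Horner-factorized interpolator of Eq. (5.44), needing a single multiply per output sample (cf. Fig. 5.42); the function name is illustrative:

```c
/* Linear-interpolation fractional delay, Eq. (5.44), Horner form:
   x[n-1+alpha] = x[n-1] + alpha*(x[n] - x[n-1]) -- one multiply. */
float lin_frac(float xn, float xnm1, float alpha)
{
    return xnm1 + alpha * (xn - xnm1);
}
```

For α = 0 the routine returns x[n − 1] and for α = 1 it returns x[n], i.e., the two ends of the interpolation interval, as Eq. (5.44) requires.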
Fig. 5.42 Possible schemes for the realization of the linear interpolator. a With one multiplier and two adders. b With two multipliers and one adder
Fig. 5.43 Frequency response |H (e jω )| and phase delay −arg{H (e jω )}/ω, of the 1st-order linear interpolator h = [1 − α α], for fractional delay values α from 0 to 1 step 0.1
Example 5.10 As an example, Fig. 5.43 shows the frequency and phase-delay responses of expression (5.45) for some values of the fractional delay α. The amplitude response is almost flat for small α values. The linear interpolator sounds good when the signal is oversampled, so that the signal spectrum is concentrated at low frequency, while for higher FD values the interpolator behaves like a lowpass filter. In fact, although the linear interpolation technique has a low computational cost, it has some drawbacks. Below are some of them.
• Linearity distortion—The linear interpolator is a lowpass filter.
• Amplitude and phase modulation—The filter is time-variant and introduces an overall variation in signal level and phase.
• Aliasing—Interpolation, in general, can be considered as a non-optimal sampling-rate conversion process.
Property 5.7 The linear interpolator can be derived from the Taylor expansion of the term x[n + α]
x[n + α] = x[n] + α ẋ[n] + (α²/2!) ẍ[n] + (α³/3!) x⃛[n] + ⋯    (5.46)

considering the 1st-order approximation and posing ẋ[n] = (x[n + 1] − x[n])/1, we have

x[n + α] = x[n] + α(x[n + 1] − x[n])

which coincides with the non-causal version of Eq. (5.44). It should also be noted that this approach can be used to define higher-order interpolation filters.
5.5.2.2 Truncation and Causalization of the Ideal Impulse Response
A simple approximation of the ideal interpolator is its causalized and truncated version. As shown in Fig. 5.44, the fractional output x[n − D] is computed as a linear combination of its previous and subsequent samples. Considering an M-length FDL, the delayed sample lies inside the M-length signal window starting from a given, appropriately chosen reference index M₀, i.e., −M₀ < D < M − M₀ − 1. For example, for M₀ = M/2 − 1, the output can be calculated as

x[n + α] = Σ_{k=−M₀}^{M−M₀−1} h[k] x[n − k] = Σ_{k=0}^{M−1} h[k − M₀] x[n + M₀ − k].

So, for an M-length filter we have

h[k] = { sinc(k − D),  k ∈ [−M₀, M − M₀ − 1]
       { 0,            otherwise.    (5.47)
Note that the smallest error for a given filter length is obtained when the overall delay D is placed around the filter's group delay. Thus, for a linear-phase FIR filter, the reference index M₀ can be chosen around the group delay of the filter. For example, given an M-length FIR filter, a possible choice of the reference index is
Fig. 5.44 Fractional delay FIR filter of length M. To have a causal filter, you will need to delay the output by M − 1 − M₀ samples
M₀ = { M/2 − 1,     for M even, and α ∈ [0, 1)
     { (M − 1)/2,   for M odd,  and α ∈ [−0.5, 0.5).    (5.48)
Example 5.11 As an example, Fig. 5.45 shows the impulse responses of two FDLs with M = 16 and M = 17. According to Eq. (5.48), for the even-length filter we have M₀ = 7; with this choice, to have symmetry of the phase-delay response (see Fig. 5.46), the fractional part is chosen in the interval α ∈ [0, 1). For the odd-length filter we have M₀ = 8 and, with this choice, for a symmetric phase-delay response, the fractional part can be chosen in the interval α ∈ [−0.5, 0.5).
Remark 5.16 Observe that the frequency responses of even- and odd-length FIR FDL filters are different. As you can see from Fig. 5.46, the even-length FIR FDL filter (M = 16) has a high ripple in the amplitude response (magnitude), while the phase delays are quite smooth and with minimum low-frequency error. The odd-length FIR FDL filter (M = 17) has complementary characteristics: a low ripple in the magnitude, high ripple in the phase-delay response, and high error at low frequencies. For a better overview, Fig. 5.47 reports the 3D plots of the magnitude squared error |H_id(e^{jω}) − H(e^{jω})|² and of the phase-delay squared error |α − τ_H(e^{jω})|², evaluated for several even- and odd-length filters.
Fig. 5.45 Fractional delay FIR filters of length M = 16, 17. In the upper part, an integer delay equal to M₀ = 7, 8 is considered, so the filter impulse response is a simple delayed unit sample. The related sinc(·) function (dashed line) is null at all the samples except for n = M₀, where it is equal to h[M₀] = 1. In the lower part are reported the impulse responses of the fractional delay for α = 0.6, −0.4, and their related sinc(·) functions
Fig. 5.46 Magnitude and phase-delay responses for different values of α, such that D = M₀ + α, of even-length (left) and odd-length (right) truncated-sinc FDLs. Note that, for the even-length filter, the phase delay is symmetric with respect to the delay 0.5, while for the odd-length filter it is symmetric with respect to 0
Finally, Fig. 5.48 shows the magnitude and delay of an FDL for M = 2. The reference index is M₀ = M/2 − 1 = 0 and, with this reference, the impulse response sinc(n − D) is simply evaluated for n = 0 and n = 1 as

h[0] = sinc(α),  and  h[1] = sinc(1 − α).

Moreover, for M = 2 the interpolator filter can be implemented as shown in Fig. 5.49.
5.5.2.3 Fractional Delay FIR Filter by Least Squares Approximation
Let E(e^{jω}) = H_id(e^{jω}) − H(e^{jω}) be the difference between the desired response and the filter response, denoted as the error; we can proceed with the least squares (LS) approximation as previously described in Sect. 2.7.4.4, by placing H_id(e^{jω}) = e^{−jωD} and solving the normal equations as in Eq. (2.99). Moreover, an improved weighted LS (WLS) method for variable FD FIR filter design can be found in [46]. However, for the case of a real impulse response, the use of the LS approximation is not necessary. In fact, by virtue of Property 2.10, we know that h_LS[n] = h_id[n] = sinc(n − D), for n = 0, 1, …, M − 1. As in standard FIR-filter design, to reduce the ripple (known as the Gibbs phenomenon) of the interpolator filter, the ideal impulse response can be multiplied by an appropriately shaped window w[k − D].
Fig. 5.47 Mean squared magnitude error and mean squared phase-delay error for even- and odd-length FIR filters. a For fixed M = 4, as a function of frequency and of the delay M₀ + [0, 1). b For even-length filters M ∈ [2, 16] with fixed target delay α = 0.4, as a function of ω/2π. c For fixed M = 5, as a function of frequency and of the delay M₀ + [−0.5, 0.5). d For odd-length filters M ∈ [3, 17] with fixed target delay α = 0.4, as a function of ω/2π
h[k] = { w[k − D] sinc(k − D),  k ∈ [−M₀, M − M₀ − 1]
       { 0,                     otherwise    (5.49)
where w[·] is the window function, which can have various shapes: triangular (Bartlett window), raised cosine (Hann or Hamming window), and so on [1–3]. Fig. 5.50 reports the magnitude responses and phase delays of two FD FIR filters multiplied by different window functions, when the desired fractional phase delay varies over α ∈ [−0.5, 0.5] with a step of 0.1 [frac/sample]. From the figure we can see that for a rectangular window (i.e., the simply truncated sinc function), the amplitude response has a certain ripple. Using a standard Chebyshev window, the ripple is contained at very low values. However, the price paid for the ripple reduction is the widening of the filter transition band. Therefore, near the maximum (normalized) frequency 0.5, there is a degradation of the filter response [1–3].
Fig. 5.48 Fractional delay FIR filter of length M = 2 with h = [sinc(α) sinc(1 − α)]ᵀ. a The filter impulse response and sinc function (dashed line) for α = 0.6. b Magnitude and phase delay for eleven values of the delay α
Fig. 5.49 Possible implementation scheme of a fractional delay FIR filter of length M = 2
Remark 5.17 Observe that, as shown in the next paragraphs, very often in non-recursive filter interpolation a FIR filter is used in the form of a linear or nonlinear interpolator (polynomial or spline type, or Lagrange quadratic law) [37, 50, 57].
5.5.3 Approximate All-Pass Solution

Widely used for its low computational complexity, another technique to realize the FDL is to use the 1st-order AP cell, here written as

H(z) = (a + z^{−1}) / (1 + a z^{−1}),

where |a| < 1 for stability. For filters of this type, it is easy to verify that at low frequencies the following approximation applies
Fig. 5.50 Magnitude responses (left) and phase-delay curves (right) of a 25-length FD FIR filter. In the upper part, the simply truncated ideal impulse response; in the lower part, the ideal impulse response multiplied by a Chebyshev window. Note that the ripple decrease is associated with a widening of the filter transition band

Fig. 5.51 Fractional delay line with all-pass approximation
arg H(e^{jω}) ≈ −sin(ω)/(a + cos(ω)) + a sin(ω)/(1 + a cos(ω)) ≈ −ω (1 − a)/(1 + a)

so, for the group delay τ_g(ω) and the phase delay τ_p(ω), as ω → 0 it results

τ_g(ω) ≈ τ_p(ω) ≈ (1 − a)/(1 + a).

Given, by definition, α = τ_g(0), the filter coefficient can be determined on the basis of the desired delay at low frequencies as

a = (1 − τ_g(ω))/(1 + τ_g(ω))|_{ω→0} = (1 − α)/(1 + α).    (5.50)
With this approximation, the group delay is constant, to a good approximation, up to f_s/5 [5] (Fig. 5.52). In this case, the fractional delay line scheme is the one shown in Fig. 5.51, where the integer part of the delay (Dᵢ − 1) is performed with a normal DL and the fractional part is relegated to the all-pass filter. The input–output relation results to be

y[n] = a x[n] + x[n − 1] − a y[n − 1] = a(x[n] − y[n − 1]) + x[n − 1].
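A sketch of this 1st-order all-pass interpolator, with the coefficient chosen per Eq. (5.50); names are illustrative:

```c
/* 1st-order all-pass fractional delay: coefficient a = (1-alpha)/(1+alpha)
   from Eq. (5.50); recursion y[n] = a*(x[n] - y[n-1]) + x[n-1]. */
typedef struct { float a, x1, y1; } APFrac;

void apf_set(APFrac *s, float alpha)
{
    s->a = (1.0f - alpha) / (1.0f + alpha);
    s->x1 = s->y1 = 0.0f;      /* clear x[n-1], y[n-1] state */
}

float apf_tick(APFrac *s, float x)
{
    float y = s->a * (x - s->y1) + s->x1;
    s->x1 = x;                 /* remember x[n-1] */
    s->y1 = y;                 /* remember y[n-1] */
    return y;
}
```

For α = 1 the coefficient is a = 0 and the cell degenerates into an exact one-sample delay, a useful boundary check.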
The frequency response of an all-pass filter is flat by definition. As with the linear interpolator, the all-pass has some acoustic drawbacks. Typically, however, for low α values, the all-pass structure sounds better than the linear interpolator.
Fig. 5.52 1st-order all-pass interpolator. Magnitude response |H(e^{jω})| and phase delay −arg{H(e^{jω})}/ω of the 1st-order all-pass interpolator, b = [α 1], a = [1 α], for fractional delay values α from 0 to 1 in steps of 0.1
5.5.3.1 Reduced Transient Response
The use of recursive structures (such as the all-pass) must be done very carefully, as they may give rise to annoying transients due to the unlimited filter impulse response. In fact, due to its recursive nature, unlike in the FIR case, in all-pass interpolation the interpolated values cannot be requested arbitrarily at any time, in isolation or in random-access mode [69]. Moreover, note that, as Fig. 5.53 shows, for α → 0 the impulse response of the all-pass filter is quite long. This results in nonlinear-phase distortion at frequencies close to half the sampling rate, a disturbance that is particularly audible when using low sample rates.
Thiran all-pass filters have unity gain at all frequencies and produce a maximally flat group-delay response at ω → 0. As illustrated in Fig. 5.54, performance is optimal at DC but poor close to the Nyquist frequency. Also note that the impulse response is relatively short, so the filter can be used in time-varying situations.
5.5.4 Polynomial Interpolation

The theoretical development of the methods described below is performed considering a continuous signal x(t), defined in the interval t ∈ [t₀, t_N] and known only in a finite set of N + 1 samples x(t_k), k = 0, 1, …, N. The problem is the reconstruction of the whole waveform x(t) in t ∈ [t₀, t_N] starting only from the knowledge of the N + 1 samples. The estimation of the signal is done by means of an interpolating function Φ(t, Δt) which, in general, depends on the time t and on the choice of the sampling interval Δt.
If the N + 1 known samples of the signal are not uniformly spaced, the interpolating function should be evaluated for each sample t_k. Denoting by x̂(t) the estimate of the signal x(t), we have

x̂(t) = Σ_{k=0}^{N} Φ_k(t, t_k) x(t_k).

If a specific interpolating function (e.g., polynomial, B-spline) is considered, the previous expression can be rewritten in the convolution form

x̂(t) = Σ_{k=0}^{N} h(t, t_k) x(t_k)    (5.52)
which has a form that is equivalent to a time-varying FIR filter. In case the samples are uniformly spaced, the function Φ_k(t, Δt) is the same for all the sampling intervals, and we can simply write

x̂(t) = Σ_{k=0}^{N} Φ_k(t) x(t_k).

In this case, regardless of the chosen interpolating function, it is possible to determine a time-invariant FIR filter that implements the FDL, also called fractional delay filter, of the type

x̂(t) = Σ_{k=0}^{N} h(t_k) x(t − t_k).    (5.53)
Now let us consider the case in which the signal is a sequence derived from x(t) by sampling at instants t_n = nT (with a constant sampling period T and, for simplicity, T = 1), for which

x[n] = x(t)\big|_{t=nT}, \qquad n \in [0, N].   (5.54)
Denoting by x[n + α], with 0 ≤ α < 1, the signal value at any point between two successive samples n and (n + 1), Eq. (5.53) can be written as

x[n + \alpha] = \sum_{k=0}^{N} h[k]\, x[n - k],   (5.55)

i.e., a simple FIR filter.
5 Special Transfer Functions for DASP

5.5.4.1 Interpolation with Polynomial FIR Filter
In the first method, proposed in [42], the interpolating function φ(t) is realized with a polynomial of order N. The methodology derives from considering a polynomial p_N(t) of order N able to represent exactly a function x(t) in a set of N + 1 uniformly spaced samples. Let us consider the polynomial

x(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_N t^N = \sum_{k=0}^{N} a_k t^k   (5.56)
which passes through the (N + 1) points (t_n, x_n), n = 0, 1, …, N. The above expression can be seen as a set of linear equations of the type Ta = x, where T is a Vandermonde matrix. Explicitly, we have

\begin{bmatrix} 1 & t_0 & t_0^2 & \cdots & t_0^N \\ 1 & t_1 & t_1^2 & \cdots & t_1^N \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_N & t_N^2 & \cdots & t_N^N \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_N \end{bmatrix}   (5.57)
where the polynomial coefficients a_i are computed as a = T^{-1} x. If we consider a sequence x[n] and a 2nd-order interpolator x(t) = a_0 + a_1 t + a_2 t^2, for t = n, as shown in Fig. 5.55a, the points through which the polynomial passes are (n − 1, x[n − 1]), (n, x[n]) and (n + 1, x[n + 1]). Therefore, the system of equations determining the a_k coefficients is

x[n-1] = a_0 + a_1(n-1) + a_2(n-1)^2
x[n]\;\;\;\; = a_0 + a_1 n + a_2 n^2   (5.58)
x[n+1] = a_0 + a_1(n+1) + a_2(n+1)^2
Fig. 5.55 Example of interpolating polynomial filter. a Order N = 2. b Order N = 4
which in matrix form can be written as

\begin{bmatrix} 1 & (n-1) & (n-1)^2 \\ 1 & n & n^2 \\ 1 & (n+1) & (n+1)^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} x[n-1] \\ x[n] \\ x[n+1] \end{bmatrix}.   (5.59)
The system (5.59) can be solved symbolically, giving

\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \frac{n(n+1)}{2} & 1-n^2 & \frac{n(n-1)}{2} \\ -\frac{2n+1}{2} & 2n & -\frac{2n-1}{2} \\ \frac{1}{2} & -1 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x[n-1] \\ x[n] \\ x[n+1] \end{bmatrix}.

Once the coefficients a_k of the polynomial have been determined, the sample interpolated at the instant (n + α), with α < 1, is calculated by evaluating Eq. (5.56) for t = (n + α). It is therefore equal to

x[n+\alpha] = \sum_{k=0}^{N} a_k t^k \Big|_{t=(n+\alpha)} = a_0 + a_1(n+\alpha) + a_2(n+\alpha)^2.   (5.60)
Remark 5.18 Note that it can be useful to express the sample with fractional delay as a function of the neighboring samples, x[n + α] = F(x[n − 1], x[n], x[n + 1]). This is easily obtained by combining (5.60) with (5.58). A linear relationship follows, and the sample x[n + α] can be seen as the output of a 2nd-order FIR filter, i.e., with three coefficients, of the form in Eq. (5.55):

x[n+\alpha] = c_{-1}\, x[n-1] + c_0\, x[n] + c_1\, x[n+1],

where

c_{-1} = \frac{1}{2}\alpha(\alpha-1), \qquad c_0 = -(\alpha+1)(\alpha-1) = 1-\alpha^2, \qquad c_1 = \frac{1}{2}\alpha(\alpha+1).   (5.61)
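As an aside, the three coefficients of (5.61) are easy to verify numerically. The following Python sketch (illustrative only; function names are ours) builds them for a given α and applies the resulting 3-tap FIR filter; since the interpolator is exact for polynomials up to degree 2, fractionally delaying a parabolic signal must return the true value (n + α)².

```python
def fd_coeffs_order2(alpha):
    # Eq. (5.61): x[n+alpha] = c_m1*x[n-1] + c0*x[n] + c1*x[n+1]
    c_m1 = 0.5 * alpha * (alpha - 1.0)
    c0 = 1.0 - alpha * alpha          # -(alpha+1)(alpha-1)
    c1 = 0.5 * alpha * (alpha + 1.0)
    return c_m1, c0, c1

def interp_order2(x, n, alpha):
    # 3-tap FIR evaluation of x[n+alpha] from the neighboring samples
    c_m1, c0, c1 = fd_coeffs_order2(alpha)
    return c_m1 * x[n - 1] + c0 * x[n] + c1 * x[n + 1]

x = [float(n * n) for n in range(10)]   # parabolic test signal x[n] = n^2
y = interp_order2(x, 4, 0.3)            # estimate of x at t = 4.3
```

At α = 0 the coefficients reduce to (0, 1, 0), i.e., the filter degenerates to the identity, as expected.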
Remark 5.19 Observe that the FIR filter coefficients c_{-1}, c_0 and c_1 are time-invariant and depend only on the delay α. If the sampling interval were not constant, this would no longer be true and the filter coefficients would have to be recalculated for each sample. In general, x[n + α] can be expressed as

x[n+\alpha] = \sum_{k=-N/2}^{N/2} c_k(\alpha)\, x[n-k].   (5.62)

For example, for N = 4 we have
Fig. 5.56 Implementation scheme of the interpolating polynomial filter. a Order N = 2. b Order N = 4
c_{-2} = \frac{1}{24}(\alpha+1)\alpha(\alpha-1)(\alpha-2)
c_{-1} = -\frac{1}{6}(\alpha+2)\alpha(\alpha-1)(\alpha-2)
c_{0} = \frac{1}{4}(\alpha+2)(\alpha+1)(\alpha-1)(\alpha-2)   (5.63)
c_{1} = -\frac{1}{6}(\alpha+2)(\alpha+1)\alpha(\alpha-2)
c_{2} = \frac{1}{24}(\alpha+2)(\alpha+1)\alpha(\alpha-1).
The filter implementation structure is shown in Fig. 5.56. Similar expressions can be found for higher even polynomial orders N = 2M. In general we can write

c_k = \frac{(-1)^{M-k}}{(M+k)!\,(M-k)!} \prod_{\substack{n=-M \\ n\neq k}}^{M} (\alpha - n), \qquad k = -M, \ldots, M.   (5.64)
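Equation (5.64) translates directly into a few lines of code. The sketch below (plain Python, illustrative; names are ours) computes the centered coefficients c_k(α) for an even order N = 2M and checks them against the N = 4 closed forms of (5.63).

```python
from math import factorial

def poly_fd_coeffs(alpha, N):
    # Eq. (5.64): c_k(alpha) for k = -M..M, even order N = 2M
    M = N // 2
    c = []
    for k in range(-M, M + 1):
        prod = 1.0
        for n in range(-M, M + 1):
            if n != k:
                prod *= (alpha - n)
        c.append((-1) ** (M - k) * prod / (factorial(M + k) * factorial(M - k)))
    return c

a = 0.3
c = poly_fd_coeffs(a, 4)   # c[0] = c_{-2}, ..., c[4] = c_{2}
```

The coefficients always sum to one, since the interpolator reproduces constant signals exactly.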
From the DTFT of the filter coefficients evaluated by Eq. (5.63), or more generally by Eq. (5.64), it is possible to evaluate the frequency response of the interpolator filter H(e^{jω}) and then the squared error (SE) between the filter response and the ideal one, H_{id}(e^{jω}) = e^{-jωD} for D = α, as

SE(\omega, \alpha, N) = \left| H_{id}(e^{j\omega}) - H(e^{j\omega}) \right|^2.

The behavior of SE(ω, α, N) as a function of the normalized frequency ω/2π and of the delay α is shown in Fig. 5.57a, where it can be seen that the error generally increases near the Nyquist frequency. In addition, Fig. 5.57b shows that the error decreases as the order N of the interpolator increases.
Fig. 5.57 Squared error SE(ω, α, N) surface of the interpolating polynomial filter with coefficients evaluated by Eq. (5.64). a For fixed N = 6, as a function of ω and of the delay α. b For fixed target delay α = 0.5, as a function of ω and of the order N
5.5.4.2 Lagrange Polynomial Interpolation Filter
The problem of determining the sample value at a fractional delay, x[n + α], with a polynomial approximation whose coefficients are determined by Eq. (5.64), can be reintroduced from different assumptions. The Lagrange polynomial interpolation method originates by considering an N-order polynomial that can represent exactly a function x(t) in a set of N + 1 samples t_i. In this case, however, the polynomial is a priori required to assume zero value at every sampling point except a specific one. The polynomial characterized by this behavior is called the Lagrange polynomial (LP). The Lagrange polynomial of order N related to the i-th sample, indicated as l_i^N(x), is defined so that

l_i^N(t_k) = \delta_{ik} = \begin{cases} 1, & i = k \\ 0, & \text{otherwise.} \end{cases}

The LP, as shown in Fig. 5.58, is characterized by N zeros at positions t_0, t_1, …, while at the i-th sample l_i^N(t_i) = 1 applies. In numerical transmission jargon, this condition guarantees the absence of intersymbolic distortion. The basic idea in the use of the Lagrange polynomial is therefore to force to zero the impulse response of the
Fig. 5.58 Lagrange polynomials l_i(x) (LPs). The assumption is that the polynomial takes zero value at every sampling point except the sample of interest
interpolator filter in correspondence with the neighboring samples, so as to reduce the effects of intersymbolic distortion. From the LP's zeros, it is easy to verify that it can be written as

l_i^N(x) = a_i (t - t_0) \cdots (t - t_{i-1})(t - t_{i+1}) \cdots (t - t_N);   (5.65)

from the previous expression, imposing l_i^N(t_i) = 1, the a_i coefficients are calculated as

a_i = \frac{1}{(t_i - t_0) \cdots (t_i - t_{i-1})(t_i - t_{i+1}) \cdots (t_i - t_N)}.
Considering all the signal sampling points, the overall interpolator is the sum of the LPs related to all the N + 1 points, each multiplied by the value of the function at that point:

p_N(x) = \sum_{i=0}^{N} l_i^N(x)\, x(t_i) = l_0^N(x)x(t_0) + l_1^N(x)x(t_1) + \cdots + l_N^N(x)x(t_N).   (5.66)
By setting a = \prod_{j=0}^{N} (t - t_j), Eq. (5.65) can be rewritten as

l_i^N(x) = a_i\, \frac{a}{t - t_i} = \frac{\prod_{j=0}^{N}(t - t_j)}{(t - t_i) \prod_{j=0,\, j\neq i}^{N}(t_i - t_j)} = \prod_{j=0,\, j\neq i}^{N} \frac{t - t_j}{t_i - t_j};   (5.67)

additionally, in the case of uniformly spaced samples, placing t_i = t_0 + i\Delta t and t_j = t_0 + j\Delta t (i, j integers) and defining a new variable α < 1 such that t = t_0 + \alpha\Delta t, we have

\frac{t - t_j}{t_i - t_j} = \frac{t_0 + \alpha\Delta t - t_0 - j\Delta t}{t_0 + i\Delta t - t_0 - j\Delta t} = \frac{\alpha - j}{i - j},

and the expression (5.67) can be rewritten as

l_i^N(x) = \prod_{j=0,\, j\neq i}^{N} \frac{\alpha - j}{i - j},   (5.68)
here denoted as the Lagrange interpolation filter (LIF), in which only the sampling points appear. Note that the above expression is identical to (5.64), obtained by the simple polynomial model. Considering as usual a numerical sequence in which the available samples are uniformly spaced (t_n = n), the interpolator output (5.66) can be calculated as a simple convolution
Fig. 5.59 Example of Matlab code for the determination of the Lagrange interpolation filter (LIF) coefficients in Eq. (5.71)
function h = coeff_L(N, delay)
% ----------------------------------------------------------
% Returns (N + 1) taps of the Lagrange interpolator FIR
% filter which implements a fractional delay line.
% ----------------------------------------------------------
if mod(N,2) == 0
    Nsup = N/2;  Ninf = -Nsup;
else
    Nsup = (N + 1)/2;  Ninf = -(N - 1)/2;
end
n = 0;
for i = Ninf : Nsup
    n = n + 1;
    h(n) = 1;                        % initialize the running product
    for j = Ninf : Nsup
        if (i ~= j)
            h(n) = h(n) * (delay - j)/(i - j);
        end
    end
end
end
x[n+\alpha] = \sum_{k=0}^{N} h[k]\, x[n-k], \qquad \text{where } h[k] = l_k^N(\alpha).   (5.69)
It should be noted that this expression is similar to (5.62) seen above. In fact, in the case of uniform sampling, (5.69) takes the form of an FIR filter with constant coefficients that depend only on the value of the fractional delay. Finally, as D = D_i + α, where D_i ∈ Z^+ is fixed, in Eq. (5.68), in order to consider only the adjustable fractional delay α ∈ R, we have two alternative zero-phase versions for the interval [0, N]. A common choice for even and odd orders is

N_{sup} = \frac{N}{2}, \quad N_{inf} = -N_{sup}, \quad \alpha \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad N\text{-even}   (5.70)
N_{sup} = \frac{N+1}{2}, \quad N_{inf} = -\frac{N-1}{2}, \quad \alpha \in [0, 1), \qquad N\text{-odd.}
In fact, as previously explained in Figs. 5.45 and 5.46, for optimal performance the fractional delay should be positioned approximately halfway along the filter length. So, in the case of odd order we can also choose N_{sup} = \frac{N-1}{2} and N_{inf} = -\frac{N+1}{2}. Thus, Eq. (5.68) can be rewritten as

h[n] = l_n^N(\alpha) = \prod_{\substack{j=N_{inf} \\ j\neq n}}^{N_{sup}} \frac{\alpha - j}{n - j}.   (5.71)
So, considering the implementation using Eq. (5.68), the argument is the overall delay D_i + α, while with the zero-phase expression (5.71) the argument is only the fractional part of the delay α. Figure 5.59 shows an example of Matlab code that implements Eq. (5.71), while Fig. 5.60 shows the magnitude and phase-delay responses of even- and odd-order Lagrange interpolation filters designed with Eq. (5.71).
Fig. 5.60 Example of LIF magnitude and phase-delay responses of even (N = 4) and odd (N = 5) order Lagrange interpolation filters, Eq. (5.71)
For example, for N = 2, using Eq. (5.71) we have

h[-1] = \prod_{\substack{j=-1 \\ j\neq -1}}^{1} \frac{\alpha - j}{-1 - j} = \frac{1}{2}\alpha(\alpha - 1)

h[0] = \prod_{\substack{j=-1 \\ j\neq 0}}^{1} \frac{\alpha - j}{0 - j} = -(\alpha - 1)(\alpha + 1) = 1 - \alpha^2

h[1] = \prod_{\substack{j=-1 \\ j\neq 1}}^{1} \frac{\alpha - j}{1 - j} = \frac{1}{2}\alpha(\alpha + 1).
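The same taps follow from a direct implementation of (5.70) and (5.71); the minimal Python sketch below (illustrative, our naming) reproduces the N = 2 closed forms above.

```python
def lif_taps(alpha, N):
    # Zero-phase Lagrange interpolation filter, Eqs. (5.70)-(5.71):
    # returns {n: h[n]} for n = Ninf..Nsup
    if N % 2 == 0:
        nsup = N // 2
        ninf = -nsup
    else:
        nsup = (N + 1) // 2
        ninf = -(N - 1) // 2
    h = {}
    for n in range(ninf, nsup + 1):
        p = 1.0
        for j in range(ninf, nsup + 1):
            if j != n:
                p *= (alpha - j) / (n - j)
        h[n] = p
    return h

a = 0.4
h = lif_taps(a, 2)
```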
It can be seen that this result is identical to that determined by the polynomial method described above, (5.61).

Remark 5.20 The Lagrange polynomial behavior is similar to that of the ideal interpolator filter realized by the Nyquist lowpass filter in Eq. (5.42). The latter, in fact, has a null impulse response in correspondence with all the signal samples except the i-th reference sample (see Figs. 5.61 and 5.62). The following theorem is also valid.

Theorem 5.1 For an infinite number of equally spaced samples, t_{k+1} - t_k = \Delta, the Lagrange polynomial basis converges to the sinc(·) function:

l_k(x) = \mathrm{sinc}\!\left(\frac{x - k\Delta}{\Delta}\right), \qquad k = \ldots, -2, -1, 0, 1, 2, \ldots
Fig. 5.61 Ideal interpolation with the Nyquist filter: the dashed samples represent the interpolated signal. The filter impulse response is zero in correspondence with the samples adjacent to the one to which it refers

Fig. 5.62 Comparison of the truncated sinc(n − α) (or LS solution) FIR filter impulse response and the LIF l_n^N(α), evaluated with Eq. (5.71), for N = 8 and fixed delay α = 0.4. For N → ∞, l_n^N(α) → sinc(n − α)
Proof An analytic function is determined by its zeros and by its value at one point where it is nonzero. Since sin(πx) vanishes for every nonzero integer x, and since sinc(0) = 1, the sinc function coincides with the Lagrange polynomial basis in the limit k → ∞. □

Note, however, that, as shown in Fig. 5.62, for a finite-length filter the Lagrange solution in Eq. (5.68) can also be obtained by the windowing method, where the window coefficients are computed using the binomial formula. For more detail on the connection between the sinc(·) function and Lagrange interpolation refer to [75, 82–84].

Figure 5.63 shows the amplitude and phase-delay responses for two polynomial filters. The Lagrange polynomial coefficients are evaluated with Eq. (5.71). It can be observed that, compared to the sinc(·) interpolator of the same order (see Fig. 5.46), the responses are much smoother. In Fig. 5.64 the magnitude and phase-delay responses of even- and odd-order Lagrange interpolating filters are reported for fixed delay α = 0.4.
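The convergence stated by Theorem 5.1 can also be checked numerically. The sketch below (plain Python, illustrative) compares the order-8 Lagrange taps of Eq. (5.68), for total delay D = N/2 + α, with the truncated sinc(n − D); the two impulse responses share the same shape, but the truncation error of the sinc is clearly visible, consistently with Fig. 5.62.

```python
import math

def lagrange_taps(D, N):
    # Eq. (5.68): h[n] = prod_{j != n} (D - j)/(n - j), n = 0..N
    h = [1.0] * (N + 1)
    for n in range(N + 1):
        for j in range(N + 1):
            if j != n:
                h[n] *= (D - j) / (n - j)
    return h

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

N, alpha = 8, 0.4
D = N // 2 + alpha                       # integer part Di = N/2, as in Fig. 5.62
h_lp = lagrange_taps(D, N)
h_sc = [sinc(n - D) for n in range(N + 1)]
err = max(abs(u - v) for u, v in zip(h_lp, h_sc))
```

The Lagrange taps sum exactly to one (constants are reproduced), while the truncated sinc does not.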
Fig. 5.63 Magnitude and phase-delay response of odd-order (left) and even-order (right) Lagrange polynomial interpolators. The polynomial coefficients are evaluated with Eq. (5.71)
Fig. 5.64 Magnitude and phase-delay response of Lagrange interpolating filter for fixed delay α = 0.4. The continuous lines are related to polynomials with even orders, while the dashed lines to the odd ones
The Lagrange polynomial interpolation is widely used in practice because:
1. it needs no feedback, i.e., it is implemented with an FIR filter;
2. the filter coefficients are easily calculable with an explicit formula, which allows simple implementation also in real time;
3. it has a maximally flat frequency response: the maximum of the magnitude response never exceeds unity.

Remark 5.21 Note that, especially for low polynomial orders, the LIF has optimal behavior only up to a fraction of the Nyquist frequency. One possible approach to limit the error
at high frequencies is to insert an upsampling block, bringing the signal to a frequency L times higher than the input frequency. Usually a polyphase network is used, which can be interpreted as an FDL followed by an L-fold decimation, evaluated at every L-th sample of the oversampled signal [40, 45].
5.5.4.3 Cubic B-Spline Interpolation

A methodology that guarantees more control possibilities, better performance, and lower computational cost is based on replacing the Lagrange polynomial with a B-spline interpolator, as proposed in [54]. The disadvantage of the B-spline method is a distortion of the spectrum of the interpolator filter, which can, however, be easily precompensated with a simple equalizer placed before the B-spline interpolator. The theoretical development of the method proceeds as for the polynomial interpolators seen above, where the signal (at first supposed continuous) x(t), t ∈ [t_0, t_N], is evaluated from the knowledge of N + 1 samples x(t_k), k = 0, 1, …, N, with an interpolating B-spline function φ_k(t) such that \hat{x}(t) = \sum_{k=0}^{N} \varphi_k(t)\, x(t_k).

Suppose we know the N + 1 samples of a signal x(t) in the range [x(t_k), x(t_{k+1}), …, x(t_{k+N})]. The B-spline function of order N is defined as [4, 54]

\varphi_k^N(t) = \sum_{i=k}^{k+N+1} \frac{(t - t_i)_+^N}{\prod_{j=k,\, j\neq i}^{k+N+1} (t_i - t_j)}   (5.72)

where

(t - t_i)_+^N = \begin{cases} (t - t_i)^N, & t \geq t_i \\ 0, & t < t_i. \end{cases}
The interpolator filter coefficients can be determined by evaluating the expression (5.72) for a generic delay. In the case of uniformly spaced samples, placing t_i = t_0 + i\Delta t and t_j = t_0 + j\Delta t (i, j integers) and defining a new variable α < 1 such that t = t_0 + \alpha\Delta t, we have

\varphi_k^N(\alpha) = \sum_{i=k}^{k+N+1} \frac{(\alpha - i)_+^N}{\prod_{j=k,\, j\neq i}^{k+N+1} (i - j)}.

Since the φ_k^N(t) function decreases as N increases, a normalized version is usually used, defined as N_k^N(t) = (t_{k+N+1} - t_k)\varphi_k^N(t); then

N_k^N(\alpha) = (N + 1) \sum_{i=k}^{k+N+1} \frac{(\alpha - i)_+^N}{\prod_{j=k,\, j\neq i}^{k+N+1} (i - j)}.   (5.73)
The calculation of the previous expression for N = 2 produces the following coefficients of the interpolating FIR filter:

N_3^2(\alpha) = h(0) = \frac{1}{2}\alpha^2
N_2^2(\alpha) = h(1) = \frac{1}{2}(1+\alpha)^2 - \frac{3}{2}\alpha^2
N_1^2(\alpha) = h(2) = \frac{1}{2}(1-\alpha)^2.

For N = 3, it produces the following interpolating FIR filter coefficients:

N_3^3(\alpha) = h(0) = \frac{1}{6}\alpha^3
N_2^3(\alpha) = h(1) = \frac{1}{6}(1+\alpha)^3 - \frac{2}{3}\alpha^3
N_1^3(\alpha) = h(2) = \frac{1}{6}(2-\alpha)^3 - \frac{2}{3}(1-\alpha)^3
N_0^3(\alpha) = h(3) = \frac{1}{6}(1-\alpha)^3.
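The cubic (N = 3) taps can be checked with a few lines of Python (illustrative sketch). B-splines form a partition of unity, so the four taps sum to one for every α; note also that at α = 0 the taps are (0, 1/6, 2/3, 1/6) rather than a unit impulse: the B-spline interpolator does not pass exactly through the samples, which is precisely why the precompensating equalizer mentioned above is needed.

```python
def bspline3_taps(alpha):
    # Cubic B-spline fractional delay taps h(0)..h(3), from the N = 3 case above
    h0 = alpha**3 / 6.0
    h1 = (1.0 + alpha)**3 / 6.0 - 2.0 * alpha**3 / 3.0
    h2 = (2.0 - alpha)**3 / 6.0 - 2.0 * (1.0 - alpha)**3 / 3.0
    h3 = (1.0 - alpha)**3 / 6.0
    return [h0, h1, h2, h3]
```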
Figure 5.65 shows a comparison of 3rd-order (N = 3) FDLs. For N = 6, one can show that the B-spline spectrum is of the form (sin(ω)/ω)^7; in fact, B-spline functions can be obtained by means of repeated convolutions of rectangular impulses.

Remark 5.22 The choice of the type of interpolation depends on the type of application. In general terms, Rocchesso in [57] defined three properties that should be met:
• flat frequency response;
• linear phase response;
• the delay-time variation does not give rise to audible transients.

Fig. 5.65 Comparison of magnitude and phase-delay responses of 3rd-order FDLs (sinc(·), Lagrange polynomial, and B-spline) for a target delay of 0.4
It is clear that these properties are partly contradictory. For example, an all-pass interpolator satisfies property (1) but not property (2) over an extended frequency range. The choice of approximation for a given application may therefore not be easy. For more information, please refer to the literature, in particular Laakso et al. [37], Dattorro [55], Rocchesso [57], Smith [69] and Bhandari et al. [76].
5.5.5 Time-Variant Delay Lines

Irrational sample rate conversion and many digital audio effects, some of which will be described in later chapters, are based on the use of time-varying delay lines (TVDLs), i.e., the delay D, or its fractional part α, is itself a function of time. We therefore have

y[n] = x[n - D[n]].   (5.74)

In general it may be convenient to express the time-variable delay as

D[n] = D_0 + D_1 f_D[n] = D_0 (1 + m_D f_D[n])   (5.75)

where D_0 represents the nominal length of the DL, the function f_D[n], generally with zero mean, represents the variation law (or modulation type), and the constant m_D ∈ [0, 1] represents the modulation index. The obtained effect depends on the variation law f_D[n] and on its modulation depth D_1 = m_D D_0 (see Fig. 5.66). In general the value D[n] is not an integer and must be interpolated with one of the techniques described in the previous paragraphs.
Fig. 5.66 Time-variant delay line (TVDL). a General scheme of a DL with variable length. b TVDL structure with interpolator filter
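A minimal TVDL sketch in Python (illustrative; names are ours) implementing (5.74) and (5.75), with the fractional part of D[n] resolved by a linear interpolator; a sinusoidal f_D[n] gives the classic vibrato-style modulated delay.

```python
import math

def tvdl(x, D0, mD, f_mod):
    # y[n] = x[n - D[n]],  D[n] = D0*(1 + mD*f_mod(n))  (Eqs. (5.74)-(5.75));
    # the fractional part of D[n] is resolved by linear interpolation
    y = []
    for n in range(len(x)):
        t = n - D0 * (1.0 + mD * f_mod(n))     # fractional read position
        i = math.floor(t)
        a = t - i                               # fractional part, 0 <= a < 1
        x0 = x[i] if 0 <= i < len(x) else 0.0
        x1 = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        y.append((1.0 - a) * x0 + a * x1)
    return y

fD = lambda n: math.sin(2.0 * math.pi * n / 32.0)   # modulation law
y = tvdl([1.0] * 64, 8.0, 0.5, fD)
```

With a constant input, the output must settle to the same constant once the maximum delay D_0(1 + m_D) has elapsed, whatever the modulation law.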
Fig. 5.67 Possible implementation in C++ of a delay line with variable length and fractional delay with linear interpolator (courtesy of [70])
static double A[N];          // delay-line buffer
static long   wptr = 0;      // write index
static double rptr = 0.0;    // read index (with fractional part)

void setdelay(double M) {
    rptr = wptr - M;               // read pointer lags the write pointer by M
    while (rptr < 0) { rptr += N; }
}

double delayline(double x) {
    A[wptr] = x;
    long rpi = (long)floor(rptr);  // integer part of the read index
    double a = rptr - (double)rpi; // fractional part
    double y = (1.0 - a)*A[rpi] + a*A[(rpi + 1) % N];  // linear interpolation
    rptr += 1.0; if (rptr >= N) { rptr -= N; }
    wptr += 1;   if (wptr >= N) { wptr -= N; }
    return y;
}
Figure 5.67 shows, as an example, the C++ implementation, from the Synthesis Tool Kit [70], of an FDL with linear interpolation.
5.5.6 Arbitrary Sampling Rate Conversion

The conversion between two arbitrary sampling frequencies, including the cases in which the ratio is an integer, rational, or irrational, is of central importance in many DASP applications. In all the situations described, the conversion problem consists in determining a new signal sample placed between two samples of the original signal by means of an interpolation or extrapolation process. Thus the problem can be solved by a time-varying delay line where the variation concerns only the fractional part, i.e., α → α(n) where, depending on the technique used, −0.5 < α(n) < 0.5 or 0 < α(n) < 1 (apart from a fixed systematic integer delay). In a more general form, the problem of interpolation and extrapolation must be understood as a method to interface two signals with any sampling frequencies, even with an irrational ratio. An intuitive way to understand the conversion with irrational ratio is to convert the signal back to analog and then resample it at the desired frequency. This procedure, even if theoretically consistent, is not practically feasible, since it would have to be implemented with a dedicated and almost non-programmable hardware structure; furthermore, the conversion processes would still produce a series of artifacts, distortions, noise, etc. inherent in the A/D–D/A conversion. In online audio applications, the ADC–DAC process is simulated using appropriate approximate interpolation schemes such as polynomials or splines. In this case the
Fig. 5.68 Arbitrary sampling rate conversion. a Example of downsampling Ts1 → Ts2 , with Ts2 > Ts1 . b Possible simple implementation algorithm
interpolation process can be seen as a cyclic time-varying filtering process in which the filter coefficients are calculated at each sample of the input signal [42–51].

Remark 5.23 Observe that in arbitrary sampling rate conversion the converted sample is a function of the known neighboring samples, so, as illustrated in Fig. 5.68, for each new sample the fractional part and the corresponding interpolator filter must be recalculated. In real-time applications it is therefore necessary to have efficient algorithms both for the calculation of the filter coefficients and for the filtering operation. The design of the interpolator filter is quite critical: in the previous paragraphs we have seen that a correct definition of the group delay over the whole band implies a bandwidth restriction, and vice versa. Figure 5.69 shows an example of conversion from a sampling frequency of 48 kHz to 44.1 kHz, using Lagrange polynomial and all-pass techniques of different orders. In order to define a qualitative evaluation metric, the input signal consists of six sine waves centered at 20 Hz, 200 Hz, 1 kHz, 10 kHz, 15 kHz, and 20 kHz, covering a large portion of the spectrum at the maximum allowed amplitude (0 dB). From the figure we can observe that all techniques produce artifacts, especially at the higher frequencies (>10 kHz in this case). As we could have expected from the discussion in the previous paragraphs, the worst result is that of the all-pass interpolator. The artifacts produced by the Lagrange filter of order 32 are all below −60 dB and therefore inaudible.

Remark 5.24 Note that the computational cost of an N-order Lagrange interpolator is equal to the standard cost of the FIR filter, (N + 1) multiplications and N additions per sample, to which the cost of computing the filter parameters must be added. If we use the expression (5.68) to calculate the parameters, we must add a further N multiplications and 2N additions per sample.
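The per-sample recalculation described in Remark 5.23 can be sketched as follows (plain Python, illustrative; edge handling is deliberately crude). For each output sample m, the input-time position t = m·f_in/f_out is split into an integer part, which selects the N + 1 input samples, and a delay argument for the Lagrange taps of Eq. (5.68). Since a Lagrange filter of order N ≥ 1 is exact on polynomial signals of degree ≤ N, resampling a ramp reproduces it exactly.

```python
import math

def lagrange_taps(D, N):
    # Eq. (5.68): h[n] = prod_{j != n} (D - j)/(n - j)
    h = [1.0] * (N + 1)
    for n in range(N + 1):
        for j in range(N + 1):
            if j != n:
                h[n] *= (D - j) / (n - j)
    return h

def resample(x, fs_in, fs_out, N=3):
    step = fs_in / fs_out              # input samples per output sample
    y = []
    m = 0
    while True:
        t = m * step
        i = max(int(math.floor(t)) - N // 2, 0)   # first tap (crude start edge)
        if i + N >= len(x):
            break                                  # stop at the end edge
        h = lagrange_taps(t - i, N)                # recomputed for every sample
        y.append(sum(h[k] * x[i + k] for k in range(N + 1)))
        m += 1
    return y

ramp = [float(n) for n in range(200)]
y = resample(ramp, 48000.0, 44100.0)
```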
Arbitrary conversion between sampling rates and its efficient implementation is a central theme in DASP. For improved computational efficiency, in [82] it has been proposed to store the interpolator filter coefficients in a lookup table (LUT) for a sufficient number of fractional delays α_k. This method is commonly used in closely related sample rate conversion problems.
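The LUT approach of [82] can be sketched as follows (illustrative Python, our naming): the taps for K uniformly spaced fractional delays α_k = k/K around a fixed integer delay D_i are precomputed once, and at run time the nearest stored set is simply looked up.

```python
def lagrange_taps(D, N):
    # Eq. (5.68): h[n] = prod_{j != n} (D - j)/(n - j)
    h = [1.0] * (N + 1)
    for n in range(N + 1):
        for j in range(N + 1):
            if j != n:
                h[n] *= (D - j) / (n - j)
    return h

def build_lut(K, N=3):
    # K tap sets for fractional delays alpha_k = k/K around Di = N//2
    Di = N // 2
    return [lagrange_taps(Di + k / K, N) for k in range(K)]

def taps_from_lut(lut, alpha):
    K = len(lut)
    k = min(int(round(alpha * K)), K - 1)   # nearest stored fractional delay
    return lut[k]

lut = build_lut(256)
```

The LUT trades a small quantization of the fractional delay for the per-sample cost of recomputing the taps.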
Fig. 5.69 Example of irrational signal sample-rate conversion from 48 to 44.1 kHz, very common in audio applications. In the upper part, 2 ms of the time-domain signal is reported. In the lower part, the magnitude spectrum of the original signal and of the signal converted using different interpolator filters
Fig. 5.70 Spectra of converted signals with irrational sampling rate using preloaded interpolation filters on different length LUTs. (Upper) downsampled from 48 to 44.1 kHz. (Lower) upsampled from 44.1 to 48 kHz
Figure 5.70 reports the results of the resampling process with Lagrange filters stored in LUTs of different lengths. The experiment was performed with the same signal as in Fig. 5.69. From the figure we can observe that a LUT with 256 prestored coefficient sets produces a result very similar to that obtained with filters calculated at run time.
5.5.7 Robust Fractional Delay FIR Filter

The implementation of a time-variant FIR FDL can present some critical issues when the filter state is updated; in DASP, in particular, this can lead to audible and annoying artifacts. So, as seen also in the case of equalizers (see Sect. 3.4), it is sometimes convenient to use robust architectures, generally at the expense of some computational overhead.
5.5.7.1 The Farrow Structure

In order to make the system more robust, Farrow in [41] proposed to use a parallel filter bank, determined a priori and kept fixed; as indicated in Fig. 5.71, the parameter D = D_i + α that regulates the delay can then be placed outside the fixed filter bank [35–41].
Fig. 5.71 Interpolation by the Farrow structure basis filters. The only variable parameter is external to the fixed filter bank, so the structure is robust
The fixed filter bank can therefore be realized with very robust and efficient circuit structures, especially if one wants to realize dedicated hardware [42–49]. Let h_D[n] be the impulse response of the interpolation filter related to the delay D. Farrow in [41] proposed to approximate each impulse response with an M-order polynomial of the type

h_D[n] = \sum_{m=0}^{M} c_{m,n} D^m, \qquad n = 0, 1, \ldots, N.   (5.76)
The TF of the above expression can be written as

H_D(z) = \sum_{n=0}^{N} \sum_{m=0}^{M} c_{m,n} D^m z^{-n} = \sum_{m=0}^{M} \left[ \sum_{n=0}^{N} c_{m,n} z^{-n} \right] D^m = \sum_{m=0}^{M} C_m(z) D^m   (5.77)
where the C_m(z) are the fixed TFs of a parallel FIR filter bank.

Property 5.8 For exact Lagrange interpolation of order N, the order of the sub-filters C_m(z) must be equal to N (see for example [38]).

The resulting Farrow structure shown in Fig. 5.71 comes from rewriting the polynomial (5.77) with Horner's method, giving

H_D(z) = C_0(z) + \big[ C_1(z) + \big[ C_2(z) + \cdots + [C_{N-1}(z) + C_N(z) D] D \cdots \big] D \big] D.   (5.78)

As for Lagrange polynomial interpolation, a common choice for the delay is D_i = N/2 (N even) and α ∈ [−0.5, 0.5). The consistency of Farrow's method can be demonstrated by the computability of the bank TFs C_m(z).
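The runtime evaluation of (5.78) can be sketched as follows (illustrative Python): each sub-filter C_m(z) is run over the shared delay line, and the scalar outputs are combined with Horner's rule in D. The rows of C used here are for N = M = 2 and were computed offline as the rows of Q = V^{-1}, V being the Vandermonde matrix of the nodes 0, 1, 2; with this choice an integer D reproduces a pure delay, and a fractional D gives the parabolic interpolation of Sect. 5.5.4.1.

```python
def farrow_output(C, xbuf, D):
    # v[m]: output of sub-filter C_m(z) on the buffer xbuf[k] = x[n - k];
    # the results are combined with Horner's rule, Eq. (5.78)
    v = [sum(cm[k] * xbuf[k] for k in range(len(cm))) for cm in C]
    y = v[-1]
    for m in range(len(C) - 2, -1, -1):
        y = v[m] + y * D
    return y

# Sub-filter coefficient rows for N = 2: rows of Q = V^{-1},
# with V[m][n] = m^n (computed offline)
C2 = [[1.0, 0.0, 0.0],
      [-1.5, 2.0, -0.5],
      [0.5, -1.0, 0.5]]

xbuf = [4.0, 1.0, 0.0]   # x(t) = t^2 sampled at t = 2, 1, 0
```

Note that only the scalar D changes at run time; the sub-filter states are untouched, which is the source of the structure's robustness.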
5.5.7.2 LS Computation of the Farrow's Sub-Filter Coefficients

The Farrow structure consists of a bank of M + 1 sub-filters, each with N + 1 coefficients, and, given the large number of degrees of freedom, the determination of the sub-filters C_m(z) that approximate a desired response, indicated as D, can be done in several ways.
Here, the solution is determined by minimizing a given cost function (CF) with the least squares (LS) criterion. For the formulation of the CF, the following definitions and assumptions are considered. First of all, we write (5.76) as the scalar product

h_D = v_D^T C

where v_D = [1\; D\; D^2\; \cdots\; D^M]^T \in R^{(M+1)\times 1} is the vector of delay powers, and C \in R^{(M+1)\times(N+1)} is the matrix of the polynomial coefficients of the filter bank, so that h_D represents the FDL impulse response. By choosing a set of values D_0, D_1, \ldots, D_L, usually uniformly spaced, where we want to approximate the desired response of the bank, indicated as D \in R^{(L+1)\times(N+1)}, we can write the CF as

J(C) = \| D - V_D C \|^2

where the matrix V_D = [v_{D_0}\; v_{D_1}\; \cdots\; v_{D_L}]^T \in R^{(L+1)\times(M+1)} contains the grid of delays. The cost function being quadratic, the optimal solution is unique. Setting its gradient to zero yields the normal equations V_D^T V_D C = V_D^T D; solving w.r.t. C, the optimal LS solution is

C_{LS} = \left( V_D^T V_D \right)^{-1} V_D^T D.

The above solution is general for M < N and L > N.
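The LS solution can be sketched with NumPy (illustrative; as desired matrix D we use the Lagrange impulse responses of Eq. (5.68) evaluated on the delay grid). Since each Lagrange tap is a polynomial of degree N in D, the fit with M = N is exact up to rounding.

```python
import numpy as np

def lagrange_taps(D, N):
    # Eq. (5.68): h[n] = prod_{j != n} (D - j)/(n - j)
    h = np.ones(N + 1)
    for n in range(N + 1):
        for j in range(N + 1):
            if j != n:
                h[n] *= (D - j) / (n - j)
    return h

N = M = 3
L = 20
Ds = np.linspace(1.0, 2.0, L + 1)                     # delay grid D_0..D_L
Dmat = np.vstack([lagrange_taps(D, N) for D in Ds])   # desired responses
VD = np.vander(Ds, M + 1, increasing=True)            # rows [1 D D^2 ... D^M]
C_ls, *_ = np.linalg.lstsq(VD, Dmat, rcond=None)      # LS solution
resid = np.max(np.abs(VD @ C_ls - Dmat))
```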
5.5.7.3 Computation of the Farrow's Sub-Filter Coefficients
Following a simpler and more direct approach, proposed by Välimäki in [35] and reconsidered in [44], the coefficients c_{m,n} of the TFs C_m(z) can be determined by imposing equality between the overall Farrow TF and a target TF T(z). If the output is a D-delayed version of the input, the desired relation is T(z) = Y(z)/X(z) = z^{-D}. So, from Eq. (5.77), imposing this condition for various delays b_n, n = 0, 1, …, which for simplicity we consider integers, we have

\arg\min_{c_{m,n}\in R} \left\| \sum_{m=0}^{N} C_m(z)\, b_n^m - z^{-b_n} \right\|, \qquad b_n = 0, 1, \ldots   (5.79)
where {bn } can be the set of the natural numbers. Thus, for simplicity, considering N + 1 relations as bn = 0, 1, …, N , and writing Eq. (5.79) extensively we get
c_{0,0}\,0^0 + c_{0,1}\,0^1 + \cdots + c_{0,N}\,0^N = z^{-0}
c_{1,0}\,1^0 + c_{1,1}\,1^1 + \cdots + c_{1,N}\,1^N = z^{-1}   (5.80)
\vdots
c_{N,0}\,N^0 + c_{N,1}\,N^1 + \cdots + c_{N,N}\,N^N = z^{-N}

i.e., a set of N + 1 equations that in matrix form can be written as CV = Z, where V is a Vandermonde matrix defined as

V = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 4 & \cdots & 2^N \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & N & N^2 & \cdots & N^N \end{bmatrix}

and Z = diag(1, z^{-1}, \ldots, z^{-N}) is the matrix of delays, so that the coefficients c_{m,n} of the TFs C_m(z) can be determined as C = ZQ, where Q = V^{-1} denotes the inverse of the Vandermonde matrix. In other words, since z represents the delay elements in the z-domain, according to the above equation the TFs C_m(z) are obtained as the scalar products

C_m(z) = q(m)\, z, \qquad m = 0, 1, \ldots, N   (5.81)

where q(m) indicates the m-th row of the matrix Q and z = [1\; z^{-1}\; \cdots\; z^{-N}]^T. For example, for N = 4 we have
V = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 1 & 3 & 9 & 27 & 81 \\ 1 & 4 & 16 & 64 & 256 \end{bmatrix}, \qquad Q = V^{-1} = \frac{1}{24}\begin{bmatrix} 24 & 0 & 0 & 0 & 0 \\ -50 & 96 & -72 & 32 & -6 \\ 35 & -104 & 114 & -56 & 11 \\ -10 & 36 & -48 & 28 & -6 \\ 1 & -4 & 6 & -4 & 1 \end{bmatrix}

so the Farrow TFs are
function h = Farrow_imp_resp(C, alpha)
% Overall FDL impulse response from the sub-filter matrix C,
% evaluated with Horner's rule in alpha.
M = length(C(1,:));  N = M - 1;
x = zeros(M,1);  x(1) = 1;
h = conv(C(N+1,:), x);
for i = N : -1 : 1
    h = alpha*h + conv(C(i,:), x);
end
end

function [C, V, Q, T] = Farrow_TF(N)
% Farrow sub-filter coefficients (shifted version: fractional delay only).
if mod(N,2) == 1, M = (N-1)/2; else, M = N/2; end
V = zeros(N+1);  T = zeros(N+1);
for m = 0:N
    for n = 0:N
        V(m+1,n+1) = m^n;
        if n >= m
            T(m+1,n+1) = M^(n-m)*nchoosek(n,m);
        end
    end
end
Q = inv(V);
C = T*Q;
end

Fig. 5.72 Example of Matlab code for the Farrow sub-filter specification
C_0(z) = 1
C_1(z) = -\frac{25}{12} + 4z^{-1} - 3z^{-2} + \frac{4}{3}z^{-3} - \frac{1}{4}z^{-4}
C_2(z) = \frac{35}{24} - \frac{13}{3}z^{-1} + \frac{19}{4}z^{-2} - \frac{7}{3}z^{-3} + \frac{11}{24}z^{-4}
C_3(z) = -\frac{5}{12} + \frac{3}{2}z^{-1} - 2z^{-2} + \frac{7}{6}z^{-3} - \frac{1}{4}z^{-4}
C_4(z) = \frac{1}{24} - \frac{1}{6}z^{-1} + \frac{1}{4}z^{-2} - \frac{1}{6}z^{-3} + \frac{1}{24}z^{-4}

while the entire TF of the Farrow structure is

H_D(z) = C_0(z) + \big[ C_1(z) + \big[ C_2(z) + [C_3(z) + C_4(z)D] D \big] D \big] D.
5.5.7.4 Efficient Farrow Structure
Farrow's structure can be made more efficient by considering only the fractional part of the delay: in accordance with (5.70), −0.5 ≤ α < 0.5 for even N, while 0 ≤ α < 1 for odd N [36, 37]. To consider only the fractional part, we remove the integer part D_i; hence the matrix Q must be properly transformed as

C = TQ

where T is an upper-triangular transformation matrix defined as

T_{j,k} = \begin{cases} \binom{k}{j} M^{\,k-j}, & k \geq j \\ 0, & k < j \end{cases} \qquad \text{with } M = \begin{cases} \frac{N}{2}, & N\text{-even} \\ \frac{N-1}{2}, & N\text{-odd.} \end{cases}

This transformation is equivalent to the substitution D = α + M (see Fig. 5.72). For example, for N = 3 and N = 4 we have

T^{(3)} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad C^{(3)} = \frac{1}{6}\begin{bmatrix} 0 & 6 & 0 & 0 \\ -2 & -3 & 6 & -1 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix};

T^{(4)} = \begin{bmatrix} 1 & 2 & 4 & 8 & 16 \\ 0 & 1 & 4 & 12 & 32 \\ 0 & 0 & 1 & 6 & 24 \\ 0 & 0 & 0 & 1 & 8 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \qquad C^{(4)} = \frac{1}{24}\begin{bmatrix} 0 & 0 & 24 & 0 & 0 \\ 2 & -16 & 0 & 16 & -2 \\ -1 & 16 & -30 & 16 & -1 \\ -2 & 4 & 0 & -4 & 2 \\ 1 & -4 & 6 & -4 & 1 \end{bmatrix}.
Thus the shifted Farrow sub-filter TFs are, respectively,

C_0^{(3)}(z) = z^{-1}
C_1^{(3)}(z) = -\frac{1}{3} - \frac{1}{2}z^{-1} + z^{-2} - \frac{1}{6}z^{-3}
C_2^{(3)}(z) = \frac{1}{2} - z^{-1} + \frac{1}{2}z^{-2}
C_3^{(3)}(z) = -\frac{1}{6} + \frac{1}{2}z^{-1} - \frac{1}{2}z^{-2} + \frac{1}{6}z^{-3}

and

C_0^{(4)}(z) = z^{-2}
C_1^{(4)}(z) = \frac{1}{12} - \frac{2}{3}z^{-1} + \frac{2}{3}z^{-3} - \frac{1}{12}z^{-4}
C_2^{(4)}(z) = -\frac{1}{24} + \frac{2}{3}z^{-1} - \frac{5}{4}z^{-2} + \frac{2}{3}z^{-3} - \frac{1}{24}z^{-4}
C_3^{(4)}(z) = -\frac{1}{12} + \frac{1}{6}z^{-1} - \frac{1}{6}z^{-3} + \frac{1}{12}z^{-4}
C_4^{(4)}(z) = \frac{1}{24} - \frac{1}{6}z^{-1} + \frac{1}{4}z^{-2} - \frac{1}{6}z^{-3} + \frac{1}{24}z^{-4}.
Property 5.9 From Fig. 5.73 we can observe that the TFs of the sub-filters, except for C_0(z), have, around ω = 0, the highpass characteristics typical of differentiator filters [1–3]. This depends on the global TF required of the bank. In fact, by developing the target function H_D(e^{jω}) = e^{-jωD} in a Taylor series we get

H_D(e^{j\omega}) = e^{-j\omega D} = \sum_{m=0}^{\infty} \frac{(-1)^m}{m!} \alpha^m (j\omega)^m e^{-j\omega D_i} = \sum_{m=0}^{\infty} \frac{(-1)^m}{m!} \alpha^m H_m(e^{j\omega})   (5.82)

where H_m(e^{jω}) = (jω)^m e^{-jωD_i} is the frequency response of the m-order differentiator, which for m = 0 is a simple delay.
The previous property suggests that the coefficients of the Farrow structure may be obtained using FIR approximations of ideal differentiators. This property, as we will also see in the following paragraphs, has been used for the alternative design
Fig. 5.73 Farrow sub-filters characteristic for N = 3. Note that C0 (z) (top) is a simple delay while C1 (z), C2 (z) and C3 (z), for ω → 0, have the typical highpass characteristics of a differentiator filter
Fig. 5.74 Farrow FDL for N = 3 (left) and N = 4 (right): overall magnitude and phase-delay responses versus normalized frequency, for fractional delays α = 0.0, 0.1, . . . , 1.0
of Farrow’s structure [44]. However, given the high degree of freedom in choosing the sub-filters, the reverse is not necessarily true: depending on the chosen design philosophy, the sub-filters are not required to approximate differentiators [47] (Fig. 5.74). Remark 5.25 Note that the computational cost of Farrow’s robust structure requires the calculation, even if in parallel, of N FIR filters, for which we have N(N+1) multiplications and N² additions. Considering also the N multiplications by D, we have a total computational cost of N² + 2N multiplications and N² + 2N additions per sample. Moreover, since the delay line is shared throughout the bank, many intermediate calculations are common to multiple filters. In the above example for N = 4, the term (2/3)z^{-3} is common to C1(z) and C2(z). Many other terms are common up to a sign. These terms are easily grouped together in order to decrease the number of multiplications, which generally cost more than additions. However, more efficient architectures are available in the literature. For example, in [52] a Farrow structure is proposed in which the filter bank is implemented with power-of-two coefficient values, so it is possible to reduce the multiplier complexity in case of hardware implementation.
5.5.8 Taylor Expansion of Lagrange Interpolation Filter

The Taylor expansion of the Lagrange interpolation filter (LIF), as seen above in Eq. (5.82), consists of a set of basis functions made of simple differentiator filters. Based on this observation, Candan in [85] proposes a simple structure based on the LIF Taylor expansion carried out directly in the discrete-time domain. Before proceeding, to simplify the development, as suggested in [86], let us consider the following useful notations.

1. The symbol Δ(·) indicates the backward-difference operator: Δ(f[n]) = f[n] − f[n−1], i.e., its TF is (1 − z^{-1}).
2. The term f^{[N]} = f(f+1)(f+2) ··· (f+N−1) indicates the factorial polynomial (aka rising factorial or Pochhammer symbol).
3. Its first difference obeys, in analogy with (d/df) f^N = N f^{N−1}, the recursive formula Δ(f^{[N]}) = N f^{[N−1]}.

Thus, the truncated discrete-time Taylor series of the LIF can be written as

   y(t) = Σ_{n=0}^{N} Δ^n(x[k]) · (t−k)^{[n]} / n!                        (5.83)

The previous expression is also denoted as Newton’s backward difference formula [87]. Now, noting that Δ^n(x[k]) = Δ^{n−1}(x[k]) − Δ^{n−1}(x[k−1]), with simple manipulations the terms of (5.83) can be written recursively as

   Δ^n(x[k]) · (t−k)^{[n]}/n! = [Δ^{n−1}(x[k]) − Δ^{n−1}(x[k−1])] · ((t−k)^{[n−1]}/(n−1)!) · ((t−k+n−1)/n).      (5.84)

Indicating with D = k − t the desired fractional delay, the expression (5.84) yields the modular structure shown in Fig. 5.75, also denoted as a Newton fractional-delay (NFD) filter. In this case the sub-filters are simple differentiators implementable without multiplications, and the complexity for an N-order filter is O(N), instead of the O(N²) of Farrow’s structure. In the LIF structure of Fig. 5.75, the delayed input signal is simultaneously present at the intermediate outputs for different delay parameters. In addition, note that the structure in Fig. 5.75 consists of a series of sub-filters whose outputs are multiplied by a term that depends on the target delay D (Fig. 5.77). However, each output sample depends on the current and past values of the delay parameter. Therefore, the above LIF structure does not work correctly for time-varying fractional delay (e.g., as in sample rate conversion). To overcome the problem, in [88] a new architecture based on Newton’s backward difference formula (5.83) has been proposed, called the Newton-transpose interpolation structure (TNIS), illustrated in Fig. 5.76.
Fig. 5.75 Modular LIF structure also denoted as Newton fractional-delay (NFD) filter. At the summer outputs all LIF interpolators order are available (modified from [85])
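What the NFD structure computes can be sketched directly from (5.83)–(5.84) in a few lines of Python (a hedged illustration of this text; the function name `newton_fd` is hypothetical, not from [85]): the n-th backward difference of the input is weighted by the running product (t − k + n − 1)/n, with t − k = −D.

```python
import numpy as np

def newton_fd(x, k, D, N=3):
    # y ~ x(k - D): Newton backward-difference interpolation of order N,
    # evaluated at time index k with fractional delay 0 <= D < 1.
    y = float(x[k])
    diff = np.asarray(x[:k + 1], dtype=float)
    gain = 1.0
    for n in range(1, N + 1):
        diff = diff[1:] - diff[:-1]     # n-th backward difference at time k
        gain *= (n - 1 - D) / n         # running product (t - k + n - 1)/n, t - k = -D
        y += gain * diff[-1]
    return y

x = np.arange(20, dtype=float)
print(newton_fd(x, 10, 0.3))            # a ramp is interpolated exactly: ~9.7
```

For polynomial inputs of degree up to N the differences of higher order vanish, which is a quick way to check that the recursion reproduces Lagrange interpolation.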
Fig. 5.76 Modular LIF structure also denoted as transposed Newton fractional-delay (TNFD) filter (modified from [88])
Fig. 5.77 Magnitude and absolute delay responses of modular LIF implemented with the structure in Fig. 5.76 of orders 4, 5
In this case the upper part of the structure is similar to a delay line where the delay elements are replaced by differentiator-elements z −1 → (1 − z −1 ), with no intermediate operations. The parameter that defines the fractional delay is external to this chain so it can be time-varying because, as in Farrow’s structure, it does not affect the internal state of the sub-filters. This is very interesting for audio applications as it is very robust, efficient, easily scalable, and feasible in both software and dedicated hardware structures.
5.6 Digital Oscillators

As with analog signals, it is very often necessary also for numerical sequences to generate periodic waveforms such as sinusoidal, square, or triangular waves. A simple approach for the generation of periodic signals consists in determining a filter with TF H(z) whose impulse response h[n] is equal to the desired waveform. By sending a unit impulse δ[n] to this filter, the desired waveform is generated. The computational cost of the generator is therefore that of the filtering process. However, a more efficient way to generate any waveform is the so-called wavetable technique. The technique simply consists in storing one period of the waveform in a table, consisting of a Random Access Memory (RAM) organized as a circular buffer, and reading its contents periodically. The reading period therefore corresponds to the fundamental period of the waveform.
5.6.1 Sinusoidal Digital Oscillator

This approach is based on the synthesis of a TF that allows the simple generation of a sinusoidal signal at frequency f0 with sampling rate fs [3]. Let ω0 = 2π f0/fs be the angular frequency of the desired sinusoid; the TF’s impulse response can be written as h[n] = R^n sin(ω0 n) u[n], which for 0 < R < 1 corresponds to the generation of a sinusoid with exponential decay. In the case R = 1, the generated signal is a pure sinusoid with frequency f0. The corresponding TF results in

   Hs(z) = R sin(ω0) z^{-1} / (1 − 2R cos(ω0) z^{-1} + R² z^{-2}).
Similarly, it is possible to generate a cosine wave oscillation. In this case h[n] results to be h[n] = R^n cos(ω0 n) u[n], and the resulting TF is

   Hc(z) = (1 − R cos(ω0) z^{-1}) / (1 − 2R cos(ω0) z^{-1} + R² z^{-2}).
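The TFs above translate directly into a second-order difference equation; a small Python sketch (illustrative only; the function name is an assumption of this text) excites the sine-oscillator filter with a unit impulse, so that for R = 1 the output matches sin(ω0 n):

```python
import math

def sine_oscillator(f0, fs, num, R=1.0):
    # y[n] = R*sin(w0)*x[n-1] + 2*R*cos(w0)*y[n-1] - R^2*y[n-2], with x[n] = delta[n]
    w0 = 2 * math.pi * f0 / fs
    y1 = y2 = x1 = 0.0              # y[n-1], y[n-2], x[n-1]
    out = []
    for n in range(num):
        x = 1.0 if n == 0 else 0.0  # unit impulse at the input
        y = R * math.sin(w0) * x1 + 2 * R * math.cos(w0) * y1 - R * R * y2
        out.append(y)
        y2, y1, x1 = y1, y, x
    return out

y = sine_oscillator(f0=1000.0, fs=48000.0, num=100)   # y[n] ~ sin(2*pi*1000*n/48000)
```

For 0 < R < 1 the same recursion produces the exponentially decaying sinusoid R^n sin(ω0 n); once R sin ω0 and 2R cos ω0 are precomputed, only a few multiplications per sample are needed.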
5.6.2 Wavetable Oscillator

In computer music, the wavetable oscillator is one of the earliest techniques used in digital sound synthesis [63]. The wavetable oscillator is realized with a RAM memory in which one period of the waveform is stored and read periodically at a certain speed. The read values are sent to a digital-to-analog converter (DAC) that produces an analog output signal. The table on which the waveform is stored is called a look-up table (LUT) and is generally read with the circular buffer technique described above [3, 64] (Fig. 5.78). Let D be the length of the LUT where one period of the waveform is stored, and fs the reading speed of the table samples; the frequency of the periodic sound is then f = fs/D. However, if we want to realize a sound with the same waveform but with a different frequency, we can proceed in two ways:
Fig. 5.78 Discrete-time circuit diagrams for the implementation of a pure oscillator. a Sine oscillator. b Cosine oscillator
1. by varying the reading frequency fs of the table;
2. by virtually varying, with appropriate fractional interpolation techniques, the length of the table.

Since, for the pitch variation of the memorized sound, it is more complex to vary the LUT reading sampling frequency, the second technique is almost always adopted. The change in the length of the table is obtained by means of an interpolation process. That is, a table of a certain length is used (generally a rather long one), and at each reading step the most appropriate value is obtained either by interpolation (see Sect. 5.5) between two (or more) adjacent points, or by taking the abscissa value closest to the desired one (zero-order interpolation). Let the sampling increment n_SI be the distance between two successively read samples; the fundamental frequency of the sound produced then results as

   f0 = n_SI · fs / D.                                   (5.85)
Wavetable synthesizers read the sampled waveform sequentially from the buffer, and the samples can be read with a variable-increment pointer. Let p be the instantaneous phase of the oscillator (pointer to the circular buffer); the reading algorithm can be implemented simply with a modulo-D operation

   p = (p + n_SI) % D
   s = A · LUT{p}

where A is the signal amplitude, LUT{·} is the table where the waveform is stored, and s represents the output signal.

Remark 5.26 Note that the wavetable oscillator is the basic element for the realization of numerous sound synthesis techniques. This type of oscillator is used both in real time, with dedicated hardware, and off-line, with programs that store the file containing the song, which will be available for listening at a later time [63–67].
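The reading algorithm above, completed with the first-order (linear) interpolation of Sect. 5.5, can be sketched as follows (a hedged Python illustration; the function name and parameters are assumptions of this text):

```python
import math

def wavetable_osc(lut, n_si, num_samples, amplitude=1.0):
    # Read a stored single-cycle waveform with a (possibly fractional)
    # sampling increment n_SI; output fundamental f0 = n_SI*fs/D (Eq. 5.85).
    D = len(lut)
    p = 0.0                                              # instantaneous phase (pointer)
    out = []
    for _ in range(num_samples):
        i = int(p)
        frac = p - i
        s = (1 - frac) * lut[i] + frac * lut[(i + 1) % D]   # linear interpolation
        out.append(amplitude * s)
        p = (p + n_si) % D                               # circular-buffer (modulo-D) update
    return out

D = 64
lut = [math.sin(2 * math.pi * n / D) for n in range(D)]
y = wavetable_osc(lut, n_si=2.0, num_samples=32)
```

With n_SI = 2 and D = 64, Eq. (5.85) gives f0 = fs/32, i.e., one waveform period every 32 output samples.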
5.6.2.1 Band-limited and Fractional Delay Wavetable Oscillator
Many waveforms useful in sound synthesis have discontinuities. Square, triangular, and other waveforms, or their derivatives, have a spectrum that is not band-limited, so their direct use in wavetable oscillators would give rise to aliasing phenomena. In the presence of aliasing, the harmonics above the Nyquist limit fold back into the low frequencies, causing disturbance and noise in the audio range [66, 67].

Remark 5.27 A simple method to avoid aliasing is to band-limit the stored waveform appropriately, rounding the discontinuity near the desired sampling time. If in Eq. (5.85) the parameter n_SI > 1, the effective sampling period is lowered. In this case, care must be taken to respect the Nyquist frequency. In fact, interpolated wavetable synthesis is not guaranteed to be band-limited when the phase increment is larger than one sample. Thus, for a waveform with many harmonics, an upper bound is usually imposed on the phase increment by the highest harmonic of the signal in the wavetable. Let N_h be the harmonic number; a common choice is

   max(n_SI) = D f0 / (fs N_h) = D / (P N_h)

where P = fs/f0 ∈ R is normally not an integer. Otherwise, to obtain a band-limited signal, an anti-aliasing/anti-imaging filter must be applied before conversion. For example, to increase the pitch of a signal the sampling frequency should be augmented (P < 1). Consequently, to prevent aliasing, effective lowpass filtering must be provided. So, according to Eq. (4.31), an anti-aliasing filter h_a(t) = sinc(f_a t) with f_a = min(fs/2, f0/2) must be considered (see Sect. 4.2.5). In other words, the up/downsampling algorithm requires a lowpass filtering with a variable cutoff frequency that is controlled by the conversion ratio P. Thus, anti-aliasing wavetable methods utilize variable fractional delay filters as an essential part of the oscillator algorithm [73].
5.6.2.2 Signal-to-Noise Ratio of Wavetable Oscillator
The signal-to-noise ratio (SNR) of the wavetable oscillator can be analyzed with simple considerations [64, 65]. For a table of length D = 2^b, let x_i[n] be the reference signal (obtained with an ideal sampling) and x[n] the signal obtained from the wavetable oscillator; the RMS error is defined as

   e_RMS = sqrt( (1/D) Σ_{n=1}^{D} (x_i[n] − x[n])² ).
Considering this error as an additive disturbance, and zero-mean signals, we can easily calculate the SNR as

   SNR = E{x²[n]}/E{e²[n]} = σx²/σe²,   or in decibels   SNR_dB = 10 log10(σx²/σe²).      (5.86)

Therefore, the SNR depends on the input signal statistics: if the input signal level is low, the SNR decreases. For a signal sample representation with a b-bit-long word, the quantization step q is defined as q = 2 x_max / 2^b, where x_max, with |x[n]| ≤ x_max, is the maximum level of the input signal. Considering a uniformly distributed quantization error, the noise variance turns out to be σe² = q²/12 = x_max²/(3 · 2^{2b}). So, Eq. (5.86) can be rewritten as

   SNR = σx²/σe² = 3 · 2^{2b} σx² / x_max² = 3 · 2^{2b} / (x_max/σx)²

that in dB is

   SNR_dB = 20 log10( √3 · 2^b / (x_max/σx) ) = 6.02 b + 4.77 − 20 log10(x_max/σx).

So, for a maximum-amplitude sinusoidal signal we get SNR_dB ≈ 6.02b + 1.76; for a uniformly distributed signal we have SNR_dB ≈ 6.02b; while for a Gaussian signal the SNR is SNR_dB ≈ 6.02b − 8.5. For an audio signal such as voice or music, the Gaussian distribution can be considered an approximation of the true distribution, and thus the SNR is 8.5 dB lower than the best case. For example, in linear audio CDs, the signal is represented with 16 bits, so the SNR is about 87.8 dB.
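These figures are easy to verify numerically (a sketch; the only inputs are the crest factors x_max/σx, and the Gaussian case uses the −8.5 dB figure quoted above):

```python
import math

def snr_db(b, crest_db):
    # SNR_dB = 6.02*b + 4.77 - 20*log10(x_max/sigma_x)
    return 6.02 * b + 4.77 - crest_db

b = 16                                               # linear audio CD word length
sine    = snr_db(b, 20 * math.log10(math.sqrt(2)))   # full-scale sinusoid
uniform = snr_db(b, 20 * math.log10(math.sqrt(3)))   # uniformly distributed signal
gauss   = snr_db(b, 4.77 + 8.5)                      # Gaussian approximation
print(round(sine, 1), round(uniform, 1), round(gauss, 1))   # 98.1 96.3 87.8
```

The three crest factors reproduce 6.02b + 1.76, 6.02b, and 6.02b − 8.5 dB, respectively.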
References

1. A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing, 3rd edn. (Pearson Education, 2010) 2. L.R. Rabiner, B. Gold, Theory and Application of Digital Signal Processing (Prentice-Hall, Englewood Cliffs, NJ, 1975) 3. S.J. Orfanidis, Introduction to Signal Processing (Prentice Hall, 2010). ISBN 0-13-209172-0 4. P. Dutilleux, U. Zölzer, Filters, in DAFX, Digital Audio Effects (John Wiley & Sons Inc., 2002), pp. 31–62 5. D. Rocchesso, Introduction to Sound Processing (2004). ISBN-10: 8890112611. http://freecomputerbooks.com/Introduction-to-Sound-Processing.html
6. A. Uncini, Fundamentals of Adaptive Signal Processing, Signals and Communication Technology Book Series (Springer, 2015). ISBN: 978-3-319-02806-4 7. W.H. Kautz, Transient synthesis in the time domain. IRE Trans. Circuit Theory 1(3), 29–39 (1954) 8. T.Y. Young, W.H. Huggins, Discrete orthonormal exponentials, in Proceedings of National Electronics Conference (1962), pp. 10–18 9. P.W. Broome, Discrete orthonormal sequences. J. Assoc. Comput. Mach. 12(2), 151–168 (1965) 10. H.J.W. Belt, Orthonormal Bases for Adaptive Filtering (Technische Universiteit Eindhoven, Eindhoven, 1997). https://doi.org/10.6100/IR491853 11. A.C. den Brinker, H.J.W. Belt, Using Kautz models in model reduction, in Signal Analysis and Prediction, ed. by D.A. Prochazka, N.G. Kingsbury, P.J.W. Payner, J. Uhlir (1998). ISBN: 978-1-4612-7273-1 12. P.S.C. Heuberger, T.J. de Hoog, P.M.J. van den Hof, B. Wahlberg, Orthonormal basis functions in time and frequency domains: Hambo transform theory. SIAM J. Control Optim. 42(4), 1347–1373 (2003) 13. B. Wahlberg, System identification using Kautz models. IEEE Trans. Autom. Control 39(6) (1994) 14. T. Paatero, M. Karjalainen, Kautz filters and generalized frequency resolution: theory and audio applications. J. Audio Eng. Soc. 51(1/2) (2003) 15. T.J. Mourjopoulos, M.A. Paraskevas, Pole and zero modeling of room transfer functions. J. Sound Vib. 146, 281–302 (1991) 16. Y. Haneda, S. Makino, Y. Kaneda, Common acoustical pole and zero modeling of room transfer functions. IEEE Trans. Speech Audio Process. 2(2), 320–328 (1994) 17. G. Bunkheila, R. Parisi, A. Uncini, Model order selection for estimation of common acoustical poles, in IEEE International Symposium on Circuits and Systems (Seattle, WA, USA, 2008), pp. 1180–1183 18. G. Vairetti, E. De Sena, M. Catrysse, S.H. Jensen, M. Moonen, T. van Waterschoot, A scalable algorithm for physically motivated and sparse approximation of room impulse responses with orthonormal basis functions. IEEE/ACM Trans. Audio Speech Lang. Process. 25(7) (2017) 19. T. Oliveira e Silva, Rational orthonormal functions on the unit circle and on the imaginary axis, with applications in system identification. http://www.ieeta.pt/tos/bib/8.2.ps.gz (1995) 20. T. Oliveira e Silva, Optimality conditions for truncated Laguerre networks. IEEE Trans. Signal Process. 42(9), 2528–2530 (1994) 21. T. Oliveira e Silva, Laguerre filters—an introduction. Revista do Detua 1(3) (1995) 22. A.V. Oppenheim, D.H. Johnson, K. Steiglitz, Computation of spectra with unequal resolution using the fast Fourier transform. Proc. IEEE 59, 299–301 (1971) 23. W. Schüssler, Variable digital filters. Arch. Elek. Übertragung 24, 524–525 (1970) 24. A.G. Constantinides, Spectral transformations for digital filters. Proc. IEEE 117(8), 1585–1590 (1970) 25. S. Bagchi, S.K. Mitra, Nonuniform Discrete Fourier Transform and its Signal Processing Applications (Kluwer, Norwell, MA, 1999) 26. M. Karjalainen, A. Härmä, Realizable warped IIR filters and their properties, in Proceedings of IEEE ICASSP’97 (Munich, Germany, 1997), pp. 2205–2208 27. A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U.K. Laine, J. Huopaniemi, Frequency-warped signal processing for audio applications. J. Audio Eng. Soc. 48(11), 1011–1031 (2000) 28. A. Härmä, Implementation of frequency-warped recursive filters. Signal Process. 80, 543–548 (2000) 29. A. Makur, S.K. Mitra, Warped discrete-Fourier transform: theory and applications. IEEE Trans. Circ. Syst. I Fundamental Theory Appl. 48(9), 1086–1093 (2001) 30. J.O. Smith, J.S. Abel, Bark and ERB bilinear transform. IEEE Trans. Speech Audio Process. 7, 697–708 (1999) 31. M. Karjalainen, E. Piirilä, A. Järvinen, J. Huopaniemi, Comparison of loudspeaker equalization methods based on DSP techniques. J. Audio Eng. Soc. 47(1–2), 14–31 (1999)
32. M. Karjalainen, T. Paatero, J. Pakarinen, V. Välimäki, Special digital filters for audio reproduction, in AES 32nd International Conference (Hillerød, Denmark, 2007), pp. 21–23 33. J.H. McClellan, T.W. Parks, A personal history of the Parks-McClellan algorithm. IEEE Signal Process. Mag. 22(2), 82–86 (2005) 34. G. Ramos, J.J. López, B. Pueo, Cascaded warped-FIR and FIR filter structure for loudspeaker equalization with low computational cost requirements. Digital Signal Process. 19, 393–409 (2009) 35. V. Välimäki, Discrete-time modeling of acoustic tubes using fractional delay filters. Ph.D. dissertation, Lab. Acoust. Audio Signal Process., TKK, Espoo, Finland, 1995 36. V. Välimäki, A new filter implementation strategy for Lagrange interpolation, in Proceedings of ISCAS’95—International Symposium on Circuits and Systems (1995) 37. T.I. Laakso, V. Välimäki, M. Karjalainen, U.K. Laine, Splitting the unit delay—tools for fractional delay filter design. IEEE Signal Process. Mag. 13(1) (1996) 38. H. Meyr, M. Moeneclaey, S.A. Fechtel, Digital Communication Receivers (John Wiley & Sons, Inc., 1998). ISBN: 0-471-50275-8 39. J.P. Thiran, Recursive digital filters with maximally flat group delay. IEEE Trans. Circ. Theory CT-18(6) (1971) 40. R.E. Crochiere, L.R. Rabiner, Multirate Digital Signal Processing (Prentice-Hall, Englewood Cliffs, New Jersey, 1983) 41. C.W. Farrow, A continuously variable digital delay element, in Proceedings of IEEE International Symposium on Circuits and Systems, vol. 3 (Espoo, Finland, 1988), pp. 2641–2645 42. G.S. Liu, C.H. Wei, A new variable fractional delay filter with nonlinear interpolation. IEEE Trans. Circ. Syst. II Analog Dig. Signal Process. 32(2), 123–126 (1992) 43. H. Johansson, P. Löwenborg, On the design of adjustable fractional delay filters. IEEE Trans. Circ. Syst. II Analog Dig. Signal Process. 50, 164–169 (2003) 44. A. Franck, Efficient algorithms and structures for fractional delay filtering based on Lagrange interpolation. J. Audio Eng. Soc. 56(12), 1036–1056 (2008) 45. A. Franck, K. Brandenburg, U. Richter, Efficient delay interpolation for wave field synthesis, in 125th Audio Engineering Society (AES) Convention (San Francisco, CA, USA, 2008) 46. W.-S. Lu, T.B. Deng, An improved weighted least-squares design for variable fractional delay FIR filters. IEEE Trans. Circ. Syst. II Analog DSP 46(8) (1999) 47. T.B. Deng, Coefficient-symmetries for implementing arbitrary-order Lagrange-type variable fractional-delay filters. IEEE Trans. Signal Process. 55(8), 4078–4090 (2007) 48. J. Vesma, T. Saramäki, Optimization and efficient implementation of FIR filters with adjustable fractional delay, in Proceedings of IEEE International Symposium on Circuits and Systems, vol. 4 (Hong Kong, 1997), pp. 2256–2259 49. K. Rajamani, Y.S. Lai, C.W. Farrow, An efficient algorithm for sample rate conversion from CD to DAT. IEEE Signal Process. Lett. 7(10), 288–290 (2000) 50. T.A. Ramstad, Digital methods for conversion between arbitrary sampling frequencies. IEEE Trans. ASSP ASSP-32 (1984) 51. P.J. Kootsookos, R.C. Williamson, FIR approximation of fractional sample delay systems. IEEE Trans. Circ. Syst. II Analog Dig. Signal Process. 43(3), 269–271 (1996) 52. C.K.S. Pun, Y.C. Wu, S.C. Chan, K.L. Ho, On the design and efficient implementation of the Farrow structure. IEEE Signal Process. Lett. 10(7), 189–192 (2003) 53. http://www.acoustics.hut.fi/software/fdtools/ 54. S. Cucchi, F. Desinan, G. Parlatori, G. Sicuranza, DSP implementation of arbitrary sampling frequency conversion for high quality sound application, in Proceedings of IEEE ICASSP’91 (Toronto, 1991), pp. 3609–3612 55. J. Dattorro, Effect design, part 2: delay-line modulation and chorus. J. Audio Eng. Soc. 45(10), 764–788 (1997) 56. M.R. Schroeder, Digital simulation of sound transmission in reverberant spaces. Part 1. J. Acoust. Soc. Am. 47(2), 424–431 (1970) 57. D. Rocchesso, Fractionally addressed delay lines. IEEE Trans. Speech Audio Process. 8(6) (2000)
58. A.H. Gray, J.D. Markel, Digital lattice and ladder filter synthesis. IEEE Trans. Audio Electroacoust. AU-21, 491–500 (1973) 59. W.G. Gardner, The Virtual Acoustic Room. S.B. Thesis, Computer Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1982 60. G. Martinelli, M. Salerno, Fondamenti di Elettrotecnica, vols. 1 and 2 (Edizioni Siderea, Roma, 2nd edn., 1995) 61. A.H. Gray, J.D. Markel, A normalized digital filter structure. IEEE Trans. Acoust. Speech Signal Process. ASSP-23, 268–277 (1975) 62. P.A. Regalia, S.K. Mitra, P.P. Vaidyanathan, The digital all-pass filter: a versatile signal processing building block. Proc. IEEE 76(1), 19–37 (1988) 63. M.V. Mathews, The Technology of Computer Music (MIT Press, Cambridge, MA, 1969) 64. W. Hartmann, Digital waveform generation by fractional addressing. J. Acoust. Soc. Am. 82, 1883–1891 (1987) 65. F.R. Moore, Table lookup noise for sinusoidal digital oscillators. Comput. Music J. 1(1), 26–29 (1977) 66. H.G. Alles, Music synthesis using real time digital techniques. Proc. IEEE 68(4), 436–449 (1980) 67. M. Puckette, The Theory and Technique of Electronic Music (World Scientific Publishing Co., Hackensack, NJ, 2007) 68. T. Stilson, J.O. Smith, Alias-free digital synthesis of classic analog waveforms, in Proceedings of ICMC (1996) 69. J.O. Smith, Physical Audio Signal Processing (2010). Web published at http://ccrma.stanford.edu/~jos/pasp/ 70. P. Cook, G. Scavone, Synthesis Tool Kit in C++. http://www-ccrma.stanford.edu/CCRMA/software/STK/ 71. S. Mitra, K. Hirano, Digital all-pass networks. IEEE Trans. Circ. Syst. 21(5), 688–700 (1974) 72. A.H. Gray, J.D. Markel, Digital lattice and ladder filter synthesis. IEEE Trans. Audio Electroacoust. 21(6), 491–500 (1973) 73. J. Pekonen, V. Välimäki, J. Nam, J.O. Smith, J.S. Abel, Variable fractional delay filters in bandlimited oscillator algorithms for music synthesis, in International Conference on Green Circuits and Systems (ICGCS) (2010) 74. P.P. Vaidyanathan, S. Mitra, Y. Neuvo, A new approach to the realization of low-sensitivity IIR digital filters. IEEE Trans. Acoust. Speech Signal Process. ASSP-34(2), 350–361 (1986) 75. P. Kootsookos, R.C. Williamson, FIR approximation of fractional sample delay systems. IEEE Trans. Circ. Syst. II Analog Dig. Signal Process. 43(2) (1996) 76. A. Bhandari, P. Marziliano, Fractional delay filters based on generalized cardinal exponential splines. IEEE Signal Process. Lett. 17(3) (2010) 77. J.O. Smith, Introduction to Digital Filters for Audio Applications, online book (2007). Web published at https://ccrma.stanford.edu/~jos/filters/ 78. E. Zwicker, H. Fastl, Psychoacoustics: Facts and Models (Springer-Verlag, Berlin, Germany, 1990) 79. J. Stautner, M. Puckette, Designing multi-channel reverberators. Comput. Music J. 6(1), 52–65 (1982) 80. V. Välimäki, J.D. Parker, L. Savioja, J.O. Smith III, J.S. Abel, Fifty years of artificial reverberation. IEEE Trans. Audio Speech Lang. Process. 20(5), 1421–1448 (2012) 81. S.J. Schlecht, E.A.P. Habets, On lossless feedback delay networks. IEEE Trans. Signal Process. 65(6), 1554–1564 (2017) 82. J.O. Smith, P. Gossett, A flexible sampling-rate conversion method, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 2 (San Diego, IEEE Press, New York, 1984), pp. 19.4.1–19.4.2. Expanded tutorial and associated free software available at the Digital Audio Resampling Home Page: http://ccrma.stanford.edu/~jos/resample/ 83. E. Meijering, A chronology of interpolation: from ancient astronomy to modern signal and image processing. Proc. IEEE 90, 319–342 (2002)
84. M.M.J. Yekta, Equivalence of the Lagrange interpolator for uniformly sampled signals and the scaled binomially windowed shifted sinc function. Digital Signal Process. 19, 838–842 (2009) 85. Ç. Candan, An efficient filtering structure for Lagrange interpolation. IEEE Signal Process. Lett. 14(1), 17–19 (2007) 86. J.O. Smith, Bandlimited interpolation, fractional delay filtering, and optimal FIR filter design, from lecture overheads, Center for Computer Research in Music and Acoustics (CCRMA) (Stanford University, 2020). https://ccrma.stanford.edu/~jos/Interpolation/Interpolation.html 87. E.W. Weisstein, Newton’s Backward Difference Formula. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/NewtonsBackwardDifferenceFormula.html 88. V. Lehtinen, M. Renfors, Structures for interpolation, decimation, and nonuniform sampling based on Newton’s interpolation formula. HAL Id: hal-00451769, https://hal.archives-ouvertes.fr/hal-00451769, submitted 30 Jan 2010
Chapter 6
Circuits and Algorithms for Physical Modeling
6.1 Introduction

Sound and acoustic wave propagation are very complex phenomena, and for the generation or numerical manipulation of audio signals it is of primary importance to have both theoretical and practical tools to model this complexity in systematic terms. The modeling of acoustic phenomena is, in fact, central to many applications of digital audio. Think, for example, of the simulation of propagation phenomena in confined environments for virtual acoustic reconstruction, or of sound synthesis by physical models of acoustic musical instruments. In these cases, it is the numerical model that becomes the instrument itself. The simulation of an analog acoustic system by an algorithm, which is by definition formulated in discrete time (DT), can be done according to several very different paradigms. The relationships between continuous-time (CT) acoustic quantities, known in terms of partial differential equations (PDEs), are converted, according to some paradigm, into relationships between numerical sequences, in terms of finite-difference equations (FDEs). The solution takes the form of algorithms that can be implemented on DSP devices and can also operate in real time. In the following sections the main methodologies for the numerical simulation of analog dynamic systems will be discussed. In particular, the methodologies of wave digital filters (WDF), digital waveguides (DW), and the finite-difference time-domain (FDTD) technique are presented and discussed. Finally, we will see how it is possible to interconnect DT sub-models conceived with different philosophies through appropriate methodologies derived from WDF theory. It is in fact possible to connect numerical sub-structures modeled with different paradigms, taking into account appropriate energy constraints.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Uncini, Digital Audio Processing Fundamentals, Springer Topics in Signal Processing 21, https://doi.org/10.1007/978-3-031-14228-4_6
6.1.1 Local and Global Approach

The determination of a mathematical model of a complex physical system can be done with different modes and paradigms (e.g., FDE, DW, WDF, FDTD). In the following, we focus on two different modeling philosophies, called the global approach and the local approach.
6.1.1.1 Global Approach
A first modality consists in discretizing the mathematical relationship between the analog physical system variables. If the system is described by a differential equation, the digital simulator is determined by the finite-difference equation derived by mapping the time variable t ∈ R, defined in CT, into a discrete index n ∈ Z, defined in DT. For example, in the case of a linear system, this is generally done in terms of the transfer function (TF): a mapping is made from the complex plane defined by the Laplace variable s ∈ C to the plane of the complex variable z ∈ C [1]. In other words, with the global approach, the determination of the DT circuit that simulates (or implements) the analog system is done by considering the relationship between input and output variables in global terms, as we have seen, for example, in the case of the shelving filters described in Chap. 3. The global approach, being a purely mathematical method, does not always allow the preservation of certain intrinsic structural system properties (such as robustness and passivity). When mapping a linear CT system to a DT system, a great deal of attention must be paid to problems related to stability (mapping of s-plane poles to z-plane poles), robustness, aliasing, avoiding delay-free loops, etc.
6.1.1.2 Local Approach
A second philosophy is to discretize the individual elements that make up the analog physical system. The so-called constitutive equations, or element equations, of the basic elementary (atomic) constituents of the physical system are mapped. Think of a simple mechanical system consisting of masses, springs, and dampers; or of an analog electrical circuit that implements a filter, characterized by a certain TF H(s). With the global approach, the DT-TF H(z) is determined through a mapping of the complex variable s to z (e.g., by using the bilinear transformation, Sect. 2.8.5.2). On the contrary, with the local method the single constitutive relations of the analog circuit elements (resistor R, inductor L, capacitor C, generators, transformers, N-port networks, etc.) are mapped. The DT circuit is determined by the connection of the correspondingly redefined DT circuit elements. Although the local approach is conceptually simpler, as it allows the discretization of individual parts of the CT system, great attention must be paid to how the various blocks are interconnected.
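As a concrete instance of the global approach, the following Python sketch (an illustration of this text, not from the book; the function name and the first-order RC example are assumptions) maps H(s) = 1/(1 + sτ) to H(z) through the bilinear transformation s → 2 fs (1 − z^{-1})/(1 + z^{-1}):

```python
def bilinear_first_order(tau, fs):
    # H(s) = 1/(1 + s*tau)  ->  H(z) = (b0 + b1*z^-1)/(1 + a1*z^-1)
    c = 2 * fs * tau                 # bilinear constant 2*fs times the time constant
    a0 = 1 + c
    b = [1 / a0, 1 / a0]             # numerator: (1 + z^-1)/a0
    a = [1.0, (1 - c) / a0]          # denominator: 1 + a1*z^-1
    return b, a

b, a = bilinear_first_order(tau=1e-3, fs=48000.0)
# DC gain H(z=1) = (b0 + b1)/(1 + a1) = 1; a zero at z = -1 (Nyquist); |a1| < 1
```

The s-plane pole at −1/τ maps inside the unit circle for any fs, so stability is preserved by this particular mapping; as noted above, the global approach does not in general guarantee such structural properties, which is why the choice of the s-to-z mapping matters.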
In the case of complex dynamic systems, this second approach can be more appropriate because it is possible to associate with each DT block the same functional role as the corresponding CT block. As we will see, moreover, the local approach allows the subdivision of the whole phenomenon to be modeled into sub-models, each of which can be characterized by certain local properties. One of the main advantages of the local method is that, if properly applied, it maintains some remarkable properties of the CT system. In the case of the discretization of a CT circuit, for example, the property of passivity can be preserved, which is very important as it gives the DT circuit its robustness characteristics. Moreover, if the analog physical system is characterized by certain initial conditions (i.c.) on some elements (e.g., a capacitor charged at t = 0), with the local method these are inserted directly into the corresponding DT circuit element. On the contrary, with a global approach the i.c. are purely mathematical facts that cannot be associated with a precise physical element of the system.
6.1.2 Structural, Functional and Interconnected Models

In general terms, for the modeling of complex physical systems it is convenient to subdivide the physical system into sub-models or sub-blocks [2]. In this way it is possible to obtain a numerical model characterized by:

• low local complexity of the single block;
• high global complexity of the overall connected blocks.

Partitioning can take place according to different philosophies: sub-models can be identified starting from physical similarity, functional similarity, simplicity of formal description, or other specific criteria that can be defined when necessary.

Physical similarity—In case one wants to model the behavior of a complex mechanical–acoustical structure, some distinct organs can be identified. For example, in the case of musical instruments such as the piano, we can identify the hammer, the string and the piano soundboard; in the case of the violin: the bow, the string, the bridge, the soundboard, etc.

Functional similarity—In this case the paradigm is not the physical one but the functional one. Still considering the case of musical instruments, we can identify functional organs such as the exciter, the resonator, or a subset of parts fulfilling a specific function (e.g., the damper pedal of the piano).

Simplicity of formal description—The complex system is sectioned into sub-models starting from descriptions in formal terms. For example, it is possible to divide the phenomenon into linear parts and nonlinear parts. In the simulation of analog circuits the elementary formal models are the constitutive relations of the basic circuit components R, L, C.

The constitutive relations can also be defined in a more general form considering multi-port networks that are accessible and can be characterized only by the external
access variables. Each structure can be viewed as an externally accessible block (or device) and can be modeled in aggregate form as an N-port network. Therefore, with this approach it is possible to model even more complex structures with active elements, containing, for example, bipolar junction transistors (BJTs) or other electronic devices with active circuits such as operational amplifiers, thermionic valves and so on.

The subdivision into sub-models can be taken to its limit by considering atomic or fine-grained physical or functional structures. Such simulation methods are characterized by the construction of space–time grids in which each node represents a simple computational process that is, in general, replicated at every node. Among these techniques we can mention the following:

• Atomic decomposition—the system is created by connecting many elementary computational elements, such as elementary oscillators, which exchange energy with each other. In practice, the simulator consists of the connection of many damped mechanical harmonic oscillators. One or more of these oscillators are excited by external forces, while the output signal is picked up by an oscillator in the appropriate position. This method is characterized by a high structural and computational cost and a difficult control of the variables [6].

• Modal models—The modal model was born as an extension of the atomic model and aims at its own descriptive generality. The simulator is made up of a set of independently controllable elementary oscillators that are connected to each other by means of redistribution matrices, which have the task of distributing energy over the respective natural modes of oscillation. The output is usually determined by a weighted combination of the signals from the various damped oscillators.

It is obvious that other types of subdivision are also possible and that there can be considerable overlapping and sharing among the sub-models derived from different paradigms.
6.1.3 Local Approach with Circuit Model

A very powerful formal tool for the modeling and simulation of physical phenomena is the circuital one. With such a paradigm extended to discrete time it is possible, in fact, to describe complex phenomena, from a structural, functional and formal point of view, using few elements (circuit elements), characterized by few constitutive relations, and precise general topological laws (Kirchhoff's laws) that allow the systematic, consistent and simplified study of the interconnection of such elements in formal terms [3]. In general, as shown in Fig. 6.1, the modeling process of a complex physical phenomenon can be divided into two phases:

1. subdivision into sub-blocks;
2. identification of the connection interfaces between the sub-blocks.
Fig. 6.1 Modeling of a complex physical phenomenon by a circuit model. Interconnection of simple, linear, nonlinear lumped and distributed elements with digital waveguides and impedance adapters
The identification of sub-blocks consists in the identification of local properties (structural, functional, formal, topological) in order to make possible a consistent subdivision of the physical system to be modeled. In this phase we are generally still in the continuous-time domain, and the formal block description can be done in mathematical terms (linear and nonlinear differential equations, transfer functions); or, already in this phase, circuit elements and/or interconnected circuits can be identified according to a certain topology (equivalent circuit model). The use of the local model with an equivalent circuit allows a division of the problem in both theoretical and practical terms. The various sub-blocks are implemented with separate software simulators that can be realized very easily with object-oriented programming [4]. It is well known, in fact, that one of the main advantages of the so-called circuital approach to modeling rests on the dualism

circuit ⇔ graph, graph ⇔ algorithm ⇒ circuits ⇔ algorithms.
This is particularly suitable and consistent with modern software development methodologies for the simulation of complex physical systems: the circuit approach is strongly oriented to the design of object-oriented algorithms [5]. With the circuital approach, however, the problem of the consistent interconnection of the various sub-blocks arises. In fact, these blocks must be connected respecting precise local energy constraints of the system, and these physical energy constraints must be translated into corresponding numerical constraints. For this purpose, as we will see later, it is possible to define a new two- or multi-port DT network, used as an interconnection interface between the constituent elements. This new device, called a wave adapter or diffusion junction, takes into account the energy constraints between the physical or wave variables of the system.

Remark 6.1 The local approach, by its essence, leads to the creation of DT structures that are not optimal in terms of computational complexity. However, this is not a major issue: the availability of local elements (well localized also in the software structures) provides a high degree of local and orthogonal control parameters, a property very often indispensable in complex audio applications.
6.1.3.1 Connection of Linear and Nonlinear, Lumped and Distributed Elements
Circuit elements in general can be linear or nonlinear, lumped or distributed. In particular, in acoustic modeling, interconnections of different types of elements are used. For the modeling of propagation phenomena, mainly distributed linear models such as digital waveguides (DW), sometimes called digital transmission lines (DTL), are used. DWs model propagation in a medium (e.g., air) and can be connected to other DWs that model propagation in a medium of a different nature (e.g., a wall) by means of diffusion junctions that can be thought of as numerical impedance adapters [13]. In the physical modeling of acoustic musical instruments, DWs are used to connect linear elements (e.g., resonant filters) and nonlinear elements that are typical of external excitations (e.g., the piano hammer, the violin bow, the clarinet reed).

The first formal tool described in this chapter, of fundamental importance for the modeling of systems that can be described with lumped-parameter circuits, is the theory of wave digital filters (WDF), introduced by Fettweis in the early 70s of the last century [36–39]. WDFs allow a systematic synthesis of DT circuits starting from CT prototypes (suitable for the modeling of complex analog circuits) and preserve a number of important properties: if the analog circuit is lossless or passive, the digital counterpart is also lossless or passive. The second theoretical tool that is analyzed is the one related to distributed-constant circuits. In the following, in fact, the concepts introduced with WDF are extended and integrated with the study of digital waveguides (DW), introduced in sound synthesis for physical modeling by Smith in the early 90s.
In this case the numerical modeling is based on the discretization of the solution of the differential equation (rather than of the equation itself), which for convenience is expressed in terms of wave variables, here denoted as the W-model. We will also see the extension of WDFs connected to static and dynamic nonlinear elements.

The third tool, widely used in electromagnetic modeling and particularly suitable for 2D and 3D acoustic modeling, is the finite-difference time-domain (FDTD) technique, also referred to as the finite-difference method (FDM); it will be briefly illustrated. In FDTD the differential equation is empirically rewritten considering the variables in discrete form, that is, sampling them in the space–time domain. In this case the variables are the same physical quantities of the system, which generally have precise energy relationships between them and, for this reason, are referred to as Kirchhoff variables, or the K-model.
6.1.3.2 K-models versus W-models
A non-secondary aspect in the simulation of complex systems concerns the choice of the variables of interest. In the case of the simulation of an analog system, the most obvious choice would be the simulation of the Kirchhoff variables related to precise physical quantities such as voltage and current, pressure and velocity, force and velocity. In this case, as introduced in the previous section, the model is denoted as the K-model. However, for the consistency of the simulator, it is sometimes more convenient to use other types of variables, not always related to precise physical quantities and generically indicated as wave variables; the model is then referred to as the W-model. For example, the WDF and DW simulation methodologies, as we will see later, are based on wave variables (i.e., the W-model). On the contrary, the FDTD technique usually works directly with physical quantities such as voltage–current or pressure–velocity, the link between the quantities being expressed directly by Kirchhoff's relations (i.e., the K-model). In addition, the diffusion or scattering junctions are sometimes called wave adapters in the W-model case and impedance adapters in the K-model case.
6.1.3.3 Wiener and Hammerstein Nonlinear Models
The sound emitted by any musical instrument is based on an oscillatory phenomenon. A typical musical instrument can be represented by the diagram in Fig. 6.2. The diagram is quite general and applies to both string and wind instruments. For example, in the piano the excitation originates from the percussion of the hammer on the string and the radiation is produced by the soundboard, while in the clarinet the excitation is produced by the nonlinear reed mechanism that drives the oscillation of the air column in the acoustic tube, which behaves like a string. Again, in the
Fig. 6.2 Main structural macro blocks of a typical musical instrument: trigger/excitation, string, resonator/radiator, output sound
guitar, the strings are plucked with a plectrum (or guitar pick) or with the fingers, and the sound is emitted by the soundboard (or top) and the sound box (or resonant chamber), which behaves like a Helmholtz resonator. From the diagram in the figure, it can be seen that the string, vibrating in its natural modes, generates the sound; the excitation provides the energy to the string; and, finally, the radiator/resonator amplifies and diffuses the acoustic signal.

The simplest partitioned nonlinear system can be constructed as a linear dynamic system, characterized by a transfer function H(z) = N(z)/D(z), connected to a memoryless¹ (or zero-memory) nonlinear function g(·). The simplest models of this class are the Hammerstein and Wiener models:

• the Wiener model is a linear dynamical system followed by a memoryless nonlinearity (Fig. 6.3a);
• the Hammerstein model is a memoryless nonlinearity followed by a linear dynamical system with transfer function H(z) (Fig. 6.3b).

In addition, mixed multi-layer models and various topologies, such as those shown in Fig. 6.3c, d, can also be used [51, 52].
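As a concrete illustration of the two structures, the following sketch cascades a linear IIR block and a memoryless nonlinearity in the two possible orders. The one-pole filter H(z) = 0.1/(1 − 0.9z⁻¹) and the tanh saturator are arbitrary example choices, not taken from the text; since the blocks do not commute, the two models produce different outputs for the same input.

```python
import numpy as np

def linear_block(x, b, a):
    """Linear dynamic system H(z) = N(z)/D(z), direct-form IIR (a[0] = 1)."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        y[n] -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
    return y

def wiener(x, b, a, g):
    """Wiener (LN) model: linear system followed by a memoryless nonlinearity."""
    return g(linear_block(x, b, a))

def hammerstein(x, b, a, g):
    """Hammerstein (NL) model: memoryless nonlinearity followed by a linear system."""
    return linear_block(g(x), b, a)

# Example: one-pole lowpass H(z) = 0.1/(1 - 0.9 z^-1) and a tanh saturator.
b, a = [0.1], [1.0, -0.9]
g = np.tanh
x = np.sin(2 * np.pi * 0.01 * np.arange(256))
y_w = wiener(x, b, a, g)       # output bounded by the final tanh
y_h = hammerstein(x, b, a, g)  # output can exceed 1 (filter gain > 1 here)
```

Note how the Wiener output is always confined to (−1, 1) by the final nonlinearity, while the Hammerstein output is not: ordering matters in nonlinear cascades.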
6.2 Wave Digital Filters

In a series of papers from the early 1970s [36–38], Fettweis introduced the concept of wave digital filters (WDF) as a means of obtaining robust and low-sensitivity structures. Such filters belong to the family of structurally passive DT circuits (also described in [50]) and are characterized by low sensitivity, low quantization noise (or roundoff noise) and freedom from limit cycles. Historically, moreover, WDFs were the first DT circuits with structural passivity properties. The first two steps in building a WDF, given a set of specifications [41], are the following: (i) instead of the voltages V and currents I of the K-model (or any other across and through quantities of interest), the so-called wave variables A and B of the W-model are used, defined as

$$ A = V + RI, \qquad B = V - RI $$

¹ In circuit theory an element is said to be without memory, or memoryless, if its constitutive relation contains no time derivatives.
Fig. 6.3 Wiener and Hammerstein nonlinear models. a Wiener linear-nonlinear (LN) model; b Hammerstein nonlinear-linear (NL) model; c hybrid Wiener–Hammerstein linear-nonlinear-linear (LNL) model; d hybrid Wiener–Hammerstein nonlinear-linear-nonlinear (NLN) model
where R is a positive constant called the port resistance; depending on the situation, A and B will be referred to as the forward and backward traveling waves, or port quantities; (ii) the integration operation with respect to time,

$$ y(t) = y(t - T) + \int_{t-T}^{t} x(\tau)\, d\tau, $$

is performed with the trapezoidal rule, approximated as the modified Euler formula (see Sect. 2.8.5.1),

$$ y(t) = y(t - T) + \frac{T}{2}\left[ x(t) + x(t - T) \right]. $$
Thus, denoting by ω and Ω the frequencies in the digital and analog domains, respectively, the mapping between the s-plane and the z-plane is performed through the bilinear transformation in Eq. (2.115) (see Sect. 2.8.5.3), rewritten here for convenience
Fig. 6.4 Structure of an analog filter modeled as a two-pair (two-port) network terminated by the resistances R1 and R2
$$ j\Omega = \frac{2}{T_s} \cdot \frac{1 - e^{-j\omega}}{1 + e^{-j\omega}} \qquad (6.1) $$
where Ts is the sampling period. So, in order to obtain a digital filter with a cut-off frequency equal to ωc, it is necessary first to design an analog filter H(s) with a predistorted cut-off frequency defined as Ωc = (2/Ts) tan(ωc/2).

The second step is to design the analog filter with standard electrical components: resistors R, inductors L and capacitors C (in addition to RLC elements, transformers and other ideal active components are also sometimes included). The analog filter is designed in the form of a 2-port (2P) lossless network doubly terminated on resistances, as shown in Fig. 6.4. The synthesis process is carried out according to the maximum power transfer criterion: at the frequency Ωk, which corresponds to the maximum in the passband, the voltage generator Vi(jΩ) transfers the maximum power to the load R2. It is known that analog filters designed with this paradigm are characterized by a low parametric sensitivity. The third step is to translate the analog TF from the s-plane back into the z-plane.

Remark 6.2 Note that the synthesis procedure with WDFs is carried out considering the wave variables A(s) and B(s). The use of the A and B variables (as alternatives to the Kirchhoff variables) is widespread in both the analysis and the synthesis of CT circuits [3]. Moreover, the s-plane to z-plane mapping by the bilinear transformation in Eq. (6.1) ensures [39]:

1. that delay-free directed loops, which would give rise to non-computable graphs in the corresponding DT circuit, are avoided;
2. that the total delay in any loop (directed or not) is equal to an integer multiple (zero, positive or negative) of the sampling period T.

With this approach, it is possible to model continuous-time circuits (or other analog systems) that contain feedback loops, which are very common in the analog filters used in several audio devices such as, for example, the voltage-controlled filters (VCF) of analog synthesizers.
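The frequency predistortion can be checked numerically: the prewarped analog frequency Ωc = (2/Ts) tan(ωc/2) is mapped by the bilinear transformation in Eq. (6.1) exactly back onto the desired digital frequency ωc. The values Ts = 0.1 s and ωc = 0.2π rad below are arbitrary example choices.

```python
import cmath
import math

Ts = 0.1             # sampling period (arbitrary example value)
wc = 0.2 * math.pi   # desired digital cut-off frequency [rad]

# Predistorted (prewarped) analog cut-off frequency.
Wc = (2 / Ts) * math.tan(wc / 2)

# Bilinear mapping evaluated on the unit circle z = e^{j w}:
# s = (2/Ts)(1 - z^-1)/(1 + z^-1) must equal j*Wc at w = wc.
z = cmath.exp(1j * wc)
s = (2 / Ts) * (1 - 1 / z) / (1 + 1 / z)

assert abs(s - 1j * Wc) < 1e-12  # the analog frequency j*Omega_c is recovered
```

The check works because (1 − e^{−jω})/(1 + e^{−jω}) = j tan(ω/2), which is exactly the identity exploited by the prewarping formula.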
Fig. 6.5 Bipole represented with: a the electrical quantities voltage V and current I; b the wave variables A and B
6.2.1 Representation of CT Circuits with Wave Variables

Let us consider the two Kirchhoff variables² V and I that characterize any linear bipole (or port).

Definition 6.1 Wave variables—In the case where the bipole has no internal excitations, the physical variables V and I can be transformed into another pair of variables, generically called A and B, by the following linear transformation

$$ A = a_{11} V + a_{12} I, \qquad B = a_{21} V + a_{22} I \qquad (6.2) $$
with a11 a22 − a21 a12 ≠ 0. Depending on the choice of the transformation coefficients aij, the port quantities A and B may or may not have a certain physical interpretation. A possible choice, very useful as it highlights the energy exchanges between generators and bipoles, is the one that allows A and B to be interpreted as wave variables³: a11 = a21 = 1 and a12 = −a22 = R.

Definition 6.2 Normalization constant—The parameter R is a normalization constant; it can easily be observed that it has the dimension of a resistance, and it is often called a reference resistance.

With this formalism the variables A and B represent, in fact, voltage waves, which are sometimes indicated by V+ and V−, respectively. Figure 6.5 shows a bipole represented with voltage and current and with wave variables. For an N-port circuit, like the one described in Fig. 6.6a, it is therefore possible to write

$$ A_k = V_k + R_k I_k, \quad B_k = V_k - R_k I_k, \qquad k = 1, 2, \ldots, N \qquad (6.3) $$

where Rk is the reference resistance, or port resistance, of the k-th port. The variables Ak and Bk are called wave variables; in particular, Ak represents a positive voltage traveling wave, or incident wave, entering the port, while Bk represents a negative voltage traveling wave, or reflected wave, leaving the port.
² The discussion can be carried out in both the s domain and the t domain.
³ This name derives from the fact that the variables A and B can be thought of as traveling waves in a transmission line section of infinite length with characteristic impedance R.
Fig. 6.6 Generic analog N-port network. a The wave variables Ak and Bk are highlighted. b Interconnection (cascade) of N-port networks, highlighting the continuity conditions: Aj = Bk, Ak = Bj and Rj = Rk
Since the transformation is invertible for R ≠ 0, there is no loss of information in the use of these quantities rather than the physical quantities Vk and Ik. The study of the networks can therefore be reformulated by means of the quantities Ak and Bk, and the inverse transformation from which the physical variables can be recovered is

$$ V_k = (A_k + B_k)/2, \quad I_k = (A_k - B_k)/(2R_k), \qquad k = 1, 2, \ldots, N. \qquad (6.4) $$
In addition, when two N-ports are connected, as shown in Fig. 6.6b, to maintain continuity in the wave flow we have to consider the conditions Rj = Rk, so that Aj = Bk and Ak = Bj.

Remark 6.3 Note that the port resistances R can take on arbitrary values. However, their values must be chosen carefully in order to avoid delay-free loops when the digital circuits are connected.
6.2.1.1 Multiport WDFs in Matrix Notation
According to [48], for a generic N-port element the transformation that maps the K-port variables onto the W-port variables, Eq. (6.3), can be written in matrix form as

$$ \begin{bmatrix} \mathbf{A} \\ \mathbf{B} \end{bmatrix} = \begin{bmatrix} \mathbf{J} & \mathbf{R} \\ \mathbf{J} & -\mathbf{R} \end{bmatrix} \begin{bmatrix} \mathbf{V} \\ \mathbf{I} \end{bmatrix} \qquad (6.5) $$

where V, I ∈ R^{N×1} are, respectively, the vectors of the across and through K-model quantities; A, B ∈ R^{N×1} are, respectively, the vectors of incident and reflected waves; R = diag(R01, R02, …, R0N) ∈ R^{N×N} is the diagonal port resistance matrix; and J is the (N × N) identity matrix.
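The matrix form (6.5) and the inverse map (6.4) can be verified with a short round-trip check; the port resistances and the K-model values below are arbitrary random example data.

```python
import numpy as np

N = 3
rng = np.random.default_rng(0)
R = np.diag(rng.uniform(0.5, 2.0, N))  # diagonal port resistance matrix
J = np.eye(N)                          # identity matrix

V = rng.standard_normal(N)             # across quantities (K-model)
I = rng.standard_normal(N)             # through quantities (K-model)

# Forward map (6.5): [A; B] = [[J, R], [J, -R]] [V; I]
M = np.block([[J, R], [J, -R]])
A, B = np.split(M @ np.concatenate([V, I]), 2)

# Inverse map (6.4): V = (A + B)/2, I = (A - B)/(2 R_k)
V2 = (A + B) / 2
I2 = (A - B) / (2 * np.diag(R))

assert np.allclose(V, V2) and np.allclose(I, I2)  # lossless change of variables
```

The round trip confirms that, as stated in the text, no information is lost in moving from Kirchhoff variables to wave variables.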
6.2.2 Mapping the Electrical Elements from CT to DT

The idea is to redefine the constitutive relations of the CT circuit elements by means of the A, B variables in the Laplace domain, and then to map the relation of each element, through the bilinear transformation in Eq. (6.1), into the corresponding DT bipole.
6.2.2.1 The Reflection Coefficient
Let us consider the constitutive relation of an impedance, with zero i.c., in the Laplace domain: V(s) = Z(s)I(s). Substituting this relation into Eq. (6.3) we obtain

$$ A(s) = Z(s)I(s) + R I(s), \qquad B(s) = Z(s)I(s) - R I(s) \qquad (6.6) $$

and, eliminating I(s) between the two equations, we get

$$ B(s) = K(s)A(s) \qquad (6.7) $$

where

$$ K(s) = \frac{B(s)}{A(s)} = \frac{Z(s) - R}{Z(s) + R}. \qquad (6.8) $$
The term K(s) is defined as the ratio between the reflected wave B(s) and the incident wave A(s); in transmission line terminology, this ratio is called the reflection coefficient, and the terms R and Z(s) represent the characteristic line impedance and the load impedance, respectively. The function K(s) has, for the wave variables, the same meaning that the impedance Z(s), defined as the ratio between the voltage V(s) and the current I(s), has for the Kirchhoff variables. Mapping (6.7) into the z domain, taking into account (6.8), gives the relationship in the z domain
$$ B(z) = K(z)A(z), \quad \text{with} \quad K(z) = K(s)\Big|_{s = \frac{2}{T}\frac{1 - z^{-1}}{1 + z^{-1}}}. \qquad (6.9) $$
K(z) represents the relation, defined with respect to the wave variables A(z) and B(z), of the analog component mapped into the z domain. It is therefore possible to define the impedances, i.e., the constitutive relations, of the circuit elements R, L and C and of the real voltage generators directly in the z domain.
6.2.2.2 Inductance
For Z(s) = sL, from Eq. (6.8) we have K(s) = (sL − R)/(sL + R). It follows from (6.9) that the reflection coefficient K(z) assumes the form

$$ K(z) = \frac{2(1 - z^{-1})L - (1 + z^{-1})TR}{2(1 - z^{-1})L + (1 + z^{-1})TR}. $$

By choosing a reference resistance of value R = (2/T)L, the previous expression becomes

$$ K(z) = \frac{(1 - z^{-1})2L - (1 + z^{-1})2L}{(1 - z^{-1})2L + (1 + z^{-1})2L} = -z^{-1}. $$

The constitutive relationship of the inductor in the z domain is therefore K(z) = B(z)/A(z) = −z⁻¹. The relationship between the quantities A(z) and B(z) thus represents the inductor of the CT circuit mapped into the z domain. In other words, the inductor in the z domain can be represented, up to a sign, by a simple delay element, as shown in Fig. 6.7a.
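The identity K(z) = −z⁻¹ can be verified numerically by evaluating K(s) = (sL − R)/(sL + R) under the bilinear substitution on the unit circle; the values of L and T below are arbitrary example choices.

```python
import cmath

L, T = 1.2149, 0.1   # example inductance and sampling period
R = 2 * L / T        # reference (port) resistance chosen for the inductor

for w in (0.1, 0.7, 2.4):                      # a few digital frequencies [rad]
    z = cmath.exp(1j * w)
    s = (2 / T) * (1 - 1 / z) / (1 + 1 / z)    # bilinear mapping s(z)
    K = (s * L - R) / (s * L + R)              # reflection coefficient K(s)
    assert abs(K - (-1 / z)) < 1e-12           # K(z) = -z^{-1} as derived above
```

With the substitution, sL = R·j tan(ω/2), so K collapses to −(1 − j tan(ω/2))/(1 + j tan(ω/2)) = −e^{−jω}, i.e., exactly a delay with sign inversion.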
6.2.2.3 Capacitor
In a way similar to the case of the inductor, setting Z(s) = 1/(sC) and choosing the reference resistance equal to R = T/(2C), we have K(z) = B(z)/A(z) = z⁻¹.
Fig. 6.7 Equivalent circuits in the z domain of the analog L, C, R elements. a Inductor; b capacitor; c resistor

Fig. 6.8 Equivalent circuits in the z domain. a Short circuit; b open circuit
6.2.2.4 Resistor
In the case of the resistor, where Z(s) = R, with the reference resistance equal to R we get

K(z) = B(z)/A(z) = 0.

Thus, in the z domain the matched resistor produces no reflected signal (there is no path from A to B, as in Fig. 6.7c): in analogy with transmission lines, if the load impedance is identical to the line impedance Z(s) = V(s)/I(s), the reflected wave is zero.
6.2.2.5 Short Circuit and Open Circuit
In the case of a short circuit, as shown in Fig. 6.8, we have Z(s) = 0, so from (6.8) K(s) = −1 and, consequently, K(z) = −1. In the open circuit Z(s) = ∞, so it follows that K(s) = 1 and K(z) = 1. In both cases the value of R is arbitrary.
Fig. 6.9 Equivalent real voltage generator circuit with purely resistive internal impedance
6.2.2.6 Real Voltage Generator
To determine the DT model of the real voltage generator (an ideal generator with a series impedance) it is necessary to consider the generic port expression

$$ A = V + RI, \qquad B = V - RI \qquad (6.10) $$

subject to the external constraint

$$ V = I Z(s) + V_0 \qquad (6.11) $$

where V0 represents the nominal voltage of the ideal generator and Z(s) its internal impedance (Fig. 6.9). The equivalent circuit of the real voltage generator therefore depends on the nature of its internal impedance. From (6.10) and (6.11) it is possible to express the term B as

$$ B = K_1(s)V_0 + K_2(s)A $$

where

$$ K_1(s) = \frac{2R}{R + Z(s)}, \qquad K_2(s) = \frac{Z(s) - R}{Z(s) + R}. $$
In the case Z(s) = R0, setting R = R0, we have K1(z) = 1 and K2(z) = 0, so the constitutive relationship is simply B = V0. In the case Z(s) = sL, setting R = 2L/T, we have K1(z) = 1 + z⁻¹ and K2(z) = −z⁻¹, so the constitutive relation is expressed by the equivalent circuit in Fig. 6.10a. In the case Z(s) = 1/(sC), setting R = T/(2C), we have K1(z) = 1 − z⁻¹ and K2(z) = z⁻¹, corresponding to the equivalent circuit in Fig. 6.10b.
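For the inductive-source case, the reductions K1(z) = 1 + z⁻¹ and K2(z) = −z⁻¹ can be confirmed numerically under the bilinear substitution; L and T below are arbitrary example values.

```python
import cmath

L, T = 0.5, 0.02
R = 2 * L / T                                  # reference resistance for Z(s) = sL

for w in (0.3, 1.1, 2.9):                      # sample digital frequencies [rad]
    z = cmath.exp(1j * w)
    s = (2 / T) * (1 - 1 / z) / (1 + 1 / z)    # bilinear mapping s(z)
    K1 = 2 * R / (R + s * L)                   # source-to-B transfer
    K2 = (s * L - R) / (s * L + R)             # reflection from A to B
    assert abs(K1 - (1 + 1 / z)) < 1e-12       # K1(z) = 1 + z^{-1}
    assert abs(K2 - (-1 / z)) < 1e-12          # K2(z) = -z^{-1}
```

The same scheme, with Z(s) = 1/(sC) and R = T/(2C), reproduces the capacitive-source coefficients 1 − z⁻¹ and z⁻¹.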
6.2.3 Connecting DT Circuit Elements

The connection of two or more analog circuit elements implicitly determines an energy constraint between the elements, deriving from the topological Kirchhoff
Fig. 6.10 Equivalent circuit in the z domain of real voltage generator with internal impedance. a Purely inductive; b Purely capacitive
laws concerning the "across" and "through" quantities, such as the voltage and the current in electrical circuits. In contrast, numerical modeling through DT circuits is based on a model with only one variable, indicated as a sequence, and the connection of two or more elements defined in DT does not imply an energy constraint between these elements.

Let us suppose we want to obtain a DT equivalent of a CT circuit, and that the latter is composed of various elements (generators, resistors, inductors and capacitors, transformers, etc.) connected to each other. Before connecting the DT elements it is necessary to verify the energy constraints between the wave variables of the various DT elements. In fact, these variables, which appear at the nodes of the generic DT circuit element, are not compatible with each other, in the sense that the reference resistance R used in their definition is not the same for all the elements. To make the interconnection in an appropriate way it is then necessary to define a digital equivalent of the interconnection of the analog circuits. This element, defined, as already indicated above, as a scattering junction or wave adaptor (WA), can be realized by considering the Kirchhoff laws of the series and parallel connections of multiport networks and then expressing these Kirchhoff constraints in terms of the wave variables Ak and Bk.
6.2.3.1 Parallel Wave Adapter
Consider as an example the parallel connection of three bipole elements described in Fig. 6.11a. According to Kirchhoff's laws, it is easy to verify, directly from the definition of parallel connection, that the electrical quantities are constrained by the following relation

$$ V_1 = V_2 = V_3, \qquad I_1 + I_2 + I_3 = 0. \qquad (6.12) $$
Fig. 6.11 Connection models for 2P networks and bipoles in the wave variables Ak and Bk. a Parallel connection with port conductances Gk; b series connection with port resistances Rk
Substituting the relations (6.4) into (6.12), it is possible to express this constraint in terms of the quantities Ak and Bk as

$$ B_k = 2A_3 - A_k + \alpha_1(A_1 - A_3) + \alpha_2(A_2 - A_3), \qquad k = 1, 2, 3 \qquad (6.13) $$

where, considering the port conductances Gk = 1/Rk, we have

$$ \alpha_k = \frac{2G_k}{G_T} \qquad (6.14) $$
with GT = G1 + G2 + G3. The structure implementing (6.13), that is, generating the Bk in response to the Ak, is defined as a parallel adaptor (PA). The PA, which can be realized as a three-port (Block(3,3); see also [50]), is used when one wants to interconnect three elements characterized by different reference conductances G1, G2 and G3.

Remark 6.4 Note that the implementation of expression (6.13) requires seven additions and two multiplications. The structure can be simplified by imposing (as is almost always possible) the constraint

$$ G_2 = G_1 + G_3. \qquad (6.15) $$

By replacing (6.15) in (6.13) we have

$$ B_1 = (-1 + \alpha)A_1 + A_2 + (1 - \alpha)A_3, \quad B_2 = \alpha A_1 + (1 - \alpha)A_3, \quad B_3 = \alpha A_1 + A_2 - \alpha A_3 \qquad (6.16) $$
where, from (6.14) with the constraint (6.15), α = α1 = G1/G2.

Remark 6.5 Observe that the previous expression can be implemented with only one multiplier and four adders, as shown in Fig. 6.12.

Fig. 6.12 Parallel wave adapter (PA). a Signal flow graph of a PA with reflection-free port 2; b circuit symbol of the PA; c circuit symbol of the PA with reflection-free port 2

Since B2 is not directly connected with A2 (in other cases it is), port 2 is called reflection free, as shown in the element symbol (Fig. 6.12b). This feature allows port 2 to be connected to an adjacent element without creating delay-free loops.
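A direct implementation of the three-port parallel adaptor of Eqs. (6.13)–(6.14) can be tested against the Kirchhoff constraints (6.12): reconstructing Vk and Ik through (6.4), all port voltages must coincide and the port currents must sum to zero. The conductance and wave values below are arbitrary example numbers.

```python
def parallel_adaptor(A, G):
    """Three-port parallel adaptor: B_k = A_0 - A_k with
    A_0 = sum_j alpha_j A_j and alpha_j = 2 G_j / (G_1 + G_2 + G_3),
    which is algebraically equivalent to Eq. (6.13)."""
    GT = sum(G)
    A0 = sum(2 * Gk / GT * Ak for Gk, Ak in zip(G, A))
    return [A0 - Ak for Ak in A]

G = [1.0, 2.5, 0.4]           # port conductances (example values)
A = [0.7, -1.2, 0.3]          # incident waves (example values)
B = parallel_adaptor(A, G)

# Recover the Kirchhoff variables via (6.4): V = (A+B)/2, I = (A-B) G / 2.
V = [(a + b) / 2 for a, b in zip(A, B)]
I = [(a - b) * g / 2 for a, b, g in zip(A, B, G)]

assert max(V) - min(V) < 1e-12  # V1 = V2 = V3
assert abs(sum(I)) < 1e-12      # I1 + I2 + I3 = 0
```

The check shows that the adaptor is exactly the numerical translation of the parallel-connection energy constraint discussed above.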
6.2.3.2 Series Wave Adapter
With similar considerations it is possible to define the series wave adapter, or series adaptor (SA), with reference resistances R1, R2 and R3. From the definition of a series circuit, we have the constraints V1 + V2 + V3 = 0 and I1 = I2 = I3. Therefore, from relation (6.3) we have

$$ B_k = A_k - \beta_k(A_1 + A_2 + A_3), \qquad k = 1, 2, 3 \qquad (6.17) $$

where

$$ \beta_k = \frac{2R_k}{R_1 + R_2 + R_3}. \qquad (6.18) $$
Remark 6.6 The implementation of expression (6.17) can be simplified by imposing the following constraint

$$ R_2 = R_1 + R_3 \qquad (6.19) $$

and, also in this case, port 2 is reflection free. The circuit implementing the series adapter is shown in Fig. 6.13.
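Analogously, the series adaptor can be checked against its Kirchhoff constraints: with the reflected waves computed as B_k = A_k − β_k(A_1 + A_2 + A_3), the sign convention that satisfies the series constraints, the reconstructed port voltages must sum to zero and the port currents must all be equal. Resistance and wave values are arbitrary example numbers.

```python
def series_adaptor(A, R):
    """Three-port series adaptor: B_k = A_k - beta_k (A_1 + A_2 + A_3),
    with beta_k = 2 R_k / (R_1 + R_2 + R_3)."""
    RT = sum(R)
    SA = sum(A)
    return [Ak - 2 * Rk / RT * SA for Ak, Rk in zip(A, R)]

R = [0.8, 1.5, 2.2]           # port resistances (example values)
A = [0.4, -0.9, 1.1]          # incident waves (example values)
B = series_adaptor(A, R)

# Recover the Kirchhoff variables via (6.4).
V = [(a + b) / 2 for a, b in zip(A, B)]
I = [(a - b) / (2 * r) for a, b, r in zip(A, B, R)]

assert abs(sum(V)) < 1e-12      # V1 + V2 + V3 = 0
assert max(I) - min(I) < 1e-12  # I1 = I2 = I3
```

As with the parallel adaptor, the common current turns out to be I = (A1 + A2 + A3)/(R1 + R2 + R3), which is exactly the wave-variable form of the series loop equation.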
Fig. 6.13 Series wave adapter (SA). a Signal flow graph of an SA with reflection-free port 2; b SA symbol; c SA reflection-free symbol
6.2.4 DT Circuit Corresponding to a Given Analog Filter

The procedure for determining the DT circuit corresponding to an analog filter consists of the following steps:

• identification of the analog series and parallel elements;
• determination of the corresponding DT topology: choose the unspecified reference resistances to ensure that the series and parallel adapters are assigned common resistances at each 2P network interconnection, and calculate the values of the multipliers.

Example 6.1 Synthesize the DT circuit corresponding to the elliptical filter in Fig. 6.14a, characterized by the following normalized specifications (example from [54]; see Sect. 2.8.5.3, Fig. 2.43):

• pass-band ripple = 1 dB, stop-band ripple = −34.5 dB;
• pass-band angular frequency = √0.5 rad/s, stop-band angular frequency = 1/√0.5 rad/s.

The normalized analog filter elements are C1 = C3 = 2.6189 F, C2 = 0.13941 F, L2 = 1.2149 H, and R = 1 Ω. Design a WDF that satisfies the above specifications using a sampling angular frequency of 10 rad/s.

Determination of the Adapter Parameters—For the synthesis of the WDF, the first step is to determine the series and parallel connections of the analog components and the number of adapters required, as shown in Fig. 6.14b. The resulting WDF structure, shown in Fig. 6.15, consists of two reflection-free parallel adapters p1 and p3, one reflection-free series adapter s1, and one parallel adapter p2. Note that, due to the constraints imposed on all three of its ports, the p2 adapter cannot be reflection free.
6.2 Wave Digital Filters
439
Fig. 6.14 Example of synthesis with WDF. a Normalized elliptic lowpass filter; b identification of the series and parallel interconnections (modified from [54])
In order to determine the multiplier parameters of the interconnection blocks, a specific order must be followed. In fact, the values of the coefficients of blocks p₁ and p₃ depend only on the values of the analog circuit components. Since, on the other hand, the series block s₁ is internal, it is necessary to know the resistances at its port 1 and port 3 (determined by blocks p₁ and p₃, respectively) in order to evaluate its multiplier. The parameters of block p₂ can be evaluated only after the resistance at its port 1 (which depends on s₁) has been determined.

Interconnection p₁:

G₁^{p₁} = 1/R = 1,  G₃^{p₁} = 2C₁/T = (2 · 2.6189)/((2π)/10) = 8.33622,  G₂^{p₁} = G₁^{p₁} + G₃^{p₁} = 9.33622

α^{p₁} = G₁^{p₁}/G₂^{p₁} = 0.107109
Interconnection p₃:

G₁^{p₃} = T/(2L₂) = 0.25859,  G₃^{p₃} = 2C₂/T = 1.01687,  G₂^{p₃} = G₁^{p₃} + G₃^{p₃} = 1.27546
Fig. 6.15 WDF elliptical filter
α^{p₃} = G₁^{p₃}/G₂^{p₃} = 0.202741.
Interconnection s₁:

R₁^{s₁} = 1/G₂^{p₁} = 0.10711,  R₃^{s₁} = 1/G₂^{p₃} = 0.78403,  R₂^{s₁} = R₁^{s₁} + R₃^{s₁} = 0.89114

β^{s₁} = R₁^{s₁}/R₂^{s₁} = 0.120194
Interconnection p₂:

G₁^{p₂} = 1/R₂^{s₁} = 1.22158,  G₂^{p₂} = 1/R = 1,  G₃^{p₂} = 2C₃/T = 8.33622

α₁^{p₂} = 2G₁^{p₂}/(G₁^{p₂} + G₂^{p₂} + G₃^{p₂}) = 0.231408

α₂^{p₂} = 2G₂^{p₂}/(G₁^{p₂} + G₂^{p₂} + G₃^{p₂}) = 0.191234.
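As a quick numerical check of the p₁ coefficients above, the bilinear-transform port conductances and the adapter multiplier can be computed directly from the element values. This is only an illustrative sketch; the variable names are ours, not the book's.

```python
import math

# Port conductances of adapter p1 (Example 6.1): under the bilinear
# transform a capacitor C maps to G = 2C/T and the source resistance R
# to G = 1/R. T comes from the 10 rad/s sampling angular frequency.
T = 2 * math.pi / 10
R, C1 = 1.0, 2.6189

G1 = 1.0 / R          # port 1: resistive source
G3 = 2 * C1 / T       # port 3: capacitor C1
G2 = G1 + G3          # reflection-free constraint on port 2

alpha_p1 = G1 / G2    # single multiplier of the reflection-free parallel adapter
```

Running this reproduces the values quoted in the example (G₃^{p₁} ≈ 8.33622, α^{p₁} ≈ 0.107109).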
6.2.4.1 Recent Trends in WDF
The WDF approach, while representing a general methodology, presents difficulties when the analog circuit has a complex topology: when the topology does not reduce to simple series and parallel connections of elements, or in the presence of multi-port networks, operational amplifiers, etc., the circuit is not treatable [59]. However, it remains an extremely powerful, versatile and effective approach in object-based sound synthesis systems [4], and as a general framework for physical modeling [60–65]. For example, a recent paper by Giampiccolo et al. [67] extends the WDF method to the modeling of analog circuits with multiple nullors.4 The nullor is a theoretical two-port element suitable for modeling several multi-port active devices, very common in complex analog circuits, such as operational amplifiers and transistors operating in the linear regime. Recently, this approach has been used to recreate the DT structures of the analog filters that formed the basis of the sound of analog audio effects, equalizers, etc. [64], and of famous synthesizers of the 1970s and 1980s [65, 66].
6.3 Digital Waveguide Theory

In sound synthesis by physical modeling of a musical instrument, appropriate modeling of the string, or of other vibrating structures such as air columns, membranes, etc., with its natural modes, boundary and initial conditions, and its distributed dynamics, is the basis of any process of creating the virtual model of an acoustic instrument. Strings can be considered distributed-parameter circuits, also known as transmission lines, which can be redefined in discrete time as digital transmission lines (DTL) or simply denoted as digital waveguides (DW). Thus, the theory of digital waveguides could be introduced directly in the DT domain by simply generalizing lumped-element circuits. However, we prefer to follow the development from a physical point of view, as proposed by J. O. Smith [12, 13], who originally coined the name digital waveguide and developed this methodology, because, as we will see, it allows us to understand some important concepts such as, for example, lossy DWs and the use of alternative wave quantities.
6.3.1 Lossless Digital Waveguides

The wave equation for an ideal lossless, linear, flexible string is expressed by the D'Alembert equation (see Sect. 1.6.1), which represents Newton's second law as the equilibrium between the acceleration force and the restoring force due to curvature and tension

4 A nullor is an ideal 2-port network formed by a nullator at its input and a norator at its output. A nullor represents an ideal amplifier, having infinite current, voltage, transconductance and transimpedance gains [3].
K ∂²y/∂x² = μ ∂²y/∂t²,  or  K y″ = μ ÿ
(6.20)
where K is the string tension, μ is the linear mass density, y is the displacement of the string, x is the distance from the origin along the string direction, t is time. We have seen in Sect. 1.6.1 that the equation is satisfied by arbitrary functions of the type y(x, t) = fr (t − x/c) + fl (t + x/c)
(6.21)
where the term f_r(t − x/c) represents a right-traveling (progressive) wave, the term f_l(t + x/c) represents a left-traveling (regressive) wave, and c is the wave propagation speed, provided that f_r and f_l are differentiable in x and t. The solution of the wave equation is a function of two variables, and if we want to create a numerical simulator we need to determine a time sampling period T and a spatial sampling interval X [19–23]. The discretization of the functions f_r(t − x/c) and f_l(t + x/c) is therefore obtained through the change of variables x → x_m = mX and t → t_n = nT. The determination of the solution is simplified by choosing a spatial sampling interval such that X = cT. In this case it is possible to write the simulator equations of Eq. (6.21) as ([25], p. 1)

y(x_m, t_n) = y_r(nT − mX/c) + y_l(nT + mX/c) = y_r[(n − m)T] + y_l[(n + m)T].

Considering a normalized sampling period, so T = 1, and defining the numerical quantities y⁻[n] ≜ y_l(nT) and y⁺[n] ≜ y_r(nT), we can rewrite the simulator as a simple finite-difference equation (FDE) defined as

y[m, n] = y⁺[n − m] + y⁻[n + m]
(6.22)
which indicates that wave propagation can be obtained by updating the state variables of two delay lines of the type

y⁺[m, n + 1] = y⁺[m − 1, n],  and  y⁻[m, n + 1] = y⁻[m + 1, n]
(6.23)
i.e., shifting the signal samples to the right and to the left, respectively. The expression (6.23) defines a numerical transmission line, or digital waveguide (DW or DWG) [23–25] and, since the wave variables appear explicitly, this is a W-model [57]. It can thus be observed that the DW can be implemented with two delay lines fed in opposite directions, as shown in Fig. 6.16, denoted as a bidirectional delay line.
Fig. 6.16 Digital waveguide with bidirectional delay line
Fig. 6.17 Digital waveguide with physically consistent output signals. In order to have the physical wave it is necessary to add the two components: y[n] = y + [n] + y − [n]
Remark 6.7 Observe that the components propagating along the delay lines do not individually represent the propagating wave: from Eq. (6.22), in fact, it can be observed that to obtain the physical wave it is necessary to add the two components, as shown in Fig. 6.17.

The paradigm of digital transmission lines makes it possible to model quite well the propagation phenomena encountered in practical cases, such as the physical modeling of musical instruments or of listening environments. With a single delay line, in fact, it is possible to model a simple plane wave, while with a bidirectional delay line it is possible to model any type of one-dimensional wave. The one-dimensional wave equation solution given by the waveguide (6.22) is exact at the discrete points of the delay line or, equivalently, in time at the sampling instants, provided the highest frequencies present in the y_r(t) and y_l(t) signals do not exceed half the sampling frequency. To estimate the value of the variables in the guide at non-integer points, it is possible to use interpolating filters inserted along the DLs, such as those described in Sect. 5.5.
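A minimal sketch of the update (6.23) and of the output sum of Eq. (6.22); the function names and the toy circular boundary are ours, purely for illustration.

```python
def dw_step(y_plus, y_minus):
    """One time step of the lossless bidirectional delay line,
    Eq. (6.23): right-going samples shift toward increasing m,
    left-going samples toward decreasing m. For this toy example
    the buffers wrap around (periodic boundary)."""
    y_plus = [y_plus[-1]] + y_plus[:-1]
    y_minus = y_minus[1:] + [y_minus[0]]
    return y_plus, y_minus

def physical_output(y_plus, y_minus, m):
    """Physical wave at cell m, Eq. (6.22): the sum of the two
    traveling components (cf. Fig. 6.17)."""
    return y_plus[m] + y_minus[m]
```

A unit pulse placed in the right-going line simply advances one cell per step, and the physical displacement is recovered only by summing the two lines.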
Fig. 6.18 Digital waveguide with distributed losses
6.3.2 Lossy Digital Waveguides

In real propagation phenomena, such as the vibration of strings in air, the wave equation must be modified by inserting terms that model losses. By analogy with damped oscillators, as in the friction phenomena previously reported in Sect. 1.6.4, we can assume that the loss is proportional to the velocity. Thus, in the presence of losses along the line, the propagation equation (see Sect. 1.6.4.2, Eq. (1.141)) becomes

K y″ = μ ÿ + 2b₁ ẏ − 2b₃ y⃛ + ⋯

where the term b₁ẏ is the 1st-order time-derivative, or friction-loss, term proportional to the velocity. Assuming the simple 1st-order approximation, in the solution of the wave equation it is therefore necessary to insert an exponential term that accounts for these losses, as in Eq. (1.138), here rewritten

y(t, x) = e^{−(b₁/μ)(x/c)} y_r(t − x/c) + e^{(b₁/μ)(x/c)} y_l(t + x/c).

The numerical model in Eq. (6.22) is transformed to take the losses into account by inserting a term g^{±m}, where g is defined as

g ≜ e^{−b₁T/μ}

so that Eq. (6.22) is modified as

y[m, n] = g^{m} y⁺[n − m] + g^{−m} y⁻[n + m].

The scheme of the lossy DW is therefore the one shown in Fig. 6.18.
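The distributed losses of Fig. 6.18 can be sketched by following each unit delay with the per-sample attenuation g. The names, the assumed physical values and the open-end zero padding are illustrative assumptions, not the book's.

```python
import math

def lossy_dw_step(y_plus, y_minus, g):
    """One update of a DW with distributed losses: each shift is
    followed by a multiplication by g = exp(-b1*T/mu) < 1, so a
    sample traveling m cells is attenuated by g**m. Zeros enter
    at the (open) ends in this toy version."""
    y_plus = [0.0] + [g * v for v in y_plus[:-1]]
    y_minus = [g * v for v in y_minus[1:]] + [0.0]
    return y_plus, y_minus

# per-sample attenuation from assumed physical values b1, mu, T
b1, mu, T = 0.5, 1.0, 1.0 / 44100
g = math.exp(-b1 * T / mu)
```

After m steps a pulse has amplitude g^m, reproducing the exponential spatial decay of the lossy traveling-wave solution.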
6.3.2.1 Strings with Frequency-Dependent Losses and Dispersion

In the case of a real string, with non-infinitesimal cross section and with losses, as reported in [21, 22], the frequency-dependent losses are modeled through a 3rd-order time-derivative perturbation of the dispersive wave equation. Thus, the wave equation in the presence of frequency-dependent losses can be written, as in the Chaigne-Askenfelt lossy string model (see Sect. 1.6.4.5), by Eq. (1.152), here rewritten

∂²y/∂t² = c² ∂²y/∂x² − ϵc²L² ∂⁴y/∂x⁴ − 2b₁ ∂y/∂t + 2b₃ ∂³y/∂t³
(6.24)
where the stiffness parameter is given by ϵ = r_g²(ES/KL²), b₁ is the loss factor that depends on the velocity ẏ, b₃ is the loss factor that multiplies the 3rd time-derivative term, K and μ are the string tension and linear mass density, such that c = √(K/μ) is the sound speed, and E, S and r_g are, respectively, the Young's modulus, the cross-sectional area and the gyration radius of the string. In the wave solution, the terms b₁ and b₃ produce wave fronts that are attenuated exponentially over time; in other words, the traveling waves are multiplied at each step by a frequency-dependent constant. According to [24, 25], this loss can be modeled with a TF, which in the discrete-time simulator will be indicated with the network function G(z), as shown in Fig. 6.19. The term proportional to ∂⁴y/∂x⁴ takes into account the fact that the wave speed depends on frequency. It can also be seen from the equation that this term is non-zero for strings with finite rigidity (E ≠ 0). The 1st-order approximation of the frequency-dependent velocity can be expressed as in Eq. (1.151) [23], here rewritten as c(ω) ≈ c₀(1 + κω²/(2Kc₀²)), where c₀ is the ideal string speed in the absence of stiffness, κ = EI, and c(ω) indicates that the wave speed grows with frequency. From the previous expression it can be observed that the higher-frequency signal components have higher speed than the lower-frequency ones, i.e., the waveform shape changes over time.

Remark 6.8 Note that the situation described above is typical of the piano. In comparison with other stringed instruments such as the guitar and violin, piano strings are quite thick; therefore, they cannot be assumed to be perfectly flexible and borrow some of the vibrational behavior of metal bars. This stiffness causes high-frequency waves to travel faster than low-frequency waves. □
This phenomenon, in which the sound speed depends on frequency and the traveling waveshapes change over time, is called dispersion; present in every real string, it is very important in the physical modeling of musical instruments. In particular, dispersion represents an essential aspect of the timbral reconstruction of metal strings or large bars such as, for example, in the piano and in certain types of percussion instruments. Note that most piano models intended for sound-synthesis applications are based on the digital waveguide technique due to its efficiency (see for example [47]). In addition, since the losses and dispersion of real strings can be lumped, the full string model reduces to a delay line and a low-order IIR filter in a feedback loop. However, as reported in Sect. 1.6.4.2, to model the frequency-dependent losses due to friction with air, viscosity and the finite mass of the string, the 3rd-order time-derivative term in (6.24) is replaced with a mixed term ∂³y/(∂x²∂t), where the derivatives are mixed in time and space.
Fig. 6.19 Digital waveguide with frequency-dependent distributed losses
Hence Eq. (6.24) can be rewritten as

∂²y/∂t² = c² ∂²y/∂x² − κ ∂⁴y/∂x⁴ − 2b₁ ∂y/∂t + 2b₂ ∂³y/(∂x²∂t)   (6.25)

where the second term on the right-hand side of the above equation is the so-called ideal bar term, which introduces the dispersion, or frequency-dependent wave velocity, and here κ represents the stiffness coefficient. The third and fourth terms model the losses; if b₂ ≠ 0, the decay rates will be frequency-dependent.
6.3.3 Terminated Digital Waveguides

In the case where the line is terminated, as, for example, in a vibrating string blocked at its ends, the numerical model must take into account the boundary conditions imposed by the termination. Assuming an ideal string of length L, blocked at the ends x = 0 and x = L, we obtain the respective boundary conditions y(t, 0) = 0 and y(t, L) = 0. Considering Eq. (6.21), for physical continuity we have

f_r(t) = −f_l(t)
f_l(t + L/c) = −f_r(t − L/c).

At the extremes with rigid terminations (or in a short-circuited transmission line) the wave is totally reflected with opposite sign. The constraints on the numerical model are therefore

y⁺[n] = −y⁻[n]
y⁻[n + N/2] = −y⁺[n − N/2]
Fig. 6.20 Digital waveguide closed on a short circuit. The physical output is taken at any point (also fractional) x =ξ
Fig. 6.21 Digital waveguide closed on a load impedance T(z), with the losses concentrated in the TF G_D(z)
where N = 2L/X is defined as the time (expressed in number of samples) for propagation in both directions, from one end of the string to the other, i.e., the total delay of the string loop. In Fig. 6.20 the case of an ideal terminated string is shown; the physical output is taken at any point x = ξ. If the line is terminated on an arbitrary load, the load impedance can be modeled with a network function T(z). If there are frequency-dependent losses in the line, since the circuit is linear these can all be concentrated in a single network function placed in series at the termination, as shown in Fig. 6.21.
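The rigid terminations above (total reflection with sign inversion) can be sketched as follows; after one full round trip of N = 2M samples the state returns to the initial one. Names are illustrative, not from the text.

```python
def terminated_dw_step(y_plus, y_minus):
    """One update of an ideally terminated string: the sample that
    reaches either end re-enters the opposite delay line with its
    sign inverted (reflection coefficient -1, as for a
    short-circuited line)."""
    right_out = y_plus[-1]   # sample reaching the x = L termination
    left_out = y_minus[0]    # sample reaching the x = 0 termination
    y_plus = [-left_out] + y_plus[:-1]
    y_minus = y_minus[1:] + [-right_out]
    return y_plus, y_minus
```

With M cells per line, a pulse is inverted at each end and the initial state recurs with period N = 2M samples, i.e., the string loop delay.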
6.3.3.1 Simple Lossy and Dispersion Model for DW
One of the main advantages of modeling vibrating-string phenomena with a DW is that, by linearity, as illustrated in Fig. 6.21, it is possible to use a single filter to model all loss phenomena. For example, a typical model of a string instrument is shown in Fig. 6.22, in which the excitation is supplied to the string through the excitation model, and the loss and dispersion phenomena are modeled by the TF Hdl(z). For dispersive phenomena we know that waves at higher frequencies travel at a higher velocity. As suggested in [26], to model dispersive phenomena that create inharmonicity it is possible to use all-pass filters with TF Hd(z) = (g_a + z^{−D_a})/(1 + g_a z^{−D_a}) (see Sect. 5.2.4). Instead, a single-pole filter Hl(z) = g_l/(1 + a_l z^{−1}) is sufficient for the loss phenomena. Thus this TF has two free parameters: the gain g_l and the parameter a_l.
Fig. 6.22 Basic principle of a stringed musical instrument implemented with digital transmission line, an excitation model, and a TF that models real string dispersions and losses
For the overall modeling, a single TF Hdl(z) = Hd(z)Hl(z) can be used, which accounts for both dispersion and losses, with parameters that can be determined from real measurements via optimization procedures (see for example [27], Chap. 3).
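The two transfer functions above can be evaluated on the unit circle to check their qualitative behavior: Hd is all-pass (unit magnitude at every frequency, with a frequency-dependent phase delay producing dispersion), while Hl attenuates the high frequencies. A sketch with assumed parameter values (g_a = 0.5, D_a = 2, g_l = 0.95, a_l = −0.1 are ours, not measured):

```python
import cmath

def H_d(w, g_a, D_a):
    """Dispersion all-pass H_d(z) = (g_a + z^-Da)/(1 + g_a z^-Da),
    evaluated at z = e^{jw}."""
    zmD = cmath.exp(-1j * w * D_a)
    return (g_a + zmD) / (1 + g_a * zmD)

def H_l(w, g_l, a_l):
    """One-pole loss filter H_l(z) = g_l/(1 + a_l z^-1),
    evaluated at z = e^{jw}."""
    return g_l / (1 + a_l * cmath.exp(-1j * w))

# overall loop filter H_dl = H_d * H_l (assumed parameter values)
H_dl = lambda w: H_d(w, 0.5, 2) * H_l(w, 0.95, -0.1)
```

Evaluating |H_d| at any frequency returns 1, confirming that the dispersion filter alters only the phase, while |H_l| decreases toward the Nyquist frequency.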
6.3.4 Alternative and Normalized Wave Variables

In the DW model described in the previous paragraphs, we assumed the displacement y of the vibrating string as the quantity of interest. The DW model is equally valid if alternative quantities are used, such as the velocity v ≜ ẏ, the acceleration a ≜ ÿ, or other quantities derived from, or integrals of, the displacement with respect to time. The conversion between derivative or integral quantities can be done simply by inserting integrators or differentiators into the circuit model. Moreover, it is possible to define alternative quantities not only through temporal derivatives but also through spatial derivatives, also called wave curves.
6.3.4.1 Force, Characteristic Impedance and Power Waves

By applying a force at one end of an ideal string of infinite length, it is possible to define the characteristic impedance Z₀ of the string as the ratio between the applied force wave and the velocity wave, i.e.,

Z₀ = K/c = √(Kμ) = μc.
(6.26)
In Sect. 6.2.2 the characteristic impedance Z(s) was axiomatically introduced in expression (6.6) and used for the definition of alternative current and voltage quantities for circuit analysis. With (6.26) we want, instead, to also emphasize its physical meaning.
In the case of the DW it is possible, through the alternative quantities described in the previous paragraph, to define the force waves as

f⁺[n] = Z₀ v⁺[n]
f⁻[n] = −Z₀ v⁻[n]
(6.27)
since Z₀ is a real quantity, the force wave is always in phase with the velocity wave. The power is expressed by the product force × velocity (or, similarly, current × voltage, pressure × volume velocity, etc.). From the previous expression it is possible to define the power traveling waves as

p⁺[n] = f⁺[n]v⁺[n]
p⁻[n] = −f⁻[n]v⁻[n]

or, by the definition of impedance, as

p⁺[n] = Z₀(v⁺[n])²
p⁻[n] = Z₀(v⁻[n])².

The left and right traveling components are therefore always nonnegative. The sum of the traveling powers is therefore p[n, m] = p⁺[n − m] + p⁻[n + m]. Power waves are important because they express the wave's ability to do external work.
6.3.4.2 Normalized Quantities

In 2-port network representations oriented to synthesis, it is convenient to use normalized quantities [3]. In DW theory it is possible to define normalized quantities as

f_N⁺[n] ≜ f⁺[n]/√Z₀,  f_N⁻[n] ≜ f⁻[n]/√Z₀
v_N⁺[n] ≜ √Z₀ v⁺[n],  v_N⁻[n] ≜ √Z₀ v⁻[n]

where Z₀ is defined as the normalization constant or normalization resistance. By means of this definition the representation of the power waves is much more immediate, because the term due to the impedance cancels and we have

p⁺[n] = f_N⁺[n]v_N⁺[n] = (v_N⁺[n])² = (f_N⁺[n])²
p⁻[n] = −f_N⁻[n]v_N⁻[n] = (v_N⁻[n])² = (f_N⁻[n])².
From a physical point of view, the normalized waves f_N⁺[n] and v_N⁺[n] behave like the force and velocity waves (or voltage-current, pressure-velocity) but are scaled so that their square produces the instantaneous power. The normalized representation of DT circuits is extremely useful for a set of interesting properties such as numerical robustness, dynamics control, etc. An example of such robust structures are the normalized digital lattice filters [17, 32], already described in previous chapters. The use of normalized quantities is also very useful in the external representation of analog circuits, since it is possible to characterize a 2-port network by means of quantities such as the direct wave, the reflected wave and the reflection coefficient (instead of voltage, current and impedance) [3].
6.3.5 Digital Waveguide Connections

In the DW, the propagation medium was assumed to be uniform and therefore characterized by a constant impedance Z₀. When a traveling wave encounters a medium of a different nature, i.e., of different characteristic impedance, one part of the wave continues to propagate in the same direction while the other part is reflected. The connection of digital transmission lines with different characteristic impedances is mainly used in the physical modeling of complex phenomena such as, for example, the propagation of an acoustic wave through different media. In this case, it is necessary to extend the principle of wave digital filters (WDFs) to distributed-parameter circuits [36–38].
6.3.5.1 Wave Propagation in the Presence of Discontinuities

To introduce the concepts of reflection and transmission coefficients from a physical point of view, we refer to the analog electrical transmission line of Fig. 6.23, of characteristic impedance Z₀, connected to a termination bipole with impedance Z_L, denoted as the load impedance. With this circuit it is possible to model a discontinuity in the transmission line, considering a junction with a line of different characteristic impedance. According to Kirchhoff's laws, current and voltage must be continuous at the ends of the discontinuity. The total voltage on the line can be considered as the sum of a positive traveling voltage wave V⁺ and a reflected voltage wave V⁻ at the point of discontinuity (respectively A and B of (6.3)). At the discontinuity point x = L, the sum of V⁺ and V⁻ must be equal to the voltage V_L across the discontinuity Z_L; similarly, the currents of the positive and negative traveling waves at the discontinuity point must equal the current flowing in Z_L. Since these solutions represent superpositions of voltage and current waves, we have
Fig. 6.23 Transmission line terminated on a load impedance Z L
V(x, t) = V⁺(t − x/c) + V⁻(t + x/c)
I(x, t) = (1/Z₀)[V⁺(t − x/c) − V⁻(t + x/c)]

or, more concisely,

V = V⁺ + V⁻,  and  I = (1/Z₀)(V⁺ − V⁻)
(6.28)
since V⁺ is a function of (t − x/c) and V⁻ is a function of (t + x/c). By definition, the characteristic impedance of the line is equal to

Z₀ = V⁺/I⁺ = −V⁻/I⁻.
(6.29)
Now, considering the presence of the load impedance Z_L, we have I = I_L (see Fig. 6.23). Thus, from Eqs. (6.28)-(6.29), we can write

I_L = (1/Z₀)(V⁺ − V⁻).
(6.30)
Solving Eq. (6.28) for V⁺ and V⁻ and taking (6.30) into account, the relationship between the Kirchhoff variables and the wave variables follows:

A ≡ V⁺ = (1/2)(V_L + Z₀I_L)
B ≡ V⁻ = (1/2)(V_L − Z₀I_L)
(6.31)
formally similar to (6.3) (up to a scaling factor of 2). Similar reasoning applies at the discontinuity point x = 0.

Definition 6.3 Reflection coefficient—The reflection coefficient k is defined as the ratio between the reflected voltage wave and the direct voltage wave, i.e.,

k = V⁻/V⁺.
(6.32)
In the case, for example, of a line terminated on a short circuit, the reflection coefficient is k = −1 (total reflection): the load impedance is that of a short circuit, Z_L = 0. From Eqs. (6.28) and (6.30) we can write

k = (Z_L − Z₀)/(Z_L + Z₀) = (Y₀ − Y_L)/(Y_L + Y₀)
(6.33)
it should be noted that the latter is identical to the definition (6.8) in Sect. 6.2.2. Similarly to the previous definition, it is possible to define a transmission coefficient t as

t = V_L/V⁺   (6.34)

so that we get

t = 2Z_L/(Z₀ + Z_L) = 2Y₀/(Y₀ + Y_L)   (6.35)

moreover, since V_L = V⁺ + V⁻, note that t = 1 + k. The most interesting, and perhaps most obvious, conclusion is that, according to the previous expressions, there is no reflected wave if the load impedance is exactly equal to the characteristic line impedance. In this case, all the energy of the incident wave is transferred to the load, which cannot be distinguished from a line of infinite length with characteristic impedance Z₀ = Z_L.
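The coefficients of Eqs. (6.33)-(6.35) can be transcribed directly; the function names are ours.

```python
def reflection_coeff(Z_L, Z_0):
    """k = (Z_L - Z_0)/(Z_L + Z_0), Eq. (6.33)."""
    return (Z_L - Z_0) / (Z_L + Z_0)

def transmission_coeff(Z_L, Z_0):
    """t = 2*Z_L/(Z_0 + Z_L), Eq. (6.35); since V_L = V+ + V-,
    it equals 1 + k."""
    return 2.0 * Z_L / (Z_0 + Z_L)
```

For a short circuit (Z_L = 0) this gives k = −1 (total reflection); for a matched load (Z_L = Z₀) it gives k = 0, i.e., no reflected wave.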
6.3.5.2 Digital Waveguide Connections and Networks: Parallel Wave Adapter Connection

In the DT representation of analog circuits with lumped elements, developed in Sect. 6.2.3, special multi-port networks called series and parallel wave adapters were introduced for the connection of the various elements characterized by different reference resistances. This concept can be immediately extended to the case of DT circuits with distributed elements. To connect two or more lines of different characteristic impedance, it is necessary to insert a numerical N-port network that models the effects of the impedance mismatch. In practice, with such a connection model it is possible to define a so-called digital waveguide network (DWN), defined precisely as the connection of two or more bidirectional DWs through properly designed adapters called scattering (or dispersion) junctions (SJ) [13, 16, 17]. As previously seen for the connection of lumped elements in DT circuits, it is possible to define the connection of distributed elements by considering the Kirchhoff constraints on the physical variables and transforming them into wave variables. With reference to Fig. 6.24, for the determination of the scattering junction characteristic we can consider the physical variables pressure P and volume velocity U (the treatment could equally be carried out with the electrical variables V and I). In this case Eqs. (6.31), analogous to (6.3) of the WDF, are rewritten as
Fig. 6.24 Digital waveguide network (DWN) with connection through a parallel scattering junction (N-port network)
A_k ≡ P_k⁺ = (1/2)(P_k + Z_kU_k)
B_k ≡ P_k⁻ = (1/2)(P_k − Z_kU_k).
(6.36)
For Kirchhoff’s laws, in analogy to (6.12), we can write the constraints related to the inputs of the N -ports network: the junction is in parallel, so the pressures are identical in all the sections and the sum of the volume velocities is zero; it is valid then P1 = P2 = · · · = PN = PL ,
and
U1 + U2 + · · · + U N + Uext = 0 (6.37)
where P_L represents the junction pressure (a physical quantity) common to all branches, and U_ext represents an external flow (an external generator such as, for example, the air blown by a performer into a wind instrument). By analogy with (6.28) we can write P_k⁻ = P_L − P_k⁺ (for k = 1, 2, …, N), which, substituted in (6.37), produces

U_ext + Σ_{k=1}^{N} (U_k⁺ + U_k⁻) = U_ext + Σ_{k=1}^{N} Y_k(P_k⁺ − P_k⁻) = U_ext + Σ_{k=1}^{N} Y_k(2P_k⁺ − P_L) ≡ 0.   (6.38)

Solving w.r.t. P_L we get

P_L = (1/Y_T)(U_ext + 2 Σ_{i=1}^{N} Y_iP_i⁺)   (6.39)

where Y_T = Σ_{k=1}^{N} Y_k. This last expression can be used to determine the reflected wave P_k⁻ as a function of the P_k⁺:

P_k⁻ = P_L − P_k⁺ = U_ext/Y_T + (2 Σ_{i=1}^{N} Y_iP_i⁺)/Y_T − P_k⁺.   (6.40)
Fig. 6.25 Scheme of 3-port scattering junction (SJ) for the connection of W -model distributed elements network
By defining, similarly to (6.14), the quantities

α_i = 2Y_i/Y_T,  i = 1, 2, …, N

Eq. (6.40) can be rewritten as

P_k⁻ = α₀U_ext + Σ_{i=1}^{N} α_iP_i⁺ − P_k⁺   (6.41)

with α₀ = 1/Y_T; for k = 1, 2, 3, similarly to what we have seen for the WDF wave adapters, this can be translated into the DT scattering-junction circuit shown in Fig. 6.25. In this case the parameters α_k can be frequency dependent and are then interpreted, in the z domain, as numeric filters with FIR or IIR structures. Also note that for a null external flow (U_ext = 0) we have

P_L = (2/Y_T) Σ_{k=1}^{N} Y_kP_k⁺ = Σ_{k=1}^{N} α_kP_k⁺   (6.42)
where the parameter α_k coincides with the transmission coefficient (see (6.35)) of the k-th port.

Remark 6.9 For N = 2 and U_ext = 0, expression (6.40) describes the junction of two acoustic tubes of characteristic admittances Y₁ and Y₂. In this simple case, (6.40) can be rewritten as
Fig. 6.26 Model with scattering junction for connecting DW with different impedances Z 1 and Z 2
P₁⁻ = P_L − P₁⁺ = ((Y₁ − Y₂)/(Y₁ + Y₂))P₁⁺ + (2Y₂/(Y₁ + Y₂))P₂⁺ = kP₁⁺ + (1 − k)P₂⁺
P₂⁻ = P_L − P₂⁺ = (2Y₁/(Y₁ + Y₂))P₁⁺ − ((Y₁ − Y₂)/(Y₁ + Y₂))P₂⁺ = (1 + k)P₁⁺ − kP₂⁺   (6.43)

where k = (Y₁ − Y₂)/(Y₁ + Y₂) is the reflection coefficient (see Eq. (6.33)) and 1 − k = α₂, 1 + k = α₁ are the transmission coefficients of the two ports. In vector form, (6.43) can be rewritten as

[P₁⁻]   [ k      1 − k ] [P₁⁺]
[P₂⁻] = [ 1 + k   −k   ] [P₂⁺]   (6.44)
equivalent to the well-known Kelly-Lochbaum junction relation, whose scattering diagram is shown in Fig. 6.26. The SJ (6.44) can be represented through a scattering matrix A such that p⁻ = Ap⁺, where the output wave variables are a function of the input ones; that is, the matrix A turns out to be completely defined by the reflection coefficient. The junction is called lossless if energy is neither created nor destroyed; this is the case here, since the transmission coefficients associated with k are 1 − k and 1 + k. It should be noted that the scattering junction can be realized, through the appropriate topological transformations already described in the previous chapters and in [50], with different architectures using one, two, three or four multipliers. In Fig. 6.27, for example, the structure with only one multiplier is shown. A different formalism to characterize digital waveguide networks and scattering junctions is described in [24, 29]. Note that, in the case without external excitation, for the connection of N lines in parallel it is easily demonstrated that Eq. (6.44) can be generalized with the following expression

[P₁⁻]   [α₁ − 1   α₂       ⋯   α_N    ] [P₁⁺]
[P₂⁻] = [α₁       α₂ − 1   ⋯   α_N    ] [P₂⁺]
[ ⋮ ]   [ ⋮        ⋮       ⋱    ⋮     ] [ ⋮ ]
[P_N⁻]  [α₁       α₂       ⋯   α_N − 1] [P_N⁺]   (6.45)
Fig. 6.27 Single multiplier Kelly-Lochbaum scattering junction model [16, 17]
so, denoting by α the matrix whose k-th column has all entries equal to the transmission coefficient α_k, the diffusion matrix can be written as A = α − I.
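The parallel junction of Eqs. (6.39)-(6.41) reduces to a few lines of code; with U_ext = 0 and N = 2 it reproduces the Kelly-Lochbaum relations (6.43). A sketch with illustrative names:

```python
def parallel_junction(P_plus, Y, U_ext=0.0):
    """Junction pressure P_L (Eq. (6.39)) and reflected waves
    P_k^- = P_L - P_k^+ (Eq. (6.40)) of an N-port parallel
    scattering junction with port admittances Y."""
    Y_T = sum(Y)
    P_L = (U_ext + 2.0 * sum(y * p for y, p in zip(Y, P_plus))) / Y_T
    return P_L, [P_L - p for p in P_plus]
```

With U_ext = 0 the junction is lossless: the incoming power Σ Y_k(P_k⁺)² equals the outgoing power Σ Y_k(P_k⁻)², as can be verified numerically for any choice of admittances.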
6.3.5.3 Normalized Scattering Junction

In numerical simulations, normalized wave variables can be used. The normalization can be applied both to the pressure waves, P̃_k⁺ = P_k⁺ · √Y_k, and to the volume-velocity waves, Ũ_k⁺ = U_k⁺/√Y_k. In this case the power associated with the traveling wave is simply P_k^{RW+} = (P̃_k⁺)² = (Ũ_k⁺)² (root-power wave) and does not depend on the value of the normalization admittance. The normalized diffusion matrix Ã can then be expressed as

     [2Y₁/Y_T − 1     2√(Y₁Y₂)/Y_T   ⋯   2√(Y₁Y_N)/Y_T]
Ã =  [2√(Y₂Y₁)/Y_T   2Y₂/Y_T − 1    ⋯   2√(Y₂Y_N)/Y_T]  =  (2/Y_T) Ỹ Ỹᵀ − I   (6.46)
     [     ⋮               ⋮         ⋱        ⋮       ]
     [2√(Y_NY₁)/Y_T  2√(Y_NY₂)/Y_T   ⋯   2Y_N/Y_T − 1 ]

where Ỹ = [√Y₁, …, √Y_N]ᵀ and Y_T = Σ_k Y_k; Ã is denoted as a Householder reflection matrix [13].
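Since Ã = (2/Y_T)ỸỸᵀ − I is a Householder reflection, it is symmetric and orthogonal (ÃÃ = I), which confirms that the normalized junction is lossless. A small numerical sketch (names are ours):

```python
import math

def normalized_junction_matrix(Y):
    """Normalized diffusion matrix of Eq. (6.46):
    A[i][j] = 2*sqrt(Y_i*Y_j)/Y_T - delta_ij."""
    Y_T = sum(Y)
    y = [math.sqrt(v) for v in Y]
    n = len(Y)
    return [[2.0 * y[i] * y[j] / Y_T - (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

Multiplying the matrix by itself returns the identity for any set of positive admittances, so the squared norm of the normalized wave vector, i.e., the instantaneous power, is preserved through the junction.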
6.4 Finite-Differences Modeling

The DW method described above is based on the discretization of the solution of the differential equation that models the physical CT system. Another discretization methodology, widely used in electromagnetic field modeling and called finite-difference time-domain (FDTD), or finite-difference modeling (FDM), is equally simple to understand and implement. Introduced in 1966 in [34], the method is based on the direct discretization of the differential equation rather than of its solution. In the electromagnetic case, Maxwell's equations, written in differential form,
are rewritten with the modified Euler or trapezoidal technique (see for example [3]) as centered difference equations (see Sect. 2.8.5), discretized and implemented via software.
6.4.1 FDM Definition

In the computer music scenario, the first virtual instrument based on the physical model of an instrument via FDM was presented as early as 1971 by Hiller and Ruiz [19, 20]. Starting from D'Alembert's wave equation (6.20), they considered the string with losses, as in the piano where the cross section is not negligible, and proposed a solution with spatiotemporal discretization. In 1994, Chaigne and Askenfelt [21] proposed an extension of the work of Hiller and Ruiz, considering the interaction of the lossy string with the hammer. Since then, the methodology has been successfully applied in numerous practical situations; see for example [30, 46, 56], and [57] for a review.
6.4.1.1 Discrete-Time String Wave Equation Derivation

For the explanation of the method we again consider the string equation (6.20), $K y'' = \mu \ddot{y}$, for the wave propagation along the $x$-direction, where $K$ is the string tension, $\mu$ the linear mass density, and the variable $y(x,t)$ usually represents a K-model quantity as, for example, the string displacement, or some acoustic quantity such as the pressure. Using Euler's forward-backward difference first-derivative approximation, introduced in Sect. 2.8.5, here rewritten for convenience considering also the spatial derivative, we have

$$
\left.\frac{\partial y(x,t)}{\partial t}\right|_{x=mX,\,t=nT} \approx \frac{y[n,m]-y[n-1,m]}{T}
$$

$$
\left.\frac{\partial y(x,t)}{\partial x}\right|_{x=mX,\,t=nT} \approx \frac{y[n,m]-y[n,m-1]}{X}
$$

where $T$ and $X$ are, respectively, the temporal and spatial sampling intervals, and $n$, $m$ the respective discrete temporal and spatial indices. Thus, extending to the second derivative, the most common way to simulate (6.20) is to approximate the second-degree partial derivatives with finite 2nd-order differences, that is

$$
\left.\frac{\partial^2 y(x,t)}{\partial x^2}\right|_{x=mX,\,t=nT} \approx \frac{y[n,m+1]-2y[n,m]+y[n,m-1]}{X^2}
\tag{6.47}
$$
6 Circuits and Algorithms for Physical Modeling
$$
\left.\frac{\partial^2 y(x,t)}{\partial t^2}\right|_{x=mX,\,t=nT} \approx \frac{y[n+1,m]-2y[n,m]+y[n-1,m]}{T^2}.
\tag{6.48}
$$
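The accuracy of the centered second differences (6.47) and (6.48) can be checked on a function whose second derivative is known; here is a minimal sketch, with an illustrative sampling interval, using $y = \sin(x)$, whose second derivative is $-\sin(x)$:

```python
import numpy as np

# Centered 2nd-order difference of Eq. (6.47) applied to y = sin(x).
X = 1e-3                         # spatial sampling interval (illustrative)
x = np.arange(0.0, 1.0, X)
y = np.sin(x)

# (y[m+1] - 2 y[m] + y[m-1]) / X^2 at the interior points.
d2y = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / X**2

# The approximation error is O(X^2), i.e., tiny for this X.
assert np.max(np.abs(d2y - (-np.sin(x[1:-1])))) < 1e-5
```

The same stencil applied along the time axis gives Eq. (6.48).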
Thus, let $c = \sqrt{K/\mu}$ be the propagation speed; the lossless, dispersionless, unforced string wave equation (6.20) can be approximated as

$$
y[n+1,m] - 2y[n,m] + y[n-1,m] = \frac{c^2 T^2}{X^2}\bigl(y[n,m+1] - 2y[n,m] + y[n,m-1]\bigr)
$$

By choosing the time sampling interval $T$ corresponding to the spatial sampling $X = cT$, we can write the so-called finite-difference scheme (FDS) ([57], Eq. (8))

$$
y[n+1,m] = y[n,m-1] + y[n,m+1] - y[n-1,m]
\tag{6.49}
$$

valid for the ideal lossless, dispersionless, unforced string. The above expression uses the lumped Kirchhoff variables and is thus a K-model. From (6.49) it is observed that the new sample at time $(n+1)$ for the $m$-th spatial coordinate, $y[n+1,m]$, is calculated as the sum of the neighboring positions minus the position itself at the previous instant. The FDTD technique requires that the entire spatial region to be simulated is discretized and, since the spatial sampling interval must be small compared to the smallest wavelength to be modeled, the simulation may require high computational resources. Observe that, in the case where $X \neq cT$, the spatial and temporal sampling intervals must be selected such that $R \triangleq cT/X \leq 1$, where $R$ is known as the Courant-Friedrichs-Lewy condition [57].
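The update rule (6.49) can be sketched in a few lines; in this minimal simulation the grid size, number of steps, and the triangular "pluck" initial shape are illustrative assumptions, with $X = cT$ (Courant number equal to 1) and fixed (zero) ends:

```python
import numpy as np

M = 50            # number of spatial points (illustrative)
steps = 200       # number of time steps (illustrative)

# Triangular "pluck" initial displacement, zero at the fixed ends.
peak = M // 3
y_curr = np.concatenate([np.linspace(0.0, 1.0, peak),
                         np.linspace(1.0, 0.0, M - peak)])
y_curr[0] = y_curr[-1] = 0.0
y_prev = y_curr.copy()          # (approximately) zero initial velocity

for _ in range(steps):
    y_next = np.zeros(M)
    # Eq. (6.49): y[n+1, m] = y[n, m-1] + y[n, m+1] - y[n-1, m]
    y_next[1:-1] = y_curr[:-2] + y_curr[2:] - y_prev[1:-1]
    y_prev, y_curr = y_curr, y_next   # shift the two time registers

# The lossless scheme keeps the solution finite; the ends stay fixed.
assert np.all(np.isfinite(y_curr))
assert y_curr[0] == 0.0 and y_curr[-1] == 0.0
```

Note that only two state arrays ($y[n,\cdot]$ and $y[n-1,\cdot]$) need to be stored; the whole spatial grid is updated at every time step, which is the source of the computational cost mentioned above.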
6.4.1.2 Signal Flow Graph of Discrete-Time Wave Equation

The finite-difference scheme (FDS) of Eq. (6.49) can be represented as a grid of points, as illustrated in Fig. 6.28a, and/or as a signal flow graph (SFG) in discrete time, as illustrated in Fig. 6.28b. Note that the value of the spatial index $m$ increases to the right while the temporal index $n$ increases upwards. The diagrams show the spatiotemporal dependencies, i.e., how the next displacement value of the current position $y[n+1,m]$ depends on the spatial neighborhood, $y[n,m-1]$ and $y[n,m+1]$, and on the previous value $y[n-1,m]$. The diagram also indicates that the update rule is the same for all elements.

Remark 6.10 Note that, generally, W-model techniques such as the DW described above are preferable in the 1-D case, where they guarantee numerical robustness and efficient algorithms. On the contrary, K-model techniques such as FDTD are more efficient in 2-D and 3-D modeling where, however, great attention must be paid to numerical accuracy.
Fig. 6.28 Finite-difference scheme (FDS) representation of the wave equation. a FDTD grid or space-time stencil for the point $y[n+1,m]$ of Eq. (6.49). b Discrete-time signal flow graph representation. (Modified from [55, 57])
6.4.1.3 FDM of Stiff and Lossy String

According to the Chaigne-Askenfelt PDE [21], a simple model of a stiff, lossy string can be obtained as already shown in Sect. 6.3.2.1. By proceeding to discretize Eq. (6.24) as before, we get ([21], Eq. (10))

$$
\begin{aligned}
y[m,n+1] = {} & a_1 y[m,n] + a_2 y[m,n-1] + a_3 \bigl(y[m+1,n] + y[m-1,n]\bigr)\\
& + a_4 \bigl(y[m+2,n] + y[m-2,n]\bigr)\\
& + a_5 \bigl(y[m+1,n-1] + y[m-1,n-1] + y[m,n-2]\bigr)\\
& + \Delta t^2\, N F_H[n]\, g[m,m_0]/M_S
\end{aligned}
\tag{6.50}
$$
where the integer $N$ is an appropriate number of spatial steps that divide the string, as a function of the string fundamental frequency $f_1$ for a given sampling frequency $f_s$ (e.g., for $f_s = 48$ kHz, $N \approx f_1/10$); $F_H[n] = K\bigl|\eta[n] - y[m_0,n]\bigr|^p$ is the hammer force on the struck string, the result of a nonlinear interaction process between the hammer and the (piano) string; $g[m,m_0]$ is a dimensionless window which accounts for the width of the hammer; and $\eta[n]$ is the displacement of the hammer head, given by

$$
M_H \frac{d^2\eta}{dt^2} = -F_H(t)
$$
Fig. 6.29 FDTD grid of a lossy and stiff string (3rd-order in time, 4th-order in space). (Modified from [55])
$K$ and $p$ are the stiffness parameters of the felt, derived from experimental data on real piano hammers; finally, $M_S$ and $M_H$ denote the string and the hammer mass, respectively (for the other string parameters see, for example, [31]). The coefficients $a_1$–$a_5$ are defined as

$$
\begin{aligned}
a_1 &= \bigl[2 - 2r^2 + b_3/\Delta t - 6\epsilon N^2 r^2\bigr]/D\\
a_2 &= \bigl[-1 + b_1 \Delta t + 2b_3/\Delta t\bigr]/D\\
a_3 &= \bigl[r^2\bigl(1 + 4\epsilon N^2\bigr)\bigr]/D\\
a_4 &= \bigl[b_3/\Delta t - \epsilon N^2 r^2\bigr]/D\\
a_5 &= \bigl[-b_3/\Delta t\bigr]/D
\end{aligned}
\tag{6.51}
$$

where $D = 1 + b_1 \Delta t + 2b_3/\Delta t$, $r = c\,\Delta t/\Delta x$, $b_1$ and $b_3$ are the loss parameters, and $\epsilon$ is the string stiffness parameter. The time-space grid of (6.50) is shown in Fig. 6.29.
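The coefficient formulas (6.51) translate directly into code; the sketch below (parameter values are illustrative) also checks the limiting case: with no stiffness, no losses, and $r = 1$, the scheme must collapse to the ideal FDS of Eq. (6.49).

```python
# Update coefficients of Eq. (6.51). Parameter names follow the text:
# N spatial steps, r = c*dt/dx, eps = stiffness, b1, b3 = loss parameters.
def fdm_string_coeffs(N, r, eps, b1, b3, dt):
    D = 1.0 + b1 * dt + 2.0 * b3 / dt
    a1 = (2.0 - 2.0 * r**2 + b3 / dt - 6.0 * eps * N**2 * r**2) / D
    a2 = (-1.0 + b1 * dt + 2.0 * b3 / dt) / D
    a3 = (r**2 * (1.0 + 4.0 * eps * N**2)) / D
    a4 = (b3 / dt - eps * N**2 * r**2) / D
    a5 = (-b3 / dt) / D
    return a1, a2, a3, a4, a5

# Sanity check: without stiffness and losses, and with r = 1, the update
# reduces to Eq. (6.49): a1 = 0, a2 = -1, a3 = 1, a4 = a5 = 0.
a1, a2, a3, a4, a5 = fdm_string_coeffs(N=50, r=1.0, eps=0.0,
                                       b1=0.0, b3=0.0, dt=1.0 / 48000)
assert (a1, a2, a3, a4, a5) == (0.0, -1.0, 1.0, 0.0, 0.0)
```

This reduction is a useful regression test when implementing the full stiff, lossy update (6.50).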
6.4.1.4 FDM of Stiff and Lossy String in State-Space Formulation

The expression of the Chaigne-Askenfelt string model in Eq. (6.24) can be written in vector form, as suggested by Bilbao and Smith [55], as

$$
\mathbf{y}_n = \mathbf{A}_1 \mathbf{y}_{n-1} + \mathbf{A}_2 \mathbf{y}_{n-2} + \mathbf{A}_3 \mathbf{y}_{n-3} + \mathbf{f}_n\bigl(y_{n-1}[i_h], y_{n-1}[h]\bigr)
\tag{6.52}
$$
where the column vector $\mathbf{y}_n$ contains the displacements of the $N$ nodes at time step $n$ and $\mathbf{A}_k \in \mathbb{R}^{N \times N}$. The nonlinear scalar function $f_n(y_{n-1}[i_h], y_{n-1}[h])$ sets the amplitude of the external excitation (e.g., the hammer force distribution across position at time $n$ in piano modeling). Ignoring the nonlinear external excitation and proceeding as in [55], Eq. (6.52) can be written as the state-space model

$$
\begin{aligned}
\mathbf{y}_n &= \mathbf{A}_1 \mathbf{y}_{n-1} + \mathbf{A}_2 \mathbf{y}_{n-2} + \mathbf{A}_3 \mathbf{y}_{n-3}\\
\mathbf{y}_{out} &= \mathbf{C}_1 \mathbf{y}_n
\end{aligned}
\tag{6.53}
$$

where $\mathbf{y}_n$ is the state vector. Now, inserting the various temporal instants in a single vector, we can write

$$
\begin{bmatrix} \mathbf{y}_n \\ \mathbf{y}_{n-1} \\ \mathbf{y}_{n-2} \end{bmatrix}
=
\begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \mathbf{A}_3 \\ \mathbf{I} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \mathbf{y}_{n-1} \\ \mathbf{y}_{n-2} \\ \mathbf{y}_{n-3} \end{bmatrix},
\qquad
\mathbf{y}_{out} = \begin{bmatrix} \mathbf{C}_1 & \mathbf{0} & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \mathbf{y}_{n-1} \\ \mathbf{y}_{n-2} \\ \mathbf{y}_{n-3} \end{bmatrix}.
\tag{6.54}
$$

The latter represents the state-space model for stiff and lossy strings of length $L$ and given initial conditions (usually zero position and velocity), proposed by Bilbao-Smith in [55], and derived from the PDE model of Chaigne-Askenfelt in Eqs. (6.50) and (6.51) [21].
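The block-companion construction of Eq. (6.54) can be sketched numerically; here the matrices $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{A}_3$ are hypothetical random stand-ins for the string update matrices, used only to verify that one step of the stacked form agrees with the third-order recursion (6.53):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                  # illustrative state dimension
A1, A2, A3 = (0.1 * rng.standard_normal((N, N)) for _ in range(3))

# Block companion matrix of Eq. (6.54).
A = np.block([
    [A1,           A2,               A3],
    [np.eye(N),    np.zeros((N, N)), np.zeros((N, N))],
    [np.zeros((N, N)), np.eye(N),    np.zeros((N, N))],
])

# One step of the stacked form must reproduce the recursion (6.53).
y1, y2, y3 = (rng.standard_normal(N) for _ in range(3))
state = np.concatenate([y1, y2, y3])   # [y_{n-1}; y_{n-2}; y_{n-3}]
next_state = A @ state

assert np.allclose(next_state[:N], A1 @ y1 + A2 @ y2 + A3 @ y3)
assert np.allclose(next_state[N:2 * N], y1)   # shifted copy of y_{n-1}
```

The identity/zero blocks simply shift the delayed states forward, so the three-step memory of (6.53) becomes a single first-order matrix update, convenient for stability analysis via the eigenvalues of the block matrix.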
6.4.2 Derivation of Connection Models for FDTD Simulators

Wave adapters can also be defined in the case of DT simulators derived from FDTD. Considering also in this case the acoustic quantities pressure $P$ and volume velocity $U$, imposing the Kirchhoff constraints of type (6.37), and for null $U_{ext}$, we have

$$
P_{L,n+1} = \frac{2}{Y_T}\sum_{i=1}^{N} Y_i P_{i,n} - P_{L,n-1} = \sum_{i=1}^{N} \alpha_i P_{i,n} - P_{L,n-1}
\tag{6.55}
$$

for $\alpha_i = 2Y_i/Y_T$. In this case the variables are the physical quantities and not the wave quantities. It is possible to obtain a connection scheme, solving with respect to the output variable, similar to the DW case and shown in Fig. 6.30. Note that the presence of an internal node feedback can potentially create spurious oscillation problems. There is, in fact, one pole at the origin (DC) and one at the Nyquist frequency (see Fig. 6.30). This possible source of instability is countered by eliminating these poles in the excitation source: $U_{ext}$ is filtered with a transfer function $H(z) = 1 - z^{-2}$ to cancel these poles. By omitting this cancelation, instabilities could occur that would propagate throughout the network.
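A minimal sketch of the junction update (6.55) and of the excitation pre-filter $H(z) = 1 - z^{-2}$ follows; the port admittances and signals are hypothetical, and the filter is applied in its simplest direct form with zero initial conditions:

```python
import numpy as np

# Hypothetical admittances of a 3-port FDTD junction.
Y = np.array([1.0, 0.5, 2.0])
alpha = 2.0 * Y / Y.sum()              # alpha_i = 2*Y_i / Y_T

def junction_step(P_ports, P_load_prev):
    """One step of Eq. (6.55): junction pressure at n+1 from the port
    pressures at time n and the junction pressure at n-1."""
    return float(alpha @ P_ports) - P_load_prev

def prefilter(u):
    """Apply H(z) = 1 - z^-2 to the external excitation, i.e.,
    y[n] = u[n] - u[n-2], removing DC and Nyquist components."""
    y = u.copy()
    y[2:] -= u[:-2]
    return y

# A constant (DC) excitation is suppressed after the initial transient.
assert np.allclose(prefilter(np.ones(6)), [1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

# Since sum(alpha) = 2, equal port pressures p with previous value p
# give 2p - p = p, i.e., a consistent steady state.
assert abs(junction_step(np.array([2.0, 2.0, 2.0]), 2.0) - 2.0) < 1e-12
```

The pre-filter check makes the motivation concrete: any DC (and, analogously, Nyquist-rate) content in $U_{ext}$ is blocked before it can excite the marginal poles of the junction recursion.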
Fig. 6.30 Schematic diagram of the 3-port junction scattering network derived with the FDTD method. (Modified from [46])
6.5 Nonlinear WDF and DW Models

In the physical modeling of acoustic musical instruments some elements are nonlinear. Think, for example, of the inelastic impact of the piano hammer on the string, or of the interaction between the reed and the acoustic tube in the clarinet. In both cases the modeling involves the study of linear (acoustic) waveguides with nonlinear loads, or the connection of linear circuit elements (DW or WDF) with nonlinear elements with or without memory.

In general, the analysis of a circuit with nonlinear elements is made by means of Kirchhoff's laws and the constitutive relations of the elements. For example, let us consider the circuit in Fig. 6.31, consisting of a transmission line with characteristic impedance $Z_0$ closed on a nonlinear resistive load characterized by the constitutive relation $i = g(v)$ (active voltage-controlled resistance). In the case of numerical realizations, the insertion of nonlinear elements poses problems in the definition of the SJ [42, 43]. The resolving system is given by the KVL (mesh) and the constitutive relation of the load

$$
\begin{aligned}
v_0(t) - Z_0 i(t) &= v_i(t)\\
i(t) - g(v_0(t)) &= 0.
\end{aligned}
\tag{6.56}
$$
Fig. 6.31 Example of a transmission line closed on a nonlinear load
In general, the previous system admits a unique solution only in cases where the function $i = g(v)$ is single-valued (a monodromic function) [3, 40].
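When $g(v)$ is single-valued and smooth, the system (6.56) collapses to the scalar equation $v_0 - Z_0\,g(v_0) = v_i$, which can be solved numerically, for instance by Newton iteration. In the sketch below the load law $g(v) = \tanh(v)$ and all numerical values are hypothetical, chosen so that the equation is monotone and the iteration converges:

```python
import math

Z0 = 0.5          # characteristic impedance (illustrative)
vi = 1.0          # incident source term (illustrative)

def g(v):
    """Hypothetical single-valued load characteristic i = g(v)."""
    return math.tanh(v)

def g_prime(v):
    return 1.0 - math.tanh(v) ** 2

# Newton iteration on h(v0) = v0 - Z0*g(v0) - vi = 0;
# h'(v0) = 1 - Z0*g'(v0) > 0 here, so the root is unique.
v0 = 0.0
for _ in range(50):
    h = v0 - Z0 * g(v0) - vi
    v0 -= h / (1.0 - Z0 * g_prime(v0))

# The solution satisfies both equations of (6.56).
i = g(v0)
assert abs(v0 - Z0 * i - vi) < 1e-12
```

If $g$ were multivalued, $h(v_0)$ could have several roots and the iteration would depend on the initial guess, which is exactly the uniqueness issue noted above.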
6.5.1 Mapping Memoryless Nonlinear Elements

The modeling of memoryless nonlinear elements, i.e., nonlinear resistors, in the digital wave filter domain can be performed through an affine transformation that maps the Kirchhoff variables $v$ and $i$ into the wave variables $a$ and $b$. An element (linear or nonlinear) can be described as $F(v,i) = 0$ or in the form $f(a,b) = 0$. As indicated in Sect. 6.2.1, the variables $i$, $v$, $a$ and $b$ can be defined both in the time $t$ and in the Laplace transform $s$ domains (usually lowercase letters $a$, $b$ indicate variables defined in the time domain $t$ or $n$, while uppercase letters indicate those in the Laplace or $z$-transform domain). From the transformations (6.3), rewritten below as

$$
a = v + Ri, \qquad b = v - Ri
$$

we can map a resistor (linear or nonlinear) in the $a$ and $b$ wave variables as

$$
f(a,b) = F\!\left(\frac{a+b}{2}, \frac{a-b}{2R}\right).
$$

For example, in the case of a linear resistor $R_1$, the relationship between $a$ and $b$ ($b = g(a)$) is linear. In fact, this relationship can be described in the $s$ domain by (6.7) or, in terms of WDF, through the mapping (6.9) as $B(z) = K(z)A(z)$ where, for Eq. (6.8), $K(z) = (R_1 - R)/(R_1 + R)$. In addition, by choosing $R_1 = R$, as indicated in Sect. 6.2.2, we have $K = 0$.

In the case of a nonlinear resistance, the condition that allows to explicitly write the reflected wave $b$ as a function of the incident wave $a$, that is, $b = g(a)$, is guaranteed by the implicit function theorem [44]. In fact, if the characteristic function $f(a,b)$ and its derivative are continuous around the point $(a_0, b_0)$ belonging to the resistance characteristic curve (i.e., $f(a_0, b_0) = 0$), then
Fig. 6.32 Example of nonlinear resistance with piecewise-linear characteristic $i = i(v)$
$$
\left.\frac{\partial f(a,b)}{\partial b}\right|_{(a_0,b_0)} \neq 0
$$

guarantees the existence of a function $g(\cdot)$ in an interval around $a_0$ such that $f(a, g(a)) = 0$. In other words, the characteristic function $f(a,b)$ is locally invertible.

For example, suppose that the resistor has a resistance that varies as a function of the current flowing through it (i.e., a current-controlled resistance); the nonlinear characteristic is of the type $F(i,v) = v - v(i)$; therefore

$$
f(a,b) = \frac{a+b}{2} - v\!\left(\frac{a-b}{2R}\right) = 0
$$

then, we have that

$$
\frac{\partial f(a,b)}{\partial b} = \frac{\partial}{\partial b}\left[\frac{a+b}{2} - v\!\left(\frac{a-b}{2R}\right)\right] = \frac{1}{2} - \frac{\partial v}{\partial i}\frac{\partial i}{\partial b} = \frac{1}{2} + \frac{v'(i)}{2R}
$$

where $v'(i) = \partial v/\partial i$. In fact, given the local invertibility of the characteristic $v = v(i)$, the possibility to write the form $b = g(a)$ is guaranteed by the condition $v'(i) \neq -R$. If the nonlinear characteristic of the resistance is piecewise-linear, like the one reported for example in Fig. 6.32, of the type $i = i(v)$ (voltage-controlled resistance), the existence of the relation between the wave variables $b = g(a)$ is guaranteed if

$$
\inf_{v_2 \neq v_1} \frac{i(v_2) - i(v_1)}{v_2 - v_1} > -\frac{1}{R}, \quad \text{and} \quad \sup_{v_2 \neq v_1} \frac{i(v_2) - i(v_1)}{v_2 - v_1} <
$$