On 2 June 1948, the Professional Group on Audio of the IRE was formed, establishing what would become the IEEE society structure we know today. 75 years later, this group — now the IEEE Signal Processing Society — is the technical home to nearly 20,000 passionate, dedicated professionals and a bastion of innovation, collaboration, and leadership.
Celebrate with us: Digital Object Identifier 10.1109/MSP.2023.3308398
Contents
Volume 40 | Number 6 | September 2023

FEATURES

QUATERNIONS IN SIGNAL AND IMAGE PROCESSING
Sebastian Miron, Julien Flamant, Nicolas Le Bihan, Pierre Chainais, and David Brie

INTEGRATED SENSING AND COMMUNICATIONS WITH RECONFIGURABLE INTELLIGENT SURFACES
Sundeep Prabhakar Chepuri, Nir Shlezinger, Fan Liu, George C. Alexandropoulos, Stefano Buzzi, and Yonina C. Eldar

DEEP LEARNING MEETS SPARSE REGULARIZATION
Rahul Parhi and Robert D. Nowak

COLUMNS

Society News
New Society Officers Elected
Election of President-Elect, Regional Directors-at-Large, and Members-at-Large
Ahmed Tewfik

Perspectives
The Discrete Cosine Transform and Its Impact on Visual Compression: Fifty Years From Its Invention
Yao Wang and Debargha Mukherjee
On the Concept of Frequency in Signal Processing: A Discussion
Moisés Soto-Bajo, Andrés Fraguela Collar, and Javier Herrera-Vega

Lecture Notes
Discriminative and Generative Learning for the Linear Estimation of Random Signals
Nir Shlezinger and Tirza Routtenberg
Periodograms and the Method of Averaged Periodograms
Shlomo Engelberg

SP Competitions
Synthetic Speech Attribution
Davide Salvi, Clara Borrelli, Paolo Bestagini, Fabio Antonacci, Matthew Stamm, Lucio Marcenaro, and Angshul Majumdar

Humor
System Design
Manish Narwaria

ON THE COVER
This issue focuses on the diversity in SP methods and tools. Quaternions, sensing and communications with RIS, and deep learning are among the topics explored. Basic concepts like frequency, periodogram and discrete cosine transform are also discussed.
COVER IMAGE: ©SHUTTERSTOCK.COM/G/AGSANDREW
IEEE SIGNAL PROCESSING MAGAZINE (ISSN 1053-5888) (ISPREG) is published bimonthly by the Institute of Electrical and Electronics Engineers, Inc., 3 Park Avenue, 17th Floor, New York, NY 10016-5997 USA (+1 212 419 7900). Responsibility for the contents rests upon the authors and not the IEEE, the Society, or its members. Annual member subscriptions included in Society fee. Nonmember subscriptions available upon request. Individual copies: IEEE Members US$20.00 (first copy only), nonmembers US$248 per copy. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright Law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without fee. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854 USA. Copyright © 2023 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY, and at additional mailing offices. Printed in the U.S.A. Postmaster: Send address changes to IEEE Signal Processing Magazine, IEEE, 445 Hoes Lane, Piscataway, NJ 08854 USA. Canadian GST #125634188
Digital Object Identifier 10.1109/MSP.2023.3293574
IEEE SIGNAL PROCESSING MAGAZINE | September 2023
IEEE Signal Processing Magazine
DEPARTMENTS

From the Editor
SPM Is Your Magazine—You Are Both Reader and Author
Christian Jutten

President’s Message
Reflecting on the Successes of ICASSP 2023
Athina Petropulu

Dates Ahead
[Photo: The IEEE International Workshop on Information Forensics and Security (WIFS) will be held in Nuremberg, Germany, 4–7 December 2023. ©SHUTTERSTOCK.COM/EVGENY KONDRASHOV]

EDITOR-IN-CHIEF
Christian Jutten—Université Grenoble Alpes, France

AREA EDITORS
Feature Articles
Laure Blanc-Féraud—Université Côte d’Azur, France
Special Issues
Xiaoxiang Zhu—German Aerospace Center, Germany
Columns and Forum
Rodrigo Capobianco Guido—São Paulo State University (UNESP), Brazil
H. Vicky Zhao—Tsinghua University, R.P. China
e-Newsletter
Hamid Palangi—Microsoft Research Lab (AI), USA
Social Media and Outreach
Emil Björnson—KTH Royal Institute of Technology, Sweden

EDITORIAL BOARD
Massoud Babaie-Zadeh—Sharif University of Technology, Iran
Waheed U. Bajwa—Rutgers University, USA
Caroline Chaux—French Center of National Research, France
Mark Coates—McGill University, Canada
Laura Cottatellucci—Friedrich-Alexander University of Erlangen-Nuremberg, Germany
Davide Dardari—University of Bologna, Italy
Mario Figueiredo—Instituto Superior Técnico, University of Lisbon, Portugal
Sharon Gannot—Bar-Ilan University, Israel
Yifan Gong—Microsoft Corporation, USA
Rémi Gribonval—Inria Lyon, France
Joseph Guerci—Information Systems Laboratories, Inc., USA
Ian Jermyn—Durham University, U.K.
Ulugbek S. Kamilov—Washington University, USA
Patrick Le Callet—University of Nantes, France
Sanghoon Lee—Yonsei University, Korea
Danilo Mandic—Imperial College London, U.K.
Michalis Matthaiou—Queen’s University Belfast, U.K.
Phillip A. Regalia—U.S. National Science Foundation, USA
Gaël Richard—Télécom Paris, Institut Polytechnique de Paris, France
Reza Sameni—Emory University, USA
Ervin Sejdic—University of Pittsburgh, USA
Dimitri Van De Ville—Ecole Polytechnique Fédérale de Lausanne, Switzerland
Henk Wymeersch—Chalmers University of Technology, Sweden

ASSOCIATE EDITORS—COLUMNS AND FORUM
Ulisses Braga-Neto—Texas A&M University, USA
Cagatay Candan—Middle East Technical University, Turkey
Wei Hu—Peking University, China
Andres Kwasinski—Rochester Institute of Technology, USA
Xingyu Li—University of Alberta, Edmonton, Alberta, Canada
Xin Liao—Hunan University, China
Piya Pal—University of California San Diego, USA
Hemant Patil—Dhirubhai Ambani Institute of Information and Communication Technology, India
Christian Ritz—University of Wollongong, Australia

ASSOCIATE EDITORS—e-NEWSLETTER
Abhishek Appaji—College of Engineering, India
Subhro Das—MIT-IBM Watson AI Lab, IBM Research, USA
Behnaz Ghoraani—Florida Atlantic University, USA
Panagiotis Markopoulos—The University of Texas at San Antonio, USA

IEEE SIGNAL PROCESSING SOCIETY
Athina Petropulu—President
Min Wu—President-Elect
Ana Isabel Pérez-Neira—Vice President, Conferences
Roxana Saint-Nom—VP Education
Kenneth K.M. Lam—Vice President, Membership
Marc Moonen—Vice President, Publications
Alle-Jan van der Veen—Vice President, Technical Directions

IEEE SIGNAL PROCESSING SOCIETY STAFF
Richard J. Baseil—Society Executive Director
William Colacchio—Senior Manager, Publications and Education Strategy and Services
Rebecca Wollman—Publications Administrator

IEEE PUBLISHING OPERATIONS
Sharon M. Turk, Journals Production Manager
Katie Sullivan, Senior Manager, Journals Production
Janet Dudar, Senior Art Director
Gail A. Schnitzer, Associate Art Director
Theresa L. Smith, Production Coordinator
Mark David, Director, Business Development Media & Advertising
Felicia Spagnoli, Advertising Production Manager
Peter M. Tuohy, Director, Production Services
Kevin Lisankie, Director, Editorial Services
Dawn M. Melley, Senior Director, Publishing Operations
Digital Object Identifier 10.1109/MSP.2023.3293576
SCOPE: IEEE Signal Processing Magazine publishes tutorial-style articles on signal processing research and applications as well as columns and forums on issues of interest. Its coverage ranges from fundamental principles to practical implementation, reflecting the multidimensional facets of interests and concerns of the community. Its mission is to bring up-to-date, emerging, and active technical developments, issues, and events to the research, educational, and professional communities. It is also the main Society communication platform addressing important issues concerning all members.

IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
Get Published in the New IEEE Open Journal of Signal Processing
Submit a paper today to the premier new open access journal in signal processing.

In keeping with IEEE’s continued commitment to providing options supporting the needs of all authors, in 2020, IEEE introduced the high-quality publication, the IEEE Open Journal of Signal Processing.

In recognition of author funding difficulties during this unprecedented time, the IEEE Signal Processing Society is offering a reduced APC of USD$995 with no page limits for regular papers. (This offer cannot be combined with any other discounts.)

We invite you to have your article peer-reviewed and published in the new journal. This is an exciting opportunity for your research to benefit from the high visibility and interest the journal will generate.

Your research will also be exposed to 5+ million unique monthly users of the IEEE Xplore® Digital Library.

The high-quality IEEE Open Journal of Signal Processing will draw on IEEE’s expert technical community’s continued commitment to publishing the most highly cited content.

The editor-in-chief is the distinguished Prof. Brendt Wohlberg, who specializes in signal and image processing, inverse problems, and computational imaging.

The rapid peer-reviewed process targets a publication time frame within 10-15 weeks for most accepted papers.

This journal is fully open and compliant with funder mandates, including Plan S.

Submit your paper today! The high-quality IEEE Open Journal of Signal Processing launched in IEEE Xplore® in January 2020 and welcomes submissions of novel technical contributions.
www.signalprocessingsociety.org Digital Object Identifier 10.1109/MSP.2023.3308399
FROM THE EDITOR
Christian Jutten | Editor-in-Chief

SPM Is Your Magazine—You Are Both Reader and Author
Digital Object Identifier 10.1109/MSP.2023.3298172 Date of current version: 8 September 2023

Contribute to IEEE Signal Processing Magazine
The objectives of IEEE Signal Processing Magazine (SPM) are to propose, for any IEEE Signal Processing Society (SPS) member and beyond, a wide range of tutorial articles on both methods and applications in signal and image processing. The articles are divided into different categories: feature articles, column and forum articles, and articles in special issues, the specificities of which are detailed on the SPM webpage “Information for Authors - SPM”: https://signalprocessingsociety.org/publications-resources/ieee-signal-processing-magazine/information-authors-spm.

In short, a tutorial article must present a systematic introduction of fundamental theories, common practices, and applications in a well-defined, reasonably matured, or emerging area, preferably an area that is of interest to readers from multiple fields in signal and image processing. A tutorial article, either a feature article or an article in a special issue of SPM, should cover the history, the state of the art, and the future directions of the topic and include a limited and relevant selection of references rather than an exhaustive list. A tutorial article is not suited to present new results, to cover only the author’s own work, or to present a narrow and biased view of a domain.

A special issue comprises multiple, interrelated tutorials that provide a comprehensive coverage of a specific topic of interest to the signal processing community. It is proposed and managed, if the proposal is accepted, by a team of guest editors, and its first step is an open Call for Papers.

A feature article is a tutorial paper submitted by prospective authors without responding to the call for papers that is done for special issues. Accepted feature articles are published whenever the review process is completed and the magazine has adequate page capacity, without using a fixed special issue structure.

A column contains an article, either technical or nontechnical (depending on the column profile), that addresses a specific topic of special interest to the general SPS reader. There are several columns in the magazine, e.g., “Lecture Notes,” “Tips & Tricks,” “Perspectives,” and “Special Reports.” You can see details on the categories by accessing https://signalprocessingsociety.org/publications-resources/ieee-signal-processing-magazine/column-descriptions.

A forum article is the result of an open discussion with several experts. The structure of a forum article is different from that of the articles mentioned previously as it consists of comments and responses of the participants on several aspects within the topic of discussion.

In essence, any scientist can contribute to SPM, either directly by submitting a special issue proposal with a team of scientists (the guest editors) or by submitting a feature article white paper or a column article by sharing an elegant or efficient way to present or implement known results. Tutorials and keynote talks at conferences and workshops are good candidates for feature articles, and organizers of special sessions can determine if it is timely to propose an SPM special issue.

For all of the categories of SPM articles, in a spirit of open science, we also encourage authors to share additional material, such as code, data, and slides. Note that uploading additional materials to an article strongly increases its visibility and impact.
In this issue
The SPM September issue contains three feature articles and six column papers. These well-written papers cover a wide range of topics in signal and image processing. I am confident you will learn a great deal from them.

Everybody knows what a discrete cosine transform (DCT) is. But do you know its history and its impact on image processing? You can discover details in the column by Wang and Mukherjee [A1], which is also a tribute to Nasir Ahmed, who published an article on the DCT for the first time in 1974. A second column article, by Soto-Bajo, Fraguela Collar, and Herrera-Vega [A2], proposes an interesting discussion around a key and ubiquitous concept in signal processing: frequency, and shows how the concept of frequency itself is complex and tricky.

I am sure that you know the word “quaternions.” But are you familiar with the tools for using quaternions? Do you know in what domains quaternions could actually be smart tools? The feature article by Miron et al. [A3] presents a simple and comprehensive overview of quaternions for signal and image processing and fairly highlights the pros and cons. In previous SPM issues, a few articles have been written on reconfigurable intelligent surfaces (RISs). In [A4], Chepuri et al. discuss the potential of RISs for integrating sensing and communication, especially in communications and radar, with related signal processing challenges. Machine learning and deep learning are now ubiquitous, but many open questions remain, especially concerning overparameterization. In the feature article by Parhi and Nowak [A5], the authors provide rigorous explanations for the sparsity-promoting effect of the common regularization scheme of weight decay in neural network (NN) training, hinging on the homogeneity of activation functions like the rectified linear unit. This explains why NNs seemingly break the curse of dimensionality.

This SPM also includes two excellent “Lecture Notes.” In the first one, by Shlezinger and Routtenberg [A6], related to the tradeoff between model-driven and data-driven learning approaches, the authors introduce the concepts of generative and discriminative learning for inference and compare them both theoretically and numerically in a simple linear context. The second article, by Engelberg [A7], focuses on the periodogram and averaged periodogram, which are customary tools for estimating power spectral density (PSD). The author presents a simple deterministic argument, one that complements the standard probabilistic argument, to explain why the PSD of a signal is not well approximated by a single periodogram. I would like to highlight that these two “Lecture Notes” articles propose additional materials (code in Python or MATLAB), which is greatly appreciated by the Editorial Board and will certainly help scientists to include some of these reflections in their own lectures.

The September SPM issue contains a report on the IEEE Signal Processing Cup 2022 Student Competition [A8], which focused on a timely problem: synthetic speech attribution. The problem is to identify which system was used to generate a given speech sequence. Because fake synthetic speech tracks are now very easy to create, it is important to be able to discriminate original speech recordings from fake ones. Finally, Narwaria proposes a humorous view [A9] of model-driven versus data-driven approaches in the deep learning era.

As I explained in the first part of this editorial, SPM is your journal: enjoy reading it, and further support it by submitting your contributions.

Appendix: Related Articles
[A1] Y. Wang and D. Mukherjee, “The discrete cosine transform and its impact on visual compression: Fifty years from its invention,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 14–17, Aug. 2023, doi: 10.1109/MSP.2023.3282775.
[A2] M. Soto-Bajo, A. Fraguela Collar, and J. Herrera-Vega, “On the concept of frequency in signal processing: A discussion,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 18–25, Aug. 2023, doi: 10.1109/MSP.2023.3257505.
[A3] S. Miron, J. Flamant, N. Le Bihan, P. Chainais, and D. Brie, “Quaternions in signal and image processing: A comprehensive and objective overview,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 26–40, Aug. 2023, doi: 10.1109/MSP.2023.3278071.
[A4] S. P. Chepuri, N. Shlezinger, F. Liu, G. C. Alexandropoulos, S. Buzzi, and Y. C. Eldar, “Integrated sensing and communications with reconfigurable intelligent surfaces: From signal modeling to processing,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 41–62, Aug. 2023, doi: 10.1109/MSP.2023.3279986.
[A5] R. Parhi and R. D. Nowak, “Deep learning meets sparse regularization: A signal processing perspective,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 63–74, Aug. 2023, doi: 10.1109/MSP.2023.3286988.
[A6] N. Shlezinger and T. Routtenberg, “Discriminative and generative learning for the linear estimation of random signals,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 75–82, Aug. 2023, doi: 10.1109/MSP.2023.3271431.
[A7] S. Engelberg, “Periodograms and the method of averaged periodograms,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 83–91, Aug. 2023, doi: 10.1109/MSP.2023.3285044.
[A8] D. Salvi, C. Borrelli, P. Bestagini, F. Antonacci, M. Stamm, L. Marcenaro, and A. Majumdar, “Synthetic speech attribution: Highlights from the IEEE signal processing cup 2022 student competition,” IEEE Signal Process. Mag., vol. 40, no. 6, pp. 92–98, Aug. 2023, doi: 10.1109/MSP.2023.3268823.
[A9] M. Narwaria, “System design,” IEEE Signal Process. Mag., vol. 40, no. 6, p. 99, Aug. 2023, doi: 10.1109/MSP.2023.3256068.

We want to hear from you!
Do you like what you’re reading? Your feedback is important. Let us know: send the editor-in-chief an e-mail!
PRESIDENT’S MESSAGE
Athina Petropulu | IEEE Signal Processing Society President

Reflecting on the Successes of ICASSP 2023
Digital Object Identifier 10.1109/MSP.2023.3302476 Date of current version: 8 September 2023

As we gear up for the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024, it is essential to take a moment to celebrate the achievements and highlights of ICASSP 2023, which took place on Rhodes Island, Greece, this past June. ICASSP 2023 was a momentous event as it marked the first postpandemic ICASSP and the return to in-person meetings. With the theme “Signal Processing in the AI Era,” the conference underscored the strong connection between signal processing and machine learning, highlighting the pivotal role of signal processing in shaping the development of artificial intelligence (AI).

ICASSP 2023 surpassed all expectations, with close to a 50% increase in submitted and accepted papers compared with previous submission records. More than 4,000 participants, with over 3,700 attending in person, demonstrated the key role of signal processing in both academia and industry, underscoring the importance of our community in advancing the field of AI. Spearheading the event, the general chair, Prof. Petros Maragos (NTUA, Greece), and cochairs Kostas Berberidis (U Patras, Greece) and Petros Boufounos (MERL, USA) led a committee of distinguished academics and practitioners who curated an outstanding technical program.
ICASSP 2023 coincided with the 75th anniversary of the IEEE Signal Processing Society (SPS), and the program was designed to commemorate this milestone. Various activities took place [1], showcasing the evolution and impact of signal processing over the decades. Aligning with the conference theme, the role of signal processing in AI took center stage during the plenary talks. Distinguished speakers delivered captivating talks to shed light on this vital angle.

Andrea Goldsmith’s “Disrupting NextG” plenary emphasized the outsized role of signal processing in next-generation wireless technologies. She highlighted the symbiotic relationship between machine learning and signal processing, underscoring that knowledge of the application and data can lead to more effective and explainable machine learning algorithms for wireless communications. Richard Baraniuk’s talk, “The Local Geometry of Deep Learning,” provided a fresh perspective on deep learning algorithms by exploring them through the lens of approximation theory via splines. This novel approach opened a window into the inner workings of these algorithms, offering valuable insights. Michael Jordan’s plenary, “An Alternative View on AI: Collaborative Learning, Incentives, and Social Welfare,” envisioned a future AI landscape that is more collective and autonomous, with a focus on statistical evaluation. Christos Papadimitriou’s talk, “How Does the Brain Create Language?,” delved into a model-based approach for understanding brain functions, with potential applications in emulating high-level cognitive phenomena. He discussed ongoing efforts to create a neuromorphic language organ within this framework.

[Photo: ICASSP 2023 ingeniously incorporated outdoor poster sessions, effectively accommodating the substantial influx of participants.]

The technical program at ICASSP 2023 was exceptional, adhering to the high standards set by this flagship conference of the SPS. Besides the well-established components, the organizing committee introduced several new features. For the first time in ICASSP’s history, satellite workshops were included before and after the main conference. These 18 satellite workshops covered cutting-edge topics, contributing to General Chair Petros Maragos’ vision of introducing thematic diversity and attracting nontraditional ICASSP audiences. The workshops touched upon various areas, such as the Data Science and Learning Workshop: Unraveling the Brain; Integrated Sensing and Communications: New Frontiers, Newer Challenges; Signal and Data Processing for Next Generation Satellites; Signal Processing for Autonomous Systems (SPAS); and Sign Language Translation and Avatar Technology. Three industry workshops, offered by MathWorks, Huawei, and Meta, further enriched the program, showcasing real-world applications of AI and signal processing in various industries.

[Photo: The welcome reception: an exquisite event showcasing local delicacies.]

In what follows, I discuss some relatively recently introduced activities at ICASSP, with the hope that their success will continue.
Education-oriented short courses
An SPS strategic goal is the delivery of education-oriented short courses, aimed at transforming the education and training landscape. Since 2022, these courses have offered participants an opportunity to deepen their understanding of critical topics in the field. Unlike traditional tutorials, SPS’s education-oriented short courses take a deep dive into various subjects, providing a comprehensive and multisided perspective on each topic. The courses consist of parallel tracks of 10-hour sessions conducted in three segments, offering participants an immersive learning experience. Upon successful completion of the course and quiz, participants are awarded professional development hours and continuing education units certificates.

[Photo: The Low-Complexity Mini-Marathon: a unifying social affair among ICASSP 2023 participants.]
ICASSP 2023, with the help of the SPS Education Board led by Chair Roxana Saint-Nom, organized the in-person delivery of these short courses, providing attendees the option to participate either live or remotely, catering to the diverse needs of learners from around the world. The four selected courses were as follows:
■ “A Hands-On Approach for Implementing Stochastic Optimization Algorithms from Scratch”
■ “Graph Neural Networks”
■ “Graph Signal Processing and Geometric Learning: A Foundational Approach”
■ “Learning Nonlinear and Deep Low-Dimensional Representations from High-Dimensional Data: From Theory to Practice.”
These courses will also be offered on demand for SPS members in the SPS Resource Center.
Fostering entrepreneurship in signal processing
Entrepreneurship has become another strategic priority for the SPS, recognizing the crucial role of innovation and startups in driving advancements in the field. Since 2022, the ICASSP conference has been offering an entrepreneurship forum, providing a platform for researchers, entrepreneurs, investors, and experts to come together and share their insights, experiences, and success stories. The ICASSP 2023 forum continued this tradition, organized by Dr. Costantinos Papadias (The American College of Greece), Dr. Nassos Katsamanis (Behavioral Signals and ATHENA RC), and Dr. Evita Fotinea (ATHENA RC), with an aim to inspire the next generation of signal processing innovators to think outside the box.

The forum featured a diverse program agenda, including keynote speeches, panels, networking opportunities, and a startup fair, creating a vibrant and interactive environment for participants. Attendees, both present and remote, had the unique opportunity to listen to the entrepreneurship journeys of seasoned entrepreneurs who provided valuable insights into the challenges and opportunities they encountered in their ventures.

A highlight of the event was the startup fair. Organizations from Greece and other international startups showcased their products and innovations in an adjacent exhibition. Following the exhibition, a pitching competition was held, providing young professionals from the startup community with the chance to present their ideas and projects in various application areas, ranging from bioinformatics, health, and sound systems to autonomous vehicles, imaging radars, and wildfire prevention. The participants received invaluable feedback from experienced entrepreneurs, creating an environment of collaboration and knowledge sharing.

The keynote speaker of the forum was Alexandros Eleftheriadis, a partner at Big Pi Ventures, a renowned Greek venture capital firm that specializes in deep tech startups. He shared tips about investments, criteria, entities, and terms, providing valuable insights into the funding landscape for aspiring entrepreneurs. The forum also featured a panel discussion titled “Tech-Based Entrepreneurship,” which shed light on the challenges young entrepreneurs face while establishing their startups. The discussion covered topics such as limited resources, the trend of bootstrapping, the protection of European companies, the importance of basic research, and the critical role of building strong teams.

The forum culminated with the announcement of the winners of the startup competition. The first prize was awarded to Treble, a cloud-based sound systems startup. Waveye, an AI-driven imaging radars venture, received the second prize, and Voinosis, a startup focused on diagnosing diseases like dementia based on voice analysis, won the third prize.
Challenges at ICASSP
The Signal Processing Grand Challenges (SPGCs) demonstrate the community’s commitment to addressing real-world issues through signal processing advancements. By fostering an environment of cooperation and intellectual exchange, these challenges continue to drive progress in the field.

ICASSP 2023 featured a record number of SPGCs. Comprising 15 distinct challenges spanning various signal processing domains, including audio, speech, communications, biomedical applications, computer vision, and brain–computer interfaces, these events stimulated innovative discussions and collaborations. At the conference, each challenge was allocated a dedicated session where the top five submissions were presented, showcasing their winning solutions. Challenge organizers also offered valuable insights through overview presentations, shedding light on the significance and scope of each challenge.

Recognizing the importance of sharing knowledge, the challenge organizers were invited to contribute overview papers to a special issue in the IEEE Open Journal of Signal Processing. Additionally, the winning teams were also provided with the opportunity to publish papers detailing their successful approaches.
PROGRESS Workshop at ICASSP Supporting the SPS goal to make signal processing a more inclusive field, the 6th Promoting Diversity in Signal
Conference banquet: immersing in festive Greek-style celebration.
8
IEEE SIGNAL PROCESSING MAGAZINE
|
September 2023
|
Processing (PROGRESS) Workshop was held during the ICASSP week. The goal of this workshop is to motivate and support women and underrepresented minorities to pursue academic careers in signal processing. We know that women and minorities are underrepresented on the faculties of universities around the world, which limits the diversity of perspectives in academic research and also deprives students of diverse role models. Role models and mentors play a crucial role in inspiring students, instilling confidence in their abilities, and demonstrating the potential for success in their chosen fields. The PROGRESS Workshop recognizes the significance of diversity in academia and seeks to bridge gender and cultural gaps through its empowering initiatives. Since its inception at International Conference on Image Processing (ICIP) in 2020, the PROGRESS Workshop has gained substantial momentum. It is now an institutionalized part of SPS, featuring prominently during both ICASSP and ICIP conferences. The workshop is overseen by the SPS Diversity Subcommittee in Membership and Development under the Membership Board, with the support of a representative from the respective conference organizing committees, further emphasizing the commitment of the signal processing community to foster inclusivity. At ICASSP 2023, the PROGRESS Workshop was successfully led by Dr. Theodora Chaspari from Texas A&M University assisted by the local committee of Dr. Maria Flouri and Dr. Nancy Zlatintsi. Dr. Chaspari, along with a distinguished international lineup of speakers, including Dr. Vasileia Filidou (Athena RC), Dr. Xiaoli Ma (Georgia Tech, USA), Dr. Urbashi Mitra (USC, USA), Ana Perez (Center Technologic de Telecommunicacions de Catalunya), Yuejie Chi (Carnegie Mellon University, USA), and Yonina Eldar (Weizmann Institute of Technology, Israel), engaged both present and remote graduate and undergraduate students, as well as postdoc researchers. 
The speakers discussed their cutting-edge research and also shared valuable insights into the academic career path. Their experiences and success stories demonstrated the opportunities and possibilities available in the field, encouraging attendees to envision their own rewarding academic journeys in signal processing. One of the significant highlights of the PROGRESS program was the professional development training session conducted by NaturalScience.Careers. This session equipped participants with essential tools for building a successful academic career, such as crafting an engaging CV, effectively promoting their skills and accomplishments, and mastering the art of the interview process. One of the most noteworthy aspects of the PROGRESS Workshop is that it is open and free for all registered ICASSP attendees, ensuring access and inclusivity for anyone interested in participating, even if they do not feel they belong to a specific technical or socio-cultural category. Additionally, non-ICASSP attendees had the opportunity to partake in the workshop upon the review of their application materials, reflecting the workshop’s commitment to embracing diversity and community outreach. The PROGRESS Workshop is supported by the SPS and external funding agencies. This year, in a significant move to support students and early-career academics, the SPS offered competitive travel grants of US$1,000. The recipients of these prestigious PROGRESS travel grants hailed from India, Switzerland, USA, Canada, Taiwan, and Australia, underscoring the workshop’s global reach.
Celebrating diversity at ICASSP 2023
The SPS remains dedicated to fostering diversity and inclusion. This year’s conference saw several notable instances of inclusivity. Attendees appreciated the lactation room, the childcare services, and the provision of gender-neutral restrooms offered by the ICASSP organizers. A significant milestone was achieved this year with the introduction of an LGBTQI+ event during ICASSP 2023. This “unofficial” ICASSP gathering was attended by 58 individuals spanning various signal processing domains and academic tiers, from students to associate professors, as well as esteemed IEEE SPS colleagues. Together, the attendees enjoyed good conversations with lots of laughter, in a relaxing atmosphere. The positive feedback from attendees was overwhelming, with many expressing their immense joy over the establishment of an LGBTQI+ event. Gratitude is extended to the organizers, Lucas Thomaz, Odette Scharenborg, and the IEEE SPS staff, who orchestrated this groundbreaking event. We hope that such gatherings will gain official recognition in future editions of ICASSP.
Acknowledging the heroes behind the success of ICASSP 2023
The resounding success of ICASSP 2023 would not have been possible without the dedication and hard work of an outstanding team of volunteers and staff. As we reflect on the
Embracing diversity: An LGBTQI+ gathering at ICASSP 2023.
The closing ceremony: expressing gratitude to the organizers, participants, and supporters of this exceptional conference.
achievements of this remarkable conference, it is essential to express gratitude to those who played a pivotal role in making it a grand success. At the helm of the event was General Chair Petros Maragos, whose vision, leadership, and oversight were instrumental in orchestrating every aspect of the conference. His tireless efforts ensured a seamless organization and delivery of the conference, and his unwavering commitment set the tone for the entire team. Co-chairs Kostas Berberidis and Petros Boufounos deserve heartfelt thanks for their invaluable contributions behind the scenes. Their dedication and hard work complemented the efforts of the general chair, ensuring that every detail was meticulously taken care of. The technical program cochairs, including Shri Narayanan, Constantine Kotropoulos, Ken Ma, and Athina Petropulu, played a vital role in shaping the technical program of the conference. The SPS technical committee chairs also deserve recognition for their exceptional efforts in handling the unprecedented number of submissions. Their competence and efficiency ensured a well-curated technical program.
Every volunteer involved in ICASSP 2023 dedicated their best self to the conference, and their hard work and commitment are commendable. Their collective efforts contributed to the success of the event, making it a memorable experience for all attendees. The SPS Board Chair Ana Isabel Pérez Neira and the Conference Board provided important oversight of this conference and all SPS conferences and workshops. ICASSP 2023 experienced a tremendous increase in attendance, and so the general chair and the team had to navigate uncharted territory to meet the needs of the conference effectively. They efficiently utilized the available space and introduced innovative ideas, such as holding poster sessions in garden areas and extending coffee breaks, to avoid overcrowding and ensure a smooth conference experience. Many thanks to the local professional conference organizer Matina Gika and her team, who played a big role in the planning. The organizers also delivered a wide range of social events that brought the attendees together. Events like the Low-Complexity Mini-Marathon, the welcome reception, the open-air fair, the banquet, and the open-air concert provided ample opportunities for networking, making new friends, and enjoying the nontechnical aspects of the conference beyond the technical sessions. The unwavering support of the SPS staff was crucial in making ICASSP 2023 a success. Rich Baseil, Theresa Argiropoulos, Caroline Johnson, Bill Colacchio, Nicole Allen, Debbie Blazek, Michelle Demydenko, Jessica Perry, Jaquie Rash, Rebecca Wollman, and others worked tirelessly behind the scenes, providing essential support and ensuring that the conference ran smoothly. Conference Catalysts, led by Chris Dyer, also provided tools and support. As we eagerly await ICASSP 2024, set to take place during 14–19 April at COEX, Seoul, South Korea, we look forward to another exciting event. General Chair Hanseok Ko promises an outstanding conference, building on the successes of ICASSP 2023. In the meantime, our second annual flagship conference, ICIP, is set for 8–11 October 2023 in Kuala Lumpur. It will have similar attractions to those of ICASSP, such as industry workshops, entrepreneurial presentations, short courses, membership events, and young professional and diversity discussions. Its technical program features cutting-edge advances in image processing research and technology. See you all in Kuala Lumpur and Seoul!
Acknowledgment
I would like to acknowledge the help of Theresa Argiropoulos, Rich Baseil, Constantinos Papadias, Theodora Chaspari, Odette Scharenborg, and Alexander Bertrand in writing this article. ChatGPT was used in certain parts of the article.
Reference
[1] C. Jutten and A. Petropulu, “IEEE signal processing society 75th anniversary during ICASSP 2023: Remembering the past, engaging with the present, and building the future [From the Editor],” IEEE Signal Process. Mag., vol. 40, no. 5, pp. 4–11, Jul. 2023, doi: 10.1109/MSP.2023.3286188.
SOCIETY NEWS
New Society Officers Elected
The Board of Governors of the IEEE Signal Processing Society elected two new officers, who will start their terms on 1 January 2024.
Digital Object Identifier 10.1109/MSP.2023.3294034 Date of current version: 8 September 2023
Haizhou Li will serve as the 2024–2026 vice president-conferences. He is a Fellow of IEEE and is with the Chinese University of Hong Kong (CUHK), Shenzhen, China, and the National University of Singapore. He succeeds Ana Perez-Neira, who has held this position since January 2021.
Antonio Ortega will serve as the 2024–2026 vice president-publications. He is a Fellow of IEEE and is with the University of Southern California. He succeeds Marc Moonen, who has held this position since January 2021.
Ahmed Tewfik | IEEE Signal Processing Society Past President, 2022–2023 Nominations and Appointments Committee Chair
Election of President-Elect, Regional Directors-at-Large, and Members-at-Large
Digital Object Identifier 10.1109/MSP.2023.3299372 Date of current version: 8 September 2023

It is my pleasure to announce that the IEEE Signal Processing Society (SPS) annual election will commence on 15 August, and your vote is more important than ever! This year, all eligible SPS members will vote for the next President-Elect (for a term from 1 January 2024 through 31 December 2025) in addition to the Regional Directors-at-Large for Regions 7&9 and 10 (for a term from 1 January 2024 through 31 December 2025) and Members-at-Large (for a term from 1 January 2024 through 31 December 2026) of the SPS Board of Governors (BoG). Ballots will be mailed to SPS members. The ballot includes a diverse slate of candidates for all elections, who were vetted by the SPS Nominations and Appointments Committee, as well as a space for write-in candidates. This year’s election offers SPS members the opportunity to cast their votes via the web at https://eballot4.votenet.com/IEEE for up to one President-Elect, one Regional Director-at-Large for your corresponding Region, i.e., Regions 7 and 9 (Canada and Latin America) or Region 10 (Asia and Pacific), and three Member-at-Large candidates. Ballots must be received at the IEEE no later than 2 October 2023 to be counted. Members must meet the eligibility requirements at the time the ballot data are generated to be eligible to vote. To be eligible to vote in this year’s Society election, you must have been an active SPS member, affiliate, or graduate student member as of 30 June 2023. This is the date when the list of eligible Society voting members was compiled.

The Candidates for President-Elect
Konstantinos (Kostas) N. Plataniotis
Nicholas Sidiropoulos

The Candidates for Regional Director-at-Large
Regions 7 and 9: Timothy N. Davidson, Vítor Heloiz Nascimento
Region 10: Chiou-Ting Hsu, Chirag N. Paunwala

The Candidates for Member-at-Large
Qian He
Yingbo Hua
Petros Maragos
Stephen McLaughlin
Iole Moccagatta
Thrasyvoulos N. Pappas
Brendt Wohlberg
The 2023 Candidates for President-Elect
The 2023 candidates for President-Elect (presented in alphabetical order) and their candidate statements appear next.
Candidate statement from Konstantinos (Kostas) N. Plataniotis
The SPS has been my professional home since 1991, and I have always cherished the opportunity to continue serving our community. The nomination for the post of SPS President-Elect humbles me. Within the term of office, I envision addressing the following priorities:
■■ Serving our members: As we celebrate our 75th anniversary and look to the future, we can confidently say that our Society’s membership will look different and much more diverse than today’s. Members in different geographical regions, industries, and career stages will have different needs and priorities. We will develop personalized activities and services around individual members’ needs to the extent possible. For example, the SPS will support activities that combine networking and mentoring programs with customized education and help build project-focused communities quickly for specific purposes. We will use enabling technologies, including webinars and virtual meetings, to establish and promote public forums, possibly free-of-charge ecosystems, to exchange ideas, data, and research artifacts; engage practicing engineers; and collaborate with industry leaders. We will intensify our investment and collaborate with IEEE and sister Societies in offering services and solutions that address the educational and capacity-building needs of our current and future members by delivering microlearning opportunities and “stackable” credentials in emerging interdisciplinary areas. With new technologies making the interactions among our members more convenient, we will continue expanding our networking and mentoring programs to more underserved areas and underrepresented groups in our community. Diversity and inclusion are our moral and ethical responsibility and necessary conditions for sustainable long-range growth.
■■ Support to volunteers: The SPS’s governance structure ensures that our goals and objectives are met. It should be future-focused and agile. We will expand our footprint and create new programs in emerging interdisciplinary areas such as data science, artificial intelligence (AI), and machine learning (ML). Moreover, as the submitted material is projected to increase, we will, in consultation with IEEE, experiment with partially automated initial vetting of submitted material while maintaining overall quality control at subsequent levels. Keeping our volunteers engaged and their workload reasonable while maintaining our reputation as a trusted knowledge source is feasible and doable.
■■ Securing the future: The SPS finances our community’s operations, offerings, and services to its members. Continuing to operate in a self-sustainable way is paramount for the long-term viability of our professional home. Dealing with the financial shortfall from the open access initiative, accounting for the regional membership-segment conference participation fee adjustments, financing new member services, and supporting membership-drive incentives impact the SPS’s fiscal profile. Still, some of our members’ conference fees are high, and open access publishing fees are unreasonably excessive given their circumstances, i.e., for those who transition between careers or are in a low-income region. That said, the SPS’s funding models do not need to be limited to membership subscriptions and conference and publication fees. In consultation with IEEE, we will explore income streams from sponsored activities, including support from corporate sponsors for conferences; public forums and special group activities; education and skills capacity offerings; and the monetization of applications related to “information summarization” and “executable knowledge.”
For more information about Konstantinos (Kostas) N. Plataniotis, please visit https://www.plataniotis.com.
Candidate statement from Nicholas Sidiropoulos
I have served the SPS in leadership roles for many years. Among the various efforts I have undertaken, I’m particularly pleased with the following:
■■ In my service as the chair of the Signal Processing for Communications and Networking Technical Committee (SPCOM TC), I helped institute a rigorous paper award screening and multistage selection process that remains in use 17 years later. Every article published in IEEE Transactions on Signal Processing that falls under the scope of the TC is “scan-reviewed” by a TC member and is thus given a chance to be considered for award nomination. I received the SPS Meritorious Service Award for my TC service and leadership.
■■ In 2016–2017, I chaired a committee tasked with developing an SPS-related repository under arXiv. We established arXiv/eess (Electrical Engineering and Systems Science: https://arxiv.org/archive/eess), the first engineering repository under arXiv, which now attracts many thousands of preprints/year. This has enhanced the visibility and footprint of the SPS.
■■ While serving as SPS Vice President for Membership, I led a charge to reduce the SPS membership fees for members in developing countries and another to ensure the proper representation of the different technical constituencies within the SPS on the Awards Board, the Nominations and Appointments Committee, and the SPS Fellows Committee. Fairness, transparency, equity, and inclusion have always been important to me.
In recent years, many of us have turned our attention to ML, and this has created a strong current within the SPS. Non-IEEE/SPS venues are growing in job market value. Our answer should be multipronged, but a key point should be to renew our vows to offer quality reviews and insightful and authoritative editorial decisions in a timely fashion. I was pleased to see the ICASSP submission and reviewing process move to the Microsoft CMT platform, which hosts the major AI/ML conferences, and the presence of big AI/ML industry players at ICASSP 2023. We should up the ante and turn ICASSP into a major AI/ML recruiting event for our students.
Open access has been something that we have been struggling with for years now. While we have taken steps to mitigate its short-term impact, we have failed to embrace it. We should offer open access for our flagship transactions and conferences at a reasonable (US$1,000) open access fee, further reduced for members in developing economies. We can compensate for the revenue loss through targeted IEEE Xplore advertisements and the increased visibility/hits/citations/impact that come from open access.
Promoting diversity and fostering inclusive excellence and respect for individual differences should be high on our agenda. We have made progress, but much remains to be done. There are underserved parts of the world that are a significant source of underrepresented SPS talent, and we should focus more on those communities to recruit, mentor, and elevate future SPS leaders. Another grand challenge we must reckon with is the rapidly emerging transition to a more divided multipolar world. As an international scientific society, we should stand united to foster pathways for scientific exchange and better understanding of each other. I will be honored to serve the SPS if elected.
For more information about Nicholas Sidiropoulos, please visit https://sites.google.com/virginia.edu/sidiropoulosforsps-pe?usp=sharing
The 2023 Candidates for Regional Director-at-Large
The 2023 candidates for Regional Director-at-Large (presented in alphabetical order) appear next. Candidate biographies will be included in the ballot.
The 2023 Candidates for Member-at-Large
The 2023 candidates for Member-at-Large (presented in alphabetical order) appear next. Candidate biographies will be included in the ballot.
Conclusion
The BoG is the governing body that oversees the activities of the SPS. The SPS BoG has the responsibility of establishing and implementing policy and receiving reports from its standing boards and committees and comprises (continued on page 25)
PERSPECTIVES
Yao Wang and Debargha Mukherjee
The Discrete Cosine Transform and Its Impact on Visual Compression: Fifty Years From Its Invention
Digital Object Identifier 10.1109/MSP.2023.3282775 Date of current version: 8 September 2023
Compression is essential for efficient storage and transmission of signals. One powerful method for compression is through the application of orthogonal transforms, which convert a group of N data samples into a group of N transform coefficients. In transform coding, the N samples are first transformed, and then the coefficients are individually quantized and entropy coded into binary bits. The transform serves two purposes: one is to compact the energy of the original N samples into coefficients with increasingly smaller variances so that removing the smaller coefficients causes negligible reconstruction error, and another is to decorrelate the original samples so that the coefficients can be quantized and entropy coded individually without losing compression performance. The Karhunen–Loève transform (KLT) is an optimal transform for a source signal with a stationary covariance matrix in the sense that it completely decorrelates the original samples and that it maximizes energy compaction (i.e., it requires the fewest coefficients to reach a target reconstruction error). However, the KLT is signal dependent and cannot be computed with a fast algorithm.
In January 1974, Ahmed et al. published an article titled “The Discrete Cosine Transform” (DCT) [1]. This seminal article introduced a signal-independent transform, called the DCT, which uses real basis functions from the family of discrete Chebyshev polynomials. The DCT was shown, via numerical examples, to have an energy compaction performance almost as good as that of the KLT and superior to that of other well-known signal-independent transforms, including the discrete Fourier transform (DFT), Haar transform, and Walsh–Hadamard transform, for signals that can be modeled as a first-order Markov process with a correlation coefficient close to one. Furthermore, if the source can be modeled as a Gaussian process, the DCT leads to a rate-distortion bound similar to that of the KLT and lower than that of the DFT. The article also showed that the N-point DCT can be obtained from the real part of a modified 2N-point DFT of the zero-extended signal, and thus can be computed efficiently using the fast Fourier transform (FFT) algorithm. The basic research work and events that led to the development of the DCT were summarized in an article titled “How I Came Up With the Discrete Cosine Transform,” by Ahmed in 1991 [2], which reveals that the DCT was first conceived by him in 1972. The DCT was also introduced in a book coauthored by Ahmed and Rao [3]. In a subsequent article in 1982 [4], Flickner and Ahmed proved rigorously that the DCT can be readily derived as the limiting case of the KLT of the first-order Markov processes as the correlation coefficient approaches one. Thankfully, most real-world 1D signals or each dimension of multidimensional signals can be well modeled by a first-order Markov process, making
the DCT superior to other orthogonal transforms for signal compression. Ahmed and his collaborators also explored an application of the DCT for compression of real-world signals, first for compression of electrocardiogram signals using a 1D DCT in 1975 [5], then for compression of video using a 3D DCT in 1977 [6] and, furthermore, compression of images using a 2D DCT in 1978 [7]. The article by Hein and Ahmed [7] also introduced a faster algorithm for computing the DCT through the Walsh–Hadamard transform to significantly reduce the number of multiplications.
The DCT is a special case of orthonormal transforms for finite length sequences. Let $\mathbf{s}$ denote the signal vector, consisting of N signal samples s(n), n = 0, 1, …, N − 1, and $\mathbf{t}$ represent the transform vector, consisting of N transform coefficients t(k), k = 0, 1, …, N − 1. The inverse DCT represents $\mathbf{s}$ as a weighted sum of N basis vectors $\mathbf{b}_k$:

$$\mathbf{s} = \sum_{k=0}^{N-1} t(k)\,\mathbf{b}_k.$$

The elements of the basis vectors are defined by

$$b_k(n) = a_k \cos\left(\frac{(2n+1)k\pi}{2N}\right), \quad n = 0, 1, \ldots, N-1,$$

with

$$a_0 = \sqrt{\frac{1}{N}}, \qquad a_k = \sqrt{\frac{2}{N}}, \quad k = 1, 2, \ldots, N-1.$$

(Note that the DCT introduced in [1] has a different scaling factor $a_k$. Here we choose to define $a_k$ so that the basis vectors are orthonormal.) These basis vectors form an orthonormal set, and hence, the coefficient t(k) can be simply obtained by the inner product of $\mathbf{s}$ and $\mathbf{b}_k$:

$$t(k) = \sum_{n=0}^{N-1} s(n)\, b_k(n).$$
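As a minimal sketch of the definitions above (not taken from the original article), the following pure-Python code builds the orthonormal basis vectors and evaluates the forward and inverse transforms directly from the sums. The function names `dct_basis`, `dct`, and `idct` are illustrative assumptions, and the O(N²) loops are chosen for clarity rather than speed.

```python
import math


def dct_basis(N):
    """Return the N orthonormal DCT basis vectors b_k(n) as a list of lists."""
    basis = []
    for k in range(N):
        # Scaling factors a_0 = sqrt(1/N) and a_k = sqrt(2/N) for k >= 1.
        a_k = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        basis.append([a_k * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                      for n in range(N)])
    return basis


def dct(s):
    """Forward DCT: t(k) is the inner product of s with basis vector b_k."""
    return [sum(s_n * b_n for s_n, b_n in zip(s, b_k))
            for b_k in dct_basis(len(s))]


def idct(t):
    """Inverse DCT: s is the weighted sum of basis vectors with weights t(k)."""
    N = len(t)
    basis = dct_basis(N)
    return [sum(t[k] * basis[k][n] for k in range(N)) for n in range(N)]


if __name__ == "__main__":
    samples = [3.0, 1.5, -2.0, 4.0, 0.5, 7.0, -1.0, 2.5]  # arbitrary test data
    coeffs = dct(samples)
    print("coefficients: ", [round(c, 4) for c in coeffs])
    print("reconstructed:", [round(x, 4) for x in idct(coeffs)])
```

For a constant input such as `[1.0] * 8`, only t(0) is nonzero because every other basis vector is orthogonal to the constant vector, which gives a small illustration of energy compaction.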
Although the DCT is introduced for 1D signals of length N, it can be easily extended to 2D signals of dimension N × N by forming the basis images $\mathbf{B}_{k,l}$ using the outer product of the DCT basis vectors, i.e., $\mathbf{B}_{k,l} = \mathbf{b}_k \mathbf{b}_l^T$, k, l = 0, 1, …, N − 1, and representing an image $\mathbf{S}$ by

$$\mathbf{S} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} t(k,l)\, \mathbf{B}_{k,l}.$$

The transform coefficients t(k, l) can be obtained by first applying the length-N 1D DCT to each row of the image and then applying the length-N 1D DCT to each column of the intermediate result, leading to the final transform image $\mathbf{T}$, also of dimension N × N. The reverse operation can recover the original 2D image from the transform image. Similarly, the DCT can be extended to 3D or even higher dimensional signals. The DCT can also be derived by first symmetrically extending the original length-N sequence into a length-2N sequence and then applying the 2N-point DFT. Depending on the type of symmetric extension, various versions of the DCT arise, which are sometimes numbered from DCT-I to DCT-VIII, where the original version proposed in [1] corresponds to DCT-II [8]. This relation of the DCT to the DFT allows a fast hardware or software implementation of the DCT using the butterfly principle known from the FFT, which initially helped to pave the way for the DCT’s use in practical compression algorithms.
To illustrate that the DCT has an energy compaction property close to that of the KLT for natural images, we compare the average K-term approximation errors with different transforms when each 8 × 8 block of an image is represented with the K coefficients that have the largest variances. This experiment is conducted using the 25 images in the Kodak image dataset [9]. To derive the KLT, we determine the empirical covariance matrix from all nonoverlapping 8 × 8 image blocks (each ordered into a 1D vector of dimension 64) of all images in the dataset, and order the resulting eigenvectors in decreasing eigenvalues. For each of the other transforms, including the DCT, Hadamard transform, and the DFT, we evaluated empirically the variance of each transform coefficient. Note that for the DFT, which has complex coefficients, the variance of a coefficient is the sum of the variance of the real part and that of the imaginary part. The K coefficients with largest variances include K distinct real numbers as half of the coefficients are complex conjugates of the other.

FIGURE 1. A K-term approximation error versus K using different transforms (DFT, Hadamard, DCT, and KLT) applied on 8 × 8 image blocks; horizontal axis: number of coefficients kept; vertical axis: reconstruction error (MSE). The results shown are obtained by averaging over all nonoverlapping 8 × 8 image blocks in the Kodak image dataset. The coefficients are ordered based on their empirical variances. MSE: mean-square error.

Figure 1 shows that the DCT is almost as good as the KLT in energy compaction, yielding a lower approximation error than the Hadamard transform and the DFT for the same K. Although this was demonstrated for first-order Markov processes with high correlation in the original DCT article [1], the fact that it actually holds true for natural images is reassuring. Figure 2 compares the basis images of the DCT and the KLT. As with the DCT, the KLT includes images of horizontal and vertical patterns of different frequency as well as some basis images that have mixed directions. Interestingly, the KLT also includes other directional patterns that the DCT does not (because the 2D DCT bases are obtained from the outer product of 1D bases).

FIGURE 2. The basis images of (a) the DCT and (b) KLT, derived from the images in the Kodak image dataset.

Application of the DCT in image and video compression
Since its invention in the 1970s, the DCT has been the bedrock for image and video compression. The DCT is the basis for JPEG, the first lossy image compression format that was introduced by the Joint Photographic Experts Group (JPEG) of the International Organization for Standardization (ISO) in 1992. In a nutshell, the JPEG encoder partitions an image into small blocks and applies a 2D DCT to each block. The coefficients are then quantized and entropy coded. DCT-based image coding utilizes both the energy compaction and decorrelation properties of the DCT as well as the insensitivity of the human visual system to high-frequency components, allowing high-frequency DCT coefficients to be quantized more heavily without introducing visible distortion. JPEG greatly reduces the amount of storage required to represent an image at the cost of a relatively small reduction in image quality and has become the most widely used image file format. It also greatly reduces the bandwidth needed to upload or download an image to/from the Internet, through either a wired or wireless connection. The highly efficient JPEG compression algorithm made possible the wide proliferation of digital images and digital photos. The JPEG codec is literally embedded in all consumer cameras, smartphones, and other photographic capture devices, and enabled the interchange and sharing of photographic images on the Internet. Although a later image coding standard, JPEG 2000, used the more powerful full-frame wavelet transform, the original JPEG encoder is more hardware friendly and likely to coexist with JPEG 2000 for decades to come. Figure 3 shows sample JPEG compressed images at different bit rates. We can see that a color image of 24 bits/pixel (bpp) can be compressed to less than 1 bpp without noticeable artifacts. At even lower bit rates, JPEG tends to produce visible blocking artifacts because the DCT is applied to each image block separately and DCT coefficients are quantized independently. Today, the intracoding method in the latest video coding standards such as high-efficiency video coding (HEVC), VP9, AV1, VVC, and EVC uses the DCT as one of several possible block transforms. The intra-only frame coding method in HEVC is one of the best performing image coding methods today, standardized by the ISO/International Electrotechnical Commission as part of the High Efficiency Image File Format and available freely as the Better Portable Graphics codec [10]. More recently, the intra-only frame coding method in AV1 was released as the AVIF image coding format by the Alliance for Open Media. The DCT has also been the basis of all video coding standards from 1988
FIGURE 3. (a) The original image represented using 24 bits/pixel (bpp). (Source: https://unsplash.com/photos/r-nJDGpjRic.) (b) and (c) JPEG compressed images at 0.371 bpp and 0.247 bpp, respectively. In (b), the image is compressed by a factor of 64 and yet is almost indistinguishable from the original image. In (c) (with a compression factor of 97), the image has noticeable artifacts in the cloud and water regions due to the heavy quantization of DCT coefficients, but is otherwise acceptable.
to the present. In these video coding standards, a video frame is either coded standalone (called intraframe) using the DCT and optionally with a few other transforms, or is coded in the predictive mode, where the frame is first predicted from the previous and possibly following frames, and then the prediction error is coded as an image using predominantly the DCT and optionally with a few other transforms. The DCT is the core technology behind all the video coding standards established by the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) of ISO, including H.261 (1988), MPEG-1 (1993), MPEG-2 (1995), H.263 (1998), MPEG-4 (1998), H.264/AVC (2003), H.265/HEVC (2013), and H.266/VVC (2020). The MPEG-1 video standard enabled wide distribution of low-quality movies on CDs and over the Internet in the early 1990s. The MPEG-2 video standard enabled digital video broadcasting over terrestrial, cable, and satellite channels, replacing analog TV in the late 1990s, as well as distribution of high-quality digital movies over DVD. The H.264/AVC video coding standard facilitated the explosion of video content over the Internet and on smartphones in the early 2000s. The H.265/HEVC video standard has enabled even higher-quality video applications since 2013 and, for example, is used for terrestrial broadcasting within Digital Video Broadcasting-Second Generation Terrestrial. The latest video standard, H.266/VVC, has focused on higher resolution and higher dynamic-range video and 360° video. Although the compression methodologies become increasingly more complex with each new standard, they all adopt the basic block-based hybrid prediction plus transform coding framework and utilize the DCT for coding the prediction error blocks. In earlier standards (H.261, MPEG-1, MPEG-2, H.263, and MPEG-4), the DCT was exclusively used for coding the intraframes or prediction residual frames.
In later standards, the DCT, along with several other transforms, are adaptively chosen to optimize coding efficiency. Specifically, the standards established
in the last decade starting from H.265/ HEVC (2013) have started using other discrete trigonometric transforms such as certain forms of the discrete sine transform (DST) for predictive intracoding and, to a lesser extent, intercoding, but the DCT remains the most widely used transform in all these codecs. In parallel to the ITU/MPEG effort, the WebM project developed an open royalty-free format VP9 (2012) that also predominantly uses the DCT for intercoding, along with the DST for intracoding. Later, the Alliance for Open Media, an industrial consortium, developed AV1 (2018), a more advanced open, royaltyfree video format, which also adopted the DCT as a core technology, along with certain forms of the DST. VP9 and AV1 are used in popular video streaming applications such as YouTube and Netflix.
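The hybrid scheme described above (predict a block, transform the prediction error with the DCT, quantize the coefficients) can be illustrated with a small sketch. It is not taken from any of the standards: it works on a 1D block instead of a 2D one, uses a floating-point orthonormal DCT-II rather than the integer transforms of real codecs, and omits entropy coding. The sample values, the quantization `step`, and all function names are assumptions chosen for illustration.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def code_block(current, prediction, step):
    """Transform-code the prediction error of one block and reconstruct it."""
    residual = [c - p for c, p in zip(current, prediction)]
    coeffs = dct2(residual)
    levels = [round(v / step) for v in coeffs]   # integers an entropy coder would store
    dequantized = [level * step for level in levels]
    recon_residual = idct2(dequantized)
    recon = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, recon

# Hypothetical data: a block of the current frame and its motion-compensated prediction.
prediction = [100, 102, 104, 106, 108, 110, 112, 114]
current = [101, 104, 105, 108, 109, 113, 113, 117]
levels, recon = code_block(current, prediction, step=2.0)
max_err = max(abs(a - b) for a, b in zip(current, recon))
print("quantized levels:", levels)
print("max reconstruction error:", max_err)
```

Because the transform is orthonormal, the reconstruction error energy equals the quantization error energy of the coefficients, which is why a coarser `step` trades bits for distortion in a controlled way.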
Other applications of the DCT
The modified DCT (MDCT), a DCT variant, was developed by Princen et al. in 1987 [11]. The MDCT is based on DCT-IV, with the additional property of being lapped; that is, it is designed to be applied to overlapped blocks of samples in a long sequence so that the last half of one block coincides with the first half of the next block. This overlapping helps suppress artifacts stemming from the block boundaries. The MDCT is employed in most of the modern audio coding standards, including MP3, Dolby Digital (AC-3), Advanced Audio Coding, Dolby AC-4, and MPEG-H 3D Audio.

Apart from its main application in compression, the DCT has also found its way into other applications of digital signal processing, leveraging its frequency-based signal representation. An example of such an application is image denoising using the BM3D [12] algorithm, which applies a 3D DCT on the 3D block formed by similar image blocks in an image, and applies soft thresholding on the resulting DCT coefficients. The DCT has also played an important role in advancing automatic speech recognition and automatic speaker recognition, where the DCT has been used to compute mel-frequency cepstral coefficients, the features used for the recognition algorithms [13].
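The lapped structure of the MDCT can be sketched numerically. The code below is a minimal illustration under stated assumptions, not an excerpt from any audio codec: it uses a textbook MDCT definition with a sine window (which satisfies the Princen-Bradley condition), an inverse scaled by 2/N, 50% overlap, and a made-up test signal. Function names and the block size `N` are hypothetical.

```python
import math

def sine_window(N):
    """Sine window of length 2N; w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    """Map 2N windowed samples to N MDCT coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (scaled by 2/N)."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(signal, N):
    """Window, transform, invert, window again, and overlap-add with hop size N."""
    w = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        block = [w[n] * signal[start + n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out

N = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(6 * N)]
recon = analysis_synthesis(signal, N)
# Only samples covered by two overlapping blocks are fully reconstructed.
interior_error = max(abs(signal[n] - recon[n]) for n in range(N, len(signal) - N))
print("max interior reconstruction error:", interior_error)
```

Each block of 2N samples yields only N coefficients, so a single block cannot be inverted on its own; the aliasing it leaves behind is cancelled by the neighboring blocks during overlap-add. The first and last N samples of the signal are covered by one block only and are therefore not recovered in this sketch.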
Impact of the DCT
Today, people take for granted that they can capture and edit high-quality images and videos and freely share them with their friends and families, and that they can watch high-quality images and movies from almost anywhere. During the COVID-19 pandemic, various video conferencing platforms served as essential tools for family members and friends to connect with each other, and for corporate and education entities to conduct remote work and learning. Behind all these applications lie video codecs that utilize the DCT. The societal and economic impact of the DCT is immeasurable! The impact of the DCT is beautifully explained in a February 2021 episode of the TV series “This Is Us,” which showed that families were able to share the important milestones of their lives during the pandemic due to the invention of the DCT by Ahmed [14].
Acknowledgment
The authors would like to thank Prof. Sanjit Mitra (University of California, Santa Barbara) for encouraging and guiding them to prepare this article. The authors would like to thank Zhongzheng Yuan (New York University Tandon School of Engineering) for generating the experimental results in Figures 1–3.
Authors
Yao Wang ([email protected]) received her Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990. She is a professor in the Department of Electrical and Computer Engineering and the Department of Biomedical Engineering, New York University Tandon School of Engineering, New York, NY 11201 USA. Her research interests include video compression and delivery, and computer vision and medical image analysis. She is a Fellow of IEEE.

Debargha Mukherjee (debargha@google.com) received his Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1999. He is a principal engineer at Google, Mountain View, CA 94043 USA, where he leads open next-generation video codec development. His research interests include conventional and learning-based image and video compression, and related areas. He is a Fellow of IEEE.
References
[1] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, no. 1, pp. 90–93, Jan. 1974, doi: 10.1109/T-C.1974.223784.
[2] N. Ahmed, “How I came up with the discrete cosine transform,” Digit. Signal Process., vol. 1, no. 1, pp. 4–5, Jan. 1991, doi: 10.1016/1051-2004(91)90086-Z.
[3] N. Ahmed and K. R. Rao, “Orthogonal transforms for digital signal processing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Philadelphia, PA, USA: Springer-Verlag, 1975, pp. 136–140, doi: 10.1109/ICASSP.1976.1170121.
[4] D. Flickner and N. Ahmed, “A derivation for the discrete cosine transform,” Proc. IEEE, vol. 70, no. 9, pp. 1132–1134, Sep. 1982, doi: 10.1109/PROC.1982.12439.
[5] N. Ahmed, P. J. Milne, and S. G. Harris, “Electrocardiographic data compression via orthogonal transforms,” IEEE Trans. Biomed. Eng., vol. BME-22, no. 6, pp. 484–487, Nov. 1975, doi: 10.1109/TBME.1975.324469.
[6] T. Natarajan and N. Ahmed, “On interframe transform coding,” IEEE Trans. Commun., vol. 25, no. 11, pp. 1323–1329, Nov. 1977, doi: 10.1109/TCOM.1977.1093769.
[7] D. Hein and N. Ahmed, “On a real-time Walsh-Hadamard/Cosine Transform image processor,” IEEE Trans. Electromagn. Compat., vol. EMC-20, no. 3, pp. 453–457, Aug. 1978, doi: 10.1109/TEMC.1978.303679.
[8] S. A. Martucci, “Symmetric convolution and the discrete sine and cosine transforms,” IEEE Trans. Signal Process., vol. 42, no. 5, pp. 1038–1051, Mar. 1994, doi: 10.1109/78.295213.
[9] KODAK Image Dataset, Jan. 27, 2013. [Online]. Available: https://r0k.us/graphics/kodak/
[10] S. Anthony. “BPG: A new, superior image format that really ought to kill off JPEG.” ExtremeTech. Accessed: Dec. 12, 2014. [Online]. Available: https://www.extremetech.com/computing/195856-bpg-a-new-superior-image-format-that-really-ought-to-kill-off-jpeg
[11] J. P. Princen, A. W. Johnson, and A. B. Bradley, “Subband/Transform coding using filter bank designs based on time domain aliasing cancellation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1987, vol. 12, pp. 2161–2164, doi: 10.1109/ICASSP.1987.1169405.
[12] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007, doi: 10.1109/TIP.2007.901238.
[13] M. Sahidullah and G. Saha, “Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition,” Speech Commun., vol. 54, no. 4, pp. 543–565, May 2012, doi: 10.1016/j.specom.2011.11.004.
[14] “‘This Is Us’ honored a real-life ‘Genius’ who made it possible for the Pearsons to connect amid COVID,” People, Feb. 16, 2021. [Online]. Available: https://people.com/tv/this-is-us-nasir-ahmed-pearsons-births-covid/
Moisés Soto-Bajo, Andrés Fraguela Collar, and Javier Herrera-Vega
On the Concept of Frequency in Signal Processing: A Discussion
Nikola Tesla said: “If you want to find the secrets of the universe, think in terms of energy, frequency and vibration.” Unfortunately, this is a hieroglyph, and we are still looking for its Rosetta Stone. Frequency is a fundamental concept in science. However, in spite of its seemingly simple meaning, its mathematical foundation is not as straightforward as it may seem at first glance. A naive interpretation of the different mathematical concepts for modeling frequency can be misleading as their actual meanings essentially differ from the intuitive notion that they are supposed to represent. This circumstance should be taken into account to develop and apply appropriate signal analysis and processing tools. We discuss this topic to draw the attention of the mathematical and engineering community to this point, which is often overlooked.
Introduction
Frequency is a central concept in science and a keystone in signal processing. It describes the oscillatory behavior of signals, which is usually argued to be the manifestation of some of their key features, depending on their nature. Hence, a mathematically rigorous definition of frequency, tightly linked to a meaningful physical or phenomenological interpretation, is highly desirable and critical. Nevertheless, beyond the intuitive notion arising from the study of simple vibratory waves, this search is intricate and slippery. This is not surprising at all; as stated in [12], “The term ‘instantaneous frequency’ is somewhat of an oxymoron.” It is even paradoxical [1]. In this regard, it is interesting to retrieve the discussion in [11] about the modeling of “real-world” signals. Concepts such as frequency, time limited, or band limited are helpful mathematical abstractions, but they are actually meaningless in practice. What do we exactly mean/understand by “frequency”? How is it related to oscillations? Is it appropriate for answering the key questions that arise in any discipline?

Digital Object Identifier 10.1109/MSP.2023.3257505
Date of current version: 8 September 2023
Motivation
The concept of “frequency” is almost ubiquitous in signal processing and in the many disciplines that make extensive use of it, and it is present in the most common methodologies for analyzing signals. Applications of frequency analysis cover a broad spectrum, ranging from biomedical signal processing, such as artifact removal in functional near-infrared spectroscopy neuroimaging (measuring Mayer waves, heartbeat, or breathing) and limb movement monitoring in poststroke rehabilitation patients, to fault detection in induction motors or other devices through acoustic wave analysis, and to economics and financial time series analysis.

In bioengineering, the analysis of bioelectric signals is a major task. Especially interesting is the case of neurophysiology. Electroencephalographic (EEG) signals are supposed to be the superposition or juxtaposition of brain rhythms, which are the manifestation of the collective and synchronous activity of millions of neurons, and they depend on many factors. Consequently, from the perspective of signal processing, the basic goal is to decompose EEG into rhythms or to analyze these “components” or different “oscillatory-behaved” manifestations. Frequency-based methods are at the heart of this kind of analysis, and many techniques and procedures, ultimately resulting in automatized analytical algorithms, have been developed to process EEG signals.

Brain rhythms are characterized by frequency and amplitude ranges. They are oscillatory signals with a close-to-uniform oscillation rate, which fluctuates within a specific range. However, there is not a consensus about the limits of frequency bands (see Figure 1). This is not a rigorous definition, but a “naive” description. Despite being central, the use of frequency is merely intuitive. In practice, it strongly depends on the methodology used. Because of the lack of firmly established and consensual definitions, EEG analysis turns out to be a difficult task, far from being completely understood.

How can these fuzzy notions be appropriately mathematically modeled? This is a bewildering and challenging question. The customary answer is: The signal is a relation between the independent variable, let us say time, and the main magnitude (voltage difference). The spectral representation of the signal is another function that relates to each frequency the amplitude corresponding to the basic oscillation at this frequency, that is, the strength with which this harmonic takes part in the composition of the signal (see Figure 2). Clearly, the first description is elementary; it is completely linked to perception or measurement. However, the second one is less intuitive and depends strongly on the frequency concept itself and on the analytical procedure performed. According to this description, rhythms need to be understood in terms of the chosen spectral representation, and they depend on the corresponding concept of frequency. However, it is clear that any analysis should respect the neuroscientist’s paradigm (or the corresponding expert’s, according to each type of phenomenon), in terms of oscillations, and the subsequent components in which signals are split should be meaningful to the specialists.
Frequency in Fourier analysis
Undoubtedly, signal processing borrowed the concept of frequency from physics and mathematics, specifically from classical Fourier analysis. The sinusoidal basic functions are $\sin(2\pi\omega t)$ and $\cos(2\pi\omega t)$, or $e^{2\pi i\omega t}$. Here, $t$ represents time and $\omega$ frequency. The physical interpretation is the angular frequency or angular speed (in hertz) in the corresponding rotary motion around a resting state or equilibrium point, and it represents the rate of change of the phase, measuring the angular displacement in cycles per unit of time. Frequency measures the oscillatory speed: $|\omega|$ is the number of cycles performed per unit of time. This interpretation is naturally restricted to pure sinusoidal signals [1].

In harmonic analysis, we seek to somehow represent general functions in terms of basic sinusoidal functions. These representations are defined by the set of frequencies involved and the corresponding amplitudes accompanying the sinusoidal components, and they take the form of sequences of coefficients or functions, depending on the nature of the frequency set (discrete or continuous). These amplitudes are the response in frequency of the signal or the spectral representation of the function. Hence, the representation process is split into two parts: the analysis, which consists in obtaining the representative given by the corresponding amplitudes, and the synthesis, which consists in recovering the signal from its representation.

Classical Fourier analysis consists of two main areas: the Fourier series and the Fourier transform. The Fourier series theory deals with periodic signals and provides an effective way of analyzing and synthesizing them. The most remarkable result is the Plancherel theorem: The properly normalized trigonometric system $\{e^{2\pi i k\,\cdot/L}/\sqrt{L} : k \in \mathbb{Z}\}$ is an orthonormal basis. Consequently, any $L$-periodic function $f$ is written as the corresponding Fourier series, where the information of $f$ is encoded into its Fourier coefficients, which, due to orthonormality, are given by (put $b - a = L$)

$$\hat{f}(k) = \frac{1}{\sqrt{L}} \int_a^b f(t)\, e^{-2\pi i k t/L}\, dt \tag{1}$$

and satisfy the Parseval identity:
$$\int_a^b |f(t)|^2\, dt = \sum_{k \in \mathbb{Z}} |\hat{f}(k)|^2. \tag{2}$$

Analogously, the Fourier transform deals with nonperiodic signals. For an integrable function $f$, the Fourier transform is given by

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i \xi t}\, dt. \tag{3}$$

The Plancherel identity reads as

$$\int_{-\infty}^{\infty} |f(t)|^2\, dt = \int_{-\infty}^{\infty} |\hat{f}(\xi)|^2\, d\xi. \tag{4}$$

In the first case, the set of frequencies is $\mathbb{Z}$ (the set of integers), we represent periodic functions, the analysis process consists in computing the sequence of Fourier coefficients, and the synthesis process consists in summing the Fourier series. In the second case, the set of frequencies is $\mathbb{R}$ (real numbers), we represent nonperiodic functions, and the analysis and synthesis processes are performed by the Fourier and inverse Fourier transform, respectively.

These objects extend the concept of frequency from pure sinusoidal functions to more general signals. Now, frequency is a parameter that indexes the set of responses of the signal when it is faced against a pure oscillation: the Fourier coefficient $\hat{f}(k)$ or the Fourier transform value $\hat{f}(\xi)$. It is also the “addition index” when recovering the signal. But this is achieved at some price. In this framework, the exact meaning of “frequency” is hidden behind the variables $k$ and $\xi$. It turns out that frequency is no longer an oscillation speedometer (although it is supposed to be). This is the point we want to highlight here, and it plunges into the deep nature of time–frequency analysis since it is inherent to the central concept of frequency itself.

FIGURE 1. An EEG signal (a) with two possible decompositions into brain rhythms. (b) δ in dark blue and θ in light blue, (c) α in red and β in orange, and (d) γ in green.

FIGURE 2. An EEG signal (a) with (b) two possible spectral amplitudes (in absolute values). (c) At the bottom, a horizontal zoom is shown.

It is easy to see some eloquent symptoms of this: one of the main tasks in signal processing is the identification of the characteristic oscillations of signals and where they occur (time–frequency localization). First of all, it is worth noting that, in real applications, one deals with finite signals, so the discrete Fourier transform is applied. The analysis is mostly performed by computing inner products with basic harmonics. For instance, for any $\xi, \eta \in \mathbb{R}$,

$$\int_a^b e^{2\pi i(\eta - \xi)t}\, dt = L\,C\, \frac{\sin(\pi L(\eta - \xi))}{\pi L(\eta - \xi)} \tag{5}$$

where $C = e^{\pi i(\eta - \xi)(a + b)}$ and $|C| = 1$. This can be interpreted as the $k$th Fourier coefficient (corresponding to the frequency $\xi = k/L$, ignoring the constant $\sqrt{L}$) of a pure tone of frequency $\eta$, or
as the Fourier transform (valued at $\xi$) of this windowed (on $[a, b]$) pure tone of frequency $\eta$, too. This implies that, when computed on a finite interval, basic tones interfere with each other in the whole spectrum of frequencies, excluding a discrete sequence of pairings in harmonic consonance, given by the relation $(\eta - \xi)(b - a) \in \mathbb{Z}$. Therefore, the appearance of some harmonic, taking part in the composition of the signal, is not only traced back to the corresponding coefficient, but it leaves a spurious trail through the entire spectral representation, except its consonant frequencies. Moreover, this harmonic consonance relation depends not only on the pair of frequencies but also on the length $L = b - a$ of the sample window, which is in general uncorrelated with the signal or even with the phenomenon itself. This circumstance could have been favorable when modeling finite vibrating strings, but it does not represent the general case.

The previous computation also shows how the Fourier transform spreads entirely over the whole spectrum, distributing the energy completely. This is exactly the opposite of being condensed at $\eta$, as would be expected. This fact is very inappropriate since we are usually interested in studying signals on finite intervals. This clearly manifests the fact that the Fourier transform was designed to handle signals defined in $\mathbb{R}$. You can restrict the support, but you cannot “fool” the Fourier transform. Representing the other side of the same coin are the Fourier coefficients, where orthogonality relations are obtained at the price of neglecting dissonant frequencies: $(\eta - \xi)(b - a) \notin \mathbb{Z}$. The Fourier series scatters spectrum energy, giving rise to a completely spurious spectrum with no physical meaning at all (see Figure 3).

Moreover, consonant frequencies are also troublesome. Sometimes harmonics represent different and independent features, sometimes not. Several harmonics can contribute altogether to reshape a specific waveform, where it is not every component but the joint admixture that provides a meaningful profile. Furthermore, a recurrent compound waveform could have an intrinsic “frequency” related to that of its components to a greater or lesser extent (see Figure 4).

FIGURE 3. (a) In blue, a square pulse signal and (c) an action potential-type signal (b), (d) with their respective spectral representations. In red, a pure sinusoidal tone ($\sin(2\pi t/4)$) of frequency $\omega = 1/4$ has been added to them.
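The interference between pure tones on a finite window, expressed by (5), can be checked numerically. The sketch below is only an illustration: the window $[0, 4]$, the tone frequencies 1.25 Hz and 1.30 Hz, and the function names are assumptions chosen so that one tone is consonant with the grid $k/L$ and the other is not. It evaluates the closed form of (5) on the Fourier-series frequencies and compares it against a midpoint-rule integral.

```python
import cmath
import math

def closed_form(eta, xi, a, b):
    """Right-hand side of (5): L * C * sin(pi L (eta - xi)) / (pi L (eta - xi))."""
    L = b - a
    d = eta - xi
    C = cmath.exp(1j * math.pi * d * (a + b))
    if d == 0:
        return L * C
    return L * C * math.sin(math.pi * L * d) / (math.pi * L * d)

def numeric(eta, xi, a, b, n=20000):
    """Midpoint-rule approximation of the integral of e^{2 pi i (eta - xi) t} over [a, b]."""
    h = (b - a) / n
    return h * sum(cmath.exp(2j * math.pi * (eta - xi) * (a + (j + 0.5) * h))
                   for j in range(n))

a, b = 0.0, 4.0                      # observation window, L = 4 s
L = b - a
grid = [k / L for k in range(13)]    # Fourier-series frequencies k / L

# Tone with eta * L = 5 (an integer): consonant with the grid.
consonant = [abs(closed_form(1.25, xi, a, b)) for xi in grid]
# Tone with eta * L = 5.2 (not an integer): dissonant with the grid.
dissonant = [abs(closed_form(1.30, xi, a, b)) for xi in grid]

for k, (c, d) in enumerate(zip(consonant, dissonant)):
    print(f"k={k:2d}  xi={grid[k]:.2f} Hz  consonant={c:.3f}  dissonant={d:.3f}")
```

The consonant tone appears in a single coefficient, whereas the dissonant tone, which differs by only 0.05 Hz, leaves a nonzero response at every frequency of the grid. Changing the window length changes which tones count as consonant, although nothing about the tones themselves has changed.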
But, why should we worry about dissonant frequencies? Does it make sense at all to distinguish between real frequencies and spurious ones? How to define independent or cooperative components and how to classify and/or cluster them? If one is only interested in decomposing and synthesizing signals, there is no major problem. But if the purpose is to get meaningful components, explanatory of the underlying constitutive mechanisms of the phenomenon, the previous concerns are justified. Discordant frequencies are not superfluous at all. In the EEG case, they can occur naturally, as the superposition of independent, perhaps asynchronous and uncoordinated, but concurrent, contributor oscillations. Orthogonality enables energy additivity, but it does not necessarily mean independence or uncorrelation. Frequency analysis can
definitely distort the phenomenological nature of the signal. It is mathematically correct, but misinformative. Other parallel ideas, shaped like uncertainty principles, are well known [5]. In the framework of Fourier analysis, which was conceived for other purposes of a completely different nature, these drawbacks cannot be avoided.
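As a small illustration of harmonics that cooperate to shape a single waveform, consider the square wave $\operatorname{sign}(\sin(2\pi t))$, which repeats exactly once per unit of time. The following sketch approximates its Fourier coefficients as defined in (1) with $[a, b] = [0, 1]$; the midpoint rule, the number of quadrature points, and the function names are assumptions made for this example.

```python
import cmath
import math

def square_wave(t):
    """sign(sin(2 pi t)): a waveform that repeats once per unit of time."""
    return 1.0 if math.sin(2 * math.pi * t) >= 0 else -1.0

def fourier_coefficient(f, k, L=1.0, n=4000):
    """Midpoint-rule approximation of (1/sqrt(L)) * integral_0^L f(t) e^{-2 pi i k t / L} dt."""
    h = L / n
    total = 0j
    for j in range(n):
        t = (j + 0.5) * h
        total += f(t) * cmath.exp(-2j * math.pi * k * t / L)
    return total * h / math.sqrt(L)

coeffs = {k: fourier_coefficient(square_wave, k) for k in range(-9, 10)}
energy = 1.0  # integral of |f(t)|^2 over one period
fundamental_share = (abs(coeffs[1]) ** 2 + abs(coeffs[-1]) ** 2) / energy

for k in range(1, 10):
    print(f"k={k}  |coefficient|={abs(coeffs[k]):.4f}")
print("share of energy at the fundamental:", round(fundamental_share, 4))
```

Analytically, the odd coefficients have magnitude $2/(\pi|k|)$ and the even ones vanish, so the fundamental carries $8/\pi^2$ of the energy and the rest sits at 3, 5, 7, ... cycles per unit of time. None of those higher harmonics corresponds to an independent oscillation; together they only encode the shape of a waveform whose repetition rate is one.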
Beyond harmonic analysis: time–frequency analysis
Harmonic analysis is a classical discipline that has turned out to be the cornerstone of many others in pure and applied sciences. Concerning time–frequency localization, many techniques have been developed: fast Fourier transforms, band-limited functions, and prolate spheroidal wave functions, among many others. These tools have been successfully applied in many contexts. However, in spite of their remarkable properties, there are also disadvantages in their use. Anyway, we want to focus here on the fact that all of these techniques depend on the concepts of frequency that we have just explained and, consequently, share their shortcomings.

Beyond the classical harmonic framework, much effort has been devoted to developing alternative tools and procedures for capturing the main time–frequency features of a signal. The point is to identify the pairs of time windows and frequency bands in which characteristic oscillations of the signal occur. These different approaches can be clustered under the common name time–frequency analysis [5]. In contrast to classical harmonic analysis, time–frequency analysis treats (or attempts to) time and frequency variables equally, as primary concepts.
FIGURE 4. (a) In blue, a square pulse signal and (c) an action potential-type signal (b), (d) with their respective spectral representations. In red, a pure sinusoidal tone ($0.2 \cdot \sin(2\pi 6t)$) of frequency $\omega = 6$ has been added to them.
However, frequency is not actually an absolute primary concept; only the way the frequency variable appears in each transform (and only that) accurately determines its precise meaning, not the name with which it was coined. Many techniques have been developed with this flavor: the short-time Fourier transform; the spectrogram, ambiguity function, Wigner–Ville distribution, and other quadratic time–frequency representations; and Gabor frames, among others. These tools have been successfully applied to many processing tasks, such as feature extraction, separation of signal components, or signal compression [5]. Typically, time–frequency techniques define a local frequency spectrum that is supposed to report the strength of each oscillation. They borrow their foundational idea from Fourier analysis: response in frequency is obtained by facing up an averaged portion of the signal (via window functions) to the exponential kernel, which encodes the oscillation speed. Consequently, despite their success, the deficiencies of these approaches are inherited; they cannot overcome the limits that Fourier analysis imposes. Presently, many symptoms are actually well known (although the diagnosis maybe not so well). Two main principles are ubiquitous in time–frequency analysis: 1) the smoothness–decay duality: if one representation is smooth, the other one decays; and 2) the uncertainty principles: both representations cannot be simultaneously well localized. At first sight it seems reasonable that sudden jumps produce respective manifestations at high frequencies. But consider the square signal given by $\operatorname{sign}(\sin(2\pi t))$. How would you describe its periodicity, its oscillation, in terms of frequency? How do you think a neurophysiologist would do it? Now compute its Fourier transform and compare. The uncertainty principle has been well studied [5]. Some relevant examples are the Heisenberg–Pauli–Weyl inequality, the Donoho–Stark uncertainty principle, Lieb’s inequalities,
the radar uncertainty principle, the Wigner–Ville distribution uncertainty principles, the positivity of smoothed Wigner–Ville distributions, the Balian–Low theorem, and density theorems for Gabor frames, etc. They make the existence of an absolute and ideal concept of instantaneous frequency (or spectrum) impossible: there is no finite energy function concentrated on an arbitrarily small interval with Fourier transform also concentrated on an arbitrarily small band.

Remarkable exceptions to the uncertainty principle are the Wilson basis, the Malvar basis, or the local Fourier bases [5], [6]. However, because of their structure as orthogonal bases of windowed Fourier series, they suffer from the same shortcomings. The window smooths the edge effects but also totally reshapes the sinusoidal waveforms, intimately related to the notion of oscillation.

Another case of interest is wavelet theory. Wavelets have been applied with great success in signal processing and many other disciplines. There are wavelets that are well localized in time–frequency and very suitable for time–frequency analysis [6]. Nevertheless, not all that glitters is gold. In this case, the concept of frequency is in some way replaced by the concept of scale or resolution. The scale spectrum grows geometrically, and mesh refines according to scale. Hence, “dissonant” features (with scale and position outside the lattice) will be scattered through several spurious components. On the other hand, there is relatively much freedom when choosing the waveform for wavelets; they do not need at all to have a sinusoidal appearance.

Instantaneous frequency
The instantaneous frequency (IF) is also worth mentioning. It has a long history, but its own nature has been quite controversial, despite successful applications, especially in the analysis of nonstationary signals. The IF is supposed to measure a local time-varying frequency, following the spectral frequency peak. It plays the role of a variable frequency in an amplitude–phase representation (the analytic signal) that locally best fits the signal [1], [7].

The definition and physical interpretation of the IF as well as the amplitude–phase decomposition, the monocomponent signals, etc., are far from being completely clarified, and doubts persist. The suitability of this mathematical object, assumed to possess some “natural” properties, is actually conditioned to the fulfillment of some arguable requirements. Another serious drawback is the nonuniqueness of amplitude–phase type representations and their physical meanings, only partially solved by Gabor [1]. Also, the IF usually deviates from the expected frequency band, even attaining meaningless negative values [7].

It is clear that the whole spectrum cannot be represented by a single number, so the IF is just appropriate for monocomponent signals (whatever it means). Consequently, multicomponent signals need to be decomposed [1]. Many algorithms have been proposed for that decomposition. One of the most successful is the empirical mode decomposition (EMD). Resulting intrinsic mode functions (IMFs) are supposed to be monocomponent and thus suitable to possess a meaningful IF. This is the so-called Hilbert–Huang transform, and the resulting time–frequency–amplitude/energy distribution is called the Hilbert spectrum [8].

The Hilbert spectrum seems to provide a more appropriate notion of IF than previous methodologies, although it is not exempt from drawbacks. At each IMF, the IF is not homogeneous and may not remain within a band, so possibly many vibrating modes seem to coexist. Consequently, IMFs are not necessarily meaningful. Furthermore, the mathematical foundation of all of these data-driven techniques (especially the so-called sifting process in EMD) turns out to be very challenging, and most of the confidence they generate comes from numerical experiments and particular data analyses [2], [7], [8].

Another interesting approach for defining and computing IFs is the synchrosqueezed wavelet transform, a reallocation method based on wavelet analysis [4]. In what is called the adaptive harmonic model, a sinusoidal representation with amplitude and frequency modulation is considered. In this case, the IF is always positive, and the mathematical treatment is more easily handled. The mathematical interpretation of the IF is straightforward, but not so the physical one, and not the decomposition into intrinsic mode type functions, which obviously rely on the time–frequency representation provided by wavelet analysis.

Concerning the multicomponent splitting problem, and the “one or two frequencies” question, see [3], [9], [10], and [14]. A recent review on nonlinear time–frequency analysis can be found in [13].

The turn of wave shapes
In many types of signals (for example, as in electrocardiography), specific waveforms take place repeatedly. These patterns do not strictly appear as periodic but in the form of modulated basic shapes. This led Wu [12] to introduce the adaptive nonharmonic model, where signals are composed by nonsinusoidal oscillation patterns with time-varying amplitudes and arguments, called wave-shape functions. They are model-dependent periodic outlines modulated by an intrinsic intrawave frequency. Hence, the IF is defined as the derivative of generalized phase functions.

There are still important drawbacks to overcome, such as the nonuniqueness of representation, the identification of suitable basic shape patterns, the computation of amplitude and phase functions, or the splitting into different mutually interfering waveforms. But this results in a very auspicious proposal since the concept of frequency is linked again with repetition: it accounts for the number of repetitions of the waveform per unit time, and it also measures the instantaneous speed when performing each one. Note the subtlety underlying this notion. Here the IF does not attempt to capture a kind of absolute oscillation speed, but the local relative rhythm with which the corresponding waveform is traced.

A priori, the wave shape is an intrinsic feature, presumably the product of the underlying mechanism of signal generation, which could be estimated from previous knowledge about the phenomenon. Consequently, this method is able to capture the essential signal dynamics in a phenomenologically meaningful fashion. In short, the information on morphology, speed, and intensity is unmixed by using the wave shape, amplitude functions, and IF.

Conclusions
It is not easy to rigorously “define” what frequency means or even should mean. Time–frequency analysis has been successfully and firmly established. However, some problems arise, which could seem counterintuitive (see the musical score metaphor [5]) or necessary (uncertainty principles in quantum mechanics). Other recent attempts, such as the IF or the wave-shape functions, are quite promising, but some perplexing features still remain unrevealed.

Probably, there is not an absolute answer; it could be application dependent. What is certain is that, in any event, solutions must be phenomenologically meaningful to allow a deep comprehension of studied phenomena, and experts in each specific field will have the last word.

This is definitely an old topic, but it is still in full force. There is a real need for generating new strategies that fill the gaps between theory and practice and between real problems and mathematical modeling. New ideas will produce more fruitful notions of frequency, susceptible to being applied in many disciplines.
Acknowledgment
The authors thankfully acknowledge CONACYT financial support of MSB through the “Cátedras CONACYT para Jóvenes Investigadores 2016” program and also funding of JHV from CONACYT under postdoctoral scholarship 741147. This article was prepared in the context of the research project A1-S-36879, “Estudio teórico de las soluciones periódicas y de problemas inversos en sistemas de reacción difusión y elípticos que aparecen en los modelos matemáticos de generación y propagación de la actividad eléctrica en el corazón y el cerebro.” The authors thankfully acknowledge the computer resources, technical advice, and support provided by Laboratorio Nacional de Supercómputo del Sureste de México, a member of the CONACYT national laboratories, under project 202104097C. The authors acknowledge the anonymous referees for their useful comments and suggestions, which have improved this text.
Authors
Moisés Soto-Bajo (moises.soto@fcfm.buap.mx) received his Ph.D. degree in mathematics from Universidad Autónoma de Madrid in 2012. He was awarded a Cátedra CONACYT position through the corresponding CONACYT program for young researchers and is commissioned at Centro Multidisciplinario de Modelación Matemática y Computacional, Benemérita Universidad Autónoma de Puebla, 72570 Puebla, Mexico. He has been a level I researcher with the Sistema Nacional de Investigadores of CONACYT since 2016. His research interests include functional analysis, Fourier and wavelet analysis, shift-invariant subspace theory, mathematical epidemiology, and mathematical modeling of brain and heart electrical activity.

Andrés Fraguela Collar (fraguela@fcfm.buap.mx) received his B.S. and M.S. degrees from the University of Havana (Cuba), his Ph.D. degree from Lomonosov State University, and his Habilitation of the Doctorate in science at the M.V. Lomonosov University and the V.A. Steklov Mathematical Institute of the Russian Academy of Sciences. He is currently a professor of applied mathematics at the Benemérita Universidad Autónoma de Puebla, 72570 Puebla, Mexico. He is a recipient of the Distinguished Visitor Award from the Complutense University of Madrid, an associate researcher at the International Center of Theoretical Physics of Trieste (Italy), and a recipient of the State Prize for Science and Technology of the State of Puebla, Mexico. His research focuses on theoretical results in several branches of analysis and differential equations and their applications in epidemiology and medicine, including the analysis of normal and abnormal electrical activity of the brain and the heart and its correlation with the corresponding electrical signals, using inverse problem methodologies.

Javier Herrera-Vega (vega@fcfm.buap.mx) received his Ph.D. degree in computer science from the National Institute of Astrophysics, Optics and Electronics. He was a postdoctoral researcher at Centro Multidisciplinario de Modelación Matemática y Computacional and is with Benemérita Universidad Autónoma de Puebla, 72570 Puebla, Mexico. He has been a level C member of the Sistema Nacional de Investigadores since 2020. His research focuses on the processing and analysis of biomedical signals, mainly from neuroimaging modalities like electroencephalography and functional near-infrared spectroscopy.
References
[1] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals,” Proc. IEEE, vol. 80, no. 4, pp. 520–538, Apr. 1992, doi: 10.1109/5.135376.
[2] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications,” Proc. IEEE, vol. 80, no. 4, pp. 540–568, Apr. 1992, doi: 10.1109/5.135378.
[3] A. Cicone, S. Serra-Capizzano, and H. Zhou, “One or two frequencies? The iterative filtering answers,” 2021, arXiv:2111.11741.
[4] I. Daubechies, J. Lu, and H.-T. Wu, “Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool,” Appl. Comput. Harmon. Anal., vol. 30, no. 2, pp. 243–261, Mar. 2011, doi: 10.1016/j.acha.2010.08.002.
[5] K. Gröchenig, Foundations of Time-Frequency Analysis. New York, NY, USA: Springer Science & Business Media, 2001.
[6] E. Hernández and G. Weiss, A First Course on Wavelets. Boca Raton, FL, USA: CRC Press, 1996.
[7] N. E. Huang, Z. Wu, S. R. Long, K. C. Arnold, X. Chen, and K. Blank, “On instantaneous frequency,” Adv. Adaptive Data Anal., vol. 1, no. 2, pp. 177–229, Apr. 2009, doi: 10.1142/S1793536909000096.
[8] N. E. Huang and S. S. P. Shen, Hilbert-Huang Transform and Its Applications, vol. 16, 2nd ed. Singapore: World Scientific, 2014.
[9] V. Lostanlen, A. Cohen-Hadria, and J. P. Bello, “One or two frequencies? The scattering transform answers,” in Proc. 28th IEEE Eur. Signal Process. Conf. (EUSIPCO), 2021, pp. 2205–2209, doi: 10.23919/Eusipco47968.2020.9287216.
[10] G. Rilling and P. Flandrin, “One or two frequencies? The empirical mode decomposition answers,” IEEE Trans. Signal Process., vol. 56, no. 1, pp. 85–95, Jan. 2008, doi: 10.1109/TSP.2007.906771.
[11] D. Slepian, “On bandwidth,” Proc. IEEE, vol. 64, no. 3, pp. 292–300, Mar. 1976, doi: 10.1109/PROC.1976.10110.
[12] H.-T. Wu, “Instantaneous frequency and wave shape functions (I),” Appl. Comput. Harmon. Anal., vol. 35, no. 2, pp. 181–199, Sep. 2013, doi: 10.1016/j.acha.2012.08.008.
[13] H.-T. Wu, “Current state of nonlinear-type time–frequency analysis and applications to high-frequency biomedical signals,” Current Opinion Syst. Biol., vol. 23, pp. 8–21, Oct. 2020, doi: 10.1016/j.coisb.2020.07.013.
[14] H.-T. Wu, P. Flandrin, and I. Daubechies, “One or two frequencies? The synchrosqueezing answers,” Adv. Adaptive Data Anal., vol. 3, no. 1, pp. 29–39, Apr. 2011, doi: 10.1142/S179353691100074X.
SOCIETY NEWS (continued from page 13)

23 Society members: the President and President-Elect, who are elected by the voting members of the Society; five Vice President officers of the Society, who are elected by the BoG; nine Members-at-Large, elected by the voting members of the Society; four Regional Directors-at-Large, elected locally by the Society voting members of the corresponding Region; the Awards Board chair and Young Professionals Committee chair. The seven officers are the President, President-Elect, Vice President of Conferences, Vice President of Education, Vice President of Membership, Vice President of Publications, and Vice President of Technical Directions. The Executive Director of the Society shall serve ex officio, without vote.

The President-Elect is an SPS member elected by the Society’s membership via the annual election to serve as an officer and as a voting member on the Society’s BoG, Executive Committee, Conferences Board, Education Board, Membership Board, and Publications Board. The President-Elect position automatically succeeds to President.

Regional Directors-at-Large are SPS members who are elected locally by Society voting members of the corresponding Region via the annual election to serve on the Society’s BoG as nonvoting members and voting members of the Society’s Membership Board.

Members-at-Large represent the member viewpoint in the BoG’s decision making. They typically review, discuss, and act upon a wide range of items affecting the actions, activities, and health of the Society.

More information on the SPS can be found on the SPS website at https://signalprocessingsociety.org/.
Quaternions in Signal and Image Processing
A comprehensive and objective overview
Sebastian Miron, Julien Flamant, Nicolas Le Bihan, Pierre Chainais, and David Brie

Quaternions are still largely misunderstood and often considered an “exotic” signal representation without much practical utility despite the fact that they have been around the signal and image processing community for more than 30 years now. The main aim of this article is to counter this misconception and to demystify the use of quaternion algebra for solving problems in signal and image processing. To this end, we propose a comprehensive and objective overview of the key aspects of quaternion representations, models, and methods and illustrate our journey through the literature with flagship applications. We conclude this work with an outlook on the remaining challenges and open problems in quaternion signal and image processing.
Digital Object Identifier 10.1109/MSP.2023.3278071
Date of current version: 8 September 2023
1053-5888/23©2023IEEE

History, background, and aim of the article
Quaternions were first introduced by Irish mathematician Sir William Rowan Hamilton in 1843 as a result of his dedication to generalizing complex numbers in more than two dimensions. He spent many years trying in vain to define a 3D algebra based on a system of triplets. Story has it that he discovered quaternions when walking along the Royal Canal in Dublin, Ireland, on 16 October 1843 and immediately carved the fundamental equation for quaternion algebra in the stone of the nearby Brougham Bridge:

$$i^2 = j^2 = k^2 = ijk = -1 \tag{1}$$

where $i$, $j$, and $k$ define imaginary units. While the carving has now disappeared, a plaque honoring Hamilton’s memory can be found at the same place today. Hamilton devoted his last 20 years to the study of his quaternions, which culminated in his book Elements of Quaternions. After his death in 1865, quaternions remained fashionable for some time, but they were rapidly superseded by the advent of linear algebra as we know it today through the work of Gibbs and Heaviside at the end of the 19th century. To learn more about the fascinating history of quaternions and linear algebra, we recommend reading A History of Vector Analysis: Evolution of the Idea of a Vectorial System, by Michael J. Crowe, Dover Publications, 1994. Still, Hamilton was a precursor in many aspects and influenced many. For instance, he invented the term vector well before the advent of modern linear algebra; at the time, it simply referred to the 3D imaginary part of a quaternion. The set of quaternions is usually denoted by $\mathbb{H}$ as a tribute to Hamilton’s discovery.

Just as complex numbers are well known to describe algebraically the geometry of the 2D plane, quaternion algebra permits straightforward descriptions of geometric transformations in 3D and 4D spaces. As a generalization of complex numbers to higher dimensions, quaternions are the first and simplest example of hypercomplex numbers. For more details on the topic of hypercomplex algebras, we refer the interested reader to Hypercomplex Numbers: An Elementary Introduction to Algebras, by I.L. Kantor and A.S. Solodovnikov, Springer, 1989.

Formally, a quaternion is defined by a real (or scalar) part and an imaginary (or vector) part made of three components along imaginary units $i$, $j$, and $k$. This close relationship between purely imaginary (or, simply, pure) quaternions and vectors in $\mathbb{R}^3$ is fundamental. In fact, the triplet of imaginary units $(i, j, k)$ can be identified with the canonical Cartesian basis of $\mathbb{R}^3$ given by $(e_1, e_2, e_1 \times e_2)$, where $\times$ denotes the cross product between vectors of $\mathbb{R}^3$. Remarkably, quaternion algebra encodes the cross product operation in a natural way since $ij = k$, $jk = i$, or $ki = j$. More generally, the product of two quaternions involves 3D scalar products and cross products. This also explains why the multiplication of two quaternions is noncommutative: it results from the well-known noncommutativity of the cross product and translates the fact that geometric transformations in 3D and higher dimensions lack commutativity as well. For later reference, Table 1 collects essential definitions, sets, properties, and polar forms related to quaternion algebra.

Perhaps one of the most striking examples of their utilization in today’s applications lies in quaternions’ ability to represent 3D rotations. Representing a 3D rotation with a single unit quaternion has many benefits over standard Euler angle rotation matrices: a lower number of parameters, no gimbal lock singularities (this refers to the loss of one degree of freedom that can occur when using Euler angles to parameterize 3D rotations, causing important practical issues when representing a sequence of rotations), and nice interpolation properties between rotations. These advantages have been acknowledged for a long time in robotics [1] and computer graphics [2], where the use of quaternions is well established. On the contrary, the use of quaternions in signal and image processing is still blooming, with the first works dating back to the early 1990s [3].

This article aims at providing an overview of the current use of quaternions in signal and image processing, ranging from data representation using quaternions to dedicated quaternion domain methods and algorithms. It is intended to demystify the field for newcomers and make it accessible to the many. We hope to demonstrate that, up to the special care required to extend standard signal and image processing tools to quaternion algebra, the use of quaternion domain approaches enables a compact, elegant, and interpretable way to handle geometric properties of signals and images.

Table 1. A handbook of quaternion algebra.

Basic definitions
  Canonical basis: $\mathbb{H} = \mathrm{span}\{1, i, j, k\}$
  Elementary relations: $i^2 = j^2 = k^2 = ijk = -1$; $ij = -ji = k$, $ki = -ik = j$, $jk = -kj = i$
  Cartesian representation: $q = a + ib + jc + kd$, with $a, b, c, d \in \mathbb{R}$
  Real and imaginary parts: $\mathrm{Re}(q) = a$, $\mathrm{Im}_i(q) = b$, $\mathrm{Im}_j(q) = c$, $\mathrm{Im}_k(q) = d$
  Scalar and vector parts: $S(q) = a$, $V(q) = ib + jc + kd$
  Conjugation: $\bar{q} = a - ib - jc - kd = S(q) - V(q)$
  Modulus: $|q| = \sqrt{q\bar{q}} = \sqrt{\bar{q}q} = \sqrt{a^2 + b^2 + c^2 + d^2}$
  Inverse: $q^{-1} = \bar{q} / |q|^2$, $q \neq 0$
  Involution: $q^{\mu} = -\mu q \mu$, $\mu^2 = -1$

Sets
  Pure quaternions: $V(\mathbb{H}) = \{q \in \mathbb{H} \mid \mathrm{Re}(q) = S(q) = 0\}$
  Unit quaternions: $\mathrm{Sp}(1) = \{q \in \mathbb{H} \mid |q| = 1\}$
  Complex subfields of $\mathbb{H}$, $\mu \in V(\mathbb{H}) \cap \mathrm{Sp}(1)$: $\mathbb{C}_{\mu} = \{a + \mu b \mid a, b \in \mathbb{R}\}$

Properties ($p, q \in \mathbb{H}$)
  Addition: $\mathrm{Re}(p + q) = \mathrm{Re}(p) + \mathrm{Re}(q)$; $\mathrm{Im}_{\mu}(p + q) = \mathrm{Im}_{\mu}(p) + \mathrm{Im}_{\mu}(q)$, $\mu = i, j, k$
  Product: $pq = S(p)S(q) - \langle V(p), V(q) \rangle_{\mathbb{R}^3} + S(p)V(q) + S(q)V(p) + V(p) \times_{\mathbb{R}^3} V(q)$
  Compatibility with operations: conjugation $\overline{pq} = \bar{q}\,\bar{p}$; involution $(pq)^{\mu} = p^{\mu} q^{\mu}$; inverse $(pq)^{-1} = q^{-1} p^{-1}$; modulus $|pq| = |p||q|$

Polar forms and geometry
  Euler formula (axis–angle representation): $q = |q| e^{\mu_q \varphi_q} = |q| (\cos \varphi_q + \mu_q \sin \varphi_q)$, with axis $\mu_q \in V(\mathbb{H})$ such that $|\mu_q| = 1$ and angle $\varphi_q \in [0, 2\pi]$
  Euler angle polar form (xzy convention): $q = |q| e^{i\theta} e^{-k\chi} e^{j\varphi}$, with $\theta \in [-\pi/2, \pi/2]$, $\chi \in [-\pi/4, \pi/4]$, $\varphi \in [0, 2\pi]$
  3D rotation by axis $\mu$, angle $\alpha \in [0, \pi]$: $R_{\mu,\alpha}(q) = \exp(\mu \alpha / 2)\, q \exp(-\mu \alpha / 2)$
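The elementary relations and the rotation formula of Table 1 can be checked numerically. The following is a minimal sketch, assuming quaternions are stored as NumPy arrays `(a, b, c, d)` standing for $a + ib + jc + kd$; the helper names `qmul`, `qconj`, and `rotate` are illustrative choices and not part of any library.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    ])

def qconj(q):
    """Quaternion conjugate: flip the sign of the vector part."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(v, axis, alpha):
    """Rotate a 3D vector by angle alpha about `axis` using
    exp(mu * alpha / 2) v exp(-mu * alpha / 2), with v as a pure quaternion."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Unit quaternion cos(alpha/2) + mu sin(alpha/2); its inverse is its conjugate.
    u = np.concatenate(([np.cos(alpha / 2)], np.sin(alpha / 2) * axis))
    pure = np.concatenate(([0.0], np.asarray(v, dtype=float)))
    return qmul(qmul(u, pure), qconj(u))[1:]

one, i, j, k = np.eye(4)

print(qmul(i, j))                                 # equals k
print(qmul(j, i))                                 # equals -k (noncommutativity)
print(rotate([1, 0, 0], [0, 0, 1], np.pi / 2))    # x axis rotated about z
```

Storing the four real components in a plain array keeps the sketch dependency free; a dedicated quaternion type would be a natural alternative for larger code.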
Representing signals and images with quaternions
Many physical phenomena can be probed using (electronic) sensors. From a very broad viewpoint, sensors transform complex physical properties into electrical properties (such as output voltage or current) that can be processed by further electronics. For that reason, physical measurements always boil down to acquiring real values: intensity of light passing through a color filter or variations of amplitude along one direction in an accelerometer, for instance. Data recordings therefore correspond to arrays of real numbers, such as vectors (e.g., univariate signals) or matrices (e.g., gray-scale images). However, even if raw data are intrinsically real valued, one often takes advantage of other representations to facilitate modeling, analysis, or processing.

One of the most striking examples is, perhaps, the use of complex numbers in signal and image processing. Complex numbers arise naturally when transforming raw data by using the (complex) Fourier transform. Such a manipulation enables many insights that would otherwise be (almost) impossible. For instance, complex numbers define unambiguously the essential notions of magnitude and phase, which are pivotal to signal processing practice: spectral analysis, filter design, time-frequency analysis, array processing, and so on. They also provide a compact and elegant way to write pairs of signals, such as in-phase and quadrature components in communications or functional magnetic resonance imaging. These several convenient properties explain the popularity of complex-valued representations in signal and image processing.

Quaternions are no different in that respect. Just like complex numbers, they offer a novel representation space, which exhibits several unique properties, such as polar forms and natural handling of 3D geometry, which can be interesting to exploit in applications. More importantly, quaternions define a (skew) field. This means that except for the noncommutativity of the quaternion product, quaternions have the same desirable properties as the real and complex fields. This ensures that the mathematical foundations crucial to signal processing (the Fourier transform, vector spaces, linear algebra, and so on) can all be defined in a meaningful way. Moreover, the similarity among methodologies developed for quaternion-valued signal and image processing and their real counterparts tends to demonstrate that noncommutativity is not an issue in general, provided that it is handled in an adequate manner.

Since their introduction in the signal processing community more than three decades ago, the usage of quaternion-valued representations has focused on two complementary settings, namely, the encoding of 3D and 4D signals and the construction of interpretable algebraic embeddings of signals and images.
Encoding 3D and 4D vector signals
This first setting may arguably be seen as the most natural one. The main idea is to encode the components of 3D or 4D vector signals on the three (imaginary only) or four (real and imaginary) parts of a quaternion. This allows us to extend the standard arithmetic operations over real numbers (addition, subtraction, multiplication, and division) to 3D and 4D real vectors. In the case of 2D real vectors, this extension is naturally performed by complex numbers. This way, one can handle vector quantities by using algebraic operations in a way similar to what can be done with scalars. This can be very helpful, especially for the case when 3D or 4D vector data are acquired with respect to one or two diversities (time, space, wavelength, and so on).

As an illustrative example, consider the case of a color image defined by the triplet of real matrices $\{R, G, B\}$ encoding red, green, and blue color channels, respectively. This triplet can be conveniently represented as the pure quaternion matrix $Q = iR + jG + kB$. This algebraic representation follows directly from the identification of the imaginary units $i$, $j$, and $k$ with the canonical Cartesian basis of $\mathbb{R}^3$. It permits us to separate the internal multivariate nature of the color image (i.e., a 3D vector encoding colors at each pixel) and its external multidimensional nature (an array of $M \times N$ spatial pixels) in an elegant way. In comparison, the equivalent real domain representation of such data is often cumbersome and usually handled by stacking the three or four components in a single long vector or matrix. While this stacking procedure is mathematically sound, it may interfere with the intimate relationships between the internal components and the geometric properties of such vector data. On the contrary, the algebraic quaternion encoding of 3D and 4D vectors enables natural representations of multidiversity vector data as quaternion vectors and quaternion matrices. This also means that many fundamental signal processing operations for 3D and 4D vector data can be formulated in terms of quaternion linear algebra operations in a rather straightforward way.
Building on these advantages, quaternions have been effectively employed to encode vector measurements in seismology [4], wind and temperature forecasting [5], electromagnetics [6], telecommunications [7], and color image processing [8], [9], to name a few.
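The color image encoding $Q = iR + jG + kB$ described above can be sketched in a few lines. This is only an illustration under stated assumptions: the quaternion matrix is stored as a real array with a trailing axis of length four (scalar, $i$, $j$, $k$), the function name `rgb_to_quaternion` is hypothetical, and the image is random test data rather than a real picture.

```python
import numpy as np

def rgb_to_quaternion(rgb):
    """Map an (M, N, 3) RGB array to an (M, N, 4) array representing
    the pure quaternion matrix Q = iR + jG + kB (zero scalar part)."""
    rgb = np.asarray(rgb, dtype=float)
    scalar = np.zeros(rgb.shape[:-1] + (1,))
    return np.concatenate([scalar, rgb], axis=-1)

rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))          # synthetic 4 x 5 color image

Q = rgb_to_quaternion(image)
modulus = np.sqrt((Q ** 2).sum(axis=-1))   # per-pixel quaternion modulus

print(Q.shape)         # spatial dimensions kept, one quaternion per pixel
print(modulus.shape)
```

The spatial axes (external structure) and the quaternion axis (internal color vector) stay separate, which is the point made in the text; the modulus of each entry is simply the Euclidean norm of the pixel's color vector.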
Algebraic embeddings of signals and images
This second setting is a little bit more intricate. It relies on carefully designed transforms that map signals or images into the quaternion domain. These transforms define quaternion embeddings, which facilitate the analysis, understanding, and processing of the original data. They ship with highly interpretable parameters, making it possible to decipher geometric features of the original signal or image. So far, most of the research toward interpretable quaternion-valued embeddings has focused on two areas: the construction of quaternion transforms for analyzing local features in gray-scale images and the development of a geometric signal processing toolbox for bivariate signals. It is worth noting that, although apparently unrelated, both approaches consider generalizations of the analytic signal in higher dimensions by exploiting quaternion algebra; they also both extensively leverage quaternion polar forms for meaningful interpretations of the embeddings. These two areas are reviewed in detail in the following.
Quaternion transforms for gray-scale image analysis
The importance of the analytic signal for the understanding and modeling of the instantaneous amplitude and phase of real-valued signals has been recognized for a long time in signal processing. This motivated the study of its generalizations in higher dimensions, the most prominent one being the definition of a meaningful 2D “analytic signal” to analyze the local content of images (gray scale). The main difficulty in directly extending definitions from the 1D case is that higher dimensions (notably, 2D) lack a natural multidimensional Hilbert transform. To fill this gap, several approaches have been designed for the 2D case. The most salient ones exploit the higher degrees of freedom offered by quaternion algebra to formulate meaningful 2D counterparts of the 1D analytic signal.

A first approach, proposed by Bülow and Sommer [10], uses a carefully designed 2D quaternion Fourier transform (QFT) enjoying desirable symmetry properties for gray-scale images. This permits us to define a one-to-one mapping between a gray-scale image and a quaternion-valued image obtained by restricting its QFT to the first orthant. Further decomposing this quaternion-valued signal by using the Euler quaternion polar form (see Table 1) allows identification of a local amplitude and three phases, which are meaningful for texture analysis. This QFT-based approach was further explored in [11] with the design of a dual-tree quaternion wavelet transform for coherent multiscale analysis of gray-scale images.

Another line of work, perhaps the most popular one, revolves around the monogenic signal. It was first introduced by Felsberg and Sommer [12] as a generalization of the analytic signal to the 2D case. The monogenic signal is a quaternion-valued image built from the original gray-scale image and two Riesz transforms. The interpretation as a 2D “analytic signal” essentially comes from the intuition that “the Riesz transform is to the Hilbert transform what the gradient is to the derivative operator,” to quote [13].
Given a gray-scale image $f(\mathbf{r})$ with spatial coordinates $\mathbf{r} = (r_1, r_2)$, the Riesz transform $\mathcal{R}f = (\mathcal{R}_1 f, \mathcal{R}_2 f)$ is defined in the spatial domain as

$$\mathcal{R}_i f(\mathbf{r}) = \mathrm{p.v.}\ \frac{1}{\pi} \iint_{\mathbb{R}^2} \frac{r_i - r_i'}{\|\mathbf{r} - \mathbf{r}'\|_2^3}\, f(\mathbf{r}')\, d\mathbf{r}', \quad i = 1, 2 \tag{2}$$

where p.v. stands for Cauchy principal value. The Riesz transform is translation and scale invariant. It also exhibits nice compatibility with 2D rotations, a property known as steerability. The monogenic signal $\mathcal{M}f$ is constructed in the quaternion domain as $\mathcal{M}f = f + i\mathcal{R}_1 f + j\mathcal{R}_2 f$. Being quaternion valued, it can be uniquely decomposed using the quaternion polar form $q = |q| e^{\mu_q \varphi_q}$, where the axis $\mu_q$ is a pure unit quaternion (i.e., such that $\mu_q^2 = -1$) and $\varphi_q \in [0, \pi)$ is the phase. Applying this polar decomposition to the monogenic signal enables identification of local features of the image $f(\mathbf{r})$ in a straightforward way. It reads

$$\mathcal{M}f(\mathbf{r}) = A(\mathbf{r}) \exp\big(\mu_{\theta}(\mathbf{r})\, \varphi(\mathbf{r})\big) \tag{3}$$
where $A(\mathbf{r}) := |\mathcal{M}f(\mathbf{r})|$ defines the local amplitude, $\varphi(\mathbf{r})$ is the local phase, and the axis $\mu_{\theta}(\mathbf{r}) = i \cos \theta(\mathbf{r}) + j \sin \theta(\mathbf{r})$ defines a local orientation $\theta(\mathbf{r}) \in [-\pi, \pi)$. Note that the axis $\mu_{\theta}(\mathbf{r})$ has no $k$ component, as a result of the construction of the monogenic signal $\mathcal{M}f(\mathbf{r})$ along the imaginary axes $i$ and $j$.

As a first example, consider a plane wave $f(\mathbf{r}) = A_0 \cos(\boldsymbol{\kappa} \cdot \mathbf{r})$, where $\boldsymbol{\kappa} = (\kappa_1, \kappa_2) \in \mathbb{R}^2$ is the wavenumber vector. Direct computations of the Riesz transform yield the monogenic signal $\mathcal{M}f(\mathbf{r}) = A_0 \exp[(\boldsymbol{\kappa} \cdot \mathbf{r})(i \cos \theta_0 + j \sin \theta_0)]$, with $\theta_0 = \arg(\kappa_1 + i\kappa_2)$. Hence, the local amplitude is constant, $A(\mathbf{r}) = A_0$; the local phase $\varphi(\mathbf{r}) = \boldsymbol{\kappa} \cdot \mathbf{r}$ is directly that of the cosine wave; and the local orientation is constant, $\theta(\mathbf{r}) = \theta_0$, corresponding to that of the wavenumber vector $\boldsymbol{\kappa}$ in the 2D plane.

Figure 1 depicts a more sophisticated example corresponding to a 2D amplitude modulation (AM)-frequency modulation (FM) mode. The monogenic signal permits direct identification of the Gaussian AM kernel $A(\mathbf{r})$ and local orientation $\theta(\mathbf{r})$. The local phase $\varphi(\mathbf{r})$ allows us to detect lines ($\varphi(\mathbf{r}) = 0 \bmod \pi$) simultaneously with contours ($\varphi(\mathbf{r}) = \pi/2$).

FIGURE 1. Monogenic signal analysis of an amplitude modulation-frequency modulation mode. The quaternion polar form enables identification of the local amplitude (Gaussian kernel envelope); local orientation (shown as direction for visualization purposes), which gives the orientation of the tangent vector of contour lines; and local phase, which encodes image lines (zero or $\pi$ values) and contours ($\pi/2$ values). The (a) original image, (b) local amplitude, (c) local orientation (mod $\pi$), and (d) local phase.

The monogenic signal provides key insights into the geometry of 2D gray-scale images. However, it suffers from the same limitations as the standard 1D analytic signal approach. It exhibits poor performance in noisy settings and fails to capture meaningful local features when considering multicomponent 2D signals, such as the superposition of 2D AM–FM modes. Therefore, an important line of research has focused on extending the monogenic signal approach toward multiscale or multiresolution analysis of gray-scale images. The general idea is to devise multiple filter banks built from monogenic wavelet functions at different scales. Then, at each scale or resolution, one can identify corresponding local features by computing the quaternion polar form of coefficients. For completeness, it is worth noting that not all works subsequent to the original seminal paper [12] use an explicit quaternion formulation of the monogenic signal. However, they largely make use of local angle and axis features, which are naturally connected to the quaternion polar form (3) of the monogenic signal. Therefore, these approaches can be labeled as quaternion based in a broad sense.

Hereafter, we mention some of these extensions. One is the monogenic continuous wavelet transform [14], which can be seen as a generalization of the 1D analytic wavelet transform to the case of gray-scale images. In the discrete case, a minimally redundant monogenic multiresolution analysis was proposed in [13], using so-called Riesz–Laplace wavelets. A generalization of the curvelet transform to the monogenic case, called the monogenic curvelet transform, was proposed in [15]. Other proposed approaches include transposing ideas from mode reconstructions in time-frequency analysis to the case of the monogenic signal, leading to the monogenic synchrosqueezing transform [16], or extending monogenic wavelet decompositions to the case of color images [17]. Monogenic signal-based approaches have found many applications (e.g., texture segmentation, target recognition, and boundary detection) in various domains, such as medical imaging, synthetic aperture radar imaging, and geophysics.
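The plane-wave example of the monogenic signal can be reproduced numerically. The sketch below is an assumption-laden illustration, not the authors' implementation: it computes the Riesz transform in the Fourier domain with the unit-modulus multiplier $-\mathrm{i}\,\omega_i / \|\boldsymbol{\omega}\|$ (a common convention under which the plane-wave result holds; the constant in front of the spatial-domain integral is not checked here), uses a periodic grid with integer wavenumbers so that the FFT is exact, and the name `riesz_transform` is hypothetical.

```python
import numpy as np

def riesz_transform(f):
    """Riesz transform pair (R1 f, R2 f) of a 2D array, computed with the
    Fourier multiplier -1j * w_i / |w| (assumed convention)."""
    n1, n2 = f.shape
    w1 = np.fft.fftfreq(n1)[:, None]
    w2 = np.fft.fftfreq(n2)[None, :]
    norm = np.hypot(w1, w2)
    norm[0, 0] = 1.0                 # avoid 0/0; the multiplier is zero at DC
    F = np.fft.fft2(f)
    R1 = np.fft.ifft2(-1j * w1 / norm * F).real
    R2 = np.fft.ifft2(-1j * w2 / norm * F).real
    return R1, R2

# Plane wave A0 * cos(kappa . r) sampled on a periodic grid over [0, 2*pi)^2.
n, A0 = 64, 2.0
kappa = np.array([3.0, 4.0])
grid = 2 * np.pi * np.arange(n) / n
r1, r2 = np.meshgrid(grid, grid, indexing="ij")
phase = kappa[0] * r1 + kappa[1] * r2
f = A0 * np.cos(phase)

# Monogenic signal Mf = f + i R1 f + j R2 f and its polar-form features.
R1, R2 = riesz_transform(f)
amplitude = np.sqrt(f ** 2 + R1 ** 2 + R2 ** 2)      # local amplitude A(r)
local_phase = np.arctan2(np.hypot(R1, R2), f)        # local phase in [0, pi]
orientation = np.arctan2(R2, R1)                     # local orientation
theta0 = np.arctan2(kappa[1], kappa[0])              # arg(kappa1 + i kappa2)

print(amplitude.min(), amplitude.max())              # both close to A0
```

Because the local phase is restricted to $[0, \pi]$, the recovered orientation equals $\theta_0$ where $\sin(\boldsymbol{\kappa} \cdot \mathbf{r}) > 0$ and $\theta_0 + \pi$ elsewhere, which is why orientation maps are usually displayed modulo $\pi$.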
QFT for bivariate signal processing
Bivariate signals appear in a broad range of applications where the joint analysis of two real-valued time series is required: polarized waveforms in seismology and optics, eastward and northward current velocities in oceanography, or even gravitational waves (GWs) emitted by coalescing compact binaries. In such applications, it is crucial to provide clear and straightforward interpretations of the joint geometric and dynamic properties of the two components $x_1(t)$ and $x_2(t)$ that define the bivariate signal. Formally, a bivariate signal can be represented in two equivalent ways: a 2D time-evolving vector $x(t) = [x_1(t), x_2(t)] \in \mathbb{R}^2$ or a complex-valued signal $x(t) = x_1(t) + i x_2(t) \in \mathbb{C}$ encoding the two components on its real and imaginary parts. While the vector representation is generic (meaning that it is not restricted to the bivariate case), it also hinders a natural understanding of the geometric properties of bivariate signals. On the other hand, the complex representation permits the definition of a meaningful quaternion framework for bivariate signals, relying on 1) a dedicated QFT and 2) the extensive use of quaternion calculus (such as polar forms) to extract relevant physical and geometric information.

The key intuition for a quaternion spectral representation of bivariate signals is rather simple. For real-valued (that is, univariate) signals, the use of the standard complex Fourier transform enables a complex-valued spectral representation. This complex embedding of univariate signals is at the heart of definitions of amplitude and phase, which are crucial to many tasks of signal processing, such as spectral analysis, filtering, or time-frequency analysis. Now, if one represents bivariate signals as complex-valued signals, a quaternion embedding can be constructed in a similar way. First, observe that $x(t) = x_1(t) + i x_2(t) \in \mathbb{C}_i \subset \mathbb{H}$: it is a special case of a quaternion-valued signal. However, contrary to the complex Fourier transform, the QFT has no unique (or canonical) definition. The freedom of definition comes from the position of the exponential, which can appear either left or right of the signal $x(t)$, and from the choice of the axis $\mu$ (a pure unit quaternion such that $\mu^2 = -1$) in the exponential. For instance, by choosing $\mu = i$, with $x(t) \in \mathbb{C}_i$, one recovers the standard complex Fourier transform. For bivariate signals, the right-sided QFT definition with $\mu = j$ is usually adopted [18]:

$$X(f) = \int_{\mathbb{R}} x(t)\, e^{-j 2\pi f t}\, dt, \quad x(t) \in \mathbb{C}_i. \tag{4}$$
The definition (4) exhibits every desirable property of Fourier transforms: it is well defined for typical bivariate signals, it preserves energy and inner products (the Parseval-Plancherel theorem), and it can be computed efficiently with two fast Fourier transforms by observing that $X(f) = X_1(f) + i X_2(f)$, where $X_1(f)$, $X_2(f)$ are standard ($\mathbb{C}_j$-valued) complex Fourier transforms of $x_1(t)$ and $x_2(t)$, respectively. More importantly, for bivariate signals viewed as $\mathbb{C}_i$-valued signals, it exhibits a Hermitian-like symmetry $X(-f) = -i X(f) i$, meaning that only the positive frequency spectrum carries relevant information. This makes it possible to define the quaternion embedding of a bivariate signal by canceling out the negative frequency spectrum. This bivariate analog of the well-known analytic signal of real-valued univariate signals is defined as

$$x_+(t) = \int_{\mathbb{R}_+} X(f)\, e^{j 2\pi f t}\, df. \tag{5}$$
The signal x + (t) is quaternion valued. Therefore, at each time instant, it can be decomposed thanks to the Euler polar form of a quaternion q = ae ii e -k| e jz, which identifies a magnitude a := q and three phases corresponding to successive rotations around axes i, j, and k. The Euler polar form plays the same role as the standard polar form for the usual analytic signal. It establishes a one-to-one mapping between the original bivariate signal x(t) and a canonical quadruplet of instantaneous parameters [a (t), i (t), | (t), z (t)], obtained by decomposing the quaternion embedding x + (t) as x + (t) = a (t) e ii (t) e -k| (t) e jz (t) .(6)
Under a classical narrowband assumption on $x(t)$, it is now possible to attach a very insightful interpretation to the canonical parameters $[a(t), \theta(t), \chi(t), \varphi(t)]$. Figure 2(a) displays the instantaneous ellipse traced out by $x(t)$ in the $(x_1, x_2)$ plane. The ellipse is characterized by its size $a(t)$, orientation $\theta(t)$, and shape $\chi(t)$; the last canonical parameter $\varphi(t)$ corresponds to the dynamical phase, i.e., the instantaneous position of $x(t)$ within the ellipse. This shows that the instantaneous parameters have a natural geometric interpretation, which also corresponds to the physical notion of polarization in optics.
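The geometric reading of the Euler polar form can be illustrated with a short numerical check. This is a sketch under stated assumptions: quaternions are stored as [1, i, j, k] arrays, the helper names are hypothetical, and the ellipse is read off from the 1 and i components of the quaternion (its $\mathbb{C}_i$ part). For fixed $a$, $\theta$, and $\chi$, sweeping the dynamical phase traces an ellipse with semi-axes $a\cos\chi$ and $a|\sin\chi|$, rotated by $\theta$.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [1, i, j, k] arrays."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def qexp(axis, angle):
    """exp(axis * angle) for axis index 1 (i), 2 (j), or 3 (k)."""
    out = np.zeros(4)
    out[0], out[axis] = np.cos(angle), np.sin(angle)
    return out

def euler_polar(a, theta, chi, phi):
    """q = a exp(i theta) exp(-k chi) exp(j phi) as a [1, i, j, k] array."""
    return a * qmul(qmul(qexp(1, theta), qexp(3, -chi)), qexp(2, phi))

a, theta, chi = 2.0, np.pi / 6, np.pi / 8   # illustrative values
phases = np.linspace(0.0, 2.0 * np.pi, 200)
quats = np.array([euler_polar(a, theta, chi, phi) for phi in phases])

moduli = np.linalg.norm(quats, axis=1)       # should all equal a
trace = quats[:, 0] + 1j * quats[:, 1]       # C_i part: components 1 and i
ellipse = a * np.exp(1j * theta) * (
    np.cos(chi) * np.cos(phases) + 1j * np.sin(chi) * np.sin(phases)
)
```

The arrays `trace` and `ellipse` coincide, so the size, orientation, and shape parameters act on the traced curve exactly as the polarization-ellipse picture suggests.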
[Figure 2: (a) ellipse traced out by x(t) = x1(t) + ix2(t) in the (x1, x2) plane, annotated with a, θ, χ, and φ; (b) sphere with axes S1/S0, S2/S0, and S3/S0, annotated with the angles 2θ and 2χ.]
... M. This problem appears naturally in color imaging, where $y$ is a color patch encoded as a pure quaternion vector, $D$ is an overcomplete collection of color atoms, and $q$ is a sparse quaternion vector. Just like in the real and complex cases, this problem can be solved using greedy algorithms based on the $\ell_0$ penalty, such as quaternion orthogonal matching pursuit [8], [34]. On the other hand, following standard practice, one can relax the nonconvex $\ell_0$ penalty into a convex $\ell_1$-norm regularization, leading to the quaternion least absolute shrinkage and selection operator (LASSO) problem [35], [36] (or equivalently, quaternion basis pursuit denoising):
$$\arg\min_{q \in \mathbb{H}^M} \|y - Dq\|_2^2 + \lambda \|q\|_1, \qquad \lambda \geq 0. \tag{16}$$
In (16), the quaternion $\ell_1$ norm of a vector $q \in \mathbb{H}^M$ is defined as the sum of the moduli of its entries, $\|q\|_1 := \sum_{m=1}^{M} |q_m|$. Interestingly, since the quaternion $\ell_1$ norm collects a sum of moduli, it can be interpreted as a mixed $\ell_{2,1}$ norm on the real matrix $A$ obtained by concatenating the four real-valued components of $q$ such that $A = [q_a\ q_b\ q_c\ q_d] \in \mathbb{R}^{M \times 4}$; i.e., $\|q\|_1 = \|A\|_{2,1}$. This means that the Q-LASSO (16) can be interpreted in the real domain as a group LASSO with $M$ groups of size four. The problem (16) defines a convex optimization problem in quaternion variables. In practice, quaternion convex optimization problems can be solved by translating usual real-domain algorithms to the quaternion case by leveraging generalized HR calculus. Still, this generalization is not trivial and requires special care [37]. In particular, the general form of equality constraints in quaternion convex optimization problems is widely affine; i.e., it reads $A_1 q + A_2 q^i + A_3 q^j + A_4 q^k = b$, where matrices and vectors are quaternion valued with appropriate sizes. In comparison, in real-domain convex optimization, only affine equality constraints of the form $Ax = b$, where vectors are all real valued, are considered. To solve the (unconstrained) problem (16), one can adapt the celebrated iterative shrinkage-thresholding algorithm (ISTA) to handle quaternion variables. Generalized HR calculus makes it possible to derive the quaternion ISTA iterations in an intuitive way. Letting $f(q) = \|y - Dq\|_2^2$, the iterations read
$$q^{(k+1)} = T_{\lambda \eta_k}\left( q^{(k)} - \eta_k \nabla_{\bar{q}} f(q^{(k)}) \right), \tag{17}$$
where $\eta_k > 0$ is the step size at iteration $k$ and $T_\beta$ is the soft thresholding operator (i.e., the proximal operator associated with the quaternion $\ell_1$ norm) given entry-wise by [38]: $T_\beta(q) = \max(0, 1 - \beta/|q|)\, q$. Note that in this case, the gradient can be directly computed, thanks to Table 2, as $\nabla_{\bar{q}} f(q) = (1/2)\, D^H (Dq - y)$. Of course, more sophisticated optimization problems can be formulated and solved directly in the quaternion domain. For instance, in the case of 3D data sparse coding (such as color images) that uses pure quaternions, it might be relevant to impose that the solution $q^\star$ satisfies $\mathrm{Re}(Dq^\star) = 0$. This constraint is widely linear, meaning that adding it to the problem (16) preserves its convex nature. Solving this type of constrained quaternion optimization problem can be carried out within the same general framework, for instance, using the quaternion alternating direction method of multipliers, as explained in [37].
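The entry-wise soft thresholding operator and the real-domain reading of the quaternion $\ell_1$ norm can be sketched in a few lines. This is a minimal illustration, assuming a quaternion vector is stored as a real array of shape (M, 4) with one row per entry; the function names are hypothetical.

```python
import numpy as np

def quaternion_l1_norm(A):
    """Quaternion l1 norm of a vector stored as a real (M, 4) array.

    It is the sum of the row moduli, i.e., the mixed l_{2,1} norm of A.
    """
    return np.sum(np.linalg.norm(A, axis=1))

def soft_threshold(A, beta):
    """Entry-wise T_beta(q) = max(0, 1 - beta/|q|) q on an (M, 4) array."""
    mod = np.linalg.norm(A, axis=1, keepdims=True)
    safe = np.where(mod > 0, mod, 1.0)          # avoid 0/0 for zero entries
    scale = np.maximum(0.0, 1.0 - beta / safe)
    return scale * A

A = np.array([
    [3.0, 0.0, 4.0, 0.0],   # modulus 5: shrunk toward zero, direction kept
    [0.1, 0.1, 0.1, 0.1],   # modulus 0.2: set to zero for beta = 1
    [0.0, 0.0, 0.0, 0.0],   # already zero
])
shrunk = soft_threshold(A, beta=1.0)
norm_before = quaternion_l1_norm(A)
```

Each quaternion entry is shrunk as a group of four real numbers, which is the group-LASSO behavior described above: an entry is either zeroed entirely or scaled without changing its direction.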
Statistics for quaternion random variables
Statistics for quaternion random variables and vectors has been developed to extend classical signal processing algorithms relying on probabilistic models. As 4D variables, quaternions can be understood as either 4D real vector-valued random variables, i.e., variables in $\mathbb{R}^4$, or as 2D complex random vectors, i.e., variables in $\mathbb{C}^2$. Just like for complex-valued random variables and vectors, detecting and taking into account symmetries in the probability density function (pdf) of quaternion random variables is essential in devising powerful signal processing tools. The notion of properness (also known as second-order circularity) captures such rotation invariance of the pdf. It was considered in many scenarios and exploited in several algorithms either as an extra parameter in the signal model or as a signature of the absence/presence of a targeted signal hidden in a noisy environment. A substantial body of literature is available for the complex case (see, e.g., [39] and the references therein). In the quaternion case, the original study and definition of properness trace back to Vakhania's work [40] before being considered by the signal processing community [41], [42], [43], [44]. The major difference between the complex and quaternion cases is the existence of different levels of properness over $\mathbb{H}$, while only one level can be identified over $\mathbb{C}$. The three levels of properness of a quaternion-valued random variable are denoted $\mathbb{R}$ properness, $(1, \mu)$ properness (where $\mu$ is a pure unit quaternion, also denoted $\mathbb{C}_\mu$ properness), and $\mathbb{H}$ properness [42], [44]. (The notion can be directly extended to vectors, but we illustrate here the concept on scalar-valued variables for clarity.) These levels correspond to different invariance properties of the random variable distribution. In the Gaussian case, properness means invariance properties of the 4 × 4 real (augmented) covariance matrix $C_R = E\{q_R q \ldots$
$$h_r = \underbrace{\alpha_r a_r(\theta_1)}_{\text{direct path}} + \underbrace{G_r \Phi\, b(\theta_2)}_{\text{RIS path}} \tag{3}$$
where $G_t = \beta_t a_t(\omega) b^H(\nu) \in \mathbb{C}^{L_T \times N}$ and $G_r = \beta_r a_r(\omega) b^H(\nu) \in \mathbb{C}^{L_S \times N}$ are the channels between the RIS and Tx and between the RIS and Rx, respectively; $\omega$ is the direction of the path between the Tx/Rx and RIS with respect to the Tx/Rx; and $\nu$ is the direction of the path from the Tx/Rx at the RIS. Here, $\beta_t$ and $\beta_r$ are the overall attenuations of the RIS paths. Although $\alpha_t$ is inversely proportional to the squared distance between the Tx and the target, $\beta_t$ is inversely proportional to the product of the squared distance between the Tx and RIS and the RIS and the target. Hence, $\beta_t$ would be, in general, much smaller than $\alpha_t$. The received echo signal can thus be represented as
FIGURE 2. RIS-aided sensing configurations. (a) Monostatic. (b) Monostatic with co-located Txs/Rxs at the BS and one RIS assigned to handle only forward relaying. (c) Bistatic with RIS handling either forward or backward echo relaying. (d) Bistatic with one RIS assigned to handle only forward relaying. Legend: direct path (transmit signal), direct path (echo), RIS path (transmit signal), RIS path (echo).
$$y_s(t) = h_r h_t^H w\, s(t) + z_s(t) \tag{4}$$
where $z_s(t)$ represents the zero-mean additive white Gaussian noise with variance $\sigma_s^2$. Note that the overall received echo signal comes from two directions, i.e., from the target reflection angle $\theta_1$ and the RIS reflection angle $\omega$. There are four propagation paths in the sensing channel, namely, the BS-target-BS, BS-RIS-target-RIS-BS, BS-RIS-target-BS, and BS-target-RIS-BS channels. Compared to conventional downlink communication scenarios, the three newly added paths not only provide extra channel gains that can be used for sensing, but also offer an additional dimension to sense parameters that refer to the target. Specifically, by estimating the key parameters contained in the four paths, i.e., $\theta_1$ and $\theta_2$, one can localize the target. It is noted that the effect of the echoes from the RIS on the Rx array will be relatively small; hence, we hereinafter assume that their contribution is negligible by setting $\beta_r = 0$, leading to the monostatic sensing scenario in Figure 2(b).
Transmit beamforming
Target illumination power, defined as the power of the signal at the target due to the transmit sensing waveform $x(t)$, can be expressed as
$$\mathrm{power}(w, \Phi) \triangleq E\left[ |h_t^H x(t)|^2 \right] = h_t^H w w^H h_t.$$
For a given target direction, target illumination power depends not only on the precoder $w$ but also on the reflection pattern of the RIS. By optimally designing the precoder and the RIS reflection pattern, one can improve the effective target illumination power, which in turn improves the sensing performance of the system. Thus, we can maximize target illumination power with respect to $(w, \Phi)$ as
$$\underset{w,\, \Phi}{\text{maximize}} \ \ \mathrm{power}(w, \Phi) \quad \text{subject to} \quad \|w\|_2^2 \leq 1, \quad |\phi_i| = 1, \ i = 1, \ldots, N \tag{5}$$
where the constraint on $w$ is due to the transmit power budget, which we set to one without loss of generality. The unit modulus constraint on $\phi_i$ is based on the simplified phase-shifting model of RISs, as discussed in the "Signal Modeling" section. This optimization problem is formulated for a single target for simplicity and can be easily extended to multiple targets with the objective of maximizing the worst-case illumination power over all of them. As observed, this design problem is nonconvex in the variables $(w, \Phi)$, but it can be suboptimally solved using alternating optimization: by fixing one variable and solving with respect to the other and vice versa, and iterating this procedure until convergence. Without the RIS and with only the direct path to/from the target, the channels in (3) will not have the second terms. This, of course, implies that there will be no RIS phase profiles to be optimized. In this case, target illumination power can be trivially maximized by aligning $w$ to the channel as
$$w = \frac{a_t(\theta_1)}{\|a_t(\theta_1)\|_2} \tag{6}$$
which is the well-known matched filter beamformer. To understand when an RIS is actually useful for improving target illumination power and when it is not, we next discuss the beamforming gain offered by the RIS in cases with and without the direct path.

RIS beamforming gain with no direct path
Whenever there is no direct path, RISs can play an important role in illuminating the target when they are designed to focus the energy from the Tx to it. To do so, the sensing beamformer is aligned to the channel between the Tx and RIS, i.e., $w = a_t(\omega) / \|a_t(\omega)\|_2$. This design implies that all the Tx energy is beamformed toward the RIS. In this case, the useful signal reaching the target is
$$h_t^H x(t) \propto b^H(\theta_2)\, \bar{\Phi}\, b(\nu) \tag{7}$$
where $\bar{(\cdot)}$ denotes complex conjugation. Hence, the optimal choice of $\Phi$ to maximize target illumination power is to choose $\phi_i$ for $i = 1, \ldots, N$ as follows:
$$\bar{\phi}_i = \exp\left\{ -j\, \mathrm{angle}\left( [b^H(\theta_2) \odot b(\nu)]_i \right) \right\} \tag{8}$$
where the $\odot$ symbol is the Hadamard product. The total illumination power is then $E[|h_t^H x(t)|^2] = \sigma_\beta^2 L_T N^2$, where $\sigma_\beta^2 = E[|\beta_t|^2]$ is the average strength of the RIS path. Without the direct path, we can see that by using an RIS, a beamforming gain of $N^2$ can be achieved due to the RIS, but only an array gain of $L_T$ due to the Tx array.
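The $N^2$ beamforming gain obtained by aligning the RIS phases can be verified numerically. The sketch below makes assumptions that are not in the text: the RIS is modeled as a half-wavelength uniform linear array, the two angles are arbitrary illustrative values, and the helper name `steering` is hypothetical. It evaluates the sum behind $b^H(\theta_2)\,\bar{\Phi}\,b(\nu)$ term by term, with and without the phase alignment.

```python
import numpy as np

def steering(n, angle):
    """Half-wavelength ULA steering vector (an assumed array geometry)."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(angle))

N = 64
theta2 = np.deg2rad(20.0)    # RIS-to-target direction (illustrative)
nu = np.deg2rad(-35.0)       # direction of the Tx path at the RIS (illustrative)
b_target, b_tx = steering(N, theta2), steering(N, nu)

# Term i of b^H(theta2) conj(Phi) b(nu) is conj(b_i(theta2)) conj(phi_i) b_i(nu)
prod = np.conj(b_target) * b_tx
phi_conj = np.exp(-1j * np.angle(prod))   # conjugated RIS phases, aligned per term
phi = np.conj(phi_conj)                   # unit-modulus RIS coefficients

gain_aligned = np.abs(np.sum(prod * phi_conj)) ** 2   # coherent sum of N terms
gain_flat = np.abs(np.sum(prod)) ** 2                 # RIS with all phases set to zero
```

With aligned phases all $N$ terms add coherently, so `gain_aligned` equals $N^2$, whereas a flat phase profile generally leaves the terms out of phase and yields a much smaller value.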
RIS beamforming gain with the direct path
When there is no direct path or no RIS, it is clear that the precoder should be aligned to the channel between the Tx and RIS or between the Tx and target, respectively. In the presence of the direct path and RIS, the optimal precoder $w^\star$ obtained from (5) is a 1D subspace of the $L_T$-dimensional space, with an orthonormal basis $W$ so that $w^\star$ can be synthesized using $W$. Precoding using $W$ simplifies the computation of the worst-case RIS beamforming gain. Suppose we choose an orthogonal precoding matrix $W$ to transmit probing signals as $x(t) = W s(t)$ so that $W W^H = I$ and $E\{s(t) s(t)^H\} = I$. In other words, suppose we transmit isotropically and choose the RIS reflection pattern as in (8). Then, target illumination power simplifies to
$$E\left[ |h_t^H x(t)|^2 \right] = E\left[ h_t^H h_t \right] = \sigma_\alpha^2 L_T + \sigma_\beta^2 L_T N^2 \tag{9}$$
where $\sigma_\alpha^2 = E[|\alpha_t|^2]$ is the strength of the direct path, with $\alpha_t$ and $\beta_t$ being mutually uncorrelated. The RIS path is typically weaker than the direct one. For instance, let $\sigma_\beta^2 = \rho \sigma_\alpha^2$ with $\rho < 1$, i.e., the RIS path is weaker than the direct path by a factor of $\rho$. When choosing an RIS with $N > 1/\sqrt{\rho}$, the RIS path will be strengthened more than the direct one. This is due to the beamforming gain of $N^2$ offered by the RIS. Although difficult to quantify, the gain from the RIS deployment and the target illumination power will usually be higher when the Tx beamformer and RIS phase shifts are optimally chosen, i.e., by finding the optimum of the joint design problem in (5).
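As a worked illustration of this threshold, consider the ratio of the RIS term to the direct term under isotropic transmission. The values $\rho = 10^{-2}$ and $N = 32$ below are assumed purely for illustration and do not come from the article.

```latex
\frac{\sigma_\beta^2 L_T N^2}{\sigma_\alpha^2 L_T} = \rho N^2 > 1
\iff N > \frac{1}{\sqrt{\rho}},
\qquad
\rho = 10^{-2} \;\Rightarrow\; N > 10,
\qquad
N = 32 \;\Rightarrow\; \rho N^2 = 10.24 \ (\approx 10.1\ \text{dB}).
```

In this example, an RIS path that is 20 dB weaker than the direct path ends up contributing roughly 10 dB more illumination power once 32 elements are phase aligned.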
Target detection and parameter estimation
We focus on the Rx side of the RIS-aided sensing system in what follows. In particular, the signal at the target is reflected and received by the receive array. We can thus rewrite the received signal model as
$$y_s(t) = \eta\, a_r(\theta_1)\, h_t^H x(t) + z_s(t) \tag{10}$$
where $\eta$ is the target gain (it models the radar cross section as well as the path attenuation between the target and Rx) with variance $\sigma_\eta^2$. We assume that the Rx uses a filter $a_r(\theta_1)$ matched to the target angle $\theta_1$. To implement the filter, standard direction finding methods [43], e.g., subspace-based methods or beamforming, or codebook-based methods [38] can be used to find the target bearing angle $\theta_1$. In this ideal case, the signal-to-noise ratio (SNR) at the output of the matched filter is given by
$$\mathrm{SNR} = \frac{L_S \sigma_\eta^2}{\sigma_s^2}\, \mathrm{power}(w, \Phi)$$
i.e., it depends on target illumination power. Thus, solving (5) optimizes the received SNR as well. We can detect the presence or absence of a target at angle $\theta_1$ using the Neyman–Pearson detector with a generalized likelihood ratio test (GLRT). For RIS-aided sensing, the GLRT with respect to the target amplitude to determine the presence of a target (hypothesis $H_1$) and its absence (hypothesis $H_0$) is
$$\frac{|y_s^H a_r|^2}{\sigma_s^2} \ \underset{H_0}{\overset{H_1}{\gtrless}} \ \gamma$$
where $\gamma > 0$ is the detection threshold, which is set to obtain the constant false-alarm rate $P_f = e^{-\gamma}$. When $|\eta|^2$ is nonfluctuating, the detection probability is [20] $P_d = Q_1(\sqrt{2\,\mathrm{SNR}}, \sqrt{2\gamma})$ with $Q_1(\cdot, \cdot)$ being the Marcum Q-function. As the received echoes do not depend on the RIS, the optimal Rx filter (in terms of SNR or $P_d$) is the matched filter. The Cramér–Rao bound (CRB) gives us a lower bound on the variance of an unbiased estimator for the target-bearing angle. It also gives us a baseline on the performance of the direction estimator. Now suppose that the receive array is conjugate symmetric, i.e., $a_r(\theta)^H \dot{a}_r(\theta) = 0$ where $\dot{a}_r(\theta) = \partial a_r(\theta) / \partial \theta$. Then, the CRB (conditioned on the echo signal and the target gain) with respect to the angle $\theta$ is given by [43]
$\mathrm{CRB}(\theta_1) =$
LS 1 1 . 2 ; SNR + 2E (SNR) 2T < a r
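The false-alarm rate $P_f = e^{-\gamma}$ and the detection probability $P_d = Q_1(\sqrt{2\,\mathrm{SNR}}, \sqrt{2\gamma})$ of the detector discussed above can be evaluated with the standard library alone. The sketch below is an illustration under stated assumptions: it expresses the Marcum Q-function through the Poisson-mixture series of the noncentral chi-square tail (a standard identity, not taken from the article), the function name and the truncation length are hypothetical, and the SNR is given in linear scale.

```python
import math

def detection_probability(snr, gamma, n_terms=400):
    """P_d = Q_1(sqrt(2*snr), sqrt(2*gamma)) via a truncated Poisson mixture.

    Uses P_d = sum_n Pois(n; snr) * exp(-gamma) * sum_{m<=n} gamma^m / m!.
    Intended for moderate SNR values; exp(-snr) underflows for very large snr.
    """
    pois = math.exp(-snr)     # Poisson weight exp(-snr) snr^n / n! for n = 0
    term = math.exp(-gamma)   # exp(-gamma) gamma^m / m! for m = 0
    tail = term               # central chi-square tail with 2(n + 1) degrees of freedom
    pd = pois * tail
    for n in range(1, n_terms):
        pois *= snr / n
        term *= gamma / n
        tail += term
        pd += pois * tail
    return pd

p_fa = 1e-3                   # target false-alarm rate (illustrative)
gamma = -math.log(p_fa)       # threshold from P_f = exp(-gamma)
pd_curve = [detection_probability(snr, gamma) for snr in (0.0, 1.0, 10.0, 50.0)]
```

At zero SNR the series collapses to $e^{-\gamma}$, i.e., the false-alarm rate, and the detection probability grows monotonically with the SNR, which is why maximizing target illumination power in (5) directly benefits detection.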