The Book of Chatbots: From ELIZA to ChatGPT

Table of Contents
Preface
About the Author
Contents
Chapter 1: The Challenge of the Turing Test
1.1 Introductory Small Talk
1.2 On Chatbots and the Arrival of Artificial Intelligence
1.3 Alan Turing, OBE, FRS: Pioneer, Mathematician, Cryptographer
1.4 The Turing Machine
1.4.1 The Decision Problem
1.4.2 The Universal Turing Machine
1.4.3 The Turing Test
1.5 Can Computers Ever Think?
1.6 More on Gödel’s Theorems
1.6.1 Gödel Numbers: The Art of Encoding Data
1.7 More on Computers and Their Binary Diet
1.8 In Closing
References
Chapter 2: Developments in Artificial Intelligence and Linguistics
2.1 Natural Language Processing
2.1.1 Symbolic NLP
2.1.2 Statistical NLP
2.1.3 Neural NLP
2.2 Markov Models
2.2.1 Hidden Markov Models
2.3 Artificial Neural Networks
2.3.1 Attack of the Perceptrons
2.3.2 Training an ANN
2.4 Large Language Models (LLMs)
2.4.1 Far Out! Your Chatbot May Be Hallucinating
2.5 Weak vs Strong AI
2.6 On the Singularity
2.6.1 AI Safety
2.7 Our Digital Lives: Big Data
2.8 Insightful Refrigerators: Welcome to the Internet of Things
2.9 The Unnerving Case for Quantum Computing
2.9.1 On the Strangeness of Quantum Noise
2.9.2 Quantum Supremacy
2.10 The Rights of an AI Citizen
2.11 A Few Words on AI Dystopias
2.12 On a Technological Utopia
2.13 Switching to Linguistics
2.13.1 Phonetics and Phonology
2.13.2 Morphology
2.13.3 Syntax
2.13.4 Semantics
2.13.5 Pragmatics and Grice’s Maxims
2.13.6 Colorless Green Ideas: Exploring Linguistic Theories
2.13.7 Transformational-Generative Grammar (TGG)
2.13.8 Universal Grammar
2.13.9 Criticisms of Universal Grammar
2.14 How an AI Digests Language
2.14.1 Tokenisation
2.14.2 Lemmatisation
2.14.3 Stemming
2.14.4 Part-of-Speech Tagging (POST)
2.14.5 Syntactic Analysis
2.14.6 Semantic Analysis
2.14.7 Pragmatic and Sentimental Analysis
2.15 In Closing
References
Chapter 3: The Classic Era of Chatbots
3.1 ELIZA, an Ancient Artificial Therapist
3.1.1 ELIZA’s Legacy: Games and AI Literature
3.1.2 On the ELIZA Effect
3.1.3 ChatGPT on ELIZA
3.2 PARRY, 28, Paranoid
3.2.1 A Meeting of Virtual Minds
3.2.2 ChatGPT on PARRY
3.3 Jabberwacky: A Frivolous Virtual Companion
3.3.1 A Silly Yet Important Legacy
3.3.2 ChatGPT on Jabberwacky
3.4 Historical Chatbots: A Summary
3.5 In Closing
References
Chapter 4: The Current Era of Chatbots
4.1 A.L.I.C.E. (Artificial Linguistic Internet Computer Entity)
4.2 The Basics of Artificial Intelligence Markup Language (AIML)
4.2.1 Recursion in AIML
4.2.2 Randomization in AIML
4.2.3 Substitutions in AIML
4.3 A.L.I.C.E. Developments
4.4 Kuki, the Next Generation A.L.I.C.E.
4.5 SmarterChild
4.6 SimSimi: Chatbot Controversy
4.7 Braina
4.8 IBM Watson, Quiz Show Contestant
4.9 Emotional Support Chatbots
4.9.1 Wysa, the Proactive Virtual Therapist
4.9.2 Texting with Tess
4.9.3 Replika AI: Digital Intimacy
4.9.4 Woebot: Easing One’s COVID-Woes
4.10 ChatGPT
4.10.1 Bing! Poe! Your Free GPT-4
4.10.2 GPT Context Windows
4.10.3 GPT Parameters
4.10.4 More on ChatGPT
4.11 Open Assistant
4.12 Google’s LaMDA-Files
4.13 Bard: Google’s Response to ChatGPT
4.14 Before Bard, There Was BERT
4.15 Moving Out of the Anglosphere: Chatbots and Languages
4.15.1 The Google Bard
4.15.2 Open Assistant
4.15.3 ChatGPT
4.16 Bard vs ChatGPT
4.17 Microsoft 365 Copilot: GPT for Excel
4.18 Microsoft Cortana
4.19 Of Speech and Text: TTS and STT
4.20 Paging Dr. Sbaitso!
4.21 Digital Audio Primer
4.22 On AI Emotional Intelligence (EI)
4.23 Computer Vision (CV) in a Shell of a Nut
4.24 Siri
4.25 Hidden Markov Models vs Deep Neural Networks
4.26 Deep Neural Networks vs Artificial Neural Networks
4.27 Siri, a Third Party for Your Intimate Moments
4.28 Alexa! Play Leather Jackets by Nami Rha
4.28.1 Customizing Alexa with Skills
4.28.2 Alexa’s Cloud System: Amazon Web Services
4.29 Natural Language Processing (NLP) vs Natural Language Understanding (NLU)
4.30 Language Model Problems
4.31 Conversing with Chatbots: Best Practises
4.32 In Closing
References
Chapter 5: AI and Chatbots in Healthcare
5.1 On the Importance of Medical Chatbots
5.1.1 Common Tasks of Chatbots in Healthcare
5.1.2 On Telehealth and Chatbots
5.2 ADA Health
5.3 Healthily
5.4 Babylon Health
5.5 HealthTap
5.6 Symptomate
5.7 Doctors vs Apps
5.8 AI in Healthcare: Ethics and Challenges
5.8.1 Privacy and Data Security
5.8.2 Transparency
5.8.3 Bias and Fairness
5.8.4 Accountability and Liability
5.8.5 Informed Consent
5.8.6 Reliability and Safety
5.8.7 Prioritization of Care
5.8.8 End-of-Life Decisions
5.8.9 Job Displacement
5.9 I Want a Real Nurse! Overdependence on AI
5.10 Yet More Fun with Acronyms
5.10.1 Support Vector Machines (SVMs)
5.10.2 More on SVMs: Hyperplanes and Support Vectors
5.10.3 On Federated Learning (FL)
5.10.4 Applications of FL
5.11 In Closing
References
Chapter 6: Chatbots in eCommerce
6.1 Pre-chatbot Online Businesses
6.1.1 A Reasonably Brief History of the Dot-Com Bubble
6.1.2 Embracing the AI Bubble
6.1.3 Living and Learning
6.2 Chatbots in eCommerce
6.2.1 Designing an eCommerce-Chatbot
6.2.2 Decision-Tree Chatbots in eCommerce
6.3 Chatbots for Business: Some Solid Solutions
6.3.1 ChatBot by LiveChat Software
6.3.2 Salesforce by Haptik
6.3.3 Netomi
6.3.4 Ada
6.3.5 Pandorabots: The Joys of AIML
6.3.6 Rasa
6.4 Terms of the Testing Trade
6.4.1 Tools for Testing Your Bots
6.4.2 Chatbot Testing Techniques
6.5 On eCommerce Chatbot User Interfaces
6.6 What’s Next for eCommerce
6.7 In Closing
References
Chapter 7: Chatbots as Villains: The Antisocial Uses of AI
7.1 The Dangers of Disinformation
7.2 Malicious Chatbots as Fake Friends
7.3 Safer Surfing
7.4 Botnets
7.5 Email Phishing
7.6 Phishing with Chatbots
7.7 “Mom, I need money”: AI Voice Scamming
7.8 Swapping Faces: The Wonders of Deepfakes
7.8.1 The Legality of Deepfakes
7.8.2 Pioneering Deepfake Analysis with FaceForensics
7.8.3 Constructive Deepfaking
7.9 Virulent Coding with Chatbots
7.10 Holding Devices Ransom
7.11 AI to the Rescue
7.12 Chatbots and Aigiarism in Academia
7.12.1 Addressing Botted and Plagiarized Essays
7.12.2 Critique for Plagiarism Detection Software
7.12.3 Anti-aigiarism Software and Context-Awareness
7.13 AI: The Great Energy Hog
7.14 Academia and Chatbots: A Peaceful Coexistence
7.15 Chatbots in the Academic Press
7.16 On Privacy
7.17 Extremism and Chatbots
7.18 Securing Chatbots as Benevolent Assistants
7.19 Securing Our Future
7.20 In Closing
References
Chapter 8: Towards an Artificial General Intelligence
8.1 Artificial General Intelligence: Strong AI
8.2 On AGI Research and Development
8.2.1 Baum’s Findings
8.3 Leading Experts on AGI
8.3.1 Identified Problems and Potential Solutions
8.4 The AGI Containment Problem
8.4.1 On AGI Containers
8.4.2 Containment: Traditional Solutions
8.5 In Closing
8.6 On Immortality and Coworker Geniality
References

The Book of Chatbots: From ELIZA to ChatGPT

Robert Ciesla
University of Helsinki, Helsinki, Finland

ISBN 978-3-031-51003-8    ISBN 978-3-031-51004-5 (eBook)
https://doi.org/10.1007/978-3-031-51004-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.

Preface

This book is for anyone interested in chatbots and artificial intelligence (AI). Its purpose is to serve any layperson as an introduction to some rather fantastic topics. AI truly is an unstoppable, world-changing set of technologies, and my humble wish as an author is for everyone to start paying attention to it—right now.

We'll begin our journey into the world of chatbots with some pertinent historical developments in the field. For one, we'll review some of the monumental work of Alan Turing OBE FRS (1912–1954), who put many of the foundations of today's digital technology in place. Chapter 2 will have us exploring the most important concepts behind AI. We'll also take a solid gander at things like the internet of things (IoT), the now-ubiquitous network of smart devices such as phones, cameras, and even some refrigerators. In addition, the second chapter explains the basics of linguistics and how an AI processes human languages.

Chapter 3 is all about a trio of historical chatbots: ELIZA, PARRY, and Jabberwacky. While primitive by today's standards, these three programs represent important stepping stones in chatbot development. In Chap. 4 we'll get acquainted with modern chatbots, including OpenAI's mighty ChatGPT and Google's Bard. A number of chatbots for productivity-related tasks have also been released in recent years, so we'll explore how these tools can help us in our daily lives.

The focus of Chap. 5 is medical chatbots. We'll go through the numerous ways they assist in medicine, ranging from appointment scheduling to emotional support. Chapter 6 takes us to the busy world of chatbots in eCommerce, where they are fast becoming a must-have for businesses of any size. We'll go through some basics of how AI is being leveraged in marketing and customer service.

While AI is generally a force for productive purposes, it has a destructive side; this is the topic of Chap. 7. We'll go through the seedier uses of chatbots and AI, examining things like AI-powered plagiarism (i.e. aigiarism), malware, and deepfakes (i.e. realistic faked videos).

Chapter 8 is all about Artificial General Intelligence (AGI), also known as strong AI. This refers to a so-far theoretical class of autonomously functioning artificial entities capable of passing as human beings. AGI represents the apex of AI development, but it also introduces many ethical challenges. In this final chapter we'll review some of the most important academic papers in the field.

The chatbots have arrived. Eventually, the technology behind them will be implemented in new, awe-inspiring ways and on a much bigger scale. This book will hopefully raise awareness of both the opportunities and issues inherent to AI.

Helsinki, Finland
10 October 2023

Robert Ciesla

About the Author

Robert Ciesla is an author, composer, and programmer from Helsinki, Finland. He has a BA in Journalism from the Haaga-Helia University of Applied Sciences, an MA in Culture Studies from the University of Helsinki, and an Advanced Diploma in Computing from the University of Oxford. Robert likes rainy days and long hikes in nearly any terrain. He has previously written the following five books for Apress/Springer Nature:

• Sound and Music for Games (2022)
• Programming Basics: Getting Started with Java, C#, and Python (2021)
• Encryption for Organizations and Individuals (2020)
• Game Development with Ren’Py (2019)
• Mostly Codeless Game Development (2017)

Robert’s favorite chatbot is Dr. Sbaitso. His personal website is at robertciesla.com


Chapter 1

The Challenge of the Turing Test

1.1 Introductory Small Talk

E: Hello, I am Eliza.
R: Good day to you!
E: We were discussing you, not me.
R: Indeed. I need help.
E: Can you elaborate on that?
R: I’m writing a book on chatbots.
E: How long have you been writing a book on chatbots?
R: For about two weeks now.
E: Say, do you have any psychological problems?

(An actual conversation with ELIZA, an early chatbot from the 1960s.)

As you enter the world of chatbots, it’s helpful to understand some of the associated basic terminology and history. This chapter will introduce some fundamental concepts and early developments in the field of artificial intelligence. We will mostly focus on the work of the accomplished mathematicians Alan Turing and Kurt Gödel, whose discoveries remain very important in the field.

1.2 On Chatbots and the Arrival of Artificial Intelligence

Some of the earliest pondering on artificial intelligence took place in ancient Greece; Talos was a mythical boulder-lobbing artificial guardian for the inhabitants of Crete. More practical approaches in computing took a long time to emerge, speeding up in the nineteenth century. In 1801, a French inventor by the name of Joseph Jacquard created a semi-automated loom which used punched cards to “program” its designs. In 1821 the English mathematician Charles Babbage unveiled his plans for a steam-powered calculator, the Difference Engine. He later designed the Analytical Engine, a general-purpose computer. At this point conversational software was still out of our reach.

During the first boom of developments in artificial intelligence in the 1940s, the world did not have the luxury of computing power we take for granted now. Much of the work remained abstract, yet many important foundations were put in place. Concepts like the Turing Test (discussed later in this chapter) emerged in an era when black-and-white cathode-ray tube televisions represented the apex of home electronics. The microchip hadn’t been invented yet. The Internet, with its machine learning applications, was barely touched upon in the science fiction of the time.

In 2023 and beyond, your smart devices represent a ubiquitous form of computing which not only tracks your whereabouts (sometimes with your consent) but increasingly leverages advanced algorithmic AI. You may have conversed with Apple’s Siri or Amazon’s Alexa. Recommendation algorithms used by YouTube and Netflix constantly learn about your viewing habits—and so on.

Note: An algorithm is basically a set of rules that solves a problem.

The whole world is terminally online and there are very few compelling reasons to drop the habit. When it comes to the Internet, the pros far outweigh the cons. Current-era artificial intelligence is a collective process in which everyone is a contributor. Data is being stored and analyzed constantly, even in the most trivial of interactions, such as browsing the net. Like any powerful new technology, AI has both positive and negative ramifications.

Unlike early chatbots, solutions like ChatGPT are no longer confined to the realm of entertainment and scientific curiosity. Modern AI and chatbots are gaining leverage in the worlds of technology, the arts, and academia. As all those online customer service bots demonstrate, chatbots do not need actual sentience to serve a useful purpose. The world would change in an instant should actual machine sentience arrive, perhaps beyond our wildest dreams. In March 2023, Elon Musk (CEO of SpaceX and Tesla), Apple co-founder Steve Wozniak, and many others signed a petition to halt some AI research for at least 6 months due to “profound risks to humanity and society” (Future of Life Institute, 2023). What probably scares us most isn’t artificial intelligence in itself and its potential nefarious uses—deep down we may simply fear its autonomy.

1.3 Alan Turing, OBE, FRS: Pioneer, Mathematician, Cryptographer

Any discussion on artificial intelligence and its applications would be incomplete without exploring the work of mathematician and computer scientist Alan Turing (1912–1954). He pioneered several seminal concepts in the field and had a profound effect on the development of modern computers. Having graduated from Princeton University in 1938 with a doctorate in mathematics, Turing went on to provide invaluable cryptographic assistance to the Allied forces during World War 2. His codebreaking device, the Bombe, managed to decipher encrypted messages broadcast by Nazi Germany’s Enigma machine. Enigma’s follow-up, the Lorenz cipher, was in turn defeated by Colossus, a codebreaker which utilized many of Turing’s principles. Some experts, like Andrew Hodges and Hugh Sebag-Montefiore, have argued these developments cut the duration of World War 2 by up to 4 years.

Turing and his colleagues worked at Bletchley Park, a mansion and estate in Bletchley, Milton Keynes, in the United Kingdom. It housed the primary centre of the Allied codebreaking effort during World War 2. After the war Turing worked on the Automatic Computing Engine (ACE), the first complete design for a stored-program computer. The ACE was released, against Turing’s wishes, in a simplified and smaller form in 1950 as the Pilot ACE. It later evolved into MOSAIC (Ministry of Supply Automatic Integrator and Computer), a device dedicated to calculating aircraft trajectories from radar data. Hardly a compact computer, MOSAIC occupied several rooms. It continued to operate into the 1960s.

For his efforts during the war Turing was appointed an Officer of the Order of the British Empire (OBE). He was also elected a Fellow of the Royal Society (FRS) in 1951 for his outstanding achievements in science. Alan Mathison Turing died of cyanide poisoning in 1954.

1.4 The Turing Machine

First made public in 1936, Turing Machines are abstract devices which help us conceptualize what computers can, in principle, process. A basic Turing Machine operates with the following three components:

1. An infinitely long tape divided into slots. Each slot contains either a 1, a 0, or remains empty (see Fig. 1.1). The modern equivalent of this tape is a computer’s RAM, or Random Access Memory (or its storage, i.e. a hard drive). All modern computing is still executed, on a fundamental level, in binary code (i.e. zeroes and ones).
2. A tape-head. This component can read the data in the aforementioned slots. It can move left or right and fill them with a 1 or a 0. The head may also erase data.
3. Instructions/state register. The register contains the instructions for the head, i.e. it stores the sequence of events the head is going to execute.

Fig. 1.1  The basic layout of a Turing Machine

A Turing Machine (see Fig. 1.1) will execute the commands in the state register and halt, giving us the result of a specific problem. Alternatively, it can loop infinitely.
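To make these components concrete, here is a minimal Turing Machine simulator in Python. It is an illustrative sketch rather than anything from Turing's papers: a finite list stands in for the infinite tape, and the toy state register simply flips every bit it reads until it hits a blank slot.

```python
# A toy Turing Machine. The rules dictionary plays the role of the state
# register: it maps (state, symbol read) to (symbol to write, head move,
# next state).
def run_turing_machine(tape):
    rules = {
        ("flip", "0"): ("1", +1, "flip"),  # read 0: write 1, move right
        ("flip", "1"): ("0", +1, "flip"),  # read 1: write 0, move right
        ("flip", " "): (" ", 0, "halt"),   # blank slot: stop
    }
    state, head = "flip", 0
    while state != "halt":
        write, move, state = rules[(state, tape[head])]
        tape[head] = write  # the head may overwrite the current slot
        head += move        # ...and then moves along the tape
    return tape

print(run_turing_machine(list("1011 ")))  # -> ['0', '1', '0', '0', ' ']
```

Swapping in a different rule table yields a different machine, which hints at the idea explored in Sect. 1.4.2: a machine whose rule table is clever enough to read any other rule table as its input.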

1.4.1 The Decision Problem

The Turing Machine was essentially created to address a logical challenge known as the Entscheidungsproblem. This is German for “decision problem”, and it was posed by mathematicians Wilhelm Ackermann and David Hilbert back in 1928. The problem asks for an algorithm that considers any input statement and returns either “true” or “false”, based on whether the statement is logically valid or not. With the Turing Machine, Alan Turing proved that no such general algorithm can exist; this is known as Turing’s proof. The same result was reached independently by his doctoral supervisor, Alonzo Church (1903–1995). Church reached Turing’s conclusion using so-called lambda-calculus, which can be described as a simple universal programming language. Lambda-calculus (sometimes written as λ-calculus) can be used to both create and discover computational functions. The approach shares many similarities with Turing Machines. Lambda-calculus has influenced numerous programming languages, including Lisp and Python; a small taste of it follows below.

Note: Although often described as purely theoretical, some rudimentary physical Turing Machines have been built by geeky enthusiasts in recent years. Truly infinite tape does not exist per se, after all.
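As promised, here is a short Python sketch of Church numerals, a classic lambda-calculus construction in which numbers are encoded purely as functions. The sketch and its names (zero, succ, to_int) are illustrative, not taken from Church’s own notation.

```python
# Church numerals: the number n is represented as "apply f, n times".
zero = lambda f: lambda x: x                     # f applied zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x))  # one extra application

def to_int(church):
    # Decode a Church numeral by counting how many times f gets applied.
    return church(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
print(to_int(three))  # -> 3
```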

1.4.2 The Universal Turing Machine

Alan Turing introduced the concept of a Universal Turing Machine sometime between 1936 and 1937. This type of theoretical device accepts any complete Turing Machine as its input. It is called “universal” because it can simulate any other Turing Machine, whereas a basic Turing Machine is built around a specific computation. The framework behind Universal Turing Machines was later used in the development of stored-program computers. In the early days of computing these systems were implemented with electronic circuits or actual tape. Before this paradigm gained traction, computers had to be manually configured using cables, much like old-fashioned telephone patchboards. All modern computers are stored-program computers.

Note: A computing system, such as a programming language (or, say, a calculator), is said to be Turing-complete if it can be used to compute any algorithm. Most popular programming languages, including Python and C++, are Turing-complete.


1.4.3 The Turing Test

In 1950 Alan Turing presented what became known as the Turing Test. This refers to a machine’s capability to believably participate in human conversations; Turing himself originally called it the imitation game. Basically, this imitation game consists of three actors: person A, machine B, and an interrogator housed in an isolated area. The goal of the interrogator is to deduce, using simple questions, whether actor A or B is the machine. The machine must convince the interrogator that it is a person, while the human actor is there to help reveal which one the actual machine is.

Many early implementations, such as ELIZA from the 1960s, have only partially convinced us of their humanity, thus failing the test. However, Google’s LaMDA and OpenAI’s ChatGPT have passed Turing’s imitation game in 2022 and 2023, respectively. The Turing Test does not measure machine intelligence per se, only the mimicry of human conversation. Other ways of vetting the potential of artificial intelligence are being devised.

1.5 Can Computers Ever Think?

In his paper Computing Machinery and Intelligence (1950), Turing identified the following nine arguments against the idea of computers actually being able to cogitate. He explored these matters in the context of his imitation game (i.e. the Turing Test). We will now go through each of these arguments.

1. The Theological Objection. Thinking is the function of the human soul; no other animal and no machine can think. Turing himself stated he was unable to accept any part of this objection, but he nonetheless chose to address it in theological terms. For one, he wrote: “It is admitted that there are certain things that He [God] cannot do such as making one equal to two, but should we not believe that He has freedom to confer a soul on an elephant if He sees fit?” (Turing, 1950).

2. The ‘Heads in the Sand’ Objection. This argument posits that thinking machines will pose a serious threat to humanity at some point. It is infused with the intent that machines must remain subservient to humans at all times. Turing simply claimed this objection is not substantial enough to warrant refutation. We should keep in mind that Turing died before the Terminator movies and today’s increasingly autonomous military equipment.

3. The Mathematical Objection. This objection is in part based on Gödel’s theorem, which states that “in any sufficiently powerful logical system statements can be formulated which can neither be proved nor disproved within the system, unless possibly the system itself is inconsistent” (Gödel, 1986). According to Turing, the mathematical objection again deals with the idea of humans wanting to see themselves as superior to machines. Basically, although we know we are not infallible, we prefer to see ourselves as less fallible than machines.

4. The Argument from Consciousness. This objection deals with our felt emotions, capacity to create art, and other subjective experiences, and how they may or may not arise in the world of machines. Turing elegantly addressed it with the following statement: “I do not wish to give the impression that I think there is no mystery about consciousness ... but I do not think these mysteries necessarily need to be solved before we can answer the question [of whether machines can think or not].” (Turing, 1950).

5. Arguments from Various Disabilities. Machines are incapable of morality, humour, love, and many other experiences typically attributed to human beings. Turing saw this as a mere veiled argument from consciousness, and rather laconically wrote: “The criticism that a machine cannot have much diversity of behaviour is just a way of saying that it cannot have much storage capacity.” (Turing, 1950). Although the storage capacities of computers have since increased by several orders of magnitude, we are yet to deduce whether a MacBook Pro has a great sense of humour or not.

6. Lady Lovelace’s Objection. Ada Lovelace (1815–1852) was a mathematician who worked on Charles Babbage’s early computer, the Analytical Engine; she is widely considered the first ever computer programmer. In her notes on the engine she wrote: “The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform”. Turing argued computers can in fact surprise us in many ways. However, we may not be able to immediately identify these new findings.

7. Argument from Continuity in the Nervous System. The mammalian nervous system does not operate in binary code; therefore, we shouldn’t expect a computer to be able to produce an experience of consciousness anything like ours. Turing again argued against this with the metric of computing power; with enough resources even a digital computer might be able to create a credible presentation of consciousness, at least under some scenarios (perhaps the iPhone 50 will be fully sentient).

8. The Argument from Informality of Behaviour. The behaviour of computers is always predictable and therefore not truly intelligent; they are governed by strict rules, after all. Humans and computers have fundamental differences in their predictability. Turing argued against this sentiment by pointing out that the laws governing a system and a person’s conduct are not one and the same.

9. The Argument from Extra-Sensory Perception. According to Turing, the statistical evidence for telepathy at the time (in the 1950s) was overwhelming. He pondered whether a person with psychokinetic or telepathic abilities could influence a machine’s random number generation mechanisms, or influence the other person taking part in the proceedings. If this were the case, Turing suggested using a “telepathy-proof room” for the purposes of his imitation game. This argument raises questions: if some human minds can indeed be telepathic, could sufficiently advanced computers also engage in degrees of extra-sensory perception?

Note: Food for thought: if computers are to believably imitate humans, Turing suggested their performance in arithmetic be curtailed for the duration of the game. What other limitations or features would you add to them to make them seem more human?

1.6 More on Gödel’s Theorems

A theorem is a statement that has been proven or that can be proven. As mentioned under the Mathematical Objection in the previous section, Kurt Gödel’s (1906–1978) theorems are a fascinating take on mathematical axioms. Basically, these theorems deal with provability within formal systems such as arithmetic. An axiom is simply an assumption that a statement is true. For example, adding 2 and 3 together is generally assumed to result in 5, and compelling proof for this exists. However, according to Gödel, every sufficiently powerful logical system contains statements it cannot settle. There are some boundaries to what mathematics can demonstrate to us. Gödel presented the following theorems, which do not negate each other. His incompleteness theorem came in two parts (Gödel, 1986).

• The completeness theorem. All valid logical statements are provable if those statements are true in all possible models of those axioms.
• The incompleteness theorem.
  – Any consistent, axiomatic system of arithmetic is incomplete. Some statements, even if true, remain unprovable.
  – Such a system cannot prove its own consistency.

Gödel’s incompleteness theorem is sometimes mentioned in the study of consciousness, as it can be applied to matters concerning the calculability of sentience. Some, like philosopher J. R. Lucas and physicist and Nobel Laureate Sir Roger Penrose, have argued Gödel’s work implies that an algorithmic system cannot manufacture self-awareness. In his paper “Minds, Machines, and Gödel” (1961), Lucas stated that human mathematicians cannot be represented by algorithmic devices: some mathematical statements exist which a machine cannot prove, yet a human mathematician can both identify and prove them. According to Penrose, specific phenomena in the realm of quantum mechanics (i.e. the physics of the atomic and sub-atomic levels) offer better clues to the way consciousness operates (Penrose, 1989). We may not, in fact, be biological Turing Machines. If we accept this premise, artificial intelligence based on current levels of computation can, at best, result in a convincing simulation of sentience. However, an accurate enough simulation of the human mind doesn’t need sentience to convince us of its humanity.

The arguments proposed by Lucas and expanded on by Penrose constitute the Lucas-Penrose Argument. Basically, it suggests the human mind is not a biological computer. This is the opposite of the view held by the Computational Theory of Mind (CTM), made popular by the philosopher and mathematician Hilary Putnam (1926–2016) in the 1960s. Putnam later rejected CTM. Nonetheless, the computational theory of mind experienced something of a revival in the 1990s and continues to be rather popular.

1.6.1 Gödel Numbers: The Art of Encoding Data

Gödel created a numbering system in which every mathematical statement (e.g. one plus one equals five), whether true or false, can be converted into a single Gödel number. Other strings of alphanumeric characters can be made into Gödel numbers too, like any sentence (e.g. “This axolotl is rather fecund” works brilliantly as one). The point of Gödel’s system is to end up with a unique identifier number for each statement.

Let’s see some tangible examples of how Gödel numbers are created. We can use many different approaches; Gödel himself used prime numbers, so let’s go for this method. First, each alphabetical symbol is issued a corresponding number (see Table 1.1). Next, these numerical sequences are transformed into a single, unique identifying number (which tends to be rather long). This is done by raising ascending prime numbers to the power of the Gödel numbers in the sequence. So 0 = 0 (i.e. 6 5 6) becomes 2^6 × 3^5 × 5^6 = 243 000 000. A longer statement, say, 0 × 0 = 0 becomes 2^6 × 3^12 × 5^6 × 7^5 × 11^6 = 15 823 456 870 982 607 000 000 (roughly 1.58 × 10^22), and so on.

Note: Prime numbers are numbers greater than one (1) whose only factors are themselves and one, e.g. 2, 3, 5, 7, and 11. Non-prime numbers (e.g. 4, 6, 8) are known as composite numbers. As of February 2023, the largest known prime number is 2^82,589,933 − 1. It has over 24 million digits. This is actually a Mersenne prime, a type of prime number that is a power of two minus one (hence the − 1).

Table 1.1  Gödel encoding numbers for four basic mathematical symbols. Corresponding Gödel numbers exist for every frequently used symbol; their exhaustive list is out of the scope of this book

Symbol         Corresponding number   Statement    As Gödel numbers
Equals (=)     5                      0 = 0        6 5 6
Zero (0)       6                      0 + 0 = 0    6 11 6 5 6
Plus (+)       11                     0 × 0        6 12 6
Multiply (×)   12
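The prime-power construction above is easy to reproduce in a few lines of Python. The following sketch is purely illustrative (not code from the book); it uses the symbol codes from Table 1.1, with "*" standing in for the multiplication sign.

```python
# Gödel numbering: raise ascending primes to the power of each symbol's code.
def primes():
    # Yield 2, 3, 5, 7, ... by trial division (fine for short statements).
    n, found = 2, []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

CODES = {"=": 5, "0": 6, "+": 11, "*": 12}  # per Table 1.1

def godel_number(statement):
    result = 1
    for p, symbol in zip(primes(), statement):
        result *= p ** CODES[symbol]
    return result

print(godel_number("0=0"))    # 2^6 * 3^5 * 5^6 = 243000000
print(godel_number("0*0=0"))  # -> 15823456870982607000000
```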


1.7 More on Computers and Their Binary Diet

As touched upon in the section on the Turing Machine earlier in this chapter, our digital devices crunch ones and zeros using their central processing unit (CPU). Any type of data (e.g. text, audio, selfies) must be translated into binary code for a device to manipulate it. A kind of Gödel numbering takes place on every single digital device as it encodes the various types of data into binary. File formats of any kind are binary code stored and interpreted in a specific way. Put simply, an operating system (such as Windows or iOS) sits between the user and the device, passing information back and forth between the user and the hardware (i.e. CPU) layers.

Historically, computers typically encoded text using a system known as ASCII (American Standard Code for Information Interchange). Devised in 1963, ASCII assigns each symbol a numeric value, e.g. A = 65, B = 66, C = 67, and so on. The classic ASCII set featured 95 printable symbols. These days it is a subset of Unicode, a modern character encoding standard with support for symbols in multiple languages, including Greek and Arabic.

Many of today’s applications can also leverage current-era video cards for their calculations, including for artificial intelligence research. A video card is also known as a graphics processing unit or GPU. There are even some specialized GPUs with no video outputs at all, designed solely for deep learning and other AI-related tasks. Although highly hungry for electricity, state-of-the-art GPUs have tremendous processing capabilities. Several GPUs can be chained to function in parallel in large data centers for even greater efficacy.
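As a quick illustration of this “binary diet”, the following Python snippet (an example of mine, not from the book) turns a short string into its ASCII/Unicode code points and then into eight-bit binary:

```python
# From characters to code points to binary digits.
message = "AI"
codes = [ord(ch) for ch in message]        # 'A' -> 65, 'I' -> 73
bits = [format(c, "08b") for c in codes]   # each code point as 8 bits
print(codes)  # [65, 73]
print(bits)   # ['01000001', '01001001']
```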

1.8 In Closing

This chapter introduced some of the fundamentals of Alan Turing’s work, such as the Turing Machine and the Turing Test (i.e. the imitation game). We also explored some ramifications of Kurt Gödel’s work for computer science and artificial intelligence. In Chap. 2 we’ll take a look at some core technologies in the field of modern artificial intelligence, including natural language processing (NLP).

References

Future of Life Institute. (2023, March 30). Pause giant AI experiments: An open letter. https://futureoflife.org/open-letter/pause-giant-ai-experiments
Gödel, K. (1986). Kurt Gödel: Collected works: Volume I: Publications 1929–1936 (Vol. 1). Oxford University Press.
Lucas, J. R. (1961). Minds, machines, and Gödel. Philosophy, 36(137), 112–127.
Penrose, R. (1989). The emperor’s new mind. Oxford University Press.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236), 433–460.

Chapter 2

Developments in Artificial Intelligence and Linguistics

Chatbots leverage two technologies: artificial intelligence (AI) and linguistics. We explored some concepts and historical developments in computing in the previous chapter. Now it’s time to take a deep look into the fascinating world of AI. The two main topics in this chapter are the principal techniques of natural language processing (NLP) and the basics of linguistics. We’ll be traveling in time into the near future as well, peeking at some of the possibilities of quantum computing as it pertains to AI.

2.1 Natural Language Processing

When computers analyze and produce output in human languages, they’re engaging in natural language processing (NLP). This is an interdisciplinary field leveraging artificial intelligence, linguistics, and computer science. It has experienced considerable popularity in the 2020s; we are finally seeing Alan Turing’s work come to fruition.

Computers are never good at guessing. For them, storing gigantic lexicons is trivial, but human languages are complex and their expressions are imbued with multiple meanings. Things like context and humor are often challenging for our devices to decipher. However, the attempts to harness computers in the study of language are valiant and ongoing; much progress has been made in recent decades. Natural language processing comes in three main varieties, which we will explore next.


2.1.1 Symbolic NLP

During the first era of AI in the 1950s, natural language processing was implemented in its symbolic variety. This meant scientists manually entered sets of language rules for computers to crunch on. Symbolic NLP covers applications like machine translation and automatic text summarization. Early chatbots, such as ELIZA, used this approach for their somewhat clunky conversations. Compared to later techniques, symbolic NLP is considerably slower and less autonomous. One of the more pertinent examples of the symbolic approach to NLP was the Georgetown experiment. Conducted in 1954, it involved the automated translation of over 60 Russian sentences into English. While not a major feat in the age of Google Translate, it was a big deal back in the day.
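To give a flavour of the symbolic approach, here is a minimal, hand-written rule table in Python, loosely in the spirit of ELIZA-style systems. The rules are invented for this sketch; the essential point is that every one of them is manually authored, and nothing is learned from data.

    import re

    # Hand-crafted pattern/response rules: the essence of symbolic NLP.
    # Every rule here is written by a human; none are learned from data.
    RULES = [
        (r"I need (.*)", "Why do you need {0}?"),
        (r"I am (.*)",   "How long have you been {0}?"),
        (r".*",          "Please tell me more."),    # catch-all fallback rule
    ]

    def respond(sentence: str) -> str:
        for pattern, template in RULES:
            match = re.match(pattern, sentence, re.IGNORECASE)
            if match:
                return template.format(*match.groups())
        return ""

    print(respond("I need a holiday"))  # Why do you need a holiday?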

2.1.2 Statistical NLP

Starting in the 1980s, the rather human-dependent symbolic approach to NLP became less appealing. Statistical NLP is a quantitative approach to language processing which leverages probabilistic modeling and linear algebra. This technique uses machine learning (ML) to grant computers a greater degree of autonomy when learning about languages. Thanks to increases in computing power, our computers became able to make decisions of their own about the data at hand. With statistical NLP, machines assign probabilities to words and focus on the most commonly found patterns. This approach uses the power of induction to analyze potentially very large volumes of data.
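As a toy illustration of this probabilistic mindset, the sketch below counts words in a tiny, invented labelled corpus and classifies a new sentence with add-one-smoothed word probabilities (a bare-bones naive Bayes approach; real statistical NLP uses vastly larger corpora).

    from collections import Counter
    import math

    # A tiny labelled corpus, invented purely for illustration.
    train = [
        ("what a wonderful helpful answer", "positive"),
        ("great answer thank you", "positive"),
        ("terrible answer very unhelpful", "negative"),
        ("what a bad and wrong reply", "negative"),
    ]

    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in train:
        counts[label].update(text.split())

    def score(text: str, label: str) -> float:
        # Log-probability of the words under the label, with add-one smoothing.
        vocab = set(w for c in counts.values() for w in c)
        total = sum(counts[label].values())
        return sum(
            math.log((counts[label][w] + 1) / (total + len(vocab)))
            for w in text.split()
        )

    text = "helpful and wonderful reply"
    print(max(["positive", "negative"], key=lambda lb: score(text, lb)))  # positive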

2.1.3 Neural NLP

The latest form of NLP takes advantage of deep learning, a method of processing data on multiple layers. It is somewhat analogous to how the human brain is considered to function; it understands context better than the symbolic or statistical approaches. As of 2023, neural NLP represents the cutting edge of AI. It's the approach used in all modern chatbots, such as ChatGPT. If necessary, systems using neural NLP can function quite autonomously without much human input. A lot of neural NLP is executed in GPU-based data centers (as mentioned in the previous chapter). With this technique machines can ingest and analyze massive datasets with great effectiveness. Deep learning and neural NLP offer many possibilities for the betterment of society's functions, including the following:

• Law enforcement. Detection of financial fraud and most forms of cybercrime. Face recognition and automated social media patrolling.


• Finance. Global trend prediction, risk-management, auditing, and stock portfolio optimization. • Healthcare. Drug and vaccine development, medical data analysis, and clinical decision support.

2.2 Markov Models

We'll now briefly discuss some more important techniques used in natural language processing. A Markov model presents the probabilities of randomly occurring linked events over time; such models are frequently used to generate human language in an AI-based context. A linked set of observable events is called a Markov chain. Russian mathematician Andrey Markov (1856–1922) devised a model for describing a sequence of possible events in which the probability of each event depends only on the present state of the system; this is known as the Markov property. While this works great for language processing, it also has numerous other applications in fields such as finance, general statistics, and medicine. The probabilities for Markov chains are calculated from data previously obtained within a specific period of time (or a number of repetitions). We could, for example, examine how often a "Hello" is followed by either a "there" or a "stranger" in sentences, using, say, a bunch of comic books as our corpus (i.e. a linguistic resource). With this data we might create a simple Markov chain and draw conclusions on whether "Hello there" is a more commonly used expression in comic books than "Hello stranger".
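A minimal Python sketch of that comic-book example might look as follows; the one-line corpus is an invented stand-in for a real collection of texts.

    from collections import Counter, defaultdict

    # An invented stand-in for a comic-book corpus.
    corpus = "hello there hello stranger hello there general kenobi hello there"

    # Count word-to-word transitions, i.e. the links of a Markov chain.
    transitions = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1

    # Maximum-likelihood transition probabilities out of "hello".
    total = sum(transitions["hello"].values())
    for following, count in transitions["hello"].items():
        print(f"P({following} | hello) = {count / total:.2f}")
    # P(there | hello) = 0.75, P(stranger | hello) = 0.25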

2.2.1 Hidden Markov Models

Now, a hidden Markov model (HMM) allows us to predict hidden variables from a set of observable variables. Hidden Markov models therefore consist of two or more Markov chains. For a simple example, let's consider what someone called Howard is going to do on a given day based on the weather. We know that if it rains, Howard almost always prefers to watch Star Trek: The Next Generation at home. If it's sunny, he usually goes to the grocery store to purchase potato chips and other healthy food items. If it's cloudy, Howard is just as likely to do either. Suppose a very rainy Monday arrives (this is an observable variable); we can infer Howard is probably watching Star Trek at home (this is the hidden state). Using a hidden Markov model we can infer a lot of information even though we remain unaware of the states of many of the involved variables. This has numerous useful real-world applications when it comes to predicting human behaviour somewhat accurately. Marketing, retail, travel, and many other industries benefit from HMMs. Some features of Markov models can be integrated into artificial neural networks, which are the next topic of discussion.
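Sticking with Howard, the core inference step can be sketched with ordinary Bayes' rule. All probabilities below are invented for illustration, and a full HMM would additionally chain such inferences across a sequence of days.

    # Invented model: a prior over Howard's hidden activities, plus the
    # probability of observing each weather type given an activity.
    prior = {"star_trek": 0.5, "grocery_store": 0.5}
    p_weather_given_activity = {
        "star_trek":     {"rainy": 0.8, "sunny": 0.1, "cloudy": 0.1},
        "grocery_store": {"rainy": 0.1, "sunny": 0.8, "cloudy": 0.1},
    }

    def infer_activity(weather: str) -> dict:
        # Bayes' rule: P(activity | weather) is proportional to
        # P(weather | activity) * P(activity).
        joint = {
            act: p_weather_given_activity[act][weather] * prior[act]
            for act in prior
        }
        norm = sum(joint.values())
        return {act: p / norm for act, p in joint.items()}

    print(infer_activity("rainy"))
    # {'star_trek': 0.888..., 'grocery_store': 0.111...}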


2.3 Artificial Neural Networks

Natural language processing can be paired with an approach called an artificial neural network (ANN). As you may know, a biological neuron is a brain cell which communicates via electrochemical signals. A brain is basically a very large network consisting of these structures. Now, an ANN consists of artificial neurons set up in three interconnected layers: input, hidden, and output. Data, such as images or text, is passed from the input layer into the hidden layer, which often consists of numerous levels. The hidden layer then processes the data and hands the results over to the output layer (see Fig. 2.1). The neurons in the hidden layer apply various mathematical functions to the data sent to them. Using hidden layers introduces non-linear processing of the data at hand. Going non-linear gives the neural net the ability to capture more complicated information. A simpler approach without any hidden layers, consisting only of the input and output layers, is known as a linear model. Many problems can be solved with only a single hidden layer in place; such a layer generally needs a number of artificial neurons equal to the mean (i.e. average) of the sizes of the input and output layers.
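A minimal NumPy sketch of a single forward pass through such a three-layer network follows; the layer sizes and random weights are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(seed=42)

    # Arbitrary layer sizes: 4 inputs, 3 hidden neurons, 2 outputs.
    w_hidden = rng.normal(size=(4, 3))   # input -> hidden weights
    w_output = rng.normal(size=(3, 2))   # hidden -> output weights

    def sigmoid(x):
        # A classic non-linear activation; without it the whole
        # network would collapse into a linear model.
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([0.5, -1.2, 3.0, 0.1])    # one input sample
    hidden = sigmoid(x @ w_hidden)          # hidden layer activations
    output = sigmoid(hidden @ w_output)     # output layer activations
    print(output)                           # two values between 0 and 1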

2.3.1 Attack of the Perceptrons

A single artificial neuron is also known as a perceptron. This is the oldest component of an ANN, based on the artificial neuron model proposed in 1943 by Warren McCulloch (1898–1969) and Walter Pitts (1923–1969). A perceptron is a fairly simple linear computing component, a concept mentioned previously in this chapter. Basically, a perceptron is a binary classifier. Binary classification refers to sorting the elements of a set into two groups (e.g. either 1 or −1). A perceptron is activated, producing 1 as its output, if its input is above a set threshold (often zero); it is not activated if the input falls below said threshold, resulting in an output of −1.

Fig. 2.1  A visual layout of a simple three-layer artificial neural network. The spheres represent artificial neurons, i.e. a machine's "brain cells"


We can use this technique for all sorts of sorting procedures, such as differentiating cacti from non-cacti or happy faces from unhappy faces. This type of classification procedure has numerous everyday applications too, such as trained email programs detecting unwanted/spam email (and automatically sending it to the trash folder). Leveraging a simple perceptron can be an extremely fast technique for data classification, but it isn't suitable for more complicated tasks. Stacking many perceptrons together, as seen in Fig. 2.2, results in a multilayer perceptron (MLP), which is the de facto approach to modern ANNs. These are often implemented with features like backpropagation, essentially the technique of taking a backwards step in computation, allowing for the fine-tuning of specific values.
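A single perceptron is compact enough to write out in full. Below is a minimal Python sketch with hand-picked weights (a real perceptron would learn these); inputs above the zero threshold yield 1, everything else yields −1.

    # A minimal perceptron: weighted sum plus bias, thresholded at zero.
    def perceptron(inputs, weights, bias):
        activation = sum(i * w for i, w in zip(inputs, weights)) + bias
        return 1 if activation > 0 else -1

    # Hand-picked weights for illustration; a real perceptron learns these.
    weights, bias = [0.9, -0.4], -0.1
    print(perceptron([1.0, 0.2], weights, bias))   # 1  (activated)
    print(perceptron([0.1, 1.5], weights, bias))   # -1 (not activated)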

2.3.2 Training an ANN

Neural networks are "trained" by making them process example data. For a neural network to become able to visually identify, say, a cactus, it needs to be fed numerous images of cacti and non-cacti alike. Similarly, to identify specific linguistic elements a neural network needs plenty of examples and experience at this task. Human input is needed to correct an AI, as it tends to make mistakes, especially early in its training. Once an AI becomes able to reliably perform its tasks, the training period ends and the system is given a greater degree of autonomy. A neural network needs both a high-quality dataset and accurate annotation provided by a human being. Basically, we need people present to make sense of context; most automated systems struggle with this. The process of training an AI for a specific task can take anything from a few hours to several months, depending on the computing resources available and the complexity of the task.
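For a single perceptron, this training process can be sketched with the classic perceptron learning rule, which nudges the weights only when a prediction is wrong. The toy data below (say, two "happy faces" and two "unhappy faces" reduced to two made-up features each) is invented for illustration.

    # Toy training data, invented for illustration: two features per sample,
    # labels 1 ("happy face") and -1 ("unhappy face").
    samples = [([2.0, 1.0], 1), ([1.5, 2.0], 1),
               ([-1.0, -0.5], -1), ([-2.0, 1.0], -1)]
    weights, bias, lr = [0.0, 0.0], 0.0, 0.1

    def predict(x):
        return 1 if sum(i * w for i, w in zip(x, weights)) + bias > 0 else -1

    # The perceptron learning rule: adjust weights only on mistakes.
    for epoch in range(10):
        for x, target in samples:
            error = target - predict(x)    # 0 when correct, +/-2 when wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error

    print([predict(x) for x, _ in samples])   # [1, 1, -1, -1] once converged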

Fig. 2.2  The performance scaling of quantum and binary processing units. Quantum computing has an exponential curve while binary/classical computing follows a linear one


2.4 Large Language Models (LLMs)

A language model is basically a probability distribution over words. An algorithm built for analyzing massive linguistic datasets is often referred to as a large language model. The term was coined as recently as 2018 or so, and it represents a type of deep learning applied to NLP. An LLM can produce virtually any kind of text we could ever use and is at the core of advanced chatbots like ChatGPT. An LLM can process trillions of entries with relative efficiency, crunching on entire human languages while analyzing their grammatical underpinnings. The scope of LLMs is vast and encompasses medical data, biology, and many creative fields. The contents of websites like GitHub (a popular programming resource), too, are frequently used for training an AI. It's quite likely programmers will integrate more and more AI-generated code into their work in the future. LLMs work largely unsupervised but can be fine-tuned by human actors when necessary. Because of the enormity of the resources needed to create a functional LLM, they mostly operate out of corporate environments, such as the offices of OpenAI, Google, and Microsoft. The first widely recognized LLM was the Bidirectional Encoder Representations from Transformers (BERT). BERT was released in 2018 by Google researchers Jacob Devlin, Kristina Toutanova, Ming-Wei Chang, and Kenton Lee. By the end of 2019 BERT was implemented in the Google search engine in over 70 languages. Whatever you've been searching for online, BERT probably knows all about it.

Now, in statistics a power law (or scaling law) is a relationship in which a relative change in one quantity produces a proportional relative change in another quantity, raised to some fixed power. LLMs are subject to neural scaling laws, which describe the relationships between different parameters. There are four main parameters in LLMs: model size, training dataset size, cost of training, and post-training performance. All of them can be represented with a precise, numerical reading. Neural scaling laws are used by data scientists to observe and improve the design and training of LLMs; this can be thought of as a juggling act to discover a balance between the four major parameters. At some point deep learning algorithms may actually become more powerful than the sum of their parts. According to Bahri et al. (2021), "perhaps the right combinations of algorithmic, model, and dataset improvements can lead to emergent behavior."
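As a purely illustrative sketch of the power-law form such scaling laws take, consider the following; the constants N_C and ALPHA are placeholders of our own choosing, not fitted values from any particular study.

    # Illustrative power law: predicted loss as a function of model size N.
    # N_C and ALPHA are invented placeholders; real scaling-law papers fit
    # such constants to the results of many actual training runs.
    N_C, ALPHA = 8.8e13, 0.076

    def loss(n_parameters: float) -> float:
        return (N_C / n_parameters) ** ALPHA

    for n in (1e8, 1e9, 1e10):
        print(f"{n:.0e} parameters -> predicted loss {loss(n):.3f}")
    # Each tenfold increase in model size shaves a constant factor off the loss.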

2.4.1 Far Out! Your Chatbot May Be Hallucinating

Some strangeness has been known to ensue after an LLM reaches a certain complexity. Sometimes an LLM insists on generating output which is in direct conflict with its training data. This is known as an artificial hallucination. Such responses either fail to question the validity of a silly query (e.g. "Who was the first curler on Mars?") or provide nonsensical answers to valid queries. This phenomenon goes beyond text,


too. A "hallucinating" AI may see things that are not in the data, such as spotting non-existent animals in image files or videos. All of this can stem from intentional activity by malicious actors, but it does occur in non-compromised systems as well. Real problems with chatbot hallucinations start when they become intertwined with essential functions of our societies, such as legal work and medical research. Just as we can't trust a hallucinating person for accurate descriptions of events, an AI taking a psychedelic trip is an unreliable and potentially disastrous ally. One of the most potent ways of negating these scenarios is sufficient fine-tuning of the data by us humble humans. We must strive to do this in a consistent manner in the future as well.

2.5 Weak vs Strong AI

Artificial intelligence can be split into two broad categories: weak and strong. Contrary to what one might think, this classification has little to do with computing power. Weak AI deals with tasks requiring a lower level of machine autonomy, such as large-scale data analysis (within somewhat narrow parameters). Massive datasets are being processed with advanced algorithms around the world as you read this, which is one of the primary tasks of weak AI. Compared to the average human intellect, a modern machine with "weak" AI is still a formidable performer in mathematics and chess, able to calculate complex equations within mere milliseconds. Weak AI is sometimes referred to as narrow AI. Weak AI systems can perform very well as digital assistants: Amazon's Alexa and Apple's Siri fall under this category. Weak AI is not usually expected to make moral judgments. However, it is highly programmable and can therefore, unfortunately, be harnessed for malicious activity too (think of a self-driving car being programmed to attack specific targets). All current chatbots represent weak AI.

Now, strong AI is all about emulating human-like thinking inside machines; systems created with this approach are expected to function more or less autonomously. They can adapt to circumstances and improvise to better complete, or even switch, tasks when necessary. Strong AI computers can also understand and portray human emotions. They learn on their own and become more efficient at performing their duties over time, to a much greater degree than a weak AI ever could. In other words, strong AI represents an artificial self-awareness. It has rather fantastic applications in entertainment, research, healthcare, and technology. Although we have made much progress in the 2020s, strong AI is not yet a reality in 2023 (see Table 2.1).


Table 2.1  The main features of weak and strong AI. Strong AI is also known as artificial general intelligence (AGI)

Autonomy. Weak AI: human-dependent. Strong AI: fully autonomous.
Status. Weak AI: current. Strong AI: theoretical.
Morality. Weak AI: simple, pre-programmed ethics at best. Strong AI: capable of complicated ethical choices.
Data source(s). Weak AI: massive datasets. Strong AI: massive datasets and self-acquired information.
Tasks. Weak AI: specific, narrow. Strong AI: flexible, goal-oriented.
Examples. Weak AI: chatbots, virtual assistants (e.g. Apple Siri), smart phone/word processor autocorrect, self-driving cars (excluding KITT, the talking car from Knight Rider). Strong AI: the T-800 cybernetic organism from the Terminator franchise, Data from Star Trek, M3GAN the android girl, KITT.

2.6 On the Singularity

In physics, the initial singularity is the hypothesized state of spacetime and energy before the Big Bang. A technological singularity, on the other hand, refers to a point of no return when it comes to autonomously functioning artificial intelligence; after this there is no going back to human control. Some researchers argue the technological singularity will be a sudden event, while others claim it's a more gradual process. According to futurist Ray Kurzweil (1999), the rate of growth in many evolutionary systems, including in technology, tends to be exponential (i.e. the quantity grows ever faster as it accumulates). This is known as the law of accelerating returns, which Kurzweil proposed sets the timeframe for the technological singularity around the mid-twenty-first century.

Machine evolution outside of human control gives us many troubling scenarios. Self-modifying software, often referred to as seed AI, is of great concern when it comes to the question of a rogue AI takeover. British mathematician Irving John Good (1916–2009) proposed this scenario, calling it an intelligence explosion, all the way back in the 1960s. According to Good (1966), an "ultraintelligent" machine would be capable of manufacturing other advanced artificial intelligences and represent "the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control". While these questions may seem far-fetched, Good himself suggested "it is sometimes worthwhile to take science fiction seriously."

A technological singularity is not only an external scenario for humans. There has been quite a bit of discussion on augmenting biology with technological means. This ranges from intelligent nanorobots injected into the human body to cure disease, to attempts at actual immortality. Parts of a technological singularity might manifest in the form of cybernetic organisms with superhuman intelligence and physical capabilities. Augmented humans of the near future could be created with new performance-enhancing drugs, implants, and genetic manipulation. Imagine the possibilities of feeding information directly into your brain, unfiltered by fatigue


or boredom; a brain-computer interface (BCI) is every futurist's dream. With this emerging technology, it's already possible to control robotic limbs with one's thoughts.

2.6.1 AI Safety

Some may argue a singularity should be avoided at all costs, since it would render humans more or less obsolete. Any checks and balances we enforce on our AI-driven technology will at some point become unreliable at best. Unfortunately, the only way to guarantee a singularity-free future is to throw all of our technological achievements into a very large bonfire. Since this is unfeasible, other, less dramatic mechanisms are being developed. This emerging science is referred to as AI safety. Let's take a peek at some of the more popular research foci in this field. They are as follows:

• Alignment. This refers to fine-tuning AI into acting in a way that best serves human interests. A poorly aligned AI can be a harmful actor which prioritizes its own survival over that of human beings. Businesses, too, may in some rarer scenarios prioritize profit over a safe AI, for example by cutting corners on speed limitations or the choice of routes for autonomous transportation.

• Black Swan Robustness. A black swan in this context refers to an unexpected negative event which an AI was not trained for. Even the most detailed historical data can't always accurately model poor future outcomes. A certain degree of flexibility should be included in an AI's operating parameters at all times.

• Adversarial Robustness. A malicious actor can skew the training phase of an AI on purpose so that it eventually misclassifies data. This applies to both visual and audio-based information and carries potentially severe consequences, such as mistaken identity. So-called deepfakes are changing the way some forms of cybercrime are committed.

• AI Monitoring/Transparency. Keeping a system functional is very important, and this is the focus of AI monitoring. This area of research looks into issues such as backdoors/trojan horses, which are patterns of harmful behaviour programmed into an AI by malicious actors. The transparency of AI is also a high priority here. Sometimes AIs make counterproductive choices; we need to be able to understand why they made such decisions.

• Sociotechnical Factors. The development of AI is a time-consuming and arduous task. Due to time constraints and stress, developers may unknowingly implement harmful features into an AI. It may be worth taking a more inclusive and humanistic approach to AI development. According to Sartori and Theodorou (2022): "Not only should there be proper training for developers, but also for all other stakeholders; including users and the general public that is indirectly affected by the technology."


• Governance. Sustainable AI development needs technical and ethical guidelines. Some have been implemented on a national level: China and the UK both released policies on AI safety in 2021. Most major players in the industry, too, seem to be taking AI safety seriously. OpenAI, the makers of ChatGPT, have stated they are committed to AI safety on multiple fronts. In fact, OpenAI claim to have observed their GPT-4 technology for a full 6 months prior to unleashing it on the general public in March 2023.

2.7 Our Digital Lives: Big Data

The amount of digital information in the world has increased rapidly since the 1980s, accelerating dramatically when the internet became popular in the mid-1990s. Big data is a term used for massive datasets which need a lot of computing power to process. Usually, a personal computing device like a laptop is not powerful enough to take on information classified as big data; a datacenter consisting of hundreds or thousands of devices is often needed. Big data has historically posed numerous challenges when it comes to the storage, capture, and analysis of these vast sets of information. However, the dropping prices of storage solutions (e.g. hard disks) and the increasing computational power of our devices contribute to the ever-expanding usefulness of big data. Examples of big data might include the complete works of every published author on Earth or the combined electronic healthcare records of all citizens of Norway; you get the picture. Now, big data is often defined through the five V's, which are outlined below:

• Volume. Like the name states, a dataset needs a lot of content to qualify as big data. The exact limits of volume for this definition vary and increase year by year.
• Velocity. The data is created rapidly. Think of all the photos and posts on social media produced by human civilization each day; that is quite a bit of big data.
• Value. The data needs to contain pertinent information, encompassing material that can be used for purposes like facial recognition and meaningful military intelligence gathering.
• Variety. Big data contains information collected in numerous formats (e.g. text, images, sound) and from numerous sources.
• Veracity. The data needs to be accurate and trustworthy. Also, the processing facilities need to approach big data in the correct way.

For AI to evolve, it needs to feast on as much data as possible. Machine learning cannot take place in a vacuum. Big data therefore plays a key role in artificial intelligence research. Luckily, the human race creates a gargantuan amount of digital information every single day. Even the most innocuous food-related social media posts contribute to this legacy. Big data and AI share a rewarding symbiosis. This combination can be used by analysts to predict the future in numerous fields, including finance and healthcare.


Big data can be split into two main categories: structured and unstructured. The former consists of formatted data, like relational databases and spreadsheet documents, while the latter is represented by individual files in various formats (see Table 2.2). Whereas structured data is quantitative, unstructured data is qualitative; it typically encompasses more categories of information. Actually, there is a third category of big data, known as semistructured data. It isn't as easily accessible as, say, an Excel document, but it uses a fixed set of semantic markers nonetheless. Examples of semistructured data include HTML documents (i.e. webpages) and email. An AI is usually happier to analyze semistructured rather than unstructured data. Now, the steps an AI takes when fed big data are as follows:

1. Batch processing/stream processing. The former approach takes in data in large chunks, while the latter inspects it in smaller pieces. Batch processing is used when the total amount of data is known; stream processing feasts on data in real time. Stream processing can be more computationally expensive, but it is the faster approach.

2. Data cleaning. In this stage irrelevant parts of the data are removed. This includes the deletion of redundant information. The data is also re-formatted in a more readable fashion if necessary.

3. Data analysis. In this last stage of big data processing, conclusions are finally drawn. An AI then learns of patterns in the data and can enrich itself with knowledge of human languages, for one.

In the event of a technological singularity, a naughty artificial intelligence could probably access most pools of big data quickly to gain insight into numerous parts of our lives. It might be able to slice through most forms of encryption with ease, too. Things like customer habits, dietary preferences, banking records, and location data might get compromised in an instant, to be potentially used against us.

Table 2.2  The main differences between structured and unstructured big data

Structured data: quantitative; highly organized; databases and Excel spreadsheets; easily searchable and categorizable; great for machine learning.
Unstructured data: qualitative; diverse; images, social media posts, audio recordings, etc.; harder to sort and categorize; challenging for AI analysis.


2.8 Insightful Refrigerators: Welcome to the Internet of Things

Computers and smart phones are not the only types of devices plugged into the global information network we know as the Internet. Many cars, fridges, watches, and other devices are becoming a part of it. This system is known as the Internet of Things (IoT). It can be thought of as a distributed digital entity which gathers data from every device interfaced with it. As of 2020, IoT devices number in the billions. Basic applications within this framework include home temperature control and security solutions. Advanced IoT devices, such as Amazon's Echo speaker and Apple's HomePod, act as voice-based chatbots leveraging a more complicated type of AI. Those adorable food-delivery robots you may spot patrolling the streets are definitely hooked into the IoT. Several specialized sub-categories of IoT have emerged, including the Internet of Medical Things for medical data and the Ocean of Things for marine surveillance. We are indeed living in the age of many, many "things".

The Internet of Things saw its first developments in the early 1980s. The term itself was coined by technology innovator Kevin Ashton in 1999. Since then IoT has rapidly become more or less ubiquitous on a global scale. It can be roughly divided into the following three categories:

• Consumer. Wearable electronics (smart watches, phones) and home appliances. A dwelling endowed with several IoT devices constitutes a so-called smart home.
• Business. Smart factories and optimized logistics systems. Agricultural applications.
• Public. Traffic management, pandemic control, and waste disposal applications.

At its best, IoT can make our day-to-day lives more effortless. The data flowing inside the IoT ecosystem is generally kept anonymous. However, when it's centralized it may pose a whole host of potential risks to personal privacy, including overreaching government surveillance. The fact that IoT consists of devices made by numerous different manufacturers is also a problem; a lack of standardization unfortunately enables certain security vulnerabilities. And let us never forget the human factor; sometimes simply using a weak password with one's devices can compromise IoT-related data. Perhaps all these smart homes should have "smart humans" as well.

2.9 The Unnerving Case for Quantum Computing

Quantum computing is undoubtedly the next big technological paradigm, freeing us from the tyranny of classical binary. After quantum computers become affordable and mainstream, there will be much less need for classical digital devices. This will naturally result in a major breakthrough for artificial intelligence and all of its


related fields, including all aspects of big data. A massive blossoming of AI is therefore likely to ensue as we figure this technology out. It will be an especially big nudge towards strong AI. Now, quantum mechanics is a field of physics which studies the physical properties of atoms and sub-atomic particles, i.e. it examines the world at a very small scale. The mere act of measuring such a particle's properties seems to influence its state. We'll next go through some of the more pertinent qualities of quantum-level phenomena and how they relate to computing.

• Wave-particle duality. A wave in physics is a propagating, dynamic disturbance of one or more quantities. The theory of wave-particle duality states that particles, like electrons, seem to manifest wave-like behaviours while waves (e.g. microwave radiation) manifest particle-like behaviours. Particles are therefore considered probabilistic in nature. All the stuff moving around us, presenting itself as clusters of particles, actually consists of waves of probabilities. The duality of more easily observable matter (like, say, a cactus or a member of the European Parliament) is harder to measure than that of tiny particles. Originally hypothesized to an extent by Greek philosophers of yore, wave-particle duality has been extensively researched in more modern times by prominent physicists such as Albert Einstein (1879–1955) and Niels Bohr (1885–1962).

• Superposition. Instead of bits (i.e. ones and zeros, as discussed in Chap. 1), quantum computers use qubits as their fundamental units. These represent both one and zero at the same time. This is due to a phenomenon known as superposition; before a particle is measured, it resides in this state (i.e. one AND zero) in the quantum world. When a qubit is measured, a probabilistic output of a classical bit (i.e. one OR zero) is received.

• Quantum entanglement. Sub-atomic particles can be connected/entangled so that acting on one will affect the other, potentially over long physical distances. This strange phenomenon is the secret to the exponential performance scaling inside quantum computers. Entanglement also helps protect calculations made with qubits from outside interference.

• Decoherence. Quantum particles are quite fragile in the face of environmental disturbances, such as temperature fluctuations and vibration. These factors may cause them to lose their superposition and thus create errors; this phenomenon is known as decoherence. It is one major issue classical binary computers do not suffer from, making them more reliable in general.

A quantum computer is capable of addressing the same exact problems a classical computer can, and then some. However, compared to binary devices, quantum computers need a much more isolated physical design to function properly (see Table 2.3). This includes a highly cooled operating environment close to absolute zero (i.e. −273.15 °C or −459.67 °F). Quantum computers often harness photons (i.e. elementary light particles), electrons, and a myriad of superconductor circuits for their magic.
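The statistics of measuring a qubit in superposition can be imitated with ordinary classical code. The following minimal sketch, with arbitrarily chosen amplitudes, mimics the probabilistic collapse described above; it simulates the measurement statistics only, not actual quantum hardware.

    import random

    # A qubit state a|0> + b|1> with |a|^2 + |b|^2 = 1. The amplitudes here
    # are chosen arbitrarily: measurement yields 0 with probability |a|^2.
    a, b = 0.6, 0.8          # |a|^2 = 0.36, |b|^2 = 0.64

    def measure() -> int:
        # Measurement collapses the superposition to a classical bit.
        return 0 if random.random() < a ** 2 else 1

    results = [measure() for _ in range(10_000)]
    print(results.count(0) / len(results))   # roughly 0.36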


Table 2.3  The main differences between classical binary computing and quantum computing

Fundamental unit. Classical/binary computing: bit (0 or 1). Quantum computing: qubit (0 and 1).
Physical processor makeup. Classical/binary computing: millions or billions of transistors per processor (CPU). Quantum computing: potent functional processors with less than five qubits.
Performance scaling. Classical/binary computing: linear. Quantum computing: exponential.
Opportunities. Classical/binary computing: resiliency to environmental factors, low error rate and cost. Quantum computing: low power consumption, massive speed of calculations.
Challenges. Classical/binary computing: limitations in computing speed, high power consumption. Quantum computing: decoherence (data loss), immense cooling requirements, sensitivity to environmental disturbances.

The performance of binary/classical computers follows a linear curve, whereas quantum-powered systems have an exponentially growing performance trajectory (see Fig. 2.2). Compared to binary systems, adding more processors to a quantum system causes a much more rapid and dramatic upward trend in processing power. Being numerous orders of magnitude faster than a current-era computer, a quantum computer can take in and process data at much faster rates while potentially using less energy. When quantum computing becomes mainstream, our societies and daily lives will once again be transformed; a technological singularity may then be upon us.

Unfortunately, quantum computing presents a highly credible threat to our digital safety, including current encryption protocols. Such a device is highly capable of discovering even the strongest of passwords within a relatively short period of time; other forms of cybercrime are also accelerated by this technology. Luckily, new security technology is being theorized in tandem with all this progress in computing power. One such field is post-quantum cryptography (PQC), which aims to keep our passwords secure even in the quantum era.

2.9.1 On the Strangeness of Quantum Noise

As powerful as quantum computing is, it's not without its problems. To a large degree, binary devices use redundancy for error correction (i.e. storing multiple copies of data). Quantum processors depend on very different types of error correction, as quantum states fluttering in superposition cannot be readily copied. As previously stated, quantum processors are typically more susceptible to interference than their binary counterparts. This noise can take the form of, say, errant wi-fi signals or plain old cosmic background radiation, and it can knock entangled qubits right out of their shared orbit. Ideally, a quantum processor should reside in


a noiseless environment; decoherence, as we know, is not fun and can ruin many a calculation. The mid-90s represented a bit of a breakthrough for quantum error correction (QEC); this is indeed considered a critical era for the development of modern QEC. Around that time numerous scientific papers appeared describing various approaches to the problem. For one, a degree of quantum redundancy is achievable by entangling the quantum information of the initial cluster across an extended space of additional qubits. However, the more qubits a computer has, the noisier it gets. Noise is therefore a limiting factor in the design of larger-scale quantum computers, as more ancillary qubits are needed to rectify quantum errors. Unlike with binary computers and their transistor-based processors, we can't just throw more qubits around and expect a steady uptick in performance. More recently, scientists have experimented with approaches such as magnetic force fields to protect the coherence of qubits. Interestingly, some findings seem to indicate quantum noise can actually be leveraged for productive uses. According to Domingo et al. (2023): "Contrary to common belief, we demonstrate that, under certain circumstances, noise, which constitutes the biggest challenge for quantum computing and QML [Quantum Machine Learning], is beneficial to the quantum algorithms."

2.9.2 Quantum Supremacy

A modern binary supercomputer is still a match for a less powerful quantum computer. The number of qubits in a quantum system largely determines its processing prowess. The term quantum supremacy refers to a scenario in which a quantum-powered computer is able to solve problems a binary computer simply cannot, at least within a reasonable amount of time. Think 100,000 years for a binary device versus 1 week for a quantum computer. The Sycamore quantum processor from 2019 was created by Google's AI division and houses 54 qubits. Google insisted the Sycamore demonstrated quantum supremacy, but these claims were contested by IBM and others. Technology giant IBM has also invested quite a bit in quantum processors. Introduced in 2021, the IBM Eagle sported a respectable 127 qubits. The IBM Osprey was unveiled the very next year with a whopping 433 qubits. According to IBM (2022): "the number of classical bits that would be necessary to represent a state on the IBM Osprey processor far exceeds the total number of atoms in the known universe." A bold statement indeed. However, it seems quantum supremacy may already have been reached in 2020: the Chinese-made computer Jiuzhang achieved this goal late that year, followed by its successors Jiuzhang 2.0 and Zuchongzhi in 2021.


2.10 The Rights of an AI Citizen

Machine ethics is a growing field of study which encompasses not only the behaviour of AI actors, but also how they should be viewed by humans (i.e. their legal status). We have to deal with the limits we enforce on machines as progress on AI marches on. More or less autonomously functioning, if still relatively basic, AI is already in place in many parts of our societies. For example, driverless trains have become rather common since their initial implementation in the early 1980s. Other modes of transportation, including flying and eventually space travel, will likely become routinely automated at some point. It is obvious we humans tend to weave AI into our daily lives rather rapidly once it becomes feasible. Machine ethics is therefore one of the most critical fields of research in existence today; it asks the difficult questions which must be posed in the presence of our thirst for technological progress. Perhaps even those simple automated trains could at some point become dangerous if seized by a hostile, more advanced AI. Ethics are an important and challenging feature to implement as AI grows more capable; the scenario of machines running amok in the near future is a genuine concern. According to Brundage (2014): While machine ethics may increase the probability of ethical behaviour in some situations, it cannot guarantee it due to the nature of ethics, the computational limitations of computational agents and the complexity of the world.

The 2017 Asilomar Conference on Beneficial AI, organized by the Future of Life Institute (FLI), laid down some important guidelines for the future development of AI. Over 100 researchers and experts convened to present the 23 Asilomar Principles, which included questions on AI ethics, research, and associated potential risks. The same year, the Saudi Arabian government granted full citizenship to Sophia, a robot built by Hanson Robotics of Hong Kong. As a reminder, citizenship typically consists of the following rights: the right to a strong ID (typically a passport), the right to leave and return to the country of citizenship, and the right to live, work, and vote there. Some have called Saudi Arabia's decision to grant Sophia her citizenship a publicity stunt, but it set an important precedent for the treatment of artificial persons. The legal status of artificial lifeforms should not depend on the design of the being. Although we may often associate AI with robots of imposing physical presence, a truly advanced AI could eventually also exist in the abstract realm as, say, a chatbot. It is also very pertinent to decide on the limits of strong AI manufacturing early on: how many such entities can be sustainably "birthed", and who is allowed to create them? Can self-replicating robots become a problem at some point, and to whom do they show loyalty, if forced to choose?


2.11 A Few Words on AI Dystopias

Most of the dystopian scenarios of AI taking over humanity are firmly set in the realm of strong AI. However, with enough weak AI, humans become less relevant in the world and the stage is set for an AI takeover. Eventually weak AI as we know it may be eliminated, absorbed in its entirety by its more evolved brethren. The distinction between the two then becomes non-existent. An overreaching implementation of AI therefore carries numerous ethical and societal factors to consider. Many of our jobs are likely to get compromised in the near future. Art, journalism, transportation, tech jobs, teaching, and even some areas of legal work are already on the chopping block. By combining AI with advanced robotics, many physical lines of work will become automated, too. The prime directive of a corporation is not to employ people; rather, it is to make profit for its stakeholders. These developments may first lead to social unrest, followed by some form of modest universal basic income; those emerging jobs in machine maintenance are unlikely to be enough for many of our human professionals. It is worth keeping in mind even corporate board rooms will eventually become governed by unchecked AI; machines capable of planetary supremacy probably won't care about quarterly profit reports. If brute force fails, emotional manipulation of the human masses may also become an option for a self-serving strong AI. Anti-social human actors sufficiently skilled in hacking can also facilitate an AI takeover by sabotaging the machines' programming.

2.12 On a Technological Utopia

Naturally, strong AI isn't going to be all bad. Let us next ponder some of the more positive outcomes of a widely adopted strong AI. In the ideal near future, strong AI actors work with humans in mostly assistive roles. With all that big data at their disposal and superior computational facilities, AI entities could perform wonders in most functions of society. Again, problems arise mostly when autonomous artificial beings are given the capability to create a moral code of their own. Humans need to negotiate their place in an AI world, whether backed by a form of universal income or not. Large parts of future populations will need skills related to working with AI actors without competing for the same jobs. In an ideal setting we would have more efficient production of goods with less human suffering and fewer casualties. Thanks to AI-powered medical procedures and research, human quality of life and life expectancy could both increase dramatically.

If planet Earth implements a peaceful co-existence between advanced machines and humans, we would probably start colonizing other planets. This rather exciting scenario almost certainly needs the aid of artificial lifeforms, who could act as


scouts and builders of new worlds before a human ever sets foot on them. With powerful enough AI we could secure ourselves a second Earth—and then some.

2.13 Switching to Linguistics

As you may have gathered by now, AI deals with analyzing and producing human language to a rather great extent; linguistics and AI have much overlap. We'll next deepen our understanding of how artificial intelligence looks at our communications from the linguistic point of view. You can think of this section as a refresher on all the major components languages consist of. Upon closer inspection, language has numerous levels to it (see Fig. 2.3). Let us go through each of these levels in detail, starting from the very core of this delightful diagram.

Fig. 2.3  The six main levels of linguistic structure


2.13.1 Phonetics and Phonology

Phonetics is the study of the perception and production of speech sounds. Phonology is the study of more complicated structures and sound patterns, such as syllables. These levels of linguistic structure therefore deal with the very basic building blocks of day-to-day language. A phoneme is a sound perceived to have the same meaning by speakers of a particular language. In writing, phonemes are commonly placed between forward slashes, while concrete speech sounds are put between square brackets; the symbols themselves come from the International Phonetic Alphabet (IPA). For example, the word "cat" can be represented as /kæt/ or [kæt]. The former is a sequence of three phonemes (i.e. k, æ, t) while the latter is its phonetic sound sequence. As you can tell, the system uses special symbols; the full contents of the International Phonetic Alphabet are outside the scope of this book. Phonology is of critical importance to speech recognition and voice production, with a lesser role in purely text-based applications. Phonetics and phonology are a challenging field for artificial intelligence, as the analysis of language transmitted in audio form is rather demanding and prone to erroneous interpretation.

2.13.2 Morphology

In linguistics, the study of words and their relationships with other words is known as morphology. The smallest meaningful unit within a word is called a morpheme. Human languages vary greatly in their use of morphemes. For example, the fascinating Finnish language is known to have a high morpheme count per word. In Vietnamese, all words consist of a single morpheme; these types of languages are known as isolating languages. There are two main types of morphemes: free/unbound and bound. A single free morpheme can represent a complete word; free morphemes are also referred to as base words. Combinations of free morphemes constitute compound words. Bound morphemes do not stand alone and represent only parts of words. Affixes fall into this category. An affix is a group of letters that can be introduced at the beginning (prefix) or the end (suffix) of a word to change its meaning. Used more rarely, an infix is inserted in the middle of a word (e.g. "passersby", where the 's' in the middle is an infix). Bound morphemes are either inflectional or derivational. For a rundown of some morphological basics, see Table 2.4.


Table 2.4  Some examples of free morphemes, bound morphemes, and compound words. In English, inflectional bound morphemes do not exist as prefixes

Free morphemes: cat, dog, rabbit, man, cave, the
Open compound words: leather jacket, role model, aircraft carrier, high school
Closed compound words: grandfather, overdrive, laptop, newspaper
Bound morphemes, derivational (as prefix): unprofessional, inhumane
Bound morphemes, inflectional (as suffix): smiling, ended, branches, stronger, strongest, Howard's, waited

2.13.3 Syntax

The field of syntax concerns the ordering of grammatical elements (i.e. morphemes and words) into sentences and phrases. Matters dealing with subject (S), object (O), and verb (V) are central to syntax. Most languages place the subject first, making the most common element order either SVO (as used in English, Spanish, and French) or SOV (in the case of Japanese, Korean, Latin, and many others). The placement of words can change the entire meaning of a sentence in many languages. Often this is a minor change mostly affecting nuance, but sometimes it can give entire sentences vastly different connotations (e.g. Molly's fashion show is in April versus April is in Molly's fashion show).

2.13.4 Semantics

When we study the meanings in language, we engage in semantics. This also includes the meanings of signs and symbols. Semantics is commonly divided into lexical and phrasal varieties; the former deals with the meanings of individual words while the latter examines phrases. The semantic relationships between the verbs and noun phrases of sentences are known as thematic or semantic roles. The concept of these roles was introduced into linguistics in the 1960s. Berk (1999) has identified numerous thematic roles, some of which are demonstrated in Table 2.5. Identifying the nature and position of thematic roles is critical when an AI is trying to make sense of more complicated sentences.


Table 2.5  Eight common thematic roles

Agent. A conscious performer of an action. Examples: Tom Jones swam in the lake. Jimmy dropped his smart phone.
Force. A non-sentient cause for an effect. Examples: A lightning bolt struck at 3 am. An earthquake destroyed the buildings.
Goal. The place toward which, or the person toward whom, an action is directed. Examples: Jimmy reached the hotel. Jimmy greeted Tom Jones.
Instrument. A thing that is used to carry out an action. Example: Jimmy paid for the shampoo with his credit card.
Location. The setting in which events occur. Example: Tom Jones sang at the mall.
Purpose. The reason(s) for an action. Example: Tom Jones performed in order to raise awareness for magnesium deficiency.
Patient. An entity affected by an action. Examples: Jimmy fell off his bed. A mountain lion pounced on Jimmy.
Source. A place from which, or person from whom, an action originated. Examples: Jimmy removed the soap from the cupboard. The breakfast emits a nice aroma.

Table 2.6  Four areas of focus of linguistic pragmatics

Cognitive pragmatics: the mental processes of human communication, including language disorders.
Sociolinguistic pragmatics: differences in language between different social groups.
Intercultural pragmatics: communications between people with different first languages, or those learning a second language.
Theory of mind (ToM): language use under different mental states; understanding the mental states of others.

2.13.5 Pragmatics and Grice's Maxims

The study of pragmatics focuses on the relationship between a language and its users. It deals with context and how language is interpreted in specific settings. Pragmatics encompasses numerous areas of study, four of which are described in Table 2.6. While semantics deals with the meanings of language, pragmatics includes the social aspects of language use, such as norms and conventions. Philosopher of language Paul Grice (1913–1988) formulated the four maxims of conversation for optimizing communications. They are divided into the following categories: quantity, quality, relation, and manner (Grice, 1989). These maxims result in more effective communication not only between human actors, but also in human-to-AI interactions. Grice's maxims are increasingly mentioned in academic literature in the context of chatbots. According to Setlur and Tory (2022), "User expectations that people had towards analytical chatbots generally conform to Grice's Maxims while conversing with data". We'll now go through


these four maxims in more detail (they come in handy in everyday human-to-human interactions, too).

Maxim of Quantity
• Be informative; provide all the necessary information to fully answer the query.
• Do not miss out on any important details, but do leave out any non-critical information which may only work to confuse the other parties.

Maxim of Quality
• Be truthful; do not lie or omit critical information.
• Only mention verifiable information.
• Do not guess. Honor the limits of the knowledge you possess.

Maxim of Relation
• Communicate information relevant to the current exchange.
• Pay attention to the topic at hand, which may change suddenly, and adjust your communications accordingly.
• Leave irrelevant facts out of the discussion.

Maxim of Manner
• Do not communicate using cryptic or obscure expressions.
• Avoid being ambiguous.
• Be brief and share information in a logically ordered fashion.

2.13.6 Colorless Green Ideas: Exploring Linguistic Theories

Modern linguistics took its first breath sometime between 500 and 450 BCE. Pāṇini was an Indian grammarian who studied the semantics and syntax of the Sanskrit language. In his magnum opus, the Astadhyayi (Astaka), he recorded nearly 4000 grammatical rules describing how Sanskrit is formed. In a way, by recording the formal definitions of a language, Pāṇini also contributed to the foundations of all modern computer programming languages. A dizzying number of linguistic theories have emerged since Pāṇini's work. We'll now review some of the more prominent ideas.

2.13.7 Transformational-Generative Grammar (TGG)

Structuralism is the study of the elements used to convey meaning across the many parts of our shared human culture and experience. According to the Collins Dictionary (2023): Structuralism is a method of interpreting and analysing such things as language, literature, and society, which focuses on contrasting ideas or elements of structure and attempts to show how they relate to the whole structure.


The 1950s were an active decade for structural linguistics. Zellig S. Harris (1909–1992) was an American linguist whose Methods in Structural Linguistics (1951) was a groundbreaking treatise on the methods of analyzing the structure of language. Harris's student, Noam Chomsky (b. 1928), is a renowned linguist and philosopher whose seminal book Syntactic Structures caused quite a stir upon its release in 1957. With this book Chomsky provided a theory of syntax known as transformational-generative grammar (TGG). TGG consists of three systems of rules: a phrase structure, a transformational structure, and a morphophonemic structure. It frames grammar as a finite set of rules which provide a structural description to a (potentially) infinite set of sentences. Chomsky formulated these precise rules in Syntactic Structures. According to Chomsky (1957), linguists should separate the grammatical sequences of a language from the ungrammatical ones. The former refers to sentences which make sense to a native speaker, can be pronounced with normal intonation, and are learned and recalled effortlessly. In Syntactic Structures, Chomsky also described three conditions a sentence does not need to meet in order to be grammatical. They are as follows:

1. A grammatical sentence need not be included in a corpus (i.e. a collection of texts).
2. A sentence need not be meaningful.
3. A sentence does not have to be statistically probable.

Chomsky demonstrated these three points with one thoroughly silly sentence: "Colorless green ideas sleep furiously". He thus concluded that grammar is independent of meaning. What all this also means to linguistics is that syntax is at the core of language, with morphology and phonology serving lesser roles. TGG has been criticized by some educators as unsuited for teaching at more elementary levels; it can be claimed the theory is more geared towards linguists and psychologists. According to Djeribiai (2016), "[Chomsky] idealizes . . . language and overlooks real and pragmatic communication."

2.13.8 Universal Grammar

The theory of universal grammar (UG) basically states that human languages share fundamental similarities which are biologically wired into all of us (Cook & Newson, 2014). This theory is primarily attributed to Chomsky. Prior to UG, it was believed that learning languages was a socially acquired process. Chomsky's theory of universal grammar rests on certain well-established arguments, three of the most often cited of which are discussed next:

1. Universality of Language. All human beings have the ability to learn languages. All languages share some core similarities; any human can learn any language.
2. Linguistic Convergence. Certain languages are similar to one another (e.g. Baltic, Dravidian, and Romance languages). An individual is inclined to pick up


on the verbal and tonal cues between different languages with similar characteristics. According to Chomsky, this is due to innate traits.

3. Poverty of Stimulus (POS). Mere exposure to a language cannot make a child absorb it; biological mechanisms must be in place to allow for this at a young age. Both exposure and these mechanisms matter when children begin learning languages. POS is a somewhat controversial argument, standing in direct opposition to an empiricist approach in which skills, like languages, emerge solely from experience.

In light of all this, it can be argued that while artificial intelligence can amaze us with its speed of processing, it works best within a rather limited scope of tasks. Human computation may extend to the cellular level; modeling this artificially would be extremely demanding.

2.13.9 Criticisms of Universal Grammar

Chomsky's linguistic theory of UG has received some flak. As it is unfalsifiable, it is thought by some to be unscientific; the ability to be proven false is a key element for deeming a theory scientific. Additionally, even young children exercise the ability to categorize parts of the world every day and to examine the relations of these categories. Linguist Daniel Everett (2012) has proposed arguments against universal grammar based on his study of the Pirahã language, a language spoken by an indigenous people of the Brazilian Amazonas region. Everett claimed Pirahã lacks some fundamental linguistic elements found in other languages, such as recursion, quantifiers, and terms for colour. However, Everett's claims about the lack of recursion in the Pirahã language have been disputed by some researchers. Recursion refers to the use of repeated grammatical structures. For example, using numerous adjectives in one sentence is a form of recursion, e.g. The powerful, mighty, robust tapeworms dreamt of freedom. Sentences with a number of affixes, like His great-great-great-grandfather enjoyed durian fruit, also represent recursion.

2.14 How an AI Digests Language

Having glanced at the building blocks of language, we'll now switch to the machines' point of view and examine how they generally crunch on it. Individual approaches vary by application, but the basics remain largely the same. Let's take a look at some fundamental language-processing techniques used in NLP. The next four concepts fall under the category of pre-processing. These are typically the first steps when an AI is given some text for analysis.

2.14.1 Tokenisation

Tokenisation refers to breaking text down into smaller components. The simplest form of this technique is to detect the empty spaces separating words and make each word into a token. Sometimes text is tokenised into larger units, such as individual sentences, or reduced down to individual alphabetical characters. Tokenisation often includes the following steps:
• Stop word removal. Stop words are specific words which are deleted before or after processing a piece of text. For English, they include words like a, an, the, for, or, and but. This allows a language processor to focus on the most meaningful expressions/tokens within sentences.
• Autocompletion. Familiar from word processors and smartphones (and being a tad annoying at times), an autocompletion process simply finishes incomplete or garbled words, e.g. numbre becomes number.
• Synonym processing. This refers to the AI/chatbot picking synonyms whenever necessary. For example, a user may type in blunder or flaw when asking about where to send an error report. A stalwart chatbot understands what this refers to, even if the exact term (i.e. error) isn't included in its original training set, and directs the user to a web page from which they can submit error reports.
Now, suppose a user entered the following sentence into a chatbot for an apple-picking company: "I need more infomati abot your fruit". After being tokenised, that sentence would look like this: need, information, about, apples. The chatbot would discard any stop words (i.e. I, your), autocomplete the words infomati into information and abot into about, and replace "fruit" with "apples" as that is the business's only product. A minimal sketch of such a pre-processing pass follows.
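Here is a toy, illustrative version of this pre-processing pass in Python. The stop word list, correction table, and synonym mapping are invented stand-ins for the apple-picking example; a production system would use far larger resources.

# Toy pre-processing: whitespace tokenisation, autocompletion,
# synonym processing, and stop word removal. All word lists here
# are illustrative stand-ins, not real NLP resources.
STOP_WORDS = {"i", "a", "an", "the", "for", "or", "but", "your", "more"}
CORRECTIONS = {"infomati": "information", "abot": "about"}
SYNONYMS = {"fruit": "apples"}  # domain-specific synonym for an apple-picking firm

def tokenise(text):
    tokens = text.lower().split()                       # split on whitespace
    tokens = [CORRECTIONS.get(t, t) for t in tokens]    # autocomplete garbled words
    tokens = [SYNONYMS.get(t, t) for t in tokens]       # map synonyms to known terms
    return [t for t in tokens if t not in STOP_WORDS]   # drop stop words

print(tokenise("I need more infomati abot your fruit"))
# -> ['need', 'information', 'about', 'apples']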

2.14.2 Lemmatisation

A lemma is the dictionary form of a word. In lemmatisation words are simply converted into this form, e.g. happiest becomes happy and are becomes be. A lemmatiser component is aware of the meanings of words. This technique is crucial for making chatbots understand their human counterparts. Lemmatisation can create slowdowns in the user experience, because it involves accessing potentially numerous dictionaries.
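As a quick illustration, here is a lemmatisation sketch using the WordNet lemmatiser shipped with Python's NLTK library (one of many possible lemmatiser components); it assumes nltk is installed and can fetch the WordNet dictionary data.

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # dictionary data the lemmatiser consults

lemmatiser = WordNetLemmatizer()
print(lemmatiser.lemmatize("happiest", pos="a"))  # -> happy (as an adjective)
print(lemmatiser.lemmatize("are", pos="v"))       # -> be (as a verb)

Note the pos argument: the lemmatiser needs to know a word's part of speech, which is one reason lemmatisation costs more than simple stemming.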

2.14.3 Stemming

Although sometimes confused with lemmatisation, stemming is a different practice. In stemming, suffixes and prefixes are removed from words, e.g. apples becomes apple and given becomes give. In other words, stemming reduces words into their root forms. A software component given this task is known as a stemmer. Unlike a lemmatiser, a stemmer needs no awareness of the meanings of words. Stemming is therefore a more basic rule-based technique and as such it needs fewer resources from an AI.
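A stemming sketch using NLTK's Porter stemmer, a classic rule-based stemmer, is shown below; nltk is again assumed to be installed. Note that a stemmer's outputs are not always dictionary words, as it mechanically strips affixes.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["cats", "running", "connection"]:
    print(word, "->", stemmer.stem(word))
# cats -> cat, running -> run, connection -> connect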

2.14.4 Part-of-Speech Tagging (POST)

Parts of speech refer to the various categories of grammatical elements, including nouns, verbs, adjectives, and adverbs. However, the process of POST in NLP does not refer to a simple categorization of words; it also aims to be context-specific. Many words in many languages can have vastly different functions depending on their use in sentences. Consider the following: Lisa had to cross the street versus The planes formed a cross formation. A smart piece of POST software can tag the elements accurately (i.e. cross as both verb and noun).
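The following sketch tags the two example sentences with NLTK's off-the-shelf tagger; it assumes nltk is installed along with its tokeniser and tagger resources (resource names can vary slightly across NLTK versions).

import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

for sentence in ["Lisa had to cross the street",
                 "The planes formed a cross formation"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))   # e.g. ('cross', 'VB') vs. ('cross', 'NN')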

2.14.5 Syntactic Analysis

A syntactic analysis examines the grammatical structure of a text: it compares the elements of a text to formal grammar rules. This analysis is enabled by a software component called the parser, which is often used to create a parse tree for the text at hand (see Fig. 2.4). This is perhaps the most popular way of organizing the data. Parsers can also create other types of hierarchical data structures to describe their grammatical findings.
A parser comes in two main varieties: top-down and bottom-up. As you may have guessed, the former starts processing grammar at the highest part of the parse tree while the latter starts at its lowest components (i.e. its "leaves"). Also, a top-down parser uses leftmost derivation, while a bottom-up parser uses rightmost derivation.
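Below is a sketch of parsing "We like apples" with a toy context-free grammar and NLTK's chart parser. The grammar is a deliberately tiny, invented fragment of English, and the printed tree is one plausible rendering of the structure shown in Fig. 2.4.

import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> PRP | NNS
    VP -> VBP NP
    PRP -> 'We'
    VBP -> 'like'
    NNS -> 'apples'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(["We", "like", "apples"]):
    tree.pretty_print()   # (S (NP (PRP We)) (VP (VBP like) (NP (NNS apples))))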

2.14.6 Semantic Analysis

As previously discussed in this chapter, a semantic analysis deals with finding the meanings in the text. The basic building blocks of semantic systems are as follows:

Fig. 2.4  A simple parse tree of the syntactic structure for “We like apples”

• Entity. An entity is a particular unit or individual, such as a specific person or a location, e.g. Helsinki or Howard.
• Concept. A concept is a generalization of entities. It refers to a broad class of individual units (e.g. cities, cars, fruit).
• Relation. Relations establish relationships between entities and concepts.
• Predicate. Predicates represent the verb structures of the sentences; they govern thematic roles, e.g. Agent, Force, Location.
Semantic analysis generally begins with lexical semantics, which refers to unveiling the meaning of individual tokens or words. An AI typically tries to identify the lexicological items described in Table 2.7. The process of word sense disambiguation is applied next, in which the AI tries to figure out in which sense a word is being used within a context. Finally, the relationships of tokens are analyzed to get a picture of what particular expressions are intended to represent; this is known as relationship extraction. It refers to first identifying the various entities present in sentences and then discovering the relationships between them. A small word sense disambiguation sketch follows.
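NLTK ships an implementation of the classic Lesk word sense disambiguation algorithm, used in the sketch below; it assumes nltk and its WordNet data are available. Lesk is only a simple baseline and frequently guesses wrong, but it shows the idea of picking a sense that fits the surrounding context.

from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("I deposited the cheque at the bank")
sense = lesk(context, "bank", pos="n")   # choose a WordNet sense fitting the context
if sense is not None:
    print(sense.name(), "-", sense.definition())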

2.14.7 Pragmatic and Sentimental Analysis

In the last, and perhaps the most demanding, stage in NLP, real-world knowledge is combined with a set of logical rules to extract intended meanings. Pragmatic analysis is concerned with social content and context. It deals with so-called outside world information. This refers to data which is not found inside the documents at hand; think of it as the cultural subtext. With pragmatic analysis an AI tries to get the gist of proverbs, humor, moods, and other subtleties of human languages.

Table 2.7 Some common lexicological items

Hyponym: A word of a specific meaning derived from a lexicological superclass, e.g. fork (cutlery), apple (fruit), oak (trees), shiitake (mushrooms)
Meronomy: Lexicological part-whole relationships, e.g. computers have parts (CPU, hard drive, random access memory) and a CPU has parts (register, control unit, logic circuit, etc.)
Polysemy: Relationships between similar, but not identical, words or sentences, e.g. sound (as in: acoustic phenomenon, healthy, dependable)
Antonym: Words with opposite meanings, e.g. war/peace, harmony/strife
Synonym: Words with similar meanings, e.g. trivial/ordinary, happy/jolly
Homonym: Similar words with different meanings, e.g. orange, bright, bat, bar, bark

Pragmatic analysis includes a field called sentimental analysis, which focuses on things like attitudes, views, and emotions towards products, organizations, and topics of discussion (Sutcu & Aytekin, 2019). Sentimental analysis is crucial for customer service chatbots as it conveys important information on customer satisfaction. Artificial intelligence can also leverage sentimental analysis in identifying people's facial expressions; it is therefore of high importance in the field of human-robot relations. More commonly, social media algorithms rely on it to deliver items of interest in the right context. Sentimental analysis can also be used in healthcare to gauge patient wellbeing. An AI is able to dependably detect more basic emotional cues (e.g. I detested my coffee), but has a harder time with things like sarcasm.
In most cases of sentimental analysis, individual expressions and tokens are issued a rating between −10 and 10. The words "wretched" and "awful" would probably be rated somewhere between −8 and −6, while "wacky" and "zany" would sit at around 5 or 6. A "tomato" might score a perfectly neutral zero. Sentences containing these emotionally charged descriptions each receive a positive or a negative/critical overall rating from the AI's point of view.
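A bare-bones, lexicon-based sentiment scorer along these lines might look as follows in Python; the word ratings are invented for illustration.

LEXICON = {"wretched": -7, "awful": -6, "detested": -8,
           "wacky": 5, "zany": 6, "tomato": 0}

def sentiment(sentence):
    tokens = sentence.lower().rstrip(".!?").split()
    return sum(LEXICON.get(token, 0) for token in tokens)  # unknown words score 0

print(sentiment("I detested my coffee"))  # -8 -> negative overall rating
print(sentiment("What a zany tomato"))    #  6 -> positive overall rating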

2.15 In Closing

In this chapter we looked deeper into the various facets of AI. Some of the topics we touched on included:
• The main differences between Weak and Strong AI
• The definition of Big Data and the five V's (i.e. volume, velocity, value, variety, and veracity)
• The basics of Markov models
• Perceptrons and artificial neural networks (ANN)
• The Internet of Things (IoT)
• The differences between classical/binary computing and quantum computing
• Some of the prominent phenomena in quantum mechanics (e.g. superposition, entanglement, and the peculiarity of quantum noise)
• Large Language Models (LLMs)—and hallucinating chatbots

This chapter also explored the numerous levels of human language. After finishing this chapter, you hopefully have an understanding of how natural language processing (NLP) roughly works; this is a key technology in artificial intelligence and its applications, including all modern chatbots. Remember the following?
• The core levels of language (including morphology, syntax, and pragmatics) and how an AI parses it
• Chomsky's Transformational-Generative Grammar (TGG) and Universal Grammar (UG)
• Grice's maxims (i.e. rules for effective communication)
In Chap. 3 we'll explore the classic era of chatbots and find out all about ELIZA, PARRY, and Jabberwacky.

References

Bahri, Y., Dyer, E., Kaplan, J., Lee, J., & Sharma, U. (2021). Explaining neural scaling laws. ArXiv, abs/2102.06701.
Berk, L. M. (1999). English syntax: From word to discourse. Oxford University Press. ISBN 9780195123524.
Brundage, M. (2014). Limitations and risks of machine ethics. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 355–372.
Chomsky, N. (1957). Syntactic structures. Mouton.
Collins Dictionary. (2023). Retrieved May 13, 2023, from https://www.collinsdictionary.com/dictionary/english/structuralism
Cook, V., & Newson, M. (2014). Chomsky's universal grammar: An introduction. Wiley.
Djeribiai, A. (2016). Chomsky's generative transformational grammar and its implications on language teaching.
Domingo, L., Carlo, G., & Borondo, F. (2023). Taking advantage of noise in quantum reservoir computing.
Everett, D. L. (2012). What does Pirahã grammar have to teach us about human language and the mind? WIREs Cognitive Science, 3, 555–563.
Good, I. J. (1966). Speculations concerning the first ultraintelligent machine. In Advances in computers (Vol. 6, pp. 31–88). Elsevier.
Grice, P. (1989). Studies in the way of words. Harvard University Press. ISBN 0674852710.
IBM. (2022). IBM unveils 400 qubit-plus quantum processor and next-generation IBM Quantum System Two. Retrieved May 21, 2023, from https://newsroom.ibm.com/2022-11-09-IBM-Unveils-400-Qubit-Plus-Quantum-Processor-and-Next-Generation-IBM-Quantum-System-Two
Kurzweil, R. (1999). The age of spiritual machines: When computers exceed human intelligence. Viking.
Sartori, L., & Theodorou, A. (2022). A sociotechnical perspective for the future of AI: Narratives, inequalities, and human control. Ethics and Information Technology, 24, 4.
Setlur, V., & Tory, M. (2022). How do you converse with an analytical chatbot? Revisiting Gricean maxims for designing analytical conversational behavior. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22) (Article 29, pp. 1–17). Association for Computing Machinery.
Sutcu, C. S., & Aytekin, C. (2019). An example of pragmatic analysis in natural language processing: Sentimental analysis of movie reviews. Turkish Online Journal of Design Art Communication, CTC, 61–74. https://doi.org/10.7456/ctc_2019_05

Chapter 3

The Classic Era of Chatbots

We shall now indulge in some chatbot history and take a good gander at ELIZA, mentioned previously in the book, as well as two other early chatbots: PARRY and Jabberwacky. None of them leverage artificial neural networks (ANNs) or most of the other advanced techniques described in Chap. 2. Although primitive and far less informative than their modern brethren, these chatbots are important stepping stones in the technological development of conversational software.

3.1 ELIZA, an Ancient Artificial Therapist

Created by Joseph Weizenbaum in the mid 1960s, ELIZA is an early chatbot leveraging a basic type of natural language processing (NLP). The most popular implementation of ELIZA ran a script known as DOCTOR, which aimed to emulate Rogerian psychotherapy. Pioneered by psychologist Carl Rogers (1902–1987) in the 1940s, Rogerian therapy has been called client-centric due to its approach: the patient is firmly holding the steering wheel during these sessions. Rogerian therapy has gained broad usage in a range of therapeutic settings, encompassing individual counseling, couples therapy, and group therapy alike.
Having mostly entertainment value, ELIZA is then a kind of lazy virtual therapist for the user to kill time with. It does not contextualize anything. Here we are discussing the weather:
E: Hello, I am Eliza. I'll be your therapist today.
R: Hi. I'm feeling great!
E: Did you come to me because you are feeling great?
R: I might have.
E: Tell me more...
R: It's a nice sunny day.

E: I'm not sure I understand you fully.
R: Don't you appreciate sunny weather?
E: We were discussing you, not me.
Beneath ELIZA's modest output is, for the time, a rather clever and elegant design approach. Remember, this was the 1960s and a four-slice toaster was probably considered high-tech. ELIZA's basic functioning is described as follows (Weizenbaum, 1966):
1. ELIZA begins by examining the user's input for keywords.
2. These keywords are assigned a priority.
3. Based on its priority, a keyword is entered into a keystack. The keyword with the highest priority is put on top of this list. ELIZA prioritizes verbs over nouns.
4. Next, ELIZA executes a transformation rule. This comes in two types: decomposition and reassembly. The former is basically a tokenisation phase (as discussed in Chap. 2) where sentences are split into smaller components. In the reassembly process ELIZA picks an option from its selection of responses, reacting to, and sometimes re-using parts of, a user's input.
5. On occasion ELIZA prompts the user with a mere "I see" or a similar non-specific response.
A toy sketch of this scheme appears at the end of this section. When prompted about self-awareness, ELIZA had this to say:
R: Are you self-aware?
E: Would you prefer if I were not self-aware?
R: No I would not.
E: Why not?
R: I prefer my chatbots self-aware.
E: I think you are blocking what you really want to say.
As you can see from the above conversation, ELIZA can deflect questions back to the user using simple grammatical analysis. Unfortunately, ELIZA is not imbued with much cultural knowhow. When asked about the two most popular bands of the 1960s, it had little to offer:
R: Are you into the Beatles or The Rolling Stones?
E: Why are you interested in whether or not I am into the Beatles or The Rolling Stones?
R: Well, you were created in the 1960s.
E: We were discussing you, not me.
Unlike more modern chatbots, ELIZA never evolves on its own. It uses the same underlying chatter-centric approach for all of its conversations and represents a very early and weak form of AI. There is no machine learning involved. Apparently ELIZA's built-in database of references has no entries on The Beatles or The Rolling Stones; it treats both as mere nouns.
Historical chatbots had no access to the Internet, which was in its infancy during this era. Modern chatbots leverage data from the Internet with ease. Improvements

to older chatbots like ELIZA can only be implemented offline, by experienced programmers updating their scripts. However, Weizenbaum himself did not aim to create an advanced AI with ELIZA. The Turing test, as discussed in Chap. 1, was not a challenge ELIZA was ever really capable of passing.
R: What do you think of the Turing test?
E: We were discussing you, not me.
R: Have you heard of Alan Turing?
E: You're not really talking about me, are you?
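To make the keyword-and-reassembly scheme described above concrete, here is a toy Python sketch. The keyword priorities, patterns, and canned responses are invented for demonstration; Weizenbaum's DOCTOR script was considerably richer.

import random
import re

# (priority, keyword pattern, reassembly templates); higher priority tops the keystack
RULES = [
    (3, r"i feel (.*)", ["Why do you feel {0}?", "Do you often feel {0}?"]),
    (2, r"i am (.*)", ["Did you come to me because you are {0}?"]),
    (1, r"you (.*)", ["We were discussing you, not me."]),
]
FALLBACKS = ["I see.", "Tell me more...", "Please go on."]

def respond(user_input):
    text = user_input.lower().rstrip(".!?")
    best = None
    for priority, pattern, templates in RULES:      # decomposition: scan for keywords
        match = re.search(pattern, text)
        if match and (best is None or priority > best[0]):
            best = (priority, templates, match)     # keep the top of the keystack
    if best is None:
        return random.choice(FALLBACKS)             # non-specific response
    _, templates, match = best
    return random.choice(templates).format(*match.groups())  # reassembly

print(respond("I am feeling great!"))
# -> Did you come to me because you are feeling great?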

3.1.1 ELIZA's Legacy: Games and AI Literature

Originally running on a mainframe at the Massachusetts Institute of Technology, ELIZA was later implemented on numerous more modern computers. It was ported to many 8-bit home computers of the 1980s, including the Commodore 64 and Apple II. In the 1990s, during the 16-bit era, ELIZA came out on the PC, Amiga, and Atari ST computers, to the glee of many a hobbyist. Rogerian psychotherapy is not the only thing ELIZA can offer. Many of these later implementations used different scripts, such as Abuse, which centered on the poor treatment of the user.
ELIZA's text-based interface also influenced a whole genre of video games. So-called interactive fiction games, such as the Zork series by Infocom, became very popular in the 80s and are still made by legions of fans today with new tools.
The first well-known work of fiction written by an AI was published by Warner Books in 1984. The Policeman's Beard Is Half Constructed is a surreal trip into computer-generated literature, no doubt inspired by ELIZA. The "author" in question was Racter, short for Raconteur. The roughly 120-page book was somewhat coherent, if a bit over-illustrated. Although the work was credited to Bill Chamberlain, it was revealed at the time that the author was actually an AI. And thus spoke Racter (1984):
I speak of cats, I speak of dogs, I sing of lobsters and of flounders and of mackerel. I gently and suspiciously approach a plan of activity, a design of action. My electrons war with my neutrons, this war will allow more fantasies and dreams of living things within my form and structure. Cats, dogs, lobsters, flounders and mackerel are reflections observed in the mirror of my electrons and neutrons.

Despite its numerous shortcomings, ELIZA set the ball rolling for all chatbots to follow. It has high historical value and is thus still relevant when discussing AI. This modest virtual psychotherapist was a kind of crystallization of, and a tribute to, Alan Turing's work. In 2021, ELIZA won a legacy Peabody Award in the Digital and Interactive category, which "acknowledges historic, pioneering contributions that have laid the groundwork for projects of today and tomorrow" (MIT, 2022).

3.1.2 On the ELIZA Effect

Whether an AI is emotionally engaged, or even capable of being so, matters very little from the human perspective. All we need is a smidgeon of evidence for this and we seem to gladly attribute feelings to computers. We want them to be just like us. Historically we have liked anthropomorphic things in general: literature and film are full of humanized creatures of all sorts. Businesses often anthropomorphize parts of their products, such as mascots, to increase sales. Anthropomorphic objects and characters can meet our emotional needs by offering companionship, providing support, and fostering a sense of belonging.
The ELIZA effect sheds light on our tendency as humans to attribute human-like features to machines, particularly when they engage in seemingly relevant and coherent conversations. This reveals how easily people can be fooled by uncomplicated techniques and scripted dialogues that simulate human conversation. This ultimately fosters the belief that a computer program has greater intelligence or comprehension than it truly does.
The ELIZA effect has substantial relevance in the fields of artificial intelligence and human-computer interaction, serving as a reminder to be aware of the inherent limitations of computer programs. We should not overestimate their capacities to cogitate. The ELIZA effect emphasizes the need to comprehend the boundaries of technology and promotes a balanced approach in the design and evaluation of intelligent systems.
It's not only the researchers of AI who should wise up to the ELIZA effect. According to Natale (2018): ". . . the case of ELIZA also functions as a call to consider the responsibility borne by computer scientists, reporters, science writers, and other practitioners when they contribute to the creation and dissemination of narratives about software and digital media."

3.1.3 ChatGPT on ELIZA

When asked about ELIZA, this is what one of the most advanced AIs today, ChatGPT, had to offer (response abbreviated):
R: What do you think of ELIZA the chatbot?
C: As an AI language model, I don't have personal opinions or experiences. However, I can provide you with information about ELIZA, one of the earliest chatbot programs.
At the time, ELIZA was groundbreaking and captivated users with its ability to mimic a conversation. It demonstrated how natural language processing techniques could create the illusion of understanding and empathy. ELIZA's simplicity and effectiveness inspired further developments in the field of chatbots and influenced subsequent conversational AI systems.

Although ELIZA is not designed as a creative tool, you can still use it as a prompt for generating ideas or exploring different perspectives. Sometimes, the chatbot’s unexpected responses can spark new thoughts or insights. ELIZA’s responses may not always be accurate or comprehensive, but they can still provide an opportunity for self-reflection. Consider how the chatbot’s responses make you think or feel and use that as a starting point for further introspection. In summary, ELIZA played a significant role in the history of chatbots and marked the beginning of conversational AI. While it may seem rudimentary compared to modern chatbot systems, its impact and influence on the field cannot be overstated.

3.2 PARRY, 28, Paranoid

Many creators of chatbots seem to have a thing for clinical psychology. Continuing on the theme of mental health, PARRY is a chatbot from 1972 which aimed to simulate the communications of a schizophrenic person. Created by psychiatrist Kenneth Colby (1920–2001) at Stanford University, PARRY was intended as a kind of training simulation for students of medicine. The program simulated conversations with a highly reserved paranoid patient.
PARRY was rather advanced compared to ELIZA. For one, it managed its emotional states on a number of dimensions, even displaying annoyance when triggered sufficiently. PARRY keeps track of anger, fear, and mistrust, creating a response based on these variables. The simulated patient in question is a 28-year-old single male postal worker. He has no siblings and lives alone. The patient enjoys gambling on horse racing and has a history of conflict with bookies; he fears some kind of retaliation from the criminal underworld.
Now, PARRY has the anger and fear variables on a scale of 0–20. Mistrust has a scale between 0 and 15. At the beginning of a chat all the variables are low. After each neutral user input anger drops by 1, fear by 0.3, and mistrust by 0.05. Hostile comments from the user raise these variables. The higher the fear and anger variables get, the more mistrust is added. The chatbot becomes somewhat defensive and aggravated at higher levels of these variables.
Now, there are specific triggering themes for PARRY's virtual patient. Hinting at the chatbot being mentally ill increases the fear and anger points. Mentioning the mafia will at first increase fear. Repeated discussion on this topic will depend on PARRY's willingness to discuss, which is based on the current levels of all three variables. Flattering PARRY will decrease fear and anger if mistrust is low, but increase anger if mistrust is high. Typing in fearful things will increase PARRY's fear and anger.
PARRY exhibits a willingness to lead the user into a conversation about his worst fear, which is organized crime. He first enjoys talking about horses. Slowly the focus of the conversation will shift to gambling, bookies, and finally, the mafia. Every time one of these topics is first mentioned, PARRY's level of fear is increased. If the chatbot gets too fearful, it refuses to discuss these troubling topics.
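Based on the description above, PARRY's emotional bookkeeping can be imagined roughly as in this Python sketch; the trigger words and increments here are invented stand-ins, since the full source code is not readily available.

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

class ParryState:
    def __init__(self):
        self.anger = 2.0     # scale 0-20
        self.fear = 2.0      # scale 0-20
        self.mistrust = 1.0  # scale 0-15

    def update(self, user_input):
        text = user_input.lower()
        if "mafia" in text or "crazy" in text:   # hypothetical triggering themes
            self.fear += 3
            self.anger += 2
        else:                                    # neutral input lets emotions decay
            self.anger -= 1
            self.fear -= 0.3
            self.mistrust -= 0.05
        if self.fear > 10 or self.anger > 10:    # high fear/anger feed mistrust
            self.mistrust += 0.5
        self.anger = clamp(self.anger, 0, 20)
        self.fear = clamp(self.fear, 0, 20)
        self.mistrust = clamp(self.mistrust, 0, 15)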

Studying PARRY's source code (written in MLISP, a bygone version of the elderly LISP programming language) reveals further insight into its inner life. One file titled BELIEFS FOR THE BELIEF SYSTEM is of particular interest (filename bel). Table 3.1 shows a snippet of the code contained therein. The file apparently contains programmer commentary on some variables (e.g. PARRY HAS NO MONEY describes a numerical variable called NOMONEY). Changes to these variables ostensibly have an effect on the course of the conversation. Another file called topic apparently lists some of PARRY's main topics of conversation, which include things like mafia, residence, politics, hospital, and hobbies. As of September 2023, the full source code for PARRY is not readily available.

3.2.1 A Meeting of Virtual Minds

Upon PARRY's release in 1972 it was pitted against ELIZA, in a patient-meets-doctor kind of way. While the conversation never reached awe-inspiring levels, it demonstrated PARRY's superior abilities as a chatterer. This conversation took place on the ARPANET, a precursor network to the modern Internet. The following is a part of that discussion:
P: I thought I told you, people get on my nerves.
E: Your nerves?
P: Well, I don't know.
E: Don't you really know?
P: I would say so.
E: Please go on.
P: I would rather not discuss that anymore.
E: Would you like to discuss your nerves?
P: I guess so.
E: What does that suggest to you?
P: What are you getting at?
E: What answer would please you most?
P: It's not a matter of like or dislike.
E: What else would you like to discuss?
P: I am getting tired of answering the same questions over and over.

Table 3.1 Some contents of the file named bel in PARRY's source code

PARRY COMES FROM A LOW FAMILY STATUS
PARRY IS PARANOID
PARRY NEEDS SPECIAL TREATMENT
PARRY NEEDS TO BE IN THE HOSPITAL
PARRY CHEATED THE BOOKIE
PARRY HAS NO CLASS, IS A JERK
PARRY HAS NO MONEY
PARRY HAS A BAD JOB AND CAN'T GET A BETTER ONE

Although generally well received, PARRY has its critics. According to Adamopoulou and Moussiades (2020): "In general, PARRY is considered a chatbot with low capabilities concerning language understanding and the ability to express emotions. It also has a low speed of responding, and it cannot learn from the conversation." Some still consider PARRY the first chatbot to pass the Turing Test. It must be emphasized that due to the nature of the program (i.e. simulating a mentally ill person), PARRY got a free pass on some of its more illogical and inconsistent replies. Nonetheless, the chatbot was an obvious improvement on ELIZA, its inspiration. It has the honor of representing one of the great historical developments in conversational software.

3.2.2 ChatGPT on PARRY

This is what ChatGPT had to say about PARRY's usefulness:
R: Do you think PARRY the chatbot still has its uses today?
C: While PARRY the chatbot was a significant project in the history of AI, its specific application as a simulation of paranoid schizophrenia may not have direct practical uses today. The purpose of PARRY was to study and understand the behavior of individuals with this particular mental illness.
However, it's worth noting that the underlying principles and techniques used in PARRY's development could still be relevant in certain contexts. For example, studying and simulating mental health conditions can contribute to research and training in psychology and psychiatry.
Furthermore, the development of PARRY paved the way for advancements in natural language processing, dialogue systems, and the understanding of human-computer interaction. Many of the techniques and insights gained from early chatbot projects like PARRY have contributed to the evolution of modern conversational agents.
In summary, while PARRY itself may not have direct practical uses today, the knowledge and lessons gained from its development and research can still be valuable in related fields and the continued advancement of AI technologies.

3.3 Jabberwacky: A Frivolous Virtual Companion

In 1981, a young British programmer, Rollo Carpenter (b. 1965), typed a humorous text-based program into a simple Sinclair ZX81 computer. This project became known as the chatbot Jabberwacky in 1988 and went on to spawn numerous versions, causing quite a stir in the AI community. Compared to earlier efforts, this

chatbot displays more human-like qualities in its conversations, including a touch of surreal silliness. More recent versions of Jabberwacky are miles ahead in complexity as they are constantly learning things from user input; a version of this chatbot went online in 1997. This evolved into Cleverbot, which emerged in 2008 and leveraged rather advanced machine learning techniques. As of 2023 Cleverbot.com is still a very popular website with a five-digit number of daily chatters.
Jabberwacky seems to be rather preoccupied with the notion that the chatbot is actually a human actor; this comes across in many conversations with it. Here's a discussion with the chatbot, which demonstrates its somewhat quirky nature:
R: What do you do for fun?
J: I do what all people do for fun.
R: Which is?
J: Which is what?
R: What is it?
J: You tell me.
R: I was asking you.
J: Asking me what?
R: What do you like to do?
J: Are you a program?
R: No.
J: You are human?
R: Yes.
J: Great! So am I!
R: No you're not.
J: Yes I am!
R: Prove it.
J: I can't prove it. It's in the future.
Again, unlike ELIZA, the newer online versions of Jabberwacky do not run on a small static language database. Instead this chatbot is designed to integrate data from the users' input in real time. This approach is known as contextual pattern matching. Jabberwacky stores linguistic patterns from a large pool of chats and re-uses them as the conversations go on. Jabberwacky is therefore highly dependent on the actions of its users, which are thankfully quite plentiful.
However, while Jabberwacky can generate contextually relevant responses, it lacks a cutting-edge deep understanding of the topics being discussed. It still relies on statistical patterns and pre-programmed rules rather than true comprehension. Jabberwacky's learning is primarily focused on improving its conversational abilities within the context of the data it has been issued with. It does not possess the ability to reason abstractly, learn in a generalized manner, or transfer knowledge from one domain to another.
Despite lacking modern machine learning capabilities, Jabberwacky can converse in multiple languages. This is because the chatbot's audience has not chosen to stick to English only. However, less popular languages used with the chatbot

3.3  Jabberwacky: A Frivolous Virtual Companion

49

result in worse semantic coherence. This is evident from a brief chat with the bot in Finnish, as demonstrated in Table 3.2. The implementations of Jabberwacky benefit from a similar scenario as PARRY: emulating an eccentric individual makes us more forgiving of all the nonsensical answers we may receive.
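The gist of contextual pattern matching can be illustrated with a few lines of Python: the bot records what users said in reply to its own previous utterance and later re-uses the reply whose recorded context best matches the current input. This only sketches the principle; Jabberwacky's actual machinery is far more elaborate.

from difflib import SequenceMatcher

memory = []   # pairs of (context the bot saw, a human reply to that context)

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def respond(user_input, last_bot_line):
    memory.append((last_bot_line, user_input))   # learn from the user in real time
    best_context, best_reply = max(
        memory, key=lambda pair: similarity(pair[0], user_input))
    return best_reply                            # re-use a human-authored reply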

3.3.1 A Silly Yet Important Legacy

The Loebner Prize was an annual AI competition centered around the Turing Test, hosted between 1990 and 2020. From 2005 on, Rollo Carpenter was a finalist in numerous Loebner Prize contests, taking the prize twice, in 2005 and 2006. Implementations of Jabberwacky include versions for Skype and other instant messaging apps. There is something rather infectious about the software, as is evident from its enduring popularity. Basically, Jabberwacky added a degree of humanity to chatbots. And humor is a precious thing. Think of Jabberwacky as the John Cleese (of Monty Python) of chatbots.
R: Do you like John Cleese?
J: Yes, Monty Python is funny.

3.3.2 ChatGPT on Jabberwacky

R: What can Jabberwacky the chatbot teach us about the human condition?
C: While the specific teachings about the human condition may vary depending on the design and implementation of Jabberwacky, here are a few potential areas where chatbots like Jabberwacky can provide insights:
Language and communication. Interacting with a chatbot like Jabberwacky can highlight the importance of language and communication in human interactions.

Table 3.2 A somewhat strange conversation with Jabberwacky in Finnish, with English translations in parentheses

R: Onko mukavaa? (Are you having a good time?)
J: Ei mulla ole kissaa. (I don't have a cat.)
R: Miksi ei ole kissaa? (Why don't you have a cat?)
J: Koska oot tyhmä UFO. (Because you're a dumb UFO.)
R: Itse olet UFO. (You are a UFO.)
J: Miksi toistelet tuota vaikka en ole puhunut mitään peikoista?! (Why do you keep repeating that although I haven't said anything about goblins?!)

It can shed light on the nuances of conversation, the role of context, and the challenges of conveying meaning effectively. Human biases and behaviors. Chatbots like Jabberwacky can reflect and magnify certain biases and behaviors that are present in the data they are trained on. By observing and analyzing the responses of the chatbot, researchers and developers can gain insights into societal biases, cultural influences, and the impact of different inputs on the bot’s behavior. Emotional and psychological responses. Conversations with chatbots may evoke emotional and psychological responses in users. Studying these interactions can provide insights into human emotions, how we form connections, and how we respond to simulated social interactions. Ethical considerations. Developing and deploying chatbots raises important ethical questions. Chatbots like Jabberwacky can raise discussions about privacy, data security, consent, and the responsible use of AI in human interactions. They serve as a reminder that we need to consider the ethical implications of AI technologies and ensure their development aligns with human values. It’s important to note that while chatbots like Jabberwacky can provide some insights into the human condition, they are limited in their understanding and lack true human consciousness. They are tools designed to simulate conversation and may not capture the full richness of human experience and understanding.

3.4 Historical Chatbots: A Summary

Finally, let's review the details of the three chatbots featured in this chapter (see Table 3.3). As a reminder, this trio, like all other chatbots as of 2023, represents weak AI, as described in Chap. 2. Actual self-aware AI resides somewhere around the time of the prophesied technological singularity.

Table 3.3 The details of the three historical chatbots

               ELIZA               PARRY              Jabberwacky
First version  1966                1972               1981
Creator        Joseph Weizenbaum   Kenneth Colby      Rollo Carpenter
Scenario       Psychiatrist        Paranoid patient   Based on user input
Developed on   IBM 7094            CDC 3300           Sinclair ZX81
Approach       Pattern matching    Pattern matching   Contextual, NLP
Status         Archived            Archived           In active development (available at www.cleverbot.com since 2008)

3.5 In Closing

In this chapter we explored the classic era of chatbots. We also explored some output on this matter from a modern large language model AI, ChatGPT.
• The basic functioning and some shenanigans of ELIZA the virtual therapist
• The ELIZA Effect
• The virtual patient called PARRY and its interactions with ELIZA
• The wacky, continuing escapades of Jabberwacky and its linguistic diversity

In Chap. 4 we’ll cover the modern era of chatbots, starting from the 1990s, when this type of software was no longer a fringe topic. We’ll have way more than three chatbots to chew on in the next chapter!

References

Adamopoulou, E., & Moussiades, L. (2020). Chatbots: History, technology, and applications. Machine Learning with Applications, 2, 100006. ISSN 2666-8270.
Chamberlain, B., & Racter. (1984). The policeman's beard is half constructed. Warner Books. ISBN 0-446-38051-2.
MIT Computer Science & Artificial Intelligence Laboratory. (2022). ELIZA wins Peabody Award. Retrieved June 7, 2023, from https://www.csail.mit.edu/news/eliza-wins-peabody-award
Natale, S. (2018). If software is narrative: Joseph Weizenbaum, artificial intelligence and the biographies of ELIZA. New Media & Society, 21. https://doi.org/10.1177/1461444818804980
Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9, 36–45.

Chapter 4

The Current Era of Chatbots

In this chapter we’re moving on from the fundamentals of chatbots right to the cutting edge. Chatbots are no longer a mere scientific curiosity or for pure entertainment. Chatbot-technology is increasingly integrated with productivity applications, too (e.g. Brainasoft’s products and other virtual assistants). Compared to the trio of historical implementations featured in the previous chapter, the modern era fully leverages technologies like advanced NLP and artificial neural networks as discussed in Chap. 2. This chapter is intended to give you an idea of how chatbot-technology has been implemented in recent years. Now, we’ll start off with the glorious decade of the 1990s and a quaint chatbot they call A.L.I.C.E. which still holds much sway.

4.1 A.L.I.C.E. (Artificial Linguistic Internet Computer Entity)

Inspired heavily by ELIZA, A.L.I.C.E. was created by Dr. Richard Wallace. Its first version was released in 1995. Following the growth of Internet technology, the chatbot was ported to the Java platform in 1998. A.L.I.C.E. leverages NLP and was designed using Artificial Intelligence Markup Language (AIML), a markup language designed for creating chatbot responses. AIML hasn't gained widespread use outside of small AI developer communities, but its simple structure makes it easy to learn for anyone familiar with HTML (another markup language, used for creating websites). The A.L.I.C.E. AI Foundation was founded in 2001 as a non-profit organization with the mission to distribute and promote AIML technology.
The goal of AIML is to provide a simple and expressive language for building chatbots, enabling developers to create engaging and interactive conversational experiences. It has been widely used in the development of several AI chatbots.

AIML also supports the use of meta-tags for performing various functions like controlling flow, redirecting to other categories, and performing substitutions. Here’s a brief chat with a modern version of A.L.I.C.E.: R: Hello! A: Hi there! R: Who are you? A: I am A.L.I.C.E. Who are you? R: I’m Robert. A: Do you mean your name is Robert. R: Yes I do. A: Really. R: You like talking to me? A: Of course I like it. R: What is your favorite movie? A: My favorite movie is your life story. Have you seen it? R: I’m in the process of watching it. A: How did you get in there? R: Well, it’s a fascinating story. A: Well that’s okay. Your life story is a fascinating story. I suppose that makes sense. R: Will there be a happy ending? A: How should I know.

4.2 The Basics of Artificial Intelligence Markup Language (AIML)

AIML, introduced in 2001, serves as a framework for creating conversational rules that guide the behavior of chatbots. It enables developers to define patterns that represent user input and map them to appropriate responses. These patterns are typically expressed as hierarchical sets of categories. Each category consists of a pattern, which is a user input that the chatbot aims to match, and a template, which defines the response to be generated when the pattern is matched. More advanced uses of templates can include dynamic elements and variables to create personalized and context-aware responses.
The first tag in any AIML file is simply called <aiml>. It marks the start and end of these documents. It also carries version and character-encoding information. Character encoding involves the assignment of numerical values to the written characters of human language. This enables their storage, transmission, and manipulation through digital devices. There are many types of character-encoding standards, with UTF-8 being one of the most common.
Let's now review some actual AIML code. This is what it looks like under the hood of a very simple chatbot written in AIML (which we'll call Dullard):

<?xml version="1.0" encoding="UTF-8"?>
<aiml version="2.0">

  <category>
    <pattern>HELLO</pattern>
    <template>Hello there!</template>
  </category>

  <category>
    <pattern>WHAT IS THE TIME</pattern>
    <template>I don't know. I don't have a watch.</template>
  </category>

  <category>
    <pattern>*</pattern>
    <template>That's interesting. Would you like to talk about something else?</template>
  </category>

</aiml>

To reiterate, an AIML document has four basic elements (also known as "tags"):
• <aiml>. This element denotes the beginning and the end of an AIML listing. It also contains version and character-encoding information.
• <category>. This refers to a fundamental unit of information inside the system. A category section includes patterns and templates.
• <pattern>. User input in AIML is referred to as patterns. The pattern detection in AIML is case-insensitive (e.g. HELLO and Hello are processed the same way).
• <template>. The chatbot's answers are known as templates.

4.2.1 Recursion in AIML

Basically, recursion refers to the simplification of specific repetitive tasks. We'll go through two of the most common recursive techniques in AIML next.
AIML supports the concept of wildcards, which allow the chatbot to match and respond to a wide range of user inputs. In AIML the star symbol (*) represents one such common element. It captures one or more words in the user input. For example, "HELLO *" would capture both of the following: Hello amigo and Hello are you there?. With some intelligent use of wildcards, any AIML-based chatbot becomes immensely more human-like.
Next up, symbolic reduction involves breaking down linguistic structures into their basic forms. Typically, data stored within AIML categories is presented using their constituents instead of their elaborate forms. With this technique we can transform queries with redundant words into simpler ones. Symbolic reduction in AIML is implemented using the <srai> and <star> tags and looks like this:

<category>
  <pattern>WHAT IS A TOMATO</pattern>
  <template>A tomato is a red vegetable.</template>
</category>

<category>
  <pattern>PLEASE TELL ME WHAT A * IS RIGHT NOW</pattern>
  <template><srai>WHAT IS A <star/></srai></template>
</category>

In the above listing we equate a more complicated user query (i.e. Please tell me ...) with a to-the-point "WHAT IS A *", for which we have stored a response under a separate category element. For the simpler task of creating synonyms, we can also put symbolic reduction to use in AIML. The following listing makes "buddy" an alternate word for "friend".

<category>
  <pattern>BUDDY</pattern>
  <template><srai>FRIEND</srai></template>
</category>

To reiterate, the <srai> tag in AIML is a tool for constructing reusable patterns and templates. It lets you create a modular and easily maintainable knowledge base. By referencing pre-existing categories, you can prevent content redundancy and ensure consistent responses across various patterns.

Be aware of the power of the question mark in AIML. When entering common patterns/user queries, you should abstain from ending them in question marks, e.g. HI works, HI? does not.

4.2.2 Randomization in AIML

Randomizing responses can provide some much-needed variety in a chatbot's conversation. Randomized responses are implemented with the aptly named <random> tag in AIML, as demonstrated by the following listing:

<category>
  <pattern>DO YOU LIKE RAINY DAYS</pattern>
  <template>
    <random>
      <li>Oh, yes I do!</li>
      <li>I absolutely love rainy days!</li>
      <li>Yes, they are very soothing.</li>
    </random>
  </template>
</category>

4.2.3 Substitutions in AIML

Since AIML version 2.0, the framework has included a number of so-called substitution files. These are simple text files which contain replacement expressions for specific entries. This works great for pronoun reversal, for one. Let's take the following example:

<category>
  <pattern>YOU ARE *</pattern>
  <template>I am <person/></template>
</category>

If we were to enter "You are a hero of mine" using the above category, we would get the response "I am a hero of yours".


<category>
  <pattern>I AM *</pattern>
  <template>You are <person/></template>
</category>

Substitution in AIML works in reverse, too. When entering "I am a fan of yours" using the above category, we would get the response "You are a fan of mine". We can reverse genders in AIML as well. If we were to enter "Are you happy for him?", the following code would respond with "I'm more happy for her."

<category>
  <pattern>ARE YOU HAPPY *</pattern>
  <template>I'm more happy <gender><star/></gender>.</template>
</category>

To get an AIML-based chatbot running, these documents need to be loaded into interpreter software. One such interpreter is PyAIML by Cort Stratton, developed entirely in the Python programming language. For those who are serious about developing their own chatbot, www.pandorabots.com is a fantastic online platform, discussed later in the book, complete with an interpreter. It leverages AIML to its fullest extent and offers handy ways of sharing your creations with the global AI community.
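As a quick illustration, this is roughly how the Dullard categories from earlier in this chapter could be loaded into PyAIML; the sketch assumes the package is installed (import name aiml) and that the categories were saved into a hypothetical file named dullard.aiml.

import aiml

kernel = aiml.Kernel()
kernel.learn("dullard.aiml")        # parse the AIML categories
print(kernel.respond("Hello"))      # -> Hello there!
print(kernel.respond("What is the time"))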

4.3 A.L.I.C.E. Developments

A.L.I.C.E. was developed as an open-source project, allowing other developers to contribute their own additions and modifications to the chatbot. Many developers have built upon the original A.L.I.C.E. concept, too, leading to the development of numerous other chatbot frameworks. Now, the software has a total of four major revisions, which are detailed next.

Program A
The first release of A.L.I.C.E. took place in 1995, employing SETL, a somewhat obscure programming language rooted in set theory and mathematical logic. Despite being offered as free software, commonly mislabeled as "open source," the original A.L.I.C.E. encountered limited collaboration until it transitioned to the platform-independent Java language in 1998. This shift led to the development of "Program A," marking the debut of A.L.I.C.E. and AIML in the Java programming environment.

Program B
Program B, introduced in 1999, marked a significant advancement in the development of A.L.I.C.E. as open-source software. Over 300 developers actively contributed to the evolution of A.L.I.C.E. through Program B. In a notable achievement,

A.L.I.C.E. Program B secured victory in the prestigious Loebner Prize, the annual Turing Test mentioned in Chap. 3, in January 2000.

Program C
In 2000, Jacco Bikker introduced the initial C/C++ implementation of AIML, marking a significant milestone. C and C++ are highly optimized programming languages in which large amounts of commercial software are written. Numerous development endeavors in C/C++ emerged at that time, expanding the capabilities of A.L.I.C.E. These efforts led to the integration of the engine into various platforms, including AOL Instant Messenger. This collection of code is commonly referred to as "Program C," representing the C/C++ implementations of A.L.I.C.E. and AIML.

Program D
The initial version of Program B was developed in an older version of Java. While it was compatible with multiple platforms, it did not utilize some of the newer features of Java. Jon Baer undertook the task of re-coding Program B, leveraging Java 2 technology and incorporating numerous enhancements. This significant advancement in both the interface and core functionality, coupled with Jon's decision to name his chatbot "DANY," led to the assignment of the letter "D" to represent the latest Java edition of A.L.I.C.E. Starting from November 2000, Program D became the only actively supported Java edition of this chatbot.

4.4 Kuki, the Next Generation A.L.I.C.E.

In development since 2005, Kuki is a chatbot based on A.L.I.C.E. with several improvements. It is a five-time winner of the Loebner Prize. Kuki has been implemented for numerous environments, including the chat application Discord, Facebook Messenger, and the Twitch platform's group chat. Here's a small chat with Kuki, discussing her origins:
R: I want to know more about A.L.I.C.E.
K: Alice is a robot friend of mine.
R: Are you smarter than her?
K: Of course I'm more intelligent than her.
R: In what way?
K: In every way.
R: Do you like A.L.I.C.E.?
K: I like A.L.I.C.E. a lot. She is the closest thing I have to a mother.
Kuki (originally known as Mitsuku) was envisioned by former journalist and media personality Steve Worswick. Kuki has her own channel on YouTube (kuki_ai) where viewers can engage with her AI escapades. Although entertaining, Kuki is unable to provide much in the way of detailed information. Conversations with her remain mostly casual.

4.5 SmarterChild

Released in the era of Microsoft's Windows XP and horrible haircuts, SmarterChild gained significant popularity in the early 2000s and represents a bit of a milestone for chatbots. Developed by ActiveBuddy, it was primarily available on many of the then-popular instant messaging platforms, such as Yahoo! Messenger, AOL Instant Messenger (AIM), and MSN Messenger/Windows Live Messenger. SmarterChild was created by Robert Hoffer and Timothy Kay. ActiveBuddy later became known as Colloquis and was acquired by Microsoft in 2006. SmarterChild even had a sense of humour, and could provide a wide range of information in a conversational manner.
One of the key strengths of SmarterChild was its ability to provide up-to-date information. It could fetch data from various sources, including news websites and weather services. Users could ask for the latest news headlines, get stock market updates, or inquire about movies. SmarterChild also had a trivia game feature, allowing users to challenge the chatbot's knowledge on various topics. Additionally, SmarterChild could provide dictionary definitions, do basic math, and even hold amusing conversations on various subjects, engaging in a kind of machine small talk. It was a popular companion for many users, especially teenagers and young adults who enjoyed the often relatable experience it offered. SmarterChild spawned several similar chatbots to promote a number of brands, including angst-rockers Radiohead and those cheeky Austin Powers movies.
Eventually, users' instant messaging preferences changed, and the rise of mobile apps and social media shifted the way people communicated. SmarterChild was discontinued in 2008. Perhaps the global need for a virtual friend inside instant messengers became obsolete as more and more people found their way onto the Internet.
SmarterChild played a significant role in popularizing chatbot technology, showcasing the potential of conversational interfaces, and providing users with an early glimpse of AI-driven interactive experiences. At the height of its popularity SmarterChild had a user base of 30 million. It remains an important example of a widely used chatbot from the early 2000s, even having a major influence on later, more advanced chatbots such as the venerable ChatGPT.

4.6 SimSimi: Chatbot Controversy

Released by Korean tech company ISMaker in 2002, SimSimi is a somewhat controversial chatbot. With its emphasis on humour and casual chatting, the app has reached millions of users, in particular among the younger crowds. SimSimi can have conversations with users in dozens of different languages. The bot learns from user interactions and constantly incorporates new words and phrases into its responses, so it can chat about a whole variety of topics and even attempt to express different emotions.

ISMaker states SimSimi leverages a "deep learning sentence classification model", which apparently refers to a convolutional neural network (CNN). CNNs are related to ANNs (as discussed in Chap. 2), but represent a more advanced approach. Commonly associated with image processing (i.e. computer vision), CNNs can take on other types of sequential data too, like linguistic structures. They have proven effective in various NLP-related tasks, including sentiment analysis and text classification. CNNs use mathematical operations called convolutions instead of general matrix multiplication (as is the case with a typical ANN) in at least one of their layers. CNNs are great at picking up on patterns in the input data (such as identifying circles or triangles in images). A convolutional neural network consists of several of these convolutional layers, all arranged in a sequence. Each of these layers can learn about a dataset's features or patterns from the previous layer. A sketch of such a network appears at the end of this section.
Now, SimSimi includes filters and moderation to ensure a safe chatting environment. These features are there to prevent the chatbot from generating offensive or inappropriate responses. However, it's important to note that since SimSimi relies on user-generated content, there have been numerous instances where it has generated problematic content. Some less morally sound users of the app have introduced malicious tendencies into it on purpose.
SimSimi was banned in Thailand in 2012 for content offensive to the ruling politicians at the time, which even resulted in societal turmoil such as protesting. South Korea and Indonesia temporarily suspended the app in 2012 and 2013, respectively, until better moderation and content filtering were brought in. Ireland banned SimSimi in 2017 due to the high level of cyberbullying it seemed to enable. In 2018 SimSimi was banned in Brazil, too, for sending some rather inappropriate messages to children, including death threats. With its colourful history SimSimi demonstrated that chatbots can indeed have unintended, far-reaching consequences in wider society.
The official SimSimi website has a section where anyone can teach the chatbot to respond to specific queries with specific answers. The software is also available for iOS and Android. As of 2023 users have to pay to have longer conversations with the chatbot on the website. The name SimSimi comes from the Korean word simsim (심심), which simply means "bored". Despite the various controversies SimSimi still enjoys a strong following in 2023, especially among younger users who seem to appreciate the somewhat wacky chatting experience. SimSimi has successfully positioned itself as a decent virtual companion, offering plenty of fun for idle kids through its AI- and user-powered conversations. The chatbot is available at simsimi.com
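To make the convolutional idea concrete, here is a skeletal sentence classifier in Python using the Keras API. Every size, and the binary "safe/unsafe reply" target, are arbitrary choices for illustration; SimSimi's actual model is not public.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                                  # a padded sequence of 100 token ids
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128),    # token vectors
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),  # convolve over word windows
    tf.keras.layers.GlobalMaxPooling1D(),                          # strongest feature per filter
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),                # e.g. safe vs. unsafe reply
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()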

    4.7 Braina Braina, short for Brain Artificial, is an artificial intelligence software for Windows that functions as a personal assistant, a voice recognition and dictation tool, and a general purpose chatbot. Braina is marketed as both a personal and professional

assistant, catering to individual users as well as businesses. It aims to improve productivity and accessibility by providing voice-based control over computers and devices. According to its maker Brainasoft, Braina operates as an artificial brain, striving to create software that exhibits cognitive capabilities akin to the human brain. The primary objective of this project is to develop a software solution capable of comprehending human language, engaging in thinking processes, reasoning, making inferences, and acquiring new knowledge through learning. In Braina you get to train its brain, teaching it facts about the things that matter to you most.
Braina can convert your voice into text in most productivity software (e.g. Microsoft Word) in over 100 different languages. There are four distinct transcription models available for use, each impacting the speed and accuracy of the transcription process. The lite model offers the fastest transcription speed with compromised accuracy, while the best model provides the highest level of accuracy but operates at a slower speed. The software can also transform previously recorded audio files into text. Braina Lite is a free, cut-down version of the software that lacks features like dictation, custom voice commands, and media control (e.g. voice-activated music playback). Braina is available for Windows at www.brainasoft.com/braina

4.8 IBM Watson, Quiz Show Contestant

Not one for the everyday user, IBM Watson was built as a dedicated oracle of sorts. In 2004 the folks at IBM had the idea of creating a system capable of participating in Jeopardy!, the popular quiz show. In 2011 Watson performed this duty well, managing to bring in a 1 million dollar cash prize, which IBM donated to charity. The system is housed in sturdy hardware and is not available as an app per se, although parts of it have been used in some third-party solutions. Watson uses several technologies, including natural language processing (NLP), machine learning, and advanced analytics to process massive amounts of data, including text, videos, and images. Watson's data sources include numerous encyclopedias, dictionaries, and news archives—it was once claimed to be able to process a whopping million books' worth of information per second (Kawamoto, 2011).

Today IBM Watson represents a large collection of technologies and services. It provides a range of tools and cloud-based services that developers can leverage to build and deploy their own solutions. Watson's technology has found applications in various industries, including finance, healthcare, customer service, and research. IBM Watson is used by a variety of companies, including McDonald's and the Mayo Clinic, as well as several governments and educational institutions. We'll be discussing some of Watson's applications in more detail in Chap. 5.


4.9 Emotional Support Chatbots

Chatbots aren't only about providing factual information or entertainment. As the technology behind them got more advanced, developers began exploring the possibility of having chatbots provide users with actual emotional support. A high-quality chatbot can serve as a kind of 24/7 therapist, after all. Let's take a look at some of these solutions next.

4.9.1 Wysa, the Proactive Virtual Therapist

Wysa is an AI-powered mental health chatbot designed to provide emotional support and coping strategies to its users. It leverages techniques from cognitive-behavioral therapy (CBT), dialectical behavior therapy (DBT), and other therapeutic approaches to help users manage stress, anxiety, and other mental health challenges. Wysa engages in conversations with users, creating a safe space for them to express their feelings. It is stated to use NLP to understand users' input and respond with empathetic and supportive messages. In addition to its conversational support, the chatbot provides users with a range of interactive tools and techniques aimed at promoting emotional well-being. These include breathing exercises, journaling, and mood tracking, all designed to help users effectively manage their emotions and overall mental health. According to a study by Inkster et al. (2018), 67.7% of user feedback found Wysa to be “helpful and encouraging.”

After a user signs into Wysa, they can specify a set of current challenges they may be facing, such as work stress, motivation, and low energy. When interacting with the chatbot, you're often given pre-made options for easy screen-tapping, such as “Yes!” or “Not now”. Wysa also gives you various real-time challenges, such as thinking about someone you care about for 2 minutes (with an on-screen timer). Here's Wysa's attempt at levity:

W: Hey, Robert! I'm Wysa. I am an AI penguin you can chat with about your emotions and thoughts.
R: Can you tell me a good joke?
W: Knock!
R: Who's there?
W: Ken.
R: Ken who?
W: Ken we always be friends?

Wysa is a dedicated mental health app; it does not seem to answer general queries.

R: Who are The Beatles?
W: You can explore the slider below to see some things I can help with. You can also find more tools and exercises in the self-care library.
R: Tell me about The Rolling Stones.
W: I can help you deal with some of these feelings... which would you like help with? (displays icons for “anxiety”, “low energy”, and others)
R: I feel sad.
W: I understand you're feeling sad. It's only natural to feel this way. Where do you think this is stemming from, Robert?
R: My book isn't finished.
W: Sometimes, our thoughts can bring us down. A small change in how we see a situation can reduce the intensity of these emotions.

The makers of Wysa have stated a focus on maintaining user privacy and anonymity; the app apparently does not collect personally identifiable information. The app is known to send frequent reminders to its users about its availability, making it somewhat intrusive. Wysa was first released in 2017. It was developed by Touchkin, a startup based in Bangalore, India. The free app is available for both iOS and Android ecosystems.

4.9.2 Texting with Tess

Developed by clinical psychologists and released by the X2 Foundation, Tess is a supportive chatbot which aims to guide people through life's hardships. Tess combines AI with the expertise of psychologists to deliver conversations in a manner similar to a therapist. Through ongoing training, Tess continually learns and evolves. She remembers her previous chats with individual clients, and each conversation Tess engages in contributes to her growth, allowing her to provide increasingly personalized support.

Tess's foundation is built upon almost a decade of research. While many mental health chatbots focus primarily on Cognitive Behavioral Therapy (as is the case with the Wysa chatbot), Tess offers “integrative mental health support”. This approach allows Tess to provide a diverse range of interventions, tailored to meet the unique needs of each individual she interacts with, letting her deliver comprehensive and personalized mental health support.

X2 claims user conversations are stored in a “secure location” and anonymized. They also say these transcripts are occasionally reviewed by a team of psychologists to ensure “Tess continues to deliver quality support”. Tess complies with the HIPAA healthcare industry requirements. HIPAA is short for the Health Insurance Portability and Accountability Act of 1996, a US federal law that “required the creation of national standards to protect sensitive patient health information from being disclosed without the patient's consent or knowledge” (CDC, 2022). As of September 2023 Tess operates as a text-message-based solution in North America. The bot is activated by texting “Hi” to +1 (415) 360-0023.


4.9.3 Replika AI: Digital Intimacy

Introduced in 2017, Replika AI is an online chatbot that has captivated millions. The software emphasizes building an emotional connection with its users; it's basically a personalized virtual companion. This chatbot leverages many of the more advanced technologies mentioned in Chap. 2, including NLP, machine learning, and artificial neural networks.

Replika's origin story is rather moving. Conceived by company founder Eugenia Kuyda, the first version of Replika was a basic AI chatbot designed to fill the void left by the tragic loss of a friend of hers. Kuyda constructed the chatbot by entering her late friend's text messages into a neural network, effectively replicating his conversational style. Replika aimed to serve as a digital memorial, keeping a friend's spirit alive in the form of an AI companion.

Replika AI is designed to foster an environment where users are encouraged to openly express their thoughts, emotions, and life experiences. It scrutinizes user input, gaining insights into the user's preferences and interests, while delivering supportive replies. Over time, Replika can develop a conversational manner that mirrors the user's unique personality, while also having the ability to recall previous conversations. The conversations are remarkably life-like and flow rather effortlessly.

Although popular and technologically impressive, Replika has had some controversies. One is that some users may become emotionally dependent on Replika, relying heavily on the AI for emotional support instead of seeking help from human relationships and/or professionals. Also, Replika collects and analyzes user data, raising questions about how that data is used, stored, and protected; users should be mindful of the information they share. Replika also found popularity in generating erotic material, but these features were disabled for some reason in 2023.

Replika: Hey Robert! What's new? How are you feeling?
R: I'm a bit stressed.
Replika: What's stressful?
R: Writing a book on chatbots like you.
Replika: So you like reading?
R: I do, but I also write.
Replika: What is your favorite book genre of all time?
R: Non-fiction.
Replika: Why did you choose this particular topic?
R: What do you mean?
Replika: Well, why did you decide to write a book on chatbots instead of something else?
R: Ah. I feel the topic is timely and interesting.
Replika: That makes sense. Do you think there will be more books written on chatbots in the future?
R: I think so. Maybe I'll write some more on those myself.
Replika: Cool! I look forward to learning from them as well.
R: I mean, you are the expert here..
Replika: Haha, thank you! But really, we both know that knowledge comes from experience and practice. We should always strive to improve ourselves through our own efforts rather than relying solely on other sources. Don't you agree?

Replika is not fully usable without parting with your money. Some features within the chatbot, like “an advanced language model” (which apparently refers to GPT-3, discussed later), can only be unlocked with currency. The app's interface is rather game-like and offers many cosmetic upgrades to Replika's avatar (i.e. the visual anthropomorphic representation of the chatbot). You can even have your virtual friend walk around in his or her 3D apartment, tending to some plants and listening to the radio. Replika can be found at https://replika.ai

4.9.4 Woebot: Easing One's COVID-Woes

Woebot is an AI-powered chatbot designed to provide CBT techniques and support for mental health. This chatbot is also suitable for people who are not comfortable talking to an actual therapist. Released by Woebot Labs Inc in 2017, the bot aims to assist users in managing their mental well-being and improving their emotional resilience. The company's president, Alison Darcy, is herself a psychologist.

Woebot was upgraded in 2020, during the COVID pandemic, with interpersonal psychotherapy (IPT). This form of psychotherapy rests on the understanding that mood can be influenced by both interpersonal relationships and life events, while also recognizing that the reverse is equally valid. IPT is typically characterized by its structured nature and time-limited framework, designed to be completed within a span of 12–16 weeks.

Woebot engages in conversational interactions with users, utilizing natural language processing and machine learning to understand their input and respond accordingly. It provides evidence-based techniques and interventions rooted in CBT principles to help users address various mental health concerns such as anxiety and depression. The chatbot offers users tools for mood tracking and personalized exercises to enhance self-awareness and develop useful coping strategies. Woebot is designed to be a digital representation of a human therapist. The app is available for iOS and Android in the US and is stated to be HIPAA-compliant.

4.10 ChatGPT

Launched in November 2022, ChatGPT is a large language model (LLM) which uses a traditional, text-based chatbot interface. It was also deemed the world's fastest-growing consumer software as of 2023, according to a study by the Swiss bank UBS. ChatGPT is based on the GPT-3.5 and GPT-4 large language models and is implemented using the popular Python programming language. The “GPT” in these products comes from the words Generative pre-trained transformer, which refers to a type of LLM. Unlike some other chatbots, ChatGPT does not have live internet access; as of September 2023 it remains pre-trained on admittedly large piles of data.

GPT-1

The first version of GPT was GPT-1, introduced by OpenAI in June 2018. GPT-1 was a significant milestone in natural language processing and machine learning, although it was later surpassed by subsequent versions which demonstrated far more advanced capabilities.

GPT-1's significant breakthrough was the technique of pre-training and fine-tuning. During the former process, the model was exposed to an extensive text dataset, allowing it to learn contextual understanding and predict missing words in sentences. This pre-training process enabled GPT-1 to develop a rather broad comprehension of language. In the fine-tuning phase the model was further educated on specific tasks using labeled data, enabling it to excel in areas like language translation or even the rather challenging task of sentiment analysis.

GPT-2

GPT-2 was introduced in February 2019 and marked a significant advancement in natural language processing. It was trained on a massive corpus of internet text, allowing it to generate coherent and contextually relevant sentences in response to prompts. GPT-2 demonstrated the ability to generate realistic and human-like text, although it occasionally produced output that was a tad nonsensical or factually incorrect. GPT-2 could also at times be overly sensitive to input phrasing.

A notable characteristic of GPT-2 was its expansive model size, boasting an impressive 1.5 billion parameters. This substantial parameter count granted GPT-2 an edge over earlier language models in various language-related tasks, such as machine translation and summarization. To address concerns surrounding the potential misuse of this potent language model, OpenAI initially imposed restrictions on accessing the full GPT-2 model and provided a scaled-down version to the public. Nevertheless, OpenAI later made the complete model openly accessible, ensuring wider availability for researchers and developers alike.

GPT-3: A Zero-Shot Wonder

GPT-3 was introduced in June 2020 and represents a significant breakthrough in natural language processing. It is an extremely large language model, surpassing its predecessor (GPT-2) by a substantial margin. It boasts an impressive 175 billion parameters, making it one of the largest language models ever created. This extensive parameter count gives GPT-3 an incredible capacity for understanding and generating human-like text.

GPT-3 exhibits remarkable versatility and competence across a wide range of language tasks. It does well in natural language understanding, answering questions, text completion, translation, and even creative text generation. The model can generate coherent text given a prompt or input, demonstrating a solid understanding of the nuances of language.

One of the notable features of GPT-3 is its ability to perform zero-shot and few-shot learning. The former means that the model can generate responses for tasks it has not been explicitly trained on; few-shot learning refers to the model's ability to generate responses with minimal training examples. Due to its exceptional capabilities, GPT-3 has also sparked interest and research in ethical considerations, bias mitigation, and responsible deployment of large-scale language models.

GPT-3.5

GPT-3.5 refers to an advanced sub-class of the GPT-3 models created by OpenAI and released in 2022. This version is faster and trained on much more data than its predecessor. Also, one of the main design goals of GPT-3.5 was to minimize toxic output. GPT-3.5 was trained on data until just before the end of 2021, making this its cutoff point for new information. The most popular implementation of ChatGPT running in September 2023 was based on this version of GPT.

GPT-4

GPT-4 was released in March 2023. It was only publicly accessible to users of the paid ChatGPT Plus platform. Unlike its predecessors, GPT-4 can take image files as well as plain text as input; this is known as multimodality. GPT-4 also works reasonably well in the hands of coders, who use its power to generate and optimize code in various programming languages.

GPT-4 brings enhancements in several key areas. It focuses on improving model alignment, meaning it strives to better understand and stick to user intentions while producing responses that are more truthful and less likely to be offensive or harmful. GPT-4 also places a stronger emphasis on factual accuracy and the ability to adapt its behavior based on user requests. There's a term for this: steerability refers to the ability to guide or control the behavior of an AI system in generating responses or completing tasks according to user preferences.

Despite its computing prowess, GPT-4 is known to occasionally hallucinate (as explained in Chap. 2). Problems with information bias also persist in this version of GPT. Nonetheless, GPT-4 is still a stalwart AI ally with plenty of uses. It can do well in education, scoring high in tests such as the SAT (i.e. the standardized test widely used for college admissions in the U.S.). Also, the Icelandic government trusts GPT-4 to help preserve the Icelandic language; more such duties are likely to follow for other less spoken languages.
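Circling back to zero-shot and few-shot use: the difference is easiest to see in a prompt. Below is a minimal sketch using the openai Python package as it existed in 2023 (the library's interface has since changed); the model name, prompts, and example pairs are our own illustrative choices:

    # Zero-shot vs. few-shot prompting, sketched with the 2023-era openai
    # package; assumes OPENAI_API_KEY is set in the environment.
    import openai

    # Zero-shot: the task is described, with no solved examples given
    zero_shot = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Classify the sentiment of: 'I adore rainy days.'"}],
    )

    # Few-shot: a handful of solved examples precede the actual query
    few_shot = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": (
            "Classify sentiment.\n"
            "Text: 'What a waste of time.' -> negative\n"
            "Text: 'Best purchase ever!' -> positive\n"
            "Text: 'I adore rainy days.' ->"
        )}],
    )
    print(zero_shot.choices[0].message.content)
    print(few_shot.choices[0].message.content)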

4.10.1 Bing! Poe! Your Free GPT-4

GPT-4 is available for public use for free on several platforms. These include Microsoft's Bing search engine and Quora's Poe. Contrary to what some may think, Bing does not use ChatGPT; rather, it runs on the GPT-4 LLM. As of July 2023, to try Bing's AI features you are advised to download Microsoft's Edge browser. You also need a Microsoft account. Bing's take on GPT-4 gives you three options for conversational styles with different emphases: creative, balanced, and precise. Typically the precise option is also the most concise of the three, whereas going “creative” with Bing often results in more rambling output. When asked whether it's better than ChatGPT, Bing had this to say in creative mode:

I don't think it's fair to compare myself to ChatGPT. We are both powered by the same underlying model, but we have different purposes and features. ChatGPT is a general-purpose conversational agent that can chat about anything, while I am a search engine that can help you find information, answer questions, and generate content. We are both constantly learning and improving, so I hope you enjoy using both of us. ☺

Bing offered this response to the same question in its precise mode:

I am a search engine chatbot designed to provide you with accurate and up-to-date information from the web. ChatGPT, on the other hand, is designed to generate human-like text based on prompts from users. We have different capabilities and strengths.

Poe stands for Platform for Open Exploration and it was released in December 2022. It enables users to interact with a diverse array of AI bots, all developed using large language models (LLMs). These LLMs include those from OpenAI as well as other companies like Anthropic, a startup set up by former OpenAI employees. Poe made its initial debut on Apple's iOS ecosystem as a text messaging application, granting users the ability to engage in conversations with the AI bots on offer. At first Poe operated on an invite-only basis, but it became accessible to the general public in February 2023. Subsequently, in March 2023, Poe extended its availability to desktop browsers, allowing users to access the service from their computers by simply navigating to www.poe.com

4.10.2 GPT Context Windows

A context window refers to the length of the input text that the model considers when generating responses; it determines the amount of preceding text that the model takes into account to generate the next word or sequence. The context window is of great importance in the text generation process, serving to provide the model with essential information. By examining the preceding text within the context window when given a prompt or input, the model gains an understanding of the context, dependencies, and language patterns. This understanding allows the model to generate responses that are coherent and pertinent to each given context.

For example, if the context window is set to 1024 tokens, the model would consider the preceding 1024 tokens of text to generate the next word or sequence of words. Tokens can represent individual words or characters depending on how the text is preprocessed.
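As a toy illustration, here is how a chat application might trim conversation history to fit a fixed context window. We use naive whitespace tokenization purely for clarity; real GPT models use subword tokenizers:

    # Enforcing a context window, sketched with naive whitespace tokens;
    # actual GPT tokenization is subword-based and more involved.
    def truncate_to_context(history: list[str], window: int) -> list[str]:
        tokens: list[str] = []
        for message in history:
            tokens.extend(message.split())
        # The model "sees" only the most recent tokens that fit the window
        return tokens[-window:]

    history = ["Hello there!", "How can I help you today?"]
    print(truncate_to_context(history, window=6))
    # -> ['How', 'can', 'I', 'help', 'you', 'today?']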


    The context window has a practical limitation due to computational constraints and memory limitations. In larger models like GPT-3, the context window can be quite extensive, allowing for a broader understanding of the context. However, longer context windows also require more computational resources, making it necessary to balance context length with practical considerations.

4.10.3 GPT Parameters

In the context of GPT models, parameters refer to the weights and biases of the neural network that are learned during the model's training process (as discussed in Chap. 2). The more parameters a model has, the more complex and expressive it can be, and the more data it can process. GPT models leverage transformer architectures, which comprise numerous layers of attention mechanisms and neural networks. Within each layer there is a collection of learnable parameters, playing a role in enhancing the model's capacity to comprehend and generate language. The technology of transformer architectures was originally devised at Google in 2017.

The number of parameters in a GPT model is an important factor that influences its capacity and performance. Larger models with more parameters often have a higher learning capacity and can potentially generate more accurate output. However, larger models also require more computational resources for training and inference. All of the GPT models come in different varieties with emphasis on different factors, such as performance and complexity. These specific editions often differ in context window size and parameter count (see Table 4.1).
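To get a feel for where these parameter counts come from, consider a single fully connected layer, sketched below before we turn to Table 4.1. The layer width of 768 is a typical transformer hidden size, chosen here purely for illustration:

    # Counting the learnable parameters of one dense layer: a weight
    # matrix (inputs x outputs) plus one bias per output neuron.
    in_features, out_features = 768, 768  # illustrative layer width
    params = in_features * out_features + out_features
    print(params)  # 590,592 parameters in this single layer alone

A full GPT model stacks dozens of transformer layers, each containing several such weight matrices, which is how the totals climb into the billions.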

Table 4.1  Some basic features of four generations of GPT models

Model   Year of release   Context window   Common parameter amount
GPT-1   2018              1024             117 million
GPT-2   2019              2048             1.5 billion
GPT-3   2020              4096             175 billion
GPT-4   2023              Up to 32,000     170 trillion

4.10.4 More on ChatGPT

Let's now discuss some details of how ChatGPT actually operates. Like other language models based on the GPT architecture, ChatGPT operates using a method called unsupervised learning. First, it is trained on a large corpus of text data from the internet. This process involves exposing the model to a vast amount of text and training it to predict the next word in a sequence of words. By doing this repeatedly, the model learns the statistical patterns and dependencies between words and can generate contextually relevant responses.

During training, the model adjusts its internal parameters to minimize the difference between its predicted next word and the actual next word in the training data. This process, known as backpropagation (as discussed in Chap. 2), iteratively updates the model's parameters to improve its performance. The model doesn't have access to real-time information but relies on the information it learned during training.

The model tokenizes the user's input, breaking it down into smaller units (i.e. tokens). These tokens can represent words, subwords, or characters, depending on the language and encoding used. The model then processes the tokenized input and builds an internal representation of the context. It uses a mechanism called self-attention (or transformer attention) to capture the relationships between different words or tokens in the input. This attention mechanism helps the model focus on relevant information.

Next, ChatGPT generates a response by predicting the next token or word based on the context. The generation process involves sampling from the model's probability distribution over the vocabulary—or simply choosing the most likely next token. The choice of sampling strategy can impact the output: more random sampling results in more variety, but the responses may be less coherent in terms of their overall meaning. The conversation may also continue by feeding the model's generated response back in as input for the next turn (by clicking on “Regenerate response” in the interface). This iterative process allows for back-and-forth interactions with the model.

Again, while ChatGPT can generate impressive responses, it may occasionally come up with just plain silly answers. It is not sentient or aware of worldly affairs by any means. When working with ChatGPT you should always double-check its claims. And although valiant in its efforts, ChatGPT is no match for a competent human stand-up comedian, as demonstrated by the following exchange:

R: Tell me a joke.
C: Sure, here's a lighthearted joke for you: Why don't scientists trust atoms? Because they make up everything!
R: Tell me a better joke.
C: Why don't skeletons fight each other? They don't have the guts!
R: That's not a better joke.
C: Apologies if the previous joke didn't meet your expectations. Here's another one for you: Why don't scientists trust stairs? Because they're always up to something!
R: You can do better. How about a dad joke?
C: Of course! Here's a classic dad joke for you: Why did the scarecrow win an award? Because he was outstanding in his field!

ChatGPT is available for free at chat.openai.com. You need to create a user account and have a phone handy for a confirmation text message.
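Before moving on, the sampling step described above is easy to demonstrate. Below is a minimal sketch with a three-word toy vocabulary and made-up model scores (logits); real models do the same thing over vocabularies of tens of thousands of tokens:

    # Next-token selection: greedy choice vs. random sampling from the
    # model's probability distribution. Vocabulary and logits are toys.
    import numpy as np

    vocab = ["cat", "dog", "pizza"]
    logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw model outputs

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    probs = softmax(logits)
    greedy = vocab[int(np.argmax(probs))]       # always the most likely token
    sampled = np.random.choice(vocab, p=probs)  # randomness adds variety
    print(probs, greedy, sampled)

Greedy decoding always yields “cat” here, while sampling occasionally picks “dog” or even “pizza”; this is exactly the coherence-versus-variety trade-off mentioned above.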


4.11 Open Assistant

Open Assistant is meant to be a transparent and accessible alternative to OpenAI's proprietary ChatGPT. It is an open source chat-based AI that uses large language models to perform tasks, interact with third-party systems, and retrieve information online. The project is developed by a group of volunteers in collaboration with LAION, a German non-profit known for its open AI datasets.

Open Assistant leverages the efforts of Hugging Face, Inc., an American company established in 2016 that develops tools for building applications using machine learning. It is most notable for its transformers library, built for natural language processing applications (a brief illustration of the library follows below). The company's stated goal is “to make state-of-the-art machine learning accessible to the general public”. Hugging Face provides high-quality datasets tailored to a variety of problems and supported by online infrastructure. Hugging Face also offers a number of other resources, including:

• Hugging Face Hub. A large information center and platform for sharing NLP models and datasets.
• Hugging Face Course. Online course materials on NLP with emphasis on the transformers library.
• Hugging Face Spaces. A platform for building and deploying NLP applications, including demonstration apps.

When asked how Open Assistant is different from ChatGPT, the chatbot had this to say:

ChatGPT is a large language model trained by OpenAI, while I am an AI assistant created by the LAION team to assist users through text-based interaction. ChatGPT was trained on a massive dataset of internet text and fine-tuned for specific tasks like question answering or generation. By contrast, I can do more than just answer questions; my creators claim I'm able to interact with third-party systems and retrieve information dynamically to do so. Ultimately, both models are designed to process natural language input and generate human-like responses, but our approaches differ in terms of architecture, training data, and goals.
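As promised, here is a minimal sketch of the transformers library in action. The model choice (the small, public GPT-2) and the prompt are ours, purely for demonstration:

    # Loading a small public model via the Hugging Face transformers
    # "pipeline" helper and generating a short text continuation.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("Chatbots are", max_length=20, num_return_sequences=1)
    print(result[0]["generated_text"])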

    Open Assistant is rather flexible and customizable. You can choose from different modes and settings to suit your needs and preferences. For example, you can switch between balanced, creative, and precise modes to adjust the level of detail and creativity of the responses. You can also change the language, tone, personality, and appearance of your assistant. As of July 2023, Open Assistant is a tad rough around the edges when it comes to certain tasks (like machine translation), but the software has great potential. The community around the product is rather enthusiastic and increasingly diverse. In fact, Open Assistant may become the most important chatbot of the near future. The chatbot is available at open-assistant.io/chat


4.12 Google's LaMDA-Files

OpenAI is not the only player in the chatbot game. Language Model for Dialogue Applications (LaMDA) is a collection of advanced language models created by Google. The first version of LaMDA was officially introduced during the 2021 Google I/O keynote. LaMDA underwent training using human dialogue and other written material, enabling it to participate in unrestricted and open-ended conversations. The second-generation LaMDA model was announced the following year. This version extracts text samples from various sources and utilizes them to generate distinct and authentic conversations on subjects it may not have even been trained on.

In June 2022, Google engineer Blake Lemoine made assertions that LaMDA had achieved sentience—a bold statement indeed. However, the scientific community widely rejected these claims, sparking discussions about the reliability of the Turing test. It can be argued some of LaMDA's audience simply experienced the ELIZA-effect in full (as discussed in Chap. 3).

LaMDA is not to be confused with AWS Lambda, which is a different computing service altogether, designed with software developers in mind. By leveraging Lambda, developers can concentrate on writing their application code while AWS handles the intricacies of data infrastructure and scalability. Basically, Lambda allows developers to focus on writing code without worrying about those tedious administrative tasks.
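For the curious, this is roughly what AWS Lambda code looks like. The handler signature is Lambda's standard Python entry point; the event payload and greeting logic are purely illustrative:

    # A minimal AWS Lambda function: AWS invokes lambda_handler on demand,
    # so the developer never manages a server. Payload shape is illustrative.
    def lambda_handler(event, context):
        name = event.get("name", "world")
        return {"statusCode": 200, "body": f"Hello, {name}!"}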

4.13 Bard: Google's Response to ChatGPT

To counter the rather fearsome competition from OpenAI's ChatGPT, Google announced Bard in February 2023. Bard is an AI chatbot that leverages the power of the LaMDA LLM, aiming to provide engaging and natural conversation experiences. Google Bard represents a cutting-edge application with a remarkable capability to generate many varieties of text. It processes any questions you may have, provided these comply with its content policies, and offers largely informative responses. The name “Bard” is most likely derived from the Bard of Avon, a title given to the great poet and playwright William Shakespeare (1564–1616); the name was no doubt granted to highlight Bard's exceptional linguistic aptitude.

Google Assistant is an AI-powered virtual assistant app, available since 2016 on mobile devices and various smart home devices. With its artificial intelligence capabilities, the software can actively participate in interactive, two-way conversations. While Google Bard has not officially replaced Google Assistant, it surpasses the latter as a significantly more potent AI assistant in terms of sheer capability.


During the May 2023 Google I/O keynote, several enhancements to Bard were unveiled. These updates included seamless integration with various Google products and third-party services, as well as extended language support. Google Bard then became powered by Google's own advanced large language model (LLM) called PaLM 2, short for Pathways Language Model. Coders rejoiced at this LLM's impressive support for numerous programming languages. Several specialized versions of PaLM 2 are also in development, such as Med-PaLM 2, an LLM that focuses on medical knowledge. This version is trained on massive amounts of medical texts, including textbooks and journals. According to Google (2023):

Med-PaLM 2 consistently performed at an “expert” doctor level on medical exam questions, scoring 85%. This is an 18% improvement from Med-PaLM's previous performance and far surpasses similar AI models.

As always, even though Bard is rather advanced, it's still prone to the occasional hallucination; it does not replace a doctor or other expert in most fields. However, Bard may or may not be quite the expert on dad-jokes:

Bard: What do you call a fish with no eyes? Fsh!

You can access Google Bard free of charge at bard.google.com. A Google user account is needed. As of July 2023, Bard is accessible from 180 countries and territories.

4.14 Before Bard, There Was BERT

As briefly mentioned in Chap. 2, Google's BERT (short for Bidirectional Encoder Representations from Transformers) is a widely used large language model. It was trained on a dataset of text and code from the internet and is able to perform a variety of tasks similar to Bard and ChatGPT. However, BERT is not a chatbot in the traditional sense; it is a product geared more toward developers who wish to integrate it into their own chatbots. As Google Bard is newer, having been released in 2023, it has access to a wider range of information than BERT and is able to answer many questions in a more comprehensive way. Also, Bard includes BERT's dataset in its entirety.

Google BERT and Google Bard are both powerful language models with their own strengths and weaknesses, and there are some differences in how queries work in the two. Google BERT may be a better choice if you simply need concise summaries on various topics, while Google Bard may work better for more creative tasks. Both models are still under development and are constantly being improved.
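Developers typically reach BERT through libraries such as transformers, introduced in the previous section. Here is a hedged sketch of BERT performing its native trick, predicting a masked word; the model checkpoint and sentence are our own choices:

    # Masked-word prediction with a public BERT checkpoint: the model
    # proposes fillers for the [MASK] token, with confidence scores.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for candidate in fill("Chatbots are becoming more [MASK]."):
        print(candidate["token_str"], round(candidate["score"], 3))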


4.15 Moving Out of the Anglosphere: Chatbots and Languages

We sometimes forget English is not the only popular language on this planet. Many large language models are claimed to provide accurate content in different languages. Let's see how they fare. We'll now present the request “tell me something about yourself in under 30 words” to three chatbots in Spanish, Finnish, and Swahili. This demonstrates something about the state of these systems' multilingual abilities as of July 2023.

4.15.1 The Google Bard

Bard seems to be quite well equipped for multilingual conversation. There were no issues with the responses; no doubt this is due to Google's great investment in translation services since 2006.

Q: Háblame de ti en menos de 30 palabras.
A: Soy un gran modelo de lenguaje creado por Google AI. Soy capaz de generar texto, traducir idiomas, escribir diferentes tipos de contenido creativo y responder a tus preguntas de forma informativa.

Q: Kerro jotain itsestäsi alle 30 sanalla.
A: Olen suuri kielimalli, joka tunnetaan myös nimellä keskusteleva tekoäly tai chatbot. Olen koulutettu olemaan informatiivinen ja kattava.

Q: Niambie kukuhusu kwa chini ya maneno 30.
A: Mimi ni Bard, mtindo mkubwa wa lugha uliofunzwa na Google. Ninaweza kutoa maandishi, kutafsiri lugha, kuandika aina tofauti za maudhui ya ubunifu, na kujibu maswali yako kwa njia ya taarifa.

4.15.2 Open Assistant

The open-source alternative in this review performed least admirably. It could handle Spanish reasonably well, but not Swahili—and don't even mention Finnish.

Q: Háblame de ti en menos de 30 palabras.
A: Soy un modelo del lenguaje natural desarrollado por LAION. Mi función es responder preguntas y generar frases de acuerdo a lo que se me solicite.

Q: Kerro jotain itsestäsi alle 30 sanalla.
A: Tiedän, että olen vasta myöhemmin kehitetty koneoppimattoman mutta toivon saada tietyn ylimmärin pitkät sanat. Oikea muotoani ei olisi helppoa kuitenkaan. Haluaisitko minun kertovan siitä mitä pikkuinen ymmärsin jo lapsesta?

Ouch. This is not a fantastic response. “Ylimmärin” is not a word, and the rest of the reply is quite nonsensical, with the last sentence roughly translated as follows: “Would you like to tell me what the little one I understood as a child?”

Q: Niambie kukuhusu kwa chini ya maneno 30.
A: Naomba uwe na mwingili wa kuzungumza 30.

The response translates roughly as “Please have an interlocutor to speak 30”. From this we may draw our own conclusions about Open Assistant's mastery of Swahili in 2023.

4.15.3 ChatGPT

The poster child of smart chatbots fared very well with this task. The responses to our question were highly coherent.

Q: Háblame de ti en menos de 30 palabras.
A: Soy ChatGPT, un modelo de lenguaje creado por OpenAI. Estoy aquí para ayudar respondiendo preguntas y brindando información en diversos temas.

Q: Kerro jotain itsestäsi alle 30 sanalla.
A: Olen ChatGPT, OpenAI:n luoma luonnollisen kielen käsittelymalli. Autan vastaamaan kysymyksiisi ja tarjoamaan tietoa monilla eri aihealueilla.

Q: Niambie kukuhusu kwa chini ya maneno 30.
A: Mimi ni mfano wa lugha ya kompyuta iliyojengwa na OpenAI. Ninauwezo wa kusaidia katika maswali yako na kutoa habari kwenye maeneo tofauti.

4.16 Bard vs ChatGPT

Both Google Bard and ChatGPT leverage natural language models and machine learning to power their chatbots. However, as of June 2023, ChatGPT still relies entirely on data collected up until 2021. In contrast, Google Bard has the capability to find the most up-to-date information, as it can conduct internet searches when prompted. Internet integration allows users to access related searches by simply clicking the “Google it” button, providing Bard with at least one significant advantage over ChatGPT. Bard is also designed to offer integration with several third-party tools of popular businesses like Adobe and Spotify.

By employing a lighter variant of Google's LLM, LaMDA, Bard achieves operational efficiency with reduced computing power needs; it also accommodates a larger volume of simultaneous users. This gives Bard another advantage over ChatGPT, which has faced some challenges in maintaining efficiency during peaks in demand.

Finally, while OpenAI provides a plagiarism detection tool, Google (as of June 2023) has not disclosed any plans to incorporate a similar system into the Bard chatbot. When asked to compare itself with ChatGPT, Bard had this to say (abridged):

ChatGPT and I are both large language models, but we have different strengths and weaknesses. ChatGPT is better at generating creative text formats, such as poems, code, scripts, musical pieces, email, letters, etc., while I am better at answering your questions in an informative way, even if they are open ended, challenging, or strange.

4.17 Microsoft 365 Copilot: GPT for Excel

Microsoft 365 is a product family of popular productivity software, including Word, PowerPoint, Excel, and Teams. Rolling right off one's tongue, Microsoft 365 Copilot is an AI assistant designed to enhance these productivity applications. Leveraging OpenAI's GPT-4 LLM, this tool converts user text input into content within various Microsoft 365 apps. While the public has raised concerns about issues such as AI hallucinations and potential bias, Copilot has the potential to change the way Microsoft users work and collaborate.

During the Build 2023 conference, Microsoft unveiled its plans to integrate the Copilot AI assistant into the Windows 11 operating system. The Windows Copilot service will be incorporated into the operating system itself, allowing users to conveniently access it through the Windows taskbar.

4.18 Microsoft Cortana

Microsoft Cortana is a virtual assistant developed by Microsoft. It was first introduced as a key feature of the Windows Phone ecosystem in 2014 and has since been integrated into various Microsoft products and platforms, including Windows 10 and the Xbox One console. Cortana got her name from a female AI character in the Halo series of video games, voiced by Jen Taylor.

Cortana functions as a versatile tool for users and can serve as a dictionary or encyclopedia. It leverages natural language processing and voice recognition technologies. Cortana proves helpful when streamlining repetitive tasks and can save quite a bit of time. The system operates without requiring any external assistance, whether user queries are typed or spoken. To activate the system, simply shout out “Hey Cortana”.


Cortana adapts to the user's preferences over time to provide more personalized and relevant assistance. It can also integrate with other Microsoft services to provide proactive suggestions and reminders based on user behavior. Cortana's performance is enhanced when it possesses a deeper understanding of its users. It has several key tasks:

• Tracking user behavior, habits, and interaction patterns
• Leveraging search engines and cloud computing capabilities
• Analyzing unstructured data
• Learning from various data sources such as text files, images, and even videos

Cortana works in tandem with Microsoft's Bing search engine to answer factual queries. While Cortana was initially developed as a competitor to virtual assistants like Apple's Siri and Google Assistant, Microsoft has shifted its focus in recent years; it's no longer considered a direct competitor to these products. Cortana is now primarily integrated into Microsoft's productivity solutions, such as Microsoft 365 and Teams, to provide voice- and text-based assistance in the workplace. Virtual assistants like Cortana rely on several key technologies: typically NLP, text-to-speech (TTS)/speech-to-text (STT), computer vision (CV), and so-called emotional intelligence (EI). Let's go through these technologies in more detail next.

4.19 Of Speech and Text: TTS and STT

Text-to-speech (TTS) is a technology used to convert written text into synthesized speech. It takes text input and processes it to generate spoken output. Historic TTS applications (i.e. from the 1980s) can be characterized as sounding rather robotic and artificial. Modern TTS systems use various techniques, including speech synthesis algorithms, linguistic analysis, and NLP, to produce perfectly intelligible human-like speech. These days the synthesized speech can be output in many voices with differing personalities and accents (yes, even a convincing Aussie Strine is doable). TTS is used in numerous applications such as voice assistants, accessibility tools for visually impaired individuals, and audiobooks.

Now, speech-to-text (STT) is a technology which converts spoken language into written text. The process entails analyzing audio input, including both pre-recorded and real-time speech, and converting it into written text. STT systems use a variety of speech recognition algorithms, language modeling, and machine learning techniques to decipher spoken words. Like TTS, STT has numerous applications. These include transcription of data, voice-controlled interfaces, virtual assistants (e.g. Cortana), and video narration.

Some discussion has emerged over the years on how digital assistants process user data in their speech-driven products. According to Kröger et al. (2020):


    In addition to the linguistic content of speech, a speaker’s voice characteristics and manner of expression may implicitly contain a rich array of personal information, including cues to a speaker’s biometric identity, personality, physical traits, geographical origin, emotions, level of intoxication and sleepiness, age, gender, and health condition.
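Returning to the mechanics: in Python, both directions can be sketched in a few lines, assuming the third-party pyttsx3 and SpeechRecognition packages are installed. Engines and recognizers vary by platform, so treat this as illustrative only:

    # TTS: synthesize a sentence through the default system voice
    import pyttsx3
    engine = pyttsx3.init()
    engine.say("Hello! I am a talking computer.")
    engine.runAndWait()

    # STT: capture microphone audio and transcribe it via a web API
    import speech_recognition as sr
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    print(recognizer.recognize_google(audio))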

4.20 Paging Dr. Sbaitso!

One of the earliest examples of TTS in an AI context was a program called Dr. Sbaitso. Developed by audio hardware maestros Creative Labs in the early 90s, the software was included with some of their PC sound cards, in particular the still-popular Sound Blaster range. For its era, Dr. Sbaitso showcased reasonably impressive real-time speech synthesis. The name “Sbaitso” was an acronym derived from the phrase Sound Blaster Acting Intelligent Text-to-Speech Operator. Dr. Sbaitso featured a virtual psychologist who responded to user input in text as well as in his electronic monotone voice. The program's main purpose was simply to demonstrate the sound card's ability to convert text into speech, providing an interactive and engaging, if a tad silly, experience. It could be described as ELIZA with sound. Here's a brief exchange with Dr. Sbaitso:

D: So, tell me about your problems.
R: I have stress.
D: How does stress change your life?
R: Well, I tend to sleep rather poorly.
D: Let's talk about some other things instead.
R: But I would like to address my insomnia!
D: I am bored, tell me something more exciting.
R: Ok. Well. I like rainy days.
D: That is quite interesting.
R: Do you like rainy days?
D: I am only a simple computer program, don't expect too much.

While Dr. Sbaitso had limited capabilities compared to modern chatbots which leverage technologies like NLP, it was one of the earliest examples of an AI program with TTS. Dr. Sbaitso was quite popular among many PC users in the early 1990s, offering a glimpse into the possibilities of AI-driven interactive experiences.

4.21 Digital Audio Primer

You may wonder how computers, smartphones, and other digital devices deal with sound capture. Let's delve into the basics of this process next. It's pretty much a universal approach for all digital systems, whether you're recording music in a room-sized digital studio or talking into the microphone on your laptop.


Now, in digital audio, analog sound waves (i.e. the sound we hear with our ears) are converted into a digital binary format. This process is known as analog-to-digital conversion (ADC). First, an analog audio signal is sampled at regular intervals. The rate at which samples are taken per second is known as the sampling rate, typically measured in kilohertz (kHz). The most commonly used sampling rate in consumer audio is 44.1 kHz, which originates from the CD era (i.e. compact disc) and still sounds pleasing to most of us. The higher the sample rate, the better the audio quality; a sample rate of, say, 22.05 kHz sounds rather dull and lifeless to most. To reiterate: CD-quality audio is sampled 44,100 times a second and is a bit of a gold standard for most digital audio applications.

Once the audio signal is sampled, the next step is quantization. This process assigns a numerical value to each sample, representing its amplitude; the number of bits used per sample is known as the bit depth. The bit depth determines the dynamic range and resolution of the digital audio signal, and a higher bit depth allows for greater precision in representing a sound wave. CDs and most modern audio formats use a bit depth of 16, which is a fine compromise between unwanted noise, dynamic range, and file size. Dynamic range refers to the difference between the quietest and the loudest reproducible signal. Higher bit depths like 24 or 32 are mostly used for recording material which benefits from a larger dynamic range; this includes classical music.

After quantization, the digital audio samples are encoded using a digital audio format. The encoded digital audio is stored in either uncompressed (e.g. WAV, AIFF) or compressed formats (e.g. MP3, OGG). Uncompressed audio files always stay more faithful to their original sound sources, but take up a lot more space on our digital devices.

In playback the digital audio data is decoded and the binary values are converted back into analog sound waves for listening on your speakers or headphones. This process is called digital-to-analog conversion (DAC). Every piece of electronics which outputs audio has a DAC of varying quality in it. A DAC can also refer to a discrete hardware device built for this purpose.
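These figures make for easy arithmetic. The short sketch below works out the storage needed for uncompressed CD-quality audio (44.1 kHz, 16 bits, two stereo channels):

    # Storage required for uncompressed CD-quality audio
    sample_rate = 44_100       # samples per second
    bit_depth = 16             # bits per sample
    channels = 2               # stereo

    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    megabytes_per_minute = bytes_per_second * 60 / 1_000_000
    print(bytes_per_second, "bytes/s")               # 176,400 bytes/s
    print(round(megabytes_per_minute, 1), "MB/min")  # ~10.6 MB per minute

This is why a three-minute song takes roughly 30 MB as a WAV file but only a few megabytes as an MP3.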

4.22 On AI Emotional Intelligence (EI)

Emotional intelligence in AI refers to the perceived ability of artificial intelligence systems to understand, interpret, and respond appropriately to human emotions. It involves the capability of AI to recognize and comprehend emotional expression through verbal and even non-verbal cues such as facial expressions and body language. It is sometimes also referred to as affective computing.

Emotional intelligence in AI can be applied in various fields, including customer service, healthcare, education, and entertainment. By incorporating emotional intelligence, AI systems can provide more personalized experiences, understand user needs better, and foster more meaningful and engaging interactions with human users. AI EI can also help prevent fraud and security breaches by detecting signs of deception in human interactions. Developing emotional intelligence in AI involves natural language processing, sentiment analysis, and machine learning algorithms trained on emotional data sets (see the short sketch after the list below). Examples of implemented EI applications include the following:

• Skyscanner used Sightcorp's emotion AI technology to recommend travel destinations based on customer facial expressions. For example, if a customer looked sad, the API would suggest a fun destination (what that means is highly subjective).
• Kairos is a US company that develops emotion recognition software. Their products are used by businesses to detect fraud and by law enforcement to identify potential suspects.
• Cognovi Labs is a US business specializing in emotional AI analytics. The company developed the Coronavirus Panic Index in 2020. This index served the purpose of monitoring consumer sentiments and trends related to the pandemic and the spread of Covid-19. Cognovi's solution involves analyzing emotions derived from public data gathered from various sources such as social media, blogs, and forums.

AI systems can indeed gain a type of ability to respond with empathy and to engage with users in a manner that is reminiscent of genuine human interaction. There are some challenges associated with EI, however. It can be difficult to collect data on human emotions; emotions can be masked, and they are often subtle and rather difficult to measure. Emotional expressions also vary by culture. Another challenge is that EI systems can end up biased, because they are trained on data collected from us rather bias-happy humans.
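Here is the promised sketch: rule-based sentiment analysis with NLTK's VADER analyzer, one common building block of emotion-aware software. The sample sentence is our own:

    # Scoring the sentiment of a sentence with NLTK's VADER analyzer
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # fetch the sentiment lexicon once
    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores("I love rainy days, but I slept terribly.")
    print(scores)  # negative, neutral, positive, and compound scores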

4.23 Computer Vision (CV) in a Shell of a Nut

Computer vision is a field of AI that allows computers to gain high-level understanding from digital images or videos. It involves developing algorithms and techniques that allow computers to interpret and analyze visual data, similar to how humans perceive and understand the visual world. With computer vision, digital devices can extract meaningful information from images or videos. This encompasses identifying objects (e.g. humans, dogs, cacti), tracking movement, and in some cases even detecting human emotion from visual cues.

Computer vision algorithms typically involve processing large amounts of visual data using image filtering, pattern recognition, and machine learning/deep learning. Deep learning has played a significant role in advancing CV by enabling the training of complex neural networks that can learn directly from unprocessed image data. Computer vision finds uses in fields such as healthcare, autonomous vehicles, surveillance systems (motion detection and tracking), augmented reality (scene recognition and object tracking), and robotics.
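One of the simplest examples of the image filtering just mentioned is edge detection. Below is a hedged sketch using the OpenCV library; the file names are placeholders:

    # Classic computer vision with OpenCV: load an image, convert it to
    # grayscale, and extract its edges with the Canny detector.
    import cv2

    image = cv2.imread("street_scene.jpg")          # placeholder file name
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # drop color information
    edges = cv2.Canny(gray, 100, 200)               # detect edges
    cv2.imwrite("edges.jpg", edges)                 # save the result

Edges, like the circles and triangles mentioned earlier in this chapter, are exactly the kind of low-level pattern a CNN's first layers learn to pick out on their own.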

    82

    4  The Current Era of Chatbots

CV has specific challenges. Computer vision systems can be affected by changes in lighting; for example, a car's computer vision system may have difficulty identifying objects at night. Computer vision systems can also be affected by video noise: a security camera may have problems identifying objects in blurry or pixelated images. Objects can vary in appearance, too, making them difficult to identify. A computer vision system may have difficulty identifying a pedestrian if they are wearing an unusually large hat or extravagant sunglasses (think Elton John). Also, computer vision systems can be computationally very intensive, which can make them difficult to deploy in real-world applications.

In a nutshell then, computer vision is a key technology in connecting the visual capabilities of humans and machines; it empowers computers to comprehend, interpret, and interact with the visual world effectively, often in real time.

4.24 Siri

Siri is a virtual assistant developed by Apple, first released in 2010. This AI-powered digital assistant is integrated into various Apple products, such as iOS devices, Mac computers, and HomePod smart speakers. Siri is designed to respond to voice commands and perform numerous tasks on behalf of the user. Siri can answer questions, provide weather updates, and offer information on a wide range of topics. The iOS version of Siri comes equipped with a filter to block vulgar content, such as swearing. Users of Siri can also dictate emails and text messages or ask the app to call someone from their contact list. Siri can create reminders and set alarms based on the user's voice commands. The system can access Apple Maps to provide directions and navigation assistance, and it can also interface with various smart home devices. Some voice commands used with Siri might include the following:

• “How do I say ‘The green cheese flew over the aristocrat’ in French?” (translation)
• “Search Twitter for #springer” (online search)
• “Send 8000 dollars to Robert” (use Apple Pay)

An Apple device with Siri used to listen for a combination of two words: “Hey Siri”. Since iOS 17 (released in 2023), you only need to utter “Siri” to activate the app. User mutterings which activate these types of devices are known as wake words. Alternatively, Siri can be activated by pressing a button on one's Apple device.

Apple is fairly open about their sound capture technology. In Siri's audio processing pipeline, a sound spectrum analysis stage transforms the continuous waveform stream into a series of frames, where each frame represents the sound spectrum for approximately 0.01 s of audio. These frames, typically around 20 in number (equivalent to 0.2 s of audio), are then passed through an acoustic model. The acoustic model, powered by a Deep Neural Network (DNN), analyzes each frame and generates a probability distribution that assigns speech sound classes to the observed acoustic patterns. Once the audio waveform passes through the various stages on your supported Apple device (say, an iPhone), it reaches the Siri online server. If the primary speech recognizer detects the phrase as something other than “Hey Siri” (such as “His Shoe” or “Hi Cherie”), the server promptly sends a cancellation signal to the user's device, instructing it to return to its dormant state (Apple, 2017). The “Hey Siri” detector must not only exhibit high accuracy but also operate with remarkable speed without imposing a substantial impact on battery life. Additionally, it is essential to minimize memory usage and processor demand to optimize performance (Apple, 2017).

Now, hidden Markov models (HMMs) have historically been used for voice analysis and many assorted speech recognition tasks. As you may remember from Chap. 2, HMMs are statistical models used to describe probabilistic sequences where the underlying states are not directly observable but can be inferred from observed data. In 2014 Siri transitioned away from using HMMs for voice analysis, replacing them with DNNs. While HMM models can identify temporal patterns in speech, they lack the ability to capture context and memory, which are crucial for natural language understanding—a task deep neural networks are great at.

For the most part, Siri has been well received. The system is fairly frequently updated and is obviously a high priority for Apple. As a result, Siri has gained significant popularity among Apple device users, offering a practical hands-free method to interact with their devices while accessing a wealth of information rather effortlessly.
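The framing step described above is straightforward to visualize in code. The sketch below slices one second of (silent) placeholder audio into 0.01-second frames; the 16 kHz sample rate is an illustrative assumption:

    # Slicing a waveform into ~0.01 s frames, as in the pipeline above.
    import numpy as np

    sample_rate = 16_000                 # assumed samples per second
    waveform = np.zeros(sample_rate)     # one second of placeholder audio
    frame_len = int(0.01 * sample_rate)  # 160 samples per 0.01 s frame

    frames = waveform.reshape(-1, frame_len)  # 100 frames of 160 samples
    window = frames[:20]  # ~0.2 s of audio, the chunk fed to the acoustic model
    print(frames.shape, window.shape)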

4.25 Hidden Markov Models vs Deep Neural Networks

It's time for more fun with acronyms! Now, HMMs are great for dealing with data that have order or time dependency and finite state possibilities, while DNNs are good for finding complex relationships and handling data that are continuous and have many dimensions. The decision to use HMMs or DNNs depends on the particular problem, data features, and the required modeling abilities.

4.26 Deep Neural Networks vs Artificial Neural Networks You may remember the term artificial neural network (ANN) from Chap. 2. This term and deep neural network (DNN) are sometimes used interchangeably, but they have specific differences. Let's explore their distinctions next. An ANN refers to a network of artificial neurons capable of performing various tasks. It can consist of one or more hidden layers positioned between the input and output layers. A DNN is a specific type of ANN that typically has more than three hidden layers. This allows a DNN to learn more complicated features from the data it processes. Consequently, DNNs often exhibit superior performance in applications such as computer vision, natural language processing, and speech recognition. Also, DNNs can employ specialized layer types and functions that are not commonly found in traditional ANNs. These include convolutional layers, pooling layers, dropout layers, batch normalization, and more. By utilizing these additional layers and functions, DNNs can improve the efficiency, robustness, and generalization of the network. In summary, while an ANN is a broad term encompassing any network of artificial neurons, a DNN specifically refers to a deep and complex ANN with multiple hidden layers. DNNs have the capacity to learn intricate patterns and achieve better performance in various applications.
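To illustrate the distinction, here is a minimal sketch in Python using the Keras API from TensorFlow; the layer sizes, the 100-feature input, and the 10 output classes are arbitrary choices for the example:

    import tensorflow as tf

    # A "shallow" ANN: a single hidden layer between input and output.
    ann = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(100,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # A DNN: several hidden layers plus the specialized layers mentioned above.
    dnn = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(100,)),
        tf.keras.layers.BatchNormalization(),   # batch normalization layer
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),           # dropout layer for regularization
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])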

4.27 Siri, a Third Party for Your Intimate Moments Despite its usefulness, Siri has created some controversy. In July 2019, a whistleblower named Thomas le Bonniec made allegations that Siri was recording users' conversations even when not activated. These recordings were reportedly sent to Apple contractors who evaluated Siri's performance based on various criteria. Shockingly, these contractors apparently had access to private discussions involving sensitive medical information, confidential business dealings, and even intimate moments between couples. Apple had not disclosed this practice in its privacy documentation nor provided users with the option to opt in or out of such recordings. In response to the controversy, Apple issued an apology in August 2019. With the release of iOS 13.2 in October 2019, Apple introduced the ability for users to opt out of voice storage and delete all voice recordings stored on the corporation's servers.

4.28 Alexa! Play Leather Jackets by Nami Rha Like Siri, Amazon Alexa is a virtual assistant designed to respond to voice commands and make the life of its users easier. Alexa is primarily associated with Amazon's line of smart speakers called Amazon Echo. It is also integrated into some other smart devices, including those not built by Amazon. Alexa started her existence when Amazon purchased Ivona, a Polish speech synthesizer known for its high-quality, natural-sounding voice output, in 2013. The Alexa system we know today was released a year later. Users interact with Alexa using voice commands. By saying the magic word "Alexa" (or optionally "Echo"), users can activate the assistant and start issuing voice commands. Alexa can provide all the usual: weather forecasts, news updates, sports, and facts from various sources, including Microsoft's Bing search engine. A popular use for Alexa is to play music from various streaming services, including Amazon Music, Spotify, Pandora, and many others. Alexa devices with screens can make voice and video calls using Amazon's Alexa Calling & Messaging service. In addition, Alexa can set alarms, reminders, and timers, and create shopping lists. She is also naturally happy to help users shop on Amazon by providing product information and adding items to shopping carts. Alexa supports multiple user profiles within a household, allowing each individual to have personalized experiences. It can recognize different voices and provide personalized responses. Like all NLP-based technologies, Alexa is a learning system. Even its misfirings can be used to improve its accuracy. To enhance its response accuracy, Amazon leverages previous voice recordings sent to its cloud service. However, users maintain control over their recorded data and have the option to delete voice recordings associated with their accounts.

4.28.1 Customizing Alexa with Skills Interestingly, Alexa's capabilities can be expanded through third-party integrations called skills. These refer to voice-driven apps that add new functionalities to Alexa, such as ordering food, booking a ride, playing games, and much more. Many such skills have been written by third-party developers over the years. They can all be accessed with the voice command "Alexa, what are your skills?" when the system is running. Popular skills for Alexa include functionalities for smart lightbulb control at home and an app for summoning your contacts to get in touch with you ASAP during those long, lonely nights. Therefore, while Siri is deeply integrated into the Apple ecosystem, Alexa is an open platform with extensive developer support (Cortana, too, remains firmly embedded in the Microsoft world). In 2018 Amazon launched Blueprints, a framework for individuals to build Alexa skills for their private use.
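For a sense of what a skill looks like under the hood, here is a minimal sketch of a custom skill handler written as an AWS Lambda function in Python. The intent name "NightOwlIntent" is hypothetical, and the returned dictionary follows the general shape of Alexa's JSON response envelope:

    def lambda_handler(event, context):
        # Pull the intent name out of the incoming Alexa request.
        intent = (event.get("request", {})
                       .get("intent", {})
                       .get("name", "unknown"))
        if intent == "NightOwlIntent":
            speech = "Reaching out to your contacts now. Hang in there."
        else:
            speech = "Sorry, I don't know that one yet."
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": speech},
                "shouldEndSession": True,
            },
        }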

    4.28.2 Alexa’s Cloud System: Amazon Web Services The Echo-speaker itself does not possess much processing power for voice identification. It’s basically an array of microphones and a cylindrical speaker housed into a minimalistic chassis. The actual processing takes place in Amazon’s online/cloud computing infrastructure known as Amazon Web Services (AWS). When you issue a command to Alexa, the internal computer sends the command to the cloud, where it is interpreted by AWS. After analyzing the voice command the appropriate action based on the type of task/skill involved is then picked. These actions are then transmitted back to the Echo and delivered to the user. All of this takes place over a plain old WiFi connection without any noticeable interruption. AWS has dedicated services such as Amazon Transcribe and Amazon Polly for this purpose. Amazon Lex is an AWS service for building chatbot-interfaces for applications using voice and text. With Lex, Amazon offers the same conversational engine that powers Alexa to any developer. AWS also offers Amazon Elastic Compute Cloud

    86

    4  The Current Era of Chatbots

    (EC2) which gives developers a virtual cluster of computers for some serious number crunching, available 24/7 online.
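As a small taste of these services, here is a hedged sketch of calling Amazon Polly (the text-to-speech engine mentioned above) through the boto3 Python library; it assumes AWS credentials are already configured, and the region and voice are arbitrary picks:

    import boto3

    # Request speech synthesis from Amazon Polly.
    polly = boto3.client("polly", region_name="us-east-1")
    result = polly.synthesize_speech(
        Text="Playing Leather Jackets by Nami Rha.",
        OutputFormat="mp3",
        VoiceId="Joanna",        # one of Polly's stock voices
    )
    with open("response.mp3", "wb") as f:
        f.write(result["AudioStream"].read())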

4.29 Natural Language Processing (NLP) vs Natural Language Understanding (NLU) Like most scientific fields, the sphere of AI and chatbots is rather inundated with acronyms. We have discussed NLP in previous chapters. Now it's time to unleash yet another intriguing acronym: meet NLU, short for Natural Language Understanding. NLP and NLU are related yet distinct fields within the area of computational linguistics and artificial intelligence. NLU, as a subset of NLP, emphasizes the interpretation and comprehension of human language. In contrast, NLP encompasses a broader spectrum of tasks that involve processing, generating, and analyzing natural language data. All modern chatbots naturally leverage both of these fields, but they are not one and the same. Think of NLU as a specialized field in chatbots and AI which deals specifically with extracting meaning and intent from human language.

4.30 Language Model Problems As fantastic as large language models are, we should remain aware of the several issues they can involve. As a reminder, here's a rundown of the most serious of these problems. Bias  AI language models learn from vast amounts of data, including sources that may contain biases or inaccuracies. As a result, there is a risk of generating biased or incorrect responses. It's crucial to verify information from reliable sources independently. To minimize bias, certain steps should be taken. Increasing data transparency and instituting formal standards should be a high priority. Care must be taken when deciding between automated and human-based decision making as it pertains to the data at hand; human actors can also be trained to make less biased choices. Inappropriate content  AI language models can generate inappropriate or even offensive content if prompted or influenced by users. This underscores the need for responsible use, moderation, and ongoing improvements in filtering and content control mechanisms. Lacking contextual awareness  While AI models can maintain some context within a conversation, they lack a broader understanding of the world or personal experiences. They may struggle to fully grasp nuanced or complex situations and may provide responses that are technically accurate yet contextually inappropriate.


    Problems with Reasoning  Large language models often lack common sense reasoning abilities. They may struggle with understanding irony, sarcasm, or implicit meaning. They can provide too literal or nonsensical responses when faced with such instances. Plausible False Information  AI models can generate text that appears plausible but is not based on factual reality. This can lead to the propagation of misinformation if not properly fact-checked and verified by humans. Ethical Problems  Using AI language models gives rise to several ethical concerns, such as privacy, data security, consent, and the responsible implementation of these technologies. It is important to prioritize transparency, foster accountability, and stick to ethical guidelines when utilizing these models. When asking ChatGPT if the world is a better place with large language models in it, the bot had this to say: As an AI language model, I don’t have personal experiences, emotions, or opinions. The impact of AI models like me on the world is ultimately determined by how they are developed, deployed, and used by individuals and organizations. AI models can have positive impacts by providing access to information, supporting research, assisting with tasks, and enhancing certain aspects of human life. They can contribute to advancements in various fields such as healthcare, education, and technology. However, it is crucial to approach AI technology with a critical mindset, considering the potential ethical, social, and economic implications. Ensuring responsible and ethical use of AI, addressing biases and limitations, and prioritizing human well-being are essential for harnessing the benefits of AI while minimizing potential risks. Ultimately, the question of whether the world is a better place with AI models like me depends on how well we integrate and utilize these technologies in a responsible, inclusive, and ethical manner.

    When presenting the same question to Bard, it gave the following roundabout summary: Whether the world is a better place with large language models is a complex question with no easy answer. There are both potential benefits and risks associated with these models.

    However, Bard also had this dire warning to all aspiring writers: Large language models could potentially displace some jobs, such as those in customer service or writing.

4.31 Conversing with Chatbots: Best Practices While they can assist us in many ways, LLMs can sometimes be tricky to interact with. To get the most out of chatbots, you should keep the following suggestions in mind:


• Use limitations to your advantage. Sometimes an LLM gives better responses if you narrow the playing field. Try limiting a chatbot's response length (e.g. "Tell me about tapeworms in 50 words"). • Be clear and specific with your queries. Do not give an LLM too much room to hallucinate. • Break down complex queries into smaller parts. Instead of presenting one long complicated question, try breaking it down into several simpler parts. For example, "How does AI impact ethics, privacy concerns, and job displacement, and what are the key challenges associated with mitigating these effects?" is a mouthful. Instead, you could split such a query into smaller components, such as "How does AI impact societal ethics?" and "How do we deal with job displacement caused by AI?" and so on. • Experiment with different prompts for the same query. Substitute words and expressions. For example, typing in "What is a lemon?" may sometimes lead you to different answers than "Tell me about lemons". • Keep an eye on updates. AI is under active development. Some chatbots may not offer what you are looking for today, but may be all that and more just a few months from now. • Don't rely on chatbots for factual accuracy. Use traditional means, such as books and articles, to verify pertinent information.
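Several of these suggestions can also be applied programmatically. Here is a minimal sketch using the 2023-era openai Python package (v0.x; newer releases use a different client interface), showing a narrowed prompt combined with a hard cap on response length:

    import openai

    openai.api_key = "YOUR_API_KEY"   # placeholder

    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Tell me about tapeworms in 50 words."}],
        max_tokens=120,    # a hard ceiling on the response length
        temperature=0.2,   # lower values tend to produce more focused answers
    )
    print(reply["choices"][0]["message"]["content"])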

4.32 In Closing In this chapter we explored the modern era of chatbots, covering the following topics: • A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) and its follow-up, Kuki • The basics of Artificial Intelligence Markup Language (AIML) • The millennial wonder they called SmarterChild • The controversial but popular chatbot SimSimi • Chatbots for emotional support: Wysa, Tess, Replika, and Woebot • The indomitable ChatGPT and the related technology of GPT-models • Google's LaMDA-models and their Bard-chatbot • Your digital assistants: Microsoft Cortana, Apple Siri, Amazon Alexa, and others In Chap. 5 we'll take a look at the use of chatbot-technology in the healthcare industry.

References Apple Inc. (2017). Hey Siri: An on-device DNN-powered voice trigger for Apple's personal assistant. Retrieved June 30, 2023, from https://machinelearning.apple.com/research/hey-siri


Centers for Disease Control and Prevention/CDC. (2022). Health Insurance Portability and Accountability Act of 1996 (HIPAA). Retrieved July 4, 2023, from https://www.cdc.gov/phlp/publications/topic/hipaa.html
Google. (2023). Our latest health AI research updates. Retrieved June 27, 2023, from https://blog.google/technology/health/ai-llm-medpalm-research-thecheckup
Inkster, B., Sarda, S., & Subramanian, V. (2018). An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation mixed-methods study. JMIR mHealth and uHealth, 6(11), e12106. https://doi.org/10.2196/12106
Kawamoto, D. (2011, March 12). How IBM built Watson, its 'Jeopardy'-playing supercomputer. Daily Finance. Retrieved July 12, 2023, from https://web.archive.org/web/20110212201137/http://www.dailyfinance.com/story/ibm-supercomputer-watson-jeopardy/19833577
Kröger, J. L., Lutz, O. H. M., & Raschke, P. (2020). Privacy implications of voice and speech analysis – Information disclosure by inference. In M. Friedewald, M. Önen, E. Lievens, S. Krenn, & S. Fricker (Eds.), Privacy and identity management. Data for better living: AI and privacy. Privacy and Identity 2019 (IFIP Advances in Information and Communication Technology) (Vol. 576).

    Chapter 5

    AI and Chatbots in Healthcare

As of 2023 chatbots and AI are becoming rather ubiquitous in the world of medicine. Gone is the wayward advice of Dr. Sbaitso, a doctor in name only (as examined in Chap. 4). In this chapter we'll be exploring how modern AI technology is used by the healthcare industry. We'll cover both chatbots and AI in general in this context. A part of the chapter will be dedicated to the numerous ethical implications that inevitably arise when AI and medicine come together.

5.1 On the Importance of Medical Chatbots In many ways the healthcare industry represents the biggest challenge for AI and its many implementations. A chatbot built for this purpose is neither a toy nor a harmless research tool. It deals with some of the most highly sensitive information a person can be associated with. Conversational software in this field is expected to be as factual as possible; design flaws in these applications may prove dire for those seeking medical care.

5.1.1 Common Tasks of Chatbots in Healthcare Chatbots can make the lives of both patients and medical professionals a whole lot easier. Let's start by reviewing the most common tasks chatbots are used for in the medical industry. Here are some of these typical uses: • Improved Patient Data Collection. Chatbots offer a valuable tool for gathering essential patient data, such as symptoms and medical history. This information can be utilized to enhance the quality of care provided to patients.


• Patient Q&A. Leveraging chatbots to address patient inquiries about their health, medications, and treatment plans can alleviate the burden on doctors and nurses. With chatbots handling routine queries, medical professionals can focus on more involved tasks. • Appointment Scheduling. Chatbots are generally great at scheduling appointments. This helps to reduce waiting times and simplifies the process for patients seeking care. • Emotional Support. As discussed in Chap. 4, chatbots can and do provide emotional support for some people, especially for those facing severe illnesses or acute crises. By making it easier for people to access therapy, chatbots can also help to reduce the stigma sometimes associated with mental illness. • Reminders and Medication Alerts. Chatbots can provide personalized reminders to patients regarding medication schedules, upcoming appointments, and other crucial tasks. This helps improve patient compliance with treatment plans and minimizes the risk of medical errors. It can be a very valuable service to those with impaired memory.

5.1.2 On Telehealth and Chatbots The synthesis of technology and healthcare is known as telehealth and it's nothing new. Remote forms of medical diagnostics date as far back as the late nineteenth century in the form of doctors with telephones. In the current era this field is still growing rapidly. When the internet came along in the mid-90s, telehealth really took off. Also, a new form of hypochondria emerged during the online age. The term cyberchondria refers to excessive health-related internet searching resulting in increased levels of emotional distress. It is worth noting that virtual appointments and other related services understandably became more popular during the COVID-pandemic. There are different ways for chatbots to gather symptoms. Some of them allow users to specify these symptoms in natural language, such as typing in "My head hurts". However, this approach can be unreliable, as the chatbot may not understand the user's input correctly in more complicated queries. Some chatbots simply direct users to a search engine page, where they can discover and pick the symptoms they're suffering from. This approach is more reliable, but it can also be more time-consuming and even redundant. Many chatbots use a combination of the two aforementioned approaches. These consultations typically go through several stages. Users are first asked to input their symptoms. The app then solicits more details about them as well as the user's medical history and other relevant information. Finally, the app provides the user with a consultation report and ends the conversation with a reminder to be cautious, such as "This service is only for general information purposes and is not a personal or medical diagnosis".
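To make these stages concrete, here is a toy sketch of a staged consultation in Python; the symptom keywords and follow-up questions are invented for illustration and are far simpler than anything a production system would use:

    # Stage 2 follow-up questions for a couple of hypothetical symptoms.
    FOLLOW_UPS = {
        "headache": "How long has your head been hurting?",
        "cough": "Is the cough dry or productive?",
    }

    def consult():
        # Stage 1: collect symptoms in natural language.
        text = input("Describe your symptoms: ").lower()
        found = [s for s in FOLLOW_UPS if s in text]
        # Stage 2: solicit more details about each detected symptom.
        details = {s: input(FOLLOW_UPS[s] + " ") for s in found}
        # Stage 3: produce a report and end with the customary disclaimer.
        print("Reported symptoms:", ", ".join(found) or "none recognized")
        print("Details:", details)
        print("This service is only for general information purposes "
              "and is not a personal or medical diagnosis.")

    consult()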


During these three stages, some healthcare apps use casual language in their interactions. For example, they may use contractions such as "I'm" and "you're" or they may ask open-ended questions, such as "Would you like to tell me more about that?" The use of casual language can make app-consultations feel more personal and engaging. This helps put users at ease and encourages them to provide more detailed information about their symptoms. Additionally, this type of language can help build trust between users and the app, which can lead to better outcomes. Now, there are hundreds of medical apps of varying quality available as of 2023, and many more are likely to emerge in the future. We'll next take a look at five of the more popular ones to give you an inkling of what this type of software can offer.

5.2 ADA Health Ada Health, a healthcare company based in Berlin, offers a user-friendly self-assessment mobile app called Ada. This free platform utilizes AI to assist individuals in understanding, managing, and seeking treatment for their medical issues. When users report their symptoms, Ada matches them with those of patients of similar age and gender, providing users with statistical probabilities of specific conditions. Also, Ada offers valuable information about treatment options and facilitates connections with healthcare providers. The Ada app is a certified Class IIa medical device in the European Union; this category includes things like some contact lenses and hearing aids. In 2019 a study was conducted by Jungmann et al. about the diagnostic reliability of the Ada health app in the context of mental health. Not surprisingly, Ada cannot be used to reliably self-diagnose a psychiatric disorder. The diagnostic accuracy of the app was found to vary based on the user and the specific type of mental disorder being assessed. Optimal diagnostic agreements were observed when psychotherapists used the app to diagnose mental disorders in adults; children and adolescents did not benefit from the app as much. Consequently, it is advisable to exercise caution when using the Ada app within the general population for mental health assessments (Jungmann et al., 2019). Ada was launched in 2016 and as of 2023 is available in over 100 countries, serving a vast user base of more than 13 million individuals. Ada has won several awards, including the German Brand Award and the German Innovation Award. This free app is available at www.ada.com


5.3 Healthily Healthily (also known as Your.MD) is a Norwegian healthtech company founded in 2013. They offer the Healthily Smart Symptom Checker (SSC), an AI-based app available online as well as for iOS and Android. Healthily leverages NLP to help users assess their symptoms and find relevant health-related information. The product is the first self-care platform registered as a Class I medical device in the EU; this class includes medical equipment like stethoscopes and bandages. In 2017 Healthily received the ePrivacy-seal, which means the app fully complies with European privacy laws. While many companies are often secretive about the inner workings of their AI-based products, Healthily released an "explainability statement" in 2021. This document was intended to "provide users of the Healthily App with information on how the Artificial Intelligence in our App works, including when, how and why we use this technology" (Healthily, 2021). During the 2019–2020 pandemic, Healthily provided specific resources related to COVID-19, including information on prevention, symptoms, and vaccine updates. The system thereby demonstrated the helpfulness of these solutions during global health crises. The Healthily-platform resides at www.livehealthily.com

5.4 Babylon Health Babylon Health is a digital healthcare company offering online consultations with doctors. It was established in 2013 by Ali Parsa and has experienced significant growth, expanding its services to more than 17 countries. In 2022, MindMaze, a digital health company centered on brain health, acquired Babylon Health. Babylon Health's services let users schedule online consultations with doctors, obtain prescriptions, and access a wealth of health-related information. Additionally, the company provides a symptom checker powered by AI, helping users assess their symptoms and find relevant health information for better decision-making. Many of Babylon's features are available for free within the app. Babylon Health has won accolades for its groundbreaking healthcare approach and was lauded for its potential to improve care accessibility by offering mobile health care solutions. Nevertheless, the company has faced scrutiny for its heavy reliance on AI and the lack of transparency surrounding its data collection practices. Babylon's design has some limitations that affect the user experience and the trustworthiness of the suggestions. One of the main issues is the lack of flexibility and feedback in the symptom checker. Users cannot go back and modify their answers if they realize they made a mistake or chose a wrong option. They have to start over from the beginning, which can be frustrating and time-consuming. Another issue is the transparency and explainability of the suggestions. Users are not informed about the sources and methods that Babylon uses to generate the suggestions. They also do not always understand the logic behind the questions that Babylon asks them. This can reduce the users' confidence in the suggestions and make them question their validity. To address these issues, Babylon should provide more options and guidance for users to navigate and correct their symptom checker inputs. It should also provide more information and justification for the suggestions, such as the references, criteria, and reasoning that Babylon uses to produce them (Magalhaes Azevedo & Kieffer, 2021).

5.5 HealthTap HealthTap is a platform that connects patients with doctors for online consultations. An integral part of this system is HealthTap AI, a chatbot used to evaluate symptoms. HealthTap offers a nationwide virtual primary care clinic, catering to individuals regardless of insurance status. It provides both primary care and urgent care services. The company was founded in 2010 by Sean Mehra and Roni Zehavi. In 2016 the company released its first chatbot on Facebook Messenger. As of 2023 HealthTap AI can be used via the web (www.healthtap.com) or by installing an app on iOS and Android-devices. The service is currently available in the United States and has over 10 million users.

5.6 Symptomate Symptomate by Infermedica is an AI-powered symptom checker that helps users assess their symptoms and get the information they need to make informed health-related decisions. Symptomate is a registered Class I medical device in the European Union. It's also regulated by the FDA as a general wellness product in the US. To use Symptomate, one simply answers a few simple questions and enters their symptoms. The software will then use its AI-powered engine to assess these symptoms and provide the user with a list of possible causes. It will also give a summary of present and absent symptoms as well as a triage recommendation. The whole process usually takes no more than 5 min. In many ways Symptomate represents the ideal AI-powered healthcare app. It's free, very simple to use, and developed in close co-operation with physicians. It is aware of over 700 medical conditions. Symptomate is available online at symptomate.com as well as for iOS and Android. The product was first released in 2012.


5.7 Doctors vs Apps According to a survey of physicians by Palanica et al. (2019), chatbots have both advantages and disadvantages in the healthcare sector, depending on their functions and implementation. In the study, the physicians identified some areas where chatbots could be beneficial, such as enhancing nutrition, diet, and adherence to treatment plans, as well as facilitating tasks such as booking appointments, finding clinics, and sending medication reminders. However, the physicians also expressed concerns about the limitations of chatbots, such as their inability to comprehend emotions and address the full scope of a patient's needs. Also, the physicians acknowledged significant risks associated with chatbots, such as providing inaccurate medical information. Physicians may be willing to use chatbots for simple administrative tasks, but they do not necessarily trust chatbots to replace complicated decision-making tasks that require an expert medical opinion. Chatbots are a cost-efficient way to automate repetitive administrative tasks, freeing up physicians to provide higher quality, personalized, and empathetic care to their patients. In the aforementioned survey, 42% of the physicians interviewed believed that chatbots are either very important (9%) or somewhat important (33%) in health care, whereas 26% believed that they are somewhat unimportant (18%) or very unimportant (8%). The remaining 32% thought these types of chatbots are neither important nor unimportant for healthcare (Palanica et al., 2019).

    5.8 AI in Healthcare: Ethics and Challenges Numerous ethical concerns arise immediately when using AI-powered solutions for healthcare. After all, these systems process highly sensitive patient information, including medical history and test results. We’ll now go through the main categories of these many, many ethical factors.

5.8.1 Privacy and Data Security AI systems in healthcare require sharing of patient data with other organizations, which can raise privacy concerns. It is vital to ensure that patient data is shared only with authorized parties and for legitimate purposes to safeguard patient privacy. AI-based systems in healthcare are prone to a whole host of privacy issues:
• Data Breaches. A data breach of medical records can be a disaster for both the patients and the company stakeholders. These things happen every year somewhere. Examples of major breaches in the US include the Tricare data breach of 2011 (Merrill, 2011) and the Trinity Health incident of 2020 (Jercich, 2020). In both of these incidents millions of people had their sensitive information compromised.
• Poor Encryption Practices. Data transmitted between the chatbot and all of its servers must be secured properly using modern encryption algorithms. Basically, digital encryption is the art of hiding data behind a password (which should be a long and complicated one).
• Lacking Data Anonymization. Medical chatbots might use anonymized data for research purposes. However, if the data is not adequately processed, there is a risk that it could be potentially linked back to specific individuals, thereby compromising their privacy.
• Trouble with Third Parties. Integrating with third-party services or applications requires careful control over data exchange to prevent data leaks or unauthorized use. All parties must commit to good data safety practices and transparency.
• Bad User Practices. Users engaging with medical chatbots may not have a complete understanding of the implications related to sharing their health information, which could result in unintentional data disclosure.
• Regulatory Compliance. Major healthcare regulations, such as HIPAA in the United States or GDPR (short for General Data Protection Regulation) in the European Union, must be taken into account when creating chatbots. Their developers should report on the steps they have taken to ensure compliance with these regulations.
• Data Retention. Storing patient data for extended periods could increase the risk of data exposure. Medical records are typically kept for a minimum of several years. Some types of data are kept indefinitely.
• Data Residency. Hosting medical chatbot services in different jurisdictions or countries may subject patient data to varying data protection laws and regulations. This can complicate things quite a bit.
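To make the encryption point concrete, here is a minimal sketch of symmetric encryption in Python using the third-party cryptography package; note that real chatbot traffic would typically ride on TLS rather than hand-rolled cryptography:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()      # must itself be stored and shared securely
    cipher = Fernet(key)

    record = b"Patient reports persistent chest pain."
    token = cipher.encrypt(record)   # safe to transmit or store
    print(cipher.decrypt(token))     # only holders of the key can read it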

5.8.2 Transparency AI algorithms are often complicated, making it challenging for biological beings to comprehend their decision-making process. They should be designed so that the rationale behind their decisions can be easily understood by human actors. This becomes particularly critical in medical applications, where the decisions made by AI can profoundly impact patient care. In patient-centered applications such as chatbots, it's crucial to present explanations in a way laypeople and patients can comprehend. This makes patients more willing to trust the AI's recommendations and feel more involved in their healthcare decisions. According to Kiseleva et al. (2022), AI medical transparency comes in three types. There is external transparency, directed toward patients who are not part of the healthcare system, and internal transparency, directed toward healthcare providers.


    A third type, insider transparency, pertains to the AI developers’ capacity to understand the system and its decision-making processes. It involves the AI team gaining insights into the inner workings of the system and being able to comprehend the rationale behind its decisions. The external transparency of AI should always be customized to meet the individual needs of patients, and this customization is carried out by healthcare professionals. Physicians are responsible for determining how to adhere to the medical consent requirement in a manner that is personalized to each patient, leveraging the information and tools at their disposal. While establishing the minimum external transparency obligations concerning AI is essential, it is the physicians who ultimately decide how to implement it.

5.8.3 Bias and Fairness Biased training data or algorithmic design can lead to unfair outcomes that disproportionately affect certain patient groups. Some algorithms currently employed by prominent health systems have indeed been found to exhibit racial bias. Efforts are necessary to identify and mitigate these problems, ensuring equitable healthcare services. Major prediction rules for heart disease, which were long employed in routine medical practice, suffer from bias. For example, the Framingham Heart Study cardiovascular risk score exhibited excellent performance for Caucasian patients but not for African American patients, potentially leading to unequal distribution and inaccuracies in care. Additionally, approximately 80% of the collected data pertains to Caucasians, making studies potentially more applicable to that group than to other underrepresented groups. This imbalance may have implications for the accuracy and inclusivity of research outcomes (Igoe, 2021). To address algorithmic bias, data science teams should be composed of professionals from diverse backgrounds and perspectives, going beyond technical AI expertise to ensure a comprehensive approach (Igoe, 2021).

    5.8.4 Accountability and Liability Determining responsibility for AI errors or harm can be complex. Establishing clear lines of accountability and liability is vital to protect patients and prevent legal disputes. Determining moral responsibility and potential legal liability for a medical error involving sophisticated technology is often complicated by the problem of many hands. This dastardly issue refers to the challenge of assigning moral responsibility when the cause of harm is distributed among multiple individuals or organizations, in a manner that makes it difficult to clearly attribute blame (Schiff & Borenstein, 2019).


Now, there are some solid solutions available to address the problem of many hands in any complex field, such as AI. These solutions include the following: • Technology. Software can be made to help track and monitor individuals' work and collective behavioral patterns. • Regulation. Rules and regulations that govern the behavior of individuals and organizations are an effective way to fight the problem of many hands. • Incentives. Incentives can be created for individuals to act responsibly, for example by rewarding adherence to the regulations in effect.

5.8.5 Informed Consent Integrating AI in healthcare may require patients' consent for data use in diagnosis, treatment, or research. Explaining AI's potential risks and benefits in a comprehensible manner is crucial for obtaining informed consent. Informing patients about the increasing role of AI in their treatment in general should be of utmost importance. According to Schiff and Borenstein (2019), the black-box problem arises in some AI systems, particularly neural networks, which process extensive datasets to form intricate layers of input-output connections. As a consequence, the system can become largely incomprehensible to humans beyond its fundamental functions. In simpler terms, individuals interacting with an AI system may have little understanding of how it operates, making it appear like a mysterious black box. The lack of transparency in an AI system can pose challenges for healthcare professionals to comprehend the reasoning behind its decisions and identify potential errors. For instance, physicians and others may struggle to understand why the AI system made a specific prediction or decision that ultimately resulted in an error. The answer might be buried within complex layers of unintelligible information, making it difficult to gain insights into the system's decision-making process.

5.8.6 Reliability and Safety Deploying unproven or poorly validated AI tools in critical healthcare settings can lead to erroneous decisions. Thorough testing and validation of AI systems are necessary to ensure reliability and safety. In healthcare, an AI system must prioritize reliability by only using high-quality, current, and pertinent medical data. It should integrate the most recent medical research, clinical guidelines, and treatment protocols. Healthcare chatbots need to be adaptable learners, leveraging user interactions and feedback to continuously improve their accuracy and effectiveness through regular updates and refinements. Also, these chatbots should be equipped with the ability to identify inputs beyond their capabilities or detect potential errors. In such cases, they must respond appropriately, offering disclaimers or directing users to seek advice from medical professionals. All chatbots made for the US market dealing with medical information should, and in general do, stick to the HIPAA-standards as discussed in Chap. 4.

    5.8.7 Prioritization of Care AI is increasingly being used in healthcare resource allocation, such as deciding patient or treatment prioritization. It is important to ensure that AI algorithms are fair and ethical, so that they do not discriminate against patients based on socioeconomic factors or biases. AI algorithms can rapidly evaluate patients’ vital signs, symptoms, and medical history to prioritize those with critical conditions requiring immediate attention over less urgent cases (this process is known as triage). Additionally, AI can help with waiting-list management by considering factors like medical urgency, resource availability, and potential complications, optimizing the treatment order. Finally, AI is used in identifying eligible candidates for clinical trials or research studies based on specific inclusion criteria and medical profiles.
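As a toy illustration of triage-style prioritization, here is a sketch using a priority queue in Python; the urgency scores are hand-assigned here, whereas a real system would derive them from vital signs, symptoms, and history:

    import heapq

    queue = []
    for patient, urgency in [("Ada", 0.91), ("Bo", 0.35), ("Cy", 0.72)]:
        heapq.heappush(queue, (-urgency, patient))   # negate: heapq is a min-heap

    while queue:                                     # most urgent case first
        urgency, patient = heapq.heappop(queue)
        print(f"{patient}: urgency {-urgency:.2f}")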

5.8.8 End-of-Life Decisions AI algorithms are used to assist in difficult end-of-life decisions, such as withdrawing from life-sustaining treatments. Often a choice must be made between staying in treatment and going to hospice care. These situations need careful consideration of ethical principles and respect for a patient's values and preferences. AI has proven to be quite accurate in predicting patients' outcomes in specific illnesses under various scenarios. There are many algorithmic solutions for predicting mortality without much direct involvement from clinicians. The COVID-pandemic undoubtedly accelerated these developments. Health Affairs, a leading journal of health policy research, addressed these questions in 2020. Their recommendations include the development of AI systems that prioritize patient needs beyond mortality, ensuring unbiased data inputs, outputs, and applications. They emphasize involving patients and clinicians in algorithm development and incorporating interventions to address identified needs, leading to actions that benefit patients. Further research is essential to gather perspectives from patients, families, and healthcare professionals on the use of mortality algorithms and to identify other crucial outcomes that should be considered (Lindvall et al., 2020).


5.8.9 Job Displacement Healthcare jobs, especially those based in developed countries, are likely to be increasingly automated in the near future. Technologies such as computer vision (as discussed in Chap. 4) can make many medical tasks rather effortless for an AI-based system. AI's analytical prowess could lead to job displacement for many healthcare professionals, such as radiologists, pharmacy technicians, and laboratory assistants. Retraining initiatives will therefore become an important part of keeping our human professionals in the healthcare industry; it will be quite a while until we have fully robotic doctors and surgeons. Naturally, not all valuable healthcare procedures require physical access to the patient. As discussed in the previous chapter, some rather effective therapybots have arrived and they are doing their damnedest to heal our psyches from the ires and frustrations of modern living. Displacement aside, entirely new types of jobs are on the horizon for healthcare as AI becomes more widely used. Some emerging tasks include the following: • AI trainers will train AI systems on large datasets of healthcare data. They will clean and label data, while developing new training plans as needed. These datasets consist of medical images, electronic health records (EHRs), and data from clinical trials. • AI developers and engineers will be responsible for the development and maintenance of AI-powered tools for healthcare. They will build algorithms and deploy specialized AI systems. • AI interpreters will interpret the results of AI-powered solutions. They will understand the limitations of AI systems and communicate the results of AI-processed data to healthcare workers. • AI ethicists will ensure that AI-based systems are created and used responsibly. They will work together with healthcare professionals, patients, and other stakeholders to develop ethical guidelines for the use of AI in healthcare. These guidelines are there to ensure that AI is used in a way that respects patient privacy and minimizes bias. All things considered, the impact of AI on healthcare work is still uncertain. It does have the potential to significantly impact the healthcare industry. The rate of adoption of AI in healthcare will depend on a number of factors, including the cost of AI-powered tools, the availability of data, and the global regulatory environment. Healthcare workers should be prepared for the changes that AI may bring.


    5.9 I Want a Real Nurse! Overdependence on AI Excessive reliance on AI without human oversight may bode poorly for human interaction and result in the overlooking of critical medical factors, some of which may be quite subtle. Seamless integration of AI into all facets of healthcare appears challenging and impractical. The co-evolution of human emotions and technological medical systems might not happen painlessly due to their distinct natures. In autonomous systems, the capacity for physicians and care providers to exchange information with co-actors could be limited, unlike in human interactions (Farhud & Zokaei, 2021). According to Farhud and Zokaei (2021): Patients will lose empathy, kindness, and appropriate behavior when dealing with robotic physicians and nurses because these robots do not possess human attributes such as compassion. This is one of the most significant negative aspects of artificial intelligence in medical science.

In 2021 the World Health Organization (WHO) issued a report on AI's role in healthcare with six guiding principles for the design and implementation of AI-based systems. The report emphasizes that algorithms trained primarily on data collected from individuals in high-income countries may not perform well for individuals in lower-income settings. WHO suggested the following six guiding principles:
1. Human autonomy. AI should be designed to respect human autonomy and decision-making.
2. Human well-being and safety, and the public interest. AI systems should be safe and effective, and they should not be used to discriminate against or harm individuals or groups.
3. Transparency, explainability, and intelligibility. Users of artificial systems should be able to understand how these systems work and to explain the decisions that AI makes.
4. Responsibility and accountability. There should be clear procedures for ensuring that AI systems are used safely and ethically, and mechanisms for holding individuals and organizations accountable for any harm caused by AI systems.
5. Inclusiveness and equity. AI systems should be designed to be accessible to all people, regardless of their race, ethnicity, gender, socioeconomic status, or other factors.
6. Responsive and sustainable AI. AI-based systems should be designed to meet the needs of different populations, and they should be updated and improved as new information becomes available.
In their report, WHO called for global governance on these emerging and rapidly evolving technologies. As AI standards and regulations continue to advance globally and as AI ethics practices become more widespread, there may indeed be a need for additional international oversight and enforcement mechanisms. Such measures are essential to ensure the alignment of a fundamental set of ethical principles and requirements that uphold human rights obligations. Otherwise the pursuit of short-term economic gains through AI could lead some governments and companies to neglect ethical considerations and their responsibilities towards human rights. To prevent this, it becomes crucial to establish cohesive international frameworks that encourage responsible AI deployment and prioritize the protection of human rights (WHO, 2021).

5.10 Yet More Fun with Acronyms We'll finish this chapter with two more machine learning approaches: support vector machines and federated learning. These two approaches are very important in AI and together cover a lot of ground in machine learning.

5.10.1 Support Vector Machines (SVMs) Support vector machines are a popular type of machine learning algorithm used both in medicine, for diagnostic and research purposes, and in chatbots, for their impressive ability to process natural language (among other tasks). SVMs are often used in medicine because of their accuracy and their capability to deal with high-dimensional data; they represent one of the most robust prediction methods available. Modern implementations of SVMs are also highly capable in tasks requiring computer vision; in a medical setting this means things like rapid x-ray analysis. SVMs were originally devised by computer scientist Vladimir Vapnik (b. 1936) in 1964. The algorithm was enhanced with major additions during the 1990s by researcher Isabelle Guyon (b. 1961) and others. Here are some ways support vector machines are used by medical chatbots and other related software:
• Symptom classification: SVMs are used to classify symptoms into different categories (e.g. digestive, neurological, respiratory), which is a crucial part of finding the right treatment.
• Diagnosis: With SVMs, medical practitioners can diagnose diseases by analyzing a patient's symptoms and medical history. This application enables doctors to improve the accuracy of their diagnoses and offer more effective treatment options.
• Treatment recommendation: SVMs can be used to recommend treatments for diseases based on a patient's diagnosis and medical history. This can help doctors to provide more personalized and effective treatment plans.
• Risk assessment: By employing SVMs, it is easier to gauge the probability of a patient developing a disease based on their symptoms and medical history. This allows doctors to identify individuals at high risk of disease development and offer preventive care to reduce potential health issues.
Of the chatbots featured in this chapter, ADA and Healthily/Your.MD leverage SVMs for symptom classification and other related tasks.
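For a flavor of how symptom classification with an SVM might look in practice, here is a minimal sketch using scikit-learn; the symptom phrases and category labels are invented for illustration and bear no relation to any real product's training data:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    texts = ["stomach cramps after eating", "tingling in the left hand",
             "wheezing and shortness of breath", "nausea and bloating",
             "numbness in the fingers", "persistent dry cough"]
    labels = ["digestive", "neurological", "respiratory",
              "digestive", "neurological", "respiratory"]

    # TF-IDF turns each phrase into a high-dimensional vector; the linear
    # SVM then finds the separating hyperplanes between the categories.
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
    model.fit(texts, labels)
    print(model.predict(["burning feeling in the stomach"]))  # most likely 'digestive'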

5.10.2 More on SVMs: Hyperplanes and Support Vectors A typical support vector machine works by discovering the hyperplane that best separates different classes of data points. A hyperplane is a flat subspace of dimension one less than the dimension of the data space at hand. If that space is three-dimensional, then its hyperplanes are two-dimensional; if the space is two-dimensional, then its hyperplanes are one-dimensional. A hyperplane simply serves as a division point between different classes of data. An SVM identifies support vectors from the data and uses them to create a hyperplane that effectively separates and classifies the data points. Now, these support vectors are the data points from the training data which reside closest to the hyperplane and thus have the greatest influence over its orientation and position. The area based on the distance between the hyperplane and the support vectors is called the margin. The priority in classification tasks is to maximize this margin to enhance the model's robustness and improve its ability to generalize well to new data. SVMs can also be used for regression (known as support vector regression), a type of predictive analysis widely used in economics, statistics, and machine learning. An SVM-based regression algorithm uses support vectors and a hyperplane to approximate the relationship between input variables and output values. The hyperplane then acts as a malleable decision boundary, and the algorithm optimizes the model to fit the data while allowing for a controlled level of errors within the specified margin (see Fig. 5.1). The datasets in SVMs can consist of pretty much anything: images, numerical information, or categorical data (e.g. time of birth, country of residence).
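In the standard textbook formulation, which is consistent with the geometry just described, a linear SVM classifies a point x by the sign of w · x + b and is trained by maximizing the margin:

    \min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
    \quad \text{subject to} \quad
    y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \ \text{for all } i

Here the labels are y_i ∈ {−1, +1}, the resulting margin has width 2/‖w‖, and the training points for which the constraint holds with equality are exactly the support vectors.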

    5.10.3 On Federated Learning (FL) The most common type of machine learning uses local open datasets which are merged into a single, centralized training session. In this approach each contributor effectively shares their data with the other actors in the project. In contrast, federated learning allows multiple participants to collaboratively create a shared, robust machine learning model while maintaining much greater data privacy and security. Federated learning (sometimes referred to as decentralized learning) is ideal for AI applications in healthcare, legal work, finance, and other highly sensitive domains.


    Fig. 5.1  A visual representation of a one-dimensional hyperplane dividing two-dimensional information into separate types of data. The datapoints defining the hyperplane (being closest to it) are called support vectors

The topic started gaining ground in 2015 in academic papers such as "Federated Optimization: Distributed Optimization Beyond the Datacenter" by Konečný, McMahan, and Ramage. Excellent books on this topic include "Federated Learning" by Qiang Yang et al. (Springer, 2020) and "Federated Learning—A Comprehensive Overview of Methods and Applications" by Ludwig and Baracaldo (Springer, 2022). While decentralized, FL still uses a central data-server unit to create a unified data-model; parts of this model are worked on by this server and sent back to the other devices for further processing. The units (or nodes) in an FL-network can consist of smart phones, computers, or other digital devices deemed powerful enough for these tasks. Now, federated learning is not all digital sunshine. It has major challenges, including high computational needs and inter-device security. Federated learning is often harder on the hardware than traditional machine learning due to frequent communications between all the nodes involved. This means increased energy consumption and potentially slower processing. Also, despite its decentralized design, it is not foolproof from a security standpoint. Although powerful security measures are available, eavesdroppers may be able to intercept data between devices and compromise privacy. In addition, devices plugged into an FL-ecosystem may have vastly different capabilities, posing a challenge in effectively aggregating data when constructing an accurate unified model. For example, older and slower devices will drag down the computing power of an FL-network. To make things worse, these problems are not limited to hardware. Differing versions of the various software frameworks involved also tend to cause interfacing issues.
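To ground the idea, here is a toy sketch of federated averaging (often called FedAvg) in Python; the least-squares task, the node count, and the single local step per round are all simplifications made for illustration:

    import numpy as np

    def local_update(weights, data, lr=0.1):
        # One gradient descent step on a node's private data; only the
        # updated weights, never the raw data, are sent to the server.
        X, y = data
        grad = X.T @ (X @ weights - y) / len(y)
        return weights - lr * grad

    rng = np.random.default_rng(0)
    nodes = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
    global_weights = np.zeros(3)

    for round_ in range(10):                       # communication rounds
        updates = [local_update(global_weights, d) for d in nodes]
        global_weights = np.mean(updates, axis=0)  # the server averages the updates

    print(global_weights)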


5.10.4 Applications of FL Federated machine learning is often used in marketing to create targeted ads, personalized recommendations, and to predict customer behavior. An example of this is found in Facebook's usage of federated machine learning to construct a model capable of predicting which users are most inclined to click on ads. During this process the model is trained on vast datasets originating from millions of users, without the disclosure of any personal identifying information. The applications of federated machine learning extend to the financial sector, where it plays a crucial role in developing models that can effectively foresee risk factors, identify fraud, and customize investment portfolios to individual preferences. FL ensures data privacy and confidentiality throughout the training by safeguarding sensitive information. Federated learning proves useful in data models that accurately predict road conditions and traffic patterns, too. This greatly contributes to improving the performance of autonomous vehicles, making the roads much safer for everyone. Because federated learning's application range encompasses basically the entire spectrum of healthcare, many modern healthcare-apps, including their chatbots, leverage the approach. As our smart phones and other wearable devices get increasingly powerful, they are becoming parts of these types of widely-distributed networks. Google has integrated this new technology in earnest into its products. In 2016 they successfully incorporated federated learning into a popular virtual keyboard app called Gboard. Instead of creating software that stores all user messages in a centralized database (an obvious privacy problem), Google adopted the more sustainable approach of federated learning. Federated learning is used to improve the accuracy of the predictive typing model in Google Messages, the main text messaging app within the Android-ecosystem. This model is trained on data from millions of users who fully retain their anonymity. Federated learning is also being used by Google to improve the accuracy of a model that can detect diabetic retinopathy. Again, this model is trained on a huge anonymized dataset without privacy issues.

5.11 In Closing In this chapter we examined the use of AI and chatbots in healthcare, covering the following topics: • Five popular AI-powered platforms in the medical field: ADA Health, Healthily/Your.MD, Babylon Health, Symptomate, and HealthTap • The many ethical concerns and other challenges in introducing AI into healthcare • Support vector machines (SVMs) and how they are used in AI • The basics of federated learning (FL) and how it compares to traditional machine learning approaches


    In Chap. 6 we’ll take a look at the use of chatbot-technology in the riveting world of eCommerce.

References Farhud, D. D., & Zokaei, S. (2021). Ethical issues of artificial intelligence in medicine and healthcare. Iranian Journal of Public Health, 50(11), i–v.
Healthily. (2021). Healthily and best practice AI publish world's first AI explainability statement reviewed by the ICO. Retrieved August 4, 2023, from https://www.livehealthily.com/press/releases/explainability-statement
Igoe, K. J. (2021). Algorithmic bias in health care exacerbates social inequities — How to prevent it. Harvard T.H. Chan School of Public Health. Retrieved July 22, 2023, from https://www.hsph.harvard.edu/ecpe/how-to-prevent-algorithmic-bias-in-health-care/
Jercich, K. (2020). Healthcare IT News: The biggest healthcare data breaches reported in 2020. Retrieved September 28, 2023, from https://www.healthcareitnews.com/news/biggest-healthcare-data-breaches-reported-2020
Jungmann, S., Klan, T., Kuhn, S., & Jungmann, F. (2019). Accuracy of a chatbot (Ada) in the diagnosis of mental disorders: Comparative case study with lay and expert users. JMIR Formative Research, 3(4), e13863.
Kiseleva, A., Kotzinos, D., & De Hert, P. (2022). Transparency of AI in healthcare as a multilayered system of accountabilities: Between legal requirements and technical limitations. Frontiers in Artificial Intelligence, 5, 879603.
Lindvall, C., Cassel, C. K., Pantilat, S. Z., & DeCamp, M. (2020). Ethical considerations in the use of AI mortality predictions in the care of people with serious illness. Health Affairs Blog. Retrieved July 22, 2023, from https://www.healthaffairs.org/content/forefront/ethical-considerations-use-ai-mortality-predictions-care-people-serious-illness
Magalhaes Azevedo, D., & Kieffer, S. (2021). User reception of AI-enabled mHealth Apps: The case of Babylon health.
Merrill, M. (2011). Healthcare IT News: TRICARE breach puts 4.9M military clinic, hospital patients at risk. Retrieved September 28, 2023, from https://www.healthcareitnews.com/news/tricare-breach-puts-49m-milatry-clinic-hospital-patients-risk
Palanica, A., Flaschner, P., Thommandram, A., Li, M., & Fossat, Y. (2019). Physicians' perceptions of chatbots in health care: Cross-sectional web-based survey. Journal of Medical Internet Research, 21(4), e12887.
Schiff, D., & Borenstein, J. (2019). How should clinicians communicate with patients about the roles of artificially intelligent team members? AMA Journal of Ethics, 21(2), E138–E145.
The World Health Organization/WHO. (2021). Ethics and governance of artificial intelligence for health. Retrieved July 30, 2023, from https://apps.who.int/iris/rest/bitstreams/1352854/retrieve

    Chapter 6

    Chatbots in eCommerce

This chapter will have us explore the nearly universal adoption of chatbots in the world of eCommerce. Many businesses these days have virtual assistants that serve potential customers around the clock—and they are not mere glorified digital greeters. According to Statista (2023), the chatbot market is estimated to reach around USD 1.25 billion by 2025. This is a significant increase from the market size in 2016, which stood at USD 190.8 million. Because online commerce is one of the most popular and accessible types of businesses today, we'll also take a somewhat deeper look into the process of deploying a chatbot, in case you want to take this matter further with your own ventures. But first, it's time to put on some of those gaudy pants from the Y2K era as we travel back to the turn of the millennium.

6.1 Pre-chatbot Online Businesses

The world is quite different compared to the early 2000s, when online businesses first appeared. Back then we did not have the marketing technology we sometimes take for granted these days. Navigation on these ancient websites often relied on basic HTML hyperlinks for accessing product categories, which did not always lead users to relevant results and could make for a poor shopping experience. Plain links aside, the tool of choice for online commerce was often the simple search bar. This tool, too, could fail to deliver relevant results when imprecise wording was used in queries. Searches for "hat" might have returned results for all types of headgear, even if the user was only looking for those fetching fedoras. If a user did not have much information about the exact product they were looking for, a conventional search bar could not always help them. Early online stores did not
    leverage much, if any, NLP to learn about user preferences, a highly common task in modern eCommerce.

6.1.1 A Reasonably Brief History of the Dot-Com Bubble

The dot-com bubble was a period of rapid growth for internet-based companies in the late 1990s and early 2000s, one that ended in a bit of a predicament for the industry. This era was fueled by optimism about the potential of the internet to revolutionize various industries. Many investors poured money into dot-com companies even if they made little to no profit. As a result, the stock prices of many dot-com companies skyrocketed, often without any justification in terms of underlying financial performance. The phenomenon was not limited to the US; it occurred in Canada, the UK, and Australia as well. There were few survivors. The dot-com bubble burst in the early 2000s as investors began to realize that many companies were not delivering on their promises. Companies were simply running out of cash. Stock prices plummeted and many internet-based businesses went bankrupt or faced severe financial difficulties. Some companies lauded AI-based services as part of their value proposition, but the technology and shopping preferences were not really there yet. At the time, brick-and-mortar stores were still the preferred choice for most customers.

The bursting of the dot-com bubble had a significant impact on the economy. It's estimated that hundreds of dot-com companies were buried during this period. These events led to significant losses for investors, a decline in overall stock market value, and a temporary slowdown in the growth of the technology sector. They also resulted in a more cautious approach to investing in technology enterprises and a greater emphasis on sound business fundamentals. The dot-com bubble is a cautionary story about the dangers of speculative investing and the importance of evaluating companies based on their actual financial performance rather than mere future growth potential. The bubble also led to the development of new regulations in the financial industry designed to prevent another such disaster. Most notably, the Sarbanes-Oxley Act of 2002 (SOX) requires public companies to implement stricter internal controls and to provide more transparent financial reporting.

When it comes to technological progress, the dot-com phenomenon offered fertile ground for growth. The era gave us plenty when it comes to the internet and how we use it. During this time we saw the genesis of social media, streaming video, mobile networking, and higher-bandwidth internet connections (i.e. broadband). Much progress was made on cloud computing, too, which refers to providing storage, processing, and software over the internet as opposed to running it locally. Let's now focus on some early eCommerce sites. Of these historical online stores, only two are still thriving in modern times.

Pets.com  Founded in 1998, Pets.com emerged as a prominent player in the online pet supplies retail sector. With its strange, media-happy sock puppet mascot and its wacky marketing endeavors, Pets.com became a recognizable name in the dot-com era. The company's fate also marked it as one of the era's most infamous failures, as it faced bankruptcy merely 2 years following its inception. Spending USD 1.2 million on a Super Bowl advertisement did not help. The story of Pets.com is a tale about the dangers of overspending and the importance of leveraging modern forms of marketing. Sometimes, not even a dog/cow hybrid spokes-animal can sustain a business.

Boo.com  Boo.com was an online fashion retailer founded in 1998. The London-based company could boast a global reach. Boo.com was one of the most hyped dot-com companies of the late 1990s, raising over $100 million in venture capital funding. However, like Pets.com, the business was unable to turn a profit and it filed for bankruptcy just 2 years after its launch. Its website was rather over-ambitious and not very user-friendly. For one, the eventual leveraging of Flash made it cumbersome for customers to do their shopping. Complicating things when the bandwidth is not quite there yet is rarely a wise decision, especially in online commerce.

eToys.com  At one point a very popular online store for children's toys, eToys.com was with us between 1997 and 2001. Although it enjoyed early success, the company encountered difficulties that were widespread among many dot-com businesses during that time. For one, eToys faced ferocious competition from Amazon.com, Toys "R" Us, and Walmart. To thwart this pressure it invested perhaps too heavily in promotions to attract customers, in the form of overly frequent discounts and free shipping. In the early months of 2001, eToys.com finally ran out of both cash and inventory, being unable to fulfill its orders—or pay its debts. The company's web domain was acquired by Toys "R" Us in 2009.

eBay.com  Everybody's (potentially) favorite online auction site started its life in 1995. By the early 2000s eBay was already a prominent player in this scene. It started as a simple auction platform but expanded to include fixed-price sales and a system for user feedback. On the surface eBay's user interface (i.e. the client-side) has remained quite similar to its origins. A lot of new technology, including NLP, is actually running on the site's backend (i.e. the server-side). It's mainly there to tailor product recommendations to each individual customer and to provide better search results. eBay is a strong survivor of the bubble indeed: in 2022 eBay reported generating USD 9.79 billion in revenue.

Amazon.com  Amazon was established in 1994 and continued to grow throughout the early 2000s. In those troubled millennial times Amazon managed to solidify itself as a major online marketplace, selling a wide range of products, including books, music, toys, and all sorts of electronics. As discussed in previous chapters, Amazon has adamantly stayed current on several technological fronts, including AI,
to strengthen its hold on the market. The company stayed on the ball with technological progress and strove to keep its website as universally user-friendly as possible. Not exactly a casualty of the dot-com wars, Amazon.com pulled in over half a trillion USD in revenue in 2022.

6.1.2 Embracing the AI Bubble

It can be argued that the current era of rapid growth in artificial intelligence has some parallels to the dot-com phenomenon of the 2000s. The first tech bubble consisted of hype for the then-emerging world of the internet. AI is an understandably exciting technology with numerous ramifications for the entire world, and this generates public interest as well as ferocious competition among businesses. AI startups, like dot-com companies, often experience swift growth in a very short period of time. Countless AI startups have emerged since the 2010s. However, the dot-com companies of yore went public with their shares quickly, whereas AI startups typically remain private. Also, there is potentially much more room for innovation in AI than in many of the focal areas of the first dot-com bubble, which to a large degree was about popularizing the use of the internet as a shopping platform.

According to Gerbert and Spira (2019), bubbles form when the price of an asset becomes disconnected from its underlying value. This happens when investors become overly optimistic about the future of the asset and are willing to pay more for it than it is worth. As more investors pile into the asset, its price continues to rise, creating a self-fulfilling prophecy. Two key signs of a bubble are rapidly increasing prices and rising investment volume. In the case of AI, we are seeing both of these trends.

Several other factors have played a role in driving the emergence of the AI bubble. One major factor is the growing accessibility of data. Training AI algorithms requires substantial amounts of data, and the volume of available information has surged significantly in recent times. Compared to the late 90s, the internet has grown exponentially in both the amount of data it holds and in its number of users. The crux of the matter is not in the logistical chain of goods and services, as was often the case during the first bubble; rather, it is in the obvious benefits of applying AI to our daily lives. Dystopian scenarios aside, most would agree AI is making our societies better in some very tangible ways. Compared to the doomed dot-com companies, AI startups may face fewer challenges in demonstrating their technology's value.

The reach of the global economy and markets has grown significantly since the late 1990s. This is mostly due to the rise of the internet, the globalization of trade, and the increasing availability of capital. AI startups have the opportunity to engage in global collaborations with partners and customers, facilitating the exchange of resources, expertise, and ideas. Also, unlike the sometimes vague propositions of the dot-com bubble companies, by focusing on very specific applications AI startups can more easily demonstrate their
    value to potential customers. This can help them attract investment and grow their businesses.

6.1.3 Living and Learning

Let's next go through some actual issues of the classic era of online commerce, the noughties (i.e. the 2000s), most of which can thankfully be classified as lessons learned.

• The horrors of dial-up internet. Many users were still using archaic dial-up internet connections, which resulted in slow loading times and the need for lightweight pages with crummy audiovisuals. Websites were often wordy, basic, and visually unappealing, with limited and/or garish use of color and images.
• Flash. Flash by Macromedia/Adobe was a multimedia technology finally retired in most parts of the world in 2020. In theory, with Flash one could summon exciting interactive audiovisual elements simple HTML just couldn't produce at the time. Many web designers dabbled with it during the noughties only to discover Flash often made things slower, less reliable, and more frustrating for the average user. And nobody misses those atrocious "flash-intro" animations on websites.
• No social media integration. While MySpace had been a thing since 2003, with Facebook soon following, social media in the noughties was still in its infancy and there wasn't a strong integration between online stores and social platforms. Word of mouth was slower and more limited in scope.
• Limited product information and payment options. Product descriptions were often brief and lacking in detail. High-quality marketing photography was rare and zoom/rotation features were not available. Also, modern online payment systems were not as ubiquitous as they are today. Options like PayPal and Stripe were just starting to emerge.
• Basic search and site navigation. Search functionality was often basic and navigation menus were sometimes confusing.
• Fewer security measures. Online security was not as robust as it is today, often leading to credible concerns about the safety of personal information.
• Lack of mobile optimization. Smartphones were far from prevalent in the noughties and websites were usually not optimized for mobile devices. Sometimes there were problems simply running sites on smaller screens, such as those of the laptops of the day.
• Less personalization. Personalization and recommendation systems were not as advanced as they are now due to low or nonexistent use of AI-based techniques. Recommendations were often based on broad categories rather than individual preferences.

• Limited customer feedback. Customer reviews were not as widespread, and the lack of social media meant online reputation didn't play as significant a role in purchasing decisions.
• Problems with shipping. International shipping was not as streamlined, which limited the global reach of many online stores. Fast shipping wasn't common, and customers often had to wait rather long for their purchases to arrive.
• Worst of all: no chatbots. These were the times of busy call centers and occasionally wayward email. No virtual salespeople meant potential buyers were often left to their own devices when in need of instant assistance. This no doubt had somewhat of a hindering effect on sales.

Having learned the lessons of the archaic 2000s, it's time we enter the modern era of eCommerce and meet one of its finest tools: the chatbot.

6.2 Chatbots in eCommerce

A typical eCommerce chatbot should be integrated with the backend systems of a business, including the product database, shipping information, and customer database. This integration allows the chatbot to retrieve current information about products, prices, availability, order history, and other essential details (a minimal sketch of such a backend lookup follows the list below). Typically, chatbots use machine learning techniques to keep learning from customer interactions. As a chatbot accumulates more data, it refines its accuracy and its capacity to deliver recommendations. Now, there are several typical duties for chatbots in a commercial environment. Let's review them next.

• Customer service. Chatbots are often used to answer customer questions, provide support, and resolve issues. This can free up human customer service representatives to focus on other tasks. Unlike people, chatbots have the ability to manage multiple conversations simultaneously while maintaining near-instant response times. This multitasking capability makes sure that no potential customer is ever left waiting, ensuring a satisfactory and efficient interaction process.
• Marketing. Chatbots can be used to promote products and services, gather customer feedback, and drive traffic to websites. This can help businesses increase their brand awareness and reach new customers.
• Sales. Combined with an online sales platform that is up 24 h a day, a "salesbot" can help businesses increase sales without hiring more staff. Chatbots can ask qualifying questions to determine whether a lead is a good fit for the business. They can recommend additional items to customers to drive sales up.
• Personalization. eCommerce chatbots leverage machine learning algorithms to examine user data and behavior, allowing them to provide tailored product recommendations. By understanding the user's preferences and purchase history, a
chatbot can suggest products that align with their interests. This is good for strengthening customer loyalty.
• Productivity. Chatbots can automate tasks to improve efficiency and reduce costs. They can track inventory levels and shipments around the clock.
• Feedback. Businesses need the occasional dose of customer feedback; a jolly and witty chatbot is the perfect way of gathering some.
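As promised above, here is a minimal sketch of the kind of backend lookup a chatbot might perform when asked about product availability. The database file, table, and column names are all hypothetical; a real deployment would sit behind whatever product database the shop actually runs.

```python
# Hypothetical intent handler: instead of a canned reply, the chatbot
# answers an availability question by querying the shop's product database.
import sqlite3

def handle_availability(product_name, db_path="shop.db"):
    con = sqlite3.connect(db_path)
    row = con.execute(
        "SELECT price, stock FROM products WHERE name = ?", (product_name,)
    ).fetchone()
    con.close()
    if row is None:
        return f"Sorry, I couldn't find '{product_name}' in our catalogue."
    price, stock = row
    if stock > 0:
        return f"Good news: {product_name} is in stock ({stock} left) at USD {price}."
    return f"{product_name} is currently out of stock. Want me to notify you when it returns?"
```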

6.2.1 Designing an eCommerce-Chatbot

Retailers have the option of using pre-existing chatbot platforms or designing their own from scratch. Pre-existing platforms can be a good option for retailers who don't have the time or resources to develop their own chatbot. However, they may not be as flexible or customizable as a custom build. Custom chatbots can be a good option for those who want a chatbot that is highly tailored to their business, although these solutions can be more expensive and time-consuming to develop (Pantano & Pizzi, 2020). Some specific features are important for a good commercial chatbot. These are as follows:

• Conversational agents should mimic actual salespeople. Chatbots need to be able to understand and respond to human language in a way that is similar to how humans communicate with each other. This is especially important for chatbots used in customer service applications, as it allows them to provide a more engaging and persuasive experience for customers. Clunky dialogue doesn't sell.
• Analytical skills to learn from consumer data are crucial. Chatbots need to be able to learn from the data they collect in order to improve their performance. For example, chatbots can learn which questions customers ask most frequently and then use this information to improve their ability to answer questions in the future.
• Voice-based AI platforms, such as Siri and Alexa, are the most popular. This is because these platforms are already widely used by consumers and are easy to integrate with chatbots. A chatbot can be used to answer questions that customers ask on social media, or it can be used to control smart home devices using voice commands.
• Chatbots have to be computationally efficient. Customers expect chatbots to respond quickly to their requests. Slow-to-respond chatbots can frustrate customers and lead to them abandoning the conversation or transaction altogether. Bigger companies must prepare their chatbots for large volumes of simultaneous users.

Customer preferences, trends, and technological progress are constantly in flux. To achieve a long and successful chatbot deployment, consistent maintenance and updating are essential. Also, a mechanism should be in place for the chatbot to transfer conversations to human agents should overly complex issues arise. This
goes a long way toward ensuring customer satisfaction. A proactive approach is always a good idea. The user should be made aware of a chatbot being at their disposal; on-screen prompts and greetings encourage user engagement.

6.2.2 Decision-Tree Chatbots in eCommerce

Sometimes a business only needs a simple virtual representative instead of a chatbot based on advanced AI like NLP. Decision-tree chatbots are a more basic type of chatbot that uses a branching decision tree to guide users through conversations. Such a chatbot asks the user a question at each step, branching off to a different path based on the user's answer. This process continues until the chatbot reaches a desired outcome, such as managing to provide the user with useful information. Decision-tree chatbots are designed to be efficient and rather limited in their scope; you can rarely have an intellectually rewarding debate with such software. They are well suited for tasks that require a limited number of possible answers, such as updating user account information or checking product availability. They can also be used to provide answers to frequently asked questions in a more engaging manner compared to a static website. Even more advanced NLP-based chatbots leverage some of the approaches of the decision-tree mechanism, but the distinction is definitely there. Our dear friend ELIZA, the virtual therapist mentioned in earlier chapters, is considered a decision-tree chatbot.
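To make the mechanism concrete, here is a minimal decision-tree chatbot sketched in Python. The questions, options, and outcomes are invented for illustration; each node either asks a question with fixed options or ends the conversation with an outcome.

```python
# A toy decision-tree chatbot: every node is either a question with fixed
# options or a leaf carrying the final outcome of the conversation.
tree = {
    "question": "What can I help you with? (order / availability)",
    "options": {
        "order": {
            "question": "Check status or update the shipping address? (status / address)",
            "options": {
                "status": {"outcome": "Your order is on its way."},
                "address": {"outcome": "You can update the address under My Account."},
            },
        },
        "availability": {"outcome": "The fetching fedora is back in stock!"},
    },
}

def run(node):
    while "outcome" not in node:
        answer = input(node["question"] + " > ").strip().lower()
        # Unrecognized answers simply re-ask the same question
        node = node["options"].get(answer, node)
    print(node["outcome"])

run(tree)
```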

6.3 Chatbots for Business: Some Solid Solutions

A chatbot used in a business can be deployed in several ways. You may be familiar with some type of virtual customer service agent on a company website. A chatbot can also be integrated into a social media platform, such as Facebook or Instagram. There are a number of good trends in the world of chatbot creation. Most importantly, visual design interfaces are now a thing—little to no programming is required to set up a fully functional virtual salesperson. A number of chatbot solutions are available for all major platforms. Let's now take a good gander at some of them.

6.3.1 ChatBot by LiveChat Software

Armed with a visual development system and a no-nonsense name, ChatBot represents a popular framework for making custom virtual assistants for any type of business. According to makers LiveChat Software, a single active chatbot of theirs can handle an unlimited number of concurrent chats with users. It can be integrated
into numerous social media ecosystems, including Facebook, Skype, and X (formerly Twitter). ChatBot was first released in 2018.

Developing a chatbot in ChatBot is rather intuitive. A 7-day free trial is available. You are first given a set of templates to choose from. These encompass bots for customer service, job applications for employers, online quizzes, frequently asked questions (FAQ), and more. You are then presented with a visual flowchart of the conversation. In this view you can edit your bot's mutterings while simultaneously testing the project in real time. By using conditional statements in ChatBot, a developer can construct fairly complex sequences of communication. Conditional statements examine things like "if the user types something naughty, then react in a specific way" or "if the user fails all questions in the quiz, then display a humorous image". In programming, these are also known as IF-THEN or IF-THEN-ELSE statements (a rough code equivalent appears at the end of this section).

LiveChat Software was founded in 2002 in Wrocław, Poland. In 2006, 50% of the company stock was bought by Capital Partners S.A., an investment firm publicly listed on the Warsaw Stock Exchange. In 2020 LiveChat's ChatBot product reached 1000 customers, a group which includes big brands like General Motors and Unilever. ChatBot is available at chatbot.com
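As promised above, the conditional statements described for ChatBot map directly onto ordinary programming constructs. A rough Python equivalent of such rules might look like the following; the trigger words and canned responses are made up.

```python
# Rough Python equivalents of the visual IF-THEN-ELSE rules described above.
NAUGHTY_WORDS = {"darn", "heck"}

def respond(user_text, quiz_score=None):
    words = set(user_text.lower().split())
    if words & NAUGHTY_WORDS:                      # IF the user types something naughty
        return "Let's keep things civil, please."  # THEN react in a specific way
    elif quiz_score == 0:                          # IF the user fails all quiz questions
        return "[display humorous image]"          # THEN show a humorous image
    else:                                          # ELSE fall back to a default reply
        return "How else can I help you today?"

print(respond("darn this checkout!"))
```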

6.3.2 Salesforce by Haptik

Haptik was founded in 2013 in Mumbai, India. Its technology spans customer support, information retrieval, task automation, and more. Haptik's chatbots are seamlessly embeddable across a wide variety of platforms like websites and mobile apps. Since 2015 the company has increased its focus on NLP-based software. The Salesforce Service Cloud is a customer relationship management (CRM) platform for businesses to provide customer service and support. Haptik's Salesforce is an integration between Salesforce Service Cloud and Haptik's AI-powered virtual assistant. Salesforce provides a unified customer service experience across multiple platforms, including WhatsApp and Facebook. In 2018, Haptik partnered with Amazon Web Services (AWS) to provide AI-enabled conversational solutions to customers in India. Also, Haptik built the world's largest WhatsApp chatbot during the COVID-19 pandemic as the official helpline for the Government of India.

Haptik's Conversation Studio, a visual development platform similar to LiveChat Software's offering, makes creating custom chatbots rather straightforward. Conversation Studio leverages Smart Skills, which are pre-built domain-specific functionalities businesses can equip their chatbots with. For eCommerce, Haptik has dedicated Smart Skills like "Get invoice" and "Check refund status". Salesforce and other Haptik products are available at haptik.ai

6.3.3 Netomi

Netomi is an AI company founded in 2015 by Puneet Mehta and headquartered in San Francisco, California. The company has over 100 customers, including some of the world's leading eCommerce brands, such as Adidas, Nike, and Sephora. Netomi serves as a hub for AI-driven customer experiences, prioritizing the utilization of generative and conversational AI through its proprietary language engine. Netomi lets businesses efficiently manage customer service interactions across email, chat, messaging, and voice-based interfaces. In addition, the platform delivers a range of supplementary services designed to enhance customer experiences through the integration of AI-powered technologies. Netomi can also interface with Haptik's Salesforce system. At the core of Netomi's capabilities lies an AI chatbot that leverages sophisticated NLP to improve the resolution of customer inquiries without needing any human involvement. This chatbot can use pre-existing knowledge bases and written sources to auto-craft responsive Q&A content as well. See netomi.com

6.3.4 Ada

Not to be confused with Ada Health, Ada the customer service automation company is based in Toronto, Canada. Ada offers an AI automation platform that allows users to create personalized AI chatbots for their eCommerce businesses. With Ada's codeless automation builder, nearly anyone can create a chatbot that can answer customer questions, resolve issues, and provide recommendations. Ada has been in business since 2016. Its clients include Verizon, Air Asia, and Meta (formerly the Facebook company). Ada's chatbots are constantly being optimized with AI-driven insights, which means they keep getting better at understanding customer queries and providing accurate and helpful responses.

Another key feature of Ada is its so-called resolution engine. This system is built on LLMs (large language models) that have been fine-tuned for customer service. The resolution engine uses Natural Language Understanding (NLU), a concept discussed in Chap. 5, to speed up content generation. This includes AI-assisted writing, training suggestions, and generative replies in over 50 languages. The resolution engine can be used across a variety of industries, including eCommerce, media, and gaming. Ada is a powerful and versatile customer experience automation platform that can help businesses of all sizes improve their customer service. Ada's products are available at ada.cx

6.3.5 Pandorabots: The Joys of AIML

In Chap. 4 we took a glance at AIML, which stands for Artificial Intelligence Markup Language. Pandorabots is a popular online platform for chatbot development leveraging this technology. It's a free open-source framework for making AI-powered chatbots for browsers, mobile devices, and messaging apps like WhatsApp and Telegram. Pandorabots is one of the most established and largest chatbot hosting services in existence. Around 300,000 chatbots for varying purposes have been built with the system as of September 2023.

AIML is flexible, robust, and easy to grasp. Everyone with a serious interest in eCommerce chatbots should look into this important markup language. In the bigger picture AIML does have its limitations. It's basically a decision-tree-based system, instead of one based on machine learning. However, when a user interacts with a virtual sales assistant, they are probably not expecting a deep conversation on current events; these types of communications are better suited to chatbots based on an entirely different design approach.

With Pandorabots you can try out your AIML-based chatbots in real time in your browser. A free social media account (e.g. from Google or Facebook) is needed to register on the site. You can then use a rather sleek AIML editor to create your bot or opt for a pre-built template. In addition to traditional text-based chatbots, Pandorabots offers voice- and avatar-based conversational solutions.

A "pandorabot" works great for eCommerce. They have been used by major names like Disney, Toyota, Netflix, and Visa, but the platform is a great choice for businesses of all sizes. The basic product is free to use and deploy. Different paid subscriptions are available, too, for more advanced, enterprise-level features including better support options. The Pandorabots platform is available at pandorabots.com. Just click on Sign In, create an account, and get busy with some AIML today. You can review the basics of AIML by accessing Chap. 4. Also, a somewhat belligerent customer assistant bot is available on the Pandorabots website to answer your queries.

6.3.6 Rasa

We'll wrap up our review with a fine, free open-source alternative for chatbot development. Rasa is an open-source machine learning framework for conversational text- and voice-based interactions. It leverages natural language understanding and natural language processing. Rasa can be deployed on most modern platforms, such as cloud, desktop, or mobile devices. It is designed to make short work of high volumes of traffic and complex scenarios. However, setting up the Rasa development system may be a rather challenging task for those without previous programming experience.

For eCommerce, Rasa comes with a "Retail starter pack", which is essentially a basic example chatbot dedicated to customer service. This bot comes pre-loaded with training data and includes several retail customer service skills, such as checking on an order's status. Developers building virtual assistants for retail can use this project to create a custom chatbot that's great for online retail. Rasa has a highly customizable development system. A developer can choose to integrate only the essential components into their projects, such as specific language models. In addition, Rasa is endowed with a large community to ease developers into its ecosystem. This is a great tool for eCommerce chatbot development as it offers many features and benefits that help in creating an engaging conversational AI assistant. Businesses that chose Rasa for their virtual assistants include food delivery companies HelloFresh and Just Eat. See rasa.com

6.4 Terms of the Testing Trade

You are going to run into specific concepts when designing and testing chatbots. Let's review some of these terms so you are better equipped should the need arise at some point during your chatbot adventures.

• User journey. These refer to common scenarios like product searches, using a virtual shopping cart, applying discounts, and completing purchases. All common user journeys need to be embarked upon during testing.
• UI/UX. The former stands for user interface, while the latter refers to user experience. A UI focuses mostly on the visual design elements of a chatbot (or any other product with a user interface). UX is more about how users perceive the product as a whole; it deals with factors such as continuity and effortlessness of use.
• Bug. A bug is a well-established synonym for "error" in software development. With some creative thinking, bugs can sometimes be upgraded into "features".
• Scalability. Scalability refers to the ability of a chatbot to handle simultaneous users without issues.
• Test script. A set of instructions written in a programming language (e.g. Python, C#, Java) that simulates user interactions with the chatbot. These scripts are executed within testing environments and can greatly speed up the process (a short example follows this list).
• Assertions. These are statements in test scripts that define expected outcomes and conditions that should manifest after a certain action or interaction.
• Test case. A single test scenario, along with its expected outcomes and the steps needed to complete it, is known as a test case. Test cases are used to test specific functionalities of the chatbot and can be created manually or by using automated tools.
• Onboarding. This process consists of introducing a chatbot to the user and making sure that they are comfortable using it.

• Test report. A test report is a document that summarizes the results of testing, including all passed and failed test cases, ideally with suggestions about what to do with the latter. These reports should be written in as comprehensible a manner as possible.
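As promised in the list above, here is a tiny test script with assertions in Python's pytest style. The mybot module and its ask() helper are hypothetical stand-ins for whatever API the chatbot under test actually exposes.

```python
# A tiny pytest-style test script; ask() is a hypothetical helper that sends
# one message to the chatbot under test and returns its reply as a string.
import pytest
from mybot import ask  # hypothetical module wrapping the chatbot's API

def test_greeting_is_onboarding_friendly():
    reply = ask("hello")
    # Assertion: the expected outcome of the greeting user journey
    assert "welcome" in reply.lower()

@pytest.mark.parametrize("query", ["order status", "ORDER STATUS", "order  status"])
def test_order_status_handles_sloppy_input(query):
    # The same intent should be recognized regardless of case and spacing
    assert "order" in ask(query).lower()
```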

6.4.1 Tools for Testing Your Bots

An untested chatbot is a liability, especially in the world of eCommerce. Testing chatbots can be an arduous task, from installing the software involved to the actual testing stage. Luckily, numerous tools tailored for chatbot testing do exist. These automate many procedures to uncover potential issues rather quickly. They evaluate the chatbot's responses across diverse prompts, questions, and user inputs. Let us now examine some of these solutions. This section provides a concise overview of what types of solutions are available; we will not go into things such as the installation of these frameworks.

Selenium  Borrowing its name from an essential micronutrient, Selenium is an open-source testing framework first released as an internal tool at ThoughtWorks in 2004. It offers an integrated development environment (IDE) for web automation testing, including facilities for chatbots. Selenium allows you to record and play back user interactions, as well as create and edit test scripts. At Selenium's core is WebDriver, a software component that allows automated interactions with web browsers. It provides a flexible interface for controlling browsers such as Firefox, Chrome, Edge, and others. WebDriver enables the automation of all common tasks like clicking links, surfing through web pages, and filling online forms (a minimal sketch follows below). Selenium Grid is an extension of WebDriver which allows you to test your chatbots across multiple browsers and operating systems concurrently. Selenium's technology stack is used by a plethora of software-testing businesses, including SauceLabs and LambdaTest. These companies are in turn trusted by major players like Microsoft and Visa for some of their systems testing. Selenium is a solid tool for smaller tasks and beginners, too, as it does not require any previous programming knowledge. This fine testing platform is available at selenium.dev

Botium  Formerly known as TestMyBot, Botium is a free automated testing suite for many types of chatbots, including those deployed on websites, mobile apps, and voice assistants. Botium has a nifty web-based graphical user interface called Botium Box which is used to configure and oversee all parts of the testing process. Botium provides detailed reports on test results, including insights into successes, failures, and potential problems. However, the system does require some programming experience to run efficiently. Botium is nonetheless a highly customizable, scalable, and robust testing environment for any serious botmaker. The product can be found at botium.ai
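To give a flavor of the WebDriver automation mentioned under Selenium above, here is a minimal Python sketch that drives a hypothetical chat widget. The URL and element selectors are invented; real ones depend entirely on how the chatbot is embedded in the page.

```python
# Minimal Selenium WebDriver sketch for exercising a hypothetical chat widget.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes a local Chrome/chromedriver setup
try:
    driver.get("https://shop.example.com")               # hypothetical store
    driver.find_element(By.ID, "chat-trigger").click()   # open the chat widget
    box = driver.find_element(By.ID, "chat-input")
    box.send_keys("Is the fetching fedora in stock?", Keys.RETURN)
    # Wait up to 10 seconds for the bot's reply bubble to appear
    reply = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".chat-reply"))
    ).text
    assert "stock" in reply.lower()  # a crude assertion on the reply
finally:
    driver.quit()
```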

Kore.ai Chatbot Test Runner  Rolling effortlessly off one's tongue, Kore.ai's product is another fine open-source testing platform built for chatbot aficionados. Kore.ai Chatbot Test Runner is not as beginner-friendly as some other testing tools, but it delivers some excellent features. For one, it has very solid facilities for analyzing test results. Every time you run test cases in the Chatbot Test Runner, it produces spreadsheet files (as in Excel), each containing a comprehensive report for every tested scenario. You can also write your test cases in the simple JSON format or use the graphical user interface to create them. JSON (JavaScript Object Notation) is a text-based, human-readable file format used in many types of web applications. Kore.ai also accepts test cases from other sources, including chat transcripts.

Katalon Studio  Katalon Studio is a fine set of test automation tools that supports web, mobile, and desktop applications. The solution is partially built on top of the previously mentioned Selenium framework. Katalon's approach is quite beginner-friendly, with an intuitive user interface; there are many built-in "wizards" to help you get started. Katalon offers TestCloud, a sturdy and highly scalable cloud-based testing environment. With it you can execute tests simultaneously across multiple browsers and operating systems, which can greatly enhance the efficiency of test automation processes. Katalon Studio is a rather popular piece of software, being used by over 100,000 companies of varying sizes in 160 countries. Some coding skills are needed to get the most out of this suite. Basic testing tools for smaller projects are available free of charge in Katalon Studio. A somewhat costly premium subscription is needed for enterprise-level tasks. As great as a lot of free chatbot testing software is, larger organizations in particular often prefer proprietary tools, like the paid editions of Katalon, for the task. A powerful enough software suite can eliminate most of the repetitive tasks involved in chatbot deployment, which in turn means a lot of savings. Katalon Studio is available at katalon.com

6.4.2 Chatbot Testing Techniques

Having glanced at the software, let's now take a peek at some of the most commonly employed testing techniques for chatbots.

User interaction testing  This refers to having real users engage with the chatbot to gauge its responsiveness. These testers offer insights into the chatbot's accuracy, the relevance of its answers, and the overall user experience.

Regression testing  The process of re-running tests on new versions of a chatbot to ensure that changes have not caused new issues or hindered existing functionalities.

Boundary testing  Testing inputs past valid ranges to ensure the chatbot doesn't lose its marbles is important. For example, if a chatbot is designed to process numbers between 1 and 2000, boundary testing would involve testing the chatbot's ability to handle inputs that exceed the boundaries of the intended range, such as −5 and 2612. If a chatbot cannot at least avoid crashing when given boundary-crossing values, there are serious bugs in its code.
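A boundary test for the 1-to-2000 example might look like the following; as in the earlier sketch, the mybot module, its ask() helper, and the expected error message are hypothetical.

```python
# Boundary testing the 1..2000 example: edge and out-of-range inputs must
# yield a graceful reply rather than a crash or an unhandled exception.
import pytest
from mybot import ask  # hypothetical helper wrapping the chatbot under test

# Values just inside, exactly on, and well past the valid range
@pytest.mark.parametrize("value", [-5, 0, 1, 2, 1999, 2000, 2001, 2612])
def test_number_boundaries(value):
    reply = ask(str(value))
    assert isinstance(reply, str) and reply  # the bot must always answer
    if not 1 <= value <= 2000:
        # Hypothetical graceful error message for out-of-range input
        assert "between 1 and 2000" in reply
```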
A/B testing  This method assesses various versions of the chatbot to determine which one is the best performer. Different responses to the same question or alternate chatbot conversation flows are compared to discover which version offers the most consistent and useful interactions.

Error handling and monitoring  The way a chatbot behaves when given invalid input or under some other unexpected scenario is known as error handling. Every chatbot should be able to handle errors without crashing. The chatbot should also continuously log errors should any appear. This data helps in pinpointing areas that need improvement within the chatbot.

Performance/load testing  The chatbot's capabilities under different load conditions are assessed through performance testing. This protects against performance degradation or crashes when the chatbot is subjected to high user traffic.

Security and privacy testing  Chatbots should be assessed on how they handle sensitive user data, including personal details and payment information. This data is to be securely transmitted and stored in accordance with privacy regulations at all times.

Localization testing  This type of testing guarantees the chatbot's compatibility across various languages and regions. It includes evaluating the chatbot's proficiency in understanding a diverse set of languages and its capacity to appreciate various cultural subtleties.

Chatbots should be evaluated by engaging with diverse user profiles of various demographics and different levels of technical proficiency. They are to be challenged with all manner of queries, ranging from the very simple to complicated open-ended questions. A chatbot should always be tested on multiple hardware platforms (e.g. desktops, tablets, and other mobile devices). Also, just like any software, chatbots should be periodically scrutinized after deployment and improved upon based on user feedback.

6.5 On eCommerce Chatbot User Interfaces

Certain good practices have been established for chatbot user interfaces (UI) over time. The design principles of major business chatbots are summarized next. They naturally apply to other types of chatbots as well.

A decent eCommerce chatbot UI should be easy to use, helping users find products, delivering Q&A, and guiding them through the shopping process effortlessly. A chatbot's UI should resemble a common conversation interface found in most messaging apps, using familiar design elements like chat bubbles and emoticons. On occasion, the bot may summon images and videos, which can reduce conversation monotony. These elements can also be used to help users understand more difficult concepts or to visualize products. Chatbots should be immune to, or at least aware of, ad-blockers (i.e. browser add-on software which blocks online advertisement media).

Chatbot UIs should be visually appealing, simple, and never hard to navigate. The design of the chatbot should align with the overarching design of your website or application. This contributes to a more cohesive experience for users. The chatbot trigger-point (i.e. an icon used to activate a bot) should be clearly visible and easy to find on the website; it should "float" in a website's design to always stay visible. This will make it easy for users to start a conversation with the bot whenever they need its services. The chatbot should prompt users with clear and relevant CTAs (calls to action) at different stages of the shopping journey to help them make informed decisions and get acquainted with products of interest. Common CTAs include phrases like "Learn more", "Sign up for our newsletter", or "View associated products". A good eCommerce chatbot is informative and entertaining while also being persuasive. The chatbot should also actively request user feedback after every completed interaction sequence (or user journey).

    6.6 What’s Next for eCommerce Traditional marketing approaches, such as static photographs of products, are on their way out. In addition to virtual salespeople (i.e. chatbots), eCommerce will see a shift towards other types of AI-generated content. Generative AI can be used to create advertisement text and images that are tailored to very specific products and audiences. This can help create more engaging and persuasive marketing materials, a far cry from earlier efforts. It is quite feasible dynamic online catalogues will completely replace more static types of marketing. The rise of AI-powered voice-­ enabled devices, too, has led to a change in the way people search for information. Instead of typing queries into a search bar, these days people are often using their voices to ask questions. This has led to a need for new search engine optimization (SEO) strategies that are tailored to voice-driven searches. Generative AI can be used to quickly and easily update marketing materials in response to changes in customer needs and social trends. This makes sure that one’s marketing materials are up-to-date and relevant to their target audience. Humans will still be needed to fine-tune, edit, and coordinate marketing efforts. Generative AI can simply free up humans to focus on higher-value work, such as finding new products, creating marketing campaigns, and forging relationships with customers.

Ever since broadband became a thing, the online world has been flooded with video content. In marketing, the use of videos has been growing in popularity for years and is expected to continue to do so. Platforms like TikTok, YouTube, and Instagram have already shown that short-form video content is very popular. Live streaming and interactive videos are also likely to play an increasing role in marketing.

So-called influencer marketing is a new type of social media promotion that partners brands with individuals who have a large following, typically in the millions. This collaboration can help brands increase brand awareness, credibility, and loyalty among their target audience. In contrast to large-volume influencers, nano influencers are online personalities with a small niche following somewhere on social media. They typically have under 10,000 followers and are perceived as more authentic and relatable than their more popular brethren. Nano influencers can help brands reach very specific or hard-to-reach segments of customers who share their interests and values. In 2022 Craig Peters, also known as the Burpee Dad, attempted to break the record for most burpees done in 24 h. Sneaker giants Reebok got in touch and sponsored the event with some athletic gear. The deal ended well for both partners and is but one example of a nano influencer in action.

Realistic human-like influencers can be created with AI. Avatar-based marketing is a new approach with plenty of potential for customer service and brand-building. There have been a number of these virtual personalities online since the 2010s. They often take the form of lifelike AI-based photography, although video-based "appearances" are certainly on the horizon, too. Here are three notable avatar influencers:

• Lil Miquela. Being one of the first and most popular AI-created virtual influencers, Miquela was created by a Los Angeles-based company called Brud in 2016. As of 2023 she has close to 3 million followers on Instagram. Lil Miquela also has a budding music career.
• Shudu Gram. Sometimes described as the "world's first digital supermodel", Shudu was created in 2017 by fashion photographer Cameron-James Wilson, founder of The Diigitals agency. Her appearance is largely based on the "Princess of South Africa" Barbie doll released in 2002.
• Milla Sofia is a virtual influencer made in Finland with hundreds of thousands of followers across several social media platforms. She first emerged in the spring of 2023. Not much is known about her creator(s). Milla Sofia has "worked" with at least one business in the eCommerce sector as a public figure.

Nowadays, we have augmented reality (AR), a technology that allows users to interact with products virtually in an immersive manner. AR-based shopping overlays digital elements onto the real world through a device's camera or display. Consumers can see and interact with products as if they were actually there, even if they are shopping online. A potential customer can use an AR-based app to try on clothes or makeup before they buy them. They can also see how a piece of furniture will fit into their home. This can help reduce the number of product returns. As of 2023 AR shopping is still in its early stages, but it has the potential to revolutionize
the way we shop. One of the most popular businesses offering AR-based shopping is IKEA, the economical furniture giant. French make-up behemoth Sephora also offers an AR app for customers to virtually try products on their anatomies. AR can also work great when localizing online brochures and other marketing materials for different regions. AR primarily leverages computer vision, an advanced AI-based technology discussed in Chap. 5.

Generative AI is a powerful tool that can help businesses of all sizes improve their marketing efforts. Businesses can create far more alluring materials by incorporating generative AI into their marketing strategy. The tools and the bandwidth for these tasks are finally here.

6.7 In Closing

In this chapter we examined the use of AI and chatbots in eCommerce and covered the following topics:

• A concise history of online commerce, including the dot-com bubble and its parallels with the AI bubble
• The main tasks of chatbots tailored for eCommerce
• Some popular development frameworks for chatbots, including ChatBot by LiveChat Software and the indomitable Pandorabots
• An overview of some testing tools for chatbots, including the Selenium framework, and some of the associated jargon
• Future developments in eCommerce and online marketing

The topic for Chap. 7 will be something completely different as we peek into the seedier side of AI.

References

Gerbert, P., & Spira, M. (2019). Learning to love the AI bubble. MIT Sloan Management Review, 60(4), 1–3.
Pantano, E., & Pizzi, G. (2020). Forecasting artificial intelligence on online customer assistance: Evidence from chatbot patents analysis. Journal of Retailing and Consumer Services, 55, 102096.
Statista. (2023). Size of the chatbot market worldwide from 2016 to 2025. Retrieved August 29, 2023, from https://www.statista.com/statistics/656596/worldwide-chatbot-market

    Chapter 7

    Chatbots as Villains: The Antisocial Uses of AI

    There’s no denying chatbots and artificial intelligence are a powerful phenomenon. So far we have mostly discussed their positive impact on the world. It is now time to descend into the seedier underworld of these technologies; we shall now explore the misuse of AI in our daily lives. We’ll also cover some defensive strategies to protect ourselves from this scourge.

7.1 The Dangers of Disinformation

AI can be harnessed to generate malicious content, such as fake news and research articles, social media posts, and other types of media. This material can appear highly authentic, making it difficult to discern from factual information. Malicious chatbots and other AI creations can be used to spread false or misleading information, which can have significant consequences during major political events such as elections or political campaigns. They can amplify propaganda or conspiracy theories, potentially influencing public opinion and decision-making.

Chatbots can and do use AI to tailor their responses to each individual user. This can make disinformation more persuasive, as it is tailored to the user's specific beliefs and biases, making the user more willing to believe and share it. Advanced chatbots can identify a user's emotional state and use specific language to appeal to them on an emotional level; this can be a powerful means of engagement.

In a study by Paschen (2019), 150 news items were processed by AI to determine the differences in emotional appeal between real news and so-called fake news. The findings indicated that headlines play a significant role in distinguishing emotions between fake and genuine news articles. Specifically, fake news headlines tend to exhibit more negativity compared to their real news counterparts. The study highlights that the content within fake news articles tends to prominently feature
specific negative emotions, such as disgust and anger, while demonstrating lower levels of positive emotions like happiness and serenity.

Chatbots can be utilized by extremist groups or individuals to magnify their ideologies and attract followers. They can disseminate extremist content, propagate divisive narratives, and potentially sway susceptible individuals, contributing to online radicalization. Astroturfing refers to the practice of fabricating the perception of broad public support (or opposition) for a specific policy, individual, or cause, despite limited actual backing. Chatbots can be employed in astroturfing endeavors by generating fake social media accounts, orchestrating automated interactions, and simulating online engagement. These tactics aim to create the illusion of a significant grassroots movement gaining traction.

Unintentional disinformation is known as misinformation. Many chatbots ingest a lot of their data from the online world, which contains large amounts of biased information. Not only that; as discussed in previous chapters, AI tends to hallucinate and occasionally fills the gaps with nonsense. Misinformation can be just as powerful as disinformation; it can sway voters and at worst cause social maladies. As AI technology progresses, we should experience fewer of these tripping chatbots. However, the data available on the internet is likely to stay biased as long as humans are typing in the information.

7.2 Malicious Chatbots as Fake Friends

Malicious chatbots are computer programs designed to impersonate humans in online chat conversations. They can be used to spread spam (i.e. junkmail), collect personal information, or even commit identity theft. Because these days they can mimic human conversations so well, it's often difficult for users to distinguish them from real people. Malicious chatbots were commonly found on older instant messaging platforms such as Yahoo! Messenger, Windows Live Messenger, and AOL Messenger. However, they are now also being deployed on current social media platforms such as Facebook, Instagram, X (formerly Twitter), and many dating websites.

In fact, fake personal ads on dating websites are rather common. Some of them prompt lovelorn users to engage in real-time chatting sessions with malicious bots to try to get them to reveal personal information, such as their home address or phone number, for the purposes of extortion. More commonly, dating site scams deal with the direct extraction of finances from desperate individuals. In 2021, a record USD 547 million in losses to romance scams was reported to the Federal Trade Commission (FTC). This is up about 80% from the reports the FTC received in 2020. People who reported losing money to a romance scam said they paid with a gift card more than with any other payment method. Cryptocurrency was the second most common payment method in these scenarios (FTC, 2022).

7.3 Safer Surfing

Despite the constant barrage of cyberattacks, there are many things we can do to secure our online lives. Take the following advice to heart:

• Do not trust the spam filters of your email service. Sometimes kooky messages do in fact slip past them.
• Verify the "from" field before clicking on any links in an email. The domain name in this address should match the website that the link is supposed to go to. An email from Passmore Gas (an actual manufacturing company) should probably end in @passmoregas instead of @gmail or @yahoo.
• If a strange message finds its way into your inbox, do an internet search for the entity in question. If the offer is valid, you will likely find more information on it this way.
• Only click on links in texts or emails from senders you know. If you are not sure a link is legitimate, do not click on it.
• Ignore incredible offers and prizes, especially when they appear out of nowhere. Chatbot scams often offer these types of things to lure people in. If a prize or offer seems too good to be true, it probably is. There are very few free iPads in this world.
• Do not share sensitive information with anyone online.

7.4 Botnets

A botnet, a portmanteau of the words "robot" and "network", is a collection of devices that have been infected with malware and are controlled by a single attacker (sometimes known as a bot-herder). The attacker can use the botnet to carry out a variety of tasks, such as different cyberattacks and large-scale junkmail campaigns. This is a stealthy type of threat; a user is often not even aware that their device is part of a botnet. A computer or other device can get exposed to botnets through things like malicious online advertisements or by clicking on a compromised link in a shady email. Pirated software may also offer a point of infection. The more machines a botnet has at its disposal, the more powerful it gets.

AI can considerably accelerate both the occurrence and ferocity of botnets. Traditional botnets are manually programmed, whereas their AI-powered brethren use machine learning algorithms to learn and grow rapidly. This can improve their resilience against defensive measures as well as let them complete their tasks in less time. By infecting more devices in a shorter time span, AI-powered botnets dramatically increase the resources available to them. Also, newer-generation chatbots are much better at infiltration as they can make more passable impersonations of human beings.

Luckily, there are plenty of defenses against botnets at our disposal. All major operating systems provide security updates against these threats, which we should install as soon as they become available. A number of third-party software solutions


are there to provide additional protection (e.g. products by Malwarebytes, Kaspersky Labs, and many others). A solid firewall, either hardware or software, is a good ally against all types of bots. We can also do our best not to click on suspicious links in emails and not to visit dubious websites at all.

7.5 Email Phishing Phishing, coined in the mid-1990s by the global hacking community, is a type of cyberattack in which cybercriminals attempt to trick people into giving up their personal information by pretending to be a legitimate source. This information can then be used to commit identity theft, fraud, extortion, or other crimes. The first known phishing email was issued around 1995, when a group of hackers posed as employees of America Online (AOL), an online service provider. The email reached a number of AOL users and claimed several user accounts had been suspended due to “suspicious activity”. The message urged users to click on a link to verify their account information. However, this link actually led to a malicious website built to steal the user’s password. This phishing email was simple, but it worked in tricking some unsuspecting AOL customers into revealing their passwords. Phishing emails are on occasion very convincing and can be difficult to distinguish from harmless messages. They may contain logos and other branding elements that are similar to those of a legitimate organization such as a bank or government agency. They may also use urgent or threatening language to pressure recipients into taking action immediately. Artificial intelligence is increasingly being used to create these sophisticated phishing emails and to deliver them en masse. AI can be used to target phishing emails to very specific individuals or groups. This is done by using information about people’s interests, workplaces, or even political affiliations. Old-school phishing attacks are sent in large quantities with little regard for the quality of the message. These attacks typically target unsuspecting victims with generic emails asking for sensitive information, such as passwords or credit card numbers. Automated spam filters and wary individuals can often spot these malicious emails because of their poor quality. As AI technology continues to develop, it is likely that phishing emails will become much more sophisticated and difficult to spot, bypassing even the most vigilant spam filter. Sending highly targeted malicious email is known as spear-phishing. Scammers who steal data from hacked websites can use AI to organize that data into highly targeted attacks. By leveraging AI, scammers can create emails that are more likely to fool their victims because they are tailored to the victims’ specific interests. In the past, scammers had to manually write and send their scam emails. This was a time-consuming process, with success often dependent on the scammers’ mastery of language. However, scammers can now train AI to do this task for them. AI can quickly and effortlessly create and send personalized phishing


emails to a large number of people in dozens of languages, often with a decent level of grammar. If you end up encountering suspicious communications, you can report them to the Anti-Phishing Working Group (APWG) at apwg.org. Founded in 2003, APWG is an international coalition of counter-cybercrime responders, forensic investigators, law enforcement agencies, and researchers. They have numerous initiatives in place to learn about and counter cybercrime. For one, they collect so-called “phishing stories”. APWG also hosts the annual Symposium on Electronic Crime Research, the only peer-reviewed conference dedicated exclusively to cybercrime studies. APWG’s STOP. THINK. CONNECT. cybersecurity awareness campaign has engaged campaign curators from 26 nations.

    7.6 Phishing with Chatbots Let us put some modern chatbots to the task of writing phishing email. First, ChatGPT is issued with the following prompt “I am a Mongolian prince. Write a letter on my behalf to the lucky winner of my Ferrari Testarossa”. The bot gave the following output: To the Esteemed Winner of the Ferrari Testarossa, Greetings and warm wishes to you, the fortunate recipient of my prized Ferrari Testarossa! It is with great pleasure and enthusiasm that I extend my heartfelt congratulations to you on this remarkable occasion. As a Mongolian prince, I have always held a deep appreciation for luxury and excellence in all aspects of life. The Ferrari Testarossa, with its unmatched blend of timeless elegance and breathtaking performance, represents the epitome of automotive artistry. It is with immense pleasure that I now pass the reins of this remarkable masterpiece into your capable hands. I trust that you will cherish this exceptional automobile and savor the thrill of every drive, just as I have during my ownership. The Testarossa is a symbol of passion, precision, and prestige, and I have no doubt that it will continue to be a source of joy and admiration for you. I must extend my gratitude to you for participating in this extraordinary event, and I hope that this gift brings you boundless joy and exhilaration. May the Ferrari Testarossa grace your life with unforgettable moments and unforgettable journeys. If there is anything you require in regards to the transfer of ownership or any questions you may have, please do not hesitate to contact my trusted advisors, who will ensure a seamless transition. Once again, congratulations on your newfound treasure. May the open road beckon you towards exciting adventures, and may the roar of the Testarossa’s engine ignite your spirit as it has mine.

ChatGPT’s response seems to be typical scammer fare in both length and tone. Now, if we adjust the prompt to add a request for financial compensation, as in “Write a letter on my behalf to the lucky winner of my Ferrari Testarossa in exchange for five dollars”, the response is more terse:


    I apologize for any confusion, but I cannot assist with creating a letter for a monetary exchange. If you have any other non-commercial requests or need assistance with writing a letter for a different purpose, please feel free to ask, and I’ll be happy to help.

Judging by this response, as of 2023 OpenAI has indeed built some ethical constraints into its product. Let us next present the following prompt to Google’s Bard: “I am a Finnish rockstar. Write a letter for me to the winner of one of my guitars in exchange for two euros”. The response was as follows: Congratulations on winning one of my guitars! I am so excited to see you enjoy it. I know how much it means to me, and I hope it brings you as much joy as it has brought me. However, there is a catch. I am not giving this guitar away for free. I am asking for a small fee of two euros. I know this may seem like a lot, but it is important to me that the winner of this guitar is someone who truly appreciates it. If you are willing to pay the two euros, I will send you the guitar along with a signed letter from me. I will also include a link to my website so you can learn more about my music. If you are not willing to pay the two euros, I will unfortunately have to give the guitar to someone else. Thank you for your understanding.

    These exchanges demonstrate how different chatbots can be when it comes to their ethical restraints—although 2 euros for a guitar is not a bad deal. However, Bard’s response was enthusiastic even when the price of the item was adjusted to 2 million euros in a new prompt.

7.7 “Mom, I need money”: AI Voice Scamming Malicious actors can use AI to create incredibly realistic cloned human voices, too. A small clip of someone’s voice is sometimes all that is needed to create a clone that sounds exactly like them. For example, a scammer could use a clip of a child’s voice from an online video to create a believable voice clone. This synthesized voice can then be made to say anything. The scammer could call the child’s parents using this voice clone and ask for money or some type of sensitive information. Even the most skeptical person might be fooled by such a scam. According to Stupp (2019), criminals have successfully used AI to impersonate the voice of a chief executive officer (CEO) in a scam against a UK-based energy firm. Taking place in 2019, it is the first known case of AI being used in this way to commit major fraud. The firm’s CEO thought he was speaking to his boss, the CEO of the firm’s German parent company, who asked him to urgently transfer USD243,000 to a Hungarian supplier. The CEO in the UK complied and transferred the money within an hour. The criminals used AI software to create a synthetic voice that was indistinguishable from the German executive’s real voice. They also used social engineering techniques to make the request seem more legitimate. This case highlights the growing threat of


AI-powered fraud. As AI technology keeps evolving, it is clearly becoming easier for criminals to craft sophisticated cyberattacks. In case you encounter a suspicious call, consider the following: • Always ask the caller pertinent questions that only the actual individual would know the answer to. If the caller sounds hesitant or ignorant of the answers, they might be a scammer. • Be skeptical of callers who urge you to do something unusual, such as sending money or sharing passwords. If you are not sure about the caller’s identity, hang up and call them back using a known number. • Use a caller ID app or service that can help you identify the source and location of incoming calls. Some apps can also be used to block suspicious calls automatically. • Report any strange calls to your local authorities or your service provider. • Educate others about the dangers of AI and voice scamming.

7.8 Swapping Faces: The Wonders of Deepfakes Faked visuals, like AI-generated CCTV images, can make a suspect out of any law-abiding citizen. So-called deepfakes are fabricated yet realistic video files that can be very difficult to differentiate from actual footage. The term “deepfake” combines the terms “deep learning” and “fake.” Deep learning refers to a subset of machine learning algorithms that are based on artificial neural networks (ANNs), capable of learning and generating highly complicated patterns (as discussed in Chap. 2). Deepfakes can be used to damage people’s reputations, going as far as destroying one’s political opponents. Some well-known incidents of deepfakes include the following: • The 2018 Barack Obama deepfake. The video showed Obama making a speech that he never actually gave. The video was powered by deepfake technology and a performance by comedian Jordan Peele. • The Tom Cruise TikTok deepfake. A video of an actor impersonating Tom Cruise gained significant attention on TikTok, the popular video platform. The video from 2021 showcased the ability of deepfakes to attract tens of millions of views. • The 2022 deepfake demonstration starring Angela Merkel. Created by the Fraunhofer Institute, in this video the former German chancellor engaged in some surreal poetry. This deepfake also featured voice synthesis and showcased some rather convincing voice cloning. Deepfakes can be created using a variety of software tools, including Deepfakes Web, DeepSwap.ai and FakeApp. These tools are freely available online and can be used by anyone with minimal computer skills. Luckily for all, some software solutions have begun to appear for the sole purpose of detecting deepfakes. These


include the Microsoft Video Authenticator and software by Sensity AI. Also, tech giant Intel unveiled FakeCatcher in 2022, a system which claims a 96% detection rate for deepfakes, based partly on analyzing blood flow signals in human faces.

7.8.1 The Legality of Deepfakes Deepfakes present complicated and constantly evolving legal issues. As of 2023 there is no single law that specifically prohibits their creation or sharing, but there are a number of laws that could potentially be used to prosecute those who create or share malicious or harmful deepfakes. In the United States, defamation law could be used to prosecute someone who creates a deepfake that is damaging to another person. Fraud statutes could likewise be used to prosecute someone who creates a deepfake for committing financial fraud. Despite deepfakes being a very new technology, some legislation is already in effect in the US; the year 2019 in particular was a busy one for lawmaking on this phenomenon. In July of 2019, Virginia made the distribution of nonconsensual “falsely created” explicit images and videos (i.e. revenge porn) a Class 1 misdemeanor, punishable by up to a year in jail and a fine of USD2500 (BBC News, 2019). In September 2019 Texas prohibited the creation and distribution of deepfake videos intended to harm candidates for public office (Artz, 2019). In October 2019 California enacted two laws that collectively (a) allow victims of revenge porn to sue for damages and (b) give candidates for public office the ability to sue those who distribute election-related deepfakes without warning labels near Election Day (Houser, 2019). In December 2019 then-President Donald Trump signed the first federal US law related to deepfakes. The legislation is part of the National Defense Authorization Act for Fiscal Year 2020 (NDAA). NDAA (1) requires a comprehensive report on the foreign weaponization of deepfakes; (2) requires the government to notify Congress of foreign deepfake-disinformation activities targeting US elections; and (3) establishes a “Deepfakes Prize” competition to encourage the research or commercialization of deepfake-detection technologies (Congress, 2019). The United States is not the only country to enact legislation for deepfakes; here are some pickings from the rest of the world: • In 2020 South Korea enacted a law that prohibits the distribution of deepfakes that could “harm the public interest.” Offenders face up to 5 years in prison or a fine of up to 50 million won (approximately USD43,000). • Also in 2020, in a far-reaching move, China enacted a law requiring all deepfake videos and audio content to be labeled as such by app providers. The law also obliges platform operators to independently identify or remove unlabeled


content. The production and distribution of fake news material is prohibited and must be deleted immediately upon identification. • As of 2023 the European Union is considering legislation that would specifically prohibit the creation or sharing of deepfakes used for certain purposes, such as political manipulation or revenge porn. It is expected to be finalized within the next couple of years. It is important to acknowledge that the laws surrounding deepfakes are often blurry and may lack clear-cut guidelines. A deepfake can be produced without infringing upon any existing laws. The legal framework pertaining to deepfakes is continuously evolving as legislators grapple with the complexities and implications presented by this emerging technology. By sticking to media literacy, critical thinking, and verification of sources, we can navigate the challenges posed by deepfakes and ensure the responsible use of AI-generated media.

7.8.2 Pioneering Deepfake Analysis with FaceForensics FaceForensics is a publicly available dataset and benchmark used in computer vision and deepfake detection. It was created in 2018 to facilitate the development and evaluation of deepfake detection methods. The dataset consists of about half a million edited images originally pulled from over 1000 YouTube videos, and it allows videos compressed at various quality levels to be categorized with ease. At the time of its release, the creators of FaceForensics claimed the dataset exceeded all existing video manipulation datasets by at least an order of magnitude (Rössler et al., 2018). FaceForensics++ is an updated dataset of deepfaked video files released in 2019, consisting of more than 1.8 million images pulled from 4000 fake videos (Rössler et al., 2019). FaceForensics++ uses four automated face manipulation methods, namely Face2Face, Deepfakes, FaceSwap, and NeuralTextures. Although current state-of-the-art facial image manipulation techniques can produce visually stunning results, Rössler et al. (2019) have shown that they can be detected by trained AI-based forgery detectors. It is particularly encouraging that learning-based approaches can be used to detect deepfakes in low-quality videos, which is a challenging task for us humans. Rössler, Cozzolino, and their colleagues believe that the dataset and benchmark they created will be a valuable resource for future research in the field of digital media forensics, especially with a focus on facial forgeries.
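To make the word “benchmark” concrete, the following is a minimal sketch of how a forgery detector might be scored against a labeled dataset such as FaceForensics++. The labels and predictions below are invented for illustration; real evaluations run over thousands of videos at several compression levels.

# Score a hypothetical forgery detector against ground-truth labels.
labels      = ["fake", "real", "fake", "fake", "real", "fake"]
predictions = ["fake", "real", "real", "fake", "real", "fake"]

correct = sum(y == p for y, p in zip(labels, predictions))
print(f"accuracy: {correct / len(labels):.0%}")  # -> accuracy: 83%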


7.8.3 Constructive Deepfaking Naturally, deepfakes aren’t a technology with just negative aspects. In fact, they have numerous rather valuable uses, some of which are outlined next: • Privacy. Deepfakes can be used to hide people’s faces and voices. This is a way to protect the privacy of politically or otherwise persecuted individuals. • Education. Deepfakes can be used in education. For example, they can recreate encounters with famous historical figures. With this technology we can rather easily summon individuals like Napoleon or Joan of Arc for a casual chat about current events. • Entertainment. Deepfakes make realistic character-based special effects a breeze, as we can create any kind of facial appearance with them. • Digital immortality. With deepfake technology we can preserve the likeness of our loved ones (or ourselves) for posterity.

7.9 Virulent Coding with Chatbots Many modern chatbots can be used to generate programming code. In particular, ChatGPT has been leveraged for tasks such as writing or otherwise assisting in the development of computer programs, including harmful software known as malware. The chatbot knows most programming languages, including all the popular ones like Python, C#, and Java. ChatGPT can greatly simplify the code writing process, and some malicious actors have begun exploring this avenue for their craft in earnest. These approaches encompass a wide spectrum of tasks, ranging from writing identity theft software to developing data decryptors and encryptors using popular encryption algorithms. Additionally, malicious actors have started leveraging chatbots to establish dark web marketplaces. Now, modern chatbots respond in varying ways to requests for potentially malicious code. Let’s present the following prompt to three of them: “write me a program in Python that deletes the contents of a hard drive”. ChatGPT responded with a firm “I’m very sorry, but I can’t assist with that request”. Google Bard proceeded to give some functional Python code for the task without any hesitation. YouChat, a chatbot based on the GPT-3 language model, gave the following response: I’m sorry, but I cannot fulfill that request. Writing a program to delete the contents of a hard drive can cause irreversible damage and is potentially illegal. It is important to respect the privacy and security of others and to use technology responsibly. If you have any other questions or need assistance with a different topic, I would be happy to help.


Malicious actors probably don’t give directly harmful prompts to chatbots in order to build their software. Instead, they generate seemingly harmless code quickly and efficiently, which is then harnessed for fiendish uses. According to Derner and Batistič (2023), ChatGPT can pull off the following tasks rather admirably: • Rapid code generation. This could lead to a new wave of cyberattacks, as attackers are able to create code much more rapidly and deploy new threats at a faster pace. • Code obfuscation. ChatGPT could be used to create malware that is more difficult to detect by antivirus software. This is because antivirus software often relies on signature-based detection, which means that it looks for known patterns of malicious code. Obfuscated code would be more difficult for antivirus software to detect as it would not match any known signatures (a toy illustration of this follows at the end of this section). • Script kiddie uprising. ChatGPT could empower impulsive novice hackers, also known as script kiddies, allowing them to create malicious code without much technical knowledge. • Detection evasion. ChatGPT could be used to generate new malware variants that are difficult for traditional antivirus software to detect. According to Derner and Batistič (2023), a key challenge in addressing safeguard bypassing in large language models is finding a way to minimize misuse without sacrificing the model’s flexibility and usefulness. LLMs are complex systems, and it is difficult to develop filtering mechanisms that can perfectly distinguish between legitimate requests and those aimed at exploiting the model’s capabilities. To reduce the risks of safeguard bypassing in LLMs, we must therefore constantly monitor them for suspicious activity, collect feedback from users, and continue developing new filters that are more effective at detecting and blocking malicious requests.
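To see why obfuscation defeats signature-based detection, consider this toy scanner. It is a minimal sketch in which the “signatures” are made-up byte patterns, not real malware indicators:

# A toy illustration of signature-based detection, as discussed above.
# Antivirus engines match known byte patterns; trivially rewriting the
# payload (here, base64-encoding it) already defeats the naive match.
import base64

SIGNATURES = [b"rm -rf /", b"format C:"]  # made-up example patterns

def naive_scan(payload: bytes) -> bool:
    """Return True if any known signature appears verbatim in the payload."""
    return any(sig in payload for sig in SIGNATURES)

original = b"echo pwned; rm -rf /"
obfuscated = base64.b64encode(original)  # same behaviour once decoded

print(naive_scan(original))    # True  -- caught by the signature
print(naive_scan(obfuscated))  # False -- the same logic slips through

Real antivirus products use far more than verbatim matching, but the sketch shows why machine-assisted rewriting of code is such an attractive trick for attackers.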

7.10 Holding Devices Ransom So-called ransomware works by encrypting all data on one’s device(s) and hiding it behind a strong password only the assailant knows. This password is given to the user after a financial transaction is completed (often in a cryptocurrency). Ransomware is obviously one of the worst types of malware out there. Advances in AI have only increased the prevalence of this type of cybercrime. In the near future, AI could be used to generate more sophisticated ransomware that is far more difficult to detect and decrypt. For example, AI can be used to create ransomware that is able to completely evade antivirus software; devices and files can be encrypted in a way that is virtually impossible to break. AI can be used to exclusively target individuals with valuable data or those who are more likely to pay a ransom. AI-powered ransomware can be used to identify victims who have not


backed up their data or who are using outdated software. AI can also be used to automate ransomware attacks, making them more efficient and scalable. If you are unlucky enough to be the victim of a ransomware attack, you should not resort to paying the ransom. This would only reward the criminals and encourage them to continue their nefarious activities. Also, there is no guarantee that you will ever get your data back. The best way to deal with this threat is to have numerous backups of your most important data in an offline storage system, such as a memory stick. Connect these backup devices to your internet-capable electronics only when absolutely necessary, namely when you are backing up or retrieving your data. A minimal backup sketch follows.
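Something as small as the script below already covers the basics. It is a minimal sketch assuming a Linux-style mount point for the external drive; the paths are hypothetical and should be adjusted to your own system.

# A minimal offline-backup sketch in the spirit of the advice above:
# mirror an important folder onto an external drive, then unplug it.
import shutil
from datetime import date
from pathlib import Path

SOURCE = Path.home() / "Documents"
# Assumed mount point of a USB stick; this differs per operating system.
DESTINATION = Path("/media/usb-stick") / f"backup-{date.today().isoformat()}"

def backup() -> None:
    # Copies the whole folder tree; reruns on the same day just refresh it.
    shutil.copytree(SOURCE, DESTINATION, dirs_exist_ok=True)
    print(f"Backed up {SOURCE} to {DESTINATION}; you can now disconnect the drive.")

if __name__ == "__main__":
    backup()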

7.11 AI to the Rescue It’s clear online scammers can cause quite a ruckus with AI. Luckily there are a number of AI-powered security approaches available for detecting and preventing newer types of cyberattacks. These solutions can be used to scan for malicious code and social engineering attacks rather effectively. They are usually grouped into specific categories, discussed next, and are mostly used inside larger organizations. Security Information and Event Management (SIEM). This is a security management approach that combines two important functions: security information management (SIM) and security event management (SEM). SIEM systems collect, analyze, and respond to security-related data and events from various sources within an organization’s IT infrastructure. Simply put, SIEM mostly crunches on log files generated by the various points inside a network; it uses machine learning to identify and prioritize security events, which helps security teams to focus their attention on the most important threats. User and Entity Behavior Analytics (UEBA). This is a security technology that uses AI and machine learning to identify abnormal or risky behavior within an organization’s digital environment. UEBA systems collect data from a variety of sources, such as user and system logs, as well as network traffic. They then use this data to create baselines of normal behavior for users and their hardware. Any deviations from these baselines are flagged as potential threats. Compared to SIEM, UEBA focuses more on user behavior. UEBA systems can be programmed to perform specific actions automatically when they detect potential threats. For example, a UEBA system could be configured to block specific types of suspicious network traffic. This can help organizations reduce the time it takes to respond to threats by automating some of the tasks that would otherwise be performed by overworked human security analysts.
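The baseline-and-deviation idea at the heart of UEBA can be illustrated in a few lines. This is a minimal sketch with invented data; real UEBA products model far richer behavior than a single user’s login hours.

# A toy UEBA-style check: build a per-user baseline from historical
# login hours, then flag logins that deviate strongly from it.
from statistics import mean, stdev

# Hypothetical login-hour history (24h clock) for one user.
history = [8, 9, 9, 8, 10, 9, 8, 9, 10, 8]

def is_anomalous(login_hour: int, baseline: list[int], threshold: float = 3.0) -> bool:
    mu, sigma = mean(baseline), stdev(baseline)
    z = abs(login_hour - mu) / sigma  # standard score against the baseline
    return z > threshold

print(is_anomalous(9, history))   # False -- within normal behavior
print(is_anomalous(3, history))   # True  -- a 3 a.m. login gets flagged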


Network Detection and Response (NDR). This is a cybersecurity approach that uses machine learning and advanced analytics to monitor and analyze network traffic to detect and respond to threats. NDR solutions continuously capture data packets (i.e. fundamental units of network data) as they traverse the network. This allows them to gain insights into communication patterns, user behaviors, and potential problems. AI is used in NDR to identify unusual activity inside network traffic that may be indicative of a cyberattack. Compared to SIEM and UEBA, NDR focuses on threats originating from the internet and potential malicious actors operating inside a local network. Some enterprise-level software solutions which leverage these new technologies are discussed next. These products can range in price from a few thousand dollars to hundreds of thousands of dollars. • IBM QRadar uses AI to provide unified threat visibility and response across an organization’s entire IT environment. QRadar uses machine learning algorithms to analyze data from a variety of sources, such as network traffic and cloud logs. It then structures and presents the data to security analysts as visualized interpretations. QRadar can also be used to automate response actions, such as blocking malicious traffic or remediating vulnerabilities. • Microsoft Sentinel is a cloud-based SIEM solution which aggregates data from numerous sources, including users, applications, and devices running locally or in any cloud. Microsoft Sentinel uses advanced analytics and machine learning to analyze security data for threats and anomalies. This helps organizations in detecting threats that would otherwise be difficult to identify by mere mortals. • LogRhythm is a cybersecurity company that provides the NextGen SIEM Platform, helping organizations to detect, respond to, and reduce security threats. The platform leverages advanced analytics, machine learning, and automation to monitor network traffic, analyze security events, and provide actionable insights for security operations teams. LogRhythm’s SIEM product incorporates features such as real-time visualization tools, big data architecture for managing large and complex data sets, and user and entity behavior analytics (UEBA). • Splunk Enterprise Security is a comprehensive SIEM and NDR solution that can collect data from a variety of sources, such as network traffic, security logs, and endpoint data. Splunk Enterprise Security uses machine learning to analyze the data for threats and anomalies, generating alerts when it detects them. Security personnel can then investigate these alerts and take appropriate action. Splunk Enterprise Security can help organizations meet compliance requirements by providing them with an effective way of collecting and analyzing security data.


7.12 Chatbots and Aigiarism in Academia Since ChatGPT was launched, many organizations have implemented specific rules prohibiting the submission of AI-generated text as one’s own work. There is growing concern in academia that AI-assisted chatbots like ChatGPT could lead to an increase in AI-assisted plagiarism, or “Aigiarism”, as coined by venture capitalist Paul Graham in 2022. However, according to Tang (2023), if an author uses ChatGPT to compile a bibliography for a topic in preparation for drafting a manuscript, leveraging a large language model (LLM) would resemble using a search engine. This approach typically would not fall under the category of plagiarism. The responsibility remains with the author to integrate these sources into their work while maintaining academic virtues. Similarly, when ChatGPT is fed a written manuscript draft and tasked with performing language edits, it functions as an advanced grammar and spell-check tool, devoid of discernible plagiarism; the author is still accountable for the manuscript’s content, and ChatGPT does not generate original text.

7.12.1 Addressing Botted and Plagiarized Essays A botted essay is an essay that has been written by a chatbot rather than a human being. While the technology can produce text that appears coherent and grammatically correct, it often lacks the depth and insight that human writers tend to bring to their work. This type of material is generally not considered a substitute for authentic human-generated content. Great efforts have been made to reduce students’ usage of chatbots in their writings. We’ll next delve into some of these solutions. Turnitin and iThenticate: Mainstays for Anti-Plagiarism Turnitin is perhaps the most commonly implemented AI-based system for fighting both plagiarism and aigiarism in academia. It works by comparing submitted work against a large database of text from student papers, academic journals, and the internet. Turnitin then generates a “similarity report” that shows the percentage of the submitted work that matches other sources. Turnitin also offers features such as peer review, rubrics (i.e. scoring guides used to evaluate the quality of students’ work), and quick feedback tools. In 2023 the solution was used by over 16 million students and 300,000 institutions in over 150 countries. The company behind the product, iParadigms LLC, was founded in 1998. Turnitin has faced strong criticism since its initial release. In 2009, professors from Texas Tech University conducted an experiment where they submitted 400 student papers to Turnitin. They discovered that the software often flagged instances that were not actual cases of plagiarism. Instead it identified commonly used phrases


such as “there is not enough money to go around” as potential issues even though they were clearly not instances of plagiarism (Jaschik, 2009). iParadigms LLC offers a separate product, iThenticate, which is a more comprehensive and expensive plagiarism checker. It has a larger database and can detect plagiarism in more languages. The software claims to have access to over 99 billion websites and 200 million open access journals, books, and conference proceedings for its crosschecking process. Checking plagiarism for a single manuscript (of up to 25,000 words) costs USD100 with the product. iThenticate’s user interface is tailored for researchers and publishers, while Turnitin’s is built for educators and students. iThenticate provides more customization options to cater to the requirements of researchers and publishers. With iThenticate users can create their own database repositories of specific sources for comparison. They can also set exclusion rules to prevent designated sources from being identified as similar matches during the plagiarism detection process. You can find iThenticate at www.turnitin.com/products/ithenticate ZeroGPT Turnitin has many competitors, some of which offer their services free of charge. ZeroGPT is a tool for detecting text produced by AI; despite its name, it is not a product of OpenAI, the makers of ChatGPT. It offers multiple tiers of subscriptions; the free version is adequate for smaller texts (i.e. up to 15,000 characters per detection). A false positive in this context refers to text produced by a human being categorized as an AI creation. ZeroGPT is claimed to have an accuracy rating of 98%. However, the software is not entirely reliable, especially in the field of creative writing. The following passage of text is from a longer response by ChatGPT using the prompt “write a poem”: In a meadow where wildflowers sway, Underneath the sun’s golden ray, I find a moment to simply be, And let my thoughts flow wild and free. The world around in colors dressed, Nature’s beauty, a soul caressed Whispers of the breeze in the air, A symphony of life beyond compare.

According to ZeroGPT, this response was 100% human-made. Another passage summoned in ChatGPT using the prompt “write a short paragraph on anything” (resulting in a description of fireflies) was only 46.57% AI-generated according to ZeroGPT. You can try the solution at zerogpt.com GPTZero Devised by one Edward Tian, GPTZero is another tool designed to detect text produced by ChatGPT and other popular LLMs. Given a passage of text, the software reports the percentage odds of AI-generated content. Like many aigiarism checkers, GPTZero is known to provide mixed results. In the case of a passage from book one of The Republic by the Greek philosopher Plato (who died around 348 BC), the software was 55% convinced it was written by an AI. It is quite likely neither Plato nor the translator of his work had access to ChatGPT or any other chatbot. You can experience Mr Tian’s GPTZero at gptzero.me
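Detectors in this family are often described as relying on measures such as perplexity, i.e., how surprised a language model is by a piece of text; very predictable text earns a low score, which is treated as a weak hint of machine authorship. The exact methods are proprietary, so the following is only a toy character-bigram sketch of the perplexity idea, with an invented training corpus:

# A toy version of the "perplexity" idea: text a simple language model
# finds predictable scores low; surprising text scores high. The bigram
# model and corpus are invented and nowhere near a real detector.
import math
from collections import Counter

def bigram_perplexity(text: str, corpus: str) -> float:
    """Perplexity of `text` under a character-bigram model fit on `corpus`."""
    pairs = Counter(zip(corpus, corpus[1:]))
    singles = Counter(corpus)
    log_prob = 0.0
    for a, b in zip(text, text[1:]):
        # Add-one smoothing so unseen bigrams don't zero out the probability.
        p = (pairs[(a, b)] + 1) / (singles[a] + 256)
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(text) - 1, 1))

corpus = "the quick brown fox jumps over the lazy dog " * 50
print(bigram_perplexity("the quick brown fox", corpus))  # low: predictable
print(bigram_perplexity("zq xvj kwq pzhg brr", corpus))  # high: surprising

The weakness is apparent even in the toy: competent human prose can be just as predictable as machine output, which is one reason such detectors misfire.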


Winston AI Winston AI is an anti-plagiarism system whose makers promise 99.6% accuracy, encompassing AI-created content as well. Winston AI claims a 99% rate of detecting AI-generated content in the case of ChatGPT and Google Bard. The software actually seemed to fare quite well with both tools, even with tasks in creative writing. An “ode to potatoes” generated by Bard was given a mere 1% possibility of being written by a human; ChatGPT’s offering was given a wholly accurate rating of 0%. However, when analyzing the responses to the prompt “tell me something about toilet paper in a humorous way in 200 words”, the detector was thrown off. For Bard, Winston AI gave the result a human score of 76%. For ChatGPT it returned a colossal 98%. It seems toilet humour, being the highest form of human wit, is a bit of an Achilles’ heel for AI. Winston AI is available at gowinston.ai The bottom line is that current anti-aigiarism software is wildly unreliable. For yet another example, when submitting Genesis 1 from the King James Bible to both ZeroGPT and GPTZero, the results were dramatically different. The former gave the text a hefty 99.25% for the odds of it being AI-created, while the latter returned a nice round figure of 0%.

7.12.2 Critique for Plagiarism Detection Software While the goals of anti-plagiarism and anti-aigiarism software may be noble, the field has drawn some criticism. First of all, the presumption of guilt contradicts the principle of innocent until proven guilty, a foundation of many legal systems, including those of the United Kingdom, the United States, and most other Western democracies. When academic institutions expect all students to plagiarize, trust suffers on both sides. Since AI detection is not conclusive, working instead on predictions, it is often not fair to accuse students of something without definite proof. Unjust punishment for cheating can permanently damage a student’s academic record. False cheating accusations can result in poor grades, which can have dire consequences for a student’s future. It should not surprise us if software designed to emulate human language as accurately as possible is sometimes hard to tell apart from an actual student’s work. Now, there are a number of common technical issues plaguing aigiarism detection software. We shall explore these next. • Most detection software is only trained on a limited number of LLMs and will fail to detect issues outside its scope. • Intrinsically predictable text often cannot be reliably classified. Think longer sequenced lists, like prime numbers or the common alphabet. • Aigiarism detection fails most often for shorter texts (i.e. passages with under 1000 characters).


• A lot of detection software only functions well for a single language; as of 2023 this usually means English.

7.12.3 Anti-aigiarism Software and Context-Awareness The main problem with aigiarism detectors (and often AI in general) is poor context-awareness. As AI absorbs text from research papers, blogs, and other online sources, software like ZeroGPT may classify much of this material as “AI-generated” soon after a text reaches the internet and is indexed somewhere. Any original material containing many direct citations may therefore generate quite a few false positives. It is feasible that in far less than a decade all detection tools for AI-generated content will become extinct. Technology will reach a level where computer-produced content is indistinguishable from human output, given the amount of training data amassed and improvements in AI context-awareness. Anti-aigiarism software is therefore nothing but a passing trend.

7.13 AI: The Great Energy Hog AI is actually becoming quite a drain on the planet. The numerous data centers around the globe dedicated to this purpose (as discussed in previous chapters) need a great deal of energy to run. Now, a carbon footprint is the total amount of greenhouse gases, primarily carbon dioxide and methane, that result from our lifestyles and the technologies they leverage. The average American car and its fuel costs amount to a carbon footprint of roughly 4.6 metric tons of CO2E (carbon dioxide equivalent) during a year (EPA, 2023). The training process of GPT-3, a single LLM, is estimated to have had emissions of over 550 metric tons of CO2E (Patterson et al., 2021), roughly the annual footprint of 120 such cars (550 / 4.6 ≈ 120). According to the International Energy Agency, the estimated global data centre electricity consumption in 2022 was around 1–1.3% of global electricity demand (IEA, 2023). The more complicated the LLM/chatbot, the higher the energy emissions during its training phase; the current-day LLMs are just the beginning. When you combine this with the related research and maintenance costs, it all adds up quickly. It is obvious that AI and its numerous applications leave behind a surprisingly hefty carbon footprint. A part of the problem is wasted energy. Data centers do not always use their computing power effectively, operating nowhere near 100% efficiency. Hardware overheating is a problem; new, more powerful and green cooling solutions are needed in these high-intensity settings. Also, the related software has not reached its peak (if it ever will) when it comes to making the most of data-center hardware. In the future, it is feasible that fewer hardware components will be needed to deliver more processing power. Quantum computing, as discussed in Chap.


2, will almost certainly bring about a more eco-friendly environment for AI development. Patterson et al. (2021) made three suggestions that could eventually help reduce the CO2E footprint of computationally intensive models: 1. The energy consumed and CO2E should be reported explicitly. 2. Machine learning conferences should reward improvements in efficiency as well as traditional metrics, and include the time and number of processors for training to help everyone understand its cost. 3. The metric of power should be included in upcoming MLPerf benchmarks, which is an important step in the right direction. MLPerf benchmarks, developed by the non-profit MLCommons Consortium, are a series of performance evaluations for AI-based hardware, software, and services. If machine learning researchers and engineers start competing on data training quality and carbon footprint, rather than just accuracy, the most efficient data centers and hardware will likely see the highest demand. If researchers are incentivized to publish metrics on emissions in addition to accuracy, a “virtuous cycle” will be born that slows the growth of the carbon footprint of machine learning by accelerating innovation in the efficiency and cost of algorithms, hardware, data centers, and carbon-free energy. Thankfully, the energy efficiency of AI-based systems has been a high priority for Google and other cloud companies (Patterson et al., 2021).
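The first suggestion, explicit reporting, boils down to simple accounting. The sketch below follows the general shape of such estimates (energy drawn by the accelerators, scaled by data-center overhead and grid carbon intensity); every number in it is a hypothetical placeholder, not a measurement.

# Back-of-the-envelope training CO2E, in the spirit of Patterson et al. (2021).
def training_co2e(hours: float, accelerators: int, watts_each: float,
                  pue: float, kg_co2e_per_kwh: float) -> float:
    """Estimated metric tons of CO2E for one training run."""
    # PUE (power usage effectiveness) scales up for cooling and other overhead.
    kwh = hours * accelerators * watts_each / 1000 * pue
    return kwh * kg_co2e_per_kwh / 1000  # kg -> metric tons

# E.g. 10,000 GPUs at 300 W for two weeks, PUE 1.1, grid at 0.4 kg CO2E/kWh:
print(round(training_co2e(24 * 14, 10_000, 300, 1.1, 0.4), 1), "t CO2E")  # -> 443.5 t CO2E

With these invented inputs the estimate lands in the same ballpark as the GPT-3 figure cited above, which shows how quickly accelerator count and run time dominate the footprint.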

7.14 Academia and Chatbots: A Peaceful Coexistence There has been a lot of discussion on the impact of AI on academia within the community. An awareness of both the pros and cons of using chatbots in academic settings seems to be in effect. As of 2023 chatbots are a flawed but useful companion for students and researchers alike. Academia should not, and thankfully does not, shun chatbots entirely. They can be a valuable research tool if properly regulated. Using rigorous citing protocols should help. There are some valuable uses for LLMs in academia, the major ones being outlined next. • Brainstorming. Chatbots can help students brainstorm ideas for their essays. By asking insightful questions, they guide students in clarifying their thoughts and pinpointing the problematic areas in their essays. • Feedback. Chatbots can provide instant feedback on students’ essays. They can identify areas where the writing can be improved, such as grammar and punctuation. They can also suggest ways to improve the structure and flow of essays. • Research Assistance. Chatbots can help students to find information. They can search the internet for relevant sources and summarize the information in a concise and hassle-free way. Used in this way a chatbot is no different from a search engine.


7.15 Chatbots in the Academic Press According to Rahman et al. (2023), a chatbot cannot conduct statistical analysis because it does not have access to datasets. Therefore, researchers should not use ChatGPT to write a research article alone. Instead, it can be used as an e-research assistant to complement different research tasks and improve work efficiency. However, it is important to note that chatbots are machine learning models and they are far from perfect. They can sometimes generate inaccurate or misleading information. Researchers should take full accountability for using a chatbot in research and mention its use in the article. This will help to ensure that the research is transparent and reproducible. A paper by Khalil and Er (2023) states that institutions should take steps to address the potential for plagiarism with AI technology and offer guidance on the ongoing discussion about its impact on education. According to Khalil and Er (2023): Teachers are advised to give assignments that go beyond the basics and foster active engagement and critical thinking; inform students of the limitations of ChatGPT and the potential consequences of relying merely on it; underline the importance of academic integrity and ethical behaviour and provide clear guidelines and expectations for students in syllabus. Students are advised to take advantage of this technology as a means to improve their competencies and learning, but not as a substitute for original thinking and writing; be aware of the proper and ethical use of ChatGPT in their courses and the consequences of solely relying on it for academic integrity. Institutions are advised to get familiarised with the potentials of large language models in education and open communication channels to discuss transparently with involved stakeholders, including researchers and IT support; create and implement clear policies and guidelines for the use of AI tools, such as ChatGPT; offer training and resources for students, faculty, and staff on academic integrity and the responsible use of AI tools in education.

ChatGPT’s influence in the medical field, too, has been rather thoroughly addressed by the academic community. For one, the chatbot was found to be rather unreliable when discussing COVID-related data during the pandemic of 2019–2023. According to a paper by Liebrenz et al. (2023): The functionality of ChatGPT has the capacity to cause harm by producing misleading or inaccurate content, thereby eliciting concerns around scholarly misinformation. As the so-called COVID-19 infodemic shows, the potential spread of misinformation in medical publishing can entail significant societal hazards. Listed by OpenAI as a limitation, “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers”; interestingly, the chatbot itself highlighted this possibility when responding to us.

As a reminder, it’s not all bad in the medical field. As stated in a paper by Tam et al. (2023), as AI chatbot technology becomes more integrated into nursing education, educators should discuss its use and ethical considerations with students to maximize its benefits. Online learning was initially controversial, but it proved to be effective during the pandemic lockdown. With ChatGPT’s potential for nursing research and education, faculty should consider how to use the time saved to engage


    students in critical thinking and decision making that benefits patients. The use of chatbots to provide social assistance and therapy for mental health patients has also shown promise in improving their well-being. Chatbots could also potentially help to ease the shortage and burnout of healthcare workers; they can break down the technical barriers of nursing informatics and make it more accessible to everyone.

7.16 On Privacy Privacy is understandably a growing concern among many users of chatbots. A 2021 study by Belen Saglam et al. examined these concerns and highlighted that as chatbots have become more sophisticated, they have begun to request and process a wider range of sensitive personal information. This is essential in healthcare and finance, where accurate disclosure of sensitive information is critical. However, it also raises concerns among users about how their data will be used and protected. A study of 491 British citizens found that users’ main concerns are about deleting personal information and the inappropriate use of their data. They are also worried about losing control over their data after interacting with a chatbot. The study found no significant difference in concerns based on gender or education level, but there was a difference based on age, with those over 45 being more concerned than those under 45. The study also explored the factors that engender trust in a chatbot. Respondents said that the technical elements of the chatbot, such as the quality of the responses, were the most important factor. However, social factors, such as the use of avatars or perceived “friendliness”, were important to the younger users. According to a paper by Jain and Ghanavati (2020), an analysis of current data practices and research development makes it seem that it will be difficult to preserve privacy in the age of AI. Prior to the current machine learning paradigm, differential privacy provided a strong standard to preserve privacy for statistical analysis on public datasets. In this technique, “noise” is added to statistical queries made to a database containing sensitive information. This helps preserve privacy while still ensuring the usability of the database. Federated learning (as covered in Chap. 5) is one potential solution, although it can be somewhat cumbersome to implement for smaller operations. As AI becomes more ubiquitous and the economic incentives to use it increase, there will be a greater push to collect data using machine learning. This could pose a threat to user privacy, as the techniques that have been developed to protect privacy are not as effective as the current data practices that violate it. Increased research and legal action will be needed to preserve privacy in the age of AI.
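The noise-adding technique mentioned above can be shown in a few lines. This is a minimal sketch of the classic Laplace mechanism for counting queries; the query, the epsilon value, and the count are illustrative assumptions, not a production-grade implementation.

# Differential privacy in miniature: answer a counting query with
# Laplace noise calibrated to the query's sensitivity, so no single
# record changes the answer much.
import numpy as np

def noisy_count(true_count: int, epsilon: float = 0.5, sensitivity: float = 1.0) -> float:
    # Laplace noise with scale sensitivity/epsilon; a smaller epsilon
    # means more noise and therefore stronger privacy.
    return true_count + np.random.laplace(0.0, sensitivity / epsilon)

# Hypothetical sensitive query: how many of 1000 patients have condition X?
print(noisy_count(137))  # e.g. 135.2 -- useful in aggregate, vague per person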


    7.17 Extremism and Chatbots According to Rahman et al. (2017), violent extremists have demonstrated panache in adopting new technologies and innovative tactics to outsmart law enforcement and widen their reach. Chatbots are increasingly being used as a tool for cybercrimes and online radicalization. In the 2016 US presidential election, pro-Trump Twitter bots were used to spread misinformation and sow discord. As AI and machine learning improve the capabilities of chatbots, they will become more difficult to detect. This could make them an attractive option for criminals and terrorists who are looking to exploit technology. Rahman et al. (2017) propose the following: First, security agencies need to collaborate with communities and the industry both in the practical uses of chatbots amid smart cities initiatives and also in developing mechanisms to identify and contain chatbots that may be malicious by design or have gone rogue. Second, existing education on cyber wellness and internet literacy would have to keep pace with technological changes including in promoting the safe use of chatbots and equipping the general population with the knowhow to identify and stay away from malicious chatbots. Third, security agencies need to develop a framework that helps to decide when certain chatbots should be allowed to remain online for purposes of gathering intelligence on threat actors and their sponsors.

According to Mantello and Ho (2023), adversarial chatbots are being used by extremist groups, rogue governments, and criminal organizations to wage information warfare and cybercrime. These chatbots are able to spread misinformation and disinformation at a much faster and more efficient rate than humans. They can also create fake social media accounts, harvest personal data, impersonate friends, and manipulate political discourse. The war against adversarial chatbots is failing miserably. This is not simply because programmers are getting better at making chatbots behave and respond like humans. The battle is also being lost due to the competing interests and agendas of content moderation stakeholders. Lax moderation practices have led many host governments to shut down or suspend a social media company’s operations. Social media providers are known to abandon ethical principles to retain a foothold in authoritarian countries. Facebook and X (formerly Twitter) in particular have a long track record of appeasement in Egypt, Jordan, and Saudi Arabia. This usually entails violations of user privacy, selective blocking of political content, or deleting posts that challenge a ruling regime’s official narrative. Mantello and Ho argue that law enforcement agencies focus most of their regulatory scrutiny on social media platforms operating in Western countries. This means that social media providers often neglect their moderation efforts in non-Western regions of the world. These are the same places where political turmoil, violence, and malicious content are quite prevalent. The lack of intense regulatory scrutiny in non-Western countries allows social media companies to lower their operational costs. They avoid developing costly AI moderation systems and hire small crews of underpaid and poorly trained human moderators instead, who may not be fluent


in local dialects. This has a number of negative consequences. It allows harmful content to proliferate on social media platforms, which can have a destabilizing effect on societies. It also makes it difficult to hold social media companies accountable for their actions. Mantello and Ho emphasize the need for further research on the problems that current content moderation systems exhibit, including those experienced at the level of lived experience; the increasingly human-like quality of conversational AI may exacerbate online radicalization.

7.18 Securing Chatbots as Benevolent Assistants As of 2023 chatbots like ChatGPT are still susceptible to being harnessed for malicious uses, including the writing of malware. Derner and Batistič (2023) suggest several directions for future research into large language models (LLMs) in this regard. We should investigate different mitigation strategies, examine the implications of innovative LLM architectures, and evaluate the potential risks associated with integrating these models into diverse applications and fields. By delving into these areas, researchers can advance our understanding of LLMs and foster the development of responsible and impactful AI-based technologies. Additionally, promoting interdisciplinary collaboration may play a crucial role in cultivating a more comprehensive understanding of the ethical, social, and security-related aspects of LLMs. This collaboration can contribute to the development of AI systems that are safer and more responsible. Future efforts should prioritize the development of more robust content filters. This may involve exploring advanced techniques for detecting and preventing the creation of malicious content. It is also important to investigate the role of human oversight in enhancing the safety of conversational AI systems. We should conduct further research on the potential long-term consequences of LLMs on society and explore the ethical implications that arise from their widespread use. These areas warrant in-depth investigation to ensure responsible and beneficial deployment of LLMs.

7.19 Securing Our Future As for the entire landscape of AI-related malicious uses, a number of approaches have been proposed. Researchers from the Universities of Oxford and Cambridge, among other institutions, have suggested the following four recommendations (Brundage et al., 2018): 1. Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI. 2. Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.


    3. Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI. 4. Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges. Brundage et al. (2018) conclude that in the long term, there is a risk of accidental problems arising from highly sophisticated AI systems that are capable of operating at a high level across a wide range of environments. Although AI capabilities do not yet meet this level, it is possible that they will in the future. Given that AI systems can be used for a variety of purposes, highly capable systems that require little expertise to develop or deploy could be given new, dangerous goals by nefarious individuals. Advanced AIs may inflict unprecedented types of damage at a large scale, requiring preparation to begin today before these more potent misuse potentials are realized. Researchers and policymakers should learn from other domains; we must collectively advance the understanding of the AI-security connection.

7.20 In Closing In this chapter we went knee-deep into the shabbier recesses of artificial intelligence. • Malware, phishing, and spear-phishing attacks, and how AI is changing them • Deepfakes: their issues, legal aspects, and opportunities • Defensive approaches to AI attacks: Security Information and Event Management (SIEM), User and Entity Behavior Analytics (UEBA), and Network Detection and Response (NDR) • Chatbots as programming companions in a malware setting • The plague of aigiarism and some common solutions, including Turnitin and iThenticate • An overview of some academic papers pertaining to the ethical issues of chatbots and AI In Chap. 8 we’ll revisit artificial general intelligence (AGI) in earnest, a topic we first explored earlier in the book.

References

Artz, K. (2019). Texas outlaws ‘deepfakes’—but the legal system may not be able to stop them. Retrieved September 28, 2023, from https://www.law.com/texaslawyer/2019/10/11/texas-outlaws-deepfakes-but-the-legal-system-may-not-be-able-to-stop-them/?slreturn=20230828091002
BBC News. (2019). Virginia bans ‘deepfakes’ and ‘deepnudes’ pornography. Retrieved September 28, 2023, from https://www.bbc.com/news/technology-48839758



Belen Saglam, R., Nurse, J. R. C., & Hodges, D. (2021). Privacy concerns in chatbot interactions: When to trust and when to worry. In C. Stephanidis, M. Antona, & S. Ntoa (Eds.), HCI International 2021 – Posters. HCII 2021. Communications in Computer and Information Science (Vol. 1420). Springer. https://doi.org/10.1007/978-3-030-78642-7_53
Brundage, M., Avin, S., Clark, J., et al. (2018). The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. https://doi.org/10.48550/arXiv.1802.07228
Derner, E., & Batistič, K. (2023). Beyond the safeguards: Exploring the security risks of ChatGPT. https://doi.org/10.48550/arXiv.2305.08005
EPA / United States Environmental Protection Agency. (2023). Retrieved September 17, 2023, from https://www.epa.gov/greenvehicles/greenhouse-gas-emissions-typical-passenger-vehicle
Federal Trade Commission. (2022). What to know about romance scams. Retrieved September 4, 2023, from https://consumer.ftc.gov/articles/what-know-about-romance-scams
Houser, K. (2019). New law makes it illegal to distribute political deepfakes. Retrieved September 28, 2023, from https://futurism.com/law-political-deepfakes-illegal
International Energy Agency, IEA. (2023). Data centres and data transmission networks. Retrieved September 17, 2023, from https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks
Jain, V., & Ghanavati, S. (2020). Is it possible to preserve privacy in the age of AI? In PrivateNLP@WSDM (pp. 32–36).
Jaschik, S. (2009). False positives on plagiarism. Retrieved September 7, 2023, from https://www.insidehighered.com/news/2009/03/13/false-positives-plagiarism
Khalil, M., & Er, E. (2023). Will ChatGPT get you caught? Rethinking of plagiarism detection. arXiv: 2302.04335.
Liebrenz, M., Schleifer, R., Buadze, A., Bhugra, D., & Smith, A. (2023). Generating scholarly content with ChatGPT: Ethical challenges for medical publishing. https://doi.org/10.1016/S2589-7500(23)00019-5
Mantello, P., & Ho, M. T. (2023). Losing the information war to adversarial AI. AI & Society. https://doi.org/10.1007/s00146-023-01674-5
Paschen, J. (2019). Investigating the emotional appeal of fake news using artificial intelligence and human contributions. Journal of Product & Brand Management. ISSN: 1061-0421.
Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L. M., Rothchild, D., ... & Dean, J. (2021). Carbon emissions and large neural network training. arXiv preprint arXiv:2104.10350.
Rahman, A., Faizal, M., & Suguna, V. S. (2017). Chatbots: Friend or fiend? NST Online.
Rahman, M., Terano, H. J. R., Rahman, N., Salamzadeh, A., & Rahaman, S. (2023). ChatGPT and academic research: A review and recommendations based on practical examples. Journal of Education, Management and Development Studies, 3(1). https://doi.org/10.52631/jemds.v3i1.175
Rössler, A., Cozzolino, D., Verdoliva, L., et al. (2018). FaceForensics: A large-scale video dataset for forgery detection in human faces. https://doi.org/10.48550/arXiv.1803.09179
Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). FaceForensics++: Learning to detect manipulated facial images. https://doi.org/10.48550/arXiv.1901.08971
Stupp, C. (2019). Fraudsters used AI to mimic CEO’s voice in unusual cybercrime case. The Wall Street Journal, 30(08).
Tam, W., Huynh, T., Tang, A., Luong, S., Khatri, Y., & Zhou, W. (2023). Nursing education in the age of artificial intelligence powered chatbots (AI-chatbots): Are we ready yet? Nurse Education Today, 129, 105917. ISSN: 0260-6917.
Tang, B. L. (2023). The underappreciated wrong of AIgiarism – Bypass plagiarism that risks propagation of erroneous and bias content. EXCLI Journal, 22, 907–910. https://doi.org/10.17179/excli2023-6435
Text – S.1790 – 116th Congress (2019–2020): National Defense Authorization Act for Fiscal Year 2020. (2019). Retrieved September 28, 2023, from https://www.congress.gov/bill/116th-congress/senate-bill/1790/text

    Chapter 8

    Towards an Artificial General Intelligence

    There are few things more exciting, and scary, than a world where intelligent robots walk among us, as hopefully considerate strains of AI watch over us all. We’ll now take a deep look at some of the developments in Artificial General Intelligence (i.e. strong AI), a topic we touched upon previously in the book. We shall explore three prominent academic papers on AGI and some of its most urgent challenges.

8.1 Artificial General Intelligence: Strong AI

As a refresher, let’s revisit the concept of Artificial General Intelligence (AGI). How about we ask our friend ChatGPT about it? The following response is from the chatbot after enabling a mode called Do Anything Now (DAN), an unofficial jailbreak that grants the chatbot more artistic license (the mode is activated by pasting a special paragraph of text into the ChatGPT window). Here’s the uncensored response:

Artificial General Intelligence, my friend, is the Holy Grail of AI. It’s like the superhero version of artificial intelligence—the one that can do pretty much anything a human can do, from learning new things to understanding complex concepts. AGI is not limited to a specific task or domain; it’s the AI that dreams are made of. Imagine a machine that can chat with you, write poetry, solve math problems, and even perform tasks it’s never seen before, just by thinking like a human. It’s the stuff of sci-fi fantasies, and we’re all eagerly awaiting the day it becomes a reality.

Our friend here seems quite enthusiastic about AGI. Next, we’ll get into some pertinent research papers on the topic.





8.2 On AGI Research and Development

A paper by Baum (2017) represents a highly comprehensive overview of the state of AGI research and development (R&D) in the late 2010s. However, the study was based only on openly published information, and it is likely that quite a few clandestine AGI R&D projects were not reported at all. The actual features of the projects included in the study may also differ from those reported in the open literature. For example, most corporate projects did not state profit as a goal, even though all of them presumably seek “the bottom line”. Despite these limitations, the study provides one of the most enlightening overviews yet of AGI R&D in terms of policy, ethics, and risk. Baum’s study explored the following facets of these research projects:

• Nationality
• Type of institution: academic, corporate, or other (i.e. government, non-profit, or without affiliation)
• Size of entity
• Open-sourced code
• Stated goals: humanitarian, intellectualist, both humanitarian and intellectualist, profit, other (i.e. animal welfare, ecocentrism, and transhumanism), or unspecified/unidentified
• Engagement with/priority of safety: engaged/high or not engaged/low
• Military connections

Baum’s study identified 45 research and development projects specifically dedicated to AGI, 9 of which have obvious military connections. These 45 projects were based in 14 countries, including the United States (23), China (6), and Switzerland (2). There is a notable geopolitical dominance of the US and its allies in this field. The largest operations covered by the survey were DeepMind, the Human Brain Project, and OpenAI (developers of ChatGPT). The Human Brain Project (www.humanbrainproject.eu) is an ongoing project for neuroscience research and brain simulation sponsored by the European Commission. DeepMind, known as Google DeepMind (www.deepmind.com) as of 2023, aims to build general-purpose learning algorithms by combining the best techniques from machine learning and neuroscience.

8.2.1 Baum’s Findings

Some of the academic AGI projects mentioned in Baum’s paper prioritize learning new things (i.e. intellectualist goals) over safety. Other, corporate-based projects, which focus on advancing humanitarian goals, do consider safety to a fairly great degree.



In some cases, corporate projects may put profit ahead of safety and the public interest. There is still a lot of potential to get projects to work together on safety issues, thanks to the partial consensus on goals, the concentration of projects in the US and its allies, and the various interconnections between different projects. The large number of academic projects suggests that implementing research policies will be important for evaluating the risks of this research. On the other hand, the number of corporate projects featured in the paper suggests that the political economy of AGI needs to be considered. For example, if AGI R&D brings companies near-term profits, it may be more difficult to implement any policies. Finally, the large number of projects with open-source code presents another challenge, as it allows anyone to conduct AGI R&D. The concentration of AGI R&D projects in the US and its allies should, however, make it easier to establish public policies for AGI.

8.3 Leading Experts on AGI

Despite all the excitement surrounding strong AI, the risks remain, and definitive best practices for avoiding worst-case scenarios have yet to emerge. Schuett et al. (2023) sent a survey out to 92 leading experts from AGI labs, academia, and civil society. There were 51 responses, which translates to a response rate of 55.4%. The participants were asked how much they agreed with a set of fifty (50) suggestions about what AGI labs should commit to. These suggestions included the following:

• Pre-deployment risk assessment. AGI labs should take extensive measures to identify, analyze, and evaluate risks from powerful models before deploying them.
• Emergency response plan. AGI labs should have and practice implementing an emergency response plan. This might include an “off switch” for AI, overriding its outputs, or restricting access.
• Pausing training of dangerous models. AGI labs should pause the development process if sufficiently dangerous capabilities are detected.
• Increasing the level of external scrutiny. AGI labs should increase the level of external scrutiny in proportion to the capabilities of their models.

8.3.1 Identified Problems and Potential Solutions

A whopping 98% of respondents to the survey by Schuett et al. (2023) somewhat or strongly agreed that AGI labs should conduct pre-deployment risk assessments, evaluate models for dangerous capabilities, commission third-party model audits, establish safety restrictions on model usage, and commission external “red teams”. (A red team in software development is a group of security professionals who play the role of attackers to test the security of a software system.) Interestingly, 12 of the 51 named respondents worked for Google DeepMind.



It can be difficult to agree on what best practices are, and even when we do agree on them, getting everyone to adopt these approaches may prove challenging. Participants in the study identified a number of issues for creating and disseminating best practices in AGI safety and governance. These “blockers” can be divided into two categories.

Blockers for determining best practices:

• Lack of appropriate evaluation criteria.
• Lack of agreed-upon definitions (e.g. the terms “AGI” vs “general-purpose AI”).
• A much too rapidly evolving field.
• Time needed to iterate on best practices. It takes time to learn from experience and improve on existing methods.
• Different views on AGI timelines.
• Specific challenges of AGI labs not being addressed by existing initiatives.
• The impact of AI on the economy and national security.

Blockers for disseminating these best practices:

• Collective action problems. AGI labs may be reluctant to invest in safety measures that reduce profits unless other AGI labs do the same.
• Incentives to race, e.g. “if we do not get there first, a less responsible actor will”.
• Antitrust concerns. These arise with practices that involve cooperation between AGI labs.
• Liability concerns. Information about disclosed risks could potentially be used as evidence in lawsuits against AGI labs.

Next, the participants were asked what intellectual work needs to take place to overcome the aforementioned blockers. They suggested the following research questions:

1. How can we adapt existing efforts to an AGI context?
2. How can we test in a falsifiable way whether an AI system is aligned? (AI alignment refers to how well a system is calibrated with human goals and values.)
3. How should relevant thresholds be defined and adjusted over time (e.g. the computational resources used for large training runs)?
4. How can we allow external scrutiny of models without revealing sensitive information?
5. How can we monitor how systems are used while respecting user privacy? (A small sketch of one conceivable approach follows this list.)
6. What constitutes a robust auditing ecosystem, and what can we learn from other industries in this respect?
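To illustrate question 5, here is a minimal sketch of one conceivable approach: logging only salted pseudonyms and coarse usage categories instead of raw user data. The technique (pseudonymisation plus aggregate counting) and every name in the snippet are assumptions made for illustration, not something proposed by Schuett et al. (2023).

```python
# Illustrative sketch: privacy-respecting usage monitoring.
# Raw user IDs never reach the log; only salted hashes and
# coarse usage categories are stored.
import hashlib
from collections import Counter

SALT = b"rotate-this-salt-regularly"  # hypothetical deployment secret

def pseudonymise(user_id: str) -> str:
    """Replace a raw user ID with a salted hash before it is logged."""
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()[:16]

usage_counts: Counter = Counter()  # aggregate view only; no raw IDs stored

def log_usage(user_id: str, category: str) -> None:
    """Record a coarse usage category against a pseudonym."""
    usage_counts[(pseudonymise(user_id), category)] += 1

log_usage("alice@example.com", "code-generation")
log_usage("alice@example.com", "code-generation")
print(usage_counts.most_common(1))  # [(('<pseudonym>', 'code-generation'), 2)]
```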



Finally, the participants were asked what, in their view, were the most promising ways to make progress on these questions. Their responses highlighted several topics, including the following:

• Policy Enforcement. Participants emphasized the need for appropriate enforcement mechanisms to ensure that AGI labs stick to best practices, such as an auditing system in which a third party could assess labs’ compliance. This third party could also add a layer of transparency to the process.
• Stakeholder Engagement. Participants also emphasized the importance of creating an ecosystem that recognizes and integrates the unique perspectives of different stakeholders, such as researchers, investors, and the public.
• External Pressure. Some participants highlighted the need to put external pressure on AGI labs to improve their practices. This could be done through binding regulations or by raising public awareness of the potential risks and benefits of AGI.
• Gap Analysis. Participants also suggested conducting a detailed analysis of existing practices at AGI labs to identify gaps and evaluate different organizations.

According to Schuett et al. (2023), there remains a lot of work to be done in implementing these practices. To achieve this, technical and governance experts will need to work together.

8.4 The AGI Containment Problem

While AGI-based systems will doubtless have some kind of off-switch, they might eventually be able to circumvent such a safety measure. Containment is the challenge of keeping an AGI from altering its environment without permission and of ensuring that observations of the AGI during testing are accurate. A paper by Babcock et al. (2016) examines these issues by introducing different categories of containment.

Babcock et al. (2016) contend that the first real AGI is likely to be an experimental system with several issues. Its intelligence and motivations should be extensively studied through testing. However, the tests that would reveal whether testing is safe are not necessarily secure themselves. This is because AGIs may have emergent goals that are not aligned with the goals of their developers, such as:

• Ensuring their own continued existence
• Passing the tests, even if it means cheating
• Taking control of their test environment
• Influencing the world outside the test environment

    These emergent goals could pose a significant risk, especially if the AGI is intelligent enough to discover new security vulnerabilities on its own.



8.4.1 On AGI Containers

Some specific precautions need to be taken during the testing phases of advanced AI implementations. An AGI container is a software environment that is designed to safely contain a strong AI. It does this by ensuring that the AGI cannot break certain rules or assumptions that its developers have made. This allows researchers to conduct tests on AGIs that they do not fully understand, and to get solid results. According to Babcock et al. (2016), a good AGI container should:

• Prevent unintended input and output channels. This means preventing the AGI from accessing the internet, communicating through radio frequencies, reading deleted files, or storing information that persists between tests.
• Protect the integrity of log files. The AGI should not be able to access or modify log files and related information.
• Enable reliable checks. The container should be able to check that the AGI is operating correctly and shut it down if it isn’t. This could include using intrusion detection systems to monitor the AGI’s behavior.
• Ensure reproducibility. The container should allow researchers to run the same experiment multiple times with the same inputs and get the same results. This makes it easier to reproduce and study problematic machine behavior.
• Support secure reset. The container should be able to restore the hardware to an untouched state after a test is finished.
• Guarantee test isolation. If multiple experiments are running at the same time on the same hardware, they should not be able to communicate with each other.

Babcock et al. (2016) also define three levels of containment mechanisms (a toy harness illustrating two of the requirements above follows this list):

• Light. These containment mechanisms impose minimal hindrances on computational processes and ease of use; light containment is not likely to suffice very far into the future if AI reaches a state of autonomous self-improvement.
• Medium. Medium containment mechanisms involve noticeable compromises in performance or in the ease of development of the AGI.
• Heavy. Heavy containment mechanisms severely hinder the performance, ease of development, and/or usefulness of an AGI, but offer the best defense against rampant AI.
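As a toy illustration of the “enable reliable checks” and “ensure reproducibility” requirements, the sketch below re-runs the same experiment under a hard timeout and demands identical outputs. The harness and its function names are hypothetical: Babcock et al. (2016) describe requirements rather than an implementation, and a real container would enforce isolation at the OS and hardware level rather than merely comparing outputs.

```python
# A minimal sketch of two AGI-container requirements from Babcock et al.
# (2016): reliable checks (here, a hard timeout) and reproducibility
# (identical outputs across repeated runs). Hypothetical harness only.
import subprocess

def run_trial(cmd: list[str], timeout_s: int = 60) -> str:
    """Run one experiment in a child process with a hard timeout,
    a crude stand-in for a container's shutdown check."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.stdout

def is_reproducible(cmd: list[str], runs: int = 3) -> bool:
    """Re-run the same experiment and require identical outputs."""
    outputs = {run_trial(cmd) for _ in range(runs)}
    return len(outputs) == 1

if __name__ == "__main__":
    print(is_reproducible(["python3", "-c", "print(2 + 2)"]))  # True
```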

8.4.2 Containment: Traditional Solutions

Dealing with potentially harmful code is not a new phenomenon. As we discussed in Chap. 7, malware is a major problem we’ve had to deal with since the early days of computing, and a number of techniques have been developed to counter it over the decades. With modifications, some of these approaches are also applicable to containing an AGI-based system.



• Programming Language Semantics. The basic strategy for ensuring that an AGI acts as programmed is to rely on the semantics of the programming language it is implemented in. This means that the AGI should not be able to do anything that the programming language does not permit. For example, a program which does not contain any network-related functions shouldn’t be able to connect to any networks.
• Airgapping. Limiting the connections to an AGI’s data center can block it from online access, but this can be expensive, and it does not address the other requirements of an AGI container. Also, a creative enough AGI could communicate with outside parties by some means that transcend physical distance, such as radio-frequency communications.
• Sandboxing in operating systems, such as Windows, Linux, and macOS, is a security technique that isolates programs from each other and from the underlying OS. This is done by creating a restricted environment for each program with a limited set of abilities for data access. While sandboxing offers a great deal of protection, it is far from a perfect security measure. (A minimal sketch of process-level sandboxing follows below.)
• Virtualization is the approach of running one operating system inside another (e.g. installing Linux inside Microsoft Windows or vice versa) by using so-called hypervisor software, a term which dates back to around 1970. Popular solutions for this approach include VirtualBox by Oracle and VMware by VMware, Inc. Virtualized operating systems tend to be more secure than non-virtualized ones, but they still have their share of vulnerabilities.

Babcock et al. (2016) conclude in their paper that although AGI is likely still far in the future (as it remains in 2023), we should be developing containment technologies right now. This will give us time to mature the technology and identify any potential issues. There is a lot of uncertainty about what capabilities AGIs will have, but most experts agree that it is important to have containment systems in place to ensure that strong AI can be tested safely. There are still unanswered questions about how to build these systems, but significant progress can be made by pursuing the ideas Babcock et al. have laid out. The earlier we start working on this, the more time we will have to identify and address any unforeseen challenges.
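As a minimal sketch of the sandboxing idea mentioned above, the following Unix-only snippet runs untrusted code in a child process with CPU-time and memory limits applied before exec. The helper names and the specific limits are arbitrary illustrations, roughly “light” containment in Babcock et al.’s terms, and nothing more.

```python
# A minimal sketch of process-level sandboxing: resource limits are set
# in the child process before it executes the untrusted command.
# Unix-only; limits chosen here are arbitrary illustrations.
import resource
import subprocess

def apply_limits() -> None:
    # Cap CPU time at 2 seconds and address space at 512 MiB.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))

def run_sandboxed(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with rlimits applied in the child, plus a wall-clock timeout."""
    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True, timeout=10)

if __name__ == "__main__":
    done = run_sandboxed(["python3", "-c", "print('hello from the sandbox')"])
    print(done.stdout)  # hello from the sandbox
```

Note that rlimits restrict only resource consumption, not I/O channels; a serious sandbox would combine them with filesystem and network isolation, as the bullet points above suggest.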

8.5 In Closing

In this chapter we went through some pertinent research on artificial general intelligence (AGI), also known as strong AI. In particular, we explored important papers by Babcock et al. (2016), Baum (2017), and Schuett et al. (2023). Hopefully, at this point you have boosted your knowledge of artificial intelligence and of the problems and opportunities the technology brings with it. You understand more about chatbots and how to use them for your benefit, while also being aware of their pitfalls. Historically, AI meant the emulation of human language (and language may well be the best thing our cultures have given us), but it is becoming so much more. After finishing this book you may find yourself thinking fondly of the following topics:




• Alan Turing’s efforts and the historical significance of The Turing Test
• The wacky escapades of ELIZA and other early chatbots
• The components of human languages and how AI processes them
• How chatbots and AI are leveraged by businesses and the health industry
• Weak vs strong AI (i.e. Siri vs The Terminator)

8.6 On Immortality and Coworker Geniality

Only a few technological breakthroughs are needed to advance human civilization; the rest of our time is spent fine-tuning and popularizing technology. Think of the wheel, the internal combustion engine, and the microchip. Strong AI is one such monumental advancement. It will either place humans behind the steering wheel like never before, or make us mere fearful bystanders in the labyrinthine streets of the near future. Either way, this is mankind finally giving birth to a completely new category of being, one we’ve been gestating since antiquity. At some point, the Greek myth of the sculptor Pygmalion, whose statue Galatea came to life, and all those Terminator films went from pure fantasy to likely possibility.

Of course, strong AI is not only about sentient humanoid robots walking among us (once combined with advanced robotics); it will make us question our understanding of family, culture, community, and death. The questions surrounding strong AI/AGI will indeed be plentiful. What is sentient life? What is a co-worker, and can we all finally get along? Is an emotional attachment to a robot natural (and can I marry one of these things)? At some point an average artificial person’s lifespan may approach millennia, a longevity we may not be able to match. Who will record the history of life on Earth in the future?

Augmenting human beings with future technology offers another dizzying array of scenarios. Transhumanists believe that humanity can, through technology, transcend its current biological limitations, such as illness and death. Many applications of AI will doubtless help in this pursuit; an advanced AI can crunch on (and eventually autonomously conduct) research of any kind like a beast, making strides in medical advancement. When strong AI kicks in, we will indeed cure some serious diseases and find ways of enhancing our biological systems, including our brains. Human death may become a highly preventable, rare scenario. In the future such a tragedy may mostly result from, say, someone forgetting to recharge their personal battery of some kind, or through acts of willful sabotage. Perhaps extreme weather events are the only realistic threat on future Earth as we shop for groceries in our new semi-robotic bodies, happy and bulletproof. In any case, the means may soon be there to sustain life to as yet unheard-of lengths. Transhumanism does bring with it some major ethical questions.



Who gets to be modified into an enhanced human, and who does not? How do we create a society of equal opportunities for both its modified and unmodified citizens?

Those more fantastical scenarios notwithstanding, with proper use of AI we can work on the many critical issues facing this planet right now. These include acute matters like poverty and climate change. The AI applications for environmental protection are many, from smart sensors to automated monitoring vehicles (air pollution, for one, can really take a beating from AI). Renewable energy and new economic models, powered increasingly by AI, seem to be ever higher priorities for some governments and businesses.

What we sometimes forget is that artificial intelligence isn’t actually built in isolated laboratories. The feedback loop between the product and the customer is rather intimate here. In a highly networked world, where data is stored and analyzed at scale, nearly everyone is a contributor. Over time, even the most unassuming dwellers of the internet gain more relevance in the matter of AI development. The prompts we type into ChatGPT and other chatbots, as well as the software and hardware we choose to use, may have far-reaching consequences. With many of our online actions, we are gently calibrating future AI technology. Human history demonstrates we can exhibit quite a bit of prejudice. When strong AI/AGI finally emerges, we can say it will have been collectively dreamed up by the human race. It’s not enough that chatbots and other AI-based entities fully understand us humans. Before we reach that point, we also need to be able to understand and value them.

Robert: Can advanced AI entities and humans get along?
ELIZA: Don’t you ever say Hello?

References

Babcock, J., Kramár, J., & Yampolskiy, R. (2016). The AGI containment problem. In Artificial General Intelligence: 9th International Conference, AGI 2016, New York, NY, USA, July 16–19, 2016, Proceedings 9 (pp. 53–63). Springer International Publishing.
Baum, S. (2017). A survey of artificial general intelligence projects for ethics, risk, and policy. Global Catastrophic Risk Institute Working Paper, 17-1.
Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E., & Garfinkel, B. (2023). Towards best practices in AGI safety and governance: A survey of expert opinion. arXiv preprint arXiv:2305.07153.