Lecture Notes in Networks and Systems Volume 697
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Debasis Giri · Dieter Gollmann · S. Ponnusamy · Sakurai Kouichi · Predrag S. Stanimirović · J. K. Sahoo Editors
Proceedings of the Ninth International Conference on Mathematics and Computing ICMC 2023
Editors Debasis Giri Maulana Abul Kalam Azad University of Technology Kolkata, West Bengal, India S. Ponnusamy Indian Institute of Technology Madras (IIT Madras) Chennai, Tamil Nadu, India Predrag S. Stanimirović University of Niš Niš, Serbia
Dieter Gollmann Hamburg University of Technology Hamburg, Germany Sakurai Kouichi Kyushu University Fukuoka, Japan J. K. Sahoo BITS Pilani K. K. Birla Goa Campus, Pilani, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-3079-1 ISBN 978-981-99-3080-7 (eBook) https://doi.org/10.1007/978-981-99-3080-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Committee
General Chairs D. M. Kulkarni, BITS Pilani K. K. Birla Goa Campus P. K. Saxena, Scientific Consultant-Cyber Security, Government of India, Former Director, SAG, DRDO, New Delhi, India P. D. Srivastava, IIT Bhilai, Raipur, India
Programme Chairs Debasis Giri, Maulana Abul Kalam Azad University of Technology, West Bengal, India Dieter Gollmann, Hamburg University of Technology, Hamburg, Germany S. Ponnusamy, IIT Madras, Chennai, India Sakurai Kouichi, Kyushu University, Japan Predrag S. Stanimirović, University of Niš, Serbia
Organizing Chair J. K. Sahoo, BITS Pilani K. K. Birla Goa Campus
Organizing Co-chairs Anil Kumar, BITS Pilani K. K. Birla Goa Campus P. Danumjaya, BITS Pilani K. K. Birla Goa Campus
Organizing Committee Tarkeshwar Singh, BITS Pilani K. K. Birla Goa Campus Shilpa Gondhali, BITS Pilani K. K. Birla Goa Campus Minhajul, BITS Pilani K. K. Birla Goa Campus Yasmeen S. Akhtar, BITS Pilani K. K. Birla Goa Campus
Technical Program Committee Members Abdul Quddoos, Integral University, Lucknow Adriana M. Coroiu, Babes-Bolyai University, Romania Ajoy Kumar Khan, Mizoram University Aleksandr Poliakov, Sevastopol State University, Russia Ali Ebrahimnejad, Islamiz Azad University, Iran Amit Maji, IIT Roorkee Amit Verma, IIT Patna Amitava Das, IIIT Sri City Anup Nandy, NIT Rourkela Anwesha Mukherjee, West Bengal University of Technology S. K. Arif Ahmed, University of Tromsø, Norway Arshdeep Kaur, Thapar Institute of Engineering and Technology, Punjab Ashish Awasthi, NIT Calicut Ashok Kumar Das, IIIT Hyderabad Asish Bera, SRM University, Andhra Pradesh Atanu Manna, IICT Bhadohi Bidyut Kr. Patra, NIT Rourkela Bimal Roy, ISI Kolkata Binod Chandra Tripathy, Tripura University, Agartala Biswapati Jana, Vidyasagar University, West Bengal Bubu Bhuyan, NEHU, Shillong Canan Bozkaya, Middle East Technical University, Turkey Chien-Ming Chen, Harbin Institute of Technology Shenzhen Graduate School, China Christina Boura, Université de Versailles Saint-Quentin-en-Yvelines, France Costin Badica, University of Craiova, Romania Darjan Karabasevic, University Business Academy in Novi Sad, Serbia Daya Reddy, International Science Council, France Debashis Nandi, NIT Durgapur Debasis Giri, Maulana Abul Kalam Azad University of Technology, West Bengal Debi Prosad Dogra, IIT Bhubaneswar Devanayagam Palaniappan, Texas A&M University, USA Dhananjoy Dey, SAG DRDO Dieter Gollmann, Hamburg University of Technology
Dijana Mosic, University of Niš, Serbia Dipanwita Roy Chowdhury, IIT Kharagpur Djamal Foukrach, Hassiba Benbouali University of Chlef, France Doost Ali Mojdeh, University of Mazandaran, Iran Dung Hoang Duong, University of Wollongong, Australia Emel A¸sıcı, Karadeniz Technical University, Turkey Engin Sahin, ¸ Çanakkale Onsekiz Mart University, Turkey Fagen Li, University of Electronic Science and Technology of China Fahredd˙in Abdullayev, Kyrgyz-Turkish Manas University (Mersin University), Kyrgyzstan Florin Leon, Gheorghe Asachi Technical University, Iasi Fouzul Atik, SRM University Fu-Hsing Wang, Chinese Culture University, Taiwan Girraj Kumar Verma, Amity School of Engineering and Technology, Gwalior Gitanjali Chandwani, IIT Kharagpur Gopal Chandra Shit, Jadavpur University, Kolkata Hari Mohan Srivastava, University of Victoria, Canada Hari Shankar Mahato, IIT Kharagpur P. K. Harikrishnan, Manipal Institute of Technology Heinrich Begehr, Free University, Berlin Ilsun You, Soonchunhyang University, South Korea Indivar Gupta, SAG DRDO J. V. Rao, United States International University, Nairobi Jajati Sahoo, BITS Pilani K. K. Birla Goa Campus Jana Dittmann, Uni Magdeburg, Germany Jaroslaw Adam Miszczak, Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Portugal Jay Bagga, Ball State University, USA Jayalakshmi Karamsi, Jawaharlal Nehru Technological University, Anantapur Jaydeb Bhaumik, Jadavpur University, Kolkata Jaydeep Howlader, NIT Durgapur Jianting Ning, Singapore Management University Jitesh Gajjar, The University of Manchester, UK Juan Rafael Sendra, University of Alcalá, Spain Jugal Prajapat, Central University of Rajasthan K. Somasundaram, Amrita Vishwa Vidyapeetham, India Konstantin Volkov, Kingston University, UK Kouichi Sakurai, Kyushu University, Japan Koustav Kumar Mondal, IIT Jodhpur Krishna Kumar, Anna University, India Kwok Yan Lam, Nanyang Technological University, Singapore Kübra Seyhan, Ondokuz Mayıs Üniversitesi, Turkey Lala Septem Riza, Universitas Pendidikan Indonesia Lev A. Kazakovtsev, Siberian State Aerospace University Liming Fang, Nanjing University of Aeronautics and Astronautics, China
Long Jin, Lanzhou University, China Madhumangal Pal, Vidyasagar University, West Bengal Mallikarjunaiah Muddamallappa, Texas A&M University, USA Mario Larangeira, Tokyo Institute of Technology, IOHK Md Firoz Ali, NIT Durgapur Meenakshi Wasadikar, Dr. B. A. M. University, Aurangabad Meltem Kurt Pehlivanoglu, Kocaeli University, Turkey Mikheil Rukhaia, Tbilisi State University, Georgia Milena J. Petrovic, University of Pristina, Kososvka Mitrovica Miroslav Ciric, University of Nis, Serbia Moumita Mandal, IIT Jodhpur Muzafer Saracevic, University of Novi Pazar, Serbia N. M. Bujurk, Karnataka University Nabakumar Jana, IIT(ISM) Dhanbad Nauman Aslam, Northumbria University, UK Naveed Aman, University of Nebraska-Lincoln, USA Neetesh Saxena, Cardiff University, UK Niladri Puhan, IIT Bhubaneswar O. D. Makinde, Stellenbosch University, South Africa P. D. Srivastava, IIT Bhilai P. K. Saxena, Scientific Consultant-Cyber Security, Government of India Partha Sarathi Majee, IIT Kharagpur Parvaz Alam, VIT Vellore Pawan Kumar Mishra, IIT Bhilai Peide Liu, Shandong University of Finance and Economics, China Prasun Ghosal, IIEST Shibpur Predrag S. Stanimirovi´c, University of Niš, Serbia Rabindra Bista, Kathmandu University, Nepal Rajendra Pant, University of Johannesburg, South Africa Ranbir Sanasam, IIT Guwahati Ravi Kanth Asv, NIT Kurukshetra Ravi P. Agarwal, Texas A&M University, USA Rifat Colak, Firat University, Turkey B. Rushi Kumar, VIT Vellore S. Ponnusamy, IIT Madras S. Sivasankaran, University of Malaya S. K. Neogy, ISI Delhi Sabyasachi Dutta, University of Calgary, Canada Sabyasachi Mukherjee, IIEST Shibpur Saibal Pal, DRDO Delhi Saiyed Umer, Aliah University, Kolkata Sanjeev Singh, IIT Indore Santanu Manna, IIT Indore Santanu Sarkar, IIT Madras Saravanan Chandran, NIT Durgapur
Sarita Ojha, IIEST Shibpur Saru Kumari, Chaudhary Charan Singh University, Meerut Sedat Akleylek, Ondokuz Mayis University, Turkey Shanmugam Dhinakaran, IIT Indore Shreeya Sahoo, NIT Rourkela Shuai Li, Swansea University, UK Shyamalendu Kandar, IIEST Shibpur Siddhartha Bhattacharyya, RCC Institute of Information Technology, Kolkata Sokratis Katsikas, Norwegian University of Science and Technology Somesh Kumar, IIT Kharagpur Soumya Sen, University of Calcutta Sourav Mandal, Xavier School of Computer Science and Engineering XIM University, Bhubaneswar Sourav Gupta, SVNIT, Surat Sreedhara Rao Gunakala, University of the West Indies Srinivasu Bodapati, IIT Mandi Sriparna Saha, Maulana Abul Kalam Azad University of Technology, West Bengal Subhas Barman, Jalpaiguri Government Engineering College, India Suchandan Kayal, NIT Rourkela Sujit Kumar Das, Jadavpur University, Kolkata Sunirmal Khatua, Calcutta University Sushomita Mohanta, Fakir Mohan University, India Suvrojit Das, NIT Durgapur Swadesh Kumar Sahoo, IIT Indore Tanmoy Maitra, KIIT University Thoudam Doren Singh, NIT Silchar Vasilios N. Katsikis, National and Kapodistrian University of Athens, Greece Venkatesh Raman, The Institute of Mathematical Sciences, Chennai Vijay Sohani, IIT Indore Y. Vijay Kumar, Central University, Rajasthan Vilem Novak, University of Ostrava, Czechia Weizhi Meng, Technical University of Denmark Wen Chean Teh, Universiti Sains Malaysia Wenjuan Li, Hong Kong Polytechnic University
Sub-reviewers Amit Setia, Harshdeep Singh, Sukanta Das, Tarun Yadav, Sabyasachi Dutta, Dan Su, Xiufang Chen, Chuan Qin, Sumanta Pasari, Sharwan Kumar Tiwari, Khalid Mahmood, Ratikanta Behera, Zeynep Kayar, Dhanumjaya Palla, Ratikanta Behera, Dhanumjaya Palla, Rajendra Kumar Roul, Pablo M. Berná, David Sevilla, Vineet Kumar Singh, Amit Setia, Ratikanta Behera, Abhishek Kumar Singh, Jugal Mohapatra, Abhishek Kumar Singh, Tarkeshwar Singh, Lavanya Selvaganesh, Benedek
Nagy, Valentin Brimkov, Vinod Kumar, Francisco Criado, Tarkeshwar Singh, Vivek Vijay, Tarkeshwar Singh, Andrew Rajah, Ushnish Sarkar, Rajendra Kumar Roul, Jugal Mohapatra, Deepmala, Pradeep Boggarapu, Mayank Goel, Xiufang Chen, Ponnurangam Kumaraguru, Haowen Tan, N. Prasanna Kumar, Abhishek Kumar Singh, Shivi Agrawal, Vineet Kumar Singh, Maharage Nisansala Sevwandi Perera, Sayan Ghosh, Sumanta Pasari, Xiufang Chen, Dhanumjaya Palla, Noor A’Lawiah Abd Aziz, Mayank Goel, Amit Setia, Amiya K. Pani, D. Mishra, Abhishek Kumar Singh, Shivi Agrawal, Vineet Kumar Singh, Shivi Agrawal, Pradeep Boggarapu, Sachin Kumar, Pradip Majhi, Hiranmoy Mondal, A. Chandrashekaran, Sharmistha Ghosh, Haydar Alıcı, Rajendra Kumar Roul, Samir Maity, Mukesh Kumar Awasthi, R. Sivaraj, and Arnab Patra.
Message from General Chairs
It gives us great pleasure to welcome you to ICMC 2023, the 9th edition of the premier annual conference on Mathematics and Computing. This year, ICMC has been held at the Department of Mathematics, Birla Institute of Technology and Science Pilani, K. K. Birla Goa Campus, Goa, India, in association with the Ramanujan Mathematical Society (RMS) and the Cryptology Research Society of India (CRSI), where advanced technology infrastructure and a wealth of creative talent come together, making it an excellent setting for advancing computing. ICMC has been an impactful conference on all aspects of mathematics and computing, and the papers published from it represent the hard work of many researchers. We are delighted to report that ICMC remains at the forefront of its field. This year, the conference accepted a set of 31 papers out of 135 submissions, selected through careful and rigorous peer review by the TPC members as the best of the best submissions. It has been our honor to have the most prominent researchers as our conference invitees: Prof. Predrag S. Stanimirović (Faculty of Sciences and Mathematics, University of Niš), Prof. Sokratis Katsikas (Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Norway), Prof. Rakesh M. Verma (Department of Computer Science, University of Houston, Houston, USA), Prof. R. N. Mohapatra (Department of Mathematics, University of Central Florida, Orlando, USA), Prof. Dieter Gollmann (Head of the Security in Distributed Applications Institute, Hamburg University of Technology, Hamburg, Germany), Prof. S. Ponnusamy and Prof. S. Sundar (Department of Mathematics, IIT Madras, India), Dr. Ratikanta Behera (Department of Computational and Data Science, IISc Bengaluru, India), and Prof. Subir Das (Department of Mathematical Sciences, IIT (BHU), Varanasi, India). First, we would like to thank all the program and organizing committee members who have done an outstanding job carrying out the paper review tasks. In particular, we would like to thank all the program chairs who have diligently developed the review system. We would also like to thank the Honorable Director, Prof. Suman Kundu, BITS Pilani K. K. Birla Goa Campus, Goa, India, for his support and for providing the infrastructure. We thank the sponsors, the Science and Engineering Research Board (SERB), the Defence Research and Development Organisation (DRDO), and the
National Board for Higher Mathematics (NBHM), Government of India, for financial support. Finally, we thank all conference participants for making ICMC a grand success. P. K. Saxena Scientific Consultant—Cyber Security and Former Director, SAG, DRDO Delhi, India P. D. Srivastava Indian Institute of Technology Bhilai Raipur, India D. M. Kulkarni BITS Pilani K. K. Birla Goa Campus Goa, India
Message from Program Chairs
It was a great pleasure for us to organize the 9th International Conference on Mathematics and Computing (ICMC 2023) during January 6–8, 2023, at the Department of Mathematics, BITS Pilani K. K. Birla Goa Campus, Goa, India. Our main goal was to provide an opportunity for the participants to learn about contemporary research in mathematics and computing and to exchange ideas among themselves and with the experts present at the conference as keynote speakers. 19 speakers from India and abroad delivered their talks, and some acted as session chairs. After an initial call for papers, 135 papers were submitted to the conference. All submitted papers were sent to external referees. After refereeing, 31 articles were recommended for publication in the conference proceedings published in the Springer series Lecture Notes in Networks and Systems. ICMC 2023 has become an international platform to deliver and share novel knowledge in various fields of applied mathematics and computing. We are grateful to the chief patron, patron, general chairs, program chairs, organizing chair, speakers, participants, referees, organizers, sponsors, and funding agencies for their support and help, without which it would have been impossible to organize this
conference. We owe our gratitude to the volunteers who worked tirelessly behind the scenes, taking care of the details that made this conference a great success. Debasis Giri Maulana Abul Kalam Azad University of Technology West Bengal, India Dieter Gollmann Hamburg University of Technology Hamburg, Germany S. Ponnusamy IIT Madras Chennai, India Sakurai Kouichi Kyushu University Fukuoka, Japan Predrag S. Stanimirović University of Niš Niš, Serbia
Preface
In the last two decades, Computing has grown beyond its classical roots in mathematics and the physical sciences and has started to revolutionize the life sciences and medicine. In the twenty-first century, its pivotal role continues to expand to broader areas, including the social sciences, humanities, business, and finance. Scientific computing is essential for finding solutions to research problems that are unsolvable by traditional theory and experimental approaches, difficult to study in the laboratory, or time-consuming or expensive. With mathematical modeling and computational algorithms, many problems from the sciences, commerce, and other walks of life can be solved efficiently. The International Conference on Mathematics and Computing (ICMC) is such a premier forum for the presentation of new advances and results in the fields of Cryptography, Network Security, Cybersecurity, Internet of Things, Due and Edge computing, Applied Algebra, Mathematical Analysis, Mathematical modeling, Fluid dynamics, Fractional calculus, Multi-optimization, Integral equations, Dynamical Systems, Numerical Analysis and Scientific Computing. The conference in this series has been bringing together leading academic scientists and experts from industry worldwide. ICMC was organized in 2013, 2015, and 2017–2022 at different institutions across India. The 9th ICMC 2023 brought together novice and experienced scientists with developers to meet new challenges, collect new ideas, and establish further cooperation between research groups. It provided a platform for researchers from academia and industry to present their original work and exchange ideas, information, techniques, and applications in Computational Applied Mathematics. The 9th International Conference on Mathematics and Computing (ICMC 2023) was organized by the Department of Mathematics, BITS Pilani K. K. Birla Goa Campus, Goa, India, in association with the Ramanujan Mathematical Society (RMS), India, the Cryptology Research Society of India (CRSI), and the Society for Electronic Transactions and Security (SETS), India. This conference received 135 papers from India and
abroad. Nine eminent keynote speakers (Prof. Predrag S. Stanimirovi´c, Professor, Faculty of Sciences and Mathematics, University of Niš, Serbia; Prof. Amiya Kumar Pani and Prof. M. Thamban Nair, Visiting Professor, Department of Mathematics, BITS Pilani K. K. Birla Goa Campus, India; Prof. Sokratis Katsikas, Professor, Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Norway; Prof. Rakesh M. Verma, Professor of Computer Science, University of Houston, Houston, USA; Prof. R. N. Mohapatra, Academic Director and Professor, Department of Mathematics, University of Central Florida, Orlando, USA; Prof. Dieter Gollmann, Professor, Head of the Security in Distributed Applications Institute, Hamburg University of Technology, Hamburg, Germany; Prof. S. Ponnusamy and Prof. S. Sundar, Department of Mathematics, IIT Madras, India) and 10 invited speakers (Dr. Ratikanta Behera, Department of Computational and Data Science, IISc Bengaluru, India; Prof. Vineet Kumar Singh and Prof. Subir Das, Department of Mathematical Sciences, IIT(BHU), Varanasi, India; Prof. Muslim Malik, Department of Mathematics, IIT Mandi, India; Prof. Sarvesh Kumar Rajput, Department of Mathematics, IIST Thiruvananthapuram, India; Prof. Navnit Jha, Prof. Jagdish Chand Bansal, and Prof. Saroj Kumar Sahani, Department of Mathematics, South Asian University, Delhi, India; Prof. Shivi Agarwal and Prof. Trilok Mathur, Department of Mathematics, Birla Institute of Technology and Science, Pilani, India) delivered their talks, and 17 professors from India and abroad chaired paper presentation sessions. A basic premise of this book series is that quality assurance is effectively achieved by selecting quality research articles from a scientific committee of about 200 reviewers worldwide. This book contains selected papers of several dynamic researchers in 31 chapters. This book also provides a comprehensive literature survey which reveals the challenges, outcomes, and developments of advanced mathematics and computing in this decade. The theoretical coverage of this book is relatively at a higher level to meet the global need of mathematics, computing and its applications in science and engineering. The audience of this book is mainly postgraduate students, researchers, and industrialists. As Volume Editors, we sincerely thank all the administrative authorities of Birla Institute of Technology and Science Pilani, K. K. Birla Goa Campus, Goa, India, for their motivation and support. We also extend our profound thanks to all faculty members and research scholars of the department of mathematics and all staff members of our institute. We especially thank program chairs and all the members of the organizing committee of ICMC 2023 who worked as a team by investing their time to make the conference a great success. We sincerely thank all the referees for their valuable time reviewing the manuscripts and selecting the research papers for publication, which led to substantial improvements. We express a special thanks to the sponsors, The Science and Engineering Board (SERB), The Defence Research and Development Organisation (DRDO), and National Board for Higher Mathematics (NBHM), Government of India, for financial support. Without their support
the conference would not have been successful. Last but not least, the organizing committee is grateful to Springer (Lecture Notes in Networks and Systems) for their support towards the publication of this book. Kolkata, India Hamburg, Germany Chennai, India Fukuoka, Japan Niš, Serbia Pilani, India
Debasis Giri Dieter Gollmann S. Ponnusamy Sakurai Kouichi Predrag S. Stanimirović J. K. Sahoo
Contents
Verifiable Delay Function Based on Non-linear Hybrid Cellular Automata . . . 1
Souvik Sur and Dipanwita Roychowdhury
Security Analysis of WG-7 Lightweight Stream Cipher Against Cube Attack . . . 15
Bijoy Das, Abhijit Das, and Dipanwita Roy Chowdhury
MILP Modeling of S-box: Divide and Merge Approach . . . 29
Manoj Kumar and Tarun Yadav
A Relation Between Properties of S-box and Linear Inequalities of DDT . . . 43
Manjeet Kaur, Tarun Yadav, Manoj Kumar, and Dhananjoy Dey
Damage Level Estimation of Rubble-Mound Breakwaters Using Deep Artificial Neural Network . . . 57
Susmita Saha and Soumen De
Facial Image Manipulation Detection Using Cellular Automata and Transfer Learning . . . 69
Shramona Chakraborty and Dipanwita Roy Chowdhury
Language-Independent Fake News Detection over Social Media Networks Using the Centrality-Aware Graph Convolution Network . . . 89
Sujit Kumar, Mohan Kumar, and Sanasam Ranbir Singh
Private Blockchain-Enabled Security Framework for IoT-Based Healthcare System . . . 99
Sourav Saha, Ashok Kumar Das, and Debasis Giri
GradeChain-α: A Hyperledger Fabric Blockchain-Based Students’ Grading System for Educational Institute . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Snigdha Mayee Samantray, Debasis Giri, and Tanmoy Maitra
Object-Background Partitioning on Images: A Ratio-Based Division . . . 133
Shyamalendu Kandar and Seba Maity
More on Semipositive Tensor and Tensor Complementarity Problem . . . 147
R. Deb and A. K. Das
An Improved Numerical Scheme for Semilinear Singularly Perturbed Parabolic Delay Differential Equations . . . 157
J. Mohapatra and S. Priyadarshana
Computational Modeling of Noisy Plasma Images Applicable to Tokamak Imaging Diagnostics for Visible and X-ray Emissions . . . 171
Dhruvil Bhatt, Kirtan Delwadia, Shishir Purohit, and Bhaskar Chaudhury
On Partial Monotonicity of Some Extropy Measures . . . 185
Nitin Gupta and Santosh Kumar Chaudhary
Error Bound for the Linear Complementarity Problem Using Plus Function . . . 197
Bharat Kumar, Deepmala, A. Dutta, and A. K. Das
On the Solutions of the Diophantine Equation u^a + v^b = z^2 for Some Prime Pairs u and v . . . 209
Ednalyn Xyra P. Calabia and Jerico B. Bacani
A Second-Order Optimal Hybrid Scheme for Singularly Perturbed Semilinear Parabolic Problems with Interior Layers . . . 223
S. Priyadarshana and J. Mohapatra
Temperature Distribution During Hyperthermia Using a 2D Space-Time Fractional Bioheat Model in Irregular Domain . . . 235
Bhagya Shree Meena and Sushil Kumar
Characterization of Minimum Structure and Sub-structure Cut of Exchanged Hypercube . . . 249
Paul Immanuel and A. Berin Greeni
Improved Lower Bound for L(1, 2)-Edge-Labeling of Infinite 8-Regular Grid . . . 261
Subhasis Koley and Sasthi C. Ghosh
Classification of Texts with Emojis to Classify Sentiments, Moods, and Emotions . . . 275
Sounava Pal, Sourav Mandal, and Rohini Basak
A Study on Fractional SIS Epidemic Model Using RPS Method . . . 293
Rakesh Kumar Meena and Sushil Kumar
On Zero-Sum Two Person Perfect Information Semi-Markov Games . . . 311
Sagnik Sinha and Kushal Guha Bakshi
Interval Estimation for Quantiles of Several Normal Populations with a Common Standard Deviation . . . 325
Habiba Khatun and Manas Ranjan Tripathy
Approximate Solutions to Delay Diffusion Equations with Integral Forcing Function . . . 339
Nishi Gupta and Md. Maqbul
On m-Bonacci Intersection-Sum Graphs . . . 353
Kalpana Mahalingam and Helda Princy Rajendran
Three-Time Levels Compact Scheme for Pricing European Options Under Regime Switching Jump-Diffusion Models . . . 367
Pradeep Kumar Sahu, Kuldip Singh Patel, and Ratikanta Behera
The Sum of Lorentz Matrices in M_2(Z_n) . . . 379
Richard J. Taclay and Karen Dizon-Taclay
A Robust Analytic Approach to Solve Non-linear Fractional Partial Differential Equations Using Fractional Complex Transform . . . 385
Vishalkumar J. Prajapati and Ramakanta Meher
Relations Between Discrete Maximal Operators in Harmonic Analysis . . . 399
Sri Sakti Swarup Anupindi and Michael Alphonse
Multivariate Bernstein α-Fractal Functions . . . 409
D. Kumar, A. K. B. Chand, and P. R. Massopust
Author Index . . . 427
About the Editors
Dr. Debasis Giri is at present an associate professor in the Department of Information Technology of Maulana Abul Kalam Azad University of Technology (formerly known as West Bengal University of Technology), West Bengal, India, prior to the professor (in Computer Science and Engineering) and the dean (in School of Electronics, Computer Science and Informatics) of Haldia Institute of Technology, Haldia, India. He did his masters (M.Tech. and M.Sc.) both from IIT Kharagpur, India, and also completed his Ph.D. from IIT Kharagpur, India. He is tenth all India rank holder in Graduate Aptitude Test in Engineering in 1999. He has published more than 100 papers in international journal/conference. His current research interests include cryptography, information security, blockchain technology, E-commerce security, etc. He is an editorial board member and reviewer of many international journals. He is a life member of Cryptology Research Society of India, Computer Society of India, the International Society for Analysis, its Applications and Computation (ISAAC), and IEEE annual member. Prof. Dieter Gollmann received his Dipl.-Ing. in Engineering Mathematics (1979) and Dr. tech. (1984) from the University of Linz, Austria, where he was a research assistant in the Department for System Science. He was a lecturer in Computer Science at Royal Holloway, University of London, and later a scientific assistant at the University of Karlsruhe, Germany, where he was awarded the ‘venia legendi’ for Computer Science in 1991. He rejoined Royal Holloway in 1990, where he was the first course director of the M.Sc. in Information Security. He moved to Microsoft Research in Cambridge in 1998. In 2003, he took the chair for Security in Distributed Applications at Hamburg University of Technology, where he retired in 2021. He was an adjunct professor at the Technical University of Denmark, 2005–2009, and a visiting professor at Nanyang Technological University, Singapore 2017–2019. He is a visiting professor with the Information Security Group at Royal Holloway, University of London.
Dr. S. Ponnusamy is currently the chair professor at IIT Madras and the president of the Ramanujan Mathematical Society, India. His research interest includes complex analysis, special functions, and functions spaces. He served five years as a head of the Indian Statistical Institute, Chennai Centre. He is the chief editor of the Journal of Analysis and serves as an editorial member for many peer-reviewed international journals. He has written five textbooks and has edited several volumes and international conference proceedings. He has solved several long-standing open problems and conjectures and published more than 300 research articles in reputed international journals. He has been a visiting professor to a number of universities in abroad (e.g., Hengyang Normal University and Hunan Normal University; Kazan Federal University and Petrozavodsk State University; University Sains Malaysia; University of Aalto, University of Turku, and University of Helsinki; University of South Australia; Texas Tech University). Sakurai Kouichi received the Doctorate in engineering in 1993 from the Faculty of Engineering, Kyushu University. He was engaged in research on cryptography at the Computer and Information Systems Laboratory at Mitsubishi Electric Corporation from 1988 to 1994. From 1994, he worked for the Department of Computer Science of Kyushu University in the capacity of associate professor and became a full professor there in 2002. He was working partially with the Institute of Systems and Information Technologies and Nanotechnologies, as the chief of Information Security laboratory. In March 2006, he established research co-operations under a Memorandum of Understanding in the field of information security with Bimal Kumar Roy, the first time Japan has partnered with the Cryptology Research Society of India. He is now working also for Advanced Telecommunications Research Institute International as a visiting researcher with department of advanced security. He has published more than 400 academic papers around cryptography, cybersecurity, and privacy. Predrag S. Stanimirovi´c has accomplished his Ph.D. in Computer Science at University of Niš, Faculty of Philosophy, Niš, Serbia. He is a full professor at University of Niš, Faculty of Sciences and Mathematics, Department of Computer Science, Niš, Serbia. He acquired thirty-five years of experience in scientific research in diverse fields of mathematics and computer science, which span multiple branches of numerical linear algebra, recurrent neural networks, linear algebra, symbolic computation, nonlinear optimization, and others. His main research topics include numerical linear algebra, operations research, recurrent neural networks, and symbolic computation. Within recent years, he has successfully published over 300 publications in scientific journals, including five research monographs, six textbooks, five monographs, and over 80 peer-reviewed research articles published in conference proceedings and book chapters. He is a section editor of the scientific journals Electronic Research Archive (ERA), Filomat, Facta Universitatis, Series: Mathematics and Informatics, and several other journals.
Dr. J. K. Sahoo is an associate professor and the head of the Department of Mathematics at BITS Pilani K. K. Birla Goa Campus. He has authored or co-authored more than 45 scientific publications and has been a reviewer of many reputed journals. He completed his graduate studies with a Ph.D. from the Indian Institute of Technology Madras (India) in 2010 and his undergraduate studies at the Utkal University with an M.Sc. and B.Sc. in Mathematics in 2004 and 2002, respectively. He has over 12 years of teaching and research experience. He has guided a Ph.D. student, more than seven Master theses, and a few undergraduate students. He is an active researcher, working on topics related to numerical linear algebra, matrix theory, machine learning, and tensor computations.
Verifiable Delay Function Based on Non-linear Hybrid Cellular Automata Souvik Sur and Dipanwita Roychowdhury
Abstract A Verifiable Delay Function (VDF) is a function that takes a specified (typically long) sequential time to be evaluated but can be verified efficiently. VDFs are useful in several applications ranging from randomness beacons to sustainable blockchains. However, VDFs are rare in practice as they need computational problems that are inherently sequential. At present, we are aware of only two such problems: groups of unknown order [16, 18] and isogenies over super-singular curves [8]. In this paper, we show that nonlinear hybrid cellular automata (NHCA) turn out to be another option for deriving VDFs. The sequentiality comes from the fact that a state in an NHCA can be reached from another state only sequentially. As the key reason behind this sequentiality, we prove that NHCA produce a sequence of random, unbiased bit-strings that cannot be accessed arbitrarily. Therefore, the only way to reach a state from another state is to enumerate all the intermediate states sequentially. We also establish that our VDF is appropriately sound. Keywords Verifiable delay functions · σ-sequentiality · Soundness · Cellular automata
1 Introduction The concept of verifiable delay functions was first formalized in [4]. A verifiable delay function V is a function with the domain X and the range Y, that takes a specified number of sequential steps T to be evaluated (irrespective of the amount of parallelism) and can be verified efficiently (even without parallelism) and publicly. Wesolowski [18] and Pietrzak [16] come up with two VDFs separately, although
based on the same sequentiality assumption as the time-lock puzzle [17]. Feo et al. [8] propose a VDF based on super-singular elliptic curves defined over finite fields. Apart from these two [8, 17], we do not know of VDFs based on any other sequentiality assumption. So a crucial question is: can we derive VDFs from another sequential problem?
1.1 Our Contributions We propose a VDF scheme based on nonlinear hybrid cellular automata (NHCA). These automata fit the current context because they offer the intrinsic randomness required for the soundness and sequentiality of the VDF. Briefly, our scheme works as follows. We use an n-bit nonlinear hybrid cellular automaton. The input x ∈ X passes through a hash function H(·) that produces an n-bit hash value. Both the NHCA and the hash function are public knowledge. The prover uses this hash output to initialize the state of the NHCA. The NHCA is then updated through τ state transitions, where τ is the difficulty level and is again a public parameter. Under the assumption that the initializing state is a random n-bit vector (because it is a hash output), there is no shortcut to reach the intended state in fewer than τ transitions (see Theorem 1). The state s reached after τ transitions is provided by the prover as the output of the Eval algorithm of the VDF. During verification, the verifier gets the same initial state of the NHCA by the same hash calculation on the input x. The verifier then checks whether τ updates of the NHCA from this initial state let the NHCA reach the state s. Unlike during Eval, however, the verifier initializes the NHCA with s and then comes back to the initial state H(x) in (2^n − 1 − τ) transitions of the NHCA. We take τ ≫ (2^n − 1 − τ); thus verification is orders of magnitude more efficient than Eval. We show that our construction is correct, perfectly sound, and appropriately sequential. Although we use a hash function in our scheme, these three properties are not proven in the random oracle model or under any algebraic hardness assumption. Rather, the proofs are based on the intrinsic randomness of an NHCA. We prove that maximum-length NHCA produce a sequence of pseudo-random, unbiased bit-strings (i.e., states). To date, the randomness of NHCA had been expressed only as a long-standing conjecture. This randomness is at the heart of our design of asymptotically difficult puzzles.
1.2 Organization of the Paper This paper is organized as follows. In Sect. 3, we present a succinct review of VDFs and cellular automata, and prove three important theorems to establish the randomness of NHCA. We propose our VDF scheme in Sect. 4. Section 5 establishes the
essential properties of the VDF: correctness, sequentiality, and soundness. Finally, Sect. 7 concludes the paper after highlighting an open problem in this context.
2 Related Work In this section, we briefly discuss the state of the art on VDFs and a few recent developments based on NHCA.
2.1 Review of Verifiable Delay Functions Boneh et al. [4] propose a VDF based on injective rational maps of degree T, where the fastest possible inversion is to compute the polynomial GCD of degree-T polynomials. They conjecture that it achieves (T^2, o(T)) sequentiality using permutation polynomials as the candidate map. However, it is a weak form of VDF as it needs O(T) processors for the inversion in parallel time T. Wesolowski [18] and Pietrzak [16] independently propose two VDFs based on modular exponentiation over the group (Z/NZ)^×. The first one asks the prover to compute y = x^(2^T) and w = x^(⌊2^T/l⌋), where l is a prime chosen by the verifier. The verifier checks whether y = w^l · x^(2^T mod l). On the other hand, Pietrzak's VDF is an interactive protocol asking the prover to compute 2^u elements x^(2^(iT/2^u)) for i = 0, 1, 2, . . . , 2^u − 1, where u = (1/2) log_2 T. The verifier checks them in O(log T) time, whereas the prover needs O(√T) time to generate them. It uses the RSA group and the class groups of imaginary quadratic number fields. Feo et al. [8] present a VDF based on isogenies of super-singular elliptic curves. They start with five groups G_1, G_2, G_3, G_4, G_5 of prime order > T with two non-degenerate bilinear pairing maps e_12 : G_1 × G_2 → G_5 and e_34 : G_3 × G_4 → G_5. There are also two group isomorphisms φ : G_1 → G_3 and φ̂ : G_4 → G_2. Given all the above descriptions as the public parameters along with a generator P ∈ G_1, the prover needs to find φ̂(Q), where Q ∈ G_4, using T sequential steps. The verifier checks if e_12(P, φ̂(Q)) = e_34(φ(P), Q) in poly(log T) time.
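For intuition, the common core of [16, 18] is the iterated-squaring computation y = x^(2^T) mod N, which is believed to require T sequential squarings when the factorization of N is unknown. The following Python sketch is our own illustration, with toy parameters that are far too small to be secure; it shows the sequential evaluation together with a Wesolowski-style check:

```python
# Toy sketch of the iterated-squaring delay: y = x^(2^T) mod N,
# plus a Wesolowski-style verification with a verifier-chosen prime l.
# N, T, l, x are illustrative placeholders only (not secure parameters).
N = (2**61 - 1) * (2**31 - 1)   # stand-in for an RSA modulus of unknown factorization
T = 1000                        # delay parameter
x = 5
l = 1009                        # small prime chosen by the verifier

# Prover: T sequential squarings (the slow, inherently sequential part).
y = x % N
for _ in range(T):
    y = y * y % N

# Prover: proof w = x^(floor(2^T / l)) mod N.
q, r = divmod(2**T, l)
w = pow(x, q, N)

# Verifier: accept iff y == w^l * x^(2^T mod l) (mod N).
assert y == (pow(w, l, N) * pow(x, r, N)) % N
print("Wesolowski-style check passed")
```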
2.2 Cryptography Based on Non-linear CA We present a succinct review of cryptographic protocols based on NHCA. For a survey of other application domains such as pattern recognition, memory testing, etc., we refer to [3, 7, 9]. The application of CA in cryptographic pseudo-random number generators (PRNG) started in the 1980s with Wolfram. In the recent past, the cryptographic
effectiveness of NHCA has been studied in more detail [13]. NHCA is effective in the design of symmetric ciphers [12, 15]. In particular, NHCA helps thwart a class of attacks, namely fault attacks, on stream ciphers [14]. The substitution box, or S-box, in block ciphers can be made fault-resilient using NHCA [11]. NHCA also suits the design of hash functions [2]. Authenticated encryption schemes may also be derived from NHCA.
3 Preliminaries We start with the notation first.
3.1 Notation We denote the security parameter with λ ∈ Z^+. The term poly(λ) refers to some polynomial of λ, and negl(λ) represents some function λ^(−ω(1)). If any randomized algorithm A outputs y on an input x, we write y ←_R A(x). By x ←_$ X, we mean that x is sampled uniformly at random from X. For an element x, |x| denotes the bit-length of x, whereas for any set X, |X| denotes the cardinality of the set X. We denote the time complexity of any algorithm A by T(A), and we consider A as efficient if it runs in probabilistic polynomial time (PPT). We say that an algorithm A runs in parallel time σ with Δ processors if it can be implemented on a PRAM machine with Δ parallel processors running in time σ. The total sequential time T needed for the computation on a single processor can be taken as T = σ × Δ.
3.2 Verifiable Delay Function We follow the terminology of VDFs described in [4].
Definition 1 (Verifiable Delay Function) A VDF V = (Setup, Eval, Verify) that implements a function X → Y is specified by three algorithms.
• Setup(1^λ, T) → pp is a randomized algorithm that takes as input a security parameter λ and a targeted time bound T and produces the public parameters pp. We require Setup to run in poly(λ, log T) time.
• Eval(pp, x) → (y, π) takes an input x ∈ X and produces an output y ∈ Y and a (possibly empty) proof π. Eval may use random bits to generate the proof π. For all pp generated by Setup(λ, T) and all x ∈ X, the algorithm Eval(pp, x) must run in parallel time T with poly(λ, log T) processors.
• Verify(pp, x, y, π) → {0, 1} is a deterministic algorithm that takes an input x ∈ X, an output y ∈ Y, and a proof π (if any), and either accepts (1) or rejects (0). The algorithm must run in poly(λ, log T) time.
The three desirable properties of a VDF are now introduced.
Definition 2 (Correctness) A VDF V is correct with some error probability ε, if for all λ, T, parameters pp, and x ∈ X, we have
Pr[ Verify(pp, x, y, π) = 1 | pp ← Setup(1^λ, T), x ←_$ X, (y, π) = Eval(pp, x) ] = 1.
Definition 3 (Soundness) A VDF V is sound if for all algorithms A that run in time poly(T, λ), we have
Pr[ Verify(pp, x, y, π) = 1 ∧ y ≠ Eval(pp, x) | pp ← Setup(1^λ, T), (x, y, π) ← A(1^λ, 1^T, pp) ] ≤ negl(λ).
Definition 4 (Sequentiality) A VDF V is (Δ, σ)-sequential if there exists no pair of randomized algorithms A_0 with total running time poly(T, λ), and A_1 which runs in parallel time σ on at most Δ processors, such that
Pr[ y = Eval(pp, x) | pp ← Setup(1^λ, T), state ← A_0(1^λ, 1^T, pp), x ←_$ X, y ← A_1(state, x) ] ≤ negl(λ).
Here, A_0 is a preprocessing algorithm that precomputes some state based only on the public parameters, and A_1 exploits this additional knowledge to evaluate Eval(x, pp) in parallel running time σ on Δ processors. An almost-perfect VDF would achieve a sequentiality σ = T − o(T). Even a sequentiality σ = T − εT for small ε is sufficient for most applications.
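To make the interface of Definition 1 concrete, the following Python sketch (our own shorthand, not prescribed by the definition) spells out the three-algorithm API:

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple

class VDF(ABC):
    """Abstract interface mirroring Definition 1: Setup, Eval, Verify."""

    @abstractmethod
    def setup(self, security_param: int, T: int) -> Any:
        """Return public parameters pp; must run in poly(lambda, log T) time."""

    @abstractmethod
    def eval(self, pp: Any, x: bytes) -> Tuple[Any, Any]:
        """Return (y, proof); requires about T sequential steps."""

    @abstractmethod
    def verify(self, pp: Any, x: bytes, y: Any, proof: Any) -> bool:
        """Accept or reject; must run in poly(lambda, log T) time."""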
3.3 Cellular Automata A cellular automaton (CA, in short) is traditionally presented as an array of cells, evolving through a number of discrete time-steps according to a set of functions working on the states of neighboring cells [6]. In this work, we consider one-dimensional binary cellular automaton, that is the cells of the CA can assume only binary values. Thus, such a CA of n-cells can be realized as an n-dimensional bit vector. Let, s(t) denotes the state of this CA at time t, and bit represents the ith bit of the vector s(t). The cells are numbered i = 0, 1, 2, . . . , n − 1. Each of the bits bit in a state s(t) of
a CA takes its value according to a predefined boolean function f_i called a rule. The set of all the rules R = ⟨f_0, f_1, . . . , f_{n−1}⟩ is known as the rule vector of the CA. The set of neighboring cells N_i that affects the cell b_i is known as the neighbor set of b_i. Any bit of the CA can be expressed as b_i^t = f_i(N_i). For example, if we take N_i = {b_{i−1}, b_i, b_{i+1}}, a three-neighborhood CA, then for all i and t, b_i^{t+1} = f_i(b_{i−1}^t, b_i^t, b_{i+1}^t). As f_i is one of the 2^(2^|N_i|) boolean functions, it can be represented as a 2^|N_i|-bit integer. As an example, two popular rules for three-neighborhood CA are
Rule 90: b_i^{t+1} = b_{i−1}^t ⊕ b_{i+1}^t.
Rule 150: b_i^{t+1} = b_{i−1}^t ⊕ b_i^t ⊕ b_{i+1}^t.
Since a state s(t) of a CA consists of the bits b_i^t, we may collectively write s(t + 1) = R(s(t)). Enumerating the state s(t + 1) by applying R on the state s(t) is termed the transition of the CA. Starting from a state s(t), the τ-th state s(t + τ) can be obtained by τ transitions of the CA, i.e., s(t + τ) = R^τ(s(t)). If the boundary bits b_{−1}^t and b_n^t are always assumed to be 0 in a CA, then it is called a null-boundary CA. The state transition of a 5-bit null-boundary CA under a given rule vector is illustrated in Fig. 1. The ruleset of this CA can be specified as
b_0^{t+1} = b_{−1}^t ⊕ b_0^t ⊕ b_1^t    (as f_0 is Rule 150)
b_1^{t+1} = b_0^t ⊕ b_1^t ⊕ b_2^t    (as f_1 is Rule 150)
b_2^{t+1} = b_1^t ⊕ b_2^t ⊕ b_3^t    (as f_2 is Rule 150)
b_3^{t+1} = b_2^t ⊕ b_3^t ⊕ b_4^t    (as f_3 is Rule 150)
b_4^{t+1} = b_3^t ⊕ b_5^t    (as f_4 is Rule 90)
If all the rules f_i ∈ R include only linear boolean operations (e.g., logical XOR/XNOR), then the CA is called a linear CA. The inclusion of a nonlinear operation (e.g., logical AND/OR) in any of the f_i s makes the CA a nonlinear one. Similarly, if all the f_i ∈ R are identical with each other, then the CA is called a uniform CA, otherwise a hybrid CA. The CA shown in Fig. 1 is a linear hybrid CA (LHCA) as both the rules 90 and 150 are linear but not identical with each other. Similarly, we can synthesize a nonlinear hybrid CA (NHCA) from an LHCA using the algorithm described in [10]. Basically this algorithm deterministically looks for the bit positions of an LHCA at which to inject the nonlinear operation. For example, the five-bit CA in Fig. 1 can be converted into a maximum-length NHCA as follows:

Position of the bits:         b0   b1   b2   b3   b4
State s(t) at time t:         0    1    1    1    1
Rule vector R:                150  150  150  150  90
State s(t+1) at time (t+1):   1    0    1    1    0
Fig. 1 State change of a null-boundary CA with rule vector ⟨150, 150, 150, 150, 90⟩
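(An illustrative aside, not part of the paper.) The linear rule vector ⟨150, 150, 150, 150, 90⟩ of Fig. 1 can be simulated in a few lines of Python; the nonlinear conversion announced above follows right after this sketch.

```python
def lhca_step(state, rules):
    """One transition of a null-boundary linear hybrid CA.
    `state` is a list of 0/1 cells, `rules` the per-cell rule numbers."""
    n = len(state)
    nxt = []
    for i, rule in enumerate(rules):
        left = state[i - 1] if i > 0 else 0        # null boundary: b_{-1} = 0
        right = state[i + 1] if i < n - 1 else 0   # null boundary: b_n = 0
        if rule == 90:                             # b_{i-1} XOR b_{i+1}
            nxt.append(left ^ right)
        elif rule == 150:                          # b_{i-1} XOR b_i XOR b_{i+1}
            nxt.append(left ^ state[i] ^ right)
        else:
            raise ValueError("only rules 90 and 150 are handled here")
    return nxt

rules = [150, 150, 150, 150, 90]
print(lhca_step([0, 1, 1, 1, 1], rules))   # successor of the state s(t) in Fig. 1
```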
b_0^{t+1} = b_{−1}^t ⊕ b_0^t ⊕ b_1^t
b_1^{t+1} = b_0^t ⊕ b_1^t ⊕ b_2^t ⊕ (b_0^t ∧ b_4^t)
b_2^{t+1} = b_1^t ⊕ b_2^t ⊕ b_3^t ⊕ (b_0^t ∧ b_4^t) ⊕ ((b_0^t ⊕ b_1^t) ∧ b_3^t)
b_3^{t+1} = b_2^t ⊕ b_3^t ⊕ b_4^t ⊕ (b_0^t ∧ b_4^t)
b_4^{t+1} = b_3^t ⊕ b_5^t
Given the ruleset R of an n-bit LHCA, we have 0^n = R(0^n), i.e., the LHCA continues to stay in the all-zero state when it is initialized with the same. If the remaining 2^n − 1 non-zero states occur in a single cycle, the CA is called a maximum-length CA. Maximum-length LHCAs of n cells are derived from primitive polynomials of degree n over F_2. There exists a probabilistic polynomial-time algorithm to generate a rule vector for an n-cell maximum-length LHCA from a primitive polynomial (see Sect. D in [5]). Further, a maximum-length NHCA can be derived from a maximum-length LHCA using Algorithm 1 in [10]. In what follows, we show that finding the state s(t − δ) from a state s(t) in a maximum-length NHCA is a hard problem.
Theorem 1 Suppose C is an (n + 1)-bit NHCA synthesized from a null-boundary, maximum-length LHCA L with neighborhoodness N > 2. Given a state s(t) in C, it needs 2^{Ω(n)} time (i.e., it is infeasible) to find the state s(t − δ), where 2 < δ ∈ poly(n).
Proof There are only two ways to figure out s(t − δ) from s(t). One needs to compute either (2^{n+1} − 1 − δ) state-transitions in the forward direction or δ state-transitions in the backward direction. We show that each of these strategies takes 2^{Ω(n)} time.
Forward direction: δ is in poly(n), so (2^{n+1} − 1 − δ) > 2^n. So it must take 2^{Ω(n)} time to figure out s(t − δ) from s(t) in the forward direction.
Backward direction: One may proceed in the backward direction by computing s(t − δ) = R^{−δ}(s(t)). As N > 1, each bit b_i^t of s(t) depends on at least (δ + 1) bits of s(t − δ). Since δ > n, each bit b_i^t depends on all the bits of s(t − δ). We denote by f_i^δ ≝ f_i(f_i(. . . f_i(·))) the composition of δ copies of f_i. Therefore, in the backward direction, we have
b_0^t = f_0^δ(b_0^{t−δ}, b_1^{t−δ}, . . . , b_n^{t−δ})
b_1^t = f_1^δ(b_0^{t−δ}, b_1^{t−δ}, . . . , b_n^{t−δ})
⋮
b_n^t = f_n^δ(b_0^{t−δ}, b_1^{t−δ}, . . . , b_n^{t−δ}).
For δ = 2, solving a system of nonlinear equations is known to be NP-hard (Sect. 2.7.4 in [1]). In our case, δ > 2. Hence, finding s(t − δ) from s(t) is NP-hard. Therefore, it is infeasible to find the state s(t − δ) from the state s(t).
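As a further illustration (ours, not the authors'), the five-cell NHCA rule set above transcribes directly into code. Forward evolution is a cheap sequential loop, whereas going backwards would require solving the nonlinear system appearing in the proof of Theorem 1:

```python
def nhca_step(s):
    """One transition of the 5-cell NHCA specified above (null boundary)."""
    b0, b1, b2, b3, b4 = s
    return [
        0 ^ b0 ^ b1,                                  # f0 (b_{-1} = 0)
        b0 ^ b1 ^ b2 ^ (b0 & b4),                     # f1
        b1 ^ b2 ^ b3 ^ (b0 & b4) ^ ((b0 ^ b1) & b3),  # f2
        b2 ^ b3 ^ b4 ^ (b0 & b4),                     # f3
        b3,                                           # f4 (b5 = 0)
    ]

def evolve(s, steps):
    """Apply `steps` transitions one after another; no shortcut is known."""
    for _ in range(steps):
        s = nhca_step(s)
    return s

print(evolve([0, 1, 1, 1, 1], 10))   # ten forward transitions from 01111
```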
Note that finding s(t − δ) from s(t) is easy when the inverse CA (i.e., the inverse ruleset R^{−1}) is at hand. Given the ruleset R of any maximum-length CA, there exists an inverse ruleset R^{−1}, but we do not know any efficient algorithm to find it. A third line of attack may try to exploit the bias of the nonlinear operations logical AND and/or OR. In Lemma 1, we show that all f_i are unbiased as they include at least one logical-XOR operation.
Lemma 1 Boolean functions containing one or more logical-XOR operations are unbiased.
Proof Suppose f : {b_0, b_1, . . . , b_{n−1}} → {0, 1} is any boolean function of n variables. Note that f can be any function among the 2^(2^n) boolean functions. Now we construct another boolean function f′ = b_n ⊕ f(b_0, b_1, . . . , b_{n−1}). Suppose x ∈ {0, 1} and f = x for m < 2^n combinations of {b_0, b_1, . . . , b_{n−1}}. For the rest of the (2^n − m) combinations, f = x̄. Now f′ = x implies either f = x and b_n = 0, or f = x̄ and b_n = 1. Therefore, among the 2^{n+1} total combinations of {b_0, b_1, . . . , b_n}, f′ = x for m + (2^n − m) = 2^n combinations. Similarly, for the other 2^n combinations, f′ = x̄. Hence the probability
Pr[f′ = 0] = Pr[f′ = 1] = 2^n / 2^{n+1} = 1/2.
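Lemma 1 is easy to confirm exhaustively for small n. The sketch below (ours) picks an arbitrary biased nonlinear f and checks that f′ = b_n ⊕ f(b_0, …, b_{n−1}) is balanced:

```python
from itertools import product

def is_balanced(g, nvars):
    """True if the Boolean function g takes the values 0 and 1 equally often."""
    outputs = [g(*bits) for bits in product((0, 1), repeat=nvars)]
    return outputs.count(0) == outputs.count(1)

# An arbitrary (biased) nonlinear function f of three variables ...
f = lambda b0, b1, b2: (b0 & b1) | b2
# ... XORed with a fresh variable b3, as in the proof of Lemma 1.
f_prime = lambda b0, b1, b2, b3: b3 ^ f(b0, b1, b2)

print(is_balanced(f, 3))        # False: f alone is biased
print(is_balanced(f_prime, 4))  # True: the XOR makes it unbiased
```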
Hence, by Lemma 1, all the bits of C (and L) are unbiased. Therefore, statistical attacks based on a bias in any bit b_i^t are not possible. Theorem 1 and Lemma 1 together indicate that C produces a sequence of random bit-strings (states) during its evolution, and this sequence is infeasible to access arbitrarily when the offset is sufficiently long. These observations are vital for our VDF scheme. Note, however, that in a maximum-length n-bit CA, the state repeats after 2^n − 1 transitions, which is a huge number. What is important in the current context is that there should not be information leakage before 2^n − 1 transitions.
4 VDF Based on Cellular Automata In this section, we propose a VDF scheme based on an NHCA C generated from a maximum-length, null-boundary LHCA L with N > 1 by Algorithm 1 in [10]. Clearly, C satisfies Theorem 1 and Lemma 1. The number of cells n of C is determined by the security parameter λ ∈ Z^+. We will see that the targeted sequential time T is bounded by O(λ·2^λ). The three algorithms that specify our VDF are now described.
4.1 The Setup(1^λ, T) Algorithm This algorithm outputs the public parameters pp = ⟨C, n, δ, H⟩ having the following meanings. i. C is a maximum-length, null-boundary NHCA generated from an LHCA L (according to Algorithm 1 in [10]). ii. n is the number of bits in C. We need n = λ + 1. iii. δ ∈ Z^+ tunes the effort to verify. We require n ≤ δ ≤ poly(n) in order to achieve efficient verification and sequentiality as described in Theorem 4. iv. We take H : {0, 1}^* → S to be an efficiently computable hash function with the set S of all possible states of C as its range. In particular, {0, 1}^* and S are the domain and the range of this VDF, respectively.
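As a hedged sketch of item iv, one concrete (and purely illustrative) choice of H maps the challenge to an n-bit state by iterating SHA-256; the paper only requires some efficiently computable H : {0, 1}^* → S.

```python
import hashlib

def hash_to_state(x: bytes, n_cells: int):
    """Map an arbitrary input x to an n-bit CA state s(0), extending the
    SHA-256 digest with a counter if more than 256 bits are needed."""
    digest, counter = b"", 0
    while 8 * len(digest) < n_cells:
        digest += hashlib.sha256(x + counter.to_bytes(4, "big")).digest()
        counter += 1
    bits = "".join(f"{byte:08b}" for byte in digest)
    return [int(b) for b in bits[:n_cells]]

print(hash_to_state(b"challenge", 16))   # a small state, for illustration
```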
4.2 The Eval(pp, x) Algorithm

The Eval(pp, x) algorithm is defined as follows.

1. Use H to map the challenge x ∈ X to s(0) ∈ S.
2. Initialize C with s(0).
3. Update the state of the NHCA C for τ = 2^n − δ − 1 times. Here, τ counts the number of state updates (transitions of the CA) applied to C. Formally, compute s(τ) = R^τ(s(0)).
4. Output s(τ).
4.3 The Verify(pp, x, s(τ)) Algorithm

The Verify(pp, x, s(τ)) algorithm runs the following steps.

1. Use H to map the challenge x ∈ X to s(0) ∈ S.
2. Initialize the NHCA with s(τ).
3. Update the state of the NHCA δ times to obtain a state s'. That is, s' = R^δ(s(τ)).
4. Output 1 if s' = s(0); 0 otherwise.
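The three algorithms above are easy to prototype. The following Python sketch (illustrative, not the construction of [10]) uses an abstract next-state map in place of the NHCA C: the hybrid 90/150 rule vector, the hash H, and the toy state size are stand-ins, and the demonstration derives τ from the empirically measured cycle length of s(0) instead of from 2^n − 1, so that Verify stays consistent even if the toy rule vector is not maximum-length.

```python
import hashlib

N_CELLS = 8                                # toy size; the scheme uses n = lambda + 1 cells
RULE150 = [1, 0, 0, 1, 0, 1, 1, 0]         # illustrative hybrid rule vector (1 = rule 150, 0 = rule 90)

def next_state(s):
    """One null-boundary hybrid-CA transition: rule 90 = left XOR right, rule 150 adds the cell itself."""
    out = []
    for i in range(N_CELLS):
        left = s[i - 1] if i > 0 else 0
        right = s[i + 1] if i < N_CELLS - 1 else 0
        out.append(left ^ right ^ (s[i] if RULE150[i] else 0))
    return tuple(out)

def H(x):
    """Hash a challenge string to a nonzero CA state (stand-in for the scheme's H)."""
    digest = hashlib.sha256(x.encode()).digest()
    bits = tuple((digest[i // 8] >> (i % 8)) & 1 for i in range(N_CELLS))
    return bits if any(bits) else (1,) * N_CELLS

def evolve(s, steps):
    for _ in range(steps):
        s = next_state(s)
    return s

def eval_vdf(x, tau):
    return evolve(H(x), tau)               # s(tau) = R^tau(s(0))

def verify_vdf(x, s_tau, delta):
    return evolve(s_tau, delta) == H(x)    # accept iff R^delta(s(tau)) == s(0)

# Demo: measure the cycle length of s(0) under R, then set tau = period - delta.
x = "challenge"
s0 = H(x)
s, period = next_state(s0), 1
while s != s0 and period <= 2 ** N_CELLS:
    s, period = next_state(s), period + 1
if s == s0:
    delta = max(1, min(N_CELLS, period - 1))   # the scheme requires n <= delta <= poly(n)
    tau = period - delta
    print("cycle length:", period)
    print("verify:", verify_vdf(x, eval_vdf(x, tau), delta))
else:
    print("this toy rule vector does not put s(0) on a cycle; try another RULE150 vector")
```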
5 Security of the Proposed VDF The three desired properties of VDFs are discussed here.
5.1 Correctness

According to Definition 2, any VDF should accept, with high probability, a valid output (and the proof, if any) against the corresponding input from its domain. The following theorem establishes this correctness property of our VDF scheme.

Theorem 2 The proposed VDF V is correct.

Proof Since H is a deterministic hash function, s(0) ∈ S is uniquely determined by the challenge x ∈ X. Moreover, for any τ ≥ 0, s(τ) = R^τ(s(0)) if and only if s(0) = R^{2^n − 1 − τ}(s(τ)). In our case δ = 2^n − 1 − τ; therefore, it follows that

Pr[ Verify(pp, x, s(τ)) = 1 | pp ← Setup(1^λ, T), x ←$ X, s(τ) = Eval(pp, x) ] = 1.
5.2 Soundness

Here we show that the VDF is perfectly sound, that is, even a computationally unbounded adversary A cannot produce a misleading proof for an invalid output of the VDF.

Theorem 3 The proposed VDF V is perfectly sound.

Proof Since C is a maximum-length NHCA, we have s(τ') = s(τ) if and only if τ' ≡ τ (mod 2^n − 1). Any s'(τ) ≠ s(τ) does not satisfy Verify(pp, x, s'(τ)) = 1. Therefore, even for a computationally unbounded algorithm A, it holds that

Pr[ Verify(pp, x, s'(τ)) = 1 ∧ s'(τ) ≠ Eval(pp, x) | pp ← Setup(1^λ, T), (x, s'(τ)) ← A(1^λ, 1^T, pp) ] = 0.
5.3 Sequentiality The sequentiality analysis of our VDF scheme is based on the random evolution of the NHCA C. We show that even if there is a preprocessing algorithm A0 that enables the adversary A1 running on Δ parallel processors, to determine τ from s(0) and s(τ ) in O(1) time, our scheme stands sequential. The preprocessing algorithm works before the challenge x is made available.
Theorem 4 The proposed VDF V is (Δ, T/(λ + 1))-sequential.

Proof Upon the arrival of a challenge x ∈ X, s(0) = H(x) is first computed. Since H is a hash function, s(0) is a random state of the NHCA C, and the adversary has no control over this initial state s(0). By Theorem 1, Θ(λ·2^λ) operations are needed to find s(τ) from s(0), as n = λ + 1 and δ ≥ n. Further, Lemma 1 suggests that all the states are randomly distributed over S. Any random choice of s(τ) ∈ S will satisfy Verify(pp, x, s(τ)) = 1 with probability ≤ 1/2^n = 1/2^{λ+1}, which is negl(λ). On the other hand, A1 may take advantage of at most n processors out of the Δ processors to reach the next state in O(1) time, leading to 2^λ parallel time. More than n processors apparently do not help A1 to do it faster.
6 Performance Analysis

Here, we analyze the effort required in the proposed VDF.

Complexity of Setup. Among the public parameters, only the rule vector needs to be computed. As already mentioned in Sect. 3, given a primitive polynomial of degree n, the rule vector of a maximum-length, null-boundary LHCA is generated by a polynomial-time algorithm [5]. It runs non-deterministically with success probability 1/2 if n is even, and is deterministic if n is odd. Further, it can be efficiently converted to the NHCA C by injecting logical ANDs into certain chosen positions [10]. We can design H from any good hash function. The difficulty level for the prover is set proportional to λ·2^λ.

Complexity of Eval. The effort spent by the prover to run Eval(pp, x) is now deduced. We assume that a single call of the hash function H takes O(1) time. Each state update s(i) → s(i + 1) of C requires computing the next value of the n cells. Since any bit of s(i + 1) depends on at least N neighbors, updating a single bit requires Θ(N) operations, and each state update s(i) → s(i + 1) of C needs Θ(nN) operations. For τ transitions, the required effort of the prover is Θ(nτN). We have n = λ + 1 and τ > 2^λ, so this effort is Θ(λ·2^λ), an exponential expression in the security parameter λ.

Complexity of Verify. The effort spent by the verifier is now enumerated. The verifier needs δ state updates, where each update requires Θ(nN) operations, so it requires Θ(nδN) time. Finally, Θ(n) operations suffice to compare two vectors. As n = λ + 1 and δ is in poly(λ), Θ(nδN) is a poly(λ) expression in the security parameter λ. We compare our work with the existing ones in Table 1.
Table 1 Comparison among the existing VDFs. T is the targeted time bound, λ is the security parameter, Δ is the number of processors. All the quantities may be subjected to O-notation, if needed

VDF (by authors)   | Eval sequential    | Eval parallel         | Verify  | Setup    | Proof size
Boneh et al. [4]   | T^2                | > T − o(T)            | log T   | log T    | –
Wesolowski [18]    | (1 + 2/log T)·T    | (1 + 2/(Δ·log T))·T   | λ^4     | λ^3      | λ^3
Pietrzak [16]      | (1 + 2/√T)·T       | (1 + 2/(Δ·√T))·T      | log T   | λ^3      | log T
Feo et al. [8]     | T                  | T                     | λ^4     | T·log λ  | –
Our work           | T                  | T/(λ + 1)             | λ^2     | poly(λ)  | –
7 Conclusion and Open Problems

This paper presents an idea for constructing a verifiable delay function based on nonlinear hybrid cellular automata. Our scheme is always correct and sound, and it is sequential up to a division of the targeted time by λ + 1. Additionally, the proposed VDF produces the proof with the shortest size, reducing the cost of communication between the prover and the verifier. Although we have been able to link the security of our scheme with the inherent randomness of cellular automata, it remains open to investigate whether intrinsic properties of primitives other than cellular automata can be used to design VDFs.
References 1. Arora S, Barak B (2009) Computational complexity - a modern approach. Cambridge University Press. http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521424264 2. Banerjee T, Chowdhury DR (2019) NCASH: nonlinear cellular automata-based hash function. In: Proceedings of the fifth international conference on mathematics and computing - ICMC 2019, Bhubaneswar, India, pp 111–123. Accessed from 6–9 Feb 2019. https://doi.org/10.1007/ 978-981-15-5411-7_8 3. Bhattacharjee K, Naskar N, Roy S, Das S (2020) A survey of cellular automata: types, dynamics, non-uniformity and applications. Nat Comput 19(2):433–461. https://doi.org/10.1007/s11047018-9696-8 4. Boneh D, Bonneau J, Bünz B, Fisch B (2018) Verifiable delay functions. In: Shacham H, Boldyreva A (eds) Advances in Cryptology - CRYPTO 2018 - 38th annual international cryptology conference, Santa Barbara, CA, USA, Proceedings, Part I. Lecture notes in computer science, vol 10991. Springer, pp 757–788. Accessed from 19–23 Aug 2018. https://doi.org/10. 1007/978-3-319-96884-1_25 5. Cattell K, Muzio J (1996) Synthesis of one-dimensional linear hybrid cellular automata. IEEE Trans CAD Integrat Circuits Syst 15:325–335. https://doi.org/10.1109/43.489103 6. Chaudhuri PP, Chowdhury DR, Nandi S, Chattopadhyay S (1997) Additive cellular automata: theory and applications, vol 1. Wiley
7. Das S, Mukherjee S, Naskar N, Sikdar BK (2009) Characterization of single cycle CA and its application in pattern classification. Electron Notes Theor Comput Sci 252:181–203. https:// doi.org/10.1016/j.entcs.2009.09.021 8. Feo LD, Masson S, Petit C, Sanso A (2019) Verifiable delay functions from supersingular isogenies and pairings. In: Galbraith SD, Moriai S (eds) Advances in cryptology - ASIACRYPT 2019 - 25th international conference on the theory and application of cryptology and information security, Kobe, Japan, Proceedings, Part I. Lecture notes in computer science, vol 11921. Springer, pp 248–277. Accessed from 8–12 Dec 2019. https://doi.org/10.1007/978-3-03034578-5_10 9. Ganguly N, Maji P, Sikdar BK, Chaudhuri PP (2002) Generalized multiple attractor cellular automata (GMACA) model for associative memory. Int J Pattern Recognit Artif Intell 16(7):781–796. https://doi.org/10.1142/S0218001402001988 10. Ghosh S, Sengupta A, Saha D, Chowdhury DR (2014) A scalable method for constructing nonlinear cellular automata with period 2n − 1. In: Was J, Sirakoulis GC, Bandini S (eds) Cellular automata - 11th international conference on cellular automata for research and industry, ACRI 2014, Krakow, Poland, Proceedings. Lecture notes in computer science, vol 8751. Springer, pp 65–74. Accessed from 22–25 Sep 2014. https://doi.org/10.1007/978-3-319-11520-7_8 11. Jose J, Das S, Chowdhury DR (2016) Prevention of fault attacks in cellular automata based stream ciphers. J Cell Autom 12(1–2):141–157. http://www.oldcitypublishing.com/journals/ jca-home/jca-issue-contents/jca-volume-12-number-1-2-2016/jca-12-1-2-p-141-157/ 12. Karmakar S, Chowdhury DR (2011) NOCAS: a nonlinear cellular automata based stream cipher. In: 17th international workshop on cellular automata and discrete complex systems, automata 2011, Center for mathematical modeling, University of Chile, Santiago, Chile, pp 135–146. Accessed from 21–23 Nov 2011. http://dmtcs.episciences.org/2970 13. Maiti S, Chowdhury DR (2018) Achieving better security using nonlinear cellular automata as a cryptographic primitive. In: Mathematics and computing - 4th international conference, ICMC 2018, Varanasi, India, Revised Selected Papers, pp 3–15. Accessed from 9–11 Jan 2018. https://doi.org/10.1007/978-981-13-0023-3_1 14. Maiti S, Chowdhury DR (2021) Design of fault-resilient s-boxes for aes-like block ciphers. Cryptogr Commun 13(1):71–100. https://doi.org/10.1007/s12095-020-00452-0 15. Maiti S, Ghosh S, Chowdhury DR (2017) On the security of designing a cellular automata based stream cipher. In: Information security and privacy - 22nd Australasian conference, ACISP 2017, Auckland, New Zealand, Proceedings, Part II, pp 406–413. Accessed from 3–5 July 2017. https://doi.org/10.1007/978-3-319-59870-3_25 16. Pietrzak K (2019) Simple verifiable delay functions. In: Blum A (ed) 10th Innovations in theoretical computer science conference, ITCS 2019, San Diego, California, USA. LIPIcs, vol 124. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, pp 60:1–60:15. Accessed from 10–12 Jan 2019. https://doi.org/10.4230/LIPIcs.ITCS.2019.60 17. Rivest RL, Shamir A, Wagner DA (1996) Time-lock puzzles and timed-release crypto. Tech. rep, USA 18. Wesolowski B (2019) Efficient verifiable delay functions. In: Ishai Y, Rijmen V (eds) Advances in cryptology - EUROCRYPT 2019 - 38th annual international conference on the theory and applications of cryptographic techniques, Darmstadt, Germany, Proceedings, Part III. Lecture notes in computer science, vol 11478. 
Springer, pp 379–407. Accessed from 19–23 May 2019. https://doi.org/10.1007/978-3-030-17659-4_13
Security Analysis of WG-7 Lightweight Stream Cipher Against Cube Attack Bijoy Das, Abhijit Das, and Dipanwita Roy Chowdhury
Abstract Welch–Gong (WG) is a hardware-oriented LFSR-based stream cipher. WG-7 is a version of the eSTREAM submission Welch–Gong, used for RFID encryption and authentication purposes. It offers 80-bit cryptographic security. Nowadays, almost all ciphers achieve security by exploiting a nonlinear feedback structure. In this paper, we investigate the security of the nonlinear feedback-based initialization phase of the WG-7 stream cipher using the conventional bit-based division property variant of the cube attack, by considering the cipher in a non-blackbox polynomial setting. In our work, we mount the cube attack using mixed-integer linear programming (MILP) models. The results of our attack enable us to recover the secret key of WG-7 after 20 rounds of initialization, utilizing 2^10 keystream bits in 2^73 time. We show that our proposed attack requires significantly lower data complexity. To the best of our knowledge, our attack is the first one that investigates the security of the nonlinear feedback-based initialization phase of the WG-7 cipher.

Keywords Welch–Gong · Cube attack · Division property · MILP · Lightweight stream ciphers
B. Das (B) · A. Das · D. R. Chowdhury
Indian Institute of Technology Kharagpur, Kharagpur, India
e-mail: [email protected]
A. Das
e-mail: [email protected]
D. R. Chowdhury
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_2

1 Introduction

WG-7 [4] is a fast lightweight word-oriented stream cipher whose construction is based on the Welch–Gong (WG) [5] stream cipher. WG-7 includes a 23-stage word-oriented LFSR with each stage working over the finite field F_{2^7}, and a nonlinear filter function that is based on the WG transformation. First, the LFSR is loaded with the
key and the IV. Next, the LFSR with its nonlinear function is run for 46 iterations. Then the cipher generates the keystream bits that are used for encryption. The cipher is mainly designed for encryption in resource-constrained environments such as mobile phones, smart cards, and RFID applications.

We see that WG-7, an 80-bit lightweight stream cipher, has already been broken with algebraic attacks. The authors of [6, 8] mounted their attacks separately on WG-7 by exploiting the fact that WG-7 is updated linearly in the KSG phase. In [6], the attacker finds an annihilator function g of the filtering function f such that f·g = 0 and deg g < deg f = 5; the best annihilator function g, of degree 3, is worked out. This allows an algebraic attack on WG-7 with a time complexity of about 2^27 and data complexity 2^19.38. In [8], the authors improved the algebraic attack complexity of [6] by providing an upper bound for the spectral immunity (SI) of the cipher. The computation of SI depends on the number of nonzero coefficients in the polynomial, which is bounded by the degree of the polynomial, and the degree of a polynomial does not change when the feedback path contains only linear operations. This improves the attack to 2^17.3 data complexity and 2^28 time complexity. This cannot happen in the presence of a nonlinear function in the feedback path of the cipher, where the degree of the functions grows very fast. So we observe that the aforementioned attacks work only when the feedback path of the cipher is linear. In this paper, we analyze the initialization phase, which contains the nonlinear operation in the feedback path, and show that the secret key of the WG-7 cipher can be recovered for the reduced 20-round initialization phase using only 2^10 data complexity. To the best of our knowledge, our attack is the first one to investigate the security of the nonlinear feedback-based initialization phase of the WG-7 lightweight stream cipher.

Our Contributions. We mount the cube attack on the initialization phase of the reduced 20-round WG-7. We construct the model of the division trail that propagates through the WG-permutation (WGP) function of WG-7 as a 7-bit S-box trail. Moreover, we build the model of the division trails through the linear layer of the cipher by finding the invertible (sub)matrices of the matrix of the linear transformation. Our optimizations lead to a full key recovery using only 2^10 bits of the keystream and with a time complexity of 2^73. Table 1 shows the comparison with the existing algebraic attacks. Our attack is not impaired by the nonlinear function in the feedback operation, and it requires a significantly lower data complexity of 2^10.

The rest of the paper is organized as follows. In Sect. 2, we review the concepts of the cube attack. Section 2.2 presents a brief overview of the division property, the bit-based division property, and how to model the division trails using Mixed Integer Linear Programming (MILP). The specification of WG-7 is provided in Sect. 3. Section 4 elaborates our proposed cube attack on the initialization phase of WG-7. Section 5 concludes the paper.
Table 1 Performance on data complexity

Ref.          | Data complexity | Time complexity | Work environment
[6]           | 2^19.38         | 2^27            | Works only in the absence of the nonlinear feedback path
[8]           | 2^17.3          | 2^28            | Works only in the absence of the nonlinear feedback path
Our approach  | 2^10            | 2^73            | Attack succeeds even if a nonlinear function is present in the feedback path
2 Preliminaries Here, we give the notations and definitions we will use in this paper.
2.1 Cube Attack

The cube attack was proposed by Dinur and Shamir at EUROCRYPT 2009 [2] to recover the secret key. For an n_1-bit key k = (k_1, k_2, …, k_{n_1}) and an m_1-bit IV v = (v_1, v_2, …, v_{m_1}), let f(x) be a Boolean function from F_2^n to F_2 such that x = k‖v and n = n_1 + m_1. Let u ∈ F_2^n be a constant vector. Then the ANF of f(x) can be written as f(x) = x^u · p(x) + q(x), where no term of q(x) is divisible by x^u. For a set of cube indices I = {0 ≤ i ≤ n − 1 : u_i = 1} ⊂ {0, 1, …, n − 1}, x^u represents the corresponding monomial. Therefore, the summation of f(x) over all values of C_I = {x ∈ F_2^n : u ⪯ x} is given by

Σ_{x ∈ C_I} f(k, v) = Σ_{x ∈ C_I} (x^u · p(x) + q(x)) = p(x),     (1)

where p(x) is called the superpoly of C_I, and it only involves the variables x_j such that u_j = 0 for 0 ≤ j ≤ n − 1. Equation (1) implies that if the attacker obtains a superpoly that is simple enough, she can query the encryption oracle on C_I. All the first keystream bits returned are summed to evaluate the right-hand side of Eq. (1). Subsequently, she recovers the secret key bits by solving a system of equations.
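As an illustration of Eq. (1) (not part of the attack code), the following Python sketch sums a toy output bit over a cube; toy_output_bit and the cube I are hypothetical stand-ins rather than WG-7 components.

```python
from itertools import product

def toy_output_bit(k, v):
    """Stand-in for the first keystream bit f(k, v); k and v are bit tuples."""
    return (k[0] & v[0] & v[2]) ^ (k[1] & v[1]) ^ (k[0] & k[2]) ^ v[3]

def cube_sum(f, key, iv_const, cube_indices):
    """Sum f(key, v) over C_I: cube positions take all 2^|I| values, the rest stay at iv_const."""
    acc = 0
    for assignment in product((0, 1), repeat=len(cube_indices)):
        v = list(iv_const)
        for pos, bit in zip(cube_indices, assignment):
            v[pos] = bit
        acc ^= f(key, tuple(v))
    return acc  # value of the superpoly p for this key and this constant IV part

key = (1, 0, 1)
iv_const = (0, 0, 0, 0)
I = [0, 2]                                            # cube over v0 and v2
print(cube_sum(toy_output_bit, key, iv_const, I))     # equals k0, the superpoly of x^u = v0*v2
```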
2.2 MILP-Aided Bit-Based Division Property

Bit Product Function π_u(x). For any u ∈ F_2^n, let π_u(x) be a function from F_2^n to F_2. For any input x ∈ F_2^n, π_u(x) is the AND of the x[i] satisfying u[i] = 1. It is defined as

π_u(x) = Π_{i=1}^{n} x[i]^{u[i]}.

Definition 1 (Division Property) Let X be a multi-set whose elements take values from F_2^{l_0} × F_2^{l_1} × ⋯ × F_2^{l_{m−1}}. The multi-set X has the division property D_K^{l_0, l_1, …, l_{m−1}}, where K denotes a set of m-dimensional vectors whose i-th elements take values between 0 and l_i, if it fulfills the following condition:

⊕_{x ∈ X} π_u(x) = unknown, if there exists k ∈ K such that wt(u) ⪰ k; 0, otherwise.

If there are k, k' ∈ K such that k ⪰ k' in the division property D_K^{l_0, l_1, …, l_{m−1}}, then k can be removed from K because it is redundant. When l_0, l_1, …, l_{m−1} are restricted to 1, we speak of the bit-based division property. The main idea of the MILP-aided bit-based division property is to model the propagation rules as a series of linear (in)equalities. We adopt the MILP models for COPY, AND, and XOR from [11].
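The following Python sketch (illustrative; variable names are ours) shows how these three propagation rules can be written as MILP constraints with the PuLP modeling library, in the same form in which they are instantiated later in Algorithms 4 and 5: COPY splits the division property of a bit, AND lower-bounds the output by each input, and XOR equates the output with the sum of its inputs.

```python
import pulp

model = pulp.LpProblem("division_trails", pulp.LpMinimize)

def copy_bit(model, a, name):
    """COPY: the division property a of one bit splits into a = b1 + b2."""
    b1 = pulp.LpVariable(name + "_c1", cat="Binary")
    b2 = pulp.LpVariable(name + "_c2", cat="Binary")
    model += a == b1 + b2
    return b1, b2

def and_bits(model, inputs, name):
    """AND: the output division property dominates every input (y >= x_i), as in Algorithm 4."""
    y = pulp.LpVariable(name + "_and", cat="Binary")
    for x in inputs:
        model += y >= x
    return y

def xor_bits(model, inputs, name):
    """XOR: the output division property is the sum of the inputs (y = sum x_i), as in Algorithm 5."""
    y = pulp.LpVariable(name + "_xor", cat="Binary")
    model += y == pulp.lpSum(inputs)
    return y

# Example: model the trail through z = (s0 AND s1) XOR s2 for three state bits.
s = [pulp.LpVariable(f"s{i}", cat="Binary") for i in range(3)]
s0a, s0b = copy_bit(model, s[0], "s0")    # s0 feeds both the AND gate and the rest of the state
t = and_bits(model, [s0a, s[1]], "t")
z = xor_bits(model, [t, s[2]], "z")
```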
3 The WG-7 Stream Cipher

WG-7 [4] is a stream cipher designed by Y. Luo, Q. Chai, G. Gong, and X. Lai in 2010. As depicted in Fig. 1, the structure of WG-7 consists of a 23-stage LFSR and a nonlinear filtering function which is realized by the Welch–Gong (WG) transformation. Each stage of WG-7 works over the finite field F_{2^7}. This finite field is defined by the primitive polynomial g(x) = x^7 + x + 1. The characteristic polynomial is primitive over F_{2^7} and is given by f(x) = x^23 + x^11 + β ∈ F_{2^7}[x], where β is a root of g(x). This cipher uses an 80-bit secret key and an 81-bit initialization vector (IV). The cipher works in two phases: the initialization phase and the Keystream Generation (KSG) phase. In the initialization phase, the cipher is first loaded with the 80-bit key and the 81-bit IV. Then, the LFSR is clocked 46 times with the output of a nonlinear permutation feedback function WGP, whereas the KSG phase does not include this nonlinear feedback path. We denote the state at the i-th round by S^i = S^i[0] ‖ S^i[1] ‖ ⋯ ‖ S^i[22], where S^i[j] = (s^i_{7j}, s^i_{7j+1}, s^i_{7j+2}, s^i_{7j+3}, s^i_{7j+4}, s^i_{7j+5}, s^i_{7j+6}) for 0 ≤ j ≤ 22. The n-bit secret key and the m-bit IV are represented as K_{0,…,n−1} and IV_{0,…,m−1}, respectively. Initially, the 23-stage LFSR is loaded as follows: for 0 ≤ i ≤ 10, S^0[2i] = (K_{7i}, K_{7i+1}, K_{7i+2}, K_{7i+3}, IV_{7i}, IV_{7i+1}, IV_{7i+2}), S^0[2i+1] = (K_{7i+4}, K_{7i+5}, K_{7i+6}, IV_{7i+3}, IV_{7i+4}, IV_{7i+5}, IV_{7i+6}), and S^0[22] = (K_{77}, K_{78}, K_{79}, IV_{77}, IV_{78}, IV_{79}, IV_{80}). The state-update function in this phase is given by S^{i+1}[j] = S^i[j+1] for 0 ≤ j ≤ 21, and S^{i+1}[22] = S^i[11] ⊕ β·S^i[0] ⊕ WGP(S^i[22]).
Fig. 1 Design of WG-7 stream cipher (the figure shows the 23-stage LFSR over F_{2^7}, the WG permutation with cubic decimation used as nonlinear feedback in the initialization phase, and the trace function producing the keystream bit)
During the KSG phase, the keystream bit is given by z_{i−46} = Tr(s^3 + s^9 + s^{21} + s^{57} + s^{87}), where s = S^i[22] = (s^i_{154}, s^i_{155}, s^i_{156}, s^i_{157}, s^i_{158}, s^i_{159}, s^i_{160}). Expanding this expression gives the ANF representation of z_{i−46} as a sum of monomials of degree up to five in the bits s^i_{154}, …, s^i_{160}; these monomials are exactly the AND and XOR terms modelled in Algorithm 2. In this phase, the state is updated as S^{i+1}[j] = S^i[j+1] for 0 ≤ j ≤ 21, and S^{i+1}[22] = S^i[11] ⊕ β·S^i[0].
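For concreteness, the following Python sketch implements the field arithmetic behind these formulas, representing elements of F_{2^7} as 7-bit integers in the polynomial basis defined by g(x) = x^7 + x + 1 and taking β to be the class of x; the bit ordering inside a stage and the WGP permutation are not specified here, so this is only a partial model of WG-7, not a reference implementation.

```python
G = (1 << 7) | 0b11            # g(x) = x^7 + x + 1

def gf_mul(a, b):
    """Multiply two elements of F_{2^7} = F_2[x]/(x^7 + x + 1), encoded as 7-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << 7):
            a ^= G
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(a):
    """Tr: F_{2^7} -> F_2, Tr(a) = a + a^2 + a^4 + ... + a^64 (the result is 0 or 1)."""
    t = 0
    for i in range(7):
        t ^= gf_pow(a, 1 << i)
    return t & 1

def keystream_bit(s):
    """Filter applied to the last stage s = S[22]: Tr(s^3 + s^9 + s^21 + s^57 + s^87)."""
    acc = 0
    for e in (3, 9, 21, 57, 87):
        acc ^= gf_pow(s, e)
    return trace(acc)

BETA = 0b10                    # beta taken as the class of x, a root of g(x)

def ksg_feedback(s0, s11):
    """KSG-phase feedback S[11] + beta*S[0]; initialization additionally XORs WGP(S[22])."""
    return s11 ^ gf_mul(BETA, s0)

print(keystream_bit(0b1010101), bin(ksg_feedback(0b1111111, 0b0000001)))
```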
4 Cube Attack on WG-7 In this section, we describe the process of mounting the cube attack on WG-7 cipher.
4.1 Model the Initialization Phase of WG-7 Using MILP We start the cube attack by modeling the division property propagation in each round for each of the functions used in the WG-7 cipher.
Algorithm 1 MILP model for the β·X operation in WG-7
1: function LinearLayer(X)
2:   Let M be the 7 × 7 linear transformation matrix for the WG-7 cipher
3:   M ← MatrixTranspose(M)                        ▹ compute the transpose of M
4:   M.var ← y_j as binary for 0 ≤ j ≤ 6
5:   for i ∈ {0, 1, 2, 3, 4, 5, 6} do
6:     M.var ← t_{ij} as binary if M_{i,j} = 1, for 0 ≤ j ≤ 6
7:     M.var ← t_{ji} as binary if M_{i,j} = 1, for 0 ≤ j ≤ 6
8:   end for
9:   for i ∈ {0, 1, 2, 3, 4, 5, 6} do
10:    M.con ← x_i = [t_{ji} if M_{i,j} = 1, for 0 ≤ j ≤ 6], where X = (x_0, x_1, …, x_6)
11:    M.con ← [t_{ij} if M_{i,j} = 1, for 0 ≤ j ≤ 6] = y_i
12:  end for
13:  M.con ← Σ_{j=0}^{6} y_j = Σ_{j=0}^{6} x_j
14:  return (M, [y_0, y_1, y_2, y_3, y_4, y_5, y_6])
15: end function
Algorithm 2 MILP model for KSG operation in WG-7 1: function KSG(S) 2: M.var ← z as binary 3: (M, A, a1 ) ← AND(S,[158,160]) , (M, B, a2 ) ← AND(A,[154,155,157,158,160]) 4: (M, C, a3 ) ← AND(B,[158,159]) , (M, D, a4 ) ← AND(C,[154,155,157,159]) 5: (M, E, a5 ) ← AND(D,[157,160]) , (M, F, a6 ) ← AND(E,[154,155,158,160]) 6: (M, G, a7 ) ← AND(F,[157,159]), (M, H, a8 ) ← AND(G,[154,155,159,160]) (M, J, a10 ) ← AND(I,[154,156,160]) 7: (M,I, a9 ) ← AND(H,[157,159,160]), 8: (M, K, a11 ) ← AND(J,[157,158]) , (M, L, a12 ) ← AND(K,[156,157,159,160]) 9: (M, M, a13 ) ← AND(L,[157,158,160]), (M, N, a14 ) ← AND(M,[156,158,160]) 10: (M, O, a15 ) ← AND(N,[157:160]) , (M, P, a16 ) ← AND(O,[154,156,158,160]) 11: (M,Q, a17 ) ← AND(P,[156,157]) , (M, R, a18 ) ← AND(Q,[155,158,159,160]) 12: (M, T, a19 ) ← AND(R,[156:160]), (M, U, a20 ) ← AND(T,[154,157,160]) 13: (M, V, a21 ) ← AND(U,[155,159]), (M, W, a22 ) ← AND(V,[155,157,159,160]) 14: (M,X, a23 ) ← AND(W,[155,159,160]), (M, Y, a24 ) ← AND(X,[155,156,160]) 15: (M, Z, a25 ) ← AND(Y,[155,[157:160]]), (M, b, a26 ) ← AND(Z,[155,156,158]) 16: (M, c, a27 ) ← AND(b,[155:158]), (M, d, a28 ) ← AND(c,[154,157,159,160]) 17: (M, e, a29 ) ← AND(d,[154,159]), (M, f, a30 ) ← AND(e,[154,156,157,159]) 18: (M, g, a31 ) ← AND(f,[154,156]), (M, h, a32 ) ← AND(g,[154,156,[158:160]]) 19: (M, k, a33 ) ← AND(h,[[155:157],160]), (M, l, a34 ) ← AND(k,[154,159,160]) 20: (M,m, a35 ) ← AND(l,[154,158,159]), (M,q, a36 ) ← AND(m,[154,[157:160]]) 21: (M, r, a37 ) ← AND(q,[154,155]), (M, t, a38 ) ← AND(r,[154,155,[158:160]]) 22: (M, u, a39 ) ← AND(t,[154,155,[157:159]]), (M,v, a40 ) ← AND(u,[154:156]) 23: (M,w, a41 ) ← AND(v,[154,[156:159]]), (M, x, a42 ) ← AND(w,[[154:156],159]) 24: (M, y, a43 ) ← AND(x,[[154:156],159,160]), (M, z, a44 ) ← AND(y,[154:158]) 25: (M, i, a45 ) ← AND(z,[[154:156],158,159]) (M, j, a46 ) ← AND(i,[[154:157],160]) 26: (M, S47 , a47 ) ← XOR(j, [154, 155, 157, 160]) 27: M.con ← z = 47 1 ai 28: return (M, S47 , z) 29: end function
MILP model for the Feedback Function (FBK). The function FBK at the i-th round is expressed as β·S^i[0] ⊕ S^i[11]. To model β·S^i[0], one can use its ANF representation described in Sect. 3. Let (x_0, x_1, …, x_6) and (y_0, y_1, …, y_6) be the input and the output of β·S^i[0], respectively. Using the technique of [9], we get the following system of equations by introducing 17 intermediate binary variables t_i for 1 ≤ i ≤ 17 (see Eq. (2)):

x_0 = t_1                       y_0 = t_2 + t_9 + t_11
x_1 = t_2 + t_3 + t_4           y_1 = t_5
x_2 = t_5 + t_6 + t_7 + t_8     y_2 = t_6 + t_14
x_3 = t_9 + t_10                y_3 = t_12
x_4 = t_11 + t_12 + t_13        y_4 = t_3 + t_7
x_5 = t_14 + t_15               y_5 = t_16
x_6 = t_16 + t_17               y_6 = t_1 + t_4 + t_8 + t_10 + t_13 + t_15 + t_17      (2)
We get 626 solutions for these equations. But we observe 76 solutions lead to an invalid propagation. Hence, in order to model β.S i [0], we adopt the technique from [3, 13] where the invalid propagations are eliminated by checking whether the corresponding sub-matrices of the linear transformation matrix are invertible or not. Algorithms 1 and 3 elaborate the MILP model for this operation of WG-7 cipher. Algorithm 3 MILP model for FBK operation in WG-7 1: function FBK(S, [0,11]) 2: M.var ← z j as binary for 0 ≤ j ≤ 6 3: for i ∈ [0, 11] do 4: M.var ← s7i+ j , x 7i+ j as binary for 0 ≤ j ≤ 6 5: end for 6: for i ∈ [0, 11] do 7: M.con ← s7i+ j = s7i+ j + x 7i+ j as binary for 0 ≤ j ≤ 6 8: end for 9: (M, Y ) ← LinearLayer(X) X = {x0 , x1 , . . . , x6 } 10: for j ∈ [0, 1, 2, 3, 4, 5, 6] do 11: M.con ← z j = y j + x 77+ j 12: end for 13: for j ∈ {0, 1, . . . , 22} \ {0, 11} do 14: S [ j] = S[ j] S [ j] = (s7 j , s7 j+1 , s7 j+2 , s7 j+3 , s7 j+4 , s7 j+5 , s7 j+6 ) 15: end for 16: return (M, S , [z 0 , z 1 , z 2 , z 3 , z 4 , z 5 , z 6 ]) 17: end function
MILP model for the WG-permutation (WGP). We represent this function as a 7-bit S-box, where (x0 , x1 , x2 , x3 , x4 , x5 , x6 ) and (y0 , y1 , y2 , y3 , y4 , y5 , y6 ) are the input and the output of WGP, respectively. Then we use the table-aided bit-based division property on this function introduced in [1]. After that what has been proposed in [12],
by using the inequality generator() function in the Sage software,1 a set of 8831 inequalities returned. This makes the size of the corresponding MILP problem too large to solve. Hence, we reduce this set by the greedy algorithm in [10, 12]. We see that the following 21 inequalities are sufficient as constraints to model this function. Using these 21 inequality constraints, the MILP model for the WG-permutation (WGP) is constructed as in Algorithm 7. 1. −2x6 − x5 − 2x4 − 2x3 − 2x1 − 2x0 − 2y6 + 2y5 − y4 + y3 + y2 −3y1 + y0 ≥ −12 2. −3x6 − 3x5 − 2x4 − 4x3 − 3x2 − 4x1 − x0 + y6 − y5 + 3y4 + y3 +2y2 + 2y1 + 2y0 ≥ −17 3. x6 + x5 + x4 + x3 + x2 + 25x1 + x0 − 5y6 − 5y5 − 5y4 − 5y3 − 5y2 − 5y1 − 5y0 ≥ −4 4. 6x6 − y6 − y5 − y4 − y3 − y2 − y1 − y0 ≥ −1 5. x6 + x5 + x4 + x3 + x2 + x1 + 29x0 − 6y6 − 5y5 − 6y4 − 5y3 − 6y2 − 6y1 − 6y0 ≥ −5 6. x6 + x5 + 25x4 + x3 + x2 + x1 + x0 − 5y6 − 5y5 − 5y4 − 5y3 − 5y2 − 5y1 − 5y0 ≥ −4 7. x6 + x5 + x4 + 28x3 + x2 + x1 + x0 − 5y6 − 6y5 − 6y4 − 5y3 − 5y2 − 6y1 − 6y0 ≥ −5 8. 10x5 + x2 − 3y6 − y5 − 2y4 − 2y3 − 2y2 − y1 − 3y0 ≥ −3 9. −13x6 − 12x5 − 11x4 − 9x3 − 8x2 − 11x1 − 13x0 − y6 + 5y5 + 2y4 + 2y3 − 3y2 + y1 + 4y0 ≥ −67 10. −2x6 − x5 − 2x4 − 3x3 − 2x2 − 3x1 − x0 + 13y6 + 11y5 + 12y4 +12y3 + 13y2 + 13y1 + 13y0 ≥ 0 11. 6x2 − y6 − y5 − 2y4 − 2y3 − y2 − y1 ≥ −2 12. −2x5 − x3 − x2 − x0 − y6 − y5 − y4 − y3 + 5y2 − y1 − y0 ≥ −6 13. −3x5 − 5x4 − 4x3 − 6x2 − 2x1 − 2x0 − 4y6 − 5y5 + 16y4 − y3 −3y2 − 3y1 − 3y0 ≥ −25 14. −3x6 − 4x5 − 3x4 + 3x3 − 3x2 − 3x1 − 2x0 + 3y6 − 2y5 − 5y4 −4y3 − 4y2 − 2y1 + 9y0 ≥ −20 15. −2x6 − 2x5 − 2x4 − 3x3 − 2x1 − 2x0 + 2y6 + y5 − 2y4 − y3 + 5y2 − 3y1 − 4y0 ≥ −15 16. −x6 − 3x5 − 3x4 − 2x3 − 3x1 − x0 − y6 + 3y5 + 2y4 + 2y3 + y2 + y1 + 3y0 ≥ −11 1
http://www.sagemath.org/.
17. −3x6 − 2x4 − 2x3 − x2 − 3x1 − 3x0 + 4y6 + 2y5 + 3y4 + 3y3 +y2 + 3y1 + 4y0 ≥ −10 18. −2x6 − 2x4 − 2x3 − 2x2 − x1 − x0 + 10y6 + 9y5 + 9y4 + 8y3 +9y2 + 9y1 + 10y0 ≥ 0 19. −x5 − x3 − 3x2 − x1 + x0 − y5 + 6y4 − 2y3 − 2y2 − 2y1 − 2y0 ≥ −8 20. −6x6 − 3x5 − 5x4 − 2x3 − x2 − 5x1 − 6x0 + 4y6 + 5y5 + 6y4 +6y3 + y2 + 5y1 + 4y0 ≥ −21 21. −x6 − 2x4 − 2x3 − 2x2 − 2x1 − x0 + y6 − y5 + y4 + 2y3 + y1 +y0 ≥ −9
MILP model for the Keystream Generation Operation (KSG). The MILP model for KSG is realized by the COPY, XOR, and AND operations based on its ANF expression (see Sect. 3). Algorithm 2 explains the MILP variables and constraints to propagate the division property for computing the keystream bit. Here, [a : b] contains (b − a + 1) elements. The values lie consecutively between a and b such as {a, a + 1, . . . , b − 1, b}. The MILP models for the AND and XOR are described in Algorithms 4 and 5, respectively. The overall MILP model for the WG-7 whose initialization is reduced to R rounds is given as function WG7Eval in Algorithm 6. The required number(s) of MILP variables and constraints for WG7Eval are (73R + 532) and (78R + 584), respectively.
4.2 Evaluate Secret Variables Involved in the Superpoly

We start the evaluation by preparing a cube C_I(IV) taking all possible combinations of {v_{i_1}, v_{i_2}, …, v_{i_|I|}}. Then, we extract the secret variables J = {k_{j_1}, k_{j_2}, …, k_{j_|J|}} involved in the preferable superpoly using the technique proposed in [11, Algorithm 1]. It computes all the R-round division trails taking the initial division property as v_i = 1 for i ∈ I, and v_i = 0 for i ∈ {0, 1, …, m − 1} \ I. Table 2 summarizes all the secret bits involved in the preferable superpoly; it is built for 14 to 20 rounds of the initialization phase based on our chosen cubes.

Searching Cubes. We choose the cube I such that the value of 2^{|I|+|J|} becomes as small as possible. The cubes that we choose for this attack satisfy this condition and are shown in Table 2 as I_1, I_2, …, I_8. A total of (81 choose |I|) cubes of size |I| are possible, so it is computationally infeasible to try them all. We do not have rigorous evidence that our choice of cube indices is the best possible, but we experimented with a large number of randomly chosen cubes, and the cubes of Table 2 appear to be the best for this cipher. How to choose appropriate cubes is left as an open question.
Extract a balanced superpoly. We choose the constant part of the IV randomly, and recover the superpoly p(J, v̄), for v̄ = {v_0, v_1, …, v_{m−1}} \ I, by trying out a total of 2^7 × 2^62 possible combinations (see Table 2) for the reduced 20 rounds of the cipher. Let Ĵ be the set of 2^62 possible values of J. In the offline phase, we compute the values of p(J, v̄), store them in a table T_1 indexed by Ĵ, and then evaluate the ANF accordingly. If p(J, v̄) turns out to be constant, we pick another random IV and repeat the above process until we find an appropriate one such that p(J, v̄) is not constant.
Algorithm 4 MILP model for AND operation in WG-7 1: function AND(S, I) 2: for i ∈ I do 3: M.var ← si , xi as binary 4: end for 5: M.var ← y as binary 6: for i ∈ I do 7: M.con ← si = si + xi 8: end for 9: for i ∈ I do 10: M.con ← y ≥ xi 11: end for 12: for k ∈ (0, 1, . . . , 160) \ I do 13: sk = sk 14: end for 15: return (M, S , y) 16: end function
Algorithm 5 MILP model for XOR operation in WG-7 1: function XOR(S, I) 2: for i ∈ I do 3: M.var ← si , xi as binary 4: end for 5: M.var ← y as binary 6: for i ∈ I do 7: M.con ← si = si + xi 8: end for 9: temp = 0 10: for i ∈ I do 11: temp = temp + xi 12: end for 13: M.con ← y = temp 14: for k ∈ (0, 1, . . . , 160) \ I do 15: sk = sk 16: end for 17: return (M, S , y) 18: end function
Table 2 Involved key bits in the superpoly for the cubes C_{I_1}, …, C_{I_8}

Cube sequence | Cube indices          | #Rounds | Involved secret key variables (J)                 | |J| | Time
I_1           | 0,36,37,73,74,75,76   | 14      | 0, 1, 2, …, 5, 6, 39, 40, …, 47, 48, 77, 78, 79   | 20  | 2^{20+7}
I_2           | 35,37,73,74,75,76,80  | 15      | 0, 1, 2, …, 9, 10, 39, 40, …, 51, 52, 77, 78, 79  | 28  | 2^{28+7}
I_3           | 36,37,73,74,75,76,78  | 16      | 0, 1, 2, …, 12, 13, 39, 40, …, 54, 55, 77, 78, 79 | 34  | 2^{34+7}
I_4           | 35,37,73,74,75,76,77  | 17      | 0, 1, 2, …, 16, 17, 39, 40, …, 58, 59, 77, 78, 79 | 42  | 2^{42+7}
I_5           | 36,37,45,73,74,75,76  | 18      | 0, 1, 2, …, 19, 20, 39, 40, …, 61, 62, 77, 78, 79 | 48  | 2^{48+7}
I_6           | 35,37,73,74,75,76,79  | 19      | 0, 1, 2, …, 23, 24, 39, 40, …, 65, 66, 77, 78, 79 | 56  | 2^{56+7}
I_7           | 35,37,38,73,74,75,76  | 20      | 0, 1, 2, …, 26, 27, 39, 40, …, 68, 69, 77, 78, 79 | 62  | 2^{62+7}
I_8           | 35,37,39,73,74,75,76  |         |                                                   |     |
Algorithm 6 MILP model for the Initialization Round of WG-7 Stream Cipher 1: function WG7Eval(R) 2: Prepare an empty MILP model M 3: M.var ← S 0 [i] for 0 ≤ i ≤ 22 4: for r ∈ {1, 2, . . . , R} do 5: (M, S , a) ← W G P(Sr −1 ) 6: (M, S , b) ← F B K (S , [0, 11]) 7: for i = 0 to 21 do 8: Sr [i] = S [i + 1] 9: end for 10: M.con ← S [0] = 0 11: M.var ← S r [22] as binary 12: M.con ← Sr [22] = a + b 13: end for 14: (M, S , z) ← K SG(S R ) 15: for i = 0 to 22 do 16: M.con ← S [i] = 0 17: end for 18: M.con ← z = 1 19: end function
S 0 [i] = (s7i , s7i+1 , s7i+2 , . . . , s7i+6 )
Algorithm 7 MILP model for the WG-permutation in WG-7 1: function WGP(S) 2: M.var ← s154+i , xi , yi as binary for 0 ≤ i ≤ 6 M.con ← s154+i = s154+i + xi for 0 ≤ i ≤ 6 3: 4: Add constraints to M based on the reduced set of inequalities 5: for j = 0 to 21 do 6: S [ j] = S[ j] S [ j] = (s7 j , s7 j+1 , s7 j+2 , s7 j+3 , s7 j+4 , s7 j+5 , s7 j+6 ) 7: end for 8: return (M, S , [y0 , y1 , y2 , y3 , y4 , y5 , y6 ]) 9: end function
To sum up, we compute a table T_1 of size |Ĵ| = 2^62 using 2^{7+62} operations. The attack is possible if the attacker can find appropriate IVs easily. We assume that we can recover the balanced superpoly in only one trial for each of the cubes in Table 2. Indeed, since each of these cubes has size 7, there remain 81 − 7 = 74 bits with which to set the constant part of the IV. Therefore, we anticipate that Assumption 1 (mentioned in [11]) holds with high probability, and derive the complexity figures accordingly.
4.3 Key Recovery for 20-Round Initialization Phase We use the balanced superpolys for the cubes I1 , I2 , . . . , I8 . The online phase consists of the following operations for each i ∈ {1, 2, . . . , 8}.
• Query the encryption oracle with C_{I_i} and compute S = Σ_{x ∈ C_{I_i}} f(k, v).
• Compare S with each entry of T_1. The values of k̄ = {k_{j_1}, k_{j_2}, …, k_{j_62}} for which S does not match T_1 are discarded.

Since the superpoly is balanced, we have p({k_{j_1}, k_{j_2}, …, k_{j_62}}, v̄) = 0 for 2^61 values of k̄, and p({k_{j_1}, k_{j_2}, …, k_{j_62}}, v̄) = 1 for the remaining 2^61 values of k̄. Therefore, we can recover one bit of information about the secret variables, and we can do so for each cube in a single trial. Since we work with eight cubes, we recover eight bits of secret information. The remaining secret bits (80 − 8 = 72 of them) are recovered by guessing, which involves a brute-force complexity of 2^72. The total time complexity of the attack is therefore 8 × 2^69 + 2^72 = 2^73. The data complexity of the total computation is 8 × 2^7 = 2^10.
5 Conclusion

This paper investigates the security of the nonlinear feedback-based initialization phase of the lightweight stream cipher WG-7 [4]. We mount a division-property-based cube attack by considering the structure of the cipher as a non-blackbox polynomial. Our attack proposes an MILP model for the initialization phase and the keystream generation phase of the cipher, worked out specifically for WG-7. To the best of our knowledge, the only two published attacks against this cipher are both algebraic attacks that work only when no nonlinear function is present in the feedback path of the cipher. Our attack works both in the presence and in the absence of such a nonlinear feedback function, and reduces the data complexity compared to those algebraic attacks.
References 1. Boura C, Canteaut A (2016) Another view of the division property. In: Robshaw M, Katz J (eds) Advances in cryptology - CRYPTO 2016 - 36th annual international cryptology conference, Santa Barbara, CA, USA, Proceedings, Part I. Lecture notes in computer science, vol 9814. Springer, pp 654–682. Accessed from 14–18 August 14-18 2. Dinur I, Shamir A (2009) Cube attacks on tweakable black box polynomials. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 278–299 3. ElSheikh M, Youssef AM (2021) On milp-based automatic search for bit-based division property for ciphers with (large) linear layers. In: Baek J, Ruj S (eds) Information security and privacy - 26th Australasian conference, ACISP 2021, virtual event, Proceedings. Lecture notes in computer science, vol 13083. Springer, pp 111–131. Accessed from 1–3 Dec 2021 4. Luo Y, Chai Q, Gong G, Lai X (2010) A lightweight stream cipher WG-7 for RFID encryption and authentication. In: Proceedings of the global communications conference, 2010. GLOBECOM 2010, Miami, Florida, USA, pp 1–6. Accessed from 6–10 Dec 2010 5. Nawaz Y, Gong G (2008) WG: a family of stream ciphers with designed randomness properties. Inf Sci 178(7):1903–1916
6. Orumiehchiha MA, Pieprzyk J, Steinfeld R (2012) Cryptanalysis of WG-7: a lightweight stream cipher. Cryptogr Commun 4(3–4):277–285 7. Rohit R, AlTawy R, Gong G (2017) Milp-based cube attack on the reduced-round WG-5 lightweight stream cipher. In: IMA international conference on cryptography and coding, pp 333–351 8. Rønjom S (2017) Improving algebraic attacks on stream ciphers based on linear feedback shift register over F2k F 2 k. Des Codes Cryptogr 82(1–2):27–41 9. Sun L, Wang W, Wang MQ (2019) Milp-aided bit-based division property for primitives with non-bit-permutation linear layers. IET Inf Secur 14(1):12–20 10. Sun S, Hu L, Wang P, Qiao K, Ma X, Song L (2014) Automatic security evaluation and (related-key) differential characteristic search: application to simon, present, lblock, des (l) and other bit-oriented block ciphers. In: International conference on the theory and application of cryptology and information security. Springer, pp 158–178 11. Todo Y, Isobe T, Hao Y, Meier W (2017) Cube attacks on non-blackbox polynomials based on division property. In: Advances in cryptology - CRYPTO 2017 - 37th annual international cryptology conference, Santa Barbara, CA, USA, Proceedings, Part III. pp 250–279. Accessed from 20–24 Aug 2017 12. Xiang Z, Zhang W, Bao Z, Lin D (2016) Applying MILP method to searching integral distinguishers based on division property for 6 lightweight block ciphers. In: Advances in cryptology - ASIACRYPT 2016 - 22nd international conference on the theory and application of cryptology and information security, Hanoi, Vietnam, Proceedings, Part I. pp 648–678. Accessed from 4–8 Dec 2016 13. Zhang W, Rijmen V (2019) Division cryptanalysis of block ciphers with a binary diffusion layer. IET Inf Secur 13(2):87–95
MILP Modeling of S-box: Divide and Merge Approach

Manoj Kumar and Tarun Yadav
Abstract Mixed Integer Linear Programming (MILP) solvers are used to ascertain the resistance of block ciphers against the differential attack. This requires the translation of active S-box minimization and probability optimization problems into the MILP model. The possible and impossible transitions in the difference distribution table (DDT) are used to generate the linear inequalities for the S-box. There are a large number of linear inequalities in this set that need to be minimized to construct an efficient MILP model. At ToSC 2020, Boura and Coggia presented an approach to minimize the set of linear inequalities, but the complexity to get an optimal solution is high due to the trade-off between time complexity and optimal solution. In this paper, we aim to optimize this trade-off and propose an algorithm based on the divide and merge approach. We experimented with various configurations of the proposed algorithm for 4-bit and 5-bit S-boxes and compared the results. The results show that the divide and merge approach outperforms the existing approach with large advantage factors. Keywords Block cipher · Differential cryptanalysis · Linear inequalities · MILP · S-box
1 Introduction The differential attack is a three-decade-old cryptanalysis technique that was initially proposed by Biham and Shamir and applied to present the first practical attack on DES [3]. Differential cryptanalysis is a technique that is used to evaluate the security of block ciphers by constructing the distinguishers. The branch-and-bound methods are used to construct the differential distinguishers [12] and thereby mount the key M. Kumar (B) · T. Yadav Scientific Analysis Group, DRDO, Metcalfe House Complex, Delhi 110 054, India e-mail: [email protected] T. Yadav e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_3
recovery attacks [9]. In 2012, an MILP-based method was proposed by Mouha et al. [14], thereafter MILP-based tools are being used to attack the block ciphers through differential cryptanalysis. The differential cryptanalysis problem can be translated into the MILP model, and MILP solvers can be used to construct the differential distinguishers. Designers use MILP solvers to provide security bounds, and cryptanalysts use MILP solvers to search the new distinguishers. Mouha et al. proposed the use of MILP for block ciphers in 2012 [14], and Sun et al. proposed the bitwise modeling for block ciphers in 2014 [17]. The modeling of linear layers is an easier task than the modeling of nonlinear S-boxes [10]. There are various approaches to model the differential properties of S-boxes into linear inequalities. Sun et al. proposed two approaches based on the H-representation of convex hull and logical condition. The method of H-representation is not efficient for 8-bit S-boxes [17, 18]. In 2017, Abdelkhalek et al. generated the linear inequalities for 8-bit S-boxes using the logic minimization tool Logic Friday [11] that generates and minimizes the inequalities based on the product-of-sum of boolean functions [1]. The limitation of the Logic Friday tool is to generate linear inequalities with more than sixteen variables. Yadav and Kumar presented an Espresso [6] based tool MILES to generate the linear inequalities of a large S-box in more (≥16) variables [20]. In 2020, Boura and Coggia presented a new technique to minimize the number of linear inequalities for S-boxes [5]. Motivation and Contribution. To model the differential attack using MILP, the non-linear component (most commonly S-box) of block cipher needs to be converted into linear constraints in the form of inequalities. There are various approaches to generate a set of linear inequalities using differential properties of S-box. The set contains a large number of linear inequalities, therefore the minimization of the number is required to model an efficient MILP problem. Boura and Coggia [5] proposed a method to generate and minimize the linear inequalities in this set. The new inequalities are introduced in this set by adding two/three inequalities at a time. The time complexity of this approach is high, and this complexity increases exponentially for large S-boxes. We optimize the reduction approach by division of the original set of inequalities into multiple batches. Using the approach, the trade-off between time complexity and minimized linear inequalities is optimized with large advantage factors. Organization. The paper is divided into four sections. In Sect. 2, we present MILP modeling of S-boxes and the reduction of inequalities using the standard approach. We also discuss the method proposed by Boura and Coggia to minimize the linear inequalities and propose a divide and merge approach to get a better reduction. We present our algorithm to improve the existing method to reduce the linear inequalities of the S-box in this section. In Sect. 3, we apply our approach to 4-bit and 5-bit Sboxes to minimize the set of linear inequalities and present the results. We conclude the paper in Sect. 4.
2 MILP Modeling of S-boxes

MILP is used by designers to prove the resistance of a design against various cryptanalytic attacks, and cryptanalysts use MILP solvers to break the security claims of block ciphers. Since MILP solvers have been used for a long time to solve optimization problems across domains, these solvers are well developed. So, the only remaining task is the translation of cryptanalytic problems into linear constraints and the application of existing solvers to get the results. In a differential attack, MILP modeling depends on the linear and non-linear components of the target block cipher. Linear components can be easily translated into linear constraints, but the translation of non-linear components into linear constraints is an effortful task. The S-box is a non-linear component that is used in most block cipher designs. To mount a differential attack, the difference distribution table (Appendix: Table 4) of the S-box is constructed to get the possible transitions of input and output differences. In this section, we describe the process to write linear inequalities of the DDT and methods for minimizing these inequalities.
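As a concrete illustration (ours, not the paper's tool), the following Python sketch computes the DDT of a 4-bit S-box and separates possible from impossible transitions; the PRESENT S-box of Table 1 is used as the example.

```python
def ddt(sbox):
    """Difference distribution table: table[dx][dy] = #{x : S(x) ^ S(x ^ dx) == dy}."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for dx in range(n):
            dy = sbox[x] ^ sbox[x ^ dx]
            table[dx][dy] += 1
    return table

PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
D = ddt(PRESENT_SBOX)

# Possible transitions (non-zero entries) become vertices of the convex hull;
# impossible transitions (zero entries) are what the inequalities must cut off.
possible = [(dx, dy) for dx in range(16) for dy in range(16) if D[dx][dy] != 0]
impossible = [(dx, dy) for dx in range(16) for dy in range(16) if D[dx][dy] == 0]
print(len(possible), "possible and", len(impossible), "impossible transitions")
```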
2.1 Generating Linear Inequalities of DDT

Fig. 1 Example of Convex Hull and Corresponding Linear Inequalities

H-representation of the convex hull (Fig. 1) is a method to generate the linear inequalities that represent the behavior of the DDT of the S-box [17]. The entries in the DDT (Appendix: Table 4), defined as t = (Δx → Δy) for input difference Δx and output difference Δy, are either zero or non-zero. If t ≠ 0, then (Δx → Δy) is a possible transition; otherwise it is an impossible transition. The set of all possible transitions is used to generate the linear inequalities for the S-box. A differential transition (Δx → Δy) in the DDT of an n-bit S-box is represented as a vector (Δx_{n−1}, Δx_{n−2}, …, Δx_0, Δy_{n−1}, Δy_{n−2}, …, Δy_0) in R^{2n}. The set of possible
transitions is used to describe the DDT of the S-box with the following N inequalities (Eq. 1) and that is the H-representation of n-bit S-box: α0,n−1 Δxn−1 + · · · + α0,0 Δx0 + β0,n−1 Δyn−1 + · · · + β0,0 Δy0 + γ0 ≥ 0 · · · (1) α N ,n−1 Δxn−1 + · · · + α N ,0 Δx0 + β N ,n−1 Δyn−1 + · · · + β N ,0 Δy0 + γ N ≥ 0 SageMath [16] can be used to compute the H-representation thereby generating a set of linear inequalities that excludes all impossible transitions [18]. We use a Python library of CDDLib [15] to compute the H-representation of the convex hull. Motzkin et al. [13] presented the Double Description Method to generate vertices of a convex polyhedron. This method is implemented in CDDLib library that also supports computation of convex hull for a given set of vertices. This convex hull is represented by a set of linear inequalities. We use this feature of the library to compute a convex hull for the vertex representation of the DDT. The library also provides a feature to obtain the linear inequalities of the convex hull. The pseudocode to generate the linear inequalities is presented in Algorithm 1 and the source code is available on GitHub.1 Algorithm 1 Generating Linear Inequalities of S-box Input: S-box (n-bit, n ≤ 5 ) Output: L = Set of linear inequalities representing S-box 1: D DT ← Difference Distribution Table of S-box (input difference : Δxn−1 , Δxn−2 , · · · , Δx0 & output difference : Δyn−1 , Δyn−2 , · · · , Δy0 ) 2: VD DT ← Vertices representation (Δxn−1 , Δxn−2 , · · · , Δx0 , Δyn−1 , Δyn−2 , · · · , Δy0 ) of possible transitions in DDT 3: H ← Convex Hull of vertices in V D DT (computed using CDDLib) 4: L ← Linear inequalities generated from H-representation of H
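A minimal Python sketch of this step is given below. It assumes the classic pycddlib interface (cdd.Matrix, RepType.GENERATOR, get_inequalities); the exact call names may differ in newer pycddlib releases, so this is an illustration of Algorithm 1 rather than the released tool (which, per Table 2, obtains 327 such inequalities for PRESENT before reduction).

```python
import cdd  # pycddlib; names below follow the classic (pre-3.0) interface

def sbox_inequalities(sbox, n=4):
    """H-representation of the convex hull of the possible DDT transitions, as in Algorithm 1."""
    vertices = []
    for dx in range(2 ** n):
        for dy in range(2 ** n):
            if any(sbox[x] ^ sbox[x ^ dx] == dy for x in range(2 ** n)):
                point = [(dx >> (n - 1 - i)) & 1 for i in range(n)] + \
                        [(dy >> (n - 1 - i)) & 1 for i in range(n)]
                vertices.append([1] + point)          # leading 1 marks a vertex (V-representation)
    mat = cdd.Matrix(vertices, number_type="fraction")
    mat.rep_type = cdd.RepType.GENERATOR
    poly = cdd.Polyhedron(mat)
    ineqs = poly.get_inequalities()                   # rows [b, a_1..a_2n] meaning b + a.x >= 0
    return [list(ineqs[i]) for i in range(ineqs.row_size)]

PRESENT_SBOX = [12, 5, 6, 11, 9, 0, 10, 13, 3, 14, 15, 8, 4, 7, 1, 2]
print(len(sbox_inequalities(PRESENT_SBOX)), "inequalities before reduction")
```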
2.2 Inequality Minimization Using Impossible Transitions The set of linear inequalities generated using the H-representation of a convex hull can be used to model the MILP problem for the differential attack. In each round, this set of linear inequalities will be used to write the constraints for each S-box. There are multiple rounds in a block cipher, and multiple S-boxes are used in each round. Therefore, it is necessary to reduce the number of inequalities to construct an efficient MILP model. Sasaki and Todo [19] proposed a method to reduce the number of linear inequalities using impossible transitions of the DDT. In this method, a new MILP model is constructed by assigning a binary variable z i (0 ≤ i ≤ N − 1) to each inequality. For each impossible transition, all inequalities (z i ) are checked for 1
satisfying assignments. If an inequality does not satisfy an impossible transition, then this inequality removes that particular impossible transition. All the variables corresponding to the inequalities removing a particular impossible transition are added to obtain a new inequality of the form z_i + ⋯ + z_j ≥ 1. This new linear constraint is added to ensure that each impossible transition must be removed by at least one inequality. This procedure is repeated for all impossible transitions, and a set of linear inequalities is generated in terms of the z_i variables. The objective function of this MILP model is the minimization of the sum of the z_i variables, i.e., Σ_{i=0}^{N−1} z_i. The MILP model is solved using an optimization solver (e.g., Gurobi [7] or CPLEX [8]). The optimal solution for the objective function is used to obtain a reduced set of linear inequalities: the reduced set contains the inequalities corresponding to the variables z_i with value 1. The number of inequalities in this set is significantly reduced from the original set, and this set is then used for efficient MILP modeling of the differential characteristic search problem. This is the basic reduction approach: generate the linear inequalities through the H-representation of the convex hull and minimize them using the MILP solver.
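The following PuLP sketch (illustrative; data-structure names are ours) expresses this basic reduction as the set-cover MILP just described: inequalities are (coefficients, constant) pairs meaning coefficients·x + constant ≥ 0, and impossible is the list of zero DDT entries.

```python
import pulp

def minimize_inequalities(inequalities, impossible, n=4):
    """Pick a minimal subset such that every impossible transition violates at least one kept inequality."""
    prob = pulp.LpProblem("reduce_sbox_ineqs", pulp.LpMinimize)
    z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(len(inequalities))]
    prob += pulp.lpSum(z)                                   # objective: number of kept inequalities
    for dx, dy in impossible:
        point = [(dx >> (n - 1 - i)) & 1 for i in range(n)] + \
                [(dy >> (n - 1 - i)) & 1 for i in range(n)]
        removers = [z[i] for i, (coeffs, const) in enumerate(inequalities)
                    if sum(c * b for c, b in zip(coeffs, point)) + const < 0]
        prob += pulp.lpSum(removers) >= 1                   # the transition must be cut by a kept inequality
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [ineq for ineq, var in zip(inequalities, z) if var.value() > 0.5]
```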
2.3 Existing Improvements in Inequality Minimization Boura and Coggia [5] proposed an approach to achieve a better reduction of the number of inequalities in comparison to the basic reduction approach. They proposed a method to introduce the new inequalities using the addition property of inequalities. These new inequalities are introduced to the set of linear inequalities generated using the H-representation of the convex hull (Fig. 2). The extended set of inequalities is minimized using the MILP solver as discussed in Sect. 2.2. In this approach, k inequalities from the initial set are added to construct a new inequality. The addition property of inequalities ensures that a possible transition p=(Δx, Δy) satisfying any k inequalities (C1 , C2 , . . . , Ck ) will also satisfy the new inequality introduced by adding these k inequalities, i.e. Cnew = (C1 + C2 + · · · + Ck ). The number of impossible transitions removed by this new inequality Cnew are counted and listed. If Cnew removes a different set of impossible transitions of DDT, which are not removed by any existing inequality, then it is introduced to the extended set of inequalities. This process is repeated for all possible combinations of k inequalities. Boura and Coggia experimented with this approach for 4-bit, 5-bit, and 6-bit S-boxes with k = 2 and k = 3. The experimental results show better reduction as compared to the existing reduction approaches.
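A sketch of the k = 2 case in Python is shown below (our reading of the selection criterion, not the authors' code): two inequalities are added coefficient-wise, and the sum is kept only if the set of impossible transitions it removes differs from that of every inequality already present.

```python
from itertools import combinations

def removed_set(ineq, impossible, n=4):
    """Impossible transitions violated by one inequality (coeffs, const), i.e. removed by it."""
    coeffs, const = ineq
    out = set()
    for dx, dy in impossible:
        point = [(dx >> (n - 1 - i)) & 1 for i in range(n)] + \
                [(dy >> (n - 1 - i)) & 1 for i in range(n)]
        if sum(c * b for c, b in zip(coeffs, point)) + const < 0:
            out.add((dx, dy))
    return out

def extend_with_pair_sums(inequalities, impossible, n=4):
    """Boura-Coggia style extension for k = 2: append sums that cut a new pattern of impossible transitions."""
    seen = set(frozenset(removed_set(q, impossible, n)) for q in inequalities)
    extended = list(inequalities)
    for (c1, b1), (c2, b2) in combinations(inequalities, 2):
        new = ([a + b for a, b in zip(c1, c2)], b1 + b2)    # valid for every possible transition by addition
        r = frozenset(removed_set(new, impossible, n))
        if r and r not in seen:
            seen.add(r)
            extended.append(new)
    return extended
```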
Fig. 2 Example to Represent the Addition of New Linear Inequalities
2.4 Inequality Minimization: Divide and Merge Approach

The approach proposed by Boura and Coggia processes a large number of linear inequalities (L_BC), and it may take hours of time to produce a minimized set of inequalities (L_min^BC). We propose a new approach to minimize the number of inequalities in practical time. We use the proposed approach to optimize a trade-off between the number of linear inequalities processed (computationally, the time complexity) and the reduction achieved. This approach gives a near-optimal solution (w.r.t. the approach presented by Boura and Coggia) with a high advantage factor (Eq. 2). This factor (AF) indicates the gain in the number of inequalities processed versus the loss incurred in the reduction achieved. The proposed approach is based on the divide and merge strategy. We use the linear inequalities (L) obtained from the H-representation of the convex hull (Algorithm 1). The inequalities which do not remove at least one impossible transition are removed from the list L. Next, the number of impossible transitions removed by each inequality is counted, and the set of inequalities is sorted in descending order of this count. We divide this sorted set of linear inequalities (L_sort) into a set (B) of multiple batches of batch size β; the last batch may have size less than β. For each batch (b ∈ B), every combination of k linear inequalities is added to get a new inequality, and these new inequalities are introduced into a set L_new. Once the process is completed for all the batches, L_sort and L_new are merged together to get L_merge. To reduce the number of inequalities in the set L_merge, a new MILP problem is modeled in a similar way as described in Sect. 2.2. The model is solved using an optimization solver (CPLEX) to get the minimized set of linear inequalities L_min. The divide and merge approach is described in Algorithm 2.

Advantage Factor (AF) = (L_BC / L_merge) / (L_min / L_min^BC)     (2)
35
Algorithm 2 Linear Inequality Minimization: Divide and Merge Approach Input: S-box (n-bit, n ≤ 5 ), batch size (β) Output: L min = Set of minimized linear inequalities 1: D DT ← Difference Distribution Table of S-box 2: L ←Linear inequalities generated using H-representation of Convex Hull 3: L ← L − {li ∈ L s.t. number of impossible transitions removed by li = 0} 4: L sor t ← Sorted L in descending order w.r.t number of impossible transitions removed 5: B ← L sor t is divided into a set of batches of size β 6: L new ← φ 7: for b ∈ B do 8: Sineq ← {(Add(li , l j ) where li , l j ∈ b} Addition of coefficients & constants 9: L new ← L new ∪ Sineq 10: end for 11: L merge ← L sor t ∪ L new 12: M.constraints ← φ 13: for all Impossible Transitions (X → Y ) in DDT do 14: Z ← {z i , ∀li ∈ L merge and li (X, Y ) < 0} z i is a binary variable 15: M.constraints ← M.constraints ∪ { Z ≥ 1 } 16: end for 17: M.objective ← Min( {z i , ∀li ∈ L merge }) 18: L min ← M.optimi ze()
3 Experimental Results: 4-Bit and 5-Bit S-boxes
Block cipher designs are based on Feistel, SPN, and ARX structures. The S-box is used to introduce non-linearity, while permutation layers spread the effect of the S-box in the output. In general, 4-bit and 8-bit S-boxes are most commonly used, but there are a few designs based on 5-bit, 6-bit, and 7-bit S-boxes. We demonstrate the proposed divide and merge approach on 4-bit and 5-bit S-boxes used in some popular designs. The 4-bit S-boxes used in PRESENT [4], WARP [2], GIFT, and TWINE and the 5-bit S-boxes used in ASCON, FIDES-5, and SC2000-5 are given in Table 1. We set k = 2 with batch size β ∈ {10, 20, 30, 40, 50, 100} and k = 3 with batch size β ∈ {2, 5, 10, 15, 20} and apply Algorithm 2 (see footnote 2 for the implementation) with these configurations. The initial set of inequalities is divided into batches of size β, and every k inequalities of a batch are added together to generate a new set of inequalities. In this process, the number of inequalities explored and the size of the minimized set obtained using MILP modeling are the two criteria used to calculate the advantage factor. These criteria and the advantage factors calculated using Eq. 2 are reported in Tables 2 and 3. As discussed in the previous section, the advantage factor is used to determine the efficacy of the proposed approach: a high advantage factor indicates the benefit of using the divide and merge approach. We compare the outcome of the divide and merge approach under different configurations of Algorithm 2 with the existing minimization approaches in Tables 2 and 3. These experiments indicate that the divide and merge approach optimizes a trade-off between the number of inequalities explored and the reduced set of linear inequalities.
2 https://github.com/tarunyadav/Sbox_LinIneq_Reduction_DM.
Table 1 S-boxes of various ciphers (decimal values, listed for inputs i = 0, 1, ..., 15 for the 4-bit S-boxes and i = 0, 1, ..., 31 for the 5-bit S-boxes)

PRESENT  : 12 05 06 11 09 00 10 13 03 14 15 08 04 07 01 02
WARP     : 12 10 13 03 14 11 15 07 08 09 01 05 00 02 04 06
GIFT     : 01 10 04 12 06 15 03 09 02 13 11 07 05 00 08 14
TWINE    : 12 00 15 10 02 11 09 05 08 03 13 07 01 14 06 04
ASCON    : 04 11 31 20 26 21 09 02 27 05 08 18 29 03 06 28 30 19 07 14 00 13 17 24 16 12 01 25 22 10 15 23
FIDES-5  : 01 00 25 26 17 29 21 27 20 05 04 23 14 18 02 28 15 08 06 03 13 07 24 16 30 09 31 10 22 12 11 19
SC2000-5 : 20 26 07 31 19 12 10 15 22 30 13 14 04 24 09 18 27 11 01 21 06 16 02 28 23 05 08 03 00 17 29 25
Table 2 Comparison of the proposed method with existing methods for k = 2 (columns β = 10 to β = 100 are the divide and merge approach, Algo 2)

Cipher     Metric      Algo 1   [5]       β=10    β=20    β=30    β=40    β=50    β=100
PRESENT    Explored    327      106929    3101    6121    9121    12161   15121   30121
PRESENT    Minimized   21       17        20      20      20      20      19      19
PRESENT    AF          –        1         29.31   14.85   9.96    7.47    6.33    3.18
WARP       Explored    239      57121     2209    4409    6469    8529    10529   20529
WARP       Minimized   21       17        18      18      18      18      17      17
WARP       AF          –        1         24.42   12.24   8.34    6.33    5.43    2.78
GIFT       Explored    237      56169     2201    4401    6421    8441    10441   20441
GIFT       Minimized   21       17        20      20      19      19      19      18
GIFT       AF          –        1         21.69   10.85   7.83    5.95    4.81    2.60
TWINE      Explored    324      104976    3064    6064    9064    11984   15064   30064
TWINE      Minimized   23       20        23      22      22      22      22      21
TWINE      AF          –        1         29.79   15.74   10.53   7.96    6.34    3.33
ASCON      Explored    2415     5832225   23900   47700   71500   95300   119100  238100
ASCON      Minimized   40       32        39      38      37      37      37      36
ASCON      AF          –        1         200.23  102.96  70.55   52.93   42.35   21.77
FIDES-5    Explored    910      828100    8881    17681   26461   35281   44021   87921
FIDES-5    Minimized   79       64        75      74      74      73      72      69
FIDES-5    AF          –        1         79.57   40.51   27.07   20.58   16.72   8.74
SC2000-5   Explored    908      824464    8864    17664   26424   35264   43944   87744
SC2000-5   Minimized   82       66        78      78      77      76      76      74
SC2000-5   AF          –        1         78.70   39.49   26.74   20.30   16.29   8.38
3.1 Divide and Merge Approach: k = 2
For the k = 2 configuration, we experiment with various batch sizes (β) to get a minimized set of linear inequalities. It is evident from Table 2 that the number of explored inequalities is always smaller, while the minimized set contains a few more inequalities than in the Boura and Coggia approach [5]. Although the minimized set is slightly larger, the advantage factor is high, which indicates the efficacy of our approach in practical time. For PRESENT, we are able to reduce the number of linear inequalities in the minimized set from 21 to 19 with batch size β = 50 by exploring only 15121 inequalities instead of 106929. This gives an advantage factor of 6.33, which can be considered a performance benefit of the proposed divide and merge approach. Similarly, we obtain a near-optimal set of reduced inequalities with a very high advantage factor for GIFT, TWINE, ASCON, FIDES-5, and SC2000-5.
Table 3 Comparison of the proposed method with existing methods for k = 3 (columns β = 2 to β = 20 are the divide and merge approach, Algo 2)

Cipher     Metric      Algo 1   [5]           β=2         β=5        β=10       β=15      β=20
PRESENT    Explored    327      34965783      1241        7751       31001      68831     121331
PRESENT    Minimized   21       17            20          19         19         19        19
PRESENT    AF          –        1             23949.17    4036.28    1009.17    454.52    257.85
WARP       Explored    239      13651919      889         5527       22027      49447     88027
WARP       Minimized   21       17            19          18         18         17        17
WARP       AF          –        1             13740.02    2332.82    585.35     276.09    155.09
GIFT       Explored    237      13312053      881         5501       22001      48581     88001
GIFT       Minimized   21       17            21          19         18         18        18
GIFT       AF          –        1             12232.04    2165.20    571.45     258.79    142.87
TWINE      Explored    324      34012224      1232        7652       30512      68012     120512
TWINE      Minimized   23       20            23          23         21         21        21
TWINE      AF          –        1             24006.37    3865.11    1061.63    476.28    268.79
ASCON      Explored    2415     14084823375   9560        59750      23900      536750    95300
ASCON      Minimized   40       32            39          38         37         35        36
ASCON      AF          –        1             1208868.01  198508.85  509684.89  23991.71  131372.96
FIDES-5    Explored    910      753571000     3553        22189      88729      199189    352729
FIDES-5    Minimized   79       64            76          73         70         70        69
FIDES-5    AF          –        1             178605.71   29774.44   7764.98    3458.92   1981.59
SC2000-5   Explored    908      748613312     3552        22152      88512      199152    352512
SC2000-5   Minimized   82       66            79          77         74         73        72
SC2000-5   AF          –        1             176076.51   28966.62   7543.41    3398.55   1946.68
For WARP, we are able to achieve the optimal number of minimized inequalities, i.e., 17, with an advantage factor of 5.43.
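As a quick numerical check of Eq. 2 against Table 2, consider PRESENT with β = 50, where L^BC = 106929, L_merge = 15121, L^BC_min = 17 and L_min = 19:

$$\text{AF} = \frac{106929/15121}{19/17} \approx \frac{7.07}{1.12} \approx 6.33,$$

which is the value reported in the table.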
3.2 Divide and Merge Approach: k = 3
For k = 3, the number of inequalities to explore increases manifold compared to the case of k = 2. In this case, the impact of the divide and merge approach is expected to be very high, and the benefits of the proposed method become more significant. For the GIFT cipher, we are able to minimize the set of linear inequalities from 21 to 18 with an advantage factor of 571.45 by exploring only 22001 inequalities instead of 13312053 (Table 3). Similarly, it can be observed that the divide and merge approach produces a near-optimal solution with a high advantage factor for PRESENT, TWINE, ASCON, FIDES-5, and SC2000-5. For WARP, the optimal solution (17) is achieved by exploring only 49447 inequalities instead of 13651919, with an advantage factor of 276.09.
4 Conclusion
The representation of the DDT of an S-box in terms of linear inequalities is necessary for MILP modeling of the differential attack on block ciphers. For this purpose, the H-representation of the convex hull is used to get the linear inequalities that represent the differential properties of the DDT of the S-box, but to model an efficient MILP problem, a minimized set of inequalities is required. Existing approaches either provide a large number of reduced inequalities or are very costly in terms of time complexity. In this paper, we presented a divide and merge approach to optimize a trade-off between the number of inequalities to be explored and the reduced set of linear inequalities. We presented an algorithm describing the divide and merge approach to reduce the linear inequalities and experimented with various configurations. These experiments show that the divide and merge approach achieves near-optimal solutions with large advantage factors for 4-bit and 5-bit S-boxes. The application of the proposed algorithm to 8-bit S-boxes can be explored in the future.
Appendix See Table 4.
Table 4 DDT of WARP S-box (rows ΔX = 0–15, columns ΔY = 0–15)

ΔX\ΔY   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0    16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  1     0  2  4  0  2  2  2  0  2  0  0  0  0  0  2  0
  2     0  4  0  0  4  0  0  0  0  4  0  0  4  0  0  0
  3     0  0  0  0  2  0  4  2  2  2  0  0  0  2  0  2
  4     0  2  4  2  2  2  0  0  2  0  0  2  0  0  0  0
  5     0  2  0  0  2  0  0  4  0  2  4  0  2  0  0  0
  6     0  2  0  4  0  0  0  2  2  0  0  0  2  2  0  2
  7     0  0  0  2  0  4  2  0  0  0  0  2  0  4  2  0
  8     0  2  0  2  2  0  2  0  0  2  0  2  2  0  2  0
  9     0  0  4  2  0  2  0  0  2  2  0  2  2  0  0  0
 10     0  0  0  0  0  4  0  0  0  0  4  0  0  4  0  4
 11     0  0  0  0  2  0  0  2  2  2  0  4  0  2  0  2
 12     0  0  4  0  0  2  2  0  2  2  0  0  2  0  2  0
 13     0  0  0  2  0  0  2  4  0  0  4  2  0  0  2  0
 14     0  2  0  0  0  0  0  2  2  0  0  0  2  2  4  2
 15     0  0  0  2  0  0  2  0  0  0  4  2  0  0  2  4
References 1. Abdelkhalek A, Sasaki Y, Todo Y, Tolba M, Youssef AM (2017) MILP modeling for (large) S-boxes to optimize probability of differential characteristics. IACR Trans Symmetr Cryptol 2017(4):99–129 2. Banik S, Bao Z, Isobe T, Kubo H, Minematsu K, Liu F, Sakamoto K, Shibata N, Shigeri M (2020) WARP: revisiting GFN for lightweight 128-bit block cipher, SAC 2020, LNCS, vol 12804. Springer, Cham, pp 535–564 3. Biham E, Shamir A (1992) Differential cryptanalysis of the full 16-round DES. CRYPTO 92, LNCS, vol 740. Springer, pp 487–496 4. Bogdanov A, Knudsen LR, Leander G, Paar C, Poschmann A, Robshaw MJB, Seurin Y, Vikkelsoe C (200) PRESENT: an ultra-lightweight block cipher, CHES 2007, vol 4727, LNCS. Springer, pp 450–466 5. Boura C, Coggia D (2020) Efficient MILP modelings for S-boxes and linear layers of SPN ciphers. IACR Trans Symmetr Cryptol 3:327–361 6. Espresso Logic Minimizer, https://ptolemy.berkeley.edu/projects/embedded/pubs/downloads/ espresso 7. Gurobi Optimizer 7.5.2, http://www.gurobi.com 8. IBM ILOG CPLEX Optimization Studio V12.7.0 documentation (2016) Official webpage, https://www-01.ibm.com/software/websphere/products/optimization/cplex-studiocommunity-edition/ 9. Kumar M, Suresh TS, Pal SK, Panigrahi A (2020) Optimal differential trails in lightweight block ciphers ANU and PICO. Cryptologia 44(1):68–78 10. Kumar M, Yadav T (2022) MILP based differential attack on round reduced WARP, SPACE 2021, Lecture notes in computer science, vol 13162. Springer, Cham, pp 42–59 11. Logic Friday, http://sontrak.com/
12. Matsui M (1994) On correlation between the order of S-boxes and the strength of DES, EUROCRYPT, Italy, May 1994, pp 366–375 13. Motzkin TS, Raiffa H, Thompson GL, Thrall RM (1953) The double description method. In: Kuhn HW, Tucker AW (eds) Contributions to theory of games, vol 2. Princeton University Press, Princeton, RI 14. Mouha N, Wang Q, Gu D, Preneel B (2011) Differential and linear cryptanalysis using mixedinteger linear programming, Inscrypt 2011, vol 7537, LNCS. Springer, pp 57–76 15. pycddlib 2.1.6, https://pypi.org/project/pycddlib/ 16. SAGE, http://www.sagemath.org/index.html 17. Sun S, Hu L, Wang P, Qiao K, Ma X, Song L (2014) Automatic security evaluation and (relatedkey) differential characteristic search: application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers, ASIACRYPT 2014, Part I, vol 8873, LNCS. Springer, pp 158– 178 18. Sun S, Hu L, Wang M, Wang P, Qiao K, Ma X, Shi D, Song L, Fu K (2014) Towards finding the best characteristics of some bit-oriented block ciphers and automatic enumeration of (related-key) differential and linear characteristics with predefined properties. Cryptology ePrint Archive, Report 2014/747 19. Sasaki Y, Todo Y (2017) New impossible differential search tool from design and cryptanalysis aspects - revealing structural properties of several ciphers, EUROCRYPT 2017, vol 10212, LNCS. Springer, pp 185–215 20. Yadav T, Kumar M (2021) Modeling large S-box in MILP and a (related-key) differential attack on full round PIPO-64/128, IACR cryptology e-print Archive, Report No. 2021/1388
A Relation Between Properties of S-box and Linear Inequalities of DDT Manjeet Kaur , Tarun Yadav , Manoj Kumar , and Dhananjoy Dey
Abstract Mixed integer linear programming (MILP)-based tools are widely used to analyze the security of block ciphers against differential attacks. The differential properties of an S-box are represented by the difference distribution table (DDT). The methods based on convex hull and logic minimization are used to represent the DDT through linear inequalities. The impossible transitions in DDT are utilized for the minimization of the number of linear inequalities generated using the convex hull approach. The Boolean logic minimization tools Logic Friday and MILES minimize the truth table of DDT to construct and reduce the set of linear inequalities. In this paper, we construct and compare the number of linear inequalities for S-boxes of size 4 bits used in 42 lightweight block ciphers. We analyze the cryptographic properties of S-boxes and observe an inverse relationship between the boomerang uniformity of S-boxes and the number of linear inequalities constructed. We establish that the number of linear inequalities for an S-box decreases as the value of boomerang uniformity increases. Keywords Boomerang uniformity · Differential branch number · MILP · S-box
M. Kaur (B) · D. Dey Indian Institute of Information Technology, Lucknow, U.P. 226 002, India e-mail: [email protected] D. Dey e-mail: [email protected] T. Yadav · M. Kumar Scientific Analysis Group, DRDO, Delhi 110 054, India e-mail: [email protected] M. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_4
1 Introduction
The two fundamental cryptanalytic attacks to determine the security of block ciphers are differential and linear cryptanalysis [20]. A block cipher should resist these attacks. In a differential characteristic, a lower bound on the number of active S-boxes is computed to determine the security of block ciphers against differential attacks. This bound and the maximum value of the differential probability are used to estimate an upper bound on the probability of differential characteristics. MILP is used to get the optimal differential characteristics by transforming the search problem into a MILP model. MILP is an optimization method to maximize (or minimize) a linear objective function $f(x_1, x_2, \ldots, x_m) = \sum_{i=1}^{m} c_i x_i$, where $x_i \in \mathbb{R}$ (some of them are integers), under certain linear constraints $Ax \leq b$ with $A \in \mathbb{R}^{n \times m}$ and $b^T \in \mathbb{R}^n$. This model is solved to get an optimal solution $x = (x_1, x_2, \ldots, x_m)^T \in \mathbb{R}^m$ using optimization problem solvers (viz. GUROBI/CPLEX). The linear inequalities corresponding to the linear layer of a block cipher can be generated easily. The DDT is utilized to produce the linear inequalities of an S-box. This set consists of many inequalities that result in an inefficient MILP model. Consequently, the minimization of the number of linear inequalities is necessary to get the solution efficiently. There are several methods that minimize the number of inequalities. In 2011, Mouha et al. [29] developed a novel method using MILP to optimize the number of active S-boxes in differential characteristics of word-oriented ciphers. This method is not appropriate for SPN-based block ciphers with bitwise permutation-based diffusion layers. In 2013, Sun et al. [35] developed a MILP-based approach that optimizes the number of active S-boxes for bit-permutation-oriented block ciphers. In 2014, Sun et al. [36] proposed two systematic methods (logical condition modeling and computational geometry) to explain the possible and the impossible differential transitions of an S-box in terms of a set of linear inequalities. The SageMath tool was used to generate the linear inequalities, and a greedy algorithm was used to minimize these inequalities. They also proposed an automatic method to evaluate the security of block ciphers with bit-oriented diffusion layers using MILP modeling. In 2017, Sasaki and Todo [32] proposed a reduction algorithm to minimize a large set of linear inequalities (including redundant ones) of an S-box using MILP. These inequalities are obtained using SageMath or the logical condition model. This reduction algorithm creates a new MILP model based on the impossible transitions in the DDT for the minimization of the number of inequalities. In 2017, Abdelkhalek et al. [2] produced linear inequalities using Logic Friday, but it is unable to represent the DDT of a large S-box (≥9-bit) as inequalities. In 2021, Yadav and Kumar [39] presented a new tool, MILES, to reduce the linear inequalities in more than 16 variables. This tool overcomes the limitation of Logic Friday and reduces the number of linear inequalities by minimizing the truth table corresponding to the DDT of an S-box in more than 16 variables.
Motivation. An S-box is used to induce the confusion property in block ciphers. There are several methods to test the strength of an S-box against various crypto-
graphic properties. Nowadays, the problem of differential cryptanalysis is converted into a MILP model to find an efficient solution. In this paper, we analyze the representation of differential properties of an S-box through linear inequalities using several methods. We aim to find a relation between some of the cryptographic properties of an S-box and the number of linear inequalities. Organization. The remaining paper is organized as follows. First, we discuss the properties of non-linear S-boxes in Sect. 2. The existing methods for construction and minimization of linear inequalities are discussed in Sect. 3. The number of minimized linear inequalities are compared and used to establish a reverse relation with the boomerang uniformity in Sect. 4. Conclusion and future work are outlined in Sect. 5.
2 Non-linear S-boxes An S-box is a non-linear element of the block ciphers. An m × n S-box is a function S : {0, 1}m → {0, 1}n that takes m-bit input and transforms it into n-bit output. It can be implemented as a lookup table. In this paper, we discuss S-boxes such that n = m = 4. A 4-bit S-box is shown in Fig. 1. The S-box of the block cipher CRAFT is presented in Table 1.
2.1 Properties of S-box A strong S-box must satisfy various cryptographic properties viz. strict avalanche criterion, high non-linearity, high algebraic degree, etc. We discuss only those cryptographic properties of S-boxes that are used in our experiments.
Fig. 1 4-bit S-box (input bits t1, t2, t3, t4 are mapped by the S-box to output bits s1, s2, s3, s4)

Table 1 S-box of CRAFT (decimal values)
t    : 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
S(t) : 12 10 13 3  14 11 15 7  8  9  1  5  0  2  4  6
Table 2 DDT of CRAFT S-box (rows Δi = 0–15, columns Δo = 0–15)

Δi\Δo   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0    16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  1     0  2  4  0  2  2  2  0  2  0  0  0  0  0  2  0
  2     0  4  0  0  4  0  0  0  0  4  0  0  4  0  0  0
  3     0  0  0  0  2  0  4  2  2  2  0  0  0  2  0  2
  4     0  2  4  2  2  2  0  0  2  0  0  2  0  0  0  0
  5     0  2  0  0  2  0  0  4  0  2  4  0  2  0  0  0
  6     0  2  0  4  0  0  0  2  2  0  0  0  2  2  0  2
  7     0  0  0  2  0  4  2  0  0  0  0  2  0  4  2  0
  8     0  2  0  2  2  0  2  0  0  2  0  2  2  0  2  0
  9     0  0  4  2  0  2  0  0  2  2  0  2  2  0  0  0
 10     0  0  0  0  0  4  0  0  0  0  4  0  0  4  0  4
 11     0  0  0  0  2  0  0  2  2  2  0  4  0  2  0  2
 12     0  0  4  0  0  2  2  0  2  2  0  0  2  0  2  0
 13     0  0  0  2  0  0  2  4  0  0  4  2  0  0  2  0
 14     0  2  0  0  0  0  0  2  2  0  0  0  2  2  4  2
 15     0  0  0  2  0  0  2  0  0  0  4  2  0  0  2  4
Definition 1 A bit string t of length m is called a fixed point (FP) of an S-box if S(t) = t.

Definition 2 The differential branch number (BN) of an S-box is defined as
$$BN = \min_{t,\, r \neq t} \{ wt(t \oplus r) + wt(S(t) \oplus S(r)) \},$$
where wt(r) represents the Hamming weight of r.

Difference Distribution Table of S-box. In a differential attack, non-uniform relations between the input differences Δi and the corresponding output differences Δo are utilized. We present the number of input pairs with input difference Δi and output difference Δo in the DDT of an S-box. The DDT of the CRAFT S-box is presented in Table 2.

Boomerang Connectivity Table (BCT). The boomerang connectivity table of an invertible m × m S-box S is a 2^m × 2^m matrix whose entry at row Δi ∈ F_2^m and column Δo ∈ F_2^m equals
$$|\{ t \in \mathbb{F}_2^m \mid S^{-1}(S(t) \oplus \Delta_o) \oplus S^{-1}(S(t \oplus \Delta_i) \oplus \Delta_o) = \Delta_i \}| \qquad (1)$$
The Boomerang connectivity table of CRAFT S-box is described in Table 3. Definition 3 The boomerang uniformity (BU) of an S-box is equal to the highest value in the BCT table after ignoring the entries in the first row and column.
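The quantities introduced above can be computed directly from the lookup table. The following is a small Python sketch (not taken from the paper) that derives the DDT, the BCT of Eq. (1), and the boomerang uniformity, fixed points and differential branch number for the CRAFT S-box of Table 1; the printed values agree with Tables 2 and 3 and with the BU/FP/BN entries reported for CRAFT later in Table 5.

```python
from itertools import product

S = [12, 10, 13, 3, 14, 11, 15, 7, 8, 9, 1, 5, 0, 2, 4, 6]   # CRAFT S-box (Table 1)
n = 4
SIZE = 1 << n
S_inv = [S.index(y) for y in range(SIZE)]

def ddt(sbox):
    """DDT[di][do] = #{t : S(t XOR di) XOR S(t) = do}."""
    table = [[0] * SIZE for _ in range(SIZE)]
    for di, t in product(range(SIZE), repeat=2):
        table[di][sbox[t ^ di] ^ sbox[t]] += 1
    return table

def bct(sbox, sbox_inv):
    """BCT entry at (di, do), following Eq. (1)."""
    table = [[0] * SIZE for _ in range(SIZE)]
    for di, do in product(range(SIZE), repeat=2):
        table[di][do] = sum(
            1 for t in range(SIZE)
            if sbox_inv[sbox[t] ^ do] ^ sbox_inv[sbox[t ^ di] ^ do] == di
        )
    return table

D, B = ddt(S), bct(S, S_inv)
bu = max(B[a][b] for a in range(1, SIZE) for b in range(1, SIZE))   # boomerang uniformity (Definition 3)
fp = sum(1 for t in range(SIZE) if S[t] == t)                       # fixed points (Definition 1)
bn = min(bin(t ^ r).count("1") + bin(S[t] ^ S[r]).count("1")        # branch number (Definition 2)
         for t in range(SIZE) for r in range(SIZE) if r != t)
print(bu, fp, bn)   # expected for CRAFT: 16, 4, 2
```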
Table 3 BCT of CRAFT S-box (rows Δi = 0–15, columns Δo = 0–15)

Δi\Δo   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0    16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
  1    16  2  4  0  6  2  6  0  2  0  0  0  0  0  2  0
  2    16  4 16  4  4  0  4  0  0  4  0  4  4  0  4  0
  3    16  0  4  0  6  0  4  2  2  2  0  0  0  2  0  2
  4    16  6  4  6  2  2  0  0  2  0  0  2  0  0  0  0
  5    16  2  0  0  2  4  0  4  0  2  8  0  2  4  0  4
  6    16  6  4  4  0  0  0  2  2  0  0  0  2  2  0  2
  7    16  0  0  2  0  4  2  4  0  0  8  2  0  4  2  4
  8    16  2  0  2  2  0  2  0  0  2  0  2  2  0  2  0
  9    16  0  4  2  0  2  0  0  2  6  0  6  2  0  0  0
 10    16  0  0  0  0  8  0  8  0  0 16  0  0  8  0  8
 11    16  0  4  0  2  0  0  2  2  6  0  4  0  2  0  2
 12    16  0  4  0  0  2  2  0  2  2  0  0  6  0  6  0
 13    16  0  0  2  0  4  2  4  0  0  8  2  0  4  2  4
 14    16  2  4  0  0  0  0  2  2  0  0  0  6  2  4  2
 15    16  0  0  2  0  4  2  4  0  0  8  2  0  4  2  4
3 Methods for Construction and Minimization of Linear Inequalities of DDT In this section, we discuss the H-representation for the convex hull, and Boolean logic minimization tools (viz. Logic Friday and MILES). We also present the limitations of the convex hull approach and Logic Friday.
3.1 H-representation of Convex Hull The convex hull H of a discrete set U ⊆ Rk is the smallest convex set such that U ⊆ H. A convex hull of a discrete set in Rk that can be written in terms of the common solutions of a system of finitely many linear inequalities is known as the H-representation of a convex hull. Let (x1 , x2 , . . . , xm ) ∈ Rm be the input differential and (y1 , y2 , . . . , ym ) ∈ Rm be the corresponding output differential of an m-bit S-box. An input-output differential transition is represented as a vector (x1 , x2 , . . . , xm , y1 , y2 , . . . , ym ) ∈ R2m . Then by calculating the H-representation for the convex hull of the set of all possible input-output differential transitions of an S-box, we get finitely many linear inequalities. There are redundant inequalities in this representation, and the differential characteristics search problem becomes inefficient with these many inequalities. For the minimization of the number of linear inequalities, Sun et al. [36] have proposed an impossible transitions-based reduction
approach. Each linear inequality removes some points corresponding to impossible differential transitions, and the objective is to remove all impossible transitions with the minimum number of inequalities. A MILP problem is designed with the minimization of the number of linear inequalities as an objective function subject to these constraints, and an optimization solver such as GUROBI or CPLEX is used to solve a MILP problem. The solution provides a reduced set of inequalities that removes all impossible transitions in DDT.
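A compact way to see this reduction step is as a minimum set-cover problem over the impossible transitions. The sketch below is not the authors' code: it assumes the candidate inequalities have already been produced from the H-representation of the convex hull (for example with SageMath's Polyhedron or pycddlib), and it uses the open-source PuLP package only as a stand-in for GUROBI/CPLEX; the function and argument names are illustrative.

```python
import pulp

def reduce_inequalities(ineqs, impossible):
    """Pick a minimal subset of inequalities that removes every impossible transition.

    ineqs      : list of (coeffs, const); a 0/1 point p is removed by an
                 inequality when sum(c_i * p_i) + const < 0
    impossible : list of impossible (input-difference, output-difference) bit vectors
    """
    prob = pulp.LpProblem("ineq_reduction", pulp.LpMinimize)
    z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(len(ineqs))]
    prob += pulp.lpSum(z)                       # objective: number of kept inequalities
    for p in impossible:
        covering = [z[i] for i, (c, b) in enumerate(ineqs)
                    if sum(ci * pi for ci, pi in zip(c, p)) + b < 0]
        prob += pulp.lpSum(covering) >= 1       # each impossible point must stay excluded
    prob.solve()
    return [ineqs[i] for i in range(len(ineqs)) if z[i].value() == 1]
```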
3.2 Boolean Logic Minimization Tools—Logic Friday, MILES
Logic Friday and MILES are both based on the Boolean logic minimization algorithm Espresso. Logic Friday takes as input a truth table obtained from the DDT of an S-box. Two steps are applied sequentially to minimize the linear inequalities, i.e., (1) Minimize and (2) Product of Sums, after which the minimized linear inequalities are obtained as output. On the other hand, the MILES tool takes an S-box and returns the minimized set of linear inequalities. Four processes, (1) DDT generation, (2) conversion of the DDT to a truth table, (3) minimization of the truth table, and (4) linear inequality generation, are applied consecutively in MILES to generate the minimized linear inequalities.
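This pipeline starts from the DDT written out as a truth table over the 2n bits of (Δi, Δo). A hedged sketch of that conversion step is shown below: it writes a PLA file that Espresso-style minimizers accept; the actual minimization and the translation of each resulting term into a linear inequality, as done internally by MILES, are omitted here.

```python
def ddt_to_pla(sbox, n=4, path="ddt.pla"):
    """Write the DDT of an n-bit S-box as a PLA truth table (output 1 = possible transition)."""
    size = 1 << n
    # Collect all possible (input difference, output difference) pairs from the S-box.
    possible = {(dx, sbox[x ^ dx] ^ sbox[x]) for dx in range(size) for x in range(size)}
    with open(path, "w") as f:
        f.write(f".i {2 * n}\n.o 1\n")
        for dx in range(size):
            for dy in range(size):
                bits = format(dx, f"0{n}b") + format(dy, f"0{n}b")
                f.write(f"{bits} {1 if (dx, dy) in possible else 0}\n")
        f.write(".e\n")
```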
3.3 Limitation of Convex Hull Approach and Logic Friday Tool Computing a convex hull corresponding to possible transitions in DDT is an easier task for the 4-bit S-boxes, but the complexity increases exponentially with larger (≥6bit) S-boxes. Convex hull computation for larger dimensions (≥12) has practical time limitations and may take days to produce the results. Therefore, for S-boxes more than 6 bits in size, the convex hull-based method does not work. Logic Friday tool generates the linear inequalities using the Espresso algorithm, but it has a limitation on input size (≤16) which limits an S-box size to 8 bits. Therefore, for S-boxes more than 8 bits in size, the application of Logic Friday is computationally infeasible. The comparison of tools to generate the linear inequalities is presented in Table 4.
4 Results The three approaches and tools discussed in the previous section are experimented with various block ciphers to get minimized linear inequalities of DDT. The convex hull-based approach gives a large number of initial inequalities which are minimized
Table 4 Comparison of tools for linear inequality generation

Parameters                                            SageMath              Logic Friday   MILES
Based on                                              Convex hull function  Espresso       Espresso
Advantages (work for S-boxes of size in bits)         ≤6                    ≤8             ≥8*
Limitations (do not work for S-boxes of size in bits) >6                    >8             –

* It also works for < 8
using an impossible transition-based reduction approach. The approach converts the minimization problem into a MILP problem, and the problem is solved using GUROBI solver. The source code for this approach is available on GitHub.1 The executable files for Logic Friday2 and MILES3 are provided by the developers.
4.1 Experiments Linear inequalities construction and minimization approaches are applied on various S-boxes used in 42 ciphers, and minimized inequalities are presented in Table 5. It is evident from Table 5 that for 4-bit S-boxes the convex hull-based approach gives better minimization than Logic Friday and MILES which are based on Boolean logic minimization. Although Logic Friday and MILES are based on the same Espresso minimization algorithm, the results are not the same in most of the cases. The reason for this difference is the availability of various minimization modes in MILES, but there is no specific mode that gives the optimal results in all cases. We have experimented with all available modes to get the results. It is also observable that all S-boxes of size 4 bits don’t have the same number of minimized inequalities using any of the experimented approaches. It depends on the properties of an S-box that are computed using SageMath4 and presented in Table 5. In the next subsection, we will observe the relationship between the properties of S-boxes and the number of linear inequalities.
1 https://github.com/Manjeet95/MinZ_L_InEq_and_S_Boxes_Properties.
2 https://download.cnet.com/Logic-Friday/3000-20415_4-75848245.html.
3 https://github.com/tarunyadav/MILES/releases/download/V1.1/MILES.1.0.0.exe.
4 https://www.sagemath.org/.
Table 5 Comparison: number of minimized linear inequalities and properties of 4-bit S-boxes using different tools and approaches

No. Cipher                      Convex hull  Logic Friday  MILES  BU  FP  BN
1   Lucifer S0 [31]             25           54            52     10  0   2
2   Lucifer S1 [31]             21           48            49     8   0   2
3   Present [31]                21           39            39     16  0   3
4   Present−1 [31]              21           38            38     16  0   3
5   JH S0 [31]                  24           49            49     10  0   2
6   JH S1 [31]                  24           49            47     10  1   2
7   ICEBERG0 [31]               24           49            50     10  0   2
8   ICEBERG1 [31]               25           44            43     10  0   2
9   LUFFA [31]                  24           51            50     16  0   2
10  NOEKEON [16]                20           32            33     16  4   2
11  WARP [5]                    21           47            48     16  4   2
12  HAMSI [31]                  21           46            43     16  0   3
13  Serpent S0 [31]             21           39            38     16  0   3
14  Serpent S1 [31]             21           39            41     16  2   3
15  Serpent S2 [31]             21           46            43     16  0   3
16  Serpent S3 [31]             27           48            50     10  2   3
17  Serpent S4 [31]             23           43            44     10  1   3
18  Serpent S5 [31]             23           43            43     10  2   3
19  Serpent S6 [31]             21           38            39     16  2   3
20  Serpent S7 [31]             27           49            50     10  0   3
21  μ2 [40]                     21           39            39     16  0   3
22  LED [19]                    21           39            39     16  0   3
23  RECTANGLE [42]              21           31            32     16  0   2
24  SKINNY [11]                 21           31            32     16  1   2
25  SKINNY−1 [11]               21           31            31     16  1   2
26  ANU [7]                     23           39            39     16  0   3
27  Hummingbird-2 S1 [18]       27           50            48     10  0   3
28  Hummingbird-2 S2 [18]       24           46            48     10  0   3
29  Hummingbird-2 S3 [18]       23           44            43     10  0   3
30  Hummingbird-2 S4 [18]       23           44            44     10  0   3
31  mCrypton S0 [26]            23           52            49     6   0   2
32  mCrypton S1 [26]            24           49            50     6   0   2
33  mCrypton S2 [26]            23           50            49     6   0   2
34  mCrypton S3 [26]            24           48            51     6   0   2
35  Khudra [22]                 21           39            39     16  0   3
36  LBlock S0 [38]              24           30            31     16  0   2
37  LBlock S1 [38]              24           30            31     16  0   2
38  LBlock S2 [38]              24           30            32     16  0   2
39  LBlock S3 [38]              24           31            31     16  0   2
40  LBlock S4 [38]              24           30            32     16  0   2
41  LBlock S5 [38]              24           30            31     16  0   2
42  LBlock S6 [38]              24           30            31     16  0   2
43  LBlock S7 [38]              24           30            31     16  0   2
44  LBlock S8 [38]              24           30            31     16  0   2
45  LBlock S9 [38]              24           30            31     16  0   2
46  LiCi [30]                   22           30            31     16  0   2
47  LILLIPUT [13]               23           47            48     6   0   2
48  Loong [27]                  23           45            45     10  2   2
49  QTL S1 [24]                 21           39            39     16  0   3
50  QTL S2 [24]                 23           52            49     6   0   2
51  PICO [8]                    22           30            30     16  0   2
52  PUFFIN [15]                 24           49            50     10  0   2
53  NVLC [1]                    22           57            58     10  4   2
54  MIBS [21]                   23           52            49     6   0   2
55  I-PRESENT™ S [41]           21           39            38     16  2   3
56  I-PRESENT™ S' [41]          20           31            32     16  4   2
57  I-PRESENT™ S−1 [41]         21           40            41     16  2   3
58  HISEC [4]                   21           38            41     16  2   3
59  HERMES S1 [28]              23           49            49     6   0   2
60  HERMES S2 [28]              22           51            50     6   0   2
61  HERMES S3 [28]              22           52            53     6   0   2
62  HERMES S4 [28]              24           50            51     6   0   2
63  HERMES S5 [28]              23           51            50     6   0   2
64  HERMES S6 [28]              24           50            50     6   0   2
65  HERMES S7 [28]              23           49            49     6   0   2
66  HERMES S8 [28]              22           52            50     6   0   2
67  HERMES S1−1 [28]            23           52            51     6   0   2
68  HERMES S2−1 [28]            22           51            51     6   0   2
69  HERMES S3−1 [28]            22           53            53     6   0   2
70  HERMES S4−1 [28]            24           52            50     6   0   2
71  HERMES S5−1 [28]            23           51            48     6   0   2
72  HERMES S6−1 [28]            24           52            51     6   0   2
73  HERMES S7−1 [28]            23           51            49     6   0   2
74  HERMES S8−1 [28]            22           51            50     6   0   2
75  Piccolo [34]                21           31            32     16  0   2
76  GRANULE [6]                 22           31            31     16  0   2
77  PRINCE [14]                 22           52            51     10  0   2
78  FeW [23]                    24           48            48     10  0   3
79  CRAFT [12]                  21           47            48     16  4   2
80  MIDORI [12]                 21           47            48     16  4   2
81  BORON [9]                   23           40            39     16  0   3
82  ANU-II [17]                 23           40            39     16  0   3
83  RoadRunneR [10]             24           32            32     16  1   2
84  SAT_Jo [33]                 25           48            50     6   2   2
85  SFN S1 [25]                 21           47            48     16  4   2
86  SFN S2 [25]                 22           52            51     10  0   2
87  SLIM [3]                    21           39            39     16  0   3
88  TWINE [37]                  23           47            48     6   1   2
4.2 Relation Between Boomerang Uniformity and Number of Linear Inequalities We analyze the experiments performed on S-boxes of size 4 bits employed in 42 lightweight block ciphers to search the relations between linear inequalities and properties of S-boxes. Boomerang uniformity (BU), number of fixed points (FP), and differential branch number (BN) are three properties of an S-box which take different values for most of the experimented ciphers. For all ciphers, these values are mentioned in Table 5. BU takes four values (6, 8, 10, 16), FP takes four values (0, 1, 2, 4), and BN takes two values (2, 3). To get a deeper understanding of these experimental results, we observe the relation between Boomerang uniformity and the number of linear inequalities produced by MILES for fixed values of BN and FP. There are eight possible configurations ((BU,FP,BN) = {(*,0,2), (*,1,2), (*,2,2), (*,4,2), (*,0,3), (*,1,3), (*,2,3), (*,4,3)}) to analyze, and out of these option (∗, 1, 3) does not exist in our experiments. The graphs corresponding to these configurations are depicted in Fig. 2. The blue circles represent a number of minimized linear inequalities corresponding to the DDT of an S-box, while the red diamonds represent the boomerang uniformity of an S-box. Figure 2a describes the configuration (*,0,2) of experiments. This configuration is valid for 47 (out of 88) 4-bit S-boxes and for each S-box, boomerang uniformity and linear inequalities are represented in Fig. 2. For BU values 6 and 8, the number of inequalities is in the higher range, while for BU values 10 and 16 this number shows decreasing trends. This proposes a relation that for fixed BN and FP, the number of linear inequalities is decreased if the boomerang uniformity is increased. A similar relation is observed in all the graphs in Fig. 2 which establishes the inverse relationship for a variety of S-boxes. The relations can be explained using BCT of an S-box. The boomerang uniformity is the highest non-trivial entry of the BCT, and higher values of this entry present the existence of more boomerang relations in DDT. More relations in a DDT make its representation simpler in linear space and, therefore, result in a lesser number of linear inequalities.
Fig. 2 Relation between linear inequalities and boomerang uniformity: (a) FP = 0, BN = 2; (b) FP = 1, BN = 2; (c) FP = 2, BN = 2; (d) FP = 4, BN = 2; (e) FP = 0, BN = 3; (f) FP = 2, BN = 3
5 Conclusion The differential characteristic search problem is converted into a MILP model and solved using MILP solvers. The size of the MILP model depends on the minimum number of linear inequalities required to represent the DDT of an S-box. In this paper, we have applied and compared three different methods to construct the linear inequalities for 4-bit S-boxes that are employed in various lightweight block ciphers. We have performed various experiments and observed a relation between the number of linear inequalities and the cryptographic properties of S-boxes. We have established that the number of linear inequalities is inversely related to the value of the
boomerang uniformity of an S-box. This relation will help to construct cryptographically strong S-boxes. We can experiment with the other cryptographic properties of 4-bit and 8-bit S-boxes in the future.
References 1. Abd Al-Rahman S, Sagheer A, Dawood O (2018) Nvlc: new variant lightweight cryptography algorithm for internet of things. In: AiCIS. IEEE, pp 176–181 2. Abdelkhalek A, Sasaki Y, Todo Y, Tolba M, Youssef AM (2017) Milp modeling for (large) s-boxes to optimize probability of differential characteristics. In: ToSC, pp 97–112 3. Aboushosha B, Ramadan RA, Dwivedi AD, El-Sayed A, Dessouky MM (2020) Slim: a lightweight block cipher for internet of health things. IEEE Access 8:203747–203757 4. AlDabbagh SSM, Al Shaikhli IFT, Alahmad MA (2014) Hisec: a new lightweight block cipher algorithm. In: ICPS, pp 151–156 5. Banik S, Bao Z, Isobe T, Kubo H, Liu F, Minematsu K, Sakamoto K, Shibata N, Shigeri M (2020) Warp: revisiting gfn for lightweight 128-bit block cipher. In: SAC. Springer, pp 535–564 6. Bansod G, Patil A, Pisharoty N (2018) Granule: an ultra lightweight cipher design for embedded security. Cryptology ePrint Archive 7. Bansod G, Patil A, Sutar S, Pisharoty N (2016) Anu: an ultra lightweight cipher design for security in IoT. Secur Commun Netw 9(18):5238–5251 8. Bansod G, Pisharoty N, Patil A (2016) Pico: an ultra lightweight and low power encryption design for ubiquitous computing. Defence Sci J 66:3 9. Bansod G, Pisharoty N, Patil A (2017) Boron: an ultra-lightweight and low power encryption design for pervasive computing. Front Inf Technol Electron Eng 18(3):317–331 10. Baysal A, Sahin ¸ S (2015) Roadrunner: a small and fast bitslice block cipher for low cost 8-bit processors. In Lightweight Cryptography for Security and Privacy. Springer, pp 58–76 11. Beierle C, Jean J, Kölbl S, Leander G, Moradi A, Peyrin T, Sasaki Y, Sasdrich P, Sim SM (2016) The skinny family of block ciphers and its low-latency variant mantis. In: Annual international cryptology conference. Springer, pp 123–153 12. Beierle C, Leander G, Moradi A, Rasoolzadeh S (2019) Craft: lightweight tweakable block cipher with efficient protection against dfa attacks. ToSC 2019(1):5–45 13. Berger TP, Francq J, Minier M, Thomas G (2015) Extended generalized feistel networks using matrix representation to propose a new lightweight block cipher: Lilliput. IEEE Trans Comput 65(7):2074–2089 14. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C et al (2012) Prince–a low-latency block cipher for pervasive computing applications. In: ASIACRYPT. Springer, pp 208–225 15. Cheng H, Heys HM, Wang C (2008) Puffin: a novel compact block cipher targeted to embedded digital systems. In: 2008 11th EUROMICRO conference on digital system design architectures, methods and tools. IEEE, pp 383–390 16. Daemen J, Peeters M, Van Assche G, Rijmen V (2000) Nessie proposal: Noekeon. In: First open NESSIE workshop, pp 213–230 17. Dahiphale V, Bansod G, Patil J (2017) Anu-ii: a fast and efficient lightweight encryption design for security in IoT. In: 2017 international conference on big data, IoT and data science. IEEE, pp 130–137 18. Engels D, Saarinen M-JO, Schweitzer P, Smith EM (2011) The hummingbird-2 lightweight authenticated encryption algorithm. In: International workshop on radio frequency identification: security and privacy issues. Springer, pp 19–31 19. Guo J, Peyrin T, Poschmann A, Robshaw M (2011) The led block cipher. In: International workshop on cryptographic hardware and embedded systems. Springer, pp 326–341
20. Heys HM (2002) A tutorial on linear and differential cryptanalysis. Cryptologia 26(3):189–221 21. Izadi M, Sadeghiyan B, Sadeghian SS, Khanooki HA (2009) Mibs: a new lightweight block cipher. In: International conference on cryptology and network security. Springer, pp 334–348 22. Kolay S, Mukhopadhyay D (2014) Khudra: a new lightweight block cipher for fpgas. In: SPACE. Springer, pp 126–145 23. Kumar M, Sk P, Panigrahi A (2014) Few: a lightweight block cipher. TJMCS 11(2):58–73 24. Li L, Liu B, Wang H (2016) Qtl: a new ultra-lightweight block cipher. Microprocess Microsyst 45:45–55 25. Li L, Liu B, Zhou Y, Zou Y (2018) Sfn: a new lightweight block cipher. Microprocess Microsyst 60:138–150 26. Lim CH, Korkishko T (2005) mcrypton–a lightweight block cipher for security of low-cost rfid tags and sensors. In: International workshop on information security applications. Springer, pp 243–258 27. Liu B-T, Li L, Wu R-X, Xie M-M, Li QP (2019) Loong: a family of involutional lightweight block cipher based on spn structure. IEEE Access 7:136023–136035 28. M˘alu¸tan SB, Dragomir IR, Laz˘ar M, Vitan D (2019) Hermes, a proposed lightweight block cipher used for limited resource devices. In: 2019 international conference on speech technology and human-computer dialogue. IEEE, pp 1–6 29. Mouha N, Wang Q, Gu D, Preneel B (2011) Differential and linear cryptanalysis using mixedinteger linear programming. In: International conference on information security and cryptology. Springer, pp 57–76 30. Patil J, Bansod G, Kant KS (2017) Lici: a new ultra-lightweight block cipher. In: 2017 international conference on emerging trends & innovation in ICT (ICEI). IEEE, pp 40–45 31. Saarinen M-JO (2011) Cryptographic analysis of all 4× 4-bit s-boxes. In: International workshop on SAC. Springer, pp 118–133 32. Sasaki Y, Todo Y (2017) New algorithm for modeling s-box in milp based differential and division trail search. In: International conference for information technology and communications. Springer, pp 150–165 33. Shantha MJR, Arockiam L (2018) Sat_jo: an enhanced lightweight block cipher for the internet of things. In: ICICCS. IEEE, pp 1146–1150 34. Shibutani K, Isobe T, Hiwatari H, Mitsuda A, Akishita T, Shirai T (2011) Piccolo: an ultralightweight blockcipher. In: International workshop on cryptographic hardware and embedded systems. Springer, pp 342–357 35. Sun S, Hu L, Song L, Xie Y, Wang P (2013) Automatic security evaluation of block ciphers with s-bp structures against related-key differential attacks. In: International conference on information security and cryptology. Springer, pp 39–51 36. Sun S, Hu L, Wang P, Qiao K, Ma X, Song L (2014) Automatic security evaluation and (related-key) differential characteristic search: application to simon, present, lblock, des (l) and other bit-oriented block ciphers. In: International conference on the theory and application of cryptology and information security. Springer, pp 158–178 37. Suzaki T, Minematsu K, Morioka S, Kobayashi E (2012) twine: a lightweight block cipher for multiple platforms. In: SAC, pp 339–354 38. Wu W, Zhang L (2011) Lblock: a lightweight block cipher. In: ACNS. Springer, pp 327–344 39. Yadav T, Kumar M (2021) Miles: Modeling large s-box in milp based differential characteristic search. IACR Cryptol. ePrint Arch. 2021:1388 40. Yeoh W-Z, Teh JS, Sazali MISBM (2020) μ 2: a lightweight block cipher. In: Computational science and technology. Springer, pp 281–290 41. 
Z’aba MR, Jamil N, Rusli ME, Jamaludin MZ, Yasir AAM (2014) I-present tm: an involutive lightweight block cipher. J Inf Secur 2014 42. Zhang W, Bao Z, Lin D, Rijmen V, Yang B, Verbauwhede I (2015) Rectangle: a bit-slice lightweight block cipher suitable for multiple platforms. Sci China Inf Sci 58(12):1–15
Damage Level Estimation of Rubble-Mound Breakwaters Using Deep Artificial Neural Network Susmita Saha
and Soumen De
Abstract Breakwaters perform an important role in the protection of the ports, harbours and the entire coastal region. The optimum design of these breakwater structures and a proper stability analysis is very essential. An accurate estimation of the damage level is an important issue in the context of breakwater’s stability analysis. The advanced approach of deep learning is introduced in the present study. Using a large experimental dataset over the whole ranges of the stability variables, three deep artificial neural networks (ANN) had been developed and compared for the prediction of the damage level of rubble-mound breakwaters. Overcoming the existing limitations of this field, the proposed study shows high accuracy in the estimation of the damage level. These deep ANN models have the potential to deal with the complex non-linearities related to this field and can reduce the time and cost of estimation of the damage ratio. Keywords Artificial neural network · Deep learning · Rubble-mound breakwater · Damage level
1 Introduction Breakwaters are one of the most essential protective structures in coastal regions. It controls the destructive wave energy and protects the coast from continuous wave attacks. Moreover, it maintains the safety of the port and provides loading and unloading facilities for the cargo and the passengers. Rubble-Mound breakwaters are one of the most used and popular kinds of breakwater structures all over the world. It falls in the category of statically stable breakwater structures and it consists of three layers; core, filter and armor layer. The economically optimum design of these
S. Saha (B) · S. De Department of Applied Mathematics, University of Calcutta, 92, A.P.C. Road, Kolkata 700 009, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_5
rubble-mound breakwaters depends on the amount of displaced armor units, and this is called the damage [9]. This damage level (S) can be measured by
$$S = \frac{A}{D_{n50}^{2}} \qquad (1)$$
where A is the cross-sectional eroded area and Dn50 is the nominal diameter of the stones. The accurate estimation of this damage level is a major issue in the field of coastal engineering. For the past few years, a lot of experimental and physical studies have been carried out in this field of breakwater’s damage analysis [10, 11]. Hudson et al. [12] investigated the parameters that affect the armor unit. They presented an empirical formula considering the wave height but the wave period and wave steepness were ignored. Hanzawa et al. [14] and Melby et al. [15] proposed empirical formula in the estimation of breakwater’s damage ratio. In this prospect, a physical model study [13] has been conducted on various types of breakwaters by Rao and Shirlal. Van deer Meer [10] conducted a large number of tests on the stability analysis of rubble-mound breakwaters and presented empirical formulas. Experimental studies are very much time-consuming and costly. There is a large disagreement between the original values of the damage level and the estimated values of the existing models. The real-life problems and the system of coastal regions are very complex in nature. The theoretical models and empirical formulas cannot deal with these complexities related to this field. The advanced techniques of artificial intelligence (AI) seem to bring a revolution in the field of damage analysis and many other areas of ocean and coastal engineering. In the recent past, various types of data-driven AI models such as artificial neural network (ANN), support vector machine (SVM), adaptive neuro-fuzzy intelligence system (ANFIS), genetic programming (GP), fuzzy logic (FL) and many more have been implemented in the estimation of damage ratio [2–4]. Mase et al. [9] used the concept of artificial neural networks (ANN) for the prediction of the stability number and damage level. These ANN models showed better results than the existing theoretical models. Yagci et al. [1] introduced three different artificial neural network models and a fuzzy model to predict the damage ratio, considering mean wave period, wave steepness, significant wave height and the slope angle of the breakwater. Janardhan et al. [5] applied principal component regression analysis to select the input variables and measured the damage level of the reshaped berm breakwater. Kuntoji et al. [6] used a warm intelligence-based neural network to predict the damage ratio of tandem breakwaters. Mandal et al. [7] used the experimental data for non-reshaped berm breakwater and introduced artificial neural network, support vector machine and adaptive neuro-fuzzy inference systembased models for the damage analysis of breakwaters. Kim et al. [8] presented a new approach by incorporating a wave height prediction ANN model into a Monte Carlo Simulation to predict the damage level. Saha et al. [18] developed machine learning models to predict the stability number of rubble-mound breakwaters.
Although the existing data-driven models show better performance than the traditional models, there are some limitations, and the estimation of the damage level should be improved. Moreover, the available studies did not consider the whole ranges of the stability parameters. In most of the previous literature, the number of data points used in model development is very small and only a few of the stability variables were considered. Each of the stability parameters has an individual effect on the damage level and the stability of breakwaters [10]. Ignoring them randomly can affect the estimation. The reviewed literature [2–4] shows the advantages and limitations of artificial intelligence (AI) techniques in the field of stability analysis of breakwaters. It concluded that further research based on different concepts of AI is required to build more significant and robust models. Motivated by the advantages of the data-driven AI models and observing their limitations, three deep ANN models were developed in the present study to estimate the damage level of rubble-mound breakwaters. Almost each of the stability variables had been considered, and a large dataset over the whole ranges of these stability parameters was used to develop these ANN models. At first, a brief idea about deep learning is introduced, and then the development of the deep ANN based prediction models is described. The obtained result is analysed with statistical criteria and compared with the existing study. A step-by-step summary of the present study is shown in Fig. 1.

Fig. 1 A schematic idea of the present study
2 Deep Artificial Neural Network
Recently, deep learning has achieved immense popularity with its latest advancement. The fundamental concept of deep learning is inspired by the functioning of the human brain, and these networks are developed based on the machine learning algorithms of the artificial neural network. Deep artificial neural networks are a type of feedforward neural network, constructed with an input layer, an output layer and an arbitrary number of fully connected deep hidden layers. Pre-activation and activation phases are implemented in each of the neurons of these hidden layers and the output layer [17]. These neurons pass the result of the weighted input and bias to the next layer. This network structure can be summarized in the following matrix representation:
$$\mathbf{f}_p = \psi_p(\chi_p \mathbf{f}_{p-1}) + \mathbf{b}_p, \qquad p = 1, 2, \ldots, k, \qquad (2)$$
where χ p represents the weight matrix and ψ is the activation function. The column vectors fp and b p are the output and bias, respectively, and k is the number of layers of the corresponding network. A schematic diagram of a deep artificial neural network consisting of three hidden layers is presented in Fig. 2. The deep ANN model can learn the complex non-linearities by itself during the training period with its deep hidden layer structure [16]. These deep ANNs are more advanced than the existing prediction models and they can save the time and construction cost of the experimental studies with their fast converging nature. The increasing volume of the experimental dataset related to the field of breakwaters, indicates that deep ANN-based models will be an appropriate choice for further investigation.
3 Dataset A good number of appropriate data plays an important role in the performance of these data-driven models. Considering the effect of almost each of the stability variables, the experimental dataset of Van deer Meer [10] had been used in this present study. A total of 632 data points over the whole ranges of the stability parameters were considered to develop the proposed deep ANN model.
Fig. 2 Deep artificial neural network
The permeability of the structure (P), number of waves (N), the water depth at the structure (h), nominal diameter of the stone (Dn50), relative submerged mass density (Δ), slope angle (cot α), surf similarity parameter (ζm), spectral shape (SS), average wave period (Tm), significant wave height in front of the structure (Hs), grading of armor stones (D85/D15) and the stability number (Ns) were taken as the input variables, and the logarithm of the damage level (S) was taken as the output variable in this present study. The entire dataset was normalized and split into two parts of 85% and 15% for training and testing purposes, and 15% of this training data was used for validation.
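The preprocessing described above can be reproduced along the following lines. This is an illustrative sketch only, since the paper does not state which tools were used; the min–max normalization and the file names are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# X: 632 rows of the twelve stability variables listed above,
# y: logarithm of the damage level S (file names are hypothetical).
X = np.load("stability_inputs.npy")
y = np.load("log_damage_level.npy").reshape(-1, 1)

# Normalize inputs and output, then split 85% / 15% for training and testing;
# 15% of the training part is later held out for validation during fitting.
X = MinMaxScaler().fit_transform(X)
y = MinMaxScaler().fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
```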
4 Development of the Deep ANN-Based Damage Level Estimation Model The network structure plays an important role in the performance of the deep ANN models and this can be improved by changing the number of hidden layers and the number of neurons in it. In the present study, three deep ANN models had been constructed with different network structures, named deep ANN I, deep ANN II and deep ANN III, respectively. The performances of these models in the prediction
Table 1 Structures of the deep ANN models

Model         Layer              No. of neurons  Activation function
Deep ANN I    1st hidden layer   50              ReLU
              2nd hidden layer   25              ReLU
              Output layer       1               Sigmoid
Deep ANN II   1st hidden layer   100             ReLU
              2nd hidden layer   50              ReLU
              3rd hidden layer   25              ReLU
              Output layer       1               Sigmoid
Deep ANN III  1st hidden layer   100             ReLU
              2nd hidden layer   50              ReLU
              3rd hidden layer   25              ReLU
              4th hidden layer   10              ReLU
              Output layer       1               Sigmoid
of the damage level of rubble-mound breakwaters were analysed statistically and compared to find out the most robust one. The proposed deep ANN models were developed with 2, 3 and 4 hidden layers, respectively, containing different numbers of neurons in each layer. The activation function ReLU was used for the hidden layers and the sigmoid function was used for the output layer in all of these three above-mentioned models. The structural description of these deep ANN models is provided in detail in Table 1. The learning algorithm also affects the performance of these deep ANN models. After analysing various optimizing functions, Adam (adaptive moment estimation) was used for the proposed models.
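As an illustration of the architecture in Table 1, the deep ANN III model can be written, for example, in Keras as below (continuing the data-preparation sketch above). The framework, the mean-squared-error loss and the input dimension of twelve normalized stability variables are assumptions on our part; the paper specifies only the layer sizes, the ReLU/sigmoid activations, the Adam optimizer and the 481 training epochs.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Deep ANN III (Table 1): four ReLU hidden layers of 100-50-25-10 neurons, sigmoid output.
model = keras.Sequential([
    layers.Input(shape=(12,)),                 # twelve normalized stability variables
    layers.Dense(100, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(25, activation="relu"),
    layers.Dense(10, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # normalized logarithm of the damage level
])
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])

history = model.fit(X_train, y_train,
                    validation_split=0.15,     # 15% of the training data for validation
                    epochs=481, verbose=0)
```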
5 Results and Discussion 5.1 Performance Analysis and Model Accuracy The performances of the proposed deep ANN models in the estimation of damage level of rubble-mound breakwaters were analysed with statistical criteria such as coefficient of determination (R 2 ) and root mean square error (RMSE). The corresponding values of R 2 and RMSE, obtained from the proposed deep ANN models on training, validation and unseen test dataset, are provided in Table 2. It can be observed from this table that all of these three proposed deep ANN models predict the damage level very well. Comparing these results, it may be concluded that the deep ANN III seems to be the most robust model in the estimation of damage level as it gets trained with 92% accuracy and provides 86% accuracy on unseen data.
Table 2 Performances of the deep ANN models

Model         Epoch  Data        RMSE    R²
Deep ANN I    481    Training    0.0454  0.8860
                     Validation  0.0676  0.7892
                     Testing     0.0581  0.8390
Deep ANN II   481    Training    0.0372  0.9219
                     Validation  0.0693  0.7725
                     Testing     0.0559  0.8523
Deep ANN III  481    Training    0.0373  0.9190
                     Validation  0.0680  0.7914
                     Testing     0.0516  0.8633
Fig. 3 Variation of RMSE of the deep ANN III model
The variation of the RMSE of the deep ANN III model, in the estimation of the damage level, is presented in Fig. 3. This shows that the error rate found for both the training and validation is almost similar to each other. The corresponding values of R 2 have been shown in Fig. 4. Plotting the experimental and estimated values of the damage level along with the horizontal and vertical lines, respectively, for both the training and testing dataset, the scatter diagram is presented in Fig. 5. This shows that the proposed model performs very well in the prediction of unseen values of the damage level, as the data points seem to be very close to y = x line. This deep ANN model can be used in the prediction of the damage level without performing any experimental study. It can save the estimation cost and time exceedingly.
Fig. 4 The obtained values of R 2 of the deep ANN III model
Fig. 5 The experimental versus predicted values of the damage level on the (a) training and (b) testing dataset
The corresponding error of the proposed deep ANN III model is presented in a histogram (Fig. 6) based on the unseen test dataset. This error histogram indicates the high accuracy of the proposed model with almost zero frequency of prediction error in estimation of the damage level. The above statistical analysis indicates that the proposed deep ANN model performs excellently in estimation of the damage level of rubble-mound breakwaters.
Fig. 6 Error histogram profile of the deep ANN III model on unseen data
5.2 Comparison of the Proposed Deep ANN Model with the Existing ANN-Based Damage Level Estimation Model Mase et al. [9] used randomly chosen 100 data points from Van deer Meer [10] experimental dataset and presented ANN models with a single hidden layer to predict the damage level. They have used different numbers of neurons in the hidden layer as 4, 8, 12, 16 and 20. First, they considered only four stability variables (P, N, Ns and ζm ) as input and trained the neural network up to 1000, 5000, 10000, 30000 and 50000 epochs, respectively. The best result was obtained up to a coefficient of determination as 0.88 at the epoch of 5000. Then they added the dimensionless water depth ( Hhs ) and the spectral shape (SS) into the input dataset. Using these six stability variables (P, N, Ns , ζm , Hhs and SS), they developed another ANN model with 12 neurons in the hidden layer. This ANN model was also trained over 100 data points only and it showed 66% accuracy in R 2 , at the epoch of 5000. Performing the model up to such a higher number of epochs is time-consuming and it can create difficulties during the damage estimation. Also, each of the stability parameters has less or more influence on the damage level [10]. Ignoring most of these stability parameters, the prediction models cannot give better performance. The proposed deep ANN models were trained with 537 Van deer Meer [10] experimental data, over the whole ranges of the stability variables, considering almost 85 , Hs and Ns ). These every stability parameter (P, N, h, Dn50 , , cot α, ζm , SS, Tm , D D15 models were trained over all of the low crest, large-scale and small-scale data. Deep neural networks can learn the underlying nonlinearities from the dataset during its
Table 3 Comparison of the proposed deep ANN model with the existing ANN model

Literature       Model     Dataset            Data used in training  Input variables                                        Epoch  R²
Mase et al. [9]  ANN       Van der Meer [10]  100                    P, N, Ns, ζm, h/Hs, SS                                 5000   0.66
Present study    Deep ANN  Van der Meer [10]  537                    P, N, Ns, ζm, h/Hs, SS                                 500    0.82
                                              537                    P, N, Ns, ζm, h/Hs, SS                                 1000   0.87
                                              537                    P, N, h, Dn50, Δ, cot α, ζm, SS, Tm, D85/D15, Hs, Ns   481    0.92
6 Conclusion
The proposed study has introduced deep learning-based ANN models to predict the damage level of rubble-mound breakwaters. Almost every stability variable, over its whole range, was considered in this study. The main conclusions of the present study are given below:
1. The proposed deep ANN model provides a highly accurate estimation of the damage level. These deep ANN models have the potential to deal with the available ambiguities and non-linearities related to the field of damage level and stability analysis of rubble-mound breakwaters.
2. These deep neural networks can drastically reduce the cost and time of estimation owing to their rapid convergence.
3. The proposed deep learning-based study can be a useful alternative approach for designers of breakwater structures. Similar kinds of prediction models can be built using advanced concepts from different AI techniques in the field of ocean and coastal engineering.
Acknowledgements This work is partially supported by SERB, DST [grant number TAR/2022/000107]. The author S. Saha wishes to thank the Council of Scientific and Industrial Research (CSIR), India, for providing financial support (File No: 09/0028(11208)/2021-EMR-I) as a research scholar of the University of Calcutta, India.
References
1. Yagci O, Mercan DE, Cigizoglu HK, Kabdasli MS (2005) Artificial intelligence methods in breakwater damage ratio estimation. Ocean Engineering 32:2088–2105
2. Kundapura S, Hegde AV (2017) Current approaches of artificial intelligence in breakwaters—A review. Ocean Systems Engineering 7(2):75–87
3. Karthik S, Rao S (2017) Application of soft computing in breakwater studies—A review. International Journal of Innovative Research in Science, Engineering and Technology 6(5):2347–6710
4. Jain A, Rao S (2018) Application of soft computing techniques in breakwater—A review. International Journal of Scientific and Engineering Research 9(4):2229–5518
5. Janardhan P, Harish N, Rao S, Shirlal KG (2015) Performance of variable selection method for the damage level prediction of reshaped berm breakwater. Aquatic Procedia 4:302–307
6. Kuntoji G, Rao S, Manu M, Reddy M (2017) Prediction of damage level of inner conventional rubble mound breakwater of tandem breakwater using swarm intelligence-based neural network (PSO-ANN) approach. Advances in Intelligent Systems and Computing
7. Mandal S, Rao S, Harish N (2012) Damage level prediction of non-reshaped berm breakwater using ANN, SVM and ANFIS models. International Journal of Naval Architecture and Ocean Engineering 4:112–122
8. Kim DH, Kim Y, Hur DH (2014) Artificial neural network based breakwater damage estimation considering tidal level variation. Ocean Engineering 87:185–190
9. Mase H, Sakamoto M, Sakai T (1995) Neural network for stability analysis of rubble-mound breakwaters. Journal of Waterway, Port, Coastal, and Ocean Engineering, ASCE 121:294–299
10. Van der Meer JW (1988) Rock slopes and gravel beaches under wave attack. PhD Thesis, Delft University of Technology, Delft, The Netherlands
11. Rao S, Pramod CH, Rao BK (2004) Stability of berm breakwater with reduced armor stone weight. Ocean Engineering 31(11–12):1577–1589
12. Hudson VY, Herrmann FA, Sager RA, Whalin RW, Keulegan GH, Chatham CE, Hales LZ (1979) Coastal hydraulic models. Special Report No. 5, US Army Corps of Engineers, Coastal Engineering Research Centre
13. Shirlal KG, Rao S (2007) Ocean wave transmission by submerged reef—A physical model study. Ocean Engineering 34:2093–2099
14. Hanzawa M, Sato H, Takahashi S, Shimosako K, Takayama T, Tanimoto K (1996) New stability formula for wave-dissipating concrete blocks covering horizontally composite breakwaters. In: Proceedings of the 25th International Conference on Coastal Engineering, ASCE, Orlando, pp 1665–1678
15. Melby J, Kobayashi N (1998) Progression and variability of damage on rubble mound breakwaters. Journal of Waterway, Port, Coastal, and Ocean Engineering 124:286–294
16. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
17. Svozil D, Kvasnicka V, Pospichal J (1997) Introduction to multilayer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems 39:43–62
18. Saha S, Changdar S, De S (2022) Prediction of the stability number of conventional rubble-mound breakwaters using machine learning algorithms. Journal of Ocean Engineering and Science. https://doi.org/10.1016/j.joes.2022.06.030
Facial Image Manipulation Detection Using Cellular Automata and Transfer Learning Shramona Chakraborty and Dipanwita Roy Chowdhury
Abstract In digital media forensics, detecting manipulated facial images is a crucial topic. With the advancement of synthetic face generation technologies, different types of face morphing have raised serious concerns about their implications on social media. Hence, we have worked on detecting manipulated face images thoroughly. Rather than just using a multi-task learning approach, we propose a facial image manipulation detection scheme that uses cellular automata (CA) and transfer learning (TL) as feature extractors to enhance the classifier model's feature maps. The extracted features point out the informative regions and refine the binary classification power. We have collected datasets that contain diverse facial forgeries, and we have also built our own dataset, keeping every possible morphing approach in mind. We have analyzed the datasets thoroughly and demonstrated that the combination of CA and TL improves manipulated face detection across different morphing techniques. Keywords Cellular automata · Machine learning · Image manipulation detection · Feature extraction
1 Introduction
Digital image manipulation is a digital art that requires discovering image properties and good visual creativeness. People tamper with images for varied motives; the wrong one is to create false evidence. Image forgery is mostly misused for identity theft, and the prime targets are the vital facial points. Image manipulation is almost as old as photography, but technology has made it familiar and easy. Almost everyone uses photo editing software. The difference is that some people make small changes to an image, such as adjusting colors or lighting for creative purposes, while wicked people wrongly use this technology and play with the content.
S. Chakraborty (B) · D. R. Chowdhury
Crypto Research Lab, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721 302, India
e-mail: [email protected]
D. R. Chowdhury
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_6
Images are meant to show reality, and manipulation can create big problems. For instance, social media users can get a digital makeover through editing tools and various application filters. Users can make themselves look however they desire in just a few moments. This type of issue brings our focus to facial manipulation. Digital face manipulation methods fall into four categories: attribute change (like hair color, sunglasses, hairstyle, etc.), emotion change (like sleeping, smiling, eating, etc.), face swap (taking two individuals' faces and merging them to get a newer version), and complete change (a completely different human face). 3D face reconstruction and animation methods are extensively utilized for expression swap, for instance, Face2Face [1]. These techniques help transfer expressions from one individual to another in real time. Next, for identity swap, one individual's face is replaced with the face of another. In attribute manipulation, single or multiple facial attributes, e.g., age, gender, glasses, skin color, and hair, are replaced. It is observed that the framework of generative adversarial networks (GANs) is used negatively for image translation [2] or manipulation, which diversifies facial image synthesis. Finally, the last type of issue is total face synthesis. Creating an entirely synthetic facial image is extremely easy because of the availability of enormous face data and the success of GANs. This work considers identity swap, attribute, and expression manipulation; entire face synthesis is not relevant for criminal identification, border control, or biometrics. Data is collected from standard sources and prepared with facial pictures of multiple individuals from different angles. As described earlier, morphing attacks pose a significant threat to face recognition systems (FRS), especially in border control scenarios. To ensure safe and smooth face recognition operation in the future, it is required to detect manipulated facial images correctly. This paper overviews manipulation attack detection (MAD) algorithms and details the metrics used to compute and compare the MAD implementation. The first step of digital face image manipulation detection is to extract essential features from the images. Various feature extraction procedures deal with different types of edges. This paper uses a straightforward and new approach for multiple edge detection, which functions effectively on different images and provides numerous views of a single image. The method uses lightweight CA, which are known for performing complex computations with simple rules. The comparison results outperform the standard edge detection procedures. Coming to the manipulation detection procedure, various researchers are working on it because of the seriousness of the topic. The first work on intuitive morphed face detection was introduced in [3], which investigates the usability of micro-texture patterns employing a support vector machine (SVM) and binarised statistical image features (BSIF). They experimented on 450 manipulated face images (digital samples) generated manually to acquire high-quality manipulated images, and the detection technique exhibited a good detection performance. Thies et al. [4] have introduced an expression swap for facial reenactment with an RGB-D camera.
Face2Face [5] is a real-time face reenactment approach that uses only an RGB camera. Rather than exploiting only the expression, the upgraded work [6] transfers the entire 3D head position, eye blinking,
and expression from a source face to a target face in a portrait video. For example, facial expression animation based on an input audio signal is demonstrated in "Synthesizing Obama" [7]. Furthermore, deep learning approaches are widely used for synthesizing or exploiting faces [8]. Recently, there have been diverse public implementations of deepfakes such as FaceApp [9]. GAN-based techniques can create entirely synthetic faces, including the non-face background [10]. Providing a single solution for this diverse problem has been an open challenge for decades, and researchers are actively working on it. Keeping the consequences in mind, a proper comparison is made separately for different morph types with multiple works in the literature. The comparison results are described in Sect. 3.4. In this paper, work on both SMAD and DMAD is presented. The MAD algorithm is further divided into feature extraction and classification. The features used for MAD are extracted using CA and deep feature extractors. The paper is organized as follows: Sect. 2 describes our approach to digital image manipulation detection. Section 3 provides the dataset details, results of the proposed application, and comparisons with other recent works. Lastly, the conclusion is drawn in Sect. 4.
2 Digital Image Manipulation Detection
The MAD procedure can be categorized as single-image MAD (SMAD), also called no-reference detection, and reference-based or differential MAD (DMAD). DMAD can be employed where a trusted live capture (TLC) of the same subject is available along with the suspected image. The suspected image is passed through the MAD system. The individual's identity in the alleged image can be new to the database, or we can match and recognize the face with the available data. SMAD can be applied in the enrollment and verification process, whereas DMAD can be used in the recognition and searching process. Figure 1 shows the two different MAD categories. For SMAD, upon receiving a suspect image, we run it on our model and classify it as original or fake based on the MAD score. For DMAD, upon receiving a suspect image, we first try to match the individual's photo with the available genuine photo in the database. If both pictures match, we move on with the classification procedure. Otherwise, if the two photographs don't match, we search the whole dataset and try to find whether there is any other individual whose identity and features match the suspect image. If found, we can say that the suspect is trying to pretend to be someone else. If no resemblance is found, we say the identification fails because the features don't match the genuine person. Once done, we proceed to the classification procedure to detect morphed or genuine pictures.
– Training Framework: For classifying the suspected morph images, some prerequisites are maintained for training and evaluating the machine learning (ML) algorithms. It is ensured that the same type of feature vectors is used for the total dataset.
(a) No-reference Morphing Detection Scheme
(b) Differential Morphing Detection Scheme
Fig. 1 Categorization of no-reference and differential morphing detection scheme
Next, there is a clear separation of datasets: it is checked that the data used in the training set is not present in the evaluation set. It is also ensured that there is a balanced number of samples per class. In MAD, morphed images can be generated effortlessly since, with a single original image, we can tweak any portion of the subject, resulting in an exponential increase in the number of morphed sample images for the same subject. Therefore, a pre-selection of the morph images is carried out to create a balanced dataset. Furthermore, each dataset contains morphed images generated using a single morphing technique at a time. Hence, the training and the test data comprise morph images having the same artifacts and inconsistencies. The model is initialized with fixed parameters before training with the selected algorithm. The selection of parameters is discussed in this section.
– Classifier parameters: The selection of hyperparameters for ML algorithms directly influences the training and structure of the generated model. Determining the optimal hyperparameters is intricate because the search space is vast and depends on the type of training data. Different approaches help find the optimal hyperparameters; among them, the easiest is a grid search. The hyperparameters used in this paper are described in Table 1. Due to the limited availability and size of the datasets, model 1 works correctly only for some of the datasets. That is why weights from pre-trained models are used, and a new model is then constructed by removing the last layer.
– Chosen Classifiers: The working of the classifiers is shown in Fig. 2. The configurations of the layers are described below.
1. Input layer: The input layer has nothing to learn; it only feeds in the input image's shape, so no learnable parameters are employed here.
Table 1 Hyperparameters for model 1 and model 2

Hyperparameter | Model 1 (trained from scratch) | Model 2 (transfer learning)
Number of hidden layers | 7 | 5
Epochs | 20 | 20
Batch size | 40 | 40
Learning rate | 0.0001 | 0.0001
Loss function | Categorical cross-entropy | Categorical cross-entropy
(a) Model 1 structure
(b) Model 2 structure
Fig. 2 Model structures used in our work
2. Convolution2D layer: The CNN learns here; it has weight matrices. The learnable parameters come from a small matrix of numbers (called a kernel or filter) applied over the image, and the transformation is based on the weights of the filter. After positioning the kernel over a specified pixel, each value of the filter is multiplied pairwise with the corresponding pixel values of the image; everything is then summed and the result is placed in the output feature map. Valid and Same convolution: Due to this dimensionality reduction, an image shrinks every time a convolution is performed on it, so convolutions can be applied only a few times before the image disappears entirely. The images can be padded with an extra border to solve the problem; in practice, the padding is usually done with zeroes. Depending on whether padding is used, there are two classes of convolution: Valid and Same. Valid indicates that the original image is used without padding, whereas Same adds a border around it so that the input and output pictures have equal size.
3. Activation functions: Activation functions are added to the model to introduce nonlinearity and learn complex patterns in the data. They decide what will be fired to the next neuron. The activation functions used in our work are:
– ReLU: The rectified linear unit (ReLU) is expressed as f(x) = max(0, x). This activation function is used in most cases, especially in convolutional neural networks. The reasons behind its popularity are that it does not saturate in the positive range, it can be computed efficiently, and it does not suffer from the vanishing gradient problem.
– Softmax: Softmax is generally used in the output layer. It produces one value for each of the defined classes.
4. Pooling layer: The pooling layer doesn't have any learnable parameters and is fundamentally used to reduce the size of the tensor and speed up calculations.
5. Dropout: Dropout regularization randomly sets activations to zero during the training process to avoid overfitting the model. This does not happen during prediction on the validation/test set.
6. Flatten: In flattening, the feature map matrix is transformed into a single column, which is then fed to the neural network for processing.
7. Dense: In dense layers, every neuron is connected to every neuron of the previous layer. The number of parameters is the product of the number of neurons in the current layer and the number of neurons in the previous layer, plus the bias terms.
8. BatchNormalization: Batch normalization helps in reducing overfitting of a model and is used to normalize the output of the previous layer.
– Model Summary: The layer details, output shapes, and the counts of trainable and non-trainable parameters are shown in Tables 2 and 3. Model 1 is trained from scratch, and this classifier is used for the morphing types identity swap and expression swap. For model 2, we tried several standard pre-trained Keras models [11]; the CNN named MobileNet gave the best predictions among them, and model 2 is used for the attribute-change method.
Table 2 Model 1 summary

Layer type | Layer details | Output shape | #Parameters
Input layer | Images | 96, 96, 3 | 0
Hidden layer | Conv2D | 92, 92, 32 | 2432
Hidden layer | Conv2D | 44, 44, 32 | 25632
Hidden layer | MaxPooling2D | 22, 22, 32 | 0
Hidden layer | Dropout | 22, 22, 32 | 0
Hidden layer | Flatten | 15488 | 0
Hidden layer | Dense | 256 | 3965184
Hidden layer | Dropout | 256 | 0
Output layer | Dense | 2 | 514

Total params: 3,993,762; Trainable params: 3,993,762; Non-trainable params: 0
Table 3 Model 2 summary

Layer type | Layer details | Output shape | #Parameters
Input layer | Images | 96, 96, 3 | 0
Hidden layer | MobileNet | 3, 3, 1280 | 2257984
Hidden layer | GlobalAveragePooling2D | 1280 | 0
Hidden layer | Dense | 256 | 327936
Hidden layer | BatchNormalization | 256 | 1024
Hidden layer | Dropout | 256 | 0
Output layer | Dense | 2 | 514

Total params: 2,587,458; Trainable params: 2,552,834; Non-trainable params: 34,624
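For concreteness, the following is a minimal Keras sketch of the two classifiers summarized in Tables 2 and 3. The kernel sizes, strides, and dropout rates are not stated in the paper; the values below are inferred from the reported output shapes and parameter counts (e.g., a 5×5 kernel with stride 2 reproduces the 92→44 reduction, and the 1280-channel, ~2.26M-parameter backbone matches MobileNetV2 from keras.applications), so treat them as assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model1(input_shape=(96, 96, 3), num_classes=2):
    """Model 1: CNN trained from scratch (identity/expression swap)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 5, activation="relu"),             # -> 92x92x32
        layers.Conv2D(32, 5, strides=2, activation="relu"),  # -> 44x44x32
        layers.MaxPooling2D(2),                               # -> 22x22x32
        layers.Dropout(0.25),                                 # assumed rate
        layers.Flatten(),                                     # -> 15488
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                                  # assumed rate
        layers.Dense(num_classes, activation="softmax"),
    ])

def build_model2(input_shape=(96, 96, 3), num_classes=2):
    """Model 2: transfer learning; the 1280-channel output and parameter
    count suggest a MobileNetV2 backbone (an assumption)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

if __name__ == "__main__":
    for build in (build_model1, build_model2):
        model = build()
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                      loss="categorical_crossentropy", metrics=["accuracy"])
        model.summary()   # parameter counts match Tables 2 and 3
```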
3 Results
We describe the complete experimental setup in this section. Section 3.1 describes the division of the experimental dataset and its preprocessing. Despite readily available image data, acquiring feasible public databases to train MAD algorithms is difficult, primarily because of legal constraints. We have tried to collect standard public datasets and construct a dedicated MAD dataset. We have also built our own MAD dataset, which follows the legal guidelines. Section 3.2 illustrates the working of our application in detail and shows some snapshots of the application working in numerous scenarios with the previously described morph types. Section 3.3 explains the evaluation metrics of the models and justifies their necessity. Our work covers all the required metrics and has achieved promising results. Furthermore, a comparison with standard works is required to establish a proper benchmark. Section 3.4 shows the comparison with other available morph detection models. The results achieved by the proposed model are comparable to, and sometimes better than, the standard results.
3.1 Experimental Data
This section defines the datasets used in our experiments for the different types of image morphs. Not all images from face datasets are suitable for creating the morphing database; the legal guidelines in ISO/IEC 19794-5 [12] are followed for selecting morphed images. Figure 3 illustrates the three morph types. Real face images: In our experiments, real face samples are collected from CelebA [13] since they exhibit good diversity, i.e., they hold variations in gender, age, race, expression, pose, resolution, illumination, and camera capture quality.
Fig. 3 Illustration of face morph images generated using different methods: (a) original face picture and (b) morphed image (attribute change); (c) original face picture and (d) morphed image (FF++ identity swap); (e) original face picture and (f) morphed image (Face2Face expression swap)
Next, we have utilized the original images from FaceForensics (FF)++ [14] as extra real faces. We have also built our own dataset: with permission, we have clicked pictures of individuals from our college department and hostel.
– Attribute Change: The StarGAN [2] dataset, an image-to-image translation method based on GANs, is used to generate attribute-manipulated images. As input real images, 2,000 test faces of CelebA are used, and for each face image in the StarGAN set we generated 5 fake face images. As a whole, we have collected 10,000 attribute-manipulated images from StarGAN. For our own dataset, we have taken pictures from different angles and on multiple days to capture changes in behavior and look, and we have considered different lighting conditions for realistic results. We have at least 5 images of every individual, and as a whole we have pictures of 25 individuals. We manually manipulated different facial attributes on high-definition images for each person, giving 5 manipulated images of every individual. So, as a whole, we have 250 real and morphed images. This dataset has helped us to work with both morph types.
– Expression and Identity swap: To collect data for identity swap and facial expression change, we have gathered the video clips of the FF++ dataset [14]. It has 1,000 real face videos assembled from YouTube and their corresponding 2,000 fake versions of the videos.
Pre-processing: Firstly, features are extracted using the 2D CA feature extraction method. If an image's facial-point detection or alignment fails, it is discarded. The images are cropped so that only the facial portion is visible. Images with multiple faces are discarded. The individual's eyes should be visible, and it is preferred to have both eyes open. For video samples, we extract one frame per 32 s to decrease the size without sacrificing diversity. All the datasets are split into 60% for training, 20% for validation, and 20% for testing. The data is augmented by horizontal flip, vertical flip, and rescaling to achieve more normalization. All the fake images morphed from the same real image are placed in the same set as the source image.
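A minimal sketch of the split and augmentation described above. The directory layout, image size, and batch size are illustrative assumptions (the 96×96 input only follows from the classifier tables), and the 60/20/20 split is assumed to have been done beforehand so that all morphs of one source image stay in the same split.

```python
import tensorflow as tf

IMG_SIZE = (96, 96)   # assumption: matches the classifier input in Table 2
BATCH = 40            # batch size from Table 1

train_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, horizontal_flip=True, vertical_flip=True)
eval_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

# Hypothetical folder names; images pre-split into train/val/test (60/20/20).
train = train_gen.flow_from_directory("data/train", target_size=IMG_SIZE,
                                      batch_size=BATCH, class_mode="categorical")
val = eval_gen.flow_from_directory("data/val", target_size=IMG_SIZE,
                                   batch_size=BATCH, class_mode="categorical")
test = eval_gen.flow_from_directory("data/test", target_size=IMG_SIZE,
                                    batch_size=BATCH, class_mode="categorical",
                                    shuffle=False)
```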
3.2 Application Previews
The application starts with Fig. 4a, and then we get a drop-down option for training or testing. Training is the first step, and the results for all the standard datasets are shown in Fig. 5a–f. To proceed with testing, we first need to choose the morph type. For SMAD, we land on the preview shown in Fig. 4c, whereas for DMAD, we get the preview shown in Fig. 4d. The test results for both morph types are shown in this section. Figure 6 shows the results for SMAD with a few example images of facial morph detection. The image in Fig. 6a is downloaded from Google and is tested on our application; the picture is morphed, and its attribute is changed. The image in Fig. 6c is edited in Photoshop and the result is shown in Fig. 6d. The image in Fig. 6e is original and the result is shown in Fig. 6f. The image in Fig. 6g is again downloaded with its face swapped, and the result is shown in Fig. 6h. Next, for DMAD, a few examples from our dataset are shown in Fig. 7. As the DMAD process is used in the recognition and searching process, the first step of the method is to verify the suspect image against the available database image. In Fig. 7a, the public database image is of a very young subject, the suspect image was taken recently, and both photos are first verified.
(a) Landing page of MAD
(b) Train a classifier
(c) Test a sample image with SMAD
(d) Test a sample image with DMAD
Fig. 4 Application preview
Fig. 5 Training loss and accuracy of the standard datasets: (a) FF++ (identity swap), (b) FF++ (expression swap), (c) attribute change, (d) Celeb-DF-v2, (e) FF++ (TIMIT, LQ), (f) FF++ (TIMIT, HQ)
Fig. 6 Facial morph detection results for SMAD: (a) Test-image1, (b) Decision-Forged; (c) Test-image2, (d) Decision-Forged; (e) Test-image3, (f) Decision-Genuine; (g) Test-image4, (h) Decision-Forged
Fig. 7 Facial morph detection results for DMAD: (a) Test-image1, (b) Decision-Genuine; (c) Test-image2, (d) Decision-Forged; (e) Test-image3, (f) Decision-Genuine; (g) Test-image4; (h) Test-image5
The suspect image has glasses on, which is an add-on feature, but the model acts correctly since both eyes are visible. Once verified, we opt for the manipulation detection test, and Fig. 7b shows the test result, where the suspect image is classified as genuine. Next, in Fig. 7c, the suspect image is verified against the database image, but the decision is forged because the image's facial attributes are manipulated. After that, in Fig. 7e, the database image matches the suspect image and the decision is genuine. In Fig. 7g, however, the suspect image is not verified against the database image. To the naked eye, no difference is visible between the two pictures, but the model clearly differentiates them and then searches the database. Similarly, in Fig. 7h the suspect and database images don't match; here, the difference in appearance can be spotted. So, our model again searches the database and finds that there are 3 images that match the suspect image. Therefore, it is inferred that the suspect is trying to steal someone else's identity.
3.3 Performance Metrics of the Proposed Solution
The proposed approaches are evaluated in the context of face morphing detection using the following evaluation metrics [15], where t_p = true positives, t_n = true negatives, f_p = false positives, and f_n = false negatives. The false acceptance rate (FAR) measures the likelihood that the biometric security system will incorrectly accept an access attempt by an unauthorized user. The FAR governs security, and hence it is the most critical measurement. It is also known as the false positive rate.

FAR = False positive rate = Misclassified real morphs (f_p) / All morphed images (f_p + t_n)    (1)

Precision is used as the face verification performance benchmark.

Precision = Accepted originals (t_p) / All predicted originals (t_p + f_p)    (2)

Recall denotes the ability of a model to find all the relevant cases within a dataset. It is also known as the true positive rate.

Recall = True positive rate = t_p / (t_p + f_n)    (3)

The F1 score denotes the trade-off between precision and recall.

F1 score = 2 × (Precision × Recall) / (Precision + Recall)    (4)
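As a quick illustration, a small helper (a sketch, not part of the paper's code) computes Eqs. (1)–(4) from confusion-matrix counts; the example numbers are the FF+ (DeepFake, FaceSwap, LQ) row of Table 4 and reproduce the reported metrics.

```python
def mad_metrics(tp, tn, fp, fn):
    far = fp / (fp + tn)                                  # Eq. (1): false acceptance rate
    precision = tp / (tp + fp)                            # Eq. (2)
    recall = tp / (tp + fn)                               # Eq. (3): true positive rate
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (4)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"FAR": far, "Precision": precision, "Recall": recall,
            "F1": f1, "Accuracy": accuracy}

print(mad_metrics(tp=1881, tn=934, fp=30, fn=49))
# FAR ~ 0.0311, Precision ~ 0.9843, Recall ~ 0.9746, F1 ~ 0.9794, Accuracy ~ 0.9727
```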
Table 4 Model outputs for different morphs. AUC = area under the curve, FAR = false acceptance rate, EER = equal error rate

Morph type | Model used | Test accuracy | Confusion matrix on test data | Precision | Recall | F1 score | FAR | AUC score | EER
FF+ (DeepFake, FaceSwap, LQ) | Model1 | 0.9727 | tn: 934, fp: 30, fn: 49, tp: 1881 | 0.9843 | 0.9746 | 0.9794 | 0.0311 | 0.9961 | 0.029
FF+ (Face2Face, NeuralTextures, LQ) | Model1 | 0.9921 | tn: 274, fp: 18, fn: 2, tp: 2225 | 0.9905 | 1 | 1 | 0.0695 | 0.9096 | 0.1615
Attribute change | Model2 | 0.9343 | tn: 226, fp: 157, fn: 0, tp: 2007 | 0.9274 | 1 | 0.9623 | 0.4099 | 0.9699 | 0.0704
Celeb-df-v2 | Model1 | 0.9224 | tn: 2739, fp: 513, fn: 295, tp: 6872 | 0.9305 | 0.9588 | 0.9445 | 0.1575 | 0.9675 | 0.0888
DeepFake TIMIT (LQ) | Model1 | 0.9659 | tn: 31, fp: 0, fn: 3, tp: 54 | 1 | 0.9474 | 0.9730 | 0 | 0.9994 | 0.0175
DeepFake TIMIT (HQ) | Model1 | 0.9432 | tn: 35, fp: 0, fn: 5, tp: 48 | 1 | 0.9057 | 0.9505 | 0 | 0.9908 | 0.0286
AUC means area under the curve, where the curve is plotted with the false positive rate on the x-axis versus the true positive rate on the y-axis for various threshold values between 0.0 and 1.0; this can also be described as a false-alarm rate versus hit rate. A skilled model assigns a higher likelihood to a randomly picked true positive occurrence and is displayed by a curve that bows up towards the top left of the plot. A no-skill classifier cannot discriminate between the classes and predicts an arbitrary or constant class in all cases; for the no-skill model, a diagonal line runs from the bottom left of the plot to the top right, and its AUC is 0.5. Therefore, the AUC score lies between 0.0 and 1.0, for no skill and perfect skill, respectively. Lastly, the equal error rate (EER) is the operating point used in biometric security systems to set the threshold at which the false acceptance rate equals the false rejection rate. The lower the equal error rate value, the higher the accuracy of the biometric system. Table 4 shows the final outputs of the performance metrics for all the morph types used in our work. Model 1 is used for the morph types face swap and expression change, whereas model 2 is used for attribute change. The evaluation results are comparable with the works presented in [16, 17].
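A brief sketch (not the authors' code) of how AUC and EER can be obtained from prediction scores with scikit-learn; `y_true` and `y_score` are placeholder arrays standing in for the test labels and the classifier's morph scores.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Placeholder labels/scores; in practice these come from the trained classifier.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.7 + rng.normal(0.15, 0.2, size=500), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc_score = auc(fpr, tpr)

# EER: the point on the ROC curve where the false positive rate equals
# the false rejection rate (1 - TPR).
fnr = 1 - tpr
eer_index = np.nanargmin(np.abs(fpr - fnr))
eer = (fpr[eer_index] + fnr[eer_index]) / 2

print(f"AUC = {auc_score:.4f}, EER = {eer:.4f}")
```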
3.4 Comparison with Other Works
This section compares different morph types separately with some recent standard works. The comparison results with the defined performance metrics are shown in Tables 5, 6, and 7. After that, it is demonstrated how our system works better than others.
Identity Swap: The number of new techniques to detect face swap manipulations is constantly growing. Table 5 compares the relevant detection techniques in this area; we have tried to report all the measurements for our proposed method so that our work can be compared fairly. In [18], Afchar et al. suggested two different networks based on the mesoscopic properties of the image. First, they introduced Meso-4, a CNN with four convolutional layers plus a fully connected layer at the end. After that, they introduced an upgraded version of Meso-4 that includes a variant of the inception module presented in [24], known as MesoInception-4. In this line, in [19], Matern et al. suggested morph detection techniques focused on elementary visual artifacts such as eye color, misplaced reflections, etc. They considered two classifiers, logistic regression and a multilayer perceptron (MLP), tested the proposed algorithm on different databases, and the AUC for each is shown in Table 5. Next, strategies based on steganalysis and mesoscopic features have also been explored, and the results are shown with the modified network; the pre-trained detection model was tested on unseen datasets in [25] and verified to be a strong technique in some cases, for instance with the FF++ database. Extensive experiments have been done with various technologies like RNNs and capsule networks. We have compared the results by combining the FF++ datasets and the DeepFake TIMIT datasets. We can see that the usage of 2D CA has helped us achieve a good detection rate that is comparable with other standard methods.
Attribute Manipulation: This method is based on altering a picture in some facial attributes like skin or hair color, change in gender, modifying age, glasses, adding makeup, etc. Despite the GAN-based framework's success, only a few datasets are publicly available. We compared the results based on the StarGAN dataset and also show the results acquired on our private dataset. Table 6 compares the results of our proposed model with different standard methods.
Expression Swap: This process is based on altering the facial expression of an individual. We have focused on the widespread methods NeuralTextures and Face2Face, which replace one person's facial expression with that of another individual in a video. The only public dataset available for investigation in this area is FF++ [14]. Table 7 compares with some recent works proposed to detect the manipulated images. In [19], Matern et al.'s best performance for their proposed technique achieved a final AUC of 86.6%, whereas for our work it is 90.96%. We have also compared our result with Rossler et al.'s mesoscopic, steganalysis, and deep features in [21]. Likewise, many other approaches have worked on this particular detection technique and have yielded some promising results.
Table 5 Comparison of morph-type identity swap with model1. AUC = area under the curve, Acc. = Accuracy, FF++ = FaceForensics++, EER = equal error rate

Study | Method | Classifiers | Best performance (in %) — Databases (Generation)
Afchar et al. (2018) [18] | Mesoscopic features | CNN | AUC = 98.4 (Own), AUC = 87.8 (DeepFake TIMIT, LQ), AUC = 68.4 (DeepFake TIMIT, HQ), Acc. = 90.0 (FF++ DeepFake, LQ), Acc. = 94.0 (FF++ DeepFake, HQ), Acc. = 83.0 (FF++ FaceSwap, LQ), Acc. = 93.0 (FF++ FaceSwap, HQ), AUC = 75.3 (Celeb-DF)
Matern et al. (2019) [19] | Visual features | Logistic regression, MLP | AUC = 85.1 (Own), AUC = 77.0 (DeepFake TIMIT, LQ), AUC = 77.3 (DeepFake TIMIT, HQ), AUC = 55.1 (Celeb-DF)
Nguyen et al. (2019) [20] | Deep learning features | Capsule networks | AUC = 78.4 (DeepFake TIMIT, LQ), AUC = 74.4 (DeepFake TIMIT, HQ), AUC = 96.6 (FF++ DeepFake)
Rossler et al. (2019) [21] | Mesoscopic features, Steganalysis features, Deep learning features | CNN | Acc. = 94.0 (FF++ DeepFake, LQ), Acc. = 98.0 (FF++ DeepFake, HQ), Acc. = 93.0 (FF++ FaceSwap, LQ), Acc. = 97.0 (FF++ FaceSwap, HQ)
Sabir et al. (2019) [22] | Image + Temporal features | CNN + RNN | AUC = 96.9 (FF++ DeepFake, LQ), AUC = 96.3 (FF++ FaceSwap, LQ)
Wang and Dantcheva (2020) [23] | Deep learning features | 3DCNN | AUC = 95.13 (FF++ DeepFake, LQ), AUC = 92.25 (FF++ FaceSwap, LQ)
Proposed model | 2D CA features | CNN | Acc. = 97.27, AUC = 99.61, EER = 0.03 (FF+ DeepFake, FaceSwap, LQ); Acc. = 92.24, AUC = 96.75, EER = 0.88 (Celeb-DF); Acc. = 96.59, AUC = 99.94, EER = 0.01 (DeepFake TIMIT, LQ); Acc. = 94.32, AUC = 99.08, EER = 0.02 (DeepFake TIMIT, HQ)
Table 6 Comparison of morph-type attribute manipulation with model2. AUC = area under the curve, Acc. = Accuracy, EER = equal error rate

Study | Method | Classifiers | Best performance (in %) | Databases (Generation)
Bharati et al. (2016) [26] | Deep learning features (face patches) | RBN | Acc. = 96.2, Acc. = 87.1 | Own (Celebrity retouching, ND-IIITD retouching)
Jain et al. (2019) [27] | Deep learning features (face patches) | CNN + SVM | Acc. = 99.6, Acc. = 99.7 | Own (ND-IIITD retouching, StarGAN)
Marra et al. (2019) [28] | Deep learning features | CNN + Incremental learning | Acc. = 99.3 | Own (Glow/StarGAN)
Nataraj et al. (2019) [29] | Steganalysis features | CNN | Acc. = 99.4 | Own (StarGAN/CycleGAN)
Rathgeb et al. (2020) [30] | PRNU features | Score-level fusion | EER = 13.7 | Own (5 public apps)
Wang et al. (2019) [31] | GAN-pipeline features | SVM | Acc. = 84.7 | Own (InterFaceGAN/StyleGAN)
Proposed model | Deep learning features | CNN | Acc. = 93.43, AUC = 96.99, EER = 7.04 | StarGAN, own dataset
We have compared our results with all the methods, and we can ascertain that 2D CA feature extraction has helped us optimize feature selection for the detection technique.
4 Conclusion In this paper, we have introduced a system to detect manipulation in facial images. At first, 2D CA-based feature extraction is used to enhance the feature maps of the images. The feature vector then goes through classification to provide a binary decision. We have proposed two classifiers for the learning mechanism. The first classifier is trained from scratch and provides good results for identity swap and expression swap. Transfer learning is used for the second classifier for attribute change morph type. We have compared our results with some standard and recent models and seen promising outcomes. To conclude, we can see that the system
Table 7 Comparison of morph-type expression swap with model1. AUC = area under the curve, Acc. = Accuracy, FF++ = FaceForensics++, EER = equal error rate

Study | Method | Classifiers | Best performance (in %) | Databases (Generation)
Afchar et al. (2018) [18] | Mesoscopic features | CNN | Acc. = 83.2, Acc. = 75 | FF++ (Face2Face, LQ), FF++ (NeuralTextures, LQ)
Rossler et al. (2019) [21] | Mesoscopic features, Steganalysis features, Deep learning features | CNN | Acc. = 91, Acc. = 81 | FF++ (Face2Face, LQ), FF++ (NeuralTextures, LQ)
Matern et al. (2019) [19] | Visual features | Logistic regression, MLP | AUC = 86.6 | FF++ (Face2Face, LQ)
Amerini et al. (2020) [32] | Image + Temporal features | CNN + Optical Flow | Acc. = 81.6 | FF++ (Face2Face, -)
Sabir et al. (2019) [22] | Image + Temporal features | CNN + RNN | AUC = 94.3 | FF++ (Face2Face, LQ)
Wang and Dantcheva (2020) [23] | Deep learning features | 3DCNN | AUC = 90.27, AUC = 80.5 | FF++ (Face2Face, LQ), FF++ (NeuralTextures, LQ)
Proposed model | 2D CA features | CNN | Acc. = 99.21, AUC = 90.96 | FF++ (Face2Face, NeuralTextures, LQ)
is a complete package for detecting facial morphs of different morph types and also works well for different morphing schemes. Further, it may be explored whether our proposed approach works the same way for other classes of images.
References 1. Thies J, Zollhöfer M, Stamminger M, Theobalt C, Nießner M (2019) Face2face: real-time face capture and reenactment of rgb videos. Commun ACM 62(1):96–104. http://dblp.uni-trier.de/ db/journals/cacm/cacm62.html#ThiesZSTN19 2. Choi Y, Uh Y, Yoo J, Ha J-W (2020) Stargan v2: diverse image synthesis for multiple domains. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
3. Raghavendra R, Raja KB, Marcel S, Busch C (2016) Face presentation attack detection across spectrum using time-frequency descriptors of maximal response in laplacian scale-space. In: 2016 sixth international conference on image processing theory, tools and applications (IPTA). IEEE, pp 1–6 4. Thies J, Zollhöfer M, Nießner M, Valgaerts L, Stamminger M, Theobalt C (2015) Real-time expression transfer for facial reenactment. ACM Trans Graph 34(6):183–1 5. Thies J, Zollhofer M, Stamminger M, Theobalt C, Nießner M (2016) Face2face: real-time face capture and reenactment of rgb videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2387–2395 6. Kim H, Garrido P, Tewari A, Xu W, Thies J, Niessner M, Pérez P, Richardt C, Zollhöfer M, Theobalt C (2018) Deep video portraits. ACM Trans Graph (TOG) 37(4):1–14 7. Suwajanakorn S, Seitz SM, Kemelmacher-Shlizerman I (2017) Synthesizing obama: learning lip sync from audio. ACM Trans Graph (ToG) 36(4):1–13 8. Tran L, Liu X (2019) On learning 3d face morphable model from in-the-wild images. IEEE Trans Pattern Anal Mach Intell 43(1):157–171 9. Dang H, Liu F, Stehouwer J, Liu X, Jain AK (2020) On the detection of digital face manipulation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5781–5790 10. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4401–4410 11. Keras application. https://keras.io/api/applications/. Accessed 08 Sep 2021 12. Sang J, Lei Z, Li SZ (2009) Face image quality evaluation for iso/iec standards 19794-5 and 29794-5. In: International conference on biometrics. Springer, pp 229–238 13. Liu Z, Luo P, Wang X, Tang X (2018) Large-scale celebfaces attributes (celeba) dataset. Retrieved August 15(2018):11 14. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2018) FaceForensics: a large-scale video dataset for forgery detection in human faces 15. Wandzik L, Kaeding G, Garcia RV (2018) Morphing detection using a general-purpose face recognition system. In: 26th European signal processing conference (EUSIPCO). IEEE, pp 1012–1016 16. Raja K, Venkatesh S, Christoph Busch R et al (2017) Transferable deep-cnn features for detecting digital and print-scanned morphed face images. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 10–18 17. Chaudhary B, Aghdaie P, Soleymani S, Dawson J, Nasrabadi NM (2021) Differential morph face detection using discriminative wavelet sub-bands. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1425–1434 18. Afchar D, Nozick V, Yamagishi J, Echizen I (2018) Mesonet: a compact facial video forgery detection network. In: IEEE international workshop on information forensics and security (WIFS). IEEE, pp 1–7 19. Matern F, Riess C, Stamminger M (2019) Exploiting visual artifacts to expose deepfakes and face manipulations. In: IEEE winter applications of computer vision workshops (WACVW). IEEE, pp 83–92 20. Nguyen HH, Yamagishi J, Echizen I (2019) Use of a capsule network to detect fake images and videos. arXiv:1910.12467 21. Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: learning to detect manipulated facial images. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1–11 22. 
Sabir E, Cheng J, Jaiswal A, AbdAlmageed W, Masi I, Natarajan P (2019) Recurrent convolutional strategies for face manipulation detection in videos. Interfaces (GUI) 3(1):80–87 23. Wang Y, Dantcheva A (2020) A video is worth more than 1000 lies. Comparing 3dcnn approaches for detecting deepfakes. In: 2020 15th IEEE international conference on automatic face and gesture recognition (FG 2020). IEEE, pp 515–519
24. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9 25. Li Y, Yang X, Sun P, Qi H, Lyu S (2020) Celeb-df: a large-scale challenging dataset for deepfake forensics. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3207–3216 26. Bharati A, Singh R, Vatsa M, Bowyer KW (2016) Detecting facial retouching using supervised deep learning. IEEE Trans Inf Forensics Secur 11(9):1903–1913 27. Jain A, Singh R, Vatsa M (2018) On detecting gans and retouching based synthetic alterations. In: 2018 IEEE 9th international conference on biometrics theory, applications and systems (BTAS). IEEE, pp 1–7 28. Marra F, Saltori C, Boato G, Verdoliva L (2019) Incremental learning for the detection and classification of gan-generated images. In: IEEE international workshop on information forensics and security (WIFS). IEEE, pp 1–6 29. Nataraj L, Mohammed TM, Manjunath B, Chandrasekaran S, Flenner A, Bappy JH, RoyChowdhury AK (2019) Detecting gan generated fake images using co-occurrence matrices. Electr Imaging 2019(5):532–1 30. Rathgeb C, Botaljov A, Stockhardt F, Isadskiy S, Debiasi L, Uhl A, Busch C (2020) Prnu-based detection of facial retouching. IET Biom 9(4):154–164 31. Wang R, Juefei-Xu F, Ma L, Xie X, Huang Y, Wang J, Liu Y (2019) Fakespotter: a simple yet robust baseline for spotting ai-synthesized fake faces. arXiv:1909.06122 32. Amerini I, Galteri L, Caldelli R, Del Bimbo A (2019) Deepfake video detection through optical flow based cnn. In: Proceedings of the IEEE/CVF international conference on computer vision workshops, pp. 0–0
Language-Independent Fake News Detection over Social Media Networks Using the Centrality-Aware Graph Convolution Network Sujit Kumar, Mohan Kumar, and Sanasam Ranbir Singh
Abstract Though social media have emerged as a convenient platform for news consumption and updates about various events, the spread of fake news over social media is a threat to the freedom of expression and democracy. In the literature, studies related to fake news detection on social media fall into two categories: tweet propagation networks with user characteristics, and tweet propagation networks with post and reply content. Models based on tweet propagation networks with user characteristics suffer from the lack of user profile information available due to users' privacy concerns, while tweet propagation networks with post and reply content are language-dependent. Motivated by these two challenges, this study proposes a centrality-aware graph convolution network (CGCN) model. The CGCN model incorporates the centrality of every node to overcome the limitation of user profile information and does not rely on the text in posts and replies, which makes it language-independent. Our experimental results on two benchmark datasets suggest that the proposed model efficiently detects fake news without using text information in posts and comments. Hence, our proposed model can be used to detect fake news in any vernacular language. Keywords Fake news detection · Social media network · Centrality-aware graph convolution network
S. Kumar · M. Kumar · S. R. Singh (B) Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India e-mail: [email protected] S. Kumar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_7
1 Introduction
Due to the sudden rise of misinformation over digital platforms, misinformation detection has become an important research problem. Fake news is described as a fabricated storyline created on a broad scale to deceive readers. According to media scholars [2], fake news is defined as distorted and deceptive content in circulation as news via communication mediums such as print, electronic, and digital communication. Fake news has become a substantial threat to the freedom of expression and democracy. According to a study, it has weakened public trust in government and spreads quickly on digital platforms. According to another study, 64% of the adult population in the U.S. accepted that fake news stories were the primary source of confusion regarding basic facts and current affairs. Thus, fake news detection is an important research problem. In the literature, studies [1, 5, 6, 8, 14] consider tweet propagation networks, meta-information of users, and the text in tweets and replies within the propagation network. These studies can be further categorized into two groups: tweet propagation networks with user characteristics and tweet propagation networks with post and reply content. Studies [5, 6, 8] incorporate user activity and user characteristic features for fake news classification in social media networks. This user meta-information includes information extracted from user profiles such as name, whether the account is verified, number of words in the bio, the user's favorite topics, number of followers, and number of people the user is following. Though user characteristic-based methods help in fake news classification, many users do not provide personal information due to privacy concerns. Another limitation of user characteristic-based methods is that they assume verified and trusted users are not involved in spreading fake news, which is not always true in real scenarios. In the group of tweet propagation networks with post and reply content, studies [11, 13] investigate the semantic relevance between posts and comments for fake news classification. These methods rely on the text in posts and comments, which is language-dependent; they work only for a dataset in a specific language, and curating a large-scale fake news dataset for a vernacular language is a challenging task. A further challenge with social media networks is that post and reply content includes creative spelling, typos, multilingual content, informal language, and slang. Considering the above challenges and the limitations of both groups of methods, this study proposes the Centrality-aware Graph Convolution network (CGCN) for fake news detection. CGCN combines user profile meta-information, network structural and temporal information, and centrality information of nodes in the tweet propagation network for fake news classification.
Footnotes: (1) https://en.wikipedia.org/wiki/Fake_news. (2) Pew Research Center Report on Fake News. (3) Study on Fake News.
Incorporating centrality information of nodes in a tweet propagation network helps to understand how fast information is spreading and how much a user influences or contributes to the spread of information. Incorporating the centrality information of nodes also helps to overcome the limitation of the user profile feature: the profile information of a user associated with a node may not be available, but the centrality of a node can always be estimated to understand the importance of that node in the spread of news. Our proposed CGCN relies solely on network-related features, which makes our model language-independent. From experimental results over two benchmark datasets, it is evident that our proposed model efficiently detects fake news without using any textual information in posts and comments. The contributions of this paper can be summarized as follows:
– We investigate the role of different network-based features for language-independent fake news detection.
– We propose the Centrality-aware Graph Convolution model CGCN for language-independent fake news detection.
The rest of the paper is organized as follows. Section 2 discusses work related to fake news detection on social media networks. Section 3 presents our proposed model. Sections 4 and 5 present the experimental setup and results, respectively. Finally, Sect. 6 presents the conclusion of this paper.
2 Related Work
Social media have become a popular tool in our daily life for getting updates about various events. However, several studies have pointed out concerns over the presence of misinformation on social media. Due to the increasing popularity and acceptability of social media among people, any information can quickly reach a wide range of users; hence, social media platforms are being misused for fake news propagation. Several state-of-the-art methods have been proposed in the literature to counter misinformation on social media. These methods have utilized several features to debunk fake news on social media platforms, which can be grouped into three categories: linguistic features, propagation network, and user profile features. The short text of tweets [4, 11, 13] is considered a linguistic feature, and a graph formed from retweet and reply sequences constitutes the propagation network. Similarly, information present in the user profile is considered for user profile features, such as whether the account is verified, presence or absence of a profile picture, number of words in the Twitter profile description and profile name, gender, hometown, number of followers, etc. Studies [5, 6] proposed a model that classifies the news propagation path with the help of CNN and GRU. The authors formed a feature vector for each user based on user profile features and trained the GRU and CNN on the sequence of user vectors in the propagation path. Study [12] explored the scope of the propagation network for debunking fake news on social media and created a hierarchical propagation network for each source tweet. The hierarchical propagation network includes two subnetworks, namely micro-level and macro-level information.
The macro-level network constitutes retweet propagation to capture global spreading patterns, while the micro-level network constitutes tweets and their replies to capture local spreading patterns. They extracted several structural and temporal features to train traditional classifiers for detecting fake news. Study [6] proposed a graph-aware co-attention network (GCAN) model, which uses the short text in tweets, the propagation network of users' retweet sequences, and features related to the user profile for the prediction of fake source stories. The GCAN model learns a word embedding of the source short text, uses CNN and GRU to learn retweet propagation based on user features, and trains a GRU model to learn the potential interactions between users. Finally, dual co-attention is applied to learn the correlation between all the submodels. Study [1] explored the opportunity of online communities for the detection of fake news in social media networks and constructed a heterogeneous graph with two types of nodes, news nodes and user nodes, with an edge between a user node and a news node if the user has shared or retweeted the news, and an edge between user nodes if two users hold a follow or follower relationship. A graph neural network is applied to learn the community graph, and linguistic features from the text are encoded with the help of a CNN. Finally, the representations of text and graph are aggregated for the classification of fake news. However, linguistic-based models are language-dependent, and user profile feature-based models suffer from the lack of availability of user information. To overcome such limitations, this paper proposes CGCN.
3 Proposed Model
As stated above, this paper proposes a language-independent model for fake news detection in social media networks. Accordingly, we investigate different features essential for fake news detection without considering the texts in posts and comments. We study the role of the following features in fake news detection: (i) user profile characteristic features, (ii) network-based features, and (iii) centrality features. The details of these features are as follows.
3.1 User Profile Features
To verify the authenticity and the kind of users participating in the tweet propagation network, this study extracts the following information from user profiles to construct a user profile feature vector: (i) profile description length, (ii) profile name length, (iii) follower count, (iv) following count, (v) total stories that the user created, (vi) difference between the first and last post, (vii) whether the user's profile is verified or not, and (viii) whether the user's location is available or not. A feature vector of eight entries is formed, with one entry for each piece of profile information mentioned above. If the user profile information is unavailable, we use a random vector as the user profile feature for the tweet.
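A small sketch of how such an eight-entry vector could be assembled; the profile-dictionary keys are hypothetical placeholders, not the field names used by the authors or by the Twitter API.

```python
import numpy as np

def user_profile_vector(profile):
    """Build the eight-entry user profile feature vector described above.
    `profile` is a dict with hypothetical keys; a random vector is returned
    when the profile is unavailable."""
    if profile is None:
        return np.random.rand(8)
    return np.array([
        len(profile.get("description", "")),      # (i) profile description length
        len(profile.get("name", "")),              # (ii) profile name length
        profile.get("followers_count", 0),         # (iii) follower count
        profile.get("following_count", 0),         # (iv) following count
        profile.get("stories_created", 0),         # (v) total stories created
        profile.get("first_last_post_diff", 0),    # (vi) difference between first and last post
        int(profile.get("verified", False)),       # (vii) verified profile or not
        int(profile.get("location") is not None),  # (viii) location available or not
    ], dtype=float)

print(user_profile_vector({"description": "news junkie", "name": "alice",
                           "followers_count": 120, "following_count": 80,
                           "stories_created": 14, "first_last_post_diff": 365,
                           "verified": False, "location": "Guwahati"}))
```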
3.2 Network Structure and Temporal Features
Studies [16, 17] suggest that the network structure of a tweet propagation graph exhibits different characteristics and properties for real and fake news. Motivated by the influence of structural and temporal features in study [12], this study considers the following network structure features: (i) depth of the tweet propagation graph, (ii) number of nodes in the propagation graph, (iii) maximum out-degree of the tweet propagation graph, (iv) level of the node with the maximum out-degree, (v) number of cascades with retweets, (vi) fraction of cascades with retweets, (vii) number of bot users retweeting, and (viii) fraction of bot users retweeting. Similarly, we also consider the following temporal features: (i) average time difference between adjacent replies, (ii) time difference between the first tweet posting node and the first reply node, (iii) time difference between the first tweet posting node and the last reply node, (iv) average time between adjacent reply nodes in the deepest cascade, and (v) time difference between the first tweet posting node and the last reply node in the deepest cascade.
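A brief networkx-based sketch of a few of the structural features listed above (node count, depth, maximum out-degree); the remaining temporal and bot-related features would need timestamps and bot labels, which are omitted here, and the edge list is a toy example.

```python
import networkx as nx

# Toy tweet propagation graph: node 0 is the source tweet,
# edges point from a tweet to its retweets/replies.
G = nx.DiGraph([(0, 1), (0, 2), (0, 3), (1, 4), (4, 5)])

num_nodes = G.number_of_nodes()
depth = max(nx.shortest_path_length(G, source=0).values())  # depth of propagation graph
max_out_degree = max(dict(G.out_degree()).values())          # maximum out-degree

print({"nodes": num_nodes, "depth": depth, "max_out_degree": max_out_degree})
# {'nodes': 6, 'depth': 3, 'max_out_degree': 3}
```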
3.3 Centrality To measure the influence of a node in the tweet propagation network, we incorporate the centrality of the nodes of the tweet propagation network. Here, the influence of a node implies how much the node contributes to the spread of information. In this study, we consider three centrality measures, namely degree [15], closeness [9], and PageRank centrality [10]. Figure 1 presents the block diagram of our proposed model, the Centrality-aware Graph Convolution Network (CGCN). Given a tweet propagation graph G = (V, E), CGCN first obtains a centrality and user profile feature vector for each node v ∈ V. Both the centrality and user profile feature vectors are then concatenated to form a feature vector p_{i,j} for i = 1, ..., k and j = 1, ..., n, where k is the number of source tweets and n is the number of retweets and replies for a source tweet. Subsequently, the tweet propagation graph G with the concatenated feature vectors p_{i,j} is passed through a graph convolution neural network (GCN) [3] to obtain an encoded representation g of the graph G. We also extract network and temporal features t from the graph G. Finally, the encoded representation g of graph G and the extracted network and temporal features t are concatenated and passed to a two-layer fully connected neural network for classification.
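The per-node feature construction can be sketched as follows; the centralities are computed with networkx, and the resulting vectors p_{i,j} would then be fed to the GCN encoder of [3], which is not reproduced here.

```python
import numpy as np
import networkx as nx

def node_features(G, profile_vectors):
    """Concatenate centrality and profile features for every node of a tweet
    propagation graph G, producing the vectors p_{i,j} described in Sect. 3.3.

    `profile_vectors` maps node id -> 8-entry profile vector (Sect. 3.1)."""
    deg = nx.degree_centrality(G)      # degree centrality [15]
    clo = nx.closeness_centrality(G)   # closeness centrality [9]
    pr = nx.pagerank(G)                # PageRank centrality [10]
    feats = {}
    for v in G:
        centrality = np.array([deg[v], clo[v], pr[v]])
        feats[v] = np.concatenate([centrality, profile_vectors[v]])
    return feats   # per-node vectors later passed through the GCN encoder
```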
Fig. 1 The proposed model CGCN is represented in the diagram. First, a tweet propagation graph G is formed by following the hierarchy news → tweet → retweet/replies. Then a feature vector p_{i,j} is formed by concatenating the centrality of a node and the user profile feature vector associated with the node n_{i,j}. Subsequently, the graph G is passed through a graph convolution neural network (GCN) to obtain an encoded representation g of the graph G. Similarly, structural and temporal features t are obtained from the graph. Finally, the feature vectors t and g are concatenated and passed through a fully connected neural network for classification
4 Experimental Setup To study the effectiveness of different features for fake news detection in social media networks, we consider different sets of baselines. The details of the baselines are discussed below.
1. User Profile Feature (UPF): This baseline is similar to the proposed model, but only the user profile feature is used as the node feature in graph G, and the roles of network structure and temporal features are eliminated.
2. Centrality: This baseline is similar to the proposed model, but only the node centrality feature is used as the node feature in graph G, and the roles of network structure and temporal features are eliminated.
3. Network Structure and Temporal Features (Network): This is a simple baseline in which a multi-layer perceptron is trained over the network structural and temporal features for fake news classification.
4. Centrality with User Profile Feature (Centrality + UPF): This is a variation of CGCN in which only the encoded representation g of graph G is passed through a multi-layer perceptron for fake news classification, and the roles of network structural and temporal features are eliminated.
5 Datasets This study uses two publicly available datasets, Twitter15 and Twitter16 [7], to study the response of the proposed model. Table 1 presents the characteristics of the datasets. Table 2 presents the number of users whose profile information is not publicly available due to privacy concerns.
Table 1 Characteristics of Twitter15 and Twitter16 datasets

Characteristic           Twitter15   Twitter16
# Source tweets          742         412
# True                   372         205
# False                  370         207
# Users                  488,211     268,721
# Retweets per story     292.19      308.7
# Words per source       13.25       12.81
Table 2 Number of users with missing user profile information of Twitter15 and Twitter16

                                       Twitter15   Twitter16
Total number of users                  488,811     256,721
# Users without profile information    92,512      59,564
Table 3 Details of hyperparameters used in the experimental setup

Hyperparameter                                    Value
Batch size                                        8
Learning rate                                     0.01
Activation function                               Softmax
Loss function                                     Cross entropy
# Epochs                                          500
Max # of nodes in a tweet propagation graph       250
# Layers in feedforward NN                        2
Table 3 presents the details of the hyperparameters used to produce the results presented in Table 4.
5.1 Results and Discussion Table 4 presents the performance comparison between the baseline models and the proposed model CGCN. By comparing the performance of the baseline models, the following observations can be made.
– The performance of the Centrality and UPF models over the Twitter15 and Twitter16 datasets indicates that centrality and user profile features are significant for detecting fake news. Still, these features are not good indicators of true news.
Table 4 Performance comparison between the proposed and baseline models over two benchmark datasets, namely Twitter15 and Twitter16. Here, Acc. indicates the accuracy, Fake indicates the F-measure on the Fake class, and True indicates the F-measure on the True class

                               Twitter15                  Twitter16
Model                          Acc.    Fake    True       Acc.    Fake    True
Baseline  Centrality           0.75    0.856   0.075      0.475   0.619   0.156
          UPF                  0.73    0.843   0.047      0.573   0.616   0.461
          Network              0.704   0.784   0.153      0.695   0.648   0.518
          Centrality + UPF     0.771   0.808   0.47       0.682   0.658   0.693
Proposed  CGCN                 0.872   0.884   0.5        0.951   0.8     0.82
– The performance of the Network model indicates that the tweet propagation networks of fake and true news are different; hence, it is a significant feature for fake news detection. The performance of the Network model is superior to those of the Centrality and UPF models.
– The Centrality + UPF model outperforms all other baseline counterparts. This indicates that combining centrality and user profile features detects fake news more efficiently.
– The proposed centrality-aware graph convolution network CGCN outperforms its baseline counterparts. This establishes the superiority of the proposed model.
– The proposed model CGCN relies only on network-based features, and from Table 4 it is evident that it achieves state-of-the-art performance over two publicly available benchmark datasets. Since our proposed model does not include any text-related feature, it can detect fake news over social media networks in any vernacular language.
6 Conclusion This paper proposes a language-independent fake news detection method, CGCN, for detecting fake news in social media networks. CGCN utilizes user profile, centrality, and structural and temporal features with a graph convolution network. We compare our proposed model with several nontrivial baselines. Our experimental results suggest that our proposed model can efficiently detect fake news in social media networks without considering any text in posts and replies. This makes our proposed model language-independent. Since CGCN relies only on graph-based features, it can efficiently detect fake news in any vernacular language. In future work, we will investigate more graph-based features for the fake news detection task.
References
1. Chandra S, Mishra P, Yannakoudakis H, Nimishakavi M, Saeidi M, Shutova E (2020) Graph-based modeling of online communities for fake news detection. arXiv:2008.06274
2. Higdon N (2020) The anatomy of fake news: a critical news literacy education. University of California Press
3. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings. OpenReview.net
4. Kumar S, Kumar G, Singh SR (2022) TextMinor at CheckThat! 2022: fake news article detection using RoBERTa. In: Working notes of CLEF
5. Liu Y, Wu Y-FB (2018) Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In: Thirty-second AAAI conference on artificial intelligence
6. Lu Y-J, Li C-T (2020) GCAN: graph-aware co-attention networks for explainable fake news detection on social media. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 505–514
7. Ma J, Gao W, Wong K-F (2017) Detect rumors in microblog posts using propagation structure via kernel learning. Association for Computational Linguistics
8. Monti F, Frasca F, Eynard D, Mannion D, Bronstein MM (2019) Fake news detection on social media using geometric deep learning. arXiv:1902.06673
9. Okamoto K, Chen W, Li X-Y (2008) Ranking of closeness centrality for large-scale social networks. In: International workshop on frontiers in algorithmics. Springer, pp 186–195
10. Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab
11. Shu K, Cui L, Wang S, Lee D, Liu H (2019) dEFEND: explainable fake news detection. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 395–405
12. Shu K, Mahudeswaran D, Wang S, Liu H (2020) Hierarchical propagation networks for fake news detection: investigation and exploitation. In: Proceedings of the international AAAI conference on web and social media, vol 14, pp 626–637
13. Yang Y, Wang Y, Wang L, Meng J (2022) PostCom2DR: utilizing information from post and comments to detect rumors. Expert Syst Appl 189:116071
14. Yuan C, Ma Q, Zhou W, Han J, Hu S (2020) Early detection of fake news by utilizing the credibility of news, publishers, and users based on weakly supervised learning. In: Proceedings of the 28th international conference on computational linguistics, pp 5444–5454
15. Zhang J, Luo Y (2017) Degree centrality, betweenness centrality, and closeness centrality in social network. In: 2017 2nd international conference on modelling, simulation and applied mathematics (MSAM2017). Atlantis Press, pp 300–303
16. Zhao Z, Zhao J, Sano Y, Levy O, Takayasu H, Takayasu M, Li D, Junjie W, Havlin S (2020) Fake news propagates differently from real news even at early stages of spreading. EPJ Data Sci 9(1):7
17. Zhou X, Zafarani R (2019) Network-based fake news detection: a pattern-driven approach. ACM SIGKDD Explor Newslett 21(2):48–60
Private Blockchain-Enabled Security Framework for IoT-Based Healthcare System Sourav Saha, Ashok Kumar Das, and Debasis Giri
Abstract The Internet of Things (IoT)-enabled healthcare has become one of the important technologies due to advances in information and communications technology (ICT). However, there are several security and privacy issues in an IoT-enabled healthcare system due to security threats and attacks mounted by an adversary. The blockchain technology offers immutability, decentralization, and transparency while the data is stored. In this paper, we design a novel access control scheme, called private blockchain-enabled security framework for IoT-based healthcare system (in short, PBESF-IoTHS), which is not only efficient in computation and communication but also offers superior security and more functionality features as compared to other relevant competing schemes. The blockchain-based simulation results also demonstrate the proposed PBESF-IoTHS's practical usefulness by measuring the computational time required for a varied number of blocks mined in a Peer-to-Peer (P2P) blockchain network and a varied number of transactions per block. Keywords Internet of Things (IoT) · Blockchain · Healthcare system · Authentication and key agreement · Security
S. Saha · A. K. Das (B) Center for Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad 500 032, India e-mail: [email protected]; [email protected] S. Saha e-mail: [email protected] D. Giri Department of Information Technology, Maulana Abul Kalam Azad University of Technology, Kolkata 741 249, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_8
1 Introduction The furtherance of the "Internet of Things (IoT)" transforms the healthcare system immensely and provides quality, affordable, and efficient healthcare services to human lives. Healthcare practitioners, patients, and other healthcare service providers use IoT-based wearable devices or IoT-based diagnosis tools to provide advanced treatment to people. Billions of IoT smart sensors are connected with the healthcare system for continuous health tracking and monitoring. The healthcare infrastructure connected with various sensing devices can collect, transmit, process, and store different sensitive medical data that need to be securely communicated. In a healthcare system, various vital data, such as "electroencephalography (EEG)", "electrocardiography (ECG)", "heart rate", "glucose level", "body temperature", "blood pressure (BP)", "oxygen saturation (SpO2)", and "respiration rate/heartbeat data", are continuously transmitted from a patient. In such a scenario, it is highly dangerous if the confidential and private vital data of the patient are accessed by an unauthorized user or an attacker. Therefore, there is a great demand for designing a highly secure, robust, and efficient access control framework for a healthcare system. As per the recent report "2021 H2 Healthcare Data Breach Report" by Critical Insight [1], healthcare-related data breaches increased from 14 million to 44.9 million records, i.e., an 84% increase of data breaches in the healthcare sector during 2018–2021. The majority of healthcare sectors do not meet the appropriate security requirements, and due to this, various data leakages have been reported in the healthcare infrastructure [2]. Considering today's needs, an appropriate security framework needs to be provided to address the security threats. The healthcare sector struggles with single points of failure and scalability because IoT devices collect an immeasurable amount of heterogeneous medical data, and hence, it opens the door to applying the blockchain technology. The inherent features of blockchain, such as immutability, irreversibility, scalability, and transparency, enhance the medical system and preserve the patients' privacy and security [21]. The rapid growth in population, the increase in various diseases, and the lack of security suggest that there is an extreme surge in designing robust and efficient blockchain-based security frameworks for healthcare systems.
1.1 System Models This section explains the network model as well as the threat model used in this paper.
1.1.1 Network Model
The network model of PBESF-IoTHS is presented in Fig. 1. In PBESF-IoTHS, a private blockchain-based security framework is considered to connect various entities and other hospital networks. All the respective Hospital Centers (HCs) are connected with each other to form a "Peer-to-Peer (P2P)" network. In PBESF-IoTHS, various components are connected with an HC, such as patients, doctors, diagnostic centers, ambulances, in-patients, and operation theaters, which collect data from the smart IoT devices attached to these entities and send it to the HC.
Fig. 1 Private blockchain-envisioned IoT-based healthcare system
Each hospital center HC_j, j = 1, 2, 3, ..., n_hc, where n_hc denotes the total number of HCs, is considered a trusted entity, and it communicates with the associated patients Pt_k, k = 1, 2, 3, ..., n_Pt, where n_Pt is the number of registered patients, using the designed access control framework. It is essential for a patient Pt, being a user, to establish a session key with its attached HC to build a secure connection for data exchange. The information of the Pt that will be stored in the HC needs to be encrypted with the public key of HC, and the encrypted data, considered as transactions, needs to be put into blocks that are then stored in the blockchain network created by the P2P hospital centers after running a consensus protocol among them.
1.1.2 Threat Model
The two popular threat models, namely the "Dolev–Yao (DY) model" [12] and "Canetti and Krawczyk's adversary model (CK-adversary model)" [6], have been applied in the proposed design.
– The DY model allows an attacker, say A, to access the communication channel to eavesdrop and also to perform modification, deletion, and insertion of forged data during the communication.
– Another dominant model, known as the "CK-adversary model", permits an attacker to have all the capabilities of the DY model along with other capabilities, such as compromise of the secret credentials, i.e., the "session keys" established between two parties in a current session and the "session states through session hi-jacking attacks."
The end-point entities, like users in an IoT-enabled healthcare system, cannot in general be trusted. However, each HC is considered a trusted entity. An adversary can compromise all the secret credentials stored in a lost or stolen user's smart card or mobile device with the help of "power analysis attacks" [20]. We presume that the adversary A can guess a low-entropy password or an identity, whereas correctly guessing more than one secret credential in polynomial time is a difficult job for A.
1.2 Research Contributions The main contributions are as follows:
– The proposed private blockchain-enabled security framework for IoT-based healthcare system allows access control between a user and its associated HC, so that they mutually authenticate each other and also establish a session key between them for secret communication. From the data aggregated securely by the HCs from the various users in the IoT-based healthcare system, each HC forms blocks that are mined by a P2P HC network for data storage, so that the encrypted data remains secure in the private blockchain.
– The proposed scheme allows a user to update his/her password at any time without further contacting the associated HC. This minimizes both the communication and computational costs of this procedure.
– The security framework is validated through the security analysis, and a comparative study with recently proposed schemes reveals that the proposed scheme is secure and efficient.
– A blockchain-based simulation study has been conducted to measure the computational time required for a varied number of blocks mined in a Peer-to-Peer (P2P) blockchain network and a varied number of transactions per block, in order to demonstrate the practical use of the designed framework.
1.3 Paper Outline The remainder of the paper is arranged as follows. Section 2 presents the existing related schemes and their analysis. Section 3 discusses the proposed security framework (PBESF-IoTHS). Section 4 then discusses the blockchain implementation phase, followed by the security analysis in Sect. 5. Next, a comparative analysis is shown in Sect. 6. The blockchain simulation study is conducted in Sect. 7, and finally, we conclude the article in Sect. 8.
2 Related Work In this section, the existing work on access control for IoT-related environments is presented. Access control and key management are considered the primary security services to secure various IoT-enabled networking environments [4, 8–10, 17, 22, 24]. An efficient access control scheme was proposed by Li et al. [14] in the area of "Wireless Sensor Networks (WSNs)" focusing on the IoT environment. Their scheme is not only computationally heavy, as it uses "bilinear pairing", but also fails to resist the "man-in-the-middle attack", has no anonymity preservation property, and is also vulnerable to the "Ephemeral Secret Leakage (ESL) attack" under the "CK-adversary model". Luo et al. [16] proposed another access control mechanism in WSNs based on the IoT environment. However, their scheme uses the computationally heavyweight "bilinear pairing" operation and also fails to resist the "man-in-the-middle attack" and the "ESL attack under the CK-adversary model". Li et al. [15] proposed a three-factor authentication scheme for "wireless medical sensor networks" between a user, the gateway node, and a sensor node, which uses a "fuzzy commitment scheme" and an "error-detection code (Bose–Chaudhuri–Hocquenghem (BCH))" [18] code conversion function and inverse conversion function. However, the session key was created in their scheme with the help of the
temporal random secrets, which makes it vulnerable to the "ESL attack under the CK-adversary model." Additionally, their scheme also fails to secure against the "replay attack" and the "impersonation attack".
3 Proposed Security Framework In this section, we elaborate on the various phases associated with our proposed "private blockchain-enabled security framework for IoT-based healthcare system (PBESF-IoTHS)." The list of notations used in this article is tabulated in Table 1. It is assumed that the clocks of the various entities in the network are synchronized, because a time-stamping mechanism has been applied for replay attack protection in PBESF-IoTHS. This mechanism is also standard, as it is applied in designing other access control methods in IoT-related networking scenarios [17, 22, 23].
Table 1 List of notations with their interpretation

E_q(γ, δ): A non-singular elliptic curve of the form "y^2 = x^3 + γx + δ (mod q) with 4γ^3 + 27δ^2 ≠ 0 (mod q)", two constants γ, δ ∈ Z_q = {0, 1, 2, ..., q − 1}, and q a large prime so that the "elliptic curve discrete logarithm problem (ECDLP)" is intractable
G: A base point in E_q(γ, δ)
k · G: Elliptic curve point (scalar) multiplication; k · G = G + G + ··· + G (k times), k ∈ Z_q*
Q + R: Elliptic curve point addition; Q, R ∈ E_q(γ, δ)
γ ∗ δ: Ordinary multiplication of two numbers γ, δ ∈ Z_q
RA: Trusted registration authority
pr_RA: Private key of the RA
Pub_RA: RA's public key; Pub_RA = pr_RA · G
U_i, HC_j: ith user and jth hospital center, respectively
ID_Ui, PW_Ui: U_i's identity and password, respectively
Bio_Ui: U_i's personal biometric template
Gen(·), Rep(·): Fuzzy extractor "probabilistic generation" and "deterministic reproduction" functions, respectively
TID_Ui, RID_Ui: Temporary and pseudo-identities of U_i, respectively
CT_1, CT_2, CT_3: Current timestamps
ΔT: "Maximum allowable transmission delay associated with a message"
3.1 System Initialization Phase The trusted "Registration Authority (RA)" selects the following system parameters:
– A "cryptographic one-way collision-resistant hash function of the type h: {0, 1}* → {0, 1}^{O_l}, which takes an arbitrary-length input data (string) x ∈ {0, 1}* and maps it to a fixed-length output string (message digest) y = h(x) ∈ {0, 1}^{O_l} of length O_l bits", is selected by the RA. For instance, one may choose the "secure hash algorithm (SHA-1) [19] that produces a 160-bit message digest." For better security, one may also choose the "SHA-256 hash function" [19].
– A "non-singular elliptic curve" E_q(γ, δ) over a finite field GF(q) of the type y^2 = x^3 + γx + δ (mod q) such that 4γ^3 + 27δ^2 ≠ 0 (mod q), with γ, δ ∈ Z_q = {0, 1, ..., q − 1}, is then selected by the RA, where q is a "sufficiently large prime, say 160 bits, so that the Elliptic Curve Discrete Logarithm Problem (ECDLP) and Elliptic Curve Decisional Diffie–Hellman Problem (ECDDHP)" become intractable.
– The RA then chooses a base point G ∈ E_q(γ, δ). Next, a biometric verification fuzzy algorithm, called a fuzzy extractor, is selected by the RA. The fuzzy extractor is associated with the two functions: (1) "probabilistic generation" Gen(·) and (2) "deterministic reproduction" Rep(·) [11].
– The RA also selects its own (master) private key pr_RA ∈ Z_q* = {1, 2, 3, ..., q − 1} and computes the respective public key Pub_RA = pr_RA · G. Finally, the RA declares {E_q(γ, δ), G, Pub_RA, Gen(·), Rep(·), h(·)} as public and keeps pr_RA as the (secret) private key to itself.
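A minimal sketch of this initialization using the Python cryptography library is shown below; NIST P-256 is used as a concrete stand-in for the generic curve E_q(γ, δ) (the paper only requires a sufficiently large prime q), SHA-256 instantiates h(·), and the fuzzy extractor functions Gen(·)/Rep(·) are not implemented here.

```python
import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ec

def h(*parts: bytes) -> bytes:
    """One-way collision-resistant hash h(.) instantiated with SHA-256."""
    return hashlib.sha256(b"||".join(parts)).digest()

# RA's (master) private key pr_RA and public key Pub_RA = pr_RA . G,
# with NIST P-256 standing in for E_q(gamma, delta).
pr_RA = ec.generate_private_key(ec.SECP256R1())
Pub_RA = pr_RA.public_key()

# The public parameters {E_q, G, Pub_RA, Gen(.), Rep(.), h(.)} would be
# published; here we only serialize the public key for distribution.
pub_bytes = Pub_RA.public_bytes(
    serialization.Encoding.X962,
    serialization.PublicFormat.UncompressedPoint,
)
```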
3.2 Registration Phase
3.2.1 Hospital Center Registration
In order to register a hospital center, say HC_j, the steps below are followed:
– HC_j selects its own identity ID_HCj and a (random) private key pr_HCj ∈ Z_q*. HC_j then computes its public key Pub_HCj = pr_HCj · G and its pseudo identity RID_HCj = h(ID_HCj || r_HCj) using a generated one-time random secret r_HCj ∈ Z_q*. HC_j sends the registration information {RID_HCj, Pub_HCj} to the RA securely.
– After receiving the registration information from HC_j, the RA computes the temporal credential as TC_HCj = h(RID_HCj || pr_RA || RTS_HCj), where RTS_HCj is the registration timestamp of HC_j, and sends {TC_HCj} securely to HC_j.
– After receiving the information from the RA, HC_j records {pr_HCj, RID_HCj, TC_HCj} in its secure database. The RA publishes Pub_HCj as public. The working flow of this registration process is shown in Fig. 2.
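The hash computations of this phase can be sketched as follows; the secure channel and the serialization of the RA's private scalar into bytes are abstracted away, and all byte strings are illustrative placeholders.

```python
import hashlib
import os
import time

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def hc_registration(ID_HCj: bytes, pr_RA_bytes: bytes):
    """Sketch of the HC_j registration hash computations (Sect. 3.2.1)."""
    r_HCj = os.urandom(32)                      # one-time random secret r_HCj
    RID_HCj = h(ID_HCj, r_HCj)                  # pseudo identity RID_HCj = h(ID_HCj || r_HCj)
    RTS_HCj = str(int(time.time())).encode()    # registration timestamp
    # RA side: temporal credential TC_HCj = h(RID_HCj || pr_RA || RTS_HCj)
    TC_HCj = h(RID_HCj, pr_RA_bytes, RTS_HCj)
    return RID_HCj, TC_HCj
```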
Fig. 2 Registration process of HC_j
3.2.2 User Registration
In order to enrol/register a user, say U_i, the RA and U_i perform the steps below:
– U_i selects its own identity ID_Ui and password PW_Ui. Next, U_i also imprints his/her biometric Bio_Ui at the sensor of a specific terminal to generate a "biometric secret key" σ_Ui and the corresponding "public reproduction parameter" τ_Ui as Gen(Bio_Ui) = (σ_Ui, τ_Ui).
– U_i generates a random secret r_Ui ∈ Z_q*, computes its pseudo identity RID_Ui = h(ID_Ui || r_Ui) and pseudo password RPW_Ui = h(PW_Ui || σ_Ui || r_Ui), and sends the registration credentials {RID_Ui, RPW_Ui} to the RA via a secure channel.
– The RA generates a registration timestamp RTS_Ui, a (random) temporary identity TID_Ui, and a session number (count) SNUM_Ui initialized to zero. The RA computes the temporary credential TC_Ui = h(RID_Ui || RPW_Ui || RTS_Ui || pr_RA) and sends the credentials {TC_Ui, TID_Ui, SNUM_Ui = 0} back to U_i via a secure channel.
– U_i generates a private key pr_Ui ∈ Z_q* and computes the respective public key Pub_Ui = pr_Ui · G. U_i then computes TC*_Ui = TC_Ui ⊕ h(σ_Ui || RID_Ui || PW_Ui), r*_Ui = r_Ui ⊕ h(PW_Ui || ID_Ui || σ_Ui), Ver_Ui = h(RID_Ui || TC_Ui || σ_Ui || PW_Ui), and pr*_Ui = pr_Ui ⊕ h(PW_Ui || TC_Ui || σ_Ui), and stores them in its mobile device MD_Ui.
– The RA generates a shared secret SS_Ui ∈ Z_q* between U_i and HC_j and sends it securely to U_i and HC_j. The RA finally sends the information (TID_Ui, RID_Ui, SNUM_Ui = 0, SS_Ui) corresponding to each registered user U_i to HC_j via a secure channel. HC_j then stores these parameters in its secure database.
– U_i computes SS*_Ui = SS_Ui ⊕ h(ID_Ui || TC_Ui || σ_Ui) and stores it in the mobile device MD_Ui. Finally, U_i declares Pub_Ui as public.
The working flow of this registration process is shown in Fig. 3. In addition, the stored credentials at both HC_j's secure database and U_i's mobile device MD_Ui are shown in Figs. 4 and 5, respectively.
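The XOR-masked credentials kept in the mobile device can be illustrated with the sketch below; the identity, password, biometric secret key, and temporal credential are all illustrative placeholder byte strings rather than values produced by the actual scheme.

```python
import hashlib
import os

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"||".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Placeholder inputs (in the scheme these come from the user, the fuzzy
# extractor, and the RA).
ID_Ui, PW_Ui, sigma_Ui = b"alice", b"password", os.urandom(32)
r_Ui = os.urandom(32)
RID_Ui = h(ID_Ui, r_Ui)          # pseudo identity RID_Ui = h(ID_Ui || r_Ui)
TC_Ui = os.urandom(32)           # temporal credential received from the RA

# Masked values stored in the mobile device MD_Ui (Sect. 3.2.2):
TC_star = xor(TC_Ui, h(sigma_Ui, RID_Ui, PW_Ui))   # TC*_Ui
r_star = xor(r_Ui, h(PW_Ui, ID_Ui, sigma_Ui))      # r*_Ui
Ver_Ui = h(RID_Ui, TC_Ui, sigma_Ui, PW_Ui)         # verifier Ver_Ui
```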
Fig. 3 Registration process of user U_i
Fig. 4 HC_j's stored credentials in its secure database
Fig. 5 U_i's stored credentials in his/her mobile device MD_Ui
3.3 Access Control and Key Agreement Phase For mutual authentication and key agreement between a registered user U_i and its associated HC_j through the access control process, the steps below are executed:
– Step 1: The registered user U_i first provides his/her credentials, i.e., the identity ID_Ui and password PW'_Ui, and imprints his/her personal biometric Bio'_Ui at the sensor of the mobile device MD_Ui. MD_Ui proceeds to compute σ'_Ui = Rep(Bio'_Ui, τ_Ui), provided that the "Hamming distance between the registered biometric Bio_Ui and the current biometric Bio'_Ui is less than or equal to an error-tolerance threshold value et." Then, MD_Ui computes r_Ui = r*_Ui ⊕ h(PW'_Ui || ID_Ui || σ'_Ui), RID_Ui = h(ID_Ui || r_Ui), TC_Ui = TC*_Ui ⊕ h(σ'_Ui || RID_Ui || PW'_Ui), and Ver'_Ui = h(RID_Ui || TC_Ui || σ'_Ui || PW'_Ui). MD_Ui checks if the verifying condition Ver'_Ui = Ver_Ui holds. If it holds, MD_Ui computes U_i's secret key as pr_Ui = pr*_Ui ⊕ h(PW'_Ui || TC_Ui || σ'_Ui) and the shared secret key SS_Ui = SS*_Ui ⊕ h(ID_Ui || TC_Ui || σ'_Ui).
– Step 2: MD_Ui generates a current timestamp CT_1 and a random secret r_1 ∈ Z_q*, and calculates R_1 = h(r_1 || TC_Ui || pr_Ui || CT_1) · G, X_1 = h(pr_Ui || SS_Ui || SNUM_Ui || σ'_Ui || PW'_Ui || ID_Ui) ⊕ h(RID_Ui || SS_Ui || TID_Ui || CT_1), and Y_1 = h(R_1 || X_1 ||
SS_Ui || RID_Ui || TID_Ui || CT_1), and then U_i sends the message MSG_1 = {TID_Ui, R_1, X_1, Y_1, CT_1} to HC_j via a public channel.
– Step 3: If the message MSG_1 is received at time T_1, HC_j first validates the received timestamp by checking |CT_1 − T_1| ≤ ΔT, where ΔT is the "maximum allowable transmission delay." If it is valid, HC_j fetches (SNUM_Ui, RID_Ui, SS_Ui) from its secure database corresponding to TID_Ui. Using the extracted credentials, HC_j calculates Y'_1 = h(R_1 || X_1 || SS_Ui || RID_Ui || TID_Ui || CT_1), and if Y'_1 = Y_1, the message is considered authentic; else, the phase is discarded.
– Step 4: HC_j calculates x_1 = h(pr_Ui || SS_Ui || SNUM_Ui || σ_Ui || PW_Ui || ID_Ui) = X_1 ⊕ h(RID_Ui || SS_Ui || TID_Ui || CT_1), generates a current timestamp CT_2 and a random secret r_2 ∈ Z_q* to compute R_2 = h(r_2 || TC_HCj || pr_HCj || CT_2) · G, the session key shared with U_i as SK_{HCj,Ui} = h(h(r_2 || TC_HCj || pr_HCj || CT_2) · R_1 || x_1 || CT_2), and the session key verifier as SKV_1 = h(SK_{HCj,Ui} || CT_1 || CT_2). After that, HC_j generates a new temporary identity TID^new_Ui for U_i, calculates TID*_Ui = TID^new_Ui ⊕ h(TID_Ui || SK_{HCj,Ui} || CT_2), and sends the message MSG_2 = {TID*_Ui, R_2, SKV_1, CT_2} to U_i via a public channel.
– Step 5: After receiving MSG_2 from HC_j, MD_Ui checks the validity of the timestamp CT_2. If the timestamp validation fails, the phase is discarded; otherwise, it calculates the session key shared with HC_j as SK_{Ui,HCj} = h(h(r_1 || TC_Ui || pr_Ui || CT_1) · R_2 || h(pr_Ui || SS_Ui || SNUM_Ui || σ'_Ui || PW_Ui || ID_Ui) || CT_2) and SKV'_1 = h(SK_{Ui,HCj} || CT_1 || CT_2). If SKV'_1 = SKV_1, the session key is valid, and MD_Ui stores SK_{Ui,HCj} = SK_{HCj,Ui} and SNUM_Ui = SNUM_Ui + 1 in its memory. After that, MD_Ui also generates a current timestamp CT_3 to calculate SKV_2 = h(SK_{Ui,HCj} || CT_3 || SNUM_Ui) and sends the message MSG_3 = {SKV_2, CT_3} to HC_j via a public channel.
– Step 6: HC_j checks the timestamp CT_3, and if it is valid, sets SNUM_Ui = SNUM_Ui + 1 and computes SKV'_2 = h(SK_{HCj,Ui} || CT_3 || SNUM_Ui). After that, HC_j checks if SKV'_2 = SKV_2 holds. If it does, it proceeds to store SK_{HCj,Ui} (= SK_{Ui,HCj}) for secure communication with U_i.
The above-discussed phase is finally summarized in Fig. 6.
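The core of this key agreement is Diffie–Hellman-like: both sides multiply their own hash-derived ephemeral scalar into the other party's public point, so they hash the same curve point into the session key. The sketch below illustrates this property with the Python cryptography library, using ordinary ephemeral key pairs as stand-ins for the scalars h(r_1 || TC_Ui || pr_Ui || CT_1) and h(r_2 || TC_HCj || pr_HCj || CT_2), and a placeholder byte string for the remaining hashed material (x_1 and CT_2); it is not the full protocol.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric import ec

# Ephemeral key pairs standing in for the hash-derived scalars of Steps 2 and 4.
a = ec.generate_private_key(ec.SECP256R1())   # user side, R_1 = a.G
b = ec.generate_private_key(ec.SECP256R1())   # hospital-center side, R_2 = b.G
R1, R2 = a.public_key(), b.public_key()

# Both sides obtain the same curve point (a.R_2 = b.R_1 = ab.G) and hash it
# together with the remaining secret material to derive the session key.
extra = b"x1||CT2"                             # placeholder for x_1 and CT_2
sk_user = hashlib.sha256(a.exchange(ec.ECDH(), R2) + extra).digest()
sk_hc = hashlib.sha256(b.exchange(ec.ECDH(), R1) + extra).digest()
assert sk_user == sk_hc                        # SK_{Ui,HCj} == SK_{HCj,Ui}
```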
Fig. 6 Authentication and key agreement process between U_i and HC_j
3.4 User Password Change Phase For security reasons, it is preferable for a registered user U_i to update his/her current password. U_i may not opt to update the biometric, as it does not change over time. The following steps are executed:
– U_i first enters his/her identity ID_Ui and password PW'_Ui, and imprints his/her personal biometric Bio'_Ui at the sensor of the mobile device MD_Ui. If the verification of all these credentials is successful by executing Step 1 described in Sect. 3.3, MD_Ui asks U_i to input a new password.
– U_i enters the new password PW^n_Ui, and MD_Ui computes TC^n_Ui = TC_Ui ⊕ h(σ'_Ui || RID_Ui || PW^n_Ui), r^n_Ui = r_Ui ⊕ h(PW^n_Ui || ID_Ui || σ'_Ui), Ver^n_Ui = h(RID_Ui || TC_Ui || σ'_Ui || PW^n_Ui), pr^n_Ui = pr_Ui ⊕ h(PW^n_Ui || TC_Ui || σ'_Ui), and SS^n_Ui = SS_Ui ⊕ h(ID_Ui || TC_Ui || σ'_Ui).
– Finally, MD_Ui replaces the credentials {TC*_Ui, r*_Ui, Ver_Ui, pr*_Ui, and SS*_Ui} with {TC^n_Ui, r^n_Ui, Ver^n_Ui, pr^n_Ui, and SS^n_Ui} in its memory to complete the password update process.
4 Blockchain Implementation Phase 4.1 Block Creation The structure of a block used in the security framework is presented in Fig. 7. After receiving the patient's data, the respective HC_j encrypts the confidential data Tx_i with its public key Pub_HCj and creates the n_tn transactions of the form Enc_PubHCj(Tx_i), i = 1, 2, ..., n_tn, where Enc(·) is the ECC-based public key encryption. After that, HC_j calculates the Merkle tree root (MT_Root) from the available encrypted transactions. HC_j computes the current block hash CBh as CBh = h(BKVR || PBh || TS || Pub_HCj), and the signature BSign_ECDSA is performed on top of CBh with the help of the "Elliptic Curve Digital Signature Algorithm (ECDSA)" [13] to make a complete block BK_g.
Fig. 7 Private block formation and addition to a blockchain
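A sketch of the block creation step is given below, using SHA-256 for h(·), a simple pairwise-SHA-256 Merkle root over placeholder encrypted transactions, and the Python cryptography library's ECDSA for the signature; the field sizes, transaction contents, and serialization choices are illustrative assumptions, not the exact encoding used in the paper's implementation.

```python
import hashlib
import time
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Pairwise SHA-256 Merkle root over the encrypted transactions."""
    level = [sha256(tx) for tx in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])              # duplicate last node on odd levels
        level = [sha256(l + r) for l, r in zip(level[::2], level[1::2])]
    return level[0]

# Hospital center's key pair (Pub_HCj = pr_HCj . G).
pr_HCj = ec.generate_private_key(ec.SECP256R1())
pub_bytes = pr_HCj.public_key().public_bytes(
    serialization.Encoding.X962, serialization.PublicFormat.UncompressedPoint)

encrypted_txs = [b"Enc(tx1)", b"Enc(tx2)", b"Enc(tx3)"]   # Enc_PubHCj(Tx_i) placeholders
MT_root = merkle_root(encrypted_txs)
BKVR, PBh, TS = b"\x00\x01", sha256(b"previous block"), str(int(time.time())).encode()

# Current block hash CBh = h(BKVR || PBh || TS || Pub_HCj), then ECDSA over CBh.
CBh = sha256(BKVR + PBh + TS + pub_bytes)
BSign_ECDSA = pr_HCj.sign(CBh, ec.ECDSA(hashes.SHA256()))
```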
4.2 Block Addition and Consensus in Blockchain After a block BK_g is successfully built by HC_j, a leader is picked which can add the block to the blockchain network. In PBESF-IoTHS, HC_j selects a leader from its P2P nodes with the help of the existing voting-based "Practical Byzantine Fault Tolerance (PBFT)" algorithm [7] to reach a consensus among the HCs, as in [5]. In addition, we assume that the HCs are synchronized among their peer nodes and that the mined blocks are updated in the local ledgers.
5 Security Analysis Based on the threat model defined in Sect. 1.1.2, we show that the proposed scheme (PBESF-IoTHS) is robust against various attacks, as shown in the following propositions. Proposition 1 PBESF-IoTHS is secure against replay attack. Proof The communication between U_i and HC_j occurs over a public channel, and the communicated messages are MSG_1 = {TID_Ui, R_1, X_1, Y_1, CT_1}, MSG_2 = {TID*_Ui, R_2, SKV_1, CT_2}, and MSG_3 = {SKV_2, CT_3}, which include unique timestamps and random secrets. The freshness of each message MSG_l, l = 1, 2, 3, is verified by the respective recipient by validating the timestamp attached to the message. If the timestamp validation fails, the message is treated as an old one and, as a result, as a replayed message. Therefore, replaying existing messages without current timestamps can be easily identified by the receiver. Hence, PBESF-IoTHS is secure against replay attack.
Proposition 2 PBESF-IoTHS is secure against privileged-insider and stolen mobile device attacks. Proof At the time of registration, the communicated messages flow over a secure channel. The RA does not allow random entities to register without having appropriate information. Moreover, the RA removes all the secret credentials from memory after registration. During the user registration, a user U_i sends the credentials {RID_Ui, RPW_Ui} to the RA via a secure channel. Even if a privileged-insider user of the RA, being an adversary A, has the credentials {RID_Ui, RPW_Ui}, he/she cannot compute U_i's identity ID_Ui, password PW_Ui, and biometric Bio_Ui without having the random secret r_Ui ∈ Z_q* and the biometric secret key σ_Ui. Now, assume that after registration, U_i's mobile device MD_Ui has been lost or stolen. The adversary A can then extract the credentials {TID_Ui, SNUM_Ui = 0, TC*_Ui, r*_Ui, Ver_Ui, pr*_Ui = pr_Ui ⊕ h(PW_Ui || TC_Ui || σ_Ui), and SS*_Ui = SS_Ui ⊕ h(ID_Ui || TC_Ui || σ_Ui)}. However, without having the secrets PW_Ui, σ_Ui, and ID_Ui, it is not possible for A to obtain TC_Ui, SS_Ui, r_Ui, and pr_Ui. In this manner, PBESF-IoTHS is secure against the "privileged-insider attack" as well as the "stolen mobile device attack". Proposition 3 PBESF-IoTHS is secure against man-in-the-middle (MiTM) attack. Proof Assume an adversary A collects MSG_1, MSG_2, and MSG_3 during the access control and key agreement phase. After that, A may try to create a valid message MSG'_1 for MSG_1 and send it to HC_j. To do so, A needs to select a random secret r_1 ∈ Z_q* and a timestamp CT_1 on the fly, and then needs to generate R_1, X_1, Y_1. However, without having the appropriate secret information, A will fail to generate a complete MSG'_1. Similarly, by intercepting the messages MSG_2 and MSG_3, A cannot generate valid messages without valid secrets. Therefore, PBESF-IoTHS is resistant to the MiTM attack. Proposition 4 PBESF-IoTHS provides anonymity and untraceability. Proof If, during the communication, A captures the messages MSG_1 = {TID_Ui, R_1, X_1, Y_1, CT_1}, MSG_2 = {TID*_Ui, R_2, SKV_1, CT_2}, and MSG_3 = {SKV_2, CT_3} to find out the identity of the sender, it will not be possible, as none of the messages contains the actual identity of the user U_i. It is also hard for A to compute the identity of U_i from the intercepted messages. Moreover, all the messages are unique and different in each session, due to the involvement of the current timestamps, random secrets, and updated temporary identity. As a result, the adversary A cannot trace the same user U_i over successive sessions. Thus, PBESF-IoTHS provides the anonymity and untraceability features. Proposition 5 PBESF-IoTHS is secure against impersonation attacks. Proof Assume A attempts to act as an authorized entity on behalf of either U_i or HC_j, and wants to construct an authorized message, say MSG_1 = {TID_Ui, R_1, X_1, Y_1, CT_1}. A can choose a random secret r_1 ∈ Z_q* and a timestamp CT_1. But A needs to compute valid R_1 = h(r_1 || TC_Ui || pr_Ui || CT_1) · G, X_1 = h(pr_Ui || SS_Ui ||
SNUM_Ui || σ_Ui || PW_Ui || ID_Ui) ⊕ h(RID_Ui || SS_Ui || TID_Ui || CT_1), and Y_1 = h(R_1 || X_1 || SS_Ui || RID_Ui || TID_Ui || CT_1) to impersonate U_i. However, without having the secret credentials, it is computationally infeasible for A to create a valid MSG_1. Similarly, A will also fail to create the other valid messages MSG_2 and MSG_3 on behalf of HC_j and U_i, respectively. PBESF-IoTHS is thus secure against user as well as hospital center impersonation attacks. Proposition 6 PBESF-IoTHS is secure against the "Ephemeral Secret Leakage (ESL)" attack. Proof At the time of the access control and key agreement phase between U_i and HC_j, both entities need to compute the shared session key SK_{HCj,Ui} (= SK_{Ui,HCj}), where SK_{HCj,Ui} = h(h(r_2 || TC_HCj || pr_HCj || CT_2) · R_1 || x_1 || CT_2) and SK_{Ui,HCj} = h(h(r_1 || TC_Ui || pr_Ui || CT_1) · R_2 || h(pr_Ui || SS_Ui || SNUM_Ui || σ_Ui || PW_Ui || ID_Ui) || CT_2). The session-centric (ephemeral) credentials ("short-term secrets"), such as the random secrets, and the long-term secrets, such as the identity, password, biometric secret key, temporal credentials, and shared secret, are required to generate the session key in each session. Unless both types of secrets are known to an adversary, the session keys in different sessions cannot be computed. Thus, PBESF-IoTHS provides both "perfect forward and backward secrecy", and at the same time the "Ephemeral Secret Leakage (ESL)" attack is resisted in the proposed scheme (PBESF-IoTHS) under the CK-adversary model.
6 Comparative Analysis In this section, a comparative analysis of various functionality and security attributes is presented in Table 2. The compared schemes are those proposed by Li et al. [15], Luo et al. [16], and Li et al. [14]. Table 2 shows the comparison with respect to different "functionality and security attributes" (ATFR1–ATFR7), and it is found that PBESF-IoTHS supports more functionality and security attributes as compared to the schemes of Li et al. [15], Luo et al. [16], and Li et al. [14].
7 Blockchain Simulation Study In this section, a blockchain simulation study for the proposed security framework (PBESF-IoTHS) is presented with the help of node.js and VSCODE 2019. We have considered synthetic healthcare-related data for the patients in the proposed PBESF-IoTHS, where the transactions are stored in the blocks. The setting considered for our simulation study is "CPU Architecture: 64-bit, Processor: 2.60 GHz Intel Core i5-3230M, Memory: 8 GB, OS: Ubuntu 18.04.4 LTS".
Table 2 Comparison of functionality & security attributes of Li et al. [15], Luo et al. [16], Li et al. [14], and PBESF-IoTHS over the attribute features ATFR1–ATFR7. PBESF-IoTHS supports all seven attribute features, whereas each of the compared schemes fails to support several of them.
ATFR1: resilience against replay attack, ATFR2: resilience against privileged-insider attack, ATFR3: resilience against man-in-the-middle attack, ATFR4: support for anonymity property, ATFR5: resilience against impersonation attack, ATFR6: resilience against ESL attack under the CK-adversary model, ATFR7: support for a blockchain-based solution. ✓: a scheme is secure or it supports an attribute feature (ATFR), ×: a scheme is insecure or it does not support an attribute feature (ATFR)
In addition, the sizes of the block version BKVR, previous block hash PBh, Merkle tree root MT_Root, timestamp TS, owner of block HC_j, public key of signer Pub_HCj, current block hash CBh, and signature on the block using the ECDSA signature algorithm BSign_ECDSA are taken as 32, 256, 256, 32, 160, 320, 256, and 320 bits, respectively. Moreover, each encrypted transaction Enc_PubHCj(Tx_i), i = 1, 2, ..., n_tn, consists of two ECC points, which results in (320 + 320) = 640 bits. The total cost of a block is then 1632 + 640 n_tn bits. Two cases are considered in this simulation study, as follows.
– Case 1: Here, the number of transactions per block versus the total computational time (in seconds) required is considered. The number of P2P nodes is taken as 7, and the number of blocks mined is fixed at 15. The simulation results presented in Fig. 8a show that the computational time increases linearly with the number of transactions contained in each block.
– Case 2: In this scenario, the total computational time (in seconds) is measured against the number of blocks mined, where the total number of P2P nodes is 7 and the number of transactions per block is fixed at 30. The simulation results presented in Fig. 8b exhibit a trend similar to Case 1, in which the computational time increases linearly with the number of mined blocks.
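As a quick check of the block-size expression above (header fields summing to 1632 bits plus 640 bits per encrypted transaction), the following short snippet computes the block size for the 30-transactions-per-block setting of Case 2.

```python
# Header fields: BKVR + PBh + MT_Root + TS + owner + Pub_HCj + CBh + BSign_ECDSA
HEADER_BITS = 32 + 256 + 256 + 32 + 160 + 320 + 256 + 320   # = 1632 bits

def block_size_bits(n_tn: int) -> int:
    """Total block size in bits for n_tn encrypted transactions (640 bits each)."""
    return HEADER_BITS + 640 * n_tn

print(block_size_bits(30))   # 1632 + 640*30 = 20832 bits (~2.54 KB)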
Fig. 8 Blockchain simulation results: a Case 1, b Case 2
8 Conclusion This article addresses an important security service needed in an IoT-enabled healthcare system. The proposed access control and key agreement scheme has been combined with blockchain technology to preserve the confidential and private data belonging to the healthcare personnel. The security analysis and comparative analysis show that the proposed scheme is secure against various attacks and is also efficient. Moreover, the proposed scheme provides forward and backward secrecy along with the preservation of the anonymity and untraceability properties. The practical demonstration with respect to the blockchain simulation shows the impact of the proposed scheme. In future, we would like to perform big data analytics on the stored blockchain data related to healthcare, as in [3]. Acknowledgement This work is supported by the Unified Blockchain Framework project for offering National Blockchain Service, Ministry of Electronics & Information Technology, New Delhi, Government of India (Grant No. 4(4)/2021-ITEA.).
References 1. H2 Healthcare Data Breach Report by Critical Insight (2021) https://cybersecurity. criticalinsight.com/2021_H2_HealthcareDataBreachReport. Accessed July 2022 2. Healthcare Cyber Trend Research Report by Infoblox (2022 https://info.infoblox.com/ resources-whitepapers-the-2022-healthcare-cyber-trend-research-report. Accessed July 2022 3. Aujla GS, Chaudhary R, Kumar N, Das AK, Rodrigues JJPC (2018) SecSVA: secure storage, verification, and auditing of big data in the cloud environment. IEEE Commun Mag 56(1):78– 85 4. Bagga P, Das AK, Chamola V, Guizani M (2022) Blockchain-envisioned access control for internet of things applications: a comprehensive survey and future directions. Telecommun Syst 81(1):125–173 5. Bera B, Saha S, Das AK, Kumar N, Lorenz P, Alazab M (2020) Blockchain-envisioned secure data delivery and collection scheme for 5G-based IoT-enabled internet of drones environment. IEEE Trans Veh Technol 69(8):9097–9111 6. Canetti R, Krawczyk H (2002) Universally composable notions of key exchange and secure channels. In: International conference on the theory and applications of cryptographic techniques (EUROCRYPT’02). Amsterdam, The Netherlands, pp 337–351
7. Castro M, Liskov B (2002) Practical Byzantine fault tolerance and proactive recovery. ACM Trans Comput Syst 20(4):398–461 8. Chatterjee S, Das AK (2015) An effective ECC-based user access control scheme with attributebased encryption for wireless sensor networks. Sec Commun Netw 8(9):1752–1771 9. Chatterjee S, Das AK, Sing JK (2014) An enhanced access control scheme in wireless sensor networks. Ad Hoc & Sens Wireless Netw 21(1):121–149 10. Das AK (2012) A random key establishment scheme for multi-phase deployment in large-scale distributed sensor networks. Int J Inf Secur 11(3):189–211 11. Dodis Y, Reyzin L, Smith A (2004) Fuzzy extractors: how to generate strong keys from biometrics and other noisy data. In: International conference on the theory and applications of cryptographic techniques (EUROCRYPT’04). Interlaken, Switzerland, pp 523–540 12. Dolev D, Yao A (1983) On the security of public key protocols. IEEE Trans Inf Theory 29(2):198–208 13. Johnson D, Menezes A, Vanstone S (2001) The elliptic curve digital signature algorithm (ECDSA). Int J Inf Secur 1(1):36–63 14. Li F, Han Y, Jin C (2016) Practical access control for sensor networks in the context of the Internet of Things. Comput Commun 89–90:154–164 15. Li X, Peng J, Obaidat MS, Wu F, Khan MK, Chen C (2020) A secure three-factor user authentication protocol with forward secrecy for wireless medical sensor network systems. IEEE Syst J 14(1):39–50 16. Luo M, Luo Y, Wan Y, Wang Z (2018) Secure and efficient access control scheme for wireless sensor networks in the cross-domain context of the IoT. Secur Commun Netw 2018:1–10 17. Malani S, Srinivas J, Das AK, Srinathan K, Jo M (2019) Certificate-based anonymous device access control scheme for IoT environment. IEEE Int Things J 6(6):9762–9773 18. Massey J (1965) Step-by-step decoding of the Bose-Chaudhuri- Hocquenghem codes. IEEE Trans Inf Theory 11(4):580–585 19. May WE (2015) Secure Hash Standard. http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.1804.pdf. FIPS PUB 180-1, National Institute of Standards and Technology (NIST), U.S. Department of Commerce, April 1995. Accessed April 2022 20. Messerges TS, Dabbish EA, Sloan RH (2002) Examining smart-card security under the threat of power analysis attacks. IEEE Trans Comput 51(5):541–552 21. Mohammad Hossein K, Esmaeili ME, Dargahi T, Khonsari A, Conti M (2021) BCHealth: a novel blockchain-based privacy-preserving architecture for IoT healthcare applications. Comput Commun 180:31–47 22. Saha S, Chattaraj D, Bera B, Das AK (2021) Consortium blockchain-enabled access control mechanism in edge computing based generic Internet of Things environment. Trans Emer Telecommun Technol 32(6):e3995 23. Saha S, Sutrala AK, Das AK, Kumar N, Rodrigues JJPC (2020) On the design of blockchainbased access control protocol for IoT-enabled healthcare applications. In: ICC 2020 - 2020 IEEE international conference on communications (ICC). Dublin, Ireland, pp 1–6 24. Zeadally S, Das AK, Sklavos N (2021) Cryptographic technologies and protocol standards for Internet of Things. Internet Things 14:100075
GradeChain-α: A Hyperledger Fabric Blockchain-Based Students’ Grading System for Educational Institute Snigdha Mayee Samantray, Debasis Giri , and Tanmoy Maitra
Abstract Technological advances have had an impact on education for many decades. Education is a core area where different peers need to share and modify shared information. Educational institutes manage large amounts of student records pertaining to student data, marks, and teachers' details to provide better grading solutions. Marks are more sensitive, as their disclosure might lead to revealing the identity of the students. There are several advantages to using blockchain in the education system, and digitalized students' records are one of them. In such a system, the organizations can generate marks and can store marksheets and certificates of individual students, which can be accessed and verified by another institute/company if the students have applied to that company/organization for getting a job or for admission to higher studies. This paper designs a Hyperledger Fabric blockchain-based marks generation and marks storing system for an educational organization, named GradeChain-α. The generator of the marks can be tracked by the administrator of the system. The reliable, replicated, redundant, and fault-tolerant (RAFT) protocol has been used to automate the result publication of students. The implementation status shows that this architecture can be used in an automated students' grading system. Keywords Blockchain · Education system · Grading system · Hyperledger fabric · RAFT protocol
S. M. Samantray · T. Maitra (B) School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, Odisha, India e-mail: [email protected] D. Giri Department of Information Technology, Maulana Abul Kalam Azad University of Technology, Nadia 741249, WestBengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_9
1 Introduction The first blockchain was proposed in [1], which is based on the blind signature. After that, the first digital payment was attempted by Digicash [2]. Satoshi Nakamoto proposed Bitcoin [3] in 2008 to let users control their own data and finances. In the recent era of technology, blockchain along with the smart-contract feature has been introduced in different sectors like health care [4], banking [5], finance [6] (payments), energy [7], supply chain management [8] (logistics and warehouses), manufacturing [9], real estate [10], and agriculture [11]. The problems of trust, security, and cost in a centralized system are solved by the decentralized nature of blockchain, which eliminates the third party and enables peer-to-peer transactions. Blockchain has the potential to maintain a record of all the transactions as a ledger that is shared with all the nodes (i.e., parties). There are three generations of blockchain networks. The first generation of blockchain only stores and transfers value (examples: Bitcoin, Ripple, and Dash). The second generation of blockchain is programmable via smart contracts (example: Ethereum). The third generation of blockchain is highly scalable with availability, the enterprise blockchain (examples: Hyperledger [12], R3 Corda, and Ethereum Quorum). The Bitcoin network extends anonymity and operates on a permissionless network, whereas a permissioned blockchain network operates with known entities. Nowadays, the education system is also adopting blockchain technology, where 'Blockchain in schools and colleges'1 became a slogan in social networks. The National Education Policy 2020 (NEP-2020), India, has set an ambitious agenda, i.e., the goal is to achieve a 100% Gross Enrollment Ratio (GER) in school education by 2030 and to double it in higher education, to reach 50% by 2035. The management of student records, including daily data like assignments, extracurricular activities, and attendance, as well as data on degrees, might be greatly facilitated by the use of blockchain technology. A classical university database application is centralized, with the risk of a single point of failure, and can be manipulated from inside the premises and from outside the university. In such a system, one needs to trust the owner of the database server to keep the data secure. Attackers can also infiltrate the server and change the data, and even restoring from a backup is neither straightforward nor fully trusted. These systems are certified by third-party auditors, and the auditors can be bribed or mistaken. The traditional system needs manpower, intermediary agencies, and paperwork to supply the marksheets physically to the students. This procedure (see Fig. 1) takes time to generate and publish the results of the students of the university. Therefore, many students cannot get certificates/transcripts whenever they need to apply for a job at a company or to get admission at other organizations for higher studies. When the students' data increases in size and quantity, efficiency issues arise. The existing student gradation systems also need to keep and protect the log files used for recreating the previous state of records and the histories of students' data. The files must be protected against all types of vulnerabilities.
https://indianexpress.com/article/opinion/blockchain-technology-education-nep-7696791/.
Fig. 1 Traditional evaluation system flowchart
The aim of this work is to provide a transparent, immutable, and secure blockchain-based platform that removes the third party, where organizations can generate and store marksheets for each student efficiently and securely in the blockchain via a smart contract, namely GradeChain-α. These marksheets can be read or accessed remotely whenever they are needed. This study provides tracing of the marks' generators, so that fake institutes keeping fake records in the blockchain, as well as fake marksheets of a student, can be eliminated. To generate timely, trusted, and transparent marksheets for students, a private permissioned blockchain like Hyperledger Fabric is suitable, as it is better in authenticity and confidentiality than Ethereum [13]. The outline of this study is as follows: Sect. 2 discusses the related current studies on blockchain-based education systems. Section 3 highlights some general information related to this study. Section 4 introduces the proposed architecture of the system to be implemented and the smart contract design, i.e., GradeChain-α. Implementation details and result analysis are given in Sect. 5, followed by the conclusion of this work in Sect. 6.
2 Literature Review Like many countries, the USA, Canada, and the European Union have implemented blockchain in their education systems [14]. There are several educational domains where blockchain has been introduced, which are as follows:
1. Eliminate fake educational organizations: There are several educational organizations which are not accredited by the government but provide fake degrees and/or certificates. By using blockchain technology, this kind of fraud can be eliminated. Acachain [15] was introduced so that fraudulent academic credentials can be prevented by the institute.
2. Efficient accreditation facility: Organizations can store their own achievements in a blockchain, which can be easily verified by the accreditation committee to provide ranks and grades [16]. This study could not find any blockchain-based solution for this domain.
3. Digitalized students' records: Blockchain technology can provide students with an excellent framework for managing records, including daily information such as assignments, attendance, information about degrees, and extracurricular activities [17]. Besides these, organizations can also store marksheets and certificates of individual students, which can be accessed and verified by another institute/company if the students have applied to that company/organization for getting a job or for admission to higher studies. In India, IIT Kanpur has implemented blockchain-based certificates for students.2 The education projects were to build, redesign, and upgrade the existing education systems to integrate students' profiles with gradation and track records via certified integration platforms and intermediaries. But it is difficult for such a system to achieve scalability, security, and privacy for students. The system is slow and complicated.
4. Online discussion and course study: Blockchain can provide an online platform where students and experts can discuss a course, and if a teacher posts some tasks, then the student can submit the completion of those tasks in a blockchain, which will be automatically verified by the smart contract. Cheriguene et al. [18] proposed such a solution.
5. Scholarship scheme: In a traditional system, some difficulties faced by the students in availing a scholarship are due to the lack of traceability of the application form, loss of the application form in transit through the postal service, lack of transparency between students and their respective education board, lack of bank account verification, etc. In [19], the authors have given a solution which is based on Ethereum.
6. Access from IoT devices: Because of the limited capability of blockchain to process the transaction requests from a huge number of IoT systems, different group-key agreement protocols [20] have been implemented to ensure security. To reduce the resource overhead in a cloud-edge computing environment, the authors in [21] suggested a privacy-preserving authentication scheme.
https://blockchain.cse.iitk.ac.in/index.html.
Apart from the aforementioned applications, companies like Digicert [22] implemented Ethereum smart contracts, which have issues with access permissions; Blockcert [23] uses the Bitcoin blockchain for storing certificates and involves high transaction fees; and Smartcert [24] is based on hashes and digital certificates, and is vulnerable when sharing credentials with employers. This study is the first attempt to implement a Hyperledger Fabric blockchain-based students’ grading system for educational institutes, namely GradeChain-α. This study uses a permissioned blockchain to implement GradeChain-α.
3 Background This section discusses the architecture of blockchain and Hyperledger Fabric, which are used to design GradeChain-α.
3.1 Blockchain Model In GradeChain-α, the students, teachers, examination controller, and admin should join the private (permissioned) blockchain network. The blockchain model [25] is divided into three different layers as shown in Fig. 2. (1) Infrastructure layer: nodes (the participants), storage, and network facilities, i.e., all hardware components, are in this layer. (2) Platform layer: REST (Representational State Transfer), Web API (Application Programming Interface), and RPC (Remote Procedure Call) are invoked in this
Fig. 2 Blockchain layers
layer for client and blockchain network communication. (3) Distributed computing layer: This layer takes care of transactions (communications among network participants), hashing (for data privacy, a hash function such as SHA-512, SHA-256, or MD5 is used, which generates a fixed-length output string from a variable-length input string, and the reverse is not possible), consensus (an algorithm to validate the order of transactions, update the ledger, and decide on adding a block), signature (a technique used for user authentication), and replication (for immutability and fault tolerance, the ledger is replicated on the nodes).
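As a concrete illustration of the hashing property mentioned above, the short Python snippet below (purely illustrative; it is independent of the Hyperledger Fabric implementation) shows how inputs of different lengths map to fixed-length SHA-256 digests:

```python
import hashlib

# Inputs of any length map to a fixed-length (256-bit) digest; recovering the
# input from the digest is computationally infeasible.
for msg in [b"marks:CS101:87", b"a much longer student record with many more bytes ..."]:
    digest = hashlib.sha256(msg).hexdigest()
    print(len(digest), digest)  # always 64 hex characters
```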
3.2 Hyperledger Fabric (HLF) The proposed GradeChain-α is based on Hyperledger Fabric because (a) a permissioned blockchain works where all members are known entities; here, the teachers, who are known entities within the university network, provide the marks to the students; (b) Hyperledger Fabric operates on channels, so a transaction is visible only to the members of that particular channel; here, to provide different access and visibility to different users, Hyperledger Fabric is used; (c) there is no cryptocurrency in Hyperledger Fabric, so it does not involve any transaction fee for grading the students and is cost-effective; (d) as Hyperledger Fabric uses the execute-order-validate mechanism, it helps to increase the performance of the system by eliminating non-deterministic transactions; and (e) the business logic can be implemented using self-executing programs called chaincode. A Hyperledger Fabric blockchain is comprised of the following components: (a) Node: There are three types of nodes in Hyperledger Fabric. (1) Client web application/CLI (command line interface): the nodes which can administer the network, i.e., start, stop, and configure nodes, and can manage the chaincode, i.e., installation, execution, and upgradation; (2) Peers: endorsers, which hold chaincode for validating, simulating, and endorsing, and committers, which update the blockchain and ledger after verification; (3) Ordering service (orderer): the nodes which establish the consensus on the order of transactions and then broadcast blocks to the peers. (b) Channel: the nodes can communicate with each other within a channel to maintain confidentiality. (c) Ledger and world state: the world state is the current value of the ledger, and a ledger is the chain of blocks. CouchDB and GoLevelDB are the two databases used for this. (d) Chaincode: it runs within a container (e.g., Docker) for isolation and does not have direct access to the ledger state. (e) Membership service provider: it is responsible for associating entities in the network with cryptographic identities. Three types of consensus implementations are available in HLF for the ordering service: Solo, RAFT, and Kafka. Solo uses a single ordering node, which is a disadvantage for fault tolerance; hence, it is only used for test purposes. Kafka follows a leader-follower configuration. The disadvantage of Kafka is that
there is an additional administrative overhead. RAFT's built-in leader-follower model has one leader per channel (when a leader goes down, a new leader is elected), and it is a crash-fault-tolerant protocol. Thus, this study picks RAFT as the consensus algorithm to implement GradeChain-α.
4 The Proposed GradeChain-α This section discusses the proposed GradeChain-α.
4.1 Network Structure of GradeChain-α GradeChain-α has four types of users: students, teachers, a controller of examination (CoE), and an administrator. Here, for simplicity, this study considers one student, two teachers from different departments, a CoE, and an administrator of an organization to design GradeChain-α. Suppose the student of computer science has a client application SC , the teachers of computer science and basic science have client applications TC and TB , respectively, the CoE has a client application C, and the admin has a client application A. The teachers of a common department, the CoE, and the admin are connected through the same channel. They maintain a common ledger. In Fig. 3, the teacher of computer science (TC ), the CoE (C), and the admin (A) are connected through a channel Cdept1 . The transactions are stored in a ledger L dept1 and maintained by a peer node Pdept1 . Similarly, the teacher of basic science (TB ), the CoE (C), and the admin (A) are connected through channel Cdept2 . The transactions are stored in ledger L dept2 and maintained by a peer node Pdept2 . The student of computer science (SC ) and the admin (A) are connected through channel CC S . The transactions are stored in ledger L C S and maintained by a peer node PC S . All the ledgers, i.e., L dept1 , L dept2 , and L C S , are maintained in the committing peer nodes Pi ∀ i ∈ [1, n], which are connected to all the channels Cdept1 , Cdept2 , and CC S . Channel configuration policy CC Pdept1 governs channel Cdept1 , CC Pdept2 governs channel Cdept2 , and CC PC S governs channel CC S . There are ordering peers Oi ∀ i ∈ [1, m] which run the RAFT consensus protocol to generate the new blocks. Certificate authorities C A SC , C A TC , C A TB , C AC , and C A A are responsible for issuing certificates to the members SC , TC , TB , C, and A, respectively, of the network for identifying the users. Membership service providers (MSPs) (see Fig. 3; SC .MSP, TC .MSP, TB .MSP, C.MSP, and A.MSP are the membership service providers for the corresponding users) authenticate and authorize the users to perform the assigned tasks. The student and admin are connected to the system with the channel CC S , which is different from the channel that connects the teacher, CoE, and admin. Thus, it is easy to maintain confidentiality and access control in the process of assigning marks. A third department can join the network even when the network is up and
Fig. 3 Proposed network of GradeChain-α and deployment
running. Note that all the students will use one channel to connect with the admin. Here, the number of channels used depends on the number of departments present in the institute. For the database, CouchDB, which supports indexes and image handling, has been used to store the ledgers in this work as it is more flexible.
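The channel layout of Fig. 3 can be summarized as a mapping from channels to members and ledgers. The Python sketch below is only an illustration of this access structure (the names mirror Fig. 3 and are not taken from any Fabric configuration file):

```python
# Illustrative view of the GradeChain-α channel topology described above.
topology = {
    "Cdept1": {"members": ["TC", "C", "A"], "ledger": "Ldept1", "peer": "Pdept1"},
    "Cdept2": {"members": ["TB", "C", "A"], "ledger": "Ldept2", "peer": "Pdept2"},
    "CCS":    {"members": ["SC", "A"],      "ledger": "LCS",    "peer": "PCS"},
}

def visible_ledgers(user):
    """A user can read only the ledgers of the channels it has joined."""
    return [c["ledger"] for c in topology.values() if user in c["members"]]

print(visible_ledgers("SC"))  # ['LCS'] -- the student never sees the teachers' channels
print(visible_ledgers("A"))   # all three ledgers -- the admin joins every channel
```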
4.2 Transaction Flow of GradeChain-α After the nodes become members of the network by obtaining a certificate from the certificate authority (CA), they are allowed to perform transactions according to the access control policy. Figure 4 shows the transaction flow of GradeChain-α, where the left block is the front-end and the right block is the back-end application. First, a user Ui (client node) sends a transaction proposal through the AngularJS front end. The transaction proposal then flows to Docker through the Fabric SDK. At the back end, the MSP receives and checks the authenticity of the transaction proposal through a certificate authority (CA). After verification, the MSP returns the response with credentials (certificate) to the Fabric SDK. Then Ui sends the transaction proposal again, with the valid certificate, to the endorsing nodes via Docker. At the back end, the
Fig. 4 Transaction flow of GradeChain-α
endorsing nodes take the transaction proposal and execute the smart contract. Here, Pdept1 , Pdept2 , and PC S are the endorsing nodes. Endorsing nodes update the world state without updating the ledger and maintain a log for the clients in the Redis database. Then a validation reply, signed with the endorsing node's certificate, is sent to the client. The ordering nodes collect those proposal responses from various clients and attach a timestamp to each transaction. Periodically, over a time span, the ordering nodes take a chunk of received transactions, create a block, and append the new block to the blockchain. The blocks are then communicated to the committing nodes, which validate the transaction responses and update the ledger and world state through CouchDB. The committing node can generate an event indicating whether the transaction submitted by the client has completed successfully or not. Finally, the committing node broadcasts the updated ledger to the endorsing nodes.
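The execute-order-validate flow described above can be mimicked by a small, self-contained simulation. The sketch below is a conceptual model only; it does not call the Fabric SDK, and all function and variable names are hypothetical:

```python
import hashlib, json, time

ledger, world_state = [], {}   # toy stand-ins for the block chain and the state database

def endorse(proposal):
    """Endorsing peer: simulate the chaincode and sign the resulting read/write set."""
    rw_set = {proposal["key"]: proposal["value"]}
    endorsement = hashlib.sha256(json.dumps(rw_set, sort_keys=True).encode()).hexdigest()
    return {"rw_set": rw_set, "endorsement": endorsement}

def order(responses):
    """Ordering service: timestamp the responses and batch them into one block."""
    block = {"ts": time.time(), "txs": responses,
             "prev": ledger[-1]["hash"] if ledger else "0" * 64}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def commit(block):
    """Committing peer: append the block to the ledger and update the world state."""
    ledger.append(block)
    for tx in block["txs"]:
        world_state.update(tx["rw_set"])

commit(order([endorse({"key": "Regd101/CS101", "value": {"marks": 87}})]))
print(world_state)   # {'Regd101/CS101': {'marks': 87}}
```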
4.3 Functionalities After the successful setup of GradeChain-α, the chaincode is invoked. The teacher, admin, and CoE are given both the read and write permissions, while the student node is given only the view permission. The application has functionalities like registration, login, marks assigned by the teacher, weightage calculation by CoE, marksheet upload by admin, and view of marksheet by student. The chaincodes are installed and executed in different docker containers. This study discusses the chaincodes used in GradeChain-α which are as follows: • Registration: A student has a university-provided registration number Regd I D . When the student tries to register, the transaction travels from the client to the application (Angular.js web application). CA of the organization enrolls the user and returns the digital signature (ECDSA)3 and private key for the student. The student should safely store the credentials for future transactions. The chaincode RegisterStu3
https://cryptobook.nakov.com/digital-signatures/ecdsa-sign-verify-messages.
Fig. 5 Pseudochaincode: a student registration, and b user login
dent() (see Fig. 5a) will be invoked when a new student needs to register. Similarly, the admin, teachers, and CoE do their registration in the network. The parameters {name, identity, certificate, wallet} are maintained after the registration for each user. • Login: Once the user gets registered, he/she can log in by providing his/her credentials (see Fig. 5b). The MSP corresponding to that user authenticates the user, and then the web server generates a JWT (JSON web token) after validation. A JWT has three parts: (1) a header, (2) a payload, and (3) a signature, where the signature is created by hashing the header and payload with a 256-bit secret key. After authentication, a transaction identity is created for the user to transmit the transaction, and the details of the login logs are added to the REDIS database. • Marks assigned by the teacher: The teacher, after receiving the student details from the admin, can check the authenticity of the student using the database. If he/she finds that the student detail comes from a proper source, he/she accepts the student and changes the marks fields such as internal assessment, mid-term, project, final, and project work. After successfully changing the fields, a new transaction identity is created. The teacher assigns marks for the particular subject, and the transaction contains E M PI D (identity of the teacher), Tname , Regd I D , Cour se I D , marks, C A Ti , and Ti .M S P (see Fig. 6a). • Weightage calculation by examination controller: In the dashboard, the CoE can check the authenticity of the students and teachers and the students’ marks, and can change certain attributes. He/she has the privilege of calculating the marks according to the weightage, which can be done through chaincode. Once the admin gets the message from the CoE, the chaincode validates the fields. The transaction contains Regd I D , Cour se I D , marks, date, C AC , C.M S P, CGPA/SGPA/average, and E M PI D (see Fig. 6b). • Marksheet upload by admin: After the validation, the admin converts the marksheet to a base64 string, calculates its hash value using SHA-256, and stores it; the marksheet is then encrypted with the AES algorithm before upload (see Fig. 6c). A small sketch of this hash-based handling is given after this list.
Fig. 6 Pseudochaincode: a Marks assigned by the teacher, b weightage calculation by examination controller, and c marksheet upload by admin
• View of marksheet by student: The student can check the authenticity of the marksheet by using his/her Regd I D . He/she can trace the path of the marks back to the teacher who generated them. The student will be provided with the details about the subject marks by the teacher. The student does not have the right to update the details; he/she can only read the marks. For verification, the student can upload the marksheet; the chaincode calculates the hash value (SHA-256) of the marksheet uploaded by the student and compares it with the hash value of the stored marksheet.
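The SHA-256/base64 handling used in the upload and verification steps can be sketched with Python's standard library; this is a minimal illustration only (the AES encryption step is omitted, and the marksheet bytes are a stand-in for a real file):

```python
import base64, hashlib

def marksheet_fingerprint(pdf_bytes: bytes) -> str:
    """Encode the marksheet as base64 and return its SHA-256 hex digest,
    mirroring the hashing step performed at upload time."""
    return hashlib.sha256(base64.b64encode(pdf_bytes)).hexdigest()

# Admin side: the fingerprint of the issued marksheet is stored on the ledger.
issued = b"%PDF-1.4 ... marksheet of Regd101 ..."   # stand-in for the real PDF bytes
stored_hash = marksheet_fingerprint(issued)

# Verifier side: the uploaded copy is re-hashed and compared with the stored value.
candidate = issued
print("authentic" if marksheet_fingerprint(candidate) == stored_hash else "tampered")
```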
5 Implementation and Result This section briefly discusses the implementation details of GradeChain-α.
5.1 Experimental Setup Table 1 shows the software configuration for the experimental setup.
Table 1 Software configuration for running Hyperledger Fabric

Software              Version
Docker                20.10.12
Python                v3.10.5
Git                   2.37.1
Hyperledger Fabric    v1.4.12
Curl                  7.83.1
Node.JS               16.16.0
OS                    Windows
Npm                   8.15.1

Table 2 Docker composer

Containers                    Number    Components
Fabric CA containers          5         CA_SC, CA_TC, CA_TB, CA_C, CA_A
Peer containers               4         Pdept1, Pdept2, PCS, Praft
Orderer containers            1         OCS
CouchDB containers            3         Ldept1, Ldept2, LCS
Policy contract containers    3         CCPdept1, CCPdept2, CCPcs
5.2 Initialization Steps for the initialization of GradeChain-α are as follows: (a) generating root certificates and secret key pairs for the nodes using the Hyperledger cryptogen tool, (b) genesis block creation using the configtxgen tool (configtx.yaml should be embedded in the genesis block), (c) Docker image creation by configuring docker-compose-etcdraft.yaml, (d) channel creation and joining of all peers, and (e) smart contract initialization.
5.3 Docker Container Docker provides a sandbox environment [26] and is used to run all components of HLF. Docker containers (images) in the blockchain network are used to configure the entities like Fabric CA, peer, orderer, CouchDB, and policy contract. Table 2 shows the docker containers needed to implement GradeChain-α.
Fig. 7 Throughput of GradeChain-α: a query transaction and b invoke function
5.4 Performance Analysis This work varies the number of transactions to check the throughput of GradeChain-α. The throughput detail is given in Fig. 7. Figure 7a shows the throughput of query transactions; that is, if any student wants to get or see his/her marksheet, then query transactions are generated. Figure 7b shows the throughput of marks generation transactions (i.e., the invoke function).
6 Conclusion The proposed GradeChain-α used the RAFT ordering service (a crash-fault-tolerant, rather than Byzantine fault-tolerant, protocol). The performance of RAFT depends on (a) the choice of the network parameters and the topology of nodes in the network, (b) the hardware on which the nodes run, (c) the number of nodes and channels, (d) the distributed application and transaction size, and (e) the ordering service and consensus implementation and their parameters. The latency of GradeChain-α is very low because the network is very small, the number of nodes is small, and the number of transactions sent per second is also low. This is the α version of the system. In future, this work will be implemented with more departments and more channels. To test the model's efficiency, the number of orderer nodes can be increased in the design. As the number of students is increasing day by day, a cloud database can be used to store and handle the data. The architecture also needs to be tested against security aspects such as man-in-the-middle and forgery attacks.
References 1. Haber S, Stornetta WS (1991) How to time-stamp a digital document. In: Menezes AJ, Vanstone SA (eds), Advances in cryptology-CRYPTO’ 90. Springer, Berlin, pp 437–455 2. History of blockchain. https://www.javatpoint.com/history-of-blockchain 3. Nakamoto S (2008) Bitcoin whitepaper. https://bitcoin.org/bitcoin.pdf. Accessed 17 June 2019 4. Sarkar A, Maitra T, Neogy S (2021) Blockchain in healthcare system: security issues, attacks and challenges. Springer International Publishing, Cham, pp 113–133 5. Guo Y, Liang C (2016) Blockchain application and outlook in the banking industry. Financ Innovat 2(1):1–12 6. Ali O, Ally M, Dwivedi Y et al (2020) The state of play of blockchain technology in the financial services sector: a systematic literature review. Int J Inf Manage 54:102199 7. Andoni M, Robu V, Flynn D, Abram S, Geach D, Jenkins D, McCallum P, Peacock A (2019) Blockchain technology in the energy sector: a systematic review of challenges and opportunities. Renew Sustain Energy Rev 100:143–174 8. Queiroz MM, Telles R, Bonilla SH (2019) Blockchain and supply chain management integration: a systematic review of the literature. Supp Chain Man: Int J 9. Kasten JE (2020) Engineering and manufacturing on the blockchain: a systematic review. IEEE Eng Manag Rev 48(1):31–47 10. Fahim Ullah and Fadi Al-Turjman. A conceptual framework for blockchain smart contract adoption to manage real estate deals in smart cities. Neural Computing and Applications, pages 1–22, 2021 11. Bermeo-Almeida O, Cardenas-Rodriguez M, Samaniego-Cobo T, Ferruzola-Gómez E, Cabezas-Cabezas R, Bazán-Vera W (2018) Blockchain in agriculture: a systematic literature review. In: International conference on technologies and innovation. Springer, pp 44–56 12. Hyperledger. https://hyperledger-fabric.readthedocs.io/en/release-1.4/whatis.html 13. Mohanty D (2018) Blockchain: from concept to execution-new. BPB Publications 14. Yumna H, Khan MM, Ikram M, Ilyas S (2019) Use of blockchain in education: a systematic literature review. In: Nguyen NT, Gaol FL, Hong T-P, Trawi´nski B (eds), Intelligent information and database systems. Springer International Publishing, Cham, pp 191–202 15. Bhumichitr K, Channarukul S (2020) Acachain: academic credential attestation system using blockchain. In: Proceedings of the 11th international conference on advances in information technology, pp 1–8 16. Mihus I (2020) The main areas of the blockchain technology using in educational management. Econ Finance Manag Rev 4:84–88 17. Verma R (2022) Transforming education through blockchain technology. In: Transformations through blockchain technology. Springer, pp 43–71 18. Cheriguene A, Kabache T, Kerrache CA, Calafate CT, Cano JC (2022) Nota: a novel online teaching and assessment scheme using blockchain for emergency cases. Educ Inf Technol 27(1):115–132 19. Lohit A, Kaur P (2022) Blockchain application in the elimination of scholarship-based manipulation. Int J Res Appl Sci & Eng Technol 10(V):2289–2296 20. Chen C-M, Deng X, Gan W, Chen J, Islam SK (2021) A secure blockchain-based group key agreement protocol for iot. J Supercomput 77(8):9046–9068 21. Mei Q, Xiong H, Chen Y-C, Chen C-M (2022) Blockchain-enabled privacy-preserving authentication mechanism for transportation cps with cloud-edge computing. IEEE Trans Eng Manag 1–12 22. Poorni R, Lakshmanan M, Bhuvaneswari S (2019) Digicert: a secured digital certificate application using blockchain through smart contracts. 
In: 2019 international conference on communication and electronics systems (ICCES). IEEE, pp 215–219 23. Jirgensons M, Kapenieks J (2018) Blockchain and the future of digital learning credential assessment and management. J Teach Educ Sustain 20(1):145–156 24. Dumpeti NK, Kavuri R (2021) A framework to manage smart educational certificates and thwart forgery on a permissioned blockchain. In: Materials today: proceedings
25. Ismail L, Materwala H, Zeadally S (2019) Lightweight blockchain for healthcare. IEEE Access 7:149935–149951 26. Aleksieva V, Valchanov H, Huliyan A (2020) Implementation of smart-contract, based on hyperledger fabric blockchain. In: 2020 21st international symposium on electrical apparatus & technologies (SIELA). IEEE, pp 1–4
Object-Background Partitioning on Images: A Ratio-Based Division Shyamalendu Kandar and Seba Maity
Abstract Image segmentation is a process of partitioning an image’s distinct objects from the background and has wide applications in diverse fields that include medical images, video surveillance, scene understanding, partial image encryption, image compression, etc. Segmentation is accomplished by thresholding some image attributes, thereby dividing image data into two parts. In this work, image data is considered as an array of gray values in the form of a line segment comprising objects and background pixels. Then an iterative thresholding on intensity values is done in the framework of dividing a line segment internally. The image is partitioned into non-overlapping blocks and segmented in an iterative manner based on cumulative frequency of the gray values in the local histogram. On the segmented blocks, the probabilities of the correct detection (Pd ) of object pixels are computed, which are then used to find the same for the whole image, called global probability of object detection, denoted as PD . The iteration stops when the global PD meets some predefined threshold value. A large set of simulation results show the efficiency of the proposed segmentation scheme over the existing methods in terms of object detection probability (0.9) and false detection probability of background (0.1) at faster convergence (four to six iterations). Keywords Image segmentation · Line segment · Internal partitioning · Iterative thresholding · Probability of object detection
S. Kandar (B) Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India e-mail: [email protected] S. Maity Department of Electronics and Communication Engineering, College of Engineering and Management, Kolaghat, Purba Medinipur 721171, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_10
1 Introduction Segmentation means partitioning of an image into distinct regions called objects or features of interest, represented by grouping together pixels that have similar attributes. Segmentation bridges the gap between low-level image processing, concerned with the manipulation of gray-level intensity or color values, to correct the defects on the enhancement of its content in an image, and high-level image processing that is responsible for the manipulation and analysis of features of interest. It has diverse applications involving the detection, recognition, and measurement of objects in an image ranging from conventional applications like industrial inspection [1–3], optical character recognition (OCR) [4, 5], tracking of objects in a sequence of images [6, 7], classification of regions in satellite images [8, 9] to the specialized applications in medical image analysis, involving identification and measurement of bones [10], tissues, tumor [11], etc. Irrespective of these applications, needless to mention that segmentation plays a crucial role in several tasks requiring image analysis, the success and failure of image analysis to a great extent depends on the accuracy of its segmentation results. However, developing a method of reliable, accurate, computationally simple yet achieving high detection accuracy in an automatic means is truly a challenging task. That is why image segmentation is still an open research problem, even though the literature in this particular research domain is quite rich. Image segmentation can be viewed as a thresholding problem on any image attribute such as gray-level intensity, color information, edge information, texture, variance, energy, and entropy. The accuracy of segmentation is determined by the choice of the optimal threshold value to be applied to the proper image attribute. A large set of tools and techniques are explored involving classical optimization [12, 13], bio-inspired approaches [14], information-theoretic approaches [15, 16], as well as recent times’ widely used various machine learning algorithms [17–19]. A majority of these methods, though they achieve high accuracy, suffer from a large computational complexity that often limits their uses for various real-time applications. However, some applications may fulfill the needs with reasonably acceptable accuracy but more importantly demand simple computational cost. There are several spatial and transform domain approaches that are used in image segmentation involving set theoretic [20], signal processing [21], control theory, etc. where images are viewed as stochastic or random signals. In its simplest form, an image can be thought of as a two-dimensional array of gray/ intensity values. This array representation leads us to view images in the form of a linear data structure. In other words, in its simplest form, an image can be thought of as a line segment of graylevel intensity, and segmentation can be thought of as dividing this line segment into
two parts, object and background. In this representation, a certain proportion of the pixel intensities constitutes the object while the remaining pixels form the background. Hence, image segmentation may be viewed as the partitioning of pixels into two appropriate proportions to represent the object and the background, as if dividing the whole set of pixels into two groups. This leads us to view image segmentation as an internal partitioning of a line segment (image) at the optimal proportion, and this framework is used in this paper. The rest of the paper is organized as follows: Section 2 gives a brief literature review on image segmentation, followed by the scope and the contributions of the present work. Section 3 presents the proposed method, and then experimental results and discussions are given in Sect. 4. Finally, the paper is concluded in Sect. 5 along with the scope of future works.
2 Literature Review, Scope, and Contributions A plenty of techniques such as gray-level thresholding [22–25], edge detection [26, 27], region growing [28, 29], clustering [30–32], and machine learning [17, 33] are successfully applied for image segmentation. Image thresholding means partitioning of a range of values (continuous or discrete) into two (or more) sets so that some properties (gray values, color information, edge or texture, etc.) measured from the image that falls below the threshold are labeled or assigned to the background (or object) while the same above the threshold seems to belong to the object (or background). Determining the optimum value of this threshold indicates the accuracy of the segmentation results that simultaneously maximize the correct detection and minimize the number of false detections. Such approaches demand solving a max-min problem that often becomes mathematically intractable. An obvious solution is to make use of human intervention to find the appropriate threshold value so that the best result is achieved. However, this is not desirable for fully automatic segmentation. Hence, finding the optimal threshold value in an automatic way and by means of iterative approaches is challenging in terms of accuracy, computation, and convergence that often lead to a trade-off. A closed-form mathematical relation involving image characteristics to determine the optimal threshold value often becomes difficult. Iterative methods are then used, where, to start with, an initial guess on gray level is made and the value refines this estimate by successive passes over the image. Often, some form of assumption is made in iterative thresholding, for example, the corner pixels represent the background rather than objects of interest [34] or objects that have higher energy than the background.
An iterative thresholding is reported in [35] that uses the Taylor series expansion for updating the threshold value. Li and Lee [36] introduced the concept of the minimum cross entropy thresholding for image segmentation. Cross entropy is computed on a pixel-to-pixel basis between two images (segmented image and the original image), and aim of the technique is to select the threshold value that minimizes the cross entropy between the segmented image and the original image. Numerous segmentation attempts [37–39] were made based on the minimum cross-entropy techniques. Otsu’s method [40] for image segmentation aims to select an appropriate threshold value based on the maximum variance between the foreground and the background. An improved version of Otsu’s method [40] is found in [41], which has performed segmentation based on the maximum inter-class variance. A graph-based method for image segmentation is presented in [42]. The perimeter of the area to be segmented is determined by constructing a weighted graph using the neighboring pixels. Image foresting transform (IFT) is a graph-based image processing technique based upon connectivity [43]. Several image segmentation methods using IFT are reported in [44–47]. Of late, wide applications of different machine learning (ML) techniques in image segmentation are seen in the literature [18, 48, 49]. A modified k-means clustering algorithm is applied for image segmentation in [18]. The original image is processed by k-means clustering followed by edge detection to perform segmentation. In [48], the adaptive k-means algorithm is used for the segmentation of color image. Here, k-means clustering is applied over the image converted to LAB color space. Several applications of k-means clustering for image segmentation are found in diverse fields like crop-yield estimation [49], brain tumor detection [11], separating crops and weeds [19], etc. Recently, convolution neural network (CNN) has found to be a popular choice for image segmentation. Some of the recent novel CNN-based works for image segmentation are available in literature [50–53]. The literature review reveals that the existing works suffer from several limitations. Iterative thresholding is free from complex mathematical analysis but suffers from convergence issues based on certain assumptions, for example, choice of initial seeds in region growing, corner pixel belongings to background, choice of offset value in region similarity, and choice of derivative operator for edge definition. On the other hand, ML approaches, although they offer improved accuracy, are data-intensive, where a large set of image data is required for training, testing, and validation. Furthermore, they are computationally expensive. Often, some applications demand low computation (hence need faster convergence) even with little sacrifice on the segmentation accuracy. Hence, computationally simpler implementation, free from any object-background assumption, without involving optimization in complex mathematical frameworks and fast convergence are the demand of several
application-specific segmentations. Such applications demand a more suitable technique involving the choice of threshold so that a fixed proportion of pixels are detected as objects by the thresholding operation. This concept works fine, if it is known in advance the proportion of image pixels associated with the features of interest—often true in certain OCR, industrial inspection applications, medical image analysis, etc. On the one hand, simple partitioning of an image as a line segment makes it attractive, on the other hand, the incomplete knowledge of their appropriate proportion makes the implementation a challenging one. To address the issue, the present work considers object-background segmentation as partitioning of a line segment where appropriate proportion is determined by meeting a target object detection probability. The contributions of the work are as follows: (i) A linear data structure of an array is envisioned as an image and a linear segment of its gray-level profile is used for segmentation. A sliding window sub-image is represented as a line segment; its internal partitioning into two segments is considered as an object-background segmentation. (ii) The optimal partitioning is key to the accuracy of segmentation results. The optimal partitioning is determined by meeting the target detection accuracy of the object pixels that is set to ∼0.9 in our simulation. (iii) Unlike the other work [34], the present work does not consider any assumption of background information confined only to the corner pixels or in their variance or values. Furthermore, four to six iterations are found to be good enough for the algorithm to converge at target object detection probability ∼0.9 that supports its faster implementation.
3 Proposed Algorithm In the proposed technique, an image is considered as an array of gray-level intensities. The original image is partitioned into p × p sliding window blocks, the ith block being denoted by Bi . Let there be a total of N such blocks obtained after partitioning the whole image. For each block, a histogram is constructed. Take the cumulative frequency m i of some gray value g from the individual histogram and, based on that, partition Bi into background and object in a particular ratio of pixels. For each such iteration, the local probability of correct detection Pd is computed for each block using Eq. 1. The global PD , as given in Eq. 2, is found by averaging the local Pd . The iteration ends when PD meets some predefined threshold value PDth . The average cumulative frequency m avg is determined from the m i of each block. The particular gray value at m avg is considered as the threshold, and using this, the original image is segmented into object and background.
Algorithm 1: Image segmentation using iterative thresholding based on proportion division

Input: Source image $I_{w \times h}$, ground truth image $GI_{w \times h}$, predefined detection probability threshold $P_D^{th}$
Output: Segmented image $I_{seg}$

I. Divide I into p × p sliding window blocks; let $B_i$ denote the ith block and N the total number of blocks.
II. For each $B_i$, find the histogram $H^i$.
III. Set j = 0 and global detection probability $P_D = 0$.
    while ($P_D < P_D^{th}$) do
        for each sliding window block (sub-image) $B_i$ (i = 0 to N−1) do
            compute_pd(i, $H^i$, j)
        Compute global $P_D = \frac{1}{N}\sum_{i=0}^{N-1} P_{d_i}$
        j = j + 1
IV. $m_{avg} = \frac{1}{N}\sum_{i=0}^{N-1} m_i$
V. Segment I into object and background based on the gray level with cumulative frequency $m_{avg}$ of I.

function compute_pd(i, $H^i$, j) {
    (i) Take $m_i = \sum_{k=0}^{j} H^i_k$ such that the total pixels within the block are partitioned in an m:n ratio.
    (ii) Based on the gray value at $H^i_j$, divide $B_i$ into background and object in that ratio.
    (iii) Compute the probability of correct detection ($P_{d_i}$) for sub-image $B_i$ (see Eq. 1).
    (iv) Return $m_i$, $P_{d_i}$
}

$$P_d = \frac{\text{Total number of correctly detected object points}}{\text{Total number of object pixels in the ground truth image}} \qquad (1)$$

$$P_D = \frac{1}{N}\sum_{i=0}^{N-1} P_{d_i} \qquad (2)$$
Segmentation accuracy is determined by achieving not only a high value of $P_D$ but also a low value of the probability of false detection $P_F$, which is computed using Eq. 3 as follows:

$$P_F = \frac{\text{Total number of background pixels falsely detected as object}}{\text{Total number of actual background pixels in the ground truth image}} \qquad (3)$$
The algorithm is presented in Algorithm 1. It is worth mentioning that Pdi calculation is done with the help of a ground truth segmented image. Thus, the pro-
posed segmentation algorithm is applicable for a class of images that include ground truth images and non-segmented images, both belonging to the same class characterized by a particular statistical distribution. In other words, it is assumed that images to be segmented and the ground truth images follow the same probability distribution.
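For concreteness, a compact NumPy sketch of the above procedure is given below. It is illustrative only: the block size, the way the cumulative-frequency cut is advanced per iteration, the brighter-than-threshold object convention, and all function names are assumptions of the sketch rather than details fixed by the paper.

```python
import numpy as np

def segment_by_proportion(img, gt, p=8, pd_th=0.9, max_iter=50):
    """Raise the per-block cumulative-frequency cut until the global P_D (Eq. 2)
    reaches pd_th, then threshold the whole image at the average cut."""
    h, w = img.shape
    blocks = [(r, c) for r in range(0, h - p + 1, p) for c in range(0, w - p + 1, p)]
    thresholds = np.zeros(len(blocks))
    for j in range(1, max_iter + 1):
        pd_list = []
        for k, (r, c) in enumerate(blocks):
            blk = img[r:r + p, c:c + p]
            gblk = gt[r:r + p, c:c + p] > 0
            cum = np.cumsum(np.bincount(blk.ravel(), minlength=256))
            # smallest gray level whose cumulative frequency reaches the current cut m_i
            t = int(np.searchsorted(cum, j * blk.size / max_iter))
            thresholds[k] = t
            detected = blk > t                      # assumption: object pixels are brighter
            n_obj = gblk.sum()
            pd_list.append((detected & gblk).sum() / n_obj if n_obj else 1.0)
        PD = float(np.mean(pd_list))                # Eq. (2)
        if PD >= pd_th:
            break
    return img > thresholds.mean()

# Toy usage with a synthetic image and its ground truth.
img = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
gt = (img > 128).astype(np.uint8)
mask = segment_by_proportion(img, gt)
```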
4 Experimental Results and Analysis To validate the efficiency of the proposed method, simulations are done on 150 test images; however, due to space constraints, the results of four standard images, namely ‘Lena’, ‘Cameraman’, ‘Brain Mri’, and ‘Spine Mri’, each of size 512 × 512 pixels, are shown in Fig. 1. The test is carried out using Matlab R2018b on an Intel Core i7 processor (3.40 GHz, 8 GB) system. In our simulations, the window size is taken as 8 × 8 and PDth is set as 0.9, the value of which is achieved in four to six iterations. Figure 1 also reflects that, to reach PDth (which is 0.9 for this work), the value of m is different for different images, i.e., different images are linearly segmented at different points. The value of m is taken as the cumulative frequency of the gray level within each block. With an increased value of m, the probability of correct detection PD increases up to a certain value. After reaching the PDth value, the value of PD falls with a further increase in m. For the ‘Lena’ image, the m versus PD graph is depicted in Fig. 2. From the graph, it is observed that for the ‘Lena’ image at m = 34, PD
Fig. 1 Segmentation results (test image, segmented image, and m value) of i Lena (m = 34), ii Cameraman (m = 29), iii Brain Mri (m = 28), and iv Spine Mri (m = 28) obtained by linear segmentation
Fig. 2 m versus PD graph for Lena image
value reaches the target value 0.9. With the increase in m value, PD value increases, reaches the maximum value at m ∗ = 34, then a further increase in m value reduces PD value indicating that maximum PD value is achieved at an optimal value 34. The left part of the graphical plot indicates that some object points are not detected properly (under segmentation), while low values of PD at the right part are due to the inclusion of some background pixels as objects. In the proposed method, the image is thought of as a line segment of gray level. With different segments of the line, pixels are switched from the foreground to the background or vice versa, and this is reflected in the value of PD . To observe the change, we have performed an experiment on the test images with m = 22 and 48. It is observed from Fig. 3 that for different values of m, different PD values are seen for the same test images. As mentioned at the beginning of this section, the PDth value is reached within four to six iterations. It is observed from Fig. 4 that with the increase in the number of iterations, the PD value increases, reaches PDth value, then there is no further improvement even though the iteration value increases. The proposed technique is compared with Gao et al. [21] and Li et al. schemes [41]. The first one is based on wavelet transform, whereas the second one is based on variance maximization. The visual results of ‘Lena’, ‘Cameraman’, and ‘Spine MRI’ images are shown in Fig. 5. It is seen that the proposed method shows the best performance in PD value, and is implementationally fastest than the other two techniques. Figure 6 shows receiver operating characteristic (ROC) curves of segmentation results as a graphical plot of PD versus PF for the proposed method and the other
              Lena    Cameraman    Brain MRI    Spine MRI
m = 22: PD    0.65    0.63         0.55         0.52
m = 48: PD    0.71    0.68         0.63         0.73

Fig. 3 Different PD values for the test images for changing values of m
Fig. 4 Number of iterations versus PD graph for Lena, Cameraman, Brain MRI, and Spine MRI images
two methods [21, 41]. Results show that the proposed method not only offers a high detection probability of object pixels (0.9) but also offers a low value of PF (0.1). The values of PD and PF for [21] are 0.83 and 0.27, and for [41] they are 0.80 and 0.3. The area under the curve (AUC) value for the proposed method is 0.855, while the values for [21] and [41] are 0.747 and 0.732, respectively.
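As a reminder of how such AUC values are obtained from a ROC curve, the snippet below integrates PD over PF with the trapezoidal rule; the operating points are made-up illustrative values, not the paper's data:

```python
import numpy as np

# Hypothetical (PF, PD) operating points of a ROC curve, sorted by PF.
pf = np.array([0.0, 0.05, 0.10, 0.30, 0.60, 1.00])
pd = np.array([0.0, 0.70, 0.90, 0.95, 0.98, 1.00])

auc = np.trapz(pd, pf)   # area under the PD-versus-PF curve
print(round(float(auc), 3))
```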
                          Lena      Cameraman    Spine MRI
WSM [21]      PD          0.87      0.80         0.82
              Time (s)    2.0313    2.3750       2.3281
MVSM [41]     PD          0.86      0.70         0.86
              Time (s)    0.5156    0.4219       0.4219
Proposed      PD          0.9       0.9          0.9
              Time (s)    0.4579    0.4130       0.4206

Fig. 5 Comparison of the proposed technique with some existing schemes
Fig. 6 Comparison of AUC of the proposed technique with existing schemes
5 Conclusions and Scope of Future Works The paper explores the concept of dividing a line segment internally as a means of object-background partitioning of images based on gray-level intensity. The appropriate proportion of object-background pixels is determined by setting the desired probability of detection threshold by means of an iterative approach. A large set of simulation results shows that the proposed method achieves a 0.9 probability of correct detection of object pixels with a 0.1 false detection probability, converging in four to six iterations, which highlights its fast convergence. The proposed method can be extended in future works as follows: (i) Instead of gray-level intensity, an edge-based partitioning in an appropriate proportion can be done by setting the desired detection probability of the edges that include the object. (ii) Often, the image to be segmented also suffers from an enhancement problem, which involves two separate operations: image enhancement and then segmentation. The proposed concept can be used as an external partitioning to include linear dynamic range change (expansion) and segmentation together.
References 1. Islam MJ, Basalamah S, Ahmadi M, Sid-Ahmed MA (2011) Capsule image segmentation in pharmaceutical applications using edge-based techniques. In: IEEE international conference on electro/information technology. IEEE, pp 1–5 2. Thurley MJ, Andersson T (2008) An industrial 3D vision system for size measurement of iron ore green pellets using morphological image segmentation. Miner Eng 21(5):405–415, Elsevier 3. Mati´c T et al (2018) Real-time biscuit tile image segmentation method based on edge detection. ISA Trans 246–254, Elsevier 4. Bukhari SS, Shafait F, Breuel TM (2011) Improved document image segmentation algorithm using multiresolution morphology, Document recognition and retrieval XVIII, vol 7874, SPIE, pp 109–116 5. Zirari F, Ennaji A, Nicolas S, Mammass D (2013) A document image segmentation system using analysis of connected components. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 753–757 6. Xu Z, Zhang W, Tan X, Yang W, Huang H, Wen S, Ding E, Huang L (2020) Segment as points for efficient online multi-object tracking and segmentation. In: European conference on computer vision. Springer, pp 264–281 7. Xu Z, Yang W, Zhang W, Tan X, Huang H, Huang L (2021) Segment as points for efficient and effective online multi-object tracking and segmentation. IEEE Trans Pattern Anal Mach Intell 44(10):6424–6437 8. Jia H, Lang C, Oliva D, Song W, Peng X (2019) Hybrid grasshopper optimization algorithm and differential evolution for multilevel satellite image segmentation. Remote Sens 11(9):1134 9. Rahaman J, Sing M (2021) An efficient multilevel thresholding based satellite image segmentation approach using a new adaptive cuckoo search algorithm. Expert Syst Appl 174:114633 10. Zhang Y, Miao S, Mansi T, Liao R (2020) Unsupervised X-ray image segmentation with task driven generative adversarial networks. Medical Image Anal 62:101664
11. Nasor M, Obaid W (2020) Detection and localization of early-stage multiple brain tumors using a hybrid technique of patch-based processing, k-means clustering and object counting. Int J Biomed Imaging Hindawi 12. Tan TY, Zhang L, Lim CP, Fielding B, Yu Y, Anderson E (2019) Evolving ensemble models for image segmentation using enhanced particle swarm optimization. IEEE Access 7:34004–34019 13. Lenin Fred A, Kumar SN, Padmanaban P, Gulyas B, Ajay Kumar H (2020) Fuzzy-crow search optimization for medical image segmentation. Applications of hybrid metaheuristic algorithms for image processing. Springer, Cham, pp 413–439 14. Timchenko LI, Pavlov SV, Kokryatskaya NI, Poplavska AA, Kobylyanska IM, Burdenyuk II, ... Kashaganova G (2017) Bio-inspired approach to multistage image processing. In: Photonics applications in astronomy, communications, industry, and high energy physics experiments 2017, vol 10445, SPIE, pp 1087–1100 15. Rodriguez R, Garcés Y, Torres E, Sossa H, Tovar R (2019) A vision from a physical point of view and the information theory on the image segmentation. J Intell & Fuzzy Syst 37(2):2835–2845 16. Corona E, Hill JE, Nutter B, Mitra S (2013) An information theoretic approach to automated medical image segmentation. In: Medical imaging 2013: image processing, vol 8669, SPIE, pp 1028–1035 17. Xu Y, Wang Y, Yuan J, Cheng Q, Wang X, Carson PL (2019) Medical breast ultrasound image segmentation by machine learning. Ultrasonics 91:1–9, Elsevier 18. Bharathi BS, Swamy KV (2020) Effective image segmentation using modified K-means technique. In: 2020 4th international conference on trends in electronics and informatics (ICOEI), vol 48184. IEEE, pp 757–762 19. Sodjinou SG, Mohammadi V, Mahama ATS, Gouton P (2022) A deep semantic segmentationbased algorithm to segment crops and weeds in agronomic color images. Inf Process Agricult 9(3):355–364, Elsevier 20. Mushrif MM, Ray AK (2018) Color image segmentation: rough-set theoretic approach. Pattern Recogn Lett 29(4):483–493 21. Gao J, Wang B, Wang Z, Wang Y, Kong F (2020) A wavelet transform-based image segmentation method. Optik 208:164123, Elsevier 22. Pun T (1980) A new method for grey-level picture thresholding using the entropy of the histogram. Signal Process 2(3):223–237, Elsevier 23. Kapur JN, Sahoo PK, Wong AK (1985) A new method for gray-level picture thresholding using the entropy of the histogram. Comput Vis Graph Image Process 29(3):273–285, Elsevier 24. Mardia KV, Hainsworth TJ (1988) A spatial thresholding method for image segmentation. IEEE Trans Pattern Anal Mach Intell 10(6):919–927, IEEE 25. Cheriet M, Said JN, Suen CY (1998) A recursive thresholding technique for image segmentation. IEEE Trans Image Process 7(6):918–921, IEEE 26. Sumengen B, Manjunath BS (2005) Multi-scale edge detection and image segmentation. In: 2005 13th European signal processing conference. IEEE, pp 1–4 27. Saad MN, Muda Z, Ashaari NS, Hamid HA (2014) Image segmentation for lung region in chest X-ray images using edge detection and morphology. In: 2014 IEEE international conference on control system, computing and engineering (ICCSCE 2014). IEEE, pp 46–51 28. Xiaohan Y, Yla-Jaaski J, Huttunen O, Vehkomaki T, Sipila O, Katila T (1993) Image segmentation combining region growing and edge detection. In: Proceedings, 11th IAPR international conference on pattern recognition, vol III. Conference C: image, speech and signal analysis. IEEE, pp 481–484 29. Shih FY, Cheng S (2005) Automatic seeded region growing for color image segmentation. 
Image Vis Comput 23(10):877–886, Elsevier 30. Dhanachandra N, Manglem K, Chanu YJ (2015) Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput Sci 54:764–771, Elsevier 31. Uslan V, Bucak IÖ (2010) Microarray image segmentation using clustering methods. Math Comput Appl 15(2):240–247, Association for Scientific Research 32. Chuang KS, Tzeng HL, Chen S, Wu J, Chen TJ (2006) Fuzzy c-means clustering with spatial information for image segmentation. Comput. Med. Imaging Graph 30(1):9–15, Elsevier
33. Lee SH, Koo HI, Cho NI (2010) Image segmentation algorithms based on the machine learning of features. Pattern Recogn Lett 31(14):2325–2336, Elsevier 34. Sonka M, Hlavac V, Boyle R (2014) Image processing, analysis, and machine vision. Cengage Learning 35. Perez A, Gonzalez RC (1987) An iterative thresholding algorithm for image segmentation. IEEE Trans Pattern Anal Mach Intell (6):742–751, IEEE 36. Li CH, Lee CK (1993) Minimum cross entropy thresholding. Pattern Recogn 26(4):617–625, Elsevier 37. Pal NR (1996) On minimum cross-entropy thresholding. Pattern Recogn 29(4):575–580, Elsevier 38. Li CH, Tam PKS (1998) An iterative algorithm for minimum cross entropy thresholding. Pattern Recogn Lett 19(8):771–776, Elsevier 39. Zimmer Y, Tepper R, Akselrod S (1996) A two-dimensional extension of minimum cross entropy thresholding for the segmentation of ultrasound images. Ultrasound Med & Biol 22(9):1183–1190, Elsevier 40. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66, IEEE 41. Li M et al (2012) An improved method of image segmentation based on maximum inter class variance. J Nanjing Univ Technol: Nat Sci Ed 36(2) 42. Wang S, Siskind JM (2003) Image segmentation with ratio cut. IEEE Trans Pattern Anal Mach Intell 25(6):675–690 43. Falcao AX, Stolfi J, de Alencar Lotufo R (2004) The image foresting transform: theory, algorithms, and applications. IEEE Trans Pattern Anal Mach Intell 26(1):19–29 44. Bragantini J, Martins SB, Castelo-Fernandez C, Falcão AX (2018) Graph-based image segmentation using dynamic trees. In: Iberoamerican congress on pattern recognition. Springer, pp 470–478 45. Li Y, Zhang J, Gao P, Jiang L, Chen M (2018) Grab cut image segmentation based on image region. In: 2018 IEEE 3rd international conference on image, vision and computing (ICIVC). IEEE, pp 311–315 46. Borlido Barcelos I, Belém F, Miranda P, Falcão AX, do Patrocínio ZK, Guimarães SJF (2021) Towards interactive image segmentation by dynamic and iterative spanning forest. In: International conference on discrete geometry and mathematical morphology. Springer, pp 351–364 47. Bejar HH, Guimaraes SJF, Miranda PA (2020) Efficient hierarchical graph partitioning for image segmentation by optimum oriented cuts. Pattern Recogn Lett 131:185–192, Elsevier 48. Zheng X, Lei Q, Yao R, Gong Y, Yin Q (2018) Image segmentation based on adaptive K-means algorithm. EURASIP J Image Video Process 2018(1):1–10, Springer 49. Reza MN, Na IS, Baek SW, Lee KH (2019) Rice yield estimation based on K-means clustering with graph-cut segmentation using low-altitude UAV images. Biosyst Eng 177:109–121, Elsevier 50. Gonçalves DN, de Moares Weber VA, Pistori JGB, da Costa Gomes R, de Araujo AV, Pereira MF, ... Pistori H (2020) Carcass image segmentation using CNN-based methods. Inf Proc Agricul 560–572 51. Díaz-Pernas FJ, Martínez-Zarzuela M, Antón-Rodríguez M, González-Ortega D (2021) A deep learning approach for brain tumor classification and segmentation using a multiscale convolutional neural network. Healthcare 9(2):153, MDPI 52. Zhang C, Chen X, Ji S (2022) Semantic image segmentation for sea ice parameters recognition using deep convolutional neural networks. Int J Appl Earth Obs Geoinf 112:102885 53. Maity A, Nair TR, Mehta S, Prakasam P (2022) Automatic lung parenchyma segmentation using a deep convolutional neural network from chest X-rays. Biomed Signal Process Control 73:103398
More on Semipositive Tensor and Tensor Complementarity Problem R. Deb and A. K. Das
Abstract In recent years, several classes of structured matrices are extended to classes of tensors in the context of tensor complementarity problem. The tensor complementarity problem is a type of nonlinear complementarity problem where the functions are special polynomials defined by a tensor. Semipositive and strictly semipositive tensors play an important role in the study of the tensor complementarity problem. The article considers a few important characteristics of semipositive tensors. We establish the invariance property of the semipositive tensor. We prove necessary and sufficient conditions for semipositive tensor. A relation between the even order row diagonal semipositive tensor and its majorization matrix is proposed. Keywords Tensor complementarity problem · Semipositive tensor · Semimonotone matrix · Null vector · Majorization matrix
1 Introduction A tensor is a multidimensional array that is a natural extension of matrices. A real tensor of order r and dimension n, M = (m i1 ...ir ), is a multidimensional array of entries m i1 ...ir ∈ R, where i j ∈ In with j ∈ Ir . Here In is the set In = {1, 2, ..., n}. Tr,n denotes the collection of tensors with real entries of order r and dimension n. Tensors have many applications in science and engineering. The common applications are found in electromagnetism, continuum mechanics, quantum mechanics and quantum computing, spectral hypergraph theory, diffusion tensor imaging, the image authenticity verification problem, optimization theory, and in many other areas. In optimization R. Deb (B) Jadavpur University, 188 Raja S.C. Mallik Road, Kolkata 700 032, India e-mail: [email protected] A. K. Das Indian Statistical Institute, 203 B. T. Road, Kolkata 700 108, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_11
theory, the tensor complementarity problem (TCP) was proposed by Song and Qi [29], which is an extension of the linear complementarity problem and a subclass of the nonlinear complementarity problem where the involved functions are defined by a tensor. Assuming A ∈ Rn×n and q ∈ Rn , the linear complementarity problem, LCP(q, A) [2] determines vectors w, z ∈ Rn such that z ≥ 0, w = Az + q ≥ 0, z T w = 0.
(1)
Here the collection of solutions of LCP(q, A) is denoted by SOL(q, A). Several matrix classes arising in linear complementarity problems are important due to the computational point of view. For details, see [5, 10, 12, 17, 19–21]. For details of game theory, see [4]. Even several matrix classes arise during the study of Lemke’s algorithm as well as principal pivot transform. For details, see [3, 10–12, 15, 16, 18]. Given a tensor M ∈ Tr,n and q ∈ Rn , the tensor complementarity problem determines vectors ω, u ∈ Rn such that u ≥ 0, ω = Mu r −1 + q ≥ 0, u T ω = 0.
(2)
The problem is denoted by TCP(q, M). The set containing solutions of TCP(q, M) is denoted by SOL(q, M). The tensor complementarity problem arises in optimization theory, game theory, and in other areas. Many practical problems can be modeled as forms of equivalent tensor complementarity problems like a class of noncooperative games with multiple players [7], clustering problems with hypergraphs, and traffic equilibrium problem [8]. Identification of tensor classes are crucial in the event of TCP. In recent years, many structured tensors are developed and studied in the context of TCP. For details, see [14, 22, 24, 27, 29, 30]. Note that the tensor complementarity problem is basically based on the Shao product which considers the products of tensor with vector and the products of tensor with matrix. Multiperson noncooperative game, hypergraph clustering problem, traffic equilibrium problem can be formulated as tensor complementarity problem. Introducing new tensors and study of existing tensors in the context of TCP are important to enrich this area. Einstein product consists of the product between two tensors which is used in computational mechanics, signal processing, image restoration, and video processing. In case of t-product, two tensor of the same order are used. Computer vision, image feature extraction, and signal processing use the t-product of tensors. The class of semimonotone matrices has a crucial contribution to the theory of linear complementarity problem. Several matrix theoretic properties and properties related to the solution of linear complementarity problems involving semimonotone matrices were studied in the literature. For details, see [13, 23, 31]. Song and Qi [29] widened the semimonotone class of matrices to semipositive tensors in the context of the tensor complementarity problem. Song and Qi [29] showed that a strictly semipositive tensor is a Q tensor. Song and Qi [28] showed that for a (strictly) semipositive tensor M the TCP(q, M) has only one solution for q > 0 (q ≥ 0). Many other tensor the-
More on Semipositive Tensor and Tensor Complementarity Problem
149
oretic characteristics of TCP with semipositive tensors were discussed. For details, see [6, 28–30, 32]. The paper contains some tensor theoretic properties of semipositive tensors as well as strictly semipositive tensors. We discuss some necessary as well as sufficient conditions for (strictly) semipositive tensors. We prove an equivalence in connection with the majorization matrix of an even order row diagonal (strictly) semipositive tensors. The paper is organized as follows. Section 2 contains some necessary notations and results. In Sect. 3, we investigate some tensor theoretic properties of semipositive tensors. We establish the invariance property of semipositive tensors. We propose necessary and sufficient conditions for semipositive tensors. We establish the connection between an even order row diagonal semipositive tensor and its majorization matrix.
2 Preliminaries Now we introduce related notations which are used in the paper. We consider vectors, matrices, and tensors with real entries. Assuming n > 0, In denotes the set {1, 2, ..., n}. In this article, Rn is the Euclidean space of dimension n and Rn+ = {u ∈ Rn | u ≥ 0}, Rn++ = {u ∈ Rn | u > 0}. The vector u ∈ Rn denotes a column vector and u T is the transpose of u. A diagonal matrix D = [di j ]n×n = di ; ∀ i = j, diag(d1 , d2 , ..., dn ) is defined as di j = 0 ; ∀ i = j. Definition 1 ([1, 23]) A matrix M ∈ Rn×n is semimonotone, if for any 0 = z ≥ 0, ∃ an index k ∈ In such that z k > 0 and (M z)k ≥ 0. Definition 2 ([1, 23]) A matrix M ∈ Rn×n is strictly semimonotone, if for any 0 = z ≥ 0, ∃ an index k ∈ In such that z k > 0 and (M z)k > 0. Let M = (m i1 ...ir ) be a multidimensional array of entries m i1 ...ir ∈ R where i j ∈ In with j ∈ Ir . Tr,n is a particular collection of tensors with real entries of order r and dimension n. Any M = (m i1 ...ir ) ∈ Tr,n is symmetric tensor, if m i1 ...ir are permutation invariant with respect to indices. Sr,n denotes the set of all r th order n dimensional symmetric tensors with r, n ≥ 2. An identity tensor of order r, I = (δi1 ...ir ) ∈ Tr,n 1 : i 1 = ... = ir is defined as follows: δi1 ...ir = . Let O denote the zero tensor 0 : else where each entry of O is zero. For any u ∈ Rn and p ∈ R, let u [ p] denote the vector p p p (u 1 , u 2 , ..., u n )T . For M ∈ Tr,n and u ∈ R, (Mu r −1 )i =
n i 2 ,...,ir =1
and Mu r ∈ R is a scalar defined by
m ii2 ...ir u i2 · · · u ir , for all i ∈ In
150
R. Deb and A. K. Das
u T Mu r −1 = Mu m =
n
m i1 ...ir u i1 · · · u ir .
i 1 ,...,ir =1
For a tensor M ∈ Tr,n a vector u ∈ Rn is said to be a null vector of M, if Mu r −1 = 0. In tensor theory several types of products are available. Shao [25] introduced a kind of product. Here A and B are two n dimensional tensor of order p ≥ 2 and r ≥ 1. The product AB is an n dimensional tensor C of order (( p − 1)(r − 1)) + 1 with entries a j j2 ··· j p b j2 β1 · · · b j p β p−1 , c jβ1 ···β p−1 = j2 ,··· , j p ∈In
where j ∈ In , β1 , · · · , β p−1 ∈ Inr −1 . Definition 3 ([24, 27]) Given M = (m i1 ...ir ) ∈ Tr,n and an index set J ⊆ In with |J | = l, 1 ≤ l ≤ n, a principal subtensor of M is denoted by MlJ = (m i1 ...ir ), ∀ i 1 , i 2 , ...ir ∈ J. Definition 4 ([9, 30]) Given M ∈ Tr,n and q ∈ Rn , TCP(q, M) is solvable if ∃ u ∈ Rn that satisfies (2). The tensor M ∈ Tr,n is called a Q-tensor if the TCP(q, M) defined by Eq. (2) is solvable for all q ∈ Rn . Definition 5 ([27]) A tensor M ∈ Tr,n is a P0 (P)-tensor, if for any u ∈ Rn \{0}, ∃ i ∈ In such that u i = 0 and u i (Mu r −1 )i ≥ 0 (> 0). Definition 6 ([26]) The ith row subtensor of M ∈ Tr,n is denoted by Ri (M) and its entries are given as (Ri (M))i2 ...ir = (m ii2 ...ir ), where i j ∈ In and 2 ≤ j ≤ r. M is a row diagonal tensor if all the row subtensors R1 (M), ..., Rn (M) are diagonal tensors. Definition 7 ([26]) Given M = (m i1 ...ir ) ∈ Tr,n , the majorization matrix of M is ˜ i j = (m i j... j ), ∀ i, j ∈ In . ˜ ∈ Rn×n where M denoted by M ˜ be the majorization matrix of M. Then Lemma 1 ([26]) Let M ∈ Tr,n and M ˜ M is row diagonal tensor iff M = MI, where I is identity tensor of order r and dimension n. Theorem 1 ([28]) Let M = (m i1 ...ir ) ∈ Tr,n . Then each principal subtensor of a (strictly) semipositive tensor is a(strictly) semipositive tensor. Theorem 2 ([28]) Let M = (m i1 ...ir ) ∈ Tr,n . The statements given below are equivalent: (a) M is semipositive tensor. (b) For every q > 0, the TCP(q, M) has a unique solution. (c) For any index set J ⊆ In with |J | = r, the system MrJ u rJ−1 < 0, u J ≥ 0 has no solution where, u J ∈ Rr .
More on Semipositive Tensor and Tensor Complementarity Problem
151
Theorem 3 ([28]) Let M = (m i1 ...ir ) ∈ Tr,n. The statements given below are equivalent: (a) M is strictly semipositive tensor. (b) For every q ≥ 0, the TCP(q, M) has a unique solution. (c) For any index set J ⊆ In with |J | = r, the system MrJ u rJ−1 ≤ 0, u J ≥ 0, u J = 0 has no solution where, u J ∈ Rr .
3 Main Results At the outset, we define semipositive tensor along with an example. Subsequently, we establish the invariance property of semipositive tensor with respect to the diagonal matrix as well as the permutation matrix. Definition 8 ([29]) A tensor M ∈ Tr,n is (strictly) semipositive tensor if for each u ∈ Rn+ \{0}, ∃ l ∈ In such that u l > 0 and (Mu r −1 )l ≥ 0 (> 0). Here we give an example of a semipositive tensor. Example 1 Let M ∈ T4,3 such that m 1211 = 1, m 1233 = 2, m 1323 = −1, m 1223 = = −1, −3, m 1232 = 4, m 2211 = 1, m 2223 = −3, m 2322 = 5, m 3232 ⎛ ⎞ m 3322 = u1 −3, m 3313 = −2 and all other entries of M are zero. For u = ⎝ u 2 ⎠ ∈ R3 we have u3 ⎛ 2 ⎞ 2 2 u1u2 + u2u3 + u2u3 ⎠ . Then M is a semipositive tensor. Mu 3 = ⎝ u 21 u 2 + 2u 22 u 3 2 2 −2u 1 u 3 − 4u 2 u 3 Theorem 4 Let M ∈ Tr,n and D ∈ Rn×n be a positive diagonal matrix. Then M is (strictly) semipositive tensor iff DM is (strictly) semipositive tensor. Proof Let D = diag(d1 , ..., dn ) be a positive diagonal matrix. Therefore, the diagonal entries di > 0, ∀ i ∈ In . Suppose M ∈ Tr,n is a (strictly) semipositive tensor. Then for 0 = u ≥ 0 ∃ an index l such that u l > 0 and (Mu r −1 )l ≥ 0 (> 0). We show that DM is (strictly) semipositive tensor. Since dl > 0 we obtain (DMu r −1 )l = dl (Mu r −1 )l ≥ 0 (> 0). Thus DM is (strictly) semipositive tensor. Conversely, suppose DM is (strictly) semipositive tensor. Now D is positive diagonal matrix so D −1 exists. Therefore D −1 (DM) = M is a (strictly) semipostive tensor. Corollary 1 Suppose M ∈ Tr,n is a semipositive tensor and D ∈ Rn×n be a nonnegative diagonal matrix. Then DM is semipositive tensor. Now we show that the converse of the result does not hold.
152
R. Deb and A. K. Das
Example 2 Let M ∈ T4,2 be such that m 1111 = 1, m 1121 = −1, m 1112 = −3, m 1222 = −1, m 1211 = 1, m 2121 = −3, m 2222 = 2 and all3 other2 entries3 of u 1 − 3u 1 u 2 − u 2 u1 M are zero. Then for u = ∈ R2 we have Mu 3 = . u2 −3u 21 u 2 + 2u 32 −3 1 . This implies M is not a Clearly for u = we obtain Mu 3 = −1 1 semipositive tensor. 00 Now we consider two nonnegative diagonal matrices D1 = and D2 = 03 20 . Let D1 M = B and D2 M = C. Then B = (bi jkl ) ∈ T4,2 with b2121 = 00 u1 −9, b2222 = 6 and all other entries of B are zero. For u = ∈ R2 we have u2 0 . Clearly B is a semipositive tensor. Bu 3 = −9u 21 u 2 + 6u 32 Also C = (ci jkl ) ∈ T4,2 with c1111 = 2, c1121 = −2, c1112 = −6, c1222 = u1 ∈ R2 −2, c1211 = 2 and all other entries of C are zero. Then for u = u2 3 2u 1 − 6u 21 u 2 − 2u 32 we have Cu 3 = . Clearly C is a semipositive tensor. 0 Theorem 5 Let P ∈ Rn×n be a permutation matrix also let M ∈ Tr,n . Then the tensor M is (strictly) semipositive tensor iff PMP T is (strictly) semipositive tensor. Proof Let M ∈ Tr,n be a (strictly) semipositive tensor. Let P ∈ Rn×n be a permutation matrix such that for any vector v ∈ Rn , (Pv)k = vσ (k) where σ is a permutation on the set of indices In . Let 0 = u ≥ 0 where u ∈ Rn . Let v = P T u. Then v ≥ 0. As M is a (strictly) semipositive tensor ∃ l such that vl > 0 and (Mvr −1 )l ≥ 0 (> 0). Thus (PMP T u r −1 )σ (l) = (MP T u r −1 )l = (Mvr −1 )l ≥ 0 (> 0). Also u σ (l) = (P T u)l = vl > 0. Therefore PMP T is (strictly) semipositive tensor. Conversely, suppose PMP T is (strictly) semipositive tensor. Now M = P(PMP T )P T , since P is a permutation matrix. Hence by the forward part of the proof M is (strictly) semipositive tensor. Now we prove the necessary and sufficient conditions for a tensor to be semipositive tensor. Theorem 6 Let M, N ∈ Tr,n where M is a (strictly) semipositive tensor and N ≥ O. Then M + N is (strictly) semipositive tensor. Proof Suppose M ∈ Tr,n is (strictly) semipositive tensor. Then for 0 = u ≥ 0 ∃ l ∈ In such that u l > 0 and (Mu r −1 )l ≥ 0 (> 0). Let N ∈ Tr,n where N ≥ O. Then for 0 = u ∈ Rn+ we have (N u r −1 )i ≥ 0, ∀ i ∈ In . Therefore ((M + N )u r −1 )l = (Mu r −1 )l + (N u r −1 )l ≥ 0 (> 0). Hence, M + N is (strictly) semipositive tensor. Here we discuss a necessary and sufficient condition for (strictly) semipositive tensor.
More on Semipositive Tensor and Tensor Complementarity Problem
153
Theorem 7 Let M ∈ Tr,n be a tensor with the property that all its proper principal subtensors are (strictly) semipositive tensor. M is (strictly) semipositive tensor iff for all diagonal tensors D ∈ Tr,n with D > O (≥ O), the tensor M + D does not have a positive null vector. Proof Let there be a diagonal tensor D ∈ Tr,n with D > O (≥ O) with the property that M + D has a positive null vector, i.e., There exists 0 < u ∈ Rn such that (M + D)u r −1 = 0 =⇒ Mu r −1 + Du r −1 = 0 =⇒ Mu r −1 = −Du r −1 < 0 (≤ 0), which shows that M is not (strictly) semipositive tensor. Conversely, suppose M is not (strictly) semipositive tensor. We prove that ∃ a diagonal tensor D ∈ Tr,n with D > O (≥ O) with the property that the tensor M + D has a positive null vector. Since M is not (strictly) semipositive tensor (but all proper principal subtensors are), ∃ u > 0 such that Mu r −1 < 0 (≤ 0). Now we r −1 )i construct diagonal tensor D with diagonal entries being defined as di...i = − (Mu . u r −1 i
Then D > O (≥ O) and Mu r −1 = −Du r −1 . Therefore, Mu r −1 + Du r −1 = (M + D)u r −1 = 0. Thus, M + D has a positive null vector, since u > 0.
Here we propose a necessary and sufficient condition for a semipositive tensor to be a strictly semipositive tensor. Theorem 8 A tensor M ∈ Tr,n is a semipositive tensor iff for every δ > 0, the tensor M + δI is a strictly semipositive tensor. Proof Let M ∈ Tr,n be a semipositive tensor. Then for every 0 = u ≥ 0, ∃ an index l such that u l > 0 and (Mu r −1 )l ≥ 0. Therefore, [(M + δI)u r −1 ]l = (Mu r −1 )l + δu rl −1 > 0. Thus M + δI is a strictly semipositive tensor. Conversely, suppose that M + δI is a strictly semipositive tensor for each δ > 0. For arbitrarily chosen 0 = u ≥ 0 consider a sequence {δk }, where δk > 0 and δk converges to zero. Then for each k, ∃ lk ∈ In such that u lk > 0 and [(M + δI)u r −1 ]lk > 0. This implies ∃ l ∈ In such that u l > 0 and (Mu r −1 )l + δk u rl −1 = [(M + δk I)u r −1 ]l > 0 for infinitely many δk . Since δk → 0 when k → ∞ we conclude that Mu rl −1 ≥ 0. Hence M is semipositive tensor. Let I denote the identity matrix of order n and [0, I ] denote all diagonal matrices from Rn×n with diagonal entries from [0, 1]. Now we prove a necessary and sufficient condition for strictly semipositive tensor. Theorem 9 Let M ∈ Tr,n and r is even. M is strictly semipositive tensor if and only if ∀ G ∈ [0, I ] the tensor GI + (I − G)M has no nonzero nonnegative null vector. Proof Let M be a strictly semipositive tensor and ∀ G ∈ [0, I ], ∃ 0 = u ≥ 0 which is a null vector of the tensor GI + (I − G)M. i.e. (GI + (I − G)M)u r −1 = 0.
154
R. Deb and A. K. Das
Let v = Mu r −1 . As M is strictly semipositive tensor ∃ l ∈ In such that u l > 0 and vl = (Mu r −1 )l > 0. Now [GI + (I − G)M]u r −1 = 0 =⇒ gll u rl −1 + (1 − gll )(Mu r −1 )l = 0 =⇒ gll (u rl −1 − vl ) = −vl . Note that if u rl −1 = vl then vl = 0 which contradicts the fact that vl > 0. Therefore −vl u rl −1 − vl = 0 and gll = ur −1 . Now if u rl −1 − vl > 0 then we have −vl ≥ 0, since −v l
l
gll ≥ 0. Therefore vl ≤ 0, which is again a contradiction, since vl > 0. Let u rl −1 − −vl ≤ 1 =⇒ u rl −1 − vl ≤ −vl =⇒ u l ≤ 0 vl < 0. Now since gll ≤ 1, we have ur −1 −vl l (since r is even) which contradicts the fact that u l > 0. Hence no nonzero nonnegative null vector exists. Conversely, let M be not a strictly semipositive tensor. Then ∃ 0 = u ≥ 0 such that u i (Mu r −1 )i ≤ 0 ∀ i ∈ In . Let v = Mu r −1 . Then for u l > 0 we have either vl = 0 or vl < 0. Note that if u l = 0, then there are three choices for vl , they are vl < 0. Now we define diagonal a matrix G with the diagonal vl = 0, vl > 0 or ⎧ ⎪ 0 if u i > 0, vi = 0 ⎪ ⎨ −vi if u > 0, v < 0 r −1 i i . Note that if u i > 0 and vi < 0, then elements as gii = u i −vi ⎪ 0 if u i = 0, vi = 0 ⎪ ⎩ 1 if u i = 0, vi ≶ 0 −vi −vi −vi 0 ≤ ur −1 ≤ 1. If not, then there are two possibilities either ur −1 < 0 or ur −1 > −v −v −v i
i
i
i
i
i
−vi < 0 implies vi > 0, since u ri −1 − vi > 0. This leads to a contradiction. 1. Now ur −1 −v i
Again if
i
−vi u ri −1 −vi
> 1 then we have −vi > u ri −1 − vi =⇒ u ri −1 < 0, which is again
a contradiction. Thus, we have 0 ≤ gii ≤ 1 and [(I − G)Mu r −1 ]i = (1 − gii )vi = −gii u ri −1 = −(GIu r −1 )i . Therefore, [GI + (I − G)M]u r −1 = 0., i.e., there exists a diagonal matrix G whose diagonal entries are from [0, 1] such that the tensor [GI + (I − G)M] has a nonzero nonnegative null vector. Here we establish a relation between an even order row diagonal (strictly) semipositive tensor and its majorization matrix. Theorem 10 Let M ∈ Tr,n be a row diagonal tensor of even order. M is (strictly) ˜ ∈ Rn×n is a (strictly) semimonotone matrix. semipositive tensor iff M ˜ ∈ Proof Let M ∈ Tr,n be a row diagonal tensor of even order. i.e., r is even. Let M n×n n r −1 be the majorization matrix of the tensor M. Then for u ∈ R we have Mu = R ˜ r −1 = Mu ˜ [r −1] . Let M be a (strictly) semipositive tensor. We prove that M ˜ is MIu 1 (strictly) semimonotone matrix. Now for 0 = v ≥ 0, v [ r −1 ] exists uniquely with 0 = 1 1 v [ r −1 ] ≥ 0, since r is even. Let u = v [ r −1 ] . Since M is a (strictly) semipositive tensor, ˜ [r −1] )l ≥ 0 (> 0). This for 0 = u ≥ 0 ∃ l such that u l > 0 and (Mu r −1 )l = (Mu r −1 [r −1] ˜ )l ≥ 0 (> 0). i.e., For 0 = v ≥ 0 ∃ l such that vl = implies u l > 0 and (Mu r −1 [r −1] ˜ ˜ ˜ is a (strictly) u l > 0 and (Mv)l = (Mu )l ≥ 0 (> 0). This implies that M semimonotone matrix.
More on Semipositive Tensor and Tensor Complementarity Problem
155
˜ be a (strictly) semimonotone matrix. Since M = MI, ˜ Conversely, let M for x ∈ Rn we have ˜ [r −1] )i, for all i ∈ In . (3) (Mx r −1 )i = (Mx We choose x ∈ Rn such that 0 = x ≥ 0 and construct the vector y = x [r −1] . Then 0 = ˜ is a (strictly) semimonotone matrix ∃ t ∈ In such that yt > 0 and y ≥ 0. Since M ˜ t ≥ 0 (> 0). i.e., For 0 = x ≥ 0 ∃ t ∈ In such that xtr −1 > 0 and (Mx r −1 )t ≥ (My) 0 (> 0) by Eq. 3. Therefore for 0 = x ≥ 0 ∃ t ∈ In such that xt > 0 (since r is even) and (Mx r −1 )t ≥ 0 (> 0). Hence M is (strictly) semipositive tensor. Not all even order semipositive tensors are row diagonal. Here we provide an example of a tensor that is semipositive tensor but not row diagonal tensor. Example 3 Let M ∈ T4,3 be such that m 1122 = 2, m 1131 = 2, m 2211 = 1, m 2112 = entries −4, m 2322 = −1, m 3232⎛= −1, ⎛ all other ⎞ m 3322 = 1, m 3313 = 3 and ⎞ of M u1 2u 1 u 22 + 2u 21 u 3 are zero. Then for u = ⎝ u 2 ⎠ ∈ R3 we have Mu 3 = ⎝ −3u 21 u 2 − u 22 u 3 ⎠ . The u3 3u 1 u 23 tensor M is a semipositive tensor of even order but M is not a row diagonal tensor.
4 Conclusion In this article, we establish some tensor theoretic properties of semipositive tensors and strictly semipositive tensors. We show the invariance property of (strictly) semipositive tensor with respect to the diagonal matrix and permutation matrix. We show that M ∈ Tr,n is a semipositive tensor if and only if for every δ > 0, the tensor M + δI is strictly semipositive tensor. We propose necessary and sufficient conditions for (strictly) semipositive tensor. Furthermore, we prove that an even order row diagonal tensor is (strictly) semipositive tensor if and only if its majorization matrix is (strictly) semimonotone matrix. Acknowledgements The author R. Deb is thankful to the Council of Scientific & Industrial Research (CSIR), India, for Junior Research Fellowship scheme for financial support.
References 1. Cottle RW (1968) On a problem in linear inequalities. J Lond Math Soc 1(1):378–384 2. Cottle RW, Pang JS, Stone RE (2009) The linear complementarity problem. SIAM 3. Das AK (2016) Properties of some matrix classes based on principal pivot transform. Ann Oper Res 243(1):375–382 4. Das AK, Jana R, Deepmala (2016) On generalized positive subdefinite matrices and interior point algorithm. In: International conference on frontiers in optimization: theory and applications. Springer, pp 3–16
156
R. Deb and A. K. Das
5. Das AK, Jana R, Deepmala (2017) Finiteness of criss-cross method in complementarity problem. In: International conference on mathematics and computing. Springer, pp 170–180 6. Guo Q, Zheng MM, Huang ZH (2019) Properties of S-tensors. Linear and Multilinear Algebra 67(4):685–696 7. Huang ZH, Qi L (2017) Formulating an n-person noncooperative game as a tensor complementarity problem. Comput Optim Appl 66(3):557–576 8. Huang ZH, Qi L (2019) Tensor complementarity problems–part III: applications. J Optim Theory Appl 183(3):771–791 9. Huang ZH, Suo YY, Wang J (2015) On Q-tensors. arXiv:1509.03088 10. Jana R, Das AK, Dutta A (2019) On hidden Z -matrix and interior point algorithm. Opsearch 56(4):1108–1116 11. Jana R, Das AK, Sinha S (2018) On processability of Lemke’s algorithm. Appl Appl Math 13(2) 12. Jana R, Dutta A, Das AK (2021) More on hidden Z -matrices and linear complementarity problem. Linear and Multilinear Algebra 69(6):1151–1160 13. Karamardian S (1972) The complementarity problem. Math Program 2(1):107–129 14. Luo Z, Qi L, Xiu N (2017) The sparsest solutions to Z -tensor complementarity problems. Optim Lett 11(3):471–482 15. Mohan SR, Neogy SK, Das AK (2001) More on positive subdefinite matrices and the linear complementarity problem. Linear Algebra Appl 338(1–3):275–285 16. Mohan SR, Neogy SK, Das AK (2001) On the classes of fully copositive and fully semimonotone matrices. Linear Algebra Appl 323(1–3):87–97 17. Neogy SK, Das AK (2005) On almost type classes of matrices with Q-property. Linear and Multilinear Algebra 53(4):243–257 18. Neogy SK, Das AK (2005) Principal pivot transforms of some classes of matrices. Linear Algebra Appl 400:243–252 19. Neogy SK, Das AK (2006) Some properties of generalized positive subdefinite matrices. SIAM J Matrix Anal Appl 27(4):988–995 20. Neogy SK, Das AK (2013) On weak generalized positive subdefinite matrices and the linear complementarity problem. Linear and Multilinear Algebra 61(7):945–953 21. Neogy SK, Das AK, Bapat RB (2009) Modeling, computation and optimization, vol. 6. World Scientific 22. Palpandi K, Sharma S (2021) Tensor complementarity problems with finite solution sets. J Optim Theory Appl 190(3):951–965 23. Pang JS (1979) On Q-matrices. Math Program 17(1):243–247 24. Qi L (2005) Eigenvalues of a real supersymmetric tensor. J Symb Comput 40(6):1302–1324 25. Shao JY (2013) A general product of tensors with applications. Linear Algebra Appl 439(8):2350–2366 26. Shao J, You L (2016) On some properties of three different types of triangular blocked tensors. Linear Algebra Appl 511:110–140 27. Song Y, Qi L (2015) Properties of some classes of structured tensors. J Optim Theory Appl 165(3):854–873 28. Song Y, Qi L (2016) Tensor complementarity problem and semi-positive tensors. J Optim Theory Appl 169(3):1069–1078 29. Song Y, Qi L (2017) Properties of tensor complementarity problem and some classes of structured tensors. Ann Appl Math 30. Song Y, Yu G (2016) Properties of solution set of tensor complementarity problem. J Optim Theory Appl 170(1):85–96 31. Tsatsomeros MJ, Wendler M (2019) Semimonotone matrices. Linear Algebra Appl 578:207– 224 32. Zheng YN, Wu W (2018) On a class of semi-positive tensors in tensor complementarity problem. J Optim Theory Appl 177(1):127–136
An Improved Numerical Scheme for Semilinear Singularly Perturbed Parabolic Delay Differential Equations J. Mohapatra
and S. Priyadarshana
Abstract This work proposes a more generalized numerical algorithm for delayed semilinear differential equations that are singularly perturbed in nature. After linearizing through the quasilinearization technique, a generalized θ -scheme is applied to deal with the time derivative term. The upwind scheme is applied to deal with the spatial derivative on layer resolving Shihskin type meshes which provides a uniform convergent result. The proposed scheme is also tested over a model with small space shifts (both negative and positive) and proved to be globally first-order accurate. To illustrate the method’s efficiency, numerical results are verified through tables and figures. Keywords Singular perturbation · Semilinear parabolic problem · Time delay · Quasilinearization technique
1 Introduction In many practical situations modeled by initial/boundary value problems, the presence of one or more small parameters creates abrupt changes near a narrow region of the considered domain. These problems are known as singular perturbation problems (SPPs) [6]. Recently, the singularly perturbed partial differential equations (SPPDEs) that contain lags either in space or in time or in both are of more concern. Our work focuses on SPPs with delays that are of retarded type, as the delay does not happen at the highest order derivative. Furthermore, time-delayed SPPs have been the major focus of many researchers of the current time. This time lag happens as the system needs time to sense the previous actions to react further. The major difference between SPPs with and without time-delay are 1. SPPs with time-delay can capture/predict the after-effect of a system that can not be done by conventional-instantaneous SPPs. J. Mohapatra (B) · S. Priyadarshana Department of Mathematics, National Institute of Technology, Rourkela 769008, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_12
157
158
J. Mohapatra and S. Priyadarshana
2. ρ = o(1), i.e., the delay in time is quite large and hence the nature of the approximated solution can behave quite differently from the original solution, if approximated by the first few terms of Taylor series expansion. 3. The SPPs without delay have initial condition as a function of s only, i.e., along t = 0, whereas in case of time-delayed SPPs, the initial condition is always a function of both s and t. This category of model problem has gained more research interest as these are quite useful in the modeling of semiconductor devices and population dynamics. In the literature, many works are done [12] in this context notably, a hybrid scheme for the space on the Shishkin mesh and the first-order accurate implicit Euler scheme for the time on a uniform mesh is used by Das and Natesan [1], a fourth-order scheme for problems arising in population dynamics is proposed by Govindrao et al. [3]. Later, for the similar SPPs with two small parameters, Govindarao et al. [2] proposed a first-order accurate finite difference scheme, Sahu and Mohapatra [10] elevated the accuracy by using the hybrid scheme on Shishkin mesh in space. Recently, the authors in [9] used the same scheme proposed by [2] along with the Richardson extrapolation technique to obtain second-order global accurate results. For more details, one can allude to the references of the aforesaid works. Though there are many works done for time-delayed SPPs, the literature for SPPs with time delay as well as spatial delay and advance terms is not so enriched. There is only one work of Sahu and Mohapatra [11] discussing a first-order approach for timedelayed SPPs with spatial retarded terms. Also, the semilinear form of the aforesaid problems are not well studied. To the best of our knowledge, only the authors in [8] proposed a first-order approximate numerical approach for semilinear time-delayed SPPs and for the analysis of semilinear SPPDEs with space delay, there are only two approaches done so far [4, 5]. This shows that there is no work in the literature providing a numerical approach for semilinear SPPs with time delay as well as spatial delay and advance terms. The proposed work can fill this gap. Our aim here is to provide a generalized and robust numerical method for semilinear SPPDEs involving both space and time delay terms. The organizational structure of the contribution is as follows. Some basic assumptions of the model problem are given in Sect. 2. Section 3 describes the fully discrete scheme. To verify the main conclusion in Sect. 5. Section 4 provides numerical tests on two exemplifying problems. In the whole manuscript, C > 0 is used as a generic constant. For g(s, t) on G, .∞ is defined as the standard supremum norm.
2 Description of the Problem
Lu(s, t) = u t + L ε u(s, t) = − f (s, t)u(s, t − ρ) + g(s, t, u) for (s, t) ∈ G(1) u Υd = Ψd (s, t), u Υl = Ψl (s, t), u Υr = Ψr (s, t).
An Improved Numerical Scheme for Semilinear Singularly Perturbed …
159
Here, G = G ∪ ∂G with G = (0, 1) × (0, T ] and ∂G = Υd ∪ Υl ∪ Υr . Υd = [0, 1] × [−ρ, 0], Υl = 0 × (0, T ], Υr = 1 × (0, T ]. L ε u = −εu ss (s, t) + a(s)u s (s, t) + b(s)u(s, t) with ρ > 0 is the temporal lag and T = kρ for k ∈ N. We assume a(s), b(s), f (s, t), g(s, t, u), Ψd (s, t), Ψl (t), Ψr (t) are ε−independent, sufficient smooth and bounded. The temporal and spatial domains are denoted by Gt = [−ρ, T ] and Gs = [0, 1], respectively.
2.1 Asymptotic Behavior of the Exact Solution Deducing some major estimates for the solution of (1), this subsection describes some useful information to be used later. Following the concept of the upper and lower solutions (see Definition 3.1 in [7]), we can write u˘ t − εu˘ ss (s, t) + a(s)u˘ s (s, t) + b(s)u(s, ˘ t) ≥ uˆ t − εuˆ ss (s, t) + a(s)uˆ s (s, t) + b(s)u(s, ˆ t).
Here, u˘ and uˆ are denoted as the upper and lower solutions of (1). Now, for any u ∈ u, ˘ u ˆ by the use of weaker assumption gu ≥ 0 in G, the problem (1) possesses a unique solution (see Lemmas 3.4–3.6, Chap. 2 in [7]). Lemma 1 Let, Θ(s, t), Φ(s, t) are two times differentiable in s, once in t, and satisfy LΘ ≤ LΦ in G, Θ ≤ Φ on ∂G, where L is the operator defined in (1). Then, Θ ≤ Φ in G. Proof The lemma is proved by the method of contradiction. Let, (s ∗ , t ∗ ) be a point in G such that, (Θ − Φ)(s ∗ , t ∗ ) = max (Θ − Φ)(s, t). In [0, ρ] clearly, (s,t) ∈ G (Θ − Φ) (s ∗ ,t ∗ ) ≤ 0, (Θ − Φ) (s ∗ ,t ∗ ) = 0, and (Θ − Φ)(s ∗ ,t ∗ ) > 0. In the right
hand side, as Θ ≤ Φ on ∂G is given, so we have − f (s ∗ , t ∗ )(Θ − Φ)(s ∗ , t ∗ − ρ) ≥ 0 for u(s ∗ , t ∗ − ρ) ∈ ∂G. Again, g(s ∗ , t ∗ , Θ) − g(s ∗ , t ∗ , Φ) = gu (s ∗ , t ∗ , ℘)(Θ − Φ)(s ∗ , t ∗ ) > 0 as gu > 0, taking ℘ as an intermediate value between Θ and Φ. Hence, we get L(Θ − Φ) > 0, which is a contradiction to our given statement. Hence, Θ ≤ Φ in G.
2.2 Quasilinearization The quasilinearization process is used to solve (1). The function u(s, t) is approximated with a suitable initial guess, i.e., u (0) (s, t) in g(s, t, u). From the conditions given in (1), we can introduce u (0) (s, t) = u (0) . Now, for all i > 0, expansion of g(s, t, u (i+1) ) around u (i) gives
160
J. Mohapatra and S. Priyadarshana
g(s, t, u (i+1) (s, t)) ≈ g(s, t, u (i) ) + u (i+1) − u (i)
∂g + ... ∂u(s, t) (s,t,u (i) )
(2)
Applying (2) in (1), we get (i+1) − εu (i+1) + a(s)u (i+1) + b(s)u (i+1) − Lu ∼ = ut ss s
∂g u (i+1) ∂u (s,t,u (i) )
∂g = − f (s, t)u (i+1) (s, t − ρ) + g(s, t, u (i) ) − u (i) ∂u . (s,t,u (i) )
(s, t)
On further rearrangement, we have (i+1) Lu ∼ (s, t) − εu (i+1) (s, t) + a(s)u (i+1) (s, t) + B(s, t)u (i+1) (s, t) = ut ss s (i+1) = − f (s, t)u (s, t − ρ) + G(s, t).
(3)
∂g ∂g B(s, t) = b(s) − ∂u and G(s, t) = g(s, t, u (i) ) − u (i) ∂u . Afterward, (s,t,u (i) ) (s,t,u (i) ) each iteration is solved numerically with a stopping criterion max U i+1 (sn , tm ) − U i (sn , tm ) ≤ T O L .
(sn , tm ) ∈ G
Only a few minor modifications are required in [1] to obtain the following results. Lemma 2 Choose C sufficiently large for all (s, t) ∈ G, the solution of (3) satisfies |u(s, t) − Ψd (s, 0)| ≤ Ct, and
| u(s, t) |≤ C.
Theorem 1 For p, q ≥ 0, 0 ≤ p + q ≤ 5, the following can be obtained: p+q ∂ u −α(1 − s) −p , (s, t) ∈ G. ∂s p ∂t q ≤ C 1 + ε exp ε
2.3 Decomposition of the Solution For getting the parameter uniform error estimate, the bounds on the derivatives of the analytical solution of (3) are required. Thus, the solution of (3) is divided into its smooth (u r ) and boundary-layer components (u b ) as u(s, t) = u r (s, t) + u b (s, t). The smooth components satisfy
(u r )t (s, t) + L ε u r (s, t) = G(s, t) − f (s, t)u r (s, t − ρ), in G (4) u r Υ = Ψd (s, t), u r Υ = Ψl (s, t) and u r Υ = 4j=0 ε j u r j (s, t) for 0 ≤ t ≤ T. d
l
r
u r j , for j = 0, 1, . . . , 4 are the regular components as described in [1]. Now, the singular component satisfies
An Improved Numerical Scheme for Semilinear Singularly Perturbed …
(u b )t (s, t) + L εu b (s, t) = − f (s, t)u b (s, t − ρ), in G u b Υd = 0, u b Υl = 0 and u b Υr = u Υr − u r Υr for 0 ≤ t ≤ T.
161
(5)
Theorem 2 For p, q > 0, with enough compatibility conditions at the corners of G, u r and u b satisfy the following bounds for 0 ≤ p + q ≤ 5 : p+q ∂
ur 4− p , ∂s p ∂t q ≤ C 1 + ε
p+q ∂ u b −α(1 − s) −p ≤ Cε . exp ∂s p ∂t q ε
3 Numerical Approximation 3.1 Temporal Semi-discretization The proposed numerical procedure approximates the time variable employing a uniM form mesh with step length Δt. The discretized domain in time Gt = [−ρ, 2ρ = T ] is GtM1 = {tm = mΔt, m = 0, 1 . . . , M1 , t M1 = 0, Δt = ρ/M1 }, GtM2 = {tm = mΔt, m = 0, 1 . . . , M2 , t M2 = T, Δt = T /M2 }. The total number of mesh intervals in time domain [−ρ, 2ρ] is M = (M1 + M2 ), with M1 and M2 denoted as the number of mesh intervals in [−ρ, 0] and [0, T ] , respectively. To discretize the time variable for (3), the θ -scheme is used. In general, a stable numerical scheme is constructed for 0.5 ≤ θ ≤ 1. For θ = 0.5, the CrankNicolson scheme and for θ = 1, the classical upwind scheme can be obtained. The semi-discrete approximation of (3) is u m+1 − u m m+1 m+1 m+1 m+1 ∼ + θ − εu ss + au s + B u Lu = Δt m m m (6) +(1 − θ ) − εu m ss + au s + B u = θ − f m+1 u m+1−M1 + G m+1 + (1 − θ ) − f m u m−M1 + G m ∈ G, with u Υd = Ψd (s, t), u Υl = Ψl (t), u Υr = Ψr (t). Here, u m+1 and u m are the computed solutions at tm+1 and tm time level, respectively. The local truncation error of the semi-discrete scheme (6) namely em+1 is defined as u(sn , tm+1 ) − U (sn , tm+1 ), where U (sn , tm+1 ) is the solution of
162
J. Mohapatra and S. Priyadarshana
U m+1 + Δtθ − εUssm+1 + aUsm+1 + B m+1 U m+1 m m m = U m − Δt (1 − θ ) εu m + au + B u ss s +Δtθ − f m+1 u m+1−M1 + G m+1 + Δt (1 − θ ) − f m u m−M1 + G m ,
(7)
with similar boundary conditions as mentioned above. For easy error calculation let us denote (7) as ˜ tm+1 ), (I + Δtθ L ε )U (s, tm+1 ) = G(s,
(8)
where m m m ˜ tm+1 ) = U m − Δt (1 − θ ) − εu m + au + B u G(s, ss s +Δtθ − f m+1 u m+1−M1 + G m+1 + Δt (1 − θ ) − f m u m−M1 + G m . ∂qU ≤ C for q = 0, 1, 2, the local trun∂t q cation error associated with (7) at tm+1 , estimates
Theorem 3 Considering Lemma 2 and if em+1 ≤
C(Δt)2 , if 0.5 < θ ≤ 1 C(Δt)3 , if θ = 0.5.
Theorem 4 (Error for temporal semi-discretization) Using the hypothesis mentioned in Theorem 3, the global error at tm+1 is E m+1 ≤
C(Δt), if 0.5 < θ ≤ 1 C(Δt)2 , if θ = 0.5.
3.2 Spatial Discretization The complete number of mesh intervals in G s is taken to be N , with ϑ0 = α2 and a non-negative constant (1 − ϑ) as the transition parameter, where ϑ is defined by 1 , ϑ0 ε ln N . ϑ = min 2
N
(9)
The domain G s is split up to two sub-domains, i.e., [0, 1 − ϑ] and [1 − ϑ, 1] and defined as N G s = {s0 = 0, s1 , s2 , . . . , s N /2 = (1 − ϑ), . . . s N = 1}.
An Improved Numerical Scheme for Semilinear Singularly Perturbed …
163
N and h n = h if n = Let sn = nh n with h n = sn − sn−1 . h n = H if n = 1, 2, . . . , 2 N + 1, . . . , N − 1. S-mesh is denoted by 2 ⎧ 2(1 − ϑ) N ⎪ ⎨n , if n = 1, 2, . . . , , N 2 sn = 2ϑ N ⎪ ⎩(1 − ϑ) + (n − N /2), if n = + 1, . . . , N − 1. N 2 B-S-mesh is ⎧ 2(1 − ϑ) N ⎪ , if n = 1, 2, . . . , , ⎨n N 2 2 sn = N N − 2(N − n)(N − 1) 2 ⎪ ⎩1 + ε ln , if n = + 1, . . . , N − 1. α N2 2
3.3 Finite Difference Scheme With N and M number of mesh points in Gs and Gt , G N ,M is denoted as the discretized form of G. The fully discrete scheme after the use of the upwind scheme in space is U m+1 − Unm LNU ∼ + θ L εN Unm+1 + (1 − θ)L εN Unm = n Δtm+1 m+1−M1 = θ − f n Un + G nm+1 + (1 − θ) − f nm Unm−M1 + G m on G N ,M , n (10) N m+1 ∼ = (I + Δtθ L ε )Un m+1−M1 N m m+1 m+1 = (I − Δt (1 − θ)L ε )Un + Δtθ − f n Un + Gn , +Δt (1 − θ) − f nm Unm−M1 + G m n
L εN Unm+1 = − ε Ds+ Ds− Unm+1 + an Ds− Unm+1 + Bnm+1 Unm+1 . The operators are defined as Ds+ Ds− Unm =
m U m − Un−1 2 + m Ds Un − Ds− Unm , Ds− Unm = n . n hn
n is denoted as n = h n + h n+1 . After the use of the finite difference scheme, we have ⎧ − m+1 m+1 A1 Un−1 + Ac1 Unm+1 + A+ ⎪ 1 Un+1 = ⎪ ⎪ 1 1 ⎪ + Δt (1 − θ ) − f nm u m−M + G m+1 + Gm ⎨ Δtθ − f nm+1 u m+1−M n n n n − m + m , +A2 Un−1 + Ac2 Unm + A2 Un+1 (11) ⎪ −m ⎪ ⎪ Un = Ψd (sn , −tm ), for m = 0, . . . , M1 and n = 1, . . . , N − 1, ⎪ ⎩ m+1 = Ψl (tm+1 ), U Nm+1 = Ψr (tm+1 ) ∀(s, t) ∈ G N ,M . U0 The coefficients for 1 < n ≤ N are given by
164
J. Mohapatra and S. Priyadarshana
⎧ −2ε −2ε ⎪ ⎪ A+ , A+ , ⎪ 1 = θΔt 2 = −(1 − θ)Δt ⎪ ⎪ n h n+1 n h n+1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ an an 2ε 2ε + + Bnm+1 + 1, Ac2 = −(1 − θ)Δt + + Bnm + 1, Ac1 = θΔt ⎪ h n+1 h n hn h n+1 h n hn ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ an an 2ε 2ε ⎪ ⎪ ⎩ A− − , A− − . 1 = θΔt − 2 = −(1 − θ)Δt − n h n hn n h n hn
Thomas algorithm is applied to handle the tri-diagonal matrices formed in (11). Proposition 1 Let Θ(sn , tm+1 ), Φ(sn , tm+1 ) are two times differentiable in s, once in N ,M t, and satisfy L N Θ(sn , tm+1 ) ≤ L N Φ(sn , tm+1 ) in G , Θ(sn , tm+1 ) ≤ Φ(sn , tm+1 ) N ,M N on ∂G , where L is the operator defined in (10). Then, Θ(sn , tm+1 ) ≤ Φ(sn , tm+1 ) N ,M in G . Proposition 2 Let u and U are the solutions of (1) and (10), respectively. The error N ,M associated with U for (sn , tm+1 ) ∈ G at time level tm+1 for S-mesh is u(sn , tm+1 ) − U (sn , tm+1 )∞ ≤
C(N −1 ln N + Δt), if 0.5 < θ ≤ 1 C(N −1 ln N + (Δt)2 ), if θ = 0.5,
and for B-S-mesh is u(sn , tm+1 ) − U (sn , tm+1 )∞ ≤
C(N −1 + Δt), if 0.5 < θ ≤ 1 C(N −1 + (Δt)2 ), if θ = 0.5.
4 Numerical Outputs Example 1 Consider the semilinear SPPs having time lag of the form (1), with (s, t) ∈ (0, 1) × (0, 2] and a(s) = 2 − s 2 ,
b(s) = s,
g(s, t, u) = 10t 2 ex p(−tu)s(1 − s)
with given conditions: Ψd (s, t) = 0, Ψl (t) = 0, Ψr (t) = 0. Example 2 Consider the semilinear time-delayed SPPDEs with spatial delay and advance term of the following, with (s, t) ∈ (0, 1) × (0, 2]
u t + L ε,δ,η u(s, t) = − f (s, t)u(s, t − ρ) + g(s, t, u) for (s, t) ∈ G u Υd = Ψd (s, t), u Υl = Ψl (s, t), u Υr = Ψr (s, t).
(12)
An Improved Numerical Scheme for Semilinear Singularly Perturbed …
165
Υd = [0, 1] × [−ρ, 0], Υl = [−δ, 0] × (0, T ], Υr = [1, 1 + η] × (0, T ]. L ε,δ,η u = −εu ss (s, t) + a(s)u s (s, t) + b(s)u(s, t) + c(s)u(s − δ, t) + d(s)u(s + η, t). δ, η are the delay and the advance parameters in space, respectively. The coefficients are defined as a(s) = 2 − s 2 , d(s) = 1,
b(s) = s − 3, f (s, t) = 1,
c(s) = 2, g(s, t, u) = 10t 2 ex p(−tu)s(1 − s),
with given conditions: Ψd (s, t) = 0, Ψl (s, t) = 0, Ψr (s, t) = 0. Unlike ρ = o(1), the case when δ and η are of o(ε)(i.e., δ = μ1 ε and η = μ2 ε where μ1 and μ2 are of o(1)) is considered. Hence, the retarded terms in spatial direction can be treated with the Taylor series expansion as δ2 u ss (s, t), 2 η2 u(s + η, t) ≈ u(s, t) + ηu s (s, t) + u ss (s, t). 2
u(s − δ, t) ≈ u(s, t) − δu s (s, t) +
(13)
The use of (13) converts (12) to
Lu(s, t) = u t + L ε,δ,η u(s, t) = − f (s, t)u(s, t − ρ) + g(s, t, u) for (s, t) ∈ G (14) u Υ = Ψd (s, t), u(0, t) = Ψl (0, t), u(1, t) = Ψr (1, t) for t ∈ Gt , d
2 2 where L ε,δ,η u(s, t) = − ε − δ2 c(s) − η2 d(s) u ss (s, t) + a(s) − δc(s) + ηd(s)
u s (s, t) + b(s) + c(s) + d(s) u(s, t). Here, u t + L ε,η,δ u(s, t) ≈ L with error of 3 3
O(δ , η ). Smaller δ and η makes (14) a preferable approximation for (12). Here, b(s) + c(s) + d(s) > γ > 0, c(s) > 2ξ1 and d(s) > 2ξ2 , ∀s ∈ Gs are assumed.
2 2 Again, Example 2 satisfies, ε − δ2 c(s) − η2 d(s) > 0 and a(s) − δc(s)+ηd(s) > 2ξ > 0 and hence exhibits a layer at right hand side. Now, (12) has reduced to the form of (1), so the numerical approximation and the theoretical aspects can be done accordingly. For both the test examples, the exact solution is not known. Assuming U N ,M and 2N ,2M as the numerical solutions in G N ,M and G 2N ,2M , respectively, errors(E N ,Δt ) U and rates of convergence(R N ,Δt ) are obtained by E N ,Δt =
max U N ,M (sn , tm ) − U 2N ,2M (sn , tm ), R N ,Δt = log2
(sn , tm ) ∈ G
E N ,Δt . E 2N ,Δt/2
Examples 1 and 2 are provided for the reference of numerical experiments on semilinear time-delayed SPP and semilinear time-delayed SPP with spatial retarded terms, respectively. For computation, the TOL is chosen to be 10−10 . Figure 1 shows
166
J. Mohapatra and S. Priyadarshana
0.8 0.6
0.3
U(s,t)
U(s,t)
0.4
0.2
0.4 0.2
0.1 0
0 0
0.2
0.4
0.6
1 2
s
(a) ε = 10
0 0
1
0.8
0.5
t
s
0
1
1.5
1 2
0
0.5
t
(b) ε = 10−3
Fig. 1 Surface plots for Example 1 -1
0.08
=10
-3
=10
0.04
U(s,t)
0.04 0.02 0
0.03
0.2
0.4
s
0.6
0.4 0.2
0.95
0
t=0.5 t=1 t=1.5 t=2
0.6
=10-4
U(s,t)
0.06
U(s,t)
0.8
=10-2
1
s 0.8
(a) Layer plot for different ε
1
0
0
0.2
0.4
0.6
0.8
1
s
(b) Layer plot at different time levels with ε = 10−7
Fig. 2 Solution profiles with δ = 0.6 × ε and η = 0.8 × ε for Example 2
the layer behavior of the solution of Example 1, through surface plots. The effect of spatial retarded terms on the solution along with the layer behavior at different time levels for Example 2 can be shown through Fig. 2. Table 1 compares the numerical solution of Example 1 for different values of θ on both the meshes and proves the advantage of using the B-S-mesh over the S-mesh. As the proposed scheme is generalized in nature, it is further extended to give second-order accurate results with a limitation h = (Δt)2 which is shown in Table 4. Tables 2 and 3 account the numerical outputs for different values of δ and η on the S-mesh and the B-S-mesh, respectively for Example 2. All the tables confirm that the approximations are better for θ = 0.5 comparative to 0.5 < θ ≤ 1 (Table 4).
An Improved Numerical Scheme for Semilinear Singularly Perturbed …
167
Table 1 Numerical results for Example 1 ε
N/ Δt
Mesh
1 32/ 16
θ 10−4
S-mesh
S-mesh
1 512/ 256
5.4230e-2
3.4181e-2
2.1127e-2
1.2525e-2
7.1132e-3
5.2508e-2
3.3386e-2
2.0827e-2
1.2381e-2
7.1084e-3 7.0890e-3
5.1318e-2
3.2927e-2
2.0639e-2
1.2288e-2
0.6401
0.6739
0.7481
0.7936
1
4.5047e-2
2.3301e-2
1.1875e-2
5.9973e-3
3.0747e-3
0.7
4.2861e-2
2.2169e-2
1.1312e-2
5.7186e-3
2.8913e-3 2.8168e-3
4.1444e-2
2.1524e-2
1.1041e-2
5.5953e-3
0.9452
0.9630
0.9805
0.9901
1
5.4436e-2
3.4296e-2
2.1185e-2
1.2559e-2
7.2491e-3
0.7
5.2709e-2
3.3499e-2
2.0884e-2
1.2415e-2
7.1707e-3 7.1067e-3
0.5 B-S-mesh
1 256/ 128
1
0.5 10−6
1 128/ 64
0.7 0.5 B-S-mesh
1 64/ 32
5.1513e-2
3.3037e-2
2.0696e-2
1.2322e-2
0.6408
0.6747
0.7481
0.7939
1
4.5047e-2
2.3301e-2
1.1875e-2
5.9973e-3
3.0765e-3
0.7
4.3089e-2
2.2294e-2
1.1379e-2
5.7556e-3
2.9263e-3 2.8363e-3
0.5
4.1668e-2
2.1637e-2
1.1105e-2
5.6311e-3
0.9454
0.9622
0.9797
0.9894
Table 2 Numerical results on S-mesh with δ = 0.5 × ε, η = 0.5 × ε for Example 2 ε 10−3
10−5
θ
N/Δt 1 32/ 16
1 64/ 32
1 128/ 64
1 256/ 128
1
5.2600e-2
3.3245e-2
2.0653e-2
1.2251e-2
1 512/ 256
7.0554e-3
0.7
5.0921e-2
3.2472e-2
2.0358e-2
1.2109e-2
6.9901e-3
0.5
4.9767e-2
3.2025e-2
2.0174e-2
1.2017e-2
6.9468e-3
0.6359
0.6667
0.7474
0.7906
1
5.4512e-2
3.4309e-2
2.1184e-2
1.2557e-2
7.2162e-3
0.7
5.2790e-2
3.3508e-2
2.0882e-2
1.2413e-2
7.1489e-3
0.5
5.1579e-2
3.3043e-2
2.0695e-2
1.2319e-2
7.1053e-3
0.6424
0.6750
0.74784
0.7939
Table 3 Numerical results on B-S-mesh with δ = 0.3 × ε, η = 0.3 × ε for Example 2 ε 10−3
10−5
θ
N/ Δt 1 32/ 16
1 64/ 32
1 128/ 64
1 256/ 128
1 512/ 256
1
4.3050e-2
2.2212e-2
1.1297e-2
5.7026e-3
2.8738e-3
0.7
4.1135e-2
2.1240e-2
1.0828e-2
5.4742e-3
2.7621e-3
0.5
3.9749e-2
2.0656e-2
1.0569e-2
5.3573e-3
2.7058e-3
0.9443
0.9667
0.9802
0.9854
1
4.5146e-2
2.3321e-2
1.1877e-2
5.9958e-3
3.0122e-3
0.7
4.3194e-2
2.2315e-2
1.1380e-2
5.7539e-3
2.8941e-3
0.5
4.1777e-2
2.1648e-2
1.1103e-2
5.6287e-3
2.8347e-3
0.9484
0.9632
0.9800
0.9896
168
J. Mohapatra and S. Priyadarshana
Table 4 Numerical results with ε = 10−6 for Example 1 N↓ θ =1 θ = 0.5 S-mesh B-S-mesh S-mesh B-S-mesh N = M1 N = M12 N = M1 N = M12 N = M1 N = M12 N = M1 N = M12 16
32
64
8.0145e2 0.5958 5.3029e2 0.6566 3.3638e2
1.4751e2 1.4722 5.3167e3 1.5097 1.8672e3
8.1150e2 0.9016 4.3438e2 0.9509 2.2470e2
8.9353e3 1.4660 3.2345e3 1.7074 9.9045e4
7.6598e2 0.5710 5.1562e2 0.6416 3.3051e2
1.2258e2 1.6215 3.9838e3 1.7294 1.2014e3
7.8102e2 0.9056 4.1691e2 0.9453 2.1651e2
5.5590e3 1.9835 1.4057e3 1.9946 3.5274e4
5 Conclusion A more generalized numerical scheme is proposed for time-delayed semilinear singularly perturbed parabolic problems. The quasilinearization technique is used to tackle the semilinearity in the problem. The temporal direction is treated with the generalized θ -scheme and to deal with the abrupt change that happens inside the layer region, the spatial direction is handled with the upwind scheme on two layer resolving meshes namely, the Shishkin mesh and the Bakhvalov Shishkin mesh. Highly efficient Thomas algorithm is used to solve the tri-diagonal system of equations formed therein. The scheme is proved to be convergent and parameter uniform for 0.5 ≤ θ ≤ 1. Moreover, the potency of the scheme is tested over time-delayed semilinear singularly perturbed parabolic problems along with spatial delay and advance terms and is proved to be efficient enough. Though the scheme is globally first-order accurate, but it is very efficient from the computational point of view and can be extended to second-order accurate schemes easily. Acknowledgements Ms. S.Priyadarshana conveys her profound gratitude to DST, Govt. of India for providing INSPIRE fellowship (IF 180938).
References 1. Das A, Natesan S (2015) Uniformly convergent hybrid numerical scheme for singularly perturbed delay parabolic convection-diffusion problems on Shishkin mesh. Appl Math Comput 271:168–186 2. Govindarao L, Sahu SR, Mohapatra J (2019) Uniformly convergent numerical method for singularly perturbed time delay parabolic problem with two small parameters. Iran J Sci Technol Trans A Sci 43(5):2373–2383 3. Govindarao L, Mohapatra J, Das A (2020) A fourth-order numerical scheme for singularly perturbed delay parabolic problem arising in population dynamics. J Appl Math Comput 63(1):171–195
An Improved Numerical Scheme for Semilinear Singularly Perturbed …
169
4. Kabeto MJ, Duressa GF (2021) Robust numerical method for singularly perturbed semilinear parabolic differential difference equations. Math Comput Simul. https://doi.org/10.1016/j. matcom.2021.05.005 5. Kumar S, Kumar BR (2017) A finite element domain decomposition approximation for a semilinear parabolic singularly perturbed differential equation. Int J Nonlinear Sci Numer Simul 18(1):41–55 6. Ladyženskaja OA, Solonnikov VA, Ural’ceva NN (1968) Linear and quasi-linear equations of parabolic type. Am Math Soc 23 7. Pao CV (1992) Nonlinear parabolic and elliptic equations, 1st edn. Springer, New York, NY 8. Priyadarshana S, Mohapatra J, Govindrao L (2021) An efficient uniformly convergent numerical scheme for singularly perturbed semilinear parabolic problems with large delay in time. J Appl Math Comput 12(1):55–72 9. Priyadarshana S, Mohapatra J, Pattanaik SR (2022) Parameter uniform optimal order numerical approximations for time-delayed parabolic convection diffusion problems involving two small parameters. Comput Appl Math 41(233). https://doi.org/10.1007/s40314-022-01928-w 10. Sahu SR, Mohapatra J (2021) Numerical investigation of time delay parabolic differential equation involving two small parameter. Eng Comput 38(6):2882–2899. https://doi.org/10. 1108/EC-07-2020-0369 11. Sahu SR, Mohapatra J (2021) Numerical study of time delay singularly perturbed parabolic differential equations involving both small positive and negative space shift. J Appl Anal. https://doi.org/10.1515/jaa-2021-2064 12. Shishkin GI, Shishkina LP (2008) Difference methods for singular perturbation problems, 1st edn. CRC Press, New York
Computational Modeling of Noisy Plasma Images Applicable to Tokamak Imaging Diagnostics for Visible and X-ray Emissions Dhruvil Bhatt , Kirtan Delwadia , Shishir Purohit , and Bhaskar Chaudhury Abstract Tokamaks and stellarators are well-known magnetic confinement devices, to achieve thermonuclear fusion power, that are used to confine plasma by using powerful magnetic field. This paper aims at achieving a synthetic image of the plasma placed inside the tokamak for different temperature profiles where large number of actual images are difficult to get for divergent configurations. In the actual image obtained using tokamak rotations, certainly some noise is added due to the electrical disturbances of the hardware along with some non-deterministic noise. In order to incorporate this effect in the obtained synthetic images, we have added stochastic noises of different distributions. We aim at obtaining a synthetic image as close as possible to the image captured in the lab using the actual tokamak and the camera apparatus, therefore information loss due to fiber bundle effect has been also taken into consideration. The primary objective is to provide a computational tool that can assist in designing any future imaging diagnostic for a crowded tokamak/stellarator environment for perpendicular or tangential viewing geometry. The proposed methodology can also generate synthetic images having the feature statistics of actual images and can be used to create large data-sets for ML assisted tokamak plasma imaging diagnostics. Keywords Tokamak · Tomography · Synthetic image · Imaging diagnostics (ID)
1 Introduction Nuclear fusion, a process where two lighter nuclei fuse to form a heavy nucleus, is a crucial phenomenon for future energy needs [1]. Nuclear fusion takes place in the hot plasma, where the mutual repulsive forces of the nuclei are overcome by high D. Bhatt · K. Delwadia · B. Chaudhury (B) Group in Computational Science and HPC, DA-IICT, Gandhinagar, India e-mail: [email protected] S. Purohit Institute for Plasma Research, Gandhinagar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_13
171
172
D. Bhatt et al.
pressure and temperature. Nuclear fusion can be performed in a controlled environment, and tokamak devices are one of the natural choices [2]. The tokamak is a doughnut-shaped vessel that magnetically confines high-temperature plasma within. The sustained fusion grade plasma requires a range of diagnostic to monitor its progress. These diagnostics cover a range of plasma aspects like temperature, density, pressure, internal magnetic perturbations, namely Thomson scattering, Electron cyclotron emission (ECE), magnetic probe, flux loops, and X-ray diagnostic [3]. The equilibrium reconstruction is the most crucial objective of any tokamak plasma diagnostics. The tokamak plasma’s magnetic axis and the plasma boundary assists in the realization of the equilibrium reconstruction. Magnetic diagnostic and Thomson scattering are essential for such realization. However, the imaging diagnostic (ID) has also proven its capability in giving out the plasma boundary and magnetic axis information for any given tokamak plasma [4]. Considering the fusion grade plasma, the ID will be contributing significantly as the conventional diagnostic may not be very effective. The ID gives out the time evolution of line integrated emission measure along the line of sight of the detector system. The ID works at different wavelengths such as VUV (Vacuum UV), visible, and X-ray [5]. The X-ray emissions are associated with the plasma core and hold information about the magnetic islands as well as about the plasma emission profile. The VUV /visible emissions are strongly coupled with the edge of the tokamak plasma and low “Z” plasma. The viewing geometry is either tangential or perpendicular, depending on the requirements. Sophisticated mathematical computation is required for analyzing the imaging diagnostics data to carry out 3D reconstruction [6]. This kind of computation requires optimization for any given plasma and the ID viewing geometry, which is computationally expensive. The situation is even more critical when the diagnostic view is partial or when insufficient lines of sight are available. Therefore, a proper understanding of the plasma viewing and the optimization of the computational strategy is required for the diagnostic design phase. The simulation of such complex geometries and the generation of the associated ID output at the diagnostic design phase is very helpful. This is the primary motivation of this paper. The paper discusses in depth the generation of the synthetic data for the ID, applicable to multiple wavelengths and bands, for optimizing the viewing geometry and the tomographic reconstruction methodologies. Using a mathematical model, synthetic data is generated artificially to replicate distributions and patterns found in actual experimental data. The noise, which is an integral part of any diagnostic data is also discussed, and the introduction of the noise within the data is elaborated on in-depth. The objective is to provide a computational tool that assists in designing any future imaging diagnostic for a crowded tokamak environment for perpendicular or tangential viewing geometry. Another key outcome is to have synthetic ID output data for a given plasma emission for training Machine learning (ML)/Artificial Intelligence (AI)-based models for the tomographic reconstructions [7]. The visualization of the problem is given in Sect. 2, experimental setup. The mathematical transformation of the experimental setup is explained in Sect. 3,
Computational Modeling of Noisy Plasma Images Applicable …
173
Mathematical modeling. The computational implementation of the mathematical model is elaborated in Sect. 4. Results and discussion are presented in Sect. 5, followed by the summary.
2 Experimental Setup The high-temperature plasma is created and confined magnetically within the vacuum vessel of the tokamak. The plasma created within also has the shape of the torus/doughnut. The major axis of the tokamak, the distance from the center of the doughnut to the center of the poloidal cross-section, is denoted by “R.” The radius of the poloidal cross-section is called the minor axis, “a.” The poloidal cross-section can be either circular or a “D” shaped. The experimental setup considered here is having a circular cross-section vacuum vessel as well as plasma, as shown in Fig. 1. The major radii, denoted by R, is the same for the vessel and the plasma that is 0.75 m. The minor radii, denoted by a, is 0.25 m for the plasma and 0.27 m for the tokamak vacuum vessel. The steel vacuum vessel consists of a cavity or a viewing port at the midplane of the torus. A 2D camera, 512 × 512, is placed at the midplane, viewing the plasma tangentially as shown in Fig. 1. A group of the linear detector array, five arrays with 15 detector pixels per array, viewing the plasma in the perpendicular direction is installed at the same toroidal location within a poloidal plane. The plasma, assembly of high-energy charged particles, moving within the magnetic confinement radiates electromagnetic radiation depending on their energy. These radiations are observed by the 2D camera/linear detector arrays which provides the line integrated data for a given time. The main objective of this paper is to design a computer-assisted model to simulate this data and evaluate all the key components which affect the quality of the imaging diagnostic data. The simulation of synthetic camera images requires an in-depth understanding of the plasma emission profile, viewing geometry, camera design, operational conditions, and possible noise features [8].
Fig. 1 Schematic of the experimental setup considered in this work
174
D. Bhatt et al.
3 Mathematical Model The mathematical model for generating synthetic data of a plasma imaging diagnostic requires an understanding of plasma emission, the realization of the viewing geometry, integration of the emission along the line of sight of ID, and the estimation of possible noises, respectively. The line integrated data and different noise elements as discussed formulate the mathematical model as given by Eq. (1). Here Ii is the ith pixel intensity value, Ai j is geometrical factor and E j is local emission value for jth location along the line of sight. B is the coefficient of the noise element which decides the amount of noise that had to be added to the image. The second term in Eq. (1) is a linear combination of different noise elements [9, 10]. Ii =
Ai j E j + B N oise
(1)
j
3.1 Plasma Emissions The plasma is a wideband emission source, emitting all possible wavelengths and the amount of emission may change from different parts of the tokamak plasma. Emission at different wavelengths harbors various aspects of the plasma process. The Vacuum Ultraviolet (VUV) or the visible (VS) emission are not uniform across the plasma poloidal cross-section and, peak at a certain point between the edge and the core of the tokamak plasma, see Fig. 2b. The X-ray (SX) emissions are parabolic emission profiles in nature. The maximum emission is from the core regions and reduces in a parabolic manner toward the edge, refer to Fig. 2(b). The poloidal representation of the respective emission profiles is given in Fig. 2c and d. The selection of emission profiles is the first step toward synthetic image generations.
3.2 Viewing Geometry The camera is pixelated, therefore, the image/data is discrete in nature and individual pixel level signal has to be estimated. The line of sight for every pixel is evaluated considering the viewing geometry by which the emitted radiation, from the plasma, is reaching the camera. The basic viewing geometry is the pinhole geometry as shown in Fig. 1. A pinhole is placed between the plasma and camera to have an acceptable spatial resolution. In this situation, the line of sight for a pixel is considered starting from the camera pixel entering to the plasma via pinhole and terminating at the inner walls of the tokamak vessel. The estimation of Lines of sights for all the pixels is an important step toward synthetic image generations.
Computational Modeling of Noisy Plasma Images Applicable …
175
Fig. 2 The typical 1D emission profile for SX (a) and visible emission (b) for which the poloidal emission distribution is presented in (c) and (d) respectively
3.3 Emission Integration As the emission profile shape and the line of sight are now known, see Figs. 1 and 2, now the integration of the emission for a given pixel is to be performed. The local emission seen by the line of sight is estimated by calculating the poloidal location of every point on the LoS within the plasma. Subsequently E j is determined and added together to have line integrated data for a given pixel, see Eq. (1).
3.4 Fiber Bundle Effect The high temperature and density plasma emits a range of emission, which under some conditions, are hazardous for the camera system. Therefore the emission light is transported to the camera via optical fiber bundle. Numerous individual fibers, having circular cross-section, are stack together to make a sufficient surface area to transport the emission light to the camera sensor plane. The gaps within the fiber bundle offer undesirable effects on the image taken, as the gaps or imperfection in the bundle cross-section restrict the emission from the plasma reaching the detector [11].
This causes a loss of information, which needs to be estimated for synthetic image creation. The fiber bundle gaps are approximated as a mask with circular punctures, which is applied to the line-integrated emission information [12].
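A possible way to approximate such a mask is sketched below; the square packing, pitch, and hole radius are illustrative assumptions (a real bundle would be closer to hexagonal packing).

```python
import numpy as np

def fiber_bundle_mask(shape, pitch=16, hole_radius=6):
    """Approximate the fiber-bundle gaps as a binary mask: transmissive circles
    on a regular grid, opaque elsewhere (the inter-fiber gaps)."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    # distance of each pixel to the nearest fiber-core center on a square grid
    dy = (yy % pitch) - pitch / 2
    dx = (xx % pitch) - pitch / 2
    mask = (dx**2 + dy**2) <= hole_radius**2
    return mask.astype(float)

# apply the mask to a noise-free line-integrated image
image = np.ones((512, 512))
masked_image = image * fiber_bundle_mask(image.shape)
```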
3.5 Noise

The noise in an ID output is a combined effect of light deficiency, environmental conditions, sensor temperature, the transmission channel, and any pollutant present. Each type of noise created by these sources has a unique effect on the image, and this information assists in the creation of a near-to-real synthetic ID image. The noises in the images are categorized as photo-electric, impulse, and structured noise. The photo-electric noise addresses the noise introduced due to low-light situations and the heating effects of the sensor. If the light is weak, or in our case, if the plasma emission is not strong, there will be noise due to photon-detection statistics. As the temperature of the sensor increases, the random motion of the charge carriers increases and this creates noise in the image. This noise is referred to as white noise and it contributes at all available frequencies. It can be described mathematically by Eq. (2), a Gaussian distribution of a Gaussian random variable $x$, which can be considered as the intensity of a pixel in the image domain. In this equation, $\sigma$ represents the standard deviation of the pixel intensities and $\mu$ represents the mean intensity of all the pixels in the image.

$$p_{\mathrm{gaussian}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (2)$$

The imaging diagnostic measures the emission photons from the plasma. Thus, the photon-counting error comes into the picture and this brings the Poisson noise. The difference between the Poisson and Gaussian noise is that Poisson noise is quantized whereas Gaussian noise is continuous. Poisson noise is also termed quantum noise due to its origin. The probability distribution of a Poisson random variable $x$ is given by Eq. (3), where $\mu$ represents the mean intensity of all the pixels in the image.

$$p_{\mathrm{poisson}}(x) = \frac{e^{-\mu}\mu^{x}}{x!} \qquad (3)$$

Impulse noise, or salt-and-pepper noise, is a very common type of noise that appears in the image due to transmission issues, or is specific to the analog-to-digital converter or to pixel malfunction. Under this noise, the pixel output is completely altered, independently, to either completely off or the highest intensity value [13–15]. This situation may also appear when improper memory allocation occurs.
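The following sketch combines the three noise types discussed above with a noise-free image; the scaling by image statistics and the percentage parameters are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def add_noises(image, gaussian_frac=0.1, poisson_frac=0.1, sp_frac=0.1, rng=None):
    """Return image + weighted Gaussian, Poisson and salt-and-pepper components
    (cf. Eqs. (2) and (3)); each fraction is a user-chosen noise level."""
    rng = rng or np.random.default_rng()
    mu, sigma = image.mean(), image.std()

    gaussian = rng.normal(mu, sigma if sigma > 0 else 1.0, image.shape)
    poisson = rng.poisson(max(mu, 1e-12), image.shape).astype(float)

    noisy = image + gaussian_frac * gaussian + poisson_frac * poisson

    # salt-and-pepper: a fraction of pixels forced to the extreme intensity values
    n_bad = int(sp_frac * image.size)
    idx = rng.choice(image.size, size=n_bad, replace=False)
    flat = noisy.ravel()
    flat[idx[: n_bad // 2]] = image.max()   # salt
    flat[idx[n_bad // 2:]] = image.min()    # pepper
    return noisy

noisy = add_noises(np.ones((512, 512)))
```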
4 Computational Implementation

The computational implementation is one of the important aspects of this work. The implementation is shown systematically in the block diagram of Fig. 3. It holds two major parts: first, the estimation of the noise-free image viewed by the camera considering the adopted emission profile and, second, the determination of the different noises.

– The inputs required for the computational implementation address the geometrical locations of the plasma, pinhole, and camera. The emission profile type is one of the prerequisites, as it decides which type of ID has to be simulated. The camera specification, such as the number of pixels and their size, is required for the realization of the lines of sight. Besides this information, the plasma and vacuum vessel dimensions are also required. This constitutes the input requirements for the simulation of the ID synthetic image.
– The first module of the simulation tool determines the precise location of the camera pixels. Since a pixel has a finite area, the field of view has to be considered for a general realization of the plasma volume seen by the pixel. However, the field of view is approximated by a line of sight, which treats the pixel as a point entity. This approximation is achieved by considering a line originating from the center of the pixel, passing through the pinhole, and entering the plasma. This line is referred to as the line of sight. The approximation relaxes the computation to a fair extent.
– Every pixel gives out the line-integrated emission along its line of sight; in other words, it is the summation of all the emission values seen by the pixel. Since the plasma and vacuum vessel dimensions are known, the entry and exit points of the LoS are estimated to determine the LoS path length within the plasma. This estimation carefully evaluates the exit point of the LoS, as the emission is blocked by the metal vessel. Once the effective path length is available, the emission at every point on this path is estimated. As the emission profile is given in the normalized-radius space, each point on the path is traced into the normalized-radius space and the respective emission value is assigned. All assigned values are then added together to obtain the final line-integrated value for a given pixel. The spatial resolution of the points taken along the path is an important parameter; a finer spatial resolution gives better results, and sub-millimeter resolution or better is a good choice.
– The line-integrated emission values for every pixel are placed at their physical locations according to the camera layout, and the line-integrated noise-free camera image is realized. The pixel arrangement depends on the final experimental image, as sometimes the image is flipped due to the optical arrangement.
– After the noise-free image is generated, the obtained synthetic image is masked with circular punctures. This step is performed to replicate the information loss due to imperfect bundling of the optical fibers.
– The three major noise types, Poisson, Gaussian, and salt-and-pepper, are estimated for the noise-free image. The mean, minimum, maximum, and variance of the intensity observed in the noise-free image are recorded for the calculation of the different noises. These inputs make the noise image dependent on the data or, more specifically, on the geometry and the emission profile. The Gaussian and Poisson noises are then estimated via Eqs. (2) and (3) in a 2D format. The salt-and-pepper noise is also estimated in a 2D format, with the affected pixels set to the extreme intensity values.
Fig. 3 Flowchart of the synthetic image generation algorithm
– The final image is made by adding the noise image to the noise-free image. The amount of noise in the image is a user-dependent parameter, and the noise can be amplified as required; a compact sketch of this assembly step is given below.
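The following is only a schematic of how the pieces fit together, assuming helper routines of the kind sketched earlier for the per-pixel line integration, the fiber-bundle mask, and the noise generation; all names are illustrative.

```python
import numpy as np

def synthetic_id_image(n_rows, n_cols, render_pixel, mask_fn, noise_fn):
    """Assemble the final synthetic ID image:
    line-integrated noise-free image -> fiber-bundle mask -> additive noises."""
    clean = np.array([[render_pixel(r, c) for c in range(n_cols)] for r in range(n_rows)])
    masked = clean * mask_fn(clean.shape)
    return noise_fn(masked)

# render_pixel, mask_fn and noise_fn would be the line-of-sight integration,
# fiber_bundle_mask and add_noises sketches shown earlier (all illustrative).
```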
5 Results and Discussion

The synthetic noise-free data generated by the discussed procedure are displayed in Fig. 4. Here, Fig. 4a and c represent the line-integrated images for the assumed X-ray (a) and visible emission (c) profiles. Since the X-ray emission has a parabolic profile, see Fig. 2, the image in Fig. 4a corresponds to the integration of a solid torus, which explains the appearance of the line-integrated image. The visible emission profile, peaking at a particular location in the normalized-radius space, constitutes a ring-type
Fig. 4 a and c are the line integrated images along the tangential view of the SX and VUV profile respectively, b and d are the line integrated images along the restricted tangential view of the SX and VUV profile respectively
emission in the poloidal space, see Fig. 2. Such an emission profile is realized as a hollow torus, and when integrated tangentially it shows a bright ring structure, which is visible in Fig. 4c. Figure 4b and d show the restricted viewing situation, where the ID is only able to view a portion of the plasma. The line-integrated data for the perpendicular viewing system with 5 arrays and 75 effective pixels, considering the two emission profiles, are given in Fig. 5. The hollow nature of the visible emission in 3D space is apparent in the perpendicular viewing geometry as well (black curve). Considering Figs. 4 and 5, it is clear that the 3D viewing geometry and the line integration have been transformed fairly well into a mathematical model. Figure 6a shows the line-integrated plasma image for the SX profile. To replicate the loss of information due to imperfect fiber bundling, a mask with circular punctures is generated, as shown in Fig. 6b. An estimation of the effect of the fiber bundle gaps on the actual plasma image is presented in Fig. 6c. The image within the red patch in Fig. 6c is zoomed in to show the loss of information due to the circular punctures in the mask and is presented in Fig. 6d. The individual noise images are presented in Fig. 7, where the Gaussian, Poisson, and salt-and-pepper noise are shown in
Fig. 5 Comparison of pixel intensities of SX (red-dashed) and VUV (black-solid) profiles captured along the perpendicular view
Fig. 6 Visualization of the effect of fiber bundle gaps on actual plasma image
Fig. 7 2D images of a Gaussian Noise b Poisson Noise and c Salt and Pepper Noise generated by the code for 512 × 512 pixels synthetic image
Fig. 7a, b, and c respectively. The Gaussian and Poisson noise appear quite different from the salt-and-pepper noise, owing to the very nature of these noises. Salt-and-pepper noise deviates the pixel brightness to an extreme value, either the maximum or the minimum, with no intermediate values, whereas the probability distribution function (PDF) of the other two noises is bell-shaped, with the extended tail present only in the salt-and-pepper case; therefore the deviation of a pixel intensity from the mean pixel intensity is not very drastic. The combined results for the visible emission profile, i.e., the noises together with the noise-free image, are shown in Figs. 8, 9, and 10. These figures are also masked to demonstrate the effect of the fiber bundle gaps. In Fig. 8 the images contain 10% salt-and-pepper noise (SPN). From left to right, a to d, the Gaussian noise (GN) increases by 0,
Fig. 8 Synthetic image obtained for VUV profile with 10% Salt-pepper noise and varying Gaussian and Poisson noises are shown here. As we move from left to right in a row, Gaussian noise increases [0, 10, 20, 40%]. Similarly as we move down in a column, Poisson noise increases [0, 10, 20, 40 %]
Fig. 9 Synthetic image obtained for VUV profile with 10% Gaussian noise and varying Saltpepper and Poisson noises are shown here. As we move from left to right in a row, Salt-pepper noise increases [0, 10, 20, 40%]. Similarly as we move down in a column, Poisson noise increases [0, 10, 20, 40%]
10, 20, and 40% respectively, whereas from a to m, that is, top to bottom, the Poisson noise (PN) increases by 0, 10, 20, and 40%. The increase in GN and PN deteriorates the image only marginally. Figure 9 has a fixed 10% GN, with SPN rising from left to right and PN from top to bottom in increments of 0, 10, 20, and 40% respectively. It is very clear that SPN holds a strong influence over the ID image. This is endorsed by Fig. 10, where the PN is fixed at 10% and GN and SPN vary from left to right and top to bottom. The last row, which corresponds to 40% SPN, is substantially influenced by the noise. This suggests that the detector temperature and the associated electronics, especially the analog-to-digital signal conversion, are among the critical points for noise reduction.
Fig. 10 Synthetic image obtained for VUV profile with 10% Poisson noise and varying Gaussian and Salt-pepper noises are shown here. As we move from left to right in a row, Gaussian noise increases [0, 10, 20, 40%]. Similarly as we move down in a column, Salt-pepper noise increases [0, 10, 20, 40%]
6 Conclusions

The work presented in this paper transforms the 3D viewing geometry of a tokamak imaging diagnostic into a proper mathematical system in which the emission viewed by the camera of the imaging diagnostic is estimated. This transformation is wavelength-independent and thus applicable to a wide range of imaging diagnostics. The realization of the possible noises and of the optical fiber bundle effect is a novel addition to synthetic image estimation; generally, only white noise is considered for synthetic images. The simulation of the synthetic images suggests that salt-and-pepper noise is one of the critical noises for ID images, and even a slight presence influences the overall results. These results give a qualitative realization of the creation of
synthetic data for an ID in the tokamak environment. The qualitative realization requires the real experimental parameters for which the tool is to be employed. The proposed approach can also help augment real data for training different ML models, either to balance classes or to obtain more data to improve model training and performance.
References 1. Hawryluk R, Batha S, Blanchard W, Beer M, Bell M, Bell R, Berk H, Bernabei S, Bitter M, Breizman B, Bretz N, Budny R, Bush C, Callen J, Camp R, Cauffman S, Chang Z, Cheng C, Darrow D, Zweben S (1998) Fusion plasma experiments on TFTR: a 20 year retrospective. Phys Plasmas 5:1577–1589 2. Freidberg J, Mangiarotti F, Minervini J (2015) Designing a tokamak fusion reactor-How does plasma physics fit in? Phys Plasmas. 22(7):070901 3. Young K (1997) Advanced tokamak diagnostics. Fusion Eng Des 34–35:3–10. https://www. sciencedirect.com/science/article/pii/S092037969600676X. (Fusion Plasma Diagnostics) 4. Tritz K, Fonck R, Thorson T (1999) Application of x-ray imaging to current profile measurements in the PEGASUS experiment. Rev Sci Instrum 70:595–598. https://doi.org/10.1063/1. 1149273 5. Bitter M, Goeler S, Sauthoff N, Hill K, Brau K, Eames D, Goldman M, Silver E, Stodiek W (1981) X-ray radiation from tokamaks. In: Inner-shell and x-ray physics of atoms and solids, pp 861–870. https://doi.org/10.1007/978-1-4615-9236-5_169 6. Chiro G, Rodney A (1979) Brooks. The 1979 Nobel prize in physiology or medicine. Science 206:1060–1062. https://www.science.org/doi/abs/10.1126/science.386516 7. Würfl T, Ghesu F, Christlein V, Maier A (2016) Deep learning computed tomography. Med Image Comput Comput Assist Interv MICCAI 2016:432–440 8. Meyer W, Fenstermacher M, Groth M (2014) Analysis of tangential camera views of tokamak plasmas. In: APS division of plasma physics meeting abstracts, vol 2014, no 10, pp eNP8.060 9. Ohdachi S, Toi K, Goeler S (2001) Tangential soft x-ray camera for Large Helical Device. Rev Sci Instrum 72(1):724–726 10. Purohit S, Suzuki Y, Ohdachi S, Yamamoto S (2019) Soft x-ray tomographic reconstruction of Heliotron J plasma for the study of magnetohydrodynamic equilibrium and stability. Plasma Sci Technol 21(4):065102. https://doi.org/10.1088/2058-6272/ab0846 11. Lodhi M, Dumas J, Pierce M, Waheed B (2017) Computational imaging through a fiber-optic bundle. In: Proceedings of SPIE-the international society for optical engineering 12. Renteria C, Suárez J, Licudine A, Boppart S (2020) Depixelation and enhancement of fiber bundle images by bundle rotation. Appl Opt 59(1):536–544. http://opg.optica.org/ao/abstract. cfm?URI=ao-59-2-536 13. Boyat A, Joshi B (2015) A review paper: noise models in digital image processing. Signal Image Process Int J 6(5) 14. Fu B, Zhao X, Song C, Li X, Wang X (2019) A salt and pepper noise image denoising method based on the generative classification. Multimed Tools Appl 78(5):12043–12053 15. Al Azzeh J, Zahran B, Alqadi Z (2018) Salt and pepper noise: effects and removal. JOIV: Int J Inf Vis 2(7)
On Partial Monotonicity of Some Extropy Measures Nitin Gupta and Santosh Kumar Chaudhary
Abstract Gupta and Chaudhary (On general weighted extropy of ranked set sampling, 2022, [13]) introduced general weighted extropy and studied related properties. In this paper, we study conditional extropy and define the monotonic behaviour of conditional extropy. Also, we provide results on the convolution of general weighted extropy. Keywords Entropy · Extropy · Log-concavity · Log-convexity · Partial monotonicity Mathematical Subject Classification: 94A17 · 62N05 · 60E15
N. Gupta · S. Kumar Chaudhary (B)
Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
e-mail: [email protected]
N. Gupta
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_14

1 Introduction

In the technological age we live in, technology is a part of almost everything. In the field of computer science, the most well-known technology for allowing a computer to learn automatically from the past is called machine learning. Entropy and extropy are two of the many techniques and concepts used in machine learning to solve complex problems. Further, entropy and extropy are also useful in the fields of information theory, physics, probability and statistics, computer science, economics, communication theory, etc. (see Balakrishnan et al. [3], Becerra et al. [7], Kazemi et al. [19], Sati and Gupta [25], Tahmasebi and Toomaj [29], Tuli [32]). Shannon [27] introduced the notion of information entropy, which measures the average amount of uncertainty about an occurrence associated with a certain probability distribution. Let $Y$ be a discrete random variable having probability mass function $p_i$, $i = 1, 2, \ldots, N$. The discrete version of Shannon entropy is defined as
$$H_N(Y) = -\sum_{i=1}^{N} p_i \log(p_i).$$
Here and throughout this paper, log denotes the logarithm to base $e$. Let $Y$ be an absolutely continuous random variable having probability density function $g_Y(y)$ and cumulative distribution function $G_Y(y)$. This notation for the probability density function and cumulative distribution function will be used throughout the paper. The differential form of Shannon entropy is defined as
$$H(Y) = -\int_{-\infty}^{\infty} g_Y(y)\, \log\big(g_Y(y)\big)\, dy.$$
Various researchers have suggested different generalisations of entropy to measure uncertainty (see Gupta and Chaudhary [13], Hooda [14], Jose and Sathar [15], Kayal and Vellaisamy [18], Kayal [17], Qiu [22], Sathar and Nair [24]). Cumulative past entropy was proposed and studied by Di Crescenzo and Longobardi [11] as
$$\xi(Y) = -\int_{-\infty}^{\infty} G_Y(y)\, \log\big(G_Y(y)\big)\, dy. \qquad (1.1)$$
Rényi [23] proposed the generalised entropy of order $\theta$, for $\theta > 0$, $\theta \neq 1$, given by
$$H_\theta(Y) = \frac{1}{1-\theta}\, \log\left(\int_{-\infty}^{\infty} (g_Y(y))^{\theta}\, dy\right).$$
Tsallis [31] defined the generalised entropy, for $\theta > 0$, $\theta \neq 1$, given by
$$S_\theta(Y) = \frac{1}{\theta-1}\left(1 - \int_{-\infty}^{\infty} (g_Y(y))^{\theta}\, dy\right).$$
Kapur [16] gave the Kapur entropy of order $\theta$ and type $\lambda$, for $\theta \neq \lambda$, $\theta > 0$, $\lambda > 0$, given by
$$H_{\theta,\lambda}(Y) = \frac{1}{\lambda-\theta}\left(\log \int_{-\infty}^{\infty} (g_Y(y))^{\theta}\, dy - \log \int_{-\infty}^{\infty} (g_Y(y))^{\lambda}\, dy\right).$$
Varma [33] generalised the entropy of order $\theta$ and type $\lambda$, for $\lambda - 1 < \theta < \lambda$, $\lambda \ge 1$, given by
$$H_\theta^\lambda(Y) = \frac{1}{\lambda-\theta}\, \log\left(\int_{-\infty}^{\infty} (g_Y(y))^{\theta+\lambda-1}\, dy\right).$$
The conditional Shannon entropy of $Y$ given $S$, where $S = \{c < Y < d\}$, is given by
$$H(Y|S) = -\int_{c}^{d} g_{Y|S}(y)\, \log\big(g_{Y|S}(y)\big)\, dy,$$
where
$$g_{Y|S}(y) = \frac{g_Y(y)}{G_Y(d) - G_Y(c)}, \quad c < y < d.$$
One may refer to Sunoj et al. [28] for a review of conditional Shannon entropy. Convolution and monotonic behaviour of the conditional Shannon, Rényi, Tsallis, Kapur and Varma entropies have been studied in the literature (see Chen et al. [9], Gupta and Bajaj [12], Sati and Gupta [25] and Shangari and Chen [26]). Bansal and Gupta [5] studied the monotonicity properties of cumulative past entropy and convolution results for conditional extropy. In this paper, we study the monotonicity of conditional extropy and convolution results for general weighted extropy. As described in Chen et al. [9] and Shangari and Chen [26], the conditional Shannon entropy $H(Y|Y \in S)$ may serve as an indicator of uncertainty for an interval $S$. The measure of uncertainty shrinks/expands as the interval providing the information about the outcome shrinks/expands. For intervals $S_1$ and $S_2$ such that $S_2 \subseteq S_1$, the entropy $H$ is partially increasing (decreasing) if $H(Y|Y \in S_2) \le (\ge)\, H(Y|Y \in S_1)$. Under the condition that $G_Y(y)$ is a log-concave function (for more on log-concave probability and its applications, see Bagnoli and Bergstrom [2]), Shangari and Chen [26] proved that the conditional Shannon entropy $H(Y|S)$ of $Y$ given $S = (c, d)$ is a partially increasing function in the interval $S$. Under the same condition, they also proved that the conditional Rényi entropy $H_\theta(Y|S)$ of $Y$ given $S = (c, d)$ is a partially increasing function in the interval $S$ for $\theta \ge 0$, $\theta \neq 1$. Under the condition that $G_Y(x)$ is concave, Gupta and Bajaj [12] proved that the conditional Kapur entropy $H_{\theta,\lambda}(Y|S)$ of $Y$ given $S = (c, d)$ is a partially increasing function in the interval $S$. They also showed that if $G_Y(y)$ is a log-concave function then the conditional Tsallis entropy $S_\theta(Y|S)$ of $Y$ given $S = (c, d)$ is a partially increasing function in the interval $S$. Sati and Gupta [25] studied the monotonic behaviour of the conditional Varma entropy under log-concavity of $G_Y(y)$ together with conditions on $\theta + \lambda$. For $d > c$ we have $\psi_1(d) \ge \psi_1(c)$, that is, $\psi_1(d) \ge 0$; hence from (2.1) we have $dJ(Y|S)/dd \ge 0$, so $J(Y|S)$ is increasing in $d$ for fixed $c$. In the next section, we provide a result on the convolution of $J^w(Y)$. We will prove that the conditional general weighted extropy of $V = |Y_1 - Y_2|$ given $S = \{c \le Y_1, Y_2 \le d\}$ is partially increasing in $S$.
3 Convolution of General Weighted Extropy

Let $Y$ be a random experiment that is repeated to measure its reproducibility or precision or both. A measure of the uncertainty of the experiment is then the function $V = |Y_1 - Y_2|$, where $Y_1$ and $Y_2$ are independent and identically distributed random variables from the experiment $Y$ with probability density function $g_Y(y)$. The difference $V = |Y_1 - Y_2|$ is a measure of the uncertainty between two outcomes. Uncertainty should reduce if further information of the form $S = \{c < Y_1, Y_2 < d\}$ is provided. The marginal probability density function of $V = |Y_1 - Y_2|$ given $S = \{c < Y_1, Y_2 < d\}$ is
$$h(v; c, d) = \frac{\int_{c+v}^{d} g_Y(y - v)\, g_Y(y)\, dy}{\big(G_Y(d) - G_Y(c)\big)^2}, \quad \text{for all } v \in [0, d-c].$$
Chen et al. [8] proved that the conditional Shannon entropy of $V$ given $S$ is partially monotonic in $S$ provided the random variables $Y_1$ and $Y_2$ have log-concave probability density functions taking values in $S$. Shangari and Chen [26] claimed, and Gupta and Bajaj [12] proved, that if $Y_1$ and $Y_2$ have log-concave probability density functions taking values in $S$, then the conditional Tsallis and Rényi entropies of $V$ given $S$ are partially increasing functions in $S$ for $\theta > 0$, $\theta \neq 1$. Sati and Gupta [25] studied the partial monotonicity of the conditional Varma entropy of $V$ given $S$. Bansal and Gupta [5] studied the convolution results for conditional extropy. The proof of the next result of this section uses the following lemma from Chen et al. [9] (also see Sati and Gupta [25]).

Lemma 1 (a) Let the probability density functions of the random variables $Y_1$ and $Y_2$ be log-concave. If the function $\phi(v)$ is increasing in $v$, then $E(\phi(V)|S)$ is increasing in $d$ for any $c$, and decreasing in $c$ for any $d$, where $V = |Y_1 - Y_2|$ and $S = \{c < Y_1, Y_2 < d\}$.
(b) If $g_Y(y)$ is a log-concave function, then $h(v; c, d)$ is a decreasing function of $v$ on $v \in [0, d-c]$.

Now we prove the following theorem, which provides conditions for $J^w(V|S)$ to be partially increasing in $S$.

Theorem 2 Let the probability density functions of the random variables $Y_1$ and $Y_2$ be log-concave. Let the weight $w(y) \ge 0$ be decreasing in $y$, and let $S = \{c < Y_1, Y_2 < d\}$. Then $J^w(V|S)$ is partially increasing in $S$.

Proof The conditional general weighted extropy of $V$ given $S$ is
$$J^w(V|S) = -\frac{1}{2}\int_{c}^{d} w(v)\, \big(h(v; c, d)\big)^2\, dv.$$
For fixed $c$ and any $d_1 \le d_2$, choose
$$\psi_1(v) = (w(v))^{1/2}\, h(v; c, d_1), \qquad \psi_2(v) = (w(v))^{1/2}\, h(v; c, d_2);$$
clearly $\psi_1(v)$ and $\psi_2(v)$ are non-negative functions. Also, let $p = 2$, $q = 2$, so that $p > 0$, $q > 0$ and $\frac{1}{p} + \frac{1}{q} = 1$. With the help of Hölder's inequality, we obtain
$$\int \psi_1(v)\psi_2(v)\, dv \le \left(\int (\psi_1(v))^p\, dv\right)^{1/p} \left(\int (\psi_2(v))^q\, dv\right)^{1/q},$$
that is,
$$\int w(v)\, h(v; c, d_1)\, h(v; c, d_2)\, dv \le \left(\int w(v)\, h^2(v; c, d_1)\, dv\right)^{1/2} \left(\int w(v)\, h^2(v; c, d_2)\, dv\right)^{1/2}. \qquad (3.1)$$
For fixed $d > 0$, let
$$\phi_1(v) = -w(v)\, h(v; c, d);$$
then
$$\phi_1'(v) = -w'(v)\, h(v; c, d) - w(v)\, h'(v; c, d) \ge 0,$$
since the probability density function $h(v; c, d)$ is decreasing in $v$ for $0 \le v \le d - c$ (using Lemma 1(b)), $w(v) \ge 0$, and $w(v)$ is decreasing in $v$ for $0 \le v \le d - c$. Hence $\phi_1(v)$ increases in $v$. Therefore, by Lemma 1(a), for any $c < d_1 < d_2$ we have
$$E(\phi_1(V)\,|\, c \le Y_1, Y_2 \le d_1) \le E(\phi_1(V)\,|\, c \le Y_1, Y_2 \le d_2),$$
that is,
$$\int w(v)\, h^2(v; c, d_2)\, dv \le \int w(v)\, h(v; c, d_1)\, h(v; c, d_2)\, dv. \qquad (3.2)$$
From (3.1) and (3.2), we have
$$\int w(v)\, h^2(v; c, d_2)\, dv \le \int w(v)\, h^2(v; c, d_1)\, dv. \qquad (3.3)$$
Therefore
$$-\frac{1}{2}\int w(v)\, h^2(v; c, d_1)\, dv \le -\frac{1}{2}\int w(v)\, h^2(v; c, d_2)\, dv,$$
that is, $J^w(V\,|\, c < Y_1, Y_2 < d_1) \le J^w(V\,|\, c < Y_1, Y_2 < d_2)$ for $d_1 \le d_2$. Hence, for fixed $c$, $J^w(V|S)$ is increasing in $d$. Now, for fixed $d$ and any $c_1 \le c_2$, choose
$$\psi_3(v) = w^2(v)\, h^2(v; c_1, d)\, h^2(v; c_2, d), \qquad \psi_4(v) = \big(w(v)\, h^2(v; c_1, d)\big)^{-1}.$$
Clearly $\psi_3(v)$ and $\psi_4(v)$ are non-negative. Also, let $p = \tfrac{1}{2}$, $q = -1$, so that $p < 1$, $q < 0$ and $\frac{1}{p} + \frac{1}{q} = 1$. Now Hölder's inequality provides
$$\left(\int (\psi_3(v))^p\, dv\right)^{1/p} \left(\int (\psi_4(v))^q\, dv\right)^{1/q} \le \int \psi_3(v)\psi_4(v)\, dv,$$
that is,
$$\left(\int w(v)\, h(v; c_1, d)\, h(v; c_2, d)\, dv\right)^2 \left(\int w(v)\, (h(v; c_1, d))^2\, dv\right)^{-1} \le \int w(v)\, (h(v; c_2, d))^2\, dv,$$
that is,
$$\int w(v)\, h(v; c_1, d)\, h(v; c_2, d)\, dv \le \left(\int w(v)\, (h(v; c_1, d))^2\, dv\right)^{\frac{1}{2}} \left(\int w(v)\, (h(v; c_2, d))^2\, dv\right)^{\frac{1}{2}}. \qquad (3.4)$$
For fixed $c_2 > 0$, let
$$\phi_2(v) = -w(v)\, h(v; c_1, d);$$
then
$$\phi_2'(v) = -w'(v)\, h(v; c_1, d) - w(v)\, h'(v; c_1, d) \ge 0,$$
since the probability density function $h(v; c, d)$ is decreasing in $v$ for $0 \le v \le d - c$ (using Lemma 1(b)), $w(v) \ge 0$, and $w(v)$ is decreasing in $v$. Hence $\phi_2(v)$ increases in $v$. By Lemma 1(a), for any $c_1 < c_2 < d$, we have
$$E(\phi_2(V)\,|\, c_2 \le Y_1, Y_2 \le d) \le E(\phi_2(V)\,|\, c_1 \le Y_1, Y_2 \le d),$$
that is,
$$\int w(v)\, (h(v; c_1, d))^2\, dv \le \int w(v)\, h(v; c_1, d)\, h(v; c_2, d)\, dv. \qquad (3.5)$$
Now, (3.4) and (3.5) imply
$$\int w(v)\, (h(v; c_1, d))^2\, dv \le \int w(v)\, (h(v; c_2, d))^2\, dv.$$
Therefore we have
$$-\frac{1}{2}\int w(v)\, (h(v; c_2, d))^2\, dv \le -\frac{1}{2}\int w(v)\, (h(v; c_1, d))^2\, dv,$$
that is,
$$J^w(V\,|\, c_2 < Y_1, Y_2 < d) \le J^w(V\,|\, c_1 < Y_1, Y_2 < d) \quad \text{for } c_1 \le c_2. \qquad (3.6)$$
As a result, for fixed $d$, $J^w(V|S)$ is decreasing in $c$. Therefore $J^w(V|S)$ is partially increasing in $S$.

Remark It is observable from the above theorem that, under specific circumstances, $J^w(V|S)$ is partially increasing in $S$, demonstrating its reasonability as a complementary dual of the entropy measure.

Remark 1 If in Theorem 2 we take $w(y) = 1$, we obtain the result of Bansal and Gupta [5].

The following examples of Theorem 2 may be provided.

Example 1 (a) Let $Y_1$ and $Y_2$ be two independent and identically distributed Weibull random variables with probability density function, for $\theta \ge 1$, $\lambda \ge 0$,
$$g_Y(y) = \theta \lambda^{\theta} y^{\theta-1} e^{-(\lambda y)^{\theta}}, \quad y \ge 0.$$
Since the probability density function of the Weibull distribution is log-concave for $\theta \ge 1$, taking $w(y) = 1/y$, the conditional weighted extropy of $V$ given the interval $S$ is partially increasing in $S$ by Theorem 2.
(b) Let $Y_1$ and $Y_2$ be two independent and identically distributed gamma random variables with probability density function, for $\theta \ge 1$, $\lambda \ge 0$,
$$g_Y(y) = \frac{\lambda^{\theta}}{\Gamma(\theta)}\, y^{\theta-1} e^{-\lambda y}, \quad y \ge 0.$$
Since the probability density function of the gamma distribution is log-concave for $\theta \ge 1$, taking $w(y) = 1/y$, the conditional weighted extropy of $V$ given $S$ is partially increasing in $S$ by Theorem 2.
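A small numerical illustration of Theorem 2 is sketched below: it estimates $J^w(V|S)$ by Monte Carlo with a histogram estimate of $h(v; c, d)$ for gamma-distributed $Y_1, Y_2$ and checks that the value does not decrease as $d$ grows. The decreasing weight $w(v) = e^{-v}$ is used here purely for numerical convenience; it satisfies the assumptions of the theorem but is not one of the weights in the example above, and all names are illustrative.

```python
import numpy as np

def jw_conditional(sample_fn, c, d, w=lambda v: np.exp(-v), n=200_000, bins=200, seed=0):
    """Monte Carlo estimate of J^w(V|S) = -1/2 * int w(v) h(v; c, d)^2 dv,
    where V = |Y1 - Y2| and S = {c < Y1, Y2 < d}; h is estimated by a histogram."""
    rng = np.random.default_rng(seed)
    y1, y2 = sample_fn(rng, n), sample_fn(rng, n)
    keep = (y1 > c) & (y1 < d) & (y2 > c) & (y2 < d)
    v = np.abs(y1[keep] - y2[keep])
    hist, edges = np.histogram(v, bins=bins, range=(0.0, d - c), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    dv = edges[1] - edges[0]
    return -0.5 * np.sum(w(centers) * hist**2) * dv

gamma_sampler = lambda rng, n: rng.gamma(shape=2.0, scale=1.0, size=n)
for d in (1.0, 2.0, 3.0):    # J^w(V|S) should not decrease as d grows (Theorem 2)
    print(d, jw_conditional(gamma_sampler, c=0.1, d=d))
```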
4 Conclusion The extropy measure and its generalisations are now widely used in all scientific domains. General weighted extropy is a generalisation of extropy. We proposed conditional extropy and studied its partial monotonicity. We also provided some results on convolution of general weighted extropy. Funding Santosh Kumar Chaudhary is getting financial assistance for research from the Council of Scientific and Industrial Research (CSIR), Government of India (File Number 09/0081 (14002)/2022-EMR-I). Conflict of Interest No conflicts of interest are disclosed by the authors. Acknowledgements The authors are thankful to the reviewers for their insightful comments, which significantly improved this manuscript.
References 1. Ash RB (1990) Information theory. Dover Publications Inc., New York 2. Bagnoli M, Bergstrom T (2005) Log-concave probability and its applications. Econ Theory 26(2):445–469 3. Balakrishnan N, Buono F, Longobardi M (2022) On Tsallis extropy with an application to pattern recognition. Stat Probab Lett 180:109241 4. Balakrishnan N, Buono F, Longobardi M (2020) On weighted extropies. Commun Stat Theory Methods 1–31. 10:1080=03610926:2020:1860222 5. Bansal S, Gupta N (2020) On partial monotonic behaviour of past entropy and convolution of extropy. In: Castillo O, Jana D, Giri D, Ahmed A (eds) Recent advances in intelligent information systems and applied mathematics. ICITAM 2019. Studies in computational intelligence, vol 863. Springer, Cham. https://doi.org/10.1007/978-3-030-34152-7_16 6. Bansal S, Gupta N (2021) Weighted extropies and past extropy of order statistics and k-record values. Commun Stat Theory Methods 1–18 7. Becerra A, de la Rosa JI, González E, Escalante NI (2018) Training deep neural networks with non-uniform frame-level cost function for automatic speech recognition. Multimed Tools Appl 77:27231–27267 8. Chen J (2013) A partial order on uncertainty and information. J Theor Probab 26(2):349–359 9. Chen J, van Eeden C, Zidek JV (2010) Uncertainty and the conditional variance. Stat Probab Lett 80:1764–1770 10. Cover T, Thomas JA (2006) Elements of information theory, 2nd edn. John Wiley & Sons Inc., Hoboken, NJ 11. Di Crescenzo A, Longobardi M (2009) On cumulative entropies. J Stat Plann Infer 139(12):4072–4087 12. Gupta N, Bajaj RK (2013) On partial monotonic behaviour of some entropy measures. Stat Probab Lett 83(5):1330–1338 13. Gupta N, Chaudhary SK (2022) On general weighted extropy of ranked set sampling. https:// doi.org/10.48550/arXiv.2207.02003. (Communicated to journal) 14. Hooda D (2001) A coding theorem on generalized r-norm entropy. Korean J Comput Appl Math 8(3):657–664 15. Jose J, Abdul Sathar E (2019) Residual extropy of k-record values. Stat Probab Lett 146:1–6 16. Kapur JN (1967) Generalized entropy of order α and type β. In: The mathematics seminar, vol 4, pp 78–82 17. Kayal S (2021) Failure extropy, dynamic failure extropy and their weighted versions. Stoch Qual Control 36(1):59–71 18. Kayal S, Vellaisamy P (2011) Generalized entropy properties of records. J Anal 19:25–40 19. Kazemi MR, Tahmasebi S, Buono F, Longobardi M (2021) Fractional deng entropy and extropy and some applications. Entropy 23:623 20. Lad F, Sanfilippo G, Agro G (2015) Extropy: complementary dual of entropy. Stat Sci 30(1):40– 58 21. Qiu G (2017) The extropy of order statistics and record values. Stat Probab Lett 120:52–60 22. Qiu G, Jia K (2018) The residual extropy of order statistics. Stat Probab Lett 133:15–22 23. Rènyi A (1961) On measures of entropy and information. Technical report, Hungarian Academy of Sciences Budapest Hungary 24. Sathar EIA, Nair RD (2019) On dynamic survival extropy. Commun Stat Theory Methods 50(6):1295–1313 25. Sati MM, Gupta N (2015) On partial monotonic behaviour of Varma entropy and its application in coding theory. J Indian Stat Assoc 53:135–152 26. Shangri D, Chen J (2012) Partial monotonicity of entropy measures. Stat Probab Lett 82(11):1935–1940 27. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(379– 423):623–656
28. Sunoj S, Sankaran P, Maya S (2009) Characterizations of life distributions using conditional expectations of doubly (interval) truncated random variables. Commun Stat Theory Methods 38(9):1441–1452 29. Tahmasebi S, Toomaj A (2022) On negative cumulative extropy with applications. Commun Stat Theory Methods 51:5025–5047 30. Tahmasebi S, Kazemi MR, Keshavarz A, Jafari AA, Buono F (2022) Compressive sensing using extropy measures of ranked set sampling. Mathematica Slovaca, accepted for publication 31. Tsallis C (1988) Possible generalization of Boltzmann-Gibbs statistics. J Stat Phys 52:479–487 32. Tuli R (2010) Mean codeword lengths and their correspondence with entropy measures. Int J Eng Nat Sci 4:175–180 33. Varma R (1966) Generalizations of Renyi’s entropy of order α. J Math Sci 1:34–48 34. Yeung RW (2002) A first course in information theory. Kluwer Academic/Plenum Publishers, New York
Error Bound for the Linear Complementarity Problem Using Plus Function Bharat Kumar, Deepmala, A. Dutta, and A. K. Das
Abstract In this article we establish an error bound for the linear complementarity problem with a P-matrix using the plus function. We define a fundamental quantity connected to a P-matrix and demonstrate a method to determine bounds on the error for the linear complementarity problem of this type. For the quantity introduced, we find upper and lower bounds.

Keywords Linear complementarity problem · Plus function · Error bound · Relative error bound

AMS Subject Classifications 90C33 · 15A39 · 15A24 · 15A60 · 65G50
B. Kumar (B) · Deepmala
Department of Mathematics, PDPM-Indian Institute of Information Technology Design and Manufacturing, Jabalpur 482005, (MP), India
e-mail: [email protected]
Deepmala
e-mail: [email protected]
A. Dutta
Department of Mathematics, Jadavpur University, Kolkata 700 032, India
A. K. Das
SQC & OR Unit, Indian Statistical Institute, Kolkata 700 108, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_15

1 Introduction

Error bounds are important as a measure of how far an approximate solution lies from the solution set, and they yield the convergence rates of different approaches. An error bound decides the stopping criterion in the convergence analysis of an iterative method. It also plays an extensive role in sensitivity analysis. For the linear complementarity problem, error bounds are of interest not only for the bounds themselves but also for the rate of convergence of the iterative method applied to find the solution. The problem is denoted LCP(A, q). In the case of a P-matrix $A$, Mathias and Pang [11] established a bound for LCP(A, q). Here we propose a new kind of error bound for LCP(A, q), where $A$ is a P-matrix, based on the plus function. The linear complementarity problem identifies a real-valued vector that satisfies a particular system of inequalities and a complementarity condition, stated as follows: assume $A \in \mathbb{R}^{n\times n}$ and $q \in \mathbb{R}^n$. Then LCP(A, q) is to determine a $z$ satisfying
$$Az + q \ge 0, \quad z \ge 0, \qquad (1)$$
$$z^{T}(Az + q) = 0. \qquad (2)$$
2 Preliminaries We start by outlining certain fundamental notations that will be utilized in this study. We take into account real matrices and vectors. Rn implies the n dimensional space of real entries, and Rn++ and Rn+ imply positive and nonnegative orthant. x ∈ Rn is a column vector and x T stands for its transpose. e implies column vector where entries are 1. xi implies ith component of the vector x ∈ Rn .
Error Bound for the Linear Complementarity Problem Using Plus Function
199
For a given z ∈ Rn , we define z∞ = max |z i |. For A ∈ Rn×n , we say A∞ = max |Ai j |.
i
i, j
Let z2A = max z i (Az)i . Now the quantity β(A) is defined as i
β(A) = min z2A z∞ =1
(3)
which is finite and positive. Given z ∈ Rn , the following inequality holds: max z i (Az)i ≥ β(A)z2∞ . i
(4)
For details, see [11]. Here we state a few selected definitions and lemmas for the next section. Definition 1 Consider z ∈ R. Then the plus function is defined as z + = max(0, z). For z ∈ Rn , the plus function z + is also defined as (z + )i = (z i )+ ∀ i, where (z i )+ = max(0, z i ). Based on the definition of z + , it is clear that z + ≥ 0. Definition 2 [2] A matrix A ∈ Rn×n is a P-matrix if all principal minors of A are positive. Lemma 1 [2] A matrix A is a P-matrix if and only if for every z ∈ Rn \{0} max z i (Az)i > 0. i
(5)
Definition 3 [27] An error bound of (S1 , T1 ) is defined as α r¯ (z)μ1 ≤ dist(z, S1 ) ≤ β r¯ (z)μ2 ∀ z ∈ T1
(6)
for some α, β, μ1 , μ2 ∈ R++ , where dist(z, S1 ) = inf{z − y : y ∈ S1 } and the function r¯ : S → R+ is known as residual function. Note that if Eq. (6) holds for T1 = Rn , then S1 possesses a global error bound. For any column vector z ∈ T1 \ S1 , a relative error bound is achieved based on above Eq. (6). For all vectors in T1 , the relative error bound to z ∗ can be achieved as follows: β r¯ (z)μ2 α¯r (z)μ1 z − z ∗ ≤ ≤ ∀ z ∈ T1 β r¯ (0)μ2 z ∗ α r¯ (0)μ1
(7)
with the condition z ∗ = 0 and 0 ∈ T1 . For details see [27]. Lemma 2 [11] Suppose A is a P-matrix and z ∈ Rn denotes only solution of LCP(A, q). Then β(A−1 )(−q)+ ∞ ≤ z∞ ≤ β(A)−1 (−q)+ ∞ .
(8)
200
B. Kumar et al.
3 Main Results Theorem 1 Suppose A ∈ Rn×n is a P-matrix and z is the only solution of LCP(A, q). Then for any vector ζ ∈ Rn , 1 A Q ≤ z − ζ∞ ≤ Q, where A = 1 + A∞ and Q = (ζ − (ζ − (Aζ + q))+ )∞ . A β(A)
(9)
Proof Let p = (ζ − (ζ − (Aζ + q))+ ), l = q + Az, A = 1 + A∞ and Q = (ζ − (ζ − (Aζ + q))+ )∞ = p∞ . Consider the vector s = ζ − p = ζ − (ζ − (ζ − (Aζ + q))+ ) = (ζ − (Aζ + q))+ ≥ 0. Let c = q + (A − I ) p + As. This implies c = q + (A − I ) p + As = q + Ap − p + Aζ − Ap = q − p + Aζ = q − (ζ − (ζ − (Aζ + q))+ ) + Aζ = q − ζ + (ζ − (Aζ + q))+ + Aζ. Now we show that the vectors c and s satisfy complementarity condition. If ζi ≥ (Aζ + q)i , ((ζ − (Aζ + q))i )+ = ζi − (Aζ + q)i . Then ci = qi − ζi + ζi − (Aζ + q)i + (Aζ)i = 0 and si = ((ζ − (Aζ + q))i )+ = ζi − (Aζ + q)i ≥ 0. In another way, if ζi ≤ (Aζ + q)i , ((ζ − (Aζ + q))i )+ = 0. This implies that si = 0 and ci = qi − ζi + ((ζ − (Aζ + q))i )+ + (Aζ)i = (Aζ)i + qi − ζi ≥ 0. Considering both the cases, we obtain the following pair of inequalities and complementarity condition. s ≥ 0,
c = q + (A − I ) p + As ≥ 0, s T c = 0.
Now, for any i we write (s − z)i (c − l)i = si ci + z i li − z i ci − si li ≤ 0.
(10) (11)
Error Bound for the Linear Complementarity Problem Using Plus Function
201
Therefore, we have 0 ≥ (s − z)i (c − l)i = (ζ − p − z)i (q + Ap − p + As − q − Az)i = (ζ − p − z)i (q + Ap − p + Aζ − Ap − q − Az)i = (ζ − p − z)i (− p + A(ζ − z))i = −(ζ − z)i pi − pi (A(ζ − z))i + pi 2 + (ζ − z)i (A(ζ − z))i ≥ −(ζ − z)i pi − pi (A(ζ − z))i + (ζ − z)i (A(ζ − z))i .
This implies (ζ − z)i (A(ζ − z))i ≤ (ζ − z)i pi + pi (A(ζ − z))i . In particular, for the index i, we write (ζ − z)i (A(ζ − z))i = max(ζ − z) j (A(ζ − z)) j . j
Now from the condition (4), we have max z i (Az)i ≥ β(A)z2∞ . i
Hence (ζ − z)i (A(ζ − z))i ≥ β(A)ζ − z2∞ . Therefore, β(A)ζ − z2∞ ≤ (ζ − z)i (A(ζ − z))i ≤ (ζ − z)i pi + pi (A(ζ − z))i = ((I + A)(ζ − z))i pi ≤ (1 + A∞ ) p∞ ζ − z∞ . Hence ζ − z2∞ ≤
A Qζ β(A)
− z∞ .
To establish the left part of (9), take up an arbitrary index i for which pi > 0 and li = 0. Then (Az)i = −qi . In this case, if pi > 0, then pi = ζi − ((ζ − (Aζ + q))+ )i = ζi − ((ζ − (Aζ + q))i )+ > 0. Since li = 0, the inequality (Aζ + q)i ≥ ζi implies that pi = ζi ≤ (Aζ + q)i = (A(ζ − z))i and the another inequality (Aζ + q)i ≤ ζi implies that pi = ζi − ζi + (Aζ + q)i = (Aζ + q)i = (A(ζ − z))i . Hence considering the case pi > 0 and li = 0, we obtain | pi | = pi ≤ (A(ζ − z))i ≤ A∞ (ζ − z)∞ .
202
B. Kumar et al.
If z i = 0, then s = ζ − p ≥ 0 implies that pi ≤ ζi . Therefore, | pi | = pi ≤ ζi − z i ≤ (ζ − z)∞ , when z i = 0, pi > 0. Thus considering all the cases, we conclude that | pi | ≤ (1 + A∞ )(ζ − z)∞ . Now we consider the case pi ≤ 0. Let li = 0. Then pi = ζi − ((ζ − (Aζ + q))+ )i = ζi − ((ζ − (Aζ + q))i )+ < 0. Considering both the cases (Aζ + q)i ≤ ζi and (Aζ + q)i ≥ ζi , we obtain | pi | = − pi ≥ −(A(ζ − z))i , which implies that | pi | ≤ (A(ζ − z))i ≤ A∞ (ζ − z)∞ . If z i = 0, then | pi | = − pi ≥ −ζi + z i , which implies that | pi | ≤ (ζ − z)∞ . Hence considering the cases pi < 0, li = 0, z i = 0, we conclude that | pi | ≤ (1 + A∞ )(ζ − z)∞ . This inequality | pi | ≤ A(ζ − z)∞ implies that (ζ − z)∞ ≥
| pi | , A
for an arbitrary index i.
Hence (ζ − z)∞ ≥
p . A
Hence, we conclude that Q A
≤ (z − ζ)∞ ≤
A Q. β(A)
Remark 1 The residue of the vector ζ is the quantity (ζ − (ζ − (Aζ + q))+ )∞ in the expression (9). When ζ = 0, this residue is same as the quantity (−q)+ ∞ . Now we deduce the relative error bound. Theorem 2 Suppose A is a P-matrix and z ∈ Rn is only solution of LCP(A, q). Consider an arbitrary vector ζ ∈ Rn . Suppose that (−q)+ = 0. Then, β(A) Q Q z − ζ∞ A ≤ ≤ , A (−q)+ ∞ z∞ β(A−1 )β(A) (−q)+ ∞
(12)
where A = 1 + A∞ and Q = (ζ − (ζ − (Aζ + q))+ )∞ . Proof From Theorem 1, we have the inequality (9), and from Lemma 2, it is given that β(A−1 )(−q)+ ∞ ≤ z∞ ≤ β(A)−1 (−q)+ ∞ . Now combining these two inequalities, we obtain
Error Bound for the Linear Complementarity Problem Using Plus Function
203
Q Q β(A) z − ζ∞ A ≤ ≤ , −1 A (−q)+ ∞ z∞ β(A )β(A) (−q)+ ∞ where A = 1 + A∞ and Q = (ζ − (ζ − (Aζ + q))+ )∞ . Theorem 3 Assume A is a P-matrix. Then β(A) ≤ σ(A), where σ(A) = min{γ(Aμμ ) : μ ⊆ {1, 2, . . . , n}}, γ(Aμμ ) denotes the smallest eigenvalue of the principal submatrix Aμμ . Proof Let σ(A) = min{γ(Aμμ ) : μ ⊆ {1, 2, . . . , n}}, γ(A) be the smallest eigenvalue of the principal submatrix Aμμ . By this definition of σ(A), the matrix (A − σ(A)I ) cannot be a P-matrix. Then ∃ y ∈ Rn such that max yi ((A − σ(A)I )y)i ≤ 0. i
This signifies that max(yi (Ay)i − σ(A)yi 2 ) ≤ 0. i
With y∞ = 1, max(yi (Ay)i ) ≤ σ(A). i
Now introducing minimum on both sides of the above inequality, we obtain min max yi (Ay)i ≤ σ(A).
y∞ =1
i
This implies that α(A) ≤ y2A ≤ σ(A). Corollary 1 Assume A is a nondiagonal P-matrix and λ ∈ (0, 1). Now m, h and ti for i = 1, 2, . . . , n are defined as follows. m = max |Ai j |, i = j
t1 = min{σ(A), λh},
ti+1 =
h=
m2 . σ(A)
(1 − λ)2 ti2 h
f or i ≥ 1.
Then β(A) ≥ tn . Corollary 2 Suppose A is an H -matrix with aii > 0. Consider that A¯ defined as A¯ i j =
Aii if i = j, −|Ai j | if i = j.
204
B. Kumar et al.
Then, for every vector e > 0, e1 = A¯ −1 e > 0 and β(A) ≥
(mini ei )(mini e1i ) . (min j e1 j )2
The upper and lower boundary of the term β(A) are established by the above corollaries. Now we study the error bound related to diagonal P-matrix. Theorem 4 Suppose A is a diagonal P-matrix and z ∈ Rn is only solution of 1 Q ≤ z − ζ∞ ≤ minAi Aii Q, where A = LCP(A, q). Then for any ζ ∈ Rn , A 1 + A∞ , Q = (ζ − (ζ − (Aζ + q))+ )∞ and Aii is the ith diagonal of A. Proof Consider that A is a diagonal P-matrix. Then max Aii = max Aii xi 2 ≥ min Aii > 0, x∞ =1,i
i
i
where Aii is the i-th diagonal element of A. By definition (3), β(A) = min z2A = min max z i (Az)i ≥ min Aii z∞ =1
z∞ =1
i
i
and by Theorem 3, β(A) ≤ σ(A). For diagonal P-matrix A, σ(A) = min Aii . i
Hence β(A) ≤ min Aii . i
Both the inequalities imply that β(A) = min Aii , i
where Aii is the ith diagonal element of A. Let the real vector z be the only solution of LCP(A, q) and ζ ∈ Rn be any vector. From Theorem 1, we obtain the inequality (9). Now using the value of β(A) in the inequality (9), we obtain the following inequality: 1 Q A
≤ z − ζ∞ ≤
A mini Aii
Q
where Aii is the ith diagonal of A. Theorem 5 Consider a diagonal P-matrix A ∈ Rn×n . Suppose z ∈ Rn is a solution of LCP(A, q). Consider an arbitrary vector ζ ∈ Rn . Then the relative error satisfies the following inequality:
Error Bound for the Linear Complementarity Problem Using Plus Function
min Aii i
A
205
Q Q z − ζ∞ A ≤ ≤ , (−q)+ ∞ z∞ ( max1 Aii )min Aii (−q)+ ∞ i
i
where A = 1 + A∞ , Q = (ζ − (ζ − (Aζ + q))+ )∞ and Aii is the ith diagonal of A. Proof Consider a diagonal P-matrix A ∈ Rn×n . By Theorem 4, it is clear that β(A) = min Aii , where Aii is the i-th diagonal element of A. Since A is a diagi
onal matrix, β(A−1 ) = min(A−1 )ii = i
min Aii i
A
1 max Aii
. Now from Theorem 2, we obtain
i
Q Q z − ζ∞ A ≤ ≤ , 1 (−q)+ ∞ z∞ ( max Aii )min Aii (−q)+ ∞ i
i
where z ∈ Rn is the only solution of LCP(A, q), ζ ∈ Rn is an arbitrary vector and Aii is the i-th diagonal element of the matrix A.
4 Numerical Example ⎡
⎤ 4 1 2 Consider the matrix A = ⎣ 3 5 −1 ⎦, which is a P-matrix. −1 −2 7
41 The principal submatrices of A are A11 = 4, A22 = 5, A33 = 7, Aαα = , 35 5 −1 4 2 where α = {1, 2}, Aββ = , where β = {2, 3}, Aδδ = , where −2 7 −1 7 δ = {1, 3}. √ Now γ(A11 ) = 4, γ(A22 ) = 5, γ(A33 ) = 7, γ(Aαα ) = 9−2 13 , γ(Aββ ) = √ 12− 12 , γ(Aδδ ) 2
= 4.5, where γ is defined by Theorem 3. √
√
√
Hence σ(A) = min{4, 5, 7, 4.5, 9−2 13 , 12−2 12 } = 9−2 13 = 2.69722436. m2 Now m = maxi = j |Ai j | = 3, h = σ(A) = 3.33676357. Let λ = 0.5, then t1 = min{σ(A), λh} = min{2.69722436, 1.66838179} = 1.66838179. (1−λ)2 t12 t2 = = 0.25∗2.7834978 = 0.208547725. h 3.33676357 (1−λ)2 t 2
2 = 0.25∗0.0434921536 = 0.00325855823. t3 = h 3.33676357 Therefore β(A) ≥ 0.00325855823. Again, β(A) ≤ z A 2 ≤ σ(A) = 2.69722436. Here A∞ = 7. So A = 8. Consider that z ∈ Rn is the only solution of LCP(A, q) and ζ ∈ Rn is a vector. From (9), we obtain
1 Q A
≤ z − ζ∞ ≤
A Q. β(A)
206
B. Kumar et al.
This implies that 1 Q 8
≤ z − ζ∞ ≤
8 Q. 2.69722436
Therefore, the error satisfies the inequality, 0.125Q ≤ z − ζ∞ ≤ 2.96601207Q.
5 Conclusion In this study, we introduce a different type of error bounds for LCP(A, q) in case of P-matrix using plus function. We introduce a new residual approach to bound the error as well as the relative error. We also study the error bound for diagonal P-matrix. A numerical example is illustrated to demonstrate the upper and lower bound of the error. Acknowledgements The UGC (University Grants Commission), Government of India, is gratefully acknowledged by the author Bharat Kumar. The author A. Dutta is thankful to the DST, Govt. of India, INSPIRE Fellowship Scheme for providing financial assistance.
References 1. Chen X, Xiang S (2006) Computation of error bounds for P-matrix linear complementarity problems. Math Program 106(3):513–525 2. Cottle RW, Pang JS, Stone RE (2009) The linear complementarity problem. SIAM 3. Dai PF, Lu CJ, Li YT (2013) New error bounds for the linear complementarity problem with an SB-matrix. Numer Algorithms 64(4):741–757 4. Das AK (2014) Properties of some matrix classes based on principal pivot transform. Ann Oper Res 243. https://doi.org/10.1007/s10479-014-1622-6 5. Das AK, Jana R (2016) Deepmala: on generalized positive subdefinite matrices and interior point algorithm. In: Frontiers in optimization: theory and applications. Springer, pp 3–16 6. Das AK, Jana R (2017) Deepmala: finiteness of criss-cross method in complementarity problem. In: International conference on mathematics and computing. Springer, pp 170–180 7. Dutta A, Jana R, Das AK (2022) On column competent matrices and linear complementarity problem. In: Proceedings of the seventh international conference on mathematics and computing. Springer, pp 615–625 8. Garcia-Esnaola M, Pena J (2010) A comparison of error bounds for linear complementarity problems of h-matrices. Linear Algebr Appl 433(5):956–964 9. Jana R, Das AK, Dutta A (2019) On hidden z-matrix and interior point algorithm. OPSEARCH 56. https://doi.org/10.1007/s12597-019-00412-0 10. Jana R, Dutta A, Das AK (2019) More on hidden z-matrices and linear complementarity problem. Linear Multilinear Algebr 69:1–10. https://doi.org/10.1080/03081087.2019.1623857 11. Mathias R, Pang JS (1990) Error bounds for the linear complementarity problem with a Pmatrix. Linear Algebr Appl 132:123–136 12. Mohan SR, Neogy SK, Das AK (2001) More on positive subdefinite matrices and the linear complementarity problem. Linear Algebr Appl 338(1–3):275–285
Error Bound for the Linear Complementarity Problem Using Plus Function
207
13. Mohan SR, Neogy SK, Das AK (2001) On the classes of fully copositive and fully semimonotone matrices. Linear Algebr Appl 323:87–97. https://doi.org/10.1016/S0024-3795(00)002470 14. Mohan SR, Neogy SK, Das AK (2004) A note on linear complementarity problems and multiple objective programming. Math Program Series A. Series B 100. https://doi.org/10.1007/s10107003-0473-8 15. Mondal P, Sinha S, Neogy SK, Das AK (2016) On discounted arat semi-markov games and its complementarity formulations. Int J Game Theory 45(3):567–583 16. Neogy SK, Bapat RB, Das AK, Parthasarathy T (2008) Mathematical programming and game theory for decision making. https://doi.org/10.1142/6819 17. Neogy SK, Bapat RB, Das AK, Parthasarathy T (2021) Mathematical programming and game theory for decision making 18. Neogy SK, Bapat RB, Das AK, Pradhan B (2016) Optimization models with economic and game theoretic applications. Ann Oper Res 243. https://doi.org/10.1007/s10479-016-2250-0 19. Neogy SK, Das AK (2005) Linear complementarity and two classes of structured stochastic games. In: Mohan SR, Neogy SK (eds) Operations research with economic and industrial applications: emerging trends. Anamaya Publishers, New Delhi, India, pp 156–180 20. Neogy SK, Das AK (2005) On almost type classes of matrices with q-property. Linear Multilinear Algebr 53:243–257. https://doi.org/10.1080/03081080500092380 21. Neogy SK, Das AK (2006) Some properties of generalized positive subdefinite matrices. SIAM J Matrix Anal Appl 27:988–995. https://doi.org/10.1137/040613585 22. Neogy SK, Das AK (2013) On weak generalized positive subdefinite matrices and the linear complementarity problem. Linear Multilinear Algebr 61. https://doi.org/10.1080/03081087. 2012.719507 23. Neogy SK, Das AK, Bapat RB (2022) Modeling, computation and optimization 24. Neogy SK, Das AK, Gupta A (2012) Generalized principal pivot transforms, complementarity theory and their applications in stochastic games. Optim Lett 6:339–356. https://doi.org/10. 1007/s11590-010-0261-3 25. Neogy SK, Das AK, Gupta A (2012) Generalized principal pivot transforms, complementarity theory and their applications in stochastic games. Optim Lett 6(2):339–356 26. Neogy SK, Das AK, Sinha SK, Gupta A (2008) On a mixture class of stochastic game with ordered field property. In: Mathematical programming and game theory for decision making. World Scientific, pp 451–477 27. Pang JS (1997) Error bounds in mathematical programming. Math Program 79(1):299–332
On the Solutions of the Diophantine Equation u a + v b = z 2 for Some Prime Pairs u and v Ednalyn Xyra P. Calabia and Jerico B. Bacani
Abstract In this paper, we solve the Diophantine equation u a + v b = z 2 in the set of non-negative integers. Here, the base pair (u, v) is any of the prime pairs of the form (u, 4u + 1), (u, 4u + 3), and (u, 8u + k), where k = 1, 3, 5, 7. Our strategy uses factoring and modular arithmetic methods. Keywords Exponential Diophantine equation · Prime pairs · Catalan’s conjecture
1 Introduction A Diophantine equation (DE) is any equation in one or more unknowns, that is to be solved in the set of integers or in its subset. DEs of the form a1 x1 + a2 x2 + · · · + an xn = c, where a1 , a2 , ..., an are fixed integers are called linear Diophantine equations (LDE). Otherwise, they are called nonlinear Diophantine equation (NDE) [1]. Several studies have provided general solutions for various forms of NDEs. There are also some that are named after mathematicians. One example is Fermat’s Last Theorem which states that no positive integers a, b and c satisfy the equation a n + bn = cn for any integer n greater than 2 [15]. The specific case when n = 2 in this equation, i.e., a 2 + b2 = c2 , is widely known and is called Pythagorean equation. The integers a, b, c that satisfy this equation are called Pythagorean triples. This equation is widely used as a way of finding the length of a side of a right-angled triangle Another commonly known DE is Pell’s equation with the general form x 2 − Dy 2 = m, where m is a nonzero integer and D is a non-square positive integer [2]. E. X. P. Calabia (B) · J. B. Bacani University of the Philippines Baguio, Governor Pack Road, 2600 Baguio City, Philippines e-mail: [email protected] J. B. Bacani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_16
209
210
E. X. P. Calabia and J. B. Bacani
Exponential Diophantine equations (EDEs) are NDEs that have one or more variable exponents [2]. This present paper will be presenting new results on EDEs of the form (1) ua + vb = z2 . Many authors had studied the general solutions of (1) on different conditions. Some studies that had examined the solution of (1) when there are fixed integer values for u and v can be read in [3] and [4]. Some also derived the solutions when the bases u and v are different types of prime pairs. Bacani and Rabago [5], and Gupta, Kumar R., and Kumar S. [6] solved for solutions of (1), where u and v are twin primes. Both studies had shown that (1) will have infinitely many solutions in this case. Burshtein [7] [8], and Bacani and Mina [9], on the other hand, studied the solutions when u and v are cousin primes, and prime pairs with prime gaps of multiples of 4. Burshtein [10], and Gupta, Kumar R., and Kumar S. [11] explored the solutions when u and v are sexy primes. Other forms of prime pairs like ( p, p + 8) and ( p, p + 20) are studied by de Oliveira [12] and Dockhan [13], respectively. One set of prime pairs for (1) to have infinite solutions is when u and v are pairs of safe and Sophie Germaine primes which are proven by Burshtein in [14]. The above-mentioned works are the motivations of this study. This present study seeks to find the solutions in N0 (i.e., the set of all non-negative integers) of the Diophantine equation ua + vb = z2 , by considering the base pairs (u, v) that are equal to any of the following prime pairs: (u, 4u + 1), (u, 4u + 3) and (u, 8u + k), where k = 1, 3, 5, 7. We shall be using the notation a ≡m b for congruence modulo m.
1.1 Preliminaries Here, we present some definitions and theorems from published literature that the authors found useful in the conduct of the study. Theorem 1 ([16] (Dirichlet’s Theorem)) Given a and b which are relatively prime, the sequence of the arithmetic progression of terms an + b, where n = 1, 2, 3, ... contains an infinite number of primes. Corollary 1 Each of the sequences {4n + 1}, {4n + 3}, {8n + 1}, {8n + 3}, {8n + 5} and {8n + 7}, where the integer n ranges from 1 onwards, contains an infinite number of primes. Theorem 2 ([17] (Catalan’s Conjecture/Mihailescu’s Theorem)) The only solution in integers to the equation a x − b y = 1, where a, b, x, y > 1, is (a, b, x, y) = (3, 2, 2, 3), i.e., 32 − 23 = 1.
Theorem 3 ([15] (Fermat's Last Theorem)) There exist no positive integers a, b, and c that satisfy the equation a^n + b^n = c^n for any integer n > 2.

Theorem 4 ([3]) The equation 2^a + 17^b = z^2 has the following solutions in N_0: (a, b, z) ∈ {(3, 0, 3), (3, 1, 5), (5, 1, 7), (6, 1, 9), (7, 3, 71), (9, 1, 23)}.

Theorem 5 ([4]) The equation 2^a + 19^b = z^2 has a unique solution (a, b, z) = (3, 0, 3) in N_0.
2 Main Results

In this section, we provide the non-negative integer solutions of the Diophantine equation

u^a + v^b = z^2,   (2)

for prime pairs u and v of the form (u, v) ∈ {(u, 4u + 1), (u, 4u + 3), (u, 8u + k)}, where k = 1, 3, 5, 7. By a non-negative integer solution, we mean a triple (a, b, z) whose components are all in N_0. We first solve (2) by letting u = 2 in each of the cases, and then solve the case when u is an odd prime. Note that if u is an odd prime, then u ≡_4 1 or u ≡_4 3. Also note that when u and v are both odd primes, u^a + v^b is even, so z is even and z^2 ≡_4 0.
2.1 When u = 2

Let u = 2. Then we obtain the following values of v: v = 4u + 1 = 9, v = 4u + 3 = 11, v = 8u + 1 = 17, v = 8u + 3 = 19, v = 8u + 5 = 21, and v = 8u + 7 = 23. Of these, 11, 17, 19, and 23 are prime. Thus, for u = 2 we get the prime pairs (2, 11), (2, 17), (2, 19), and (2, 23), which are of the forms (u, 4u + 3), (u, 8u + 1), (u, 8u + 3), and (u, 8u + 7), respectively. We now solve (2) for each of these prime pairs.
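The prime-pair families considered in this study can also be enumerated mechanically. The following Python sketch is not part of the original paper, and the bound on u is an arbitrary illustrative choice; for u = 2 it reproduces exactly the pairs (2, 11), (2, 17), (2, 19) and (2, 23) listed above, and for u = 3 it lists (3, 13), (3, 29) and (3, 31), which reappear later in Theorems 8, 14 and 16.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for the small values used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# The base-pair forms studied in the paper.
forms = {"4u+1": lambda u: 4 * u + 1, "4u+3": lambda u: 4 * u + 3,
         "8u+1": lambda u: 8 * u + 1, "8u+3": lambda u: 8 * u + 3,
         "8u+5": lambda u: 8 * u + 5, "8u+7": lambda u: 8 * u + 7}

for u in (p for p in range(2, 50) if is_prime(p)):
    for name, f in forms.items():
        v = f(u)
        if is_prime(v):
            print(f"u = {u:2d}, v = {name} = {v}")
```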
2.1.1 Solutions of 2^a + 11^b = z^2
Theorem 6 The equation 2^a + 11^b = z^2 has a unique solution (a, b, z) = (3, 0, 3) in N_0.

Proof Consider the equation 2^a + 11^b = z^2. We divide the proof into two cases.

Case 1: If a = 0, then the equation becomes 11^b = z^2 - 1 = (z + 1)(z - 1). This means that there exist integers α and β, where α < β and α + β = b, such that 11^α = z - 1 and 11^β = z + 1. Subtracting these two, we have 11^β - 11^α = (z + 1) - (z - 1), that is, 11^α(11^{β-α} - 1) = 2. Since 11 does not divide 2, we have α = 0. Substituting this into z - 1 = 11^α gives z = 2, and then 11^β = z + 1 = 3, which is impossible since 3 is not a power of 11. Hence there is no solution with a = 0.

Case 2: Let a be a positive integer. Note that 2^a is even and 11^b is odd for non-negative integers a and b. This implies that z is odd, that is, z = 2t + 1 for some non-negative integer t. Hence, 2^a + 11^b = (2t + 1)^2 = 4(t^2 + t) + 1. If a = 1, then z^2 = 2 + 11^b ≡_8 5 or 3 (according as b is odd or even), which is impossible since odd squares are congruent to 1 modulo 8; hence a ≥ 2. Since then 2^a ≡_4 0 and 4(t^2 + t) ≡_4 0, it must be that 11^b ≡_4 1. Moreover, since 11 ≡_4 3, we have 11^b ≡_4 1 only if b is even, that is, b = 2s for some non-negative integer s.

If b = 0, then z^2 - 1 = 2^a, or (z - 1)(z + 1) = 2^a, where z - 1 = 2^v and z + 1 = 2^{a-v} for some non-negative integer v. Subtracting these two gives 2^{a-v} - 2^v = (z + 1) - (z - 1), that is, 2^v(2^{a-2v} - 1) = 2, which implies that v = 1. We are left with 2^{a-2} = 2, so a - 2 = 1, i.e., a = 3. Substituting a = 3 and b = 0 into the equation gives 2^3 + 11^0 = 8 + 1 = 9, so z = 3.

If b ≥ 2, then s ≥ 1 and we have z^2 - 11^{2s} = 2^a, or equivalently (z - 11^s)(z + 11^s) = 2^a. Thus, z - 11^s = 2^u and z + 11^s = 2^{a-u} for a non-negative integer u. Subtracting these two, we have 2^{a-u} - 2^u = 2^u(2^{a-2u} - 1) = (z + 11^s) - (z - 11^s) = 2(11)^s. If u = 0, then z - 11^s = 2^0 = 1, so z = 11^s + 1, which implies that z is even. This contradicts the fact that z is odd. If u = 1, then 2(2^{a-2} - 1) = 2(11)^s, i.e., 2^{a-2} - 11^s = 1, which contradicts Catalan's conjecture (Theorem 2).

Therefore, the only solution of 2^a + 11^b = z^2 in the set of non-negative integers is (a, b, z) = (3, 0, 3).
2.1.2 Solutions of 2^a + 17^b = z^2
See Theorem 4 for the solutions of 2^a + 17^b = z^2.
2.1.3 Solutions of 2^a + 19^b = z^2

See Theorem 5 for the solutions of 2^a + 19^b = z^2.
2.1.4 Solutions of 2^a + 23^b = z^2
Theorem 7 The equation 2^a + 23^b = z^2 has a unique solution (a, b, z) = (3, 0, 3) in N_0.

Proof The proof follows the arguments in the proof of Theorem 6.

Case 1: If a = 0, then there exist α and β, where α < β and α + β = b, such that 23^α = z - 1 and 23^β = z + 1. Hence, 23^α(23^{β-α} - 1) = 2. This implies that α = 0 and consequently z = 2. But then 23^β = z + 1 = 3, which is impossible since 3 is not a power of 23.

Case 2: Let a be a positive integer. Note that z is odd, that is, z = 2t + 1 for some non-negative integer t. Hence, 2^a + 23^b = 4(t^2 + t) + 1, and it must be that 23^b ≡_4 1. Since 23 ≡_4 3, we have 23^b ≡_4 1 only if b is even, that is, b = 2s for some non-negative integer s.

If b = 0, we have (z - 1)(z + 1) = 2^a, where z - 1 = 2^v and z + 1 = 2^{a-v} for some non-negative integer v. This gives 2^v(2^{a-2v} - 1) = 2, which implies that v = 1. Hence 2^{a-2} = 2, so a = 3 and z = 3. Thus, we have the solution (a, b, z) = (3, 0, 3).

If b ≥ 2, then s ≥ 1, and we have (z - 23^s)(z + 23^s) = 2^a. Thus, z - 23^s = 2^u and z + 23^s = 2^{a-u} for a non-negative integer u. Subtracting these two, we have 2^u(2^{a-2u} - 1) = 2(23)^s. If u = 0, then z = 23^s + 1, which contradicts the fact that z is odd. If u = 1, then 2^{a-2} - 23^s = 1, which contradicts Theorem 2.

Therefore, the only solution of 2^a + 23^b = z^2 in the set of non-negative integers is (a, b, z) = (3, 0, 3).

We have now obtained the solutions of (2) for the prime pairs under consideration with u = 2. In the following sections, we consider the cases where u is an odd prime.
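The solution sets recorded in Theorems 4, 5 and 6 can be checked by a bounded brute-force search. The following Python sketch is not part of the original paper; it only searches exponents up to an arbitrary bound, so it is a consistency check rather than a proof.

```python
from math import isqrt

def solutions_u2(v: int, max_exp: int = 40) -> list[tuple[int, int, int]]:
    """Bounded search for (a, b, z) in N0 with 2**a + v**b == z**2."""
    found = []
    for a in range(max_exp + 1):
        for b in range(max_exp + 1):
            s = 2 ** a + v ** b
            z = isqrt(s)          # exact integer square root, works for big integers
            if z * z == s:
                found.append((a, b, z))
    return found

for v in (11, 17, 19):
    print(v, solutions_u2(v))
# Expected within the searched range:
#   11 -> [(3, 0, 3)]                                                  (Theorem 6)
#   17 -> [(3, 0, 3), (3, 1, 5), (5, 1, 7), (6, 1, 9), (7, 3, 71), (9, 1, 23)]  (Theorem 4)
#   19 -> [(3, 0, 3)]                                                  (Theorem 5)
```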
2.2 Solutions of u^a + v^b = z^2 for (u, v) = (u, 4u + 1)

Lemma 1 Take any odd integer u with u ≡_4 3. Then z^2 ≡_4 0, where z^2 = u^a + (4u + 1)^b, for any odd integer a and nonzero integer b.
Proof Let u be an odd number such that u ≡_4 3. Then u^a ≡_4 3 for any odd integer a. Note that (4u + 1)^b ≡_4 1 for any nonzero integer b. Adding these two to obtain z^2, we have z^2 = u^a + (4u + 1)^b ≡_4 3 + 1 ≡_4 0. Thus, z^2 ≡_4 0 in this case.

Lemma 2 The Diophantine equation u^a + (4u + 1)^b = z^2, where u ≡_4 1, has no solutions in N_0.

Proof Let u ≡_4 1. Then 4u + 1 ≡_4 1. Hence, replacing v by 4u + 1 in (2), we have u^a + (4u + 1)^b ≡_4 2, which is a contradiction because z^2 ≡_4 0.

Lemma 3 The Diophantine equation u^a + (4u + 1)^b = z^2, where u ≡_4 3, has no solutions in N_0 when a is even.

Proof Let u ≡_4 3 and a be even. Note that u^{2k} ≡_4 1 for any positive integer k. Replacing v by 4u + 1 in (2), we have u^a + (4u + 1)^b ≡_4 2, which is a contradiction because z^2 ≡_4 0.

Theorem 8 Let a be an odd number and b be an even number. Then the Diophantine equation (2) for prime pairs (u, v) = (u, 4u + 1), where u ≡_4 3, has two solutions in N_0, given by (u, v, a, b, z) = (3, 13, 1, 0, 2) and (3, 13, 3, 2, 14).

Proof Let a and b be non-negative integers such that a is odd and b is even. Then, for some non-negative integer s, we can write b = 2s. Replacing v by 4u + 1, the DE (2) becomes

u^a + (4u + 1)^{2s} = z^2.   (3)

Subtracting (4u + 1)^{2s} from both sides, we can rewrite u^a as u^a = z^2 - (4u + 1)^{2s}, or equivalently u^a = (z + (4u + 1)^s)(z - (4u + 1)^s). Since u is prime, there exist integers α and β with α < β such that α + β = a, u^β = z + (4u + 1)^s and u^α = z - (4u + 1)^s. Subtracting these two, we have

u^α(u^{β-α} - 1) = (z + (4u + 1)^s) - (z - (4u + 1)^s) = 2(4u + 1)^s.   (4)
Since u ∤ 2 and u ∤ 4u + 1, we have α = 0. Substituting α = 0 in (4), we now have

u^α(u^{β-α} - 1) = u^0(u^{β-0} - 1) = u^β - 1 = 2(4u + 1)^s.   (5)
Since α = 0, we have β = a. By factoring (5), we get

(u - 1)(u^{a-1} + u^{a-2} + · · · + 1) = 2(4u + 1)^s.   (6)
Note that since u is odd, every power of u is odd, and since the sum u^{a-1} + u^{a-2} + · · · + 1 consists of a terms with a odd, this sum is also odd. Moreover, since u is odd, u - 1 is even, so it must be that u - 1 = 2(4u + 1)^r for some non-negative integer r ≤ s in order for (6) to hold. If r ≥ 1, then u - 1 < 2(4u + 1)^r; hence, there is no solution when r ≥ 1.
If r = 0, then u - 1 = 2(4u + 1)^0 = 2, which implies that u = 3. Letting u = 3 in (6) gives

(2)(3^{a-1} + 3^{a-2} + · · · + 1) = 2(4(3) + 1)^s, that is, 3^{a-1} + 3^{a-2} + · · · + 1 = 13^s.   (7)

Let a = 1 in (7). Then 1 = 13^s, which implies that s = 0. Letting u = 3, a = 1 and s = 0 in (3) and solving for z, we get 3^1 + 13^{2(0)} = z^2, or 3 + 1 = z^2, so z = 2. This gives the solution (u, v, a, b, z) = (3, 13, 1, 0, 2) in the set of non-negative integers.

Now, consider s ≥ 1 and a > 1. First, let s = 1 in (7); then a = 3, since 3^{3-1} + 3^{3-2} + 3^{3-3} = 9 + 3 + 1 = 13. Letting u = 3, a = 3 and s = 1 in (3) and solving for z, we have 3^3 + (4(3) + 1)^2 = 3^3 + 13^2 = 196 = 14^2. Hence, we have the solution (u, v, a, b, z) = (3, 13, 3, 2, 14).

Lastly, let a > 3 and s > 1. Taking (7) modulo 13, we have 3^{a-1} + 3^{a-2} + · · · + 1 ≡_{13} 0. Multiplying by 3 - 1, we get

(3 - 1)(3^{a-1} + 3^{a-2} + · · · + 1) ≡_{13} 0 ⇒ (3^a + 3^{a-1} + · · · + 3) - (3^{a-1} + 3^{a-2} + · · · + 3 + 1) ≡_{13} 0 ⇒ 3^a - 1 ≡_{13} 0,

which will hold true if 3^a ≡_{13} 1, or equivalently, if a = 3. This contradicts the assumption that a > 3.
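As a numerical illustration of Theorem 8 (again not part of the original paper, and limited to an arbitrary exponent bound), the following sketch searches for solutions with a odd and b even for the pair (3, 13) and for a second pair of the form (u, 4u + 1) with u ≡_4 3.

```python
from math import isqrt

def search(u: int, v: int, a_parity: int, b_parity: int, max_exp: int = 25):
    """Bounded search for (a, b, z) with u**a + v**b == z**2 and prescribed parities of a, b."""
    out = []
    for a in range(a_parity, max_exp + 1, 2):       # a of the requested parity
        for b in range(b_parity, max_exp + 1, 2):   # b of the requested parity
            s = u ** a + v ** b
            z = isqrt(s)
            if z * z == s:
                out.append((a, b, z))
    return out

print(search(3, 13, a_parity=1, b_parity=0))   # expected [(1, 0, 2), (3, 2, 14)], as in Theorem 8
print(search(7, 29, a_parity=1, b_parity=0))   # 7 ≡ 3 (mod 4) and 29 = 4*7 + 1: expected []
```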
2.3 Solutions of u^a + v^b = z^2 for (u, v) = (u, 4u + 3)

Lemma 4 The Diophantine equation u^a + (4u + 3)^b = z^2, where u ≡_4 1, has no solutions in N_0 when b is even.

Proof Let u ≡_4 1 and b be even. Note that (4u + 3)^{2k} ≡_4 1 for any positive integer k. Replacing v by 4u + 3 in (2), we have u^a + (4u + 3)^b ≡_4 2, which is a contradiction because z^2 ≡_4 0.

Lemma 5 The Diophantine equation u^a + (4u + 3)^b = z^2, where u ≡_4 3, has no solutions in N_0 when a and b are of the same parity.
Proof Let u ≡_4 3. Note that 4u + 3 ≡_4 3. If a and b are both even, then for some non-negative integers r and s we can write a = 2r and b = 2s. We then have u^{2r} ≡_4 1 and (4u + 3)^{2s} ≡_4 1, and adding these two gives u^{2r} + (4u + 3)^{2s} ≡_4 2. If a and b are both odd, then we can write a = 2r + 1 and b = 2s + 1. It follows that u^{2r+1} ≡_4 3 and (4u + 3)^{2s+1} ≡_4 3, and adding these two gives u^{2r+1} + (4u + 3)^{2s+1} ≡_4 2. Since in both cases u^a + v^b ≡_4 2, which contradicts z^2 ≡_4 0, there are no solutions when a and b are of the same parity.

Theorem 9 The Diophantine equation (2) for prime pairs (u, v) = (u, 4u + 3), where u ≡_4 3, has no solutions in N_0 when a is even and b is odd.

Proof Let a be even. Then a = 2s for some non-negative integer s. Rewriting the Diophantine equation, we have u^{2s} + (4u + 3)^b = z^2, or equivalently, (4u + 3)^b = (z + u^s)(z - u^s). Then there exist integers α and β with α < β, such that α + β = b and (4u + 3)^α((4u + 3)^{β-α} - 1) = 2u^s. Since (4u + 3) ∤ 2 and (4u + 3) ∤ u, we have α = 0 and β = b. Hence, we are left with (4u + 3)^b - 1 = 2u^s. By factoring, we have ((4u + 3) - 1)((4u + 3)^{b-1} + (4u + 3)^{b-2} + · · · + 1) = 2u^s. This implies that (4u + 3) - 1 = 2u^r, or 4u + 2 = 2u^r, for some r ≤ s. If r = 0, then 4u + 2 = 2, which implies that u = 0, which is not prime. If r = 1, then 4u - 2u = -2, which implies that u = -1, which is not prime. If r = 2, then 4u + 2 = 2u^2, which has no integer solutions u. If r > 2, then 4u + 2 < 2u^r. Hence, there is no solution in the set of non-negative integers when a is even and b is odd.

Theorem 10 The Diophantine equation (2) for prime pairs (u, v) = (u, 4u + 3), where u ≡_4 3, has no solutions in N_0 when a is odd and b is even.

Proof Let b be even; then b = 2s for some non-negative integer s. Setting v = 4u + 3 in (2) gives u^a + (4u + 3)^{2s} = z^2, which is equivalent to u^a = z^2 - (4u + 3)^{2s}, or u^a = (z - (4u + 3)^s)(z + (4u + 3)^s). Since u is prime, there exist integers α and β with α < β such that α + β = a and u^α(u^{β-α} - 1) = 2(4u + 3)^s.
Since u ∤ 2 and u ∤ 4u + 3, we have α = 0 and β = a. By factoring, we get (u - 1)(u^{a-1} + u^{a-2} + · · · + 1) = 2(4u + 3)^s. Since (u^{a-1} + u^{a-2} + · · · + 1) is odd, it follows that u - 1 = 2(4u + 3)^r for some r ≤ s. If r = 0, then u - 1 = 2(4u + 3)^0 = 2, which implies that u = 3. However, if u = 3, then 4u + 3 = 15 is not prime, so this gives no solution. If r ≥ 1, then u - 1 < 2(4u + 3)^r. Therefore, there is no solution in the set of non-negative integers when a is odd and b is even.
2.4 Solutions of u^a + v^b = z^2 for (u, v) = (u, 8u + k)

The proofs of the following lemmas and theorems follow directly from the proofs of the cases presented previously.
2.4.1 When k = 1
Lemma 6 The Diophantine equation u^a + (8u + 1)^b = z^2, where u ≡_4 1, has no solutions in N_0.

Proof The proof is similar to the proof of Lemma 2.
Lemma 7 The Diophantine equation u^a + (8u + 1)^b = z^2, where u ≡_4 3, has no solutions in N_0 when a is even.

Proof The proof is similar to the proof of Lemma 3.
Theorem 11 The Diophantine equation (2) for prime pairs (u, v) = (u, 8u + 1), where u ≡_4 3, has no solutions in N_0 when a is odd and b is even.

Proof Since 4u + 1 ≡_4 8u + 1, the proof follows directly from the proof of Theorem 8. Following the same argument, we end up with u - 1 = 2(8u + 1)^r for some non-negative integer r. If r = 0, then u - 1 = 2(8u + 1)^0, implying that u = 3. However, for u = 3 we have v = 8u + 1 = 25, which is not prime. If r ≥ 1, then u - 1 < 2(8u + 1)^r. Hence, there are no solutions in this case.
2.4.2 When k = 3
Lemma 8 The Diophantine equation u^a + (8u + 3)^b = z^2, where u ≡_4 1, has no solutions in N_0 when b is even.

Proof The proof is similar to the proof of Lemma 4.
Lemma 9 The Diophantine equation u^a + (8u + 3)^b = z^2, where u ≡_4 3, has no solutions in N_0 when a and b are of the same parity.

Proof The proof is similar to the proof of Lemma 5.
Theorem 12 The Diophantine equation (2) for prime pairs (u, v) = (u, 8u + 3), where u ≡_4 3, has no solutions in N_0 when a is even and b is odd.

Proof Since 4u + 3 ≡_4 8u + 3, the proof follows directly from the proof of Theorem 9. Following the same argument, we end up with (8u + 3) - 1 = 2u^r, or 8u + 2 = 2u^r, for some non-negative integer r. If r = 0, then 8u = 0, implying that u = 0, which is not prime. If r = 1, then 8u - 2u = -2, which implies that u is negative and hence not prime. If r = 2, then 8u + 2 = 2u^2, which has no integer solutions u. If r > 2, then 8u + 2 < 2u^r for all such r and for u ≥ 3. Hence, there are no solutions in this case.

Theorem 13 The Diophantine equation (2) for prime pairs (u, v) = (u, 8u + 3), where u ≡_4 3, has no solutions in N_0 when a is odd and b is even.

Proof Since 4u + 3 ≡_4 8u + 3, the proof follows directly from the proof of Theorem 10. Following the same argument, we end up with u - 1 = 2(8u + 3)^r for some non-negative integer r. If r = 0, then u - 1 = 2(8u + 3)^0, implying that u = 3. However, for u = 3 we have v = 8u + 3 = 27, which is not prime. If r ≥ 1, then u - 1 < 2(8u + 3)^r. Hence, there is no solution in this case.
2.4.3 When k = 5

Lemma 10 The Diophantine equation u^a + (8u + 5)^b = z^2, where u ≡_4 1, has no solutions in N_0.
Proof The proof is similar to the proof of Lemma 2.
Lemma 11 The Diophantine equation u^a + (8u + 5)^b = z^2, where u ≡_4 3, has no solutions in N_0 when a is even.

Proof The proof is similar to the proof of Lemma 3.
Theorem 14 Let a be odd and b be even. Then the Diophantine equation (2) for prime pairs (u, v) = (u, 8u + 5), where u ≡_4 3, has the unique solution (u, v, a, b, z) = (3, 29, 1, 0, 2) in N_0.

Proof Since 4u + 1 ≡_4 8u + 5, the proof follows directly from the proof of Theorem 8. Following the same argument, we end up with

(u - 1)(u^{a-1} + u^{a-2} + · · · + 1) = 2(8u + 5)^s,   (8)

which implies that u - 1 = 2(8u + 5)^r for some non-negative integer r ≤ s. If r = 0, then u - 1 = 2(8u + 5)^0, implying that u = 3 and v = 8u + 5 = 29. It follows that a = 1, b = 0 and z = 2, and using the same arguments as in the proof of Theorem 8 we obtain the solution (u, v, a, b, z) = (3, 29, 1, 0, 2). If s = 1 in (8), then 3^{a-1} + 3^{a-2} + · · · + 1 = 29; however, for a = 3 the left-hand side equals 13 < 29, and for a = 5 it equals 121 > 29. Hence there is no solution for s ≥ 1. If r ≥ 1, then u - 1 < 2(8u + 5)^r. Hence, we have the unique solution (u, v, a, b, z) = (3, 29, 1, 0, 2) in this case.
2.4.4 When k = 7
Lemma 12 Let b be an even number. Then the Diophantine equation u^a + (8u + 7)^b = z^2, where u ≡_4 1, has no solutions in N_0.

Proof The proof is similar to the proof of Lemma 4.
Lemma 13 The Diophantine equation u^a + (8u + 7)^b = z^2, where u ≡_4 3, has no solutions in N_0 when a and b are of the same parity.

Proof The proof is similar to the proof of Lemma 5.
Theorem 15 The Diophantine equation (2) for prime pairs (u, v) = (u, 8u + 7), where u ≡_4 3, has no solutions in N_0 when a is even and b is odd.

Proof Since 4u + 3 ≡_4 8u + 7, the proof follows directly from the proof of Theorem 9. Following the same argument, we end up with

(8u + 7) - 1 = 2u^r, or 8u + 6 = 2u^r,
for some non-negative integer r. If r = 0, then 8u + 6 = 2, which gives no prime u. If r = 1, then 8u - 2u = -6, which implies that u is negative and hence not prime. If r = 2, then 8u + 6 = 2u^2, which has no integer solutions u. If r > 2, then 8u + 6 < 2u^r for all such r and for u ≥ 3. Hence, there is no solution in this case.

Theorem 16 Let a be odd and b be even. Then the Diophantine equation (2) for prime pairs (u, v) = (u, 8u + 7), where u ≡_4 3, has the unique solution (u, v, a, b, z) = (3, 31, 1, 0, 2) in N_0.

Proof Since 4u + 3 ≡_4 8u + 7, the proof follows directly from the proof of Theorem 10. Following the same argument, we end up with

(u - 1)(u^{a-1} + u^{a-2} + · · · + 1) = 2(8u + 7)^s,   (9)
which implies that u - 1 = 2(8u + 7)^r for some non-negative integer r ≤ s. If r = 0, then u - 1 = 2(8u + 7)^0, implying that u = 3 and v = 8u + 7 = 31. Hence, a = 1, b = 0 and z = 2, which gives the solution (u, v, a, b, z) = (3, 31, 1, 0, 2). If s = 1 in (9), then 3^{a-1} + 3^{a-2} + · · · + 1 = 31; however, for a = 3 the left-hand side equals 13 < 31, and for a = 5 it equals 121 > 31. Hence there is no solution for s ≥ 1. If r ≥ 1, then u - 1 < 2(8u + 7)^r. Hence, we have the unique solution (u, v, a, b, z) = (3, 31, 1, 0, 2) in this case.

Acknowledgements The authors thank the University of the Philippines Baguio for its support in the dissemination of the results and in the publication of this manuscript. They also thank Professor Perlas C. Caranay for her valuable feedback, which helped improve this paper. Finally, they thank the anonymous referees for their time and expertise.
References

1. Burton DM (2011) Elementary number theory. McGraw-Hill
2. Andreescu T, Andrica D, Cucurezeanu I (2010) An introduction to Diophantine equations. Springer
3. Rabago JFT (2017) On the Diophantine equation 2^x + 17^y = z^2. J Indonesian Math Soc 22:85-88
4. Sroysang B (2013) More on the Diophantine equation 2^x + 19^y = z^2. Int J Pure Appl Math 88:157-160
5. Rabago JFT, Bacani JB (2015) The complete set of solutions of the Diophantine equation p^x + q^y = z^2 for twin primes p and q. Int J Pure Appl Math 104:517-521
6. Gupta S, Kumar R, Kumar S (2015) On the nonlinear Diophantine equation p^x + (p + 2)^y = z^2. Ilkogretim Online-Element Educ Online 19:472-475
7. Burshtein N (2018) All the solutions of the Diophantine equation p^x + (p + 4)^y = z^2 when p, p + 4 are primes and x + y = 2, 3, 4. Ann Pure Appl Math 16:241-244
8. Burshtein N (2018) The Diophantine equation p^x + (p + 4)^y = z^2 when p > 3, p + 4 are primes is insolvable in positive integers x, y, z. Ann Pure Appl Math 16:283-286
9. Mina RJS, Bacani JB (2021) On the solutions of the Diophantine equation p^x + (p + 4k)^y = z^2 for prime pairs p and p + 4k. Eur J Pure Appl Math 14:471-479
10. Burshtein N (2018) Solutions of the Diophantine equation p^x + (p + 6)^y = z^2 when (p, p + 6) are primes and x + y = 2, 3, 4. Ann Pure Appl Math 17:101-106
11. Gupta S, Kumar S, Kishan H (2018) On the nonlinear Diophantine equation p^x + (p + 6)^y = z^2. Ann Pure Appl Math 18:125-128
12. De Oliveira FN (2018) On the solvability of the Diophantine equation p^x + (p + 8)^y = z^2 when p > 3 and p + 8 are primes. Ann Pure Appl Math 18:9-13
13. Dokchan R, Pakongpun A (2021) On the Diophantine equation p^x + (p + 20)^y = z^2, where p and p + 20 are primes. Int J Math Comput Sci 16:179-183
14. Burshtein N (2017) On solutions of the Diophantine equation p^x + q^y = z^2. Ann Pure Appl Math 13:143-149
15. Darmon H, Diamond F, Taylor R, Giles S (1995) Fermat's last theorem. Current developments in mathematics
16. Apostol T (1976) Introduction to analytic number theory. Undergraduate texts in mathematics
17. Catalan E (1844) Note extraite d'une lettre adressée à l'éditeur. J Reine Angew Math 27:192
A Second-Order Optimal Hybrid Scheme for Singularly Perturbed Semilinear Parabolic Problems with Interior Layers

S. Priyadarshana and J. Mohapatra
Abstract An improved time-accurate hybrid finite difference scheme is studied for singularly perturbed semilinear parabolic problems with interior layers. After dealing with the semilinearity through Newton's linearization technique, the temporal direction is treated by the implicit Euler scheme. The space derivatives are handled with a hybrid scheme on two layer-resolving meshes, namely the Shishkin mesh and the Bakhvalov-Shishkin mesh. Richardson extrapolation is applied in time to avoid the reduction of the order of accuracy outside the layer region. The robustness of the scheme is demonstrated through two test examples, one of which is a time-delayed model.

Keywords Singular perturbation · Convection-diffusion problem · Interior layers · Time delay · Richardson extrapolation
1 Introduction

In this work, a singularly perturbed semilinear parabolic problem with interior layers is discussed. We use the following notation:

Ḡ = G ∪ ∂G, G = G_s × G_t, G⁻ = G_s⁻ × G_t, G⁺ = G_s⁺ × G_t,
G_s = (0, 1), G_t = (0, T], G_s⁻ = (0, η), G_s⁺ = (η, 1), 0 < η < 1,
∂G = ϒ_d ∪ ϒ_l ∪ ϒ_r, ϒ_d = Ḡ_s × {0}, ϒ_l = {0} × (0, T], ϒ_r = {1} × (0, T].

The following singularly perturbed parabolic initial-boundary value problem (IBVP) is posed for (s, t) ∈ G⁻ ∪ G⁺:

L_ε z(s, t) ≡ (-z_t + ε z_ss + a(s) z_s - b(s) z)(s, t) = f(s, t, z),
z|_{ϒ_d} = Φ_d(s), z|_{ϒ_l} = Φ_l(t), z|_{ϒ_r} = Φ_r(t), t ∈ G_t.   (1)
Here 0 < ε ≪ 1 is the perturbation parameter, and a(s), f(s, t) and b(s) are sufficiently smooth on G_s⁻ ∪ G_s⁺, G⁻ ∪ G⁺ and Ḡ_s, respectively [6, 9]. We assume

b(s) ≥ β ≥ 0 on Ḡ_s,   |[a]| ≤ C, |[f]| ≤ C at s = η.

The solution satisfies the interface conditions [z] = 0 and [z_s] = 0, where [z] denotes the jump of z at s = η, i.e., [z](η, t) = z(η⁺, t) - z(η⁻, t) with z(η±, t) = lim_{s→η±0} z(s, t). An interior layer of width O(ε) is created in the neighborhood of s = η if the convection coefficient is discontinuous there. The following assumptions are made to obtain stronger interior layers:

-α₁* < a(s) < -α₁ < 0 for s < η,   α₂* > a(s) > α₂ > 0 for s > η.
Φ_d, Φ_l and Φ_r are chosen to be sufficiently smooth and to satisfy the compatibility conditions at all the corners, including at s = η. Following the concept of upper and lower solutions (see Definition 3.1 in [11]) and using the weaker assumption f_z ≥ 0 in G, problem (1) has a unique solution (see Lemmas 3.4-3.6, Chap. 2 in [11]). The analytical aspects of the continuous problem can be found in [8]. With different signs of a(s), (1) can be identified as the time-dependent linearized viscous Burgers' equation with a shock layer [10]. In the recent past, the use of fitted mesh methods (FMMs) for the approximation of SPPs with discontinuous convection coefficients has drawn the attention of many researchers; one may consult [1] for non-stationary SPPs, [2] for stationary SPPs, and the references therein. Using optimal-order accurate finite difference schemes on such graded meshes has also gained popularity. Mukherjee and Natesan used the implicit Euler scheme in time and the upwind scheme on two layer-resolving meshes, namely the Shishkin mesh (S-mesh) and the Bakhvalov-Shishkin mesh (B-S-mesh), obtaining a first-order globally accurate result in [7]; then in [8] they proposed a hybrid scheme on the S-mesh in space, providing a scheme that is second-order accurate (up to a logarithmic factor) in space. All the works described above discuss the linear form of (1). For a similar class of problems, Yadav and Mukherjee [16] recently proposed an improved hybrid numerical algorithm that is at least second-order accurate both inside and outside the layer for both the linear and the semilinear form of IBVP (1). Some of the schemes in the literature are second-order accurate in space but not in time; thus, the overall accuracy can drop to first order, especially outside the layer region. To the best of our knowledge, [16] is the only work in the literature that discusses the numerical approximation of the semilinear form of the prescribed class of problems, and none of the existing works describe the effect of delay terms in the concerned problems. Hence, our work contributes in these three major directions. The organizational structure
of the present manuscript is as follows. The numerical approximation of (1), along with the error estimates, is described in Sect. 2. The numerical results are discussed in Sect. 3, followed by the major conclusions in Sect. 4.
2 Numerical Approximation

Before applying the fully discretized finite difference scheme, Newton's linearization technique is used to linearize (1).
2.1 Newton's Linearization Technique

We use z^(0)(s, t) as a suitable initial guess for z(s, t) in the source term f(s, t, z). Now, for all i ≥ 0, expansion of f(s, t, z^(i+1)) around z^(i) gives

f(s, t, z^(i+1)(s, t)) ≈ f(s, t, z^(i)) + (z^(i+1) - z^(i)) ∂f/∂z |_(s,t,z^(i)) + · · · .   (2)
Substituting (2) in (1), we get

Lz ≅ -z_t^(i+1) + ε z_ss^(i+1) + a(s) z_s^(i+1) - b(s) z^(i+1)(s, t) - z^(i+1) ∂f/∂z |_(s,t,z^(i)) = f(s, t, z^(i)) - z^(i) ∂f/∂z |_(s,t,z^(i)).

On further rearrangement we have

Lz ≅ -z_t^(i+1) + ε z_ss^(i+1) + a(s) z_s^(i+1) - B(s, t) z^(i+1)(s, t) = F(s, t),   (3)

where B(s, t) = b(s) + ∂f/∂z |_(s,t,z^(i)) and F(s, t) = f(s, t, z^(i)) - z^(i) ∂f/∂z |_(s,t,z^(i)). Afterward, each iteration is solved numerically with the stopping criterion

max_{(s_n, t_m) ∈ Ḡ} |Z^(i+1)(s_n, t_m) - Z^(i)(s_n, t_m)| ≤ TOL.
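To make the linearization step concrete, the following Python sketch (not from the paper) applies the Newton iteration (3) with the above stopping criterion to a toy one-dimensional semilinear problem, using implicit Euler in time and plain central differences on a uniform mesh. The coefficients, the source term and all parameter values are illustrative assumptions, and the hybrid scheme on layer-adapted meshes analysed in the paper is not reproduced here.

```python
import numpy as np

# Toy problem of the form -z_t + eps*z_ss + a(s)*z_s - b(s)*z = f(s, t, z),
# with homogeneous Dirichlet data, solved by implicit Euler + central differences
# and the Newton (quasi-linearization) loop of Sect. 2.1 at every time level.
eps, N, M, T, TOL = 0.1, 64, 64, 1.0, 1e-8
s = np.linspace(0.0, 1.0, N + 1)
h, dt = s[1] - s[0], T / M
a = lambda x: 1.0 + 0.0 * x
b = lambda x: 1.0 + 0.0 * x
f = lambda x, t, z: z ** 2 + np.sin(np.pi * x) * np.exp(-t)   # illustrative source
fz = lambda x, t, z: 2.0 * z                                   # df/dz

z = np.zeros(N + 1)                      # initial condition z(s, 0) = 0
for m in range(1, M + 1):
    t = m * dt
    z_old, z_new = z.copy(), z.copy()    # previous time level / current Newton iterate
    for _ in range(50):
        B = b(s) + fz(s, t, z_new)       # frozen coefficient B(s, t)
        F = f(s, t, z_new) - z_new * fz(s, t, z_new)
        A = np.zeros((N + 1, N + 1))
        rhs = np.zeros(N + 1)
        A[0, 0] = A[N, N] = 1.0          # Dirichlet boundary rows
        for n in range(1, N):
            A[n, n - 1] = eps / h ** 2 - a(s[n]) / (2 * h)
            A[n, n] = -2 * eps / h ** 2 - B[n] - 1.0 / dt
            A[n, n + 1] = eps / h ** 2 + a(s[n]) / (2 * h)
            rhs[n] = F[n] - z_old[n] / dt
        z_next = np.linalg.solve(A, rhs)
        converged = np.max(np.abs(z_next - z_new)) <= TOL       # stopping criterion
        z_new = z_next
        if converged:
            break
    z = z_new
print("max |z| at t = T:", np.max(np.abs(z)))
```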
2.2 Discretization of the Domains

The proposed numerical procedure approximates the time variable employing a uniform mesh with step length Δt. The discretized domain in time on Ḡ_t = [0, T] is
Ḡ_t^M = {t_m = mΔt, m = 0, 1, ..., M, t_M = T, Δt = T/M}.

The point of discontinuity divides the spatial domain into two parts, and each part is further divided into another two parts due to the presence of layers on each side of s = η. The total number of mesh intervals in Ḡ_s is taken to be N, with non-negative constants ϑ₁ and ϑ₂ being the transition parameters, defined as

ϑ₁ = min{η/2, ϑ₀ ε ln N},   ϑ₂ = min{(1 - η)/2, ϑ₀ ε ln N},

where ϑ₀ = 2/θ, with θ > 0 to be chosen suitably later. As discussed earlier, Ḡ_s is split into four sub-domains, i.e., Ḡ_s = [0, η - ϑ₁] ∪ [η - ϑ₁, η] ∪ [η, η + ϑ₂] ∪ [η + ϑ₂, 1]. The intervals are created in such a way that s₀ = 0, s_{N/4} = η - ϑ₁, s_{3N/4} = η + ϑ₂ and s_N = 1. Let us denote G_l = [0, η - ϑ₁] × [0, T], G_m = [η - ϑ₁, η + ϑ₂] × [0, T] and G_r = [η + ϑ₂, 1] × [0, T]. The intervals [0, η - ϑ₁] and [η + ϑ₂, 1] are uniform with N/4 mesh intervals each, whereas [η - ϑ₁, η] and [η, η + ϑ₂] are partitioned using mesh-generating functions ϕ₁(χ), χ ∈ [1/4, 1/2], and ϕ₂(χ), χ ∈ [1/2, 3/4], which are monotonically increasing, continuous and piecewise differentiable. The functions are chosen to satisfy ϕ₁(1/4) = -ln N, ϕ₁(1/2) = 0, ϕ₂(1/2) = 0 and ϕ₂(3/4) = ln N. Clearly, s_{N/2} = η. Let h_n = s_n - s_{n-1} and ħ_n = h_n + h_{n+1} for n = 1, ..., N - 1. The mesh points are given by

s_n = n · 4(η - ϑ₁)/N,                              if 0 ≤ n < N/4,
s_n = η + ϑ₀ ε ϕ₁(χ_n), with χ_n = n/N,             if N/4 ≤ n ≤ N/2,
s_n = η + ϑ₀ ε ϕ₂(χ_n), with χ_n = n/N,             if N/2 ≤ n ≤ 3N/4,
s_n = (η + ϑ₂) + (n - 3N/4) · 4(1 - η - ϑ₂)/N,      if 3N/4 < n ≤ N.
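A minimal sketch of a layer-adapted mesh of this type is given below; it is not from the paper. It constructs only the piecewise-uniform S-mesh variant, in which each of the four sub-intervals receives N/4 uniform mesh intervals; the graded B-S-mesh obtained from the logarithmic mesh-generating functions is not implemented, and the value of ϑ₀ is an illustrative choice.

```python
import numpy as np

def shishkin_mesh(N, eps, eta, vartheta0=2.0):
    """Piecewise-uniform Shishkin-type mesh on [0, 1] with an interior layer at s = eta.

    N must be divisible by 4. The transition parameters follow the definitions above:
    v1 = min(eta/2, vartheta0*eps*ln N) and v2 = min((1 - eta)/2, vartheta0*eps*ln N).
    """
    lnN = np.log(N)
    v1 = min(eta / 2.0, vartheta0 * eps * lnN)
    v2 = min((1.0 - eta) / 2.0, vartheta0 * eps * lnN)
    q = N // 4
    return np.concatenate([
        np.linspace(0.0, eta - v1, q + 1)[:-1],   # coarse part, left of the layer
        np.linspace(eta - v1, eta, q + 1)[:-1],   # fine part, left half of the layer
        np.linspace(eta, eta + v2, q + 1)[:-1],   # fine part, right half of the layer
        np.linspace(eta + v2, 1.0, q + 1),        # coarse part, right of the layer
    ])

s = shishkin_mesh(N=64, eps=1e-3, eta=0.5)
print(len(s), s.min(), s.max())   # 65 mesh points spanning [0, 1]
```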
Here τ > 0 is the temporal lag, and T is taken as T = kτ for some k ∈ N. The total number of mesh intervals in time is split as M = M₁ + M₂. The discretized time domain on Ḡ_t = [-τ, 2τ = T] is

Ḡ_t^{M₁} = {t_m = mΔt, m = 0, 1, ..., M₁, t_{M₁} = 0, Δt = τ/M₁},
Ḡ_t^{M₂} = {t_m = mΔt, m = 0, 1, ..., M₂, t_{M₂} = T, Δt = T/M₂},

where M₁ and M₂ denote the number of mesh intervals in [-τ, 0] and [0, T], respectively. For details, one can refer to [3, 4, 12-15].

To demonstrate the reduction of the error after extrapolation (as mentioned in Sect. 2.4), the error plots both before and after extrapolation are given in Fig. 1. The quantities E^{N,Δt} and R^{N,Δt} are tabulated in Tables 1 and 2 for the S-mesh and the B-S-mesh, respectively, to show the improvement in accuracy. A comparison between the S-mesh and the B-S-mesh is given in Table 3, which confirms the efficacy of the B-S-mesh over the S-mesh. The nature of the solution of Example 2 is shown through surface plots in Fig. 2. A comparison of E^{N,Δt} and R^{N,Δt} in every region, obtained on both the S-mesh and the B-S-mesh for Example 2, is provided in Table 4. Tables and graphs are produced with different values of ϑ₀, Δt, and h. In all cases it is evident that after extrapolation the scheme becomes globally second-order accurate. Although both meshes provide similar approximations outside the layer region, the B-S-mesh, being graded, provides a better approximation than the S-mesh inside the layer region.
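The tabulated rates can be recovered from the tabulated errors. Assuming the standard double-mesh convention, in which the rate is computed from the errors in consecutive columns as R = log₂(E^{N,Δt}/E^{2N,Δt/2}) (the precise definitions appear in a part of the paper not reproduced in this excerpt), the following short script reproduces, up to rounding, the "Before extrapolation", ε = 10^-3 rates of Table 1.

```python
import math

# Errors from the "Before extrapolation", eps = 1e-3 row of Table 1, keyed by N.
E = {64: 6.5584e-3, 128: 2.3679e-3, 256: 9.1479e-4, 512: 3.8426e-4, 1024: 1.7066e-4}
for N in (64, 128, 256, 512):
    print(N, round(math.log2(E[N] / E[2 * N]), 4))
# Reproduces, up to rounding, the rates 1.4697, 1.3721, 1.2514 and 1.1710 of the table.
```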
Table 1  E^{N,Δt} and R^{N,Δt} with M = N/2, ϑ₀ = 1.8 on the S-mesh for Example 1 (columns give the number of mesh intervals in space N; in each row the error E^{N,Δt} is listed with the rate R^{N,Δt} beneath it)

ε        Extrapolation   N = 64       N = 128      N = 256      N = 512      N = 1024
10^-3    Before          6.5584e-3    2.3679e-3    9.1479e-4    3.8426e-4    1.7066e-4
                         1.4697       1.3721       1.2514       1.1710
         After           6.5210e-3    2.2579e-3    7.3247e-4    2.3183e-4    7.1629e-5
                         1.5301       1.6241       1.6597       1.6945
10^-5    Before          6.5868e-3    2.3704e-3    9.2501e-4    3.9424e-4    1.7744e-4
                         1.4745       1.3576       1.2304       1.1517
         After           6.2656e-3    2.1477e-3    7.1507e-4    2.2863e-4    7.1196e-5
                         1.5447       1.5866       1.6450       1.6832

Table 2  E^{N,Δt} and R^{N,Δt} with M = N/2, ϑ₀ = 2.8 on the B-S-mesh for Example 1

ε        Extrapolation   N = 64       N = 128      N = 256      N = 512      N = 1024
10^-5    Before          2.2785e-3    1.0960e-3    6.0047e-4    3.1601e-4    1.6254e-4
                         1.0558       0.8680       0.9261       0.9591
         After           1.6186e-3    3.9070e-4    9.5498e-5    2.3608e-5    6.0164e-6
                         2.0506       2.0325       2.0162       1.9723
10^-7    Before          2.2788e-3    1.0964e-3    6.0081e-4    3.1623e-4    1.6267e-4
                         1.0556       0.8677       0.9259       0.9590
         After           1.6192e-3    3.9056e-4    9.5305e-5    2.3477e-5    5.9071e-6
                         2.0516       2.0349       2.0213       1.9908

Table 3  E^{N,Δt} and R^{N,Δt} after extrapolation with M = N, ϑ₀ = 3.2 for Example 1

ε        Mesh        N = 64       N = 128      N = 256      N = 512      N = 1024
10^-4    S-mesh      2.2584e-2    7.4627e-3    2.4110e-3    7.6495e-4    2.3185e-4
                     1.5975       1.6301       1.6562       1.7222
         B-S-mesh    1.7699e-3    4.6057e-4    1.1764e-4    2.8986e-5    7.2353e-6
                     1.9422       1.9690       2.0210       2.0022
10^-6    S-mesh      2.2393e-2    7.3512e-3    2.3543e-3    7.4519e-4    2.3028e-4
                     1.6070       1.6427       1.6596       1.6942
         B-S-mesh    1.7396e-3    4.4853e-4    1.1401e-4    2.8747e-5    7.2259e-6
                     1.9555       1.9760       1.9877       1.9922

Fig. 2  Surface plots for Example 2: (a) ε = 10^0, (b) ε = 10^-2 (axes: s, t and Z(s, t))

Table 4  E^{N,Δt} and R^{N,Δt} after extrapolation with ε = 10^-6, M = N, ϑ₀ = 3.2 for Example 2 (for each N, the error is listed with the rate beneath it)

           S-mesh                                    B-S-mesh
N          G_r         G_m         G_l              G_r         G_m         G_l
32         6.1111e-3   6.8579e-2   4.0763e-3        6.2847e-3   6.7248e-3   4.0763e-3
           1.9748      1.5954      1.8877           2.0003      1.9546      1.8877
64         1.5547e-3   2.2695e-2   1.1016e-3        1.5709e-3   1.7350e-3   1.1016e-3
           1.9995      1.6181      1.9404           2.0083      1.9748      1.9404
128        3.8882e-4   7.3931e-3   2.8701e-4        3.9048e-4   4.4139e-4   2.8701e-4
           2.0003      1.6491      1.9671           2.0038      1.9877      1.9671
256        9.7187e-5   2.3572e-3   7.3407e-5        9.7364e-5   1.1129e-4   7.3407e-5
           1.9998      1.6636      1.9810           2.0013      1.9930      1.9810
512        2.4300e-5   7.4406e-4   1.8595e-5        2.4319e-5   2.7957e-5   1.8595e-5
4 Conclusion

The present work discusses the application of a hybrid numerical algorithm to singularly perturbed semilinear parabolic problems having interior layers. The semilinearity is handled using Newton's linearization technique. The finite difference scheme comprises an implicit Euler scheme in time and a hybrid scheme in space; its order of accuracy is second order in space but first order in time. To make the scheme globally second-order accurate, the Richardson extrapolation technique is applied in the time direction only. The tables and graphs in the numerical section demonstrate the efficacy of the scheme, and the B-S-mesh is shown to provide better accuracy than the S-mesh.

Acknowledgements Ms. S. Priyadarshana conveys her sincere thanks to DST, Govt. of India for providing the INSPIRE fellowship (IF 180938).
References 1. Chandru M, Prabha T, Das P (2011) A numerical method for solving boundary and interior layers dominated parabolic problems with discontinuous convection coefficient and source terms. Differ Equ Dynam Syst 37(1):247–265. https://doi.org/10.1007/s12591-017-0385-3 2. Farrell PA, Hegarty AF, Miller JJH, O’Riordan E, Shishkin GI (2004) Global maximum norm parameter-uniform numerical method for a singularly perturbed convection-diffusion problem with discontinuous convection coefficient. Math Comput Model 40(11–12):1375–1392. https:// doi.org/10.1016/j.mcm.2005.01.025 3. Govindarao L, Mohapatra J (2019) Numerical analysis and simulation of delay parabolic partial differential equation involving a small parameter. Eng Comput 37(1):289–312 4. Govindarao L, Mohapatra J, Das A (2020) A fourth-order numerical scheme for singularly perturbed delay parabolic problem arising in population dynamics. J Appl Math Comput 63(1):171–195 5. Gupta V, Kadalbajoo MK, Dubey RK (2018) A parameter-uniform higher order finite difference scheme for singularly perturbed time-dependent parabolic problem with two small parameters. Int J Comput Math 96(3):474–499. https://doi.org/10.1080/00207160.2018.1432856 6. Ladyženskaja OA, Solonnikov VA, Ural’ceva NN (1968) Linear and quasi-linear equations of parabolic type. Am Math Soc 23 7. Mukherjee K, Natesan S (2011) Optimal error estimate of upwind scheme on Shishkin-type meshes for singularly perturbed parabolic problems with discontinuous convection coefficients. Bit Numer Math 51:289–315 8. Mukherjee K, Natesan S (2011) ε-Uniform error estimate of hybrid numerical scheme for singularly perturbed parabolic problems with interior layers. Numer Algorithms 58(1):103– 141 9. O’Riordan E, Shishkin GI (2004) Singularly perturbed parabolic problems with non-smooth data. J Comput Appl Math 166:233–245 10. O’Riordan E, Quinn J (2015) A linearised singularly perturbed convection-diffusion problem with an interior layer. Appl Numer Math 98:1–17 11. Pao CV (1992) Nonlinear parabolic and elliptic equations, 1st edn. Springer, New York 12. Priyadarshana S, Mohapatra J, Govindrao L (2021) An efficient uniformly convergent numerical scheme for singularly perturbed semilinear parabolic problems with large delay in time. J Appl Math Comput. https://doi.org/10.1007/s12190-021-01633-7 13. Priyadarshana S, Mohapatra J, Pattanaik SR (2022) Parameter uniform optimal order numerical approximations for time-delayed parabolic convection diffusion problems involving two small parameters. Comput Appl Math. https://doi.org/10.1007/s40314-022-01928-w 14. Sahu SR, Mohapatra J (2021) Numerical investigation of time delay parabolic differential equation involving two small parameters. Eng Comput 38(6):2882–2899 15. Sahu SR, Mohapatra J (2021) Numerical study of time delay singularly perturbed parabolic differential equations involving both small positive and negative space shift. J Appl Anal. https://doi.org/10.1515/jaa-2021-2064 16. Yadav NS, Mukherjee K (2020) Uniformly convergent new hybrid numerical method for singularly perturbed parabolic problems with interior layers. Int J Appl Comput Math 6(53). https:// doi.org/10.1007/s40819-020-00804-7
Temperature Distribution During Hyperthermia Using a 2D Space-Time Fractional Bioheat Model in Irregular Domain

Bhagya Shree Meena and Sushil Kumar
Abstract This study aims to solve the two-dimensional space-time fractional Pennes bioheat model to predict the temperature distribution in irregular domains subjected to electromagnetic heating. Time and space fractional derivatives are considered in Caputo fractional derivative form. A computational method based on radial basis functions and shifted Chebyshev polynomials is proposed to solve the mathematical model. The effect of fractional-order derivatives α and β on the temperature profile is investigated, and it is observed that the highest temperature in tissue increases as α increases and β decreases. The effects of the heat source parameters and the blood perfusion on temperature distribution in the tissue are also assessed. Keywords Chebyshev polynomials · Hyperthermia · Irregular domain · Meshless method · Pennes bioheat model
1 Introduction Temperature distribution in tissue is crucial for applications such as cryosurgery, skin burns, radio frequency and laser thermal ablations, thermal resections, ultrasound, hyperthermia, and hypothermia. In situations where surgical intervention is risky or impractical, hyperthermia tries to elevate the temperature of malignant tissue above a therapeutic value while keeping the surrounding healthy tissue at sublethal temperature values. In biological tissue, heat transfer is generally expressed as a bioheat model, which includes thermal conduction, convection, blood circulation, and the metabolic heat in the tissue. The models of Pennes, Wolf, Klinger, Chen-Holmes, and Nakayama have presented the heat transfer process in skin tissue. However,
Pennes model is widely used to study the heat flow in tissue due to its simplicity and reliability. It is given as

ρc ∂T(X, t)/∂t = k ∇² T(X, t) + W_b c_b (T_b - T(X, t)) + Q_m + Q_e,   (1)
where T, ρ, c, k, cb , Wb , Tb , Q e , and Q m denote temperature, density, specific heat, thermal conductivity, specific heat of blood, blood perfusion rate, arterial blood temperature, external heat source, and metabolic heat generation in skin tissue, respectively. Fractals and fractional calculus have attracted much attention and appreciation due to their ability to improve the model’s accuracy and present an enhanced version of various problems. The primary benefit of fractional-order operators being non-local is that the next state of a system is based not only on its current state but also on its historical states. This is one factor contributing to the popularity and realism of fractional calculus. Fractional calculus can give a brief model for describing the dynamic processes that take place in biological tissue. When characterizing the input–output behavior of biological systems, it is essential to consider that distributed relaxation processes appear in tissue. The fractional calculus model suggests new experiments and measurements that can help explain structure and dynamics in biological systems. These models also unravel the complexity of individual molecules and membranes in a way that helps us to understand the overall behavior of living systems. Fractional diffusion models have attracted the interest of many researchers because of their prevalence in several applications in fluid mechanics, viscoelasticity, biology, physics, engineering, etc. Numerous studies have been performed on time, space, and time-space fractional bioheat models. Damor et al. [6] obtained the fractional bioheat model for cancer therapy with an external heating source and used the finite difference method for an approximate solution to the model. Kumar et al. [16] considered a fractional Pennes bioheat model with three different heating scenarios for studying heat transfer in living tissues. Singh et al. [21] derived a space-time fractional bioheat equation and solved it using the homotopy perturbation method. Cui et al. [5] obtained the analytical solution of a fractional Pennes bioheat model using the separation of variables method. Ferres et al. [10] established the time fractional Pennes bioheat model and solved it numerically using a finite difference scheme. Damor et al. [7] obtained the numerical solution based on the implicit finite difference method for the fractional Pennes bioheat model and studied the effect of anomalous diffusion in skin tissue. Qin and Wu [17] presented the time fractional Pennes bioheat model during the heat transfer process in cancer treatment. Roohi et al. [18] proposed the time-space fractional Pennes bioheat model using the fractional-order Legendre functions and the Galerkin method. Saadwai and Humedi [3] studied the time fractional Pennes bioheat model and employed the fractional shifted Legendre polynomial. Hobiny et al. [14] obtained the analytic solution to investigate the thermal effects of laser irradiation on skin tissue. Saadwai and Humedi [4] employed the space fractional bioheat equation and found the numerical solution using the fractional Legendre
polynomials. Humedi and Abdulhussein [2] discussed the numerical solution based on the nonpolynomial spline and exponential spline methods for the time fractional bioheat model. Abdulhussein and Oda [1] obtained the numerical solution of the timespace fractional bioheat transfer equation using fractional quadratic spline methods. Santra and Mohapatra [19, 20] proposed the numerical solution of the time fractional partial differential equation with the Caputo derivative using the central difference method and quadrature rule. Also, they analyzed the error by the L 1 scheme. Yang and Sun [22] developed a space fractional Pennes bioheat model to measure the skin tissue’s thermal response to the heating and cooling process. All the above-cited studies are in the one-dimensional regular domain. Biological tissues and tumors can have a regular or irregular shape. It motivates us to study a two-dimensional space-time fractional bioheat model in the irregular domain to simulate the biological hyperthermia process irradiating by the electromagnetic heat source. In the present study, we are considering the following space-time fractional Pennes bioheat model: ρc
∂^α T(X, t)/∂t^α = k ∇^β T(X, t) + W_b c_b (T_b - T(X, t)) + Q_m + Q_e,   (2)
where T (X, t) denote temperature at point X = (x, y, z) and time t. Here, we aim to solve Eq. (2) in irregular tissue to simulate the biological hyperthermia process with an external heat source. There are numerous methods available, including the finite difference method, the finite element method, the spectral method, etc. to solve the fractional models. Finite difference methods can be high-order accurate, but a structured grid is required. Spectral methods are even more accurate but have severe geometry domain restrictions. Finite element methods are extremely flexible, but achieving high-order accuracy is difficult, and mesh generation becomes increasingly difficult as the space dimension increases. In present study, we use the mesh-free collocation method using the radial basis functions (RBFs) and shifted Chebyshev polynomials for space and time directions, respectively. Mesh-free techniques benefit from not requiring a mesh. The pairwise distances between points are the only geometric properties used in an RBF approximation. Distances are easy to compute in any space dimension, so working with irregular geometry does not increase the complexity. Furthermore, Chebyshev polynomials in the time direction allow us to solve the system on the entire time domain in one go with a few nodes. After discretizing in time and space direction, this model is converted into a system of linear equations with initial and boundary conditions. The effect of fractional derivatives and heat source parameters on tissue temperature profile has been discussed.
The manuscript is structured as follows. After a brief introduction and literature survey in Sect. 1, some preliminaries, such as shifted Chebyshev polynomials and radial basis functions, are introduced in Sect. 2. Section 3 discusses the two-dimensional space-time fractional bioheat model with initial and boundary conditions. The numerical approach for the model is discussed in Sect. 4. Section 5 investigates the impact of fractional derivatives, blood perfusion, and heat source parameters on the tissue temperature profile. Section 6 contains the conclusion, followed by the cited references.
2 Preliminaries

Definition 1 (Caputo Fractional Derivative) The Caputo fractional partial derivative of order n - 1 < α < n (n ∈ N) with respect to X of a function f(X, t) is defined as

C_a D_X^α f(X, t) = (1/Γ(n - α)) ∫_a^X [∂^n f(s, t)/∂s^n] (X - s)^{n-α-1} ds,   n - 1 < α < n, n ∈ N,
C_a D_X^α f(X, t) = ∂^n f(X, t)/∂X^n,   α = n ∈ N.   (3)
Definition 2 (Chebyshev Polynomials) The Chebyshev polynomials of the first kind T_n(s), s ∈ [-1, 1], n = 0, 1, ..., are given by T_n(s) = cos(n cos⁻¹ s). The change of variables s = 2t/T - 1, s ∈ [-1, 1], is performed in order to use these polynomials on the interval [0, T] for t, and the shifted Chebyshev polynomials are defined as T_n*(t) = T_n(2t/T - 1). The analytic form of the shifted Chebyshev polynomial of degree n, ψ_n(t), is given as

ψ_n(t) = Σ_{i=0}^{n} (-1)^{n-i} 2^{2i} [n (n + i - 1)! / ((2i)! (n - i)! T^i)] t^i.   (4)

Using the Caputo derivative (3), the fractional derivative of the shifted Chebyshev polynomial, C_a D_t^α ψ_n(t), is given by

C_a D_t^α ψ_n(t) = Σ_{k=⌈α⌉}^{n} b_{k,n,α} t^{k-α},   n ≥ ⌈α⌉,   (5)

where

b_{k,n,α} = (-1)^{n-k} 2^{2k} [Γ(k + 1) n (n + k - 1)! / ((2k)! (n - k)! Γ(k + 1 - α) T^k)].
(k + 1) n(n + k − 1)! . (2k)!(n − k)! (k + 1 − α)T k
Definition 3 (Radial Basis Function) A kernel K is symmetric, if K (X, X c ) = K (X c , X ) holds for all X, X c ∈ . A radial function is a function that is radially
Temperature Distribution During Hyperthermia …
239
symmetric around some point X c , called the function’s center. For a kernel K : Rs × Rs → R with scattered nodes X = (X 1 , X 2 , . . . , X s ) and X c = ((X c )1 , (X c )2 , . . . , (X c )s ), K is a radial function if it can be defined as K (X, X c ) = φ(r ),
∀X, X c ∈ Rs ,
where r = ||X − X c ||2 is the Euclidean distance between the points X and X c .
3 Mathematical Model 3.1 Domain The irregular tissue domain considered for this study is shown in Fig. 1 and defined as = X = (x, y) ∈ R2 , where x = r (θ ) cos θ + 0.015, y = r (θ ) sin θ + 0.015, 7 r (θ ) = 108 (7 − cos 5θ ), θ ∈ [0, 2π ].
3.2 Governing Equation To study temperature distribution in the living tissue, we consider two-dimensional space-time fractional bioheat model ρc
∂αT =k ∂t α
∂β T ∂β T + β ∂x ∂ yβ
+ Wb cb (Tb − T (x, y, t)) + Q m + Q e ,
(6)
where 0 < α ≤ 1 and 1 < β ≤ 2 are time and space fractional derivatives, respectively. The metabolic heat generation, Q m , is a temperature-dependent function defined as [12] Q m = Q mo [1 + 0.1 (T − T0 )] , (7)
Fig. 1 The irregular domain with uniform points
y (meter)
0.03
0.02
0.01
0
0
0.01
0.02
x (meter)
0.03
240
B. S. Meena and S. Kumar
where Q mo represents the metabolic heat source term. The term Q e represents the heat source due to electromagnetic energy and defined as [8] Q e = ηBe−ηr ,
(8)
where B is energy flux density (W/m 2 ) and η is attenuation coefficient of tissue (1/m).
3.3 Initial and Boundary Conditions The model’s initial and boundary conditions are as follows: T (X, 0) = T0 ,
X ∈ .
T (X, t) = Tw , (X, t) ∈ ∂ × (0, tl ],
(9) (10)
where T0 and Tw being the initial and body core temperatures, respectively.
3.4 Non-dimensionlization Now, using the following dimensionless variables
1/α X k (T − T0 ) (Ta − T0 ) P = , F0 = t, = , a = , β L ρcL T0 T0 (Tw − T0 ) Q m0 β Wb C b β ηB β L , S 2f = L , Sr = L .
w = , Sm = T0 T0 k k T0 k Equation (6) can be written in the following form: ∂ α (P, F0 ) ∂ β (P, F0 ) = + C (P, F0 ) + S(P), α ∂ F0 ∂ Pβ
(11)
where C = (Sm d − S 2f ) and S(P) = (Sm + S 2f a + Sr eη P ). The initial and boundary conditions are converted to
(P, 0) = 0,
P ∈ .
(P, F0 ) = w , (P, F0 ) ∈ ∂ × (0, F0l ].
(12) (13)
Temperature Distribution During Hyperthermia …
241
4 Numerical Scheme The computational domain = [0, 1] × [0, F0l ] is divided into (w1 × w2 ) nodes. Let us consider Pin = {P1 , P2 , . . . , Pw1in } internal nodes, Pbd = {Pw1in+1 , Pw1in+2 , . . . , Pw1 } boundary nodes in domain and F0 = {0 = F01 , F02 , ..., F0w2 = F0l } time nodes. Further P = Pin ∪ Pbd and w1bd = w1 − w1in are total nodes and boundary nodes, respectively, in space direction. To find the solution of bioheat model Eqs. (11), (12), and (13), (P, F0 ) is approximated by using radial basis function and shifted Chebyshev polynomial in space and time direction, respectively, as
(P, F0 ) =
w1 w2
ψ j (F0 )λ ji φi (P) ≈ (F0 )λ(P).
(14)
i=1 j=1
Here, (F0 ) = [ψ1 (F0 ), ψ2 (F0 ), . . . , ψw2 (F0 )], and (P) = [φ1 (P), φ2 (P), . . . , 2 φw1 (P)] , where φi (P) = φi (r ) = eri , i = 1, 2, . . . , w1 are Gaussian radial basis function with shape parameter, and λ = [λ ji ]w1 ,w2 are unknown coefficients. Here, w1 and w2 are uniform nodes in space and time domains, respectively. Now putting Eq. (14) into Eq. (11), we get the following residual: β
R(P, F0 , λ) = C0 DαF0 (λ) − C0 D P (λ) − C (λ) − S.
(15)
Collocating the residual Eq. (15) at internal nodes in domain × (0, F0l ], we have (I1 λJ1 ) − (I2 λJ2 ) − C (I2 λJ1 ) = E 1 ,
(16)
where ⎡ I1 = ⎣
j
⎤ k−α ⎦ bk, j,α F0 i+1
k=α
,
I2 =
J2 =
,
k bk, j F0 i+1 w2 −2,w2
k=0
w2 −2,w2
J1 = φi (P j ) w1 ,w1in ,
j
C β 0 D P φi (P j )
w1 ,w1in
,
E 1 = S(Pi , F0 j ) (w2 −2)w1in ,1 . The initial condition Eq. (12) and boundary condition Eq. (13) can be written as
where matrices are defined as
I3 λJ3 = E 2 ,
(17)
I4 λJ4 = E 3 ,
(18)
242
B. S. Meena and S. Kumar
I3 = [ψ1 (0), ψ2 (0), . . . ψw2 (0)], I4 =
j
J3 = φi (X j ) w1 ,w1 ,
E 2 = [0, 0, . . . 0] w1 ,1 ,
,
k bk, j F0 i+1
J4 = [φi (P j )]w1 ,w1 bd , ∀P j ∈ Pbd ,
w2 −2,w2
k=0
E 3 = w × [1, 1, . . .] (w2 −2)w1bd ,1 . Now, using the properties of Kronecker product, Eqs. (16), (17), and (18) can be written as [(J1 ⊗ I1 ) − (J2 ⊗ I2 ) − C (J1 ⊗ I2 )]λ = E1 , J3 ⊗ I3 λ = E2 , J4 ⊗ I4 λ = E3 ,
(19) (20) (21)
where λ and E1 , E2 , E3 are obtained by stacking the columns of λ, E 1 , E 2 , and E 3 , respectively, on top of each other. Equations (19), (20), and (21) become a linear system of w1 w2 equations in w1 w2 unknowns λ ji , i = 1, 2, . . . , w1 , j = 1, 2, . . . , w2 . After putting the values of λ into Eq. (14), the approximate solution w1 w2 (P, F0 ) is obtained at the uniform nodes.
5 Results and Discussions In present study, we consider the following parameter values: L = 0.03 m, ρ = 1050 kg/m3 , c = 4180 J/kg ◦ C, cb = 3344 J/kg ◦ C, k = 0.5 W/m ◦ C, Wb = 8 kg/m3 s, Q m 0 = 1091 W/m3 , B = 4000 W/m2 , η = 200 1/m, T0 = 37 ◦ C = Ta = Tw . When we use the radial basis function, the shape parameter is crucial for determining the accuracy of PDE solutions. Various expressions for determining values of shape parameters are available in literature [9, 11, 13, 15]. In this study, we use √ , N being number of nodes [11]. Here, we are the shape parameter value = 1.25 N using 13 nodes in space domain, and hence = 0.3467 to simulate the model.
5.1 Verification For the verification, we compared the numerical results with the exact solution of the reaction-sub-diffusion equation which is given as C α 0 D F0 ( (P,
β
F0 )) = C0 D P ( (P, F0 )) − ( (P, F0 )) + S,
(22)
Temperature Distribution During Hyperthermia …
243 10
2
0.012
-4
Numerical results Exact solution
0.01
1.5
Error
(P,F0)
0.008 0.006
1
0.004 0.5 0.002 0
0 0
0.2
0.4
0.6
0.8
1
P
(a)
0
0.2
0.4
0.6
0.8
1
P
(b)
Fig. 2 a Temperature versus distance for present and exact solution b Absolute error
2 with (P, 0) = 0, (0, F0 ) = 0, (1, F0 ) = F0 2 , and S(P, F0 ) = (3−α) F0 (2−α) 6 2 2 3 (3−β) 3 P − (4−β) F0 P + F0 P . The exact solution of Eq. (22) is (P, F0 ) = F0 2 P 3 . The exact and approximate solution obtained by proposed method of Eq. (22) at F0 = 1 with α = 0.8 and β = 1.4 are plotted in Fig. 2a. The absolute error in the approximate solution is also plotted in Fig. 2b. The overlapping of both the solutions in Fig. 2a and the absolute error in Fig. 2b are sufficient to validate the efficiency and accuracy of the proposed algorithm and computer code.
5.2 Effect of Fractional Orders The influence of fractional orders α and β on heat transmission in the biological tissue during hyperthermia treatment is examined in this section. For α = 0.8, 0.9, 1.0, and β = 1.8, 1.9, 2.0, the temperature profiles in domain are plotted in Fig. 3 at time t = 180s. The highest tissue temperatures for various values of fractional orders α and β are also discovered and listed in Table 1 for domain . For a fixed β value, it is observed that the tissue’s maximum temperature increases as α increases. Furthermore, for a fixed α value, the highest temperature in the tissue decreases as β increases. These phenomena can also be observed in Table 1. A sub-diffusion effect of temporal fractional order α < 1 is found in the domain here. However, as α decreases, the heat spreads more slowly across the domain, and the maximum temperature decreases as well. For spatial fractional β < 2, a superdiffusive effect is observed. However, the heat spreads rapidly, and hence temperature increases as the value of β decreases.
244
B. S. Meena and S. Kumar
(a) α = 1, β = 2
(b) α = 0.9, β = 2
(c) α = 0.8, β = 2
(d) α = 1, β = 1.9
(e) α = 0.9, β = 1.9
(f) α = 0.8, β = 1.9
(g) α = 1, β = 1.8
(h) α = 0.9, β = 1.8
(i) α = 0.8, β = 1.8
Fig. 3 Temperature profile in domain at t = 180s for different values of α and β Table 1 The highest temperature in domain (α, β) α=1 β = 2.0 β = 1.9 β = 1.8
49.9509 ◦ C 53.9472 ◦ C 55.0577 ◦ C
α = 0.9
α = 0.8
47.4379 ◦ C 49.7325 ◦ C 50.3289 ◦ C
44.8014 ◦ C 45.9979 ◦ C 46.2934 ◦ C
5.3 Effect of Heat Source Parameter This section discusses the impact of heat source parameters such as energy flux density and tissue attenuation coefficient on temperature distribution in tissue. The temperature profiles in domain for α = 0.8, β = 1.4, at t = 180 s are plotted in Fig. 4 for η = 200, 400. The highest temperature in tissue is 46.7217 ◦ C and 56.4302 ◦ C for η = 200, 400, respectively. It can be observed from Fig. 4 that the maximum tissue temperature in the domain increases with an increase in η.
Temperature Distribution During Hyperthermia …
(a) η = 200
245
(b) η = 400
Fig. 4 Temperature profile at t = 180s for different values of η in domain
(a) B = 3000
(b) B = 4500
Fig. 5 Temperature profile at t = 180s for different values of B in domain
The temperature profiles in domain at α = 0.8, β = 1.4, t = 180s are plotted in Fig. 5 for B = 3000 and 4500. The highest tissue temperature is 44.2946 ◦ C and 47.9353 ◦ C, for B = 3000 and B = 4500, respectively. It is observed that tissue temperature increases in the domain as B values increase. This influence of B on temperature profile is natural as the external heating source is directly proportional to B.
5.4 Effect of Blood Perfusion To show the effect of blood perfusion rate (Wb ) on heat transfer, contour plots for different values of Wb = 5, 6, 7, 8 are plotted in Fig. 6 in domain for α = 0.8, β = 1.4, t = 180s. In the domain , the maximum temperature measured is 47.5873 ◦ C, 47.2872 ◦ C, 46.9978 ◦ C, and 46.7217 ◦ C for Wb = 5, 6, 7, and 8, respectively. A lower Wb value is associated with a higher tissue temperature. Blood perfusion behaves as a heat sink during the heating procedure. Therefore, a greater amount of heat is absorbed with a higher value of Wb , resulting in a decrease in tissue temperature.
Fig. 6 Temperature profile at t = 180 s for different values of Wb in the domain: (a) Wb = 5, (b) Wb = 6, (c) Wb = 7, (d) Wb = 8
6 Conclusion

This study considered a two-dimensional space-time fractional bioheat model for predicting the temperature in an irregular tissue domain heated by an external electromagnetic heat source during hyperthermia therapy. The maximum temperature in the domain increases with an increase in the fractional order α and with a decrease in β; the orders α < 1 and β < 2 correspond to sub-diffusive and super-diffusive processes, respectively. The tissue temperature in the domain rises with an increase in the heat source parameters η and B, the highest tissue temperature being proportional to the amount of heat energy deposited. Blood perfusion acts as a heat sink during heating: the higher the Wb value, the more heat is absorbed, which lowers the tissue temperature. Overall, the model parameters α, β, η, B, and Wb significantly affect the temperature profile, so they should be chosen as carefully as possible to obtain the best possible outcome of the thermal therapy.
Acknowledgements The first author acknowledges the financial support received from University Grant Commission as junior research fellowship (JRF) (Ref: 1069(CSIR-UGC NET Dec 2018)) during the preparation of this manuscript.
Characterization of Minimum Structure and Sub-structure Cut of Exchanged Hypercube

Paul Immanuel and A. Berin Greeni
Abstract Connectivity is the least number of faulty vertices needed to disconnect a graph. Structure and substructure connectivity are two novel generalizations of connectivity with practical applicability. Let G be a connected graph and H a connected subgraph of G. An H-structure cut (resp. H-substructure cut) is a set F of subgraphs of G, each isomorphic to H (resp. to a connected subgraph of H), whose removal leaves G disconnected; the minimum cardinality of such a set, when it exists, is denoted by κ(G; H) [resp. κs(G; H)]. Our work characterizes the minimum cardinality of such a set F for the exchanged hypercube E Hs,t when the cut subgraphs are the star graphs K 1,m, for m = 1, 2, and the cycle C4. Keywords Exchanged hypercube · Structure connectivity · Sub-structure connectivity
1 Introduction

Connectivity is one of the paramount parameters of an interconnection network: it reveals how reliably and robustly the network can keep working. However, faulty vertices do not occur in the specific pattern that would most efficiently disconnect an interconnection network, so plain connectivity only measures a worst-case situation. To resolve this issue, Harary introduced the concept of conditional connectivity [5], which places constraints on the components that remain when a graph is disconnected. Continuing this line, C.-K. Lin et al. introduced structure and sub-structure connectivity [9], which constrain how the deleted subgraphs and sub-subgraphs, respectively, are chosen in order to disconnect a graph. For an undirected graph G with a graph-theoretic property ρ, Harary defined the conditional connectivity κ(G; ρ) as the minimal cardinality of a vertex set, if one exists,
whose deletion disconnects G with every resulting component possessing property ρ [5]. Esfahanian et al. later proposed restricted connectivity, which assesses the fault tolerance of interconnection networks more precisely [1]. Other connectivity parameters include component connectivity [6], Rg connectivity [4], and conditional edge connectivity [2]. Since the literature on structure and sub-structure connectivity is still thin, our work attempts to extend it for the exchanged hypercube E Hs,t. Notable results on structure and sub-structure connectivity exist for the hypercube [9], hypercube-like networks [10], the balanced hypercube [8], alternating group graphs [13], the crossed cube [18], and the twisted hypercube [11]. The exchanged hypercube E Hs,t is a variant of the hypercube constructed from relatively smaller-dimensional hypercubes. It has been explored in many respects: domination [7], various aspects of diagnosability and reliability [3], cycle embedding [16], and several kinds of connectivity [12, 15, 17, 19]. The exchanged hypercube is obtained by methodically removing edges from an n-cube in order to decrease interconnection complexity while preserving a number of pivotal n-cube characteristics, such as the ability to embed linear arrays and rings with respectable efficiency. It also has a number of other advantageous characteristics, including small diameter, low degree, fault tolerance, robust connectivity, recursive construction, partition capability, low latency, and the same average distance as the hypercube, leading to high levels of performance and effectiveness [14]. The number of vertices in E Hs,t equals that of the (s + t + 1)-dimensional hypercube Q s+t+1; nonetheless, the proportion of edges in E Hs,t to that of Q s+t+1 is 1/2 + 1/(2(s + t + 1)). E Hs,t is bipartite, and E Hs,t (1 ≤ s ≤ t) is Hamiltonian [14]. In this paper the H-structure (resp. H-substructure) connectivity of E Hs,t for s ≥ 2, t ≥ 3 is considered. The idea of structure and sub-structure connectivity is to find the minimum number of subgraphs, each isomorphic to a prescribed graph (or to one of its connected subgraphs), whose removal disconnects a graph G; this helps to quantify robustness, reliability, and fault tolerance. Classical connectivity is somewhat unrealistic in practical applications, since all neighbors of a vertex would have to be deleted or faulty for G to become disconnected, a scenario that rarely occurs; the notions of structure and substructure cut were introduced to overcome this. Many related connectivity parameters are useful for determining fault tolerance and diagnosability under given constraints on the disconnected components of G, but they concentrate on the impact of individual defective vertices. A faulty vertex, however, may affect the performance of its neighbors by making them more vulnerable to becoming faulty, which motivates studying not only individual faulty vertices but also particular collections of linked vertices. In this paper, we characterize the minimum structure and substructure cuts of E Hs,t with the elements of the cut set F isomorphic to K 1,1, K 1,2, and the cycle C4. The key objectives of our work for E Hs,t, s ≥ 2, t ≥ 3, are summarized in Table 1.
Table 1 Structure and sub-structure cut of E Hs,t

H               | K1    | K 1,1 | K 1,2                                         | C4
κ(E Hs,t ; H)   | s + 1 | s + 1 | (s + 2)/2 if s is even; ⌈(s + 2)/2⌉ if s is odd | 2s
κs(E Hs,t ; H)  | s + 1 | s + 1 | (s + 2)/2 if s is even; ⌈(s + 2)/2⌉ if s is odd | 2s
The rest of the article is structured as follows. Sect. 2 provides essential definitions and graph notations. Sect. 3 is dedicated to finding both the structure and sub-structure connectivity of E Hs,t. Sect. 4 concludes our work with future directions and also contains a few generalized corollaries.
2 Preliminaries

Let G be a simple undirected graph given by the pair (V(G), E(G)), where V(G) is the vertex set and E(G) is the edge set. If (x, y) is an edge of G, the two distinct vertices x and y are said to be adjacent. The set NG(x) of vertices adjacent to a vertex x is called the neighborhood of x. Likewise, for a vertex set Y, NG(Y) denotes the set of vertices adjacent to at least one vertex of Y, excluding the vertices of Y itself. A star graph K 1,m is a bipartite graph whose one part is a single vertex and whose other part consists of m vertices connected only to that single vertex. A cycle Cn is a closed sequence of vertices and edges of length n containing n distinct vertices.

Definition 1 ([14]) The vertex set V of the exchanged hypercube E Hs,t is the set {u_{s+t} u_{s+t−1} . . . u_0 | u_i ∈ {0, 1} for 0 ≤ i ≤ s + t}. Let u = u_{s+t} u_{s+t−1} . . . u_0 and v = v_{s+t} v_{s+t−1} . . . v_0 be two vertices of E Hs,t. There is an edge (u, v) in E Hs,t if and only if (u, v) belongs to one of the following sets, where H(u, v) denotes the Hamming distance between u and v: E1 = {(u, v) | u_0 ≠ v_0 and u_i = v_i for 1 ≤ i ≤ s + t}, E2 = {(u, v) | u_0 = v_0 = 1, H(u, v) = 1 with u_i ≠ v_i for some 1 ≤ i ≤ t}, and E3 = {(u, v) | u_0 = v_0 = 0, H(u, v) = 1 with u_i ≠ v_i for some t + 1 ≤ i ≤ s + t}; see Fig. 1.

Definition 2 ([9]) Let G be an undirected simple graph and H a subgraph of G. A set F of subgraphs of G is said to be a subgraph cut if G − V(F) is disconnected or has a trivial graph as one of its components. A subgraph cut F is an H-structure (resp. H-substructure) cut if every subgraph in F is isomorphic to H (resp. to a connected subgraph of H). The minimum cardinality of an H-structure (resp. H-substructure) cut of G is the H-structure (resp. H-substructure) connectivity of G.

Theorem 1 ([15]) κ(E Hs,t) = λ(E Hs,t) = s + 1 where 1 ≤ s ≤ t.
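To make Definition 1 concrete, the following small Python sketch builds E Hs,t directly from the three edge sets E1, E2, and E3 and checks, for a small case, the vertex count and the edge proportion 1/2 + 1/(2(s + t + 1)) mentioned in the introduction. It is only an illustrative sketch under our own naming; it is not part of the paper.

```python
from itertools import product

def exchanged_hypercube(s, t):
    """Build EH_{s,t} from Definition 1. A vertex is the bit string
    u_{s+t} ... u_1 u_0, stored so that the rightmost character is u_0."""
    vertices = [''.join(bits) for bits in product('01', repeat=s + t + 1)]

    def bit(u, i):                      # bit u_i of vertex u
        return u[-(i + 1)]

    edges = set()
    for u in vertices:
        for v in vertices:
            if u >= v:                  # consider each unordered pair once
                continue
            diff = [i for i in range(s + t + 1) if bit(u, i) != bit(v, i)]
            if diff == [0]:             # E1: differ only in bit 0
                edges.add((u, v))
            elif len(diff) == 1 and bit(u, 0) == bit(v, 0) == '1' and 1 <= diff[0] <= t:
                edges.add((u, v))       # E2: bit 0 equals 1, Hamming distance 1
            elif len(diff) == 1 and bit(u, 0) == bit(v, 0) == '0' and t + 1 <= diff[0] <= s + t:
                edges.add((u, v))       # E3: bit 0 equals 0, Hamming distance 1
    return vertices, edges

# sanity check for EH_{1,2}: 2^(s+t+1) vertices, and the edge proportion
# relative to Q_{s+t+1} equals 1/2 + 1/(2(s+t+1)), as stated in Sect. 1
s, t = 1, 2
V, E = exchanged_hypercube(s, t)
n = s + t + 1
assert len(V) == 2 ** n
assert abs(len(E) / (n * 2 ** (n - 1)) - (0.5 + 1 / (2 * n))) < 1e-12
```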
Fig. 1 Exchanged hypercube E H1,2
Theorem 2 ([19]) κ4(E Hs,t) = min{s, t} for min{s, t} ≥ 3.

Theorem 3 ([17]) κ′(E Hs,t) = λ′(E Hs,t) = 2s where s ≤ t.

Notations needed:
F1^{s,t}: a subset of {{x} | x ∈ V(E Hs,t)}
F2^{s,t}: a subset of {{x1, x2} | (x1, x2) ∈ E(E Hs,t)}
F3^{s,t}: a subset of {{x1, x2, x3} | (xi, x3) ∈ E(E Hs,t) for every i = 1, 2}
F4^{s,t}: a subset of {{x1, x2, x3, x4} | (xi, x4) ∈ E(E Hs,t) for every i = 1, 2, 3}
Z4: a subset of {{x1, x2, x3, x4} | {(x1, x2), (x2, x3), (x3, x4), (x4, x1)} ⊆ E(E Hs,t)}
Q_s^i: an s-dimensional hypercube, where 1 ≤ i ≤ 2^t
Q_t^j: a t-dimensional hypercube, where 1 ≤ j ≤ 2^s
E1(Q_s^i ∪ Q_t^j): the subset of {{x, y} | x ∈ V(Q_s^i) and y ∈ V(Q_t^j), or vice versa}
E2(Q_s^i): the subset of {{x, y} | x, y ∈ V(Q_s^i)}
E3(Q_t^j): the subset of {{x, y} | x, y ∈ V(Q_t^j)}
E1 ∪ E2: the subset of {{u, N(u)} | u ∈ Q_s, N(u) ∈ (Q_s ∪ Q_t)}
E1 ∪ E3: the subset of {{u, N(u)} | u ∈ Q_t, N(u) ∈ (Q_s ∪ Q_t)}

We set u = 000 . . . 00 and v = (u)0 throughout the paper. Our aim is to disconnect E Hs,t into components having {u} as one of the disconnected components. For brevity, we denote E1(Q_s^i ∪ Q_t^j), E2(Q_s^i), E3(Q_t^j), F1^{s,t}, F2^{s,t}, F3^{s,t}, and F4^{s,t} as E1, E2, E3, F1, F2, F3, and F4, respectively, unless otherwise specified.
3 Structure and Sub-structure Connectivity of EHs,t

This section determines the structure and sub-structure connectivity of the exchanged hypercube E Hs,t for the subgraph H isomorphic to K 1,m with 1 ≤ m ≤ 2 and to the cycle C4.
Fig. 2 K 1,1 -structure cut for E Hs,t
3.1 κ(E Hs,t ; K 1,1) and κs(E Hs,t ; K 1,1)

Lemma 1 For s ≥ 2, t ≥ 3, κ(E Hs,t ; K 1,1) ≤ s + 1 and κs(E Hs,t ; K 1,1) ≤ s + 1.

Proof Let u = 000 . . . 00 and v = (u)0. We set F = {{(u)i, (v)i} | s + t − (s − 1) ≤ i ≤ s + t} ∪ {{v, (v)1}}. Since each {(u)i, (v)i} and {v, (v)1} is an edge of E Hs,t, evidently |F| = s + 1. As N(u) = {v} ∪ {(u)i | s + t − (s − 1) ≤ i ≤ s + t} and N(v) = {u} ∪ {(v)i | 1 ≤ i ≤ t}, the graph E Hs,t − V(F) is disconnected and {u} clearly belongs to one of its components. Every element of F is isomorphic to K 1,1, so κ(E Hs,t ; K 1,1) ≤ s + 1 and κs(E Hs,t ; K 1,1) ≤ s + 1.

Lemma 2 If |F1^{2,3}| + |F2^{2,3}| ≤ 1, then E H2,3 − (F1^{2,3} ∪ F2^{2,3}) is connected.

Proof From Theorem 1, E H2,3 − (F1^{2,3} ∪ F2^{2,3}) is obviously connected when |F2^{2,3}| = 0. Therefore, assume |F2^{2,3}| = 1 and consequently |F1^{2,3}| = 0. We set u = 000000 and v = (u)0, and let {u, v} ∈ F2^{2,3}. Since {u, v} ∈ E(E H2,3), the graph E H2,3 − {u, v} is clearly connected by the existence of a Hamiltonian path [14]. Thence, E H2,3 − {u, v} is connected.

Lemma 3 For s ≥ 2, t ≥ 3, κs(E Hs,t ; K 1,1) ≥ s + 1.
Proof The proof is complete if we show that E Hs,t − (F1 ∪ F2) is connected, and that E Hs,t − V(F1 ∪ F2) is connected, whenever |F1| + |F2| ≤ s. For any s, t ≥ 1, κ(E Hs,t) = s + 1 [15]. By Lemma 2, E H2,3 − (F1^{2,3} ∪ F2^{2,3}) is connected, and the same holds for E Hs−1,t as well as for E Hs,t−1. We prove the claim assuming |F2| ≤ s. We set F2 = {{v, (v)1}} ∪ {{N(u) − v, (v)i} | s + t − (s − 1) ≤ i ≤ s + t}. Referring to Fig. 2, we encounter the following cases.

Case 1. If |F2| = s. By the definition of E Hs,t, every {u^{s+t}, u^{s+t+1}} ∉ E(E Hs,t). This implies that at least |E1| = s vertices must be deleted for E Hs,t to be disconnected. Also, from the set F, there exists {v, (v)1} ∈ E3. So |(E2 ∪ E3)| ≥ s.

Case 2. If |F2| ≤ s − 1. By the definition of E Hs,t, {{v, u^{s+t}}, {u^{s+t}, u^{s+t+1}}, {u^{s+t+1}, v}} ∉ E(E Hs,t), while {v, u^{s+t}, u^{s+t+1}} ⊆ N(u). Clearly, from our assumption, |V(F2)| ≤ s − 1, |V(E Hs,t) − V(E2)| ≥ 2^{s+t} − 2(s − 1), and |V(E Hs,t) − V(E3)| ≥ 2^{s+t} − 2(s − 1). Since 2(2^{s+t} − 2(s − 1)) = 2^{s+t+1} − 4(s − 1) > 2^{s+t} for s ≥ 2, t ≥ 3, this suffices to show that at least s + 1 vertices must be deleted, and that E Hs,t remains connected when |F2| ≤ s − 1.
From the results of Lemmas 1 and 3, the following result is obtained.

Theorem 4 For s ≥ 2, t ≥ 3, κs(E Hs,t ; K 1,1) = s + 1.

As κ(E Hs,t ; K 1,1) ≥ κs(E Hs,t ; K 1,1), we have κ(E Hs,t ; K 1,1) ≥ s + 1. From the results of Lemma 1 and Theorem 4, the following result is obtained.

Theorem 5 For s ≥ 2, t ≥ 3, κ(E Hs,t ; K 1,1) = s + 1.
3.2 κ(E Hs,t ; K 1,2) and κs(E Hs,t ; K 1,2)

Lemma 4 For s ≥ 2, t ≥ 3, κs(E Hs,t ; K 1,2) ≤ (s + 2)/2 if s is even, and κs(E Hs,t ; K 1,2) ≤ ⌈(s + 2)/2⌉ if s is odd.
Proof We set u = 00 . . . 0 and F = {{v, (v)1, (v)2}, {(u)^{s+t−1}, (u)^{s+t+1}, ((u)^{s+t−1})^{s+t+1}}}. Here {v, (v)1} and {v, (v)2} are edges of E Hs,t, and obviously {(u)^{s+t−1}, ((u)^{s+t−1})^{s+t+1}} and {(u)^{s+t+1}, ((u)^{s+t−1})^{s+t+1}} are edges of E Hs,t as well. Referring to Fig. 3, the proof divides into the following cases.

Case 1. s even. From the set F, clearly |F| = (s + 2)/2, because every edge specified in F is an edge of E(E Hs,t). Also N(u) = {v} ∪ {(u)i | s + t − (s − 1) ≤ i ≤ s + t}. Clearly N(u) ⊆ V(F) and |V(F)| ≤ 3((s + 2)/2) < 2^{s+t+1} − 1. Thence, E Hs,t − (F1 ∪ F2 ∪ F3) is disconnected with {u} being one of the components, and every element of F is isomorphic to K 1,2.
Fig. 3 K 1,2 -structure cut for E Hs,t
Case 2. s odd. From the set F, clearly |F| = ⌈(s + 2)/2⌉, because every edge specified in F is an edge of E(E Hs,t). Also N(u) = {v} ∪ {(u)i | s + t − (s − 1) ≤ i ≤ s + t}. Clearly N(u) ⊆ V(F) and |V(F)| ≤ 3⌈(s + 2)/2⌉ < 2^{s+t+1} − 1. Thence, E Hs,t − (F1 ∪ F2 ∪ F3) is disconnected with {u} being one of the components, and every element of F is isomorphic to K 1,2. Therefore, the statement holds.

Lemma 5 If |F1| + |F2| + |F3| ≤ 1, then E H2,3 − (F1 ∪ F2 ∪ F3) is connected.

Proof From Theorem 5, E H2,3 − (F1 ∪ F2 ∪ F3) is connected if |F3| = 0. So we assume |F1| = 0, |F2| = 0 and |F3| = 1. Then we set F3 = {{a, b, c} | {(a, b), (b, c)} ⊆ E(Q_t^j)}. Let us assume x = 000001, y = 000011 and z = 000101. However, E H2,3 − {x, y, z} contains a Hamiltonian path [14]. Thence, E H2,3 − (F1 ∪ F2 ∪ F3) is connected.

Lemma 6 For s ≥ 2, t ≥ 3, κs(E Hs,t ; K 1,2) ≥ (s + 2)/2 if s is even, and κs(E Hs,t ; K 1,2) ≥ ⌈(s + 2)/2⌉ if s is odd.
Proof The lemma is proved using induction on (s, t). If |F1| + |F2| + |F3| ≤ (s + 2)/2 − 1, then by the above lemma E Hs,t − (F1 ∪ F2 ∪ F3) is connected. We assume the lemma holds for any E Hs−1,t or E Hs,t−1. By Theorem 4, the previous lemma holds for |F3| = 0. So let us assume |F3| ≥ 1.

Case 1. When s is even. We set v = (u)0 and F3 = {v, (v)1, (v)2}, where {v, (v)1} and {(v)1, (v)2} are edges of E Hs,t. Now N(u) = {v} ∪ {(u)i | s + t − (s − 1) ≤ i ≤ s + t} and {v, (v)1, (v)2} ⊆
V (E Hs,t ). Without loss of generality, we assume |E 1 | = 0. Let S0 be the subset of {V (E 2 ) ∪ V (E 3 )} and we set {v, (v)1 , (v)2 } ∈ S0 . By induction on hypothesis E Hs,t − S0 is connected. 3 Case 1.1. E Hs,t − S0 is connected. As i=1 |V (Fi )| ≤ 3.( s+2 − 1) < 2s+t for 2 s ≥ 2, t ≥ 3 then there exists a vertex {u} in E Hs,t − S0 with {u s+t } in V (E 1 ∩ E 2 ). 3 Fi is connected. So E Hs,t − ∪i=1 Case 1.2. E Hs,t − S0 is disconnected. If at least one component of E Hs,t − S0 3 connects to at least one component of E Hs,t − S0 , then E Hs,t − ∪i=1 Fi is connected. If disconnected there should exist a smallest component C. Case 1.2.1. If |V (C)| ≥ 2. By super connectivity of E Hs,t [17], we need at least 2s vertices to be deleted for at least one of the components to be not isolated. For contradicting our assumption |V (C)| ≥ 2. s ≥ 2, 2s > s+2 s 3 |Fi | Case 1.2.2. If |V (C)| = 1. From Theorem 4 and 5, we have E Hs,t − ∪i=1 is disconnected for at least |Fi | = s + 1, an upperbound for K 1,2 . Let V (C) = x, then N (x) ⊆ V (S0 ), |N (x)| = s and S ⊆ V (E 1 ) ∩ V (F1 ∪ F2 ∪ F3 ). Obviously, 3 |N (x)+S0 | ≥ s+t+1 − 1 > s+2 + 1. This is at contradiction as we i=1 |Fi | ≥ 2 2 2 3 s+2 assumed i=1 |Fi | ≤ 2 − 1. Case 2. The above proof follows when s is odd for s+2 . 2 From the results of Lemmas 4 and 6, the following result is obtained. s+2 ; s = even s 2 Theorem 6 For s ≥ 2, t ≥ 3, κ (E Hs,t ; K 1,2 ) = s+2 2 ; s = odd. From the results of Lemma 4 and Theorem 6, the following result is obtained. s+2 ; s = even 2 Theorem 7 For s ≥ 2, t ≥ 3, κ(E Hs,t ; K 1,2 ) = s+2 2 ; s = odd.
3.3 κ(E Hs,t ; C4)

We present the upper bound for κ(E Hs,t ; C4) and prove that the bound is tight.

Lemma 7 κ(E Hs,t ; C4) ≤ 2s.

Proof Let v = 000 . . . 01, j = v^{s+t−(s−1)}, k = v^{s+t−1}, l = (v^{s+t−1})^{s+t−2}, m = v^{s+t}, n = (v^{s+t})^{s+1}, o = (v^{s+t})^{s+t−1}, and p = ((v^{s+t})^{s+t−1})^{s+t−2}. We set F = {{v, (v)1, (v)2, (v1)2}, {j, (j)1, (j)2, (j1)2}, . . . , {k, (k)1, (k)2, (k1)2}, {l, (l)1, (l)2, (l1)2}, {m, (m)1, (m)2, (m1)2}, {n, (n)1, (n)2, (n1)2}, . . . , {o, (o)1, (o)2, (o1)2}, {p, (p)1, (p)2, (p1)2}}. Clearly {v, (v)1, (v)2, (v1)2} in F induces a 4-cycle subgraph, and each of the other vertex sets in F induces one of the remaining 2s − 1 4-cycle subgraphs. Therefore, |F| = 2s. Every set of edges spanned in F is a subset of E(E Hs,t) and is isomorphic to a cycle of length 4. As N(u) ⊆ V(F), E Hs,t − V(F) is disconnected with |V(F)| = 4·2s. As u is one of the vertices of a component of E Hs,t − V(F), the lemma follows.
Fig. 4 C4 -structure cut for E Hs,t
Lemma 8 If |Z 4 | ≤ 1, then E H1,2 − Z 4 is connected. Proof By Theorem 5, E H1,2 − Z 4 is connected if |Z 4 | = 0. Assume |Z 4 | = 1. Since j E Hs,t has Q is and Q t embedded into, which are vertex and edge symmetric, let Z 4 = j j {0000, 0001, 0011, 0101}. Clearly Z 4 ⊆ E 3 (Q 2 ) and there exists at least another Q 2 , s since 1 ≤ j ≤ 2 . Thus E H1,2 − Z 4 is connected. Remark 1 ([16]) Every edge of E Hs,t lies on an even l-cycle where 8 ≤ l ≤ 2s+t+1 . If a cycle in E Hs,t contains a E 1 then the length of cycle is at least 8. Lemma 9 For s ≥ 1, l ≥ 2, κ(E Hs,t ; C4 ) = 2s . Proof To show that E Hs,t − Z 4 is connected if |Z 4 | = 2s − 1. For s = 1, t = 2, deleting one Q t with its induced edges will definitely leave the graph connected, by Lemma 8. Hence assume for s > 1 and t > 2. To suffice the proof, we assume E Hs,t − Z 4 is disconnected if |Z 4 | = 2s − 1. If C is the smallest component of E Hs,t − V (Z 4 ), it has at least one C x−2 or C x+2 [16]. If |Z 4 | ≤ 2s − 1. Let u ∈ C and we set u = 00 . . . 00. Then N (u) = {(u)i \s + t − (s − 1) ≤ i ≤ s + t} with |V (Z 4 )| = 4.2s − 4. By [17], |V (C)| = 1 by deleting j only C4 s . So, |V (C)| ≥ 2. Since any V (C) will be a Q t then |V (C)| = 2s , from our assumption s ≤ t. Thus |N (u)| = t with degree t. Then the induced subgraph of j j j j vertices of Q t is N (Q t ) ∪ Q t . Specifically |N (Q t )| = 2s which is a contradiction to our assumption that E Hs,t − V (Z 4 ) is connected for |Z 4 | = 2s − 1. Thence, the lemma Fig. 4.
The following research outcomes are left as conjectures for additional investigation.

Conjecture 1 For E Hs,t, s ≥ 2, t ≥ 3, and 3 ≤ m ≤ t + 1, κs(E Hs,t ; K 1,m) = (s + 2)/2 if s is even, and ⌈(s + 2)/2⌉ if s is odd.

Conjecture 2 For E Hs,t, s ≥ 2, t ≥ 3, and 3 ≤ m ≤ t + 1, κ(E Hs,t ; K 1,m) = (s + 2)/2 if s is even, and ⌈(s + 2)/2⌉ if s is odd.

Conjecture 3 For E Hs,t, any C_{4k} with k = 2^n for n ∈ W, we have κ(E Hs,t ; C_{4k}) = 2^{s+t−k−1}.
4 Conclusion

In this paper, we have characterized the minimum structure and sub-structure cuts of the exchanged hypercube E Hs,t, with the elements of the cut set F isomorphic to K 1,1 and K 1,2 for s ≥ 2, t ≥ 3, and to C4. The obtained results are a novel generalization of connectivity and can be used to enhance the robustness of interconnection networks. They speak to the reliability and fault tolerance of the exchanged hypercube, and their applicability helps in developing more feasible and fault-free networks amid the exponential growth of Networks on Chips.
References 1. Esfahanian A-H (1989) Generalized measures of fault tolerance with application to n-cube networks. IEEE Trans Comput 38(11):1586–1591 2. Esfahanian A-H, Hakimi SL (1988) On computing a conditional edge-connectivity of a graph. Inf Process Lett 27(4):195–199 3. Guo C, Liu Q, Xiao Z, Peng S (2022) The diagnosability of interconnection networks with missing edges and broken-down nodes under the PMC and MM* models. Comput J (2022) 4. Guo C, Xiao Z, Liu Z, Peng S (2020) Rg conditional diagnosability: a novel generalized measure of system-level diagnosis. Theor Comput Sci 814:19–27 5. Harary F (1983) Conditional connectivity. Networks 13(3):347–357 6. Hsu LH, Cheng E, Lipták L, Tan JJ, Lin CK, Ho TY (2012) Component connectivity of the hypercubes. Int J Comput Math 89(2):137–145 7. Klavžar S, Ma M (2014) The domination number of exchanged hypercubes. Inf Process Lett 114(4):159–162 8. Lü H, Wu T (2020) Structure and substructure connectivity of balanced hypercubes. Bull Malaysian Math Sci Soc 43(3):2659–2672 9. Lin CK, Zhang L, Fan J, Wang D (2016) Structure connectivity and substructure connectivity of hypercubes. Theor Comput Sci 634:97–107
10. Lin CK, Cheng E, Lipták L (2020) Structure and substructure connectivity of hypercube-like networks. Parallel Process Lett 30(03):2040007 11. Li D, Hu X, Liu H (2019) Structure connectivity and substructure connectivity of twisted hypercubes. Theor Comput Sci 796:169–179 12. Li XJ, Xu JM (2013) Generalized measures of fault tolerance in exchanged hypercubes. Inf Process Lett 113(14–16):533–537 13. Li X, Zhou S, Ren X, Guo X (2021) Structure and substructure connectivity of alternating group graphs. Appl Math Comput (391), 125639 (2021) 14. Loh PK, Hsu WJ, Pan Y (2005) The exchanged hypercube. IEEE Trans Parallel Distribut Syst 16(9):866–874 15. Ma M (2010) The connectivity of exchanged hypercubes. Discrete Math, Algorithms Appl 2(02):213–220 16. Ma M, Liu B (2009) Cycles embedding in exchanged hypercubes. Inf Process Lett 110(2):71– 76 17. Ma M, Zhu L (2011) The super connectivity of exchanged hypercubes. Inf Process Lett 111(8):360–364 18. Pan Z, Cheng D (2020) Structure connectivity and substructure connectivity of the crossed cube. Theor Comput Sci 824:67–80 19. Zhao SL, Hao RX (2019) The generalized 4-connectivity of exchanged hypercubes. Appl Math Comput 347:342–353
Improved Lower Bound for L(1, 2)-Edge-Labeling of Infinite 8-Regular Grid

Subhasis Koley and Sasthi C. Ghosh
Abstract Let us take two given integers h and k where h, k ≥ 0 and a graph G = (V(G), E(G)). An L(h, k)-edge-labeling of G is a function f : E(G) → {0, 1, · · · , n} such that, for all e1, e2 ∈ E(G), | f(e1) − f(e2)| ≥ h when d′(e1, e2) = 1 and | f(e1) − f(e2)| ≥ k when d′(e1, e2) = 2, where d′(e1, e2) denotes the distance between e1 and e2 in G: if there is no way to connect e1 and e2 with fewer than k′ − 1 edges in G, then d′(e1, e2) = k′. The objective is to find the span, which is the least value of n obtained over all such L(h, k)-edge-labelings of G, denoted λ1,2-style as λh,k(G). Motivated by the channel assignment problem in wireless networks, L(h, k)-edge-labeling has been studied for various infinite regular grid graphs. For the infinite 8-regular grid T8, it is known in the existing literature that 25 ≤ λ1,2(T8) ≤ 28, so the known lower and upper bounds are not identical. Here, we improve the lower bound and prove that λ1,2(T8) ≥ 28, and consequently that λ1,2(T8) = 28. Keywords L(h, k)-edge-labeling · Infinite 8-regular grid · Span · Edge-labeling
1 Introduction

For the purpose of communication in a wireless network, transmitters are allotted frequency bands. As the number of frequency bands is limited, a frequency band may be reused by multiple transmitters that communicate simultaneously. Hence, one of the main challenges is to assign frequency bands to the transmitters effectively so that interference cannot occur due to the reusing of frequency
bands simultaneously. The frequency assignment problem is also known as Channel Assignment Problem (C A P). Hale [8] formulated C A P as a classical vertex coloring problem. Here, vertices can be thought of as transmitters, colors assigned to vertices can be thought of as frequencies and two vertices having a common edge can be thought of as transmitters in the interference zone. But sometimes the effect of interference is to be considered for more than a hop distance apart due to the usage of proximate frequency. So, to include this scenario into formulation, distance-labeling of the vertices of a graph was proposed in [7, 11]. In the literature, a special case of distance vertex-labeling of a graph, L(h, k)-vertex-labeling, was proposed where distinct colors are assigned to the pair of vertices which are at one distance and two distance apart, and the gap between the colors assigned here must be predefined. Before defining the L(h, k)-vertex-labeling formally, we will define the distance of two vertices in a graph formally. Take the graph G = (V (G), E(G)) and take two vertices u, v in V (G). The distance between u and v is termed as d(u, v). If there is no other way to connect u and v with less than k number of edge/s in G, then d(u, v) = k . Now we formally define L(h, k)-vertex-labeling as follows. Definition 1 Let us take two given integers h, k where h, k ≥ 0 and a graph G = (V (G), E(G)). An L(h, k)-vertex-labeling for G is defined as a function f : V (G) − → {0, 1, · · · , n} such that for any pair of vertices v1 , v2 ∈ V (G), | f (v1 ) − f (v2 )| ≥ h when d(v1 , v2 ) = 1 and | f (v1 ) − f (v2 )| ≥ k when d(v1 , v2 ) = 2. The span is the least value over all n obtained from all possible L(h, k)-vertexlabeling of G, and it is denoted as λh,k (G). The L(h, k)-vertex-labeling for various graphs have been studied. A survey corresponding to the labeling is widely carried out in [2, 6, 12] in a detailed manner. Here in almost all the studies, the focus was to find out the span mainly. Now, like the L(h, k)-vertex-labeling, the L(h, k)-edge-labeling is also interesting. With the increase of the degree of the vertices of a graph, the L(h, k)-edge-labeling becomes more challenging. So the study of the L(h, k)-edge-labeling for a graph with a higher degree of vertices is also interesting, and thus it draws the focus of the researchers. The L(h, k)-edge-labeling for different types of graphs have been studied also. Here also, in almost all the research, the focus was to find out the span. The L(h, k)edge-labeling was proposed in [5]. Now, before defining the L(h, k)-edge-labeling, we will define the distance of a pair of edges in a graph formally. Take the graph G = (V (G), E(G)) and take a pair of edges e1 , e2 in E(G). The distance between e1 and e2 is termed as d (e1 , e2 ). If there is no other way to connect e1 and e2 with less than k − 1 number of edge/s in G, then d (e1 , e2 ) = k . Now we formally define L(h, k)-edge-labeling as follows. Definition 2 Let us take two given integers h, k where h, k ≥ 0 and a graph G = (V (G), E(G)). An L(h, k)-edge-labeling for G is defined as a function f : E(G) − → {0, 1, · · · , n} such that for any pair of edges e1 , e2 ∈ E(G), | f (e1 ) − f (e2 )| ≥ h when d (e1 , e2 ) = 1 and | f (e1 ) − f (e2 )| ≥ k when d (e1 , e2 ) = 2.
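As an illustration of Definition 2, the short sketch below checks whether a given labeling of the edges of a finite graph satisfies the L(h, k) constraints, using the usual convention that two edges are at distance 1 when they share a vertex and at distance 2 when they are disjoint but some third edge meets both. The code and its names are ours and are shown only as a sketch; the paper itself works with the infinite grid T8 rather than a finite graph.

```python
def is_valid_L_hk_edge_labeling(edges, labels, h, k):
    """edges: list of edges, each a frozenset of two vertices.
    labels: dict mapping each edge to a nonnegative integer label."""

    def dist_1(e1, e2):                 # edges sharing a vertex
        return bool(e1 & e2)

    def dist_2(e1, e2):                 # disjoint edges joined by a third edge
        return not (e1 & e2) and any((e & e1) and (e & e2) for e in edges)

    for i, e1 in enumerate(edges):
        for e2 in edges[i + 1:]:
            gap = abs(labels[e1] - labels[e2])
            if dist_1(e1, e2) and gap < h:
                return False
            if dist_2(e1, e2) and gap < k:
                return False
    return True

# tiny usage example: a path on three edges labeled 0, 2, 4 is a valid
# L(2, 2)-edge-labeling, since consecutive edges differ by 2 and the two
# end edges (at distance 2) differ by 4
edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
labels = {edges[0]: 0, edges[1]: 2, edges[2]: 4}
print(is_valid_L_hk_edge_labeling(edges, labels, h=2, k=2))   # True
```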
The span is the least value over all n obtained from all possible L(h, k)-edgelabeling of G, and it is denoted as λh,k (G). Various studies for L(h, k)-edge-labeling have been carried out for different types of graphs. In most of the cases, the bounds of the span of the corresponding graphs were determined. Sometimes, it is a natural choice to model a wireless communication by infinite regular grids for the regular and symmetric patterns of the grid. So the research of L(h, k)-edge-labeling for infinite regular grids has also importance from a theoretical point of view. So among the various graph classes, many studies have been done on the infinite regular hexagonal grid, infinite square grid, infinite triangular grid and infinite 8-regular grid especially for h = 1, 2 and k = 1, 2 in [1, 3, 4, 9, 10]. In [3], L(h, k)-edge-labeling of infinite 8-regular grid T8 has been studied, and it is proved that 25 ≤ λ1,2 (T8 ) ≤ 28. Note that here the obtained lower and upper bounds are not identical. Because of the high degree, it was also challenging to improve the bounds. In this manuscript, by improving the lower bound, we prove that λ1,2 (T8 ) ≥ 28. As in [3], λ1,2 (T8 ) ≤ 28; eventually it is concluded that λ1,2 (T8 ) = 28. From now onward, we use the term color instead of label. We use the structural properties of T8 to find λ1,2 (T8 ). More specifically, we first identify a subgraph G of T8 where no two edges can have the same color. After that, we consider all edges where a pair of consecutive colors (c, c ± 1) can be used in G . Based on this, we identify the subgraphs in T8 where c and c ± 1 cannot be used. Then using the structural properties of those subgraphs, we conclude how many additional colors other than the colors used in G must be required to color T8 by using the pigeonhole principle and accordingly derive the span.
2 Results Figure 1 shows a portion of infinite 8-regular grid T8 . Observe that there are four types of edges in T8 . The edges which are along or parallel to the X -axis and the edges which are along or parallel to the Y -axis are said to be horizontal and vertical edges, respectively. The edges which are at 45◦ to any of a horizontal edge and the edges which are at 135◦ to any of a horizontal edge are said to be right slanting edges and left slanting edges, respectively. Here, the angular distance between two adjacent edges e1 and e2 represents the smaller of the two angles measured anticlockwise from e1 to e2 and from e2 to e1 . In Fig. 1, the edges ( p, q), ( p, r ), ( p, s) and ( p, t) are a horizontal, a right slanting, a vertical and a left slanting edges, respectively. By slanting edges, we mean all the left and right slanting edges and by non-slanting edges, we mean all the horizontal and vertical edges. Consider any K 4 (complete graph of 4 vertices) in T8 . The vertex set of the K 4 is S (S = V (K 4 )). Let v be a vertex in T8 and Nv be the set of vertices in T8 which are adjacent to v. Let N (S) = Nv be the set of all vertices in T8 which are v∈S
adjacent to at least one vertex in S. Let us define G S as the subgraph of T8 such that V (G S ) = S ∪ N (S), and E(G S ) is the set of all edges of T8 which are incident to at
Fig. 1 An infinite 8-regular grid T8
Fig. 2 The G S corresponding to the K 4 having vertex set S = {v1 , v2 , v3 , v4 }
least one vertex in S. Figure 2 shows the K4 with vertex set S = {v1, v2, v3, v4} and the corresponding G S. It is evident that any two edges of G S are at distance at most 2. Hence, no two edges of G S can be given the same color in an L(1, 2)-edge-labeling of G S. As |E(G S)| = 26 and d′(e1, e2) ≤ 2 for all e1, e2 ∈ E(G S), we have λ1,2(G S) ≥ 25. Let e1 and e2 be edges of E(G S) with d′(e1, e2) = 2, and let the color c be assigned to e1. Clearly, the colors c ± 1 cannot be assigned to e2 in an L(1, 2)-edge-labeling. Since G S contains no pair of edges at distance 3 or more, both c ± 1 must be used on edges adjacent to e1 in G S. Consider the K4 having vertex set S = {v1, v2, v3, v4} as shown in Fig. 3. Now define S′ = S ∪ N(S) ∪ N(N(S)). Figure 3 shows the graph G such that V(G) = S′ ∪ N(S′) and E(G) is the set of all edges of T8 which are incident to at least one vertex of S′. Note that there are in total 25 distinct K4s, including the K4 having vertex set S, in G, and that G is built over G S. Consider a G S and the corresponding G, and suppose the colors c and c + 1 are used in G S. In Lemma 1, we identify all the K4s having vertex sets S1, S2, · · · , Sm such that c and c + 1 cannot be used at G S1, G S2, · · · , G Sm when c and c + 1 are used in G S. To prove Lemma 1, we first take two adjacent edges where the consecutive colors are used. Now the
Fig. 3 A subgraph G of an infinite 8-regular grid T8
angle between the adjacent edges may be 45◦ or 90◦ or 135◦ or 180◦ or 225◦ or 270◦ or 315◦ and depending on these, we derive the proof of Lemma 1. Lemma 1 For every pair of colors c and c + 1 used in two adjacent edges e1 and e2 in G S , except when e1 and e2 form an angle of 45◦ at their common incident vertex, there exist at least 4 different K 4 s having vertex sets S1 , S2 , S3 and S4 other than S such that (1) it is not possible to use c at G S1 as well as G S2 , and (2) c + 1 cannot be used at G S3 as well as G S4 . When e1 and e2 are at angle 45◦ , there exist 3 different K 4 s having vertex sets S1 , S2 and S3 other than S such that either (1) c cannot be used at G S1 and G S2 , and c + 1 cannot be used at G S3 or (2) c + 1 cannot be used at G S1 , and G S2 and c cannot be used at G S3 . Proof Note that the angular distance between any two adjacent edges in T8 can be any one of 45◦ , 90◦ , 135◦ , 180◦ , 225◦ , 270◦ and 315◦ . From symmetry, it suffices to consider the cases where the angular distance between two adjacent edges in T8 is 180◦ or 135◦ or 90◦ or 45◦ . – When angular distance between two adjacent edges is 180◦ : Observe that two adjacent horizontal edges or two adjacent vertical edges or two adjacent left slanting edges or two adjacent right slanting edges can be at 180◦ . Clearly the cases where
two horizontal edges and two vertical edges forming 180◦ are symmetric. Similarly, the cases where two left slanting edges and two right slanting edges forming 180◦ are also symmetric. So the following two cases are needed to be considered. Case (1) When two adjacent horizontal edges e1 = (v1 , v2 ) and e2 = (v2 , u 6 ) form 180◦ (Fig. 3): Let f (e1 ) = c and f (e2 ) = c + 1. Note that any edge e incident on any of the vertices in {u 5 , u 6 , u 7 } is at a distance at most two from e1 . As f (e1 ) = c, it is not possible to use c there for distance constraints. Similarly, observe that any edge e incident on any of the vertices in {w8 , w9 , w10 } but not incident on any of the vertices in {u 5 , u 6 , u 7 } is at distance two from e2 . As f (e2 ) = c + 1, it is not possible to use c there also for distance constraints. Hence, there exist 2 different K 4 s having vertex sets S1 = {u 5 , u 6 , w8 , w9 } and S2 = {u 6 , u 7 , w9 , w10 } such that it is not possible to use c in G S1 and G S2 . With a similar argument, it can be shown that there exist 2 different K 4 s having vertex sets S3 = {v1 , v4 , u 11 , u 12 } and S4 = {v1 , u 9 , u 10 , u 11 } such that c + 1 cannot be used at G S3 and G S4 . Case (2) When two adjacent right slanting edges e1 = (v1 , v3 ) and e2 = (v1 , u 10 ) form 180◦ (Fig. 3): Let f (e1 ) = c and f (e2 ) = c + 1. In this case, there exist 3 different K 4 s having vertex sets S1 = {u 10 , w15 , w16 , w17 }, S2 = {u 9 , u 10 , w14 , w15 } and S3 = {u 10 , u 11 , w17 , w18 } such that c cannot be used at G S1 , G S2 and G S3 . Similarly, there exist 3 different K 4 s having vertex sets S3 = {v3 , v4 , u 2 , u 3 }, S4 = {v3 , u 3 , u 4 , , u 5 } and S5 = {v2 , v3 , u 5 , u 6 } such that c + 1 cannot be used at G S3 , G S4 and G S5 . – When angular distance between two adjacent edges is 135◦ : Observe that a horizontal edge and its adjacent left slanting edge or a horizontal edge and its adjacent right slanting edge or a vertical edge and its adjacent left slanting edge or a vertical edge and its adjacent right slanting edge may be at 135◦ . Clearly, all the cases are symmetric. So only the following two cases are needed to be considered. Let us consider the horizontal edge e1 = (v1 , v2 ) and the right slanting edge e2 = (v2 , u 5 ). Let f (e1 ) = c and f (e2 ) = c + 1. From similar discussion stated in the previous case, there exist 3 different K 4 s having vertex sets S1 = {v3 , u 3 , u 4 , u 5 }, S2 = {u 4 , u 5 , w7 , w8 } and S3 = {u 5 , u 6 , w8 , w9 } such that it is not possible to use c at G S1 , G S2 and G S3 . Similarly, there exist 2 different K 4 s having vertex sets S4 = {v1 , v4 , u 11 , u 12 } and S5 = {v1 , u 9 , u 10 , u 11 } such that it is not possible to use c + 1 at G S4 and G S5 . – When angular distance between two adjacent edges is 90◦ : Observe that a horizontal edge and its adjacent vertical edge or a vertical edge and its adjacent horizontal edge or a right slanting edge and its adjacent left slanting edge may be at 90◦ . Clearly, the first two cases are symmetric. So the following two cases are needed to be considered. Case (1) Let e1 and e2 be a horizontal and a vertical edge, respectively. Assume e1 = (v1 , v2 ) and e2 = (v2 , v3 ). Let f (e1 ) = c and f (e2 ) = c + 1. As a similar argument stated in the previous cases, there exist 2 different K 4 s having vertex sets S1 = {v3 , v4 , u 2 , u 3 } and S2 = {v3 , u 3 , u 4 , u 5 } such that it is not possible to use c at G S1 and G S2 . 
Similarly, there exist 2 different K 4 s having vertex sets S3 = {v1 , v4 , u 11 , u 12 } and S4 = {v1 , u 9 , u 10 , u 11 } such that it is not possible to use c + 1 at G S3 and G S4 .
Case (2) Consider the right slanting edge e1 = (v1 , v3 ) and the left slanting edge e2 = (v1 , u 12 ). Let f (e1 ) = c and f (e2 ) = c + 1. As a similar argument stated in the previous cases, there exist 3 different K 4 s having vertex sets S1 = {u 1 , u 12 , w19 , w20 }, S2 = {u 1 , u 2 , u 12 , v4 } and S3 = {u 11 , u 12 , w18 , w19 } such that it is not possible to use c at G S1 , G S2 and G S3 . Similarly, there exist 3 different K 4 s having vertex sets S4 = {v3 , v4 , u 2 , u 3 }, S5 = {v3 , u 3 , u 4 , u 5 } and S6 = {v2 , v3 , u 5 , u 6 } such that it is not possible to use c + 1 at G S4 , G S5 and G S6 . – When angular distance between two edges is 45◦ : Observe that a horizontal edge and its adjacent left slanting edge or a horizontal edge and its adjacent right slanting edge or a vertical edge and its adjacent left slanting edge or a vertical edge and its adjacent right slanting edge may be at 45◦ . Clearly, all the cases are symmetric. So only the following cases are needed to be considered. Consider the horizontal edge e1 = (v1 , v2 ) and the left slanting edge e2 = (v2 , v4 ). Let f (e1 ) = c and f (e2 ) = c + 1. As a similar argument stated in the previous cases, there exist 2 different K 4 s having vertex sets S1 = {v4 , u 1 , u 2 , u 12 } and S2 = {v3 , v4 , u 2 , u 3 } such that c cannot be used at G S1 and G S2 . Similarly, there exists a K 4 s having the vertex set S3 = {v1 , u 9 , u 10 , u 11 } where it is not possible to assign c + 1 at G S3 . If f (e1 ) = c + 1 and f (e2 ) = c, then there exist 2 different K 4 s having vertex sets S1 = {v4 , u 1 , u 2 , u 12 } and S2 = {v3 , v4 , u 2 , u 3 } such that c + 1 cannot be used at G S1 and G S2 . Similarly, there exists a K 4 s having the vertex set S3 = {v1 , u 9 , u 10 , u 11 } such that c cannot be used at G S3 . Now we state and prove the following theorems. Theorem 1 λ1,2 (T8 ) ≥ 26. Proof Take a K 4 with the vertex set S = {v1 , v2 , v3 , v4 } and the corresponding G S as shown in Fig. 2. Observe that there are 26 edges in G S . Note that there are no two edges at more than distance two apart in G S . Hence, all the colors used in G S must be distinct due to the distance constraints. Hence, 26 consecutive colors {0, 1, · · · , 25} are to be used in G S , otherwise λ1,2 (G S ) ≥ 26. In that case, consecutive colors can be assigned only at adjacent edges in G S . Let c and c + 1 be two colors used in G S . Therefore, from Lemma 1, there exists at least a K 4 having vertex set S1 such that c cannot be used at G S1 . But in G S1 , 26 distinct colors must be used. So if we do not use c in G S1 , at least a new color which is not used in G S must be used in G S1 . So at least the color 26 must be introduced in G S1 . Hence, λ1,2 (G S1 ) ≥ 26 implying λ1,2 (T8 ) ≥ 26. Theorem 2 λ1,2 (T8 ) ≥ 27. Proof Consider S = {v1 , v2 , v3 , v4 } and the corresponding G S as shown in Fig. 3. Note that λ1,2 (G S ) ≤ 26 only if 1) 26 consecutive colors {0, 1, · · · , 25} or {1, 2, · · · , 26} are used in G S , or 2) the colors {0, 1, · · · , 26} \ {c } are used in G S , where 1 ≤ c ≤ 25. The cases of {0, 1, · · · , 25} and {1, 2, · · · , 26} are clearly symmetric. Hence we consider only the following two cases.
Consider the first case. Observe that the set of colors {0, 1, · · · , 25} can be partitioned into 13 disjoint consecutive pairs of colors (0, 1), (2, 3), · · · , (24, 25). Consider a pair of colors (c, c + 1), where 0 ≤ c ≤ 24. As discussed in Lemma 1, for the colors (c, c + 1), there exist two distinct S1 and S2 other than S such that either c or c + 1 cannot be used in G S1 and G S2 . So for the 13 disjoint pairs mentioned above, there must be 26 such S1 , S2 , · · · , S26 other than S. Note that there are total 25 S j s including S in G. Therefore, there exist only 24 such S j s other than S in G. Hence, from pigeonhole (26 pigeons and 24 holes) principle there must be at least one G S j where two different colors which are used in G S cannot be used there and hence λ1,2 (G S j ) ≥ 27 implying λ1,2 (T8 ) ≥ 27. Now consider the second case. Consider that the colors {0, 1, · · · , 26} \ {c } have been used in G S , where 1 ≤ c ≤ 25. First consider the case when c is even. In that case, the colors {0, · · · , c − 1, c + 1, · · · 26} can be partitioned into 13 disjoint consecutive pairs of colors (0, 1), · · · , (c − 2, c − 1), (c + 1, c + 2), · · · , (25, 26). Hence proceeding similarly as above case, from pigeonhole principle, it is obtained that λ1,2 (T8 ) ≥ 27. Now we consider that c is odd. Let us first consider c = 1, 25. Note that the colors {0, · · · , c − 1, c + 1, · · · 26} can be partitioned into 12 disjoint consecutive pairs of colors (0, 1), · · · , (c − 3, c − 2), (c + 1, c + 2), · · · , (24, 25). So there must be 24 K 4 s having vertex sets S1 , S2 , · · · , S24 other than S in G. Now consider the pair of consecutive colors (25, 26). Now, from Lemma 1, there must exist at least one S25 other than S such that color 26 cannot be used in G S25 . So we need 25 such S1 , S2 , · · · , S25 other than S. But there exist only 24 such distinct S j s other than S in G. Hence from pigeonhole (25 pigeons and 24 holes) principle, there must be at least one G S j where two different colors used in G S cannot be used there and hence λ1,2 (G S j ) ≥ 27 implying λ1,2 (T8 ) ≥ 27. When c = 1, the set of colors {0, 2, · · · 26} can be partitioned into 12 disjoint consecutive pairs of colors (2, 3), · · · , (24, 25). Considering these 12 pairs and the pair of consecutive colors (25, 26), we get λ1,2 (T8 ) ≥ 27 by proceeding similarly as above. When c = 25, the set of colors {0, · · · 24, 26} can be partitioned into 12 disjoint consecutive pairs of colors (1, 2, ), · · · , (23, 24). Considering these 12 pairs and the pair of consecutive colors (0, 1), we get λ1,2 (T8 ) ≥ 27 by proceeding similarly as above. Observe that a K 3 contains one horizontal, one vertical and one slanting edge. Note that 3 colors, say c − 1, c and c + 1 may be assigned at three edges which may or may not form a K 3 (complete graph of 3 vertices) in G S , c being used at a slanting or non-slanting edge. Accordingly, we now have the following two lemmas. Lemma 2 When 3 colors c − 1, c and c + 1 are assigned at three edges forming a K 3 in G S with c being used at the slanting edge, then there exists a K 4 with vertex set S1 in G such that c cannot be used in G S1 . When c − 1, c and c + 1 are assigned at three edges not forming a K 3 in G S with c being used at the slanting edge, then there exist at least 2 different K 4 s with vertex sets S1 and S2 in G such that c cannot be used at G S1 and G S2 . 
Proof Let us consider that c − 1, c and c + 1 are used at three edges which are forming a K 3 with c being used at the slanting edge. The different cases where such
Fig. 4 All the possible cases where c is used in a slanting edge e and both c ± 1 are used at edges forming 45◦ with e
scenarios occur have been shown in Fig. 4a–d. Note that all the cases that appeared in Fig. 4a–d are symmetric. So we consider any one of the cases. Now consider that colors c − 1, c and c + 1 are assigned at (v1 , v2 ), (v1 , v3 ) and (v2 , v3 ), respectively (Fig. 3). Now consider the K 4 with vertex set S1 = {v2 , u 6 , u 7 , u 8 } and observe that any edge incident to any of the vertices in S1 \ {v2 } is at distance two from (v1 , v2 ). As f (v1 , v2 ) = c − 1, it not possible to assign c at those edges. As any edge incident to v2 is at distance at most 2 from (v1 , v3 ) and f (v1 , v3 ) = c, the color c cannot be used there as well. So c cannot be used at G S1 . Now we will discuss the case where c − 1, c and c + 1 are assigned at three edges which are not forming a K 3 such that c is used at the slanting edge. Note that the colors c − 1 and c + 1 can be assigned at two edges both of which are at 45◦ with the edge having color c. The different cases where such scenarios arrive have been mentioned in Fig. 4e, g and l. The cases that appeared in Fig. 4e–h are symmetric. The instances that appeared in Fig. 4i–l are also symmetric. So we consider only the following two cases. – In the first case, assume that the colors c − 1, c and c + 1 are assigned at (v1 , v4 ), (v1 , v3 ) and (v2 , v3 ), respectively (Fig. 3). In this case, the said three edges form a structure isomorphic to Fig. 4e–h. Here, note that there exist two K 4 s with vertex sets S1 = {v4 , u 1 , u 2 , u 12 } and S2 = {v2 , u 6 , u 7 , u 8 } such that c cannot be used in G S1 and G S2 . – In the second case, assume that the colors c − 1, c and c + 1 are assigned at (v1 , v4 ), (v1 , v3 ) and (v1 , v2 ), respectively (Fig. 3). In this case, the said three edges form a structure isomorphic to Fig. 4i–l. Here also, there exist two K 4 s with vertex sets S1 = {v4 , u 1 , u 2 , u 12 } and S2 = {v2 , u 6 , u 7 , u 8 } such that c cannot be used in G S1 and G S2 . Now we consider the case where the edge having color c + 1 (or c − 1) is not forming 45◦ with the slanting edge having color c and the other edge having color c − 1 (or c + 1) forming 45◦ with the slanting edge. In that case, from Lemma 1, it follows that there exist at least 2 distinct K 4 s having vertex sets S1 and S2 other than S such that c cannot be used in G S1 and G S2 . As any one of c ± 1 is being assigned
at an edge forming 45◦ with the slanting edge, there also exist at least another K 4 having vertex set S3 other than S such that c cannot be used in G S3 . So in this case there are at least 3 distinct K 4 s. Now consider that both the colors c − 1 and c + 1 are used at edges not forming 45◦ with the slanting edge with color c. In this case, from Lemma 1, there exist at least 4 different K 4 s having vertex sets S1 , S2 , S3 and S4 other than S such that c cannot be used in G S1 , G S2 , G S3 and G S4 . So in this case there are at least 4 distinct K 4 s. Lemma 3 If 3 colors c − 1, c and c + 1 are assigned at three edges of G S forming a K 3 with c being used at a non-slanting edge, then there exist two K 4 s with vertex sets S1 and S2 in G such that c cannot be used in G S1 and G S2 . If 3 colors c − 1, c and c + 1 are assigned at three edges of G S without forming a K 3 with c being used at a non-slanting edge, then there exist at least 3 K 4 s with vertex sets S1 , S2 and S3 in G such that c cannot be used in G S1 , G S2 and G S2 . Proof – First, we will discuss the case where c − 1, c and c + 1 are used at three edges which are forming a K 3 with c being used at the non-slanting edge. Consider the K 3 with vertex set {v1 , v2 , v3 } in G S as shown in Fig. 3. Suppose the colors c − 1, c and c + 1 are used at (v1 , v2 ), (v2 , v3 ) and (v1 , v3 ), respectively. In this case, there exist two K 4 s with vertex sets S1 = {v1 , u 9 , u 10 , u 11 } and S2 = {v1 , v4 , u 11 , u 12 } such that c cannot be used at G S1 and G S2 . – Now we will discuss the case where c − 1, c and c + 1 are used at three edges which are not forming a K 3 with c being used at the non-slanting edge. In the previous case, as the two edges with colors c − 1 and c + 1 have a common vertex other than the end vertices of the edge with color c, we get only two different K 4 s. Note that if three edges do not form a K 3 , there does not exist a common vertex of the two edges with colors c − 1 and c + 1, other than an end vertex of the edge with color c. In that case, there exist at least 3 different K 4 s having vertex sets S1 , S2 and S3 such that c cannot be used at G S1 , G S2 and G S3 . We now consider how many K 3 S can be there in G S such that for each such K 3 , 3 consecutive colors can be used. For this, we have the following lemma. Observation 1 Consider the K 4 having vertex set S = {v1 , v2 , v3 , v4 } and the subgraph G S . There can be at most 8 K 3 s in G S such that in each of the K 3 , 3 consecutive colors can be used. Proof Consider the edges e1 = (v1 , v2 ), e2 = (v2 , v3 ), e3 = (v3 , v4 ) and e4 = (v4 , v1 ) of the K 4 having vertex set S = {v1 , v2 , v3 , v4 } (Fig. 2). Observe that there are total 12 K 3 s in G S and each K 3 has at least one edge in {e1 , e2 , e3 , e4 }. Each edge e ∈ {e1 , e2 , e3 , e4 } is a common edge of 4 different K 3 s. Let c be used in an edge e ∈ {e1 , e2 , e3 , e4 }. In order to form a K 3 with 3 consecutive colors with e, either c + 1 or c − 1 must be used in that K 3 . So, out of the 4 K 3 s that include e, at most two of them can have 3 consecutive colors. As there are 4 edges in {e1 , e2 , e3 , e4 }, we can have at most 8 K 3 s where each of them has 3 consecutive colors.
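Before stating the final theorem, the structural facts about G S that the proofs repeatedly use, namely |E(G S)| = 26 and any two of its edges being at distance at most 2, can also be confirmed mechanically on a finite patch of T8. The Python sketch below does this for a unit K4 placed well inside a 6 × 6 patch of the king graph; it is an independent illustration under our own modelling of T8, not part of the paper.

```python
from itertools import product

def t8_edges(width, height):
    """Edges of a finite patch of the 8-regular (king) grid T8."""
    offsets = [(1, 0), (0, 1), (1, 1), (1, -1)]   # horizontal, vertical, two slanting
    edges = set()
    for x, y in product(range(width), range(height)):
        for dx, dy in offsets:
            nbr = (x + dx, y + dy)
            if 0 <= nbr[0] < width and 0 <= nbr[1] < height:
                edges.add(frozenset({(x, y), nbr}))
    return edges

# S = the four vertices of a unit K4 placed well inside the patch
S = {(2, 2), (3, 2), (2, 3), (3, 3)}
E = t8_edges(6, 6)
GS = [e for e in E if e & S]                      # edges incident to S

assert len(GS) == 26                              # |E(G_S)| = 26
# every pair of edges of G_S is at distance at most 2: they either share a
# vertex, or some edge of G_S meets both of them
for e1 in GS:
    for e2 in GS:
        assert (e1 & e2) or any((f & e1) and (f & e2) for f in GS)
print("checks passed")
```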
Now we state our final theorem and prove it. From the discussion of Theorem 1 and Theorem 2, it is known that 26 distinct colors from {0, 1, · · · , 27} are required for the coloring of the edges of a subgraph isomorphic to G S in T8 . So there should be two colors, say c1 and c2 , in {0, 1, · · · , 27} which remain unused in such a subgraph. Now some or all of the colors 0, c1 − 1, c1 + 1, c2 − 1, c2 + 1 and 27 must be distinct. In the proof of the following theorem, we first consider that all the colors 0, c1 − 1, c1 + 1, c2 − 1, c2 + 1 and 27 are distinct and prove that there must exist a subgraph isomorphic to G S in T8 where it is not possible to assign any color from {0, 1, · · · , 27}, by using Lemma 2, Lemma 3 and Observation 1. For the other cases, when all the colors under consideration here are not distinct, the proofs are similar and we conclude the result of Theorem 3. Theorem 3 λ1,2 (T8 ) ≥ 28. Proof Let us consider the graph G, the K 4 with vertex set S = {v1 , v2 , v3 , v4 } and the subgraph G S as shown in Fig. 3. As per the discussion in Theorem 2, note that 26 distinct colors from {0, 1, · · · , 27} must be used in G S . In other words, there must be 2 colors in {0, 1, · · · , 27} which remain unused. Let these two colors be c1 and c2 where c1 < c2 . Let us now consider the 6 colors 0, c1 − 1, c1 + 1, c2 − 1, c2 + 1 and 27. We first consider the case when all these 6 colors are distinct and denote X = {0, c1 − 1, c1 + 1, c2 − 1, c2 + 1, 27}. In this case, for each color c ∈ X , only one of c ± 1 is used and the other is not used in G S . From Lemma 1, for each c ∈ X , there exists at least one K 4 having vertex set S1 other than S in G such that it is not possible to use c at G S1 . Moreover, c must be used at a slanting edge in G S in this case. Considering all 6 colors in X are used in 6 slanting edges, we get at least 6 such K 4 s. For each c among the remaining 26 − 6 = 20 colors, both c ± 1 are used in G S . There are in total 26 edges in G S among which 14 are slanting edges and 12 are non-slanting edges. Therefore, we are yet to consider the colors used in the remaining 14 − 6 = 8 slanting edges. Note that for each such color c, both c ± 1 are used in G S . Assume that, among these 8 colors, there are x many colors c1 , c2 , · · · , cx such that for each ci , 3 consecutive colors ci − 1, ci , ci + 1 can be used in a K 3 . From Lemma 2, for each such color ci , there must exist at least one K 4 having vertex set S1 other than S in G such that ci cannot be used at G S1 . Considering all these x colors, we get at least x such K 4 s in G. Now consider the remaining 8 − x colors cx+1 , cx+2 , · · · , c8 used in slanting edges. Note that for each such ci , 3 consecutive colors ci − 1, ci and ci + 1 cannot be used in a K 3 . From Lemma 2, for each such ci , there exist at least 2 different K 4 s having vertex sets S1 and S2 other than S in G such that ci cannot be used at G S1 and G S2 . Considering all these 8 − x colors, we get at least 2(8 − x) K 4 s in G. Now consider the 12 non-slanting edges in G S . Assume that among them, y many colors c1 , c2 , · · · , cy are there such that for each ci , 3 consecutive colors ci − 1, ci , ci + 1 can be used in a K 3 . Clearly, all those y many K 3 s must be different from those x many K 3 s considered for slanting edges. From Observation 1, there exist at most 8 K 3 s in G S such that for each of them, 3 consecutive colors can be
used. As x many K 3 s have already been considered for slanting edges, y can be at most 8 − x. From Lemma 3, for each such ci , there exist at least 2 different K 4 s having vertex sets S1 and S2 other than S such that ci cannot be used at G S1 and G S2 . Considering all these 8 − x colors, we get at least 2(8 − x) K 4 s. We are yet to consider the remaining z = 12 − (8 − x) = 4 + x non-slanting edges. For each such color c, the colors c − 1, c and c + 1 cannot be used in a K 3 in G S . So from Lemma 3, for each such c, there exist at least 3 different K 4 s having vertex sets S1 , S2 and S3 other than S in G such that c cannot be used at G S1 , G S2 and G S3 . Considering all these 4 + x colors, we get at least 3(4 + x) K 4 s. In total, we get at least 6 + x + 2(8 − x) + 2(8 − x) + 3(4 + x) = 50 K 4 s in G. But there are only 24 distinct K 4 s in G other than S. From the pigeonhole principle (50 pigeons and 24 holes), there exists at least one Si in G such that at least 3 colors which are used in G S cannot be used in G Si and hence λ1,2 (G Si ) ≥ 28, implying λ1,2 (T8 ) ≥ 28. If the colors 0, c1 − 1, c1 + 1, c2 − 1, c2 + 1 and 27 are not distinct, proceeding similarly, we can show that there is a need for more than 50 K 4 s in G and hence, from the pigeonhole principle (more than 50 pigeons and 24 holes), λ1,2 (T8 ) ≥ 28.
3 Conclusion It was proved in [3] that 25 ≤ λ1,2 (T8 ) ≤ 28, but the upper and lower bounds obtained there do not match. Here, we prove that λ1,2 (T8 ) ≥ 28. Since it was shown in [3] that λ1,2 (T8 ) ≤ 28, it follows that λ1,2 (T8 ) = 28. To improve the lower bound, we first identify a subgraph of T8 , show that 26 consecutive colors are not enough to color the edges of the subgraph and conclude that λ1,2 (T8 ) ≥ 26. Next, we prove that λ1,2 (T8 ) ≥ 27 and λ1,2 (T8 ) ≥ 28 by identifying two subgraphs of T8 where 27 and 28 colors, respectively, are not enough to color the edges. To prove our results, we explore the structural properties of T8 .
References
1. Bandopadhyay S, Ghosh SC, Koley S (2021) Improved bounds on the span of L(1, 2)-edge labeling of some infinite regular grids. Springer International Publishing, pp 53–65. https://doi.org/10.1007/978-3-030-63072-0_5
2. Calamoneri T (2011) The L(h, k)-labelling problem: an updated survey and annotated bibliography. Comput J 54(8):1344–1371
3. Calamoneri T (2015) Optimal L(j, k)-edge-labeling of regular grids. Int J Found Comput Sci 26(4):523–535
4. Chen Q, Lin W (2012) L(j, k)-labelings and L(j, k)-edge-labelings of graphs. Ars Comb 106:161–172
5. Georges J, Mauro D (2004) Edge labelings with a condition at distance two. Ars Comb 70
6. Griggs JR, Jin XT (2007) Real number labelings for paths and cycles. Internet Math 4(1):65–86
7. Griggs JR, Yeh RK (1992) Labelling graphs with a condition at distance two. SIAM J Discret Math 5(4):586–595
8. Hale W (1980) Frequency assignment: theory and applications. Proc IEEE 68(12):1497–1514. https://doi.org/10.1109/PROC.1980.11899
9. He D, Lin W (2014) L(1, 2)-edge-labelings for lattices. Appl Math J Chin Univ 29:230–240
10. Lin W, Wu J (2013) Distance two edge labelings of lattices. J Comb Optim 25(4):661–679
11. Roberts F (2003) Working group agenda. In: DIMACS/DIMATIA/Renyi working group on graph colorings and their generalizations
12. Yeh RK (2006) A survey on labeling graphs with a condition at distance two. Discret Math 306(12):1217–1231. https://doi.org/10.1016/j.disc.2005.11.029, www.sciencedirect.com/science/article/pii/S0012365X06001051
Classification of Texts with Emojis to Classify Sentiments, Moods, and Emotions
Sounava Pal, Sourav Mandal, and Rohini Basak
Abstract Here we propose a method to classify emoji-based content using both text and emoji images. In order to do this, we have investigated a number of machine learning and deep learning techniques for text and image classification. Two machine learning approaches (Multinomial Naive Bayes and Bernoulli Naive Bayes) and one BiLSTM-based deep learning approach are used for text classification into various sentiment, mood, and emotion labels, and a Zero-Shot image classification approach is used for classifying emoji images for the same purpose. The emoji classification method performs with an accuracy of 50%. Both machine learning approaches produce an accuracy of 27%. The BiLSTM model performs comparatively better with an accuracy of 30%. Keywords Text classification · Image classification · Emoji classification · BiLSTM
1 Introduction Social media sentiment analysis has become a hot research topic in recent years. Sentiment analysis looks for indications of a speaker's or writer's attitude. In addition to attitude or sentiment, a speech or a text might reveal a person's emotion, mood, personality trait, interpersonal posture, etc.; these are collectively called the affective states of a person. From a collection of tweets, comments on social media, or comments on a YouTube video, there is a lot of potential to express a variety of sentiments, moods, emotions, etc. Furthermore, tweets and comments become more fascinating when a person includes an emoji or a group of emojis. The use of the emoji conveys both the context of the comments and the S. Pal · R. Basak Jadavpur University, Kolkata, India S. Mandal (B) XIM University, Bhubaneswar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_21
user's sentiments, moods, or emotions. The emoji or group of emojis significantly captures the different affective states of the statement. Therefore, it is crucial to keep an eye on the emojis and attempt to interpret the tone or emotions of the remarks using the emojis that are utilized. If the text has an associated emoji, it is simpler to identify the author's intended sentiment, mood, or emotion. Once the emoji is considered, it can be used to forecast a variety of sentiments, moods, and emotions. Emojis, as was previously noted, not only convey the sentiment but also the moods and emotions of the source texts. The text and emoji classification tasks from "SemEval 2018 task 2" [1] are the ones that are discussed here. The goal of this challenge is to assign a tweet to the appropriate emoji label or class. Only tweets containing an emoji were considered for this challenge. Many teams used various strategies to properly estimate the class of tweets. The methods [1] combined various machine learning (ML) and deep learning (DL) techniques. They also employed thorough preprocessing procedures. The analysis of these works in the context of the competition is the primary driving force behind this work. Classifying text with emojis presents a challenge because emojis encompass such a wide spectrum of subjectivity. As a result, great efficiency in these kinds of multiclass classification assignments is challenging to achieve. Additionally, the preprocessing of the tweets is an element that needs attention because even small alterations have a significant impact on the outcomes. In this work, we examine the task outlined in the competition in our own unique style and assess the benefits and drawbacks of our efforts. For text classification and image classification, our method combines ML and DL. We completed a common preprocessing step for the ML and DL tasks. Preprocessing is the process of transforming the unprocessed tweet data into a format that is more comprehensible to the algorithms being used. Even so, there are a few stages after preprocessing that are distinct for the ML and DL techniques. No sentences are used in the process of classifying images; just the emojis' visual representations are used. Our proposed approach (see Fig. 1) examines and compares several ML and DL techniques, and it highlights some of the most effective techniques. As previously noted, we also used image classification [2] on the emoji pictures. To improve the overall accuracy, we suggest a method for aggregating or combining the text and image classification tasks. The proposed method makes use of a dataset that includes tweets and the emojis that go with them. The preprocessing step is applied on the tweet dataset. After that, we use ML and DL techniques to analyze the preprocessed data and apply text classification on them to predict the various sentiment, mood, and emotion labels associated with each tweet. Using the emojis from our existing dataset and the images we downloaded from the internet, we generate a new dataset for emoji classification. Our method of image classification uses this dataset. The tweets are used as the input for preprocessing. The elimination of links and punctuation follows. Then repeated letters are reduced so that at most two consecutive identical letters remain in each word. The spelling check is completed next. After applying stopword removal, PorterStemmer is used to stem the words. Before ML application, further procedures are employed following preprocessing.
The preprocessed data is extracted and represented as a bag
Fig. 1 Workflow of our proposed method for text and image classification
of words (BOW). Then, the Synthetic Minority Oversampling Technique (SMOTE) is used to equalize all of the classes' frequencies in order to remove the data imbalance. The data is then transformed into tf-idf format. The two ML approaches are then applied to the training data after the train-test splitting has been completed. The DL application differs slightly from ML. The preprocessed data is transformed into Glove embeddings in this instance. The BiLSTM model receives this embedding, and accuracy is calculated. We apply the GitHub-hosted Zero-Shot image classification module for the image classification task. No prior training is necessary for this strategy. The model is given the emoji images, and it makes the best guesses it can about the emojis. The overall objective of this work is to demonstrate how text and image-based data classification may be combined to yield improved classification outcomes that can be applied to a variety of issues in sentiment analysis. The remainder of the paper is structured as follows. The significant works on text classification using emojis are covered in Sect. 2. Section 3 gives further information about our proposed method, dataset, and guiding concepts. The experimental outcomes of the methods used in our work are discussed in Sect. 4 of the paper. Section 5 concludes with a discussion of some potential future research directions.
2 Related Work Many works are mentionable in the text and emoji classification domain. Alexa et al. [3] presented a Multinomial Naive Bayes (MNB) and Recursive Neural Network (RNN) implementation on the classification of tweet data. In the case of the RNN application, separate models were formed to classify a text and the probability of a model was checked for evaluation. Shunning et al. [4] implemented an ensemble learning method for this task. As base classifiers, they used MNB, Random Forest and
Logistic Regression. The given data is skewed, so oversampling is used for minority classes using SMOTE. Then they created a meta ensemble classifier using both the original and oversampled data. The weights are (3, 1) for Spanish and (4, 1) for English. After the evaluation, they understood that a little change in preprocessing produces better results. Liu et al. [5] employed the BiLSTM model with glove embedding for the classification task. They use Tweetokenize for preprocessing and replace the user mentions and URLs with special symbols. They observed that including more effective architectures in the BiLSTM network can improve its accuracy. Daniel et al. [6] followed a different approach for this task. They replaced emojis with text strings like smile, laugh etc. They used different linear and nonlinear classifiers. They used MNB, Logistic Regression and Linear SVM as linear classifiers. Then Random Forest and AdaBoost with Decision Tree Base were used. As part of DL, they used a 2-layer BiLSTM with a dropout rate of 0.35 and Adam optimizer. They also adapted CNN as it works faster than RNN. Glove embedding is used for RNN architecture. The task is incredibly challenging given the tweets are full of slang and ambiguous emoticons. As future work, deeper CNN can be implemented as Squeeze and Excitation networks for text. Taraka et al. [7] experimented and compared two methods–SVM and RNN. For SVM, they used a bag of word/character n-gram features and for RNN, they used word and character sequences as input. They observed that linear models, particularly SVMs, produce better results than deep neural models in a series of text classification tasks. Wiegand et al. [8] studied the use of emojis to detect abusive language. Felbo et al. [9] worked on emoji embeddings. They used a single pre-trained model to obtain state-of-the-art performance on 8 benchmark datasets with sentiment, emotion, and sarcasm detection. Saroufim et al. [10] experimented with a sentiment specific word embedding (SSWE) and showed that it works better than the pre-trained word embeddings like word2vec. This experiment was conducted on French tweets that were auto labeled by a set of positive and negative emojis. They also applied a transfer learning approach to refine the network weights with a small size manually labeled training data set. Irony detection and sentiment analysis are performed on tweets by Singh et al. [11]. They approached the emoticons differently. They used them as any other word embeddings and swap out the emojis for their textual descriptions instead of using any emoji embeddings at all. They accomplished this by using a straightforward neural network model to achieve new state-of-the-art performance. They were able to identify irony in the tweet with 81% for binary classification and 72% accuracy for four-way classification (verbal irony with polarity contrast, other verbal irony, situational irony, non-irony). Additionally, they achieved three-way sentiment classification accuracy of 70% (positive, neutral, or negative). Emoji usage in tweets was investigated by Donato et al. [12] who divided it into three categories: 1. Redundant or providing information that is already present in the text. 2. Non redundant, new information is included in the text, making it non-redundant. 3. Non redundant + POS, which is a subclass of the redundant class in which an emoji is used in place of a word. With the use of an interface they designed, people manually annotated these
tweets. They analyzed the dataset, pointed out issues with the annotations, and made recommendations for future enhancements. They provided a method for examining the relationship between the repetition of words, emoticons, and words in a tweet.
3 Proposed Methodology The emoji prediction challenge entails predicting an emoji from a text message that contains that emoji solely based on that message’s contents. This work specifically focuses on the single emoji that appears in tweets, using data from Twitter in the process. The example is like–“Last hike in our awesome camping weekend! .” It is an example of a tweet with an emoji at the end that was considered for the emoji classification task. The challenge calls for the use of tweets that contain one of the top 20 emoticons used most frequently in English. As a result, the task can be seen as a 20-label multiclass classification problem. Two ML, one DL, and one image classification methods make up our proposed methodology. Both the ML and DL techniques require the same preprocessing steps. Figure 2 provides a general understanding of the task.
Fig. 2 Proposed methodology for text (sentiments, moods, and emotions) and image (emoji) classification
Figure 2 shows the overview of our proposed method. To classify texts in various sentiments, moods, and emotions labels, we use the tweet dataset. Tweets are screened beforehand. Applications for ML and DL use the same data. The preprocessed data is subjected to a bag of words transformation for the ML technique. To correct the imbalance in the number of samples in the output classes, Synthetic Minority Oversampling Technique (SMOTE) is used. The data is then transformed using Tf-idf. ML is used on the train and test sets of data that have been divided. We employ Bernouli Naive Bayes and Multinomial Naive Bayes as our two classification algorithms to classify the tweets into their emoji labels. We employ the exact same preprocessed data for the DL strategy to achieve a better performance than the ML approaches on the tweet classification task. This data is transformed into a format for glove embedding. Train-test splitting is then carried out. The DL approach uses this training data as its input. We use Bidirectional Long Short-Term Memory (BiLSTM) as part of the DL approach. It calculates the accuracy of the training data after training on the data from the training set. The test set is then used for testing. For the image classification challenge, we combine the emojis that are available and download the emoji images from the internet to generate a new dataset. Then, this dataset is subjected to image classification. For emoji classification, we used the Zero-Shot Image classification approach. This technique categorizes an emoji image into categories. This is how our proposed approach would generally operate.
3.1 Dataset The renowned SemEval 2018 task-2 dataset is the one used for the task. The dataset1 made available for the contest is used. The dataset in this case consists of 550 million English tweets and 100 million Spanish tweets each of which contains one emoji. Predicting the emojis used in the tweets was the goal of the tournament. The top 20 emojis used most frequently in both languages were taken into consideration for the task. The dataset consists of the following files.
3.1.1
Mapping
The mapping of the emojis, together with their written descriptions and IDs, are all stored in a mapping file. Instead of using the real emojis for the assignment, the ids of the emojis are used. The mapping between English tweets and emojis are described in Fig. 3.
1
https://github.com/fvancesco/Semeval2018-Task2-Emoji-Detection/tree/master/dataset.
Fig. 3 Mapping emojis with the description (of dataset)
3.1.2
Tweet Dataset
The actual tweets that were used for the task are contained in this dataset file. 489,609 tweets in all were included in the file. Since these are tweets from genuine users, the language is quite expressive; for example, they feature a wide variety of emotions, slang terms, spelling errors, repeated letters in words, etc.
3.1.3
Data Labels
The tweet file and the label file are in sync. Every tweet in the tweet file has an id or number linked with it. According to the mapping file mentioned above, the ids stand in for the emojis.
3.2 Preprocessing The initial application involves concatenating the tweet file, labels file, and actual emojis. The whole list appeared as illustrated in Fig. 4. The tweet data is an illustration of tweets from actual users. As a result, it contains several spelling errors, slang, links, user mentions, etc. The data must be converted into standard English language. On the data, we apply operations in steps. An empty corpus array is initially retained. The array is to be updated with the amended results. The outcome of the preprocessing task is produced by this array. The initial tweet data that was provided at the beginning is used for the process. Then, using the re-library, all the extraneous words from the data are removed, including user mentions, hashtag symbols, and other numbers. After that, duplicate letters within words are dealt with. The words are run through an algorithm that only permits a word to have a maximum of two identical letters. As a result, words like “awww” become “aww”. It reduces the number of spelling errors. The spelling checks are completed next. The incorrectly spelled words are transformed into their correct forms. It is accomplished by using the correct() method of the Python Textblob module. It accepts the word to be fixed and outputs the fixed word. It can occasionally change a misspelled word into one that was not meant. Still, it generates excellent results in terms of accurate spellings. The stopwords are then eliminated. The NLTK stopwords module is used to compile the stopwords list. In this stage, it is determined whether each word in the dataset is one of the stopwords. The term is disregarded if it appears in the stopwords. The words are then subjected to stemming. The PorterStemmer module does it with the “stem” approach. This concludes the task of preprocessing. Following the completion of the preprocessing procedure, the corpus appears as illustrated in Fig. 5. Data is applied to ML and DL techniques after preprocessing. There are some more processes before the ML and DL application, as was previously described.
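The preprocessing steps described above can be summarized in a few lines of Python. The following is a minimal, illustrative sketch (not the authors' exact code); the small example list `tweets` is an assumption standing in for the raw tweet file, and it uses the re, TextBlob, NLTK stopwords and PorterStemmer utilities mentioned in this section.

```python
import re
from textblob import TextBlob
from nltk.corpus import stopwords          # nltk.download("stopwords") may be needed once
from nltk.stem import PorterStemmer

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(tweet):
    # drop links, user mentions and every non-letter character
    text = re.sub(r"http\S+|@\w+", " ", tweet)
    text = re.sub(r"[^A-Za-z]", " ", text).lower()
    # allow at most two consecutive identical letters, e.g. "awww" -> "aww"
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # spelling correction word by word with TextBlob's correct()
    words = [str(TextBlob(w).correct()) for w in text.split()]
    # stopword removal followed by Porter stemming
    return " ".join(stemmer.stem(w) for w in words if w not in stop_words)

# assumed sample input; in the paper this is the full tweet file
tweets = ["Last hike in our awesome camping weekend!!! @friend http://t.co/xyz"]
corpus = [preprocess(t) for t in tweets]
print(corpus)
```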
Fig. 4 The tweet file, the id of the emoji and the actual emojis
Fig. 5 Preprocessed tweet file
3.3 Proposed Machine Learning-Based Approaches The best and most well-known approaches for these kinds of classification jobs are ML approaches. These algorithms examine all the data, evaluate various probabilities, and then deliver the best result. These algorithms have an exceptionally low error rate, and the workflow is quite straightforward. There is also room for experimenting with various features and parameters and contrasting the results. Because of all these benefits, the ML techniques are an absolute necessity for these kinds of tasks. A few things need to be done to the data before training to make it ready. Machines only comprehend numerical data, since they cannot grasp any language, so the data must be transformed into a quantitative form for the training stage. The actions taken during this phase are listed below: vectorize the text data using 1. the BOW technique, then apply 2. the SMOTE technique and 3. the tf-idf transformation. 3.3.1
Vectorization Using BOW
The data is first vectorized using BOW in this stage. The process of transforming words into a vector form is known as the BOW approach. Consequently, the entire data set is represented as (sentence number, word vector). The countvectorizer module is imported to complete this step. After that, the countvectorizer technique is applied to the preprocessed tweet data and it is saved in a variable. Figure 6 depicts the results of these stages in their ultimate form.
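As a brief sketch of this step (assuming the `corpus` list from the preprocessing sketch and a parallel list `labels` of emoji ids), the BOW representation corresponds to scikit-learn's CountVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

# corpus: preprocessed tweets; labels: emoji ids 0-19 (both assumed from earlier steps)
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(corpus)    # sparse matrix of shape (n_tweets, vocab_size)
print(X_bow.shape)
```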
3.3.2
SMOTE Method
The classes are now seen to be unbalanced. That indicates that the values in the output classes are distributed unevenly. Figure 7 shows the imbalance in the dataset. With a total of 20 emoji classes, the red heart emoji accounts for more than a fifth of the samples. To ensure that the training is properly carried out, we must put into practice some method that ensures the number of class values are equal. The SMOTE, or Synthetic Minority Oversampling Technique, is used to address this imbalance problem. SMOTE addresses issues that arise from employing an unbalanced data set.
Fig. 6 BOW representation
It is vital to learn the skills required to work with this type of data because imbalanced data sets frequently appear in practice. This module is imported to run the SMOTE algorithm. The prior BOW transformer form of the data is then fitted and resampled. Each class increases in number until it equals the number of samples in the class with the majority.
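A minimal sketch of this oversampling step is shown below; it assumes the imbalanced-learn package (the usual Python implementation of SMOTE) and the `X_bow` and `labels` variables from the previous sketch.

```python
from imblearn.over_sampling import SMOTE

# oversample every minority emoji class until it matches the size of the majority class
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X_bow, labels)
```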
Fig. 7 The unequal number of emojis distributed
3.3.3
Tf-idf Transformation
The data must be transformed into tf-idf format as the last step before applying ML. Each word in the text is transformed into its tf-idf vector. As a result, every word in the text is understood as important while processing them. It serves as a score to emphasize the significance of each word across the entire text. The tf-idf method is used to do this operation. As mentioned in Sect. 2, the text classification by Alexa et al. [3] using Multinomial Naive Bayes (MNB) yielded good results. MNB was utilized by Mustakim et al. [13] to create a model that can recognize the sentiment of movie critic videos in both Tamil and Malayalam. In order to perform sentiment analysis and categorize code-mixed (English-Hindi, English-Spanish) social media content into positive, neutral, or negative categories, Zhu et al. [14] proposed an ensemble model utilizing MNB. When attempting to categorize the sentiment of tweets, Badr et al. [15] employed MNB to examine the significance of part-of-speech characteristics, sparse phrasal features, and combinations of unigrams and phrasal features. The MNB model performs the aforementioned tasks incredibly well. The Naive Bayes classifier methods are among the most effective ML algorithms for multiclass classification issues. In comparison to other algorithms, they perform better and need less training data. For categorical output classes like these, these algorithms work best. Additionally, they require less training time. For this work, we employ two Naive Bayes classification algorithms: MNB and Bernoulli Naive Bayes. The train set is made up of 80% of the data, and the test set is made up of the remaining 20%. In this work, we kept the train-test split ratio at 80% and 20% for all the training assignments. Then the ML models MNB and Bernoulli Naive Bayes are used. The accuracy of both models is almost the same, producing 27% for each.
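The tf-idf transformation, the 80/20 split and the two Naive Bayes models can be sketched as follows; the 80/20 ratio is taken from the text and the variable names continue the earlier sketches.

```python
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score

X_tfidf = TfidfTransformer().fit_transform(X_res)        # re-weight counts by tf-idf
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y_res, test_size=0.2, random_state=42)      # 80% train / 20% test

for model in (MultinomialNB(), BernoulliNB()):
    model.fit(X_train, y_train)
    print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))
```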
3.4 Deep Learning Approach (BiLSTM-Based) The classification of the text was performed using a bidirectional LSTM model. Applying the model involves several stages. Preprocessing follows the same steps as those described in prior sections. The identical preprocessed data from step one is used in step two. Tokenizing the text data comes first. The words are thus divided and handled as independent tokens. The words are separated by a comma for this phase. It is accomplished with the aid of Python's "re" package. The line "let's go there" appears as "let's", "go", and "there" after completing this step.
3.4.1
Padding
After that, padding is added to equalize the length of each sentence. For this, we assume that each sentence is 50 words long. This length is completely arbitrary. We need to include some symbols for the remaining length of the line because our sentence length is 50. For instance, if a line contains 20 words, there must be
some words or symbols to fill the remaining 30 spaces. The leftover positions after the actual sentence length are filled with the number "0"; the remaining characters are all zeroes. To accomplish this, we take each line of the input and fill in the remaining length with '0's. Following this modification, each line of the tweet data has a final padded length of 50.
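A small sketch of the tokenization and zero-padding described here is given below; the fixed length of 50 comes from the text, and `corpus` is assumed from the preprocessing step.

```python
MAX_LEN = 50  # fixed sentence length used in the text

def tokenize_and_pad(line):
    tokens = line.split()[:MAX_LEN]                  # split the line into word tokens
    return tokens + ["0"] * (MAX_LEN - len(tokens))  # fill the remaining positions with "0"

padded_corpus = [tokenize_and_pad(line) for line in corpus]
```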
3.4.2
Glove Word Embedding
Each word is transformed into a tokenizer index format. Each word is then transformed into a word:index format. The next step vocabulary building is aided by this step. Tokenizer module handles this step. The data from the previous stage is converted and fitted into the module. Our current objective is to vectorize every word of the text input into a predetermined vector format. Accordingly, each word is represented by a set of three-dimensional vectors. That assists the machine in determining the relationship between various words and the significance of various terms in deciding the text’s label. Glove word embedding [16] is the name of the vector embedding technique utilized. The Glove vector is a pre-made vector made up of vector representations of English vocabulary words. This implies that each word can be represented as a vector with a certain dimension. The 50d Glove embedding is employed in our work. The data from the tokenizer index is utilized to create the Glove embedding representation. The glove embedding is traversed, and it is determined whether the word is contained in the tokenizer index data. If it is, the vector is copied and maintained in accordance with the word. The words without vector embedding are disregarded. The primary dataset is then orientated with relation to word embeddings. To do this, every word in the dataset is examined to see if it has a Glove vector representation or not. If so, the word is copied along with the appropriate vector representation. Otherwise, 50 “0” are kept in place of the term. The remaining 50 “0 s” are preserved after the sentence’s length. After the entire transformation, the dataset is represented as a vector, with each line having 50 × 50 dimensions. Accordingly, each word is represented as a 50-dimensional vector and there are 50 words per line. The DL model uses this transformed data as its input. The entire procedure is depicted in Fig. 8.
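The following is a rough sketch of how such a 50-dimensional embedding matrix can be built with the Keras Tokenizer; the file name glove.6B.50d.txt is an assumption (any 50d Glove file can be substituted), and `corpus` is assumed from the earlier sketches.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

EMB_DIM, MAX_LEN = 50, 50

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)                     # build the word -> index vocabulary
sequences = pad_sequences(tokenizer.texts_to_sequences(corpus),
                          maxlen=MAX_LEN, padding="post", value=0)

glove = {}                                         # load the pre-trained 50d Glove vectors
with open("glove.6B.50d.txt", encoding="utf-8") as fh:
    for line in fh:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

# row i of the matrix holds the Glove vector of the word with tokenizer index i;
# words without a Glove vector keep an all-zero row
embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, EMB_DIM))
for word, idx in tokenizer.word_index.items():
    if word in glove:
        embedding_matrix[idx] = glove[word]
```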
3.4.3
BiLSTM Model
The DL model uses a Bidirectional Long Short-Term Memory (BiLSTM) [16]. The model can be constructed in various layers and, as the name implies, runs “deep” into the input over numerous epochs or turns. It highlights the relationships between various inputs and outputs as well as between input classes. As its name implies, this BiLSTM network also has the benefit of memory; that is, it can store the key information from input sequences and learns what to memorize and what not to. Additionally, the BiLSTM model concatenates both outputs after analyzing the input in both directions (ahead and backward). It is essential to utilize these models because
Fig. 8 Glove word embedding of input text data
of their exceptional success rates in these kinds of multiclass classification tasks. For each input, we supply a word embedding of size 50 × 50. The model attempts to predict each input and classify it into one of the 20 output classes. The input is supplied while a bidirectional unit of 100 is added. A dropout of 0.2 is then added. Then, a 50-unit BiLSTM layer is added again. The output layer is then included, with the activation function being "ReLU". Categorical cross-entropy served as the loss function and the Adam optimizer was employed. The model is thus trained. Figure 9 depicts the proposed DL model for tweet classification into 20 different classes using BiLSTM.
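With the settings listed above (bidirectional units of 100 and 50, dropout 0.2, Adam, categorical cross-entropy), a Keras sketch of the network could look as follows. The softmax output activation is our assumption for the 20-class output (the text mentions ReLU); the embedding matrix and padded sequences come from the previous sketch.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

model = Sequential([
    # frozen Glove embedding built in the previous sketch (50 words x 50 dimensions per tweet)
    Embedding(input_dim=embedding_matrix.shape[0], output_dim=50,
              weights=[embedding_matrix], input_length=50, trainable=False),
    Bidirectional(LSTM(100, return_sequences=True)),   # first BiLSTM layer
    Dropout(0.2),
    Bidirectional(LSTM(50)),                           # second BiLSTM layer
    Dense(20, activation="softmax"),                   # one unit per emoji class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# training then proceeds with model.fit on the padded sequences and one-hot encoded labels
```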
3.5 Image (Emoji) Classification Approach Emojis can also be read as images. Emojis could outperform sentences if they are treated as images and given the right classification. We can identify and anticipate an emoji in this way as well. In this method, we consider the emojis to be images and attempt to properly categorize them. The same dataset is utilized here. The top 20 English-language emojis used on Twitter made up the dataset we use. We made a file with the images of the emojis and the actual emojis after downloading the photos related to those emojis. As a result, we have the image file and the appropriate emojis, as well as the tweet file. The Zero-Shot image classification process [2] is employed for image classification. It is a GitHub-hosted pre-trained module. It forecasts the probabilities of each label based on the image using an input of an image and a few labels. That image can have the label with the highest score applied to it. The overview of our approach of classifying images is shown in Fig. 10. We compile the list of emojis from our initial dataset, which included tweets and emojis.
Fig. 9 Proposed BiLSTM-based neural network model for tweet classification
We then construct a dataset with the photos and the actual emojis after downloading the images of these emojis from the internet. The zero-shot image classification technique [2] is then used on the images. We include a description for each emoji in our dataset’s “mapping” file (see Fig. 3). These labels are supplied by us as input. For each of the supplied photos, the image classification algorithm tries to predict the actual label. It accomplishes this by giving each of those labels a score. The label with the highest score is then taken into consideration as the actual label for that image. Finally, we evaluate the model’s accuracy by contrasting the outcomes with the actual emojis.
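The classification itself relies on the GitHub-hosted Zero-Shot module [2]; as a hedged sketch of the same idea (not the exact module used in the paper), the Hugging Face transformers zero-shot image-classification pipeline with a CLIP checkpoint can score each emoji image against the textual descriptions from the mapping file. The model name, the image path, and the shortened label list below are illustrative assumptions.

```python
from transformers import pipeline
from PIL import Image

# textual descriptions of the 20 emojis from the mapping file (only a few shown here)
candidate_labels = ["red heart", "smiling face with heart eyes",
                    "face with tears of joy", "fire", "two hearts"]

classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")   # assumed CLIP checkpoint

image = Image.open("emoji_images/red_heart.png")              # hypothetical image file
scores = classifier(image, candidate_labels=candidate_labels)
best = max(scores, key=lambda s: s["score"])                  # highest-scoring label wins
print(best["label"], best["score"])
```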
Fig. 10 Overview of our image (emoji) classification approach
Table 1 Accuracy of different approaches

Our approaches            Accuracy (%)
BiLSTM                    29.455
Multinomial Naive Bayes   27
Bernoulli Naive Bayes     27
Image classification      50
4 Experimental Results This section discusses the outcomes of the proposed strategies. The task consists of three different sorts of operations: image classification, machine learning, and deep learning. Multinomial Naive Bayes and Bernoulli Naive Bayes are applied for machine learning, BiLSTM is employed for deep learning, and a method for classifying images is also employed. Table 1 displays the results in terms of accuracy. Our model performs well for image classification. The only issue is that the classifier had some difficulty categorizing the many types of emoji faces, such as winking faces, laughing faces, sobbing faces, etc.; it labels most such cases as winking faces. Apart from that, it functions remarkably accurately. Two ML and one DL techniques are employed to categorize the text. The work is difficult because there is a large number (20) of output classes and the data consists of raw Twitter text. However, by experimenting with various parameters, our algorithm attains 30% accuracy. The accuracy of the top-performing teams is higher than this. It is acknowledged that additional consideration should be given to preprocessing while performing such tasks. Sometimes, punctuation or other signals that are retained in the data improve the accuracy. Additionally, several preprocessing tools can be employed, such as "Ekphrasis" or "Tweetokenizer". Various algorithms, including SVM, Random Forest, MNB, RNN, CNN, BiLSTM, and many more, have also been used for this task. Our approach performs better on image classification than on text classification. If the labels are not as detailed, it may yield better results; the classifier shows some difficulties in identifying the variations of a common emoji type. For instance, emojis that looked similar to the classifier included smiling faces, smiling faces with tears, winking faces, and others. Therefore, fewer labels with more specific information will result in better outcomes. However, the outcomes are decent, and the classifier is error-free in cases that are clear-cut.
5 Conclusions and Future Work This work presents a method for categorizing texts based on the emojis they include. A preset dataset from SemEval 2018 task 2 is utilized to address the task. The dataset includes tweets in both English and Spanish together with the corresponding emojis. The tweets and the emoticons are separated. Only English-language tweets are used
for this task. While working with the tasks, it became clear that the language used in the tweets is quite informal and that the users choose how to communicate their emotions. Emojis are used as well, which adds to the difficulty; due to the emoji's broad range, there may also be many aspects to comprehend. In the future, these two methods could be combined or aggregated to get a more accurate outcome. For instance, combining the results of text and image classification would increase our overall accuracy. Progress in this emoji-related area will greatly benefit researchers in the field of NLP, and this will open a lot of opportunities for future studies. A wide spectrum of subjectivity, context, emotion, information, etc. is produced by emojis. Emoji processing will be a fascinating research topic and beneficial for the public in the foreseeable future.
References 1. Barbieri F, Camacho-Collados J, Ronzano F, Anke LE, Ballesteros M, Basile V, Patti V Saggion H (2018) Semeval 2018 task 2: multilingual emoji prediction. In: Proceedings of the 12th international workshop on semantic evaluation, pp 24–33. (Jun 2018) 2. Available at: https://www.kaggle.com/code/nulldata/emoji-classifier-zero-shot-image-classific ation. Accessed Jan 2022 3. Alexa L, Lorent AB, Gifu D, Trandabat D (2018) The dabblers at semeval-2018 task 2: multilingual emoji prediction. In: Proceedings of the 12th international workshop on semantic evaluation, pp 405–409. (Jun 2018) 4. Jin S, Pedersen T (2018) Duluth urop at semeval-2018 task 2: multilingual emoji prediction with ensemble learning and oversampling. arXiv:1805.10267 5. Liu M (2018) EmoNLP at SemEval-2018 task 2: english emoji prediction with gradient boosting regression tree method and bidirectional LSTM. In: Proceedings of the 12th international workshop on semantic evaluation, pp 390–394. (Jun 2018) 6. Kopev D, Atanasov A, Zlatkova D, Hardalov M, Koychev I, Nikolova I, Angelova G (2018) Tweety at semeval-2018 task 2: predicting emojis using hierarchical attention neural networks and support vector machine. In: Proceedings of the 12th international workshop on semantic evaluation, pp 497–501. (Jun 2018) 7. Çöltekin Ç, Rama T (2018) Tübingen-oslo at SemEval-2018 task 2: SVMs perform better than RNNs in emoji prediction. In: Proceedings of the 12th international workshop on semantic evaluation, pp 34–38. (Jun 2018) 8. Wiegand M, Ruppenhofer J, Kleinbauer T (2019) Detection of abusive language: the problem of biased datasets. In: Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, vol 1 (Long and Short Papers), pp 602–608. (Jun 2019) 9. Felbo B, Mislove A, Søgaard A, Rahwan I, Lehmann S (2017) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv:1708.00524 10. Saroufim C, Almatarky A, Hady MA (2018) Language independent sentiment analysis with sentiment-specific word embeddings. In: Proceedings of the 9th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp 14–23. (Oct 2018) 11. Singh A, Blanco E, Jin W (2019) Incorporating emoji descriptions improves tweet classification. In: Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, vol 1 (Long and Short Papers), pp 2096–2101. (Jun 2019)
12. Donato G, Paggio P (2017) Investigating redundancy in emoji use: study on a twitter based corpus. In: Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp 118–126. (Sep 2017) 13. Mustakim N, Jannat N, Hasan M, Hossain E, Sharif O, Hoque MM (2022) CUET-NLP@DravidianLangTech-ACL2022: exploiting textual features to classify sentiment of multimodal movie reviews. In: Proceedings of the second workshop on speech and language technologies for Dravidian languages, pp 191–198. (May 2022) 14. Zhu Y, Zhou X, Li H, Dong K (2020) Zyy1510 team at SemEval-2020 Task 9: sentiment analysis for code-mixed social media text with sub-word level representations. In: Proceedings of the fourteenth workshop on semantic evaluation, pp 1354–1359. (Dec 2020) 15. Badr BM, Fatima SS (2015) Using skipgrams, bigrams, and part of speech features for sentiment classification of twitter messages. In: Proceedings of the 12th international conference on natural language processing, pp 268–275. (Dec 2015) 16. Available at: https://www.kaggle.com/code/abhijeetstalaulikar/glove-embeddings-bilstm-sentiment-analysis/notebook. Accessed Jan 2022
A Study on Fractional SIS Epidemic Model Using RPS Method Rakesh Kumar Meena
and Sushil Kumar
Abstract The present study considers the fractional-order SIS epidemic model in the Caputo sense. The primary goal of this work is to obtain the semi-analytical solution of the resulting nonlinear system of fractional ODEs using the residual power series (RPS) approach. The obtained results, in terms of a convergent series, are compared with the Runge-Kutta method of order 4 (RK-4) for β = 1 and are represented graphically to demonstrate the reliability and accuracy of the RPS method. Further, we also discuss the characteristics of the proposed model for various values of the fractional order β ∈ (0, 1]. Keywords Caputo fractional derivative · Epidemiology · Fractional-order differential equations · Fractional-order SIS epidemic model · Fractional power series · Residual power series method
1 Introduction Historically, for developing countries, it is generally considered that the enemy of human health is an epidemic, which has been the cause of suffering and mortality over time. Epidemiology is the study of how diseases spread and how they affect people. It includes a lot of different fields, from biology to philosophy and sociology; all of these fields should be utilized to understand infection better and stop it from spreading. Recently, we discovered that fractional calculus has a wide range of applications in science and engineering, including fluid dynamics, viscoelastic systems, and solid dynamics [2, 3, 24, 29]. Although the fractional derivative operator is more difficult to grasp than the integer order, numerical techniques exist in the literature for solving R. K. Meena (B) · S. Kumar Department of Mathematics, S. V. National Institute of Technology Surat, Surat 395007, GJ, India e-mail: [email protected] S. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_22
systems of linear and nonlinear fractional differential equations (FDEs) [7, 11, 30, 32–34]. Fractional calculus is useful in epidemiology because it is crucial to the superdiffusive and subdiffusive processes [35]. The memory index appears to be a physical interpretation of the fractional order in fractional calculus, making it relevant to epidemiology [26]. Consider a population (S(t) + I (t) = N ) that remains constant and is divided into two groups: the susceptibles, denoted by S(t), who can catch the disease at time t; and the infectives, denoted by I (t), who are infected and can transmit the disease to the susceptibles at time t. Here, the rate of transfer between S(t) and I (t) is assumed to be constant over time. Also, demographic divisions based on mobility, age, sex, birth, death, or any other distinctions are not considered, even though they are undoubtedly significant. Since, from the modeling perspective, only the overall state of a person with respect to the disease is relevant, the progress of individuals is schematically described by the two-way scheme S(t) ⇌ I(t), where susceptibles become infective at rate r1 and infectives return to the susceptible class at rate r2.
These types of models are known as SIS models. Using the concept of the compartment model [19, 20], the SIS epidemic mathematical model can be formulated as
$$\frac{dS(t)}{dt} = -r_1 S(t)I(t) + r_2 I(t), \qquad \text{(1a)}$$
$$\frac{dI(t)}{dt} = r_1 S(t)I(t) - r_2 I(t), \qquad \text{(1b)}$$
with S(0) = S0 and I (0) = I0 . The positive constants r1 and r2 are called the infection and recovery (or susceptibility) rates, respectively. Nowadays, there are several studies ongoing using fractional derivatives in biological models. The study of mathematical epidemiological models has a long history, going back to the theory Kermack and McKendrick [19] established in the early 1900s. El-Saka [14] discussed the fractional SIS model with variable population size in 2014. Angstmann et al. [4] studied compartment models with fractional order in 2017. In 2018, the fractional SIS model was discussed by Hassouna et al. [16]. Hasan et al. [15] studied the SIR epidemic model with fractional order in 2019. In 2020, the SIS epidemic model with fractional order in the case of constant population size was discussed by Balzotti et al. [6]. Dynamics and numerical approximations for a fractional SIS epidemic model with a saturating contact rate were discussed by Hoang et al. [17] in 2020. A new fractional SIS epidemic model with a fear effect was discussed by Mandal et al. [25] in 2020. In 2021, Chen et al. [8] presented a review of epidemic models with fractional order. Jana et al. [18] discussed the fractional SIS model with saturated treatment and disease transmission in 2021. In 2021, the effects of fractional derivatives with different orders in the SIS model were also derived by Balzotti et al. [5]. Meena and Kumar [27] studied the SIR epidemic model with fractional order in 2022.
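Since the RK-4 solution of the classical model (1a)–(1b) is later used as the benchmark for β = 1, a brief sketch of integrating this system with the classical fourth-order Runge-Kutta scheme is given below; the parameter values r1 and r2, the initial data, the step size and the final time are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

r1, r2 = 0.003, 0.3              # assumed infection and recovery rates
S0, I0 = 499.0, 1.0              # assumed initial susceptibles and infectives (N = 500)
h, T = 0.1, 50.0                 # assumed step size and final time

def f(y):
    S, I = y
    return np.array([-r1 * S * I + r2 * I,      # dS/dt, Eq. (1a)
                     r1 * S * I - r2 * I])      # dI/dt, Eq. (1b)

y = np.array([S0, I0])
for _ in range(int(T / h)):
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)   # classical RK-4 update

print("S(T) =", y[0], "I(T) =", y[1])
```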
In this study, we have considered the classical and fractional-order SIS epidemic models of influenza disease. Here, we have used the non-dimensionalization technique to simplify the fractional SIS model and obtained the semi-analytical solution of the proposed model using the RPS approach. Further, we compare the results with RK-4 for β = 1, and give a graphical representation to show the reliability of the RPS method. We also show the impact of the fractional order β ∈ (0, 1] on the susceptibles (s(τ )) and infectives (i(τ )). The rest of the paper is organized as follows: Basic preliminaries are presented in Sect. 2. The convergence of the RPS method is given in Sect. 3. Sections 4 and 5 have the formulations of the dimensional and non-dimensional fractional SIS models, respectively. The stability analysis and the existence of the solution of the non-dimensional fractional SIS model are discussed in Sects. 6 and 7, respectively. Section 8 describes the procedure of the RPS method. Sections 9 and 10 are about the numerical problem and results, respectively, and finally, Sect. 11 concludes the paper.
2 Preliminaries
Definition 21 ([31]) Let z, β ∈ R+ ; then, for a function q(z), the Caputo derivative can be defined as
$${}^{C}_{a}D_{z}^{\beta} q(z) = \begin{cases} \dfrac{1}{\Gamma(p-\beta)} \displaystyle\int_{a}^{z} \dfrac{q^{(p)}(u)}{(z-u)^{\beta+1-p}}\, du, & \text{if } (p-1) < \beta < p \in \mathbb{N},\\[1mm] \dfrac{d^{p}}{dz^{p}} q(z), & \text{if } \beta = p \in \mathbb{N}. \end{cases}$$
It should be noted that ${}^{C}_{0}D_{z}^{\beta}(C) = 0$, where C is a constant.
Definition 22 ([13]) The fractional power series (FPS) about z = z0 can be defined as
$$\sum_{k=0}^{\infty} j_k (z-z_0)^{k\beta} = j_0 + j_1 (z-z_0)^{\beta} + j_2 (z-z_0)^{2\beta} + \cdots; \quad (p-1) < \beta \le p,\ p \in \mathbb{N},\ z \ge z_0,$$
where the constants jk , k = 0, 1, 2, . . . are the coefficients of the FPS.
Theorem 1 ([13]) At z = z0 , the FPS representation of the function q(z) can be written as
$$q(z) = \sum_{k=0}^{\infty} j_k (z-z_0)^{k\beta}, \qquad z_0 \le z < z_0 + T,$$
where T is the radius of convergence. If ${}^{C}_{0}D_{z_0}^{k\beta} q(z)$, for k = 0, 1, 2, . . ., are continuous on (z0 , z0 + T ), then $j_k = \dfrac{{}^{C}_{0}D_{z_0}^{k\beta} q(z_0)}{\Gamma(1+k\beta)}$, where ${}^{C}_{0}D_{z_0}^{k\beta} = {}^{C}_{0}D_{z_0}^{\beta}\,{}^{C}_{0}D_{z_0}^{\beta}\cdots{}^{C}_{0}D_{z_0}^{\beta}$ (k times).
Proposition 21 ([31]) Let q(z) = z p , z ≥ 0, then for function q(z) the Caputo derivative can be defined as C β p 0 Dz z
=
Γ ( p+1) z p−β , Γ ( p+1−β)
0,
if p ≥ β, if p < β.
Proposition 22 ([31]) For the continuous functions q1 (z) and q2 (z) for ( p − 1) < β ≤ p, z ≥ 0, we have
C β 0 Dz .
j1 q1 (z) + j2 q2 (z) = j1 .C0 Dzβ q1 (z) + j2 .C0 Dzβ q2 (z),
where j1 and j2 are constants.
3 Convergence of RPS Method q(t) =
∞
jr (t − t0 )rβ , qm (t) =
r =0
m
jr (t − t0 )rβ , t0 ≤ t < (t0 + T ).
r =0
Theorem 2 ([13]) When 0 < M < 1, |qm+1 (t)| ≤ M|qm (t)|, ∀m ∈ N and 0 < t < T < 1, then the series of numerical solutions converge to an exact solution. Proof We consider ∞ |q(t) − qm (t)| = qr (t) ≤
r =m+1 ∞
|qr (t)|, ∀ 0 < t < T < 1.
r =m+1
∞ ≤ | j0 | Mr r =m+1
M m+1 = | j0 | → 0 as m → ∞. (1 − M)
A Study on Fractional SIS Epidemic Model Using RPS Method
297
Theorem 3 ([13]) When −∞ < t < ∞, the classical power series expansion r∞=0 ∞ rβ jr t r has a radius of convergence ρ, then the FPS r =0 jr t , t ≥ 0 has a radius of 1 convergence ρ β .
4 Fractional SIS Epidemic Model Mathematical models are used not only in a particular area of research and engineering but also in other fields. It can be claimed that compartment models serve as the basis for mathematical epidemiology. The form of epidemic models is influenced by how people move between compartments within a population. This SIS epidemic model is made using the following assumptions: a. In a closed environment (i.e., no emigration and no immigration), the disease spreads with no death and birth in the population; hence the total population (N ) remains constant. b. When an infected person is placed in the susceptible compartment and encounters enough susceptibles, the number of newly infected people per unit of time is r1 S(t), where r1 is the infection rate and t is the period. At time t, there are a total of r1 S(t)I (t) newly infected individuals. c. The number of recovered at time t equals r2 I (t), where r2 is the constant recovery (or susceptibility) rate. The SIS epidemic model with distinct fractional order can be formulated as C β1 0 Dt S(t) C β2 0 Dt I (t)
= −r1 S(t)I (t) + r2 I (t),
(2a)
= r1 S(t)I (t) − r2 I (t).
(2b)
Subject to S(0) = S0 , I (0) = I0 .
(3)
And S(t) + I (t) = N ; N is the total size of the particular region’s population, which remains constant. Where 0 ≤ t < T , βi ∈ (0, 1], ∀ i = 1, 2. Positive constants r1 and r2 are the infections and recovery (or susceptible) rates, respectively, for Eqs. (2a) and (2b). β1 C β2 The terms C 0 Dt S(t) and 0 Dt I (t) are called the Caputo derivatives for S(t) and I (t) of orders β1 and β2 , respectively.
298
R. K. Meena and S. Kumar
5 Non-dimensional Fractional SIS Epidemic Model 0 0 Using the non-dimensionalized parameters s(τ ) = S(t)−S , i(t) = I (t)−I , and τ = Tt S0 I0 in Eqs. (1a) and (1b), the non-dimensional integer-order SIS mathematical model can be written as
ds(τ ) = −w1 s(τ )i(τ ) − w1 s(τ ) + w2 i(τ ) + w2 , dτ di(τ ) = w3 s(τ )i(τ ) + w3 s(τ ) + w4 i(τ ) + w4 . dτ
(4a) (4b)
Similarly, the non-dimensional fractional-order SIS model can be written as C β1 0 Dt s(τ ) = −w1 s(τ )i(τ ) − w1 s(τ ) + w2 i(τ ) + w2 , C β2 0 Dτ i(τ ) = w3 s(τ )i(τ ) + w3 s(τ ) + w4 i(τ ) + w4 .
(5a) (5b)
Subject to s(0) = 0, i(0) = 0, where s(0) and w1 = r1 T I0 , w2 =
r 2 T I0 S0
(6)
di(τ ) ds(τ ) + i(0) = 0, dτ dτ
− r1 T I0 , w3 = r1 T S0 , w4 = r1 T S0 − r2 T , 0 ≤ τ < β
β
1 2 C 1, βi ∈ (0, 1], i = 1, 2. The terms C 0 Dt s(τ ) and 0 Dt i(τ ) are called the Caputo derivatives for s(τ ) and i(τ ) of orders β1 and β2 , respectively.
6 Stability Analysis of the Non-dimensional Fractional SIS Epidemic Model An effective way to assess the stability of nonlinear systems is to use the Lyapunov direct technique [22, 23]. Fractional derivatives, however, cannot be used with the well-Leibniz rule. This method is very effective for verifying the suggested model’s stability. From Eqs. (5a), (5b) and (6), rewriting the systems of FDEs in the form of matrices as follows C βi 0 Dτ
X (τ ) = AX (τ ) + f (τ, X (τ )),
X (0) = X 0 ,
T where βi ∈ (0, 1], i = 1, 2, X (τ ) = s(τ ), i(τ ) , X 0 = (s0 , i 0 )T ,
T −w1 w2 . f (τ, X (τ )) = −w1 s(τ )i(τ ) + w2 , w3 s(τ )i(τ ) + w4 , A = w3 w4
A Study on Fractional SIS Epidemic Model Using RPS Method
299
We choose a Lyapunov function (i.e., V (τ )) candidate V (τ ) = X T P X = X T X , where P = I2 is a constant, square, symmetric, and positive definite matrix. A straightforward computation shows that 1 C βi βi Dτ V (τ ) ≤ X T (τ )P C 0 Dτ X (τ ), ∀βi ∈ (0, 1], ∀τ ≥ 0, 20 C D βi V (τ ) ≤ 2X T (τ )P C D βi X (τ ), 0 τ 0 τ
T = AX (τ ) + f (τ, X (τ )) X (τ ) + X T (τ ) AX (τ ) + f (τ, X (τ )) ,
= − 2w1 s 2 (τ )i(τ ) − 2w1 s 2 (τ ) + 2w2 s(τ )i(τ ) + 2w2 s(τ ) + 2w3 s(τ )i 2 (τ ) + 2w3 s(τ )i(τ ) + 2w4 i 2 (τ ) + 2w4 i(τ ). β
i This expression C 0 Dt V (τ ) < 0, if (i) i(τ ) > −1 and w1 >
w3 i(τ ) 4 i(τ ) (ii) i(τ ) < −1 and w1 < w2 s(τs)+w . + 2 (τ ) s(τ )
w2 s(τ )+w4 i(τ ) w3 i(τ ) + s(τ ) s 2 (τ )
.
βi Therefore, C 0 Dτ V (τ ) is negatively defined, which means that the suggested model’s trivial solution is Mittag-Leffler stability in accordance with Theorem (3.1) in [23]. The trivial solution of the suggested model is also asymptotically stable, according to Remark (3.1) in [23].
7 Existence of the Uniformly Solution of the Non-dimensional Fractional SIS Epidemic Model Recent works on initial and boundary value problems for nonlinear FDEs have focused on the existence and uniqueness of solutions [9, 10, 14]. In these works, we state
C β1 0 Dτ s(τ ) = f 1 s(τ ), i(τ ) = −w1 s(τ ) C β2 0 Dτ i(τ ) = f 2 s(τ ), i(τ ) = w3 s(τ )
+ w2 i(τ ) + w2 − w1 s(τ )i(τ ), + w4 i(τ ) + w4 + w3 s(τ )i(τ ),
with τ > 0, βi ∈ (0, 1], i = 1, 2, s(0) = 0, and i(0) = 0. Let D = {s(τ ), i(τ ) ∈ R : |s(τ )| ≤ a, |i(τ )| ≤ b, τ ∈ [0, 1]} then on D we have ∂ f (s, i) 1 ≤ k1 , ∂s ∂ f (s, i) 2 ≤ k3 , ∂s
∂ f (s, i) 1 ≤ k2 , ∂i ∂ f (s, i) 2 ≤ k4 , ∂i
where a, b, k1 , k2 , k3 , and k4 are positive constants. This implies that each of the two following functions f 1 , f 2 satisfies the Lipschitz condition with respect to the two arguments s(τ ) and i(τ ), then each of the two
300
R. K. Meena and S. Kumar
functions f 1 , f 2 is absolutely continuous with respect to the two arguments s(τ ) and i(τ ). Finally, it can be concluded that the proposed model satisfies the existence of the uniformly solution.
8 Solution of the Non-dimensional Fractional SIS Epidemic Model Using RPS Method We use the RPS method [1, 12, 21, 28, 36] to solve the fractional non-dimensional SIS epidemic model in Eqs. (5a), (5b) and (6). Following are the steps of the RPS method. Step 1:
The FPS for s(τ ) and i(τ ) about τ = 0 can be written as s(τ ) =
∞
∞
br τ rβ2 ar τ rβ1 , i(τ ) = , 0 ≤ τ < 1. Γ (1 + rβ1 ) Γ (1 + rβ2 ) r =0 r =0
(7)
The mth-term truncated series of s(τ ) and i(τ ) denoted by sm (τ ) and i m (τ ), respectively, are defined as sm (τ ) =
m r =0
br τ rβ2 ar τ rβ1 , i m (τ ) = . Γ (1 + rβ1 ) Γ (1 + rβ2 ) r =0 m
(8)
For m = 0, using the initial conditions from Eq. (6), we have s0 (τ ) = a0 = s0 (0) = s0 = 0, i 0 (τ ) = b0 = i 0 (0) = i 0 = 0.
(9)
Now, from Eqs. (8) and (9) the mth-truncated series of Eq. (8) can be defined as sm (τ ) =
m r =1
br τ rβ2 ar τ rβ1 , i m (τ ) = . Γ (1 + rβ1 ) Γ (1 + rβ2 ) r =1 m
(10)
Step 2: The residual functions, for the SIS epidemic model of distinct fractional order from Eqs. (5a) and (5b) could be defined as β
1 Ress (τ ) = C 0 Dt s(τ ) + w1 s(τ )i(τ ) + w1 s(τ ) − w2 i(τ ) − w2 ,
Resi (τ ) =
C β2 0 Dt i(τ )
− w3 s(τ )i(τ ) − w3 s(τ ) − w4 i(τ ) − w4 .
(11a) (11b)
Hence, the mth-residual functions of s(τ ) and i(τ ), respectively, are β1 Ressm (τ ) = C 0 Dτ sm (τ ) + w1 sm (τ )i m (τ ) + w1 sm (τ ) − w2 i m (τ ) − w2 , (12a) β2 Resim (τ ) = C 0 Dτ i m (τ ) − w3 sm (τ )i m (τ ) − w3 sm (τ ) − w4 i m (τ ) − w4 .
(12b)
A Study on Fractional SIS Epidemic Model Using RPS Method
301
For an approximate solution, Ress (τ ) = Resi (τ ) = 0, ∀τ ≥ 0. Also, lim Ressm (τ ) = Ress (τ ),
m→∞
lim Resim (τ ) = Resi (τ ).
m→∞
Since any constant has a zero Caputo derivative, we have [12] C ( p−1)β1 Ress (0) 0 Dτ C ( p−1)β2 Resi (0) 0 Dτ
( p−1)β1 =C Ress p (0), 0 Dτ ( p−1)β2 =C Resi p (0) 0 Dτ
∀ p = 1, . . . , m.
Step 3: To determine the coefficients ar and br for r = 1, 2, 3, . . . , m, we substitute the mth-truncated series of s(τ ) and i(τ ) in Eqs. (12a) and (12b), and then (m−1)β1 (m−1)β2 and D0 on Ress (τ ) use the Caputo fractional derivative operators D0 and Resi (τ ), respectively. It gives the equations C β1 0 Dτ Ressm (0)
= 0,
C β2 0 Dτ Resi m (0)
= 0, ∀m = 1, 2, 3, . . .
(13)
Step 4: Now, Eq. (13) is solved to get the values of ar and br for r = 1, 2, 3, . . . , m. Step 5: To achieve an adequate number of coefficients, repeat the process. The series solution’s accuracy can be improved by evaluating more terms in Eq. (10).
9 Numerical Scheme and Its Solution 9.1 Numerical Scheme Considering a small population suffering from infectious diseases capable of infecting a large population with rates r1 = 0.001, r2 = 0.02, and T = 1 with S0 = 10 and I0 = 90. For these values, we have w1 = 0.09, w2 = 0.09, w3 = 0.01, and w4 = −0.01. Also, fractional order β1 = β2 = β ∈ (0, 1].
9.2 Using the RPS Method for Solving the Non-dimensional SIS Epidemic Model of Fractional-Order β From Eq. (10), the 1st truncated power series approximation is given as s1 (τ ) = a0 +
a1 τ β b1 τ β , i 1 (τ ) = b0 + . Γ (1 + β) Γ (1 + β)
Using Step (3), 1st residual functions of s(τ ) and i(τ ) are obtained as
302
R. K. Meena and S. Kumar β Ress1 (τ ) = C 0 Dτ s1 (τ ) + 0.09s1 (τ )i 1 (τ ) + 0.09s1 (τ ) − 0.09i 1 (τ ) − 0.09, β Resi1 (τ ) = C 0 Dτ i 1 (τ ) − 0.01s1 (τ )i 1 (τ ) − 0.01s1 (τ ) + 0.01i 1 (τ ) + 0.01.
Substituting s1 (τ ) and i 1 (τ ) in the above expression and equating Ress1 (0) and Resi1 (0) to zero gives the values of a1 and b1 as a1 = w2 = 0.09, b1 = w4 = −0.01. Hence, s1 (τ ) and i 1 (τ ) can be written as s1 (τ ) =
0.09τ β 0.01τ β , i 1 (τ ) = − . Γ (1 + β) Γ (1 + β)
From Eq. (10), the 2nd truncated power series approximation is given as s2 (τ ) =
a2 τ 2β b2 τ 2β 0.09τ β 0.01τ β + , i 2 (τ ) = − + . Γ (1 + β) Γ (1 + 2β) Γ (1 + β) Γ (1 + 2β)
Now, using the Step 2 to Step 4 as discussed above, we obtain a2 = −0.009, b2 = 0.001. Thus, s2 (τ ) and i 2 (τ ) can be written as s2 (τ ) =
0.009τ 2β 0.001τ 2β 0.09τ β 0.01τ β − , i 2 (τ ) = − + . Γ (1 + β) Γ (1 + 2β) Γ (1 + β) Γ (1 + 2β)
The other coefficients of the truncated series can be obtained using the following recurrence relations for i = 1, 2 . . . m ai+1 = −w1
i r =0
bi+1 = w3
i r =0
ar bi−r Γ (1 + iβ) − w1 ai + w2 bi , Γ (1 + rβ)Γ (1 + (i − r )β)
ar bi−r Γ (1 + iβ) + w3 ai + w4 bi . Γ (1 + rβ)Γ (1 + (i − r )β)
10 Numerical Results and Discussion Absolute errors |sm+1 (τ ) − sm (τ )| and |i m+1 (τ ) − i m (τ )| are plotted in Fig. 2a, b for m = 1, 2, . . . , 45, β ∈ (0, 1], and τ = 1 to examine the convergence of the RPS solution with the number of terms in series. It is observed that both the errors are O(10−20 ) or smaller for 14 ≤ m ≤ 45.
A Study on Fractional SIS Epidemic Model Using RPS Method
303
Fig. 1 The SIS epidemic model
100
10-50
10-100
=1.00 =0.95 =0.90 =0.85 =0.80
0
=0.80 =1.00
20
40
|im+1( ) - im( )|
|sm+1( ) - sm( )|
100
10-50
10-100
=0.80
=1.00 =0.95 =0.90 =0.85 =0.80
0
=1.00
20
m
40
m sm (τ )
im (τ )
Fig. 2 The absolute error in RPS solution with m for different β and τ = 1
As shown in Fig. 3a, b, for a smaller value of m, the absolute errors are even lower for a smaller value of τ . The outcomes shown in these graphs are sufficient to prove the efficacy of the RPS method. It is obvious that the accuracy in the solution improves with the number of terms in the series. We use m = 15 only in subsequent calculations, as this is sufficient to achieve reasonable accuracy, i.e., s(τ ) ≈ s15 (τ ) =
15 r =0
br τ rβ ar τ rβ , i(τ ) ≈ i 15 (τ ) = . Γ (1 + rβ) Γ (1 + rβ) r =0 15
The SIS model discussed in Sect. 5 is also solve using RK-4 for β = 1, and calculate the absolute errors defined as Abss (τ ) = |RPS(s(τ )) − RK-4(s(τ ))|, Absi (τ ) = |RPS(i(τ )) − RK-4(i(τ ))|. The absolute solution obtained by RPS and RK-4 method with absolute error for s(τ ) and i(τ ) for different τ are listed in Tables 1 and 2, respectively. From Tables 1 and 2, it is observed that the maximum absolute errors in s(τ ) and i(τ ) are O(10−4 ) and O(10−7 ), respectively, which also conforms the efficiency of the RPS method. Figures 4a and 4b show the nature of s(τ ) and i(τ ), respectively, using the RK-4 and RPS method for β = 1 and m = 15 with step size τ = 0.1 over the interval
304
R. K. Meena and S. Kumar
Table 1 The approximate solution of s(τ ) using RPS and RK-4 methods for β = 1 τ RK-4(s(τ )) RPS(s(τ )) Abss (τ ) 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
0 0.008955176356773 0.017821405753004 0.026599727314000 0.035291165231827 0.043896729030420 0.052417413825036 0.060854200576190 0.069208056338211 0.077479934502550 0.085670775035975
0 0.008955203095691 0.017821617593566 0.026600435381231 0.035292827472230 0.043899944451592 0.052422916910176 0.060862855868166 0.069220853188015 0.077497981977135 0.085695296980644
0 2.674E-08 2.118E-07 7.0807E-07 1.662E-06 3.215E-06 5.503E-06 8.655E-06 1.280E-05 1.805E-05 2.452E-05
Table 2 The approximate solution of i(τ ) using RPS and RK-4 methods for β = 1 τ RK-4(i(τ )) RPS(i(τ )) Absi (τ ) 0 −0.000995019595197 −0.001980156194778 −0.002955525257111 −0.003921240581314 −0.004877414336713 −0.005824157091671 −0.006761577841799 −0.007689784037579 −0.008608881611394 −0.009518975003997
10-100
=0.80 =1.00
10
=1.00 =0.95 =0.90 =0.85 =0.80
-150
10-200
0
0.5
0 −0.000995019588567 −0.001980156088777 −0.002955524724718 −0.003921238912942 −0.004877410298440 −0.005824148789751 −0.006761562593243 −0.007689758246577 −0.008608840651382 −0.009518913105164
|im+1( ) - im( )|
|sm+1( ) - sm( )|
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
1
10-100
0 6.630E-12 1.060E-10 5.324E-10 1.668E-09 4.038E-09 8.302E-09 1.525E-08 2.579E-08 4.096E-08 6.190E-08
=0.80 =1.00
10
=1.00 =0.95 =0.90 =0.85 =0.80
-150
10-200
0
sm (τ ) Fig. 3 The absolute error in RPS solution with τ for different β and m = 45
0.5
1
im (τ )
A Study on Fractional SIS Epidemic Model Using RPS Method
305
0
0.1
RK-4 RPS
i( )
s( )
RK-4 RPS
0.05
0
0
0.5
1
-0.005
-0.01
0
s(τ )
0.5
1
i(τ )
Fig. 4 Approximate solution using RPS and RK-4 method with β = 1
τ ∈ [0, 1]. From these graphical results, it is clear that the approximations obtained by the RPS method are very efficient, and the efficiency can be achieved using a relatively small number of terms, 15 terms in our example. However, efficiency can be increased by increasing the number of terms in power that can accurately predict the behavior of s(τ ) and i(τ ) for the region. Moreover, to show the effect of the fractional SIS epidemic model, the graphs of the RPS solutions of s(τ ) and i(τ ) for different values of β ∈ (0, 1] are established in Fig. 5a, b, with step size τ = 0.1 over the interval τ ∈ [0, 1]. From these Fig. 5a, b, it is clear that compared to integer-order derivative, fractional-order derivative offers a higher degree of freedom. We obtained remarkable responses to the proposed model by taking the fractional derivative concept. However, the curves of various compartments s(τ ) and i(τ ) of the fractional SIS model approach those of the classical SIS model as the fractional order approaches the integer order. As the order of the fractional derivative increases, the amplitude to s(τ ) is decreased, and i(τ ) is increased, and the tendency is similar to the case β = 1. Results obtained from the fractional-order model will be more helpful for public health practitioners and policymakers to club together in efforts to enhance epidemic care. The approximate solutions of s(τ ) and i(τ ) using RPS method for β = 0.8 and 0.9 are also listed in Table 3. Also, using the RPS method with β = 0.8 and m = 15 and τ = 0.1 over the range τ ∈ [0, 1], Fig. 6 illustrates the nature of s(τ ) and i(τ ).
306
R. K. Meena and S. Kumar
Table 3 The approximate solutions of s(τ ) and i(τ ) using RPS method for β = 0.9 and β = 0.8 β = 0.9
β = 0.8
τ
RPS(s(τ ))
RPS(i(τ ))
RPS(s(τ ))
RPS(i(τ ))
0.00
0
0
0
0
0.10
0.011696240671015
−0.001299574579765
0.015158299602745
−0.001684236028459
0.20
0.021691058237923
−0.002410067917264
0.026193489475336
−0.002910286128361
0.30
0.031061404093563
−0.003451120045246
0.035985860317939
−0.003998163001627
0.40
0.040014852307632
−0.004445777691012
0.045015049336704
−0.005001147074185
0.50
0.048648318231975
−0.005404794530182
0.053494523804762
−0.005942947487653
0.60
0.057018604101790
−0.006334468884326
0.061544981606453
−0.006836967594922
0.70
0.065163290846987
−0.007238964651457
0.069243893326079
−0.007691809039259
0.80
0.073109225654315
−0.008121254127593
0.076645128418291
−0.008513452095272
0.90
0.080876637655617
−0.008983575139302
0.083788283142075
−0.009306291885729
1.00
0.088481379766116
−0.009827680015131
0.090703676263010
−0.010073693308679
0 =1.00 =0.95 =0.90 =0.85 =0.80
0.05
=1.00 =0.95 =0.90 =0.85 =0.80
=0.80
=1.00
i( )
s( )
0.1
-0.005
=1.00 =0.80
0
-0.01 0
0.5
1
0
0.5
s(τ )
1
i(τ )
Fig. 5 Approximate solution using RPS method for different β Fig. 6 Approximate solutions of s(τ ) and i(τ ) using RPS method for β = 0.8
=0.8
0.1
0
i( )
s( )
-0.005 0.05 -0.01 0
0
0.5
1
-0.015
A Study on Fractional SIS Epidemic Model Using RPS Method
307
11 Conclusion The non-dimensionalized fractional SIS model is easy to understand and analyze, and the domain belongs to [0, 1]. This paper uses the RPS method to examine the proposed SIS model. The RPS method makes the proposed model more straightforward and realistic for comprehending epidemic diseases’ dynamic behavior for τ ∈ [0, 1]. It has been found that the construction of the RPS technique possesses a very rapid convergent series due to the general formula of its coefficients, which can be determined after a few successive iterations. The behavior of the RPS solution seems to be extremely interesting. The obtained results demonstrate the reliability of the method. Some numeric interpretations and their graphical presentations have been provided. The solutions’ natural frequency varies with the fractional-order change β ∈ (0, 1]. The values of s(τ ) and i(τ ) are decreased and increased, respectively, when fractional order β is increased. These results indicate that the fractional-order SIS model in the epidemic is more beneficial and essential than the ordinary integer-order model. Acknowledgements The junior research fellowship (JRF), provided by the Council of Scientific & Industrial Research (CSIR), New Delhi, India, via file no: 09/1007(0011)/2021-EMR-I during the development of the work is acknowledged by the first author.
References 1. Abu-Gdairi R, Al-Smadi M, Gumah G (2015) An expansion iterative technique for handling fractional differential equations using fractional power series scheme. J Math Stat 11(2):29 2. Al-Smadi M, Freihat A, Hammad MA, Momani S, Arqub OA (2016) Analytical approximations of partial differential equations of fractional order with multistep approach. J Comput Theor Nanosci 13(11):7793–7801 3. Al-Smadi M, Freihat A, Khalil H, Momani S, Ali Khan R (2017) Numerical multistep approach for solving fractional partial differential equations. Int J Comput Methods 14(03):1750029 4. Angstmann CN, Erickson AM, Henry BI, McGann AV, Murray JM, Nichols JA (2017) Fractional order compartment models. SIAM J Appl Math 77(2):430–446 5. Balzotti C, D’Ovidio M, Lai AC, Loreti P (2021) Effects of fractional derivatives with different orders in SIS epidemic models. Computation 9(8):89 6. Balzotti C, D’Ovidio M, Loreti P (2020) Fractional SIS epidemic models and their solutions. arXiv:2004.12803 7. Changpin ZF, Liu FL (2012) Spectral approximations to the fractional integral and derivative. Fract Calc Appl Anal 15(3):383–406 8. Chen Y, Liu F, Yu Q, Li T (2021) Review of fractional epidemic models. Appl Math Model 97:281–307 9. Deng J, Deng Z (2014) Existence of solutions of initial value problems for nonlinear fractional differential equations. Appl Math Lett 32:6–12 10. Deng J, Ma L (2010) Existence and uniqueness of solutions of initial value problems for nonlinear fractional differential equations. Appl Math Lett 23(6):676–680 11. Diethelm K, Freed AD (1998) The fracpece subroutine for the numerical solution of differential equations of fractional order. Forschung Und Wissenschaftliches Rechnen 1999:57–71 12. El-Ajou A, Arqub OA, Al-Smadi M (2015) A general form of the generalized Taylor’s formula with some applications. Appl Math Comput 256:851–859
308
R. K. Meena and S. Kumar
13. El-Ajou A, Arqub OA, Zhour ZA, Momani S (2013) New results on fractional power series: theories and applications. Entropy 15(12):5305–5323 14. El-Saka H (2014) The fractional-order SIS epidemic model with variable population size. J Egypt Math Soc 22(1):50–54 15. Hasan S, Al-Zoubi A, Freihet A, Al-Smadi M, Momani S (2019) Solution of fractional SIR epidemic model using residual power series method. Appl Math Inf Sci 13(2):153–161 16. Hassouna M, Ouhadan A, El Kinani E (2018) On the solution of fractional order SIS epidemic model. Chaos Solitons Fract 117:168–174 17. Hoang MT, Zafar ZUA, Ngo TKQ (2020) Dynamics and numerical approximations for a fractional-order SIS epidemic model with saturating contact rate. Comput Appl Math 39(4):1– 20 18. Jana S, Mandal M, Nandi SK, Kar TK (2021) Analysis of a fractional-order SIS epidemic model with saturated treatment. Int J Model Simul Sci Comput 12(01):2150004 19. Kermack WO, McKendrick AG (1927) A contribution to the mathematical theory of epidemics. Proc R Soc A Math Phys Eng Sci 115(772):700–721 20. Kermack WO, McKendrick AG (1932) Contributions to the mathematical theory of epidemics. Proc R Soc A Math Phys Eng Sci 138(834):55–83 21. Komashynska I, Al-Smadi M, Al-Habahbeh A, Ateiwi A (2016) Analytical approximate solutions of systems of multi-pantograph delay differential equations using residual power-series method. arXiv:1611.05485 22. Korobeinikov SM, Kelly O, Thomas CA, O’Callaghan MJ, Pokrovskii AV (2010) Lyapunov functions for SIR and SIRS epidemic models. Appl Math Lett 23(4):446–448 23. Liu S, Jiang W, Li X, Zhou XF (2016) Lyapunov stability analysis of fractional nonlinear systems. Appl Math Lett 51:13–19 24. Machado JT, Kiryakova V, Mainardi F (2011) Recent history of fractional calculus. Commun Nonlinear Sci Numer Simul 16(3):1140–1153 25. Mandal M, Jana S, Nandi SK, Kar T (2020) Modelling and control of a fractional-order epidemic model with fear effect. Energy Ecol Environ 5(6):421–432 26. Maolin WZ, Hu HD (2013) Measuring memory with the order of fractional derivative. Sci Rep 3(1):1–3 27. Meena RK, Kumar S (2022) Solution of fractional order SIR epidemic model using residual power series method. Palest J Math 11 28. Moaddy K, Al-Smadi M, Hashim I (2015) A novel representation of the exact solution for differential algebraic equations system using residual power-series method. Disc Dyn Nat Soc 2015 29. Momani S, Arqub OA, Freihat A, Al-Smadi M (2016) Analytical approximations for FokkerPlanck equations of fractional order in multistep schemes. Appl Comput Math 15(3):319–330 30. Panda A, Santra S, Mohapatra J (2022) Adomian decomposition and homotopy perturbation method for the solution of time fractional partial integro-differential equations. J Appl Math Comput 68(3):2065–2082 31. Podlubny I (1998) Fractional differential equations: an introduction to fractional derivatives, fractional differential equations, to methods of their solution and some of their applications. Elsevier 32. Santra S, Mohapatra J (2022) Analysis of a finite difference method based on l1 discretization for solving multi-term fractional differential equation involving weak singularity. Math Methods Appl Sci 33. Santra S, Mohapatra J (2022) Numerical treatment of multi-term time fractional nonlinear KdV equations with weakly singular solutions. Int J Model Simul 1–11 34. Santra S, Panda A, Mohapatra J (2022) A novel approach for solving multi-term time fractional Volterra-Fredholm partial integro-differential equations. 
J Appl Math Comput 68(5):3545– 3563 35. Skwara U, Martins J, Ghaffari P, Aguiar M, Boto J, Stollenwerk N (2012) Fractional calculus and superdiffusion in epidemiology: shift of critical thresholds. In: Proceedings of the 12th international conference on computational and mathematical methods in science and engineering, La Manga
A Study on Fractional SIS Epidemic Model Using RPS Method
309
36. Wang L, Chen X (2015) Approximate analytical solutions of time fractional Whitham-BroerKaup equations by a residual power series method. Entropy 17(9):6519–6533
On Zero-Sum Two Person Perfect Information Semi-Markov Games Sagnik Sinha and Kushal Guha Bakshi
Abstract A zero-sum two-person Perfect Information Semi-Markov game (PISMG) under limiting ratio average pay-off has a value and both the maximiser and the minimiser have optimal pure semi-stationary strategies. We arrive at the result by first fixing an arbitrary initial state and forming the matrix of undiscounted pay-offs corresponding to each pair of pure stationary strategies of the two players and proving that this matrix has a pure saddle point. Keywords Semi-Markov games · Perfect information · (pure) Semi-stationary strategies
1 Introduction A semi-Markov game (SMG) is a generalisation of a Stochastic (Markov) game (Shapley [11]). Such games have already been studied in the literature (e.g. Lal- Sinha [1], Luque-Vasquez [2], Mondal [4]). Single player SMGs are called semi-Markov decision processes (SMDPs) which were introduced by Jewell [5] and Howard [16]. A perfect information semi-Markov game (PISMG) is a natural extension of perfect information stochastic games (PISGs) (Raghavan et al. [6]), where at each state all but one player is a dummy (i.e., he has only one available action in that state). Note that for such a game, perfect information is a state property. In this paper, we prove that such games (PISMGs) have a value and both players have pure semi-stationary optimal strategies under undiscounted (limiting ratio average) pay-offs. We prove this by showing the existence of a pure saddle point in the pay-off matrix of the game for each initial state. The paper is organised as follows. Section 2 contains Supported by Department of Science and Technology, Govt. of India, INSPIRE Fellowship Scheme. S. Sinha · K. G. Bakshi (B) Department Of Mathematics, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata 700032, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_23
311
312
S. Sinha and K. G. Bakshi
definitions and properties of an undiscounted two person zero-sum semi-Markov game considered under limiting ratio average pay-off. Section 3 contains main result of this paper. In Sect. 4 we state the algorithm to compute a Cesaro limiting matrix of a transition matrix, proposed by Lazari et al. [9]. Section 5 contains a numerical example illustrating our theorem. Section 6 is reserved for the conclusion.
2 Preliminaries 2.1 Finite Zero-Sum Two-Person Semi-Markov Games A zero-sum two-person finite SMG is described by a collection of objects Γ =< S, {A(s) : s ∈ S}, {B(s) : s ∈ S}, q, P, r >, where S = {1, 2, . . . , N } is the finite non-empty state space and A(s) = {1, 2, . . . , m s }, B(s) = {1, 2, . . . , n s } are respectively the non-empty sets of admissible actions of the players I and II respectively in the state s. Let us denote K = {(s, i, j) : s ∈ S, i ∈ A(s), j ∈ B(s)} to be the set of admissible triplets. For each (s, i, j) ∈ K , we denote q(. | s, i, j) to be the transition law of the game. Given (s, i, j) ∈ K and s ∈ S, let τissj be the transition time random variable which denotes the time for a transition to a state s from a state s by a pair of actions (i, j) ∈ A(s) × B(s). Let Pissj = Pr ob(τissj ≤ t) for each (s, i, j) ∈ K , s ∈ S be a probability distribution function on [0, ∞) and it is called the conditional transition time distribution function. Finally r is the real-valued functions on K , which represents the immediate (expected) rewards for the player I (and −r is the immediate reward for player II). Let us consider player I as the maximiser and player II as the minimiser in the zero-sum two person SMG. The semi-Markov game over infinite time is played as follows. At the 1st decision epoch, the game strats at s1 ∈ S and the players I and II simultaneously and independently choose actions i 1 ∈ A(s1 ) and j1 ∈ B(s1 ) respectively. Consequently player I and II get immediate rewards r (s1 , i 1 , j1 ) and −r (s1 , i 1 , j1 ) respectively and the game moves to the state s2 with probability q(s2 | s1 , i 1 , j1 ). The sojourn time to move from state s1 to the state s2 is determined by the distribution function Pis11js12 (.). After reaching the state s2 on the next decision epoch, the game is repeated over infinite time with the state s1 replaced by s2 . By a strategy (behavioural) π1 of the player I, we mean a sequence {(π1 )n (. | histn )}∞ n=1 , where (π1 )n specifies which action is to be chosen on the n-th decision epoch by associating with each history histn of the system up to nth decision epoch (where histn = (s1 , a1 , b1 , s2 , a2 , b2 . . . , sn−1 , an−1 , bn−1 , sn ) for n ≥ 2, hist1 = (s1 ) and (sk , ak , jk ) ∈ K are respectively the state and actions of the players at the k-th decision epoch) a probability distribution (π1 )n (. | histn ) on A(sn ). Behavioural strategy π2 for player II can be defined analogously. Generally by any unspecified strategy, we mean behavioural strategy here. We denote Π1 and Π2 to be the sets of strategies (behavioural) of the players I and II respectively.
On Zero-Sum Two Person Perfect Information Semi-Markov Games
313
A strategy f = { f n }∞ n=1 for the player I is called semi-Markov if for each n, f n depends on s1 , sn and the decision epoch number n. Similarly we can define a semi-Markov strategy g = {gn }∞ n=1 for the player II. A stationary strategy is a strategy that depends only on the current state. A stationary strategy for player I is defined as N tuple f = ( f (1), f (2), . . . , f (N )), where each f (s) is the probability distribution on A(s) given by f (s) = ( f (s, 1), f (s, 2), . . . , f (s, m s )). f (s, i) denotes the probability of choosing action i in the state s. By similar manner, one can define a stationary strategy g for player II as g = (g(1), g(2), . . . , g(N )) where each g(s) is the probability distribution on B(s). Let us denote F1 and F2 to be the set of stationary strategies for player I and II respectively. A stationary strategy is called pure if any player selects a particular action with probability 1 while visiting a state s. We denote F1s and F2s to be the set of pure stationary strategies of the players I and II respectively. A semi-stationary strategy is a semi-Markov strategy which is independent of the decision epoch n, i.e., for a initial state s1 and present state s2 , if a semi-Markov strategy g(s1 , s2 , n) turns out to be independent of n, then we call it a semi-stationary strategy. Let ξ1 and ξ2 denote the set of semi-stationary strategies for the players I sp sp and II respectively and ξ1 and ξ2 denote the set of pure semi-stationary strategies for the players I and II respectively. Definition 1 A zero-sum two person SMG Γ =< S, {A(s) : s ∈ S}, {B(s) : s ∈ S}, q, P, r > is called a perfect information semi-Markov game (PISMG) if the following properties hold (i)S = S1 ∪ S2 , S1 ∩ S2 = φ. (ii)| B(s) |= 1, for all s ∈ S1 , i.e., on S1 player II is a dummy. (iii)| A(s) |= 1, for all s ∈ S2 , i.e., on S2 player I is a dummy.
2.2 Zero-Sum Two-Person Semi-Markov Games Under Limiting Ratio Average (Undiscounted) Payoff Let (X 1 , A1 , B1 , X 2 , A2 , B2 · · · ) be a co-ordinate sequence in S × (A × B × S)∞ . Given behavioural strategy pair (π1 , π2 ) ∈ Π1 × Π2 , initial state s ∈ S, there exists a unique probability measure Pπ1 π2 (. | X 1 = s) (hence an expectation E π1 π2 (. | X 1 = s)) on the product σ - field of S × (A × B × S)∞ by Kolmogorov’s extension theorem. For a pair of strategies (π1 , π2 ) ∈ Π1 × Π2 for the players I and II respectively, the limiting ratio average (undiscounted) pay-off for player I, starting from a state s ∈ S is defined by φ(s, π1 , π2 )= lim inf n→∞
E π1 π2 nm=1 [r (X m ,Am ,Bm )|X 1 =s] n . E π1 π2 m=1 [τ¯ (X m ,Am ,Bm )|X 1 =s]
314
S. Sinha and K. G. Bakshi
∞ Here τ¯ (s, i, j) = s ∈S q(s | s, i, j) 0 td Pissj (t) is the expected sojourn time in the state s for a pair of actions (i, j) ∈ A(s) × B(s).
Definition 2 For each pair of stationary strategies ( f, g) ∈ F1 × F2 we define the transition matrix as Q( f, g) = [q(s | s, f, g)] N ×N , where q(s | probability s, f, g) = i∈A(s) j∈B(s) q(s | s, i, j) f (s, i)g(s, j) is the probability that start ing from the state s, next state is s when the players choose strategies f and g respectively (For a stationary strategy f , f (s, i) denotes the probability of choosing action i in the state s). For any pair of stationary strategies ( f, g) ∈ F1 × F2 of player I and II, we write the undiscounted pay-off for player I as φ(s, f, g) = lim inf n→∞
n r m (s, f,g) nm=1 m m=1 τ¯ (s, f,g)
for all s ∈ S,
where r m (s, f, g) and τ¯ m (s, f, g) are respectively the expected reward and expected sojourn time for player I at the m th decision epoch, when player I chooses f and player II chooses g respectively and the initial state is s. We define r ( f, g) = [r (s, f, g)] N ×1 , τ¯ ( f, g) = [τ¯ (s, f, g)] N ×1 and φ( f, g) = [φ(s, f, g)] N ×1 as expected reward, expected sojourn time and undiscounted pay-off vector for a pair of stationary strategy ( f, g) ∈ F1 × F2 . Now = s ∈S P f g (X m = s | X 1 = s)r (s , f, g) r m (s, f, g) m−1 = s ∈S r (s , f, g)q (s | s, f, g) = [Q m−1 ( f, g)r ( f, g)](s) and
= s ∈S P f g (X m = s | X 1 = s)τ¯ (s , f, g) = s ∈S τ¯ (s , f, g)q m−1 (s | s, f, g) = [Q m−1 ( f, g)τ¯ ( f, g)](s).
τ¯ m (s, f, g)
Since Q( f, g) is a Markov matrix, we have by Kemeny et al. [12] limn→∞
1 n
n m=1
It is obvious that limn→∞
1 n
limn→∞
1 n
and
Q m ( f, g) exists and equals Q ∗ ( f, g).
n
m=1 r
n m=1
m
( f, g) = [Q ∗ ( f, g)r ( f, g)](s)
τ¯ m ( f, g) = [Q ∗ ( f, g)τ¯ ( f, g)](s).
Thus we have for any pair of stationary strategies ( f 1 , f 2 ) ∈ F1 × F2 , φ(s, f, g) =
[Q ∗ ( f,g)r ( f,g)](s) [Q ∗ ( f,g)τ¯ ( f,g)](s)
for all s ∈ S
On Zero-Sum Two Person Perfect Information Semi-Markov Games
315
where Q ∗ ( f, g) is the Cesaro limiting matrix of Q( f, g). Definition 3 A zero-sum two person undiscounted semi-Markov game is said to have a value vector φ = [φ(s)] N ×1 if supπ1 ∈Π1 inf π2 ∈Π2 φ(s, π1 , π2 ) = φ(s) = inf π2 ∈Π2 supπ1 ∈Π1 φ(s, π1 , π2 ) for all s ∈ S. A pair of strategies (π1∗ , π2∗ ) ∈ Π1 , ×Π2 is said to be an optimal strategy pair for the players if φ(s, π1∗ , π2 ) ≥ φ(s) ≥ φ(s, π1 , π2∗ ) for all s ∈ S and all (π1 , π2 ) ∈ Π1 × Π2 . Throughout this paper, we use the notion of undiscounted pay-off as limiting ratio average pay-off.
3 Results Theorem 1 Any zero-sum two person undiscounted perfect information semiMarkov game has a solution in pure semi-stationary strategies under limiting ratio average pay-offs. Proof Let Γ =< S = S1 ∪ S2 , A = {A(s) : s ∈ S1 }, B = {B(s) : s ∈ S2 }, q, P, r > be a zero-sum two person perfect information semi-Markov game under limiting ratio average pay-off, where S = {1, 2, . . . , N } is the finite state space. Let us fix an initial state s ∈ S. We assume that in | S1 | number of states (i.e., states {1, 2, . . . , S1 }), player II is a dummy and from states {| S1 | +1, . . . , | S1 | + | S2 |} player I is a dummy. We assume that in this perfect information game, player I has d1 , d2 , . . . , d S1 number of pure actions in the states where he is non-dummy and similarly player II has t S1 +1 , t S1 +2 , . . . , t S1 +S2 number of pure actions available in S1 S1 +S2 di and D2 = Πi=S t . Let us the states where he is non-dummy. Let D1 = Πi=1 1 +1 i consider the pay-off matrix ⎡ ⎤ φ(s, f 1 , g1 ) φ(s, f 1 , g2 ) · · · φ(s, f 1 , g D2 ) ⎢ φ(s, f 2 , g1 ) φ(s, f 2 , g2 ) · · · φ(s, f 2 , g D2 ) ⎥ ⎢ ⎥ A D1 ×D2 = ⎢ ⎥ .. .. .. .. ⎣ ⎦ . . . . φ(s, f D1 , g1 ) φ(s, f 2 , g2 ) · · · φ(s, f D1 , g D2 ) where ( f 1 , f 2 , . . . , f D1 ) and (g1 , g2 , . . . , g D2 ) are the pure stationary strategies chosen by player I and II respectively. In order to prove the existence of a pure semistationary strategy, we have to prove that this matrix has a pure saddle point for each initial state s ∈ S. Now by theorem 2.1 (“Some topics in two-person games”, in the Advances in Game Theory.(AM-52), Volume 52, 1964, page-6) proposed by Shapley [17], we know that if A is the pay-off matrix of a two-person zero-sum game and if every 2 × 2 submatrix of A has a saddle point, then A has a saddle point. So, we concentrate only on a 2 × 2 matrix and observe if it has a saddle point or not. We consider the 2 × 2 submatrix:
φ(s, f i , g j ) φ(s, f i , g j ) φ(s, f i , g j ) φ(s, f i , g j )
316
S. Sinha and K. G. Bakshi
where i , i ∈ {d1 , d2 . . . , d S1 }, (i = i ) and j, j ∈ {t S1 +1 , t S1 +2 , . . . , t S1 +S2 }, ( j = j ). Now, by suitably renumbering the strategies, we can write the above submatrix as
A2×2 =
φ(s, f 1 , g1 ) φ(s, f 1 , g2 ) . φ(s, f 2 , g1 ) φ(s, f 2 , g2 )
Now we know φ(s, f i. , g. j ) =
S1
S1 +S2 (t|s, f i. )r (t, f i. )]+ v=S [q ∗ (v|s,g. j )r (v,g. j )] 1 +1 . S1 +S2 ∗ (t|s, f )τ (t, f )]+ ∗ [q i. i. t=1 v=S +1 [q (v|s,g. j )τ (v,g. j )] t=1 [q
∗
S1
1
We replace φ(s, f i. , g. j ) by the expression above in the matrix A. Let us rename the elements of the 2 × 2 submatrix as we consider the following two cases when A cannot have a pure saddle point. Case-1: φ(s, f 1 , g1 ) is row-minimum and column-minimum, φ(s, f 1 , g2 ) is row-maximum and column-maximum, φ(s, f 2 , g1 ) is row-maximum and columnmaximum and φ(s, f 2 , g2 ) is row-minimum and column-minimum. These four conditions can be written as φ(s, f 1 , g1 ) < φ(s, f 1 , g2 ), φ(s, f 1 , g1 ) < φ(s, f 2 , g1 ) φ(s, f 2 , g2 ) < φ(s, f 2 , g1 ) and φ(s, f 2 , g2 ) < φ(s, f 1 , g2 ). So, the above four inequalities can be written elaborately as S1
0.
t=1 v=S1 +1
(3.8)
Using the fact that, 0 < q ∗ (s | s, a) < 1, (where s, s ∈ {1, 2, . . . , N }, a is the action chosen by either player I or II) and adding (3.5) and (3.6), we get S1 S1 +S2
(τ (t, 1.)r (v, .2) − r (t, 1.)τ (v, .2)) +
t=1 v=S1 +1
S1 S1 +S2
(r (t, 1.)τ (v, .1) − τ (t, 1.)r (v, .1))+
t=1 v=S1 +1 S1 S1 +S2
(r (v, .1)τ (t, 2.) − r (t, 2.)τ (v, .1)) +
t=1 v=S1 +1
S1 S1 +S2 t=1 v=S1 +1
(r (t, .2)τ (v, 2.) − τ (t, 2.)r (v, .2)) > 0.
(3.9) Similarly adding (3.7) and (3.8), we get S1 S1 +S2
(τ (v, .2)r (t, 1.) − r (v, .2)τ (t, 1.)) +
t=1 v=S1 +1
S1 S1 +S2
(r (v, .1)τ (t, 1.) − τ (v, .1)r (t, 1.))+
t=1 v=S1 +1 S1 S1 +S2 t=1 v=S1 +1
(r (t, 2.)τ (v, .1) − r (v, .1)τ (t, 2.)) +
S1 S1 +S2 t=1 v=S1 +1
(r (v, .2)τ (t, 2.) − τ (t, .2)r (v, 2.)) > 0.
(3.10) From (3.9) and (3.10), we clearly get a contradiction. Now we consider the next case: Case-2: φ(s, f 1 , g1 ) is row maximum and column maximum, φ(s, f 1 , g2 ) is row-minimum and column-minimum, φ(s, f 2 , g1 ) is row-minimum and columnminimum and φ(s, f 2 , g2 ) is row-maximum and column-maximum. These four conditions can be written as φ(s, f 1 , g1 ) > φ(s, f 1 , g2 ), φ(s, f 1 , g1 ) > φ(s, f 2 , g1 ), φ(s, f 2 , g2 ) > φ(s, f 2 , g1 ) and φ(s, f 2 , g2 ) > φ(s, f 1 , g2 ). We can re-write them as follows: S1 +S2 S1 ∗ ∗ t=1 [q (t | s, f 1. )r (t, f 1. )] + v=S1 +1 [q (v | s, g.1 )r (v, g.1 )] (3.11) S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 1. )τ (t, f 1. )] + v=S1 +1 [q (v | s, g.1 )τ (v, g.1 )] S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 1. )r (t, f 1. )] + v=S1 +1 [q (v | s, g.2 )r (v, g.2 )] > S1 . S1 +S2 ∗ ∗ t=1 [q (t | s, f 1. )τ (t, f 1. )] + v=S1 +1 [q (v | s, g.2 )τ (v, g.2 )]
On Zero-Sum Two Person Perfect Information Semi-Markov Games
319
S1
>
S1 +S2 ∗ ∗ t=1 [q (t | s, f 1. )r (t, f 1. )] + v=S1 +1 [q (v | s, g.1 )r (v, g.1 )] S1 +S2 S1 ∗ ∗ t=1 [q (t | s, f 1. )τ (t, f 1. )] + v=S1 +1 [q (v | s, g.1 )τ (v, g.1 )] S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 2. )r (t, f 2. )] + v=S1 +1 [q (v | s, g.1 )r (v, g.1 )] . S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 2. )τ (t, f 2. )] + v=S1 +1 [q (v | s, g.1 )τ (v, g.1 )]
(3.12)
S1
>
S1 +S2 ∗ ∗ t=1 [q (t | s, f 2. )r (t, f 2. )] + v=S1 +1 [q (v | s, g.2 )r (v, g.2 )] S1 +S2 S1 ∗ ∗ t=1 [q (t | s, f 2. )τ (t, f 2. )] + v=S1 +1 [q (v | s, g.2 )τ (v, g.2 )] S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 2. )r (t, f 2. )] + v=S1 +1 [q (v | s, g.1 )r (v, g.1 )] . S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 2. )τ (t, f 2. )] + v=S1 +1 [q (v | s, g.1 )τ (v, g.1 )] S1 +S2 ∗ ∗ t=1 [q (t | s, f 2. )r (t, f 2. )] + v=S1 +1 [q (v | s, g.2 )r (v, g.2 )] S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 2. )τ (t, f 2. )] + v=S1 +1 [q (v | s, g.2 )τ (v, g.2 )] S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 1. )r (t, f 1. )] + v=S1 +1 [q (v | s, g.2 )r (v, g.2 )] . S1 ∗ S1 +S2 ∗ t=1 [q (t | s, f 1. )τ (t, f 1. )] + v=S1 +1 [q (v | s, g.2 )τ (v, g.2 )]
(3.13)
S1
>
(3.14)
Like the previous case we also rename the strategies f 1. , f 2. , g1. and g2. as 1., 2., .1 and .2 respectively to avoid notational complexity. Hence, (3.11) yields S1
q ∗ (t | s, 1.)τ (t, 1.)
t=1
+
S1 +S2
q ∗ (v | s, .2)r (v, .1)
v=S1 +1
−
S1 S1 +S2
S1 +S2
S1
q ∗ (v |, s.1)r (v, .1) +
q ∗ (t | s, 1.)r (t, 1.)
v=S1 +1
t=1
S1 +S2
S1 S1 +S2
q ∗ (v | s, .2)τ (v, .2) −
v=S1 +1
q ∗ (t | s, 1.)q ∗ (v | s, .1)r (t, 1.)τ (v, .1) −
t=1 v=S1 +1
q ∗ (t | s, 1.)q ∗ (v | s, .2)τ (t, 1.)r (v, .2)
t=1 v=S1 +1 S1 +S2
q ∗ (v | s, .2)r (v, .2)
S1 +S2
q ∗ (t | s, 2.)τ (t, 2.)q ∗ (v |, s.2)r (v, .2) +
S1 S1 +S2
q ∗ (v | s, .1)τ (v, .1)q ∗ (v | s, .2)r (v, .2) −
q ∗ (t | s, 1.)q ∗ (v | s, .1)τ (t, .2)r (v, 2.) −
t=1 v=S1 +1
Equation (3.12) yields
S1 S1 +S2
q ∗ (t | s, 2.)r (t, 2.)q ∗ (v |, s.1)τ (v, .1)
t=1 v=S1 +1
v=S1 +1 v=S1 +1
−
q ∗ (v | s, .1)τ (v, .1) > 0
(3.15)
t=1 v=S1 +1 S1 +S2
S1 +S2 v=S1 +1
Equation (3.13) yields
+
q ∗ (v |, s.2)τ (v, .2)
v=S1 +1
v=S1 +1
S1 S1 +S2
S1 +S2
S1 S1 +S2
q ∗ (t | s, 2.)q ∗ (v | s, .2)τ (t, 2.)r (v, .1)
t=1 v=S1 +1 S1 +S2
S1 +S2
q ∗ (v | s, .2)τ (v, .2)q ∗ (v | s, .1)r (v, .1) > 0
v=S1 +1 v=S1 +1
(3.16)
320
S. Sinha and K. G. Bakshi S1
q ∗ (t | s, 2.)τ (t, 2.)
S1 +S2
q ∗ (v |, s.1)r (v, .1) +
v=S1 +1
t=1
+
S1
q ∗ (t | s, 1.)r (t, 1.)
t=1 S1 S1 +S2
S1
q ∗ (t | s, 1.)r (t, 1.)
q ∗ (t | s, 2.)τ (t, 2.) −
t=1
q ∗ (v |, s.1)τ (v, .1)
v=S1 +1
t=1
S1
S1 +S2
S1 S1
q ∗ (t | s, 1.)q ∗ (t | s, .2)τ (t, 1.)r (t, 2.)−
t=1 t=1
q ∗ (t | s, 1.)q ∗ (v | s, .1)r (v, .1)τ (t, 1.) −
t=1 v=S1 +1
S1
S1 +S2
q ∗ (t | s, .2)r (t, .2)
q ∗ (v | s, .1)τ (v, .1) > 0
v=S1 +1
t=1
(3.17)
Equation (3.14) yields S1
q ∗ (t | s, 1.)τ (t, 1.)
q ∗ (v |, s.2)r (v, .2) +
v=S1 +1
t=1
+
S1 t=1
−
S1 +S2
S1 S1 +S2
q ∗ (t | s, 2.)r (t, 2.)
S1
q ∗ (t | s, 2.)r (t, 2.)
q ∗ (t | s, 1.)τ (t, 1.) −
t=1
q ∗ (t | s, 2.)q ∗ (v | s, .2)r (v, .2)τ (t, 2.) −
t=1 v=S1 +1
q ∗ (v |, s.2)τ (v, .2)
v=S1 +1
t=1 S1
S1 +S2
S1 S1
q ∗ (t | s, 1.)q ∗ (t | s, .2)r (t, 1.)τ (t, 2.)
t=1 t=1 S1
q ∗ (t | s, 1.)r (t, 1.)
S1 +S2
q ∗ (v | s, .2)τ (v, .2) > 0.
v=S1 +1
t=1
(3.18) Similarly using the fact that 0 < q ∗ (s | s, a) < 1, (where s, s ∈ {1, 2, . . . , N }, a is the action chosen by either player I or II) and adding (3.15) and (3.16), we get S1 S1 +S2
(τ (v, .2)r (t, 1.) − r (v, .2)τ (t, 1.)) +
t=1 v=S1 +1
S1 S1 +S2
(r (v, .1)τ (t, 1.) − τ (v, .1)r (t, 1.))+
t=1 v=S1 +1 S1 S1 +S2
(r (t, 2.)τ (v, .1) − r (v, .1)τ (t, 2.)) +
t=1 v=S1 +1
S1 S1 +S2 t=1 v=S1 +1
(r (v, .2)τ (t, 2.) − r (t, .2)τ (v, 2.)) > 0.
(3.19) Now adding (3.17) and (3.18) we get S1 S1 +S2
(τ (t, 1.)r (v, .2) − r (t, 1.)τ (v, .2)) +
t=1 v=S1 +1
S1 S1 +S2
(r (t, 1.)τ (v, .1) − τ (t, 1.)r (v, .1))+
t=1 v=S1 +1 S1 S1 +S2 t=1 v=S1 +1
(r (v, .1)τ (t, 2.) − r (t, 2.)τ (v, .1)) +
S1 S1 +S2 t=1 v=S1 +1
(r (t, .2)τ (v, 2.) − r (v, .2)τ (t, 2.)) > 0.
(3.20) From (3.19) and (3.20) we get a contradiction. Thus, every 2 × 2 submatrix has a pure saddle point and by Theorem 2.1 proposed by Shapley [17, page-6], the matrix A has a pure saddle point and the game Γ has a pure stationary optimal strategy pair for each initial state. Suppose ( f 1 , f 2 , . . . , f N ) be optimal pure stationary strategies for player I when the initial states are 1, 2, . . . , N respectively and (g1 , g2 , . . . , g N ) be optimal pure stationary strategies for player II when the initial states are 1, 2, . . . , N respectively. The f ∗ = ( f 1 , f 2 , . . . , f N ) and g ∗ = (g1 , g2 , . . . , g N ) are the optimal pure semi-stationary strategies for player I and II respectively in the perfect information semi-Markov game Γ .
On Zero-Sum Two Person Perfect Information Semi-Markov Games
321
4 Calculating the Cesaro Limiting Matrix of a Transition Matrix Lazari et al. [9] proposed an algorithm to compute the Cesaro limiting matrix of any Transition (Stochastic) matrix Q with n states. The algorithm runs as follows: Input: Let the transition matrix Q ∈ Mn (R) (where Mn (R) is the set of n × n matrices over the field of real numbers). Output: The Cesaro limiting matrix Q ∗ ∈ Mn (R). Step 1: Determine the characteristic polynomial C Q (z) =| Q − z In |. Step 2: Divide the polynomial C Q (z) by (z − 1)m(1) (where m(1) is the algebraic multiplicity of the eigenvalue z 0 = 1) and call it quotient T (z). Step 3: Compute the quotient matrix W = T (Q). Step 4: Determine the limiting matrix Q ∗ by dividing the matrix W by the sum of its elements of any arbitrary row.
5 An Example Example: Consider a PISMG Γ with four states S = {1, 2, 3, 4}, A(1) = {1, 2} = A(2), B(1) = B(2) = {1}, B(3) = B(4) = {1, 2}, A(3) = A(4) = {1}. Player II is the dummy player in the state 1 and 2 and player I is the dummy player for the states 3 and 4. Rewards, transition probabilities and expected sojourn times for the players are given below
1.1 3.1 ( 21 , 21 ,0,0) ( 21 , 21 ,0,0) 3 5.8 1 1 State-2: State-3: (0, 0, 1, 0) (0, 0, 1, 0) State-1: 1 3 1 2 ( 31 , 23 ,0,0) ( 23 , 31 ,0,0) 0.9 1.1 4 2 State-4: ( 21 ,0, 21 ,0) ( 21 ,0, 21 ,0) 2 1.1 (r ) A cell (q1 , q2 , q3 , q4 ) represents that r is the immediate rewards of the players, τ¯ q1 , q2 , q3 , q4 represents that the next states are 1, 2, 3 and 4 respectively and τ¯ is the expected sojourn time if this cell is chosen at present. Here player I is the row player and player II is the column player. Player I has the pure stationary strategies f 1 = {(1, 0), (1, 0), 1, 1}, f 2 = {(1, 0), (0, 1), 1, 1}, f 3 = {(0, 1), (1, 0), 1, 1}
322
S. Sinha and K. G. Bakshi
and f 4 = {(0, 1), (0, 1), 1, 1}. Similarly the pure stationary strategies for player II are g1 = {1, 1, (1, 0), (1, 0)}, g2 = {1, 1, (1, 0), (0, 1)}, g3 = {1, 1, (0, 1), (1, 0)} and g4 = {1, 1, (0, 1), (0, 1)}. Now, we calculate the undiscounted value of the PISMG for each initial state by complete enumeration method. Using the algorithm described in Sect. 4, we calculate the Cesaro limiting matrices ⎤ ⎡ 1 1 as follows: 00 2 2 ⎢ 1 1 0 0⎥ ∗ ⎥ 2 2 Q ∗ ( f 1 , g1 ) = Q ∗ ( f 2 , g1 ) = Q ∗ ( f 3 , g1 ) = Q ∗ ( f 4 , g1 ) = ⎢ ⎣ 0 0 1 0 ⎦, Q ( f 2 , g1 ) = 1 1 00 2 2 ⎤ ⎡1 1 00 2 2 ⎢ 2 1 0 0⎥ ∗ ∗ ∗ ⎥, Q ∗ ( f 3 , g1 ) = Q ∗ ( f 3 , g2 ) ⎢ 3 3 Q ( f 2 , g2 ) = Q ( f 2 , g3 ) = Q ( f 2 , g4 ) = ⎣ 0 0 1 0⎦ 1 0 1 0 ⎤ 2 2 ⎡1 2 00 3 3 1 1 ⎢ 0 0⎥ ⎥ 2 2 Q ∗ ( f 4 , g1 ) = Q ∗ ( f 4 , g2 ) = Q ∗ ( f 4 , = Q ∗ ( f 3 , g3 ) = Q ∗ ( f 3 , g4 ) = ⎢ ⎣ 0 0 1 0 ⎦, 1 0 1 0 ⎤2 2 ⎡1 2 00 3 3 2 1 ⎢ 0 0⎥ ⎥ 3 3 g3 ) = Q ∗ ( f 4 , g4 ) = ⎢ ⎣ 0 0 1 0 ⎦. Now the reward vector rˆ ( f 1 , g1 ) = (1.1, 3.1, 3, 4) 1 0 21 0 2 and expected sojourn time vector τ¯ ( f 1 ) = (1, 1, 1, 2). Thus by using the definition ˆ we get φ( ˆ f 1 , g1 ) = (2.1, 2.1, 3, 0.9). Similarly we calculate the undiscounted of φ, ˆ f 1 , g2 ) = (2.1, 2.1, 3, 0.9), pay-offs for other pairs of pure stationary strategies as φ( ˆ f 1 , g3 ) = (2.1, 2, 2.9, 0.9), φ( ˆ f 1 , g4 ) = (2.1, 2.1, 2.9, 0.9), φ( ˆ f 2 , g1 ) = (1.8353, φ( ˆ f 2 , g2 ) = (1.8353, ˆ f 2 , g3 ) = (1.8353, 1.8362, 2.9, 1.2776), φ( 1.8362, 3, 0.53), φ( ˆ f 3 , g1 ) = (2.2985, ˆ f 2 , g4 ) = (1.8353, 1.8362, 2.9, 1.2773), φ( 1.8362, 3, 0.58), φ( ˆ f 3 , g3 ) = (2.2985, ˆ f 3 , g2 ) = (2.2985, 2.2985, 3, 0.4088), φ( 2.2985, 3, 0.4088), φ( ˆ f 4 , g1 ) = ˆ f 3 , g4 ) = (2.2985, 2.2988, 2.9, 1.2182), φ( 2.2985, 2.9, 0.4088), φ( ˆ f 4 , g2 ) = ˆ f 4 , g3 ) = (2.112, 2.1129, 2.9, 1.2141), φ( (2.2979, 2.2985, 3, 0.4277), φ( ˆ f 4 , g4 ) = (2.112, 2.1129, 2.9, 1.2137). For initial (2.112, 2.1129, 3, 0.4267), φ( state 1, we get the pay-off matrix A as described in Sect. 3, as ⎡ ⎤ 2.1 2.1 2.1 2.1 ⎢ 1.8353 1.8353 1.8353 1.8353 ⎥ ⎥ A14×4 = ⎢ ⎣ 2.2985 2.2985 2.2985 2.2985 ⎦ . 2.2979 2.112 2.112 2.112 So, this matrix has a pure saddle point at the 3rd row, 3rd column position and we conclude ( f 3 , g3 ) is the optimal pure stationary strategy pair for the players for initial state 1. Similarly for initial state 2, 3 and 4 we get pay-off matrices as
On Zero-Sum Two Person Perfect Information Semi-Markov Games
323
⎡
⎡ ⎤ ⎤ 2.1 2.1 2 2.1 3 3 2.9 2.9 ⎢ 1.8362 1.8362 1.8362 1.8362 ⎥ ⎢ 3 3 2.9 2.9 ⎥ 3 4 ⎢ ⎥ ⎥ A24×4 = ⎢ ⎣ 2.2985 2.2985 2.2985 2.2988 ⎦, A4×4 = ⎣ 3 3 2.9 2.9 ⎦ and A4×4 = 2.2985 2.1129 2.1129 2.1129 3 3 2.9 2.9 ⎡ ⎤ 0.9 0.9 0.9 0.9 ⎢ 0.53 0.58 1.2776 1.2773 ⎥ ⎢ ⎥ ⎣ 0.4088 0.4088 0.4088 1.2182 ⎦. The optimal pure stationary strategy pairs of the 0.4277 0.4267 1.2141 1.2137 player I and player II for the initial states 2, 3 and 4 are ( f 3 , g3 ), ( f 1 , g3 ) and ( f 1 , g2 ) respectively. Thus the optimal pure semi-stationary strategy is f ∗ = ( f 3 , f 3 , f 1 , f 1 ) and g ∗ = (g3 , g3 , g3 , g2 ) for the players I and II respectively and the game has a value (2.2985, 2.2985, 2.9, 0.9).
6 Conclusion The purpose of this paper is to show that there exists an optimal pure semi-stationary strategy pair by just looking at the pay-off matrix in any Perfect Information semiMarkov game. Thus, the existence of the value and a pair of pure semi-stationary optimals for the players in a zero-sum two person Perfect Information undiscounted semi-Markov game can be obtained as a corollary of Shapley’s paper [17] directly without going through the discounted version. Furthermore, the existence of a pure optimal strategy (not necessarily stationary/semi-stationary) for an N person Perfect Information non-cooperative semi-Markov game under any standard (discounted/ undiscounted) pay-off criteria can be shown. We shall elaborate on this in a forthcoming paper.
References 1. Lal AK, Sinha S (1992) Zero-sum two-person semi-Markov games. J Appl Prob Cambridge University Press 2. Luque-Vasquez F, Hernandez-Lerma O (1999) Semi-Markov control models with average costs. Applicationes mathematicae, Instytut Matematyczny Polskiej Akademii Nauk 3. Sinha S, Mondal P (2017) Semi-Markov decision processes with limiting ratio average rewards. J Math Anal Appl, Elsevier 4. Mondal P, Sinha S (2015) Ordered field property for semi-Markov games when one player controls transition probabilities and transition times. Int Game Theory Rev 5. Jewell WS (1963) Markov-renewal programming. I: formulation, finite return models: operations research. Informs 6. Thuijsman F, Raghavan TES (1997) Perfect information stochastic games and related classes. Int J Game Theory, Springer 7. Adler I, Resende MGC, Veiga G, Karmarkar N (1989) An implementation of Karmarkar’s algorithm for linear programming. Mathematical programming, Springer, San Diego (1995) 8. Mondal P (2020) Computing semi-stationary optimal policies for multichain semi-Markov decision processes. Ann Oper Res. Springer
324
S. Sinha and K. G. Bakshi
9. Lazari A, Lozovanu D (2020) New algorithms for finding the limiting and differential matrices in Markov chains. Buletinul Academiei de Moldovei. Matematica 10. Mondal P (2017) On zero-sum two-person undiscounted semi-Markov games with a multichain structure. Adv Appl Probab. Cambridge University Press 11. Shapley LS (1953) Stochastic games. In: Proceedings of the national academy of sciences. National Academy of Sciences (1953) 12. Kemeny JG, Snell JL (1961) Finite continuous time Markov chains. Theory Probab Appl. SIAM 13. Gillette D (2016) 9. Stochastic games with zero stop probabilities. Contributions to the theory of games (AM-39), vol III. Princeton University Press 14. Liggett TM, Lippman SA (1969) Stochastic games with perfect information and time average payoff. SIAM Rev. SIAM 15. Derman C (1962) On sequential decisions and Markov chains. Management science. Informs 16. Howard RA (1971) Semi-Markov and decision processes. Wiley 17. Dresher M, Berkovitz LD, Aumann RJ, Shapley LS, Davis MD, Tucker AW (1964) Advances in game theory. Princeton University Press, Annals of Mathematics Studies
Interval Estimation for Quantiles of Several Normal Populations with a Common Standard Deviation Habiba Khatun
and Manas Ranjan Tripathy
Abstract This article discusses the interval estimation of the quantile γ = μ1 + ζ σ of the first population (for known ζ ) when samples are available from other k − 1, normal populations with a common standard deviation and unequal means. Utilizing the information matrix, we derive the asymptotic confidence interval (ACI) for the quantile. Another classical confidence interval using the method of variance estimate recovery (MOVER) is also derived. Further, several approximate confidence intervals, such as bootstrap-p, bootstrap-t, and highest posterior density (HPD) intervals, have been obtained numerically. Furthermore, we propose the generalized confidence interval (GCI) for the quantile using the generalized variable approach. Comparisons are made between the proposed intervals using measures such as their coverage probability (CP) and average length (AL). Finally, two real-life examples are considered for demonstrating the methodology. Keywords Asymptotic confidence interval · Average length · Coverage probability · Generalized variable approach · HPD interval · Markov chain Monte Carlo · Parametric bootstrap
1 Introduction Consider k(≥ 2) normal populations with potentially distinct means μ1 , μ2 , . . . , μk and a common standard deviation σ. To be more explicit, let (Yi1 , Yi2 , . . . , Yim i ) be a random sample of size m i taken from the ith normal population N (μi , σ 2 ), i = 1, 2, . . . , k. We consider the problem of interval estimation of the first population’s quantile γ by taking samples from the other k − 1 populations. Here, γ = μ1 + ζ σ is the qth quantile of the first population, where ζ = Φ −1 (q), q ∈ (0, 1), and Φ(.) is the cumulative distribution function of N (0, 1). Note that, though we have considered H. Khatun (B) · M. R. Tripathy Department of Mathematics, National Institute of Technology Rourkela, Rourkela 769008, OR, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_24
325
326
H. Khatun and M. R. Tripathy
the interval estimation of the first population only, without doing much changes to the original problem, one can also obtain the interval estimator for the quantile of other populations. A joint complete and sufficient statistics is (Y¯1 , Y¯2 , . . . , Y¯k , S 2 ) for this model where mi mi k 1 Y¯i = Yi j , Si2 = (Yi j − Y¯i ) and S 2 = Si2 ; i = 1, 2, . . . , k. m i j=1 j=1 i=1
The random variables Y¯i ∼ N (μi , σ 2 /m i ) and Si2 ∼ σ 2 χm2 i −1 ; i = 1, 2, . . . , k so k 2 , and m = i=1 mi . that S 2 ∼ σ 2 χm−k Inference of quantiles is very much inevitable due to their importance in real-life application. Quantiles are essential in several fields of study, for example, in medical science, quantile points are used to know the effect of a newly launched drug on the majority of patients (see [4]). For some further applications of quantiles in different areas of studies, one may refer to [2, 8, 12]. In this paper, we consider real-life situations where the data sets are satisfactorily modeled using normal distributions with a common variance and possibly different means. In the past, many researchers have investigated the problem of inference of quantiles for normal distribution. From the decision-theoretic perspective, [19, 20] probably were the first to discuss the point estimation of the quantile of the normal population, in a general setup. Rukhin [15] addressed the point estimation of the quantile of the normal population and established some nice decision-theoretic results, including the inadmissibility of the best equivariant estimator. While estimating the quantile of the first population, [10] derived some decision-theoretic results under two normal populations when their means are common. Further, [16] generalized these results to several normal populations. The estimation of quantiles of two normal populations with order restricted variances and a common mean was investigated by [13]. Some results in comparing quantiles for two or more populations under normality assumption are also available. One may refer to [6, 12] for some problems on comparing quantiles of several normal populations. In these cases, the authors derived several test procedures, such as an approximation test and a test using the generalized method for comparing the quantiles. Further, [1] derived some test procedures from testing quantile equality of several normal populations. Recently, [9] investigated the problem of hypothesis testing along with interval estimation of the quantile of a population when there are k − 1 other normal populations with a common mean. The authors provided multiple confidence intervals and test procedures and compared their performance numerically using average length and coverage probability (in interval estimation) and size values and power (in the case of hypothesis testing). The main contribution of the current research work can be elaborated in the following way. We note that a lot of research work has been done on the inference of quantiles for normal populations from a classical and decision-theoretic perspective, as discussed in the previous paragraph. A little or no work has been done on the inference of quantiles under the common variance or standard deviation model setup.
Interval Estimation for Quantiles of Several Normal …
327
The importance of assuming common variance in real-life situations has been shown in Examples 1 and 2 (see Sect. 8). Moreover, we specifically derive the form of the interval estimators using certain estimators of the common variance and the means. The problem and the related obtained results are new and have not been considered in the literature. In [14, 17], the problems of estimating the common standard deviation and variance when the means follow a particular ordering are discussed, for two normal populations, respectively. The rest of our contributions can be outlined as follows. In Sect. 2, asymptotic confidence interval (ACI) of γ is computed using the Fisher information matrix. We derive an exact confidence interval for γ using the method of variance unbiased estimate recovery (MOVER) in Sect. 3. In Sect. 4, we obtain two confidence intervals using the bootstrap sampling procedure. In Sect. 5, we derive the highest posterior density (HPD) interval for γ using the Monte Carlo Markov Chain (MCMC) procedure. In Sect. 6, using the generalized p-value method, we derive the generalized confidence interval (GCI) for γ . In Sect. 7, with the help of a simulation study, we perform a numerical comparison of the performance of each of the confidence intervals. Finally, to illustrate the potential applicability of our methodology, we consider two real-life examples, the details of which are presented in Sect. 8.
2 Asymptotic Confidence Interval for Quantile In this section, we will consider the interval estimation of the quantile using the usual asymptotic normality assumption of the distribution of the MLE of the quantile. We derive the ACI for the quantile γ = μ1 + ζ σ by using the MLE and the information matrix under the related model problem. The MLEs of the parameters μi and σ 2 are obtained as μˆ i M L = Y¯i and σˆ M2 L = S 2 /m, respectively. Utilizing these, the MLE of γ = μ1 + ζ σ is calculated as γˆM L = μˆ 1M L + ζ σˆ M L . Further to obtain the information matrix, we assume that (Yi1 , Yi2 , . . . , Yim i ), i = 1, 2, . . . , andk be a random sample of size m i from the ith normal population N (μi , σ 2 ). The log-likelihood function of the current model is obtained as L(μ1 , μ2 , . . . , μk , σ, yi j ) ∝ −m log σ −
mi k 1 (yi j − μi )2 . 2σ 2 i=1 j=1
The information matrix is obtained as ⎤ 0 ... 0 ⎢ 0 m22 . . . 0 ⎥ σ ⎥ I (μ1 , . . . , μk , σ ) = ⎢ ⎦ ⎣. . . . 0 0 . . . 2m 2 σ ⎡ m1 σ2
The inverse of I is obtained as
328
H. Khatun and M. R. Tripathy
⎡
I (μ1 , . . . , μk , σ )−1
⎤ 0 ... 0 2 ⎢ ⎥ ⎢ 0 mσ 2 . . . 0 ⎥ =⎢ ⎥ ⎣. . . ⎦ σ2 0 0 . . . 2m . σ2 m1
We will obtain the ACI for γ by applying the Delta method. Here, γ is a function of μ1 , . . . , μk and σ, that is γ = f (μ1 , μ2 , . . . , μk , σ ) and the gradient of f exists and is nonzero for all values of μ1 , μ2 , . . . , μk and σ. The gradient of f is obtained as Δf (μ1 , . . . , μk , σ ) = (1, . . . , ζ )t . The MLE of γ is obtained which is γˆM L = f (μˆ 1M L , . . . , μˆ k M L , σˆ M L ) = μˆ 1M L + ζ σˆ M L . Thus, γˆM L follows a nor2 σ2 . mal distribution N (γ , Δf t I −1 Δf ) asymptotically, where Δf t I −1 Δf = mσ 1 + ζ 2 2m Thus, we obtain the (1 − ν)100% ACI for the quantile γ as
γˆM L ± z ν/2
σˆ M2 L σˆ 2 + ζ 2 ML , m1 2m
(1)
where z b denotes the bth quantile of N (0, 1). In the next section, we obtain another confidence interval for the quantile γ , which is exact and also for fixed sample size.
3 Confidence Interval Using Method of Variance Estimates Recovery (MOVER)

Suppose one is interested in constructing a two-sided confidence interval for the sum of two parameters ξ1 and ξ2 when individual confidence intervals for ξ1 and ξ2 are available as (L_{ξ1}, U_{ξ1}) and (L_{ξ2}, U_{ξ2}), respectively. A (1 − ν)100% two-sided confidence interval (L_ξ, U_ξ) of ξ = ξ1 + ξ2 can be obtained using the MOVER approach introduced by [21]. Here, L_ξ (lower) and U_ξ (upper) are the bounds of the confidence interval of ξ, defined as

L_ξ = ξ̂1 + ξ̂2 − √( (ξ̂1 − L_{ξ1})^2 + (ξ̂2 − L_{ξ2})^2 )    (2)

and

U_ξ = ξ̂1 + ξ̂2 + √( (U_{ξ1} − ξ̂1)^2 + (U_{ξ2} − ξ̂2)^2 ),    (3)

where ξ̂1 and ξ̂2 are the unbiased estimators of the parameters ξ1 and ξ2, respectively. For our model, the usual confidence intervals for μ1 and σ are

Ȳ1 ± t_{m_1−1; 1−ν/2} S_1/√(m_1(m_1 − 1))   and   ( √(S^2/χ^2_{m−k; 1−ν/2}), √(S^2/χ^2_{m−k; ν/2}) ),

respectively. To obtain the confidence interval of the quantile γ = μ1 + ζσ, let us denote ξ1 = μ1 and ξ2 = ζσ. Plugging these into equations (2) and (3), we get the confidence interval for the quantile. Thus, the two-sided (1 − ν)100% confidence interval for γ is

L = Ȳ1 + ζ S/√(m − k) − √( t^2_{m_1−1; 1−ν/2} S_1^2/(m_1(m_1 − 1)) + ζ^2 ( S/√(m − k) − √(S^2/χ^2_{m−k; 1−ν/2}) )^2 ),
U = Ȳ1 + ζ S/√(m − k) + √( t^2_{m_1−1; 1−ν/2} S_1^2/(m_1(m_1 − 1)) + ζ^2 ( S/√(m − k) − √(S^2/χ^2_{m−k; ν/2}) )^2 ),    (4)
where L and U are the lower and upper bounds of the MOVER confidence interval.
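A corresponding sketch of the MOVER interval (4), under the same assumed data layout as in the earlier snippet and using scipy for the t and chi-square quantiles; again, this is purely illustrative.

```python
import numpy as np
from scipy import stats

def mover_quantile(samples, zeta, nu=0.05):
    """MOVER interval (4) for gamma = mu_1 + zeta*sigma (illustrative sketch)."""
    y1 = np.asarray(samples[0])
    m1, k = len(y1), len(samples)
    m = sum(len(y) for y in samples)
    S2 = sum(((np.asarray(y) - np.mean(y)) ** 2).sum() for y in samples)
    S1 = np.sqrt(((y1 - y1.mean()) ** 2).sum())
    # half-width of the t interval for mu_1
    t_val = stats.t.ppf(1 - nu / 2, m1 - 1)
    half_mu = t_val * S1 / np.sqrt(m1 * (m1 - 1))
    # point estimate and chi-square interval limits for sigma
    sig_hat = np.sqrt(S2 / (m - k))
    sig_L = np.sqrt(S2 / stats.chi2.ppf(1 - nu / 2, m - k))
    sig_U = np.sqrt(S2 / stats.chi2.ppf(nu / 2, m - k))
    est = y1.mean() + zeta * sig_hat
    L = est - np.sqrt(half_mu**2 + zeta**2 * (sig_hat - sig_L) ** 2)
    U = est + np.sqrt(half_mu**2 + zeta**2 * (sig_U - sig_hat) ** 2)
    return L, U
```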
4 Confidence Intervals Using Bootstrap Sampling We utilize the bootstrap sampling method to get two confidence intervals, called bootstrap percentile and bootstrap-t, for the quantile γ . Perhaps, [5] was the first to introduce the bootstrap percentile (Boot-p) confidence interval, and later on [7] proposed the bootstrap-t (Boot-t) confidence interval. Following is the step-by-step computational algorithm for obtaining the Boot-p and Boot-t confidence intervals for the quantile γ .
4.1 Boot-p Method

1. Given the sample (Y_{i1}, Y_{i2}, …, Y_{im_i}), i = 1, 2, …, k, from N(μ_i, σ^2), calculate the MLEs of μ_i and σ. Using these, get the MLE of γ as γ̂_{ML} = μ̂_{1ML} + ζ σ̂_{ML}.
2. Generate bootstrap samples (Y*_{i1}, Y*_{i2}, …, Y*_{im_i}), i = 1, 2, …, k, from N(μ̂_{iML}, σ̂^2_{ML}) and then calculate the bootstrap MLEs μ̂*_{iML} and σ̂*_{ML}. The bootstrap MLE of γ is then computed as γ̂*_{ML} = μ̂*_{1ML} + ζ σ̂*_{ML}.
3. Repeat Step 2 a large number of times, say A times, to obtain the bootstrap estimates γ̂*_{ML1}, γ̂*_{ML2}, …, γ̂*_{MLA}, and arrange them in increasing order.
4. Let T(f) = P(γ̂*_{ML} ≤ f); then the (1 − ν)100% Boot-p confidence interval of γ is given by (γ̂_{Bp}(ν/2), γ̂_{Bp}(1 − ν/2)), where γ̂_{Bp}(f) = T^{−1}(f).
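A minimal sketch of the Boot-p steps above, assuming parametric resampling from N(μ̂_{iML}, σ̂^2_{ML}) as in Step 2; names and defaults are illustrative.

```python
import numpy as np

def boot_p_interval(samples, zeta, nu=0.05, A=2500, rng=None):
    """Bootstrap percentile (Boot-p) interval for gamma (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    sizes = [len(y) for y in samples]
    m = sum(sizes)
    mu_hat = [np.mean(y) for y in samples]
    S2 = sum(((np.asarray(y) - np.mean(y)) ** 2).sum() for y in samples)
    sigma_ml = np.sqrt(S2 / m)
    boot = np.empty(A)
    for a in range(A):
        bs = [rng.normal(mu_hat[i], sigma_ml, sizes[i]) for i in range(len(samples))]
        S2_b = sum(((y - y.mean()) ** 2).sum() for y in bs)
        boot[a] = bs[0].mean() + zeta * np.sqrt(S2_b / m)
    return np.quantile(boot, [nu / 2, 1 - nu / 2])
```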
4.2 Boot-t Method

The algorithmic steps for computing the bootstrap-t approximate confidence interval for γ are given as follows.
1. Given the samples (Y_{i1}, Y_{i2}, …, Y_{im_i}), i = 1, 2, …, k, from the ith normal population N(μ_i, σ^2), calculate the MLEs of μ_i and σ. Utilizing these, we get the estimate of γ as γ̂_{ML} = μ̂_{1ML} + ζ σ̂_{ML}.
2. Generate bootstrap samples (Y*_{i1}, Y*_{i2}, …, Y*_{im_i}) from N(μ̂_{iML}, σ̂^2_{ML}), i = 1, 2, …, k, and compute the bootstrap MLEs μ̂*_{iML} and σ̂*_{ML} of μ_i and σ, respectively. Consequently, compute the bootstrap MLE of γ, say γ̂*_{ML} = μ̂*_{1ML} + ζ σ̂*_{ML}.
3. Compute the statistic B* = (γ̂*_{ML} − γ̂_{ML}) / √( V̂ar(γ̂*_{ML}) ).
4. Repeat Steps 2 and 3 a sufficiently large number of times, say A times.
5. The (1 − ν)100% bootstrap-t interval of γ is

( γ̂_{ML} − B*_{(1−ν/2)A} √( V̂ar(γ̂_{ML}) ),  γ̂_{ML} − B*_{(ν/2)A} √( V̂ar(γ̂_{ML}) ) ).
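A matching sketch of the Boot-t steps. Here the variance of γ̂_{ML} is approximated by the asymptotic variance from Sect. 2; this is one reasonable choice but an assumption on our part, since the paper does not spell out the variance estimator used in the algorithm.

```python
import numpy as np

def boot_t_interval(samples, zeta, nu=0.05, A=2500, rng=None):
    """Bootstrap-t interval for gamma, scaling by the asymptotic variance
    of gamma_hat (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    sizes = [len(np.asarray(y)) for y in samples]
    m, m1 = sum(sizes), sizes[0]

    def estimate(data):
        S2 = sum(((np.asarray(y) - np.mean(y)) ** 2).sum() for y in data)
        s2 = S2 / m
        g = np.mean(data[0]) + zeta * np.sqrt(s2)
        var = s2 / m1 + zeta**2 * s2 / (2 * m)   # asymptotic variance, Sect. 2
        return g, var

    g_hat, v_hat = estimate(samples)
    mu_hat = [np.mean(y) for y in samples]
    sd_ml = np.sqrt(sum(((np.asarray(y) - np.mean(y)) ** 2).sum()
                        for y in samples) / m)
    B = np.empty(A)
    for a in range(A):
        bs = [rng.normal(mu_hat[i], sd_ml, sizes[i]) for i in range(len(samples))]
        g_b, v_b = estimate(bs)
        B[a] = (g_b - g_hat) / np.sqrt(v_b)
    lo, hi = np.quantile(B, [nu / 2, 1 - nu / 2])
    return g_hat - hi * np.sqrt(v_hat), g_hat - lo * np.sqrt(v_hat)
```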
5 Bayesian Interval Estimation

Inference regarding the parameters, or a function of them, when some prior information about their behavior is known in advance, can be carried out by assuming suitable probability distributions for the parameters. In this section, the parameters μ_i and σ^2 are assumed to follow certain probability distributions, and Gibbs sampling is used to generate samples from the posterior density of the parameters. In order to apply this method, a conjugate family of prior distributions for μ_i and σ^2 is taken. We take independent priors for μ_i, i = 1, 2, …, k, distributed as N(a_i, b_i), that is, normal with mean a_i and variance b_i. The prior distribution of σ^2 is taken as IG(θ, β), the inverse gamma distribution with parameters θ and β. The joint posterior density function of (μ_1, …, μ_k, σ^2) given the sample (Y_{ij}) is

Π(μ_1, μ_2, …, μ_k, σ^2 | data) = K × IG(θ, β) × L(μ_1, μ_2, …, μ_k, σ | data) × ∏_{i=1}^{k} N(a_i, b_i)
                                = K × h(μ_1, μ_2, …, μ_k, σ^2 | data),

where

K^{−1} = ∫_{0}^{∞} ∫_{−∞}^{∞} … ∫_{−∞}^{∞} h(μ_1, μ_2, …, μ_k, σ^2 | data) dμ_1 dμ_2 … dμ_k dσ^2,

and

h(μ_1, μ_2, …, μ_k, σ^2 | data) = [ β^θ (σ^2)^{−(θ + m/2 + 1)} / ( (2π)^{(m+k)/2} Γ(θ) ) ] ∏_{i=1}^{k} b_i^{−1/2}
  × exp{ −β/σ^2 − ∑_{i=1}^{k} (μ_i − a_i)^2/(2 b_i) − (1/(2σ^2)) ∑_{i=1}^{k} ∑_{j=1}^{m_i} (y_{ij} − μ_i)^2 }.    (5)
The conditional distribution of μ_i given μ_j (j = 1, 2, …, k; j ≠ i), σ^2, and the sample is normal with mean (a_i σ^2 + m_i ȳ_i b_i)/(σ^2 + m_i b_i) and variance (1/b_i + m_i/σ^2)^{−1}. Equivalently, we have

μ_i | (σ^2, data) ∼ N( (a_i σ^2 + m_i ȳ_i b_i)/(σ^2 + m_i b_i), (1/b_i + m_i/σ^2)^{−1} ),    (6)

and

σ^2 | (μ_1, μ_2, …, μ_k, data) ∼ IG( θ + m/2, β + ∑_{i=1}^{k} ∑_{j=1}^{m_i} (y_{ij} − μ_i)^2 / 2 ).    (7)
The following are the algorithmic steps of the Gibbs sampling procedure.
1. Start with initial guesses μ_i^{(0)}, i = 1, 2, …, k, and σ^{2(0)}.
2. Generate μ_i^{(1)} from the conditional distribution N( (a_i σ^{2(0)} + m_i ȳ_i b_i)/(σ^{2(0)} + m_i b_i), (1/b_i + m_i/σ^{2(0)})^{−1} ).
3. Generate σ^{2(1)} from the conditional distribution of σ^2 given μ_i = μ_i^{(1)}, that is, σ^{2(1)} ∼ IG( θ + m/2, β + ∑_{i=1}^{k} ∑_{j=1}^{m_i} (y_{ij} − μ_i^{(1)})^2 / 2 ). Obtain the square root of σ^{2(1)} and denote it as σ^{(1)}.
4. Compute the value of γ^{(1)} = μ_1^{(1)} + ζ σ^{(1)}.
5. Repeat Steps 2, 3, and 4 multiple times, say N times, to obtain the values γ^{(1)}, …, γ^{(N)}. The (1 − ν)100% HPD interval for γ can then be constructed using the approach of [3].
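The Gibbs steps translate directly into code. The sketch below uses placeholder hyperparameters (a_i = 0, b_i = 1, θ = β = 2 unless supplied) and returns a shortest-credible-interval approximation in the spirit of [3]; it is an illustration, not the authors' implementation.

```python
import numpy as np

def gibbs_hpd(samples, zeta, theta=2.0, beta=2.0, a=None, b=None,
              N=50_000, nu=0.05, rng=None):
    """Gibbs sampler for (6)-(7) and an HPD-style interval for gamma
    (illustrative sketch; hyperparameters are placeholders)."""
    rng = np.random.default_rng(rng)
    ys = [np.asarray(y, float) for y in samples]
    k, m = len(ys), sum(len(y) for y in ys)
    a = np.zeros(k) if a is None else np.asarray(a)
    b = np.ones(k) if b is None else np.asarray(b)
    ybar = np.array([y.mean() for y in ys])
    mi = np.array([len(y) for y in ys])
    sigma2, gammas = 1.0, np.empty(N)
    for it in range(N):
        mean = (a * sigma2 + mi * ybar * b) / (sigma2 + mi * b)
        var = 1.0 / (1.0 / b + mi / sigma2)
        mus = rng.normal(mean, np.sqrt(var))          # draw from (6)
        rate = beta + 0.5 * sum(((ys[i] - mus[i]) ** 2).sum() for i in range(k))
        sigma2 = 1.0 / rng.gamma(theta + m / 2, 1.0 / rate)  # inverse-gamma (7)
        gammas[it] = mus[0] + zeta * np.sqrt(sigma2)
    keep = np.sort(gammas[N // 4:])                   # burn-in as in Sect. 7
    n = len(keep); w = int(np.floor((1 - nu) * n))
    j = np.argmin(keep[w:] - keep[:n - w])            # shortest interval
    return keep[j], keep[j + w]
```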
6 Generalized Variable Approach

We calculate the confidence interval of γ in this section using the generalized variable method, proposed by [18]. The following notion is useful for deriving intervals with this well-known method.

Definition 1 Suppose Y is a random variable whose distribution depends on (r, f), where r is the parameter of interest and f is a nuisance parameter. A random variable Z = Z(Y; y, r, f) is referred to as a generalized pivot variable for computing the generalized confidence interval (GCI) of r if the following conditions hold: (i) for fixed Y = y, the distribution of Z(Y; y, r, f) is free of all unknown parameters; (ii) the value of Z at Y = y is r, the parameter of interest.

Using such a generalized pivot variable, one can construct a (1 − ν)100% confidence interval for r from appropriate percentiles of Z. The (1 − ν)100% GCI for r is given by

(Z(y; ν/2), Z(y; 1 − ν/2)),    (8)

where Z(y; q) is the qth percentile of Z(Y; y, r, f). Next, using this method, we propose a generalized pivot variable and construct a confidence interval for the quantile γ. The pooled estimator of the common variance σ^2 is S_p^2 = S^2/(m − k). Thus, U^2 = (m − k) S_p^2/σ^2 ∼ χ^2_{m−k}, a chi-square distribution with (m − k) degrees of freedom. Let ȳ_i and s_p^2 be the observed values of Ȳ_i and S_p^2, respectively. We obtain the generalized pivot variable for the quantile γ = μ1 + ζσ as

Z = ȳ_1 − s_p t_{m−k}/√(m_1) + ζ √( (m − k) s_p^2 / U^2 ),    (9)

where t_{m−k} denotes a random variable following Student's t-distribution with (m − k) degrees of freedom. We observe that, for fixed ȳ_i and s_p^2, the distribution of Z is independent of the unknown parameters, and the value of Z equals the parameter of interest γ. Thus, the (1 − ν)100% GCI for γ is obtained as

(Z(ν/2), Z(1 − ν/2)).    (10)
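The GCI is obtained by Monte Carlo sampling of the pivot (9). A minimal sketch, with an assumed number of pivot draws M:

```python
import numpy as np

def gci_quantile(samples, zeta, nu=0.05, M=2500, rng=None):
    """Generalized CI (10) from the pivot (9) (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    y1 = np.asarray(samples[0]); m1 = len(y1)
    k = len(samples)
    m = sum(len(np.asarray(y)) for y in samples)
    s2p = sum(((np.asarray(y) - np.mean(y)) ** 2).sum() for y in samples) / (m - k)
    t = rng.standard_t(m - k, size=M)       # t_{m-k} draws
    U2 = rng.chisquare(m - k, size=M)       # chi-square_{m-k} draws
    Z = y1.mean() - np.sqrt(s2p) * t / np.sqrt(m1) + zeta * np.sqrt((m - k) * s2p / U2)
    return np.quantile(Z, [nu / 2, 1 - nu / 2])
```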
7 Computational Results In the previous sections, several confidence intervals for γ have been derived, some of which do not have expressions in closed form. In this section, we compare all the proposed confidence intervals in terms of their average lengths (ALs) and coverage probabilities (CPs) numerically.
The ALs and CPs of all the proposed confidence intervals are calculated based on 20,000 replications. Note that 2,500 bootstrap samples have been generated for computing both bootstrap confidence intervals, that is, A = 2500. Further, for the generalized confidence interval, the inner loop has been replicated the same number of times. In the case of MCMC through the Gibbs sampling method, we consider N = 50,000, out of which M (= N/4) draws are taken as the burn-in period. Also, the AL and CP of all the intervals do not depend on μ_i for i ≥ 2. Hence, without loss of generality, we take μ_i = 0 for i = 2, 3, …, k in our simulation study. Throughout the simulation, we take ζ = 1.96 and γ = 1.96 (μ_1 = 0, σ = 1). The sample sizes are chosen as

m_i = i m_1/2,  2 ≤ i ≤ k.

The standard error of the simulation results ranges from 0.002 to 0.003, which is quite acceptable, and the accuracy of the simulation has been verified. The CPs and ALs of each of the intervals are obtained at significance levels ν = 0.05 and ν = 0.10; these values are reported in Tables 1 and 2, respectively, for k = 2, 3, 4, and 5. Our simulation study, including Tables 1 and 2, gives the following results. (1) The CPs and ALs of each interval are computed at confidence levels 90% and 95%. Among all the confidence intervals, the MOVER and the GCI attain the nominal level; of these two, the generalized confidence interval has the shorter average length. If we are a little more liberal and also accept coverage slightly below the nominal level, then the HPD interval qualifies as well; however, among all three, the generalized confidence interval again has the best performance in terms of shortest length. (2) Though we have presented our simulation results only up to k = 5, we have carried out the simulation study for other values of k, such as k = 7 and 10. The results are quite similar in terms of CPs and ALs of all the intervals, and the conclusions regarding their performances remain the same.
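For concreteness, the coverage study can be organized as follows. The sketch reuses the interval functions sketched earlier (here the ACI one) and the sample-size rule m_i = i m_1/2; all other choices are illustrative.

```python
import numpy as np

def coverage_study(k, m1, zeta=1.96, nu=0.05, reps=20_000, rng=None):
    """Monte Carlo estimate of average length (AL) and coverage probability (CP)
    for one interval method (illustrative sketch; plug in any of the interval
    functions sketched above)."""
    rng = np.random.default_rng(rng)
    sizes = [m1] + [i * m1 // 2 for i in range(2, k + 1)]
    gamma_true = zeta                       # mu_1 = 0, sigma = 1
    hits, lengths = 0, 0.0
    for _ in range(reps):
        data = [rng.normal(0.0, 1.0, s) for s in sizes]
        L, U = aci_quantile(data, zeta, nu)  # e.g. the ACI sketch from Sect. 2
        hits += (L <= gamma_true <= U)
        lengths += U - L
    return lengths / reps, hits / reps
```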
8 Applications

This section presents two real-life data sets which are modeled using normal distributions with a common variance, and it demonstrates the various interval estimation methodologies for the quantile γ = μ1 + ζσ. Example 1: The data sets represent the number of fatalities caused by COVID-19 per day in three countries, namely India, Italy, and Spain. The data are collected for the periods 1st July 2020 to 20th July 2020 (India), 1st April 2020 to 15th April 2020 (Italy), and 5th April 2020 to 21st April 2020 (Spain). These data sets are available on the official website of the World Health Organization (WHO) (see https://covid19.who.int/). The data sets are given as follows:
Table 1 AL (CP) of all the confidence intervals with ν = 0.05

k  m1   ACI           MOVER         Boot-p        Boot-t        HPDI          GCI
2   6   1.993 (0.85)  2.921 (0.96)  1.962 (0.79)  1.962 (0.87)  2.030 (0.93)  1.776 (0.95)
2  10   1.622 (0.89)  2.014 (0.96)  1.600 (0.86)  1.600 (0.91)  1.338 (0.94)  1.310 (0.95)
2  16   1.318 (0.92)  1.499 (0.96)  1.303 (0.89)  1.303 (0.92)  1.347 (0.94)  1.012 (0.95)
2  20   1.187 (0.93)  1.310 (0.95)  1.175 (0.91)  1.175 (0.93)  1.193 (0.95)  0.898 (0.95)
3   6   1.817 (0.88)  2.492 (0.96)  1.795 (0.83)  1.795 (0.90)  1.972 (0.94)  1.681 (0.95)
3  10   1.462 (0.91)  1.734 (0.96)  1.446 (0.88)  1.446 (0.92)  1.272 (0.95)  1.266 (0.95)
3  16   1.182 (0.93)  1.307 (0.95)  1.170 (0.91)  1.170 (0.93)  1.177 (0.95)  0.989 (0.94)
3  20   1.064 (0.94)  1.154 (0.95)  1.053 (0.92)  1.053 (0.94)  1.004 (0.95)  0.880 (0.95)
4   6   1.728 (0.90)  2.301 (0.96)  1.709 (0.86)  1.709 (0.91)  1.745 (0.96)  1.652 (0.95)
4  10   1.379 (0.92)  1.614 (0.96)  1.367 (0.89)  1.367 (0.92)  1.323 (0.95)  1.262 (0.95)
4  16   1.108 (0.93)  1.214 (0.96)  1.097 (0.91)  1.097 (0.93)  1.163 (0.95)  0.989 (0.95)
4  20   0.997 (0.93)  1.072 (0.95)  0.987 (0.92)  0.987 (0.94)  1.007 (0.96)  0.884 (0.95)
5   6   1.674 (0.91)  2.190 (0.96)  1.742 (0.88)  1.742 (0.93)  1.727 (0.96)  1.636 (0.95)
5  10   1.333 (0.93)  1.533 (0.95)  1.387 (0.90)  1.387 (0.94)  1.165 (0.96)  1.255 (0.95)
5  16   1.068 (0.94)  1.165 (0.96)  1.111 (0.92)  1.111 (0.95)  1.009 (0.96)  0.987 (0.95)
5  20   0.959 (0.94)  1.030 (0.96)  0.998 (0.92)  0.998 (0.95)  0.986 (0.96)  0.881 (0.95)
India: 507, 434, 379, 442, 613, 425, 467, 482, 487, 475, 519, 551, 500, 553, 582, 606, 687, 671, 543, 681. Italy: 837, 727, 760, 766, 681, 525, 636, 604, 542, 610, 570, 619, 431, 566, 60. Spain: 786, 740, 750, 766, 736, 749, 667, 676, 628, 544, 587, 607, 552, 487, 507, 435, 401. The normality of the data sets has been checked using both Shapiro-Wilk and Anderson-Darling tests. The statistics and their corresponding p-value obtained are tabulated in Table 3. It indicates that the normality assumption holds true at 0.05 significance level. Further, Bartlett’s test is used to test the equality of variances of these data, whose p-value was obtained as 0.4. This indicates that the equality of variances for these three data sets cannot be rejected (Table 3).
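The normality and homogeneity checks reported in Table 3 can be reproduced, up to the data transcription above, with scipy; the values are entered as printed in the text, and the snippet is only a sketch of the workflow (scipy's anderson returns the statistic and critical values rather than a p-value).

```python
import numpy as np
from scipy import stats

# daily COVID-19 fatality counts as printed above
india = np.array([507, 434, 379, 442, 613, 425, 467, 482, 487, 475,
                  519, 551, 500, 553, 582, 606, 687, 671, 543, 681])
italy = np.array([837, 727, 760, 766, 681, 525, 636, 604, 542, 610,
                  570, 619, 431, 566, 60])
spain = np.array([786, 740, 750, 766, 736, 749, 667, 676, 628, 544,
                  587, 607, 552, 487, 507, 435, 401])

for name, x in [("India", india), ("Italy", italy), ("Spain", spain)]:
    W, p_sw = stats.shapiro(x)                  # Shapiro-Wilk test
    ad = stats.anderson(x, dist="norm")         # Anderson-Darling statistic
    print(name, round(W, 3), round(p_sw, 3), round(ad.statistic, 3))

stat, p_bartlett = stats.bartlett(india, italy, spain)   # equality of variances
print("Bartlett p-value:", round(p_bartlett, 3))
```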
Table 2 AL (CP) of all the confidence intervals with ν = 0.10

k  m1   ACI           MOVER         Boot-p        Boot-t        HPDI          GCI
2   6   1.676 (0.80)  2.323 (0.92)  1.658 (0.73)  1.658 (0.82)  1.940 (0.87)  1.445 (0.90)
2  10   1.363 (0.84)  1.636 (0.92)  1.349 (0.80)  1.349 (0.85)  1.222 (0.88)  1.082 (0.90)
2  16   1.104 (0.86)  1.234 (0.91)  1.100 (0.83)  1.100 (0.86)  0.982 (0.88)  0.840 (0.90)
2  20   0.999 (0.87)  1.089 (0.91)  0.988 (0.84)  0.988 (0.88)  0.964 (0.89)  0.747 (0.90)
3   6   1.521 (0.82)  1.981 (0.92)  1.515 (0.77)  1.515 (0.85)  1.096 (0.89)  1.390 (0.90)
3  10   1.232 (0.86)  1.420 (0.91)  1.217 (0.82)  1.217 (0.87)  1.108 (0.90)  1.056 (0.89)
3  16   0.991 (0.87)  1.082 (0.91)  0.983 (0.85)  0.983 (0.88)  0.892 (0.90)  0.828 (0.90)
3  20   0.892 (0.87)  0.955 (0.90)  0.886 (0.85)  0.886 (0.88)  0.906 (0.90)  0.738 (0.90)
4   6   1.451 (0.84)  1.829 (0.91)  1.441 (0.80)  1.441 (0.86)  1.220 (0.91)  1.375 (0.90)
4  10   1.160 (0.88)  1.315 (0.91)  1.151 (0.84)  1.151 (0.88)  1.140 (0.91)  1.055 (0.90)
4  16   0.931 (0.88)  1.005 (0.91)  0.925 (0.86)  0.925 (0.89)  0.954 (0.91)  0.829 (0.90)
4  20   0.837 (0.88)  0.888 (0.91)  0.831 (0.86)  0.831 (0.88)  0.915 (0.90)  0.740 (0.90)
5   6   1.410 (0.85)  1.736 (0.91)  1.471 (0.81)  1.471 (0.88)  1.471 (0.91)  1.367 (0.90)
5  10   1.119 (0.87)  1.252 (0.91)  1.167 (0.84)  1.167 (0.89)  1.080 (0.91)  1.049 (0.90)
5  16   0.897 (0.88)  0.960 (0.91)  0.936 (0.86)  0.936 (0.89)  0.981 (0.91)  0.827 (0.90)
5  20   0.806 (0.89)  0.850 (0.90)  0.841 (0.86)  0.841 (0.90)  0.876 (0.91)  0.738 (0.90)
The 95% and 90% confidence intervals for γ = μ1 + ζ σ are computed and reported in Tables 4 and 5, respectively, using all the proposed methods. In the table, γˆL is the lower bound and γˆU is the upper bound of the intervals. From Tables 4 and 5, it is noticed that the GCI has the shortest length among all the proposed confidence intervals. Example 2: Nine laboratories were involved in an inter-laboratory study conducted by [11] to measure the total dietary fiber (TDF) content of various foods containing almost no starch, including most fruits, vegetables, and several pure polysaccharides. Nine different labs were provided the six distinct samples in blind duplicates. From these, we select two samples for our consideration. Using a non-enzymatic gravi-
Table 3 Normality tests for the COVID-19 data sets

Test                                  India   Italy   Spain
Shapiro-Wilk test    Test statistic   0.996   0.998   0.997
                     p-value          0.262   0.767   0.234
Anderson-Darling     Test statistic   0.292   0.285   0.385
                     p-value          0.569   0.577   0.352

Table 4 95% confidence intervals for the COVID-19 data

Interval   ACI       MOVER     Boot-p    Boot-t    HPDI      GCI
γ̂_L        671.722   683.497   665.293   677.732   706.364   715.842
γ̂_U        790.077   802.435   784.066   796.505   824.925   811.433
Length     118.355   118.938   118.773   118.773   118.561    95.591

Table 5 90% confidence intervals for the COVID-19 data

Interval   ACI       MOVER     Boot-p    Boot-t    HPDI      GCI
γ̂_L        681.384   691.189   674.652   687.897   714.961   691.119
γ̂_U        780.415   790.791   773.902   787.147   814.392   771.266
Length      99.031    99.602    99.250    99.250    99.431    80.147

Table 6 Normality tests for the food dietary fiber data

Test                                  Apple     Cabbage
Shapiro-Wilk test    Test statistic   0.89008   0.90363
                     p-value          0.1999    0.2738
Anderson-Darling     Test statistic   0.408     0.370
                     p-value          0.270     0.341
metric method, the following are the percentages of TDF for apples and cabbage from the nine labs. Apple: 12.44, 12.87, 12.21, 12.82, 13.18, 12.31, 13.11, 14.29, 12.08. Cabbage: 26.71, 26.26, 25.93, 26.66, 26.36, 26.13, 27.46, 27.43, 25.82. The p-values obtained in the normality tests indicate that the normal distribution fits these data sets quite well at the 0.05 significance level. Additionally, the F-test is used to test whether or not the variances of the two samples are identical. The p-value for this test was calculated to be 0.721, which suggests that the equality of variances cannot be rejected (Table 6). The 95% and 90% confidence intervals for γ using all the proposed methods are reported in Tables 7 and 8, respectively.
Table 7 95% confidence intervals for all proposed methods using the dietary fiber data

Interval   ACI        MOVER      Boot-p     Boot-t     HPDI       GCI
γ̂_L        13.44279   13.45335   13.38664   13.50079   13.54316   14.09921
γ̂_U        14.54648   14.90409   14.48848   14.60262   14.69508   15.01399
Length      1.103696   1.450739   1.101837   1.101837   1.151921   0.9147742

Table 8 90% confidence intervals for all proposed methods using the dietary fiber data

Interval   ACI         MOVER      Boot-p      Boot-t      HPDI       GCI
γ̂_L        13.53147    13.56254   13.42203    13.65399    13.61686   13.63472
γ̂_U        14.45779    14.7387    14.33527    14.56724    14.57637   14.39353
Length      0.9263165   1.176167   0.9132458   0.9132458   0.959509   0.7588103
From Tables 7 and 8, it is observed that the GCI has the smallest length out of all the confidence intervals.
9 Concluding Remarks In this article, we proposed several confidence intervals for the quantile γ = μ1 + ζ σ when samples are taken from k(≥ 2) normal populations with a common standard deviation and possibly unequal means. In particular, we obtained the asymptotic confidence interval, an interval using the method of variance estimate recovery (MOVER), bootstrap confidence intervals (Boot-p and Boot-t), the HPD interval using the MCMC method, and an interval using the generalized variable approach. From our simulation study, we concluded that the generalized confidence interval (GCI) outperforms all other intervals in terms of both CP and AL. Finally, we considered two real-life examples where the data sets have been modeled using normal distributions with a common variance, and various interval estimators are obtained to illustrate the methodologies. Acknowledgements The authors would like to convey their gratitude to the anonymous reviewers for their insightful remarks that helped improve the presentation.
References 1. Abdollahnezhad K, Jafari AA (2018) Testing the equality of quantiles for several normal populations. Commun Stat-Simul Comput 47(7):1890–1898 2. Albers W, Löhnberg P (1984) An approximate confidence interval for the difference between quantiles in a bio-medical problem. Statistica Neerlandica 38(1):20–22
3. Chen MH, Shao QM (1999) Monte Carlo estimation of Bayesian credible and HPD intervals. J Comput Graph Stat 8(1):69–92 4. Cox TF, Jaber K (1985) Testing the equality of two normal percentiles. Commun Stat-Simul Comput 14(2):345–356 5. Efron B (1982) The Jackknife, The bootstrap, and other resampling plans. Siam 6. Guo H, Krishnamoorthy K (2005) Comparison between two quantiles: the normal and exponential cases. Commun Stat-Simul Comput 34(2):243–252 7. Hall P, Martin MA (1988) On bootstrap resampling and iteration. Biometrika 75(4):661–671 8. Huang LF, Johnson RA (2006) Confidence regions for the ratio of percentiles. Stat Probab Lett 76(4):384–392 9. Khatun H, Tripathy MR, Pal N (2020) Hypothesis testing and interval estimation for quantiles of two normal populations with a common mean. Commun Stat-Theory Methods 51(16):5692– 5713 10. Kumar S, Tripathy MR (2011) Estimating quantiles of normal populations with a common mean. Commun Stat-Theory Methods 40(15):2719–2736 11. Li BW, Cardozo MS (1994) Determination of total dietary fiber in foods and products with little or no starch, nonenzymatic-gravimetric method: collaborative study. J AOAC Int 77(3):687– 689 12. Li X, Tian L, Wang J, Muindi JR (2012) Comparison of quantiles for several normal populations. Comput Stat Data Anal 56(6):2129–2138 13. Nagamani N, Tripathy MR (2020) Improved estimation of quantiles of two normal populations with common mean and ordered variances. Commun Stat-Theory Methods 49(19):4669–4692 14. Patra LK, Kayal S, Kumar S (2021) Minimax estimation of the common variance and precision of two normal populations with ordered restricted means. Statist Pap 62(1):209–233 15. Rukhin AL (1983) A class of minimax estimators of a normal quantile. Stat Probab Lett 1(5):217–221 16. Tripathy MR, Kumar S, Jena AK (2017) Estimating quantiles of several normal populations with a common mean. Commun Stat-Theory Methods 46(11):5656–5671 17. Tripathy MR, Kumar S, Pal N (2013) Estimating common standard deviation of two normal populations with ordered means. Stat Methods Appl 22(3):305–318 18. Weerahandi S (1993) Generalized confidence intervals. J Am Stat Assoc 88(423):899–906 19. Zidek JV (1969) Inadmissibility of the best invariant estimator of extreme quantiles of the normal law under squared error loss. Ann Math Stat 40(5):1801–1808 20. Zidek JV (1971) Inadmissibility of a class of estimators of a normal quantile. Ann Math Stat 42(4):1444–1447 21. Zou GY, Taleban J, Huo CY (2009) Confidence interval estimation for lognormal data with application to health economics. Comput Stat Data Anal 53(11):3755–3764
Approximate Solutions to Delay Diffusion Equations with Integral Forcing Function Nishi Gupta and Md. Maqbul
Abstract This article deals with a class of nonlinear diffusion equations with time delay and an integral forcing function under Dirichlet boundary and history conditions. The method of semidiscretization has been adopted to demonstrate the existence and uniqueness of a strong solution. Keywords Nonlinear diffusion equation · Discretization method · Delay differential equation · Rothe's method
1 Introduction

Rothe [9] introduced a method known as the method of semidiscretization, or Rothe's method, in 1930 for solving a second-order one-dimensional boundary value problem. Since then, many authors have adopted this technique to solve fractional, functional, and partial differential equations [1, 2, 4, 5, 7, 8]. Here, we adopt Rothe's method to establish suitable conditions for the existence of a unique strong solution of the following nonlinear delay diffusion problem with an integral forcing function, subject to history and Dirichlet boundary conditions:

∂L/∂t = ∂^2L/∂x^2 + G( t, L(t, x), L(t, x)L(t − Φ(t), x), ∫_0^t K(t, τ, L(τ, x)) dτ ),  t ∈ (0, T], x ∈ [0, 1],
L(t, x) = Ψ(t, x),  (t, x) ∈ [−δ, 0] × [0, 1],
L(t, 0) = L(t, 1) = 0,  t ∈ [−δ, T],    (1)
N. Gupta (B) · Md. Maqbul Department of Mathematics, National Institute of Technology Silchar, Cachar 788010, AS, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_25
where IT = [0, T ], Iδ = [−δ, 0], G : IT × R × R × R → R, K : IT × IT × R → R, Ψ : Iδ × [0, 1] → R and Φ : IT → [δ, ∞) are some appropriate functions and δ, T ∈ R with T ∈ (0, δ]. The motivation for such problems lies in different branches of biophysics, biochemistry, climate model theory, and many other areas. In these systems, the rate of change of unknown function depends not only on the system’s current state at a fixed time but also on the values of the unknown at certain times in the past.
2 Assumptions and Abstract Formulation

We consider the following assumptions throughout this paper.
(M0) 0 < T ≤ δ.
(M1) The continuous functions Ψ : Iδ × [0, 1] → R and Φ : IT → [δ, ∞) satisfy
  |Ψ(t1, x) − Ψ(t2, x)| ≤ L_Ψ √2 |x| |t1 − t2| for all (t_ι, x) ∈ Iδ × [0, 1] (ι = 1, 2) and for some L_Ψ > 0, and
  |Φ(t1) − Φ(t2)| ≤ L_Φ |t1 − t2| for all t_ι ∈ IT (ι = 1, 2) and for some L_Φ > 0.
(M2) G is a continuous function from IT × R × R × R into R and satisfies
  |G(t1, m1, n1, o1) − G(t2, m2, n2, o2)| ≤ (L_G/2) ( |t1 − t2| + |m1 − m2| + |n1 − n2| + |o1 − o2| )
  for all (t_ι, m_ι, n_ι, o_ι) ∈ IT × R × R × R (ι = 1, 2) and for some L_G > 0.
(M3) K is a continuous function from IT × IT × R into R and satisfies
  |K(t1, v, c1) − K(t2, v, c2)| ≤ (L_K/√2) ( |t1 − t2| + |c1 − c2| )
  for all t_ι, v ∈ IT (ι = 1, 2), all c1, c2 ∈ R, and for some L_K > 0.

Consider the real Hilbert space Hs := L^2[0, 1] with inner product ⟨·, ·⟩ and norm ‖·‖. Define D(−A) := {s ∈ Hs : s'' ∈ Hs, s(0) = 0 = s(1)} and −As = s'', where D denotes the domain of −A. Then −A is the infinitesimal generator of a C0-semigroup of contractions in Hs. Further, if the functions s : [−δ, T] → Hs, Ψ* : Iδ → Hs, G* : IT × Hs × Hs × Hs → Hs, and K* : IT × IT × Hs → Hs are defined by
s(t)(x) = L(t, x),    (2)
Ψ*(t)(x) = Ψ(t, x),    (3)
G*(t, a, b, r)(x) = G(t, a(x), b(x), r(x)),    (4)
K*(t, τ, a)(x) = K(t, τ, a(x)),    (5)

respectively, then we can rewrite (1) as

ds/dt (t) + As(t) = G*( t, s(t), s(t)s(t − Φ(t)), ∫_0^t K*(t, τ, s(τ)) dτ ),  t ∈ (0, T],
s(t) = Ψ*(t),  t ∈ Iδ.    (6)
Since (M0)–(M3) hold, the following assumptions hold true in the remaining part of the manuscript.
(N0) 0 < T ≤ δ ≤ inf_{t∈IT} Φ(t).
(N1) Ψ* and Φ are Lipschitz continuous, that is, ‖Ψ*(t1) − Ψ*(t2)‖ ≤ L_Ψ |t1 − t2| for all t_ι ∈ Iδ (ι = 1, 2), and |Φ(t1) − Φ(t2)| ≤ L_Φ |t1 − t2| for all t_ι ∈ IT (ι = 1, 2).
(N2) G* satisfies ‖G*(t1, m1, n1, o1) − G*(t2, m2, n2, o2)‖ ≤ L_G ( |t1 − t2| + ‖m1 − m2‖ + ‖n1 − n2‖ + ‖o1 − o2‖ ) for all (t_ι, m_ι, n_ι, o_ι) ∈ IT × Hs × Hs × Hs (ι = 1, 2).
(N3) K* satisfies ‖K*(t1, v, c1) − K*(t2, v, c2)‖ ≤ L_K ( |t1 − t2| + ‖c1 − c2‖ ) for all t_ι, v ∈ IT (ι = 1, 2) and all c1, c2 ∈ Hs.

Using (N0), we can rewrite (6) as

ds/dt (t) + As(t) = G*( t, s(t), s(t)Ψ*(t − Φ(t)), ∫_0^t K*(t, τ, s(τ)) dτ ),  t ∈ (0, T],
s(t) = Ψ*(t),  t ∈ Iδ.    (7)
3 Discretization and a Priori Estimates

Define the real numbers M^1_{Ψ*}, M^2_{Ψ*}, and M_k by

M^1_{Ψ*} = sup_{x∈[0,1], t∈IT} |1 − Ψ(t − Φ(t), x)|,
M^2_{Ψ*} = sup_{x∈[0,1], t∈IT} |Ψ(t − Φ(t), x)|,
M_k = sup_{z∈IT} ‖K*(0, z, Ψ*(0))‖.

For n ∈ Z+, let t_0^n = 0, h_n = T/n, and t_j^n = jT/n for 1 ≤ j ≤ n. Let s_0^n = Ψ*(0) for all n ∈ Z+. Then, by Theorem 1.4.3 of [6], we get {s_j^n}_{j=1}^{n} as the unique solution of

(1/h_n)(s − s_{j−1}^n) + As = G*( t_j^n, s_{j−1}^n, s_{j−1}^n Ψ*(t_j^n − Φ(t_j^n)), ∫_0^{t_j^n} K*(t_j^n, τ, s_{j−1}^n) dτ ),    (8)
solved successively for j = 1, 2, …, n.

Lemma 1 There exists Ω > 0 such that ‖s_j^n − Ψ*(0)‖ ≤ Ω for j = 1, 2, …, n and n ∈ Z+.
Proof Putting j = 1 in (8), we get 1
(s1n − s0n ), s1n − s0n + A(s1n − s0n ), s1n − s0n hn t1n
K∗ (t1n , , s0n )d , s1n − s0n = G ∗ t1n , s0n , s0n Ψ ∗ (t1n − Φ(t1n )), 0
− As0n , s1n − s0n . By Theorem 1.4.3 of [6], we obtain 1 n s − s0n 2 hn 1 ≤ G ∗ t1n , s0n , s0n Ψ ∗ (t1n − Φ(t1n )),
t1n 0
K∗ (t1n , , s0n )d − As0n s1n − s0n ,
that is, t1n 1 n ∗ n n n ∗ n n n s1 − s0 ≤ G t1 , s0 , s0 Ψ (t1 − Φ(t1 )), K∗ (t1n , , s0n )d + As0n . hn 0 (9) Now, consider
∗ n n n ∗ n n G t1 , s0 , s0 Ψ (t1 − Φ(t1 )),
t1n 0
K∗ (t1n , , s0n )d
t1n ∗ n n n ∗ n n ≤ G t1 , s0 , s0 Ψ (t1 − Φ(t1 )), K∗ (t1n , , s0n )d − G ∗ 0, Ψ ∗ (0), 0, 0 t1n t1n K∗ (0, , Ψ ∗ (0))d + G ∗ 0, Ψ ∗ (0), 0, K∗ (0, , Ψ ∗ (0))d 0 0
≤ L G T + MΨ2∗ Ψ ∗ (0) + L K T 2 + Mk T + G ∗ (0, Ψ ∗ (0), 0, 0) . (10) By (9) and (10), we get s1n − s0n ≤ h n L G t1n + MΨ2∗ Ψ ∗ (0) + L K (t1n )2 + Mk t1n
+ G ∗ (0, Ψ ∗ (0), 0, 0) + A(Ψ ∗ (0)) ≤ h n L G T + MΨ2∗ Ψ ∗ (0) + L K T 2 + Mk T
+ G ∗ (0, Ψ ∗ (0), 0, 0) + A(Ψ ∗ (0)) .
(11)
Similarly, by using (8) and Theorem 1.4.3 of [6], we get s nj − s0n
≤ s nj−1 − s0n + h n G ∗ t nj , s nj−1 , s nj−1 Ψ ∗ (t nj − Φ(t nj )), t nj K∗ (t nj , , s nj−1 )d + h n A(Ψ ∗ (0)) .
(12)
0
For each n ∈ Z+ and for 2 ≤ j ≤ n, we conclude that s nj − s0n ≤
s1n
−
s0n
j ∗ n n n + hn Ψ ∗ (tin − Φ(tin )), G ti , si−1 , si−1 i=2 tin 0
n K∗ (tin , , si−1 )d + ( j − 1)h n A(Ψ ∗ (0))
≤ 2T 2 L G (1 + Mk + T L K ) + 2T ( G ∗ (0, Ψ ∗ (0), 0, 0) + A(Ψ ∗ (0)) ) + Ψ ∗ (0) (MΨ1∗ + MΨ2∗ ) + (1 + MΨ2∗ + T L K )h n
j−1 i=1
Using Gronwall’s inequality, s nj − s0n ≤ a1 eb1 ( j−1)h n ≤ a1 eb1 T
sin − s0n .
(13)
for all n ∈ Z+ and 1 ≤ j ≤ n, where a1 =2T 2 L G (1 + Mk + T L K ) + 2T ( G ∗ (0, Ψ ∗ (0), 0, 0) + A(Ψ ∗ (0)) ) + Ψ ∗ (0) (MΨ1∗ + MΨ2∗ ), b1 =1 + MΨ2∗ + T L K . Hence the proof is completed. Lemma 2 There exists Ω > 0 such that 1 n s − s nj−1 ≤ Ω, hn j
j = 1, 2, . . . , n, n ∈ Z+ .
Proof By (11), we obtain 1 n s1 − s0n ≤L G T + MΨ2∗ Ψ ∗ (0) + L K T 2 + Mk T hn + G ∗ (0, Ψ ∗ (0), 0, 0) + A(Ψ ∗ (0)) . By Theorem 1.4.3 of [6] and Lemma 1, we get 1 n s − s nj−1 hn j 1 n ≤ s j−1 − s nj−2 + G ∗ t nj , s nj−1 , s nj−1 Ψ ∗ (t nj − Φ(t nj )), hn t nj K∗ (t nj , , s nj−1 )d − G ∗ t nj−1 , s nj−2 , s nj−2 Ψ ∗ (t nj−1 − Φ(t nj−1 )), 0
t nj−1
0
K∗ (t nj−1 , , s nj−2 )d
1 n ≤ s j−1 − s nj−2 + L G |t nj − t nj−1 | + s nj−1 − s nj−2 hn + s nj−1 Ψ ∗ (t nj − Φ(t nj )) − s nj−2 Ψ ∗ (t nj−1 − Φ(t nj−1 )) t nj−1 t nj ∗ n n + K (t j , , s j−1 )d − K∗ (t nj−1 , , s nj−2 )d . 0
0
Since, s nj−1 Ψ ∗ (t nj − Φ(t nj )) − s nj−2 Ψ ∗ (t nj−1 − Φ(t nj−1 )) ≤ L Ψ L Φ (Ω + Ψ ∗ (0) ) +MΨ2∗ s nj−1 − s nj−2 and
(14)
t nj
K
0
∗
(t nj , , s nj−1 )d
−
t nj−1
0
K∗ (t nj−1 , , s nj−2 )d
≤ L K (2T + Ω) + Mk + T L K s nj−1 − s nj−2 , therefore, by (14), we get 1 n 1 n s j − s nj−1 ≤ s j−1 − s nj−2 + h n L G 1 + L Φ L Ψ (Ω + Ψ ∗ (0) ) hn hn + L K (2T + Ω) + Mk + L G (1 + MΨ2∗ + T L K ) s nj−1 − s nj−2 .
(15)
From (14), for each n ∈ Z+ and for 2 ≤ j ≤ n, we have 1 n s − s nj−1 hn j 1 n ≤ s1 − s0n + ( j − 1)h n L G 1 + L Φ L Ψ (Ω + Ψ ∗ (0) ) hn j n n +L K (2T + Ω) + Mk + L G (1 + MΨ2∗ + T L K ) si−1 − si−2 i=2
≤ T L G 2 + L φ L Ψ (Ω + Ψ (0) ) + L K (3T + Ω) + 2Mk + MΨ2∗ ∗
∗
∗
Ψ (0) + A(Ψ (0)) + L G (1 + M
Ψ2∗
j−1 n sin − si−1 + T L K )h n . h n i=1
By Gronwall’s inequality, we obtain 1 n s − s nj−1 ≤ a2 eb2 ( j−1)h n ≤ a2 eb2 T hn j for all n ∈ Z+ and 1 ≤ j ≤ n, where a2 = T L G 2 + L Φ L Ψ (Ω + Ψ ∗ (0) ) + L K (3T + Ω) + 2Mk + MΨ2∗ Ψ ∗ (0) + A(Ψ ∗ (0)) , b2 = L G (1 + MΨ2∗ + T L K ). Hence proved. For every n ∈ Z+ , define the sequences {P n } and {Q n } on [−δ, T ] by P (t) = n
Ψ ∗ (t), t ∈ Iδ , s nj−1 + h1n (t − t nj−1 )(s nj − s nj−1 ), t ∈ (t nj−1 , t nj ],
(16)
and Q n (t) =
Ψ ∗ (t), t ∈ Iδ , s nj , t ∈ (t nj−1 , t nj ].
(17)
Remark 1 From Lemma 2, it is apparent that P n (t) is Lipschitz continuous on IT for every n ∈ Z+ , and P n (t) − Q n (t) → 0 uniformly on IT as n → ∞. Since, Q n (t) ∈ D(A) for all t ∈ IT , then {Q n (t)}, {AQ n (t)}, and {P n (t)} are all uniformly bounded. For t ∈ (t nj−1 , t nj ], j = 1, 2, . . . , n, define Z n (t) by Z n (t) = G ∗ t nj , s nj−1 , s nj−1 Ψ ∗ (t nj − Φ(t nj )),
t nj 0
K∗ (t nj , , s nj−1 )d .
(18)
Then, by (8), we obtain d− n P (t) + AQ n (t) = Z n (t), t ∈ (0, T ], dt where
d− dt
(19)
represent the left derivative. So, we get
t
∗
AQ ()d = Ψ (0) − P (t) + n
0
n
t
Z n ()d, t ∈ (0, T ].
(20)
0
Lemma 3 There exists a continuous function s from [−δ, T ] into Hs which is Lipschitz continuous on IT such that P n (t) → s(t) as n → ∞ uniformly on [−δ, T ]. Proof For t ∈ (0, T ], using (19) and Theorem 1.4.3 of [6], we reach d− dt
(P n (t) − P q (t)), Q n (t) − Q q (t) ≤ Z n (t) − Z q (t), Q n (t) − Q q (t).
(21)
Now, consider 1 d− n P (t) − P q (t) 2 2 dt d−
(P n (t) − P q (t)), Q n (t) − Q q (t) = dt d−
(P n (t) − P q (t)), Q n (t) − P n (t) + P q (t) − Q q (t) . − dt By (21), we get
(22)
1 d− n P (t) − P q (t) 2 2 dt d−
(P n (t) − P q (t)) − Z n (t) + Z q (t), P n (t) − P q (t) − Q n (t) + Q q (t) ≤ dt (23) +Z n (t) − Z q (t), P n (t) − P q (t). q
q
For t ∈ (t nj−1 , t nj ] ∩ (tm−1 , tm ], 1 ≤ j ≤ n, 1 ≤ m ≤ q, by Lemma 2, we get q
q
s nj−1 − sm−1 ≤ s nj−1 − P n (t) + P n (t) − P q (t) + P q (t) − sm−1 ≤ P n (t) − P q (t) + Ω(h n + h q ),
(24)
and Z n (t) − Z q (t) q ≤ L G |t nj − tmq | + s nj−1 − sm−1 + s nj−1 Ψ ∗ (t nj − Φ(t nj )) − sm−1 Ψ ∗ (tmq − Φ(tmq )) tmq t nj q ∗ n n + K (t j , , s j−1 )d − K∗ (tmq , , sm−1 )d . q
0
(25)
0
Since s nj−1 Ψ ∗ (t nj − Φ(t nj )) − sm−1 Ψ ∗ (tmq − Φ(tmq ))
≤ (h n + h q ) L Φ L Ψ (Ω + Ψ ∗ (0) + Ω MΨ2∗ + MΨ2∗ P n (t) − P q (t) q
and
q K∗ (tmq , , sm−1 )d 0 0
≤ (h n + h q ) Ω(T L K + L K ) + 2T L K + Mk + T L K P n (t) − P q (t) , t nj
K
∗
(t nj , , s nj−1 )d
−
q
tm
therefore, by (25), we have Z n (t) − Z q (t) ≤ ξnq + L 0 P n (t) − P q (t) , where ξnq = L G (h n + h q ) 1 + L Φ L Ψ (Ω + Ψ ∗ (0) ) + Ω(1 + MΨ2∗ + T L K
+L K ) + 2T L K + Mk , L 0 = L G (1 + MΨ2∗ + T L K ).
348
N. Gupta and Md. Maqbul
Clearly, ξnq → 0 as n, q → ∞. Therefore, d− n P (t) − P q (t) 2 ≤ L 0 (χnq + P n (t) − P q (t) 2 ) dt for a.e. t ∈ IT , where χnq is a real sequence such that χnq → 0 as n, q → ∞. Therefore,
t
P (t) − P (t) ≤ L 0 T χnq + n
q
2
P n (s) − P q (s) 2 ds .
0
Applying Gronwall’s lemma, ∃ a continuous function s from [−δ, T ] into Hs such that P n (t) converges uniformly to s(t) on [−δ, T ] as n → ∞. Clearly, s is Lipschitz continuous function on IT by Remark 1. Hence the proof is completed.
4 Main Result In this part, we will see that s(t) is a strong solution of (7). Theorem 1 If the conditions (N0)–(N3) are satisfied, then Eq. (7) has at least one strong solution s on [−δ, T ]. Moreover, if 2T L G (1 + MΨ2∗ + T L K ) < 1, then there exists a unique strong solution of Eq. (7) on [−δ, T ]. Proof Using Lemma 3 and concerning Remark 1, one can see Q n (t) converges uniformly to s(t) on IT , and s(t) ∈ D(A) for t ∈ IT . In view of Lemma 2.5 of [3], AQ n (t) As(t), where denotes the weak convergence. For t ∈ (t nj−1 , t nj ], t n ∗ ∗ K∗ (t, , s())d Z (t) − G t, s(t), s(t)Ψ (t − Φ(t)), 0 ∗ ≤ h n L G 1 + L φ L Ψ (Ω + Ψ (0) ) + Ω(1 + MΨ2∗ + L K + T L K ) + 2T L K +Mk + L G (1 + MΨ2∗ ) Q n (t) − s(t) + T L G L K sup Q n (t) − s(t) . t∈IT
t n ∗ ∗ Therefore, Z (t) − G t, s(t), s(t)Ψ (t − Φ(t)), K∗ (t, , s())d → 0 as n → ∞ uniformly on IT . By (20),
t 0
0
AQ n (), ud = Ψ ∗ (0), u − P n (t), u +
t
Z n (), ud
0
for all u ∈ Hs . By the bounded convergence theorem, we have
Approximate Solutions to Delay Diffusion Equations with Integral Forcing Function
t
As(), ud = Ψ ∗ (0), u − s(t), u +
0
349
t
G ∗ , s(), s()Ψ ∗ ( − Φ()),
K∗ (, v, s(v))dv , u d.
0
0
(26)
Thus, we have ds (t) + As(t) = G ∗ t, s(t), s(t)Ψ ∗ (t − Φ(t)), dt
t
K∗ (t, , s())d a. e. t ∈ IT .
0
(27) If s1 , s2 are strong solutions of (27) with s1 = s2 = Ψ ∗ on Iδ , then with the aid of (27) and Theorem 1.4.3 (b) of [6], for t ∈ IT , we obtain d s1 (t) − s2 (t) 2 dt t ≤ 2G ∗ t, s1 (t), s1 (t)Ψ ∗ (t − Φ(t)), K∗ (t, , s1 ())d − G ∗ t, s2 (t), 0 t K∗ (t, , s2 ())d s1 (t) − s2 (t) . (28) s2 (t)Ψ ∗ (t − Φ(t)), 0
Therefore, d s1 (t) − s2 (t) 2 ≤ 2L G (1 + MΨ2∗ + T L K ) sup s1 (t) − s2 (t) 2 . dt t∈IT For t ∈ IT , we conclude that s1 (t) − s2 (t) 2 ≤ 2T L G (1 + MΨ2∗ + T L K ) sup s1 (t) − s2 (t) 2 . t∈IT
Therefore, we have s1 (t) = s2 (t) on [−δ, T ].
5 Application

Consider the following nonlinear delay diffusion problem:

∂L/∂t = ∂^2L/∂x^2 + αL(t, x) + βL(t, x)L(t − δe^t, x) + λx sin t + μ ∫_0^t cos(se^{−t} + L(s, x)) ds,  t ∈ (0, T], x ∈ [0, 1],
L(t, x) = ηx cos t,  (t, x) ∈ Iδ × [0, 1],
L(t, 0) = 0 = L(t, 1),  t ∈ [−δ, T],    (29)

where α, β, λ, μ, η, δ, T > 0 are real numbers and T ∈ (0, δ]. Here, Φ(t) = δe^t, s(t)(x) = L(t, x), Ψ*(t)(x) = ηx cos t, K*(t, s, v)(x) = cos(se^{−t} + v(x)), and G*(t, v1, v2, v3)(x) = λx sin t + αv1(x) + βv2(x) + μv3(x). For t1, t2 ∈ IT, we have

|Φ(t1) − Φ(t2)| ≤ δ sup_{t∈IT} e^t |t1 − t2| ≤ δe^T |t1 − t2|.

For t1, t2 ∈ Iδ, consider

‖Ψ*(t1) − Ψ*(t2)‖^2 = ∫_0^1 (ηx cos t1 − ηx cos t2)^2 dx ≤ (η^2/2)(t1 − t2)^2.

Therefore, ‖Ψ*(t1) − Ψ*(t2)‖ ≤ (|η|/√2) |t1 − t2| for all t_ι ∈ Iδ (ι = 1, 2). For t_ι, s ∈ IT (ι = 1, 2) and v1, v2 ∈ Hs, consider

‖K*(t1, s, v1) − K*(t2, s, v2)‖^2 = ∫_0^1 ( cos(se^{−t1} + v1(x)) − cos(se^{−t2} + v2(x)) )^2 dx
  ≤ ∫_0^1 ( se^{−t1} + v1(x) − se^{−t2} − v2(x) )^2 dx
  ≤ 2 ∫_0^1 [ (se^{−t1} − se^{−t2})^2 + (v1(x) − v2(x))^2 ] dx
  ≤ 2T^2 (t1 − t2)^2 + 2‖v1 − v2‖^2.

Therefore, ‖K*(t1, s, v1) − K*(t2, s, v2)‖ ≤ √2 max{1, T} ( |t1 − t2| + ‖v1 − v2‖ ) for all t_ι, s ∈ IT (ι = 1, 2) and v_ι ∈ Hs (ι = 1, 2). For t_ι ∈ IT (ι = 1, 2) and m1, m2, n1, n2, o1, o2 ∈ Hs, we have

‖G*(t1, m1, n1, o1) − G*(t2, m2, n2, o2)‖ ≤ max{√2|λ|, 2|α|, 2|β|, 2|μ|} ( |t1 − t2| + ‖m1 − m2‖ + ‖n1 − n2‖ + ‖o1 − o2‖ ).

Hence, by Theorem 1, if 2T L_G (1 + M^2_{Ψ*} + T L_K) < 1, then (29) has a unique solution, where L_G = max{√2|λ|, 2|α|, 2|β|, 2|μ|}, M^2_{Ψ*} = sup_{t∈IT} |cos(t − δe^t)|, and L_K = √2 max{1, T}.
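Although the analysis above is purely theoretical, the scheme (8) suggests an obvious numerical method for (29): implicit Euler in time with the nonlinear, delay, and memory terms frozen at the previous level, and finite differences in space. The sketch below follows that idea; all parameter values are placeholders and the discretization choices are ours, not the authors'.

```python
import numpy as np

# Illustrative semidiscretization (Rothe-type) sketch for problem (29).
alpha, beta_, lam, mu, eta, delta, T = 0.5, 0.5, 1.0, 0.5, 1.0, 1.0, 0.5
M, n = 50, 200                          # space intervals, time steps
x = np.linspace(0.0, 1.0, M + 1)
h, dt = x[1] - x[0], T / n

def history(t):                         # L(t, x) = eta * x * cos(t) for t <= 0
    return eta * x * np.cos(t)

# (I - dt * d^2/dx^2) on interior nodes, homogeneous Dirichlet BC
main = (1 + 2 * dt / h**2) * np.ones(M - 1)
off = (-dt / h**2) * np.ones(M - 2)
Amat = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

L_prev = history(0.0)
past = [L_prev]                         # stored levels for the memory integral
for j in range(1, n + 1):
    t = j * dt
    # since T <= delta, the delayed argument t - delta*e^t lies in the history
    delayed = history(t - delta * np.exp(t))
    integral = dt * sum(np.cos(i * dt * np.exp(-t) + past[i]) for i in range(j))
    G = alpha * L_prev + beta_ * L_prev * delayed + lam * x * np.sin(t) + mu * integral
    rhs = L_prev[1:-1] + dt * G[1:-1]
    L_new = np.zeros_like(L_prev)
    L_new[1:-1] = np.linalg.solve(Amat, rhs)
    past.append(L_new)
    L_prev = L_new
print("max |L(T, x)| ≈", np.abs(L_prev).max())
```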
6 Conclusion

In this article, we studied a nonlinear diffusion equation with time delay and an integral forcing function (1), subject to history and Dirichlet boundary conditions. We first reduced the problem (1) to a problem (7) for an evolution equation by using the semigroup theory of bounded linear operators. We considered a discretization method and then established some a priori estimates. Further, we constructed a sequence of functions and demonstrated that this sequence converges uniformly to the unique strong solution of (7). Eventually, we considered an application to illustrate the result.
References 1. Chaoui A, Guezane-Lakoud A (2015) Solutions to an integrodifferential equation with integral condition. Appl Math Comput 266:903–908 2. Gupta N, Maqbul Md (2019) Solutions to Rayleigh-Love equation with constant coefficients and delay forcing term. Appl Math Comput 355:123–134 3. Kato T (1967) Nonlinear semigroup and evolution equations. Math Soc Jpn 19:508–520 4. Maqbul Md (2018) Solutions to neutral partial functional differential equations with functional delay. J Phys: Conf Ser 1132:012024 5. Merazga N, Bouziani A (2007) Nonlinear analysis: theory, methods and applications. Nonlinear Anal 66:604–623 6. Pazy A (1983) Semigroup of linear operators and application to partial differential equations. Springer, New York 7. Raheem A, Bahuguna D (2011) A study of delayed cooperation diffusion system with dirichlet boundary conditions. Appl Math Comput 218:4169–4176 8. Rektorys K (1982) The method of discretization in time and partial differential equations. Reidel Publishing Company, Dordrecht-Boston-London. D 9. Rothe E (1930) Two-dimensional parabolic boundary value problems as a limiting case of onedimensional boundary value problems. Math Ann 102:650–670
On m-Bonacci Intersection-Sum Graphs Kalpana Mahalingam and Helda Princy Rajendran
Abstract We define a new graph called the m-bonacci intersection-sum graph, denoted by Im,n for positive integers m and n. The graph Im,n is a graph on the vertex set V = P(A) − ∅, A = {1, 2, 3, …, n}, and two vertices u and v are adjacent if and only if they are not disjoint and there exist i ∈ u and j ∈ v with i ≠ j such that i + j is an m-bonacci number. We use this graph to arrange all the non-empty subsets of {1, 2, 3, …, n} such that consecutive subsets are not disjoint and there exists at least one pair of unequal elements, one from each of the consecutive subsets, whose sum is an m-bonacci number. Keywords m-bonacci number · Sum graph · Intersection graph
1 Introduction In graph theory, an intersection graph connects a family of sets to a graph. An intersection graph is an undirected graph formed from a family of non-empty sets. The graph is obtained by considering each set as a vertex, and two different vertices are adjacent if they are not disjoint. Intersection graphs play an important role in both theoretical aspects as well as applications. Any graph can be represented as an intersection graph [5, 16]. In 1966, Erdös et al. proved that, for any graph G on n vertices, there is a set S with ⌊n^2/4⌋ elements and a collection F of n subsets of S such that the intersection graph on the sets F is G, and n^2/4 is the lower bound for the cardinality of the set S [8]. Based on the properties and the structure of the intersection graph, different types of intersection graphs are defined. To name a few, some intersection graphs are K. Mahalingam · H. Princy Rajendran (B) Department of Mathematics, Indian Institute of Technology, Chennai 600036, India e-mail: [email protected] K. Mahalingam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_26
interval, line, trapezoid, chordal, unit disk, etc. [4, 6, 7, 10, 12, 17]. For more information on the types of intersection graphs and properties of intersection graphs, we refer the reader to [14, 15]. In this paper, we combine the concept of Intersection graphs and m-bonacci-sum graphs. In 2006, Barwell posed the problem of finding an ordering of the set {1, 2, 3, . . . , 34} such that the consecutive pairs sum up to a Fibonacci number [2]. In [9], Fox et al. defined a new set of graphs called Fibonacci-sum graphs to solve the problem posed by Barwell. Using that definition, several properties of Fibonacci-sum graphs were discussed in [1]. In [13], Mahalingam et al. considered an extension of Barwell’s problem. Extension of Barwell problem: Given any set {1, 2, . . . , n}, does there exist a partim−1 ) and an ordering of elements in each Ai such tion of it into m − 1 sets (say {Ai }i=1 that the sum of any two consecutive numbers in the set is an m-bonacci number? To solve this problem, Mahalingam et al. extended the concept of the Fibonacci-sum graph to the m-bonacci-sum graph by replacing Fibonacci numbers with m-bonacci numbers and solved the extension of Barwell’s problem. The extension of the Barwell problem has a solution only for certain values of n. For m = 3 and 4, the number n can be up to 9 and 11, respectively and for m ≥ 5, the number n can be up to 12. Thus, only for very few sets of integers does such an ordering exist. In order to extend the value of n, in this paper, we consider another extension of Barwell’s problem. The new extended version of Barwell’s problem is, given a set {1, 2, 3, . . . , n}, does there exist an ordering of all the possible non-empty subsets of {1, 2, . . . , n}, such that the consecutive sets are not disjoint and there exists at least one pair of unequal elements from the consecutive sets such that their sum is an m-bonacci number? To solve this problem, we defined a new set of graphs called m-bonacci intersection-sum graphs. Let A = {1, 2, 3, . . . , n}. The m-bonacci intersection-sum graph is a graph on the vertex set P(A) − ∅, where P(A) is the power set of the set A, any two vertices are adjacent if and only if the corresponding sets are not disjoint, and there exists at least one element from each set, such that they are unequal and their sum is an m-bonacci number. We found the values of n for which such an ordering exists. The paper is arranged as follows. In Sect. 2, notations, existing definitions, and some existing results are given. The definition and example of m-bonacci intersection-sum graph are given in Sect. 3. The properties of the m-bonacci intersection-sum graph are discussed in Sect. 4. In Sect. 5, the solution to the extension of Barwell’s problem is discussed. We end the paper with a few concluding remarks.
2 Preliminaries In this section, we recall some definitions. Let G be a graph on n vertices. We denote the degree of any vertex u of G by degG (u) or d(u). The closure of the graph G is obtained by adding edges between non-adjacent vertices whose degree sum is at
least n, until no such pair of vertices exists. The closure of G is denoted by C(G). A graph is called chordal if any cycle with four or more vertices has a chord. For the basic concepts of graph theory, we refer the reader to [3]. Now, we recall the definition of the intersection graph.

Definition 1 Let A = {A1, A2, A3, …} be a family of sets. Denote each set as a vertex v_i, that is, the set A_i corresponds to the vertex v_i. A graph with vertex set {v1, v2, …}, in which any two vertices are adjacent if and only if the corresponding sets have a non-empty intersection, is called an intersection graph.

Next, we recall the definition of m-bonacci numbers [11].

Definition 2 The m-bonacci sequence {Z_{n,m}}_{n≥1} is defined by Z_{i,m} = 0 for 1 ≤ i ≤ m − 1, Z_{m,m} = 1, and, for n ≥ m + 1,

Z_{n,m} = ∑_{i=n−m}^{n−1} Z_{i,m}.
Each Z_{i,m} is called an m-bonacci number. For example, when m = 5, the sequence is {Z_{n,5}}_{n=1}^{∞} = {0, 0, 0, 0, 1, 1, 2, 4, 8, 16, 31, …}. We recall the definition and some properties of the m-bonacci-sum graph from [13].

Definition 3 ([13]) For fixed m ≥ 2 and each n ≥ 1, the m-bonacci-sum graph, denoted by G_{m,n} = (V, E), is the graph defined on the vertex set V = [n] = {1, 2, …, n}, with edge set E = {{i, j} : i, j ∈ V, i ≠ j, i + j is an m-bonacci number}.

The following results are some properties of the m-bonacci-sum graph G_{m,n} from the literature which will be used later.

Lemma 1 ([13]) If 2^i ≤ n for some i ∈ {1, 2, …, m − 2}, then in G_{m,n}, 2^i is not adjacent to any k < 2^i.

Proposition 1 ([13]) Let m ≥ 3. In G_{m,2^i}, 1 ≤ i ≤ m − 2, 2^i and 2^{i−1} are the only isolated vertices. In G_{m,2^{m−1}}, 2^{m−2} is the only isolated vertex.

Proposition 2 ([13]) Let m ≥ 2. G_{m,n} has at least one isolated vertex if n < N, where N = 3 · 2^{m−2} − 1. For all n ≥ N, G_{m,n} has no isolated vertex.

Corollary 1 ([13]) Let k ≥ m + 1 and let n satisfy Z_{k,m} ≤ n < Z_{k+1,m}. If n < Z_{k+2,m}/2, then the degrees of the vertices Z_{k,m}, Z_{k,m} + 1, …, n are at most one in G_{m,n}. If n ≥ Z_{k+2,m}/2, then the degrees of the vertices Z_{k,m}, Z_{k,m} + 1, …, Z_{k+2,m} − n − 1 are at most one in G_{m,n}.

Theorem 1 ([13]) For each n ≥ 2^{m−2}, G_{m,n} has exactly (m − 1) components.
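Definitions 2 and 3 are easy to realize in code, which is convenient for checking the small cases used later in the paper. A short Python sketch (the function names are ours):

```python
def m_bonacci(m, count=20):
    """First `count` terms of the m-bonacci sequence of Definition 2."""
    Z = [0] * (m - 1) + [1]
    while len(Z) < count:
        Z.append(sum(Z[-m:]))
    return Z

def G_sum_graph(m, n):
    """Edge set of the m-bonacci-sum graph G_{m,n} of Definition 3."""
    zs = set(m_bonacci(m, count=n + m + 5))   # enough terms to cover sums <= 2n
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if i + j in zs}

# Example: m = 5 gives 0, 0, 0, 0, 1, 1, 2, 4, 8, 16, 31, ...
print(m_bonacci(5, 11))
print(sorted(G_sum_graph(2, 9)))
```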
3 m-Bonacci Intersection-Sum Graph

In this section, we define the m-bonacci intersection-sum graph.

Definition 4 Let m ≥ 2 and A = {1, 2, 3, …, n}. Let V = P(A) − ∅, where P(A) is the power set of A. The m-bonacci intersection-sum graph, denoted by I_{m,n}, is defined as the graph on the vertex set V whose edge set E is defined as follows. Let u, v ∈ V, u ≠ v. Then (u, v) ∈ E if and only if
1. u ∩ v ≠ ∅, and
2. there exist some i ∈ u and j ∈ v such that i ≠ j and i + j is an m-bonacci number.

From the definition, it is clear that the number of vertices of the graph I_{m,n} is 2^n − 1. We illustrate the definition with examples given in Figs. 1 and 2.
Fig. 1 The graph I_{m,3}, m ≥ 3 (graph drawing omitted; the vertices are the seven non-empty subsets of {1, 2, 3})

Fig. 2 The graph I_{2,3} (graph drawing omitted; the vertices are the seven non-empty subsets of {1, 2, 3})
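Definition 4 can likewise be realized by brute force over the 2^n − 1 non-empty subsets, which is handy for reproducing the small examples in Figs. 1–3. The sketch below assumes the m_bonacci helper from the earlier snippet.

```python
from itertools import combinations

def I_graph(m, n):
    """Vertices and edges of the m-bonacci intersection-sum graph I_{m,n}
    of Definition 4 (illustrative sketch)."""
    zs = set(m_bonacci(m, count=n + m + 5))
    A = range(1, n + 1)
    vertices = [frozenset(c) for r in range(1, n + 1) for c in combinations(A, r)]
    edges = []
    for u, v in combinations(vertices, 2):
        if u & v and any(i != j and i + j in zs for i in u for j in v):
            edges.append((u, v))
    return vertices, edges

# Small check against Fig. 2: I_{2,3} has 2^3 - 1 = 7 vertices
verts, edges = I_graph(2, 3)
print(len(verts), len(edges))
```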
4 Properties of m-Bonacci Intersection-Sum Graph In this section, we discuss some properties of the m-bonacci intersection-sum graph. Let A = {1, 2, . . . , n} and let V = P(A) − ∅. Denote the intersection graph on the vertex set V as In . One example is given in Fig. 3. The following result states the relation between In and Im,n . Lemma 2 The m-bonacci intersection-sum graph Im,n is a subgraph of In for any m ≥ 2. Proof Let m ≥ 2. Note that the vertex sets of In and Im,n are the same. If two vertices u and v are adjacent in Im,n , then u ∩ v = ∅. Hence, u and v are adjacent in In . Thus, Im,n is a subgraph of In for any m. The converse of Lemma 2 is not true. For example, consider the intersection graph I3 . In I3 , the vertex {3} is adjacent to the vertices {1, 3}, {2, 3}, and {1, 2, 3} (Fig. 3). However, in I2,3 , the vertices {3} and {1, 3} are not adjacent (Fig. 2) and in Im,3 , m ≥ 3, the vertices {3} and {2, 3} are not adjacent (Fig. 1). The following result states that the m-bonacci intersection-sum graph Im,n is always a subgraph of m-bonacci intersection-sum graph Im,n+i , i ≥ 0. This result is a direct observation from the definition of the m-bonacci intersection-sum graph. Lemma 3 For fixed m ≥ 2, for all n, Im,n is a subgraph of Im,n+i , i ≥ 0. The next result shows the relation between the m-bonacci-sum graph G m,n and the m-bonacci intersection-sum graph Im,n . Lemma 4 Let m ≥ 2 and n ≥ 1. The vertex x is isolated in G m,n if and only if the vertex {x} is isolated in Im,n .
Fig. 3 The intersection graph I_3 (graph drawing omitted; the vertices are the seven non-empty subsets of {1, 2, 3})
Proof Let x ∈ V (G m,n ). Assume that x is an isolated vertex in G m,n . Then, there exists no y, 1 ≤ y ≤ n, such that x + y is an m-bonacci number. Hence, the vertex {x} is isolated in Im,n . The proof for the converse part is similar. The following result follows directly from Lemma 1, Proposition 1, and Lemma 4. Lemma 5 If 2i ≤ n ≤ 2i+1 for some i ∈ {1, 2, . . . , m − 2}, in Im,n , the vertex {2i } is isolated. In [13], Mahalingam et al. proved that the m-bonacci-sum graph G m,n is bipartite. However, the m-bonacci intersection-sum graph Im,n is not bipartite for all but finite values of n. By direct verification, for m ≥ 2, the graphs Im,1 , Im,2 are bipartite. In the next result, we prove that the m-bonacci intersection-sum graph Im,n is not bipartite for all n ≥ 3. Proposition 3 Let m ≥ 2. The graph Im,n , n ≥ 3 is not bipartite. Proof We prove this result by considering the following two cases: Case 1: m = 2 graph has a cycle of length three. The cycle is formed by the vertices {1}, {1, 2}, {1, 2, 3}. Case 2: m ≥ 3 graph has a cycle of length three formed by the vertices {1}, {1, 3}, {1, 2, 3}. Hence, Im,n is not bipartite.
The m-bonacci-sum graph G m,n is disconnected and has exactly m − 1 components for each n ≥ 2m−2 [13]. But the m-bonacci intersection-sum graph Im,n is connected for all n ≥ 3 · 2m−2 − 1. In the next theorem, the connectedness of Im,n , n ≥ 2 is discussed. When n = 1, Im,1 = K 1 , which is connected. Theorem 2 Let m ≥ 2. The graph Im,n is connected for all n ≥ 3 · 2m−2 − 1. Also, for 1 < n < 3 · 2m−2 − 1, Im,n is disconnected. Proof Let n ≥ 3 · 2m−2 − 1 and let v, u ∈ V (Im,n ). Then, u, v = ∅. If u and v are adjacent, then we are done. Assume that u and v are not adjacent. Choose i ∈ u and j ∈ v such that i = j (since u = v). Since n ≥ 3 · 2m−2 − 1, by Proposition 2 and Lemma 4, there exist i 1 and j1 , 1 ≤ i 1 ≤ j1 ≤ n, such that i + i 1 and j + j1 are mbonacci numbers. Consider a set which has i, i 1 , j, j1 , say w. Clearly, by definition, w is adjacent to both u and v and hence, the graph Im,n is connected. Let 1 < n < 3 · 2m−2 − 1. Then, by Proposition 2, there exists an x such that x is an isolated vertex in G m,n . Now, by Lemma 4, the vertex {x} is isolated in Im,n . Hence, the result. In the following result, we discuss the degree of certain vertices of the graph Im,n .
Lemma 6 Let m ≥ 2 and n ≥ 3 · 2m−2 − 1. (a) The degree of the vertex {1, 2, 3, . . . , n} in Im,n is 2n − 2. (b) The degree of the vertex {x}, 1 ≤ x ≤ n in Im,n is 2n − 2n−1 − 2n−k + 2n−k−1 , where k is the number of vertices adjacent to x in G m,n . (c) Let x ∈ {1, 2, 3, . . . , n}. The degree of the vertex {1, 2, 3, . . . , n} − {x} is 2n − 2 − 2k , where k is the number of pendant vertices adjacent to x in G m,n . Proof (a) Let u = {1, 2, 3, . . . , n} and let v ∈ V (Im,n ) − {u}. To prove the result, it is enough to show that u and v are adjacent. Clearly, u ∩ v = ∅. Since n ≥ 3 · 2m−2 − 1, there exist some x ∈ u and y ∈ v such that x + y is an m-bonacci number. Hence, u and v are adjacent. (b) Let u be a vertex such that x ∈ / u. Then, u ∩ {x} = ∅. Hence, u is not adjacent to {x}. Let A be the collection of such vertices. The cardinality of A is 2n−1 − 1. If a vertex u is adjacent to {x} in Im,n , then there exists a y ∈ u, such that x + y is an m-bonacci number. The number of such y ≤ n such that x + y is an m-bonacci number is the degree of x in G m,n . Let it be k. So, the vertices which do not contain such y are not adjacent to {x}. Let B be the collection of such vertices, and we have |B| = 2n−k − 1. By calculation, we have |A ∩ B| = 2n−k−1 − 1. Thus, the degree of the vertex {x} in Im,n is 2n − 2n−1 − 2n−k + 2n−k−1 . (c) Let u = {1, 2, 3, . . . , n} − {x}. Let S = {y : 1 ≤ y ≤ n, x = y, y is a pendant vertex adjacent to x in G m,n }. Let |S| = k and let v be any vertex in Im,n such that v ⊆ S. Clearly u ∩ v = v. By our choice of v, there exist no i ∈ u and j ∈ v, such that i + j is an mbonacci number. Hence, u and v are not adjacent. Let w be any other vertex in Im,n , such that w S and w = {x}. Since u = {1, 2, 3, . . . , n} − {x} and w S, w ∩ u = ∅, and w ∩ u S, choose i ∈ (w ∩ u) − S. Since n ≥ 3 · 2m−2 − 1 and i ∈ / S, by Proposition 2, there exist j ∈ u such that i + j is an m-bonacci number. Hence, w is adjacent to u in Im,n . Therefore, the only vertices that are not adjacent to u in Im,n are {x} and v, v ⊆ S. The number of such v is 2k − 1. Thus, the degree of the vertex u in Im,n is 2n − 2 − 2k . From Fig. 2, one can easily verify that the graph I2,3 is chordal. However, the graph Im,n is not necessarily chordal for all m and n. In the next theorem, we find the values of m and n for which the graph Im,n is not chordal. Theorem 3 For m = 3, the graph Im,n is not chordal for n ≥ 4. For m = 3, Im,n is not chordal for n ≥ 5. Proof For n = 2, the graph Im,n is either a path on three vertices or K 3c . Hence, Im,2 is chordal. For n = 3, from Figs. 1 and 2, we can directly verify that the graph Im,3 is chordal. By direct verification, it is easy to check that I3,4 is chordal. Let m = 2 and n ≥ 4. Consider the cycle C : {1, 3}, {2, 3}, {2, 4}, {1, 4}, {1, 3}. Since {1, 3} ∩ {2, 4} = ∅, the vertices {1, 3} and {2, 4} are not adjacent. Similarly,
360
K. Mahalingam and H. Princy Rajendran
the vertices {2, 3} and {1, 4} are not adjacent. That is, C is a chordless cycle of length four. Hence, I2,n is not chordal for all n ≥ 4. Let m = 3 and n ≥ 5. The cycle {1, 5}, {3, 4, 5}, {2, 3}, {1, 2}, {1, 5} is a chordless cycle of length four and hence I3,n is not chordal for all n ≥ 5. Let m ≥ 4 and n ≥ 4. The cycle {1, 2}, {2, 3}, {1, 2, 4}, {2, 3, 4}, {1, 2} is a chord less cycle of length four and hence Im,n is not chordal for all n ≥ 4. In the next theorem, we show that the graph Im,n , n ≥ 2 is not Eulerian. For m ≥ 2 and n = 1, Im,1 = K 1 , which is Eulerian. Theorem 4 Let m ≥ 2. The graph Im,n is not Eulerian for all n ≥ 2. Proof By Theorem 2, the graph is disconnected for 1 < n < 3 · 2m−2 − 1, and hence Im,n is not Eulerian in this case. Let n ≥ 3 · 2m−2 − 1. There are two cases. 1. m = 2 2. m ≥ 3. Case 1: m = 2 For n = 2, I2,2 is a path on three vertices and hence, not Eulerian. Let n ≥ 3. Choose x ∈ {1, 2, . . . , n}, such that there exists exactly one y, y ≤ n and x + y is an mbonacci number. That is, choose a number x, such that x is a pendant vertex in G 2,n . Since n ≥ 3 · 2m−2 − 1 and by Corollary 1, such an x always exists. By Theorem 1, G 2,n is connected and hence y is not a pendant vertex in G 2,n (since n ≥ 3). By Lemma 6, the degree of the vertex {1, 2, 3, . . . , n} − {x} is 2n − 3. The degree of the vertex {1, 2, 3, . . . , n} − {x} is odd and hence, Im,n is not Eulerian. Case 2: m ≥ 3 By Theorem 1, the graph G m,n has exactly m − 1 components. Let the components be A1 , A2 , . . . , Am−1 . Let A1 be the component that has the vertex 1 in G m,n . Since m ≥ 3, n ≥ 3 · 2m−2 − 1 ≥ 5. Let B be the vertex set of A1 . Since n ≥ 5, for m = 3, B has at least 1, 3, and 4. Similarly, for m ≥ 4, B has at least 1, 3, and 5. Hence, the number of vertices in the component A1 is at least 3. Choose x ∈ B, such that d A1 (x) ≤ d A1 (y) for each y ∈ B and y = x, where d A1 (x) denotes the degree of the vertex x in A1 . If the degree of x is one in A1 , then the degree of x in G m,n is also one. Let y be the vertex adjacent to x in G m,n . Since the number of vertices in A1 is at least three, the degree of the vertex y in G m,n must be at least two (since A1 is a component and hence, connected). Hence, x is not adjacent to any pendant vertex in G m,n . Now, consider the vertex u = {1, 2, 3, . . . , n} − {x} in Im,n . By Lemma 6, the degree of the vertex u in Im,n is 2n − 3, which is odd. If the degree of the vertex x is greater than one in A1 , any vertex that is adjacent to x in G m,n is not a pendant vertex (since the degree of x is the minimum in A1 ). By Lemma 6, the degree of the vertex {1, 2, 3 . . . , n} − x is 2n − 3, which is odd. Therefore, Im,n is not Eulerian. Hence, the result.
5 Extension of Barwell’s Problem In this section, we solve the extension of Barwell’s problem. In this paper, we considered the following extension of Barwell’s problem: Given a set {1, 2, 3, . . . , n}, does there exist an ordering of all the possible non-empty subsets of {1, 2, . . . , n}, such that the consecutive sets are not disjoint and there exists at least one pair of unequal elements from the consecutive sets such that their sum is an m-bonacci number? To solve this problem, it is enough to check whether the graph Im,n has a spanning path or not. In the following theorem, we show that the m-bonacci intersection-sum graph Im,n is Hamiltonian for all n ≥ 3 · 2m−2 − 1 and hence, it has a spanning path. Theorem 5 Let m ≥ 3. The graph Im,n is Hamiltonian for all n ≥ 3 · 2m−2 − 1. For m = 2, the graph Im,n is Hamiltonian for all n ≥ 4. Proof Let A = {1, 2, · · · , n}. We know that if the closure of a graph is Hamiltonian, then the graph is Hamiltonian. So, it is enough to prove that C(Im,n ) is Hamiltonian. Now, we have the following two cases. 1. m ≥ 3 and n ≥ 3 · 2m−2 − 1 2. m = 2 and n ≥ 4. Case 1: m ≥ 3 and n ≥ 3 · 2m−2 − 1 First, let us find the degree of any given vertex in Im,n . Let u be a vertex of Im,n and u be defined as u = {x : ∃ y ∈ u such that x = y, x + y is an m − bonacci number}. Throughout this theorem, we follow the above definition and notation for u . Let |A − u| = k1 , |A − u | = k2 , |A − (u ∪ u )| = k3 . Let v ⊆ A − u. Since u ∩ (A − u) = ∅, u and v are not adjacent. Similarly, any vertex which is a subset of A − (u ∪ u ) or A − u is not adjacent to u. Thus, a vertex that is a subset of either A − u, A − u , or A − (u ∪ u ) is not adjacent to u. Hence, the degree of u is 2n − 2k1 − 2k2 + 2k3 − 1 if u = u , otherwise, the degree of u is 2n − 2n−k − 1, where k is the cardinality of the set u. Claim 1: d({x1 , x2 , . . . , xk }) ≤ d({x1 , x2 , . . . , xk , xk+1 }). Let u and v be two vertices of Im,n such that u = {x1 , x2 , . . . , xk } and v = {x1 , x2 , . . . , xk , xk+1 }. Note that u ⊂ v and hence u ⊆ v . Since |u| < |v|, |A − u| > |A − v|. Let |A − v | = k2 and |A − (v ∪ v )| = k3 . Here, we have two cases. (1) u = u
(2) u = u . Subcase 1: If u = u , then d(u) = 2n − 2n−k − 1. Since u ⊂ v and xk+1 ∈ / u, xk+1 ∈ / u . Since n ≥ 3 · 2m−2 − 1, there exist y ∈ A, such that xk+1 + y is an m-bonacci number. Since u = u and xk+1 ∈ / u , y ∈ / u . However, by the definition of v , y ∈ v .
Therefore, u ⊂ v . Let |v | = k + k1 , k1 ≥ 1. In this case, d(v) is 2n − 2n−(k+1) − 2n−(k+k1 ) + 2n−(2k+k1 +1) − 1. Since k1 ≥ 1, 2n−(k+1) + 2n−(k+k1 ) − 2n−(2k+k1 +1) +
1 ≤ 2n−k + 1. Therefore, the degree of the vertex v is greater than the degree of the vertex u. Hence, d(v) ≥ d(u). Subcase 2: Proof of the case, when u = u is similar to Subcase 1, we can conclude that d(u) ≤ d(v). Claim 2: Degree of each and every vertex in C(Im,n ) is 2n − 2. Let u be a vertex of Im,n . Let |u| = n. Since n ≥ 3 · 2m−2 − 1, by Lemma 6, we have the degree of u is 2n − 2. Now, let us consider u to be a vertex of Im,n with |u| = n − 1. Let x be the only element not in u. Since n ≥ 3 · 2m−2 − 1, there exist at least one y ∈ {1, 2, . . . , n} − {x} such that x + y is an m-bonacci number. Collect all y such that x is the only number in the set {1, 2, . . . , n} − {y} which satisfies the condition x + y is an m-bonacci number. Let B be the collection of such y ∈ {1, 2, . . . , n} and |B| = k. By Lemma 6, the degree of the vertex u is 2n − 2k − 2. Since n ≥ 5, 0 ≤ k ≤ n − 3. Let y ∈ B. Consider the vertex {y}. Degree of the vertex {y} is 2n−2 . The sum of the degrees of the vertices u and {y} is at least 2n + 2n−3 − 1, which is greater than the number of vertices of Im,n . By the definition of closure of a graph, u and {y} are adjacent in C(Im,n ). This implies that any vertex of the form {y}, y ∈ B is adjacent to u in C(Im,n ). By Claim 1, the degree of any vertex of the form {x1 , x2 , . . . , xk1 } ⊆ B, k1 > 1 is greater than or equal to the degree of the vertex {x1 }. Thus, any vertex which is a subset of B is adjacent to u in C(Im,n ). Now, the only remaining vertex is {x}. Since u is adjacent to every other vertex of Im,n and the degree of {x} is at least 2n−2 , u is adjacent to {x} in C(Im,n ). Therefore, the degree of u in C(Im,n ) is 2n − 2. Now, let us consider the case of |u| = n − 2. Let x and y be the only elements not in u. If x + y is an m-bonacci number and there does not exist any z ∈ u such that neither x + z nor y + z is an m-bonacci number, then the degree of u is 2n − 5. Since the minimum degree is 2n−2 , n ≥ 5, u is adjacent to each and every vertex in C(Im,n ). Now, assume that there exists a z ∈ u such that x + z is an m-bonacci number. Also, assume that there exists no z ∈ u such that y + z is an m-bonacci number. This implies that x + y is an m-bonacci number (since n ≥ 3 · 2m−2 − 1). Collect all z such that x is the only number in the set {1, 2, . . . , n} − {z} which satisfies the condition x + z is an m-bonacci number. Let the collection be C and |C| = k. Note that y ∈ C. By definition, any vertex which is a subset of C is not adjacent to u in Im,n . Also, the vertices {x} and {x, y} are not adjacent to u in Im,n . Hence, the degree of u is 2n − 2k − 3, 1 ≤ k ≤ n − 3. Any vertex of the form {z}, z ∈ C has degree 2n−2 . The sum of the degrees of vertices u and {z} is at least 2n + 2n−3 − 3. Since n ≥ 5, the degree sum is greater than the number of vertices in Im,n . Thus, u and {z} are adjacent in C(Im,n ). This implies that any vertex which is a subset of C is adjacent to u in C(Im,n ) (since d({x1 }) ≤ d({x1 , x2 }) for any x1 , x2 ∈ {1, 2, . . . , n}). The remaining vertices are {x} and {x, y}. By the definition of C(Im,n ), u is adjacent to each and every vertex. In a similar way, we can prove that the degree of each and every vertex in C(Im,n ) is 2n − 2. Thus, C(Im,n ) is Hamiltonian and hence, Im,n is Hamiltonian.
Fig. 4 Hamiltonian cycle of I2,4 (figure: a Hamiltonian cycle passing through all fifteen non-empty subsets of {1, 2, 3, 4}; graph not reproduced)
Case 2: m = 2 and n ≥ 4 Consider the graph I2,4 . One of the Hamiltonian cycles in I2,4 is given in Fig. 4. Hence, I2,4 is Hamiltonian. For n ≥ 5, the proof is similar to that of m ≥ 3 and n ≥ 3 · 2m−2 − 1 case. In the following corollary, we show that the bound given in Theorem 5 is strict. Corollary 2 Let m ≥ 3. The graph Im,n is not Hamiltonian for all 1 < n < 3 · 2m−2 − 1. For m = 2, the graph Im,n is not Hamiltonian for all n = 2, 3. Proof For m ≥ 3, the graph Im,n is not connected for 1 < n < 3 · 2m−2 − 1 and hence not Hamiltonian. For m = 2, I2,2 is a path on three vertices. For I2,3 , from Fig. 2, it is easy to verify that the length of the largest cycle is six, which is lesser than the number of vertices. Hence, the result. From Theorem 5, we have that the m-bonacci intersection-sum graph Im,n , m ≥ 3 is Hamiltonian for all n ≥ 3 · 2m−2 − 1 and hence, it has a spanning path. Thus, for n ≥ 3 · 2m−2 − 1 and m ≥ 3, it is possible to arrange the subsets in such a way that the consecutive subsets are not disjoint and there exist i ∈ U and j ∈ V , i = j, where U and V are consecutive subsets in the arrangement and i + j is an m-bonacci number. From Theorem 2, we know that for 1 < n < 3 · 2m−2 − 1, the graph Im,n
is disconnected and hence it does not have any spanning path. Hence, for 1 < n < 3 · 2m−2 − 1 and m ≥ 2, it is not possible to get the required arrangement. For m = 2, from Theorem 5, we have that the graph I2,n is Hamiltonian for every n ≥ 4. For n = 1, 2, and 3, direct verification shows that I2,n has a spanning path. Now, we give the solution to the extension of Barwell's problem. Extension of Barwell's problem: Given a set {1, 2, 3, . . . , n}, does there exist an ordering of all the possible non-empty subsets of {1, 2, . . . , n}, such that the consecutive sets are not disjoint and there exists at least one pair of unequal elements from the consecutive sets such that their sum is an m-bonacci number? Finding such an ordering is equivalent to finding a spanning path in the graph Im,n. The following table gives the values of n for a given m, for which such an ordering exists; a small computational check follows the table.

m = 2 :  n ≥ 1
m ≥ 3 :  n = 1 and n ≥ 3 · 2m−2 − 1
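For small cases, the equivalence between the ordering and a spanning path can be checked by brute force. The Python sketch below is not part of the original paper: it builds Im,n from a user-supplied set of m-bonacci numbers using the adjacency rule stated in the problem above, and searches for a Hamiltonian (spanning) path over the non-empty subsets. The helper names and the choice of the Fibonacci-type numbers 1, 2, 3, 5, 8 as the 2-bonacci numbers relevant for n ≤ 4 are illustrative assumptions.

```python
from itertools import combinations

def subsets(n):
    """All non-empty subsets of {1, ..., n} as frozensets."""
    elems = range(1, n + 1)
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(elems, r)]

def adjacent(u, v, bonacci):
    """Edge rule: u and v intersect and some i in u, j in v with i != j has i + j m-bonacci."""
    if not (u & v):
        return False
    return any(i != j and (i + j) in bonacci for i in u for j in v)

def has_spanning_path(n, bonacci):
    """Brute-force search for a spanning path; feasible only for small n."""
    verts = subsets(n)

    def extend(path, remaining):
        if not remaining:
            return True
        return any(adjacent(path[-1], w, bonacci) and extend(path + [w], remaining - {w})
                   for w in remaining)

    return any(extend([v], set(verts) - {v}) for v in verts)

# Assumed 2-bonacci (Fibonacci-type) numbers large enough for n = 4.
fib = {1, 2, 3, 5, 8}
# Under these assumptions, the results of the paper suggest both calls return True.
print(has_spanning_path(3, fib), has_spanning_path(4, fib))
```

The search is exponential in general, but for n ≤ 4 the graph is small and dense, so a path is found quickly.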
6 Conclusion In this paper, we defined a new intersection graph called the m-bonacci intersectionsum graph. We discussed the properties of the m-bonacci intersection-sum graph and showed that the graph is connected, not bipartite, and not Eulerian. We proved that the non-empty subsets of the set {1, 2, 3, . . . , n} can be arranged in such a way that the consecutive subsets in the arrangement have a non-empty intersection, and there exists at least one element from each pair of consecutive subsets such that they are not equal and their sum is an m-bonacci number for all but finitely many values of n. It will be interesting to look into the other graph-theoretical properties of the m-bonacci intersection-sum graph such as the number of automorphisms of Im,n and so on. Acknowledgements The second author wishes to acknowledge the fellowship received from the Department of Science and Technology under INSPIRE fellowship (IF170077).
References 1. Arman A, Gunderson DS, Li PC (2017) Properties of the Fibonacci-sum graph. arXiv:1710.10303v1, [math CO] 2. Barwell B (2006) Problem 2732, Problems and conjectures. J Recreat Math 34:220–223 3. Bondy JA, Murty USR (2008) Graph theory. Springer, New York 4. Clark BN, Colbourn CJ, Johnson DS (1990) Unit disk graphs. Discret Math 86(1–3):165–177 ˇ 5. Culík K (1964) Applications of graph theory to mathematical logic and linguistics, Theory of graphs and its applications (Proceedings symposium smolenice, 1963), 13–20. Publ. House Czech. Acad. Sci, Prague
6. Dagan I, Golumbic MC, Ron Yair Pinter RY (1988) Trapezoid graphs and their coloring. Discret Appl Math 35–46 7. Dirac GA (1961) On rigid circuit graphs. Abh Math Semin Univ Hambg 25(1–2):71–76. https:// doi.org/10.1007/BF02992776 8. Erdös P, Goodman AW, Pósa L (1966) The representation of a graph by set intersections. Can J Math 18:106–112 9. Fox K, Kinnersely WB, McDonald D, Orflow N, Puleo GJ (2014) Spanning paths in FibonacciSum graphs. Fibonacci Q 52:46–49 10. Harary F, Norman RZ (1960) Some properties of line digraphs. Rendiconti del Circolo Matematico di Palermo 9(2):161–169 11. Kappraff J (2002) Beyond measure: a guided tour through nature, myth and number, chapter 21. World Scientific, Singapore 12. Lekkerkerker CG, Boland JC (1962) Representation of a finite graph by a set of intervals on the real line. Fundam Math 51:45–64. https://doi.org/10.4064/fm-51-1-45-64 13. Mahalingam K, Rajendran HP (2021) Properties of m-bonacci-sum graphs. Discret Appl Math 14. McKee TA, McMorris FR (1999) Topics in intersection graph theory. SIAM 15. Pal M (2013) Intersection graphs: an introduction. Ann Pure Appl Math 4(1):43–91 16. Szpilrajn-Marczewski E (1945) Sur deux propriétés des classes d’ensembles. Fundam Math 33:303–307 17. Whitney H (1932) Congruent graphs and the connectivity of graphs. Am J Math 54(1):150–168
Three-Time Levels Compact Scheme for Pricing European Options Under Regime Switching Jump-Diffusion Models Pradeep Kumar Sahu, Kuldip Singh Patel, and Ratikanta Behera
Abstract This article presents three-time levels compact scheme for solving the partial integro-differential equation (PIDE) arising in European option pricing under jump-diffusion models in the regime switching market. A diagonally dominant system of linear equations is achieved for a fully discrete problem by eliminating the second derivative approximation using the unknown itself and its first derivative approximation. Moreover, the problem’s initial condition is smoothed to assure the fourth-order convergence of the proposed three-time levels implicit compact scheme. Numerical illustrations for solving PIDE are obtained and results are compared with the three-time levels finite difference scheme. Keywords Compact schemes · Three-time levels implicit method · Regime switching jump-diffusion models · Partial integro-differential equation
1 Introduction The assumptions of constant volatility and the log-normal nature of the distribution of underlying assets considered by Black and Scholes [1] to derive option pricing have been proven inconsistent with the real market scenario. Consequently, the research community came up with advanced models to elaborate on terms such as negative P. K. Sahu Department of Mathematics, International Institute of Information Technology Naya Raipur, 493661 Chhattisgarh, India e-mail: [email protected] K. S. Patel (B) Department of Mathematics, Indian Institute of Technology Patna, Bihta 801106, Bihar, India e-mail: [email protected] R. Behera Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_27
skewness, heavy tails, and volatility smile. In one of those efforts, the jump phenomenon was incorporated into the behavior of the underlying asset by Merton [2] to surpass the shortcomings of the Black–Scholes model. The model is termed the jump-diffusion Merton model, and in this case, a partial integro-differential equations (PIDEs) are derived for pricing European options. Jump sizes in Merton’s model follow the Gaussian distribution. In another approach, Kou [3] suggested that jump sizes follow a double exponential distribution. This model is termed the Kou jump-diffusion model. In another effort, volatility is considered a stochastic process and is referred to as stochastic volatility models. A two-dimensional PDE comes into the picture for pricing an option under stochastic volatility models [4]. The aforementioned approaches (jump-diffusion and stochastic volatility models) have been discussed in detail to capture volatility smile. However, in subsequent studies, it turns out that jump-diffusion models explain the volatility smile only for short maturities [5], whereas stochastic volatility models perform satisfactorily for long maturities [6]. In order to overcome this limitation, a regime switching jumpdiffusion model (RSJDM) has been proposed which is able to capture the volatility smile for short as well as for long maturities. In spite of the popularity of analytical solutions to option pricing problems (for example, the Black–Scholes model), it is not available for RSJDM. Therefore, efficient and accurate numerical methods are required. The finite difference method has been considered to price the European and American options under RSJDM in [7], however, the proposed approach is only second-order accurate. It has been noted that a substantial increment in the number of grid points of computational stencil may result in high-order accurate FDM, however, the inclusion of boundary conditions would become tedious in such a case. Moreover, in such a case, the discretization matrices with more bandwidth appear in fully discrete problems, and it may also suffer from restrictive stability conditions. Therefore, FDMs have been developed using compact stencils at the cost of some complexity in their evaluation, and these are generally referred to as compact schemes. The compact schemes provide high-order accuracy, and they are also parsimonious while solving the problems on hypercube computational domains as compared to FDMs. Apart from this advantage, another exception is that compact schemes can be developed in three ways. In the first approach, the primary equation is treated as an auxiliary equation and the derivative of the leading term of truncation error is approximated compactly [8]. The second approach is known as the operator compact implicit (OCI) method. In this approach, a relationship on three adjacent grid points between the PDE operator and an unknown variable is obtained, and a resulting fourth-order accurate relationship is derived by Taylor series expansion. In the third approach, Hermitian schemes [9] are considered for spatial discretization of PDEs, [10]. The third approach has already been discussed for the American option under RSJDM [11]. In this manuscript, we consider the third approach for developing a compact scheme with a three-time level for pricing European options. The system of partial integro-differential equations is solved using a compact scheme. Numerical results are compared with a finite difference scheme. 
It is observed that the proposed scheme is fourth-order accurate.
This article is organized as follows. In Sect. 2, a model of the problem is given. The three-time level compact scheme is proposed for pricing European options in Sect. 3. Numerical illustrations are featured to verify the theoretical claims in Sect. 4. Conclusion and future work are discussed in Sect. 5.
2 The Model Problem To analyze financial derivatives, we take into account the RSJDM as a stochastic process of an underlying asset. We employ the RSJDM framework because it enables us to consider the various economic conditions and examine the affect of the jump in the underlying assets. It is specified that a continuous-time Markov chain process takes a value in finite state space H = {e1 , e2 , . . . , e Q }, if it has the form Y = (Yt )t≥0 on a probability space (, F, P). Transition probability is given by
P\left(Y_{t+\delta t} = e_q \mid Y_t = e_j\right) =
\begin{cases}
\gamma_{qj}\,\delta t + O(\delta t), & q \neq j,\\
1 + \gamma_{qj}\,\delta t + O(\delta t), & q = j,
\end{cases}

for a Q-dimensional vector e_j := (e_q^j) such that

e_q^j =
\begin{cases}
0, & q \neq j,\\
1, & q = j.
\end{cases}
Now Markov chain process Yt in [7] is given as dYt = d Mt + AYt− dt, where A = γq j Q×Q is the Markov chain generator Yt and Mt is a martingale. σt is volatility and rt is risk free interest rate given as σt = σ, Yt and rt = r, Yt , where σ = (σ1 , σ2 , . . . , σ Q )T and r = (r1 , r2 , . . . , r Q )T are vectors of dimensional Q with σ j ≥ 0 & r j ≥ 0 for all 1 ≤ j ≤ Q. Here < ., . > represents scalar product in R Q and (...)T represents the matrix’s transposition. Then we assume the underlying asset St follows RSJDM, the stochastic differential equation of St is given by d St = (rt − λt ζt ) dt + σt dWt + ηt d Mt , St−
(1)
T where Wt is a Wiener process, Mt = Mt , Yt with Mt = Mt1 , Mt2 , . . . , MtQ is a Poisson process with intensity λt at the state Yt , given as λt = λ, Yt with λ =
T T λ1 , λ2 , . . . , λ Q , ηt = η, Yt with η = η1 , η2 , . . . , η Q is a random variable to generate jump sizes from St− to St , and ζt is the scalar product of the expectation T of η and the Markov process Yt , that is, ζt = ζ, Yt with ζ = ζ1 , ζ2 , . . . , ζ Q and ζ j = E η j for 1 ≤ j ≤ Q. The stochastic processes Wt , Yt , Mt1 , Mt2 , . . . , MtQ are mutually independent and the jumps of both the processes Yt and Mt do not happen simultaneously almost surely. When St follows the with the no arbitrage condition in (1), the price of RSJDM a European option v τ , x, e j satisfies the following PIDE vτ τ , x, e j − Lv τ , x, e j = 0, for all τ , x, e j ∈ (0, T ] × R × H with initial condition v 0, x, e j = a(x), where a(x) is the payoff function and Lv is the integro-differential operator defined by
L v(\tau, x, e_j) = \frac{\sigma_j^2}{2}\,\frac{\partial^2 v}{\partial x^2}(\tau, x, e_j)
+ \Big(r_j - \frac{\sigma_j^2}{2} - \lambda_j \zeta_j\Big)\frac{\partial v}{\partial x}(\tau, x, e_j)
- (r_j + \lambda_j)\, v(\tau, x, e_j)
+ \lambda_j \int_{-\infty}^{\infty} v(\tau, z, e_j)\, g(z - x, e_j)\, dz
+ \langle v, A e_j \rangle.
(2) Here A is the generator of Yt , v =(v 1, v 2 , . . . , v Q )T and v j (x, τ ) = v(x, τ , e j ), 1 ≤ j ≤ Q. τ = T − t and x = ln SS0 such that T is maturity and S0 is initial underlying asset. g(x) is the probability density function of the random variable ln(η j + 1) that includes jump sizes of the log price x.
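The regime variable Y_t enters the model only through its generator and the one-step transition rule stated above. As a rough illustration (not taken from the paper), the Python sketch below simulates a path of the Markov chain on a fine time grid by applying the probabilities γ_qj δt directly; the two-state generator and the indexing convention γ_qj = rate from state j to state q are assumptions made purely for this example.

```python
import random

def simulate_chain(generator, state, T, dt):
    """Simulate Y_t on a grid of step dt using
    P(next = q | current = j) ~ generator[q][j] * dt for q != j (transition rule above)."""
    path = [state]
    for _ in range(int(T / dt)):
        u, acc = random.random(), 0.0
        new_state = state
        for q in range(len(generator)):
            if q == state:
                continue
            acc += generator[q][state] * dt   # assumes gamma_{qj} stored as generator[q][j]
            if u < acc:
                new_state = q
                break
        state = new_state
        path.append(state)
    return path

# Hypothetical 2-state generator (columns sum to zero under this indexing convention).
A = [[-0.5, 1.0],
     [0.5, -1.0]]
print(simulate_chain(A, state=0, T=1.0, dt=0.01)[:20])
```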
2.1 Localization to Bounded Domain
We limit the infinite domain R to the localized domain (−X, X) with X > 0 in the log price x, in order to solve the PIDE for the European option numerically. The asymptotic values of the European option outside the domain are, therefore, required. The payoff function a(·) of the put option is given by a(x) = max(0, K − S_0 e^x) for x ∈ R, where K is the strike price. The asymptotic behaviour of the European put option is

\lim_{x \to -\infty}\big[\, v(\tau, x, e_j) - \big(K e^{-r_j \tau} - S_0 e^{x}\big)\,\big] = 0
\quad \text{and} \quad
\lim_{x \to \infty} v(\tau, x, e_j) = 0.   (3)
3 The Fully Discrete Problem
With the help of first derivative approximations and the unknowns themselves, we obtain fourth-order accurate second derivative approximations of the unknowns. The second-order accurate central difference approximations for the first and second derivatives obtained from the Taylor series can be stated as

\Delta_x v_i = \frac{v_{i+1} - v_{i-1}}{2\,\delta x}, \qquad
\Delta_x^2 v_i = \frac{v_{i+1} - 2 v_i + v_{i-1}}{\delta x^2},

where v_i is an approximation of v(x_i) at a grid point x_i and \delta x is the space step size. Now, for the first derivative, the fourth-order accurate compact finite difference approximation is represented as

\frac{1}{4} v_{x,i-1} + v_{x,i} + \frac{1}{4} v_{x,i+1}
= \frac{1}{\delta x}\Big( -\frac{3}{4} v_{i-1} + \frac{3}{4} v_{i+1} \Big),   (4)

where v_{x,i} is an approximation of the first derivative of v at grid point x_i. Similarly, the fourth-order accurate second derivative approximation can be expressed as

\frac{1}{10} v_{xx,i-1} + v_{xx,i} + \frac{1}{10} v_{xx,i+1}
= \frac{1}{\delta x^2}\Big( \frac{6}{5} v_{i-1} - \frac{12}{5} v_i + \frac{6}{5} v_{i+1} \Big),   (5)

where v_{xx,i} is an approximation of the second derivative of the unknown v at grid point x_i. Using the unknowns and the approximations of their first-order derivative, the second derivative approximations of the unknowns are eliminated while maintaining the tridiagonal structure of the scheme. If the v_{x,i} are considered as variables, then from (4) we obtain

\frac{1}{4} v_{xx,i-1} + v_{xx,i} + \frac{1}{4} v_{xx,i+1}
= \frac{1}{\delta x}\Big( -\frac{3}{4} v_{x,i-1} + \frac{3}{4} v_{x,i+1} \Big).   (6)

Eliminating v_{xx,i+1} and v_{xx,i-1} from (5) and (6), we obtain the approximation of the second derivative as

v_{xx,i} = \frac{2\,(v_{i+1} - 2 v_i + v_{i-1})}{\delta x^2} - \frac{v_{x,i+1} - v_{x,i-1}}{2\,\delta x}.   (7)

We can write (7) as

v_{xx,i} = 2\,\Delta_x^2 v_i - \Delta_x v_{x,i}.   (8)
It has been observed that (4) and (8) give a fourth-order compact approximation for the first and second derivatives, respectively. To solve the PIDE, a three-time level method for temporal discretization and a high-order compact approximation for spatial discretization are explored.
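As a quick sanity check on relation (4) (not part of the original paper), the following Python sketch solves the tridiagonal system with weights (1/4, 1, 1/4) for the nodal derivatives of a smooth test function and reports the maximum error on two grids; a fourth-order approximation should reduce the error by roughly a factor of 2^4 = 16 when the mesh is refined by a factor of two. The test function and the choice of exact boundary derivatives are illustrative assumptions.

```python
import numpy as np

def compact_first_derivative(v, dx, d_left, d_right):
    """Solve (1/4) vx[i-1] + vx[i] + (1/4) vx[i+1] = (3/(4 dx)) (v[i+1] - v[i-1])
    at the interior nodes, with the exact derivative supplied at both end points."""
    n = len(v)
    vx = np.zeros(n)
    vx[0], vx[-1] = d_left, d_right
    m = n - 2                                   # number of interior unknowns
    A = (np.diag(np.ones(m))
         + np.diag(0.25 * np.ones(m - 1), 1)
         + np.diag(0.25 * np.ones(m - 1), -1))
    rhs = 3.0 / (4.0 * dx) * (v[2:] - v[:-2])
    rhs[0] -= 0.25 * d_left                     # move known boundary values to the rhs
    rhs[-1] -= 0.25 * d_right
    vx[1:-1] = np.linalg.solve(A, rhs)
    return vx

for n in (41, 81):
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    v, exact = np.sin(2 * np.pi * x), 2 * np.pi * np.cos(2 * np.pi * x)
    vx = compact_first_derivative(v, dx, exact[0], exact[-1])
    print(n, np.max(np.abs(vx - exact)))        # error should drop by about 16x
```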
\frac{\partial v}{\partial \tau}(x, \tau, e_j) = L v(x, \tau, e_j), \quad (x, \tau, e_j) \in (-X, X) \times (0, T] \times H,
v(x, \tau, e_j) = h(x, \tau, e_j), \quad (x, \tau, e_j) \in \big(\mathbb{R} \setminus (-X, X)\big) \times (0, T] \times H,
v(x, 0) = a(x), \quad (x, e_j) \in (-X, X) \times H,

where a(x) is given in Eq. (12). The operator L can be split into the following parts:

\frac{\partial v}{\partial \tau}(x, \tau, e_j) = D v(x, \tau, e_j) + I v(x, \tau, e_j) + E v(x, \tau, e_j),   (9)
where D represents the differential operator, and I represents the integration part. The operators D, I, and E can be written as

D v(x, \tau, e_j) = \frac{\sigma_j^2}{2}\,\frac{\partial^2 v}{\partial x^2}(x, \tau, e_j)
+ \Big(r_j - \frac{\sigma_j^2}{2} - \lambda_j \zeta_j\Big)\frac{\partial v}{\partial x}(x, \tau, e_j),
I v(x, \tau, e_j) = \lambda_j \int_{\mathbb{R}} v(y, \tau, e_j)\, g(y - x, e_j)\, dy,
Ev(x, τ , e j ) = −(r j + λ j )v(x, τ , e j ) + v, Ae j . For the approximation of differential operator D, finite difference approximations with a compact grid are used. The main difficulty arises in the approximation of integral terms. Integral operator Iv(x, τ , e j ) is separated into two parts, namely on R\ and to compute it numerically. In order to calculate the integral on , Simpson’s 1/3 rule is used. For more details on Simpson’s 1/3 rule, one can see [9]. It is shown that Simpson’s 1/3 rule is fourth-order accurate. By using the compact approximations for first and second derivatives and above approximation for integral operators, the operators D, I, E and L are approximated by D , I , E and L as follows: Dvnm, j
≈ D
m+1, j
vn
m−1, j
+ vn 2
, Ivnm, j ≈ I vnm, j , Evnm, j ≈ E vnm, j , Lvnm, j
≈
(10)
L vnm, j .
Now, compact finite difference operator D , discrete integral operator I , and E are defined as follows:
σ 2j m σ 2j m, j vx xn + r j − − λ j ζ vxmn , D vn = 2 2 ⎞ ⎛ N N 2 2 −1 δx ⎝ m, j j m, j j m, j j m, j j I vnm, j = λ j v2i−1 gn,2i−1 + 2 v2i gn,2i + v N gn,N ⎠ , v0 gn,0 + 4 3 i=1 i=1
m,1 m,2 m,Z T where vm n = (vn , vn , . . . , vn ) . Hence, L can be defined as
L vnm, j
=
m, j
m, j
m, j
D vn + I vn + E vn m = 0, m+1, j m−1, j m, j vn +vn m + I pn + E vn m ≥ 1. D 2
(11)
The value of v at τ = 0 is obtained from the initial condition itself, and since the scheme is a three-time level, we also need the value of v at δτ to start the computation. m, j The explicit method obtains the value of v on the first time level. We define Vn , m, j the approximation of vn , which is a solution of the following problem: m+1, j
m, j
− Vn = D Vnm, j + I Vnm, j + E Vnm, j , for m = 0, δτ m+1, j
m+1, j m−1, j m−1, j Vn − Vn + Vn Vn = D + I Vnm, j + E Vnm, j , 2δτ 2 Vn
for 1 ≤ m ≤ M − 1, with the initial and boundary conditions for European put options discussed in Sect. 2. Since the initial condition is non-smooth, so we need to smooth the initial condition to achieve fourth-order convergence rate [12].
4 Numerical Results In this section, two examples are considered for pricing European put options under RSJDM. The payoff function a(x) for put option can be written as a(x) = max(0, K − S0 e x ) ∀ x ∈ R.
(12)
Merton and Kou, two types of RSJDMs, are discussed in this section. The probability density function g(x, e_j) for Merton's model can be written as

g(x, e_j) = \frac{1}{\sqrt{2\pi}\,\sigma_J^j}\,
\exp\!\left( -\frac{(x - \mu_J^j)^2}{2\,(\sigma_J^j)^2} \right),   (13)
where \mu_J^j is the mean and \sigma_J^j is the standard deviation of the normal distribution at the jth state of the economy. For Kou's model

g(x, e_j) = p_j\, \lambda_+^j e^{-\lambda_+^j x}\, I_{x \ge 0}
+ (1 - p_j)\, \lambda_-^j e^{\lambda_-^j x}\, I_{x \le 0},   (14)
where 0 ≤ p j ≤ 1, I A is the indicator function of A, λ+ > 1 and λ− > 0. These parameters are given in [3]. We have to restrict the infinite domain R to the localized domain = (−X, X ) for X > 0 in order to solve the RSJDM (9). For all computations, X is taken to be 1.5, i.e., = (−1.5, 1.5). Example 1 (Regime switching Merton jump-diffusion model for European put options) European put options under Regime switching Merton model for constant volatility are discussed in this example. The parameters used in this example are T = 1, K = 100, S0 = K , ζ = −0.3288415300010089, X = 1.5. We considered three states of the Markov chain under the regime switching Merton model, and the corresponding parameters used in the simulation are ⎡
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0.15 0.05 −0.50 0.45 σ = ⎣0.15⎦ , r = ⎣0.05⎦ , μ J = ⎣−0.50⎦ , σ J = ⎣0.45⎦ . 0.15 0.05 −0.50 0.45 The rate matrix A of the Markov chain and the intensity λ are given below ⎡ ⎤ ⎡ ⎤ −0.8 0.2 0.1 0.3 A = ⎣ 0.6 −1.0 0.3 ⎦ , λ = ⎣0.5⎦ . 0.2 0.8 −0.4 0.7 In this illustration, the first state of the economy has been considered and reference prices for European put options under the regime switching Merton model are obtained from [7]. Table 1 shows the value of option at (S, τ ) = (90, 0), (100, 0), & (110, 0), with different space and time grid points N & M, respectively. We can observe that the price of the option converges to the reference solution as the number of grid points increases. Table 2 represents the price of the European put option at the second and third state of the economy. We have already covered the fact that the implicit approach with three-time levels is second-order accurate in terms of time, and Fig. 1 shows accuracy in terms of space. Remark 1 The rate of convergence is decreased to two when the grid contains a singular point of the payoff function. However, if the initial data are smoothed, it is possible to restore the fourth-order convergence using such a mesh. Example 2 (Regime switching Kou jump-diffusion model for European put options) Regime switching Kou model for European put options with constant volatility is discussed in this example. The parameters used in this example are T = 0.25, K = 100, S0 = K , ζ = 0.08333333, X = 1.5. The regime switching Kou model is employed with five states of the Markov chain, and the corresponding simulation parameters are as follows:
\sigma = (0.5, 0.5, 0.5, 0.5, 0.5)^T, \quad r = (0.05, 0.05, 0.05, 0.05, 0.05)^T, \quad p = (0.5, 0.5, 0.5, 0.5, 0.5)^T, \quad \lambda_+ = (3, 3, 3, 3, 3)^T, \quad \lambda_- = (2, 2, 2, 2, 2)^T.
The rate matrix A and the intensity \lambda are given below:

A = \begin{pmatrix}
-1 & 0.25 & 0.25 & 0.25 & 0.25\\
0.25 & -1 & 0.25 & 0.25 & 0.25\\
0.25 & 0.25 & -1 & 0.25 & 0.25\\
0.25 & 0.25 & 0.25 & -1 & 0.25\\
0.25 & 0.25 & 0.25 & 0.25 & -1
\end{pmatrix}, \qquad
\lambda = \begin{pmatrix} 0.1 \\ 0.3 \\ 0.5 \\ 0.7 \\ 0.9 \end{pmatrix}.
In this illustration, the third state of the economy has been considered, and reference prices for European put options under Kou's model with a regime switching market are obtained from [7].
Table 1 Pricing of European put options under Merton's model with a regime switching market considering the first state of the economy

N     M     (S,τ)=(90,0)                (S,τ)=(100,0)               (S,τ)=(110,0)
            Present scheme  Lee [7]     Present scheme  Lee [7]     Present scheme  Lee [7]
32    25    13.3829         13.3144     10.4452         10.3239     8.6965          8.6576
64    50    13.4717         13.4686     10.5078         10.4882     8.7295          8.7202
128   100   13.5197         13.5135     10.5401         10.5317     8.7470          8.7426
256   200   13.5261         13.5246     10.5444         10.5423     8.7493          8.7482
512   400   13.5278         13.5274     10.5455         10.5450     8.7499          8.7496
1024  800   13.5282         13.5281     10.5458         10.5456     8.7500          8.7500
2048  1600  13.5283         13.5283     10.5458         10.5458     8.7501          8.7501
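Because the prices in Table 1 approach the reference value monotonically, an observed order of convergence can be estimated directly from successive grid refinements, using the standard estimate order ≈ log2(|v_N − v_2N| / |v_2N − v_4N|). The short Python sketch below (not part of the paper) applies this to the "Present scheme" column at S = 100; since N and M are refined together, the estimate reflects the slower of the spatial and temporal components of the error. The variable names are illustrative.

```python
import math

# "Present scheme" values at (S, tau) = (100, 0) from Table 1, N = 32, 64, ..., 2048.
v = [10.4452, 10.5078, 10.5401, 10.5444, 10.5455, 10.5458, 10.5458]

# Observed order from three successive grids: log2(|v_N - v_2N| / |v_2N - v_4N|).
for k in range(len(v) - 2):
    num, den = abs(v[k] - v[k + 1]), abs(v[k + 1] - v[k + 2])
    if den > 0:
        print(f"N = {32 * 2**k:4d}: observed order ~ {math.log2(num / den):.2f}")
```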
Table 2 Pricing of European put options under Merton's model with a regime switching market considering the second and third state of the economy

N     M     (S,τ)=(90,0)               (S,τ)=(100,0)              (S,τ)=(110,0)
            Second state  Third state  Second state  Third state  Second state  Third state
32    25    15.6560       17.3817      13.0297       14.8393      11.2045       12.9047
64    50    15.7544       17.4774      13.0907       14.9011      11.2402       12.9480
128   100   15.7753       17.4930      13.1027       14.9090      11.2470       12.9529
256   200   15.7805       17.4968      13.1056       14.9109      11.2487       12.9541
512   400   15.7818       17.4978      13.1064       14.9114      11.2491       12.9544
1024  800   15.7821       17.4980      13.1066       14.9115      11.2493       12.9545
2048  1600  15.7822       17.4981      13.1066       14.9116      11.2493       12.9545
Fig. 1 Rate of convergence of the compact scheme in space for regime switching Merton's model (log-log plot of the error against the number of grid points in space, shown with N^{-2} and N^{-4} reference slopes; plot not reproduced)
Table 3 shows the value of the option at (S, τ) = (90, 0), (100, 0), and (110, 0) for different numbers of space and time grid points N and M, respectively. We can observe that the price of the option converges to the reference solution as the number of grid points increases. Tables 4 and 5 represent the prices of the European put option at the first and second, and at the fourth and fifth, states of the economy, respectively.
Table 3 Prices of European put options under regime switching Kou model at the third state of the economy

N     M     (S,τ)=(90,0)                (S,τ)=(100,0)               (S,τ)=(110,0)
            Present scheme  Lee [7]     Present scheme  Lee [7]     Present scheme  Lee [7]
32    25    15.8852         15.8483     10.7925         10.7392     7.1749          7.1319
64    50    15.9497         15.9398     10.8691         10.8551     7.2511          7.2393
128   100   15.9652         15.9627     10.8875         10.8841     7.2695          7.2665
256   200   15.9691         15.9685     10.8921         10.8913     7.2741          7.2733
512   400   15.9700         15.9699     10.8933         10.8930     7.2752          7.2750
1024  800   15.9703         15.9702     10.8936         10.8935     7.2755          7.2755
2048  1600  15.9703         15.9703     10.8936         10.8936     7.2756          7.2756
Table 4 Prices of European put options under regime switching Kou model at the first and second state of the economy

N     M     (S,τ)=(90,0)               (S,τ)=(100,0)              (S,τ)=(110,0)
            First state  Second state  First state  Second state  First state  Second state
32    25    14.8017      15.3480       9.6765       10.2386       6.0938       6.6375
64    50    14.8754      15.4171       9.7605       10.3189       6.1750       6.7163
128   100   14.8933      15.4338       9.7808       10.3383       6.1948       6.7354
256   200   14.8977      15.4380       9.7859       10.3431       6.1997       6.7402
512   400   14.8988      15.4390       9.7872       10.3444       6.2010       6.7414
1024  800   14.8991      15.4393       9.7875       10.3447       6.2013       6.7416
2048  1600  14.8992      15.4394       9.7876       10.3447       6.2014       6.7417
Table 5 Prices of European put options under regime switching Kou model at the fourth and fifth state of the economy

N     M     (S,τ)=(90,0)               (S,τ)=(100,0)              (S,τ)=(110,0)
            Fourth state  Fifth state  Fourth state  Fifth state  Fourth state  Fifth state
32    25    16.4136       16.9332      11.3384       11.8764      7.7061        8.2310
64    50    16.4734       16.9884      11.4113       11.9454      7.7795        8.3016
128   100   16.4877       17.0014      11.4286       11.9618      7.7971        8.3184
256   200   16.4912       17.0046      11.4330       11.9658      7.8015        8.3226
512   400   16.4921       17.0055      11.4340       11.9669      7.8026        8.3237
1024  800   16.4923       17.0057      11.4343       11.9671      7.8029        8.3240
2048  1600  16.4924       17.0057      11.4344       11.9672      7.8030        8.3240
5 Conclusion This paper discusses a fourth-order accurate compact scheme using the Hermitian approach for option pricing under RSJDM. A diagonally dominant system of linear equations is achieved for a fully discrete problem by eliminating the second derivative approximation using the unknown itself and its first derivative approximation. European options under regime switching Merton and Kou jump-diffusion models are considered. The approach is fourth-order accurate in a spatial variable, according to numerical examples. For future work, we may explore the proposed method for an Asian type of options under jump-diffusion models under the regime switching market. Acknowledgements The corresponding author acknowledges the support under the grants DST/INTDAAD/P-12/2020 and SERB SIR/2022/000021.
References 1. Black F, Scholes M (1973) The pricing of options and corporate liabilities. J Polit Econ 81:637– 654 2. Merton RC (1976) Option pricing when underlying stock returns are discontinuous. J Financ Econ 3:125–144 3. Kou SG (2002) A jump-diffusion model for option pricing. Manag Sci 48:1086–1101 4. Hull J, White A (1987) The pricing of options on assets with stochastic volatilities. J Financ 42:281–300 5. Tankov P, Voltchkova E (2009) Jump-diffusion models: a practitioner’s guide. Banque et Marches ´ 99 6. Cizek P, Hardle ¨ WK, Weron R (2005) Statistical tools for finance and insurance. Springer, Berlin 7. Lee Y (2014) Financial options pricing with regime-switching jump-diffusions. Comput Math Appl 68:392–404 8. Spotz WF, Carey GF (1995) High-order compact scheme for the steady stream-function vorticity equations. Int J Numer Meth Eng 38:3497–3512 9. Patel KS, Mehra M (2017) Fourth-order compact finite difference scheme for American option pricing under regime-switching jump-diffusion models. Int J Appl Comput Math 3(1):547–567 10. Mehra M, Patel KS (2017) Algorithm 986: a suite of compact finite difference schemes. ACM Trans Math Softw 44 11. Düring B, Fourniè M, Jüngel A (2004) Convergence of high-order compact finite difference scheme for a nonlinear Black-Scholes equation. Math Model Numer Anal 38:359–369 12. Kreiss HO, Thomee V, Widlund O (1970) Smoothing of initial data and rates of convergence for parabolic difference equations. Commun Pure Appl Math 23:241–259
The Sum of Lorentz Matrices in M2 (Zn ) Richard J. Taclay
and Karen Dizon-Taclay
Abstract Let n ∈ Z+ and A ∈ M2 (Zn). We define the function L : A → A, 1 0 such that L (A) = L −1 A T L with L ≡ (mod n). We say that an invertible 0 −1 matrix A ∈ M2 (Zn ) is L -or thogonal or simply Lor ent z if L (A) = L −1 A T L ≡ A−1 (mod n). In this study, we found that the set of Lorentz matrices in M2 (Zn ) forms a group under matrix multiplication. As a result, a matrix A ∈ M2 (Zn ) can be written as a sum of Lorentz matrices if and only if the product, Q A P can be written as a sum of Lorentz matrices whenever Q and P are Lorentz matrices. Furthermore, we have established that orthogonal diagonal matrices which are orthogonal with determinant congruent to either 1 or −1 modulo n are Lorentz matrices. Keywords Lorentz matrices · Orthogonal matrices · Diagonal matrices · Determinant
1 Introduction Let k be a positive integer and denote the set of all k-by-k matrices over a field F by Mk (F). Suppose we have a nonsingular matrix S ∈ Mk (F); we define the function S (A) = S −1 A T S, that is, matrix A ∈ Mk (F) is mapped to S −1 A T S. If S (A) = A−1 , then A is said to be S -or thogonal.The set of S -or thogonal matrices forms a group under matrix multiplication in the case when S is symmetric or skew-symmetric [1]. If S is the identity matrix in Mk (F), then S (A) = A−1 gives A T = A−1 . In this case, A becomes an orthogonal matrix. We know that the set of orthogonal matrices forms a group under matrix multiplication. In [2], given k ≥ 2, every A ∈ Mk (F) R. J. Taclay · K. Dizon-Taclay (B) University of the Philippines Baguio, Gov. Pack Rd., Baguio, Philippines e-mail: [email protected] Nueva Vizcaya State University, Bayombong, Nueva Viz., Philippines © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_28
can be written as a sum of orthogonal matrices if F = R or C. Moreover, for F = R, the least number of orthogonal matrices that can add up to a given matrix in Mk (R) was studied. 0 Ik and A ∈ M2k (F) is a nonsingular matrix. Suppose now that S = J = −Ik 0 We say that A is J -or thogonal or symplectic if J (A) = A−1 . A criterion for determining whether a matrix is symplectic or not is presented in [3]. Examples of symplectic matrices are also presented. Likewise, Gerritzen [4] provided a list of symplectic matrices. 0 I . Finally, we consider a nonsingular matrix A ∈ Mk (F) and S = L = n 0 −In−k A is said to be a L -or thogonal or Lor ent z matrix if L (A) = A−1 . We are particularly interested in the set of Lorentz matrices, the forms of Lorentz Matrices, and their sum with A ∈ M2 (Zn ).
2 Preliminaries Let n and k be positive integers. We denote by Mk (Zn ) the set of all k-by-k matrices with entries from Zn . Definition 1 Let A = [ai j ] and B = [bi j ] ∈ Mk (Zn ). We say that A is congruent to B modulo n, written as A ≡ B (mod n), if and only if ai j ≡ bi j (mod n) for all i and for all j. Similar to Mn (F), we say that a matrix A ∈ Mk (Zn ) is invertible if there exists a matrix B ∈ Mk (Zn ) such that AB ≡ Ik (mod n). Moreover, we know that whenever the determinant of a matrix A ∈ Mk (F) is not equal to zero for F = R or C, the inverse of matrix A exists. However, a matrix A ∈ Mk (Zn ) may have a determinant not congruent to zero modulo n yet its inverse need not exist. Hence, in this study, whenever we say a matrix A ∈ Mk (Zn ) is nonsingular, we mean that its determinant is not congruent to zero modulo n and that the inverse of its determinant exists in Mk (Zn ). We are also interested with the permutation matrices. Definition 2 A permutation matrix is a matrix whose only nonzero entry in each row and column is 1 and all other entries is 0. Permutation matrices exhibit the following properties related to the study. 1. The determinant of a permutation matrix is 1 or –1. This implies that permutation matrices are nonsingular. 2. The product of two permutation matrices is again a permutation matrix. However, permutation matrices do not necessarily commute, that is, if P and Q ∈ Mk (F) are permutation matrices, then P Q is not necessarily equal to Q P. 3. The identity matrix is a permutation matrix. Moreover, P T ≡ P −1 for every permutation matrix P. Hence, P is an orthogonal matrix.
3 Main Results In this study, we are interested in Lorentz matrices in M2 (Z2 ). However, some generalizations are provided for matrices in M2 (Zn ). We begin with a definition of the Lorentz Matrix in M2 (Zn ). Definition 3 A nonsingular matrix A ∈ Mk (Zn ) is said to be Lorentz if A is L orthogonal, that is, A L (A) = L (A)A ≡ Ik (mod n). We denote the set of all Lorentz matrices in Mk (Zn ) by L k (Zn ) or simply L k,n . Proposition 1 The set of L -orthogonal matrices in Mk (Zn ) forms a group under matrix multiplication. Proof Let A and B be L -orthogonal matrices, i.e. L −1 A T L ≡ A−1 (mod n) and L −1 B T L ≡ B −1 (mod n), respectively. We need to show that AB −1 is also L orthogonal, i.e., L −1 (AB −1 )T L ≡ (AB −1 )−1 (mod n). We proceed as follows: L −1 (AB −1 )T L = L −1 (B −T A T )L = L −1 (B −T L L −1 A T )L = (L −1 B −T L)(L −1 A T L)
Thus, the last matrix gives the following equation (B −1 )−1 (A−1 ) ≡ (AB −1 )−1 (mod n) as desired. We now proceed with the following proposition which helps us write matrices in Mk (Zn ) as a sum of Lorentz matrices similar to a Lemma in [5]. Its proof follows from Proposition 1. Proposition 2 Let Q, P and A ∈ Mk (Zn ) be given. Suppose that Q and P are Lorentz matrices. Then A can be written as a sum of Lorentz matrices if and only if Q A P can be written as a sum of Lorentz matrices. Proof We assume that matrix A ∈ Mk (Zn ) can be written as a sum of Lorentz matrices, say A ≡ L 1 + L 2 + · · · + L m (mod n), where L i ∈ L 2,n , for 1 ≤ i ≤ m. Then, Q A P ≡ Q(L 1 + L 2 + · · · + L m )P (mod n), where Q, P ∈ L 2,n . Proposition 1 guarantees us that each Q L i P is a Lorentz matrix for i = 1, 2, . . . , m. Therefore, Q A P can be written as a sum of Lorentz matrices. Suppose that Q A P can be written as a sum of Lorentz matrices. So, Q A P ≡ L 1 + L 2 + · · · + L m (mod n), where L i ∈ L n,k . Since Q and P are Lorentz matrices, then we know that their inverses exist. Thus, Q −1 Q A P P −1 ≡ Q −1 (L 1 + L 2 + · · · + L m )P −1 (mod n). We get A ≡ −1 Q L 1 P −1 + Q −1 L 2 P −1 + · · · + Q −1 L m P −1 (mod n). We can observe that Q −1 L i P −1 ∈ L 2,n for i = 1, 2, . . . , m since L 2,n forms a group under multiplication as stated in Proposition 1. At this point, we now consider the matrices in M2 (Zn ). Our goal is to identify the Lorentz matrices in M2 (Zn ).
Proposition 3 The matrix \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} is not L-orthogonal for n ≥ 3.

Proof We need to show that L^{-1} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{T} L \not\equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{-1} (mod n). We note that \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{T} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. Also, L = \begin{pmatrix} 1 & 0 \\ 0 & n-1 \end{pmatrix} \equiv L^{-1} (mod n). Thus

\begin{pmatrix} 1 & 0 \\ 0 & n-1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & n-1 \end{pmatrix}
\equiv \begin{pmatrix} 0 & n-1 \\ n-1 & 0 \end{pmatrix} (mod n).

But \begin{pmatrix} 0 & n-1 \\ n-1 & 0 \end{pmatrix} \not\equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} (mod n) for n ≥ 3. Hence, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} is not L-orthogonal for n ≥ 3.
Remark 1 The only permutation matrix in M2 (Zn ) which is L -or thogonal is the identity matrix for n ≥ 3. 0 n−1 Proposition 4 The matrix is not L -orthogonal for n ≥ 3. n−1 0 0 n−1 Proof Let A = . We note that A = A T and A ≡ A−1 (mod n). We n−1 0 01 0 n−1 can check that L −1 A T L ≡ (mod n) which is not congruent to 10 n−1 0 0 n−1 modulo n for n ≥ 3. Indeed, is not L -orthogonal for n ≥ 3. n−1 0 Remark 2 The following orthogonal anti-diagonal matrices are not Lorentz matrices: 01 1. (mod n) 10 0 n−1 2. (mod n) n−1 0 0 1 3. (mod n) n−1 0 0 n−1 4. (mod n) 1 0 Theorem 1 Let A be an orthogonal diagonal matrix in M2 (Zn ). A is L -orthogonal if and only if det A ≡ ±1(mod n).
Proof Let A be an orthogonal matrix in M2 (Zn ). Then A T ≡ A−1 (mod n). diagonal a0 We assume that A = is L -orthogonal. Then A T L A ≡ L(mod n). 0b 2 0 a T T (mod n). Thus, Solving for A L A, we get A L A ≡ 0 b2 (n − 1) 2 a 1 0 0 (mod n) ≡ (mod n) which implies that a 2 ≡ 1(mod n) 0 n−1 0 b2 (n − 1) and b2 ≡ 1(mod n). Hence, a ≡ 1(mod n) or a ≡ (n − 1)(mod n) and b ≡ 1(mod n) or b ≡ (n − 1)(mod n). We proceed with the following cases: (i) a ≡ 1(mod n) and b ≡ 1(mod n) 10 A≡ (mod n) ⇒ det A ≡ 1(mod n) 01 (ii) a ≡ −1(mod n) and b ≡ 1(mod n) n−1 0 (mod n) ⇒ det A ≡ −1(mod n) A≡ 0 1 (iii) a ≡ 1(mod n)and b ≡ −1(mod n) 1 0 A≡ (mod n) ⇒ det A ≡ −1(mod n) 0 n−1 (iv) a ≡ −1(mod n) and b ≡ −1(mod n) n−1 0 A≡ (mod n) ⇒ det A ≡ 1(mod n) 0 n−1 Suppose det A ≡ ±1(mod n). This implies that A−1 exists. Hence, we have 10 1 0 A1 ≡ (mod n); A2 ≡ (mod n); 01 0 −1 −1 0 −1 0 A3 ≡ (mod n); A4 ≡ (mod n). 0 1 0 −1 Note that Ai = AiT = Ai−1 for i = 1, 2, 3, 4. One can verify that L (Ai ) ≡ Ai−1 (mod n). Therefore, A T ≡ A−1 (mod n).
Based on the preceding theorem, here are the different forms of the orthogonal diagonal Lorentz matrices:
1. \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, the identity matrix
2. \begin{pmatrix} 1 & 0 \\ 0 & n-1 \end{pmatrix} (mod n)
3. \begin{pmatrix} n-1 & 0 \\ 0 & 1 \end{pmatrix} (mod n)
4. \begin{pmatrix} n-1 & 0 \\ 0 & n-1 \end{pmatrix} (mod n)
We now present the following illustrations for Lorentz matrices in M2(Z2) and M2(Z3):
• We note that −1 ≡ 1 (mod 2).
• The Lorentz matrices in M2(Z2) are:
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
• The matrices which can be written as a sum of Lorentz matrices are:
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.
• We note that −1 ≡ 2 (mod 3).
• The Lorentz matrices in M2(Z3) are:
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.
• The matrices which can be written as a sum of Lorentz matrices are:
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
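These lists can be reproduced by brute force, since Definition 3 only requires checking L^{-1} A^T L ≡ A^{-1} (mod n) over the finitely many invertible 2 × 2 matrices. The Python sketch below is not from the paper and its function names are illustrative; it enumerates the Lorentz matrices in M2(Zn) for n = 2 and n = 3.

```python
from itertools import product

def inverse_mod(A, n):
    """Inverse of a 2x2 matrix over Z_n, or None if det is not a unit."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % n
    for t in range(1, n):                 # search for det^{-1} in Z_n
        if (det * t) % n == 1:
            return ((t * d) % n, (-t * b) % n), ((-t * c) % n, (t * a) % n)
    return None

def mat_mul(A, B, n):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % n for j in range(2))
                 for i in range(2))

def lorentz_matrices(n):
    """All A in M_2(Z_n) with L^{-1} A^T L = A^{-1} (mod n), where L = diag(1, -1)."""
    L = ((1 % n, 0), (0, (n - 1) % n))
    Linv = inverse_mod(L, n)
    result = []
    for a, b, c, d in product(range(n), repeat=4):
        A = ((a, b), (c, d))
        Ainv = inverse_mod(A, n)
        if Ainv is None:
            continue
        At = ((a, c), (b, d))
        if mat_mul(mat_mul(Linv, At, n), L, n) == Ainv:
            result.append(A)
    return result

print(lorentz_matrices(2))   # identity and the swap matrix, as listed above
print(lorentz_matrices(3))   # the four diagonal matrices listed above
```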
References 1. Horn RA, Merino D, Serre D (2020) The S-orthogonal groups, (Private communication with the authors) 2. Merino D, Paras AT, Reyes E, Walls G (2011) The sum of orthogonal matrices in Mn (Zk ). Linear Algebra Appl 434:2170–2175 3. Dato-on JE, Merino D, Paras A (2009) The J polar decomposition of matrices with rank 2. Linear Algebra Appl 430:756–761 4. Gerritzen L (1999) Symplectic 2 × 2 matrices over free algebras 5. Merino D (2012) The sum of orthogonal matrices. Linear Algebra Appl 436:1960–1968
A Robust Analytic Approach to Solve Non-linear Fractional Partial Differential Equations Using Fractional Complex Transform Vishalkumar J. Prajapati and Ramakanta Meher
Abstract This study examines different non-linear partial differential equations of fractional order using a unique proposed strategy termed the homotopy analysis fractional complex transform method. Tabular and graphical presentations are used to display the obtained results. The tabular results demonstrate that the proposed method is reliable, with a low processing time and high accuracy. Furthermore, comparative tests are performed to verify that the suggested approach coincides nicely with current methodologies existing in the literature. This approach is a new proposed algorithm that provides good approximate results in a short time while keeping great accuracy. Keywords Non-linear partial differential equation · Jumarie’s modified Riemann–Liouville fractional derivative · Homotopy analysis method · Fractional complex transform
1 Introduction Fractional calculus has applications in a wide range of engineering and science domains, such as electromagnetics, fluid mechanics, electrochemistry, mathematical models in biology, signal processing, etc. Fractional calculus helps to simulate physical and technical processes that are best characterised by fractional differential equations. Fractional calculus It is the promotion of integer-order derivatives and integrals. In recent years, different analytical and numerical approaches, as well as their applicability to novel problems, have been proposed in these disciplines [1, 2]. Laplace, Fourier, and Mellin transformations [1] are utilised to solve linear fractional differential equations. However, numerical methods are frequently employed to solve V. J. Prajapati (B) · R. Meher Department of Mathematics, Sardar Vallabhbhai National Institute of Technology, Surat 395007, Gujarat, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D. Giri et al. (eds.), Proceedings of the Ninth International Conference on Mathematics and Computing, Lecture Notes in Networks and Systems 697, https://doi.org/10.1007/978-981-99-3080-7_29
non-linear FDEs [3–6]. Nevertheless, numerical techniques have several drawbacks. In addition to causing rounding mistakes, discretizing the variables requires a substantial amount of computer memory. There are a number of analytic methods available to investigating non-linear fractional differential equations like Adomian decomposition method [7–9], homotopy perturbation method(HPM) [10, 11], variational iteration method(VIM) [12, 13], and homotopy analysis method [14, 15]. Some researchers applied integral transforms to homotopy analysis method [16–18]. In this research, the fractional complex transform[22] and homotopy analysis will be utilised. This work employs the homotopy analysis method via fractional complex transform to investigate the specified linear and non-linear fractional issues. This paper has been divided into six sections. Basic definitions are found in Sect. 2. Section 3 discusses the proposed technique . While Sect. 4 includes several numerical examples. Section 5 presents a comparison and debate, and the conclusion is provided in the last section.
2 Preliminaries In this portion, we will provide some basic definitions which are helpful to us. Definition 1 The (left sided) Riemann–Liouville derivative [2] of order α of a given f (t) is defined as RL α 0 Dt
f (t) =
dm 1 (m − α) dt m
t 0
f (τ ) dτ, (t − τ )1−α
(1)
where m − 1 ≤ α < m and m ∈ N, α is any fractional order. Definition 2 (Jumarie’s Fractional Derivative). There exist so many definitions of fractional derivatives, but from that, the Riemann–Liouville fractional derivative, Grünwald–Letnikov fractional order derivative, and Caputo fractional order derivative are the most important and used frequently. Modified (Jumarie’s) Riemann– Liouville derivative of fractional order α is given by Jumaries [19–21] as ⎧ ξ ⎪ ⎨ 1 d (ξ − τ )−α ( f (τ ) − f (0))dτ, 0 < α < 1. α (2) Dξ f (ξ ) = (1−α) dξ 0 ⎪ ⎩( f n (ξ ))α−n , n ≤ α < n + 1. Some basic properties of Modified R–L fractional derivative are: (1+s) 1. Dξα ξ s = (1+s−α) ξ s−α , where s > α > 0. α 2. Dξ ( f (ξ )g(ξ )) = g(ξ )Dξα f (ξ ) + f (ξ )Dξα g(ξ ). 3. Dξα f (g(ξ )) = Dξ1 f (ξ )Dξα g(ξ ).
4. Dξα (k f (ξ )) = kDξα f (ξ ), α > 0, where k is a constant. 5. Dξα k = 0, α > 0, where k is a constant. Definition 3 (Fractional Complex Transform). The Fractional Complex Transform(FCT) was first explained by Li and He [22] to transform the fractional-order differential equations into integer-order differential equations. Let the general fractional order differential equation with modified R–L fractional derivative of the form γ
g(θ, Dtα θ, Dxβ θ, Dλy θ, Dz θ, Dt2α θ, Dx2β θ, D2λ y θ, Dz θ, ...) = 0, where θ = θ (t, x, y, z). 2γ
(3) Now, Fractional complex transform(FCT) requires the following: T =
pt α qt β rtλ st γ , X= , Y = , Z= , (1 + α) (1 + β) (1 + λ) (1 + γ )
where 0 < α, β, λ, γ ≤ 1, and p, q, r, s are unknown constants which are further determined. Now from the properties of Jumarie’s fractional derivative, we have Dtα T = Dtα
p (1 + α) pt α p = t α−α = p Dtα t α = (1 + α) (1 + α) (1 + α) (1 + α − α)
So, in this way, we will get Dtα T = p, Dxβ X = q, Dλy Y = r, Dzγ Z = s Using Jumarie’s chain rule [20, 21] or the above property (3), we have Dtα θ =
∂ 2θ ∂θ α ∂θ ∂ 2θ α 2 Dt T = p , Dt2α θ = (Dt T ) = p 2 2 2 ∂T ∂T ∂T ∂T
Therefore ∂θ ∂θ ∂θ , Dxβ θ (t, x, y, z) = q , Dλy θ (t, x, y, z) = r , ∂T ∂X ∂Y 2 ∂ 2θ ∂ 2θ ∂θ 2∂ θ Dzγ θ (t, x, y, z) = s , Dt2α θ = p 2 2 , Dx2β θ = q 2 , D2λ , y θ =r 2 ∂Z ∂T ∂X ∂Y 2 ∂ 2θ Dz2γ θ = s 2 2 ... ∂Z
Dtα θ (t, x, y, z) = p
By applying this fractional complex transform, Eq. (3) becomes
∂θ ∂θ ∂θ 2 ∂ 2 θ 2 ∂ 2 β 2 ∂ 2 θ 2 ∂ 2 θ ∂θ , q ,r , s ,p g θ, p , q , r , s , . . . = 0 (4) ∂T ∂X ∂Y ∂Z ∂T 2 ∂ X2 ∂Y 2 ∂ Z2 Now, In next we will describe the brief idea of homotopy analysis method to solve above Eq. (4).
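The key identity behind the fractional complex transform, D_t^α T = p for T = p t^α / Γ(1 + α), follows from the power rule (property 1 above). The short Python check below is not part of the paper; it simply evaluates the Gamma-function ratio from the power rule for several values of α to confirm that the resulting coefficient is exactly p.

```python
from math import gamma

# Power rule of the modified R-L derivative (property 1 above):
#   D_t^alpha t^s = Gamma(1 + s) / Gamma(1 + s - alpha) * t^(s - alpha), s > alpha > 0.
def frac_deriv_power(s, alpha):
    """Coefficient of t^(s - alpha) in D_t^alpha t^s."""
    return gamma(1 + s) / gamma(1 + s - alpha)

p = 1.0
for alpha in (0.25, 0.5, 0.75, 1.0):
    # Applying the rule to T = p * t^alpha / Gamma(1 + alpha) leaves the constant p.
    coeff = p / gamma(1 + alpha) * frac_deriv_power(alpha, alpha)
    print(alpha, coeff)   # equals p for every alpha
```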
388
V. J. Prajapati and R. Meher
3 Brief Idea of Homotopy Analysis Fractional Complex Transform Method (HAFCTM) In this section, first we will apply FCT to the fractional differential equation to convert it into a traditional differential equation, and then we implement the homotopy analysis method (HAM) to examine the converted equation. Let us take the fractional differential equation with the modified R–L fractional order derivative as Dτα u(ξ, τ ) + Ru(ξ, τ ) + N u(ξ, τ ) = f (ξ, τ ),
(5)
where Dτα u(ξ, τ ) represents the modified R–L fractional derivative of u(ξ, τ ) with respect to time τ . Ru(ξ, τ ) and N u(ξ, τ ) represents linear and non-linear differential terms, respectively, and f (ξ, τ ) is the source term. Let aτ α , 0 < α ≤ 1, t= (1 + α) where a is an unknown constant. As mentioned in the above Sect. 2 and applying Jumarie’s chain rule Dτα u(ξ, τ ) =
∂ u(ξ, t)Dτα t. ∂t
But for a = 1 τα 1 (1 + α) 1 Dτα t = Dτα = Dτα τ α = t α−α = 1 (1 + α) (1 + α) (1 + α) (1 + α − α) So the Eq. (5) becomes ∂ u(ξ, t) + R[u(ξ, t)] + N [u(ξ, t)] = f (ξ, t) ∂t
(6)
Now simplifying the Eq. (6) u t + R[u] + N [u] − f = 0
(7)
Now to apply HAM, we will take non-linear operator N L[Φ] as N L[Φ(ξ, t; η)] = Φt (ξ, t; η) + RΦ(ξ, t; η) + N Φ(ξ, t; η) − f (ξ, t) = 0, (8) where Φ(ξ, t; η) is an unknown function of ξ, t and η (an auxiliary parameter). Now, first define the deformation equation of zero order
A Robust Analytic Approach to Solve Non-linear …
389
(1 − η)L[Φ(ξ, t; η) − u 0 (ξ, t)] = ηH(ξ, t)N L[Φ(ξ, t; η)],
(9)
where L = ∂t∂ and (= 0) and H(ξ, t)(= 0) are auxiliary parameter and function, respectively. u 0 (ξ, t) is an initial approximation of u(ξ, t). If η = 0 and η = 1, then Φ(ξ, t; 0) = u 0 (ξ, t) and Φ(ξ, t; 1) = u(ξ, t)
(10)
respectively. That is, we can say that as η increases from 0 to 1, the solution Φ(ξ, t; η) converges from u 0 (ξ, t) to u(ξ, t). By using Taylor series expansion, we expand the function Φ(ξ, t; η) with respect to η, we will get Φ(ξ, t; η) = u(ξ, t) = u 0 (ξ, t) +
∞
u m (ξ, t)ηm ,
(11)
m=1
1 ∂ m Φ(ξ, t; η)
u m (ξ, t) =
m! ∂ηm η=0
where
The series (11) become convergent at η = 1 for good choice of u 0 (ξ, t) and . Then it gives the one of the solution of the Eq. (7), u(ξ, t) = u 0 (ξ, t) +
∞
u m (ξ, t)
(12)
m=1
Now differentiating equation (9) m-times by η and then divide by m! and by substituting η = 0, we will get deformation equation of m order as → L[u m (ξ, t) − χm u m−1 (ξ, t)] = H(ξ, t)Rm (− u m−1 ), where the vector − → u m−1 = {u 0 (ξ, t), u 1 (ξ, t), . . . , u m−1 (ξ, t)}, → u m−1 ) stands for and Rm (− → R m (− u m−1 ) =
∂ m−1 N L[Φ(ξ, t; η)]
1
(m − 1)! ∂ηm−1 η=0
= u t (ξ, t) + Ru(ξ, t) + N u(ξ, t) − (1 − χm ) f (ξ, t) = 0 and
(13)
390
V. J. Prajapati and R. Meher
χm =
0, m ≤ 1. 1, m > 1.
(14)
Now by applying inverse of linear integral operator L = (13), we will get
∂ ∂t
to both sides of the Eq.
→ u m−1 )] u m (ξ, t) = χm u m−1 (ξ, t) + L−1 [H(ξ, t)Rm (−
(15)
Now by solving equation (15), we get u m (ξ, t) ( f or m = 0, 1, 2, . . .). So, we get approximate series solution as follows: u=
∞
um
(16)
m=0 α
aτ Now replacing t by t = (1+α) , we will get final approximate series solution in original variables ξ and τ .
4 Numerical Application In this part, the applications of proposed technique to some examples are given. Example 1 Consider the following time-fractional non-linear Klein–Fock–Gordon (KFG) equation with modified R–L fractional order derivative Dτγ u(ξ, τ ) =
3 ∂ 2 u(ξ, τ ) 3 − u(ξ, τ ) + u 3 (ξ, τ ), γ = 2α, 0 < α ≤ 1, τ > 0 ∂ξ 2 4 2 (17)
where u(ξ, 0) = − sech(ξ ) and u τ (ξ, 0) =
1 sech(ξ ) tanh(ξ ). 2
(18)
are initial conditions. If α = 1, then the exact solution of the Example (1) is u(ξ, τ ) = − sech(ξ + τ2 ). Now we will apply our proposed algorithm(HAFCTM), for that, first apply FCT. For that, we will take transform t= which gives us Dτ2α u(ξ, τ ) =
pτ α , 0 0 ∂ξ 2
(26)
and u(ξ, 0) = 1 + sin(ξ ) and u τ (ξ, 0) = 0
(27)
are given initial conditions. Now we will apply our proposed algorithm(HAFCTM), for that first apply FCT. For that we will take transform, t= which gives us Dτ2α u(ξ, τ ) =
pτ α , 0 0, there exists a sequence t of disjoint dyadic intervals I j such that
(i) t
t2 , then each I jt1 is subinter val o f some Imt2 , ∀ j, m ∈ Z An operator T is bounded on p (Z) if ∀a ∈ p (Z) T a p (Z) ≤ C p a p (Z) An operator T is of weak type (1,1) on p (Z) if for each a ∈ 1 (Z) |{m ∈ Z : |T a(m)| > λ}| ≤
C a1 λ
For {a(n) : n ∈ Z} ∈ p (Z), norm in p (Z) (refer to [1]) is given by a p (Z) =
0
∞
pλ p−1 |{m ∈ Z : |a(m)| > λ}|dλ
Relations Between Discrete Maximal Operators …
401
3 Definitions 3.1 Maximal Operators Let {a(n) : n ∈ Z} be a sequence. We define the following three types of Hardy– Littlewood maximal operators as follows. Definition 1 If Ir is the interval {−r, −r + 1, . . . , 0, 1, 2, . . . , r − 1, r }, define centered Hardy–Littlewood maximal operator M a(m) = sup r >0
1 |a(m − n)| (2r + 1) n∈I r
We define Hardy–Littlewood maximal operator as follows: Ma(m) = sup m∈I
1 |a(n)| |I | n∈I
where the supremum is taken over all intervals containing m. Definition 2 We define the dyadic Hardy–Littlewood maximal operator as follows: Md a(m) = sup m∈I
1 |a(k)| |I | k∈I
where supremum is taken over all dyadic intervals containing m. Given a sequence {a(n) : n ∈ Z}and an interval I , let a I denote average of {a(n) : n ∈ Z} on I , i.e. a I = |I1| m∈I a(m). Define the sharp maximal operator M # as follows 1 |a(n) − a I | M # a(m) = sup m∈I |I | n∈I where the supremum is taken over all intervals I containing m. We say that sequence {a(n) : n ∈ Z} has bounded mean oscillation if the sequence M # a is bounded. The space of sequences with this property is called sequences of bounded mean oscillation and is denoted by BMO(Z). We define a norm in BMO(Z) by a = M # a ∞ . The space BMO(Z) is studied in [2].
4 Relation Between Maximal Operators

Theorem 2 Given a sequence $\{a(m) : m \in \mathbb{Z}\}$, the following relations hold:
$$M'a(m) \le Ma(m) \le 3\,M'a(m).$$
Proof The first inequality is obvious, since $M'$ takes the supremum only over centered intervals while $M$ takes the supremum over all intervals. For the second inequality, let $I = [m - r_1, m - r_1 + 1, \ldots, m + r_2 - 1, m + r_2]$ be an interval containing $m$, let $r = \max\{r_1, r_2\}$, and consider $I_1 = [m - r, m - r + 1, \ldots, m + r - 1, m + r]$, which also contains $m$. Note that $|I_1| = 2r + 1$ and $|I| = r_1 + r_2 + 1$. Then
$$|I| = r_1 + r_2 + 1 \ge r = \frac{1}{3}\,3r \ge \frac{1}{3}(2r + 1) = \frac{1}{3}|I_1|.$$
Since $I \subset I_1$, this gives
$$\frac{1}{|I|}\sum_{k \in I}|a(k)| \le \frac{3}{|I_1|}\sum_{k \in I_1}|a(k)| \le 3\,M'a(m).$$
Theorem 3 If $a = \{a(k) : k \in \mathbb{Z}\}$ is a sequence with $a \in \ell^1$, then
$$|\{m \in \mathbb{Z} : M'a(m) > 4\lambda\}| \le 3\,|\{m \in \mathbb{Z} : M_d a(m) > \lambda\}|.$$
Proof Using the Calderón–Zygmund decomposition at height $\lambda$, we obtain a collection of disjoint dyadic intervals $\{I_j : j \in \mathbb{Z}^+\}$ such that $\lambda < \frac{1}{|I_j|}\sum_{k \in I_j}|a(k)| \le 2\lambda$ for every $j$; in particular $\bigcup_j I_j \subseteq \{m \in \mathbb{Z} : M_d a(m) > \lambda\}$. It suffices to show that
$$\{m \in \mathbb{Z} : M'a(m) > 4\lambda\} \subset \bigcup_j 3I_j.$$
Let $m \notin \bigcup_j 3I_j$. We shall prove $m \notin \{k \in \mathbb{Z} : M'a(k) > 4\lambda\}$. Let $I$ be any interval centered at $m$, and choose $N \in \mathbb{Z}^+$ such that $2^{N-1} \le |I| < 2^N$. Then $I$ intersects exactly two dyadic intervals of length $2^N$, say $R_1$ and $R_2$; assume $R_1$ intersects $I$ on the left and $R_2$ on the right. Since $m \notin \bigcup_{j=1}^{\infty} 3I_j$, we have $m \notin 2_R I_j$ and $m \notin 2_L I_j$ for all $j$. But $m \in 2_R R_1$ and $m \in 2_L R_2$, so neither $R_1$ nor $R_2$ can be one of the $I_j$. Hence the average of $\{a(n) : n \in \mathbb{Z}\}$ on each $R_i$, $i = 1, 2$, is at most $\lambda$. Further, note that $\frac{|R_1|}{|I|} \le 2$ and $\frac{|R_2|}{|I|} \le 2$. So
$$\frac{1}{|I|}\sum_{m \in I}|a(m)| \le \frac{1}{|I|}\Big(\sum_{k \in R_1}|a(k)| + \sum_{k \in R_2}|a(k)|\Big) = \frac{|R_1|}{|I|}\,\frac{1}{|R_1|}\sum_{k \in R_1}|a(k)| + \frac{|R_2|}{|I|}\,\frac{1}{|R_2|}\sum_{k \in R_2}|a(k)| \le 2\Big(\frac{1}{|R_1|}\sum_{k \in R_1}|a(k)| + \frac{1}{|R_2|}\sum_{k \in R_2}|a(k)|\Big) \le 2(\lambda + \lambda) = 4\lambda.$$
Corollary 1 For a sequence $\{a(n) : n \in \mathbb{Z}\}$, if $M_d a \in \ell^p(\mathbb{Z})$, $1 < p < \infty$, then
$$\|M'a\|_{\ell^p(\mathbb{Z})} \le C\,\|M_d a\|_{\ell^p(\mathbb{Z})}.$$
Proof
$$\begin{aligned}
\|M'a\|_{\ell^p(\mathbb{Z})}^p &= \int_0^{\infty} p\lambda^{p-1}\,|\{m : M'a(m) > \lambda\}|\,d\lambda\\
&\le 3\cdot 4^{p-1}\int_0^{\infty} p\Big(\frac{\lambda}{4}\Big)^{p-1}\Big|\Big\{m : M_d a(m) > \frac{\lambda}{4}\Big\}\Big|\,d\lambda\\
&\le 3\cdot 4^{p}\int_0^{\infty} p u^{p-1}\,|\{m : M_d a(m) > u\}|\,du\\
&\le 3\cdot 4^{p}\,\|M_d a\|_{\ell^p(\mathbb{Z})}^p,
\end{aligned}$$
which proves the claim with $C = (3\cdot 4^p)^{1/p}$.
In the following lemma, we see that in the norm of the BMO($\mathbb{Z}$) space we can replace the average $a_I$ of $\{a(n)\}$ by a constant $b$. The proof is similar to the proof of the continuous version [1]; we provide it for the sake of completeness.

Lemma 1 Consider a nonnegative sequence $a = \{a(k) : k \in \mathbb{Z}\}$. Then the following are valid:
1. $\dfrac{1}{2}\|a\| \le \sup_{I}\inf_{b \in \mathbb{R}} \dfrac{1}{|I|}\displaystyle\sum_{m \in I}|a(m) - b| \le \|a\|$;
2. $M^{\#}(|a|)(i) \le M^{\#}a(i)$, $i \in \mathbb{Z}$.

Proof For the first inequality, note that for all $b \in \mathbb{R}$,
$$\sum_{m \in I}|a(m) - a_I| \le \sum_{m \in I}|a(m) - b| + \sum_{m \in I}|b - a_I| = A + B \ \text{(say)}.$$
Now
$$B = |I|\,|b - a_I| = |I|\,\Big|b - \frac{1}{|I|}\sum_{k \in I} a(k)\Big| = \Big|\sum_{k \in I}(b - a(k))\Big| \le \sum_{k \in I}|b - a(k)|.$$
So,
$$\sum_{m \in I}|a(m) - a_I| \le \sum_{m \in I}|a(m) - b| + \sum_{m \in I}|b - a_I| \le 2\sum_{m \in I}|a(m) - b|.$$
Now divide both sides by $|I|$, and take the infimum over all $b$ followed by the supremum over all $I$. This proves
$$\frac{1}{2}\|a\| \le \sup_{I}\inf_{b \in \mathbb{R}}\frac{1}{|I|}\sum_{m \in I}|a(m) - b|.$$
The second inequality,
$$\sup_{I}\inf_{b \in \mathbb{R}}\frac{1}{|I|}\sum_{m \in I}|a(m) - b| \le \|a\|,$$
is obvious (take $b = a_I$). The proof of (2) follows from the fact that $\big||a| - |b|\big| \le |a - b|$ for any $a, b \in \mathbb{R}$.

Lemma 2 If $a \in \ell^{p_0}(\mathbb{Z})$ for some $p_0$, $1 \le p_0 < \infty$, then for all $\gamma > 0$ and $\lambda > 0$
$$|\{n \in \mathbb{Z} : M_d a(n) > 2\lambda,\ M^{\#}a(n) \le \gamma\lambda\}| \le 2\gamma\,|\{n \in \mathbb{Z} : M_d a(n) > \lambda\}|.$$
Proof Perform the Calderón–Zygmund decomposition of the sequence $\{a(n) : n \in \mathbb{Z}\}$ at height $\lambda$, which gives a collection of intervals $I_j$ such that for each $j$,
$$\lambda \le \frac{1}{|I_j|}\sum_{k \in I_j}|a(k)| \le 2\lambda.$$
Let $I$ be one of the intervals in the collection $\{I_j\}$. In the Calderón–Zygmund decomposition, there exists an interval $\tilde{I}$ such that $\tilde{I}$ is either $2_R I$ or $2_L I$ and
$$\frac{1}{|\tilde{I}|}\sum_{k \in \tilde{I}}|a(k)| \le \lambda.$$
For details, refer to section on Preliminaries and Notation.
It is easy to observe that for all $m \in I$, $M_d a(m) > 2\lambda$ implies $M_d(a\chi_I)(m) > 2\lambda$. Let
$$a_1 = (a - a_{\tilde{I}})\chi_I, \qquad a_2 = a_{\tilde{I}}\chi_I.$$
Then, since $M_d$ is sublinear and $a_1 + a_2 = (a - a_{\tilde{I}})\chi_I + a_{\tilde{I}}\chi_I = a\chi_I$,
$$M_d(a\chi_I) = M_d(a_1 + a_2) \le M_d((a - a_{\tilde{I}})\chi_I) + M_d(a_{\tilde{I}}\chi_I) \le M_d((a - a_{\tilde{I}})\chi_I) + a_{\tilde{I}},$$
because $M_d(a_{\tilde{I}}\chi_I)(k) \le a_{\tilde{I}}$ for all $k$. Hence for every $k \in I$,
$$M_d((a - a_{\tilde{I}})\chi_I)(k) \ge M_d(a\chi_I)(k) - a_{\tilde{I}},$$
and so, for those $k \in I$ with $M_d a(k) > 2\lambda$,
$$M_d((a - a_{\tilde{I}})\chi_I)(k) \ge M_d(a\chi_I)(k) - a_{\tilde{I}} > 2\lambda - \lambda = \lambda.$$
By the remark, using the weak (1, 1) inequality for $M_d$,
$$|\{k \in \mathbb{Z} : M_d((a - a_{\tilde{I}})\chi_I)(k) > \lambda\}| \le \frac{C}{\lambda}\sum_{k \in I}|a(k) - a_{\tilde{I}}| \le \frac{C}{\lambda}\,|I|\,\frac{2}{|\tilde{I}|}\sum_{k \in \tilde{I}}|a(k) - a_{\tilde{I}}| \le \frac{2C}{\lambda}\,|I|\inf_{m \in I} M^{\#}a(m) \le \frac{2C}{\lambda}\,\gamma\lambda\,|I| = 2C\gamma\,|I|.$$
As a consequence of this good-$\lambda$ inequality, we prove the following theorem.

Theorem 4 Let $\{a(n) : n \in \mathbb{Z}\}$ be a nonnegative sequence in $\ell^p(\mathbb{Z})$, $1 < p < \infty$. Then
$$\sum_{m \in \mathbb{Z}}|M_d a(m)|^p \le C\sum_{m \in \mathbb{Z}}|M^{\#}a(m)|^p,$$
where $M_d$ is the dyadic maximal operator and $M^{\#}$ is the sharp maximal operator, whenever the left-hand side is finite.
Proof For a positive integer $N > 0$, let
$$I_N = \int_0^N p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M_d a(m) > \lambda\}|\,d\lambda.$$
$I_N$ is finite, since $a \in \ell^p(\mathbb{Z})$ implies $M_d a \in \ell^p(\mathbb{Z})$. Then
$$\begin{aligned}
I_N &= \int_0^N p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M_d a(m) > \lambda\}|\,d\lambda = 2^p\int_0^{N/2} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M_d a(m) > 2\lambda\}|\,d\lambda\\
&\le 2^p\int_0^{N/2} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M_d a(m) > 2\lambda,\ M^{\#}a(m) \le \gamma\lambda\}|\,d\lambda + 2^p\int_0^{N/2} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M_d a(m) > 2\lambda,\ M^{\#}a(m) > \gamma\lambda\}|\,d\lambda\\
&\le 2^p C\gamma\int_0^{N} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M_d a(m) > \lambda\}|\,d\lambda + 2^p\int_0^{N/2} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M^{\#}a(m) > \gamma\lambda\}|\,d\lambda.
\end{aligned}$$
It follows that
$$(1 - 2^p C\gamma)\,I_N \le 2^p\int_0^{N/2} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M^{\#}a(m) > \gamma\lambda\}|\,d\lambda.$$
Now choose $\gamma = \frac{1}{C\,2^{p+1}}$, so that $1 - 2^p C\gamma = \frac{1}{2}$. Then
$$\frac{1}{2}I_N \le 2^p\int_0^{N/2} p\lambda^{p-1}\,|\{m \in \mathbb{Z} : M^{\#}a(m) > \gamma\lambda\}|\,d\lambda \le \frac{2^p}{\gamma^{p}}\int_0^{\infty} p u^{p-1}\,|\{m \in \mathbb{Z} : M^{\#}a(m) > u\}|\,du.$$
Letting $N \to \infty$, we get
$$\sum_{m \in \mathbb{Z}} M_d a(m)^p \le C\sum_{m \in \mathbb{Z}} M^{\#}a(m)^p.$$
References
1. Duoandikoetxea J (2001) Fourier analysis. Graduate studies in mathematics, vol 29. American Mathematical Society, Providence
2. Alphonse AM, Madan S (1995) The commutator of the ergodic Hilbert transform. Contemporary Mathematics, vol 189
3. Alphonse AM, Madan S (1994) On ergodic singular integral operators. Colloquium Mathematicum, vol LXVI
4. Swarup ASS, Alphonse AM, The boundedness of the fractional Hardy–Littlewood maximal operator on variable ℓ^p(Z) spaces using Calderón–Zygmund decomposition. arXiv:2204.04331
Multivariate Bernstein α-Fractal Functions D. Kumar, A. K. B. Chand, and P. R. Massopust
Abstract This article deals with novel multivariate Bernstein α-fractal functions and their approximation theory. These fractal functions provide approximants of continuous functions with two free parameters: base functions and scaling functions. The proposed multivariate Bernstein α-fractal functions are more general than the existing bivariate α-fractal functions in the literature. We also give the construction of multivariate Bernstein α-fractal functions with the help of multivariate Bernstein operators, which provide non-negative approximants for continuous and non-negative functions of $C(\prod_{k=1}^{m} I_k)$.

Keywords Fractal interpolation function · Fractal functions · Multivariate Bernstein operator · Positivity
1 Introduction

Most functions and data arising in experimental work are non-linear and irregular, so classical smooth functions are not always suitable approximants for them. Aiming at a continuous representation of such functions and data, Barnsley [1] introduced the concept of fractal interpolation functions (FIFs). Barnsley and Harrington also discussed how to construct smooth FIFs in [2]. Taking advantage of the non-smoothness of fractal interpolation functions, many recent works have focused on the fractional calculus of fractal functions [7, 10, 22]. Fractal interpolation provides both smooth and non-smooth approximants, which distinguishes it from other interpolation and approximation methods.
Fractal interpolation and approximation have received much attention in the last 30 years. In order to approximate 3-D natural objects and complex structures, fractal surfaces (bivariate fractal functions) were studied by various authors. The construction of fractal interpolation surfaces from suitable iterated function systems (IFSs) was first studied by Massopust [11]. After this, many authors explored a wide variety of constructions of fractal surfaces; see, for instance, [3, 4, 15, 19–21, 23]. In order to construct a continuous fractal surface as the fixed point of a suitable operator on a rectangular domain, Ruan and Xu recently modified the existing construction according to the even or odd index of the data points [17]. In [18], Vijender developed the univariate Bernstein α-fractal functions and studied their convergence based on a sequence of base functions, so that the scaling functions may be taken non-zero. Based on Ruan and Xu's construction, Jha et al. [9] proposed the bivariate Bernstein α-fractal functions and studied the approximation of bivariate functions in C(I × J) and L^p(I × J), where I and J are bounded and closed intervals in R. They also studied box-dimension estimates for bivariate α-fractal functions, where the interpolation data are distributed over a rectangular grid. Recently, Pandey et al. [16] developed the multivariate fractal interpolation function for interpolation data over an m-dimensional rectangular grid.

The paper is organized as follows. In Sect. 2, we give the basics of univariate and multivariate fractal interpolation methodologies needed for the existence of multivariate α-fractal functions. We propose a novel class of multivariate α-fractal functions in Sect. 3. In Sect. 4, we derive sufficient conditions to preserve the positivity of a multivariate function.
2 Background and Preliminaries

2.1 Univariate Fractal Interpolation Function

We start with an outline of fractal interpolation; for more details, the reader may consult [1, 8, 12]. We fix the following notation:
$$\mathbb{N}_m := \{1, \ldots, m\}, \quad \mathbb{N}_{m,0} := \{0, 1, \ldots, m\}, \quad \partial\mathbb{N}_{m,0} := \{0, m\}, \quad \mathrm{int}\,\mathbb{N}_{m,0} := \{1, \ldots, m-1\}.$$
Suppose that $I = [x_0, x_N]$ is a closed and bounded interval of $\mathbb{R}$, and $\Delta = \{x_0 < x_1 < \cdots < x_N\}$ is a partition of $I$. For $i \in \mathbb{N}_N$, define homeomorphisms $u_i : I \to I_i$ such that for all $x, x' \in I$ and for some $0 < l_i < 1$,
$$|u_i(x) - u_i(x')| \le l_i|x - x'|, \qquad u_i(x_0) = x_{i-1}, \quad u_i(x_N) = x_i.$$
Consider $N$ maps $v_i : I \times \mathbb{R} \to \mathbb{R}$ which are continuous in the $x$-direction and contractive in the $y$-direction and satisfy the following conditions:
$$v_i(x_0, y_0) = y_{i-1}, \quad v_i(x_N, y_N) = y_i, \qquad |v_i(x, y_1) - v_i(x, y_2)| \le \kappa_i|y_1 - y_2|$$
for all $y_1, y_2 \in \mathbb{R}$, $i \in \mathbb{N}_N$, $\kappa_i \in [0, 1)$. For $i \in \mathbb{N}_N$, define mappings $w_i : I \times \mathbb{R} \to I_i \times \mathbb{R}$ as $w_i(x, y) = (u_i(x), v_i(x, y))$ for all $(x, y) \in I \times \mathbb{R}$. Then we have a hyperbolic iterated function system (IFS)
$$\mathcal{I} := \{I \times \mathbb{R};\ w_i(x, y) = (u_i(x), v_i(x, y)),\ i \in \mathbb{N}_N\}. \tag{2.1}$$
It is known [1] that the IFS $\mathcal{I}$ possesses an attractor $G$ in $H(I \times \mathbb{R})$, the hyperspace of $I \times \mathbb{R}$ equipped with the Hausdorff metric. Let us consider
$$G := \{g \in C(I) : g(x_0) = y_0 \ \text{and}\ g(x_N) = y_N\}.$$
We define a metric on $G$ by $\rho(h, g) := \max\{|h(x) - g(x)| : x \in I\}$ for $g, h \in G$. Then $(G, \rho)$ is a complete metric space. Define the Read–Bajraktarević operator (RB operator) $T$ on $(G, \rho)$ by
$$Tg(x) = \sum_{i=1}^{N} v_i\big(u_i^{-1}(x),\ g \circ u_i^{-1}(x)\big)\,\chi_{u_i(I)}(x), \quad x \in I. \tag{2.2}$$
Theorem 1 [1] The IFS $\mathcal{I}$ defined in (2.1) has a unique attractor $G$ which is the graph of a continuous function $f^* : I \to \mathbb{R}$ satisfying $f^*(x_i) = y_i$ for all $i$. Furthermore, $f^*$ is the unique fixed point of the operator $T$ defined in (2.2).

The fixed point $f^*$ of $T$ in the previous theorem is called the fractal interpolation function (FIF) determined by the IFS $\mathcal{I}$, and $f^*$ obeys the fixed point equation
$$f^*(x) = \sum_{i=1}^{N} v_i\big(u_i^{-1}(x),\ f^* \circ u_i^{-1}(x)\big)\,\chi_{u_i(I)}(x), \quad x \in I.$$
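For readers who want to see the attractor of Theorem 1 numerically, the following is a minimal sketch (the helper fif_chaos_game, the data set and the constant vertical scalings are assumptions of this illustration, based on the classical affine choice of v_i rather than the general maps above) that renders the graph of an affine FIF by the chaos game, i.e. by iterating randomly chosen maps w_i.

import numpy as np

def fif_chaos_game(xs, ys, d, n_points=20000, seed=0):
    """Render an affine FIF attractor for data {(x_i, y_i)} by the chaos game.

    The maps are w_i(x, y) = (a_i x + e_i, c_i x + d_i y + f_i), with coefficients
    chosen so that w_i sends (x_0, y_0) to (x_{i-1}, y_{i-1}) and (x_N, y_N) to
    (x_i, y_i); the d_i are free vertical scaling factors with |d_i| < 1.
    """
    xs, ys, d = map(np.asarray, (xs, ys, d))
    x0, xN, y0, yN = xs[0], xs[-1], ys[0], ys[-1]
    a = (xs[1:] - xs[:-1]) / (xN - x0)
    e = xs[:-1] - a * x0
    c = (ys[1:] - ys[:-1] - d * (yN - y0)) / (xN - x0)
    f = ys[:-1] - c * x0 - d * y0

    rng = np.random.default_rng(seed)
    pts = np.empty((n_points, 2))
    x, y = x0, y0
    for k in range(n_points):
        i = rng.integers(len(a))
        x, y = a[i] * x + e[i], c[i] * x + d[i] * y + f[i]
        pts[k] = x, y
    return pts

# Illustrative interpolation data and constant scalings |d_i| < 1.
pts = fif_chaos_game(xs=[0.0, 0.25, 0.5, 0.75, 1.0],
                     ys=[0.0, 0.6, 0.2, 0.8, 0.4],
                     d=[0.3, -0.3, 0.3, -0.3])
print(pts[:5])

Plotting the returned points reproduces (up to sampling) the graph of the FIF passing through the given data.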
Barnsley [1] and Navascués [12] observed that the concept of FIFs can be used to define a class of fractal functions associated with a given function $f \in C(I)$. For the construction of a family of fractal perturbations of $f \in C(I)$, we need a continuous function $b : I \to \mathbb{R}$ that fulfils the conditions $b(x_0) = f(x_0)$, $b(x_N) = f(x_N)$ with $b \ne f$, and real numbers $\alpha_i$, $i \in \mathbb{N}_N$, satisfying $|\alpha_i| < 1$. Let $\Delta := \{x_0 < x_1 < \cdots < x_N\}$ be a partition of $[x_0, x_N]$. Define an IFS $\mathcal{I}$ through the maps
$$u_i(x) = a_i x + b_i, \qquad v_i(x, y) = \alpha_i y + f(u_i(x)) - \alpha_i b(x), \quad i \in \mathbb{N}_N.$$
Then the FIF corresponding to the data $\{(x_i, f(x_i)) : i \in \mathbb{N}_{N,0}\}$ is denoted by $f^{\alpha}_{\Delta,b} = f^{\alpha}$ and is referred to as the α-fractal function for $f$ with respect to the scaling vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)$, the base function $b$, and the partition $\Delta$. The function $f^{\alpha}$ is the fixed point of the RB operator $T : G \to G$ defined by
$$(Tg)(x) = f(x) + \sum_{i=1}^{N} \alpha_i\big(g(u_i^{-1}(x)) - b(u_i^{-1}(x))\big)\,\chi_{u_i(I)}(x), \quad x \in I,$$
where $G := \{g \in C(I) : g(x_0) = f(x_0),\ g(x_N) = f(x_N)\}$. Consequently, the α-fractal function $f^{\alpha}$ can be generated by using the relation
$$f^{\alpha}(x) = f(x) + \sum_{i=1}^{N} \alpha_i\big(f^{\alpha}(u_i^{-1}(x)) - b(u_i^{-1}(x))\big)\,\chi_{u_i(I)}(x), \quad x \in I.$$
To obtain fractal functions with more flexibility, the constant scaling $\alpha_i$ of the map $v_i(x, y)$ in (2.1) can be replaced by a variable scaling function $\alpha_i(x)$. Thus, we have the IFS
$$\mathcal{I} := \{I \times \mathbb{R};\ w_i(x, y) = (u_i(x), v_i(x, y)),\ i \in \mathbb{N}_N\}; \quad u_i(x) = a_i x + b_i, \quad v_i(x, y) = \alpha_i(x)y + f(u_i(x)) - \alpha_i(x)b(x), \quad i \in \mathbb{N}_N.$$
In this case, the α-fractal function $f^{\alpha}$ with variable scaling satisfies the self-referential equation
$$f^{\alpha}(x) = f(x) + \sum_{i=1}^{N} \alpha_i(u_i^{-1}(x))\big(f^{\alpha}(u_i^{-1}(x)) - b(u_i^{-1}(x))\big)\,\chi_{u_i(I)}(x), \quad x \in I.$$
The fractal perturbation process introduces a fractal operator $\mathcal{F}^{\alpha} : C(I) \to C(I)$ such that $\mathcal{F}^{\alpha}(f) = f^{\alpha}$. Further, for $b = Lf$, where $L : C(I) \to C(I)$ is a bounded linear map, and for scaling functions fulfilling $\|\alpha\|_{\infty} := \max\{\|\alpha_i\|_{\infty} : i \in \mathbb{N}_N\} < 1$, the α-fractal operator $\mathcal{F}^{\alpha}$ is a bounded linear operator [13].
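As a computational illustration of the self-referential equation above (not taken from the paper: the helper alpha_fractal, the truncation depth, the test function and the degree-2 Bernstein base function are all assumptions of this sketch), one can evaluate f^α pointwise with a constant scaling by unwinding the recursion a fixed number of times.

import numpy as np

def alpha_fractal(f, b, knots, alpha, depth=25):
    """Pointwise evaluation of the alpha-fractal function f^alpha.

    Uses the self-referential equation
        f^alpha(x) = f(x) + alpha * ( f^alpha(u_i^{-1}(x)) - b(u_i^{-1}(x)) ),
    where u_i is the affine map of [x_0, x_N] onto the i-th subinterval.
    The recursion is truncated after `depth` levels, replacing f^alpha by f;
    the resulting error is O(|alpha|**depth) since |alpha| < 1.
    """
    knots = np.asarray(knots, dtype=float)
    x0, xN = knots[0], knots[-1]

    def value(x, d):
        if d == 0:
            return f(x)
        i = min(max(np.searchsorted(knots, x, side="right"), 1), len(knots) - 1)
        a_i = (knots[i] - knots[i - 1]) / (xN - x0)      # u_i(x) = a_i * x + b_i
        b_i = knots[i - 1] - a_i * x0
        y = (x - b_i) / a_i                               # y = u_i^{-1}(x)
        return f(x) + alpha * (value(y, d - 1) - b(y))

    return np.vectorize(lambda x: value(x, depth))

# Example: f = sin(pi x); base function b = degree-2 Bernstein polynomial of f,
# which matches f at the endpoints of [0, 1] as required.
f = lambda x: np.sin(np.pi * x)
b = lambda x: (1 - x)**2 * f(0.0) + 2 * x * (1 - x) * f(0.5) + x**2 * f(1.0)
falpha = alpha_fractal(f, b, knots=np.linspace(0.0, 1.0, 5), alpha=0.3)
print(falpha(np.linspace(0.0, 1.0, 9)))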
2.2 Multivariate Fractal Interpolation

Let $m \ge 2$ be a natural number, let $C(\prod_{k=1}^{m} I_k)$ be the Banach space endowed with the sup-norm, and let $I_k = [a_k, b_k]$, $k \in \mathbb{N}_m$, be closed and bounded intervals of $\mathbb{R}$. Consider the interpolation data
$$\Delta := \{(x_{1,j_1}, \ldots, x_{m,j_m}, y_{j_1 \ldots j_m}) : j_k \in \mathbb{N}_{N_k,0},\ k \in \mathbb{N}_m\}$$
such that $a_k = x_{k,0} < \cdots < x_{k,N_k} = b_k$ for each $k \in \mathbb{N}_m$. Here, $\{a_k = x_{k,0}, \ldots, x_{k,N_k} = b_k\}$ is the partition of $I_k$, and we denote the $j_k$-th subinterval of $I_k$ by $I_{k,j_k} = [x_{k,j_k-1}, x_{k,j_k}]$, $j_k \in \mathbb{N}_{N_k}$. For every $j_k \in \mathbb{N}_{N_k}$, consider an affine map $u_{k,j_k} : I_k \to I_{k,j_k}$ satisfying
$$|u_{k,j_k}(x) - u_{k,j_k}(x')| \le \kappa_{k,j_k}|x - x'|, \quad \forall x, x' \in I_k, \tag{2.3}$$
where $0 < \kappa_{k,j_k} < 1$, and
$$\begin{cases} u_{k,j_k}(x_{k,0}) = x_{k,j_k-1} \ \text{and}\ u_{k,j_k}(x_{k,N_k}) = x_{k,j_k}, & \text{if } j_k \text{ is odd},\\ u_{k,j_k}(x_{k,0}) = x_{k,j_k} \ \text{and}\ u_{k,j_k}(x_{k,N_k}) = x_{k,j_k-1}, & \text{if } j_k \text{ is even}. \end{cases} \tag{2.4}$$
It is clear from (2.4) that
$$u_{k,j_k}^{-1}(x_{k,j_k}) = u_{k,j_k+1}^{-1}(x_{k,j_k}), \quad \forall j_k \in \mathrm{int}\,\mathbb{N}_{N_k,0}. \tag{2.5}$$
Define a map $\tau : \mathbb{Z} \times \{0, N_1, \ldots, N_m\} \to \mathbb{Z}$ such that
$$\begin{cases} \tau(j, 0) = j - 1 \ \text{and}\ \tau(j, N_k) = j, & \text{if } j \text{ is odd},\\ \tau(j, 0) = j \ \text{and}\ \tau(j, N_k) = j - 1, & \text{if } j \text{ is even}. \end{cases} \tag{2.6}$$
Using (2.6), we can rewrite (2.4) as
$$u_{k,j_k}(x_{k,i_k}) = x_{k,\tau(j_k,i_k)}, \quad \forall j_k \in \mathbb{N}_{N_k},\ i_k \in \partial\mathbb{N}_{N_k,0},\ k \in \mathbb{N}_m. \tag{2.7}$$
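The parity-dependent conventions (2.4) and (2.6) are easy to get wrong in code, so here is a small sketch (the function names tau and affine_maps are assumptions of this illustration) that builds the index map τ and the orientation-alternating affine maps for one coordinate direction and checks relation (2.7).

import numpy as np

def tau(j, i):
    """Index map (2.6) for i in {0, N_k}: the result depends on the parity of j."""
    if j % 2 == 1:                        # j odd
        return j - 1 if i == 0 else j
    return j if i == 0 else j - 1         # j even

def affine_maps(knots):
    """Coefficients (a, b) of u_j(x) = a*x + b satisfying (2.4):
    orientation-preserving for odd j, orientation-reversing for even j."""
    x0, xN = knots[0], knots[-1]
    maps = []
    for j in range(1, len(knots)):
        left, right = knots[j - 1], knots[j]
        if j % 2 == 1:
            a = (right - left) / (xN - x0)
            b = left - a * x0
        else:
            a = (left - right) / (xN - x0)
            b = right - a * x0
        maps.append((a, b))
    return maps

# Check (2.7): u_j(x_{k,0}) = x_{k, tau(j, 0)} and u_j(x_{k,N_k}) = x_{k, tau(j, N_k)}.
knots = np.linspace(0.0, 1.0, 5)          # N_k = 4
for j, (a, b) in enumerate(affine_maps(knots), start=1):
    assert np.isclose(a * knots[0] + b, knots[tau(j, 0)])
    assert np.isclose(a * knots[-1] + b, knots[tau(j, 4)])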
Let $K = \big(\prod_{k=1}^{m} I_k\big) \times \mathbb{R}$. For each $(j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}$, define a continuous function $v_{j_1 \ldots j_m} : K \to \mathbb{R}$ satisfying the following conditions:
$$v_{j_1 \ldots j_m}(x_{1,i_1}, \ldots, x_{m,i_m}, y_{i_1 \ldots i_m}) = y_{\tau(j_1,i_1)\ldots\tau(j_m,i_m)}, \quad \forall (i_1, \ldots, i_m) \in \prod_{k=1}^{m} \partial\mathbb{N}_{N_k,0}, \tag{2.8}$$
and
$$|v_{j_1 \ldots j_m}(x_1, \ldots, x_m, y) - v_{j_1 \ldots j_m}(x_1, \ldots, x_m, y')| \le \gamma_{j_1 \ldots j_m}|y - y'|, \quad (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k,\ y, y' \in \mathbb{R}, \tag{2.9}$$
where $0 \le \gamma_{j_1 \ldots j_m} < 1$. For $(j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}$, we define
$$W_{j_1,\ldots,j_m} : \Big(\prod_{k=1}^{m} I_k\Big) \times \mathbb{R} \to \Big(\prod_{k=1}^{m} I_{k,j_k}\Big) \times \mathbb{R}$$
such that
$$W_{j_1,\ldots,j_m}(x_1, \ldots, x_m, y) = \big(u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m),\ v_{j_1 \ldots j_m}(x_1, \ldots, x_m, y)\big). \tag{2.10}$$
Finally, consider the IFS for the multivariate data
$$\mathcal{I} = \Big\{K,\ W_{j_1,\ldots,j_m} : (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}\Big\}. \tag{2.11}$$
Let us consider
$$G := \Big\{g \in C\Big(\prod_{k=1}^{m} I_k\Big) : g(x_{1,j_1}, \ldots, x_{m,j_m}) = y_{j_1 \ldots j_m},\ (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \partial\mathbb{N}_{N_k,0}\Big\}$$
and the uniform metric
$$\rho(f, g) := \max\Big\{|f(x_1, \ldots, x_m) - g(x_1, \ldots, x_m)| : (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k\Big\} \quad \text{for } f, g \in G.$$
Then $(G, \rho)$ is a complete metric space. Define the RB operator $T$ on $(G, \rho)$ as
$$Tg(x_1, \ldots, x_m) = \sum_{(j_1,\ldots,j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}} v_{j_1 \ldots j_m}\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m),\ g(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m))\big)\,\chi_{u_{j_1,\ldots,j_m}(\prod_{k=1}^{m} I_k)}(x_1, \ldots, x_m), \quad \forall (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k. \tag{2.12}$$
It is easy to verify that $T$ is well defined and contractive [16]. Therefore, by the Banach fixed point theorem, the unique fixed point $f^*$ of $T$ satisfies
$$f^*(x_1, \ldots, x_m) = \sum_{(j_1,\ldots,j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}} v_{j_1 \ldots j_m}\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m),\ f^*(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m))\big)\,\chi_{u_{j_1,\ldots,j_m}(\prod_{k=1}^{m} I_k)}(x_1, \ldots, x_m), \quad \forall (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k. \tag{2.13}$$
Let us write $X = (x_1, \ldots, x_m)$, $u_{j_1 \ldots j_m}^{-1}(X) = (u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m))$ and $u_{j_1 \ldots j_m}(X) = (u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m))$. Then the self-referential equation above can be written as
$$f^*(u_{j_1 \ldots j_m}(X)) = v_{j_1 \ldots j_m}(X, f^*(X)), \quad \forall X \in \prod_{k=1}^{m} I_k,\ (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}. \tag{2.14}$$
This unique fixed point $f^*$ interpolates the data points $\Delta$, and the graph
$$G(f^*) = \Big\{(x_1, \ldots, x_m, f^*(x_1, \ldots, x_m)) : (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k\Big\}$$
of $f^*$ satisfies
$$G(f^*) = \bigcup_{(j_1,\ldots,j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}} W_{j_1,\ldots,j_m}(G(f^*)).$$
The function $f^*$ is known as the multivariate fractal interpolation function corresponding to the IFS (2.11).
2.3 Multivariate α-Fractal Function

For a given $f \in C(\prod_{k=1}^{m} I_k)$, we need a continuous base function $b : \prod_{k=1}^{m} I_k \to \mathbb{R}$ satisfying the conditions
$$b(x_{1,j_1}, \ldots, x_{m,j_m}) = f(x_{1,j_1}, \ldots, x_{m,j_m}), \quad \forall (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \partial\mathbb{N}_{N_k,0}. \tag{2.15}$$
Also, we need a partition
$$\Delta := \Big\{(x_{1,j_1}, \ldots, x_{m,j_m}) : (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k,0}\Big\} \quad \text{such that} \quad a_k = x_{k,0} < \cdots < x_{k,N_k} = b_k \ \text{for each } k \in \mathbb{N}_m.$$
For $k \in \mathbb{N}_m$, affine maps $u_{k,j_k} : I_k \to I_{k,j_k}$ are defined as
$$u_{k,j_k}(x) = a_{k,j_k}x + b_{k,j_k}, \quad j_k \in \mathbb{N}_{N_k}, \tag{2.16}$$
where $a_{k,j_k}$ and $b_{k,j_k}$ are chosen such that the maps $u_{k,j_k}$ satisfy (2.3) and (2.4). Consider a continuous scaling function $\alpha : \prod_{k=1}^{m} I_k \to \mathbb{R}$ with $\|\alpha\|_{\infty} < 1$. Further, define
$$v_{j_1 \ldots j_m}(x_1, \ldots, x_m, y) = f(u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m)) + \alpha(x_1, \ldots, x_m)\big(y - b(x_1, \ldots, x_m)\big), \quad \forall (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}. \tag{2.17}$$
The IFS defined by the maps in (2.16) and (2.17) determines a fractal function known as the multivariate α-fractal function, denoted by $f^{\alpha}_{\Delta,b}$; it is the fixed point of the RB operator $T : G \to G$ defined by
$$Tg(x_1, \ldots, x_m) = f(x_1, \ldots, x_m) + \sum_{(j_1,\ldots,j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}} \alpha\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m)\big)\Big(g\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m)\big) - b\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m)\big)\Big)\,\chi_{u_{j_1,\ldots,j_m}(\prod_{k=1}^{m} I_k)}(x_1, \ldots, x_m), \quad \forall (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k. \tag{2.18}$$
The fixed point $f^{\alpha}_{\Delta,b}$ also satisfies the self-referential equation
$$f^{\alpha}_{\Delta,b}(x_1, \ldots, x_m) = f(x_1, \ldots, x_m) + \sum_{(j_1,\ldots,j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}} \alpha\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m)\big)\Big(f^{\alpha}_{\Delta,b}\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m)\big) - b\big(u_{1,j_1}^{-1}(x_1), \ldots, u_{m,j_m}^{-1}(x_m)\big)\Big)\,\chi_{u_{j_1,\ldots,j_m}(\prod_{k=1}^{m} I_k)}(x_1, \ldots, x_m), \quad \forall (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k. \tag{2.19}$$
Equivalently,
$$f^{\alpha}_{\Delta,b}\big(u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m)\big) = f\big(u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m)\big) + \alpha(x_1, \ldots, x_m)\big(f^{\alpha}_{\Delta,b}(x_1, \ldots, x_m) - b(x_1, \ldots, x_m)\big), \quad \forall (x_1, \ldots, x_m) \in \prod_{k=1}^{m} I_k,\ (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}, \tag{2.20}$$
or, in the compact notation introduced above,
$$f^{\alpha}_{\Delta,b}(u_{j_1 \ldots j_m}(X)) = f(u_{j_1 \ldots j_m}(X)) + \alpha(X)\big(f^{\alpha}_{\Delta,b}(X) - b(X)\big), \quad \forall X \in \prod_{k=1}^{m} I_k,\ (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}. \tag{2.21}$$
Taking the sup-norm in (2.21) and using $\|f^{\alpha}_{\Delta,b} - b\|_{\infty} \le \|f^{\alpha}_{\Delta,b} - f\|_{\infty} + \|f - b\|_{\infty}$, we easily get
$$\|f^{\alpha}_{\Delta,b} - f\|_{\infty} \le \frac{\|\alpha\|_{\infty}}{1 - \|\alpha\|_{\infty}}\,\|f - b\|_{\infty}. \tag{2.22}$$
From (2.22), we observe that $\|f^{\alpha}_{\Delta,b} - f\|_{\infty} \to 0$ as either $\|\alpha\|_{\infty} \to 0$ or $\|f - b\|_{\infty} \to 0$. In [18], Vijender developed fractal functions using univariate Bernstein polynomials as base functions and studied their convergence and shape-preserving aspects, apart from results on Bernstein α-fractal Fourier series, $C^r$-Bernstein α-fractal functions, etc. Similarly, here we introduce the multivariate Bernstein α-fractal function and study its positivity aspects.
3 Multivariate Bernstein α-Fractal Function

To obtain convergence of the multivariate α-fractal function $f^{\alpha}_{\Delta,b}$ to $f$ without imposing a condition on the scaling function $\alpha$, we take the base function $b$ as the multivariate Bernstein polynomial $B_{n_1,\ldots,n_m} f$ [5, 6] of $f$. The $(n_1, \ldots, n_m)$-th Bernstein polynomial of $f \in C(\prod_{k=1}^{m} I_k)$ is given by
$$B_{n_1,\ldots,n_m} f(x_1, \ldots, x_m) = \sum_{k_1=0}^{n_1}\cdots\sum_{k_m=0}^{n_m} f\!\left(x_{1,0} + (x_{1,N_1} - x_{1,0})\frac{k_1}{n_1}, \ldots, x_{m,0} + (x_{m,N_m} - x_{m,0})\frac{k_m}{n_m}\right)\prod_{r=1}^{m} b_{k_r,n_r}(x_r), \tag{3.1}$$
where
$$b_{k_r,n_r}(x_r) = \binom{n_r}{k_r}\frac{(x_r - x_{r,0})^{k_r}(x_{r,N_r} - x_r)^{n_r - k_r}}{(x_{r,N_r} - x_{r,0})^{n_r}}, \quad r = 1, \ldots, m;\ n_1, \ldots, n_m \in \mathbb{N}.$$
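As a small illustration of (3.1) in the bivariate case on [0, 1]^2 (the helper bernstein_2d and the test function are assumptions of this sketch, not the paper's example), one can evaluate B_{n1,n2} f directly and watch the sup-norm error shrink as the degrees grow, in line with the convergence result quoted below.

import numpy as np
from math import comb

def bernstein_2d(f, n1, n2, x, y):
    """(n1, n2)-th Bernstein polynomial of f on [0, 1]^2, evaluated at (x, y)."""
    val = 0.0
    for k1 in range(n1 + 1):
        bx = comb(n1, k1) * x**k1 * (1 - x)**(n1 - k1)
        for k2 in range(n2 + 1):
            by = comb(n2, k2) * y**k2 * (1 - y)**(n2 - k2)
            val += f(k1 / n1, k2 / n2) * bx * by
    return val

# Uniform convergence check on a coarse grid for a smooth test function.
f = lambda x, y: np.sin(np.pi * x * y)
grid = np.linspace(0.0, 1.0, 21)
for n in (2, 5, 10, 20, 40):
    err = max(abs(bernstein_2d(f, n, n, x, y) - f(x, y)) for x in grid for y in grid)
    print(n, err)    # the sup-norm error decreases as n grows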
From (2.11), the IFS
$$\mathcal{I}_{n_1,\ldots,n_m} := \Big\{K,\ W_{n_1,\ldots,n_m; j_1,\ldots,j_m} : (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}\Big\}, \tag{3.2}$$
where
$$W_{n_1,\ldots,n_m; j_1,\ldots,j_m}(x_1, \ldots, x_m, y) := \big(u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m),\ v_{n_1,\ldots,n_m; j_1 \ldots j_m}(x_1, \ldots, x_m, y)\big)$$
and
$$v_{n_1,\ldots,n_m; j_1 \ldots j_m}(x_1, \ldots, x_m, y) = f(u_{1,j_1}(x_1), \ldots, u_{m,j_m}(x_m)) + \alpha(x_1, \ldots, x_m)\big(y - B_{n_1,\ldots,n_m} f(x_1, \ldots, x_m)\big), \quad \forall (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k},$$
determines the multivariate α-fractal function $f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} = f^{\alpha}_{\Delta; n_1,\ldots,n_m} = f^{\alpha}_{n_1,\ldots,n_m}$, called the multivariate Bernstein α-fractal function corresponding to the continuous function $f : \prod_{k=1}^{m} I_k \to \mathbb{R}$. If $b(x) = B_{n_1,\ldots,n_m} f(x)$, then using (2.21) we have
$$f^{\alpha}_{\Delta; n_1,\ldots,n_m}(u_{j_1 \ldots j_m}(X)) = f(u_{j_1 \ldots j_m}(X)) + \alpha(X)\big(f^{\alpha}_{\Delta; n_1,\ldots,n_m}(X) - B_{n_1,\ldots,n_m} f(X)\big), \quad \forall X \in \prod_{k=1}^{m} I_k,\ (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k}. \tag{3.3}$$

Definition 1 Define an operator $\mathcal{F}^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} : C(\prod_{k=1}^{m} I_k) \to C(\prod_{k=1}^{m} I_k)$ such that
$$\mathcal{F}^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}}(f) = f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}},$$
where $\Delta$ is the set of data points, $B_{n_1,\ldots,n_m}$ is the multivariate Bernstein operator, and $\alpha$ is the scaling function. This operator is termed the multivariate Bernstein α-fractal operator.

Theorem 2 The multivariate Bernstein α-fractal operator
$$\mathcal{F}^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} : C\Big(\prod_{k=1}^{m} I_k\Big) \to C\Big(\prod_{k=1}^{m} I_k\Big)$$
is a bounded linear operator.
The proof of this theorem is similar to the univariate case [13], using auxiliary results on multivariate Bernstein polynomials [6].

Theorem 3 Let $f \in C(\prod_{k=1}^{m} I_k)$. Then the multivariate Bernstein α-fractal function $f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}}$ converges uniformly to $f$ as $n_i \to \infty$ for all $i \in \mathbb{N}_m$.

Proof From (3.3), we get
$$\|f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} - f\|_{\infty} \le \|\alpha\|_{\infty}\,\|f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} - B_{n_1,\ldots,n_m} f\|_{\infty} \le \|\alpha\|_{\infty}\,\|f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} - f\|_{\infty} + \|\alpha\|_{\infty}\,\|B_{n_1,\ldots,n_m} f - f\|_{\infty},$$
which implies
$$\|f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} - f\|_{\infty} \le \frac{\|\alpha\|_{\infty}}{1 - \|\alpha\|_{\infty}}\,\|f - B_{n_1,\ldots,n_m} f\|_{\infty}. \tag{3.4}$$
It is known [6] that $\|f - B_{n_1,\ldots,n_m} f\|_{\infty} \to 0$ as $n_i \to \infty$ for all $i \in \mathbb{N}_m$; hence, from (3.4), $\|f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}} - f\|_{\infty} \to 0$ as $n_i \to \infty$ for all $i \in \mathbb{N}_m$. That is, $f^{\alpha}_{\Delta,B_{n_1,\ldots,n_m}}$ converges to $f$ uniformly as $n_i \to \infty$ for all $i \in \mathbb{N}_m$, even if $\|\alpha\|_{\infty} \ne 0$.
Example 1 In this example, we give a verification of Theorem 3. Let $f(x, y) = \sin\!\big(\frac{4\pi x^2 y^2}{x^2 + y^2 + 1}\big)$ on $I_1 \times I_2$, where $I_1 = I_2 = [0, 1]$. Consider the multivariate interpolation data set
Δ = [0 0 0; 1/3 0 0; 2/3 0 0; 1 0 0; 0 1/3 0; 1/3 1/3 1/9; 2/3 1/3 2/9; 1 1/3 1/3; 0 2/3 0; 1/3 2/3 2/9; 2/3 2/3 4/9; 1 2/3 2/3; 0 1 0; 1/3 1 1/3; 2/3 1 2/3; 1 1 1],
where each triplet represents $(x_i, y_i, f(x_i, y_i))$, $\alpha(X) = 0.7$ for $X \in \prod_{k=1}^{m} I_k$, and $m = 2$. Figure 1a is the plot of the function $f$. Figure 1b and e verify the convergence result of the theorem for larger Bernstein indices, even though $\alpha$ is not close to the zero vector. Figure 1c and d confirm that the convergence in the $x$- and $y$-directions is governed by higher indices $n_1$ and $n_2$, respectively, of the Bernstein base function. Figures 2a and 1f show the convergence of $f^{\alpha}_{\Delta,B_{n_1,n_2}}$ as $\|\alpha\|_{\infty}$ goes to 0 even if $n_1$ and $n_2$ are fixed.
4 Positive Approximation with Multivariate Bernstein α-Fractal Functions

Here, we discuss the approximation of a positive continuous function by positive multivariate Bernstein α-fractal functions.

Theorem 4 Let $f \in C(\prod_{k=1}^{m} I_k)$ and $f(X) \ge 0$ for all $X \in \prod_{k=1}^{m} I_k$. Consider a data set based on the grid
Fig. 1 Convergence of $f^{\alpha}_{n_1,n_2}$ as $n_1, n_2 \to \infty$: (a) graph of $f(x, y) = \sin\!\big(\frac{4\pi x^2 y^2}{x^2 + y^2 + 1}\big)$; (b) $f^{\alpha}_{10,10}$, α = 0.7; (c) $f^{\alpha}_{20,2}$, α = 0.7; (d) $f^{\alpha}_{2,20}$, α = 0.7; (e) $f^{\alpha}_{20,20}$, α = 0.7; (f) $f^{\alpha}_{2,2}$, α = 0.1

Fig. 2 Convergence of $f^{\alpha}_{n_1,n_2}$ as $n_1, n_2 \to \infty$: (a) $f^{\alpha}_{2,2}$, α = 0.9; (b) $f^{\alpha}_{20,20}$, α = 0.9
$$\Delta := \Big\{(x_{1,j_1}, \ldots, x_{m,j_m}) : (j_1, \ldots, j_m) \in \prod_{k=1}^{m} \mathbb{N}_{N_k,0}\Big\},$$
where $a_k = x_{k,0} < \ldots$