
DISTRIBUTION THEORY
Principles and Applications

Edited by

Fozia Homa, PhD
Mukti Khetan, PhD
Mohd. Arshad, PhD
Pradeep Mishra, PhD

First edition published 2024

Apple Academic Press Inc.
1265 Goldenrod Circle, NE, Palm Bay, FL 32905, USA
760 Laurentian Drive, Unit 19, Burlington, ON L7N 0A4, Canada

CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742, USA
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN, UK

© 2024 by Apple Academic Press, Inc. Apple Academic Press exclusively co-publishes with CRC Press, an imprint of Taylor & Francis Group, LLC.

Reasonable efforts have been made to publish reliable data and information, but the authors, editors, and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors, editors, and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC, please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library and Archives Canada Cataloguing in Publication
Title: Distribution theory : principles and applications / edited by Fozia Homa, PhD, Mukti Khetan, PhD, Mohd. Arshad, PhD, Pradeep Mishra, PhD.
Other titles: Distribution theory (Palm Bay, Fla.)
Names: Homa, Fozia, editor. | Khetan, Mukti, editor. | Arshad, Mohd. (Lecturer in mathematics), editor. | Mishra, Pradeep (Professor of statistics), editor.
Description: First edition. | Includes bibliographical references and index.
Identifiers: Canadiana (print) 20230150616 | Canadiana (ebook) 20230150683 | ISBN 9781774912140 (hardcover) | ISBN 9781774912157 (softcover) | ISBN 9781003336303 (ebook)
Subjects: LCSH: Distribution (Probability theory) – Statistical methods – Data processing. | LCSH: Mathematical statistics – Data processing.
Classification: LCC QA273.6 .D57 2023 | DDC 519.2/4 – dc23

Library of Congress Cataloging-in-Publication Data

CIP data on file with US Library of Congress

ISBN: 978-1-77491-214-0 (hbk) ISBN: 978-1-77491-215-7 (pbk) ISBN: 978-1-00333-630-3 (ebk)

About the Editors

Fozia Homa, PhD
Department of Statistics, Mathematics and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India

Fozia Homa, PhD, is an Assistant Professor and Scientist in the Department of Statistics, Mathematics and Computer Application, Bihar Agricultural University, Sabour, India, and the author or co-author of several journal articles. Dr. Homa has received numerous awards in recognition of her research and teaching achievements from several organizations of national and international repute. She was conferred the Young Scientist Award in the field of Mathematics and Statistics (2016) by Aufau Periodicals, Tamil Nadu, India. She was also awarded the SP Dhall Distinguished Publication Award in Statistics (2015) by the Society for Advancement of Human and Nature, Himachal Pradesh, India; the Young Scientist Award (2015) by the Venus International Foundation, Chennai, India; and the Best Young Researcher Award (2015) by GRABS Educational Trust, Chennai, India. She has been an active member of the organizing committees of several national and international seminars, conferences, and summits. Dr. Homa acquired her BSc (Statistics Hons) and MSc (Statistics) degrees from Banaras Hindu University, Varanasi, Uttar Pradesh, India, and her PhD (Applied Statistics), with a specialization in sampling techniques, from the Indian Institute of Technology (Indian School of Mines), Dhanbad, Jharkhand, India. She has received several grants from various funding agencies to carry out her research projects. Her areas of specialization include sample surveys, population studies, and mathematical modeling.


Mukti Khetan, PhD Department of Mathematics, Amity University, Mumbai, Maharashtra, India Mukti Khetan, PhD, is currently working as an Assistant Professor in the Department of Mathematics, Amity School of Applied Sciences, Amity University, Mumbai, Maharashtra, India. She formerly worked in the Department of Mathematics at IIT Bombay, Maharashtra, and in the Department of Statistics at Sambalpur University, Sambalpur. She completed her PhD in Statistics from IIT(ISM), Dhanbad, and her MSc degree in Statistics from Banaras Hindu University, Varanasi. She has published several research papers and book chapters in reputed national and international journals. She has supervised three MPhil students and has more than four years of teaching and research experience.

Mohd. Arshad, PhD
Department of Mathematics, IIT Indore, Madhya Pradesh, India

Mohd. Arshad, PhD, is currently working as an Assistant Professor in the Department of Mathematics, Indian Institute of Technology Indore, India. He previously worked in the Department of Statistics and Operations Research at Aligarh Muslim University, India. After acquiring his MSc degree in Statistics (as a Gold Medalist) from C.S.J.M. University, Kanpur, he completed his PhD in Statistics at IIT Kanpur. He has published several research papers in reputed international journals. He has supervised two PhD students and is presently supervising two more. He is a member of the editorial boards of various journals and of scientific societies. He is the author of a book titled Solutions to IIT JAM for Mathematical Statistics. Dr. Arshad has been teaching undergraduate and postgraduate courses for more than seven years.


Pradeep Mishra, PhD
Department of Mathematics and Statistics, College of Agriculture, Jawaharlal Nehru Agricultural University, Jabalpur, Madhya Pradesh, India

Pradeep Mishra, PhD, is an Assistant Professor of Statistics at the College of Agriculture of Jawaharlal Nehru Agricultural University, Jabalpur, Madhya Pradesh, India. Dr. Mishra formerly served as a data management specialist at a private multinational company for several years. He specializes in time series, design of experiments, and agricultural statistics. He has published more than 45 research articles in international and national journals. He has won several awards, including a Young Scientist Award in 2017 at the international conference Global Research Initiatives for Sustainable Agriculture and Allied Sciences (GRISAAS-2017), Best Doctoral Degree at the same meeting in 2018, and an award for a paper from the Society of Economics and Development in 2018, among other honors. Dr. Mishra earned his BSc in Agriculture from the College of Agriculture, Bilaspur, affiliated with the Indira Gandhi Krishi Vishwavidyalaya, Raipur (C.G.); his MSc in Agricultural Statistics from the College of Agriculture, Jawaharlal Nehru Agricultural University; and his PhD in Agricultural Statistics, specializing in the modeling and forecasting of food crops in India and their yield sustainability. During his PhD, he was selected for an INSPIRE Fellowship by the Department of Science and Technology of the Government of India for his valuable research work.

Contents

Contributors ......................................................................... xi
Abbreviations ...................................................................... xiii
Preface ................................................................................ xv

1. Descriptive Statistics: Principles and Applications .................... 1
   Fozia Homa, Mohd. Arshad, Vijay Kumar, Shubham Thakur, and Pradeep Mishra

2. Random Variables and Their Properties .................................. 25
   Vijay Kumar, Mukti Khetan, Mohd. Arshad, and Shweta Dixit

3. Discrete Distributions ...................................................... 51
   Mukti Khetan, Vijay Kumar, Mohd. Arshad, and Prashant Verma

4. Continuous Distributions .................................................. 107
   Shweta Dixit, Mukti Khetan, Mohd. Arshad, Prashant Verma, and Ashok Kumar Pathak

5. Family of Weibull Distributions ......................................... 151
   Mohd. Arshad, Vijay Kumar, Mukti Khetan, and Fozia Homa

6. Life Distributions .......................................................... 189
   Mohd. Arshad, Mukti Khetan, Vijay Kumar, and Fozia Homa

7. Dynamics of Data Analysis and Visualization with Python ......... 209
   Prashant Verma, Shweta Dixit, Mukti Khetan, Suresh Badarla, and Fozia Homa

Index .................................................................................. 243

Contributors

Mohd. Arshad

Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India

Suresh Badarla

Department of Mathematics, Amity School of Applied Sciences, Amity University, Mumbai, Maharashtra, India

Shweta Dixit

Clinical Development Services Agency, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Haryana, India

Fozia Homa

Department of Statistics, Mathematics and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India

Mukti Khetan

Department of Mathematics, Amity School of Applied Sciences, Amity University, Mumbai, Maharashtra, India

Vijay Kumar

Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India

Pradeep Mishra

College of Agriculture, Rewa, JNKVV, Jabalpur, Madhya Pradesh, India

Ashok Kumar Pathak

Department of Mathematics and Statistics, School of Basic and Applied Sciences, Central University of Punjab, Bathinda, Punjab, India

Shubham Thakur

Department of Statistics, Mathematics and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India

Prashant Verma

Department of Statistics, Faculty of Science, University of Allahabad, Prayagraj, India

Abbreviations

AUC     area under curve
CDF     cumulative distribution function
CF      characteristic function
CGF     cumulant generating function
CPI     cumulative performance index
FN      false negative
FP      false positive
GSK     GlaxoSmithKline
MGF     moment generating function
MTTF    mean time to failure
PDF     probability density function
PGF     probability generating function
PMF     probability mass function
ROC     receiver operating characteristic
RV      random variable
TN      true negative
TP      true positive

Preface

The book is written so as to require the fewest possible prerequisites of its readers: anybody with a basic graduate-level understanding of statistics will benefit from it. The book's ultimate aim is to provide an understanding of distribution theory and data analysis through statistical software. It will prove helpful to readers in solving problems related to basic statistics, probability models, and simulation using statistical software. It also explains statistical data analysis for multiple real-life situations using the open-source software R (version 4.0) and Python (3.0+). A detailed study of statistical models is provided, with examples related to health, agriculture, insurance, etc. Each chapter helps increase the reader's knowledge, progressing from basic to advanced statistics.

This book comprises seven chapters. Chapter 1 introduces statistics to the reader, including its definition, scope, utilization, and importance in day-to-day life. This chapter further discusses methods for graphical representation and summary statistics with the help of numerous examples related to real-life situations.

Chapter 2 establishes the concept of random variables (RVs) and their probability distributions, the classification of RVs based on the distribution function, and the shapes of the distributions. It also covers several statistical properties along with the associated theorems, describing these properties in depth with suitable examples.

Chapter 3 presents several discrete probability models and their properties, along with detailed derivations. All the mentioned distributions are explained with their importance in real-life situations, together with simulation methods using the statistical software R.

Chapter 4 covers the concepts of continuous distributions and the derivation of their properties. Further, the generation of random samples from the mentioned continuous distributions using R is introduced so that readers gain practical insight.

Chapter 5 introduces the family of Weibull distributions with its different cases, along with characteristics such as the reliability function, hazard function, mean time to failure (MTTF), etc. Some modifications of the Weibull family are also provided. This chapter also discusses the process of sample generation from the Weibull family of distributions using R.

Chapter 6 presents a few life distributions, such as the Pareto, generalized Pareto, and Burr distributions, along with their detailed properties. These properties are discussed with various graphs generated in R for better understanding.

Chapter 7 introduces data analysis through the freely available Python language. The chapter is written from the perspective of statistical analysis rather than statistical computing, and it will help readers quickly start their data analysis journey using advanced Python packages. An exploratory data analysis of the open-source Titanic data explains all aspects of data analysis, from data cleaning to data interpretation with statistical modeling.

CHAPTER 1

Descriptive Statistics: Principles and Applications

FOZIA HOMA,¹ MOHD. ARSHAD,² VIJAY KUMAR,³ SHUBHAM THAKUR,¹ and PRADEEP MISHRA⁴

¹ Department of Statistics, Mathematics and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India
² Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India
³ Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India
⁴ College of Agriculture, Rewa, JNKVV, Jabalpur, Madhya Pradesh, India

1.1 INTRODUCTION

Statistics has been described or defined differently from time to time by many authors. In ancient times, statistics was limited to the affairs of the state, but its use has since widened considerably, and the old definitions have been replaced by newer, more detailed, and exhaustive ones. The most detailed and exhaustive definition was given by Prof. Horace Secrist, who defined statistics as an "aggregate of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."

Among the traditional definitions, Dr. A. L. Bowley defined statistics as "the science of counting." This definition is inadequate because, in the case of large numbers, we do not count but estimate. In another definition, he said, "Statistics is the science of measurement of social organism regarded as a whole in all its manifestations." This definition is vague, since the nature of statistics is not specified and its applications are limited to the social aspect only, i.e., to humans and their activities. Other definitions given by different authors include:

• "Statistics is the science of estimates and probabilities." –Boddington
• "Statistics may be regarded as a body of methods for making wise decisions in the face of uncertainty." –Wallis and Roberts
• "Statistics is defined as the science of collection, presentation, analysis, and interpretation of numerical data." –Croxton and Cowden
• "Statistics is both a science and an art." –L. H. C. Tippet

1.1.1 USES OF STATISTICS

Statistics can be used in almost every aspect of life: in education, sports, and health; by the government in decision-making; and in data science, robotics, business, weather forecasting, and much more. In day-to-day life, statistics can help us work out the right time to leave for the office, school, or college, or how much money is needed for monthly expenses or commuting in a month. In education, it is used to compare the marks of students or to check their performance. In health, it is used to calculate how many people are affected by a particular disease or the average number of accidents in an area per year. In sports, it is used to track a player's performance over time and compare it with others in a particular game. Statistics is also used in machine learning, deep learning, aerospace, and many other fields. Nowadays, several advanced statistical techniques have been developed for big data analytics.


1.1.2 STATISTICS WITH OTHER DISCIPLINES

Statistics is related to almost every discipline; its utility is so wide that every sphere of life is affected by it in one way or another. Earlier it was limited to the affairs of the state, but with time statistics has diversified into various disciplines: physical and natural sciences, social sciences, economics, business, state and administration, planning, biology, agriculture, astronomy, psychology, and many more. Statistics is mainly applied to quantitative data, but it can also be used with qualitative data; for example, attributes such as deaths, defectives, new, and old can be transformed into the number of deaths, the number of defectives, etc., so that we can quantify them and apply statistics.

1.1.3 ABUSES OF STATISTICS

Statistics is very useful in every aspect of life, but, as with anything good, it can also be misused. One can use the knowledge of statistics for one's own benefit by manipulating data or using sneaky tricks to mislead the naïve. Common abuses include: bad data collection (collecting irrelevant samples, small samples, biased samples, non-random sampling methods, and using a proxy to replace data that were lost or unavailable); poor application of statistical methods (wrong formulae, poor mathematics, use of the wrong statistic, misreporting of errors); loaded questions (manipulating the wording of a question to elicit a certain answer from the respondent); and misleading graphs and pictures, or ignoring important facts and features.

1.2 PRESENTATION OF DATA

The first step in statistical analysis is to collect the data. Data are generally collected in a raw format that is difficult to understand, so after organizing the data, the main question is how to present them in a way that is easily understandable to both the author and the readers, suitable for further analysis, and suitable for drawing conclusions. Data are presented mainly in the following ways:


1. Textual Presentation: Data are presented in the form of text, and one can reach conclusions by reading the collected data. It is suitable when the quantity of data is not very large. This method requires a long time to reach conclusions, since the whole text must be read, and it is not suitable for quick decision-making.

2. Tabular Presentation: It is the orderly and systematic arrangement of data in rows and columns. The main components of a table are:
   i. Table number;
   ii. Title;
   iii. Headnote;
   iv. Column headings;
   v. Body;
   vi. Stub heading;
   vii. Row headings;
   viii. Footnote;
   ix. Source.
A typical layout is sketched below:

   Table No. ______   Title ______
             Headnote ______
   Stub Heading | Column Headings
   Row Headings | Body
   ______ Footnote ______
   ______ Source ______

3. Graphical or Diagrammatic Presentations: Data shown in tables can be presented in a better way by diagrams and graphs. A diagram is a symbolic or pictorial representation of data; graphs are modifications of diagrams in which the mathematical relation between variables can be compared. A graph has two axes: the horizontal x-axis (abscissa) and the vertical y-axis (ordinate). Some standard presentations are:
   i. Pie Chart: A pie chart (or circle chart) is a circle divided into component sectors with areas proportional to the size of the corresponding components (Figure 1.1). Pie charts are widely used in business and media for comparing different data sets. The following are the key points for a pie chart:
      a. A pie chart presents the data in a circular graph;
      b. Pie charts are frequently used to represent real-world information;
      c. Pie charts are suitable for discrete (qualitative) data;
      d. The complete pie chart represents the whole data, and its slices represent parts of the whole.


Example 1.1: The following data represent the confirmed COVID-19 cases of the top five states in India up to April 22, 2021. The data were collected from www.mygov.in.

State            Total Confirmed Cases    Percentage (approx.)
Maharashtra      4,094,840                50
Kerala           1,322,054                16
Karnataka        1,247,997                15
Uttar Pradesh    976,765                  12
Chhattisgarh     605,568                  7
Total            8,247,224                100

FIGURE 1.1  Pie chart.
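A chart like Figure 1.1 can be reproduced with base R's pie() function. The following is a minimal sketch using the data of Example 1.1; the labels and title are illustrative choices, not taken from the book.

    # Pie chart of the top five states' confirmed cases (Example 1.1)
    cases <- c(Maharashtra = 4094840, Kerala = 1322054, Karnataka = 1247997,
               `Uttar Pradesh` = 976765, Chhattisgarh = 605568)
    pct <- round(100 * cases / sum(cases))        # percentage share of each state
    pie(cases, labels = paste0(names(cases), " (", pct, "%)"),
        main = "Confirmed COVID-19 cases, top five states")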

ii. Line Diagram: It is a chart that shows a line joining several points and is useful for representing time-series data. Suppose (t, y_t) denotes time-series data, where y_t is the value of the variable at time t. We plot each pair in a two-dimensional plane and then join the successive points by line segments; the resulting chart is a line diagram (Figure 1.2).

Example 1.2: The following data represent cumulative COVID-19 deaths from 09/04/2021 to 23/04/2021. The data were picked up from the Ministry of Health and Family Welfare, India.


FIGURE 1.2  Line chart.
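A line diagram of this kind is drawn in base R by plotting the (t, y_t) pairs and joining them with line segments. The sketch below is illustrative only: the actual daily death counts are not reproduced in the text, so the series y here is hypothetical.

    # Line diagram of a cumulative time series (shape of Figure 1.2)
    t <- as.Date("2021-04-09") + 0:14             # 09 to 23 April 2021
    y <- cumsum(c(700, 720, 750, 790, 840, 880, 900, 950, 1030, 1100,
                  1180, 1260, 1340, 1420, 1500))  # hypothetical cumulative series
    plot(t, y, type = "l", xlab = "Date", ylab = "Cumulative deaths")
    points(t, y, pch = 16)                        # mark the plotted (t, y_t) pairs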

iii. Bar Graph: It displays the distribution of a qualitative (categorical) variable, showing the counts for each category next to each other for easy comparison. Bar graphs should have small spaces between the bars to indicate that these are freestanding bars that could be rearranged into any order (Figures 1.3 and 1.4). The following are the key points for a bar graph:
   a. A bar graph demonstrates data using bars whose lengths are proportional to the values they represent;
   b. The bars present a visual display for assessing quantities in different categories;
   c. The bars are lined up along a common base;
   d. Each bar should have equal width;
   e. Bar graphs are useful in data handling.

Example 1.3: The following data represent the COVID-19 cases of India. The data were taken from https://en.wikipedia.org/wiki/COVID-19_pandemic_in_India.

COVID-19 Dashboard (as of April 21, 2021)
Total positive cases     15,924,732
New samples tested       1,639,357
Total active cases       2,290,689
Total recovered cases    13,449,371


FIGURE 1.3  Simple bar chart.

COVID-19 Pandemic in India by State (as of 23 April 2021)
State/Union Territory    Cases        Recoveries    Active
Karnataka                1,247,997    1,037,857     196,255
Kerala                   1,322,054    1,160,472     156,554
Uttar Pradesh            976,765      706,414       259,810
West Bengal              700,904      621,340       68,798
Maharashtra              4,094,840    3,330,747     701,614

FIGURE 1.4  Multiple bar chart.
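A multiple bar chart like Figure 1.4 can be sketched in base R with barplot() using the state table above; the layout choices (legend placement, rotated labels) are illustrative, not the book's.

    # Multiple bar chart of cases, recoveries, and active cases by state
    states <- c("Karnataka", "Kerala", "Uttar Pradesh", "West Bengal", "Maharashtra")
    counts <- rbind(
      Cases      = c(1247997, 1322054, 976765, 700904, 4094840),
      Recoveries = c(1037857, 1160472, 706414, 621340, 3330747),
      Active     = c(196255, 156554, 259810, 68798, 701614)
    )
    colnames(counts) <- states
    barplot(counts, beside = TRUE, las = 2,
            legend.text = rownames(counts),
            main = "COVID-19 pandemic in India by state")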


The data were taken from: https://en.wikipedia.org/wiki/Template:COVID19_pandemic_data.

iv. Histogram: It is a graphical presentation of data in which the data are grouped into continuous number ranges and each range corresponds to a vertical bar. First, we discuss the grouping of data using the concept of frequency distributions (Figure 1.5).
• Frequency Distributions: A frequency distribution is the distribution of the total number of observations into different classes; it shows all the distinct observations of interest and the number of times they occur (frequency of occurrence). It is generally of two types:
   o Discrete Frequency Distribution: Suppose we have an individual series with a finite number of observations. When the number of observations is large and the range of the data is small, we convert the individual series into a discrete frequency distribution.
   o Continuous Frequency Distribution: When the number of observations is large and the range of the data is also large, we convert the data into class limits (the lowest and highest values that can be included in a class) with some class interval (the difference between the upper and lower limits); such a frequency distribution is called a continuous frequency distribution. The number of classes can be calculated by K = 1 + 3.322 log N (base 10), where N is the total frequency (number of observations).
A graphical representation of a frequency distribution can take the form of a histogram or a frequency polygon. In the case of a continuous frequency distribution, we take the class intervals along the x-axis and the frequencies on the y-axis. On each class interval, we erect a rectangle with height equal to the frequency of that class; this diagram of contiguous rectangles is known as a histogram. If the frequency distribution is not continuous, a histogram is not an appropriate diagram to represent the data; alternative diagrams, such as bar diagrams, can be drawn in such cases. We cannot construct a histogram for a distribution with open-end classes.


Example 1.4: The following data represent the glucose blood level (mg/100 ml) after a 12-hour fast for a random sample of 70 women. (The data were picked up from American J. Clin. Nutr., Vol. 19, 345–351.)

Glucose    Frequency
45–55      3
55–65      6
65–75      18
75–85      29
85–95      9
95–105     4
105–115    1

FIGURE 1.5  Histogram.
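A histogram like Figure 1.5 can be sketched in R from the grouped frequencies of Example 1.4. Since the raw measurements are not given, the sketch below replicates each class mid-value according to its frequency, which reproduces the same bars.

    # Histogram from grouped data (Example 1.4)
    breaks <- seq(45, 115, by = 10)      # class limits: 45-55, ..., 105-115
    freq   <- c(3, 6, 18, 29, 9, 4, 1)   # class frequencies (N = 70)
    mids   <- head(breaks, -1) + 5       # class mid-values 50, 60, ..., 110
    x      <- rep(mids, times = freq)    # pseudo-data, one value per observation
    hist(x, breaks = breaks, main = "Glucose blood level (mg/100 ml)",
         xlab = "Glucose", ylab = "Frequency")
    1 + 3.322 * log10(70)                # about 7.1 classes by K = 1 + 3.322 log N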

v. Frequency Polygon: In the case of ungrouped data the frequency polygon can be obtained by plotting the variate values on the x-axis and their corresponding frequency on the y-axis and by joining the points by straight lines. In the case of grouped data with equal class intervals on the x-axis, we take mid values of the class, and on the y-axis take frequency and join the midpoints using straight lines (Figure 1.6).


Example 1.5: The following frequency polygon is prepared using the data given in Table 1.1 (glucose blood level (mg/100 ml) after a 12-hour fast).

FIGURE 1.6  Frequency polygon.

vi. Cumulative Frequency Curve or Ogive: First, we discuss the preparation of a cumulative frequency distribution. Cumulative frequency is the running total of frequencies up to a certain point (or class), and the distribution of cumulative frequencies is known as a cumulative frequency distribution. It can be graphically represented by a cumulative frequency curve or ogive. Cumulative frequency is of two types:
   a. Less than cumulative frequency: the sum of the frequency of a class and the frequencies of all preceding classes; and
   b. More than cumulative frequency: the sum of the frequency of a class and the frequencies of all classes below it.
The cumulative frequency curve (or ogive) is obtained by plotting the upper or lower limits of the class intervals on the x-axis and the corresponding cumulative frequencies on the y-axis. There are two methods of constructing an ogive:
• Less Than Type: If the grouping is not continuous, we first convert it into a continuous grouping by subtracting d/2 from the lower limit of each class and adding d/2 to the upper limit, where d is the jump from the upper limit of the preceding class to the lower limit of the next class. Then, plotting the upper limit of each class on the x-axis against the corresponding less-than cumulative frequency on the y-axis, we get a set of points; joining these points freehand gives the less than type ogive (Figure 1.7).

FIGURE 1.7  Less than type ogive.

• More Than Type: Here we take the lower limits of the classes on the x-axis and the corresponding more-than cumulative frequencies on the y-axis; joining the points gives the more than type ogive (Figure 1.8).

FIGURE 1.8  More than type ogive.


From the ogives (less than and more than types), we can find the value of the median. If we drop a perpendicular to the x-axis from the point of intersection of the less than type and more than type ogives, the foot of the perpendicular is the value of the median (Figure 1.9).

FIGURE 1.9  Ogive.

Example 1.6: The following ogives (less than and more than type) are prepared by using the data given in Table 1.1.

TABLE 1.1  Glucose Blood Level (mg/100 ml) After a 12-Hour Fast
Glucose    CF (Less than)    CF (More than)
45–55      3                 70
55–65      9                 67
65–75      27                61
75–85      56                43
85–95      65                14
95–105     69                5
105–115    70                1
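The cumulative frequencies of Table 1.1 follow mechanically from the class frequencies of Example 1.4; a minimal R sketch:

    # Cumulative frequencies of Table 1.1 from the class frequencies
    freq         <- c(3, 6, 18, 29, 9, 4, 1)
    cf_less_than <- cumsum(freq)               # 3 9 27 56 65 69 70
    cf_more_than <- rev(cumsum(rev(freq)))     # 70 67 61 43 14 5 1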

vii. Scatter Plot: It is a plot of dots representing the values of two variables and is used to investigate the relationship between them. The first variable is assigned to the horizontal axis and the second to the vertical axis. It helps to determine whether the relationship between the two variables is linear, and it also provides a way to detect outliers in the data set (Figures 1.10 and 1.11).

Example 1.7: Auto insurance in Sweden data. In the following data: X = number of claims; Y = total payment for all the claims, in thousands of Swedish Kronor, for geographical zones in Sweden.

FIGURE 1.10  Scatter plot.

FIGURE 1.11  Correlation analysis through scatter plot. Reference: Swedish Committee on Analysis of Risk Premium in Motor Insurance. Source: https://college.cengage.com.
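A scatter plot like Figure 1.10 takes one line of base R. The Swedish insurance data are not reproduced in the text, so the x and y vectors below are hypothetical stand-ins for (number of claims, total payment).

    # Scatter plot of two variables, with a fitted line to judge linearity
    x <- c(19, 13, 124, 40, 57, 23, 14, 45, 10, 5)    # hypothetical claim counts
    y <- c(46.2, 15.7, 422.2, 119.4, 170.9, 56.9, 77.5, 214.0, 65.3, 20.9)
    plot(x, y, xlab = "Number of claims",
         ylab = "Total payment (thousand Kronor)",
         main = "Auto insurance in Sweden")
    abline(lm(y ~ x))   # least-squares line through the cloud of points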


1.3 DESCRIPTIVE ANALYSIS

Descriptive analysis is a classical approach to data analysis. It is a very useful statistical tool for drawing sound conclusions about a real-life problem from data. It includes different measures of central tendency, dispersion, skewness, kurtosis, etc. These measures help us to analyze the data mathematically. Here, we discuss these measures in brief.

1.3.1 MEASURE OF CENTRAL TENDENCY

An average is a single value within the range of the data that is representative of the data. Since an average lies somewhere within the range of the data, it is also called a measure of central tendency. In statistics, central tendency represents the center or location of the distribution of the data. Central tendency allows researchers to summarize a large set of data in a single number, and it makes it possible to compare two (or more) data sets by simply comparing their measures of central tendency. A good measure of central tendency (or average) should be rigorously defined, should depend on all the observations, should be little influenced by extreme observations, should be capable of interpretation, and should be easy to use in further algebraic treatment. Table 1.2 lists different measures of central tendency.

TABLE 1.2  Measures of Central Tendency
Mathematical Average    Positional Average              Commercial Average
Arithmetic mean         Median                          Moving average
Geometric mean          Mode                            Progressive average
Harmonic mean           Quartile, decile, percentile    Composite average

1. Arithmetic Mean: The mean of a set of observations is the sum of the observations divided by the total number of observations. Let x₁, x₂, …, xₙ be the observations in an experiment; then the arithmetic mean (or mean) of these observations is given by:

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i


In the case of a frequency distribution, the arithmetic mean of the data is defined as:

\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}

where fᵢ denotes the frequency of the observation xᵢ, i = 1, 2, …, n. The detailed calculation is given in Table 1.3.

Example 1.8: Calculate the arithmetic mean for the data on glucose blood level (mg/100 ml) given in Tables 1.1 and 1.3.

TABLE 1.3  Calculation of Arithmetic Mean
Glucose    Frequency (f)    Mid-Value (x)    fx
45–55      3                50               150
55–65      6                60               360
65–75      18               70               1,260
75–85      29               80               2,320
85–95      9                90               810
95–105     4                100              400
105–115    1                110              110
Total      70               –                5,410

\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{5410}{70} = 77.29
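The grouped mean of Table 1.3 can be checked in R; weighted.mean() with the class frequencies as weights performs exactly the Σfᵢxᵢ/Σfᵢ computation:

    # Grouped arithmetic mean of Table 1.3
    f <- c(3, 6, 18, 29, 9, 4, 1)            # class frequencies
    x <- c(50, 60, 70, 80, 90, 100, 110)     # class mid-values
    weighted.mean(x, w = f)                  # 77.28571, i.e., about 77.29
    sum(f * x) / sum(f)                      # the same computation written out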

2. Geometric Mean: It is the n-th root of the product of n values of a set of observations. It is appropriate for averaging ratios of changes, for averages of proportions, etc. Let x₁, x₂, …, xₙ be the observations in an experiment; then the geometric mean of these observations is given by:

GM = (x_1 \cdot x_2 \cdots x_n)^{1/n}

Taking logarithms on both sides and simplifying, the geometric mean can also be written as:

GM = \mathrm{Antilog}\left[ \frac{1}{n} \left( \log x_1 + \log x_2 + \cdots + \log x_n \right) \right]

In the case of a frequency distribution, the geometric mean of the data is defined as:

GM = \mathrm{Antilog}\left[ \frac{1}{\sum_{i=1}^{n} f_i} \left( f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n \right) \right]

where fᵢ denotes the frequency of the observation xᵢ, i = 1, 2, …, n.

Example 1.9: Consider a stock that grows by 15% in the first year, declines by 10% in the second year, and then grows by 20% in the third year. The geometric mean of the growth rate is calculated as follows:

GM = \left( (1 + 0.15)(1 - 0.1)(1 + 0.2) \right)^{1/3} = 1.075 = 1 + 0.075

i.e., a growth rate of 7.5% annually.
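The geometric mean of the stock-growth example can be checked in R:

    # Geometric mean of the yearly growth factors
    growth <- c(1.15, 0.90, 1.20)
    gm <- prod(growth)^(1 / length(growth))  # about 1.075
    gm - 1                                   # about 0.075, i.e., 7.5% per year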

3. Harmonic Mean: It is the inverse of the arithmetic mean of the reciprocals of the observations. It is appropriate for rates of change per unit time, such as speed, the number of items produced per day, etc. Let x₁, x₂, …, xₙ be the given observations; then the harmonic mean of these observations is given by:

HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}

In the case of a frequency distribution, the harmonic mean of the data is defined as:

HM = \frac{f_1 + f_2 + \cdots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}}

where fᵢ denotes the frequency of the observation xᵢ, i = 1, 2, …, n.

Example 1.10: If a family purchases milk at Rs. 50, Rs. 48, and Rs. 45 per liter during October, November, and December, respectively, then the average price of the milk over these three months is:

HM = \frac{3}{\frac{1}{50} + \frac{1}{48} + \frac{1}{45}} = 47.58
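Likewise, the harmonic mean of the milk-price example in R:

    # Harmonic mean of the three monthly prices
    prices <- c(50, 48, 45)
    length(prices) / sum(1 / prices)   # 47.58 (Rs. per liter)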


4. Median: It is the value that divides the whole data into two equal parts: half of the values are less than the median and half are greater. It is a positional average and is not affected by extremely large or small values in the data. It can be computed for ratio, interval, and ordinal data, but it is not appropriate for nominal data. If the distribution is skewed, the median is a better measure than the arithmetic mean. However, it is not capable of further algebraic treatment; e.g., if the medians of two or more data sets are given, the combined median cannot be calculated, whereas in such cases the combined mean can be. Let x₁, x₂, …, xₙ be the given observations; then the median can be calculated as follows:
• First, arrange the observations in increasing (or decreasing) order of magnitude, i.e., let x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍ₙ₎ be the ordered values of x₁, x₂, …, xₙ.
• If the sample size n is an odd number, then the ((n + 1)/2)-th observation in the ordered data is the median.
• If the sample size n is an even number, then the median is the average of the (n/2)-th and (n/2 + 1)-th observations in the ordered data.

Example 1.11: Find the median of the following data:
a. 4, 7, 2, 8, 9, 10, 8. First, arrange the data in increasing order: 2, 4, 7, 8, 8, 9, 10. The sample size n = 7 is odd, so the ((7 + 1)/2) = 4th observation, 8, is the median.
b. 28, 30, 10, 12, 3, 6, 35, 38, 17, 40. First, arrange the data in increasing order: 3, 6, 10, 12, 17, 28, 30, 35, 38, 40. The sample size n = 10 is even. Here, the (10/2) = 5th observation is 17, and the (10/2 + 1) = 6th observation is 28. Thus, the median of the given data is (17 + 28)/2 = 22.5.
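Both parts of the median example can be checked with R's built-in median():

    median(c(4, 7, 2, 8, 9, 10, 8))                    # 8
    median(c(28, 30, 10, 12, 3, 6, 35, 38, 17, 40))    # 22.5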

5. Mode: It is the value that occurs most frequently, i.e., has the maximum frequency. It is appropriate for any measurement level: nominal, ordinal, interval, or ratio. It is not necessarily unique.


A distribution with a single mode is said to be unimodal. If two or more values occur the same number of times, then there will be two or more modes. A distribution with more than one mode is said to be bimodal, trimodal, etc., or in general, multi-modal. If no value is repeated, no mode is defined (Figure 1.12).

FIGURE 1.12  Unimodal and bimodal distributions.

Example 1.12: Gestational ages (weeks) of 14 newborns: 32, 36, 39, 39, 42, 35, 40, 34, 39, 41, 33, 38, 37, 39. There are four observations with a gestational age of 39 weeks; the rest of the values occur only once. Thus, the mode of the given data is 39 weeks.

6. Quartiles: The three points that divide the data into four equal parts are called quartiles. The first, second, and third points are known as the first, second, and third quartiles, respectively. The idea is a generalization of the median. The first quartile Q₁ is the value that exceeds 25% of the observations and is exceeded by 75% of the observations. The second quartile Q₂ coincides with the median. The third quartile Q₃ is the point that has 75% of the observations before it and 25% after it. The 5-number summary of data reports its median, quartiles, and extreme observations, i.e., the maximum and minimum observations.

7. Deciles: The nine points that divide the data into 10 equal parts are called deciles. For example, the seventh decile D₇ has 70% of the observations before it and 30% after it. The fifth decile D₅ coincides with the median.

8. Percentiles: The 99 points that divide the data into 100 equal parts are called percentiles. For example, the 47th percentile P₄₇ has 47% of the observations before it and 53% after it. The 50th percentile P₅₀ coincides with the median. Percentiles indicate the percentage of scores that fall below a particular value; they tell us where a score stands relative to other scores.

1.3.2 MEASURE OF DISPERSION

Dispersion measures the scatter and variability of the mass of figures in given data about its average. There are two main types of measures of dispersion:

1. Absolute Measures of Dispersion:
   i. Range: It is the difference between the maximum and minimum values in the given data. Let x₁, x₂, …, xₙ be the given data, and let x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍ₙ₎ be the ordered values. Then the range is defined as:

\mathrm{Range} = x_{(n)} - x_{(1)}

   ii. Quartile Deviation: It is half of the interquartile range, i.e.,

QD = \frac{Q_3 - Q_1}{2}

where Q₃ − Q₁ is the interquartile range, and Q₁ and Q₃ are the first and third quartiles.
   iii. Mean Deviation: It is the arithmetic mean of the absolute deviations of the observations from an average value (mean, median, or mode). Let x₁, x₂, …, xₙ be the given observations; then the mean deviation about an average A (mean, median, or mode) is given by:

MD(A) = \frac{1}{n} \sum_{i=1}^{n} |x_i - A|

   iv. Variance and Standard Deviation: The variance measures the spread between the observations in the data set. It is defined as the average of the squared deviations of the observations from their mean value. Let x₁, x₂, …, xₙ be the given observations; then the variance, denoted by σ², is given by:

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

where x̄ denotes the mean of the data set. The positive square root of the variance is called the standard deviation, denoted by σ. Note that the above formula for the variance is an estimator of the population variance and has some undesirable properties. To overcome these, we may define the sample variance as:

\text{Corrected Variance} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2

This corrected variance is a useful measure of variability in the analysis of real-life data.
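As a quick illustration, the absolute measures of dispersion above can be computed in R. The sketch below uses the data from part (b) of the median example; note that quantile() implements several quartile conventions, so QD may differ slightly from a hand computation, and var()/sd() use the corrected (n − 1) divisor.

    # Absolute measures of dispersion for one data set
    x <- c(3, 6, 10, 12, 17, 28, 30, 35, 38, 40)
    diff(range(x))                      # range: max - min
    q <- quantile(x, c(0.25, 0.75))     # first and third quartiles
    unname((q[2] - q[1]) / 2)           # quartile deviation
    mean(abs(x - median(x)))            # mean deviation about the median
    var(x)                              # corrected (n - 1) variance
    sd(x)                               # standard deviation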

2. Relative Measures of Dispersion: When two or more series that differ in their units are to be compared, relative dispersion is calculated to make the series unit-free. The coefficient of dispersion is a measure of relative dispersion.
• Coefficient of dispersion based on the range:

COD(\mathrm{range}) = \frac{L - S}{L + S}

where L is the maximum observation and S is the minimum observation.
• Coefficient of dispersion based on the quartile deviation:

COD(QD) = \frac{Q_3 - Q_1}{Q_3 + Q_1}

where Q₁ is the first quartile and Q₃ is the third quartile.
• Coefficient of dispersion based on the mean deviation:

COD(MD) = \frac{\text{Mean Deviation}}{\text{Average value (mean, median, or mode)}}

• Coefficient of dispersion based on the standard deviation:

CD(\sigma) = \frac{\sigma}{\bar{X}}

where σ is the standard deviation and X̄ is the mean.

• Coefficient of variation: 100 times the coefficient of dispersion based on the standard deviation is known as the coefficient of variation; it measures the percent variation:

CV = \frac{\sigma}{\bar{X}} \times 100

The series with the smaller CV is more consistent and has less variation than a series with a higher CV.

1.3.3 MEASURE OF SKEWNESS

Skewness is a measure of the lack of symmetry in the shape of a distribution. If the mean, median, and mode are not the same, the distribution is said to be skewed, and the quartiles are not equidistant from the median. Skewness is of two types (Figure 1.13):
i. Positively Skewed: A series is said to be positively skewed if mean > median > mode and the longer tail of the distribution lies toward the right-hand side.
ii. Negatively Skewed: A series is said to be negatively skewed if mode > median > mean and the longer tail of the distribution lies toward the left-hand side.

We may also identify the skewness of the distribution of the data by using the third sample moment. Let x₁, x₂, …, xₙ be the given observations; then the third sample moment about the mean is defined as:

\mu_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3

where x̄ denotes the mean of the data set. If μ₃ = 0, the distribution of the data is symmetric; otherwise, it is skewed. If μ₃ > 0, the distribution is positively skewed, and if μ₃ < 0, the distribution is negatively skewed.


FIGURE 1.13  Symmetric, positive skewed, and negative skewed.
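The sign of the third sample moment can be computed directly in R; a minimal sketch using the data from part (b) of the median example:

    # Third sample moment about the mean as a skewness indicator
    x  <- c(3, 6, 10, 12, 17, 28, 30, 35, 38, 40)
    m3 <- mean((x - mean(x))^3)    # third sample moment about the mean
    sign(m3)                       # > 0: positively skewed; < 0: negatively skewed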

1.3.4 MEASURE OF KURTOSIS

Kurtosis measures the flatness or peakedness of a distribution and tells whether the distribution is flatter or more peaked than the normal. It also provides a way to identify data that are heavy-tailed or light-tailed relative to a normal distribution. The curve of the normal distribution is called mesokurtic. A data distribution with greater kurtosis than the normal is called leptokurtic; similarly, a data distribution with smaller kurtosis than the normal is called platykurtic. Let x₁, x₂, …, xₙ be the given observations; then the second sample moment about the mean (the variance) and the fourth sample moment about the mean are, respectively, defined as:

\mu_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

and

\mu_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4

where x̄ denotes the mean of the data set. Now, the coefficient of kurtosis is defined by (Figure 1.14):

\beta = \frac{\mu_4}{\mu_2^2}

It is a number without a unit of measurement. If β = 3, the distribution of the data is mesokurtic (normal). If β > 3, the distribution is leptokurtic, and if β < 3, the distribution is platykurtic.

FIGURE 1.14  Kurtosis of the distribution.
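The coefficient β can be computed by hand in R; a minimal sketch on simulated normal data, for which β should be close to 3:

    # Moment coefficient of kurtosis
    x    <- rnorm(1000)              # simulated sample; beta near 3 expected
    m2   <- mean((x - mean(x))^2)    # second sample moment (variance, n divisor)
    m4   <- mean((x - mean(x))^4)    # fourth sample moment
    m4 / m2^2                        # > 3 leptokurtic, < 3 platykurtic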

KEYWORDS

• descriptive analysis
• frequency distributions
• histogram
• statistical analysis
• tabular presentation


REFERENCES

Gun, A. M., Gupta, M. K., & Dasgupta, B. (2013). Fundamentals of Statistics (Vol. 2). The World Press Pvt. Ltd.: Kolkata.
Gupta, S. C., & Kapoor, V. K. (1997). Fundamentals of Mathematical Statistics. Sultan Chand and Sons: New Delhi.
Heumann, C., Schomaker, M., & Shalabh (2016). Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R. Springer.
Miller, I., & Miller, M. (2004). John E. Freund's Mathematical Statistics with Applications. Prentice Hall: Upper Saddle River, NJ.
Ross, S. (2017). Introductory Statistics (4th edn.). Academic Press.

CHAPTER 2

Random Variables and Their Properties

VIJAY KUMAR,¹ MUKTI KHETAN,² MOHD. ARSHAD,³ and SHWETA DIXIT⁴

¹ Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India
² Department of Mathematics, Amity School of Applied Sciences, Amity University, Mumbai, Maharashtra, India
³ Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India
⁴ Clinical Development Services Agency, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Haryana, India

2.1 INTRODUCTION In our daily life, we often face experiments having outcomes with uncertainty. These experiments are called random experiments. The set of all possible outcomes in a random experiment is called the sample space. We use the symbol S for sample space throughout this chapter. Each element (or member) of S has some probability of occurrence. For example, if we appear in a competitive examination, the outcome may be “pass” or “fail” in that examination. Both outcomes (pass or fail) have some probability of occurrence. Let λ1 and λ2 denote the outcomes pass and fail, respectively.


Then, the sample space is S = {λ₁, λ₂}. The subsets of S are called events. In this example, the class of events is {φ, {λ₁}, {λ₂}, S}. If the probability of the event {λ₁} (i.e., the result is pass) is 0.65, then the probability of the event {λ₂} is 0.35 (since {λ₂} is the complementary event of {λ₁}). Sometimes S is large and not all subsets of S are important to the experimenter. In such cases, the experimenter collects only the important subsets of S. This class of collected subsets may be extended further to a class F that contains the subsets of S and satisfies the following properties:
• S ∈ F;
• K ∈ F ⇒ Kᶜ = S − K ∈ F;
• K_i ∈ F, i = 1, 2, … ⇒ \bigcup_{i=1}^{\infty} K_i ∈ F.

Then, the class F is called a sigma-field (or σ-field) of subsets of S. The smallest sigma-field is F₁ = {φ, S}, and the largest sigma-field is the power set of S. Note that the probability measure P is a set function defined on the sigma-field of subsets of S, i.e., P: F → [0, 1] is a set function such that:
• P(K) ≥ 0, ∀ K ∈ F;
• P\left( \bigcup_{i=1}^{\infty} K_i \right) = \sum_{i=1}^{\infty} P(K_i) for any sequence of disjoint events {K₁, K₂, …} in F;
• P(S) = 1.

The above definition is called the axiomatic (or modern) definition of probability. It was given by Kolmogorov in his fundamental work on probability theory, "Foundations of the Calculus of Probabilities," published in German in 1933. A Russian version of Kolmogorov's work appeared in 1936, and the first English version was published in 1950 as "Foundations of the Theory of Probability." Kolmogorov's approach is more practical and useful than the earlier formulations of probability theory. It is well known that probability theory had its origin in gambling and games of chance, although the first mathematical formulation of probability theory was given by Laplace in 1812. Laplace developed the classical theory of probability, which applies to events that can occur only in a finite number of ways; according to the classical definition, the computation of the probability of events reduces to combinatorial counting problems. For a more detailed theory of probability and its applications, readers can see the books by Robert B. Ash, and by Robert B. Ash and C. D. Dade. Now, in the next section, we will discuss random variables (RVs).

2.2 RANDOM VARIABLES (RVS)

Most of the time, the sample space S in a random experiment contains non-numeric outcomes, and further analysis of these outcomes is difficult. To overcome this difficulty, we may transform the given S into a subset of the set of real numbers through a function. This idea motivates the definition of a random variable (RV), which is a function on S that satisfies certain conditions. Recall that F denotes the sigma-field of subsets of S in a random experiment, and P denotes the probability measure defined on F. The triplet (S, F, P) is called the probability space.

Definition 2.1: Let (S, F, P) be a probability space. The function X: S → R is said to be a RV if:

X^{-1}((-\infty, x]) = \{\omega \in S : X(\omega) \le x\} \in F, \quad \forall x \in R,

where R is the set of real numbers.

Example 2.1: A student appears in an examination with five grades: A, B, C, D, and F. Each grade represents some points for calculating the cumulative performance index (CPI) of the student. Table 2.1 represents the grading points of the different grades.

TABLE 2.1  Grading Scheme
Grade     A    B    C    D    F
Points    10   8    6    4    0

Here the sample space is S = {A, B, C, D, F}, and the sigma-field is F = P(S), the power set of S. The RV X: S → R is given by:

X(A) = 10, X(B) = 8, X(C) = 6, X(D) = 4, X(F) = 0.

This RV X represents the points of the grade awarded by the institute. Now, we will define a probability function associated with the RV X. Recall that X is a real-valued function and its range set S_X (say) is a subset of R. Since (S, F, P) is a probability space and X is a RV defined on S, it follows that there exist a class B (called the Borel field) of subsets of R and a probability function P_X (called the induced probability measure) induced by the RV X. The induced probability function P_X is defined as:

P_X(B) = P(X^{-1}(B)), \quad B \in \mathcal{B}.

If we take B = (−∞, x], then the induced probability function is:

P_X(\{X \le x\}) = P_X((-\infty, x]) = P(X^{-1}((-\infty, x])), \quad \forall x \in R.

Clearly, P_X({X ≤ x}) is defined for each x ∈ R if P is a probability measure defined on the sigma-field. It has many desirable properties and is useful for studying the various properties of the associated RV X. Now, we define the cumulative distribution function (CDF) of the RV X.

Definition 2.2: Let X be a given RV with induced probability function P_X. Then, the function F: R → R defined by:

F(x) = P_X(\{X \le x\}), \quad \forall x \in R,

is called the CDF (or distribution function) of RV X.

Example 2.2: (Continued from Example 2.1). Using past information about the results of this examination, we can calculate the probabilities of the different grades secured by the students. Suppose that Table 2.2 provides the probability distribution.

TABLE 2.2  Probability Distribution of Grades
Grade                A       B      C       D      F
Probability P(ω)     0.05    0.4    0.35    0.1    0.1

First, we will calculate the induced probability function P_X of the RV X defined in Example 2.1. We can write P_X in two ways:

i. Tabular Form: Table 2.3 provides the induced probability function P_X of the RV X.

TABLE 2.3  Probability Function of the Associated Random Variable
x (Value of X)        0      4      6       8      10
Probability P_X(x)    0.1    0.1    0.35    0.4    0.05

ii. Functional Form: The following function represents the induced probability function P_X of the RV X:

P_X(x) = P_X(X = x) = \begin{cases} 0.1, & x = 0, \\ 0.1, & x = 4, \\ 0.35, & x = 6, \\ 0.4, & x = 8, \\ 0.05, & x = 10, \\ 0, & \text{otherwise.} \end{cases}

Now, we will compute the CDF:

F_X(x) = P_X(X \le x) = \begin{cases} 0, & x < 0, \\ 0.1, & 0 \le x < 4, \\ 0.2, & 4 \le x < 6, \\ 0.55, & 6 \le x < 8, \\ 0.95, & 8 \le x < 10, \\ 1, & x \ge 10. \end{cases}
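The jump values of this CDF are just the running totals of the PMF; a minimal R sketch, where stepfun() builds the right-continuous step function:

    # CDF of Example 2.2 from its PMF
    x  <- c(0, 4, 6, 8, 10)
    p  <- c(0.1, 0.1, 0.35, 0.4, 0.05)
    cumsum(p)                          # 0.10 0.20 0.55 0.95 1.00, the jump values
    Fx <- stepfun(x, c(0, cumsum(p)))  # right-continuous step CDF
    Fx(c(-1, 0, 5, 9, 12))             # 0.00 0.10 0.20 0.95 1.00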

Note that the CDF F_X(x) of a RV has several nice properties, which help us to analyze the data in real-life experiments. A few are given below:
• F_X is non-decreasing;
• F_X is right continuous at every point in R;
• lim_{x → −∞} F_X(x) = 0 and lim_{x → ∞} F_X(x) = 1.

Remark: Let a and b be real constants. The following probabilities can be calculated using the CDF F_X(x), as follows:
i. P(X < a) = F_X(a⁻)
ii. P(a ≤ X ≤ b) = F_X(b) − F_X(a⁻)
iii. P(a < X ≤ b) = F_X(b) − F_X(a)
iv. P(a ≤ X < b) = F_X(b⁻) − F_X(a⁻)
v. P(a < X < b) = F_X(b⁻) − F_X(a)
vi. P(X = a) = F_X(a) − F_X(a⁻)
vii. P(X ≥ a) = 1 − F_X(a⁻)
viii. P(X > a) = 1 − F_X(a)


Example 2.3: If the CDF of a RV X is given by:

F_X(x) = \begin{cases} 0, & x < -1, \\ \frac{x + 1}{4}, & -1 \le x < 0, \\ \frac{x + 3}{4}, & 0 \le x < 1, \\ 1, & x \ge 1, \end{cases}

find the following probabilities:
i. P(−1/2 < X ≤ 1/2);
ii. P(−1/4 ≤ X ≤ 3/4);
iii. P(0 < X < 1/2);
iv. P(−1/2 ≤ X < 1);
v. P(X = 0).

Solution: Using the CDF, the probabilities are as follows:
i. P(−1/2 < X ≤ 1/2) = F_X(1/2) − F_X(−1/2) = (1/2 + 3)/4 − (−1/2 + 1)/4 = 7/8 − 1/8 = 3/4
ii. P(−1/4 ≤ X ≤ 3/4) = F_X(3/4) − F_X(−1/4⁻) = (3/4 + 3)/4 − (−1/4 + 1)/4 = 15/16 − 3/16 = 3/4
iii. P(0 < X < 1/2) = F_X(1/2⁻) − F_X(0) = 7/8 − 3/4 = 1/8
iv. P(−1/2 ≤ X < 1) = F_X(1⁻) − F_X(−1/2⁻) = 1 − 1/8 = 7/8
v. P(X = 0) = F_X(0) − F_X(0⁻) = 3/4 − 1/4 = 1/2

Example 2.4: The following is the probability mass function (PMF) of a RV X:


x           2      4      6      8
P(X = x)    0.3    0.2    0.4    0.1

Find the CDF F_X(x) of the RV X.

Solution: We know that F_X(x) = P(X \le x) = \sum_{i:\, x_i \le x} P(X = x_i). Hence, the CDF of X is:

F_X(x) = \begin{cases} 0, & x < 2, \\ 0.3, & 2 \le x < 4, \\ 0.5, & 4 \le x < 6, \\ 0.9, & 6 \le x < 8, \\ 1, & x \ge 8. \end{cases}

Example 2.5: The following is the PMF of a RV X:

P(X = x) = \begin{cases} \frac{x}{6}, & x = 1, 2, 3, \\ 0, & \text{otherwise.} \end{cases}

Find the CDF F_X(x) of the RV X.

Solution: We know that F_X(x) = P(X \le x) = \sum_{i:\, x_i \le x} P(X = x_i). Hence, the CDF of X is (Figure 2.1):

F_X(x) = \begin{cases} 0, & x < 1, \\ \frac{1}{6}, & 1 \le x < 2, \\ \frac{1}{2}, & 2 \le x < 3, \\ 1, & x \ge 3. \end{cases}

2.2.1 DISCRETE TYPE RANDOM VARIABLE (RV)

We observed that F_X may be continuous everywhere in R, or it may have discontinuities in R. If the sum of the sizes of the jumps at the points of discontinuity of F_X equals one, then the associated RV X is of discrete type.

Definition 2.3: Let X be a RV with CDF F_X, and let S_X denote the set of discontinuity points of F_X. Suppose that J_X(x) = F_X(x) − F_X(x⁻) denotes the size of the jump at the discontinuity point x, where F_X(x⁻) = lim_{h → 0} F_X(x − h). The RV X is said to be of discrete type if \sum_{x \in S_X} p_X(x) = 1. The set S_X is called the support of the RV X. The function p_X: R → R,

p_X(x) = \begin{cases} J_X(x), & x \in S_X, \\ 0, & x \notin S_X, \end{cases}

is called the PMF of the discrete RV X.

FIGURE 2.1  Distribution function of a RV X.

The PMF of a discrete RV must satisfy the following two conditions:
i. \(p_X(x) \ge 0, \ \forall x \in \mathbb{R}\); and
ii. \(\sum_{x \in S_X} p_X(x) = 1\).

Note that if a function g(x) satisfies the above two conditions, then g(x) is a PMF of some discrete RV. Note that the function P_X(x), given in Example 2.2, satisfies the above properties of a PMF. Hence, the function P_X(x) in Example 2.2 is the PMF of the RV X, which represents the points of the grade awarded by the institute.
Example 2.6: A RV X has the following PMF:

x:          0      1      2      3      4
P(X = x):   0.1    a      0.2    2a     0.4

Find:
i. the value of a;
ii. P(X < 3);
iii. P(X ≥ 3);
iv. P(0 < X < 2);
v. draw the graph of the PMF.

Solution:
i. For the value of a we must verify the two properties of a PMF: (a) p(x) ≥ 0, which gives a ≥ 0; and (b) \(\sum_x p(x) = 1\), that is,
p(0) + p(1) + p(2) + p(3) + p(4) = 1
0.1 + a + 0.2 + 2a + 0.4 = 1, so a = 0.1.
For a = 0.1, the PMF of X is:

x:          0      1      2      3      4
P(X = x):   0.1    0.1    0.2    0.2    0.4

ii. \(P(X < 3) = \sum_{x=0}^{2} P(X = x) = 0.4\);
iii. P(X ≥ 3) = 1 − P(X < 3) = 0.6;
iv. P(0 < X < 2) = P(X = 1) = 0.1;
v. The graph of the PMF is given in Figure 2.2.

FIGURE 2.2  Graph of the PMF.
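A graph like Figure 2.2 can be reproduced with a short base-R sketch (assuming the PMF found in part (i)):

x <- 0:4
p <- c(0.1, 0.1, 0.2, 0.2, 0.4)
barplot(p, names.arg = x, xlab = "x", ylab = "P(X = x)")  # PMF of Example 2.6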


Example 2.7: The following is the PMF of a RV X:
\[
P(X = x) = \begin{cases} \dfrac{x}{10}, & x = 1, 2, 3, 4,\\ 0, & \text{otherwise}. \end{cases}
\]

Find: i. P(X = 3); ii. P(1 < X < 4); iii. Draw the graph of the PMF.

Solution: i. P(X = 3) = 0.3; ii. P(1 < X < 4) = P(X = 2) + P(X = 3) = 0.5; iii. the graph of the PMF is given in Figure 2.3.

FIGURE 2.3  The PMF of the random variable X.

2.2.2 CONTINUOUS TYPE RANDOM VARIABLE (RV)
Recall that the CDF F_X is right continuous at every point in R for all types of RVs. Sometimes, for a special class of RVs, the CDF F_X is continuous everywhere in R and can be written as
\[
F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt, \quad x \in \mathbb{R},
\]

where f_X(·) is a real-valued, non-negative, and integrable function defined on R. The function f_X is called the probability density function (PDF),


and the associated RV X is called a continuous RV. Note that the set \(S_X = \{x \in \mathbb{R} : f_X(x) > 0\}\) is called the support of the continuous RV X. The PDF of a continuous RV must satisfy the following two conditions:
i. \(f_X(x) \ge 0, \ \forall x \in \mathbb{R}\);
ii. \(\int_{-\infty}^{\infty} f_X(x)\, dx = 1\).

Note that if a function g(x) satisfies the above two conditions, then g(x) is a PDF of some continuous RV.
Remark: Let a and b be real constants. The following probabilities can be calculated, using the CDF of a continuous RV X, as follows:
\[
P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b) = F_X(b) - F_X(a).
\]

Example 2.8: The CDF of a RV X is given by:
\[
F_X(x) = \begin{cases} 1 - e^{-x}, & x \ge 0,\\ 0, & x < 0. \end{cases}
\]
Find: i. P(1 ≤ X ≤ 2); ii. P(1 < X ≤ 2); iii. P(1 ≤ X < 2); iv. P(1 < X < 2); v. P(X = 1).

Solution: Since X is a continuous RV, parts (i)–(iv) give equal probabilities, and the required probability is
\[
F_X(2) - F_X(1) = (1 - e^{-2}) - (1 - e^{-1}) = e^{-1} - e^{-2}.
\]
The probability in part (v) is
\[
P(X = 1) = F_X(1) - F_X(1^{-}) = (1 - e^{-1}) - (1 - e^{-1}) = 0.
\]
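Since this CDF is that of the exponential model with rate 1, the answer can be checked in base R with pexp() (a numerical check, not part of the original solution):

pexp(2) - pexp(1)   # P(1 < X <= 2) = e^(-1) - e^(-2), about 0.2325
exp(-1) - exp(-2)   # the same value, from the closed form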

Remark: Note that if X is a continuous type RV, then P(X = c) = 0 for any constant c ∈ R. Thus P(X = 1) = 0 in part (v) of the above example.
Example 2.9: The following is the PDF of a continuous RV:
\[
f(x) = \begin{cases} \dfrac{3}{4} x(2 - x), & 0 < x < 2,\\ 0, & \text{otherwise}. \end{cases}
\]



Find the CDF FX (x) and draw its graph.


Solution: We know that
\[
F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt = \int_{0}^{x} \frac{3}{4} t(2 - t)\, dt = \frac{3}{4}\left[ t^2 - \frac{t^3}{3} \right]_0^x.
\]
Hence, the CDF of X is:
\[
F_X(x) = \begin{cases} 0, & x < 0,\\ \tfrac{1}{4}(3x^2 - x^3), & 0 \le x < 2,\\ 1, & x \ge 2. \end{cases}
\]

The graph of the CDF is given in Figure 2.4.

FIGURE 2.4  CDF of continuous RV.
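The CDF can also be obtained numerically in base R with integrate(); a minimal sketch for this PDF (the function names f and Fx are ours):

f  <- function(t) 0.75 * t * (2 - t)        # PDF of Example 2.9 on (0, 2)
Fx <- function(x) integrate(f, 0, x)$value  # numeric F_X(x) for 0 <= x <= 2
Fx(1)                                       # (3*1^2 - 1^3)/4 = 0.5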

Example 2.10: The following is the PDF of a continuous RV:
\[
f(x) = \begin{cases} 0.04x, & 0 < x < 5,\\ 0.04(10 - x), & 5 < x < 10,\\ 0, & \text{otherwise}. \end{cases}
\]

Find the CDF FX (x).


Solution: We know that \(F_X(x) = \int_{-\infty}^{x} f(t)\, dt\). The CDF F_X(x) is:
\[
F_X(x) = \begin{cases} 0, & x < 0,\\ 0.02x^2, & 0 \le x < 5,\\ 0.02(20x - x^2) - 1, & 5 \le x < 10,\\ 1, & x \ge 10. \end{cases}
\]

Example 2.11: A RV X has the following PDF:
\[
f(x) = \begin{cases} a x(1 - x), & 0 < x < 1,\\ 0, & \text{otherwise}. \end{cases}
\]

Find the value of a for which f(x) is a PDF, and draw the graph of the PDF.
Solution: For the value of a we must verify the two properties of a PDF: (i) f(x) ≥ 0 for all x, which gives a ≥ 0; and (ii) \(\int_x f(x)\, dx = 1\), from which we obtain the constant a:
\[
\int_0^1 a x(1 - x)\, dx = a\left[ \frac{x^2}{2} - \frac{x^3}{3} \right]_0^1 = \frac{a}{6} = 1, \quad \text{so } a = 6.
\]

Hence, for a = 6, f(x) is a PDF. The graph of the PDF is given in Figure 2.5.

FIGURE 2.5  Graph of the PDF.
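The normalizing constant can be verified numerically in base R (a check under the assumption a = 6):

integrate(function(x) 6 * x * (1 - x), 0, 1)$value  # about 1, so f integrates to one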


Example 2.12: A RV X has the following PDF:
\[
f(x) = \begin{cases} 0.04x, & 0 < x < 5,\\ 0.04(10 - x), & 5 < x < 10,\\ 0, & \text{otherwise}. \end{cases}
\]
Find P(X > 7).
Solution: First,
\[
P(X \le 7) = \int_0^5 0.04x\, dx + \int_5^7 0.04(10 - x)\, dx = 0.04\left[\frac{x^2}{2}\right]_0^5 + 0.04\left[10x - \frac{x^2}{2}\right]_5^7 = 0.5 + 0.32 = 0.82.
\]
Hence, P(X > 7) = 1 − P(X ≤ 7) = 1 − 0.82 = 0.18.
Example 2.13: The expression of the PDF is:
\[
f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & 0 < x < \infty,\\ 0, & \text{otherwise}. \end{cases}
\]



Find the CDF and draw its graph when λ = 1 and 5.

Solution: We know that
\[
F_X(x) = \int_0^x f(t)\, dt = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}, \quad x \ge 0,
\]
so
\[
F_X(x) = \begin{cases} 0, & x < 0,\\ 1 - e^{-\lambda x}, & x \ge 0. \end{cases}
\]



The graph of CDF when λ = 1 is given in Figure 2.6.


FIGURE 2.6  Graph of CDF when λ = 1.



The graph of CDF when λ = 5 is given in Figure 2.7.

FIGURE 2.7  Graph of CDF when λ = 5.
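Graphs like Figures 2.6 and 2.7 can be sketched in base R with curve() (the axis range is illustrative):

curve(1 - exp(-1 * x), from = 0, to = 5, ylab = "F(x)")        # CDF with lambda = 1
curve(1 - exp(-5 * x), from = 0, to = 5, lty = 2, add = TRUE)  # CDF with lambda = 5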

2.3 EXPECTATION OF RANDOM VARIABLE (RV)
The characteristics of a probability model can be described by its moments. Let X be a RV having a probability distribution. The expectation of the RV is defined as:
\[
E(X) = \begin{cases} \int_x x f(x)\, dx, & \text{if } X \text{ is a continuous RV},\\[4pt] \sum_x x\, p(x), & \text{if } X \text{ is a discrete RV}, \end{cases}
\]


provided \(\int_x |x| f(x)\, dx\) and \(\sum_x |x|\, p(x)\) exist.
Let g(X) be a measurable function of the RV X. The expectation of a function of the RV is defined as:
\[
E(g(X)) = \begin{cases} \int_x g(x) f(x)\, dx, & \text{if } X \text{ is a continuous RV},\\[4pt] \sum_x g(x)\, p(x), & \text{if } X \text{ is a discrete RV}, \end{cases}
\]
provided \(\int_x |g(x)| f(x)\, dx\) and \(\sum_x |g(x)|\, p(x)\) exist.

2.4 MOMENTS OF RANDOM VARIABLE (RV)
The rth raw moment (or moment about the origin) is defined as:
\[ \mu'_r = E(X^r), \quad r = 1, 2, \ldots. \]
The first four moments are useful to analyze the shape (or behavior) of probability models. The first moment \(\mu'_1 = E(X) = \mu\) (say) is known as the mean (or the population mean) of the distribution. Further, the rth central moment (or moment about the mean) is defined as:
\[ \mu_r = E(X - \mu)^r, \quad r = 1, 2, \ldots. \]
Here, μ is the mean of the distribution. The first central moment is always zero for every RV, i.e., \(\mu_1 = E(X - \mu) = 0\). The second central moment is \(\mu_2 = E(X - \mu)^2 = E(X^2) - \mu^2\). It is called the variance of the RV X and is also known as a measure of spread, i.e., \(\mu_2 = \sigma^2 = V(X) = E(X - \mu)^2\). The third and fourth central moments are useful in measuring the skewness and kurtosis, respectively. For measuring the skewness and kurtosis, Karl Pearson defined the following four coefficients based upon the first four central moments:

Measure     Formula
Skewness    \(\beta_1 = \mu_3^2 / \mu_2^3\);   \(\gamma_1 = \sqrt{\beta_1} = \mu_3 / \mu_2^{3/2}\)
Kurtosis    \(\beta_2 = \mu_4 / \mu_2^2\);   \(\gamma_2 = \beta_2 - 3\)


For a symmetrical distribution, β₁ is zero. A value β₁ > 0 indicates skewness, but β₁ does not provide the direction of the skewness (positive or negative). So we can use another coefficient of skewness, γ₁, which provides the direction. Since μ₂ is always positive, the sign of γ₁ depends on the sign of μ₃. Similarly, the values of β₂ and γ₂ measure the kurtosis. If γ₂ = 0, γ₂ > 0, or γ₂ < 0, the distribution is normal (mesokurtic), leptokurtic, or platykurtic, respectively.
Properties of Expectation:
1. Let X be a non-negative RV; then E(X) ≥ 0, provided E(X) exists.
2. Let X and Y be two RVs such that E(X) and E(Y) exist, and let a, b, and c be real constants. Then E(aX + bY + c) = aE(X) + bE(Y) + c.
3. Let X be a RV and g(X) a function of X; then E(a g(X) + b) = a E(g(X)) + b, where a and b are real constants.
4. If X and Y are independent RVs such that E(X) and E(Y) exist, then E(XY) = E(X)E(Y).
5. The expected value of a constant is the constant itself: for a given constant c, E(c) = c.
6. If X and Y are RVs with P(X ≤ Y) = 1, then E(X) ≤ E(Y).
7. Suppose E(X^r) exists. If 0 < s ≤ r, then E(X^s) also exists.
Properties of Variance:
1. The variance of a constant is zero: V(c) = 0.
2. Let X be a RV and a, b real constants; then V(aX + b) = a²V(X).
That is, the variance is independent of the location b but not of the scale a.


3. Let X and Y be two independent RVs; then V(X + Y) = V(X) + V(Y).
Example 2.14: Find E(X) and E(X²) for the PDF given in Example 2.13.
Solution: The mean of the RV X is:
\[
E(X) = \int_x x f(x)\, dx = \lambda \int_0^{\infty} x e^{-\lambda x}\, dx = \frac{1}{\lambda}.
\]
Now, the second moment about the origin is:
\[
E(X^2) = \int_x x^2 f(x)\, dx = \lambda \int_0^{\infty} x^2 e^{-\lambda x}\, dx = \frac{2}{\lambda^2}.
\]

Example 2.15: A RV X has the following PDF:
\[
f(x) = \begin{cases} 3x^2, & 0 < x < 1,\\ 0, & \text{otherwise}. \end{cases}
\]



Find the E(X) and V(X).

Solution: The mean of the RV X is:
\[
E(X) = \int_x x f(x)\, dx = 3\int_0^1 x^3\, dx = \frac{3}{4}.
\]
Similarly, the second moment about the origin is:
\[
E(X^2) = \int_0^1 x^2 f(x)\, dx = 3\int_0^1 x^4\, dx = \frac{3}{5}.
\]
Now, the variance of the RV X is:
\[
V(X) = E(X^2) - \big(E(X)\big)^2 = \frac{3}{5} - \frac{9}{16} = \frac{3}{80}.
\]
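These moments can be confirmed numerically in base R with integrate() (the variable names are ours):

EX  <- integrate(function(x) x   * 3 * x^2, 0, 1)$value  # E(X)   = 3/4
EX2 <- integrate(function(x) x^2 * 3 * x^2, 0, 1)$value  # E(X^2) = 3/5
EX2 - EX^2                                               # V(X)   = 3/80 = 0.0375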


2.5 MOMENT GENERATING FUNCTION (MGF)
The moment generating function (MGF) of X gives us all moments of the RV X (if the MGF exists); that is why it is called the MGF. If the MGF exists, it uniquely determines the distribution: if two RVs have the same MGF, they have the same distribution. Thus, if you find the MGF of a RV, you have in effect determined its distribution. The MGF of a RV X is defined as:
\[
M_X(t) = \begin{cases} \int_x e^{tx} f(x)\, dx, & \text{if } X \text{ is a continuous RV},\\[4pt] \sum_x e^{tx} p(x), & \text{if } X \text{ is a discrete RV}, \end{cases}
\]

where |t| < h for some h > 0. Now, consider:
\[
M_X(t) = E(e^{tX}) = E\!\left( \sum_{r=0}^{\infty} \frac{(tX)^r}{r!} \right) = \sum_{r=0}^{\infty} \frac{t^r}{r!} E(X^r) = \sum_{r=0}^{\infty} \mu'_r \frac{t^r}{r!}.
\]
That is, the coefficient of \(t^r/r!\) in \(M_X(t)\) gives the rth raw moment (moment about the origin). Therefore:
\[
\mu'_r = E(X^r) = \text{coefficient of } \frac{t^r}{r!} \text{ in } M_X(t).
\]
Result: We can also find the rth raw moment \(\mu'_r\) using the following formula:
\[
\mu'_r = E(X^r) = \left. \frac{\partial^r}{\partial t^r} M_X(t) \right|_{t=0}, \quad r = 1, 2, \ldots
\]

Properties of Moment Generating Function:
1. Let c be a non-zero real constant; then \(M_{cX}(t) = M_X(ct)\).
2. Let \(X_1, X_2, \ldots, X_n\) be independent RVs and define \(Y = \sum_{i=1}^{n} X_i\). Then the MGF of Y is:
\[
M_Y(t) = M_{\sum_{i=1}^n X_i}(t) = \prod_{i=1}^{n} M_{X_i}(t).
\]

3. Let X be a RV having MGF \(M_X(t)\), and consider \(Y = \dfrac{X - a}{b}\) for \(b \ne 0\). Then the MGF of Y is:
\[
M_Y(t) = e^{-at/b}\, M_X\!\left(\frac{t}{b}\right).
\]

4. The MGF of a distribution, if it exists, uniquely determines the distribution; that is, \(M_X(t) = M_Y(t)\) implies that X and Y have the same CDFs.
Example 2.16: Find the MGF expression and the four raw moments about the origin using the PDF given in Example 2.13.
Solution: We know that:
\[
M_X(t) = \int_x e^{tx} f(x)\, dx = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\, dx = \lambda \int_0^{\infty} e^{-(\lambda - t)x}\, dx
\]
\[
M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-1}, \quad t < \lambda.
\]



We can find raw moments using MGF.



The coefficient of M X (t ) gives rth raw moments (or moments r! about origin):

tr

−1

t  M X (t= ) 1 −  , t > λ  λ



Using the expansion (1 – a)–1 =



8 k =0

a k , | a | < 1,

∞   t   t  2  t 3  t  4  ∞  t  r r! tr M X (t ) =+ 1   +   +   +   +…. = ∑ ∑   = r λ r 0 λ r!   λ   λ   λ   λ  =  r 0=



μʹr = coefficient of

Thus, = µ1′

tr r! in MX (t) = r = ; r 1, 2, … r! λ

1 = , µ 2′

λ

2 = , µ3′ 2

λ

6 24 = , µ4′ 3 4

λ

λ


Alternative Method: We can also find the rth raw moment \(\mu'_r\) using the formula
\(\mu'_r = \left. \partial^r M_X(t)/\partial t^r \right|_{t=0}\), \(r = 1, 2, \ldots\). Substituting r = 1 and r = 2 and solving gives the first and second raw moments. Now,
\[
\frac{\partial}{\partial t} M_X(t) = \frac{\partial}{\partial t}\left(1 - \frac{t}{\lambda}\right)^{-1} = \frac{1}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-2}, \quad t < \lambda,
\]
hence \(\mu'_1 = \left.\partial M_X(t)/\partial t\right|_{t=0} = 1/\lambda\). Further,
\[
\frac{\partial^2}{\partial t^2} M_X(t) = \frac{\partial}{\partial t}\left[\frac{1}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-2}\right] = \frac{2}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-3}, \quad t < \lambda,
\]
so \(\mu'_2 = \left.\partial^2 M_X(t)/\partial t^2\right|_{t=0} = 2/\lambda^2\).
Similarly, we can find the third and fourth raw moments.
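The derivative formula can also be mimicked numerically; a central-difference sketch in base R for, say, λ = 2 (the step size h is illustrative):

lam <- 2
M   <- function(t) (1 - t/lam)^(-1)   # MGF of Example 2.13
h   <- 1e-4
(M(h) - M(-h)) / (2 * h)              # about 1/lam = 0.5, the first raw moment
(M(h) - 2*M(0) + M(-h)) / h^2         # about 2/lam^2 = 0.5, the second raw moment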

2.6 CUMULANT GENERATING FUNCTION (CGF)
The cumulants of a probability distribution are a sequence of numbers that compactly describe the distribution. The cumulant generating function (CGF) of a RV X is defined as:


\[
\kappa_X(t) = \log M_X(t).
\]
The right-hand side can be expanded as a convergent series in powers of t. Thus,
\[
\kappa_X(t) = \sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!},
\]
that is, the coefficient of \(t^r/r!\) in \(\kappa_X(t)\) gives the rth cumulant. Therefore:
\[
\kappa_r = \text{coefficient of } \frac{t^r}{r!} \text{ in } \kappa_X(t).
\]
Result: We can also find the rth cumulant \(\kappa_r\) using the formula
\(\kappa_r = \left. \partial^r \kappa_X(t)/\partial t^r \right|_{t=0}\), \(r = 1, 2, \ldots\).
Result: The relationship between cumulants and moments is given below:
\(\mu'_1 = \kappa_1\); \(\mu_2 = \kappa_2\); \(\mu_3 = \kappa_3\); \(\mu_4 = 3\kappa_2^2 + \kappa_4\).

Example 2.17: Find the expression of the CGF, and find the mean and the second, third, and fourth central moments using the PDF given in Example 2.13.
Solution: We know that \(\kappa_X(t) = \log(M_X(t))\). Substituting the expression of the MGF from the previous example, we get:
\[
\kappa_X(t) = \log\left(1 - \frac{t}{\lambda}\right)^{-1} = -\log\left(1 - \frac{t}{\lambda}\right), \quad t < \lambda.
\]
Using the expansion \(\log(1 - x) = -x - \dfrac{x^2}{2} - \dfrac{x^3}{3} - \cdots\), \(|x| < 1\), we get:
\[
\kappa_X(t) = \frac{t}{\lambda} + \frac{1}{2}\left(\frac{t}{\lambda}\right)^2 + \frac{1}{3}\left(\frac{t}{\lambda}\right)^3 + \frac{1}{4}\left(\frac{t}{\lambda}\right)^4 + \cdots = \frac{1}{\lambda}\frac{t}{1!} + \frac{1}{\lambda^2}\frac{t^2}{2!} + \frac{2}{\lambda^3}\frac{t^3}{3!} + \frac{6}{\lambda^4}\frac{t^4}{4!} + \cdots
\]
Hence:
\(\kappa_1 = \mu'_1 = \text{mean} = \) coefficient of \(t/1!\) in \(\kappa_X(t) = 1/\lambda\);
\(\kappa_2 = \mu_2 = \text{variance} = \) coefficient of \(t^2/2!\) in \(\kappa_X(t) = 1/\lambda^2\);
\(\kappa_3 = \mu_3 = \) coefficient of \(t^3/3!\) in \(\kappa_X(t) = 2/\lambda^3\);
\(\kappa_4 = \) coefficient of \(t^4/4!\) in \(\kappa_X(t) = 6/\lambda^4\);
\[
\mu_4 = \kappa_4 + 3\kappa_2^2 = \frac{6}{\lambda^4} + 3\left(\frac{1}{\lambda^2}\right)^2 = \frac{9}{\lambda^4}.
\]

Alternative Method: We can also find the rth cumulant using the formula
\(\kappa_r = \left. \partial^r \kappa_X(t)/\partial t^r \right|_{t=0}\), \(r = 1, 2, \ldots\). Substituting r = 1, 2, 3, and 4 and solving, we get \(\kappa_1, \kappa_2, \kappa_3, \kappa_4\). Now,
\[
\frac{\partial}{\partial t} \kappa_X(t) = -\frac{\partial}{\partial t} \log\left(1 - \frac{t}{\lambda}\right) = \frac{1}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-1},
\]
hence \(\kappa_1 = \mu'_1 = \left.\partial \kappa_X(t)/\partial t\right|_{t=0} = 1/\lambda\).




Further, the second central moment is \(\mu_2 = \kappa_2 = \left. \partial^2 \kappa_X(t)/\partial t^2 \right|_{t=0}\). Now,
\[
\frac{\partial^2}{\partial t^2} \kappa_X(t) = \frac{\partial}{\partial t}\left[\frac{1}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-1}\right] = \frac{1}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-2},
\]
so \(\mu_2 = \left.\partial^2 \kappa_X(t)/\partial t^2\right|_{t=0} = 1/\lambda^2\).
Similarly, we can find the third and fourth central moments.

Example 2.18: Find β₁, γ₁, β₂, γ₂ for the PDF given in Example 2.13. Also give the interpretation.
Solution: Using the second, third, and fourth central moments from the above example, we get:
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(2/\lambda^3)^2}{(1/\lambda^2)^3} = 4, \qquad \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = 2,
\]
\[
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{9/\lambda^4}{(1/\lambda^2)^2} = 9, \qquad \gamma_2 = 9 - 3 = 6.
\]
This implies that the given probability model is positively skewed and leptokurtic.
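The interpretation can be checked by simulation in base R (sample size, seed, and rate are illustrative; γ₁ = 2 and γ₂ = 6 hold for any rate):

set.seed(1)
x  <- rexp(1e6, rate = 2)          # large exponential sample
m  <- mean(x); s2 <- mean((x - m)^2)
mean((x - m)^3) / s2^(3/2)         # about 2  (positively skewed)
mean((x - m)^4) / s2^2 - 3         # about 6  (leptokurtic)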


2.7 CHARACTERISTIC FUNCTION (CF)
The advantage of the characteristic function (CF) is that it always exists, unlike the MGF. The CF of a RV X is a function \(\varphi_X(t)\) defined as:
\[
\varphi_X(t) = \begin{cases} \int_x e^{itx} f(x)\, dx, & \text{if } X \text{ is a continuous RV},\\[4pt] \sum_x e^{itx} p(x), & \text{if } X \text{ is a discrete RV}. \end{cases}
\]

Example 2.19: Find the CF of the PDF given in Example 2.13.
Solution: We know that:
\[
\varphi_X(t) = \int_x e^{itx} f(x)\, dx = \int_0^{\infty} e^{itx} \lambda e^{-\lambda x}\, dx = \lambda \int_0^{\infty} e^{-(\lambda - it)x}\, dx
\]
\[
\varphi_X(t) = \left(1 - \frac{it}{\lambda}\right)^{-1}.
\]

KEYWORDS
• cumulative distribution function
• cumulative performance index
• positively skewed and leptokurtic
• probability density function
• probability mass function
• random variable


CHAPTER 3

Discrete Distributions

MUKTI KHETAN¹, VIJAY KUMAR², MOHD. ARSHAD³, and PRASHANT VERMA⁴
¹ Department of Mathematics, Amity School of Applied Sciences, Amity University Mumbai, Maharashtra, India
² Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India
³ Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India
⁴ Department of Statistics, Faculty of Science, University of Allahabad, Prayagraj, India

3.1 UNIFORM DISTRIBUTION (DISCRETE)
The discrete uniform distribution places probability 1/l on each of a fixed, finite (predetermined) set of l values κ₁, κ₂, …, κ_l; each of the l values is equally likely to be observed. The PMF of the RV X is:
\[
P(X = x) = \begin{cases} \dfrac{1}{l}, & x = \kappa_1, \kappa_2, \ldots, \kappa_l,\\ 0, & \text{otherwise}, \end{cases}
\]


where κ₁, κ₂, …, κ_l are real constants. Taking κ_i = i for i = 1, 2, …, n, we obtain the discrete uniform distribution over the set {1, 2, …, n}:
\[
P(X = x) = \begin{cases} \dfrac{1}{n}, & x = 1, 2, \ldots, n,\\ 0, & \text{otherwise}. \end{cases}
\]

Symbolically: X ~ Uniform(n). Here n is the parameter of the distribution.
Sample Generation from Uniform Distribution: To generate samples from the discrete uniform distribution (Figure 3.1), we first install and load one package:
install.packages("purrr")
library(purrr)

FIGURE 3.1  Uniform distribution.

 Output:
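The sampling step itself can be sketched as follows (purrr's rdunif() is one option; base R's sample() is equivalent, and the sample size 100 and range 1–10 are illustrative):

# rdunif(100, b = 10, a = 1)               # purrr sampler for Uniform{1,...,10}
sample(1:10, size = 100, replace = TRUE)   # base-R equivalent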


3.1.1 PROPERTIES OF THE UNIFORM DISTRIBUTION
a. Raw Moments of Uniform Distribution: If X is the random variable (RV), the rth raw moment is defined below:
\[ \mu'_r = E(X^r) = \sum_x x^r P(X = x). \]
The expression for μ′₁ is:
\[ \mu'_1 = E(X) = \sum_{x=1}^{n} x P(X = x) = \sum_{x=1}^{n} \frac{x}{n}. \]
Since \(\sum_{i=1}^{n} i = \frac{1}{2} n(n+1)\), we get:
\[ \mu'_1 = \frac{n+1}{2}. \tag{3.1.1} \]
Thus, \((n+1)/2\) is the mean of the distribution.
The expression for μ′₂ is:
\[ \mu'_2 = E(X^2) = \sum_{x=1}^{n} \frac{x^2}{n}. \]
Since \(\sum_{i=1}^{n} i^2 = \frac{1}{6} n(n+1)(2n+1)\), we get:
\[ \mu'_2 = \frac{(2n+1)(n+1)}{6}. \tag{3.1.2} \]
The expression for μ′₃ is:
\[ \mu'_3 = E(X^3) = \sum_{x=1}^{n} \frac{x^3}{n}. \]
Since \(\sum_{i=1}^{n} i^3 = \left[\frac{1}{2} n(n+1)\right]^2\), we get:
\[ \mu'_3 = \frac{n(n+1)^2}{4}. \tag{3.1.3} \]
The expression for μ′₄ is:
\[ \mu'_4 = E(X^4) = \sum_{x=1}^{n} \frac{x^4}{n}. \]
Since \(\sum_{i=1}^{n} i^4 = \frac{1}{30} n(n+1)(6n^3 + 9n^2 + n - 1)\), we get:
\[ \mu'_4 = \frac{1}{30}(n+1)(6n^3 + 9n^2 + n - 1) = \frac{1}{30}(n+1)(2n+1)(3n^2 + 3n - 1). \tag{3.1.4} \]

b. Central Moments of Uniform Distribution: If X is the RV, the rth central moment is defined below:
\[ \mu_r = E(X - \mu)^r = \sum_x (x - \mu)^r P(X = x), \]
where μ is the mean. In this section we derive the first four central moments μ₁, μ₂, μ₃, μ₄. Since the first central moment is always zero,
\[ \mu_1 = 0. \tag{3.1.5} \]
The expression for μ₂ is:
\[ \mu_2 = \mu'_2 - \mu_1'^2 = \frac{(2n+1)(n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n+1}{2}\left[\frac{2n+1}{3} - \frac{n+1}{2}\right] = \frac{n^2 - 1}{12} \tag{3.1.6} \]
[using Eqns. (3.1.1) and (3.1.2)]. Thus, the variance of the above distribution is \((n^2-1)/12\).
The expression for μ₃ is:
\[ \mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu_1'^3 = \frac{n(n+1)^2}{4} - 3\,\frac{(2n+1)(n+1)}{6}\,\frac{n+1}{2} + 2\left(\frac{n+1}{2}\right)^3 = 0 \tag{3.1.7} \]
[using Eqns. (3.1.1)–(3.1.3)].
The expression for μ₄ is:
\[ \mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu_1'^2 - 3\mu_1'^4 \]
[using Eqns. (3.1.1)–(3.1.4)]. Simplifying,
\[ \mu_4 = \frac{1}{240}(n+1)\left[3n^3 - 3n^2 - 7n + 7\right] = \frac{1}{240}(n+1)\left[3n^2(n-1) - 7(n-1)\right] = \frac{1}{240}(n^2 - 1)(3n^2 - 7). \tag{3.1.8} \]

c. Skewness and Kurtosis of Uniform Distribution: Karl Pearson defined four coefficients of skewness and kurtosis based upon the first four central moments. The expressions for the skewness (β₁, γ₁) and kurtosis (β₂, γ₂) are given below:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0, \qquad \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = 0 \tag{3.1.9} \]
[using Eqns. (3.1.6) and (3.1.7)].
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\frac{1}{240}(n^2-1)(3n^2-7)}{\left[\frac{n^2-1}{12}\right]^2} = \frac{3}{5}\,\frac{3n^2 - 7}{n^2 - 1} \]
[using Eqns. (3.1.6) and (3.1.8)], and
\[ \gamma_2 = \beta_2 - 3 = \frac{3}{5}\,\frac{3n^2 - 7}{n^2 - 1} - 3 = -\frac{6}{5}\,\frac{n^2 + 1}{n^2 - 1}. \tag{3.1.10} \]


d. Moment Generating Function (MGF) of Uniform Distribution: The MGF plays an essential role in the theory of distributions and makes it easier to study the characteristics of the model. Symbolically the MGF is denoted by \(M_X(t)\) and is defined as \(M_X(t) = E(e^{tX}) = \sum_x e^{tx} P(X = x)\). The expression for \(M_X(t)\) is obtained as below:
\[
M_X(t) = \sum_{x=1}^{n} e^{tx} P(X = x) = \frac{1}{n}\left(e^t + e^{2t} + e^{3t} + \cdots + e^{nt}\right).
\]
As the sum of the series \(b + br + br^2 + \cdots + br^{n-1}\) is \(\dfrac{b(1 - r^n)}{1 - r}\), substituting into the above expression gives:
\[
M_X(t) = \frac{1}{n}\left[\frac{e^t(1 - e^{nt})}{1 - e^t}\right], \quad t \ne 0, \quad \text{and } M_X(0) = 1.
\]
Note: The probability generating function (PGF) can be derived by the same method; the formula of the PGF is \(P_X(t) = \sum_x t^x P(X = x)\).

e. Characteristic Function (CF) of Uniform Distribution: The benefit of the CF is that it always exists, unlike the MGF. Symbolically the CF is denoted by \(\varphi_X(t)\) and is defined as \(\varphi_X(t) = E(e^{itX}) = \sum_x e^{itx} P(X = x)\). The expression for \(\varphi_X(t)\) is obtained as below:
\[
\varphi_X(t) = \sum_{x=1}^{n} e^{itx} P(X = x) = \frac{1}{n}\left(e^{it} + e^{2it} + \cdots + e^{nit}\right).
\]
Using the formula for the sum of a geometric series, we get:
\[
\varphi_X(t) = \frac{1}{n}\left[\frac{e^{it}(1 - e^{int})}{1 - e^{it}}\right], \quad t \ne 0, \quad \text{and } \varphi_X(0) = 1.
\]

3.2 BERNOULLI DISTRIBUTION
If an experiment has only two outcomes, then the probability distribution of this experiment is the Bernoulli distribution, which takes the values zero and one with probabilities (1 − p) and p, respectively. Here p is the success probability and 0 ≤ p ≤ 1. For example, the result of a competitive exam has two possible consequences, pass or fail, and an HIV test has two possible consequences, positive or negative. The PMF of a Bernoulli RV X is:
\[
P(X = x) = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1,\\ 0, & \text{otherwise}. \end{cases}
\]

Symbolically: X ~ Bernoulli(p). Here p is the parameter of the distribution.
Sample Generation from Bernoulli Distribution (Figure 3.2):

FIGURE 3.2  Bernoulli distribution.

 Output:
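A minimal base-R sketch of the sampling step (Bernoulli(p) is Binomial(1, p); the sample size and p are illustrative):

rbinom(10, size = 1, prob = 0.3)  # ten Bernoulli(0.3) draws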

3.2.1 PROPERTIES OF BERNOULLI DISTRIBUTION
a. Moments of Bernoulli Distribution: The first four raw moments μ′₁, μ′₂, μ′₃, μ′₄ are derived as below:
\[
\mu'_r = E(X^r) = \sum_{x=0}^{1} x^r p^x (1-p)^{1-x} = 0^r (1-p) + 1^r p = p.
\]
Substituting r = 1, 2, 3, 4 in the above expression gives all four moments:
\[ \mu'_1 = p \tag{3.2.1} \]
Thus, p is the mean of the distribution.
\[ \mu'_2 = p \tag{3.2.2} \]
\[ \mu'_3 = p \tag{3.2.3} \]
\[ \mu'_4 = p \tag{3.2.4} \]

b. Central Moments of Bernoulli Distribution: The first four central moments μ₁, μ₂, μ₃, μ₄ are derived as below:
\[ \mu_1 = 0 \tag{3.2.5} \]
\[ \mu_2 = \mu'_2 - \mu_1'^2 = p - p^2 = p(1-p) \tag{3.2.6} \]
[using Eqns. (3.2.1) and (3.2.2)]. Thus, the variance of the above distribution is p(1 − p).
\[ \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = p - 3p^2 + 2p^3 = p(1 - 2p)(1 - p) \tag{3.2.7} \]
[using Eqns. (3.2.1)–(3.2.3)].
\[ \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4 = p - 4p^2 + 6p^3 - 3p^4 = p(1-p)\left[1 - 3p(1-p)\right] \tag{3.2.8} \]
[using Eqns. (3.2.1)–(3.2.4)].



c. Skewness and Kurtosis of Bernoulli Distribution: The expressions of skewness (β1, γ1) and kurtosis (β2, γ2) four coefficients based upon the first four central moments are given below (Figure 3.3):

= β1

µ32 = µ23

[ p(1 − 2 p)(1 − p)]   [using Eqns. (3.2.6) and (3.2.7)] 3 [ p(1 − p)] 2

Discrete Distributions

59

β1 = = γ1



(1 − 2 p ) 2 p (1 − p )

µ µ2

3 = 3/ 2

1− 2 p p (1 − p )

(3.2.9)

µ4 p(1 − p) {1 − 3 p(1 − p)} {1 − 3 p(1 − p)} = = 2 p (1 − p ) µ22 [ p(1 − p)] [using Eqns. (3.2.6) and (3.2.8)] β= 2

= β2



γ 2 = β2 − 3 =

1 −3 p (1 − p )

1 1 − 6 p (1 − p ) −6= (3.2.10) p (1 − p ) p (1 − p )

For 0 < p < 1 (i.e.,γ 1 > 0), the distribution is positively skewed; for

2 1 1 = ,γ 1 0, this implies that distribution is symmetric when p = . 2 2 1 Similarly, the distribution is negatively skewed when ≤ p < 1. Let 2 1 1 1 1 0.2113 and 0.7887 be two values of the = p0 = = p1 = 2 2 2 12 parameter p. For p ∈ {p0 , p1} , the distribution is mesokurtic, for p ∈ ( 0,p 0 ) ∪ ( p1 ,1) , the distribution is leptokurtic and for p ∈ ( p 0 , p1 ) , = p

the distribution is platykurtic.

FIGURE 3.3  Shape of Bernoulli distribution.
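The curves in Figure 3.3 follow directly from Eqns. (3.2.9) and (3.2.10); a base-R sketch that traces them over a grid of p (the grid spacing is illustrative):

p  <- seq(0.05, 0.95, by = 0.01)
g1 <- (1 - 2*p) / sqrt(p * (1 - p))      # gamma_1: changes sign at p = 1/2
g2 <- (1 - 6*p*(1 - p)) / (p * (1 - p))  # gamma_2: zero at p0 = 0.2113, p1 = 0.7887
plot(p, g1, type = "l", ylab = "coefficient"); lines(p, g2, lty = 2)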

d. Moment Generating Function (MGF) of Bernoulli Distribution: Each probability distribution has a unique MGF. The expression for \(M_X(t)\) is obtained as below:
\[
M_X(t) = E(e^{tX}) = \sum_{x=0}^{1} e^{tx} P(X = x) = \sum_{x=0}^{1} e^{tx} p^x (1-p)^{1-x}
\]
\[
M_X(t) = (1 - p) + pe^t. \tag{3.2.11}
\]
By this method we find all four moments μ′₁, μ′₂, μ′₃, μ′₄ using \(\mu'_r = \left. \partial^r M_X(t)/\partial t^r \right|_{t=0}\). Since
\[
\frac{\partial}{\partial t} M_X(t) = \frac{\partial}{\partial t}(1 - p + pe^t) = pe^t, \tag{3.2.12}
\]
every higher-order derivative is again \(pe^t\) ((3.2.13)–(3.2.15)); evaluating at t = 0 gives
\[ \mu'_1 = \mu'_2 = \mu'_3 = \mu'_4 = p. \]

Note: The PGF can be derived by the same method; the formula of the PGF is \(P_X(t) = \sum_x t^x P(X = x)\).

e. Characteristic Function (CF) of Bernoulli Distribution: The CF is helpful for studying the behavior of the probability model. The expression for \(\varphi_X(t)\) is obtained as below:
\[
\varphi_X(t) = E(e^{itX}) = \sum_{x=0}^{1} e^{itx} p^x (1-p)^{1-x} = \sum_{x=0}^{1} (pe^{it})^x (1-p)^{1-x}
\]
\[
\varphi_X(t) = (1 - p) + pe^{it}.
\]

Note: We can also generate moments using the CF.
f. Cumulant Generating Function (CGF) of Bernoulli Distribution: A CGF takes the MGF of a probability distribution and produces the cumulants. The cumulants of a probability distribution are a sequence of numbers that define the distribution in a helpful, compact way. The expression for \(K_X(t)\) is obtained as below:
\[
K_X(t) = \log(M_X(t)) = \log(1 - p + pe^t) \tag{3.2.16}
\]
[using Eqn. (3.2.11)]. We can obtain the central moments from the cumulants: κ₁ = mean; κ₂ = variance; κ₃ = μ₃; κ₄ = μ₄ − 3κ₂². We find κ₁, κ₂, κ₃, κ₄ using \(\kappa_r = \left. \partial^r K_X(t)/\partial t^r \right|_{t=0}\):
\[
\frac{\partial}{\partial t} K_X(t) = \frac{pe^t}{1 - p + pe^t}, \tag{3.2.17}
\]
so \(\kappa_1 = p\).
\[
\frac{\partial^2}{\partial t^2} K_X(t) = \frac{p\left[(1 - p + pe^t)e^t - pe^t e^t\right]}{(1 - p + pe^t)^2} = \frac{(1-p)pe^t}{(1 - p + pe^t)^2}, \tag{3.2.18}
\]
so \(\kappa_2 = p(1-p)\).
\[
\frac{\partial^3}{\partial t^3} K_X(t) = \frac{(1-p)pe^t(1 - p - pe^t)}{(1 - p + pe^t)^3}, \tag{3.2.19}
\]
so \(\kappa_3 = (1-p)p(1 - 2p)\).
\[
\frac{\partial^4}{\partial t^4} K_X(t) = \frac{(1-p)pe^t\left[1 - 2p + p^2 - 4pe^t + 4p^2 e^t + p^2 e^{2t}\right]}{(1 - p + pe^t)^4},
\]
so \(\kappa_4 = (1-p)p\left[1 - 6p(1-p)\right]\).

g. Recurrence Relation for Moments of Bernoulli Distribution:
\[
\mu_{r+1} = p(1-p)\left[ r\mu_{r-1} + \frac{d\mu_r}{dp} \right].
\]
Proof: The derivation of the recurrence relation for the moments of the Bernoulli distribution is given below:
\[
\mu_r = \sum_{x=0}^{1} (x - p)^r P(X = x) = \sum_{x=0}^{1} (x - p)^r p^x (1-p)^{1-x}.
\]
Differentiating with respect to p, we get:
\[
\frac{d\mu_r}{dp} = \sum_{x=0}^{1} \left[ -r(x-p)^{r-1} p^x (1-p)^{1-x} + (x-p)^r p^x (1-p)^{1-x} \left(\frac{x}{p} - \frac{1-x}{1-p}\right) \right].
\]
Since \(\dfrac{x}{p} - \dfrac{1-x}{1-p} = \dfrac{x - p}{p(1-p)}\), this gives:
\[
\frac{d\mu_r}{dp} = -r\mu_{r-1} + \frac{1}{p(1-p)}\mu_{r+1}, \quad \text{i.e.,} \quad \mu_{r+1} = p(1-p)\left[\frac{d\mu_r}{dp} + r\mu_{r-1}\right]. \tag{3.2.20}
\]
Substituting r = 1 in Eqn. (3.2.20), and using μ₀ = 1 and μ₁ = 0:
\[ \mu_2 = p(1-p). \]
Substituting r = 2 in Eqn. (3.2.20):
\[
\mu_3 = p(1-p)\left[\frac{d}{dp}\{p(1-p)\} + 2\mu_1\right] = p(1-p)(1 - 2p).
\]
Substituting r = 3 in Eqn. (3.2.20):
\[
\mu_4 = p(1-p)\left[\frac{d}{dp}\{p(1-p)(1-2p)\} + 3\mu_2\right] = p(1-p)\left[(1 - 6p + 6p^2) + 3p(1-p)\right] = p(1-p)(1 - 3p + 3p^2)
\]
\[
\mu_4 = p(1-p)\left[1 - 3p(1-p)\right].
\]


3.3 BINOMIAL DISTRIBUTION
It is a generalization of the Bernoulli distribution. If a Bernoulli experiment is performed n times, the distribution of the number of successes x in the n trials is called the Binomial distribution. It takes a finite number of values (0, 1, 2, …, n), with success probability p and failure probability q, subject to two conditions: (i) 0 ≤ p ≤ 1 and (ii) p + q = 1. For example, among 25 HIV tests with two consequences (positive or negative), we can find the probability that five test positive. This distribution gives the probability of x successes in n trials. The PMF of a Binomial RV X is:
\[
P(X = x) = \begin{cases} \dfrac{n!}{x!(n-x)!}\, p^x q^{n-x}, & x = 0, 1, \ldots, n,\\ 0, & \text{otherwise}. \end{cases}
\]
Symbolically: X ~ B(n, p). Here n and p are the two parameters of the distribution.
Sample Generation from Binomial Distribution (Figure 3.4)

FIGURE 3.4  Binomial distribution.

 Output:
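A minimal base-R sketch of the sampling step (the arguments are illustrative):

rbinom(10, size = 25, prob = 0.4)  # ten draws of B(25, 0.4)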


3.3.1 PROPERTIES OF BINOMIAL DISTRIBUTION
a. Moments of Binomial Distribution: The first four raw moments μ′₁, μ′₂, μ′₃, μ′₄ are derived as below:
\[
\mu'_1 = E(X) = \sum_{x=0}^{n} x \frac{n!}{x!(n-x)!} p^x q^{n-x} = np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1} q^{n-x} = np(p+q)^{n-1}
\]
[using the Binomial theorem]
\[ \mu'_1 = np. \tag{3.3.1} \]
Thus, np is the mean of the distribution. Writing \(x^2 = x(x-1) + x\),
\[
\mu'_2 = E(X^2) = \sum_{x=0}^{n} [x(x-1) + x] \frac{n!}{x!(n-x)!} p^x q^{n-x} = n(n-1)p^2 (p+q)^{n-2} + np
\]
[using Eqn. (3.3.1) and the Binomial theorem]
\[ \mu'_2 = n(n-1)p^2 + np. \tag{3.3.2} \]
Writing \(x^3 = x(x-1)(x-2) + 3x(x-1) + x\),
\[
\mu'_3 = E(X^3) = n(n-1)(n-2)p^3 (p+q)^{n-3} + 3n(n-1)p^2 + np
\]
[using Eqns. (3.3.1), (3.3.2) and the Binomial theorem]
\[ \mu'_3 = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np. \tag{3.3.3} \]
Writing \(x^4 = x(x-1)(x-2)(x-3) + 6x(x-1)(x-2) + 7x(x-1) + x\),
\[
\mu'_4 = E(X^4) = n(n-1)(n-2)(n-3)p^4 (p+q)^{n-4} + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np
\]
[using Eqns. (3.3.1)–(3.3.3) and the Binomial theorem]
\[ \mu'_4 = n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np. \tag{3.3.4} \]

b. Central Moments of Binomial Distribution: The first four central moments μ₁, μ₂, μ₃, μ₄ are derived as below:
\[ \mu_1 = 0 \tag{3.3.5} \]
\[ \mu_2 = \mu'_2 - \mu_1'^2 = n(n-1)p^2 + np - (np)^2 = np(1-p) \]
[using Eqns. (3.3.1) and (3.3.2)]
\[ \mu_2 = npq. \tag{3.3.6} \]
So, the variance of the Binomial distribution is npq.
\[ \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3 = np\left[2p^2 - 3p + 1\right] \]
[using Eqns. (3.3.1)–(3.3.3)]
\[ \mu_3 = npq(q - p). \tag{3.3.7} \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4 = np\left[3np^3 - 6p^3 - 6np^2 + 12p^2 + 3np - 7p + 1\right] = np\left[3(n-2)p(p^2 - 2p + 1) + q\right] \]
[using Eqns. (3.3.1)–(3.3.4)]
\[ \mu_4 = npq\left[3(n-2)pq + 1\right]. \tag{3.3.8} \]

c. Skewness and Kurtosis of Binomial Distribution: The expressions for the four coefficients of skewness (β₁, γ₁) and kurtosis (β₂, γ₂), based upon the first four central moments, are given below (Figure 3.5):
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{[npq(q-p)]^2}{[npq]^3} = \frac{(q-p)^2}{npq}, \qquad \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{q-p}{\sqrt{npq}} \tag{3.3.9} \]
[using Eqns. (3.3.6) and (3.3.7)].
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{npq[3(n-2)pq + 1]}{[npq]^2} = 3 + \frac{1 - 6pq}{npq}, \qquad \gamma_2 = \beta_2 - 3 = \frac{1 - 6pq}{npq} \tag{3.3.10} \]
[using Eqns. (3.3.6) and (3.3.8)].
For 0 < p < 1/2 (i.e., γ₁ > 0), the distribution is positively skewed; for p = 1/2, γ₁ = 0, which implies that the distribution is symmetric. Similarly, the distribution is negatively skewed when 1/2 < p < 1. Let \(p_0 = \tfrac12 - \tfrac{1}{2\sqrt{3}} \approx 0.2113\) and \(p_1 = \tfrac12 + \tfrac{1}{2\sqrt{3}} \approx 0.7887\) be two values of the parameter p. For p ∈ {p₀, p₁} the distribution is mesokurtic, for p ∈ (0, p₀) ∪ (p₁, 1) it is leptokurtic, and for p ∈ (p₀, p₁) it is platykurtic.
3.4 POISSON DISTRIBUTION
The Poisson distribution is a discrete probability distribution with a single parameter λ > 0. It contains a countable number of values (0, 1, 2, 3, …). For example, the number of deaths due to COVID-19 among infected patients within one hour: the Poisson distribution applies here because the number of COVID-19 patients is very large, while the probability of the event (death) is very low. The PMF of a Poisson RV X is:
\[
P(X = x) = \begin{cases} \dfrac{\lambda^x e^{-\lambda}}{x!}, & x = 0, 1, 2, 3, \ldots,\\ 0, & \text{otherwise}. \end{cases}
\]
Symbolically: X ~ P (λ). Here λ is the parameter of the distribution and λ > 0.  Sample Generation from Poisson Distribution (Figure 3.6)

Distribution Theory

76

FIGURE 3.6  Poisson distribution.

 Output:
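A minimal base-R sketch of the sampling step (the arguments are illustrative):

rpois(10, lambda = 3)  # ten draws of P(3)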

3.4.1 PROPERTIES OF THE POISSON DISTRIBUTION
a. Moments of the Poisson Distribution: The first four raw moments μ′₁, μ′₂, μ′₃, μ′₄ are derived as below:
\[
\mu'_1 = E(X) = \sum_{x=0}^{\infty} x \frac{\lambda^x e^{-\lambda}}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}.
\]
Using the exponential series, we get:
\[ \mu'_1 = \lambda. \tag{3.4.1} \]
Thus, λ is the mean of the distribution. Writing \(x^2 = x(x-1) + x\) and again using the exponential series and Eqn. (3.4.1):
\[ \mu'_2 = E(X^2) = \lambda^2 + \lambda. \tag{3.4.2} \]
Writing \(x^3 = x(x-1)(x-2) + 3x(x-1) + x\) [using Eqns. (3.4.1), (3.4.2) and the exponential series]:
\[ \mu'_3 = \lambda^3 + 3\lambda^2 + \lambda. \tag{3.4.3} \]
Writing \(x^4 = x(x-1)(x-2)(x-3) + 6x(x-1)(x-2) + 7x(x-1) + x\) [using Eqns. (3.4.1)–(3.4.3) and the exponential series]:
\[ \mu'_4 = \lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda. \tag{3.4.4} \]
b. Central Moments Poisson Distribution: The first four central moments μ1 , μ2, μ3, μ4 have been derived as below: Since, the first central moment about mean is zero, i.e.:

μ1 = 0

(3.4.5)

µ2 = µ2′ − µ 1′2= λ 2 + λ − (λ ) 2 [using Eqns. (3.4.1) and (3.4.2)] (3.4.6)

μ2 = λ

Distribution Theory

78

So, variance of the Poisson distribution is λ. µ3 = µ3′ − 3µ2′ µ1′ + 2µ1′3 = λ 3 + 3λ 2 + λ  − 3 λ 2 + λ  λ + 2λ 3 [using Eqns. (3.4.1)–(3.4.3)]



μ3 = λ (3.4.7) µ4 = µ4′ − 4µ3′ µ1′ + 6µ2′ µ 1′2− 3µ 1′4 = λ 4 + 6λ 3 + 7λ 2 + λ − 4λ λ 3 + 3λ 2 + λ  + 6 λ 2 + λ  λ 2 − 3λ 4



[using Eqns. (3.4.1)–(3.4.4)]

μ4 = 3λ2 + λ

(3.4.8)

c. Skewness and Kurtosis of Poisson Distribution: The expressions of skewness (β1, γ1) and kurtosis (β2, γ2) four coefficients based upon the first four central moments are given below (Figure 3.7): = β1



µ32 = µ23

[λ= ] 3 [λ ] 2

= γ1

β2=

1

λ   [using Eqns. (3.4.6) and (3.4.7)]

= β1

1

λ

(3.4.9)

µ4 3λ 2 + λ 1 = = 3 +   [using Eqns. (3.4.6) and (3.4.8)] 2 λ µ22 [λ ] γ 2 = β2 − 3 =

1

λ

(3.4.10)

This distribution is positive skewed and leptokurtic for all values of λ > 0. Note that, as λ → , the distribution approaches to symmetric and mesokurtic.

FIGURE 3.7  Shape of the Poisson distribution.

Discrete Distributions

79

d. Moment Generating Function (MGF) of Poisson Distribution: The expression for MGF (MX (t) is obtained as below: M X (t= ) E (etX= )

∑ etx P( X= x=)

x 0 =



(e t λ ) x e − λ (e t λ ) x −λ e = ∑ ∑ x! x= ! x! 0= x 0 x 0

∑ etx

x =

λ x e−λ

=

M X (t ) = eλ ( e −1)   [using exponential series]  (3.4.11) t

We can find moments using the moment generating function:



= µr′

∂r = M X (t ) t 0 ∂t r

= µ1′

∂ = M X (t ) t 0 ∂t

t ∂ ∂ λ ( et −1) = M X (t ) = e eλ ( e −1) (λ et )   [using Eqns. (3.4.11)]  (3.3.12) ∂t ∂t ∂  ′  M X (t )= µ= t 0= 1  λ t ∂  

∂2 = M X (t ) t 0 ∂t 2

= µ2′



∂2 ∂ ∂  ∂  t λ ( et −1)  = M X (t ) = λe e  M X (t )  2    [using Eqn. (3.4.12)] ∂t  ∂t ∂t  ∂t 

{

}

t t ∂2 = M X (t ) λ et eλ ( e −1) (λ et ) + et eλ ( e −1) (3.4.13) 2 ∂t

 ∂2

 2 M X (t )= t 0=  λ +λ ∂ t  

′  µ= 2



(3.4.14)

2

∂3 = M X (t ) t 0 ∂t 3

= µ3′

{

}

t t  ∂  ∂3 ∂  ∂2 = M ( t ) = λ et eλ ( e −1) (λ et ) + et eλ ( e −1)   2 M X (t )  X 3   ∂t  ∂t ∂t  ∂t



{

}

{

[using Eqn. (3.4.13)]

}

{

}

t ∂3 ∂ 2t λ ( et −1) ∂ ∂ 2t λ ( et −1) ∂ ∂  + +  M X (t )  M X (t )= λ 2 e e λ et eλ ( e −1) = λ 2 e e 3 ∂t ∂t ∂t ∂t  ∂t ∂t 



[using Eqn. (3.4.12)]

Distribution Theory

80



{

}

∂3 ∂2 2 λ ( et −1) 2 t λ ( et −1) 2 t t = + + M ( t ) λ 2 e e λ e e e M X (t ) (3.4.15) X ∂t 3 ∂t 2  ∂3  µ3′ =  3 M X (t ) t = 0  = λ 2 {2 + λ} + λ 2 + λ  ∂t 

µ3′ =λ 3 + 3λ 2 + λ (3.4.16)



= µ4′

∂4 = M X (t ) t 0 ∂t 4

{

}

 ∂  2  ∂4 ∂  ∂3 ∂2 λ ( et −1) 2 t λ ( et −1) 2 t t M ( t ) = M ( t ) = λ 2 e e + λ e e e + M X (t )    X X  4 3 2 ∂t  ∂t ∂t ∂t  ∂t  



[using Eqn. (3.4.15)]

t t   ∂ ∂ ∂ ∂2 =  2λ 2 eλ ( e −1) e 2t + λ 3 eλ ( e −1) e3t + M X (t )  2 ∂t ∂t ∂t ∂t  

{ {

} }

 2λ 2 2eλ ( et −1) e 2t + λ eλ ( et −1) et e 2t    ∂ M X (t =  3  4 t t ∂ ∂t  +λ 3 3eλ ( e −1) e3t + λ eλ ( e −1) et e3t + 3 M X (t )  ∂t   4

 ∂4



µ4′ =  4 M X (t ) t = 0  =  2λ 2 {2 + λ} + λ 3 {3 + λ} + λ 3 + 3λ 2 + λ   ∂t  [using Eqn. (3.4.16)] µ4′ =λ 4 + 6λ 3 + 7λ 2 + λ

 Note: The expression PGF can be derived using the same MGF = PX (t ) ∑ = t x P ( X x). method. The formula of PGF is x

e. Characteristic Function (CF) of Poisson Distribution: The expression for CF (ϕX (t)) is obtained as below: φ X (t= ) E ( eitX = )

∑e

= x 0

= e−λ ∑



x =0

(e λ ) x! it

itx

P( X= x= )

∑e

itx

λ x e−λ x!

=

(eit λ ) x e − λ x! 0



= x 0= x

x

φ X (t ) = eλ ( e

it

−1)



[using exponential series]

f. Cumulant Generating Function (CGF) of Poisson Distribution: The detailed derivation of the CDF is given below:

Discrete Distributions

81

KX (t) = log(MX(t)) t

K X (t ) = log eλ ( e −1)

KX (t) = λ (et –1)

(3.4.17)

We can find moments using cumulant generating function:

κr = is the coefficient of κ1 = mean;

κ2 = variance;

κ3 = µ3 ;

tr . r!

κ4 = −3κ 2 + µ4 .

   t t2 t3 t4  t t2 t3 t4 K X (t ) = λ 1 + + + + ........ − 1 = λ  + + + ........   1! 2! 3! 4!   1! 2! 3! 4! 



[using exponential series]

κ1 is the coefficient of

t = λ. 1!

κ 2 is the coefficient of

t2 = λ. 2!

κ 3 is the coefficient of

t3 = λ. 3!

κ 4 is the coefficient of

t4 = λ. 4!

g. Recurrence Relation for Moments of Poisson Distribution: dµ   µr +1 λ  r µr −1 + r  = dλ  

 Proof:

x) = µr = ∑ { x − λ} P ( X = ∑ { x − λ} r

r

λ x e−λ

x =0

x

x!

Differentiating with respect to λ, we get: x −λ d d n 1 d r λ e r µ r = ∑ { x − λ} =∑ { x − λ} λ x e − λ λ λ λ d d x ! x ! d x 1= x 0 = 1 r −1 x −λ = ∑ r { x − λ} (−1)λ e + { x − λ}r ( λ x e − λ (−1) + xλ x −1e − λ )    x =0 x !



=

1

∑ x !  − r { x − λ} x =0



r −1

 

λ x e − λ + { x − λ} λ x e − λ  −1 + r

x 

λ  

Distribution Theory

82



=



=

1

∑ x !  − r { x − λ} x =0

1

x =0

λ x e − λ + { x − λ} λ x e − λ 

r −1

λ x e − λ + { x − λ}



∑ x !  − r { x − λ} 

 x − λ    λ 

r −1

r

r +1

 1 

λ x e−λ    λ 

 r −1 1 r +1 1    1  = −r ∑ { x − λ} λ x e − λ  + ∑ { x − λ} λ x e−λ   x! x!   x 0  λ  = x 0=



d µr 1 = −r µ r −1 +   µ r +1 dλ λ dµ   µr +1 λ  r µr −1 + r  = dλ  

Substituting r = 1 dµ   µ1+1 λ  µ0 + 1  = dλ  

As we know: µ0 =E ( x − Ex)0 =E (1) =1 and µ1 =0

μ2 = λ [1 + 0] μ2 = λ Substituting r = 2 dµ   d  µ2 +1= λ  2 + 2µ1 = λ  λ + 0   dλ   dλ 

μ3 = λ Substituting r = 3 dµ   d (λ )  µ3+1 = λ  3 + 3µ2  = λ  + 3λ  = λ [1 + 3λ ] d d λ λ    

μ4 = λ + 3λ2 3.5 GEOMETRIC DISTRIBUTION Geometric distribution contains a countable number of values (0, 1, 2,  …) with success probability (p) and failure probability (q) with two conditions

Discrete Distributions

83

i) 0 ≤ p ≤ 1 and ii) p + q = 1. This distribution gives the probability of the number of failures before the first success (x). For example, the number of personal loan offers given by the lender to the customers until the first loan is sanctioned. In Geometric distribution, the PMF of a random variable X is:  pq x P ( X= x= )  0

x=0,1,2... otherwise

Symbolically: X ~ Geo(p). Here p is the parameter of the distribution.  Sample Generation from Geometric Distribution (Figure 3.8)

FIGURE 3.8  Geometric distribution.

 Output:

3.5.1 PROPERTIES OF THE GEOMETRIC DISTRIBUTION a. Moments of Geometric Distribution: The first four raw moments μʹ1 , μʹ2, μʹ3, μʹ4 have been derived as below: µ1′ = E ( X ) = ∑ xP( X = x) = ∑ xpq x = p (0q 0 + 1q1 + 2q 2 + 3q 3 .......) x

x =0

µ1′ = p(q + 2q 2 + 3q 3 .......)

Distribution Theory

84

Multiply by q in the above equation: q µ1′ = p (q 2 + 2q 3 + 3q 4 .......)

Subtracting the above two equations, we have: µ1′ − q µ1′ = p(q + 2q 2 + 3q 3 .......) − p(q 2 + 2q 3 + 3q 4 .......) µ1′(1 − q ) = p (q + q 2 + q 3 .......) q 1− q

p µ1′(1 − q ) =

µ1′ =

pq

(1 − q )

µ1′ =

Thus,

2

q (3.5.1) p

q is the mean of the distribution. p ′ E ( X 2= µ= ) 2

∑x

2

∑ x P( X= 2

x= )

x

pq x= p (02 q 0 + 12 q1 + 22 q 2 + 32 q 3 + 42 q 4 + 52 q 5 .......)

x =0

µ2′ = p(0 + 1q1 + 4q 2 + 9q 3 + 16q 4 + 25q 5 .......)

Multiply by q in the above equation: q µ 2′= p (1q 2 + 4q 3 + 9q 4 + 16q 5 + 25q 6 .......)

Subtracting the above two equations, we have: µ2′ − q µ2′ = p(0 + 1q1 + 4q 2 + 9q 3 + 16q 4 + 25q 5 .......) − p (1q 2 + 4q 3 + 9q 4 + 16q 5 + 25q 6 .......)

µ2′ (1 − q ) = p (q + 3q 2 + 5q 3 + 7q 4 .......)

Multiply by q in the above equation: µ2′ q(1 − q)= p(q 2 + 3q 3 + 5q 4 + 7 q 5 .......)

Subtracting the above two equations, we have: µ2′ (1 − q) 2 = p(q + 2q 2 + 2q 3 + 2q 4 .......) = pq + 2q 2 p (1 + q + q 2 + q 3 .......) µ2′ (1 − q ) 2 = pq + 2q 2 p

1 1 1 = 2  pq + 2q 2  = 2  q ( p + q ) + q 2  1− q p p

Discrete Distributions

85

= µ2′

µ3′= E ( X 3 )=

∑x

3

q [ q + 1] (3.5.2) p2

pq x= p (03 q 0 + 13 q1 + 23 q 2 + 33 q 3 + 43 q 4 + 53 q 5 .......)

x =0

µ3′ = p(0 + 1q1 + 8q 2 + 27q 3 + 64q 4 + 125q 5 .......)

Multiply by q in the above equation: q µ3′ = p (q 2 + 8q 3 + 27 q 4 + 64q 5 + 125q 6 .......)

Subtracting the above two equations, we have: µ3′ − q µ3′ = p(0 + 1q1 + 8q 2 + 27q 3 + 64q 4 + 125q 5 .......) − p (q 2 + 8q 3 + 27 q 4 + 64q 5 + 125q 6 .......)

µ3′ (1 − q ) = p (q + 7q 2 + 19q 3 + 37q 4 .......)

Multiply by q in the above equation: µ3′q(1 − q)= p(q 2 + 7 q 3 + 19q 4 + 37 q 5 .......)

Subtracting the above two equations, we have: µ3′ (1 − q ) 2 = p (q + 6q 2 + 12q 3 + 18q 4 .......)

Multiply by q in the above equation: µ3′q(1 − q) 2 = p(q 2 + 6q 3 + 12q 4 + 18q 5 .......)

Subtracting the above two equations, we have: µ3′ (1 − q )3 = p (q + 6q 2 + 12q 3 + 18q 4 .......) − p (q 2 + 6q 3 + 12q 4 + 18q 5 .......)

µ3′ (1 − q )3 = p (q + 5q 2 + 6q 3 + 6q 4 .......) 

µ3′ (1 − q)3 = p {q + 5q 2 + 6q 3 (1 + q + q 2 ....)} = p q + 5q 2 + 6q 3

µ3′ =

q { p + 5 pq + 6q 2 }= p3

µ4′= E ( X 4 )=

∑x

4

1   1− q 

 q q 2 2 p + 5 pq + 5q + q }= (1 + 4q + q 2 ) (3.5.3) 3 { p p3

pq x= p (04 q 0 + 14 q1 + 24 q 2 + 34 q 3 + 44 q 4 + 54 q 5 .......)

x =0

µ4′ = p(0 + 1q1 + 16q 2 + 81q 3 + 256q 4 + 625q 5 .......)

Distribution Theory

86

Multiply by q in the above equation: q µ 4′ = p (0 + 1q 2 + 16q 3 + 81q 4 + 256q 5 + 625q 6 .......)

Subtracting above two equations, we have: µ4′ − q µ4′ = p(0 + 1q1 + 16q 2 + 81q 3 + 256q 4 + 625q 5 .......) − p (0 + 1q 2 + 16q 3 + 81q 4 + 256q 5 + 625q 6 .......)

µ4′ (1 − q ) = p (q + 15q 2 + 65q 3 + 175q 4 + 369q 5 .......)

Multiply by q in the above equation: µ4′ q(1 − q)= p(q 2 + 15q 3 + 65q 4 + 175q 5 + 369q 6 .......)

Subtracting above two equations, we have: µ4′ (1 − q) 2 = p(q + 15q 2 + 65q 3 + 175q 4 + 369q 5 .......) − p (q 2 + 15q 3 + 65q 4 + 175q 5 + 369q 6 .......)

µ4′ (1 − q ) 2 = p (q + 14q 2 + 50q 3 + 110q 4 + 194q 5 .......)

Multiply by q in the above equation: µ4′ q(1 − q) 2 = p(q 2 + 14q 3 + 50q 4 + 110q 5 + 194q 6 .......)

Subtracting above two equations, we have: µ4′ (1 − q)3 = p(q + 14q 2 + 50q 3 + 110q 4 + 194q 5 .......) − p (q 2 + 14q 3 + 50q 4 + 110q 5 + 194q 6 .......)

µ4′ (1 − q )3 = p (q + 13q 2 + 36q 3 + 60q 4 + 84q 5 .......)

Multiply by q in the above equation: µ4′ q(1 − q)3 = p(q 2 + 13q 3 + 36q 4 + 60q 5 + 84q 6 .......)

Subtracting the above two equations, we have: µ4′ (1 − q) 4 = p(q + 13q 2 + 36q 3 + 60q 4 + 84q 5 .......) − p (q 2 + 13q 3 + 36q 4 + 60q 5 + 84q 6 .......)

µ4′ (1 − q ) 4 = p (q + 12q 2 + 13q 3 + 24q 4 + 24q 5 .......)

Multiply by q in the above equation: µ4′ q(1 − q) 4 = p(q 2 + 12q 3 + 13q 4 + 24q 5 + 24q 6 .......)

Discrete Distributions

87

Subtracting the above two equations, we have: µ4′ (1 − q)5 = p(q + 12q 2 + 13q 3 + 24q 4 + 24q 5 .......) − p (q 2 + 12q 3 + 13q 4 + 24q 5 + 24q 6 .......)

µ4′ (1 − q )5 = p (q + 11q 2 + q 3 + 11q 4 ) µ4′ =



q (1 + 11q + 11q 2 + q 3 ) (3.5.4) p4

b. Central Moments of Geometric Distribution: The first four central moments μ1 , μ2, μ3, μ4 have been derived as below:

μ1 = 0 (3.4.5) 2

µ2 = µ2′ − µ 1′2=



q q ( q + 1) −     [using Eqns. (3.5.1) and (3.5.2)] p2  p

q (3.5.6) p2 q So, variance of the geometric distribution is 2 . p

µ2 =

µ3 = µ3′ − 3µ2′ µ1′ + 2µ 1′3 =

 q  q  q q 1 + 4q + q 2 } − 3  2 ( q + 1)    + 2   3 { p  p p  p 

= µ3



3

[using Eqns. (3.5.1)–(3.5.3)]

q (1 + q ) (3.5.7) p3

µ4 = µ4′ − 4µ3′ µ1′ + 6µ2′ µ 1′2− 3µ 1′4 2

=

q (1 + 11q + 11q 2 + q3 ) − 4 pq3 (1 + 4q + q 2 )  qp  + 6 pq2 ( q + 1)  qp  − 3  qp  p4      

=



4

[using Eqns. (3.5.1)–(3.5.4)]

q 1 + 11q + 11q 2 + q 3 − 4 ( q + 4q 2 + q 3 ) + 6 ( q 3 + q 2 ) − 3q 3   p4  q µ4= 4 1 + 7 q + 7 q 2  (3.5.8) p

c. Skewness and Kurtosis of Geometric Distribution: The expressions of skewness (β1, γ1) and kurtosis (β2, γ2) four coefficients based upon the first four central moments are given below (Figure 3.9):

Distribution Theory

88

2

 q (1 + q )  µ32  p 3  (1 + q ) 2 = = β1 = 3 q µ23  q   p2   

µ3 (1 + q ) = (3.5.9) µ23/ 2 q

= γ1

q

1 + 7 q + q 2 

µ4 p4  β2 = = 2 µ22  q   p2   

1 + 7 q + q 2  1 + q 2  1 + (1 − p ) 2  = = = 7+  7+  q q q

1 + 1 + p 2 − 2 p   2q + p 2  = = 7+  7+  q q

β 2= 9 +

p2 q

γ 2 = β2 − 3 = 6 +



p2 (3.5.10) q

This distribution is positively skewed and leptokurtic.

FIGURE 3.9  Shape of the geometric distribution.

The expression for MGF (Mx (t)) is obtained as below: M X (= t ) E (etX= )

∑e

= x 0

tx

P(= X x= )

∑e

tx

x pq= p ∑ (e t q ) x

= x 0= x 0

Discrete Distributions

89

1  p = p (et q )0 + (et q )1 + (et q ) 2 + ..... = [using sum of geometric series] 1 − et q M X (= t ) p (1 − qet ) , t < − ln (q) (3.5.11)



−1

We can get moments using the moment generating function: = µr′

∂r = M X (t ) t 0 ∂t r

= µ1′

∂ = M X (t ) t 0 ∂t

∂ ∂ M X (t ) = p (1 − qet ) −1 = p (−1)(1 − qet ) −1−1 ( − qet ) = pqet (1 − qet ) −1−1 ∂t ∂t



[using Eqn. (3.5.11)]



∂ M= pqet (1 − qet ) −2 (3.5.12) X (t ) ∂t



′  M X (t )= t 0= µ= 1  p (3.5.13)  ∂t 

∂

= µ2′



q

∂2 = M X (t ) t 0 ∂t 2

∂2 ∂ ∂ ∂ ∂ t −1   pqet (1 − qet ) −2 = pq et (1 − qet ) −2  M X (t= )  p (1 − qe ) = 2 ∂t  ∂t ∂t ∂t  ∂t



[using Eqn. (3.5.11)]

= pq (1 − qet ) −2 et + (−2)(1 − qet ) −2 −1 (−qet )et 



∂2 M X (t ) =pq (1 − qet ) −2 et + 2qe 2t (1 − qet ) −3  (3.5.14) ∂t 2  ∂2

  1 2q  q M X (t ) t =0  =pq  2 + 3  = 2 (1 + q ) (3.5.15) p  p p  ∂t 

µ2′ =

2

= µ3′

∂3 = M X (t ) t 0 ∂t 3

∂3 ∂3 ∂  ∂2 t −1  M X (t ) = p (1 − qet ) −1 =  2 p (1 − qe )  3 3 ∂t  ∂t ∂t ∂t 

Distribution Theory

90

∂ {(1 − qet ) −2 et + 2qe 2t (1 − qet ) −3 }  ∂t  ∂ ∂  = pq  (1 − qet ) −2 et + 2q e 2t (1 − qet ) −3    [using Eqn. (3.5.5)] ∂ ∂ t t  

= pq



=pq (1 − qet ) −2 et + 2(1 − qet ) −3 et ( qet ) + 4qe 2t (1 − qet ) −3 + 6qe 2t (1 − qet ) −4 ( qet )  ∂3

t −2 t t −3 t −4 2t 2 3t ∂t 3 M X (t ) = pq (1 − qe ) e + 6qe (1 − qe ) + 6q e (1 − qe )   (3.5.16)

 ∂3

  1 6q 6q 2  ( ) = 0 = M t t pq  X  2+ 3+ 4  3 p p  p  ∂t  q q  p 2 + 6qp + 6q 2=   p 2 + 6q    [using Eqn. (3.5.6)] 3  p p3 

µ3′ = 



=

µ3′=



= µ4′

q (1 + 4q + q 2 ) (3.5.17) p3 ∂4 = M X (t ) t 0 ∂t 4

 ∂4 ∂  ∂3 M X (t ) =  3 M X (t )  4 ∂t  ∂t ∂t  ∂4 ∂  pq (1 − qet ) −2 et + 6qe 2t (1 − qet ) −3 + 6q 2 e3t (1 − qet ) −4   M X (= t)  ∂t   ∂t 4 ∂ ∂ ∂  = pq  (1 − qet ) −2 et + 6qe 2t (1 − qet ) −3 + 6q 2 e3t (1 − qet ) −4  ∂t ∂t  ∂t  (1 − qet ) −2 et + 2(1 − qet ) −3 et ( qet ) + 12qe 2t (1 − qet ) −3 + 18qe 2t (1 − qet ) −4 ( qet )   = pq   +18q 2 e3t (1 − qet ) −4 + 24q 2 e3t (1 − qet ) −5 ( qet )   

(1 − qet ) −2 et + 14qe 2t (1 − qet ) −3 + 36q 2 e3t (1 − qet ) −4  = pq   t −5 3 4t  +24q e (1 − qe )   ∂4

  1 14q 36q 2 24q 3  M X (t ) t = 0  = pq  2 + 3 + 4 + 5  p p p  p  ∂t  q  p 3 + 14qp 2 + 36q 2 p + 24q 3  = p4  q 1 − 3q + 3q 2 − q 3 + 14qp 2 + 36q 2 p + 24q 3  = p4 

µ4′ = 

4

Discrete Distributions

= =

q p4 q p4 = =

91

1 − 3q + 3q 2 − q 3 + 14qp + 22q 2 p + 24q 3  1 − 3q + 3q 2 − q 3 + 14qp + 22q 2 p + 22q 3 + 2q 3  q p4 q p4

1 − 3q + 3q 2 + q 3 + 14qp + 22q 2  1 − 3q + 3q 2 + q 3 + 14q − 14q 2 + 22q 2  q µ4′ = 4 1 + 11q + 11q 2 + q 3  (3.5.18) p



 Note: The expression PGF can be derived using the same MGF method. The formula of PGF is = PX (t ) ∑ = t x P ( X x). x

e. Characteristic Function (CF) of Geometric Distribution: The expression for CF (ϕX (t)) is obtained as below: = φ X (t ) E= (eitX ) =

P( X ∑e= itx

x)

x =0

= e pq p= ∑ ∑ e q p ∑ (e q ) itx

= x 0

x

itx

x

it

x

= x 0= x 0

=p



1 [using sum of geometric series] 1 − eit q

φX = (t ) p (1 − qeit ) . −1

f. Cumulant Generating Function (CGF) of Geometric Distribution: The detailed derivation of the CDF is given below: K X (t ) = log( M X (t )) K X (t ) = − log p (1 − qeit )

We can get center moments using cumulant generating function: = K X (t ) log p (1 − qet ) −1 



 t t2 t3 t4 + + + ....    [using exponential series] 1! 2! 3! 4! 

= log p − log 1 − q 1 + +

    t t2 t3 t4  = log p − log 1 − q + q  + + + + ....  1! 2! 3! 4!  

Distribution Theory

92

  t t2 t3 t4  p p q = log − log +   + + + + ....   1! 2! 3! 4!  



 q  t t2 t3 t4  = − log 1 +  + + + + ....  p 1! 2! 3! 4!   2 2  q  t t2 t3 t4   1  q   t t2 t3 t4   −  + + + + .... −    + + + + ....   p 1! 2! 3! 4!   2  p  1! 2! 3! 4!  = − 3 4  3 4 2 3 4 2 3 4  1 q  t t   t t t t  1 q   t t  − 3  p  1! + 2! + 3! + 4! + .... − 4  p  1! + 2! + 3! + 4! + .... ..          

κr = coefficient of

tr in KX (t) r!

κ1 = coefficient of

t q in K X (t ) = 1! p 2

2 q q q κ2 = coefficient of t in K X (t ) = +   =2

2!

κ3 = coefficient of

p  p

p

3

t in K X (t ) 3! 2

3

q q q q κ 3 = + 3   + 2   = 3  p 2 + 3qp + 2q 2  p p p p     q q (1 + q ) 1 + q 2 − 2q + 3q − 3q 2 + 2q 2  = = p p3

Similarly, we get κ4 = coefficient of

t4 in K X (t ) . 4!

3.6 NEGATIVE BINOMIAL DISTRIBUTION Negative Binomial distribution contains a countable number of values (0, 1, 2, …) with success probability (p) and failure probability (q) with two conditions i) 0 ≤ p ≤ 1 and ii) p + q = 1. For example, the number of loan offers given by the lender to the customers until the 5th loan is sanctioned. This distribution gives the probability of that number of failures before rth success. In Negative Binomial, distribution, the PMF of a random variable X is:

Discrete Distributions

93

 ( x + r − 1) ! r x pq  P( X= x= )  (r − 1)!( x )! 0 otherwise 

x = 0,1,2,...

 Another Form of Negative Binomial Distribution: −r !  x pr q x (−1) (−r − x)!( x )! )  P( X= x= 0 otherwise 

 Remark: if p = Since p + q =

x = 0,1,2,...

1 1 P q ⇒ Q = and q = ⇒P= Q p Q p

1 P + ⇒ P +1 = Q = Q − P = 1 Q Q

r x   1  P (−1) x − r Cx     P( X= x= )  Q Q  0

x=0,1,2,... otherwise

or (−1) x − r Cx Q − r − x P x x=0,1,2,... P( X= x= )  otherwise (3.6.1) 0

 Sample Generation from Negative Binomial Distribution (Figure 3.10)

FIGURE 3.10  Negative binomial distribution.

Distribution Theory

94

 Output:

3.6.1 PROPERTIES OF NEGATIVE BINOMIAL DISTRIBUTION a. Moment Generating Function (MGF) of Negative Binomial Distribution: It is another method for obtaining moments from the probability model. The expression for MGF (MX (t)) is obtained as below: M = E= (etX ) X (t )

P( X ∑e= tx

x)

x =0

= ∑ etx (−1) x − rCx Q − r − x P x = ∑ − r C x Q − r − x ( −e t P )

x

= x 0= x 0



MX (t) = (Q – Pet)–r  [using Binomial theorem] (3.6.2) We can get moments using MGF:



= µr′

∂r = M X (t ) t 0 ∂t r

= µ1′

∂ = M X (t ) t 0 ∂t

∂ ∂ M X (t ) =(Q − Pet ) − r = (−r )(Q − Pet ) − r −1 ( − Pet ) ∂t ∂t   [using Eqn. (3.6.2)] = rPet (Q − Pet ) − r −1 = rP(Q − P) − r −1



∂ M= rP(Q − P) − r −1 (3.6.3) X (t ) ∂t



′  M X (t )= t 0= = r (3.6.4) µ= 1  rP p  ∂t 

∂

= µ2′



∂2 = M X (t ) t 0 ∂t 2

∂2 ∂ ∂  M X (t ) =  M X (t )  ∂t  ∂t ∂t 2 

q

Discrete Distributions







95

∂ ∂ = (rP)(Q − Pet ) − r −1 et  =rP (Q − Pet ) − r −1 et  [using Eqn. (3.6.4)] ∂t ∂t = rP (Q − Pet ) − r −1 et + (−r − 1)(Q − Pet ) − r − 2 (− Pet )et  ∂2 M X= (t ) rP (Q − Pet ) − r −1 et + (r + 1) Pe 2t (Q − Pet ) − r − 2  (3.6.5) ∂t 2  ∂2  µ2′ =  2 M X (t ) t = 0  = rP [1 + (r + 1) P ]  ∂t 

µ2′= rP [1 + (r + 1) P ]= r = µ3′

q q q 1 + (r + 1) = r 2 (1 + rq ) (3.6.6) p p p

∂3 = M X (t ) t 0 ∂t 3

 ∂3 ∂3 ∂  ∂2 = = M ( t ) M ( t ) X X  2 M X (t )  3 3 ∂t  ∂t ∂t ∂t  ∂3 ∂  rP (Q − Pet ) − r −1 et + (r + 1) Pe 2t (Q − Pet ) − r − 2   = M X (t ) 3  ∂t   ∂t

= rP

[using Eqn. (3.6.5)]

∂ (Q − Pet ) − r −1 et + (r + 1) Pe 2t (Q − Pet ) − r − 2  ∂t 

∂ ∂  = rP  (Q − Pet ) − r −1 et + (r + 1) P e 2t (Q − Pet ) − r − 2  ∂t  ∂t  (Q − Pet ) − r −1 et + (r + 1) Pe 2t (Q − Pet ) − r − 2 + 2(r + 1) Pe 2t (Q − Pet ) − r − 2  = rP   2t t − r −3 t (r + 1)(r + 2) Pe (Q − Pe ) ( Pe ) 



(Q − Pet ) − r −1 et + 3(r + 1) Pe 2t (Q − Pet ) − r − 2  ∂3 M t = rP ( )   (3.6.7) X 2 3t t − r −3 ∂t 3  +(r + 1)(r + 2) P e (Q − Pe )   ∂3

 M X (t ) t = 0  = rP 1 + 3(r + 1) P + (r + 1)(r + 2) P 2   ∂t 

µ3′ = 

3

= r = r

q q q 1 + 3(r + 1) + (r + 1)(r + 2)   p  p  p

2

q 2  p + 3(r + 1)qp + (r + 1)(r + 2)q 2  p

  

Distribution Theory

96

q p3 q = r 3 p

= r



 p 2 + 3qrp + 3qp + r 2 q 2 + 3rq 2 + 2q 2  1 + q 2 + qp + 3qrp + 3rq 2 + r 2 q 2  q

′ rP 1 + 3(r + 1) P + (r + 1)(r + 2) P 2=  r 3 1 + q + 3qr + r 2 q 2  (3.6.8) µ= 3 p ∂4 ∂t

= M X (t ) t 0 Similarly, we can find µ4′ = 4

 Note: The expression PGF can be derived using the same MGF = PX (t ) ∑ = t x P ( X x). method. The formula of PGF is x

b. Characteristic Function (CF) of Negative Binomial Distribution: The expression for CF (ϕX (t)) is obtained as below: = φ X (t ) E= (eitX )

P( X ∑e= itx

x)

x =0

−r ! −r ! = eitx (−1) x Q−r − x P x = Q − r − x (−eit P) x ∑ ∑ r x x r x x ( − − )! ! ( − − )! ! ( ) ( ) x 0= x 0

KX (t) = log(MX (t)) [using Binomial theorem] c. Cumulant Generating Function (CGF) of Negative Binomial Distribution: We can derive the expression of the CGF to see the highlights of the probability model. The detailed derivation of the CDF is given below: KX (t) = log(MX (t)) KX (t) = log(Q – Pet)–r We can get center moments using cumulant generating function: KX (t) = – r log(Q – Pet) 







 t t2 t3 t4 + + + ....  [using exponential series] 1! 2! 3! 4! 

−r log  Q − P 1 + + =

  t t2 t3 t4  = −r log  Q − P + P  + + + + ....  1! 2! 3! 4!      t t2 t3 t4  = −r log 1 + P  + + + + ....  1! 2! 3! 4!  

Discrete Distributions

97

2   t t2 t3 t4   1  t t2 t3 t4   − P  + + + + .... − P 2  + + + + ....   1! 2! 3! 4!   2 1! 2! 3! 4!  = −r  3 4  2 3 4 2 3 4  1 3t t   t t t t  1 3t t  − 3 P 1! + 2! + 3! + 4! + .... − 4 P 1! + 2! + 3! + 4! + .... ..       r κr = coefficient of t in K X (t )

r! t q t ) rP = r κ1 = coefficient of 1! in K X (= p 2

q t2 q rqp + rq 2 κ2 = coefficient of in K X (t ) =rP + rP 2 =r + r   = p 2! p2  p rq ( p + q ) rq = = p2 p2 3 t κ3 = coefficient of in K X (t ) 3! 2 3 q q q rP + 3rP 2 + 2rP 3 = r + 3r   + 2r   κ3 = p  p  p = r



q  p 2 + 3qp + 2q 2  q 1 + q 2 + qp  q (1 + q ) =  r=   r 2 2 p p p p p3   

Similarly, we get κ4 = coefficient of r

q ( p 2 + 6q ) p4

2 Further, we can find µ4 =κ 4 + 3κ 2 =r

t4 rPQ (1 + 6 PQ) = in K X (t ) = κ4 = 4! q  p 2 + 3q (r + 2)  . p4 

 Note: The expression PGF can be derived using the same MGF = PX (t ) ∑ = t x P ( X x). method. The formula of PGF is x

d. Skewness and Kurtosis of Negative Binomial Distribution: The expressions of skewness (β1, γ1) and kurtosis (β2, γ2) four coefficients based upon the first four central moments are given below: 2

 q (1 + q )  r 2 p 3  µ3  (1 + q ) 2 = = β1 = 3 3 rq µ2  q  r p 2   

Distribution Theory

98

= γ1

µ4 β= = 2 µ22

r

µ µ2

3 = 3/ 2

(1 + q ) rq

q  p 2 + 3q (r + 2)   p 2 + 3q (r + 2)  p4  = 2 rq  q  r p 2     p 2 + 6q 

β2= 3 + 

rq

 p 2 + 6q  γ 2 = β2 − 3 =  rq

 Note: The other properties of the Negative Binomial distribution (Figure 3.11) can be derived utilizing the methods used above for the earlier derivations.

FIGURE 3.11  Shape of the negative binomial distribution.

3.7 HYPER-GEOMETRIC DISTRIBUTION Hyper-geometric distribution contains a non-negative finite number of values. Let us consider a container with N ‘T-shirts,’ out of which ‘M’ are white ‘T-shirts’ and rest (N-M) are Black ‘T-shirts.’ Further, we take the sample size n using simple random sampling without replacement. In the sample, the number of white shirts is ‘k’ and the rest (n-k) are black. The hypergeometric distribution gives the probability of getting ‘k’ white ‘T-shirts’ out of the n sample. In Hyper-Geometric distribution, the PMF is:

Discrete Distributions

99

 M Ck N − M Cn − k  N P( X= k= ) h(k ; N , M , n= )  Cn 0 

k=0,1,2....min(n,M) otherwise

Symbolically: X ~ Hypergeomtric(N, M, n, k). Here N, M, n, k are parameters of the distribution and k < n.  Sample Generation from Hypergeometric Distribution (Figure 3.12)

FIGURE 3.12  Hypergeometric distribution.

 Output:

3.7.1 PROPERTIES OF HYPER-GEOMETRIC DISTRIBUTION a. Moments of Hyper-Geometric Distribution: The first four raw moments μʹ1, μʹ2, μʹ3, μʹ4 have been derived as below: n

= µ1′ E= (X )

(X ∑ xP=

M

C x N − M Cn − x N Cn

n

= ∑x x =0

x =1

x)

Distribution Theory

100

M! ( N − M )!    x − 1!( M − x )! (n − x)!( N − M − n + x ) !   = ∑ N!  x =1    n ! N n ! − ( )   n

=M

=M

n   1 M − 1! ( N − M )!   ∑ N! x =1   x − 1!( M − x ) ! (n − x)!( N − M − n + x ) ! n !( N − n )! n   1 M − 1! ( N − M )!   ∑ N! x =1   x − 1!( M − x ) ! (n − x)!( N − M − n + x ) ! n !( N − n )!

let x − 1 =k   when x = 1 ⇒ k = 0     x = n ⇒ k = n-1  =M

=M

n −1   1 M − 1! ( N − M )!   ∑ N! k ! M − k − 1 ! ( n − k − 1)! N − M − ( n − 1 − k ) ! ) ( )  k =0   ( n !( N − n )! n −1   1 M − 1! ( N − M )!   ∑ N! k =0   k !( M − 1 − k ) ! (n − 1 − k )!( N − M − (n − 1 − k ) ) ! n !( N − n )!

Since we have =M

We know that

r a + b! a! b! =∑ r !( a + b − r )! k = 0 k !( a − k )! (r − k )!( b − r + k )!

{( M − 1) + ( N − M )}! 1 N! {( M − 1) + ( N − M ) − (n − 1)}!(n − 1)! n !( N − n ) ! ( N − 1)! 1 =M N! ( N − n) ( n − 1)! n !( N − n )! ( N − 1)! N! N = n !( N − n )! n {( N − 1) − (n − 1)}( n − 1)! =M

( N − 1)! 1 N! ( N − n) ( n − 1)! n !( N − n )!

Discrete Distributions

101

=M

( N − 1)! 1 ( N − 1)! ( N − n) ( n − 1)! N n ( N − n) ( n − 1)! µ1′ =

Thus,

n M (3.7.1) N

n M is the mean of the distribution. N n

P( X ∑ x=

= µ2′ E= (X 2) n

= ∑ x2 x =0 n

x =0 n

M

M

C x N − M Cn − x N Cn

n C x N − M Cn − x +∑x N Cn 0= x 0

= ∑ x( x − 1) x

x)

C x N − M Cn − x N Cn

∑ [ x( x − 1) + x ]

=

M

2

x =0

M

C x N − M Cn − x N Cn



n  M C x N − M Cn − x  = x( x − 1)  ∑  + µ1′ N Cn x =0  



n  M C x N − M Cn − x  n = x( x − 1)  ∑ + M N Cn x =0   N

[using Eqn. (3.7.1)]

M! ( N − M )!    x − 2!( M − x )! (n − x)!( N − M − n + x ) !  n  + M ∑ N!  N x=2    n !( N − n ) !   n



M − 2! ( N − M )!    x − 2!( M − x ) ! (n − x)!( N − M − n + x ) !  n + M M ( M − 1)∑  = N!  N x=2    n !( N − n ) !   n



k let x − 2 =   when x = 2 ⇒ k = 0     x = n ⇒ k = n-2 

Distribution Theory

102

M − 2! ( N − M )!    x − 2!( M − 2 − k ) ! (n − 2 − k )!( N − M − n + 2 + k ) !  n + M M ( M − 1)∑  = N!  N k =0    n !( N − n )!   n−2

r a + b! a! b! =∑ Since we have r !( a + b − r )! k = 0 k !( a − k )! (r − k )!( b − r + k )!

N − 2! n − 2!( N − n )! n = M ( M − 1) + M N! N n !( N − n )!

µ2′ = M ( M − 1)



= µ3′ E= (X 3) n

= ∑ x3

M

x =0

=

n(n − 1) n + M (3.7.2) N ( N − 1) N n

P( X ∑ x= 3

x)

x =0

C x N − M Cn − x N Cn

n

∑ [ x( x − 1)( x − 2) + 3x( x − 1) + x ] x =0

M

C x N − M Cn − x N Cn

M  n C x N − M Cn − x n + ∑ 3 x( x − 1)  ∑ x( x − 1)( x − 2) N Cn x 0= x 0  =  n M C N −M C x n− x +∑ x N Cn  x = 0



=

n

∑ x( x − 1)( x − 2) x =0

n

∑ x =0

M

M

C x N − M Cn − x   N Cn    

C x N − M Cn − x + 3µ 2′ + µ1′   [using (3.7.1) and (3.7.2)] N Cn

M! ( N − M )! x − 3!( M − x ) ! (n − x)!( N − M − n + x ) ! n(n − 1) M ( M − 1) nM +3 + N! N ( N − 1) N n !( N − n ) !

Discrete Distributions

103

n

= M ( M − 1)( M − 2)∑ x =3

+3

M − 3! ( N − M )! x − 3!( M − x ) ! (n − x)!( N − M − n + x ) ! N! n !( N − n )!

n(n − 1) M ( M − 1) nM + N ( N − 1) N k let x − 3 =   when x = 3 ⇒ k = 0     x = n ⇒ k = n-3 

M − 3! ( N − M )! k !( M − 3 − k ) ! (n − 3 − k )!( N − M − (n − 3 − k ) )! = M ( M − 1)( M − 2)∑ N! k =0 n !( N − n )! n −3

+3

n(n − 1) M ( M − 1) nM + N ( N − 1) N

Since we have

r a + b! a! b! =∑ r !( a + b − r )! k = 0 k !( a − k )! (r − k )!( b − r + k )!

N − 3! n − 3!( N − n )! n(n − 1) M ( M − 1) nM = M ( M − 1)( M − 2) +3 + N! N ( N − 1) N n !( N − n )! = M ( M − 1)( M − 2)



µ3′ = n(n − 1)(n − 2)

n(n − 1)(n − 2) n(n − 1) M ( M − 1) nM +3 + N ( N − 1)( N − 2) N ( N − 1) N

M ( M − 1)( M − 2) M ( M − 1) M + 3n(n − 1) +n (3.7.3) N ( N − 1)( N − 2) N ( N − 1) N

= µ4′ E= (X 4) n

= ∑ x4 x =0

M

n

P( X ∑ x= 4

x =0

C x N − M Cn − x N Cn

x)

Distribution Theory

104

=

n

∑ [ x( x − 1)( x − 2)( x − 3) + 6 x( x − 1)( x − 2) + 7 x( x − 1) + x ]

M

x =0

M n  n C x N − M Cn − x x x − x − x − + ∑ 6 x( x − 1)( x − 2) ( 1)( 2)( 3) ∑ N Cn x 0= x 0 = − M N M M n n  Cx Cn − x C x N − M Cn − x  + ∑ 7 x( x − 1) +∑x N N Cn Cn  x 0= x 0

=

n

∑ x( x − 1)( x − 2)( x − 3) x =0

n

∑ x=4

M

C x N − M Cn − x N Cn M

C x N − M Cn − x   N Cn    

C x N − M Cn − x + 6 µ3′ + 7 µ 2′ + µ1′ N Cn

M! ( N − M )! x − 4!( M − x )! (n − x)!( N − M − n + x ) ! + 6 µ3′ + 7 µ 2′ + µ1′ N! n !( N − n ) ! n

= M ( M − 1)( M − 2)( M − 3)∑ x=4

M − 4! ( N − M )! x − 4!( M − x )! (n − x)!( N − M − n + x ) ! N! n !( N − n ) !

+6 µ3′ + 7 µ 2′ + µ1′ k let x − 4 =   when x = 4 ⇒ k = 0     x = n ⇒ k = n-4  M − 4! ( N − M )! k !( M − 4 − k ) ! (n − 4 − k )!( N − M − n + 4 + k )! = M ( M − 1)( M − 2)( M − 3)∑ N! k =0 n !( N − n )! n−4

+6 µ3′ + 7 µ 2′ + µ1′

Since we have

r a + b! a! b! =∑ r !( a + b − r )! k = 0 k !( a − k )! (r − k )!( b − r + k )!

N − 4! (n − 4)!( N − n )! = M ( M − 1)( M − 2)( M − 3) + 6 µ3′ + 7 µ 2′ + µ1′ N! n !( N − n )!

Discrete Distributions

105

= M ( M − 1)( M − 2)( M − 3)

n(n − 1)(n − 2)(n − 3) + 6 µ3′ + 7 µ 2′ + µ1′ N ( N − 1)( N − 2)( N − 3)

M ( M − 1)( M − 2)( M − 3)    n(n − 1)(n − 2)(n − 3) N ( N − 1)( N − 2)( N − 3)   = M ( M − 1)( M − 2) M ( M − 1) M   +6n(n − 1)(n − 2) N ( N − 1)( N − 2) + 7 n(n − 1) N ( N − 1) + n N    = µ4′ n

M ( M − 1)  ( M − 2)  ( M − 3)   1 + (n − 1) 7 + (n − 2)  6 + (n − 3)   ( N − 1)  ( N − 2)  ( N − 3)   (3.7.4) N 

b. Central Moments Hyper-Geometric Distribution: The first four central moments μ1 , μ2, μ3, μ4 have been derived as below: μ1 = 0 μ2 = μʹ2, – μʹ12 2



=n(n − 1)

= n = n

M ( M − 1) M  M + n +  n  [using Eqns. (3.7.1) and (3.7.2)] N ( N − 1) N  N

= n

M  ( M − 1) M (n − 1) +1− n   N  ( N − 1) N

M N ( N − 1)

[ N (n − 1)( M − 1) + N ( N − 1) − nM ( N − 1)]

2

M  NnM − Nn − NM + N + N 2 − N − nNM + nM  N ( N − 1) 2

= n



M  − Nn − NM + N 2 + nM  N 2 ( N − 1) 

µ2 = n

M (N-n)(N-M) (3.7.5) N 2 ( N − 1) M

So, the variance of the Hyper-geometric distribution is n 2 N ( N − 1) (N-n) (N-M) Similarly, we get third and fourth central moments.

 Note: The other properties of the Hypergeometric distribution can be derived utilizing the methods used above for the earlier derivations.

Distribution Theory

106

KEYWORDS • characteristic function • • • • •

hyper-geometric distribution moment generating function probability generating function probability mass function random variable

REFERENCES Casella, G., & Berger, R. L., (2002). Statistical Inference. Duxbury: Belmont, CA. Gupta, S. C., & Kapoor, V. K., (1997). Fundamentals of Mathematical Statistics. Sultan Chand and Sons: New Delhi. Heumann, C., Schomaker, M., & Shalabh, (2016). Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R. Springer. Hogg, R. V., McKean, J. W., & Craig, A. T., (2013). Introduction to Mathematical Statistics (7th edn.). Pearson: Boston. Miller, I., Miller, M., Freund, J. E., & Miller, I., (2004). John E. Freund’s Mathematical Statistics with Applications. Prentice Hall: Upper Saddle River, NJ. Rohatgi, V., Saleh, A. K. Md. E., (2015). Introduction to Probability and Statistics (3rd edn.). John Wiley.

CHAPTER 4

Continuous Distributions SHWETA DIXIT1, MUKTI KHETAN2, MOHD. ARSHAD3, PRASHANT VERMA4, and ASHOK KUMAR PATHAK5 Clinical Development Services Agency, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Haryana, India 1

Department of Mathematics, Amity School of Applied Sciences, Amity University Mumbai, Maharashtra, India

2

Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India

3

Department of Statistics, Faculty of Science, University of Allahabad, Prayagraj, India

4

Department of Mathematics and Statistics, School of Basic and Applied Sciences, Central University of Punjab, Bathinda, Punjab, India

5

4.1 UNIFORM DISTRIBUTION A uniform distribution is defined on an interval (a, b). If a random variable X follows a uniform distribution, the expression of the probability density function (PDF) is (Figure 4.1):  1  f ( x; a, b) =  b − a 0

if a < x < b otherwise

Distribution Theory: Principles and Applications. Fozia Homa, Mukti Khetan, Mohd. Arshad, & Pradeep Mishra (Naz) (Eds.) © 2024 Apple Academic Press, Inc. Co-published with CRC Press (Taylor & Francis)

Distribution Theory

108

Symbolically: X~U (a, b). Here ‘a’ and ‘b’ are the parameters of the distribution. The cumulative distribution function (CDF) is: F ( x)= P( X ≤ x)=

x

x

a

a

1

∫ f (t )dt= ∫ b − a dt=

0 x-a  = F ( x)  b - a 1

x−a , a ≤ x 4 [Using beta function]

( β − 1)( β − 2 )( β − 3)( β − 4 )

; β > 4. (4.4.4)

Continuous Distributions

127

 Alternative Method: (X r ) = µr′ E=



r ∫x 0



=

µr′ =



1 xα −1 1 xα + r −1 = dx dx B(α , β ) (1 + x)α + β B(α , β ) ∫0 (1 + x)α + β

1 B(α + r , β − r ); B(α , β ) (α + r − 1)!( β − r − 1) ! (α − 1)!( β − 1)!

β > r [using beta function] ;

β > r. (4.4.5)

Put r = 1 in Eqn. (4.4.5), we get: = µ1′

(α )!( β − 2 )! = (α − 1)!( β − 1)!

α . ( β − 1)

Put r = 2 in Eqn. (4.4.5), we get: ′ µ= ′ µ= 2 r

(α + 1)!( β − 3)! = (α − 1)!( β − 1) !

α (α + 1) . ( β − 1)( β − 2 )

Put r = 3 in Eqn. (4.4.5), we get: = µ3′

(α + 2)!( β − 4 )! = (α − 1)!( β − 1)!

α (α + 1)(α + 2)

( β − 1)( β − 2 )( β − 3)

.

Put r = 4 in Eqn. (4.4.5), we get: = µ4′

(α + 3)!( β − 5 ) ! = (α − 1)!( β − 1)!

α (α + 1)(α + 2)(α + 3) β − ( 1)( β − 2 )( β − 3)( β − 4 )

b. Central Moments of Beta Distribution of Second Kind: The first four central moments μ1 , μ2, μ3, μ4 have been derived as below:

μ1 = 0

(4.4.6)

2

2 µ2 = µ2′ − µ 1′ =

 α  α (α + 1) −   [using Eqns. (4.4.1) and (4.4.2)] ( β − 1)( β − 2 )  ( β − 1)  =



α  (α + 1)  α   −   ( β − 1)  ( β − 2 )  ( β − 1)   =

α ( β + α − 1)

( β − 1) ( β − 2 ) 2

(4.4.7)

Distribution Theory

128

So, variance of the beta distribution of second kind is α ( β +2 α − 1) ; β > 2.

( β − 1) ( β − 2 )

α (α + 1)(α + 2) µ3 =µ3′ − 3µ2′ µ1′ + 2 µ ′ = ( β − 1)( β − 2 )( β − 3) 3 1

α (α + 1)  α   α  −3   + 2  ( β − 1)( β − 2 )  ( β − 1)   ( β − 1)  =

3

α (α + 1)(α + 2) α2 + ( β − 1)( β − 2 )( β − 3) ( β − 1)2

[using Eqns. (4.4.1)–(4.4.3)] α   (α + 1) +2 −3  ( β − 1)   ( β − 2 )

α (α + 1)(α + 2) α2 + {−3(α + 1) ( β − 1) + 2α ( β − 2 )} ( β − 1)( β − 2 )( β − 3) ( β − 2 )( β − 1)3 α (α + 1)(α + 2) α2 + {−3(αβ + β − α − 1) + 2 (αβ − 2α )} ( β − 1)( β − 2 )( β − 3) ( β − 2 )( β − 1)3  (α + 1)(α + 2) α ( −αβ − α − 3β + 3)  +   2 ( β − 2 )( β − 1)  ( β − 3)  ( β − 1)

α

α (α + 1)(α + 2) ( β − 1)2 + α ( −αβ − α − 3β + 3)( β − 3)   ( β − 1) ( β − 2 )( β − 3)  3

=

α ( β − 1) ( β − 2 )( β − 3) 3

(α 2 + 3α + 2) ( β 2 + 1 − 2 β ) + ( −α 2 − α 2 β − 3αβ + 3α ) ( β − 3)    =



( β − 1) ( β − 2 )( β − 3)

=

3

2 2  2α − 3α + 3βα + β − 2 β + 1



( β − 1) ( β − 2 )( β − 3) 3

( 2α + β − 1) (α + β − 1)  (4.4.8)

Similarly, we can find the fourth raw moment. c. Skewness and Kurtosis of Beta Distribution of Second Kind: To find the shape of the distribution, we have derived the expressions of skewness (β1, γ1) and kurtosis (β2, γ2), which are given below:

Continuous Distributions

129

  2α ( 2α + β − 1) (α + β − 1)    3  µ32  ( β − 1) ( β − 2 )( β − 3) = β1 = 3 3 µ2  α ( β + α − 1)    2  ( β − 1) ( β − 2 ) 

2

4 ( β − 2 )( 2α + β − 1) β1 =   [using Eqns. (4.4.7) and (4.4.8)] 2 α ( β − 3) ( β + α − 1) 2

4 ( β − 2 )( 2α + β − 1) 2 ( 2α + β − 1) ( β − 2) = . 2 − β 3 α ( ) ( β + α − 1) α ( β − 3) ( β + α − 1) 2

= γ1

= β1

Similarly, we can find the measure of kurtosis. 4.6 NORMAL DISTRIBUTION A normal distribution is defined on the set of all real numbers R. If a random variable X follows a normal distribution, the expression of the PDF is 1  x−µ   1 -   2 σ   e 2 f ( x; µ , σ ) =  2πσ  0

2

-∞ < x < ∞ otherwise

Symbolically: X ~ N(μ, σ2) Here μ and σ2 are the parameters of the distribution and the range of the parameter should be –∞ < μ < ∞ and σ2 > 0. The CDF of the continuous normal distribution is (Figure 4.5): F ( x)= P( X ≤ x)=

x



f (t )dt=

−∞

= F ( x)

x



−∞

1 2πσ

x



−∞

e

1  t −µ  -   2 σ 

1 2πσ

e

1  t −µ  -   2 σ 

2

dt

−∞ < x < ∞

2

dt

Distribution Theory

130

FIGURE 4.5  Normal distribution.

 Sample Generation of Normal Distribution R Code: Generate a random sample from normal distribution. x = rnorm(n, μ, σ) where; n is the sample size; μ, σ are the parameters of the distribution Example: x = rnorm(10, 0, 1) Output:

4.6.1 PROPERTIES OF NORMAL DISTRIBUTION a. Moments of Normal Distribution: The first four raw moments μʹ1 , μʹ2, μʹ3, μʹ4 have been derived as below: (X ) = µ1′ E=





−∞

−∞

x)dx ∫ x ∫ xf (=

1 2πσ

e

1  x−µ  -   2 σ 

2

dx

Continuous Distributions

131

x−µ

Let u =

σ 1

⇒ du= ∞

= ∫ (uσ + µ ) −∞

→σ





−∞





and

1

−∞

function]

e

2πσ

1 - u2 2





−∞

1 - u2 2



du = 2 ∫ 0

1

e



1 - u2 2



∫ 0



1 2π

⇒ dz = udu ∞

1 - u2



e 2 du is odd function] ∞

1



du [since

∫e

1 - u2 2



1 - u2

e 2 du is even

du

0

1 2 u ⇒ u= 2

Let z=

2

1

1 - u2

1

−∞

⇒ 2µ

= µ



1 - u2

1

e 2 du + µ ∫ e 2 du σ du = σ∫u 2π 2π −∞ −∞

e 2 du = 0 [since ∫ u e



dx= σ du

dx ⇒

1 - u2

1

∫u

1

σ

⇒ x = uσ + µ

2z 1

⇒ du =

2z

dz



1 −1 1 1 1 1 -z 2 e= dz µ = z e- z dz µ= 1/ 2 µ ∫ 2z π 0 π π

π

= μ (4.5.1) Thus, mean of the normal distribution is μ. ∞

∫x

(X ) = µ2′ E= 2

2

( x)dx f=

−∞



2 = ∫ (uσ + µ ) −∞

=

1 2π

1 2πσ ∞

x−µ

σ 1

σ e

2

1 2πσ

e

1  x−µ  -   2 σ 

2

dx

⇒ x = uσ + µ

dx ⇒

1 - u2 2

∫ {(uσ )

−∞

∫x

2

−∞

Let u = ⇒ du=



dx= σ du 1



1 - u2

σ du = (uσ + µ ) 2 e 2 du ∫ 2π −∞ + µ 2 + 2 µ uσ )} e

1 - u2 2

du

Distribution Theory

132

∞ ∞ ∞ 1 1 1 - u2 - u2 - u2  1  2 2 2 σ ∫ u e 2 du + µ ∫ e 2 du + 2σµ ∫ u e 2 du  2π  −∞  −∞ −∞

=

→ 2 µσ





−∞







1 - u2

1

1 - u2

1

∫u

−∞

and ∞



e 2 du = 0 [odd function]



1 1 - u2 - u2 1 1 2 2 2= e du & u e ∫0 2π ∫−∞ 2π 2 du

e 2 du



=

π 1

{2σ

{σ π

2

2

1 - u2 2

du

⇒ du =

2z 1 2z

dz

∞ ∞  2  2 1 1 -z dz + µ 2 ∫ e- z dz  σ ∫ 2 z e 2π  0 2z 2 z  0

= 1



e

[even functions]

1 2 u ⇒ u= 2

⇒ dz = udu

=

0

1

∞ ∞ 1 - u2 2  2 2 - 12 u 2  2 σ ∫ u e du + µ ∫ e 2 du  2π  0  0

Let z=





2∫ u 2

∞ ∞ 1 −1 1  2 32 −1 - z  2 -z 2σ ∫ z e dz + µ ∫ z 2 e dz  π  0 0 

1  21  2 2σ Γ(1/ 2) + µ Γ(1/ 2)  2 π  

Γ(3 / 2) + µ 2 Γ= (1/ 2)}

π + µ2 π

}



μ2 + σ2 (4.5.2) (X 3) = µ3′ E=





( x)dx x3 f =

−∞



−∞

x−µ

Let u = ⇒ du=



σ 1

σ

x3

1 2πσ

⇒ x = uσ + µ

dx ⇒

dx= σ du

e

1  x−µ  -   2 σ 

2

dx

Continuous Distributions

133



3 = ∫ (uσ + µ ) −∞

=



e

2πσ



1

=

1

∫ {( uσ )

3

1 - u2 2

1



1 - u2

3 σ du = ∫ (uσ + µ ) e 2 du 2π −∞

}

1 - u2

+ µ 3 + 3 ( uσ ) µ + 3µ 2 uσ e 2 du 2

−∞

∞ ∞ ∞ ∞ 1 1 1 1 - u2 - u2 - u2 - u2 1  3  3 3 2 2 2 σ ∫ u e 2 du + µ ∫ e 2 du +3σµ ∫ u e 2 du + 3σ µ ∫ u e 2 du  2π  −∞ −∞ −∞ −∞  ∞



1 - u2

1 - u2

= σ 3 ∫ u 3 e 2 du 3= σµ 2 ∫ u e 2 du 0



−∞

and ∞



1 - u2

µ 3 ∫ e 2 du



1 - u2



1 - u2

2= µ 3 ∫ e 2 du & 3σ 2 µ ∫ u 2 e 2 du 6σ 2 µ ∫ u 2 e 0

−∞

⇒ dz = udu

2z 1

⇒ du =

2z

dz

∞ ∞ 2  3 - z 1 1  dz +3σ 2 µ ∫ 2 z e- z dz  µ ∫ e 2π  0 2z 2 z  0

=

= 1

du

[even functions]

1 2 u ⇒ u= 2

Let z=

1 - u2 2

0

−∞



=

[odd functions]

−∞

∞ ∞ 3 1 −1 −1 1  3  -z 2 -z  µ ∫ z 2 e dz +6σ µ ∫ z 2 e dz  π  0 0 

{µ Γ(1/ 2) + 6σ π 3

2

1  3 1  2 π  µ π + 6σ µ 2 π  

µ Γ(3 / 2)}=

= μ3 + 3σ2 μ (4.5.3) (X ) = µ4′ E= 4



∫x

4

( x)dx f=

−∞

Let u = ⇒ du=



∫x

−∞

x−µ

σ 1

σ

4

1 2πσ

⇒ x = uσ + µ

dx ⇒

dx= σ du

e

1  x−µ  -   2 σ 

2

dx

Distribution Theory

134



4 = ∫ (uσ + µ ) −∞

=

=

1 2π



∫ {( uσ )

4

1 2πσ

e

1 - u2 2

1



1 - u2

4 σ du = ∫ (uσ + µ ) e 2 du 2π −∞

}

1 - u2

+ µ 4 + 4 ( uσ ) µ + 4 µ 3uσ + 6 ( uσ ) µ 2 e 2 du 3

2

−∞

∞ ∞ 1 1 - u2 - u2  4 ∞ 4 - 12 u 2  4 3 3 2 2 σ ∫ u e du + µ ∫ e du + 4σµ ∫ u e du + 4σ µ  1  −∞  −∞ −∞ ∞  ∞ 1 2 1 2 u u 2π  3 2  2 2 2 2 + e 6 e u du σ µ u du ∫ ∫  −∞  −∞  ∞



1 - u2

1 - u2

→ 4σµ 3 ∫ u e 2 du = 4σ 3 µ ∫ u 3 e 2 du = 0 [odd functions]



−∞

and

−∞

∞ ∞ ∞ 1 1 1 - u2 - u2 - u2  4 ∞ 4 - 12 u 2  = σ 4 ∫ u 4 e 2 du, µ 4 ∫ e 2 du 2µ 4 ∫ e 2 du  σ ∫ u e du 2= −∞ −∞ 0 0  → ∞ ∞ 1 2 1 2   - u - u & 6σ 2 µ 2 ∫ u 2 e 2 du = 12σ 2 µ 2 ∫ u 2 e 2 du  −∞ 0  



∞ ∞ ∞ 1 1 - u2 - u2 2  4 4 - 12 u 2  4 2 2 2 σ ∫ u e du + µ ∫ e 2 du + 6σ µ ∫ u e 2 du  2π  0 0 0 

Let z=

1 2 u ⇒ u= 2

⇒ dz = udu =

[even functions]

⇒ du =

2z 1 2z

dz

∞ ∞ ∞  2  4 1 1 1 2 -z dz + µ 4 ∫ e- z dz + 6σ 2 µ 2 ∫ 2 z e- z dz  σ ∫ ( 2 z ) e 2π  0 2z 2z 2 z  0 0

=

∞ ∞ ∞ 1  4 2 - z 1 1 1  dz + µ 4 ∫ e- z dz + 12σ 2 µ 2 ∫ z e- z dz  4σ ∫ z e π  z z z  0 0 0

=

∞ ∞ ∞ 3 1 −1 −1 1  4 52 −1 - z 1  dz + µ 4 ∫ z 2 e- z dz + 12σ 2 µ 2 ∫ z 2 e- z dz  4σ ∫ z e π  z 0 0 0 

Continuous Distributions

= =

1

{4σ π

4

Γ(5 / 2) + µ 4 Γ(1/ 2) + 12σ 2 µ 2 Γ(3 / 2)}

1  431 1  Γ(1/ 2) + µ 4 Γ(1/ 2) + 12σ 2 µ 2 Γ(1/ 2)  4σ 22 2 π   =



135

1

π

{3σ

4

π + µ 4 π + 6σ 2 µ 2 π

}

= 3σ4 + μ4 + 6σ2 μ2 (4.5.4)

b. Central Moments of Normal Distribution: The first four central moments μ1 , μ2, μ3, μ4 have been derived as below:

μ = 0 [first central moments always zero]  (4.5.5) μ2 = μʹ2 – μʹ12



μ2 = σ2 [using Eqns. (4.5.1) and (4.5.2)]  (4.5.6) Thus, variance of the uniform discrete distribution is σ2 µ3 = µ3′ − 3µ2′ µ1′ + 2µ 1′3



μ3 = 0 [using Eqns. (4.5.1)–(4.5.3)]  (4.5.7) µ4 = µ4′ − 4µ3′ µ1′ + 6µ2′ µ 1′2− 3µ 1′4



μ4 = 3σ4 [using Eqns. (4.5.1)–(4.5.4)]  (4.5.8)

c. Skewness and Kurtosis of Normal Distribution: To find the shape of the distribution, we have derived the expressions of skewness (β1, γ1) and kurtosis (β2, γ2), which are given below:

β1 =

µ32 [using Eqns. (4.5.6) and (4.5.7)] µ23

β1 = 0 = γ1



β2 =

= β1 0

µ4 =3 [using Eqns. (4.5.6) and (4.5.8)] µ22

γ 2 = β2 – 3 = 0 This implies that the distribution is symmetric and mesokurtic.

Distribution Theory

136

d. Moment Generating Function (MGF) of Normal Distribution: The expression for MGF (MX (t)) is obtained as below: ∞

∫e

tX

(e ) M= E= X (t )

tx



∫e

( x)dx f=

−∞



2πσ

−∞

x−µ

Let u = ⇒ du=

1

tx

σ 1

σ

2

dx

⇒ x = uσ + µ dx= σ du

dx ⇒ ∞

1 - u2 1 t ( uσ + µ ) 2 e = e σ du ∫−∞ 2πσ

e

1  x−µ  -   2 σ 

1 - u2 1 tµ = e ∫ etuσ e 2 du 2π −∞



1 - ( u 2 − 2 tuσ + t 2σ 2 − t 2σ 2 ) 1 tµ = e ∫e 2 du 2π −∞

1

e



1 2π

∞ 1 t µ + t 2σ 2 2

∫e



et µ

∫e

-

1 2 u − 2 tuσ 2

(

)

du

−∞ -

1 ( u − tσ )2 2

du

−∞

Let s = u - tσ ⇒ u = s + tσ ⇒ ds = du =e

∞ 1 t µ + t 2σ 2 2

1





−∞

Let p =

e

-

1 2 s 2

ds

s2 1 ⇒ 2dp = 2 sds ⇒ ds = dp ⇒ ds = s 2





1 2p

dp



1 1 1 t µ + t 2σ 2 t µ + t 2σ 2 t µ + t 2σ 2 1 -p 1 1 -1/2 - p 1 1/2-1 - p 2 2 2 e = e dp 2 = e p e dp e p e dp, ∫−∞ 2π ∫ ∫ 2p π 0 2 π 0

Using gamma function, we get: =

1

π

e

1 t µ + t 2σ 2 2

Γ(1/ = 2)

1

π

e

1 t µ + t 2σ 2 2

π

1 t µ + t 2σ 2

⇒ M X (t ) = e 2

Note: The expression PGF can be derived using the same MGF method. x The formula of PGF is PX (t ) = ∫ t f ( x)dx. x

e. Characteristic Function (CF) of Normal Distribution: The expression for CF (ϕX (t)) is obtained as below:

Continuous Distributions

φ= E= (e ) X (t ) itx

137



∫e

itx



∫e

( x)dx f=

−∞

itx

1 2πσ

−∞

e

1  x−µ  -   2 σ 

2

dx.

Note that this integral is a complex-valued integral, which is beyond the scope of this book. Therefore, we will state the characteristic function without proof. 1 it µ − t 2σ 2

⇒ φ X (t ) = e 2

f. Cumulants Generating Function of Normal Distribution: The expression for CGF (KX (t)) is obtained as below: KX (t) = log(MX (t))  t µ + 1 t 2σ 2  1 2 2 ) log  e 2 = K X (t=  tµ + t σ 2  

We can get central moments using cumulants generating function: κ1 = mean; κ 2 = variance; κ 3 = µ3 ;

κ4 = −3κ 2 + µ4 .

1 ) t µ + t 2σ 2 . K X (t= 2 tr κr = coefficient of in K X (t ). r! t κ1 = mean = coefficient of in K X (t ) = µ. 1!

κ2 = variance = coefficient of κ3 = μ3 = coefficient of κ4 = coefficient of

t2 in K X (t ) = σ 2 . 2!

t3 in K X (t ) = 0. 3!

t4 in K X (t ) = 0. 4!

µ4 =κ 4 + 3κ 22 = 0 + 3σ 2 = 3σ 2 .

g. General Central Moment of Normal Distribution: Case I: Odd order central moment of normal distribution µ2 n +1 =E ( X − µ )

2 n +1



=∫ ( x − µ ) −∞

2 n +1



f ( x)dx =∫ ( x − µ ) −∞

2 n +1

1 2πσ

e

1  x−µ  -   2 σ 

2

dx

Distribution Theory

138

x−µ

Let u = ⇒ du=

σ 1

⇒ x = uσ + µ dx= σ du

dx ⇒

σ



1 - u2 1 2 n +1 2 ( ) e = u σ σ du ∫−∞ 2πσ

1 2π





σ ( 2 n +1)



u ( 2 n +1) e

-

1 2 u 2

du

−∞

[Since the above function is odd]

μ2n+1 = 0 So, all the odd central moments are 0 (μ1 = μ3 = μ5 = … = μ2n+1 = 0) Case II: Even order central moment of normal distribution µ2 n = E ( X − µ ) = 2n



∫ (x − µ)

2n

f ( x)dx =

−∞

Let u = ⇒ du=

x−µ

σ

2n

1

σ

1 2πσ

dx= σ du

dx ⇒

1

σ ( 2n)







1 - u2

u ( 2 n ) e 2 du

−∞

u2 ⇒ 2dp = 2udu ⇒ du = 2

1 2p

dp



1 1 n +1 ( n) = σ ( 2 n ) ∫ ( 2p ) e- p dp σ ( 2 n ) ( 2 ) 2π 2p −∞ ∞

∫2 0

1

π



2 p n-1/2 e- p dp = ( 2 ) σ ( n ) ∫ n

0

µ2 n

1

π

p n+1/2-1e- p dp

1  2n σ 2 n  n +  ! 2  =

π

Case n=1  1 3 2σ 2 1 +  ! 2σ 2   !  2 2 σ2 µ2 = = =

π

1  x−µ  -  σ 

e 2

⇒ x = uσ + µ

1 - u2 1 2n 2 ( ) e = u σ σ du ∫−∞ 2πσ

Let p =

∫ (x − µ)

−∞



=



π

2

dx

Continuous Distributions

139

Case n=2 1  5 22 σ 4  2 +  ! 4σ 4   ! 2   2  3σ 2 µ4 = = =

π

π

Similarly, we get μ6 = μ8, …, μ2n h. Mode of Normal Distribution: Mode is the value of x for which f (x) is maximum, i.e., the mode is the solution of f ʹ(x) = 0 and f ʺ(x) = 0

We have:

1  x−µ   1 -   2 σ   e f ( x; µ , σ 2 ) =  2πσ  0

2

-∞ < x < ∞ otherwise

1  x−µ    1  - 1  x − µ  -     1  2 σ  = log f ( x) log  = e 2  σ   log   + log  e 2 2 πσ πσ      2

 1  1 x−µ  = log  -    2πσ  2  σ 

2

  

2

2 ∂ ∂   1  1 x−µ    x−µ  log f ( x) = log       = − 2  ∂x ∂x   2πσ  2  σ    σ 

1  x−µ   x−µ  −  2  ⇒ f '( x) = −  2  f ( x) f ′( x) = f ( x)  σ   σ   x−µ   2  f ( x) = 0  σ  ⇒ x − µ = 0 ⇒ x = µ [Since f(x) & σ2 > 0]

Now,

∂ ∂   x−µ   1 ∂  f ′( x) = − − 2  ( x − µ ) f ( x)   f ( x)  = x ∂x ∂x   σ 2  ∂ σ    1 = − 2 [ f ( x) + ( x − µ ) f '( x) ]

σ

Substitute the value of f ʹ(x) in the above function:

Distribution Theory

140

2 2  (x − µ) ∂ 1  f ( x)  ( x − µ )  f '( x) = − 2  f ( x) − f ( x)  = − 2 1 −  ∂x σ  σ2 σ  σ 2  

 f ( x)  ( x − µ )2    f ( x)  f ''( x) = −  2 1 − − < 0.  = σ 2    σ 2  x = µ  σ  

Hence, x = μ, is the mode of the normal distribution. 4.7 LOG-NORMAL DISTRIBUTION

A log-normal distribution is defined on the interval (0, ∞). If logeX follows a normal distribution, then a positive random variable X follows a lognormal distribution. In other words, if loge X ~ N(μ, σ2) then X ~ Lognormal distribution. The expression of the PDF of random variable X is: 1  log x − µ   1 -  e   e 2 σ  2 f ( x; µ , σ ) =  x 2πσ  0

2

if 0 < x < ∞ otherwise

Here μ and σ are two parameters of the distribution and the range of the parameters should be μ ∈ R and σ > 0. Remark: If logeX follows a standard normal distribution, then a positive random variable X follows standard log-normal distribution (Figure 4.6). 2

FIGURE 4.6  Log normal distribution.

Continuous Distributions

141

 Sample Generation of Log-Normal Distribution R Code: Generate a random sample from Log-Normal distribution. x = rlnorm(n, μ, σ) where; n is the sample size; μ, σ are the parameters of the distribution Example: x = rlnorm(10, 0.5, 1) Output:

4.7.1 PROPERTIES OF THE LOG-NORMAL DISTRIBUTION a. Distribution Function of Log Normal Distribution: FX ( x)= P ( X ≤ x = ) P ( log e X ≤ log e x ) log e x − µ   log e X − µ log e x − µ   ≤ =P  =P  Z ≤  , σ σ σ    

where; Z ~ N(0,1)

 log e x − µ  FX ( x) = Φ  , σ  

where; Φ denotes the CDF of N(0,1) b. Median of Log Normal Distribution: Let X~Log normal distribution with parameters μ and σ2 and loge X ~ N(μ σ2). Then Median(loge X)= μ loge(Median X)= μ [If f(x) is a monotonic function. Then median of f(x) = f(median(x))] ⇒ Median = eµ

So, the median of the log-normal distribution is eμ. c. Mode of Log Normal Distribution: The PDF of the log-normal distribution is given below:

Distribution Theory

142

1  log x − µ   1 -  e   e 2 σ  f ( x) =  x 2πσ  otherwise 0 2

if 0< x 0, 1  log x − µ   1 -  e   log e f ( x) = log e e 2 σ   x 2πσ 

log e f ( x) = − log e x − log e

   

1  log e x − µ  2πσ -   2 σ 

(

d log e f ( x) = − log e x − log e dx

2

)

(

2

1  log e x − µ  2πσ -   2 σ 

)

f ′( x) 1 1  log e x − µ  1 = − - 2 σ x f ( x) x 2  σ  f ′( x) = −

f ( x)   log e x − µ   1+  x   σ2 

Note that f ʹ (x) = 0 ⇒−

f ( x)   log e x − µ   1+   =0 x   σ2 

As we know that f (x) & x ≠ 0  log e x − µ  ⇒ 1+  0 = σ2   ⇒

log e x − µ

σ2

= −1

⇒ log e x − µ = −σ 2 ⇒x= e µ −σ > 0 f ′( x) =  < 0

2

x < e µ -σ

2

x > e µ -σ

2

By 1st derivative test, f(x) is maximum at x = e µ −σ

2

2

Continuous Distributions

143

2

Hence x = e µ −σ is the mode of the distribution. Remark:

1

µ+ σ2

Mean = e 2

> Median = e µ > Mode = e µ −σ

2

The derivation of the mean is derived in the next coming section of “Moments of Log Normal Distribution.” Hence, Log-normal Distribution is positively skewed. d. Moments of Log Normal Distribution: The first four raw moments μʹ1 , μʹ2, μʹ3, μʹ4 have been derived as below: (X ) = µ1′ E=





0

0

1

x)dx ∫ x ∫ xf (= x

e

2πσ

1  log e x − µ    σ 2 

-

2

dx

log e x − µ uσ + µ ) = ⇒ x e(

= Let u

σ

1 ⇒ du = dx ⇒ = dx xσ du ⇒ = dx e(uσ + µ )σ du xσ

Using the above results, we get: ∞



1 1 - u2 - u2 1 1 ( uσ + µ ) µ uσ 2 2 e e e = σ du e = e du e µ ∫ ∫ 2πσ 2π −∞ −∞ ∞

1 1 2 1 - ( u −σ ) + σ 2 µ+ σ2 1 µ 2 2 2 e = e= du e ∫ 2π −∞



1





e

1 2π -





e

-

1 2 u − 2 uσ 2

(

)

du

−∞

1 ( u −σ ) 2 2

du

−∞

Let p = u − σ ⇒ dp = du

Applying the above results, we get: =e

1 2

µ+ σ2



1





−∞ ∞

As we know that



−∞

e

-

1 2 p 2

dp

1 - p2

1

e 2 dp = 1, therefore:





=e

1 2

µ+ σ2

. (4.6.1) 1

So, mean of the log-normal distribution is e µ + 2 σ . (X ) = µ2′ E= 2



∫x 0

2

( x)dx f=



∫x 0

2

1 x 2πσ

e

2

-

1  log e x − µ    2 σ 

2

dx

Distribution Theory

144

log e x − µ = ⇒ x e(uσ + µ )

= Let u

σ

1 ⇒ du = dx ⇒ = dx xσ du ⇒ = dx e(uσ + µ )σ du xσ

Substituting the above results, we get: ∞



1 1 - u2 - u2 1 1 1 2( uσ + µ ) 2µ 2 uσ 2 2 e e e = σ du e = e du e 2 µ ∫−∞ ∫ 2πσ 2π −∞ 2π





e

-

1 2 u − 4 uσ 2

(

)

du

−∞

Let p =u − 2σ ⇒ dp =du

Using the above results, we get: = e 2 µ + 2σ

2





−∞ ∞



As we know that

−∞

1 2π

e

-

1 2 p 2

dp

1 - p2

1

e 2 dp = 1, therefore:



2



= e 2 µ + 2σ (4.6.2) (X ) = µ3′ E= 3



∫x

3

( x)dx f=

0

= Let u



∫x

1

3

0

x 2πσ

e

-

1  log e x − µ    2 σ 

2

dx

log e x − µ = ⇒ x e(uσ + µ )

σ

1 dx ⇒ = dx xσ du ⇒ = dx e(uσ + µ )σ du xσ

⇒ du =

Using the above results, we get: ∞

=

1 - u2 1 3( uσ + µ ) 2 = e e σ du e3 µ ∫−∞ 2πσ ∞

∫e

−∞



3uσ

e

1 - u2 2

du = e3 µ

1 2π





e

1 2π

1 - u 2 − 6 uσ 2

(

)

du

−∞

1 9 2 9 - ( u − 3σ ) + σ 2 3µ + σ 2 1 3µ 2 2 2 e e= du e ∫ 2π −∞

1 2π

Let p =u − 3σ ⇒ dp =du





−∞

e

-

1 ( u − 3σ )2 2

du

Continuous Distributions

145

Applying the results mentioned above, we have: =e

∞ 9 3µ + σ 2 2

1





−∞ ∞



As we know that

−∞

1 2 p 2

dp

e 2 dp = 1, therefore:



=e (X ) = µ4′ E=

-

1 - p2

1

4

e



∫x

4

9 3µ + σ 2 2

( x)dx f=

0

. (4.6.3)



∫x

1

4

e

x 2πσ

0

1  log e x − µ    2 σ 

-

2

dx

log e x − µ uσ + µ ) = ⇒ x e(

Let u =

σ

1 dx ⇒ = dx xσ du ⇒ = dx e(uσ + µ )σ du xσ

⇒ du =

Using the above results, we get: ∞

=

1 - u2 1 4( uσ + µ ) 2 = e e σ du e 4 µ ∫−∞ 2πσ ∞

∫e

4 uσ

e

1 - u2 2



1

du = e 4 µ





−∞

e

1 2π

1 - u 2 − 8uσ 2

(

)

du

−∞



1 2 16 - ( u − 4σ ) + σ 2 2 1 4µ 2 2 e e= du e 4 µ +8σ ∫ 2π −∞



1 2π



e

-

1 ( u − 4σ )2 2

du

−∞

Let p =u − 4σ ⇒ dp =du

Substituting the above results, we get: = e 4 µ +8σ

2





−∞ ∞



As we know that

−∞

1 2π

e

-

1 2 p 2

dp

1 - p2

1

e 2 dp = 1, therefore



= e 4 µ +8σ (4.6.4) 2

rth Moments of log-normal distribution: (X ) = µr′ E= r



∫x 0

r

( x)dx f=



∫x 0

r

1 x 2πσ

e

-

1  log e x − µ    2 σ 

2

dx

Distribution Theory

146

log e x − µ = ⇒ x e(uσ + µ )

= Let u

σ

1 ⇒ du = dx ⇒ = dx xσ du ⇒ = dx e(uσ + µ )σ du xσ

Using the above results, we get: ∞

1

-

1 2 u 2

e e σ du ∫= 2πσ

=

r ( uσ + µ )

er µ

−∞ ∞

∫e

ruσ

e

1 - u2 2



1

du = e r µ



−∞



e



1 - u 2 − 2 ruσ 2

(

)

du

−∞

2



1

2

1 r 2 r - ( u − rσ ) + σ 2 rµ + σ 2 1 rµ 2 2 2 e e= du e ∫ 2π −∞

1 2π





e

-

1 ( u − rσ )2 2

du

−∞

Let p =u − rσ ⇒ dp =du

Substituting the above results, we get: =e

rµ +

r2 2 ∞ σ 2

1





−∞

As we know that





−∞

1 2π

-

1 2 p 2

dp

1 - p2

e 2 dp = 1, therefore: =e



e

rµ +

r2 2 σ 2

Substituting the values of r (= 1, 2, 3, 4), we get all four moments which are mentioned in Eqns. (4.6.1)–(4.6.4)

e. Central Moments of Log Normal Distribution: The first four central moments μʹ1 , μʹ2, μʹ3, μʹ4 have been derived as below: As we know that the first central is moment is zero. μ1 = 0



(4.6.5)

μ2 = μʹ2 – μʹ1

2



Substituting the values from the Eqns. (4.6.1) and (4.6.2) in the above expression, we get: = e

2 µ + 2σ 2

 µ + 1σ 2  −e 2   

2

Continuous Distributions

147

= e 2 µ +σ



(e

2

σ2

)

− 1 (4.6.6)

(

2

)

Thus, the variance of the uniform discrete distribution is e 2 µ eσ − 1 µ3 = µ3′ − 3µ2′ µ1′ + 2µ 1′3



Substituting the values from the Eqns. (4.6.1) and (4.6.3) in the above expression, we get: =e

9 3µ + σ 2 2

=e

− 3e

9 3µ + σ 2 2

=e



2 µ + 2σ 2

= e

− 3e

3 3µ + σ 2 2

3 3µ + σ 2 2

e

5 3µ + σ 2 2

(e

3σ 2

(e

σ2

 µ + 1σ 2  + 2 e 2   

1 2

µ+ σ2

+ 2e

3 3µ + σ 2 2

− 3eσ + 2

)

) (e

)

2

−1

2

σ2

3

+ 2 (4.6.7)

µ4 = µ4′ − 4µ3′ µ1′ + 6µ2′ µ 1′2− 3µ 1′4 ,



Substituting the values from the Eqns. (4.6.1) and (4.6.4) in the above expression, we get: 2

 3µ + 9 σ 2  µ + 1 σ 2  µ + 1σ 2   µ + 1σ 2  2 e = − 4  e 2  e 2 + 6 e 2 µ + 2σ  e 2  − 3  e 2       

(

4 µ + 8σ 2

2

)

2

2

=e 4 µ +8σ − 4e 4 µ + 5σ + 6e 4 µ + 3σ − 3e 4 µ + 2σ = e 4 µ + 2σ



= e 4 µ + 2σ

2

(e

σ2

2

(e

6σ 2

) (e

−1

2

2

2

− 4e3σ + 6eσ − 3 4σ 2

2

4

2

)

2

)

+ 2e3σ + 3e 2σ − 3 (4.6.8)

f. Skewness and Kurtosis of Log Normal Distribution: To find the shape of the distribution, we have derived the expressions of skewness (β1, γ1) and kurtosis (β2, γ2), which are given below: β1 =

µ32 µ23

Distribution Theory

148



Substituting the values from the Eqns. (4.6.6) and (4.6.7) in the above expression, we get: 2  3 µ + 32 σ 2 σ 2  2 e − 1 eσ + 2  e  = 3 2 2 e 2 µ +σ eσ − 1

(

)(

(

)

))

(

(

)(

2

2

2

=eσ − 1 eσ + 2

)

2

Using the value of β1, we get: γ 1 = β1

(

2

γ1 = eσ + 2

) (e

β2 =



)

σ2

−1

µ4 µ22

Substituting the values from the Eqns. (4.6.6) and (4.6.8) in the above expression, we get: =

e 4 µ + 2σ

2

(e

σ2

) (e

−1

2

e 4 µ + 2σ 2

2

4σ 2

(e

σ

2

2

+ 2e3σ + 3e 2σ − 3 2

)

−1

2

)

2

2

=e 4σ + 2e3σ + 3e 2σ − 3

Using the value of β2, we get: γ= β2 − 3 2 2

2

2

γ 2 =e 4σ + 2e3σ + 3e 2σ − 6

Note: The other properties of the Log-Normal distribution can be derived utilizing the methods used above for the earlier derivations.

Continuous Distributions

149

KEYWORDS • characteristic function • • • •

cumulants generating function moment generating function normal distribution probability generating function

REFERENCES Casella, G., & Berger, R. L., (2002). Statistical Inference. Duxbury: Belmont, CA. Gupta, S. C., & Kapoor, V. K., (1997). Fundamentals of Mathematical Statistics. Sultan Chand and Sons: New Delhi. Hogg, R. V., McKean, J. W., & Craig, A. T., (2013). Introduction to Mathematical Statistics (7th edn.). Pearson: Boston. Miller, I., Miller, M., Freund, J. E., & Miller, I., (2004). John E. Freund’s Mathematical Statistics with Applications. Prentice Hall: Upper Saddle River, NJ. Rohatgi, V., & Saleh, A. K. Md. E., (2015). Introduction to Probability and Statistics (3rd edn.). John Wiley.

CHAPTER 5

Family of Weibull Distributions MOHD. ARSHAD1, VIJAY KUMAR2, MUKTI KHETAN3, and FOZIA HOMA4 Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India 1

Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India

2

Department of Mathematics, Amity School of Applied Sciences, Amity University Mumbai, Maharashtra

3

Department of Statistics, Mathematics, and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India

4

5.1 INTRODUCTION Weibull Distribution was introduced by Waloddi Weibull in the year 1951 and therefore is named after him, although it was first identified by Frechet (1927). Rosin and Rammler (1933) provided an application of this model on the particle size distribution. Weibull model is the most popular lifetime distribution in reliability engineering, and biological studies due to its flexibility, closed-form survival function, and monotonic failure (or hazard) rates. But this model does not provide a good fit to data sets with bathtub-shaped or upside-down bathtubshaped failure rates. It is evident from the citations (more than 11,900) of the Weibull (1951) that it is a more applicable model to analyze several lifetime

Distribution Theory: Principles and Applications. Fozia Homa, Mukti Khetan, Mohd. Arshad, & Pradeep Mishra (Naz) (Eds.) © 2024 Apple Academic Press, Inc. Co-published with CRC Press (Taylor & Francis)

Distribution Theory

152

events. Modifications of the Weibull model have also been made to extend its applicability to a real-life scenario. In this chapter, we will discuss the Weibull model as well as some modified versions. The Weibull model is also related to several other probability models; e.g., the exponential and Rayleigh models are particular cases of this model. For more details and applications, readers can see the handbook written by Rinne (2009).  Definition: A positive random variable X is said to follow Weibull Distribution if its cumulative distribution function (CDF) is

  x−a FX ( x) = 1 − exp  −     b 

c

 ; 

x > a (5.1.1)

Symbolically: X ~ W (a, b, c) Here a, b, and c are the three parameters of this model in which a is the shift (or location) parameter, b (> 0)is the scale parameter, and c (> 0) is the shape parameter. The probability density function (PDF) is f X ( x) =



  x − a c   d d  FX ( x) = 1 − exp −     dx dx    b   

  x − a c  exp  −    ;   b   (5.1.2) x > a, b > 0, c > 0, - ∞ < a < ∞

1 x−a f X ( x) c  =  b b 

c −1

Sample Generation of Weibull Distribution

FIGURE 5.1  Weibull distribution.

Family of Weibull Distributions

153

R Code: Generate random sample from Weibull distribution. x = rweibull(n, c, b) where; n is the sample size; b, c are the parameters of the distribution Example: x = rweibull(10, 2, 2)

If a ≠ 0  Step 1: Generate x= rweibull(n, c, b)  Step 2: Generate y = x + a i.e., y denotes the vector containing W(a, b, c) data set of size n. Sample Generation of Using Inverse Transformation Method:   X − a c  FX ( x) = 1 − exp  −   ;   b  

x>a

  X − a c  U= 1 − exp  −      b    X −a − log(1 − U ) =   b  X = b [ − log(1 − U ) ]

1/ c

c

+a

where; U follows uniform (0,1). Note: First generate n data points u = (u1, u2,…,un) from U(0,1) distribution. Now, consider the transformation xi = b[–log(1– ui)]1/c + a; i = 1, 2, …n. Then, the data set x = (x1, x2,…,xn) has W(a, b, c) distribution.  Example 5.1: Let u = (0.1,0.3,0.22,0.51,0.11,0.6) be a data set generated from U(0,1) distribution. Now, we want to generate random samples from W (1, 2, 2), W (2, 1, 3) and W (1, 1, 2). Calculate xi = b[–log(1– ui)]1/c + a; i = 1,2,…6, where ui = (u1, u2,…,u6).

Distribution Theory

154

U(0,1)

Random Sample W(1, 1, 2)

W(1, 2, 2)

W(2, 1, 3)

0.10

1.324593

1.649186

2.472309

0.30

1.597223

2.194445

2.709182

0.22

1.498459

1.996918

2.628665

0.51

1.844600

2.689201

2.893513

0.11

1.341370

1.682741

2.488447

0.60

1.957231

2.914462

2.971280

5.1.1 PROPERTIES OF WEIBULL DISTRIBUTION a. Raw Moments of Weibull Distribution: In this section, we derived the moments about zero known as raw moments. The first four raw moments μʹ1 , μʹ2, μʹ3, μʹ4 have been derived as below: = µ1′ E= (X )





xf (= x)dx

c

−∞



1  x−a x  b ∫a  b 

  x − a c  exp  −    dx   b  

c −1

c

1 (1/ c ) −1  x−a 1/ c Let  dz  = z ⇒ x = bz + a ⇒ dx = b z c  b 

µ1′ = c



1 1 1− (1/ c ) ) bz1/ c + a ) z ( exp(− z )b z (1/ c ) −1dz ( ∫ b0 c ∞

=

∫ ( bz

1/ c

+ a ) exp(− z )dz

0





0

0

= b ∫ z1/ c exp(− z )dz + a ∫ exp(− z )dz

Using the usual gamma function, we get µ1′ = b (1 + (1/ c) ) + a. (5.1.3)



So, the mean of the Weibull distribution is µ1′ = b (1 + (1/ c) ) + a and if (1/c) it is an integer, the mean is b (1/ c )!+ a . The second raw moment is: = µ2′ E= (X2)





−∞

x2 f = ( x)dx c



1 2 x−a x   b ∫a  b 

c −1

  x − a c  exp  −    dx   b  

Family of Weibull Distributions

155

c

1 (1/ c ) −1  x−a 1/ c Let  dz  = z ⇒ x = bz + a ⇒ dx = b z c  b  ∞ 2 1 1 1− (1/ c ) ) µ2′ = c ∫ ( bz1/ c + a ) z ( exp(− z )b z (1/ c ) −1dz b0 c ∞

=∫ ( bz1/ c + a ) exp(− z )dz 0



∫ (b z

= ∞

=

2 2/ c

0

∫b z

2 2/ c

0

2

+ a 2 + 2abz1/ c ) exp(− z )dz ∞



0

0

exp(− z )dz + ∫ a 2 exp(− z )dz + 2ab ∫ z1/ c exp(− z )dz

Using the result of the gamma function, we get µ2′ = b 2 1 + ( 2 / c ) + a 2 + 2ab 1 + (1/ c ) (5.1.4)



The third raw moment is = µ3′ E= (X3)





x3 f = ( x)dx c

−∞



1 3 x−a x   b ∫a  b 

c −1

  x − a c  exp  −    dx   b  

c

1 (1/ c ) −1  x−a 1/ c Let  dz  = z ⇒ x = bz + a ⇒ dx = b z b c   ∞ 3 1 1 1− (1/ c ) ) µ3′ = c ∫ ( bz1/ c + a ) z ( exp(− z )b z (1/ c ) −1dz b0 c ∞

=∫ ( bz1/ c + a ) exp(− z )dz



=

∫ (b z

3 3/ c

0

3

0

+ a 3 + 3a 2 bz1/ c + 3ab 2 z 2/ c ) exp(− z )dz





0

0

= b3 ∫ z 3/ c exp(− z )dz + ∫ a 3 exp(− z )dz + 3a 2 ∞



0

0

b ∫ z1/ c exp(− z )dz + 3ab 2 ∫ z 2/ c exp(− z )dz

Using the result of the gamma function, we get

µ3′ = b3 1 + ( 3 / c ) + a 3 + 3a 2 b 1 + (1/ c ) + 3ab 2 1 + ( 2 / c ) (5.1.5)

The fourth raw moment is = µ4′ E= (X4)





−∞

x4 f = ( x)dx c



1 4 x−a x   b ∫a  b 

c −1

  x − a c  exp  −    dx   b  

Distribution Theory

156

c

1 (1/ c ) −1  x−a 1/ c dz Let   = z ⇒ x = bz + a ⇒ dx = b z c  b  ∞

1 b0

µ4′ = c ∫ ( bz1/ c + a ) z ( 4



1− (1/ c ) )

1 exp(− z )b z (1/ c ) −1dz c

=∫ ( bz1/ c + a ) exp(− z )dz ∞

=

4

0

∫ (b z

4 4/ c

+ a + 4a 3bz1/ c + 4ab3 z 3/ c + 6a 2 b 2 z 2/ c ) exp(− z )dz 4

0

∞ ∞  4 ∞ 4/ c  4 3 1/ c b ∫ z exp(− z )dz + ∫ a exp(− z )dz + 4a b ∫ z exp(− z )dz  0 0 0  = ∞ ∞    +4ab3 ∫ z 3/ c exp(− z )dz + 6a 2 b 2 ∫ z 2/ c exp(− z )dz  0 0  

Using the result of the gamma function, we get µ4′ = b 4 1 + ( 4 / c ) + a 4 + 4a 3b 1 + (1/ c )



+ 4ab3 1 + ( 3 / c ) + 6a 2 b 2 1 + ( 2 / c )

(5.1.6)

b. Central Moments of Weibull Distribution: Central moments are very essential for the distribution as they tell the characteristics of the distribution. So, the first four central moments μ1 , μ2, μ3, μ4 have been derived as below. We know that the first central moment is always zero, i.e.,

μ1 = 0

(5.1.7)

The second central moment tells the variation of the distribution and is known as the variance. The second central moment is given below. μ2 = μʹ2 – μʹ1 2 Using the values from Eqns. (5.1.3) and (5.1.4), we get:

(

µ2 =b 2 1 + ( 2 / c ) + a 2 + 2ab 1 + (1/ c ) − b 1 + (1/ c ) + a

{

} −a

= b 2 1 + ( 2 / c ) + a 2 + 2ab 1 + (1/ c ) − b 2 1 + (1 / c )

2

2

)

2

− 2ab 1 + (1/ c )

Family of Weibull Distributions

157

{

}

2 µ2= b 2  1 + ( 2 / c ) − 1 + (1/ c )  (5.1.8)  



Thus, the variance of the Weibull distribution is

{

}

2 b 2  1 + ( 2 / c ) − 1 + (1 / c )  .  

The third central moment tells the skewness of the distribution. The derivation of the third central moment is given below: µ3 = µ3′ − 3µ2′ µ1′ + 2µ 1′3

Using the values from Eqns. (5.1.3) to (5.1.5), we get: b3 1 + ( 3 / c ) + a 3 + 3a 2b 1 + (1 / c ) + 3ab 2 1 + ( 2 / c )   = 3   2 2    3 b 1 2 / c a 2 ab 1 1 / c b 1 1 / c a 2 b 1 1 / c a − + + + + + + + + + ( ) ( ) ( )   ( )    

(

)(

)

  b3 1 + ( 3 / c ) + a 3 + 3a 2 b 1 + (1/ c ) + 3ab 2 1 + ( 2 / c )    2   3   2 2 b 1 + ( 2 / c ) 1 + (1/ c ) + a b 1 + (1/ c ) + 2ab 1 + (1/ c )  =  −3   2 3 2   + ab 1 + ( 2 / c ) + a + 2a b 1 + (1/ c )    3 2 3 3 2 2  +2 b 1 + (1/ c ) + a + 3a b 1 + (1/ c ) + 3ab 1 + (1/ c )   

{

{

}

{

}

{

}}

{

}

3 µ3= b3  1 + ( 3 / c ) − 31 + ( 2 / c ) 1 + (1/ c ) + 2 1 + (1/ c )  (5.1.9)  



The fourth central moment helps to find the kurtosis of the distribution. The fourth central moment is given below: µ4 = µ4′ − 4µ3′ µ1′ + 6µ2′ µ 1′2− 3µ 1′4

Using the values from Eqns. (5.1.3) to (5.1.6), we get:  b 4 1 + ( 4 / c ) + a 4 + 4a 3b 1 + (1/ c ) + 4ab3 1 + ( 3 / c ) + 6a 2 b 2 1 + ( 2 / c )   −4 b3 1 + ( 3 / c ) + a 3 + 3a 2 b 1 + (1/ c ) + 3ab 2 1 + ( 2 / c ) b 1 + (1/ c ) + a =   2  +6 b 2 1 + ( 2 / c ) + a 2 + 2ab 1 + (1/ c ) b 1 + (1/ c ) + a − 3 b 1 + (1/ c ) + a 

(

{(

)(

)( )} {

)

}

     4  

Distribution Theory

158

b 4 1 + ( 4 / c ) + a 4 + 4a 3b 1 + (1 / c ) + 4ab3 1 + ( 3 / c ) + 6a 2b 2 1 + ( 2 / c )    2   4  3 2 2 3 + + + + + + + + + b c c a b c a b c ab c c 1 3 / 1 1 / 1 1 / 3 1 1 / 3 1 2 / 1 1 / ( ) ( ) ( ) ( ) ( ) ( )  −4      3 4 3 2 2     + ab 1 + ( 3 / c ) + a + 3a b 1 + (1 / c ) + 3a b 1 + ( 2 / c )      b 4 1 + ( 2 / c ) 1 + (1 / c ) 2 + a 2b 2 1 + (1 / c ) 2 + 2ab3 1 + (1 / c ) 3    =      2 2  4 3   +6  + a b 1 + ( 2 / c ) + a + 2a b 1 + (1 / c )     2    +2ab3 1 + ( 2 / c ) 1 + (1 / c ) + 2a 3b 1 + (1 / c ) + 4a 2b 2 1 + (1 / c )         4 3 2   4 4 3 3 2 2  −3  b 1 + (1 / c ) + a + 4a b 1 + (1 / c ) + 4ab 1 + (1 / c ) + 6a b 1 + (1 / c )      

{

{

}

{

}

}

{

}

{

{



}

}

{

}

{

}

 1 + ( 4 / c ) − 4 1 + ( 3 / c ) 1 + (1/ c ) + 6 1 + ( 2 / c )   2 4  (5.1.10)  1 + (1/ c ) − 3 1 + (1/ c )   

µ4 = b 4 

{

} {

}

c. Skewness and Kurtosis of Weibull Distribution: To find the shape of this model, we have derived the expressions of skewness (β1, γ1) and kurtosis (β2, γ2) in this section. µ32 µ23

β1 =

Using the values from Eqns. (5.1.8) and (5.1.9), we get:

{

2

}

 1 + 3 / c − 31 + 2 / c 1 + 1/ c + 2 1 + 1/ c 3  ) ( ) ( ) ( )   (  (5.1.11) β1 =  2 3  1 + 2 / c − 1 + 1/ c  ) ( )   ( 

{

= γ1

= β1

}

µ3 µ23/ 2

Substituting the value of β1 in the above expression, we get:

γ1 =

{

}

1 + ( 3 / c ) − 31 + ( 2 / c ) 1 + (1/ c ) + 2 1 + (1/ c )

{

}

 1 + 2 / c − 1 + 1/ c 2  ) ( )   ( 

β2 =

µ4 µ22

3/ 2

3

(5.1.12)

Family of Weibull Distributions

159

Using the values from Eqns. (5.1.8) and (5.1.10), we get:

{ } − 3{ 1 + (1/ c )} β =  1 + 2 / c − 1 + 1/ c  ) { ( )}   ( (5.1.13)  1 + ( 4 / c ) − 4 1 + ( 3 / c ) 1 + (1/ c ) + 6 1 + ( 2 / c ) 1 + (1/ c )

2

2

2

4

2

Substituting the value of β2 in the above expression, we get:

{ } − 3{ 1 + (1/ c )} − 3 γ  1 + 2 / c − 1 + 1/ c  ) { ( )}   ( (5.1.14)  1 + ( 4 / c ) − 4 1 + ( 3 / c ) 1 + (1/ c ) + 6 1 + ( 2 / c ) 1 + (1/ c )

2

2

2

4

2

 Shapes of the Weibull Distribution at Different Values of Parameters

FIGURE 5.2  Shape of the Weibull distribution.

d. Median, Quartile, and Percentile of Weibull Distribution: Median divides the data into two equal parts or in other words we can write FX ( M ) =

1 , 2

Distribution Theory

160

where M denotes the median. From the above section we know the CDF of Weibull distribution, we get   M −a 1 − exp  −     b    M −a exp  −     b 

c

c

 1 =  2

c  1  M −a  = ⇒ −  = − log e 2  b   2

= ⇒ M b ( log e 2 )

1/ c



+ a (5.1.15)

Quartiles divide the data into four equal parts or in other words we can write = FX (Qr )

r = , r 1, 2,3 4

where Qr denotes the rth quantile. 1/ c

r  Qr = b  − log e  4 

+a

Clearly, Q2 equals to M, i.e., the second quantile is equal to the median. Percentiles divide the data into 100 equal parts or in other words we can write = FX ( Pr )

r = r 1, 2,...,99 100

where Pr denotes the rth percentile. 1/ c

r   Pr = b  − log e  100  

+a

r= 1, 2,...,99

We can find the expressions of P1, P2, …, P100 by putting r =1 to 99, respectively in the above expression. e. Mode of Weibull Distribution: The behavior of the PDF of the Weibull model depends on parameter c. For obtaining the mode of this model, we will consider the following two cases.  Case 1: when c is not equal to 1. Mode is the highest occurred value in the data set and to find the mode of the distribution we have to differentiate the PDF with respect to x which is given below: log f X ( x) =

 1  x − a c −1   x − a c   d log e  c  exp −     b  b  dx   b    

Family of Weibull Distributions

f X ( x) log=

161

d   x−a  x−a  log e c − log e b + (c − 1) log e  −  dx   b   b 

c

  

Differentiating both the side with respect to x 1 1 1  x−a f X' ( x) = 0 + 0 + (c − 1) − c  f X ( x)  x−a b  b     b 

c −1

1 b

c −1  1 1 x−a  −c  f x′( x) = (c − 1)  f ( x) ( x − a ) b  b   X  c −1  1 1 x−a  ′ f x if c c ( ) = 0 ( − 1) −   f ( x) =0 . Clearly, x ( x − a ) b  b   X 

Since PDF is always positive, we get: c −1

⇒ (c − 1)

1 1 x−a −c  = 0; c ≠ 1 ( x − a ) b  b 

1 1 x−a (c − 1) = c  ( x − a ) b  b 



1/ c

 c −1  x= b   c 

+ a;

c −1

c −1  x − a  ⇒ =   c  b 

c

c ≠ 1 (5.1.16)

Thus, for c ≠ 1, the mode of the distribution is given by Eqn. (5.1.16). Case 2: When c = 1. In this case, the PDF of this model is

  x − a  1 exp  −  f X ( x) =   ; x > a (5.1.17) b   b 

The PDF f (x), given in (5.1.17), is a decreasing function of x. Thus, the mode of this model occurs at the beginning of the support of x, i.e., the mode the distribution is x = a. f. Reliability Function of Weibull Distribution: In reliability analysis, the random variable of interest is the failure time of the component/ system. Let us denote this random variable by X. Then, by definition, reliability R(x) at the time x, of the component/system is given by R(x) = Probability that the time to failure of the component/system is greater than x

Distribution Theory

162

R(x) = P[X > x] = 1 – Fx (x) Using Eqn. (5.1.1), we get:    x − a c  exp  −   ; x > a R( x) =  (5.1.18)   b    x ≤ a 1 ; 



g. Hazard Rate of Weibull Distribution: The instantaneous rate of failure at time x is known as the hazard rate and is defined by λ ( x) =

1 f ( x). R( x)

Substituting the expression of R(x) and f(x), we get: λ ( x)



1

1 x−a c   c   x−a  b b  exp  −      b  

1 x−a = λ ( x) c   b b 

c −1

c −1

  x−a exp  −     b 

c

  

; x > a (5.1.19)

Family of Weibull Distributions

163

FIGURE 5.3  Reliability function and hazard function of Weibull distribution.

h. Mean Time to Failure of Weibull Distribution: In reliability terminology, the mean of the random variable X in the absence of repair and replacement is known as the meantime to failure (MTTF). The expression of MTTF is given by = MTTF



xf ( x)dx ∫=

Mean of the Weibull distribution

−∞

Using Eqn. (5.1.3), we get MTTF = b (1 + (1/ c) ) + a (5.1.20)



i. Moment Generating Function (MGF) of Weibull Distribution: The moment generating function (MGF) of this model is given by

\[ M_X(t) = E\left(e^{tX}\right) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{a}^{\infty} e^{tx}\,\frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\left\{-\left(\frac{x-a}{b}\right)^{c}\right\} dx. \]

Let \( \left(\frac{x-a}{b}\right)^{c} = z \Rightarrow x = bz^{1/c} + a \Rightarrow dx = \frac{b}{c}\,z^{(1/c)-1}\,dz \). Then

\[ M_X(t) = \int_{0}^{\infty} e^{t\left(bz^{1/c}+a\right)}\exp(-z)\,dz = e^{ta}\int_{0}^{\infty} e^{tbz^{1/c}}\exp(-z)\,dz. \]

Expanding the exponential as a power series,

\[ M_X(t) = e^{ta}\int_{0}^{\infty}\sum_{r=0}^{\infty}\frac{\left(tbz^{1/c}\right)^{r}}{r!}\exp(-z)\,dz = e^{ta}\sum_{r=0}^{\infty}\frac{(tb)^{r}}{r!}\int_{0}^{\infty} z^{r/c}\exp(-z)\,dz. \]

Using the gamma function, we get

\[ M_X(t) = e^{ta}\sum_{r=0}^{\infty}\frac{(tb)^{r}}{r!}\,\Gamma\left(1+\frac{r}{c}\right). \quad (5.1.21) \]
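As a quick numerical illustration (a sketch, not part of the original text; the parameter values are arbitrary), Eqn. (5.1.21) can be checked in Python against a Monte Carlo estimate of E(e^{tX}):

import math
import numpy as np
from scipy.stats import weibull_min

a, b, c, t = 1.0, 2.0, 1.5, 0.1                          # arbitrary parameters
x = weibull_min(c, loc=a, scale=b).rvs(size=200000, random_state=1)
mc = np.exp(t * x).mean()                                # Monte Carlo estimate of E(e^{tX})
series = math.exp(t * a) * sum((t * b) ** r / math.factorial(r)
                               * math.gamma(1 + r / c) for r in range(40))
print(mc, series)                                        # the two values nearly agree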

j. Characteristic Function (CF) of Weibull Distribution: The characteristic function is denoted by φ_X(t). The derivation of φ_X(t) is similar to the derivation of M_X(t) given in Section (i), and yields

\[ \varphi_X(t) = e^{ita}\sum_{r=0}^{\infty}\frac{(itb)^{r}}{r!}\,\Gamma\left(1+\frac{r}{c}\right). \quad (5.1.22) \]

k. Entropy of Weibull Distribution: Entropy is a measure of uncertainty. The concept of information entropy was introduced by Shannon in 1948. It is defined as Entropy = E[−log f(X)]. Thus,

\[ \text{Entropy} = E\left[-\log_e\left\{\frac{c}{b}\left(\frac{X-a}{b}\right)^{c-1}\exp\left(-\left(\frac{X-a}{b}\right)^{c}\right)\right\}\right] = -\log_e\frac{c}{b} - (c-1)\,E\left[\log_e\left(\frac{X-a}{b}\right)\right] + E\left[\left(\frac{X-a}{b}\right)^{c}\right]. \quad (5.1.23) \]

Now, with the substitution \( \left(\frac{x-a}{b}\right)^{c} = z \), as before,

\[ E\left[\log_e\left(\frac{X-a}{b}\right)\right] = \int_{0}^{\infty}\log_e\left(z^{1/c}\right)\exp(-z)\,dz = \frac{1}{c}\int_{0}^{\infty}\log_e z\,\exp(-z)\,dz = -\frac{\Upsilon}{c}, \quad (5.1.24) \]

where \( \Upsilon = -\int_{0}^{\infty}\log_e z\,\exp(-z)\,dz \approx 0.5772 \) is Euler's constant. Next, consider

\[ E\left[\left(\frac{X-a}{b}\right)^{c}\right] = \int_{0}^{\infty} z\exp(-z)\,dz = \Gamma(2) = 1. \quad (5.1.25) \]

Now, using Eqns. (5.1.24) and (5.1.25) in Eqn. (5.1.23), we get:

\[ \text{Entropy} = \log_e\frac{b}{c} + (c-1)\frac{\Upsilon}{c} + 1. \]

5.2 SPECIAL CASES OF WEIBULL DISTRIBUTION

5.2.1 TWO-PARAMETER WEIBULL DISTRIBUTION

The two-parameter Weibull distribution can be derived by putting a = 0 in Eqns. (5.1.1) and (5.1.2). The CDF and PDF of this model are, respectively, given by

\[ F_X(x) = 1 - \exp\left\{-\left(\frac{x}{b}\right)^{c}\right\}; \quad x > 0,\ b > 0,\ c > 0, \quad (5.2.1) \]

\[ f_X(x) = \frac{c}{b}\left(\frac{x}{b}\right)^{c-1}\exp\left\{-\left(\frac{x}{b}\right)^{c}\right\}; \quad x > 0,\ b > 0,\ c > 0. \quad (5.2.2) \]

Symbolically: X ~ W(b, c). The properties of the two-parameter Weibull distribution are given below.

5.2.1.1 PROPERTIES OF TWO-PARAMETER WEIBULL DISTRIBUTION

a. First Four Moments of this Model:

\[ \mu_1' = b\,\Gamma(1 + (1/c)), \quad (5.2.1.3) \]
\[ \mu_2' = b^{2}\,\Gamma(1 + (2/c)), \quad (5.2.1.4) \]
\[ \mu_3' = b^{3}\,\Gamma(1 + (3/c)), \quad (5.2.1.5) \]
\[ \mu_4' = b^{4}\,\Gamma(1 + (4/c)). \quad (5.2.1.6) \]

b. The first four central moments of the two-parameter Weibull model are the same as those of the three-parameter Weibull model (given in Eqns. (5.1.7)–(5.1.10)).

c. The skewness and kurtosis of the two-parameter Weibull model are the same as those of the three-parameter Weibull model (given in Eqns. (5.1.11)–(5.1.14)).

d. Median, Quartile, Percentile, and Mode of Two-Parameter Weibull Distribution:

\[ \text{Median} = b\left(\log_e 2\right)^{1/c}, \quad (5.2.1.7) \]
\[ Q_r = b\left[-\log_e\left(1-\tfrac{r}{4}\right)\right]^{1/c}; \quad r = 1, 2, 3, \]
\[ P_r = b\left[-\log_e\left(1-\tfrac{r}{100}\right)\right]^{1/c}; \quad r = 1, 2, \ldots, 99, \]
\[ \text{Mode} = b\left(\frac{c-1}{c}\right)^{1/c}. \quad (5.2.1.8) \]

e. Reliability Function, Hazard Function, and MTTF of Two-Parameter Weibull Distribution:

\[ R(x) = \exp\left\{-\left(\frac{x}{b}\right)^{c}\right\}; \quad x > 0,\ b > 0,\ c > 0, \quad (5.2.1.9) \]
\[ \lambda(x) = \frac{c}{b}\left(\frac{x}{b}\right)^{c-1}; \quad x > 0,\ b > 0,\ c > 0, \quad (5.2.1.10) \]
\[ \mathrm{MTTF} = b\,\Gamma(1 + (1/c)). \quad (5.2.1.11) \]

f. Moment Generating Function, Characteristic Function (CF), and Entropy of Two-Parameter Weibull Distribution:

\[ M_X(t) = \sum_{r=0}^{\infty}\frac{(tb)^{r}}{r!}\,\Gamma(1 + (r/c)), \quad (5.2.1.12) \]
\[ \varphi_X(t) = \sum_{r=0}^{\infty}\frac{(itb)^{r}}{r!}\,\Gamma(1 + (r/c)), \quad (5.2.1.13) \]
\[ \text{Entropy} = \log_e\frac{b}{c} + (c-1)\frac{\Upsilon}{c} + 1. \quad (5.2.1.14) \]

5.2.2 ONE-PARAMETER WEIBULL DISTRIBUTION

The one-parameter Weibull distribution can be derived by putting a = 0 and b = 1 in Eqns. (5.1.1) and (5.1.2). The CDF and PDF of the one-parameter Weibull distribution are, respectively, given by

\[ F_X(x) = 1 - \exp\left(-x^{c}\right); \quad x > 0,\ c > 0, \quad (5.2.2.1) \]
\[ f_X(x) = c\,x^{c-1}\exp\left(-x^{c}\right); \quad x > 0,\ c > 0. \quad (5.2.2.2) \]

Symbolically: X ~ W(1, c). The properties of the one-parameter Weibull distribution are given below.

5.2.2.1 PROPERTIES OF THE ONE-PARAMETER WEIBULL DISTRIBUTION

a. Moments:

\[ \mu_1' = \Gamma(1 + (1/c)), \quad (5.2.2.3) \]
\[ \mu_2' = \Gamma(1 + (2/c)), \quad (5.2.2.4) \]
\[ \mu_3' = \Gamma(1 + (3/c)), \quad (5.2.2.5) \]
\[ \mu_4' = \Gamma(1 + (4/c)). \quad (5.2.2.6) \]

b. Central Moments:

\[ \mu_1 = 0, \quad (5.2.2.7) \]
\[ \mu_2 = \Gamma(1 + (2/c)) - \left\{\Gamma(1 + (1/c))\right\}^{2}, \quad (5.2.2.8) \]
\[ \mu_3 = \Gamma(1 + (3/c)) - 3\,\Gamma(1 + (2/c))\,\Gamma(1 + (1/c)) + 2\left\{\Gamma(1 + (1/c))\right\}^{3}, \quad (5.2.2.9) \]
\[ \mu_4 = \Gamma(1 + (4/c)) - 4\,\Gamma(1 + (3/c))\,\Gamma(1 + (1/c)) + 6\,\Gamma(1 + (2/c))\left\{\Gamma(1 + (1/c))\right\}^{2} - 3\left\{\Gamma(1 + (1/c))\right\}^{4}. \quad (5.2.2.10) \]

c. The skewness and kurtosis of the one-parameter Weibull model are the same as those of the three-parameter Weibull model (given in Eqns. (5.1.11)–(5.1.14)).

d. Median, Quartile, Percentile, and Mode:

\[ \text{Median} = \left(\log_e 2\right)^{1/c}, \quad (5.2.2.11) \]
\[ Q_r = \left[-\log_e\left(1-\tfrac{r}{4}\right)\right]^{1/c}; \quad r = 1, 2, 3, \]
\[ P_r = \left[-\log_e\left(1-\tfrac{r}{100}\right)\right]^{1/c}; \quad r = 1, 2, \ldots, 99, \]
\[ \text{Mode} = \left(\frac{c-1}{c}\right)^{1/c}. \quad (5.2.2.12) \]

e. Reliability Function, Hazard Function, and MTTF:

\[ R(x) = \exp\left(-x^{c}\right); \quad x > 0,\ c > 0, \quad (5.2.2.13) \]
\[ \lambda(x) = c\,x^{c-1}; \quad x > 0,\ c > 0, \quad (5.2.2.14) \]
\[ \mathrm{MTTF} = \Gamma(1 + (1/c)). \quad (5.2.2.15) \]

f. Moment Generating Function (MGF), Characteristic Function (CF), and Entropy:

\[ M_X(t) = \sum_{r=0}^{\infty}\frac{t^{r}}{r!}\,\Gamma(1 + (r/c)), \quad (5.2.2.16) \]
\[ \varphi_X(t) = \sum_{r=0}^{\infty}\frac{(it)^{r}}{r!}\,\Gamma(1 + (r/c)), \quad (5.2.2.17) \]
\[ \text{Entropy} = \log_e\frac{1}{c} + (c-1)\frac{\Upsilon}{c} + 1. \quad (5.2.2.18) \]

5.3 RAYLEIGH DISTRIBUTION

It is a special case of the Weibull model. The Rayleigh model is named after the British physicist Lord Rayleigh (1842–1919). This model has wide applications in modern science and technology, specifically communication theory, signal processing, aircraft technology, reliability, and survival analysis.

5.3.1 TWO-PARAMETER RAYLEIGH DISTRIBUTION

The two-parameter Rayleigh model can be derived by putting c = 2 in Eqns. (5.1.1) and (5.1.2). The CDF and PDF of this model are, respectively, given by

\[ F_X(x) = 1 - \exp\left\{-\left(\frac{x-a}{b}\right)^{2}\right\}; \quad x > a. \quad (5.3.1.1) \]

Symbolically: X ~ R(a, b),

\[ f_X(x) = \frac{2}{b}\left(\frac{x-a}{b}\right)\exp\left\{-\left(\frac{x-a}{b}\right)^{2}\right\}; \quad x > a,\ b > 0,\ -\infty < a < \infty. \quad (5.3.1.2) \]


FIGURE 5.4  Rayleigh distribution.

Sample Generation of Rayleigh Distribution

R Code: Generate a random sample from the Rayleigh distribution:

x = rweibull(n, 2, b)

where n is the sample size and b is the scale parameter of the distribution.

Example: x = rweibull(100, 2, 1)

If a ≠ 0:
Step 1: Generate x = rweibull(n, 2, b)
Step 2: Generate y = x + a

i.e., y is a random sample of size n from Rayleigh(a, b). The properties of the two-parameter Rayleigh distribution are given below.

5.3.1.1 PROPERTIES OF TWO-PARAMETER RAYLEIGH DISTRIBUTION

a. Moments of Two-Parameter Rayleigh Distribution:

\[ \mu_1' = \frac{1}{2}\,b\sqrt{\pi} + a, \quad (5.3.1.3) \]
\[ \mu_2' = b^{2} + a^{2} + ab\sqrt{\pi}, \quad (5.3.1.4) \]
\[ \mu_3' = \frac{3}{4}\,b^{3}\sqrt{\pi} + a^{3} + \frac{3}{2}\,a^{2}b\sqrt{\pi} + 3ab^{2}, \quad (5.3.1.5) \]
\[ \mu_4' = 2b^{4} + a^{4} + 2a^{3}b\sqrt{\pi} + 3ab^{3}\sqrt{\pi} + 6a^{2}b^{2}. \quad (5.3.1.6) \]

b. Central Moments of Two-Parameter Rayleigh Distribution:

\[ \mu_1 = 0, \quad (5.3.1.7) \]
\[ \mu_2 = \frac{1}{4}\,b^{2}\left(4 - \pi\right), \quad (5.3.1.8) \]
\[ \mu_3 = \frac{1}{4}\,b^{3}\sqrt{\pi}\left(\pi - 3\right), \quad (5.3.1.9) \]
\[ \mu_4 = \frac{1}{16}\,b^{4}\left(32 - 3\pi^{2}\right). \quad (5.3.1.10) \]

c. Skewness and Kurtosis of Two-Parameter Rayleigh Distribution:

\[ \beta_1 = \frac{4\pi\left(\pi - 3\right)^{2}}{\left(4 - \pi\right)^{3}}, \quad (5.3.1.11) \]
\[ \gamma_1 = \frac{2\sqrt{\pi}\left(\pi - 3\right)}{\left(4 - \pi\right)^{3/2}}, \quad (5.3.1.12) \]
\[ \beta_2 = \frac{32 - 3\pi^{2}}{\left(4 - \pi\right)^{2}}, \quad (5.3.1.13) \]
\[ \gamma_2 = \frac{24\pi - 6\pi^{2} - 16}{\left(4 - \pi\right)^{2}}. \quad (5.3.1.14) \]

d. Median, Quartile, Percentile, and Mode of Two-Parameter Rayleigh Distribution:

\[ \text{Median} = b\left(\log_e 2\right)^{1/2} + a, \quad (5.3.1.15) \]
\[ Q_r = b\left[-\log_e\left(1-\tfrac{r}{4}\right)\right]^{1/2} + a; \quad r = 1, 2, 3, \]
\[ P_r = b\left[-\log_e\left(1-\tfrac{r}{100}\right)\right]^{1/2} + a; \quad r = 1, 2, \ldots, 99, \]
\[ \text{Mode} = b\left(\frac{1}{2}\right)^{1/2} + a. \quad (5.3.1.16) \]

e. Reliability Function, Hazard Function, and MTTF of Two-Parameter Rayleigh Distribution:

\[ R(x) = \exp\left\{-\left(\frac{x-a}{b}\right)^{2}\right\}; \quad x > a, \quad (5.3.1.17) \]
\[ \lambda(x) = \frac{2}{b}\left(\frac{x-a}{b}\right); \quad x > a, \quad (5.3.1.18) \]
\[ \mathrm{MTTF} = \frac{1}{2}\,b\sqrt{\pi} + a. \quad (5.3.1.19) \]


FIGURE 5.5  Reliability function and hazard function of Rayleigh distribution.

f. Moment Generating Function, Characteristic Function (CF), and Entropy of Two-Parameter Rayleigh Distribution:

\[ M_X(t) = e^{ta}\sum_{r=0}^{\infty}\frac{(tb)^{r}}{r!}\,\Gamma(1 + (r/2)), \quad (5.3.1.20) \]
\[ \varphi_X(t) = e^{ita}\sum_{r=0}^{\infty}\frac{(itb)^{r}}{r!}\,\Gamma(1 + (r/2)), \quad (5.3.1.21) \]
\[ \text{Entropy} = \log_e\frac{b}{2} + \frac{\Upsilon}{2} + 1. \quad (5.3.1.22) \]

5.3.2 ONE-PARAMETER RAYLEIGH DISTRIBUTION

The one-parameter Rayleigh model can be derived by putting c = 2 and a = 0 in Eqns. (5.1.1) and (5.1.2). The CDF and PDF of this model are, respectively, given by

\[ F_X(x) = 1 - \exp\left\{-\left(\frac{x}{b}\right)^{2}\right\}; \quad x > 0,\ b > 0. \quad (5.3.2.1) \]

Symbolically: X ~ R(b),

\[ f_X(x) = \frac{2}{b}\left(\frac{x}{b}\right)\exp\left\{-\left(\frac{x}{b}\right)^{2}\right\}; \quad x > 0,\ b > 0. \quad (5.3.2.2) \]

The properties of the one-parameter Rayleigh distribution are given below.

5.3.2.1 PROPERTIES OF ONE-PARAMETER RAYLEIGH DISTRIBUTION

a. Moments of One-Parameter Rayleigh Distribution:

\[ \mu_1' = \frac{1}{2}\,b\sqrt{\pi}, \quad (5.3.2.3) \]
\[ \mu_2' = b^{2}, \quad (5.3.2.4) \]
\[ \mu_3' = \frac{3}{4}\,b^{3}\sqrt{\pi}, \quad (5.3.2.5) \]
\[ \mu_4' = 2b^{4}. \quad (5.3.2.6) \]

b. The first four central moments of the one-parameter Rayleigh model are the same as those of the two-parameter Rayleigh model (given in Eqns. (5.3.1.7)–(5.3.1.10)).

c. The skewness and kurtosis of the one-parameter Rayleigh model are the same as those of the two-parameter Rayleigh model (given in Eqns. (5.3.1.11)–(5.3.1.14)).

d. Median, Quartile, Percentile, and Mode of One-Parameter Rayleigh Distribution:

\[ \text{Median} = b\left(\log_e 2\right)^{1/2}, \quad (5.3.2.7) \]
\[ Q_r = b\left[-\log_e\left(1-\tfrac{r}{4}\right)\right]^{1/2}; \quad r = 1, 2, 3, \]
\[ P_r = b\left[-\log_e\left(1-\tfrac{r}{100}\right)\right]^{1/2}; \quad r = 1, 2, \ldots, 99, \]
\[ \text{Mode} = b\left(\frac{1}{2}\right)^{1/2}. \quad (5.3.2.8) \]

e. Reliability Function, Hazard Function, and MTTF of One-Parameter Rayleigh Distribution:

\[ R(x) = \exp\left\{-\left(\frac{x}{b}\right)^{2}\right\}; \quad x > 0, \quad (5.3.2.9) \]
\[ \lambda(x) = \frac{2}{b}\left(\frac{x}{b}\right); \quad x > 0, \quad (5.3.2.10) \]
\[ \mathrm{MTTF} = b\,\Gamma(3/2) = \frac{1}{2}\,b\sqrt{\pi}. \quad (5.3.2.11) \]

f. Moment Generating Function, Characteristic Function (CF), and Entropy of One-Parameter Rayleigh Distribution:

\[ M_X(t) = \sum_{r=0}^{\infty}\frac{(tb)^{r}}{r!}\,\Gamma(1 + (r/2)), \quad (5.3.2.12) \]
\[ \varphi_X(t) = \sum_{r=0}^{\infty}\frac{(itb)^{r}}{r!}\,\Gamma(1 + (r/2)), \quad (5.3.2.13) \]
\[ \text{Entropy} = \log_e\frac{b}{2} + \frac{\Upsilon}{2} + 1. \quad (5.3.2.14) \]

5.4 EXPONENTIAL DISTRIBUTION

The exponential distribution is a popular distribution in industrial applications, where the lifetimes of units manufactured using a standard process are modeled by the two-parameter exponential distribution. It also arises naturally when describing the waiting time in a homogeneous Poisson process, and it is a useful model in queuing theory, reliability theory, physics, and hydrology. It is a submodel of the Weibull model.

5.4.1 TWO-PARAMETER EXPONENTIAL DISTRIBUTION

The two-parameter exponential model can be derived by putting c = 1 in Eqns. (5.1.1) and (5.1.2). The CDF and PDF of this model are, respectively, given by

\[ F_X(x) = 1 - \exp\left\{-\left(\frac{x-a}{b}\right)\right\}; \quad x > a,\ b > 0. \quad (5.4.1.1) \]

Symbolically: X ~ Exp(a, b),

\[ f_X(x) = \frac{1}{b}\exp\left\{-\left(\frac{x-a}{b}\right)\right\}; \quad x > a,\ b > 0,\ -\infty < a < \infty. \quad (5.4.1.2) \]


FIGURE 5.6  Exponential distribution.

Sample Generation of Exponential Distribution

R Code: Generate a random sample from the exponential distribution:

x = rexp(n, d)

where n is the sample size and d is the rate parameter, i.e., the reciprocal of the scale parameter (d = 1/b).

Example: x = rexp(100, 1)

If a ≠ 0:
Step 1: Generate x = rexp(n, d)
Step 2: Generate y = x + a

i.e., y denotes the vector containing an Exponential(a, d) data set of size n.

5.4.1.1 PROPERTIES OF TWO-PARAMETER EXPONENTIAL DISTRIBUTION

a. Moments of Two-Parameter Exponential Distribution:

\[ \mu_1' = b + a, \quad (5.4.1.3) \]
\[ \mu_2' = 2b^{2} + a^{2} + 2ab, \quad (5.4.1.4) \]

\[ \mu_3' = 6b^{3} + a^{3} + 6a^{2}b + 6ab^{2}, \quad (5.4.1.5) \]
\[ \mu_4' = 24b^{4} + a^{4} + 4a^{3}b + 24ab^{3} + 12a^{2}b^{2}. \quad (5.4.1.6) \]

b. Central Moments of Two-Parameter Exponential Distribution:

\[ \mu_1 = 0, \quad (5.4.1.7) \]
\[ \mu_2 = b^{2}, \quad (5.4.1.8) \]
\[ \mu_3 = 2b^{3}, \quad (5.4.1.9) \]
\[ \mu_4 = 9b^{4}. \quad (5.4.1.10) \]

c. Skewness and Kurtosis of Two-Parameter Exponential Distribution:

\[ \beta_1 = 4, \quad (5.4.1.11) \]
\[ \gamma_1 = 2, \quad (5.4.1.12) \]
\[ \beta_2 = 9, \quad (5.4.1.13) \]
\[ \gamma_2 = 6. \quad (5.4.1.14) \]

d. Median, Quartile, Percentile, and Mode of the Two-Parameter Exponential Distribution:

\[ \text{Median} = b\left(\log_e 2\right) + a, \quad (5.4.1.15) \]
\[ Q_r = b\left[-\log_e\left(1-\tfrac{r}{4}\right)\right] + a; \quad r = 1, 2, 3, \]
\[ P_r = b\left[-\log_e\left(1-\tfrac{r}{100}\right)\right] + a; \quad r = 1, 2, \ldots, 99, \]
\[ \text{Mode} = a. \quad (5.4.1.16) \]

e. Reliability Function, Hazard Function, and MTTF of Two-Parameter Exponential Distribution:

\[ R(x) = \exp\left\{-\left(\frac{x-a}{b}\right)\right\}; \quad x > a, \quad (5.4.1.17) \]
\[ \lambda(x) = \frac{1}{b}, \quad (5.4.1.18) \]

which is a constant hazard rate, and

\[ \mathrm{MTTF} = b + a. \quad (5.4.1.19) \]


FIGURE 5.7  Reliability function and hazard function of exponential distribution.

f. Moment Generating Function and Characteristic Function (CF) of Two-Parameter Exponential Distribution:

\[ M_X(t) = e^{ta}\left(1 - tb\right)^{-1}; \quad t < \frac{1}{b}, \qquad \varphi_X(t) = e^{ita}\left(1 - itb\right)^{-1}. \]

5.4.2 ONE-PARAMETER EXPONENTIAL DISTRIBUTION

The one-parameter exponential model can be derived by putting c = 1 and a = 0 in Eqns. (5.1.1) and (5.1.2). The CDF and PDF of this model are, respectively, given by

\[ F_X(x) = 1 - \exp\left(-\frac{x}{b}\right); \quad x > 0,\ b > 0. \quad (5.4.2.1) \]

Symbolically: X ~ Exp(b),

\[ f_X(x) = \frac{1}{b}\exp\left(-\frac{x}{b}\right); \quad x > 0,\ b > 0. \quad (5.4.2.2) \]

5.4.2.1 PROPERTIES OF ONE-PARAMETER EXPONENTIAL DISTRIBUTION

a. Moments:

\[ \mu_1' = b, \quad (5.4.2.3) \]
\[ \mu_2' = 2b^{2}, \quad (5.4.2.4) \]
\[ \mu_3' = 6b^{3}, \quad (5.4.2.5) \]
\[ \mu_4' = 24b^{4}. \quad (5.4.2.6) \]

b. The first four central moments of the one-parameter exponential model are the same as those of the two-parameter exponential model (given in Eqns. (5.4.1.7)–(5.4.1.10)).

c. The skewness and kurtosis of the one-parameter exponential model are the same as those of the two-parameter exponential model (given in Eqns. (5.4.1.11)–(5.4.1.14)).

d. Median, Quartile, Percentile, and Mode:

\[ \text{Median} = b\left(\log_e 2\right), \quad (5.4.2.7) \]
\[ Q_r = b\left[-\log_e\left(1-\tfrac{r}{4}\right)\right]; \quad r = 1, 2, 3, \]
\[ P_r = b\left[-\log_e\left(1-\tfrac{r}{100}\right)\right]; \quad r = 1, 2, \ldots, 99, \]
\[ \text{Mode} = 0. \quad (5.4.2.8) \]

e. Reliability Function, Hazard Function, and MTTF:

\[ R(x) = \exp\left(-\frac{x}{b}\right); \quad x > 0, \quad (5.4.2.9) \]
\[ \lambda(x) = \frac{1}{b}. \quad (5.4.2.10) \]

It is a constant hazard rate, the same as that of the two-parameter exponential distribution. Further,

\[ \mathrm{MTTF} = b\,\Gamma(2) = b. \quad (5.4.2.11) \]

f. Moment Generating Function (MGF) and Characteristic Function (CF):

\[ M_X(t) = \left(1 - tb\right)^{-1}; \quad t < \frac{1}{b}, \qquad \varphi_X(t) = \left(1 - itb\right)^{-1}. \]

g. Memoryless Property: The exponential distribution satisfies

\[ P(X > d + x \mid X > d) = P(X > x); \quad \text{for } d, x \ge 0. \]

To verify this, note that

\[ P(X > d + x \mid X > d) = \frac{P(X > d + x,\ X > d)}{P(X > d)} = \frac{P(X > d + x)}{P(X > d)} = \frac{1 - F_X(d+x)}{1 - F_X(d)} = \frac{e^{-(x+d)/b}}{e^{-d/b}} = e^{-x/b} = P(X > x). \]

Therefore, \( P(X > d + x \mid X > d) = P(X > x) \) for d, x ≥ 0.
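The memoryless property can also be illustrated numerically in Python (a sketch, not from the original text; the scale and the time points d, x below are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
b, d, x = 2.0, 1.5, 0.7                        # arbitrary scale and time points
s = rng.exponential(scale=b, size=1_000_000)
lhs = (s > d + x).mean() / (s > d).mean()      # estimate of P(X > d + x | X > d)
rhs = (s > x).mean()                           # estimate of P(X > x)
print(lhs, rhs)                                # nearly equal, as the property asserts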

5.5 SOME MODIFICATIONS OF WEIBULL DISTRIBUTION

Although the Weibull model is one of the most useful models for analyzing lifetime/failure time data, data sets with bathtub-shaped hazard rates cannot be analyzed by the Weibull model. For analyzing these data sets, some modified Weibull distributions are available in the statistical literature. Now, we will discuss a few modified Weibull distributions.

5.5.1 EXPONENTIATED WEIBULL DISTRIBUTION

Mudholkar and Srivastava (1993) developed a three-parameter exponentiated Weibull model by adding a new shape parameter d > 0 to the standard two-parameter Weibull model. The PDF and CDF of this model are:

\[ f_X(x) = \frac{cd\,x^{c-1}}{b^{c}}\,e^{-\left(x/b\right)^{c}}\left[1 - e^{-\left(x/b\right)^{c}}\right]^{d-1}; \quad x > 0, \]

and

\[ F_X(x) = \left[1 - e^{-\left(x/b\right)^{c}}\right]^{d}; \quad x > 0, \]

where b > 0 denotes the scale parameter, and c > 0, d > 0 denote shape parameters. Mudholkar et al. (1995) further studied various properties of the exponentiated Weibull distribution and analyzed the bus-motor failure data.

FIGURE 5.8  Density function and distribution function of exponentiated Weibull distribution.

The particular cases of this model are given below:

Parameters        Name of Distribution
d = 1             Two-parameter Weibull distribution
c = 2             Generalized Rayleigh distribution
c = 1             Generalized exponential distribution (see Kundu and Gupta, 1999)
d = 1, c = 2      Rayleigh distribution
d = 1, c = 1      Exponential distribution

The hazard function of the distribution is given by

\[ \lambda(x) = \frac{f(x)}{R(x)} = \frac{\frac{cd\,x^{c-1}}{b^{c}}\,e^{-\left(x/b\right)^{c}}\left[1 - e^{-\left(x/b\right)^{c}}\right]^{d-1}}{1 - \left[1 - e^{-\left(x/b\right)^{c}}\right]^{d}}; \quad x > 0. \]

The following table provides the failure rate behavior of this distribution (see page 300, Mudholkar and Srivastava, 1993); the shape is governed by c and the product cd:

c            d            Failure Rate Behavior
c = 1        d = 1        Constant (exponential distribution)
any c        d = 1        Monotone (Weibull distribution)
c ≥ 1        cd ≥ 1       Increasing
c ≤ 1        cd ≤ 1       Decreasing
c > 1        cd < 1       Bathtub-shaped
c < 1        cd > 1       Unimodal

5.5.2 INVERSE WEIBULL DISTRIBUTION

The inverse Weibull distribution, also known as the Fréchet distribution (Fréchet, 1927), arises as the distribution of the reciprocal Y = 1/X of a Weibull random variable X. The CDF and PDF of this model are, respectively, given by

\[ F_Y(y) = e^{-\left(\frac{1}{by}\right)^{c}}; \quad y > 0,\ b > 0,\ c > 0, \]

\[ f_Y(y) = \frac{c}{b^{c}\,y^{c+1}}\,e^{-\left(\frac{1}{by}\right)^{c}}; \quad y > 0,\ b > 0,\ c > 0. \]

FIGURE 5.10  Density function and distribution function of inverse Weibull distribution.

The hazard function of the inverse Weibull distribution is

\[ \lambda(y) = \frac{f_Y(y)}{1 - F_Y(y)} = \frac{\frac{c}{b^{c} y^{c+1}}\,e^{-\left(\frac{1}{by}\right)^{c}}}{1 - e^{-\left(\frac{1}{by}\right)^{c}}} = \frac{c}{b^{c}\,y^{c+1}\left[e^{\left(\frac{1}{by}\right)^{c}} - 1\right]}; \quad y > 0,\ b > 0,\ c > 0. \]

FIGURE 5.11  Hazard function of inverse Weibull distribution.

5.5.3 MODIFIED WEIBULL DISTRIBUTION

Gurvich et al. (1997) developed a modified Weibull distribution in the context of modeling the random strength of brittle materials. Lai et al. (2003) independently proposed the same distribution for modeling failure times and studied several properties of this model. The CDF and PDF of this model are, respectively, given by

\[ F_X(x) = 1 - \exp\left\{-\left(\frac{x}{b}\right)^{c}\exp(dx)\right\}; \quad x > 0, \quad (5.5.1) \]

\[ f_X(x) = \left(\frac{dx + c}{b}\right)\left(\frac{x}{b}\right)^{c-1}\exp(dx)\,\exp\left\{-\left(\frac{x}{b}\right)^{c}\exp(dx)\right\}; \quad x > 0;\ d > 0,\ b > 0,\ c > 0. \]

FIGURE 5.12  Density function and distribution function of modified Weibull distribution.

The hazard function is given by

\[ \lambda(x) = \left(\frac{dx + c}{b}\right)\left(\frac{x}{b}\right)^{c-1}\exp(dx); \quad x > 0. \]

Since the parameters b and d do not change the qualitative shape of λ(x), we may consider two cases, namely c ∈ (0,1) and c ≥ 1. In the case c ∈ (0,1), λ(x) initially decreases and then increases in x, i.e., λ(x) has a bathtub shape. Recall that the hazard function of the usual Weibull distribution is constant, increasing, or decreasing. Thus, the hazard function λ(x) of the modified Weibull distribution differs from that of the usual Weibull distribution (Figure 5.13). In the other case, c ≥ 1, the hazard function λ(x) is an increasing function of x, i.e., the modified Weibull model has an increasing hazard function. For more details about the modified Weibull model, the readers are advised to see Gurvich et al. (1997) and Lai et al. (2003).

FIGURE 5.13  Hazard function of modified Weibull distribution.

KEYWORDS

•  Characteristic function
•  Cumulative distribution function
•  Mean time to failure
•  Moment generating function
•  Probability density function

REFERENCES

Fréchet, M., (1927). On the law of probability of the maximum deviation. Annals of the Polish Mathematical Society, Krakow, 6, 93–116.
Gurvich, M. R., Dibenedetto, A. T., & Rande, S. V., (1997). A new statistical distribution for characterizing the random strength of brittle materials. Journal of Materials Science, 32, 2559–2564.
Kundu, D., & Gupta, R. D., (1999). Generalized exponential distribution. Australian & New Zealand Journal of Statistics, 41, 173–188.
Lai, C. D., Xie, M., & Murthy, D. N. P., (2003). A modified Weibull distribution. IEEE Transactions on Reliability, 52(1), 33–37.
Mudholkar, G. S., & Srivastava, D. K., (1993). Exponentiated Weibull family for analyzing bathtub failure rate data. IEEE Transactions on Reliability, 42(2), 299–302.
Mudholkar, G. S., Srivastava, D. K., & Freimer, M., (1995). The exponentiated Weibull family: A reanalysis of the bus-motor-failure data. Technometrics, 37(4), 436–445.
Murthy, D. N. P., Xie, M., & Jiang, R., (2003). Weibull Models. New York: Wiley.
Rinne, H., (2009). The Weibull Distribution: A Handbook. CRC Press, Taylor & Francis.
Rosin, P., & Rammler, E., (1933). The laws governing the fineness of powdered coal. Journal of the Institute of Fuel, 7, 29–36.
Shannon, C. E., (1948). The mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
Weibull, W., (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18, 293–297.
Weibull, W., (1952). Discussion: A statistical distribution function of wide applicability. Journal of Applied Mechanics, 19, 233–234.
Weibull, W., (1961). Fatigue Testing and Analysis of Results. New York: Pergamon Press.
Xie, M., Tang, Y., & Goh, T. N., (2002). A modified Weibull extension with bathtub-shaped failure rate function. Reliability Engineering & System Safety, 76(3), 279–285.

CHAPTER 6

Life Distributions

MOHD. ARSHAD,¹ MUKTI KHETAN,² VIJAY KUMAR,³ and FOZIA HOMA⁴

¹ Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, Madhya Pradesh, India
² Department of Mathematics, Amity School of Applied Sciences, Amity University Mumbai, Maharashtra, India
³ Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India
⁴ Department of Statistics, Mathematics, and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India

6.1 PARETO DISTRIBUTION

The Pareto distribution is named after an Italian-born Swiss professor of economics, civil engineer, and sociologist, Vilfredo Pareto (1897). Pareto developed a law dealing with the distribution of income or wealth over a population or a society, known as Pareto's Law. His original quotation explained the distribution of income or wealth as an 80–20 proportion; in simple terms, the smaller proportion of the population holds the larger portion of wealth. This 80–20 rule is also known as the Pareto principle.

Definition: If a random variable X follows the Pareto distribution P(α, λ), then the expression of the cumulative distribution function (CDF) is

\[ F_X(x) = 1 - \left(\frac{\lambda}{x}\right)^{\alpha}; \quad x > \lambda,\ \alpha > 0,\ \lambda > 0, \]

and the corresponding probability density function (PDF) is

\[ f_X(x) = \frac{\alpha\lambda^{\alpha}}{x^{\alpha+1}}; \quad x > \lambda. \]

FIGURE 6.1  Pareto distribution.

Sample Generation Using the Inverse Transformation Method:

\[ F_X(x) = 1 - \left(\frac{\lambda}{x}\right)^{\alpha}; \quad x > \lambda. \]

Setting U = F_X(X), where U follows uniform(0,1),

\[ U = 1 - \left(\frac{\lambda}{X}\right)^{\alpha} \;\Rightarrow\; (1-U) = \left(\frac{\lambda}{X}\right)^{\alpha} \;\Rightarrow\; X = \frac{\lambda}{(1-U)^{1/\alpha}}. \]

Note: First, we generate a random sample of size n from U(0,1), i.e., u = (u_1, u_2, ..., u_n) (say). Define x_i = λ/(1 − u_i)^{1/α}; i = 1, 2, ..., n. Then, x = (x_1, x_2, ..., x_n) gives a random sample of size n from P(α, λ).

Example 6.1: Let u = (0.1, 0.3, 0.22, 0.51, 0.11, 0.6) be a random sample generated from the U(0,1) distribution. Now, we want to generate data from P(2,3), P(2,5), and P(3,5). Calculate x_i = λ/(1 − u_i)^{1/α}; i = 1, 2, ..., 6, where u_i denotes the ith element of u.

U(0,1)    P(2,3)       P(2,5)       P(3,5)
0.10      3.162278     5.270463     5.178721
0.30      3.585686     5.976143     5.631239
0.22      3.396831     5.661385     5.431734
0.51      4.285714     7.142857     6.342171
0.11      3.179994     5.299989     5.198045
0.60      4.743416     7.905694     6.786044
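The table in Example 6.1 can be reproduced with a few lines of Python (a sketch; the document's own worked values serve as the check):

import numpy as np

u = np.array([0.1, 0.3, 0.22, 0.51, 0.11, 0.6])
for alpha, lam in [(2, 3), (2, 5), (3, 5)]:
    # inverse transformation: x = lambda / (1 - u)^(1/alpha)
    print(alpha, lam, lam / (1 - u) ** (1 / alpha))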

6.1.1 PROPERTIES OF THE PARETO DISTRIBUTION

a. Raw Moments of Pareto Distribution: The rth raw moment is

\[ \mu_r' = E(X^{r}) = \int_{\lambda}^{\infty} x^{r}\,\frac{\alpha\lambda^{\alpha}}{x^{\alpha+1}}\,dx = \alpha\lambda^{\alpha}\int_{\lambda}^{\infty} x^{-\alpha+r-1}\,dx = \alpha\lambda^{\alpha}\left[\frac{x^{-\alpha+r}}{-\alpha+r}\right]_{\lambda}^{\infty} = \frac{\alpha\lambda^{r}}{\alpha-r}; \quad \alpha > r;\ r = 1, 2, 3, 4. \]

For r = 1: \( \mu_1' = \dfrac{\alpha\lambda}{\alpha-1}; \ \alpha > 1. \) (6.1.1)
For r = 2: \( \mu_2' = \dfrac{\alpha\lambda^{2}}{\alpha-2}; \ \alpha > 2. \) (6.1.2)
For r = 3: \( \mu_3' = \dfrac{\alpha\lambda^{3}}{\alpha-3}; \ \alpha > 3. \) (6.1.3)
For r = 4: \( \mu_4' = \dfrac{\alpha\lambda^{4}}{\alpha-4}; \ \alpha > 4. \) (6.1.4)

b. Central Moments of Pareto Distribution: The first four central moments μ1, μ2, μ3, μ4 have been derived in this section. We have already discussed that the first central moment is always zero:

\[ \mu_1 = 0. \quad (6.1.5) \]

For the second central moment, we utilize the first two raw moments:

\[ \mu_2 = \mu_2' - \mu_1'^{2} = \frac{\alpha\lambda^{2}}{\alpha-2} - \left(\frac{\alpha\lambda}{\alpha-1}\right)^{2} = \frac{\alpha\lambda^{2}}{(\alpha-2)(\alpha-1)^{2}}\left[(\alpha-1)^{2} - \alpha(\alpha-2)\right] \quad \text{[using Eqns. (6.1.1) and (6.1.2)]}, \]
\[ \mu_2 = \frac{\alpha\lambda^{2}}{(\alpha-2)(\alpha-1)^{2}}; \quad \alpha > 2. \quad (6.1.6) \]

Thus, the variance of the Pareto distribution is \( \dfrac{\alpha\lambda^{2}}{(\alpha-2)(\alpha-1)^{2}} \), α > 2.

The third central moment is obtained from the first three raw moments:

\[ \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^{3} = \frac{\alpha\lambda^{3}}{\alpha-3} - 3\,\frac{\alpha\lambda^{2}}{\alpha-2}\,\frac{\alpha\lambda}{\alpha-1} + 2\left(\frac{\alpha\lambda}{\alpha-1}\right)^{3} \quad \text{[using Eqns. (6.1.1)–(6.1.3)]} \]
\[ = \frac{\alpha\lambda^{3}}{(\alpha-3)(\alpha-2)(\alpha-1)^{3}}\left[(\alpha-2)(\alpha-1)^{3} - 3\alpha(\alpha-3)(\alpha-1)^{2} + 2\alpha^{2}(\alpha-3)(\alpha-2)\right]. \]

Expanding and simplifying the bracketed expression yields 2(α+1), so that

\[ \mu_3 = \frac{2\alpha\lambda^{3}(\alpha+1)}{(\alpha-3)(\alpha-2)(\alpha-1)^{3}}; \quad \alpha > 3. \quad (6.1.7) \]

Similarly, we get the fourth central moment using the first four raw moments:

\[ \mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^{2} - 3\mu_1'^{4} = \frac{\alpha\lambda^{4}}{\alpha-4} - 4\,\frac{\alpha\lambda^{3}}{\alpha-3}\,\frac{\alpha\lambda}{\alpha-1} + 6\,\frac{\alpha\lambda^{2}}{\alpha-2}\left(\frac{\alpha\lambda}{\alpha-1}\right)^{2} - 3\left(\frac{\alpha\lambda}{\alpha-1}\right)^{4} \quad \text{[using Eqns. (6.1.1)–(6.1.4)]} \]
\[ = \frac{\alpha\lambda^{4}}{(\alpha-4)(\alpha-3)(\alpha-2)(\alpha-1)^{4}}\left[(\alpha-3)(\alpha-2)(\alpha-1)^{4} - 4\alpha(\alpha-4)(\alpha-2)(\alpha-1)^{3} + 6\alpha^{2}(\alpha-4)(\alpha-3)(\alpha-1)^{2} - 3\alpha^{3}(\alpha-4)(\alpha-3)(\alpha-2)\right]. \]

After a lengthy but straightforward simplification, the bracketed expression reduces to 3(3α² + α + 2), giving

\[ \mu_4 = \frac{3\alpha\lambda^{4}\left(3\alpha^{2} + \alpha + 2\right)}{(\alpha-4)(\alpha-3)(\alpha-2)(\alpha-1)^{4}}; \quad \alpha > 4. \quad (6.1.8) \]

c. Skewness and Kurtosis of Pareto Distribution: Using Eqns. (6.1.6) and (6.1.7), we get

\[ \beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}} = \frac{\left[\dfrac{2\alpha\lambda^{3}(\alpha+1)}{(\alpha-3)(\alpha-2)(\alpha-1)^{3}}\right]^{2}}{\left[\dfrac{\alpha\lambda^{2}}{(\alpha-2)(\alpha-1)^{2}}\right]^{3}} = \frac{\alpha-2}{\alpha}\left[\frac{2(\alpha+1)}{\alpha-3}\right]^{2}. \]

Using the above result β1, we get

\[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{2(\alpha+1)}{\alpha-3}\sqrt{\frac{\alpha-2}{\alpha}}. \]

Using Eqns. (6.1.6) and (6.1.8), we get

\[ \beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{\dfrac{3\alpha\lambda^{4}(3\alpha^{2}+\alpha+2)}{(\alpha-4)(\alpha-3)(\alpha-2)(\alpha-1)^{4}}}{\left[\dfrac{\alpha\lambda^{2}}{(\alpha-2)(\alpha-1)^{2}}\right]^{2}} = \frac{3(\alpha-2)\left(3\alpha^{2}+\alpha+2\right)}{\alpha(\alpha-3)(\alpha-4)}. \]

Using the above result β2, we get

\[ \gamma_2 = \beta_2 - 3 = \frac{3\left[(\alpha-2)(3\alpha^{2}+\alpha+2) - \alpha(\alpha-3)(\alpha-4)\right]}{\alpha(\alpha-3)(\alpha-4)} = \frac{3\left[(3\alpha^{3}-5\alpha^{2}-4) - (\alpha^{3}-7\alpha^{2}+12\alpha)\right]}{\alpha(\alpha-3)(\alpha-4)}, \]
\[ \gamma_2 = \frac{6\left(\alpha^{3}+\alpha^{2}-6\alpha-2\right)}{\alpha(\alpha-3)(\alpha-4)}. \]

d. Median, Quartile, and Percentile of the Pareto Distribution: The median divides the data into two equal parts; in other words, we can write F_X(M) = 1/2, where M denotes the median. Substituting M in place of x in the expression of the CDF, we get

\[ 1 - \left(\frac{\lambda}{M}\right)^{\alpha} = \frac{1}{2} \;\Rightarrow\; M = \lambda\,2^{1/\alpha}. \]

Quartiles divide the data into four equal parts; in other words, we can write F_X(Q_r) = r/4; r = 1, 2, 3, where Q_r denotes the rth quartile. Hence

\[ Q_r = \frac{\lambda}{\left(1 - \frac{r}{4}\right)^{1/\alpha}}. \]

Clearly, Q2 equals M, i.e., the second quartile is equal to the median. Percentiles divide the data into 100 equal parts; in other words, we can write F_X(P_r) = r/100; r = 1, 2, ..., 99, where P_r denotes the rth percentile. Hence

\[ P_r = \frac{\lambda}{\left(1 - \frac{r}{100}\right)^{1/\alpha}}; \quad r = 1, 2, \ldots, 99. \]

We can find the expressions of P_1, P_2, ..., P_99 by putting r = 1 to 99, respectively, in the above expression.

e. Mode of the Pareto Distribution: To obtain the mode of this model, we examine the behavior of its PDF. Consider the logarithm of the PDF:

\[ \log_e f_X(x) = \log_e\left(\frac{\alpha\lambda^{\alpha}}{x^{\alpha+1}}\right) = \log_e\alpha + \alpha\log_e\lambda - (\alpha+1)\log_e x; \quad x > \lambda. \]

Differentiating both sides with respect to x,

\[ f_X'(x) = -\left(\frac{\alpha+1}{x}\right) f_X(x) < 0, \quad \forall\, x > \lambda. \]

This implies that f_X(x) is a decreasing function of x. Therefore, mode = λ.

f. Reliability Function of Pareto Distribution:

\[ R(x) = P[X > x] = 1 - F_X(x) = \begin{cases} \left(\frac{\lambda}{x}\right)^{\alpha}; & x > \lambda \\ 1; & x \le \lambda \end{cases} \]

g. Hazard Rate of Pareto Distribution: The hazard (or failure) rate λ(x) of this model is given by

\[ \lambda(x) = \frac{f(x)}{R(x)} = \frac{\alpha\lambda^{\alpha}/x^{\alpha+1}}{\left(\lambda/x\right)^{\alpha}} = \frac{\alpha}{x}; \quad x > \lambda. \]

FIGURE 6.2  Reliability function and hazard function of Pareto distribution.

h. Mean Time to Failure of Pareto Distribution: In reliability terminology, the mean of the random variable X in the absence of repair and replacement is known as the mean time to failure (MTTF). The expression of MTTF is the same as the mean of the model, i.e.,

\[ \mathrm{MTTF} = \frac{\alpha\lambda}{\alpha-1}; \quad \alpha > 1. \]

6.2 GENERALIZED PARETO DISTRIBUTION

If a random variable X follows the generalized Pareto distribution, then the expression of the CDF is

\[ F_X(x) = 1 - \left(\frac{\lambda}{\lambda+x}\right)^{\alpha}; \quad 0 < x < \infty, \]

where α (>0) and λ (>0) are the parameters of the distribution. The density function is

\[ f_X(x) = \frac{d}{dx}F_X(x) = \frac{\alpha\lambda^{\alpha}}{(\lambda+x)^{\alpha+1}}; \quad x > 0,\ \alpha > 0,\ \lambda > 0. \]

6.2.1 PROPERTIES OF THE GENERALIZED PARETO DISTRIBUTION

a. Moments of Generalized Pareto Distribution: The first two raw moments have been derived below. The first raw moment is

\[ \mu_1' = E(X) = \int_{0}^{\infty} x\,\frac{\alpha\lambda^{\alpha}}{(\lambda+x)^{\alpha+1}}\,dx. \]

Let λ + x = z ⇒ dx = dz and x = z − λ. Substituting the values of x and dx, we get

\[ \mu_1' = \alpha\lambda^{\alpha}\int_{\lambda}^{\infty}(z-\lambda)\,z^{-\alpha-1}\,dz = \alpha\lambda^{\alpha}\int_{\lambda}^{\infty} z^{-\alpha}\,dz - \alpha\lambda^{\alpha+1}\int_{\lambda}^{\infty} z^{-\alpha-1}\,dz = \frac{\alpha\lambda}{\alpha-1} - \lambda, \]
\[ \mu_1' = \frac{\lambda}{\alpha-1}; \quad \alpha > 1. \]

Thus, the mean of the generalized Pareto distribution is λ/(α − 1). The second raw moment is

\[ \mu_2' = E(X^{2}) = \alpha\lambda^{\alpha}\int_{\lambda}^{\infty}(z-\lambda)^{2} z^{-\alpha-1}\,dz = \alpha\lambda^{\alpha}\int_{\lambda}^{\infty}\left(z^{-\alpha+1} + \lambda^{2} z^{-\alpha-1} - 2\lambda z^{-\alpha}\right)dz \]
\[ = \frac{\alpha\lambda^{2}}{\alpha-2} + \lambda^{2} - \frac{2\alpha\lambda^{2}}{\alpha-1} = \frac{\lambda^{2}}{(\alpha-1)(\alpha-2)}\left[\alpha(\alpha-1) + (\alpha-1)(\alpha-2) - 2\alpha(\alpha-2)\right], \]
\[ \mu_2' = \frac{2\lambda^{2}}{(\alpha-1)(\alpha-2)}; \quad \alpha > 2. \]

c. Reliability Function, Hazard Rate, and Mean Time to Failure of Generalized Pareto Distribution:

\[ R(x) = \begin{cases} \left(\frac{\lambda}{x+\lambda}\right)^{\alpha}; & x > 0 \\ 1; & x \le 0 \end{cases} \]
\[ \lambda(x) = \frac{\alpha}{x+\lambda}; \quad x > 0, \]
\[ \mathrm{MTTF} = \frac{\lambda}{\alpha-1}; \quad \alpha > 1. \]

6.3 BURR DISTRIBUTION

If a random variable X follows the Burr distribution, then the expression of the CDF is

\[ F_X(x) = 1 - \left(\frac{1}{1+x^{\gamma}}\right)^{\alpha}; \quad 0 < x < \infty, \]

where α (>0) and γ (>0) are the parameters of the distribution. The density function is

\[ f_X(x) = \frac{d}{dx}F_X(x) = \frac{\alpha\gamma\,x^{\gamma-1}}{\left(1+x^{\gamma}\right)^{\alpha+1}}; \quad x > 0,\ \alpha > 0,\ \gamma > 0. \]

FIGURE 6.3  Burr distribution.

Sample Generation Using the Inverse Transformation Method:

\[ F_X(x) = 1 - \left(\frac{1}{1+x^{\gamma}}\right)^{\alpha}; \quad x > 0. \]

Setting U = F_X(X), where U follows uniform(0,1),

\[ U = 1 - \left(\frac{1}{1+X^{\gamma}}\right)^{\alpha} \;\Rightarrow\; 1-U = \left(\frac{1}{1+X^{\gamma}}\right)^{\alpha} \;\Rightarrow\; X = \left[(1-U)^{-1/\alpha} - 1\right]^{1/\gamma}. \]

Note: Generate a random sample of size n from U(0,1), i.e., u = (u_1, u_2, ..., u_n) (say). Define x_i = [(1 − u_i)^{−1/α} − 1]^{1/γ}; i = 1, 2, ..., n. Then, x = (x_1, x_2, ..., x_n) gives a random sample of size n from Burr(α, γ).


Example 6.2: Let u = (0.1, 0.3, 0.22, 0.51, 0.11, 0.6) be a random sample from the U(0,1) distribution. Now, we want to generate a random sample from the Burr distribution. Calculate x_i = [(1 − u_i)^{−1/α} − 1]^{1/γ}; i = 1, 2, ..., 6, where u_i denotes the ith element of (u_1, u_2, ..., u_n).

U(0,1)    Burr(2,3)    Burr(2,5)    Burr(3,5)
0.10      0.3781921    0.5579914    0.5136190
0.30      0.5801155    0.7212880    0.6610660
0.22      0.5095203    0.6672628    0.6127018
0.51      0.7539474    0.8441209    0.7687190
0.11      0.3914822    0.5696750    0.5242743
0.60      0.8345006    0.8971312    0.8139251
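As with Example 6.1, this table can be reproduced in Python (a sketch using the inverse transformation derived above):

import numpy as np

u = np.array([0.1, 0.3, 0.22, 0.51, 0.11, 0.6])
for alpha, gamma in [(2, 3), (2, 5), (3, 5)]:
    # inverse transformation: x = ((1 - u)^(-1/alpha) - 1)^(1/gamma)
    print(alpha, gamma, ((1 - u) ** (-1 / alpha) - 1) ** (1 / gamma))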

6.3.1 PROPERTIES OF THE BURR DISTRIBUTION

a. Moments: The rth raw moment is

\[ \mu_r' = E(X^{r}) = \int_{0}^{\infty} x^{r}\,\frac{\alpha\gamma\,x^{\gamma-1}}{(1+x^{\gamma})^{\alpha+1}}\,dx = \alpha\gamma\int_{0}^{\infty}\frac{x^{\gamma+r-1}}{(1+x^{\gamma})^{\alpha+1}}\,dx. \]

Let \( x^{\gamma} = z \Rightarrow x = z^{1/\gamma} \) and \( dx = \frac{1}{\gamma}\,z^{(1/\gamma)-1}\,dz \). Then

\[ \mu_r' = \alpha\int_{0}^{\infty}\frac{z^{r/\gamma}}{(1+z)^{\alpha+1}}\,dz. \]

Now, using the results of the beta function of the second kind, we get

\[ \mu_r' = \alpha\,B\left(\alpha - \frac{r}{\gamma},\ \frac{r}{\gamma} + 1\right); \quad \alpha > \frac{r}{\gamma}. \]

For r = 1,

\[ \mu_1' = \alpha\,B\left(\alpha - \frac{1}{\gamma},\ \frac{1}{\gamma} + 1\right); \quad \alpha > \frac{1}{\gamma}. \]

Therefore, the mean of the Burr distribution is \( \alpha B\left(\alpha - \frac{1}{\gamma},\ \frac{1}{\gamma} + 1\right) \). For r = 2,

\[ \mu_2' = \alpha\,B\left(\alpha - \frac{2}{\gamma},\ \frac{2}{\gamma} + 1\right); \quad \alpha > \frac{2}{\gamma}. \]

The variance of the Burr distribution is given below:

\[ \mu_2 = \mu_2' - \mu_1'^{2} = \alpha\,B\left(\alpha - \frac{2}{\gamma},\ \frac{2}{\gamma} + 1\right) - \left[\alpha\,B\left(\alpha - \frac{1}{\gamma},\ \frac{1}{\gamma} + 1\right)\right]^{2}. \]

b. Median, Quartile, and Percentile of the Burr Distribution: The median divides the data into two equal parts; in other words, we can write F_X(M) = 1/2, where M denotes the median. Substituting M in place of x in the expression of the CDF, we get

\[ 1 - \left(\frac{1}{1+M^{\gamma}}\right)^{\alpha} = \frac{1}{2} \;\Rightarrow\; 1 + M^{\gamma} = 2^{1/\alpha} \;\Rightarrow\; M = \left(2^{1/\alpha} - 1\right)^{1/\gamma}. \]

Quartiles divide the data into four equal parts; in other words, we can write F_X(Q_r) = r/4; r = 1, 2, 3, where Q_r denotes the rth quartile:

\[ Q_r = \left[\left(1 - \frac{r}{4}\right)^{-1/\alpha} - 1\right]^{1/\gamma}. \]

Clearly, Q2 equals M, i.e., the second quartile is equal to the median. Percentiles divide the data into 100 equal parts; in other words, we can write F_X(P_r) = r/100; r = 1, 2, ..., 99, where P_r denotes the rth percentile:

\[ P_r = \left[\left(1 - \frac{r}{100}\right)^{-1/\alpha} - 1\right]^{1/\gamma}; \quad r = 1, 2, \ldots, 99. \]

We can find the expressions of P_1, P_2, ..., P_99 by putting r = 1 to 99, respectively, in the above expression.

c. Mode of the Burr Distribution: To obtain the mode of this model, we differentiate the PDF with respect to x. Taking logarithms,

\[ \log_e f_X(x) = \log_e\left[\frac{\alpha\gamma\,x^{\gamma-1}}{(1+x^{\gamma})^{\alpha+1}}\right] = \log_e\alpha + \log_e\gamma + (\gamma-1)\log_e x - (\alpha+1)\log_e(1+x^{\gamma}). \]

Differentiating both sides with respect to x,

\[ f_X'(x) = \left[\frac{\gamma-1}{x} - \frac{(\alpha+1)\gamma\,x^{\gamma-1}}{1+x^{\gamma}}\right] f_X(x). \]

Since the density function is always positive, f_X'(x) = 0 if

\[ \frac{\gamma-1}{x} = \frac{(\alpha+1)\gamma\,x^{\gamma-1}}{1+x^{\gamma}} \;\Rightarrow\; (\gamma-1)(1+x^{\gamma}) = (\alpha+1)\gamma\,x^{\gamma} \;\Rightarrow\; x^{\gamma} = \frac{\gamma-1}{1+\alpha\gamma}, \]

\[ \text{mode} = \left(\frac{\gamma-1}{1+\alpha\gamma}\right)^{1/\gamma}, \]

valid for γ > 1; for γ ≤ 1 the PDF is decreasing in x and the mode occurs at x = 0.

d. Reliability Function of Burr Distribution: The reliability function R(x) at time x of the component/system is given by R(x) = Probability that the time to failure of the component/system is greater than x:

\[ R(x) = P[X > x] = 1 - F_X(x) = \left(\frac{1}{1+x^{\gamma}}\right)^{\alpha}; \quad 0 < x < \infty. \]

e. Hazard Rate of Burr Distribution: The hazard rate follows directly from f(x) and R(x):

\[ \lambda(x) = \frac{f(x)}{R(x)} = \frac{\alpha\gamma\,x^{\gamma-1}}{1+x^{\gamma}}; \quad x > 0. \]

FIGURE 6.4  Reliability function and hazard function of Burr distribution.

f. Mean Time to Failure of Burr Distribution: The expression of MTTF of this model is

\[ \mathrm{MTTF} = \alpha\,B\left(\alpha - \frac{1}{\gamma},\ \frac{1}{\gamma} + 1\right); \quad \alpha > \frac{1}{\gamma}. \]

KEYWORDS

•  Cumulative distribution function
•  Mean time to failure
•  Pareto distribution
•  Probability density function


CHAPTER 7

Dynamics of Data Analysis and Visualization with Python

PRASHANT VERMA,¹ SHWETA DIXIT,² MUKTI KHETAN,³ SURESH BADARLA,³ and FOZIA HOMA⁴

¹ Department of Statistics, Faculty of Science, University of Allahabad, Prayagraj, India
² Clinical Development Services Agency, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Haryana, India
³ Department of Mathematics, Amity School of Applied Sciences, Amity University Mumbai, Maharashtra, India
⁴ Department of Statistics, Mathematics, and Computer Application, Bihar Agricultural University, Sabour, Bhagalpur, Bihar, India

7.1 BACKGROUND

In 1989, Guido Van Rossum developed an object-oriented programming language – Python. It has been optimally designed for rapid prototyping of complex applications in the field of data science. It has interfaces to several operating systems and libraries. Considering the large-scale utilities of Python, many worldwide renowned companies and organizations use the


Python programming language, e.g., GlaxoSmithKline (GSK), ISRO, NASA, Google, YouTube, and BitTorrent. Python programming is widely used in Machine Learning, Natural Language Processing, Artificial Intelligence, Neural Networks, and other advanced dimensions of Information Technology and Computer Science. Code readability has been the primary center of attention for the development of Python. This chapter will demonstrate the applications of Python in reading data, data visualization, data manipulation, data analysis, model building, and prediction using an open-source data set.

7.2 INSTALLING PYTHON

Python is open source and is free to install. Before we start working with Python, we first need to download the installation package of the desired version of Python from the link: https://www.python.org/downloads/. This page will direct the user to choose between the two latest versions for Python 2 and 3: Python 3.9.1 and Python 2.7.18. Alternatively, if a user needs a specific Python version, the page can be scrolled down to find download links for earlier versions. The current exercise will be done in 'Jupyter Notebook'; however, one can use 'Spyder' or other programming interfaces using the same programming codes for Python.

7.3 INSTALLING PYTHON LIBRARIES

We can use the Anaconda prompt for installing a new library using the code "pip install library_name." Alternatively, the libraries can be installed through Jupyter Notebook as well. Once the library is installed, it needs to be imported using the import command – "import library_name as preferred_name." Usually, the preferred name of the library is taken as the initials or a short name of the library, e.g., "import numpy as np."

7.3.1 IMPORTING PANDAS LIBRARY

In the Python programming language, pandas refers to a software library developed for data analysis and manipulation. For example, pandas provides data structures and operations for manipulating data frames and numerical tables.
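A minimal sketch of the import step (the original code cell is not reproduced here):

import pandas as pd   # pandas imported with its conventional short name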

7.4 IMPORT DATA FROM .CSV FILE The Titanic data set is a renowned piece of data that is open source. The data set contains the information of passengers onboard the Titanic, a British passenger liner that sank in the North Atlantic Ocean in 1912. This data set is often used to demonstrate the binary classification problem in which we try to predict if a passenger boarded on the ship survives or not. The data is available in .csv format. Thus the data will be fetched using the below codes.
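A sketch of the corresponding cell, assuming the file is saved as 'Titanic.csv' in the working directory (the actual path used in the original differs):

import pandas as pd

Data = pd.read_csv('Titanic.csv')   # fetch the Titanic data from a .csv file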

In Python, we can choose single quotes (' ') as well as double quotes (" ") to address the location of the file, variable names, and other functions. A user can read an Excel file using the 'read_excel' command in place of the 'read_csv' command.

7.4.1 RENAMING THE DATASET

Suppose we need to change the name of our dataset from 'Data' to 'df'; the below command will serve the purpose.
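A sketch of the renaming step:

df = Data.copy()   # an independent duplicate named 'df' (df = Data alone would only alias the same object)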

The dataset ‘Data’ now be stored as a different data frame ‘df.’ In fact, Python will create a duplicate data frame with the name ‘df.’ Thus, we can use both ‘df’ and ‘data’ data-frames for further analysis. For data preview, we execute the below code.


If we wish to preview a desired number of observations (rows), e.g., 10 rows of the data frame (df), we can execute the below code.
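df.head(10)   # preview the first 10 rows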

7.4.2 SELECTING A RANDOM SAMPLE WITH REPLACEMENT

A random sample with replacement is taken using the codes mentioned below. This new data frame Data_Titanic consists of 95% of the observations of the original dataset df.
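A sketch of this step (the random seed is illustrative):

Data_Titanic = df.sample(frac=0.95, replace=True, random_state=1)  # 95% sample, with replacement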

Now further, we will be dealing with this dataset 'Data_Titanic.'

7.5 DATA EXPLORATION AND PREPARATION

In data analysis, data exploration is the initial step and basically involves summarizing the foremost attributes of a data set, including its shape, accuracy, initial patterns in the dataset, and other features. The following are the steps involved to understand, clean, and prepare the data for analysis and model building:

i. Variable identification;
ii. Univariate analysis;
iii. Bi-variate analysis;
iv. Missing values treatment;
v. Variable transformation;
vi. Variable creation.


The data dictionary is the most significant tool to understand the data for further treatment. The data dictionary is also called the metadata: a set of data that describes and furnishes information about the other data which we are going to use for analysis or model building. To understand the below data dictionary, let us have a preview of the data Data_Titanic.

Data Dictionary Details:

1. Survived: 0 = No, 1 = Yes (survival of the passenger)
2. pclass: Economic status (1st = Upper, 2nd = Middle, 3rd = Lower)
3. Sex: Sex of the passenger
4. Age: Age in years
5. sibsp: Number of siblings/spouses aboard the Titanic
6. parch: Number of parents/children aboard the Titanic
7. ticket: Ticket number
8. fare: Cost of the ticket
9. cabin: Cabin number
10. embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

7.5.1 DATA QUALITY AND MISSING VALUE ASSESSMENT

We always need an exemplary data set to perform data analysis and modeling; thus, we need to measure data quality effectively to serve the purpose. Data quality refers to the capability of a data set to achieve the desired objective. Effective data analysis cannot be done with low-quality data. There are several advanced strategies that we can use to improve the quality of the data and make the best use of it to fulfill our analytical requirements.


Before going forward to assess the data quality, we should check the dimensions of our dataset. The below code exhibits the dimensions of the given data frame; it shows that the data set contains 846 passengers (rows) and 12 variables (columns).
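Data_Titanic.shape   # returns (rows, columns)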

 Output: (846, 12) Further, we can use the “describe” function to obtain the basic statistics of the data frame.
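Data_Titanic.describe()   # basic statistics of the numeric columns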

If we desire to get the basic statistics of a particular variable, then the below code can be executed using the describe function and the variable name. In the below command, basic statistics for the variable ‘Age’ have been obtained.
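Data_Titanic['Age'].describe()   # basic statistics for the single variable 'Age'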

In addition to getting all the basic statistics of a variable, one can find the value of a particular characteristic (parameter) corresponding to the variable. The below command enables us to obtain the minimum of the variable 'Age.'
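Data_Titanic['Age'].min()   # minimum recorded age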

 Output: 0.42 The above value of age is obtained in years. Since it is in years, we can convert the age into months using the below code.
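Data_Titanic['Age'].min() * 12   # 0.42 years expressed in months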

 Output: 5.04 In order to realize the missing value and have a preview of missing values in the data frame, isnull() function can be used as below.
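Data_Titanic.isnull().head()   # True marks a missing (null) entry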

In the above output, the “True” value reflects that the variable has the null value at that place. On the other hand, if we need to realize the count of missing observations corresponding to the various variables, the below code can serve the purpose.
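Data_Titanic.isnull().sum()   # count of missing observations per variable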


7.5.1.1 DATA VISUALIZATION

Data visualization provides accessibility to find and understand patterns, outliers, and trends in the dataset. Charts, graphs, and maps are the most common data visualization tools. Let us see how the 'Age' variable looks in a histogram (Figure 7.1) using the matplotlib.pyplot library.
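A sketch of the histogram cell (bin count, color, and file name are illustrative; the settings mirror the points listed below the figure):

import matplotlib.pyplot as plt

plt.rcParams.update({'font.size': 11})            # current font size used here
Data_Titanic['Age'].plot(kind='hist', bins=20, color='steelblue', alpha=1)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.savefig('age_histogram.png')                  # save must come before plt.show()
plt.show()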

We can change the current font size (11) in the syntax as per the visualization need.

FIGURE 7.1  Histogram for the age distribution.

Referring to the above codes for a histogram, the following points are suggested:


i. Bins can be modified to change the width of the histogram bars, the color can be modified as per requirement, and alpha shows the thickness of the color (1 stands for 100% thickness).
ii. We can choose a different file type, e.g., pdf or png, for exporting the image. Also, we can opt for the save-image-as option by right-clicking on the graph.
iii. Please note that the syntax 'plt.show()' should be placed after the 'plt.savefig' command; otherwise, the image may be saved as blank in the output file.

Furthermore, we have removed the grid lines in the histogram (Figure 7.2) using the below code:
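Data_Titanic['Age'].plot(kind='hist', bins=20, color='steelblue', alpha=1)
plt.grid(False)   # suppress the grid lines
plt.show()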

FIGURE 7.2  Histogram (without grid) for the age distribution.

 Age–Missing Values: Moving forward, we can find the missing value count corresponding to a particular variable. The below code supplies the count of null values found in ‘Age’ Variable.
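Data_Titanic['Age'].isnull().sum()   # null count for 'Age'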


 Output: 171 The above output suggests that there are a total of 171 records in the data set with missing age values. In the below syntax, we have calculated the missing-data proportion in the 'Age' variable. First, we compute the total number of observations (846) in the data set; then we divide the number of missing records (171) by the total number of passengers.
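len(Data_Titanic)                                        # total passengers
Data_Titanic['Age'].isnull().sum() / len(Data_Titanic)   # missing proportion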

 Output: 846

 Output: 0.20212765957446807 One may obtain the rounded value using the 'round' function as below. The proportion of missing values has been rounded off to 3 decimal places.
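round(Data_Titanic['Age'].isnull().sum() / len(Data_Titanic), 3)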

 Output: 0.202 Here, we can comment that ~20% of records do not have the age information of the passenger.
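The mean of 'Age' can be obtained first (a sketch; skipna=True ignores the nulls):

Data_Titanic['Age'].mean(skipna=True)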

 Output: 29.81 After glancing at the Histogram of ‘Age,’ we can observe that this variable is significantly skewed. This indicates that the mean might give us biased results if we choose it for missing value imputation. In this connection, we will use the median to impute the missing values through the below code.
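Data_Titanic['Age'].median(skipna=True)   # median is robust to the skew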

 Output: 28.0 In the above code, ‘skipna=True’ removes the null values and calculates the median on the rest data points.


 Cabin–Missing Values: Similar to the ‘Age’ variable, we need to compute the proportion of missing observation against the variable ‘Cabin’ using the codes given below. In the below codes 665 is the number of observation where the value of the Cabin variable is null.
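round(Data_Titanic['Cabin'].isnull().sum() / len(Data_Titanic), 3)   # 665 nulls out of 846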

 Output: 0.786 We have realized that ~79% of records are missing; this indicates that it is probably not wise to impute the missing values and use the 'Cabin' variable for further analysis. Thus, we may ignore this variable in our model or make a separate category for passengers with the null value.

Embarked–Missing Values: Similarly, the proportion of missing values for the variable 'Embarked' has been computed as below.
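round(Data_Titanic['Embarked'].isnull().sum() / len(Data_Titanic), 3)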

 Output: 0.002 There are only 2 missing values for ‘Embarked,’ so we can impute with the port where most people boarded (Mode) since the variable is nominal type. Previously we have used the matplotlib, a data visualization library. Further, we will use Seaborn library, which provides a high-level interface to draw statistical graphs and charts. Before using the seaborn library, we need to install the library using the below command.
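!pip install seaborn   # in a Jupyter cell; from the Anaconda prompt: pip install seaborn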

In the next step, we can set the background style of our choice; in the below bar chart, the background is white. The style must be one of 'white,' 'dark,' 'whitegrid,' 'darkgrid,' or 'ticks' for seaborn plots. Figure 7.3 exhibits the bar chart for the feature/variable 'Embarked Port' using the seaborn library.
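A sketch of the bar-chart cell:

import seaborn as sns

sns.set_style('white')                          # plain white background
sns.countplot(x='Embarked', data=Data_Titanic)  # counts per embarkation port
plt.show()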


FIGURE 7.3  Bar chart for embarked port.

Further, we can show the grid lines using the 'whitegrid' style and change the color combination using 'Set1.' Please refer to Figure 7.4.
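sns.set_style('whitegrid')                                       # grid lines on
sns.countplot(x='Embarked', data=Data_Titanic, palette='Set1')   # different color set
plt.show()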

FIGURE 7.4  Bar chart for embarked port with grid lines and different color set.


Since most passengers boarded in Southampton, we will impute the two null observations with 'S.'

7.5.1.2 REPLACING THE NULL VALUES

Now, we will replace the null values with the other values as discussed above. In the below codes, the null values in the 'Age' variable have been replaced by the median age, i.e., 28 years. Similarly, null values in the 'Embarked' variable have been replaced by 'S,' which means Southampton. The code inplace=True depicts that the data set Data_Titanic will be overwritten with these changes (null value imputation).
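Data_Titanic['Age'].fillna(28, inplace=True)        # median age
Data_Titanic['Embarked'].fillna('S', inplace=True)  # most frequent port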

7.6 REMOVING A VARIABLE FROM THE DATA SET

We have observed that the variable 'Cabin' consists of 79% missing or null values; therefore, it is recommended that this variable be dropped from the dataset and not used in further analysis. The 'drop' function has been used as below to delete the column 'Cabin.'
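Data_Titanic.drop('Cabin', axis=1, inplace=True)   # axis=1 targets a column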


Here, we can observe that the null values have been replaced; thus, there is no null value present in the preview of null counts.

7.7 COMPUTATION OF NEW VARIABLES

According to the data dictionary, both the variables 'SibSp' and 'Parch' show the number of companions. In order to avoid possible multicollinearity, we can combine the effect of these variables into one categorical feature, i.e., whether or not that individual was a solo passenger. First, we add the two variables 'SibSp' and 'Parch' to get the total count of the companions with the passenger, stored in a new variable, 'Traveller_Count.' The below-mentioned code is used to calculate the total count of the companions.
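Data_Titanic['Traveller_Count'] = Data_Titanic['SibSp'] + Data_Titanic['Parch']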

Furthermore, to compute the variable ‘Solo_Traveller,’ numpy library has been imported. Using the ‘where’ function, a passenger is marked ‘NO’ in the variable ‘Solo_Traveller’ if the total count of companions is greater than 0.
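import numpy as np

# 'NO' (not solo) when the passenger has at least one companion, else 'YES'
Data_Titanic['Solo_Traveller'] = np.where(Data_Titanic['Traveller_Count'] > 0, 'NO', 'YES')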

In order to use the continuous variable ‘Age’ as a Categorical variable, a new variable, ‘Age_Group’ has been computed using various conditions. This computed categorical variable has Six categories, as mentioned in the below codes.
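A sketch of the categorization step; the six bin edges and labels below are illustrative, since the original screenshot defines the exact cut points:

Data_Titanic['Age_Group'] = pd.cut(Data_Titanic['Age'],
                                   bins=[0, 12, 18, 30, 45, 60, 100],
                                   labels=['0-12', '13-18', '19-30',
                                           '31-45', '46-60', '60+'])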

The syntax mentioned below enables us to glance at the distribution of age categories in the data.
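Data_Titanic['Age_Group'].value_counts()   # frequency of each age category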


Ascending the data view corresponding to a particular variable: In order to have a glance at the data in ascending order of age group, the following syntax may be used.
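Data_Titanic.sort_values(by='Age_Group').head()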

7.8 EXPLORATORY DATA ANALYSIS

Exploratory data analysis is a critical process of executing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations. In this section, we will explore the bivariate distributions of the independent variables and the dependent variable.

7.8.1 BI-VARIATE EXPLORATION OF AGE

The below-mentioned code explores the unique values of the age of the passengers.
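Data_Titanic['Age'].unique()   # distinct age values present in the data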


The below-mentioned codes give the density plot (Figure 7.5) of age with respect to the probability of survival of the passengers.
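A sketch of the density-plot cell:

sns.kdeplot(data=Data_Titanic, x='Age', hue='Survived', common_norm=False)
plt.show()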

FIGURE 7.5  Density plot of age with respect to survival probability.

Using the below codes, we have plotted the age distribution of survived and died passengers in a single density plot (Figure 7.6).
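sns.kdeplot(Data_Titanic.loc[Data_Titanic['Survived'] == 1, 'Age'], label='Survived')
sns.kdeplot(Data_Titanic.loc[Data_Titanic['Survived'] == 0, 'Age'], label='Died')
plt.legend()
plt.show()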


FIGURE 7.6  Density plot of age for surviving population and deceased population.

The age distributions for survivors and dead passengers are pretty similar. However, it is worth noting that, of the survivors, a more significant proportion were children. This may be evidence that the passengers attempted to save children by giving them priority access to the lifeboats.

7.8.2 BI-VARIATE EXPLORATION OF FARE
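A sketch of the fare density plot, mirroring the age plot above:

sns.kdeplot(Data_Titanic.loc[Data_Titanic['Survived'] == 1, 'Fare'], label='Survived')
sns.kdeplot(Data_Titanic.loc[Data_Titanic['Survived'] == 0, 'Fare'], label='Died')
plt.legend()
plt.show()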

FIGURE 7.7  Density plot of fare for surviving and deceased population.


Figure 7.7 depicts that the fare distributions for survivors vs. deceased are significantly different, so fare would likely be a significant predictor in our model development process. Passengers who paid lower fares appear to have been less likely to survive. This result suggests that there is probably a strong correlation between passenger class and survival, which we will look at next.

7.8.3 BI-VARIATE EXPLORATION OF PASSENGER CLASS

The below-mentioned codes explore the distribution of survival probabilities among the passenger classes.
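sns.barplot(x='Pclass', y='Survived', data=Data_Titanic)   # mean survival per class
plt.show()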

FIGURE 7.8  Survival probabilities among passenger classes.

Figure 7.8 gives the survival probabilities among all passenger classes. Inevitably, being a first-class passenger was safest and had the highest propensity of survival.


7.8.4 BI-VARIATE EXPLORATION OF EMBARKED PORT

The below-mentioned codes exhibit the distribution of survival probabilities among the embarkation ports (Figure 7.9).
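sns.barplot(x='Embarked', y='Survived', data=Data_Titanic)
plt.show()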

FIGURE 7.9  Survival probabilities among embarked ports.

Exploring the above distribution, we can state that the passengers who boarded in Cherbourg, France, appear to have the highest survival rate. Passengers who boarded in Southampton were marginally less likely to survive than those who boarded in Queenstown. This is probably related to passenger class, or maybe even the order of room assignments (e.g., maybe earlier passengers were more likely to have rooms closer to deck). 7.8.5 BI-VARIATE EXPLORATION OF SOLO TRAVELER
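sns.barplot(x='Solo_Traveller', y='Survived', data=Data_Titanic)
plt.show()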


FIGURE 7.10  Survival probabilities vs. solo passengers.

Figure 7.10 depicts that individuals traveling without families were more likely to die in the disaster than those with families. For the given incident, there is a possibility that individuals traveling alone were predominantly male.

7.8.6 BI-VARIATE EXPLORATION OF GENDER VARIABLE
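A sketch of the corresponding plot (Figure 7.11):

# Survival probability by gender
sns.barplot(x='Sex', y='Survived', data=Data_Titanic)
plt.show()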

The finding exhibited in Figure 7.11 shows a pronounced difference. Clearly, female passengers were given preference to board the lifeboats first, increasing their chances of survival.

7.9 CREATING DUMMY VARIABLES

In order to use the categorical variables in the model, we need to create dummy variables for all the categorical variables that seem essential


for model building. We can refer to the codes mentioned below to create dummy variables for the categorical predictor/independent variable ‘Pclass,’ which refers to the passenger class.
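A minimal sketch of this step (the helper variable name dummies_pclass is illustrative):

# Indicator (dummy) variables for the passenger class
dummies_pclass = pd.get_dummies(Data_Titanic['Pclass'], prefix='Pclass')
Data_Titanic = pd.concat([Data_Titanic, dummies_pclass], axis=1)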

FIGURE 7.11  Survival probabilities vs. gender of traveler.

In order to create multiple dummy variables in one step, the following syntax may be utilized.
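A sketch using pandas' columns argument, which dummy-codes several variables at once:

# Create dummies for the remaining categorical variables in one step
Data_Titanic2 = pd.get_dummies(Data_Titanic, columns=['Embarked', 'Sex'])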

After creating dummy variables for all desired categorical variables, the new dataset is stored as Data_Titanic2. The below preview of the data ‘Data_Titanic2’ exhibits the dummy variables for the categorical variables ‘Embarked’ and ‘Sex.’ The use of dummy variables will be discussed in the model development section.
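For instance:

# Preview the data set with the newly created dummy variables
Data_Titanic2.head()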


After the exploration of the variables, we have an idea about the importance of the predictors for model building. Therefore, the unimportant and unnecessary variables can be dropped as per the below syntax.
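A sketch of the drop step; the column list here is illustrative, and the actual variables dropped may differ:

# Drop variables judged unimportant for model building
Data_Titanic2 = Data_Titanic2.drop(['Name', 'Ticket', 'Cabin'], axis=1)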

7.10 EXPORTING THE DATA TO A .CSV FILE

Once the data cleaning and preparation are completed, we should export the data in the desired file format to save processing time when using the same data for analysis or model building in the future. The below syntax can be used to export the prepared data ‘Data_Titanic2’ into a .csv file with the name ‘Data_Titanic.csv.’ Users may export the data in other file formats, such as .txt, .xlsx, etc.
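A one-line sketch of the export step (the index=False argument, which omits the row index, is an assumed detail):

# Export the prepared data to a .csv file
Data_Titanic2.to_csv('Data_Titanic.csv', index=False)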

7.11 MODEL DEVELOPMENT

For model building, the independent variables (predictors) need to be stored in an array. Therefore, we have stored all the predictors in an array cols, as mentioned in the below Python codes.
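A sketch of this step; the exact predictor names depend on the dummy coding above and are illustrative here:

# Store all predictor names in an array
cols = ['Age', 'Fare', 'Solo_Traveller', 'Pclass_1', 'Pclass_2',
        'Sex_male', 'Embarked_C', 'Embarked_Q']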

In order to get the list of all variables which we will supply as independent variables to the model, we need to print cols as mentioned below. The continuous independent variables are supplied as they are; however, for the categorical variables, we need to supply all the categories except the one we consider the reference category. For example, the dummy variable


‘Sex_male’ will be supplied as a predictor to the model, and ‘Sex_female’ will be the reference category, which means the regression coefficients will be displayed for male passengers with respect to female passengers.
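For example:

# List all variables that will be supplied as predictors
print(cols)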

Having a glance at all the variable names stored in cols enables us to cross-validate that we have taken all the important predictors for the further model development process. Next, we will split the data set into two data frames, X and Y, as given in the codes mentioned below. Here, X contains all the desired independent variables suitable for model building, while Y contains only one column, the dependent/target variable ‘Survived,’ which we need to predict.
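A sketch:

# Predictors (X) and target (Y)
X = Data_Titanic2[cols]
Y = Data_Titanic2['Survived']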

7.11.1 LOGISTIC REGRESSION MODEL

We will be exploring the logistic regression model for this binary classification problem. In order to perform the initial model exploration, we need the regression coefficients of the logistic regression. Therefore, the below codes for the logistic regression model utilize the statsmodels.api and scipy libraries to get the regression coefficients. The logistic regression model using the sklearn library will be discussed later, when we need to fit the model without regression coefficients for


further prediction. The below syntax can be used to fit the logistic regression model with all the desired predictors.
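A minimal sketch using statsmodels; the variable name logit_model is illustrative, and an intercept is added explicitly:

import statsmodels.api as sm

# Fit the logistic regression and display the coefficient table
logit_model = sm.Logit(Y, sm.add_constant(X.astype(float))).fit()
print(logit_model.summary())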

All the independent variables except ‘Solo_Traveller’ and ‘Fare’ are significant at the 5% level of significance. Therefore, we rerun the model without ‘Solo_Traveller’ and ‘Fare.’ We removed these insignificant variables one by one in each iteration and found no significant effect on the regression results. Thus, these insignificant input variables were finally removed from the regression model, and the final regression coefficients were obtained. First of all, we need to store the remaining significant variables again in an array, as shown in the codes below.
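A sketch, using the illustrative predictor list from above minus the two insignificant variables:

# Keep only the significant predictors
cols = ['Age', 'Pclass_1', 'Pclass_2', 'Sex_male', 'Embarked_C', 'Embarked_Q']
print(cols)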


In the preview of cols, we can see that the insignificant variables have been removed from the list of independent variables. Next, we will create the data frame X again, using the below code, with all the significant independent variables. The Y data frame, which contains the dependent variable, remains the same as above.
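For example:

# Rebuild X with the significant predictors only; Y is unchanged
X = Data_Titanic2[cols]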

Using the below codes, we again fit the logistic regression model to find out the new regression coefficients.
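A sketch, following the same pattern as the first fit:

# Refit the logit model on the reduced predictor set
logit_model = sm.Logit(Y, sm.add_constant(X.astype(float))).fit()
print(logit_model.summary())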

The above results show a slight improvement in the p-values for the selected independent variables. We further trained the logistic regression for prediction on the whole data using the sklearn library. Referring to the below codes, LogisticRegression has been imported from sklearn.linear_model. After fitting the logistic regression model (logreg), we computed the model's accuracy using the ‘.score’ function and rounded it off to two decimal places.
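A sketch of this step, assuming the default LogisticRegression settings:

from sklearn.linear_model import LogisticRegression
import warnings

warnings.filterwarnings('ignore')  # suppress warnings in the output display

# Train on the whole data and report the accuracy to two decimal places
logreg = LogisticRegression()
logreg.fit(X, Y)
print(round(logreg.score(X, Y), 2))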


 Output: 0.83

The above result shows that the logistic regression fitted on the whole data gives 83% prediction accuracy. The additional code that suppresses warnings in the output display employs the warnings library.

7.11.2 VISUALIZING THE MODEL'S ROC CURVE AND AUC

The receiver operating characteristic (ROC) curve is a performance evaluation tool for a predictive model at various decision thresholds. In a ROC curve, the true positive (TP) rate (sensitivity) is plotted as a function of the false positive (FP) rate (1 − specificity) for different cut-off points.


Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The below-mentioned codes are utilized to display the ROC curve (Figure 7.12) for the logistic regression model trained on the complete Titanic data set. We have imported roc_curve and auc from sklearn.metrics, as mentioned in the codes. The codes involve the matplotlib.pyplot library, as in the previously mentioned graphs.
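A sketch of the ROC construction (the score variable Y_score is illustrative):

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# Predicted survival probabilities for the positive class
Y_score = logreg.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(Y, Y_score)

plt.plot(fpr, tpr, label='Logistic regression')
plt.plot([0, 1], [0, 1], linestyle='--', label='Random classifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.legend()
plt.show()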

FIGURE 7.12  ROC curve of the model trained on complete data set.


The area under the curve (AUC) is the area enclosed by the ROC curve. A perfect classifier has AUC = 1, and a completely random classifier has AUC = 0.5. Usually, a good model scores somewhere in between. The range of possible AUC values is [0, 1]. The below-mentioned codes are utilized to obtain the value of the AUC.
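For example:

# Area under the ROC curve
print('ROC AUC: %.2f' % auc(fpr, tpr))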

 Output: ROC AUC: 0.88

The above output shows that the value of the AUC is 0.88; it means there is a ~88% chance that the model will rank a randomly chosen survivor (positive class) above a randomly chosen non-survivor (negative class).

7.11.3 VALIDATION OF THE MODEL USING TRAIN AND TEST PARTITIONS

The model developed on the complete data provides good accuracy; however, it is necessary to test the model on a partition of the complete data that was not used while training the model. It is a general convention to take a 70:30 or 80:20 split of the data into train and test partitions; however, one may use other ratios as well. In order to split the data into train and test partitions, we have imported the train_test_split function from sklearn.model_selection. In this classification problem, the complete data have been partitioned into train and test data sets in the ratio 70:30, which means that 70% of the data points (passengers) will be used for model training, and the remaining 30% will be utilized to test the model's predictions. The codes mentioned below are used to split the complete data (Data_Titanic2) into train and test data sets in the ratio 70:30.
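A sketch of the split; here the X and Y frames are split directly, and random_state is an illustrative seed:

from sklearn.model_selection import train_test_split

# 70:30 split into training (X2, Y2) and testing (X3_test, Y3_test) partitions
X2, X3_test, Y2, Y3_test = train_test_split(X, Y, test_size=0.3, random_state=1)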


Similar to the previous section, the X2 and Y2 data sets contain the independent and dependent variables, respectively, for the training partition. After that, we re-fitted/re-trained the logistic model on the training data set.
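A sketch:

# Re-train the logistic model on the training partition and report its accuracy
logreg = LogisticRegression()
logreg.fit(X2, Y2)
print(round(logreg.score(X2, Y2), 2))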

 Output: 0.84

The accuracy on the new training sample (0.84) is very close to the performance on the original, complete data (0.83), which is good. Let us assess how well the model scores on the 30% hold-out sample (testing partition). The below codes show that the X3_test and Y3_test data sets contain the independent and dependent variables, respectively, for the testing partition. Further, the predictions for the testing data are stored in a data frame Y3test_pred using the logreg.predict function of the LR model (logreg), which was trained on the training dataset.
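For example:

# Predict survival for the hold-out (testing) partition
Y3test_pred = logreg.predict(X3_test)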

7.11.4 CONFUSION MATRIX AND MODEL PERFORMANCE

The required libraries and functions are imported to evaluate the model accuracy, confusion matrix, and other model performance parameters, as given in the below codes.
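A sketch of the imports:

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report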

The accuracy of the trained model on the testing data can be found using the codes mentioned below. The output shows that the model accuracy


is 86% on the testing dataset. This indicates that the trained LR model also performs well on the testing dataset.
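A sketch:

# Accuracy of the trained model on the testing partition
print(round(accuracy_score(Y3_test, Y3test_pred), 2))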

The below-mentioned codes are employed to find the confusion matrix, which for this model exhibits 63 true positives (TP), 156 true negatives (TN), 13 false positives (FP), and 22 false negatives (FN).
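For example:

# Confusion matrix on the testing partition
print(confusion_matrix(Y3_test, Y3test_pred))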

The below code is used to obtain the classification report, which consists of the precision, recall, and f1-score on the testing data set.
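A sketch:

# Precision, recall, and f1-score for each class
print(classification_report(Y3_test, Y3test_pred))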

In the output mentioned above, the recall value (0.92) corresponding to the “0” class (not survived) suggests that, of the passengers who actually died, the model correctly predicts 92%. The recall value of the 0 class is also called specificity. Similarly, the recall value (0.74) corresponding to the “1” class (survived) suggests that, of the passengers who actually survived, the model correctly predicts 74%. This value is also known as sensitivity. The precision value for class 1 (survived), 0.83, depicts that of the total passengers predicted as survived by the model, 83% actually survived. Similarly, the precision value for class 0 (not survived), 0.88, depicts that of the total passengers predicted as not survived by the model, 88% actually died. Further, the f1-score is the harmonic mean of the precision and recall corresponding to each class of the dependent variable. After glancing at the f1-score values for each


prediction class, we can state that the model performs very well on the testing data set.

7.11.5 ASSESSING THE MODEL PERFORMANCE BASED ON ROC/AUC (TESTING DATA)

As we discussed earlier, the ROC curve has been obtained as below to assess the model performance. The ROC curve for the testing data (Figure 7.13) looks similar to the one obtained for the complete data using the LR model.
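A sketch, following the same pattern as the ROC code for the complete data:

# ROC curve and AUC on the testing partition
Y3_score = logreg.predict_proba(X3_test)[:, 1]
fpr, tpr, thresholds = roc_curve(Y3_test, Y3_score)
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()
print('ROC AUC: %.2f' % auc(fpr, tpr))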

FIGURE 7.13  ROC curve of LR model on testing data set.


 Output: ROC AUC: 0.86

The AUC value for the testing data set, 0.86, is quite close to the AUC value (0.88) obtained after building the LR model on the complete dataset. Therefore, we can state that the trained model predicts the survival of a passenger quite accurately and gives stable results on the testing data as well.

7.11.6 SAVING THE TRAINED MODEL INTO A PICKLE FILE

Once a well-tuned model is trained with the desired accuracy, we should save the model for future use or prediction. To serve this purpose, first we need to import the pickle library using the below codes.
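For example:

import pickle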

The following codes can be referred to in order to save the model in a .pkl file. A .pkl file for our trained model logreg is saved at the desired location. The name of the saved model file is Titanic_Model.pkl, which can be retrieved in the future to reuse this model.
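A sketch of the save step (the file path is illustrative):

# Save the trained model to a .pkl file
with open('Titanic_Model.pkl', 'wb') as file:
    pickle.dump(logreg, file)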

7.11.7 LOADING THE SAVED MODEL

If a .pkl file has been saved for a model, it can be retrieved for further use. The below-mentioned codes can be utilized to load the model for further prediction.
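A sketch of the load step (the variable name loaded_model is illustrative):

# Load the saved model for further prediction
with open('Titanic_Model.pkl', 'rb') as file:
    loaded_model = pickle.load(file)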


KEYWORDS

• classification
• confusion matrix
• data exploration
• data visualization
• model development
• python
• titanic data set

REFERENCES

Beazley, J., (2009). The Python Essential Reference (4th edn.). Addison-Wesley Professional.
Bernard, J., (2008). Use Python for scientific computing. Linux Journal, 175(7).
Lutz, M., (2013). Learning Python (5th edn.). O'Reilly Media.
Wolfram Research, (2018). Sample Data: Titanic Survival. The Wolfram Data Repository.
