Python For Beginners 2 Books in 1. A Completed Guide to Master the Basics of Python Language Programming and Data Science. Learn Coding Fast with Examples and Tips Julian James McKinnon
© Copyright 2021 by Julian James McKinnon All rights reserved. The material contained herein is presented with the intent of furnishing pertinent and relevant information and knowledge on the topic with the sole purpose of providing entertainment. The author should thus not be considered an expert on the topic in this material despite any claims to such expertise, first-hand knowledge, and any other reasonable claim to specific knowledge on the material contained herein. The information presented in this work has been researched to ensure its reasonable accuracy and validity. Nevertheless, it is advisable to consult with a duly licensed professional in the area pertaining to this topic, or any other covered in this book, in order to ensure the quality and validity of the advice and/or techniques contained in this material. This is a legally binding statement as deemed so by the Committee of Publishers Association and the American Bar Association in the United States. Any reproduction, transmission, copying, or otherwise duplication of the material contained in this work are in violation of current copyright legislation. No physical or digital copies of this work, both total and partial, may not be done without the Publisher’s express written consent. All additional rights are reserved by the publisher of this work. The data, facts, and description of events forthwith shall be considered as accurate unless the work is deemed to be a work of fiction. In any event, the Publisher is exempt of responsibility for any use of the information contained in the present work on the part of the user. The author and publisher may not be deemed liable, under any circumstances, for the events resulting from the observance of the advice, tips, techniques and any other contents presented herein. Given the informational and entertainment nature of the content presented in
this work, there is no guarantee as to the quality and validity of the information. As such, the contents of this work are deemed as universal. No use of copyrighted material is used in this work. Any references to other trademarks are done so under fair use and by no means represent an endorsement of such trademarks or their holder.
Python For Beginners PYTHON PROGRAMMING Introduction The Parts You Should Know about the Python Code Getting That Environment Set Up Chapter 1. Basic Background of Python What Is Python? Why Python? Installing Python Using a Text Editor Using an IDE Your First Program Code Comments and Your Program Chapter 2. Data Types in Python Strings Numeric Data Type Booleans List Variables User-Input Values Chapter 3. Operators - The Types and Their Uses The Types The Operator Precedence The Logical Operators Chapter 4. Loops and Functions LOOPS Nested if Statements in Python
For Loop in Python Range() Function in Python Using for Loop with Else While Loop in Python Using While Loop with Else Python’s Break and Continue Continue Statement in Python Pass Statement in Python Functions in Python Calling a Function in Python Docstring Python Function Return Statement Random Function in Python Iterators Manually Iterating Through Items in Python Explaining the Loop Creating Custom Iterator in Python Infinite Iterators Closure Function in Python Projects - Implementing Simple Calculator in Python Chapter 5. Exception Handling What Is Exception Handling? Handling the Zero Division Error Exception Using Try-Except Blocks Reading an Exception Error Trace Back Using Exceptions to Prevent Crashes The Else Block Failing Silently
Handling the File Not Found Exception Error Checking If File Exists Try and Except Creating a New File Chapter 6. Variable Scope and Lifetime in Python Functions Function Types Keywords Arguments in Python Arbitrary Arguments Recursion in Python Python Anonymous Function Python’s Global, Local and Nonlocal Creating a Local Variable in Python Python’s Global and Local Variable Python’s Nonlocal Variables Global Keyword in Python Creating Global Variables across Python Modules Python Modules Module Import Import Statement in Python Importing All Names Module Search Path in Python Reloading a Module Dir() built-in Python function Python Package Number Conversion Type Conversion Mathematics in Python Random Function in Python
Lists in Python Nested Lists Accessing Elements from a List Chapter 7. Modules How to Create a Module? Import Statement Locate a Module Syntax of PYTHONPATH Chapter 8. Working with Files Reading from a File File Pointer File Access Modes Writing to a File Practice Exercise Summary Chapter 9. Object-Oriented Programming Classes and Objects Chapter 10. Real-World Examples of Python Data Science Machine Learning Applications in Web Development Automation Things We Can Do in Python Comment Reading and Writing Files Integers Triple Quotes
Variables The Scope of a Variable Modifying Values The Assignment Operator Chapter 11. Getting Started; Python Tips and Tricks Web Scraping Chapter 12. Common Programming Challenges Debugging Working Smart User Experience Estimates Constant Updates Problems Communicating Security Concerns Relying on Foreign Code Lack of Planning Finally Conclusion Introduction Effectiveness of Libraries for Python There Is Always Someone Available to Help in the Python Community Chapter 1: What Is Data Science? The Importance of Data Science How Is Data Science Used? The Lifecycle of Data Science The Components of Data Science Chapter 2: Basics of Python Python IDEs
Getting Started with Python Data Types Functions and Modules Object-Oriented Programming Class Inheritance Regular Expressions Match and Search Functions Exception Handling File Handling Chapter 3: The Best Python Libraries for Data Science Core Libraries and Statistics Visualization Machine Learning Libraries Deep Learning Chapter 4: Data Science and Applications Banking and Finance Health and Medicine Oil and Gas The Internet Travel and Tourism Chapter 5: The Lifecycle of Data Science The Discovery Phase The Data Preparation Phase The Model Planning Phase The Operationalize Phase The Communicate Results Phase Chapter 6: Probability, Statistics, and Data Types Real-Life Probability Examples
Statistics Data Types The Importance of Data Types Statistical Methods Descriptive Statistics Chapter 7: Most Common Data Science Problems Management Expects the World Misunderstanding How Data Works Taking the Blame for Bad News Communication as a Solution Chapter 8: Comparison of Python with Other Languages Python versus Java comparison Python versus C# Python versus JavaScript Python versus Perl Python versus Tcl Python versus Smalltalk Python versus C++ Python versus Common Lisp and Scheme Python versus Node.js Coding Everything in JavaScript Python versus PHP Chapter 9: Data Cleaning and Preparation What Is Data Preparation? Why Do I Need Data Preparation? What Are the Steps for Data Preparation? Handling the Missing Data Chapter 10: Data Visualization
Data Visualization to the End-User Matplotlib Visualization Using Pandas The Objective of Visualization The Simplest Method to Complex Visualization of Data Overview of Plotly Heat Maps Conclusion
PYTHON PROGRAMMING
Introduction
There are many reasons to enjoy working with Python. It is easy to learn and easy to use, it has a large collection of frameworks and libraries (we will discuss at least a few of them in this guidebook), and it is still powerful enough to make machine learning approachable. While other programming languages can get you the same results, many people prefer Python because of these benefits. Before we look at how to set up the Python environment, let’s look at a few of the building blocks of the language so you understand how they work together in your code.
The Parts You Should Know about the Python Code
First, we need to take a look at keywords. As in other programming languages, Python has a list of keywords that tell the interpreter what to do. These keywords are reserved: you can only use them for their intended purpose, and you cannot use them as names for your own variables or functions. They are the commands that tell the interpreter how to behave, and keeping them reserved lets your code run without ambiguity.

Variables are important because they give a name to a value that is stored in your computer’s memory. When you create a new variable, Python sets aside the space needed for that value. You do not have to declare a data type; the interpreter works out the type from the value you assign and decides where the value should be stored.

When working with variables, your job is to make sure that each variable holds the right value. This ensures that the right data shows up in your code at the right time. You can give a variable any value you want, but do check that the value makes sense where it is used. To assign a value to a variable, use the equal sign. Let’s look at an example of how this works:

#!/usr/bin/env python3
counter = 10         # Assigning an integer
kilometers = 100.0   # Assigning a floating-point number
fname = "Jordan"     # Assigning a string

print(counter)
print(kilometers)
print(fname)

When you run this example, each print() call displays the value stored in the variable: 10 for counter, 100.0 for kilometers, and Jordan for fname.

Next are Python comments. A comment is a short note left in your code, and it can make a real difference in how easily others can read the code and understand what each part is supposed to do. Comments are easy to write in Python: simply add the # sign in front of the text. The interpreter ignores everything after the # on that line and skips over it without any interruption to the program.

One thing to watch is how many comments you write. You can technically write as many as you like, but try to keep only the ones that are really necessary. You do not want so many comments that the rest of the code becomes hard to read. Write comments when they are needed, not all of the time.

Python statements are a simple part of the code that can make a big difference, so we are going to take a moment to explore them here. Statements are the part that the interpreter is going to execute for
you. You can write any kind of statement that you want, but make sure each one is in the right place and that you are not using a reserved keyword as a name, or the interpreter will raise an error.

The next thing to look at is functions. A function is a named piece of code that can be reused, and it carries out one specific action in your program. Functions let you write that action once and call it wherever you need it, so you avoid repeating the same lines. Python comes with many built-in functions, and you can define your own, which is a great benefit to the programmer.

These are just a few of the basics of Python code. We will take a closer look at each of them as we move through this guidebook, but they will help you understand the basics of the language and how you can use it to your advantage in machine learning.
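To make the ideas of reserved keywords and reusable functions more concrete, here is a minimal sketch. It relies on the standard library's keyword module; the function name describe_name is made up for this illustration and is not part of Python.

```python
import keyword


def describe_name(name):
    """Say whether `name` is reserved by Python or free to use as a variable name."""
    if keyword.iskeyword(name):
        return f"'{name}' is a reserved keyword"
    return f"'{name}' can be used as a variable name"


# The same function is reused for as many names as needed.
print(describe_name("for"))      # 'for' is a reserved keyword
print(describe_name("counter"))  # 'counter' can be used as a variable name
```

Writing the check once as a function and calling it twice is the kind of reuse described above.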
Getting That Environment Set Up
Now that we have had a chance to look at machine learning, some of the ways that you can benefit from it, and some of the different types of machine learning that you are able to work with, it is time to introduce some Python into this. Python is a great language to work with, no matter what your skill level is. When it is combined with the ideas behind machine learning, you will see even better results in the long run.

That is why we are going to spend some time looking at how to set up your own environment for Python. This will make sure that Python is installed on your computer properly and will make it easier to work with the code that we talk about later on.

You will find along the way that Python is easy to learn compared to many other languages, and it is often recommended to beginners because it is simple. But don’t let that fool you! Being simple to work with does not mean it lacks the strength and power that you need. There is a lot to learn about the language, but first we are going to make sure that the Python environment is set up in a way that suits machine learning work.

One way to do this is to go to the official Python website, download the version of Python that you want to work with, and then make sure that a suitable IDE is installed alongside it. The IDE is the environment in which you write your code. The one we use here also includes the Python installation itself, the debugging tools you need, and the editors.

For this section on machine learning, we are going to focus on Anaconda. Anaconda is a Python distribution that bundles Python with the development tools you need; this guidebook refers to it as the Anaconda IDE. It is easy to install, and it comes with its own command-line utility, conda, which is very handy for installing any third-party packages you need. When you work with Anaconda, you don’t have to install Python separately.

Now we are on to downloading and installing it. We are going to keep the steps as simple as possible, and we are going to look at what to do on a Windows computer. The steps for a Mac or a Linux computer are similar. To install the Anaconda IDE:

1. To start, download the Anaconda installer for your preferred, newest version of Python.
2. Once the executable file is downloaded, go to your download folder and run it. The installation wizard should come up. Click on the “Next” button.
3. The License Agreement dialogue box will appear. Take a minute to read it before clicking the “I Agree” button.
4. In the “Select Installation Type” box, check the “Just Me” radio button and then click “Next.”
5. Choose the installation directory you want to use before moving on. Make sure that you have about 3 GB of free space in that directory.
6. You will now be at the “Advanced Installation Options” dialogue box. Select “Register Anaconda as my default Python 3.6” and then click on “Install.”
7. The installer will go through a few more steps, and the IDE will be installed on your computer.

As you can see, setting up the Python environment is simple. You just need to go through these steps to get the Anaconda IDE set up properly, and then you can use it for all of the code that we discuss in this guidebook, along with any other code that you want to write along the way.

Remember that there are other options when it is time to pick an IDE. If you are doing work other than machine learning, or you prefer the features of another IDE, you can download and use that one instead. We are going to spend our time with the Anaconda IDE because it has all of the features that we need to get the machine learning algorithms working the way that you would like.
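Once the installation finishes, a quick way to confirm which Python you are actually running is to ask the interpreter itself. The sketch below uses only the standard sys module; the helper name python_version_text is an illustrative choice, and the values printed will depend on your own installation.

```python
import sys


def python_version_text():
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


print("Python version:", python_version_text())
print("Interpreter location:", sys.executable)
```

If the location shown points inside your Anaconda installation directory, the distribution's Python is the one being used.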
Chapter 1. Basic Background of Python
Getting Started
Programming is becoming an increasingly in-demand skill for anything from web design to Machine Learning and the Internet of Things. It is on its way to becoming an everyday skill because of how central technology has become. While programming used to be a subject that people started studying for their computer science degree, it is now often taught starting from elementary school.

One of the main reasons for its widespread use is accessibility. You don’t need much to get started. Thanks to the Internet, all you need is a computer and a few software tools that you can download and install without spending a penny. In addition, there are many resources to learn from, as well as organized communities you can join.

In this chapter, you are going to learn why Python is one of the best programming languages to start with, as well as a good way to progress your career if this isn’t your first language. You will also explore the tools you need, install them, and start your journey. This chapter will guide you step by step and show you everything you need to know in order to get started. If you are already familiar with another programming language such as C, C++, or Java, you might want to skip this chapter or simply glance through it to refresh your memory.

We can define programming as the process of designing, coding, debugging, and maintaining the source code of a computer program. In other words, it covers every step involved in creating that source code. A programming language is the set of rules, symbols, and special words used to write a program and, with it, to offer a solution to a particular problem. The best-known programming languages include Basic (1964), C++ (1983), Python (1991), Java (1995), and C# (2000), among others.

Programming is one of the stages of software development. It specifies the structure and behavior of a program and includes verifying whether the program works properly. Programming also includes specifying the algorithm, which is the sequence of steps and operations that the program must perform to solve a problem. For the algorithm to work, the program must be implemented correctly in a suitable language.

Learning to program can even be easier than learning a new spoken language, because a programming language is governed by a small set of rules that are broadly similar from one language to the next.

To better understand the subject, we can start with the beginnings of programming and how the universe of languages and programs we know today began. One common starting point is the seventeenth century, when Gottfried Wilhelm von Leibniz built a machine capable of doing basic operations and square roots. However, the machines that had the greatest influence on the first computers were Charles Babbage’s Difference Engine, designed for calculating polynomials, and his later Analytical Engine. Ada Lovelace, Countess of Lovelace (1815-1852), worked with Babbage and is known as the first programmer. The programming language Ada, created for the United States Department of Defense (DoD) in the late 1970s, is named after her.

Initially, computers were programmed in binary code (bi = 2), that is, strings of 0s and 1s. This is the language directly understood by the computer, known as machine language, and it is fundamental because it is the only form in which the computer can interpret the information supplied to it. Later, high-level languages appeared. They use English words to give instructions and rely on an intermediate program between the language and the computer, which can be a compiler or an interpreter. The syntax of these programming languages is much simpler than that of human languages, and they use a much smaller vocabulary and set of rules.

In summary, a program is a set of statements written in a programming language that tells the computer what tasks to perform and in what order, through a series of instructions that fully detail the process.

In the world of programming languages, we find interpreted languages, such as JavaScript, where a program called an interpreter executes the statements while reading the text file in which they are written. This is why such programs are often called scripts. On the other hand, we have compiled languages such as Java. In this case, we must first convert the text file into a “translation” using a program called a compiler, and the resulting file is the one that will finally run on the computer.

In this book, we will speak specifically of the Python programming language, an interpreted language whose most important characteristic is a syntax that favors readable code. An interpreter is a type of program that executes code directly, without a separate compilation step that you have to run yourself, and that is the case for our target language.
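To picture what "executing the statements while reading them" means, here is a toy sketch, not how the real Python interpreter is built (the real one first translates your source into an internal form). The function name run_line_by_line and the two-line script are assumptions made up for this illustration; exec() is a real built-in function.

```python
def run_line_by_line(source):
    """Execute each non-empty line of `source` in order and return the variables it created."""
    namespace = {}
    for line in source.splitlines():
        if line.strip():
            exec(line, namespace)  # run this one statement before reading the next
    # Drop the built-ins entry that exec() adds, leaving only the script's own names.
    namespace.pop("__builtins__", None)
    return namespace


script = "total = 2 + 3\ndoubled = total * 2"
print(run_line_by_line(script))  # {'total': 5, 'doubled': 10}
```

The second line can use total only because the first line has already been executed, which is the order-of-instructions idea described above.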
What Is Python?
Python is one of the most important programming languages in use today, and it is a general-purpose language. Because it is not tied to a single defined purpose, you can use it to create a huge variety of applications. This book gives you the foundations of the language so that you can start building with it.
Why Python?
Python is a versatile and powerful programming language that was created by Guido van Rossum and first released in 1991. As a fun fact, the name of the language doesn’t come from the snake. Guido named his project “Python” after Monty Python, a British comedy group he was a big fan of. If you happen to be a fan as well, you will find several “Easter Eggs” within the official documentation of the language.

Since 1991, Python has been used to introduce people to programming due to its simple syntax, as well as to create complex programs and analyze massive amounts of data. As a beginner with Python, you will be able to write a basic program quickly, and you can later scale it up and even turn it into a commercial project.

The main reason why Python is so popular with beginners is that the language is easy to read and write. Its structure is close to human language and easy to understand, so you shouldn’t find it too difficult to remember the syntax. In addition, Python comes with a number of libraries and premade functions that you can immediately add to your code. This saves time. In many ways, it’s like playing with Legos.

As long as you pace yourself, learn and practice everything in this book, and extend your knowledge using other resources, you will be able to write a program that you will still understand ten years from now.
Program maintenance is a crucial part of your responsibilities as a programmer, but luckily Python code is easy to maintain compared to other languages. With that in mind, let’s briefly explore the reasons why you should learn Python instead of another language. After all, Python isn’t the only language that offers the advantages you’ve learned about so far.

User-friendly: the purpose of a programming language is to form the connection between humans and computers. Python, like C# and Java, is a high-level programming language, which means that it is quite far from the machine language that the computer actually processes. The opposite is a low-level language, which usually refers to assembly language or machine code. In other words, Python is close to English. Once you learn the rules and the syntax, this allows you to write code almost as fast as you write a sentence.

Powerful: sometimes Python is looked down upon because it is so easy to learn and is usually the first language programmers explore, whether on their own or in an introductory computer science course. However, Python is a very powerful language that is versatile enough to be used alongside more complex languages such as C++. Python is used at companies and organizations such as Google, Microsoft, IBM, Xerox, NASA, and many more. You can even use Python in game development if you prefer to practice a programming language in a more artistic way.

OOP: object-oriented programming is often the best way to approach a programming problem. It is a methodology that defines data and actions together as objects. This type of programming is not always necessary; however, it is usually the best approach when working on large applications. Languages such as C# and Java are built around classes from the very first program. Python is an object-oriented language as well, but using classes is optional. This means that with Python, you don’t have to learn the object-oriented methodology from the start, which is one of the reasons why it’s easier to start programming with Python. You still have the massive benefits of OOP at your fingertips, but only when you actually need them. If you are working on a basic program, there’s no need for it. Python offers you all the power and versatility you need.

Computer-friendly: you can run Python on almost any kind of computer. You don’t need a powerful processor or a great deal of RAM to start programming. You can even use a credit-card-sized computer like the Raspberry Pi. In fact, Python requires so little that it is one of the top languages used for building little robots that are operated by $5 computers. In addition, Python runs on all major operating systems, whether it’s Linux, Windows, or Mac. The programs you write generally do not depend on the platform. You can work on an application on your Windows computer and then move it to your Mac. For instance, if you finished creating a program and you need beta testers, you can email your project to one friend who uses Linux and another who uses Windows. As long as Python is installed, the program will work.

Language adaptability: if you ever write a program in another language, you can integrate Python within it. For example, you can use Python in a program that was written in Java. You can also combine Python with another language in order to take advantage of the benefits offered by both. For instance, you can integrate C or C++ code in order to benefit from the optimization and speed that they offer.

It’s free: everyone likes free stuff, and Python won’t cost you a cent. You can download and install it for free as many times as you want. In addition, Python is open source, which means that the license even allows you to make modifications to the source code. You can modify Python and then sell your own version of it. You might not be interested in these features at this point, but they are among the reasons why it’s such a popular language.

Community: a powerful and versatile open-source programming language attracts a strong community. There are many online communities dedicated to teaching and learning everything there is to know about Python. You can ask questions on online boards or seek the advice of a master programmer. You can also find fellow students and work on a project together. Python’s popularity has gathered a massive crowd around it, and you should take advantage of it.
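The point made earlier in this section, that object orientation is optional in Python, can be sketched with one small task written both ways. The names rectangle_area and Rectangle are invented for this illustration.

```python
# Plain-function version: no classes are needed for a small task.
def rectangle_area(width, height):
    return width * height


# Class version: the data (width, height) and the action (area) live in one object.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


print(rectangle_area(3, 4))    # 12
print(Rectangle(3, 4).area())  # 12
```

For a basic program the function is enough; the class starts to pay off when many actions share the same data.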
Installing Python
Before you can start programming, you need to download and install Python on your machine. The installation is straightforward no matter which operating system you’re running; however, you do need to pay attention to a couple of things.

First, go to Python’s homepage and open the “Downloads” section. There you will see a number of installers for different versions and operating systems. Download the installer that matches your computer’s operating system, and select the latest version. Once the download is complete, run the installer and follow the steps. You can simply accept the standard settings, and once the installation is complete, you’re ready to go.

If you don’t want to install Python for some reason, you may notice that the website’s homepage offers a console. This is an online Python console, and you can use it to practice your coding skills or to try out some of the examples in this book. It’s advisable to type the code yourself rather than copy and paste it from the book, and then try to be creative with it. You need to practice in order to memorize the syntax and specific commands, and the online console is really handy for a quick practice session.
Using a Text Editor
Python programming can be done with nearly any kind of plain text editor. You can use programs like Notepad, Notepad++, gedit, and many more. Keep in mind that some of these text editors come with features that are useful to programmers. For instance, some of them, such as Notepad++, offer syntax highlighting, which colors the different parts of your code and makes many mistakes easier to spot. If you type code in a basic editor like plain Notepad, the program won’t tell you when you’ve forgotten a colon or added an extra space at the start of a line. There are many programs to choose from, so pick any editor you feel comfortable with.

With that in mind, avoid using word processors such as Microsoft Word or Open Office. They aren’t good for programming purposes. They can be used to type code; however, when saving the file, the word processor adds its own formatting information to it. That information is specific to the word processor, and Python cannot read it, so your program will simply not run.
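A plain editor will not warn you about a stray space at the start of a line, but Python itself can. The sketch below uses the built-in compile() function to check a piece of source text without running it; the helper name check_syntax and the two sample strings are made up for this example.

```python
def check_syntax(source):
    """Return 'ok' if `source` compiles, otherwise a short description of the problem."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as error:  # IndentationError is a kind of SyntaxError
        return f"{type(error).__name__} on line {error.lineno}"
    return "ok"


good = 'print("Hello World!")\n'
bad = 'print("Hello World!")\n  print("extra space")\n'

print(check_syntax(good))  # ok
print(check_syntax(bad))   # IndentationError on line 2
```

Editors with error signaling do a similar check for you while you type.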
Using an IDE
An IDE, which stands for Integrated Development Environment, is a program designed with a number of features that are useful to programmers. It has a graphical interface, and it makes typing code much faster thanks to autocomplete and history functions. Programming stays the same whether you are using a text editor or an IDE. However, with an IDE, you will benefit from many shortcuts, reminders, error signaling, and automatic code completion. Many IDEs even include suggestions on how to fix an error.

There are many IDEs to choose from, but one of the most popular ones is IDLE. It comes in the same package as Python, so there’s no need to perform any extra steps. Keep in mind that it can run in two modes, namely interactive and script. Use interactive mode if you want Python to respond immediately to whatever commands you type.
Your First Program Now that your toolkit is prepared, it’s time to write your first program. For this example, we’ll use IDLE because it’s important to get used to IDE’s from the start in order to avoid any future frustrations. If you prefer to use a text editor or the online Python console, go ahead, the code will work the same. Now, start running IDLE in interactive mode. You will now see a window that is known as a Python shell. At the command prompt, type the following line: print (“Hello World!”) Now you should see the result displayed on your screen like the following: Hello World! That’s it! Congratulations, you can call yourself a programmer now. Now let’s discuss this bit of code briefly. The first thing you’ll notice is that Python code is plain English, easy to read and understand. Even without programming knowledge, you probably knew what this line of code would do because it’s self-explanatory. That’s the beauty of working with Python. As for the command we used, “print ()” is a function that displays the text which is written in the parentheses. Keep in mind that the line needs to be surrounded by quotation marks; otherwise, you’ll get an error. Furthermore, pay attention to how you type the function because, in Python, everything is case sensitive.
The command “print” will work. However, if you type it as “Print,” it will not. Now, let’s create the same program, this time using IDLE’s script mode. Remember that interactive mode gives you instant results. It works the same as the online Python shell. However, you won’t be able to save your program so that you can continue working on it later. In order to save it and edit it later, you need to work in script mode. You can open a script window in IDLE by clicking on “File” and selecting “New File” (labeled “New Window” in older versions of IDLE). Now type the same line again:

    print("Hello World!")

Hit the Enter key. You’ll notice that nothing happens. That’s because you are writing a list of instructions that will be executed later, when you run the program. First, you need to save the program by clicking on “Save As” in the “File” menu. You’ll notice that by default, the file has the “.py” extension. Always make sure your scripts are saved this way so that they are recognized as Python programs. Now, if you run the program (“Run Module” in the “Run” menu, or the F5 key), IDLE will open the interactive mode window and display the result. For now, you’ve run your “Hello World” program by using IDLE. However, you normally want your applications to run like the ones you are currently using.
This means you want an executable file that you double click and it runs. At the moment, if you click on the Python file, a window will open and then close abruptly. You may be thinking that the program doesn’t work because nothing happened; however, something did happen. It was simply too fast for you to observe anything concrete. The program executed all of its instructions, which means that it displayed the message in a fraction of a second, and then it terminated itself. What you need to do is keep the program running once it executes all of its commands so that you can see the results and interact with them. But before you do that, let’s take a moment to discuss how to comment on your code and make it readable and easy to understand.
Code Comments and Your Program

Open your script and type the following lines:

    # Hello World!
    # This is a demonstration of the "print" function.

If you run the program again, you will see that nothing changed. The lines you added aren’t executed as code. They are known as comments, and their purpose is to make the code of an application more understandable. You might be thinking that typing such information is a waste of time because, as the programmer, you already know what your code is about. That may be true; however, when you write a complex program and then abandon it for a week or two, you’re going to have some trouble remembering the purpose of every function and variable. Sure, you can read your code and eventually figure everything out, but that is not a good use of your time. Code comments are used to label and explain complicated functions so that you don’t have to dive into the code itself. They are especially useful if another programmer is going to work on your program at a later date. Imagine a stranger having to decipher your personal approach to the development of your application. On a large project, they could waste weeks of their time instead of making progress.

A comment starts with the hash mark (#) and runs to the end of the line. Each line you intend as a comment needs its own hash mark; otherwise, Python will try to run that line as code, and you will most likely get an error. If you’re worried about your programs’ efficiency due to hundreds or even thousands of comments, you shouldn’t be. They have no impact on how your program runs, because when the code is executed, Python ignores all comments.

Additionally, to make your comments and code more readable, you can leave empty lines. However, don’t do this after every line of code. Use an empty line between blocks of code or sections. Python ignores blank lines, so nothing will be affected by using them.

Now let’s get back to your first program. Add the following line after the print function:

    input("\n\n Hit the Enter key to exit!")

When you run the program now, the console displays “Hello World!” and, a couple of lines lower, “Hit the Enter key to exit!” The program then stays open and waits until you hit the Enter key. This is a simple way to keep the program running until the user performs an action.
Chapter 2. Data Types in Python Every program has certain data that allows it to function and operate in the way we want. The data can be a text, a number, or any other thing in between. Whether complex or as simple as you like, these data types are the cogs in a machine that allow the rest of the mechanism to connect and work. Python is a host to a few data types, and, unlike its competitors, it does not deal with an extensive range of things. That is good because we have less to worry about and yet achieve accurate results despite the lapse. Python was created to make our lives, as programmers, a lot easier.
Strings

In Python and other programming languages, any text value that we may use, such as names, places, and sentences, is referred to as a string. A string is a sequence of characters, not of words or letters, and is marked by single or double quotation marks. To display a string, use the print command, open a parenthesis, put in a quotation mark, and write anything. Once done, we close the quotation marks and then the parenthesis. Since we are using PyCharm, its code completion detects what we are about to do and delivers the rest for us immediately. You may have noticed how it jumped to the rescue when you typed only the opening bracket: it automatically provides the closing one. Similarly, for quotation marks, single or double, it provides the closing ones for you. See why we are using PyCharm? It greatly helps us out.

“I do have a question. Why do we use either single or double quotation marks if both provide the same result?”

Ah! Quite the eye. There is a reason we use both; let me explain with the example below:

    print('I'm afraid I won't be able to make it')
    print("He said "Why do you care?"")

Try and run this through PyCharm. Remember, to run, simply click on the green play-like button on the top right side of the interface.

    "C:\Users\Programmer\AppData\Local\Programs\Python\Python3732\python.exe" "C:/Users/Programmer/PycharmProjects/PFB/Test1.py"
      File "C:/Users/Programmer/PycharmProjects/PFB/Test1.py", line 1
        print('I'm afraid I won't be able to make it')
                 ^
    SyntaxError: invalid syntax

    Process finished with exit code 1

Here’s a hint: That’s an error! So what happened here? Try and revisit the inputs. See how we started the first print statement with a single quote? The apostrophe in “I’m” is also a single quote, so we immediately ended the string. The program only accepted the letter ‘I’ as a string. You may have noticed how the color changed for every character from ‘m’ until ‘won,’ after which the program detects yet another quotation mark and accepts the rest as another string. Quite confusing, to be honest.

Similarly, in the second statement, the same thing happened. The program saw double quotes and understood what followed as a string, right until the point where the second double quotation mark arrived. It did not bother checking whether the sentence was still going on. Computers do not understand English; they understand binary. The Python interpreter is what runs when we press the run button. It reads our code and translates it, step by step, into instructions the computer can carry out.
This is exactly why, the second it spots the first quotation mark, it considers it the start of a string and ends the string immediately when it spots a second quotation mark of the same kind, even if the sentence was carrying on. To overcome this obstacle, we use a mixture of single and double quotes when we know we need one of them within the sentence. Try replacing the opening and closing quotation marks of the first statement with double quotation marks. Likewise, change the outer quotation marks of the second statement to single quotation marks, as shown here:

    print("I'm afraid I won't be able to make it")
    print('He said "Why do you care?"')

Now the output should look like this:

    I'm afraid I won't be able to make it
    He said "Why do you care?"

Lastly, for strings, the naming convention does not apply to the text of the string itself. You can use regular English writing methods and conventions without worries, as long as the text is within the quotation marks. Anything outside them is not part of the string and may not work if you change the case. Did you know that strings can also use triple quotes? Never heard that before, have you? We will cover that shortly!
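Mixing the two kinds of quotation marks is the approach described above. Another option, not covered in the text, is to escape the inner quotation marks with a backslash. The following is a minimal sketch comparing the two; the variable names are illustrative only.

```python
# Option 1: wrap the text in the "other" kind of quotation mark.
line1 = "I'm afraid I won't be able to make it"
line2 = 'He said "Why do you care?"'

# Option 2: keep the same quotation marks and escape the inner ones with a backslash.
line1_escaped = 'I\'m afraid I won\'t be able to make it'
line2_escaped = "He said \"Why do you care?\""

print(line1)
print(line2)

# Both approaches build exactly the same strings.
print(line1 == line1_escaped)  # True
print(line2 == line2_escaped)  # True
```

Mixing quotation marks is usually easier to read; escaping helps when a sentence contains both kinds.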
Numeric Data Type

Just as the name suggests, Python is able to recognize numbers rather well. The numbers are divided into two kinds:

● Integer – A positive or negative whole number that is represented without any decimal point.
● Float – A real number that has a decimal point representation.
This means that if you were to use 100 and 100.00, the first would be identified as an integer while the second would be treated as a float. So why do we need two different number representations? Suppose you are designing a small game in which a character has a life of 10. You might write the program so that whenever the character takes a hit, their life is reduced by one or two points. However, to make things a little more precise, you may need to use float numbers. Now each hit might vary and take 1.5, 2.1, or 1.8 points away from the life total. Using floats allows us greater precision, especially when calculations are on the cards. If you aren’t too troubled about the accuracy, or your program involves whole numbers only, stick to integers.
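To see the difference for yourself, you can ask Python which type a value has by using the built-in type() function. The sketch below assumes the game scenario described above; the variable names and the 1.5-point hit are illustrative.

```python
whole = 100
decimal = 100.00

print(type(whole))    # <class 'int'>
print(type(decimal))  # <class 'float'>

# Hypothetical game character: life starts at 10 and one hit costs 1.5 points.
life = 10
life = life - 1.5

print(life)        # 8.5
print(type(life))  # <class 'float'>, because an int minus a float gives a float
```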
Booleans Ah! The one with the funny name. Boolean (or bool) is a data type that can only operate on and return two values: True or False. Booleans are a vital part of any program, except the ones where you may never need them, such as our first program. These are what allow programs to take various paths if the result is true or false. Here’s a little example. Suppose you are traveling to a country you have never been to. There are two choices you are most likely to face. If it is cold, you will be packing your winter clothes. If it is warm, you will be packing clothes that are appropriate for warm weather. Simple, right? That is exactly how the Booleans work. We will look into the coding aspect of it as well. For now, just remember, when it comes to true and false, you are dealing with a bool value.
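As a small preview, the packing example can be sketched in code. This uses an if statement, which is explained later in the book, and the names is_cold, clothes, and temperature are assumptions made for this illustration.

```python
# Hypothetical travel check: True means the destination is cold.
is_cold = True

if is_cold:
    clothes = "winter clothes"
else:
    clothes = "light clothes"

print(type(is_cold))       # <class 'bool'>
print("Packing", clothes)  # Packing winter clothes

# Comparisons also produce bool values.
temperature = 3
print(temperature < 10)    # True
```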
List While this is slightly more advanced for someone at this stage of learning, the list is a data type that does what it sounds like. It lists objects, values, or stores data within square brackets ([]). Here’s what a list would look like: month = ['Jan', 'Feb', 'March', 'And so on!'] We will be looking into this separately, where we will discuss lists, tuples, and dictionaries. We have briefly discussed these data types. Surely, they are used within Python, but how? If you think you can type in the numbers and true and false, all on their own, it will never work.
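For a first taste of what a list can do, here is a short sketch using a shortened version of the month list. Indexing, len(), and append() are standard list features; they are covered in more detail when lists are discussed separately.

```python
month = ['Jan', 'Feb', 'March']

print(month[0])    # Jan (the first item has index 0)
print(len(month))  # 3

# A list can grow after it has been created.
month.append('April')
print(month)       # ['Jan', 'Feb', 'March', 'April']
```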
Variables

You have the passengers, but without a mode of commuting, they have nowhere to go. These passengers would just be folks standing around, waiting for some kind of transportation to pick them up. Similarly, data cannot function alone. It needs to be ‘stored’ in vehicles that can take it places. These special vehicles, which programmers think of as containers, are called ‘variables,’ and they are the elements that perform the magic for us. Variables are containers that store a value and can then be accessed, called, modified, or even removed when the need arises. A variable holds one value at a time, and that value has one specific type. In some other programming languages, you need a keyword such as ‘var’ before the variable’s name, followed by an equals sign ‘=’ and then the value. In Python, it is a lot easier, as shown below:

    name = "John"
    age = 33
    weight = 131.50
    is_married = True

In the above, we have created a variable named ‘name’ and given it a value made of characters. If you recall strings, we have used double quotation marks to let the program know that this is a string.
We then created a variable called ‘age.’ Here, we simply wrote 33, which is an integer, as there are no decimal figures following it. You do not need to use quotation marks here at all. Next, we created a variable ‘weight’ and assigned it a float value. Finally, we created a variable called ‘is_married’ and assigned it the bool value True. If you were to change the ‘T’ to ‘t,’ Python would not recognize it as a bool and would give an error. Notice the naming convention we used for the last variable: lowercase words joined by an underscore. We will be ensuring that our variables follow the same naming convention.

You can even create ‘blank’ variables if you feel you may need them at a later point or wish to initialize them with no meaningful value at the start of the application. For variables with numeric values, you can create a variable with a name of your choosing and assign it a value of zero. Alternatively, you can create an empty string by using opening and closing quotation marks only.

    empty_variable1 = 0
    empty_variable2 = ""

You do not have to name them like this; you can come up with more meaningful names so that you and any other programmer who may read your code will understand it. I have given them these names so that anyone can immediately understand their purpose. Now that we have learned how to create variables, let’s learn how to call them. What’s the point of having these variables if we are never going to use them, right?
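You can check what kind of value each variable currently holds with the built-in type() function. The sketch below reuses the four variables created above and also shows that a variable can be given a new value later on.

```python
name = "John"
age = 33
weight = 131.50
is_married = True

print(type(name))        # <class 'str'>
print(type(age))         # <class 'int'>
print(type(weight))      # <class 'float'>
print(type(is_married))  # <class 'bool'>

# A variable can be modified after it has been created.
age = age + 1
print(age)               # 34
```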
Let’s create a new set of variables. Have a look here:

    name = "James"
    age = 43
    height_in_cm = 163
    occupation = "Programmer"

I do encourage you to use your own values and play around with variables if you like. In order to call the name variable, we simply need to type the name of the variable. To print it to the console, we do this:

    print(name)

Output:

    James

The same goes for age, the height variable, and occupation. But what if we wanted to print them together and not separately? Try running the code below and see what happens:

    print(name age height_in_cm occupation)

Surprised? Did you end up with this?

        print(name age height_in_cm occupation)
                   ^
    SyntaxError: invalid syntax

    Process finished with exit code 1

Here is the reason why that happened. When you were using a single variable, the program knew what variable that was. The minute you added a second, a third, and a fourth variable with nothing but spaces between them, it tried to look for something that was written in that manner. Since there wasn’t any, it returned with an error that otherwise says: “Umm… Are you sure, Sir? I tried looking everywhere, but I couldn’t find this ‘name age height_in_cm occupation’ element anywhere.” All you need to do is add a comma to act as a separator, like so:

    print(name, age, height_in_cm, occupation)

Output:

    James 43 163 Programmer

“Your variables, Sir!” And now, it knew what we were talking about. The system recalled these variables and was able to show us their values.

But what happens if you try to add two strings together? What if you wish to merge two separate strings and create a third string as a result? To join two strings into one, we can use the ‘+’ sign. The resulting string is a string object, and since this is Python we are dealing with, everything within this language is considered an object; hence the Object-Oriented Programming nature that we discussed near the start.

    first_name = "John"
    last_name = "Wick"
    first_name + last_name

Here, we did not ask the program to print the two strings. If you wish to print them, simply add the print function and put the string variables, with a + sign in the middle, within the parentheses. Sounds good, but the result will not be quite what you expect:

    first_name = "John"
    last_name = "Wick"
    print(first_name + last_name)

Output:

    JohnWick

Hmm. Why do you think that happened? After all, we did type a space on either side of the + sign. The problem is that the two strings have been combined quite literally, and since we did not provide a white space (blank space) after John or before Wick, the result does not include one. White space can be part of a string, too. To test it out, add one space inside the quotation marks in the first line of code, right after John. Now try running the same command again, and you should see “John Wick” as your result.

The process of merging two strings is called concatenation. While you can concatenate as many strings as you like, you cannot concatenate a string and an integer directly. If you really need to do that, you must first convert the integer into a string and then concatenate. To convert an integer, we use the str() function.
    text1 = "Zero is equal to "
    text2 = 0
    print(text1 + str(text2))

Output:

    Zero is equal to 0

Python reads code line by line. First, it reads the first line, then the second, then the third, and so on. This means we can do the conversion beforehand as well, to save some time for ourselves.

    text1 = "Zero is still equal to "
    text2 = str(0)
    print(text1 + text2)

Output:

    Zero is still equal to 0

You may wish to remember this, as we will be revisiting the conversion of values into strings a lot sooner than you might expect.

There is one more way to print out both string variables and numeric variables at the same time, without the need for ‘+’ signs or conversion. This way is called string formatting. To create a formatted string, we follow a simple pattern, as shown here:

    print(f"This is where {var1} will be. Then {var2}, then {var3} and so on")

Here var1, var2, and var3 stand for variables. You can have as many as you like. Notice the importance of whitespace: there must be no space between the ‘f’ and the opening quotation mark, and a variable name cannot contain a space. You might struggle at the start but will eventually get the hang of it. When we start the string, we place the character ‘f’ in front of it to let Python know that this is a formatted string. The curly brackets act as placeholders. Within these curly brackets, you can recall your variables. One set of curly brackets is a placeholder for each variable that you would like to call upon. To put this in practical terms, let’s look at an example:

    show = "GOT"
    name1 = "Daenerys"
    name2 = "Jon"
    name3 = "Tyrion"
    seasons = 8
    print(f"The show called {show} had characters like {name1}, {name2} and {name3} in all {seasons} seasons.")

Output:

    The show called GOT had characters like Daenerys, Jon and Tyrion in all 8 seasons.

While there are other ways to convert integers into strings and concatenate strings together, it is best to learn those that are used throughout the industry as standard. Remember the triple quotes mentioned earlier? I believe you are in a good position now to begin using those. Have a look at this result, and keep in mind that I did not use any variable here at all.
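A minimal sketch of a triple-quoted string follows. The wording of the message is an assumption made for illustration; the point is only that text between triple quotes may span several lines and may contain both single and double quotation marks.

```python
# Everything between the triple quotes is one string, line breaks included.
message = """Dear reader,
This string spans several lines.
It can hold 'single' and "double" quotes without any trouble."""

print(message)

# The two line breaks typed above are stored inside the string.
print(message.count("\n"))  # 2
```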
Now you have seen how to create a variable, recall it, and concatenate it with others. Everything sounds perfect, except for one thing: these are predefined values. What if we need input directly from the end-user? How can we possibly know in advance what they will type? And once we have it, where do we store it?
User-Input Values

Suppose we are trying to create an online form. This form will contain simple questions asking for the user’s name, age, city, email address, and so on. There must be some way to let users enter these values on their own and for us to get them back. We can then use them to print a message that thanks the user for filling in the form and says that they will be contacted at their email address for further steps. To do that, we will use the input() function. The input function accepts whatever the user types and hands it back to us as a string. In order to use this function, we need to give it a prompt so that the end-user knows what they are about to fill out. Let us look at a typical example and see how such a form can be created:

    print("Hello and welcome to my interactive tutorial.")
    name = input("Your Name: ")
    age = int(input("Your age: "))
    city = input("Where do you live? ")
    email = input("Please enter your email address: ")
    print(f"Thank you very much {name}, you will be contacted at {email}.")

Output:

    Hello and welcome to my interactive tutorial.
    Your Name: Sam
    Your age: 28
    Where do you live? London
    Please enter your email address: [email protected]
    Thank you very much Sam, you will be contacted at [email protected].

In the above, we began by printing a greeting to the user and welcoming them to the tutorial. Next, we created a variable named ‘name’ and assigned it a value that our user will generously provide. For the age, you may have noticed I wrapped the input in int(), just as we converted an integer to a string earlier on. This is because input() always returns a string, even when the user types digits, and we want the age stored as a number. You will always need to know what type of value you are after and convert accordingly, as shown above. Next, we asked for the name of the city and the email address. Finally, using a formatted string, we printed out our final message.

“Wait! How can we print out something we have yet to receive or know?”

I did mention that Python works line by line. The program starts with a greeting, as shown in the output. It then moves to the next line and realizes that it must wait for the user to type something and hit Enter. In the output above, Sam, 28, London, and the email address are the values typed by the user. The program then moves to the next line and waits yet again for the user to type something and press Enter, and this goes on until the final input command is done. Now that the program has the values stored, it immediately recalls them and prints them out for the viewer to see. The result was rather pleasing, as it gave a personalized message to the user, and we received the information we needed. Everybody walks away happy!
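The reason for the int() conversion can be seen in a small sketch. To keep it runnable without typing anything, the string "28" below stands in for what input("Your age: ") would return if the user typed 28; raw_age is a hypothetical name.

```python
# Stand-in for the value returned by input("Your age: ") when the user types 28.
raw_age = "28"
print(type(raw_age))   # <class 'str'>

# With a string, + joins text instead of adding numbers.
print(raw_age + "1")   # 281

# After converting with int(), + performs arithmetic.
age = int(raw_age)
print(age + 1)         # 29
```

If the user types something that is not a whole number, such as "twenty", int() raises a ValueError, so real forms usually need some extra checking.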
Storing information directly from the user is essential in many programs. Imagine a game that is based on Python. The game is rather simple: a ball jumps when you tap the screen. The problem is, your screen isn’t responding to the touch at all for some reason. While that happens, the program will either keep the ball running until an input is detected, or it will just not work at all. We also use input functions to gather information such as login IDs and passwords to match against a database, but that is a point we shall discuss later, when we talk about statements. It is a little more complicated than it sounds now, but once you understand how to use statements, you will be one step closer to becoming a programmer.
Chapter 3. Operators - The Types and Their Uses

Operators are pretty much what they sound like. They operate on values as per our needs and connect two dots together. That is the simplest way I can explain them. However, there are quite a few operators available in Python. They are used for various purposes and appear in nearly every program you will create, apart from the ones that rely only on print statements. I shall not waste a lot of time here, so let us get straight to business and see the types first and then move on to their uses, both of which include quite a bit of arithmetic. Not a fan of arithmetic myself, but then again, it is necessary!
The Types

Straight away, we begin with some basic ones. When we talk about arithmetic, the first few things to pop up are the addition, subtraction, multiplication, and division signs. Python is no stranger to these, either. There are a lot of applications and programs designed using these. We will be looking into those too, I promise.

    +, -, /, *

The above signs, not including the commas, are universal in nature. Whether you speak English, Japanese, or Mandarin, you know you are dealing with some basic operators. These operators are in use throughout the world, at least within a calculator. Hopefully, using these within Python at this point should not be a problem for you. However, these are not the only operators we use. The ‘=’ sign, if you may recall, is not an ‘equal to’ sign in Python. It is an operator that assigns a value to a variable. To test whether two things are equal, we use the ‘==’ sign. I am sure you had already figured that out. What about these, then?

    !=, >=, <= and <
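The arithmetic, assignment, and comparison operators can be tried out directly. The sketch below uses two arbitrary values, a and b, chosen only for illustration.

```python
a = 7   # '=' assigns the value 7 to the variable a
b = 2

# Arithmetic operators
print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5 (division always gives a float)

# Comparison operators return a bool value
print(a == b)  # False
print(a != b)  # True
print(a >= b)  # True
print(a <= b)  # False
print(a < b)   # False
```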
Example 1

    number = 5
    if number > 0:  # The comparison operator
        print(number, "The number is a positive number")

Discussion

The program contains an if statement that tests whether the given number satisfies the condition “is it greater than 0?” Since 5 is greater than zero, the condition is satisfied, and the interpreter executes the next statement, which displays the numerical value followed by the string message. The test condition in this program is “number > 0”. But what happens when the condition is not met? Let us look at Example 2.
Example 2

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    number = -9
    if number > 0:
        print(number, "This is a positive number")

Discussion

The program contains only the if statement, which tests whether -9 is greater than zero. Since it is not, the interpreter will not execute the indented line that follows. This program will not display anything when executed because the ‘if’ condition has not been met. The test condition in this program is “number > 0”. In real life, you will want to provide an alternative in case the first condition is not met.

Practice Exercise

Write programs in Python using the if statement only to perform the following:

● Given number=7, write a program that tests the number and displays it only if it is even.
● Given number1=8, number2=13, write a program that displays the sum only if it is less than 10.
● Given count_int=57, write a program that tests if the count is more than 45 and displays the message “the count is above the recommended number.”
● Given marks=34, write a program that tests if the marks are less than 50 and displays the message “the score is below average.”
● Given marks=78, write a program that tests if the marks are more than 50 and displays the message “great performance.”
● Given number=88, write a program that tests if the number is an odd number and displays the message “Yes it is an odd number.”
● Given number=24, write a program that tests and displays if the number is even.
● Given number=21, write a program that tests if the number is odd and displays the string “Yes it is an odd number.”

Note

The statements after the if expression are executed only when the expression evaluates to True; otherwise, the statements are ignored.

if…else statement in Python

The if…else syntax

    if test condition:
        Statements
    else:
        Statements

Explanation: Like the if statement, the if…else statement executes the body of if when the test condition is True. Should the test expression evaluate to False, the body of else is executed instead. Program blocks are denoted by indentation. The if…else statement provides more maneuverability when placing conditions on the code.

Example

A program that checks whether a number is positive or negative. Start IDLE.
Navigate to the File menu and click New Window. Type the following:

    number_mine = -56
    if number_mine > 0:
        print(number_mine, "This is a positive number")
    elif number_mine == 0:
        print(number_mine, "The number is zero")
    else:
        print(number_mine, "The number is a negative number")

Discussion: There are three possibilities, but at any given instance only one of the conditions will hold, and this qualifies the use of the if family of flow control statements. When there are three or more conditions to evaluate, the if…elif…else statement is the right choice.
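To check all three branches without editing the number each time, the same test can be wrapped in a small helper. This is only a sketch: describe is a hypothetical function name, and functions and for loops are explained elsewhere in the book.

```python
def describe(number):
    """Return a word describing the sign of the number."""
    if number > 0:
        return "positive"
    elif number == 0:
        return "zero"
    else:
        return "negative"

# Exactly one branch runs for each value.
for sample in (5, 0, -56):
    print(sample, "is", describe(sample))
```

Running it prints one line per sample value, showing that 5 takes the if branch, 0 the elif branch, and -56 the else branch.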
Nested if Statements in Python

Sometimes a condition exists, but there are more sub-conditions that need to be covered, which leads to a concept known as nesting. The number of statements you can nest is not limited, but you should exercise caution, as nesting can easily lead to mistakes when writing code. Nesting can also complicate the maintenance of code. Only the indentation shows the level of nesting.

Example

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    my_charact = input("Type a character here, either 'a', 'b' or 'c': ")
    if my_charact == 'a':
        print("a")
    else:
        if my_charact == 'b':
            print("b")
        else:
            print("c")

Practice Exercise

Write a program that uses the if…else flow control statement to check whether a year is a leap year or a non-leap year and displays a message for either scenario. Include comments and indentation to enhance the readability of the program.
For Loop in Python

Indentation is used to mark the body of a for loop in Python.

Note: A simple linear list takes the following syntax:

    Variable_name = [values separated by commas]

Example

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    numbers = [12, 3, 18, 10, 7, 2, 3, 6, 1]  # Variable name storing the list
    total = 0                                 # Initialize the total before usage, very important
    for cumulative in numbers:                # Iterate over the list
        total = total + cumulative
    print("The sum is", total)

Practice Exercise

Write a Python program that uses the for loop to sum each of the following lists.

    marks = [3, 8, 19, 6, 18, 29, 15]
    ages = [12, 17, 14, 18, 11, 10, 16]
    mileage = [15, 67, 89, 123, 76, 83]
    cups = [7, 10, 3, 5, 8, 16, 13]
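To see how the total builds up, it can help to print it on every pass through the loop. The sketch below uses the list from the example above and, as a cross-check, Python's built-in sum() function; the names total and value are illustrative.

```python
numbers = [12, 3, 18, 10, 7, 2, 3, 6, 1]

total = 0
for value in numbers:
    total = total + value
    print("Added", value, "running total is", total)

print("The sum is", total)                       # The sum is 62

# The built-in sum() function gives the same answer in one step.
print("Checked with sum():", sum(numbers))       # Checked with sum(): 62
```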
Range() Function in Python

The range function (range()) in Python generates a sequence of numbers. Remember, in programming, counting usually starts at 0. Therefore, range(11) generates the numbers from 0 to 10.

Example

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    print(list(range(7)))

The output will be:

    [0, 1, 2, 3, 4, 5, 6]

Note that in Python 3, print(range(7)) on its own displays range(0, 7); wrapping the range in list() shows the numbers it generates.

Practice Exercise

Without writing and running a Python program, which numbers will each of the following generate?

    range(16)
    range(8)
    range(4)

Using range() and len() and indexing

Practice Exercise

Write a Python program to iterate through the following list and display the message “I listen to” followed by each music genre. Use the for loop, len() and range().

    folders = ['Rumba', 'House', 'Rock']
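range() also accepts a start value and a step, and it combines with len() to walk through a list by index. The sketch below illustrates both; the colors list is a made-up example, deliberately different from the exercise above.

```python
# range(start, stop): starts at 2 and stops before 6.
print(list(range(2, 6)))      # [2, 3, 4, 5]

# range(start, stop, step): counts in steps of 3.
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]

# range(len(...)) produces every valid index of a list.
colors = ['red', 'green', 'blue']
for index in range(len(colors)):
    print(index, colors[index])
```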
Using for Loop with Else

A for loop can include an optional else block. The else block is executed once the items contained in the sequence are exhausted, provided the loop was not ended by a break.

Example

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    marks = [12, 15, 17]
    for i in marks:
        print(i)
    else:
        print("No items left")

Challenge: Write a Python program that prints all prime numbers between 1 and 50.
While Loop in Python

In Python, the while loop is used to repeat a block of program code as long as the test condition stays True. The while loop is used in contexts where the number of loop cycles is not known in advance. As indicated earlier, the body of the while loop is determined through indentation.

Example

Start IDLE. Navigate to the File menu and click New Window. Type the following:

Caution: Failing to update the value of the counter inside the loop will lead to an infinite loop.

Practice Exercise

● Write a Python program that utilizes the while flow control statement to display the sum of all odd numbers from 1 to 10.
● Write a Python program that employs the while flow control statement to display the sum of all numbers from 11 to 21.
● Write a Python program that incorporates a while flow control statement to display the sum of all even numbers from 1 to 10.
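A minimal sketch of a while loop with a counter is shown below. It is an illustration written for this section rather than the book's own example; it adds up the numbers 1 to 5, and the names counter and total are assumptions.

```python
counter = 1
total = 0

while counter <= 5:
    total = total + counter
    counter = counter + 1   # Without this update, the condition stays True forever

print("The sum is", total)  # The sum is 15
```

The loop stops as soon as counter reaches 6, because the test condition counter <= 5 is then False.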
Using While Loop with Else

If the condition is false and no break occurs, a while loop’s else part runs.

Example

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    track = 0
    while track < 4:
        print("Within the loop")
        track = track + 1
    else:
        print("Now within the else segment")
Python's Break and Continue

Let us use a real-life analogy in which an iteration must be stopped before it runs to completion. Think of cracking a password with a simple dictionary attack that loops through candidate passwords: you want the program to stop as soon as it finds the password rather than try every remaining candidate. Likewise, when recovering accidentally deleted photos with recovery software, you want the scan to stop as soon as it has found the items in the specified range. The break statement works the same way: it ends the loop immediately. The continue statement, covered next, skips only the current iteration.

Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    for tracker in "bring":
        if tracker == "i":
            break
        print(tracker)
    print("The End")

The output of this program will be:
b
r
The End
Continue Statement in Python

When the continue statement is used, the interpreter skips the rest of the code inside the loop for the current iteration only. The loop does not terminate; it continues with the next iteration.

The syntax of Python continue

    continue

Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    for tracker in "bring":
        if tracker == "i":
            continue
        print(tracker)
    print("Finished")

The output of this program will be:
b
r
n
g
Finished

Analogy: Assume that you are running data recovery software and have told it to skip Word files (.doc and .docx extensions). The program has to continue iterating over the remaining files after skipping each Word file.
Practice Exercise
Write a Python program using a for loop that will break after striking "v" in the string "Oliver".
Write a Python program that will continue after skipping "m" in the string "Lemon".
Pass Statement in Python

A pass statement performs no operation. Unlike a comment, which the interpreter ignores, pass is a real statement, so it can stand in wherever Python's syntax requires one.

The syntax of pass

    pass

Think of a loop or function that you plan to write in the future but do not need yet. An empty body is a syntax error, so pass serves as a placeholder until the real code is written.

Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    my_list = ['k', 'i', 'n']
    for tracker in my_list:
        pass
Functions in Python

Functions help split large programs into smaller units. They make a program more organized and easier to manage. A Python function takes the following form:

    def name_of_function(arguments):
        """docstring"""
        statement(s)

Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def welcome(salute):
        """The Python function welcomes the individual passed in as parameter"""
        print("Welcome " + salute + ". Lovely Day!")
Calling a Function in Python

Once a function is defined, we can call it from another function or from the program. Calling a function simply involves typing the function name with suitable arguments.

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    welcome('Brenda')

The output will be "Welcome Brenda. Lovely Day!"

Practice Exercise
Write a function that when called outputs "Hello (student name), kindly submit your work by Sunday."
Docstring

The docstring is placed directly after the function header as the first statement and summarizes what the function does. It is conventionally written between triple quotes so that it can span multiple lines.

Calling/Invoking the docstring we typed earlier
Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    print(welcome.__doc__)

The output will be "The Python function welcomes the individual passed in as parameter".

The syntax for calling/invoking the docstring uses two underscores on each side of doc:

    print(function_name.__doc__)
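Python Function Return Statement

Return syntax

    return [expression_list]

Discussion
The return statement ends the function and hands a value back to the caller. If the function has no return statement, or the return statement has no expression, the function returns the None object.

Example

    print(welcome("Richard"))    # passing an argument and calling the function

The output will be:
Welcome Richard. Lovely Day!
None

The first line is printed inside welcome(). The second line is the returned value: welcome() has no return statement, so it returns None.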
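To contrast returning a value with printing one, here is a small sketch. The function names welcome_text and print_only are hypothetical and only mirror the welcome() example.

```python
def welcome_text(salute):
    """Build the greeting and return it instead of printing it."""
    return "Welcome " + salute + ". Lovely Day!"

def print_only(salute):
    """Print the greeting; with no return statement the result is None."""
    print("Welcome " + salute + ". Lovely Day!")

message = welcome_text("Richard")   # the caller receives the string
nothing = print_only("Richard")     # the caller receives None
print(message)
print(nothing)
```

A returned value can be stored, compared, or passed on to other functions, which is why most functions return their result rather than print it.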
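Random Function in Python

Start IDLE. Navigate to the File menu and click New Window. Type the following:

    import random
    print(random.randrange(11, 21))   # a random integer from 11 to 20
    y = ['f', 'g', 'h', 'm']
    print(random.choice(y))           # one item picked at random
    random.shuffle(y)                 # shuffles the list in place
    print(y)
    print(random.random())            # a random float from 0.0 up to, but not including, 1.0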
Iterators

In Python, an iterator is an object that can be iterated upon, returning its items one at a time. Iterators are used throughout Python: for loops, comprehensions, and generators all rely on them. An iterator implements two special methods, __iter__() and __next__(), which are collectively referred to as the iterator protocol. An object is iterable if we can get an iterator from it; for example, strings, tuples, and lists are iterable. In operation, the iter() function calls the __iter__() method and returns an iterator from the set, list, or string.
Manually Iterating Through Items in Python

The next() function is used to step manually through the items of an iterator. Calling next(iterator) is the same as calling the iterator's __next__() method.

Example

    list_mine = [14, 17, 10, 13]
    iter_list = iter(list_mine)
    print(next(iter_list))         # 14
    print(next(iter_list))         # 17
    print(iter_list.__next__())    # 10
    print(iter_list.__next__())    # 13
    next(iter_list)                # no items left: raises StopIteration

NOTE
The for loop provides an efficient way of iterating automatically through a list. The for loop can be applied to a file, list, or string, among others.

Example

    for element in list_mine:
        print(element)
Explaining the Loop

The for loop iterates automatically through the Python list. Behind the scenes, a loop such as

    for element in iterable:
        # do something with element

is carried out roughly as follows:

    object_iter = iter(iterable)
    while True:
        try:
            element = next(object_iter)
            # do something with element
        except StopIteration:
            break
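Creating Custom Iterator in Python

To build a custom iterator, a class implements the __iter__() and __next__() methods. The __iter__() method returns the iterator object itself. The __next__() method, on the other hand, returns the next element in the sequence and raises the StopIteration exception once it reaches the end.

Example

    class Power:
        """Will implement powers of 2"""
        def __init__(self, max=0):
            self.max = max
        def __iter__(self):
            self.m = 0
            return self
        def __next__(self):
            if self.m <= self.max:
                result = 2 ** self.m
                self.m += 1
                return result
            else:
                raise StopIteration

Now consider what happens when a program runs into an error, such as a division by zero:

    >>> def div(dividend, divisor):
            print(dividend / divisor)
    >>> div(7, 0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in div
    ZeroDivisionError: division by zero
    >>> _

Of course, division by zero is an impossible operation.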
Because of that, Python stops the program, since it does not know what you want to do when this happens. It has no valid answer or response to give. The problem is that the error stops your program entirely. To manage this exception, you have two options. First, you can prevent such an operation from happening in your program. Second, you can let the operation and the error happen, but tell Python to continue your program.

Here is what the first solution looks like:

    >>> def div(dividend, divisor):
            if divisor != 0:
                print(dividend / divisor)
            else:
                print("Cannot Divide by Zero.")
    >>> div(5, 0)
    Cannot Divide by Zero.
    >>> _

Here is what the second solution looks like:

    >>> def div(dividend, divisor):
            try:
                print(dividend / divisor)
            except:
                print("Cannot Divide by Zero.")
    >>> div(5, 0)
    Cannot Divide by Zero.
    >>> _

Remember the two core solutions to errors and exceptions. One, prevent the error from happening. Two, manage the aftermath of the error.
Using Try-Except Blocks

In the previous example, the try-except blocks were used to manage the error. However, you or your user can still do something that breaks your solution. For example:

    >>> def div(dividend, divisor):
            try:
                print(dividend / divisor)
            except:
                print("Cannot Divide by Zero.")
    >>> div(5, "a")
    Cannot Divide by Zero.
    >>> _

The message in the "except" block does not match the error that the input created. Dividing a number by a string does not warrant a "Cannot Divide by Zero." message. To fix this, you need to know more about how to use the except block properly. First of all, you can specify the error that it will capture and respond to by naming the exact exception. For example:

    >>> def div(dividend, divisor):
            try:
                print(dividend / divisor)
            except ZeroDivisionError:
                print("Cannot Divide by Zero.")
    >>> div(5, 0)
    Cannot Divide by Zero.
    >>> div(5, "a")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in div
    TypeError: unsupported operand type(s) for /: 'int' and 'str'
    >>> _

Now the error to be handled has been specified. When the program encounters that error, it executes the statements in the "except" block that captured it. If no except block is set to capture an error, Python steps in, stops the program, and reports the exception.

But why did the earlier example handle everything? Because it did not specify an error. When the "except" block does not name an error to look out for, it captures any error. For example:

    >>> def div(dividend, divisor):
            try:
                print(dividend / divisor)
            except:
                print("An error happened.")
    >>> div(5, 0)
    An error happened.
    >>> div(5, "a")
    An error happened.
    >>> _

A catch-all "except" block like this is useful if you do not know exactly which error you might encounter, as long as its message does not claim a specific cause.
Reading an Exception Error Trace Back

The most important part of error handling is knowing how to read the traceback message. It is fairly easy to do. The traceback message is structured like this:

    <Traceback Stack Header>
    <File Name>, <Line Number>, <Function/Module>
    <Exception>: <Exception Description>

Here are the things you need to remember:

The traceback stack header informs you that an error occurred.
The file name tells you the name of the file where the fault is located. Since the book's examples are coded in the interpreter, the file name is always "<stdin>", or standard input.
The line number tells the exact line in the file that caused the error. For a statement typed directly into the interpreter, it will always say line 1. However, if the error is found in a code block or module, it gives the line number of the statement relative to the code block or module.
The function/module part tells which function or module owns the statement. If the statement is declared outside any named code block, it defaults to <module>.
The exception tells you what kind of error happened. These are built-in exception classes (e.g., ZeroDivisionError, TypeError, and SyntaxError), and you can name them in your except blocks. Note that a SyntaxError in your own code is detected before the code runs, so a try block in that same code cannot catch it.
The exception description gives you more details about how the error occurred. The description format may vary from error to error.
Using Exceptions to Prevent Crashes

To find out which exceptions you can use, all you need to do is generate the error. For example, using the TypeError found in the previous example, you can capture that error too and provide the correct statements in response.

    >>> def div(dividend, divisor):
            try:
                print(dividend / divisor)
            except ZeroDivisionError:
                print("Cannot Divide by Zero.")
            except TypeError:
                print("Cannot Divide by Anything Other Than a Number.")
            except:
                print("An unknown error has been detected.")
    >>> div(5, 0)
    Cannot Divide by Zero.
    >>> div(5, "a")
    Cannot Divide by Anything Other Than a Number.
    >>> div(10 ** 400, 2.0)
    An unknown error has been detected.
    >>> _

The last call fails inside the function with an OverflowError, because the integer is too large to convert to a float, and neither of the first two except blocks names that error. Note that an error raised while the arguments are being evaluated, such as using an undeclared variable in the call itself, happens before div() runs and is therefore not caught by its try block.

However, catching errors this way can still be problematic. It allows you to prevent a crash or stop, but you have no idea what happened.
To identify the unknown error, you can use the as keyword to pass the exception details to a variable. By convention, a variable named detail is often used for this purpose. For example:

    >>> def div(dividend, divisor):
            try:
                print(dividend / divisor)
            except Exception as detail:
                print("An error has been detected.")
                print(detail)
                print("Continuing with the program.")
    >>> div(5, 0)
    An error has been detected.
    division by zero
    Continuing with the program.
    >>> div(5, "a")
    An error has been detected.
    unsupported operand type(s) for /: 'int' and 'str'
    Continuing with the program.
    >>> _
The Else Block

There are times when an error happens in the middle of your code block. You can catch that error with try and except. However, you might not want to execute the rest of that code block if an error happens. For example:

    >>> def div(dividend, divisor):
            try:
                quotient = dividend / divisor
            except Exception as detail:
                print("An error has been detected.")
                print(detail)
                print("Continuing with the program.")
            print(str(dividend) + " divided by " + str(divisor) + " is:")
            print(quotient)
    >>> div(4, 2)
    4 divided by 2 is:
    2.0
    >>> div(5, 0)
    An error has been detected.
    division by zero
    Continuing with the program.
    5 divided by 0 is:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 9, in div
    UnboundLocalError: local variable 'quotient' referenced before assignment
    >>> _

(The exact wording of the UnboundLocalError message varies between Python versions.)

As you can see, the statements after the initial fault depend on it and are therefore also affected. In this example, using the variable quotient after the try and except blocks raised an error: it was never assigned a value, because the expression assigned to it could not be evaluated. In this case, you would want to skip the remaining statements that depend on the contents of the try clause. To do that, you use the else block. For example:

    >>> def div(dividend, divisor):
            try:
                quotient = dividend / divisor
            except Exception as detail:
                print("An error has been detected.")
                print(detail)
                print("Continuing with the program.")
            else:
                print(str(dividend) + " divided by " + str(divisor) + " is:")
                print(quotient)
    >>> div(4, 2)
    4 divided by 2 is:
    2.0
    >>> div(5, 0)
    An error has been detected.
    division by zero
    Continuing with the program.
    >>> _

The first call, with proper arguments, went well. On the second call, the program did not execute the two statements under the else block because the try block raised an error. The else block always follows the except blocks. Its function is to let Python execute the statements under it when the try block did not raise an exception, and to skip them when an exception happens.
Failing Silently

Failing silently, or a silent fail, is a programming term often used in error and exception handling. From a user's perspective, a silent failure is a state in which a program fails at a certain point but never informs the user. From a programmer's perspective, a silent failure is a state in which the parser, runtime environment, or compiler fails to produce an error or exception and proceeds with the program. This often leads to unintended results. Programmers can also induce silent failures themselves, either by ignoring or bypassing exceptions, or by deliberately hiding them and creating workarounds so that the program appears to operate as expected even though an error happened. They might do so for several reasons, for instance because the error does not break the program or because the user does not need to know about it.
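The difference between hiding an error and recording it can be sketched as follows. Both function names and the problems list are hypothetical, and using 0 as a fallback value is an assumption made only for illustration.

```python
def parse_age_silently(text):
    """Silent failure: the error is swallowed and a default is used."""
    try:
        return int(text)
    except ValueError:
        pass  # nothing is reported to the caller or the user
    return 0

def parse_age_reported(text, problems):
    """Same fallback, but the problem is recorded so it can be reviewed."""
    try:
        return int(text)
    except ValueError as detail:
        problems.append(str(detail))
        return 0

problems = []
print(parse_age_silently("abc"))
print(parse_age_reported("abc", problems))
print(len(problems))
```

Both functions keep the program running, but only the second leaves a trace that something went wrong, which usually makes unintended results much easier to track down.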
Handling the File Not Found Exception Error

There will be times when you encounter the FileNotFoundError. How you handle it depends on your purpose in opening the file. Here are common reasons you will encounter this error:

You did not pass the directory and filename as a string.
You misspelled the directory or filename.
You did not specify the directory.
You did not include the correct file extension.
The file does not exist.

The first step in handling the FileNotFoundError exception is to rule out these common causes. Once you have, you need to choose the best way to handle the error, which depends entirely on why you are opening the file in the first place.
Checking If File Exists

Again, there are always two ways to handle an exception: preventive and reactive. The preventive method is to check whether the file exists in the first place. To do that, use the os module (os.py) that comes with your Python installation, and call the isfile() function of its path module. The file that implements the path module depends on the operating system (posixpath for UNIX, ntpath for Windows, macpath for old MacOS). For example:

    >>> from os import path
    >>> path.isfile("random.txt")
    False
    >>> path.isfile("sampleFile.txt")
    True
    >>> _
Try and Except

You can also do it the hard way by using try, except, and else blocks.

    >>> def openFile(filename):
            try:
                x = open(filename, "r")
            except FileNotFoundError:
                print("The file '" + filename + "' does not exist.")
            else:
                print("The file '" + filename + "' does exist.")
                x.close()
    >>> openFile("random.txt")
    The file 'random.txt' does not exist.
    >>> openFile("sampleFile.txt")
    The file 'sampleFile.txt' does exist.
    >>> _
Creating a New File

If the file might not exist, and your goal is to overwrite any existing file anyway, it is best to use the "w" or "w+" access mode. These modes create the file if it does not exist and empty it if it does. For example:

    >>> x = open("new.txt", "w")
    >>> x.tell()
    0
    >>> _

If you are going to both read and write, use the "w+" access mode instead.
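The following sketch shows the "w+" access mode creating a file and then reading back what was written. It works in a temporary folder created with the standard tempfile module, so the path and the text written are assumptions made purely for demonstration.

```python
import os
import tempfile

# Work in a temporary folder so no real files are touched.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "new.txt")

existed_before = os.path.isfile(file_path)

# "w+" creates the file if needed and allows both writing and reading.
with open(file_path, "w+") as handle:
    handle.write("first line\n")
    handle.seek(0)            # move back to the start before reading
    content = handle.read()

existed_after = os.path.isfile(file_path)
print(existed_before, existed_after, repr(content))
```

The with statement closes the file automatically when the block ends, even if an error occurs inside it.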
Chapter 6. Variable Scope and Lifetime in Python Functions

Variables and parameters defined within a Python function have local scope, meaning they are not visible from outside the function. The lifetime of a variable is the period during which it exists in memory; for a local variable, that is as long as the function executes. Returning from the function destroys its local variables.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def function_my():
        marks = 15
        print("The value inside the function is:", marks)

    marks = 37
    function_my()
    print("The value outside the function is:", marks)

The output will be:
The value inside the function is: 15
The value outside the function is: 37
Function Types

Functions are broadly grouped into user-defined and built-in functions. Built-in functions are part of the Python interpreter, while user-defined functions are written by the user.

Exercise: Give three examples of built-in functions in Python.

Function Argument

Calling a function requires passing the correct number of arguments; otherwise, the interpreter will generate an error.

Illustration
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def welcome(name, msg):
        """This function welcomes the student with the provided message"""
        print("Welcome", name + ', ' + msg)

    welcome("Brenda", "Lovely Day!")

Note: the function welcome() has two parameters. We do not get any error because it has been given two arguments.

Let us try calling the function with one argument and see what happens:

    welcome("Brenda")    # only one argument passed

Running this program will generate an error saying "TypeError: welcome() missing 1 required positional argument: 'msg'".

The same will happen when we pass no arguments to the function.

Example 2:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    welcome()

The interpreter will generate an error "TypeError: welcome() missing 2 required positional arguments: 'name' and 'msg'".
Keyword Arguments in Python

Python provides a way of calling functions using keyword arguments. Without keywords, the values passed to a function are matched to its parameters by position. With keyword arguments, each value is matched by name, so the order of the arguments can be changed.

Note: In the previous example, the function welcome was invoked as welcome("Brenda", "Lovely Day!"). The value "Brenda" is assigned to the parameter name and "Lovely Day!" to msg.

Calling the function using keywords
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    welcome(name="Brenda", msg="Lovely Day!")

Keywords not following the order

    welcome(msg="Lovely Day!", name="Brenda")
Arbitrary Arguments

Sometimes we do not know in advance how many arguments will be passed into a function. Analogy: assume that you are writing a program to welcome all new students this semester. In this case, you do not know how many will report. A parameter name prefixed with an asterisk (*) collects all the arguments passed into a tuple.

Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def welcome(*names):
        """This welcome function salutes all students in the names tuple."""
        for name in names:
            print("Welcome", name)

    welcome("Lucy", "Richard", "Fridah", "James")

The output of the program will be:
Welcome Lucy
Welcome Richard
Welcome Fridah
Welcome James
Recursion in Python

The definition of something in terms of itself is called recursion. A recursive function is a function that calls itself. A classic example is a program that computes integer factorials, since the factorial of n can be defined as n multiplied by the factorial of n - 1.

Exercise
Write a Python program to find the factorial of 7.
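A recursive factorial function can be sketched as follows. The function name factorial and the choice to raise a ValueError for negative input are assumptions of this sketch.

```python
def factorial(n):
    """Return n! for a non-negative integer n using recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        return 1                    # base case stops the recursion
    return n * factorial(n - 1)     # recursive case: the function calls itself

print(factorial(5))
```

Every recursive function needs a base case like the one above; without it, the function would keep calling itself until Python raises a RecursionError.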
Python Anonymous Function

Some functions may be defined without a name, and these are called anonymous functions. The lambda keyword is used to create an anonymous function, so anonymous functions are also referred to as lambda functions in Python.

Syntax

    lambda arguments: expression

A lambda function can take several arguments but must consist of exactly one expression.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    double = lambda y: y * 2
    # Output: 10
    print(double(5))

Example 2:
We can use the built-in function filter() with a lambda to keep only the even numbers in a list or tuple.
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    first_marks = [3, 7, 14, 16, 18, 21, 13, 32]
    fresh_marks = list(filter(lambda n: (n % 2 == 0), first_marks))
    # Output: [14, 16, 18, 32]
    print(fresh_marks)

A lambda function and map() can be used to double individual list items.

Example 3:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    first_score = [3, 7, 14, 16, 18, 21, 13, 32]
    fresh_score = list(map(lambda m: m * 2, first_score))
    # Output: [6, 14, 28, 32, 36, 42, 26, 64]
    print(fresh_score)
Python's Global, Local and Nonlocal

Python's Global Variables

Variables declared outside of a function are known as global variables. They are declared in the global scope. A global variable can be accessed both outside and inside a function.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    y = "global"
    def foo():
        print("y inside the function :", y)

    foo()
    print("y outside the function:", y)

Explanation:
In the illustration above, y is a global variable, and foo() is defined to print it. When we call foo(), it prints the value of y.

Local Variables:
A local variable is declared within the body of a function, that is, in the local scope.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def foo():
        x = "local"

    foo()
    print(x)

Explanation:
Running this program generates an error, NameError: name 'x' is not defined. The error occurs because we are trying to access the local variable x from the global scope, whereas x exists only in the local scope of foo().
Creating a Local Variable in Python

A local variable is created by declaring a variable within the function.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def foo():
        x = "local"
        print(x)

    foo()

Explanation:
When we execute the code, the output will be:
local
Python's Global and Local Variable

Using both local and global variables in the same code.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    y = "global"
    def foo():
        global y
        x = "local"
        y = y * 2
        print(y)
        print(x)

    foo()

Explanation:
The output of the program will be:
globalglobal
local

We declared y as a global variable and x as a local variable in foo(). The * operator is used to modify the global variable y (multiplying a string by 2 repeats it), and finally we printed both y and x.

Local and Global Variables with the same name
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    y = 6
    def foo():
        y = 11
        print("Local variable y-", y)

    foo()
    print("Global variable y-", y)

The output will be:
Local variable y- 11
Global variable y- 6

The assignment inside foo() creates a separate local variable named y, so the global y is left unchanged.
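Python's Nonlocal Variables

A nonlocal variable is used in a nested function to refer to a variable defined in the enclosing function. Such a variable belongs to neither the nested function's local scope nor the global scope.

Example:
Creating a nonlocal variable.
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    def outer():
        y = "local variable"
        def inner():
            nonlocal y
            y = "nonlocal variable"
            print("inner:", y)
        inner()
        print("outer scope:", y)

    outer()

The output will be:
inner: nonlocal variable
outer scope: nonlocal variable

Because of the nonlocal statement, the assignment inside inner() changes the variable y that belongs to outer().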
Global Keyword in Python

There are rules for using the global keyword:

A variable is local by default when we create it within a function.
A variable is global by default when we define it outside of a function, and you do not need to use the global keyword.
The global keyword is used to read and write a global variable within a function.
The use of the global keyword outside a function has no effect.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    number = 3    # a global variable
    def add():
        print(number)

    add()

The output of this program will be 3.

Modifying a global variable from inside the function.

    number = 3    # a global variable
    def add():
        number = number + 4    # add 4 to 3
        print(number)

    add()

Explanation:
When the program is executed, it generates an UnboundLocalError indicating that the local variable number is referenced before assignment. Because the function assigns to number, Python treats number as a local variable throughout add(), so the right-hand side reads it before it has a value. Without the global keyword, a function can read a global variable but cannot assign a new value to it. Using the global keyword solves this.

Example:
Modifying a global variable within a function using the global keyword.
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    number = 3    # a global variable
    def add():
        global number
        number = number + 1    # increment by 1
        print("Inside the function add():", number)

    add()
    print("In main area:", number)

Explanation:
When the program is run, the output will be:
Inside the function add(): 4
In main area: 4

We declared number as global within the function add() using the global keyword. The variable number was then incremented by 1. Finally, we called the add() function and printed the global variable number.
Creating Global Variables across Python Modules

We can create a single module, config.py, that contains all global variables and shares the information across several modules within the same program.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

Create config.py

    x = 0
    y = "empty"

Then create an update.py file to modify the global variables

    import config
    config.x = 11
    config.y = "Today"

Then create a main.py file to evaluate the changes in value

    import config
    import update
    print(config.x)
    print(config.y)

Explanation
Running the main.py file will generate:
11
Today
Python Modules

Modules consist of definitions as well as program statements. For example, a file named config.py is a module, and its module name is config. Modules help break large programs into smaller, manageable, and organized files, and they promote the reusability of code.

Example:
Creating the First module
Start IDLE. Navigate to the File menu and click New Window. Type the following and save the file as first.py:

    def add(x, y):
        """This is a program to add two numbers and return the outcome"""
        outcome = x + y
        return outcome
Module Import

The keyword import is used to import a module.

Example:

    import first

The dot operator lets us access a function as long as we know the module's name.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    first.add(6, 8)

Explanation:
The call runs the add() function defined in the module first and returns 14.
Import Statement in Python

The import statement gives access to the definitions within a module via the dot operator.
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    import math
    print("The PI value is", math.pi)

Import with renaming
Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    import math as h
    print("The PI value is-", h.pi)

Explanation:
In this case, h is our renamed math module, which can save typing time in some instances. After renaming, only the new name h is defined; the original name math is not.

From...import statement in Python
It is possible to import particular names from a module rather than importing the entire module.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    from math import pi
    print("The PI value is-", pi)
Importing All Names

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    from math import *
    print("The PI value is-", pi)

Explanation:
In this context, we are importing all definitions from a particular module. This practice is discouraged, as it can lead to unnoticed duplicate names that silently overwrite one another.
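Module Search Path in Python

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    import sys
    sys.path

When importing a module, Python first looks for a built-in module of that name and then searches the directories listed in sys.path. That list includes the directory of the running script, the directories named in the PYTHONPATH environment variable, and the installation's default directories.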
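Reloading a Module

Python imports a module only once per session, which makes execution more efficient. Suppose a module mine.py contains the following:

    print("This program was executed")

Reloading Code
Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    import mine
    import mine
    import mine

The message is printed only once, because the second and third import statements find the module already loaded. To run the module's code again, for instance after editing it, use the reload() function of the importlib module:

    import importlib
    importlib.reload(mine)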
Dir() built-in Python function

To discover the names contained in a module, we use the dir() built-in function.

Syntax

    dir(module_name)
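As a short sketch, dir() can be applied to the standard math module. The variable names names and public are illustrative, and the exact list returned depends on the Python version.

```python
import math

names = dir(math)    # a list of attribute names, each one a string

# Names starting with an underscore are internal; filter them out.
public = [n for n in names if not n.startswith("_")]

print("pi" in names)
print(public[:5])
```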
Python Package

Just as files hold modules, directories hold packages. A single package in Python groups related modules, so unrelated modules should be placed in different packages.

Data types in Python

Numbers

The presence or absence of a decimal point separates integers from floating-point numbers. For instance, 4 is an integer, while 4.0 is a floating-point number. Complex numbers in Python are written as r+tj, where r is the real part and t is the imaginary part. The function type() is used to determine the class of a variable or value. The function isinstance() checks whether an object belongs to a particular class.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    number = 6
    print(type(number))    # should output <class 'int'>
    print(type(6.0))       # should output <class 'float'>
    complex_num = 7 + 5j
    print(complex_num + 5) # should output (12+5j)
    print(isinstance(complex_num, complex))    # should output True

Important: Integers in Python can be of any length, limited only by the available memory. Floating-point numbers are accurate to about fifteen significant digits.
Number Conversion

This segment assumes you already know how to convert decimal numbers into binary, octal, and hexadecimal, either manually or with a calculator. The Windows Calculator in Windows 10 (Calculator version 10.1804.911.1000) can convert automatically when you choose Programmer mode. Programmers often need to convert decimal numbers into octal, hexadecimal, and binary forms. In Python, a prefix marks a number as belonging to one of these systems.

    Number System    Prefix
    Binary           '0b' or '0B'
    Octal            '0o' or '0O'
    Hexadecimal      '0x' or '0X'

Example:

    print(0b1010101)        # Output: 85
    print(0x7B + 0b0101)    # Output: 128 (123+5)
    print(0o710)            # Output: 456

Exercise:
Write a Python program to display the following:
a. 0011 1111 (base 2)
b. 747 (base 8)
c. 93 (base 16)
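Going the other way, the built-in functions bin(), oct() and hex() turn a decimal integer into a prefixed string, and int() with a base argument converts a string of digits back to decimal. The value 85 below is an illustrative choice.

```python
number = 85  # illustrative decimal value

binary_text = bin(number)   # '0b1010101'
octal_text = oct(number)    # '0o125'
hex_text = hex(number)      # '0x55'
print(binary_text, octal_text, hex_text)

# int() with a base converts a string of digits back to decimal.
from_binary = int("1010101", 2)
from_octal = int("125", 8)
from_hex = int("55", 16)
print(from_binary, from_octal, from_hex)
```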
Type Conversion

Sometimes referred to as coercion, type conversion allows us to change one type of number into another. Python converts implicitly in mixed arithmetic (for example, 1 + 2.0 gives 3.0), while built-in functions such as float(), int() and complex() perform explicit conversions. The same functions can be used to convert from strings.

Example
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    int(5.3)    # Gives 5
    int(5.9)    # Gives 5

The int() function truncates floating-point numbers. It simply drops the fractional part without rounding.

For float() and complex(), let us take a look:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    float(6)           # Gives 6.0
    complex('4+2j')    # Gives (4+2j)

Exercise:
Apply the int() conversion to the following:
a. 4.1
b. 4.7
c. 13.3
d. 13.9
Apply the float() conversion to the following:
e. 7
f. 16
g. 19

Decimal in Python

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    (1.1 + 2.2) == 3.3    # Will return False, why?

Explanation:
The computer stores floating-point numbers in binary with a finite number of bits. Many decimal fractions, such as 0.1, have an infinitely long binary expansion, so they are stored as the nearest representable value, and the small rounding errors show up in comparisons.

Fractions in Python

The fractions module in Python allows operations on fractional numbers.

Example:
Start IDLE. Navigate to the File menu and click New Window. Type the following:

    import fractions
    print(fractions.Fraction(2.5))     # Output: 5/2
    print(fractions.Fraction(4))       # Output: 4
    print(fractions.Fraction(2, 5))    # Output: 2/5

Important: Creating a Fraction from a float can lead to unusual results, because binary floating point cannot represent most decimal fractions exactly.
Mathematics in Python
To carry out mathematical operations, Python offers modules like math and random. Start IDLE. Navigate to the File menu and click New Window. Type the following:
    import math
    print(math.pi)             # the output will be 3.14159...
    print(math.cos(math.pi))   # the output will be -1.0
    print(math.exp(10))        # the output will be 22026.4...
    print(math.log10(100))     # the output will be 2.0
    print(math.factorial(5))   # the output will be 120
Exercise: Write a Python program that uses functions from the math module to compute the following:
a. Square of 34
b. Log base 10 of 10000
c. Cos 45 x sin 90
d. Exponent of 20
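Random Function in Python
Start IDLE. Navigate to the File menu and click New Window. Type the following:
    import random
    print(random.randrange(11, 21))  # a random integer from 11 to 20
    y = ['f', 'g', 'h', 'm']
    print(random.choice(y))          # one randomly chosen item from the list
    random.shuffle(y)                # shuffles the list in place
    print(y)
    print(random.random())           # a random float from 0.0 up to, but not including, 1.0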
Lists in Python
We create a list in Python by placing items, called elements, inside square brackets, separated by commas. The items in a list can be of mixed data types. Start IDLE. Navigate to the File menu and click New Window. Type the following:
    list_mine = []                  # empty list
    list_mine = [2, 5, 8]           # list of integers
    list_mine = [5, "Happy", 5.2]   # list having mixed data types
Exercise: Write a program that captures the following in a list: "Best", 26, 89, 3.9
Nested Lists
A nested list is a list that appears as an item in another list.
Example: Start IDLE. Navigate to the File menu and click New Window. Type the following:
    list_mine = ["carrot", [9, 3, 6], ['g']]
Exercise: Write a nested list for the following elements: [36, 2, 1], "Writer", 't', [3.0, 2.5]
Accessing Elements from a List
In programming, and in Python specifically, the first item is always at index zero. For a list of five items, the valid indexes run from 0 to 4. Trying to access an index outside this range raises an IndexError. The index must be an integer; using another number type, such as a float, raises a TypeError. Items in nested lists are accessed via nested indexing.
Example: Start IDLE. Navigate to the File menu and click New Window. Type the following:
    list_mine = ['b', 'e', 's', 't']
    print(list_mine[0])  # the output will be b
    print(list_mine[2])  # the output will be s
    print(list_mine[3])  # the output will be t
Exercise: Given the following list:
    your_collection = ['t', 'k', 'v', 'w', 'z', 'n', 'f']
a. Write a Python program to display the second item in the list.
b. Write a Python program to display the sixth item in the list.
c. Write a Python program to display the last item in the list.
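Nested indexing and the two errors mentioned above can be tried out with a short sketch. It reuses the nested list from the earlier example; the names inner_item, letter, and errors_seen are only illustrative.

```python
list_mine = ["carrot", [9, 3, 6], ["g"]]

# Nested indexing: the first index picks the inner list,
# the second index picks the item inside it.
inner_item = list_mine[1][2]   # third item of [9, 3, 6]
letter = list_mine[0][1]       # second character of "carrot"
print(inner_item, letter)      # 6 a

letters = ["b", "e", "s", "t"]
errors_seen = []

try:
    letters[4]                 # valid indexes are 0 to 3
except IndexError as error:
    errors_seen.append("IndexError")
    print("IndexError:", error)

try:
    letters[1.0]               # a float is not a valid index
except TypeError as error:
    errors_seen.append("TypeError")
    print("TypeError:", error)
```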
Chapter 7. Modules
What Are the Modules?
In Python, a module is a file of Python code (a .py file) that other programs can use without having to rewrite that code in every program. Modules can define functions, classes, and variables, and they group related statements together so they can be reused at any time. A program gains access to the code stored in a module (functions, variables, and other statements) with the import statement. Modules simplify programs considerably: they let us break a problem into smaller ones and keep each file short, so programmers do not get lost searching through hundreds of lines of code.
How to Create a Module?
Creating a module in Python is very simple. For example, to create a module that prints a city, we write our code in the editor and save it as "mycity.py". The file name without the .py extension becomes the module's name, which Python stores in the module's global variable __name__. The example is very simple code designed for users of Python 2: the print statement is written without parentheses, which is how that version of Python handles printing. Beyond that, the file "mycity.py" is not complicated at all. The only thing inside it is a function called "print_city," which takes a string as a parameter and prints "Hello, welcome to" joined with that string.
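A module like mycity.py could be sketched as follows. This is not the book's original listing: it uses the Python 3 print() function instead of the Python 2 print statement, and the city name in the last line is only an illustration.

```python
# Contents of a file saved as mycity.py (a sketch of the module described above).

def print_city(city):
    """Print a welcome message for the given city name."""
    print("Hello, welcome to " + city)


# Quick check of the function; "Rome" is an arbitrary example value.
print_city("Rome")
```

Another program saved in the same directory could then use the function with import mycity followed by mycity.print_city("Rome").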
Import Statement
This statement is used to import a module from any Python code file. The process is as follows: The Python interpreter first searches the directory from which the program is being run. Then, the interpreter searches the predefined paths in its configuration. When it finds the first match for the module's name, the interpreter executes the module from start to finish. When a module is imported for the first time, Python generates a compiled file with the .pyc extension, which is reused on later imports of that module. When the interpreter detects that the module has been modified since the .pyc file was generated, it generates a new one. Example: This will print: You must save the imported file in the same directory as the program that uses the import statement so that Python can find it. As we could see in our example, importing a module allows us to extend our program's functionality through external files. Now, let's see some examples. The first one is a calculator, for which we will create a module that performs all the mathematical operations and another program that runs the calculator itself. The first thing we write is the module "calculator.py," which is responsible for all the necessary operations. Among them are addition, subtraction, division, and multiplication, as you can see.
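A module along the lines of calculator.py might look like the sketch below. The function name calculator and the op parameter come from the text; the numbering of options 3 and 4 and the wording of the error messages are assumptions made for illustration.

```python
# Sketch of a calculator module; save as calculator.py to import it elsewhere.

def calculator(num1, num2, op):
    """Return the result of the operation selected by op.

    Assumed numbering: 1 add, 2 subtract, 3 divide, 4 multiply.
    """
    if op == 1:
        return num1 + num2
    elif op == 2:
        return num1 - num2
    elif op == 3:
        try:
            return num1 / num2
        except ZeroDivisionError:
            # Keep the program running when the user divides by zero.
            return "Error: division by zero"
    elif op == 4:
        return num1 * num2
    else:
        return "Error: unknown operation"


print(calculator(6, 3, 1))   # 9
print(calculator(6, 0, 3))   # Error: division by zero
```

Returning a message instead of letting the exception escape is one simple way to keep the calling program alive; a larger program might raise its own error instead.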
We included conditional statements such as if, elif, and else. We also included exception handling so that the program does not crash when the user enters an invalid value for a division. After that, we create a program that imports this module so that it can perform all the mathematical operations. At this point, you might be thinking that the only modules available are the ones the programmer creates. That is not the case, since Python comes with built-in modules. With them, we will make two more programs: the first is an improvement on the one we have just written, and the second is an alarm that periodically prints a string on the screen.
First example: The first step is to create the module, and at first sight there is a surprise: math is imported. What does that mean for us? It means we gain access to the functions of the math module that comes with Python by default. The calculator function offers several options. If op is equal to 1, it performs addition. If it is equal to 2, it performs subtraction, and so on. What is new starts when op is equal to 5: in that case, the function returns the square roots of num1 and num2, computed with math.sqrt(). If op is equal to 6, the function uses math.radians() to convert num1 and num2 to radians, since that is the unit accepted by math.sin(). It then returns the sine of each value, so the numbers the user enters are first converted to radians and then passed to the sine function. The last step is to create the main program, as can be seen next: The program is simple. It imports the module "calculator.py," assigns values to the variables num1 and num2 using input(), and lets the user choose an operation. Finally, it calls the calculator function of the calculator module, passing three arguments.
Second example: We are going to create a module containing a function that acts as a chronometer, returning True when the time runs out. This module imports another module called "time," which, as its name suggests, provides functions for working with time, from returning dates and times to helping build chronometers. First we create the cron() function. It starts by setting the start Alarm variable to time.time(), which records the exact moment at which the function started, and then enters an infinite loop. Because the loop condition is always True, the loop never ends unless a break statement runs inside it. Within the while loop there are several instructions. First, the variable final is set to time.time() to capture the current moment. Then another variable, times, is set to final minus start Alarm, passed through round(). The round() function rounds the value to the nearest whole number, which makes it easier to work with. Next comes an if statement: if the difference between the end and the start is greater than or equal to 60, one minute has been completed. Why 60? Because the time module works in seconds, and a minute takes 60 seconds to elapse. In that case True is returned, and we leave the infinite loop.
Once the alarm module is finished, we write the program, as we can see below: The program imports two modules: the alarm module we created and the time module. First it creates the variable s from an input that asks the user whether they want to start. If the answer is affirmative, the variable h, which represents the time, is set to time.strftime("%H:%M:%S"). This function of the time module returns the current time in the specified format so that it can be printed with the print function. Next, the program calls the cron() function of the alarm module with alarm.cron(). When the function finishes, the current time is assigned to h again and printed, so we can check that everything worked correctly.
To conclude this chapter, modules are fundamental to a programmer's work: they make code more legible, and they let us split a problem into parts and tackle them one at a time, which makes each task easier.
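The chronometer from the second example can be sketched as below. The names cron, final, and times follow the description; start_alarm stands in for the start variable, and the duration parameter and the short sleep inside the loop are additions for this sketch so that it can be tried without waiting a full minute.

```python
import time


def cron(duration=60):
    """Loop until the rounded elapsed time reaches `duration` seconds, then return True."""
    start_alarm = time.time()            # moment at which the function started
    while True:                          # infinite loop, left only by the return below
        final = time.time()              # current moment
        times = round(final - start_alarm)
        if times >= duration:
            return True
        time.sleep(0.01)                 # brief pause so the loop does not use a full CPU core


# Main-program part: print the time, wait, print the time again.
print(time.strftime("%H:%M:%S"))
finished = cron(duration=1)              # a short wait, for demonstration only
print(time.strftime("%H:%M:%S"))
```

Because of round(), the check fires as soon as the elapsed time rounds to the target, which is slightly before the full duration has passed.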
Locate a Module
When importing a module, the interpreter first searches the current directory. If the module is not found there, Python searches each directory in the PYTHONPATH environment variable, which is a list of directory names with the same syntax as the shell's PATH variable. If both searches fail, Python checks the installation's default path (on UNIX, this is usually /usr/local/lib/python). The module search path is stored in the variable sys.path, which contains the current directory, the PYTHONPATH directories, and the default directories that come with the installation.
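You can look at the search path on your own machine with a few lines. This is a minimal sketch; the directories printed depend entirely on your installation, and the variable name search_path is illustrative.

```python
import sys

# sys.path is an ordinary list of directory names, searched in order.
search_path = list(sys.path)
for directory in search_path:
    print(directory)
```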
Syntax of PYTHONPATH
PYTHONPATH holds a list of directories. On Windows, the directories are separated by semicolons, unlike on UNIX, where they are separated by colons.
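The difference between the two systems comes down to the separator placed between directories. The sketch below builds one value of each style from made-up directory names (all four paths are hypothetical) and then reads the variable on the current machine, if it is set.

```python
import os

# Hypothetical directories, used only to show the two separator styles.
windows_value = ";".join([r"C:\python\lib", r"C:\projects\mylibs"])
unix_value = ":".join(["/usr/local/lib/python", "/home/user/mylibs"])
print(windows_value)
print(unix_value)

# os.pathsep holds the separator used on the current platform.
print(repr(os.pathsep))

# Read PYTHONPATH on this machine; it is often not set at all.
current = os.environ.get("PYTHONPATH", "")
entries = current.split(os.pathsep) if current else []
print(entries)
```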
Chapter 8. Working with Files
Programs are made with input and output in mind. You input data to the program, the program processes the input, and it ultimately provides you with output. For example, a calculator takes in the numbers and operations you want, processes the operation, and then displays the result as its output. There are multiple ways for a program to receive input and to produce output. One of those ways is to read and write data in files. To start learning how to work with files, you need to learn the open() function. The open() function has one required parameter and several optional ones, of which the first two are covered here. The first and required parameter is the file name. The second parameter is the access mode. The third parameter is buffering, or buffer size. The file name must be a string. The access mode must also be a string, chosen from a fixed set of values, and defaults to "r". The buffering parameter must be an integer and defaults to -1, which lets Python choose a suitable buffer size. To practice using the open() function, create a file with the name sampleFile.txt inside your Python directory. Try this sample code:
    >>> file1 = open("sampleFile.txt")
    >>> _
Note that the open() function returns a file object. The statement in the example assigns the file object to the variable file1. The file object has multiple attributes, and three of them are:
name: this contains the name of the file.
mode: this contains the access mode you used to access the file.
closed: this is False if the file is open and True if the file has been closed. When you use the open() function, the file is set to open.
Now, access those attributes.
    >>> file1 = open("sampleFile.txt")
    >>> file1.name
    'sampleFile.txt'
    >>> file1.mode
    'r'
    >>> file1.closed
    False
    >>> _
Whenever you are finished with a file, close it using the close() method.
    >>> file1 = open("sampleFile.txt")
    >>> file1.closed
    False
    >>> file1.close()
    >>> file1.closed
    True
    >>> _
Remember that closing the file does not delete the variable or object. To reopen the file, just open it again and reassign the file object. For example:
    >>> file1 = open("sampleFile.txt")
    >>> file1.close()
    >>> file1 = open(file1.name)
    >>> file1.closed
    False
    >>> _
Reading from a File
Before proceeding, open sampleFile.txt in your text editor. Type "Hello World" in it and save. Go back to Python. To read the contents of the file, use the read() method. For example:
    >>> file1 = open("sampleFile.txt")
    >>> file1.read()
    'Hello World'
    >>> _
File Pointer
Whenever you access a file, Python sets the file pointer. The file pointer is like your word processor’s cursor. Any operation on the file starts where the file pointer is. When you open a file in the default access mode, which is "r" (read-only), the file pointer is set at the beginning of the file. To know the current position of the file pointer, you can use the tell() method. For example:
    >>> file1 = open("sampleFile.txt")
    >>> file1.tell()
    0
    >>> _
Most of the actions you perform on the file move the file pointer. For example:
    >>> file1 = open("sampleFile.txt")
    >>> file1.tell()
    0
    >>> file1.read()
    'Hello World'
    >>> file1.tell()
    11
    >>> file1.read()
    ''
    >>> _
To move the file pointer to a position you desire, you can use the seek() method. For example:
    >>> file1 = open("sampleFile.txt")
    >>> file1.tell()
    0
    >>> file1.read()
    'Hello World'
    >>> file1.tell()
    11
    >>> file1.seek(0)
    0
    >>> file1.read()
    'Hello World'
    >>> file1.seek(1)
    1
    >>> file1.read()
    'ello World'
    >>> _
The seek() method has two parameters. The first is the offset, which sets the pointer’s position according to the second parameter. An argument for the offset is required.
The second parameter is optional. It is called whence, and it dictates where the “seek” will start. It is set to 0 by default. If set to 0, Python sets the pointer’s position to the offset argument, counted from the start of the file. If set to 1, Python sets the pointer’s position relative to the current position of the pointer. If set to 2, Python sets the pointer’s position relative to the file’s end. Note that the last two options require binary access when the offset is not zero. Without binary access, they only accept an offset of 0, which is still useful to determine the current position of the pointer [seek(0, 1)] and the position at the end of the file [seek(0, 2)]. For example:
    >>> file1 = open("sampleFile.txt")
    >>> file1.tell()
    0
    >>> file1.seek(1)
    1
    >>> file1.seek(0, 1)
    1
    >>> file1.seek(0, 2)
    11
    >>> _
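With binary access, all three whence values accept non-zero offsets. The sketch below creates its own copy of the sample file in a temporary directory (an assumption made so the snippet can run anywhere) and moves the pointer in each of the three ways.

```python
import os
import tempfile

# Create a throwaway copy of the sample file.
path = os.path.join(tempfile.mkdtemp(), "sampleFile.txt")
with open(path, "w") as handle:
    handle.write("Hello World")

with open(path, "rb") as handle:
    handle.seek(6)                 # whence 0: absolute position 6
    first = handle.read(1)         # b'W', pointer is now at 7
    handle.seek(-2, 1)             # whence 1: two bytes back from the current position
    second = handle.read(1)        # b' ', the byte at position 5
    handle.seek(-5, 2)             # whence 2: five bytes before the end of the file
    third = handle.read()          # b'World'
    end_position = handle.tell()   # 11

print(first, second, third, end_position)
```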
File Access Modes
To write to a file, you will need to know more about file access modes in Python. There are three types of file operations: reading, writing, and appending. Reading allows you to access and copy any part of the file’s content. Writing allows you to overwrite a file’s contents or create a new file. Appending allows you to write to the file while keeping the existing content intact. There are two types of file access: string (which Python calls text mode) and binary. String access allows you to access a file’s content as if you were opening a text file. Binary access allows you to access a file in its rawest form: bytes. In your sample file, string access lets you read the line “Hello World.” Binary access lets you read “Hello World” as bytes, which is displayed as b'Hello World'. For example:
    >>> x = open("sampleFile.txt", "rb")
    >>> x.read()
    b'Hello World'
    >>> _
String access is useful for editing text files. Binary access is useful for anything else, like pictures, compressed files, and executables. In this book, you will only be taught how to handle text files. You can combine multiple values in the file access mode parameter of the open() function.
You do not need to memorize every combination, though. You just need to know what each letter and symbol stands for. For example:
r = read-only; file pointer placed at the beginning
r+ = read and write
a = append; file pointer placed at the end
a+ = read and append
w = overwrite/create; file pointer set to 0, since the file is created or emptied
w+ = read and overwrite/create
b = binary
By default, files are opened with string access. You need to add b to allow binary access. For example: "rb"
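A quick way to get a feel for the letters is to watch what each one does to the same file. The sketch below is a minimal illustration; the file name modes.txt and the temporary directory are assumptions chosen so that nothing of yours gets overwritten.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as handle:    # "w" creates the file, or empties an existing one
    handle.write("first")

with open(path, "a") as handle:    # "a" keeps the content and writes at the end
    handle.write(" second")

with open(path, "r") as handle:    # "r" only reads
    after_append = handle.read()   # 'first second'

with open(path, "w") as handle:    # opening with "w" again discards the old content
    handle.write("third")

with open(path, "rb") as handle:   # adding "b" returns bytes instead of a string
    raw = handle.read()            # b'third'

print(after_append, raw)
```

The with statement used here closes each file automatically at the end of its block, which is equivalent to calling close() yourself.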
Writing to a File
When writing to a file, you must always remember that Python overwrites existing text and does not insert. For example:
    >>> x = open("sampleFile.txt", "r+")
    >>> x.read()
    'Hello World'
    >>> x.seek(0)
    0
    >>> x.write("text")
    4
    >>> x.tell()
    4
    >>> x.read()
    'o World'
    >>> x.seek(0)
    0
    >>> x.read()
    'texto World'
    >>> _
You might have expected that the resulting text would be “textHello World.” The write method of the file object replaces each character one by one, starting from the pointer's current position.
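If you do want “textHello World,” one simple approach is to read the old content first and then write the new text followed by the old. This is a sketch of that idea, not the only way to do it; it works on a temporary copy of the sample file so your own file stays untouched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sampleFile.txt")
with open(path, "w") as handle:
    handle.write("Hello World")

with open(path, "r+") as handle:
    original = handle.read()          # remember the existing content
    handle.seek(0)                    # move the pointer back to the start
    handle.write("text" + original)   # new text first, then the old content

with open(path) as handle:
    result = handle.read()

print(result)   # textHello World
```

Since the new content is longer than the old, every old character is overwritten and nothing is left over at the end of the file.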
Practice Exercise
For practice, you need to perform the following tasks:
Create a new file named test.txt.
Write the entire practice exercise instructions on the file.
Close the file and reopen it.
Read the file and set the cursor back to 0.
Close the file and open it using append access mode.
Add a rewritten version of these instructions at the end of the file.
Create a new file and put similar content to it by copying the contents of the test.txt file.
Summary Working with files in Python is easy to understand but difficult to implement. As you already saw, there are only a few things that you need to remember. The hard part is when you are actually accessing the file. Remember that the key things you should master are the access modes and the file pointer's management. It is easy to get lost in a file that contains a thousand characters. Aside from being versed with file operations, you should also supplement your learning with the functions and methods of the str class in Python. Most of the time, you will be dealing with strings if you need to work on a file. Do not worry about binary yet. That is a different beast altogether, and you will only need to tame it when you are already adept at Python. As a beginner, expect that you will not deal yet with binary files that often contain media information. Anyway, the next lesson is an elaboration on the “try” and “except” statements. You’ll discover how to manage and handle errors and exceptions effectively.
Chapter 9. Object-Oriented Programming
Object-oriented programming (OOP) is a programming paradigm in which programs are modeled around objects, which bundle properties and behaviors together, rather than around functions and logic alone. Take an example from real life: an object could be you or me. A person has a name, age, birth date, occupation, and other data, which in programming terms are properties. We also have certain behaviors: we can walk, talk, work, sleep, jog, and so on. OOP allows us to model real-world elements in a program and make them as realistic and meaningful as possible. Each entity in the world can be modeled as a Python object that holds some data and performs some functions (has some behavior). What have we been doing until now? That is the procedural programming paradigm. It provides steps, functions, and code blocks that run commands in sequential order. Let’s take a look at the most basic concept of OOP: classes.
Classes and Objects
To model real-world objects in programming, we need a blueprint, or prototype, on which these objects will be based. Classes are user-defined blueprints that state how an object should look, what attributes or properties it should have, and what it should do (its behaviors). In other words, a class describes the general behavior that every object of that class can have. What are objects? Objects are instances of a class, and they are what we actually work with in our programs. The process of making objects from classes is called instantiation. Let’s consider an example. If you’ve ever come across a car, think about the attributes it can have:
The color
Number of tires
Model of the car
Engine specifications and others
When we program our class called Car, these will act as the properties of our car. Now, what does the car do? It drives, honks, and performs other functions internally. These are the behaviors, or methods, of our class Car. Every car performs these actions and has these properties, so classes are general representations of real-world objects. Objects are the specific instances of these classes and hold their own data.
For example, a Ford Mustang will be different from an SUV and have very different properties. Both of them are individual objects of our class Car.
Writing Classes
Let’s head back to our editor and code an example class with properties and behaviors. Here’s the code: (don’t stress, I’ll explain everything later)
    class Car:
        '''Modelling a car'''

        def __init__(self, model, license):
            '''Initialize all attributes and properties '''
            self.model = model
            self.license = license

        def drive(self):
            print("Vroom vroom! The car drives!")

        def honk(self):
            print("HONK! HONK!")

    fordMustang = Car("ford-8", "AX-2939")
    SUV = Car("Honda", "MX-2101")
Now, on to the analysis of the code we just wrote.
An Explanation on Classes (Code Breakdown)
We begin by defining our class on line one using the class keyword, immediately followed by the name of the class. Conventionally, we start the name of a class with an uppercase letter. In line 2, we define a docstring.
It is a simple statement that tells us more about what the class has to offer or what it does. On line 4, we finally define a function, since we know that functions are defined using the def keyword. All functions defined in a class are called methods of that class. __init__ is a special method that Python runs automatically during instantiation (when you create a new object). A question you might have: why the underscores? They signal that this is one of Python's special methods, so its name will not conflict with your own method names. In our case it takes three parameters, but it can have as many as you want. The self parameter is necessary and must come before the others. What is self? The self parameter is a reference that lets an object refer to itself anywhere in the class. It gives each object access to its own properties and to the methods defined in the class without interfering with other objects. Python passes self automatically whenever an object is made; all other parameters are optional, but if they appear in the method definition, they must be provided when the object is created. On line 6, we prefix each attribute with self. This gives each object of the class its own attributes (specific to it), which can be used throughout the class for that object only. Next, we define two other methods and give each a self parameter, which is necessary so that each method knows which object it was called on. That is it for our class; let’s see what happens next.
Making an Instance: Objects
Outside the class's scope, we finally use our class to make objects of it (or cars out of the class Car). Each object is the class put to work for one specific car. We can make an object using this syntax:
    objectOfTheClass = nameOfClass('param1', 'param2', …)
Let’s see how we did it for our example:
    fordMustang = Car("ford-8", "AX-2939")
    SUV = Car("Honda", "MX-2101")
Simply put, we ask Python to make a car with a particular model and license. Then we ask Python to make a different car with different data. How does it work? As soon as you instantiate an object, the interpreter creates it, runs the __init__ method with self set to the newly made object, and matches the passed arguments to the parameters. The call to Car() then returns the new object, which is assigned to our variable fordMustang. Now, let’s use this object to see what attributes or properties our objects have.
Accessing Attributes and Methods
Try running the following code after instantiating your class:
    print(fordMustang.model)
    print(fordMustang.license)
It prints the values we passed as arguments when we created the object.
Because these values are now stored on our object, fordMustang.model reads back the value that __init__ saved in self.model. Here’s the output:
    ford-8
    AX-2939
If you ask for these attributes from the second object, the output will be what you sent with it. Here’s an example:
    print(SUV.model)
    print(SUV.license)
Now, if you want to access the methods, simply use the dot operator again and call the method. Here’s how:
    print(fordMustang.honk())
And it outputs:
    HONK! HONK!
    None
The None appears because honk() has no return statement, so it returns None, and the outer print() displays that value as well. Let’s write a new method and use an attribute to see different outputs for different objects: (Add to your class from the last example)
    def mileage(self):
        val = input("What is the mileage? ")
        print(self.model + " Mileage: " + val)
When you call this method on an object, you will be prompted to enter a value, since we use the input() function. Enter the value and let’s check the output:
    print(fordMustang.mileage())
Similarly, you can run this on the second object. Let’s take a look at some other concepts of object-oriented programming next. The rest of this chapter covers slightly more advanced topics such as inheritance and child classes. We’ll also see how to import classes just like we imported modules.
Inheritance
In real-world situations, most objects have a relationship to other objects. Similarly, in programming, we can write a class that is a specialized version of a more general one. This concept is called inheritance: a child class takes all the properties and methods of the parent class, makes use of them, and adds something of its own. The parent class is the more general class and has all the basic functions. For example, the code we wrote for a Car is pretty general. If we now wish to write a class for an electric car, it will inherit most of its properties and behaviors from the parent (Car) class and add more of its own. Let’s take a look at child classes next.
Child Classes: Writing One
We’ll model an electric car, a more specific form of our Car class.
Here’s the code; let’s analyze it afterwards:
    class Car:
        def __init__(self, model, license):
            self.model = model
            self.license = license

        def drive(self):
            print("Vroom vroom! The car drives!")

        def mileage(self):
            val = input("What is the mileage? ")
            print(self.model + " Mileage: " + val)

    class ElectricCar(Car):
        def __init__(self, model, license):
            super().__init__(model, license)

    teslaX = ElectricCar("Tesla", "AA-9323")
    print(teslaX.mileage())
    print(teslaX.model)
First, we write our child class and put the parent class (Car) in parentheses after its name. Next, we declare the __init__ method just like we did before, with self to refer to the object followed by the other parameters. Next, something strange. We call super() and then its __init__ method, which refers to the parent class's __init__ method. Calling it initializes the attributes that the parent class defines (model and license) on the new object. Because ElectricCar inherits from Car, it can also use all the methods of the parent class. Although the child class doesn’t have any method of its own right now, one can definitely be added later. Next, we make an object of our new ElectricCar class and ask for its methods and attributes, which yield the expected output, since a relationship now exists between the parent and child, or superclass and subclass. If you decide to add methods to the child class, remember that objects of the parent class can’t access them. The child class, however, can always access the methods of the parent class.
Importing Classes
As your programs grow, so will their complexity, both in logic and in file size. It is recommended to keep your classes in separate files and import them wherever they are required. This is possible using the import statements we studied a while ago. Here’s how you can import the classes into another file and use them properly:
1. from car import Car
2. from car import ElectricCar
3. from car import Car, ElectricCar
4. from car import *
5. import car
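To illustrate the earlier point about methods that belong only to the child class, here is a small sketch. The battery_kwh attribute and the charge() method are hypothetical additions, and drive() returns its text here instead of printing it so the result is easy to inspect.

```python
class Car:
    def __init__(self, model, license):
        self.model = model
        self.license = license

    def drive(self):
        return "Vroom vroom! The car drives!"


class ElectricCar(Car):
    def __init__(self, model, license, battery_kwh):
        super().__init__(model, license)   # let Car set model and license
        self.battery_kwh = battery_kwh     # hypothetical attribute of the child class

    def charge(self):
        # A method that exists only on the child class.
        return self.model + " is charging its " + str(self.battery_kwh) + " kWh battery"


tesla = ElectricCar("Tesla", "AA-9323", 75)
plain = Car("Honda", "MX-2101")

print(tesla.drive())              # inherited from Car
print(tesla.charge())             # defined on ElectricCar
print(hasattr(plain, "charge"))   # False: the parent class has no charge() method
```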
Chapter 10. Real-World Examples of Python
One might argue that Python's era was just 2017, when it saw a great rise in popularity and growth across the world. However, according to statistics and data, the more recent growth of Python cannot be ignored. Why should we expect it to keep expanding? To answer the question, we look at market data and at the scale of Python adoption by corporations and companies around the world. The reason behind Python's popularity is simple: it will be as popular and widely used five years from now as it was five years ago. This is a big statement, and to support it we need to see in detail what makes Python so special for developers and programmers. Years ago, when Python came onto the market, people believed it would be dead within months of its inception. In fact, when Larry Wall, the creator of the Perl programming language, was delivering his third annual State of the Perl Onion address, he listed the programming languages then on the market as C++, Java, Perl, Visual Basic, JavaScript and, last, Python. Back then, the leading programming language was C++, and Perl was third in the market. Python had very low demand and was not counted among the languages that could grow. However, in the years that followed, Python grew with tremendous speed and outshone Perl as well. According to Stack Overflow, the volume of visitors asking questions about Python increased more rapidly than for Perl.
The following are the reasons behind the rise and super demand for Python among developers.
Data Science
Python is one of the most adored languages among data scientists, unlike R and C++. The current era is the era of big data, and since Python offers a large set of libraries, internet resources, and prototyping support, it is well suited to these operations. Libraries such as PyMySQL, PyBrain, and NumPy are among the reasons Python is in such demand. In addition, integration work is something a programmer deals with every day, and this is another reason behind the huge demand for Python: it integrates easily with existing apps or sites written in other programming languages. This makes it future-oriented and scalable.
Machine Learning
Artificial intelligence and machine learning have created a huge buzz, with every industry investing in these areas to maximize revenue and cut costs. Much of this work relies on Python. Python is an interpreted language: an interpreter executes the source code directly, without a separate compilation step, which makes it quick to experiment with. Machine learning has been on the rise in the last few years, and this is one of the reasons why demand for Python has surged.
Applications in Web Development
According to data, two out of three developers who started out with PHP go on to choose Python, which is an achievement. The rising trend of the last couple of years suggests that Python is seen as the best alternative. It offers frameworks such as Flask and Django, which make web development easy and quick. For these reasons, leading tech companies such as Google, Facebook, and Instagram have been using it for a long time, and Uber and Google use it for their algorithms. In addition, Python is simple, which makes it easy to work with and adaptable.
Automation
Python applications used in software development include SCons for build control, and Roundup and Trac for bug tracking and project management. For integrated development environments (IDEs), a roster of options is available.
• Importantly, Python also has applications designed specifically for education.
• Its business applications include Tryton, a three-tier, high-level application platform. Another management package, Odoo, comes with a wide range of business applications. This makes Python an all-rounder.
• For network programming there is Twisted, a framework for asynchronous network programming. Python also has a simple socket interface.
• The gaming industry is growing quickly and generates substantial revenue. Python has been widely used for games: PyGame and PyKyra are two development frameworks for games, and the libraries include a variety of 3D rendering options.
• Finally, there are many other kinds of application that developers use widely: console-based applications, robotics, machine learning, web scraping, scripting, and more.
These are the main reasons why Python is a good fit for the industry from a developer's point of view. According to a myTectra report, the jobs posted on Naukri were monitored from 2014 to 2017, and the trend in Python jobs was compared with that of the world's number one language, with different results for each.
Things We Can Do in Python
In this part of the chapter, we discuss some of the things that you can do in Python, including comments, reading and writing, files, integers, strings, and variables. After reading this book, you should be able to write a program that runs correctly. Because Python is interactive and descriptive, a beginner can do a great deal with it, and the topics below are meant to help you get started. With Python, you can write useful code in a short time.
Comment
A comment in Python starts with the # sign and continues to the end of the line. A good example is:
    # This is a comment
    print("Hello, thanks for contacting us")
This instructs the computer to print "Hello, thanks for contacting us". The Python interpreter ignores all comments. As a programmer, however, you should not leave a comment after every line; put one in when you need to explain something. Python has no separate syntax for multi-line comments, and each comment line must start with its own # sign, so keep comments short and descriptive enough to fit on one line.
Reading and Writing
Some programs request specific information from the user, and others show text on the screen. Sometimes we start a program by telling its users what the program does. To make things easier for other coders, it is important to give the program a name or title that is simple and descriptive. To display text, you pass a string literal to the print function. A string literal is a line of text surrounded by quotes, which can be either single or double. The type of quote matters little, but the literal must end with the same type of quote it began with. This is all you need to make the computer display a word or phrase on the screen.
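As a small illustration of the points above, the sketch below prints string literals written with single and with double quotes. The program title and the messages are invented for this example.

```python
# A string literal is text surrounded by matching quotes.
title = "Temperature Converter"      # double quotes
message = 'Welcome to the program'   # single quotes work the same way

# print displays each string on the screen.
print(title)
print(message)

# Use one kind of quote on the outside when the other kind appears inside.
quoted = 'She said "hello" and left'
print(quoted)
```

To read a line typed by the user rather than display one, the built-in input function can be used in the same way, for example name = input("Your name: ").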
Files
Apart from displaying a string on the screen, the print function can also be used to write to a file. First, open myfile.txt and assign the resulting file object to a variable, myfile. When opening the file, pass "w" as the mode to tell Python that you intend to write to the file; note that this mode replaces any existing contents. You do not have to use print: file objects have their own methods, such as write for writing and read for reading. The read method is used on a file that has been opened for reading, and it returns the file's contents so that you can store them in a variable and work with them in your program.
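The following is a minimal sketch of writing to a file and reading it back. The file name myfile.txt and the variable name myfile come from the text; the temporary folder and the two lines of content are assumptions made so that the example can run anywhere without leaving files behind.

```python
import os
import tempfile

# The temporary folder is deleted when the outer with block ends.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")

    # Open the file for writing ("w") and assign the file object to myfile.
    # The with statement closes the file when its block ends.
    with open(path, "w") as myfile:
        myfile.write("First line\n")
        # print can also write to a file through its file argument.
        print("Second line", file=myfile)

    # Open the same file for reading ("r") and read its contents into a variable.
    with open(path, "r") as myfile:
        data = myfile.read()

print(data)
```

Using the with statement is a design choice rather than a requirement: it guarantees the file is closed even if an error occurs while writing.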
Integers
Integers are whole numbers. They can be negative or positive, as long as they have no decimal part. If a number has a decimal point, it is a floating-point number instead. Python displays both kinds of number on the screen when you print them. However, you cannot join a number directly to a string, because Python is a strongly typed language and will not combine the two types automatically. To put a number and a string together, convert the number to a string first.
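A short sketch of the difference between integers and floating-point numbers, and of converting a number to a string before joining it to text. The variable names and values are chosen only for illustration.

```python
whole = 7          # an integer (int): no decimal point
negative = -3      # integers can be negative
price = 7.5        # a floating-point number (float)

print(type(whole).__name__)   # int
print(type(price).__name__)   # float

# Joining a string and a number directly raises a TypeError,
# because Python will not silently mix the two types.
try:
    label = "Total: " + whole
except TypeError:
    label = "Total: " + str(whole)   # convert the number to a string first

print(label)
```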
Triple Quotes
Having looked at single and double quotes, it is now time to look at triple quotes. Triple quotes are used to define a string literal that spans several lines. You can use either three single quotes or three double quotes to define such a literal.
Strings
Although many beginners see strings as complicated, a string is simply the programmers' term for a sequence of characters, and it works much like a list. A string has more specialized functionality than a list. Building messages by hand can be awkward when parts of the message are not fixed in advance, and string formatting is the way out of that situation.
Escape Sequences
Escape sequences are used to denote special characters that are hard to type on the keyboard, or characters that are reserved and would otherwise cause confusion in the code.
Operator Precedence
Operator precedence determines the order in which the operations in an expression are carried out. Knowing it helps you keep track of what an expression does and get the right result, so take enough time to understand how operator precedence works.
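The four ideas above can be sketched in a few lines. The strings and numbers are invented for illustration, and the f-string shown is only one of several ways Python offers to format strings.

```python
# Triple quotes: a string literal that spans several lines.
poem = """Roses are red,
Violets are blue."""

# Strings behave like sequences: they can be indexed and measured.
first_letter = poem[0]
length = len("Python")

# String formatting: an f-string places values inside a message.
name = "Ada"
count = 3
greeting = f"Hello {name}, you have {count} new messages"

# Escape sequences: backslash-n is a newline, backslash-t is a tab,
# and backslash-quote puts a quote inside a quoted string.
table = "Name\tScore\nAda\t10"
quote = "She said \"hi\""

# Operator precedence: multiplication happens before addition
# unless parentheses say otherwise.
without_parentheses = 2 + 3 * 4
with_parentheses = (2 + 3) * 4

print(poem)
print(greeting)
print(table)
print(quote)
print(without_parentheses, with_parentheses)
```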
Variables
Variables are labels attached to places in the computer's memory that store something, such as a number or another value. In statically typed languages, each variable has a predetermined type. Python, however, lets you use one variable to store values of many different types. Variables are like the memory function on a calculator: they hold values that you can retrieve later. A stored value is only replaced when you assign a new value to the variable. To create a variable, you give it a name and a value. For instance, a programmer can name a variable count and give it the integer value one, which is written as count = 1. You can later assign a new value to the same name. If you try to read a variable that has not been defined, the interpreter cannot continue and reports an error (a NameError). Python also lets you define several variables on one line, although in our experience this is not good practice.
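A brief sketch of these points follows. The name count comes from the text; undefined_total, width, and height are hypothetical names used only for illustration.

```python
count = 1            # the name count now refers to the integer 1
count = "one"        # the same name can later refer to a string
print(count)

# Reading a name that was never assigned raises a NameError.
try:
    print(undefined_total)   # hypothetical name, deliberately not defined
except NameError as error:
    message = str(error)
    print("Error:", message)

# Several variables can be defined on one line, though separate lines
# are usually easier to read.
width, height = 10, 20
print(width, height)
```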
The Scope of a Variable
Not every variable can be accessed from everywhere in a Python program, and variables do not all exist for the same length of time. Where and how we define a variable determines where it can be accessed and for how long. The part of the program from which a variable can be accessed is called its scope, and the period during which the variable exists is called its lifetime. Global variables are variables defined in the main body of a file. They are visible throughout the file, and also in any file that imports it. As a result, a change to a global variable can have far-reaching effects that are hard to notice when working on your program. This is why it is not good practice to rely on global variables in a Python program. We advise programmers to add names to the global namespace only if they plan to use them globally. A local variable is a variable defined inside a function. It can be accessed only within the region where it is defined, and it exists only while that part of the program is running.
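The difference between global and local scope can be seen in a small sketch. The names greeting, shout, and suffix are invented for this example.

```python
greeting = "hello"          # global: defined in the main body of the file

def shout():
    # local: exists only while the function is running
    suffix = "!"
    # A function can read a global variable.
    return greeting.upper() + suffix

result = shout()
print(result)

# The local name suffix is not visible outside the function.
try:
    print(suffix)
except NameError:
    outside_access = False
else:
    outside_access = True

print(outside_access)
```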
Modifying Values
Many programming languages let you define a variable whose value cannot be changed once it has been set. Values that cannot be modified are called constants. Although Python does not enforce this kind of restriction, there is a convention for marking variables whose values nobody should change: write the name in capital letters, with words separated by underscores. A good example is shown below:
    NUMBER_OF_HOURS_IN_A_DAY = 24
Python does not check that the value is correct. Since it does not track constants and has no rule about what value you assign, you are free to say, for example, that there are 25 hours in a day. It is still important to put in the correct value, because other coders may rely on it. Naming a value in this way is also useful because it lets a programmer change it in one place in the future, for example when a maximum needs to be raised. Understanding how values work in a program therefore contributes a lot to its success. You have to learn where to store values, the rules governing each value, and how to make them work well in a specific area.
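The sketch below shows that the capital-letter convention is only a signal to other programmers. The constant name comes from the text; the function hours_in is a hypothetical helper added for illustration.

```python
# Capital letters signal "do not change this value" to other programmers.
NUMBER_OF_HOURS_IN_A_DAY = 24

def hours_in(days):
    """Return the number of hours in the given number of days."""
    return days * NUMBER_OF_HOURS_IN_A_DAY

print(hours_in(2))

# Python does not stop a reassignment; only the naming convention warns against it.
NUMBER_OF_HOURS_IN_A_DAY = 25
print(hours_in(2))
```

The second call returns a different result from the first, which is exactly why changing a value that other code treats as constant is best avoided.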
The Assignment Operator
The assignment operator is the equals sign (=). You use it to assign the value on the right-hand side of the statement to the variable on the left-hand side. If the right-hand side is an arithmetic expression, it is evaluated first and the result is assigned. Note that in programming the assignment operator is not the mathematical equals sign: it does not state that two things are equal. Instead, it says that the name on the left should now refer to whatever value the right-hand side produces.
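A minimal sketch of assignment, with variable names chosen only for illustration. The line count = count + 1 would make no sense as a mathematical equation, but it is a perfectly ordinary assignment.

```python
count = 1              # assign the value 1 to the name count
count = count + 1      # the right side (1 + 1) is evaluated first, then stored
total = 3 * 4 + count  # an arithmetic expression on the right side

# == compares two values; = assigns. They are different operators.
is_equal = (total == 14)

print(count, total, is_equal)
```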
Chapter 11. Getting Started; Python Tips and Tricks
We have spent much of this guidebook looking at different parts of the Python language. These are meant to get us going with coding in Python so that we can write our own code in no time. With this information in mind, we can work on the final skills we need before we are done. We are going to look at some tips and tricks that will help you get started with Python, along with web scraping and debugging our programs. Let's get started and see how good our code can be.
Web Scraping
Imagine that we need to pull a large amount of data from many websites, and that we want to do it quickly. How can we do this without visiting each website and gathering the data by hand? This is where web scraping comes in. Companies use web scraping to collect large amounts of information from websites. Why would someone want to collect so much data from websites in the first place? There are many reasons, including the following:
• Price comparison: services such as ParseHub use web scraping to collect data from online shopping sites and then compare the prices of similar products.
• Email address gathering: web scraping can support marketing by collecting customers' email IDs so that bulk emails can be sent to them.
• Social media scraping: web scraping is used to collect data from social media sites and work out what is trending.
• Research and development: web scraping helps a company collect a large amount of data from websites, which can then be analyzed and used in surveys and in research and development.
• Job listings: details of job openings, interviews, and more can be collected from a variety of websites and listed in one place so that they are easier for the user to access.
Web scraping is an automated method for obtaining a large amount of data from websites. The data on these websites is usually unstructured; web scraping lets a company collect it and store it in a structured form. There are several ways to scrape a website, including online services, APIs, and writing your own code.
Whether scraping is legal depends on what the website says. Some websites allow it and some do not. Check each website's position before you scrape it; if scraping is permitted, you can go ahead with your tools and gather the information you need.
Since this book is about Python, we will look at how Python can help with web scraping. That raises the question of why we would choose Python for the job rather than another language. The features that make Python suitable for web scraping include:
• Ease of use: Python code is simple, so the code you write for web scraping is less messy and easier to work with.
• A large collection of libraries: many libraries for data science and web scraping are available for Python, including Pandas, Matplotlib, and NumPy. This makes Python suitable both for web scraping and for further manipulation of the extracted data.
• Dynamic typing: you do not need to declare the types of your variables; you simply use the variables where you need them. This saves time when writing code.
• Easy-to-understand syntax: Python statements read much like English. The code is expressive and easy to read, and the indentation makes it easier to tell the different parts of the code apart.
• Small code for large tasks: web scraping is meant to save time, and in Python a small amount of code can accomplish a large task. This saves time both in finding the important data on a website and in writing the code itself.
• Community: as a beginner you will find parts of the code that are hard to work with and do not go as smoothly as you had hoped. This is where Python's healthy community helps: if you get stuck, the community can answer your questions and get you moving again.
Now that we know the benefits of Python, especially the ones that help with web scraping, it is time to look at how web scraping works. When you run web scraping code, a request is sent to the URL. The server responds by sending the data, which lets the code read the page, whether it is HTML or XML. The code then parses the HTML or XML page, finds the data, and extracts it. Where the extracted data ends up depends on what you told the code to do. Often it is stored in a database so that you can search through it later and learn more from it.
The steps for extracting data with web scraping in Python are:
1. Find the URL that you want to scrape.
2. Inspect the page.
3. Find the data on the page that you want to extract.
4. Write the Python code to extract it.
5. Run the code and extract the data.
6. Store the data in the format that is most helpful to you.
Python is used for many different types of applications, and it has a large number of libraries for different purposes. The libraries that work best for web scraping include:
1. Selenium: a web testing library, used to automate browser activities.
2. BeautifulSoup: a Python package for parsing HTML and XML documents. It creates parse trees that make it easy to extract data.
3. Pandas: one of the best libraries for data analysis and data science. Pandas is used for data analysis and data manipulation. In web scraping, it is used to extract the data and store it in the format you want.
Companies often need to gather data from other websites and from many other sources. This is one of the first steps in data analysis, and the information is used to improve a business by learning about its customers, its industry, and its competition. Gathering all of that data manually takes too long and is hard to do. With the large amounts of data generated every day, it is no wonder that so many companies use web scraping to handle the work in a timely manner. When we write the necessary web scraping code in Python, we can get through the information quickly and store it in the right place for our needs, without doing all of the work manually. This makes data analysis much easier overall. With the right Python code, data scraping can be done in no time.
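The parse, extract, and store steps described in this section can be sketched with Python's standard library alone. This is a simplified illustration under several assumptions: the page is a made-up HTML string rather than a real response from a server, the class names product, name, and price are invented, and the built-in html.parser module stands in for BeautifulSoup so that the example runs without installing anything.

```python
import csv
import io
from html.parser import HTMLParser

# Sample page held in a string so the example runs without a network
# connection. In a real scraper this text would be the server's response.
SAMPLE_HTML = """
<html><body>
  <div class="product"><span class="name">Laptop</span><span class="price">899</span></div>
  <div class="product"><span class="name">Phone</span><span class="price">499</span></div>
</body></html>
"""

class ProductParser(HTMLParser):
    """Collects the text of the name and price spans of each product."""

    def __init__(self):
        super().__init__()
        self.current_field = None   # field whose text we are about to read
        self.rows = []              # one dictionary per product

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "product":
            self.rows.append({})
        elif tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        if self.current_field and self.rows:
            self.rows[-1][self.current_field] = data.strip()
            self.current_field = None

# Parse the page and extract the data.
parser = ProductParser()
parser.feed(SAMPLE_HTML)

# Store the data in a structured format (CSV text held in memory).
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(parser.rows)
print(output.getvalue())
```

In a real project the HTML would come from a request to the chosen URL, made only after confirming that the website permits scraping, and the rows would more likely be written to a file or a database than to an in-memory string.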
Chapter 12. Common Programming Challenges
The excitement about programming can fizzle out fast and turn into a nightmare. There are unexpected challenges that might make life difficult for you, especially as a beginner programmer. However, these challenges should not set you back or kill your resolve. They are common challenges that a lot of people have experienced before, and they overcame them, as you will too. If you want to succeed in programming, you should be aware of the fact that mistakes do happen, and you will probably make many of them. The downside of mistakes is that you can feel you are not good enough. Everyone else seems to be doing fine, but not you. On the flip side, mistakes are an opportunity for you to learn and advance. No one was born as good as they are today. What we are is the sum of mistakes and learning from those mistakes and experiences. Feel free to reach out to mentors whenever you feel stuck. Deadlines and bug reports might overwhelm you, but once you get the hang of it, you will do great. The following are some common challenges that you might experience as a beginner programmer.
Debugging
You feel content with a project, satisfied that it will run without a hitch and perform the desired duties. However, when you arrive at your desk in the morning, your quality assurance team has other ideas. They point out what seem like endless issues with the project. Perhaps the OK button is not responsive, the error messages are not displaying correctly, and so forth. All these issues eventually harm the user experience. You must get back to the drawing board and figure out where the problem lies. Debugging will be part of your life as a programmer. It is not enjoyable, but it is reality. Debugging is one of the most exhausting things you have to do. If you are lucky, you will encounter bugs that can be fixed easily; otherwise, debugging costs you hours and lots of coffee. However, you should not feel downtrodden. Bugs are all over the place in programming. Even the best code you will ever come across needs debugging at some point.
Solution
How do you handle the debugging process and make your life easier? The first step is to document your work. Documentation might seem like a lot of work, but it helps you retrace your steps in the event of an error. That way, you can easily trace the source and fix it, saving you from inspecting hundreds or thousands of lines of code.
Another way of making light work of debugging is to recreate the problem. You must understand what the problem is before you try to solve it. If you recreate the problem, you isolate it from the rest of the code and get a better perspective on it.
Talk to someone. You might not always have all the answers. Do not be afraid to approach anyone, especially if you work in a team. Beginner programmers often feel some people are out of reach, perhaps because of the positions they hold. However, if you do not ask for help, you will never really know whether the person will be helpful or not. The best person to ask for help, for example, is the quality tester who identified the problem, especially if you are unable to recreate the problem.
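As an illustration of recreating a problem in isolation, the sketch below reproduces a crash with the smallest input that triggers it and then checks a fix against the same input. Both functions are hypothetical stand-ins for code from a larger project; they are not taken from any real program.

```python
def buggy_average(values):
    """The original version, kept here only to reproduce the report."""
    return sum(values) / len(values)

def average(values):
    """The same function with the bug fixed."""
    if not values:              # the fix: handle the empty list explicitly
        return 0.0
    return sum(values) / len(values)

# Step 1: recreate the problem with the smallest input that triggers it.
try:
    buggy_average([])
    reproduced = False
except ZeroDivisionError:
    reproduced = True

# Step 2: check that the fixed version handles the same input,
# and that normal input still works.
fixed_empty = average([])
fixed_normal = average([2, 4, 6])

print("bug reproduced:", reproduced)
print(fixed_empty, fixed_normal)
```

Keeping the reproduction as a small, repeatable check means the same bug can be caught quickly if it ever returns.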
Working Smart
As a programmer, one thing you must be aware of is that you will be sitting down for hours on end working on some code. This becomes your normal routine. You should, however, be aware of the risks this poses to your health: neck sprains, numb legs, back pain, and pain in your palms and fingers from typing all day. As a beginner, you might not be ready for the challenge yet. However, you must still dig in daily to meet your deliverables.
Solution
The first thing you must consider is regular exercise. If you work a desk job, it is possible to lose motivation and feel exhausted even before your workday is over. You can tackle this by keeping a workout routine. Jog before you go to work every morning, take a brisk half-hour walk, and so forth. There are many simple routines that you can start, which will help you handle the situation better. While at work, take some time off and walk around, without looking like you are wasting time. This helps to relieve your body of the pain and pressure, and more importantly, allows for proper blood circulation. Other than that, you do not have to keep typing while seated. Stand up from time to time. Some companies have invested in height-adjustable desks, which help with this.
User Experience
One of the most common challenges you will experience as a programmer is managing user experiences. You will come across a lot of clients in the course of your programming career. However, not all clients know how to communicate their needs. As a result, you will be involved in a lot of back and forth on project details and deliverables. Most users have a good idea of what they need the project you are developing to do. However, this is not always the same as what your development team believes. Given that most beginner programmers never interact directly with the clients, it might be difficult for you to understand them, especially in a team project.
Solution
The best way around this is to figure out the best features of the project. Your client already knows what they want the project to do. Ask the right questions, especially of your team members who are in direct contact with the client or the end-user. The best responses will often come from designers and user experience experts. Their insight comes from interacting with users most of the time.
Another option is to test the product you are designing. You have probably used test versions of some products in the past. Most major players in the tech industry release beta versions of their products before the final release. This way, users try the product out and share their views, ideas, and the challenges they encounter. This information is collected and used to refine the beta product before the final one is released. Testing your product allows you to identify and fix bugs before you release the product to the end-user. It also allows you to interact with the user and gauge the level of acceptance for your project.
Estimates
A lot of beginner programmers struggle with scheduling. Perhaps you gave an estimate for a task and you are unable to meet it. You are now a professional. Never delude yourself that you are not, perhaps because you are a beginner. This industry focuses on deadlines a lot. In software development, estimates are crucial. They are often used to plan bigger schedules for projects and, in some cases, to agree on project quotes. Delays lead to problems that might, in the long run, affect trust between the parties involved.
Solution
The first step towards getting your estimates right is to apportion time properly. Time management is key. Set out a schedule within which you can complete a given task. Within that schedule, allow yourself ample buffer time for any inconvenience, but not too much time. For example, allow yourself 30-40 minutes for an assignment that should take 20 minutes.
Another way of tackling your scheduling challenges is to break down assignments into micro milestones. A series of small tasks is easier to manage. Besides, when you complete these micro assignments, you are more psyched about getting onto the next one, and so on. You end up with a lighter workload, which is also a good way to prevent burnout.
Constant Updates
The tech industry keeps expanding in leaps and bounds. You can barely go a month before you learn about some groundbreaking work. Everything keeps upgrading or updating to better, more efficient versions. Libraries, tools, and frameworks are not left behind, either. Updates are awesome. Most updates improve user experiences and bolster platform security. However, updates come with undue pressure, even for the most experienced programmers out there.
Solution
Stay abreast of the latest developments in your field of expertise. You cannot know everything, but catching up on trends from time to time will help you learn some of the new tools and tips available, which can also help you improve your skills and develop cutting-edge products.
Another option is to learn. The beauty of the world of IT is that things are always changing. It is one of the most dynamic industries today. Carve out half an hour daily to learn something new. You will be intrigued by how much you will have mastered after a few weeks. In your spare time, challenge yourself to build something simple, solve a problem, and so forth. There are lots of challenge websites available today where you can have a go at real-world problems.
Problems Communicating
Beginner programmers face communication challenges all the time. You are new to the workplace, so you do not really know anyone. Most of the team members and managers are alien to you, and as a result, you often feel out of place. At some point in time, every programmer goes through this. You feel like a baby among giants. Eventually, the pressure gets to you, and you make a grave mistake, which could have been avoided if you had reached out to someone for assistance.
Solution
Dealing with communication problems is more than just a social interaction concern. First, you must learn to be proactive. If something bugs you, ask for help. The worst that can happen is people might laugh, especially if it is a rookie question, but someone will go out of their way and help you. If they don't and something goes awry, the department shoulders the blame for their ignorance. Before you know it, people will keep checking in on you to make sure you are getting it right, and you might also make some good friends in the process.
Consistency is another way to handle the communication challenge. As a beginner, you might not always get everything right. These are moments you can learn from. With practice, you grow bolder and learn to express yourself better over time.
Security Concerns Data is the new gold. This is the reality of the world right now. Data is precious and is one of the reasons why tech giants are facing lawsuits all over the place. Huawei recently found itself in a spat with the US government that ended up in a host of severed ties. There are so many reasons behind the US government's hard stance against Huawei, and most of them circle back to data. People are willing to pay a great deal of money to access specific data that can benefit them in one way or the other. Some companies play the short-term game; others are in it for the long-term. Competitors also use nefarious ways to gain access to their competitors’ databases and see what they are working on and how they do it. As a programmer, one thing your clients expect from you is that their data is safe, and the data their clients share with them through your project. Beginner programmers are fairly aware of all the security risks involved. This should not worry you so much, especially if you are part of a team of able developers. They will always have contingency measures in place. However, you must not be ignorant of security loopholes, especially in your code. Solution Hackers are always trying to gain access to some code. You cannot stop them from trying. However, you can make it difficult for them to penetrate your code.
Give them a challenge. The single biggest threat to any secure platform is human interaction. At times, your code is compromised not by an outsider but by someone you know. In most cases, they do so without realizing it, unless they did it intentionally. Make sure your workstation is safe. Every time you step away from your workstation, ensure your screen is locked, and if you are going away for a long time, shut down your devices. It is also advisable, whatever programming language you use, to rely on parameterized queries when talking to a database, because they guard against SQL injection. This is important because SQL injection is one of the most common techniques hackers use to gain access and steal information.
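To make the idea of a parameterized query concrete, here is a minimal sketch using the sqlite3 module from Python's standard library. The users table, its rows, and the find_user helper are hypothetical and exist only for this illustration. Other database drivers follow the same idea but may use a different placeholder style.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))
conn.execute("INSERT INTO users VALUES (?, ?)", ("bob", "bob@example.com"))


def find_user(connection, name):
    """Look up users by name with a parameterized query."""
    # The ? placeholder makes the driver treat `name` strictly as data,
    # so it is never interpreted as part of the SQL statement.
    cursor = connection.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()


# A classic injection attempt: glued into the SQL text by string
# concatenation, this input could match every row. Passed through a
# placeholder, it is just an odd-looking name that matches nothing.
malicious_input = "alice' OR '1'='1"

print(find_user(conn, "alice"))          # [('alice', 'alice@example.com')]
print(find_user(conn, malicious_input))  # []
```

The key design choice is that the SQL text and the user-supplied value travel separately, so the value can never change the structure of the query.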
Relying on Foreign Code You have written some code for a few years and believe in your ability. You are confident you are good enough, hence being hired by the company. However, make peace with the fact that you will have to work on projects that were written by someone else. Working with another person’s code is not always an easy thing, especially if their code seems outdated. There is a reason why the company insists on using that particular code. The worst possible situation would be company politics–they occur everywhere. Someone wrote some code that the entire company relies on, but you cannot change or question it because the original coder connects with the company hierarchy. Often this raises a problem where you are unable to figure out the code. Solution Since there is not much you can do about the code, why not try to learn it? If you can, talk to the developer who wrote it and understand their reasoning behind it. This way, it is easier for you to embrace their style, and you will also have a smooth time handling your projects. You never know; you might just show them something new and help them rethink their code. Another option is to embrace this code. It is not yours, but it is what you have and will be using for a very long time. Change your attitude about that code. Take responsibility for the code and work with it.
This way, your hesitation will slowly fade away.
Lack of Planning While you have a burning desire to impress in your new place of work, you must have a plan. Many beginner programmers do not. Many programmers jump into writing code before stopping in their tracks to determine the direction they want to steer the code. The problem with this approach is that you will fail to make sense. The code might sound right in your head, but on paper, nothing works. Solution Conceptualize an idea. Everything starts with an idea. Say you want to write a program that allows users to share important calendar dates and milestones with their loved ones. Focusing on this idea helps you remember why you are writing that code. Once you have an idea, how do you connect it with real problems? What are the problems you are trying to solve? How are they connected to your idea? This also begs the question—why do people need your program? Planning will help you save time when writing a program, and at the same time, help you stay on track.
Finally In programming, everyone starts somewhere. Being the new person in the company should not scare you. Communicate with your peers and seniors, be willing to learn from them, and all the things that might seem overwhelming will somehow become easier as time goes by.
Conclusion Programming is not easy. In fact, it’s rather difficult, and there are topics that are sadly too esoteric to cover in this book. For example, we didn’t get to the bulk of file operations, nor did we get to things like object-oriented programming. But I hope what I’ve given you is a very solid foundational understanding of Python so that you can get ready to learn about these things. Python—It is a language named after Monty Python. It is a programming language that has taken the world by storm. The applications we have seen so far, the examples we have discovered, and the future prospects of the language, when combined, point out one thing for sure. If you are a programmer, Python is your ticket to the future. When learning a new language, there will always be challenges. There will be times where you might even be frustrated and call it a day. The thing to remember here is this: many others have gone through this road just like you. Some have gone on to become successful, while others have remained within the shadow of someone else. It is up to you to grab the opportunity and become a unique and different programmer, and learning Python is just a part of the journey. Through Python, you will be able to do so much more than just design 2D snake games. Python has paved the way for many success stories and has certainly become the most popular language. Now you know why!
It is time for you to add Python to your resume and deliver results in the most effective and efficient manner possible. Good luck, and have a great programming journey ahead! I am not going to hold your hand. But what I can say is that you have worked hard to be a programmer, and you have worked hard throughout the course of this book, likewise. Not all of the concepts within this book are easy to understand, even with more in-depth explanations. My goal here wasn’t explicitly to teach you Python or object-oriented programming or any of that: my goal was to teach you the computer. The way it thinks, and the way programs are written. Anybody can learn Python keywords. But to learn to program and to write effective solid code regardless of which programming language that you’re using, that’s another skill entirely. Now that you have finished this book, you should be able to do a lot of programs for different situations. The next step is to keep practicing a lot in order to become a master of Python. While programming, sometimes you might think that some things are impossible to code or that you are not good enough to do them. But that is not right; you just have to think a lot in order to make it happen. Also, while programming, you may find that your code or program is not working, do not worry. Even the smartest people write codes that do not work at the beginning. You just have to keep trying. As you know, nowadays technology is everywhere, and so programming. Our recommendation is that you should try to code and solve problems of your daily activities in order to broaden your vision of the world since all
electronics have hundreds and hundreds of lines of codes on them. I sincerely hope that this book has helped you to get on the road to accomplishing everything that you want to accomplish in the world of programming. Remember going forward that without a doubt, it will not always be easy. It won’t be even remotely easy. Programming is made difficult by its very nature. Humans, we just don’t think like computers. But hopefully, I’ve helped you to understand programming, at least.
DATA SCIENCE WITH PYTHON
Introduction
Python is a multi-paradigm programming language and can be thought of as the Swiss army knife of coding. It supports object-oriented programming (OOP), functional programming patterns, and structured programming, among several other things. There is a popular saying in the Python community: Python is normally the second-best language for all purposes. This is not a knock on the language. Organizations that pick a best-of-breed solution for every task soon find themselves burdened with codebases that are unmaintainable and incompatible, whereas Python is capable of handling all the jobs, from data mining to website building to running embedded systems. It is an all-in-one programming language. For instance, in the case of ForecastWatch, Python was used to write a parser for harvesting forecasts from other sites. It is also used for the aggregation engine that compiles the data and for the website code that displays the results. The website was originally built in PHP, until the organization realized that it was a lot easier to deal with a single language for everything. Facebook also selected Python for data analysis because it was already being used a great deal in other parts of the organization. The name Python is derived from the British comedy troupe Monty Python. The creator of the Python programming language, Guido van Rossum, chose this name to suggest that its use would be fun. You will find many obscure Monty Python sketches referenced in Python code samples and documentation. For these reasons, it is a cherished programming language among programmers.
Data scientists with scientific or engineering backgrounds might feel out of place, like a barber armed with an ax, when they use the language for data analysis for the first time. However, the inherent simplicity and readability of Python make it comparatively easy to pick up, and the quantity of dedicated analytical libraries available nowadays means that data scientists in all sectors can find packages tailored to their needs, easily available for download on the net. Given the general-purpose nature of Python and its extensibility, it was inevitable that, as the popularity of the language soared, it would find its way into data science. As a matter of fact, Python is a jack-of-all-trades language, and on its own it isn’t particularly well-suited for statistical analysis. However, several companies have invested in Python, realizing the advantages of using a standardized language and extending it for those purposes.
Effectiveness of Libraries for Python
As with other programming languages, the main reason for the success of Python is its libraries. There are around 72,000 of them available on PyPI (the Python Package Index), and the number is constantly rising. Python is specifically designed to have a stripped-down, lightweight core, while its standard library provides tools that are useful across a wide range of programming tasks. Python comes with a “batteries included” philosophy that lets its users get down to solving problems quickly without having to sift through many competing function libraries.
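As a small illustration of the “batteries included” idea, the sketch below summarizes a handful of numbers using only modules that ship with Python. The temperature readings are hypothetical values invented for this example.

```python
import json
import statistics
from collections import Counter

# Hypothetical readings, invented for this illustration.
temperatures = [21.5, 22.0, 19.5, 22.0, 23.0]

summary = {
    "mean": statistics.mean(temperatures),
    "median": statistics.median(temperatures),
    # Counter tallies how often each value occurs.
    "most_common": Counter(temperatures).most_common(1)[0][0],
}

# json is part of the standard library as well.
print(json.dumps(summary, indent=2))
```

Nothing here had to be installed from PyPI, which is the point of the philosophy: common tasks are covered before you reach for third-party packages.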
There Is Always Someone Available to Help in the Python Community There are many great things about Python, and one of them is the broad and diverse base of millions of Python users all across the globe who are ready to offer suggestions and advice when you’re stuck on something. There is a very good chance that someone else was stuck on the same problem before you. These open source communities are extremely popular due to their open discussion attitude. However, some of them are pretty fierce about not allowing newcomers to mix easily. Python is, happily, an exception. These Python experts are happy to aid you, both online and during the local meet-ups. Chances are, you will stumble into several intricacies of learning a new programming language. As Python plays such a vital role in the data science community, you can find several resources that are specific to the use of Python in the data sciences. These meet-up groups of data scientists who use Python are prevalent all across the US, especially in places such as Los Angeles and Seattle. In case you’re having trouble locating a meet-up group near you that has the right qualifications, there is a data science hack that uses Python for searching these meet-up groups to find the perfect match.
Chapter 1: What Is Data Science?
The first thing that we need to take some time looking over in this guidebook is the basics of data science. Data science, to keep things simple, is the detailed study of the flow of information from the huge amounts of data that a company has gathered and stored. It involves obtaining meaningful insights out of raw and usually unstructured data, which is processed through analytical, programming, and business skills. Many companies spend a lot of time collecting data and trying to use it to learn more about their customers, figure out how to release the best product, and learn how to gain a competitive edge over others. While these are all great goals, just gathering the data is not going to be enough to make it happen. Instead, we need to take that data, which is usually pretty messy and needs some work, and analyze it so that we are better able to handle all that comes with it.
The Importance of Data Science
In a world that is moving more and more into the digital space, organizations deal with unheard-of amounts of data, both structured and unstructured, on a daily basis. Evolving technologies enable cost savings and smarter storage to help us store this critical data. Currently, no matter what kind of industry we are looking at or what kind of work the company does, there is already a huge need for skilled and knowledgeable data scientists. They are some of the highest-paid IT professionals right now, mainly because they provide such good value for the companies they work for, and because there is such a shortage of these professionals. The gap between the demand for data scientists and the current supply is about 50 percent, and it is likely to continue growing as more people and companies start to see what value data science can have for them.

So, why is data becoming so important to these businesses? In reality, data has always been important, but today, because of the growth of the internet and other sources, there is an unprecedented amount of data to work through. In the past, companies were able to go through their data manually, and maybe use a few business intelligence tools to learn more about the customer and to make smart decisions. But this is nearly impossible for any company to do now, thanks to the large amount of data they have to deal with on a regular basis. In the last few years, there has been huge growth in something known as the "Internet of Things" (IoT), and about 90 percent of the data in the world today is said to have been generated in that time. This sounds even more impressive when we find out that each day, 2.5 quintillion bytes of data are generated, and the pace is accelerating with the growth of the IoT.

This data comes to us from a lot of different sources, and where you decide to gather it depends on your goals and what you are hoping to accomplish in the process. Some of the places where we are able to gather this kind of data include:
- Sensors used in malls and other shopping locations to gather more information about the people who shop there.
- Posts placed on various social media sites.
- Digital videos and pictures captured on our phones.
- Purchase transactions made through e-commerce.

These are just a few places where we are able to gather some of the data that we need and put it to use with data science. As the IoT grows and more data is created on a daily basis, it is likely that we are going to find even more sources that will help us take on our biggest business problems. This leads us to need data science more than ever. All of the data that we gather from the sources above, and more, is known as big data. Currently, most companies are flooded and a bit overwhelmed by all of the data that is coming their way. This is why it is so important for these companies to have a good idea of what to do with the exploding amount of data and how they can use it to get ahead. It is not enough to just gather the data. Gathering may seem like a great idea, but if you collect data without learning what is inside of it, you are setting yourself up for trouble. Once you learn what information is inside of that data, and what it all means, you will find that it is much easier to use that information to give yourself the competitive advantage that you are looking for. Data science helps us get all of this done. It is designed to make it easier for us to take in the big picture and use data for our needs.
It encompasses all of the parts of the process of getting the data to work for us, from gathering the data, to cleaning it up and organizing it, to analyzing it, to creating visuals that help us better understand it, and even to how we decide to use that data. All of this comes together and helps us to really see what is inside of the data, and it is all a part of the data science process. Data science works because it brings together many different skills, like statistics, mathematics, and business domain knowledge, and it can help a company in many ways. Some of the things that data science can do for a company when it is used properly include:
- Reduce costs.
- Get the company into a new market.
- Tap into a new demographic to increase their reach.
- Gauge the effectiveness of a marketing campaign.
- Launch a new service or a new product.

And this is just the start of the list. If you are willing to work with data science and learn the different steps that come with it, you will find that it can help your business in many different ways, and it can be one of the best options for you to use in order to get ahead in your industry.
How Is Data Science Used? One of the best ways to learn more about data science and how it works is to take a look at how some of the top players in the industry are already using data science. There are a ton of big-name companies who are already relying on data science to help them reach their customers better, keep waste and costs down, and so much more. For example, some of the names that we are going to take a look at here include Google, Amazon, and Visa. As you will see with all of these, one of the biggest deciding factors for an organization is what value they think is the most important to extract from their data using analytics, and how they would like to present that information as well. Let’s take a look at how each of these companies has been able to use data science for their needs to see some results. First on the list is Google. This is one of the biggest companies right now that is on a hiring spree for trained data scientists. Google has been driven by data science in a lot of the work that they do, and they also rely on machine learning and artificial intelligence in order to reach their customers and to ensure that they are providing some of the best products possible to customers as well. Data science and some good analysis have been able to help them get all of this done effectively. Next on the list is the company Amazon. This is a huge company known around the world, one that many of us use on a daily basis. It is a cloud computing and e-commerce site that relies heavily on data scientists to help them release new products, keep customer information safe, and even to do things like providing recommendations on what to purchase next on the site.
They will use the data scientist to help them find out more about the mindset of the customer and to enhance the geographical reach of their cloud domain and their e-commerce, just to name a few of their business goals right now. And then, we need to take a look at the Visa company and what they are doing with the help of data science. As an online financial gateway for countless other companies, Visa ends up completing transactions that are worth hundreds of millions in one day, much more than what other companies can even dream about. Due to the large number of transactions that are going on, Visa needs data scientists to help them increase their revenue, check if there are any fraudulent transactions, and even to customize some products and services based on the requirements of the customer.
The Lifecycle of Data Science
We are going to go into more detail about the lifecycle of data science as we progress through this guidebook, but first, we can take a moment to see how we are able to use it for our own needs. Data science follows our data from the gathering stage all the way through until we use that data to make our big business decisions. There are a number of steps that show up along the way, and being prepared to handle all of them, and all that they entail, is the challenge that comes when we want to rely on data science. Some of the basic steps found in the data science lifecycle include:
1. Figuring out what business question we would like to answer with this process.
2. Collecting raw data for use.
3. Cleaning and organizing all unstructured data to be used.
4. Preprocessing our data.
5. Creating a model with the help of machine learning, and taking some time to train and test it to ensure accurate results along the way.
6. Running our data through the model to help us understand what insights and predictions are inside.
7. Using visuals to help us better understand the complex relationships found in the data that we are using for this analysis.

While the steps may sound easy enough to work with, there are going to be some complexities and a lot of back and forth that we have to work through here. The most important thing is to go into it without any preconceived notions of what you would like to see happening, and not to push your own agenda on the data. This is the best way to ensure that you will actually learn what is inside of that data, and it makes it easier to choose the right decisions for your needs as well.
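To give a feel for how the middle steps of the lifecycle fit together, here is a deliberately tiny sketch in plain Python. Everything in it is an assumption made for illustration: the advertising-spend and sales figures are invented, the clean and fit_line helpers are hypothetical, and a simple least-squares line stands in for a real machine learning model.

```python
import random

# Step 2 (collect): hypothetical raw (advertising spend, sales) records.
# Some rows are deliberately unusable, as raw data often is.
raw_records = [
    ("10", "25"), ("20", "45"), ("", "30"), ("30", "65"),
    ("40", "85"), ("50", "n/a"), ("60", "125"), ("70", "145"),
]


def clean(records):
    """Steps 3-4: drop rows that cannot be parsed and convert to floats."""
    cleaned = []
    for spend, sales in records:
        try:
            cleaned.append((float(spend), float(sales)))
        except ValueError:
            continue  # skip rows with missing or non-numeric fields
    return cleaned


def fit_line(points):
    """Step 5: fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


data = clean(raw_records)
random.Random(0).shuffle(data)      # fixed seed so the split is repeatable
train, test = data[:4], data[4:]    # hold some rows back for testing
slope, intercept = fit_line(train)

# Step 6: run held-out data through the model and compare with reality.
for x, y in test:
    predicted = slope * x + intercept
    print(f"spend={x:.0f} actual={y:.0f} predicted={predicted:.1f}")
```

Real projects replace each of these pieces with far more capable tools, but the shape of the work (clean, split, fit, check) stays recognizable.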
The Components of Data Science
Now, we also need to take some time to look at the basics of data science. There are a few key components that come into play when we talk about data science, and having these in place makes a big difference in how well we are able to handle the different parts of our own projects. Some of the key components of data science include:

The Various Types of Data: The foundation of any data science project is the raw set of data. There are a lot of different types. We can work with structured data, which is mostly found in tabular form, and unstructured data, which includes PDF files, emails, videos, and images.

Programming: Data management and data analysis are done with computer programming, so you will need some kind of programming language to get the work done. Python and R are the two most popular options, and they are the ones we will focus on here.

Statistics and Probability: Data is manipulated in different ways in order to extract good information out of it. The mathematical foundation of data science is probability and statistics. Without a good knowledge of probability and statistics, there is a higher possibility of misinterpreting the data and reaching conclusions that are not correct. This is a big reason why probability and statistics are so important in data science.

Machine Learning: As someone who is working with data science, you are going to spend at least a little time working with machine learning algorithms on a daily basis. This can include methods of classification and regression. It is important for a data scientist to know machine learning to complete their job, since this is the tool that is needed to help predict valuable insights from the data that is available.

Big Data: In our current world, raw data is what we use to train and test our models and then figure out the best insights and predictions from that data. Working with big data helps us figure out what important, although hidden, information is found in our raw data. There are a lot of different tools that we can use not only to find big data but also to process it.

There are many companies that are learning the value of data science and all that comes with it. They like the idea that they can take all of the data they have been collecting for a long period of time and put it to use to increase their business and give them the competitive edge they have been looking for. In the rest of this guidebook, we are going to spend some time focusing on how to work with data science and all of the different parts that come with it.
Chapter 2: Basics of Python
Python IDEs
An Integrated Development Environment (IDE) is a tool that provides facilities like build automation, testing, code linting, and debugging for different programming languages. Python IDEs are well suited for developing machine learning and deep analytics models. Here are some of the best IDEs for Python programming:

Sublime Text
Sublime Text is a code editor that offers high customizability and is well suited to beginners. Along with other popular programming languages, Sublime Text supports Python execution and comes with built-in support for the language. The editor can be downloaded free of cost and can be extended into a full-fledged Python development environment. Sublime Text packages are themselves written in Python, which provides a wide range of extensions and packages to support complex programming.

Atom
Atom is an open-source editor designed and developed by GitHub. Users can download and install the editor and then add development packages such as linter-flake8 and python-debugger. Because it is highly customizable, users can install packages and set up the environment to meet their development requirements.

Eclipse
Eclipse is an all-round integrated development environment that is available for Windows, Linux, and OS X. The tool has a rich marketplace of add-ons and extensions, which makes it suitable for machine learning and Python development. Furthermore, the PyDev extension allows developers to debug Python code and use code completion facilities as well.
Getting Started with Python
Basic Syntax
To write your first Python program, you need to be familiar with the basic syntax and requirements of the Python programming language. A Python program can be written and executed in two basic modes, known as Interactive mode and Script mode. In Interactive mode, developers type statements at the interpreter prompt and each one is executed immediately, whereas in Script mode the code is saved in a Python file (.py file) and run by passing that file to the interpreter.

Identifiers
Identifiers are used to identify a module, class, function, or variable in a program. In Python, an identifier starts with a letter from A to Z or from a to z, or an underscore, followed by zero or more letters, digits, or underscores. Python does not allow characters such as %, $, or @ within identifiers. Because the language is case-sensitive, programmers need to spell identifiers consistently for the program to run without errors. A first Python statement can be executed by typing the following line at the interactive prompt:

    >>> print("Hello world!")
    Hello world!

Variables and Data Types
Similar to other major programming languages such as Java, C, and C++, Python has predefined data types and rules for naming variables. A variable can have either a short name or a descriptive one, such as x, y, age, or year. A variable name must start with a letter or an underscore and cannot start with a number. Moreover, variable names are case-sensitive in Python, and developers need to be careful when naming variables in the program.
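The naming rules can be checked from Python itself. The short sketch below uses the built-in str.isidentifier() method together with the standard keyword module. The candidate names are arbitrary examples chosen for illustration.

```python
import keyword

# Candidate names, chosen only for illustration.
candidates = ["age", "_total", "year2021", "2fast", "my$var", "class", "Age"]

for name in candidates:
    # str.isidentifier() checks the character rules; keyword.iskeyword()
    # catches reserved words such as "class" that pass the character rules
    # but still cannot be used as variable names.
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {'valid' if usable else 'invalid'}")

# Case sensitivity: these are two different variables.
age = 30
Age = 31
print(age, Age)
```

Note that a reserved word like class satisfies the character rules yet is still rejected, which is why the sketch checks both conditions.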
Data Types
Python has built-in data types, which include text, numeric, sequence, mapping, set, Boolean, and binary types. To get the data type of a value in Python, the type() function can be used. Here are some examples of assignments and the data type each one produces:

    Sample                     Data Type
    x = "Python"               str
    x = 5                      int
    x = 5.0                    float
    x = range(5)               range
    x = ("Red", "Blue")        tuple
    x = ["Red", "Blue"]        list
    x = True                   bool
    x = b"Python"              bytes
Decision Making and Basic Operators
Decision-making is an essential part of any programming language because it tells the program which actions to take under the given conditions.

If statement
Syntax:

    if expression:
        statement

If-else statement
Syntax:

    if expression:
        statement
    else:
        statement

Nested if statements
In a nested if statement, we can have an if, elif, and else construct inside another if, elif, and else construct. The syntax for implementing this statement is as follows:

    if first_expression:
        statement
        if second_expression:
            statement
        elif third_expression:
            statement
        else:
            default_statement
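Here is a small runnable sketch of a nested decision. The describe_temperature function and its thresholds are hypothetical and chosen only to exercise each branch. The conditions use the comparison operator >=.

```python
def describe_temperature(celsius, raining):
    """Return a short description using nested if / elif / else."""
    if celsius >= 20:
        # Nested decision: only evaluated when the outer condition holds.
        if raining:
            return "warm and wet"
        elif celsius >= 30:
            return "hot and dry"
        else:
            return "warm and dry"
    else:
        return "cold"


print(describe_temperature(25, raining=False))  # warm and dry
print(describe_temperature(32, raining=False))  # hot and dry
print(describe_temperature(22, raining=True))   # warm and wet
print(describe_temperature(5, raining=True))    # cold
```

Indentation is what tells Python which if a given elif or else belongs to, so the layout of the code is part of its meaning.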
Functions and Modules
Python has built-in functions that can be used to create a number of complex machine learning and deep learning models. In addition to the built-in functions, developers can write their own, which are known as user-defined functions. To define a function in Python, we can use the syntax described below:

    def function_name(parameters):
        "function docstring"
        function_suite
        return [expression]

For example:

    def printme(text):
        "Sample string passed into a function"
        print(text)
        return

Modules in Python allow developers to organize their code into reusable units that can be used elsewhere in the program. A module is a file made up of Python code, which can include arbitrarily named attributes, classes, variables, and functions. For example, a module file might contain:

    def print_func(parameter):
        print("Sample:", parameter)
        return

Furthermore, we can bring an existing module into the Python source code by using an import statement. Here are the forms of the import statement in Python:

    The import statement:           import modname
    The from...import statement:    from modname import name
    The from...import * statement:  from modname import *
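The following sketch puts a user-defined function and two import forms together. It relies only on the standard math and statistics modules. The circle_area function is a hypothetical example, not something defined elsewhere in this book.

```python
import math                  # import the whole module
from statistics import mean  # import a single name from a module


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


print(circle_area.__doc__)       # the docstring is stored on the function
print(round(circle_area(2), 2))  # 12.57
print(mean([1, 2, 3, 4]))        # 2.5
```

With a plain import, names are reached through the module (math.pi). With from...import, the name is available directly (mean).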
Object-Oriented Programming
Python supports object-oriented programming (OOP), which enables developers to perform different tasks through classes and objects. In OOP, a class is a user-defined prototype for an object that defines a set of attributes, including data members (class variables and instance variables) and methods. A class variable is shared by every instance of the class and is defined inside the class but outside any of its methods. Class variables are used less often than instance variables. An instance variable is defined inside a method and belongs only to the current instance of a class. In object-oriented programming, function overloading refers to assigning more than one behavior to a specific function. To implement classes in a program, we make use of objects and methods in the class definition. Here is the syntax to create a class in Python:

    class ClassName:
        "class documentation string"
        class_suite

Sample Pupil class in Python:

    class Pupil:
        "Base class for Students"
        pupCount = 0

        def __init__(self, fname, marks):
            self.name = fname
            self.marks = marks
            Pupil.pupCount += 1

        def displayCount(self):
            print("Total Number of Pupils %d" % Pupil.pupCount)

        def displayPupil(self):
            print("Name : ", self.name, ", Marks: ", self.marks)

To access class attributes, we can use the following syntax, where pup1 and pup2 are Pupil objects that have already been created:

    pup1.displayPupil()
    pup2.displayPupil()
    print("Pupil %d" % Pupil.pupCount)

In a Python class, there are several built-in attributes that can be accessed by using the dot operator. For example, __dict__, __doc__, __name__, __module__, and __bases__.
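To see class variables, instance variables, and the built-in class attributes side by side, here is a compact, self-contained sketch. It defines its own small Pupil class with simplified names (pup_count, display), and the two pupils are invented for the example.

```python
class Pupil:
    """Base class for pupils."""

    pup_count = 0  # class variable, shared by every instance

    def __init__(self, name, marks):
        self.name = name    # instance variables, one copy per object
        self.marks = marks
        Pupil.pup_count += 1

    def display(self):
        return f"Name: {self.name}, Marks: {self.marks}"


# Hypothetical pupils, invented for this example.
pup1 = Pupil("Asha", 91)
pup2 = Pupil("Ben", 78)

print(pup1.display())                    # Name: Asha, Marks: 91
print(pup2.display())                    # Name: Ben, Marks: 78
print("Total pupils:", Pupil.pup_count)  # Total pupils: 2

# Built-in class attributes, reached with the dot operator.
print(Pupil.__name__, Pupil.__doc__, Pupil.__module__, Pupil.__bases__)
print(sorted(pup1.__dict__))  # instance variables live in __dict__
```

Because pup_count lives on the class, both objects see the same total, while name and marks differ from object to object.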
Class Inheritance
In object-oriented programming, a class can be created by deriving it from an existing class. The child class inherits the attributes of its parent class and can use them as if they were defined in the child class. It can also override data members and methods from the parent class. Unless it overrides something, a derived class behaves the same way as its parent class. For example:
    class A:
        # Class A definition
        ...
    class B:
        # Class B definition
        ...
    class C(A, B):
        # Subclass of A and B
        ...
Python syntax:
    class SubClassName(ParentClass1[, ParentClass2, ...]):
        "class documentation string"
        class_suite
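To make inheritance and overriding more concrete, here is a minimal sketch with one parent class and one child class. The Person and Pupil classes and their methods are hypothetical names chosen for this example.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Person: " + self.name

    def greet(self):
        return "Hello, " + self.name


class Pupil(Person):
    def __init__(self, name, marks):
        super().__init__(name)   # reuse the parent initializer
        self.marks = marks

    def describe(self):          # overrides Person.describe
        return "Pupil: %s (%d marks)" % (self.name, self.marks)


p = Pupil("Asha", 82)
print(p.greet())       # inherited unchanged from Person
print(p.describe())    # overridden version from Pupil
print(Pupil.__bases__) # the parent classes of Pupil
```

The child class only writes the code that differs from its parent. Everything else, such as greet here, is reused as it is.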
Regular Expressions
A regular expression, or RegEx, is a sequence of characters that defines a search pattern. In machine learning and data analytics work, regular expressions are widely used for pattern matching, for example when cleaning and preparing text before a model is trained. Python comes with a built-in regular expression module, known as the re module. To import it, we use the "import re" statement in the program. The re module consists of functions that search a string for a match. The major ones are search, split, sub, and findall.
Implementation:
findall() function
    import re
    text = "Machine learning"
    x = re.findall("in", text)
    print(x)
search() function
    import re
    text = "Machine learning"
    x = re.search(r"\s", text)
    print("Position of first white-space character:", x.start())
split() function
    import re
    text = "Machine learning"
    x = re.split(r"\s", text)
    print(x)
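The sub function is listed among the major functions but is not shown. As a minimal sketch, it replaces each match of a pattern with a replacement string. The sample text is made up for illustration.

```python
import re

text = "Machine   learning  with Python"

# Collapse each run of white-space characters into a single space
cleaned = re.sub(r"\s+", " ", text)
print(cleaned)

# The optional count argument limits how many replacements are made
first_only = re.sub(r"\s+", "_", text, count=1)
print(first_only)
```

This kind of clean-up is a common first step before text is split into words or passed on for analysis.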
Match and Search Functions
The match function checks for a match of the pattern only at the beginning of the string. The syntax for the match function is as follows:
    re.match(pattern, string, flags=0)
The search function scans through the whole string and returns the first location where the pattern matches. The syntax for the search function is as follows:
    re.search(pattern, string, flags=0)
Both functions return a match object on success and None when there is no match. The optional flags argument takes modifiers, such as re.IGNORECASE, that control different aspects of matching.
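The difference between the two functions is easiest to see side by side. The short sketch below runs both on the same sample string and then uses one optional flag; the string and patterns are chosen only for illustration.

```python
import re

text = "Machine learning"

# match() only succeeds if the pattern matches at the start of the string
print(re.match("learning", text))          # no match at position 0, so None
print(re.match("Machine", text).group())

# search() scans the whole string for the first occurrence
found = re.search("learning", text)
print(found.start(), found.end())

# Flags modify matching; re.IGNORECASE ignores letter case
print(re.search("MACHINE", text, flags=re.IGNORECASE).group())
```

Because both functions can return None, it is safer in real programs to check the result before calling methods such as group() on it.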
Exception Handling
An exception is an event that occurs during program execution and disrupts the normal flow of the program's instructions. When an exception occurs in a Python program, it can be handled with 'try' and 'except' statements, as shown below:
    try:
        statements
    except Exception1:
        # if Exception1 occurred, execute this block
    except Exception2:
        # if Exception2 occurred, execute this block
    else:
        # if no exception occurred, execute this block
Python has many built-in exceptions. For example: Exception, StopIteration, SystemExit, StandardError (Python 2 only), OverflowError, ArithmeticError, ZeroDivisionError, AssertionError, ImportError, KeyboardInterrupt, LookupError, IndexError, KeyError, and NameError. Note that a single try statement can have several except clauses, which is useful when the try block contains statements that might raise different types of exceptions. Only the first matching except clause is executed. A try statement can also have a generic except clause, which handles any type of exception.
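As a small worked example of the try, except, and else structure, the sketch below wraps a division in a function. The safe_divide name and its messages are hypothetical and used only for this illustration.

```python
def safe_divide(a, b):
    """Divide a by b, reporting common errors instead of crashing."""
    try:
        result = a / b
    except ZeroDivisionError:
        return "cannot divide by zero"
    except TypeError:
        return "both values must be numbers"
    else:
        # Runs only when the try block raised no exception
        return result


print(safe_divide(10, 4))
print(safe_divide(10, 0))
print(safe_divide(10, "x"))
```

Catching specific exceptions, rather than using a generic except clause, keeps unexpected errors visible while still handling the ones we anticipate.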
File Handling
File handling is an essential part of most web and desktop applications. It is used to create, read, update, and delete files. In Python, file handling is generally performed with the open() function, which takes filename and mode parameters.
Opening a File
To open a file in a Python program, we can use four different modes:
Read ("r"): Opens a file for reading and raises an error if the file does not exist.
Write ("w"): Opens a file for writing, creates the file if it does not exist, and overwrites any existing content.
Append ("a"): Opens a file for appending and creates the file if it does not exist.
Create ("x"): Creates the file and raises an error if the file already exists.
The basic syntax for file handling operations:
To open a file:
    f = open("samplefile.txt")
To open a file and print its contents:
    f = open("samplefile.txt", "r")
    print(f.read())
Closing a File
    f = open("Filename.txt", "r")
    print(f.readline())
    f.close()
Writing into Existing Files
    f = open("samplefile.txt", "a")
    f.write("New content")
    f.close()
    f = open("samplefile.txt", "r")
    print(f.read())
Create new file:
    f = open("Newfile.txt", "x")
Deleting Files
To delete a file in Python, import the os module and call the os.remove() function:
    import os
    os.remove("Samplefile.txt")
Deleting a Folder
To delete a folder, use the os.rmdir() function, which only removes empty folders:
    import os
    os.rmdir("Folder")
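The file operations above can also be written with the with statement, which closes the file automatically even if an error occurs. The sketch below is one possible way to combine the create, append, read, and delete steps; it works in a temporary folder, and the file name is only a placeholder.

```python
import os
import tempfile

# Work inside a temporary folder so no real files are touched
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "samplefile.txt")  # placeholder file name

    # "x" creates the file and fails if it already exists
    with open(path, "x") as f:
        f.write("First line\n")

    # "a" appends to the end of the existing file
    with open(path, "a") as f:
        f.write("New content\n")

    # "r" reads the file; it is closed automatically after the block
    with open(path, "r") as f:
        content = f.read()
    print(content)

    os.remove(path)
    file_still_exists = os.path.exists(path)
    print(file_still_exists)
```

Using with removes the need to remember f.close(), which is an easy step to forget.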
Chapter 3: The Best Python Libraries for Data Science
Before we look at what Python can do for data science and machine learning, we need to explore the libraries that make this work possible. Python can do a lot on its own, but there are a few areas where the core language falls behind and does not work as well as we would like. This is why we add libraries and extensions that are designed for data science and the steps it requires. Python continues to hold a leading position in solving data science tasks, and with the libraries discussed here, you will be able to handle the processes that data science needs. Let's take a look at these libraries and what we can do with them.
Core Libraries and Statistics
NumPy
When we first get started with data science in Python, one of the best libraries to download is NumPy. It is the library to use when it is time to process large multi-dimensional arrays and matrices. It also has an extensive collection of mathematical functions and methods for performing operations on these objects. Many other data science libraries rely on NumPy's capabilities, so having it set up on your computer makes a big difference.
Over the past few years, a number of improvements have been made to the library. In addition to bug fixes and compatibility fixes, the crucial changes concern styling, namely the printing format of NumPy objects. Some functions can also now handle files of any encoding that Python supports.
When you are ready to start on scientific tasks with Python, you will need the Python SciPy Stack. This is a collection of software designed specifically for scientific computing in Python. Keep in mind that the SciPy Stack is not the same thing as the SciPy library, so keep the two apart. The stack is big, with more than 12 libraries inside it, so we will focus on the core packages, particularly those most essential to data science.
The most fundamental package, around which this computation stack is built, is NumPy, which stands for Numerical Python. It provides an abundance of useful features for operations on matrices and n-dimensional arrays. Among other things, it vectorizes mathematical operations on the NumPy array type, which improves performance and speeds up execution.
SciPy
Another library to look at is SciPy, a library of software for engineering and science tasks. If your project needs this kind of work, SciPy is a strong choice. It contains modules for optimization, integration, statistics, and linear algebra, to name a few. Its main functionality is built on top of the NumPy library, which means that the arrays used in SciPy are NumPy arrays. Through its specific submodules, SciPy provides efficient numerical routines for numerical integration, optimization, and many other tasks.
Pandas
We can't go far in a discussion of Python libraries for data analysis without spending some time on the Pandas library.
Pandas is designed to help with the different steps of data science, such as collecting the data, sorting and cleaning it, and processing the data points we are working with. We can even take it a bit further and create some of the visualizations needed to showcase the data in a form that is easier to work with.
Pandas is a Python package designed to make work with labeled and relational data simple and intuitive. It is one of the best tools for many of the processes we want to handle, including data wrangling. In addition to the benefits above, Pandas works well for quick and easy data visualization, manipulation, and aggregation, along with other tasks that data science requires.
Matplotlib
As we work through data science projects, we will find that data visuals are helpful. Visuals make it easier to grasp the complex relationships found in our data. For most people, information is much easier to understand in visual form, whether a picture, a graph, or a chart, than in reports and spreadsheets. This is why data visualization is so important in data science, and why we look to Matplotlib to take care of these visuals.
Matplotlib is one of the best Python libraries for creating both simple and powerful visuals quickly. It is a strong piece of software that takes the results we get from our algorithms and turns them into something that we can see and understand more easily.
Keep in mind that Matplotlib is low-level. This means you will need to write more code to produce higher-level visuals. It requires more effort than we may be used to, but it provides what we need to get the work done.
With this library, we can produce almost any kind of visual that we would like. But remember that we first have to collect the data and run it through our algorithms to understand it. Some of the visuals you can create include the following:
The step plot
Contour plots
Quiver plots
Spectrograms
Pie charts
Histograms
Bar charts
Scatter plots
Line plots
In addition to the plots and graphs above, the library has a few other capabilities. You can create grids, legends, and labels to make the formatting of the visuals easier to handle. There is a lot to enjoy when handling visuals with Matplotlib, and it is an option worth spending some time on.
Scikit-Learn
Scikits are additional packages that go along with the SciPy Stack we talked about earlier. They were designed for specific functions, like image processing and machine learning. For machine learning, the most prominent of these packages is Scikit-Learn. It is built on SciPy and makes regular use of SciPy's math operations.
This package is a good one to work with because it exposes a concise and consistent interface to the most common machine learning algorithms. This makes it simple to bring machine learning into a production system. The library combines quality code and good documentation with high performance and ease of use, and it is one of the industry standards for machine learning in Python.
Theano
We can also spend some time with the Theano library, which works best when we want to handle deep learning rather than machine learning in general, as the other options do. Theano is a Python package that defines multi-dimensional arrays, similar to NumPy, along with mathematical operations and expressions. The library is compiled, which means that it runs efficiently on all architectures. It is well suited to deep learning and worth our time if that is our focus.
One of the most important features of Theano is that it integrates tightly with NumPy on low-level operations. The library also optimizes the use of the GPU and CPU, which makes computations faster. On top of that, it is efficient and stable, which leads to more precise results.
TensorFlow
The next library on the list is TensorFlow. It was originally developed by Google, and it is open source, so we can use it for our own needs. It is a library for data flow graph computations that has been sharpened for machine learning.
In addition, this library is one of the best to choose when it is time to work with neural networks. These networks are a useful type of algorithm because they help us process our data and make good decisions with it. However, TensorFlow is not specific to Google's needs. It is general-purpose enough to help with a wide range of real-world applications.
One of the key features of this library is its system of nodes arranged in many layers. This makes it possible to train artificial neural networks even on a very large set of data, and it makes it easier to handle the models and algorithms we want to create. For example, this library can help with voice recognition and with identifying objects in a picture, and these are just a few of the options.
Keras
The last of the deep learning libraries we will look at here is Keras. It is an open-source library for building neural networks at a high level, and it is written in Python. Keras is easy to work with and minimalistic, with a high level of extensibility. It uses TensorFlow or Theano as its back end, and Microsoft has worked to integrate CNTK as an additional back end to give us more options.
Many users enjoy the minimalistic design of Keras. This design is aimed at making experimentation fast and easy, because the systems you build stay compact. Keras is also an easy library to get started with, and it makes prototyping easier. It is written in pure Python and is high-level by nature. It is also highly extensible and modular. Despite its ease of use, simplicity, and high-level orientation, Keras still has enough power for serious modeling.
The general idea behind Keras is based on layers, and everything else in the model is built around them. The data is prepared in tensors. The first layer is responsible for the input of those tensors, and the last layer, however many layers there are, is responsible for the output. All of the other parts of the model are built in between.
StatsModels
Finally, there is StatsModels. This is a Python module that provides many opportunities for statistical data analysis, such as estimating statistical models and performing statistical tests. With the help of this library, we can implement many machine learning methods and explore different plotting possibilities.
Visualization
One thing we need to focus on in machine learning and data science is how to take the insights and patterns we find and turn them into visuals. You will need a dedicated library to produce these visuals.
The first data visualization library is Matplotlib. It is a library for creating 2D diagrams and graphs. It lets us create a huge number of charts and graphs that show the complex relationships in our work. In addition, many other popular plotting libraries are designed to work along with Matplotlib. Over the years, Matplotlib has seen many changes to sizes, colors, legends, and fonts. These improvements give us more ways to create the graphs we want in a short amount of time.
We can also work with Seaborn. This is a higher-level API based on the Matplotlib library. It has more suitable default settings for processing charts. It also has a rich gallery of visualizations, including violin plots, joint plots, and time series.
Programmers can work with the Plotly library as well. It offers an easy way to build sophisticated graphics. The package is adapted for interactive web applications, and it includes many visualizations, such as ternary plots, 3D plots, and contour graphics.
Machine Learning Libraries
As you work on data science projects, it is likely that you will work with machine learning as well. Machine learning matters because it shows us how to take data and learn from it, and this is where Python coding comes into play. The main Python library to focus on for machine learning is Scikit-Learn. This Python module is based on SciPy and NumPy and is a great option for working with data. It provides algorithms for many machine learning and data mining tasks, including model selection, dimensionality reduction, classification, regression, and clustering.
Deep Learning
Deep learning is a subset of machine learning that takes what we can do with machine learning to the next level. Many of the unsupervised machine learning tasks that we will talk about later are done with deep learning. Some of the libraries we can use for deep learning include:
The first option is TensorFlow. This is a very popular framework for machine learning and deep learning, developed by Google Brain. It lets us work with multiple data sets and neural networks all in one. Popular applications of TensorFlow include speech recognition, object identification, and more.
PyTorch is another framework to work with. It is a large framework that lets us perform tensor computations with GPU acceleration, create dynamic computational graphs, and calculate gradients automatically. In addition, PyTorch provides a rich API for solving applications related to neural networks. The library is based on Torch, an open-source deep learning library implemented in C with a wrapper in Lua. The Python API was not introduced until 2017, and since then the framework has become popular with data scientists and is used in many applications.
The Keras library is next on the list. This is a high-level library for neural networks that runs on top of Theano and TensorFlow. It simplifies many specific tasks and reduces the amount of monotonous code you have to write. However, it may not be suitable for some of the more complicated things you want to do.
These are just a few of the libraries you can work with on your data science projects. Some of them work well together, while others work best on their own. Which libraries you pick will often depend on the kind of project you have and what you hope to get out of it. The goal is to learn about the libraries, know how your own project is supposed to work, and then choose the library that fits.
Chapter 4: Data Science and Applications
Data science has recently become one of the most popular fields. Nearly all of the world's businesses use it today, and data has become the fuel of industry. Industries that use data science include transport, banking, education, e-commerce, manufacturing, finance, and so on. As a result, data science has a wide variety of applications, and multiple disciplines stem from this single career line. With so many applications, data science has become essential for all industries. It has shaped many businesses and kept them in step with trends around the world.
Data science applications did not develop overnight. Cheaper storage and faster computing have made tremendous contributions, shortening tasks that once took people a day to a few hours. It is worth discussing some of these critical applications to see how they have shaped today's industries, how they transform the world, and how they change people's perception of data. Ultimately, it is vital to look at the situations in which industries use data to improve.
Banking and Finance
Finance takes a leading position when it comes to data science applications. Every year, losses and bad debts were on the rise, and businesses were going under. Even those that survived were struggling. However, banks had collected a great deal of data in the paperwork for the loans they sanctioned, and data scientists were brought in to put that data to use. Today, applying data science is more than a trend for the banking industry. It is a vital element of keeping up with the competition. Big data technologies allow banks to make smarter decisions, enhance performance, and focus their resources. Some of the cases of data science applications include:
Fraud Detection
Data science is crucial for detecting and preventing fraud involving credit cards, insurance, accounting, and much more. Banks are proactive about the security of their employees and customers. The sooner a bank detects fraud, the faster it can restrict activity on an account to minimize losses. By implementing a series of fraud detection schemes, banks have been able to achieve the necessary protection and avoid significant losses. The vital steps of fraud detection include:
Getting data samples for model estimation and preliminary testing
Estimation of the model
Deployment and testing stage
Since every data set is different, each one requires individual training and fine-tuning by data scientists. Transforming in-depth theoretical knowledge into practical applications demands expertise in data mining techniques, including forecasting, classification, association, and clustering. For example, a bank's fraud protection system can put unusually high transactions on hold pending confirmation from the account holder. Fraud detection algorithms can also investigate unusually high purchases of popular items on new accounts, or multiple accounts opened in a short period with the same data.
Customer Data Management
Banks are obliged to collect, analyze, and store vast amounts of data. Rather than treating this data as a mere compliance exercise, banks can use data science applications to learn more about their customers and drive new revenue opportunities. Digital banking is widely used and more popular than ever. It produces terabytes of customer data, so the first step for data scientists is to isolate the data that is genuinely relevant. Using customers' preferences, interactions, and behaviors, data science applications then isolate and process the most relevant client information to improve business decision-making.
Investment Banks Risk Modeling
Risk modeling is a high priority for investment banks, since it helps regulate commercial activities and plays the most critical role in the pricing of financial investments. Investment banking evaluates the value of businesses in order to create capital in corporate financing, facilitate mergers and acquisitions, conduct corporate restructuring or reorganizations, and pursue investment goals. As a result, risk modeling is extremely important for banks, and with more information at hand and more data science tools in reserve, they can assess risk to their benefit. Innovators in the industry are now leveraging these new technologies for efficient risk modeling and better data-driven decisions.
Personalized Marketing
Success in marketing depends on providing a customized offer that fits the needs and preferences of a particular customer. It is now possible to make the right offer to the right customer at the right time on the right device. For a new product, data science applications are used in target selection to identify potential customers. Data scientists create models that predict the probability of a customer's response to an offer or promotion from demographic, purchase history, and behavioral data. Through data science applications, banks have thus made their marketing more efficient, personalized their outreach, and improved their customer relations.
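To give a rough feel for what "predicting the probability of a customer's response" can look like in code, here is a toy sketch that scores customers with a logistic function. Everything in it is an assumption made for illustration: the feature names, the weights, the bias, and the two sample customers are invented, and a real model would learn its weights from historical data rather than have them set by hand.

```python
import math

# Hypothetical weights; a real model would learn these from historical data
WEIGHTS = {"age_under_35": 0.4, "bought_last_month": 1.2, "opened_last_email": 0.8}
BIAS = -2.0


def response_probability(customer):
    """Return a probability between 0 and 1 using a logistic function."""
    score = BIAS + sum(WEIGHTS[name] * customer.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


# Invented sample customers (1 = yes, 0 = no)
customers = {
    "A": {"age_under_35": 1, "bought_last_month": 1, "opened_last_email": 1},
    "B": {"age_under_35": 0, "bought_last_month": 0, "opened_last_email": 1},
}

# Rank customers so that the most likely responders come first
ranked = sorted(customers, key=lambda c: response_probability(customers[c]), reverse=True)
for name in ranked:
    print(name, round(response_probability(customers[name]), 3))
```

A marketing team could then send the offer only to customers whose predicted probability is above a chosen threshold.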
Health and Medicine An innovative potential industry to implement the solutions of data science in health and medicine. From the exploration of genetic disease to the discovery of drugs and computerizing medical records, data analytics is taking medical science to an entirely new level. It is perhaps astonishing that this dynamic is just the beginning. Through finances, data science and healthcare are most times connected as the industry makes efforts to cut down on its expenses with the help of a large amount of data. There is quite a significant development between medicine and data science, and their advancement is crucial. Here are some of the impacts data science applications have on medicine and health. Analysis of Medical Image Medical imaging is one of the most significant benefits the healthcare sectors get from data science applications. As significant research, Big Data Analytics in healthcare indicates that some of the imaging techniques in medicine and health are X-ray, magnetic resonance imaging (MRI), mammography, computed tomography, and so many others. More applications in development will effectively extract data from images, present an accurate interpretation, and enhance the quality of the image. As these data science applications suggest better treatment solutions, they also boost the accuracy of diagnoses. Genomics and Genetics Sophisticated therapy individualization is made possible through studies in genomics and genetics. Finding the individual biological correlation between disease, genetics, and drug response and also understand the effect of the DNA on our health is the
primary purpose of this study. In disease research, data science techniques make it possible to integrate various kinds of data with genomic data, giving an in-depth understanding of how genetic factors affect reactions to specific conditions and drugs. It may be useful to look into some of the frameworks and technologies involved. MapReduce shortens processing time by allowing genetic sequences to be read and mapped efficiently, while SQL makes it possible to retrieve genomic data and to compute on and manipulate BAM files. Deep Genomics also makes a substantial impact, principally in DNA interpretation, by predicting the molecular effects of genetic variation. With its database, scientists can understand how genetic variations affect the genetic code.

Drugs Creation

The process of drug discovery is highly complicated since it involves various disciplines. Most of the time, even the best ideas go through testing that costs enormous amounts of time and money, with expenditure running into the billions. Typically, getting a drug officially submitted can take up to twelve years. Data science applications now shorten and simplify this process by adding insight at each stage, from the initial screening of drug compounds to the prediction of the success rate based on biological factors. Using advanced mathematical modeling and simulations rather than “lab experiments,” these applications can forecast how a compound will act in the body. Computational drug discovery produces computer model simulations in the form of a biologically relevant network, which simplifies the prediction of future results with high accuracy.

Virtual Assistance for Customer and Patients Support

The idea that some patients don’t necessarily have to visit doctors in person is the concept behind clinical process optimization.
Also, doctors don’t necessarily have to make visits either when patients can get more effective solutions through a mobile application. AI-powered mobile apps, commonly chatbots, can provide vital healthcare support. Drawing on a massive network that connects symptoms to causes, they make it as simple as describing your symptoms and then receiving vital information about your medical condition. When necessary, these applications can book an appointment with a doctor and also remind you to take your medicine on time. As well as letting doctors focus on more critical cases, these applications save patients the time spent waiting in line for an appointment and promote a healthy lifestyle.

Industry Knowledge

Knowledge management in healthcare is vital for offering the best possible treatment and improving services. It brings together externally generated information and internal expertise. With new technologies being created and the industry changing rapidly every day, the effective gathering, storing, and distribution of different facts is essential. Data science applications secure the integration of various sources of knowledge and their combined use in the treatment process, which helps healthcare organizations achieve progressive results.
Oil and Gas

Machine learning and data science are the primary force behind various trends in industries such as marketing, finance, and the internet, among others. The oil and gas industry is no exception: applications in the upstream, midstream, and downstream sectors extract important observations from data. As a result, refined data is a valuable asset to companies within the industry. Data science applications are quite useful in the following areas of oil and gas.

Immediate Drag Calculation and Torque Using Neural Networks

In drilling, there is a need to analyze the structured visual data that operators get through logging. Operators can also capture electronic drilling recorder data and contextual data, which takes the form of daily drilling log reports. Because drilling operations are time-bound, it is essential to make instant decisions. As a result, companies use neural networks to predict drilling key performance indicators and to analyze rig states for real-time data visualization. Using AI, they can estimate the friction coefficient and the normal contact forces between the drill string and the wellbore. They can also calculate the drag and torque on the drill string in real time for any given well. Operators can also make use of historical data on pump washouts and, through alerts on their phones, learn whether and when a washout will occur.

Predicting Well Production Profile Through Feature Extraction Models

Recurrent neural networks and time series forecasting are part of the optimization of oil and gas production. Predictions of oil rates and gas-to-oil ratios are significant KPIs.
With feature extraction models, operators can use data from nearby wells to predict bottom-hole pressure, choke, wellhead temperature, and daily oil rate. To predict production decline, they make use of fracture parameters. Also, for pattern recognition on sucker rod dynamometer cards, they utilize neural networks and deep learning.

Downstream Optimization

Oil refineries use a massive volume of water to process gas and crude oil. There are now systems that tackle water management in the oil and gas industry. Also, by analyzing distribution data effectively through cloud-based services, companies have increased their modeling speed for forecasting revenues.
The Internet

Whenever anyone thinks about data science, the first idea that comes to mind is the internet. It is typical to think of Google when we talk about searching for something on the internet. However, Bing, Yahoo, AOL, Ask, and some others are also search engines. What they all have in common is data science algorithms, which let them return results in a fraction of a second when you enter a search. Google processes more than 20 petabytes of data every day, and these search engines would not be what they are today without data science.

Targeted Advertising

If search seems like the biggest of all data science applications, the whole digital marketing spectrum is a significant challenger. Data science algorithms decide the placement of digital billboards and display banners on different websites. Compared with traditional advertisements, data science algorithms have helped marketers get higher click-through rates. Marketers can target users with specific adverts based on their behavior. At the same time and in the same place online, one user might see an ad on anger management while another sees an ad on a keto diet.

Website Recommendations

This case is familiar to everyone, as you see suggestions of similar products on sites such as eBay and Amazon. These suggestions add a great deal to the user experience and help users discover appropriate products among the many available, drawing on each user’s interests and relevant information. Many businesses have promoted their products and services with this kind of engine. To improve user experience, some giants on the internet, including Google
Play, Amazon, Netflix, and others, have used this system. They derive these recommendations from the results of a user’s previous searches.

Advanced Image Recognition

When a user uploads a picture to social media such as Facebook and starts getting tag suggestions, that automatic tag suggestion feature is making use of a face recognition algorithm. For some time now, Facebook has made significant gains in the capacity and accuracy of its image recognition. Also, by uploading an image to Google, you have the option of searching with it; Google uses image recognition to provide related search results.

Speech Recognition

Siri, Google Voice, Cortana, and many others are some of the best speech recognition products. Speech recognition tools make things easy for those who are not in a position to type a message: when they speak, their words are converted to text. The accuracy of speech recognition, though, is not guaranteed.
Travel and Tourism

Even with the exceptional opportunities data science has brought to many industries, there are constant challenges and changes, and travel and tourism is no exception. Today, travel culture is on the rise since a broader audience can afford it. The target market has therefore changed dramatically, becoming more extensive than ever before. As a worldwide trend, travel and tourism is no longer a privilege of the noble and the rich. Data science algorithms have become essential in this industry to process massive amounts of data and to satisfy the requirements of the rising number of consumers. Hotels, airlines, booking and reservation websites, and several others now see big data as a vital tool for enhancing their services every day. The travel industry uses some of the following tools to become more efficient.

Customer Segmentation and Personalized Marketing

Personalization has become a preferred way for some people to enjoy the travel experience. Customer segmentation means segmenting customers according to their preferences and adapting the general stack of services to please the needs of every group. Hence, finding a solution that will align with all situations is crucial. Customer segmentation and personalized marketing are all about collecting users’ social media data, behavior, metadata, and geolocation and unifying them. This data is then processed to predict the user’s future preferences.

Analysis of Customer Sentiment

Recognizing emotional elements in the text and analyzing textual data is what
sentiment analysis does. Through sentiment analysis, a service provider or business owner can learn about customers’ real attitudes towards their brand. Customer reviews play a huge role in the travel industry, because travelers read the reviews posted on various websites and platforms and then act upon these recommendations when making decisions. As a result, some modern booking websites offer sentiment analysis as one of their service packages for the hotels and travel agencies that are willing to cooperate with them.

Recommendation Engine

According to some experts, this concept is one of the most promising and efficient. Some major booking and travel web platforms use recommendation engines in their everyday work. Mainly, these recommendations match the needs and wishes of customers with the available offers. By applying data-powered recommendation engine solutions, travel and tourism companies can provide alternative travel dates, rental deals, new routes, attractions, and destinations based on a customer’s preferences and previous searches. With recommendation engines, booking service providers and travel agencies can make suitable offers to all of their customers.

Travel Support Bots

By providing exceptional assistance with travel arrangements and customer support, travel bots are indeed changing the travel industry nowadays. An AI-powered travel bot can save users money and time, answer questions, suggest new places to visit, and organize trips.
It is the best possible solution for customer support because it supports multiple languages and is accessible 24/7. It is significant to add that these bots are always learning and, as such, are becoming smarter and more helpful every day. A chatbot can therefore handle the major tasks of travel and tourism, and both customers and business owners benefit.

Route Optimization

Route optimization plays a significant role in the travel and tourism industry. It can be quite challenging to plan trips while accounting for several destinations, schedules, working hours, and distances. With route optimization, it becomes easy to achieve the following:

Time management
Minimization of travel costs
Minimization of distance

For sure, data science improves lives and continues to change the face of several industries, giving them the opportunity to provide unique experiences for their customers with high satisfaction rates. Apart from shifting our attitudes, data science has become one of the promising technologies that bring change to different businesses. With the many solutions that data science applications provide, there is no doubt that their benefits cannot be overemphasized.
Chapter 5: The Lifecycle of Data Science The next thing that we need to take a look at here is the lifecycle of data science. There are actually quite a few steps that we need to focus on here in order to make sure that we are going to get the most out of any data science project that we are trying to work with along the way. It would be nice if it were a process that just took a few minutes, and then we were set, but this is just not how things are meant to go. There are a lot of steps that have to all come together and work well together to ensure that this is going to work in the manner that we would like. You have to make sure that you have the right data, for example, you have to make sure that you clean out the data and get it organized, and you need to run it through the algorithms and other models that you would like, just to learn what information and insights are found in all of that data. This is a complex process that is going to take some time to accomplish, and often those who are just getting started with this process are going to be amazed at the amount of work that they have to use in order to make this happen for their needs. Knowing the steps ahead of time will ensure that you are able to really get the most out of this process, that you will start out and end up in the right spot, and so much more. With that in mind, some of the steps that we need to use in order to get started with doing our own process of data science will include:
The Discovery Phase The first phase of this that we need to take a look at is going to be the discovery phase. This is going to include a lot of questions, a lot of research, and a good understanding of what your business is hoping to get out of this whole process before we even get started. Before you even think about starting out on this project, it is important to go through and understand some of the different specifications, the requirements, and then the priorities and the required budget to make this all work. Going in without all of this in place is just going to lead to a mess. If you do not have an idea of what the specifications and priorities are supposed to be in this process, then you are just going to grab any random data that you can find, and that is easy to gather up, and then call it good. This will definitely not be a good thing for what you are trying to do. If you do not go into this knowing the budget, the money will run out way before you are able to figure out how this process even works. You need to make sure in this stage that you have the right ability in order to ask the right questions all of the time. Here, you will assess if you have the right resources at hand, as well. This means that you need to know whether you have the right amount of data, time, technology, and people in order to fully support the project that you are working with. And finally, we have to be able to frame our main business problem (or at least the one that we want to work with right now) and formulate an initial hypothesis to test it all out.
The Data Preparation Phase The next thing that we need to take a look at is the data preparation phase. This is going to be the phase where you will spend most of your time in because we have to make sure that we are not just gathering up the right information as we go along. We want to make sure that we are organizing the data, dealing with the outliers that are there, filling in the missing values, and being careful about the duplicates that are going to show up in this process as well. It is a difficult task to work with, but it is necessary if we would like to make sure that our predictions are going to be as accurate as possible. This is not the most glamorous out of the work that you are doing, but it is very important. In this phase, it is often necessary that we have an analytical sandbox in which we are able to perform some of the analytics for the entire time that we work on the project at hand. We need to spend some time in this stage looking and exploring the data, preprocessing it, and conditioning the data before we do the modeling. This all takes time, and it may not be fun, but it is something that is necessary when it comes to the success of your project. In addition, during this phase, we are going to work with the process known as ETLT, or extract, transform, load, and transform in order to make sure that the data is organized and ready to go in the manner that we would like and to make sure that we are able to get the data into the sandbox that we would like. There are going to be a few different options that we are able to work on in order to handle the data preparation phase that we are on right now. The two most popular options and the ones that will ensure that we are able to get the most out of the process will be the R and the Python coding language. For the most part, data scientists are going to use Python in order to get the most out of their training process.
Working with Python is the best choice. It is simple enough to learn how to use, even for someone who is more of a beginner in all of this, and it will ensure that you have all of the power and all of the libraries that are needed in order to handle this phase as well. There are a lot of different parts that we are going to focus on when it is time to handle the data preparation phase of the whole process. And often, it is a phase that we are not going to spend enough time on. But in reality, if the data that you have is not organized and ready to go in the manner that you would like, then it is going to cause a lot of problems along the way. The cleaner that you are able to make that data, the better. When the data is higher in quality, and when things like the outliers, duplicates, and missing values are gone, it is so much easier to work with all of this. The algorithm will be able to go through the data more efficiently than before, and you will have predictions and insights that you are actually able to handle and trust along the way as well.
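The cleaning tasks named in this phase (duplicates, missing values, and outliers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, missing values are filled with the median, and outliers are dropped with the common 1.5 x IQR rule. These are choices made for this example, not methods prescribed in this chapter.

```python
from statistics import median, quantiles

# Hypothetical raw records: (customer_id, monthly_spend); None marks a missing value
raw = [
    (1, 120.0), (2, 95.0), (2, 95.0), (3, None), (4, 110.0),
    (5, 105.0), (6, 5000.0), (7, 98.0), (8, 102.0),
]

def drop_duplicates(rows):
    """Keep the first copy of each record and preserve the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def fill_missing(rows):
    """Replace missing spend values with the median of the known values."""
    fill = median(v for _, v in rows if v is not None)
    return [(cid, fill if v is None else v) for cid, v in rows]

def remove_outliers(rows, k=1.5):
    """Drop records whose value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles([v for _, v in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(cid, v) for cid, v in rows if low <= v <= high]

clean = remove_outliers(fill_missing(drop_duplicates(raw)))
print(clean)
```

In practice, most data scientists would do the same work with a library such as pandas, but the order of the steps and the reasoning behind them stay the same.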
The Model Planning Phase

The third step that we are going to be able to work with is known as the model planning phase. In this one, we are going to spend some time determining the techniques and methods that are available in order to draw up the relationships between the variables that we have. These relationships are important because they are basically going to help us set the foundation for the algorithms that we want to implement. Without these algorithms, your coding is not going to work, and you will never learn what is inside of that data you are dealing with. And we will learn how to implement some of these algorithms in the next phase.

There are a number of planning tools that are available for us to work with on this one. Most programmers are going to focus on the Python language and all of the benefits that it is able to provide. There are a few others that we are able to spend our time and attention on, and these will include:

R: This one is sometimes seen as the best option to work with when it comes to completing a data analysis because it allows us to have the capabilities of modeling and will provide us with a really good environment when it is time to build up the interpretive models that we want.

SQL Analysis Services: These are going to be the ones that are able to perform some of the analytics that need to happen in the database. These are going to be done thanks to some of the more common functions of data mining, as long as they are used with some of the basic predictive models that you need as well.

SAS/ACCESS: This one is going to be helpful because we are able to use it to access all of the data that we need from the Hadoop system when we need it the most.
And it is often going to be used when we would like to create model flow diagrams that we can repeat and reuse in the process as well.

At this point, we should have a really good idea about the nature of our data, and we should know whether it is up to the quality standards that we are looking for or not. Because of this, it is time to move on to the next step and put the data through the algorithms that we pick. In that step, it will be time to add in a bit of machine learning (which we are going to discuss in more depth later on) in order to find a good algorithm that will go through the data that we have and provide us with the insights and predictions that we need. But this is only going to happen when we have gone through the steps that were listed out before.

The Model Building Phase

The fourth step that we need to take some time on here is going to be the model-building phase. When we are here, we will need to actually use machine learning and some of the algorithms that go with it and put them to good use. This is also the phase where we are going to put together the data sets that we need to handle the training and the testing. One thing that a lot of people are not aware of when they first get started with all of this is that they actually need to go through and train and then test the algorithm. They assume that they are able to just write out the algorithm in the manner that they would like, and then put through the data that they want. They then assume that the results that come out of these untrained and untested algorithms are going to be accurate and will actually help them out with the work that they want to do.
But these algorithms have to be trained in some manner, and if you do not take the time to do this training ahead of time, or you don’t take the time to test them out either, then you are going to end up with some trouble. The algorithm is not going to be as accurate as we would like, and we will end up with a lot of predictions and insights that we are not able to trust at all. During this phase, we will also need to take some time to discover or consider whether the tools that we already have in our possession are going to be enough to help run some of those models, or if we are going to need to make sure that the environment that we are working with will be more robust. Usually, when we take a look at this, we are interested in finding out whether the processing power that we have is strong enough to handle the work or if we need to change it up. The good news here is that there are a lot of great algorithms that we are able to use to make this one work, and when we are able to put them all together and use all of the tools that are out there, you will be able to create some really good information that will push you forward. But you have to take the time to really optimize the algorithm that you want to work with and to ensure that it is going to be able to handle some of the predictions and insights that we need to make all of this work for our needs.
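The idea of training an algorithm on one part of the data and testing it on another can be sketched as follows. This is a minimal illustration, not a production method: the data set is invented, the 80/20 split is just a common convention, and the nearest-centroid classifier was chosen only because it is short enough to write by hand.

```python
import random

# Hypothetical data: (hours studied, exam result)
samples = [
    (1.0, "fail"), (1.5, "fail"), (2.0, "fail"), (2.5, "fail"), (3.0, "fail"),
    (8.0, "pass"), (8.5, "pass"), (9.0, "pass"), (9.5, "pass"), (10.0, "pass"),
]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold back a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(rows):
    """'Training' here means computing the average feature value per label."""
    totals, counts = {}, {}
    for x, label in rows:
        totals[label] = totals.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}

def predict(model, x):
    """Predict the label whose average is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def accuracy(model, rows):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == label for x, label in rows) / len(rows)

train_rows, test_rows = train_test_split(samples)
model = train(train_rows)
test_accuracy = accuracy(model, test_rows)
print(f"accuracy on unseen data: {test_accuracy:.0%}")
```

The important point is that the accuracy is measured on rows the model never saw during training. Measuring it on the training rows would tell us very little about how far the predictions can be trusted on new data.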
The Operationalize Phase

When we are working with this phase, it is time to deliver the final reports and briefings, as well as the code and technical documents that go along with them. Usually, the data scientist will need to spend their time showing the right people in the business how to work with the information and make it work for their needs. And since a lot of these key decision-makers are often not going to have technical backgrounds, understanding the information is going to be tough for them. This is why the data scientist will need to spend some time going through the information and getting it set up in a manner that will make it easier overall. When the data is turned into a format that the people who use it for making decisions are able to understand, it is going to help them out quite a bit.

In addition, sometimes we are going to take some time to work on a pilot project. This can be implemented in real time in the production environment as well. This allows us to try out some of the insights that we have found without having to implement them throughout the whole company. It gives us a better idea of what is going on with the work that we are doing and lets us see whether the process we want to use is actually going to work before we waste time and money on doing it throughout the company. This is one of the best ways for us to see a clear picture of the performance and other related constraints on a smaller scale before we deploy completely.
The Communicate Results Phase

At this point, it is important for us to go through and evaluate whether or not we have been able to achieve our goals. These are the goals that we set out in the first phase of this process. If we have not been able to meet those goals along the way, then we can work on making some of the necessary changes that will help us to meet them. So, when we are in this last phase, we are going to spend some time identifying all of the key findings that have shown up along the way. And when we have identified these key findings, we can communicate that information to the stakeholders, and then determine whether the results of this project are a success or a failure. But we have to base all of this on the criteria that we took the time to develop in the first phase.

As we can see, there is quite a bit that is going to show up when it is time to handle the work that we need with a data science project. We have to go through all of these phases to ensure that we are going to see the best results along the way. And when it all comes together, we will have the right data, we will have cleaned it, and we will have picked out the right algorithms to ensure that this is going to work the way that we would like.
Chapter 6: Probability, Statistics, and Data Types

Things are quite straightforward in Knowledge Representation and Reasoning (KR&R). In the absence of uncertainty, formulating and representing propositions is easy. Problems begin to arise when uncertainty makes itself known. Take, for example, an expert system designed to replace a doctor. A doctor diagnosing patients has no formal knowledge and no official rules that lead from the symptoms straight to the treatment. In this situation, the expert system uses probability to determine which condition the patient most likely has and which cure is most likely to work.
Real-Life Probability Examples

As a mathematical term, probability has to do with the likelihood that an event will occur, like pulling a green piece out of a bag of assorted colors or drawing an ace from a deck of cards. You use probability in daily decision-making, often without realizing it. You may not work through actual probability problems, but you make judgment calls using subjective probability to determine the best course of action.

Organize Around the Weather

You use probability almost every day when you make plans with the weather in mind. Meteorologists cannot predict the weather exactly, so they use instruments and tools to establish the likelihood that there will be snow, hail, or rain. For example, a 60 percent chance of rain means that it has rained on 60 out of 100 days with the same weather conditions. Intuitively, you might then prefer closed-toed shoes to sandals and take an umbrella to work. Meteorologists also examine historical databases to estimate approximate high and low temperatures and the probable weather patterns for that day or week.

Strategies in Sports

Coaches and athletes use probability to determine the best strategies for games and competitions. A baseball coach evaluates a player’s batting average when putting that player in the lineup. For example, a player with a .200 batting average can be expected to get a base hit about two times out of every ten at-bats. The odds are even higher for a player to have, out of every ten at-bats,
four hits when such a player has a .400 batting average. As another example, if a high-school football kicker makes nine out of 15 field goal attempts from over 40 yards in a season, he has about a 60 percent chance of making his next attempt from the same distance. We can write the equation like this:

9/15 = 0.60 or 60 percent

Insurance Option

Probability plays a vital role in analyzing insurance policies to decide which plans are best for you and your family and what deductible amounts you need. For example, when you choose a car insurance policy, you use probability to judge how likely it is that you will need to make a claim. If 12 out of every 100 drivers (12 percent) in your community have hit a deer over the past year, you will likely consider comprehensive insurance on your car and not only liability. Also, if car repairs following a deer-related collision run $28,000, you might consider a lower deductible so that you are not left unable to cover the expense.

Recreational and Games Activities

You use probability when you play board, card, or video games that involve luck or chance. You must weigh the chances of getting the cards you need in poker or the covert missile you need in a video game. The likelihood of getting those cards or tokens determines how much risk you are willing to take. For example, as Wolfram MathWorld suggests, the odds against getting three of a kind in a poker hand are 46.3-to-1, about a 2 percent chance. However, the odds against getting one pair are only about 1.4-to-1, or a chance of about 42 percent. Probability helps you assess what is at stake and decide how you want to play the game.
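The numbers quoted above can be checked with a short Python sketch. It assumes a standard 52-card deck and five-card hands, and the helper names `probability` and `odds_against` are made up for this example. The counting formulas for three of a kind and one pair are the standard combinatorial ones.

```python
from fractions import Fraction
from math import comb

def probability(favorable, total):
    """Probability as an exact fraction: favorable outcomes / all outcomes."""
    return Fraction(favorable, total)

def odds_against(p):
    """Convert a probability into 'x-to-1' odds against the event."""
    return (1 - p) / p

# The kicker who made 9 of 15 long field goals
kicker = probability(9, 15)
print(f"kicker: {float(kicker):.0%}")

# Five-card poker hands from a standard 52-card deck
total_hands = comb(52, 5)
# Three of a kind: pick the rank, 3 of its 4 suits, then 2 other ranks in any suit
three_of_a_kind = 13 * comb(4, 3) * comb(12, 2) * 4**2
# One pair: pick the rank, 2 of its 4 suits, then 3 other ranks in any suit
one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3

p_three = probability(three_of_a_kind, total_hands)
p_pair = probability(one_pair, total_hands)
print(f"three of a kind: {float(p_three):.1%}, odds {float(odds_against(p_three)):.1f}-to-1")
print(f"one pair: {float(p_pair):.1%}, odds {float(odds_against(p_pair)):.1f}-to-1")
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a percentage, which avoids small rounding surprises when probabilities are combined.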
Statistics

Modern science is based on statements of probability and statistical significance. In one example, according to studies, cigarette smokers are 20 times more likely to develop lung cancer than those who don’t smoke. In another, researchers report the possibility of a catastrophic meteorite impact on Earth within the next 200,000 years. In a third, first-born male children score 2.82 points higher on IQ tests than second-born male children. But why do scientists talk in such ambiguous terms? Why don’t they simply say that lung cancer is a result of cigarette smoking? And why don’t they tell people whether a colony needs to be established on the moon to escape an extraterrestrial disaster?

The rationale behind these statements is that they accurately reflect the data. Absolute conclusions are uncommon in scientific data. Some smokers never contract lung cancer, some reduce their risk by quitting, and some die prematurely of cardiovascular disease rather than lung cancer. Since all data exhibit variability, the function of statistics is to quantify that variability, allowing scientists to make more accurate statements about their data.

A common misconception is that statistics prove whether something is true or false. However, statistics have no such power. Instead, they provide a measure of the probability of observing a specific result. Through statistical techniques, scientists can put numbers to probability, moving from the statement that someone is more likely to develop lung cancer if they smoke cigarettes to a report that says it is nearly 20 times
greater in cigarette smokers compared to nonsmokers for the probability of developing lung cancer. It is a powerful tool the quantification of probability statistics offers and scientists use it thoroughly, yet they frequently misunderstand it. Statistics in Data Analysis Developed for data analysis is a large number of procedures for statistics they are in two parts of inferential and descriptive: Descriptive Statistics: With the use of measures for deviation like mean, median, and standard, scientists have the capability of quickly summing up significant attributes of a dataset through descriptive statistics. They allow scientists to put the research within a broad context while offering a general sense of the group they study. For example, initiated in 1959, potential research on mortality was Cancer Prevention Study 1 (CPS-1). Among other variables, investigators gave reports of demographics and ages of the participants to let them compare, at the time, the United States’ broader population and also the study group. The age of the volunteers was from ages 30 to 108, with age in the middle as 52 years. The research had 57 percent female as subjects, 2 percent black, and 97 percent white. Also, in 1960, the total population of females in the US was 51 percent, black was about 11 percent, and white was 89 percent. The statistics of descriptive easily identified CPS-1’s recognized shortcomings by suggesting that the research made no effort to sufficiently consider illness profiles in the US marginal groups when 97 percent of participants were white. Inferential Statistics: When scientists want to make a considered opinion about data, making suppositions about bigger populaces with the use of smaller samples of data, discover the connection between variables in datasets, and model patterns in data, they make use of inferential statistics.
In statistics, the term “population” differs from its ordinary meaning of a collection of people. A statistical population is the larger group about which a dataset is used to make inferences; it can be a society, locations of an oil field, meteor impacts, corn plants, or any other set of measurements. In scientific studies, the process of extending results from small samples to larger populations is essential. For example, although the Cancer Prevention Studies I and II enrolled about 1 million and 1.2 million individuals, respectively, these represent a tiny fraction of the 1960 and 1980 United States populations, which totaled about 179 and 226 million.

Correlation, regression, and point estimation/testing are some of the standard inferential techniques. For example, in 2007 Tor Bjerkedal and Peter Kristensen analyzed the IQ test scores of 250,000 male Norwegian military personnel. According to their analysis, first-born male children scored 2.82 +/- 0.07 points higher than second-born male children, a difference that was statistically significant at the 95 percent confidence level.

“Statistically significant” is a vital concept in data analysis, and it is often misunderstood. Because of the everyday use of the word “significant,” most people assume that a result called significant is momentous or important. That is not the case. Instead, statistical significance is an estimate of the probability that the observed association or difference is due to chance rather than any real association. In other words, tests of statistical significance describe the likelihood that an observed difference or association would occur even when no real difference or association exists.
The measure of significance is most often expressed in terms of confidence, which carries the same meaning in statistics as in everyday speech but can be quantified.
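To make the IQ example a little more concrete, here is a minimal Python sketch that turns the reported "2.82 +/- 0.07" into an interval and checks whether that interval contains zero. It assumes, as the text describes, that 0.07 is the margin of a 95 percent interval; the helper names are hypothetical.

```python
def confidence_interval(estimate: float, margin: float) -> tuple[float, float]:
    """Return the (lower, upper) bounds for 'estimate +/- margin'."""
    return (estimate - margin, estimate + margin)


def excludes_zero(interval: tuple[float, float]) -> bool:
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


iq_gap = confidence_interval(2.82, 0.07)  # figures quoted in the text
print(f"Interval: {iq_gap[0]:.2f} to {iq_gap[1]:.2f}")
print("Interval excludes zero:", excludes_zero(iq_gap))
```

An interval that stays clear of zero is one informal way to see why the reported difference is unlikely to be due to chance alone. It says nothing about whether a gap of under three points matters in practice.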
Data Types

To do Exploratory Data Analysis (EDA), you need a clear grasp of the different data types, also called measurement scales, because certain statistical measurements can only be used with specific data types. You also need to identify the data types you are handling in order to choose the right visualization method. Data types are a way of categorizing different kinds of variables. Now, let’s take an in-depth look at the main types of variables, which we may sometimes refer to as measurement scales, along with examples of each.

Categorical Data

Categorical data represents characteristics, such as someone’s language, gender, and so on. Categorical data can also take on numerical values, such as 0 for female and 1 for male. Be aware that those numbers have no mathematical meaning.

Nominal Data

Nominal values represent discrete units and are used to label variables that have no quantitative value. They are nothing but “labels.” It is important to note that nominal data has no order, so nothing about the meaning would change if you changed the order of its values. For example, when a question asks for your gender and you must choose between female and male, the order in which the options are listed does not matter.

Ordinal Data

Ordinal values represent discrete and ordered units.
Ordinal data is therefore nearly the same as nominal data, except that its ordering matters. For example, a question about your educational background might offer the ordered options elementary, high school, undergraduate, and graduate. Note that the difference between elementary and high school is not the same as the difference between high school and college. This is the major limitation of ordinal data: the differences between the values are not really known. Because of this, ordinal scales are typically used to measure non-numerical features such as customer satisfaction, happiness, and so on.

Numerical Data

Discrete Data: We speak of discrete data when its values are distinct and separate, in other words, when the data can only take on certain values. This type of data can be counted but not measured, and its information can be sorted into classes. An example is the number of heads in 100 coin flips. To check whether you are dealing with discrete data, ask the following two questions: can you count it, and can it be divided into smaller and smaller parts? If you can count it but cannot meaningfully subdivide it, the data is discrete.

Continuous Data: Continuous data represents measurements, so its values can be measured but not counted. For example, someone’s height can be described using intervals on the real number line.

Interval Data: Interval values represent ordered units that have the same difference. We therefore speak of interval data when a variable contains ordered numeric values and we know the exact differences between the values.
For example, a feature that contains the temperature of a given place might hold the values -10, -5, 0, +5, +10, and +15. The problem with interval values is that they have no “true zero.” In this example, that means there is no such thing as “no temperature.” With interval data you can add and subtract, but you cannot multiply, divide, or calculate ratios. Because there is no true zero, many descriptive and inferential statistics cannot be applied.

Ratio Data: Ratio values are also ordered units with the same difference. They are the same as interval values, except that they do have an absolute zero. Examples include weight, length, height, and so on.
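The difference between interval and ratio data can be sketched in a few lines of Python. The example below uses Celsius readings as the interval scale and, as an assumption added here for contrast, the Kelvin scale as a temperature scale that does have an absolute zero. The variable names are illustrative only.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Shift a Celsius reading onto the Kelvin scale, whose zero is absolute."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = warm_c - cold_c

# The raw Celsius ratio depends on where the scale's arbitrary zero sits...
naive_ratio = warm_c / cold_c

# ...while the ratio on the Kelvin scale is measured from a true zero.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio suggests that 20 degrees is "twice as warm" as 10 degrees, while the Kelvin ratio shows the two readings differ by only a few percent. This is why ratios are reserved for data with a true zero.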
The Importance of Data Types

Data types are an essential concept because statistical techniques can only be used with specific data types. Continuous data has to be analyzed differently from categorical data; otherwise, the analysis will be wrong. Knowing the types of data you are dealing with therefore enables you to choose the correct method of analysis. We will now go over every data type once more, this time with regard to the statistical methods that can be applied to it. To follow this discussion, you need to understand the basics of descriptive statistics. Note: You can read all about descriptive statistics later in this chapter.
Statistical Methods

Nominal Data

When dealing with nominal data, you collect information through:

Frequencies: The frequency is the rate at which something occurs over a period of time or within a dataset.
Proportion: You can calculate the proportion by dividing the frequency by the total number of events, that is, how often something happened divided by how often it could have happened.
Percentage: The proportion expressed out of 100.

Visualization Techniques: To visualize nominal data, you can use a bar chart or a pie chart. In data science, you can use one-hot encoding to transform nominal data into a numeric feature.

Ordinal Data

The same techniques you use with nominal data can be applied to ordinal data, and some additional tools become available. You can summarize ordinal data with frequencies, proportions, and percentages, and visualize it with bar charts and pie charts. In addition, you can use percentiles, the median, the mode, and the interquartile range to summarize your data.

Continuous Data

When dealing with continuous data, you can use most methods to describe your data. You can summarize it using percentiles, the median, the interquartile range, the mean, the standard deviation, and the range.

Visualization Techniques: To visualize continuous data, you can use a histogram or a box-plot. With a histogram, you can check the central tendency, variability, modality, and kurtosis of a distribution. Note that a histogram may not show you whether you have any outliers, which is why box-plots are also used.
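The nominal-data summaries described in this section (frequencies, proportions, percentages, and one-hot encoding) can be sketched with the Python standard library alone. The survey answers below are hypothetical sample data, and the hand-rolled encoding is a simple illustration, not the method of any particular data science library.

```python
from collections import Counter

# Hypothetical survey answers (nominal data: labels with no order).
languages = ["Python", "Java", "Python", "C#", "Python", "Java"]

frequencies = Counter(languages)
total = sum(frequencies.values())
proportions = {label: count / total for label, count in frequencies.items()}
percentages = {label: 100 * share for label, share in proportions.items()}

# One-hot encoding: one 0/1 column per category, in a fixed column order.
categories = sorted(frequencies)
one_hot = [[1 if value == category else 0 for category in categories]
           for value in languages]

print(frequencies)
print(percentages)
print(categories, one_hot[0])
```

Each encoded row contains exactly one 1, in the column of the category that the original label belonged to, so no artificial ordering is introduced between the labels.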
Descriptive Statistics

Descriptive statistical analysis helps you understand your data and is an essential part of machine learning, since machine learning is all about making predictions. Statistics, on the other hand, is about drawing conclusions from data, which is a necessary initial step. Your dataset needs to go through descriptive statistical analysis. Many people skip this part, lose a considerable amount of valuable insight into their data, and often reach wrong conclusions as a result. Take your time when running your descriptive statistics, and make sure your data meets all the prerequisites for further analysis.

Normal Distribution

The normal distribution is one of the most important concepts in statistics, since many statistical tests assume normally distributed data. It essentially describes what large samples of data look like when plotted. It is sometimes called the “bell curve” or the “Gaussian curve.” Many inferential techniques and probability calculations assume that a normal distribution is given. This means that if your data is not normally distributed, you must be careful about which statistical tests you apply to it, since they could lead to wrong conclusions. A normal distribution is given if your data is symmetrical, bell-shaped, centered, and unimodal. In a perfect normal distribution, each side is an exact mirror of the other.

Central Tendency

In statistics, we have to deal with the mean, the median, and the mode. These three are also referred to as the “Central Tendency.”
They are three distinct kinds of “averages,” and certainly the most popular ones. The mean is simply the average, and it is considered the most reliable measure of central tendency for making assumptions about a population from a single sample. Central tendency describes the tendency of your data values to cluster around the mean, median, or mode. The mean is computed as the sum of all values divided by the number of values.

The mode is the value or category that occurs most often in the data. A dataset has no mode when no number is repeated or no category occurs more than once. A dataset can also have more than one mode. The mode is the only measure of central tendency that can be used for categorical variables, since you cannot compute, for example, the average of the variable “gender.” You simply report categorical variables as numbers and percentages.

The median is the “middle” value or midpoint in your data, also called the “50th percentile.” The median is much less affected by outliers and skewed data than the mean. For example, suppose a dataset of housing prices ranges mostly from $100,000 to $300,000 but also contains a few houses worth more than $3 million. Because the mean is the sum of all values divided by the number of values, those expensive homes will heavily influence it. The median, as the “middle” value of all data points, will not be heavily affected by these outliers. The median is therefore a much better-suited statistic for describing such data.
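The housing example above can be tried out with Python's built-in statistics module. The prices and color labels below are hypothetical values chosen to mirror the text, with one very expensive house acting as the outlier.

```python
import statistics

# Hypothetical house prices in dollars; the last value is an outlier.
prices = [100_000, 150_000, 200_000, 250_000, 300_000, 3_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

# The mode also works on categorical labels; multimode returns every tied value.
colors = ["red", "blue", "red", "green", "blue"]
modes = statistics.multimode(colors)

print(f"mean:   {mean_price:,.0f}")
print(f"median: {median_price:,.0f}")
print("modes: ", modes)
```

In this sample the single expensive house pulls the mean well above every ordinary price, while the median stays between the two middle values. The label list also shows a dataset with more than one mode.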
Chapter 7: Most Common Data Science Problems

Regardless of whether you pursue a full-time job in the field or use data analytics in your existing career, you’ll face certain problems with your work. Nobody can have a flawless and efficient workflow all the time; if you could, you’d soon become obsolete, because there’d be countless people like you. While some parts of working in data science are utterly amazing, there are still some issues. You can easily get frustrated, especially since most of your superiors won’t know in detail what you do. It’s very difficult to communicate precisely what you do to non-data-analysts. Because of this, the role is prone to misunderstandings and mismanagement. While all that’s true, some of the problems you’ll have can be managed and resolved. In this section, we’ll look at the most common complaints that people working with data analytics have had in the past, as well as how to resolve them without much consequence.
Management Expects the World

This issue is especially prevalent in positions that require you to do some degree of data modeling. Most data modeling concerns gathering and cleaning the data so that it’s actually usable. The issue is usually the manager’s fault, as many of them will suddenly come up with an idea and expect it to be done last-minute. Sometimes the modelers are at fault, but more often than not, it’s managers simply not understanding the job. In the management world, it’s quite common to insert things last-minute, but in data analytics that’s basically impossible. Your manager might just pop in and say, “Hey, we’re going to include a social media history in our latest analysis. Cool? Cool, I’ll see you in 15 minutes when it’s done.” If you sigh at this kind of request, at least there are some solutions for it. Not resolving this issue is bound to cause either serious delays or serious dissatisfaction from your managers. The worst thing here is that both sides of the argument are entirely understandable. The data scientists simply can’t deal with this in such a short time span, and managers will have a hard time understanding that. Serious complaints about managers being unreasonable and expecting the world are quite common in most technical fields, especially those concerning programming and AI. Fortunately, some solutions exist, and most of them involve improving your communication skills while being clear about what is possible and what isn’t. Let’s run through some solutions now.
First of all, you should keep communication open, but keep a firm “no changes” date. After that date, make sure your manager is aware no changes will be processed. Unfortunately, some managers will not be swayed by this. Be clear about what you can or cannot do. You can’t expect your manager to be perfectly well versed in data analytics. The main mistake managers make here is expecting data scientists to utilize datasets that either contain bad, little, or no data and actually have something to show for it at the end of the day. It’s imperative to explain to your manager what you can and cannot do. Give them a few useful articles to read about what ML and AI can actually accomplish, rather than what they’ve probably read. These days ML and AI are being hyped up to be essentially omnipotent and capable of turning any dataset into extremely valuable information. Unfortunately, as you know, this is quite far from the truth. The analysis you make has a limit on how good it can be, and that limit is the quality of the data you’re given. Naturally, you can use interpolation and extrapolation to “plug” the holes in a dataset, but it’s not like there’s a magic wand you can just point at the computer to create data. If you’re given a week of sales info, it doesn’t matter how good you are; you won’t be able to predict the sales of next year accurately. The best thing you can do about this is to pay attention to what kind of company you apply to. Do they already have many data scientists onboard? Do they collect a lot of good data already? Are they maybe adaptable enough to start collecting it as soon as you join? If the answer to these questions is no, you might want to reconsider working there.
It’s important to address this early on so it doesn’t affect you in the future. Besides that, try to explain to your manager that last-minute alterations are very difficult, and try to use phrases like “Yes, I could totally do that, it’s just going to add several days to the schedule.” Your manager’s going to be singing a different tune soon.
Misunderstanding How Data Works

Generally, people think of data as a set of information, a truth if you will. This couldn’t be farther from the truth. Data is merely facts until someone comes by and puts some context into it. This is an issue that can affect basically everyone: your boss, your manager, even you might fall into this faulty mindset. Being careful not to think of data as information is one of the most crucial parts of being a data analytics expert. Fundamentally, it’s extremely important to remember that even if your title is “data analyst,” when it comes to actual work, the analyst comes before the data. Fostering a data-first culture in the workplace is a surefire way to have every one of your endeavors end in utter failure. It’s easy to forget that data needs context to be useful, and it is so, so easy to fall down the slippery slope of worshiping data. Giving the context is your job; your job is to think about the data, to frame it. The data itself is what a wrench is to a mechanic. You don’t go praising the wrench for fixing the car, so you shouldn’t rely on the data too much either. You need to know the broader conditions; for example, market trends that aren’t in the data need to be considered. While your managers might be most inclined to trust in the numbers, your job is to reveal where those numbers might be faulty, what might be affecting them, and what is closest to the truth. Fortunately, this is an easy problem to solve; just let them have it. If your manager gets burned for a few million because they trusted data more than you, you can be sure that they’re going to pay more attention to your words next time around.

Now, you also need to consider the bias of data collection when dealing with work. All data collection processes are susceptible to certain biases. Let’s say that you’re analyzing a market based on how many people buy from the company site. In this case, the data is biased toward younger, more tech-savvy people, as older people are more likely to buy from brick-and-mortar stores.
Taking the Blame for Bad News

Unfortunately, when it comes to being a data scientist, your recommendations are likely to end in one of three ways: a bonus, a promotion, or expulsion from work. The danger of any role that deals with data analytics is that you will often have to deliver bad news to your bosses. Unfortunately, not all of them have read Sun Tzu’s The Art of War, and not all of them refuse to shoot the messenger. If your data analysis shows that there are serious problems in the company, or even that the company is headed towards its own destruction, it’s quite likely your bosses will be less than kind. Presenting this information can feel very awkward and uncomfortable, and can sometimes have disastrous consequences. In most cases, you won’t be to blame for this, but you are an easy scapegoat. Any manager can easily put the blame on you, and your boss might not be well versed enough to see through it. Now, ultimately, this is an issue you cannot precisely solve. If your boss is blaming you for what you found out after digging through the company data, you’ll probably want to check if your resume is up to date as soon as possible. You don’t need to, and shouldn’t, put up with being attacked for doing your job. If you’re really committed to solving the issue, try assigning blame to yourself. You can’t hold your job if you sidestep telling your boss about things that were others’ responsibilities. Even if companies are trying to be modern and adapt to changes in the industry swiftly, fundamentally, most companies still run in an old-fashioned manner. The complaint that Strand has articulated is extremely common in data
scientists, and the chances that you aren’t going to run into an example of this in your career are close to nil. A recent study has shown that ⅔ of all managers distrust data and would rather hand over decision-making onto their intuition rather than trusting scientists. Unfortunately, these are generally mid-level managers, who have just enough power to feel like they’re important, but not enough power to affect the decisions made on a broader, company-wide scale. Most data scientists get stuck with working for one of these at least at one point in their careers. You’ll find that you have to convince the management of practically every new decision you have to make. Do you need better data collection? Are you trying to make a financial model of the company spending so you can budget accordingly? Well, too bad, because Steve from management has decided that his intuition tops that. Even in the case that you’ve actually gotten approval for your project, you’ll still face challenges with getting management to well… Act accordingly. Even if your model showed that your company spends too much on marketing, good luck convincing your managers of that. This is why skills in communications are so useful for any role related to data science. All of the analytical skills in the world are going to be useless if there’s nobody to take action upon them. Your results won’t have even the slightest impact on the firm unless you’re able to engage upper management enough with your speech, data presentation, etc. This is why it’s important to keep in mind soft skills, as well as your ability to do presentations and visualizations of projects. It’s much easier to convince management if you’re showing them shapes and figures, rather than Excel spreadsheets.
Try running your presentation by a friend that has absolutely no technical skills; this will prove to you whether your presentation is fine. Pay attention to what questions are posed to you, and try to address them more clearly in the presentation. It’s also sometimes useful to try to explain your ideas to an inanimate object. This lets you pay attention to how you talk as well as how you communicate the data without the need to have an actual person with you there. With that being said, don’t feel too bad if it doesn’t work out. Sometimes your managers will simply elect not to listen to the data or decide that something is simply more important. A relatively recent case of this was when data analytics showed that Grace & Frankie’s promotional images worked the best without the show’s star. The team of executives at Netflix then had to think about the pros and cons of excluding the lead, Jane Fonda, from the images. In the end, they elected not to, partly not to anger the lead, and partly because the show would be more “iconic” if the lead was present, rather than if promotional images were used exclusively as an advertisement. The only fortunate thing here is that this is a bit of a cascading issue. If you fail a few times, management is unlikely ever to trust you again. On the other hand, if you bring success a few times, you’ll build their confidence in your data, and they’ll be much more likely to trust you with important projects. It is a matter of picking your battles, so to speak, try to only engage where you are absolutely sure you can succeed.
Communication as a Solution

You might have noticed that the overarching theme here is communication, and it is. While your data analytics and portfolio are the things that will let you get the job and perform it well, mere performance isn’t enough. To make your day-to-day life better and your career more successful, you have to practice communication and learn how to speak to your managers in the most effective ways possible. If you’re looking to hone your communication skills, look no further than those same managers you take issue with. They tend to be quite good at communicating with their bosses; talking to them and paying attention to the terms and tactics they use can be an excellent way to learn communication skills. Above all, it is important to practice. Try to make your every email sound more professional and your every message more concise and effective. The same way you analyze in your job, analyze your approach to your job: think about what the most effective words to use are, and when to use them.
Chapter 8: Comparison of Python with Other Languages

Python can be compared with other high-level programming languages. It compares favorably in terms of functionality, libraries, and user-friendliness. Its modules, frameworks, and interpreters are increasing its popularity across the software industry and among IT professionals. These comparisons focus on the reliability of program code and other significant factors. Let's look at how Python compares with other programming languages in detail.
Python versus Java comparison

Java programs generally run faster than Python programs. Python is better suited as a "high-level" language, while Java is better described as a lower-level implementation language. Indeed, the two together make an excellent combination. Components can be developed in Java and combined to form applications in Python. Python can also be used to prototype components until their design can be "hardened" in a Java implementation. A Python implementation written in Java, known as Jython, supports this style of development by allowing Python code to be called from Java and vice versa. In this implementation, Python source code is translated to Java bytecode (with help from a runtime library to support Python's dynamic semantics).

Java is a statically typed language, which means the types of variables must be explicitly declared. In contrast, Python is dynamically typed, so no declaration is required. There is much debate about dynamic and static typing in programming languages. Nevertheless, one point should be noted: Python is a flexible language with a simple syntax, which makes it an excellent choice for writing scripts and quickly developing applications in different fields.

Java lets you create cross-platform applications, and Python is likewise compatible with practically all modern operating systems. As a starting point, Java is considerably more complicated for beginners than Python, and Python code is easier to read. If you need your code to run anywhere, choose Java. Java is also widely used for network-based applications, although Python's standard library provides networking modules as well.
Java is considerably more complicated than Python. If you have no technical background, Java will not be easy to learn. On the other hand, Java is used to write programs for a wide range of environments and runtimes.
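The point about declarations can be seen in a few lines of Python. In Java, a variable must be declared with a type before use (for example `int count = 3;`), whereas the sketch below binds and rebinds a name freely. The names `count` and `double` are illustrative only.

```python
# In Python a name is bound to a value without declaring a type first,
# and the same name may later refer to a value of a different type.
count = 3
first_type = type(count).__name__

count = "three"
second_type = type(count).__name__


# Optional type hints document intent, but the interpreter does not enforce them.
def double(value: int) -> int:
    return value * 2


print(first_type, second_type)
print(double(4), double("ab"))
```

The last line runs without complaint even though a string is passed where the hint says `int`. That flexibility is convenient for quick scripts, and it is also the reason static typing has its supporters.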
Python versus C#

Regarding simplicity, Python was originally designed to resemble English. Many statements in it are easy to read, especially if you use appropriate variable names. Moreover, thanks to its simple syntax, there are no complicated constructions such as syntactic brackets, numerous modifier keywords, various C-like constructions, and multiple ways of initializing variables. All of this makes code written in Python easy to understand and learn. C#, on the other hand, has inherited many things from C++ and Java, which shows in its C-like syntax. The C# syntax also makes it necessary to follow certain rules when writing your methods or inheriting classes, which brings another stream of modifier keywords. One should also not forget blocks of code, which must be enclosed in braces. Python has none of this; it uses indentation, which also makes the code look clean.

Concerning scripting, it's worth mentioning that the programs Python calls scripts are exactly that: they are simply files with code that can be executed directly by the interpreter. You can open them in any editor, work on them, and then immediately run them again. Also, with Python, it's much easier to write cross-platform scripts that do not need to be recompiled. The same Python source file can be moved to another platform or system and run there by that platform's interpreter.
This cross-platform portability is one of the language's strengths. Packaging a script as a standalone executable, however, increases its size from a few kilobytes to a dozen megabytes, which is not helpful for one-time use. C#, by contrast, generally calls for an IDE for everyday programming. One of the advantages of C# is its solid support for the various components of the Windows system when you are writing scripts for Windows. For instance, there are built-in tools for working with the registry, WMI, the system, and so on. C# also lets you use WinForms, which makes it very easy to create a graphical interface if one is suddenly required. There is no right answer as to which language, Python or C#, is better. Python is easier to learn and has many more open-source libraries than C#. However, the standard library of C# is better than Python's, C# has more features, its performance is higher, and it evolves very quickly.
Python versus JavaScript

Python's "object-based" subset is roughly equivalent to JavaScript. Like JavaScript (and unlike Java), Python supports a programming style that uses simple functions and variables without engaging in class definitions. In JavaScript, however, that style has traditionally been the norm. Python, on the other hand, supports writing much larger programs and better code reuse through a true object-oriented programming style, where classes and inheritance play an important role.
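Both styles mentioned above can be shown side by side. The following sketch first uses a plain function with no class at all, then expresses the same calculation with a class and a subclass that reuses it through inheritance. The names `area`, `Rectangle`, and `Square` are illustrative only.

```python
# Plain functions and variables, with no class definition required.
def area(width: float, height: float) -> float:
    return width * height


# The same idea in an object-oriented style, using a class and inheritance.
class Rectangle:
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Square(Rectangle):
    """A Square reuses Rectangle's area() through inheritance."""

    def __init__(self, side: float) -> None:
        super().__init__(side, side)


print(area(2, 3), Rectangle(2, 3).area(), Square(4).area())
```

For a short script the plain function is usually enough. The class-based version starts to pay off when many related shapes need to share behavior.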
Python versus Perl

Python and Perl come from a similar background (Unix scripting, which both have long outgrown) and sport many similar features, but they have a different philosophy. Perl emphasizes support for common application-oriented tasks, for example, by having built-in regular expressions, file scanning, and report generating features. Python emphasizes support for common programming methodologies, such as data structure design and object-oriented programming, and encourages programmers to write readable (and thus maintainable) code by providing an elegant but not overly cryptic notation. As a consequence, Python comes close to Perl but only occasionally beats it in its original application domain; however, Python has an applicability well beyond Perl's niche.
Python versus Tcl
Like Python, Tcl can be used as an application extension language as well as a stand-alone programming language. However, Tcl, which traditionally stores all data as strings, is weak on data structures and executes typical code much more slowly than Python. Tcl also lacks features needed for writing large programs, such as modular namespaces. Thus, while a "typical" large application using Tcl usually contains Tcl extensions written in C or C++ that are specific to that application, an equivalent Python application can often be written in pure Python. One of Tcl's redeeming qualities is the Tk toolkit, and Python has adopted an interface to Tk as its standard GUI component library.
Python versus Smalltalk
Perhaps the biggest difference between Python and Smalltalk is Python's more "mainstream" syntax, which makes it easier for programmers to learn and work with. Like Smalltalk, Python has dynamic typing, which adds to the flexibility of the language. However, Python distinguishes built-in object types from user-defined classes. Smalltalk's standard library of data types is more refined, while Python's library has more facilities for dealing with Internet and WWW realities such as email, HTML, and FTP. Python stores both standard modules and user modules in individual files, which can be rearranged or distributed outside the system. As a result, there is more than one option for attaching a Graphical User Interface (GUI) to a Python program, whereas in Smalltalk the GUI is traditionally built into the system.
Python versus C++
Python and C++ are both used to develop large projects, yet they differ in many ways. C++ originated from the C language, supports several paradigms, and provides many built-in components for writing programs, whereas Python has a simple syntax that reads much like English. Python is a general-purpose, high-level programming language in which a variable can be used directly without declaring it first. In C++, a program has to be compiled separately for each operating system on which it is to run. Python code is executed by an interpreter, which also lets you run a program in small pieces, so the same program can be written once and run on any platform that has a Python interpreter. C++ is prone to memory leaks because it has no garbage collection and uses pointers extensively. Python has built-in garbage collection and dynamic memory allocation, which make for efficient use of memory. C++ is nowadays also commonly used in hardware design: the design is first described in C++, then analyzed, architecturally constrained, and scheduled to produce a register-transfer level description in a hardware description language. Python started out as a scripting language and is now used for non-scripting purposes as well. A Python program can also be turned into a standalone executable with the help of additional packaging tools.
Python versus Common Lisp and Scheme
Common Lisp and Scheme are close to Python in their dynamic semantics, and Python has introspective capabilities similar to those of Lisp. Lisp programs can chain an unlimited number of conditions together to carry out a long task. Common Lisp and Scheme also come in complex variants whose code is hard to follow for anyone but experienced programmers. In contrast, Python's syntax is simple, straightforward, and easy to understand line by line.
Python versus Golang
Golang is quite an adaptable language, just like Python. Neither language requires extensive training, and both are easy to understand and run. Golang is also called the Go language, and Google released it in 2009. Python supports several programming paradigms (object-oriented, imperative, functional, and procedural) and has a vast standard library. Go is also multi-paradigm, supporting procedural, functional, and concurrent programming. Its syntax derives from C, but it is cleaner and requires less effort. Python and Go nevertheless differ in many ways. For example, Go has no try-except construct; instead, functions return an error value together with their result. After calling a function, you are therefore expected to check whether an error was returned. Python is mostly used for web applications, whereas Go's main focus is systems programming, although Go is also used in some web applications. Both languages manage memory automatically through garbage collection, though Go is generally regarded as the more memory-efficient of the two.
Python offers concurrency through modules such as threading, multiprocessing, and asyncio, but the global interpreter lock of the standard interpreter has traditionally limited how well threads run in parallel. Go, on the other hand, has concurrency built into the language. In terms of safety, Go is a statically typed, compiled language, which adds an extra layer of protection: every variable must have a type associated with it, so a developer cannot gloss over details that would later lead to bugs. Python is strongly typed too, but its types are checked only at run time. Python has a greater number of libraries than Go, and Python code is more concise. Python is a very good option for simple programs and quick scripts, while large, complicated programs can become harder to maintain in it; Go tends to cope better with that kind of complexity. The most significant difference is that Python is dynamically typed, whereas Go is statically typed. Even so, Python developers can usually pick up Go without much trouble. Python focuses on simple and clear syntax, and Go's clean grammar likewise leads to highly readable code. The static typing of Go also lines up with Python's own principle that "explicit is better than implicit." Python therefore remains an excellent choice for software engineers and developers all around the globe, but because it is dynamically typed and interpreted, its performance is lower than that of statically typed, compiled Go. It can make sense to use both languages side by side: give priority to Go for performance-critical code, and use Python otherwise.
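The difference in error handling mentioned above can be illustrated in Python alone. The sketch below puts the usual try/except style next to a Go-like convention in which a function returns its result together with an error value that the caller must check. The function names parse_port and parse_port_go_style are hypothetical and exist only for this illustration; the second style is not idiomatic Python.

```python
def parse_port(text):
    """Python style: raise an exception when the input is invalid."""
    port = int(text)  # raises ValueError for non-numeric text
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def parse_port_go_style(text):
    """Go-like style: return (result, error) instead of raising."""
    try:
        return parse_port(text), None
    except ValueError as exc:
        return None, str(exc)


# Python style: wrap the call in try/except.
try:
    port = parse_port("8080")
except ValueError:
    port = None

# Go-like style: inspect the returned error after every call.
value, err = parse_port_go_style("not-a-number")
if err is not None:
    print("could not parse port:", err)
```

In the Go-like version nothing forces the caller to look at err, which is why the text stresses checking the returned error after each call.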
Python versus Node.js
It is important to remember that Node.js is not a programming language like Python, but rather a runtime environment for JavaScript. Writing for Node.js therefore means you are using the same language on the frontend and the backend.
Advantages of Python over Node.js
At a more advanced level, JavaScript can be hard to understand for developers with less Node.js experience. They may make some fairly basic mistakes that slow progress down. This is less of a problem with Python, because it is easier for less experienced developers to use, and the mistakes they make have less of a negative effect on development.
Lower Entry Point
Frameworks such as Django are helpful: they raise the quality of your code and speed up the process of writing it.
More Applications
Node.js is mostly used for the web, while the uses of Python are far broader. The universality and flexibility of Python are among the top reasons why the language is an excellent fit for trending technologies such as data science.
Better Implementation
JavaScript runtime environments and frameworks all implement the language differently, and Node.js is no exception. In all honesty, the JavaScript ecosystem is somewhat of a mess, although not nearly as bad as it used to be. Python does not have that issue, which is why it is more straightforward and simpler to use. It also makes the language quicker to write in, although Node.js is not slow. You must know JavaScript if you wish to use Node.js, since you are dealing with the same language on the frontend and the backend.
Less Opinionated Ecosystem
Node.js pushes developers in particular directions about what they should use and when they should use it. It also has a lot of built-in packages that developers need to understand. As the libraries keep improving, developers have to keep developing their skills to match.
Coding Everything in JavaScript
With the help of Node.js, JavaScript can be used for both frontend and backend programming. This saves a lot of time and makes the work easier. Nowadays, many IT professionals use the language wherever they can for web-based programming tasks.
Rapid Development and a Huge Community
Since 2012, Python has been consistently praised for its great community and support, and rightly so. With its large number of libraries and frameworks, development is fast: you simply call the library or function you need. Nowadays, JavaScript is similarly well supported. It keeps growing with no sign of stopping and remains among the most dynamically growing languages in the industry.
Development History of Python and JavaScript
JavaScript has seen a lot of growing pains. Its design drew heavy criticism in its early days, and its old versions are still causing problems today. Overall, Python has the upper hand here. Python's documentation and coverage are both better than those of Node.js, and when it comes to reliability, Python has consistently been ahead of JavaScript.
Trending Technologies
The chaotic JavaScript ecosystem also makes Node.js rather unstable and unpredictable to rely on for trending technologies. Because JavaScript trends change so quickly, JavaScript technologies become obsolete much faster. This is why Node.js is a risky choice for emerging technology trends. Python does not carry that risk, since it introduces major changes gradually. With its top-notch experts and library support, the language is an ideal fit for trending technologies such as machine learning and data science.
Performance and Speed
Node.js may struggle when it has to execute many tasks at once. If the code is not written well, your program will perform poorly and run slowly. The same can happen with Python, but Python frameworks such as Django provide ready-made support that helps your program run smoothly. It is one more case of Python making life easier for developers. The quality of your program is everything; it is the main factor to think about when choosing the programming language for your final product. Python works better for certain projects, and Node.js works better for others. In practice, your decision will depend largely on whether you have strong Python or JavaScript developers on your team. This argument is moot if you happen to have full-stack developers who know both languages; however, those are hard to find, so you need to decide on your programming strategy before you start.
Python versus PHP
From the development perspective, PHP is a web-oriented language. A PHP application is more like a set of individual scripts, possibly with a single semantic entry point. Python is a versatile language that can also be applied to web development. A web application based on Python is a full-fledged application loaded into memory with its own internal state, which is preserved from one request to the next. When choosing between Python and PHP for web applications, focus on the following qualities:
Python versus PHP for Web Development Comparison
Trends and popularity
The trends and popularity of a programming language matter a great deal these days. Some clients and product owners want to use the most popular and most hyped technologies for their projects. PHP has a strong position in web application programming and is widely used in the developer community, so it is often considered a good choice for fast web applications. Python is also used for web applications, but much of its current popularity comes from data science.
Frameworks
Python has many useful libraries that are famous across the world, for example Pandas, NumPy, and more, along with highly efficient open-source tooling. PHP takes a different approach to code quality and to the way new additions enter the language. There are several popular frameworks in Python, and the most useful are Django and Flask. Developers around the world use these frameworks to speed up their work.
PHP has established frameworks of its own, such as Laravel and Symfony, and it also relies heavily on libraries built by the wider PHP community. Python's framework landscape, for its part, is likely to keep changing as the Python community grows.
Chapter 9: Data Cleaning and Preparation
The next topic in our data science process is data cleaning and preparation. In the course of data analysis and modeling, a lot of time is spent preparing the data before it ever enters the model. Data preparation includes many different tasks: loading, cleaning, transforming, and rearranging the data. These tasks are so important, and so time-consuming, that an analyst can easily spend around 80 percent of their time on them. Sometimes the way data is stored in a database or a file is not the right format for a particular task. Many researchers do ad hoc processing of the data, converting it from one form to another with a programming language, most commonly Perl, R, Python, or Java. The good news is that the Pandas library discussed earlier, together with the features of Python itself, provides everything we need. Its tools are fast, flexible, and high-level, and they let us manipulate the data into whatever form is needed at the time. The following sections cover what data preparation is, why it is needed, the steps involved, and how to handle missing data.
What Is Data Preparation?
Suppose you are going through the log files of a website and analyzing them, hoping to find out which IP addresses the spammers are coming from. Or you might want to figure out which demographic on the website is leading to more sales. To answer questions like these, the analysis needs two important columns: the number of hits made to the website and the IP address of each hit. As you can imagine, the log files are not structured, and they may contain a lot of unstructured textual information. Put simply, preparing the log file so that you can extract the data in the format you need for analysis is the process known as data preparation.
Data preparation is a big part of the whole data science process. According to CrowdFlower, a provider of data enrichment platforms for data scientists, a survey of 80 data scientists found that they spend their time as follows:
60 percent is spent organizing and cleaning the data they have collected.
19 percent is spent collecting the sets of data they want to use.
9 percent is spent mining the collected and prepared data for patterns.
3 percent is spent building training sets.
4 percent is spent refining the algorithms.
5 percent is spent on the other tasks the job requires.
As the survey shows, most of a data scientist's time goes into preparing the data: they spend a good deal of time collecting, organizing, and cleaning before they can even start analyzing. Data science has enjoyable and valuable tasks, such as data visualization and data exploration, but data preparation is the least enjoyable part of the job. The amount of time you will actually spend preparing the data for a specific problem depends directly on the health of the data. If there are a lot of errors, missing parts, and duplicate values, the process will take much longer. If the data is well organized and does not need much fixing, data preparation will not take long at all.
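To make the log file example concrete, here is a minimal sketch that pulls the IP address out of each raw log line and counts the hits per address. The sample lines, the assumption that the address is the first field on a line, and the helper name count_hits are all made up for illustration; real log files will vary in layout.

```python
import re
from collections import Counter

# Hypothetical raw log lines; real logs will differ in layout.
raw_log = """\
203.0.113.5 - - [10/Oct/2021:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2021:13:55:40 +0000] "GET /cart HTTP/1.1" 200 512
203.0.113.5 - - [10/Oct/2021:13:56:01 +0000] "POST /comment HTTP/1.1" 200 87
malformed line without an address
203.0.113.5 - - [10/Oct/2021:13:56:09 +0000] "POST /comment HTTP/1.1" 200 91
"""

# An IPv4 address at the start of a line (assumed position).
ip_pattern = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def count_hits(log_text):
    """Return a Counter mapping each IP address to its number of hits."""
    hits = Counter()
    for line in log_text.splitlines():
        match = ip_pattern.match(line)
        if match:  # skip lines that do not start with an address
            hits[match.group(1)] += 1
    return hits


hits = count_hits(raw_log)
for ip, n in hits.most_common():
    print(ip, n)
```

The result is exactly the two-column structure the text asks for: an address and its hit count, ready for further analysis.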
Why Do I Need Data Preparation?
One question many people have when it is time to prepare data is why they need to do it in the first place. To someone just getting started in this field, it may seem that collecting the data and organizing it as well as possible is enough, and that they can then go straight to building a model. But there are several reasons why data preparation is so important:
The set of data could contain discrepancies in the codes or the names that are used.
The set of data could contain a lot of outliers or errors that distort the results.
The set of data could lack the attributes of interest that the analysis needs.
The set of data could be large in quantity but poor in quality. These are not the same thing, and quality is often the more important of the two.
Each of these problems can seriously damage the model you are working on and leave you with results or predictions that are less accurate than you would like. Taking the time to prepare and clean your data addresses these issues and ensures that the data is ready to use.
What Are the Steps for Data Preparation?
At this point, we need to look at the steps involved in preparing data for data mining.
The first step is to clean the data. This is one of the most important steps in getting the data prepared. We need to correct any inconsistent data by filling in missing values and by smoothing out outliers and noisy data that would influence the analysis negatively. Many rows in the data set may have no value for the attributes of interest, or may contain inconsistent data. In some cases, records have been duplicated or contain some other random error. We need to tackle all of these data quality issues as early as possible in order to end up with a model that gives honest and reliable predictions. There are a few ways to handle missing values, and the choice depends on the requirements: you can ignore the tuple, fill in the missing value with the mean of the attribute or with a global constant, or predict it with a machine learning technique such as a Bayesian formula or a decision tree. Noisy data can be handled manually, or with clustering or regression techniques. Choose the approach that suits the data you have.
The second step is data integration. This step involves integrating the schemas, resolving any data conflicts that show up, and handling redundancies in the data.
The third step is data transformation. This step removes the noise that remains in the data so that it does not lead the analysis astray. Normalization, aggregation, and generalization also belong to this step.
The fourth step is data reduction. The data warehouse you are using might contain petabytes of data, and running an analysis on the complete set could take a great deal of time and may not be necessary for the goals of your model. In this step, it is the responsibility of the data scientist to obtain a reduced representation of the data set. We want this set to be much smaller than the original, yet inclusive enough to produce the same analysis outcomes. This can be hard with a very large set of data, but there are several reduction strategies to apply, including numerosity reduction, aggregation, data cubes, and dimensionality reduction, depending on your requirements.
The fifth and final step is data discretization. The data set you are working with contains three types of attributes: continuous, nominal, and ordinal. Some algorithms only handle categorical attributes. Data discretization lets the data scientist divide continuous attributes into intervals, which also reduces the size of the data and prepares it for analysis. Take your time with this step to make sure everything matches up and behaves as you expect.
Many of the methods and techniques available for this part of the process are powerful and can do a lot of the work for you. Even with all of these tools, data preparation is still considered an area of research, one that many scientists continue to explore in the hope of finding new strategies and techniques.
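Some of the steps above can be sketched in a few lines of Pandas. The example below is only an illustration on made-up data: it fills a missing value with the attribute mean (cleaning), rescales a column to the range 0 to 1 (normalization, part of transformation), and divides a continuous attribute into intervals (discretization). The column names, bin edges, and labels are assumptions chosen for the example; integration and reduction are not shown.

```python
import pandas as pd

# Hypothetical toy data with one missing age and one very large income.
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [30000, 42000, 38000, 250000],
})

# Cleaning: fill the missing age with the mean of the attribute.
df["age"] = df["age"].fillna(df["age"].mean())

# Transformation: min-max normalization rescales income to the range [0, 1].
income_range = df["income"].max() - df["income"].min()
df["income_scaled"] = (df["income"] - df["income"].min()) / income_range

# Discretization: divide the continuous age attribute into labeled intervals.
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 40, 120], labels=["young", "middle", "senior"]
)

print(df)
```

Filling with the mean is only one of the options listed in the text; a global constant or a predicted value may suit other data sets better.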
Handling the Missing Data
Missing data is common in many data analysis applications. One of the goals of the Pandas library is to make working with missing data as easy and painless as possible. For example, all of the descriptive statistics on Pandas objects exclude missing data by default. The way missing data is represented in Pandas is not perfect, but it is useful for most users of the library. For numeric data, Pandas uses the floating-point value NaN (not a number) to represent missing data. Pandas adopts a convention from the R programming language and refers to missing data as NA, which stands for not available. In statistics applications, NA data can be data that does not exist at all, or data that exists but could not be observed because of problems in data collection. When cleaning up data for analysis, it is often important to analyze the missing data itself, to identify problems with data collection or potential biases caused by the missing values.
There are also times when the data contains duplicates. When you gather information online or from other sets of data, some of the records may be duplicated. If this happens often, it distorts the insights and predictions you get, because the data leans towards the duplicated records and the model does not work the way you would like. The Pandas library offers ways to deal with this and make sure that duplicates are eliminated, or at least reduced.
There is a great deal we can do in data preparation to complete the data mining process and get the results we want from our analysis. Make sure to take some time on this part, as it can make or break the system you are trying to create. If you spend enough time on it and ensure that the data is as organized and clean as possible, you will be happy with the results and ready to take on the rest of the process.
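The handling of missing values and duplicates described in this section can be sketched as follows. The data is made up for illustration; isna, fillna, dropna, duplicated, and drop_duplicates are standard Pandas methods, but which of them is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
scores = pd.Series([4.0, np.nan, 7.0, 5.0])

print(scores.isna())   # marks which entries are missing
print(scores.mean())   # NaN is skipped by default

filled = scores.fillna(scores.mean())  # replace NaN with the mean
dropped = scores.dropna()              # or drop the missing entries

# Hypothetical data with one duplicated row.
visits = pd.DataFrame({
    "ip": ["203.0.113.5", "198.51.100.7", "203.0.113.5"],
    "page": ["/cart", "/home", "/cart"],
})
print(visits.duplicated())                # True for the repeated row
unique_visits = visits.drop_duplicates()  # keeps the first occurrence
```

Before filling or dropping anything, it is worth counting the missing entries, for example with scores.isna().sum(), to see how large the problem is.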
Chapter 10: Data Visualization
Data visualization is an important skill for every data scientist. In the early stages of a project, you will need to perform an exploratory data analysis to gain insights into your data. Creating visualizations simplifies things, particularly with a high-dimensional dataset. Towards the end of your project, you need to deliver the final result in a clear and compelling manner that your audience can understand.
Data Visualization to the End-User
Usually, the data scientist is responsible for presenting their insights to the end user. The results can be conveyed in different ways:
A single presentation. In this case, the research questions are one-shot deals, because the business decision derived from them will set the organization on a given course for several years to come. Company investment decisions are an example. Do you distribute the goods from two distribution centers or just one? Where should they be located for the best efficiency? Once the decision is made, the exercise might not be repeated until you retire. In this case, the results are delivered as a report, with a presentation as the icing on the cake.
A new viewport on data. The most common example of this is customer segmentation. The segments will certainly be communicated through reports and presentations, but in essence they are tools, not the final result itself. Once a clear and meaningful customer segmentation has been identified, it can be fed back into the database as a new dimension on the data from which it was derived. From that point, people can create their own reports, for instance on how many products were sold to each customer segment.
A real-time dashboard. Your job as a data scientist is not finished once you have the new information. You could send your information back to the database and be done with it. However, when other people start to create reports on the newly discovered gold nugget, they may interpret it incorrectly and produce reports that do not make sense. Since you are the data scientist who found this new information, you need to set the example. In this case, you create the first refreshable report so that the others can learn from it and follow in your footsteps. Creating the first dashboard is also a way to shorten the delivery time of your insights to the end user who wants to use them daily. That way, they already have something to build upon until the reporting department finds the time to set up a permanent report in the company's reporting software.
You may have noticed that a few important factors are at play:
What type of decision are you supporting? Is it strategic or operational? Strategic decisions often require you to conduct an analysis and produce a report only once, whereas operational decisions require the report to be updated often.
What is the size of your organization? In smaller organizations, you will handle the whole cycle, from collection to reporting. In bigger teams, reporters may be available to create the dashboards for you. Even then, creating a prototype dashboard can be worthwhile, because it provides an example and reduces the delivery time.
Matplotlib is a great Python library for building data visualizations. However, setting up the data, parameters, and plots can get messy and tiresome when done regularly. This section guides you through data visualization and shows how to create some quick and easy functions with the help of Python's Matplotlib.
You will learn how to create basic plots using Seaborn, Matplotlib, and Pandas visualization. Python offers many graphing libraries that are packed with features. Whether you want interactive plots or complex ones, Python has a powerful library for the job. As an overview, the popular plotting libraries include:
Matplotlib. A low-level library that offers a lot of freedom.
Seaborn. This has a high-level interface and good default styles.
Plotly. This is useful for creating interactive plots.
Visualization using Pandas. This has an easy-to-use interface and is built on Matplotlib.
Ggplot. This one is based on R's ggplot2 and applies the Grammar of Graphics.
Matplotlib
This is the most popular Python plotting library. It is a low-level library with a MATLAB-like interface that offers a lot of freedom. Matplotlib is well suited to creating basic graphs such as bar charts, line charts, and histograms. The library is conventionally imported with the following line of code:
import matplotlib.pyplot as plt
Line Chart
In Matplotlib, you create a line chart with the plot method. You can also show several columns in a single graph by looping over the columns and plotting each one on the same axis.
Histogram
In Matplotlib, you create a histogram with the hist method. If you pass it discrete data, such as a column of scores, it shows how often each value occurs.
Bar Chart
If you want to show data in a bar chart, use the bar function. The bar chart does not compute the frequency of each category automatically, so you will need Pandas to do that first. The bar chart works best for categorical data that does not have too many categories; otherwise it can become messy.
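The three chart types described above can be sketched together as follows. The data, column names, and figure size are made up for illustration, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6],
    "sales": [10, 12, 9, 15, 18, 17],
    "returns": [1, 2, 1, 3, 2, 2],
    "rating": [4, 5, 4, 3, 4, 5],
})

fig, (ax_line, ax_hist, ax_bar) = plt.subplots(1, 3, figsize=(12, 3))

# Line chart: loop over the columns and plot each on the same axis.
for column in ["sales", "returns"]:
    ax_line.plot(df["month"], df[column], label=column)
ax_line.set_title("Line chart")
ax_line.legend()

# Histogram: hist counts how many values fall into each bin.
ax_hist.hist(df["rating"], bins=3)
ax_hist.set_title("Histogram")

# Bar chart: the frequencies must be computed first, here with Pandas.
counts = df["rating"].value_counts().sort_index()
ax_bar.bar(counts.index, counts.values)
ax_bar.set_title("Bar chart")

fig.tight_layout()
```

To view the result, call fig.savefig with a file name of your choice, or remove the backend line and call plt.show() in an interactive session.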
Visualization Using Pandas
Pandas is a high-level open-source library. It is easy to use and provides data structures and data analysis tools. Pandas makes it easy to create plots directly from data frames and series. It also has a higher-level API than Matplotlib, so less code is required for similar results. If Pandas is not installed yet, you can install it by running the command pip install pandas.
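As a minimal sketch of the shorter code that Pandas allows, the example below draws one line per column with a single call. The data and labels are made up for illustration, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd

# Hypothetical data, indexed by month.
df = pd.DataFrame(
    {"sales": [10, 12, 9, 15, 18, 17], "returns": [1, 2, 1, 3, 2, 2]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="month"),
)

# One call plots every column as a line on the same Matplotlib axes.
ax = df.plot.line(title="Monthly figures")
ax.set_ylabel("units")
```

Because the returned object is an ordinary Matplotlib axes, everything that works on a Matplotlib axes, such as setting labels, still applies.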
The Objective of Visualization Data communication and exploration are the major focus of data visualization. Once data is visualized, the patterns become visible. You will immediately tell whether there is an increasing trend or the relative magnitude of something in connection to other factors. Rather than tell people the long list of numbers, why not display the numbers to them for better clarity? For instance, let’s consider the worldwide trend search on the word ‘bitcoin.’
You should see that there is a temporary rise in the bitcoin interest, but it starts to decrease after the peak. Overall, during the peak interval, there’s a huge hype connected to the technological and social effects of bitcoin. Again, the following hype decreases because people understand it, or it’s a common thing related to hypes. No matter the situation, data visualization helps us to determine the patterns in a very clear style. Keep in mind the importance of data visualization is to explore data. In the following case, you can quickly choose the patterns as well as the data send to us. This is critical when you submit it to the public audience. Others may decide to go for a quick brief of the data without rushing into detail.
You don’t really need to burden them with text and numbers. What makes a big difference is how you present the data so that people can quickly grasp and recall its importance. This is where data visualization becomes helpful: it lets people explore the data, and it communicates what you are trying to say. There are numerous methods of visualizing data.
The Simplest Method to Complex Visualization of Data
Visualization is a powerful skill that every data scientist needs. It is more than just creating beautiful charts; it means representing the dataset’s information in a way that is easy for people to understand. With the right visualization, a person can quickly see the patterns and information that lie beneath the data. In the early stages of a project, you will conduct an exploratory data analysis to gain insights into your data. Creating visualizations will speed up that analysis. At the end of your project, it is vital to present your final results in a brief and compelling manner that any audience can follow. There’s no doubt that taking your visualizations to the next level will help you succeed in your next presentation. This section explores ways to build attractive, complex data visualizations. You will use the Plotly Python library, which is excellent for creating interactive visualizations.
Overview of Plotly
Plotly is an interactive, browser-based graphing library for Python. It lets you go beyond the visualization capabilities of standard Matplotlib. There are two benefits of using Plotly instead of other Python libraries such as Matplotlib, Pandas, and Seaborn:
Ease of use. Interactive plots and other complex graphics take little code in Plotly, whereas producing the same result with other libraries takes a lot of work.
Additional functionality. Since Plotly is built on D3.js, its plotting capabilities are more powerful than those of other plotting libraries. Sunburst charts and many more are possible with Plotly.
Building Attractive Plots Using Plotly
Plotly is useful for building fancy plots. To start, import Plotly and its internal graph objects component. You will also import Pandas to load the dataset.
To read the dataset, you write a one-liner in Pandas.
Scatter Plots
In this section, we plot the sale price against the year built as a scatter plot. To do that, you define a scatter graph object and store it in a trace.
Then, to plot, you only write a single line.
The plotting command opens a new tab in your browser with the plot. Interactivity comes built in with Plotly.
Box Plots
Next, we look at box plots. The process is quite similar: define a graph object, store it in a trace, and then display it in the browser.
The box plot comes with the same interactive features as the scatter plot. By default, we get the same zooming, panning, and point selection. Because this is a box plot, hovering over each box reveals the following:
Median
1st and 3rd quartiles
Min and max values of the data range
Upper and/or lower fences if there are outliers
Heat Maps
Heat maps are a critical tool for data scientists. They are effective for displaying the relationships between multiple feature variables in a single graph, along with the relative strength of each relationship. To demonstrate how heat maps can be improved with Plotly, we create a correlation matrix of the House Prices dataset and plot it as a heat map.
Heat maps in Matplotlib can be somewhat difficult to read because you cannot see the exact value of each cell; you can only judge it from the color. You can write code to make the plot interactive, but that is quite a hassle in Matplotlib. Plotly provides interactivity out of the box, so when you plot a heat map, you get an attractive overview and the option to check exact values when needed. Plotly's pan and zoom functionality is also very clean, providing an easy way to explore the data in detail visually. These examples only hint at the possibilities of Plotly. Keep in mind that you can create publication-quality data visualizations. You can also adapt the example code to your own purposes. There’s no need to reinvent anything; copy the relevant sections and apply them to your data.
In the future, there will probably be easier and more effective methods to build data visualizations, especially for huge datasets. You can also build animated presentations that change over time. Either way, the main goal of data visualization is to communicate data. You can choose other methods, but the goal remains the same. You have now learned the general aspects of data visualization: it is the practice of understanding data by representing it graphically so that trends and patterns become visible. Python provides many graphics libraries, each with its own set of features.
Conclusion
It was a long journey, but it’s certainly not the end. Data science is a massive field of study that requires years of learning and practice before you can master it. This shouldn’t discourage you, however! Embrace it as a challenge that you can undertake in order to broaden your horizons and improve your knowledge of data science and machine learning. This book offers you the fundamental knowledge you need to get started, but keep in mind that no book or teacher can do everything for you. You need to work hard, putting each building block in its place as you advance. Data science is a highly complex topic that has been developed continuously for decades. It is constantly evolving, and it can be challenging to keep up with all the past, present, and future concepts. That said, this isn’t meant to discourage you from pursuing the field. You don’t necessarily need a computer science degree in order to learn data science. What you do need, however, is the spark that urges you to learn more and put everything new to the test by working with real data sets and actual data science projects. Acquire more books and join online communities of data scientists, programmers, statisticians, and machine learning enthusiasts! You can benefit a lot from working with others. Expose yourself to other perspectives and ideas as soon as possible, even when you barely know the basics. The learning process receives a boost when you have other people with similar goals helping you out.
Almost everyone will agree that big data has arrived in a big way and has taken the business world by storm. But what is the future of data analysis, and will it grow? What technologies will grow around it? What is the future of big data? Will it keep growing, or will big data soon become a museum piece? What is cognitive technology? What is the future of fast data? The data volume will keep on growing. There is practically no doubt that we’ll keep generating larger and larger quantities of data, especially considering that the number of internet-connected and handheld devices is going to grow exponentially. The ways we undertake data analysis will show marked improvement in the coming years. Although SQL will remain the standard tool, other tools such as Spark will emerge as complementary methods for data analysis, and reports suggest their number will keep growing. More and more tools will become available for data analysis, and some of them will not need an analyst. Microsoft and Salesforce have announced combined features that will allow non-coders to create apps for viewing business data. Prescriptive analytics will get built into business analytics software, and IDC predicts that 50 percent of all business analysis software will come with the business intelligence it needs by the year 2020. In addition, real-time streaming insight into big data will become a hallmark of the data winners moving forward.
Users will look to use this data to make informed decisions in real time with programs such as Spark and Kafka. The top strategic trend to emerge is machine learning. Machine learning will become a mandatory element of big data preparation and predictive analysis in businesses going forward. You can expect big data to face huge challenges as well, especially regarding the privacy of user details. The new privacy regulations enforced by the European Union clearly intend to protect users' personal information. Companies will have to address privacy controls and processes. It is predicted that most business ethics violations in the coming years will be related to data. Soon you can expect pretty much all companies to have a chief data officer in place. Forrester says that this officer will rise in significance within a short period of time, but certain kinds of businesses and generation gaps might decrease that significance in the future. Autonomous agents will continue to play a significant role, and according to Gartner they will remain a huge trend. These agents include autonomous vehicles, smart advisers, virtual personal assistants, and robots. The staffing required for data analysis will keep expanding, and people ranging from scientists and analysts to architects and data management experts will be needed. However, a shortage of big data talent might push large companies to develop new tactics. Some large institutes predict that organizations will use internal training to resolve the issue. A business model that offers big data as a service can be seen on the horizon.
Data science is a complex field that requires a lot of dedication from you due to the amount of information you need to absorb. This book hands you the tools you need to study every concept and guides you with clear examples of code and data sets. The rest is up to you! You have the fundamentals under your belt, and now you can continue your journey to become a data scientist!